[spark] branch master updated: [SPARK-24774][SQL][FOLLOWUP] Remove unused code in SchemaConverters.scala

2021-11-02 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 59c55dd  [SPARK-24774][SQL][FOLLOWUP] Remove unused code in SchemaConverters.scala
59c55dd is described below

commit 59c55dd4c6f7772ef7949653679a2b76211788e8
Author: Gengliang Wang 
AuthorDate: Wed Nov 3 08:43:25 2021 +0300

[SPARK-24774][SQL][FOLLOWUP] Remove unused code in SchemaConverters.scala

### What changes were proposed in this pull request?

As MaxGekk pointed out in https://github.com/apache/spark/pull/22037/files#r741373793, there is some unused code in SchemaConverters.scala. The UUID generator was for generating `fix` Avro field names, but we figured out a better solution during PR review. This PR removes the dead code.

### Why are the changes needed?

Code cleanup

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UT.

Closes #34472 from gengliangwang/SPARK-24774-followup.

Authored-by: Gengliang Wang 
Signed-off-by: Max Gekk 
---
 .../src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala   | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala b/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
index 1c9b06b..347364c 100644
--- a/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
+++ b/external/avro/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala
@@ -18,14 +18,12 @@
 package org.apache.spark.sql.avro
 
 import scala.collection.JavaConverters._
-import scala.util.Random
 
 import org.apache.avro.{LogicalTypes, Schema, SchemaBuilder}
 import org.apache.avro.LogicalTypes.{Date, Decimal, LocalTimestampMicros, LocalTimestampMillis, TimestampMicros, TimestampMillis}
 import org.apache.avro.Schema.Type._
 
 import org.apache.spark.annotation.DeveloperApi
-import org.apache.spark.sql.catalyst.util.RandomUUIDGenerator
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.types.Decimal.minBytesForPrecision
 
@@ -35,8 +33,6 @@ import org.apache.spark.sql.types.Decimal.minBytesForPrecision
  */
 @DeveloperApi
 object SchemaConverters {
-  private lazy val uuidGenerator = RandomUUIDGenerator(new Random().nextLong())
-
   private lazy val nullSchema = Schema.create(Schema.Type.NULL)
 
   /**

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated (78cc91c -> 90c23eb)

2021-11-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 78cc91c  [SPARK-32567][SQL] Add code-gen for full outer shuffled hash join
 add 90c23eb  [SPARK-37191][SQL] Allow merging DecimalTypes with different precision values

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/types/StructType.scala| 10 ++--
 .../apache/spark/sql/types/StructTypeSuite.scala   | 28 +-
 .../datasources/parquet/ParquetQuerySuite.scala| 22 +
 3 files changed, 51 insertions(+), 9 deletions(-)
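
For context, a minimal sketch of what this change enables (the paths below are hypothetical, `spark` is assumed to be an active SparkSession, and the exact widening behavior is inferred from the PR title):

```
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DecimalType

// Two datasets whose shared column differs only in decimal precision.
val df1 = spark.range(3).select(col("id").cast(DecimalType(10, 2)).as("amount"))
val df2 = spark.range(3).select(col("id").cast(DecimalType(12, 2)).as("amount"))
df1.write.parquet("/tmp/decimals_a")
df2.write.parquet("/tmp/decimals_b")

// Schema merging should now widen to the larger precision, DecimalType(12, 2),
// instead of failing on the precision mismatch.
spark.read.option("mergeSchema", "true")
  .parquet("/tmp/decimals_a", "/tmp/decimals_b")
  .printSchema()
```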




[spark] branch master updated: [SPARK-32567][SQL] Add code-gen for full outer shuffled hash join

2021-11-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 78cc91c  [SPARK-32567][SQL] Add code-gen for full outer shuffled hash join
78cc91c is described below

commit 78cc91c962abd48d7ec2e9721d1e1429f802dced
Author: Cheng Su 
AuthorDate: Wed Nov 3 11:18:12 2021 +0800

[SPARK-32567][SQL] Add code-gen for full outer shuffled hash join

### What changes were proposed in this pull request?

As the title says: this PR adds code-gen support for FULL OUTER shuffled hash join.

The main change is in `ShuffledHashJoinExec.scala:doProduce()` to generate code for FULL OUTER join.
* `ShuffledHashJoinExec.scala:codegenFullOuterJoinWithUniqueKey()` is the code path for joins whose build-side join key is unique.
* `ShuffledHashJoinExec.scala:codegenFullOuterJoinWithNonUniqueKey()` is the code path for joins with a non-unique key.

Example query:

```
val df1 = spark.range(5).select($"id".as("k1"))
val df2 = spark.range(10).select($"id".as("k2"))
df1.join(df2.hint("SHUFFLE_HASH"), $"k1" === $"k2" % 3 && $"k1" + 3 =!= $"k2", "full_outer")
```

Generated code for example query: 
https://gist.github.com/c21/828b782ee81827f4148939cb50314a7b
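
To inspect the generated code locally, a hedged sketch using Spark's debug helpers (assuming a spark-shell session where `spark` and the `$` implicits are in scope):

```
import org.apache.spark.sql.execution.debug._

val df1 = spark.range(5).select($"id".as("k1"))
val df2 = spark.range(10).select($"id".as("k2"))
val joined = df1.join(df2.hint("SHUFFLE_HASH"),
  $"k1" === $"k2" % 3 && $"k1" + 3 =!= $"k2", "full_outer")

// Prints each whole-stage-codegen subtree along with its generated Java source.
joined.debugCodegen()
```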

### Why are the changes needed?

Improve query performance for FULL OUTER shuffled hash join.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

* Added unit test in `WholeStageCodegenSuite`.
* Existing unit test in `OuterJoinSuite`.

Closes #3 from c21/shj-codegen.

Authored-by: Cheng Su 
Signed-off-by: Wenchen Fan 
---
 .../joins/BroadcastNestedLoopJoinExec.scala|   2 +-
 .../spark/sql/execution/joins/HashJoin.scala   |   4 +-
 .../sql/execution/joins/JoinCodegenSupport.scala   |  50 ++--
 .../sql/execution/joins/ShuffledHashJoinExec.scala | 260 -
 .../sql/execution/joins/SortMergeJoinExec.scala|   3 +-
 .../sql/execution/WholeStageCodegenSuite.scala |  45 +++-
 .../spark/sql/execution/joins/OuterJoinSuite.scala |  21 ++
 7 files changed, 349 insertions(+), 36 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala
index 77a30b7..0677211 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala
@@ -463,7 +463,7 @@ case class BroadcastNestedLoopJoinExec(
   private def codegenOuter(ctx: CodegenContext, input: Seq[ExprCode]): String = {
 val (buildRowArray, buildRowArrayTerm) = prepareBroadcast(ctx)
 val (buildRow, checkCondition, _) = getJoinCondition(ctx, input, streamed, broadcast)
-val buildVars = genBuildSideVars(ctx, buildRow, broadcast)
+val buildVars = genOneSideJoinVars(ctx, buildRow, broadcast, setDefaultValue = true)
 
 val resultVars = buildSide match {
   case BuildLeft => buildVars ++ input
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala
index f87acb8..0e8bb84 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala
@@ -444,7 +444,7 @@ trait HashJoin extends JoinCodegenSupport {
 val HashedRelationInfo(relationTerm, keyIsUnique, _) = prepareRelation(ctx)
 val (keyEv, anyNull) = genStreamSideJoinKey(ctx, input)
 val matched = ctx.freshName("matched")
-val buildVars = genBuildSideVars(ctx, matched, buildPlan)
+val buildVars = genOneSideJoinVars(ctx, matched, buildPlan, setDefaultValue = true)
 val numOutput = metricTerm(ctx, "numOutputRows")
 
 // filter the output via condition
@@ -646,7 +646,7 @@ trait HashJoin extends JoinCodegenSupport {
 val existsVar = ctx.freshName("exists")
 
 val matched = ctx.freshName("matched")
-val buildVars = genBuildSideVars(ctx, matched, buildPlan)
+val buildVars = genOneSideJoinVars(ctx, matched, buildPlan, setDefaultValue = false)
 val checkCondition = if (condition.isDefined) {
   val expr = condition.get
   // evaluate the variables from build side that used by condition
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala
index 96aa0be..75f0a35 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/JoinCodegenSupport.scala

[spark] branch master updated (293c085 -> d246010)

2021-11-02 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 293c085  [SPARK-36895][SQL][FOLLOWUP] CREATE INDEX command should rely on the analyzer framework to resolve columns
 add d246010  [SPARK-36935][SQL] Extend ParquetSchemaConverter to compute Parquet repetition & definition level

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/parquet/io/ColumnIOUtil.java}  |   26 +-
 .../parquet/SpecificParquetRecordReaderBase.java   |2 +
 .../datasources/parquet/ParquetColumn.scala|   55 +
 .../datasources/parquet/ParquetRowConverter.scala  |   11 +-
 .../parquet/ParquetSchemaConverter.scala   |  198 +++-
 .../datasources/parquet/ParquetSchemaSuite.scala   | 1241 +++-
 6 files changed, 1433 insertions(+), 100 deletions(-)
 copy sql/{hive-thriftserver/src/main/java/org/apache/hive/service/cli/OperationStatus.java => core/src/main/java/org/apache/parquet/io/ColumnIOUtil.java} (59%)
 create mode 100644 sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetColumn.scala




[spark] branch master updated: [SPARK-36895][SQL][FOLLOWUP] CREATE INDEX command should rely on the analyzer framework to resolve columns

2021-11-02 Thread huaxingao
This is an automated email from the ASF dual-hosted git repository.

huaxingao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 293c085  [SPARK-36895][SQL][FOLLOWUP] CREATE INDEX command should rely on the analyzer framework to resolve columns
293c085 is described below

commit 293c085d677220e71966d98c25cde8a06ae78468
Author: Wenchen Fan 
AuthorDate: Tue Nov 2 14:39:42 2021 -0700

[SPARK-36895][SQL][FOLLOWUP] CREATE INDEX command should rely on the analyzer framework to resolve columns

### What changes were proposed in this pull request?

This PR leverages the existing framework to resolve columns in the CREATE INDEX command.

### Why are the changes needed?

To fail earlier instead of passing invalid column names to v2 sources.
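
As a hedged illustration (the `h2` catalog and table below are hypothetical; CREATE INDEX only applies to v2 sources that support it, e.g. the JDBC v2 connector):

```
// A valid column passes analysis and is forwarded to the v2 source.
spark.sql("CREATE INDEX people_name_idx ON TABLE h2.test.people (name)")

// A misspelled column now fails during analysis, before reaching the source.
spark.sql("CREATE INDEX bad_idx ON TABLE h2.test.people (nmae)")
```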

### Does this PR introduce _any_ user-facing change?

no

### How was this patch tested?

new test

Closes #34467 from cloud-fan/col.

Lead-authored-by: Wenchen Fan 
Co-authored-by: Wenchen Fan 
Signed-off-by: Huaxin Gao 
---
 .../org/apache/spark/sql/catalyst/analysis/Analyzer.scala | 15 +++
 .../org/apache/spark/sql/catalyst/parser/AstBuilder.scala |  4 ++--
 .../spark/sql/catalyst/plans/logical/v2Commands.scala | 12 +++-
 .../apache/spark/sql/catalyst/parser/DDLParserSuite.scala |  6 +++---
 .../execution/datasources/v2/DataSourceV2Strategy.scala   |  5 -
 .../apache/spark/sql/connector/DataSourceV2SQLSuite.scala | 11 ---
 6 files changed, 35 insertions(+), 18 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index f0a1c8c..068886e 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -272,7 +272,7 @@ class Analyzer(override val catalogManager: CatalogManager)
   ResolveInsertInto ::
   ResolveRelations ::
   ResolvePartitionSpec ::
-  ResolveAlterTableCommands ::
+  ResolveFieldNameAndPosition ::
   AddMetadataColumns ::
   DeduplicateRelations ::
   ResolveReferences ::
@@ -3529,11 +3529,18 @@ class Analyzer(override val catalogManager: CatalogManager)
   }
 
   /**
-   * Rule to mostly resolve, normalize and rewrite column names based on case sensitivity
-   * for alter table column commands.
+   * Rule to resolve, normalize and rewrite field names based on case sensitivity for commands.
*/
-  object ResolveAlterTableCommands extends Rule[LogicalPlan] {
+  object ResolveFieldNameAndPosition extends Rule[LogicalPlan] {
 def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
+  case cmd: CreateIndex if cmd.table.resolved &&
+  cmd.columns.exists(_._1.isInstanceOf[UnresolvedFieldName]) =>
+val table = cmd.table.asInstanceOf[ResolvedTable]
+cmd.copy(columns = cmd.columns.map {
+  case (u: UnresolvedFieldName, prop) => resolveFieldNames(table, u.name, u) -> prop
+  case other => other
+})
+
  case a: AlterTableCommand if a.table.resolved && hasUnresolvedFieldName(a) =>
 val table = a.table.asInstanceOf[ResolvedTable]
 a.transformExpressions {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
index 722a055..a16674f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
@@ -4429,7 +4429,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logging
 }
 
 val columns = ctx.columns.multipartIdentifierProperty.asScala
-  .map(_.multipartIdentifier.getText).toSeq
+  .map(_.multipartIdentifier).map(typedVisit[Seq[String]]).toSeq
 val columnsProperties = ctx.columns.multipartIdentifierProperty.asScala
   .map(x => (Option(x.options).map(visitPropertyKeyValues).getOrElse(Map.empty))).toSeq
 val options = Option(ctx.options).map(visitPropertyKeyValues).getOrElse(Map.empty)
@@ -4439,7 +4439,7 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logging
   indexName,
   indexType,
   ctx.EXISTS != null,
-  columns.map(FieldReference(_).asInstanceOf[FieldReference]).zip(columnsProperties),
+  columns.map(UnresolvedFieldName(_)).zip(columnsProperties),
   options)
   }
 
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala
index 

[spark] branch master updated (ec6a3ae -> b78167a)

2021-11-02 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from ec6a3ae  [SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method
 add b78167a  [SPARK-37066][SQL] Improve error message to show file path when failed to read next file

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/avro/AvroLogicalTypeSuite.scala  |  2 +-
 .../apache/spark/sql/errors/QueryExecutionErrors.scala|  8 
 .../spark/sql/execution/datasources/FileScanRDD.scala | 13 ++---
 .../execution/datasources/v2/FilePartitionReader.scala| 15 +++
 .../execution/datasources/FileSourceStrategySuite.scala   |  4 ++--
 .../spark/sql/execution/datasources/csv/CSVSuite.scala|  4 ++--
 .../datasources/parquet/ParquetSchemaSuite.scala  |  6 ++
 7 files changed, 24 insertions(+), 28 deletions(-)




[spark] branch master updated: [SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method

2021-11-02 Thread maxgekk
This is an automated email from the ASF dual-hosted git repository.

maxgekk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new ec6a3ae  [SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method
ec6a3ae is described below

commit ec6a3ae6dff1dc9c63978ae14a1793ccd771
Author: Xianjin YE 
AuthorDate: Tue Nov 2 12:40:09 2021 +0300

[SPARK-37176][SQL] Sync JsonInferSchema#infer method's exception handle logic with JacksonParser#parse method

### What changes were proposed in this pull request?
Change `JsonInferSchema#infer`'s exception-handling logic to be aligned with `JacksonParser#parse`.

### Why are the changes needed?
To reduce behavior inconsistency, so that users can have the same expectations for schema inference and JSON parsing when dealing with malformed input.

### Does this PR introduce _any_ user-facing change?
Yes.
Before this patch, JSON schema inference could fail on some malformed input that parsing nevertheless handled successfully.
After this patch, both share the same exception-handling logic.
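
A hedged sketch of the aligned behavior (the path is hypothetical and assumed to point at a file containing a malformed, e.g. non-UTF-8, record):

```
val path = "/tmp/test-data/malformed_utf8.json"

// PERMISSIVE (default): inference now falls back to the corrupt-record column
// instead of throwing, matching JacksonParser's behavior.
spark.read.option("mode", "PERMISSIVE").json(path).printSchema()

// DROPMALFORMED: the malformed record is skipped in both inference and parsing.
spark.read.option("mode", "DROPMALFORMED").json(path).show()

// FAILFAST: both inference and parsing raise an error on the malformed record.
spark.read.option("mode", "FAILFAST").json(path).count()
```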

### How was this patch tested?
Added one new test and modified one existing test to cover the new case.

Closes #34455 from advancedxy/SPARK-37176.

Authored-by: Xianjin YE 
Signed-off-by: Max Gekk 
---
 .../spark/sql/catalyst/json/JsonInferSchema.scala  | 33 +++-
 .../test/resources/test-data/malformed_utf8.json   |  3 ++
 .../sql/execution/datasources/json/JsonSuite.scala | 35 ++
 3 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
index 3b17cde..3b62b16 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.catalyst.json
 
+import java.io.CharConversionException
+import java.nio.charset.MalformedInputException
 import java.util.Comparator
 
 import scala.util.control.Exception.allCatch
@@ -45,6 +47,18 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
 legacyFormat = FAST_DATE_FORMAT,
 isParsing = true)
 
+  private def handleJsonErrorsByParseMode(parseMode: ParseMode,
+  columnNameOfCorruptRecord: String, e: Throwable): Option[StructType] = {
+parseMode match {
+  case PermissiveMode =>
+Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType))))
+  case DropMalformedMode =>
+None
+  case FailFastMode =>
+throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e)
+}
+  }
+
   /**
* Infer the type of a collection of json records in three stages:
*   1. Infer the type of each record
@@ -68,14 +82,17 @@ private[sql] class JsonInferSchema(options: JSONOptions) extends Serializable {
 Some(inferField(parser))
   }
 } catch {
-  case  e @ (_: RuntimeException | _: JsonProcessingException) => parseMode match {
-case PermissiveMode =>
-  Some(StructType(Seq(StructField(columnNameOfCorruptRecord, StringType))))
-case DropMalformedMode =>
-  None
-case FailFastMode =>
-  throw QueryExecutionErrors.malformedRecordsDetectedInSchemaInferenceError(e)
-  }
+  case e @ (_: RuntimeException | _: JsonProcessingException |
+_: MalformedInputException) =>
+handleJsonErrorsByParseMode(parseMode, columnNameOfCorruptRecord, e)
+  case e: CharConversionException if options.encoding.isEmpty =>
+val msg =
+  """JSON parser cannot handle a character in its input.
+|Specifying encoding as an input option explicitly might help to resolve the issue.
+|""".stripMargin + e.getMessage
+val wrappedCharException = new CharConversionException(msg)
+wrappedCharException.initCause(e)
+handleJsonErrorsByParseMode(parseMode, columnNameOfCorruptRecord, wrappedCharException)
 }
   }.reduceOption(typeMerger).toIterator
 }
diff --git a/sql/core/src/test/resources/test-data/malformed_utf8.json b/sql/core/src/test/resources/test-data/malformed_utf8.json
new file mode 100644
index 000..c57eb43
--- /dev/null
+++ b/sql/core/src/test/resources/test-data/malformed_utf8.json
@@ -0,0 +1,3 @@
+{"a": 1}
+{"a": 1}
+�
\ No newline at end of file
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala
 




[spark] branch master updated: [SPARK-37156][PYTHON] Inline type hints for python/pyspark/storagelevel.py

2021-11-02 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 73e8628  [SPARK-37156][PYTHON] Inline type hints for python/pyspark/storagelevel.py
73e8628 is described below

commit 73e8628f48db3f17a39f2154b54cbdea3d31e92c
Author: dchvn 
AuthorDate: Tue Nov 2 16:43:17 2021 +0900

[SPARK-37156][PYTHON] Inline type hints for python/pyspark/storagelevel.py

### What changes were proposed in this pull request?
Inline type hints for python/pyspark/storagelevel.py

### Why are the changes needed?
We can take advantage of static type checking within the functions by inlining the type hints.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Existing tests

Closes #34437 from dchvn/SPARK-37156.

Authored-by: dchvn 
Signed-off-by: Hyukjin Kwon 
---
 python/pyspark/storagelevel.py  | 25 +---
 python/pyspark/storagelevel.pyi | 43 -
 2 files changed, 22 insertions(+), 46 deletions(-)

diff --git a/python/pyspark/storagelevel.py b/python/pyspark/storagelevel.py
index ecf8e5c..51fdebd 100644
--- a/python/pyspark/storagelevel.py
+++ b/python/pyspark/storagelevel.py
@@ -17,6 +17,8 @@
 
 __all__ = ["StorageLevel"]
 
+from typing import ClassVar
+
 
 class StorageLevel(object):
 
@@ -29,18 +31,35 @@ class StorageLevel(object):
 formats.
 """
 
-def __init__(self, useDisk, useMemory, useOffHeap, deserialized, replication=1):
+DISK_ONLY: ClassVar["StorageLevel"]
+DISK_ONLY_2: ClassVar["StorageLevel"]
+DISK_ONLY_3: ClassVar["StorageLevel"]
+MEMORY_ONLY: ClassVar["StorageLevel"]
+MEMORY_ONLY_2: ClassVar["StorageLevel"]
+MEMORY_AND_DISK: ClassVar["StorageLevel"]
+MEMORY_AND_DISK_2: ClassVar["StorageLevel"]
+OFF_HEAP: ClassVar["StorageLevel"]
+MEMORY_AND_DISK_DESER: ClassVar["StorageLevel"]
+
+def __init__(
+self,
+useDisk: bool,
+useMemory: bool,
+useOffHeap: bool,
+deserialized: bool,
+replication: int = 1,
+):
 self.useDisk = useDisk
 self.useMemory = useMemory
 self.useOffHeap = useOffHeap
 self.deserialized = deserialized
 self.replication = replication
 
-def __repr__(self):
+def __repr__(self) -> str:
 return "StorageLevel(%s, %s, %s, %s, %s)" % (
self.useDisk, self.useMemory, self.useOffHeap, self.deserialized, self.replication)
 
-def __str__(self):
+def __str__(self) -> str:
 result = ""
 result += "Disk " if self.useDisk else ""
 result += "Memory " if self.useMemory else ""
diff --git a/python/pyspark/storagelevel.pyi b/python/pyspark/storagelevel.pyi
deleted file mode 100644
index 2eb0585..000
--- a/python/pyspark/storagelevel.pyi
+++ /dev/null
@@ -1,43 +0,0 @@
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-from typing import ClassVar
-
-class StorageLevel:
-DISK_ONLY: ClassVar[StorageLevel]
-DISK_ONLY_2: ClassVar[StorageLevel]
-MEMORY_ONLY: ClassVar[StorageLevel]
-MEMORY_ONLY_2: ClassVar[StorageLevel]
-DISK_ONLY_3: ClassVar[StorageLevel]
-MEMORY_AND_DISK: ClassVar[StorageLevel]
-MEMORY_AND_DISK_2: ClassVar[StorageLevel]
-OFF_HEAP: ClassVar[StorageLevel]
-
-useDisk: bool
-useMemory: bool
-useOffHeap: bool
-deserialized: bool
-replication: int
-def __init__(
-self,
-useDisk: bool,
-useMemory: bool,
-useOffHeap: bool,
-deserialized: bool,
-replication: int = ...,
-) -> None: ...




[spark] branch master updated (b4a6eb6 -> 6d42230)

2021-11-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from b4a6eb6  [SPARK-37164][SQL] Add ExpressionBuilder for functions with complex overloads
 add 6d42230  [SPARK-37168][SQL] Improve error messages for SQL functions and operators under ANSI mode

No new revisions were added by this update.

Summary of changes:
 core/src/main/resources/error/error-classes.json   |  6 +-
 .../catalyst/expressions/datetimeExpressions.scala | 90 ++
 .../catalyst/expressions/intervalExpressions.scala | 13 +++-
 .../spark/sql/catalyst/util/DateTimeUtils.scala|  6 +-
 .../spark/sql/errors/QueryExecutionErrors.scala| 48 ++--
 .../expressions/StringExpressionsSuite.scala   |  2 +-
 .../resources/sql-tests/results/ansi/array.sql.out | 14 ++--
 .../resources/sql-tests/results/ansi/date.sql.out  |  8 +-
 .../resources/sql-tests/results/ansi/map.sql.out   |  4 +-
 .../sql-tests/results/ansi/timestamp.sql.out   | 20 ++---
 .../sql-tests/results/postgreSQL/date.sql.out  |  6 +-
 .../results/timestampNTZ/timestamp-ansi.sql.out| 20 ++---
 12 files changed, 155 insertions(+), 82 deletions(-)
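
As a hedged example of the kind of operation these messages cover (exact wording varies by version):

```
spark.conf.set("spark.sql.ansi.enabled", "true")

// Under ANSI mode this cast overflows and fails with one of the improved
// errors, which typically suggest try_cast or disabling ANSI mode.
spark.sql("SELECT CAST(2147483648 AS INT)").show()
```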




[spark] branch master updated (320fa07 -> b4a6eb6)

2021-11-02 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 320fa07  [SPARK-37159][SQL][TESTS] Change HiveExternalCatalogVersionsSuite to be able to test with Java 17
 add b4a6eb6  [SPARK-37164][SQL] Add ExpressionBuilder for functions with complex overloads

No new revisions were added by this update.

Summary of changes:
 .../sql/catalyst/analysis/FunctionRegistry.scala   |  20 ++-
 .../catalyst/expressions/stringExpressions.scala   | 163 +++--
 .../spark/sql/errors/QueryCompilationErrors.scala  |   4 +-
 .../expressions/StringExpressionsSuite.scala   |   4 +-
 .../scala/org/apache/spark/sql/functions.scala |   4 +-
 .../sql-functions/sql-expression-schema.md |   4 +-
 .../sql-tests/inputs/string-functions.sql  |   8 +-
 .../results/ansi/string-functions.sql.out  |  16 +-
 .../sql-tests/results/string-functions.sql.out |  28 ++--
 9 files changed, 137 insertions(+), 114 deletions(-)
