date:20210905

[spark] branch master updated (bdb73bb -> db95960)

2021-09-05 Thread gurwls223

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from bdb73bb  [SPARK-36613][SQL][SS] Use EnumSet as the implementation of 
Table.capabilities method return value
 add db95960  [SPARK-36660][SQL] Add cot as Scala and Python functions

No new revisions were added by this update.

Summary of changes:
 python/pyspark/sql/functions.py | 17 +
 python/pyspark/sql/functions.pyi|  1 +
 .../src/main/scala/org/apache/spark/sql/functions.scala |  9 +
 3 files changed, 27 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org

[spark] branch master updated: [SPARK-36613][SQL][SS] Use EnumSet as the implementation of Table.capabilities method return value

2021-09-05 Thread srowen

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new bdb73bb  [SPARK-36613][SQL][SS] Use EnumSet as the implementation of 
Table.capabilities method return value
bdb73bb is described below

commit bdb73bbc277519f0c7ffa3ab856cf87515c12934
Author: yangjie01 
AuthorDate: Sun Sep 5 08:23:05 2021 -0500

[SPARK-36613][SQL][SS] Use EnumSet as the implementation of 
Table.capabilities method return value

### What changes were proposed in this pull request?
The `Table.capabilities` method returns a `java.util.Set` of the 
`TableCapability` enumeration type, which is currently implemented using 
`java.util.HashSet`. Such a Set can be replaced with `java.util.EnumSet` 
because `EnumSet` implementations can be much more efficient than other 
sets.
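
As a rough illustration of the swap described above, the following Java sketch builds the same capability set both ways. The `Capability` enum and the `CapabilitySketch` class are hypothetical stand-ins so the example runs without Spark on the classpath; they are not Spark's `TableCapability` or any class touched by this commit.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

final class CapabilitySketch {
    // Hypothetical stand-in for Spark's TableCapability enum (assumption, not the real type).
    enum Capability {
        BATCH_READ, BATCH_WRITE, MICRO_BATCH_READ,
        CONTINUOUS_READ, STREAMING_WRITE, ACCEPT_ANY_SCHEMA
    }

    // "Before" shape: a general-purpose hash-based set.
    static Set<Capability> withHashSet() {
        return new HashSet<>(Arrays.asList(Capability.BATCH_READ, Capability.BATCH_WRITE));
    }

    // "After" shape: EnumSet, which the JDK represents internally as a bit vector.
    static Set<Capability> withEnumSet() {
        return EnumSet.of(Capability.BATCH_READ, Capability.BATCH_WRITE);
    }

    public static void main(String[] args) {
        // Both are java.util.Set instances with the same elements, so callers
        // that only rely on the Set interface see no behavioural difference.
        System.out.println(withHashSet().equals(withEnumSet())); // prints true
        System.out.println(withEnumSet().contains(Capability.BATCH_WRITE)); // prints true
    }
}
```

Because both factories return `java.util.Set`, the change can stay invisible to callers, which is consistent with the PR's statement that there is no user-facing change.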

### Why are the changes needed?
Use more appropriate data structures.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?

- Pass GA or Jenkins Tests.
- Add a new benchmark to compare the `create` and `contains` operations of 
`EnumSet` and `HashSet`
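
The kind of comparison mentioned in the list above can be sketched as a simple timing loop. This is not the `EnumTypeSetBenchmark` added by the commit; the `SetTimingSketch` class, its `Capability` enum, and the iteration count are assumptions made for illustration. A bare `System.nanoTime` loop has no warm-up or statistics, so its numbers are only indicative.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

final class SetTimingSketch {
    // Hypothetical stand-in for Spark's TableCapability enum (assumption).
    enum Capability { BATCH_READ, BATCH_WRITE, MICRO_BATCH_READ, STREAMING_WRITE }

    // Runs `iterations` rounds of "create a set, then call contains" and
    // returns the elapsed wall-clock time in nanoseconds.
    static long time(Supplier<Set<Capability>> create, int iterations) {
        long hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Set<Capability> s = create.get();
            if (s.contains(Capability.BATCH_READ)) {
                hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        // Using `hits` keeps the loop body from being trivially dead code.
        if (hits != iterations) {
            throw new IllegalStateException("unexpected miss");
        }
        return elapsed;
    }

    static Set<Capability> newHashSet() {
        Set<Capability> s = new HashSet<>();
        Collections.addAll(s, Capability.BATCH_READ, Capability.BATCH_WRITE);
        return s;
    }

    static Set<Capability> newEnumSet() {
        return EnumSet.of(Capability.BATCH_READ, Capability.BATCH_WRITE);
    }

    public static void main(String[] args) {
        int iterations = 1_000_000; // arbitrary illustrative count
        System.out.println("HashSet ns: " + time(SetTimingSketch::newHashSet, iterations));
        System.out.println("EnumSet ns: " + time(SetTimingSketch::newEnumSet, iterations));
    }
}
```

For numbers worth reporting, a harness with warm-up and repeated runs, such as the benchmark the commit adds, is the better tool.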

Closes #33867 from LuciferYang/SPARK-36613.

Authored-by: yangjie01 
Signed-off-by: Sean Owen 
---
 .../spark/sql/kafka010/KafkaSourceProvider.scala   |   4 +-
 .../EnumTypeSetBenchmark-jdk11-results.txt | 104 
 .../benchmarks/EnumTypeSetBenchmark-results.txt| 104 
 .../spark/sql/connector/catalog/V1Table.scala  |   3 +-
 .../CreateTablePartitioningValidationSuite.scala   |   3 +-
 .../connector/catalog/EnumTypeSetBenchmark.scala   | 176 +
 .../sql/connector/catalog/InMemoryTable.scala  |   5 +-
 .../datasources/noop/NoopDataSource.scala  |   6 +-
 .../sql/execution/datasources/v2/FileTable.scala   |   2 +-
 .../execution/datasources/v2/jdbc/JDBCTable.scala  |   2 +-
 .../spark/sql/execution/streaming/console.scala|   4 +-
 .../spark/sql/execution/streaming/memory.scala |   3 +-
 .../streaming/sources/ForeachWriterTable.scala |   4 +-
 .../streaming/sources/RateStreamProvider.scala |   4 +-
 .../sources/TextSocketSourceProvider.scala |   3 +-
 .../sql/execution/streaming/sources/memory.scala   |   3 +-
 .../spark/sql/connector/JavaSimpleBatchTable.java  |   5 +-
 .../connector/JavaSimpleWritableDataSource.java|   6 +-
 .../spark/sql/connector/DataSourceV2Suite.scala|   4 +-
 .../connector/FileDataSourceV2FallBackSuite.scala  |   5 +-
 .../spark/sql/connector/LocalScanSuite.scala   |   5 +-
 .../sql/connector/SimpleWritableDataSource.scala   |   2 +-
 .../sql/connector/TableCapabilityCheckSuite.scala  |   8 +-
 .../spark/sql/connector/V1ReadFallbackSuite.scala  |   4 +-
 .../spark/sql/connector/V1WriteFallbackSuite.scala |   4 +-
 .../sources/StreamingDataSourceV2Suite.scala   |  17 +-
 .../streaming/test/DataStreamTableAPISuite.scala   |   7 +-
 .../sql/streaming/util/BlockOnStopSource.scala |   4 +-
 28 files changed, 433 insertions(+), 68 deletions(-)

diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
index 4a75ab0..640996d 100644
--- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
+++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
@@ -408,8 +408,8 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister
   // ACCEPT_ANY_SCHEMA is needed because of the following reasons:
   // * Kafka writer validates the schema instead of the SQL analyzer (the schema is fixed)
   // * Read schema differs from write schema (please see Kafka integration guide)
-  Set(BATCH_READ, BATCH_WRITE, MICRO_BATCH_READ, CONTINUOUS_READ, STREAMING_WRITE,
-ACCEPT_ANY_SCHEMA).asJava
+  ju.EnumSet.of(BATCH_READ, BATCH_WRITE, MICRO_BATCH_READ, CONTINUOUS_READ, STREAMING_WRITE,
+ACCEPT_ANY_SCHEMA)
 }
 
 override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder =
diff --git a/sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk11-results.txt b/sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk11-results.txt
new file mode 100644
index 000..4c961c1
--- /dev/null
+++ b/sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk11-results.txt
@@ -0,0 +1,104 @@
+OpenJDK 64-Bit Server VM 11+28 on Linux 4.14.0_1-0-0-42
+Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
+Test contains use empty Set:  Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
+
+Use HashSet

[spark] branch master updated (6bd491e -> 3584838)

2021-09-05 Thread srowen

This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


from 6bd491e  [SPARK-36643][SQL] Add more information in ERROR log while 
SparkConf is modified when spark.sql.legacy.setCommandRejectsSparkCoreConfs is 
set
 add 3584838  [SPARK-36602][COER][SQL] Clean up redundant asInstanceOf casts

No new revisions were added by this update.

Summary of changes:
 core/src/main/scala/org/apache/spark/api/r/RBackendHandler.scala| 2 +-
 core/src/main/scala/org/apache/spark/rdd/RDD.scala  | 2 +-
 .../src/main/scala/org/apache/spark/scheduler/InputFormatInfo.scala | 4 ++--
 core/src/main/scala/org/apache/spark/util/Utils.scala   | 2 +-
 core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala | 2 +-
 .../org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala | 2 +-
 core/src/test/scala/org/apache/spark/util/ThreadUtilsSuite.scala| 2 +-
 .../org/apache/spark/streaming/kinesis/KinesisStreamSuite.scala | 5 ++---
 .../src/test/scala/org/apache/spark/ml/linalg/BLASSuite.scala   | 2 +-
 .../src/test/scala/org/apache/spark/ml/linalg/MatricesSuite.scala   | 6 +++---
 .../scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala| 2 +-
 mllib/src/test/scala/org/apache/spark/mllib/linalg/BLASSuite.scala  | 2 +-
 .../test/scala/org/apache/spark/mllib/linalg/MatricesSuite.scala| 6 +++---
 .../pmml/export/BinaryClassificationPMMLModelExportSuite.scala  | 2 +-
 .../apache/spark/mllib/pmml/export/KMeansPMMLModelExportSuite.scala | 2 +-
 .../main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala  | 2 +-
 .../apache/spark/sql/catalyst/expressions/SpecificInternalRow.scala | 2 +-
 .../org/apache/spark/sql/catalyst/expressions/arithmetic.scala  | 2 +-
 .../spark/sql/catalyst/expressions/ExpressionEvalHelper.scala   | 2 +-
 .../sql/catalyst/expressions/xml/ReusableStringReaderSuite.scala| 6 +++---
 .../spark/sql/catalyst/expressions/xml/UDFXPathUtilSuite.scala  | 2 +-
 .../apache/spark/sql/catalyst/util/ArrayDataIndexedSeqSuite.scala   | 4 ++--
 .../apache/spark/sql/catalyst/util/CaseInsensitiveMapSuite.scala| 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala   | 2 +-
 .../spark/sql/execution/datasources/v2/ContinuousScanExec.scala | 2 +-
 .../sql/execution/streaming/continuous/ContinuousExecution.scala| 4 ++--
 sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala| 2 +-
 .../org/apache/spark/sql/execution/joins/HashedRelationSuite.scala  | 2 +-
 .../src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala | 2 +-
 .../apache/spark/sql/streaming/StreamingQueryListenerSuite.scala| 2 +-
 .../spark/sql/streaming/sources/StreamingDataSourceV2Suite.scala| 2 +-
 sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala | 3 +--
 .../scala/org/apache/spark/streaming/BasicOperationsSuite.scala | 4 ++--
 33 files changed, 44 insertions(+), 46 deletions(-)

