This is an automated email from the ASF dual-hosted git repository.
srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new bdb73bb [SPARK-36613][SQL][SS] Use EnumSet as the implementation of
Table.capabilities method return value
bdb73bb is described below
commit bdb73bbc277519f0c7ffa3ab856cf87515c12934
Author: yangjie01
AuthorDate: Sun Sep 5 08:23:05 2021 -0500
[SPARK-36613][SQL][SS] Use EnumSet as the implementation of
Table.capabilities method return value
### What changes were proposed in this pull request?
The `Table.capabilities` method returns a `java.util.Set` of the
`TableCapability` enumeration type, which is currently implemented using
`java.util.HashSet`. Such a set can be replaced with `java.util.EnumSet`
because `EnumSet` implementations can be much more efficient than other
sets when the elements are enum constants.
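As a rough illustration of the swap described above, the sketch below builds the same capability set both ways. The `Capability` enum and the `CapabilitySetSketch` class are hypothetical stand-ins introduced only for this example; they are not Spark's `TableCapability` or any class touched by this commit.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

class CapabilitySetSketch {
    // Hypothetical stand-in for TableCapability; only a few constants are mirrored.
    enum Capability { BATCH_READ, BATCH_WRITE, MICRO_BATCH_READ, STREAMING_WRITE }

    // Old style: a general-purpose hash-based set.
    static Set<Capability> viaHashSet() {
        Set<Capability> s = new HashSet<>();
        Collections.addAll(s, Capability.BATCH_READ, Capability.BATCH_WRITE);
        return s;
    }

    // New style: a set specialized for enum constants.
    static Set<Capability> viaEnumSet() {
        return EnumSet.of(Capability.BATCH_READ, Capability.BATCH_WRITE);
    }

    public static void main(String[] args) {
        Set<Capability> a = viaHashSet();
        Set<Capability> b = viaEnumSet();
        // Both are java.util.Set instances with the same elements,
        // so callers of a Set-typed API see no difference in contents.
        System.out.println(a.equals(b));
        System.out.println(b.contains(Capability.STREAMING_WRITE));
    }
}
```

Because both factories return `java.util.Set`, the declared return type of a method such as `capabilities` does not need to change; only the construction site does.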
### Why are the changes needed?
`EnumSet` is a more appropriate data structure than `HashSet` for a set of enum values.
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Pass GA or Jenkins Tests.
- Add a new benchmark to compare the `create` and `contains` operations of
`EnumSet` and `HashSet`
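For readers who want a feel for what a `contains` comparison involves, here is a naive timing sketch. It is not the `EnumTypeSetBenchmark` added by this commit and does not use Spark's benchmark utilities; the `Capability` enum, the class name, and the iteration count are all assumptions made for illustration. It has no warm-up or statistical treatment, so its timings are only indicative.

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

class ContainsTimingSketch {
    // Hypothetical enum used only for this sketch.
    enum Capability { BATCH_READ, BATCH_WRITE, TRUNCATE }

    // Cycles through all enum constants and counts how many lookups succeed.
    static long countHits(Set<Capability> set, int iterations) {
        Capability[] values = Capability.values();
        long hits = 0;
        for (int i = 0; i < iterations; i++) {
            if (set.contains(values[i % values.length])) {
                hits++;
            }
        }
        return hits;
    }

    // Wall-clock time for the lookup loop, in nanoseconds.
    static long timeNanos(Set<Capability> set, int iterations) {
        long start = System.nanoTime();
        long hits = countHits(set, iterations);
        long elapsed = System.nanoTime() - start;
        // Use the result so the loop cannot be discarded as dead code.
        if (hits < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int iterations = 1_000_000; // arbitrary choice for the sketch
        Set<Capability> enumSet = EnumSet.of(Capability.BATCH_READ, Capability.BATCH_WRITE);
        Set<Capability> hashSet = new HashSet<>(enumSet);
        System.out.println("EnumSet ns: " + timeNanos(enumSet, iterations));
        System.out.println("HashSet ns: " + timeNanos(hashSet, iterations));
    }
}
```

A real comparison should rely on a proper harness with warm-up iterations and repeated runs, which is what the benchmark result files in this commit record.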
Closes #33867 from LuciferYang/SPARK-36613.
Authored-by: yangjie01
Signed-off-by: Sean Owen
---
.../spark/sql/kafka010/KafkaSourceProvider.scala | 4 +-
.../EnumTypeSetBenchmark-jdk11-results.txt | 104
.../benchmarks/EnumTypeSetBenchmark-results.txt| 104
.../spark/sql/connector/catalog/V1Table.scala | 3 +-
.../CreateTablePartitioningValidationSuite.scala | 3 +-
.../connector/catalog/EnumTypeSetBenchmark.scala | 176 +
.../sql/connector/catalog/InMemoryTable.scala | 5 +-
.../datasources/noop/NoopDataSource.scala | 6 +-
.../sql/execution/datasources/v2/FileTable.scala | 2 +-
.../execution/datasources/v2/jdbc/JDBCTable.scala | 2 +-
.../spark/sql/execution/streaming/console.scala| 4 +-
.../spark/sql/execution/streaming/memory.scala | 3 +-
.../streaming/sources/ForeachWriterTable.scala | 4 +-
.../streaming/sources/RateStreamProvider.scala | 4 +-
.../sources/TextSocketSourceProvider.scala | 3 +-
.../sql/execution/streaming/sources/memory.scala | 3 +-
.../spark/sql/connector/JavaSimpleBatchTable.java | 5 +-
.../connector/JavaSimpleWritableDataSource.java| 6 +-
.../spark/sql/connector/DataSourceV2Suite.scala| 4 +-
.../connector/FileDataSourceV2FallBackSuite.scala | 5 +-
.../spark/sql/connector/LocalScanSuite.scala | 5 +-
.../sql/connector/SimpleWritableDataSource.scala | 2 +-
.../sql/connector/TableCapabilityCheckSuite.scala | 8 +-
.../spark/sql/connector/V1ReadFallbackSuite.scala | 4 +-
.../spark/sql/connector/V1WriteFallbackSuite.scala | 4 +-
.../sources/StreamingDataSourceV2Suite.scala | 17 +-
.../streaming/test/DataStreamTableAPISuite.scala | 7 +-
.../sql/streaming/util/BlockOnStopSource.scala | 4 +-
28 files changed, 433 insertions(+), 68 deletions(-)
diff --git a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
index 4a75ab0..640996d 100644
--- a/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
+++ b/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceProvider.scala
@@ -408,8 +408,8 @@ private[kafka010] class KafkaSourceProvider extends DataSourceRegister
// ACCEPT_ANY_SCHEMA is needed because of the following reasons:
// * Kafka writer validates the schema instead of the SQL analyzer (the schema is fixed)
// * Read schema differs from write schema (please see Kafka integration guide)
- Set(BATCH_READ, BATCH_WRITE, MICRO_BATCH_READ, CONTINUOUS_READ, STREAMING_WRITE,
-ACCEPT_ANY_SCHEMA).asJava
+ ju.EnumSet.of(BATCH_READ, BATCH_WRITE, MICRO_BATCH_READ, CONTINUOUS_READ, STREAMING_WRITE,
+ACCEPT_ANY_SCHEMA)
}
override def newScanBuilder(options: CaseInsensitiveStringMap): ScanBuilder =
diff --git a/sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk11-results.txt b/sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk11-results.txt
new file mode 100644
index 000..4c961c1
--- /dev/null
+++ b/sql/catalyst/benchmarks/EnumTypeSetBenchmark-jdk11-results.txt
@@ -0,0 +1,104 @@
+OpenJDK 64-Bit Server VM 11+28 on Linux 4.14.0_1-0-0-42
+Intel(R) Xeon(R) Gold 6271C CPU @ 2.60GHz
+Test contains use empty Set: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
+
+Use HashSet