anishshri-db opened a new pull request, #40600: URL: https://github.com/apache/spark/pull/40600
### What changes were proposed in this pull request?

Add an option to skip the commit coordinator as part of the StreamingWrite API for DSv2 sources/sinks. This option was already present as part of the BatchWrite API.

### Why are the changes needed?

The following sinks are at-least-once, so they do not need to go through the commit coordinator on the driver to ensure that only one attempt per partition commits. The coordinator is even less useful for streaming use cases, where batches can be replayed from the checkpoint directory.

- memory sink
- console sink
- no-op sink
- Kafka v2 sink

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Added a unit test for the change.

```
[info] ReportSinkMetricsSuite:
22:23:01.276 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22:23:03.139 WARN org.apache.spark.sql.execution.streaming.ResolveWriteToStream: spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets and will be disabled.
[info] - test ReportSinkMetrics with useCommitCoordinator=true (2 seconds, 709 milliseconds)
22:23:04.522 WARN org.apache.spark.sql.execution.streaming.ResolveWriteToStream: spark.sql.adaptive.enabled is not supported in streaming DataFrames/Datasets and will be disabled.
[info] - test ReportSinkMetrics with useCommitCoordinator=false (373 milliseconds)
22:23:04.941 WARN org.apache.spark.sql.streaming.ReportSinkMetricsSuite:

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.streaming.ReportSinkMetricsSuite, threads: ForkJoinPool.commonPool-worker-19 (daemon=true), rpc-boss-3-1 (daemon=true), shuffle-boss-6-1 (daemon=true) =====

[info] Run completed in 4 seconds, 934 milliseconds.
[info] Total number of tests run: 2
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 2, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
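
For readers skimming this notification, here is a minimal, self-contained sketch of the idea, not the code in the patch. Only the flag name `useCommitCoordinator` comes from the test names above. Every type in the block (`StreamingWriteSketch`, `AtLeastOnceWrite`, `CoordinatedWrite`, `Coordinator`, `mayCommit`) is a hypothetical stand-in invented for illustration and is not a Spark class. It assumes the flag defaults to true and that an at-least-once sink opts out by returning false.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch only: all types here are hypothetical stand-ins, not Spark's classes.
final class CommitCoordinatorSketch {

    /** Hypothetical streaming write; by default it goes through the coordinator. */
    interface StreamingWriteSketch {
        default boolean useCommitCoordinator() {
            return true;
        }
    }

    /** An at-least-once sink (for example a no-op sink) that opts out. */
    static final class AtLeastOnceWrite implements StreamingWriteSketch {
        @Override
        public boolean useCommitCoordinator() {
            return false;
        }
    }

    /** A sink that keeps the default and stays coordinated. */
    static final class CoordinatedWrite implements StreamingWriteSketch {
    }

    /** Hypothetical driver-side coordinator: the first attempt per partition wins. */
    static final class Coordinator {
        private final Set<Integer> committed = new HashSet<>();

        synchronized boolean canCommit(int partition) {
            // Set.add returns false if the partition was already committed.
            return committed.add(partition);
        }
    }

    /** Decides whether a task attempt for the given partition may commit. */
    static boolean mayCommit(StreamingWriteSketch write, Coordinator coordinator, int partition) {
        if (!write.useCommitCoordinator()) {
            return true; // Skip the coordinator: duplicate commits are tolerated.
        }
        return coordinator.canCommit(partition);
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        StreamingWriteSketch coordinated = new CoordinatedWrite();
        StreamingWriteSketch atLeastOnce = new AtLeastOnceWrite();

        System.out.println(mayCommit(coordinated, coordinator, 0)); // true: first attempt
        System.out.println(mayCommit(coordinated, coordinator, 0)); // false: second attempt denied
        System.out.println(mayCommit(atLeastOnce, coordinator, 1)); // true
        System.out.println(mayCommit(atLeastOnce, coordinator, 1)); // true: coordinator skipped
    }
}
```

In this sketch the opt-out path never contacts the coordinator, which is the saving the description points at for sinks that already tolerate replayed batches.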
