[GitHub] spark pull request #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 ...
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22703#discussion_r224936431

--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -3,7 +3,11 @@ layout: global
title: Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)
---
-The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 [Direct Stream approach](streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers). It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the [new Kafka consumer API](http://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change.
+The Spark Streaming integration for Kafka 0.10 provides simple parallelism, 1:1 correspondence between Kafka
+partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses
+the [new Kafka consumer API](https://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API,
+there are notable differences in usage. This version of the integration is marked as experimental, so the API is
--- End diff --

Yeah, good general point. Is the Kafka 0.10 integration at all experimental anymore? Is anything experimental that survives from 2.x to 3.x? I'd say "no" in almost all cases. What are your personal views on that?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
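For context, the doc passage quoted above describes the Kafka 0.10 direct-stream API. A minimal sketch of what that API looks like, assuming Spark with the spark-streaming-kafka-0-10 artifact on the classpath (the broker address, topic name, and group id below are made-up placeholders):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object Kafka010Sketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical broker and consumer-group settings; adjust for your cluster.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val conf = new SparkConf().setAppName("Kafka010Sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // The "1:1 correspondence" from the doc passage: one Spark partition
    // is created per Kafka partition of the subscribed topics.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Array("example-topic"), kafkaParams))

    // The "access to offsets and metadata": each batch RDD carries its
    // Kafka offset ranges, recoverable via the HasOffsetRanges cast.
    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      offsetRanges.foreach { o =>
        println(s"${o.topic} ${o.partition} ${o.fromOffset} ${o.untilOffset}")
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This requires a running Kafka broker and the Spark runtime, so it is a sketch of the API shape rather than something runnable in isolation.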
Github user koeninger commented on a diff in the pull request:
https://github.com/apache/spark/pull/22703#discussion_r224899199

--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -3,7 +3,11 @@ layout: global
title: Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)
---
-The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 [Direct Stream approach](streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers). It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the [new Kafka consumer API](http://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change.
+The Spark Streaming integration for Kafka 0.10 provides simple parallelism, 1:1 correspondence between Kafka
+partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses
+the [new Kafka consumer API](https://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API,
+there are notable differences in usage. This version of the integration is marked as experimental, so the API is
--- End diff --

Do we want to leave the new integration marked as experimental if it is now the only available one?
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22703#discussion_r224870517

--- Diff: python/pyspark/streaming/tests.py ---
@@ -1047,259 +1046,6 @@ def check_output(n):
         self.ssc.stop(True, True)
-class KafkaStreamTests(PySparkStreamingTestCase):
--- End diff --

OK, you or @holdenk or @koeninger might want to skim this change to make sure I didn't delete PySpark + Structured Streaming + Kafka support inadvertently. I don't think so, but it's not my area so much.
Github user HyukjinKwon commented on a diff in the pull request:
https://github.com/apache/spark/pull/22703#discussion_r224680978

--- Diff: python/pyspark/streaming/tests.py ---
@@ -1047,259 +1046,6 @@ def check_output(n):
         self.ssc.stop(True, True)
-class KafkaStreamTests(PySparkStreamingTestCase):
--- End diff --

Yup. Kafka 0.10 support in PySpark was not added, per SPARK-16534.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22703#discussion_r224621015

--- Diff: python/pyspark/streaming/tests.py ---
@@ -1047,259 +1046,6 @@ def check_output(n):
         self.ssc.stop(True, True)
-class KafkaStreamTests(PySparkStreamingTestCase):
--- End diff --

Am I correct that all of this PySpark Kafka integration is 0.8, not 0.10, and that Structured Streaming is the only option now for PySpark + Kafka?
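For context on the question above: the Structured Streaming path reads Kafka through the DataFrame API rather than DStreams. A minimal sketch, assuming pyspark plus the spark-sql-kafka-0-10 package and a reachable broker (the broker address and topic name are made-up placeholders):

```python
# Sketch of PySpark + Kafka via Structured Streaming; requires a Spark
# runtime with the spark-sql-kafka-0-10 package and a running broker.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("StructuredKafkaSketch")
         .getOrCreate())

# The "kafka" source yields binary key/value columns plus metadata
# columns (topic, partition, offset, timestamp).
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")  # hypothetical broker
      .option("subscribe", "example-topic")                 # hypothetical topic
      .load())

# Cast the binary payloads to strings for downstream processing.
values = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

# Print each micro-batch to the console; blocks until stopped.
query = (values.writeStream
         .format("console")
         .outputMode("append")
         .start())
query.awaitTermination()
```

Since this depends on an external Kafka cluster and the Spark runtime, it illustrates the API shape rather than a standalone runnable program.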
GitHub user srowen opened a pull request:
https://github.com/apache/spark/pull/22703
[SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integration

## What changes were proposed in this pull request?

Remove Kafka 0.8 integration.

## How was this patch tested?

Existing tests, build scripts.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-25705

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22703.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22703

commit 4f0bab810a0a29c644f59b710c2348ae5e30598e
Author: Sean Owen
Date: 2018-10-11T22:13:31Z

    Remove Kafka 0.8 integration