[GitHub] spark pull request #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 ...

2018-10-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22703#discussion_r224936431
  
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -3,7 +3,11 @@ layout: global
 title: Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)
 ---
 
-The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 [Direct Stream approach](streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers). It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the [new Kafka consumer API](http://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change.
+The Spark Streaming integration for Kafka 0.10 provides simple parallelism, 1:1 correspondence between Kafka
+partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses
+the [new Kafka consumer API](https://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API,
+there are notable differences in usage. This version of the integration is marked as experimental, so the API is
--- End diff --

Yeah, good general point. Is the Kafka 0.10 integration at all experimental anymore? Is anything that survives from 2.x to 3.x still experimental? I'd say "no" in almost all cases. What are your views on that?
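
For context on the doc paragraph quoted in the diff above: it describes the 0-10 direct stream API. A minimal Scala sketch of that API is below, illustrating the 1:1 Kafka-to-Spark partition mapping and the offset/metadata access the paragraph mentions; the broker address, group id, and topic name are placeholders, not values from this PR.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, HasOffsetRanges, KafkaUtils, LocationStrategies}

object DirectStream010Sketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-0-10-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Placeholder Kafka settings; adjust for a real cluster.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // One Spark partition per Kafka partition ("simple parallelism, 1:1 correspondence").
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("exampleTopic"), kafkaParams)
    )

    stream.foreachRDD { rdd =>
      // "Access to offsets and metadata": each batch RDD carries its Kafka offset ranges.
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach(r => println(s"${r.topic}-${r.partition}: ${r.fromOffset} -> ${r.untilOffset}"))
      println(s"batch record count: ${rdd.count()}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each batch RDD produced by the direct stream implements HasOffsetRanges, which is what gives callers the per-partition offset metadata the doc text refers to.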


---




[GitHub] spark pull request #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 ...

2018-10-12 Thread koeninger
Github user koeninger commented on a diff in the pull request:

https://github.com/apache/spark/pull/22703#discussion_r224899199
  
--- Diff: docs/streaming-kafka-0-10-integration.md ---
@@ -3,7 +3,11 @@ layout: global
 title: Spark Streaming + Kafka Integration Guide (Kafka broker version 0.10.0 or higher)
 ---
 
-The Spark Streaming integration for Kafka 0.10 is similar in design to the 0.8 [Direct Stream approach](streaming-kafka-0-8-integration.html#approach-2-direct-approach-no-receivers). It provides simple parallelism, 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses the [new Kafka consumer API](http://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API, there are notable differences in usage. This version of the integration is marked as experimental, so the API is potentially subject to change.
+The Spark Streaming integration for Kafka 0.10 provides simple parallelism, 1:1 correspondence between Kafka
+partitions and Spark partitions, and access to offsets and metadata. However, because the newer integration uses
+the [new Kafka consumer API](https://kafka.apache.org/documentation.html#newconsumerapi) instead of the simple API,
+there are notable differences in usage. This version of the integration is marked as experimental, so the API is
--- End diff --

Do we want to leave the new integration marked as experimental if it is now 
the only available one?


---




[GitHub] spark pull request #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 ...

2018-10-12 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22703#discussion_r224870517
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -1047,259 +1046,6 @@ def check_output(n):
 self.ssc.stop(True, True)
 
 
-class KafkaStreamTests(PySparkStreamingTestCase):
--- End diff --

OK, you or @holdenk or @koeninger might want to skim this change to make sure I didn't inadvertently delete PySpark + Structured Streaming + Kafka support. I don't think I did, but it's not really my area.


---




[GitHub] spark pull request #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 ...

2018-10-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22703#discussion_r224680978
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -1047,259 +1046,6 @@ def check_output(n):
 self.ssc.stop(True, True)
 
 
-class KafkaStreamTests(PySparkStreamingTestCase):
--- End diff --

Yup. Kafka 0.10 support was not added to PySpark, per SPARK-16534.
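
So with the 0.8 module gone and no 0.10 DStream binding in PySpark, the Kafka path that remains for DataFrame users is the Structured Streaming source. Below is a minimal Scala sketch of that source (the PySpark reader chains the same `format("kafka")` and options); the broker and topic names are placeholders.

```scala
import org.apache.spark.sql.SparkSession

object StructuredKafkaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("structured-kafka-sketch")
      .master("local[2]")
      .getOrCreate()

    // Structured Streaming Kafka source; requires the spark-sql-kafka-0-10 package.
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "exampleTopic")                 // placeholder topic
      .load()

    // Kafka metadata (topic, partition, offset, timestamp) comes back as columns
    // alongside the key and value bytes.
    val values = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "partition", "offset")

    val query = values.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```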


---




[GitHub] spark pull request #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 ...

2018-10-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22703#discussion_r224621015
  
--- Diff: python/pyspark/streaming/tests.py ---
@@ -1047,259 +1046,6 @@ def check_output(n):
 self.ssc.stop(True, True)
 
 
-class KafkaStreamTests(PySparkStreamingTestCase):
--- End diff --

Am I correct that all of this PySpark Kafka integration is 0.8, not 0.10, and that Structured Streaming is now the only option for PySpark + Kafka?


---




[GitHub] spark pull request #22703: [SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 ...

2018-10-11 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/22703

[SPARK-25705][BUILD][STREAMING] Remove Kafka 0.8 integration

## What changes were proposed in this pull request?

Remove Kafka 0.8 integration

## How was this patch tested?

Existing tests, build scripts

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-25705

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22703.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22703


commit 4f0bab810a0a29c644f59b710c2348ae5e30598e
Author: Sean Owen 
Date:   2018-10-11T22:13:31Z

Remove Kafka 0.8 integration
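
For reference, the user-facing surface removed by this PR is the spark-streaming-kafka-0-8 DStream API. A rough Scala sketch of the old call shape follows (broker and topic names are placeholders); callers would migrate to the 0-10 module or to Structured Streaming, as sketched earlier in this thread.

```scala
// Old 0.8 direct stream (module spark-streaming-kafka-0-8, removed by this PR).
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object OldKafka08Sketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("kafka-0-8-sketch").setMaster("local[2]"), Seconds(5))

    // 0.8-style simple-consumer parameters (note metadata.broker.list, not bootstrap.servers).
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("exampleTopic"))

    stream.map(_._2).print() // records arrive as (key, value) pairs
    ssc.start()
    ssc.awaitTermination()
  }
}
```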




---
