[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21057#discussion_r181632575 --- Diff: python/pyspark/streaming/listener.py --- @@ -22,6 +22,10 @@ class StreamingListener(object): def __init__(self): pass + +def onStreamingStarted(self, streamingStarted): --- End diff -- I first thought it's doc only change but I realised it's actually not after taking a close look. Implementing `onStreamingStarted` looks actually required: > Add missing method to StreamingListener, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found. This wasn't there at the first - https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/python/pyspark/streaming/listener.py / https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingListener.scala ; however, this method was added from https://github.com/apache/spark/commit/ce99f51d2e8cbcb1565c9e7a729183bd74a0c2bc but seems Python's implementation is missed. Strictly it sounds better to have an explicit test since @aviv-ebates has a reproducer (assuming from the description) and should be easy to add. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/21057#discussion_r181630426 --- Diff: python/pyspark/streaming/listener.py --- @@ -22,6 +22,10 @@ class StreamingListener(object): def __init__(self): pass + +def onStreamingStarted(self, streamingStarted): --- End diff -- this isnt doc only change then, I think? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21057#discussion_r181568263 --- Diff: python/pyspark/streaming/listener.py --- @@ -22,6 +22,10 @@ class StreamingListener(object): def __init__(self): pass + +def onStreamingStarted(self, streamingStarted): --- End diff -- Can you add a test to `pyspark.streaming.tests.StreamingListenerTests`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21057#discussion_r181553378 --- Diff: python/pyspark/streaming/kafka.py --- @@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None, :param topics: list of topic_name to consume. :param kafkaParams: Additional params for Kafka. :param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting -point of the stream. +point of the stream (Dict with keys of type TopicAndPartition and int values). --- End diff -- Shall we fix as so? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Github user aviv-ebates commented on a diff in the pull request: https://github.com/apache/spark/pull/21057#discussion_r181460539 --- Diff: python/pyspark/streaming/kafka.py --- @@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None, :param topics: list of topic_name to consume. :param kafkaParams: Additional params for Kafka. :param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting -point of the stream. +point of the stream (Dict with keys of type TopicAndPartition and int values). --- End diff -- Yeah, that works too. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21057#discussion_r181299329 --- Diff: python/pyspark/streaming/kafka.py --- @@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, fromOffsets=None, :param topics: list of topic_name to consume. :param kafkaParams: Additional params for Kafka. :param fromOffsets: Per-topic/partition Kafka offsets defining the (inclusive) starting -point of the stream. +point of the stream (Dict with keys of type TopicAndPartition and int values). --- End diff -- I would say sth like ``a dictionary containing `TopicAndPartition` to integers.``. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs
GitHub user aviv-ebates opened a pull request: https://github.com/apache/spark/pull/21057 2 Improvements to Pyspark docs Each of these 2 items has cost me a few hours of debugging. Hopefully, this will stop others from having to debug the same thing. 1. Describe the expected type of `fromOffsets` param. 2. Add missing method to `StreamingListener`, which is invoked by proxy from the Java/Scala side, and throws a strange exception when not found. You can merge this pull request into a Git repository by running: $ git pull https://github.com/aviv-ebates/spark improve-docs Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21057.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #21057 commit 4b67ecd0a5e835c35083fe9a1b069f240e8062e1 Author: Aviv <37817804+aviv-ebates@...> Date: 2018-04-10T17:53:42Z improve doc1 commit e040a68151c308a1e46cda6723fed2b7023a7331 Author: Aviv <37817804+aviv-ebates@...> Date: 2018-04-10T17:55:02Z missing, undocumented, method. Since this method is missing here, trying to use a Listener throws an expection. Since it's not documented, it's hard to handle. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org