[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-15 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21057#discussion_r181632575
  
--- Diff: python/pyspark/streaming/listener.py ---
@@ -22,6 +22,10 @@ class StreamingListener(object):
 
 def __init__(self):
 pass
+
+def onStreamingStarted(self, streamingStarted):
--- End diff --

I first thought it's doc only change but I realised it's actually not after 
taking a close look. 

Implementing `onStreamingStarted` looks actually required:

> Add missing method to StreamingListener, which is invoked by proxy from 
the Java/Scala side, and throws a strange exception when not found.

This wasn't there at the first - 
https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/python/pyspark/streaming/listener.py
 / 
https://github.com/apache/spark/blob/ace0db47141ffd457c2091751038fc291f6d5a8b/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingListener.scala
 ; however, this method was added from 
https://github.com/apache/spark/commit/ce99f51d2e8cbcb1565c9e7a729183bd74a0c2bc 
but seems Python's implementation is missed.

Strictly it sounds better to have an explicit test since @aviv-ebates has a 
reproducer (assuming from the description) and should be easy to add.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-15 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21057#discussion_r181630426
  
--- Diff: python/pyspark/streaming/listener.py ---
@@ -22,6 +22,10 @@ class StreamingListener(object):
 
 def __init__(self):
 pass
+
+def onStreamingStarted(self, streamingStarted):
--- End diff --

this isnt doc only change then, I think?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21057#discussion_r181568263
  
--- Diff: python/pyspark/streaming/listener.py ---
@@ -22,6 +22,10 @@ class StreamingListener(object):
 
 def __init__(self):
 pass
+
+def onStreamingStarted(self, streamingStarted):
--- End diff --

Can you add a test to `pyspark.streaming.tests.StreamingListenerTests`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-14 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21057#discussion_r181553378
  
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, 
fromOffsets=None,
 :param topics:  list of topic_name to consume.
 :param kafkaParams: Additional params for Kafka.
 :param fromOffsets: Per-topic/partition Kafka offsets defining the 
(inclusive) starting
-point of the stream.
+point of the stream (Dict with keys of type 
TopicAndPartition and int values).
--- End diff --

Shall we fix as so?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-13 Thread aviv-ebates
Github user aviv-ebates commented on a diff in the pull request:

https://github.com/apache/spark/pull/21057#discussion_r181460539
  
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, 
fromOffsets=None,
 :param topics:  list of topic_name to consume.
 :param kafkaParams: Additional params for Kafka.
 :param fromOffsets: Per-topic/partition Kafka offsets defining the 
(inclusive) starting
-point of the stream.
+point of the stream (Dict with keys of type 
TopicAndPartition and int values).
--- End diff --

Yeah, that works too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-13 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21057#discussion_r181299329
  
--- Diff: python/pyspark/streaming/kafka.py ---
@@ -104,7 +104,7 @@ def createDirectStream(ssc, topics, kafkaParams, 
fromOffsets=None,
 :param topics:  list of topic_name to consume.
 :param kafkaParams: Additional params for Kafka.
 :param fromOffsets: Per-topic/partition Kafka offsets defining the 
(inclusive) starting
-point of the stream.
+point of the stream (Dict with keys of type 
TopicAndPartition and int values).
--- End diff --

I would say sth like ``a dictionary containing `TopicAndPartition` to 
integers.``.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21057: 2 Improvements to Pyspark docs

2018-04-12 Thread aviv-ebates
GitHub user aviv-ebates opened a pull request:

https://github.com/apache/spark/pull/21057

2 Improvements to Pyspark docs

Each of these 2 items has cost me a few hours of debugging. Hopefully, this 
will stop others from having to debug the same thing.

1. Describe the expected type of `fromOffsets` param.
2. Add missing method to `StreamingListener`, which is invoked by proxy 
from the Java/Scala side, and throws a strange exception when not found.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/aviv-ebates/spark improve-docs

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21057.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21057


commit 4b67ecd0a5e835c35083fe9a1b069f240e8062e1
Author: Aviv <37817804+aviv-ebates@...>
Date:   2018-04-10T17:53:42Z

improve doc1

commit e040a68151c308a1e46cda6723fed2b7023a7331
Author: Aviv <37817804+aviv-ebates@...>
Date:   2018-04-10T17:55:02Z

missing, undocumented, method.

Since this method is missing here, trying to use a Listener throws an 
expection. Since it's not documented, it's hard to handle.




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org