Thank you.

I verified that the map function is not called at all, so it is not a closure
variable scope issue.
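
For what it's worth, the copy-on-serialization behavior discussed in this thread can be demonstrated without Spark at all. The sketch below (my own illustration, not from the thread) uses a pickle round-trip as a stand-in for Spark shipping a closure's captured state to an executor; any mutation lands on the deserialized copy, never on the driver-side original:

```python
import pickle

# Driver-side state (in Zeppelin this would be the interpreter context `z`).
driver_state = {"c": 0}

# Spark serializes a closure and everything it captures, then deserializes it
# inside the executor process. A pickle round-trip simulates that copy.
executor_copy = pickle.loads(pickle.dumps(driver_state))

# The "executor" updates its own deserialized copy...
executor_copy["c"] = 42

# ...and the driver-side original never sees the change.
print(driver_state["c"])  # -> 0
print(executor_copy["c"])  # -> 42
```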

It looks like, indeed, I can't make pyspark streaming work in the current
master.

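If the root cause is that z.put() cannot run executor-side, one pattern that may be worth trying (untested here) is to keep z.put() on the driver by using DStream.foreachRDD(), whose function runs in the driver process for each batch rather than being shipped to the executors. `FakeZ` and `FakeRDD` below are hypothetical stand-ins so the pattern can be exercised without a cluster; in an actual %pyspark note the wiring would be `lines.foreachRDD(push_count)`:

```python
# Sketch: move z.put() out of map() and into a foreachRDD callback, which
# Spark invokes on the driver, where the Zeppelin context `z` actually lives.
# FakeZ and FakeRDD are hypothetical test doubles, not Zeppelin/Spark classes.

class FakeZ:
    """Stand-in for the Zeppelin context's put()/get() resource pool."""
    def __init__(self):
        self.store = {}
    def put(self, key, value):
        self.store[key] = value

class FakeRDD:
    """Stand-in for an RDD whose count() returns to the driver."""
    def __init__(self, n):
        self.n = n
    def count(self):
        return self.n

z = FakeZ()

def push_count(time, rdd):
    # Runs on the driver for each batch; rdd.count() triggers the job and
    # brings the resulting integer back to the driver process.
    z.put("c", rdd.count())

# Real note:  lines.foreachRDD(push_count)
push_count(None, FakeRDD(7))
print(z.store["c"])  # -> 7
```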

On Tue, Dec 15, 2015 at 9:44 AM, Felix Cheung <felixcheun...@hotmail.com>
wrote:

> Since pushCount() is called from map(), it would need to be shipped to an
> executor, as you speculated. It is likely that this doesn't work with z.
>
> Perhaps you can use an accumulator and set z at the driver (which is the
> Zeppelin interpreter):
>
>
> http://spark.apache.org/docs/latest/programming-guide.html#accumulators-a-nameaccumlinka
> (Note: Scala only, it seems.)
>
>
> _____________________________
> From: Dmitriy Lyubimov <dlie...@gmail.com>
> Sent: Monday, December 14, 2015 5:02 PM
> Subject: Pyspark streaming in Zeppelin.
> To: <users@zeppelin.incubator.apache.org>
>
>
>
> Hi,
>
> I am trying to build a simple notebook to confirm dynamic streaming
> updates to an Angular widget via z.angularBind.
>
> Note that I use %pyspark streaming. Here is my simplified variation of the
> pyspark streaming tutorial, which I confirmed works in the pyspark shell:
>
> Note1:
> %pyspark
>
> from pyspark import SparkContext
> from pyspark.streaming import StreamingContext
>
>
> def pushCount(c):
>     global z
>     z.put("c",c)
>     return c
>
> ssc = StreamingContext(sc, 5)
> lines = ssc.socketTextStream("localhost",9999)
>
> lines.count().map(pushCount).pprint()
> ssc.start()
>
> In Note2, which is Scala/Spark, we communicate the count value to the
> Angular side; this is also taken from a working example in Zeppelin PR 27:
>
> Note2:
>
> val timer = new java.util.Timer()
>
> // start monitoring. once a second.
> val task = new java.util.TimerTask {
>   def run() {
>       val c = z.get("c")
>       z.angularBind("c", Array("TestCount" -> c))
>   }
> }
>
> timer.schedule(task, 1000L, 1000L)
>
> Finally, there is a note with an Angular progress bar bound to the
> variable "c" (counts).
>
> I confirmed that if I execute z.put("c", 5) in a %pyspark note, the bar
> updates (via timer-task polling). But updates from the stream don't seem
> to work at all; it is dead on arrival. I can't get any values from it no
> matter what I do.
>
> (1) Is there pyspark streaming support in Zeppelin at all? Was it ever
> verified to work? In fact, I haven't found a single example anywhere of
> Zeppelin + pyspark streaming.
>
> (2) Or is the assumption that z.put() works from inside DStream.map
> wrong? That could be true if this closure is shipped off somewhere via
> spark-submit.
>
> (3) Or am I doing something else wrong?
>
> Master is local[*]. I am running the master Zeppelin branch (latest
> snapshot).
>
> thank you very much in advance,
> -d
>
>
>
