Thank you. I verified that the map function is not called at all, so it is not a closure variable scope issue.
It looks like, indeed, I can't seem to make PySpark streaming work in the current master.

On Tue, Dec 15, 2015 at 9:44 AM, Felix Cheung <felixcheun...@hotmail.com> wrote:
> Since pushCount() is called from map(), it would need to ship to the executor,
> like you have speculated. It is likely that doesn't work with z.
>
> Perhaps you can use an accumulator and set z at the driver (which is the
> Zeppelin interpreter):
>
> http://spark.apache.org/docs/latest/programming-guide.html#accumulators-a-nameaccumlinka
> (Note: Scala only, it seems.)
>
> _____________________________
> From: Dmitriy Lyubimov <dlie...@gmail.com>
> Sent: Monday, December 14, 2015 5:02 PM
> Subject: Pyspark streaming in Zeppelin.
> To: <users@zeppelin.incubator.apache.org>
>
> Hi,
>
> I am trying to build a simple notebook to confirm dynamic streaming
> updates to an angular widget via z.angularBind.
>
> Note that I use %pyspark streaming. Here is my simplified variation of the
> pyspark streaming tutorial, which I confirmed works in the pyspark shell:
>
> Note1:
>
> %pyspark
>
> from pyspark import SparkContext
> from pyspark.streaming import StreamingContext
>
> def pushCount(c):
>     global z
>     z.put("c", c)
>     return c
>
> ssc = StreamingContext(sc, 5)
> lines = ssc.socketTextStream("localhost", 9999)
>
> lines.count().map(pushCount).pprint()
> ssc.start()
>
> In Note2, which is Scala/Spark, we communicate the count value to the
> angular widget; this is also adapted from a working example in Zeppelin PR 27:
>
> Note2:
>
> val timer = new java.util.Timer()
>
> // start monitoring, once a second
> val task = new java.util.TimerTask {
>   def run() {
>     val c = z.get("c")
>     z.angularBind("c", Array("TestCount" -> c))
>   }
> }
>
> timer.schedule(task, 1000L, 1000L)
>
> Finally, there is a note attaching an angular progress bar to the
> variable "c" (counts).
>
> I confirmed that if I execute z.put("c", 5) in a %pyspark note, the bar
> updates (via the timer task polling).
> But the update from the stream doesn't seem to be working at all; it is
> dead on arrival. I can't get any values from it no matter what I do.
>
> (1) Is there pyspark streaming support in Zeppelin at all? Was it ever
> verified to work? In fact, I haven't found a single example anywhere of
> Zeppelin + pyspark streaming.
>
> (2) Or is the assumption that z.put() works from inside DStream.map
> wrong? That could be true if this closure is shipped via spark-submit
> somewhere.
>
> (3) Or am I doing something else wrong?
>
> Master is local[*]. I am running the master Zeppelin branch (latest
> snapshot).
>
> Thank you very much in advance,
> -d
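Felix's point about map() shipping its closure can be illustrated without Spark at all. Below is a minimal pure-Python sketch of the failure mode in question (2): the `Z` class is a hypothetical stand-in for Zeppelin's `z` object (not the real API), and `copy.deepcopy` plays the role of the serialize/ship/deserialize step Spark performs when sending a closure to an executor.

```python
import copy

class Z:
    """Hypothetical stand-in for Zeppelin's z context (not the real API)."""
    def __init__(self):
        self.store = {}
    def put(self, k, v):
        self.store[k] = v
    def get(self, k):
        return self.store.get(k)

def make_task(z):
    # The closure captures z, just as pushCount captures Zeppelin's z.
    def push_count(c):
        z.put("c", c)
        return c
    return push_count

z_driver = Z()

# Simulate Spark shipping the closure to an executor: the captured state
# is serialized and deserialized, so the executor ends up mutating a
# *copy* of z, not the driver's instance.
z_on_executor = copy.deepcopy(z_driver)
task = make_task(z_on_executor)
task(42)

assert z_on_executor.get("c") == 42   # the executor-side copy was updated
assert z_driver.get("c") is None      # the driver's z never sees the value
```

If this is what is happening, z.put() inside DStream.map would only ever mutate executor-local copies; moving the z.put() into a driver-side hook such as DStream.foreachRDD (whose function runs on the driver), or communicating the count back via an accumulator read at the driver as Felix suggests, would avoid the problem.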