Hi Sean

I guess I am missing something.

JavaDStream<String> foo = Š
JavaDStream<Long> c = foo.count()

This is circular. I need to get the count as an actual scalar value not a
JavaDStream. Some one else posted psudo code that used foreachRDD() . This
seems to work for me.

Thanks

Andy


From:  Sean Owen <so...@cloudera.com>
Date:  Wednesday, October 1, 2014 at 2:32 AM
To:  Andrew Davidson <a...@santacruzintegration.com>
Cc:  "user@spark.apache.org" <user@spark.apache.org>
Subject:  Re: how to get actual count from as long from JavaDStream ?

> It's much easier than all this. Spark Streaming gives you a DStream of
> RDDs. You want the count for each RDD. DStream.count() gives you
> exactly that: a DStream of Longs which are the counts of events in
> each mini batch.
> 
> On Tue, Sep 30, 2014 at 8:42 PM, Andy Davidson
> <a...@santacruzintegration.com> wrote:
>>  Hi
>> 
>>  I have a simple streaming app. All I want to do is figure out how many lines
>>  I have received in the current mini batch. If numLines was a JavaRDD I could
>>  simply call count(). How do you do something similar in Streaming?
>> 
>> 
>>  Here is my psudo code
>> 
>> 
>> 
>>  JavaDStream<String> msg = logs.filter(selectINFO);
>> 
>>  JavaDStream<Long> numLines  = msg.count()
>> 
>> 
>>  Long totalCount = numLines ???
>> 
>> 
>> 
>>  Here is what I am really trying to do. I have a python script that generated
>>  a graph of totalCount vs time. Python does not support streaming. As a work
>>  around I have a java program that does the steaming. I want to pass the data
>>  back to the python script. It has been suggested I can use rdd.pipe().
>> 
>> 
>>  In python I call rdd.pipe(scriptToStartJavaSteam.sh)
>> 
>> 
>>  All I need to do is for each mini batch figure out how to get the the count
>>  of the current mini batch and write it to standard out. Seems like this
>>  should be simple.
>> 
>> 
>>  Maybe Streams do not work the way I think? In a spark core app, I am able to
>>  get values like count in my driver and do what ever I want with the local
>>  value. With streams I know I am getting mini patches because print() display
>>  the first 10 lines of my steam. I assume that some how print is executed in
>>  my driver so somehow  data was sent from the workers back to the driver.
>> 
>> 
>>  Any comments or suggestions would be greatly appreciated.
>> 
>> 
>>  Andy
>> 
>> 
>>  P.s. Should I be asking a different question?
>> 
>> 
>> 
>> 
>> 
> 


Reply via email to