Hi Sean

Many many thanks. This really clears a lot up for me

Andy

From:  Sean Owen <so...@cloudera.com>
Date:  Wednesday, October 1, 2014 at 11:27 AM
To:  Andrew Davidson <a...@santacruzintegration.com>
Cc:  "user@spark.apache.org" <user@spark.apache.org>
Subject:  Re: can I think of JavaDStream<> foreachRDD() as being 'for each
mini batch' ?

> Yes, foreachRDD will do your something for each RDD, which is what you
> get for each mini-batch of input.
> 
> The operations you express on a DStream (or JavaDStream) are all,
> really, "for each RDD", including print(). Logging is a little harder
> to reason about since the logging will happen on a potentially remote
> receiver. I am not sure if this explains your observed behavior; it
> depends on what you were logging.
> 
> On Wed, Oct 1, 2014 at 6:51 PM, Andy Davidson
> <a...@santacruzintegration.com> wrote:
>>  Hi
>> 
>>  I am new to Spark Streaming. Can I think of  JavaDStream<> foreachRDD() as
>>  being 'for each mini batch¹? The java doc does not say much about this
>>  function.
>> 
>>  Here is the background. I am writing a little test program to figure out how
>>  to use streams. At some point I wanted to calculate an aggregate. In my
>>  first attempt my driver did a couple of transformations and tries to get the
>>  count(). I wrote this code like I would a spark core app and added some
>>  logging. My log statements where only executed once, how ever
>>  JavaDStream<>print() appears to run on each mini batch.
>> 
>>  When I put my logging and aggregation code inside foreachRDD() things work
>>  as expected my aggegrate and logging appear to be executed on each
>>  minibactch
>> 
>>  I am running on a 4 machine cluster.  I create a message with the value of
>>  each mini batch aggregate and use logging and System.out. Given the message
>>  shows up on my console is it safe to assume that this output code is
>>  executing in my driver? The ip shows it is running on my Master.
>> 
>>  I thought maybe the message is showing up here because I do not have enough
>>  data in the steam to force load onto the workers?
>> 
>> 
>>  Any in sites would be greatly appreciated.
>> 
>>  Andy
> 


Reply via email to