Answers inline.

On Wed, Jul 16, 2014 at 5:39 PM, Bill Jay <bill.jaypeter...@gmail.com>
wrote:

> Hi all,
>
> I am currently using Spark Streaming to perform real-time data
> analytics. We receive data from Kafka. We want to generate output files
> that contain results based on the data we receive during a specific
> time interval.
>
> I have several questions about Spark Streaming's timestamps:
>
> 1) If I use saveAsTextFiles, it seems Spark Streaming will generate files
> in complete minutes, such as 5:00:01, 5:00:02 (converted from Unix time),
> etc. Does this mean the results are based on the data from 5:00:01 to
> 5:00:02, 5:00:02 to 5:00:03, etc.? Or do the timestamps just mean the time
> the files are generated?
>
The file named 5:00:01 contains results from data received between 5:00:00
and 5:00:01 (based on the system time of the cluster).
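
To make the mapping concrete, here is a minimal sketch of the arithmetic,
assuming a 1-second batch interval as in the example above. The object
BatchWindow and its method are hypothetical helpers for illustration, not
part of Spark; the timestamp is the millisecond value that saveAsTextFiles
puts in the file name (prefix-TIME_IN_MS[.suffix]).

```scala
object BatchWindow {
  /** Given the batch timestamp from the file name and the batch interval
    * (both in milliseconds), return (startMs, endMs) of the period whose
    * received data the batch contains. Hypothetical helper, not a Spark API.
    */
  def of(batchTimeMs: Long, batchIntervalMs: Long): (Long, Long) = {
    require(batchIntervalMs > 0, "batch interval must be positive")
    // The batch is stamped with the END of its interval, so the
    // start is simply one batch interval earlier.
    (batchTimeMs - batchIntervalMs, batchTimeMs)
  }
}
```

For example, BatchWindow.of(1405555201000L, 1000L) gives the pair
(1405555200000L, 1405555201000L): the file stamped with the later time
covers the one second leading up to it.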



> 2) If I do not use saveAsTextFiles, how do I get the exact time interval
> of the RDD when I use foreachRDD to do custom output of the results?
>
There is a version of foreachRDD that lets you specify a function that
also takes a Time object, which is the time of the batch.
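
A rough sketch of that variant, assuming spark-streaming is on the
classpath. The names BatchTimeOutput, writeWithBatchTime, results and
batchInterval are made up for illustration; pass in the same Duration you
gave to the StreamingContext. Printing a count stands in for whatever
custom output you actually need.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Duration, Time}
import org.apache.spark.streaming.dstream.DStream

object BatchTimeOutput {
  // Hypothetical helper: `results` is your output DStream and
  // `batchInterval` is the batch duration of your StreamingContext.
  def writeWithBatchTime(results: DStream[String], batchInterval: Duration): Unit = {
    // The two-argument form of foreachRDD also passes the batch Time.
    results.foreachRDD { (rdd: RDD[String], batchTime: Time) =>
      // batchTime marks the end of the interval; subtracting the
      // batch duration gives its start.
      val intervalStart: Time = batchTime - batchInterval
      val count = rdd.count()
      println(s"Batch from ${intervalStart.milliseconds} ms to " +
        s"${batchTime.milliseconds} ms: $count records")
    }
  }
}
```

Both values are milliseconds since the Unix epoch, so they can be
formatted into whatever file or directory name you want for the output.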


> 3) How can we specify the starting time of the batches?
>

What do you mean? Batches are timed based on the system time of the
cluster.


>
> Thanks!
>
> Bill
>
