Get Spark Streaming timestamp

2014-07-23 Thread Bill Jay
Hi all,

I have a question regarding Spark Streaming. When I use the
saveAsTextFiles function with a 60-second batch interval, Spark generates a
series of files such as:

result-140614896, result-140614802, result-140614808, etc.

I think this number is the timestamp of the beginning of each batch. How can
we get this value and use it in our code? Thanks!
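For reference, each name comes from the batch time in epoch milliseconds. In the Spark source, saveAsTextFiles is built on foreachRDD and forms each name as prefix-<time in ms>, appending .suffix when a suffix is given (a private helper, rddToFileName, does this). The plain-Python sketch below mirrors that rule and reverses it to recover the timestamp from a name. output_name and parse_batch_time are hypothetical helpers, not Spark APIs, and 1406148960000 is only an illustrative value.

```python
def output_name(prefix, suffix, batch_time_ms):
    """Mirror Spark's naming rule: "<prefix>-<ms>", plus ".<suffix>" if given."""
    name = str(batch_time_ms)
    if prefix:
        name = f"{prefix}-{name}"
    if suffix:
        name = f"{name}.{suffix}"
    return name


def parse_batch_time(name, suffix=None):
    """Recover the batch time (epoch ms) from a name built by output_name."""
    if suffix:
        ending = "." + suffix
        if not name.endswith(ending):
            raise ValueError(f"{name!r} does not end with {ending!r}")
        name = name[: -len(ending)]
    # The prefix itself may contain dashes, so split on the last one only.
    _, _, millis = name.rpartition("-")
    return int(millis)


if __name__ == "__main__":
    name = output_name("result", None, 1406148960000)
    print(name, parse_batch_time(name))
```

Parsing names is only a fallback for code that reads the output afterwards; inside the streaming job, the time parameter of foreachRDD() carries the same value directly.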

Bill


Re: Get Spark Streaming timestamp

2014-07-23 Thread Tobias Pfeiffer
Bill,

Spark Streaming's DStream provides overloaded methods for transform() and
foreachRDD() that allow you to access the timestamp of a batch:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.dstream.DStream

I think the timestamp marks the end of the batch, not the beginning. For
example, I compute the runtime by taking the difference between now() and the
time passed as a parameter to foreachRDD().
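A minimal sketch of the two-argument callbacks, assuming the Python API (added in Spark 1.2, after this thread). There, foreachRDD() and transform() accept a function of (time, rdd) and pass the batch time as a naive local datetime. The Scala API linked above uses (rdd, time) with a Time object instead. The helper names are hypothetical. The code only calls the DStream and RDD methods it is handed (transform, foreachRDD, map, count), so it needs no pyspark import.

```python
from datetime import datetime


def to_epoch_millis(batch_time: datetime) -> int:
    """Convert a batch time back to epoch milliseconds.

    Assumes the datetime is naive local time (as PySpark is assumed to pass
    it here), so .timestamp() interprets it in the local zone.
    """
    return round(batch_time.timestamp() * 1000)


def tag_with_batch_time(batch_time, rdd):
    """transform() callback: pair every record with its batch time in ms."""
    millis = to_epoch_millis(batch_time)
    return rdd.map(lambda record: (millis, record))


def report_delay(batch_time, rdd):
    """foreachRDD() callback: wall-clock time elapsed since the batch time."""
    delay = (datetime.now() - batch_time).total_seconds()
    print(f"batch {to_epoch_millis(batch_time)}: {rdd.count()} records, "
          f"processed {delay:.1f}s after the batch time")


def attach_handlers(dstream):
    """Wire both callbacks onto an existing DStream (hypothetical helper)."""
    tagged = dstream.transform(tag_with_batch_time)
    tagged.foreachRDD(report_delay)
    return tagged
```

report_delay is the measurement described above: now() minus the batch time. If the batch time is the end of the interval, this is the delay after the batch closed, not after it opened.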

Tobias






Re: Get Spark Streaming timestamp

2014-07-23 Thread Bill Jay
Hi Tobias,

It seems this parameter is an input to the function. What I am looking for is
the output of a function that tells me the start or end time of the batch. For
instance, if I use saveAsTextFiles with a batch size of 60 seconds, DStream
seems to generate a batch every minute, and each batch starts on a whole
minute. Thanks!
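Because the time parameter is delivered once per batch, start and end times can be derived from it and the batch duration. This is a minimal plain-Python sketch, assuming that the batch time marks the end of the interval (Tobias's reading) and that the duration is 60 seconds. The timestamp is illustrative, and the function names are hypothetical. As far as I can tell from the scheduler, Spark generates batch times as multiples of the batch duration, which would match the whole-minute boundaries observed above.

```python
from datetime import datetime, timezone


def batch_interval(batch_time_ms: int, batch_duration_ms: int) -> tuple[int, int]:
    """(start, end) of the interval a batch covers, in epoch ms.

    Assumes the batch time is the END of the interval, so the batch
    spans (end - duration, end].
    """
    return batch_time_ms - batch_duration_ms, batch_time_ms


def is_aligned(batch_time_ms: int, batch_duration_ms: int) -> bool:
    """True if the batch time falls on a multiple of the batch duration."""
    return batch_time_ms % batch_duration_ms == 0


def as_utc(ms: int) -> str:
    """Render epoch milliseconds as an ISO-8601 UTC timestamp."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat()


if __name__ == "__main__":
    start, end = batch_interval(1406148960000, 60_000)
    print(as_utc(start), "->", as_utc(end), is_aligned(end, 60_000))
```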

Bill

