Re: RDD boundaries and triggering processing using tags in the data

2015-06-01 Thread Akhil Das
Maybe you can make use of the window operations:
https://spark.apache.org/docs/1.2.0/streaming-programming-guide.html#window-operations
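
For example, a minimal sketch (untested; it assumes you can parse the
stream into (sensorId, value) pairs, and the input source, app name, and
durations below are made up):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("SensorAverages")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed input format: one "sensorId,reading" line per event.
    val sensorStream = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(id, v) = line.split(",")
      (id, v.toDouble)
    }

    // Average each sensor's readings over a sliding 30-second window.
    // Note the window is time-based, not tied to your 10 markers.
    val windowedAverages = sensorStream
      .groupByKeyAndWindow(Seconds(30), Seconds(10))
      .mapValues(vs => vs.sum / vs.size)

    windowedAverages.print()
    ssc.start()
    ssc.awaitTermination()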
Another approach would be to keep your incoming data in an
HBase/Redis/Cassandra-style database and then, whenever you need the
average, query the database and compute it there.
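
For the database route, something like this with the Jedis client could
work (again only a sketch, reusing sensorStream from above; the host,
port, and key names are assumptions):

    import redis.clients.jedis.Jedis
    import scala.collection.JavaConverters._

    // Append each reading to a per-sensor Redis list as it arrives.
    sensorStream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val jedis = new Jedis("localhost", 6379) // one connection per partition
        records.foreach { case (id, v) => jedis.rpush(s"sensor:$id", v.toString) }
        jedis.close()
      }
    }

    // Whenever you need an average, pull that sensor's readings and compute it.
    def averageFor(id: String): Double = {
      val jedis = new Jedis("localhost", 6379)
      val readings = jedis.lrange(s"sensor:$id", 0, -1).asScala.map(_.toDouble)
      jedis.close()
      readings.sum / readings.size
    }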

Thanks
Best Regards

On Thu, May 28, 2015 at 1:22 AM, David Webber david.web...@gmail.com
wrote:

 Hi All,

 I'm new to Spark and I'd like some help understanding if a particular use
 case would be a good fit for Spark Streaming.

 I have an imaginary stream of sensor data consisting of integers 1-10.
 Every time the sensor reads 10, I'd like to average all the numbers that
 were received since the last 10.

 example input: 10 5 8 4 6 2 1 2 8 8 8 1 6 9 1 3 10 1 3 10 ...
 desired output: 4.8, 2.0

 I'm confused about what happens if sensor readings fall into different
 RDDs.

 RDD1:  10 5 8 4 6 2 1 2 8 8 8
 RDD2:  1 6 9 1 3 10 1 3 10
 output: ???, 2.0

 My imaginary sensor doesn't read at fixed time intervals, so breaking the
 stream into RDDs by time interval won't ensure the data is packaged
 properly.  Additionally, multiple sensors are writing to the same stream
 (though I think flatMap can parse the original stream into streams for
 individual sensors, correct?).

 My best guess for processing goes like this:
 1) flatMap() to break out individual sensor streams
 2) a custom parser to accumulate input data until a 10 is found, then
 create a new output RDD for each sensor and data grouping (rough sketch
 below)
 3) average the values from step 2
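
 For step 2, I imagine something like updateStateByKey, roughly (an
 untested sketch; it assumes a DStream of (sensorId, value) pairs and
 that ordering within a batch can be preserved, which I realize may
 need care):

     // State per sensor: readings buffered since the last 10, plus any
     // averages completed in this batch. Needs ssc.checkpoint(...) set.
     case class SensorState(pending: List[Double], completed: List[Double])

     val updated = sensorStream.updateStateByKey[SensorState] {
       (newValues: Seq[Double], state: Option[SensorState]) =>
         var pending = state.map(_.pending).getOrElse(Nil)
         var completed = List.empty[Double]
         for (v <- newValues) {
           if (v == 10.0) {        // marker: close out the current group
             if (pending.nonEmpty) completed ::= pending.sum / pending.size
             pending = Nil
           } else {
             pending ::= v
           }
         }
         Some(SensorState(pending, completed))
     }

     // Step 3: the finished averages for each sensor.
     val averages = updated.flatMap { case (id, s) => s.completed.map(id -> _) }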

 I would greatly appreciate pointers to some specific documentation or
 examples if you have seen something like this before.

 Thanks,
 David





