Re: Locking for shared RDDs
Aditya, I think your mental model of Spark Streaming is a little off the mark. Unlike traditional streaming systems, where any kind of state is mutable, Spark Streaming is designed on Spark's immutable RDDs. Streaming data is received and divided into immutable blocks, the blocks then form immutable RDDs, and transformations on those form new immutable RDDs. It's best that you first read the Spark paper and then the Spark Streaming paper to understand the model. Once you understand that, you will realize that since everything is immutable, the question of consistency does not even arise :)

TD
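To make the immutability point concrete, here is a minimal plain-Python analogue (an illustration only, not the Spark API): each batch of received data behaves like an immutable value, and a transformation always produces a new value rather than updating the old one in place.

```python
# Plain-Python analogue of Spark's model (illustration, not Spark API):
# a "batch" is an immutable tuple, and a "transformation" builds a NEW
# tuple -- the input is never mutated, so there is nothing to lock.
batch = (1, 2, 3)                        # received block -> immutable "RDD"
doubled = tuple(x * 2 for x in batch)    # transformation -> new immutable "RDD"

assert batch == (1, 2, 3)                # the source batch is untouched
print(doubled)                           # (2, 4, 6)
```

Because every transformation yields a fresh dataset, concurrent readers of the original batch can never observe a partial update, which is why consistency questions do not arise.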
Re: Locking for shared RDDs
You don't need to worry about locks as such, because one thread/worker is exclusively responsible for one partition of the RDD. You can use the Accumulator variables that Spark provides to get the state updates.
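The reason no locks are needed can be sketched in plain Python (again an analogue, not the Spark API): each partition is processed by exactly one task, which builds a purely local partial result, and the partials are merged afterwards. This local-aggregate-then-merge pattern is also how Spark's accumulators collect updates without coordination.

```python
# Sketch of lock-free per-partition processing (plain Python, not the
# Spark API). The partition layout and sum() stat are hypothetical.
partitions = [[1, 2, 3], [4, 5], [6]]

def process_partition(part):
    # Runs in a single task; touches only its own partition's data,
    # so no shared mutable state and no lock is required.
    return sum(part)

# Each task produces a local partial result...
partials = [process_partition(p) for p in partitions]

# ...and the driver merges the partials in one place.
total = sum(partials)
print(total)  # 21
```

Since every task writes only to its own local result and merging happens in a single place, the "serial update of shared state" the original question worries about never occurs.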
Locking for shared RDDs
I am relatively new to Spark. I am planning to use Spark Streaming for my OLAP use case, but I would like to know how RDDs are shared between multiple workers. If I need to constantly compute some stats on the streaming data, presumably the shared state would have to be updated serially by different Spark workers. Is this managed by Spark automatically, or does the application need to ensure distributed locks are acquired?

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Locking-for-shared-RDDs-tp20578.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org