Re: Locking for shared RDDs

2014-12-11 Thread Tathagata Das
Aditya, I think your mental model of Spark Streaming is a little off
the mark. Unlike traditional streaming systems, where any kind of
state is mutable, Spark Streaming is designed on Spark's immutable RDDs.
Streaming data is received and divided into immutable blocks, which
then form immutable RDDs, and transformations on those form new
immutable RDDs. It's best that you first read the Spark paper and then
the Spark Streaming paper to understand the model. Once you understand
that, you will realize that since everything is immutable, the question
of consistency does not even arise :)
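To make that concrete, here is a plain-Python sketch (not the Spark API; tuples stand in for RDDs) of why immutability sidesteps consistency questions: every transformation derives a new value and never touches its parent.

```python
# Toy model of Spark Streaming's data flow: received records are grouped
# into "blocks", the blocks are frozen into an immutable batch (a tuple,
# standing in for an RDD), and each transformation derives a NEW tuple.

def make_rdd(blocks):
    # Freeze the received blocks into one immutable batch.
    return tuple(rec for block in blocks for rec in block)

def transform(rdd, fn):
    # Transformations never mutate; they return a new immutable value.
    return tuple(fn(rec) for rec in rdd)

batch = make_rdd([[1, 2], [3, 4]])
doubled = transform(batch, lambda x: x * 2)

# The parent batch is untouched, so no reader can ever observe a
# partially updated state -- there is nothing to lock.
print(batch)    # (1, 2, 3, 4)
print(doubled)  # (2, 4, 6, 8)
```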

TD

On Mon, Dec 8, 2014 at 9:44 PM, Raghavendra Pandey
 wrote:
> You don't need to worry about locks, since one thread/worker is
> exclusively responsible for one partition of the RDD. You can use the
> Accumulator variables that Spark provides to get the state updates.
>
>
> On Mon Dec 08 2014 at 8:14:28 PM aditya.athalye 
> wrote:
>>
>> I am relatively new to Spark. I am planning to use Spark Streaming for my
>> OLAP use case, but I would like to know how RDDs are shared between
>> multiple
>> workers.
>> If I need to constantly compute some stats on the streaming data,
>> presumably shared state would have to be updated serially by different
>> Spark workers. Is this managed by Spark automatically, or does the
>> application need to ensure distributed locks are acquired?
>>
>> Thanks
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Locking-for-shared-RDDs-tp20578.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>> For additional commands, e-mail: user-h...@spark.apache.org
>>
>



Re: Locking for shared RDDs

2014-12-08 Thread Raghavendra Pandey
You don't need to worry about locks, since one thread/worker is
exclusively responsible for one partition of the RDD. You can use the
Accumulator variables that Spark provides to get the state updates.
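A rough sketch of why no locks are needed (plain Python, not the Spark API): each worker processes only its own partition and builds a local partial result, and the driver merges the partials in a single-threaded reduction, which is essentially how accumulator updates from each task reach the driver.

```python
# Toy model of per-partition processing: each "worker" owns exactly one
# partition, accumulates a local sum, and only the driver merges the
# partial results -- so no two threads ever write the same state.
from concurrent.futures import ThreadPoolExecutor

def process_partition(partition):
    # Runs on one worker; touches only data this worker owns.
    local_sum = 0
    for record in partition:
        local_sum += record
    return local_sum

partitions = [[1, 2, 3], [4, 5], [6]]

with ThreadPoolExecutor(max_workers=3) as pool:
    partials = list(pool.map(process_partition, partitions))

# Driver-side merge, analogous to how Spark combines accumulator
# updates from each task: a single-threaded reduction, no locks.
total = sum(partials)
print(total)  # 21
```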

On Mon Dec 08 2014 at 8:14:28 PM aditya.athalye 
wrote:

> I am relatively new to Spark. I am planning to use Spark Streaming for my
> OLAP use case, but I would like to know how RDDs are shared between
> multiple
> workers.
> If I need to constantly compute some stats on the streaming data,
> presumably shared state would have to be updated serially by different
> Spark workers. Is this managed by Spark automatically, or does the
> application need to ensure distributed locks are acquired?
>
> Thanks
>


Locking for shared RDDs

2014-12-08 Thread aditya.athalye
I am relatively new to Spark. I am planning to use Spark Streaming for my
OLAP use case, but I would like to know how RDDs are shared between multiple
workers. 
If I need to constantly compute some stats on the streaming data, presumably
shared state would have to be updated serially by different Spark workers. Is
this managed by Spark automatically, or does the application need to ensure
distributed locks are acquired?

Thanks


