Re: Spark streaming and ThreadLocal

2016-02-01 Thread N B
Is each partition guaranteed to execute in a single thread in a worker? Thanks N B On Fri, Jan 29, 2016 at 6:53 PM, Shixiong(Ryan) Zhu wrote: > I see. Then you should use `mapPartitions` rather than using ThreadLocal. > E.g., > > dstream.mapPartitions( iter -> >

Re: Spark streaming and ThreadLocal

2016-01-29 Thread N B
Fixed a typo in the code to avoid any confusion Please comment on the code below... dstream.map( p -> { ThreadLocal d = new ThreadLocal<>() { public SomeClass initialValue() { return new SomeClass(); } }; somefunc(p, d.get()); d.remove(); return p; }; ); On Fri, Jan

Re: Spark streaming and ThreadLocal

2016-01-29 Thread N B
So this use of ThreadLocal will be inside the code of a function executing on the workers i.e. within a call from one of the lambdas. Would it just look like this then: dstream.map( p -> { ThreadLocal d = new ThreadLocal<>() { public SomeClass initialValue() { return new SomeClass(); }

Re: Spark streaming and ThreadLocal

2016-01-29 Thread N B
Well won't the code in lambda execute inside multiple threads in the worker because it has to process many records? I would just want to have a single copy of SomeClass instantiated per thread rather than once per each record being processed. That was what triggered this thought anyways. Thanks

Re: Spark streaming and ThreadLocal

2016-01-29 Thread Shixiong(Ryan) Zhu
It looks weird. Why don't you just pass "new SomeClass()" to "somefunc"? You don't need to use ThreadLocal if there are no multiple threads in your codes. On Fri, Jan 29, 2016 at 4:39 PM, N B wrote: > Fixed a typo in the code to avoid any confusion Please comment on the

Re: Spark streaming and ThreadLocal

2016-01-29 Thread Shixiong(Ryan) Zhu
I see. Then you should use `mapPartitions` rather than using ThreadLocal. E.g., dstream.mapPartitions( iter -> val d = new SomeClass(); return iter.map { p => somefunc(p, d.get()) }; }; ); On Fri, Jan 29, 2016 at 5:29 PM, N B wrote: > Well won't the

Re: Spark streaming and ThreadLocal

2016-01-29 Thread Shixiong(Ryan) Zhu
Of cause. If you use a ThreadLocal in a long living thread and forget to remove it, it's definitely a memory leak. On Thu, Jan 28, 2016 at 9:31 PM, N B wrote: > Hello, > > Does anyone know if there are any potential pitfalls associated with using > ThreadLocal variables in

Re: Spark streaming and ThreadLocal

2016-01-29 Thread N B
Thanks for the response Ryan. So I would say that it is in fact the purpose of a ThreadLocal i.e. to have a copy of the variable as long as the thread lives. I guess my concern is around usage of threadpools and whether Spark streaming will internally create many threads that rotate between tasks

Re: Spark streaming and ThreadLocal

2016-01-29 Thread Shixiong(Ryan) Zhu
Spark Streaming uses threadpools so you need to remove ThreadLocal when it's not used. On Fri, Jan 29, 2016 at 12:55 PM, N B wrote: > Thanks for the response Ryan. So I would say that it is in fact the > purpose of a ThreadLocal i.e. to have a copy of the variable as long

Spark streaming and ThreadLocal

2016-01-28 Thread N B
Hello, Does anyone know if there are any potential pitfalls associated with using ThreadLocal variables in a Spark streaming application? One things I have seen mentioned in the context of app servers that use thread pools is that ThreadLocals can leak memory. Could this happen in Spark streaming