Is each partition guaranteed to execute in a single thread in a worker?
Thanks
N B
On Fri, Jan 29, 2016 at 6:53 PM, Shixiong(Ryan) Zhu wrote:
> I see. Then you should use `mapPartitions` rather than using ThreadLocal.
> E.g.,
>
> dstream.mapPartitions( iter ->
>
Fixed a typo in the code to avoid any confusion Please comment on the
code below...
dstream.map( p -> { ThreadLocal d = new ThreadLocal<>() {
public SomeClass initialValue() { return new SomeClass(); }
};
somefunc(p, d.get());
d.remove();
return p;
}; );
On Fri, Jan
So this use of ThreadLocal will be inside the code of a function executing
on the workers i.e. within a call from one of the lambdas. Would it just
look like this then:
dstream.map( p -> { ThreadLocal d = new ThreadLocal<>() {
public SomeClass initialValue() { return new SomeClass(); }
Well won't the code in lambda execute inside multiple threads in the worker
because it has to process many records? I would just want to have a single
copy of SomeClass instantiated per thread rather than once per each record
being processed. That was what triggered this thought anyways.
Thanks
It looks weird. Why don't you just pass "new SomeClass()" to "somefunc"?
You don't need to use ThreadLocal if there are no multiple threads in your
codes.
On Fri, Jan 29, 2016 at 4:39 PM, N B wrote:
> Fixed a typo in the code to avoid any confusion Please comment on the
I see. Then you should use `mapPartitions` rather than using ThreadLocal.
E.g.,
dstream.mapPartitions( iter ->
val d = new SomeClass();
return iter.map { p =>
somefunc(p, d.get())
};
}; );
On Fri, Jan 29, 2016 at 5:29 PM, N B wrote:
> Well won't the
Of cause. If you use a ThreadLocal in a long living thread and forget to
remove it, it's definitely a memory leak.
On Thu, Jan 28, 2016 at 9:31 PM, N B wrote:
> Hello,
>
> Does anyone know if there are any potential pitfalls associated with using
> ThreadLocal variables in
Thanks for the response Ryan. So I would say that it is in fact the purpose
of a ThreadLocal i.e. to have a copy of the variable as long as the thread
lives. I guess my concern is around usage of threadpools and whether Spark
streaming will internally create many threads that rotate between tasks
Spark Streaming uses threadpools so you need to remove ThreadLocal when
it's not used.
On Fri, Jan 29, 2016 at 12:55 PM, N B wrote:
> Thanks for the response Ryan. So I would say that it is in fact the
> purpose of a ThreadLocal i.e. to have a copy of the variable as long
Hello,
Does anyone know if there are any potential pitfalls associated with using
ThreadLocal variables in a Spark streaming application? One things I have
seen mentioned in the context of app servers that use thread pools is that
ThreadLocals can leak memory. Could this happen in Spark streaming
10 matches
Mail list logo