Thanks Stephan, I had a MapFunction using Unirest and that was the origin
of the leak.
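
In case it helps anyone else hitting the same error, here is a rough sketch of
the kind of change that avoids the leak: use a RichMapFunction and shut Unirest
down in close() so its thread pools are released when the task ends (the class
name and URL below are just placeholders, not my actual job):

import com.mashape.unirest.http.Unirest;
import org.apache.flink.api.common.functions.RichMapFunction;

// Enriches each record via a REST call. The important part is close():
// Unirest keeps background threads alive until shutdown() is called, so a
// function that never shuts it down leaves those threads behind after the
// task finishes.
public class EnrichViaRest extends RichMapFunction<String, String> {

    // Placeholder endpoint for the external enrichment service.
    private static final String SERVICE_URL = "http://example.com/enrich";

    @Override
    public String map(String value) throws Exception {
        // One request per record, reusing Unirest's shared client.
        return Unirest.post(SERVICE_URL)
                .body(value)
                .asString()
                .getBody();
    }

    @Override
    public void close() throws Exception {
        // Release Unirest's client and its threads when the task ends.
        Unirest.shutdown();
    }
}

The general point is that any HTTP client created inside a user function needs
a matching shutdown in close(); otherwise its threads outlive the task and the
JVM eventually cannot create new native threads.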

On Tue, Aug 2, 2016 at 7:36 AM, Stephan Ewen <se...@apache.org> wrote:

> My guess would be that you have a thread leak in the user code.
> More memory will not solve the problem, only push it a bit further away.
>
> On Mon, Aug 1, 2016 at 9:15 PM, Paulo Cezar <paulo.ce...@gogeo.io> wrote:
>
>> Hi folks,
>>
>>
>> I'm trying to run a DataSet program, but after around 200k records have been
>> processed the job fails with "java.lang.OutOfMemoryError: unable to create new
>> native thread".
>>
>>
>> I'm deploying Flink (via bin/yarn-session.sh) on a YARN cluster with 10 
>> nodes (each with 8 cores) and starting 10 task managers, each with 8 slots 
>> and 6GB of RAM.
>>
>>
>> Except for the data sink, which writes to HDFS and runs with a parallelism of
>> 1, my job runs with a parallelism of 80 and has two input datasets, each an
>> HDFS file of around 6GB and 20 million lines. Most of my map functions use
>> external services via RPC or REST APIs to enrich the raw data with info from
>> other sources.
>>
>> Might I be doing something wrong, or should I really have more memory
>> available?
>>
>> Thanks,
>> Paulo Cezar
>>
>>
>
