Thanks Stephan, I had a MapFunction using Unirest and that was the origin of the leak.
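For the archives: the fix was to release Unirest's resources when the task finishes. Here's a minimal sketch of the idea, assuming Unirest 1.x (com.mashape); the endpoint, record type, and class name below are placeholders, not my actual code:

import org.apache.flink.api.common.functions.RichMapFunction;

import com.mashape.unirest.http.HttpResponse;
import com.mashape.unirest.http.Unirest;

// Hypothetical enrichment function: one REST call per record. Extending
// RichMapFunction gives us close(), where Unirest's background client
// threads can be shut down when the task ends.
public class EnrichMapFunction extends RichMapFunction<String, String> {

    @Override
    public String map(String value) throws Exception {
        // Placeholder endpoint standing in for the real enrichment service.
        HttpResponse<String> response = Unirest.get("http://example.com/enrich")
                .queryString("raw", value)
                .asString();
        return response.getBody();
    }

    @Override
    public void close() throws Exception {
        // Releases Unirest's client threads. Without this they outlive the
        // task, and across task (re)starts they pile up until the JVM hits
        // "unable to create new native thread".
        Unirest.shutdown();
    }
}

Flink calls close() once per parallel task instance when it finishes, so it's the natural hook for tearing down HTTP clients, thread pools, and the like.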
On Tue, Aug 2, 2016 at 7:36 AM, Stephan Ewen <se...@apache.org> wrote:
> My guess would be that you have a thread leak in the user code.
> More memory will not solve the problem, only push it a bit further away.
>
> On Mon, Aug 1, 2016 at 9:15 PM, Paulo Cezar <paulo.ce...@gogeo.io> wrote:
>
>> Hi folks,
>>
>> I'm trying to run a DataSet program, but after around 200k records are
>> processed, a "java.lang.OutOfMemoryError: unable to create new native
>> thread" stops me.
>>
>> I'm deploying Flink (via bin/yarn-session.sh) on a YARN cluster with 10
>> nodes (each with 8 cores) and starting 10 task managers, each with 8
>> slots and 6GB of RAM.
>>
>> Except for the data sink, which writes to HDFS and runs with a
>> parallelism of 1, my job runs with a parallelism of 80 and has two input
>> datasets, each an HDFS file of around 6GB and 20 million lines. Most of
>> my map functions use external services via RPC or REST APIs to enrich
>> the raw data with info from other sources.
>>
>> Might I be doing something wrong, or should I really have more memory
>> available?
>>
>> Thanks,
>> Paulo Cezar