Re: High virtual memory usage

2017-01-05 Thread Paulo Cezar
Hi Stephan, thanks for your support.

I was able to track down the problem a few days ago. Unirest was the one to
blame: I was using it in some map functions to connect to external services
and for some reason it was using insane amounts of virtual memory.

Paulo Cezar

On Mon, Dec 19, 2016 at 11:30 AM Stephan Ewen <se...@apache.org> wrote:

> Hi Paulo!
>
> Hmm, interesting. The high discrepancy between virtual and physical memory
> usually means that the process either maps large files into memory, or that
> it pre-allocates a lot of memory without immediately using it.
> Neither of these things are done by Flink.
>
> Could this be an effect of either the Docker environment (mapping certain
> kernel spaces / libraries / whatever) or a result of one of the libraries
> (gRPC or so)?
>
> Stephan
>
>
> On Mon, Dec 19, 2016 at 12:32 PM, Paulo Cezar <paulo.ce...@gogeo.io>
> wrote:
>
>   - Are you using RocksDB?
>
> No.
>
>
>   - What is your flink configuration, especially around memory settings?
>
> I'm using the default config with 2GB for the jobmanager and 5GB for
> taskmanagers. I'm starting Flink via "./bin/yarn-session.sh -d -n 5 -jm
> 2048 -tm 5120 -s 4 -nm 'Flink'"
>
>   - What do you use for TaskManager heap size? Any manual value, or do you
> let Flink/Yarn set it automatically based on container size?
>
> No manual values here. YARN config is pretty much default, with a maximum
> allocation of 12GB of physical memory and a virtual-to-physical memory
> ratio of 2.1 (via yarn.nodemanager.vmem-pmem-ratio).
>
>
>   - Do you use any libraries or connectors in your program?
>
> I'm using flink-connector-kafka-0.10_2.11, a MongoDB client, a gRPC
> client, and some HTTP libraries like Unirest and Apache HttpClient.
>
>   - Also, can you tell us what OS you are running on?
>
> My YARN cluster runs on Docker containers (docker version 1.12) with
> images based on Ubuntu 14.04. Host OS is Ubuntu 14.04.4 LTS (GNU/Linux
> 3.19.0-65-generic x86_64).


Re: High virtual memory usage

2016-12-19 Thread Paulo Cezar
  - Are you using RocksDB?

No.


  - What is your flink configuration, especially around memory settings?

I'm using the default config with 2GB for the jobmanager and 5GB for
taskmanagers. I'm starting Flink via "./bin/yarn-session.sh -d -n 5 -jm
2048 -tm 5120 -s 4 -nm 'Flink'"

  - What do you use for TaskManager heap size? Any manual value, or do you
let Flink/Yarn set it automatically based on container size?

No manual values here. YARN config is pretty much default, with a maximum
allocation of 12GB of physical memory and a virtual-to-physical memory
ratio of 2.1 (via yarn.nodemanager.vmem-pmem-ratio).
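
If I understand YARN's check correctly, a container gets killed once its
virtual size exceeds the container allocation times that ratio. A quick
sketch of the math for my task managers (5120 MB containers, 2.1 ratio):

```shell
# YARN's vmem check kills a container once (as I understand it):
#   virtual size > container allocation (MB) * yarn.nodemanager.vmem-pmem-ratio
tm_container_mb=5120   # from "-tm 5120" in the yarn-session.sh command above
vmem_pmem_ratio=2.1    # the default ratio
# shell arithmetic is integer-only, so use awk for the multiplication
awk -v mb="$tm_container_mb" -v r="$vmem_pmem_ratio" \
  'BEGIN { printf "vmem limit: %.1f GB\n", mb * r / 1024 }'
# prints "vmem limit: 10.5 GB"
```

That 10.5 GB figure matches the limit the NodeManager reports when it
kills the containers.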


  - Do you use any libraries or connectors in your program?

I'm using flink-connector-kafka-0.10_2.11, a MongoDB client, a gRPC
client, and some HTTP libraries like Unirest and Apache HttpClient.

  - Also, can you tell us what OS you are running on?

My YARN cluster runs on Docker containers (docker version 1.12) with images
based on Ubuntu 14.04. Host OS is Ubuntu 14.04.4 LTS (GNU/Linux
3.19.0-65-generic x86_64).


High virtual memory usage

2016-12-16 Thread Paulo Cezar
Hi Folks,

I'm running Flink (1.2-SNAPSHOT nightly) on YARN (Hadoop 2.7.2). A few
hours after I start a streaming job (built using
flink-connector-kafka-0.10_2.11) it gets killed, seemingly for no reason.
After inspecting the logs, my best guess is that YARN is killing containers
due to high virtual memory usage.

Any guesses on why this might be happening, or tips on what I should be
looking for?

What I'll do next is enable taskmanager.debug.memory.startLogThread to keep
investigating. Also, I was deploying flink-1.2-SNAPSHOT-bin-hadoop2.tgz
<https://s3.amazonaws.com/flink-nightly/flink-1.2-SNAPSHOT-bin-hadoop2.tgz>
on YARN, but my job uses Scala 2.11 dependencies, so I'll try using
flink-1.2-SNAPSHOT-bin-hadoop2_2.11.tgz
<https://s3.amazonaws.com/flink-nightly/flink-1.2-SNAPSHOT-bin-hadoop2_2.11.tgz>
instead.
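
On the nodes themselves I'll also look at where the virtual memory goes.
A rough sketch, assuming a Linux host and that TM_PID holds the
TaskManager JVM's PID (here it's just a stand-in):

```shell
TM_PID=$$   # stand-in: replace with the TaskManager JVM's PID
# virtual (VSZ) vs resident (RSS) size in KiB; a large VSZ/RSS gap means
# lots of address space is reserved but never actually touched
ps -o pid,vsz,rss,comm -p "$TM_PID"
# count the process's memory mappings; a library that mmap()s aggressively
# (or leaks mappings) shows up as a large and growing number here
wc -l < "/proc/$TM_PID/maps"
```

If procps is available on the node, "pmap -x $TM_PID" gives a per-mapping
breakdown as well.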


   - Flink logs:

2016-12-15 17:44:03,763 WARN  akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink@10.0.0.8:49832] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
2016-12-15 17:44:05,475 INFO  org.apache.flink.yarn.YarnFlinkResourceManager - Container ResourceID{resourceId='container_1481732559439_0002_01_04'} failed. Exit status: 1
2016-12-15 17:44:05,476 INFO  org.apache.flink.yarn.YarnFlinkResourceManager - Diagnostics for container ResourceID{resourceId='container_1481732559439_0002_01_04'} in state COMPLETE : exitStatus=1 diagnostics=Exception from container-launch.
Container id: container_1481732559439_0002_01_04
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1



   - YARN logs:

container_1481732559439_0002_01_04: 2.6 GB of 5 GB physical memory used; 38.1 GB of 10.5 GB virtual memory used
2016-12-15 17:44:03,119 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 62223 for container-id container_1481732559439_0002_01_01: 656.3 MB of 2 GB physical memory used; 3.2 GB of 4.2 GB virtual memory used
2016-12-15 17:44:03,766 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exit code from container container_1481732559439_0002_01_04 is : 1
2016-12-15 17:44:03,766 WARN org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1481732559439_0002_01_04 and exit code: 1
ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Best regards,
Paulo Cezar


Failures on DataSet programs

2016-09-27 Thread Paulo Cezar
Hi Folks,

I was wondering if it's possible to keep partial outputs from DataSet
programs. I have a batch pipeline that writes its output to HDFS using
writeAsFormattedText. When the pipeline fails its output file is deleted,
but I would like to keep it so that I can generate new inputs for the
pipeline and avoid reprocessing.

[]'s
Paulo Cezar


OutOfMemoryError

2016-08-01 Thread Paulo Cezar
Hi folks,


I'm trying to run a DataSet program, but after around 200k records are
processed a "java.lang.OutOfMemoryError: unable to create new native
thread" stops me.


I'm deploying Flink (via bin/yarn-session.sh) on a YARN cluster with
10 nodes (each with 8 cores) and starting 10 task managers, each with
8 slots and 6GB of RAM.


Except for the data sink that writes to HDFS and runs with a parallelism
of 1, my job runs with a parallelism of 80 and has two input datasets,
each an HDFS file of around 6GB and 20 million lines. Most of my map
functions use external services via RPC or REST APIs to enrich the raw
data with info from other sources.

Might I be doing something wrong, or should I really have more memory available?
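
As far as I can tell, this particular OutOfMemoryError means the OS
refused to create another native thread (per-user or system-wide limits,
or no room left for another thread stack), not that the JVM heap is full.
A sketch of the limits I plan to check on the task manager hosts,
assuming Linux:

```shell
# "unable to create new native thread" points at OS limits, not JVM heap
ulimit -u                          # max processes/threads for this user
cat /proc/sys/kernel/threads-max   # system-wide thread limit
cat /proc/sys/vm/max_map_count     # mmap limit; each thread stack is a mapping
# threads currently alive in this shell's process ($$ is a stand-in
# for the TaskManager JVM's PID)
ls "/proc/$$/task" | wc -l
```

With a parallelism of 80 and an RPC/REST client in most map functions,
each client's thread pool gets multiplied across slots, and every thread
also reserves a stack (-Xss, typically 512 KiB to 1 MiB), so these limits
can be hit well before the heap fills up.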

Thanks,
Paulo Cezar