Hello,
I am using Pig version 0.17.0. When I run my Pig script from the command line
on a YARN cluster, I get out-of-memory errors. The YARN application logs show
this stack trace:
2018-04-27 13:22:10,543 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at java.lang.StringBuilder.toString(StringBuilder.java:407)
    at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:2992)
    at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:2817)
    at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:2689)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1326)
    at org.apache.hadoop.conf.Configuration.set(Configuration.java:1298)
    at org.apache.pig.backend.hadoop.datastorage.ConfigurationUtil.mergeConf(ConfigurationUtil.java:70)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.setLocation(PigOutputFormat.java:185)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.setUpContext(PigOutputCommitter.java:115)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:89)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.<init>(PigOutputCommitter.java:70)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:297)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:550)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$3.call(MRAppMaster.java:532)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1779)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:532)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:309)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1737)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1734)
    at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1668)
To increase the heap size, I added the following at the beginning of the
script:
SET mapreduce.map.java.opts '-Xmx2048m';
SET mapreduce.reduce.java.opts '-Xmx2048m';
SET mapreduce.map.memory.mb 2536;
SET mapreduce.reduce.memory.mb 2536;
But this has no effect; the settings appear to be ignored. In the YARN logs, I
can see the container still being launched with a 1024m heap:
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -Djava.io.tmpdir=$PWD/tmp
-Dlog4j.configuration=container-log4j.properties
-Dyarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001
-Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
-Dhadoop.root.logfile=syslog -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
1>/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001/stdout
2>/opt/hadoop/logs/userlogs/application_1523452171521_0223/container_1523452171521_0223_01_000001/stderr"
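One detail I may be misreading: the launch command above appears to be for the
MRAppMaster container itself (the Application Master), not for a map or reduce
task, and the stack trace is also from MRAppMaster. If that is the case, I
wonder whether its heap is governed by the AM-specific properties rather than
the map/reduce ones, i.e. something like this sketch (untested on my side):

```pig
-- Sketch, assuming the OOM is in the Application Master container
-- rather than in a map or reduce task:
SET yarn.app.mapreduce.am.resource.mb 2536;          -- AM container size in MB
SET yarn.app.mapreduce.am.command-opts '-Xmx2048m';  -- AM JVM heap
```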
I also tried setting the memory requirements with the PIG_OPTS environment
variable:
export PIG_OPTS="-Dmapreduce.reduce.memory.mb=5000
-Dmapreduce.map.memory.mb=5000 -Dmapreduce.map.java.opts=-Xmx5000m"
No matter what I do, the container is always launched with -Xmx1024m and the
same OOM error occurs.
My question is: what is the proper way to specify the heap sizes for my Pig
mappers and reducers?
Best regards,
Alex soto