I would pass the memory parameters in the args array directly. The Hadoop-specific arguments must come before your custom arguments, like this:
String[] args = new String[]{"-Dmapreduce.map.memory.mb=12323", "customOpt1"};
ToolRunner.run(tool, args); // tool = your Tool instance, e.g. SparseVectorsFromSequenceFiles
ToolRunner takes care of putting the Hadoop-specific arguments into the job's configuration. I bet the Configuration you use is being overridden or replaced by something else.
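For reference, here is a minimal stdlib-only sketch of the argument splitting ToolRunner performs (the real work is done by Hadoop's GenericOptionsParser; the class below is made up purely for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: mimics how ToolRunner/GenericOptionsParser peel
// -Dkey=value pairs off the FRONT of args and put them in the job's
// Configuration, leaving the remaining args for the Tool's run() method.
public class DSplit {
    static Map<String, String> conf = new LinkedHashMap<>();
    static List<String> toolArgs = new ArrayList<>();

    static void split(String[] args) {
        int i = 0;
        // -D options are only recognized before the first custom argument;
        // after that, everything is passed through untouched.
        while (i < args.length && args[i].startsWith("-D")) {
            String[] kv = args[i].substring(2).split("=", 2);
            conf.put(kv[0], kv[1]);
            i++;
        }
        for (; i < args.length; i++) {
            toolArgs.add(args[i]);
        }
    }

    public static void main(String[] argv) {
        split(new String[]{"-Dmapreduce.map.memory.mb=12323", "customOpt1"});
        System.out.println(conf.get("mapreduce.map.memory.mb")); // 12323
        System.out.println(toolArgs);                            // [customOpt1]
    }
}
```

Anything after the first non-`-D` argument is passed through to your Tool untouched, which is why the ordering matters.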
Other than that, there is also
job.getConfiguration().set("mapred.map.child.java.opts", "-Xmx2G");
which works for me, but this depends on the Hadoop version, I guess.
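The version dependence comes from Hadoop 2 mapping the old mapred.* names to the new mapreduce.* ones through a deprecation table, so either name ends up on the same setting. A stdlib-only sketch of that aliasing (the key pair below follows Hadoop 2's documented deprecation of mapred.map.child.java.opts to mapreduce.map.java.opts; the AliasedConf class itself is made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: a tiny configuration map that resolves a deprecated
// key to its replacement, similar to what Hadoop 2's Configuration does
// internally with its deprecation table.
public class AliasedConf {
    // deprecated name -> current name (one documented Hadoop 2 mapping)
    private static final Map<String, String> DEPRECATIONS = new HashMap<>();
    static {
        DEPRECATIONS.put("mapred.map.child.java.opts", "mapreduce.map.java.opts");
    }

    private final Map<String, String> values = new HashMap<>();

    private static String canonical(String key) {
        return DEPRECATIONS.getOrDefault(key, key);
    }

    public void set(String key, String value) {
        values.put(canonical(key), value); // both names write the same slot
    }

    public String get(String key) {
        return values.get(canonical(key)); // both names read the same slot
    }

    public static void main(String[] args) {
        AliasedConf conf = new AliasedConf();
        conf.set("mapred.map.child.java.opts", "-Xmx2G");        // old name
        System.out.println(conf.get("mapreduce.map.java.opts")); // -Xmx2G
    }
}
```

So on a Hadoop 2 cluster either key should take effect; which one "works" depends on which names that version recognizes.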
On Thu, Feb 20, 2014 at 9:15 PM, Justin Kay <[email protected]> wrote:
> Hi everyone,
>
> I've been stuck on an OutOfMemoryError when attempting to run a
> SparseVectorsFromSequenceFiles() Job in Java. I'm using Mahout 0.9 and
> Hadoop 2.2, run in a Maven project. I've tried setting the heap
> configurations through Java using a Hadoop Configuration that is passed to
> the Job:
>
> CONF.set("mapreduce.map.memory.mb", "1536");
> CONF.set("mapreduce.map.java.opts", "-Xmx1024m");
> CONF.set("mapreduce.reduce.memory.mb", "1536");
> CONF.set("mapreduce.reduce.java.opts", "-Xmx1024m");
> CONF.set("task.io.sort.mb", "512");
> CONF.set("task.io.sort.factor", "100");
>
> etc., but nothing has seemed to work. My Java heap settings are similar and
> configured to "-Xms512m -Xmx1536m" when running the project. The data I'm
> using is 100,000 sequence files totaling ~250 MB. It doesn't fail on a data
> set of 63 sequence files totaling ~2 MB. Here is an example stack trace:
>
> Exception in thread "Thread-18" java.lang.OutOfMemoryError: Java heap space
> at sun.util.resources.TimeZoneNames.getContents(TimeZoneNames.java:205)
> at sun.util.resources.OpenListResourceBundle.loadLookup(OpenListResourceBundle.java:125)
> at sun.util.resources.OpenListResourceBundle.loadLookupTablesIfNecessary(OpenListResourceBundle.java:113)
> (this seems to get thrown on different bits of code every time)
> ......
> java.lang.IllegalStateException: Job failed!
> at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
> at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:257)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>
> This is the code I'm running it with in order to pass in my own
> Configuration:
>
> SparseVectorsFromSequenceFiles VectorizeJob = new SparseVectorsFromSequenceFiles();
> VectorizeJob.setConf(CONF);
> ToolRunner.run(VectorizeJob, args);
>
> where args is a String[] of command-line options.
>
> Any suggestions would be greatly appreciated.
>
> Justin Kay
>