When I run a fetch on my Nutch 0.8 / Hadoop system, I always get this error message: "java.lang.OutOfMemoryError: Java heap space".

I have tried setting the Java heap size manually with export JAVA_OPTS="-Xmx2000m -Xms128m", but it had no effect.

OS: 2x SUSE 10.1 64-bit, AMD 3000 | 4000m and AMD X2 3800 | 4000m

Java version:
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_07-b03)
Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_07-b03, mixed mode)

My hadoop-site.xml:

<property>
 <name>fs.default.name</name>
 <value>192.168.1.208:9000</value>
 <description>
   The name of the default file system. Either the literal string
   "local" or a host:port for NDFS.
 </description>
</property>

<property>
 <name>mapred.job.tracker</name>
 <value>192.168.1.208:9001</value>
 <description>
   The host and port that the MapReduce job tracker runs at. If
   "local", then jobs are run in-process as a single map and
   reduce task.
 </description>
</property>

<property>
 <name>mapred.map.tasks</name>
 <value>9</value>
 <description>
   define mapred.map tasks to be number of slave hosts
 </description>
</property>

<property>
 <name>mapred.reduce.tasks</name>
 <value>2</value>
 <description>
   define mapred.reduce tasks to be number of slave hosts
 </description>
</property>

<property>
 <name>dfs.name.dir</name>
 <value>/home/nutch/filesystem/name</value>
</property>

<property>
 <name>dfs.data.dir</name>
 <value>/home/nutch/filesystem/data</value>
</property>

<property>
 <name>mapred.system.dir</name>
 <value>/home/nutch/filesystem/mapreduce/system</value>
</property>

<property>
 <name>mapred.local.dir</name>
 <value>/home/nutch/filesystem/mapreduce/local</value>
</property>

<property>
 <name>dfs.replication</name>
 <value>2</value>
</property>

<property>
 <name>mapred.tasktracker.tasks.maximum</name>
 <value>1</value>
 <description>The maximum number of tasks that will be run
 simultaneously by a task tracker.
 </description>
</property>

<property>
 <name>mapred.child.java.opts</name>
 <value>-Xmx200m</value>
 <description>Java opts for the task tracker child processes.  Subsumes
 'mapred.child.heap.size' (If a mapred.child.heap.size value is found
 in a configuration, its maximum heap size will be used and a warning
 emitted that heap.size has been deprecated). Also, the following symbols,
 if present, will be interpolated: @taskid@ is replaced by current TaskID;
and @port@ will be replaced by mapred.task.tracker.report.port + 1 (A second
 child will fail with a port-in-use if mapred.tasktracker.tasks.maximum is
 greater than one). Any other occurrences of '@' will go unchanged. For
 example, to enable verbose gc logging to a file named for the taskid in
 /tmp and to set the heap maximum to be a gigabyte, pass a 'value' of:

       -Xmx1024m -verbose:gc -Xloggc:/tmp/@taskid@.gc
 </description>
</property>

<property>
 <name>mapred.task.timeout</name>
 <value>6000000</value>
 <description>The number of milliseconds before a task will be
 terminated if it neither reads an input, writes an output, nor
 updates its status string.
 </description>
</property>
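
One thing that stands out in the configuration above: mapred.child.java.opts is set to -Xmx200m, and according to its own description this property supplies the Java options for the task tracker child processes. If the OutOfMemoryError is thrown inside a fetch task (an assumption, since the stack trace is not shown here), the JAVA_OPTS exported in the shell may never reach that child JVM, which would explain why it had no effect. A minimal sketch of raising the child heap instead; the 512m figure is only an illustrative value, not a tested recommendation:

```xml
<!-- Sketch only: raises the heap ceiling for task child JVMs.
     512m is an assumed, illustrative value; choose one that fits
     the memory available on each slave. -->
<property>
 <name>mapred.child.java.opts</name>
 <value>-Xmx512m</value>
 <description>Java opts for the task tracker child processes.</description>
</property>
```

With mapred.tasktracker.tasks.maximum set to 1, only one such child should run per task tracker at a time, so the value applies to a single JVM per node.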
