Hi,
I'm trying to multiply two relatively small matrices, (4219 x 200) and
(200 x 54622), but it takes too long because only a single mapper is
launched. I'm running this on a 10-node cluster.
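As a quick sanity check on "relatively small", here is a rough size calculation for the dimensions in the first sentence (the command below uses 4819 columns for A, so substitute whichever figure is correct). It assumes dense storage of 8-byte doubles and ignores any SequenceFile or serialization overhead; the function name is just for illustration.

```python
BYTES_PER_DOUBLE = 8  # assumption: dense 64-bit doubles, no storage overhead


def dense_sizes(m, k, n):
    """Entry counts for A (m x k), B (k x n), and the product (m x n)."""
    return {"A": m * k, "B": k * n, "product": m * n}


sizes = dense_sizes(4219, 200, 54622)
for name, entries in sizes.items():
    gb = entries * BYTES_PER_DOUBLE / 1e9
    print(f"{name}: {entries:,} entries, ~{gb:.2f} GB if dense")
```

The inputs are small, but the product alone has about 230 million entries (roughly 1.8 GB as dense doubles), which is a lot of output for one map task to produce.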
I tried setting MAHOUT_OPTS in the mahout script:
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.map.tasks=18"
MAHOUT_OPTS="$MAHOUT_OPTS -Dmapred.reduce.tasks=9"
I also tried passing the options directly on the command line:
mahout matrixmult -Dmapred.map.tasks=18 -Dmapred.reduce.tasks=9 \
  --numRowsA 200 --numColsA 4819 --numRowsB 200 --numColsB 54622 \
  --inputPathA matrixA --inputPathB matrixB
No luck with this either.
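For context: in Hadoop's classic (org.apache.hadoop.mapred) API, mapred.map.tasks is only a hint. The actual number of map tasks equals the number of input splits the job's InputFormat generates. The sketch below is a simplified Python model of how a FileInputFormat-style splitter turns file sizes into splits. It assumes 64 MB HDFS blocks and a single 10 MB input file, both hypothetical. I have not checked how matrixmult configures its input, so treat this as an illustration of the mechanism rather than a diagnosis. Function and parameter names are illustrative only.

```python
# Simplified model of classic Hadoop FileInputFormat split sizing
# (one map task per split). Names are illustrative, not Hadoop APIs.
SPLIT_SLOP = 1.1  # the final split may be up to 10% larger than split_size


def compute_split_size(goal_size, min_size, block_size):
    # Goal size, capped at the block size and floored at the minimum split size.
    return max(min_size, min(goal_size, block_size))


def estimate_splits(file_sizes, num_splits_hint, block_size,
                    min_size=1, splittable=True):
    """Estimate how many map tasks a set of input files would produce."""
    total_size = sum(file_sizes)
    # The mapred.map.tasks hint only sets a target number of bytes per split.
    goal_size = total_size // max(num_splits_hint, 1)
    splits = 0
    for length in file_sizes:
        if length == 0 or not splittable:
            # In this model, empty or unsplittable files become one split each.
            splits += 1
            continue
        split_size = compute_split_size(goal_size, min_size, block_size)
        remaining = length
        while remaining / split_size > SPLIT_SLOP:
            splits += 1
            remaining -= split_size
        if remaining != 0:
            splits += 1
    return splits


MB = 2**20
# One 10 MB input file, 64 MB blocks, hint of 18 map tasks:
print(estimate_splits([10 * MB], 18, 64 * MB))
# Same file, but with the minimum split size forced very large:
print(estimate_splits([10 * MB], 18, 64 * MB, min_size=2**63 - 1))
```

The first call yields 18 splits and the second yields 1. If anything in the job raises the minimum split size, or the input is not splittable, each input file becomes exactly one map task regardless of the hint. In that case, the number of files in the input directories matters more than mapred.map.tasks.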
My Hadoop mapred-site.xml looks like this:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>serverX:54311</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.child.ulimit</name>
    <value>unlimited</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
    <final>true</final>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2000m</value>
  </property>
</configuration>
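For reference, here is the capacity implied by the mapred-site.xml above. This assumes all 10 nodes run a TaskTracker with this same configuration (the job tracker host, serverX, may or may not be one of them). The function name is hypothetical.

```python
def cluster_capacity(nodes, map_slots, reduce_slots, child_heap_mb):
    """Concurrent task slots and worst-case child JVM heap per node."""
    return {
        "map_slots": nodes * map_slots,
        "reduce_slots": nodes * reduce_slots,
        # Each running task gets its own child JVM started with -Xmx<child_heap_mb>m.
        "heap_per_node_mb": (map_slots + reduce_slots) * child_heap_mb,
    }


cap = cluster_capacity(nodes=10, map_slots=2, reduce_slots=2, child_heap_mb=2000)
print(cap)
print("18 maps fit in one wave:", 18 <= cap["map_slots"])
print("9 reduces fit in one wave:", 9 <= cap["reduce_slots"])
```

So the cluster has room for 20 concurrent map tasks, and the requested 18 maps and 9 reduces would each run in a single wave. With all four slots busy, a node could need up to 8000 MB of child heap.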
Am I missing something in the configuration?
Right now, with a single mapper, the map task takes about 4 minutes on
average to advance 1%.
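To put that rate in perspective, here is a rough extrapolation. It assumes progress stays linear and, for the multi-mapper case, that work divides evenly across mappers with no scheduling overhead. That makes the second figure an optimistic lower bound, not a prediction. The function name is illustrative.

```python
def map_phase_minutes(minutes_per_percent, num_mappers=1):
    """Rough map-phase duration extrapolated from the observed progress rate.

    Assumes linear progress and perfectly even work division across mappers.
    """
    single_mapper_total = minutes_per_percent * 100
    return single_mapper_total / num_mappers


print(map_phase_minutes(4))      # one mapper
print(map_phase_minutes(4, 18))  # ideal case with 18 mappers
```

At 4 minutes per 1%, a single mapper needs about 400 minutes (roughly 6 hours 40 minutes) for the map phase. Spreading the same work evenly over 18 mappers would bring that down to roughly 22 minutes at best.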
Thank you,
Rafael Alfaro