Hi,
I also tried calling it programmatically, but I am facing the same issue: only a 
single MapTask runs, and it spills the map output continuously. Hence I am 
not able to generate the output for a large matrix multiplication.

Code snippet:

DistributedRowMatrix a = new DistributedRowMatrix(
    new Path("/test/points/matrixA"), new Path("/test/temp"),
    Integer.parseInt("100"), Integer.parseInt("100000"));
DistributedRowMatrix b = new DistributedRowMatrix(
    new Path("/test/points/matrixA"), new Path("tempDir"),
    Integer.parseInt("100"), Integer.parseInt("100000"));
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://DS-1078D24B4736:10818");
conf.set("mapred.child.java.opts", "-Xmx2048m");
conf.set("mapred.max.split.size","10485760");
a.setConf(conf);
b.setConf(conf);
a.times(b);

Where am I going wrong? Any ideas?

Thanks
Stuti
-----Original Message-----
From: Stuti Awasthi 
Sent: Wednesday, January 16, 2013 2:55 PM
To: Mahout User List
Subject: RE: MatrixMultiplicationJob runs with 1 mapper only ?

Hey Sean,
Thanks for the response. The MatrixMultiplicationJob help shows the usage as:
usage: <command> [Generic Options] [Job-Specific Options] 

Here the generic options can be provided with -D <property=value>. Hence I tried 
the command-line -D options, but they do not seem to have any effect. This usage 
is also suggested in:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/common/AbstractJob.html
 

After your suggestion I noticed one thing: I am currently passing 
arguments as -D<property=value> rather than -D <property=value>. I also tried 
with a space between -D and property=value, but then it gives an error like:
13/01/16 14:21:47 ERROR common.AbstractJob: Unexpected /test/points/matrixA 
while processing Job-Specific Options:

No such error occurs if I pass the arguments without a space after -D.

From Hadoop: The Definitive Guide: "Do not confuse setting Hadoop 
properties using the -D property=value option to GenericOptionsParser (and 
ToolRunner) with setting JVM system properties using the 
-Dproperty=value option to the java command. The syntax for JVM system 
properties does not allow any whitespace between the D and the property name, 
whereas GenericOptionsParser requires them to be separated by whitespace."

Hence I suppose that generic options should be passed as -D property=value 
rather than -Dproperty=value.

Additionally, I tried -Dmapred.max.split.size=10485760 on the command line, 
but again only a single MapTask started.

Please suggest.


-----Original Message-----
From: Sean Owen [mailto:[email protected]]
Sent: Wednesday, January 16, 2013 1:23 PM
To: Mahout User List
Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?

It's up to Hadoop in the end.

Try calling FileInputFormat.setMaxInputSplitSize() with a smallish value, like 
your 10MB (10000000).

I don't know if Hadoop params can be set as sys properties like that anyway?
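
For reference, a minimal sketch of the arithmetic behind that suggestion. It 
assumes the input is one splittable file read through the newer-API 
FileInputFormat, which picks a split size of max(minSize, min(maxSize, 
blockSize)) and lets the last split run up to 10% over. The class name 
SplitEstimate and the exact 104 MiB file length are assumptions made for 
illustration; a job whose input format does not split files will not follow 
this estimate.

```java
public class SplitEstimate {
    // FileInputFormat allows the final split to be up to 10% larger than the split size.
    private static final double SPLIT_SLOP = 1.1;

    // Split size rule: the block size, capped by maxSize and floored by minSize.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts the splits produced for one splittable file of the given length.
    public static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // the leftover tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long fileLength = 104L * 1024 * 1024; // assumed: the 104MB input file
        long blockSize = 64L * 1024 * 1024;   // default 64MB block size
        long minSize = 1L;

        // No maximum split size set: the block size decides.
        long defaultSplit = computeSplitSize(blockSize, minSize, Long.MAX_VALUE);
        System.out.println("no max:   " + countSplits(fileLength, defaultSplit) + " splits");

        // Maximum split size of 10000000 bytes, as suggested above.
        long cappedSplit = computeSplitSize(blockSize, minSize, 10_000_000L);
        System.out.println("10MB max: " + countSplits(fileLength, cappedSplit) + " splits");
    }
}
```

With these assumed numbers the sketch reports 2 splits when no maximum is set 
and 11 splits with the 10000000-byte maximum.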

On Wed, Jan 16, 2013 at 7:48 AM, Stuti Awasthi <[email protected]> wrote:
> Hi,
>
> I am trying to multiply a dense matrix of size [100 x 100k]. The size of the 
> file is 104MB, and with the default block size of 64MB only 2 blocks are 
> created.
> So I reduced the block size to 10MB, and now my file is divided into 11 blocks 
> across the cluster. The cluster has 10 nodes: 1 NN/JT and 9 DN/TT.
>
> Every time I run the Mahout MatrixMultiplicationJob from the command line, I 
> can see in the JobTracker web UI that only 1 map task is launched. According 
> to my understanding of InputSplit, there should be 11 map tasks launched.
> Apart from this, the map task stays at 0.99% completion, and in the task logs 
> I can see that the map task is spilling the map output.
>
> Mahout Command:
>
> mahout matrixmult -Dmapred.child.java.opts=-Xmx1024M
> -Dfs.inmemory.size.mb=200 -Dio.sort.factor=100 -Dio.sort.mb=200
> -Dio.file.buffer.size=131072 --inputPathA /test/matrixA --numRowsA 100 
> --numColsA 100000 --inputPathB /test/matrixA --numRowsB 100 --numColsB
> 100000 --tempDir /test/temp
>
> I want to know why only 1 map task is launched every time, and 
> how I can tune the cluster's performance so that I can perform a dense matrix 
> multiplication of the order [90K x 1 Million].
>
> Thanks
> Stuti
