The issue is that my matrix currently has dimensions (100 x 100k); later it could
be (1M x 10M) or bigger.
 
Even now, the job runs with a single mapper for (100 x 100k) and is not able
to complete. As I mentioned, the map task proceeds only to 0.99% and then
starts spilling the map output. Hence I want to tune my job so that Mahout
is able to complete it and I can utilize my cluster resources.
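
For a sense of scale, here is a rough back-of-the-envelope sketch of the data volumes involved. It assumes, as the DistributedRowMatrix javadoc appears to describe, that a.times(b) computes the transpose product A'B, so two 100 x 100k inputs would yield a 100k x 100k result. It also assumes dense storage at 8 bytes per entry with no serialization overhead. The class and method names are made up for illustration.

```java
public class MatrixSizeEstimate {
    // Bytes for a dense rows x cols matrix of 8-byte doubles.
    // Assumption: no compression and no per-vector overhead.
    static long denseBytes(long rows, long cols) {
        return Math.multiplyExact(Math.multiplyExact(rows, cols), 8L);
    }

    public static void main(String[] args) {
        long rows = 100L;
        long cols = 100_000L;

        // One input matrix: 100 x 100k.
        long inputBytes = denseBytes(rows, cols);
        // Result under the assumed A'B semantics: 100k x 100k.
        long outputBytes = denseBytes(cols, cols);

        System.out.println("input bytes:  " + inputBytes);
        System.out.println("output bytes: " + outputBytes);
        System.out.println("output/input: " + (outputBytes / inputBytes));
    }
}
```

If that assumption holds, the result is about a thousand times larger than either input, which would be consistent with a single map task spilling for a long time.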

As MatrixMultiplicationJob is a MapReduce job, it should be able to handle
parallel map tasks. I am not sure whether there is an algorithmic constraint
due to which it runs with only a single mapper?
I followed the thread below so that I could set the Configuration myself
rather than getting it with getConf(), but did not have any success:
http://lucene.472066.n3.nabble.com/Setting-Number-of-Mappers-and-Reducers-in-DistributedRowMatrix-Jobs-td888980.html
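
The expectation of 11 map tasks for the 104MB file comes from simple split arithmetic. The sketch below is a simplified stand-alone model of how a plain splittable file input is usually divided; it is not Hadoop code. The split-size formula and the 1.1 slack factor are assumptions based on my understanding of FileInputFormat, and the class and method names are hypothetical.

```java
public class SplitEstimate {
    // Assumed slack factor: the final split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // Assumed rule: the split size is the block size clamped to [minSize, maxSize].
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts how many splits a single splittable file of fileBytes would produce.
    static int countSplits(long fileBytes, long splitSize) {
        int splits = 0;
        long remaining = fileBytes;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining > 0) {
            splits++; // the tail becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileBytes = 104 * mb;

        long defaultSplit = computeSplitSize(64 * mb, 1L, Long.MAX_VALUE);
        long smallSplit = computeSplitSize(10 * mb, 1L, Long.MAX_VALUE);

        System.out.println("64MB blocks: " + countSplits(fileBytes, defaultSplit));
        System.out.println("10MB blocks: " + countSplits(fileBytes, smallSplit));
    }
}
```

This only models a plain splittable input. An input format that does not split files would still yield one map task per file regardless of these numbers.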
 

Stuti

-----Original Message-----
From: Sean Owen [mailto:[email protected]] 
Sent: Wednesday, January 16, 2013 4:46 PM
To: Mahout User List
Subject: RE: MatrixMultiplicationJob runs with 1 mapper only ?

Why do you need multiple mappers? Is one too slow? Many are not necessarily
faster for small input.

On Jan 16, 2013 10:46 AM, "Stuti Awasthi" <[email protected]> wrote:

> Hi,
> I tried calling it programmatically as well but am facing the same issue:
> only a single map task runs, and it spills the map output continuously.
> Hence I am not able to generate the output for a large matrix multiplication.
>
> Code Snippet :
>
> DistributedRowMatrix a = new DistributedRowMatrix(
>     new Path("/test/points/matrixA"), new Path("/test/temp"),
>     Integer.parseInt("100"), Integer.parseInt("100000"));
> DistributedRowMatrix b = new DistributedRowMatrix(
>     new Path("/test/points/matrixA"), new Path("tempDir"),
>     Integer.parseInt("100"), Integer.parseInt("100000"));
> Configuration conf = new Configuration();
> conf.set("fs.default.name", "hdfs://DS-1078D24B4736:10818");
> conf.set("mapred.child.java.opts", "-Xmx2048m");
> conf.set("mapred.max.split.size", "10485760");
> a.setConf(conf);
> b.setConf(conf);
> a.times(b);
>
> Where am I going wrong? Any ideas?
>
> Thanks
> Stuti
> -----Original Message-----
> From: Stuti Awasthi
> Sent: Wednesday, January 16, 2013 2:55 PM
> To: Mahout User List
> Subject: RE: MatrixMultiplicationJob runs with 1 mapper only ?
>
> Hey Sean,
> Thanks for the response. The MatrixMultiplicationJob help shows the usage as:
> usage: <command> [Generic Options] [Job-Specific Options]
>
> Here a generic option can be provided with -D <property=value>. Hence I
> tried the command-line -D options, but they do not seem to have any
> effect. This is also suggested in:
>
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/common/AbstractJob.html
>
> One thing I noticed after your suggestion is that I am currently
> passing arguments as -D<property=value> rather than -D
> <property=value>. I also tried with a space between -D and
> property=value, but then it gives an error like:
> 13/01/16 14:21:47 ERROR common.AbstractJob: Unexpected /test/points/matrixA while processing Job-Specific Options:
>
> No such error occurs if I pass the arguments without a space after -D.
>
> According to Hadoop: The Definitive Guide: "Do not confuse setting Hadoop
> properties using the -D property=value option to GenericOptionsParser
> (and ToolRunner) with setting JVM system properties using the
> -Dproperty=value option to the java command. The syntax for JVM system
> properties does not allow any whitespace between the D and the
> property name, whereas GenericOptionsParser requires them to be
> separated by whitespace."
>
> Hence I suppose that generic options should be passed as -D
> property=value rather than -Dproperty=value.
>
> Additionally, I tried -Dmapred.max.split.size=10485760 on the
> command line, but again only a single map task started.
>
> Please suggest.
>
>
> -----Original Message-----
> From: Sean Owen [mailto:[email protected]]
> Sent: Wednesday, January 16, 2013 1:23 PM
> To: Mahout User List
> Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?
>
> It's up to Hadoop in the end.
>
> Try calling FileInputFormat.setMaxInputSplitSize() with a smallish 
> value, like your 10MB (10000000).
>
> I don't know if Hadoop params can be set as sys properties like that 
> anyway?
>
> On Wed, Jan 16, 2013 at 7:48 AM, Stuti Awasthi <[email protected]>
> wrote:
> > Hi,
> >
> > I am trying to multiply dense matrices of size [100 x 100k]. The size of
> > the file is 104MB, and with the default block size of 64MB only 2 blocks
> > are created.
> > So I reduced the block size to 10MB, and now my file is divided into 11
> > blocks across the cluster. The cluster has 10 nodes: 1 NN/JT and 9
> > DN/TT.
> >
> > Every time I run the Mahout MatrixMultiplicationJob from the command
> > line, I can see on the JobTracker web UI that only 1 map task is launched.
> > According to my understanding of InputSplit, there should be 11 map tasks
> > launched.
> > Apart from this, the map task stays at 0.99% completion, and in the task
> > logs I can see that it is spilling the map output.
> >
> > Mahout Command:
> >
> > mahout matrixmult -Dmapred.child.java.opts=-Xmx1024M
> > -Dfs.inmemory.size.mb=200 -Dio.sort.factor=100 -Dio.sort.mb=200
> > -Dio.file.buffer.size=131072 --inputPathA /test/matrixA --numRowsA 
> > 100 --numColsA 100000 --inputPathB /test/matrixA --numRowsB 100 
> > --numColsB
> > 100000 --tempDir /test/temp
> >
> > Now I want to know why only 1 map task is launched every time,
> > and how I can performance-tune the cluster so that I can perform
> > dense matrix multiplication of the order [90K x 1 Million].
> >
> > Thanks
> > Stuti
> >
>
