Hi,

Here are the stats:
Cluster size: 8 DataNodes, each configured with 10 map task slots.
So the total map task capacity is 80.

Attempt 1
1. Created a file containing a 100 x 10000 matrix.
2. Split this file into 20 part files.
3. Submitted it to the Mahout MatrixMultiplicationJob, which launched 20 map tasks.
The job completed in 1 hour, 7 minutes and 48 seconds.

Attempt 2
1. Same file.
2. Split this file into 50 part files.
3. Submitted it to the Mahout MatrixMultiplicationJob, which launched 50 map tasks.
The job failed.
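The slot arithmetic behind the two attempts can be written out as a small plain-Java sketch. SlotMath and its methods are hypothetical helpers, not Hadoop APIs, and the model assumes every map slot is free (no other jobs running, no locality constraints).

```java
// Hypothetical helper: back-of-the-envelope map slot arithmetic for the cluster above.
class SlotMath {
    /** Total map slots = nodes x map slots per node. */
    static int totalMapSlots(int nodes, int mapSlotsPerNode) {
        return nodes * mapSlotsPerNode;
    }

    /** Number of "waves" needed to run all map tasks, assuming every slot is free. */
    static int waves(int mapTasks, int totalSlots) {
        return (mapTasks + totalSlots - 1) / totalSlots; // ceiling division
    }

    /** Fraction of map slots occupied by the first wave. */
    static double slotUtilization(int mapTasks, int totalSlots) {
        return Math.min(mapTasks, totalSlots) / (double) totalSlots;
    }

    public static void main(String[] args) {
        int slots = totalMapSlots(8, 10); // 8 nodes x 10 slots
        for (int tasks : new int[] {20, 50}) {
            System.out.printf("%d tasks: %d wave(s), %.1f%% of slots busy%n",
                    tasks, waves(tasks, slots), 100 * slotUtilization(tasks, slots));
        }
    }
}
```

Under that assumption both attempts fit in a single wave (25% and 62.5% of the 80 slots), so neither should have had to queue map tasks for lack of slots.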

Error:
13/01/30 11:44:54 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201301291845_0004_m_000008

On investigating further, I found many failed map tasks, each with the same error:
Task attempt_201301291845_0004_m_000000_0 failed to report status for 604 seconds. Killing!
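"Failed to report status for 604 seconds" means the attempt ran longer than the task timeout without reporting progress. As far as I know, in Hadoop 1.x that limit is mapred.task.timeout, which defaults to 600000 ms (10 minutes), and the check runs periodically, which would explain why the kill lands at roughly 604 rather than exactly 600 seconds. The plain-Java model below (TimeoutModel is hypothetical, not a Hadoop class) makes the rule concrete: what matters is the longest gap between progress reports, not the total runtime of the task.

```java
// Hypothetical model (not Hadoop code) of the task liveness rule:
// an attempt is killed if it goes longer than the timeout without reporting progress.
class TimeoutModel {
    // Assumption: Hadoop 1.x default for mapred.task.timeout is 600000 ms.
    static final long DEFAULT_TIMEOUT_MS = 600_000L;

    /**
     * @param reportTimesMs ascending times (ms since task start) at which progress was reported;
     *                      the task is assumed to start at time 0
     * @param endTimeMs     time at which the task finishes its work
     * @return true if some interval without a progress report exceeds timeoutMs
     */
    static boolean wouldBeKilled(long[] reportTimesMs, long endTimeMs, long timeoutMs) {
        long last = 0L;
        for (long t : reportTimesMs) {
            if (t - last > timeoutMs) {
                return true;
            }
            last = t;
        }
        return endTimeMs - last > timeoutMs;
    }

    public static void main(String[] args) {
        // Long computation with no report at all: 604 s of silence.
        System.out.println(wouldBeKilled(new long[] {}, 604_000L, DEFAULT_TIMEOUT_MS));
        // Same total runtime, but progress reported every 5 minutes.
        System.out.println(wouldBeKilled(new long[] {300_000L, 600_000L}, 604_000L, DEFAULT_TIMEOUT_MS));
    }
}
```

In a real job the two levers are raising mapred.task.timeout or having the mapper call reporter.progress() (old API) or context.progress() (new API) during long per-record work; which one is practical here depends on whether the Mahout mapper can be changed.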

What went wrong, and how can I improve the performance? I increased the number
of files so that the work would be distributed across more mappers and make use
of my cluster's capacity, but the job failed.

Any pointers would be helpful.

Thanks
Stuti



-----Original Message-----
From: satish verma [mailto:[email protected]] 
Sent: Tuesday, January 29, 2013 7:27 PM
To: [email protected]
Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?

I think I was able to get multiple reducers by setting the property
mapred.reduce.tasks = 10 in the MR code.


Try setting this. If it does not work, I will check my code and let you know.
But it is doable.
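With mapred.reduce.tasks set to 10, Hadoop's default HashPartitioner sends each map output key to reducer (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. For IntWritable keys the hash code is the int value itself, so, assuming the job keys its output by an int row index (my reading of the Mahout job), key j would land on reducer j % 10. Below is a plain-Java restatement of that formula; PartitionSketch is a hypothetical name, not the Hadoop class.

```java
import java.util.Arrays;

// Plain-Java restatement of the default hash partitioning formula (not the Hadoop class itself).
class PartitionSketch {
    /** Same arithmetic as Hadoop's HashPartitioner: clear the sign bit, then take the modulus. */
    static int partitionFor(int keyHash, int numReduceTasks) {
        return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Counts how many of the int keys 0..numKeys-1 each reducer receives (an int key hashes to itself). */
    static int[] keysPerReducer(int numKeys, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (int key = 0; key < numKeys; key++) {
            counts[partitionFor(Integer.hashCode(key), numReduceTasks)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 10000 int keys spread over 10 reducers.
        System.out.println(Arrays.toString(keysPerReducer(10_000, 10)));
    }
}
```

Consecutive int keys spread evenly this way, so the reducer count, not the key distribution, is usually the limiting setting.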

The multiplication was the tricky part on the mapper side; the reducer part was easy.


On Tuesday, 29 January 2013, Stuti Awasthi wrote:

> Hey Satish,
> Thanks a ton. It worked for me too. Is there any way to increase the
> number of reducers as well? Currently only a single reducer is running.
>
> Thanks
> Stuti
>
> -----Original Message-----
> From: satish verma [mailto:[email protected]]
> Sent: Monday, January 28, 2013 7:13 PM
> To: [email protected]
> Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?
>
> I faced this problem too.
>
> Split the sequence file that holds your data into multiple files, then
> run the matrix multiplication with the folder as input. If the folder
> contains N sequence files, N mappers will be created.
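MatrixMultiplicationJob reads its inputs through CompositeInputFormat, which joins data sources that are sorted and partitioned the same way. So when splitting by hand, both inputs (A and B) should presumably be split into the same number of part files, with matching row keys in matching parts and each part sorted by key. The plain-Java sketch below shows one such scheme, contiguous row ranges; RowPartitioner is hypothetical, and writing the actual SequenceFiles is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: assigns row indices 0..numRows-1 to numParts contiguous, sorted ranges.
// Applying the same ranges to both matrices keeps them "sorted and partitioned the same way".
class RowPartitioner {
    /** Returns [start, end) row ranges; the first (numRows % numParts) parts get one extra row. */
    static List<int[]> contiguousRanges(int numRows, int numParts) {
        if (numRows < 0 || numParts <= 0) {
            throw new IllegalArgumentException("need numRows >= 0 and numParts > 0");
        }
        List<int[]> ranges = new ArrayList<>();
        int base = numRows / numParts;
        int extra = numRows % numParts;
        int start = 0;
        for (int p = 0; p < numParts; p++) {
            int size = base + (p < extra ? 1 : 0);
            ranges.add(new int[] {start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 100 rows into 20 part files: 5 rows per part.
        for (int[] r : contiguousRanges(100, 20)) {
            System.out.println("part rows [" + r[0] + ", " + r[1] + ")");
        }
    }
}
```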
>
>
>
> On Monday, 28 January 2013, Sean Owen wrote:
>
> > These are settings to Hadoop, not Mahout. You may need to set them 
> > in your cluster config. They are still only suggestions.
> >
> > The question still remains why you think you need several mappers. Why?
> >
> > On Mon, Jan 28, 2013 at 1:28 PM, Stuti Awasthi 
> > <[email protected]>
> > wrote:
> > > Hi,
> > > I would like to once again summarize all the steps I performed.
> > >
> > > Issue: the MatrixMultiplication example is executed with only 1 map task.
> > >
> > > Steps:
> > > 1. I created a 104MB file, which is divided into 11 blocks with a
> > >    block size of 10MB. The file contains a 200 x 100000 matrix.
> > > 2. I exported $MAHOUT_OPTS as follows:
> > >           $ echo $MAHOUT_OPTS
> > >           -Dmapred.min.split.size=10485760 -Dmapred.map.tasks=7
> > > 3. I tried to execute the matrix multiplication example from the command line:
> > >           mahout matrixmult --inputPathA /test/points/matrixA --numRowsA 200 \
> > >             --numColsA 100000 --inputPathB /test/points/matrixA --numRowsB 200 \
> > >             --numColsB 100000 --tempDir /test/temp
> > >
> > > When I check the JobTracker UI, it shows the following for the running job:
> > > Running Map Tasks : 1
> > > Occupied Map Slots: 1
> > >
> > > How can I distribute the map work across multiple mappers for the
> > > MatrixMultiplication job dynamically?
> > > Is it even possible for MatrixMultiplication to run distributed across
> > > multiple mappers, given that it internally uses CompositeInputFormat?
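On the CompositeInputFormat question: as far as I can tell from the Hadoop 1.x source, CompositeInputFormat.getSplits sets mapred.min.split.size to Long.MAX_VALUE before asking the child input formats for splits, so every input file becomes exactly one split regardless of mapred.min.split.size or mapred.map.tasks. That would be consistent with seeing one map task per file. The plain-Java sketch below mirrors the old-API FileInputFormat arithmetic (splitSize = max(minSize, min(goalSize, blockSize)), with a 10% slop allowed on the last split). SplitMath is a hypothetical name, and "MB" is taken as 2^20 bytes.

```java
// Plain-Java mirror of the old-API FileInputFormat split arithmetic (not the Hadoop class itself).
class SplitMath {
    static final double SPLIT_SLOP = 1.1; // the last split may be up to 10% larger than splitSize

    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Number of splits generated for a single file of fileSize bytes. */
    static int countSplits(long fileSize, int requestedMapTasks, long minSize, long blockSize) {
        long goalSize = fileSize / Math.max(1, requestedMapTasks); // mapred.map.tasks is only a hint
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        long remaining = fileSize;
        int splits = 0;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 104 * mb;   // the 104MB file from this thread
        long blockSize = 10 * mb;   // 10MB blocks
        long minSize = 10_485_760L; // -Dmapred.min.split.size=10485760
        System.out.println(countSplits(fileSize, 7, minSize, blockSize));        // plain FileInputFormat
        System.out.println(countSplits(fileSize, 7, Long.MAX_VALUE, blockSize)); // with the composite override
    }
}
```

With the settings above this gives 11 splits for a plain FileInputFormat but a single split once the minimum split size is forced to Long.MAX_VALUE, which is why splitting the input into several files is the lever that works here.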
> > >
> > > Please suggest.
> > >
> > > Thanks
> > > Stuti
> > >
> > >
> > > -----Original Message-----
> > > From: Sean Owen [mailto:[email protected]]
> > > Sent: Wednesday, January 23, 2013 6:42 PM
> > > To: Mahout User List
> > > Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?
> > >
> > > Mappers are usually extremely fast since they start on top of the
> > > data, and their job is usually just parsing and emitting key-value
> > > pairs. Hadoop's choices are usually fine.
> > >
> > > If not, it is usually because the mapper is emitting far more data
> > > than it ingests. Are you computing some kind of Cartesian product of the input?
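For MatrixMultiplicationJob the answer to the Cartesian-product question is probably yes. My understanding, worth checking against the Mahout source for your version, is that the job computes A^T * B: for each row index i shared by A and B, the mapper takes rows a_i and b_i and, for every non-zero a_ij, emits the vector a_ij * b_i under key j, and the reducer sums those vectors. The plain-Java sketch below (OuterProductSketch is hypothetical and uses dense arrays) does the same computation in memory and counts how many doubles the "map" step emits for a dense input.

```java
// Hypothetical in-memory sketch of computing A^T * B as a sum of row outer products,
// the way a map-side join over matching rows of A and B can do it.
class OuterProductSketch {
    /** Returns A^T * B for non-empty A (n x p) and B (n x q) with the same number of rows. */
    static double[][] transposeTimes(double[][] a, double[][] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException("A and B must be non-empty with the same number of rows");
        }
        int p = a[0].length;
        int q = b[0].length;
        double[][] result = new double[p][q];
        for (int i = 0; i < a.length; i++) {           // one "map" call per shared row index i
            for (int j = 0; j < p; j++) {
                if (a[i][j] == 0.0) {
                    continue;                           // only non-zero entries emit anything
                }
                for (int k = 0; k < q; k++) {           // "emit" a_ij * b_i under key j ...
                    result[j][k] += a[i][j] * b[i][k];  // ... and the "reduce" step sums per key
                }
            }
        }
        return result;
    }

    /** Doubles emitted by the map step for dense inputs: rows * colsA * colsB. */
    static long emittedValues(long rows, long colsA, long colsB) {
        return rows * colsA * colsB;
    }

    public static void main(String[] args) {
        // Dense 100 x 10000 input multiplied with itself.
        System.out.println("input values:   " + 100L * 10_000L);
        System.out.println("emitted values: " + emittedValues(100, 10_000, 10_000));
    }
}
```

For a dense 100 x 10000 matrix that is 10^6 input values against 10^10 emitted values, four orders of magnitude more output than input, which fits Sean's description and could plausibly account for very long-running map tasks.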
> > >
> > > That's slow no matter what. More mappers may increase parallelism, but
> > > it's still a lot of I/O. Avoid it if you can by sampling or pruning
> > > unimportant values. Otherwise, try to implement a Combiner.
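A combiner applies the reducer's summation to each mapper's own output before the shuffle. The hypothetical plain-Java sketch below (CombinerSketch, with map output modeled as key/vector pairs of equal length) shows the effect: records from one mapper that share a key collapse into a single record.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of what a summing combiner does to one mapper's output before the shuffle.
class CombinerSketch {
    /** A map output record: an int key with a dense vector value. */
    static final class KeyedVector {
        final int key;
        final double[] values;

        KeyedVector(int key, double[] values) {
            this.key = key;
            this.values = values;
        }
    }

    /** Sums vectors (assumed equal length) that share a key; one output record per key, ordered by key. */
    static List<KeyedVector> combine(List<KeyedVector> mapOutput) {
        Map<Integer, double[]> sums = new TreeMap<>();
        for (KeyedVector kv : mapOutput) {
            double[] acc = sums.computeIfAbsent(kv.key, k -> new double[kv.values.length]);
            for (int i = 0; i < acc.length; i++) {
                acc[i] += kv.values[i];
            }
        }
        List<KeyedVector> combined = new ArrayList<>();
        for (Map.Entry<Integer, double[]> e : sums.entrySet()) {
            combined.add(new KeyedVector(e.getKey(), e.getValue()));
        }
        return combined;
    }

    public static void main(String[] args) {
        List<KeyedVector> out = new ArrayList<>();
        // Two rows handled by the same mapper both emit a vector for key 0 and for key 1.
        out.add(new KeyedVector(0, new double[] {1, 2}));
        out.add(new KeyedVector(1, new double[] {3, 4}));
        out.add(new KeyedVector(0, new double[] {5, 6}));
        out.add(new KeyedVector(1, new double[] {7, 8}));
        System.out.println(out.size() + " records before, " + combine(out).size() + " after");
    }
}
```

A combiner only shrinks what is shuffled; the mapper still has to generate every record first, so it helps with I/O but not with the cost of producing the outer products.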
> > > On Jan 23, 2013 12:04 PM, "Jonas Grote" <[email protected]> wrote:
> > >
> > >> I'd play with the mapred.map.tasks option. Setting it to
> > >> something bigger than 1 gave me performance improvements for
> > >> various Hadoop jobs on my cluster.
> > >>
> > >>
> > >> 2013/1/16 Ashish <[email protected]>
> > >>
> > >> > I am afraid I don't know the answer; I need to experiment a bit more.
> > >> > I have not used CompositeInputFormat, so I cannot comment.
> > >> >
> > >> > Probably someone else on the ML (mailing list) would be able to
> > >> > guide you here.
> > >> >
> > >> >
> > >> > On Wed, Jan 16, 2013 at 6:01 PM, Stuti Awasthi
