Hi,

Following are the stats:

Cluster size: 8 DataNodes, each configured with a capacity of 10 map tasks, so the total map task capacity is 80.

Attempt 1
1. Created a file containing a matrix of dimension 100 x 10000.
2. Split this file into 20 part files.
3. Submitted it to the Mahout MatrixMultiplicationJob. It launched 20 map tasks.
The job completed in 1 hour, 7 minutes and 48 seconds.

Attempt 2
1. Same file.
2. Split this file into 50 part files.
3. Submitted it to the Mahout MatrixMultiplicationJob. It launched 50 map tasks.
The job failed with this error:

13/01/30 11:44:54 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201301291845_0004_m_000008

On investigating further, I found many failed map tasks, each with the same error:

Task attempt_201301291845_0004_m_000000_0 failed to report status for 604 seconds. Killing!

What went wrong? How can I improve the performance? I thought that increasing the number of files would distribute the work across more mappers and make better use of my cluster capacity, but the job failed.

Any pointers will be useful.

Thanks
Stuti

-----Original Message-----
From: satish verma [mailto:[email protected]]
Sent: Tuesday, January 29, 2013 7:27 PM
To: [email protected]
Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?

I think I was able to create multiple reducers by setting the property mapred.reduce.tasks = 10 in the MR code. Try setting this. If it does not work, I will check my code and let you know. But it is doable. The multiplication was tricky in the mapper part; the reducer part was easy.

On Tuesday, 29 January 2013, Stuti Awasthi wrote:

> Hey Satish,
> Thanks a ton. It worked for me also. Is there any way to increase the
> number of reducers as well? Currently only a single reducer is working.
>
> Thanks
> Stuti
>
> -----Original Message-----
> From: satish verma [mailto:[email protected]]
> Sent: Monday, January 28, 2013 7:13 PM
> To: [email protected]
> Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?
>
> I faced this problem too.
>
> Split the seq file that holds your data into multiple files.
> Then run the matrix multiplication with the folder as input. If the folder
> contains N sequence files, N mappers will be created.
>
> On Monday, 28 January 2013, Sean Owen wrote:
>
> > These are settings to Hadoop, not Mahout. You may need to set them
> > in your cluster config. They are still only suggestions.
> >
> > The question still remains why you think you need several mappers. Why?
> >
> > On Mon, Jan 28, 2013 at 1:28 PM, Stuti Awasthi <[email protected]> wrote:
> > > Hi,
> > > I would like to again consolidate all the steps which I performed.
> > >
> > > Issue: The MatrixMultiplication example is executed with only 1 map task.
> > >
> > > Steps:
> > > 1. I created a file of size 104MB, which is divided into 11 blocks of
> > > size 10MB each. The file contains a matrix of size 200x100000.
> > > 2. I exported $MAHOUT_OPTS as follows:
> > > $ echo $MAHOUT_OPTS
> > > -Dmapred.min.split.size=10485760 -Dmapred.map.tasks=7
> > > 3. Tried to execute the matrix multiplication example from the command line:
> > > mahout matrixmult --inputPathA /test/points/matrixA --numRowsA 200 --numColsA 100000 --inputPathB /test/points/matrixA --numRowsB 200 --numColsB 100000 --tempDir /test/temp
> > >
> > > When I check the JobTracker UI, it shows the following for the running job:
> > > Running Map Tasks : 1
> > > Occupied Map Slots: 1
> > >
> > > How can I distribute the map tasks across different mappers for the
> > > MatrixMultiplication job dynamically?
> > > Is it even possible for MatrixMultiplication to run distributed on
> > > multiple mappers, given that it internally uses CompositeInputFormat?
> > >
> > > Please suggest.
> > >
> > > Thanks
> > > Stuti
> > >
> > >
> > > -----Original Message-----
> > > From: Sean Owen [mailto:[email protected]]
> > > Sent: Wednesday, January 23, 2013 6:42 PM
> > > To: Mahout User List
> > > Subject: Re: MatrixMultiplicationJob runs with 1 mapper only ?
> > >
> > > Mappers are usually extremely fast since they start themselves on top of
> > > the data and their job is usually just parsing and emitting key-value
> > > pairs. Hadoop's choices are usually fine.
> > >
> > > If not, it is usually because the mapper is emitting far more data than
> > > it ingests. Are you computing some kind of Cartesian product of the input?
> > >
> > > That's slow no matter what. More mappers may increase parallelism, but
> > > it's still a lot of I/O. Avoid it if you can by sampling or pruning
> > > unimportant values. Otherwise, try to implement a Combiner.
> > >
> > > On Jan 23, 2013 12:04 PM, "Jonas Grote" <[email protected]> wrote:
> > >
> > >> I'd play with the mapred.map.tasks option. Setting it to
> > >> something bigger than 1 gave me performance improvements for
> > >> various Hadoop jobs on my cluster.
> > >>
> > >>
> > >> 2013/1/16 Ashish <[email protected]>
> > >>
> > >> > I am afraid I don't know the answer. Need to experiment a bit more.
> > >> > I have not used CompositeInputFormat so cannot comment.
> > >> >
> > >> > Probably someone else on the ML (Mailing List) would be able to
> > >> > guide here.
> > >> >
> > >> >
> > >> > On Wed, Jan 16, 2013 at 6:01 PM, Stuti Awasthi
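For reference, the numbers in the two attempts at the top of this thread can be worked through with a small Python sketch. It assumes one map task per part file (as observed: 20 files gave 20 map tasks, 50 files gave 50) and a contiguous, even split of the 100 matrix rows across part files, which the thread does not actually specify. `split_rows` and `slot_utilization` are hypothetical helper names used only for illustration; they are not part of Mahout or Hadoop.

```python
def split_rows(num_rows, num_parts):
    """Return contiguous (start, end) row ranges, one per part file.

    `end` is exclusive. Any remainder rows are spread over the first parts.
    Assumption: the matrix is split by rows, evenly and contiguously.
    """
    base, extra = divmod(num_rows, num_parts)
    ranges = []
    start = 0
    for part in range(num_parts):
        size = base + (1 if part < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def slot_utilization(num_map_tasks, nodes=8, slots_per_node=10):
    """Fraction of the cluster's map slots occupied by one wave of map tasks.

    Defaults mirror the cluster described in the thread: 8 nodes x 10 slots.
    Assumption: one map task per part file, all scheduled at the same time.
    """
    capacity = nodes * slots_per_node
    return min(num_map_tasks, capacity) / capacity


if __name__ == "__main__":
    for parts in (20, 50):
        ranges = split_rows(100, parts)
        rows_per_part = ranges[0][1] - ranges[0][0]
        print(parts, "part files:", rows_per_part, "rows each,",
              "slot utilization", slot_utilization(parts))
```

With these figures, neither attempt fills the 80 available map slots: 20 part files occupy 25% of them and 50 part files 62.5%. This only describes how the work is divided; it says nothing about why the tasks in the second attempt timed out.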
