The command line call is this:

  hadoop-0.20 jar mahout-core-0.4-SNAPSHOT.job org.apache.mahout.math.hadoop.GenSimMatrixJob -Dmapred.input.dir=/user/kris/simMatrix/mahoutIndexTFIDF.vec -Dmapred.map.tasks=8 -Dmapred.reduce.tasks=8 --tempDir /tmp/matrixMulitiplication/ --numDocs 12843450 --numTerms 719050
org.apache.mahout.math.hadoop.GenSimMatrixJob is my own class that calls the matrix transposition and then the multiplication. Could it be because I'm using Hadoop 0.20?

Kris

2010/6/14 Sean Owen <[email protected]>

> That's odd, since those methods just set the exact same parameter in Hadoop:
>
>   public void setNumMapTasks(int n) { setInt("mapred.map.tasks", n); }
>
> It is indeed not read by anything except Hadoop.
>
> What's your command line? There must be some little glitch here that's
> making it not be set as expected. You should be able to set this on the
> command line, or in the Hadoop XML files, and it shouldn't impact the
> Mahout code either way.
>
> On Mon, Jun 14, 2010 at 3:39 PM, Kris Jack <[email protected]> wrote:
> > Hi Sean,
> >
> > Yes, I tried using those parameters but they didn't seem to have any
> > effect. What's more, the number of reducers never increased above 1,
> > meaning that I never got to see any results when running with large
> > data sets (doing matrix multiplication).
> >
> > I looked in the code to find where these parameters were being read by
> > the jobs that I was using (i.e. MatrixMultiplicationJob and TransposeJob)
> > but couldn't find them. As a result, I modified their builders and called
> > the setNumMapTasks and setNumReduceTasks functions on the conf objects.
> > This now works from the command line using the parameters that you
> > suggested.
> >
> > Please do let me know if I was just not calling them correctly, or if
> > you think that there already exists an alternative way to do this. I
> > would like to use Mahout as it was intended and not make lots of little
> > changes myself if they aren't necessary.
> >
> > Thanks,
> > Kris
> >
> > 2010/6/11 Sean Owen <[email protected]>
> >
> >> -Dmapred.map.tasks and same for reduce? These should be Hadoop params
> >> you set directly to Hadoop.
> >> On Fri, Jun 11, 2010 at 5:07 PM, Kris Jack <[email protected]> wrote:
> >> > Hi everyone,
> >> >
> >> > I am running code that uses some of the jobs defined in the
> >> > DistributedRowMatrix class and would like to know if I can define
> >> > the number of mappers and reducers that they use when running. In
> >> > particular, with the jobs:
> >> >
> >> > - MatrixMultiplicationJob
> >> > - TransposeJob
> >> >
> >> > I am comfortable with changing the code to get this to work, but I
> >> > was wondering if the algorithmic logic being employed would allow
> >> > multiple mappers and reducers.
> >> >
> >> > Thanks,
> >> > Kris
> >
> > --
> > Dr Kris Jack,
> > http://www.mendeley.com/profiles/kris-jack/

--
Dr Kris Jack,
http://www.mendeley.com/profiles/kris-jack/
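[Editor's note] Sean's quoted snippet shows that JobConf.setNumMapTasks is just a one-line wrapper that writes the "mapred.map.tasks" key into the job configuration. A minimal, self-contained sketch of that mechanism is below; ToyJobConf is a hypothetical stand-in for Hadoop's real JobConf (a plain HashMap replaces the actual configuration), for illustration only:

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy stand-in for Hadoop's JobConf, illustrating that setNumMapTasks
 * and setNumReduceTasks simply write the corresponding configuration
 * keys. This is NOT the real Hadoop class, just a sketch of the idea.
 */
public class ToyJobConf {
    private final Map<String, String> props = new HashMap<>();

    public void setInt(String key, int value) {
        props.put(key, Integer.toString(value));
    }

    // Mirrors the one-line delegation quoted in the thread.
    public void setNumMapTasks(int n)    { setInt("mapred.map.tasks", n); }
    public void setNumReduceTasks(int n) { setInt("mapred.reduce.tasks", n); }

    public String get(String key) {
        return props.get(key);
    }

    public static void main(String[] args) {
        ToyJobConf conf = new ToyJobConf();
        conf.setNumMapTasks(8);
        conf.setNumReduceTasks(8);
        System.out.println(conf.get("mapred.map.tasks"));    // prints 8
        System.out.println(conf.get("mapred.reduce.tasks")); // prints 8
    }
}
```

Because the real method is equally thin, calling setNumMapTasks(8) in code and passing -Dmapred.map.tasks=8 on the command line end up setting the same key. If the value never takes effect, one plausible culprit (hedged, not confirmed from this thread) is that a custom driver class does not feed the -D generic options into its JobConf, e.g. because it is not run via Hadoop's ToolRunner.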
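[Editor's note] As Sean mentions, the same parameters can also be set in the Hadoop XML configuration files rather than on the command line. A sketch of the relevant mapred-site.xml entries, assuming Hadoop 0.20 property names; note that mapred.reduce.tasks is honored directly, whereas mapred.map.tasks is only a hint to Hadoop (the actual number of map tasks is driven by the input splits):

```xml
<!-- mapred-site.xml: default task counts for submitted jobs -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>8</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>8</value>
  </property>
</configuration>
```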
