Hi Sean,
I'm calling getConf() and using it to configure my DistributedRowMatrix.
//
Configuration originalConf = getConf();
String inputPathString = originalConf.get("mapred.input.dir");
String outputTmpPathString = parsedArgs.get("--tempDir");
int numDocs = Integer.parseInt(parsedArgs.get("--numDocs"));
int numTerms = Integer.parseInt(parsedArgs.get("--numTerms"));
DistributedRowMatrix text = new DistributedRowMatrix(
    new Path(inputPathString), new Path(outputTmpPathString), numDocs, numTerms);
text.configure(new JobConf(getConf()));
DistributedRowMatrix transpose = text.transpose();
//
While debugging, I noticed that the originalConf object does not contain the
values that I passed in on the command line. When text.transpose() is called,
the transpose job's conf doesn't have the right values for the mappers and
reducers either. Where am I supposed to get the command-line values so that
these jobs use them?
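
To make the question concrete, here is a minimal plain-Java sketch of the idea behind generic-option handling. It separates -Dkey=value arguments from the application's own arguments and stores them in a configuration map. In Hadoop this separation is done by GenericOptionsParser when a driver is launched through ToolRunner. The class GenericOptionsSketch, its Parsed holder and its parse method are hypothetical names made up for this illustration. This is not Hadoop's implementation, and the sketch has no Hadoop dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: mimics the idea of generic-option parsing.
// Not Hadoop code; all names here are hypothetical.
public class GenericOptionsSketch {

    /** Holds the parsed -D properties and the arguments left for the application. */
    public static final class Parsed {
        public final Map<String, String> conf = new LinkedHashMap<>();
        public final List<String> remainingArgs = new ArrayList<>();
    }

    /** Moves every -Dkey=value argument into conf; all other arguments stay in remainingArgs. */
    public static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                // Text between "-D" and "=" is the key, the rest is the value.
                parsed.conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                parsed.remainingArgs.add(arg);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // The arguments from the command line quoted further down in this thread.
        String[] commandLine = {
            "-Dmapred.input.dir=/user/kris/simMatrix/mahoutIndexTFIDF.vec",
            "-Dmapred.map.tasks=8",
            "-Dmapred.reduce.tasks=8",
            "--tempDir", "/tmp/matrixMulitiplication/",
            "--numDocs", "12843450",
            "--numTerms", "719050"
        };
        Parsed parsed = parse(commandLine);

        // The parsed configuration carries the -D values.
        System.out.println(parsed.conf.get("mapred.reduce.tasks"));

        // A configuration created from scratch carries none of them.
        Map<String, String> fresh = new LinkedHashMap<>();
        System.out.println(fresh.get("mapred.reduce.tasks"));

        // Only the application's own options are left over.
        System.out.println(parsed.remainingArgs);
    }
}
```

The point of the sketch is that the -D values live only in the configuration object that was filled in by the parser. Any code that builds a fresh configuration instead of copying that one will not see them. Whether that is what happens inside transpose() is not established in this thread.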
Thanks,
Kris
2010/6/14 Sean Owen <[email protected]>
> Looks right to me. My next question is: are you calling getConf() to
> get Hadoop's configuration object rather than creating and configuring
> your own? If you created your own, you'd lose anything Hadoop parsed from
> its files and command line, which would explain why re-setting it yourself
> in the code works.
>
> I think we're all on 0.20.2 now, yes.
>
> On Mon, Jun 14, 2010 at 4:52 PM, Kris Jack <[email protected]> wrote:
> > Command line call is this -
> >
> > hadoop-0.20 jar mahout-core-0.4-SNAPSHOT.job
> > org.apache.mahout.math.hadoop.GenSimMatrixJob
> > -Dmapred.input.dir=/user/kris/simMatrix/mahoutIndexTFIDF.vec
> > -Dmapred.map.tasks=8 -Dmapred.reduce.tasks=8 --tempDir
> > /tmp/matrixMulitiplication/ --numDocs 12843450 --numTerms 719050
> >
> > org.apache.mahout.math.hadoop.GenSimMatrixJob is my own class that calls the
> > matrix transposition and then multiplication. Is it maybe because I'm using
> > Hadoop 0.20?
>
--
Dr Kris Jack,
http://www.mendeley.com/profiles/kris-jack/