I got a little further by reworking the class to extend AbstractJob:
//
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.math.hadoop.DistributedRowMatrix;
// AbstractJob also needs an import; its package depends on the Mahout version.

public class GenSimMatrixJob extends AbstractJob {

  public GenSimMatrixJob() {
  }

  @Override
  public int run(String[] strings) throws Exception {
    addOption("numDocs", "nd", "Number of documents in the input");
    addOption("numTerms", "nt", "Number of terms in the input");

    Map<String,String> parsedArgs = parseArguments(strings);
    if (parsedArgs == null) {
      // FIXME: argument parsing failed, so return a non-zero exit code
      return -1;
    }

    Configuration originalConf = getConf();
    String inputPathString = originalConf.get("mapred.input.dir");
    String outputTmpPathString = parsedArgs.get("--tempDir");
    int numDocs = Integer.parseInt(parsedArgs.get("--numDocs"));
    int numTerms = Integer.parseInt(parsedArgs.get("--numTerms"));

    // Rows are documents, columns are terms.
    DistributedRowMatrix text = new DistributedRowMatrix(inputPathString,
        outputTmpPathString, numDocs, numTerms);
    text.configure(new JobConf(getConf()));

    DistributedRowMatrix transpose = text.transpose();
    DistributedRowMatrix similarity = transpose.times(transpose);

    System.out.println("Similarity matrix lives: " + similarity.getRowPath());
    // 0 signals success to ToolRunner
    return 0;
  }

  public static void main(String[] args) throws Exception {
    ToolRunner.run(new GenSimMatrixJob(), args);
  }
}
//
Running it gives this error:
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
10-Jun-2010 15:16:28 org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
10-Jun-2010 15:16:28 org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10-Jun-2010 15:16:28 org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
10-Jun-2010 15:16:28 org.apache.hadoop.mapred.FileInputFormat listStatus
INFO: Total input paths to process : 1
10-Jun-2010 15:16:28 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Running job: job_local_0001
10-Jun-2010 15:16:28 org.apache.hadoop.mapred.FileInputFormat listStatus
INFO: Total input paths to process : 1
10-Jun-2010 15:16:28 org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
10-Jun-2010 15:16:28 org.apache.hadoop.io.compress.CodecPool getDecompressor
INFO: Got brand-new decompressor
10-Jun-2010 15:16:28 org.apache.hadoop.mapred.MapTask runOldMapper
INFO: numReduceTasks: 1
10-Jun-2010 15:16:28 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
INFO: io.sort.mb = 100
10-Jun-2010 15:16:29 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
INFO: data buffer = 79691776/99614720
10-Jun-2010 15:16:29 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
INFO: record buffer = 262144/327680
10-Jun-2010 15:16:29 org.apache.hadoop.mapred.LocalJobRunner$Job run
WARNING: job_local_0001
java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
    at org.apache.mahout.math.hadoop.TransposeJob$TransposeMapper.map(TransposeJob.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
10-Jun-2010 15:16:29 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: map 0% reduce 0%
10-Jun-2010 15:16:29 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Job complete: job_local_0001
10-Jun-2010 15:16:29 org.apache.hadoop.mapred.Counters log
INFO: Counters: 0
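
My current guess, not yet verified: the map input keys in the vectors file are LongWritable, while TransposeMapper is declared with IntWritable keys, so the mismatch only surfaces as a cast failure once map() is called. The plain-Java sketch below reproduces that kind of failure with stand-in classes. LongKey, IntKey, Mapper, IntKeyMapper and KeyMismatchSketch are made up for illustration; they are not the Hadoop or Mahout types.

```java
public class KeyMismatchSketch {
    // Hypothetical stand-ins for LongWritable and IntWritable.
    static class LongKey {
        final long value;
        LongKey(long value) { this.value = value; }
    }

    static class IntKey {
        final int value;
        IntKey(int value) { this.value = value; }
    }

    // A mapper declared for one specific key type.
    interface Mapper<K> {
        String map(K key);
    }

    static class IntKeyMapper implements Mapper<IntKey> {
        @Override
        public String map(IntKey key) {
            return "row " + key.value;
        }
    }

    // The runner only sees the key as Object, so the compiler cannot check it.
    // A wrong key type fails at run time, when the mapper is invoked.
    @SuppressWarnings("unchecked")
    static String runMapper(Mapper<?> mapper, Object key) {
        return ((Mapper<Object>) mapper).map(key);
    }

    // Converting the key before the mapper sees it avoids the failure.
    static IntKey toIntKey(LongKey key) {
        return new IntKey(Math.toIntExact(key.value));
    }

    public static void main(String[] args) {
        Mapper<IntKey> mapper = new IntKeyMapper();
        try {
            runMapper(mapper, new LongKey(0L));
        } catch (ClassCastException e) {
            System.out.println("mismatch: " + e.getClass().getSimpleName());
        }
        System.out.println(runMapper(mapper, toIntKey(new LongKey(2L))));
    }
}
```

If that guess is right, the fix would be either to rewrite the input with int keys or to have whatever writes the vectors emit them that way. I have not checked which of the two Mahout supports.
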
2010/6/10 Kris Jack wrote:
> In the attempt to create a document-document similarity matrix, I am
> getting the following error:
>
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> 10-Jun-2010 13:25:04 org.apache.hadoop.metrics.jvm.JvmMetrics init
> INFO: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 10-Jun-2010 13:25:04 org.apache.hadoop.mapred.JobClient configureCommandLineOptions
> WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 10-Jun-2010 13:25:04 org.apache.hadoop.mapred.JobClient configureCommandLineOptions
> WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> 10-Jun-2010 13:25:04 org.apache.hadoop.mapred.FileInputFormat listStatus
> INFO: Total input paths to process : 1
> 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO: Running job: job_local_0001
> 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.FileInputFormat listStatus
> INFO: Total input paths to process : 1
> 10-Jun-2010 13:25:05 org.apache.hadoop.util.NativeCodeLoader <clinit>
> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 10-Jun-2010 13:25:05 org.apache.hadoop.io.compress.CodecPool getDecompressor
> INFO: Got brand-new decompressor
> 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.MapTask runOldMapper
> INFO: numReduceTasks: 1
> 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
> INFO: io.sort.mb = 100
> 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
> INFO: data buffer = 79691776/99614720
> 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
> INFO: record buffer = 262144/327680
> 10-Jun-2010 13:25:05 org.apache.hadoop.mapred.LocalJobRunner$Job run
> WARNING: job_local_0001
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.IntWritable
>     at org.apache.mahout.math.hadoop.TransposeJob$TransposeMapper.map(TransposeJob.java:1)
>     at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
> 10-Jun-2010 13:25:06 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO: map 0% reduce 0%
> 10-Jun-2010 13:25:06 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
> INFO: Job complete: job_local_0001
> 10-Jun-2010 13:25:06 org.apache.hadoop.mapred.Counters log
> INFO: Counters: 0
> Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Job failed!
>     at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:163)
>     at org.apache.mahout.math.hadoop.GenSimMatrixLocal.generateMatrix(GenSimMatrixLocal.java:24)
>     at org.apache.mahout.math.hadoop.GenSimMatrixLocal.main(GenSimMatrixLocal.java:34)
> Caused by: java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
>     at org.apache.mahout.math.hadoop.DistributedRowMatrix.transpose(DistributedRowMatrix.java:158)
>     ... 2 more
>
>
> I created a test Solr index with 3 documents and generated a sparse feature
> matrix from it using Mahout's
> org.apache.mahout.utils.vectors.lucene.Driver.
>
> I then ran the following code with that sparse feature matrix
> (mahoutIndexTFIDF.vec) as input.
>
> public class GenSimMatrixLocal {
>
>   private void generateMatrix() {
>     String inputPath = "/home/kris/data/mahoutIndexTFIDF.vec";
>     String tmpPath = "/tmp/matrixMultiplySpace";
>     int numDocuments = 3;
>     int numTerms = 4;
>
>     DistributedRowMatrix text = new DistributedRowMatrix(inputPath,
>         tmpPath, numDocuments, numTerms);
>
>     JobConf conf = new JobConf("similarity job");
>     text.configure(conf);
>
>     DistributedRowMatrix transpose = text.transpose();
>
>     DistributedRowMatrix similarity = transpose.times(transpose);
>
>     System.out.println("Similarity matrix lives: " +
>         similarity.getRowPath());
>   }
>
>   public static void main(String[] args) {
>     GenSimMatrixLocal similarity = new GenSimMatrixLocal();
>
>     similarity.generateMatrix();
>   }
> }
>
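
For reference, this is what I expect those two steps to produce: with the 3 x 4 document-term matrix A, the result should be the 3 x 3 matrix A * A^T, where entry (i, j) is the dot product of documents i and j. As far as I understand it, DistributedRowMatrix.times(other) computes this^T * other, which is why the transpose appears on both sides; that reading of the API is an assumption on my part. A small in-memory sketch of the same arithmetic follows. The class name and the weights are made up for illustration and have nothing to do with my real index.

```java
import java.util.Arrays;

public class DocSimilaritySketch {
    // Returns a * a^T: entry [i][j] is the dot product of rows i and j.
    static double[][] rowSimilarity(double[][] a) {
        int n = a.length;
        double[][] sim = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                double dot = 0.0;
                for (int t = 0; t < a[i].length; t++) {
                    dot += a[i][t] * a[j][t];
                }
                sim[i][j] = dot;
            }
        }
        return sim;
    }

    public static void main(String[] args) {
        // 3 documents (rows) by 4 terms (columns), illustrative weights only.
        double[][] docs = {
            {1, 0, 2, 0},
            {0, 1, 0, 1},
            {1, 1, 0, 0},
        };
        for (double[] row : rowSimilarity(docs)) {
            System.out.println(Arrays.toString(row));
        }
        // prints:
        // [5.0, 0.0, 1.0]
        // [0.0, 2.0, 1.0]
        // [1.0, 1.0, 2.0]
    }
}
```
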
> Can anyone see why a LongWritable is being cast to an IntWritable here?
> Does the job need to be configured differently?
>
> Thanks,
> Kris
>
--
Dr Kris Jack,
http://www.mendeley.com/profiles/kris-jack/