Hi,
I need to find documents that are similar to each other.
Steps I am following:
Step 1 - Convert documents into sequence files.
Step 2 - Convert sequence files into sparse vectors.
Step 3 - Use RowSimilarityJob to get similar rows.
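To make clear what I expect Step 3 to produce, here is a minimal Python sketch. It is not Mahout code: the names `tanimoto`, `row_similarity`, `max_per_row` and the toy vectors are made up for illustration, and I am assuming the set-based form of the Tanimoto coefficient over the non-zero terms of each row, which may differ from what Mahout actually computes.

```python
from itertools import combinations


def tanimoto(a, b):
    """Set-based Tanimoto coefficient over the non-zero terms of two sparse rows.

    Assumption: only the presence of a term matters, not its TF-IDF weight.
    """
    terms_a = {term for term, weight in a.items() if weight != 0}
    terms_b = {term for term, weight in b.items() if weight != 0}
    union = len(terms_a | terms_b)
    return len(terms_a & terms_b) / union if union else 0.0


def row_similarity(rows, max_per_row):
    """For every row, return up to max_per_row most similar other rows."""
    similar = {row_id: [] for row_id in rows}
    for i, j in combinations(rows, 2):
        score = tanimoto(rows[i], rows[j])
        if score > 0:  # rows with no shared terms are not reported
            similar[i].append((j, score))
            similar[j].append((i, score))
    return {
        row_id: sorted(pairs, key=lambda p: p[1], reverse=True)[:max_per_row]
        for row_id, pairs in similar.items()
    }


# Toy sparse vectors keyed by row id (hypothetical data, not my real documents).
docs = {
    0: {"mahout": 0.8, "hadoop": 0.5, "vector": 0.3},
    1: {"mahout": 0.6, "hadoop": 0.4},
    2: {"vector": 0.9, "matrix": 0.7},
}
result = row_similarity(docs, max_per_row=2)
for row_id, pairs in result.items():
    print(row_id, pairs)
```

Here `max_per_row` plays the role I understand `-m` to have in my command: a cap on how many similar rows are kept for each row.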
I am facing issues in Step 3.
When I run it with the following parameters:
bin/mahout rowsimilarity -i D:\MahoutResult\seq2spraseoutput\tfidf-vectors\part-r-00000 -o D:\MahoutResult\rowsimilarityoutput -r 20 -s SIMILARITY_TANIMOTO_COEFFICIENT -m 10
I get the following exception:
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:124)
Oct 29, 2010 9:51:53 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: map 0% reduce 0%
Oct 29, 2010 9:51:53 AM org.apache.hadoop.mapred.JobClient monitorAndPrintJob
INFO: Job complete: job_local_0002
Oct 29, 2010 9:51:53 AM org.apache.hadoop.mapred.Counters log
INFO: Counters: 0
Oct 29, 2010 9:51:53 AM org.apache.hadoop.metrics.jvm.JvmMetrics init
INFO: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
Oct 29, 2010 9:51:54 AM org.apache.hadoop.mapred.JobClient configureCommandLineOptions
WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: temp/pairwiseSimilarity
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
at org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.run(RowSimilarityJob.java:174)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.mahout.math.hadoop.similarity.RowSimilarityJob.main(RowSimilarityJob.java:86)
Are there any issues with the input?
Am I following the steps correctly?
Regards,
Divya