Is it possible that some of your input files have empty lines or extra whitespace at the end or in the middle? I don't know for sure, but that's where I'd start looking.
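If it helps, here is a quick way to scan for that. This is just a rough sketch of mine, not anything from Mahout: the class name is made up, and it assumes the raw documents you converted into sequence files are plain UTF-8 text under a single directory, passed as the first argument.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.List;

    // Hypothetical helper: flags blank lines and trailing whitespace in
    // the raw text files before they are turned into sequence files.
    public class WhitespaceCheck {
        public static void main(String[] args) throws IOException {
            Path root = Paths.get(args[0]);  // directory holding the raw text files
            Files.walk(root).filter(Files::isRegularFile).forEach(file -> {
                try {
                    List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
                    for (int i = 0; i < lines.size(); i++) {
                        String line = lines.get(i);
                        if (line.trim().isEmpty()) {
                            // catches empty and whitespace-only lines
                            System.out.println(file + ":" + (i + 1) + " blank/whitespace-only line");
                        } else if (Character.isWhitespace(line.charAt(line.length() - 1))) {
                            System.out.println(file + ":" + (i + 1) + " trailing whitespace");
                        }
                    }
                } catch (IOException e) {
                    System.err.println("could not read " + file + ": " + e.getMessage());
                }
            });
        }
    }

Anything it flags is worth eyeballing before you regenerate the sequence files.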
Are you on the most recent release?

On Wed, Feb 3, 2016 at 7:33 PM, Alok Tanna <tannaa...@gmail.com> wrote:

> Mahout in local mode
>
> I am able to successfully run the command below on a smaller data set,
> but when I run it on a large data set I get the error below. It looks
> like I need to increase the size of some parameter, but I am not sure
> which one. It is failing with a java.io.EOFException while creating the
> dictionary-0 file.
>
> Please find the attached file for more details.
>
> command: mahout seq2sparse -i /home/ubuntu/AT/AT-Seq/ -o /home/ubuntu/AT/AT-vectors/ -lnorm -nv -wt tfidf
>
> Main error:
>
> 16/02/03 23:02:06 INFO mapred.LocalJobRunner: reduce > reduce
> 16/02/03 23:02:17 INFO mapred.LocalJobRunner: reduce > reduce
> 16/02/03 23:02:18 WARN mapred.LocalJobRunner: job_local1308764206_0003
> java.io.EOFException
>         at java.io.DataInputStream.readByte(DataInputStream.java:267)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
>         at org.apache.hadoop.io.Text.readFields(Text.java:263)
>         at org.apache.mahout.common.StringTuple.readFields(StringTuple.java:142)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
>         at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
>         at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
>         at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> 16/02/03 23:02:18 INFO mapred.JobClient: Job complete: job_local1308764206_0003
> 16/02/03 23:02:18 INFO mapred.JobClient: Counters: 20
> 16/02/03 23:02:18 INFO mapred.JobClient:   File Output Format Counters
> 16/02/03 23:02:18 INFO mapred.JobClient:     Bytes Written=14923244
> 16/02/03 23:02:18 INFO mapred.JobClient:   FileSystemCounters
> 16/02/03 23:02:18 INFO mapred.JobClient:     FILE_BYTES_READ=1412144036729
> 16/02/03 23:02:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=323876626568
> 16/02/03 23:02:18 INFO mapred.JobClient:   File Input Format Counters
> 16/02/03 23:02:18 INFO mapred.JobClient:     Bytes Read=11885543289
> 16/02/03 23:02:18 INFO mapred.JobClient:   Map-Reduce Framework
> 16/02/03 23:02:18 INFO mapred.JobClient:     Reduce input groups=223
> 16/02/03 23:02:18 INFO mapred.JobClient:     Map output materialized bytes=2214020551
> 16/02/03 23:02:18 INFO mapred.JobClient:     Combine output records=0
> 16/02/03 23:02:18 INFO mapred.JobClient:     Map input records=223
> 16/02/03 23:02:18 INFO mapred.JobClient:     Reduce shuffle bytes=0
> 16/02/03 23:02:18 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
> 16/02/03 23:02:18 INFO mapred.JobClient:     Reduce output records=222
> 16/02/03 23:02:18 INFO mapred.JobClient:     Spilled Records=638
> 16/02/03 23:02:18 INFO mapred.JobClient:     Map output bytes=2214019100
> 16/02/03 23:02:18 INFO mapred.JobClient:     CPU time spent (ms)=0
> 16/02/03 23:02:18 INFO mapred.JobClient:     Total committed heap usage (bytes)=735978192896
> 16/02/03 23:02:18 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
> 16/02/03 23:02:18 INFO mapred.JobClient:     Combine input records=0
> 16/02/03 23:02:18 INFO mapred.JobClient:     Map output records=223
> 16/02/03 23:02:18 INFO mapred.JobClient:     SPLIT_RAW_BYTES=9100
> 16/02/03 23:02:18 INFO mapred.JobClient:     Reduce input records=222
> Exception in thread "main" java.lang.IllegalStateException: Job failed!
>         at org.apache.mahout.vectorizer.DictionaryVectorizer.makePartialVectors(DictionaryVectorizer.java:329)
>         at org.apache.mahout.vectorizer.DictionaryVectorizer.createTermFrequencyVectors(DictionaryVectorizer.java:199)
>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:274)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:56)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> .
> .
>
> --
> Thanks & Regards,
>
> Alok R. Tanna
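Since the EOFException is coming out of StringTuple.readFields while a reducer is deserializing records, it might also be worth reading each sequence file involved to see whether one of them is truncated. Here is a rough sketch of mine against the Hadoop 1.x API your logs suggest; the class name and the directory argument are my own, and it uses ReflectionUtils so it should work regardless of the key/value classes. Point it at /home/ubuntu/AT/AT-Seq/ or at one of the working directories seq2sparse writes under your -o path.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.util.ReflectionUtils;

    // Hypothetical helper: reads every sequence file in a directory to
    // the end and reports the one (if any) that dies with an exception.
    public class SeqFileCheck {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            for (FileStatus status : fs.listStatus(new Path(args[0]))) {
                String name = status.getPath().getName();
                // skip subdirectories and markers like _SUCCESS
                if (status.isDir() || name.startsWith("_") || name.startsWith(".")) {
                    continue;
                }
                long records = 0;
                SequenceFile.Reader reader = null;
                try {
                    reader = new SequenceFile.Reader(fs, status.getPath(), conf);
                    // instantiate whatever key/value classes the file declares
                    Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
                    Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
                    while (reader.next(key, value)) {
                        records++;
                    }
                    System.out.println(status.getPath() + ": OK, " + records + " records");
                } catch (IOException e) {
                    System.out.println(status.getPath() + ": FAILED after " + records + " records: " + e);
                } finally {
                    if (reader != null) {
                        reader.close();
                    }
                }
            }
        }
    }

If one file fails partway through, that is your culprit; if they all read cleanly, the corruption is more likely in the intermediate data the local job runner spills to disk, which brings us back to disk space and the release you are running.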