Thank you Andrew for the quick response. I have around 300 input files, so it would take a while for me to go through each file, but I will try to look into that. I had, however, successfully generated the sequence file using mahout seqdirectory for the same dataset. How can I find which Mahout release I am on? Also, can you let me know how I can increase io.sort.mb (currently 100) when I have Mahout running in local mode?
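In case it helps, this is roughly how I was planning to scan the files instead of opening each one, and how I was going to check which release I am on from the jar names (just a sketch, assuming GNU grep and plain-text inputs; /path/to/text/input is only a placeholder for my raw text directory):

  # files that contain at least one blank or whitespace-only line
  grep -rlE '^[[:space:]]*$' /path/to/text/input

  # files that have trailing whitespace at the end of a line
  grep -rlE '[[:space:]]$' /path/to/text/input

  # the Mahout jar names under $MAHOUT_HOME should carry the release number
  ls $MAHOUT_HOME/mahout-*.jar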
In the earlier attached file you can see it says:

16/02/03 22:59:04 INFO mapred.MapTask: Record too large for in-memory buffer: 99614722 bytes

How can I increase the in-memory buffer for Mahout in local mode? (A sketch of what I was planning to try is at the end of this mail.) I hope this has nothing to do with this error.

Thanks,
Alok Tanna

On Wed, Feb 3, 2016 at 10:50 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:

> Is it possible you have any empty lines or extra whitespace at the end or
> in the middle of any of your input files? I don't know for sure but that's
> where I'd start looking.
>
> Are you on the most recent release?
>
> On Wed, Feb 3, 2016 at 7:33 PM, Alok Tanna <tannaa...@gmail.com> wrote:
>
> > Mahout in local mode
> >
> > I am able to successfully run the command below on a smaller data set,
> > but when I run it on a large data set I get the error below. It looks
> > like I need to increase the size of some parameter, but I am not sure
> > which one. It is failing with java.io.EOFException while creating the
> > dictionary-0 file.
> >
> > Please find the attached file for more details.
> >
> > command: mahout seq2sparse -i /home/ubuntu/AT/AT-Seq/ -o
> > /home/ubuntu/AT/AT-vectors/ -lnorm -nv -wt tfidf
> >
> > Main error:
> >
> > 16/02/03 23:02:06 INFO mapred.LocalJobRunner: reduce > reduce
> > 16/02/03 23:02:17 INFO mapred.LocalJobRunner: reduce > reduce
> > 16/02/03 23:02:18 WARN mapred.LocalJobRunner: job_local1308764206_0003
> > java.io.EOFException
> >     at java.io.DataInputStream.readByte(DataInputStream.java:267)
> >     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
> >     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
> >     at org.apache.hadoop.io.Text.readFields(Text.java:263)
> >     at org.apache.mahout.common.StringTuple.readFields(StringTuple.java:142)
> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
> >     at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
> >     at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
> >     at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
> >     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> >     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
> >     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
> > 16/02/03 23:02:18 INFO mapred.JobClient: Job complete: job_local1308764206_0003
> > 16/02/03 23:02:18 INFO mapred.JobClient: Counters: 20
> > 16/02/03 23:02:18 INFO mapred.JobClient:   File Output Format Counters
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Bytes Written=14923244
> > 16/02/03 23:02:18 INFO mapred.JobClient:   FileSystemCounters
> > 16/02/03 23:02:18 INFO mapred.JobClient:     FILE_BYTES_READ=1412144036729
> > 16/02/03 23:02:18 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=323876626568
> > 16/02/03 23:02:18 INFO mapred.JobClient:   File Input Format Counters
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Bytes Read=11885543289
> > 16/02/03 23:02:18 INFO mapred.JobClient:   Map-Reduce Framework
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Reduce input groups=223
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Map output materialized bytes=2214020551
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Combine output records=0
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Map input records=223
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Reduce shuffle bytes=0
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Reduce output records=222
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Spilled Records=638
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Map output bytes=2214019100
> > 16/02/03 23:02:18 INFO mapred.JobClient:     CPU time spent (ms)=0
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Total committed heap usage (bytes)=735978192896
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Combine input records=0
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Map output records=223
> > 16/02/03 23:02:18 INFO mapred.JobClient:     SPLIT_RAW_BYTES=9100
> > 16/02/03 23:02:18 INFO mapred.JobClient:     Reduce input records=222
> > Exception in thread "main" java.lang.IllegalStateException: Job failed!
> >     at org.apache.mahout.vectorizer.DictionaryVectorizer.makePartialVectors(DictionaryVectorizer.java:329)
> >     at org.apache.mahout.vectorizer.DictionaryVectorizer.createTermFrequencyVectors(DictionaryVectorizer.java:199)
> >     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:274)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:56)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:606)
> >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > .
> > .
> >
> > --
> > Thanks & Regards,
> >
> > Alok R. Tanna
> >
>

--
Thanks & Regards,

Alok R. Tanna
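P.S. For the io.sort.mb / in-memory buffer question, this is the untested sketch I was planning to try. I am assuming the generic -D option gets picked up because the stack trace goes through ToolRunner (and hence GenericOptionsParser), and that bin/mahout still reads MAHOUT_LOCAL and MAHOUT_HEAPSIZE; the heap has to stay well above io.sort.mb:

  export MAHOUT_LOCAL=true
  # heap for the single local JVM, in MB, so the larger sort buffer fits
  export MAHOUT_HEAPSIZE=4096
  mahout seq2sparse -Dio.sort.mb=512 \
      -i /home/ubuntu/AT/AT-Seq/ -o /home/ubuntu/AT/AT-vectors/ \
      -lnorm -nv -wt tfidf

If the -D form is not honored in local mode, the fallback I had in mind was setting io.sort.mb in the mapred-site.xml that ends up on the classpath, but I have not verified that either.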