Hi, I am trying to run seq2sparse on Amazon EMR. I created the job flow with this command:
```
elastic-mapreduce --create --alive --num-instances 10 --name 20130821_2052 \
  --log-uri s3n://xxx/20130821_2052 --key-pair xxx.pub \
  --instance-type m1.large --ami-version 2.2.1 \
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configure-hadoop \
  --args "-m","mapred.reduce.child.java.opts=-Xmx2g","-m","mapred.job.shuffle.input.buffer.percent=0.50","-m","mapred.child.ulimit=unlimited"
```

Then I run the following step, which invokes Mahout's seq2sparse (for readability, a rough plain-Mahout equivalent of this step is sketched at the end of the post):

```
elastic-mapreduce --enable-debugging --jar s3://xxx/mahout-examples-0.8-job.jar \
  --main-class org.apache.mahout.driver.MahoutDriver \
  --arg seq2sparse \
  --arg --input --arg s3n://xxx/files.seq \
  --arg --output --arg s3n://xxx/files.vec \
  --arg -wt --arg tfidf \
  --arg -ow \
  --arg --namedVector \
  --arg --maxNGramSize --arg 3 \
  --arg -md --arg 2 \
  -j j-xxx
```

But I get this error:

```
Exception in thread "main" java.lang.IllegalStateException: Job failed!
	at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
	at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:256)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
	at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:56)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
```

together with lots of failed map attempts like this:

```
2013-08-21 17:59:50,995 INFO org.apache.hadoop.mapred.JobClient (main): Task Id : attempt_201308211756_0001_m_000000_2, Status : FAILED
Error: Java heap space
```

Note: the smallest input chunk file is about 110 MB:

```
s3cmd ls s3://xxx/files.seq/
2013-08-11 19:28  145413944  s3://xxx/tutanaklar.seq/chunk-0
2013-08-11 19:29  115952924  s3://xxx/tutanaklar.seq/chunk-1
2013-08-11 19:29  160465842  s3://xxx/tutanaklar.seq/chunk-2
2013-08-11 19:29  250955730  s3://xxx/tutanaklar.seq/chunk-3
2013-08-11 19:30  300159624  s3://xxx/tutanaklar.seq/chunk-4
2013-08-11 19:31  314128179  s3://xxx/tutanaklar.seq/chunk-5
2013-08-11 19:31  244719881  s3://xxx/tutanaklar.seq/chunk-6
2013-08-11 19:32  557089212  s3://xxx/tutanaklar.seq/chunk-7
2013-08-11 19:34  849233960  s3://xxx/tutanaklar.seq/chunk-8
```

Any suggestions?
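
For reference, here is roughly what that step amounts to when expressed as a direct Mahout 0.8 CLI call instead of a chain of `--arg` flags. This is only a sketch to make the options easier to read: the `bin/mahout` launcher path is an assumption about a local install, and the S3 paths are the same placeholders as above.

```
# A rough plain-Mahout equivalent of the EMR jar step above (a sketch, not
# something I have run locally). The seq2sparse options used are:
#   -wt tfidf        -> TF-IDF weighting
#   -ow              -> overwrite the output directory
#   --namedVector    -> keep document ids on the generated vectors
#   --maxNGramSize 3 -> generate n-grams up to trigrams
#   -md 2            -> minimum document frequency of 2
bin/mahout seq2sparse \
  --input s3n://xxx/files.seq \
  --output s3n://xxx/files.vec \
  -wt tfidf -ow --namedVector \
  --maxNGramSize 3 -md 2
```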
