As I troll through the code at times, trying to polish here and there, I
notice small issues to bring up:
Line separators. Lots of code independently reads
System.getProperty("line.separator") in order to output a platform-specific
line break. I argue this is actually slightly bad, since it
means
It's this kind of thing that forced us to move to sequence files instead of
TextKeyValueInput format and other text-based/CSV-based formats. I'm kind of
regretting the decision to go with a tab-separated format for the BayesClassifier,
which I wrote 2 years ago. I will be modifying it to use sparse vectors.
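To make the fragility of text formats concrete, here is a minimal sketch. The `label<TAB>text` record layout and the helpers `writeRecord`/`parseRecords` are hypothetical illustrations, not Mahout code. A record written with a Windows separator (`\r\n`, what `System.getProperty("line.separator")` returns there) and parsed by a reader that splits on `\n` and tab keeps a stray `\r` on its last field. Binary formats such as Hadoop sequence files sidestep this, since records are not delimited by text characters.

```java
import java.util.ArrayList;
import java.util.List;

public class LineSeparatorDemo {

    // Hypothetical writer: emits "label<TAB>text" followed by the given separator,
    // mimicking code that appends System.getProperty("line.separator").
    static String writeRecord(String label, String text, String separator) {
        return label + "\t" + text + separator;
    }

    // Hypothetical naive reader: splits on '\n' and then on the first '\t',
    // as a job assuming Unix line endings might do.
    static List<String[]> parseRecords(String data) {
        List<String[]> records = new ArrayList<>();
        for (String line : data.split("\n")) {
            if (!line.isEmpty()) {
                records.add(line.split("\t", 2));
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // Simulate one file produced on Windows ("\r\n") and one on Unix ("\n").
        String windowsData = writeRecord("sports", "football match", "\r\n");
        String unixData = writeRecord("sports", "football match", "\n");

        String windowsText = parseRecords(windowsData).get(0)[1];
        String unixText = parseRecords(unixData).get(0)[1];

        // The Windows-produced record keeps a trailing carriage return.
        System.out.println(windowsText.endsWith("\r")); // true
        System.out.println(unixText.endsWith("\r"));    // false

        // The separator this particular JVM would have written.
        String platform = System.getProperty("line.separator");
        System.out.println(platform.equals("\n") ? "LF" : "CRLF or other");
    }
}
```

Writing a fixed `'\n'` everywhere, or moving to a binary container, makes the output independent of the machine that produced it.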
Could you be specific about which map/reduce job you encountered the error in?
On Mon, Jan 18, 2010 at 7:28 PM, Olivier Grisel olivier.gri...@ensta.org wrote:
2010/1/18 Robin Anil robin.a...@gmail.com:
It's this kind of thing that forced us to move to sequence files instead of
TextKeyValueInput
2010/1/18 Robin Anil robin.a...@gmail.com:
Could you be specific about which map/reduce job you encountered the error in?
I thought it was on:
hadoop jar examples/target/mahout-examples-0.3-SNAPSHOT.job \
  org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver \
  -i wikipediadump/chunk-0001.xml
Could you check the logs? You will see a bigger stack trace that might lead back
to Mahout classes.
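The exception a client sees is often a wrapper, and the frames that matter can sit in a nested cause. A minimal sketch, using only the JDK's `Throwable` API, of scanning a whole cause chain for frames from a package prefix such as `org.apache.mahout.`; the class `StackTraceScan`, its method, and the `FakeMapper` frame are hypothetical, not part of Mahout or Hadoop.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class StackTraceScan {

    // Hypothetical helper: collect every stack frame, across the whole cause chain,
    // whose class name starts with the given package prefix.
    static List<StackTraceElement> framesFromPackage(Throwable t, String prefix) {
        List<StackTraceElement> hits = new ArrayList<>();
        // Remember visited throwables so a cyclic cause chain cannot loop forever.
        Map<Throwable, Boolean> seen = new IdentityHashMap<>();
        for (Throwable cur = t; cur != null && !seen.containsKey(cur); cur = cur.getCause()) {
            seen.put(cur, Boolean.TRUE);
            for (StackTraceElement frame : cur.getStackTrace()) {
                if (frame.getClassName().startsWith(prefix)) {
                    hits.add(frame);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Synthetic chain: the wrapper has only framework frames, while its cause
        // carries a frame from a made-up Mahout class.
        Exception root = new IllegalStateException("root cause");
        root.setStackTrace(new StackTraceElement[] {
            new StackTraceElement("org.apache.mahout.example.FakeMapper", "map", "FakeMapper.java", 42),
            new StackTraceElement("org.mortbay.jetty.Server", "handle", "Server.java", 10)
        });
        Exception wrapper = new RuntimeException("task failed", root);
        wrapper.setStackTrace(new StackTraceElement[] {
            new StackTraceElement("org.apache.hadoop.mapred.Child", "main", "Child.java", 1)
        });

        List<StackTraceElement> hits = framesFromPackage(wrapper, "org.apache.mahout.");
        System.out.println(hits.size()); // 1
        System.out.println(hits.get(0).getClassName()); // org.apache.mahout.example.FakeMapper
    }
}
```

An empty result means no frame in the chain came from the given package, for example a failure that is purely in the framework's own web or RPC layer.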
On Mon, Jan 18, 2010 at 9:19 PM, Olivier Grisel olivier.gri...@ensta.org wrote:
2010/1/18 Olivier Grisel olivier.gri...@ensta.org:
2010/1/18 Robin Anil robin.a...@gmail.com:
could you be specific
2010/1/18 Robin Anil robin.a...@gmail.com:
Could you check the logs? You will see a bigger stack trace that might lead back
to Mahout classes.
In the tasktracker logs I could find a more complete stack trace (Jetty-related,
no sign of Mahout classes), and Google pointed me to
this: