Hi,
After proving out the FuzzyKMeans clustering methods from the CLI, I'm now
moving to a Java app.
I'm running into an issue I can't seem to get past.
Here's the error I'm getting:
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.common.StringTuple
    at org.apache.mahout.vectorizer.collocations.llr.CollocMapper.map(CollocMapper.java:41)
    ...
I understand the type mismatch being reported, but does anyone have insight
into the fix? Also, I'm not explicitly opening an FSDataOutputStream, since I
believe the new Path(...) arguments passed to the Mahout method take care of
writing the output.
Here's how I'm calling the method:
<snip>
String luceneSequenceFile = "hdfs://<server>:50070/opt/mahout/lucene-seq/index";
String outputDir = "hdfs://<server>:50070/opt/mahout/fkmeans-newsClusters";
String vectorsOutput = "hdfs://<server>:50070/opt/mahout/fkmeans-newsVectorsOutput";
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
DictionaryVectorizer.createTermFrequencyVectors(
    new Path(luceneSequenceFile),
    new Path(outputDir),
    vectorsOutput,
    conf,
    minSupport,
    maxNGramSize,
    minLLRValue,
    normPower,
    true,
    reduceTasks,
    chunkSize,
    sequentialAccessOutput,
    false);
</snip>
TIA,
Chris