createTermFrequencyVectors, Hadoop, cast error

Christopher Schindler Sun, 17 Nov 2013 11:58:38 -0800

Hi,

After proving out the FuzzyKMeans clustering methods from the CLI, I'm now 
moving to a Java app.


I'm running into an issue I can't seem to get past. 

Error I'm getting: 
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.mahout.common.StringTuple
    at org.apache.mahout.vectorizer.collocations.llr.CollocMapper.map(CollocMapper.java:41)
...

I understand the type issue being reported; any insights for the fix? Also, I'm 
not explicitly opening an FSDataOutputStream, as I believe the Path parameters 
passed to the Mahout method take care of writing the output. 


Here's how I'm calling the method:

<snip>
String luceneSequenceFile = "hdfs://<server>:50070/opt/mahout/lucene-seq/index";
String outputDir = "hdfs://<server>:50070/opt/mahout/fkmeans-newsClusters";
String vectorsOutput = "hdfs://<server>:50070/opt/mahout/fkmeans-newsVectorsOutput";

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);

DictionaryVectorizer.createTermFrequencyVectors(
                    new Path(luceneSequenceFile),
                    new Path(outputDir), 
                    vectorsOutput,
                    conf, 
                    minSupport, 
                    maxNGramSize, 
                    minLLRValue, 
                    normPower, 
                    true, 
                    reduceTasks,
                    chunkSize, 
                    sequentialAccessOutput, false);
</snip>
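
For anyone hitting the same exception, here is a minimal sketch of why it only 
shows up at runtime inside map(). CollocMapper is typed for StringTuple values, 
which suggests the input is expected to be already-tokenized documents rather 
than raw Text (in Mahout this is, as far as I know, what 
DocumentProcessor.tokenizeDocuments writes; worth checking against your 
version). Everything in the sketch (RawText, TokenTuple, TokenCountingMapper, 
feed, tokenize) is a hypothetical stand-in written with the JDK only, not the 
Hadoop or Mahout API.

```java
import java.util.Arrays;
import java.util.List;

public class CastDemo {

    // Hypothetical stand-in for a raw text value (plays the role of Text).
    static final class RawText {
        final String value;
        RawText(String value) { this.value = value; }
    }

    // Hypothetical stand-in for a tokenized value (plays the role of StringTuple).
    static final class TokenTuple {
        final List<String> tokens;
        TokenTuple(List<String> tokens) { this.tokens = tokens; }
    }

    // Generic mapper base; the type parameter is erased at runtime.
    static abstract class Mapper<V> {
        abstract int map(V value);
    }

    // A mapper that, like the one in the stack trace, expects tokenized values.
    static final class TokenCountingMapper extends Mapper<TokenTuple> {
        @Override
        int map(TokenTuple value) { return value.tokens.size(); }
    }

    // A framework-style driver: it only knows the value as Object, so a
    // mismatch is caught by the cast inside the mapper's compiler-generated
    // bridge method, not by the compiler.
    @SuppressWarnings("unchecked")
    static String feed(Mapper<?> mapper, Object value) {
        try {
            int n = ((Mapper<Object>) mapper).map(value);
            return "ok: " + n + " tokens";
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    // The missing preprocessing step: turn raw text into a token tuple.
    static TokenTuple tokenize(RawText text) {
        return new TokenTuple(Arrays.asList(text.value.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        Mapper<?> mapper = new TokenCountingMapper();
        RawText doc = new RawText("fuzzy k means clustering");
        System.out.println(feed(mapper, doc));           // prints "ClassCastException"
        System.out.println(feed(mapper, tokenize(doc))); // prints "ok: 4 tokens"
    }
}
```

In the real job the analogous change would be to run a tokenization pass over 
the sequence file first and pass that pass's output directory as the input 
Path; treat that as an assumption to verify, not a confirmed fix.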

TIA,
Chris
