Yes, with:
bin/mahout lucene.vector \
--dir /home/hadoop/MahoutStatements/tf_index \
--field fulltext \
--dictOut /home/hadoop/MahoutStatements/dict.txt \
--output /home/hadoop/MahoutStatements/tfidf-vectors \
--idField id \
--weight TFIDF
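(For anyone hitting this later: the ClassCastException below comes from the transpose job casting every SequenceFile key to IntWritable. A minimal sketch of why that cast can never succeed, using simplified stand-in classes rather than the real org.apache.hadoop.io types:)

```java
// Stand-ins for Hadoop's Writable key types (NOT the real Hadoop classes):
// LongWritable and IntWritable are sibling subtypes of Writable, so a
// reference cast between them always throws ClassCastException at runtime.
interface Writable {}

class LongWritable implements Writable {
    final long value;
    LongWritable(long value) { this.value = value; }
}

class IntWritable implements Writable {
    final int value;
    IntWritable(int value) { this.value = value; }
}

public class CastDemo {
    // Mimics what the transpose mapper does with each input key:
    // it assumes an IntWritable row index and casts without checking.
    static String tryCast(Writable key) {
        try {
            IntWritable rowIndex = (IntWritable) key;
            return "cast succeeded: " + rowIndex.value;
        } catch (ClassCastException e) {
            return "ClassCastException: " + key.getClass().getSimpleName()
                    + " cannot be cast to IntWritable";
        }
    }

    public static void main(String[] args) {
        // lucene.vector wrote the row keys as LongWritable, which is
        // exactly the input that makes the transpose-side cast blow up.
        System.out.println(tryCast(new LongWritable(227L)));
        System.out.println(tryCast(new IntWritable(227)));
    }
}
```

Since the two key classes are siblings rather than one being a subtype of the other, no runtime cast can bridge them; the key type has to match at the time the vectors are written.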
2011/5/25 Jake Mannix <[email protected]>:
> Did you rebuild your tfidf-vectors with trunk as well?
>
> On Wed, May 25, 2011 at 6:59 AM, Stefan Wienert <[email protected]> wrote:
>
>> First, I use http://svn.apache.org/repos/asf/mahout/trunk, tested some
>> minutes ago with the newest version.
>>
>> And still:
>> bin/mahout transpose \
>> --input /home/hadoop/MahoutStatements/tfidf-vectors \
>> --numRows 227 \
>> --numCols 107909 \
>> --tempDir /home/hadoop/MahoutStatements/tfidf-matrix/transpose
>> produces:
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
>> be cast to org.apache.hadoop.io.IntWritable
>>
>> My first idea, changing "lucene.vector", does not work; there is too
>> much to change.
>>
>> So... ideas? What about changing "transpose" and "matrixmult" to use
>> LongWritable instead of IntWritable? Would that be problematic?
>>
>> 2011/5/25 Jake Mannix <[email protected]>:
>> > On Wed, May 25, 2011 at 6:14 AM, Stefan Wienert <[email protected]>
>> wrote:
>> >
>> >> So the real problem is that "transpose" and "matrixmult" (maybe)
>> >> still use IntWritable instead of LongWritable.
>> >>
>> >
>> > It's the other way around: matrix operations use keys which are ints, and
>> > the lucene.vector class needs to respect this. It doesn't on current
>> trunk?
>> >
>> > -jake
>> >
>>
>>
>>
>> --
>> Stefan Wienert
>>
>> http://www.wienert.cc
>> [email protected]
>>
>> Telefon: +495251-2026838 (new number since 20.06.10)
>> Mobil: +49176-40170270
>>
>