So the real problem is that "transpose" and "matrixmult" (maybe)
still use IntWritable instead of LongWritable.

Still, int should be enough. The easiest approach would be to
modify "lucene.vector" so that it creates a
SequenceFile<IntWritable,VectorWritable> directly.

Or are there versions of "transpose" and "matrixmult" for <LongWritable,VectorWritable>?
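For what it's worth, the long-to-int hashing Sean describes below could be sketched like this (a minimal standalone example, not Mahout code; the class and method names are made up for illustration). The hashed int would become the IntWritable key when rewriting the sequence file, and the reverse map is what you'd persist to recover the original long IDs later:

```java
import java.util.HashMap;
import java.util.Map;

public class KeyHasher {

    // Java-style hash of a long (as in Long.hashCode), masked with
    // 0x7FFFFFFF so the result is always non-negative.
    static int hashLong(long value) {
        return (int) ((value ^ (value >>> 32)) & 0x7FFFFFFF);
    }

    public static void main(String[] args) {
        // Reverse mapping from hashed int key back to the original
        // long ID; this would need to be stored alongside the output.
        Map<Integer, Long> reverse = new HashMap<>();
        long[] docIds = {42L, 5_000_000_000L, -7L};
        for (long id : docIds) {
            int key = hashLong(id);
            reverse.put(key, id);
            System.out.println(id + " -> " + key);
        }
    }
}
```

Note the tiny collision risk Sean mentions: two distinct longs can hash to the same int, so a real conversion job would want to detect duplicate keys in the reverse map.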

2011/5/25 Sean Owen <[email protected]>:
> (This wiki page looks well out of date, current only as of version 0.2.
> For example, we don't use IntWritable anymore. It needs an update, but
> ought not to be too different as of the current version, 0.5.)
>
> You would need to key by int instead of long, conceptually. If you
> really have 32-bit values, then just write out an int in a Writable,
> not a long.
>
> If not, you can just hash. Elsewhere I use what Java uses: value ^
> (value >>> 32). Actually I also add "& 0x7FFFFFFF" to make sure it's
> positive.
>
> You would need to store the reverse mapping from int to long to get
> your original values out later. And there is a tiny chance of
> collision.
>
> On Wed, May 25, 2011 at 12:59 PM, Stefan Wienert <[email protected]> wrote:
>> Hi,
>>
>> I need some help using Hadoop:
>>
>> I'm trying to do some Dimensional reduction after this tutorial:
>> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
>>
>> I created my tf-idf-vectors from text saved in lucene:
>> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text
>>
>> Now, I got a problem when I try to transpose the tf-idf-vectors:
>> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
>> be cast to org.apache.hadoop.io.IntWritable
>>
>> So, my question:
>> What is the easiest way to get my data from
>> SequenceFile<LongWritable,VectorWritable> to
>> SequenceFile<IntWritable,VectorWritable>?
>> Or can I create this directly from a Lucene-Index?
>>
>> Thx
>> Stefan
>>
>



-- 
Stefan Wienert

http://www.wienert.cc
[email protected]

Telefon: +495251-2026838 (new number since 20.06.10)
Mobil: +49176-40170270
