(This wiki page looks well out of date; it's current only as of version
0.2. For example, we don't use IntWritable anymore. It needs an update,
but the process ought not be too different as of the current version, 0.5.)

Conceptually, you would need to key by int instead of long. Do you
really have 32-bit values? If so, just write out an int in an
IntWritable, not a long.

If not, you can just hash. Elsewhere I use what Java's Long.hashCode()
uses: value ^ (value >>> 32). I also apply "& 0x7FFFFFFF" to make sure
the result is positive.
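As a minimal sketch, the hash above looks like this in Java (the method
name longToIntKey is just for illustration, not any Mahout API):

```java
public class KeyHash {

    // Java's Long.hashCode() formula, masked with 0x7FFFFFFF so the
    // resulting int is always non-negative.
    static int longToIntKey(long value) {
        return (int) (value ^ (value >>> 32)) & 0x7FFFFFFF;
    }

    public static void main(String[] args) {
        // Small non-negative longs map to themselves.
        System.out.println(longToIntKey(42L));
        // Negative longs still come out non-negative thanks to the mask.
        System.out.println(longToIntKey(-5L) >= 0);
    }
}
```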

You would need to store the reverse mapping from int to long to get
your original values out later. And there is a tiny chance of
collision.
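A sketch of that reverse mapping, assuming an in-memory HashMap is big
enough for your data (the class and method names here are illustrative,
not Mahout code); it also fails fast on the rare collision:

```java
import java.util.HashMap;
import java.util.Map;

// Keeps an int -> long map while re-keying, so the original long IDs
// can be recovered later, and detects the tiny chance of collision.
public class ReverseMapping {

    private final Map<Integer, Long> intToLong = new HashMap<>();

    // Same hash as above: Java's Long.hashCode() formula, forced non-negative.
    int rekey(long value) {
        int key = (int) (value ^ (value >>> 32)) & 0x7FFFFFFF;
        Long previous = intToLong.putIfAbsent(key, value);
        if (previous != null && previous != value) {
            throw new IllegalStateException("Hash collision on int key " + key);
        }
        return key;
    }

    // Recovers the original long ID from the int key.
    long original(int key) {
        return intToLong.get(key);
    }
}
```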

On Wed, May 25, 2011 at 12:59 PM, Stefan Wienert <[email protected]> wrote:
> Hi,
>
> I need some help using Hadoop :
>
> I'm trying to do some Dimensional reduction after this tutorial:
> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
>
> I created my tf-idf-vectors from text saved in lucene:
> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text
>
> Now, I got a problem when I try to transpose the tf-idf-vectors:
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
> be cast to org.apache.hadoop.io.IntWritable
>
> So, my question:
> What is the easiest way to get my data from
> SequenceFile<LongWritable,VectorWritable> to
> SequenceFile<IntWritable,VectorWritable>?
> Or can I create this directly from a Lucene-Index?
>
> Thx
> Stefan
>
