(This wiki page looks well out of date; it's current only as of version
0.2. For example, we don't use IntWritable anymore. It needs an update,
but the process ought not be too different as of the current version, 0.5.)

Conceptually, you would need to key by int instead of long. Do you
really have 32-bit values? If so, just write out an int in an
IntWritable, not a long.

If not, you can just hash. Elsewhere I use what Java's Long.hashCode()
uses: value ^ (value >>> 32). I also apply "& 0x7FFFFFFF" to make sure
the result is positive.
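As a minimal sketch, the hash above looks like this in Java (the method
name longToIntKey is just for illustration, not any Mahout API):

```java
public class KeyHash {

    // Java's Long.hashCode() formula, masked with 0x7FFFFFFF so the
    // resulting int is always non-negative.
    static int longToIntKey(long value) {
        return (int) (value ^ (value >>> 32)) & 0x7FFFFFFF;
    }

    public static void main(String[] args) {
        // Small non-negative longs map to themselves.
        System.out.println(longToIntKey(42L));
        // Negative longs still come out non-negative thanks to the mask.
        System.out.println(longToIntKey(-5L) >= 0);
    }
}
```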

You would need to store the reverse mapping from int to long to get
your original values out later. And there is a tiny chance of
collision.
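A sketch of that reverse mapping, assuming an in-memory HashMap is big
enough for your data (the class and method names here are illustrative,
not Mahout code); it also fails fast on the rare collision:

```java
import java.util.HashMap;
import java.util.Map;

// Keeps an int -> long map while re-keying, so the original long IDs
// can be recovered later, and detects the tiny chance of collision.
public class ReverseMapping {

    private final Map<Integer, Long> intToLong = new HashMap<>();

    // Same hash as above: Java's Long.hashCode() formula, forced non-negative.
    int rekey(long value) {
        int key = (int) (value ^ (value >>> 32)) & 0x7FFFFFFF;
        Long previous = intToLong.putIfAbsent(key, value);
        if (previous != null && previous != value) {
            throw new IllegalStateException("Hash collision on int key " + key);
        }
        return key;
    }

    // Recovers the original long ID from the int key.
    long original(int key) {
        return intToLong.get(key);
    }
}
```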

On Wed, May 25, 2011 at 12:59 PM, Stefan Wienert <[email protected]> wrote:
> Hi,
>
> I need some help using Hadoop :
>
> I'm trying to do some Dimensional reduction after this tutorial:
> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
>
> I created my tf-idf-vectors from text saved in lucene:
> https://cwiki.apache.org/confluence/display/MAHOUT/Creating+Vectors+from+Text
>
> Now, I got a problem when I try to transpose the tf-idf-vectors:
> java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot
> be cast to org.apache.hadoop.io.IntWritable
>
> So, my question:
> What is the easiest way to get my data from
> SequenceFile<LongWritable,VectorWritable> to
> SequenceFile<IntWritable,VectorWritable>?
> Or can I create this directly from a Lucene-Index?
>
> Thx
> Stefan
>
