I haven't touched elephant-bird in some time.  Back when I used it, it gave
me fits whenever I strayed from the well-trod path, but I have heard it is
much better lately.

Sorry not to be much more help than that.

On Fri, Mar 1, 2013 at 3:50 AM, Colum Foley <[email protected]> wrote:

> I am trying to store Mahout RandomAccessSparseVector values using
> elephant-bird and Pig. The data is of the form
> key (text), value (RandomAccessSparseVector). When I run Pig's describe,
> it shows the following:
>
> pair: {key: int,val: (cardinality: int,entries: {entry: (index:
> int,value: double)})}
>
> My problem is that when I try to store tuples using elephant-bird's
> SequenceFileStorage as follows:
>
> store clusteredOut into 'logsvectors.dat' using
> com.twitter.elephantbird.pig.store.SequenceFileStorage (
>    '-c com.twitter.elephantbird.pig.util.TextConverter',
>    '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter -- -sparse'
> );
>
> It runs successfully, but when I examine the resulting SequenceFile, all
> the vectors are empty.
>
> On the other hand, if I run the following instead:
>
> store clusteredOut into 'logsvectors.dat' using
> com.twitter.elephantbird.pig.store.SequenceFileStorage ();
>
> That is, I do not specify the types of the key or value.
>
> The vectors are then non-empty but are of type Text, which causes my
> clustering algorithm to fail (it expects VectorWritable).
>
> So my problem is that I need to output in VectorFileFormat, but when I
> do, the resulting vectors are empty.
>
> Anyone else have experience with this issue?
>
> Many thanks,
> Colum
>
