I haven't touched elephant-bird in some time. When I used it, it gave me fits whenever I strayed from the well-trod path, but I have heard it is much better lately.
Sorry not to be much more help than that.

On Fri, Mar 1, 2013 at 3:50 AM, Colum Foley <[email protected]> wrote:
> I am trying to store Mahout RandomAccessSparseVector using
> elephant-bird and pig. The data is of the form
> key(text),value(RandomAccessSparseVector). when I run pig describe it
> presents the following:
>
> pair: {key: int,val: (cardinality: int,entries: {entry: (index:
> int,value: double)})}
>
> My problem is that when I try to store tuples using elephant-bird's
> SequenceFileStorage as follows:
>
> store clusteredOut into 'logsvectors.dat' using
> com.twitter.elephantbird.pig.store.SequenceFileStorage (
> '-c com.twitter.elephantbird.pig.util.TextConverter',
> '-c com.twitter.elephantbird.pig.mahout.VectorWritableConverter --
> -sparse'
> );
>
> It runs successfully but when I examine the resulting Sequencefile all
> the vectors are empty.
>
> On the other hand, if I run the following instead:
>
> store clusteredOut into 'logsvectors.dat' using
> com.twitter.elephantbird.pig.store.SequenceFileStorage ();
>
> ie do not specify the types of the key or value.
>
> The vectors are non-empty but are of type text..and this causes my
> clustering algorithm to fail(as they are expecting VectorWritable).
>
> So my problem is that I need to output in VectorFileFormat, but when I
> do the resulting vectors are empty.
>
> Anyone else have experience with this issue?
>
> Many thanks,
> Colum
>
