Hi All,
I have a question about using Elephant Bird's (EB) VectorWritableConverter in my Pig script
to vectorize data.
I generate the tuples with a UDF, but for simplicity the code below loads the data from a
file instead. My UDF returns tuples of the form (1,0,1,1,...).
My map.dat file has the following format:
1,0,1,1
0,1,1,1
0,0,1,1
1,1,0,0
.......
.......
........
I register the necessary jar files.
%declare SEQFILE_LOADER 'com.twitter.elephantbird.pig.load.SequenceFileLoader';
%declare TEXT_CONVERTER 'com.twitter.elephantbird.pig.util.TextConverter';
%declare LONG_CONVERTER 'com.twitter.elephantbird.pig.util.LongWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';
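/* Load from a file instead of the UDF, for simplicity.
   PigStorage(',') splits each line on commas; the default delimiter is tab,
   which would load each whole line as a single field. */
A = LOAD 'map.dat' USING PigStorage(',');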
/*
I am not sure how to use VectorWritableConverter to convert each tuple
in relation A into a vector. */
B = FOREACH A GENERATE $VECTOR_CONVERTER();
DUMP B;
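One direction I am considering, though I have not verified it: my understanding is that EB's WritableConverter classes are meant to be passed to SequenceFileLoader/SequenceFileStorage through the `-c` option rather than called like UDFs inside FOREACH. If that is right, something like the sketch below might write (LongWritable, VectorWritable) pairs. Assumptions: Pig 0.11 or later (for RANK), the EB and Mahout jars already REGISTERed, the store class name `com.twitter.elephantbird.pig.store.SequenceFileStorage`, that VectorWritableConverter maps a tuple of doubles to a dense vector, and the output path `map_vectors`, which is only a placeholder.

```pig
-- Unverified sketch. Assumes Pig 0.11+ (for RANK) and that the EB and
-- Mahout jars are already REGISTERed (paths omitted).
%declare SEQFILE_STORAGE 'com.twitter.elephantbird.pig.store.SequenceFileStorage';
%declare LONG_CONVERTER 'com.twitter.elephantbird.pig.util.LongWritableConverter';
%declare VECTOR_CONVERTER 'com.twitter.elephantbird.pig.mahout.VectorWritableConverter';

-- Split each line on commas and type the fields as double.
A = LOAD 'map.dat' USING PigStorage(',')
    AS (f0:double, f1:double, f2:double, f3:double);

-- RANK prepends a long row number (rank_A), used here as the SequenceFile key.
B = RANK A;

-- Pair the key with a tuple of feature values (assumed to become a dense vector).
C = FOREACH B GENERATE rank_A AS key, TOTUPLE(f0, f1, f2, f3) AS vec;

-- Store key/value pairs; 'map_vectors' is a placeholder output path.
STORE C INTO 'map_vectors' USING $SEQFILE_STORAGE(
  '-c $LONG_CONVERTER', '-c $VECTOR_CONVERTER'
);
```

The explicit four-field schema matches the sample map.dat; with wider UDF output, the schema and the TOTUPLE arguments would need to list every column.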