I'm going to put a UDF up on the pygmalion project, hopefully today, that will
convert that into something more usable. Props to Jacob from Infochimps: he
and I have been creating UDFs like that lately for use with Cassandra. There's
an associated UDF for getting the data back into the (key, cols) form for
output to Cassandra as well. I'll try to get that pushed tonight, but take a
look at:
https://github.com/jeromatron/pygmalion/
That's where I'll push the code - hopefully that will help.
It takes the data structure returned from Cassandra and lets you say, "give me
the key and the values for these column names in a bag." For your example it
would return:
{(faaaaaaaaazzzzzzeaaa,faaaaaaaaa,zzzzzzeaaa)}
and you could assign a name to each field, such as key, first, and last, within Pig.
Anyway, if that helps, look for that soon. It's helping us use the output as
tabular data.
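
To make the idea concrete, here's a rough sketch of the transformation in plain
Python. This is not the UDF itself (that isn't pushed yet), just an
illustration of the reshaping it does. The function name pick_columns is made
up for this example, and returning None for a missing column is an assumption,
not necessarily what the UDF will do.

```python
def pick_columns(key, cols, names):
    """Flatten one Cassandra-style row into a single tuple.

    key   -- the row key
    cols  -- iterable of (name, value) pairs, like the bag Pig gets back
    names -- the column names to pull out, in the order you want them
    """
    # Index the (name, value) pairs by column name for direct lookup.
    lookup = dict(cols)
    # Assumption: a column missing from the row comes back as None.
    return (key,) + tuple(lookup.get(name) for name in names)


# The first row from the dump in the original message.
row_key = "faaaaaaaaazzzzzzeaaa"
row_cols = [("first", "faaaaaaaaa"), ("last", "zzzzzzeaaa")]

result = pick_columns(row_key, row_cols, ["first", "last"])
print(result)
```

The point is that the column names you ask for fix the order of the values, so
every row comes out with the same shape and can be treated as a table row.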
On Apr 6, 2011, at 5:40 PM, bob wrote:
> No matter what I try, I end up losing the tuples after the initial flatten.
> I'm using some auto-generated test data with first, last, and a concatenation
> of the two for the key. The script and outputs follow.
>
> rows = LOAD 'cassandra://Keyspace2/Standard1' USING CassandraStorage() as
> (key:chararray, cols:bag{T:tuple(name:chararray, value:chararray) } );
> dump rows;
>
> (faaaaaaaaazzzzzzeaaa,{(first,faaaaaaaaa),(last,zzzzzzeaaa)})
> (jaaaaaaaaazzzlaaaaaa,{(first,jaaaaaaaaa),(last,zzzlaaaaaa)})
> (naaaaaaaaazzzzzpaaaa,{(first,naaaaaaaaa),(last,zzzzzpaaaa)})
> (uaaaaaaaaazzzzzsaaaa,{(first,uaaaaaaaaa),(last,zzzzzsaaaa)})
> (vaaaaaaaaafaaaaaaaaa,{(first,vaaaaaaaaa),(last,faaaaaaaaa)})
> (zuaaaaaaaazpaaaaaaaa,{(first,zuaaaaaaaa),(last,zpaaaaaaaa)})
> (zuaaaaaaaazzzzhaaaaa,{(first,zuaaaaaaaa),(last,zzzzhaaaaa)})
> (zwaaaaaaaaznaaaaaaaa,{(first,zwaaaaaaaa),(last,znaaaaaaaa)})
> (zziaaaaaaazfaaaaaaaa,{(first,zziaaaaaaa),(last,zfaaaaaaaa)})
> (zzkaaaaaaazzzdaaaaaa,{(first,zzkaaaaaaa),(last,zzzdaaaaaa)})
>
> So far, so good.
>
>
> columns = foreach rows generate flatten(cols) as (name, value);
> dump columns;
>
> ()
> ()
> ()
> ()
> ()
> ()
> ()
> ()
> ()
> ()
>
>
> Not so good.
>
>
>
> I've tried multiple combinations with no success. If I just leave the bag
> empty in the initial load, i.e. cols:bag{}, and then leave off the "as" in
> the flatten, I get something that looks like a list of tuples. But trying to
> access $1 in that result gives me an Error 1000, index out of range. So
> that's not the ticket either.
>
> What I'd really like is to flatten the bag into a map, but I've had no more
> success there.
>
> Any help is most welcome. (Cassandra 0.7.4 and Pig 0.8.0)
>
>