Honestly, I'd rather have a keyed bag of maps on the initial load, but that'd work too. Is it really so hard to get data out of Cassandra that you need a UDF to do anything beyond an initial dump?
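
To make the two shapes concrete, here is a rough Python sketch of the reshaping discussed in the quoted thread: a (key, bag of (name, value) tuples) row turned either into a flat tuple for chosen column names or into a key plus a map. The function names project_columns and row_to_map are my own (hypothetical), not part of pygmalion, Pig, or CassandraStorage, and the sketch only mimics the data shapes from the dump in the quoted message; it does not talk to Cassandra.

```python
def project_columns(row, names):
    """Flatten (key, [(name, value), ...]) into (key, value_1, ..., value_n).

    Hypothetical helper: values are emitted in the order given by `names`;
    a column that is absent from the row yields None.
    """
    key, cols = row
    lookup = dict(cols)  # column name -> value
    return (key,) + tuple(lookup.get(name) for name in names)


def row_to_map(row):
    """Turn (key, [(name, value), ...]) into (key, {name: value})."""
    key, cols = row
    return (key, dict(cols))


# One row copied from the dump in the quoted message.
row = ("faaaaaaaaazzzzzzeaaa", [("first", "faaaaaaaaa"), ("last", "zzzzzzeaaa")])

print(project_columns(row, ["first", "last"]))
print(row_to_map(row))
```

The first call gives the tabular form (key, first, last); the second gives the keyed-map form I was asking about.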

On Apr 6, 2011, at 3:51 PM, Jeremy Hanna wrote:

> I'm going to put a UDF up on the pygmalion project hopefully today that will
> convert that into something more usable. Props to Jacob from infochimps - he
> and I have been creating UDFs like that lately for use with Cassandra.
> There's an associated UDF for getting it back into the key, cols form to
> output to Cassandra as well. I'll try to get that pushed tonight but take a
> look at:
> https://github.com/jeromatron/pygmalion/
> That's where I'll push the code - hopefully that will help.
>
> What it does is take the data structure returned from Cassandra and allow
> you to say, give me the key and the values for these column names in a bag, so
> for your example it would return:
> {(faaaaaaaaazzzzzzeaaa,faaaaaaaaa,zzzzzzeaaa)}
> and you could assign var names for each like key, first, last within Pig.
>
> Anyway, if that helps, look for that soon. It's helping us use the output as
> tabular data.
>
> On Apr 6, 2011, at 5:40 PM, bob wrote:
>
>> No matter what I try, I end up losing the tuples after the initial flatten.
>> I'm using some auto-generated test data with first, last and a
>> concatenation for the key. The script and outputs. . .
>>
>> rows = LOAD 'cassandra://Keyspace2/Standard1' USING CassandraStorage() as
>> (key:chararray, cols:bag{T:tuple(name:chararray, value:chararray) } );
>> dump rows;
>>
>> (faaaaaaaaazzzzzzeaaa,{(first,faaaaaaaaa),(last,zzzzzzeaaa)})
>> (jaaaaaaaaazzzlaaaaaa,{(first,jaaaaaaaaa),(last,zzzlaaaaaa)})
>> (naaaaaaaaazzzzzpaaaa,{(first,naaaaaaaaa),(last,zzzzzpaaaa)})
>> (uaaaaaaaaazzzzzsaaaa,{(first,uaaaaaaaaa),(last,zzzzzsaaaa)})
>> (vaaaaaaaaafaaaaaaaaa,{(first,vaaaaaaaaa),(last,faaaaaaaaa)})
>> (zuaaaaaaaazpaaaaaaaa,{(first,zuaaaaaaaa),(last,zpaaaaaaaa)})
>> (zuaaaaaaaazzzzhaaaaa,{(first,zuaaaaaaaa),(last,zzzzhaaaaa)})
>> (zwaaaaaaaaznaaaaaaaa,{(first,zwaaaaaaaa),(last,znaaaaaaaa)})
>> (zziaaaaaaazfaaaaaaaa,{(first,zziaaaaaaa),(last,zfaaaaaaaa)})
>> (zzkaaaaaaazzzdaaaaaa,{(first,zzkaaaaaaa),(last,zzzdaaaaaa)})
>>
>> So far, so good.
>>
>> columns = foreach rows generate flatten(cols) as (name, value);
>> dump columns;
>>
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>>
>> Not so good.
>>
>> I've tried multiple combinations w/ no success. If I just leave the bag empty
>> in the initial load, i.e. cols:bag{}, and then leave off the "as" in the
>> flatten, I get something that looks like a list of tuples. But trying to
>> access $1 in that result gives me an Error 1000 index out of range. So
>> that's not the ticket either.
>>
>> What I'd really like is to flatten the bag into a map, but I'm about as
>> successful there as well.
>>
>> Any help is most welcome. (Cassandra 7.4 and Pig 0.8.0)
