Honestly, I'd rather have a keyed bag of maps on the initial load, but that'd 
work too. Is it really so hard to get data out of Cassandra that you need a 
UDF to do anything besides an initial dump?
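
For what it's worth, the reshaping described in the quoted message below boils down to roughly the following. This is only a minimal Python sketch of the idea, not the pygmalion code: `project_columns` and `to_map` are names I made up for illustration, and a row is assumed to arrive as a (key, bag) pair whose bag is a list of (name, value) tuples, as in the dump output further down.

```python
def to_map(row):
    """Turn (key, [(name, value), ...]) into (key, {name: value})."""
    key, cols = row
    return key, dict(cols)


def project_columns(row, names):
    """Flatten one row into (key, value_for_names[0], value_for_names[1], ...)."""
    key, lookup = to_map(row)
    # A column missing from the row becomes None, standing in for a Pig null.
    return (key,) + tuple(lookup.get(name) for name in names)


# Sample row copied from the dump output quoted in this thread.
row = ("faaaaaaaaazzzzzzeaaa", [("first", "faaaaaaaaa"), ("last", "zzzzzzeaaa")])
print(project_columns(row, ["first", "last"]))
```

The first helper is the "keyed map" shape I'd prefer to get straight from the load; the second gives the tabular form where each named column has a fixed position.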

On Apr 6, 2011, at 3:51 PM, Jeremy Hanna wrote:

> I'm going to put a UDF up on the pygmalion project hopefully today that will 
> convert that into something more usable.  Props to Jacob from infochimps - he 
> and I have been creating UDFs like that lately for use with Cassandra.  
> There's an associated UDF for getting it back into the key, cols form to 
> output to cassandra as well.  I'll try to get that pushed tonight but take a 
> look at:
> https://github.com/jeromatron/pygmalion/
> That's where I'll push the code - hopefully that will help.
> 
> What it does is take the data structure returned from Cassandra and let you 
> say "give me the key and the values for these column names in a bag", so 
> for your example it would return:
> {(faaaaaaaaazzzzzzeaaa,faaaaaaaaa,zzzzzzeaaa)}
> and you could assign var names for each like key, first, last within pig.
> 
> Anyway, if that helps, look for that soon.  It's helping us use the output as 
> tabular data.
> 
> On Apr 6, 2011, at 5:40 PM, bob wrote:
> 
>> No matter what I try, I end up losing the tuples after the initial flatten. 
>> I'm using some auto-generated test data with first, last and a 
>> concatenation of the two for the key. The script and its output follow:
>> 
>> rows = LOAD 'cassandra://Keyspace2/Standard1' USING CassandraStorage() as 
>> (key:chararray, cols:bag{T:tuple(name:chararray, value:chararray) } );
>> dump rows;
>> 
>> (faaaaaaaaazzzzzzeaaa,{(first,faaaaaaaaa),(last,zzzzzzeaaa)})
>> (jaaaaaaaaazzzlaaaaaa,{(first,jaaaaaaaaa),(last,zzzlaaaaaa)})
>> (naaaaaaaaazzzzzpaaaa,{(first,naaaaaaaaa),(last,zzzzzpaaaa)})
>> (uaaaaaaaaazzzzzsaaaa,{(first,uaaaaaaaaa),(last,zzzzzsaaaa)})
>> (vaaaaaaaaafaaaaaaaaa,{(first,vaaaaaaaaa),(last,faaaaaaaaa)})
>> (zuaaaaaaaazpaaaaaaaa,{(first,zuaaaaaaaa),(last,zpaaaaaaaa)})
>> (zuaaaaaaaazzzzhaaaaa,{(first,zuaaaaaaaa),(last,zzzzhaaaaa)})
>> (zwaaaaaaaaznaaaaaaaa,{(first,zwaaaaaaaa),(last,znaaaaaaaa)})
>> (zziaaaaaaazfaaaaaaaa,{(first,zziaaaaaaa),(last,zfaaaaaaaa)})
>> (zzkaaaaaaazzzdaaaaaa,{(first,zzkaaaaaaa),(last,zzzdaaaaaa)})
>> 
>> So far, so good.
>> 
>> 
>> columns = foreach rows generate flatten(cols) as (name, value);        
>> dump columns;
>> 
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>> ()
>> 
>> 
>> Not so good.
>> 
>> 
>> 
>> I've tried multiple combinations with no success.  If I leave the bag 
>> schema empty in the initial load, i.e. cols:bag{}, and then leave off the 
>> AS clause in the flatten, I get something that looks like a list of tuples. 
>> But trying to access $1 in that result gives me an Error 1000 index out of 
>> range. So that's not the ticket either.
>> 
>> What I'd really like is to flatten the bag into a map, but I've had no 
>> more success there.
>> 
>> Any help is most welcome.  (Cassandra 0.7.4 and Pig 0.8.0)
>> 
>> 
> 
