On Nov 17, 2011, at 1:44 PM, Aaron Griffith wrote:

> Jeremy Hanna <jeremy.hanna1234 <at> gmail.com> writes:
>
>> If you are only interested in loading one row, why do you need to use Pig?
>> Is it an extremely wide row?
>>
>> Unless you are using an ordered partitioner, you can't limit the rows you
>> mapreduce over currently - you have to mapreduce over the whole column
>> family. That will probably change in 1.1. However, again, if you're only
>> after 1 row, why don't you just use a regular cassandra client and get
>> that row and operate on it that way?
>>
>> I suppose you *could* use pig and filter by the ID or something. If you
>> *do* have an ordered partitioner in your cluster, it's just a matter of
>> specifying the key range.
>>
>> On Nov 17, 2011, at 11:16 AM, Aaron Griffith wrote:
>>
>>> I am trying to do the following with a PIG script and am having trouble
>>> finding the correct syntax.
>>>
>>> - I want to use the LOAD function to load a single key/value "row" into
>>>   a pig object.
>>> - The contents of that row are then flattened into a list of keys.
>>> - I then want to use that list of keys for another load function to
>>>   select the key/value pairs from another column family.
>>>
>>> The only way I can get this to work is by using a generic load function,
>>> then applying filters to get at the data I want, then joining the two
>>> pig objects together to filter the second column family.
>>>
>>> I want to avoid having to pull the entire column families into pig; it
>>> is way too much data.
>>>
>>> Any suggestions?
>>>
>>> Thanks!
>
> It is a very wide row, with nested keys to another column family. Pig makes
> it easy to convert it into a list of keys.
>
> It also makes it easy to write out the results into Hadoop.
>
> I then want to take that list of keys to go get rows from whatever column
> family they are for.
>
> Thanks for your response.
Okay, makes sense. There is work being done to support wide rows with mapreduce - https://issues.apache.org/jira/browse/CASSANDRA-3264 - which is now being worked on as part of transposition - https://issues.apache.org/jira/browse/CASSANDRA-2474. Transposition would turn each wide row into several transposed rows, i.e. (key, column, value) combinations.

I think the easiest way to do what you're trying to do is to use a client to page through the row and get the whole thing; then you can copy that up to HDFS or whatever else you want to do with it.
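For what it's worth, the paging pattern looks roughly like this: repeatedly fetch a bounded slice of columns, restarting each slice just past the last column seen. This is only a sketch of the loop - `FakeColumnFamily`, `get_slice`, and `page_wide_row` are hypothetical stand-ins, not any particular client's API (with pycassa, for example, you'd use the real slice parameters on `ColumnFamily.get` instead).

```python
class FakeColumnFamily:
    """In-memory stand-in for a Cassandra column family (hypothetical, for illustration)."""

    def __init__(self, rows):
        # {row_key: {column_name: value}}, with string column names so they sort
        self.rows = rows

    def get_slice(self, key, start, count):
        """Return up to `count` (column, value) pairs whose column name >= start."""
        cols = sorted(self.rows[key].items())
        return [(c, v) for c, v in cols if c >= start][:count]


def page_wide_row(cf, key, page_size=100):
    """Yield every (column, value) in a wide row, one bounded slice at a time."""
    start = ""  # "" sorts before any column name, so the first slice starts at the beginning
    while True:
        page = cf.get_slice(key, start, page_size)
        if not page:
            break
        for col, val in page:
            yield col, val
        # Restart just past the last column seen; appending "\x00" produces the
        # smallest string that compares greater than last_col.
        last_col = page[-1][0]
        start = last_col + "\x00"
        if len(page) < page_size:
            break  # short page means the row is exhausted
```

Once the full row is in hand client-side, writing the column names out as a key list for the second column family (or copying the data up to HDFS) is straightforward.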