On Nov 17, 2011, at 1:44 PM, Aaron Griffith wrote:

> Jeremy Hanna <jeremy.hanna1234 <at> gmail.com> writes:
>
>> If you are only interested in loading one row, why do you need to use Pig?
>> Is it an extremely wide row?
>>
>> Unless you are using an ordered partitioner, you can't limit the rows you
>> mapreduce over currently - you have to mapreduce over the whole column
>> family. That will probably change in 1.1. However, again, if you're only
>> after 1 row, why don't you just use a regular cassandra client and get
>> that row and operate on it that way?
>>
>> I suppose you *could* use pig and filter by the ID or something. If you
>> *do* have an ordered partitioner in your cluster, it's just a matter of
>> specifying the key range.
>>
>> On Nov 17, 2011, at 11:16 AM, Aaron Griffith wrote:
>>
>>> I am trying to do the following with a PIG script and am having trouble
>>> finding the correct syntax.
>>>
>>> - I want to use the LOAD function to load a single key/value "row" into
>>>   a pig object.
>>> - The contents of that row are then flattened into a list of keys.
>>> - I then want to use that list of keys for another load function to
>>>   select the key/value pairs from another column family.
>>>
>>> The only way I can get this to work is by using a generic load function,
>>> then applying filters to get at the data I want, then joining the two
>>> pig objects together to filter the second column family.
>>>
>>> I want to avoid having to pull the entire column families into pig; it
>>> is way too much data.
>>>
>>> Any suggestions?
>>>
>>> Thanks!
>
> It is a very wide row, with nested keys to another column family. Pig makes
> it easy to convert it into a list of keys.
>
> It also makes it easy to write out the results into Hadoop.
>
> I then want to take that list of keys to go get rows from whatever column
> family they are for.
>
> Thanks for your response.
Okay, makes sense. There is work being done to support wide rows with mapreduce - https://issues.apache.org/jira/browse/CASSANDRA-3264 - which is now being worked on as part of transposition - https://issues.apache.org/jira/browse/CASSANDRA-2474. Transposition would turn each wide row into several transposed rows, i.e. (key, column, value) combinations.

I think the easiest way to do what you're trying to do is to use a client to page through the row and get the whole thing; then you can copy that up to HDFS or whatever else you want to do with it.
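For what it's worth, the paging pattern looks roughly like this: repeatedly fetch a bounded slice of columns, restarting each slice just past the last column seen. This is only a sketch of the loop - `FakeColumnFamily`, `get_slice`, and `page_wide_row` are hypothetical stand-ins, not any particular client's API (with pycassa, for example, you'd use the real slice parameters on `ColumnFamily.get` instead).

```python
class FakeColumnFamily:
    """In-memory stand-in for a Cassandra column family (hypothetical, for illustration)."""

    def __init__(self, rows):
        # {row_key: {column_name: value}}, with string column names so they sort
        self.rows = rows

    def get_slice(self, key, start, count):
        """Return up to `count` (column, value) pairs whose column name >= start."""
        cols = sorted(self.rows[key].items())
        return [(c, v) for c, v in cols if c >= start][:count]


def page_wide_row(cf, key, page_size=100):
    """Yield every (column, value) in a wide row, one bounded slice at a time."""
    start = ""  # "" sorts before any column name, so the first slice starts at the beginning
    while True:
        page = cf.get_slice(key, start, page_size)
        if not page:
            break
        for col, val in page:
            yield col, val
        # Restart just past the last column seen; appending "\x00" produces the
        # smallest string that compares greater than last_col.
        last_col = page[-1][0]
        start = last_col + "\x00"
        if len(page) < page_size:
            break  # short page means the row is exhausted
```

Once the full row is in hand client-side, writing the column names out as a key list for the second column family (or copying the data up to HDFS) is straightforward.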