I think I must be confused because it seems like you know the keys beforehand, so you can create a tuple in the Loader. And if you don't know them, you can't pull them out anyhow...
Anyway, is the original question how to pull the key-value pairs out of a map prior to processing them in a UDF?

mapdata = Load 'foo' using MyLoader() as ( hash:map[] );
rows = foreach mapdata generate (chararray) hash#'key1' as key1, (int) hash#'key2' as key2;
....

Does that help?

-D

On Wed, Sep 8, 2010 at 10:48 AM, Christian Decker <decker.christ...@gmail.com> wrote:

> Well my problem with the tuple is that I do not get key-value pairs that can
> be accessed by using the key in my Pig Scripts. As I understand it there is
> currently no way to access a value by its key, because the keys are simply
> aliases to the indices in the tuple, and have to be specified by the UDF.
> So basically I cannot just return a hashmap and then reference its entries.
>
> What I'm trying to do is to load rows from Cassandra and then work on them,
> but the CassandraStorage provided by Cassandra just creates a Tuple of the
> key and a databag containing key-value pairs as tuples. I'd like to have a
> more MySQL-esque way of addressing the columns loaded from Cassandra :-)
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net
>
> On Tue, Sep 7, 2010 at 7:52 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
> > Yes. A tuple is (kind of) like a row in a database table -- a container for
> > fields, which may be of a number of different types. A LoadFunc returns
> > rows. You can stuff any objects into them that you like, however, by
> > serializing them into bytearrays, or by doing things like extending Tuple
> > and overriding its methods (see for example the ProtobufTuple in Elephant-Bird:
> > http://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/util/ProtobufTuple.java).
> > The latter should probably be considered a risky thing to do, as the Tuple
> > class is still evolving.
> >
> > Can you be more specific about what you are trying to do?
> >
> > -D
> >
> > On Tue, Sep 7, 2010 at 8:19 AM, Christian Decker <decker.christ...@gmail.com> wrote:
> >
> > > I was thinking about creating my own Cassandra Storage to be able to
> > > efficiently load data from a secondary index, and since I was already
> > > writing most of the stuff I thought it might be a good idea to already
> > > convert it into the format I'd like to elaborate on later, but as it turns
> > > out LoadFunc is not generic and can therefore only return Tuples, is that
> > > correct?
> > >
> > > Regards,
> > > Chris
> > >
> > > On Wed, Aug 25, 2010 at 10:08 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > >
> > > > yeah, absolutely. You can have an EvalFunc<Map> that does this. I think it
> > > > has to be <Map>, not <Map<String, Object>> because of how function
> > > > prototypes get mapped, but more or less the same deal.
> > > >
> > > > -D
> > > >
> > > > On Wed, Aug 25, 2010 at 11:53 AM, Christian Decker <decker.christ...@gmail.com> wrote:
> > > >
> > > > > I'm not sure either, but it's a good point. So basically it would be
> > > > > possible to create a UDF that generates a Map<String, Object> from my
> > > > > input, right?
> > > > > --
> > > > > Christian Decker
> > > > > Software Architect
> > > > > http://blog.snyke.net
> > > > >
> > > > > On Wed, Aug 25, 2010 at 8:11 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > > > >
> > > > > > Chris,
> > > > > > This sort of pattern is not common because Map<String, Object> is a
> > > > > > built-in data type in Pig; I am not sure why Cassandra doesn't just
> > > > > > use it. That would seem to be the right solution based on what I am
> > > > > > reading in your email.
> > > > > >
> > > > > > -D
> > > > > >
> > > > > > On Wed, Aug 25, 2010 at 10:59 AM, Christian Decker <decker.christ...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'm trying to read some data from CassandraStorage (contrib by Cassandra)
> > > > > > > and then work on it, but the format of the data is just incredibly ugly.
> > > > > > > When just loading it and dumping it I can see that the format is something
> > > > > > > like this:
> > > > > > >
> > > > > > > (key,{(col0,col0value),(col1,col1value),(col2,col2value),(col3,col3value)})
> > > > > > >
> > > > > > > which makes my UDFs incredibly ugly:
> > > > > > >
> > > > > > > public Boolean exec(Tuple arg0) throws IOException {
> > > > > > >     // The bag of (columnName, columnValue) tuples
> > > > > > >     DataBag b = (DataBag) arg0.get(0);
> > > > > > >     Iterator<Tuple> i = b.iterator();
> > > > > > >     while (i.hasNext()) {
> > > > > > >         Tuple next = i.next();
> > > > > > >         // Match each column by name and parse its value
> > > > > > >         if ("col1".equals(next.get(0).toString()))
> > > > > > >             col1 = Double.parseDouble(next.get(1).toString());
> > > > > > >         else if ("longitude".equals(next.get(0).toString()))
> > > > > > >             col2 = Double.parseDouble(next.get(1).toString());
> > > > > > >     }
> > > > > > >     ...
> > > > > > > }
> > > > > > >
> > > > > > > As you can see, most of this is just iterating over the DataBag and
> > > > > > > mapping the column names to their values, before working on the real data.
> > > > > > > Since my guess is that this is quite commonplace and time-consuming, I was
> > > > > > > wondering whether there is a better way to prepare the data before passing
> > > > > > > it to the UDFs, some sort of HashMap that extracts column names and values
> > > > > > > and stores them correctly.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Chris
> > > > > > >
> > > > > > > --
> > > > > > > Christian Decker
> > > > > > > Software Architect
> > > > > > > http://blog.snyke.net
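
The conversion this thread keeps coming back to, turning a bag of (columnName, columnValue) pairs into something addressable by column name, can be sketched roughly as follows. This is only a sketch under stated assumptions: plain Java collections stand in for Pig's DataBag and Tuple so that it runs without Pig on the classpath, and the names ColumnBagToMap and toMap are made up for illustration; they are not part of Pig or Cassandra.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: flatten a list of (columnName, columnValue) pairs into a map.
 * Plain Java collections stand in for Pig's DataBag and Tuple here
 * (an assumption made so the example runs on its own).
 */
final class ColumnBagToMap {

    /** Each Object[] models one two-field tuple: {name, value}. */
    static Map<String, Object> toMap(List<Object[]> bag) {
        Map<String, Object> columns = new LinkedHashMap<>();
        if (bag == null) {
            return columns;
        }
        for (Object[] pair : bag) {
            // Skip malformed entries instead of failing the whole row.
            if (pair == null || pair.length < 2 || pair[0] == null) {
                continue;
            }
            columns.put(pair[0].toString(), pair[1]);
        }
        return columns;
    }

    public static void main(String[] args) {
        // Models {(col1,1.5),(longitude,8.54)} from the dumped row format.
        List<Object[]> bag = new ArrayList<>();
        bag.add(new Object[] {"col1", "1.5"});
        bag.add(new Object[] {"longitude", "8.54"});

        Map<String, Object> columns = toMap(bag);

        // Columns are now looked up by name instead of by scanning the bag.
        double col1 = Double.parseDouble(columns.get("col1").toString());
        System.out.println(col1);
        System.out.println(columns.keySet());
    }
}
```

In an actual UDF the same loop would presumably sit inside exec(Tuple) of the EvalFunc<Map> suggested earlier in the thread, iterating the DataBag instead of a List, after which a script could read values with the hash#'key1' syntax shown in the top reply.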