Chris,

This sort of pattern is not common, because Pig has a built-in map data type (a Map<String, Object> on the Java side). I am not sure why Cassandra doesn't just use it. Based on what I am reading in your email, that would seem to be the right solution.
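
Until the loader returns a map, the conversion can be done once up front instead of inside every UDF. Below is a minimal sketch of that step in plain Java, not a tested Pig UDF. The class name ColumnsToMap and the method toMap are made up for illustration, and a List of two-element Lists stands in for Pig's DataBag of (name, value) tuples. In a real UDF the same loop would presumably live in the exec method of an EvalFunc<Map<String, Object>> and iterate over the DataBag.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns a bag of (columnName, columnValue) pairs into a map.
// A List<List<Object>> stands in for Pig's DataBag of two-field Tuples.
final class ColumnsToMap {

    static Map<String, Object> toMap(List<List<Object>> columns) {
        // LinkedHashMap keeps the columns in the order they were read.
        Map<String, Object> result = new LinkedHashMap<>();
        if (columns == null) {
            return result;
        }
        for (List<Object> column : columns) {
            // Skip malformed entries rather than failing the whole record.
            if (column == null || column.size() < 2 || column.get(0) == null) {
                continue;
            }
            // Field 0 is the column name, field 1 is the column value.
            result.put(column.get(0).toString(), column.get(1));
        }
        return result;
    }
}
```

With the columns in a map, the UDF body reduces to lookups by name, for example Double.parseDouble(map.get("longitude").toString()), and the loop over the bag is written only once.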
-D

On Wed, Aug 25, 2010 at 10:59 AM, Christian Decker <decker.christ...@gmail.com> wrote:

> Hi all,
>
> I'm trying to read some data from CassandraStorage (contrib by Cassandra)
> and then work on it, but the format of the data is just incredibly ugly.
> When just loading it and dumping it I can see that the format is something
> like this:
>
> (key,{(col0,col0value),(col1,col1value),(col2,col2value),(col3,col3value)})
>
> which makes my UDFs incredibly ugly:
>
> public Boolean exec(Tuple arg0) throws IOException {
>     DataBag b = (DataBag) arg0.get(0);
>     Iterator<Tuple> i = b.iterator();
>     while(i.hasNext()){
>         Tuple next = i.next();
>         if("col1".equals(next.get(0).toString()))
>             col1 = Double.parseDouble(next.get(1).toString());
>         else if("longitude".equals(next.get(0).toString()))
>             col2 = Double.parseDouble(next.get(1).toString());
>     }
> }
> ...
> }
>
> As you can see the most part of this is just iterating over the DataBag and
> mapping the column names to their value, before working on the real data.
> Since my guess is that this is quite commonplace and timeconsuming, I was
> wondering whether there is a better way to prepare the data before passing
> it to the UDFs, some sort of HashMap that extracts column names and values
> and stores them correctly.
>
> Regards,
> Chris
>
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net