Yeah, absolutely. You can write an EvalFunc<Map> that does this. I think the type parameter has to be the raw <Map>, not <Map<String, Object>>, because of how function prototypes get mapped, but otherwise it's the same deal.
-D

On Wed, Aug 25, 2010 at 11:53 AM, Christian Decker <decker.christ...@gmail.com> wrote:
> I'm not sure either, but it's a good point. So basically it would be
> possible to create a UDF that generates a Map<String, Object> from my
> input, right?
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net
>
> On Wed, Aug 25, 2010 at 8:11 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
> > Chris,
> > This sort of pattern is not common because Map<String, Object> is a
> > native data type in Pig; I am not sure why Cassandra doesn't just use
> > it. That would seem to be the right solution based on what I am reading
> > in your email.
> >
> > -D
> >
> > On Wed, Aug 25, 2010 at 10:59 AM, Christian Decker <decker.christ...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I'm trying to read some data from CassandraStorage (contributed by
> > > Cassandra) and then work on it, but the format of the data is just
> > > incredibly ugly. When I just load and dump it, the format looks
> > > something like this:
> > >
> > > (key,{(col0,col0value),(col1,col1value),(col2,col2value),(col3,col3value)})
> > >
> > > which makes my UDFs incredibly ugly:
> > >
> > > public Boolean exec(Tuple arg0) throws IOException {
> > >     DataBag b = (DataBag) arg0.get(0);
> > >     Iterator<Tuple> i = b.iterator();
> > >     while (i.hasNext()) {
> > >         Tuple next = i.next();
> > >         if ("col1".equals(next.get(0).toString()))
> > >             col1 = Double.parseDouble(next.get(1).toString());
> > >         else if ("longitude".equals(next.get(0).toString()))
> > >             col2 = Double.parseDouble(next.get(1).toString());
> > >     }
> > >     ...
> > > }
> > >
> > > As you can see, most of this is just iterating over the DataBag and
> > > mapping the column names to their values before working on the real
> > > data. Since I'd guess this is quite common and time-consuming, I was
> > > wondering whether there is a better way to prepare the data before
> > > passing it to the UDFs, some sort of HashMap that extracts the column
> > > names and values and stores them correctly.
> > >
> > > Regards,
> > > Chris
> > >
> > > --
> > > Christian Decker
> > > Software Architect
> > > http://blog.snyke.net
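The following is a minimal sketch of the bag-to-map conversion discussed in this thread. It assumes the CassandraStorage layout shown in the dump: a bag of (column name, value) tuples. Each inner tuple is modeled as a java.util.List so that the sketch runs without Pig on the classpath. ColumnMaps, toColumnMap and getDouble are hypothetical names and are not part of Pig or Cassandra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper sketching what an EvalFunc<Map> could do with the
 * CassandraStorage bag. Assumption: each (name, value) tuple is modeled
 * as a java.util.List here instead of a Pig Tuple.
 */
public class ColumnMaps {

    /** Builds a name -> value map from (name, value) pairs. */
    public static Map<String, Object> toColumnMap(Iterable<? extends List<?>> columns) {
        Map<String, Object> result = new HashMap<String, Object>();
        if (columns == null) {
            return result;
        }
        for (List<?> column : columns) {
            // Skip null or short tuples, and tuples without a column name.
            if (column == null || column.size() < 2 || column.get(0) == null) {
                continue;
            }
            // A later tuple with the same column name overwrites an earlier one.
            result.put(column.get(0).toString(), column.get(1));
        }
        return result;
    }

    /** Parses a numeric column via toString(), as the original UDF did. */
    public static double getDouble(Map<String, Object> columns, String name) {
        Object value = columns.get(name);
        if (value == null) {
            throw new IllegalArgumentException("missing column: " + name);
        }
        return Double.parseDouble(value.toString());
    }

    public static void main(String[] args) {
        // Stand-in for {(col0,a),(col1,12.5),(col2,-3.25)} from the dump.
        List<List<Object>> bag = new ArrayList<List<Object>>();
        bag.add(Arrays.<Object>asList("col0", "a"));
        bag.add(Arrays.<Object>asList("col1", "12.5"));
        bag.add(Arrays.<Object>asList("col2", "-3.25"));

        Map<String, Object> columns = toColumnMap(bag);
        double col1 = getDouble(columns, "col1");
        double col2 = getDouble(columns, "col2");
        System.out.println(col1 + " " + col2); // prints "12.5 -3.25"
    }
}
```

In an actual UDF declared as extending EvalFunc<Map> (using the raw Map, per the note about prototypes), exec would cast input.get(0) to DataBag. It would then run the same loop over the bag's tuples, using t.get(0) and t.get(1) in place of List.get. Scripts can then pick a column out with Pig's map dereference, for example m#'col1', and downstream UDFs receive a Map instead of walking the bag themselves.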