Hi all,

I'm trying to read some data with CassandraStorage (contributed by the
Cassandra project) and then work on it, but the format of the data is just
incredibly ugly. When I simply load and dump it, I can see that the format
is something like this:

(key,{(col0,col0value),(col1,col1value),(col2,col2value),(col3,col3value)})


which makes my UDFs incredibly ugly:

public Boolean exec(Tuple arg0) throws IOException {
    // Column values extracted from the bag
    double col1 = 0.0;
    double col2 = 0.0;

    // The first field of the input tuple is the bag of (name, value) pairs
    DataBag b = (DataBag) arg0.get(0);
    Iterator<Tuple> i = b.iterator();
    while (i.hasNext()) {
        Tuple next = i.next();
        String name = next.get(0).toString();
        if ("col1".equals(name)) {
            col1 = Double.parseDouble(next.get(1).toString());
        } else if ("longitude".equals(name)) {
            col2 = Double.parseDouble(next.get(1).toString());
        }
    }

    ...
}


As you can see, most of this code just iterates over the DataBag and maps
the column names to their values before any work on the real data can
start. Since my guess is that this is quite common and time-consuming, I
was wondering whether there is a better way to prepare the data before
passing it to the UDFs: some sort of HashMap that extracts the column names
and values and stores them correctly.
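
To make the idea concrete, here is a rough sketch of the kind of helper I
have in mind. It is not an existing Pig or Cassandra API: the class
ColumnMapSketch and the method toColumnMap are names I made up, and plain
Java lists stand in for DataBag and Tuple so that the snippet compiles on
its own. In a real UDF the loop would walk the DataBag and call
Tuple.get(int) instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig or Cassandra.
class ColumnMapSketch {

    // Turns a bag of (name, value) pairs into a name -> value map.
    // Each inner list stands in for one two-field tuple from the bag.
    public static Map<String, String> toColumnMap(Iterable<? extends List<?>> bag) {
        Map<String, String> columns = new HashMap<>();
        for (List<?> pair : bag) {
            // Skip malformed pairs instead of failing the whole record.
            if (pair.size() < 2 || pair.get(0) == null || pair.get(1) == null) {
                continue;
            }
            columns.put(pair.get(0).toString(), pair.get(1).toString());
        }
        return columns;
    }

    public static void main(String[] args) {
        // Sample data made up for illustration only.
        List<List<Object>> bag = Arrays.asList(
                Arrays.<Object>asList("col1", "1.5"),
                Arrays.<Object>asList("longitude", "8.54"));

        Map<String, String> columns = toColumnMap(bag);

        // The UDF body can now look columns up by name.
        double col1 = Double.parseDouble(columns.get("col1"));
        double col2 = Double.parseDouble(columns.get("longitude"));
        System.out.println(col1 + " " + col2);
    }
}
```

With something like this, each UDF would only need one call to build the
map and could then read the columns it cares about by name, rather than
repeating the if/else chain over the bag.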

Regards,
Chris

--
Christian Decker
Software Architect
http://blog.snyke.net
