Chris,
This sort of pattern is not common because Map<String, Object> is a
native data type in Pig. I am not sure why Cassandra doesn't just use it;
based on what I am reading in your email, that would seem to be the right
solution.
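
Until the loader does that, a rough sketch of the workaround is to build
the map once at the top of the UDF and look columns up by name afterwards.
The class and method names below (ColumnBagToMap, toMap, asDouble) are made
up for illustration and are not part of Pig or CassandraStorage. Plain
java.util lists stand in for Pig's DataBag and Tuple so the sketch runs on
its own; in a real UDF you would iterate the DataBag and call Tuple.get(0)
and Tuple.get(1) instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnBagToMap {

    // Assumption: each inner list models one (name, value) column tuple,
    // standing in for the tuples inside the bag that CassandraStorage returns.
    static Map<String, Object> toMap(Iterable<? extends List<?>> columns) {
        Map<String, Object> byName = new HashMap<>();
        for (List<?> column : columns) {
            if (column.size() < 2 || column.get(0) == null) {
                continue; // skip malformed columns rather than failing the row
            }
            byName.put(column.get(0).toString(), column.get(1));
        }
        return byName;
    }

    // Returns the named column parsed as a Double, or null if it is absent.
    static Double asDouble(Map<String, Object> row, String name) {
        Object value = row.get(name);
        return value == null ? null : Double.valueOf(value.toString());
    }

    public static void main(String[] args) {
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList("col1", "1.5"),
            Arrays.<Object>asList("col2", "2.5"));
        Map<String, Object> row = toMap(bag);
        // prints "1.5 2.5"
        System.out.println(asDouble(row, "col1") + " " + asDouble(row, "col2"));
    }
}
```

With a helper like this the if/else chain over column names goes away, and
each UDF only has to name the columns it actually needs.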

-D

On Wed, Aug 25, 2010 at 10:59 AM, Christian Decker <
decker.christ...@gmail.com> wrote:

> Hi all,
>
> I'm trying to read some data from CassandraStorage (contrib by Cassandra)
> and then work on it, but the format of the data is just incredibly ugly.
> When just loading it and dumping it I can see that the format is something
> like this:
>
> (key,{(col0,col0value),(col1,col1value),(col2,col2value),(col3,col3value)})
>
>
> which makes my UDFs incredibly ugly:
>
> public Boolean exec(Tuple arg0) throws IOException {
>   // The bag holds one (name, value) tuple per column.
>   DataBag b = (DataBag) arg0.get(0);
>   Iterator<Tuple> i = b.iterator();
>   while (i.hasNext()) {
>     Tuple next = i.next();
>     if ("col1".equals(next.get(0).toString()))
>       col1 = Double.parseDouble(next.get(1).toString());
>     else if ("col2".equals(next.get(0).toString()))
>       col2 = Double.parseDouble(next.get(1).toString());
>   }
>   ...
> }
>
>
> As you can see, most of this is just iterating over the DataBag and
> mapping the column names to their values before working on the real data.
> Since my guess is that this is quite commonplace and time-consuming, I was
> wondering whether there is a better way to prepare the data before passing
> it to the UDFs, some sort of HashMap that extracts column names and values
> and stores them correctly.
>
> Regards,
> Chris
>
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net
>
