Re: Expand row content

Dmitriy Ryaboy Wed, 25 Aug 2010 13:09:10 -0700

Yeah, absolutely. You can have an EvalFunc<Map> that does this. I think it
has to be <Map>, not <Map<String, Object>>, because of how function
prototypes get mapped, but it is more or less the same deal.
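
A minimal sketch of the conversion such a UDF would perform, assuming each
column arrives as a (name, value) pair as in the dump quoted below. To keep it
runnable without Pig on the classpath, plain java.util lists stand in for
DataBag and Tuple; the names ColumnsToMap and toMap are hypothetical. In an
actual UDF the same loop would sit inside exec(Tuple) of a class extending
EvalFunc<Map>.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Pig or of the CassandraStorage contrib.
class ColumnsToMap {

    // Each inner list stands in for a (name, value) Tuple; the outer list
    // stands in for the DataBag of columns belonging to one row.
    static Map<String, Object> toMap(List<List<Object>> columns) {
        Map<String, Object> result = new HashMap<>();
        if (columns == null) {
            return result;
        }
        for (List<Object> column : columns) {
            // Skip malformed entries rather than failing the whole row.
            if (column == null || column.size() < 2 || column.get(0) == null) {
                continue;
            }
            result.put(column.get(0).toString(), column.get(1));
        }
        return result;
    }
}
```

Downstream UDFs could then look a column up by name instead of iterating over
the bag each time.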


-D

On Wed, Aug 25, 2010 at 11:53 AM, Christian Decker <
decker.christ...@gmail.com> wrote:

> I'm not sure either, but it's a good point. So basically it would be
> possible to create a UDF that generates a Map<String, Object> from my input,
> right?
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net
>
>
> On Wed, Aug 25, 2010 at 8:11 PM, Dmitriy Ryaboy <dvrya...@gmail.com>
> wrote:
>
> > Chris,
> > This sort of pattern is not common because Map<String, Object> is a
> > built-in data type in Pig; I am not sure why Cassandra doesn't just use it.
> > That would seem to be the right solution based on what I am reading in your
> > email.
> >
> > -D
> >
> > On Wed, Aug 25, 2010 at 10:59 AM, Christian Decker <
> > decker.christ...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I'm trying to read some data from CassandraStorage (contrib by Cassandra)
> > > and then work on it, but the format of the data is just incredibly ugly.
> > > When just loading it and dumping it I can see that the format is something
> > > like this:
> > >
> > > (key,{(col0,col0value),(col1,col1value),(col2,col2value),(col3,col3value)})
> > >
> > > which makes my UDFs incredibly ugly:
> > >
> > > public Boolean exec(Tuple arg0) throws IOException {
> > >     DataBag b = (DataBag) arg0.get(0);
> > >     Iterator<Tuple> i = b.iterator();
> > >     // Scan the bag of (name, value) tuples for the columns of interest.
> > >     while (i.hasNext()) {
> > >         Tuple next = i.next();
> > >         if ("col1".equals(next.get(0).toString()))
> > >             col1 = Double.parseDouble(next.get(1).toString());
> > >         else if ("longitude".equals(next.get(0).toString()))
> > >             col2 = Double.parseDouble(next.get(1).toString());
> > >     }
> > >     ...
> > > }
> > >
> > >
> > > As you can see, most of this is just iterating over the DataBag and
> > > mapping the column names to their values before working on the real data.
> > > Since my guess is that this is quite commonplace and time-consuming, I was
> > > wondering whether there is a better way to prepare the data before passing
> > > it to the UDFs, some sort of HashMap that extracts column names and values
> > > and stores them correctly.
> > >
> > > Regards,
> > > Chris
> > >
> > > --
> > > Christian Decker
> > > Software Architect
> > > http://blog.snyke.net
> > >
> >
>
