I think I must be confused because it seems like you know the keys
beforehand, so you can create a tuple in the Loader. And if you don't know
them, you can't pull them out anyhow...

Anyway, is the original question how to pull the key-value pairs out of a
map prior to processing them in a UDF?

mapdata = Load 'foo' using MyLoader() as ( hash:map[] );
rows = foreach mapdata generate
  (chararray) hash#'key1' as key1,
  (int) hash#'key2' as key2;

....

Does that help?

-D
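The `#` lookups and casts in the script above can be approximated in plain Java to show what happens per row. This is only an analogy. It assumes the loader stored the column values as strings, and `MapProjection`, `getString` and `getInt` are hypothetical names, not Pig API; no Pig classes are used. An absent key yields null, as a lookup of a missing key in a Pig map does.

```java
import java.util.Map;

// Hypothetical plain-Java analogy for the Pig expressions
//   (chararray) hash#'key1'   and   (int) hash#'key2'
// No Pig classes are used; values are assumed to be stored as strings.
class MapProjection {

    // Look up a key and return its value as a string; an absent key yields null.
    static String getString(Map<String, Object> hash, String key) {
        Object value = hash.get(key);
        return value == null ? null : value.toString();
    }

    // Look up a key and convert its value to an int. Returns null when the key
    // is absent or the value is not a valid integer (a choice made for this sketch).
    static Integer getInt(Map<String, Object> hash, String key) {
        Object value = hash.get(key);
        if (value == null) {
            return null;
        }
        if (value instanceof Integer) {
            return (Integer) value;
        }
        try {
            return Integer.parseInt(value.toString().trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

The point is that the keys travel with each row inside the map, so the script only needs to name the keys it cares about.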

On Wed, Sep 8, 2010 at 10:48 AM, Christian Decker <decker.christ...@gmail.com> wrote:

> Well, my problem with the tuple is that I do not get key-value pairs that
> can be accessed by key in my Pig scripts. As I understand it, there is
> currently no way to access a value by its key, because the keys are simply
> aliases for the indices in the tuple and have to be specified by the UDF.
> So basically I cannot just return a HashMap and then reference the values
> by key.
>
> What I'm trying to do is load rows from Cassandra and then work on them,
> but the CassandraStorage provided by Cassandra just creates a tuple of the
> key and a DataBag containing the key-value pairs as tuples. I'd like to have
> a more MySQL-like way of addressing the columns loaded from Cassandra :-)
> --
> Christian Decker
> Software Architect
> http://blog.snyke.net
>
>
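The point that field names are only aliases for positions can be illustrated with a small plain-Java sketch: the row stores values by index, and a separate schema resolves an alias to that index. `PositionalRow` and its methods are hypothetical and only model the idea; they are not Pig's `Tuple` or `Schema` classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a positional row plus an alias-to-index schema.
class PositionalRow {
    private final List<Object> fields;           // values, addressed only by position
    private final Map<String, Integer> aliases;  // schema: alias -> position

    PositionalRow(List<String> schema, List<Object> fields) {
        if (schema.size() != fields.size()) {
            throw new IllegalArgumentException("schema and fields differ in length");
        }
        this.fields = new ArrayList<>(fields);
        this.aliases = new HashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            aliases.put(schema.get(i), i);
        }
    }

    // Positional access, analogous to reading a tuple field by index.
    Object get(int index) {
        return fields.get(index);
    }

    // Access by name is only a lookup of the alias's position in the schema.
    Object get(String alias) {
        Integer index = aliases.get(alias);
        if (index == null) {
            throw new IllegalArgumentException("unknown alias: " + alias);
        }
        return fields.get(index);
    }
}
```

Because the aliases are fixed when the script is compiled, the column names have to be known in advance; a map field instead carries its keys with each row.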
> On Tue, Sep 7, 2010 at 7:52 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
> > Yes. A tuple is (kind of) like a row in a database table: a container for
> > fields, which may be of a number of different types. A LoadFunc returns
> > rows. You can stuff any objects into them that you like, however, by
> > serializing them into bytearrays, or by doing things like extending Tuple
> > and overriding its methods (see, for example, the ProtobufTuple in
> > Elephant-Bird:
> > http://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/util/ProtobufTuple.java).
> > The latter should probably be considered a risky thing to do, as the Tuple
> > class is still evolving.
> >
> > Can you be more specific about what you are trying to do?
> >
> > -D
> >
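The idea behind extending Tuple, as ProtobufTuple does, is to expose an existing object through positional accessors instead of copying it into a generic tuple. Below is a plain-Java sketch of that idea, similar in spirit but not ProtobufTuple's actual code. `RowView` is a hypothetical, minimal stand-in for Pig's much larger `Tuple` interface, and `MapBackedRow` is likewise a hypothetical name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical, minimal stand-in for a tuple interface; Pig's real Tuple has many more methods.
interface RowView {
    int size();
    Object get(int index);
}

// Exposes selected entries of a backing map as positional fields without copying them.
class MapBackedRow implements RowView {
    private final Map<String, Object> backing;
    private final List<String> fieldOrder;  // field i is backing.get(fieldOrder.get(i))

    MapBackedRow(Map<String, Object> backing, List<String> fieldOrder) {
        this.backing = backing;
        this.fieldOrder = fieldOrder;
    }

    @Override
    public int size() {
        return fieldOrder.size();
    }

    @Override
    public Object get(int index) {
        // Resolved on each call, so later changes to the backing map are visible.
        return backing.get(fieldOrder.get(index));
    }
}
```

With the real `Tuple` interface, every method would need a sensible implementation, which is part of why this approach is fragile while that interface is still changing.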
> > On Tue, Sep 7, 2010 at 8:19 AM, Christian Decker <decker.christ...@gmail.com> wrote:
> >
> > > I was thinking about creating my own CassandraStorage to be able to
> > > efficiently load data from a secondary index, and since I was already
> > > writing most of the code, I thought it might be a good idea to convert
> > > the data right away into the format I'd like to work with later. But as
> > > it turns out, LoadFunc is not generic and can therefore only return
> > > Tuples. Is that correct?
> > >
> > > Regards,
> > > Chris
> > >
> > > On Wed, Aug 25, 2010 at 10:08 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > >
> > > > Yeah, absolutely. You can have an EvalFunc<Map> that does this. I think
> > > > it has to be <Map>, not <Map<String, Object>>, because of how function
> > > > prototypes get mapped, but it's more or less the same deal.
> > > >
> > > > -D
> > > >
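The conversion described here, turning the bag of (name, value) tuples into a single map, comes down to one loop. To keep the sketch runnable without Pig on the classpath, each column tuple is modelled as a two-element `List<Object>` and the bag as an `Iterable` of them; `ColumnMaps.toMap` is a hypothetical name. Inside a real `EvalFunc<Map>`, the same loop would iterate over the `DataBag` (which is iterable over its tuples) and read each tuple with `get(0)` and `get(1)`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ColumnMaps {
    // Converts columns shaped like {(col0,col0value),(col1,col1value),...}
    // into {col0=col0value, col1=col1value, ...}.
    // Each column is modelled as a two-element list [name, value];
    // if a name appears twice, the later value overwrites the earlier one.
    static Map<String, Object> toMap(Iterable<List<Object>> columns) {
        Map<String, Object> result = new HashMap<>();
        for (List<Object> column : columns) {
            if (column.size() < 2 || column.get(0) == null) {
                continue; // skip malformed entries rather than failing the whole row
            }
            result.put(column.get(0).toString(), column.get(1));
        }
        return result;
    }
}
```

With the columns in a map, the lookups in the original UDF shrink to calls like `Double.parseDouble(columns.get("col1").toString())`, plus a null check for rows that lack the column.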
> > > > On Wed, Aug 25, 2010 at 11:53 AM, Christian Decker <decker.christ...@gmail.com> wrote:
> > > >
> > > > > I'm not sure either, but it's a good point. So basically it would be
> > > > > possible to create a UDF that generates a Map<String, Object> from my
> > > > > input, right?
> > > > >
> > > > >
> > > > > On Wed, Aug 25, 2010 at 8:11 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > > > >
> > > > > > Chris,
> > > > > > This sort of pattern is not common because Map<String, Object> is a
> > > > > > built-in data type in Pig; I am not sure why Cassandra doesn't just
> > > > > > use it. That would seem to be the right solution based on what I am
> > > > > > reading in your email.
> > > > > >
> > > > > > -D
> > > > > >
> > > > > > On Wed, Aug 25, 2010 at 10:59 AM, Christian Decker <decker.christ...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'm trying to read some data from CassandraStorage (contributed by
> > > > > > > Cassandra) and then work on it, but the format of the data is just
> > > > > > > incredibly ugly. When I just load and dump it, I can see that the
> > > > > > > format is something like this:
> > > > > > >
> > > > > > > (key,{(col0,col0value),(col1,col1value),(col2,col2value),(col3,col3value)})
> > > > > > >
> > > > > > >
> > > > > > > which makes my UDFs incredibly ugly:
> > > > > > >
> > > > > > > public Boolean exec(Tuple arg0) throws IOException {
> > > > > > >     double col1 = 0, col2 = 0;
> > > > > > >     DataBag b = (DataBag) arg0.get(0);
> > > > > > >     Iterator<Tuple> i = b.iterator();
> > > > > > >     // Map each (name, value) tuple in the bag onto the matching variable
> > > > > > >     while (i.hasNext()) {
> > > > > > >         Tuple next = i.next();
> > > > > > >         if ("col1".equals(next.get(0).toString()))
> > > > > > >             col1 = Double.parseDouble(next.get(1).toString());
> > > > > > >         else if ("col2".equals(next.get(0).toString()))
> > > > > > >             col2 = Double.parseDouble(next.get(1).toString());
> > > > > > >     }
> > > > > > >     ...
> > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > > As you can see, most of this is just iterating over the DataBag and
> > > > > > > mapping the column names to their values before working on the real
> > > > > > > data. Since my guess is that this is quite commonplace and
> > > > > > > time-consuming, I was wondering whether there is a better way to
> > > > > > > prepare the data before passing it to the UDFs, some sort of HashMap
> > > > > > > that extracts the column names and values and stores them correctly.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Chris
> > > > > > >
