A custom loader which partitions on a key known to live on a given box?

Jonathan Coveney Fri, 07 Jan 2011 14:35:11 -0800

I will implement this if I need to, but it seems to me that SOMEBODY has to
have run into this. I don't know if it's possible, but it's worth asking...


Basically I have a hadoop cluster of X servers, and one thing that I know is
that for anything with key k, all of the values associated with that key
will live on the same server. I've been told that the way to take advantage
of this is to make a custom loader which extends CollectibleLoader (I think,
it may be called something else), which then lets group operations be done
on the map side.
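
To make the idea concrete, here is a minimal sketch in plain Java. It does not
use Pig's API; the class MapSideGroupSketch, the Row type, and both methods are
hypothetical names invented for illustration. It assumes the stronger condition
that a key never spans two input splits, in which case each "mapper" can group
its own split and the per-split results can simply be concatenated with no
shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MapSideGroupSketch {

    /** Hypothetical record shape: a key plus one value. */
    static final class Row {
        final String key;
        final int value;

        Row(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Groups the rows of a single split in memory, as one map task would. */
    static Map<String, List<Integer>> groupSplit(List<Row> split) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Row row : split) {
            groups.computeIfAbsent(row.key, k -> new ArrayList<>()).add(row.value);
        }
        return groups;
    }

    /**
     * Groups each split independently and concatenates the results.
     * This is only correct if no key appears in more than one split,
     * so the assumption is checked and a violation raises an error.
     */
    static Map<String, List<Integer>> groupAll(List<List<Row>> splits) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (List<Row> split : splits) {
            for (Map.Entry<String, List<Integer>> e : groupSplit(split).entrySet()) {
                if (result.putIfAbsent(e.getKey(), e.getValue()) != null) {
                    throw new IllegalStateException("key spans splits: " + e.getKey());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> split1 = Arrays.asList(new Row("a", 1), new Row("b", 3), new Row("a", 2));
        List<Row> split2 = Arrays.asList(new Row("c", 4), new Row("c", 5));
        // prints {a=[1, 2], b=[3], c=[4, 5]}
        System.out.println(groupAll(Arrays.asList(split1, split2)));
    }
}
```

Keys that merely share a server but are spread over several splits would trip
the check in groupAll, so a real loader would presumably also need to control
how the flat files are split.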

I know that Zebra implements this, but the cluster at hand is all flat
files, and getting away from that is not an option. Without a special file
format, is there a reasonable way to implement this? Has anyone done
something like this? I think having this in the piggybank or pigloader, if
it's possible, would be super useful for datasets like this.

Thanks for the help
Jon
