You can write an EvalFunc UDF that depends on sorted input, and
several in piggybank do so. COR (the correlation UDF) is one such
example. You call these UDFs on a relation after ordering it.

For example:

answers = foreach (group data by key)
{
  sorted = order data by value;
  generate my_udf(sorted.field1, sorted.field2);
}

If I remember correctly, you can in fact also do this:

sorted = order data by field;
answer = foreach (group sorted all) generate my_udf(sorted.field, sorted.other_field);

Although strictly speaking, Pig doesn't guarantee that a sort is
maintained outside of a nested foreach block ({}).
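
On the UDF side, the order-dependent part is just a single pass over
the bag that remembers the previous value. Here is a minimal sketch of
that logic in plain Java, so it runs without Pig on the classpath. The
class name SortedGapSketch and the method maxGap are made up for
illustration, and a List<Long> stands in for the DataBag that a real
EvalFunc.exec() would receive. It also checks the ordering itself
rather than trusting it, given the caveat above.

```java
import java.util.Arrays;
import java.util.List;

public class SortedGapSketch {
    // Hypothetical stand-in for the body of an EvalFunc's exec(): in a real
    // UDF the values would come from iterating the bag passed in as
    // sorted.field1, not from a List<Long>.
    public static long maxGap(List<Long> sortedValues) {
        long best = 0;
        Long previous = null;
        for (Long current : sortedValues) {
            if (previous != null) {
                if (current < previous) {
                    // Fail loudly instead of returning a silently wrong answer
                    // when the input did not arrive in ascending order.
                    throw new IllegalStateException("input is not sorted ascending");
                }
                // Largest difference between neighbouring values seen so far.
                best = Math.max(best, current - previous);
            }
            previous = current;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(maxGap(Arrays.asList(1L, 4L, 5L, 11L)));
    }
}
```

The result is only meaningful when the values arrive in ascending
order, which is why the ordering is done inside the nested foreach in
the first example.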

I can't help with the JOIN; I don't know about that. But check out
Pig's Bloom filter:
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/Bloom.html

Russell Jurney twitter.com/rjurney


On Oct 5, 2012, at 11:46 AM, Brian Stempin wrote:

> Hi,
> I'm fairly new to writing UDFs and Pig in general.  I want to be able to 
> write a UDF that can take advantage of MapReduce's sorting of data.  
> Specifically, I'm trying to conceive how I'd write a UDF to do a specialized 
> join or a pivot. In both cases, sorting would be useful.  EvalFunc seems to 
> give no guarantees about ordering of tuples that are passed in.
>
> Is there any way to do such things as a UDF?
>
> TIA for your help,
> Brian Stempin
> Machine Learning Engineer
> ColdLight Solutions, LLC
