I'll just keep responding to myself. ;)

I ended up figuring out how to do it. I just used JUnit and called init(),
iterate(), terminatePartial(), etc. directly from inside the unit test. Once you
know the typical flow of calls (described below), the other main gotcha is
making sure to create a new UDAF evaluator object for each instance. In the
example below, for instance, there would be three separate UDAF instances.
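For anyone trying the same thing, here is a rough sketch of that kind of test. It is a stand-in built on assumptions, not real Hive code. MaxIntEvaluator and MaxEvaluatorFlowDemo are hypothetical names. The evaluator uses plain Integer and does not implement org.apache.hadoop.hive.ql.exec.UDAFEvaluator, whereas a real evaluator would typically use Hadoop types such as IntWritable. A main() takes the place of the JUnit test, so the sketch runs without Hive or JUnit on the classpath. It replays the call sequence from the quoted thread with a fresh evaluator object per instance, and it also includes Pat's null-argument check.

```java
// Hypothetical stand-in for a Hive UDAF evaluator computing max(int).
// A real one would implement org.apache.hadoop.hive.ql.exec.UDAFEvaluator
// and likely use IntWritable; plain Integer keeps this runnable without Hive.
class MaxIntEvaluator {
    private Integer result;

    // Reset state before a new aggregation starts.
    public void init() {
        result = null;
    }

    // Consume one input row; null inputs are ignored rather than failing.
    public boolean iterate(Integer value) {
        if (value == null) {
            return true;
        }
        result = (result == null) ? value : Math.max(result, value);
        return true;
    }

    // Partial aggregate handed on to the merging instance.
    public Integer terminatePartial() {
        return result;
    }

    // Fold in another instance's partial result; for max this is just iterate.
    public boolean merge(Integer other) {
        return iterate(other);
    }

    // Final answer (null if no non-null values were seen).
    public Integer terminate() {
        return result;
    }
}

public class MaxEvaluatorFlowDemo {
    // Replays the call sequence from the thread, using a separate
    // evaluator object for each of the three instances.
    static Integer runFlow() {
        MaxIntEvaluator first = new MaxIntEvaluator();
        first.init();
        first.iterate(1);
        first.iterate(2);
        first.iterate(3);
        Integer partial1 = first.terminatePartial();   // 3

        MaxIntEvaluator second = new MaxIntEvaluator();
        second.init();
        second.iterate(4);
        second.iterate(2);
        Integer partial2 = second.terminatePartial();  // 4

        MaxIntEvaluator merger = new MaxIntEvaluator();
        merger.init();
        merger.merge(partial1);
        merger.merge(partial2);
        return merger.terminate();                     // 4
    }

    // Calls everything with null arguments to check they are handled gracefully.
    static Integer runAllNulls() {
        MaxIntEvaluator e = new MaxIntEvaluator();
        e.init();
        e.iterate(null);
        e.merge(null);
        return e.terminate();
    }

    public static void main(String[] args) {
        System.out.println(runFlow());      // prints 4
        System.out.println(runAllNulls());  // prints null
    }
}
```

In an actual JUnit test you would make the same calls on your own evaluator class and replace the println() calls with assertEquals().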

-Aurora

On Mar 11, 2011, at 5:02 PM, Aurora Skarra-Gallagher wrote:

> I'm looking for something like this, but for a UDAF instead of a UDF:
> http://svn.apache.org/repos/asf/hive/branches/branch-0.7/ql/src/test/org/apache/hadoop/hive/ql/udf/TestUDFDateDiff.java
> 
> -Aurora
> 
> On Mar 11, 2011, at 4:44 PM, Aurora Skarra-Gallagher wrote:
> 
>> Hi,
>> 
>> Did you actually call those functions directly from your unit tests? I'm
>> looking for examples of that working, but all I can find are tests that
>> check that a query produces the expected output (rather than testing the
>> UDAF directly).
>> 
>> -Aurora
>> 
>> On Mar 11, 2011, at 3:44 PM, Christopher, Pat wrote:
>> 
>>> Awesome, awesome.  That's what I had pieced together from Steve and Ed's 
>>> emails.  Glad to get confirmation on it.
>>> 
>>> It's also what I did for my unit testing. I also called everything with
>>> null arguments to make sure those got handled gracefully.
>>> 
>>> Pat
>>> 
>>> -----Original Message-----
>>> From: Aurora Skarra-Gallagher [mailto:aur...@yahoo-inc.com] 
>>> Sent: Friday, March 11, 2011 3:40 PM
>>> To: user@hive.apache.org
>>> Cc: Steven Wong
>>> Subject: Re: UDAF documentation
>>> 
>>> Hadoop: The Definitive Guide has a good section on this in Chapter 12
>>> (Hive), under User Defined Functions. It has a diagram that shows which
>>> methods are called and when. The example I'm looking at shows this sequence:
>>> 
>>> (first instance)
>>> init()
>>> iterate(1)
>>> iterate(2)
>>> iterate(3)
>>> terminatePartial()
>>> 
>>> (second instance)
>>> init()
>>> iterate(4)
>>> iterate(2)
>>> terminatePartial()
>>> 
>>> (then)
>>> init()
>>> merge(3)
>>> merge(4)
>>> terminate()
>>> 
>>> The UDAF being described is a max integer function, so each merge() call
>>> receives the highest integer from one instance (3 and 4 here), and
>>> terminate() returns the overall maximum.
>>> 
>>> -Aurora
>>> 
>>> On Mar 11, 2011, at 9:54 AM, Christopher, Pat wrote:
>>> 
>>>> Ahh, perfect. The docs don't agree with each other very well, but the case
>>>> study is great. The context in which merge() gets called was not clear to me.
>>>> 
>>>> Thanks guys!
>>>> 
>>>> Pat
>>>> 
>>>> -----Original Message-----
>>>> From: Steven Wong [mailto:sw...@netflix.com] 
>>>> Sent: Thursday, March 10, 2011 6:24 PM
>>>> To: user@hive.apache.org
>>>> Cc: Christopher, Pat
>>>> Subject: RE: UDAF documentation
>>>> 
>>>> Take a look at http://wiki.apache.org/hadoop/Hive/GenericUDAFCaseStudy, in 
>>>> case you haven't found it already.
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
>>>> Sent: Thursday, March 10, 2011 6:18 PM
>>>> To: user@hive.apache.org
>>>> Cc: Christopher, Pat
>>>> Subject: Re: UDAF documentation
>>>> 
>>>> On Thu, Mar 10, 2011 at 8:27 PM, Christopher, Pat
>>>> <patrick.christop...@hp.com> wrote:
>>>>> Hi Guys,
>>>>> 
>>>>> I'm writing a UDAF to run against Hive 0.5 or Hive 0.7. The documentation
>>>>> I can find says to implement UDAFEvaluator and make sure to implement
>>>>> init(), aggregate() and evaluate(). However, all of the examples I can
>>>>> find implement init(), iterate(), merge(), terminatePartial() and
>>>>> terminate().
>>>>> 
>>>>> What's the difference, and where can I find documentation on how to
>>>>> write a UDAF?
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Pat
>>>> 
>>>> At times the documentation may lag behind the code. I would check out
>>>> the Hive source code for the version you are working with and base
>>>> your work on similar existing UDAFs.
>>>> 
>>>> Edward
>>>> 
>>> 
>> 
> 
