Thanks Mark, I ended up going with the custom reducer approach. I will try out the query you sent.
Regards,
--
Rohan Monga

On Wed, Feb 1, 2012 at 11:06 AM, Mark Grover <mgro...@oanda.com> wrote:
> Rohan,
> You could do it one of the following ways:
> 1) Write a UDAF that does the avg(f2 - avg_f2) computation.
> 2) Write a custom reducer that does the avg(f2 - avg_f2) computation.
> 3) Do it with multiple passes over the data. Something like this (untested):
>
> select
>     table.f1,
>     avg_table.avg_f2,
>     avg(table.f2 - avg_table.avg_f2)
> from
>     (
>         select
>             f1,
>             avg(f2) as avg_f2
>         from
>             table
>         group by
>             f1
>     ) avg_table
> join
>     table
> ON (avg_table.f1 = table.f1)
> group by
>     table.f1,
>     avg_table.avg_f2;
>
> Mark
>
> Mark Grover, Business Intelligence Analyst
> OANDA Corporation
>
> www: oanda.com  www: fxtrade.com
> e: mgro...@oanda.com
>
> "Best Trading Platform" - World Finance's Forex Awards 2009.
> "The One to Watch" - Treasury Today's Adam Smith Awards 2009.
>
> ----- Original Message -----
> From: "rohan monga" <monga.ro...@gmail.com>
> To: user@hive.apache.org
> Sent: Friday, January 20, 2012 6:00:54 PM
> Subject: Re: Invoke a UDAF inside another UDAF
>
> My bad, I hastily converted the query to a wrong example.
>
> It should be like this:
>
> select f1, avg(f2) as avg_f2, avg(f2 - avg_f2) from table group by f1;
>
> In essence, I just want to use the value generated by one UDAF (in this
> case avg(f2)) as a single number and then apply that value to the group
> inside a different UDAF.
> For example, if I were to use a streaming reducer, it would be something
> like this:
>
> avg1 = computeSum(list) / len(list)
> return computeSum(x - avg1 for x in list) / len(list)
>
> As I write this I realize why this might not be possible [the group
> computation being done in one step and the information being lost] :)
>
> But why the NullPointerException?
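[Editor's note: the streaming-reducer pseudocode quoted above can be sketched as runnable Python. The function names below are illustrative, not from the thread. One thing worth noticing: the mean of signed deviations from the mean is always zero by definition, so the absolute deviation is likely the quantity such a query actually wants.]

```python
def avg_of_deviations(values):
    # Two passes, as in the pseudocode: compute the group mean first,
    # then average each value's signed deviation from that mean.
    avg1 = sum(values) / len(values)
    return sum(x - avg1 for x in values) / len(values)

def mean_abs_deviation(values):
    # The signed version above is always ~0.0 (deviations cancel by the
    # definition of the mean); taking abs() gives the mean absolute
    # deviation, a more meaningful per-group statistic.
    avg1 = sum(values) / len(values)
    return sum(abs(x - avg1) for x in values) / len(values)

print(avg_of_deviations([1.0, 2.0, 3.0]))   # 0.0
print(mean_abs_deviation([1.0, 2.0, 3.0]))  # ~0.6667
```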
>
> Regards,
> --
> Rohan Monga
>
> On Fri, Jan 20, 2012 at 2:32 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > IMHO you cannot nest the percentile calculation, because the results
> > would be meaningless. percentile has to aggregate a set and pick the
> > Nth element, but if you nest, the inner percentile only returns one
> > result to the outer percentile, and that is pretty meaningless.
> >
> > (I think someone talked about this on the list in the last month or so.)
> > Without seeing your input data and your expected results, I cannot
> > understand what your query wants to do, or suggest an alternative.
> >
> > On 1/20/12, rohan monga <monga.ro...@gmail.com> wrote:
> > > Thanks Edward, that seems to work :)
> > >
> > > However, I have another query like this:
> > >
> > > select a, avg(b) as avg_b,
> > >     percentile_approx(avg_b - percentile_approx(b, .5), .5)
> > > from table1 group by a;
> > >
> > > Here I will lose the group info if I include the inner query in the
> > > FROM clause. Is there a way to get this to work?
> > >
> > > Thanks,
> > > --
> > > Rohan Monga
> > >
> > > On Fri, Jan 20, 2012 at 12:51 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > > > I think if you are grouping by b, b has to be in your select list.
> > > > Try this:
> > > >
> > > > FROM (
> > > >     select b, count(a) as theCount from table1 group by b
> > > > ) a select mean(theCount);
> > > >
> > > > I think that should work.
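[Editor's note: Edward's FROM-subquery workaround can be checked with a small Python simulation. The function name and sample rows below are hypothetical, chosen only to mirror the two-stage shape of the query: the inner block computes count(a) per group of b, and the outer aggregate averages those counts.]

```python
from collections import Counter

def mean_count_per_group(rows):
    # Inner "query": count(a) for each group of b.
    # Each row is an (a, b) pair; Counter tallies rows per b value.
    counts = Counter(b for (a, b) in rows)
    # Outer aggregate: mean(theCount) across the per-group counts.
    return sum(counts.values()) / len(counts)

rows = [(1, "x"), (2, "x"), (3, "x"), (4, "y")]
print(mean_count_per_group(rows))  # 2.0  (group "x" has 3 rows, "y" has 1)
```

This is exactly why the nested form `mean(count(a))` cannot work in one pass: the counts must be fully materialized per group before the outer mean can see them.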
> > > >
> > > > On 1/20/12, rohan monga <monga.ro...@gmail.com> wrote:
> > > > > Hi,
> > > > > I am trying to run a query like
> > > > > "select mean(count(a)) from table1 group by b;"
> > > > >
> > > > > I am getting the following error:
> > > > > <snip>
> > > > > FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> > > > > java.lang.NullPointerException
> > > > >   at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:151)
> > > > >   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:656)
> > > > >   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:777)
> > > > >   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
> > > > >   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
> > > > >   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
> > > > >   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
> > > > >   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:157)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7447)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7405)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2747)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3365)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5858)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6480)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7223)
> > > > >   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240)
> > > > >   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:428)
> > > > >   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> > > > >   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:635)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:552)
> > > > >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > > >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > >   at java.lang.reflect.Method.invoke(Method.java:597)
> > > > >   at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> > > > > </snip>
> > > > >
> > > > > Is there a workaround? I have tried with Hive 0.7.1 and 0.8.
> > > > >
> > > > > Thanks
> > > > > --
> > > > > Rohan Monga