Thanks Mark, I ended up going with the custom reducer approach. I will try out the query you sent.
Regards,
--
Rohan Monga

On Wed, Feb 1, 2012 at 11:06 AM, Mark Grover <mgro...@oanda.com> wrote:
> Rohan,
> You could do it one of the following ways:
> 1) Write a UDAF that does the avg(f2 - avg_f2) computation.
> 2) Write a custom reducer that does the avg(f2 - avg_f2) computation.
> 3) Do it with multiple passes over the data. Something like this (untested):
>
> select
>     table.f1,
>     avg_table.avg_f2,
>     avg(table.f2 - avg_table.avg_f2)
> from
>     (
>         select
>             f1,
>             avg(f2) as avg_f2
>         from
>             table
>         group by
>             f1
>     ) avg_table
> join
>     table
> ON (avg_table.f1 = table.f1)
> group by
>     table.f1,
>     avg_table.avg_f2;
>
> Mark
>
> Mark Grover, Business Intelligence Analyst
> OANDA Corporation
>
> www: oanda.com  www: fxtrade.com
> e: mgro...@oanda.com
>
> "Best Trading Platform" - World Finance's Forex Awards 2009.
> "The One to Watch" - Treasury Today's Adam Smith Awards 2009.
>
> ----- Original Message -----
> From: "rohan monga" <monga.ro...@gmail.com>
> To: user@hive.apache.org
> Sent: Friday, January 20, 2012 6:00:54 PM
> Subject: Re: Invoke a UDAF inside another UDAF
>
> My bad, I hastily converted the query to a wrong example.
>
> It should be like this:
>
> select f1, avg(f2) as avg_f2, avg(f2 - avg_f2) from table group by f1;
>
> In essence, I just want to use the value generated by one UDAF (in this
> case avg(f2)) as a single number and then apply that value to the group
> inside a different UDAF.
> For example, if I were to use a streaming reducer, it would be something
> like this:
>
> avg1 = computeSum(list) / len(list)
> return computeSum(x - avg1 for x in list) / len(list)
>
> As I write this I realize why this might not be possible [the group
> computation being done in one step and the information being lost] :)
>
> But why the NullPointerException?
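[Editor's note: the streaming-reducer pseudocode quoted above can be sketched as runnable Python. The function names below are illustrative, not from the thread. One thing worth noticing: the mean of signed deviations from the mean is always zero by definition, so the absolute deviation is likely the quantity such a query actually wants.]

```python
def avg_of_deviations(values):
    # Two passes, as in the pseudocode: compute the group mean first,
    # then average each value's signed deviation from that mean.
    avg1 = sum(values) / len(values)
    return sum(x - avg1 for x in values) / len(values)

def mean_abs_deviation(values):
    # The signed version above is always ~0.0 (deviations cancel by the
    # definition of the mean); taking abs() gives the mean absolute
    # deviation, a more meaningful per-group statistic.
    avg1 = sum(values) / len(values)
    return sum(abs(x - avg1) for x in values) / len(values)

print(avg_of_deviations([1.0, 2.0, 3.0]))   # 0.0
print(mean_abs_deviation([1.0, 2.0, 3.0]))  # ~0.6667
```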
>
> Regards,
> --
> Rohan Monga
>
> On Fri, Jan 20, 2012 at 2:32 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > IMHO you cannot nest the percentile calculation, because the results
> > would be meaningless. percentile has to aggregate a set and pick the
> > Nth element, but if you nest, the inner percentile only returns one
> > result to the outer percentile, and that is pretty meaningless.
> >
> > (I think someone talked about this on the list in the last month or so.)
> > Without seeing your input data and your expected results, I cannot
> > understand what your query wants to do, or suggest an alternative.
> >
> > On 1/20/12, rohan monga <monga.ro...@gmail.com> wrote:
> > > Thanks Edward, that seems to work :)
> > >
> > > However, I have another query like this:
> > >
> > > select a, avg(b) as avg_b,
> > >     percentile_approx(avg_b - percentile_approx(b, .5), .5)
> > > from table1 group by a;
> > >
> > > Here I will lose the group info if I include the inner query in the
> > > FROM clause. Is there a way to get this to work?
> > >
> > > Thanks,
> > > --
> > > Rohan Monga
> > >
> > > On Fri, Jan 20, 2012 at 12:51 PM, Edward Capriolo <edlinuxg...@gmail.com> wrote:
> > > > I think if you are grouping by b, b has to be in your select list.
> > > > Try this:
> > > >
> > > > FROM (
> > > >     select b, count(a) as theCount from table1 group by b
> > > > ) a select mean(theCount);
> > > >
> > > > I think that should work.
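[Editor's note: Edward's FROM-subquery workaround can be checked with a small Python simulation. The function name and sample rows below are hypothetical, chosen only to mirror the two-stage shape of the query: the inner block computes count(a) per group of b, and the outer aggregate averages those counts.]

```python
from collections import Counter

def mean_count_per_group(rows):
    # Inner "query": count(a) for each group of b.
    # Each row is an (a, b) pair; Counter tallies rows per b value.
    counts = Counter(b for (a, b) in rows)
    # Outer aggregate: mean(theCount) across the per-group counts.
    return sum(counts.values()) / len(counts)

rows = [(1, "x"), (2, "x"), (3, "x"), (4, "y")]
print(mean_count_per_group(rows))  # 2.0  (group "x" has 3 rows, "y" has 1)
```

This is exactly why the nested form `mean(count(a))` cannot work in one pass: the counts must be fully materialized per group before the outer mean can see them.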
> > > >
> > > > On 1/20/12, rohan monga <monga.ro...@gmail.com> wrote:
> > > > > Hi,
> > > > > I am trying to run a query like
> > > > > "select mean(count(a)) from table1 group by b;"
> > > > >
> > > > > I am getting the following error:
> > > > > <snip>
> > > > > FAILED: Hive Internal Error: java.lang.NullPointerException(null)
> > > > > java.lang.NullPointerException
> > > > >   at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:151)
> > > > >   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:656)
> > > > >   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:777)
> > > > >   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
> > > > >   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
> > > > >   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
> > > > >   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
> > > > >   at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:157)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7447)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7405)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapGroupByOperator(SemanticAnalyzer.java:2747)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genGroupByPlanMapAggr1MR(SemanticAnalyzer.java:3365)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5858)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6480)
> > > > >   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7223)
> > > > >   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:240)
> > > > >   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:428)
> > > > >   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
> > > > >   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:253)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:210)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:401)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:635)
> > > > >   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:552)
> > > > >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > > >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > > > >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > > > >   at java.lang.reflect.Method.invoke(Method.java:597)
> > > > >   at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> > > > > </snip>
> > > > >
> > > > > Is there a workaround? I have tried with Hive 0.7.1 and 0.8.
> > > > >
> > > > > Thanks
> > > > > --
> > > > > Rohan Monga