Hi Mark,

Thanks for the response! UDAFPercentile.java has two terminate() methods because it handles two different input types through its two inner classes, PercentileLongEvaluator and PercentileLongArrayEvaluator. I am passing only a single input type, a double from one table column, to the iterate() method, and I wish to return an ArrayList<DoubleWritable> from the terminate() method. What is wrong in my class?

Moreover, is there any way for a UDF/UDAF/UDTF to process all the rows of a table and output only a subset of them, based on an aggregate function of one column? This is similar to my case: computing the top-n-percent of a column and outputting the filtered rows in full, with all the other columns from the table.
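Separately from the Hive error, the trimming step in the quoted terminate() may need a second look: it calls remove(i) with an increasing index while the list shrinks, so it skips elements and can run past the end of the list. Below is a minimal sketch of the intended top-n-percent selection in plain Java, with no Hive classes involved. The class name TopNPercentSketch and the method topNPercent are hypothetical, and the row count is computed with integer arithmetic here, which is an assumption rather than what the posted code does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hive: returns the largest `percent`
// percent of the input values, in ascending order.
final class TopNPercentSketch {

    static List<Double> topNPercent(List<Double> values, int percent) {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        // Work on a copy so the caller's list is left untouched.
        List<Double> sorted = new ArrayList<Double>(values);
        Collections.sort(sorted);

        // Number of rows to keep, truncated toward zero. Integer arithmetic
        // avoids floating-point rounding surprises (an assumption of this sketch).
        int keep = (int) ((long) percent * sorted.size() / 100);

        // After an ascending sort the top values sit at the tail, so take
        // a sub-list instead of removing elements one by one.
        int from = sorted.size() - keep;
        return new ArrayList<Double>(sorted.subList(from, sorted.size()));
    }

    public static void main(String[] args) {
        List<Double> column =
            Arrays.asList(5.0, 1.0, 4.0, 2.0, 3.0, 10.0, 8.0, 7.0, 9.0, 6.0);
        // Top 30 percent of 10 values keeps 3 values.
        System.out.println(topNPercent(column, 30)); // prints [8.0, 9.0, 10.0]
    }
}
```

Taking a sub-list of the sorted copy keeps the selection independent of how the list is later mutated. Inside a UDAF, the same logic would presumably sit in terminate(), once all partial lists have been merged.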
Thanks,
Abhishek

On Sun, Feb 10, 2013 at 12:36 PM, Mark Grover <grover.markgro...@gmail.com> wrote:

> Hi Abhishek,
> The code looks incomplete.
>
> See the comment at
> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/UDAF.java#L22
> Those are all the methods your UDAF class needs to implement but you seem
> to be missing them.
>
> Mark
>
> On Sat, Feb 9, 2013 at 11:08 PM, Abhishek Bhattacharya
> <abhat...@fiu.edu> wrote:
>
>> Thanks for the response.
>> The link to the code is:
>> https://github.com/Abhishek2301/Hive/blob/master/src/UDAFTopNPercent.java
>> Please let me know to fix it!
>>
>> Thanks,
>> Abhishek
>>
>>
>>
>> On Fri, Feb 8, 2013 at 5:02 PM, Mark Grover
>> <grover.markgro...@gmail.com> wrote:
>>
>>> Abhishek,
>>> The code doesn't seem to be complete.
>>>
>>> Look at
>>> https://github.com/apache/hive/blob/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDAFPercentile.java for
>>> reference. It has two terminate()'s - one for UDAF and one for the
>>> Evaluator.
>>>
>>> Do you mind posting your complete code on github somewhere so it's
>>> easier to analyze?
>>>
>>> Mark
>>>
>>> On Fri, Feb 8, 2013 at 2:05 PM, Abhishek Bhattacharya
>>> <abhat...@fiu.edu> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have implemented a simple UDAF for top-n-percent as follows:
>>>> import java.util.ArrayList;
>>>> import java.util.Collections;
>>>>
>>>> import org.apache.hadoop.hive.ql.exec.UDAF;
>>>> import org.apache.hadoop.hive.ql.exec.UDAFEvaluator;
>>>>
>>>> public class UDAFTopNPercent extends UDAF{
>>>>
>>>>     public static class Result {
>>>>         ArrayList<Double> list;
>>>>         double min;
>>>>     }
>>>>
>>>>     public class TopNPercentEvaluator implements UDAFEvaluator {
>>>>
>>>>         private Result res;
>>>>         private int rowIndex;
>>>>         private int percent;
>>>>
>>>>         public TopNPercentEvaluator() {
>>>>             super();
>>>>             res = new Result();
>>>>             init();
>>>>             rowIndex = 0;
>>>>         }
>>>>         @Override
>>>>         public void init() {
>>>>             res.list = new ArrayList<Double>();
>>>>             res.min = Double.MAX_VALUE;
>>>>         }
>>>>
>>>>         public boolean iterate(Double rowVal, int pct) {
>>>>             ArrayList<Double> resList = res.list;
>>>>             rowIndex++;
>>>>             resList.add(rowVal);
>>>>             percent = pct;
>>>>             return true;
>>>>         }
>>>>
>>>>         public ArrayList<Double> terminatePartial() {
>>>>             ArrayList<Double> resList = res.list;
>>>>             Collections.sort(resList);
>>>>             return resList;
>>>>         }
>>>>
>>>>         public boolean merge(ArrayList<Double> otherList) {
>>>>             ArrayList<Double> resList = res.list;
>>>>             resList.addAll(otherList);
>>>>             return true;
>>>>         }
>>>>
>>>>         public ArrayList<Double> terminate() {
>>>>             ArrayList<Double> resList = res.list;
>>>>             double num_rows = (double)percent/100.0*rowIndex;
>>>>             Collections.sort(resList);
>>>>             int lastIdx = resList.size()- (int) num_rows;
>>>>             if(lastIdx <= 0) {
>>>>                 return resList;
>>>>             }
>>>>             for(int i=0; i<lastIdx; i++) {
>>>>                 resList.remove(i);
>>>>             }
>>>>             return resList;
>>>>         }
>>>>     }
>>>>
>>>>     /**
>>>>      * @param args
>>>>      */
>>>>     public static void main(String[] args) {
>>>>         // TODO Auto-generated method stub
>>>>
>>>>     }
>>>>
>>>> }
>>>>
>>>> But throws some error such as first few lines are:
>>>> FAILED: Hive Internal Error:
>>>> java.lang.ClassCastException(org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableFloatObjectInspector
>>>> cannot be cast to
>>>> org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector)
>>>> java.lang.ClassCastException:
>>>> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableFloatObjectInspector
>>>> cannot be cast to
>>>> org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector
>>>> at
>>>> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConverter(ObjectInspectorConverters.java:116)
>>>> at
>>>> org.apache.hadoop.hive.ql.udf.generic.GenericUDFUtils$ConversionHelper.<init>(GenericUDFUtils.java:300)
>>>> at
>>>> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFBridge$GenericUDAFBridgeEvaluator.init(GenericUDAFBridge.java:129)
>>>>
>>>> Please help me to debug this!
>>>> Is it throwing from returning ArrayList<Double> in terminate()?
>>>> How should I return a List from UDAF?
>>>>
>>>> Thanks,
>>>> Abhishek
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks and Regards,
>>
>> Abhishek Bhattacharya
>> PhD Computer Science
>> School of Computing and Information Sciences
>> Florida International University
>>
>
>

--
Thanks and Regards,

Abhishek Bhattacharya
PhD Computer Science
School of Computing and Information Sciences
Florida International University