Hi, I've hit problems while writing a custom UDTF that should return string values. I couldn't find documented anywhere what type the values passed to forward() should have. The only information I could dig up on Google was a few blogs with examples, plus the 4 UDTFs that are in the Hive sources. From those I figured that it should be OK to simply pass Strings inside the forwarded Object[] array. Here are the relevant parts of my code:

    private Object[] forwardListObj;

    @Override
    public StructObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
        // snipped irrelevant code
        forwardListObj = new Object[1];
        forwardListObj[0] = new String();
        ArrayList<String> fieldNames = new ArrayList<String>(1);
        ArrayList<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>(1);
        fieldNames.add("section");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

In process() I simply forward a String:

    forwardListObj[0] = "";
    forward(forwardListObj);
    // OR
    String s = ...
    forwardListObj[0] = s;
    forward(forwardListObj);

I tested the function with a simple query

    SELECT my_func(arg) AS x FROM logs WHERE (dt=2011120104);

and it worked just as intended. But once I moved from testing to actually using the function in more complex queries, I ran into trouble. Even a LATERAL VIEW statement can cause failures:

    SELECT x FROM logs LATERAL VIEW my_func(arg) t AS x WHERE (dt=2011120104);

causes tasks to fail with this exception:

    java.lang.ClassCastException: java.lang.String cannot be cast to org.apache.hadoop.io.Text
        at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:45)
        at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getDouble(PrimitiveObjectInspectorUtils.java:607)
        at org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorConverter$DoubleConverter.convert(PrimitiveObjectInspectorConverter.java:229)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPEqual.evaluate(GenericUDFOPEqual.java:73)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator$DeferredExprObject.get(ExprNodeGenericFuncEvaluator.java:56)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDFOPAnd.evaluate(GenericUDFOPAnd.java:52)
        at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:86)
        at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:83)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
        at org.apache.hadoop.hive.ql.exec.LateralViewJoinOperator.processOp(LateralViewJoinOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
        at org.apache.hadoop.hive.ql.exec.UDTFOperator.forwardUDTFOutput(UDTFOperator.java:112)
        at org.apache.hadoop.hive.ql.udf.generic.UDTFCollector.collect(UDTFCollector.java:44)
        at org.apache.hadoop.hive.ql.udf.generic.GenericUDTF.forward(GenericUDTF.java:81)
        at cz.seznam.im.functions.ExplodeSection.process(ExplodeSection.java:103)
        ...

I should also mention that I use a custom SerDe and InputFormat for the 'logs' table. While trying to figure this out, I ran the same queries as listed above on a different table without those customizations, and they worked correctly too. So I think the SerDe and/or InputFormat probably play some role in this as well.

What I don't understand is why the problem shows up only with LATERAL VIEW. Any ideas, anyone? Also, is it really correct to pass a String to forward()?

Best regards,
Jan