[ https://issues.apache.org/jira/browse/SPARK-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072258#comment-14072258 ]
Apache Spark commented on SPARK-2569:
-------------------------------------

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/1552

> Customized UDFs in hive not running with Spark SQL
> --------------------------------------------------
>
>                 Key: SPARK-2569
>                 URL: https://issues.apache.org/jira/browse/SPARK-2569
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.0.0
>         Environment: linux or mac, hive 0.9.0 and hive 0.13.0 with hadoop 1.0.4, scala 2.10.3, spark 1.0.0
>            Reporter: jacky hung
>            Assignee: Michael Armbrust
>            Priority: Critical
>
> Start spark-shell.
> Init (e.g. create hiveContext, import ._ etc., and make sure the jar including the UDFs is in the classpath).
> Run hql("CREATE TEMPORARY FUNCTION t_ts AS 'udf.Timestamp'"), which is successful.
> Then I tried hql("select t_ts(time) from data_common where xxxx limit 1").collect().foreach(println), which failed with a NullPointerException.
> We had a discussion about it on the mailing list:
> http://apache-spark-user-list.1001560.n3.nabble.com/run-sparksql-hiveudf-error-throw-NPE-td8888.html#a9006
> java.lang.NullPointerException
> org.apache.spark.sql.hive.HiveFunctionFactory$class.getFunctionClass(hiveUdfs.scala:117)
> org.apache.spark.sql.hive.HiveUdf.getFunctionClass(hiveUdfs.scala:157)
> org.apache.spark.sql.hive.HiveFunctionFactory$class.createFunction(hiveUdfs.scala:119)
> org.apache.spark.sql.hive.HiveUdf.createFunction(hiveUdfs.scala:157)
> org.apache.spark.sql.hive.HiveUdf.function$lzycompute(hiveUdfs.scala:170)
> org.apache.spark.sql.hive.HiveUdf.function(hiveUdfs.scala:170)
> org.apache.spark.sql.hive.HiveSimpleUdf.method$lzycompute(hiveUdfs.scala:181)
> org.apache.spark.sql.hive.HiveSimpleUdf.method(hiveUdfs.scala:180)
> org.apache.spark.sql.hive.HiveSimpleUdf.wrappers$lzycompute(hiveUdfs.scala:186)
> org.apache.spark.sql.hive.HiveSimpleUdf.wrappers(hiveUdfs.scala:186)
> org.apache.spark.sql.hive.HiveSimpleUdf.eval(hiveUdfs.scala:220)
> org.apache.spark.sql.catalyst.expressions.MutableProjection.apply(Projection.scala:64)
> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:160)
> org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$7.apply(Aggregate.scala:153)
> org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:580)
> org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:580)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:261)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

--
This message was sent by Atlassian JIRA
(v6.2#6252)