Seems you hit https://issues.apache.org/jira/browse/SPARK-4296. It has been
fixed in 1.2.1 and 1.3.
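If upgrading isn't an option right away, a workaround that usually sidesteps this bug on 1.2.0 is to evaluate the UDF once in a subquery and group by the resulting column, so the outer GROUP BY sees a plain attribute instead of a second copy of the expression. A minimal, untested sketch (hour_bucket and t are just illustrative names, and I'm assuming the elided format string is 'yyyy-MM-dd-HH'):

    -- workaround sketch for SPARK-4296 on 1.2.0 (untested);
    -- hour_bucket and t are illustrative names
    select hour_bucket, count(*) count
    from (select from_unixtime(epoch, 'yyyy-MM-dd-HH') as hour_bucket
          from tbl) t
    group by hour_bucket;

Upgrading to 1.2.1 or 1.3 is still the real fix.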
On Thu, Feb 26, 2015 at 1:22 PM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Can someone confirm whether they can run UDFs in GROUP BY in Spark 1.2?
I have two builds running -- a custom build from early December
(commit 4259ca8dd12), which works fine, and Spark 1.2-RC2.
On the latter I get:
jdbc:hive2://XXX.208:10001> select from_unixtime(epoch,'yyyy-MM-dd-HH'), count(*) count
. . . . . . . . . . . . .> from tbl
. . . . . . . . . . . . .> group by from_unixtime(epoch,'yyyy-MM-dd-HH');
Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException:
Expression not in GROUP BY:
HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFFromUnixTime(epoch#1049L,yyyy-MM-dd-HH) AS _c0#1004, tree:
Aggregate [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFFromUnixTime(epoch#1049L,yyyy-MM-dd-HH)],
  [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFFromUnixTime(epoch#1049L,yyyy-MM-dd-HH) AS _c0#1004, COUNT(1) AS count#1003L]
 MetastoreRelation default, tbl, None (state=,code=0)
This worked fine on my older build. I don't see a JIRA for this, but maybe
I'm not searching correctly. Can someone please advise?