[SparkSQL, Spark 1.2] UDFs in group by broken?

2015-02-26 Thread Yana Kadiyska
Can someone confirm whether UDFs in GROUP BY work in Spark 1.2?

I have two builds running: a custom build from early December
(commit 4259ca8dd12), which works fine, and Spark 1.2-RC2.

On the latter I get:

 jdbc:hive2://XXX.208:10001 select from_unixtime(epoch,'-MM-dd-HH'),count(*) count
 . . . . . . . . . . . . . . . . . . from tbl
 . . . . . . . . . . . . . . . . . . group by from_unixtime(epoch,'-MM-dd-HH');
 Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression not in GROUP BY: HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFFromUnixTime(epoch#1049L,-MM-dd-HH) AS _c0#1004, tree:
 Aggregate [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFFromUnixTime(epoch#1049L,-MM-dd-HH)], [HiveSimpleUdf#org.apache.hadoop.hive.ql.udf.UDFFromUnixTime(epoch#1049L,-MM-dd-HH) AS _c0#1004,COUNT(1) AS count#1003L]
  MetastoreRelation default, tbl, None (state=,code=0)

This worked fine on my older build. I don't see a JIRA on this but maybe
I'm not looking right. Can someone please advise?


Re: [SparkSQL, Spark 1.2] UDFs in group by broken?

2015-02-26 Thread Yin Huai
Seems you hit https://issues.apache.org/jira/browse/SPARK-4296. It has been
fixed in 1.2.1 and 1.3.
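
For anyone who cannot move off 1.2.0 right away, one workaround that may help is to evaluate the UDF in a subquery and group by the resulting column, so that the outer GROUP BY refers to a plain column rather than a UDF call. This is only a sketch and has not been confirmed in this thread; the aliases `hr` and `t` are made up for illustration, and the format string is copied as-is from the query above.

```sql
-- Compute the UDF once in the inner query, then group on its alias.
-- `hr` and `t` are hypothetical names; `tbl` and `epoch` come from the original query.
SELECT t.hr, count(*) AS cnt
FROM (
  SELECT from_unixtime(epoch, '-MM-dd-HH') AS hr
  FROM tbl
) t
GROUP BY t.hr;
```

If this still fails on your build, upgrading to a release that includes the fix is the safer route.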
