[jira] [Commented] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15970390#comment-15970390 ]

Yael Aharon commented on SPARK-13680:
-------------------------------------

This has been fixed in Spark 1.6. It can probably be closed.

> Java UDAF with more than one intermediate argument returns wrong results
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13680
>                 URL: https://issues.apache.org/jira/browse/SPARK-13680
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.2
>            Reporter: Yael Aharon
>         Attachments: data.csv, setup.hql
>
> I am trying to incorporate the Java UDAF from
> https://github.com/apache/spark/blob/master/sql/hive/src/test/java/org/apache/spark/sql/hive/aggregate/MyDoubleAvg.java
> into an SQL query. I registered the UDAF like this:
>
> sqlContext.udf().register("myavg", new MyDoubleAvg());
>
> My SQL query is:
>
> SELECT AVG(seqi) AS `avg_seqi`, AVG(seqd) AS `avg_seqd`, AVG(ci) AS `avg_ci`,
> AVG(cd) AS `avg_cd`, AVG(stdevd) AS `avg_stdevd`, AVG(stdevi) AS `avg_stdevi`,
> MAX(seqi) AS `max_seqi`, MAX(seqd) AS `max_seqd`, MAX(ci) AS `max_ci`,
> MAX(cd) AS `max_cd`, MAX(stdevd) AS `max_stdevd`, MAX(stdevi) AS `max_stdevi`,
> MIN(seqi) AS `min_seqi`, MIN(seqd) AS `min_seqd`, MIN(ci) AS `min_ci`,
> MIN(cd) AS `min_cd`, MIN(stdevd) AS `min_stdevd`, MIN(stdevi) AS `min_stdevi`,
> SUM(seqi) AS `sum_seqi`, SUM(seqd) AS `sum_seqd`, SUM(ci) AS `sum_ci`,
> SUM(cd) AS `sum_cd`, SUM(stdevd) AS `sum_stdevd`, SUM(stdevi) AS `sum_stdevi`,
> myavg(seqd) as `myavg_seqd`, AVG(zero) AS `avg_zero`, AVG(nulli) AS `avg_nulli`,
> AVG(nulld) AS `avg_nulld`, SUM(zero) AS `sum_zero`, SUM(nulli) AS `sum_nulli`,
> SUM(nulld) AS `sum_nulld`, MAX(zero) AS `max_zero`, MAX(nulli) AS `max_nulli`,
> MAX(nulld) AS `max_nulld`, count( * ) AS `count_all`, count(nulli) AS `count_nulli`
> FROM mytable
>
> As soon as I add the UDAF myavg to the SQL, all the results become incorrect.
> When I remove the call to the UDAF, the results are correct.
> I was able to work around the issue by modifying bufferSchema of the UDAF to
> use an array, along with the corresponding update and merge methods.
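The workaround described above, packing the UDAF's intermediate state into a single array-typed buffer column so that the buffer has only one field, can be sketched as follows. This is a hypothetical variant of MyDoubleAvg, not code from the issue: the class name, the null handling, and the plain sum/count average are assumptions of this sketch.

import org.apache.spark.sql.Row;
import org.apache.spark.sql.expressions.MutableAggregationBuffer;
import org.apache.spark.sql.expressions.UserDefinedAggregateFunction;
import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of MyDoubleAvg: the (sum, count) intermediate state
// lives in one ArrayType buffer column instead of two separate columns,
// which avoids the multi-field intermediate buffer described in this issue.
public class MyDoubleAvgArrayBuffer extends UserDefinedAggregateFunction {

  @Override
  public StructType inputSchema() {
    return new StructType().add("inputDouble", DataTypes.DoubleType);
  }

  @Override
  public StructType bufferSchema() {
    // A single buffer field holding [sum, count].
    return new StructType()
        .add("sumAndCount", DataTypes.createArrayType(DataTypes.DoubleType));
  }

  @Override
  public DataType dataType() {
    return DataTypes.DoubleType;
  }

  @Override
  public boolean deterministic() {
    return true;
  }

  @Override
  public void initialize(MutableAggregationBuffer buffer) {
    List<Double> state = new ArrayList<>();
    state.add(0.0); // running sum
    state.add(0.0); // running count
    buffer.update(0, state);
  }

  @Override
  public void update(MutableAggregationBuffer buffer, Row input) {
    if (input.isNullAt(0)) {
      return; // ignore NULL inputs, as AVG does
    }
    List<Double> state = buffer.getList(0);
    List<Double> updated = new ArrayList<>(state);
    updated.set(0, state.get(0) + input.getDouble(0));
    updated.set(1, state.get(1) + 1.0);
    buffer.update(0, updated);
  }

  @Override
  public void merge(MutableAggregationBuffer buffer, Row other) {
    List<Double> a = buffer.getList(0);
    List<Double> b = other.getList(0);
    List<Double> merged = new ArrayList<>();
    merged.add(a.get(0) + b.get(0)); // combined sum
    merged.add(a.get(1) + b.get(1)); // combined count
    buffer.update(0, merged);
  }

  @Override
  public Object evaluate(Row buffer) {
    List<Double> state = buffer.getList(0);
    return state.get(1) == 0.0 ? null : state.get(0) / state.get(1);
  }
}

Registration and use are unchanged: sqlContext.udf().register("myavg", new MyDoubleAvgArrayBuffer()) and then myavg(seqd) in the query, exactly as in the report.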
[jira] [Resolved] (SPARK-19884) Add the ability to get all registered functions from a SparkSession
[ https://issues.apache.org/jira/browse/SPARK-19884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yael Aharon resolved SPARK-19884.
---------------------------------
    Resolution: Not A Problem

Thank you so much for your reply.

> Add the ability to get all registered functions from a SparkSession
> --------------------------------------------------------------------
>
>                 Key: SPARK-19884
>                 URL: https://issues.apache.org/jira/browse/SPARK-19884
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Yael Aharon
>
> It would be very useful to get the list of functions registered with a
> SparkSession, built-in and otherwise. This would enable, for example,
> auto-completion support in editors built around Spark SQL.
>
> thanks!
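The resolution comment does not name the API that answers this, but since Spark 2.0 the Catalog interface on SparkSession exposes exactly this enumeration, built-in functions included, which is presumably why the issue was closed as Not A Problem. A minimal sketch; the app name, master setting, and wrapper class are illustrative assumptions:

import org.apache.spark.sql.SparkSession;

// Sketch: enumerating every function registered with a SparkSession
// via the Catalog API (org.apache.spark.sql.catalog.Catalog).
public final class ListFunctionsExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("list-functions")
        .master("local[*]")
        .getOrCreate();

    // Each row describes one function: name, database, description,
    // implementing class, and whether it is temporary.
    spark.catalog().listFunctions().show(200, false);

    spark.stop();
  }
}

An editor integration would collect the names with spark.catalog().listFunctions().collectAsList() and feed them to its completion engine.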
[jira] [Created] (SPARK-19884) Add the ability to get all registered functions from a SparkSession
Yael Aharon created SPARK-19884:
-----------------------------------

             Summary: Add the ability to get all registered functions from a SparkSession
                 Key: SPARK-19884
                 URL: https://issues.apache.org/jira/browse/SPARK-19884
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 2.1.0
            Reporter: Yael Aharon

It would be very useful to get the list of functions registered with a SparkSession, built-in and otherwise. This would enable, for example, auto-completion support in editors built around Spark SQL.

thanks!
[jira] [Updated] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yael Aharon updated SPARK-13680:
--------------------------------
    Attachment: setup.hql

> Java UDAF with more than one intermediate argument returns wrong results
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13680
>                 URL: https://issues.apache.org/jira/browse/SPARK-13680
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.2
>            Reporter: Yael Aharon
>         Attachments: data.csv, setup.hql
[jira] [Commented] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180120#comment-15180120 ]

Yael Aharon commented on SPARK-13680:
-------------------------------------

I found this in the Spark executor logs when running the MyDoubleAvg UDAF. Execution continued in spite of this exception:

java.lang.ClassCastException: org.apache.spark.sql.types.GenericArrayData cannot be cast to java.lang.Long
	at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:110)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getLong(rows.scala:41)
	at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getLong(rows.scala:247)
	at org.apache.spark.sql.catalyst.expressions.JoinedRow.getLong(JoinedRow.scala:85)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply772_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$11.apply(AggregationIterator.scala:174)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$11.apply(AggregationIterator.scala:171)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.processCurrentSortedGroup(SortBasedAggregationIterator.scala:100)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:139)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:30)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:74)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

> Java UDAF with more than one intermediate argument returns wrong results
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13680
>                 URL: https://issues.apache.org/jira/browse/SPARK-13680
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.2
>            Reporter: Yael Aharon
>         Attachments: data.csv
[jira] [Comment Edited] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180120#comment-15180120 ]

Yael Aharon edited comment on SPARK-13680 at 3/4/16 4:42 PM:
-------------------------------------------------------------

I found this in the Spark executor logs when running the MyDoubleAvg UDAF. Execution continued in spite of this exception:

java.lang.ClassCastException: org.apache.spark.sql.types.GenericArrayData cannot be cast to java.lang.Long
	at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:110)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getLong(rows.scala:41)
	at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getLong(rows.scala:247)
	at org.apache.spark.sql.catalyst.expressions.JoinedRow.getLong(JoinedRow.scala:85)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply772_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$11.apply(AggregationIterator.scala:174)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$11.apply(AggregationIterator.scala:171)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.processCurrentSortedGroup(SortBasedAggregationIterator.scala:100)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:139)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:30)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:74)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

was (Author: yael):
I found this in the Spark executor logs when running the MyDoubleAvg UDAF.
Execution continued in spite of this exception:

java.lang.ClassCastException: org.apache.spark.sql.types.GenericArrayData cannot be cast to java.lang.Long
	at scala.runtime.BoxesRunTime.unboxToLong(BoxesRunTime.java:110)
	at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow$class.getLong(rows.scala:41)
	at org.apache.spark.sql.catalyst.expressions.GenericMutableRow.getLong(rows.scala:247)
	at org.apache.spark.sql.catalyst.expressions.JoinedRow.getLong(JoinedRow.scala:85)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply772_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificMutableProjection.apply(Unknown Source)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$11.apply(AggregationIterator.scala:174)
	at org.apache.spark.sql.execution.aggregate.AggregationIterator$$anonfun$11.apply(AggregationIterator.scala:171)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.processCurrentSortedGroup(SortBasedAggregationIterator.scala:100)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:139)
	at org.apache.spark.sql.execution.aggregate.SortBasedAggregationIterator.next(SortBasedAggregationIterator.scala:30)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:119)
	at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:74)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:88)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.
[jira] [Updated] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yael Aharon updated SPARK-13680:
--------------------------------
    Description:

I am trying to incorporate the Java UDAF from
https://github.com/apache/spark/blob/master/sql/hive/src/test/java/org/apache/spark/sql/hive/aggregate/MyDoubleAvg.java
into an SQL query. I registered the UDAF like this:

sqlContext.udf().register("myavg", new MyDoubleAvg());

My SQL query is:

SELECT AVG(seqi) AS `avg_seqi`, AVG(seqd) AS `avg_seqd`, AVG(ci) AS `avg_ci`,
AVG(cd) AS `avg_cd`, AVG(stdevd) AS `avg_stdevd`, AVG(stdevi) AS `avg_stdevi`,
MAX(seqi) AS `max_seqi`, MAX(seqd) AS `max_seqd`, MAX(ci) AS `max_ci`,
MAX(cd) AS `max_cd`, MAX(stdevd) AS `max_stdevd`, MAX(stdevi) AS `max_stdevi`,
MIN(seqi) AS `min_seqi`, MIN(seqd) AS `min_seqd`, MIN(ci) AS `min_ci`,
MIN(cd) AS `min_cd`, MIN(stdevd) AS `min_stdevd`, MIN(stdevi) AS `min_stdevi`,
SUM(seqi) AS `sum_seqi`, SUM(seqd) AS `sum_seqd`, SUM(ci) AS `sum_ci`,
SUM(cd) AS `sum_cd`, SUM(stdevd) AS `sum_stdevd`, SUM(stdevi) AS `sum_stdevi`,
myavg(seqd) as `myavg_seqd`, AVG(zero) AS `avg_zero`, AVG(nulli) AS `avg_nulli`,
AVG(nulld) AS `avg_nulld`, SUM(zero) AS `sum_zero`, SUM(nulli) AS `sum_nulli`,
SUM(nulld) AS `sum_nulld`, MAX(zero) AS `max_zero`, MAX(nulli) AS `max_nulli`,
MAX(nulld) AS `max_nulld`, count( * ) AS `count_all`, count(nulli) AS `count_nulli`
FROM mytable

As soon as I add the UDAF myavg to the SQL, all the results become incorrect.
When I remove the call to the UDAF, the results are correct.
I was able to work around the issue by modifying bufferSchema of the UDAF to
use an array, along with the corresponding update and merge methods.

  was:

I am trying to incorporate the Java UDAF from
https://github.com/apache/spark/blob/master/sql/hive/src/test/java/org/apache/spark/sql/hive/aggregate/MyDoubleAvg.java
into an SQL query. I registered the UDAF like this:

sqlContext.udf().register("myavg", new MyDoubleAvg());

My SQL query is:

SELECT AVG(seqi) AS `avg_seqi`, AVG(seqd) AS `avg_seqd`, AVG(ci) AS `avg_ci`,
AVG(cd) AS `avg_cd`, AVG(stdevd) AS `avg_stdevd`, AVG(stdevi) AS `avg_stdevi`,
MAX(seqi) AS `max_seqi`, MAX(seqd) AS `max_seqd`, MAX(ci) AS `max_ci`,
MAX(cd) AS `max_cd`, MAX(stdevd) AS `max_stdevd`, MAX(stdevi) AS `max_stdevi`,
MIN(seqi) AS `min_seqi`, MIN(seqd) AS `min_seqd`, MIN(ci) AS `min_ci`,
MIN(cd) AS `min_cd`, MIN(stdevd) AS `min_stdevd`, MIN(stdevi) AS `min_stdevi`,
SUM(seqi) AS `sum_seqi`, SUM(seqd) AS `sum_seqd`, SUM(ci) AS `sum_ci`,
SUM(cd) AS `sum_cd`, SUM(stdevd) AS `sum_stdevd`, SUM(stdevi) AS `sum_stdevi`,
myavg(seqd) as `myavg_seqd`, AVG(zero) AS `avg_zero`, AVG(nulli) AS `avg_nulli`,
AVG(nulld) AS `avg_nulld`, SUM(zero) AS `sum_zero`, SUM(nulli) AS `sum_nulli`,
SUM(nulld) AS `sum_nulld`, MAX(zero) AS `max_zero`, MAX(nulli) AS `max_nulli`,
MAX(nulld) AS `max_nulld`, count(*) AS `count_all`, count(nulli) AS `count_nulli`
FROM mytable

As soon as I add the UDAF myavg to the SQL, all the results become incorrect.
When I remove the call to the UDAF, the results are correct.
I was able to work around the issue by modifying bufferSchema of the UDAF to
use an array, along with the corresponding update and merge methods.

> Java UDAF with more than one intermediate argument returns wrong results
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13680
>                 URL: https://issues.apache.org/jira/browse/SPARK-13680
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.2
>            Reporter: Yael Aharon
>         Attachments: data.csv
[jira] [Comment Edited] (SPARK-13494) Cannot sort on a column which is of type "array"
[ https://issues.apache.org/jira/browse/SPARK-13494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172115#comment-15172115 ]

Yael Aharon edited comment on SPARK-13494 at 3/4/16 4:35 PM:
-------------------------------------------------------------

I am using Spark 1.5 from the Cloudera distribution CDH 5.5.2. Do you think this was fixed since? The Hive schema of the column in question is array.

was (Author: yael):
I am using Spark 5.2 from Cloudera distribution CDH 5.2 . Do you think this was fixed since? The Hive schema of the column in question is array

> Cannot sort on a column which is of type "array"
> -------------------------------------------------
>
>                 Key: SPARK-13494
>                 URL: https://issues.apache.org/jira/browse/SPARK-13494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yael Aharon
>
> Executing the following SQL results in an error if columnName refers to a
> column of type array:
>
> SELECT * FROM myTable ORDER BY columnName ASC LIMIT 50
>
> The error is:
>
> org.apache.spark.sql.AnalysisException: cannot resolve 'columnName ASC' due
> to data type mismatch: cannot sort data type array
[jira] [Commented] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15180096#comment-15180096 ]

Yael Aharon commented on SPARK-13680:
-------------------------------------

I attached data.csv, which is the data used for this test.

> Java UDAF with more than one intermediate argument returns wrong results
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13680
>                 URL: https://issues.apache.org/jira/browse/SPARK-13680
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.2
>            Reporter: Yael Aharon
>         Attachments: data.csv
[jira] [Updated] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results
[ https://issues.apache.org/jira/browse/SPARK-13680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yael Aharon updated SPARK-13680:
--------------------------------
    Attachment: data.csv

> Java UDAF with more than one intermediate argument returns wrong results
> -------------------------------------------------------------------------
>
>                 Key: SPARK-13680
>                 URL: https://issues.apache.org/jira/browse/SPARK-13680
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.5.0
>         Environment: CDH 5.5.2
>            Reporter: Yael Aharon
>         Attachments: data.csv
[jira] [Created] (SPARK-13680) Java UDAF with more than one intermediate argument returns wrong results
Yael Aharon created SPARK-13680:
-----------------------------------

             Summary: Java UDAF with more than one intermediate argument returns wrong results
                 Key: SPARK-13680
                 URL: https://issues.apache.org/jira/browse/SPARK-13680
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.5.0
         Environment: CDH 5.5.2
            Reporter: Yael Aharon

I am trying to incorporate the Java UDAF from
https://github.com/apache/spark/blob/master/sql/hive/src/test/java/org/apache/spark/sql/hive/aggregate/MyDoubleAvg.java
into an SQL query. I registered the UDAF like this:

sqlContext.udf().register("myavg", new MyDoubleAvg());

My SQL query is:

SELECT AVG(seqi) AS `avg_seqi`, AVG(seqd) AS `avg_seqd`, AVG(ci) AS `avg_ci`,
AVG(cd) AS `avg_cd`, AVG(stdevd) AS `avg_stdevd`, AVG(stdevi) AS `avg_stdevi`,
MAX(seqi) AS `max_seqi`, MAX(seqd) AS `max_seqd`, MAX(ci) AS `max_ci`,
MAX(cd) AS `max_cd`, MAX(stdevd) AS `max_stdevd`, MAX(stdevi) AS `max_stdevi`,
MIN(seqi) AS `min_seqi`, MIN(seqd) AS `min_seqd`, MIN(ci) AS `min_ci`,
MIN(cd) AS `min_cd`, MIN(stdevd) AS `min_stdevd`, MIN(stdevi) AS `min_stdevi`,
SUM(seqi) AS `sum_seqi`, SUM(seqd) AS `sum_seqd`, SUM(ci) AS `sum_ci`,
SUM(cd) AS `sum_cd`, SUM(stdevd) AS `sum_stdevd`, SUM(stdevi) AS `sum_stdevi`,
myavg(seqd) as `myavg_seqd`, AVG(zero) AS `avg_zero`, AVG(nulli) AS `avg_nulli`,
AVG(nulld) AS `avg_nulld`, SUM(zero) AS `sum_zero`, SUM(nulli) AS `sum_nulli`,
SUM(nulld) AS `sum_nulld`, MAX(zero) AS `max_zero`, MAX(nulli) AS `max_nulli`,
MAX(nulld) AS `max_nulld`, count(*) AS `count_all`, count(nulli) AS `count_nulli`
FROM mytable

As soon as I add the UDAF myavg to the SQL, all the results become incorrect.
When I remove the call to the UDAF, the results are correct.

I was able to work around the issue by modifying bufferSchema of the UDAF to use an array, along with the corresponding update and merge methods.
[jira] [Commented] (SPARK-13494) Cannot sort on a column which is of type "array"
[ https://issues.apache.org/jira/browse/SPARK-13494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172212#comment-15172212 ]

Yael Aharon commented on SPARK-13494:
-------------------------------------

It is Spark 1.5.0, using spark-assembly-1.5.0-cdh5.5.2-hadoop2.6.0-cdh5.5.2.jar in local mode.

> Cannot sort on a column which is of type "array"
> -------------------------------------------------
>
>                 Key: SPARK-13494
>                 URL: https://issues.apache.org/jira/browse/SPARK-13494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yael Aharon
>
> Executing the following SQL results in an error if columnName refers to a
> column of type array:
>
> SELECT * FROM myTable ORDER BY columnName ASC LIMIT 50
>
> The error is:
>
> org.apache.spark.sql.AnalysisException: cannot resolve 'columnName ASC' due
> to data type mismatch: cannot sort data type array
[jira] [Commented] (SPARK-13494) Cannot sort on a column which is of type "array"
[ https://issues.apache.org/jira/browse/SPARK-13494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172115#comment-15172115 ]

Yael Aharon commented on SPARK-13494:
-------------------------------------

I am using Spark 5.2 from Cloudera distribution CDH 5.2 . Do you think this was fixed since? The Hive schema of the column in question is array

> Cannot sort on a column which is of type "array"
> -------------------------------------------------
>
>                 Key: SPARK-13494
>                 URL: https://issues.apache.org/jira/browse/SPARK-13494
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Yael Aharon
>
> Executing the following SQL results in an error if columnName refers to a
> column of type array:
>
> SELECT * FROM myTable ORDER BY columnName ASC LIMIT 50
>
> The error is:
>
> org.apache.spark.sql.AnalysisException: cannot resolve 'columnName ASC' due
> to data type mismatch: cannot sort data type array
[jira] [Created] (SPARK-13494) Cannot sort on a column which is of type "array"
Yael Aharon created SPARK-13494:
-----------------------------------

             Summary: Cannot sort on a column which is of type "array"
                 Key: SPARK-13494
                 URL: https://issues.apache.org/jira/browse/SPARK-13494
             Project: Spark
          Issue Type: Bug
            Reporter: Yael Aharon

Executing the following SQL results in an error if columnName refers to a column of type array:

SELECT * FROM myTable ORDER BY columnName ASC LIMIT 50

The error is:

org.apache.spark.sql.AnalysisException: cannot resolve 'columnName ASC' due to data type mismatch: cannot sort data type array
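For anyone hitting this on Spark 1.5, a workaround sketch: the array column itself is not a legal sort key, but a scalar derived from it is. Here myTable and columnName follow the issue's example, and choosing size() as the sort key is an assumption about which ordering is actually wanted; an element access such as columnName[0] would work the same way.

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

// Hypothetical workaround: ORDER BY a scalar derived from the array column
// instead of the array itself.
public final class ArraySortWorkaround {
  public static DataFrame topRows(SQLContext sqlContext) {
    // size(columnName) is the element count; sorting by it is legal
    // in Spark 1.5 even though sorting by columnName is not.
    return sqlContext.sql(
        "SELECT * FROM myTable ORDER BY size(columnName) ASC LIMIT 50");
  }
}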