[jira] [Commented] (SPARK-13363) Aggregator not working with DataFrame

2016-04-17 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244566#comment-15244566
 ] 

Apache Spark commented on SPARK-13363:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/12451

> Aggregator not working with DataFrame
> -
>
> Key: SPARK-13363
> URL: https://issues.apache.org/jira/browse/SPARK-13363
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
>Reporter: koert kuipers
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 2.0.0
>
>
> The org.apache.spark.sql.expressions.Aggregator doc says: "A base class
> for user-defined aggregations, which can be used in [[DataFrame]] and
> [[Dataset]]".
> It works well with Dataset/GroupedDataset, but I am having no luck using it
> with DataFrame/GroupedData. Does anyone have an example of how to use it with
> a DataFrame?
> In particular, I would like to use it with this method in GroupedData:
> {noformat}
>   def agg(expr: Column, exprs: Column*): DataFrame
> {noformat}
> Clearly it should be possible, since GroupedDataset uses that very same
> method to do the work:
> {noformat}
>   private def agg(exprs: Column*): DataFrame =
>     groupedData.agg(withEncoder(exprs.head), exprs.tail.map(withEncoder): _*)
> {noformat}
> The trick seems to be the wrapping in withEncoder, which is private. I tried
> to do something like it myself, but I had no luck since it relies on more
> private internals of TypedColumn.
> Anyhow, my attempt at using it with a DataFrame:
> {noformat}
> val simpleSum = new Aggregator[Int, Int, Int] {
>   def zero: Int = 0                     // The initial value.
>   def reduce(b: Int, a: Int) = b + a    // Add an element to the running total.
>   def merge(b1: Int, b2: Int) = b1 + b2 // Merge intermediate values.
>   def finish(b: Int) = b                // Return the final result.
> }.toColumn
> val df = sc.makeRDD(1 to 3).map(i => (i, i)).toDF("k", "v")
> df.groupBy("k").agg(simpleSum).show
> {noformat}
> And the resulting error:
> {noformat}
> org.apache.spark.sql.AnalysisException: unresolved operator 'Aggregate
> [k#104], [k#104,($anon$3(),mode=Complete,isDistinct=false) AS sum#106];
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:46)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:241)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:122)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:46)
>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
>   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
>   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:49)
> {noformat}
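
In Spark 3.0 and later, an Aggregator can also be plugged into the untyped
DataFrame API through functions.udaf, which is essentially the usage the report
asks for. A minimal sketch, assuming Spark 3.0+; the session setup and column
names are illustrative only:

{noformat}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.functions.{col, udaf}

object UntypedSumSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("untyped-sum-sketch").getOrCreate()
    import spark.implicits._

    // Same shape as the Aggregator in the report, plus the buffer/output
    // encoders that the interface requires since Spark 2.0.
    val simpleSum = new Aggregator[Int, Int, Int] {
      def zero: Int = 0                                   // the initial value
      def reduce(b: Int, a: Int): Int = b + a             // add an element to the running total
      def merge(b1: Int, b2: Int): Int = b1 + b2          // merge intermediate values
      def finish(b: Int): Int = b                         // return the final result
      def bufferEncoder: Encoder[Int] = Encoders.scalaInt
      def outputEncoder: Encoder[Int] = Encoders.scalaInt
    }

    // Wrap the Aggregator as an untyped aggregate function usable in GroupedData.agg.
    val simpleSumUdaf = udaf(simpleSum, Encoders.scalaInt)

    val df = Seq((1, 1), (2, 2), (3, 3)).toDF("k", "v")
    df.groupBy("k").agg(simpleSumUdaf(col("v")).as("sum")).show()

    spark.stop()
  }
}
{noformat}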






[jira] [Commented] (SPARK-13363) Aggregator not working with DataFrame

2016-04-13 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239182#comment-15239182
 ] 

Apache Spark commented on SPARK-13363:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/12359




[jira] [Commented] (SPARK-13363) Aggregator not working with DataFrame

2016-04-09 Thread koert kuipers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233752#comment-15233752
 ] 

koert kuipers commented on SPARK-13363:
---

This issue is still present after [SPARK-14451][SQL] "Move encoder definition
into Aggregator interface" (#12231) was merged.
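
For reference, a minimal sketch of what the Aggregator interface looks like
after that change (the Aggregator now supplies its own buffer and output
encoders), used here through the typed Dataset API that the report says does
work. This is written against the Spark 2.0-style interface; the session setup
and the per-key sum are illustrative only:

{noformat}
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

object TypedSumSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("typed-sum-sketch").getOrCreate()
    import spark.implicits._

    // Sums the second element of each (k, v) pair; buffer and output are plain Ints.
    val sumOfV = new Aggregator[(Int, Int), Int, Int] {
      def zero: Int = 0                                   // initial buffer value
      def reduce(b: Int, a: (Int, Int)): Int = b + a._2   // fold one row into the buffer
      def merge(b1: Int, b2: Int): Int = b1 + b2          // combine partial buffers
      def finish(b: Int): Int = b                         // produce the final result
      def bufferEncoder: Encoder[Int] = Encoders.scalaInt // encoders now live on the Aggregator
      def outputEncoder: Encoder[Int] = Encoders.scalaInt
    }

    val ds = Seq((1, 1), (2, 2), (3, 3)).toDS()
    // Typed aggregation: group by the first element and apply the Aggregator as a TypedColumn.
    ds.groupByKey(_._1).agg(sumOfV.toColumn).show()

    spark.stop()
  }
}
{noformat}

The untyped df.groupBy(...).agg(...) path from the report is what remains
broken at this point in the thread.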




[jira] [Commented] (SPARK-13363) Aggregator not working with DataFrame

2016-03-31 Thread koert kuipers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15220492#comment-15220492
 ] 

koert kuipers commented on SPARK-13363:
---

Just doing some digging: the issue seems to be that when the
TypedAggregateExpression is created from the Aggregator, aEncoder is set to
None, and it stays None. Then, when the analysis check calls resolved on the
TypedAggregateExpression, it returns false because aEncoder is still None.
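
Roughly, the failure mode described above can be pictured with the following
simplified, hypothetical stand-in; this is not the actual Catalyst code, and
the names merely mirror the comment for illustration:

{noformat}
// Hypothetical simplification, not the real TypedAggregateExpression: while the
// input encoder is absent, the expression never reports itself as resolved, so
// CheckAnalysis rejects the plan with the "unresolved operator 'Aggregate" error.
final case class TypedAggregateExpressionSketch(aEncoder: Option[String]) {
  def resolved: Boolean = aEncoder.isDefined
}

object ResolutionSketch extends App {
  // Built from the DataFrame path, the sketch starts with no input encoder...
  val fromDataFrame = TypedAggregateExpressionSketch(aEncoder = None)
  // ...and therefore stays unresolved, mirroring the reported analysis failure.
  assert(!fromDataFrame.resolved)
}
{noformat}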




[jira] [Commented] (SPARK-13363) Aggregator not working with DataFrame

2016-03-19 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200094#comment-15200094
 ] 

Reynold Xin commented on SPARK-13363:
-

[~maropu], if you have time to bring your patch up to date, that'd be great.





[jira] [Commented] (SPARK-13363) Aggregator not working with DataFrame

2016-03-19 Thread koert kuipers (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199947#comment-15199947
 ] 

koert kuipers commented on SPARK-13363:
---

Yes, the problem is still there.




[jira] [Commented] (SPARK-13363) Aggregator not working with DataFrame

2016-03-15 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15194882#comment-15194882
 ] 

Reynold Xin commented on SPARK-13363:
-

cc [~maropu]

Is this still a problem now that DataFrame and Dataset are merged? If it is
still a problem, we should fix it.





[jira] [Commented] (SPARK-13363) Aggregator not working with DataFrame

2016-02-19 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154034#comment-15154034
 ] 

Apache Spark commented on SPARK-13363:
--

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/11269
