[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16051288#comment-16051288 ]

Takeshi Yamamuro commented on SPARK-17237:
------------------------------------------

I checked the other behaviours, and using aggregated column names without qualifiers seems correct to me, since the other aggregated columns also have no qualifier (https://github.com/apache/spark/pull/18302/files). So the current master behaviour looks okay to me for now.

> DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
> -------------------------------------------------------------------------
>
>                 Key: SPARK-17237
>                 URL: https://issues.apache.org/jira/browse/SPARK-17237
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Jiang Qiqi
>            Assignee: Takeshi Yamamuro
>              Labels: newbie
>             Fix For: 2.0.3, 2.1.1, 2.2.0
>
> I am trying to run a pivot transformation which I previously ran on a Spark 1.6 cluster, namely:
>
> scala> sc.parallelize(Seq((2,3,4), (3,4,5))).toDF("a", "b", "c")
> res1: org.apache.spark.sql.DataFrame = [a: int, b: int, c: int]
>
> scala> res1.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0)
> res2: org.apache.spark.sql.DataFrame = [a: int, 3_count(c): bigint, 3_avg(c): double, 4_count(c): bigint, 4_avg(c): double]
>
> scala> res1.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0).show
> +---+----------+--------+----------+--------+
> |  a|3_count(c)|3_avg(c)|4_count(c)|4_avg(c)|
> +---+----------+--------+----------+--------+
> |  2|         1|     4.0|         0|     0.0|
> |  3|         0|     0.0|         1|     5.0|
> +---+----------+--------+----------+--------+
>
> After upgrading the environment to Spark 2.0, I got an error while executing the .na.fill method:
>
> scala> sc.parallelize(Seq((2,3,4), (3,4,5))).toDF("a", "b", "c")
> res3: org.apache.spark.sql.DataFrame = [a: int, b: int ... 1 more field]
>
> scala> res3.groupBy("a").pivot("b").agg(count("c"), avg("c")).na.fill(0)
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: `3_count(`c`)`;
>   at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:103)
>   at org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:113)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:168)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:218)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:921)
>   at org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:411)
>   at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:162)
>   at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$2.apply(DataFrameNaFunctions.scala:159)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:159)
>   at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:149)
>   at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134)

--
This message was sent by Atlassian JIRA (v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
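The failure above comes from the parentheses and back-ticks that pivot inserts into generated column names such as 3_count(c), which the attribute-name parser then rejects. A common workaround is to rename the pivoted columns to parser-safe identifiers before calling na.fill. The sketch below is an illustration in plain Python; `sanitize_pivot_columns` is a hypothetical helper, not part of Spark:

```python
import re

def sanitize_pivot_columns(names):
    """Rewrite pivoted column names such as '3_count(c)' into parser-safe
    identifiers such as '3_count_c', so that later name-based lookups
    (na.fill, withColumnRenamed, Parquet writes) do not choke on
    parentheses or back-ticks."""
    return [re.sub(r"[^0-9A-Za-z_]+", "_", n).strip("_") for n in names]

# In Spark this would be applied before na.fill, e.g.:
#   df.toDF(*sanitize_pivot_columns(df.columns)).na.fill(0)
print(sanitize_pivot_columns(["a", "3_count(c)", "3_avg(c)"]))
# → ['a', '3_count_c', '3_avg_c']
```

Renaming up front sidesteps the parser entirely, at the cost of losing the original generated names.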
[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16049230#comment-16049230 ]

Apache Spark commented on SPARK-17237:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/18302
[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042713#comment-16042713 ]

Takeshi Yamamuro commented on SPARK-17237:
------------------------------------------

Sorry for the confusion; I originally meant that nested back-ticks are not accepted, as written in the description: "`1_max(t.`v2`)`". Spark internally adds the outer back-ticks in this case.
[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042513#comment-16042513 ]

Alberto Fernández commented on SPARK-17237:
-------------------------------------------

Hi there, thank you for your thorough answer. I think it would be good to fix this: the patch-version bump in Spark may lead users to think they don't need to change their code, and that may not be the case. Also, why are back-ticks no longer allowed since Spark 2.1.1? They seem to work fine in Spark 2.1.0. Thanks in advance.
[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16041434#comment-16041434 ]

Takeshi Yamamuro commented on SPARK-17237:
------------------------------------------

Thanks for the report. I think you pointed out two issues: the qualifier and the back-ticks.

Yes, you're right: it seems my PR above wrongly drops the qualifier from aggregated column names (and thus changed the behaviour).

{code}
// Spark v2.1
scala> Seq((1, 2)).toDF("id", "v1").createOrReplaceTempView("s")
scala> Seq((1, 2)).toDF("id", "v2").createOrReplaceTempView("t")
scala> val df1 = sql("SELECT * FROM s")
df1: org.apache.spark.sql.DataFrame = [id: int, v1: int]
scala> val df2 = sql("SELECT * FROM t")
df2: org.apache.spark.sql.DataFrame = [id: int, v2: int]
scala> df1.join(df2, "id" :: Nil).groupBy("id").pivot("id").max("v1", "v2").show
+---+-------------+-------------+
| id|1_max(s.`v1`)|1_max(t.`v2`)|
+---+-------------+-------------+
|  1|            2|            2|
+---+-------------+-------------+

// Master
scala> df1.join(df2, "id" :: Nil).groupBy("id").pivot("id").max("v1", "v2").show
+---+---------+---------+
| id|1_max(v1)|1_max(v2)|
+---+---------+---------+
|  1|        2|        2|
+---+---------+---------+
{code}

We could easily fix this, but I'm not 100% sure we need to. WDYT? cc: [~smilegator]

{code}
// Master with a patch (https://github.com/apache/spark/compare/master...maropu:SPARK-17237-4)
scala> df1.join(df2, "id" :: Nil).groupBy("id").pivot("id").max("v1", "v2").show
+---+-----------+-----------+
| id|1_max(s.v1)|1_max(t.v2)|
+---+-----------+-----------+
|  1|          2|          2|
+---+-----------+-----------+
{code}

On the other hand, IIUC back-ticks are not allowed in column names because they have a special meaning in Spark.
[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16040716#comment-16040716 ]

Alberto Fernández commented on SPARK-17237:
-------------------------------------------

Hi there, I think this change introduced a breaking change in the way the "withColumnRenamed" method works. I can reproduce it with the following example:

{code}
dataframe = sql("SELECT * FROM db.table")
another_dataframe = sql("SELECT * FROM db.another_table")

dataframe \
    .join(another_dataframe, on=[...]) \
    .pivot("column_name", values=[0, 1]) \
    .max("column1", "column2") \
    .withColumnRenamed("0_max(another_table.`column1`)", "name1") \
    .withColumnRenamed("0_max(another_table.`column2`)", "name2")
{code}

With Spark 2.1.0 the behaviour is as expected (buggy, but expected): the columns don't get renamed. With Spark 2.1.1, if this issue were resolved, you wouldn't need to change anything for the renames to work. However, the columns don't get renamed at all, because now you would need to use the following renames:

{code}
dataframe = sql("SELECT * FROM db.table")
another_dataframe = sql("SELECT * FROM db.another_table")

dataframe \
    .join(another_dataframe, on=[...]) \
    .pivot("column_name", values=[0, 1]) \
    .max("column1", "column2") \
    .withColumnRenamed("0_max(column1)", "name1") \
    .withColumnRenamed("1_max(column2)", "name2")
{code}

As you can see, it seems this PR somehow removed the table name from the join context and also removed the back-ticks, thus introducing a breaking change.

I should also note that the original issue didn't happen when using JSON as the output format. It only happens because Parquet doesn't support "(" and ")" characters in column names; in JSON they work just fine. Here is an example of the error thrown by Parquet after upgrading to Spark 2.1.1 without modifying your code:

{code}
Attribute name "0_max(column1)" contains invalid character(s) among " ,;{}()\\n\\t=". Please use alias to rename it.
{code}

I think the original issue was that parseAttributeName cannot handle the "table.column" notation, and as I understand it this PR still doesn't fix that, right? As a workaround, you can change your column renames to accommodate the new format. Any ideas? Am I missing something?
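The rename breakage described in the comment above boils down to a change in the generated name format. The helper below is a hypothetical illustration (plain Python, not Spark code) of the two formats, assuming the 2.1.0 names carried the table qualifier and back-ticks while 2.1.1 dropped both:

```python
def pivot_output_name(value, agg, column, qualifier=None, spark_211=True):
    """Build the generated pivot column name in the two observed styles.
    This models the reported behaviour only; it is not Spark's API.
    - Spark 2.1.0 style:  "0_max(another_table.`column1`)"  (qualifier kept)
    - Spark 2.1.1 style:  "0_max(column1)"                  (qualifier dropped)
    """
    if spark_211 or qualifier is None:
        return f"{value}_{agg}({column})"
    return f"{value}_{agg}({qualifier}.`{column}`)"

print(pivot_output_name(0, "max", "column1", "another_table", spark_211=False))
# → 0_max(another_table.`column1`)
print(pivot_output_name(0, "max", "column1", spark_211=True))
# → 0_max(column1)
```

Code that hard-codes the 2.1.0-style strings in withColumnRenamed silently stops matching after the upgrade, which is why the renames are dropped rather than failing loudly.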
[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15821983#comment-15821983 ]

Apache Spark commented on SPARK-17237:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/16565
[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437500#comment-15437500 ]

Apache Spark commented on SPARK-17237:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/14812
[jira] [Commented] (SPARK-17237) DataFrame fill after pivot causing org.apache.spark.sql.AnalysisException
[ https://issues.apache.org/jira/browse/SPARK-17237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437184#comment-15437184 ]

Takeshi Yamamuro commented on SPARK-17237:
------------------------------------------

The root cause of this bug is nested back-ticks in column names: UnresolvedAttribute#parseAttributeName cannot handle nested back-ticks.
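To see why a name like 3_count(`c`) is unparseable, here is a toy approximation of what a back-tick-aware attribute-name parser has to do. This is an assumption about the algorithm sketched in plain Python, not Spark's actual implementation of parseAttributeName:

```python
def parse_attribute_name(name):
    """Split a dotted attribute name such as "t.`v2`" into name parts.
    A back-tick-quoted part may contain dots; a literal back-tick inside a
    quoted part must be escaped by doubling it.  A back-tick appearing in
    the middle of an unquoted part, as in "3_count(`c`)", is a syntax
    error, because it can never be matched consistently."""
    parts, buf = [], []
    in_quote = False
    i, n = 0, len(name)
    while i < n:
        ch = name[i]
        if in_quote:
            if ch == "`":
                if i + 1 < n and name[i + 1] == "`":  # escaped back-tick
                    buf.append("`"); i += 2; continue
                in_quote = False; i += 1               # closing back-tick
                if i < n and name[i] != ".":           # must be end or a dot
                    raise ValueError(f"syntax error in attribute name: `{name}`")
            else:
                buf.append(ch); i += 1
        else:
            if ch == "`":
                if buf:  # back-tick mid-part, e.g. "3_count(`c`)"
                    raise ValueError(f"syntax error in attribute name: `{name}`")
                in_quote = True; i += 1
            elif ch == ".":
                parts.append("".join(buf)); buf = []; i += 1
            else:
                buf.append(ch); i += 1
    if in_quote:  # unterminated quote
        raise ValueError(f"syntax error in attribute name: `{name}`")
    parts.append("".join(buf))
    return parts

print(parse_attribute_name("t.`v2`"))   # → ['t', 'v2']
```

Under this model, the generated pivot name 3_count(`c`) fails immediately at the first embedded back-tick, which mirrors the AnalysisException in the stack trace above.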