[jira] [Commented] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16975733#comment-16975733 ]

sandeshyapuram commented on SPARK-29890:
----------------------------------------

[~imback82] This happens even for a normal join:

{noformat}
val p1 = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
val p2 = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
p1.join(p2, Seq("nums"), "left")
  .na.fill(0).show
{noformat}

> Unable to fill na with 0 with duplicate columns
> -----------------------------------------------
>
>                 Key: SPARK-29890
>                 URL: https://issues.apache.org/jira/browse/SPARK-29890
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>    Affects Versions: 2.3.3, 2.4.3
>            Reporter: sandeshyapuram
>            Priority: Major
>
> Trying to fill out na values with 0.
>
> {noformat}
> scala> :paste
> // Entering paste mode (ctrl-D to finish)
>
> val parent = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
> val c1 = parent.filter(lit(true))
> val c2 = parent.filter(lit(true))
> c1.join(c2, Seq("nums"), "left")
>   .na.fill(0).show
> {noformat}
>
> {noformat}
> 19/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: error looking up the name of group 820818257: No such file or directory
> org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could be: abc, abc.;
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolve(LogicalPlan.scala:213)
>   at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveQuoted(LogicalPlan.scala:117)
>   at org.apache.spark.sql.Dataset.resolve(Dataset.scala:220)
>   at org.apache.spark.sql.Dataset.col(Dataset.scala:1246)
>   at org.apache.spark.sql.DataFrameNaFunctions.org$apache$spark$sql$DataFrameNaFunctions$$fillCol(DataFrameNaFunctions.scala:443)
>   at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:500)
>   at org.apache.spark.sql.DataFrameNaFunctions$$anonfun$7.apply(DataFrameNaFunctions.scala:492)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at org.apache.spark.sql.DataFrameNaFunctions.fillValue(DataFrameNaFunctions.scala:492)
>   at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:171)
>   at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:155)
>   at org.apache.spark.sql.DataFrameNaFunctions.fill(DataFrameNaFunctions.scala:134)
>   ... 54 elided
> {noformat}
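For readers hitting the same AnalysisException: the ambiguity can be sidestepped by renaming the clashing column on one side before the join, so the joined schema has no duplicate names and na.fill can resolve every column. A minimal sketch against the normal-join repro above, assuming a spark-shell session on Spark 2.3/2.4 (the abc_right name is illustrative, not from this thread):

{code:java}
val p1 = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
val p2 = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")

// Rename the right-hand "abc" before joining; the joined schema is then
// [nums, abc, abc_right], and na.fill(0) can resolve each column unambiguously.
p1.join(p2.withColumnRenamed("abc", "abc_right"), Seq("nums"), "left")
  .na.fill(0)
  .show
{code}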
[jira] [Updated] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29890:
-----------------------------------
    Affects Version/s: 2.4.3
[jira] [Comment Edited] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974087#comment-16974087 ]

sandeshyapuram edited comment on SPARK-29890 at 11/14/19 9:36 AM:
------------------------------------------------------------------

I've raised it as a bug because I feel na.fill(0) needs to fill 0 regardless of duplicate column names. [~cloud_fan] Thoughts?

was (Author: sandeshyapuram):
I've raised it as a bug because I feel na.fill(0) needs to fill 0 regardless of duplicate column names.
[jira] [Commented] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974087#comment-16974087 ]

sandeshyapuram commented on SPARK-29890:
----------------------------------------

I've raised it as a bug because I feel na.fill(0) needs to fill 0 regardless of duplicate column names.
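Another way to unblock the case where the two inputs are genuinely independent DataFrames (the p1/p2 repro above) is to drop one copy of the duplicate by column reference before filling, since Dataset.drop(Column) disambiguates by reference rather than by name. An editor's sketch, not from this thread; for the self-join form (c1/c2 derived from the same parent) column references can themselves be ambiguous, so the renaming route is the more robust workaround:

{code:java}
val p1 = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
val p2 = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")

p1.join(p2, Seq("nums"), "left")
  .drop(p2("abc"))  // removes the right-hand duplicate by reference, not by name
  .na.fill(0)
  .show
{code}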
[jira] [Updated] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29890:
-----------------------------------
    Description: Trying to fill out na values with 0.

{noformat}
scala> :paste
// Entering paste mode (ctrl-D to finish)

val parent = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
val c1 = parent.filter(lit(true))
val c2 = parent.filter(lit(true))
c1.join(c2, Seq("nums"), "left")
  .na.fill(0).show
{noformat}

(plus the AnalysisException stack trace quoted in full above)
[jira] [Updated] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29890:
-----------------------------------
    Description: Trying to fill out na values with 0.

{noformat}
scala> :paste
// Entering paste mode (ctrl-D to finish)

val parent = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
val c1 = parent.filter(lit(true))
val c2 = parent.filter(lit(true))
c1.join(c2, Seq("nums"), "left")
  .na.fill(0).show
{noformat}

(plus the AnalysisException stack trace quoted in full above)

  was: Trying to fill out na values with 0.

{code:java}
scala> :paste
// Entering paste mode (ctrl-D to finish)

val parent = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
val c1 = parent.filter(lit(true))
val c2 = parent.filter(lit(true))
c1.join(c2, Seq("nums"), "left")
  .na.fill(0).show
{code}
[jira] [Updated] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29890:
-----------------------------------
    Description: Trying to fill out na values with 0.

{code:java}
scala> :paste
// Entering paste mode (ctrl-D to finish)

val parent = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
val c1 = parent.filter(lit(true))
val c2 = parent.filter(lit(true))
c1.join(c2, Seq("nums"), "left")
  .na.fill(0).show
{code}
[jira] [Created] (SPARK-29890) Unable to fill na with 0 with duplicate columns
sandeshyapuram created SPARK-29890:
--------------------------------------

             Summary: Unable to fill na with 0 with duplicate columns
                 Key: SPARK-29890
                 URL: https://issues.apache.org/jira/browse/SPARK-29890
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell
    Affects Versions: 2.3.3
         Environment: Trying to fill out na values with 0.

{code:java}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.3
      /_/

Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_222)
Type in expressions to have them evaluated.
Type :help for more information.

scala> :paste
// Entering paste mode (ctrl-D to finish)

val parent = spark.sparkContext.parallelize(Seq((1,2),(3,4),(5,6))).toDF("nums", "abc")
val c1 = parent.filter(lit(true))
val c2 = parent.filter(lit(true))
c1.join(c2, Seq("nums"), "left")
  .na.fill(0).show

// Exiting paste mode, now interpreting.

ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,/etc/hive/conf.dist/ivysettings.xml will be used
19/11/14 04:24:24 ERROR org.apache.hadoop.security.JniBasedUnixGroupsMapping: error looking up the name of group 820818257: No such file or directory
org.apache.spark.sql.AnalysisException: Reference 'abc' is ambiguous, could be: abc, abc.;
(stack trace as quoted in full above)
  ... 54 elided
{code}
            Reporter: sandeshyapuram
[jira] [Updated] (SPARK-29890) Unable to fill na with 0 with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-29890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29890:
-----------------------------------
    Environment: (was: the spark-shell transcript, reproduction steps, and AnalysisException stack trace shown in the Created notice above)
[jira] [Commented] (SPARK-29682) Failure when resolving conflicting references in Join:
[ https://issues.apache.org/jira/browse/SPARK-29682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973985#comment-16973985 ]

sandeshyapuram commented on SPARK-29682:
----------------------------------------

Thanks!

> Failure when resolving conflicting references in Join:
> ------------------------------------------------------
>
>                 Key: SPARK-29682
>                 URL: https://issues.apache.org/jira/browse/SPARK-29682
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.3
>            Reporter: sandeshyapuram
>            Assignee: Terry Kim
>            Priority: Major
>             Fix For: 2.4.5, 3.0.0
>
> When I try to self join a parentDf with multiple childDfs, say childDf1 ...,
> where the childDfs are derived after a cube or rollup and are filtered based
> on group-bys, I get an error
> {{Failure when resolving conflicting references in Join:}}
> This shows a long error message which is quite unreadable. On the other hand,
> if I replace cube or rollup with the old groupBy, it works without issues.
>
> *Sample code:*
> {code:java}
> val numsDF = sc.parallelize(Seq(1,2,3,4,5,6)).toDF("nums")
> val cubeDF = numsDF
>   .cube("nums")
>   .agg(
>     max(lit(0)).as("agcol"),
>     grouping_id().as("gid")
>   )
>
> val group0 = cubeDF.filter(col("gid") <=> lit(0))
> val group1 = cubeDF.filter(col("gid") <=> lit(1))
>
> cubeDF.printSchema
> group0.printSchema
> group1.printSchema
>
> // Recreating cubeDF
> cubeDF.select("nums").distinct
>   .join(group0, Seq("nums"), "inner")
>   .join(group1, Seq("nums"), "inner")
>   .show
> {code}
> *Sample output:*
> {code:java}
> numsDF: org.apache.spark.sql.DataFrame = [nums: int]
> cubeDF: org.apache.spark.sql.DataFrame = [nums: int, agcol: int ... 1 more field]
> group0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [nums: int, agcol: int ... 1 more field]
> group1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [nums: int, agcol: int ... 1 more field]
>
> root
>  |-- nums: integer (nullable = true)
>  |-- agcol: integer (nullable = true)
>  |-- gid: integer (nullable = false)
> (the same schema is printed for group0 and group1)
>
> org.apache.spark.sql.AnalysisException:
> Failure when resolving conflicting references in Join:
> 'Join Inner
> :- Deduplicate [nums#220]
> :  +- Project [nums#220]
> :     +- Aggregate [nums#220, spark_grouping_id#218], [nums#220, max(0) AS agcol#216, spark_grouping_id#218 AS gid#217]
> :        +- Expand [List(nums#212, nums#219, 0), List(nums#212, null, 1)], [nums#212, nums#220, spark_grouping_id#218]
> :           +- Project [nums#212, nums#212 AS nums#219]
> :              +- Project [value#210 AS nums#212]
> :                 +- SerializeFromObject [input[0, int, false] AS value#210]
> :                    +- ExternalRDD [obj#209]
> +- Filter (gid#217 <=> 0)
>    +- Aggregate [nums#220, spark_grouping_id#218], [nums#220, max(0) AS agcol#216, spark_grouping_id#218 AS gid#217]
>       +- Expand [List(nums#212, nums#219, 0), List(nums#212, null, 1)], [nums#212, nums#220, spark_grouping_id#218]
>          +- Project [nums#212, nums#212 AS nums#219]
>             +- Project [value#210 AS nums#212]
>                +- SerializeFromObject [input[0, int, false] AS value#210]
>                   +- ExternalRDD [obj#209]
> Conflicting attributes: nums#220
> ;;
> (the identical 'Join Inner plan is printed a second time)
>   at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:42)
>   at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:96)
> {code}
[jira] [Commented] (SPARK-29682) Failure when resolving conflicting references in Join:
[ https://issues.apache.org/jira/browse/SPARK-29682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965216#comment-16965216 ]

sandeshyapuram commented on SPARK-29682:
----------------------------------------

[~imback82] & [~cloud_fan] Currently I've worked around this by renaming every column in the dataframes before performing the joins, and that works. Let me know if you have a better workaround to deal with it.
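The wholesale renaming the reporter describes can be written compactly with toDF, which re-aliases every column on one side so the join output contains no conflicting names. An editor's sketch on the cubeDF/group0 frames from the sample code; the _g0 suffix and the explicit join condition are illustrative, not from this thread:

{code:java}
// Suffix every column on the right-hand side before joining.
val right = group0.toDF(group0.columns.map(_ + "_g0"): _*)

cubeDF.select("nums").distinct
  .join(right, col("nums") === col("nums_g0"), "inner")
  .drop("nums_g0")
  .show
{code}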
[jira] [Updated] (SPARK-29682) Failure when resolving conflicting references in Join:
[ https://issues.apache.org/jira/browse/SPARK-29682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29682:
-----------------------------------
    Component/s: Spark Shell
                 Spark Core
[jira] [Commented] (SPARK-29682) Failure when resolving conflicting references in Join:
[ https://issues.apache.org/jira/browse/SPARK-29682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963789#comment-16963789 ]

sandeshyapuram commented on SPARK-29682:
----------------------------------------

I have reproduced this in spark-submit as well as in Qubole notebooks. I'm not sure how I can provide you with a self-contained reproducer.
[jira] [Updated] (SPARK-29682) Failure when resolving conflicting references in Join:
[ https://issues.apache.org/jira/browse/SPARK-29682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29682:
-----------------------------------
    Description: When I try to self join a parentDf with multiple childDfs, say childDf1 ..., where the childDfs are derived after a cube or rollup and are filtered based on group-bys, I get an error

{{Failure when resolving conflicting references in Join:}}

This shows a long error message which is quite unreadable. On the other hand, if I replace cube or rollup with the old groupBy, it works without issues.

(*Sample code:* and *Sample output:* as quoted in full above)
[jira] [Updated] (SPARK-29682) Failure when resolving conflicting references in Join:
[ https://issues.apache.org/jira/browse/SPARK-29682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29682:
-----------------------------------
    Description: When I try to self join a parentDf with multiple childDfs, say childDf1 ..., where the childDfs are derived after a cube or rollup and are filtered based on group-bys, I get an error

{{Failure when resolving conflicting references in Join:}}

This shows a long error message which is quite unreadable. On the other hand, if I replace cube or rollup with the old groupBy, it works without issues.

(*Sample code:* and *Sample output:* as quoted in full above)
[jira] [Updated] (SPARK-29682) Failure when resolving conflicting references in Join:
[ https://issues.apache.org/jira/browse/SPARK-29682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29682:
---
    Description:
When I try to self-join a parentDf with multiple childDfs, say childDf1 ... ..., where the childDfs are derived from a cube or rollup and then filtered on their grouping IDs, I get the error {{Failure when resolving conflicting references in Join: }} followed by a long message which is quite unreadable. On the other hand, if I replace cube or rollup with the old groupBy, it works without issues.

*Sample code:*
{code:java}
val numsDF = sc.parallelize(Seq(1, 2, 3, 4, 5, 6)).toDF("nums")

val cubeDF = numsDF
  .cube("nums")
  .agg(
    max(lit(0)).as("agcol"),
    grouping_id().as("gid")
  )

val group0 = cubeDF.filter(col("gid") <=> lit(0))
val group1 = cubeDF.filter(col("gid") <=> lit(1))

cubeDF.printSchema
group0.printSchema
group1.printSchema

// Recreating cubeDF
cubeDF.select("nums").distinct
  .join(group0, Seq("nums"), "inner")
  .join(group1, Seq("nums"), "inner")
  .show
{code}
*Sample output:*
{code:java}
numsDF: org.apache.spark.sql.DataFrame = [nums: int]
cubeDF: org.apache.spark.sql.DataFrame = [nums: int, agcol: int ... 1 more field]
group0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [nums: int, agcol: int ... 1 more field]
group1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [nums: int, agcol: int ... 1 more field]

root
 |-- nums: integer (nullable = true)
 |-- agcol: integer (nullable = true)
 |-- gid: integer (nullable = false)

root
 |-- nums: integer (nullable = true)
 |-- agcol: integer (nullable = true)
 |-- gid: integer (nullable = false)

root
 |-- nums: integer (nullable = true)
 |-- agcol: integer (nullable = true)
 |-- gid: integer (nullable = false)

org.apache.spark.sql.AnalysisException: Failure when resolving conflicting references in Join:
'Join Inner
:- Deduplicate [nums#220]
:  +- Project [nums#220]
:     +- Aggregate [nums#220, spark_grouping_id#218], [nums#220, max(0) AS agcol#216, spark_grouping_id#218 AS gid#217]
:        +- Expand [List(nums#212, nums#219, 0), List(nums#212, null, 1)], [nums#212, nums#220, spark_grouping_id#218]
:           +- Project [nums#212, nums#212 AS nums#219]
:              +- Project [value#210 AS nums#212]
:                 +- SerializeFromObject [input[0, int, false] AS value#210]
:                    +- ExternalRDD [obj#209]
+- Filter (gid#217 <=> 0)
   +- Aggregate [nums#220, spark_grouping_id#218], [nums#220, max(0) AS agcol#216, spark_grouping_id#218 AS gid#217]
      +- Expand [List(nums#212, nums#219, 0), List(nums#212, null, 1)], [nums#212, nums#220, spark_grouping_id#218]
         +- Project [nums#212, nums#212 AS nums#219]
            +- Project [value#210 AS nums#212]
               +- SerializeFromObject [input[0, int, false] AS value#210]
                  +- ExternalRDD [obj#209]
Conflicting attributes: nums#220;;
'Join Inner
:- Deduplicate [nums#220]
:  +- Project [nums#220]
:     +- Aggregate [nums#220, spark_grouping_id#218], [nums#220, max(0) AS agcol#216, spark_grouping_id#218 AS gid#217]
:        +- Expand [List(nums#212, nums#219, 0), List(nums#212, null, 1)], [nums#212, nums#220, spark_grouping_id#218]
:           +- Project [nums#212, nums#212 AS nums#219]
:              +- Project [value#210 AS nums#212]
:                 +- SerializeFromObject [input[0, int, false] AS value#210]
:                    +- ExternalRDD [obj#209]
+- Filter (gid#217 <=> 0)
   +- Aggregate [nums#220, spark_grouping_id#218], [nums#220, max(0) AS agcol#216, spark_grouping_id#218 AS gid#217]
      +- Expand [List(nums#212, nums#219, 0), List(nums#212, null, 1)], [nums#212, nums#220, spark_grouping_id#218]
         +- Project [nums#212, nums#212 AS nums#219]
            +- Project [value#210 AS nums#212]
               +- SerializeFromObject [input[0, int, false] AS value#210]
                  +- ExternalRDD [obj#209]

  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:42)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:96)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:335)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
  at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:125)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:85)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:96)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:109)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$$anonfun$executeAndCheck$1.apply(Analyzer.scala:106)
  at
{code}
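For comparison, here is a minimal sketch of the groupBy variant that reportedly works. This exact snippet is not from the ticket; it assumes a spark-shell session where {{sc}} and the {{toDF}} implicits are already in scope:
{code:java}
import org.apache.spark.sql.functions._

val numsDF = sc.parallelize(Seq(1, 2, 3, 4, 5, 6)).toDF("nums")

// groupBy instead of cube: no Expand node enters the plan, so joining
// the parent against the derived child resolves without conflicts.
val grouped = numsDF
  .groupBy("nums")
  .agg(max(lit(0)).as("agcol"))

numsDF.select("nums").distinct
  .join(grouped, Seq("nums"), "inner")
  .show
{code}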
[jira] [Updated] (SPARK-29682) Failure when resolving conflicting references in Join:
[ https://issues.apache.org/jira/browse/SPARK-29682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sandeshyapuram updated SPARK-29682:
---
    Description:
When I try to self-join a parentDf with multiple childDfs, say childDf1 ... ..., where the childDfs are derived from a cube or rollup and then filtered on their grouping IDs, I get the error {{Failure when resolving conflicting references in Join: }} followed by a long message which is quite unreadable. On the other hand, if I replace cube or rollup with the old groupBy, it works without issues.

*Sample code:*
{code:java}
val numsDF = sc.parallelize(Seq(1, 2, 3, 4, 5, 6)).toDF("nums")

val cubeDF = numsDF
  .cube("nums")
  .agg(
    max(lit(0)).as("agcol"),
    grouping_id().as("gid")
  )

val group0 = cubeDF.filter(col("gid") <=> lit(0))
val group1 = cubeDF.filter(col("gid") <=> lit(1))

cubeDF.printSchema
group0.printSchema
group1.printSchema

// Recreating cubeDF
cubeDF.select("nums").distinct
  .join(group0, Seq("nums"), "inner")
  .join(group1, Seq("nums"), "inner")
  .show
{code}

  was:
When I try to self-join a parentDf with multiple childDfs, say childDf1 ... ..., where the childDfs are derived from a cube or rollup and then filtered on their grouping IDs, I get the error {{Failure when resolving conflicting references in Join: }} followed by a long message which is quite unreadable. On the other hand, if I replace cube or rollup with the old groupBy, it works without issues.

*Sample code:*
{code:java}
val numsDF = sc.parallelize(Seq(1, 2, 3, 4, 5, 6)).toDF("nums")

val cubeDF = numsDF
  .cube("nums")
  .agg(
    max(lit(0)).as("agcol"),
    grouping_id().as("gid")
  )

val group0 = cubeDF.filter(col("gid") <=> lit(0))
val group1 = cubeDF.filter(col("gid") <=> lit(1))

cubeDF.printSchema
group0.printSchema
group1.printSchema

// Recreating cubeDF
cubeDF.select("nums").distinct
  .join(group0, Seq("nums"), "inner")
  .join(group1, Seq("nums"), "inner")
  .show
{code}


> Failure when resolving conflicting references in Join:
> --
>
> Key: SPARK-29682
> URL: https://issues.apache.org/jira/browse/SPARK-29682
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.4.3
> Reporter: sandeshyapuram
> Priority: Major
>
> When I try to self-join a parentDf with multiple childDfs, say childDf1 ... ..., where the childDfs are derived from a cube or rollup and then filtered on their grouping IDs, I get the error
> {{Failure when resolving conflicting references in Join: }}
> followed by a long message which is quite unreadable. On the other hand, if I replace cube or rollup with the old groupBy, it works without issues.
>
> *Sample code:*
> {code:java}
> val numsDF = sc.parallelize(Seq(1, 2, 3, 4, 5, 6)).toDF("nums")
>
> val cubeDF = numsDF
>   .cube("nums")
>   .agg(
>     max(lit(0)).as("agcol"),
>     grouping_id().as("gid")
>   )
>
> val group0 = cubeDF.filter(col("gid") <=> lit(0))
> val group1 = cubeDF.filter(col("gid") <=> lit(1))
>
> cubeDF.printSchema
> group0.printSchema
> group1.printSchema
>
> // Recreating cubeDF
> cubeDF.select("nums").distinct
>   .join(group0, Seq("nums"), "inner")
>   .join(group1, Seq("nums"), "inner")
>   .show
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
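The clashing expression ID reported above (nums#220 appearing on both sides of the join) can be inspected before the join is even attempted. A small sketch, assuming the {{group0}} and {{group1}} frames from the sample code:
{code:java}
// queryExecution.analyzed exposes the resolved logical plan; with cube,
// both filtered children carry the same cube-generated attribute IDs.
println(group0.queryExecution.analyzed)
println(group1.queryExecution.analyzed)
{code}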
[jira] [Created] (SPARK-29682) Failure when resolving conflicting references in Join:
sandeshyapuram created SPARK-29682:
--

             Summary: Failure when resolving conflicting references in Join:
                 Key: SPARK-29682
                 URL: https://issues.apache.org/jira/browse/SPARK-29682
             Project: Spark
          Issue Type: Bug
          Components: Spark Submit
    Affects Versions: 2.4.3
            Reporter: sandeshyapuram

When I try to self-join a parentDf with multiple childDfs, say childDf1 ... ..., where the childDfs are derived from a cube or rollup and then filtered on their grouping IDs, I get the error {{Failure when resolving conflicting references in Join: }} followed by a long message which is quite unreadable. On the other hand, if I replace cube or rollup with the old groupBy, it works without issues.

*Sample code:*
{code:java}
val numsDF = sc.parallelize(Seq(1, 2, 3, 4, 5, 6)).toDF("nums")

val cubeDF = numsDF
  .cube("nums")
  .agg(
    max(lit(0)).as("agcol"),
    grouping_id().as("gid")
  )

val group0 = cubeDF.filter(col("gid") <=> lit(0))
val group1 = cubeDF.filter(col("gid") <=> lit(1))

cubeDF.printSchema
group0.printSchema
group1.printSchema

// Recreating cubeDF
cubeDF.select("nums").distinct
  .join(group0, Seq("nums"), "inner")
  .join(group1, Seq("nums"), "inner")
  .show
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
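A generic workaround sometimes used for conflicting-reference self-joins is to rebuild one side so it receives fresh expression IDs, for example by round-tripping through an RDD. This is not proposed anywhere in the ticket, so treat it as an assumption rather than a confirmed fix:
{code:java}
// Rebuilding the child from its RDD and schema assigns new expression
// IDs, breaking the attribute lineage it shares with cubeDF.
val group0Fresh = spark.createDataFrame(group0.rdd, group0.schema)

cubeDF.select("nums").distinct
  .join(group0Fresh, Seq("nums"), "inner")
  .show
{code}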