Terry Kim created SPARK-30065:
---------------------------------

             Summary: Unable to drop na with duplicate columns
                 Key: SPARK-30065
                 URL: https://issues.apache.org/jira/browse/SPARK-30065
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Terry Kim

Trying to drop rows with null values fails when the DataFrame contains duplicate column names, even when no columns are specified. This should be allowed:

{code:java}
scala> val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
left: org.apache.spark.sql.DataFrame = [col1: string, col2: string]

scala> val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
right: org.apache.spark.sql.DataFrame = [col1: string, col2: string]

scala> val df = left.join(right, Seq("col1"))
df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more field]

scala> df.show
+----+----+----+
|col1|col2|col2|
+----+----+----+
|   1|null|   2|
|   3|   4|null|
+----+----+----+

scala> df.na.drop("any")
org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could be: col2, col2.;
  at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
{code}
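
Until this is addressed, one possible workaround is to rename the columns positionally so that every name is unique before calling `na.drop`. The sketch below is only an illustration and not part of the original report: `uniqueNames` is a hypothetical helper, not a Spark API, and it assumes the suffixed names do not collide with a column that already exists.

```scala
// Hypothetical helper (not part of Spark): make column names unique by
// appending the column's position to every name that occurs more than once.
def uniqueNames(names: Seq[String]): Seq[String] = {
  // How many times each name occurs in the schema.
  val counts = names.groupBy(identity).map { case (name, hits) => name -> hits.size }
  names.zipWithIndex.map { case (name, idx) =>
    if (counts(name) > 1) s"${name}_$idx" else name
  }
}

// Usage in spark-shell, with `df` being the joined DataFrame from the report.
// Dataset.toDF(colNames: String*) renames columns by position, so it does not
// need to resolve the ambiguous name 'col2':
//
//   val renamed = df.toDF(uniqueNames(df.columns.toSeq): _*)
//   renamed.na.drop("any")
```

For the DataFrame in the report, the helper yields the names `col1`, `col2_1`, `col2_2`. Both example rows contain a null, so `na.drop("any")` on the renamed DataFrame would be expected to return no rows.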