Terry Kim created SPARK-30065:
---------------------------------

             Summary: Unable to drop na with duplicate columns
                 Key: SPARK-30065
                 URL: https://issues.apache.org/jira/browse/SPARK-30065
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Terry Kim


Trying to drop rows with null values fails when the DataFrame has duplicate column names, even when no columns are specified.
This should be allowed:


{code:java}
scala> val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
left: org.apache.spark.sql.DataFrame = [col1: string, col2: string]

scala> val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
right: org.apache.spark.sql.DataFrame = [col1: string, col2: string]

scala> val df = left.join(right, Seq("col1"))
df: org.apache.spark.sql.DataFrame = [col1: string, col2: string ... 1 more field]

scala> df.show
+----+----+----+
|col1|col2|col2|
+----+----+----+
|   1|null|   2|
|   3|   4|null|
+----+----+----+


scala> df.na.drop("any")
org.apache.spark.sql.AnalysisException: Reference 'col2' is ambiguous, could be: col2, col2.;
  at org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:240)
{code}
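
Until this is addressed, a possible workaround is to give every column a unique name before calling na.drop. The following is only a sketch, not a proposed fix: it assumes that toDF renames columns by position and so never has to resolve the ambiguous name, and the names left_col2 and right_col2 are hypothetical, chosen only for illustration.

{code:scala}
import org.apache.spark.sql.SparkSession

// Reuses the active session in spark-shell; otherwise starts a local one.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SPARK-30065-workaround")
  .getOrCreate()
import spark.implicits._

// Same data as in the report above.
val left = Seq(("1", null), ("3", "4")).toDF("col1", "col2")
val right = Seq(("1", "2"), ("3", null)).toDF("col1", "col2")
val df = left.join(right, Seq("col1"))

// Rename all three columns positionally so that no name is duplicated.
// left_col2 and right_col2 are hypothetical names.
val renamed = df.toDF("col1", "left_col2", "right_col2")

// With unique names, every column reference can be resolved.
val cleaned = renamed.na.drop("any")
cleaned.show()
{code}

With the sample data both rows contain a null, so the cleaned DataFrame is empty. The point is only that the call completes instead of throwing the AnalysisException.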




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
