[GitHub] [spark] cloud-fan commented on a change in pull request #26593: [SPARK-29890][SQL] DataFrameNaFunctions.fill should handle duplicate columns

GitBox Thu, 21 Nov 2019 18:49:46 -0800

cloud-fan commented on a change in pull request #26593: [SPARK-29890][SQL] DataFrameNaFunctions.fill should handle duplicate columns
URL: https://github.com/apache/spark/pull/26593#discussion_r349413831


 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala
 ##########
 @@ -468,12 +477,26 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
       s"Unsupported value type ${v.getClass.getName} ($v).")
   }
 
+  private def toAttributes(cols: Seq[String]): Seq[Attribute] = {
+    def resolve(colName: String) : Attribute = {
+      df.col(colName).named.toAttribute match {
+        case a: Attribute => a
+        case _ => throw new IllegalArgumentException(s"'$colName' is not a top level column.")
 
 Review comment:
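   Yes, we should call `df.col` to handle `*` etc. But we shouldn't call
`.named.toAttribute`, which turns everything into an attribute and leaves us
unable to detect nested fields: after that conversion the `case a: Attribute`
branch always matches, so the error branch is never reached.
   
   Can we add a test that fills nested fields and check the result?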
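
A rough sketch of what such a check could look like, not the PR's actual test. It assumes a Spark 2.4/3.x build where `Column.expr` is available and a local `SparkSession`; the object name `NestedFillSketch`, the helper `isTopLevel`, and the sample data are all made up for illustration. It matches on the expression returned by `df.col` directly, without `.named.toAttribute`, and then prints the result of `na.fill` on a nested field so the behaviour can be inspected.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.catalyst.expressions.Attribute

// Hypothetical sketch; names and data are illustrative only.
object NestedFillSketch {
  // True when the expression resolved by df.col is itself an Attribute,
  // i.e. no `.toAttribute` conversion was needed to make it one.
  def isTopLevel(df: DataFrame, colName: String): Boolean =
    df.col(colName).expr match {
      case _: Attribute => true
      case _ => false
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("nested-fill-sketch")
      .getOrCreate()
    import spark.implicits._

    // c1 is a top-level string column; c2 is a struct whose first
    // field (`_1`) is a nullable string.
    val df = Seq[(String, (String, Int))](
      ("a", ("b", 1)),
      (null, (null, 2))
    ).toDF("c1", "c2")

    // Report how each name resolves before any conversion is applied.
    Seq("c1", "c2._1").foreach { name =>
      println(s"$name top-level: ${isTopLevel(df, name)}")
    }

    // Fill both names and print the result for inspection; what happens
    // to the nested field depends on the Spark version in use.
    df.na.fill("filled", Seq("c1", "c2._1")).show(false)

    spark.stop()
  }
}
```

Matching on `df.col(name).expr` keeps the distinction the comment above asks for: a nested field resolves to something other than a plain `Attribute`, so it can be rejected or skipped explicitly instead of being silently converted.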

