[GitHub] [spark] gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe
gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe

URL: https://github.com/apache/spark/pull/25768#discussion_r326727451

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala

```diff
@@ -488,7 +488,7 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
     }

     val columnEquals = df.sparkSession.sessionState.analyzer.resolver
-    val projections = df.schema.fields.map { f =>
+    val filledColumns = df.schema.fields.filter { f =>
```

Review comment: We can also traverse `df.logicalPlan.output` to avoid calling `withColumns`, but it might not be a big deal here.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe
gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe

URL: https://github.com/apache/spark/pull/25768#discussion_r326722825

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala

```diff
@@ -497,12 +497,10 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
         throw new IllegalArgumentException(s"$targetType is not matched at fillValue")
       }
       // Only fill if the column is part of the cols list.
-      if (typeMatches && cols.exists(col => columnEquals(f.name, col))) {
-        fillCol[T](f, value)
-      } else {
-        df.col(f.name)
-      }
+      typeMatches && cols.exists(col => columnEquals(f.name, col))
+    }.map { col =>
+      (col.name, fillCol[T](col, value))
     }
-    df.select(projections : _*)
+    df.withColumns(fillColumnsInfo.map(_._1), fillColumnsInfo.map(_._2))
```

Review comment: We can simplify the code.
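The simplification in the diff replaces a per-column `if`/`else` projection with a filter-then-map pipeline that only produces entries for the columns actually being filled. A minimal plain-Scala sketch of the same pattern (no Spark dependency; `Field` stands in for `StructField`, and the string "expression" stands in for the `Column` that `fillCol[T]` would build — all hypothetical):

```scala
// Plain-Scala model of the filter-then-map refactor in the diff above.
// `Field` and the string fill "expression" are illustrative stand-ins.
case class Field(name: String, dataType: String)

object FillSketch {
  // Case-insensitive name matching, mirroring the analyzer's resolver.
  val columnEquals: (String, String) => Boolean = _.equalsIgnoreCase(_)

  // Keep only columns of the target type that the caller asked to fill,
  // then pair each surviving column name with its replacement expression.
  def fillColumnsInfo(fields: Seq[Field],
                      cols: Seq[String],
                      targetType: String): Seq[(String, String)] =
    fields.filter { f =>
      f.dataType == targetType && cols.exists(col => columnEquals(f.name, col))
    }.map { f =>
      (f.name, s"coalesce(${f.name}, <value>)")
    }
}
```

The two halves of each pair can then be split with `map(_._1)` / `map(_._2)`, which is the shape the patch feeds to `withColumns(names, cols)`.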
[GitHub] [spark] gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe
gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe

URL: https://github.com/apache/spark/pull/25768#discussion_r323549161

## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala

```diff
@@ -497,12 +497,10 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
         throw new IllegalArgumentException(s"$targetType is not matched at fillValue")
       }
       // Only fill if the column is part of the cols list.
-      if (typeMatches && cols.exists(col => columnEquals(f.name, col))) {
-        fillCol[T](f, value)
-      } else {
-        df.col(f.name)
-      }
+      typeMatches && cols.exists(col => columnEquals(f.name, col))
+    }.map { col =>
+      (col.name, fillCol[T](col, value))
     }
-    df.select(projections : _*)
+    df.withColumns(fillColumnsInfo.map(_._1), fillColumnsInfo.map(_._2))
```

Review comment: When `df` has a duplicate column name, what is the behavior? Also, we need to add test cases to ensure the behaviors are consistent.
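The duplicate-name concern can be modeled in plain Scala: when replacement is keyed by name alone, every matching occurrence is rewritten, whereas a positional projection (the old `df.select(projections: _*)` route) addresses each occurrence individually. This sketch models the ambiguity the reviewer is asking about, not Spark's actual resolution behavior — pinning that down is exactly what the requested test cases are for. The schema shape and helper below are illustrative, not Spark API:

```scala
// Model a schema as (name, expression) pairs, the way a joined DataFrame
// with duplicate column names would expose them. Illustrative only.
object DupNameSketch {
  def replaceByName(schema: Seq[(String, String)],
                    replacements: Map[String, String]): Seq[(String, String)] =
    schema.map { case (name, expr) =>
      // Name-keyed lookup: every column whose name matches is rewritten,
      // including duplicates introduced by a join.
      (name, replacements.getOrElse(name, expr))
    }

  // Shape of `df1.join(df2, ...)` output when both sides have an "id" column.
  val joined: Seq[(String, String)] =
    Seq("id" -> "left.id", "id" -> "right.id", "v" -> "left.v")
}
```

Filling `id` by name touches both occurrences at once in this model, which is why the behavior on duplicate names needs explicit test coverage before switching from positional projection to name-based replacement.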