[GitHub] [spark] gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe

2019-09-20 Thread GitBox
gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] 
Modify fillValue approach to support joined dataframe
URL: https://github.com/apache/spark/pull/25768#discussion_r326727451
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala
 ##
 @@ -488,7 +488,7 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
 }
 
 val columnEquals = df.sparkSession.sessionState.analyzer.resolver
-val projections = df.schema.fields.map { f =>
+val filledColumns = df.schema.fields.filter { f =>
 
 Review comment:
   We could also traverse `df.logicalPlan.output` to avoid calling `withColumns`, though it is probably not a big deal here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe

2019-09-20 Thread GitBox
gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] 
Modify fillValue approach to support joined dataframe
URL: https://github.com/apache/spark/pull/25768#discussion_r326722825
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala
 ##
 @@ -497,12 +497,10 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
   throw new IllegalArgumentException(s"$targetType is not matched at fillValue")
   }
   // Only fill if the column is part of the cols list.
-  if (typeMatches && cols.exists(col => columnEquals(f.name, col))) {
-fillCol[T](f, value)
-  } else {
-df.col(f.name)
-  }
+  typeMatches && cols.exists(col => columnEquals(f.name, col))
+}.map { col =>
+  (col.name, fillCol[T](col, value))
 }
-df.select(projections : _*)
+df.withColumns(fillColumnsInfo.map(_._1), fillColumnsInfo.map(_._2))
 
 Review comment:
   We can simplify the code.
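
   The comment does not say which simplification is meant. One possible reading, sketched below in plain Scala, is to keep the filtered fields themselves and derive the names and the filled expressions from them directly, instead of building `(name, column)` tuples and mapping over them twice. `Field`, `fillCol`, and `columnsToFill` are hypothetical stand-ins for illustration only, not Spark's `StructField`, `Column`, or the actual `DataFrameNaFunctions` code.

```scala
// Hypothetical stand-in for a schema field; not Spark's StructField.
final case class Field(name: String, dataType: String)

object FillSimplificationSketch {
  // Hypothetical stand-in for fillCol[T]: describes the filled expression as a string.
  def fillCol(f: Field, value: Any): String =
    s"coalesce(${f.name}, $value) AS ${f.name}"

  // Returns the names and the replacement expressions of the columns to fill.
  // Name matching is case-insensitive here, which is an assumption of this sketch.
  def columnsToFill(
      fields: Seq[Field],
      cols: Seq[String],
      targetType: String,
      value: Any): (Seq[String], Seq[String]) = {
    // Keep only fields whose type matches and whose name is in the cols list.
    val filled = fields.filter { f =>
      f.dataType == targetType && cols.exists(_.equalsIgnoreCase(f.name))
    }
    // Derive both sequences from the filtered fields, with no intermediate tuples.
    (filled.map(_.name), filled.map(fillCol(_, value)))
  }

  def main(args: Array[String]): Unit = {
    val fields = Seq(Field("a", "string"), Field("b", "int"), Field("c", "string"))
    val (names, exprs) = columnsToFill(fields, Seq("A", "b"), "string", "'x'")
    println(names)
    println(exprs)
  }
}
```

   In the PR itself, the two resulting sequences would be the arguments passed to `df.withColumns`.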





[GitHub] [spark] gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] Modify fillValue approach to support joined dataframe

2019-09-11 Thread GitBox
gatorsmile commented on a change in pull request #25768: [SPARK-29063][SQL] 
Modify fillValue approach to support joined dataframe
URL: https://github.com/apache/spark/pull/25768#discussion_r323549161
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala
 ##
 @@ -497,12 +497,10 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
   throw new IllegalArgumentException(s"$targetType is not matched at fillValue")
   }
   // Only fill if the column is part of the cols list.
-  if (typeMatches && cols.exists(col => columnEquals(f.name, col))) {
-fillCol[T](f, value)
-  } else {
-df.col(f.name)
-  }
+  typeMatches && cols.exists(col => columnEquals(f.name, col))
+}.map { col =>
+  (col.name, fillCol[T](col, value))
 }
-df.select(projections : _*)
+df.withColumns(fillColumnsInfo.map(_._1), fillColumnsInfo.map(_._2))
 
 Review comment:
   What is the behavior when `df` has duplicate column names? We also need to add test cases to ensure the behavior stays consistent.
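
   A minimal sketch of how the duplicate-name case could be probed, assuming a local Spark session is available. The object name, column names, and data are made up for illustration. The sketch records what `na.fill` does rather than asserting a particular outcome, since that is exactly what the question above leaves open.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.util.Try

object NaFillDuplicateColumnsSketch {
  // Builds a joined DataFrame that carries two columns named "a".
  def joinedWithDuplicateName(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val left = Seq(("1", null), ("3", "4")).toDF("a", "b")
    val right = Seq(("1", "2"), ("3", null)).toDF("a", "c")
    // Joining on an expression (not on a column-name list) keeps both "a" columns.
    left.join(right, left("a") === right("a"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("na-fill-duplicate-columns")
      .getOrCreate()
    try {
      val joined = joinedWithDuplicateName(spark)
      println(joined.columns.mkString(", "))

      // Fill a column whose name is unique, then one whose name is duplicated.
      // Results are captured in Try so that a failure is reported, not assumed.
      val uniqueName = Try(joined.na.fill("hello", Seq("b")).collect().toSeq)
      val duplicateName = Try(joined.na.fill("hello", Seq("a")).collect().toSeq)
      println(s"fill on unique name: $uniqueName")
      println(s"fill on duplicated name: $duplicateName")
    } finally {
      spark.stop()
    }
  }
}
```

   Once the intended behavior is agreed on, the two `Try` results could be turned into assertions in the `DataFrameNaFunctions` test suite.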

