[GitHub] spark pull request #18820: [SPARK-14932][SQL] Allow DataFrame.replace() to r...

bravo-zhang Mon, 07 Aug 2017 21:12:38 -0700

Github user bravo-zhang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18820#discussion_r131817518
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameNaFunctions.scala ---
    @@ -366,11 +370,15 @@ final class DataFrameNaFunctions private[sql](df: DataFrame) {
           return df
         }
     
    -    // replacementMap is either Map[String, String] or Map[Double, Double] or Map[Boolean,Boolean]
    -    val replacementMap: Map[_, _] = replacement.head._2 match {
    -      case v: String => replacement
    -      case v: Boolean => replacement
    -      case _ => replacement.map { case (k, v) => (convertToDouble(k), convertToDouble(v)) }
    +    // replacementMap is either Map[String, String], Map[Double, Double], Map[Boolean,Boolean]
    +    // while value can have null
    --- End diff --
    
    If `replacement` is of type `Map[Any, Any]`, `replacementMap` is not
    confined to these three types.
    The method doc tells users to use only doubles, strings, and booleans in the
    replacement map, but a user can still call
    `df.na.replace("*", Map(10 -> 20, "Alpha" -> "Bravo"))`. In that case, only
    fields that have the same type as the first key in the replacement map are
    replaced. This is due to the implementation of `targetColumnType` a few
    lines below.
    I'll modify the comments here. But for negative examples like the one above,
    do I need to explain them to users in the method doc?
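
The mixed-type case described above can be illustrated with a toy model. This is only a sketch of the behavior as described in the comment, not Spark's actual code: `ReplaceSketch`, `ColType`, `targetTypeOf`, and `replaceColumn` are hypothetical names, and columns are modeled as plain `Seq[Any]` rather than DataFrame columns.

```scala
object ReplaceSketch {
  // Hypothetical stand-ins for the three supported column kinds.
  sealed trait ColType
  case object NumericCol extends ColType
  case object StringCol extends ColType
  case object BooleanCol extends ColType

  // Assumption: the target type is decided from the FIRST key only,
  // mirroring what the comment says about targetColumnType.
  def targetTypeOf(replacement: Map[Any, Any]): ColType =
    replacement.head._1 match {
      case _: Int | _: Long | _: Float | _: Double => NumericCol
      case _: Boolean => BooleanCol
      case _: String => StringCol
      case other =>
        throw new IllegalArgumentException(s"Unsupported key type: $other")
    }

  // A column is rewritten only when its type matches the target type;
  // every other column is returned unchanged.
  def replaceColumn(colType: ColType, values: Seq[Any],
                    replacement: Map[Any, Any]): Seq[Any] =
    if (replacement.isEmpty || colType != targetTypeOf(replacement)) values
    else values.map(v => replacement.getOrElse(v, v))
}
```

With `Map(10 -> 20, "Alpha" -> "Bravo")` typed as `Map[Any, Any]`, the first key is numeric, so in this model a numeric column has `10` replaced by `20` while a string column containing `"Alpha"` is left untouched.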



