GitHub user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10037#discussion_r46471921
  
    --- Diff: R/pkg/R/generics.R ---
    @@ -623,6 +623,10 @@ setGeneric("getItem", function(x, ...) { standardGeneric("getItem") })
     
     #' @rdname column
     #' @export
    +setGeneric("isNaN", function(x) { standardGeneric("isNaN") })
    --- End diff --
    
    This is only loosely related to this PR, but I'd suggest we discuss it 
further. The current behavior is very confusing; see my example below. 
@shivaram, your thoughts?
    
    tl;dr - a missing value prints as NA, but one has to interact with it as NULL
    ```
    > head(a)
        area     peri     shape   perm
    1   4990 2791.900 0.0903296    6.3
    2   7002 3892.600 0.1486220    6.3
    3   7558       NA 0.1833120    6.3
    4   7352 3869.320 0.1170630    6.3
    > df <- as.DataFrame(sqlContext, a)
    > head(df)
      area    peri     shape perm
    1 4990 2791.90 0.0903296  6.3
    2 7002 3892.60 0.1486220  6.3
    3 7558      NA 0.1833120  6.3
    4 7352 3869.32 0.1170630  6.3
    5 7943 3948.54 0.1224170 17.1
    6 7979 4010.15 0.1670450 17.1
    > a <- filter(df, "peri != NA")
    15/12/02 19:20:06 ERROR RBackendHandler: filter on 16 failed
    Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
      org.apache.spark.sql.AnalysisException: cannot resolve 'NA' given input columns area, peri, shape, perm;
    
    ### Here there is no notion of 'NA', even though the user can see it right there
    
    > a <- filter(df, "isnull(peri)")
    > head(a)
      area peri    shape perm
    1 7558   NA 0.183312  6.3
    > a <- filter(df, "isnotnull(peri)")
    > head(a)
      area    peri     shape perm
    1 4990 2791.90 0.0903296  6.3
    2 7002 3892.60 0.1486220  6.3
    3 7352 3869.32 0.1170630  6.3
    4 7943 3948.54 0.1224170 17.1
    5 7979 4010.15 0.1670450 17.1
    6 9333 4345.75 0.1896510 17.1
    > a <- filter(df, "peri IS NULL")
    > head(a)
      area peri    shape perm
    1 7558   NA 0.183312  6.3
    ```
    
    If we are to keep the automatic NULL<->NA conversion, then I'd suggest we 
don't expose `isnull` and `isnotnull`.
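
    For comparison, here is a minimal base-R sketch of how the same missing 
value behaves locally (no Spark involved). The data frame `d` is made up for 
illustration and only mirrors the area/peri columns above. Even in plain R, 
comparing against `NA` is not how missing values are tested, and `NA` is not 
`NULL`, which is why an `is.na`-style check may feel more natural to R users 
than `isnull`:

    ```r
    # Hypothetical local data frame mirroring two columns of the example above.
    d <- data.frame(area = c(4990, 7002, 7558),
                    peri = c(2791.9, 3892.6, NA))

    # Comparing with NA yields NA for every element, never TRUE or FALSE.
    cmp <- d$peri != NA

    # is.na() is the idiomatic test for a missing value.
    is_missing <- is.na(d$peri)

    # NA is not NULL in R: is.null() returns FALSE for an NA value.
    na_is_null <- is.null(d$peri[3])

    # Keep only rows with a non-missing peri (the local analogue of
    # filtering on "peri IS NOT NULL" in the Spark DataFrame).
    complete <- d[!is.na(d$peri), ]
    ```

    This only shows local R semantics; how SparkR should surface the same 
distinction is the open question here.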

