Github user felixcheung commented on a diff in the pull request:
https://github.com/apache/spark/pull/10037#discussion_r46471921
--- Diff: R/pkg/R/generics.R ---
@@ -623,6 +623,10 @@ setGeneric("getItem", function(x, ...) {
standardGeneric("getItem") })
#' @rdname column
#' @export
+setGeneric("isNaN", function(x) { standardGeneric("isNaN") })
--- End diff --
This is only loosely related to this PR, and I'd suggest we discuss it further.
The current behavior is very confusing; see my example below. @shivaram, your
thoughts?
tl;dr - a missing value prints as NA, but one has to interact with it as NULL
```
> head(a)
area peri shape perm
1 4990 2791.900 0.0903296 6.3
2 7002 3892.600 0.1486220 6.3
3 7558 NA 0.1833120 6.3
4 7352 3869.320 0.1170630 6.3
> df <- as.DataFrame(sqlContext, a)
> head(df)
area peri shape perm
1 4990 2791.90 0.0903296 6.3
2 7002 3892.60 0.1486220 6.3
3 7558 NA 0.1833120 6.3
4 7352 3869.32 0.1170630 6.3
5 7943 3948.54 0.1224170 17.1
6 7979 4010.15 0.1670450 17.1
> a <- filter(df, "peri != NA")
15/12/02 19:20:06 ERROR RBackendHandler: filter on 16 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) :
org.apache.spark.sql.AnalysisException: cannot resolve 'NA' given input
columns area, peri, shape, perm;
### Here there is no notion of 'NA', even though the user can see it right there
> a <- filter(df, "isnull(peri)")
> head(a)
area peri shape perm
1 7558 NA 0.183312 6.3
> a <- filter(df, "isnotnull(peri)")
> head(a)
area peri shape perm
1 4990 2791.90 0.0903296 6.3
2 7002 3892.60 0.1486220 6.3
3 7352 3869.32 0.1170630 6.3
4 7943 3948.54 0.1224170 17.1
5 7979 4010.15 0.1670450 17.1
6 9333 4345.75 0.1896510 17.1
> a <- filter(df, "peri IS NULL")
> head(a)
area peri shape perm
1 7558 NA 0.183312 6.3
```
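For comparison, here is a minimal sketch of how a missing value behaves in plain R, with no Spark involved. The vector `peri` is a hypothetical stand-in that reuses a few values from the output above. It only illustrates base R semantics, where a comparison against NA yields NA and `is.na()` is the usual test. It says nothing about how SparkR should behave.

```r
# Plain R, no Spark required: a local vector with one missing value.
# `peri` is a hypothetical stand-in for the column shown above.
peri <- c(2791.90, 3892.60, NA, 3869.32)

# Comparing against NA never yields TRUE or FALSE; every element is NA.
cmp <- peri != NA

# is.na() is the idiomatic missing-value test in base R.
missing <- is.na(peri)

# NULL is a different thing in R: an empty object, not a missing element.
null_check <- is.null(peri[3])

# Keep only the non-missing values.
kept <- peri[!is.na(peri)]
```

So even locally, `peri != NA` is not a usable filter in R. The difference is that R returns NA silently, whereas the SQL string filter above fails because `NA` is parsed as a column name.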
If we are to keep the automatic NULL<->NA conversion, then I'd suggest we
don't expose `isnull` and `isnotnull`.