Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/20464#discussion_r165271143
--- Diff: R/pkg/R/column.R ---
@@ -169,7 +169,7 @@ setMethod("alias",
#' @note substr since 1.4.0
setMethod("substr", signature(x = "Column"),
function(x, start, stop) {
- jc <- callJMethod(x@jc, "substr", as.integer(start - 1),
as.integer(stop - start + 1))
+ jc <- callJMethod(x@jc, "substr", as.integer(start),
as.integer(stop - start + 1))
--- End diff --
The current API behavior should be considered wrong, and it is also
inconsistent: for starting position 1 we get the substring from the 1st
element, but for starting position 2 we still get the substring from the 1st
element. So we get the following inconsistent results:
```R
> collect(select(df, substr(df$a, 1, 5)))
substring(a, 0, 5)
1 abcde
> collect(select(df, substr(df$a, 2, 5)))
substring(a, 1, 4)
1 abcd
```
For such a behavior change, we might need to add a note in the doc, as
@HyukjinKwon suggested.
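
To make the index arithmetic concrete, here is a minimal base-R sketch of the two argument mappings from the diff. It does not call Spark. `sql_substring` is a hypothetical stand-in for the JVM-side `substring(str, pos, len)`, and it assumes that a start position of 0 is treated like 1, which is consistent with the output shown above. The sample string and the names `substr_old` and `substr_new` are also assumptions made for illustration.

```R
# Hypothetical stand-in for the JVM-side substring(str, pos, len).
# Assumption: pos is 1-based and pos = 0 behaves like pos = 1.
# Negative positions are not modeled in this sketch.
sql_substring <- function(s, pos, len) {
  stopifnot(pos >= 0)
  first <- if (pos > 0) pos else 1L
  base::substr(s, first, first + len - 1L)
}

# Mapping before the change: passes start - 1.
substr_old <- function(s, start, stop) {
  sql_substring(s, start - 1L, stop - start + 1L)
}

# Mapping after the change: passes start unchanged.
substr_new <- function(s, start, stop) {
  sql_substring(s, start, stop - start + 1L)
}

s <- "abcdefgh"  # assumed sample value, not taken from the PR

old_1_5 <- substr_old(s, 1L, 5L)  # pos 0, len 5
old_2_5 <- substr_old(s, 2L, 5L)  # pos 1, len 4
new_1_5 <- substr_new(s, 1L, 5L)  # pos 1, len 5
new_2_5 <- substr_new(s, 2L, 5L)  # pos 2, len 4

print(c(old_1_5, old_2_5, new_1_5, new_2_5))
```

Under these assumptions, the old mapping starts at the 1st character for both `start = 1` and `start = 2`, while the new mapping agrees with base R's `substr(s, 2, 5)`.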