GitHub user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21250#discussion_r186348079
--- Diff: docs/sparkr.md ---
@@ -663,3 +663,7 @@ You can inspect the search path in R with
[`search()`](https://stat.ethz.ch/R-ma
- The `stringsAsFactors` parameter was previously ignored with `collect`,
for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE))`. It
has been corrected.
- For `summary`, option for statistics to compute has been added. Its
output is changed from that from `describe`.
- A warning can be raised if versions of SparkR package and the Spark JVM
do not match.
+
+## Upgrading to Spark 2.3.1 and above
+
+ - The `start` parameter of `substr` method was wrongly subtracted by one,
previously. In other words, the index specified by `start` parameter was
considered as 0-base. This can lead to inconsistent substring results and also
does not match with the behaviour with `substr` in R. It has been fixed so the
`start` parameter of `substr` method is now 1-base, e.g., therefore to get the
same result as `substr(df$a, 2, 5)`, it should be changed to `substr(df$a, 1,
4)`.
--- End diff --
We should mention the version more explicitly, e.g.
```
In SparkR 2.3.0 and earlier, the `start` parameter ... In version 2.3.1 and
later, ... As an example, `substr(lit('abcdef'), 2, 5)` would result in `abcd`
in SparkR 2.3.0, and in SparkR 2.3.1, the result would be ...
```
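
For reference, the index shift described in the diff can be sketched in plain R without a Spark session. This is only a model of the arithmetic, not SparkR's implementation: `old_substr` and `new_substr` are hypothetical helpers operating on an ordinary string, and the model covers only `start >= 2`, which is the case the diff describes.

```r
# Hypothetical helpers that model the documented behaviour on a plain string.
# They do not call SparkR; base R's substr() stands in for the fixed version.

# SparkR 2.3.1 and later: `start` is 1-based, as in base R.
new_substr <- function(x, start, stop) {
  substr(x, start, stop)
}

# SparkR 2.3.0 and earlier, per the diff: `start` was shifted down by one,
# so a call with (start, stop) selected characters start-1 .. stop-1.
# Only start >= 2 is modelled here (an assumption of this sketch).
old_substr <- function(x, start, stop) {
  stopifnot(start >= 2)
  substr(x, start - 1, stop - 1)
}

old_substr("abcdef", 2, 5)  # "abcd"
new_substr("abcdef", 1, 4)  # "abcd", the equivalent call after the fix
new_substr("abcdef", 2, 5)  # "bcde", what the unchanged call returns after the fix
```

Under this model, code written against 2.3.0 keeps its results only if both `start` and `stop` are lowered by one, which matches the `substr(df$a, 2, 5)` to `substr(df$a, 1, 4)` rewrite in the diff.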