Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21250#discussion_r186407992
--- Diff: docs/sparkr.md ---
@@ -663,3 +663,7 @@ You can inspect the search path in R with
[`search()`](https://stat.ethz.ch/R-ma
- The `stringsAsFactors` parameter was previously ignored with `collect`,
for example, in `collect(createDataFrame(iris), stringsAsFactors = TRUE)`. It
has been corrected.
- For `summary`, an option for the statistics to compute has been added. Its
output has changed from that of `describe`.
- A warning can be raised if the versions of the SparkR package and the Spark
JVM do not match.
+
+## Upgrading to SparkR 2.3.1 and above
+
+ - In SparkR 2.3.0 and earlier, the `start` parameter of the `substr` method
was wrongly subtracted by one. In other words, the index specified by the
`start` parameter was treated as 0-based. This can lead to inconsistent
substring results and also does not match the behaviour of `substr` in R. In
version 2.3.1 and later, it has been fixed so the `start` parameter of the
`substr` method is now 1-based. For example, `substr(lit('abcdef'), 2, 4)`
would return `abc` in SparkR 2.3.0, while the result is `bcd` in SparkR
2.3.1.
--- End diff ---
please make sure `substr(lit('abcdef'), 2, 4)` is valid in SparkR, I didn't
check it against the SparkR documentation when writing it...
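
For reference, a minimal sketch (not from the original thread) of how this
could be checked in a local SparkR session; the session setup and the sample
column name `s` are assumptions for illustration only:

```r
# Minimal sketch: check that the Column form of substr() accepts (start, stop)
# and is 1-based. Assumes a local Spark installation and SparkR >= 2.3.1.
library(SparkR)
sparkR.session(master = "local[1]")

df <- createDataFrame(data.frame(s = "abcdef", stringsAsFactors = FALSE))

# Characters 2 through 4 of column `s`: expected "bcd" on 2.3.1 and later,
# whereas the 0-based behaviour in 2.3.0 returned "abc".
head(select(df, substr(df$s, 2, 4)))

sparkR.session.stop()
```

If this runs, the `substr(lit('abcdef'), 2, 4)` form quoted in the doc text
should be accepted as well, since `lit` also returns a Column.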
---