zero323 commented on a change in pull request #27359:
[SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0
URL: https://github.com/apache/spark/pull/27359#discussion_r370965418
##########
File path: R/pkg/tests/fulltests/test_sparkSQL.R
##########
@@ -848,24 +848,31 @@ test_that("collect() and take() on a DataFrame return the same number of rows an
})
test_that("collect() support Unicode characters", {
- lines <- c("{\"name\":\"안녕하세요\"}",
- "{\"name\":\"您好\", \"age\":30}",
- "{\"name\":\"こんにちは\", \"age\":19}",
- "{\"name\":\"Xin chào\"}")
+ jsonPath <- file.path(
+ Sys.getenv("SPARK_HOME"),
+ "R", "pkg", "tests", "fulltests", "data",
+ "test_utils_utf.json"
+ )
+
+ lines <- readLines(jsonPath, encoding = "UTF-8")
- jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
- writeLines(lines, jsonPath)
+  expected <- regmatches(lines, gregexpr('(?<="name": ").*?(?=")', lines, perl = TRUE))
df <- read.df(jsonPath, "json")
rdf <- collect(df)
expect_true(is.data.frame(rdf))
- expect_equal(rdf$name[1], markUtf8("안녕하세요"))
Review comment:
Do you have any preference towards the method used?
In general I have no access to a Windows setup on which I can test this across
testthat versions (things vary, as 2.0 and the latest 2.x versions come with
different dependencies), R versions (we have 3.1 on Jenkins, the latest on
Windows builds), and encodings. So this is the best I came up with in the
limited time I had to tinker with the whole thing, but I am open to other
proposals.
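For context, the lookaround-based extraction in the diff can be sketched in isolation like this (a minimal sketch using hypothetical sample lines, not the actual `test_utils_utf.json` fixture):

```r
# Hypothetical JSON lines standing in for the UTF-8 test fixture.
lines <- c('{"name": "Alice"}', '{"name": "Bob", "age": 30}')

# Lookbehind anchors the match after '"name": "', the lazy .*? stops at
# the lookahead for the closing quote, so only the name value is captured.
expected <- regmatches(
  lines,
  gregexpr('(?<="name": ").*?(?=")', lines, perl = TRUE)
)

unlist(expected)  # c("Alice", "Bob")
```

Note that `perl = TRUE` is required, since base R's default TRE engine does not support lookbehind assertions.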
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]