zero323 commented on a change in pull request #27359: 
[SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0 
URL: https://github.com/apache/spark/pull/27359#discussion_r370964207
 
 

 ##########
 File path: R/pkg/tests/fulltests/test_sparkSQL.R
 ##########
 @@ -848,24 +848,31 @@ test_that("collect() and take() on a DataFrame return the same number of rows an
 })
 
 test_that("collect() support Unicode characters", {
-  lines <- c("{\"name\":\"안녕하세요\"}",
-             "{\"name\":\"您好\", \"age\":30}",
-             "{\"name\":\"こんにちは\", \"age\":19}",
-             "{\"name\":\"Xin chào\"}")
+  jsonPath <- file.path(
+    Sys.getenv("SPARK_HOME"),
+    "R", "pkg", "tests", "fulltests", "data",
+    "test_utils_utf.json"
+  )
+
+  lines <- readLines(jsonPath, encoding = "UTF-8")
 
-  jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
-  writeLines(lines, jsonPath)
+  expected <- regmatches(lines, gregexpr('(?<="name": ").*?(?=")', lines, perl = TRUE))
 
   df <- read.df(jsonPath, "json")
   rdf <- collect(df)
   expect_true(is.data.frame(rdf))
-  expect_equal(rdf$name[1], markUtf8("안녕하세요"))
 
 Review comment:
   @dongjoon-hyun This fails on Windows because of encoding issues ([the whole
   thing is quite messy](https://developer.r-project.org/Encodings_and_R.html))
   when the string literals are used directly, irrespective of the testthat
   version. As with the recursive call test case, the failure is easy to
   reproduce outside the tests (Windows, English locale, CP1252 encoding), but
   it is somehow ignored in testthat 1.x.
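   
   For illustration, a rough sketch of why a CP1252 locale is a problem here,
   assuming the failure comes from text being translated to the native
   encoding (an assumption on my side, not something verified against the test
   run). The variable names are made up for this example, and the strings use
   `\u` escapes so the snippet itself stays ASCII.
   
   ```r
   # Hypothetical stand-ins for two of the names in the test data.
   korean <- "\uc548\ub155\ud558\uc138\uc694"
   viet <- "Xin ch\u00e0o"

   # CP1252 has a code point for a-grave but none for Hangul, so only
   # one of the two conversions can succeed.
   viet_cp1252 <- iconv(viet, from = "UTF-8", to = "CP1252")
   korean_cp1252 <- iconv(korean, from = "UTF-8", to = "CP1252")

   # iconv() returns NA by default when a character cannot be converted.
   print(is.na(korean_cp1252))
   print(is.na(viet_cp1252))
   ```
   
   Anything that silently passes through the native encoding can therefore
   lose or alter the non-Latin names, which would be consistent with what is
   seen on Windows.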
   
   The alternative to this change is to write the bytes explicitly, i.e.
   
   ```r
   test_that("collect() support Unicode characters", {
   
     lines <- markUtf8(c(
       '{"name": "안녕하세요"}',
       '{"name": "您好", "age": 30}',
       '{"name": "こんにちは", "age": 19}',
       '{"name": "Xin ch\xc3\xa0o"}'
     ))
   
     jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
     writeLines(lines, jsonPath, useBytes = TRUE)
   
     expected <- regmatches(
       lines, regexec('(?<="name": ").*?(?=")', lines, perl = TRUE)
     )
   
     df <- read.df(jsonPath, "json")
     rdf <- collect(df)
     expect_true(is.data.frame(rdf))
   
     rdf$name <- markUtf8(rdf$name)
     expect_equal(rdf$name[1], expected[[1]])
     expect_equal(rdf$name[2], expected[[2]])
     expect_equal(rdf$name[3], expected[[3]])
     expect_equal(rdf$name[4], expected[[4]])
   
     df1 <- createDataFrame(rdf)
     expect_equal(
       collect(
         where(df1, df1$name == expected[[2]])
       )$name,
       expected[[2]]
     )
   })
   ```
   
   The other option is to skip the test case on Windows. As-is, this will (and
   should) fail on AppVeyor.
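   
   A minimal base-R sketch of the byte-explicit round trip, outside testthat
   and Spark, in case it helps to check the file-writing part in isolation. It
   assumes only that the string is marked as UTF-8 before it is written with
   `useBytes = TRUE`; `name`, `line`, `path` and `back` are names made up for
   the example.
   
   ```r
   # UTF-8 bytes spelled out with \x escapes, then marked as UTF-8
   # (presumably what markUtf8 does in the test above).
   name <- "Xin ch\xc3\xa0o"
   Encoding(name) <- "UTF-8"
   line <- paste0('{"name": "', name, '"}')

   # useBytes = TRUE writes the bytes as they are instead of
   # translating them to the native encoding first.
   path <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
   writeLines(line, path, useBytes = TRUE)

   # Read the file back and declare the content to be UTF-8.
   back <- readLines(path, encoding = "UTF-8")
   print(identical(charToRaw(back), charToRaw(line)))
   unlink(path)
   ```
   
   If the bytes survive this round trip, any remaining mismatch would have to
   come from a later step (reading through Spark or the comparison itself)
   rather than from writing the file.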
