[GitHub] [spark] zero323 edited a comment on issue #27328: [WIP][SPARK-23435][SPARKR][TESTS] Update testthat to >= 2.0.0

GitBox Thu, 23 Jan 2020 19:22:25 -0800

URL: https://github.com/apache/spark/pull/27328#issuecomment-577975049
 
 
   I've checked some of the failures, and it is pretty clear that these tests 
must have been silenced somehow.
   
   - `node stack overflow` in `cleanClosure` is an obvious bug and it is 
trivial to reproduce outside tests - I've opened 
[SPARK-30629](https://issues.apache.org/jira/browse/SPARK-30629) to track this 
one.
   - The mismatches in `collect() support Unicode characters` seem to be caused 
by Windows / R encoding quirks. As it stands, `lines` doesn't even seem to be 
recognized as valid JSON input:
   
       ```r
       library(SparkR)
       SparkR::sparkR.session()
       Sys.info()
       #           sysname           release           version 
       #         "Windows"      "Server x64"     "build 17763" 
       #          nodename           machine             login 
       # "WIN-5BLT6Q610KH"          "x86-64"   "Administrator" 
       #              user    effective_user 
       #   "Administrator"   "Administrator" 
       
       Sys.getlocale()
       
       # [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
       
       lines <- c("{\"name\":\"안녕하세요\"}",
                  "{\"name\":\"您好\", \"age\":30}",
                  "{\"name\":\"こんにちは\", \"age\":19}",
                  "{\"name\":\"Xin chào\"}")
   
       jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")
       writeLines(lines, jsonPath)
       
       # Inspect what was actually written to disk
       system(paste0("cat ", jsonPath))
       # {"name":"<U+C548><U+B155><U+D558><U+C138><U+C694>"}
       # {"name":"<U+60A8><U+597D>", "age":30}
       # {"name":"<U+3053><U+3093><U+306B><U+3061><U+306F>", "age":19}
       # {"name":"Xin chào"}
       # [1] 0
       
       df <- read.df(jsonPath, "json")
       
       
       printSchema(df)
       # root
       #  |-- _corrupt_record: string (nullable = true)
       #  |-- age: long (nullable = true)
       #  |-- name: string (nullable = true)
       
       head(df)
       #              _corrupt_record age                                     name
       # 1                       <NA>  NA <U+C548><U+B155><U+D558><U+C138><U+C694>
       # 2                       <NA>  30                         <U+60A8><U+597D>
       # 3                       <NA>  19 <U+3053><U+3093><U+306B><U+3061><U+306F>
       # 4 {"name":"Xin ch<U+FFFD>o"}  NA                                     <NA>
       ```
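
One way to rule out the write step as the culprit, sketched below and not verified on the Windows machine above: force the strings to UTF-8 and ask `writeLines` to emit the bytes as-is instead of translating them to the native code page. The `\u` escapes and the `roundTrip` / `same` names are only for illustration, and this is an assumption about where the encoding gets lost, not a tested fix for the failing test.

```r
# Same kind of input as above, written with \u escapes so the strings are
# UTF-8 regardless of the encoding of this source file.
lines <- c("{\"name\":\"\uc548\ub155\ud558\uc138\uc694\"}",
           "{\"name\":\"\u60a8\u597d\", \"age\":30}",
           "{\"name\":\"Xin ch\u00e0o\"}")

jsonPath <- tempfile(pattern = "sparkr-test", fileext = ".tmp")

# enc2utf8() makes sure every element is UTF-8 encoded; useBytes = TRUE tells
# writeLines() to write those bytes unchanged rather than re-encoding them
# for the current locale.
writeLines(enc2utf8(lines), jsonPath, useBytes = TRUE)

# Read the file back, declaring its content as UTF-8, and compare the raw
# bytes of each line with the original strings.
roundTrip <- readLines(jsonPath, encoding = "UTF-8")
same <- identical(lapply(enc2utf8(lines), charToRaw),
                  lapply(roundTrip, charToRaw))
same
```

If `same` is `TRUE` and `read.df(jsonPath, "json")` still yields `_corrupt_record`, the problem is more likely on the read / collect side than in how the fixture file is written.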
   
   
   Let's see what else surfaces downstream.

