[jira] [Commented] (ARROW-11682) [R] Regression from 2.0.0 -> 3.0.0: Null character in string prevents dataset from loading

Neal Richardson (Jira) Wed, 17 Feb 2021 18:35:05 -0800


    [ 
https://issues.apache.org/jira/browse/ARROW-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286241#comment-17286241
 ]


Neal Richardson commented on ARROW-11682:
-----------------------------------------

It loaded in 2.0.0 but the string was silently truncated, which is (arguably) 
worse. 

https://arrow.apache.org/docs/r/news/index.html#enhancements mentions the 
solution, which is to set `options(arrow.skip_nul = TRUE)` to read in files 
with embedded nuls. I don't recommend this as a global setting though because 
it will likely be significantly slower. 
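
One way to avoid the global setting is to turn the option on for a single read and restore the previous value afterwards. This is a minimal sketch, not something arrow provides: `with_skip_nul()` and the file path are hypothetical names made up for illustration. It also assumes the conversion to a data frame happens inside the wrapped call, as it does in 3.0.0, where the error is raised from `Table__to_dataframe()`.

```r
# Hypothetical helper (not part of arrow): evaluate `expr` with
# arrow.skip_nul = TRUE, then restore whatever value was set before.
with_skip_nul <- function(expr) {
  old <- options(arrow.skip_nul = TRUE)  # options() returns the previous values
  on.exit(options(old), add = TRUE)      # restore on exit, even after an error
  force(expr)                            # the promise is evaluated here, with the option set
}

# Hypothetical file path; substitute the Feather file that fails to load.
path <- "data_with_nuls.feather"

if (requireNamespace("arrow", quietly = TRUE) && file.exists(path)) {
  df <- with_skip_nul(arrow::read_feather(path))
}
```

Because the option is only active while the wrapped expression runs, any slowdown from nul handling should be limited to the reads that need it.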

There's some discussion on ARROW-11478 about improving this experience; please 
feel free to chime in there if you have opinions. See ARROW-6582 and the linked 
pull request if you're interested in more details on how we got here. 



> [R] Regression from 2.0.0 -> 3.0.0:  Null character in string prevents 
> dataset from loading
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11682
>                 URL: https://issues.apache.org/jira/browse/ARROW-11682
>             Project: Apache Arrow
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Kyle Kavanagh
>            Priority: Major
>
> When a feather file contains a valid string which happens to contain the 
> appearance of a null character, R fails to read the file.  Example string: 
> '#\001200\01'
> Pyarrow is able to successfully read the file and correctly display the 
> string.
> This dataset was previously able to be loaded in 2.0.0 but fails in 3.0.0 
> with the error:
> Error in Table__to_dataframe(x, use_threads = option_use_threads()) : 
>   embedded nul in string: '#\001200\01'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
