[ https://issues.apache.org/jira/browse/ARROW-11682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17286241#comment-17286241 ]
Neal Richardson commented on ARROW-11682:
-----------------------------------------

It loaded in 2.0.0 but the string was silently truncated, which is (arguably) worse.

https://arrow.apache.org/docs/r/news/index.html#enhancements mentions the solution, which is to set `options(arrow.skip_nul = TRUE)` to read in files with embedded nuls. I don't recommend this as a global setting though because it will likely be significantly slower.

There's some discussion on ARROW-11478 to improve this experience, please feel free to chime in there if you have opinions. And see ARROW-6582 and the linked pull request if you're interested in more details on how we got here.

> [R] Regression from 2.0.0 -> 3.0.0: Null character in string prevents dataset from loading
> -------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11682
>                 URL: https://issues.apache.org/jira/browse/ARROW-11682
>             Project: Apache Arrow
>          Issue Type: New Feature
>    Affects Versions: 3.0.0
>            Reporter: Kyle Kavanagh
>            Priority: Major
>
> When a feather file contains a valid string which happens to contain the
> appearance of a null character, R fails to read the file. Example string:
> '#\001200\01'
> Pyarrow is able to successfully read the file and correctly display the
> string.
> This dataset was previously able to be loaded in 2.0.0 but fails in 3.0.0
> with the error:
> Error in Table__to_dataframe(x, use_threads = option_use_threads()) :
>   embedded nul in string: '#\001200\01'

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
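
Since the comment above advises against setting `arrow.skip_nul` globally, one way to apply it to a single read is sketched below. This is a minimal illustration, not an official arrow API: the helper name `read_feather_skip_nul` and the file path are hypothetical, and the sketch assumes the conversion to a data frame happens inside the `arrow::read_feather()` call, so the option is still in effect when strings are converted. With the option enabled, the release notes describe embedded nuls as being stripped from the strings, so the result will not be byte-identical to what Pyarrow shows.

```r
# Hypothetical helper (not part of arrow): enable arrow.skip_nul only
# while one file is being read, then restore the previous setting.
read_feather_skip_nul <- function(path, ...) {
  old <- options(arrow.skip_nul = TRUE)  # options() returns the previous values
  on.exit(options(old), add = TRUE)      # restore them even if the read fails
  arrow::read_feather(path, ...)
}

# Usage; "data.feather" is a placeholder path.
# df <- read_feather_skip_nul("data.feather")
```

Restoring the option with `on.exit()` keeps the slower nul-skipping path limited to the files that need it.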