[jira] [Commented] (CSV-58) Unicode escapes are lost if escape character is backslash
[ https://issues.apache.org/jira/browse/CSV-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231924#comment-13231924 ]

Sebb commented on CSV-58:
-------------------------

I think the default should be to retain the original source characters if the escape sequence is not recognised. This will allow the application to take further action if necessary.

Failing that, throw an exception.

Silently dropping the escape character seems the worst choice as the default.

There's also the issue of what meta-characters should be de-escaped. It seems reasonable to include the encapsulator and CR, LF, possibly also the delimiter.

But should any escapes - apart from the encapsulator itself - be processed in an encapsulated token? There's no need to do so.

Maybe escape handling should be overrideable by the user.

Unicode escapes are lost if escape character is backslash
---------------------------------------------------------

Key: CSV-58
URL: https://issues.apache.org/jira/browse/CSV-58
Project: Commons CSV
Issue Type: Bug
Components: Parser
Reporter: Sebb
Fix For: 1.0

The current escape parsing converts escchar to plain char if the char is not one of the special characters to be escaped.

This can affect unicode escapes if the esc character is backslash.

One way round this is to specifically check for char == 'u', but it seems wrong to only do this for 'u'.

Another solution would be to leave escchar as is unless the char is one of the special characters.

There are several possible ways to treat unrecognised escapes:
- treat it as if the escape char had not been present (current behaviour)
- leave the escape char as is
- throw an exception

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
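
The three ways of treating an unrecognised escape listed in the issue description can be compared side by side with a minimal sketch. This is not the Commons CSV parser: the class, the `UnknownEscape` enum and the `unescape` method are hypothetical names chosen for illustration, and the sketch assumes that a recognised escape simply yields the escaped character itself (it does not map sequences such as "n" to LF).

```java
import java.util.Set;

public class EscapePolicySketch {

    /** Hypothetical policy names; not part of the Commons CSV API. */
    enum UnknownEscape { DROP_ESCAPE, KEEP_ESCAPE, FAIL }

    /**
     * Removes the escape character in front of every character listed in
     * 'special' and applies 'policy' to every other escaped character.
     */
    static String unescape(String in, char esc, Set<Character> special, UnknownEscape policy) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c != esc || i + 1 == in.length()) {
                out.append(c); // ordinary character, or a trailing escape char kept as-is
                continue;
            }
            char next = in.charAt(++i);
            if (special.contains(next)) {
                out.append(next); // recognised escape: emit the bare character
                continue;
            }
            switch (policy) {
                case DROP_ESCAPE: // behaviour described as current in the issue
                    out.append(next);
                    break;
                case KEEP_ESCAPE: // retain the original source characters
                    out.append(esc).append(next);
                    break;
                default: // FAIL
                    throw new IllegalArgumentException("Unrecognised escape: " + esc + next);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Set<Character> special = Set.of(',', '"', '\\');
        // The field holds a backslash followed by u00e9, then an escaped comma.
        String field = "caf\\u00e9\\,bar";
        System.out.println(unescape(field, '\\', special, UnknownEscape.DROP_ESCAPE));
        System.out.println(unescape(field, '\\', special, UnknownEscape.KEEP_ESCAPE));
    }
}
```

With DROP_ESCAPE the backslash in front of "u00e9" disappears, which is the loss the issue title refers to; with KEEP_ESCAPE it survives, so the application can still act on it.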
[jira] [Commented] (CSV-58) Unicode escapes are lost if escape character is backslash
[ https://issues.apache.org/jira/browse/CSV-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229415#comment-13229415 ]

Sebb commented on CSV-58:
-------------------------

If unicode parsing is not selected, the unicode sequences lose their escape character so cannot then be parsed later.

This is really about more than just unicode escape sequences, though that is what alerted me to the issue.

The whole business of escape handling needs to be very carefully documented (and tested!) to ensure predictable behaviour.
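
To illustrate why a lost escape character cannot be recovered later, here is a small sketch of an application-side Unicode decoder run over two possible parser outputs. The class and method names are hypothetical and the two input strings are hand-written stand-ins for what a parser might return; nothing here is taken from the Commons CSV code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LateUnicodeDecodeSketch {

    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces each backslash-u sequence with the character it denotes. */
    static String decodeUnicode(String s) {
        Matcher m = UNICODE_ESCAPE.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        return out.append(s.substring(last)).toString();
    }

    public static void main(String[] args) {
        String escapeDropped = "cafu00e9";    // parser removed the backslash
        String escapeRetained = "caf\\u00e9"; // parser left the backslash in place

        // Nothing left to match: the text comes back unchanged.
        System.out.println(decodeUnicode(escapeDropped));
        // The sequence is still intact and decodes to a single character.
        System.out.println(decodeUnicode(escapeRetained));
    }
}
```

Once the backslash is gone, "u00e9" is indistinguishable from ordinary text, so no later pass can tell it was meant as an escape.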
[jira] [Commented] (CSV-58) Unicode escapes are lost if escape character is backslash
[ https://issues.apache.org/jira/browse/CSV-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229762#comment-13229762 ]

Emmanuel Bourg commented on CSV-58:
-----------------------------------

Understood. The whole escaping logic is dubious; there are a lot of corner cases.

I'm trying to understand who actually uses unicode and control character escapes in CSV files. It seems at least HSQLDB accepts them when reading, but prefers using quotes when writing.