[jira] [Commented] (CSV-58) Unicode escapes are lost if escape character is backslash

2012-03-17 Thread Sebb (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/CSV-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13231924#comment-13231924 ]

Sebb commented on CSV-58:
-

I think the default should be to retain the original source characters if the 
escape sequence is not recognised.
This will allow the application to take further action if necessary.

Failing that, throw an exception. Silently dropping the escape character seems 
the worst choice as the default.

There's also the issue of what meta-characters should be de-escaped.
It seems reasonable to include the encapsulator and CR, LF, possibly also the 
delimiter.

But should any escapes - apart from the encapsulator itself - be processed in 
an encapsulated token?
There's no need to do so.
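A minimal sketch of the rule proposed above, assuming that outside an encapsulated token the de-escaped meta-characters are exactly the encapsulator, a literal CR, a literal LF and the delimiter, and that inside an encapsulated token only the encapsulator is de-escaped. The class `EscapableChars` and its method are hypothetical illustrations, not Commons CSV API.

```java
/**
 * Hypothetical helper (not Commons CSV API): decides whether the character
 * that follows the escape character is a meta-character to be de-escaped.
 */
final class EscapableChars {

    private EscapableChars() {}

    static boolean isEscapable(char next, char delimiter, char encapsulator,
                               boolean inEncapsulatedToken) {
        if (inEncapsulatedToken) {
            // Inside an encapsulated token CR, LF and the delimiter are already
            // literal, so only the encapsulator itself needs escaping.
            return next == encapsulator;
        }
        // Outside: encapsulator, literal CR, literal LF and the delimiter.
        return next == encapsulator || next == '\r' || next == '\n' || next == delimiter;
    }
}
```

Anything for which this returns false would then fall under whatever policy is chosen for unrecognised escapes.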

Maybe escape handling should be overridable by the user.


 Unicode escapes are lost if escape character is backslash
 -

 Key: CSV-58
 URL: https://issues.apache.org/jira/browse/CSV-58
 Project: Commons CSV
  Issue Type: Bug
  Components: Parser
Reporter: Sebb
 Fix For: 1.0


 The current escape parsing converts escchar followed by char to a plain char 
 (dropping the escape character) if the char is not one of the special 
 characters to be escaped.
 This breaks Unicode escapes if the escape character is backslash: the 
 sequence \u00e9 is reduced to u00e9.
 One way round this is to specifically check for char == 'u', but it seems 
 wrong to only do this for 'u'.
 Another solution would be to leave escchar as is unless the char is one 
 of the special characters.
 There are several possible ways to treat unrecognised escapes:
 - treat it as if the escape char had not been present (current behaviour)
 - leave the escape char as is
 - throw an exception
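The three options listed above can be sketched as a policy passed to a simple unescaping loop. This is a minimal illustration under stated assumptions, not the Commons CSV lexer: the names `UnrecognisedEscape`, `Unescaper` and `unescape` are hypothetical, the set of special characters is passed in as a string, and a trailing escape character with nothing after it is simply kept (an arbitrary choice for the sketch).

```java
/** Hypothetical policies for an escape followed by a non-special character. */
enum UnrecognisedEscape { DROP_ESCAPE, RETAIN_ESCAPE, THROW }

/** Hypothetical unescaper illustrating the three policies (not Commons CSV API). */
final class Unescaper {

    private Unescaper() {}

    /**
     * Removes the escape character in front of characters in {@code special};
     * any other escape sequence is handled according to {@code policy}.
     */
    static String unescape(String in, char esc, String special, UnrecognisedEscape policy) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c != esc) {
                out.append(c);
                continue;
            }
            if (i + 1 == in.length()) {
                // Trailing escape with nothing after it: kept literally (assumption).
                out.append(c);
                break;
            }
            char next = in.charAt(++i);
            if (special.indexOf(next) >= 0) {
                out.append(next); // recognised: drop the escape character
                continue;
            }
            switch (policy) {
                case DROP_ESCAPE:
                    out.append(next); // current behaviour: escape char silently lost
                    break;
                case RETAIN_ESCAPE:
                    out.append(esc).append(next); // keep the original source characters
                    break;
                case THROW:
                    throw new IllegalArgumentException(
                        "Unrecognised escape sequence " + esc + next + " at index " + (i - 1));
                default:
                    throw new AssertionError(policy);
            }
        }
        return out.toString();
    }
}
```

With `\` as the escape character and `,` as the only special character, the input `a\u00e9\,b` yields `au00e9,b` under DROP_ESCAPE, `a\u00e9,b` under RETAIN_ESCAPE, and an IllegalArgumentException under THROW.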

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CSV-58) Unicode escapes are lost if escape character is backslash

2012-03-14 Thread Sebb (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/CSV-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229415#comment-13229415 ]

Sebb commented on CSV-58:
-

If Unicode parsing is not selected, the Unicode sequences lose their escape 
character and so cannot be parsed later.
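To illustrate why the loss is irreversible, here is a minimal sketch of a later Unicode-decoding step, assuming it only recognises a backslash, then `u`, then four hex digits. The class `UnicodeEscapes` is hypothetical, not Commons CSV API. Fed the original `caf\u00e9` it yields `café`; fed `cafu00e9` (the escape already dropped by the lexer) it has nothing to recognise and returns the text unchanged.

```java
/** Hypothetical later decoding step (not Commons CSV API). */
final class UnicodeEscapes {

    private UnicodeEscapes() {}

    /** Replaces each backslash, 'u', four-hex-digit sequence with that char. */
    static String decode(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == '\\' && i + 6 <= in.length() && in.charAt(i + 1) == 'u'
                    && isHex(in, i + 2, i + 6)) {
                out.append((char) Integer.parseInt(in.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                // Anything else, including a bare "u00e9" whose escape was
                // already dropped, is copied through unchanged.
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** True if every char in s[from, to) is a hex digit. */
    private static boolean isHex(String s, int from, int to) {
        for (int k = from; k < to; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```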

This is really about more than just unicode escape sequences, though that is 
what alerted me to the issue.

The whole business of escape handling needs to be very carefully documented 
(and tested!) to ensure predictable behaviour.





[jira] [Commented] (CSV-58) Unicode escapes are lost if escape character is backslash

2012-03-14 Thread Emmanuel Bourg (Commented) (JIRA)

[ https://issues.apache.org/jira/browse/CSV-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13229762#comment-13229762 ]

Emmanuel Bourg commented on CSV-58:
---

Understood. The whole escaping logic is dubious; there are a lot of corner 
cases. I'm trying to understand who actually uses Unicode and control character 
escapes in CSV files. It seems at least HSQLDB accepts them when reading, but 
prefers using quotes when writing.
