[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14053910#comment-14053910 ]

Benedikt Ritter commented on CSV-35:

I've asked the ML to comment on this fix.

Escaped line separators are not supported

Key: CSV-35
URL: https://issues.apache.org/jira/browse/CSV-35
Project: Commons CSV
Issue Type: Bug
Reporter: Emmanuel Bourg
Fix For: 1.0
Attachments: CSV-35.patch, commons-csv CSV-35 escapeCRLFOnce test.patch, commons-csv CSV-35 escapeCRLFOnce.patch, mysql-export-line-terminated-by-crlf.csv, mysql-export-line-terminated-by-lf.csv

Commons CSV doesn't handle escaped line separators, for example:
{code}
value1;value2;value3a\
value3b
{code}
In this case the expected result is:
{code}[value1, value2, value3a\nvalue3b]{code}
This kind of escaping is produced by MySQL, whether the field enclosing is enabled or not. It's possible to see enclosing quotes and escaped line separators like this:
{code}
value1;value2;value3a\
value3b
{code}

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14047217#comment-14047217 ]

Benedikt Ritter commented on CSV-35:

[~tn] go ahead and commit the patch yourself ;-)
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14035579#comment-14035579 ]

Thomas Neidhart commented on CSV-35:

This issue is related to CSV-102. The patch there adds support for custom record separators when parsing, similar to my patch. However, I think the ExtendedBufferedReader should not be changed, because CR/LF is used there only to track the line number of the parsed file for error handling and debug information.
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033891#comment-14033891 ]

Thomas Neidhart commented on CSV-35:

Right now the lexer does not use the record separator(s) specified in the format to be parsed. In the MySQL example, \n (LF) is the record separator. The record looks as follows:

3;Value\r \\nwith a line break,c\n

The CRLF sequence is escaped so that its \n is not treated as the record separator; the second \n then finishes the record.

So I would suggest the following:
* support multiple record separators for a format, e.g. \n, \r, or \r\n
* have the lexer use the record separators defined for the format
* treat an escape character as indicating that the following character cannot act as a record separator
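The three suggestions above can be sketched in a few lines of plain Java. This is only an illustration under stated assumptions, not the Commons CSV lexer and not the attached patch: the class `MultiSeparatorSplitter` and its `split` method are hypothetical names, the separators are tried longest first so that \r\n wins over a bare \r, and the escape character protects exactly one following character. The escape characters are left in the returned raw records; unescaping would be a later step.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Commons CSV. */
final class MultiSeparatorSplitter {

    /** Splits input into raw records, honouring escaped separator characters. */
    static List<String> split(String input, List<String> separators, char escape) {
        List<String> ordered = new ArrayList<>(separators);
        for (String sep : ordered) {
            if (sep.isEmpty()) {
                throw new IllegalArgumentException("empty record separator");
            }
        }
        // Try longer separators first so "\r\n" is preferred over "\r" or "\n".
        ordered.sort((a, b) -> Integer.compare(b.length(), a.length()));

        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                // The escaped character can never start a record separator.
                current.append(c).append(input.charAt(i + 1));
                i += 2;
                continue;
            }
            String matched = null;
            for (String sep : ordered) {
                if (input.startsWith(sep, i)) {
                    matched = sep;
                    break;
                }
            }
            if (matched != null) {
                records.add(current.toString());
                current.setLength(0);
                i += matched.length();
            } else {
                current.append(c);
                i++;
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // last record without a trailing separator
        }
        return records;
    }
}
```

With separators \n, \r and \r\n and '\' as the escape, an input of two records where the first contains an escaped LF yields two raw records, the first still containing the backslash and the LF.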
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033941#comment-14033941 ]

Gary Gregory commented on CSV-35:

It seems like a bug not to use the format's record separator.
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14034404#comment-14034404 ]

Sebb commented on CSV-35:

Gary: I suspect the record separator was originally intended as output only.

Thomas: I agree. However, there is a possibility that the record separator (RS) could contain the escape character. How should that case be handled? I suspect it should be disallowed, as it will cause issues.

In the case of the MySQL examples: if the escape char is set to '\' and the input is unescaped before checking for the RS, it would be possible to parse the input correctly by choosing RS=LF or RS=CRLF. That is, there is no need to use the escape character in the RS, because the unescaping is done first. This should of course be tested ...

If one checks for the RS before unescaping, it would not be possible to escape the RS sequence.
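The proposal to disallow a record separator that contains the escape character could be enforced with a simple up-front check. The following is a minimal sketch, assuming a validation step at configuration time; `RecordSeparatorCheck` and `requireNoEscape` are hypothetical names and do not exist in Commons CSV.

```java
/** Hypothetical validation helper for illustration only; not part of Commons CSV. */
final class RecordSeparatorCheck {

    /** Returns the separator unchanged, or throws if it is empty or contains the escape char. */
    static String requireNoEscape(String recordSeparator, char escape) {
        if (recordSeparator == null || recordSeparator.isEmpty()) {
            throw new IllegalArgumentException("record separator must not be empty");
        }
        if (recordSeparator.indexOf(escape) >= 0) {
            // An RS containing the escape char would be ambiguous once unescaping runs first.
            throw new IllegalArgumentException(
                    "record separator must not contain the escape character");
        }
        return recordSeparator;
    }
}
```

Under this check, RS=LF or RS=CRLF with '\' as the escape character is accepted, while a separator such as backslash followed by LF is rejected.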
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13949990#comment-13949990 ]

Gary Gregory commented on CSV-35:

This looks to me like the main serious issue remaining before 1.0. There is also [CSV-58].

Handling MySQL exports sounds pretty basic. We already have {{CSVFormat.MYSQL}}, so we are telling the world we know how to do MySQL...

Gary
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879966#comment-13879966 ]

Sebb commented on CSV-35:

Interesting. Do you have to tell MySQL what the EOL is when reloading from the CSV file? Or does it work this out for itself? This could be tricky if there are CR/LF at the end of the record.

So long as the CSV code knows whether to treat LF or CRLF as the (only) line terminator, it should be easy to parse this format.

For EOL=LF, only LF needs to be escaped on output, and only an unescaped LF acts as an EOL on parse.

For EOL=CRLF, in theory either (or both) need to be escaped on output; EOL is detected on parse only if CR and LF are both unescaped.

However, if one wants to support a variable EOL, or the EOL is not known at the start, it quickly becomes very tricky to parse.

It would be interesting to know how MySQL handles \CR\LF and LF\CR as input in the CRLF case. Also, how does it handle a bare CR on output?
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877640#comment-13877640 ]

Sebb commented on CSV-35:

I would be OK with adding an option to support this, but its name should not have anything to do with MySQL.

It so happens that MySQL is known to generate the escape sequence, but other CSV exports may do so as well, so I think the option name should relate only to the functionality. The Javadoc can mention that the option may be needed for MySQL parsing.

For example, the option could be called withEscapeCRLF(boolean). The default should be false (i.e. only escape the CR).
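To make the effect of such a switch concrete, here is a minimal sketch of how a boolean in the spirit of the proposed withEscapeCRLF(boolean) might change parsing. It is an assumption-laden illustration, not the Commons CSV lexer: `EscapeCrLfSketch` and `parse` are hypothetical names, both LF and CRLF are treated as record terminators for simplicity, and the returned records have the escape character already removed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch for illustration only; not part of Commons CSV. */
final class EscapeCrLfSketch {

    /**
     * Splits input into unescaped records.
     * With escapeCrLf == false the escape covers only the next character;
     * with escapeCrLf == true an escaped CR also covers an immediately following LF.
     */
    static List<String> parse(String input, char escape, boolean escapeCrLf) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < n) {
                char next = input.charAt(i + 1);
                current.append(next); // keep the escaped character as data
                i += 2;
                if (escapeCrLf && next == '\r' && i < n && input.charAt(i) == '\n') {
                    current.append('\n'); // extend the escape to the LF of the CRLF pair
                    i++;
                }
            } else if (c == '\n') {
                records.add(current.toString());
                current.setLength(0);
                i++;
            } else if (c == '\r' && i + 1 < n && input.charAt(i + 1) == '\n') {
                records.add(current.toString());
                current.setLength(0);
                i += 2;
            } else {
                current.append(c);
                i++;
            }
        }
        if (current.length() > 0) {
            records.add(current.toString()); // last record without a trailing terminator
        }
        return records;
    }
}
```

For an input of a, escape, CR, LF, b, LF, the flag set to false gives two records (the first ending in CR, because the LF acts as EOL), while the flag set to true gives a single record containing the CRLF as data. A per-format flag like this applies to the whole file, which is the limitation raised elsewhere in this thread.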
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13877129#comment-13877129 ]

Gary Gregory commented on CSV-35:

So, not addressing this means that we cannot deal with MySQL exports? That seems harsh (for users). It sounds like, for users, the way to tell [csv] about this is with a withMySQLEol(boolean) option, which would be a special case, as Sebb mentioned.
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13726343#comment-13726343 ]

Benedikt Ritter commented on CSV-35:

Do we want to fix the parsing behavior for MySQL exports? I think we don't need to fix this for 1.0. MySQL is just one format among many.
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13696333#comment-13696333 ]

Sebb commented on CSV-35:

As already noted, this would require special-casing, and would mean that escCR followed by a plain LF can no longer be parsed or generated as such.

At the very least, this needs to be carefully documented to avoid surprises (and complaints).
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13695422#comment-13695422 ]

Emmanuel Bourg commented on CSV-35:

I rechecked the output of MySQL and it does indeed produce escCRLF and not escCRescLF. I think we should handle that case and extend the escaping to the second new line character.
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692015#comment-13692015 ]

Sebb commented on CSV-35:

Yes - see my comment dated 29/Mar/12 16:13, i.e. the escape char only affects the subsequent character.

I suppose escCRLF could be special-cased if it is likely to be needed. However, how does one then support the current behaviour, again if there is a use case for it? There could be a switchable option, but that would only work for the complete file.
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13241295#comment-13241295 ]

Sebb commented on CSV-35:

The Lexer does currently (r1036896) handle escLF and escCR.

The code currently treats escCRLF as escCR followed by LF. The LF is handled as EOL.
[jira] [Commented] (CSV-35) Escaped line separators are not supported
[ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13236649#comment-13236649 ]

Sebb commented on CSV-35:

What should happen in the case of escapeCRLF? I presume only the CR should be subject to the escape. If an application wants to include CRLF in a field, then the application should generate escapeCRescapeLF.