[jira] [Created] (ARROW-10848) [C++] CSV ISO-8601 date and timestamp short form
Maciej created ARROW-10848:
--
Summary: [C++] CSV ISO-8601 date and timestamp short form
Key: ARROW-10848
URL: https://issues.apache.org/jira/browse/ARROW-10848
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Maciej

Arrow supports ISO-8601 for date and timestamp parsing, but it does not support the short form (what ISO 8601 calls the basic format, written without separators). E.g.
{code:java}
19990108
or
19990108 040506
{code}
Examples taken from: https://www.postgresql.org/docs/12/datatype-datetime.html

--
This message was sent by Atlassian Jira (v8.3.4#803005)
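To make the request concrete, here is a minimal stdlib-only sketch of how the two short forms quoted above could be expanded into the separated form before normal parsing. It is not Arrow code: the function name `ExpandCompactIso8601` and the helper `AllDigits` are hypothetical, and the sketch only checks the shape of the input (digits and one space), not calendar ranges.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: true if s is non-empty and contains only ASCII digits.
inline bool AllDigits(std::string_view s) {
  if (s.empty()) return false;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
  }
  return true;
}

// Hypothetical sketch: expands "YYYYMMDD" to "YYYY-MM-DD" and
// "YYYYMMDD hhmmss" to "YYYY-MM-DD hh:mm:ss".
// Returns std::nullopt for any other shape. Field ranges are NOT validated.
inline std::optional<std::string> ExpandCompactIso8601(std::string_view s) {
  if (s.size() != 8 && s.size() != 15) return std::nullopt;
  if (!AllDigits(s.substr(0, 8))) return std::nullopt;

  std::string out;
  out.append(s.substr(0, 4));
  out += '-';
  out.append(s.substr(4, 2));
  out += '-';
  out.append(s.substr(6, 2));

  if (s.size() == 15) {
    // Date and time are separated by a single space, as in the issue's example.
    if (s[8] != ' ' || !AllDigits(s.substr(9, 6))) return std::nullopt;
    out += ' ';
    out.append(s.substr(9, 2));
    out += ':';
    out.append(s.substr(11, 2));
    out += ':';
    out.append(s.substr(13, 2));
  }
  return out;
}
```

A real implementation would more likely parse the digits directly instead of building an intermediate string; the expansion is used here only because it keeps the mapping between the two forms easy to read.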
[jira] [Created] (ARROW-10847) [C++] CSV date custom parser
Maciej created ARROW-10847:
--
Summary: [C++] CSV date custom parser
Key: ARROW-10847
URL: https://issues.apache.org/jira/browse/ARROW-10847
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 2.0.0
Reporter: Maciej

When a CSV file uses a custom date format, I would like to be able to parse it by adding a DateParser, analogous to the TimestampParser instances that can be added to timestamp_parsers in ConvertOptions.
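The idea of a list of date parsers tried in order can be sketched without Arrow as follows. Everything here is an assumption made for illustration: `Date`, `DateParser`, `ParseWithFallback`, and the two example parsers are hypothetical names, not part of the Arrow API; only `timestamp_parsers` and `TimestampParser` come from the issue text.

```cpp
#include <functional>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical plain date value used only for this sketch.
struct Date {
  int year;
  int month;
  int day;
};

// Hypothetical parser type: returns a Date, or std::nullopt if the text
// does not match the parser's format.
using DateParser = std::function<std::optional<Date>(std::string_view)>;

// Converts a run of ASCII digits to int; std::nullopt on empty or non-digit input.
inline std::optional<int> ToInt(std::string_view s) {
  if (s.empty()) return std::nullopt;
  int v = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return std::nullopt;
    v = v * 10 + (c - '0');
  }
  return v;
}

// Built-in style parser for the ISO form YYYY-MM-DD (no range validation).
inline std::optional<Date> ParseIsoDate(std::string_view s) {
  if (s.size() != 10 || s[4] != '-' || s[7] != '-') return std::nullopt;
  auto y = ToInt(s.substr(0, 4));
  auto m = ToInt(s.substr(5, 2));
  auto d = ToInt(s.substr(8, 2));
  if (!y || !m || !d) return std::nullopt;
  return Date{*y, *m, *d};
}

// User-supplied parser for a custom DD/MM/YYYY format (no range validation).
inline std::optional<Date> ParseDayFirstDate(std::string_view s) {
  if (s.size() != 10 || s[2] != '/' || s[5] != '/') return std::nullopt;
  auto d = ToInt(s.substr(0, 2));
  auto m = ToInt(s.substr(3, 2));
  auto y = ToInt(s.substr(6, 4));
  if (!y || !m || !d) return std::nullopt;
  return Date{*y, *m, *d};
}

// Tries each parser in order and returns the first successful result.
inline std::optional<Date> ParseWithFallback(const std::vector<DateParser>& parsers,
                                             std::string_view s) {
  for (const auto& parse : parsers) {
    if (auto date = parse(s)) return date;
  }
  return std::nullopt;
}
```

With a parser list such as `{ParseIsoDate, ParseDayFirstDate}`, a value like `03/02/1991` falls through the ISO parser and is picked up by the custom one, which is the behaviour the issue asks for on the date side.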
[jira] [Created] (ARROW-10315) [C++] CSV skip wrong rows
Maciej created ARROW-10315:
--
Summary: [C++] CSV skip wrong rows
Key: ARROW-10315
URL: https://issues.apache.org/jira/browse/ARROW-10315
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 1.0.1
Reporter: Maciej

It would be helpful to add an option to ReadOptions that skips rows containing invalid data (e.g. a value that does not match the column type) and continues reading the following rows. The numbers of the skipped rows could be reported at the end of processing. That way I can either fix the badly formatted data or ignore it when the load success rate is high and I don't care about the exceptions.
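The requested behaviour (skip rows that fail conversion, keep going, report the row numbers at the end) can be sketched for a single integer column as below. This is an illustration under stated assumptions, not Arrow's reader: `ReadResult` and `ReadInt64ColumnSkippingBadRows` are hypothetical names, and the input is assumed to be already split into one cell per row.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical result type: converted values plus the 1-based numbers
// of the rows that were skipped.
struct ReadResult {
  std::vector<int64_t> values;
  std::vector<std::size_t> skipped_rows;
};

// Converts each cell to int64. A cell that is empty, non-numeric, or has
// trailing characters is skipped and its row number recorded.
inline ReadResult ReadInt64ColumnSkippingBadRows(const std::vector<std::string>& rows) {
  ReadResult result;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    const std::string& cell = rows[i];
    const char* first = cell.data();
    const char* last = first + cell.size();
    int64_t value = 0;
    auto [ptr, ec] = std::from_chars(first, last, value);
    if (ec != std::errc() || ptr != last) {
      result.skipped_rows.push_back(i + 1);  // report rows as 1-based
      continue;                              // keep reading the next rows
    }
    result.values.push_back(value);
  }
  return result;
}
```

Returning the skipped row numbers alongside the data lets the caller decide afterwards whether the failure rate is acceptable or whether the offending rows need to be inspected.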
[jira] [Created] (ARROW-10314) [C++] CSV wrong row number in error message
Maciej created ARROW-10314:
--
Summary: [C++] CSV wrong row number in error message
Key: ARROW-10314
URL: https://issues.apache.org/jira/browse/ARROW-10314
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 1.0.1
Reporter: Maciej

When I try to read a CSV file containing invalid data, I get a message like:
{code:java}
CSV file reader error: Invalid: In CSV column #0: CSV conversion error to timestamp[s]: invalid value '1'
{code}
It would be very helpful to include the number of the row that contains the invalid data, e.g.
{code:java}
CSV file reader error: Invalid: In CSV column #0 line number #123456: CSV conversion error to timestamp[s]: invalid value '1'
{code}
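As a small illustration of the proposed message, the sketch below builds the text from a column index, a line number, the target type name, and the offending value. `ConversionErrorMessage` is a hypothetical helper, not an Arrow function, and it assumes the reader already knows the line number at the point where the conversion fails.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: formats a conversion error that names both the
// column and the line of the offending value, in the wording proposed above.
inline std::string ConversionErrorMessage(int32_t column, int64_t line,
                                          const std::string& type_name,
                                          const std::string& value) {
  return "In CSV column #" + std::to_string(column) +
         " line number #" + std::to_string(line) +
         ": CSV conversion error to " + type_name +
         ": invalid value '" + value + "'";
}
```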
[jira] [Created] (ARROW-10115) [C++] CSV empty quoted string is treated as NULL
Maciej created ARROW-10115:
--
Summary: [C++] CSV empty quoted string is treated as NULL
Key: ARROW-10115
URL: https://issues.apache.org/jira/browse/ARROW-10115
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 1.0.1
Reporter: Maciej

When parsing my CSV I have set ConvertOptions::strings_can_be_null to true. Now, when I have values such as:
{code:java}
1234,"",345
{code}
the string value, which is an empty string, is treated as NULL. I've checked the default values of ConvertOptions::null_values, and the empty string is considered null there. However, this value is not an empty string but a quoted empty string, which in my opinion shouldn't be treated as NULL. PostgreSQL behaves this way: an empty quoted string is not treated as NULL: https://www.postgresql.org/docs/12/sql-copy.html
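The distinction the issue asks for can be sketched by having the tokenizer remember whether a field was quoted. This is a simplified stand-alone illustration, not Arrow's CSV parser: `Field`, `SplitCsvLine`, and `ToNullableString` are hypothetical names, and the splitter handles only a single line with `,` as delimiter and `""` as the escaped quote.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical field representation: the unescaped text plus a flag
// recording whether the field was written inside double quotes.
struct Field {
  std::string text;
  bool quoted;
};

// Minimal single-line splitter: ',' separates fields, "..." quotes a field,
// and "" inside quotes stands for one literal quote character.
inline std::vector<Field> SplitCsvLine(std::string_view line) {
  std::vector<Field> fields;
  Field current{"", false};
  bool in_quotes = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    const char c = line[i];
    if (in_quotes) {
      if (c == '"') {
        if (i + 1 < line.size() && line[i + 1] == '"') {
          current.text += '"';  // escaped quote
          ++i;
        } else {
          in_quotes = false;    // closing quote
        }
      } else {
        current.text += c;
      }
    } else if (c == '"') {
      in_quotes = true;
      current.quoted = true;
    } else if (c == ',') {
      fields.push_back(current);
      current = Field{"", false};
    } else {
      current.text += c;
    }
  }
  fields.push_back(current);
  return fields;
}

// Proposed rule: an unquoted empty field is NULL, a quoted empty field
// is a real empty string.
inline std::optional<std::string> ToNullableString(const Field& field) {
  if (!field.quoted && field.text.empty()) return std::nullopt;
  return field.text;
}
```

With this rule, the line `1234,"",345` yields an empty (non-null) string for the second field, while `1234,,345` yields NULL.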
[jira] [Created] (ARROW-9964) [C++] CSV date support
Maciej created ARROW-9964:
--
Summary: [C++] CSV date support
Key: ARROW-9964
URL: https://issues.apache.org/jira/browse/ARROW-9964
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Affects Versions: 1.0.1
Reporter: Maciej

There is no support for reading the date type from a CSV file. I'd like to be able to read a value such as:
{code:java}
1991-02-03
{code}
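For reference, converting such a value to a day count since 1970-01-01 (the representation Arrow's date32 type uses) can be sketched with the standard library alone. The names `DaysFromCivil`, `DaysInMonth`, `ParseDigits`, and `ParseDate32` are hypothetical and this is not Arrow's implementation; the day-count arithmetic follows the well-known civil-calendar algorithm for the proleptic Gregorian calendar.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

// Days between 1970-01-01 and the given proleptic Gregorian date.
inline int64_t DaysFromCivil(int64_t y, int m, int d) {
  y -= (m <= 2);  // treat Jan/Feb as months 11/12 of the previous year
  const int64_t era = (y >= 0 ? y : y - 399) / 400;
  const int64_t yoe = y - era * 400;                                    // [0, 399]
  const int64_t doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;  // [0, 365]
  const int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;            // [0, 146096]
  return era * 146097 + doe - 719468;
}

inline int DaysInMonth(int y, int m) {
  static const int kDays[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  const bool leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
  return (m == 2 && leap) ? 29 : kDays[m - 1];
}

// Parses a run of ASCII digits into *out; false on any non-digit.
inline bool ParseDigits(std::string_view s, int* out) {
  int v = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return false;
    v = v * 10 + (c - '0');
  }
  *out = v;
  return true;
}

// Hypothetical sketch: parses exactly "YYYY-MM-DD" and returns days since
// the UNIX epoch, or std::nullopt for malformed or out-of-range input.
inline std::optional<int32_t> ParseDate32(std::string_view s) {
  if (s.size() != 10 || s[4] != '-' || s[7] != '-') return std::nullopt;
  int y = 0, m = 0, d = 0;
  if (!ParseDigits(s.substr(0, 4), &y) || !ParseDigits(s.substr(5, 2), &m) ||
      !ParseDigits(s.substr(8, 2), &d)) {
    return std::nullopt;
  }
  if (m < 1 || m > 12 || d < 1 || d > DaysInMonth(y, m)) return std::nullopt;
  return static_cast<int32_t>(DaysFromCivil(y, m, d));
}
```

For the value in the issue, `ParseDate32("1991-02-03")` yields 7703, i.e. 7703 days after 1970-01-01.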