[jira] [Created] (ARROW-10848) [C++] CSV ISO-8601 date and timestamp short form

2020-12-08 Thread Maciej (Jira)
Maciej created ARROW-10848:
--

 Summary: [C++] CSV ISO-8601 date and timestamp short form
 Key: ARROW-10848
 URL: https://issues.apache.org/jira/browse/ARROW-10848
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Maciej


Arrow supports ISO-8601 for date and timestamp parsing, but it doesn't
support their short forms, e.g.:
{code:java}
19990108
or
19990108 040506
{code}
Examples taken from: https://www.postgresql.org/docs/12/datatype-datetime.html
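
To make the request concrete, here is a minimal standard-library sketch of
accepting the two short forms quoted above. It is not Arrow code: the
DateTimeFields struct and the ReadDigits and ParseCompactDateTime functions are
hypothetical names introduced for illustration, and only simple range checks
are applied.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical result type for this sketch; not an Arrow type.
struct DateTimeFields {
  int year = 0, month = 0, day = 0;
  int hour = 0, minute = 0, second = 0;
};

// Reads `len` decimal digits starting at `pos`; returns -1 on a non-digit.
// The caller must ensure that pos + len <= s.size().
inline int ReadDigits(std::string_view s, std::size_t pos, std::size_t len) {
  int value = 0;
  for (std::size_t i = pos; i < pos + len; ++i) {
    if (!std::isdigit(static_cast<unsigned char>(s[i]))) return -1;
    value = value * 10 + (s[i] - '0');
  }
  return value;
}

// Accepts "YYYYMMDD" or "YYYYMMDD hhmmss" (the two forms from the report).
// Days per month are not validated in this sketch.
inline std::optional<DateTimeFields> ParseCompactDateTime(std::string_view s) {
  if (s.size() != 8 && s.size() != 15) return std::nullopt;
  DateTimeFields f;
  f.year = ReadDigits(s, 0, 4);
  f.month = ReadDigits(s, 4, 2);
  f.day = ReadDigits(s, 6, 2);
  if (f.year < 0 || f.month < 1 || f.month > 12 || f.day < 1 || f.day > 31) {
    return std::nullopt;
  }
  if (s.size() == 15) {
    if (s[8] != ' ') return std::nullopt;
    f.hour = ReadDigits(s, 9, 2);
    f.minute = ReadDigits(s, 11, 2);
    f.second = ReadDigits(s, 13, 2);
    if (f.hour < 0 || f.hour > 23 || f.minute < 0 || f.minute > 59 ||
        f.second < 0 || f.second > 59) {
      return std::nullopt;
    }
  }
  return f;
}
```

With this sketch, "19990108" yields only the date fields, "19990108 040506"
also fills in the time, and the long form "1999-01-08" is rejected because it
would be handled by the existing ISO-8601 path.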



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10847) [C++] CSV date custom parser

2020-12-08 Thread Maciej (Jira)
Maciej created ARROW-10847:
--

 Summary: [C++] CSV date custom parser
 Key: ARROW-10847
 URL: https://issues.apache.org/jira/browse/ARROW-10847
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 2.0.0
Reporter: Maciej


When a CSV file uses a custom date format, I'd like to parse it by adding a
DateParser, equivalent to the TimestampParser that can be added to
timestamp_parsers in ConvertOptions.
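
A rough sketch of what such a parser could look like, using only the standard
library. The DateParser interface, the ParsedDate struct and the
SimpleFormatDateParser class below are all hypothetical; they only illustrate
the idea and are not part of Arrow. The format syntax is an assumption too: it
understands just %Y, %m and %d plus literal characters.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical result type for this sketch.
struct ParsedDate {
  int year = 0, month = 0, day = 0;
};

// Hypothetical interface, playing the role for dates that TimestampParser
// plays for timestamps.
class DateParser {
 public:
  virtual ~DateParser() = default;
  virtual std::optional<ParsedDate> Parse(std::string_view s) const = 0;
};

// Supports %Y (4 digits), %m and %d (2 digits each) and literal characters.
// The format is expected to contain both %m and %d; otherwise the final
// range check rejects every input.
class SimpleFormatDateParser : public DateParser {
 public:
  explicit SimpleFormatDateParser(std::string format)
      : format_(std::move(format)) {}

  std::optional<ParsedDate> Parse(std::string_view s) const override {
    ParsedDate out;
    std::size_t pos = 0;
    for (std::size_t i = 0; i < format_.size(); ++i) {
      if (format_[i] == '%' && i + 1 < format_.size()) {
        const char spec = format_[++i];
        int* field = spec == 'Y'   ? &out.year
                     : spec == 'm' ? &out.month
                     : spec == 'd' ? &out.day
                                   : nullptr;
        const std::size_t width = spec == 'Y' ? 4 : 2;
        if (field == nullptr || pos + width > s.size()) return std::nullopt;
        int value = 0;
        for (std::size_t k = 0; k < width; ++k, ++pos) {
          if (!std::isdigit(static_cast<unsigned char>(s[pos]))) {
            return std::nullopt;
          }
          value = value * 10 + (s[pos] - '0');
        }
        *field = value;
      } else {
        // Literal character: it must match the input exactly.
        if (pos >= s.size() || s[pos] != format_[i]) return std::nullopt;
        ++pos;
      }
    }
    if (pos != s.size()) return std::nullopt;  // trailing input
    if (out.month < 1 || out.month > 12 || out.day < 1 || out.day > 31) {
      return std::nullopt;
    }
    return out;
  }

 private:
  std::string format_;
};
```

For example, SimpleFormatDateParser("%d/%m/%Y") would accept "03/02/1991" and
reject "1991-02-03". A list of such parsers could then be tried in order, the
way the report describes for timestamp_parsers.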





[jira] [Created] (ARROW-10315) [C++] CSV skip wrong rows

2020-10-15 Thread Maciej (Jira)
Maciej created ARROW-10315:
--

 Summary: [C++] CSV skip wrong rows
 Key: ARROW-10315
 URL: https://issues.apache.org/jira/browse/ARROW-10315
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 1.0.1
Reporter: Maciej


It would be helpful to add another option to ReadOptions that enables
skipping rows with invalid data (e.g. a value that does not match the column
type) and continuing with the next rows. The numbers of the skipped rows could
be reported at the end of processing.

This way I can either deal with the wrongly formatted data or ignore it when
the load success rate is high and I don’t care about the exceptions.
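
The requested behavior can be sketched with the standard library alone. This is
not how the Arrow reader is implemented; the ColumnReadResult struct and the
ReadInt64Column function are hypothetical, and a single int64 column stands in
for a whole table.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical result type: converted values plus the rows that were skipped.
struct ColumnReadResult {
  std::vector<std::int64_t> values;
  std::vector<std::size_t> skipped_rows;  // 1-based row numbers
};

// Converts each cell to int64. A cell that fails to convert (or has trailing
// characters) is skipped and its row number recorded, instead of aborting.
inline ColumnReadResult ReadInt64Column(const std::vector<std::string>& cells) {
  ColumnReadResult result;
  for (std::size_t i = 0; i < cells.size(); ++i) {
    const std::string& cell = cells[i];
    const char* begin = cell.data();
    const char* end = begin + cell.size();
    std::int64_t value = 0;
    const auto [ptr, ec] = std::from_chars(begin, end, value);
    if (ec == std::errc() && ptr == end) {
      result.values.push_back(value);
    } else {
      result.skipped_rows.push_back(i + 1);
    }
  }
  return result;
}
```

After the read, the caller can inspect skipped_rows to decide whether to fix
the input or to ignore the failures.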





[jira] [Created] (ARROW-10314) [C++] CSV wrong row number in error message

2020-10-15 Thread Maciej (Jira)
Maciej created ARROW-10314:
--

 Summary: [C++] CSV wrong row number in error message
 Key: ARROW-10314
 URL: https://issues.apache.org/jira/browse/ARROW-10314
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 1.0.1
Reporter: Maciej


When I try to read a CSV file with invalid data, I get a message like:
{code:java}
CSV file reader error: Invalid: In CSV column #0: CSV conversion error to timestamp[s]: invalid value '1'
{code}
It would be very helpful to add the number of the row with the invalid data, e.g.:
{code:java}
CSV file reader error: Invalid: In CSV column #0 line number #123456: CSV conversion error to timestamp[s]: invalid value '1'
{code}
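
As a small illustration of the proposed wording, the following hypothetical
helper (not an Arrow function) builds the detail part of the message once the
reader knows the line number of the failing value. How the reader would track
that line number is left out of this sketch.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: formats a conversion error that includes the line
// number, following the wording proposed in the report.
inline std::string FormatConversionError(int column, std::int64_t line,
                                         const std::string& type_name,
                                         const std::string& value) {
  return "In CSV column #" + std::to_string(column) + " line number #" +
         std::to_string(line) + ": CSV conversion error to " + type_name +
         ": invalid value '" + value + "'";
}
```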





[jira] [Created] (ARROW-10115) [C++] CSV empty quoted string is treated as NULL

2020-09-28 Thread Maciej (Jira)
Maciej created ARROW-10115:
--

 Summary: [C++] CSV empty quoted string is treated as NULL
 Key: ARROW-10115
 URL: https://issues.apache.org/jira/browse/ARROW-10115
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 1.0.1
Reporter: Maciej


When parsing my CSV I have set ConvertOptions::strings_can_be_null to true.

Now, when I have values such as:
{code:java}
1234,"",345
{code}
the string value, which is an empty string, is treated as NULL.
I've checked the default values of ConvertOptions::null_values, and the empty
string is among the values considered null. However, this is not an empty
string but a quoted empty string, which in my opinion shouldn't be treated as
NULL. PostgreSQL behaves this way: an empty quoted string is not treated as
NULL:
https://www.postgresql.org/docs/12/sql-copy.html
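
The distinction being asked for can be sketched as follows, using only the
standard library. This is not the Arrow tokenizer; Cell, SplitCsvLine and
IsNull are hypothetical names, and the splitter handles only a single line with
comma delimiters and doubled-quote escapes.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical cell type that remembers whether the field was quoted.
struct Cell {
  std::string text;
  bool quoted = false;
};

// Splits one CSV line on commas. Double-quoted fields may contain commas and
// "" as an escaped quote. Multi-line fields are not handled in this sketch.
inline std::vector<Cell> SplitCsvLine(std::string_view line) {
  std::vector<Cell> cells;
  Cell current;
  bool in_quotes = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    const char c = line[i];
    if (in_quotes) {
      if (c != '"') {
        current.text += c;
      } else if (i + 1 < line.size() && line[i + 1] == '"') {
        current.text += '"';  // escaped quote
        ++i;
      } else {
        in_quotes = false;  // closing quote
      }
    } else if (c == '"') {
      in_quotes = true;
      current.quoted = true;
    } else if (c == ',') {
      cells.push_back(std::move(current));
      current = Cell{};
    } else {
      current.text += c;
    }
  }
  cells.push_back(std::move(current));
  return cells;
}

// Proposed rule: only an unquoted empty cell counts as NULL.
inline bool IsNull(const Cell& cell) {
  return cell.text.empty() && !cell.quoted;
}
```

Under this rule, the middle cell of 1234,"",345 is an empty string, while the
middle cell of 1234,,345 is NULL.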





[jira] [Created] (ARROW-9964) [C++] CSV date support

2020-09-10 Thread Maciej (Jira)
Maciej created ARROW-9964:
-

 Summary: [C++] CSV date support
 Key: ARROW-9964
 URL: https://issues.apache.org/jira/browse/ARROW-9964
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 1.0.1
Reporter: Maciej


There is no support for reading the date type from a CSV file. I'd like to
read a value such as:
{code:java}
1991-02-03
{code}
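
For illustration, here is a standard-library sketch of turning such a value
into a day count since 1970-01-01, which is how a 32-bit date is commonly
stored. The function names are hypothetical and this is not Arrow code; the
day-count arithmetic is the well-known days-from-civil calculation for the
proleptic Gregorian calendar.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

inline bool IsLeapYear(int y) {
  return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

inline int DaysInMonth(int y, int m) {
  static constexpr int kDays[] = {31, 28, 31, 30, 31, 30,
                                  31, 31, 30, 31, 30, 31};
  return (m == 2 && IsLeapYear(y)) ? 29 : kDays[m - 1];
}

// Days between 1970-01-01 and the given civil date.
inline std::int32_t DaysFromCivil(int y, int m, int d) {
  y -= m <= 2;  // treat January and February as part of the previous year
  const int era = (y >= 0 ? y : y - 399) / 400;
  const int yoe = y - era * 400;                                   // [0, 399]
  const int doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // [0, 365]
  const int doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;  // [0, 146096]
  return era * 146097 + doe - 719468;
}

// Parses "YYYY-MM-DD" and returns the number of days since 1970-01-01,
// or nullopt if the text is malformed or the date does not exist.
inline std::optional<std::int32_t> ParseIsoDateToDays(std::string_view s) {
  if (s.size() != 10 || s[4] != '-' || s[7] != '-') return std::nullopt;
  int parts[3] = {0, 0, 0};
  const std::size_t starts[3] = {0, 5, 8};
  const std::size_t widths[3] = {4, 2, 2};
  for (int p = 0; p < 3; ++p) {
    for (std::size_t i = starts[p]; i < starts[p] + widths[p]; ++i) {
      if (!std::isdigit(static_cast<unsigned char>(s[i]))) return std::nullopt;
      parts[p] = parts[p] * 10 + (s[i] - '0');
    }
  }
  const int y = parts[0], m = parts[1], d = parts[2];
  if (m < 1 || m > 12 || d < 1 || d > DaysInMonth(y, m)) return std::nullopt;
  return DaysFromCivil(y, m, d);
}
```

With this sketch, "1991-02-03" maps to day 7703 and "1970-01-01" to day 0,
while a non-existent date such as "1991-02-30" is rejected.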
 


