Hi Gary,
According to ARROW-9561 [1], you need to pass the type for the column
explicitly, e.g.:
csv.ConvertOptions(column_types={'t': pa.timestamp('us')})
Hope this helps.
-Micah
[1] https://issues.apache.org/jira/browse/ARROW-9561
On Thu, Sep 3, 2020 at 8:23 AM Gary Clark <[email protected]> wrote:
> Hi,
>
> Not sure if I am missing something, but I am unable to get pyarrow to
> parse my datetimes that are being inferred as strings, to be timestamps.
>
> My strings are arriving in CSVs with this format: '2015-01-09 00:00:00.000'
>
> I have tried creating:
> convert_opts = csv.ConvertOptions(timestamp_parsers=['%Y-%m-%d %H:%M:%S.%f'])
> df = csv.read_csv('path_to_csv', convert_options=convert_opts)
> print(df.schema)
>
> This yields no change and has my columns with these formatted timestamps
> still showing as strings.
>
> Additionally, I have tried casting as well:
>
> dfschema = pa.schema([
> ('date_column', pa.timestamp('ms'))
> ])
> df = csv.read_csv('path_to_csv')
> df.cast(target_schema=dfschema)
>
> This way yields the error: "pyarrow.lib.ArrowInvalid: Failed to parse
> string: 2015-01-09 00:00:00.000"
>
> I am using pyarrow=1.0.1 on a linux docker container.
>
> Thanks,
>
> --
> Gary Clark
> *Data Scientist & Data Engineer*
> *B.S. Mechanical Engineering, Howard University '13*
> +1 (717) 798-6916
> [email protected]
>