Hi,
I may be missing something, but I cannot get pyarrow to parse my datetime
columns as timestamps; they keep being inferred as strings.
My strings are arriving in CSVs with this format: '2015-01-09 00:00:00.000'
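I have tried passing an explicit timestamp parser through ConvertOptions:
from pyarrow import csv
# format string intended to match values like '2015-01-09 00:00:00.000'
convert_opts = csv.ConvertOptions(timestamp_parsers=['%Y-%m-%d %H:%M:%S.%f'])
df = csv.read_csv('path_to_csv', convert_options=convert_opts)
print(df.schema)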
This changes nothing: the columns holding these timestamps still show up
as strings in the printed schema.
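As a sanity check, here is a small sketch that uses only Python's standard
library, not pyarrow. It parses the sample value with the same format string
through datetime.strptime, which suggests the format matches the data. I am
assuming, without having confirmed it, that pyarrow's timestamp_parsers
accept the same directives, %f in particular.

```python
from datetime import datetime

# Sample value and format string copied from the CSV and the ConvertOptions call above.
sample = '2015-01-09 00:00:00.000'
fmt = '%Y-%m-%d %H:%M:%S.%f'

# Standard-library parsing only; this says nothing about pyarrow's own parser.
parsed = datetime.strptime(sample, fmt)
print(parsed)
```

If this succeeds while pyarrow still reports a string column, the mismatch
is presumably in how pyarrow interprets the format rather than in the data.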
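I have also tried casting after reading the file:
import pyarrow as pa
from pyarrow import csv
dfschema = pa.schema([
('date_column', pa.timestamp('ms'))
])
df = csv.read_csv('path_to_csv')
# cast returns a new table, so keep the result
df = df.cast(target_schema=dfschema)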
This raises the error: "pyarrow.lib.ArrowInvalid: Failed to parse
string: 2015-01-09 00:00:00.000"
I am using pyarrow 1.0.1 in a Linux Docker container.
Thanks,
--
Gary Clark
*Data Scientist & Data Engineer*
*B.S. Mechanical Engineering, Howard University '13*
+1 (717) 798-6916