Hi,

Not sure if I am missing something, but I am unable to get pyarrow to parse
my datetimes as timestamps; they are being inferred as strings instead.

My strings are arriving in CSVs with this format: '2015-01-09 00:00:00.000'

I have tried passing a custom timestamp parser through ConvertOptions:

from pyarrow import csv

convert_opts = csv.ConvertOptions(timestamp_parsers=['%Y-%m-%d %H:%M:%S.%f'])
df = csv.read_csv('path_to_csv', convert_options=convert_opts)
print(df.schema)

This yields no change: the columns holding these timestamps still show up
as strings in the schema.
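
As a sanity check on the format string itself, here is a minimal sketch that
uses only Python's standard library, not pyarrow. It shows that the pattern
matches the sample value under Python's own datetime.strptime. Whether
pyarrow's timestamp parser treats %f the same way is an open question that
this check does not settle.

```python
from datetime import datetime

# Sample value and format string taken from the message above.
sample = "2015-01-09 00:00:00.000"
fmt = "%Y-%m-%d %H:%M:%S.%f"

# Python's strptime accepts %f with 1 to 6 fractional digits.
# This says nothing about pyarrow's parser (unverified assumption).
parsed = datetime.strptime(sample, fmt)
print(parsed.isoformat())
```

So the pattern is at least well-formed for this input on the Python side.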

I have also tried casting after the read:

import pyarrow as pa
from pyarrow import csv

dfschema = pa.schema([
    ('date_column', pa.timestamp('ms'))
])
df = csv.read_csv('path_to_csv')
df = df.cast(target_schema=dfschema)

This yields the error: "pyarrow.lib.ArrowInvalid: Failed to parse
string: 2015-01-09 00:00:00.000"

I am using pyarrow==1.0.1 in a Linux Docker container.

Thanks,

-- 
Gary Clark
*Data Scientist & Data Engineer*
*B.S. Mechanical Engineering, Howard University '13*
+1 (717) 798-6916
[email protected]
