Github user BryanCutler commented on the issue:
https://github.com/apache/spark/pull/19459
After incorporating date and timestamp types for this, I had to refactor a
little to use `_create_batch` from serializers so that Arrow batches are built
from Columns even when the user doesn't specify a schema, which makes it
possible to apply the casts for these types. Based on the initial benchmark,
this doesn't seem to affect performance.
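For context, the cast in question is roughly the idea sketched below: a pandas
`datetime64[ns]` column converted to an Arrow timestamp column at microsecond
precision, which is what Spark's TimestampType uses. This is only an
illustration with plain pyarrow; the actual conversion lives in
`_create_batch`, and the column name and precision here are just for the
example.
```python
import pandas as pd
import pyarrow as pa
from datetime import datetime

# A pandas column of datetime64[ns] values, as createDataFrame would receive it.
s = pd.Series([datetime(2017, 10, 31, 1, 1, 1)], name="ts")

# Build an Arrow array and cast it from nanosecond to microsecond precision,
# matching Spark's TimestampType. safe=False allows the truncating cast.
arr = pa.Array.from_pandas(s).cast(pa.timestamp("us"), safe=False)
batch = pa.RecordBatch.from_arrays([arr], ["ts"])

print(batch.schema)  # ts: timestamp[us]
```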
I came across an issue when using a pandas DataFrame with timestamps without
Arrow: Spark reads the values as longs rather than datetimes, so a test for
this currently fails
```
In [1]: spark.conf.set("spark.sql.execution.arrow.enabled", "false")

In [2]: import pandas as pd
   ...: from datetime import datetime
   ...:

In [3]: pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})

In [4]: df = spark.createDataFrame(pdf)

In [5]: df.show()
+-------------------+
|                 ts|
+-------------------+
|1509411661000000000|
+-------------------+

In [6]: df.schema
Out[6]: StructType(List(StructField(ts,LongType,true)))

In [7]: pdf
Out[7]:
                   ts
0 2017-10-31 01:01:01

In [9]: pdf.dtypes
Out[9]:
ts    datetime64[ns]
dtype: object
```
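For reference, a minimal sketch of the kind of round-trip test that currently
fails when Arrow is disabled (the helper name and assertion are illustrative,
not the PR's actual test):
```python
import pandas as pd
from datetime import datetime

def check_timestamp_roundtrip(spark):
    # With Arrow disabled, the collected value comes back as a long
    # (nanoseconds since epoch) instead of a datetime, so this assertion fails.
    spark.conf.set("spark.sql.execution.arrow.enabled", "false")
    pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)]})
    df = spark.createDataFrame(pdf)
    [row] = df.collect()
    assert row.ts == pdf["ts"][0], "expected a datetime, got %r" % (row.ts,)
```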
@HyukjinKwon or @ueshin, could you confirm that you see the same? And do you
consider this a bug?