You can specify an explicit Arrow schema when converting a
pandas.DataFrame to pyarrow.Table or RecordBatch. So it might be
better to write out the schema you want (kind of like when you write
the schema in SQL with CREATE TABLE ...) and then ensure that pandas
objects are coerced into that?
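For example, something along these lines (the column names and types are just illustrative):

import pandas as pd
import pyarrow as pa

# The schema is written out explicitly, much like a CREATE TABLE
# statement (column names and types here are made up).
schema = pa.schema([
    ("id", pa.int64()),
    ("name", pa.string()),
    ("score", pa.float64()),
])

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [0.5, 1.5]})

# from_pandas coerces the columns to the declared types and raises
# if a column cannot be converted.
table = pa.Table.from_pandas(df, schema=schema, preserve_index=False)
print(table.schema)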
Ah - I hadn't thought about how the object dtype complicates things.
What I'm trying to do at a higher level is maybe wacky:
- I want a set of parquet files to be read/written by PySpark and Pandas
interchangeably.
- For each file, I want to specify, in code, the column types (a sketch of what I mean is below).
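Concretely, I'm imagining something like the following, where the schema lives in code and is reused whenever the file is written (the file name and columns are made up):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Declared once, in code, and reused by every writer of this file
# (column names and types here are hypothetical).
EVENTS_SCHEMA = pa.schema([
    ("user_id", pa.int64()),
    ("event", pa.string()),
    ("ts", pa.timestamp("us")),
])

def write_events(df: pd.DataFrame, path: str) -> None:
    # Coercion to the declared schema happens here; a mismatch raises
    # instead of silently writing a different Parquet schema.
    table = pa.Table.from_pandas(df, schema=EVENTS_SCHEMA, preserve_index=False)
    pq.write_table(table, path)

df = pd.DataFrame({
    "user_id": [1, 2],
    "event": ["click", "view"],
    "ts": pd.to_datetime(["2021-01-01", "2021-01-02"]),
})
write_events(df, "events.parquet")

PySpark can then read the same file back with spark.read.parquet and should see the same column types.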
I don't think there is, specifically (one could be added in theory). Is
the goal to determine whether `pyarrow.array(pandas_object)` will
succeed or not, or something else? Since a lot of pandas data is
opaquely represented with object dtype, it can be tricky unless you
want to go to the expense of actually inspecting the values.
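If that is the goal, one option is simply to attempt the conversion and see, e.g. (a rough sketch, not an existing pyarrow API):

import pandas as pd
import pyarrow as pa

def is_convertible(values: pd.Series, arrow_type: pa.DataType) -> bool:
    # For object-dtype data the values themselves decide the outcome,
    # so attempting the conversion is the check.
    try:
        pa.array(values, type=arrow_type)
        return True
    except (pa.ArrowInvalid, pa.ArrowTypeError, pa.ArrowNotImplementedError):
        return False

# Both Series report object dtype, but only one converts to string.
print(is_convertible(pd.Series(["a", "b"], dtype=object), pa.string()))   # True
print(is_convertible(pd.Series([{"k": 1}], dtype=object), pa.string()))   # False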
Hi all,
If I have a pandas dtype and an arrow type, is there a pyarrow API that
allows me to check whether the pandas dtype is convertible to the arrow
type?
It seems like `arrow_type.to_pandas_dtype() == pandas_dtype` would work in
most cases, because pandas dtypes tend to be at least as wide as the
corresponding Arrow types.
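In other words, something like this (just a sketch):

import pandas as pd
import pyarrow as pa

def dtype_matches(pandas_dtype, arrow_type: pa.DataType) -> bool:
    # The naive check: compare the pandas dtype against the numpy
    # type that the Arrow type maps to.
    return arrow_type.to_pandas_dtype() == pandas_dtype

s = pd.Series([1, 2, 3])                                    # int64
print(dtype_matches(s.dtype, pa.int64()))                   # True
print(dtype_matches(s.dtype, pa.int32()))                   # False
print(dtype_matches(pd.Series(["a"]).dtype, pa.string()))   # True, but only because
                                                            # both map to object dtype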