Re: check whether pandas type is convertible to arrow type

2020-06-03 Thread Wes McKinney
You can specify an explicit Arrow schema when converting a pandas.DataFrame to a pyarrow.Table or RecordBatch. So it might be better to write out the schema you want (much like writing a schema in SQL with CREATE TABLE ...) and then ensure that pandas objects are coerced into it.

Re: check whether pandas type is convertible to arrow type

2020-06-01 Thread Sandy Ryza
Ah - I hadn't thought about how the object dtype complicates things. What I'm trying to do at a higher level is maybe wacky:
- I want a set of parquet files to be read/written by PySpark and Pandas interchangeably.
- For each file, I want to specify, in code, the column types

Re: check whether pandas type is convertible to arrow type

2020-05-30 Thread Wes McKinney
I don't think there is specifically (one could be added in theory). Is the goal to determine whether `pyarrow.array(pandas_object)` will succeed or not, or something else? Since a lot of pandas data is opaquely represented with object dtype, it can be tricky unless you want to go to the expense of

check whether pandas type is convertible to arrow type

2020-05-29 Thread Sandy Ryza
Hi all, If I have a pandas dtype and an arrow type, is there a pyarrow API that allows me to check whether the pandas dtype is convertible to the arrow type? It seems like "arrow_type.to_pandas_dtype() == pandas_dtype" would work in most cases, because pandas dtypes tend to be at least as wide