Ah - I hadn't thought about how the object dtype complicates things. What I'm trying to do at a higher level is maybe wacky:
- I want a set of parquet files to be read/written by PySpark and Pandas interchangeably.
- For each file, I want to specify, in code, the column types expected in the file.
- Before writing out a Pandas DataFrame to a file, I want to check whether it matches the expected column types for the file. I don't need to provably catch every violation, but the more I can catch, the better.
- I'm considering using pyarrow types for expressing the expected column types for each file.

Does that make sense? Is there a different way you'd advise accomplishing this? (There's a rough sketch of the kind of check I have in mind below the quoted thread.)

On 2020/05/30 15:07:05, Wes McKinney <[email protected]> wrote:
> I don't think there is specifically (one could be added in theory). Is
> the goal to determine whether `pyarrow.array(pandas_object)` will
> succeed or not, or something else? Since a lot of pandas data is
> opaquely represented with object dtype it can be tricky unless you
> want to go to the expense of using `pandas.lib.infer_dtype` to
> determine the effective logical type of the values.
>
> On Fri, May 29, 2020 at 4:18 PM Sandy Ryza <[email protected]> wrote:
> >
> > Hi all,
> >
> > If I have a pandas dtype and an arrow type, is there a pyarrow API that allows me to check whether the pandas dtype is convertible to the arrow type?
> >
> > It seems like "arrow_type.to_pandas_dtype() == pandas_dtype" would work in most cases, because pandas dtypes tend to be at least as wide as equivalent arrow types, but I'm wondering whether there's something more principled.
> >
> > Any help much appreciated,
> > Sandy
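
To make this concrete, here's a rough sketch of the kind of check I'm imagining. The schema contents and the `validate_frame` helper are just placeholder names of mine; the idea is to lean on `pyarrow.array` itself to decide whether each column is convertible, rather than comparing dtypes directly (which, as you point out, misses a lot once columns are object dtype):

```python
import pandas as pd
import pyarrow as pa

# Expected column types for one parquet file, expressed as a pyarrow schema.
# (These particular columns are just an example.)
expected_schema = pa.schema([
    ("id", pa.int64()),
    ("name", pa.string()),
    ("score", pa.float64()),
])

def validate_frame(df: pd.DataFrame, schema: pa.Schema) -> list:
    """Return (column, error) pairs for columns that don't fit the schema."""
    problems = []
    for field in schema:
        if field.name not in df.columns:
            problems.append((field.name, "missing column"))
            continue
        try:
            # Let pyarrow attempt the conversion; for object-dtype columns
            # this inspects the actual values rather than the dtype.
            pa.array(df[field.name], type=field.type)
        except (pa.ArrowInvalid, pa.ArrowTypeError,
                pa.ArrowNotImplementedError) as exc:
            problems.append((field.name, str(exc)))
    return problems

# Example check before writing the file out:
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [0.5, "oops"]})
for col, err in validate_frame(df, expected_schema):
    print(f"column {col!r} does not match expected type: {err}")
```

The obvious downside is that it pays the cost of actually attempting the conversion for every column before the real write, but that might be acceptable for the amount of violations it catches.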
