Ah - I hadn't thought about how the object dtype complicates things. What I'm trying to do at a higher level is maybe wacky:
- I want a set of parquet files to be read/written by PySpark and Pandas interchangeably.
- For each file, I want to specify, in code, the column types expected in the file.
- Before writing out a Pandas DataFrame to a file, I want to check whether it matches the expected column types for the file. I don't need to provably catch every violation, but the more I can catch, the better.
- I'm considering using pyarrow types for expressing the expected column types for each file.

Does that make sense? Is there a different way you'd advise accomplishing this? (There's a rough sketch of the kind of check I have in mind below the quoted thread.)

On 2020/05/30 15:07:05, Wes McKinney <[email protected]> wrote:
> I don't think there is specifically (one could be added in theory). Is
> the goal to determine whether `pyarrow.array(pandas_object)` will
> succeed or not, or something else? Since a lot of pandas data is
> opaquely represented with object dtype it can be tricky unless you
> want to go to the expense of using `pandas.lib.infer_dtype` to
> determine the effective logical type of the values.
>
> On Fri, May 29, 2020 at 4:18 PM Sandy Ryza <[email protected]> wrote:
> >
> > Hi all,
> >
> > If I have a pandas dtype and an arrow type, is there a pyarrow API that allows me to check whether the pandas dtype is convertible to the arrow type?
> >
> > It seems like "arrow_type.to_pandas_dtype() == pandas_dtype" would work in most cases, because pandas dtypes tend to be at least as wide as equivalent arrow types, but I'm wondering whether there's something more principled.
> >
> > Any help much appreciated,
> > Sandy
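
To make this concrete, here's a rough sketch of the kind of check I'm imagining. The schema contents and the `validate_frame` helper are just placeholder names of mine; the idea is to lean on `pyarrow.array` itself to decide whether each column is convertible, rather than comparing dtypes directly (which, as you point out, misses a lot once columns are object dtype):

```python
import pandas as pd
import pyarrow as pa

# Expected column types for one parquet file, expressed as a pyarrow schema.
# (These particular columns are just an example.)
expected_schema = pa.schema([
    ("id", pa.int64()),
    ("name", pa.string()),
    ("score", pa.float64()),
])

def validate_frame(df: pd.DataFrame, schema: pa.Schema) -> list:
    """Return (column, error) pairs for columns that don't fit the schema."""
    problems = []
    for field in schema:
        if field.name not in df.columns:
            problems.append((field.name, "missing column"))
            continue
        try:
            # Let pyarrow attempt the conversion; for object-dtype columns
            # this inspects the actual values rather than the dtype.
            pa.array(df[field.name], type=field.type)
        except (pa.ArrowInvalid, pa.ArrowTypeError,
                pa.ArrowNotImplementedError) as exc:
            problems.append((field.name, str(exc)))
    return problems

# Example check before writing the file out:
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [0.5, "oops"]})
for col, err in validate_frame(df, expected_schema):
    print(f"column {col!r} does not match expected type: {err}")
```

The obvious downside is that it pays the cost of actually attempting the conversion for every column before the real write, but that might be acceptable for the amount of violations it catches.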
