Re: [Python] Documentation for PyArrow's schema

Weston Pace Mon, 30 Aug 2021 16:50:12 -0700

The Arrow columnar format defines a "schema"[1].  That is the most
basic concept, and all implementations support it.  The C data
interface also defines a "schema"[2], based on [1], that you might
want to be aware of.  PyArrow defines a Python object called Schema[3],
which wraps [2] and represents [1] in the Python implementation.


I don't really know what a multi-header / nested-header is exactly.  I
can make some guesses but would rather make sure I understand what you
are after first.  Can you expand on that a little bit and maybe
provide an example?

[1] https://github.com/apache/arrow/blob/apache-arrow-5.0.0/format/Schema.fbs#L415
[2] https://arrow.apache.org/docs/format/CDataInterface.html#the-arrowschema-structure
[3] https://arrow.apache.org/docs/python/generated/pyarrow.Schema.html

On Mon, Aug 30, 2021 at 11:22 AM Natasha Jokinen
<[email protected]> wrote:
>
> Hi Team,
>
>
>
> Is the schema that PyArrow uses to know how to convert between an Apache 
> Arrow table and a Pandas Dataframe documented? I’m looking at ways my company 
> can have non-python languages share an Apache Arrow schema and it would be 
> great to build off of an existing schema like what Pandas uses rather than 
> coming up with our own.
>
>
>
> I’m particularly interested in documentation of how multi-headers/nested 
> headers are expressed in the PyArrow schema since Pandas only supports flat 
> columns and how PyArrow’s schema indicates row grouping.
>
>
>
> Thanks,
>
> Natasha
>
>
