The Arrow columnar format defines a "schema" [1]. It is the most basic concept, and one that all implementations support. The C data interface also defines a "schema" [2], based on [1], that you might want to be aware of. PyArrow defines a Python object called Schema [3], which wraps [2] and represents [1] in the Python implementation.

I don't really know what a multi-header / nested-header is exactly. I can make some guesses, but I would rather make sure I understand what you are after first. Can you expand on that a little and maybe provide an example?

[1] https://github.com/apache/arrow/blob/apache-arrow-5.0.0/format/Schema.fbs#L415
[2] https://arrow.apache.org/docs/format/CDataInterface.html#the-arrowschema-structure
[3] https://arrow.apache.org/docs/python/generated/pyarrow.Schema.html

On Mon, Aug 30, 2021 at 11:22 AM Natasha Jokinen <[email protected]> wrote:
>
> Hi Team,
>
> Is the schema that PyArrow uses to know how to convert between an Apache
> Arrow table and a Pandas Dataframe documented? I’m looking at ways my company
> can have non-python languages share an Apache Arrow schema and it would be
> great to build off of an existing schema like what Pandas uses rather than
> coming up with our own.
>
> I’m particularly interested in documentation of how multi-headers/nested
> headers are expressed in the PyArrow schema since Pandas only supports flat
> columns and how PyArrow’s schema indicates row grouping.
>
> Thanks,
>
> Natasha
