There has been a lot of interesting discussion on this thread, but to answer your original question about the dataframe index being dropped: I don't know the historical reason for this behavior, but an easy workaround (without relying on private modules) is to convert the dataframe to a table first, making sure the index is preserved in that step, and then pass that table to write_feather, which accepts a pyarrow.Table as well as a pandas.DataFrame:

import pyarrow as pa
from pyarrow import feather

table = pa.Table.from_pandas(df, preserve_index=True)
feather.write_feather(table, dest, ...)

On Tue, 13 Jul 2021 at 19:06, Arun Joseph <[email protected]> wrote:

> Hi,
>
> I've noticed that if I pass a pandas dataframe to write_feather
> <https://github.com/apache/arrow/blob/release-4.0.1/python/pyarrow/feather.py#L152>
> (hyperlink to relevant part of code), it will automatically drop the index.
> Was this behavior intentionally chosen to only drop the index and not to
> allow the user to specify? I assumed the behavior would match the default
> behavior of converting from a pandas dataframe to an arrow table as
> mentioned in the docs
> <https://arrow.apache.org/docs/python/pandas.html#handling-pandas-indexes>
> .
>
> Is the best way around this to do the following?
>
> ```python3
> import pyarrow.lib as ext
> from pyarrow.lib import Table
>
> table = Table.from_pandas(df)
> ext.write_feather(table, dest,
>                   compression=compression,
>                   compression_level=compression_level,
>                   chunksize=chunksize, version=version)
> ```
> Thank You,
> --
> Arun Joseph
