'Plain' Dataset Python API doesn't memory map?

Daniel Nugent Wed, 29 Apr 2020 16:59:02 -0700

Hi,

I'm trying to use the 0.17 dataset API to memory map an Arrow table stored
in the uncompressed Feather format (ultimately hoping to work with data
larger than memory). However, it seems to read all the constituent files
into memory before creating the Arrow table object.


When I use the FeatherDataset API, it does appear to memory map the files,
and the Table is created from the mapped data.

Any hints at what I'm doing wrong? I didn't see any options relating to
memory mapping for the general dataset API.

Here's the code for the plain dataset API call:

    from pyarrow.dataset import dataset as ds
    t = ds('demo', format='feather').read_table()

Here's the code for reading using the FeatherDataset API:

    from pyarrow.feather import FeatherDataset as ds
    from pathlib import Path
    t = ds(list(Path('demo').iterdir())).read_table()

Thanks!

-Dan Nugent
