Thanks Joris. That did the trick.
-Dan Nugent

On Apr 30, 2020, 10:01 -0400, Wes McKinney <[email protected]>, wrote:
> For the record, as I've stated elsewhere I'm fairly sure, I don't
> agree with toggling memory mapping at the filesystem level. If a
> filesystem supports memory mapping, then a consumer of the filesystem
> should IMHO be able to request a memory map.
>
> On Thu, Apr 30, 2020 at 2:27 AM Joris Van den Bossche
> <[email protected]> wrote:
> >
> > Hi Dan,
> >
> > Currently, the memory mapping in the Datasets API is controlled by the
> > filesystem. So to enable memory mapping for feather, you can do:
> >
> > import pyarrow.dataset as ds
> > from pyarrow.fs import LocalFileSystem
> >
> > fs = LocalFileSystem(use_mmap=True)
> > t = ds.dataset('demo', format='feather', filesystem=fs).to_table()
> >
> > Can you check whether that works for you?
> > We should document this better (and there is actually also some discussion
> > about the best API for this, see
> > https://issues.apache.org/jira/browse/ARROW-8156,
> > https://issues.apache.org/jira/browse/ARROW-8307)
> >
> > Joris
> >
> > On Thu, 30 Apr 2020 at 01:58, Daniel Nugent <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > I'm trying to use the 0.17 dataset API to map in an arrow table in the
> > > uncompressed feather format (ultimately hoping to work with data larger
> > > than memory). It seems like it reads all the constituent files into
> > > memory before creating the Arrow table object though.
> > >
> > > When I use the FeatherDataset API, it does appear to map the files,
> > > and the Table is created based on the mapped data.
> > >
> > > Any hints at what I'm doing wrong? I didn't see any options relating to
> > > memory mapping for the general datasets.
> > >
> > > Here's the code for the plain dataset API call:
> > >
> > > from pyarrow.dataset import dataset as ds
> > > t = ds('demo', format='feather').to_table()
> > >
> > > Here's the code for reading using the FeatherDataset API:
> > >
> > > from pyarrow.feather import FeatherDataset as ds
> > > from pathlib import Path
> > > t = ds(list(Path('demo').iterdir())).read_table()
> > >
> > > Thanks!
> > >
> > > -Dan Nugent
