You could maybe use pyarrow datasets on top of fsspec's zip file system [1]?

[1] https://filesystem-spec.readthedocs.io/en/latest/_modules/fsspec/implementations/zip.html
On Tuesday, July 19, 2022, Kirby, Adam <[email protected]> wrote:

> Hi All,
>
> I'm currently using pyarrow.csv.read_csv to parse a CSV stream that
> originates from a ZIP of multiple CSV files. For now, I'm using a separate
> implementation to do the streaming ZIP decompression, then
> using pyarrow.csv.read_csv at each CSV file boundary.
>
> I would love if there were a way to leverage pyarrow to handle the
> decompression. From what I've seen in examples, a ZIP file containing a
> single CSV is supported -- that is, it's possible to operate on a
> compressed CSV stream -- but I wonder if it's possible to handle a
> compressed stream that contains multiple files?
>
> Thank you in advance!
>
