You could maybe use pyarrow datasets on top of fsspec's zip file system [1]?

[1] https://filesystem-spec.readthedocs.io/en/latest/_modules/fsspec/implementations/zip.html

On Tuesday, July 19, 2022, Kirby, Adam <[email protected]> wrote:

> Hi All,
>
> I'm currently using pyarrow.csv.read_csv to parse a CSV stream that
> originates from a ZIP of multiple CSV files. For now, I'm using a separate
> implementation to do the streaming ZIP decompression, then
> using pyarrow.csv.read_csv at each CSV file boundary.
>
> I would love if there were a way to leverage pyarrow to handle the
> decompression. From what I've seen in examples, a ZIP file containing a
> single CSV is supported -- that is, it's possible to operate on a
> compressed CSV stream -- but I wonder if it's possible to handle a
> compressed stream that contains multiple files?
>
> Thank you in advance!
>
