Micah, great idea, thank you! I really appreciate the pointer.

On Wed, Jul 20, 2022 at 12:04 AM Micah Kornfield <[email protected]>
wrote:

> You could maybe use datasets on top of fsspec's zip file system [1]?
>
> [1]
> https://filesystem-spec.readthedocs.io/en/latest/_modules/fsspec/implementations/zip.html
>
> On Tuesday, July 19, 2022, Kirby, Adam <[email protected]> wrote:
>
>> Hi All,
>>
>> I'm currently using pyarrow.csv.read_csv to parse a CSV stream that
>> originates from a ZIP of multiple CSV files. For now, I'm using a separate
>> implementation to do the streaming ZIP decompression, then
>> using pyarrow.csv.read_csv at each CSV file boundary.
>>
>> I would love it if there were a way to leverage pyarrow to handle the
>> decompression. From what I've seen in examples, a ZIP file containing a
>> single CSV is supported -- that is, it's possible to operate on a
>> compressed CSV stream -- but I wonder if it's possible to handle a
>> compressed stream that contains multiple files?
>>
>> Thank you in advance!
>>
>
