hi Arun — the `use_threads` argument here only toggles whether multiple threads are used in the conversion from the Arrow/Feather representation to pandas. Since you wrote the file with compression, multiple threads are also used when decompressing the data, and that can only be changed by setting the number of threads globally in the pyarrow library [1].
This seems a bit misleading to me, so it would be good to open a Jira issue to clarify in the documentation what "use_threads" does.

[1]: http://arrow.apache.org/docs/python/generated/pyarrow.set_cpu_count.html#pyarrow.set_cpu_count

On Mon, Jul 12, 2021 at 3:00 PM Arun Joseph <[email protected]> wrote:
>
> I'm running the following:
>
> Python 3.7.4 (default, Aug 13 2019, 20:35:49)
> [GCC 7.3.0] :: Anaconda, Inc. on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow
> >>> pyarrow.__version__
> '4.0.1'
>
> from pyarrow import feather
>
> file_path = f'{valid_file_path}'
> feather.write_feather(df, dest=file_path, compression='zstd',
>                       compression_level=19)
> feather.read_feather(file_path, use_threads=False)
>
> It seems like the use_threads argument does not alter the number of
> threads launched. I've tested with both use_threads=True and
> use_threads=False. Am I misunderstanding what use_threads actually
> means? It seems like it launches ~12 threads.
>
> Could this be related to the compression strategy of the file itself?
>
> Thank You,
> Arun Joseph
>
