Re: [Python] pyarrow.read_feather use_threads option not respected?

Burke Kaltenberger Mon, 12 Jul 2021 13:11:44 -0700

Please take me off the mailing list

On Mon, Jul 12, 2021 at 1:08 PM Wes McKinney <[email protected]> wrote:


> hi Arun — the `use_threads` argument here only toggles whether
> multiple threads are used in the conversion from the Arrow/Feather
> representation to pandas. Since you elected to use compression,
> multiple threads are used when decompressing the data, and this can
> only be changed by setting the number of threads globally in the
> pyarrow library [1]
>
> This seems a bit misleading to me, so it would be good to open a Jira
> issue to clarify in the documentation what "use_threads" does
>
> [1]:
> http://arrow.apache.org/docs/python/generated/pyarrow.set_cpu_count.html#pyarrow.set_cpu_count
>
> On Mon, Jul 12, 2021 at 3:00 PM Arun Joseph <[email protected]> wrote:
> >
> > I'm running the following:
> >
> > Python 3.7.4 (default, Aug 13 2019, 20:35:49)
> > [GCC 7.3.0] :: Anaconda, Inc. on linux
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import pyarrow
> > >>> pyarrow.__version__
> > '4.0.1'
> >
> > from pyarrow import feather
> >
> > feather.write_feather(df, dest=file_path, compression='zstd', compression_level=19)
> > file_path=f'{valid_file_path}'
> > feather.read_feather(file_path, use_threads=False)
> >
> > It seems like the use_threads argument does not alter the number of
> > threads launched. I've tested with both use_threads=True and
> > use_threads=False. Am I misunderstanding what use_threads actually means?
> > It seems like it launches ~12 threads.
> >
> > Could this be related to the compression strategy of the file itself?
> >
> > Thank You,
> > Arun Joseph
> >
>


-- 
*First Talent Search & Placement*
*Burke Kaltenberger
<https://www.linkedin.com/in/burke-kaltenberger-3a41731/> | Founder*
*408.458.0071*
