hi Arun — the `use_threads` argument here only toggles whether multiple threads are used in the conversion from the Arrow/Feather representation to pandas. Since you wrote the file with compression, multiple threads are also used when decompressing the data, and that can only be changed by setting the number of threads globally in the pyarrow library [1].
This seems a bit misleading to me, so it would be good to open a Jira issue to clarify in the documentation what "use_threads" does.

[1]: http://arrow.apache.org/docs/python/generated/pyarrow.set_cpu_count.html#pyarrow.set_cpu_count

On Mon, Jul 12, 2021 at 3:00 PM Arun Joseph <[email protected]> wrote:
>
> I'm running the following:
>
> Python 3.7.4 (default, Aug 13 2019, 20:35:49)
> [GCC 7.3.0] :: Anaconda, Inc. on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyarrow
> >>> pyarrow.__version__
> '4.0.1'
>
> from pyarrow import feather
>
> file_path = f'{valid_file_path}'
> feather.write_feather(df, dest=file_path, compression='zstd',
>                       compression_level=19)
> feather.read_feather(file_path, use_threads=False)
>
> It seems like the use_threads argument does not alter the number of
> threads launched. I've tested with both use_threads=True and
> use_threads=False. Am I misunderstanding what use_threads actually
> means? It seems like it launches ~12 threads.
>
> Could this be related to the compression strategy of the file itself?
>
> Thank You,
> Arun Joseph
>
