Dear all, I have parquet files with 300,000 columns and 30,000 rows. Loading such a file into a pandas DataFrame (with pyarrow) takes around 100 GB of RAM.
Since I perform pairwise comparisons between columns, I could load the data N columns at a time. So is it possible to load only a few columns from a parquet file, selected by name? That would save a lot of memory. Thanks

--
Jonathan MERCIER
Researcher, computational biology, PhD
Bioinformatics (LBI)
2, rue Gaston Crémieux
91057 Evry Cedex
Tel: (+33)1 60 87 83 44
Email: [email protected]
