Sure - you can do it even in pandas via the columns
parameter: pd.read_parquet('f.pq', columns=['A', 'B'])
Arrow is more useful if you need to do some conversion or filtering.
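If you want to stay in Arrow until the very end, a minimal sketch with
pyarrow.parquet (the file name 'f.pq' and column names 'A', 'B' are just
placeholders):

    import pyarrow.parquet as pq

    # List column names without reading any data; handy for iterating
    # over column pairs.
    names = pq.read_schema('f.pq').names

    # Read only the requested columns; the others are never loaded.
    table = pq.read_table('f.pq', columns=['A', 'B'])

    # Convert just those columns to a pandas DataFrame if needed.
    df = table.to_pandas()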
BR,
Jacek
On Fri, 12 Feb 2021 at 15:21, jonathan mercier <[email protected]>
wrote:
>
> Dear,
> I have a Parquet file with 300 000 columns and 30 000 rows.
> If I load such a file into a pandas DataFrame (with pyarrow), it takes
> around 100 GB of RAM.
>
> As I perform a pairwise comparison between columns, I could load the
> data N columns at a time.
>
> So is it possible to load only a few columns from a Parquet file by
> their names? That would save some memory.
>
> Thanks
>
>
> --
> Researcher computational biology
> PhD, Jonathan MERCIER
>
> Bioinformatics (LBI)
> 2, rue Gaston Crémieux
> 91057 Evry Cedex
>
>
> Tel: (+33)1 60 87 83 44
> Email: [email protected]
>
>
>