ARROW-3410 maybe?
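That issue tracks a streaming CSV reader, which is exactly this use case. For what it's worth, here is a rough sketch of chunk-at-a-time CSV-to-Parquet conversion using the pyarrow.csv.open_csv streaming API that later pyarrow releases ship (file names are placeholders; if your version predates open_csv, see the workaround at the bottom of the thread):

    import pyarrow as pa
    import pyarrow.csv as csv
    import pyarrow.parquet as pq

    # open_csv returns a streaming reader that yields RecordBatches,
    # decoding roughly block_size bytes of the file at a time instead
    # of materializing the whole Table in memory.
    read_options = csv.ReadOptions(block_size=1 << 20)  # ~1 MB per chunk
    reader = csv.open_csv("data.csv", read_options=read_options)

    with pq.ParquetWriter("data.parquet", reader.schema) as writer:
        for batch in reader:  # each batch is a pyarrow.RecordBatch
            writer.write_table(pa.Table.from_batches([batch]))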
On Feb 17, 2020, 07:47 -0500, Wes McKinney <wesmck...@gmail.com>, wrote:
> I seem to recall discussions about reading CSV files one chunk at a
> time. Such an API is not yet available in Python. It is also
> required for the C++ Datasets API. If there are not already one or
> more JIRA issues about this, I suggest we open some to capture the
> use cases.
>
> On Fri, Feb 14, 2020 at 3:16 PM filippo medri <filippo.me...@gmail.com> wrote:
> >
> > Hi,
> > while experimenting with Arrow's read_csv function to convert a CSV file
> > into Parquet, I found that it reads all of the data into memory.
> > The ReadOptions class does allow specifying a block_size parameter to
> > limit how many bytes are processed at a time, but judging by the memory
> > usage, my understanding is that the resulting Table is still filled with
> > all the data. Is there at least a way to specify a parameter that limits
> > the read to a batch of rows? I see that I can skip rows at the beginning,
> > but I cannot find a way to limit how many rows are read.
> > What is the intended way to read a CSV file that does not fit into memory?
> > Thanks in advance,
> > Filippo Medri
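For pyarrow versions that predate a streaming CSV reader, one interim workaround is to chunk the CSV on the pandas side and push each chunk through a single ParquetWriter, so that peak memory tracks the chunk size rather than the file size. A minimal sketch (file names and chunk size are placeholders; the Parquet schema is fixed from the first chunk, so columns whose inferred dtype varies across chunks may need explicit dtypes in read_csv):

    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    writer = None
    # chunksize bounds how many rows pandas materializes at once.
    for chunk in pd.read_csv("data.csv", chunksize=100_000):
        table = pa.Table.from_pandas(chunk, preserve_index=False)
        if writer is None:
            # Fix the Parquet schema from the first chunk.
            writer = pq.ParquetWriter("data.parquet", table.schema)
        writer.write_table(table)
    if writer is not None:
        writer.close()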
