Yes, that looks right. There will need to be corresponding work in Python to make this available (probably through the datasets API).

On Mon, Feb 17, 2020 at 12:35 PM Daniel Nugent <[email protected]> wrote:
>
> ARROW-3410 maybe?
> On Feb 17, 2020, 07:47 -0500, Wes McKinney <[email protected]>, wrote:
> > I seem to recall discussions about 1 chunk-at-a-time reading of CSV
> > files. Such an API is not yet available in Python. This is also
> > required for the C++ Datasets API. If there are not one or more JIRA
> > issues about this I suggest that we open some to capture the use cases
> >
> > On Fri, Feb 14, 2020 at 3:16 PM filippo medri <[email protected]> wrote:
> > >
> > > Hi,
> > > by experimenting with arrow read_csv function to convert csv file into parquet
> > > I found that it reads the data in memory.
> > > On a side the ReadOptions class allows to specify a blocksize parameter to
> > > limit how many bytes to process at a time, but by looking at the memory usage
> > > my understanding is that the underlying Table is filled with all data.
> > > Is there a way to at least specify a parameter to limit the read to a batch
> > > of rows? I see that I can skip rows from the beginning, but I am not finding
> > > a way to limit how many rows to read.
> > > Which is the intended way to read a csv file that does not fit into memory?
> > > Thanks in advance,
> > > Filippo Medri
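
For illustration only, here is a minimal sketch of the batch-of-rows pattern asked about in the original message. It deliberately uses only the Python standard library (csv and itertools), not any Arrow API, since the thread says a chunk-at-a-time reader was not yet available in Python at the time. The function name iter_row_batches, the batch_rows parameter, and the sample data are all hypothetical, and the in-memory StringIO merely stands in for a large file on disk.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def iter_row_batches(handle: TextIO, batch_rows: int) -> Iterator[List[dict]]:
    """Yield lists of at most `batch_rows` parsed rows from an open CSV file.

    Hypothetical helper: only one batch is held in memory at a time.
    """
    if batch_rows <= 0:
        raise ValueError("batch_rows must be positive")
    reader = csv.DictReader(handle)  # the first line is taken as the header
    while True:
        # islice pulls at most batch_rows rows without reading further ahead.
        batch = list(islice(reader, batch_rows))
        if not batch:
            return
        yield batch


# Small in-memory stand-in for a large file on disk (assumed sample data).
sample = io.StringIO("id,value\n1,a\n2,b\n3,c\n4,d\n5,e\n")
batch_sizes = [len(batch) for batch in iter_row_batches(sample, batch_rows=2)]
```

In a CSV-to-Parquet conversion, each batch would be converted and appended to the output before the next one is read, so peak memory would be bounded by the batch size rather than the file size. That writing step is left out here because it depends on the Arrow APIs discussed above.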
