Hi,

I'm looking at using Arrow primarily on low-resource instances with out-of-memory datasets. This is the workflow I'm trying to implement:
* Write record batches in IPC streaming format to a file from a C runtime.
* Consume it one row at a time from Python/C by loading the file in chunks.
* If the schema is simple enough to support zero-copy operations, make the table readable from pandas. This requires me to:
  * convert it into a Table with a single chunk per column (since pandas can't use mmap with chunked arrays), and
  * write the table in IPC random access format to a file.

PyArrow provides a method `combine_chunks` to collapse the chunks of each column into a single chunk. However, it has to materialize the entire table in memory (I suspect at 2x, since it holds both versions of the table at once, though that could be avoided). Since the Arrow layout is columnar, I'm curious whether it is possible to write the table out one column at a time, and whether the existing GLib/Python APIs support that. The C++ file writer objects seem to go down only to serializing a single record batch at a time, not a single column.

Thank you,
Ishan
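
P.S. For concreteness, here is a rough sketch of the `combine_chunks` path described above. The file names are placeholders, and the final `to_pandas` call assumes primitive, null-free columns so the zero-copy request can succeed:

```python
import pyarrow as pa
import pyarrow.ipc as ipc

# Read the stream written by the C runtime and materialize it as a Table.
with pa.OSFile("batches.arrows", "rb") as source:  # placeholder path
    table = ipc.open_stream(source).read_all()

# Collapse each column to a single chunk, as pandas needs for mmap use.
# This is the step where the whole table has to sit in memory.
table = table.combine_chunks()

# Write the single-chunk table in the IPC random access (file) format.
with pa.OSFile("table.arrow", "wb") as sink:  # placeholder path
    with ipc.new_file(sink, table.schema) as writer:
        writer.write_table(table)

# Later: memory-map the file and hand it to pandas without copying buffers.
mapped = ipc.open_file(pa.memory_map("table.arrow")).read_all()
df = mapped.to_pandas(split_blocks=True, zero_copy_only=True)
```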