Hi all,

I'm trying to reply to the following thread, though I'm not sure this message will appear in the right place:
https://lists.apache.org/thread/93kg641xk52lm5m11vwodbyc1hzvbnf3

I've implemented a workaround for a similar use case. I thought I'd share it, either so that someone can recommend a better solution using the existing API, or to start a discussion about additions to the API that could make this easier.

In my use case the limitation is the memory available when reading a record batch. I'd like to keep the in-memory size of each record batch within a maximum number of bytes. Note that I'm not concerned about the size on disk (which will be smaller due to LZ4 compression). So when appending values, I'd like to be able to specify a maximum size, say 500 MB, and write the record batch to disk once that size is exceeded.

The data types I need to support are float64, int64, bool, listof(float64), listof(int64), listof(bool), and strings.

In my use case, I'm writing to a builder in a row-wise fashion. My current approach is to increment a variable each time I write a cell, so that it tracks the approximate memory used in bytes. Luckily, for the types I need to support, this is fairly simple to track approximately, e.g. a float64 is "+8" and a list of float64 is "len(floats)*8+8".

Is there a better way to do this using the existing API? Would it make sense for this to be supported natively by the API?

I'm using the Go implementation, but I guess this applies equally to the C++ implementation, and maybe to others too.

Thanks for taking the time to read this.

Cheers,
Greg
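
P.S. To make the bookkeeping above concrete, here is a rough sketch in plain Go. It is a simplified illustration rather than the real code, and it does not touch the Arrow API at all. The type and method names (sizeTracker, addFloat64, shouldFlush, and so on) are hypothetical. Only the float64 and list-of-float64 costs come from the description above; the costs for int64, bool, and strings are assumptions made in the same spirit.

```go
package main

// Approximate per-value costs in bytes.
const (
	fixed64Bytes = 8 // float64 (from the post); int64 assumed to cost the same
	boolBytes    = 1 // assumption: one byte per bool; Arrow bit-packs bools, so this overestimates
	perListBytes = 8 // the "+8" per-list overhead from the post; reused for strings as an assumption
)

// sizeTracker (hypothetical) keeps a running, approximate count of the
// bytes appended to the current record batch.
type sizeTracker struct {
	used  int64 // approximate bytes appended since the last reset
	limit int64 // threshold in bytes, e.g. 500 * 1024 * 1024
}

func (t *sizeTracker) addFloat64() { t.used += fixed64Bytes }
func (t *sizeTracker) addInt64()   { t.used += fixed64Bytes }
func (t *sizeTracker) addBool()    { t.used += boolBytes }

// addFloat64List accounts for a list of n float64 values: n*8 + 8.
func (t *sizeTracker) addFloat64List(n int) { t.used += int64(n)*fixed64Bytes + perListBytes }

// addInt64List accounts for a list of n int64 values: n*8 + 8.
func (t *sizeTracker) addInt64List(n int) { t.used += int64(n)*fixed64Bytes + perListBytes }

// addBoolList accounts for a list of n bools: n*1 + 8.
func (t *sizeTracker) addBoolList(n int) { t.used += int64(n)*boolBytes + perListBytes }

// addString accounts for the string's bytes plus the assumed fixed overhead.
func (t *sizeTracker) addString(s string) { t.used += int64(len(s)) + perListBytes }

// shouldFlush reports whether the running total has exceeded the limit,
// i.e. whether the current record batch should be written to disk.
func (t *sizeTracker) shouldFlush() bool { return t.used > t.limit }

// reset clears the running total after a record batch has been written.
func (t *sizeTracker) reset() { t.used = 0 }
```

The idea is to call the matching add method alongside each append to the builder, check shouldFlush() at the end of each row, and, when it returns true, write the record batch and call reset() before continuing.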
