Great, I just hadn't noticed until now - thanks!

On Thu, Jan 13, 2022 at 4:09 PM Micah Kornfield <[email protected]> wrote:
> Hi Chris,
>
>> Upgrading to 6.0.X I noticed that record batches can have body
>> compression which I think is great.
>
> Small nit: this was released in Arrow 4.
>
>> I had trouble finding examples in python or R (or java) of writing an IPC
>> file with various types of compression used for the record batch.
>
> Java code is at [1] with implementations for compression codec living in
> [2].
>
>> Is the compression applied per-column or upon the record batch after the
>> buffers have been serialized to the batch? If it is applied per column
>> which buffers - given that text for example can consist of 3 buffers
>> (validity, offset, data) is compression applied to all three or just data
>> or data and offset?
>
> It is applied per buffer; all buffers are compressed.
>
> Cheers,
> Micah
>
> [1]
> https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/java/vector/src/main/java/org/apache/arrow/vector/VectorUnloader.java#L100
> [2]
> https://github.com/apache/arrow/tree/971a9d352e456882aa5b70ac722607840cdb9df7/java/compression/src
>
> On Thu, Jan 13, 2022 at 2:55 PM Chris Nuernberger <[email protected]>
> wrote:
>
>> Upgrading to 6.0.X I noticed that record batches can have body
>> compression which I think is great.
>>
>> I had trouble finding examples in python or R (or java) of writing an IPC
>> file with various types of compression used for the record batch.
>>
>> Is the compression applied per-column or upon the record batch after the
>> buffers have been serialized to the batch? If it is applied per column
>> which buffers - given that text for example can consist of 3 buffers
>> (validity, offset, data) is compression applied to all three or just data
>> or data and offset?
