Hi Chris,

> Upgrading to 6.0.X I noticed that record batches can have body compression
> which I think is great.


Small nit: this was released in Arrow 4.

> I had trouble finding examples in python or R (or java) of writing an IPC
> file with various types of compression used for the record batch.


The Java code is at [1], with the compression codec implementations living in [2].
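
Since the thread also asked about python: here is a minimal pyarrow
sketch of writing a compressed IPC file (the file name and sample table
are placeholders; this assumes a pyarrow build with the zstd codec
enabled, which the release wheels have):

    import pyarrow as pa

    table = pa.table({"label": ["a", "b", None], "value": [1, 2, 3]})

    # Choose the buffer-level codec for the IPC body: "lz4" (the LZ4
    # frame format) or "zstd".
    options = pa.ipc.IpcWriteOptions(compression="zstd")

    with pa.OSFile("compressed.arrow", "wb") as sink:
        with pa.ipc.new_file(sink, table.schema, options=options) as writer:
            writer.write_table(table)

    # Readers decompress transparently; no options are needed to read.
    roundtrip = pa.ipc.open_file("compressed.arrow").read_all()
    assert roundtrip.equals(table)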

> Is the compression applied per-column or upon the record batch after the
> buffers have been serialized to the batch?  If it is applied per column
> which buffers - given that text for example can consist of 3 buffers
> (validity, offset, data) is compression applied to all three or just data
> or data and offset?

It is applied per buffer; all of the buffers (validity, offsets, and data alike) are compressed.
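
To make the string example concrete, a utf8 column can indeed carry
three buffers, and each of them gets compressed independently when body
compression is on. A quick pyarrow sketch of the buffer layout (the
sample values are placeholders):

    import pyarrow as pa

    arr = pa.array(["foo", None, "barbaz"])

    # utf8 arrays carry a validity bitmap, int32 offsets, and character
    # data; body compression is applied to each buffer separately.
    for name, buf in zip(["validity", "offsets", "data"], arr.buffers()):
        print(name, "-" if buf is None else buf.size)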

Cheers,
Micah


[1]
https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/java/vector/src/main/java/org/apache/arrow/vector/VectorUnloader.java#L100
[2]
https://github.com/apache/arrow/tree/971a9d352e456882aa5b70ac722607840cdb9df7/java/compression/src
