Great, I just hadn't noticed until now - thanks!

On Thu, Jan 13, 2022 at 4:09 PM Micah Kornfield <[email protected]> wrote:
> Hi Chris,
>
>> Upgrading to 6.0.X I noticed that record batches can have body
>> compression which I think is great.
>
> Small nit: this was released in Arrow 4.
>
>> I had trouble finding examples in python or R (or java) of writing an IPC
>> file with various types of compression used for the record batch.
>
> Java code is at [1] with implementations for compression codec living in
> [2].
>
>> Is the compression applied per-column or upon the record batch after the
>> buffers have been serialized to the batch? If it is applied per column
>> which buffers - given that text for example can consist of 3 buffers
>> (validity, offset, data) is compression applied to all three or just data
>> or data and offset?
>
> It is applied per buffer; all buffers are compressed.
>
> Cheers,
> Micah
>
> [1]
> https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/java/vector/src/main/java/org/apache/arrow/vector/VectorUnloader.java#L100
> [2]
> https://github.com/apache/arrow/tree/971a9d352e456882aa5b70ac722607840cdb9df7/java/compression/src
>
> On Thu, Jan 13, 2022 at 2:55 PM Chris Nuernberger <[email protected]>
> wrote:
>
>> Upgrading to 6.0.X I noticed that record batches can have body
>> compression which I think is great.
>>
>> I had trouble finding examples in python or R (or java) of writing an IPC
>> file with various types of compression used for the record batch.
>>
>> Is the compression applied per-column or upon the record batch after the
>> buffers have been serialized to the batch? If it is applied per column
>> which buffers - given that text for example can consist of 3 buffers
>> (validity, offset, data) is compression applied to all three or just data
>> or data and offset?
