Hi Team,

Question 1:
I would like to know whether pyarrow supports writing Parquet files with
run-length encoding. There is a mention of this in the Python docs under
the compression section:

'can be compressed after the encoding passes (dictionary, RLE encoding)'
https://arrow.apache.org/docs/python/parquet.html#compression-encoding-and-file-compatibility

However, I am not seeing the option in the API reference:
https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table

I do note that it is covered in the C++ documentation; is there any way to
access this from Python?
https://arrow.apache.org/docs/cpp/parquet.html

Question 2:
In addition to the above, I am interested to know whether there are any
methods for applying this type of encoding to data in transit over a
network. Our actual use case involves a large amount of data and would
greatly benefit from run-length encoding due to the repetition (sensor
values do not change very often). We are trying to send this data from a
warehouse (the warehouse has not been selected yet) to an application back
end, which ultimately sends it on to an application front end for
visualisation.
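To make the second question concrete, this is the kind of transformation I mean, sketched in plain Python (the encode/decode helpers and the sample readings are made up for illustration; a real wire format would of course be binary rather than Python lists):

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in pairs for _ in range(n)]

# Hypothetical sensor stream: values change rarely.
readings = [20.0] * 1000 + [20.5] * 800 + [20.0] * 1200

encoded = rle_encode(readings)
print(len(readings), "->", len(encoded), "pairs")  # 3000 -> 3 pairs
assert rle_decode(encoded) == readings
```

With data shaped like this, 3000 readings collapse to 3 pairs, which is the scale of saving we are hoping to get on the wire between the warehouse and the back end.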

Kind regards
Nikhil Makan
