Hi Team,

Question 1:
I would like to know whether pyarrow supports writing Parquet files with
run-length encoding. There is a mention of this in the Python docs under
the compression section:

'can be compressed after the encoding passes (dictionary, RLE encoding)'
https://arrow.apache.org/docs/python/parquet.html#compression-encoding-and-file-compatibility

However, I am not seeing the option in the API reference:
https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html#pyarrow.parquet.write_table

I do note that it is covered in the C++ documentation; is there any way to
access this from Python?
https://arrow.apache.org/docs/cpp/parquet.html

Question 2:
In addition to the above, I am interested to know whether there are any
methods for applying this type of encoding to data in transit over a
network. Our actual use case involves a large amount of data and would
greatly benefit from run-length encoding due to the repetition (sensor
values do not change very often). We are trying to send this data from a
warehouse (the warehouse has not been selected yet) to an application back
end, which ultimately sends it on to an application front end for
visualisation.
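To make the second question concrete, this is the kind of transformation I mean, sketched in plain Python (the encode/decode helpers and the sample readings are made up for illustration; a real wire format would of course be binary rather than Python lists):

```python
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(run))) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [v for v, n in pairs for _ in range(n)]

# Hypothetical sensor stream: values change rarely.
readings = [20.0] * 1000 + [20.5] * 800 + [20.0] * 1200

encoded = rle_encode(readings)
print(len(readings), "->", len(encoded), "pairs")  # 3000 -> 3 pairs
assert rle_decode(encoded) == readings
```

With data shaped like this, 3000 readings collapse to 3 pairs, which is the scale of saving we are hoping to get on the wire between the warehouse and the back end.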

Kind regards
Nikhil Makan
