Re: Summary of RLE and other compression efforts?

2020-03-10 Thread Radev, Martin
Hey Evan, thank you for the interest. There has been some effort toward compressing floating-point data on the Parquet side, namely the BYTE_STREAM_SPLIT encoding. On its own it does not compress floating-point data, but it makes the data more compressible when a compressor, such as ZSTD, LZ4, etc., is
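For readers unfamiliar with BYTE_STREAM_SPLIT, the idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the Parquet or Arrow implementation: the function names are hypothetical, values are packed as little-endian IEEE floats with the standard library, and zlib stands in for ZSTD or LZ4. Byte k of every value goes to stream k, and the streams are concatenated, so the output has the same length as the input.

```python
import struct
import zlib


def byte_stream_split_encode(values, fmt="f"):
    """Scatter byte k of every value into stream k, then concatenate the streams.

    fmt is a struct format character: "f" for float32, "d" for float64.
    (Hypothetical helper for illustration only.)
    """
    width = struct.calcsize("<" + fmt)
    raw = struct.pack("<%d%s" % (len(values), fmt), *values)
    # raw[k::width] selects byte k of each packed value.
    return b"".join(raw[k::width] for k in range(width))


def byte_stream_split_decode(data, fmt="f"):
    """Inverse of byte_stream_split_encode: interleave the streams back into values."""
    width = struct.calcsize("<" + fmt)
    count = len(data) // width
    raw = bytearray(len(data))
    for k in range(width):
        # Stream k occupies the k-th block of `count` bytes in the encoded data.
        raw[k::width] = data[k * count:(k + 1) * count]
    return list(struct.unpack("<%d%s" % (count, fmt), bytes(raw)))


def compressed_sizes(values, fmt="f"):
    """Return (size without split, size with split) after zlib compression.

    zlib is an assumption standing in for the general-purpose compressor.
    """
    raw = struct.pack("<%d%s" % (len(values), fmt), *values)
    split = byte_stream_split_encode(values, fmt)
    return len(zlib.compress(raw)), len(zlib.compress(split))
```

The encoding itself saves nothing; any gain comes from the compressor applied afterwards, and calling `compressed_sizes` on real column data is one way to check whether the split helps for a given dataset.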

Request for review: [C++] Expose codec compression level to user https://github.com/apache/arrow/pull/5071

2019-08-20 Thread Radev, Martin
Dear all, since this patch modifies the API and touches a lot of files to propagate the information through the stack, it would be great to receive some more constructive reviews on what makes sense and what doesn't. Patch: https://github.com/apache/arrow/pull/5071 [C++] Expose codec compr

Re: Adding a new encoding for FP data

2019-07-11 Thread Radev, Martin
> I hope this feature can be implemented in Arrow soon, so that we can use it
> in our system.
>
> Best,
> Liya Fan
>
> On Thu, Jul 11, 2019 at 5:55 PM Radev, Martin wrote:
>
> > Hello Liya Fan,

Re: Adding a new encoding for FP data

2019-07-11 Thread Radev, Martin
On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin wrote:
> Hello people,
>
> there has been discussion in the Apache Parquet mailing list on adding a new encoder for FP data.
> The reason for this is that the supported compressors by Apache Parquet (zstd, gzip, etc) do not

Adding a new encoding for FP data

2019-07-11 Thread Radev, Martin
Hello people, there has been discussion on the Apache Parquet mailing list about adding a new encoder for FP data. The reason for this is that the compressors supported by Apache Parquet (zstd, gzip, etc.) do not compress raw FP data well. In my investigation it turns out that a very simple simpl