Owen,
Yes, you are correct. I misunderstood RLEv2, which does not use LEB128.
To answer your questions:
1. "RLEv1 + fixed 8 byte" in my experiment means that we skip LEB128
encoding for RLE literals and write each literal directly as a fixed 8-byte
little-endian value.
2. The data is from our production data which i
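A minimal sketch of the "fixed 8 bytes in little endian" literal layout from item 1, assuming the literals are unsigned 64-bit values (for example, after zigzag encoding). The function names are hypothetical and are not part of ORC; this only illustrates the byte layout, not the surrounding RLEv1 run/literal headers.

```python
import struct


def write_fixed8_le(values):
    """Serialize each unsigned 64-bit literal as exactly 8 little-endian bytes."""
    # "<Q" = little-endian, unsigned 64-bit; every value costs 8 bytes.
    return b"".join(struct.pack("<Q", v) for v in values)


def read_fixed8_le(buf):
    """Inverse of write_fixed8_le: split the buffer into 8-byte words."""
    if len(buf) % 8:
        raise ValueError("buffer length must be a multiple of 8")
    return [v for (v,) in struct.iter_unpack("<Q", buf)]


encoded = write_fixed8_le([1, 300, 2**63])
print(len(encoded), read_fixed8_le(encoded))
```

Unlike a varint, every value occupies the same width, so values stay aligned at 8-byte boundaries and small values leave runs of zero high bytes for the general-purpose compressor to handle.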
Thanks for the sample data.
Just out of curiosity, is the natural data actually sorted like that?
I think you have a misunderstanding of RLEv2. It doesn't use LEB128 except
for the values in the header. What does RLEv1 + fixed 8 byte mean?
Based on the 512 values that you posted, I see:
512 val
I think the bigger issue here is the combination of zstd and LEB128, which
results in a much lower compression ratio than zlib. This is by design for
zstd level 1, and according to the answer from the zstd community (see the link
from Gang), it only gets better at a much higher level (say, 12).
I
Owen
I have posted example data that reproduces the issue at
https://github.com/facebook/zstd/issues/1325. It contains 512 unsigned
numbers that are already zigzag-encoded using (val << 1) ^ (val >> 63). The
low-overhead representation of literals is exactly what we need for RLEv3.
We should also pa
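The zigzag mapping (val << 1) ^ (val >> 63) can be sketched as follows, assuming signed 64-bit inputs. Because Python integers are unbounded, the result is masked to 64 bits; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def zigzag_encode(val: int) -> int:
    """Map a signed 64-bit integer to unsigned: (val << 1) ^ (val >> 63)."""
    # For negative val, val >> 63 is -1 (all ones), like an arithmetic shift;
    # for non-negative val it is 0. The mask keeps the result within 64 bits.
    return ((val << 1) ^ (val >> 63)) & MASK64


def zigzag_decode(u: int) -> int:
    """Inverse mapping: (u >> 1) ^ -(u & 1)."""
    return (u >> 1) ^ -(u & 1)


print([zigzag_encode(v) for v in [0, -1, 1, -2, 2]])
```

Small magnitudes of either sign map to small unsigned numbers (0, -1, 1, -2, 2 become 0, 1, 2, 3, 4), which is what lets a subsequent varint or bit-packing step use few bytes for them.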
Gang,
As you correctly point out, some columns don't work well with RLE.
Unfortunately, without being able to look at the data, it is hard for me to
guess what the right compression strategies are. Based on your description,
I would guess that the data doesn't have a lot of patterns to it and cov
Hi,
> From above observation, we find that it is better to disable LEB128 encoding
> while zstd is used.
You can enable file-size optimizations (which automatically choose layouts
that compress better) by setting
"orc.encoding.strategy"="COMPRESSION"
There are a bunch of bitpacking loops that are co
Hi,
We are using zstd as the default compressor for ORC in production. Overall,
the performance is very good. Through our analysis, we found there is still
room for improvement for integers.
As we know, all integers use base-128 varint encoding (a.k.a. LEB128) after
RLE. This works well for zlib and other com
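For reference, a minimal sketch of unsigned base-128 varint (LEB128) encoding as discussed in this thread. This is a generic illustration, not ORC's writer code; the function names are hypothetical.

```python
def leb128_encode(value: int) -> bytes:
    """Unsigned LEB128: 7 payload bits per byte, lowest group first.

    The high bit (0x80) of a byte is set when more bytes follow.
    """
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)  # last byte has the high bit clear
            return bytes(out)


def leb128_decode(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one value starting at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


print(leb128_encode(300).hex(), len(leb128_encode(2**64 - 1)))
```

Values below 128 take one byte, while a 64-bit value with its top bit set takes ten. The resulting variable-length byte stream no longer keeps values aligned, which is the layout this thread reports compressing worse under zstd level 1 than fixed-width literals.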