Hi Owen,

Thanks for the response. I saw that the DirectDecompressor will be used if it is available, so the difference is only on the compression side. Keeping in mind what you said, I looked at the code again. As far as I can see, the only ORC-specific setting is that Deflater is constructed with "nowrap" = true. As far as I understand from the description, that should correspond directly to CompressionHeader.NO_HEADER in ZlibCompressor. If so, ZlibCompressor with the right setup could be a replacement for Deflater. What do you think?
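To make the "nowrap" point concrete, here is a minimal sketch that uses only java.util.zip. It is not ORC's or Hadoop's code: the class name RawDeflateDemo and its helper methods are made up for illustration. It compresses the same bytes with and without nowrap, so the effect of dropping the zlib header and checksum can be seen directly.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class, not part of ORC or Hadoop.
final class RawDeflateDemo {

    // Compress input in one shot. nowrap = true asks for raw deflate data,
    // i.e. without the zlib header and trailing checksum.
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Decompress data whose original length is known. The nowrap flag must
    // match the one used for compression.
    static byte[] inflate(byte[] data, boolean nowrap, int originalLength) {
        Inflater inflater = new Inflater(nowrap);
        try {
            inflater.setInput(data);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < originalLength) {
                int n = inflater.inflate(out, off, originalLength - off);
                if (n == 0 && (inflater.finished() || inflater.needsInput()
                        || inflater.needsDictionary())) {
                    break; // no further progress is possible
                }
                off += n;
            }
            return Arrays.copyOf(out, off);
        } catch (DataFormatException e) {
            throw new IllegalStateException("not valid deflate data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            sb.append("ORC compresses each chunk independently. ");
        }
        byte[] input = sb.toString().getBytes(StandardCharsets.UTF_8);

        byte[] wrapped = deflate(input, false); // zlib format, header included
        byte[] raw = deflate(input, true);      // raw deflate, no header

        System.out.println("zlib-wrapped length: " + wrapped.length);
        System.out.println("raw deflate length:  " + raw.length);
        System.out.println("round trip ok: "
                + Arrays.equals(input, inflate(raw, true, input.length)));
    }
}
```

Each call compresses its input as one self-contained unit, which is the block-style usage Owen described. Whether ZlibCompressor with NO_HEADER produces byte-compatible output would still need to be checked against ORC's reader.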
Aleksei

Aleksei Statkevich | Engineering Manager

On Thu, Jun 23, 2016 at 2:35 PM, Owen O'Malley <[email protected]> wrote:

> On Fri, Jun 17, 2016 at 11:31 PM, Aleksei Statkevich <[email protected]> wrote:
>
>> Hello,
>>
>> I recently looked at ORC encoding and noticed
>> that hive.ql.io.orc.ZlibCodec uses java's java.util.zip.Deflater and not
>> Hadoop's native ZlibCompressor.
>>
>> Can someone please tell me what is the reason for it?
>
> It is more subtle than that. The first piece to notice is that if your
> Hadoop has the direct decompression
> (org.apache.hadoop.io.compress.zlib.ZlibDirectDecompressor), it will be
> used. The reason that the ZlibCompressor isn't used is because ORC needs a
> different API. In particular, ORC doesn't use stream compression, but
> rather block compression. That is done so that it can jump over compression
> blocks for predicate push down. (If you are skipping over a lot of values,
> ORC doesn't need to decompress the bytes.)
>
> .. Owen
>
>> Also, how does performance of Deflater (which also uses native
>> implementation) compare to Hadoop's native zlib implementation?
>>
>> Thanks,
>> Aleksei
