TIMESTAMP Encoding for 1969-12-31 23:59:59.5 in ORC
Hi,

From the code encoding TIMESTAMP:

    final long secs = vec.time[i + offset] / MILLIS_PER_SECOND;
    final int newNanos = vec.nanos[i + offset];

For 1969-12-31 23:59:59.500, this gives:

    secs     = 0             (-500 / 1000, truncated toward zero)
    newNanos = 500_000_000   (500 ms)

So when the value is read back, it becomes 1970-01-01 00:00:00.500, right? For timestamps earlier than 1969-12-31 23:59:59, secs is negative, so the decoder subtracts 1 second from the result. However, for timestamps between 1969-12-31 23:59:59 and 1970-01-01 00:00:00, it looks like the data will be read back as a timestamp 1 second later?

Thank you!

(The encoding is https://github.com/apache/orc/blob/5b5c0d5bb14469ddf8e399b4ed879cc1cca9bec6/java/core/src/java/org/apache/orc/impl/writer/TimestampTreeWriter.java#L127-L128)

Best,
Wenlei
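[Editor's note: a minimal standalone sketch of the arithmetic the question describes; this is not the actual ORC writer/reader code, and the class and variable names are illustrative. It shows why Java's truncating `/` puts millisecond values in (-1000, 0) into second 0, while `Math.floorDiv` would keep them in second -1 so the stored non-negative nanos reconstruct the original instant.]

```java
public class TimestampEncodingDemo {
    static final long MILLIS_PER_SECOND = 1000;

    public static void main(String[] args) {
        // 1969-12-31 23:59:59.500 UTC is -500 ms relative to the epoch.
        long millis = -500;
        long nanos = 500_000_000; // the 500 ms fractional part, stored separately

        // Truncating division (rounds toward zero), as in the quoted snippet:
        long secsTruncated = millis / MILLIS_PER_SECOND;             // 0
        // Floor division (rounds toward negative infinity):
        long secsFloored = Math.floorDiv(millis, MILLIS_PER_SECOND); // -1

        // Reconstruct millis as secs * 1000 + (nanos expressed in ms):
        long readBackTruncated = secsTruncated * 1000 + nanos / 1_000_000;
        long readBackFloored = secsFloored * 1000 + nanos / 1_000_000;

        System.out.println(readBackTruncated); // 500  -> 1970-01-01 00:00:00.500
        System.out.println(readBackFloored);   // -500 -> 1969-12-31 23:59:59.500
    }
}
```

With truncation the value lands one second late, exactly as the question suggests; floor division keeps the (negative seconds, non-negative nanos) pair consistent.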
Re: Zstd decoder support
Our expectation is maybe in a quarter.

-dain

> On May 17, 2018, at 11:42 AM, Xiening Dai wrote:
>
> Hi Dain,
>
> Do you have a rough timeline regarding when the Java zstd compressor will
> be available? Thanks.
>
>> On May 7, 2018, at 12:34 PM, Dain Sundstrom wrote:
>>
>> The fixes are released in v0.11
>>
>> -dain
>>
>>> On May 6, 2018, at 9:36 PM, Xiening Dai wrote:
>>>
>>> Thanks for clarification. It makes sense to wait for your fixes. Thx.
>>>
>>>> On May 5, 2018, at 1:04 PM, Dain Sundstrom wrote:
>>>>
>>>>> On May 5, 2018, at 11:46 AM, Xiening Dai wrote:
>>>>>
>>>>>> BTW we are about to do a release that fixes a bug with zstd.
>>>>>
>>>>> I am curious which bug you are referring to. Is it a bug in the Java
>>>>> implementation, or does it affect C++ as well?
>>>>
>>>> The bugs were in the Java implementation. IIRC there were two problems.
>>>> In v0.10, we added support for zstd concatenated frames, and it had a rare
>>>> buffer overrun problem. The second problem has been around since the
>>>> beginning: when the file contains checksums, they weren't being validated
>>>> correctly. We missed this one because the default native implementation
>>>> was not adding checksums, so the code wasn't actually being tested.
>>>>
>>>> -dain
Re: Zstd decoder support
Hi Dain,

Do you have a rough timeline regarding when the Java zstd compressor will be available? Thanks.

> On May 7, 2018, at 12:34 PM, Dain Sundstrom wrote:
>
> The fixes are released in v0.11
>
> -dain
>
>> On May 6, 2018, at 9:36 PM, Xiening Dai wrote:
>>
>> Thanks for clarification. It makes sense to wait for your fixes. Thx.
>>
>>> On May 5, 2018, at 1:04 PM, Dain Sundstrom wrote:
>>>
>>>> On May 5, 2018, at 11:46 AM, Xiening Dai wrote:
>>>>
>>>>> BTW we are about to do a release that fixes a bug with zstd.
>>>>
>>>> I am curious which bug you are referring to. Is it a bug in the Java
>>>> implementation, or does it affect C++ as well?
>>>
>>> The bugs were in the Java implementation. IIRC there were two problems.
>>> In v0.10, we added support for zstd concatenated frames, and it had a rare
>>> buffer overrun problem. The second problem has been around since the
>>> beginning: when the file contains checksums, they weren't being validated
>>> correctly. We missed this one because the default native implementation
>>> was not adding checksums, so the code wasn't actually being tested.
>>>
>>> -dain
[GitHub] orc issue #268: ORC-363 Enable zstd decompression in ORC Java reader
Github user xndai commented on the issue:

    https://github.com/apache/orc/pull/268

    The current solution is not perfect, but at least it gives us some ability to read zstd ORC files, which I believe is important from a compatibility perspective: our in-house system produces zstd ORC files that we would like to have consumed by Hive, Spark, etc. I am not sure when the zstd compressor will be available; it's probably another 6 months or a year away. Suppose we enable zstd in the C++ reader/writer first, and then enable the Java reader to consume zstd ORC files produced by the C++ writer. Would you consider that an end-to-end test?

---