TIMESTAMP Encoding for 1969-12-31 23:59:59.5 in ORC

2018-05-17 Thread Wenlei Xie
Hi,

From the code encoding TIMESTAMP:

  final long secs = vec.time[i + offset] / MILLIS_PER_SECOND;
  final int newNanos = vec.nanos[i + offset];

So for 1969-12-31 23:59:59.500, it will be:

secs = 0                 (-500 / 1000, truncated toward zero)
newNanos = 500_000_000   (500ms)

Thus when the time is read back, it will be 1970-01-01 00:00:00.500,
right?

For timestamps earlier than 1969-12-31 23:59:59, since secs will be
negative, the decoder will subtract 1 second from the result. However, for
timestamps between 1969-12-31 23:59:59 and 1970-01-01 00:00:00, it looks like
the data will be read back as a timestamp 1 second later?
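
To make the arithmetic above concrete, here is a minimal sketch. It assumes
the decoder rule is exactly as described in this message (subtract one second
when secs is negative and the nanos are non-zero) and it ignores ORC's
base-epoch offset and time zone handling. The class and method names are
hypothetical and are not taken from TimestampTreeWriter or the ORC reader.

```java
public class TimestampSecondsSketch {
    static final long MILLIS_PER_SECOND = 1000L;
    static final int NANOS_PER_MILLI = 1_000_000;

    // Seconds as computed by the quoted writer line: Java's `/` truncates toward zero.
    static long truncatedSeconds(long millis) {
        return millis / MILLIS_PER_SECOND;
    }

    // For comparison: seconds rounded toward negative infinity.
    static long flooredSeconds(long millis) {
        return Math.floorDiv(millis, MILLIS_PER_SECOND);
    }

    // Non-negative sub-second part, the way java.sql.Timestamp.getNanos() reports it.
    static int nanosOf(long millis) {
        return ((int) Math.floorMod(millis, MILLIS_PER_SECOND)) * NANOS_PER_MILLI;
    }

    // Hypothetical decoder: models only the rule described in this message.
    static long decodeMillis(long secs, int nanos) {
        long millis = secs * MILLIS_PER_SECOND + nanos / NANOS_PER_MILLI;
        if (secs < 0 && nanos != 0) {
            millis -= MILLIS_PER_SECOND; // compensate for truncation of negative values
        }
        return millis;
    }

    // Encode with truncated seconds, then decode with the rule above.
    static long roundTrip(long millis) {
        return decodeMillis(truncatedSeconds(millis), nanosOf(millis));
    }

    public static void main(String[] args) {
        long[] samples = {-1500L, -500L, 500L};
        for (long millis : samples) {
            System.out.println(millis + " ms -> secs=" + truncatedSeconds(millis)
                + ", nanos=" + nanosOf(millis)
                + ", read back as " + roundTrip(millis) + " ms");
        }
    }
}
```

Under these assumptions, -1500 ms survives the round trip, while -500 ms
(1969-12-31 23:59:59.500) comes back as +500 ms, because truncation yields
secs = 0 and the negative-seconds adjustment never fires. Math.floorDiv would
give secs = -1 for that input instead.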


Thank you!


(The encoding is
https://github.com/apache/orc/blob/5b5c0d5bb14469ddf8e399b4ed879cc1cca9bec6/java/core/src/java/org/apache/orc/impl/writer/TimestampTreeWriter.java#L127-L128 )


Best,
Wenlei


Re: Zstd decoder support

2018-05-17 Thread Dain Sundstrom
Our expectation is maybe in a quarter.  

-dain

> On May 17, 2018, at 11:42 AM, Xiening Dai wrote:
>
> Hi Dain,
>
> Do you have a rough timeline for when the Java zstd compressor will
> be available? Thanks.
> 
> 
>> On May 7, 2018, at 12:34 PM, Dain Sundstrom wrote:
>>
>> The fixes are released in v0.11
>>
>> -dain
>>
>>> On May 6, 2018, at 9:36 PM, Xiening Dai wrote:
>>>
>>> Thanks for clarification. It makes sense to wait for your fixes. Thx.
>>>
>>>> On May 5, 2018, at 1:04 PM, Dain Sundstrom wrote:
>>>>
>>>>
>>>>> On May 5, 2018, at 11:46 AM, Xiening Dai wrote:
>>>>>
>>>>>> BTW we are about to do a release that fixes a bug with zstd.
>>>>>
>>>>> I am curious which bug you are referring to. Is it a bug with Java
>>>>> implementation or it affects C++ as well?
>>>>
>>>> The bugs were in the Java implementation.  IIRC there were two problems.
>>>> In v0.10, we added support for zstd concatenated frames, and it had a rare
>>>> buffer overrun problem.  The second problem has been around since the
>>>> beginning.  When the file contains checksums they weren’t being validated
>>>> correctly.  We missed this one because the default native implementation
>>>> was not adding checksums so the code wasn’t actually being tested.
>>>>
>>>> -dain
>>>
>>
>



Re: Zstd decoder support

2018-05-17 Thread Xiening Dai
Hi Dain,

Do you have a rough timeline for when the Java zstd compressor will be
available? Thanks.


> On May 7, 2018, at 12:34 PM, Dain Sundstrom wrote:
>
> The fixes are released in v0.11
>
> -dain
>
>> On May 6, 2018, at 9:36 PM, Xiening Dai wrote:
>>
>> Thanks for clarification. It makes sense to wait for your fixes. Thx.
>>
>>> On May 5, 2018, at 1:04 PM, Dain Sundstrom wrote:
>>>
>>>
>>>> On May 5, 2018, at 11:46 AM, Xiening Dai wrote:
>>>>
>>>>> BTW we are about to do a release that fixes a bug with zstd.
>>>>
>>>> I am curious which bug you are referring to. Is it a bug with Java
>>>> implementation or it affects C++ as well?
>>>
>>> The bugs were in the Java implementation.  IIRC there were two problems.
>>> In v0.10, we added support for zstd concatenated frames, and it had a rare
>>> buffer overrun problem.  The second problem has been around since the
>>> beginning.  When the file contains checksums they weren’t being validated
>>> correctly.  We missed this one because the default native implementation
>>> was not adding checksums so the code wasn’t actually being tested.
>>>
>>> -dain
>>
>



[GitHub] orc issue #268: ORC-363 Enable zstd decompression in ORC Java reader

2018-05-17 Thread xndai
Github user xndai commented on the issue:

https://github.com/apache/orc/pull/268
  
The current solution is not perfect, but at least it gives us some ability
to read zstd ORC files, which I believe is important from a compatibility
perspective: our in-house system has zstd ORC files that we would like Hive,
Spark, etc. to consume. I am not sure when the zstd compressor will be
available. It's probably another 6 months to a year.

If we enable zstd in the C++ reader/writer first, and then enable the Java
reader to consume zstd ORC files from the C++ writer, would you consider that
an end-to-end test?


---