Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Wes McKinney
+1. The Thrift structures are private to parquet-cpp, so it should not be an issue On Wed, Sep 26, 2018 at 12:23 PM Zoltan Ivanfi wrote: > > Hi, > > I just had a conversation with Nandor and he pointed out to me that even if > we broke _reading_ in parquet-cpp 1.5.0, we could simply release a

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
Hi, I just had a conversation with Nandor and he pointed out to me that even if we broke _reading_ in parquet-cpp 1.5.0, we could simply release a 1.5.1 version that fixes it. The important thing is that _writing_ is good and parquet-format-compliant in parquet-cpp 1.5.0, therefore we do not have

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
Hi, Please let me know your opinions as well. So far all concerns were only raised by me, which may indicate that other community members do not consider this issue serious and in this case my suggestions may be excessive and unjustified. Just to clarify: The data structures for the encrypion

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Gidon Gershinsky
Oh, I see now. Moving the crypto structure from position 8 to 9. Let me process that, and we can switch to a direct channel to continue this discussion. On Wed, Sep 26, 2018 at 3:30 PM Zoltan Ivanfi wrote: > Hi, > > I think it's safer if we skip id 8 altogether and use id 9 for the new > crypto

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
Hi, I think it's safer if we skip id 8 altogether and use id 9 for the new crypto structure. This way we don't have to worry about remaining backwards compatible with the accidentally released structure. Br, Zoltan On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky wrote: > Hi, > > I think we

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Gidon Gershinsky
Hi, I think we should use this id for its current purpose. This field had been defined and merged months ago, and should be there is any scenario. Except for the last week's change, the encryption format had been stable for a while now. The timing of this change was unfortunate; but the change

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
Hi, It seems that I spoke too early. I just noticed that a new field was added to the ColumnChunk struct in https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708 Although the field is optional, we can't pretend it was never released, because parquet-cpp

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Gidon Gershinsky
Sounds good. I agree a feature branch is the right thing to do in these cases. Cheers, Gidon. On Wed, Sep 26, 2018 at 2:42 PM Zoltan Ivanfi wrote: > Hi, > > If the encryption code release in parquet-cpp is unused at this moment then > I think we are fine. It means that we are still free to

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
Hi, If the encryption code release in parquet-cpp is unused at this moment then I think we are fine. It means that we are still free to decide any way about the data structures without the risk of incompatility issues. In this case I would suggest to proceed as we planned at the Parquet sync.

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-25 Thread Gidon Gershinsky
That's correct. A layer that runs the crypto classes to encrypt pages and structures is not merged yet. Even if the parquet.thrift is out, there is virtually no chance encrypted files are created with it, and certainly not in production. Given a (hopefully mild) headache this last-minute

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-25 Thread Wes McKinney
AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so reading and writing files with encryption is not possible. Am I wrong about that? I'm wholly supportive of changing the project to use a released version of the Parquet format, so let's do that ASAP. On Tue, Sep 25, 2018 at 1:30 PM

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-25 Thread Gidon Gershinsky
Yep! (sent in parallel :) On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi wrote: > Hi, > > As a short update, I just checked the PR for PARQUET-1419 and although in > its current form it is a breaking change, it can be easily rewritten to > become backwards-compatible so this part of the problem

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-25 Thread Gidon Gershinsky
Hi, There is a third options. PARQUET-1419 doesn't have to be breaking. Removal of the 'required boolean encrypted_footer" is not a must (it can be set to true, and checked in readers). I can modify PARQUET-1419 pr to keep this field - then we have only one new field, an optional one, which

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-25 Thread Zoltan Ivanfi
Hi, As a short update, I just checked the PR for PARQUET-1419 and although in its current form it is a breaking change, it can be easily rewritten to become backwards-compatible so this part of the problem does not apply any more. Br, Zoltan On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi