Re: Parquet-cpp has already released unofficial thrift changes
+1. The Thrift structures are private to parquet-cpp, so it should not be an issue On Wed, Sep 26, 2018 at 12:23 PM Zoltan Ivanfi wrote: > > Hi, > > I just had a conversation with Nandor and he pointed out to me that even if > we broke _reading_ in parquet-cpp 1.5.0, we could simply release a 1.5.1 > version that fixes it. The important thing is that _writing_ is good and > parquet-format-compliant in parquet-cpp 1.5.0, therefore we do not have to > worry about breaking changes that would make existing data files unreadable. > > Based on this, I would [once again :)] suggest moving forward as planned. > > Thanks, > > Zoltan > > On Wed, Sep 26, 2018 at 4:56 PM Zoltan Ivanfi wrote: > > > Hi, > > > > Please let me know your opinions as well. So far all concerns were only > > raised by me, which may indicate that other community members do not > > consider this issue serious and in this case my suggestions may be > > excessive and unjustified. > > > > Just to clarify: The data structures for the encrypion feature are > > unlikely to change. I solely suggest these measures to remedy the situation > > of parquet.thrift having been released in parquet-cpp before officially > > getting released in parquet-format. > > > > Thanks, > > > > Zoltan > > > > On Wed, Sep 26, 2018 at 2:29 PM Zoltan Ivanfi wrote: > > > >> Hi, > >> > >> I think it's safer if we skip id 8 altogether and use id 9 for the new > >> crypto structure. This way we don't have to worry about remaining backwards > >> compatible with the accidentally released structure. > >> > >> Br, > >> > >> Zoltan > >> > >> On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky > >> wrote: > >> > >>> Hi, > >>> > >>> I think we should use this id for its current purpose. This field had > >>> been > >>> defined and merged months ago, and should be there is any scenario. > >>> Except for the last week's change, the encryption format had been stable > >>> for a while now. The timing of this change was unfortunate; but the > >>> change > >>> was minor and done for a good reason. > >>> I hope there won't be new disturbances from now on. > >>> > >>> Cheers, Gidon. > >>> > >>> > >>> On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi > >>> wrote: > >>> > >>> > Hi, > >>> > > >>> > It seems that I spoke too early. I just noticed that a new field was > >>> added > >>> > to the ColumnChunk struct in > >>> > > >>> > > >>> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708 > >>> > Although the field is optional, we can't pretend it was never released, > >>> > because parquet-cpp already expects that field for the id 8. > >>> Therefore, if > >>> > we were to add a different field with id 8 to the ColumnChunk struct, > >>> the > >>> > thrift parser of parquet-cpp 1.5.0 would probably throw an error when > >>> > reading new files, because it would try to read the ColumnChunk struct > >>> > including all of its fields (regardless of whether the code actually > >>> uses > >>> > them or not) and the field with id 8 would not match its Thrift schema. > >>> > > >>> > We still can avoid breaking changes in the ColumnChunk struct by either > >>> > using id 8 for its current purpose or not using it at all (skip id 8 > >>> and > >>> > use 9 for the next field we add). > >>> > > >>> > Br, > >>> > > >>> > Zoltan > >>> > > >>> > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi wrote: > >>> > > >>> > > Hi, > >>> > > > >>> > > If the encryption code release in parquet-cpp is unused at this > >>> moment > >>> > > then I think we are fine. It means that we are still free to decide > >>> any > >>> > way > >>> > > about the data structures without the risk of incompatility issues. > >>> In > >>> > this > >>> > > case I would suggest to proceed as we planned at the Parquet sync. > >>> > > > >>> > > Thanks, > >>> > > > >>> > > Zoltan > >>> > > > >>> > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky > >>> > wrote: > >>> > > > >>> > >> That's correct. A layer that runs the crypto classes to encrypt > >>> pages > >>> > and > >>> > >> structures is not merged yet. Even if the parquet.thrift is out, > >>> there > >>> > is > >>> > >> virtually no chance encrypted files are created with it, and > >>> certainly > >>> > not > >>> > >> in production. > >>> > >> > >>> > >> Given a (hopefully mild) headache this last-minute > >>> feature/requirement > >>> > has > >>> > >> caused, let me add details on where it is coming from. I received > >>> the > >>> > >> requirement from a company that got an internal pushback from teams > >>> > >> working > >>> > >> with unencrypted columns only, and unwilling to update Parquet libs. > >>> > They > >>> > >> need support for a transition period for the current readers. > >>> > >> > >>> > >> We have an option to say no, and stick to the original goal list. > >>> But I > >>> > >> think the requirement makes sense, even if coming very late - and > >>> will > >>> > >> significantly ease adoption of the Parquet encryption. Also, it > >>>
Re: Parquet-cpp has already released unofficial thrift changes
Hi, I just had a conversation with Nandor and he pointed out to me that even if we broke _reading_ in parquet-cpp 1.5.0, we could simply release a 1.5.1 version that fixes it. The important thing is that _writing_ is good and parquet-format-compliant in parquet-cpp 1.5.0, therefore we do not have to worry about breaking changes that would make existing data files unreadable. Based on this, I would [once again :)] suggest moving forward as planned. Thanks, Zoltan On Wed, Sep 26, 2018 at 4:56 PM Zoltan Ivanfi wrote: > Hi, > > Please let me know your opinions as well. So far all concerns were only > raised by me, which may indicate that other community members do not > consider this issue serious and in this case my suggestions may be > excessive and unjustified. > > Just to clarify: The data structures for the encrypion feature are > unlikely to change. I solely suggest these measures to remedy the situation > of parquet.thrift having been released in parquet-cpp before officially > getting released in parquet-format. > > Thanks, > > Zoltan > > On Wed, Sep 26, 2018 at 2:29 PM Zoltan Ivanfi wrote: > >> Hi, >> >> I think it's safer if we skip id 8 altogether and use id 9 for the new >> crypto structure. This way we don't have to worry about remaining backwards >> compatible with the accidentally released structure. >> >> Br, >> >> Zoltan >> >> On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky >> wrote: >> >>> Hi, >>> >>> I think we should use this id for its current purpose. This field had >>> been >>> defined and merged months ago, and should be there is any scenario. >>> Except for the last week's change, the encryption format had been stable >>> for a while now. The timing of this change was unfortunate; but the >>> change >>> was minor and done for a good reason. >>> I hope there won't be new disturbances from now on. >>> >>> Cheers, Gidon. >>> >>> >>> On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi >>> wrote: >>> >>> > Hi, >>> > >>> > It seems that I spoke too early. I just noticed that a new field was >>> added >>> > to the ColumnChunk struct in >>> > >>> > >>> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708 >>> > Although the field is optional, we can't pretend it was never released, >>> > because parquet-cpp already expects that field for the id 8. >>> Therefore, if >>> > we were to add a different field with id 8 to the ColumnChunk struct, >>> the >>> > thrift parser of parquet-cpp 1.5.0 would probably throw an error when >>> > reading new files, because it would try to read the ColumnChunk struct >>> > including all of its fields (regardless of whether the code actually >>> uses >>> > them or not) and the field with id 8 would not match its Thrift schema. >>> > >>> > We still can avoid breaking changes in the ColumnChunk struct by either >>> > using id 8 for its current purpose or not using it at all (skip id 8 >>> and >>> > use 9 for the next field we add). >>> > >>> > Br, >>> > >>> > Zoltan >>> > >>> > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi wrote: >>> > >>> > > Hi, >>> > > >>> > > If the encryption code release in parquet-cpp is unused at this >>> moment >>> > > then I think we are fine. It means that we are still free to decide >>> any >>> > way >>> > > about the data structures without the risk of incompatility issues. >>> In >>> > this >>> > > case I would suggest to proceed as we planned at the Parquet sync. >>> > > >>> > > Thanks, >>> > > >>> > > Zoltan >>> > > >>> > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky >>> > wrote: >>> > > >>> > >> That's correct. A layer that runs the crypto classes to encrypt >>> pages >>> > and >>> > >> structures is not merged yet. Even if the parquet.thrift is out, >>> there >>> > is >>> > >> virtually no chance encrypted files are created with it, and >>> certainly >>> > not >>> > >> in production. >>> > >> >>> > >> Given a (hopefully mild) headache this last-minute >>> feature/requirement >>> > has >>> > >> caused, let me add details on where it is coming from. I received >>> the >>> > >> requirement from a company that got an internal pushback from teams >>> > >> working >>> > >> with unencrypted columns only, and unwilling to update Parquet libs. >>> > They >>> > >> need support for a transition period for the current readers. >>> > >> >>> > >> We have an option to say no, and stick to the original goal list. >>> But I >>> > >> think the requirement makes sense, even if coming very late - and >>> will >>> > >> significantly ease adoption of the Parquet encryption. Also, it >>> turns >>> > out >>> > >> to be easy to implement, with minor changes in the code. >>> > >> >>> > >> In any case, its certainly a good idea to keep a single copy of >>> > >> parquet.thrift in the format repo. >>> > >> >>> > >> Cheers, Gidon >>> > >> >>> > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney >>> > >> wrote: >>> > >> >>> > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so >>> > >> >
Re: Parquet-cpp has already released unofficial thrift changes
Hi, Please let me know your opinions as well. So far all concerns were only raised by me, which may indicate that other community members do not consider this issue serious and in this case my suggestions may be excessive and unjustified. Just to clarify: The data structures for the encrypion feature are unlikely to change. I solely suggest these measures to remedy the situation of parquet.thrift having been released in parquet-cpp before officially getting released in parquet-format. Thanks, Zoltan On Wed, Sep 26, 2018 at 2:29 PM Zoltan Ivanfi wrote: > Hi, > > I think it's safer if we skip id 8 altogether and use id 9 for the new > crypto structure. This way we don't have to worry about remaining backwards > compatible with the accidentally released structure. > > Br, > > Zoltan > > On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky wrote: > >> Hi, >> >> I think we should use this id for its current purpose. This field had been >> defined and merged months ago, and should be there is any scenario. >> Except for the last week's change, the encryption format had been stable >> for a while now. The timing of this change was unfortunate; but the change >> was minor and done for a good reason. >> I hope there won't be new disturbances from now on. >> >> Cheers, Gidon. >> >> >> On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi >> wrote: >> >> > Hi, >> > >> > It seems that I spoke too early. I just noticed that a new field was >> added >> > to the ColumnChunk struct in >> > >> > >> https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708 >> > Although the field is optional, we can't pretend it was never released, >> > because parquet-cpp already expects that field for the id 8. Therefore, >> if >> > we were to add a different field with id 8 to the ColumnChunk struct, >> the >> > thrift parser of parquet-cpp 1.5.0 would probably throw an error when >> > reading new files, because it would try to read the ColumnChunk struct >> > including all of its fields (regardless of whether the code actually >> uses >> > them or not) and the field with id 8 would not match its Thrift schema. >> > >> > We still can avoid breaking changes in the ColumnChunk struct by either >> > using id 8 for its current purpose or not using it at all (skip id 8 and >> > use 9 for the next field we add). >> > >> > Br, >> > >> > Zoltan >> > >> > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi wrote: >> > >> > > Hi, >> > > >> > > If the encryption code release in parquet-cpp is unused at this moment >> > > then I think we are fine. It means that we are still free to decide >> any >> > way >> > > about the data structures without the risk of incompatility issues. In >> > this >> > > case I would suggest to proceed as we planned at the Parquet sync. >> > > >> > > Thanks, >> > > >> > > Zoltan >> > > >> > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky >> > wrote: >> > > >> > >> That's correct. A layer that runs the crypto classes to encrypt pages >> > and >> > >> structures is not merged yet. Even if the parquet.thrift is out, >> there >> > is >> > >> virtually no chance encrypted files are created with it, and >> certainly >> > not >> > >> in production. >> > >> >> > >> Given a (hopefully mild) headache this last-minute >> feature/requirement >> > has >> > >> caused, let me add details on where it is coming from. I received the >> > >> requirement from a company that got an internal pushback from teams >> > >> working >> > >> with unencrypted columns only, and unwilling to update Parquet libs. >> > They >> > >> need support for a transition period for the current readers. >> > >> >> > >> We have an option to say no, and stick to the original goal list. >> But I >> > >> think the requirement makes sense, even if coming very late - and >> will >> > >> significantly ease adoption of the Parquet encryption. Also, it turns >> > out >> > >> to be easy to implement, with minor changes in the code. >> > >> >> > >> In any case, its certainly a good idea to keep a single copy of >> > >> parquet.thrift in the format repo. >> > >> >> > >> Cheers, Gidon >> > >> >> > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney >> > >> wrote: >> > >> >> > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so >> > >> > reading and writing files with encryption is not possible. Am I >> wrong >> > >> > about that? >> > >> > >> > >> > I'm wholly supportive of changing the project to use a released >> > >> > version of the Parquet format, so let's do that ASAP. >> > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky > > >> > >> wrote: >> > >> > > >> > >> > > Yep! (sent in parallel :) >> > >> > > >> > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi >> > > > >> > >> > >> > > wrote: >> > >> > > >> > >> > > > Hi, >> > >> > > > >> > >> > > > As a short update, I just checked the PR for PARQUET-1419 and >> > >> although >> > >> > in >> > >> > > > its current form it is a breaking change, it can be easily >> > >>
Re: Parquet-cpp has already released unofficial thrift changes
Oh, I see now. Moving the crypto structure from position 8 to 9. Let me process that, and we can switch to a direct channel to continue this discussion. On Wed, Sep 26, 2018 at 3:30 PM Zoltan Ivanfi wrote: > Hi, > > I think it's safer if we skip id 8 altogether and use id 9 for the new > crypto structure. This way we don't have to worry about remaining backwards > compatible with the accidentally released structure. > > Br, > > Zoltan > > On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky wrote: > > > Hi, > > > > I think we should use this id for its current purpose. This field had > been > > defined and merged months ago, and should be there is any scenario. > > Except for the last week's change, the encryption format had been stable > > for a while now. The timing of this change was unfortunate; but the > change > > was minor and done for a good reason. > > I hope there won't be new disturbances from now on. > > > > Cheers, Gidon. > > > > > > On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi > > wrote: > > > > > Hi, > > > > > > It seems that I spoke too early. I just noticed that a new field was > > added > > > to the ColumnChunk struct in > > > > > > > > > https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708 > > > Although the field is optional, we can't pretend it was never released, > > > because parquet-cpp already expects that field for the id 8. Therefore, > > if > > > we were to add a different field with id 8 to the ColumnChunk struct, > the > > > thrift parser of parquet-cpp 1.5.0 would probably throw an error when > > > reading new files, because it would try to read the ColumnChunk struct > > > including all of its fields (regardless of whether the code actually > uses > > > them or not) and the field with id 8 would not match its Thrift schema. > > > > > > We still can avoid breaking changes in the ColumnChunk struct by either > > > using id 8 for its current purpose or not using it at all (skip id 8 > and > > > use 9 for the next field we add). > > > > > > Br, > > > > > > Zoltan > > > > > > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi wrote: > > > > > > > Hi, > > > > > > > > If the encryption code release in parquet-cpp is unused at this > moment > > > > then I think we are fine. It means that we are still free to decide > any > > > way > > > > about the data structures without the risk of incompatility issues. > In > > > this > > > > case I would suggest to proceed as we planned at the Parquet sync. > > > > > > > > Thanks, > > > > > > > > Zoltan > > > > > > > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky > > > wrote: > > > > > > > >> That's correct. A layer that runs the crypto classes to encrypt > pages > > > and > > > >> structures is not merged yet. Even if the parquet.thrift is out, > there > > > is > > > >> virtually no chance encrypted files are created with it, and > certainly > > > not > > > >> in production. > > > >> > > > >> Given a (hopefully mild) headache this last-minute > feature/requirement > > > has > > > >> caused, let me add details on where it is coming from. I received > the > > > >> requirement from a company that got an internal pushback from teams > > > >> working > > > >> with unencrypted columns only, and unwilling to update Parquet libs. > > > They > > > >> need support for a transition period for the current readers. > > > >> > > > >> We have an option to say no, and stick to the original goal list. > But > > I > > > >> think the requirement makes sense, even if coming very late - and > will > > > >> significantly ease adoption of the Parquet encryption. Also, it > turns > > > out > > > >> to be easy to implement, with minor changes in the code. > > > >> > > > >> In any case, its certainly a good idea to keep a single copy of > > > >> parquet.thrift in the format repo. > > > >> > > > >> Cheers, Gidon > > > >> > > > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney > > > >> wrote: > > > >> > > > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so > > > >> > reading and writing files with encryption is not possible. Am I > > wrong > > > >> > about that? > > > >> > > > > >> > I'm wholly supportive of changing the project to use a released > > > >> > version of the Parquet format, so let's do that ASAP. > > > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky < > gg5...@gmail.com> > > > >> wrote: > > > >> > > > > > >> > > Yep! (sent in parallel :) > > > >> > > > > > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi > > > > > >> > > > > >> > > wrote: > > > >> > > > > > >> > > > Hi, > > > >> > > > > > > >> > > > As a short update, I just checked the PR for PARQUET-1419 and > > > >> although > > > >> > in > > > >> > > > its current form it is a breaking change, it can be easily > > > >> rewritten to > > > >> > > > become backwards-compatible so this part of the problem does > not > > > >> apply > > > >> > any > > > >> > > > more. > > > >> > > > > > > >> > > > Br, > > > >> > > > > > > >> > > >
Re: Parquet-cpp has already released unofficial thrift changes
Hi, I think it's safer if we skip id 8 altogether and use id 9 for the new crypto structure. This way we don't have to worry about remaining backwards compatible with the accidentally released structure. Br, Zoltan On Wed, Sep 26, 2018 at 2:25 PM Gidon Gershinsky wrote: > Hi, > > I think we should use this id for its current purpose. This field had been > defined and merged months ago, and should be there is any scenario. > Except for the last week's change, the encryption format had been stable > for a while now. The timing of this change was unfortunate; but the change > was minor and done for a good reason. > I hope there won't be new disturbances from now on. > > Cheers, Gidon. > > > On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi > wrote: > > > Hi, > > > > It seems that I spoke too early. I just noticed that a new field was > added > > to the ColumnChunk struct in > > > > > https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708 > > Although the field is optional, we can't pretend it was never released, > > because parquet-cpp already expects that field for the id 8. Therefore, > if > > we were to add a different field with id 8 to the ColumnChunk struct, the > > thrift parser of parquet-cpp 1.5.0 would probably throw an error when > > reading new files, because it would try to read the ColumnChunk struct > > including all of its fields (regardless of whether the code actually uses > > them or not) and the field with id 8 would not match its Thrift schema. > > > > We still can avoid breaking changes in the ColumnChunk struct by either > > using id 8 for its current purpose or not using it at all (skip id 8 and > > use 9 for the next field we add). > > > > Br, > > > > Zoltan > > > > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi wrote: > > > > > Hi, > > > > > > If the encryption code release in parquet-cpp is unused at this moment > > > then I think we are fine. It means that we are still free to decide any > > way > > > about the data structures without the risk of incompatility issues. In > > this > > > case I would suggest to proceed as we planned at the Parquet sync. > > > > > > Thanks, > > > > > > Zoltan > > > > > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky > > wrote: > > > > > >> That's correct. A layer that runs the crypto classes to encrypt pages > > and > > >> structures is not merged yet. Even if the parquet.thrift is out, there > > is > > >> virtually no chance encrypted files are created with it, and certainly > > not > > >> in production. > > >> > > >> Given a (hopefully mild) headache this last-minute feature/requirement > > has > > >> caused, let me add details on where it is coming from. I received the > > >> requirement from a company that got an internal pushback from teams > > >> working > > >> with unencrypted columns only, and unwilling to update Parquet libs. > > They > > >> need support for a transition period for the current readers. > > >> > > >> We have an option to say no, and stick to the original goal list. But > I > > >> think the requirement makes sense, even if coming very late - and will > > >> significantly ease adoption of the Parquet encryption. Also, it turns > > out > > >> to be easy to implement, with minor changes in the code. > > >> > > >> In any case, its certainly a good idea to keep a single copy of > > >> parquet.thrift in the format repo. > > >> > > >> Cheers, Gidon > > >> > > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney > > >> wrote: > > >> > > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so > > >> > reading and writing files with encryption is not possible. Am I > wrong > > >> > about that? > > >> > > > >> > I'm wholly supportive of changing the project to use a released > > >> > version of the Parquet format, so let's do that ASAP. > > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky > > >> wrote: > > >> > > > > >> > > Yep! (sent in parallel :) > > >> > > > > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi > > > >> > > > >> > > wrote: > > >> > > > > >> > > > Hi, > > >> > > > > > >> > > > As a short update, I just checked the PR for PARQUET-1419 and > > >> although > > >> > in > > >> > > > its current form it is a breaking change, it can be easily > > >> rewritten to > > >> > > > become backwards-compatible so this part of the problem does not > > >> apply > > >> > any > > >> > > > more. > > >> > > > > > >> > > > Br, > > >> > > > > > >> > > > Zoltan > > >> > > > > > >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi > > >> wrote: > > >> > > > > > >> > > > > Hi, > > >> > > > > > > >> > > > > On the Parquet sync we discussed that the practice of > > maintaining > > >> a > > >> > copy > > >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we must > > >> take > > >> > care > > >> > > > to > > >> > > > > not release parquet-format changes in parquet-cpp before we > > >> > officially > > >> > > > > release them in parquet-format. As I got back to my
Re: Parquet-cpp has already released unofficial thrift changes
Hi, I think we should use this id for its current purpose. This field had been defined and merged months ago, and should be there is any scenario. Except for the last week's change, the encryption format had been stable for a while now. The timing of this change was unfortunate; but the change was minor and done for a good reason. I hope there won't be new disturbances from now on. Cheers, Gidon. On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi wrote: > Hi, > > It seems that I spoke too early. I just noticed that a new field was added > to the ColumnChunk struct in > > https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708 > Although the field is optional, we can't pretend it was never released, > because parquet-cpp already expects that field for the id 8. Therefore, if > we were to add a different field with id 8 to the ColumnChunk struct, the > thrift parser of parquet-cpp 1.5.0 would probably throw an error when > reading new files, because it would try to read the ColumnChunk struct > including all of its fields (regardless of whether the code actually uses > them or not) and the field with id 8 would not match its Thrift schema. > > We still can avoid breaking changes in the ColumnChunk struct by either > using id 8 for its current purpose or not using it at all (skip id 8 and > use 9 for the next field we add). > > Br, > > Zoltan > > On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi wrote: > > > Hi, > > > > If the encryption code release in parquet-cpp is unused at this moment > > then I think we are fine. It means that we are still free to decide any > way > > about the data structures without the risk of incompatility issues. In > this > > case I would suggest to proceed as we planned at the Parquet sync. > > > > Thanks, > > > > Zoltan > > > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky > wrote: > > > >> That's correct. A layer that runs the crypto classes to encrypt pages > and > >> structures is not merged yet. Even if the parquet.thrift is out, there > is > >> virtually no chance encrypted files are created with it, and certainly > not > >> in production. > >> > >> Given a (hopefully mild) headache this last-minute feature/requirement > has > >> caused, let me add details on where it is coming from. I received the > >> requirement from a company that got an internal pushback from teams > >> working > >> with unencrypted columns only, and unwilling to update Parquet libs. > They > >> need support for a transition period for the current readers. > >> > >> We have an option to say no, and stick to the original goal list. But I > >> think the requirement makes sense, even if coming very late - and will > >> significantly ease adoption of the Parquet encryption. Also, it turns > out > >> to be easy to implement, with minor changes in the code. > >> > >> In any case, its certainly a good idea to keep a single copy of > >> parquet.thrift in the format repo. > >> > >> Cheers, Gidon > >> > >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney > >> wrote: > >> > >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so > >> > reading and writing files with encryption is not possible. Am I wrong > >> > about that? > >> > > >> > I'm wholly supportive of changing the project to use a released > >> > version of the Parquet format, so let's do that ASAP. > >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky > >> wrote: > >> > > > >> > > Yep! (sent in parallel :) > >> > > > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi > >> > > >> > > wrote: > >> > > > >> > > > Hi, > >> > > > > >> > > > As a short update, I just checked the PR for PARQUET-1419 and > >> although > >> > in > >> > > > its current form it is a breaking change, it can be easily > >> rewritten to > >> > > > become backwards-compatible so this part of the problem does not > >> apply > >> > any > >> > > > more. > >> > > > > >> > > > Br, > >> > > > > >> > > > Zoltan > >> > > > > >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi > >> wrote: > >> > > > > >> > > > > Hi, > >> > > > > > >> > > > > On the Parquet sync we discussed that the practice of > maintaining > >> a > >> > copy > >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we must > >> take > >> > care > >> > > > to > >> > > > > not release parquet-format changes in parquet-cpp before we > >> > officially > >> > > > > release them in parquet-format. As I got back to my computer and > >> > started > >> > > > to > >> > > > > create a JIRA about this, I noticed that unfortunately this has > >> > already > >> > > > > happened. > >> > > > > > >> > > > > The encryption-releated parquet.thrift changes have not only > been > >> > added > >> > > > to > >> > > > > only parquet-format, but to parquet-cpp as well, and these > changes > >> > got > >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because > >> > > > > PARQUET-1419 would change the encryption in a breaking way, > which > >> is > >> > only >
Re: Parquet-cpp has already released unofficial thrift changes
Hi, It seems that I spoke too early. I just noticed that a new field was added to the ColumnChunk struct in https://github.com/apache/parquet-cpp/pull/463/files#diff-0589b447b73e51c88d9b2bdcb0957084R708 Although the field is optional, we can't pretend it was never released, because parquet-cpp already expects that field for the id 8. Therefore, if we were to add a different field with id 8 to the ColumnChunk struct, the thrift parser of parquet-cpp 1.5.0 would probably throw an error when reading new files, because it would try to read the ColumnChunk struct including all of its fields (regardless of whether the code actually uses them or not) and the field with id 8 would not match its Thrift schema. We still can avoid breaking changes in the ColumnChunk struct by either using id 8 for its current purpose or not using it at all (skip id 8 and use 9 for the next field we add). Br, Zoltan On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi wrote: > Hi, > > If the encryption code release in parquet-cpp is unused at this moment > then I think we are fine. It means that we are still free to decide any way > about the data structures without the risk of incompatility issues. In this > case I would suggest to proceed as we planned at the Parquet sync. > > Thanks, > > Zoltan > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky wrote: > >> That's correct. A layer that runs the crypto classes to encrypt pages and >> structures is not merged yet. Even if the parquet.thrift is out, there is >> virtually no chance encrypted files are created with it, and certainly not >> in production. >> >> Given a (hopefully mild) headache this last-minute feature/requirement has >> caused, let me add details on where it is coming from. I received the >> requirement from a company that got an internal pushback from teams >> working >> with unencrypted columns only, and unwilling to update Parquet libs. They >> need support for a transition period for the current readers. >> >> We have an option to say no, and stick to the original goal list. But I >> think the requirement makes sense, even if coming very late - and will >> significantly ease adoption of the Parquet encryption. Also, it turns out >> to be easy to implement, with minor changes in the code. >> >> In any case, its certainly a good idea to keep a single copy of >> parquet.thrift in the format repo. >> >> Cheers, Gidon >> >> On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney >> wrote: >> >> > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so >> > reading and writing files with encryption is not possible. Am I wrong >> > about that? >> > >> > I'm wholly supportive of changing the project to use a released >> > version of the Parquet format, so let's do that ASAP. >> > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky >> wrote: >> > > >> > > Yep! (sent in parallel :) >> > > >> > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi > > >> > > wrote: >> > > >> > > > Hi, >> > > > >> > > > As a short update, I just checked the PR for PARQUET-1419 and >> although >> > in >> > > > its current form it is a breaking change, it can be easily >> rewritten to >> > > > become backwards-compatible so this part of the problem does not >> apply >> > any >> > > > more. >> > > > >> > > > Br, >> > > > >> > > > Zoltan >> > > > >> > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi >> wrote: >> > > > >> > > > > Hi, >> > > > > >> > > > > On the Parquet sync we discussed that the practice of maintaining >> a >> > copy >> > > > > of parquet.thrift in parquet-cpp is dangerous and that we must >> take >> > care >> > > > to >> > > > > not release parquet-format changes in parquet-cpp before we >> > officially >> > > > > release them in parquet-format. As I got back to my computer and >> > started >> > > > to >> > > > > create a JIRA about this, I noticed that unfortunately this has >> > already >> > > > > happened. >> > > > > >> > > > > The encryption-releated parquet.thrift changes have not only been >> > added >> > > > to >> > > > > only parquet-format, but to parquet-cpp as well, and these changes >> > got >> > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because >> > > > > PARQUET-1419 would change the encryption in a breaking way, which >> is >> > only >> > > > > acceptable as long as the original is not released. Additionally, >> it >> > has >> > > > > been discussed that a formal voting should take place before >> > > > incorporating >> > > > > the encryption features in the format. >> > > > > >> > > > > Now that parquet-cpp has already shipped these changes, we must >> > choose >> > > > the >> > > > > lesser evil of the following two options: >> > > > > >> > > > >- Release a parquet-cpp 1.6.0 with a breaking change and risk >> that >> > > > >encrypted data files already written with parquet-cpp 1.5.0 >> will >> > not >> > > > be >> > > > >readable any more. >> > > > >- Release the encryption in parquet-format as it is, >> regardless of >> > > > >
Re: Parquet-cpp has already released unofficial thrift changes
Sounds good. I agree a feature branch is the right thing to do in these cases. Cheers, Gidon. On Wed, Sep 26, 2018 at 2:42 PM Zoltan Ivanfi wrote: > Hi, > > If the encryption code release in parquet-cpp is unused at this moment then > I think we are fine. It means that we are still free to decide any way > about the data structures without the risk of incompatility issues. In this > case I would suggest to proceed as we planned at the Parquet sync. > > Thanks, > > Zoltan > > On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky wrote: > > > That's correct. A layer that runs the crypto classes to encrypt pages and > > structures is not merged yet. Even if the parquet.thrift is out, there is > > virtually no chance encrypted files are created with it, and certainly > not > > in production. > > > > Given a (hopefully mild) headache this last-minute feature/requirement > has > > caused, let me add details on where it is coming from. I received the > > requirement from a company that got an internal pushback from teams > working > > with unencrypted columns only, and unwilling to update Parquet libs. They > > need support for a transition period for the current readers. > > > > We have an option to say no, and stick to the original goal list. But I > > think the requirement makes sense, even if coming very late - and will > > significantly ease adoption of the Parquet encryption. Also, it turns out > > to be easy to implement, with minor changes in the code. > > > > In any case, its certainly a good idea to keep a single copy of > > parquet.thrift in the format repo. > > > > Cheers, Gidon > > > > On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney > wrote: > > > > > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so > > > reading and writing files with encryption is not possible. Am I wrong > > > about that? > > > > > > I'm wholly supportive of changing the project to use a released > > > version of the Parquet format, so let's do that ASAP. > > > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky > > wrote: > > > > > > > > Yep! (sent in parallel :) > > > > > > > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi > > > > > > > wrote: > > > > > > > > > Hi, > > > > > > > > > > As a short update, I just checked the PR for PARQUET-1419 and > > although > > > in > > > > > its current form it is a breaking change, it can be easily > rewritten > > to > > > > > become backwards-compatible so this part of the problem does not > > apply > > > any > > > > > more. > > > > > > > > > > Br, > > > > > > > > > > Zoltan > > > > > > > > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi > > wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > On the Parquet sync we discussed that the practice of > maintaining a > > > copy > > > > > > of parquet.thrift in parquet-cpp is dangerous and that we must > take > > > care > > > > > to > > > > > > not release parquet-format changes in parquet-cpp before we > > > officially > > > > > > release them in parquet-format. As I got back to my computer and > > > started > > > > > to > > > > > > create a JIRA about this, I noticed that unfortunately this has > > > already > > > > > > happened. > > > > > > > > > > > > The encryption-releated parquet.thrift changes have not only been > > > added > > > > > to > > > > > > only parquet-format, but to parquet-cpp as well, and these > changes > > > got > > > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because > > > > > > PARQUET-1419 would change the encryption in a breaking way, which > > is > > > only > > > > > > acceptable as long as the original is not released. Additionally, > > it > > > has > > > > > > been discussed that a formal voting should take place before > > > > > incorporating > > > > > > the encryption features in the format. > > > > > > > > > > > > Now that parquet-cpp has already shipped these changes, we must > > > choose > > > > > the > > > > > > lesser evil of the following two options: > > > > > > > > > > > >- Release a parquet-cpp 1.6.0 with a breaking change and risk > > that > > > > > >encrypted data files already written with parquet-cpp 1.5.0 > will > > > not > > > > > be > > > > > >readable any more. > > > > > >- Release the encryption in parquet-format as it is, > regardless > > of > > > > > >voting results and discard PARQUET-1419. > > > > > > > > > > > > Personally I have a hard time deciding which one I consider > lesser > > > evil. > > > > > > What are your opinions? > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Zoltan > > > > > > > > > > > > > > > > >
Re: Parquet-cpp has already released unofficial thrift changes
Hi, If the encryption code release in parquet-cpp is unused at this moment then I think we are fine. It means that we are still free to decide any way about the data structures without the risk of incompatility issues. In this case I would suggest to proceed as we planned at the Parquet sync. Thanks, Zoltan On Wed, Sep 26, 2018 at 7:49 AM Gidon Gershinsky wrote: > That's correct. A layer that runs the crypto classes to encrypt pages and > structures is not merged yet. Even if the parquet.thrift is out, there is > virtually no chance encrypted files are created with it, and certainly not > in production. > > Given a (hopefully mild) headache this last-minute feature/requirement has > caused, let me add details on where it is coming from. I received the > requirement from a company that got an internal pushback from teams working > with unencrypted columns only, and unwilling to update Parquet libs. They > need support for a transition period for the current readers. > > We have an option to say no, and stick to the original goal list. But I > think the requirement makes sense, even if coming very late - and will > significantly ease adoption of the Parquet encryption. Also, it turns out > to be easy to implement, with minor changes in the code. > > In any case, its certainly a good idea to keep a single copy of > parquet.thrift in the format repo. > > Cheers, Gidon > > On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney wrote: > > > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so > > reading and writing files with encryption is not possible. Am I wrong > > about that? > > > > I'm wholly supportive of changing the project to use a released > > version of the Parquet format, so let's do that ASAP. > > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky > wrote: > > > > > > Yep! (sent in parallel :) > > > > > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi > > > > wrote: > > > > > > > Hi, > > > > > > > > As a short update, I just checked the PR for PARQUET-1419 and > although > > in > > > > its current form it is a breaking change, it can be easily rewritten > to > > > > become backwards-compatible so this part of the problem does not > apply > > any > > > > more. > > > > > > > > Br, > > > > > > > > Zoltan > > > > > > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi > wrote: > > > > > > > > > Hi, > > > > > > > > > > On the Parquet sync we discussed that the practice of maintaining a > > copy > > > > > of parquet.thrift in parquet-cpp is dangerous and that we must take > > care > > > > to > > > > > not release parquet-format changes in parquet-cpp before we > > officially > > > > > release them in parquet-format. As I got back to my computer and > > started > > > > to > > > > > create a JIRA about this, I noticed that unfortunately this has > > already > > > > > happened. > > > > > > > > > > The encryption-releated parquet.thrift changes have not only been > > added > > > > to > > > > > only parquet-format, but to parquet-cpp as well, and these changes > > got > > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because > > > > > PARQUET-1419 would change the encryption in a breaking way, which > is > > only > > > > > acceptable as long as the original is not released. Additionally, > it > > has > > > > > been discussed that a formal voting should take place before > > > > incorporating > > > > > the encryption features in the format. > > > > > > > > > > Now that parquet-cpp has already shipped these changes, we must > > choose > > > > the > > > > > lesser evil of the following two options: > > > > > > > > > >- Release a parquet-cpp 1.6.0 with a breaking change and risk > that > > > > >encrypted data files already written with parquet-cpp 1.5.0 will > > not > > > > be > > > > >readable any more. > > > > >- Release the encryption in parquet-format as it is, regardless > of > > > > >voting results and discard PARQUET-1419. > > > > > > > > > > Personally I have a hard time deciding which one I consider lesser > > evil. > > > > > What are your opinions? > > > > > > > > > > Thanks, > > > > > > > > > > Zoltan > > > > > > > > > > > >
Re: Parquet-cpp has already released unofficial thrift changes
That's correct. A layer that runs the crypto classes to encrypt pages and structures is not merged yet. Even if the parquet.thrift is out, there is virtually no chance encrypted files are created with it, and certainly not in production. Given a (hopefully mild) headache this last-minute feature/requirement has caused, let me add details on where it is coming from. I received the requirement from a company that got an internal pushback from teams working with unencrypted columns only, and unwilling to update Parquet libs. They need support for a transition period for the current readers. We have an option to say no, and stick to the original goal list. But I think the requirement makes sense, even if coming very late - and will significantly ease adoption of the Parquet encryption. Also, it turns out to be easy to implement, with minor changes in the code. In any case, its certainly a good idea to keep a single copy of parquet.thrift in the format repo. Cheers, Gidon On Wed, Sep 26, 2018 at 12:55 AM Wes McKinney wrote: > AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so > reading and writing files with encryption is not possible. Am I wrong > about that? > > I'm wholly supportive of changing the project to use a released > version of the Parquet format, so let's do that ASAP. > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky wrote: > > > > Yep! (sent in parallel :) > > > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi > > wrote: > > > > > Hi, > > > > > > As a short update, I just checked the PR for PARQUET-1419 and although > in > > > its current form it is a breaking change, it can be easily rewritten to > > > become backwards-compatible so this part of the problem does not apply > any > > > more. > > > > > > Br, > > > > > > Zoltan > > > > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi wrote: > > > > > > > Hi, > > > > > > > > On the Parquet sync we discussed that the practice of maintaining a > copy > > > > of parquet.thrift in parquet-cpp is dangerous and that we must take > care > > > to > > > > not release parquet-format changes in parquet-cpp before we > officially > > > > release them in parquet-format. As I got back to my computer and > started > > > to > > > > create a JIRA about this, I noticed that unfortunately this has > already > > > > happened. > > > > > > > > The encryption-releated parquet.thrift changes have not only been > added > > > to > > > > only parquet-format, but to parquet-cpp as well, and these changes > got > > > > released in parquet-cpp 1.5.0. This is very unfortunate, because > > > > PARQUET-1419 would change the encryption in a breaking way, which is > only > > > > acceptable as long as the original is not released. Additionally, it > has > > > > been discussed that a formal voting should take place before > > > incorporating > > > > the encryption features in the format. > > > > > > > > Now that parquet-cpp has already shipped these changes, we must > choose > > > the > > > > lesser evil of the following two options: > > > > > > > >- Release a parquet-cpp 1.6.0 with a breaking change and risk that > > > >encrypted data files already written with parquet-cpp 1.5.0 will > not > > > be > > > >readable any more. > > > >- Release the encryption in parquet-format as it is, regardless of > > > >voting results and discard PARQUET-1419. > > > > > > > > Personally I have a hard time deciding which one I consider lesser > evil. > > > > What are your opinions? > > > > > > > > Thanks, > > > > > > > > Zoltan > > > > > > > >
Re: Parquet-cpp has already released unofficial thrift changes
AFAIK encryption is not fully supported in parquet-cpp 1.5.0 so reading and writing files with encryption is not possible. Am I wrong about that? I'm wholly supportive of changing the project to use a released version of the Parquet format, so let's do that ASAP. On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky wrote: > > Yep! (sent in parallel :) > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi > wrote: > > > Hi, > > > > As a short update, I just checked the PR for PARQUET-1419 and although in > > its current form it is a breaking change, it can be easily rewritten to > > become backwards-compatible so this part of the problem does not apply any > > more. > > > > Br, > > > > Zoltan > > > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi wrote: > > > > > Hi, > > > > > > On the Parquet sync we discussed that the practice of maintaining a copy > > > of parquet.thrift in parquet-cpp is dangerous and that we must take care > > to > > > not release parquet-format changes in parquet-cpp before we officially > > > release them in parquet-format. As I got back to my computer and started > > to > > > create a JIRA about this, I noticed that unfortunately this has already > > > happened. > > > > > > The encryption-releated parquet.thrift changes have not only been added > > to > > > only parquet-format, but to parquet-cpp as well, and these changes got > > > released in parquet-cpp 1.5.0. This is very unfortunate, because > > > PARQUET-1419 would change the encryption in a breaking way, which is only > > > acceptable as long as the original is not released. Additionally, it has > > > been discussed that a formal voting should take place before > > incorporating > > > the encryption features in the format. > > > > > > Now that parquet-cpp has already shipped these changes, we must choose > > the > > > lesser evil of the following two options: > > > > > >- Release a parquet-cpp 1.6.0 with a breaking change and risk that > > >encrypted data files already written with parquet-cpp 1.5.0 will not > > be > > >readable any more. > > >- Release the encryption in parquet-format as it is, regardless of > > >voting results and discard PARQUET-1419. > > > > > > Personally I have a hard time deciding which one I consider lesser evil. > > > What are your opinions? > > > > > > Thanks, > > > > > > Zoltan > > > > >
Re: Parquet-cpp has already released unofficial thrift changes
Yep! (sent in parallel :) On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi wrote: > Hi, > > As a short update, I just checked the PR for PARQUET-1419 and although in > its current form it is a breaking change, it can be easily rewritten to > become backwards-compatible so this part of the problem does not apply any > more. > > Br, > > Zoltan > > On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi wrote: > > > Hi, > > > > On the Parquet sync we discussed that the practice of maintaining a copy > > of parquet.thrift in parquet-cpp is dangerous and that we must take care > to > > not release parquet-format changes in parquet-cpp before we officially > > release them in parquet-format. As I got back to my computer and started > to > > create a JIRA about this, I noticed that unfortunately this has already > > happened. > > > > The encryption-releated parquet.thrift changes have not only been added > to > > only parquet-format, but to parquet-cpp as well, and these changes got > > released in parquet-cpp 1.5.0. This is very unfortunate, because > > PARQUET-1419 would change the encryption in a breaking way, which is only > > acceptable as long as the original is not released. Additionally, it has > > been discussed that a formal voting should take place before > incorporating > > the encryption features in the format. > > > > Now that parquet-cpp has already shipped these changes, we must choose > the > > lesser evil of the following two options: > > > >- Release a parquet-cpp 1.6.0 with a breaking change and risk that > >encrypted data files already written with parquet-cpp 1.5.0 will not > be > >readable any more. > >- Release the encryption in parquet-format as it is, regardless of > >voting results and discard PARQUET-1419. > > > > Personally I have a hard time deciding which one I consider lesser evil. > > What are your opinions? > > > > Thanks, > > > > Zoltan > > >
Re: Parquet-cpp has already released unofficial thrift changes
Hi, There is a third options. PARQUET-1419 doesn't have to be breaking. Removal of the 'required boolean encrypted_footer" is not a must (it can be set to true, and checked in readers). I can modify PARQUET-1419 pr to keep this field - then we have only one new field, an optional one, which wouldn't introduce a breaking change. Release of parquet-format with it would be compatible with parquet-cpp 1.5.0. Cheers, Gidon. On Tue, Sep 25, 2018 at 8:10 PM Zoltan Ivanfi wrote: > Hi, > > On the Parquet sync we discussed that the practice of maintaining a copy of > parquet.thrift in parquet-cpp is dangerous and that we must take care to > not release parquet-format changes in parquet-cpp before we officially > release them in parquet-format. As I got back to my computer and started to > create a JIRA about this, I noticed that unfortunately this has already > happened. > > The encryption-releated parquet.thrift changes have not only been added to > only parquet-format, but to parquet-cpp as well, and these changes got > released in parquet-cpp 1.5.0. This is very unfortunate, because > PARQUET-1419 would change the encryption in a breaking way, which is only > acceptable as long as the original is not released. Additionally, it has > been discussed that a formal voting should take place before incorporating > the encryption features in the format. > > Now that parquet-cpp has already shipped these changes, we must choose the > lesser evil of the following two options: > >- Release a parquet-cpp 1.6.0 with a breaking change and risk that >encrypted data files already written with parquet-cpp 1.5.0 will not be >readable any more. >- Release the encryption in parquet-format as it is, regardless of >voting results and discard PARQUET-1419. > > Personally I have a hard time deciding which one I consider lesser evil. > What are your opinions? > > Thanks, > > Zoltan >
Re: Parquet-cpp has already released unofficial thrift changes
Hi, As a short update, I just checked the PR for PARQUET-1419 and although in its current form it is a breaking change, it can be easily rewritten to become backwards-compatible so this part of the problem does not apply any more. Br, Zoltan On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi wrote: > Hi, > > On the Parquet sync we discussed that the practice of maintaining a copy > of parquet.thrift in parquet-cpp is dangerous and that we must take care to > not release parquet-format changes in parquet-cpp before we officially > release them in parquet-format. As I got back to my computer and started to > create a JIRA about this, I noticed that unfortunately this has already > happened. > > The encryption-releated parquet.thrift changes have not only been added to > only parquet-format, but to parquet-cpp as well, and these changes got > released in parquet-cpp 1.5.0. This is very unfortunate, because > PARQUET-1419 would change the encryption in a breaking way, which is only > acceptable as long as the original is not released. Additionally, it has > been discussed that a formal voting should take place before incorporating > the encryption features in the format. > > Now that parquet-cpp has already shipped these changes, we must choose the > lesser evil of the following two options: > >- Release a parquet-cpp 1.6.0 with a breaking change and risk that >encrypted data files already written with parquet-cpp 1.5.0 will not be >readable any more. >- Release the encryption in parquet-format as it is, regardless of >voting results and discard PARQUET-1419. > > Personally I have a hard time deciding which one I consider lesser evil. > What are your opinions? > > Thanks, > > Zoltan >