Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-22 Thread Zoltan Borok-Nagy
OK, thanks!

On Wed, Feb 21, 2018 at 7:33 PM, Deepak Majeti wrote:

> Yes! The min/max will be set to NaN in the case when all the values are
> NaN.

Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-21 Thread Deepak Majeti
Yes! The min/max will be set to NaN in the case when all the values are NaN.

On Wed, Feb 21, 2018 at 10:54 AM, Zoltan Borok-Nagy  wrote:

> Deepak, just for clarification, does it mean that parquet-cpp will also
> write statistics when all the values are NaN?

Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-21 Thread Zoltan Borok-Nagy
Deepak, just for clarification, does it mean that parquet-cpp will also
write statistics when all the values are NaN?


On Wed, Feb 21, 2018 at 1:16 PM, Deepak Majeti wrote:

> I am okay with this proposed fix for Impala.


Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-21 Thread Deepak Majeti
I am okay with this proposed fix for Impala.

On Tue, Feb 20, 2018 at 5:46 PM, Zoltan Borok-Nagy wrote:

> Hi,
>
> I'm implementing the quick fix for Impala. The current proposal for the
> write path fix is to behave like the fmax()/fmin() functions in math.h,
> i.e., ignore NaNs, except for the case when all the values are NaN.
>
> http://en.cppreference.com/w/c/numeric/math/fmax
> https://issues.apache.org/jira/browse/IMPALA-6542
>
> But it is also OK for me if you think that we should not write
> statistics at all when all the values are NaN. I just think that the
> chosen behavior should be identical in Impala and parquet-cpp.
>
> BR,
> Zoltan-BN



-- 
regards,
Deepak Majeti


Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-20 Thread Zoltan Borok-Nagy
Hi,

I'm implementing the quick fix for Impala. The current proposal for the
write path fix is to behave like the fmax()/fmin() functions in math.h,
i.e., ignore NaNs, except for the case when all the values are NaN.

http://en.cppreference.com/w/c/numeric/math/fmax
https://issues.apache.org/jira/browse/IMPALA-6542

But it is also OK for me if you think that we should not write
statistics at all when all the values are NaN. I just think that the
chosen behavior should be identical in Impala and parquet-cpp.

BR,
Zoltan-BN
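
For illustration, the fmin()/fmax()-style write path described above could look roughly like the sketch below. It is not taken from the Impala or parquet-cpp sources; the helper name MinMaxIgnoringNaN and the use of a std::vector<double> as the column chunk are assumptions made only for this example.

```cpp
#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper (not an Impala or parquet-cpp function): folds the
// values of one column chunk into a (min, max) pair with std::fmin/std::fmax.
std::pair<double, double> MinMaxIgnoringNaN(const std::vector<double>& values) {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  double min = nan;
  double max = nan;
  for (double v : values) {
    // When exactly one argument is NaN, fmin/fmax return the other argument,
    // so NaNs drop out as soon as one ordinary value has been seen.
    min = std::fmin(min, v);
    max = std::fmax(max, v);
  }
  // Both results are still NaN if every value was NaN (or the input was empty).
  return {min, max};
}
```

Starting the accumulators at NaN is what makes the all-NaN case fall out naturally: no special branch is needed, and the result is NaN only when no ordinary value was ever seen.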



On Tue, Feb 20, 2018 at 5:57 PM, Uwe L. Korn  wrote:

> Due to the issues raised, I will close this RC and once all patches are
> merged, I will build a new one.
>
> Uwe


Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-20 Thread Uwe L. Korn
Due to the issues raised, I will close this RC and, once all patches are
merged, I will build a new one.

Uwe

On Tue, Feb 20, 2018, at 1:48 AM, Deepak Majeti wrote:
> Wes, Zoltan,
> 
> I am taking a look at the issue now. I will handle the patch for this one.
> Thanks!


Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-19 Thread Deepak Majeti
Wes, Zoltan,

I am taking a look at the issue now. I will handle the patch for this one.
Thanks!

On Tue, Feb 20, 2018 at 12:54 AM, Wes McKinney  wrote:
> Hi Zoltan -- my quick read is that one appropriate fix in parquet-cpp
> would be to exclude NaN values from statistics calculations. There is
> also the case where the whole row group is NaN for a column; perhaps we
> should not write statistics at all in that case? This might not take too
> long to fix in parquet-cpp, and we have some other patches up that we
> could merge in as well.
>
> Deepak, Phillip, or Uwe, do you have any time to look at this? I can
> also make time to look.
>
> Thanks
> Wes



-- 
regards,
Deepak Majeti


Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-19 Thread Wes McKinney
Hi Zoltan -- my quick read is that one appropriate fix in parquet-cpp
would be to exclude NaN values from statistics calculations. There is
also the case where the whole row group is NaN for a column; perhaps we
should not write statistics at all in that case? This might not take too
long to fix in parquet-cpp, and we have some other patches up that we
could merge in as well.

Deepak, Phillip, or Uwe, do you have any time to look at this? I can
also make time to look.

Thanks
Wes
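
As a rough sketch of the alternative raised above (skip NaN values and write no statistics when nothing else remains), something like the following would do. The function name MinMaxOrNone and the choice of std::optional (C++17) to signal "omit statistics" are assumptions for this example, not the parquet-cpp API.

```cpp
#include <cmath>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical helper, not part of parquet-cpp: returns std::nullopt when the
// column chunk holds no non-NaN value, meaning "do not write statistics".
std::optional<std::pair<double, double>> MinMaxOrNone(
    const std::vector<double>& values) {
  std::optional<std::pair<double, double>> stats;
  for (double v : values) {
    if (std::isnan(v)) continue;  // NaN never takes part in min/max.
    if (!stats) {
      stats = std::make_pair(v, v);  // First ordinary value seeds both bounds.
    } else {
      if (v < stats->first) stats->first = v;
      if (v > stats->second) stats->second = v;
    }
  }
  return stats;
}
```

A writer built on this would emit min/max only when the result has a value, so a reader never sees NaN in the statistics.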

On Mon, Feb 19, 2018 at 9:39 AM, Zoltan Ivanfi  wrote:
> Hi,
>
> I wonder whether the fix for PARQUET-1225 should be included in
> the next release, even if it causes a delay.
>
> Br,
>
> Zoltan
>
> On Sun, Feb 18, 2018 at 10:10 PM Uwe L. Korn  wrote:
>
>> +1 (binding)
>>
>> verified on Ubuntu 16.04
>> verified on macOS High Sierra but needed to set the following env vars to
>> get Thrift 0.11 building:
>>
>> export OPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2n
>> export PATH="/usr/local/opt/bison/bin:$PATH"
>>
>> On Sun, Feb 18, 2018, at 10:09 PM, Uwe L. Korn wrote:
>> > All,
>> >
>> > I propose that we accept the following release candidate as the official
>> > Apache Parquet C++ 1.4.0 release.
>> >
>> > Parquet C++ 1.4.0-rc0 includes the following:
>> > ---
>> > The CHANGELOG for the release is available at:
>> >
>> https://gitbox.apache.org/repos/asf?p=parquet-cpp.git&f=CHANGELOG&hb=apache-parquet-cpp-1.4.0-rc0
>> >
>> > The tag used to create the release candidate is:
>> >
>> https://gitbox.apache.org/repos/asf?p=parquet-cpp.git;a=shortlog;h=refs/tags/apache-parquet-cpp-1.4.0-rc0
>> >
>> > The release candidate is available at:
>> >
>> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-cpp-1.4.0-rc0/apache-parquet-cpp-1.4.0.tar.gz
>> >
>> > The MD5 checksum of the release candidate can be found at:
>> >
>> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-cpp-1.4.0-rc0/apache-parquet-cpp-1.4.0.tar.gz.md5
>> >
>> > The signature of the release candidate can be found at:
>> >
>> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-cpp-1.4.0-rc0/apache-parquet-cpp-1.4.0.tar.gz.asc
>> >
>> > The GPG key used to sign the release is available at:
>> > https://dist.apache.org/repos/dist/dev/parquet/KEYS
>> >
>> > The release is based on the commit hash
>> > 76388ea4eb8b23656283116bc656b0c8f5db093b.
>> >
>> > Please download, verify, and test.
>> >
>> > The vote will close on Wed 21 Feb 21:41:23 CET 2018
>> >
>> > [ ] +1 Release this as Apache Parquet C++ 1.4.0
>> > [ ] +0
>> > [ ] -1 Do not release this as Apache Parquet C++ 1.4.0 because...
>>


Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-19 Thread Zoltan Ivanfi
Hi,

I wonder whether the fix for PARQUET-1225 should be included in
the next release, even if it causes a delay.

Br,

Zoltan

On Sun, Feb 18, 2018 at 10:10 PM Uwe L. Korn  wrote:

> +1 (binding)
>
> verified on Ubuntu 16.04
> verified on macOS High Sierra but needed to set the following env vars to
> get Thrift 0.11 building:
>
> export OPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2n
> export PATH="/usr/local/opt/bison/bin:$PATH"
>
> On Sun, Feb 18, 2018, at 10:09 PM, Uwe L. Korn wrote:
> > All,
> >
> > I propose that we accept the following release candidate as the official
> > Apache Parquet C++ 1.4.0 release.
> >
> > Parquet C++ 1.4.0-rc0 includes the following:
> > ---
> > The CHANGELOG for the release is available at:
> >
> https://gitbox.apache.org/repos/asf?p=parquet-cpp.git&f=CHANGELOG&hb=apache-parquet-cpp-1.4.0-rc0
> >
> > The tag used to create the release candidate is:
> >
> https://gitbox.apache.org/repos/asf?p=parquet-cpp.git;a=shortlog;h=refs/tags/apache-parquet-cpp-1.4.0-rc0
> >
> > The release candidate is available at:
> >
> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-cpp-1.4.0-rc0/apache-parquet-cpp-1.4.0.tar.gz
> >
> > The MD5 checksum of the release candidate can be found at:
> >
> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-cpp-1.4.0-rc0/apache-parquet-cpp-1.4.0.tar.gz.md5
> >
> > The signature of the release candidate can be found at:
> >
> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-cpp-1.4.0-rc0/apache-parquet-cpp-1.4.0.tar.gz.asc
> >
> > The GPG key used to sign the release is available at:
> > https://dist.apache.org/repos/dist/dev/parquet/KEYS
> >
> > The release is based on the commit hash
> > 76388ea4eb8b23656283116bc656b0c8f5db093b.
> >
> > Please download, verify, and test.
> >
> > The vote will close on Wed 21 Feb 21:41:23 CET 2018
> >
> > [ ] +1 Release this as Apache Parquet C++ 1.4.0
> > [ ] +0
> > [ ] -1 Do not release this as Apache Parquet C++ 1.4.0 because...
>


Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-18 Thread Uwe L. Korn
+1 (binding)

verified on Ubuntu 16.04
verified on macOS High Sierra but needed to set the following env vars to get 
Thrift 0.11 building:

export OPENSSL_ROOT_DIR=/usr/local/Cellar/openssl/1.0.2n
export PATH="/usr/local/opt/bison/bin:$PATH"

On Sun, Feb 18, 2018, at 10:09 PM, Uwe L. Korn wrote:
> All,
> 
> I propose that we accept the following release candidate as the official
> Apache Parquet C++ 1.4.0 release.
> 
> Parquet C++ 1.4.0-rc0 includes the following:
> ---
> The CHANGELOG for the release is available at:
> https://gitbox.apache.org/repos/asf?p=parquet-cpp.git&f=CHANGELOG&hb=apache-parquet-cpp-1.4.0-rc0
> 
> The tag used to create the release candidate is:
> https://gitbox.apache.org/repos/asf?p=parquet-cpp.git;a=shortlog;h=refs/tags/apache-parquet-cpp-1.4.0-rc0
> 
> The release candidate is available at:
> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-cpp-1.4.0-rc0/apache-parquet-cpp-1.4.0.tar.gz
> 
> The MD5 checksum of the release candidate can be found at:
> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-cpp-1.4.0-rc0/apache-parquet-cpp-1.4.0.tar.gz.md5
> 
> The signature of the release candidate can be found at:
> https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-cpp-1.4.0-rc0/apache-parquet-cpp-1.4.0.tar.gz.asc
> 
> The GPG key used to sign the release is available at:
> https://dist.apache.org/repos/dist/dev/parquet/KEYS
> 
> The release is based on the commit hash 
> 76388ea4eb8b23656283116bc656b0c8f5db093b.
> 
> Please download, verify, and test.
> 
> The vote will close on Wed 21 Feb 21:41:23 CET 2018
> 
> [ ] +1 Release this as Apache Parquet C++ 1.4.0
> [ ] +0
> [ ] -1 Do not release this as Apache Parquet C++ 1.4.0 because...