[jira] [Assigned] (PARQUET-1225) NaN values may lead to incorrect filtering under certain circumstances

2018-02-19 Thread Deepak Majeti (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Majeti reassigned PARQUET-1225: -- Assignee: Deepak Majeti > NaN values may lead to incorrect filtering under certain

Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-19 Thread Deepak Majeti
Wes, Zoltan, I am taking a look at the issue now. I will handle the patch for this one. Thanks! On Tue, Feb 20, 2018 at 12:54 AM, Wes McKinney wrote: > hi Zoltan -- my quick read is that one appropriate fix in parquet-cpp > would be to exclude NaN values from statistics

Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-19 Thread Zoltan Ivanfi
Hi, I wonder whether the fix for PARQUET-1225 should be included in the next release, even if it causes a delay. Br, Zoltan On Sun, Feb 18, 2018 at 10:10 PM Uwe L. Korn wrote: > +1 (binding) > > verified on Ubuntu 16.04 >

[jira] [Created] (PARQUET-1223) Implement specification-compliant floating point comparison

2018-02-19 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1223: -- Summary: Implement specification-compliant floating point comparison Key: PARQUET-1223 URL: https://issues.apache.org/jira/browse/PARQUET-1223 Project: Parquet

[jira] [Created] (PARQUET-1225) NaN values may lead to incorrect filtering under certain circumstances

2018-02-19 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1225: -- Summary: NaN values may lead to incorrect filtering under certain circumstances Key: PARQUET-1225 URL: https://issues.apache.org/jira/browse/PARQUET-1225

[jira] [Updated] (PARQUET-1222) Definition of float and double sort order is ambiguous

2018-02-19 Thread Zoltan Ivanfi (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1222: --- Description: Currently parquet-format specifies the sort order for floating point numbers

[jira] [Updated] (PARQUET-1222) Definition of float and double sort order is ambiguous

2018-02-19 Thread Zoltan Ivanfi (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi updated PARQUET-1222: --- Description: Currently parquet-format specifies the sort order for floating point numbers

[jira] [Commented] (PARQUET-1208) Occasional endless loop in unit test

2018-02-19 Thread Zoltan Ivanfi (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369359#comment-16369359 ] Zoltan Ivanfi commented on PARQUET-1208: https://github.com/apache/parquet-mr/pull/455 >

[jira] [Resolved] (PARQUET-1208) Occasional endless loop in unit test

2018-02-19 Thread Zoltan Ivanfi (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Ivanfi resolved PARQUET-1208. Resolution: Fixed Fix Version/s: 1.10.0 > Occasional endless loop in unit test >

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-19 Thread Tim Armstrong
We could drop NaNs and require that -0 be normalised to +0 when writing out stats. That would remove any degrees of freedom from the writer and then straightforward comparison with =, <, >, >=, <=, != would work as expected. On Mon, Feb 19, 2018 at 8:04 AM, Zoltan Ivanfi
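The proposal in the message above (drop NaNs and normalise -0 to +0 before writing stats) can be illustrated with a minimal Python sketch. The function name `normalised_stats` and the choice to return `None` when nothing remains are assumptions made for illustration; this is not code from parquet-mr, parquet-cpp, or Impala.

```python
import math


def normalised_stats(values):
    """Hypothetical helper: (min, max) with NaNs dropped and -0.0 stored as +0.0.

    Returns None when no non-NaN value remains (an assumed convention).
    """
    # Drop NaNs so that every remaining pair of values is comparable.
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        return None

    def normalise_zero(v):
        # -0.0 == 0.0 is True, so this maps both zeros to +0.0.
        return 0.0 if v == 0 else v

    return normalise_zero(min(kept)), normalise_zero(max(kept))


stats = normalised_stats([float("nan"), -0.0, 1.5])
print(stats)
```

With a single canonical zero and no NaNs in the stored values, a reader can compare a predicate constant against min and max using the ordinary operators without special cases.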

[jira] [Created] (PARQUET-1226) [C++] Fix new build warnings with clang 5.0

2018-02-19 Thread Wes McKinney (JIRA)
Wes McKinney created PARQUET-1226: - Summary: [C++] Fix new build warnings with clang 5.0 Key: PARQUET-1226 URL: https://issues.apache.org/jira/browse/PARQUET-1226 Project: Parquet Issue

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-19 Thread Zoltan Ivanfi
Hi, Tim, I added your suggestion to introduce a new ColumnOrder to PARQUET-1222 as the preferred solution. Alex, not writing min/max if there is a NaN is indeed a feasible quick-fix, but I think it would be better to just ignore NaN-s for the

[jira] [Commented] (PARQUET-1226) [C++] Fix new build warnings with clang 5.0

2018-02-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369573#comment-16369573 ] ASF GitHub Bot commented on PARQUET-1226: - wesm commented on a change in pull request #442:

[jira] [Commented] (PARQUET-1226) [C++] Fix new build warnings with clang 5.0

2018-02-19 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16369587#comment-16369587 ] ASF GitHub Bot commented on PARQUET-1226: - cpcloud commented on a change in pull request #442:

Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-19 Thread Wes McKinney
hi Zoltan -- my quick read is that one appropriate fix in parquet-cpp would be to exclude NaN values from statistics calculations (there is also the case that the whole row group is NaN for a column, in which case we should not write statistics perhaps?)? This might not take too long to fix in
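As a rough illustration of why the message above suggests excluding NaN from statistics, the Python sketch below compares a naive comparison-based min/max with a variant that skips NaN. Both function names are hypothetical, and returning `None` for an all-NaN column is only one reading of "we should not write statistics perhaps"; none of this is the actual parquet-cpp patch.

```python
import math


def naive_min_max(values):
    """Comparison-based min/max that does not treat NaN specially."""
    lo = hi = values[0]
    for v in values[1:]:
        # Every ordered comparison with NaN is False, so the result
        # depends on where the NaN sits in the input.
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return lo, hi


def min_max_excluding_nan(values):
    """Hypothetical fix: skip NaN; return None if every value is NaN."""
    lo = hi = None
    for v in values:
        if math.isnan(v):
            continue
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    return None if lo is None else (lo, hi)


nan = float("nan")
print(naive_min_max([nan, 1.0, 2.0]))          # NaN comes first and sticks
print(naive_min_max([1.0, nan, 2.0]))          # NaN is silently skipped
print(min_max_excluding_nan([nan, 1.0, 2.0]))
print(min_max_excluding_nan([nan, nan]))
```

The naive version yields different statistics for the same set of values depending on their order, which is how a reader that filters on min/max can end up skipping row groups incorrectly.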