Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-20 Thread Alexander Behm
whole row group can be included. The addition of NaNs doesn't change
> > that.
> >
> > OTOH, if b <= a <= c, then we have to check the whole row group, and
> > the addition of NaNs doesn't change that.
> >
> > On Tue, Feb 20, 2018 at 9:14 AM, Alexande
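In this exchange, b and c appear to be a row group's min and max statistics, and a appears to be the value a query looks for. That reading is an assumption, because the start of the message is cut off. Under that assumption, with an equality predicate `x == a`, the reader-side pruning rule can be sketched as follows. The function name and signature are hypothetical and do not come from any Parquet library. The sketch treats a NaN anywhere in the inputs as "cannot prune", which is one conservative choice and not a spec requirement.

```python
import math
from typing import Optional


def can_skip_row_group(a: float, stat_min: Optional[float],
                       stat_max: Optional[float]) -> bool:
    """Return True only if no row in the group can satisfy `x == a`.

    Hypothetical helper: stat_min and stat_max play the roles of b and c.
    """
    # No statistics: nothing can be proven, so the row group must be read.
    if stat_min is None or stat_max is None:
        return False
    # A NaN in the predicate or the statistics makes the ordering unusable,
    # so fall back to reading the row group (conservative choice).
    if math.isnan(a) or math.isnan(stat_min) or math.isnan(stat_max):
        return False
    # a outside [b, c]: the row group can be skipped.
    # b <= a <= c: the whole row group has to be checked.
    return a < stat_min or a > stat_max
```

For example, `can_skip_row_group(5.0, 1.0, 4.0)` allows the row group to be skipped. `can_skip_row_group(3.0, 1.0, 4.0)` does not, because 3.0 lies inside [1.0, 4.0].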

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-20 Thread Alexander Behm
On Mon, Feb 19, 2018 at 8:04 AM, Zoltan Ivanfi wrote:
> Hi,
>
> Tim, I added your suggestion to introduce a new ColumnOrder to PARQUET-1222
> as the preferred
> solution.
>
> Alex, not writing min/max if there is a NaN is indeed a feasible quic

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-16 Thread Alexander Behm
DefinedOrderWithCorrectOrderingForDoubles". We could also count
> > up,
> > > > like TypeDefinedOrderV2 and so on.
> > > >
> > > > An alternative would be to list all writers that are known to have
> > > written
> > > > incorrect st
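This message describes two options. One is a new column order, such as the proposed TypeDefinedOrderV2. The other is a list of writers known to be affected. From the reader's side, the two could be sketched as below. Everything here is hypothetical: the enum is a stand-in and not the Parquet Thrift definition, `TYPE_DEFINED_ORDER_V2` only mirrors the name floated in the thread, and matching `created_by` as an exact string is a simplification.

```python
from enum import Enum
from typing import FrozenSet


class ColumnOrder(Enum):
    """Hypothetical stand-ins for column-order values (not the spec's)."""
    UNDEFINED = "undefined"
    TYPE_DEFINED_ORDER = "TypeDefinedOrder"
    TYPE_DEFINED_ORDER_V2 = "TypeDefinedOrderV2"  # proposed name only


def trust_by_new_order(order: ColumnOrder) -> bool:
    # Option 1: only the new column order promises correct float/double stats.
    return order is ColumnOrder.TYPE_DEFINED_ORDER_V2


def trust_by_writer_list(order: ColumnOrder, created_by: str,
                         known_bad_writers: FrozenSet[str]) -> bool:
    # Option 2: keep the existing order, but distrust listed writers.
    if order is ColumnOrder.UNDEFINED:
        return False
    # Exact string match is a simplification; a real reader would more
    # likely parse the writer version and compare it against known ranges.
    return created_by not in known_bad_writers
```

The first option needs no knowledge of past writers but ignores statistics in all existing files. The second option keeps older statistics usable, at the cost of maintaining the list.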

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-16 Thread Alexander Behm
aN in the data, so the reader can't do anything useful
> with the stats anyway unless it's NaN-aware.
> The writer solution is to only write stats if the data does not contain special values (common case).
>
> On Fri, Feb 16, 2018 at 9:03 AM, Alexander Behm
> wrote:
>
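This minimal sketch shows why statistics from a NaN-unaware writer are hard to use. It assumes the writer keeps a running min/max with plain `<` and `>` comparisons. That is an assumption for illustration, and actual writers may compute statistics differently. Because every comparison involving NaN is false, the result depends on where the NaN appears in the data.

```python
from typing import List, Tuple


def naive_min_max(values: List[float]) -> Tuple[float, float]:
    """Running min/max using plain < and >, with no NaN handling."""
    lo = hi = values[0]
    for v in values[1:]:
        # Every comparison involving NaN is False: a NaN that becomes the
        # current min/max is never replaced, and a later NaN is never chosen.
        if v < lo:
            lo = v
        if v > hi:
            hi = v
    return lo, hi


# Same values, different order, different "statistics".
print(naive_min_max([float("nan"), 1.0, 2.0]))  # (nan, nan)
print(naive_min_max([1.0, float("nan"), 2.0]))  # (1.0, 2.0)
```

In the second case the recorded range [1.0, 2.0] gives no hint that a NaN is present. A reader that is not NaN-aware cannot detect this from the statistics alone.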

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-16 Thread Alexander Behm
I hope the common case is that data files do not contain these special float values. As the simplest solution, how about writers refrain from populating the stats if a special value is encountered? That fix does not preclude a more thorough solution in the future, but it addresses the common case
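The writer-side quick fix proposed here could look roughly like the sketch below, which omits min/max for a column chunk as soon as a special value is seen. The sketch assumes NaN is the only special value. Whether infinities or signed zeros should also suppress statistics is not settled in these excerpts. The function name is hypothetical, and `None` stands for a null entry.

```python
import math
from typing import Iterable, Optional, Tuple


def float_column_stats(
    values: Iterable[Optional[float]],
) -> Optional[Tuple[float, float]]:
    """Return (min, max) for a column chunk, or None to omit min/max.

    Hypothetical helper: nulls (None) are ignored, and any NaN causes the
    statistics to be dropped entirely instead of being written wrongly.
    """
    lo: Optional[float] = None
    hi: Optional[float] = None
    for v in values:
        if v is None:
            continue
        if math.isnan(v):
            return None  # special value encountered: write no min/max
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    if lo is None or hi is None:
        return None  # empty or all-null chunk: nothing to record
    return (lo, hi)
```

For example, `float_column_stats([1.0, 3.5, None, -2.0])` yields `(-2.0, 3.5)`, while any input containing `float("nan")` yields `None`. A reader then simply sees no usable min/max and scans the row group.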