Re: Moving forward with the timestamp proposal

2019-02-21 Thread Zoltan Ivanfi
in Parquet/Orc > first? Or are we going to use low-level physical types directly and add > Spark-specific metadata to Parquet/Orc files? > > On Wed, Feb 20, 2019 at 10:57 PM Zoltan Ivanfi > wrote: > > > Hi, > > > > Last December we shared a timestamp harmon

Moving forward with the timestamp proposal

2019-02-20 Thread Zoltan Ivanfi
Hi, Last December we shared a timestamp harmonization proposal with the Hive, Spark and Impala communities. This was followed by an extensive discussion in January that led to various updates and improvements to the proposal, as well as the creation of a new document for f

Re: Adding more timestamp types to on-disk storage formats

2019-01-22 Thread Zoltan Ivanfi
t in > China. > > Thanks, > Quanlong > > On 2019/01/17 16:33:36, Zoltan Ivanfi wrote: > > Hi, > > > > One piece of feedback I got for the SQL timestamp type harmonization > > proposal was that I should reach out to the file format communities as > >

Adding more timestamp types to on-disk storage formats

2019-01-17 Thread Zoltan Ivanfi
Hi, One piece of feedback I got for the SQL timestamp type harmonization proposal was that I should reach out to the file format communities as well. For this purpose I created a separate document from their perspective and sent it to the Avro, ORC, Parquet, Arrow, Kudu and Iceberg developer lists. Pl

Updated proposal: Consistent timestamp types in Hadoop SQL engines

2018-12-19 Thread Zoltan Ivanfi
Dear All, I would like to thank every reviewer of the consistent timestamps proposal[1] for their time and valuable comments. Based on your feedback, I have updated the proposal. The changes include clarifications, fixes and other improvements as summarized at the end of the document, in the Chang

Re: cloudera.com From headers being re-written on this list

2018-07-12 Thread Zoltan Ivanfi
Hi, I have seen this happening for other e-mail addresses and on other mailing lists as well. I may be wrong, but I would suppose it is a deliberate anti-spam measure. Zoltan On Tue, Jun 26, 2018 at 6:23 PM Michael Brown wrote: > Hi, > > For some reason mail to this list from users @cloudera.c

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-20 Thread Zoltan Ivanfi
up can be included. The addition of NaNs doesn't change > that. > > OTOH, if b <= a <= c, then we have to check the whole row group, and > the addition of NaNs doesn't change that. > > On Tue, Feb 20, 2018 at 9:14 AM, Alexander Behm > wrote: > > On Mon, F
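
The pruning argument quoted above can be sketched in Python. This is a minimal illustration under stated assumptions: `b` and `c` are taken to be the row group's min and max statistics (computed with NaN values ignored), `a` is taken to be the non-NaN literal of an equality predicate `col = a`, and `must_read_row_group` is a hypothetical name, not part of any Parquet or Impala API.

```python
import math


def must_read_row_group(a: float, b: float, c: float) -> bool:
    """Decide whether a row group must be scanned for the predicate `col = a`.

    Hypothetical helper: b and c are assumed to be the row group's min and
    max statistics, computed with NaN values ignored, and a is assumed to be
    an ordinary (non-NaN) literal.
    """
    if math.isnan(a) or math.isnan(b) or math.isnan(c):
        raise ValueError("this sketch only handles non-NaN literals and statistics")
    # If b <= a <= c the row group may contain a, so it has to be checked.
    # Otherwise no value in [b, c] can equal a and the row group can be skipped.
    # NaN rows never equal a non-NaN literal, so NaNs in the data do not
    # change this decision.
    return b <= a <= c
```

For example, must_read_row_group(5.0, 1.0, 10.0) is True (the whole row group has to be checked), while must_read_row_group(11.0, 1.0, 10.0) is False (the row group can be skipped).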

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-19 Thread Zoltan Ivanfi
Hi Tim, I added your suggestion to introduce a new ColumnOrder to PARQUET-1222 as the preferred solution. Alex, not writing min/max if there is a NaN is indeed a feasible quick fix, but I think it would be better to just ignore NaNs for the p
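
The two NaN strategies mentioned above can be contrasted in a short Python sketch. This assumes that the truncated sentence refers to computing min/max statistics; both function names are hypothetical and do not correspond to any Parquet writer API.

```python
import math
from typing import Iterable, Optional, Tuple

# (min, max) pair, or None when no statistics are written.
Stats = Optional[Tuple[float, float]]


def stats_omit_on_nan(values: Iterable[float]) -> Stats:
    """Quick-fix variant: write no min/max at all if any value is NaN."""
    vals = list(values)
    if not vals or any(math.isnan(v) for v in vals):
        return None
    return min(vals), max(vals)


def stats_ignore_nan(values: Iterable[float]) -> Stats:
    """Alternative variant: compute min/max over the non-NaN values only."""
    non_nan = [v for v in values if not math.isnan(v)]
    if not non_nan:
        # Only NaNs (or no values at all): nothing meaningful to record.
        return None
    return min(non_nan), max(non_nan)
```

For a column chunk containing [1.0, NaN, 3.0], the quick fix writes no statistics, so the row group can never be pruned, whereas ignoring NaNs still yields (1.0, 3.0) and keeps pruning possible for non-NaN predicates.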

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-16 Thread Zoltan Ivanfi
' and 'inf' with min/max analytic fns": the discussion there > offers > > notable points on: > > 1. How Impala handles similar problems in different (but related) areas, > > 2. How other database products (Hive, PostgreSQL, etc.) handle similar > > iss

Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-15 Thread Zoltan Ivanfi
Dear Parquet and Impala Developers, We have exposed min/max statistics to extensive compatibility testing and found troubling inconsistencies regarding float and double values. Under certain (fortunately rather extreme) circumstances, this can lead to predicate pushdown incorrectly discarding row
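
One general way NaN values can make min/max statistics unreliable is shown below in Python. This illustrates the hazard in general; it is not a reproduction of the specific writer implementations tested in the thread. Python's built-in min and max only replace the running result when a comparison is true, and every ordered comparison with NaN is false, so the result depends on where the NaN appears.

```python
import math

nan = float("nan")

# The same multiset of values in two different orders.
first_nan = [nan, 1.0, 3.0]
nan_in_middle = [1.0, nan, 3.0]

# A leading NaN "sticks", because no later value compares smaller or larger.
print(min(first_nan), max(first_nan))          # nan nan
print(min(nan_in_middle), max(nan_in_middle))  # 1.0 3.0

# A reader pruning with `min <= x <= max` would then wrongly skip this row
# group when looking for x = 3.0, even though 3.0 is present.
lo, hi = min(first_nan), max(first_nan)
print(lo <= 3.0 <= hi)  # False
```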