[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

The behavior follows the way fmax()/fmin() works, ie. Impala
will only write NaN into the stats when all the values are NaNs.
This behavior is aligned with the quick fix of Parquet-CPP.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Reviewed-on: http://gerrit.cloudera.org:8080/9381
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins
---
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 90 insertions(+), 17 deletions(-)

Approvals:
  Lars Volker: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 08 Mar 2018 07:34:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-07 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 08 Mar 2018 03:54:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2062/


--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 08 Mar 2018 03:54:21 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-07 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 08 Mar 2018 03:53:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-06 Thread Zoltan Borok-Nagy (Code Review)
Hello Lars Volker, Gabor Kaszab, Jim Apple, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#6).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

The behavior follows the way fmax()/fmin() works, ie. Impala
will only write NaN into the stats when all the values are NaNs.
This behavior is aligned with the quick fix of Parquet-CPP.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 90 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/6
--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9381/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9381/5//COMMIT_MSG@12
PS5, Line 12: ignores
> If two NaN's are inserted, it wouldn't ignore them but write them into the
Yeah, the next sentence describes that case. Let me rephrase it.

I'm not sure if this is the best option, but it was intentional. I had a 
discussion with the parquet-cpp community about the behavior of the quick fix. 
We agreed to follow what fmax/fmin does: they only return NaN when both 
arguments are NaNs.

IMPALA-6539 is about the long-term fix, but it can't be implemented until 
PARQUET-1222 is resolved.


http://gerrit.cloudera.org:8080/#/c/9381/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@580
PS5, Line 580: 
> Can you add a test that inserts multiple NaNs, with and without following t
Added some tests for these cases.



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 06 Mar 2018 17:25:56 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-05 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9381/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9381/5//COMMIT_MSG@12
PS5, Line 12: ignores
If two NaN's are inserted, it wouldn't ignore them but write them into the 
stats, no? Is this desired?


http://gerrit.cloudera.org:8080/#/c/9381/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/5/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@580
PS5, Line 580: 
Can you add a test that inserts multiple NaNs, with and without following them 
with a non-NaN?



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 05 Mar 2018 21:55:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-05 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 5: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 05 Mar 2018 11:13:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-02 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 5:

(2 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37:
> If you prefer, you can use
Yes, it is also a viable approach, however I'd stick with the current solution 
because of the followings:
* more concise
* fewer enable_ifs
* decltype(auto) chooses the right return type we need, and I think it is more 
readable as well


http://gerrit.cloudera.org:8080/#/c/9381/4/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/9381/4/be/src/util/stat-util.h@29
PS4, Line 29:   template 
> I think this would be better placed in some parquet-specific files since it
Yeah I agree, btw ColumnStatsBase was my other candidate as well.

Fixed.



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 02 Mar 2018 20:08:46 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-03-02 Thread Zoltan Borok-Nagy (Code Review)
Hello Lars Volker, Gabor Kaszab, Jim Apple, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#5).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.h
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 41 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/5
--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-24 Thread Jim Apple (Code Review)
Jim Apple has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point::value>
> you're welcome!
If you prefer, you can use

template
const std::enable_if_t{}, T> &
Bigger(const T& x, const T& y) {
  return std::max(x, y);
}

template
std::enable_if_t
Bigger(T x, T y) {
  return std::fmax(x, y);
}



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Sat, 24 Feb 2018 19:12:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-23 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/4/be/src/util/stat-util.h
File be/src/util/stat-util.h:

http://gerrit.cloudera.org:8080/#/c/9381/4/be/src/util/stat-util.h@29
PS4, Line 29: struct MinMaxTrait {
I think this would be better placed in some parquet-specific files since it is 
the min/max ordering we use for parquet files. The other function in stat-util 
looks like it's for things to compute statistical values. My preference would 
be to have it as a class member of ColumnStatsBase.



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 23 Feb 2018 17:15:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-23 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9381/3/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/3/be/src/exec/parquet-column-stats.inline.h@29
PS3, Line 29: template 
> nit: Shouldn't these live inside either ColumnStats or somewhere in util di
Yeah, probably it is not the best place for them.

ColumnStats is a template class and putting it there would be confusing I think.
Maybe I could nest it inside ColumnStatsBase.

Anyway, I found an unused header in the util dir called stat-util.h, I moved 
the definitions there for now.


http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point::value>
> Thanks for checking and for the explanations.
you're welcome!


http://gerrit.cloudera.org:8080/#/c/9381/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/1/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@507
PS1, Line 507: select * from test_nan where val < 100
> nit: .
added . here, and also to some other sentences.



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 23 Feb 2018 11:31:53 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-23 Thread Zoltan Borok-Nagy (Code Review)
Hello Lars Volker, Gabor Kaszab, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#4).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.inline.h
M be/src/util/stat-util.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
3 files changed, 44 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/4
--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-22 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 3:

(1 comment)

Thanks for the explanation, Zoli!
If there is no way to merge the general and the floating point cases to one 
place, than I'm fine with it.

http://gerrit.cloudera.org:8080/#/c/9381/3/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/3/be/src/exec/parquet-column-stats.inline.h@29
PS3, Line 29: template 
nit: Shouldn't these live inside either ColumnStats or somewhere in util dir?



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 22 Feb 2018 16:53:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-22 Thread Zoltan Borok-Nagy (Code Review)
Hello Lars Volker, Gabor Kaszab, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#3).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
2 files changed, 32 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/3
--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-22 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@495
PS2, Line 495: # IMPALA-6527: NaN values lead to incorrect filtering.
The following 3 tests were introduced with the read path fix. Do you think 
these are still required since we have a separate test suite to test the reader 
(with the inputs written by an incorrect writer)?



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 22 Feb 2018 15:53:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-22 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point::value>
> I just realized that these kind of constexpr ifs will only be valid in C++1
I'm not sure I get this. I played around a little bit with putting 
is_floating_point::value to an if condition and for me this seems working 
even with c++11. Am I missing something?


http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
File testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test:

http://gerrit.cloudera.org:8080/#/c/9381/2/testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test@530
PS2, Line 530: nof
nit: Typo or some abbreviation I don't know? :)



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 22 Feb 2018 15:32:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-22 Thread Zoltan Borok-Nagy (Code Review)
Hello Lars Volker, Gabor Kaszab, Tim Armstrong,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/9381

to look at the new patch set (#2).

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
2 files changed, 39 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/2
--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point::value>
> sgtm, thanks.
I just realized that these kind of constexpr ifs will only be valid in C++17.

In C++14 it would generate a compile-time error since there is no fmax() for 
StringValue and so on.

I think the current solution with the comment is readable, but if you think 
normal function overloads or full template specializations would be better, 
I'll change the code.



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Feb 2018 22:54:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-21 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point::value>
> Yes, but I would need to do it for float and double as well.
sgtm, thanks.



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Feb 2018 22:52:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point::value>
> Wouldn't normal function overloads work here, too?
Yes, but I would need to do it for float and double as well.

The 'if' can also work, however the return type must be 'T' in that case.

Currently the return type is 'const T&' for std::min/max, and 'T' for 
fmin/fmax. But, I suspect it doesn't really matter because the extra work will 
be optimized away anyway.

I'll choose the 'if' if you don't have a strong preference for the other.



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 21 Feb 2018 19:45:28 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-21 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9381 )

Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9381/1/be/src/exec/parquet-column-stats.inline.h@37
PS1, Line 37: std::is_floating_point::value>
Wouldn't normal function overloads work here, too?

Alternatively, wouldn't this also work using a if inside a method?

  if (std::is_floating_point::value) ... else ...;

The compiler should still be able to optimize it all away but it might be more 
concise and easier to follow.



--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 21 Feb 2018 19:31:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

2018-02-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/9381


Change subject: IMPALA-6542: Fix inconsistent write path of Parquet min/max 
statistics
..

IMPALA-6542: Fix inconsistent write path of Parquet min/max statistics

Quick fix of Parquet write path until the Parquet community
agrees on the ordering of floating point numbers.

This commit ignores NaNs during min/max statistics
calculation. Only write NaN as 'min_value' and 'max_value'
if all the values are NaNs in the column chunk.

Added e2e tests as well.

Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
---
M be/src/exec/parquet-column-stats.inline.h
M testdata/workloads/functional-query/queries/QueryTest/parquet-stats.test
2 files changed, 36 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/9381/1
--
To view, visit http://gerrit.cloudera.org:8080/9381
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3957806948f7c661af4be5495f2ec92d1e9fc9d6
Gerrit-Change-Number: 9381
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy