[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.

2016-08-30 Thread Internal Jenkins (Code Review)
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..


IMPALA-1657: Rework detection and reporting of corrupt table stats.

1. Minor fixes for cardinality estimation of unpartitioned tables.
2. Reworks handling of corrupt table stats as follows:
   The stats of a table or partition are reported as corrupt if the
   numRows < -1, or if numRows == 0 but the table size is positive.
3. Removes the Preconditions check reported in IMPALA-1657 in favor
   of issuing a corrupt table stats warning.
4. Fixes a few tests to set numRows together with
   STATS_GENERATED_VIA_STATS_TASK so that the numRows is definitely
   set in the HMS.
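
The rule in item 2 can be sketched as a small predicate. This is only an
illustration, not the code from HdfsScanNode.java: the class name
CorruptStatsSketch and the method hasCorruptStats are hypothetical, and
treating -1 as the valid "no stats" value is an assumption inferred from
the numRows < -1 threshold above.

```java
// Hypothetical sketch of the corrupt-stats rule from the commit message.
// Names are illustrative and do not come from the Impala code base.
class CorruptStatsSketch {
  /**
   * Returns true if the stats of a table or partition would be reported
   * as corrupt: numRows < -1, or numRows == 0 with a positive size.
   * Assumption: -1 means "no stats available" and is not corrupt.
   */
  public static boolean hasCorruptStats(long numRows, long totalBytes) {
    if (numRows < -1) return true;
    return numRows == 0 && totalBytes > 0;
  }

  public static void main(String[] args) {
    System.out.println(hasCorruptStats(-2, 0));    // invalid row count
    System.out.println(hasCorruptStats(0, 1024));  // zero rows but data on disk
    System.out.println(hasCorruptStats(-1, 1024)); // no stats, not corrupt
    System.out.println(hasCorruptStats(0, 0));     // empty table, not corrupt
    System.out.println(hasCorruptStats(10, 1024)); // plausible stats
  }
}
```

Under this sketch, a table with missing stats is handled separately from
a table with corrupt stats, which matches item 3: the condition produces
a warning rather than a failed Preconditions check.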

Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Reviewed-on: http://gerrit.cloudera.org:8080/4166
Reviewed-by: Alex Behm 
Tested-by: Internal Jenkins
---
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M testdata/workloads/functional-query/queries/QueryTest/corrupt-stats.test
M tests/metadata/test_compute_stats.py
4 files changed, 81 insertions(+), 22 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.

2016-08-30 Thread Internal Jenkins (Code Review)
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..


Patch Set 3: Verified+1


Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.

2016-08-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..


Patch Set 3: Code-Review+2

Carry Marcel's +2


Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.

2016-08-30 Thread Alex Behm (Code Review)
Hello Marcel Kornacker,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4166

to look at the new patch set (#2).

Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..

IMPALA-1657: Rework detection and reporting of corrupt table stats.

1. Minor fixes for cardinality estimation of unpartitioned tables.
2. Reworks handling of corrupt table stats as follows:
   The stats of a table or partition are reported as corrupt if the
   numRows < -1, or if numRows == 0 but the table size is positive.
3. Removes the Preconditions check reported in IMPALA-1657 in favor
   of issuing a corrupt table stats warning.
4. Fixes a few tests to set numRows together with
   STATS_GENERATED_VIA_STATS_TASK so that the numRows is definitely
   set in the HMS.

Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
---
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M testdata/workloads/functional-query/queries/QueryTest/corrupt-stats.test
M tests/metadata/test_compute_stats.py
4 files changed, 81 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/4166/2

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 


[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.

2016-08-30 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4166/1/fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java:

Line 359:* Also computes totalBytes_
> , totalFiles_, hasCorruptTableStats_
Done


Line 370:   if ((cardinality_ < -1) || (cardinality_ == 0 && tbl_.getTotalHdfsBytes() > 0)) {
> remove () from (cardinality_ < -1) - i find that easier to read because it 
Done


Line 374: totalFiles_ += partitions_.get(0).getFileDescriptors().size();
> another checkstate that we only have a single partition?
Done


Line 377: // Nothing to scan. Definitely a cardinality of 0 even if we have no stats.
> make this branch the first one to avoid negation
Done


Line 414: if (!(cardinality_ >= 0 || cardinality_ == -1)) {
> a bit easier: if (cardinality_ < -1)
Done
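
The simplification accepted above relies on the two conditions being
equivalent for integer values. A minimal sketch that checks this, assuming
the cardinality is a Java long; the class and method names are
hypothetical and not taken from the Impala sources:

```java
// Hypothetical check that the original and the simplified condition
// from the review comment agree. Names are illustrative only.
class CardinalityCheckEquivalence {
  // Condition as written in patch set 1.
  public static boolean originalCheck(long cardinality) {
    return !(cardinality >= 0 || cardinality == -1);
  }

  // Condition suggested in the review.
  public static boolean simplifiedCheck(long cardinality) {
    return cardinality < -1;
  }

  // Compares both conditions for every value in [lo, hi].
  // hi must be below Long.MAX_VALUE so the loop counter cannot overflow.
  public static boolean agreeOnRange(long lo, long hi) {
    for (long c = lo; c <= hi; c++) {
      if (originalCheck(c) != simplifiedCheck(c)) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(agreeOnRange(-1000, 1000));
    System.out.println(
        originalCheck(Long.MIN_VALUE) == simplifiedCheck(Long.MIN_VALUE));
    System.out.println(
        originalCheck(Long.MAX_VALUE) == simplifiedCheck(Long.MAX_VALUE));
  }
}
```

By De Morgan's law the original condition reads "cardinality < 0 and
cardinality != -1", which for integers is the same as "cardinality < -1".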


http://gerrit.cloudera.org:8080/#/c/4166/1/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
File testdata/workloads/functional-planner/queries/PlannerTest/hbase.test:

Line 495: 04:HASH JOIN [INNER JOIN]
> that generally seems to be the case as often as it is not. :)
Turns out my hbase setup was messed up, sorry. Reverted.



Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.

2016-08-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..


Patch Set 1: Code-Review+2

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4166/1/fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java:

Line 359:* Also computes totalBytes_
, totalFiles_, hasCorruptTableStats_


Line 370:   if ((cardinality_ < -1) || (cardinality_ == 0 && tbl_.getTotalHdfsBytes() > 0)) {
remove () from (cardinality_ < -1) - i find that easier to read because it 
requires less parenthesis counting


Line 374: totalFiles_ += partitions_.get(0).getFileDescriptors().size();
another checkstate that we only have a single partition?


Line 377: // Nothing to scan. Definitely a cardinality of 0 even if we have no stats.
make this branch the first one to avoid negation


Line 414: if (!(cardinality_ >= 0 || cardinality_ == -1)) {
a bit easier: if (cardinality_ < -1)


http://gerrit.cloudera.org:8080/#/c/4166/1/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
File testdata/workloads/functional-planner/queries/PlannerTest/hbase.test:

Line 495: 04:HASH JOIN [INNER JOIN]
> Plan looks better, but I'm still double checking whether something is wrong
that generally seems to be the case as often as it is not. :)



Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.

2016-08-29 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4166/1/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
File testdata/workloads/functional-planner/queries/PlannerTest/hbase.test:

Line 495: 04:HASH JOIN [INNER JOIN]
Plan looks better, but I'm still double checking whether something is wrong 
with my hbase setup.



Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm 
Gerrit-Reviewer: Alex Behm 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.

2016-08-29 Thread Alex Behm (Code Review)
Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4166

Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..

IMPALA-1657: Rework detection and reporting of corrupt table stats.

1. Minor fixes for cardinality estimation of unpartitioned tables.
2. Reworks handling of corrupt table stats as follows:
   The stats of a table or partition are reported as corrupt if the
   numRows < -1, or if numRows == 0 but the table size is positive.
3. Removes the Preconditions check reported in IMPALA-1657 in favor
   of issuing a corrupt table stats warning.
4. Fixes a few tests to set numRows together with
   STATS_GENERATED_VIA_STATS_TASK so that the numRows is definitely
   set in the HMS.

Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
---
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-query/queries/QueryTest/corrupt-stats.test
M tests/metadata/test_compute_stats.py
5 files changed, 88 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/4166/1

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm