[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.
Internal Jenkins has submitted this change and it was merged.
Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..
IMPALA-1657: Rework detection and reporting of corrupt table stats.
1. Minor fixes for cardinality estimation of unpartitioned tables.
2. Reworks handling of corrupt table stats as follows:
The stats of a table or partition are reported as corrupt if the
numRows < -1, or if numRows == 0 but the table size is positive.
3. Removes the Preconditions check reported in IMPALA-1657 in favor of
issuing a corrupt table stats warning.
4. Fixes a few tests to set numRows together with
STATS_GENERATED_VIA_STATS_TASK so that the numRows is definitely set
in the HMS.
Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Reviewed-on: http://gerrit.cloudera.org:8080/4166
Reviewed-by: Alex Behm
Tested-by: Internal Jenkins
---
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M testdata/workloads/functional-query/queries/QueryTest/corrupt-stats.test
M tests/metadata/test_compute_stats.py
4 files changed, 81 insertions(+), 22 deletions(-)
Approvals:
Internal Jenkins: Verified
Alex Behm: Looks good to me, approved
--
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker
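The corruption rule in item 2 of the commit message can be sketched as a standalone predicate. This is only an illustration of the stated rule, not the code in HdfsScanNode.java: the class name CorruptStatsSketch, the method hasCorruptStats, and the parameter names are hypothetical. It assumes, as the review of this change implies, that a numRows of -1 is a legal value (stats not available) rather than a corrupt one.

```java
// Hypothetical sketch of the rule from the commit message; the class,
// method, and parameter names are illustrative and not taken from Impala.
final class CorruptStatsSketch {
  private CorruptStatsSketch() {}

  /**
   * Returns true if the stats of a table or partition look corrupt:
   * numRows < -1, or numRows == 0 while the size in bytes is positive.
   * A numRows of -1 is assumed to mean "no stats" and is not flagged.
   */
  static boolean hasCorruptStats(long numRows, long totalBytes) {
    return numRows < -1 || (numRows == 0 && totalBytes > 0);
  }

  public static void main(String[] args) {
    System.out.println(hasCorruptStats(-2, 0));      // true: numRows < -1
    System.out.println(hasCorruptStats(0, 1024));    // true: zero rows but data present
    System.out.println(hasCorruptStats(0, 0));       // false: empty table
    System.out.println(hasCorruptStats(-1, 1024));   // false: stats missing, not corrupt
    System.out.println(hasCorruptStats(100, 1024));  // false: plausible stats
  }
}
```

Per item 3, a positive result would lead to a corrupt table stats warning rather than a failed Preconditions check.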
[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.
Internal Jenkins has posted comments on this change.
Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..
Patch Set 3: Verified+1
--
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.
Alex Behm has posted comments on this change.
Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..
Patch Set 3: Code-Review+2
Carry Marcel's +2
--
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Marcel Kornacker
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.
Hello Marcel Kornacker,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/4166
to look at the new patch set (#2).
Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..
IMPALA-1657: Rework detection and reporting of corrupt table stats.
1. Minor fixes for cardinality estimation of unpartitioned tables.
2. Reworks handling of corrupt table stats as follows:
The stats of a table or partition are reported as corrupt if the
numRows < -1, or if numRows == 0 but the table size is positive.
3. Removes the Preconditions check reported in IMPALA-1657 in favor of
issuing a corrupt table stats warning.
4. Fixes a few tests to set numRows together with
STATS_GENERATED_VIA_STATS_TASK so that the numRows is definitely set
in the HMS.
Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
---
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M testdata/workloads/functional-query/queries/QueryTest/corrupt-stats.test
M tests/metadata/test_compute_stats.py
4 files changed, 81 insertions(+), 22 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/4166/2
--
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Marcel Kornacker
[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.
Alex Behm has posted comments on this change.
Change subject: IMPALA-1657: Rework detection and reporting of corrupt table
stats.
..
Patch Set 1:
(6 comments)
http://gerrit.cloudera.org:8080/#/c/4166/1/fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java:
Line 359:* Also computes totalBytes_
> , totalFiles_, hasCorruptTableStats_
Done
Line 370: if ((cardinality_ < -1) || (cardinality_ == 0 && tbl_.getTotalHdfsBytes() > 0)) {
> remove () from (cardinality_ < -1) - i find that easier to read because it
Done
Line 374: totalFiles_ += partitions_.get(0).getFileDescriptors().size();
> another checkstate that we only have a single partition?
Done
Line 377: // Nothing to scan. Definitely a cardinality of 0 even if we have no stats.
> make this branch the first one to avoid negation
Done
Line 414: if (!(cardinality_ >= 0 || cardinality_ == -1)) {
> a bit easier: if (cardinality_ < -1)
Done
http://gerrit.cloudera.org:8080/#/c/4166/1/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
File testdata/workloads/functional-planner/queries/PlannerTest/hbase.test:
Line 495: 04:HASH JOIN [INNER JOIN]
> that generally seems to be the case as often as it is not. :)
Turns out my hbase setup was messed up, sorry. Reverted.
--
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Marcel Kornacker
Gerrit-HasComments: Yes
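The simplification accepted for line 414 in the review above, replacing !(cardinality_ >= 0 || cardinality_ == -1) with cardinality_ < -1, holds because negating "non-negative or equal to -1" leaves exactly the values below -1. A minimal sketch that checks the two forms against each other follows. The class and method names are hypothetical and exist only for this illustration.

```java
// Hypothetical check that the original and simplified conditions from the
// review of line 414 agree; none of these names come from the Impala code base.
final class CardinalityCheckSketch {
  private CardinalityCheckSketch() {}

  // Condition as written in patch set 1.
  static boolean original(long cardinality) {
    return !(cardinality >= 0 || cardinality == -1);
  }

  // Condition suggested by the reviewer.
  static boolean simplified(long cardinality) {
    return cardinality < -1;
  }

  // Returns true if both forms agree for every value in [lo, hi].
  static boolean agreeOnRange(long lo, long hi) {
    for (long c = lo; c <= hi; c++) {
      if (original(c) != simplified(c)) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(agreeOnRange(-1000, 1000));  // true
  }
}
```

The range check is only a spot check around the boundary at -1. The equivalence itself follows from De Morgan's law: the negation is cardinality < 0 && cardinality != -1, which for integers is cardinality < -1.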
[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.
Marcel Kornacker has posted comments on this change.
Change subject: IMPALA-1657: Rework detection and reporting of corrupt table
stats.
..
Patch Set 1: Code-Review+2
(6 comments)
http://gerrit.cloudera.org:8080/#/c/4166/1/fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
File fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java:
Line 359:* Also computes totalBytes_
, totalFiles_, hasCorruptTableStats_
Line 370: if ((cardinality_ < -1) || (cardinality_ == 0 && tbl_.getTotalHdfsBytes() > 0)) {
remove () from (cardinality_ < -1) - i find that easier to read because it
requires less parenthesis counting
Line 374: totalFiles_ += partitions_.get(0).getFileDescriptors().size();
another checkstate that we only have a single partition?
Line 377: // Nothing to scan. Definitely a cardinality of 0 even if we have no stats.
make this branch the first one to avoid negation
Line 414: if (!(cardinality_ >= 0 || cardinality_ == -1)) {
a bit easier: if (cardinality_ < -1)
http://gerrit.cloudera.org:8080/#/c/4166/1/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
File testdata/workloads/functional-planner/queries/PlannerTest/hbase.test:
Line 495: 04:HASH JOIN [INNER JOIN]
> Plan looks better, but I'm still double checking whether something is wrong
that generally seems to be the case as often as it is not. :)
--
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Dimitris Tsirogiannis
Gerrit-Reviewer: Marcel Kornacker
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.
Alex Behm has posted comments on this change.
Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/4166/1/testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
File testdata/workloads/functional-planner/queries/PlannerTest/hbase.test:
Line 495: 04:HASH JOIN [INNER JOIN]
Plan looks better, but I'm still double checking whether something is wrong with my hbase setup.
--
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm
Gerrit-Reviewer: Alex Behm
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-1657: Rework detection and reporting of corrupt table stats.
Alex Behm has uploaded a new change for review.
http://gerrit.cloudera.org:8080/4166
Change subject: IMPALA-1657: Rework detection and reporting of corrupt table stats.
..
IMPALA-1657: Rework detection and reporting of corrupt table stats.
1. Minor fixes for cardinality estimation of unpartitioned tables.
2. Reworks handling of corrupt table stats as follows:
The stats of a table or partition are reported as corrupt if the
numRows < -1, or if numRows == 0 but the table size is positive.
3. Removes the Preconditions check reported in IMPALA-1657 in favor of
issuing a corrupt table stats warning.
4. Fixes a few tests to set numRows together with
STATS_GENERATED_VIA_STATS_TASK so that the numRows is definitely set
in the HMS.
Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
---
M fe/src/main/java/com/cloudera/impala/planner/HdfsScanNode.java
M fe/src/main/java/com/cloudera/impala/planner/Planner.java
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-query/queries/QueryTest/corrupt-stats.test
M tests/metadata/test_compute_stats.py
5 files changed, 88 insertions(+), 31 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/66/4166/1
--
To view, visit http://gerrit.cloudera.org:8080/4166
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1d3305791d96e1c23a901af7b7c109af9352bb44
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm
