Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16082 )
Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types) ...................................................................... Patch Set 12: (9 comments) http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java: http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1368 PS11, Line 1368: // Validate all the partition key/values to ensure you can convert them toThrift() : Expr.treesToThrift(getPartitionValues()); : } catch (Exception e) { : throw new CatalogException("Partition (" + getPartitionName() + : ") has invalid partition column values: ", e); : } : } : } : : /** : * Comparator to allow ordering of partitions by their partition-key values. : */ : public static final KeyValueComparator KV_COMPARATOR = new KeyValueComparator(); : public static class KeyValueComparator implements Comparator<FeFsPartition> { : @Override : public int compare(FeFsPartition o1, FeFsPartition o2) { : return comparePartitionKeyValues(o1.getPartitionValues(), o2.getPartitionValues()); : } : } : : @VisibleForTesting : public static int comparePartitionKeyValues(List<LiteralExpr> lhs, : List<LiteralExpr> rhs) { : > nit: Could you move these non-static methods to somewhere above the HdfsPar Done http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1453 PS11, Line 1453: "that have deleted rows and complex types. As a workaround you can run a " + : "major compaction."; > Maybe mention that the query references complex types? Done. Full-ACID complex types are coming soon, but it was low effort to add some tests for it so I added. http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1468 PS11, Line 1468: throws AnalysisException { > unused arguement Done http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1471 PS11, Line 1471: hdfsTblRef.getUniqueAlias() > Looks like if the tableRef doesn't have an explict alias, this is the full It works well, added some tests. http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1537 PS11, Line 1537: /*isPartitionKeySc > Could you add some tests for isPartitionKeyScan=true? IIUC, we only scan on Thanks for pointing this out. The code really had a bug, though I needed a bit different statements to hit it. When 'isPartitionKeyScan=true' HdfsScanNode creates only one scan range for each file descriptor. Then the backend scanners only return a single row from each scan range. So to hit the bug I had to issue the following statements: create table my_acid (id int) partitioned by (p int) stored as orc tblproperties('transactional'='true'); insert into my_acid partition(p=0) values (0), (1), (2); insert into my_acid partition(p=1) values (0), (1), (2); delete from my_acid where p = 1 and id = 0; // Run this in Impala. The result should be 1 since partition(p=1) is NOT empty. select max(p) from my_acid; In p=1 we have only one insert delta file which contains (0), (1), and (2), all of them fit into one scan range. So the scanner only returns the first element (0) that we have deleted. Added this example as e2e test. http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1541 PS11, Line 1541: /*isPartitionKeySc > Should we always set this to true to scan all delete rows? I think you meant 'false', and you are right. And we also need to set it to false for the insert delta files as well. http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test: http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@714 PS11, Line 714: ==== > Could you add some tests to make sure that the query hints are respected? A Done http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test File testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test: http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test@21 PS11, Line 21: insert into acid values (5), (5), (5); > Could you add a test for insert overwrite? Thanks, it actually revealed a bug related to the splitted file descriptors. If the table had delete deltas and I issued a REFRESH, then the old deltas could shadow the files coming from a newer base directory because I didn't clear the old file descriptors in ParallelFileMetadataLoader.loadAndSet(). http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test@52 PS11, Line 52: use $DATABASE; > Could you add a test for show partitions on this table? Will the deleted pa alltypes_deleted_rows doesn't have deleted partitions, but I added a test for a temporary table. SHOW PARTITIONS shows the partitions with no data, similar to Hive. -- To view, visit http://gerrit.cloudera.org:8080/16082 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659 Gerrit-Change-Number: 16082 Gerrit-PatchSet: 12 Gerrit-Owner: Zoltan Borok-Nagy <[email protected]> Gerrit-Reviewer: Aman Sinha <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Mon, 13 Jul 2020 18:30:23 +0000 Gerrit-HasComments: Yes
