Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
......................................................................


Patch Set 12:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1368
PS11, Line 1368:         // Validate all the partition key/values to ensure you 
can convert them toThrift()
               :         Expr.treesToThrift(getPartitionValues());
               :       } catch (Exception e) {
               :         throw new CatalogException("Partition (" + 
getPartitionName() +
               :             ") has invalid partition column values: ", e);
               :       }
               :     }
               :   }
               :
               :   /**
               :    * Comparator to allow ordering of partitions by their 
partition-key values.
               :    */
               :   public static final KeyValueComparator KV_COMPARATOR = new 
KeyValueComparator();
               :   public static class KeyValueComparator implements 
Comparator<FeFsPartition> {
               :     @Override
               :     public int compare(FeFsPartition o1, FeFsPartition o2) {
               :       return 
comparePartitionKeyValues(o1.getPartitionValues(), o2.getPartitionValues());
               :     }
               :   }
               :
               :   @VisibleForTesting
               :   public static int 
comparePartitionKeyValues(List<LiteralExpr> lhs,
               :       List<LiteralExpr> rhs) {
               :
> nit: Could you move these non-static methods to somewhere above the HdfsPar
Done


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1453
PS11, Line 1453:         "that have deleted rows and complex types. As a 
workaround you can run a " +
               :         "major compaction.";
> Maybe mention that the query references complex types?
Done. Full-ACID complex types are coming soon, but it was low effort to add 
some tests for it so I added.


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1468
PS11, Line 1468: throws AnalysisException {
> unused arguement
Done


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1471
PS11, Line 1471: hdfsTblRef.getUniqueAlias()
> Looks like if the tableRef doesn't have an explict alias, this is the full
It works well, added some tests.


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1537
PS11, Line 1537: /*isPartitionKeySc
> Could you add some tests for isPartitionKeyScan=true? IIUC, we only scan on
Thanks for pointing this out. The code really had a bug, though I needed a bit 
different statements to hit it.

When 'isPartitionKeyScan=true' HdfsScanNode creates only one scan range for 
each file descriptor. Then the backend scanners only return a single row from 
each scan range.

So to hit the bug I had to issue the following statements:

 create table my_acid (id int) partitioned by (p int) stored as orc
   tblproperties('transactional'='true');
 insert into my_acid partition(p=0) values (0), (1), (2);
 insert into my_acid partition(p=1) values (0), (1), (2);
 delete from my_acid where p = 1 and id = 0;
 // Run this in Impala. The result should be 1 since partition(p=1) is NOT 
empty.
 select max(p) from my_acid;

In p=1 we have only one insert delta file which contains (0), (1), and (2), all 
of them fit into one scan range. So the scanner only returns the first element 
(0) that we have deleted.

Added this example as e2e test.


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1541
PS11, Line 1541: /*isPartitionKeySc
> Should we always set this to true to scan all delete rows?
I think you meant 'false', and you are right. And we also need to set it to 
false for the insert delta files as well.


http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@714
PS11, Line 714: ====
> Could you add some tests to make sure that the query hints are respected? A
Done


http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
File testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test@21
PS11, Line 21: insert into acid values (5), (5), (5);
> Could you add a test for insert overwrite?
Thanks, it actually revealed a bug related to the splitted file descriptors.

If the table had delete deltas and I issued a REFRESH, then the old deltas 
could shadow the files coming from a newer base directory because I didn't 
clear the old file descriptors in ParallelFileMetadataLoader.loadAndSet().


http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test@52
PS11, Line 52: use $DATABASE;
> Could you add a test for show partitions on this table? Will the deleted pa
alltypes_deleted_rows doesn't have deleted partitions, but I added a test for a 
temporary table.

SHOW PARTITIONS shows the partitions with no data, similar to Hive.



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 13 Jul 2020 18:30:23 +0000
Gerrit-HasComments: Yes

Reply via email to