Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58
PS2, Line 58: |  row-size=100B cardinality=143
> Yeah I don't think we should pass the conjuncts to the delete deltas. Also
>>  It's because COMPUTE STATS is query-based, so it runs count(*) and 
>> count(distinct) queries on a table and stores the results of it.
Makes sense, so COMPUTE STATS is also doing the left anti join. Do you know 
whether the sampling clause of COMPUTE STATS will be supported for ACID tables? 
I.e., a 10% sample on the delta table would need to be anti-joined with the 
delete_delta. I suppose that should work. In any case, that's a separate 
investigation.
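
A minimal sketch of the sampling question, not Impala code: it assumes 
row-level sampling (Impala's actual sampling mechanism may differ) and uses 
hypothetical in-memory rows keyed by (bucket, originaltransaction, rowid). It 
only illustrates that anti-joining a sample of the delta against the full 
delete_delta yields a subset of the live rows.

```python
import random


def left_anti_join(rows, deleted_ids):
    """Keep rows whose row id does not appear in the delete delta."""
    return [r for r in rows if r["row_id"] not in deleted_ids]


def sample_rows(rows, fraction, seed=0):
    """Row-level Bernoulli sample (an assumption made for this sketch)."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]


# Hypothetical data: 1000 inserted rows, every 4th one deleted later.
# row_id is (bucket, originaltransaction, rowid).
delta = [{"row_id": (0, 1, i), "value": i} for i in range(1000)]
delete_delta = {(0, 1, i) for i in range(0, 1000, 4)}

# Full scan: anti-join all delta rows with the delete delta.
live = left_anti_join(delta, delete_delta)

# Sampled scan: sample the delta first, then anti-join with the
# complete (unsampled) delete delta.
sampled_live = left_anti_join(sample_rows(delta, 0.10), delete_delta)
```

The delete_delta side is deliberately left unsampled here; sampling it as well 
would let deleted rows leak into the result.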

Regarding the naming, yes, 'EVENTS' is more accurate than 'TABLE' since it is 
not really a table.


http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@286
PS4, Line 286: |  |  hash predicates: 
functional_orc_def.alltypes_deleted_rows.month = 
functional_orc_def.alltypes_deleted_rows-delete-delta.month, 
functional_orc_def.alltypes_deleted_rows.row__id.bucket = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.bucket, 
functional_orc_def.alltypes_deleted_rows.row__id.originaltransaction = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.originaltransaction,
 functional_orc_def.alltypes_deleted_rows.row__id.rowid = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.rowid, 
functional_orc_def.alltypes_deleted_rows.year = 
functional_orc_def.alltypes_deleted_rows-delete-delta.year
> It only adds the hidden ACID columns + the partitioning columns (year, mont
I see, I missed that the other columns were partitioning cols. I would think 
that not all the columns need to be in the 'hash predicates'; as long as the 
ones with high NDV are considered for hashing, the rest could be part of the 
'other join predicates' to avoid hashing on too many columns. But this could 
be a future enhancement.
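
A rough sketch of that idea, not Impala's join implementation: the function 
and the sample rows below are hypothetical. The hash table is keyed only on 
the columns in hash_cols (here the high-NDV rowid), and the remaining 
equalities are checked as other join predicates against the candidates in the 
matching bucket.

```python
from collections import defaultdict


def left_anti_join(probe_rows, build_rows, hash_cols, other_cols):
    """Left anti join: hash on hash_cols, verify other_cols per candidate."""
    table = defaultdict(list)
    for b in build_rows:
        table[tuple(b[c] for c in hash_cols)].append(b)

    result = []
    for p in probe_rows:
        candidates = table.get(tuple(p[c] for c in hash_cols), [])
        # "Other join predicates": remaining equalities, checked row by row.
        matched = any(
            all(p[c] == b[c] for c in other_cols) for b in candidates
        )
        if not matched:
            result.append(p)
    return result


ALL_COLS = ["year", "month", "bucket", "originaltransaction", "rowid"]

# Hypothetical data: rowid repeats across partitions, so the partition
# columns still matter for correctness even when they are not hashed.
probe = [
    {"year": y, "month": m, "bucket": 0, "originaltransaction": 1, "rowid": r}
    for y in (2009, 2010) for m in (1, 2) for r in range(5)
]
deletes = [
    {"year": 2009, "month": 1, "bucket": 0, "originaltransaction": 1, "rowid": 0},
    {"year": 2010, "month": 2, "bucket": 0, "originaltransaction": 1, "rowid": 3},
]

# Hash on every column, as in the plan above.
full_hash = left_anti_join(probe, deletes, ALL_COLS, [])

# Hash only on rowid; the rest become other join predicates.
narrow_hash = left_anti_join(
    probe, deletes, ["rowid"],
    ["year", "month", "bucket", "originaltransaction"],
)
```

Both calls return the same rows; the second one hashes one column instead of 
five, at the cost of comparing more candidates per bucket.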



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <amsi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Mon, 22 Jun 2020 17:54:44 +0000
Gerrit-HasComments: Yes
