[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Reviewed-on: http://gerrit.cloudera.org:8080/16082
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/custom_cluster/test_local_catalog.py
M tests/query_test/test_acid.py
27 files changed, 1,710 insertions(+), 151 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 14: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 14 Jul 2020 12:53:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6127/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 14 Jul 2020 07:49:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 14: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 14 Jul 2020 07:49:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-13 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 13: Code-Review+2

(1 comment)

Thanks for adding bunch of tests! LGTM. Let's ship it!

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1541
PS11, Line 1541: /*isPartitionKeySc
> I think you meant 'false', and you are right. And we also need to set it to
Oops, yes, I mean 'false'... Thanks for addressing this.



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 14 Jul 2020 00:20:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6576/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 13 Jul 2020 18:58:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 13:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6577/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 13 Jul 2020 18:57:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-13 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Quanlong Huang, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#13).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/custom_cluster/test_local_catalog.py
M tests/query_test/test_acid.py
27 files changed, 1,710 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/13
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16082/12/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/16082/12/tests/custom_cluster/test_local_catalog.py@30
PS12, Line 30: from tests.common.skip import (SkipIfHive2, SkipIfCatalogV2, 
SkipIfS3, SkipIfABFS,
> flake8: F401 'tests.common.skip.SkipIfCatalogV2' imported but unused
Done



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 13 Jul 2020 18:31:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 12:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1368
PS11, Line 1368: // Validate all the partition key/values to ensure you 
can convert them toThrift()
   : Expr.treesToThrift(getPartitionValues());
   :   } catch (Exception e) {
   : throw new CatalogException("Partition (" + 
getPartitionName() +
   : ") has invalid partition column values: ", e);
   :   }
   : }
   :   }
   :
   :   /**
   :* Comparator to allow ordering of partitions by their 
partition-key values.
   :*/
   :   public static final KeyValueComparator KV_COMPARATOR = new 
KeyValueComparator();
   :   public static class KeyValueComparator implements 
Comparator {
   : @Override
   : public int compare(FeFsPartition o1, FeFsPartition o2) {
   :   return 
comparePartitionKeyValues(o1.getPartitionValues(), o2.getPartitionValues());
   : }
   :   }
   :
   :   @VisibleForTesting
   :   public static int 
comparePartitionKeyValues(List lhs,
   :   List rhs) {
   :
> nit: Could you move these non-static methods to somewhere above the HdfsPar
Done


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1453
PS11, Line 1453: "that have deleted rows and complex types. As a 
workaround you can run a " +
   : "major compaction.";
> Maybe mention that the query references complex types?
Done. Full-ACID complex types are coming soon, but it was low effort to add 
some tests for it so I added.


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1468
PS11, Line 1468: throws AnalysisException {
> unused arguement
Done


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1471
PS11, Line 1471: hdfsTblRef.getUniqueAlias()
> Looks like if the tableRef doesn't have an explict alias, this is the full
It works well, added some tests.


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1537
PS11, Line 1537: /*isPartitionKeySc
> Could you add some tests for isPartitionKeyScan=true? IIUC, we only scan on
Thanks for pointing this out. The code really had a bug, though I needed a bit 
different statements to hit it.

When 'isPartitionKeyScan=true' HdfsScanNode creates only one scan range for 
each file descriptor. Then the backend scanners only return a single row from 
each scan range.

So to hit the bug I had to issue the following statements:

 create table my_acid (id int) partitioned by (p int) stored as orc
   tblproperties('transactional'='true');
 insert into my_acid partition(p=0) values (0), (1), (2);
 insert into my_acid partition(p=1) values (0), (1), (2);
 delete from my_acid where p = 1 and id = 0;
 // Run this in Impala. The result should be 1 since partition(p=1) is NOT 
empty.
 select max(p) from my_acid;

In p=1 we have only one insert delta file which contains (0), (1), and (2), all 
of them fit into one scan range. So the scanner only returns the first element 
(0) that we have deleted.

Added this example as e2e test.


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1541
PS11, Line 1541: /*isPartitionKeySc
> Should we always set this to true to scan all delete rows?
I think you meant 'false', and you are right. And we also need to set it to 
false for the insert delta files as well.


http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@714
PS11, Line 714: 
> Could you add some tests to make sure that the query hints are respected? A
Done



[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16082/12/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/16082/12/tests/custom_cluster/test_local_catalog.py@30
PS12, Line 30: from tests.common.skip import (SkipIfHive2, SkipIfCatalogV2, 
SkipIfS3, SkipIfABFS,
flake8: F401 'tests.common.skip.SkipIfCatalogV2' imported but unused



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 13 Jul 2020 18:29:34 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-13 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Quanlong Huang, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#12).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/custom_cluster/test_local_catalog.py
M tests/query_test/test_acid.py
27 files changed, 1,710 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/12
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-09 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 11:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@1368
PS11, Line 1368:
   :   @Override
   :   public long getWriteId() {
   : return writeId_;
   :   }
   :
   :   @Override
   :   public HdfsPartition genInsertDeltaPartition() {
   : ImmutableList fileDescriptors = 
!encodedInsertFileDescriptors_.isEmpty() ?
   : encodedInsertFileDescriptors_ : 
encodedFileDescriptors_;
   : return new HdfsPartition.Builder(this)
   : .setId(id_)
   : .setFileDescriptors(fileDescriptors)
   : .build();
   :   }
   :
   :   @Override
   :   public HdfsPartition genDeleteDeltaPartition() {
   : if (encodedDeleteFileDescriptors_.isEmpty()) return null;
   : return new HdfsPartition.Builder(this)
   : .setId(id_)
   : .setFileDescriptors(encodedDeleteFileDescriptors_)
   : .build();
   :   }
nit: Could you move these non-static methods to somewhere above the 
HdfsPartition.Builder? Other codes here are static subclass, fields or methods.


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1453
PS11, Line 1453: final String NOT_SUPPORTED_YET = "This query is not 
supported on full ACID tables " +
   : "that have deleted rows. As a workaround you can run a 
major compaction.";
Maybe mention that the query references complex types?

Not sure how complex it is to add a test for this. It may worths some test 
coverage on it. Or maybe we'll support reading full-ACID complex types soon?


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1468
PS11, Line 1468: List partitions
unused arguement


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1471
PS11, Line 1471: hdfsTblRef.getUniqueAlias()
Looks like if the tableRef doesn't have an explict alias, this is the full 
qualified table name. Could you add some tests to make sure the resolution 
works as expected? I.e. tests that the full-acid table is used more than once 
and all occurances don't have explicit alias, e.g.

 select count(*) from (
   select * from functional_orc_def.alltypes_deleted_rows where id % 2 = 0
   union all
   select * from functional_orc_def.alltypes_deleted_rows where id % 2 != 0
 ) t;

 select id from functional_orc_def.alltypes_deleted_rows where id % 2 = 0 limit 
10
 union all
 select max(id) from functional_orc_def.alltypes_deleted_rows;


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1537
PS11, Line 1537: isPartitionKeyScan
Could you add some tests for isPartitionKeyScan=true? IIUC, we only scan one 
row per partition when isPartitionKeyScan=true. If the first row from the 
insert delta doesn't match the first row from the delete delta, we will 
consider this partition as non-empty, no matter whether there are other delete 
rows that could match the first row from the insert delta.

I think it's worth to add a test like this:

 create table my_acid (id int) partitioned by (p int) stored as orc
   tblproperties('transactional'='true');
 insert into my_acid partition(p=0) values (0), (1), (2);
 // (p=1, id=0) may be the first row of insert delta
 insert into my_acid partition(p=1) values (0), (1), (2);
 // (p=1, id=2) may be the first row of delete delta
 delete from my_acid where p = 1 and id = 2;
 delete from my_acid where p = 1 and id = 1;
 delete from my_acid where p = 1 and id = 0;
 // Run this in Impala. The result should be 0 since partition(p=1) is empty.
 select max(p) from my_acid;


http://gerrit.cloudera.org:8080/#/c/16082/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1541
PS11, Line 1541: isPartitionKeyScan
Should we always set this to true to scan all delete rows?


http://gerrit.cloudera.org:8080/#/c/16082/11/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File 

[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 11:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6513/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 07 Jul 2020 15:29:09 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-07 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 11:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@376
PS10, Line 376:
> Could you add a TODO in this method for IMPALA-9883?
Done


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2010
PS10, Line 2010: getNumFileDescriptors();
> Please change this to getNumFileDescriptors() after we revise its implement
Done


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@56
PS10, Line 56: import org.apache.thrift.TException;
> nit: unused import
Thanks for pointing these out! Done.


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@255
PS10, Line 255:   }
> nit: 4 spaces indent
Done


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@409
PS10, Line 409: public ImmutableList getFileDescriptors() {
  :   if (insertFds_.isEmpty()) return fds_;
  :   List ret = Lists.newArrayListWithCapacity(
  :   insertFds_.size() + deleteFds_.size());
  :   ret.addAll(insertFds_);
  :   ret.addAll(deleteFds_);
  :   return ImmutableList.copyOf(ret);
  : }
  :
  : @Override
  : public ImmutableList 
getInsertFileDescriptors() {
  :   return insertFds_;
  : }
  :
> Need to update these
Done


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@133
PS10, Line 133: !fileDescriptors_.isEmpty() ||
> nit: It's inefficient to construct the list for this. Maybe change to
Done


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@138
PS10, Line 138:
> nit: It's inefficient to construct the list for this. Maybe change to
Instead of the ternary operator, I rather just sum up the sizes to keep it more 
readable.


http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@45
PS9, Line 45: import org.apache.impala.analysis.NullLiteral;
> Looks like IntelliJ is more intelligent in this :D
Yeah :D It's a shame vscode fails on such a trivial task. But apart from this 
it's a great IDE.


http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1518
PS9, Line 1518:   feTable.getMetaStoreTable().getParameters()));
> PS10 looks good to me. I think the insert files and delete files are used s
I'm OK with this if you think it's good.

Yeah, I was also thinking about merging fileDescriptors_ and 
insertFileDescriptors_. My only concern is that it would be a bit weird for 
genDeleteDeltaPartition(), because it would put the delete delta descriptors 
into the regular descriptors.


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1536
PS10, Line 1536: hdfsTblRef.getDesc(), conjuncts, 
insertDeltaPartitions, hdfsTblRef,
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
File fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java@497
PS10, Line 497: () {
> not *Fail* anymore
Done



[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16082/11/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/16082/11/tests/custom_cluster/test_local_catalog.py@30
PS11, Line 30: from tests.common.skip import (SkipIfHive2, SkipIfCatalogV2, 
SkipIfS3, SkipIfABFS,
flake8: F401 'tests.common.skip.SkipIfCatalogV2' imported but unused



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 07 Jul 2020 15:01:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-07 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Quanlong Huang, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#11).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/custom_cluster/test_local_catalog.py
M tests/query_test/test_acid.py
27 files changed, 1,460 insertions(+), 146 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/11
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-07 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 10:

(9 comments)

Thanks for making the change! I'm still looking into the join related logic and 
tests.

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
File fe/src/main/java/org/apache/impala/catalog/FeFsTable.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/FeFsTable.java@376
PS10, Line 376: getFilesSample
Could you add a TODO in this method for IMPALA-9883?


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2010
PS10, Line 2010: getFileDescriptors().size()
Please change this to getNumFileDescriptors() after we revise its 
implementation.


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@56
PS10, Line 56: import org.apache.orc.FileMetadata;
nit: unused import


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@255
PS10, Line 255:
nit: 4 spaces indent


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@133
PS10, Line 133: !getFileDescriptors().isEmpty()
nit: It's inefficient to construct the list for this. Maybe change to

 return !fileDescriptors_.isEmpty() || !insertFileDescriptors_.isEmpty() || 
!deleteFileDescriptors_.isEmpty().


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@138
PS10, Line 138: getFileDescriptors().size()
nit: It's inefficient to construct the list for this. Maybe change to

 return fileDescriptors_.isEmpty()? insertFileDescriptors_.size() + 
deleteFileDescriptors_.size() : fileDescriptors_.size();


http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@45
PS9, Line 45: import org.apache.impala.analysis.NullLiteral;
> Yeah, I use my IDE to do that for me. Interestingly, even if I delete this
Looks like IntelliJ is more intelligent in this :D


http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1518
PS9, Line 1518:   feTable.getMetaStoreTable().getParameters()));
> I implemented it in PS10, but I'm afraid that the change became a bit too i
PS10 looks good to me. I think the insert files and delete files are used 
separately so it makes sense to store them in different fileds. However, It'd 
be good to ask more feedbacks on this decision.

Maybe we can merge fileDescriptors_ and insertFileDescriptors_ to one field 
(since only one of them is valid at a time) to reduce the complexity.


http://gerrit.cloudera.org:8080/#/c/16082/10/tests/custom_cluster/test_local_catalog.py
File tests/custom_cluster/test_local_catalog.py:

http://gerrit.cloudera.org:8080/#/c/16082/10/tests/custom_cluster/test_local_catalog.py@467
PS10, Line 467:   def test_full_acid_support(self):
Could you add a test here for reading functional_orc_def.alltypes_deleted_rows 
in LocalCatalog mode?



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 07 Jul 2020 09:59:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 10: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 06 Jul 2020 21:16:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
File fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java@409
PS10, Line 409: @Override
  : public ImmutableList getFileDescriptors() {
  :   return fds_;
  : }
  :
  : @Override
  : public ImmutableList 
getInsertFileDescriptors() {
  :   return fds_;
  : }
  :
  : @Override
  : public ImmutableList 
getDeleteFileDescriptors() {
  :   return fds_;
  : }
Need to update these


http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
File fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java@497
PS10, Line 497: Fail
not *Fail* anymore



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 06 Jul 2020 17:07:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6094/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 06 Jul 2020 16:09:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 10:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6499/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 06 Jul 2020 16:05:15 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-06 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@45
PS9, Line 45: import org.apache.impala.analysis.Path.PathType;
> nit: keep the import list sorted in groups (usually the IDE will do this fo
Yeah, I use my IDE to do that for me. Interestingly, even if I delete this 
line, and ask VSCode to import PathType for me, it puts the import to this line 
again...

Same with HdfsPartition.FileDescriptor. Seems like if an import path has more 
components it can confuse vscode.

Anyway, fixed it manually :D


http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1418
PS9, Line 1418:   if (addAcidSlotsIfNeeded(analyzer, hdfsTblRef, 
partitions)) {
> nit: what about merging this if-statement with its outer scope so they are
Done


http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1518
PS9, Line 1518: // Let's separate insert delta File Descriptors from delete 
delta FDs.
> I think we should separate the file descriptors in catalogd after loading t
I implemented it in PS10, but I'm afraid that the change became a bit too 
invasive. An alternative approach is just to add the genInsertDeltaPartition() 
and genDeleteDeltaPartition() to the partition classes and those would filter 
out the unneeded file descs and return a new Partition object. This way the 
change would be smaller, but we'd still have to generate these insert/delta 
partitions for each query. What do you think?



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 06 Jul 2020 15:46:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/10/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1536
PS10, Line 1536: hdfsTblRef.getDesc(), conjuncts, 
insertDeltaPartitions, hdfsTblRef, /*aggInfo=*/null,
line too long (93 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 06 Jul 2020 15:42:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-06 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Quanlong Huang, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#10).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java
M fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/DirectMetaProvider.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/MetaProvider.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
25 files changed, 1,429 insertions(+), 142 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/10
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-07-01 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@45
PS9, Line 45: import org.apache.impala.analysis.Path.PathType;
nit: keep the import list sorted in groups (usually the IDE will do this for 
you).


http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1418
PS9, Line 1418:   if (addAcidSlotsIfNeeded(analyzer, hdfsTblRef, 
partitions)) {
nit: what about merging this if-statement with its outer scope so they are

 if (isPartitionKeyScan && queryOpts.optimize_partition_key_scans) {
   ...
 } else if (addAcidSlotsIfNeeded(analyzer, hdfsTblRef, partitions)) {
   ...
 } else {
   ...
 }


http://gerrit.cloudera.org:8080/#/c/16082/9/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1518
PS9, Line 1518: // Let's separate insert delta File Descriptors from delete 
delta FDs.
I think we should separate the file descriptors in catalogd after loading them 
from HDFS instead of doing it here for each query. We can introduce two fileds 
in HdfsPartition: encodedInsertDeltaFileDescriptors_ and 
encodedDeleteDeltaFileDescriptors_ (and related fields in THdfsPartition and 
TPartialPartitionInfo). If a partition contains delete deltas, we separate them 
by setting these two fields and leaving encodedFileDescriptors_ null. We can 
also introduce two methods for FeFsPartition: genInsertDeltaPartition() and 
genDeleteDeltaPartition() using HdfsPartition.Builder in this way:

 public HdfsPartition genInsertDeltaPartition() {
return new HdfsPartition.Builder(this)
.setFileDescriptors(InsertDeltaFileDescriptors_)
.build();
 }

With this we don't need to remove the "final" marker of encodedFileDescriptors_ 
and add back the setFileDescriptor() method, which violates our goal to make 
HdfsPartition immutable. The setFileDescriptor() method may encourage future 
developers to modify HdfsPartitions in-place in catalogd, which will break 
IMPALA-7533.



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 01 Jul 2020 06:57:52 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 9: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 30 Jun 2020 14:02:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6464/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 30 Jun 2020 09:28:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-30 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6078/ 
DRY_RUN=true


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 30 Jun 2020 09:02:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-30 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 9: Code-Review+1

PS9 is a rebase. Carrying +1.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 30 Jun 2020 09:02:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-30 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Tim Armstrong, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#9).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
15 files changed, 1,148 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/9
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6399/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 23 Jun 2020 16:28:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-23 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#8).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
15 files changed, 1,148 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/8
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-23 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 8: Code-Review+1

(2 comments)

Carrying Aman's +1

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58
PS2, Line 58: |  row-size=100B cardinality=143
> >>  It's because COMPUTE STATS is query-based, so it runs count(*) and coun
Thanks for mentioning stats extrapolation. I created IMPALA-9883 for that.

Renamed DELETA TABLE to DELETE EVENTS.


http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@286
PS4, Line 286: |  |  hash predicates: 
functional_orc_def.alltypes_deleted_rows.month = 
functional_orc_def.alltypes_deleted_rows-delete-delta.month, 
functional_orc_def.alltypes_deleted_rows.row__id.bucket = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.bucket, 
functional_orc_def.alltypes_deleted_rows.row__id.originaltransaction = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.originaltransaction,
 functional_orc_def.alltypes_deleted_rows.row__id.rowid = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.rowid, 
functional_orc_def.alltypes_deleted_rows.year = 
functional_orc_def.alltypes_deleted_rows-delete-delta.year
> I see..I missed that the other columns were partitioning cols. I would thin
Added a TODO comment about it in SingleNodePlanner.



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 23 Jun 2020 16:06:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-22 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 7: Code-Review+1

Couple of minor comments. Overall, this LGTM.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Jun 2020 17:55:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-22 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58
PS2, Line 58: |  row-size=100B cardinality=143
> Yeah I don't think we should pass the conjuncts to the delete deltas. Also
>>  It's because COMPUTE STATS is query-based, so it runs count(*) and 
>> count(distinct) queries on a table and stores the results of it.
Makes sense..so Compute Stats is also doing the left anti join.  Do you know if 
the sampling clause of Compute Stats will be supported for the ACID table ? i.e 
a 10% sample on the delta table would need to be anti-joined with the 
delete_delta.  I suppose that should work. In any case, that's a separate 
investigation.

Regarding the naming, yes, 'EVENTS'  is more accurate than 'TABLE' since it is 
not really a table.


http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@286
PS4, Line 286: |  |  hash predicates: 
functional_orc_def.alltypes_deleted_rows.month = 
functional_orc_def.alltypes_deleted_rows-delete-delta.month, 
functional_orc_def.alltypes_deleted_rows.row__id.bucket = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.bucket, 
functional_orc_def.alltypes_deleted_rows.row__id.originaltransaction = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.originaltransaction,
 functional_orc_def.alltypes_deleted_rows.row__id.rowid = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.rowid, 
functional_orc_def.alltypes_deleted_rows.year = 
functional_orc_def.alltypes_deleted_rows-delete-delta.year
> It only adds the hidden ACID columns + the partitioning columns (year, mont
I see..I missed that the other columns were partitioning cols. I would think 
that not all the columns need to be in the 'hash predicate' .. as long as the 
ones with high NDV are considered for hashing, the rest could be part of the 
'other join predicates' to avoid hashing on too many columns.   But this could 
be a future enhancement.



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Jun 2020 17:54:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6390/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Jun 2020 16:48:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-22 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/6389/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Jun 2020 16:41:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 7:

PS 6&7 are only rebases.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Jun 2020 16:21:57 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-22 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#7).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
15 files changed, 1,143 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/7
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-22 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#6).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
15 files changed, 1,142 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/6
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-22 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 5:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16082/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16082/4//COMMIT_MSG@9
PS4, Line 9: Hive ACID supports row-level DELETE and UPDATE operations on a 
table.
> nit: spelling 'operatations'
Done


http://gerrit.cloudera.org:8080/#/c/16082/4//COMMIT_MSG@11
PS4, Line 11: maintaining two sets of files in a table. The first set is in the
> nit: spelling 'maintinaining'
Done


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1454
PS2, Line 1454: if (fd.getRelativePath().startsWith("delete_delta_")) {
> I think for this patch it is fine to keep it this way.  I saw in the doc th
Yeah I agree. Will do that during perf analysis.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1551
PS2, Line 1551: HdfsScanNode deltaScanNode = new 
HdfsScanNode(ctx_.getNextNodeId(),
> Actually, the planner does this optimization as part of SanNode.applyCountS
You are right, I was thinking about a different optimization in the ORC scanner 
which is also for count(*):

https://github.com/apache/impala/blob/17fd15c6e4981499932c02d541c76757a5fdf87d/be/src/exec/hdfs-orc-scanner.cc#L531-L548

Anyway, now we have tests for count(*) + ACID so if someone implements the 
Parquet-kind count(*) optimization for ORC in the future, they'll know if they 
break stg.


http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58
PS2, Line 58: |  row-size=100B cardinality=143
> Regarding the conjuncts, wouldn't it be incorrect to  even pass the conjunc
Yeah I don't think we should pass the conjuncts to the delete deltas. Also it 
wouldn't have any effect since the delete delta files contain no real data, 
only the ACID fields (originaltransaction, rowid, etc.) which serve as a 
tombstone.

To answer the question about cardinality, yes, COMPUTE STATS would show 90. 
It's because COMPUTE STATS is query-based, so it runs count(*) and 
count(distinct) queries on a table and stores the results of it. And the 
"DELETE TABLE" is not a separate table. The delete files are stored under the 
table's directory in so called "delete delta" directories next to the "delta" 
directories, e.g.:

  
/test-warehouse/managed/full_acid_part/p=1/delete_delta_003_003_/bucket_0
  
/test-warehouse/managed/full_acid_part/p=1/delta_001_001_/bucket_0_0

Given that do you think we should rename "DELETE TABLE HASH JOIN" to "DELETE 
EVENTS HASH JOIN"? (The Hive docs usually refer to the rows in delete delta 
files as "delete events")


http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@286
PS4, Line 286: |  |  hash predicates: 
functional_orc_def.alltypes_deleted_rows.month = 
functional_orc_def.alltypes_deleted_rows-delete-delta.month, 
functional_orc_def.alltypes_deleted_rows.row__id.bucket = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.bucket, 
functional_orc_def.alltypes_deleted_rows.row__id.originaltransaction = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.originaltransaction,
 functional_orc_def.alltypes_deleted_rows.row__id.rowid = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.rowid, 
functional_orc_def.alltypes_deleted_rows.year = 
functional_orc_def.alltypes_deleted_rows-delete-delta.year
> This looks like it is adding all the other Slots to hash predicates, apart
It only adds the hidden ACID columns + the partitioning columns (year, month).



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 22 Jun 

[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-22 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#5).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operations on a table.
It achieves it via assigning a unique row-id for each row, and
maintaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
15 files changed, 1,147 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/5
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-19 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16082/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16082/4//COMMIT_MSG@9
PS4, Line 9: Hive ACID supports row-level DELETE and UPDATE operatations on a 
table.
nit: spelling 'operatations'


http://gerrit.cloudera.org:8080/#/c/16082/4//COMMIT_MSG@11
PS4, Line 11: maintinaining two sets of files in a table. The first set is in 
the
nit: spelling 'maintinaining'



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 19 Jun 2020 19:12:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-18 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@88
PS2, Line 88: displayName_ = "DELETE TABLE " + displayName_;
> Yeah, I just wanted to somehow mark this JOIN to make it obvious what's goi
Sounds good.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1454
PS2, Line 1454: if (fd.getRelativePath().startsWith("delete_delta_")) {
> Unfortunately there's no such metadata AFAICT. It could be added during dat
I think for this patch it is fine to keep it this way.  I saw in the doc that 
there's a performance analysis phase ..perhaps it could be 
investigated/improved in that phase.  Since this extra cost is added during 
planning time which typically is in hundreds of ms, It would be interesting to 
see for typical workloads what is the percentage change in planning time.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1480
PS2, Line 1480: rawPath.add("row__id");
> It's needed and it's added by the above for loop at L1474-L1478.
ah, i see..thanks.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1551
PS2, Line 1551: HdfsScanNode deltaScanNode = new 
HdfsScanNode(ctx_.getNextNodeId(),
> Yeah, this makes sense.
Actually, the planner does this optimization as part of 
SanNode.applyCountStarOptimization() which is triggered for Parquet file 
formats currently..so I suppose in the ORC case it doesn't even matter 
(although I could foresee doing such optimization for ORC as well).


http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58
PS2, Line 58: |  row-size=100B cardinality=143
> Maybe it's not that wrong. If I use compute stats then the actual cardinali
Regarding the conjuncts, wouldn't it be incorrect to  even pass the conjuncts 
to the delete delta scan ?  we don't want to prune out deleted records too 
early.
For the cardinality, do you mean if the original cardinality was 100 and 10 
rows were deleted,  does COMPUTE STATS show  90 ?  Even if no compaction was 
done ?


http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/4/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@286
PS4, Line 286: |  |  hash predicates: 
functional_orc_def.alltypes_deleted_rows.month = 
functional_orc_def.alltypes_deleted_rows-delete-delta.month, 
functional_orc_def.alltypes_deleted_rows.row__id.bucket = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.bucket, 
functional_orc_def.alltypes_deleted_rows.row__id.originaltransaction = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.originaltransaction,
 functional_orc_def.alltypes_deleted_rows.row__id.rowid = 
functional_orc_def.alltypes_deleted_rows-delete-delta.row__id.rowid, 
functional_orc_def.alltypes_deleted_rows.year = 
functional_orc_def.alltypes_deleted_rows-delete-delta.year
This looks like it is adding all the other Slots to hash predicates, apart from 
the hidden slots.  Isn't that wrong ? My understanding was the anti join should 
only join on the ACID specific columns.



--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 19 Jun 2020 01:27:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6371/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Jun 2020 17:05:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-18 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 4:

(11 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@673
PS2, Line 673: fileFormatDescriptor_ = other.fileFormatDescriptor_;
> Looking at the changes in https://issues.apache.org/jira/browse/IMPALA-9778
Resolved it during the rebase (PS3).


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@178
PS2, Line 178:   if (!isAcidJoin_ || detailLevel.ordinal() >= 
TExplainLevel.EXTENDED.ordinal()) {
> For the anti join should we not show the join conditions in the Impala quer
Done


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@83
PS2, Line 83:   // True if this join is used to do the join between insert and 
delete delta files.
> nit: spelling 'deleta'
Done


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@88
PS2, Line 88: displayName_ = "DELETE TABLE " + displayName_;
> The  'ACID' prefix for a HashJoin could be misleading to end users since a
Yeah, I just wanted to somehow mark this JOIN to make it obvious what's going 
on, but probably 'ACID' is not the best term to use.

I think 'DELETE TABLE HASH JOIN' makes sense.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1453
PS2, Line 1453:   for (HdfsPartition.FileDescriptor fd : 
partition.getFileDescriptors()) {
> The casting  is redundant since 'partition' is already of type FeFsPartitio
Done


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1454
PS2, Line 1454: if (fd.getRelativePath().startsWith("delete_delta_")) {
> Is there any indicator in the HMS's partition object that indicates whether
Unfortunately there's no such metadata AFAICT. It could be added during data 
loading, though it'd complicate the code a little, especially when local 
catalog is being used since it reuses loaded partitions while trying to filter 
out file descriptors based on an ACID write id list, see IMPALA-9791 for more 
details.

Also, I did some measurements. Created a table with 1000 partitions and 10 
files per each partition. Then I've nested this for-loop into another for-loop 
with 100 iterations to simulate a table with 100K partitions and 1 million 
files. This whole check was evaluated between 100-200 milliseconds.

So for now I'm leaning towards keeping it simple unless you feel strongly 
against it.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1480
PS2, Line 1480: rawPath.add("row__id");
> The design doc suggests the anti-join is on 4 fields..is the partition id n
It's needed and it's added by the above for loop at L1474-L1478.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1551
PS2, Line 1551: HdfsScanNode deltaScanNode = new 
HdfsScanNode(ctx_.getNextNodeId(),
> The  'aggInfo' is meant for the simple 'select count(*) from table'  type o
Yeah, this makes sense.
It didn't cause a bug because the backend only optimizes count(*) if it's a 
"zero-slot table scan", and we've added ACID slots to materialize.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1591
PS2, Line 1591: insertSlotDesc.getMaterializedPath())) {
> Call analyze for the BinaryPredicate.  Also, could you add a simple test wi
Done. Added planner tests for explain_level 2 and 3.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1605
PS2, Line 1605:   /**
> nit: spelling 'partitoin'
Done


http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File 

[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#4).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operatations on a table.
It achieves it via assigning a unique row-id for each row, and
maintinaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
15 files changed, 1,147 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/4
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6365/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Jun 2020 12:50:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-18 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 3:

PS3 is only a rebase


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 18 Jun 2020 12:28:44 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Aman Sinha, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#3).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operatations on a table.
It achieves it via assigning a unique row-id for each row, and
maintinaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
15 files changed, 745 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/3
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-17 Thread Aman Sinha (Code Review)
Aman Sinha has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 2:

(11 comments)

Did a first pass..some comments below.

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@673
PS2, Line 673:   protected HdfsPartition(HdfsPartition other) {
Looking at the changes in https://issues.apache.org/jira/browse/IMPALA-9778,  
the HdfsPartition class  has been restructured quite a bit.. although it seems 
your changes should be ok since you are only creating a clone.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@178
PS2, Line 178:   if (!isAcidJoin_ || detailLevel.ordinal() == 
TExplainLevel.VERBOSE.ordinal()) {
For the anti join should we not show the join conditions in the Impala query 
profile ? (query profile output is in EXTENDED mode, not VERBOSE).


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java
File fe/src/main/java/org/apache/impala/planner/JoinNode.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@83
PS2, Line 83:   // True if this join is used to do the join between insert and 
deleta delta files.
nit: spelling 'deleta'


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/JoinNode.java@88
PS2, Line 88: displayName_ = "ACID " + displayName_;
The  'ACID' prefix for a HashJoin could be misleading to end users since a plan 
operator itself is not ACID in DBMS terms.  Could we use something like. 'DELTA 
TABLE HASH JOIN'  (open to other suggestions).


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1453
PS2, Line 1453:   FeFsPartition fsPartition = (FeFsPartition)partition;
The casting  is redundant since 'partition' is already of type FeFsPartition.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1454
PS2, Line 1454:   for (HdfsPartition.FileDescriptor fd : 
fsPartition.getFileDescriptors()) {
Is there any indicator in the HMS's partition object that indicates whether any 
delete was done ?  It does seem expensive to  check all the file descriptors 
within a partition since in vast majority of cases nothing may have been 
deleted or the 'delete_delta' is found only towards the end. However,  if such 
metadata does not exist,  then I suppose this is the only alternative.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1480
PS2, Line 1480: String[] acidFields = {"originaltransaction", "bucket", 
"rowid"};
The design doc suggests the anti-join is on 4 fields..is the partition id not 
needed in this join ?


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1551
PS2, Line 1551: hdfsTblRef.getDesc(), conjuncts, insertPartitions, 
hdfsTblRef, aggInfo,
The  'aggInfo' is meant for the simple 'select count(*) from table'  type of 
query where the aggregation value can be found from the row count in the 
metadata.  But in this case, since  we want to compute this value only after 
the anti-join has been done,  I think it should be null.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1591
PS2, Line 1591:   ret.add(new BinaryPredicate(
Call analyze for the BinaryPredicate.  Also, could you add a simple test with 
explain_level=3 to check for the join condition ? I don't think I saw one.


http://gerrit.cloudera.org:8080/#/c/16082/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1605
PS2, Line 1605:* @return the cloned partitoin
nit: spelling 'partitoin'


http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
File testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test:

http://gerrit.cloudera.org:8080/#/c/16082/2/testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test@58
PS2, Line 58: |  row-size=100B cardinality=143
The cardinality estimate for this hash join 

[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6350/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 17 Jun 2020 16:16:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-17 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/16082

to look at the new patch set (#2).

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operatations on a table.
It achieves it via assigning a unique row-id for each row, and
maintinaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M fe/src/main/java/org/apache/impala/catalog/FeFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
15 files changed, 733 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/2
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16082 )

Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6329/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 15 Jun 2020 17:36:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive types)

2020-06-15 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16082


Change subject: IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified 
tables (primitive types)
..

IMPALA-9859: Full ACID Milestone 4: Part 1 Reading modified tables (primitive 
types)

Hive ACID supports row-level DELETE and UPDATE operatations on a table.
It achieves it via assigning a unique row-id for each row, and
maintinaining two sets of files in a table. The first set is in the
base/delta directories, they contain the INSERTed rows. The second set
of files are in the delete-delta directories, they contain the DELETEd
rows.

(UPDATE operations are implemented via DELETE+INSERT.)

In the filesystem it looks like e.g.:
 * full_acid/delta_001_001_/_0
 * full_acid/delta_002_002_/_0
 * full_acid/delete_delta_003_003_/_0

During scanning we need to return INSERTed rows minus DELETEd rows.
This patch implements it by creating an ANTI JOIN between the INSERT and
DELETE sets. It is a planner-only modification. Every HDFS SCAN
that scans full ACID tables (that also have deleted rows) are converted
to two HDFS SCANs, one for the INSERT deltas, and one for the DELETE
deltas. Then a LEFT ANTI HASH JOIN with BROADCAST distribution mode is
created above them.

Later we can add support for other distribution modes if the performance
requires it. E.g. if we have too many deleted rows then probably we are
better off with PARTITIONED distribution mode. We could estimate the
number of deleted rows by sampling the delete delta files.

The current patch only works for primitive types. I.e. we cannot select
nested data if the table has deleted rows.

Testing:
 * added planner test
 * added e2e tests

Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
---
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/util/AcidUtils.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/util/AcidUtilsTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/acid-scans.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/full-acid-scans.test
M tests/query_test/test_acid.py
13 files changed, 708 insertions(+), 88 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/16082/1
--
To view, visit http://gerrit.cloudera.org:8080/16082
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I15c8feabf40be1658f3dd46883f5a1b2aa5d0659
Gerrit-Change-Number: 16082
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy