[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-19 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16396/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 Jun 2024 11:43:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-19 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 9: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 Jun 2024 11:25:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-19 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21435/8/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/8/be/src/exec/iceberg-delete-builder.cc@319
PS8, Line 319: )
> Now that this function doesn't return void, I think it helps readability to
Done


http://gerrit.cloudera.org:8080/#/c/21435/8/testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test:

http://gerrit.cloudera.org:8080/#/c/21435/8/testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test@5
PS8, Line 5: UNION node which behaves
   : # differently
> Does it mean that the result will be different? Or only that we can't asser
The results can be different because how rand() is being used in the fragment 
instance.

I haven't insvestigated all the details, but we get different results when both 
data files are scanned in the same fragment VS the data files are scheduled on 
different nodes.



--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 Jun 2024 11:18:43 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-19 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21435

to look at the new patch set (#9).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copyings.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
A testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M tests/query_test/test_scanners.py
5 files changed, 164 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/9
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-19 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 8: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21435/8/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/8/be/src/exec/iceberg-delete-builder.cc@319
PS8, Line 319: )
Now that this function doesn't return void, I think it helps readability to add 
annotation for the return value "-> Status".


http://gerrit.cloudera.org:8080/#/c/21435/8/testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test
File 
testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test:

http://gerrit.cloudera.org:8080/#/c/21435/8/testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test@5
PS8, Line 5: UNION node which behaves
   : # differently
Does it mean that the result will be different? Or only that we can't assert 
specific values in the profile?



--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 Jun 2024 09:32:06 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 8: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 19 Jun 2024 02:01:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10732/


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Jun 2024 21:26:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 8:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16389/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Jun 2024 21:02:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21435

to look at the new patch set (#8).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copyings.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
A testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M tests/query_test/test_scanners.py
5 files changed, 164 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/8
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 8:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10736/ 
DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Jun 2024 20:39:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21435/7/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/21435/7/tests/query_test/test_scanners.py@526
PS7, Line 526: @
flake8: F811 redefinition of unused 'test_iceberg_query' from line 520



-- 
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Jun 2024 17:14:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16384/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Jun 2024 16:49:32 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21435

to look at the new patch set (#7).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copyings.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
A testdata/workloads/functional-query/queries/QueryTest/iceberg-query-v2.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-query.test
M tests/query_test/test_scanners.py
5 files changed, 164 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/7
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10732/ 
DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Jun 2024 16:17:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 6: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10729/


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Jun 2024 14:41:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10729/ 
DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 18 Jun 2024 09:29:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-17 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 17 Jun 2024 15:32:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16368/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 17 Jun 2024 15:12:04 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 6:

(3 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@305
PS5, Line 305: vect
> I think using the actual type (vector&) is more readable.
Done


http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@327
PS5, Line 327: join
> "this IcebergDeleteBuilder"?
It was intentional from me to write 'join', as the associated JOIN fragment 
processes the data files, and in the builder's context it should be clear which 
'join' we are referring to.


http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@328
PS5, Line 328: process
> Nit: "processes".
Done



--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 17 Jun 2024 14:47:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-17 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21435

to look at the new patch set (#6).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copyings.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 113 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/6
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-17 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 5:

(4 comments)

Thanks Zoltán!

http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@242
PS3, Line 242: auto
> The actual type is a pair.
Yes, it's also a pity that structured bindings don't allow type annotation.


http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@305
PS5, Line 305: auto
I think using the actual type (vector&) is more readable.


http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@327
PS5, Line 327: join
"this IcebergDeleteBuilder"?


http://gerrit.cloudera.org:8080/#/c/21435/5/be/src/exec/iceberg-delete-builder.cc@328
PS5, Line 328: process
Nit: "processes".



--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 17 Jun 2024 14:41:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-17 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16366/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 17 Jun 2024 14:11:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-17 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 5:

(9 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@70
PS3, Line 70: s
> Nit: this should be "Processes".
Done


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@121
PS3, Line 121: ctor of vectors means we can just allocate another vector to hold
 :   // subsequent eleme
> Could you clarify this? Is it about 'intermediate_delete_rows_' in addition
Rephrased a bit, I hope it's clearer now.


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@149
PS3, Line 149: s
> We usually take non-const (output) arguments by pointer.
Done, also switched the parameters, as we prefer to have output parameters at 
the end.


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@184
PS3, Line 184: Inte
> I think using the actual type (IntermediateDeleteRowVector&) is more readab
Done


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@242
PS3, Line 242: auto
> I think using the actual type (IntermediateDeleteRowVector&) is more readab
The actual type is a pair.

I switched to structured binding as I think with proper variable names it is 
more readable.


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@250
PS3, Line 250: e_ve
> I think using the actual type (vector) is more readable. Also L252
Done


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@296
PS3, Line 296: cons
> I think using the actual type (vector) is more readable. Also L306
Done


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@319
PS3, Line 319:
> Can we take it by const ref?
Done


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@327
PS3, Line 327:
> Can it happen that (*path) is not a key in 'intermediate_delete_rows_' but
Done



--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Mon, 17 Jun 2024 13:47:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-17 Thread Zoltan Borok-Nagy (Code Review)
Hello Daniel Becker, Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21435

to look at the new patch set (#5).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copyings.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 113 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/5
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-14 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 3:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@70
PS3, Line 70: d
Nit: this should be "Processes".


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@121
PS3, Line 121: The newly added vectors have double capacity so we don't end up 
having
 :   // too many vectors
Could you clarify this? Is it about 'intermediate_delete_rows_' in addition to 
the final map, or the embedded vectors?


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.h@149
PS3, Line 149: &
We usually take non-const (output) arguments by pointer.


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@184
PS3, Line 184: auto
I think using the actual type (IntermediateDeleteRowVector&) is more readable.


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@242
PS3, Line 242: auto
I think using the actual type (IntermediateDeleteRowVector&) is more readable.


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@250
PS3, Line 250: auto
I think using the actual type (vector) is more readable. Also L252.


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@296
PS3, Line 296: auto
I think using the actual type (vector) is more readable. Also L306.


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@319
PS3, Line 319: StringValue*
Can we take it by const ref?


http://gerrit.cloudera.org:8080/#/c/21435/3/be/src/exec/iceberg-delete-builder.cc@327
PS3, Line 327: }
Can it happen that (*path) is not a key in 'intermediate_delete_rows_' but 
path->Len() != 0? If not, we could add DCHECK(false) in a following ELSE 
branch, with an explaining comment.



--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 14 Jun 2024 16:37:40 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-13 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 3:

(6 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@95
PS2, Line 95:   // Collects the processed data files' file paths and fills 
'intermediate_delete_rows_'
> The function comment is not valid now. Would be nice to mention that this p
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@124
PS2, Line 124:   // Keys are data file paths processed by the join operator. 
Values are the file
> Can you add a comment what the keys and values are for in this map?
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@146
PS2, Line 146:
 :   /// Inserts the conte
> nit: seems a weird place to break the line
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@184
PS2, Line 184: auto& delete_vector =
> To increase readability, do you think it'd make sense to introduce a variab
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@259
PS2, Line 259: rrent delet
> nit: I'd break the line before the first param, and then each param could f
Done


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@325
PS2, Line 325:   state->LogError(
> KrpcDataStreamSender already filters out NULL file_paths. Is there a need t
We can see NULLs when NUM_NODES=1



--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 16:31:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16333/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 13 Jun 2024 16:54:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-06-13 Thread Zoltan Borok-Nagy (Code Review)
Hello Gabor Kaszab, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21435

to look at the new patch set (#3).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copyings.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 108 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/3
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-30 Thread Gabor Kaszab (Code Review)
Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@95
PS2, Line 95:   // Checks distribution mode and collects the processed data 
files' file path in case
The function comment is not valid now. Would be nice to mention that this 
populates the keys of intermediate_delete_rows_ and adds empty vectors to the 
values.


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@124
PS2, Line 124:   using IntermediateDeleteRowHashTable =
Can you add a comment what the keys and values are for in this map?


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.h@146
PS2, Line 146: IntermediateDeleteRowVector&
 : intermediate_vector
nit: seems a weird place to break the line


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@184
PS2, Line 184: retval.first->second.
To increase readability, do you think it'd make sense to introduce a 
variable/reference for 'retval.first->second'?


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@259
PS2, Line 259: std::unique
nit: I'd break the line before the first param, and then each param could fit 
into a (separate) single line.


http://gerrit.cloudera.org:8080/#/c/21435/2/be/src/exec/iceberg-delete-builder.cc@325
PS2, Line 325:   ErrorMsg(TErrorCode::GENERAL, "NULL found as file_path 
in delete file"));
KrpcDataStreamSender already filters out NULL file_paths. Is there a need to 
add an error message here too? Can't we DCHECK for non-nullness? Or is this 
because of some data corruption scenario?



--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 30 May 2024 14:58:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-23 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/16211/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 23 May 2024 13:09:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-23 Thread Zoltan Borok-Nagy (Code Review)
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21435

to look at the new patch set (#2).

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copyings.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
(cherry picked from commit d08315fe5c57ccb5b197cd196b62eeedf7d90ec3)
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 101 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/2
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10655/


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 21 May 2024 19:54:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-21 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has removed a vote on this change.

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Removed Verified-1 by Impala Public Jenkins 
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-21 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10655/ 
DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Tue, 21 May 2024 14:53:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 1: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/10646/


--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 16 May 2024 21:28:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-16 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21435 )

Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/10646/ 
DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 16 May 2024 16:25:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

2024-05-16 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21435


Change subject: IMPALA-13088: (part 1) Improve build batch processing of 
IcebergDeleteBuilder
..

IMPALA-13088: (part 1) Improve build batch processing of IcebergDeleteBuilder

When there are lots of delete records the IcebergDeleteBuilder can
become a bottleneck. Since the left side of the JOIN is blocked on
the build side any improvement we make here significantly improves
Iceberg V2 table scanning.

Improvements of this patch:

* Use a vector of vectors to collect the position delete records.
  This way we can avoid large re-allocations and copyings.
* Insert large ranges from the build batches into the collected
  delete records instead of doing it one-by-one.

Measurements

Local measurement with 824 Million position delete records:
JOIN BUILD: ~32s -> ~14s (6s is the final sorting)

40-node cluster with 68.5 Billion position delete records:
JOIN BUILD: 4m15s -> 1m45s (1m7s is the final sorting)

Parallelization of the final sort will be added in a follow-up CR.

Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
---
M be/src/exec/iceberg-delete-builder.cc
M be/src/exec/iceberg-delete-builder.h
2 files changed, 101 insertions(+), 22 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/35/21435/1
--
To view, visit http://gerrit.cloudera.org:8080/21435
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I14541a064a522d4780fb5f02636736259e79b9cf
Gerrit-Change-Number: 21435
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy