UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2275332927
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter:
UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3078462264
> > select t1.value from range(100) t1 join range(819200) t2 on t1.value + t2.value < t1.value * t2.value;
>
> I'm happy to include this benchmark in the bench suite this week,
2010YOUY01 commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3076513755
> select t1.value from range(100) t1 join range(819200) t2 on t1.value + t2.value < t1.value * t2.value;
I'm happy to include this benchmark in the bench suite this week, un
UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3071663279
> * Refactor so we limit building the entire cartesian product of both batches (this is already covered in the issue and I believe @UBarney is willing to work on this)
Yes. I'
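
[Editor's sketch] The follow-up quoted above, limiting how much of the cartesian product is built at once, can be illustrated roughly as follows. This is a minimal standalone sketch, not code from the PR: `CartesianChunks`, `count_matches`, and the `max_pairs` parameter are hypothetical names, plain `Vec`s stand in for Arrow index arrays, and the `u64` build-index type is an assumption (only `probe_indices: UInt32Array` appears in the diff context). The filter is the predicate from the benchmark query in this thread.

```rust
/// Hypothetical helper (not DataFusion's API): walks the cartesian product of
/// `build_rows` x `probe_rows` in chunks of at most `max_pairs` index pairs,
/// so the full product is never materialized at once.
struct CartesianChunks {
    build_rows: usize,
    probe_rows: usize,
    max_pairs: usize,
    pos: usize, // linear position in the flattened product
}

impl CartesianChunks {
    fn new(build_rows: usize, probe_rows: usize, max_pairs: usize) -> Self {
        assert!(max_pairs > 0, "chunk size must be positive");
        Self { build_rows, probe_rows, max_pairs, pos: 0 }
    }
}

impl Iterator for CartesianChunks {
    // (build indices, probe indices); the index widths are assumptions.
    type Item = (Vec<u64>, Vec<u32>);

    fn next(&mut self) -> Option<Self::Item> {
        let total = self.build_rows * self.probe_rows;
        if self.pos >= total {
            return None;
        }
        let end = (self.pos + self.max_pairs).min(total);
        let mut build = Vec::with_capacity(end - self.pos);
        let mut probe = Vec::with_capacity(end - self.pos);
        for p in self.pos..end {
            build.push((p / self.probe_rows) as u64);
            probe.push((p % self.probe_rows) as u32);
        }
        self.pos = end;
        Some((build, probe))
    }
}

/// Applies the benchmark predicate `l + r < l * r` chunk by chunk and returns
/// (number of matching pairs, size of the largest chunk that was built).
fn count_matches(build_rows: usize, probe_rows: usize, max_pairs: usize) -> (usize, usize) {
    let mut matches = 0;
    let mut largest_chunk = 0;
    for (build, probe) in CartesianChunks::new(build_rows, probe_rows, max_pairs) {
        largest_chunk = largest_chunk.max(build.len());
        for (&l, &r) in build.iter().zip(probe.iter()) {
            let r = u64::from(r);
            if l + r < l * r {
                matches += 1;
            }
        }
    }
    (matches, largest_chunk)
}

fn main() {
    let (matches, largest_chunk) = count_matches(4, 5, 6);
    println!("matches = {matches}, largest chunk = {largest_chunk}");
}
```

The point of the design is that peak intermediate memory is bounded by `max_pairs` rather than by the product of the two batch sizes; the filter result is the same either way.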
alamb commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3070765444
Awesome -- thanks @jonathanc-n and @UBarney -- I am very happy to see this
moving along
--
This is an automated message from the Apache Git Service.
To respond to the message, please
alamb merged PR #16443:
URL: https://github.com/apache/datafusion/pull/16443
jonathanc-n commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3070761175
@alamb Yes I believe all comments have been addressed. I think we have two
notable follow ups:
- Refactor so we limit building the entire cartesian product of both
batches (t
alamb commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3070585433
Is this one ready to merge?
2010YOUY01 commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2203255450
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filte
Dandandan commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2203255187
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2203230458
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter:
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2202485385
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -828,15 +845,125 @@ impl NestedLoopJoinStream {
let poll = handle_state!(self
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2202485179
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -705,8 +696,29 @@ impl NestedLoopJoinStreamState {
}
}
+/// Tracks incremental output of j
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2202340283
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -828,13 +833,127 @@ impl NestedLoopJoinStream {
handle_state!(self.process_pr
2010YOUY01 commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2199484385
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filte
2010YOUY01 commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2199423074
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -828,13 +833,127 @@ impl NestedLoopJoinStream {
handle_state!(self.process
jonathanc-n commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3058063741
@2010YOUY01 Special types need to only return the matching rows, so only one
side needs to return rows while the other side can return a null array and not
be projected in the fi
2010YOUY01 commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3056490809
> I have addressed all of your comments. @2010YOUY01 please take another look
>
> > I recommend to doc more high-level ideas to key functions, to make this
module easier to
UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3055598311
I have addressed all of your comments. @2010YOUY01 please take another look
> I recommend to doc more high-level ideas to key functions, to make this
module easier to
2010YOUY01 commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2194486005
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -689,6 +674,8 @@ enum NestedLoopJoinStreamState {
ProcessProbeBatch(RecordBatch),
///
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2194337073
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter:
alamb commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2192449966
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter: &J
alamb commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2192449082
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter: &J
Dandandan commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2191954803
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter
Dandandan commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2191939442
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter
Dandandan commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2191937168
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter
Dandandan commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2191935939
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter
2010YOUY01 commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3047980169
> @korowa @2010YOUY01 Are you able to take a quick look? Thanks!
Thank you so much for this optimization. It's on my list, but due to the
complexity of the join operator, I
jonathanc-n commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3045016104
@korowa @2010YOUY01 Are you able to take a quick look? Thanks!
jonathanc-n commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2181119355
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -828,13 +833,127 @@ impl NestedLoopJoinStream {
handle_state!(self.proces
UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-3038952572
> Thanks @UBarney, just some comments
Thanks @jonathanc-n for reviewing. I have addressed all of your comments.
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2182935284
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter:
jonathanc-n commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2182837993
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filt
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2182032475
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter:
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2181428292
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -883,44 +1000,63 @@ impl NestedLoopJoinStream {
let visited_left_side = left_data.bitmap(
jonathanc-n commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2181125973
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -828,13 +828,127 @@ impl NestedLoopJoinStream {
handle_state!(self.proces
jonathanc-n commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2181102938
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -828,13 +833,127 @@ impl NestedLoopJoinStream {
handle_state!(self.proces
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2177799623
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -883,44 +1002,66 @@ impl NestedLoopJoinStream {
let visited_left_side = left_data.bitmap(
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2177790602
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -1215,16 +1324,25 @@ pub(crate) mod tests {
batches.extend(
more_bat
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r216496
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -828,13 +833,127 @@ impl NestedLoopJoinStream {
handle_state!(self.process_pr
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r210988
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -729,10 +716,26 @@ struct NestedLoopJoinStream {
right_side_ordered: bool,
/// Current s
2010YOUY01 commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2173715452
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -883,44 +1002,66 @@ impl NestedLoopJoinStream {
let visited_left_side = left_data.bitm
Copilot commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2173686344
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -729,10 +716,26 @@ struct NestedLoopJoinStream {
right_side_ordered: bool,
/// Current s
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2173642138
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -883,44 +1002,66 @@ impl NestedLoopJoinStream {
let visited_left_side = left_data.bitmap(
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2173635381
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -828,13 +833,127 @@ impl NestedLoopJoinStream {
handle_state!(self.process_pr
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2173634499
##
datafusion/physical-plan/src/joins/utils.rs:
##
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
probe_indices: UInt32Array,
filter:
korowa commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2173258366
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -729,10 +716,26 @@ struct NestedLoopJoinStream {
right_side_ordered: bool,
/// Current st
jonathanc-n commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2160347506
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -828,13 +833,127 @@ impl NestedLoopJoinStream {
handle_state!(self.proces
UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-2994063902
> * `apply_join_filter_to_indices` showed a reduction in execution time (sample count reduced from 528 million to 241 million).
The benchmark results indicate that restricting th
UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-2993919183
> When you are running the benchmarks do they stay consistent?
Yes, the benchmark results are almost consistent.
I ran the benchmarks a few minutes ago on commit.
It's wort
jonathanc-n commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-2993895538
When you are running the benchmarks do they stay consistent?
UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-2993893069
> I'll find out why there is a performance improvement
From the flame graph (when executing the SQL `select t1.value from
range(8192) t1 join range(8192) t2 on t1.value + t2.va
UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2156308803
##
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##
@@ -510,8 +511,6 @@ impl ExecutionPlan for NestedLoopJoinExec {
})?;
let batch_si
jonathanc-n commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-2988168626
Those benchmark helper functions are really cool, I'll see if I can take a
look today.
UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-2986400295
# benchmark
I used this
[script](https://gist.github.com/UBarney/9dcbf304e65f061d3352b34abd0f0e05#file-sql_bench-py)
to run the benchmark
| ID | SQL | join_base Time(s) | join_li
UBarney opened a new pull request, #16443:
URL: https://github.com/apache/datafusion/pull/16443
## Which issue does this PR close?
part of #16364
## Rationale for this change
see issue
## What changes are included in this PR?
1. Limit intermediate_batch Siz