Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
rluvaton closed pull request #19431: perf: improve `range` and `generate_series` for `Int64`
URL: https://github.com/apache/datafusion/pull/19431
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696815910
Closing, as it is already fast enough and this change has some regressions.
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696805577
🤖: Benchmark completed
Details
```
group                          improve-range-and-generate-series-for-int-64    main
-----                          --------------------------------------------    ----
generate_series(0, 100, 5)     1.81    836.7±12.13µs   ? ?/sec                 1.00    462.2±6.92µs    ? ?/sec
generate_series(100)           2.19    3.4±0.02ms      ? ?/sec                 1.00    1552.7±14.22µs  ? ?/sec
generate_series(100, 0, -5)    1.00    327.6±7.64µs    ? ?/sec                 1.39    454.4±6.16µs    ? ?/sec
range(0, 100, 5)               1.00    388.5±3.23µs    ? ?/sec                 1.16    450.3±4.20µs    ? ?/sec
range(100)                     1.00    1214.4±7.87µs   ? ?/sec                 1.28    1556.8±16.35µs  ? ?/sec
range(100, 0, -5)              1.00    323.2±3.67µs    ? ?/sec                 1.40    452.1±8.87µs    ? ?/sec
```
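Reading the ratio columns: the output appears to be critcmp-style, where each side shows its mean time divided by the faster of the two means, so 1.00 marks the faster side. A minimal sketch, assuming that reading, reproduces rows of the table from the reported means; `relative_ratios` is a hypothetical helper and the inputs are copied from the table.

```rust
/// Hypothetical helper: given the mean times of the two runs (same unit),
/// return each run's time divided by the faster of the two, so the faster
/// run gets 1.00 and the slower one gets its slowdown factor.
fn relative_ratios(branch: f64, main: f64) -> (f64, f64) {
    let fastest = branch.min(main);
    (branch / fastest, main / fastest)
}
```

Printed with `{:.2}`, `relative_ratios(836.7, 462.2)` gives 1.81 and 1.00 (the `generate_series(0, 100, 5)` row), and `relative_ratios(3400.0, 1552.7)` gives 2.19 for `generate_series(100)` once 3.4 ms is written in µs.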
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696745321
🤖 `./gh_compare_branch_bench.sh` [compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh) Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-range-and-generate-series-for-int-64 (df3c1b91a1eda6774db51fb5e1eca0e1c6454bdb) to 83ed19235b700a2bd41283301cfd344cb5b565bc [diff](https://github.com/apache/datafusion/compare/83ed19235b700a2bd41283301cfd344cb5b565bc..df3c1b91a1eda6774db51fb5e1eca0e1c6454bdb)
BENCH_NAME=range_and_generate_series
BENCH_COMMAND=cargo bench --features=parquet --bench range_and_generate_series
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-range-and-generate-series-for-int-64
Results will be posted here when complete
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
alamb commented on code in PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#discussion_r2651085722
##
datafusion/sqllogictest/test_files/table_functions.slt:
##
@@ -510,6 +510,277 @@ SELECT c, f.* FROM json_table, LATERAL generate_series(1,2) f;
 2 1
 2 2
+# To test edge cases related to batch size
+statement ok
+set datafusion.execution.batch_size = 10;
+
+# range equal batch size
+query I
+SELECT * FROM range(10)
+----
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+
+# generate_series equal batch size (including end)
+query I
+SELECT * FROM generate_series(9)
+----
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+
+# range equal batch size * 2
+query I
+SELECT * FROM range(20)
+----
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+
+# generate_series equal batch size * 2 (including end)
+query I
+SELECT * FROM generate_series(19)
+----
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+
+# range equal batch size with starting value
+query I
+SELECT * FROM range(1, 11)
+----
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+
+# generate_series equal batch size with starting value (including end)
+query I
+SELECT * FROM generate_series(1, 10)
+----
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+
+# range equal batch size * 2 with starting value
+query I
+SELECT * FROM range(1, 21)
+----
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+20
+
+# generate_series equal batch size * 2 with starting value (including end)
+query I
+SELECT * FROM generate_series(1, 20)
+----
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+20
+
+# range equal batch size with starting value and step
+query I
+SELECT count(*) FROM range(1, 21, 2)
+----
+10
+
+# range equal batch size with starting value and step
+query I
+SELECT * FROM range(1, 21, 2)
+----
+1
+3
+5
+7
+9
+11
+13
+15
+17
+19
+
+# generate_series equal batch size with starting value and step (including end)
+query I
+SELECT count(*) FROM generate_series(1, 20, 2)
+----
+10
+
+# generate_series equal batch size with starting value and step (including end)
+query I
+SELECT * FROM generate_series(1, 20, 2)
+----
+1
+3
+5
+7
+9
+11
+13
+15
+17
+19
+
+# range equal batch size * 2 with starting value and step
+query I
+SELECT count(*) FROM range(1, 40, 2)
+----
+20
+
+# range equal batch size * 2 with starting value and step
+query I
+SELECT * FROM range(1, 40, 2)
+----
+1
+3
+5
+7
+9
+11
+13
+15
+17
+19
+21
+23
+25
+27
+29
+31
+33
+35
+37
+39
+
+# generate_series equal batch size * 2 with starting value and step (including end)
+query I
+SELECT count(*) FROM generate_series(1, 39, 2)
+----
+20
+
+# generate_series equal batch size * 2 with starting value and step (including end)
+query I
+SELECT * FROM generate_series(1, 39, 2)
Review Comment:
Should we also add a test for a starting value other than 1? Perhaps 100 and -100?
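To pick expected values for the extra cases suggested in the review (starting values such as 100 and -100), a small reference model can help. A minimal sketch, assuming the semantics exercised by the tests above (`range` excludes `end`, `generate_series` includes it) and modeling the output as consecutive batches of `batch_size` rows; `expected_batches` is a hypothetical helper, not a DataFusion API.

```rust
/// Hypothetical reference model for `range` / `generate_series` over i64,
/// split into consecutive batches of `batch_size` rows. Uses i128 for the
/// running value so stepping past `end` cannot overflow.
fn expected_batches(
    start: i64,
    end: i64,
    step: i64,
    include_end: bool,
    batch_size: usize,
) -> Vec<Vec<i64>> {
    assert!(step != 0, "step must be non-zero");
    assert!(batch_size > 0, "batch_size must be positive");
    let (end, step) = (end as i128, step as i128);
    let mut values = Vec::new();
    let mut cur = start as i128;
    loop {
        // Direction and end-inclusiveness decide the stopping condition.
        let in_range = match (step > 0, include_end) {
            (true, true) => cur <= end,
            (true, false) => cur < end,
            (false, true) => cur >= end,
            (false, false) => cur > end,
        };
        if !in_range {
            break;
        }
        // `cur` lies between `start` and `end` here, so it fits in i64.
        values.push(cur as i64);
        cur += step;
    }
    values.chunks(batch_size).map(|c| c.to_vec()).collect()
}
```

For example, `expected_batches(100, 120, 1, false, 10)` models `range(100, 120)` at `batch_size = 10` and yields two full batches, the same boundary case as `range(20)` shifted away from 0; `expected_batches(-100, -81, 1, true, 10)` does the same for `generate_series(-100, -81)`.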
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
alamb commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696670210
run benchmark range_and_generate_series
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
alamb commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696670052
These cases seem to have slowed down:
```
generate_series(0, 100, 5)     1.86    838.9±8.69µs    ? ?/sec                 1.00    449.9±6.31µs    ? ?/sec
generate_series(100)           2.25    3.5±0.05ms      ? ?/sec                 1.00    1552.1±21.57µs  ? ?/sec
```
Is that expected?
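One observation: both slowed-down cases are `generate_series` with a positive step, which in the hunk quoted further down goes through the inclusive-range branch (`(current..=end).step_by(...)`), while the half-open `range` cases got faster. One possible explanation, not verified here, is that iterating a `RangeInclusive` can optimize worse than a half-open `Range` because of its extra exhaustion bookkeeping. A minimal sketch of a count-based alternative, assuming `Vec<i64>` output as a stand-in for `Int64Array`: it computes how many values remain, then fills the batch from the closed form `current + i * step`. `next_batch` is hypothetical and has not been benchmarked.

```rust
/// Hypothetical count-based batch generator for an i64 series.
/// Returns the batch and the value the next batch should resume from
/// (`None` once the series is exhausted).
fn next_batch(
    current: i64,
    end: i64,
    step: i64,
    include_end: bool,
    batch_size: usize,
) -> (Vec<i64>, Option<i64>) {
    assert!(step != 0, "step must be non-zero");
    // i128 keeps distances and products from overflowing near i64 limits.
    let (cur, end, step) = (current as i128, end as i128, step as i128);
    // Distance to `end`, measured in the direction of travel.
    let dist = if step > 0 { end - cur } else { cur - end };
    let abs_step = step.abs();
    // Number of values still to emit in the whole series.
    let remaining: i128 = if dist < 0 || (dist == 0 && !include_end) {
        0
    } else if include_end {
        dist / abs_step + 1
    } else {
        (dist - 1) / abs_step + 1
    };
    let n = remaining.min(batch_size as i128);
    // Closed form: the i-th value is cur + i * step, no per-element end check.
    let batch: Vec<i64> = (0..n).map(|i| (cur + i * step) as i64).collect();
    let next = if remaining > n {
        Some((cur + n * step) as i64)
    } else {
        None
    };
    (batch, next)
}
```

A caller would store `next` back into its state between batches; for example, `next_batch(0, 100, 5, true, 10)` returns `0, 5, ..., 45` and resumes at `Some(50)`.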
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678835978
🤖: Benchmark completed
Details
```
group                          improve-range-and-generate-series-for-int-64    main
-----                          --------------------------------------------    ----
generate_series(0, 100, 5)     1.86    838.9±8.69µs    ? ?/sec                 1.00    449.9±6.31µs    ? ?/sec
generate_series(100)           2.25    3.5±0.05ms      ? ?/sec                 1.00    1552.1±21.57µs  ? ?/sec
generate_series(100, 0, -5)    1.00    326.4±6.72µs    ? ?/sec                 1.42    462.1±6.47µs    ? ?/sec
range(0, 100, 5)               1.00    393.8±12.61µs   ? ?/sec                 1.15    453.1±10.31µs   ? ?/sec
range(100)                     1.00    1237.9±19.84µs  ? ?/sec                 1.26    1559.6±24.79µs  ? ?/sec
range(100, 0, -5)              1.00    319.1±5.62µs    ? ?/sec                 1.44    460.7±15.48µs   ? ?/sec
```
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
pepijnve commented on code in PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#discussion_r2637894371
##
datafusion/functions-table/src/generate_series.rs:
##
@@ -109,6 +133,59 @@ impl SeriesValue for i64 {
     fn display_value(&self) -> String {
         self.to_string()
     }
+
+    fn generate_array_for_series(
+        series_state: &mut GenericSeriesState,
+    ) -> Result {
+        let array: Int64Array = if series_state.step > 0 {
+            if series_state.include_end {
+                Int64Array::from_iter_values(
+                    (series_state.current..=series_state.end)
+                        .step_by(series_state.step as usize)
+                        .take(series_state.batch_size),
+                )
+            } else {
+                Int64Array::from_iter_values(
+                    (series_state.current..series_state.end)
+                        .step_by(series_state.step as usize)
+                        .take(series_state.batch_size),
+                )
+            }
+        } else {
+            // step < 0
+            let cur = series_state.current as i128;
Review Comment:
I gave it a go locally. This seems to do the trick:
```rust
if series_state.include_end {
Int64Array::from_iter_values(
(series_state.end..=series_state.current)
.rev()
.step_by(-series_state.step as usize)
.take(series_state.batch_size),
)
} else {
Int64Array::from_iter_values(
((series_state.end + 1)..=series_state.current)
.rev()
.step_by(-series_state.step as usize)
.take(series_state.batch_size),
)
}
```
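The suggested descending branch can be checked in isolation. A minimal sketch, assuming plain parameters in place of the `series_state` fields and a `Vec<i64>` in place of `Int64Array::from_iter_values`; `descending_batch` is a hypothetical name. It reverses the ascending range `end..=current` so iteration starts at `current`, and raises the lower bound to `end + 1` when `end` is exclusive.

```rust
/// Hypothetical standalone version of the reverse-iterator branch above.
/// Emits up to `batch_size` values: current, current + step, ... toward `end`.
fn descending_batch(
    current: i64,
    end: i64,
    step: i64,
    include_end: bool,
    batch_size: usize,
) -> Vec<i64> {
    debug_assert!(step < 0, "this branch handles negative steps only");
    // Unary minus binds tighter than `as`, so this is (-step) as usize.
    let stride = -step as usize;
    if include_end {
        // Reversed inclusive range: current, current - 1, ..., end.
        (end..=current).rev().step_by(stride).take(batch_size).collect()
    } else {
        // Exclusive end: the lowest value allowed is end + 1.
        ((end + 1)..=current).rev().step_by(stride).take(batch_size).collect()
    }
}
```

Because `step_by` yields the first element and then every `stride`-th one, each batch starts at `current` and never passes `end`; for example, `descending_batch(10, 0, -3, true, 100)` yields `10, 7, 4, 1`.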
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
alamb commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678818049
> @alamb I might have increased the benchmark too much; is it stuck?
I don't know what happened; the runner had died. I restarted t
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678817260
🤖 `./gh_compare_branch_bench.sh` [compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh) Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-range-and-generate-series-for-int-64 (67b2d0404ed65f27526b1f803b18abeeaa7027e9) to 4249e4ecd354e6060e2a4ea33f38552366299a4e [diff](https://github.com/apache/datafusion/compare/4249e4ecd354e6060e2a4ea33f38552366299a4e..67b2d0404ed65f27526b1f803b18abeeaa7027e9)
BENCH_NAME=range_and_generate_series
BENCH_COMMAND=cargo bench --features=parquet --bench range_and_generate_series
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-range-and-generate-series-for-int-64
Results will be posted here when complete
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
pepijnve commented on code in PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#discussion_r2637804725
##
datafusion/functions-table/src/generate_series.rs:
##
@@ -109,6 +133,59 @@ impl SeriesValue for i64 {
     fn display_value(&self) -> String {
         self.to_string()
     }
+
+    fn generate_array_for_series(
+        series_state: &mut GenericSeriesState,
+    ) -> Result {
+        let array: Int64Array = if series_state.step > 0 {
+            if series_state.include_end {
+                Int64Array::from_iter_values(
+                    (series_state.current..=series_state.end)
+                        .step_by(series_state.step as usize)
+                        .take(series_state.batch_size),
+                )
+            } else {
+                Int64Array::from_iter_values(
+                    (series_state.current..series_state.end)
+                        .step_by(series_state.step as usize)
+                        .take(series_state.batch_size),
+                )
+            }
+        } else {
+            // step < 0
+            let cur = series_state.current as i128;
Review Comment:
Can the same effect be achieved using a reverse iterator like:
```
(series_state.end..=series_state.current).rev()
```
Depending on whether end is exclusive or not, you may have to offset it by
one.
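On the offset-by-one point: with an exclusive `end`, the reversed range's lower bound becomes `end + 1`, and in that shape two edge cases are worth guarding: negating `step` overflows when it is `i64::MIN`, and adding one overflows when `end` is `i64::MAX`. A minimal sketch, assuming a 64-bit target (so a `u64` stride fits in `usize`) and using the hypothetical name `descending_batch_checked`; `i64::unsigned_abs` and `i64::checked_add` are standard library methods.

```rust
/// Hypothetical overflow-aware variant of the reverse-iterator approach.
fn descending_batch_checked(
    current: i64,
    end: i64,
    step: i64,
    include_end: bool,
    batch_size: usize,
) -> Vec<i64> {
    assert!(step < 0, "this helper handles negative steps only");
    // unsigned_abs is defined for i64::MIN (gives 2^63), unlike `-step`.
    // Assumes 64-bit usize so the cast is lossless.
    let stride = step.unsigned_abs() as usize;
    let low = if include_end {
        end
    } else {
        match end.checked_add(1) {
            Some(low) => low,
            // end == i64::MAX: no i64 lies strictly above it, so no values.
            None => return Vec::new(),
        }
    };
    // An empty range (low > current) simply yields no values.
    (low..=current).rev().step_by(stride).take(batch_size).collect()
}
```

The shape matches the reverse-iterator suggestion above; only the two arithmetic steps that can overflow are replaced by their non-panicking counterparts.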
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678672236
show benchmarks queue
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678672009
show benchmark queue
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678216185
@alamb I might have increased the benchmark too much; is it stuck?
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678179025
🤖 `./gh_compare_branch_bench.sh` [compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh) Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing improve-range-and-generate-series-for-int-64 (67b2d0404ed65f27526b1f803b18abeeaa7027e9) to 4249e4ecd354e6060e2a4ea33f38552366299a4e [diff](https://github.com/apache/datafusion/compare/4249e4ecd354e6060e2a4ea33f38552366299a4e..67b2d0404ed65f27526b1f803b18abeeaa7027e9)
BENCH_NAME=range_and_generate_series
BENCH_COMMAND=cargo bench --features=parquet --bench range_and_generate_series
BENCH_FILTER=
BENCH_BRANCH_NAME=improve-range-and-generate-series-for-int-64
Results will be posted here when complete
Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]
rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678178987
run benchmark range_and_generate_series
