Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-29 Thread via GitHub


rluvaton closed pull request #19431: perf: improve `range` and 
`generate_series` for `Int64`
URL: https://github.com/apache/datafusion/pull/19431


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-29 Thread via GitHub


rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696815910

   Closing, as it is already fast enough and this PR has some regressions





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-29 Thread via GitHub


alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696805577

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                        improve-range-and-generate-series-for-int-64    main
   -----                        --------------------------------------------    ----
   generate_series(0, 100, 5)   1.81   836.7±12.13µs  ? ?/sec                   1.00    462.2±6.92µs  ? ?/sec
   generate_series(100)         2.19      3.4±0.02ms  ? ?/sec                   1.00  1552.7±14.22µs  ? ?/sec
   generate_series(100, 0, -5)  1.00    327.6±7.64µs  ? ?/sec                   1.39    454.4±6.16µs  ? ?/sec
   range(0, 100, 5)             1.00    388.5±3.23µs  ? ?/sec                   1.16    450.3±4.20µs  ? ?/sec
   range(100)                   1.00   1214.4±7.87µs  ? ?/sec                   1.28  1556.8±16.35µs  ? ?/sec
   range(100, 0, -5)            1.00    323.2±3.67µs  ? ?/sec                   1.40    452.1±8.87µs  ? ?/sec
   ```
   
   
   
   
   
   





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-29 Thread via GitHub


alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696745321

   🤖 `./gh_compare_branch_bench.sh` 
[compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh)
 Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing improve-range-and-generate-series-for-int-64 
(df3c1b91a1eda6774db51fb5e1eca0e1c6454bdb) to 
83ed19235b700a2bd41283301cfd344cb5b565bc 
[diff](https://github.com/apache/datafusion/compare/83ed19235b700a2bd41283301cfd344cb5b565bc..df3c1b91a1eda6774db51fb5e1eca0e1c6454bdb)
   BENCH_NAME=range_and_generate_series
   BENCH_COMMAND=cargo bench --features=parquet --bench 
range_and_generate_series
   BENCH_FILTER=
   BENCH_BRANCH_NAME=improve-range-and-generate-series-for-int-64
   Results will be posted here when complete
   





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-29 Thread via GitHub


alamb commented on code in PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#discussion_r2651085722


##
datafusion/sqllogictest/test_files/table_functions.slt:
##
@@ -510,6 +510,277 @@ SELECT c, f.*  FROM json_table, LATERAL 
generate_series(1,2) f;
 2 1
 2 2
 
+# To test edge cases related to batch size
+statement ok
+set datafusion.execution.batch_size = 10;
+
+# range equal batch size
+query I
+SELECT * FROM range(10)
+----
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+
+# generate_series equal batch size (including end)
+query I
+SELECT * FROM generate_series(9)
+----
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+
+# range equal batch size * 2
+query I
+SELECT * FROM range(20)
+----
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+
+# generate_series equal batch size * 2 (including end)
+query I
+SELECT * FROM generate_series(19)
+----
+0
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+
+# range equal batch size with starting value
+query I
+SELECT * FROM range(1, 11)
+----
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+
+# generate_series equal batch size with starting value (including end)
+query I
+SELECT * FROM generate_series(1, 10)
+----
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+
+# range equal batch size * 2 with starting value
+query I
+SELECT * FROM range(1, 21)
+----
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+20
+
+# generate_series equal batch size * 2 with starting value (including end)
+query I
+SELECT * FROM generate_series(1, 20)
+----
+1
+2
+3
+4
+5
+6
+7
+8
+9
+10
+11
+12
+13
+14
+15
+16
+17
+18
+19
+20
+
+# range equal batch size with starting value and step
+query I
+SELECT count(*) FROM range(1, 21, 2)
+----
+10
+
+# range equal batch size with starting value and step
+query I
+SELECT * FROM range(1, 21, 2)
+----
+1
+3
+5
+7
+9
+11
+13
+15
+17
+19
+
+# generate_series equal batch size with starting value and step (including end)
+query I
+SELECT count(*) FROM generate_series(1, 20, 2)
+----
+10
+
+# generate_series equal batch size with starting value and step (including end)
+query I
+SELECT * FROM generate_series(1, 20, 2)
+----
+1
+3
+5
+7
+9
+11
+13
+15
+17
+19
+
+# range equal batch size * 2 with starting value and step
+query I
+SELECT count(*) FROM range(1, 40, 2)
+----
+20
+
+# range equal batch size * 2 with starting value and step
+query I
+SELECT * FROM range(1, 40, 2)
+----
+1
+3
+5
+7
+9
+11
+13
+15
+17
+19
+21
+23
+25
+27
+29
+31
+33
+35
+37
+39
+
+# generate_series equal batch size * 2 with starting value and step (including end)
+query I
+SELECT count(*) FROM generate_series(1, 39, 2)
+----
+20
+
+# generate_series equal batch size * 2 with starting value and step (including end)
+query I
+SELECT * FROM generate_series(1, 39, 2)

Review Comment:
   Should we also add tests for starting values other than 1, perhaps 100 and -100?
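
One way to derive the expected rows for such tests is a small reference model. The sketch below is only an illustration, not DataFusion code: `expected_rows` is a hypothetical helper, and it assumes the semantics implied by the test comments in this diff (`range` excludes the end value, `generate_series` includes it).

```rust
/// Hypothetical reference model (not part of DataFusion).
/// Returns the rows a series from `start` towards `end` should produce.
/// `include_end = false` models `range`, `include_end = true` models
/// `generate_series`, as assumed from the test comments in this diff.
fn expected_rows(start: i64, end: i64, step: i64, include_end: bool) -> Vec<i64> {
    assert!(step != 0, "step must be non-zero");
    let mut rows = Vec::new();
    let mut value = start;
    loop {
        // A value is emitted while it has not passed `end` in the step direction.
        let in_bounds = if step > 0 {
            value < end || (include_end && value == end)
        } else {
            value > end || (include_end && value == end)
        };
        if !in_bounds {
            break;
        }
        rows.push(value);
        // Stop instead of overflowing near the i64 limits.
        match value.checked_add(step) {
            Some(next) => value = next,
            None => break,
        }
    }
    rows
}
```

With `batch_size = 10`, `expected_rows(100, 110, 1, false)` and `expected_rows(-100, -91, 1, true)` each give exactly one batch worth of rows, which mirrors the "equal batch size" cases above for the suggested starting values.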






Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-29 Thread via GitHub


alamb commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696670210

   run benchmark range_and_generate_series





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-29 Thread via GitHub


alamb commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3696670052

   These cases seem to have slowed down:
   ```
   generate_series(0, 100, 5)   1.86    838.9±8.69µs  ? ?/sec                   1.00    449.9±6.31µs  ? ?/sec
   generate_series(100)         2.25      3.5±0.05ms  ? ?/sec                   1.00  1552.1±21.57µs  ? ?/sec
   ```
   
   Is that expected?





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-21 Thread via GitHub


alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678835978

   🤖: Benchmark completed
   
   Details
   
   
   
   ```
   group                        improve-range-and-generate-series-for-int-64    main
   -----                        --------------------------------------------    ----
   generate_series(0, 100, 5)   1.86    838.9±8.69µs  ? ?/sec                   1.00    449.9±6.31µs  ? ?/sec
   generate_series(100)         2.25      3.5±0.05ms  ? ?/sec                   1.00  1552.1±21.57µs  ? ?/sec
   generate_series(100, 0, -5)  1.00    326.4±6.72µs  ? ?/sec                   1.42    462.1±6.47µs  ? ?/sec
   range(0, 100, 5)             1.00   393.8±12.61µs  ? ?/sec                   1.15   453.1±10.31µs  ? ?/sec
   range(100)                   1.00  1237.9±19.84µs  ? ?/sec                   1.26  1559.6±24.79µs  ? ?/sec
   range(100, 0, -5)            1.00    319.1±5.62µs  ? ?/sec                   1.44   460.7±15.48µs  ? ?/sec
   ```
   
   
   
   
   
   





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-21 Thread via GitHub


pepijnve commented on code in PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#discussion_r2637894371


##
datafusion/functions-table/src/generate_series.rs:
##
@@ -109,6 +133,59 @@ impl SeriesValue for i64 {
     fn display_value(&self) -> String {
         self.to_string()
     }
+
+    fn generate_array_for_series(
+        series_state: &mut GenericSeriesState,
+    ) -> Result {
+        let array: Int64Array = if series_state.step > 0 {
+            if series_state.include_end {
+                Int64Array::from_iter_values(
+                    (series_state.current..=series_state.end)
+                        .step_by(series_state.step as usize)
+                        .take(series_state.batch_size),
+                )
+            } else {
+                Int64Array::from_iter_values(
+                    (series_state.current..series_state.end)
+                        .step_by(series_state.step as usize)
+                        .take(series_state.batch_size),
+                )
+            }
+        } else {
+            // step < 0
+            let cur = series_state.current as i128;

Review Comment:
   I gave it a go locally. This seems to do the trick:
   ```rust
   if series_state.include_end {
       Int64Array::from_iter_values(
           (series_state.end..=series_state.current)
               .rev()
               .step_by(-series_state.step as usize)
               .take(series_state.batch_size),
       )
   } else {
       Int64Array::from_iter_values(
           ((series_state.end + 1)..=series_state.current)
               .rev()
               .step_by(-series_state.step as usize)
               .take(series_state.batch_size),
       )
   }
   ```
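
The reverse-iterator idea can be checked in isolation with plain `Vec<i64>` output. The following is a minimal sketch under stated assumptions, not the PR's code: `descending_batch` and its parameters are hypothetical names standing in for the `series_state` fields, and the overflow guards (`unsigned_abs`, `checked_add`) are additions for the edge cases `step == i64::MIN` and `end == i64::MAX` that the snippet above does not address.

```rust
/// Hypothetical stand-alone version of the negative-step branch.
/// Produces at most `batch_size` values starting at `current` and
/// walking down towards `end` in strides of `|step|`.
fn descending_batch(
    current: i64,
    end: i64,
    step: i64,
    include_end: bool,
    batch_size: usize,
) -> Vec<i64> {
    assert!(step < 0, "this sketch only covers negative steps");
    // `unsigned_abs` avoids the overflow that `-step` hits for i64::MIN.
    let stride = usize::try_from(step.unsigned_abs()).unwrap_or(usize::MAX);
    // Lowest value that may be emitted: `end` itself, or `end + 1` when
    // the end is exclusive.
    let lowest = if include_end {
        end
    } else {
        match end.checked_add(1) {
            Some(v) => v,
            // Nothing is strictly greater than i64::MAX.
            None => return Vec::new(),
        }
    };
    (lowest..=current)
        .rev()
        .step_by(stride)
        .take(batch_size)
        .collect()
}
```

For example, `descending_batch(100, 0, -5, true, 3)` yields `[100, 95, 90]`, and with an exclusive end `descending_batch(10, 0, -5, false, 10)` yields `[10, 5]`. Whether this form is faster than the `i128` arithmetic in the PR is not something this sketch shows; that would need the benchmark.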






Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-21 Thread via GitHub


alamb commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678818049

   > @alamb I might have increased the benchmark too much, is it stuck?
   
   I don't know what happened -- the runner had died. I restarted t





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-21 Thread via GitHub


alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678817260

   🤖 `./gh_compare_branch_bench.sh` 
[compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh)
 Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing improve-range-and-generate-series-for-int-64 
(67b2d0404ed65f27526b1f803b18abeeaa7027e9) to 
4249e4ecd354e6060e2a4ea33f38552366299a4e 
[diff](https://github.com/apache/datafusion/compare/4249e4ecd354e6060e2a4ea33f38552366299a4e..67b2d0404ed65f27526b1f803b18abeeaa7027e9)
   BENCH_NAME=range_and_generate_series
   BENCH_COMMAND=cargo bench --features=parquet --bench 
range_and_generate_series
   BENCH_FILTER=
   BENCH_BRANCH_NAME=improve-range-and-generate-series-for-int-64
   Results will be posted here when complete
   





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-21 Thread via GitHub


pepijnve commented on code in PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#discussion_r2637804725


##
datafusion/functions-table/src/generate_series.rs:
##
@@ -109,6 +133,59 @@ impl SeriesValue for i64 {
     fn display_value(&self) -> String {
         self.to_string()
     }
+
+    fn generate_array_for_series(
+        series_state: &mut GenericSeriesState,
+    ) -> Result {
+        let array: Int64Array = if series_state.step > 0 {
+            if series_state.include_end {
+                Int64Array::from_iter_values(
+                    (series_state.current..=series_state.end)
+                        .step_by(series_state.step as usize)
+                        .take(series_state.batch_size),
+                )
+            } else {
+                Int64Array::from_iter_values(
+                    (series_state.current..series_state.end)
+                        .step_by(series_state.step as usize)
+                        .take(series_state.batch_size),
+                )
+            }
+        } else {
+            // step < 0
+            let cur = series_state.current as i128;

Review Comment:
   Can the same effect be achieved using a reverse iterator like:
   
   ```
   (series_state.end..=series_state.current).rev()
   ```
   
   Depending on whether end is exclusive or not, you may have to offset it by 
one.






Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-21 Thread via GitHub


rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678672236

   show benchmarks queue
   
   





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-21 Thread via GitHub


rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678672009

   show benchmark queue





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-20 Thread via GitHub


rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678216185

   @alamb I might have increased the benchmark too much, is it stuck?





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-20 Thread via GitHub


alamb-ghbot commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678179025

   🤖 `./gh_compare_branch_bench.sh` 
[compare_branch_bench.sh](https://github.com/alamb/datafusion-benchmarking/blob/main/scripts/compare_branch_bench.sh)
 Running
   Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 
2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing improve-range-and-generate-series-for-int-64 
(67b2d0404ed65f27526b1f803b18abeeaa7027e9) to 
4249e4ecd354e6060e2a4ea33f38552366299a4e 
[diff](https://github.com/apache/datafusion/compare/4249e4ecd354e6060e2a4ea33f38552366299a4e..67b2d0404ed65f27526b1f803b18abeeaa7027e9)
   BENCH_NAME=range_and_generate_series
   BENCH_COMMAND=cargo bench --features=parquet --bench 
range_and_generate_series
   BENCH_FILTER=
   BENCH_BRANCH_NAME=improve-range-and-generate-series-for-int-64
   Results will be posted here when complete
   





Re: [PR] perf: improve `range` and `generate_series` for `Int64` [datafusion]

2025-12-20 Thread via GitHub


rluvaton commented on PR #19431:
URL: https://github.com/apache/datafusion/pull/19431#issuecomment-3678178987

   run benchmark range_and_generate_series

