[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


mustafasrepo commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082729694


##
datafusion/common/src/utils.rs:
##
@@ -22,8 +22,16 @@ use arrow::array::ArrayRef;
 use arrow::compute::SortOptions;
 use std::cmp::Ordering;
 
+/// Given column vectors, returns row at `idx`.
+pub fn get_row_at_idx(columns: &[ArrayRef], idx: usize) -> 
Result> {
+columns
+.iter()
+.map(|arr| ScalarValue::try_from_array(arr, idx))
+.collect()
+}
+
 /// This function compares two tuples depending on the given sort options.
-fn compare(
+pub fn compare_rows(

Review Comment:
   This function is used both in `datafusion/common` and 
`datafusion/physical_expr.` Hence unfortunately `pub(crate)` wouldn't work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


mustafasrepo commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082669525


##
datafusion/common/src/utils.rs:
##
@@ -103,6 +111,53 @@ where
 Ok(low)
 }
 
+/// This function searches for a tuple of given values (`target`) among the 
given
+/// rows (`item_columns`) via a linear scan. It assumes that `item_columns` is 
sorted
+/// according to `sort_options` and returns the insertion index of `target`.
+/// Template argument `SIDE` being `true`/`false` means left/right insertion.
+pub fn linear_search(
+item_columns: &[ArrayRef],
+target: &[ScalarValue],
+sort_options: &[SortOptions],
+) -> Result {
+let low: usize = 0;
+let high: usize = item_columns
+.get(0)
+.ok_or_else(|| {
+DataFusionError::Internal("Column array shouldn't be 
empty".to_string())
+})?
+.len();
+let compare_fn = |current: &[ScalarValue], target: &[ScalarValue]| {

Review Comment:
   `LexicographicalComparator` API, compares values at the two indices and 
returns their ordering. This useful to find change detection, or partition 
boundaries. However, in our case we need to search for specific value inside 
Array (possibly not existing in the array.). However, maybe with some kind of 
tweak, we maybe able to use `LexicographicalComparator` for our use case. I 
will think about it in detail.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


mustafasrepo commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082607725


##
datafusion/common/src/utils.rs:
##
@@ -103,6 +111,53 @@ where
 Ok(low)
 }
 
+/// This function searches for a tuple of given values (`target`) among the 
given
+/// rows (`item_columns`) via a linear scan. It assumes that `item_columns` is 
sorted
+/// according to `sort_options` and returns the insertion index of `target`.
+/// Template argument `SIDE` being `true`/`false` means left/right insertion.
+pub fn linear_search(

Review Comment:
   To make it compatible with `bisect` api, we add two versions; where one 
version takes `sort_options: &[SortOptions]`, other version takes `compare_fn: 
F`. We wanted to add both versions in case anyone need them. In terms of 
functionality first version is not necessary. However, since constructing 
comparator function from `SortOptions` is a bit cumbersome, we wanted to have 
that version also.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


mustafasrepo commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082607725


##
datafusion/common/src/utils.rs:
##
@@ -103,6 +111,53 @@ where
 Ok(low)
 }
 
+/// This function searches for a tuple of given values (`target`) among the 
given
+/// rows (`item_columns`) via a linear scan. It assumes that `item_columns` is 
sorted
+/// according to `sort_options` and returns the insertion index of `target`.
+/// Template argument `SIDE` being `true`/`false` means left/right insertion.
+pub fn linear_search(

Review Comment:
   To make it compatible with `bisect` api, we add two versions; where one 
version takes `sort_options: &[SortOptions]`, other version takes `compare_fn: 
F`. We want to put both in case anyone need it. In terms of functinality first 
version is not necessary. However, since constructing comparator function from 
`SortOptions` is a bit cumbersome, we wanted to have that version also.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #4989: Add support for linear range calculation in WINDOW functions

2023-01-20 Thread GitBox


mustafasrepo commented on code in PR #4989:
URL: https://github.com/apache/arrow-datafusion/pull/4989#discussion_r1082607725


##
datafusion/common/src/utils.rs:
##
@@ -103,6 +111,53 @@ where
 Ok(low)
 }
 
+/// This function searches for a tuple of given values (`target`) among the 
given
+/// rows (`item_columns`) via a linear scan. It assumes that `item_columns` is 
sorted
+/// according to `sort_options` and returns the insertion index of `target`.
+/// Template argument `SIDE` being `true`/`false` means left/right insertion.
+pub fn linear_search(

Review Comment:
   To make it compatible with `bisect` api, we have one version taking 
`sort_options: &[SortOptions]`, one version taking `compare_fn: F`. We want to 
put it in case anyone need it. Since constructing comparator function from 
`SortOptions` is a bit cumbersome.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org