Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-22 Thread via GitHub


Jefffrey commented on PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#issuecomment-3685038710

   Thanks @Omega359 & @martin-g 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-22 Thread via GitHub


Jefffrey merged PR #19341:
URL: https://github.com/apache/datafusion/pull/19341


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-21 Thread via GitHub


Jefffrey commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2638734239


##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -146,7 +162,44 @@ impl DateBinFunc {
 DataType::Interval(DayTime),
 Timestamp(array_type, Some(TIMEZONE_WILDCARD.into())),
 ]),
-]
+];
+
+match array_type {
+Second | Millisecond => {
+v.append(&mut vec![
+Exact(vec![
+DataType::Interval(MonthDayNano),
+Time32(array_type),
+Time32(array_type),

Review Comment:
   I was more thinking of trying to simplify the signature (which looks a bit 
gnarly) and potentially the implementation too. In terms of actual 
functionality it should be fine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-21 Thread via GitHub


Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2637885648


##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -146,7 +162,44 @@ impl DateBinFunc {
 DataType::Interval(DayTime),
 Timestamp(array_type, Some(TIMEZONE_WILDCARD.into())),
 ]),
-]
+];
+
+match array_type {
+Second | Millisecond => {
+v.append(&mut vec![
+Exact(vec![
+DataType::Interval(MonthDayNano),
+Time32(array_type),
+Time32(array_type),

Review Comment:
   I took a look at this and it's a bit more than just the timeunit - it's the 
type. For time32 there is just two timeunits supported - Second and 
Millisecond. For Time64 it's Microsecond and Nanosecond. 
   
   Supporting a mix of those I think would be unnecessary. Forcing a single 
type (`Time64(Nanosecond)`) might work but I don't see it as beneficial really. 
Are you of a different opinion?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-19 Thread via GitHub


Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2636496819


##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +667,82 @@ fn date_bin_impl(
 origin, stride, stride_fn, array, tz_opt,
 )?
 }
+Time32(Millisecond) => {
+if !is_time {

Review Comment:
   I can't think of one that shouldn't be coerced to time. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-19 Thread via GitHub


Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2636493036


##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -193,11 +245,17 @@ impl ScalarUDFImpl for DateBinFunc {
 ) -> Result {
 let args = &args.args;
 if args.len() == 2 {
-// Default to unix EPOCH
-let origin = 
ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
-Some(0),
-Some("+00:00".into()),
-));
+let origin = if matches!(args[1].data_type(), Time32(_)) {
+ColumnarValue::Scalar(ScalarValue::Time32Second(Some(0)))
+} else if matches!(args[1].data_type(), Time64(_)) {
+ColumnarValue::Scalar(ScalarValue::Time64Nanosecond(Some(0)))

Review Comment:
   True, I'll take a look at this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-19 Thread via GitHub


Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2636492155


##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -146,7 +162,44 @@ impl DateBinFunc {
 DataType::Interval(DayTime),
 Timestamp(array_type, Some(TIMEZONE_WILDCARD.into())),
 ]),
-]
+];
+
+match array_type {
+Second | Millisecond => {
+v.append(&mut vec![
+Exact(vec![
+DataType::Interval(MonthDayNano),
+Time32(array_type),
+Time32(array_type),

Review Comment:
   It didn't occur to me essentially :/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-19 Thread via GitHub


Jefffrey commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2635645465


##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -193,11 +245,17 @@ impl ScalarUDFImpl for DateBinFunc {
 ) -> Result {
 let args = &args.args;
 if args.len() == 2 {
-// Default to unix EPOCH
-let origin = 
ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
-Some(0),
-Some("+00:00".into()),
-));
+let origin = if matches!(args[1].data_type(), Time32(_)) {
+ColumnarValue::Scalar(ScalarValue::Time32Second(Some(0)))
+} else if matches!(args[1].data_type(), Time64(_)) {
+ColumnarValue::Scalar(ScalarValue::Time64Nanosecond(Some(0)))

Review Comment:
   It seems odd that the signature requires origin (if provided) & expression 
to have the same time granularity, but if origin is omitted we provide 
granularity of only second or nanosecond; so we could have expression of 
time64(microseconds) with origin of time64(nanoseconds)?



##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -176,13 +229,12 @@ impl ScalarUDFImpl for DateBinFunc {
 
 fn return_type(&self, arg_types: &[DataType]) -> Result {
 match &arg_types[1] {
-Timestamp(Nanosecond, None) | Utf8 | Null => 
Ok(Timestamp(Nanosecond, None)),
-Timestamp(Nanosecond, tz_opt) => Ok(Timestamp(Nanosecond, 
tz_opt.clone())),
-Timestamp(Microsecond, tz_opt) => Ok(Timestamp(Microsecond, 
tz_opt.clone())),
-Timestamp(Millisecond, tz_opt) => Ok(Timestamp(Millisecond, 
tz_opt.clone())),
-Timestamp(Second, tz_opt) => Ok(Timestamp(Second, tz_opt.clone())),
+Utf8 | LargeUtf8 | Utf8View | Null => Ok(Timestamp(Nanosecond, 
None)),

Review Comment:
   ```suggestion
   ```
   
   Signature doesn't allow string arguments (or null for that matter)



##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +667,82 @@ fn date_bin_impl(
 origin, stride, stride_fn, array, tz_opt,
 )?
 }
+Time32(Millisecond) => {
+if !is_time {

Review Comment:
   Is it possible to provide a non-time origin when the expression is time?



##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -146,7 +162,44 @@ impl DateBinFunc {
 DataType::Interval(DayTime),
 Timestamp(array_type, Some(TIMEZONE_WILDCARD.into())),
 ]),
-]
+];
+
+match array_type {
+Second | Millisecond => {
+v.append(&mut vec![
+Exact(vec![
+DataType::Interval(MonthDayNano),
+Time32(array_type),
+Time32(array_type),

Review Comment:
   Could you explain why for time we require the expression & origin arguments 
to have the same granularity in terms of time units, whereas for timestamps 
above we seem to always have origin as Nanoseconds regardless of the expression 
granularity?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-17 Thread via GitHub


martin-g commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2626048002


##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +656,82 @@ fn date_bin_impl(
 origin, stride, stride_fn, array, tz_opt,
 )?
 }
+Time32(Millisecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32 
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos =
+stride_fn(stride, x as i64 * NANOS_PER_MILLI, 
origin);
+let nanos = binned_nanos % (NANOSECONDS_IN_DAY);
+(nanos / NANOS_PER_MILLI) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time32(Second) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32 
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos =
+stride_fn(stride, x as i64 * NANOS_PER_SEC, 
origin);
+let nanos = binned_nanos % (NANOSECONDS_IN_DAY);
+(nanos / NANOS_PER_SEC) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time64(Microsecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time64 source requires Time64 
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i64| {
+let binned_nanos = stride_fn(stride, x, origin);

Review Comment:
   ```suggestion
   let binned_nanos = stride_fn(stride, x * 
NANOS_PER_MICRO, origin);
   ```



##
datafusion/functions/src/datetime/date_bin.rs:
##


Review Comment:
   Maybe add a new example for `time` too ? Currently there are two examples 
for timestamps



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-16 Thread via GitHub


Omega359 commented on PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#issuecomment-3661934872

   Thanks for the review @martin-g. I've implemented your review feedback - 
sorry for the numerous bugs in that initial version :(


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-16 Thread via GitHub


Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2624378376


##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -370,11 +419,53 @@ fn date_bin_impl(
 }
 };
 
-let origin = match origin {
-ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(Some(v), _)) => 
*v,
+let (origin, is_time) = match origin {
+ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(Some(v), _)) => 
{
+(*v, false)
+}
+ColumnarValue::Scalar(ScalarValue::Time32Second(Some(v))) => {
+match stride {
+Interval::Months(m) => {
+if m > 0 {
+return exec_err!(
+"DATE_BIN stride for TIME input must be less than 
1 day"
+);
+}
+}
+Interval::Nanoseconds(ns) => {
+if ns >= ONE_DAY_IN_NS {
+return exec_err!(
+"DATE_BIN stride for TIME input must be less than 
1 day"
+);
+}
+}
+}
+
+(*v as i64, true)
+}
+ColumnarValue::Scalar(ScalarValue::Time64Nanosecond(Some(v))) => {
+match stride {
+Interval::Months(m) => {
+if m > 0 {
+return exec_err!(
+"DATE_BIN stride for TIME input must be less than 
1 day"
+);
+}
+}
+Interval::Nanoseconds(ns) => {
+if ns >= ONE_DAY_IN_NS {
+return exec_err!(
+"DATE_BIN stride for TIME input must be less than 
1 day"
+);
+}
+}
+}
+
+(*v, true)
+}
 ColumnarValue::Scalar(v) => {

Review Comment:
   Yes, thanks, fixing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]

2025-12-16 Thread via GitHub


martin-g commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2623042483


##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +614,82 @@ fn date_bin_impl(
 origin, stride, stride_fn, array, tz_opt,
 )?
 }
+Time32(Millisecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32 
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos = stride_fn(stride, x as i64, origin);
+let nanos = binned_nanos % (ONE_DAY_IN_NS);
+(nanos / 1_000_000) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time32(Second) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32 
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos = stride_fn(stride, x as i64, origin);
+let nanos = binned_nanos % (ONE_DAY_IN_NS);
+(nanos / 1_000_000_000) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time64(Microsecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time64 source requires Time64 
origin"
+);
+}
+use arrow::array::cast::AsArray;

Review Comment:
   This is already imported at line 27 as `use arrow::array::AsArray;`



##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +614,82 @@ fn date_bin_impl(
 origin, stride, stride_fn, array, tz_opt,
 )?
 }
+Time32(Millisecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32 
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos = stride_fn(stride, x as i64, origin);
+let nanos = binned_nanos % (ONE_DAY_IN_NS);
+(nanos / 1_000_000) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time32(Second) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32 
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos = stride_fn(stride, x as i64, origin);
+let nanos = binned_nanos % (ONE_DAY_IN_NS);
+(nanos / 1_000_000_000) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time64(Microsecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time64 source requires Time64 
origin"
+);
+}
+use arrow::array::cast::AsArray;
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i64| {
+let binned_nanos = stride_fn(stride, x, origin);
+binned_nanos % (ONE_DAY_IN_NS)
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time64(Nanosecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time64 s