Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Jefffrey commented on PR #19341: URL: https://github.com/apache/datafusion/pull/19341#issuecomment-3685038710 Thanks @Omega359 & @martin-g -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Jefffrey merged PR #19341: URL: https://github.com/apache/datafusion/pull/19341 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Jefffrey commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2638734239
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -146,7 +162,44 @@ impl DateBinFunc {
DataType::Interval(DayTime),
Timestamp(array_type, Some(TIMEZONE_WILDCARD.into())),
]),
-]
+];
+
+match array_type {
+Second | Millisecond => {
+v.append(&mut vec![
+Exact(vec![
+DataType::Interval(MonthDayNano),
+Time32(array_type),
+Time32(array_type),
Review Comment:
I was more thinking of trying to simplify the signature (which looks a bit
gnarly) and potentially the implementation too. In terms of actual
functionality it should be fine.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2637885648
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -146,7 +162,44 @@ impl DateBinFunc {
DataType::Interval(DayTime),
Timestamp(array_type, Some(TIMEZONE_WILDCARD.into())),
]),
-]
+];
+
+match array_type {
+Second | Millisecond => {
+v.append(&mut vec![
+Exact(vec![
+DataType::Interval(MonthDayNano),
+Time32(array_type),
+Time32(array_type),
Review Comment:
I took a look at this and it's a bit more than just the timeunit - it's the
type. For time32 there is just two timeunits supported - Second and
Millisecond. For Time64 it's Microsecond and Nanosecond.
Supporting a mix of those I think would be unnecessary. Forcing a single
type (`Time64(Nanosecond)`) might work but I don't see it as beneficial really.
Are you of a different opinion?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2636496819
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +667,82 @@ fn date_bin_impl(
origin, stride, stride_fn, array, tz_opt,
)?
}
+Time32(Millisecond) => {
+if !is_time {
Review Comment:
I can't think of one that shouldn't be coerced to time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2636493036
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -193,11 +245,17 @@ impl ScalarUDFImpl for DateBinFunc {
) -> Result {
let args = &args.args;
if args.len() == 2 {
-// Default to unix EPOCH
-let origin =
ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
-Some(0),
-Some("+00:00".into()),
-));
+let origin = if matches!(args[1].data_type(), Time32(_)) {
+ColumnarValue::Scalar(ScalarValue::Time32Second(Some(0)))
+} else if matches!(args[1].data_type(), Time64(_)) {
+ColumnarValue::Scalar(ScalarValue::Time64Nanosecond(Some(0)))
Review Comment:
True, I'll take a look at this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2636492155
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -146,7 +162,44 @@ impl DateBinFunc {
DataType::Interval(DayTime),
Timestamp(array_type, Some(TIMEZONE_WILDCARD.into())),
]),
-]
+];
+
+match array_type {
+Second | Millisecond => {
+v.append(&mut vec![
+Exact(vec![
+DataType::Interval(MonthDayNano),
+Time32(array_type),
+Time32(array_type),
Review Comment:
It didn't occur to me essentially :/
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Jefffrey commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2635645465
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -193,11 +245,17 @@ impl ScalarUDFImpl for DateBinFunc {
) -> Result {
let args = &args.args;
if args.len() == 2 {
-// Default to unix EPOCH
-let origin =
ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(
-Some(0),
-Some("+00:00".into()),
-));
+let origin = if matches!(args[1].data_type(), Time32(_)) {
+ColumnarValue::Scalar(ScalarValue::Time32Second(Some(0)))
+} else if matches!(args[1].data_type(), Time64(_)) {
+ColumnarValue::Scalar(ScalarValue::Time64Nanosecond(Some(0)))
Review Comment:
It seems odd that the signature requires origin (if provided) & expression
to have the same time granularity, but if origin is omitted we provide
granularity of only second or nanosecond; so we could have expression of
time64(microseconds) with origin of time64(nanoseconds)?
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -176,13 +229,12 @@ impl ScalarUDFImpl for DateBinFunc {
fn return_type(&self, arg_types: &[DataType]) -> Result {
match &arg_types[1] {
-Timestamp(Nanosecond, None) | Utf8 | Null =>
Ok(Timestamp(Nanosecond, None)),
-Timestamp(Nanosecond, tz_opt) => Ok(Timestamp(Nanosecond,
tz_opt.clone())),
-Timestamp(Microsecond, tz_opt) => Ok(Timestamp(Microsecond,
tz_opt.clone())),
-Timestamp(Millisecond, tz_opt) => Ok(Timestamp(Millisecond,
tz_opt.clone())),
-Timestamp(Second, tz_opt) => Ok(Timestamp(Second, tz_opt.clone())),
+Utf8 | LargeUtf8 | Utf8View | Null => Ok(Timestamp(Nanosecond,
None)),
Review Comment:
```suggestion
```
Signature doesn't allow string arguments (or null for that matter)
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +667,82 @@ fn date_bin_impl(
origin, stride, stride_fn, array, tz_opt,
)?
}
+Time32(Millisecond) => {
+if !is_time {
Review Comment:
Is it possible to provide a non-time origin when the expression is time?
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -146,7 +162,44 @@ impl DateBinFunc {
DataType::Interval(DayTime),
Timestamp(array_type, Some(TIMEZONE_WILDCARD.into())),
]),
-]
+];
+
+match array_type {
+Second | Millisecond => {
+v.append(&mut vec![
+Exact(vec![
+DataType::Interval(MonthDayNano),
+Time32(array_type),
+Time32(array_type),
Review Comment:
Could you explain why for time we require the expression & origin arguments
to have the same granularity in terms of time units, whereas for timestamps
above we seem to always have origin as Nanoseconds regardless of the expression
granularity?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
martin-g commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2626048002
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +656,82 @@ fn date_bin_impl(
origin, stride, stride_fn, array, tz_opt,
)?
}
+Time32(Millisecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos =
+stride_fn(stride, x as i64 * NANOS_PER_MILLI,
origin);
+let nanos = binned_nanos % (NANOSECONDS_IN_DAY);
+(nanos / NANOS_PER_MILLI) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time32(Second) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos =
+stride_fn(stride, x as i64 * NANOS_PER_SEC,
origin);
+let nanos = binned_nanos % (NANOSECONDS_IN_DAY);
+(nanos / NANOS_PER_SEC) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time64(Microsecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time64 source requires Time64
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i64| {
+let binned_nanos = stride_fn(stride, x, origin);
Review Comment:
```suggestion
let binned_nanos = stride_fn(stride, x *
NANOS_PER_MICRO, origin);
```
##
datafusion/functions/src/datetime/date_bin.rs:
##
Review Comment:
Maybe add a new example for `time` too ? Currently there are two examples
for timestamps
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Omega359 commented on PR #19341: URL: https://github.com/apache/datafusion/pull/19341#issuecomment-3661934872 Thanks for the review @martin-g. I've implemented your review feedback - sorry for the numerous bugs in that initial version :( -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
Omega359 commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2624378376
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -370,11 +419,53 @@ fn date_bin_impl(
}
};
-let origin = match origin {
-ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(Some(v), _)) =>
*v,
+let (origin, is_time) = match origin {
+ColumnarValue::Scalar(ScalarValue::TimestampNanosecond(Some(v), _)) =>
{
+(*v, false)
+}
+ColumnarValue::Scalar(ScalarValue::Time32Second(Some(v))) => {
+match stride {
+Interval::Months(m) => {
+if m > 0 {
+return exec_err!(
+"DATE_BIN stride for TIME input must be less than
1 day"
+);
+}
+}
+Interval::Nanoseconds(ns) => {
+if ns >= ONE_DAY_IN_NS {
+return exec_err!(
+"DATE_BIN stride for TIME input must be less than
1 day"
+);
+}
+}
+}
+
+(*v as i64, true)
+}
+ColumnarValue::Scalar(ScalarValue::Time64Nanosecond(Some(v))) => {
+match stride {
+Interval::Months(m) => {
+if m > 0 {
+return exec_err!(
+"DATE_BIN stride for TIME input must be less than
1 day"
+);
+}
+}
+Interval::Nanoseconds(ns) => {
+if ns >= ONE_DAY_IN_NS {
+return exec_err!(
+"DATE_BIN stride for TIME input must be less than
1 day"
+);
+}
+}
+}
+
+(*v, true)
+}
ColumnarValue::Scalar(v) => {
Review Comment:
Yes, thanks, fixing.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Update date_bin to support Time32 and Time64 data types [datafusion]
martin-g commented on code in PR #19341:
URL: https://github.com/apache/datafusion/pull/19341#discussion_r2623042483
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +614,82 @@ fn date_bin_impl(
origin, stride, stride_fn, array, tz_opt,
)?
}
+Time32(Millisecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos = stride_fn(stride, x as i64, origin);
+let nanos = binned_nanos % (ONE_DAY_IN_NS);
+(nanos / 1_000_000) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time32(Second) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos = stride_fn(stride, x as i64, origin);
+let nanos = binned_nanos % (ONE_DAY_IN_NS);
+(nanos / 1_000_000_000) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time64(Microsecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time64 source requires Time64
origin"
+);
+}
+use arrow::array::cast::AsArray;
Review Comment:
This is already imported at line 27 as `use arrow::array::AsArray;`
##
datafusion/functions/src/datetime/date_bin.rs:
##
@@ -481,17 +614,82 @@ fn date_bin_impl(
origin, stride, stride_fn, array, tz_opt,
)?
}
+Time32(Millisecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos = stride_fn(stride, x as i64, origin);
+let nanos = binned_nanos % (ONE_DAY_IN_NS);
+(nanos / 1_000_000) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time32(Second) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time32 source requires Time32
origin"
+);
+}
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i32| {
+let binned_nanos = stride_fn(stride, x as i64, origin);
+let nanos = binned_nanos % (ONE_DAY_IN_NS);
+(nanos / 1_000_000_000) as i32
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time64(Microsecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time64 source requires Time64
origin"
+);
+}
+use arrow::array::cast::AsArray;
+let array = array.as_primitive::();
+let apply_stride_fn = move |x: i64| {
+let binned_nanos = stride_fn(stride, x, origin);
+binned_nanos % (ONE_DAY_IN_NS)
+};
+let array: PrimitiveArray =
+array.unary(apply_stride_fn);
+ColumnarValue::Array(Arc::new(array))
+}
+Time64(Nanosecond) => {
+if !is_time {
+return exec_err!(
+"DATE_BIN with Time64 s
