Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
phillipleblanc commented on PR #19321: URL: https://github.com/apache/datafusion/pull/19321#issuecomment-3698277047 > @phillipleblanc are you ok to make progress with this PR? Do you need any help to move it along? I've unfortunately run out of time to work on this - I'll close this PR for now and let someone else take the issue forward. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
phillipleblanc closed pull request #19321: Add placeholder type inference for CASE expressions URL: https://github.com/apache/datafusion/pull/19321 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
alamb commented on PR #19321: URL: https://github.com/apache/datafusion/pull/19321#issuecomment-3697404341 @phillipleblanc are you ok to make progress with this PR? Do you need any help to move it along? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
adriangb commented on code in PR #19321:
URL: https://github.com/apache/datafusion/pull/19321#discussion_r2644611809
##
datafusion/expr/src/expr.rs:
##
@@ -3929,6 +4056,366 @@ mod test {
}
}
+#[test]
+fn infer_placeholder_simple_case() {
Review Comment:
```suggestion
fn infer_case_expr_placeholder_simple() {
```
The name was a bit confusing to me
##
datafusion/expr/src/expr.rs:
##
@@ -3929,6 +4056,366 @@ mod test {
}
}
+#[test]
+fn infer_placeholder_simple_case() {
+// CASE department_id WHEN $1 THEN $2 WHEN $3 THEN 'Engineering' ELSE
$4 END
+let schema = Arc::new(Schema::new(vec![
+Field::new("department_id", DataType::Int32, true),
+Field::new("name", DataType::Utf8, true),
+]));
+let df_schema = DFSchema::try_from(schema).unwrap();
+
+let case_expr = case(col("department_id"))
+.when(placeholder("$1"), placeholder("$2"))
+.when(placeholder("$3"), lit("Engineering"))
+.otherwise(placeholder("$4"))
+.unwrap();
+
+let (inferred_expr, contains_placeholder) =
+case_expr.infer_placeholder_types(&df_schema).unwrap();
+
+assert!(contains_placeholder);
+
+match inferred_expr {
+Expr::Case(Case {
+expr,
+when_then_expr,
+else_expr,
+}) => {
+// Base expression should remain unchanged
+assert!(matches!(**expr.as_ref().unwrap(), Expr::Column(_)));
+
+// WHEN expressions should infer Int32 (to match department_id
column)
+for (when_expr, then_expr) in &when_then_expr {
+if let Expr::Placeholder(placeholder) = when_expr.as_ref()
{
+assert_eq!(
+placeholder.field.as_ref().unwrap().data_type(),
+&DataType::Int32,
+"WHEN placeholder {} should infer Int32",
+placeholder.id
+);
+}
+
+// THEN expressions should infer Utf8 (to match the known
literal)
+if let Expr::Placeholder(placeholder) = then_expr.as_ref()
{
+assert_eq!(
+placeholder.field.as_ref().unwrap().data_type(),
+&DataType::Utf8,
+"THEN placeholder {} should infer Utf8",
+placeholder.id
+);
+}
+}
+
+// ELSE expression should infer Utf8 (to match THEN
expressions)
+if let Some(Expr::Placeholder(placeholder)) =
+else_expr.as_ref().map(|e| e.as_ref())
+{
+assert_eq!(
+placeholder.field.as_ref().unwrap().data_type(),
+&DataType::Utf8,
+"ELSE placeholder {} should infer Utf8",
+placeholder.id
+);
+}
+}
+_ => panic!("Expected Case expression"),
+}
+}
+
+#[test]
+fn infer_placeholder_searched_case() {
Review Comment:
Is there any way we could make these slt tests?
##
datafusion/expr/src/expr.rs:
##
@@ -2024,6 +2025,110 @@ impl Expr {
| Expr::SimilarTo(Like { expr, pattern, .. }) => {
rewrite_placeholder(pattern.as_mut(), expr.as_ref(),
schema)?;
}
+Expr::Case(Case {
+expr,
+when_then_expr,
+else_expr,
+}) => {
+// If `expr` is present, then it must match the types of
each WHEN expression.
+// If `expr` is not present, then the type of each WHEN
expression must evaluate to a boolean.
+// The types of the THEN and ELSE expressions must match
the result type of the CASE expression.
+
+fn concrete_type(expr: &Expr, schema: &DFSchema) ->
Option {
Review Comment:
This seems like it could be a broader helper function? Maybe on `Expr`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
asolimando commented on PR #19321: URL: https://github.com/apache/datafusion/pull/19321#issuecomment-3670864541 > @asolimando @pepijnve Thanks for the reviews, I've pushed up a couple of commits which addresses your feedback to unify the type. LGTM, not blocking but it might be worth adding a few more test cases: - Mixed placeholders + literals in WHEN (for simple CASE) - Example: CASE col WHEN $1 WHEN 100 WHEN $2 - Type coercion conflicts (what if types can't be coerced?) - Multiple WHEN with different concrete types - Example: CASE WHEN true THEN 1::INT32 WHEN false THEN 2::INT64 ELSE $1 - Placeholder in base expr with no concrete WHEN - Example: CASE $1 WHEN $2 THEN 'X' (both untyped) The test harness is verbose, so I understand if these are too much to add. Cases 1 and 3 would probably give the most value if you want to add just a couple. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
pepijnve commented on PR #19321: URL: https://github.com/apache/datafusion/pull/19321#issuecomment-3670623617 I think we're heading in the right direction by reusing `get_coerce_type_for_case_expression`. Seeing that had me wondering if we could reuse more of the logic from `type_coercion.rs`. I gave it a go in https://github.com/phillipleblanc/datafusion/pull/1 as quick experiment. Seems to be feasible. Should we try that to avoid duplicate type coercion code? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
phillipleblanc commented on PR #19321: URL: https://github.com/apache/datafusion/pull/19321#issuecomment-3668458573 @asolimando @pepijnve Thanks for the reviews, I've pushed up a couple of commits which addresses your feedback to unify the type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
phillipleblanc commented on code in PR #19321:
URL: https://github.com/apache/datafusion/pull/19321#discussion_r2629625347
##
datafusion/expr/src/expr.rs:
##
@@ -2701,6 +2785,20 @@ fn rewrite_placeholder(expr: &mut Expr, other: &Expr,
schema: &DFSchema) -> Resu
Ok(())
}
+/// Sets the [`DataType`] of a [`Placeholder`] expression if it is currently
`None`.
+///
+/// This is used during placeholder type inference to assign a specific data
type
+/// to untyped placeholders based on context (e.g., the expected type from a
function signature).
+fn rewrite_placeholder_type(expr: &mut Expr, dt: &DataType) -> Result<()> {
+if let Expr::Placeholder(Placeholder { id: _, field }) = expr
+&& field.is_none()
+&& !dt.is_null()
+{
+*field = Some(Arc::new(Field::new("", dt.clone(), true)));
Review Comment:
I've changed `rewrite_placeholder` to set the field name to empty.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
pepijnve commented on PR #19321: URL: https://github.com/apache/datafusion/pull/19321#issuecomment-3660590063 > It makes sense, but I think its fine to use the first type as the hint for the placeholders - as long as it can be coerced into the unified type when the placeholders are assigned. Does this introduce the possibility of deriving too strict a type? In the current implementation for this example somewhere along the return type is derived to be `float32`. Does this type inference happen before or after that? In the case it happens before, the type of the placeholder would be set to `int64` while you might want it to be `float32` instead. ``` > explain select case a when 0 then ? when 1 then 1::bigint else 2::real end from values (0),(1),(2) t(a); +-+-+ | plan_type | plan | +-+-+ | logical_plan| Projection: CASE t.a WHEN Int64(0) THEN CAST(? AS Float32) WHEN Int64(1) THEN Float32(1) ELSE Float32(2) END AS CASE t.a WHEN Int64(0) THEN ? WHEN Int64(1) THEN Int64(1) ELSE Int64(2) END | | | SubqueryAlias: t | | | Projection: column1 AS a | | | Values: (Int64(0)), (Int64(1)), (Int64(2)) | | physical_plan_error | Execution error: Placeholder '?' was not provided a value for execution. | +-+-+ ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
pepijnve commented on code in PR #19321:
URL: https://github.com/apache/datafusion/pull/19321#discussion_r2623296165
##
datafusion/expr/src/expr.rs:
##
@@ -2701,6 +2785,20 @@ fn rewrite_placeholder(expr: &mut Expr, other: &Expr,
schema: &DFSchema) -> Resu
Ok(())
}
+/// Sets the [`DataType`] of a [`Placeholder`] expression if it is currently
`None`.
+///
+/// This is used during placeholder type inference to assign a specific data
type
+/// to untyped placeholders based on context (e.g., the expected type from a
function signature).
+fn rewrite_placeholder_type(expr: &mut Expr, dt: &DataType) -> Result<()> {
+if let Expr::Placeholder(Placeholder { id: _, field }) = expr
+&& field.is_none()
+&& !dt.is_null()
+{
+*field = Some(Arc::new(Field::new("", dt.clone(), true)));
Review Comment:
Using the empty string here as the name of the field is appropriate, but we
do get a subtle difference in behaviour compared to `rewrite_placeholder`.
There the result of `Expr::to_field` is used verbatim, which may result in e.g.
the name of column popping up in the placeholder. Would it be preferable to
either always use the empty string of the id of the placeholder as field name
in `rewrite_placeholder` as well?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
phillipleblanc commented on PR #19321: URL: https://github.com/apache/datafusion/pull/19321#issuecomment-3660514420 > The SQL Standard (Foundation, Section 9.5) requires type inference over CASE expressions to unify/coerce over the whole result expressions (the THEN/ELSE). > > It doesn't cover prepared statements/placeholders explicitly, but I think we should at least unify all non-placeholder expressions, instead of the considering only the first one (which makes the result order-dependent). > > WDYT? It makes sense, but I think its fine to use the first type as the hint for the placeholders - as long as it can be coerced into the unified type when the placeholders are assigned. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
asolimando commented on PR #19321: URL: https://github.com/apache/datafusion/pull/19321#issuecomment-3659277042 The SQL Standard (Foundation, Section 9.5) requires type inference over CASE expressions to unify/coerce over the whole result expressions (the THEN/ELSE). It doesn't cover prepared statements/placeholders explicitly, but I think we should at least unify all non-placeholder expressions, instead of the considering only the first one (which makes the result order-dependent). WDYT? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add placeholder type inference for CASE expressions [datafusion]
alamb commented on PR #19321: URL: https://github.com/apache/datafusion/pull/19321#issuecomment-3657663861 Thank you for this PR @phillipleblanc @pepijnve given your expertise in CASE I wonder if you might have time to review this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
