Re: [PR] Expand LIKE simplification [datafusion]

2024-11-06 Thread via GitHub


alamb commented on code in PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#discussion_r1830873028


##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -2987,41 +3051,41 @@ mod tests {
 })
 }
 
-fn like(expr: Expr, pattern: &str) -> Expr {
+fn like(expr: Expr, pattern: impl Into) -> Expr {

Review Comment:
   I think we could use Expr::like 
https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html#method.like
 and similar here instead of these functions
   
   This is likely left over from when the Expr API was less expressive



##
datafusion/sqllogictest/test_files/string/string_view.slt:
##
@@ -396,8 +396,9 @@ EXPLAIN SELECT
 FROM test;
 
 logical_plan
-01)Projection: test.column1_utf8view LIKE Utf8View("foo") AS like, 
test.column1_utf8view ILIKE Utf8View("foo") AS ilike
-02)--TableScan: test projection=[column1_utf8view]
+01)Projection: __common_expr_1 AS like, __common_expr_1 AS ilike

Review Comment:
   Nice!
   
   Though I think this test's meaning is now changed it is supposed to be 
verifying cast  exprs for like
   
   Perhaps you can change the patterns to `like '%foo%'`



##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -1470,34 +1471,67 @@ impl<'a, S: SimplifyInfo> TreeNodeRewriter for 
Simplifier<'a, S> {
 }) => Transformed::yes(simplify_regex_expr(left, op, right)?),
 
 // Rules for Like
-Expr::Like(Like {
-expr,
-pattern,
-negated,
-escape_char: _,
-case_insensitive: _,
-}) if matches!(
-pattern.as_ref(),
-Expr::Literal(ScalarValue::Utf8(Some(pattern_str))) if 
pattern_str == "%"
-) || matches!(
-pattern.as_ref(),
-Expr::Literal(ScalarValue::LargeUtf8(Some(pattern_str))) if 
pattern_str == "%"
-) || matches!(
-pattern.as_ref(),
-Expr::Literal(ScalarValue::Utf8View(Some(pattern_str))) if 
pattern_str == "%"
-) =>
-{
-// exp LIKE '%' is
-//   - when exp is not NULL, it's true
-//   - when exp is NULL, it's NULL
-// exp NOT LIKE '%' is
-//   - when exp is not NULL, it's false
-//   - when exp is NULL, it's NULL
-Transformed::yes(Expr::Case(Case {
-expr: Some(Box::new(Expr::IsNotNull(expr))),
-when_then_expr: vec![(Box::new(lit(true)), 
Box::new(lit(!negated)))],
-else_expr: None,
-}))
+Expr::Like(like) => {
+match as_string_scalar(&like.pattern) {
+Some((data_type, pattern_str)) => {
+match pattern_str {
+None => return 
Ok(Transformed::yes(lit_bool_null())),
+Some(pattern_str) if pattern_str == "%" => {
+// exp LIKE '%' is
+//   - when exp is not NULL, it's true
+//   - when exp is NULL, it's NULL
+// exp NOT LIKE '%' is
+//   - when exp is not NULL, it's false
+//   - when exp is NULL, it's NULL
+let result_for_non_null = lit(!like.negated);
+Transformed::yes(if 
!info.nullable(&like.expr)? {
+result_for_non_null
+} else {
+Expr::Case(Case {
+expr: 
Some(Box::new(Expr::IsNotNull(like.expr))),
+when_then_expr: vec![(
+Box::new(lit(true)),
+Box::new(result_for_non_null),
+)],
+else_expr: None,
+})
+})
+}
+Some(pattern_str)

Review Comment:
   I am not sure about this rule (see questions below) -- maybe you could add 
some comments explanation on what it is supposed to do. 
   



##
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs:
##
@@ -3633,32 +3697,123 @@ mod tests {
 
 #[test]
 fn test_like_and_ilke() {
-// LIKE '%'
-let expr = like(col("c1"), "%");
+let null = lit(ScalarValue::Utf8(None));
+
+// expr [NOT] [I]LIKE NULL
+let expr = like(col("c1"), null.clone());
+assert_eq!(simplify(expr), lit_bool_null());
+
+let expr = not_like(col("c1"), null

Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub


findepi commented on PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#issuecomment-2458895362

   @goldmedal rebased, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub


goldmedal commented on PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#issuecomment-2458624344

   > draft - to be rebased after #13259 lands
   > 
   > still ready to review cc @crepererum @goldmedal
   
   #13259 has been merged.  šŸ‘ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub


findepi commented on code in PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#discussion_r1829718515


##
datafusion/optimizer/Cargo.toml:
##
@@ -47,6 +47,7 @@ indexmap = { workspace = true }
 itertools = { workspace = true }
 log = { workspace = true }
 paste = "1.0.14"
+regex = "1.11.0"

Review Comment:
   good catch, done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub


goldmedal commented on code in PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#discussion_r1829563391


##
datafusion/optimizer/Cargo.toml:
##
@@ -47,6 +47,7 @@ indexmap = { workspace = true }
 itertools = { workspace = true }
 log = { workspace = true }
 paste = "1.0.14"
+regex = "1.11.0"

Review Comment:
   ```suggestion
   regex = { workspace = true }
   ```
   Although the version is different, I think it works well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub


findepi commented on PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#issuecomment-2457460720

   draft - to be rebased after https://github.com/apache/datafusion/pull/13259 
lands
   
   still ready to review
   cc @crepererum @goldmedal 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub


adriangb commented on PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#issuecomment-2457442850

   I’m happy with my changes being included in this PR :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub


findepi commented on PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#issuecomment-2457435093

   seemed easy enough, done.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub


findepi commented on PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#issuecomment-2457414646

   CI is green, so ready to review
   but this will conflict with @adriangb's 
https://github.com/apache/datafusion/pull/13061, because this changes code 
structure a little bit
   conflict resolution will be trivial though.
   @adriangb if you want, i can do this, adding your changes to this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Expand LIKE simplification [datafusion]

2024-11-05 Thread via GitHub


findepi commented on PR #13260:
URL: https://github.com/apache/datafusion/pull/13260#issuecomment-2457210525

   currently depends on https://github.com/apache/datafusion/pull/13259


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]