[GitHub] [arrow-datafusion] nevi-me commented on a change in pull request #508: add expr::like and expr::notlike to pruning logic

2021-06-08 Thread GitBox


nevi-me commented on a change in pull request #508:
URL: https://github.com/apache/arrow-datafusion/pull/508#discussion_r647316738



##
File path: datafusion/src/physical_optimizer/pruning.rs
##
@@ -586,8 +587,47 @@ fn build_predicate_expression(
 .min_column_expr()?
 .lt_eq(expr_builder.scalar_expr().clone())
 }
+Operator::Like => {
+match &**right {
+// If the literal is a 'starts_with'
+Expr::Literal(ScalarValue::Utf8(Some(string)))
+if !string.starts_with('%') =>
+{
+// Split the string to get the first part before '%'
+let split = string.split('%').next().unwrap().to_string();

Review comment:
   Yes, won't panic because String::split() will always return at least 1 
result, the full string if there's nothing to spilt by




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-datafusion] nevi-me commented on a change in pull request #508: add expr::like and expr::notlike to pruning logic

2021-06-06 Thread GitBox


nevi-me commented on a change in pull request #508:
URL: https://github.com/apache/arrow-datafusion/pull/508#discussion_r646248751



##
File path: datafusion/src/physical_optimizer/pruning.rs
##
@@ -1095,6 +1133,60 @@ mod tests {
 Ok(())
 }
 
+#[test]
+fn row_group_predicate_starts_with() -> Result<()> {
+let schema = Schema::new(vec![Field::new("c1", DataType::Utf8, true)]);
+// test LIKE operator that is converted to a 'starts_with'
+let expr = col("c1").like(lit("Banana%"));
+let expected_expr =
+"#c1_min LtEq Utf8(\"Banana\") And Utf8(\"Banana\") LtEq #c1_max";
+let predicate_expr =
+build_predicate_expression(, ,  
RequiredStatColumns::new())?;
+assert_eq!(format!("{:?}", predicate_expr), expected_expr);
+
+Ok(())
+}
+
+#[test]
+fn row_group_predicate_like() -> Result<()> {
+let schema = Schema::new(vec![Field::new("c1", DataType::Utf8, true)]);
+// test LIKE operator that can't be converted to a 'starts_with'
+let expr = col("c1").like(lit("%Banana%"));
+let expected_expr = "Boolean(true)";
+let predicate_expr =
+build_predicate_expression(, ,  
RequiredStatColumns::new())?;
+assert_eq!(format!("{:?}", predicate_expr), expected_expr);
+
+Ok(())
+}
+
+#[test]
+fn row_group_predicate_not_starts_with() -> Result<()> {
+let schema = Schema::new(vec![Field::new("c1", DataType::Utf8, true)]);
+// test LIKE operator that can't be converted to a 'starts_with'
+let expr = col("c1").not().like(lit("Banana%"));

Review comment:
   This explains why the filter was negated, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-datafusion] nevi-me commented on a change in pull request #508: add expr::like and expr::notlike to pruning logic

2021-06-06 Thread GitBox


nevi-me commented on a change in pull request #508:
URL: https://github.com/apache/arrow-datafusion/pull/508#discussion_r646248560



##
File path: datafusion/src/physical_optimizer/pruning.rs
##
@@ -586,8 +587,45 @@ fn build_predicate_expression(
 .min_column_expr()?
 .lt_eq(expr_builder.scalar_expr().clone())
 }
+Operator::Like => {
+match &**right {
+// If the literal is a 'starts_with'
+Expr::Literal(ScalarValue::Utf8(Some(string)))
+if !string.starts_with('%') =>
+{
+let scalar_expr =
+
Expr::Literal(ScalarValue::Utf8(Some(string.replace('%', "";
+// Behaves like Eq
+let min_column_expr = expr_builder.min_column_expr()?;
+let max_column_expr = expr_builder.max_column_expr()?;
+min_column_expr
+.lt_eq(scalar_expr.clone())
+.and(scalar_expr.lt_eq(max_column_expr))
+}
+_ => unhandled,
+}
+}
+Operator::NotLike => {
+match &**right {
+// If the literal is a 'starts_with'
+Expr::Literal(ScalarValue::Utf8(Some(string)))
+if !string.starts_with('%') =>
+{
+let scalar_expr =
+
Expr::Literal(ScalarValue::Utf8(Some(string.replace('%', "";

Review comment:
   Thanks, fixed and changed tests




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow-datafusion] nevi-me commented on a change in pull request #508: add expr::like and expr::notlike to pruning logic

2021-06-05 Thread GitBox


nevi-me commented on a change in pull request #508:
URL: https://github.com/apache/arrow-datafusion/pull/508#discussion_r645996852



##
File path: datafusion/src/physical_optimizer/pruning.rs
##
@@ -586,8 +587,45 @@ fn build_predicate_expression(
 .min_column_expr()?
 .lt_eq(expr_builder.scalar_expr().clone())
 }
+Operator::Like => {
+match &**right {
+// If the literal is a 'starts_with'
+Expr::Literal(ScalarValue::Utf8(Some(string)))
+if !string.starts_with('%') =>
+{
+let scalar_expr =
+
Expr::Literal(ScalarValue::Utf8(Some(string.replace('%', "";
+// Behaves like Eq
+let min_column_expr = expr_builder.min_column_expr()?;
+let max_column_expr = expr_builder.max_column_expr()?;
+min_column_expr
+.lt_eq(scalar_expr.clone())
+.and(scalar_expr.lt_eq(max_column_expr))
+}
+_ => unhandled,
+}
+}
+Operator::NotLike => {
+match &**right {
+// If the literal is a 'starts_with'
+Expr::Literal(ScalarValue::Utf8(Some(string)))
+if !string.starts_with('%') =>

Review comment:
   I only focused on expressions that don't start with `%`, under the 
assumption that they would be a `starts_with`. I don't think we can support 
anything other than a `starts_with` because we translate the queries to `min 
LtEq value && value LtEq max`.
   
   Or how would `LIKE '100\% %'` be evaluated?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org