[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r585042342

## File path: rust/datafusion/src/physical_plan/type_coercion.rs ##
@@ -168,20 +168,35 @@ fn maybe_data_types(
 pub fn can_coerce_from(type_into: , type_from: ) -> bool {
     use self::DataType::*;
     match type_into {
-        Int8 => matches!(type_from, Int8),
-        Int16 => matches!(type_from, Int8 | Int16 | UInt8),
-        Int32 => matches!(type_from, Int8 | Int16 | Int32 | UInt8 | UInt16),
+        Int8 => matches!(type_from, Int8 | Utf8 | LargeUtf8),

Review comment:
@jorgecarleitao @alamb I would like to get the rest of these SQL functions implemented and then pick up the type-coercion work. I was also thinking of a `try_cast` implementation, but it will be a lot of work. I think we also need to produce a matrix like this one: https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-ver15#implicit-conversions to work out the scope of the changes, as I think we have a very imbalanced set of implementations at the moment.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
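To show what such an implicit-conversion matrix could look like, here is a minimal sketch that renders a from/into grid from a coercion predicate. It makes several assumptions. `Ty` is a hypothetical stand-in for Arrow's `DataType`, restricted to the types that appear in the diff. The integer rows are copied from the pre-PR `can_coerce_from` lines. Every other pair is treated as "no implicit conversion" purely for illustration; these are not DataFusion's actual rules.

```rust
// Hypothetical stand-in for arrow's DataType, limited to the types in the diff.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Ty {
    Int8,
    Int16,
    Int32,
    UInt8,
    UInt16,
    Utf8,
    LargeUtf8,
}

use Ty::*;

const ALL: [Ty; 7] = [Int8, Int16, Int32, UInt8, UInt16, Utf8, LargeUtf8];

/// Integer rows copied from the pre-PR `can_coerce_from` lines in the diff;
/// every other target type only accepts itself (an illustrative assumption).
fn can_coerce_from(type_into: Ty, type_from: Ty) -> bool {
    match type_into {
        Int8 => matches!(type_from, Int8),
        Int16 => matches!(type_from, Int8 | Int16 | UInt8),
        Int32 => matches!(type_from, Int8 | Int16 | Int32 | UInt8 | UInt16),
        other => other == type_from,
    }
}

/// Renders a from/into grid: 'Y' = implicit conversion allowed, '.' = not.
fn coercion_matrix() -> String {
    let mut out = String::from("from\\into");
    for into in ALL {
        out.push_str(&format!(" {:>9}", format!("{:?}", into)));
    }
    out.push('\n');
    for from in ALL {
        out.push_str(&format!("{:<9}", format!("{:?}", from)));
        for into in ALL {
            let cell = if can_coerce_from(into, from) { "Y" } else { "." };
            out.push_str(&format!(" {:>9}", cell));
        }
        out.push('\n');
    }
    out
}

fn main() {
    print!("{}", coercion_matrix());
}
```

Printing the whole grid makes asymmetries easy to spot. In this sketch, for example, nothing coerces into `UInt8`, and `Utf8` never coerces into an integer type.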
[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r585040519

## File path: rust/datafusion/src/physical_plan/string_expressions.rs ##
@@ -361,10 +534,167 @@ pub fn ltrim(args: &[ArrayRef]) -> Result {
     }
 }
 
-/// Converts the string to all lower case.
-/// lower('TOM') = 'tom'
-pub fn lower(args: &[ColumnarValue]) -> Result {
-    handle(args, |x| x.to_ascii_lowercase(), "lower")
+/// Returns last n characters in the string, or when n is negative, returns all but first |n| characters.
+/// right('abcde', 2) = 'de'
+pub fn right(args: &[ArrayRef]) -> Result {
+    let string_array: = args[0]
+        .as_any()
+        .downcast_ref::>()
+        .ok_or_else(|| {
+            DataFusionError::Internal("could not cast string to StringArray".to_string())
+        })?;
+
+    let n_array: =
+        args[1]
+            .as_any()
+            .downcast_ref::()
+            .ok_or_else(|| {
+                DataFusionError::Internal("could not cast n to Int64Array".to_string())
+            })?;
+
+    let result = string_array
+        .iter()
+        .enumerate()
+        .map(|(i, x)| {
+            if n_array.is_null(i) {
+                None
+            } else {
+                x.map(|x: | {
+                    let n: i64 = n_array.value(i);

Review comment:
Hi @jorgecarleitao, thanks for this. My understanding is that one of the core properties of a `RecordBatch` is that all columns must have the same length: https://github.com/apache/arrow/blob/master/rust/arrow/src/record_batch.rs#L52, implemented here: https://github.com/apache/arrow/blob/master/rust/arrow/src/record_batch.rs#L134

From what I can see, if we did adopt a `zip`, then we would implicitly be treating the shorter argument as a `None`, which won't break the out-of-bounds check but might produce some very strange function results.

I do agree with you that many of the core Rust Arrow implementations are throwing away the benefits of the Rust compiler, so we should try to refactor sensibly for safety.
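A minimal sketch of the `right` semantics from the doc comment above, combined with the equal-length argument discussed in this comment. It assumes plain slices of `Option` as stand-ins for Arrow arrays (`None` = null). `right_value` and `right_columns` are hypothetical names, not DataFusion's API. Note that `Iterator::zip` stops as soon as either input is exhausted, so the sketch checks the lengths explicitly before zipping rather than letting a mismatch silently shorten the output. It counts `char`s rather than bytes, matching "characters" in the doc comment.

```rust
/// Postgres-style `right(string, n)` on a single value, counting chars
/// (Unicode scalar values) rather than bytes.
fn right_value(s: &str, n: i64) -> String {
    let len = s.chars().count() as u64;
    // n >= 0: keep the last n chars; n < 0: drop the first |n| chars.
    // `unsigned_abs` avoids overflow when n == i64::MIN.
    let skip = if n >= 0 {
        len.saturating_sub(n as u64)
    } else {
        n.unsigned_abs().min(len)
    };
    s.chars().skip(skip as usize).collect()
}

/// Applies `right` row by row to two argument columns, modelled here as
/// slices of `Option` (a stand-in for Arrow arrays, where `None` is null).
fn right_columns(
    strings: &[Option<&str>],
    ns: &[Option<i64>],
) -> Result<Vec<Option<String>>, String> {
    // Columns from one RecordBatch should have equal lengths; check it
    // explicitly, because `zip` alone would silently stop at the shorter input.
    if strings.len() != ns.len() {
        return Err(format!(
            "argument length mismatch: {} strings vs {} n values",
            strings.len(),
            ns.len()
        ));
    }
    Ok(strings
        .iter()
        .zip(ns.iter())
        .map(|(s, n)| match (s, n) {
            // A null in either argument yields a null result.
            (Some(s), Some(n)) => Some(right_value(s, *n)),
            _ => None,
        })
        .collect())
}

fn main() {
    let out = right_columns(
        &[Some("abcde"), Some("abcde"), None],
        &[Some(2), Some(-2), Some(1)],
    );
    println!("{:?}", out); // prints Ok([Some("de"), Some("cde"), None])
}
```

Once the length check has passed, `zip` gives bounds-check-free iteration without the index-based `is_null(i)`/`value(i)` pattern in the diff.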
[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r584420423

## File path: rust/datafusion/src/physical_plan/type_coercion.rs ##
@@ -168,20 +168,35 @@ fn maybe_data_types(
 pub fn can_coerce_from(type_into: , type_from: ) -> bool {
     use self::DataType::*;
     match type_into {
-        Int8 => matches!(type_from, Int8),
-        Int16 => matches!(type_from, Int8 | Int16 | UInt8),
-        Int32 => matches!(type_from, Int8 | Int16 | Int32 | UInt8 | UInt16),
+        Int8 => matches!(type_from, Int8 | Utf8 | LargeUtf8),

Review comment:
No problem, @alamb. How about I revert the coercion logic from this PR and re-add the explicit cast in the tests? I think fully addressing CAST/coercion is a _major_ piece of work.
[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r583302248

## File path: rust/datafusion/src/physical_plan/type_coercion.rs ##
@@ -168,20 +168,35 @@ fn maybe_data_types(
 pub fn can_coerce_from(type_into: , type_from: ) -> bool {
     use self::DataType::*;
     match type_into {
-        Int8 => matches!(type_from, Int8),
-        Int16 => matches!(type_from, Int8 | Int16 | UInt8),
-        Int32 => matches!(type_from, Int8 | Int16 | Int32 | UInt8 | UInt16),
+        Int8 => matches!(type_from, Int8 | Utf8 | LargeUtf8),

Review comment:
@Dandandan @alamb, if you agree, I can try to add it, as I still have the function work in my head.
[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r583147743

## File path: rust/datafusion/tests/sql.rs ##
@@ -530,17 +530,6 @@ async fn sqrt_f32_vs_f64() -> Result<()> {
     Ok(())
 }
 
-#[tokio::test]
-async fn csv_query_error() -> Result<()> {
-    // sin(utf8) should error
-    let mut ctx = create_ctx()?;
-    register_aggregate_csv( ctx)?;
-    let sql = "SELECT sin(c1) FROM aggregate_test_100";
-    let plan = ctx.create_logical_plan();
-    assert!(plan.is_err());

Review comment:
OK, I'll re-add it.
[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad
seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r583147592

## File path: rust/datafusion/src/physical_plan/type_coercion.rs ##
@@ -168,20 +168,35 @@ fn maybe_data_types(
 pub fn can_coerce_from(type_into: , type_from: ) -> bool {
     use self::DataType::*;
     match type_into {
-        Int8 => matches!(type_from, Int8),
-        Int16 => matches!(type_from, Int8 | Int16 | UInt8),
-        Int32 => matches!(type_from, Int8 | Int16 | Int32 | UInt8 | UInt16),
+        Int8 => matches!(type_from, Int8 | Utf8 | LargeUtf8),

Review comment:
@alamb I think you have hit on a bigger issue. Postgres will do this type coercion silently: `SELECT LEFT('abcde', '1');` will return `a`, whereas `SELECT LEFT('abcde', 'a');` will return the error `invalid input syntax for type integer: "a"`.

I think a failed `CAST` in DataFusion currently returns `NULL` by default, which is not good and is not the behavior ANSI SQL expects. I will volunteer to fix it if we can reach a consensus. This is one of my major issues with Spark. Some other engines make this behavior explicit with `SAFE_CAST`.
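To make the contrast concrete, here is a minimal sketch of the two cast behaviours described above, applied to the `n` argument of `left`. `cast_to_i64` mimics the Postgres-style strict cast, where unparsable text is an error whose message is copied from the comment. `safe_cast_to_i64` mimics a `SAFE_CAST`-style cast, where unparsable text becomes NULL (`None`). All function names are hypothetical and this is not DataFusion's cast kernel. `left_value` follows the Postgres `left` semantics: the first n chars, or all but the last |n| chars when n is negative.

```rust
/// Strict cast: unparsable text is an error (Postgres-like behaviour).
fn cast_to_i64(s: &str) -> Result<i64, String> {
    s.parse::<i64>()
        .map_err(|_| format!("invalid input syntax for type integer: \"{}\"", s))
}

/// "Safe" cast: unparsable text becomes NULL, modelled as `None`.
fn safe_cast_to_i64(s: &str) -> Option<i64> {
    s.parse::<i64>().ok()
}

/// Postgres-style `left(string, n)`: first n chars, or all but the last |n|
/// chars when n < 0. Counts chars rather than bytes.
fn left_value(s: &str, n: i64) -> String {
    let len = s.chars().count() as u64;
    let take = if n >= 0 {
        (n as u64).min(len)
    } else {
        len.saturating_sub(n.unsigned_abs())
    };
    s.chars().take(take as usize).collect()
}

/// `left` whose n argument arrives as text and is implicitly coerced with the
/// strict cast, mirroring `SELECT LEFT('abcde', '1')` in Postgres.
fn left_with_text_n(s: &str, n: &str) -> Result<String, String> {
    let n = cast_to_i64(n)?;
    Ok(left_value(s, n))
}

fn main() {
    println!("{:?}", left_with_text_n("abcde", "1")); // prints Ok("a")
    println!("{:?}", left_with_text_n("abcde", "a")); // prints Err("invalid input syntax for type integer: \"a\"")
    println!("{:?}", safe_cast_to_i64("a")); // prints None
}
```

The design question raised in the comment is which of these two casts an implicit coercion should use by default. With the strict version, bad input surfaces as an error instead of silently producing NULL rows.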