[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad

2021-03-01 Thread GitBox


seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r585042342



##
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##
@@ -168,20 +168,35 @@ fn maybe_data_types(
 pub fn can_coerce_from(type_into: &DataType, type_from: &DataType) -> bool {
     use self::DataType::*;
     match type_into {
-        Int8 => matches!(type_from, Int8),
-        Int16 => matches!(type_from, Int8 | Int16 | UInt8),
-        Int32 => matches!(type_from, Int8 | Int16 | Int32 | UInt8 | UInt16),
+        Int8 => matches!(type_from, Int8 | Utf8 | LargeUtf8),

Review comment:
   @jorgecarleitao @alamb 
   
   I would like to get the rest of these SQL functions implemented and then 
pick up the type-coercion work. I was also thinking about a `try_cast` 
implementation, but it will be a lot of work. 
   
   I think we also need to produce a matrix like this one: 
https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-ver15#implicit-conversions
 to work out the scope of the changes, as I think we have a very imbalanced 
set of implementations at the moment.
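
As a rough illustration of the kind of matrix meant here, the sketch below renders an implicit-coercion rule as an into/from table. `Ty`, `ALL`, and `coercion_matrix` are hypothetical names, and `Ty` is a simplified stand-in for Arrow's `DataType`. The three integer rows are copied from the lines this diff removes; every other row is assumed to be identity-only purely for illustration.

```rust
/// Simplified stand-in for Arrow's `DataType` (hypothetical, illustration only).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Ty {
    Int8,
    Int16,
    Int32,
    UInt8,
    UInt16,
    Utf8,
}

const ALL: [Ty; 6] = [Ty::Int8, Ty::Int16, Ty::Int32, Ty::UInt8, Ty::UInt16, Ty::Utf8];

/// Implicit-coercion rule. The integer rows mirror the removed lines in the
/// quoted hunk; the fallback row is an assumption made for this sketch.
fn can_coerce_from(into: Ty, from: Ty) -> bool {
    use Ty::*;
    match into {
        Int8 => matches!(from, Int8),
        Int16 => matches!(from, Int8 | Int16 | UInt8),
        Int32 => matches!(from, Int8 | Int16 | Int32 | UInt8 | UInt16),
        // Assumption: types not covered by the hunk only coerce from themselves.
        other => other == from,
    }
}

/// Render the rule as text: one header line, then one row per target type.
fn coercion_matrix() -> String {
    let mut out = format!("{:<10}", "into\\from");
    for from in ALL.iter().copied() {
        // Format to a String first so the column width is applied reliably.
        out.push_str(&format!("{:<7}", format!("{:?}", from)));
    }
    out.push('\n');
    for into in ALL.iter().copied() {
        out.push_str(&format!("{:<10}", format!("{:?}", into)));
        for from in ALL.iter().copied() {
            let mark = if can_coerce_from(into, from) { "yes" } else { "-" };
            out.push_str(&format!("{:<7}", mark));
        }
        out.push('\n');
    }
    out
}
```

Printing `coercion_matrix()` gives a table that can be compared cell by cell against another engine's documented matrix, which should make gaps and asymmetries easier to spot than reading the `match` arms.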





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad

2021-03-01 Thread GitBox


seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r585040519



##
File path: rust/datafusion/src/physical_plan/string_expressions.rs
##
@@ -361,10 +534,167 @@ pub fn ltrim<T: StringOffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
 }
 }
 
-/// Converts the string to all lower case.
-/// lower('TOM') = 'tom'
-pub fn lower(args: &[ColumnarValue]) -> Result<ColumnarValue> {
-    handle(args, |x| x.to_ascii_lowercase(), "lower")
+/// Returns last n characters in the string, or when n is negative, returns all but first |n| characters.
+/// right('abcde', 2) = 'de'
+pub fn right<T: StringOffsetSizeTrait>(args: &[ArrayRef]) -> Result<ArrayRef> {
+    let string_array: &GenericStringArray<T> = args[0]
+        .as_any()
+        .downcast_ref::<GenericStringArray<T>>()
+        .ok_or_else(|| {
+            DataFusionError::Internal("could not cast string to StringArray".to_string())
+        })?;
+
+    let n_array: &Int64Array =
+        args[1]
+            .as_any()
+            .downcast_ref::<Int64Array>()
+            .ok_or_else(|| {
+                DataFusionError::Internal("could not cast n to Int64Array".to_string())
+            })?;
+
+    let result = string_array
+        .iter()
+        .enumerate()
+        .map(|(i, x)| {
+            if n_array.is_null(i) {
+                None
+            } else {
+                x.map(|x: &str| {
+                    let n: i64 = n_array.value(i);

Review comment:
   Hi @jorgecarleitao thanks for this.
   
   My understanding is that one of the core properties of a `RecordBatch` is 
that all columns must have the same length: 
https://github.com/apache/arrow/blob/master/rust/arrow/src/record_batch.rs#L52 
implemented here: 
https://github.com/apache/arrow/blob/master/rust/arrow/src/record_batch.rs#L134
   
   From what I can see, if we adopted `zip` we would implicitly be treating 
the shorter argument as `None`, which won't trip the out-of-bounds check but 
might produce some very strange function results.
   
   I do agree with you that many of the core Rust Arrow implementations are 
throwing away the benefits of the Rust compiler, so we should try to refactor 
sensibly for safety.
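
To make the `zip` concern concrete, here is a minimal sketch using plain slices of `Option` values rather than Arrow arrays. `right_column` is a hypothetical helper, not DataFusion's API. Rust's `Iterator::zip` stops at the shorter input, so a length mismatch would drop rows without any error; the sketch therefore checks the lengths explicitly before zipping. The `right` helper follows the doc comment in the diff above.

```rust
/// Last `n` characters of `s`; for negative `n`, all but the first |n| characters.
fn right(s: &str, n: i64) -> String {
    let len = s.chars().count() as i64;
    // Number of leading characters to skip, clamped to the range 0..=len.
    let skip = if n >= 0 {
        (len - n).max(0)
    } else {
        n.saturating_neg().min(len)
    };
    s.chars().skip(skip as usize).collect()
}

/// Apply `right` row by row (hypothetical helper for illustration).
/// The explicit length check is needed because `zip` alone would stop at the
/// shorter input and silently drop the remaining rows.
fn right_column(
    strings: &[Option<&str>],
    ns: &[Option<i64>],
) -> Result<Vec<Option<String>>, String> {
    if strings.len() != ns.len() {
        return Err(format!(
            "length mismatch: {} strings vs {} counts",
            strings.len(),
            ns.len()
        ));
    }
    Ok(strings
        .iter()
        .zip(ns.iter())
        .map(|(s, n)| match (s, n) {
            (Some(s), Some(n)) => Some(right(s, *n)),
            // A null in either argument yields a null result.
            _ => None,
        })
        .collect())
}
```

With the check in place, `zip` removes the per-row indexing while a mismatch still surfaces as an error instead of a shorter result.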









[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad

2021-02-28 Thread GitBox


seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r584420423



##
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##
@@ -168,20 +168,35 @@ fn maybe_data_types(
 pub fn can_coerce_from(type_into: &DataType, type_from: &DataType) -> bool {
     use self::DataType::*;
     match type_into {
-        Int8 => matches!(type_from, Int8),
-        Int16 => matches!(type_from, Int8 | Int16 | UInt8),
-        Int32 => matches!(type_from, Int8 | Int16 | Int32 | UInt8 | UInt16),
+        Int8 => matches!(type_from, Int8 | Utf8 | LargeUtf8),

Review comment:
   No problem @alamb. How about I revert the coercion logic from this PR 
and re-add the explicit cast in the tests? I think there is a _major_ piece of 
work needed to fully address CAST/coercion.









[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad

2021-02-25 Thread GitBox


seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r583302248



##
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##
@@ -168,20 +168,35 @@ fn maybe_data_types(
 pub fn can_coerce_from(type_into: &DataType, type_from: &DataType) -> bool {
     use self::DataType::*;
     match type_into {
-        Int8 => matches!(type_from, Int8),
-        Int16 => matches!(type_from, Int8 | Int16 | UInt8),
-        Int32 => matches!(type_from, Int8 | Int16 | Int32 | UInt8 | UInt16),
+        Int8 => matches!(type_from, Int8 | Utf8 | LargeUtf8),

Review comment:
   @Dandandan @alamb if you agree, I can try to add it, as I still have the 
function work fresh in my head.









[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad

2021-02-25 Thread GitBox


seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r583147743



##
File path: rust/datafusion/tests/sql.rs
##
@@ -530,17 +530,6 @@ async fn sqrt_f32_vs_f64() -> Result<()> {
     Ok(())
 }
 
-#[tokio::test]
-async fn csv_query_error() -> Result<()> {
-    // sin(utf8) should error
-    let mut ctx = create_ctx()?;
-    register_aggregate_csv(&mut ctx)?;
-    let sql = "SELECT sin(c1) FROM aggregate_test_100";
-    let plan = ctx.create_logical_plan(&sql);
-    assert!(plan.is_err());

Review comment:
   OK, I'll re-add it.









[GitHub] [arrow] seddonm1 commented on a change in pull request #9565: ARROW-11655: [Rust][DataFusion] Postgres String Functions: left, lpad, right, rpad

2021-02-25 Thread GitBox


seddonm1 commented on a change in pull request #9565:
URL: https://github.com/apache/arrow/pull/9565#discussion_r583147592



##
File path: rust/datafusion/src/physical_plan/type_coercion.rs
##
@@ -168,20 +168,35 @@ fn maybe_data_types(
 pub fn can_coerce_from(type_into: &DataType, type_from: &DataType) -> bool {
     use self::DataType::*;
     match type_into {
-        Int8 => matches!(type_from, Int8),
-        Int16 => matches!(type_from, Int8 | Int16 | UInt8),
-        Int32 => matches!(type_from, Int8 | Int16 | Int32 | UInt8 | UInt16),
+        Int8 => matches!(type_from, Int8 | Utf8 | LargeUtf8),

Review comment:
   @alamb I think you have hit on a bigger issue.
   
   Postgres does this type coercion silently: `SELECT LEFT('abcde', '1');` 
returns `a`, while `SELECT LEFT('abcde', 'a');` fails with `invalid input 
syntax for type integer: "a"`.
   
   I think a failed `CAST` in DataFusion currently returns `NULL` by default, 
which is not good and is not the behavior ANSI SQL expects. I will volunteer to 
fix it if we can reach a consensus.
   
   This is one of my major issues with Spark. Some other engines explicitly 
call out this behavior with `SAFE_CAST`.
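
The two behaviors being contrasted can be sketched side by side. This is only an illustration over plain `Option<&str>` slices, not DataFusion's cast kernel; both function names are hypothetical, and the error text simply reuses the Postgres message quoted above.

```rust
/// Strict cast (hypothetical name): a value that does not parse is an error.
fn cast_utf8_to_i64(values: &[Option<&str>]) -> Result<Vec<Option<i64>>, String> {
    values
        .iter()
        .map(|v| match v {
            // Nulls pass through unchanged.
            None => Ok(None),
            Some(s) => s
                .parse::<i64>()
                .map(Some)
                .map_err(|_| format!("invalid input syntax for type integer: \"{}\"", s)),
        })
        .collect()
}

/// Lenient cast in the `try_cast` style (hypothetical name):
/// a value that does not parse becomes NULL instead of failing the query.
fn try_cast_utf8_to_i64(values: &[Option<&str>]) -> Vec<Option<i64>> {
    values
        .iter()
        .map(|v| v.and_then(|s| s.parse::<i64>().ok()))
        .collect()
}
```

Keeping the lenient version under a separate, explicit name means a plain `CAST` can fail loudly while callers who want NULL-on-failure have to opt in.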




