Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-26 Thread via GitHub


cloud-fan closed pull request #46165: [SPARK-47351][SQL] Add collation support 
for StringToMap & Mask string expressions
URL: https://github.com/apache/spark/pull/46165


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-26 Thread via GitHub


cloud-fan commented on PR #46165:
URL: https://github.com/apache/spark/pull/46165#issuecomment-2079297307

   thanks, merging to master!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-25 Thread via GitHub


uros-db commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1579263303


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.collection.immutable.Seq
+
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.{MapType, StringType}
+
+// scalastyle:off nonascii
+class CollationSQLExpressionsSuite

Review Comment:
   for example, Mihailo is working on some more expressions right now, and they 
don't fall under stringExpressions nor regexpExpressions, so I suggest we keep 
`CollationSQLExpressionsSuite.scala` for these purposes
   
   and then we can migrate CollationStringExpressionsSuite & 
CollationRegexpExpressionsSuite into CollationSQLExpressionsSuite later on



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-24 Thread via GitHub


uros-db commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1578946140


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.collection.immutable.Seq
+
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.{MapType, StringType}
+
+// scalastyle:off nonascii
+class CollationSQLExpressionsSuite

Review Comment:
   We do have separate suites for `stringExpressions` and `regexpExpressions`, 
which are the two larger "groups" of SQL expressions. So now it's a bit unclear 
on where to put these tests, and there may be more to come for some expression 
that don't fall into any of the previous 2 categories
   
   So it's either multiple smaller suites or 1 large suite to cover them all. 
Should we go with this for now and then I'll shuffle these tests around in a 
separate effort in order to unify them under a single suite?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-24 Thread via GitHub


cloud-fan commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1578940945


##
sql/core/src/test/scala/org/apache/spark/sql/CollationSQLExpressionsSuite.scala:
##
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import scala.collection.immutable.Seq
+
+import org.apache.spark.sql.test.SharedSparkSession
+import org.apache.spark.sql.types.{MapType, StringType}
+
+// scalastyle:off nonascii
+class CollationSQLExpressionsSuite

Review Comment:
   We already have multiple test suite for string collation, can you justify 
the need of a new suite?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-24 Thread via GitHub


cloud-fan commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1577440230


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala:
##
@@ -79,12 +81,14 @@ import org.apache.spark.unsafe.types.UTF8String
 object MaskExpressionBuilder extends ExpressionBuilder {
   override def functionSignature: Option[FunctionSignature] = {
 val strArg = InputParameter("str")
-val upperCharArg = InputParameter("upperChar", 
Some(Literal(Mask.MASKED_UPPERCASE)))
-val lowerCharArg = InputParameter("lowerChar", 
Some(Literal(Mask.MASKED_LOWERCASE)))
-val digitCharArg = InputParameter("digitChar", 
Some(Literal(Mask.MASKED_DIGIT)))
-val otherCharArg = InputParameter(
-  "otherChar",
-  Some(Literal(Mask.MASKED_IGNORE, StringType)))
+val upperCharArg = InputParameter("upperChar",
+  Some(Literal(Mask.MASKED_UPPERCASE, SQLConf.get.defaultStringType)))

Review Comment:
   nit: we can use `Literal.create`, so that we don't need to convert String to 
UTF8String ahead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-23 Thread via GitHub


cloud-fan commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1575775863


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala:
##
@@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder {
 val digitCharArg = InputParameter("digitChar", 
Some(Literal(Mask.MASKED_DIGIT)))
 val otherCharArg = InputParameter(
   "otherChar",
-  Some(Literal(Mask.MASKED_IGNORE, StringType)))
+  Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType)))

Review Comment:
   We did this before but reverted. `Literal` is a low-level thing and it's 
better to not access SQLConf there.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-23 Thread via GitHub


uros-db commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1575760634


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala:
##
@@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder {
 val digitCharArg = InputParameter("digitChar", 
Some(Literal(Mask.MASKED_DIGIT)))
 val otherCharArg = InputParameter(
   "otherChar",
-  Some(Literal(Mask.MASKED_IGNORE, StringType)))
+  Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType)))

Review Comment:
   update: I fixed this in the latest commit, but I don't like the idea of 
doing this manually
   
   instead, I think we should consider updating `literals.scala` to construct 
string literals with respect to `defaultStringType` instead of `StringType`
   
   if that sound good @cloud-fan, then I'll get to that a separate effort



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-22 Thread via GitHub


uros-db commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1575736465


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala:
##
@@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder {
 val digitCharArg = InputParameter("digitChar", 
Some(Literal(Mask.MASKED_DIGIT)))
 val otherCharArg = InputParameter(
   "otherChar",
-  Some(Literal(Mask.MASKED_IGNORE, StringType)))
+  Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType)))

Review Comment:
   Agreed, all 4 string literals should be `SQLConf.get.defaultStringType` here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-22 Thread via GitHub


mihailom-db commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1575723275


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala:
##
@@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder {
 val digitCharArg = InputParameter("digitChar", 
Some(Literal(Mask.MASKED_DIGIT)))
 val otherCharArg = InputParameter(
   "otherChar",
-  Some(Literal(Mask.MASKED_IGNORE, StringType)))
+  Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType)))

Review Comment:
   Why are we then leaving digitCharArg as StringType? Either they all take 
StringType or SQLConf.get.defaultStringType.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-22 Thread via GitHub


uros-db commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1575711699


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala:
##
@@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder {
 val digitCharArg = InputParameter("digitChar", 
Some(Literal(Mask.MASKED_DIGIT)))
 val otherCharArg = InputParameter(
   "otherChar",
-  Some(Literal(Mask.MASKED_IGNORE, StringType)))
+  Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType)))

Review Comment:
   `MASKED_IGNORE` is null, so we have to specify a type for this literal in 
func signature
   so instead of `StringType(0)` it should only be correct to get the default 
string type
   
   we enforce implicit cast for this expression, so in a situation where the 
first parameter takes the default string type (for example when not using 
explicit collation) there would be a conflict with this `StringType(0)` if it 
were to stay here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-22 Thread via GitHub


uros-db commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1575701941


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule {
 
 case otherExpr @ (
   _: In | _: InSubquery | _: CreateArray | _: ArrayJoin | _: Concat | _: 
Greatest | _: Least |
-  _: Coalesce | _: BinaryExpression | _: ConcatWs) =>
+  _: Coalesce | _: BinaryExpression | _: ConcatWs | _: Mask) =>

Review Comment:
   For `StringToMap`, the second and third parameters are treated as regex:
   
   > Both `pairDelim` and `keyValueDelim` are treated as regular expressions."
   
   so no casting there, and their collation shouldn't be taken into 
consideration



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-22 Thread via GitHub


cloud-fan commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1575699776


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala:
##
@@ -84,7 +86,7 @@ object MaskExpressionBuilder extends ExpressionBuilder {
 val digitCharArg = InputParameter("digitChar", 
Some(Literal(Mask.MASKED_DIGIT)))
 val otherCharArg = InputParameter(
   "otherChar",
-  Some(Literal(Mask.MASKED_IGNORE, StringType)))
+  Some(Literal(Mask.MASKED_IGNORE, SQLConf.get.defaultStringType)))

Review Comment:
   why do we use the default string type as the expected input type here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-22 Thread via GitHub


uros-db commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1575701941


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule {
 
 case otherExpr @ (
   _: In | _: InSubquery | _: CreateArray | _: ArrayJoin | _: Concat | _: 
Greatest | _: Least |
-  _: Coalesce | _: BinaryExpression | _: ConcatWs) =>
+  _: Coalesce | _: BinaryExpression | _: ConcatWs | _: Mask) =>

Review Comment:
   For StringToMap, the second and third parameters are treated as regex:
   `Both `pairDelim` and `keyValueDelim` are treated as regular expressions."`
   
   so no casting there, and their collation shouldn't be taken into 
consideration



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47351][SQL] Add collation support for StringToMap & Mask string expressions [spark]

2024-04-22 Thread via GitHub


cloud-fan commented on code in PR #46165:
URL: https://github.com/apache/spark/pull/46165#discussion_r1575700039


##
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CollationTypeCasts.scala:
##
@@ -54,7 +54,7 @@ object CollationTypeCasts extends TypeCoercionRule {
 
 case otherExpr @ (
   _: In | _: InSubquery | _: CreateArray | _: ArrayJoin | _: Concat | _: 
Greatest | _: Least |
-  _: Coalesce | _: BinaryExpression | _: ConcatWs) =>
+  _: Coalesce | _: BinaryExpression | _: ConcatWs | _: Mask) =>

Review Comment:
   should `StringToMap` have the same requirement?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org