[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1044051578 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.errors.QueryErrorsBase +import org.apache.spark.sql.types.{AbstractDataType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """_FUNC_(input[, upperChar, lowerChar, digitChar, otherChar]) - masks the given string value. + The function replaces characters with 'X' or 'x', and numbers with 'n'. + This can be useful for creating copies of tables with sensitive information removed. + Error behavior: null value as replacement argument will throw AnalysisError. + """, + arguments = """ +Arguments: + * input - string value to mask. 
Supported types: STRING, VARCHAR, CHAR + * upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' + * lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' + * digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' + * otherChar - character to replace all other characters with. Specify -1 to retain original character. Default value: -1 + """, + examples = """ +Examples: + > SELECT _FUNC_('abcd-EFGH-8765-4321'); +--- + > SELECT _FUNC_('abcd-EFGH-8765-4321', 'Q'); +--- + > SELECT _FUNC_('AbCD123-@$#', 'Q', 'q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#'); +XxXXnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q'); +QxQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q', 'q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q', 'q', 'd'); +QqQQddd-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q', 'q', 'd', 'o'); +QqQQddd + > SELECT _FUNC_('AbCD123-@$#', -1, 'q', 'd', 'o'); +AqCDddd + > SELECT _FUNC_('AbCD123-@$#', -1, -1, 'd', 'o'); +AbCDddd + > SELECT _FUNC_('AbCD123-@$#', -1, -1, -1, 'o'); +AbCD123 + > SELECT _FUNC_(NULL, -1, -1, -1, 'o'); +NULL + > SELECT _FUNC_(NULL); +NULL + > SELECT _FUNC_('AbCD123-@$#', -1, -1, -1, -1); +AbCD123-@$# + """, + since = "3.4.0", + group = "string_funcs") +// scalastyle:on line.size.limit +case class Mask( +input: Expression, +upperChar: Expression, +lowerChar: Expression, +digitChar: Expression, +otherChar: Expression) +extends QuinaryExpression +with ImplicitCastInputTypes Review Comment: yes, input type is string -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
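The argument rules quoted in the diff above (defaults 'X', 'x' and 'n', with -1 meaning "retain the original character") can be illustrated with a small plain-Scala sketch. This is not the PR's implementation: the object name `MaskSketch` and the use of `Option[Char]` (with `None` standing in for -1) are assumptions made for illustration, and the sketch works on UTF-16 chars rather than Spark's `UTF8String`.

```scala
// Illustrative sketch only; MaskSketch is a hypothetical name, not part of Spark.
object MaskSketch {
  // None stands in for the documented -1 ("retain the original character").
  def mask(
      input: String,
      upperChar: Option[Char] = Some('X'),
      lowerChar: Option[Char] = Some('x'),
      digitChar: Option[Char] = Some('n'),
      otherChar: Option[Char] = None): String = {
    if (input == null) {
      // A NULL input yields NULL, as in the documented examples.
      null
    } else {
      input.map { c =>
        // Pick the replacement for this character's class, if any.
        val replacement =
          if (c.isUpper) upperChar
          else if (c.isLower) lowerChar
          else if (c.isDigit) digitChar
          else otherChar
        replacement.getOrElse(c)
      }
    }
  }
}
```

Under these assumptions, `MaskSketch.mask("AbCD123-@$#")` gives `XxXXnnn-@$#`, and passing `upperChar = Some('Q')` gives `QxQQnnn-@$#`, matching the corresponding examples in the quoted doc string.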
[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1044051385 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.errors.QueryErrorsBase +import org.apache.spark.sql.types.{AbstractDataType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """_FUNC_(input[, upperChar, lowerChar, digitChar, otherChar]) - masks the given string value. + The function replaces characters with 'X' or 'x', and numbers with 'n'. + This can be useful for creating copies of tables with sensitive information removed. + Error behavior: null value as replacement argument will throw AnalysisError. + """, + arguments = """ +Arguments: + * input - string value to mask. 
Supported types: STRING, VARCHAR, CHAR + * upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' + * lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' + * digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' + * otherChar - character to replace all other characters with. Specify -1 to retain original character. Default value: -1 + """, + examples = """ +Examples: + > SELECT _FUNC_('abcd-EFGH-8765-4321'); +--- + > SELECT _FUNC_('abcd-EFGH-8765-4321', 'Q'); +--- + > SELECT _FUNC_('AbCD123-@$#', 'Q', 'q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#'); +XxXXnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q'); +QxQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q', 'q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q', 'q', 'd'); +QqQQddd-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q', 'q', 'd', 'o'); +QqQQddd + > SELECT _FUNC_('AbCD123-@$#', -1, 'q', 'd', 'o'); +AqCDddd + > SELECT _FUNC_('AbCD123-@$#', -1, -1, 'd', 'o'); +AbCDddd + > SELECT _FUNC_('AbCD123-@$#', -1, -1, -1, 'o'); +AbCD123 + > SELECT _FUNC_(NULL, -1, -1, -1, 'o'); +NULL + > SELECT _FUNC_(NULL); +NULL + > SELECT _FUNC_('AbCD123-@$#', -1, -1, -1, -1); +AbCD123-@$# + """, + since = "3.4.0", + group = "string_funcs") +// scalastyle:on line.size.limit +case class Mask( +input: Expression, +upperChar: Expression, +lowerChar: Expression, +digitChar: Expression, +otherChar: Expression) +extends QuinaryExpression +with ImplicitCastInputTypes +with QueryErrorsBase +with NullIntolerant { + + def this(input: Expression) = +this( + input, + Literal(Mask.MASKED_UPPERCASE), + Literal(Mask.MASKED_LOWERCASE), + Literal(Mask.MASKED_DIGIT), + Literal(Mask.MASKED_IGNORE)) + + def this(input: Expression, upperChar: Expression) = +this( + input, + upperChar, + Literal(Mask.MASKED_LOWERCASE), + Literal(Mask.MASKED_DIGIT), + 
Literal(Mask.MASKED_IGNORE)) + + def this(input: Expression, upperChar: Expression, lowerChar: Expression) = +this(input, upperChar, lowerChar, Literal(Mask.MASKED_DIGIT), Literal(Mask.MASKED_IGNORE)) + + def this( + input: Expression, + upperChar: Expression, + lowerChar: Expression, + digitChar: Expression) = +this(input, upperChar, lowerChar, digitChar, Literal(Mask.MASKED_IGNORE)) + + override def checkInputDataTypes(): TypeCheckResult = { + +def checkInputDataType(exp: Expression, message: String): Option[TypeCheckResult] = { + if (!exp.foldable) { Review Comment: Yes, added
[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1044051062 ## sql/core/src/test/resources/sql-tests/inputs/string-functions.sql: ## @@ -58,6 +58,69 @@ SELECT substring('Spark SQL' from 5); SELECT substring('Spark SQL' from -3); SELECT substring('Spark SQL' from 5 for 1); +-- mask function +SELECT mask('AbCD123-@$#'); Review Comment: Done
[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1040069687 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult.DataTypeMismatch +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.errors.QueryErrorsBase +import org.apache.spark.sql.types.{AbstractDataType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """_FUNC_(input[, upperChar, lowerChar, digitChar, otherChar]) - masks the given string value. + The function replaces characters with 'X' or 'x', and numbers with 'n'. + This can be useful for creating copies of tables with sensitive information removed. + Error behavior: null value as replacement argument will throw AnalysisError. + """, + arguments = """ +Arguments: + * input - string value to mask. 
Supported types: STRING, VARCHAR, CHAR + * upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' Review Comment: Yes, Hive also uses -1 to retain the original character. Ref: [GenericUDFMask.java](https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMask.java#L41)
[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013416152 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.types.{AbstractDataType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = +"""_FUNC_(input[, upperChar, lowerChar, digitChar, otherChar]) - masks the given string value""", Review Comment: Updated the description -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013415700 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.types.{AbstractDataType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = +"""_FUNC_(input[, upperChar, lowerChar, digitChar, otherChar]) - masks the given string value""", + arguments = """ +Arguments: + * input - string value to mask. Supported types: STRING, VARCHAR, CHAR + * upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' + * lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' + * digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' + * otherChar - character to replace all other characters with. 
Specify -1 to retain original character. Default value: -1 + """, + examples = """ +Examples: + > SELECT _FUNC_('abcd-EFGH-8765-4321'); +--- + > SELECT _FUNC_('abcd-EFGH-8765-4321', 'Q'); +--- + > SELECT _FUNC_('AbCD123-@$#', 'Q','q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#'); +XxXXnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q'); +QxQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q','q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q','q', 'd'); +QqQQddd-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q','q', 'd', 'o'); +QqQQddd + > SELECT _FUNC_('AbCD123-@$#', -1, 'q', 'd', 'o'); +AqCDddd + > SELECT _FUNC_('AbCD123-@$#', -1,-1, 'd', 'o'); Review Comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013415091 ## sql/core/src/test/resources/sql-tests/inputs/string-functions.sql: ## @@ -58,6 +60,54 @@ SELECT substring('Spark SQL' from 5); SELECT substring('Spark SQL' from -3); SELECT substring('Spark SQL' from 5 for 1); +-- mask function +SELECT mask('AbCD123-@$#'); +SELECT mask('AbCD123-@$#', 'Q'); +SELECT mask('AbCD123-@$#', 'Q','q'); +SELECT mask('AbCD123-@$#', 'Q','q', 'd'); +SELECT mask('AbCD123-@$#', 'Q','q', 'd', 'o'); +SELECT mask('AbCD123-@$#', -1, 'q', 'd', 'o'); +SELECT mask('AbCD123-@$#', -1,-1, 'd', 'o'); +SELECT mask('AbCD123-@$#', -1,-1, -1, 'o'); +SELECT mask('AbCD123-@$#', -1, -1, -1, -1); +SELECT mask(NULL); +SELECT mask(NULL, -1, 'q', 'd', 'o'); +SELECT mask(NULL, -1,-1, 'd', 'o'); +SELECT mask(NULL, -1,-1, -1, 'o'); +SELECT mask(NULL, -1, -1, -1, -1); +SELECT mask(c1) from values ('AbCD123-@$#') as tab(c1); +SELECT mask(c1, 'Q') from values ('AbCD123-@$#') as tab(c1); +SELECT mask(c1, 'Q','q')from values ('AbCD123-@$#') as tab(c1); +SELECT mask(c1, 'Q','q', 'd') from values ('AbCD123-@$#') as tab(c1); +SELECT mask(c1, 'Q','q', 'd', 'o') from values ('AbCD123-@$#') as tab(c1); +SELECT mask(c1, -1, 'q', 'd', 'o') from values ('AbCD123-@$#') as tab(c1); +SELECT mask(c1, -1,-1, 'd', 'o') from values ('AbCD123-@$#') as tab(c1); +SELECT mask(c1, -1,-1, -1, 'o') from values ('AbCD123-@$#') as tab(c1); +SELECT mask(c1, -1, -1, -1, -1) from values ('AbCD123-@$#') as tab(c1); +SELECT mask('abcd-EFGH-8765-4321'); +SELECT mask('abcd-EFGH-8765-4321', 'Q'); +SELECT mask('abcd-EFGH-8765-4321', 'Q','q'); +SELECT mask('abcd-EFGH-8765-4321', 'Q','q', 'd'); +SELECT mask('abcd-EFGH-8765-4321', 'Q','q', 'd', '*'); +SELECT mask('abcd-EFGH-8765-4321', -1, 'q', 'd', '*'); +SELECT mask('abcd-EFGH-8765-4321', -1,-1, 'd', '*'); +SELECT mask('abcd-EFGH-8765-4321', -1,-1, -1, '*'); +SELECT mask('abcd-EFGH-8765-4321', -1, -1, -1, -1); +SELECT mask(NULL); +SELECT mask(NULL, 
-1, 'q', 'd', '*'); +SELECT mask(NULL, -1,-1, 'd', '*'); +SELECT mask(NULL, -1,-1, -1, '*'); +SELECT mask(NULL, -1, -1, -1, -1); +SELECT mask(c1) from values ('abcd-EFGH-8765-4321') as tab(c1); +SELECT mask(c1, 'Q') from values ('abcd-EFGH-8765-4321') as tab(c1); +SELECT mask(c1, 'Q','q')from values ('abcd-EFGH-8765-4321') as tab(c1); +SELECT mask(c1, 'Q','q', 'd') from values ('abcd-EFGH-8765-4321') as tab(c1); Review Comment: When the arguments are column references, an analysis exception is thrown; only foldable expressions are allowed. `checkInputDataTypes` is now implemented to return an error in that case.
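The foldable-only rule described in this comment can be modelled with a toy expression type. The sketch below does not use Catalyst: `Expr`, `Lit`, `ColumnRef` and `FoldableCheckSketch` are hypothetical stand-ins, and applying the check to the replacement arguments is my reading of the thread. The NULL case follows the doc string's statement that a null replacement argument is an analysis error.

```scala
// Hypothetical stand-ins for Catalyst expressions; not Spark's real classes.
sealed trait Expr { def foldable: Boolean }
final case class Lit(value: Any) extends Expr { val foldable: Boolean = true }
final case class ColumnRef(name: String) extends Expr { val foldable: Boolean = false }

object FoldableCheckSketch {
  // Validate one replacement argument: it must be constant and non-NULL.
  def checkReplacement(name: String, e: Expr): Either[String, Unit] = e match {
    case x if !x.foldable => Left(s"$name must be a foldable (constant) expression")
    case Lit(null)        => Left(s"$name must not be NULL")
    case _                => Right(())
  }

  // Return the first failure, or success when every argument passes.
  def checkAll(args: Seq[(String, Expr)]): Either[String, Unit] =
    args.iterator
      .map { case (name, e) => checkReplacement(name, e) }
      .find(_.isLeft)
      .getOrElse(Right(()))
}
```

In this model, `checkAll(Seq("upperChar" -> Lit("Q")))` succeeds, while a `ColumnRef` or a `Lit(null)` in any replacement position produces a `Left` carrying the error message, which mirrors the idea of reporting the problem at analysis time rather than during execution.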
[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013413707 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.types.{AbstractDataType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = +"""_FUNC_(input[, upperChar, lowerChar, digitChar, otherChar]) - masks the given string value""", + arguments = """ +Arguments: + * input - string value to mask. Supported types: STRING, VARCHAR, CHAR + * upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' + * lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' + * digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' + * otherChar - character to replace all other characters with. 
Specify -1 to retain original character. Default value: -1 + """, + examples = """ +Examples: + > SELECT _FUNC_('abcd-EFGH-8765-4321'); +--- + > SELECT _FUNC_('abcd-EFGH-8765-4321', 'Q'); +--- + > SELECT _FUNC_('AbCD123-@$#', 'Q','q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#'); +XxXXnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q'); +QxQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q','q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q','q', 'd'); +QqQQddd-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q','q', 'd', 'o'); +QqQQddd + > SELECT _FUNC_('AbCD123-@$#', -1, 'q', 'd', 'o'); +AqCDddd + > SELECT _FUNC_('AbCD123-@$#', -1,-1, 'd', 'o'); +AbCDddd + > SELECT _FUNC_('AbCD123-@$#', -1,-1, -1, 'o'); +AbCD123 + > SELECT _FUNC_(NULL, -1,-1, -1, 'o'); Review Comment: Added test cases in which the replacement characters are NULL; these throw an analysis exception.
[GitHub] [spark] vinodkc commented on a diff in pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on code in PR #38146: URL: https://github.com/apache/spark/pull/38146#discussion_r1013412397 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala: ## @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions + +import org.apache.spark.sql.catalyst.expressions.codegen._ +import org.apache.spark.sql.types.{AbstractDataType, DataType, StringType} +import org.apache.spark.unsafe.types.UTF8String + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = +"""_FUNC_(input[, upperChar, lowerChar, digitChar, otherChar]) - masks the given string value""", + arguments = """ +Arguments: + * input - string value to mask. Supported types: STRING, VARCHAR, CHAR + * upperChar - character to replace upper-case characters with. Specify -1 to retain original character. Default value: 'X' + * lowerChar - character to replace lower-case characters with. Specify -1 to retain original character. Default value: 'x' + * digitChar - character to replace digit characters with. Specify -1 to retain original character. Default value: 'n' + * otherChar - character to replace all other characters with. 
Specify -1 to retain original character. Default value: -1 + """, + examples = """ +Examples: + > SELECT _FUNC_('abcd-EFGH-8765-4321'); +--- + > SELECT _FUNC_('abcd-EFGH-8765-4321', 'Q'); +--- + > SELECT _FUNC_('AbCD123-@$#', 'Q','q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#'); +XxXXnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q'); +QxQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q','q'); +QqQQnnn-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q','q', 'd'); +QqQQddd-@$# + > SELECT _FUNC_('AbCD123-@$#', 'Q','q', 'd', 'o'); +QqQQddd + > SELECT _FUNC_('AbCD123-@$#', -1, 'q', 'd', 'o'); +AqCDddd + > SELECT _FUNC_('AbCD123-@$#', -1,-1, 'd', 'o'); +AbCDddd + > SELECT _FUNC_('AbCD123-@$#', -1,-1, -1, 'o'); Review Comment: Done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org