[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1377734764 Thanks for the suggestions. I'll raise a new PR to change -1 to NULL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1377706874 @srielau, thanks for checking this. There is no specific reason other than following the same approach as the Hive documentation: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMask.java#L41 If it still needs to change to NULL, I can raise a new PR. @gengliangwang @dtenedor, please share your recommendations too.
[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1344455185 @gengliangwang, Review comments are addressed.
[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1337857038 @HyukjinKwon, @dtenedor, can you please check this PR?
[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1306022238 Hi @dtenedor, @HyukjinKwon, @gengliangwang, the review comments are resolved. Can you please check?
[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1294245157 @dtenedor, yes, please close yours as a dup. I appreciate your help in reviewing this PR. On top of this change, I'm planning to add the additional built-in mask functions supported in Hive.
[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1291114363

> Reference snowflake: https://docs.snowflake.com/en/user-guide/security-column-ddm-use.html

@melin Thanks for the reference from Snowflake, which describes dynamic data masking based on users/roles. Since Spark does not support access control/roles, we cannot implement dynamic data masking based on users/roles. This PR implements the same data masking functionality as Hive:

- mask(string str[, string upper[, string lower[, string number]]]) https://issues.apache.org/jira/browse/SPARK-40686
- mask_first_n(string str[, int n])
- mask_last_n(string str[, int n])
- mask_show_first_n(string str[, int n])
- mask_show_last_n(string str[, int n])

Ref: https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-DataMaskingFunctions
[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in Function 'mask'
vinodkc commented on PR #38146: URL: https://github.com/apache/spark/pull/38146#issuecomment-1272016984 @HyukjinKwon, this PR is a generic approach to masking a string based on the arguments. The mask function can be applied to any string value and does not expect a pattern in the input string. Apache Hive's **mask** function has the same logic.

E.g., arguments:

* input - string value to mask. Supported types: STRING, VARCHAR, CHAR
* upperChar - character to replace upper-case characters with. Specify -1 to retain the original character. Default value: 'X'
* lowerChar - character to replace lower-case characters with. Specify -1 to retain the original character. Default value: 'x'
* digitChar - character to replace digit characters with. Specify -1 to retain the original character. Default value: 'n'
* otherChar - character to replace all other characters with. Specify -1 to retain the original character. Default value: -1
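
The argument list above can be read as a small per-character mapping. The following is a minimal pure-Python sketch of that mapping, not the Spark or Hive implementation. The names `mask` and `RETAIN`, and the use of Python's `str.isupper`, `str.islower`, and `str.isdigit` to classify characters, are assumptions made for illustration only.

```python
# Hypothetical sentinel mirroring the "-1" described above:
# it means "keep the original character".
RETAIN = -1


def mask(value, upper_char="X", lower_char="x", digit_char="n", other_char=RETAIN):
    """Illustrative sketch: replace each character according to its class."""

    def replace(ch, replacement):
        # Keep the original character when the replacement is the sentinel.
        return ch if replacement == RETAIN else replacement

    out = []
    for ch in value:
        if ch.isupper():
            out.append(replace(ch, upper_char))
        elif ch.islower():
            out.append(replace(ch, lower_char))
        elif ch.isdigit():
            out.append(replace(ch, digit_char))
        else:
            # Anything that is not a letter or digit falls under "other".
            out.append(replace(ch, other_char))
    return "".join(out)
```

With the defaults, `mask("AbCD123-@$#")` gives `XxXXnnn-@$#` in this sketch: letters and digits are replaced, and the remaining characters are retained because `other_char` defaults to the sentinel. Passing `RETAIN` for any class leaves that class untouched, for example `mask("AbCD123", "Q", RETAIN, "d")` gives `QbQQddd`.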