[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2023-01-10 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1377734764

   Thanks for the suggestions. I'll raise a new PR to change -1 to NULL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2023-01-10 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1377706874

   @srielau, thanks for checking this. There is no specific reason other than
following the same approach as the Hive documentation:
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMask.java#L41
   If it still needs to change to NULL, I can raise a new PR. @gengliangwang,
@dtenedor, please share your recommendations too.





[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-09 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1344455185

   @gengliangwang, the review comments are addressed.





[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-12-05 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1337857038

   @HyukjinKwon, @dtenedor, can you please check this PR?





[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-11-07 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1306022238

   Hi @dtenedor, @HyukjinKwon, @gengliangwang,
   The review comments are resolved. Can you please check?





[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-10-27 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1294245157

   @dtenedor, yes, please close yours as a dup. I appreciate your help in
reviewing this PR. On top of this change, I'm planning to add the additional
built-in mask functions supported in Hive.





[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in function 'mask'

2022-10-25 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1291114363

   > Reference snowflake: 
https://docs.snowflake.com/en/user-guide/security-column-ddm-use.html
   
   @melin 
   Thanks for the Snowflake reference, which describes dynamic data masking
based on users/roles. Since Spark does not support access control/roles, we
cannot implement dynamic data masking based on users/roles.
   
   This PR implements the same data masking functionality as Hive
   - mask(string str[, string upper[, string lower[, string number]]])
   
   https://issues.apache.org/jira/browse/SPARK-40686
   - mask_first_n(string str[, int n])
   - mask_last_n(string str[, int n])
   - mask_show_first_n(string str[, int n])
   - mask_show_last_n(string str[, int n])
   Ref : 
https://cwiki.apache.org/confluence/display/hive/languagemanual+udf#LanguageManualUDF-DataMaskingFunctions
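
   To make the four positional signatures above concrete, here is a minimal
Python sketch. It is not the Hive or Spark code. It assumes upper-case letters
become 'X', lower-case letters become 'x', digits become 'n', and other
characters are kept, as the Hive manual describes. The default n=4 and the
helper names `_mask_char` and `_mask_span` are assumptions made for this
illustration.

```python
def _mask_char(ch):
    """Replace one character using the assumed X / x / n convention."""
    if ch.isupper():
        return "X"
    if ch.islower():
        return "x"
    if ch.isdigit():
        return "n"
    return ch  # other characters are left unchanged


def _mask_span(s, start, end):
    """Mask s[start:end] and leave the rest; bounds are clamped to the string."""
    start = max(0, min(start, len(s)))
    end = max(start, min(end, len(s)))
    masked = "".join(_mask_char(c) for c in s[start:end])
    return s[:start] + masked + s[end:]


def mask_first_n(s, n=4):       # n=4 default is an assumption
    return _mask_span(s, 0, n)


def mask_last_n(s, n=4):
    return _mask_span(s, len(s) - n, len(s))


def mask_show_first_n(s, n=4):  # show the first n, mask everything after
    return _mask_span(s, n, len(s))


def mask_show_last_n(s, n=4):   # show the last n, mask everything before
    return _mask_span(s, 0, len(s) - n)
```

   With the input "1234-5678-8765-4321" and n=4, `mask_first_n` gives
"nnnn-5678-8765-4321" and `mask_show_last_n` gives "nnnn-nnnn-nnnn-4321".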
   
   





[GitHub] [spark] vinodkc commented on pull request #38146: [SPARK-40687][SQL] Support data masking built-in Function 'mask'

2022-10-07 Thread GitBox


vinodkc commented on PR #38146:
URL: https://github.com/apache/spark/pull/38146#issuecomment-1272016984

   @HyukjinKwon, this PR takes a generic approach: it masks the string based on
the arguments. The mask function can be applied to any string value and does
not expect a pattern in the input string. The Apache Hive **mask** function has
the same logic.
   The arguments are:
   * input - string value to mask. Supported types: STRING, VARCHAR, CHAR
   * upperChar - character to replace upper-case characters with.
     Specify -1 to retain the original character. Default value: 'X'
   * lowerChar - character to replace lower-case characters with.
     Specify -1 to retain the original character. Default value: 'x'
   * digitChar - character to replace digit characters with.
     Specify -1 to retain the original character. Default value: 'n'
   * otherChar - character to replace all other characters with.
     Specify -1 to retain the original character. Default value: -1
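
   The argument semantics listed above can be sketched in a few lines of
Python. This is an illustration of the described behavior only, not the Spark
or Hive implementation. The names `RETAIN` and `pick` are made up for the
example, and character classification uses Python's `str` methods, which may
differ from the real implementations for non-ASCII input.

```python
RETAIN = -1  # sentinel from the description: keep the original character


def mask(s, upper_char="X", lower_char="x", digit_char="n", other_char=RETAIN):
    """Mask each character of s according to its class (illustrative sketch)."""

    def pick(replacement, original):
        # A replacement of -1 means "retain the original character".
        return original if replacement == RETAIN else replacement

    out = []
    for ch in s:
        if ch.isupper():
            out.append(pick(upper_char, ch))
        elif ch.islower():
            out.append(pick(lower_char, ch))
        elif ch.isdigit():
            out.append(pick(digit_char, ch))
        else:
            out.append(pick(other_char, ch))
    return "".join(out)
```

   With the defaults, `mask("AbCD123-@$#")` returns "XxXXnnn-@$#": letters and
digits are replaced, and the other characters are retained because otherChar
defaults to -1. Passing -1 for the first three classes and "*" for otherChar
turns "abc-123" into "abc*123".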
   

