dtenedor opened a new pull request, #38065:
URL: https://github.com/apache/spark/pull/38065

   ### What changes were proposed in this pull request?
   
   Add the MASK_CCN and TRY_MASK_CCN SQL functions to redact credit card number (CCN) string values.
   
   Each of these functions takes an input string representing a credit card number and returns a copy in which its digits are masked according to a format string. This can be useful for creating copies of tables with sensitive information removed while retaining the same schema.
   
   Both functions return an error if the format string is invalid.
   
   MASK_CCN returns an error if the input string does not match the format string.
   
   TRY_MASK_CCN instead returns NULL in that case.
   
   The format string may contain only the following characters (case insensitive):
     - Each 'X' matches a digit, which is replaced with 'X' in the result.
     - Each digit '0'-'9' matches any digit, which is left unchanged in the result.
     - Each '-' must match a '-' at the same position in the input string.
   
   No other format characters are allowed. The default format string is XXXX-XXXX-XXXX-XXXX.
   Any leading or trailing whitespace in the input string is stripped before matching.
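The matching rules above can be sketched as a small Python function. This is a hypothetical illustration of the described semantics, not the code in this PR: the names `mask_ccn`, `try_mask_ccn`, `InvalidFormatError`, and `InputMismatchError` are made up for the sketch, and it assumes that a lowercase 'x' behaves like 'X' (producing an uppercase 'X') and that Python's `str.strip()` is an acceptable stand-in for the whitespace trimming.

```python
from typing import Optional

# Hypothetical names for illustration only; these are not part of Spark's API.
DEFAULT_FORMAT = "XXXX-XXXX-XXXX-XXXX"


class InvalidFormatError(ValueError):
    """The format string contains a character other than X, 0-9 or '-'."""


class InputMismatchError(ValueError):
    """The stripped input does not line up with the format string."""


def _is_ascii_digit(ch: str) -> bool:
    return "0" <= ch <= "9"


def _normalize_format(fmt: str) -> str:
    # Format characters are case insensitive, so 'x' is treated as 'X'
    # (assumption: the result always uses an uppercase 'X').
    normalized = fmt.upper()
    for ch in normalized:
        if ch not in ("X", "-") and not _is_ascii_digit(ch):
            raise InvalidFormatError("the format string is invalid")
    return normalized


def mask_ccn(value: str, fmt: str = DEFAULT_FORMAT) -> str:
    normalized = _normalize_format(fmt)  # an invalid format is always an error
    value = value.strip()  # leading/trailing whitespace is ignored
    if len(value) != len(normalized):
        raise InputMismatchError("the input string does not match the format")
    result = []
    for v, f in zip(value, normalized):
        if f == "-":
            # A '-' in the format must match a '-' in the input exactly.
            if v != "-":
                raise InputMismatchError("the input string does not match the format")
            result.append("-")
        elif not _is_ascii_digit(v):
            # Both 'X' and '0'-'9' positions require a digit in the input.
            raise InputMismatchError("the input string does not match the format")
        elif f == "X":
            result.append("X")  # masked digit
        else:
            result.append(v)  # format digit: keep the input digit unchanged
    return "".join(result)


def try_mask_ccn(value: str, fmt: str = DEFAULT_FORMAT) -> Optional[str]:
    # Only a mismatched input becomes None; an invalid format still raises.
    try:
        return mask_ccn(value, fmt)
    except InputMismatchError:
        return None
```

Catching only `InputMismatchError` in `try_mask_ccn` mirrors the described contract: an invalid format is an error for both functions, and only an input that does not match the format becomes NULL (`None` here).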
   
   Examples:
   ```
   > SELECT MASK_CCN(ccn) FROM VALUES ("1234-5678-9876-5432") AS tab(ccn);
     XXXX-XXXX-XXXX-XXXX
   > SELECT MASK_CCN("  1234-5678-9876-5432  ", "XXXX-XXXX-XXXX-1234");
     XXXX-XXXX-XXXX-5432
   > SELECT MASK_CCN("[1234-5678-9876-5432]", "[XXXX-XXXX-XXXX-1234]");
     Error: the format string is invalid
   > SELECT MASK_CCN("1234567898765432");
     Error: the input string does not match the format
   > SELECT MASK_CCN("1234567898765432", "XXXX-XXXX-XXXX-1234");
     Error: the input string does not match the format
   > SELECT TRY_MASK_CCN("1234567898765432");
     NULL
   > SELECT TRY_MASK_CCN("1234567898765432", "XXXX-XXXX-XXXX-1234");
     NULL
   ```
   
   ### Why are the changes needed?
   
   These functions make it easier to redact credit card numbers when processing string values in SQL.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, it adds new SQL functions.
   
   ### How was this patch tested?
   
   This PR adds a new unit test suite.
