Hello Fang-Yu Rao, Norbert Luksa, Kurt Deschler, Zoltan Borok-Nagy, Impala
Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/14963
to look at the new patch set (#2).
Change subject: IMPALA-9010: Add builtin mask functions
......................................................................
IMPALA-9010: Add builtin mask functions
There are 6 builtin GenericUDFs for column masking in Hive:
mask_show_first_n(value, charCount, upperChar, lowerChar, digitChar,
otherChar, numberChar)
mask_show_last_n(value, charCount, upperChar, lowerChar, digitChar,
otherChar, numberChar)
mask_first_n(value, charCount, upperChar, lowerChar, digitChar,
otherChar, numberChar)
mask_last_n(value, charCount, upperChar, lowerChar, digitChar,
otherChar, numberChar)
mask_hash(value)
mask(value, upperChar, lowerChar, digitChar, otherChar, numberChar,
dayValue, monthValue, yearValue)
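The four positional variants differ only in which part of the value they mask. mask_first_n/mask_last_n mask the first/last charCount characters. mask_show_first_n/mask_show_last_n keep those characters and mask the rest. Below is a minimal C++ sketch of that selection. It assumes charCount is clamped to the value's length and that negative counts act as 0. MaskKind and MaskedRange are hypothetical names, not the ones used in mask-functions-ir.cc:

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical tag for the four positional masking variants.
enum class MaskKind { kFirstN, kLastN, kShowFirstN, kShowLastN };

// Returns the half-open index range [begin, end) of a value of length `len`
// whose characters get masked. Assumption: char_count is clamped to [0, len].
std::pair<std::size_t, std::size_t> MaskedRange(MaskKind kind, std::size_t len,
                                                int char_count) {
  std::size_t n = char_count < 0
      ? 0
      : std::min<std::size_t>(len, static_cast<std::size_t>(char_count));
  switch (kind) {
    case MaskKind::kFirstN:     return {0, n};          // mask the first n
    case MaskKind::kLastN:      return {len - n, len};  // mask the last n
    case MaskKind::kShowFirstN: return {n, len};        // keep the first n
    case MaskKind::kShowLastN:  return {0, len - n};    // keep the last n
  }
  return {0, 0};  // unreachable for valid kinds
}
```

With the default charCount of 4 and a 10-character value, mask_show_last_n masks indices 0 to 5 and leaves the last 4 characters visible.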
Description of the parameters:
value - value to mask. Supported types: TINYINT, SMALLINT, INT,
BIGINT, STRING, VARCHAR, CHAR, DATE (only for mask()).
charCount - number of characters to mask (mask_first_n, mask_last_n)
or to show (mask_show_first_n, mask_show_last_n). Default value: 4
upperChar - character to replace upper-case characters with. Specify
-1 to retain original character. Default value: 'X'
lowerChar - character to replace lower-case characters with. Specify
-1 to retain original character. Default value: 'x'
digitChar - character to replace digit characters with. Specify -1
to retain original character. Default value: 'n'
otherChar - character to replace all other characters with. Specify
-1 to retain original character. Default value: -1
numberChar - character to replace digits in a number with. Valid
values: 0-9. Default value: '1'
dayValue - value to replace day field in a date with.
Specify -1 to retain original value. Valid values: 1-31.
Default value: 1
monthValue - value to replace month field in a date with. Specify -1
to retain original value. Valid values: 0-11. Default
value: 0
yearValue - value to replace year field in a date with. Specify -1
to retain original value. Default value: 1
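Below is a minimal sketch of the per-character rule for string values. It assumes ASCII character classes from <cctype> and uses -1 as the "retain" sentinel described above. MaskChars, MaskChar and MaskRange are hypothetical names, not taken from the patch. Number and date masking are not covered:

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sentinel meaning "retain the original character".
constexpr int kRetain = -1;

// Replacement codes (ASCII values, or kRetain). Defaults follow the parameter
// descriptions: 'X', 'x', 'n', and -1 (retain) for all other characters.
struct MaskChars {
  int upper = 'X';
  int lower = 'x';
  int digit = 'n';
  int other = kRetain;
};

// Masks one character according to its (ASCII) class.
char MaskChar(char c, const MaskChars& m) {
  unsigned char uc = static_cast<unsigned char>(c);
  int code;
  if (std::isupper(uc)) code = m.upper;
  else if (std::islower(uc)) code = m.lower;
  else if (std::isdigit(uc)) code = m.digit;
  else code = m.other;
  return code == kRetain ? c : static_cast<char>(code);
}

// Masks the characters of `s` in [begin, end); all other positions are kept.
std::string MaskRange(const std::string& s, std::size_t begin, std::size_t end,
                      const MaskChars& m) {
  std::string out = s;
  for (std::size_t i = begin; i < end && i < out.size(); ++i) {
    out[i] = MaskChar(out[i], m);
  }
  return out;
}
```

For example, MaskRange("Abc-123", 0, 7, MaskChars{}) yields "Xxx-nnn". That matches what mask("Abc-123") returns with default arguments: '-' is an "other" character and is kept.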
In Hive, these functions accept a variable number of arguments whose
types are not restricted, e.g.:
mask_show_first_n(val)
mask_show_first_n(val, 8)
mask_show_first_n(val, 8, 'X', 'x', 'n')
mask_show_first_n(val, 8, 'x', 'x', 'x', 'x', -1)
mask_show_first_n(val, 8, 'x', -1, 'x', 'x', '9')
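Shorter call forms like the ones above behave as if the omitted trailing arguments took their default values. The sketch below shows that expansion. It assumes the char arguments have already been turned into int codes. FillDefaults and kDefaults are hypothetical names. Storing numberChar as the digit 1 rather than as its ASCII code is also an assumption:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Defaults of the trailing parameters, in positional order: charCount,
// upperChar, lowerChar, digitChar, otherChar, numberChar. Char defaults are
// ASCII codes; numberChar is assumed to be the digit value 1.
constexpr std::array<int, 6> kDefaults = {4, 'X', 'x', 'n', -1, 1};

// Hypothetical helper: overrides the defaults with the arguments the caller
// actually passed, so one full-arity implementation serves every call form.
std::array<int, 6> FillDefaults(const std::vector<int>& given) {
  std::array<int, 6> full = kDefaults;
  for (std::size_t i = 0; i < given.size() && i < full.size(); ++i) {
    full[i] = given[i];
  }
  return full;
}
```

Under these assumptions, mask_show_first_n(val, 8) and mask_show_first_n(val, 8, 'X', 'x', 'n') resolve to the same full argument list.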
The arguments of upperChar, lowerChar, digitChar, otherChar and
numberChar can be in string or numeric types.
We currently don't have a corresponding framework for GenericUDFs
(IMPALA-9271), so we implement these functions as overloads. However,
covering every possible combination may require hundreds of overloads,
so we only implement the important ones:
- those used by Ranger default masking policies,
- those with simple arguments that may be useful to users,
- one overload with all arguments of int type, for full functionality.
When calling the all-int overload, char arguments need to be converted
to their ASCII values, e.g. 'X' becomes 88.
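An all-int signature relies on converting char arguments to int codes. Below is a sketch of that conversion. It assumes a string argument contributes its first character and an integer passes through unchanged, so -1 still means "retain". The empty-string fallback to a default is also an assumption. CharArg and ToCharCode are hypothetical names:

```cpp
#include <cstdint>
#include <string>
#include <variant>

// A char argument as it may appear in a query: a string literal such as 'X',
// or an integer (-1 to retain, otherwise an ASCII code). Hypothetical type.
using CharArg = std::variant<std::string, std::int64_t>;

// Converts a char argument to the int code an all-int overload expects.
// Assumption: a string contributes its first character; an empty string
// falls back to `default_code`; integers are passed through unchanged.
int ToCharCode(const CharArg& arg, int default_code) {
  if (const auto* s = std::get_if<std::string>(&arg)) {
    return s->empty() ? default_code : static_cast<unsigned char>((*s)[0]);
  }
  return static_cast<int>(std::get<std::int64_t>(arg));
}
```

Under these assumptions, the arguments 'x', -1, 'x', 'x' in mask_show_first_n(val, 8, 'x', -1, 'x', 'x', '9') become 120, -1, 120, 120.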
Tests:
- Add BE tests in expr-test
Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr-test.cc
A be/src/exprs/mask-functions-ir.cc
A be/src/exprs/mask-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
M common/function-registry/impala_functions.py
7 files changed, 1,583 insertions(+), 0 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/14963/2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664
Gerrit-Change-Number: 14963
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>