Hello Fang-Yu Rao, Norbert Luksa, Kurt Deschler, Zoltan Borok-Nagy, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14963

to look at the new patch set (#9).

Change subject: IMPALA-9010: Add builtin mask functions
......................................................................

IMPALA-9010: Add builtin mask functions

There are 6 builtin GenericUDFs for column masking in Hive:
  mask_show_first_n(value, charCount, upperChar, lowerChar, digitChar,
      otherChar, numberChar)
  mask_show_last_n(value, charCount, upperChar, lowerChar, digitChar,
      otherChar, numberChar)
  mask_first_n(value, charCount, upperChar, lowerChar, digitChar,
      otherChar, numberChar)
  mask_last_n(value, charCount, upperChar, lowerChar, digitChar,
      otherChar, numberChar)
  mask_hash(value)
  mask(value, upperChar, lowerChar, digitChar, otherChar, numberChar,
      dayValue, monthValue, yearValue)

Description of the parameters:
   value      - value to mask. Supported types: TINYINT, SMALLINT, INT,
                BIGINT, STRING, VARCHAR, CHAR, DATE (only for mask()).
   charCount  - number of characters. Default value: 4
   upperChar  - character to replace upper-case characters with. Specify
                -1 to retain original character. Default value: 'X'
   lowerChar  - character to replace lower-case characters with. Specify
                -1 to retain original character. Default value: 'x'
   digitChar  - character to replace digit characters with. Specify -1
                to retain original character. Default value: 'n'
   otherChar  - character to replace all other characters with. Specify
                -1 to retain original character. Default value: -1
   numberChar - character to replace digits in a number with. Valid
                values: 0-9. Default value: '1'
   dayValue   - value to replace day field in a date with.
                Specify -1 to retain original value. Valid values: 1-31.
                Default value: 1
   monthValue - value to replace month field in a date with. Specify -1
                to retain original value. Valid values: 0-11. Default
                value: 0
   yearValue  - value to replace year field in a date with. Specify -1
                to retain original value. Default value: 1

In Hive, these functions accept a variable number of arguments of
unrestricted types:
   mask_show_first_n(val)
   mask_show_first_n(val, 8)
   mask_show_first_n(val, 8, 'X', 'x', 'n')
   mask_show_first_n(val, 8, 'x', 'x', 'x', 'x', 2)
   mask_show_first_n(val, 8, 'x', -1, 'x', 'x', '9')
The upperChar, lowerChar, digitChar, otherChar and numberChar
arguments can be of string or numeric types.
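
To illustrate the parameter semantics described above, here is a minimal
sketch of mask_show_first_n for string input. It is not the code in this
patch: MaskChar, MaskShowFirstN and kRetain are hypothetical names, the
sketch works byte-wise on ASCII only, and it leaves out numberChar (which
applies to numeric input).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Sentinel meaning "retain the original character" (the -1 described above).
constexpr int kRetain = -1;

// Hypothetical helper: masks a single character according to its class.
// Each replacement is an ASCII code, or kRetain to keep the character.
inline char MaskChar(char c, int upper, int lower, int digit, int other) {
  const unsigned char uc = static_cast<unsigned char>(c);
  int repl;
  if (std::isupper(uc)) {
    repl = upper;
  } else if (std::islower(uc)) {
    repl = lower;
  } else if (std::isdigit(uc)) {
    repl = digit;
  } else {
    repl = other;
  }
  return repl == kRetain ? c : static_cast<char>(repl);
}

// Sketch of mask_show_first_n: the first char_count characters are kept
// and every later character is masked by class. The defaults mirror the
// parameter description ('X', 'x', 'n', and -1 for otherChar).
inline std::string MaskShowFirstN(const std::string& value, int char_count = 4,
    int upper = 'X', int lower = 'x', int digit = 'n', int other = kRetain) {
  std::string result = value;
  const std::size_t start =
      char_count > 0 ? static_cast<std::size_t>(char_count) : 0;
  for (std::size_t i = start; i < result.size(); ++i) {
    result[i] = MaskChar(result[i], upper, lower, digit, other);
  }
  return result;
}
```

With the defaults, MaskShowFirstN("1234-5678-8765-4321") keeps "1234",
replaces the remaining digits with 'n', and retains the dashes because
otherChar defaults to -1.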

We currently don't have a corresponding framework for GenericUDF
(IMPALA-9271), so we implement these as overloads. However, covering
all possible combinations may require hundreds of overloads, so we
only implement some important ones, including
 - those used by Ranger default masking policies,
 - those with simple arguments that may be useful for users,
 - an overload with all arguments in int type for full functionality.
   Char arguments need to be converted to their ASCII values.
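
As a small illustration of the char-to-ASCII conversion for the all-int
overload, the sketch below computes the int values a caller would pass
in place of the default replacement characters. CharArg is a
hypothetical helper, not something added by this patch, and the checks
assume an ASCII execution character set.

```cpp
// Hypothetical helper (not part of the patch): converts a replacement
// character into the int value expected by an all-int overload.
constexpr int CharArg(char c) { return static_cast<int>(c); }

// Values for the default replacement characters listed above.
static_assert(CharArg('X') == 88, "upperChar default 'X'");
static_assert(CharArg('x') == 120, "lowerChar default 'x'");
static_assert(CharArg('n') == 110, "digitChar default 'n'");
```

Under this reading, a caller would pass 88, 120 and 110 where the string
overloads take 'X', 'x' and 'n', and -1 still means "retain the original
character".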

Tests:
 - Add BE tests in expr-test

Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr-test.cc
A be/src/exprs/mask-functions-ir.cc
A be/src/exprs/mask-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
M common/function-registry/impala_functions.py
7 files changed, 1,605 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/14963/9
--
To view, visit http://gerrit.cloudera.org:8080/14963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664
Gerrit-Change-Number: 14963
Gerrit-PatchSet: 9
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
