Hello Fang-Yu Rao, Norbert Luksa, Kurt Deschler, Zoltan Borok-Nagy, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14963

to look at the new patch set (#12).

Change subject: IMPALA-9010: Add builtin mask functions
......................................................................

IMPALA-9010: Add builtin mask functions

There're 6 builtin GenericUDFs for column masking in Hive:
  mask_show_first_n(value, charCount, upperChar, lowerChar, digitChar,
      otherChar, numberChar)
  mask_show_last_n(value, charCount, upperChar, lowerChar, digitChar,
      otherChar, numberChar)
  mask_first_n(value, charCount, upperChar, lowerChar, digitChar,
      otherChar, numberChar)
  mask_last_n(value, charCount, upperChar, lowerChar, digitChar,
      otherChar, numberChar)
  mask_hash(value)
  mask(value, upperChar, lowerChar, digitChar, otherChar, numberChar,
      dayValue, monthValue, yearValue)

Description of the parameters:
   value      - value to mask. Supported types: TINYINT, SMALLINT, INT,
                BIGINT, STRING, VARCHAR, CHAR, DATE(only for mask()).
   charCount  - number of characters. Default value: 4
   upperChar  - character to replace upper-case characters with. Specify
                -1 to retain original character. Default value: 'X'
   lowerChar  - character to replace lower-case characters with. Specify
                -1 to retain original character. Default value: 'x'
   digitChar  - character to replace digit characters with. Specify -1
                to retain original character. Default value: 'n'
   otherChar  - character to replace all other characters with. Specify
                -1 to retain original character. Default value: -1
   numberChar - character to replace digits in a number with. Valid
                values: 0-9. Default value: '1'
   dayValue   - value to replace day field in a date with.
                Specify -1 to retain original value. Valid values: 1-31.
                Default value: 1
   monthValue - value to replace month field in a date with. Specify -1
                to retain original value. Valid values: 0-11. Default
                value: 0
   yearValue  - value to replace year field in a date with. Specify -1
                to retain original value. Default value: 1

In Hive, these functions accept variable length of arguments in
non-restricted types:
   mask_show_first_n(val)
   mask_show_first_n(val, 8)
   mask_show_first_n(val, 8, 'X', 'x', 'n')
   mask_show_first_n(val, 8, 'x', 'x', 'x', 'x', 2)
   mask_show_first_n(val, 8, 'x', -1, 'x', 'x', '9')
The arguments of upperChar, lowerChar, digitChar, otherChar and
numberChar can be in string or numeric types.

Impala doesn't support Hive GenericUDFs, so we are lack of these mask
functions to support Ranger column masking policies. On the other hand,
we want the masking functions to be evaluated in the C++ builtin logic
rather than calling out to java UDFs for performance. This patch
introduces our builtin implementation of them.

We currently don't have a corresponding framework for GenericUDF
(IMPALA-9271), so we implement these by overloads. However, it may
requires hundreds of overloads to cover all possible combinations. We
just implement some important overloads, including
 - those used by Ranger default masking policies,
 - those with simple arguments and may be useful for users,
 - an overload with all arguments in int type for full functionality.
   Char argument need to be converted to their ASCII value.

Tests:
 - Add BE tests in expr-test

Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664
---
M be/src/codegen/impala-ir.cc
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr-test.cc
A be/src/exprs/mask-functions-ir.cc
A be/src/exprs/mask-functions.h
M be/src/exprs/scalar-expr-evaluator.cc
M common/function-registry/impala_functions.py
7 files changed, 1,590 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/63/14963/12
--
To view, visit http://gerrit.cloudera.org:8080/14963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664
Gerrit-Change-Number: 14963
Gerrit-PatchSet: 12
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>

Reply via email to