Quanlong Huang has posted comments on this change. (http://gerrit.cloudera.org:8080/14963)

Change subject: IMPALA-9010: Add builtin mask functions
......................................................................


Patch Set 2:

(10 comments)

Thanks for your comments! I've addressed them.

http://gerrit.cloudera.org:8080/#/c/14963/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14963/2//COMMIT_MSG@25
PS2, Line 25: number of characters
> nit: Is it better to use "number of characters to retain" to make it cleare
I'm afraid not. The parameter has different meanings in different functions:
in mask_show_first_n() it's the number of characters to retain, while in
mask_first_n() it's the number of characters to mask. So I think it's better to
leave it as is. The meaning only becomes clear together with the function name.
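To illustrate how the same n flips meaning, here is a minimal C++ sketch of the two string variants. It is not the implementation in mask-functions-ir.cc; MaskChars, MaskChar, MaskFirstN and MaskShowFirstN are hypothetical names. Replacement characters are ints so that -1 can mean "retain the original character", and treating a negative n as 0 is an assumption of the sketch.

```cpp
#include <cctype>
#include <string>

// Hypothetical replacement settings. Each field is an int so that -1 can mean
// "retain the original character".
struct MaskChars {
  int upper;  // replacement for upper-case letters
  int lower;  // replacement for lower-case letters
  int digit;  // replacement for digits in string values (digitChar)
  int other;  // replacement for all other characters
};

// Masks a single character according to its character class.
char MaskChar(char c, const MaskChars& m) {
  const unsigned char uc = static_cast<unsigned char>(c);
  int repl;
  if (std::isupper(uc)) {
    repl = m.upper;
  } else if (std::islower(uc)) {
    repl = m.lower;
  } else if (std::isdigit(uc)) {
    repl = m.digit;
  } else {
    repl = m.other;
  }
  return repl == -1 ? c : static_cast<char>(repl);
}

// mask_first_n-style: n is the number of leading characters to MASK.
std::string MaskFirstN(const std::string& s, int n, const MaskChars& m) {
  std::string out = s;
  const size_t count = n < 0 ? 0 : static_cast<size_t>(n);
  for (size_t i = 0; i < out.size() && i < count; ++i) out[i] = MaskChar(out[i], m);
  return out;
}

// mask_show_first_n-style: n is the number of leading characters to RETAIN.
std::string MaskShowFirstN(const std::string& s, int n, const MaskChars& m) {
  std::string out = s;
  const size_t keep = n < 0 ? 0 : static_cast<size_t>(n);
  for (size_t i = keep; i < out.size(); ++i) out[i] = MaskChar(out[i], m);
  return out;
}
```

With m = {'X', 'x', 'n', -1}, MaskFirstN("abcd-1234", 4, m) gives "xxxx-1234" while MaskShowFirstN("abcd-1234", 4, m) gives "abcd-nnnn": the same n = 4 masks the first four characters in one function and keeps them in the other.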

BTW, these descriptions are copied and merged from Hive's javadoc:
https://github.com/apache/hive/blob/ae008b7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMask.java#L40-L48
https://github.com/apache/hive/blob/ae008b7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskShowFirstN.java#L31-L38
https://github.com/apache/hive/blob/ae008b7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskShowLastN.java#L31-L38
https://github.com/apache/hive/blob/ae008b7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskFirstN.java#L31-L38
https://github.com/apache/hive/blob/ae008b7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskLastN.java#L31-L38
https://github.com/apache/hive/blob/ae008b7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFMaskHash.java#L32


http://gerrit.cloudera.org:8080/#/c/14963/2//COMMIT_MSG@30
PS2, Line 30: digitChar  - character to replace digit characters with. Specify -1
            :                 to retain original character. Default value: 'n'
> After reading the description, I found that the difference between 'digitCh
digitChar is used for string values. numberChar is used for numeric values. E.g.
hive> select mask_show_first_n(cast(12345 as smallint), 3, 'x', 'x', 'x', -1, '5');
12355
hive> select mask_show_first_n("12345", 3, 'x', 'x', 'x', -1, '5');
'123xx'


http://gerrit.cloudera.org:8080/#/c/14963/2//COMMIT_MSG@34
PS2, Line 34: numberChar - character to replace digits in a number with. Valid
            :                 values: 0-9. Default value: '1'
> Sorry I meant to say "Specify -1 to use the default value, i.e., 1" (if my
-1 is an invalid value for numberChar. All invalid values are treated as the
default value 1.
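A minimal C++ sketch of the numeric path may make the digitChar/numberChar split and the fallback rule concrete. It assumes mask_show_first_n on an integer keeps the first n digits and replaces the remaining digits with numberChar (consistent with the 12345 -> 12355 Hive output above), and that any numberChar outside 0-9 falls back to 1. The function names are hypothetical. number_char is taken as a digit value, so converting the SQL argument '5' into 5 is left out, and preserving the sign of negative values is an assumption.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: any numberChar outside 0-9 (e.g. -1, 10, 99) falls back
// to the default digit 1.
int NormalizeNumberChar(int number_char) {
  return (number_char >= 0 && number_char <= 9) ? number_char : 1;
}

// Sketch of mask_show_first_n applied to an integer: keep the first n digits
// and replace every remaining digit with numberChar. Preserving the sign of a
// negative value is an assumption; masked values that would exceed the int64_t
// range (std::stoll throws for those) are not handled.
int64_t MaskShowFirstNNumber(int64_t value, int n, int number_char) {
  const char repl = static_cast<char>('0' + NormalizeNumberChar(number_char));
  std::string digits = std::to_string(value);
  const size_t start = (digits[0] == '-') ? 1 : 0;  // skip the sign, if any
  const size_t keep = n < 0 ? 0 : static_cast<size_t>(n);
  for (size_t i = start + keep; i < digits.size(); ++i) digits[i] = repl;
  return std::stoll(digits);
}
```

Mirroring the Hive example, MaskShowFirstNNumber(12345, 3, 5) returns 12355, while passing -1, 10 or 99 as number_char yields 12311.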


http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/expr-test.cc@10449
PS2, Line 10449:   // Error handling
> What happens when one would mask the day in 2019-02-02 to 30? Could you add
Done


http://gerrit.cloudera.org:8080/#/c/14963/1/be/src/exprs/mask-functions-ir.cc
File be/src/exprs/mask-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/14963/1/be/src/exprs/mask-functions-ir.cc@141
PS1, Line 141:   }
> Awesome, thanks!
Done


http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc
File be/src/exprs/mask-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc@223
PS2, Line 223: !(1 <= day_value && day_value <= 31)
> This considers 31 as a valid day number for eg. February. Shouldn't this be
Good point! Hive rolls the extra days over into the next month:

hive> select mask(cast('2019-02-02' as date), -1, -1, -1, -1, -1, 29, -1, -1);
2019-03-01
hive> select mask(cast('2019-02-02' as date), -1, -1, -1, -1, -1, 31, -1, -1);
2019-03-03

We currently return NULL for these cases. I think this is due to the different
DATE behaviors of Hive and Impala. E.g. cast('2019-02-31' as date) results in
'2019-03-03' in Hive but in an error in Impala. I've updated the description
accordingly.
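The two behaviors can be contrasted in a small C++ sketch. It assumes day_value has already passed the 1..31 range check quoted above and omits the -1 "retain" case; the Date type and the function names are hypothetical, and std::nullopt stands in for SQL NULL. This is an illustration, not the code in this change.

```cpp
#include <optional>

// Hypothetical minimal date type used only for this sketch.
struct Date {
  int year;
  int month;  // 1-12
  int day;    // 1-31
};

bool IsLeapYear(int year) {
  return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

int DaysInMonth(int year, int month) {
  static const int kDays[12] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  return (month == 2 && IsLeapYear(year)) ? 29 : kDays[month - 1];
}

// Impala-like behavior described above: a day that does not exist in the
// target month yields NULL (modeled as std::nullopt).
std::optional<Date> MaskDayStrict(Date d, int day_value) {
  if (day_value > DaysInMonth(d.year, d.month)) return std::nullopt;
  d.day = day_value;
  return d;
}

// Hive-like behavior: surplus days roll over into the next month. Since
// day_value <= 31 and every month has at least 28 days, one step is enough.
// December has 31 days, so the month never wraps into the next year.
Date MaskDayRollOver(Date d, int day_value) {
  const int days_in_month = DaysInMonth(d.year, d.month);
  d.day = day_value;
  if (d.day > days_in_month) {
    d.day -= days_in_month;
    ++d.month;
  }
  return d;
}
```

For 2019-02-02, MaskDayRollOver reproduces the Hive results above (29 -> 2019-03-01, 31 -> 2019-03-03), while MaskDayStrict returns std::nullopt for both; in the leap year 2020, day 29 is accepted by the strict version.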

I also found that Hive treats yearValue as an offset from 1900, so yearValue=0
actually masks the year field to 1900. That doesn't match the description. What's
worse, Hive can't mask the year to 1899, since -1 already means retaining the
original value. So I created HIVE-22711, hoping Hive will change this behavior.
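The year issue can be shown with a tiny C++ sketch contrasting the documented reading of yearValue with the offset-from-1900 reading inferred from the yearValue=0 -> 1900 observation. Both function names are hypothetical and neither is Hive's or Impala's actual code.

```cpp
// -1 means "retain the original field value" for every date component.
constexpr int kRetainOriginal = -1;

// Year masking as the description documents it: yearValue is the year itself.
int MaskYearAsDocumented(int original_year, int year_value) {
  return year_value == kRetainOriginal ? original_year : year_value;
}

// Year masking as observed in Hive (see HIVE-22711): yearValue is an offset
// from 1900, so 0 gives 1900. Producing 1899 would need year_value == -1,
// which already means "retain", so 1899 can never be produced.
int MaskYearHiveObserved(int original_year, int year_value) {
  return year_value == kRetainOriginal ? original_year : 1900 + year_value;
}
```

Under the offset reading, MaskYearHiveObserved(2019, 0) gives 1900, and the only input that would map to 1899 collides with the "retain" sentinel.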


http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc@266
PS2, Line 266: 4
> Wouldn't it be nicer if this constant were defined at the beginning of the
Done


http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions-ir.cc@697
PS2, Line 697:   (void)SHA256(val.ptr, val.len, sha256_hash.ptr);
> nit: Wouldn't using "discard_result" be nicer here?
Done


http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions.h
File be/src/exprs/mask-functions.h:

http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions.h@50
PS2, Line 50: number of characters
> nit: Is it better to use "number of characters to retain" to make it cleare
Ack


http://gerrit.cloudera.org:8080/#/c/14963/2/be/src/exprs/mask-functions.h@59
PS2, Line 59: ///   numberChar - character to replace digits in a number with. Valid values: 0-9.
            : ///                Default value: '1'
> Sorry I meant to say "Specify -1 to use the default value, i.e., 1" (if my
-1 is an invalid value for numberChar. All invalid values (-1, 10, 99, ...) are
treated as the default value 1.



--
To view, visit http://gerrit.cloudera.org:8080/14963
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ica779a1bf63a085d51f3b533f654cbaac102a664
Gerrit-Change-Number: 14963
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Fang-Yu Rao <fangyu....@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Kurt Deschler <kdesc...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <huangquanl...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Jan 2020 09:06:45 +0000
Gerrit-HasComments: Yes
