Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20447 )

Change subject: IMPALA-12418: Include crypto functions supported by Hive
......................................................................


Patch Set 3:

(43 comments)

http://gerrit.cloudera.org:8080/#/c/20447/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20447/3//COMMIT_MSG@9
PS3, Line 9: AES (Advanced Encryption Standard) is a widely recognized
This paragraph is a bit like an advert for AES :)
Instead, we should write what new Impala functions are added by this commit, 
how they can be used etc, but of course a short description of AES should also 
be included.


http://gerrit.cloudera.org:8080/#/c/20447/3//COMMIT_MSG@19
PS3, Line 19: ECB
Could describe in short what ECB is.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc
File be/src/exprs/string-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@40
PS3, Line 40: #include "util/openssl-util.h"
            : #include "udf/udf.h"
The includes should be in alphabetical order.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@43
PS3, Line 43: #include <string>
            : #include <openssl/bio.h>
            : #include <openssl/evp.h>
            : #include <cstring>
Put these together with the other std library includes (L20), in alphabetical 
order.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@182
PS3, Line 182:
Unnecessary spaces.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@551
PS3, Line 551:
Unnecessary spaces.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@854
PS3, Line 854:
Unnecessary spaces.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1076
PS3, Line 1076:   
Unnecessary spaces.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1112
PS3, Line 1112:
Unnecessary spaces.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1344
PS3, Line 1344:
Unnecessary spaces.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1624
PS3, Line 1624:
Unnecessary spaces.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1625
PS3, Line 1625:
Unnecessary spaces.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1806
PS3, Line 1806: unsigned char*
Could be const.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1806
PS3, Line 1806: ctx1
Why call it 'ctx1'? It could simply be 'ctx'.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1806
PS3, Line 1806: encrypt_ecb
Add a doc comment describing what this function does, especially what 'ecb' is, 
and what the function returns.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1807
PS3, Line 1807: ,
Nit: add space after the comma.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1807
PS3, Line 1807: unsigned char*
Could be const.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1808
PS3, Line 1808: int cipher_len = 0;
              :   int len = 0;
No need to declare these here. 'len' could be done just before L1823, and 
'cipher_len' could be defined on L1828 as "int cipher_len = len;"


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1811
PS3, Line 1811: ctx
This could be 'cipher_ctx'.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1811
PS3, Line 1811: EVP_CIPHER_CTX_new
We should check whether we can reuse the EVP_CIPHER_CTX between calls to the 
UDF, in the FunctionContext. I haven't checked OpenSSL yet, I don't know if it 
supports multiple initialisations.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1814
PS3, Line 1814: \n
Do we need the newline at the end of error messages? Applies to all other error 
messages too.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1818
PS3, Line 1818: NULL
Use 'nullptr' instead of 'NULL'.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1818
PS3, Line 1818: NULL
Use 'nullptr' instead of 'NULL'.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1836
PS3, Line 1836:   EVP_CIPHER_CTX_free(ctx);
We have some early returns in the function, in which case this "free" function 
is not called.
If the EVP_CIPHER_CTX cannot be reused (see L1811), we should either call this 
free function in before each return statement OR wrap 'cipher_ctx' in an object 
that calls this function in its destructor.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1841
PS3, Line 1841: unsigned char*
Could be const.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1841
PS3, Line 1841: ctx1
See L1806.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1842
PS3, Line 1842: *
We usually put the '*' on the type, not the variable name.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1842
PS3, Line 1842: unsigned char*
Could be const.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1843
PS3, Line 1843: int cipher_len = 0;
              :   int len = 0;
See L1808.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1853
PS3, Line 1853: NULL
Use 'nullptr' instead of 'NULL'.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1853
PS3, Line 1853: NULL
Use 'nullptr' instead of 'NULL'.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1871
PS3, Line 1871:   EVP_CIPHER_CTX_free(ctx);
See L1836.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1884
PS3, Line 1884: key_len
I don't think extracting 'key.len' into a variable is useful. Could simply use 
'key.len'.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1885
PS3, Line 1885: input_len
See L1884 about 'key_len'.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1886
PS3, Line 1886: %
Surround '%' with spaces. A comment should explain why we have this length.
If you'd like to round it up to the next 16 bytes, this won't work well because 
if 'input_len' is already a multiple of 16, it will add 16 extra bytes. Or is 
it intentional?


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1887
PS3, Line 1887: input_vec
No need for this vector, could simply use the memory of 'input', i.e. 
"input.ptr"


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1889
PS3, Line 1889: key_vec
No need for this vector, could simply use the memory of 'key', i.e. "key.ptr"


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1904
PS3, Line 1904: NULL
Use 'nullptr' instead of 'NULL'.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1912
PS3, Line 1912: ret_size != output_len
We should add a DCHECK for the specific case where "ret_size > output_len" 
because it is a buffer overflow, which is undefined behaviour. The DCHECK 
should be in addition to the normal check.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1916
PS3, Line 1916:   StringVal result = StringVal::CopyFrom(ctx, output_vec, 
output_len);
You could allocate the result directly as a StringVal with a length of 
'output_len' and use its buffer in the call to encrypt_ecb(). It would save us 
a copy.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1920
PS3, Line 1920: StringVal StringFunctions::aes_decrypt(FunctionContext* ctx,
See comments for aes_encrypt() above, they apply also here.


http://gerrit.cloudera.org:8080/#/c/20447/3/be/src/exprs/string-functions-ir.cc@1954
PS3, Line 1954:   StringVal result = StringVal::CopyFrom(ctx, output_vec, 
ret_size);
In the case of decryption, this copy may be worth keeping because the result 
may be shorter than 'output_len'.


http://gerrit.cloudera.org:8080/#/c/20447/3/bin/cmake_aux/create_py3_virtualenv.sh
File bin/cmake_aux/create_py3_virtualenv.sh:

http://gerrit.cloudera.org:8080/#/c/20447/3/bin/cmake_aux/create_py3_virtualenv.sh@48
PS3, Line 48:   # Remove the directory that Python3 venv created, so 
impala-virtualenv can start
Did this come from a rebase?



--
To view, visit http://gerrit.cloudera.org:8080/20447
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3902f2b1d95da4d06995cbd687e79c48e16190c9
Gerrit-Change-Number: 20447
Gerrit-PatchSet: 3
Gerrit-Owner: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Anonymous Coward <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Comment-Date: Tue, 14 Nov 2023 10:05:26 +0000
Gerrit-HasComments: Yes

Reply via email to