Laszlo Gaal has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/18256 )

Change subject: IMPALA-11133: Decode author of a commit with utf8 before 
printing it
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18256/3/bin/compare_branches.py
File bin/compare_branches.py:

http://gerrit.cloudera.org:8080/#/c/18256/3/bin/compare_branches.py@270
PS3, Line 270: msg
I would actually suggest decoding the "msg" field as well: it is free text 
coming from (former) user input, so it can also contain non-ASCII characters, 
e.g. the smart quotes that caused earlier problems and led to earlier patches 
to this line.
Another solution could be to explicitly decode each input commit message field 
in L147 (changing t.strip() to t.decode('utf-8').strip() ), but that would 
require checking the further data flow for the "commit_hash" field. OTOH the 
commit hash is guaranteed to contain only hex digits, so implicit 
ASCII->Unicode and reverse transformations should not cause any problems.
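
A minimal sketch of the second option, decoding every field once at the point 
where the raw log output is split. The names FIELD_SEP and parse_commit_line, 
the tab separator, and the sample record are assumptions made up for 
illustration; they are not taken from bin/compare_branches.py.

```python
# Hypothetical sketch: FIELD_SEP and parse_commit_line are illustrative names,
# not the actual code in bin/compare_branches.py.
FIELD_SEP = b"\t"  # assumed separator between fields in one raw log record


def parse_commit_line(raw_line):
    """Split one raw (bytes) log record and decode every field from UTF-8.

    Decoding all fields at the input boundary means later formatting code
    only ever handles text, never a mix of bytes and text.
    """
    fields = [t.decode("utf-8").strip() for t in raw_line.split(FIELD_SEP)]
    commit_hash, author, msg = fields
    return commit_hash, author, msg


# Sample record with a non-ASCII author name and smart quotes in the message.
raw = u"abc123\t Jos\u00e9 N\u00fa\u00f1ez \tFix \u201csmart quotes\u201d\n".encode("utf-8")
commit_hash, author, msg = parse_commit_line(raw)
```

The commit hash goes through the same decode step here; since it holds only 
hex digits, decoding it as UTF-8 yields the same characters as ASCII would.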



--
To view, visit http://gerrit.cloudera.org:8080/18256
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb03b0937a994db2bf08e4199574d04f7fb99f5d
Gerrit-Change-Number: 18256
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Laszlo Gaal <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 22 Feb 2022 11:16:53 +0000
Gerrit-HasComments: Yes
