Laszlo Gaal has posted comments on this change. ( http://gerrit.cloudera.org:8080/18256 )
Change subject: IMPALA-11133: Decode author of a commit with utf8 before printing it
......................................................................

Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18256/3/bin/compare_branches.py
File bin/compare_branches.py:

http://gerrit.cloudera.org:8080/#/c/18256/3/bin/compare_branches.py@270
PS3, Line 270: msg
I would actually suggest decoding the "msg" field as well: it is free text coming from (former) user input, so it can also contain non-ASCII characters, e.g. the smart quotes that caused earlier problems and led to earlier patches to this line.

Another solution could be to explicitly decode each input commit field in L147 (changing t.strip() to t.decode('utf-8').strip() ), but that would require checking the further data flow for the "commit_hash" field. OTOH the commit hash is guaranteed to contain only hex digits, so implicit ASCII->Unicode and reverse transformations should not cause any problems.

--
To view, visit http://gerrit.cloudera.org:8080/18256

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieb03b0937a994db2bf08e4199574d04f7fb99f5d
Gerrit-Change-Number: 18256
Gerrit-PatchSet: 3
Gerrit-Owner: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Fang-Yu Rao <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Laszlo Gaal <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Tue, 22 Feb 2022 11:16:53 +0000
Gerrit-HasComments: Yes
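
For illustration only, the alternative raised in the comment (decoding every field once, at the point where the raw record is split) could look roughly like the sketch below. This is not the code in bin/compare_branches.py: the function name parse_commit_record, the tab separator, and the three-field layout are assumptions made up for the example, and the input is treated as UTF-8 encoded bytes.

```python
# Hypothetical sketch; names and record layout are assumptions,
# not taken from bin/compare_branches.py.
FIELD_SEP = b"\t"  # assumed separator, for illustration only


def parse_commit_record(raw):
    """Split a raw byte record into (commit_hash, author, msg) text fields.

    Every field is decoded as UTF-8 up front, so later code only ever
    handles text and never mixes byte strings with Unicode strings.
    """
    fields = [t.decode("utf-8").strip() for t in raw.split(FIELD_SEP)]
    if len(fields) != 3:
        raise ValueError("expected 3 fields, got %d" % len(fields))
    return tuple(fields)


# Author with an accented letter, message with smart quotes.
raw = b"1a2b3c4d\t J\xc3\xb3zsef \tFix \xe2\x80\x9csmart quotes\xe2\x80\x9d in output\n"
commit_hash, author, msg = parse_commit_record(raw)
```

Decoding the hash along with the other fields is harmless here because it contains only hex digits, which are plain ASCII; whether that holds for every later use of the hash in the real script would still need to be checked, as the comment notes.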
