Adar Dembo has posted comments on this change. Change subject: KUDU-1812: redact sensitive partition keys from logs ......................................................................
Patch Set 1: (8 comments) http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/client/meta_cache.cc File src/kudu/client/meta_cache.cc: PS1, Line 668: return Substitute("GetTableLocations { table: '$0', attempt: $1 }", : table_->name(), num_attempts()); > Is it ok to drop additional info here even if 'log_row_contents' is set to I believe Dan is always dropping the partition key because this is client code, where log_row_contents has no effect. http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/client/meta_cache.h File src/kudu/client/meta_cache.h: PS1, Line 324: This string will not be redacted, since table partitions : // are considered metadata. Nit: put this on a new line, like you did in partition.h. http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/common/partition.cc File src/kudu/common/partition.cc: PS1, Line 628: <redacted> May be nice to store "<redacted>" somewhere centrally so we can easily change the redaction string if necessary (or make it configurable via gflag in the future). http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/common/partition.h File src/kudu/common/partition.h: Line 128: class PartitionSchema { Would be good to include a high level discussion of redaction as it pertains to partitions, partition keys, range keys, etc. Not sure if that belongs in this class comment or in Partition. PS1, Line 197: std::string RangeKeyDebugString(Slice range_key, const Schema& schema) const; : std::string RangeKeyDebugString(const KuduPartialRow& key) const; : std::string RangeKeyDebugString(const ConstContiguousRow& key) const; These can be private. Also, the warning may not be necessary in that case, or if you'd like to keep it, it might be worth adding a similar warning on other private stringifying methods here that also rely on their callers to handle redaction. PS1, Line 302: /// Returns the stringified hash and range schema componenets of the partition : /// schema. Nit: the triple-slash notation here was probably a mistake; could you fix it? http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/common/partition_pruner.cc File src/kudu/common/partition_pruner.cc: PS1, Line 419: number of partition ranges being scanned Why is this sensitive? http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/util/logging.cc File src/kudu/util/logging.cc: Line 65: DEFINE_bool(log_row_contents, false, I'd prefer if the gflag name were more generic, on two fronts: 1) It affects redaction of Status messages propagated back to clients. 2) In the future, it affects redaction of other sensitive data, not just row contents. So, perhaps something like "redact_sensitive_data" or "redact_user_data"? -- To view, visit http://gerrit.cloudera.org:8080/5548 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3eba5290145048d77d551b989865c05d928a9b10 Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Dan Burkert <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Alexey Serbin <[email protected]> Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Tidy Bot Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
