Adar Dembo has posted comments on this change.

Change subject: KUDU-1812: redact sensitive partition keys from logs
......................................................................


Patch Set 1:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/client/meta_cache.cc
File src/kudu/client/meta_cache.cc:

PS1, Line 668:   return Substitute("GetTableLocations { table: '$0', attempt: 
$1 }",
             :                     table_->name(), num_attempts());
> Is it ok to drop additional info here even if 'log_row_contents' is set to 
I believe Dan is always dropping the partition key because this is client code, 
where log_row_contents has no effect.


http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/client/meta_cache.h
File src/kudu/client/meta_cache.h:

PS1, Line 324: This string will not be redacted, since table partitions
             :   // are considered metadata.
Nit: put this on a new line, like you did in partition.h.


http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/common/partition.cc
File src/kudu/common/partition.cc:

PS1, Line 628: <redacted>
May be nice to store "<redacted>" somewhere centrally so we can easily change 
the redaction string if necessary (or make it configurable via gflag in the 
future).


http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/common/partition.h
File src/kudu/common/partition.h:

Line 128: class PartitionSchema {
Would be good to include a high level discussion of redaction as it pertains to 
partitions, partition keys, range keys, etc. Not sure if that belongs in this 
class comment or in Partition.


PS1, Line 197:   std::string RangeKeyDebugString(Slice range_key, const Schema& 
schema) const;
             :   std::string RangeKeyDebugString(const KuduPartialRow& key) 
const;
             :   std::string RangeKeyDebugString(const ConstContiguousRow& key) 
const;
These can be private. Also, the warning may not be necessary in that case, or 
if you'd like to keep it, it might be worth adding a similar warning on other 
private stringifying methods here that also rely on their callers to handle 
redaction.


PS1, Line 302:   /// Returns the stringified hash and range schema componenets 
of the partition
             :   /// schema.
Nit: the triple-slash notation here was probably a mistake; could you fix it?


http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/common/partition_pruner.cc
File src/kudu/common/partition_pruner.cc:

PS1, Line 419: number of partition ranges being scanned
Why is this sensitive?


http://gerrit.cloudera.org:8080/#/c/5548/1/src/kudu/util/logging.cc
File src/kudu/util/logging.cc:

Line 65: DEFINE_bool(log_row_contents, false,
I'd prefer if the gflag name were more generic, on two fronts:
1) It affects redaction of Status messages propagated back to clients.
2) In the future, it affects redaction of other sensitive data, not just row 
contents.

So, perhaps something like "redact_sensitive_data" or "redact_user_data"?


-- 
To view, visit http://gerrit.cloudera.org:8080/5548
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3eba5290145048d77d551b989865c05d928a9b10
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Dan Burkert <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Alexey Serbin <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to