Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/15914 )

Change subject: IMPALA-9435: Usability enhancements for data cache access trace
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc
File be/src/runtime/io/data-cache-trace-replayer.cc:

http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc@35
PS4, Line 35: // This provides a command-line utility for replaying data cache 
trace files.
> would be nice to include an example of how to invoke this from the command
Added examples of usage


http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc@51
PS4, Line 51:
> nit: needs revising
Done


http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc@174
PS4, Line 174:  Status status = ValidateFlags();
             :   if (!status.ok()) CLEAN_EXIT_WITH_ERROR(status.GetDetail());
             :
             :   LOG(INFO) << "Initialize cache with
> my understanding of this tool is that it takes the trace from a data cache
Good point, added this to the new example at the top


http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-test.cc
File be/src/runtime/io/data-cache-trace-test.cc:

http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-test.cc@166
PS4, Line 166:   // the location is a HIT in the original trace, but the replay 
doesn't have an entry
> Might be good to expand on the comments in this file a bit, eg. note here w
Add comments for the replay stats about why they are different from the 
original stats.


http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.h
File be/src/runtime/io/data-cache-trace.h:

http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.h@50
PS4, Line 50:
> Its a little weird how this class is basically a wrapper around several oth
Changed this to use an impala::io::trace namespace. Added forward declarations, 
so data-cache.h no longer needs to include data-cache-trace.h.

I didn't adjust the names of classes, but hopefully the names are clear enough 
in context.

To forward declare things, I also changed the EventType enum to an enum class.


http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.h@50
PS4, Line 50:
> at the very least, it would be good to separate out the Replayer code from
Do you mean put them in different files or just split it out of the 
DataCacheTrace class?

Given the change to using a namespace, is there something else that you had in 
mind?


http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.cc
File be/src/runtime/io/data-cache-trace.cc:

http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.cc@251
PS4, Line 251:       filename.size());
> if we are doing this because of security concerns, should we use a secure h
This is the same code for data_cache_anonymize_trace. It got moved out of 
data-cache.cc to here. It hasn't changed much, just some minor things 
(different signature of CalculateBase64EscapedLen).

We want it to be hard to go from the hash value back to the original filename, 
but we don't care about the ability for attackers to generate collisions.

The question is how hard is hard enough? A regular hash means someone needs to 
write a program to reverse it. A secure hash function is significantly slower, 
but it would increase the resources needed to reverse the hash.

My current thought is that a regular hash is sufficient for now, but if we 
think that filenames are very sensitive, we can switch to a secure hash.

There is also post-processing that you could do to eliminate any information. 
For example, you could walk through the trace, maintaining a hash table of 
known entries, and then map each distinct entry to a truly random value.


http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.cc@375
PS4, Line 375:
> nit: extra new line
Done


http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache.cc
File be/src/runtime/io/data-cache.cc:

http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache.cc@584
PS4, Line 584: kip the actual
> Maybe add a LIKELY, here and similar places below?
Good point. Added LIKELY/UNLIKELY for the places that are frequently called.



--
To view, visit http://gerrit.cloudera.org:8080/15914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0f84204d8e5145f5fa8d4851d9c19ac317db168e
Gerrit-Change-Number: 15914
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com>
Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Comment-Date: Mon, 08 Jun 2020 17:22:09 +0000
Gerrit-HasComments: Yes

Reply via email to