Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/15914 )
Change subject: IMPALA-9435: Usability enhancements for data cache access trace ...................................................................... Patch Set 5: (9 comments) http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc File be/src/runtime/io/data-cache-trace-replayer.cc: http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc@35 PS4, Line 35: // This provides a command-line utility for replaying data cache trace files. > would be nice to include an example of how to invoke this from the command Added examples of usage http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc@51 PS4, Line 51: > nit: needs revising Done http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc@174 PS4, Line 174: Status status = ValidateFlags(); : if (!status.ok()) CLEAN_EXIT_WITH_ERROR(status.GetDetail()); : : LOG(INFO) << "Initialize cache with > my understanding of this tool is that it takes the trace from a data cache Good point, added this to the new example at the top http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-test.cc File be/src/runtime/io/data-cache-trace-test.cc: http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-test.cc@166 PS4, Line 166: // the location is a HIT in the original trace, but the replay doesn't have an entry > Might be good to expand on the comments in this file a bit, eg. note here w Add comments for the replay stats about why they are different from the original stats. http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.h File be/src/runtime/io/data-cache-trace.h: http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.h@50 PS4, Line 50: > Its a little weird how this class is basically a wrapper around several oth Changed this to use an impala::io::trace namespace. Added forward declarations, so data-cache.h no longer needs to include data-cache-trace.h. I didn't adjust the names of classes, but hopefully the names are clear enough in context. To forward declare things, I also changed the EventType enum to an enum class. http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.h@50 PS4, Line 50: > at the very least, it would be good to separate out the Replayer code from Do you mean put them in different files or just split it out of the DataCacheTrace class? Given the change to using a namespace, is there something else that you had in mind? http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.cc File be/src/runtime/io/data-cache-trace.cc: http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.cc@251 PS4, Line 251: filename.size()); > if we are doing this because of security concerns, should we use a secure h This is the same code for data_cache_anonymize_trace. It got moved out of data-cache.cc to here. It hasn't changed much, just some minor things (different signature of CalculateBase64EscapedLen). We want it to be hard to go from the hash value back to the original filename, but we don't care about the ability for attackers to generate collisions. The question is how hard is hard enough? A regular hash means someone needs to write a program to reverse it. A secure hash function is significantly slower, but it would increase the resources needed to reverse the hash. My current thought is that a regular hash is sufficient for now, but if we think that filenames are very sensitive, we can switch to a secure hash. There is also post-processing that you could do to eliminate any information. For example, you could walk through the trace, maintaining a hash table of known entries, and then map each distinct entry to a truly random value. http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.cc@375 PS4, Line 375: > nit: extra new line Done http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache.cc File be/src/runtime/io/data-cache.cc: http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache.cc@584 PS4, Line 584: kip the actual > Maybe add a LIKELY, here and similar places below? Good point. Added LIKELY/UNLIKELY for the places that are frequently called. -- To view, visit http://gerrit.cloudera.org:8080/15914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f84204d8e5145f5fa8d4851d9c19ac317db168e Gerrit-Change-Number: 15914 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Mon, 08 Jun 2020 17:22:09 +0000 Gerrit-HasComments: Yes