Sahil Takiar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15914 )
Change subject: IMPALA-9435: Usability enhancements for data cache access trace ...................................................................... Patch Set 4: (6 comments) http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc File be/src/runtime/io/data-cache-trace-replayer.cc: http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc@35 PS4, Line 35: // This provides a command-line utility for replaying data cache trace files. would be nice to include an example of how to invoke this from the command line. http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc@51 PS4, Line 51: means the cache may 2TB nit: needs revising http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace-replayer.cc@174 PS4, Line 174: LOG(INFO) << "Cache hit statistics from the original trace:"; : DumpStatisticsToLog(original_trace_stats); : LOG(INFO) << "Cache hit statistics from the replay:"; : DumpStatisticsToLog(replay_stats); my understanding of this tool is that it takes the trace from a data cache configured with a specific eviction algorithm (e.g. LRU) and then replays it using a different eviction algorithm (e.g. LRFU), and them dumps the cache hit stats for the "original trace" (e.g. LRU) and the "replay trace" (e.g. LRFU). is that correct? might be worth mentioning that somewhere in the top level comments. http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.h File be/src/runtime/io/data-cache-trace.h: http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.h@50 PS4, Line 50: class DataCacheTrace { > Its a little weird how this class is basically a wrapper around several oth at the very least, it would be good to separate out the Replayer code from the code that actually does the tracing (e.g. the code that writes to trace files). its a bit confusing to put them together. http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.cc File be/src/runtime/io/data-cache-trace.cc: http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.cc@251 PS4, Line 251: uint128 hash = util_hash::CityHash128(reinterpret_cast<const char*>(filename.data()), if we are doing this because of security concerns, should we use a secure hash function like SHA? I guess this is the same function we use for 'data_cache_anonymize_trace' so maybe this isn't an issue. http://gerrit.cloudera.org:8080/#/c/15914/4/be/src/runtime/io/data-cache-trace.cc@375 PS4, Line 375: nit: extra new line -- To view, visit http://gerrit.cloudera.org:8080/15914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0f84204d8e5145f5fa8d4851d9c19ac317db168e Gerrit-Change-Number: 15914 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Joe McDonnell <joemcdonn...@cloudera.com> Gerrit-Reviewer: Sahil Takiar <stak...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Comment-Date: Mon, 01 Jun 2020 18:55:38 +0000 Gerrit-HasComments: Yes