Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/12604 )
Change subject: Add timing information to /scans ...................................................................... Patch Set 1: (10 comments) http://gerrit.cloudera.org:8080/#/c/12604/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12604/1//COMMIT_MSG@7 PS1, Line 7: Add timing information to /scans > Link to KUDU-2705. :O I didn't know there was a JIRA for this. I thought someone had just mentioned it in Slack the other day. http://gerrit.cloudera.org:8080/#/c/12604/1//COMMIT_MSG@14 PS1, Line 14: Screenshot: https://github.com/wdberkeley/kudu/blob/scanspage_screenshot/ss.png > Curious that all of those scans didn't scan anything. Is that expected? Are They are scans of loadgen tables with 10K's of rows only. I added a laboriously long tooltip to explain the difference between wall time and duration. If someone needs to know what user and system time are and how they differ from duration or wall (real) time, they can look that up. The fact that wall > duration is weird. At first I thought this was because I was double-counting time by having two ScopedAddScannerTiming's in scope at once for a scanner, but that's not so. Then I thought it was because the calls to UpdateAccessTime were not covering the whole scope of the ScopedAddScannerTiming. This was true, and I adjusted things to make them match up more closely, and this should make duration >= wall. But then I tried to test it, and I see an example on my mac of a scan of < 2000 rows from 5 seconds ago that reports a duration of 855 microseconds, a wall time of 4.57s, and a system time of 1.15 minutes: real: 4.72 s user: 3.93 ms sys: 1.15 min Also, I see a number of scans with almost exactly the same weird timing. So I think what we're seeing is timing being broken or at least not working like we expect on macOS. I'm researching it a bit further but I will also try to take this patch and test it on a Linux machine. http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/scanners.h File src/kudu/tserver/scanners.h: http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/scanners.h@46 PS1, Line 46: #include "kudu/util/stopwatch.h" > warning: #includes are not sorted properly [llvm-include-order] Done http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/scanners.h@434 PS1, Line 434: // The cumulative amounts of wall, user cpu, and system cpu time spent on : // this scanner, in seconds. : double wall_secs; : double user_secs; : double sys_secs; > curious if you considered just using CpuTimes from stopwatch.h for this and Done http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/scanners.h@444 PS1, Line 444: explicit ScopedAddScannerTiming(Scanner* scanner) > Add a comment about the lifetime of the scanner vs. the scoped object. Done http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/scanners.h@466 PS1, Line 466: }; > should probably have a DISALLOW_COPY_AND_ASSIGN so it isn't accidentally co Done http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/tablet_service.cc File src/kudu/tserver/tablet_service.cc: http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/tablet_service.cc@1990 PS1, Line 1990: // TODO: add test cases for bad projections including 0 columns > warning: missing username/bug in TODO [google-readability-todo] Done http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/tablet_service.cc@2182 PS1, Line 2182: // TODO: Instead of copying the pb, instead split HandleContinueScanRequest > warning: missing username/bug in TODO [google-readability-todo] Done http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/tserver_path_handlers.cc File src/kudu/tserver/tserver_path_handlers.cc: http://gerrit.cloudera.org:8080/#/c/12604/1/src/kudu/tserver/tserver_path_handlers.cc@542 PS1, Line 542: duration > can we somehow clarify in the UI the distinction between duration and wall_ Done. And see my very long response to Andrew's comment about the same issue (on the commit message). http://gerrit.cloudera.org:8080/#/c/12604/1/www/scans.mustache File www/scans.mustache: http://gerrit.cloudera.org:8080/#/c/12604/1/www/scans.mustache@29 PS1, Line 29: running time of the scan" > can we clarify how this differs from wall time? Done -- To view, visit http://gerrit.cloudera.org:8080/12604 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: kudu Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id30251557ae7f0d97e0d1328bd53511dae7a820b Gerrit-Change-Number: 12604 Gerrit-PatchSet: 1 Gerrit-Owner: Will Berkeley <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Andrew Wong <[email protected]> Gerrit-Reviewer: Kudu Jenkins (120) Gerrit-Reviewer: Tidy Bot (241) Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-Reviewer: Will Berkeley <[email protected]> Gerrit-Comment-Date: Wed, 27 Feb 2019 00:52:28 +0000 Gerrit-HasComments: Yes
