Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12509 )
Change subject: Fix tracing of log appending and reduce log-related log spam
......................................................................

Fix tracing of log appending and reduce log-related log spam

While messing around on a server I noticed a funny trace:

0212 16:29:26.554846 (+ 0us) service_pool.cc:163] Inserting onto call queue
0212 16:29:26.554854 (+ 8us) service_pool.cc:222] Handling call
0212 16:29:28.219369 (+1664515us) inbound_call.cc:157] Queueing success response
Related trace 'txn':
0212 16:29:26.561719 (+ 0us) write_transaction.cc:101] PREPARE: Starting
0212 16:29:26.562102 (+ 383us) write_transaction.cc:268] Acquiring schema lock in shared mode
0212 16:29:26.562103 (+ 1us) write_transaction.cc:271] Acquired schema lock
0212 16:29:26.562104 (+ 1us) tablet.cc:400] PREPARE: Decoding operations
0212 16:29:26.599420 (+ 37316us) tablet.cc:422] PREPARE: Acquiring locks for 6376 operations
0212 16:29:26.611285 (+ 11865us) tablet.cc:426] PREPARE: locks acquired
0212 16:29:26.611286 (+ 1us) write_transaction.cc:126] PREPARE: finished.
0212 16:29:26.611389 (+ 103us) write_transaction.cc:136] Start()
0212 16:29:26.611392 (+ 3us) write_transaction.cc:141] Timestamp: P: 1550017766611388 usec, L: 0
0212 16:29:26.613188 (+ 1796us) log.cc:582] Serialized 10493083 byte log entry
0212 16:29:26.735023 (+121835us) write_transaction.cc:149] APPLY: Starting
0212 16:29:28.213010 (+1477987us) tablet_metrics.cc:365] ProbeStats: bloom_lookups=27143,key_file_lookups=5,delta_file_lookups=0,mrs_lookups=6376
0212 16:29:28.214317 (+ 1307us) log.cc:582] Serialized 38279 byte log entry
0212 16:29:28.214376 (+ 59us) write_transaction.cc:309] Releasing row and schema locks
0212 16:29:28.216505 (+ 2129us) write_transaction.cc:277] Released schema lock
0212 16:29:28.219357 (+ 2852us) write_transaction.cc:196] FINISH: updating metrics
0212 16:29:28.219709 (+ 352us) write_transaction.cc:309] Releasing row and schema locks
0212 16:29:28.219709 (+ 0us) write_transaction.cc:277] Released schema lock
0212 16:29:28.824261 (+604552us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegment4us8LW
0212 16:29:29.531241 (+706980us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentGJU550
0212 16:29:30.254932 (+723691us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmenteWEBF9
0212 16:29:31.005554 (+750622us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentsAIJzm
0212 16:29:31.695339 (+689785us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentsBimGD
0212 16:29:32.443351 (+748012us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentamea7Y
0212 16:29:33.180797 (+737446us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentSam2No
0212 16:29:33.905360 (+724563us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmenthfhQDS
0212 16:29:34.634994 (+729634us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentxDVwMq
0212 16:29:35.384800 (+749806us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentjMi862
0212 16:29:36.080099 (+695299us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentRgaDAJ
0212 16:29:36.822293 (+742194us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentwSXUHR
0212 16:29:37.540434 (+718141us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentCp3SMG
0212 16:29:38.289865 (+749431us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentyEttfA
0212 16:29:38.993878 (+704013us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegment7OTQ23
0212 16:29:39.759563 (+765685us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentzk7y0x
0212 16:29:40.495284 (+735721us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmenttvkp8B
0212 16:29:41.289037 (+793753us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentnhuvEK
0212 16:29:41.993331 (+704294us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentVCJgnX
0212 16:29:42.710348 (+717017us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentrZTzve
0212 16:29:43.458553 (+748205us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentffNUWz

The issue is that this txn caused the append thread to wake up, which meant that the append thread adopted the txn's trace. Several batches were then written to the WAL during that one active period of the append thread, and some of them triggered WAL preallocation. The resulting trace messages were appended to the trace buffer of the original txn that woke the append thread, even though that txn had nothing to do with those preallocations. In the default case the write is not blocked by any preallocation, because preallocation of WAL segments is asynchronous. So the trace is misleading: none of those preallocations delayed the write.

The simple thing to do is to remove the TRACE() call for preallocation. TRACE_EVENT() calls still track the activity of the append thread through the tracing interface, and to cover the case of synchronous segment allocation and rolling, I added a scoped latency counter for synchronous log rolls.

Additionally, I noticed that logs tend to get spammed with log rolling and preallocation messages like:

I0214 16:38:30.307582 14666 log.cc:649] T f445674ede8d4b6590ad5002764eeae7 P 09d6bf7a02124145b43f43cb7a667b3d: Max segment size reached. Starting new segment allocation
I0214 16:38:30.430410 14666 log.cc:576] T f445674ede8d4b6590ad5002764eeae7 P 09d6bf7a02124145b43f43cb7a667b3d: Rolled over to a new log segment at /data/15/kudu/wals/f445674ede8d4b6590ad5002764eeae7/wal-000001116

These messages aren't very useful. If preallocation or rolling is slow, existing slow-execution logging scopes will already alert us through the logs. I dropped these messages to VLOG(1).
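For readers unfamiliar with how the misattribution happens, here is a minimal, hypothetical sketch of thread-local trace adoption. The Trace class, ScopedTraceAdoption helper, and TraceMessage() function below are illustrative stand-ins, not Kudu's actual tracing API; the point is only that while the append thread holds an adopted trace, everything it records lands in the waking txn's buffer:

  #include <iostream>
  #include <string>
  #include <vector>

  // Illustrative trace object: a per-request buffer of messages.
  struct Trace {
    std::vector<std::string> messages;
    void Append(const std::string& msg) { messages.push_back(msg); }
  };

  // Each thread records into whatever trace is "current" for it.
  thread_local Trace* current_trace = nullptr;

  // RAII helper: the append thread adopts the trace of the txn that woke it.
  class ScopedTraceAdoption {
   public:
    explicit ScopedTraceAdoption(Trace* t) : saved_(current_trace) {
      current_trace = t;
    }
    ~ScopedTraceAdoption() { current_trace = saved_; }
   private:
    Trace* saved_;
  };

  void TraceMessage(const std::string& msg) {
    if (current_trace != nullptr) current_trace->Append(msg);
  }

  int main() {
    Trace txn_trace;
    {
      // The append thread wakes for one txn and adopts its trace...
      ScopedTraceAdoption adopt(&txn_trace);
      TraceMessage("Serialized 10493083 byte log entry");  // belongs to this txn
      // ...but while still awake it also handles later batches, so their
      // preallocation messages land in the same txn's buffer.
      TraceMessage("Preallocating 8388608 byte segment");  // misattributed
    }
    for (const auto& m : txn_trace.messages) std::cout << m << "\n";
    return 0;
  }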
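And here is a minimal sketch of the kind of scoped latency counter the change describes for synchronous log rolls. ScopedLatencyCounter and RollLogSegment are hypothetical names; Kudu's real counter would feed a histogram metric rather than printing, but the RAII shape is the same:

  #include <chrono>
  #include <cstdint>
  #include <iostream>

  // RAII timer: measures how long its enclosing scope takes and records
  // the result when the scope exits.
  class ScopedLatencyCounter {
   public:
    explicit ScopedLatencyCounter(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {}

    ~ScopedLatencyCounter() {
      auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
          std::chrono::steady_clock::now() - start_).count();
      // A real implementation would update a histogram metric; we just print.
      std::cout << name_ << ": " << elapsed << "us\n";
    }

   private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
  };

  void RollLogSegment() {
    // Placeholder for the synchronous allocate-and-roll path.
  }

  int main() {
    {
      // Times only the synchronous roll, so slow rolls surface in metrics
      // without polluting an unrelated caller's trace buffer.
      ScopedLatencyCounter counter("log_roll_latency");
      RollLogSegment();
    }
    return 0;
  }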
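Finally, a small sketch of the logging demotion using glog (the logging library Kudu uses): a message moved from LOG(INFO) to VLOG(1) is suppressed by default and only appears when the verbosity level is raised, e.g. with --v=1. StartNewSegmentAllocation is a hypothetical stand-in for the code path in log.cc:

  #include <glog/logging.h>

  void StartNewSegmentAllocation() {
    // Before: always emitted, spamming the log on every segment roll.
    // LOG(INFO) << "Max segment size reached. Starting new segment allocation";

    // After: hidden unless the operator raises the verbosity level.
    VLOG(1) << "Max segment size reached. Starting new segment allocation";
  }

  int main(int argc, char* argv[]) {
    google::InitGoogleLogging(argv[0]);
    FLAGS_logtostderr = true;  // Write log output to stderr for this demo.
    FLAGS_v = 1;               // Equivalent to --v=1; re-enables VLOG(1) output.
    StartNewSegmentAllocation();
    return 0;
  }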
Change-Id: Ia50698e3af321b4ab87ee3974525dea6fc551613
Reviewed-on: http://gerrit.cloudera.org:8080/12491
Reviewed-by: Adar Dembo <[email protected]>
Reviewed-by: Andrew Wong <[email protected]>
Tested-by: Kudu Jenkins
(cherry picked from commit 3574e3eb55a6bc815a9095636c4e08edcfa7c6b0)
---
M src/kudu/consensus/log.cc
1 file changed, 4 insertions(+), 3 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/12509/1
--
To view, visit http://gerrit.cloudera.org:8080/12509
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.9.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia50698e3af321b4ab87ee3974525dea6fc551613
Gerrit-Change-Number: 12509
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
