Grant Henke has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12509 )


Change subject: Fix tracing of log appending and reduce log-related log spam
......................................................................

Fix tracing of log appending and reduce log-related log spam

While messing around on a server I noticed a funny trace:

0212 16:29:26.554846 (+     0us) service_pool.cc:163] Inserting onto call queue
0212 16:29:26.554854 (+     8us) service_pool.cc:222] Handling call
0212 16:29:28.219369 (+1664515us) inbound_call.cc:157] Queueing success response
Related trace 'txn':
0212 16:29:26.561719 (+     0us) write_transaction.cc:101] PREPARE: Starting
0212 16:29:26.562102 (+   383us) write_transaction.cc:268] Acquiring schema lock in shared mode
0212 16:29:26.562103 (+     1us) write_transaction.cc:271] Acquired schema lock
0212 16:29:26.562104 (+     1us) tablet.cc:400] PREPARE: Decoding operations
0212 16:29:26.599420 (+ 37316us) tablet.cc:422] PREPARE: Acquiring locks for 6376 operations
0212 16:29:26.611285 (+ 11865us) tablet.cc:426] PREPARE: locks acquired
0212 16:29:26.611286 (+     1us) write_transaction.cc:126] PREPARE: finished.
0212 16:29:26.611389 (+   103us) write_transaction.cc:136] Start()
0212 16:29:26.611392 (+     3us) write_transaction.cc:141] Timestamp: P: 1550017766611388 usec, L: 0
0212 16:29:26.613188 (+  1796us) log.cc:582] Serialized 10493083 byte log entry
0212 16:29:26.735023 (+121835us) write_transaction.cc:149] APPLY: Starting
0212 16:29:28.213010 (+1477987us) tablet_metrics.cc:365] ProbeStats: bloom_lookups=27143,key_file_lookups=5,delta_file_lookups=0,mrs_lookups=6376
0212 16:29:28.214317 (+  1307us) log.cc:582] Serialized 38279 byte log entry
0212 16:29:28.214376 (+    59us) write_transaction.cc:309] Releasing row and schema locks
0212 16:29:28.216505 (+  2129us) write_transaction.cc:277] Released schema lock
0212 16:29:28.219357 (+  2852us) write_transaction.cc:196] FINISH: updating metrics
0212 16:29:28.219709 (+   352us) write_transaction.cc:309] Releasing row and schema locks
0212 16:29:28.219709 (+     0us) write_transaction.cc:277] Released schema lock
0212 16:29:28.824261 (+604552us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegment4us8LW
0212 16:29:29.531241 (+706980us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentGJU550
0212 16:29:30.254932 (+723691us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmenteWEBF9
0212 16:29:31.005554 (+750622us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentsAIJzm
0212 16:29:31.695339 (+689785us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentsBimGD
0212 16:29:32.443351 (+748012us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentamea7Y
0212 16:29:33.180797 (+737446us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentSam2No
0212 16:29:33.905360 (+724563us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmenthfhQDS
0212 16:29:34.634994 (+729634us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentxDVwMq
0212 16:29:35.384800 (+749806us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentjMi862
0212 16:29:36.080099 (+695299us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentRgaDAJ
0212 16:29:36.822293 (+742194us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentwSXUHR
0212 16:29:37.540434 (+718141us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentCp3SMG
0212 16:29:38.289865 (+749431us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentyEttfA
0212 16:29:38.993878 (+704013us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegment7OTQ23
0212 16:29:39.759563 (+765685us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentzk7y0x
0212 16:29:40.495284 (+735721us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmenttvkp8B
0212 16:29:41.289037 (+793753us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentnhuvEK
0212 16:29:41.993331 (+704294us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentVCJgnX
0212 16:29:42.710348 (+717017us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentrZTzve
0212 16:29:43.458553 (+748205us) log.cc:1006] Preallocating 8388608 byte segment in /data/15/kudu/wals/b5ecd0b8d23a4a529650e5871741f23e/.kudutmp.newsegmentffNUWz

The issue is that this txn caused the append thread to wake up, so the
append thread adopted the txn's trace. Several batches were then
written to the WAL during that one active period of the append thread,
and some of them triggered WAL preallocation. The resulting trace
messages were appended to the trace buffer of the original txn that
woke the append thread, even though that txn had nothing to do with
those preallocations; in the default case it is not even blocked by
any preallocation, because preallocation of WAL segments is
asynchronous.
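
For context, here is a minimal sketch of the trace-adoption pattern at
play. ADOPT_TRACE and TRACE are the real macros from kudu/util/trace.h,
but the method and helper names (ProcessQueue, DrainQueue, AppendBatch)
and the trace() accessor are illustrative, not copied from log.cc:

  // ADOPT_TRACE installs a Trace as the thread-local "current" trace
  // for the enclosing scope; every TRACE() call on this thread then
  // appends to that buffer until the scope exits.
  void AppendThread::ProcessQueue() {
    std::vector<LogEntryBatch*> batches = DrainQueue();  // illustrative
    // Adopt the trace of the batch that woke the thread...
    ADOPT_TRACE(batches.front()->trace());
    for (LogEntryBatch* batch : batches) {
      // ...so TRACE() calls made while appending *any* drained batch,
      // including ones that trigger asynchronous WAL preallocation,
      // land in that first batch's trace buffer.
      AppendBatch(batch);
    }
  }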

So this is misleading, as none of those preallocations delayed the
write. The simple fix is to remove the TRACE() call for preallocation.
There are still TRACE_EVENT()s tracking the activity of the append
thread through the tracing interface, and to cover the case of
synchronous segment allocation and rolling, I added a scoped latency
counter for synchronous log rolls.
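
The shape of that change is roughly the following sketch.
SCOPED_LATENCY_METRIC is Kudu's existing scoped-latency macro from
kudu/util/metrics.h; the metric name roll_latency and the exact method
bodies are assumptions here, not copied from the patch:

  Status Log::AllocateSegmentAndRollOver() {
    // In place of the removed TRACE() call: time the whole synchronous
    // roll and record it in a latency histogram, so slow rolls still
    // surface in metrics without polluting whatever trace the append
    // thread happens to have adopted.
    SCOPED_LATENCY_METRIC(metrics_, roll_latency);
    RETURN_NOT_OK(AsyncAllocateSegment());
    return RollOver();
  }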

Additionally, I noticed that logs tend to get spammed with log rolling
and preallocation messages like

I0214 16:38:30.307582 14666 log.cc:649] T f445674ede8d4b6590ad5002764eeae7 P 09d6bf7a02124145b43f43cb7a667b3d: Max segment size reached. Starting new segment allocation
I0214 16:38:30.430410 14666 log.cc:576] T f445674ede8d4b6590ad5002764eeae7 P 09d6bf7a02124145b43f43cb7a667b3d: Rolled over to a new log segment at /data/15/kudu/wals/f445674ede8d4b6590ad5002764eeae7/wal-000001116

These messages aren't really useful: if preallocation or rolling is
slow, existing slow-execution logging scopes will already alert us
through the logs. So I dropped their level to VLOG(1).
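
Concretely that's just a severity drop on the two messages, in the
spirit of this sketch (LogPrefix() stands in for whatever builds the
"T ... P ...:" prefix; with glog, VLOG(1) output only appears when the
server runs with --v=1 or higher):

  // Before: emitted at INFO on every roll, i.e. once per ~8 MB of WAL.
  LOG(INFO) << LogPrefix() << "Max segment size reached. "
                              "Starting new segment allocation";
  // After: still available for debugging, but hidden unless verbose
  // logging is enabled (e.g. --v=1).
  VLOG(1) << LogPrefix() << "Max segment size reached. "
                            "Starting new segment allocation";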

Change-Id: Ia50698e3af321b4ab87ee3974525dea6fc551613
Reviewed-on: http://gerrit.cloudera.org:8080/12491
Reviewed-by: Adar Dembo <[email protected]>
Reviewed-by: Andrew Wong <[email protected]>
Tested-by: Kudu Jenkins
(cherry picked from commit 3574e3eb55a6bc815a9095636c4e08edcfa7c6b0)
---
M src/kudu/consensus/log.cc
1 file changed, 4 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/12509/1
--
To view, visit http://gerrit.cloudera.org:8080/12509
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: branch-1.9.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia50698e3af321b4ab87ee3974525dea6fc551613
Gerrit-Change-Number: 12509
Gerrit-PatchSet: 1
Gerrit-Owner: Grant Henke <[email protected]>
