Alexander Dejanovski created CASSANDRA-16406: ------------------------------------------------
Summary: Debug logging affects repair performance Key: CASSANDRA-16406 URL: https://issues.apache.org/jira/browse/CASSANDRA-16406 Project: Cassandra Issue Type: Bug Components: Consistency/Repair Reporter: Alexander Dejanovski Assignee: Alexander Dejanovski Attachments: with_debug_logging.png, without_debug_logging.png While working on the repair quality testing in CASSANDRA-16245, it appeared that the node coordinating repairs on a 20GB per node dataset was generating more than 2G of log with a total duration for the incremental repair scenarios of ~2h40m: [https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/37/workflows/6a7a41c8-0fca-4080-b37e-3b38998b3fab/jobs/49/steps ] !with_debug_logging.png! The logs showed a lot of messages from the MerkleTree class at high pace: {noformat} DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10) Fully inconsistent range [#<TreeRange (-6738651767434294419,-6738622715859497972] depth=11>, #<TreeRange (-6738622715859497972,-6738593664284701525] depth=11>] DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (9) Inconsistent digest on right sub-range #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10>: [#<OffHeapInner -6738564612709905078 hash=[b8efd3d684474276f316b1bc9f23b463cda4f8d620a4b5cf4d2c7dbb101bbe85] children=[#<OffHeapLeaf [fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]> #<OffHeapLeaf [471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>]>, #<OffHeapInner -6738564612709905078 hash=[95334327a0b50b7d3c51d6588a5f3d57701e0b978f6ab28e85cda3cb5a094eb5] children=[#<OffHeapLeaf [0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]> #<OffHeapLeaf [981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>]>] DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:262 - (10) Hashing sub-ranges [#<TreeRange (-6738593664284701525,-6738564612709905078] depth=11>, #<TreeRange (-6738564612709905078,-6738535561135108630] depth=11>] for #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10> divided by midpoint -6738564612709905078 DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:272 - (10) Inconsistent digest on left sub-range #<TreeRange (-6738593664284701525,-6738564612709905078] depth=11>: [#<OffHeapLeaf [fff431a30da07558c5106897da9f3fc3bcc598fb59e67c6959f4e70b6d3a7722]>, #<OffHeapLeaf [0d2c5c21f6b04b098f71f43d0f3f3a5a823b372eb850261774847c456629bedd]>] DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (10) Inconsistent digest on right sub-range #<TreeRange (-6738564612709905078,-6738535561135108630] depth=11>: [#<OffHeapLeaf [471be27589e7372e3606d92b45bc8ba07161602d7942c9a614d89ab07d21c9a7]>, #<OffHeapLeaf [981f1f0656054074b32022658560070df2253cb9373a9499f149df8e3c20f068]>] DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (10) Fully inconsistent range [#<TreeRange (-6738593664284701525,-6738564612709905078] depth=11>, #<TreeRange (-6738564612709905078,-6738535561135108630] depth=11>] DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:308 - (9) Fully inconsistent range [#<TreeRange (-6738651767434294419,-6738593664284701525] depth=10>, #<TreeRange (-6738593664284701525,-6738535561135108630] depth=10>] DEBUG [RepairJobTask:4] 2021-01-21 16:15:29,631 MerkleTree.java:292 - (8) Inconsistent digest on right sub-range #<TreeRange (-6738535561135108630,-6738419354835922841] depth=9>: [#<OffHeapInner -6738477457985515736 hash=[806ede1a35783bf5fafe8b8ccefe4d3ff48e8f0e1314f8a9ce4b23f13fed4bf9] children=[#<OffHeapInner -6738506509560312183 hash=[e6d133afbd8041266f8a1cfe456ff07c9e7debe8ff54279b579b252f2af78b6b] children=[#]> #<OffHeapInner -6738448406410719289 hash=[66bfedb588f87ad3957497728b91bd436af364e6ec40df3299d006de151ac092] children=[#]>]>, #<OffHeapInner -6738477457985515736 hash=[bf128431ddf7ad72417ce853f90abd0bc7a2a38088d5fbec864c02c516ed6458] children=[#<OffHeapInner -6738506509560312183 hash=[12a71b62ceb096369c4797f56edd91a16dd9803348b4f4cfe1454027d7dae3d2] children=[#]> #<OffHeapInner -6738448406410719289 hash=[adb59f5313473b44dd3b7fa697d72caaaa7b23b3c0610f23670942e2c137878a] children=[#]>]>]{noformat} When disabling debug logging, the duration dropped to ~2h05m with decent log sizes: [https://app.circleci.com/pipelines/github/riptano/cassandra-rtest/38/workflows/c91e6ceb-438b-4f00-b38a-d670f9afb4c3/jobs/51] !without_debug_logging.png! There's apparently too much logging for each inconsistency found in the Merkle tree comparisons and we should move this to TRACE level if we still want to allow debug logging to be turned on by default. I'll prepare a patch for the MerkleTree class and run the repair testing scenarios again to verify their duration. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org