[Bug 1914911] Re: [SRU] bluefs doesn't compact log file

Ponnuvel Palaniyappan Thu, 11 Feb 2021 07:20:55 -0800

** Description changed:

- For a certain type of workload, the bluefs might never compact the log file, 
- which would cause the bluefs log file slowly grows to a huge size 
+ [Impact]
+ 
+ For a certain type of workload, bluefs might never compact its log
+ file, causing the bluefs log file to slowly grow to a huge size
  (in some cases bigger than 1TB on a 1.5TB device).
  
- This bug could eventually cause osd crash and failed to restart as it couldn't get through the bluefs replay phase during boot time.
- We might see below log when trying to restart the osd:
- bluefs mount failed to replay log: (5) Input/output error
- 
  The bluefs perf counters show more details when this issue happens:
- e.g. 
+ e.g.
  "bluefs": {
  "gift_bytes": 811748818944,
  "reclaim_bytes": 0,
  "db_total_bytes": 888564350976,
  "db_used_bytes": 867311747072,
  "wal_total_bytes": 0,
  "wal_used_bytes": 0,
  "slow_total_bytes": 0,
  "slow_used_bytes": 0,
  "num_files": 11,
  "log_bytes": 866545131520,
  "log_compactions": 0,
  "logged_bytes": 866542977024,
  "files_written_wal": 2,
  "files_written_sst": 3,
  "bytes_written_wal": 32424281934,
  "bytes_written_sst": 25382201
  }
  
- As we can see the log_compactions is 0, which means it's never compacted and the log file size(log_bytes) is already 800+G. After the compaction, the log file size would reduced to around
- 1 G
+ This bug could eventually cause the OSD to crash and fail to restart,
+ because it cannot get through the bluefs replay phase during boot.
+ The following log message may be seen when trying to restart the OSD:
+ bluefs mount failed to replay log: (5) Input/output error
  
- Here is the PR[1] that addressed this bug, we need to backport this to ubuntu 12.2.13
- [1] https://github.com/ceph/ceph/pull/17354
+ As can be seen, log_compactions is 0, which means the log has never been
+ compacted, and the log file size (log_bytes) is already 800+ GB. After
+ compaction, the log file size should be reduced to around 1 GB.
+ 
+ [Test Case]
+ 
+ Deploy a test Ceph cluster (Luminous 12.2.13, which has the bug) and
+ drive I/O. Compaction doesn't get triggered often when most I/O is
+ reads, so first fill up the cluster with lots of writes and then run
+ a heavy read-only workload (no writes). The problem should then occur.
+ Smaller OSDs are fine, as we're only interested in filling up the OSD
+ and growing the bluefs log.
+ 
+ [Where problems could occur]
+ 
+ This fix has been part of all upstream releases since Mimic, so it has
+ had a good amount of runtime.
+ The change makes compaction happen more often; I don't expect this to
+ cause any real problems.
+ 
+ [Other Info]
+  - It's only needed for Luminous (Bionic). All newer releases already have this fix.
+  - Upstream PR: https://github.com/ceph/ceph/pull/17354

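The following is a minimal sketch of one way to spot the symptom above from the perf counters. It assumes the JSON has been captured from the OSD admin socket (e.g. `ceph daemon osd.<id> perf dump`) and parsed. The function name `bluefs_log_suspicious` and the 10 GiB threshold are hypothetical and only for illustration. They are not part of Ceph or of the upstream fix.

```python
import json

# Hypothetical alert threshold chosen for illustration; not a Ceph default.
LOG_BYTES_THRESHOLD = 10 * 1024**3  # 10 GiB


def bluefs_log_suspicious(perf_dump: dict,
                          threshold: int = LOG_BYTES_THRESHOLD) -> bool:
    """Return True if the bluefs log looks like it is never compacted.

    Assumes `perf_dump` is the parsed JSON of an OSD perf dump with a
    top-level "bluefs" section holding "log_bytes" and "log_compactions".
    """
    bluefs = perf_dump.get("bluefs", {})
    log_bytes = bluefs.get("log_bytes", 0)
    compactions = bluefs.get("log_compactions", 0)
    # Symptom from this report: no compactions at all while the log is huge.
    return compactions == 0 and log_bytes > threshold


# Subset of the counters quoted in the description above.
sample = json.loads("""
{"bluefs": {"db_used_bytes": 867311747072,
            "log_bytes": 866545131520,
            "log_compactions": 0}}
""")
print(bluefs_log_suspicious(sample))  # True
```

The check looks for the combination reported here, with log_compactions stuck at 0 while log_bytes is very large, rather than for a size limit alone. A large log that is being compacted regularly would not by itself point to this bug.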

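For the [Test Case] above, one way to tell whether the problem is reproducing is to compare two perf dump snapshots taken during the read-only phase. The `BluefsSnapshot` type and the function names below are hypothetical helpers made up for illustration. The sketch only reads the `log_bytes` and `log_compactions` counters shown in the description. On an affected 12.2.13 OSD, one would expect `log_bytes` to keep growing while `log_compactions` stays unchanged. With the fix, compactions should happen more often.

```python
from dataclasses import dataclass


@dataclass
class BluefsSnapshot:
    """Subset of the bluefs perf counters (hypothetical helper type)."""
    log_bytes: int
    log_compactions: int

    @classmethod
    def from_perf_dump(cls, dump: dict) -> "BluefsSnapshot":
        # Assumes the parsed perf dump has a top-level "bluefs" section.
        bluefs = dump["bluefs"]
        return cls(bluefs["log_bytes"], bluefs["log_compactions"])


def log_grew_without_compaction(before: BluefsSnapshot,
                                after: BluefsSnapshot) -> bool:
    """True if the log grew between snapshots while no compaction ran."""
    return (after.log_bytes > before.log_bytes
            and after.log_compactions == before.log_compactions)


def growth_rate(before: BluefsSnapshot, after: BluefsSnapshot,
                seconds: float) -> float:
    """Average bluefs log growth in bytes per second between two snapshots."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (after.log_bytes - before.log_bytes) / seconds
```

To use it, take one snapshot when the read-only phase starts and another some time later, then check `log_grew_without_compaction`. The growth rate gives a rough idea of how quickly the log will reach a size that makes replay at boot a problem.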
-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1914911

Title:
  [SRU] bluefs doesn't compact log file

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1914911/+subscriptions

-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
