On 2021-02-15 6:49 a.m., László Pál wrote:
I have a server mainly used for central log collection (syslog-ng collects logs
and puts them in files). It is quite big, with a 5TB partition allocated for logs. I’m
using BTRFS on this partition with on-the-fly compression, so I can save lots
of space without playing with logrotate and compress. What I’ve observed is that the
I/O performance of this partition is terrible. Right now I’ve been deleting some old
logs (a couple thousand files) for hours. Is there any way to improve the
performance of a partition like this, or should I consider moving the data to xfs
w/o compression and implementing some logrotate-based archiving?

How much CPU are you giving this server? Compression is fairly expensive, and it sounds like you are writing a large number of compressed streams simultaneously, which may need several CPU threads. That probably will not affect deletion, but it's an obvious thing to check for logging performance.

How much memory is available for cache on this machine? When memory is scarce, caching is almost ineffective, and that's going to massively increase the number of extents written to each file.

Given that this filesystem is being used for logging, I assume that you have been writing a large number of files simultaneously. That means that with insufficient cache memory to delay writes, you probably have an extreme number of file extents on disk. Btrfs is not optimized for deletes, especially with respect to the write lock. Each extent is deleted in turn, so you probably have the equivalent of one-by-one deletion of half a trillion files.
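
To check whether extent counts really are the problem, something like the following might help. It is only a sketch: it assumes the filefrag tool (from e2fsprogs) is installed and that its summary lines look like "name: N extents found". The function names are my own. Note that compressed btrfs files tend to show many extents anyway, because compressed extents are small, so compare files against each other rather than against an absolute number.

```python
import re
import subprocess

# Assumed filefrag summary format: "app.log: 1234 extents found"
# (or "1 extent found" for an unfragmented file).
_SUMMARY = re.compile(r"^(?P<path>.*): (?P<count>\d+) extents? found$")


def parse_filefrag_line(line):
    """Return (path, extent_count) from a filefrag summary line, or None."""
    match = _SUMMARY.match(line.strip())
    if match is None:
        return None
    return match.group("path"), int(match.group("count"))


def count_extents(paths):
    """Run filefrag on the given paths and map each path to its extent count."""
    result = subprocess.run(
        ["filefrag", *paths], capture_output=True, text=True, check=True
    )
    counts = {}
    for line in result.stdout.splitlines():
        parsed = parse_filefrag_line(line)
        if parsed is not None:
            counts[parsed[0]] = parsed[1]
    return counts
```

Calling count_extents() on a handful of the old logs and sorting by the count should show quickly whether a few files carry most of the extents.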

Multiple high-volume logs are a knotty problem for filesystems. I can't think of a normal filesystem that would handle this situation better. Ext4 will probably be faster at deleting, but not by that much. You might want to think about doing a bit of redesign and writing the logs to an object filesystem or a database instead.

Doing log rotation will mostly resolve the extent issue, as the copy in btrfs rewrites the file and should merge its many small extents into far fewer large ones, massively improving the delete performance. Only the smaller current log will still have a large number of extents. This will also speed up other filesystems a lot.
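
As a rough illustration of rotating by copy, here is a minimal sketch. The function name and the ".1" suffix are my own choices, not anything logrotate mandates. It copies the data with explicit reads and writes on purpose: a reflink-style copy would share the old extents instead of rewriting them. Like logrotate's copytruncate mode, it can lose lines that arrive between the copy and the truncate.

```python
import os


def rotate_by_rewrite(log_path, chunk_size=1024 * 1024):
    """Copy log_path to log_path + '.1' in large chunks, then empty the original."""
    rotated_path = log_path + ".1"
    with open(log_path, "rb") as src, open(rotated_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)  # large sequential writes, so the data is laid out afresh
        dst.flush()
        os.fsync(dst.fileno())
    # Truncate rather than unlink, so a writer that holds the file open
    # (ideally in append mode) keeps a valid handle.
    with open(log_path, "r+b") as original:
        original.truncate(0)
    return rotated_path
```

In practice logrotate already does this job; the point is only that the rotated file is written in one sequential pass.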

Another thought: do the logging on the local machine, and rotate the logs to the central log aggregation machine. That will enormously speed up logging and mostly resolve the extents problem. Single-threading the logrotate seems like a good idea as well.
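
A single-threaded shipping step could look roughly like this. It is a sketch under assumptions: central_dir stands in for wherever the aggregation machine is reachable (a network mount, say), and the "*.log.1" pattern and the function name are hypothetical. Each rotated file is compressed and written in one pass, one file at a time.

```python
import gzip
import shutil
from pathlib import Path


def ship_rotated_logs(local_dir, central_dir, pattern="*.log.1"):
    """Compress each rotated log into central_dir, strictly one file at a time."""
    central = Path(central_dir)
    central.mkdir(parents=True, exist_ok=True)
    shipped = []
    for source in sorted(Path(local_dir).glob(pattern)):
        target = central / (source.name + ".gz")
        with open(source, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Remove the local copy only after the compressed file has been closed.
        source.unlink()
        shipped.append(target)
    return shipped
```

Because the central machine then receives each file as one sequential write, it never sees the many small interleaved appends that create the extents in the first place.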

--

John Mellor
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
