[ ... ]
> After deleting a large amount of data (~100G) we noticed the ocfs2
> file system go "offline" for a few minutes at different times
> throughout the day. By "offline" I mean it is not accessible to the
> server, through samba, or through nfs mounts.

Seems rather unsurprising to me...

> After about 5 minutes it is back online. After a time, it is
> different each day, it will happen again. There is no pattern to the
> "offline" status and it is not predictable.

Seems rather unsurprising to me...

> If we run an fsck.ocfs2 -f on the file system it seems to clear the
> issue (at least we have run for 1 full week without the issue
> returning).

Even if you delete a large amount of data or metadata during that
week? That would be a bit surprising, and it seems to indicate that
your ordinary SMB/NFS workload involves a lot of filesystem updates.

> We noticed it again after deleting ~125G of data (1.5 million files).
> After running fsck.ocfs2 the system stabilized.

Seems especially unsurprising to me...

Interesting that the description of the underlying hardware is
omitted, in particular how many servers share the filetree, and the
latency and structure of the storage subsystem.

Perhaps asking some questions may help, for example:

* How many random IOPS can the underlying storage system deliver?

* How many metadata operations per second do you expect a heavily
  shared and interlocked filetree to deliver under load?

* Specifically, how many days do you expect the deletion of 1.5
  million files to take (taking into account directory and free-list
  update operations) on a heavily shared, interlocked filetree? (A
  rough estimate is sketched just after this list.)

* How "smooth" do you expect massive metadata-updating workloads to be
  in terms of completion rates, given disk-arm contention with
  file-serving workloads, CPU and disk scheduling policies, local
  buffering, and the sizes of the queues in front of the interlocks?

* How strong is your imagination as to the ability of shared
  filesystems to deliver extremely low latency and very high
  throughput for mass metadata updates concurrently with ordinary
  file-serving workloads?

* Are you also surprised when even non-shared, non-interlocked
  filesystems seem to become "stuck" during heavy write-based
  workloads, especially when large cache flushes happen on HBAs with
  unwise elevator policies? (Extra points if you are running any of
  the OCFS2 systems on a VM! :->)
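As a back-of-the-envelope illustration of the third question, here is
a tiny Python sketch. Every number in it (the random-IOPS budget of
the array, the metadata I/Os charged per unlink, the fraction of the
array left over after the serving workload) is an assumption invented
for the example, not a measurement from the reporter's setup;
substitute your own figures.

    # Rough estimate: wall-clock time to remove N files when every
    # unlink costs several random metadata I/Os on shared storage.
    FILES_TO_DELETE = 1_500_000   # from the original report
    IOS_PER_UNLINK = 8            # assumed: inode, directory, free-list,
                                  # journal and lock traffic per file
    ARRAY_RANDOM_IOPS = 500       # assumed: small array of spinning disks
    SHARE_FOR_DELETION = 0.25     # assumed: what the SMB/NFS load leaves over

    effective_iops = ARRAY_RANDOM_IOPS * SHARE_FOR_DELETION
    seconds = FILES_TO_DELETE * IOS_PER_UNLINK / effective_iops
    print(f"effective IOPS for the delete: {effective_iops:.0f}")
    print(f"estimated time: {seconds / 3600:.1f} hours "
          f"({seconds / 86400:.1f} days)")

With these made-up numbers the deletion alone keeps the array busy for
roughly a day, and every one of those metadata I/Os is competing with
the ordinary file-serving traffic and the cluster interlocks, which is
exactly the kind of load under which a shared filesystem can look
"offline" for minutes at a time.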
While some filesystem users would be content with 'O_PONIES', some
even wish for 'O_UNICORNS'...

Note: there have been OCFS2 "performance bugs" in the past, and some
real bugs with Samba, but the situation reported here need not be
related to bugs.

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users