On 02/28/2012 04:37 PM, Sunil Mushran wrote:
> In 1.4, the local allocator window is small. 8MB. Meaning the node
> has to hit the global bitmap after every 8MB. In later releases, the
> window is much larger.
>
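(An aside on that point: in releases new enough to have the larger window,
the window size can apparently also be set per mount with the localalloc
option, in megabytes. The mount line below is hypothetical: the value 64 is
arbitrary, and the 1.4 module shipped with RHEL5 may not support the option
at all.

# mount -t ocfs2 -o _netdev,noatime,data=writeback,localalloc=64 \
    /dev/mapper/bams01p1 /foofs

I mention it mainly so that the next person searching the archives knows
the knob exists in newer code.)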
I'll just mention again in this note: if there are any sysadmins successfully
running OCFS2 1.6 on RHEL5, I would very much like to discuss particulars.

> Second, a single node is not a good baseline. A better baseline is
> multiple nodes writing concurrently to the block device. Not fs.
> Use dd. Set different write offsets. This should help figure out how
> the shared device works with multiple nodes.
>

Benchmarking against the block device was a good idea. I took it a step
further and eliminated DM-Multipath from the equation, too, by writing to
the /dev/sd* device (representing one path to the LUN) on each node. To the
extent that dd(1) is capable of accurate benchmarking, I am comfortable that
I now have reasonable data to work with. (A sketch of the dd invocations is
appended at the end of this mail.)

I ran a large number of tests with varying block sizes, using both
synchronous and direct I/O. The conclusion: I don't think OCFS2 is the
problem. Concurrent writes (from both nodes) to the raw block device
consistently run at ~75 MB/sec. That _implies_ that OCFS2's overhead is not
bad: apparently ~10 MB/sec in my environment.

The next step is to take this up with SAN storage support. That's another
thread. Thanks again for the reply.

> On 2/28/2012 9:24 AM, Erik Schwartz wrote:
>> I have a two-node RHEL5 cluster that runs the following Linux kernel and
>> accompanying OCFS2 module packages:
>>
>> * kernel-2.6.18-274.17.1.el5
>> * ocfs2-2.6.18-274.17.1.el5-1.4.7-1.el5
>>
>> A 2.5TB LUN is presented to both nodes via DM-Multipath. I have carved
>> out a single partition (using the entire LUN) and formatted it with OCFS2:
>>
>> # mkfs.ocfs2 -N 2 -L 'foofs' -T datafiles /dev/mapper/bams01p1
>>
>> Finally, the filesystem is mounted on both nodes with the following
>> options:
>>
>> # mount | grep bams01
>> /dev/mapper/bams01p1 on /foofs type ocfs2
>> (rw,_netdev,noatime,data=writeback,heartbeat=local)
>>
>> ----------
>>
>> When a single node is writing arbitrary data (i.e. dd(1) with /dev/zero
>> as input) to a large (say, 10 GB) file in /foofs, I see the expected
>> performance of ~850 MB/sec.
>>
>> If both nodes are concurrently writing large files full of zeros to
>> /foofs, performance drops way down to ~45 MB/sec. I experimented with
>> each node writing to the /foofs/test01/ and /foofs/test02/
>> subdirectories, respectively, and found that performance increased
>> slightly to a (still poor) ~65 MB/sec.
>>
>> ----------
>>
>> I understand from searching past mailing list threads that the culprit
>> is likely related to the negotiation of file locks, and to waiting for
>> data to be flushed to the journal / disk.
>>
>> My two questions are:
>>
>> 1. Does this dramatic write performance slowdown sound reasonable and
>> expected?
>>
>> 2. Are there any OCFS2-level steps I can take to improve this situation?
>>
>>
>> Thanks -
>>

--
Erik Schwartz <schwartz.eri...@gmail.com> | GPG key 14F1139B
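P.S. For anyone who wants to repeat the raw-device comparison: the test
boils down to running dd concurrently on each node against the same
/dev/sd* path, with different seek offsets so the two streams do not land
on the same blocks. The device name, sizes and offsets below are
illustrative only, not the exact values from my runs:

node 1:  # dd if=/dev/zero of=/dev/sdX bs=1M count=10240 oflag=direct
node 2:  # dd if=/dev/zero of=/dev/sdX bs=1M count=10240 seek=20480 oflag=direct

With bs=1M, count=10240 writes 10 GB per node, and seek=20480 starts the
second stream 20 GB into the device. Substituting oflag=sync for
oflag=direct gives the synchronous variant of the test. Needless to say,
this overwrites whatever is on the device, so only run it against a LUN you
are prepared to reformat.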