Mark, did you try using the interconnect to make an additional 'ocfs driver <-> disks'
communication channel? It could help in some cases, such as 'one node experiences I/O
errors while the others do not', and could help detect the case 'overall I/O problem, so
there is no sense in fencing until at least some nodes restore their disk connection'
(see the rough sketch below).
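
Purely as an illustration of the decision such a channel would make possible - the
function and parameter names below are invented and not based on any existing o2cb
code - a node that loses its disk would only self-fence if its peers still see the
storage:

#include <stdbool.h>

/* Hypothetical policy: each node learns its peers' disk status over the
 * interconnect and only fences itself when it is the odd one out. */
bool should_self_fence(bool local_disk_ok, const bool peer_disk_ok[],
                       int nr_peers)
{
    int i;

    if (local_disk_ok)
        return false;           /* local storage is fine, keep running */

    for (i = 0; i < nr_peers; i++)
        if (peer_disk_ok[i])
            return true;        /* only this node lost the disk: fence */

    /* Every node lost the disk: fencing gains nothing, better to wait
     * until at least some nodes restore their disk connection. */
    return false;
}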
Note that this is all low-level I/O. Of course, there can be a problem because of the
system caching mechanism, but if such a bypass can be implemented, it could increase
ocfs2 stability dramatically (and eliminate most of the well-known undesirable 'self
fencing' cases). In addition, I wonder whether drbd can be used for anything other than
testing - it makes a simultaneous _interconnect glitch and disk desynchronization_ very
possible, opening up many failure scenarios.

----- Original Message -----
From: "Mark Fasheh" <[EMAIL PROTECTED]>
To: "Philipp Wehrheim" <[EMAIL PROTECTED]>
Cc: <[email protected]>
Sent: Wednesday, June 27, 2007 11:12 AM
Subject: Re: [Ocfs2-users] OCFS2 benchmark slow concurrent write

> On Wed, Jun 27, 2007 at 02:11:00PM +0200, Philipp Wehrheim wrote:
> > I did some more benchmarks writing 99 chars (one line) into a file,
> > but the time this takes varies between 40 us and 1 s, so this is way
> > too variable for our applications.
> > The average for writing one line is 175 ms, which is OK.
> >
> > The benchmark was done with different kernels:
> >
> > - with and without preemption
> > - with 100 and 1000 Hz clock frequency
> > - etc.
>
> As Sunil noted, I'd suggest you try a gigabit connection.
>
> But the purpose of my e-mail is really to answer your second question
> below :)
>
> > I also mailed the drbd list and the response was that it is probably
> > a dlm issue / the dlm is the bottleneck.
> > Furthermore, the recommendation was that I should neither write
> > concurrently into one file nor into two files in one directory.
> >
> > Is this right?
>
> If you're looking to gain maximum performance, yes. Either that, or
> you do writes with O_DIRECT, in which case the file system will avoid
> some lock pinging. Metadata and buffered data operations always have
> to be cache coherent, though.
>
> Basically, transferring control of a shared resource between nodes is
> expensive - it involves some combination of journal flushes, data
> writeout, and cache invalidation, depending on what level of control
> is being given up. For what it's worth, this is a bottleneck in most
> symmetric shared-disk file systems (at least the ones I've looked at
> in detail). Ocfs2 tries very hard to keep operations node local - we
> have a purely node-local allocation cache as well as a deallocation
> cache and many per-node system files. Ultimately, though, individual
> user files and directories have to be seen by all nodes.
>
> So if you're at the stage where you can design the layout of your
> application, I'd recommend that the performance-sensitive components
> avoid concurrent buffered writes to shared files or rapid creation of
> files in shared directories. The case where there is one writer and
> many readers performs much better, but will still be slower than each
> node chugging away in its own area.
> --Mark
>
> --
> Mark Fasheh
> Senior Software Developer, Oracle
> [EMAIL PROTECTED]
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
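
As an illustration of the O_DIRECT suggestion in Mark's message above, here is a minimal
sketch of an aligned direct write on Linux. The mount point, file name and the 4096-byte
alignment are illustrative assumptions; the real alignment requirement depends on the
device's logical block size.

/* Minimal sketch of an O_DIRECT write; names and sizes are illustrative. */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 4096              /* assumed logical block size */

int main(void)
{
    const char line[] = "one 99-character line of application data\n";
    void *buf;
    int fd;

    /* O_DIRECT bypasses the page cache, so buffer, offset and length
     * must all be aligned (typically to the logical block size). */
    fd = open("/mnt/ocfs2/shared.dat", O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (posix_memalign(&buf, ALIGN, ALIGN)) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }
    memset(buf, 0, ALIGN);
    memcpy(buf, line, sizeof(line) - 1);

    /* Write one full aligned block containing the (zero-padded) line. */
    if (pwrite(fd, buf, ALIGN, 0) != (ssize_t)ALIGN)
        perror("pwrite");

    free(buf);
    close(fd);
    return 0;
}

Because O_DIRECT bypasses the page cache, the buffered-data cache-coherency traffic
between nodes can be avoided; metadata operations still have to stay cluster-coherent,
as Mark notes, and his layout advice (keep performance-sensitive work in per-node files
and directories where possible) still applies.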
