Hello Mark,

...snip..

> > SLES10, with a kernel version around 2.6.16.x, used the blocking approach,
> > i.e. down_read(), which has a potential deadlock between the page lock and
> > ip_alloc_sem when one node gets the cluster lock and does both writing and
> > reading on the same file. This deadlock was fixed by this commit:
>
> You are correct here - the change was introduced to solve a deadlock between
> page lock and ip_alloc_sem(). Basically, ->readpage is going to be called
> with the page lock held and we need to be aware of that.

...snip..

> > But somehow, with this patch, performance in this scenario becomes very bad.
> > I don't know how this could happen, because the reading node has only one
> > thread reading the shared file, so down_read_trylock() should always get
> > ip_alloc_sem successfully, right? If not, who else may be racing for
> > ip_alloc_sem?
>
> Hmm, there's only one thread and it can't get the lock? Any chance you might

No, it can always get the lock in this case. Sorry, my earlier test result was wrong. There are probably two main factors:
1. A non-isolated test environment, including the nodes, the network, and the shared disk;
2. The customer's test program, which sleeps for 1s after each ~1M read/write, so the overlap between reads and writes on the two nodes is random; the shorter the overlap, the better the performance looks.

Sorry again for taking up your time.
--Eric

> put some debug prints around where we acquire ip_alloc_sem? It would be
> interesting to see where it gets taken to prevent this from happening.
> --Mark
>
> --
> Mark Fasheh
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel@oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel