Hello!

Thanks for the fast reply.

Kurt Hackel wrote:
Hi,

Alexander Finger wrote:
Hello,

My problem: when I create a large number of small files on any node of my ocfs2 cluster, after some time the OOM killer starts killing processes because LowMem runs low. All error messages and memory stats are at the end of this mail.

This is a known issue that is currently being fixed for the next scheduled release. At this time, once a node masters a lock resource (from the filesystem's point of view, this happens if the node is the first node to access that file), it cannot drop mastery of that resource until it unmounts. The fix is nontrivial, but I'm almost done with it. Once the fix is done, it will need extensive testing.
This is very bad... I have already prepared the whole cluster (9 nodes) and thought I was "close" to deployment. During functional testing the cluster's behavior was "normal" (bonnie and iozone reported good results) after setting the scheduler to deadline and doing other fine-tuning, but it crashed within minutes when I tried to copy our production data onto it. It takes only minutes to crash the cluster because I need it to hold about 10 million files (each about 3-5 kB).

So I would suggest you send your fix to me for testing... once it's done. ;-) ... please!

The only way to avoid this behavior is to unmount the ocfs2 partition after some disk operations, because LowMem (LowFree) stays low until unmount. I searched the web and found many descriptions of this error, but no answer on how to handle this problem.
Correct. The only current workaround is to unmount, or to attempt to spread the lock resources out across all the nodes of the cluster (which may be impossible in your usage case).
Wonderful, how can I spread the resources? I did not notice such an option in the documentation. The ocfs2 volume is needed "just" to store a fast-changing and very large directory tree containing metadata files (XML). I do not use it (at this point) for databases or anything else. The cluster has a size of ~290 GB. If you need further information to judge whether spreading the lock resources across the other nodes would help me, I'll be happy to send it to you.
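
If mastery really goes to whichever node touches a file first, as described above, then my guess is that "spreading" would mean splitting the copy job so that each node creates only its own share of the tree. Below is a minimal sketch of that idea, not a tested procedure. The node names are hypothetical, and partitioning by top-level directory is my own assumption, not something taken from the ocfs2 documentation.

```python
import zlib

# Hypothetical node names; replace with the real cluster members.
NODES = ["node1", "node2", "node3"]

def owner_node(path, nodes=NODES):
    """Pick the node that should create (and so first access) this path."""
    # Assumption: partition by top-level directory so one subtree stays on one node.
    top_level = path.strip("/").split("/")[0]
    # crc32 is stable across machines and runs, unlike Python's built-in hash().
    return nodes[zlib.crc32(top_level.encode("utf-8")) % len(nodes)]

def my_share(paths, me, nodes=NODES):
    """Return only the paths this node is responsible for copying."""
    return [p for p in paths if owner_node(p, nodes) == me]
```

Each node would then run the copy only for `my_share(all_paths, its_own_name)`. Whether this spreads the lock resources evenly enough depends on how the tree is laid out, so it may not help in my case.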


Best regards,

Alexander

--
Fotofinder GmbH         USt-IdNr. DE812854514
Software Entwicklung    Web: http://www.fotofinder.net/
Potsdamer Str. 96       Tel: +49 30 25792890
10785 Berlin            Fax: +49 30 257928999

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
