The fix is in testing. We are aiming to release 1.2.4
sometime in late September.

Peter McMahon wrote:
Kurt,

We are facing the exact same problem. We use OCFS2
(ocfs2-2.6.9-34.ELsmp-1.2.3-1) for a shared APPL_TOP in
an 11i environment.

We hit the problem when trying to do backups!

Is there a timeframe for the release in which this OOM
problem will be resolved?

Any advice on the best approach for backing up an OCFS2
volume with hundreds of thousands of files, such as an
APPL_TOP (or multiple APPL_TOPs, as in our case)?

TIA

Peter




--- Alexander Finger <[EMAIL PROTECTED]> wrote:

Hello!

Thanks for the fast reply.

Kurt Hackel wrote:
Hi,

Alexander Finger wrote:
Hello,

My problem: when I create a large number of small files on
any node of my ocfs2 cluster, after some time the OOM killer
starts killing processes because LowMem runs low. All error
messages and memory stats are at the end of this mail.
This is a known issue that is currently being fixed for the
next scheduled release. At this time, once a node masters a
lock resource (from the filesystem, this would happen if the
node were the first node to access that file), it cannot drop
the mastery of that resource until it unmounts. The fix is
nontrivial, but I'm almost done with it. Once the fix is done,
it will need extensive testing.
This is very bad... I have already prepared the whole cluster
(9 nodes) and thought I was "close" to deployment. During
functional testing, after setting the I/O scheduler to deadline
and doing other fine tuning, the cluster's behavior was "normal"
(bonnie and iozone reported good results), but it crashed within
minutes when I tried to copy our production data onto it. It
takes just minutes to crash the cluster, because I need it to
hold about 10 million files (each about 3-5 kB).

So I would suggest you send your fix to me for testing once
it's done. ;-) Please!
The only way to avoid this behavior is to unmount the ocfs2
partition after some disk operations, because LowMem (LowFree)
stays low until unmount. I searched the web and found many
descriptions of this error, but no answer on how to handle
this problem.
Correct. The only current workaround is to unmount, or to
attempt to spread the lock resources out across all the nodes
of the cluster (which may be impossible in your use case).
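Since unmounting is the only current workaround, it can help to watch LowFree and unmount (or pause the copy) before the OOM killer fires. A minimal Python sketch, assuming a 32-bit highmem kernel that reports `LowTotal` and `LowFree` in /proc/meminfo; the function names and the 10% threshold are hypothetical choices, not part of OCFS2:

```python
import re
from typing import Dict, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        # Field names may contain parentheses, e.g. "Active(anon)".
        match = re.match(r"^([\w()]+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def low_free_ratio(values: Dict[str, int]) -> Optional[float]:
    """Fraction of low memory still free, or None if the kernel has no LowMem split."""
    low_total = values.get("LowTotal")
    low_free = values.get("LowFree")
    if not low_total or low_free is None:
        return None
    return low_free / low_total


def lowmem_is_low(path: str = "/proc/meminfo", threshold: float = 0.10) -> bool:
    """Hypothetical check: True when LowFree drops below `threshold` of LowTotal."""
    with open(path) as f:
        ratio = low_free_ratio(parse_meminfo(f.read()))
    return ratio is not None and ratio < threshold
```

A cron job or loop could call `lowmem_is_low()` periodically and alert, or trigger a planned unmount, when it returns True. On kernels without a LowMem split (e.g. 64-bit), the fields are absent and the check simply returns False.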
Wonderful. How can I spread the resources? I did not find such
an option in the documentation. The ocfs2 volume is needed "just"
to store a fast-changing and very large directory tree containing
metadata files (XML). I do not use it (at this point) for
databases or anything else. The cluster has a size of ~290 GB.
If you need further information to decide whether spreading the
lock resources to other nodes may help me, I'll be happy to send
it to you.
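Going by Kurt's explanation that a lock resource is mastered by the first node that accesses the file, one way to spread mastery is to split the initial copy so that each of the 9 nodes creates a different share of the tree, instead of one node creating all 10 million files. A minimal Python sketch of such a split, assuming the work can be divided into independent units such as top-level directories; `owner_node` and `partition` are hypothetical helpers, not an OCFS2 tool or option:

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List


def owner_node(unit: str, nodes: List[str]) -> str:
    """Pick a node for a work unit using a stable hash (same answer on every node)."""
    digest = hashlib.sha256(unit.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:4], "big") % len(nodes)]


def partition(units: Iterable[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Group work units by the node that should create (and so first access) them."""
    plan: Dict[str, List[str]] = defaultdict(list)
    for unit in units:
        plan[owner_node(unit, nodes)].append(unit)
    return dict(plan)
```

Each node would run the same script with the same unit list and node names and copy only its own entry of `partition(units, nodes)`. A stable hash is used instead of Python's built-in `hash()`, which is randomized per process for strings. This only spreads the load: per Kurt, each node still keeps mastery of what it touched until it unmounts.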


Best regards,

Alexander

--
Fotofinder GmbH         VAT ID (USt-IdNr.): DE812854514
Software Development    Web: http://www.fotofinder.net/
Potsdamer Str. 96       Tel: +49 30 25792890
10785 Berlin            Fax: +49 30 257928999

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users