One thing that has been a concern is that duplicating our environment
is difficult because of the size of the installation (4 nodes, 12
terabytes, fiber SAN, etc.).

I decided to take some time this weekend to dust off a machine from my
collection and see if I could replicate the issue in the lab on a much
smaller and easier to duplicate platform.

I am happy to report that I have been successful.

Replicating the issue takes nothing more than a base SLES10 install on
standard hardware with a small ocfs2 partition.

Here are the steps:

1) Build a SLES10 server (I used the SLES10 Evaluation DVD).
- Create 3 hard drive partitions:
  /dev/hda1 100M  ext3 /boot
  /dev/hda2 15G   ext3 /
  /dev/hda3 97G   (primary partition, type Linux, leave unmounted)

- Apply all online updates.
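Before step 3 formats the third partition, it is worth confirming that it really is unmounted. A minimal sketch: is_mounted is a hypothetical helper, not part of any ocfs2 tool, and it assumes /proc/mounts lists the device under the same path you pass in (it may not if the device was mounted through a symlink or label):

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if DEV appears as the source
# field of any entry in /proc/mounts, fail otherwise.
# Assumes the device is listed under the exact path passed in.
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' /proc/mounts
}
```

For example, `is_mounted /dev/hda3 && echo "refusing: /dev/hda3 is mounted"`.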

2) Use ocfs2console to create a single-node cluster (local machine only).
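ocfs2console stores the cluster layout in /etc/ocfs2/cluster.conf. On a box without X, a hand-written single-node file along these lines should be equivalent. This is a sketch under assumptions: cluster_conf is a hypothetical helper, the cluster name "ocfs2", port 7777 and the IP address are placeholders to adjust, and the node name is assumed to need to match the machine's hostname:

```shell
#!/bin/bash
# Hypothetical helper: print a single-node cluster.conf to stdout.
# Arguments: NODE_NAME (assumed to need to match `hostname`) and NODE_IP.
# Cluster name "ocfs2" and port 7777 are assumed defaults.
cluster_conf() {
    local node="$1" ip="$2"
    printf 'node:\n'
    printf '\tip_port = 7777\n'
    printf '\tip_address = %s\n' "$ip"
    printf '\tnumber = 0\n'
    printf '\tname = %s\n' "$node"
    printf '\tcluster = ocfs2\n'
    printf '\n'
    printf 'cluster:\n'
    printf '\tnode_count = 1\n'
    printf '\tname = ocfs2\n'
}
```

Usage, with a placeholder address: `cluster_conf "$(hostname)" 192.168.1.10 > /etc/ocfs2/cluster.conf`.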

3) Run these commands to get ocfs2 running:

# /etc/init.d/o2cb configure

Accept all the defaults (this step is probably not required).

# /etc/init.d/o2cb force-reload

Ensure there are no errors and then:

# mkfs.ocfs2 /dev/hda3
# mkdir /data
# mount -t ocfs2 /dev/hda3 /data
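Step 3 can be wrapped in a function that stops at the first error instead of carrying on. This is only a sketch: setup_ocfs2 is a hypothetical name, the device must be passed explicitly (/dev/hda3 in step 3), and mkfs.ocfs2 destroys whatever is on that device:

```shell
#!/bin/bash
# Hypothetical wrapper for step 3: reload o2cb, format DEV, mount it.
# Usage: setup_ocfs2 BLOCK_DEVICE [MOUNTPOINT]   (MOUNTPOINT defaults to /data)
# WARNING: mkfs.ocfs2 erases BLOCK_DEVICE.
setup_ocfs2() {
    local dev="$1" mnt="${2:-/data}"
    # Refuse to continue unless DEV is an existing block device
    if [ -z "$dev" ] || [ ! -b "$dev" ]; then
        echo "usage: setup_ocfs2 BLOCK_DEVICE [MOUNTPOINT]" >&2
        return 1
    fi
    /etc/init.d/o2cb force-reload || return 1
    mkfs.ocfs2 "$dev" || return 1
    mkdir -p "$mnt" || return 1
    mount -t ocfs2 "$dev" "$mnt"
}
```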

4) Create the following script and make it executable:

# joe /root/bin/populateocfs.sh

--- cut ---

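#!/bin/bash
# Create a 1 MB file of random data to copy repeatedly
dd if=/dev/urandom of=/tmp/1Mfile count=1024 bs=1024
# Copy it into the ocfs2 mount 1000 times, each under a new name
COUNT1=0
while [ "$COUNT1" -lt 1000 ]
do
    cp -v /tmp/1Mfile "/data/$COUNT1.1Mfile"
    COUNT1=$((COUNT1 + 1))
done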

--- cut ---

# chmod 700 /root/bin/populateocfs.sh
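If you want to vary the number of files or the target directory, the same loop can be written as a function. A sketch only: populate and its arguments are hypothetical, and the defaults reproduce the script above (1000 files of 1 MB in /data):

```shell
#!/bin/bash
# Hypothetical parameterized version of populateocfs.sh.
# Usage: populate [TARGET_DIR] [COUNT]   (defaults: /data and 1000)
populate() {
    local target="${1:-/data}" count="${2:-1000}" src i
    src="$(mktemp)" || return 1
    # One 1 MB file of random data, copied COUNT times under distinct names
    if ! dd if=/dev/urandom of="$src" bs=1024 count=1024 2>/dev/null; then
        rm -f "$src"
        return 1
    fi
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! cp "$src" "$target/$i.1Mfile"; then
            rm -f "$src"
            return 1
        fi
        i=$((i + 1))
    done
    rm -f "$src"
}
```

For example, `populate /data 250` stops a little past the point where the original report ran out of RAM.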

5) In a separate shell, I recommend running:

# vmstat 1

so you can watch free memory plummet.
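If you want to see which bucket the memory moves into, the counters can also be read straight from /proc/meminfo. Sunil's explanation in the quoted thread is that the memory is held in the kernel's inode cache; inode caches are slab allocations, so that growth should show up under Slab. A minimal sketch with hypothetical helper names:

```shell
#!/bin/bash
# Read one field (in kB) from /proc/meminfo, e.g. meminfo_kb MemFree
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' /proc/meminfo
}

# Print MemFree, Cached and Slab once per second.
# Usage: watch_mem [SAMPLES]   (0 or omitted = run until interrupted)
watch_mem() {
    local n="${1:-0}" i=0
    while [ "$n" -eq 0 ] || [ "$i" -lt "$n" ]; do
        printf '%s MemFree=%s kB Cached=%s kB Slab=%s kB\n' "$(date +%T)" \
            "$(meminfo_kb MemFree)" "$(meminfo_kb Cached)" "$(meminfo_kb Slab)"
        i=$((i + 1))
        sleep 1
    done
}
```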

6) Now invoke the script:

# /root/bin/populateocfs.sh

---------------

That's it. Free memory will drop fast. On my test system I never got
past about 200 files before I ran out of RAM.

There are two ways to recover the trapped memory: either tell the
kernel to drop its caches, or delete all the files in the /data/
directory:

1) # echo 3 > /proc/sys/vm/drop_caches

or

2) # rm -R /data/[0-9]*
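The first method can be narrowed down. Writing 1 to drop_caches frees the page cache, 2 frees dentries and inodes, and 3 frees both; only clean objects are dropped, so running sync first helps. If writing 2 alone gives back most of the memory, that would fit Sunil's point that the memory sits in the inode cache. A sketch with hypothetical helper names; the optional KNOB argument exists only so the function can be tried without root:

```shell
#!/bin/bash
# Read MemFree (kB) from /proc/meminfo
memfree_kb() {
    awk '$1 == "MemFree:" { print $2 }' /proc/meminfo
}

# drop_caches MODE [KNOB]
#   MODE 1 = page cache, 2 = dentries and inodes, 3 = both.
#   KNOB defaults to /proc/sys/vm/drop_caches (writing it needs root).
drop_caches() {
    local mode="$1" knob="${2:-/proc/sys/vm/drop_caches}"
    case "$mode" in
        1|2|3) ;;
        *) echo "drop_caches: MODE must be 1, 2 or 3" >&2; return 1 ;;
    esac
    sync    # only clean objects are dropped, so write dirty data out first
    echo "$mode" > "$knob" || return 1
}

# measure_drop MODE [KNOB]: report the change in MemFree around one drop
measure_drop() {
    local before after
    before="$(memfree_kb)"
    drop_caches "$1" "$2" || return 1
    after="$(memfree_kb)"
    echo "mode $1: MemFree $before kB -> $after kB ($((after - before)) kB freed)"
}
```

As root, run `measure_drop 2` after the script has eaten the memory, then `measure_drop 1`, and compare.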

-------------

Let me know if you have any questions.

John


On Fri, 2007-03-09 at 17:24 -0800, Sunil Mushran wrote:
> Not to add fuel to this thread, just wanted to mention that we are in
> the process of reimaging some of our test boxes and should be able
> to test John's issue sometime next week.
> 
> [EMAIL PROTECTED] wrote:
> > I think Sunil just wants to make the point that if something is urgent
> > and in production, it's good to get formal support. Bugs filed in
> > Bugzilla are certainly looked at, but might not get the same priority.
> > Nothing more.
> >
> > Note also that the guys work hard on making this product good, and they
> > do their very best to do it the right way. I must say that sometimes the
> > language is a bit harsh and disrespectful, to say the least, which is
> > never really appreciated. A little respect and more constructive
> > feedback usually go a very long way. Everyone is trying their best.
> >
> >
> >
> > -----Original Message-----
> > From: "Alexei_Roudnev" <[EMAIL PROTECTED]>
> > To: "Sunil Mushran" <[EMAIL PROTECTED]>; "John Lange" <[EMAIL PROTECTED]>
> > Cc: "ocfs2-users" <[email protected]>; "Lars Marowsky-Bree" 
> > <[EMAIL PROTECTED]>
> > Sent: 3/8/07 5:36 PM
> > Subject: Re: [Ocfs2-users] ocfs2 is still eating memory
> >
> > Sunil, you DON'T UNDERSTAND.
> >
> > They DON'T ASK for SUPPORT. They ask, "HOW CAN WE REPORT A BUG?"
> >
> > I have had the same problem many times: there IS NOT a simple way to
> > report a bug to Novell. No surprise that the systems are so buggy. For
> > beta versions, if you are not signed up as a beta tester, there is no
> > easy way to report a bug either.
> >
> > (Signing up as a beta tester means many obligations, and what if I am
> > only going to test a few components?)
> >
> > Right now I am testing SP1 Beta4 (I have access as a partner so that we
> > can test new software before it is released). I never asked for support,
> > but I have seen bugs many, many times, and each time I tried to report
> > one (a possible or definite bug), it was a headache. Take open-iscsi: it
> > definitely needs a few small improvements. I can test them in our setup,
> > and many other users can test them too, but we CAN'T REPORT them.
> >
> > (Just to list them:
> > - LVM is not run after iSCSI, so it doesn't see the open-iscsi disks. On
> > the other hand, since multiport support was dropped in the new iSCSI,
> > you can't use human-readable names from disk/by-path but must use
> > multipath disk IDs, so the only way to do it is LVM, but LVM is not run
> > after iSCSI;
> > - the documentation has numerous bugs and doesn't explain how to mount
> > iSCSI disks (SUSE dropped netfs and did not add anything in its place);
> > - a few actions require timeouts; for example, you should wait 5 - 10
> > seconds after discovery and before connecting.
> >
> > )
> >
> > They have the same problem: they tested a version, something doesn't
> > work, they call support (_We see a problem_), and get the response _you
> > don't have premium support, so we will not talk_. (Next time I see your
> > home burning, I'll call you and you'll say _don't telemarket me, hang on_
> > instead of _what's the matter? Oh, it's serious, let's look together_.)
> >
> > The sad thing is that many of these bugs are easy to fix, and the system
> > itself is excellent... but the quality is far from production grade, and
> > it keeps getting worse.
> >
> >
> > ----- Original Message ----- 
> > From: "Sunil Mushran" <[EMAIL PROTECTED]>
> > To: "John Lange" <[EMAIL PROTECTED]>
> > Cc: "ocfs2-users" <[email protected]>; "Lars Marowsky-Bree"
> > <[EMAIL PROTECTED]>
> > Sent: Thursday, March 08, 2007 4:37 PM
> > Subject: Re: [Ocfs2-users] ocfs2 is still eating memory
> >
> >
> >   
> >> If you are running a prod shop, you should look into buying support.
> >>
> >> John Lange wrote:
> >>     
> >>> On Mon, 2007-03-05 at 13:46 -0800, Sunil Mushran wrote:
> >>>
> >>>       
> >>>> Well, kswapd is supposed to flush the caches. As in, the vm
> >>>> controls the lifetime of the inodes in the inode_cache not ocfs2.
> >>>>
> >>>> All ocfs2 can do is free the memory associated with the inode when
> >>>> asked to. And it does that when you manually flush the cache. Qs is
> >>>> why the vm is not doing it on its own. (fwiw, you are on a beta
> >>>>         
> > kernel.)
> >   
> >>> We are using beta kernels in an attempt to solve this problem. As
> >>> everyone knows, the most recent official SUSE kernel (2.6.16.21-0.25, I
> >>> believe?) completely broke ocfs2. Downgrading to 2.6.16.21-0.15 solves
> >>> that problem but the memory issue remains.
> >>>
> >>> So as far as I am aware, there is no SUSE kernel that works with ocfs2
> >>> which is where we find ourselves today.
> >>>
> >>> I just upgraded to the latest KOTD:
> >>>
> >>> 2.6.16.42-SLES10_SP1_BRANCH_20070307114604-smp
> >>>
> >>> And still, when running ocfs2, all ram gets consumed.
> >>>
> >>> Right now Novell is playing the "you don't have premium support" game so
> >>> where should I report this bug?
> >>>
> >>> Regards,
> >>>
> >>> John Lange
> >>>
> >>>
> >>>
> >>>       
> >> _______________________________________________
> >> Ocfs2-users mailing list
> >> [email protected]
> >> http://oss.oracle.com/mailman/listinfo/ocfs2-users
> >>
> >>     
> >
> >
> >
> >
> >
> >   
> 

