tstile lockup (was: Serious WAPL performance problems)

2012-10-31 Thread Edgar Fuß
Invoke crash(8), then just perform ps and t/a address on each LWP which seems to be stuck (on tstile or elsewhere). So it seems I can sort of lock up the machine for minutes with a simple dd if=/dev/zero of=/dev/dk14 bs=64k count=1000. (In case it matters, dk14 is on a RAID5 on 4+1 mpt(4)

Re: Serious WAPL performance problems

2012-10-30 Thread Stephan
Just for the log: LFS, EXT2 and TMPFS are all comparably slow when creating files. I created a small C program which counts from 1 to 3 internally and opens a corresponding file with O_CREAT. This takes half the time (9 secs.). The same program takes 0.09 secs. when just opening these files.

Re: Serious WAPL performance problems

2012-10-29 Thread Roger Pau Monné
On 25/10/12 14:54, Stephan wrote: 2012/10/25 Edgar Fuß e...@math.uni-bonn.de: Now this is getting weird. I have retried the experiment with neither softdep nor log, i.e. with a plain old FFS, and that performs as well as or even outperforms WAPL. With both a 5.1_STABLE and a 6.0_RELEASE

Re: Serious WAPL performance problems

2012-10-29 Thread Stephan
I just did this on my 6.99.14 system and it takes less than 1s: I also installed 6.99.14 in VirtualBox and it takes 40 seconds. Also I wonder which time(1) you use, as that looks more like FreeBSD's :)

Re: Serious WAPL performance problems

2012-10-29 Thread Stephan
Hi Roger, it is funny that someone from Citrix is responding to this NetBSD topic; I maintain a very large Citrix installation including XenServer, XenApp and Provisioning Services. I've added some debugging to xbd_xenbus, to see if we were issuing a lot of flushes or requests, and I've found out

Re: Serious WAPL performance problems

2012-10-29 Thread Iain Hibbert
On Mon, 29 Oct 2012, Roger Pau Monné wrote: time seq 1 3 | xargs touch I did this inside of a NetBSD DomU, and the time to perform the operation when using FFSv2 and WAPL is between 27-28s. Also, when the operation finishes, and I get the time results, it takes some time for the shell

Re: Serious WAPL performance problems

2012-10-27 Thread Mindaugas Rasiukevicius
Reinoud Zandijk rein...@netbsd.org wrote: Hi! On Wed, Oct 24, 2012 at 12:07:21AM +0100, Mindaugas Rasiukevicius wrote: Easy to diagnose problems? Plain false. The lock naming you are talking about would give no *more* information than it is a vnode lock, and one can guess already that it

Re: Serious WAPL performance problems

2012-10-26 Thread Reinoud Zandijk
Hi! On Wed, Oct 24, 2012 at 12:07:21AM +0100, Mindaugas Rasiukevicius wrote: Easy to diagnose problems? Plain false. The lock naming you are talking about would give no *more* information than it is a vnode lock, and one can guess already that it is most likely the case here (what a

Re: Serious WAPL performance problems

2012-10-25 Thread Edgar Fuß
Now this is getting weird. I have retried the experiment with neither softdep nor log, i.e. with a plain old FFS, and that performs as well as or even outperforms WAPL. With both a 5.1_STABLE and a 6.0_RELEASE kernel, on a 16k fsbsize FFSv2, the svn update takes around 5 seconds with either

Re: Serious WAPL performance problems

2012-10-25 Thread Stephan
2012/10/25 Edgar Fuß e...@math.uni-bonn.de: Now this is getting weird. I have retried the experiment with neither softdep nor log, i.e. with a plain old FFS, and that performs as well as or even outperforms WAPL. With both a 5.1_STABLE and a 6.0_RELEASE kernel, on a 16k fsbsize FFSv2, the svn

Re: Serious WAPL performance problems

2012-10-25 Thread Paul Goyette
On Thu, 25 Oct 2012, Stephan wrote: I always found FFS to be slow when creating or deleting many files. For example, on 6.0 with FFSv2 and WAPBL it took 20 sec. to complete this: time seq 1 3 | xargs touch I just did this on my 6.99.14 system and it takes less than 1s: # uname -rs

Re: Serious WAPL performance problems

2012-10-25 Thread Stephan
2012/10/25 Paul Goyette p...@whooppee.com: On Thu, 25 Oct 2012, Stephan wrote: I always found FFS to be slow when creating or deleting many files. For example, on 6.0 with FFSv2 and WAPBL it took 20 sec. to complete this: time seq 1 3 | xargs touch I just did this on my 6.99.14

Re: Serious WAPL performance problems

2012-10-25 Thread Edgar Fuß
I just did this on my 6.99.14 system and it takes less than 1s: Did you run iostat -D to examine whether your discs got saturated some seconds after the command finished?

Re: Serious WAPL performance problems

2012-10-25 Thread Edgar Fuß
I tried the same thing: Creating took 18.6 seconds, deleting 2.4s. Throughput (dd) is 17.5MB/s. Perhaps it's significant that the 2,500 .lock files that svn update creates are scattered around the directory tree?

Re: Serious WAPL performance problems

2012-10-24 Thread Edgar Fuß
I suggest trying the latest 5.1 sources Do I really need to build from source or will 5.1.2 suffice?

Re: Serious WAPL performance problems

2012-10-24 Thread Stephen Borrill
On Wed, 24 Oct 2012, Edgar Fuß wrote: I suggest trying the latest 5.1 sources Do I really need to build from source or will 5.1.2 suffice? 5.1.2 isn't enough. But use a daily build and then you don't need to build yourself: http://nyftp.netbsd.org/pub/NetBSD-daily/netbsd-5/ -- Stephen

Re: Serious WAPL performance problems

2012-10-24 Thread Edgar Fuß
As far as I remember My memory was wrong. Normally, I have one LWP waiting on select and four on nfsd. The select one never changes. Three of the nfsd ones also don't change in my experiment. At 15:13:34, I get one tstile. At 15:13:44, I get "nfs: server not responding" (from a Linux client). At 15:13:47, tstile changes to

Re: Serious WAPL performance problems

2012-10-24 Thread Edgar Fuß
Now that we have crash(8) it should not be harder than invoking ps(1). So what shall I examine during the time the discs are saturated?

Re: Serious WAPL performance problems

2012-10-24 Thread Brian Buhrow
Hello. As Stephen said, you want to get the daily builds or build from the NetBSD-5 branch itself. That's because the patches that allow you to set the disk strategy for raid sets went into the NetBSD-5 branch yesterday. -Brian

Re: Serious WAPL performance problems

2012-10-24 Thread Edgar Fuß
I suggest trying the latest 5.1 sources So I set up a test machine with the latest 5.1_STABLE snapshot. It differs from the real server in a couple of points[*], but exhibits similar behaviour. On a 16k fsbsize FFSv2, 5.1 softdep takes 1.25s for the svn update. Ten seconds later, there is a two

Serious WAPL performance problems

2012-10-23 Thread Edgar Fuß
We are facing some very serious file system performance problems on 6.0 which we attribute to WAPL. Comparable 4.0.1 machines with softdep are performing much, much better. Having essentially skipped 5, I cannot easily compare log to softdep on identical hardware. The most prominent way to

Re: Serious WAPL performance problems

2012-10-23 Thread Brian Buhrow
On Oct 23, 6:51pm, Edgar Fuß wrote: } Subject: Serious WAPL performance problems } We are facing some very serious file system performance problems on 6.0 which } we attribute to WAPL. Comparable 4.0.1 machines with softdep are performing } much, much better. Having

Re: Serious WAPL performance problems

2012-10-23 Thread Edgar Fuß
the output of ps -lax on the NFS server during the 18-20 second window As far as I remember (you need the s option, too), the main nfsd thread is on select, one subthread on biowait or biolock and the others on tstile.

Re: Serious WAPL performance problems

2012-10-23 Thread Edgar Fuß
Also look for sync calls. There are none. To verify this, try setting vfs.wapbl.flush_disk_cache=0 on the server. That doesn't change the behaviour. Acting locally, I still get 100% busy discs for 18 seconds some seconds after the svn update finishes.

Re: Serious WAPL performance problems

2012-10-23 Thread David Holland
On Tue, Oct 23, 2012 at 07:53:28PM +0200, Edgar Fuß wrote: the output of ps -lax on the NFS server during the 18-20 second window As far as I remember (you need the s option, too), the main nfsd thread is on select, one subthread on biowait or biolock and the others on tstile. It would

Re: Serious WAPL performance problems

2012-10-23 Thread Mindaugas Rasiukevicius
David Holland dholland-t...@netbsd.org wrote: On Tue, Oct 23, 2012 at 07:53:28PM +0200, Edgar Fuß wrote: the output of ps -lax on the NFS server during the 18-20 second window As far as I remember (you need the s option, too), the main nfsd thread is on select, one subthread on

Re: Serious WAPL performance problems

2012-10-23 Thread Brian Buhrow
On Oct 24, 12:07am, Mindaugas Rasiukevicius wrote: } Subject: Re: Serious WAPL performance problems } David Holland dholland-t...@netbsd.org wrote: } On Tue, Oct 23, 2012 at 07:53:28PM +0200, Edgar Fuß wrote: } the output of ps -lax on the NFS server during the 18-20 second } window

Re: Serious WAPL performance problems

2012-10-23 Thread Mindaugas Rasiukevicius
Hello Brian, Brian Buhrow buh...@nfbcal.org wrote: Hello. I think you two are talking past each other. While it's true that having a lock name isn't necessarily enough information to diagnose a problem, it's a lot better than having nothing. I've worked on systems where all you could