----- Original Message -----
From: "William Jojo" <[EMAIL PROTECTED]>
To: "Jeremy Allison" <[EMAIL PROTECTED]>
Cc: <[email protected]>; "Gerald (Jerry) Carter" <[EMAIL PROTECTED]>; "Andrew Tridgell" <[EMAIL PROTECTED]>; "Jeremy Allison" <[EMAIL PROTECTED]>
Sent: Tuesday, February 28, 2006 4:33 PM
Subject: Re: [Samba] hanging smbd(s) revisited

>
> ----- Original Message -----
> From: "Jeremy Allison" <[EMAIL PROTECTED]>
> To: "William Jojo" <[EMAIL PROTECTED]>
> Cc: <[email protected]>; "Gerald (Jerry) Carter" <[EMAIL PROTECTED]>;
> "Andrew Tridgell" <[EMAIL PROTECTED]>; "Jeremy Allison" <[EMAIL PROTECTED]>
> Sent: Tuesday, February 28, 2006 3:25 PM
> Subject: Re: [Samba] hanging smbd(s) revisited
>
>
> > On Tue, Feb 28, 2006 at 01:30:40PM -0500, William Jojo wrote:
> > >
> > > So we've gone back to 3.0.20 and we're stable again. I should indicate
> > > that it's 3.0.20 with patches 9484, 9481 and 9456 to fix Win98 dir loop,
> > > Excel shared workbook and ACLs (not necessarily in that order).
> > >
> > > Since the problem manifests in the filesystem where our Samba install
> > > is, and it appears to be a tdb (namely locking.tdb for fd=15, but can't
> > > identify the fd=3 that spins unmercifully), I'm wondering if *maybe* it
> > > could be the "Fix for tdb clear-if-first race condition." or some other
> > > tdb change after 3.0.20 that traded one bug for another? I'm guessing... :-)
> >
> > Identifying that fd would be really useful.
>
> Ok, dug it up. This is the IBM info.
>
>
> ----- Original Message -----
> From: Robert Elias
> To: [EMAIL PROTECTED]
> Sent: Monday, February 27, 2006 12:30 PM
> Subject: Pmr#47402,180
>
>
> Bill,
>
> Thank you for your patience while I work through your questions. I ran this
> issue by our level 3 performance team and received the following input.
>
> The file in question is inode 12363 in /samba. Use 'find /samba -inum 12363'
> to determine the file name.
>
> I ran this by the Samba team members that work for IBM and they suggested
> the following:
>
> As a long shot, I suggest that you have him run tdbtorture (a file i/o
> testcase) from the samba source tree, as that does a simulation of the
> locking that Samba does and would show if we have a bug in AIX locking.
>
> Your comments or thoughts?
>
> Thanks,
>
> Robert Elias
> AIX Duty Manager
> IBM Integrated Technology Services
> 214-257-9292 - T/L 972
>
>
> [storage:/samba/3.0.21b] # find /samba -inum 12363
> /samba/3.0.21b/var/locks/locking.tdb
>
>
> > >
> > > We are going to start moving to 20a, then 20b, then to 21 then back to
> > > 21a where we started (21b did it too, haven't tried 21c yet) after
> > > another day or two of 3.0.20 to make sure we're not losing our mind.
> >
> > I've looked over the logic for the acquiring/release of the lock
> > for the locking.tdb in the 3.0.21c release code - I can't see any possible
> > paths, error or otherwise, where the lock can be left live on a
> > record. I'll keep looking though. When it's spinning, what is the errno
> > that the fcntl call returns?
> >
>
> What appears to happen is pid 266946 is exiting (exited?) and some kind of
> deadlock has occurred, which shows the following in filemon.sum from the
> perfpmr that IBM had me run during the event.
>
>
> <snip>
> 9603204 hooks processed (incl. 2108 utility)
> 60.013 secs in measured interval
> Cpu utilization: 42.9%
>
> Most Active Files
> ------------------------------------------------------------------------
>   #MBs  #opns   #rds   #wrs  file                 volume:inode
> ------------------------------------------------------------------------
>  230.1      0  29492      0  pid=266946_fd=3
>   43.3      0   1588    129  pid=240270_fd=5
> </snip>
>
>
> My question to IBM was how can this happen? The above inode number is what
> was provided to me yesterday.
>
> Since moving to 3.0.20 the problem has subsided, I'm back here and not
> bugging IBM at the moment. :-|
>
> Whatever else I can get you, just say the word. :-)
>
> Do you agree with us to step to 20a, 20b ... ?
>

We've survived two days on 3.0.20, and our load is even higher than when we started. We have over 1000 smbd's running on this machine and it's not even breaking a sweat.

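On the question of which errno the fcntl call returns: one way to look at that from outside smbd is a small standalone probe. The following is only a sketch under stated assumptions. `probe_record_lock` is a hypothetical helper, not part of Samba or tdb, and the byte offset and length are placeholders rather than the ranges tdb actually locks. It tries a non-blocking exclusive POSIX lock on a byte range and reports the errno name if the attempt is refused (POSIX allows either EAGAIN or EACCES for a conflicting lock). Note that it briefly takes the lock when nobody else holds it, so pointing it at a live locking.tdb could disturb running smbd processes; a test file is the safer target.

```python
import errno
import fcntl
import os


def probe_record_lock(path, offset=0, length=1):
    """Try a non-blocking exclusive POSIX lock on one byte range.

    Hypothetical helper for illustration only. Returns None if the lock
    was obtained (it is released again immediately), otherwise the
    symbolic errno name reported by the failed lock call.
    """
    fd = os.open(path, os.O_RDWR)  # write access is needed for an exclusive lock
    try:
        try:
            # LOCK_NB makes the call fail at once instead of blocking
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                        length, offset, os.SEEK_SET)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        # Lock was free: release it straight away
        fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)
        return None
    finally:
        # Closing the descriptor also drops any lock this process still holds
        os.close(fd)
```

Called as `probe_record_lock("/path/to/some/test.tdb")`, a return value of None means no other process held a conflicting lock on that byte at that moment. This only samples a single instant, so it would not by itself explain a spin; tdbtorture remains the more thorough exercise of the locking paths.
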
Now additionally, looking through source/locking/locking.c, I notice that the diffs between 3.0.20, 20a and 20b show no changes. Then in 3.0.21 there's an invasive change. (locking/posix.c remains unchanged through 21b.)

I'm pretty certain that 20a and 20b will be fine for us based on what I see, but I'm still learning (and comprehending :-) ) these changes, looking for a smoking gun.

Tomorrow I will put 20b (skipping 20a) in place on this server.

I'm opening a bug because I think this one is real and load related.

Cheers,

Bill

> Cheers,
>
> Bill
>
> >
> > Jeremy.

--
To unsubscribe from this list go to the following URL and read the
instructions: https://lists.samba.org/mailman/listinfo/samba
