I encountered a serious Samba problem and want to publish the details for
public benefit.

A SLES 10 server running Samba 3.0.28 as domain controller, file server and
CUPS print server, after running uneventfully for 2 years, suddenly drops all
users. The load rapidly grows to about 250 and the server becomes unresponsive.
smbstatus reveals that every user has about 10 instances of smbd instead of
one. CPU (dual processor, dual core) utilization is very low (2%, mostly X and
top). A reboot clears the problem, but the issue returns every 30 minutes or
so. The logs (/var/log/messages and /var/log/samba/log.smbd) contain no useful
info. dmesg shows no errors. The system is not using any swap space. The server
passes all diagnostics possible. The system is fully patched. The 2 TB RAID
array attached via 320 SCSI comes back clean from fsck with zero errors, and so
does each of the local file system slices. The open-file limit (~202000) is not
reached; lsof says only 8800 files are open during the load spool-up.
50 irritated people idle.
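
A high load with an almost idle CPU usually points at processes stuck waiting
on I/O: on Linux, processes in uninterruptible sleep (state D) count toward the
load average even though they use no CPU. The following is a minimal sketch,
not something that was run on the server in question, of how the symptoms
above could be tallied. It assumes a Python 3 interpreter is available (a stock
SLES 10 install may not have one) and parses the output of
`ps -eo pid,stat,user,comm`; the function names are made up for illustration.

```python
import subprocess
from collections import Counter


def summarize_ps(ps_output, command="smbd"):
    """Count `command` processes per user and how many are in state D.

    `ps_output` is the text printed by `ps -eo pid,stat,user,comm`.
    Returns (Counter mapping user -> process count, number in state D).
    """
    per_user = Counter()
    blocked = 0
    for line in ps_output.splitlines():
        fields = line.split(None, 3)
        # Skip blank lines, short lines and the "PID STAT USER COMMAND" header.
        if len(fields) < 4 or fields[0] == "PID":
            continue
        _pid, stat, user, comm = fields
        if comm.strip() != command:
            continue
        per_user[user] += 1
        # The first character of STAT is the process state; D means
        # uninterruptible sleep.
        if stat.startswith("D"):
            blocked += 1
    return per_user, blocked


def read_ps():
    """Run ps on the local machine and return its output as text."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,user,comm"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout
```

Calling `summarize_ps(read_ps())` on the affected server would show whether
each user really owns several smbd processes and how many of them are blocked
in state D.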

Grasping at straws, we verify that all 50 Windows XP clients have the latest
virus signatures and do a deep scan of every machine. Two viruses are
discovered, but neither seems responsible.

A clue comes in from a user: "Every time I try to open a certain file, my
system freezes." Oh really...

I go to the subdirectory where the suspect file is located, via the Linux
console, and ls the directory: 9 files. ls -al gets Killed. After running
ls -al filename for each of the 9 files, I determine that 5 of these files are
badly corrupt. I perform an experiment: tell everyone to leave these files
alone and reboot the server. It runs happily for an hour with a load average of
0.05. I then ask one user to attempt to open one of the corrupt files, and
instantly all 50 smbd daemons go to uninterruptible sleep. Every WinXP client
immediately re-establishes its smbd session with the server, and these (all 50)
smbd sessions also die and go to heaven. This cycle repeats rapidly, sending
the load sky high with no CPU utilization to speak of.
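
The file-by-file check described above could be sketched roughly as follows.
This is an illustration under assumptions, not the procedure actually used: it
assumes Python 3 is available and that a plain lstat() is enough to trigger the
fault (ls -al may touch more than that). Each file is probed in a throwaway
child process so that a probe that gets killed or hangs does not take the
calling shell with it. The name `probe` and the 5 second timeout are arbitrary
choices.

```python
import subprocess
import sys


def probe(path, timeout=5.0):
    """lstat `path` in a child process and classify what happened.

    Returns "ok", "error", "hung", or "killed by signal N".
    """
    child = subprocess.Popen(
        [sys.executable, "-c", "import os, sys; os.lstat(sys.argv[1])", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        rc = child.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        # A child in uninterruptible sleep cannot be killed, so send the
        # signal but do not wait for it, or this process would block too.
        child.kill()
        return "hung"
    if rc == 0:
        return "ok"
    if rc < 0:
        # A negative return code means the child was terminated by a signal.
        return "killed by signal %d" % -rc
    return "error"
```

Running `probe()` over the 9 file names, one at a time, would separate the
files that stat cleanly from those that kill or hang the probing process.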

The short-term fix is to move the offending directory to another place on the
volume that is outside the scope of any share. I am not sure how to delete
these files, as the Linux tools seem unable to handle them.

Questions that remain:
1. Why do all client smbd daemons have to die if only one of them ran into
trouble?
2. How do files get into a state where they can't be viewed or managed? Virus,
lack of sunspots?
3. Why did fsck say that the filesystem was fine, when obviously it isn't?
4. How to delete these poison files?

Karl








-- 
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/listinfo/samba
