We have moved our 3rd-party multiuser billing system database files from Novell NetWare to Samba (first 2.2.8a, and now, just last Friday evening, upgraded to 3.0.2a) on Mandrake Linux 9.2 (kernel version 2.4). Since the move, about once a week or so we get a corrupted file, and last week (even after upgrading some NICs) was even worse, with 5 or 6 problems.

Because we had no problems before changing servers, I think hardware errors are probably not to blame, even though I've seen them implicated in Samba discussions. And because it only occurs when multiple users are in a file (never otherwise, even after many, many index rebuilds and other file repair operations done by a single user), my guess is that it stems from some sort of locking or other synchronization problem. Also, so far there does not seem to be a pattern as to which workstations have errors, except that they are generally the most-used ones.

We have a mix of Win 98 and Win 2K clients (mostly the former). We used to have two Win 95 workstations, but upgraded them to 98 to try to solve the problems. No Unix programs access these files (except for nightly backups), only the billing software through Samba. The workstations still log in to NetWare as the primary network login, then use Windows networking to map the drive to Samba.

Our Samba configuration file is very simple, with only one share. I've tried various combinations of these three settings:

1. I turned off all oplocks, and that didn't fix it.
2. I set sync always = yes and strict sync = yes, and that didn't fix it either. (I have turned these off & on several times to see if there's any effect.)
3. Most recently I have set strict locking = yes.

The week before last we had 3 corruptions in 2 days. After the first two I finally turned on #3 above, and then within a few hours we had the third corruption.

The boss is really getting upset that I have to kick everyone off the system to rebuild the problem file--some of these files are > 300MB and take 2 hours or more to rebuild. He is saying that with one more problem, Samba goes into the trash and we revert to the Novell server.

I know it's hard to track down things like this, but here are some specific questions:

1. Are there any other options anyone can suggest trying?
   Also, apart from a server crash, would you expect setting #2 above (sync always / strict sync) to be actually relevant to the problem or not?

2. I know Samba is supposed to re-read the config file periodically, and I'm counting on that when I change the various options. But how can I really tell whether or not Samba has changed the option--and more to the point, changed its behavior? Do any of the above options have inherent delays before Samba can change? The way some of the corruptions have come shortly after I changed a setting which would be expected to make the files MORE safe, not less, has me wondering whether Samba is really changing the settings. I can use smbstatus to confirm there are no oplocks, but what about the other settings? In other words, must I stop & restart Samba after changes such as these (thereby temporarily kicking everyone off the system, a real hassle)?

3. What debugging level would be required for a developer to investigate this? Would a combined log be preferable, or would separate logs for each workstation be usable? Is there a way to get Samba logs to contain only the most recent stuff leading up to a non-reproducible-on-demand incident like this, without filling them up with hours or days of clutter?

4. Does anyone know of some software I could run to actually test Samba for problems? Something that would really exercise multi-user access?

Any help would be MUCH appreciated. I'm running out of time.

Thanks

-- Warren
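
P.S. In case the exact spelling of the options matters, here is roughly what the share section looks like with all three of the settings described above turned on. This is a sketch written out from that description rather than pasted from the server: the share name and path are placeholders, and I am assuming "all oplocks off" means both oplocks and level2 oplocks.

```
[billing]
   ; share name and path are placeholders, not the real ones
   path = /srv/billing
   read only = no

   ; 1. all oplocks off (assumed to mean both of these)
   oplocks = no
   level2 oplocks = no

   ; 2. sync settings
   sync always = yes
   strict sync = yes

   ; 3. strict locking
   strict locking = yes
```

If any of these should be set in [global] instead of in the share, or if one of them is known to interact badly with the others, that would be useful to hear as well.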