I'm running build 44 on a dual dual-core Opteron. Root, /usr, and swap are on SVM mirrors on a pair of SATA disks on a SiI 3114; the machine serves about a hundred NFS and SMB clients from two large UFS filesystems on Apple Xraids connected by LSI FibreChannel cards; it is also our master NIS server.
Recently the system has begun to misbehave in a truly spectacular and bizarre way when I run 'make' in /var/yp. Our only modification to /var/yp/Makefile has been to use /etc/nis instead of /etc as the source directory. We see the following misbehavior whether or not the server system itself is using NIS for _anything_ in nsswitch.conf, and even if we ypstop before running the make.

Within a second or so of our running make, if passwd needs to be rebuilt, make completes processing for passwd.byname and begins processing passwd.byuid, using nawk to merge passwd and shadow and feeding the result to makedbm, just as it did for passwd.byname. But this time, all other processes on the system except makedbm quickly grind to an (unkillable!) halt. We suspect this happens as soon as they try to do any disk I/O, but we are not sure.

makedbm itself (according to truss, anyway) seems to write to the new dbm file a few times a second, consuming about 25% of one CPU. That figure comes from a copy of top we left running; once this starts, we can't run any new diagnostic tools because the shell hangs! If left alone this will continue for _hours_: makedbm slooooowly writes the byuid DBM files, and nothing else on the system does any work, seemingly because it can't do any I/O.

We have about 4000 users in our passwd file. I know there's an issue with keys in the DBM files used by NIS exceeding the 1024-byte limit embedded in ndbm itself, but that does not seem to be what is happening here; makedbm doesn't fail, instead the whole system grinds to a halt. I can reproduce this with our password file truncated to as few as 250 users. Oddly enough, it does not happen if I chop the file at 225 users, even though the syntax of the passwd and shadow records for lines 225-250 looks OK.

Has anyone ever seen anything like this before? I suspect a volume manager bug -- it's all I can think of, honestly -- since we have root, usr, and swap on mirrored volumes -- but I cannot imagine what could suddenly be triggering it.
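One way to rule out the ndbm size limit and malformed records more rigorously than by eye would be a small check over the source files. The following is only a minimal sketch in Python, not what the Makefile itself does: the function name check_records is hypothetical, the 1024-byte figure is assumed to apply to the combined key and content size of each record, and the merge (shadow hash substituted into field 2 of the passwd line) is an approximation of the nawk step.

```python
NDBM_PAIR_LIMIT = 1024  # assumption: max bytes for key + content in one ndbm record


def check_records(passwd_lines, shadow_lines, limit=NDBM_PAIR_LIMIT):
    """Return (line number, user name, problem) tuples for suspicious records.

    Hypothetical helper: approximates the passwd/shadow merge and checks the
    size of the key/content pair that would go into passwd.byname and
    passwd.byuid.
    """
    # Map user name -> password hash from the shadow file.
    hashes = {}
    for line in shadow_lines:
        fields = line.rstrip("\n").split(":")
        if len(fields) >= 2:
            hashes[fields[0]] = fields[1]

    problems = []
    for lineno, line in enumerate(passwd_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            problems.append(
                (lineno, fields[0], "expected 7 fields, found %d" % len(fields))
            )
            continue
        name, uid = fields[0], fields[2]
        # Substitute the shadow hash, keeping the passwd field if none exists.
        fields[1] = hashes.get(name, fields[1])
        value = ":".join(fields)
        for map_name, key in (("passwd.byname", name), ("passwd.byuid", uid)):
            size = len(key.encode()) + len(value.encode())
            if size > limit:
                problems.append(
                    (lineno, name, "%s pair is %d bytes" % (map_name, size))
                )
    return problems
```

Calling check_records(open("/etc/nis/passwd"), open("/etc/nis/shadow")) and printing the result (assuming those are the file names under /etc/nis) would list any line between 225 and 250 that is oversized or has the wrong field count. An empty list would support the view that the input data is not the trigger.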
This message posted from opensolaris.org