Hi Jay,
On 30/01/2012 01:03, Jay Sullivan wrote:
I see a tiny correlation when our (Windows) domain controllers
reboot. After MS Patch Tuesday, I'm guaranteed at least one
winbind failure when the DC that I'm presently connected to reboots.
In my kerb config, I'm using a kdc address that round-robins to all
of our DCs. When the DC reboots, it's taken out of the rotation, so
that shouldn't cause any connection loss, right? Sometime next week
we won't have any more 2003 domain controllers--all will be replaced
with 2008. Maybe this will "solve" my problem?
This sounds almost exactly like our config; the KDC setup in /etc/krb5.conf
is exactly the same. We're running (mostly?) 2008 R2 DCs and no, it doesn't
look like that solves it. I have long suspected that the reboots
are the cause. At one point we had a static list of DCs, and when the
first one went down we had to restart. It does seem that Samba doesn't
reconnect to the second DC in the list when the first disappears.
At the height of my issue, I was seeing winbind problems every 2
hours or so. This was on Debian 5 with Samba 3.4.latest. I've since
moved to RHEL 6 and Samba 3.5.10.blah. Since moving to RHEL/Samba
3.5, I've experienced significantly fewer problems with winbind, maybe
a few times a week (that I've detected). At the same time, some of
our oldest 2003 domain controllers were retired, so this could be a
case of correlation != causation.
We're running a mix of Debian Lenny and Squeeze. Squeeze almost seems
worse, but I think that's just a perception, as those services are more
frequently used.
The symptoms are the same as Matthew's. When I try 'getent passwd
usernamethatisnotincache', I get nothing. Cached users are fine.
Similar results with 'id'. Restarting winbind "fixes" it.
I started logging a bunch of stuff when my script picked up a winbind
failure. Sometimes, but not always, there would be several extra
winbindd processes running. I usually have 8 winbindd processes (we
have a few trusted domains, it seems that increases the number of
winbindd processes) running, but a snapshot of 'ps' before I
restarted winbind would show maybe 10 or 12 winbindd processes.
That sounds familiar.
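A minimal sketch of how that process-count check could be automated. This is not Jay's script; the function names are made up for illustration, and the baseline of 8 is simply the figure quoted above, so adjust it for your own number of trusted domains.

```python
import subprocess


def count_processes(ps_output, name="winbindd"):
    """Count lines of `ps -e -o comm=` output that equal the process name."""
    return sum(1 for line in ps_output.splitlines() if line.strip() == name)


def has_extra_processes(ps_output, baseline=8, name="winbindd"):
    """True when more processes are running than the usual baseline.

    The baseline is an assumption: 8 is the normal count quoted in this
    thread, not a universal value.
    """
    return count_processes(ps_output, name) > baseline


if __name__ == "__main__":
    try:
        # `comm=` prints only the command name, with no header line.
        out = subprocess.run(
            ["ps", "-e", "-o", "comm="],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run ps:", exc)
    else:
        print("winbindd processes:", count_processes(out))
        print("above baseline:", has_extra_processes(out))
```

Logging this count alongside the time of each failure would show whether the extra processes appear before or after lookups start failing.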
I also cranked up the log level for a while, but my untrained eye
couldn't seem to make any correlation to a specific event before
non-cached winbind lookups started to fail.
It might be worth checking the DCs' event logs to correlate the
reboots with the failures (or with when the log entries start appearing). We have a
separate group of people maintaining the Windows environment so I'll ask
them for info.
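Once both sets of timestamps are in hand, the comparison itself is simple. A rough sketch, assuming the reboot times come from the Windows team and the failure times from our own monitoring; the function name and the 30-minute window are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta


def failures_near_reboots(reboots, failures, window=timedelta(minutes=30)):
    """Return (reboot, failure) pairs where the failure happened within
    `window` after the reboot. The window size is an assumption."""
    pairs = []
    for failure in failures:
        for reboot in reboots:
            if timedelta(0) <= failure - reboot <= window:
                pairs.append((reboot, failure))
    return pairs


if __name__ == "__main__":
    # Made-up timestamps for illustration only; not real data.
    reboots = [datetime(2012, 1, 1, 3, 0)]
    failures = [datetime(2012, 1, 1, 3, 10), datetime(2012, 1, 2, 12, 0)]
    for reboot, failure in failures_near_reboots(reboots, failures):
        print("reboot at", reboot, "followed by failure at", failure)
```

Failures that match no reboot are just as interesting, since they would point away from the reboot theory.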
Thanks very much for your comments. =]
Matt
-----Original Message-----
From: Matthew Baker [mailto:[email protected]]
Sent: Sunday, January 29, 2012 6:21 PM
To: Jay Sullivan; [email protected]
Subject: Re: winbind craps out, NT_STATUS_PIPE_BROKEN
Hi Jay,
thanks for your comments on your workaround. I too come from an
environment where there are 1000s of users to pick from who're
unlikely to log in. I found that the command "getent passwd
username" just came back empty when the aforementioned error shows in
the log. I don't suppose you've noticed a point in time when the pipe
"breaks"? I would be interested to find out what causes the break: a
change in AD, or something on the server running winbind? If we could detect the
break then we might be closer to the root cause.
Many thanks,
Matt
On 26/01/2012 17:17, Jay Sullivan wrote:
I'm not going to show you my code because everyone will make fun
of me. But here is the 10 second version:
I'm checking on the results of the `id` command from an array of
usernames that don't frequently connect to my samba box. Most
users in our AD are members of dozens or hundreds of groups, so I
simply check on the length of the output from `id` and decide on
whether or not to restart winbind. The output will typically be
empty when winbind is down, but it'll occasionally report just a
few groups instead of the usual hundreds. Why an array of
infrequent users? I've found that once I do `id username1`, that
user will be stuck in the winbind cache for a while and won't help
me figure out if winbind is broken. Since I have the luxury(?) of
thousands of users in our AD that will (probably) never connect to
my samba box, I picked a sample and ran with it. It works _most_
of the time, but it's not a solution. I'm good at band aids, but
suck at surgery. =(
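For readers who want to try the same band-aid, here is a minimal sketch of the check described above. It is not Jay's actual script: the probe usernames, the threshold of 10 groups, and the "any probe looks wrong" rule are all assumptions, and it uses `id -G` (numeric group IDs) so the groups are easy to count.

```python
import subprocess

# Hypothetical probe accounts: users who exist in AD but rarely connect,
# so they are unlikely to be served from the winbind cache.
PROBE_USERS = ["probeuser1", "probeuser2", "probeuser3"]


def group_count(id_output):
    """Count the space-separated GIDs printed by `id -G username`."""
    return len(id_output.split())


def looks_broken(id_outputs, min_groups=10):
    """True if any probe returned no output or suspiciously few groups.

    min_groups is an assumed threshold; pick one well below the number
    of groups your probe users normally belong to.
    """
    return any(group_count(out) < min_groups for out in id_outputs)


def probe(usernames):
    """Run `id -G` for each user; a failed lookup counts as empty output."""
    outputs = []
    for user in usernames:
        try:
            result = subprocess.run(
                ["id", "-G", user], capture_output=True, text=True
            )
        except OSError:
            outputs.append("")
            continue
        outputs.append(result.stdout if result.returncode == 0 else "")
    return outputs


if __name__ == "__main__":
    if looks_broken(probe(PROBE_USERS)):
        # The restart itself is left out; the service name varies by distro.
        print("winbind looks unhealthy; restart it")
    else:
        print("winbind looks fine")
```

Rotating through a larger pool of probe users on each run would reduce the chance that a probe account is answered from the cache.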
Please forward this to the samba mailing list for me. I just got
a bounce from my mail server and it'll take some time to sort out:
"Your e-mail service was detected by mx.selfip.biz (NiX Spam) as
spamming". Blacklisting is a necessary evil, I suppose...
~Jay
-----Original Message-----
From: Matthew Baker [mailto:[email protected]]
Sent: Thursday, January 26, 2012 11:41 AM
To: Jay Sullivan
Cc: [email protected]
Subject: Re: winbind craps out, NT_STATUS_PIPE_BROKEN
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi Jay,
many thanks for your response.
I have a similar set of scripts; currently they only run wbinfo -t,
plus a script that checks that net ads testjoin is sane. They don't catch
this. I was thinking about processing the log with something like
swatch, but it's a kludge. I would be interested in seeing your
sanity checks, if you don't mind.
Cheers,
Matt
On 26/01/12 16:32, Jay Sullivan wrote:
I am still experiencing this problem. I've scripted out some
winbind sanity checks that catch when it poops out and restart
winbind automagically.
I recently migrated our biggest samba host from Debian 5 to RHEL
6. The problem persists, albeit slightly less frequently (not
very scientific, I know...).
I typically only have problems with winbind when there are > 200
users connected _or_ > 500 open files as reported by
smbstatus. Unfortunately for me, these conditions describe a
typical samba load during off-peak hours. =(
~Jay
--
Jay Sullivan
Rochester Institute of Technology
College of Imaging Arts and Sciences
[email protected]
-----Original Message-----
From: Matthew Baker [mailto:[email protected]]
Sent: Tuesday, January 24, 2012 3:34 AM
To: Jay Sullivan; [email protected]
Subject: Re: winbind craps out, NT_STATUS_PIPE_BROKEN
Hi Jay/Samba peeps,
Emailing in reference to
http://lists.samba.org/archive/samba/2011-April/162277.html
I have seen a very similar issue with a similar setup.
Users fail to be verified with:
getent passwd username
Entry in the log at same time is:
[2012/01/23 16:58:53.159761, 3] winbindd/winbindd_misc.c:352(winbindd_interface_version)
  [18510]: request interface version
[2012/01/23 16:58:53.159966, 3] winbindd/winbindd_misc.c:385(winbindd_priv_pipe_dir)
  [18510]: request location of privileged pipe
[2012/01/23 16:58:53.160214, 3] winbindd/winbindd_getpwnam.c:55(winbindd_getpwnam_send)
  getpwnam username
[2012/01/23 16:58:53.162493, 5] winbindd/winbindd_getpwnam.c:138(winbindd_getpwnam_recv)
  Could not convert sid S-1-5-21-1117850145-1682116191-196506527-126617: NT_STATUS_PIPE_BROKEN
Restarting winbindd solves the problem temporarily.
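One way to pin down when the pipe "breaks" is to pull the timestamps of these entries out of the winbind log. A minimal sketch, assuming the log keeps the bracketed header format shown above; the function name is made up, and the log path is passed on the command line because it differs between installs.

```python
import re
import sys
from datetime import datetime

# Matches headers such as "[2012/01/23 16:58:53.162493, 5]".
HEADER = r"\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})(?:\.\d+)?,\s*\d+\]"
MARKER = "NT_STATUS_PIPE_BROKEN"
TOKEN = re.compile(HEADER + "|" + re.escape(MARKER))


def broken_pipe_times(log_text):
    """Return the header timestamp of each entry containing the marker."""
    times = []
    last_seen = None
    for match in TOKEN.finditer(log_text):
        if match.group(1):
            # A header: remember its timestamp for the entry that follows.
            last_seen = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        elif last_seen is not None:
            # The marker: attribute it to the most recent header.
            times.append(last_seen)
    return times


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: pipe_broken.py /path/to/winbind/log")
    else:
        with open(sys.argv[1], errors="replace") as handle:
            for when in broken_pipe_times(handle.read()):
                print(when.isoformat())
```

The earliest timestamp after each winbind restart is the one worth comparing against events on the AD side.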
I've attached a copy of the smb.conf.
OS: Debian Squeeze 6.0.3
Kernel: 2.6.32-5-686-bigmem
samba 2:3.5.6~dfsg-3squeeze5
winbind 2:3.5.6~dfsg-3squeeze5
Jay did you find a solution to your problem? Has anyone else on
the list seen similar issues or have any ideas of what might be
happening?
Any advice or pointers would be very much appreciated.
Thanks,
Matt
- --
Matthew Baker :: Senior Systems Administrator :: University of
Bristol
+----------------------------------------------------------------------+
| Infrastructure, Systems and Operations [email protected] |
| T: Berkeley Square: +44(0)117 3314325 (Mon, Thur & Fri) |
| T: Computer Centre: +44(0)117 3317467 (Tue, Wed) |
| A: Uni of Bristol, Computer Centre, Tyndall Ave, Bristol. BS81UD |
+----------------------------------------------------------------------+
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk8hggMACgkQLvm7pB/aicMZyACfYGhlYW/Xd2ULgMPdp4K5oL7b
8noAnAz4VjjvHEb/cuhbOj+97Rxc9bJ2
=uAtp
-----END PGP SIGNATURE-----
--
Matthew Baker :: Senior Systems Administrator :: University of Bristol
+----------------------------------------------------------------------+
| Infrastructure, Systems and Operations [email protected] |
| T: Berkeley Square: +44(0)117 3314325 (Mon, Thur & Fri) |
| T: Computer Centre: +44(0)117 3317467 (Tue, Wed) |
| A: Uni of Bristol, Computer Centre, Tyndall Ave, Bristol. BS81UD |
+----------------------------------------------------------------------+