Matthew J. Salerno wrote:
> Please understand that I am not a samba dev, I am just an average
> user who is willing to help others out when I can because I know how
> much it sucks to be stuck. I do not have the time to mirror your
> environment. Regarding the settings I recommended in my last post, I'm
> not sure what the best settings would be for them, but since they all
> deal with caching info from AD I figured that they might be useful.
> Honestly, I would set them all to cache for a very long time, simulate
> an outage, adjust and repeat.
>
> Have you checked on any suse forums? If it is a suse issue, chances
> are that you are not the only person having this problem. I'll try the
> outage out in my Redhat env.
>
I appreciate your help, dev or not, even though my answers are somewhat
glib (hopefully amusing!). I honestly wish I could have posted this to the
samba-technical list instead, but I like the chain of command here.
Also, I didn't find anything useful on the SUSE forums and, besides, I
don't think this is a SUSE issue.
Plus, by posting here I hope to avoid the standard overgeneralized
tech-support/newbie Linux user questions and inflated forum moderator
egos. I guarantee they would ask me the opposite question: "Hey, did
you check the samba forums?" ;-)
Those options you mentioned:
idmap cache time (G)
This parameter specifies the number of seconds that Winbind's idmap
interface will cache positive SID/uid/gid query results.
Default: idmap cache time = 604800 (one week)
This default setting looks fine to me. One week is a lot longer than 1
hour, so I don't believe this causes the issue, nor does it help alleviate
the symptoms. Maybe I am wrong.
idmap negative cache time (G)
This parameter specifies the number of seconds that Winbind's idmap
interface will cache negative SID/uid/gid query results.
Default: idmap negative cache time = 120
120 what? Hmmm, seconds? Minutes? LOL
I am assuming "negative" here does not refer to a negative integer but
means a failed ("bad") lookup. Since I do not query bad SIDs in this test,
I don't think this is the cause either. Maybe I am wrong.
winbind cache time (G)
This parameter specifies the number of seconds the winbindd(8)
<http://samba.org/samba/docs/man/manpages-3/winbindd.8.html> daemon
will cache user and group information before querying a Windows NT
server again.
This does not apply to authentication requests, these are always
evaluated in real time unless the winbind offline logon
<http://samba.org/samba/docs/man/manpages-3/smb.conf.5.html#WINBINDOFFLINELOGON>
option has been enabled.
Default: winbind cache time = 300
300 what? -- years? fortnights? furlongs? farthings? bushels? bottles of
beer on the wall?
This setting may be useful, but the problem with adjusting it is that
once the limit is reached, the system is still unusable.
Even after adjusting it, I do not see the system return to a usable state
in a reasonable amount of time once the AD is back up.
Perhaps my goal is to find out whether this is a design misstep and, if so,
to have the devs fix it and make samba more resilient: able to tell at a
moment's notice whether the AD is up or down, and not fubar the samba
server during an AD server outage. You know, like you would see if you
used a Windows workstation....
winbind offline logon (G)
This isn't really what I am doing here. I am not using this samba box as
a workstation. I am using it as a NAS joined to an AD domain. The only
queries it makes are to validate passwords for logins to CIFS shares from
Windows workstations, and to set/read ACLs in the filesystem.
Neither of these causes the system to become unresponsive; all you need
to do is take the AD offline for a minute or two.
-- Option Disqualified! ;-)
winbind reconnect delay (G)
This parameter specifies the number of seconds the winbindd(8)
<http://samba.org/samba/docs/man/manpages-3/winbindd.8.html> daemon
will wait between attempts to contact a Domain controller for a
domain that is determined to be down or not contactable.
Default: winbind reconnect delay = 30
Hmm, 30 bottles of beer? I am guessing seconds. If that is true, then I
should not have this issue once the AD is back up. I have seen this
problem continue long after the AD is back up and running, so this causes
concern. If this were working right, then setting it to 5 instead of 30
looks like it would cure my problem, and winbind would know almost
immediately whether the AD was up or down. But hey, it could be 30
minutes, hours, days, etc. - I don't know!
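For reference, the man page excerpts quoted above describe each of these four values as a number of seconds. A minimal sketch that prints the quoted defaults in a more readable form; the function name secs_human is made up for illustration and is not part of Samba:

```shell
#!/bin/sh
# secs_human: hypothetical helper (not part of Samba) that renders a
# number of seconds as days/hours/minutes/seconds.
secs_human() {
    s=$1
    d=$((s / 86400)); s=$((s % 86400))
    h=$((s / 3600));  s=$((s % 3600))
    m=$((s / 60));    s=$((s % 60))
    echo "${d}d ${h}h ${m}m ${s}s"
}

# Defaults as quoted from the smb.conf man page in this message.
secs_human 604800   # idmap cache time
secs_human 120      # idmap negative cache time
secs_human 300      # winbind cache time
secs_human 30       # winbind reconnect delay
```

Read that way, the reconnect delay default is half a minute, so if the problem persists long after the AD is back, something other than this timer is probably involved.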
Hope this helps!
Thanks,
-Clayton
------------------------------------------------------------------------
*From:* Clayton Hill <[email protected]>
*To:* Matthew J. Salerno <[email protected]>
*Cc:* [email protected]
*Sent:* Mon, October 19, 2009 1:20:00 PM
*Subject:* Re: [Samba] winbind causes Linux to lockup when
connectivity to AD is lost (subject line edited for clarity)
Hi Matthew,
> I don't have the time to set up an environment to match yours, but I did take the
> time to go back to your initial post and read through your smb.conf.
Understandable, but that is not going to be of much help if you don't have a
way to reproduce this issue, and I'll be answering too many basic questions.
;-)
> 1. http://samba.org/samba/docs/man/manpages-3/winbindd.8.html - Did you check your
> winbind config to make sure you are not running it with a "-n"?
>
Yes. I am using the default init script to start and stop winbind. Remember, I am using SUSE 11.0 x86_64.
BUT I have tested this without -n, which is a totally useless way to run winbind and, ironically, should be far worse usability-wise than this scenario, but isn't.
> 2. http://samba.org/samba/docs/man/manpages-3/smb.conf.5.html - Have you tried playing with the "winbind cache
> time", "winbind offline logon", "winbind reconnect delay" and "idmap cache time"
> settings?
>
I will reread those options in the man page, but what do you recommend
here? It feels like a shot in the dark, and a lengthy way to test this at
random, i.e. this test renders a samba machine useless every time it is run,
so these are very long, slow shots in the dark.
_Need some experienced expert advice here on which options are best to modify
and why._
> 3. Have you tried increasing the log level and enabling winbind debug and
> creating an artificial outage and then review the logs?
Yes. I will give you a snippet of a log level 2 log taken during a "fake AD
outage" in a bit. I doubt it will be useful, but I'll try it.
> Again, what kind of troubleshooting have you done and what are the results?
Please, try to reproduce this issue. It will become quite obvious to you after that.
Thanks,
-Clayton
Matthew J. Salerno wrote:
----- Original Message ----
From: Clayton Hill <[email protected]>
To: Matthew J. Salerno <[email protected]>
Cc: [email protected]; Jeremy Allison <[email protected]>
Sent: Sun, October 18, 2009 7:49:01 PM
Subject: Re: [Samba] winbind causes Linux to lockup when connectivity to AD is
lost (subject line edited for clarity)
Thanks for confirming my config is good. I already know about the old
problem with SSH and reverse DNS lookups. That one takes about 5
minutes or less to log in; with this issue, be prepared to wait almost an
hour, if it works at all. Similar, but not the same issue.
Please, to get an understanding of this problem, follow these steps
to reproduce it:
SUSE 11.0
Samba 3.2
Join a Windows 2003 AD domain (with 40,000 objects) using net ads join
Take the domain controller offline.
Try to log in LOCALLY as ROOT at the console of your domain member
Linux box. Do not even bother to log in as any samba user or do ANYTHING
samba related.
Watch as it takes more time than bearable (I am talking MORE THAN 20
minutes!) to log in at the LOCAL TERMINAL.
Attempt to do the same with ssh.
If you were already logged in as root on a LOCAL TTY before starting this
test, try to run simple commands such as: top, ls, ps, man, etc.
After seeing the problem clearly simply do this to become unstuck:
killall winbindd
or
service winbind stop
have a lot of fun.
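To put numbers on the delay during such a test, a minimal sketch along these lines could be run before, during, and after the outage. It assumes GNU coreutils timeout and getent are installed; the helper name timed and the 30-second limit are arbitrary choices for illustration:

```shell
#!/bin/sh
# timed: hypothetical helper that runs a command with a time limit and
# reports how long it took, so a hanging lookup cannot block the shell.
timed() {
    limit=$1; shift
    start=$(date +%s)
    timeout "$limit" "$@" >/dev/null 2>&1
    rc=$?
    end=$(date +%s)
    echo "rc=$rc elapsed=$((end - start))s cmd=$*"
}

# Lookups that go through the passwd and group lines of /etc/nsswitch.conf.
timed 30 getent passwd root
timed 30 id root
```

With GNU timeout, a command that hits the limit is reported with rc=124, which makes stalled lookups easy to tell apart from ones that merely fail.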
Cheers,
-Clayton
Matthew J. Salerno wrote:
Your /etc/nsswitch.conf looks correct to me. For services like ssh, you
should just disable PTR lookups (VerifyReverseMapping no). Regarding winbind,
do you have any services or processes running on the box as a domain user?
Perhaps there is a timeout setting for krb and winbind. I don't recall seeing
one for winbind, but I would imagine that there is one for kerberos. Have you
bumped up the debugging and purposefully caused an AD failure (ifdown or bad
route)? Have you had the console open and watched top to see if it's a
process consuming too much CPU? What kind of troubleshooting have you done,
and what are the results?
----- Original Message ----
From: "[email protected]" <[email protected]>
To: [email protected]
Cc: [email protected]; Jeremy Allison <[email protected]>
Sent: Fri, October 16, 2009 3:59:45 PM
Subject: Re: [Samba] winbind causes Linux to lockup when connectivity to AD is
lost (subject line edited for clarity)
OK, I am not hearing replies back, and I don't want this issue to be swept
under the rug.
It has been an issue for me since as far back as SuSE 10.1 + samba-3.0.30-0.1.112.
I now suspect that the commands I mentioned (ls, man, etc.) all look up
username/password info, perhaps to see if you have permission to run them?
I don't know; I am guessing.
BUT if winbind is really caching when the connection is lost, then this
should be a non-issue, as you say.
Well here is my nsswitch.conf:
cat /etc/nsswitch.conf
passwd: compat winbind
group: compat winbind
networks: files dns
services: files
protocols: files
rpc: files
ethers: files
netmasks: files
netgroup: files
publickey: files
bootparams: files
automount: files
aliases: files
hosts: files dns
shadow: compat
Isn't this set up right? ;-)
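As a quick check of which name-service databases can reach winbindd at all, a small sketch like this lists the entries whose source list includes winbind. The helper name winbind_dbs is made up for illustration:

```shell
#!/bin/sh
# winbind_dbs: hypothetical helper that prints the nsswitch databases
# whose source list includes winbind. Takes the file path as $1.
winbind_dbs() {
    awk -F: '
        /^[ \t]*#/ { next }                       # skip comment lines
        { srcs = " " $2 " "; gsub(/\t/, " ", srcs) }
        index(srcs, " winbind ") { print $1 }     # whole-word match
    ' "${1:-/etc/nsswitch.conf}"
}

[ -r /etc/nsswitch.conf ] && winbind_dbs /etc/nsswitch.conf
```

For the file shown above this prints passwd and group, so user and group lookups are the ones that can end up waiting on winbindd.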
So, famously, when DNS is down, crap like SSH and NFS takes unreasonable
amounts of time and causes system hangs in Linux. This is what I've been
told, and I can accept that.
Since DNS is hosted on the AD server, when that server goes down, SSH and
even local login hang for extremely long amounts of time (I'm talking more
than 10 minutes), then fail.
In Windows (I'm sorry, I'm about to compare 2 operating systems) this is a
non-issue, and you can use the machine even if the networking is hosed or you
can't talk to the AD.
So.......
BUMP! :-)
On Wed, 14 Oct 2009 16:51:10 -0600, <[email protected]> wrote:
Hopefully that isn't a bad thing! haha
Thanks!
On Wed, 14 Oct 2009 15:44:54 -0700, Jeremy Allison <[email protected]> wrote:
On Wed, Oct 14, 2009 at 04:02:41PM -0600, [email protected] wrote:
Hi Jeremy,
Sorry, didn't look too closely at your winbindd issue.
winbindd will cache all information to allow disconnected
operation (we made this work perfectly at SuSE), so there
certainly shouldn't be a problem with a loss of connection to a DC.
I am sorry to report that I am in fact using SuSE, and this problem is very
easy to reproduce if I power off my AD domain, then wait (I guess) 10
minutes - then try and ssh to my Linux box. There is no way to log into the
box.
Ok, then I'm going to hand you over to the SuSE Samba Team
maintainers on this list (sorry :-).
Jeremy.