Re: FBSD 10.0-S (r261289M) under XenServer 6.2 - Stuck sshd in urdlck?

2014-03-18 Thread Roger Pau Monné
On 17/03/14 19:44, Karl Pielorz wrote:
 
 
 --On 17 March 2014 18:17:53 +0100 Roger Pau Monné roger@citrix.com
 wrote:
 
 Anyone know what 'urdlck' is?

 It seems like the process is stuck while trying to acquire a rw mutex in
 read mode. Could you obtain a backtrace of the process with gdb?
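
For readers unfamiliar with read/write locks, here is a minimal Python sketch of the situation described above: a thread asking for the lock in read mode has to wait for as long as a writer holds it. The ReadWriteLock class and its method names are made up for this illustration. It is not how libthr or the kernel implements these locks, and it says nothing about why the lock in sshd was never released.

```python
import threading


class ReadWriteLock:
    """Toy read/write lock: many readers or one writer (illustration only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self, timeout=None):
        """Wait until no writer holds the lock; return False on timeout."""
        with self._cond:
            ok = self._cond.wait_for(lambda: not self._writer, timeout)
            if ok:
                self._readers += 1
            return ok

    def release_read(self):
        with self._cond:
            self._readers -= 1
            self._cond.notify_all()

    def acquire_write(self, timeout=None):
        """Wait until there is no writer and no reader; return False on timeout."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: not self._writer and self._readers == 0, timeout)
            if ok:
                self._writer = True
            return ok

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


lock = ReadWriteLock()
lock.acquire_write()                        # a writer takes the lock and keeps it
stuck = not lock.acquire_read(timeout=0.2)  # the reader gives up after 0.2 s
print("reader blocked:", stuck)
lock.release_write()
print("reader after release:", lock.acquire_read(timeout=0.2))
```

Without the timeout the reader would sleep indefinitely, which is roughly what a process sitting in 'urdlck' looks like from the outside.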
 
 Ok, I think I did this right - let me know if I've not...
 
 # gdb /usr/sbin/sshd 5325
 ...
 Attaching to program: /usr/sbin/sshd, process 5325
 
 warning: current_sos: Can't read pathname for load map: Bad address
 [repeated several times]
 [lots of reading symbols from - 'no debugging symbols found' output]
 ...
 [New Thread 804006400 (LWP 100184/sshd)]
 [a few reading symbols - 'no debugging symbols found' output]
 Loaded symbols for /libexec/ld-elf.so.1
 [Switching to Thread 804006400 (LWP 100184/sshd)]
 0x0008038eb89c in __error () from /lib/libthr.so.3
 (gdb) bt
 #0  0x0008038eb89c in __error () from /lib/libthr.so.3
 #1  0x0008038e921c in pthread_timedjoin_np () from /lib/libthr.so.3
 #2  0x00080064f9a2 in _rtld_get_stack_prot () from /libexec/ld-elf.so.1
 #3  0x0008006498c9 in r_debug_state () from /libexec/ld-elf.so.1
 #4  0x0008006470cd in .text () from /libexec/ld-elf.so.1
 #5  0x0246 in ?? ()
 #6  0x in ?? ()
 
 
 Also a kernel-space dump might be useful, could you also run
 procstat -k pid?
 
 procstat output is:
 
 
 # procstat -k 5334
   PID    TID COMM TDNAME KSTACK
  5334 100183 sshd -      mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_rw_rdlock __umtx_op_rw_rdlock amd64_syscall Xfast_syscall
 
 
 If you can briefly tell me how to do the kernel-space dump? Do I panic
 the machine (i.e. cause a crash-dump?) somehow?

The output of vmstat -ai might also be helpful to make sure that event
timers are working correctly. Also, does this VM have some
PCI-passthrough? Did you migrate it? The xl configuration file used to
create the domain would also be interesting.

Also, could you try the same workload with a pristine GENERIC kernel?

Roger.
___
freebsd-xen@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-xen
To unsubscribe, send any mail to freebsd-xen-unsubscr...@freebsd.org

Re: FBSD 10.0-S (r261289M) under XenServer 6.2 - Stuck sshd in urdlck?

2014-03-18 Thread Tiago Ribeiro


On 18/03/2014, at 06:47, Karl Pielorz kpielorz_...@tdx.co.uk wrote:

 
 --On 18 March 2014 09:44 +0100 Roger Pau Monné roger@citrix.com wrote:
 
 The output of vmstat -ai might also be helpful to make sure that event
 timers are working correctly.
 
 Ok, that's at the bottom of this email.
 
 Also, does this VM have some
 PCI-passthrough? Did you migrate it? The xl configuration file used to
 create the domain would also be interesting.
 
 VM has no passthrough (this is a completely separate system to the one I was 
 doing the passthrough work on), and was created 'from new' - in XenCenter 
 I did VM, New VM.
 
 I used the 'Other install media' template with 2 vCPUs and 2048MB of RAM - 
 and fed it the FreeBSD 10.0-R amd64 install ISO. Single GPT partition, ufs - 
 and a small (2GB) swap partition.
 
 Once the system was running I installed subversion then did a svn checkout of 
 'stable' source - rebuilt the world/kernel / installed the kernel/world - and 
 did the usual mergemaster updates.
 
 Also, could you try the same workload with a pristine GENERIC kernel?
 
 Ok - I'll leave this VM 'as-is'. If I get time I'll set up two new VMs - a 
 stock 10.0-R with stock GENERIC, and another one using the r261289M source 
 from this machine, but stock GENERIC kernel (the only mods I'd made were the 
 'NO_ADAPTIVE_' changes as recommended by the xen man page).
 
 It may be a while before I know if sshd is going to get 'stuck' again - but 
 I'll keep an eye on them.
 
 -Karl
 

This problem does not seem to be caused by virtualization; people at FUG-BR 
are discussing this issue with SSH on FreeBSD 10.

Re: FBSD 10.0-S (r261289M) under XenServer 6.2 - Stuck sshd in urdlck?

2014-03-18 Thread Karl Pielorz



--On 18 March 2014 07:14 -0300 Tiago Ribeiro sha...@gmail.com wrote:


This problem does not seem to be caused by virtualization; people at FUG-BR
are discussing this issue with SSH on FreeBSD 10.


Hmmm that's interesting...

None of our other 10.x boxes (bare metal) have had this issue yet - but 
being fair they're usually busy and at the moment get restarted regularly - 
so might be why we've not seen it on those.


FUG-BR = Grupo Brasileiro de Usuarios de FreeBSD? - Do you know if they're 
going to raise the issue on the FreeBSD lists? [if they haven't already?]


Thanks,

-Karl


Re: FBSD 10.0-S (r261289M) under XenServer 6.2 - Stuck sshd in urdlck?

2014-03-18 Thread Karl Pielorz



--On 18 March 2014 08:42:00 -0300 Tiago Ribeiro sha...@gmail.com wrote:


I think some people have reported this error on another list.
Here is part of an email sent to that list:


# ps afx

15961  -  Is   0:00.02 sshd: wendell [priv] (sshd)
15962  -  Z    0:00.00 <defunct>

# uname -a
FreeBSD bacula 10.0-RELEASE FreeBSD 10.0-RELEASE #0 r260789: Thu Jan 16
22:34:59 UTC 2014 r...@snap.freebsd.org:/usr/obj/usr/src/sys/GENERIC
amd64


Thanks,

I found the thread on -stable using the above and have posted to it (it 
didn't mention urdlck - which is probably why I missed it the first time 
round with Google et al.).


Regards and thanks for the replies,

-Karl



FBSD 10.0-S (r261289M) under XenServer 6.2 - Stuck sshd in urdlck?

2014-03-17 Thread Karl Pielorz


Hi,

I set up a VM a while ago under XenServer 6.2 - it's an amd64 FBSD 10.0-S 
box (based on r261289) - it's running under Xen PVHVM.


I set it up - and left it for a while (46 days). I went to ssh to it today, 
and got:



ssh_exchange_identification: Connection closed by remote host


Getting on to the box's console, there are lots of what look to be 'stuck' 
sshd processes:


 0  3933  895 0  20  0 84868 6944 urdlck Is -  0:00.01 sshd: unknown [priv] (sshd)
22  3934 3933 0  20  0     0    0 -      Z  -  0:00.00 <defunct>
 0  3935 3933 0  20  0 84868 6952 sbwait I  -  0:00.00 sshd: unknown [pam] (sshd)
 0  4338  895 0  20  0 84868 6944 urdlck Is -  0:00.01 sshd: unknown [priv] (sshd)
22  4339 4338 0  20  0     0    0 -      Z  -  0:00.00 <defunct>


Anyone know what 'urdlck' is?

There's 126 of these processes, e.g.

5446  -  I    0:00.00 sshd: unknown [pam] (sshd)
5450  -  Is   0:00.01 sshd: unknown [priv] (sshd)
5452  -  I    0:00.00 sshd: unknown [pam] (sshd)
5453  -  Is   0:00.01 sshd: unknown [priv] (sshd)
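
One way to tally stuck processes like these per wait channel is sketched below in Python. It assumes the 13-column layout of the longer listing earlier in this message (taken here to be `ps -axl` output, with the wait channel in the 9th field); the function name count_wait_channels and the sample text are only for illustration.

```python
from collections import Counter

# Sample lines in the assumed `ps -axl` layout:
# UID PID PPID CPU PRI NI VSZ RSS MWCHAN STAT TT TIME COMMAND
PS_SAMPLE = """\
 0  3933  895 0  20  0 84868 6944 urdlck Is -  0:00.01 sshd: unknown [priv] (sshd)
22  3934 3933 0  20  0     0    0 -      Z  -  0:00.00 <defunct>
 0  3935 3933 0  20  0 84868 6952 sbwait I  -  0:00.00 sshd: unknown [pam] (sshd)
 0  4338  895 0  20  0 84868 6944 urdlck Is -  0:00.01 sshd: unknown [priv] (sshd)
22  4339 4338 0  20  0     0    0 -      Z  -  0:00.00 <defunct>
"""


def count_wait_channels(ps_output):
    """Count processes per wait channel (9th whitespace-separated field)."""
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 12)  # keep COMMAND (13th field) in one piece
        if len(fields) < 13 or not fields[1].isdigit():
            continue                   # skip the header and malformed lines
        counts[fields[8]] += 1
    return counts


print(count_wait_channels(PS_SAMPLE))
```

On a live system the text would come from the real ps output rather than a pasted sample, and a large count for 'urdlck' would match the picture described here.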


Bearing in mind the box is firewalled from ssh access, and no one (apart 
from me today) has attempted to get onto the box with ssh - this is a 
little concerning. Only about one in every 5-10 connection attempts will 
actually connect - the rest just result in the 'key exchange' error.


Kernel is GENERIC with:

  options NO_ADAPTIVE_MUTEXES
  options NO_ADAPTIVE_RWLOCKS
  options NO_ADAPTIVE_SX

There's nothing logged in dmesg, or syslog.

Is there anything worth doing to this VM before I restart it - i.e. to try 
and figure out what's happened? / troubleshoot?


-Karl



Re: FBSD 10.0-S (r261289M) under XenServer 6.2 - Stuck sshd in urdlck?

2014-03-17 Thread Karl Pielorz



--On 17 March 2014 18:17:53 +0100 Roger Pau Monné roger@citrix.com 
wrote:



Anyone know what 'urdlck' is?


It seems like the process is stuck while trying to acquire a rw mutex in
read mode. Could you obtain a backtrace of the process with gdb?


Ok, I think I did this right - let me know if I've not...

# gdb /usr/sbin/sshd 5325
...
Attaching to program: /usr/sbin/sshd, process 5325

warning: current_sos: Can't read pathname for load map: Bad address
[repeated several times]
[lots of reading symbols from - 'no debugging symbols found' output]
...
[New Thread 804006400 (LWP 100184/sshd)]
[a few reading symbols - 'no debugging symbols found' output]
Loaded symbols for /libexec/ld-elf.so.1
[Switching to Thread 804006400 (LWP 100184/sshd)]
0x0008038eb89c in __error () from /lib/libthr.so.3
(gdb) bt
#0  0x0008038eb89c in __error () from /lib/libthr.so.3
#1  0x0008038e921c in pthread_timedjoin_np () from /lib/libthr.so.3
#2  0x00080064f9a2 in _rtld_get_stack_prot () from /libexec/ld-elf.so.1
#3  0x0008006498c9 in r_debug_state () from /libexec/ld-elf.so.1
#4  0x0008006470cd in .text () from /libexec/ld-elf.so.1
#5  0x0246 in ?? ()
#6  0x in ?? ()



Also a kernel-space dump might be useful, could you also run
procstat -k pid?


procstat output is:


# procstat -k 5334
  PID    TID COMM TDNAME KSTACK
 5334 100183 sshd -      mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_rw_rdlock __umtx_op_rw_rdlock amd64_syscall Xfast_syscall
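
A small Python sketch for checking many such stacks at once, assuming the `procstat -k` layout shown above (PID, TID, COMM, TDNAME, then the kernel stack frames) and that COMM and TDNAME contain no spaces. The function name parse_procstat_k is hypothetical, not part of any FreeBSD tool.

```python
def parse_procstat_k(output):
    """Parse `procstat -k` text into (pid, tid, comm, tdname, frames) tuples."""
    threads = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 5 or not fields[0].isdigit():
            continue                  # header, blank or malformed line
        pid, tid, comm, tdname = fields[:4]
        threads.append((int(pid), int(tid), comm, tdname, fields[4:]))
    return threads


PROCSTAT_SAMPLE = """\
  PID    TID COMM TDNAME KSTACK
 5334 100183 sshd -      mi_switch sleepq_catch_signals sleepq_wait_sig _sleep umtxq_sleep do_rw_rdlock __umtx_op_rw_rdlock amd64_syscall Xfast_syscall
"""

for pid, tid, comm, tdname, frames in parse_procstat_k(PROCSTAT_SAMPLE):
    # do_rw_rdlock in the stack means the thread is asleep in the kernel
    # waiting for a userland read/write lock in read mode.
    if "do_rw_rdlock" in frames:
        print(f"{comm} pid {pid} tid {tid}: sleeping in a umtx read lock")
```

Feeding it the output for every sshd PID would show whether all of the stuck processes are asleep in the same place.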



If you can briefly tell me how to do the kernel-space dump? Do I panic the 
machine (i.e. cause a crash-dump?) somehow?


Cheers & thanks for your reply,

-Karl
