[releng_9 tinderbox] failure on ia64/ia64

2013-06-19 Thread FreeBSD Tinderbox
TB --- 2013-06-19 08:42:19 - tinderbox 2.10 running on freebsd-stable.sentex.ca
TB --- 2013-06-19 08:42:19 - FreeBSD freebsd-stable.sentex.ca 8.3-STABLE 
FreeBSD 8.3-STABLE #0: Tue Oct 16 17:37:58 UTC 2012 
mdtan...@freebsd-stable.sentex.ca:/usr/obj/usr/src/sys/server  amd64
TB --- 2013-06-19 08:42:19 - starting RELENG_9 tinderbox run for ia64/ia64
TB --- 2013-06-19 08:42:19 - cleaning the object tree
TB --- 2013-06-19 08:42:39 - /usr/local/bin/svn stat /src
TB --- 2013-06-19 08:43:12 - At svn revision 251990
TB --- 2013-06-19 08:43:13 - building world
TB --- 2013-06-19 08:43:13 - CROSS_BUILD_TESTING=YES
TB --- 2013-06-19 08:43:13 - MAKEOBJDIRPREFIX=/obj
TB --- 2013-06-19 08:43:13 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2013-06-19 08:43:13 - SRCCONF=/dev/null
TB --- 2013-06-19 08:43:13 - TARGET=ia64
TB --- 2013-06-19 08:43:13 - TARGET_ARCH=ia64
TB --- 2013-06-19 08:43:13 - TZ=UTC
TB --- 2013-06-19 08:43:13 - __MAKE_CONF=/dev/null
TB --- 2013-06-19 08:43:13 - cd /src
TB --- 2013-06-19 08:43:13 - /usr/bin/make -B buildworld
 World build started on Wed Jun 19 08:43:14 UTC 2013
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
 stage 4.1: building includes
 stage 4.2: building libraries
 stage 4.3: make dependencies
 stage 4.4: building everything
 World build completed on Wed Jun 19 10:29:46 UTC 2013
TB --- 2013-06-19 10:29:46 - generating LINT kernel config
TB --- 2013-06-19 10:29:46 - cd /src/sys/ia64/conf
TB --- 2013-06-19 10:29:46 - /usr/bin/make -B LINT
TB --- 2013-06-19 10:29:47 - cd /src/sys/ia64/conf
TB --- 2013-06-19 10:29:47 - /usr/sbin/config -m LINT
TB --- 2013-06-19 10:29:47 - building LINT kernel
TB --- 2013-06-19 10:29:47 - CROSS_BUILD_TESTING=YES
TB --- 2013-06-19 10:29:47 - MAKEOBJDIRPREFIX=/obj
TB --- 2013-06-19 10:29:47 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2013-06-19 10:29:47 - SRCCONF=/dev/null
TB --- 2013-06-19 10:29:47 - TARGET=ia64
TB --- 2013-06-19 10:29:47 - TARGET_ARCH=ia64
TB --- 2013-06-19 10:29:47 - TZ=UTC
TB --- 2013-06-19 10:29:47 - __MAKE_CONF=/dev/null
TB --- 2013-06-19 10:29:47 - cd /src
TB --- 2013-06-19 10:29:47 - /usr/bin/make -B buildkernel KERNCONF=LINT
 Kernel build for LINT started on Wed Jun 19 10:29:47 UTC 2013
 stage 1: configuring the kernel
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3.1: making dependencies
 stage 3.2: building everything
[...]
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
/src/sys/dev/advansys/adwmcode.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  /src/sys/dev/ae/if_ae.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  /src/sys/dev/age/if_age.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param 

shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Adam Strohl

Hello -STABLE@,

So I've seen this situation, seemingly at random, on a number of 9.1
machines, both physical boxes and VMs, for at least 6-9 months, I would
say.  I finally have a physical box here that reproduces it consistently
and that I can reboot easily (i.e., not a production/client server).


No matter what I do:

reboot
shutdown -p
shutdown -r

This specific server stops at "All buffers synced" and never actually
powers down or reboots.  Keyboard input seems to be ignored.  This server
is a ZFS NAS (with GMIRROR for the boot blocks), but the other boxes that
show this use GMIRRORs for root/swap/boot (no ZFS).


Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg

When I reset the server, it appears that the disks were not dismounted
cleanly.  This ZFS box comes back quickly because ZFS handles that well,
but on the other servers with GMIRROR roots, rebuilding the GMIRROR and
fscking at the same time is murder on disk performance until it
finishes.


Another interesting thing: this particular server runs slapd (OpenLDAP),
and when it comes back up its database is corrupted (easily fixed with
db_recover, but still).  This might be because filesystem commits aren't
happening at the end.  I can even stop slapd manually (service slapd
stop), then run sync(8) (I assume this does something for ZFS too), and
the database still comes back hosed if I reboot shortly afterwards.  If
I just start/stop slapd, it's fine.  So I feel like there is a
filesystem/dismount problem going on here.


Additional information: I also have some boxes which will reboot (i.e.,
they don't freeze at the end like some do), but they don't dismount
cleanly either, so they have to rebuild the GMIRROR and run fsck.  This
might be a different issue, too.


Anyone have any thoughts?  Let me know if I can provide more details etc.

--
Adam Strohl
http://www.ateamsystems.com/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick
On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
 Hello -STABLE@,
 
 So I've seen this situation seemingly randomly on a number of both
 physical 9.1 boxes as well as VMs for I would say 6-9 months at
 least.  I finally have a physical box here that reproduces it
 consistently that I can reboot easily (ie; not a production/client
 server).
 
 No matter what I do:
 
 reboot
 shutdown -p
 shutdown -r
 
 This specific server will stop at All buffers synced and not
 actually power down or reboot.  KB input seems to be ignored.  This
 server is a ZFS NAS (with GMIRROR for boot blocks) but the other
 boxes which show this are using GMIRRORs for root/swap/boot (no
 ZFS).
 
 Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
 
 When I reset the server it appears that disks were not dismounted
 cleanly ... on this ZFS box it comes back quick because ZFS is good
 like that but on the other servers with GMIRROR roots rebuilding the
 GMIRROR and fscking at the same time is murder on the
 disk/performance until it finishes.

1. You mention "as well as VMs".  Anything under a virtual machine or
under a hypervisor is going to be very, very, **VERY** different than
bare metal.  So I hope the issues you're talking about above are on bare
metal -- I will assume so.

2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE.
If you use stable/9 (RELENG_9) we need to see "uname -a" output (you can
hide the machine name if you want).

3. Can we please have dmesg from this machine?  The controller and some
other hardware details matter.

4. Does "sysctl hw.usb.no_shutdown_wait=1" help you?

5. Does "sysctl hw.acpi.handle_reboot=1" help you?

6. Does "sysctl hw.acpi.disable_on_reboot=1" help you?

7. If none of the above helps, can you please boot in verbose mode and
then, when the system locks up on "shutdown -r now", take a picture of
the VGA console?

8. Does the machine run moused(8)?  (Please check the process list; do
not rely on rc.conf.)
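
Items 4 through 6 are easiest to evaluate one at a time, so that any change in shutdown behaviour can be pinned on a single setting. A minimal sketch in POSIX sh, assuming root on the affected FreeBSD box: the helper names run and try_tunable and the DRY_RUN switch are hypothetical, and by default the script only prints what it would execute. A value set this way lasts only until the next boot, which suits a one-shot shutdown test; a permanent setting would go in /etc/sysctl.conf instead.

```sh
#!/bin/sh
# Hypothetical helper: set ONE shutdown-related sysctl to 1, then reboot,
# so any change in shutdown behaviour can be tied to that single tunable.
# DRY_RUN defaults to 1, which only prints the commands; DRY_RUN=0 runs them.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

try_tunable() {
    # Accept only the three tunables suggested in the list above.
    case "$1" in
        hw.usb.no_shutdown_wait|hw.acpi.handle_reboot|hw.acpi.disable_on_reboot)
            ;;
        *)
            echo "unexpected tunable: $1" >&2
            return 1
            ;;
    esac
    # Do not reboot if the sysctl could not be set.
    run sysctl "$1=1" || return 1
    run shutdown -r now
}
```

For example, "DRY_RUN=0 try_tunable hw.acpi.handle_reboot" sets that one tunable and reboots immediately.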

 Another interesting thing is that this particular server runs slapd
 (OpenLDAP) which, when it comes back up, has a corrupted DB
 (easily fixed with db_recover, but still).  This might be because FS
 commits aren't happening at the end.   I can even manually stop
 slapd (service slapd stop) then run sync(8) (I assume this does
 something for ZFS too) and it still comes back as hosed if I reboot
 shortly after.  If I start/stop slapd it's fine.  So I feel like
 there is an FS/dismount thing going on here.

sync(8) does not do what you think it does.  Please read (not skim) this
entire thread starting here:

http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982
http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html

Your problem is related to unclean shutdown; fix that and your issues go
away.

 Additional information: I also have some boxes which will reboot
 (ie; they don't freeze like some do at the end) but they don't
 dismount cleanly either and have to rebuild both GMIRROR and fsck.
 This might be a different issue, too.

Every issue needs to be handled/treated separately.

-- 
| Jeremy Chadwick                                 j...@koitsu.org |
| UNIX Systems Administrator               http://jdc.koitsu.org/ |
| Making life hard for others since 1977.            PGP 4BD6C0CB |



Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Steven Hartland

OS version?
- Original Message - 
From: Adam Strohl adams-free...@ateamsystems.com

To: freebsd-stable@freebsd.org
Sent: Wednesday, June 19, 2013 12:35 PM
Subject: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount



This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.



Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Adam Strohl

On 6/19/2013 19:21, Jeremy Chadwick wrote:
 On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
 [...]

 1. You mention as well as VMs.  Anything under a virtual machine or
 under a hypervisor is going to be very, very, **VERY** different than
 bare metal.  So I hope the issues you're talking about above are on bare
 metal -- I will assume so.

Nope, I see basically the same thing sometimes under the ESXi 5.0
hypervisor (and yes, the implications of something so broad worry me).
I just haven't been able to isolate it on one of those units that isn't
critical.  Let's focus on this server for now, though, per your
suggestion below.

 2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE.
 If you use stable/9 (RELENG_9) we need to see uname -a output (you can
 hide the machine name if you want).

Sorry, this ZFS box is 9.1-RELEASE-p4 (kernel built today):

FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun 19
15:31:12 ICT 2013 root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS  amd64

 3. Can we please have dmesg from this machine?  The controller and some
 other hardware details matter.

Sure, take a look at the full log here: http://pastebin.com/k55gVVuU

It covers a boot, then a reboot as I described (you can see it log
"All buffers synced", etc.), then powering back on.

 4. Does sysctl hw.usb.no_shutdown_wait=1 help you?

Weirdly, this allowed it to reboot on the first try (without needing to
be reset), but not on the second.  The "Starting background file system
checks in 60 seconds" message appeared ... that only happens when
something is dirty, right?

On the second try with just this setting, I could Ctrl-Alt-Del it and it
responded ... kind of:

http://i.imgur.com/POAIaNg.jpg

I still had to reset it, though.

 5. Does sysctl hw.acpi.handle_reboot=1 help you?

No change.  It still responded to Ctrl-Alt-Del as above, but likewise it
still needed a reset and came back dirty.

 6. Does sysctl hw.acpi.disable_on_reboot=1 help you?

No change.  Same as above: Ctrl-Alt-Del gets a response, but it still
needs a hard reset.

 7. If none of the above helps, can you please boot verbose mode and then
 when the system locks up on shutdown -r now take a picture of the
 VGA console?

Lots of debug output on boot, obviously, but not much different at
shutdown/hang time:
http://i.imgur.com/SgzSsoP.jpg

 8. Does the machine run moused(8) (check the process list please, do not
 rely on rc.conf) ?

"ps -auxww | grep moused" reveals nothing running (which is how I have
things set).

 Another interesting thing is that this particular server runs slapd
 (OpenLDAP) which, when it comes back up, has a corrupted DB
 (easily fixed with db_recover, but still).  This might be because FS
 commits aren't happening at the end.   I can even manually stop
 slapd (service slapd stop) then run sync(8) (I assume this does
 something for ZFS too) and it still comes back as hosed if I reboot
 shortly after.  If I start/stop slapd it's fine.  So I feel like
 there is an FS/dismount thing going on here.

 sync(8) does not do what you think it does.  Please read (not skim) this
 entire thread starting here:

 http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982
 http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html

Grokking this now ...

 Your problem is related to unclean shutdown; fix that and your issues go
 away.

Yeah, that is my feeling as well.

 Additional information: I also have some boxes which will reboot
 (ie; they don't freeze like some do at the end) but they don't
 dismount cleanly either and have to rebuild both GMIRROR and fsck.
 This might be a different issue, too.

 Every issue needs to be handled/treated separately.

Sure.  I had just run across some threads about that, but I will focus
on this ZFS box (and see whether anything that fixes it here also helps
there, once I can reliably reproduce it outside of production).

--
Adam Strohl
http://www.ateamsystems.com/

Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Dennis Kögel
Hi,

We see very regular I/O hangs lasting about 10 seconds, roughly once per
minute.

Each time this happens, the I/O rate simply drops to zero and all disk
access hangs; this is also very noticeable on the shell, for NFS clients,
etc.  Everything else (networking, kernel, …) seems to continue normally.

Environment: FreeBSD 9.1-RELEASE GENERIC on amd64, using ZFS, on an
ARC1320 PCIe controller with 24x Seagate ST33000650SS (third-party
arcsas.ko driver).

It's easy to observe these hangs under write load, e.g. with 'zpool iostat 1':

void  22.4T  42.6T     34  2.73K  1.07M   293M
void  22.4T  42.6T     20  2.74K   623K   289M
void  22.4T  42.6T    144  2.62K  4.83M   279M
void  22.4T  42.6T     13  2.60K   437K   283M
void  22.4T  42.6T      0      0      0      0  -- hang starts
void  22.4T  42.6T      0      0      0      0
void  22.4T  42.6T      0      0      0      0
void  22.4T  42.6T      0      0      0      0
void  22.4T  42.6T      0      0      0      0
void  22.4T  42.6T      0      0      0      0
void  22.4T  42.6T      0      0      0      0
void  22.4T  42.6T      0      0      0      0
void  22.4T  42.6T      0    296  4.00K  34.2M  -- hang ends
void  22.4T  42.6T      2  2.64K  73.8K   288M
void  22.4T  42.6T      8  3.12K   278K   329M
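
The stalls show up in this output as consecutive samples with zero read and write operations. A minimal sketch for flagging them automatically, assuming the default seven-column layout of "zpool iostat <pool> 1" (pool, alloc, free, read and write operations, read and write bandwidth); the function name find_stalls and the default minimum run of three samples are arbitrary choices for illustration.

```sh
#!/bin/sh
# Hypothetical helper: read `zpool iostat <pool> 1` output on stdin and
# report runs of at least $1 consecutive samples (default 3) in which both
# read and write operations are zero. With a 1-second interval, one sample
# is roughly one second.
find_stalls() {
    awk -v min="${1:-3}" '
        # Only data rows: 7+ fields with numeric read/write op columns.
        # This skips the "pool alloc free ..." header and the dashed rule.
        NF >= 7 && $4 ~ /^[0-9.]+[KMGT]?$/ && $5 ~ /^[0-9.]+[KMGT]?$/ {
            n++
            if ($4 == "0" && $5 == "0") {
                if (run == 0) start = n
                run++
            } else {
                if (run >= min)
                    printf "stall: samples %d-%d (%d samples)\n", start, n - 1, run
                run = 0
            }
        }
        END {
            # A stall still in progress when the input ends.
            if (run >= min)
                printf "stall: samples %d-%d (%d samples, ongoing)\n", start, n, run
        }
    '
}
```

For example, "zpool iostat void 1 | find_stalls 5" would report each stall of five or more seconds once it ends.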

Each time this happens, there is a completely unexplained spike of interrupts 
on uhci0: 'systat -vm' then displays numbers around 270k.

# vmstat -i | grep -E '(arcsas|uhci0|Total)'
irq16: uhci0        1227020890      67708
irq24: arcsas0        12045211        664
Total               1266417827      69882
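
The rate column of "vmstat -i" is an average since boot (here 1227020890 / 67708 ≈ 18,100 seconds, about five hours of uptime), so a short burst barely moves it, whereas "systat -vm" shows current rates. A minimal sketch for measuring the rate over a chosen window instead: save two "vmstat -i" snapshots a known number of seconds apart and divide the difference in the total column by the interval. The function name irq_rates and its file-based interface are assumptions for illustration.

```sh
#!/bin/sh
# Hypothetical helper: compare two saved `vmstat -i` snapshots taken
# $3 seconds apart and print each source's interrupt rate over that
# window (rather than the since-boot average in the "rate" column).
irq_rates() {
    awk -v secs="$3" '
        $1 == "interrupt" { next }          # column header
        NF >= 3 {
            # The name may contain spaces ("irq16: uhci0"); the total is
            # the next-to-last field and the rate is the last.
            name = $1
            for (i = 2; i <= NF - 2; i++) name = name " " $i
            if (FNR == NR) {
                before[name] = $(NF - 1)    # first snapshot
            } else if (name in before) {
                printf "%s %d/s\n", name, ($(NF - 1) - before[name]) / secs
            }
        }
    ' "$1" "$2"
}
```

For example: save "vmstat -i > /tmp/irq.0", wait ten seconds, save "vmstat -i > /tmp/irq.1", then run "irq_rates /tmp/irq.0 /tmp/irq.1 10" while a hang is in progress.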

Things to note:

- Booting a USB-less kernel or disabling all USB in the BIOS doesn't change
a thing (no interrupt spikes are seen, but the hangs remain)
- The hangs / interrupt spikes happen just as often when the system is idle
- Board is a Supermicro X8DTH
- There are two igb cards
- Root is ZFS as well (separate pool though)
- BIOS, Areca firmware and driver are already the latest versions
- Moving the controller to a different slot doesn't change the behaviour
- We have two identical systems and both show the exact same symptoms, so
flaky hardware is probably not the issue

Any ideas would be appreciated.

Thanks,
D.


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Adam Strohl

On 6/19/2013 19:53, Adam Strohl wrote:

 sync(8) does not do what you think it does.  Please read (not skim) this
 entire thread starting here:

 http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/thread.html#16982
 http://lists.freebsd.org/pipermail/freebsd-fs/2013-April/016982.html

 Grokking this now ...

Epic.  So basically "mount -u -o ro FS" is really what I (and probably
everyone else) want, and the sync(8) man page needs a major overhaul
plus a disclaimer (and possibly a recommendation to use "mount -u -o ro
FS" instead).
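
That takeaway can be turned into an explicit pre-reboot sequence. A minimal sketch, assuming the slapd database sits on its own UFS filesystem; the mount point /var/db/openldap-data, the FS and DRY_RUN variables, and the helper names are hypothetical, and by default the commands are only printed. The idea, per the freebsd-fs thread linked earlier, is that downgrading a mount to read-only forces its pending writes out, which a plain sync(8) call does not guarantee; whether the same holds for a ZFS dataset is not covered here.

```sh
#!/bin/sh
# Hypothetical pre-reboot sequence: stop the writer, then downgrade the
# filesystem to read-only so its dirty data has to be written out before
# the reboot. FS is an assumed mount point; DRY_RUN=0 really runs it.
FS=${FS:-/var/db/openldap-data}

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi
}

quiesce_and_reboot() {
    run service slapd stop || return 1
    # If the remount fails (e.g. a file is still open for writing), stop
    # here rather than rebooting with the filesystem still read-write.
    run mount -u -o ro "$FS" || return 1
    run shutdown -r now
}
```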



--
Adam Strohl
http://www.ateamsystems.com/


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Ronald Klop
On Wed, 19 Jun 2013 14:53:19 +0200, Adam Strohl
adams-free...@ateamsystems.com wrote:

 On 6/19/2013 19:21, Jeremy Chadwick wrote:
 On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
 Hello -STABLE@,

 So I've seen this situation seemingly randomly on a number of both
 physical 9.1 boxes as well as VMs for I would say 6-9 months at
 least.  I finally have a physical box here that reproduces it
 consistently that I can reboot easily (ie; not a production/client
 server).

Hi,

My home computer had the same symptom (not rebooting after the "all
buffers flushed" message) a couple of months ago.  But I follow 9-STABLE,
and the problem has been gone for a while now.

Ronald.



Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Ronald Klop

On Wed, 19 Jun 2013 15:01:14 +0200, Dennis Kögel d...@neveragain.de wrote:


First, please send more information about the system:
- The content of /var/run/dmesg.boot.
- The output of "zfs-stats -a" (install /usr/ports/sysutils/zfs-stats).
- The output of "zpool status" and "zpool list".
- Did you configure compression or dedup on the pool?
- Do you keep a lot of snapshots?
- Do you run a cron job every minute that does something with the pool,
such as gathering statistics?


Ronald.

Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick
On Wed, Jun 19, 2013 at 07:53:19PM +0700, Adam Strohl wrote:
 On 6/19/2013 19:21, Jeremy Chadwick wrote:
 On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:
 Hello -STABLE@,
 
 So I've seen this situation seemingly randomly on a number of both
 physical 9.1 boxes as well as VMs for I would say 6-9 months at
 least.  I finally have a physical box here that reproduces it
 consistently that I can reboot easily (ie; not a production/client
 server).
 
 No matter what I do:
 
 reboot
 shutdown -p
 shutdown -r
 
 This specific server will stop at All buffers synced and not
 actually power down or reboot.  KB input seems to be ignored.  This
 server is a ZFS NAS (with GMIRROR for boot blocks) but the other
 boxes which show this are using GMIRRORs for root/swap/boot (no
 ZFS).
 
 Here is what happens on the console: http://i.imgur.com/1H8JMyB.jpg
 
 When I reset the server it appears that disks were not dismounted
 cleanly ... on this ZFS box it comes back quick because ZFS is good
 like that but on the other servers with GMIRROR roots rebuilding the
 GMIRROR and fscking at the same time is murder on the
 disk/performance until it finishes.
 
 1. You mention as well as VMs.  Anything under a virtual machine or
 under a hypervisor is going to be very, very, **VERY** different than
 bare metal.  So I hope the issues you're talking about above are on bare
 metal -- I will assume so.
 
 Nope, I see basically the same thing sometimes under ESXi 5.0
 Hypervisor (and yes it worries me the implications of something so
 broad).  Those unites I just haven't been able to isolate on a
 server which isn't critical.  Lets focus on this server for now
 though per your suggestion below.

I'm sorry but I don't understand your first sentence -- the first part
of your sentence says "nope" (I have to assume in reply to my "on bare
metal" part), but then says "I see basically the same thing sometimes
under ESXi", which implies an alternate environment in comparison (i.e.
we *are* talking about bare metal).  Consider me confused.  :-)

 2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE.
 If you use stable/9 (RELENG_9) we need to see uname -a output (you can
 hide the machine name if you want).
 
 Sorry, this ZFS box is 9.1-R P4 (kernel built today):
 
 FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun
 19 15:31:12 ICT 2013
 root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS  amd64

I suggest trying stable/9 (and staying with it, for that matter).

 3. Can we please have dmesg from this machine?  The controller and some
 other hardware details matter.
 
 Sure take a look at the full log here: http://pastebin.com/k55gVVuU
 
 This includes a boot, then a reboot as I describe (you can see it
 logs the "All buffers synced", etc.) then powering back on.

Thanks.  I was mainly interested in the storage controller being used
(in this case ahci(4)) and the disks being used (notorious ST3000DM001,
known for excessively parking heads).  AFAIK this isn't one of the
controllers that was known for weird quirky issues pertaining to
flushing data to disk on shutdown.

I have to ask: is this FreeBSD box running under a HV?

If it *is not* running under a HV, could we please get exact motherboard
model and version (including BIOS version)?  Sometimes (not always) you
can get this from kenv | grep smbios.
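For readers following along, the relevant keys can be pulled out of kenv(1) output directly instead of eyeballing the grep.  A minimal sketch, assuming kenv prints one name="value" pair per line (as it does on FreeBSD) and that the firmware actually populates the smbios.planar.* and smbios.bios.* keys; smbios_summary is a hypothetical helper name, not an existing tool:

```sh
# smbios_summary: read kenv(1)-style output (name="value" lines) on stdin
# and print the motherboard maker/product and the BIOS version.
# Hypothetical helper; assumes the firmware fills in the SMBIOS tables.
smbios_summary() {
    awk '{
        i = index($0, "=")
        if (i == 0) next              # not a name=value line
        key = substr($0, 1, i - 1)
        val = substr($0, i + 1)
        gsub(/^"|"$/, "", val)        # strip the surrounding quotes
        v[key] = val
    }
    END {
        printf "board: %s %s\n", v["smbios.planar.maker"], v["smbios.planar.product"]
        printf "bios: %s\n", v["smbios.bios.version"]
    }'
}
```

On the machine itself this would be run as kenv | smbios_summary.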

I can also see you're running your own kernel.  We'll get to that in a
moment.

 4. Does sysctl hw.usb.no_shutdown_wait=1 help you?
 
 Weirdly this allowed it to reboot on the first try (without needing
 to be reset), but not the second.

I'm not surprised.  Please re-try with stable/9; Hans has been constantly
working on the USB stack and fixing major bugs.

 The "Starting background file
 system checks in 60 seconds" message appeared ... that only happens
 when something is dirty, right?

No it does not.  That message is always printed when you use background
fsck, which is the default.

I do not advocate using background fsck, because it has been known (and
may still do this -- I do not care to find out, I do not have time for
unreliable filesystem nonsense) to not always fix all filesystem
problems.  Meaning: people using background fsck have been known to boot
into single-user and issue fsck manually and find issues.

Place background_fsck=no in /etc/rc.conf.  If the machine does not
have a clean filesystem on boot-up, you'll know because the system will
immediately begin fsck (in the foreground actively).  You'll recognise
that output if it happens, trust me.
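The setting only helps if it really ends up in what rc(8) reads.  Since rc.conf is an ordinary sh(1) fragment where later assignments win, one way to check the effective value (for example, when a config template may not have been copied over) is to source the file in a subshell.  A minimal sketch: bgfsck_setting is a hypothetical helper, it assumes the stock default of YES from /etc/defaults/rc.conf, and it should be given an absolute path:

```sh
# bgfsck_setting FILE: print the effective background_fsck value after
# sourcing FILE the same way rc(8) does (later assignments override earlier
# ones).  Runs in a subshell so nothing leaks into the caller's environment.
# Assumes the default is YES, as in /etc/defaults/rc.conf.
bgfsck_setting() {
    (
        background_fsck=YES
        . "$1" >/dev/null 2>&1
        echo "$background_fsck"
    )
}
```

bgfsck_setting /etc/rc.conf should print NO (or no; rc(8) compares case-insensitively) once the change is in place.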

 So the second try with just this I could ctrl alt del it and it
 responded .. kind of:
 http://i.imgur.com/POAIaNg.jpg
 
 Still had to reset it though.

This looks like a chicken-and-egg problem -- you're probably fighting
with background fsck, as the messages there indicate some processes
would not die.  I'm just taking a guess though.

I am now going to ask you for more information:

1. gpart show -p xxx where xxx is each disk you have in the system
2. 

Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Steven Hartland
- Original Message - 
From: Ronald Klop ronald-freeb...@klop.yi.org



On Wed, 19 Jun 2013 14:53:19 +0200, Adam Strohl  
adams-free...@ateamsystems.com wrote:



On 6/19/2013 19:21, Jeremy Chadwick wrote:

On Wed, Jun 19, 2013 at 06:35:57PM +0700, Adam Strohl wrote:

Hello -STABLE@,

So I've seen this situation seemingly randomly on a number of both
physical 9.1 boxes as well as VMs for I would say 6-9 months at
least.  I finally have a physical box here that reproduces it
consistently that I can reboot easily (i.e. not a production/client
server).


Hi,

My home computer had the same symptom (not rebooting after 'all buffers  
flushed' message) a couple of months ago. But I follow 9-STABLE and the  
problem is gone for a while now.


avg@ did a lot of work on the ZFS vfs locking which fixed at least one
hang on reboot for ZFS. I don't believe this is in 9.1-RELEASE, so you
should test a stable/9 or 8.4-RELEASE (which is newer than 9.1-RELEASE)
kernel.

   Regards
   Steve


This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it. 


In the event of misdirection, illegible or incomplete transmission please 
telephone +44 845 868 1337
or return the E.mail to postmas...@multiplay.co.uk.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Dennis Kögel
Hi,

Am 19.06.2013 um 15:28 schrieb Ronald Klop:
 First send more information about the system:
 - The content of /var/run/dmesg.boot.
 - Install /usr/ports/sysutils/zfs-stats and send the output of zfs-stats -a.
 - Send the output of zpool status + zpool list.

not sure if I should put them all in this mail? -- I've put them here:

http://pub.neveragain.de/arcsas/sysinfo.txt

 - Did you configure compression or dedup on the pool?
 - Do you keep a lot of snapshots?
 - Do you run a cronjob every minute which does something with the pool? 
 Gathers statistics or something like that.

There's only a handful of datasets (three on one machine, six on the other), 
and currently no snapshots. No deduplication.
Some datasets on one machine have compression, the other machine doesn't have 
compression turned on for any dataset.

No minutely cronjobs, automated logons, nothing alike.

Thanks!
D.


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Adam Strohl

On 6/19/2013 20:35, Jeremy Chadwick wrote:


Nope, I see basically the same thing sometimes under ESXi 5.0
Hypervisor (and yes it worries me the implications of something so
broad).  Those units I just haven't been able to isolate on a
server which isn't critical.  Let's focus on this server for now
though per your suggestion below.


I'm sorry but I don't understand your first sentence -- the first part
of your sentence says "nope" (I have to assume in reply to my "on bare
metal" part), but then says "I see basically the same thing sometimes
under ESXi", which implies an alternate environment in comparison (i.e.
we *are* talking about bare metal).  Consider me confused.  :-)


Basically: The issue is extremely similar if not the same root cause, be 
it a native or virtual server.  This server though is native.





2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE.
If you use stable/9 (RELENG_9) we need to see uname -a output (you can
hide the machine name if you want).


Sorry, this ZFS box is 9.1-R P4 (kernel built today):

FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun
19 15:31:12 ICT 2013
root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS  amd64


I suggest trying stable/9 (and staying with it, for that matter).


The issue is no binary updates and we have a large deploy base, so we've 
stuck with -R and use it internally because it's what we deploy.





3. Can we please have dmesg from this machine?  The controller and some
other hardware details matter.


Sure take a look at the full log here: http://pastebin.com/k55gVVuU

This includes a boot, then a reboot as I describe (you can see it
logs the "All buffers synced", etc.) then powering back on.


Thanks.  I was mainly interested in the storage controller being used
(in this case ahci(4)) and the disks being used (notorious ST3000DM001,
known for excessively parking heads).


Yeah, was not my first choice but then again ... RAIDZ-2 :)  HD supply 
chain here (Thailand) is weird considering how many are made here (and 
yet we can't buy them).  Smartd screams about them possibly needing a firmware 
update (they don't according to Seagate).   Had no issues aside from a 
failure a month or so ago (it's an HD ... it happens).



AFAIK this isn't one of the
controllers that was known for weird quirky issues pertaining to
flushing data to disk on shutdown.

I have to ask: is this FreeBSD box running under a HV?


No, native/direct for sure on this one.



If it *is not* running under a HV, could we please get exact motherboard
model and version (including BIOS version)?  Sometimes (not always) you
can get this from kenv | grep smbios.


No problem I built this one personally:

Asus P8B-X BIOS revision 6103




I can also see you're running your own kernel.  We'll get to that in a
moment.


It's GENERIC with the following added to the end:

# -- Add Support for nicer console
#
options VESA
options SC_PIXEL_MODE

# -- PF Support
#
device pf
device pflog
device pfsync

# -- Core temperature reporting
#
device  coretemp # For Intel CPUs

device  smbios




4. Does sysctl hw.usb.no_shutdown_wait=1 help you?


Weirdly this allowed it to reboot on the first try (without needing
to be reset), but not the second.


I'm not surprised.  Please re-try with stable/9; Hans has been constantly
working on the USB stack and fixing major bugs.


Got it but probably not going to go this route as it means no more 
binary upgrades.  While I can reboot it, it is the office NAS here and 
so 'testing out' -STABLE I think probably isn't going to happen.





The "Starting background file
system checks in 60 seconds" message appeared ... that only happens
when something is dirty, right?


No it does not.  That message is always printed when you use background
fsck, which is the default.


Got it.



I do not advocate using background fsck, because it has been known (and
may still do this -- I do not care to find out, I do not have time for
unreliable filesystem nonsense) to not always fix all filesystem
problems.  Meaning: people using background fsck have been known to boot
into single-user and issue fsck manually and find issues.

Place background_fsck=no in /etc/rc.conf.  If the machine does not
have a clean filesystem on boot-up, you'll know because the system will
immediately begin fsck (in the foreground actively).  You'll recognise
that output if it happens, trust me.


Preaching to the choir, we set this on all servers this one somehow did 
not have it set (I think due to ZFS making it unique and not copying our 
rc.conf template over properly).





So the second try with just this I could ctrl alt del it and it
responded .. kind of:
http://i.imgur.com/POAIaNg.jpg

Still had to reset it though.


This looks like a chicken-and-egg problem -- you're probably fighting
with background fsck, as the messages there indicate some processes
would not die.  I'm just taking a guess though.


Yeah.  Even with no background fsck though I still 

Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Steven Hartland


- Original Message - 
From: Adam Strohl adams-free...@ateamsystems.com

To: Jeremy Chadwick j...@koitsu.org
Cc: freebsd-stable@freebsd.org
Sent: Wednesday, June 19, 2013 3:15 PM
Subject: Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly 
dismount



On 6/19/2013 20:35, Jeremy Chadwick wrote:


Nope, I see basically the same thing sometimes under ESXi 5.0
Hypervisor (and yes it worries me the implications of something so
broad).  Those units I just haven't been able to isolate on a
server which isn't critical.  Let's focus on this server for now
though per your suggestion below.


I'm sorry but I don't understand your first sentence -- the first part
of your sentence says "nope" (I have to assume in reply to my "on bare
metal" part), but then says "I see basically the same thing sometimes
under ESXi", which implies an alternate environment in comparison (i.e.
we *are* talking about bare metal).  Consider me confused.  :-)


Basically: The issue is extremely similar if not the same root cause, be 
it a native or virtual server.  This server though is native.





2. We need to know what version of 9.1 you're using, i.e. 9.1-RELEASE.
If you use stable/9 (RELENG_9) we need to see uname -a output (you can
hide the machine name if you want).


Sorry, this ZFS box is 9.1-R P4 (kernel built today):

FreeBSD ilos.dsn 9.1-RELEASE-p4 FreeBSD 9.1-RELEASE-p4 #6: Wed Jun
19 15:31:12 ICT 2013
root@hostname:/usr/obj/usr/src/sys/ATEAMSYSTEMS  amd64


I suggest trying stable/9 (and staying with it, for that matter).


The issue is no binary updates and we have a large deploy base, so we've 
stuck with -R and use it internally because it's what we deploy.


You still need to test if stable/9 fixes your issue though as otherwise
you don't know if the issue you're seeing has already been fixed, and if
it's the old known ZFS vfs hang on shutdown, it has.

   Regards
   Steve




Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Steven Hartland

Any timeouts show in /var/log/messages or in the areca event log?

- Original Message - 
From: Dennis Kögel d...@neveragain.de

Am 19.06.2013 um 15:28 schrieb Ronald Klop:

First send more information about the system:
- The content of /var/run/dmesg.boot.
- Install /usr/ports/sysutils/zfs-stats and send the output of zfs-stats -a.
- Send the output of zpool status + zpool list.


not sure if I should put them all in this mail? -- I've put them here:

http://pub.neveragain.de/arcsas/sysinfo.txt


- Did you configure compression or dedup on the pool?
- Do you keep a lot of snapshots?
- Do you run a cronjob every minute which does something with the pool? Gathers 
statistics or something like that.


There's only a handful of datasets (three on one machine, six on the other), 
and currently no snapshots. No deduplication.
Some datasets on one machine have compression, the other machine doesn't have 
compression turned on for any dataset.

No minutely cronjobs, automated logons, nothing alike.






Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Adam Strohl

On 6/19/2013 21:21, Steven Hartland wrote:

You still need to test if stable/9 fixes your issue though as otherwise
you don't know if the issue you're seeing has already been fixed, and if
it's the old known ZFS vfs hang on shutdown, it has.


Thanks Steve, understood but probably not going to happen with this box. 
 I can reboot this thing but it's our NAS and not a test bed.  This 
problem on this machine isn't a big deal because it's a server and not 
rebooted often (and easy to bring back).  But I was more hoping it would 
let me easily test solutions to the issue, since the other servers 
showing the issue are in client production, keeping in mind that the VMs 
that do not use ZFS also show a similar/identical issue.  My gut says it 
appeared in/with 9.1 (we never saw this with 9.0 servers).   It is also 
possible this is a different issue from those other servers and VMs.


How far away is 9.2? ;-P

Depending on how things go with Jeremy I'll probably have to wait this 
out unless I can get a test machine or VM where I can reproduce the 
issue AND upgrade it to -STABLE (again assuming it's even the same issue).



Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Steven Hartland


- Original Message - 
From: Adam Strohl adams-free...@ateamsystems.com

To: Steven Hartland kill...@multiplay.co.uk
Cc: Jeremy Chadwick j...@koitsu.org; freebsd-stable@freebsd.org
Sent: Wednesday, June 19, 2013 3:29 PM
Subject: Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly 
dismount



On 6/19/2013 21:21, Steven Hartland wrote:

You still need to test if stable/9 fixes your issue though as otherwise
you don't know if the issue you're seeing has already been fixed, and if
it's the old known ZFS vfs hang on shutdown, it has.


Thanks Steve, understood but probably not going to happen with this box. 
 I can reboot this thing but it's our NAS and not a test bed.  This 
problem on this machine isn't a big deal because it's a server and not 
rebooted often (and easy to bring back).  But I was more hoping it would 
let me easily test solutions to the issue, since the other servers 
showing the issue are in client production, keeping in mind that the VMs 
that do not use ZFS also show a similar/identical issue.  My gut says it 
appeared in/with 9.1 (we never saw this with 9.0 servers).   It is also 
possible this is a different issue from those other servers and VMs.


How far away is 9.2? ;-P

Depending on how things go with Jeremy I'll probably have to wait this 
out unless I can get a test machine or VM where I can reproduce the 
issue AND upgrade it to -STABLE (again assuming it's even the same issue).


Don't rule out there being more than one issue at play.

   Regards
   Steve




Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Dennis Kögel
Am 19.06.2013 um 16:28 schrieb Steven Hartland:
 Any timeouts show in /var/log/messages or in the areca event log?

System logs don't show anything suspicious.

Areca CLI utility - event info is empty as well.



Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Steven Hartland
- Original Message - 
From: Dennis Kögel d...@neveragain.de




Am 19.06.2013 um 16:28 schrieb Steven Hartland:

Any timeouts show in /var/log/messages or in the areca event log?


System logs don't show anything suspicious.

Areca CLI utility - event info is empty as well.


I'm not familiar with that model of the Areca but have you tried
with the standard OS driver or does it not support that card?

Also when you see hangs can you access the disk directly or not
e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?

   Regards
   Steve 






Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Dennis Kögel
Am 19.06.2013 um 16:47 schrieb Steven Hartland:
 I'm not familiar with that model of the Areca but have you tried
 with the standard OS driver or does it not support that card?

The ARC1320 (non-raid) unfortunately isn't supported by the in-tree driver.

 Also when you see hangs can you access the disk directly or not
 e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?

Interesting idea. The dd then hangs right until everything else resumes as well.

^T during hang says: load: 12.39  cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 
1632k
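The bracketed field in a ^T (SIGINFO) status line is the wait channel the process is sleeping on; here [physrd], which presumably means dd is blocked waiting for a raw device read to complete, i.e. below ZFS.  If you end up collecting several of these during a stall, a small sketch to pull out command, PID and wait channel (siginfo_wchan is a hypothetical helper; it assumes the "cmd: NAME PID [WCHAN]" layout shown above):

```sh
# siginfo_wchan: read ^T status lines on stdin, e.g.
#   load: 12.39  cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 1632k
# and print "COMMAND PID WCHAN" for each one.  Lines that do not contain
# the "cmd: NAME PID [WCHAN]" layout are skipped.
siginfo_wchan() {
    sed -n 's/.*cmd: \([^ ]*\) \([0-9][0-9]*\) \[\([^]]*\)].*/\1 \2 \3/p'
}
```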



Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick
On Wed, Jun 19, 2013 at 09:15:18PM +0700, Adam Strohl wrote:
 On 6/19/2013 20:35, Jeremy Chadwick wrote:

I've snipped out portions which aren't relevant at this point in the
convo.  I'm trying to be terse as much as possible here (honest).

To recap for readers/mailing list:

- Adam sees the same behaviour on systems on bare metal, as well as
  FreeBSD guests running under VMware ESXi 5.0 hypervisor.  However,
  as I stated on the list just yesterday about lock-ups on shutdown,
  every situation may be different and there is a well-established
  history of this problem on FreeBSD where the root causes (bugs)
  were completely different from one another.

- The system we're discussing at this point in the thread is on
  bare metal -- specifically an Asus P8B-X motherboard, with BIOS
  version 6103, driven entirely by on-board Intel AHCI (not BIOS-level
  RAID).

- Adam runs 9.1-RELEASE because of business needs pertaining to
  freebsd-update and binary updates.  (I ask more about this below for
  the benefit of readers, however -- because this situation comes
  up a lot and I want to know what real-world admins do.)

 Thanks.  I was mainly interested in the storage controller being used
 (in this case ahci(4)) and the disks being used (notorious ST3000DM001,
 known for excessively parking heads).
 
 Yeah, was not my first choice but then again ... RAIDZ-2 :)  HD
 supply chain here (Thailand) is weird considering how many are made
 here (and yet we can't buy them).  Smartd screams about them possibly needing a
 firmware update (they don't according to Seagate).   Had no issues
 aside from a failure a month or so ago (it's an HD ... it
 happens).

Absolutely understood -- and FYI, in case you need backup, your thought
process/conclusion here is spot on (re: it's a MHDD, failures happen).

Irrelevant to your shutdown problem: as for smartmontools bitching about
the firmware: no vendors disclose what actual changes go into their
drive firmware updates (vendors if you are reading this: I will have
your souls...), so I have to read a bunch of end-user forums where
nobody knows what they're talking about, and then of course find this
highly educational *cough* article from Adaptec:

http://ask.adaptec.com/app/answers/detail/a_id/17241/~/known-issues-with-seagate-barracuda-7200.14-desktop-drives

The problem here is that there have been *so many* firmware bugs with
Seagate's drives in the past 2 years or so that it's impossible for me
to know which fixes what.  You buy what you buy because that's what you
buy, and that's cool -- but I avoid their stuff like the plague.

unrelated
Readers: if any of you have a ST[123]000DM001 drive running the CC24
firmware, and can confirm high head parking counts (SMART attribute
193), and are willing to upgrade your drive firmware to the latest then
see if the LCC increments stop (or at least settle down to normal
levels), I'd love to hear from you.  I have been socially boycotting
these models of drives because of that idiotic firmware design choice
for quite some time now (not to mention the parking on those drives
is audibly loud in a normal living room), and if the F/W actually
inhibits the excessive parking then I have some drives to consider
upgrading.  :-)
/unrelated
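For anyone willing to report numbers, the load cycle count can be read straight out of smartctl -A output (sysutils/smartmontools), so before/after values across a firmware update are easy to compare.  A sketch, assuming smartmontools' usual attribute table layout where RAW_VALUE is the tenth column; lcc_from_smartctl is a hypothetical helper name:

```sh
# lcc_from_smartctl: read "smartctl -A /dev/adaN" output on stdin and print
# the raw value of SMART attribute 193 (Load_Cycle_Count).
# Assumes the standard table columns: ID# ATTRIBUTE_NAME FLAG VALUE WORST
# THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE, i.e. the raw value is field 10.
# Returns non-zero if attribute 193 is not present.
lcc_from_smartctl() {
    awk '$1 == "193" { print $10; found = 1 } END { exit !found }'
}
```

Usage would be smartctl -A /dev/ada0 | lcc_from_smartctl, run once before the update and again some days after.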

 I can also see you're running your own kernel.  We'll get to that in a
 moment.
 
 It's GENERIC with the following added to the end:
 
 # -- Add Support for nicer console
 #
 options VESA
 options SC_PIXEL_MODE

Can you try removing VESA and SC_PIXEL_MODE please?  I know that
sounds crazy (what on earth would that have to do with it?), but
please try it.  I can explain the justification if need be -- I'm being
extra paranoid of something that got discovered here on -stable only a
few days ago.  It's a stretch, but I can see potential relevance.  I can
provide details/links later.

 4. Does sysctl hw.usb.no_shutdown_wait=1 help you?
 
 Weirdly this allowed it to reboot on the first try (without needing
 to be reset), but not the second.
 
 I'm not surprised.  Please re-try with stable/9; Hans has been constantly
 working on the USB stack and fixing major bugs.
 
 Got it but probably not going to go this route as it means no more
 binary upgrades.  While I can reboot it, it is the office NAS here
 and so 'testing out' -STABLE I think probably isn't going to happen.

I understand.  I have a question relating to this below.

 Place background_fsck=no in /etc/rc.conf.  If the machine does not
 have a clean filesystem on boot-up, you'll know because the system will
 immediately begin fsck (in the foreground actively).  You'll recognise
 that output if it happens, trust me.
 
 Preaching to the choir, we set this on all servers this one somehow
 did not have it set (I think due to ZFS making it unique and not
 copying our rc.conf template over properly).

Where should I send my bill for services rendered?  (Totally kidding --
just had some breakfast so feeling chipper :-) )

 So the second try with just this I could ctrl alt 

Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Steven Hartland
- Original Message - 
From: Dennis Kögel d...@neveragain.de

 I'm not familiar with that model of the Areca but have you tried
 with the standard OS driver or does it not support that card?

The ARC1320 (non-raid) unfortunately isn't supported by the in-tree driver.

 Also when you see hangs can you access the disk directly or not
 e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?

Interesting idea. The dd then hangs right until everything else resumes as well.

^T during hang says: load: 12.39  cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 
1632k


So it sounds like you're seeing device-level hangs, which indicates
either a driver, HW, controller FW or disk level issue.

You might want to try adding a separate disk (different type)
to the controller which isn't used and perform the same test to
try and eliminate disks as the source of the issue.

Also see what gstat -d shows during this? Do you see a big spike
of activity either side?

   Regards
   Steve 






Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Jeremy Chadwick
On Wed, Jun 19, 2013 at 05:02:20PM +0200, Dennis Kögel wrote:
 Am 19.06.2013 um 16:47 schrieb Steven Hartland:
  I'm not familiar with that model of the Areca but have you tried
  with the standard OS driver or does it not support that card?
 
 The ARC1320 (non-raid) unfortunately isn't supported by the in-tree driver.

Which model of the ARC1320 are you using (there are 2).  I'm having
trouble understanding their chart too:

http://www.areca.us/products/sasnoneraid6g.htm

Because the controllers claim to support up to 128 disks, via break-out
cables, but I'm not sure.

You aren't using any port multipliers, are you?

  Also when you see hangs can you access the disk directly or not
  e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?
 
 Interesting idea. The dd then hangs right until everything else resumes as 
 well.
 
 ^T during hang says: load: 12.39  cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 
 1632k

Is this *while* you have immense amounts of ZFS write I/O going to
those drives (your zpool iostat was showing ~250-300MB/sec to the pool)?

It's very important to note that the stats you showed were during
writes.

What we're trying to figure out here is where the blocking (waiting) is
happening:

a) the ZFS layer
b) the storage driver layer ('arcsas', the 3rd-party unofficial driver)
c) the CAM layer
d) the GEOM layer
e) something with the disk(s)
f) something with memory I/O going on (say between the storage driver
   and ZFS, for lack of better way to phrase it)

I have a very big Email written for you, but I wanted to let certain
answers to Ronald's questions come out first.

-rw---1 jdc   users 5576 Jun 19 06:49 dennis_kgel_response.txt

I need to re-word this and take into consideration some of the new stuff
said up to now, but I don't know if I'll have the time for this (you
should see my desktop right now, I have literally 4 IM messages to
answer and my Email box is non-stop).

The one I want to get out of the way right now is this:

Can you please try putting this in /boot/loader.conf + reboot and
see if the behaviour for you changes?

vfs.zfs.no_write_throttle=1

Warning: this may actually exacerbate the problem, depending on
what the nature/root cause is.  Right now I'm of the opinion ZFS is
actually doing the Right Thing(tm) and that the issue may be in Areca's
driver, but that's speculation until I have proof.  But the write throttling
stuff added semi-recently (by the Illumos folks, this is not a FreeBSD
feature) has had some reports of problems where disabling it helped
immensely.
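If you'd rather script the change (say, on both machines) than edit by hand, here is a sketch that sets a loader.conf(5) tunable idempotently, replacing any existing line for the same name.  set_tunable is a hypothetical helper; it deliberately avoids sed -i, whose syntax differs between BSD and GNU sed.  The tunable is only read at boot, so the reboot is still required:

```sh
# set_tunable FILE NAME VALUE: drop any existing NAME= line from FILE and
# append NAME="VALUE", so repeated runs leave exactly one entry.
set_tunable() {
    file=$1 name=$2 value=$3
    # Escape dots so "vfs.zfs.x" only matches literal dots in grep.
    pat=$(printf '%s' "$name" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # grep -v exits 1 when it outputs nothing; that is not an error here.
        grep -v "^[[:space:]]*${pat}[[:space:]]*=" "$file" > "$tmp" || true
    fi
    printf '%s="%s"\n' "$name" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

For this test that would be set_tunable /boot/loader.conf vfs.zfs.no_write_throttle 1, followed by a reboot.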

Important: 24 disks off a single controller is a lot of bandwidth.
That controller may be overwhelmed, in which case you would see
exactly this kind of behaviour as the controller is screaming GOD HELP
ME, I'M TRYING TO DO ALL THIS STUFF AND YOU KEEP THROWING I/O AT ME.
:-)  This is also why I ask about port multiplier usage.
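To put the pool figures in per-disk terms: spreading the ~250-300MB/sec that zpool iostat showed evenly over 24 disks works out to roughly 10.4-12.5MB/sec per disk.  That is only an average under a simplifying assumption of even distribution (RAIDZ parity and metadata writes push the physical figure up), but it gives a baseline to hold per-disk gstat numbers against during a stall.  A trivial helper for the arithmetic (per_disk_mb is hypothetical):

```sh
# per_disk_mb TOTAL_MB_PER_SEC DISKS: average share of the pool throughput
# per disk, assuming the load is spread evenly (ignores parity/metadata).
per_disk_mb() {
    awk -v total="$1" -v disks="$2" 'BEGIN { printf "%.1f\n", total / disks }'
}
```

per_disk_mb 300 24 prints 12.5; per_disk_mb 250 24 prints 10.4.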

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administratorhttp://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |



Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Matthew D. Fuller
On Wed, Jun 19, 2013 at 08:04:14AM -0700 I heard the voice of
Jeremy Chadwick, and lo! it spake thus:
 
 unrelated
 Readers: if any of you have a ST[123]000DM001 drive running the CC24
 firmware, and can confirm high head parking counts (SMART attribute
 193), and are willing to upgrade your drive firmware to the latest then
 see if the LCC increments stop (or at least settle down to normal
 levels), I'd love to hear from you.  I have been socially boycotting
 these models of drives because of that idiotic firmware design choice
 for quite some time now (not to mention the parking on those drives
 is audibly loud in a normal living room), and if the F/W actually
 inhibits the excessive parking then I have some drives to consider
 upgrading.  :-)
 </unrelated>

I dunno about firmware, but you can smack 'em with a big hammer...

/etc/rc.local:
# Disable APM (ATA SET FEATURES 0xEF, subcommand 0x85) on ada0 and ada1.
for i in 0 1; do
    /sbin/camcontrol cmd ada${i} -a "EF 85 00 00 00 00 00 00 00 00 00 00"
done

x-ref:
http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052997.html


LCC was somewhere in the upper 400's (I wanna say 480-some?) a year
and change ago when I dropped that in.  It's 506/493 now on the two
drives.
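The counts Matthew quotes come from SMART attribute 193 (Load_Cycle_Count). Below is a small sketch for tracking them that pulls the raw value out of `smartctl -A` output. It assumes smartmontools is installed and that the attribute table uses the usual ten-column layout with RAW_VALUE last. The function names are illustrative and not part of any tool.

```sh
#!/bin/sh
# lcc_from_smart: read `smartctl -A` output on stdin and print the raw
# value (10th column) of attribute 193, Load_Cycle_Count.
# Exit status is 1 if the attribute is not present.
lcc_from_smart() {
    awk '$1 == 193 { print $10; found = 1 } END { exit !found }'
}

# report_lcc DISK...: print the load cycle count for each disk, e.g.
#   report_lcc ada0 ada1
report_lcc() {
    for disk in "$@"; do
        printf '%s: ' "$disk"
        smartctl -A "/dev/$disk" | lcc_from_smart || echo "attribute 193 not reported"
    done
}
```

Running `report_lcc ada0 ada1` before and some hours after the rc.local change gives a rough idea of whether the parking has actually stopped.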


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick
On Wed, Jun 19, 2013 at 10:53:46AM -0500, Matthew D. Fuller wrote:
 On Wed, Jun 19, 2013 at 08:04:14AM -0700 I heard the voice of
 Jeremy Chadwick, and lo! it spake thus:
  
  <unrelated>
  Readers: if any of you have a ST[123]000DM001 drive running the CC24
  firmware, and can confirm high head parking counts (SMART attribute
  193), and are willing to upgrade your drive firmware to the latest then
  see if the LCC increments stop (or at least settle down to normal
  levels), I'd love to hear from you.  I have been socially boycotting
  these models of drives because of that idiotic firmware design choice
  for quite some time now (not to mention the parking on those drives
  is audibly loud in a normal living room), and if the F/W actually
  inhibits the excessive parking then I have some drives to consider
  upgrading.  :-)
  </unrelated>
 
 I dunno about firmware, but you can smack 'em with a big hammer...
 
 /etc/rc.local:
 for i in 0 1; do
 /sbin/camcontrol cmd ada${i} -a "EF 85 00 00 00 00 00 00 00 00 00 00"
 done
 
 x-ref:
 http://lists.freebsd.org/pipermail/freebsd-stable/2009-November/052997.html
 
 
 LCC was somewhere in the upper 400's (I wanna say 480-some?) a year
 and change ago when I dropped that in.  It's 506/493 now on the two
 drives.

The above CDB + subcommand disables APM entirely.  There is a lot more
to APM than just parking heads (and in all honesty, APM should have
nothing to do with parking heads).  Disabling APM can have drastic
effects on drive temperature (meaning there are certain chip and/or
motor operations that the feature controls *in addition* to head
parking), as well as on other, undocumented firmware-level features.

Furthermore, that CDB does not work for all drives.  There are Seagate
drives -- I know because I bought some and returned them when the APM
trick did not work -- that lack the LCC-disable tie-in to APM.  Some of
those drives rejected the CDB outright (returning an ATA error status),
while others accepted it but nothing in the 0xEC (IDENTIFY DEVICE) data
reported any change.

The only model of drive I know that reliably works with this method is
the WD Green/-GP drive, and the drive temperatures do increase.  No idea
on the Blues.  (Another reason I recommend the Reds...)

What *should* have happened is that a new 0xEF subcommand was created
for this.  Subcommands range from 0x00 to 0xFF.  The T13 spec shows
that a huge number of them (I'd say 30% or more) are marked Reserved
and an additional 30% or so are marked Obsolete.  And finally,
0x56-0x5C, 0xD6-0xDC and 0xE0 are Vendor Specific.
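A sketch of the bytes `camcontrol cmd -a` sends makes the 0xEF discussion concrete. `camcontrol cmd -a` takes twelve ATA register bytes in the order listed in camcontrol(8): command, features, lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp, features_exp, sector_count, sector_count_exp. The SET FEATURES subcommand is therefore the second byte. The sketch below builds that string. The helper name is hypothetical. Using the count register for the APM level (subcommand 0x05 with a level such as 0xFE) is an assumption based on the ATA command set, so double-check it against your drive's documentation before sending raw commands.

```sh
#!/bin/sh
# set_features_args SUBCMD [COUNT]  (hypothetical helper)
# Print the 12-register argument for `camcontrol cmd -a` that issues ATA
# SET FEATURES (command 0xEF). Register order, per camcontrol(8):
#   command features lba_low lba_mid lba_high device
#   lba_low_exp lba_mid_exp lba_high_exp features_exp
#   sector_count sector_count_exp
# SUBCMD goes in the features register; COUNT (default 00) goes in
# sector_count, which some subcommands use as a parameter.
set_features_args() {
    sub=$1
    count=${2:-00}
    printf 'EF %s 00 00 00 00 00 00 00 00 %s 00\n' "$sub" "$count"
}
```

For example, `set_features_args 85` reproduces the exact string from the rc.local loop in this thread (disable APM). Under the assumption above, `/sbin/camcontrol cmd ada0 -a "$(set_features_args 05 FE)"` would re-enable APM at level 0xFE.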

But looking at this from a more general view, the real issue is that
these types of features should not have been introduced to begin with.
The vendors introduced this problem, and are now marketing drives with
said feature disabled, claiming "we fixed the problem that annoys so
many of you!" -- the same problem **they introduced without asking
anyone**.

I will have -- and eat -- their souls.

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator   http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |



Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Matthew D. Fuller
On Wed, Jun 19, 2013 at 09:16:35AM -0700 I heard the voice of
Jeremy Chadwick, and lo! it spake thus:
 
 The above CDB + subcommand disables APM entirely.  There is a lot
 more to APM than just parking heads (and in all honesty, APM should
 have nothing to do with parking heads).  Disabling APM can actually
 have drastic effects on drive temperature (meaning there are certain
 chip and/or motor operations that said feature controls *in
 addition* to head parking), and other firmware-level features that
 aren't documented.

True enough, in concept.  With all the drives sitting behind
ventilation perfectly capable of dealing with 15kRPM drives, I don't
worry about what that might do to the 7200's though...


 Furthermore, that CDB does not work for all drives.  There are
 Seagate drives -- I know because I bought some and returned them
 when the APM trick did not work -- that lack the LCC-disable tie-in
 to APM.  The drive either rejected the CDB (ATA status code error
 returned), while others accepted it but nothing in 0xec (IDENTIFY)
 reported as got changed.

Well, I haven't seen it with these.  Several of
ada0: <ST1000DM003-9YN162 CC4D> ATA-8 SATA 3.x device
and some systems with CC4C too.


 I will have -- and eat -- their souls.

The problem with that is that the indigestible bits of soul just get
passed right back into the ecosystem, and in a more concentrated form.

Some might suggest that's already happened, and is what got us here in
the first place  8-}


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Jeremy Chadwick
On Wed, Jun 19, 2013 at 11:34:39AM -0500, Matthew D. Fuller wrote:
 On Wed, Jun 19, 2013 at 09:16:35AM -0700 I heard the voice of
 Jeremy Chadwick, and lo! it spake thus:
  
  The above CDB + subcommand disables APM entirely.  There is a lot
  more to APM than just parking heads (and in all honesty, APM should
  have nothing to do with parking heads).  Disabling APM can actually
  have drastic effects on drive temperature (meaning there are certain
  chip and/or motor operations that said feature controls *in
  addition* to head parking), and other firmware-level features that
  aren't documented.
 
 True enough, in concept.  With all the drives sitting behind
 ventilation perfectly capable of dealing with 15kRPM drives, I don't
 worry about what that might do to the 7200's though...

Justified in your environment, but not in mine -- where most of my
systems (at home) are extremely quiet (1000-1200rpm fans, lots of noise
dampening material, etc.).  A 10C increase *during idle* is enough to
make me wary.  I also have extremely sensitive hearing, so I can hear
drives clicking from quite a distance -- I guess working with them for
so long over the years has made me sensitive to 'em.

  Furthermore, that CDB does not work for all drives.  There are
  Seagate drives -- I know because I bought some and returned them
  when the APM trick did not work -- that lack the LCC-disable tie-in
  to APM.  The drive either rejected the CDB (ATA status code error
  returned), while others accepted it but nothing in 0xec (IDENTIFY)
  reported as got changed.
 
 Well, I haven't seen it with these.  Several of
 ada0: <ST1000DM003-9YN162 CC4D> ATA-8 SATA 3.x device
 and some systems with CC4C too.

The drives I was testing were STx000DM001.  I don't remember if I had a
DM002.  I also don't remember the firmware version they had on them, but
I do remember there were no updates available from Seagate at that time.
On the other hand, their forum was *filled* with post after post about
the issue, including one fellow whose drive in something like 3 months
was almost reaching MTBF head park/reload count.

But my point is this: 3.5" drives do not need this feature in 95% of
environments.  In desktop systems it's worthless -- in consumer desktops
it accomplishes nothing but noise and annoyance, and it hurts I/O; in
business desktop environments it serves no purpose because most
places have their desktops go into sleep mode (so drive standby/sleep
gets used).  And in the server environment it's pure 100% worthless.

With 2.5" drives I can see it being more useful, but only if the drive
is used in a laptop.  There are NASes (and now servers too!) which use
2.5" drives, and I sure as hell wouldn't want that happening there.

So really it's just a bad feature all around that should be specific to
one environment demographic; the vendors should have made a 2.5" drive
dedicated to laptops with this feature enabled, and disabled it on
all other drives (2.5" and 3.5").  What we got was nearly the opposite.

  I will have -- and eat -- their souls.
 
 The problem with that is that the undigestible bits of soul just get
 passed right back into the ecosystem, and in a more concentrated form.
 
 Some might suggest that's already happened, and is got us here in the
 first place  8-}

If you had what I do (moderate-to-severe IBS), you'd know that it
definitely doesn't get passed back in a more concentrated form.  First
joke I've been able to make about my health condition, yeah!  Ha!  I
kill me! -- Alf

-- 
| Jeremy Chadwick   j...@koitsu.org |
| UNIX Systems Administrator   http://jdc.koitsu.org/ |
| Making life hard for others since 1977. PGP 4BD6C0CB |



Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Matthew D. Fuller
On Wed, Jun 19, 2013 at 09:52:00AM -0700 I heard the voice of
Jeremy Chadwick, and lo! it spake thus:
 
 Justified in your environment, but not in mine -- where most of my
 systems (at home) are extremely quiet (1000-1200rpm fans, lots of
 noise dampening material, etc.).  A 10C increase *during idle* is
 enough to make me wary.

Mmm.  Well, some of them are in 1U cases, and so behind very loud
little fans (but that's in a datacenter where *I* don't have to hear
it).  But the ones sitting beside me are behind 1kRPM fans (80 and
120 mm), and are around 28-30C (which is a tad high; the filters are
overdue for cleaning).  And ambient is probably 24-25.  I'd be
seriously creeped out if an *active* drive were 10 over ambient, much
less if flipping some config setting moved anything 10.

(this is also why I _hate_ laptops...)


 On the other hand, their forum was *filled* with post after post
 about the issue, including one fellow whose drive in something like
 3 months was almost reaching MTBF head park/reload count.

Oh, sure.  If you don't get the stupid things to stop, you can measure
their life with an egg timer.  The 400-some these drives got before I
turned APM off happened in, like, an afternoon.


 If you had what I do (moderate-to-severe IBS), you'd know that it
 definitely doesn't get passed back in a more concentrated form.
 First joke I've been able to make about my health condition, yeah!

Well, if your diet consists of hard drive manufacturers' souls, it's
no wonder your system got all screwed up!  You gotta find something to
eat with more moral fiber!  ;p


-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.


Re: system sporadically hangs on shutdown after switching to WITH_NEW_XORG

2013-06-19 Thread Warren Block

On Sun, 16 Jun 2013, Ian Lepore wrote:


On Sun, 2013-06-16 at 09:07 -0700, Jeremy Chadwick wrote:

On Sun, Jun 16, 2013 at 06:01:49PM +0200, Michiel Boland wrote:

On 06/16/2013 17:55, Jeremy Chadwick wrote:
[...]


Are you running moused(8)?  Actually, I can see quite clearly that you
are in your core.txt:

Starting ums0 moused.

Try turning that off.  Don't ask me how, because devd(8) / devd.conf(5)
might be involved.



The moused is started by devd - I don't see a quick way of turning that off.


Comment out the relevant crap in devd.conf(5).  Search for "ums"
and comment out the two notify sections.


I don't understand why people treat devd as if it's some sort of evil
virus that they're forced to live with (using phrases like "crap in
devd.conf").  In general, the standard devd rules tend to fall into 3
categories:
 * use logger(1) to record some anomaly
 * kldload a module
 * invoke a standard /etc/rc.d script

For moused, the devd rules invoke /etc/rc.d/moused, which implies that
setting moused_enable=NO in rc.conf would be all that's needed to
disable it.


Seems that way, but it's misleading.  Plug in a USB mouse, and devd will 
start moused anyway (with different options, but still...).  ISTR that 
can be disabled with


  moused_enable=NO
  moused_nondefault_enable=NO

I have not tested that lately.


Re: shutdown -r / shutdown -h / reboot all hang and don't cleanly dismount

2013-06-19 Thread Adam Strohl

On 6/19/2013 22:04, Jeremy Chadwick wrote:

On Wed, Jun 19, 2013 at 09:15:18PM +0700, Adam Strohl wrote:

On 6/19/2013 20:35, Jeremy Chadwick wrote:


I've snipped out portions which aren't relevant at this point in the
convo.  I'm trying to be terse as much as possible here (honest).

To recap for readers/mailing list:

- Adam sees the same behaviour on systems on bare metal, as well as
   FreeBSD guests running under the VMware ESXi 5.0 hypervisor.  However,
   as I stated on the list just yesterday about lock-ups on shutdown,
   every situation may be different and there is a well-established
   history of this problem on FreeBSD where each root cause (bug)
   was completely different from the others.

- The system we're discussing at this point in the thread is on
   bare metal -- specifically an Asus P8B-X motherboard, with BIOS
   version 6103, driven entirely by on-board Intel AHCI (not BIOS-level
   RAID).

- Adam runs 9.1-RELEASE because of business needs pertaining to
   freebsd-update and binary updates.  (I ask more about this below for
   the benefit of readers, however -- because this situation comes
   up a lot and I want to know what real-world admins do)



This is all correct.


Thanks.  I was mainly interested in the storage controller being used
(in this case ahci(4)) and the disks being used (notorious ST3000DM001,
known for excessively parking heads).


Yeah, was not my first choice but then again ... RAIDZ-2 :)  The HD
supply chain here (Thailand) is weird considering how many are made
here (yet you can't buy them).  Smartd screams about them possibly needing a
firmware update (they don't according to Seagate).   Had no issues
aside from a failure a month or so ago (it's an HD ... it
happens).


Absolutely understood -- and FYI, in case you need backup, your thought
process/conclusion here is spot on (re: it's a MHDD, failures happen).


Indeed :-D



Irrelevant to your shutdown problem: as for smartmontools bitching about
the firmware: no vendors disclose what actual changes go into their
drive firmware updates (vendors if you are reading this: I will have
your souls...), so I have to read a bunch of end-user forums where
nobody knows what they're talking about, and then of course find this
highly educational *cough* article from Adaptec:

http://ask.adaptec.com/app/answers/detail/a_id/17241/~/known-issues-with-seagate-barracuda-7200.14-desktop-drives



Yeah I agree .. I tried to upgrade their firmware when I was building the 
system but the boot ISO said they didn't qualify.  I just checked the 
site and their search-by-serial-# tool also says no firmware update is 
available.   At this point I'm leery about updating given that I've got 
data on it anyway.  Occasionally (maybe once every week or two; they're 
in the same room as me, in my office) I hear one parking.


I see nothing wrong in SMART though, no dmesg errors, and I have noticed no 
issues with the array and it bench tests at around 850 MB/sec.  Too bad 
10 Gbit equipment isn't cheaper.


Also when I bought the 6 for this array I got a 7th as a cold spare :P


The problem here is that there have been *so many* firmware bugs with
Seagate's drives in the past 2 years or so that it's impossible for me
to know which fixes what.  You buy what you buy because that's what you
buy, and that's cool -- but I avoid their stuff like the plague.


Yeah.  I'd prefer WD myself but this place is swimming in green and 
now red drives.  uhgl.


 Snipping out the unrelated parts ... 


Can you try removing VESA and SC_PIXEL_MODE please?  I know that
sounds crazy (what on earth would that have to do with it?), but
please try it.  I can explain the justification if need be -- I'm being
extra paranoid of something that got discovered here on -stable only a
few days ago.  It's a stretch, but I can see potential relevance.  I can
provide details/links later.


No change unfortunately.




4. Does sysctl hw.usb.no_shutdown_wait=1 help you?


Weirdly this allowed it to reboot on the first try (without needing
to be reset), but not the second.


I'm not surprised.  Please re-try with stable/9; Hans has been constantly
working on the USB stack and fixing major bugs.


Got it but probably not going to go this route as it means no more
binary upgrades.  While I can reboot it, it is the office NAS here
and so 'testing out' -STABLE I think probably isn't going to happen.


I understand.  I have a question relating to this below.


Place background_fsck=no in /etc/rc.conf.  If the machine does not
have a clean filesystem on boot-up, you'll know because the system will
immediately begin fsck (in the foreground actively).  You'll recognise
that output if it happens, trust me.


Preaching to the choir; we set this on all servers, but this one somehow
did not have it set (I think due to ZFS making it unique and not
copying our rc.conf template over properly).


Where should I send my bill for services rendered?  (Totally kidding --
just had some breakfast 

Re: Weird I/O hangs (9.1R, arcsas, interrupt spikes on uhci0)

2013-06-19 Thread Dennis Kögel
Am 19.06.2013 um 17:16 schrieb Jeremy Chadwick j...@koitsu.org:
 Which model of the ARC1320 are you using (there are 2).

It has four internal connectors, so it should be the ARC-1320ix-16.

No port multipliers.

 Also when you see hangs can you access the disk directly or not
 e.g. dd if=/dev/da0 of=/dev/null bs=1m count=10 ?
 
 Interesting idea. The dd then hangs right until everything else resumes as 
 well.
 
 ^T during hang says: load: 12.39  cmd: dd 7847 [physrd] 6.36r 0.00u 0.00s 0% 
 1632k
 
 Is this **while** you have immense amounts of ZFS write I/O going to
 those drives (your zpool iostat was showing ~250-300MB/sec to the pool)?
 [...]

It's important to note that the interrupt spikes (and the I/O hangs) happen 
just as frequently on an idle system.
Having a bunch of dd processes writing + iostat just visualizes it better.

So, with or without actual write load: dd with if=/dev/daX (arcsas device) 
hangs when the interrupt counters for uhci0 soar during these ~10-second phases, 
as shown above.
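One way to log those uhci0 interrupt bursts with timestamps, so they can be lined up against the stalled dd/gstat output, is to sample `vmstat -i` once a second and print the per-second delta. This is a sketch assuming the FreeBSD `vmstat -i` layout, where each line is the interrupt source name followed by a cumulative total and a rate column. Pass the device name exactly as it appears in your own `vmstat -i` output. The function names are illustrative.

```sh
#!/bin/sh
# irq_total DEV: read `vmstat -i` output on stdin and print the cumulative
# interrupt count (second-to-last column) of the first line naming DEV.
irq_total() {
    awk -v dev="$1" '{
        for (i = 1; i <= NF - 2; i++)
            if ($i == dev) { print $(NF - 1); exit }
    }'
}

# watch_irq DEV: every second, print a timestamp and the number of
# interrupts DEV took since the previous sample; stop with ^C.
watch_irq() {
    dev=$1
    prev=$(vmstat -i | irq_total "$dev")
    [ -n "$prev" ] || { echo "no vmstat -i line for $dev" >&2; return 1; }
    while sleep 1; do
        cur=$(vmstat -i | irq_total "$dev")
        [ -n "$cur" ] || return 1
        echo "$(date +%H:%M:%S) $dev $((cur - prev))/s"
        prev=$cur
    done
}
```

Running `watch_irq uhci0` in one terminal while the dd tests run in another should show whether each stall coincides with a jump in the per-second figure.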

Noteworthy: dd'ing from if=/dev/ada1 (onboard controller) during such a hang 
phase returns immediately, i.e. works fine. (ada1 is part of ZFS -- the other 
'zroot' pool -- but is not an arcsas device, so a driver issue sounds more 
likely).
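The dd comparison above can be scripted so that each device (arcsas and onboard) is checked the same way during a stall. The minimal sketch below starts the same 10 MB read in the background and reports whether it finished within a timeout. `probe_read` is a hypothetical helper. The block size is spelled out in bytes (1048576) rather than `1m` so the size suffix isn't an issue. For a device reported as still blocked, the dd is simply left running in the background until the hang clears.

```sh
#!/bin/sh
# probe_read DEV [SECS]  (hypothetical helper)
# Start a 10 MB read from DEV in the background (the same test as
# `dd if=/dev/da0 of=/dev/null bs=1m count=10`) and report whether it
# completes within SECS seconds (default 5). Returns 1 if still blocked,
# 2 if dd finished but failed (e.g. no such device).
probe_read() {
    dev=$1
    secs=${2:-5}
    dd if="$dev" of=/dev/null bs=1048576 count=10 2>/dev/null &
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$secs" ]; then
            echo "$dev: still blocked after ${secs}s"
            return 1    # dd keeps running until the stall clears
        fi
        sleep 1
        waited=$((waited + 1))
    done
    if wait "$pid"; then
        echo "$dev: ok"
    else
        echo "$dev: dd failed"
        return 2
    fi
}
```

During a hang phase, something like `for d in /dev/da0 /dev/da1 /dev/ada1; do probe_read "$d" 5; done` should reproduce the pattern described here: arcsas devices blocked, ada1 ok.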

 Can you please try putting this in /boot/loader.conf + reboot and
 see if the behaviour for you changes?
 
 vfs.zfs.no_write_throttle=1

This produces quite interesting burst numbers, but does not affect the problem 
behaviour at all.

Am 19.06.2013 um 17:10 schrieb Steven Hartland kill...@multiplay.co.uk:
 You might want to try adding a separate disk (different type)
 to the controller which isn't used and perform the same test to
 try and eliminate disk's as the source of the issue.

That's currently not an option, as the zpool already contains data; but I tried 
against a disk on another controller, see above.

 Also see what gstat -d shows during this? Do you see a big spike
 of activity either side?

The picture is pretty much the same as with zpool iostat: Healthy values, all 
disks from 70-100% busy; during a hang phase, every column just drops to zero 
-- except for L(q), which remains frozen at some low value for the duration of 
the hang (e.g. 4 or 10).
Sample outputs here: http://pub.neveragain.de/arcsas/gstat.txt

Thanks,
D.


[releng_9 tinderbox] failure on ia64/ia64

2013-06-19 Thread FreeBSD Tinderbox
TB --- 2013-06-19 18:20:57 - tinderbox 2.10 running on freebsd-stable.sentex.ca
TB --- 2013-06-19 18:20:57 - FreeBSD freebsd-stable.sentex.ca 8.3-STABLE 
FreeBSD 8.3-STABLE #0: Tue Oct 16 17:37:58 UTC 2012 
mdtan...@freebsd-stable.sentex.ca:/usr/obj/usr/src/sys/server  amd64
TB --- 2013-06-19 18:20:57 - starting RELENG_9 tinderbox run for ia64/ia64
TB --- 2013-06-19 18:20:57 - cleaning the object tree
TB --- 2013-06-19 18:21:14 - /usr/local/bin/svn stat /src
TB --- 2013-06-19 18:21:50 - At svn revision 251993
TB --- 2013-06-19 18:21:51 - building world
TB --- 2013-06-19 18:21:51 - CROSS_BUILD_TESTING=YES
TB --- 2013-06-19 18:21:51 - MAKEOBJDIRPREFIX=/obj
TB --- 2013-06-19 18:21:51 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2013-06-19 18:21:51 - SRCCONF=/dev/null
TB --- 2013-06-19 18:21:51 - TARGET=ia64
TB --- 2013-06-19 18:21:51 - TARGET_ARCH=ia64
TB --- 2013-06-19 18:21:51 - TZ=UTC
TB --- 2013-06-19 18:21:51 - __MAKE_CONF=/dev/null
TB --- 2013-06-19 18:21:51 - cd /src
TB --- 2013-06-19 18:21:51 - /usr/bin/make -B buildworld
 World build started on Wed Jun 19 18:21:52 UTC 2013
 Rebuilding the temporary build tree
 stage 1.1: legacy release compatibility shims
 stage 1.2: bootstrap tools
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3: cross tools
 stage 4.1: building includes
 stage 4.2: building libraries
 stage 4.3: make dependencies
 stage 4.4: building everything
 World build completed on Wed Jun 19 20:09:19 UTC 2013
TB --- 2013-06-19 20:09:19 - generating LINT kernel config
TB --- 2013-06-19 20:09:19 - cd /src/sys/ia64/conf
TB --- 2013-06-19 20:09:19 - /usr/bin/make -B LINT
TB --- 2013-06-19 20:09:19 - cd /src/sys/ia64/conf
TB --- 2013-06-19 20:09:19 - /usr/sbin/config -m LINT
TB --- 2013-06-19 20:09:19 - building LINT kernel
TB --- 2013-06-19 20:09:19 - CROSS_BUILD_TESTING=YES
TB --- 2013-06-19 20:09:19 - MAKEOBJDIRPREFIX=/obj
TB --- 2013-06-19 20:09:19 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
TB --- 2013-06-19 20:09:19 - SRCCONF=/dev/null
TB --- 2013-06-19 20:09:19 - TARGET=ia64
TB --- 2013-06-19 20:09:19 - TARGET_ARCH=ia64
TB --- 2013-06-19 20:09:19 - TZ=UTC
TB --- 2013-06-19 20:09:19 - __MAKE_CONF=/dev/null
TB --- 2013-06-19 20:09:19 - cd /src
TB --- 2013-06-19 20:09:19 - /usr/bin/make -B buildkernel KERNCONF=LINT
 Kernel build for LINT started on Wed Jun 19 20:09:19 UTC 2013
 stage 1: configuring the kernel
 stage 2.1: cleaning up the object tree
 stage 2.2: rebuilding the object tree
 stage 2.3: build tools
 stage 3.1: making dependencies
 stage 3.2: building everything
[...]
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
/src/sys/dev/advansys/adwmcode.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  /src/sys/dev/ae/if_ae.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param inline-unit-growth=100 --param 
large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
-mfixed-range=f32-f127 -fpic -ffreestanding -Werror  /src/sys/dev/age/if_age.c
cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
-Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
-Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
-Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
-I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL 
-DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
-finline-limit=15000 --param 

sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Miroslav Lachman
The version of sshd in FreeBSD 8.4 is not backward compatible with the older 
version from 8.3.


OpenSSH_5.4p1 (on FreeBSD 8.3)
OpenSSH_6.1p1 (on FreeBSD 8.4)

# sshd -t
/etc/ssh/sshd_config line 19: Missing argument.

On line 19, there is:
VersionAddendum

It was OK in older versions: it removes any default text appended to the 
SSH protocol banner (for example 'FreeBSD-20120901').


On FreeBSD 8.4, there must be some string (even a single character).
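Before rebooting after an upgrade, `sshd -t` (as used above) is the real check. As an additional, narrower pre-check, the sketch below lists any argument-less `VersionAddendum` lines, which the 8.4 sshd rejects according to this report. The helper names are hypothetical. The match is case-insensitive because sshd_config keywords are case-insensitive.

```sh
#!/bin/sh
# find_empty_addendum FILE  (hypothetical helper)
# Print, with line numbers, every uncommented VersionAddendum directive
# that has no argument. Exit status is 0 if any were found, 1 otherwise.
find_empty_addendum() {
    grep -n -i -E '^[[:space:]]*VersionAddendum[[:space:]]*$' "$1"
}

# check_sshd_config [FILE]: run the targeted check, then sshd's own test.
check_sshd_config() {
    cfg=${1:-/etc/ssh/sshd_config}
    if find_empty_addendum "$cfg"; then
        echo "$cfg: VersionAddendum above needs an argument or must be commented out" >&2
    fi
    /usr/sbin/sshd -t -f "$cfg"
}
```

Running `check_sshd_config` after freebsd-update installs the new sshd, and before rebooting, should catch this case while the existing session is still open.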

I was really unpleasantly surprised when the machine was rebooted and came 
back without ssh access!


I think this change is worth mentioning in the Release Notes.

Miroslav Lachman


Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Steven Hartland

Given its often critical nature, ssh really should never fail due
to a bad config line; it should ignore the line and continue.

- Original Message - 
From: Miroslav Lachman 000.f...@quip.cz

To: freebsd-stable Stable freebsd-stable@FreeBSD.org
Sent: Wednesday, June 19, 2013 11:17 PM
Subject: sshd didn't run after upgrade to FreeBSD 8.4


The version of sshd in FreeBSD 8.4 is not backward compatible with older 
version from 8.3.


OpenSSH_5.4p1 (on FreeBSD 8.3)
OpenSSH_6.1p1 (on FreeBSD 8.4)

# sshd -t
/etc/ssh/sshd_config line 19: Missing argument.

On line 19, there is:
VersionAddendum

It was OK in older versions. It will remove any default text appended to 
SSH protocol banner (for example 'FreeBSD-20120901').


On FreeBSD 8.4, there must be some string (any single character)

I was really badly surprised that the machine was re-booted without ssh 
access!


I think this change is worth to mention in Release Notes

Miroslav Lachman







Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Kimmo Paasiala
On Thu, Jun 20, 2013 at 1:17 AM, Miroslav Lachman 000.f...@quip.cz wrote:
 The version of sshd in FreeBSD 8.4 is not backward compatible with older
 version from 8.3.

 OpenSSH_5.4p1 (on FreeBSD 8.3)
 OpenSSH_6.1p1 (on FreeBSD 8.4)

 # sshd -t
 /etc/ssh/sshd_config line 19: Missing argument.

 On line 19, there is:
 VersionAddendum

 It was OK in older versions. It will remove any default text appended to SSH
 protocol banner (for example 'FreeBSD-20120901').

 On FreeBSD 8.4, there must be some string (any single character)

 I was really badly surprised that the machine was re-booted without ssh
 access!

 I think this change is worth to mention in Release Notes

 Miroslav Lachman

How did you update to 8.4? This sounds more like messing up the
mergemaster(8)/freebsd-update merge procedure than a real problem with
the config file.

This is the source configuration file straight from SVN releng/8.4
branch and as you can see the VersionAddendum on line 115 is commented
out there:

http://svnweb.freebsd.org/base/releng/8.4/crypto/openssh/sshd_config?view=markup

-Kimmo


Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Miroslav Lachman

Kimmo Paasiala wrote:

On Thu, Jun 20, 2013 at 1:17 AM, Miroslav Lachman000.f...@quip.cz  wrote:

The version of sshd in FreeBSD 8.4 is not backward compatible with older
version from 8.3.

OpenSSH_5.4p1 (on FreeBSD 8.3)
OpenSSH_6.1p1 (on FreeBSD 8.4)

# sshd -t
/etc/ssh/sshd_config line 19: Missing argument.

On line 19, there is:
VersionAddendum

It was OK in older versions. It will remove any default text appended to SSH
protocol banner (for example 'FreeBSD-20120901').

On FreeBSD 8.4, there must be some string (any single character)

I was really badly surprised that the machine was re-booted without ssh
access!

I think this change is worth to mention in Release Notes

Miroslav Lachman


How did you update to 8.4? This sounds more like messing up the
mergemaster(8)/freebsd-update merge procedure than a real problem with
the config file.

This is the source configuration file straight from SVN releng/8.4
branch and as you can see the VersionAddendum on line 115 is commented
out there:

http://svnweb.freebsd.org/base/releng/8.4/crypto/openssh/sshd_config?view=markup


It was upgraded by freebsd-update. I intentionally left the line there, as it 
was a valid configuration for many years.
That's why I think the Release Notes should mention that an empty 
VersionAddendum is no longer a valid configuration.


The fact, that it is no longer in default sshd_config file doesn't mean 
it can't be used at all. It is still valid in the form which was in old 
default config: VersionAddendum FreeBSD-20100308, but is no longer 
valid if empty. That's the point.


(and empty VersionAddendum was widely used, it is not my invention)

Miroslav Lachman
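Miroslav's report suggests a cheap safeguard before the final reboot of an upgrade: run `sshd -t`, as he did, while a working session is still open. As an additional, purely hypothetical check (the function name and regular expression below are illustrative assumptions, not part of FreeBSD or OpenSSH), a short Python sketch can scan sshd_config text for a VersionAddendum keyword that has no argument:

```python
import re

# Hypothetical lint, not part of FreeBSD or OpenSSH: flag a VersionAddendum
# keyword with no argument after it. sshd_config keywords are case-insensitive,
# so match case-insensitively. Commented-out lines ("#VersionAddendum ...")
# never match, because nothing but blanks may precede the keyword.
# Lines with only trailing blanks after the keyword are flagged too, since
# that form depends on invisible whitespace surviving every edit.
BARE_ADDENDUM = re.compile(r"^[ \t]*VersionAddendum[ \t]*$", re.IGNORECASE)


def find_bare_version_addendum(config_text: str) -> list[int]:
    """Return the 1-based line numbers of argument-less VersionAddendum lines."""
    return [
        lineno
        for lineno, line in enumerate(config_text.splitlines(), start=1)
        if BARE_ADDENDUM.match(line)
    ]


if __name__ == "__main__":
    sample = "Port 22\nVersionAddendum\n#VersionAddendum FreeBSD-20120901\n"
    print(find_bare_version_addendum(sample))  # [2]
```

The check deliberately errs on the side of flagging; `sshd -t` on the upgraded system remains the authoritative test.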


Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Kimmo Paasiala
On Thu, Jun 20, 2013 at 2:29 AM, Miroslav Lachman 000.f...@quip.cz wrote:
 It was upgraded by freebsd-update. The line was intentionally left there, as it
 had been a valid configuration for many years.
 That's why I think the Release Notes should mention that it (an empty
 VersionAddendum) is no longer a valid configuration.

 The fact that it is no longer in the default sshd_config file doesn't mean it
 can't be used at all. It is still valid in the form used in the old default
 config, VersionAddendum FreeBSD-20100308, but it is no longer valid if empty.
 That's the point.

 (And an empty VersionAddendum was widely used; it is not my invention.)

 Miroslav Lachman


You're totally missing my point. The line is commented out in the
official source of 8.4, and therefore I have a very hard time believing
that it would show up uncommented on a fresh 8.4 installation.

-Kimmo


Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Adam Vande More
On Wed, Jun 19, 2013 at 6:32 PM, Kimmo Paasiala kpaas...@gmail.com wrote:

 You're totally missing my point. The line is commented out in the
 official source of 8.4, and therefore I have a very hard time believing
 that it would show up uncommented on a fresh 8.4 installation.


I don't think this warrants a mention in the Release Notes, for exactly this
reason; however, it should probably be mentioned in UPDATING. If nothing
else, that would at least keep UPDATING consistent with previous major ssh
upgrades.

-- 
Adam Vande More


Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Steven Hartland


----- Original Message -----
From: Kimmo Paasiala <kpaas...@gmail.com>
To: Miroslav Lachman <000.f...@quip.cz>
Cc: freebsd-stable Stable <freebsd-stable@freebsd.org>
Sent: Thursday, June 20, 2013 12:32 AM
Subject: Re: sshd didn't run after upgrade to FreeBSD 8.4



You're totally missing my point. The line is commented out in the
official source of 8.4, and therefore I have a very hard time believing
that it would show up uncommented on a fresh 8.4 installation.


I believe Miroslav is saying he left his old but previously working
sshd_config as it was when updating, so it's a change to the code, which
now fails on an empty VersionAddendum where it previously didn't,
hence the problem.

   Regards
   Steve





Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Kimmo Paasiala
On Thu, Jun 20, 2013 at 2:40 AM, Steven Hartland
kill...@multiplay.co.uk wrote:


 I believe Miroslav is saying he left his old but previously working
 sshd_config as it was when updating, so it's a change to the code, which
 now fails on an empty VersionAddendum where it previously didn't,
 hence the problem.

Regards
Steve



Err, yes, you're right. The proper way to specify an empty VersionAddendum
now seems to be, based on some googling:

VersionAddendum ""


-Kimmo


Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Charles Sprickman
On Jun 19, 2013, at 7:37 PM, Adam Vande More wrote:

 On Wed, Jun 19, 2013 at 6:32 PM, Kimmo Paasiala kpaas...@gmail.com wrote:
 
 You're totally missing my point. The line is commented out in the
 official source of 8.4, and therefore I have a very hard time believing
 that it would show up uncommented on a fresh 8.4 installation.
 
 
 I don't think this warrants a mention in the Release Notes, for exactly this
 reason; however, it should probably be mentioned in UPDATING. If nothing
 else, that would at least keep UPDATING consistent with previous major ssh
 upgrades.

+1

Even if you ran mergemaster and saw the change, without a comment above the
VersionAddendum line or a mention in UPDATING, you might make any number of
assumptions about why it's commented out now. That is a problem, given the
behavior (i.e., sshd does not start), for those who have chosen in the past
not to tell the world what OS and build date they are running.

Not really the best choice by the OpenSSH folks either, IMHO.  I skim the 
OpenSSH release notes sent to the -announce list and totally missed this change.

Charles




Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Miroslav Lachman

Kimmo Paasiala wrote:

On Thu, Jun 20, 2013 at 2:40 AM, Steven Hartland
kill...@multiplay.co.uk  wrote:



I believe Miroslav is saying he left his old but previously working
sshd_config as it was when updating, so it's a change to the code, which
now fails on an empty VersionAddendum where it previously didn't,
hence the problem.


Yes, this is my point: I left my old, previously working sshd_config
with an empty VersionAddendum.



Err, yes, you're right. The proper way to specify an empty VersionAddendum
now seems to be, based on some googling:

VersionAddendum ""


This is not true; it will add two quotes to the banner:
SSH-2.0-OpenSSH_6.1_hpn13v11 ""


Default banner (no VersionAddendum in sshd_config):
SSH-2.0-OpenSSH_6.1_hpn13v11 FreeBSD-20120901


So I am fine with:
VersionAddendum -

It will print:
SSH-2.0-OpenSSH_6.1_hpn13v11 -

I don't really need an empty addendum; I just don't want to show FreeBSD
version info, and an empty VersionAddendum worked for me for many years.
Now it broke sshd after the final reboot on two of our upgraded servers.


So a Release Notes or, better, an UPDATING entry would warn other users
before they make the same mistake.


Thanks to the remote management / KVM on the Sun Fire and Supermicro servers,
I didn't need to drive to the datacenter and could fix it remotely.


Miroslav Lachman
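The banners Miroslav quotes are consistent with the addendum being appended to the identification string verbatim, with no quote handling, after a single separating space. The hypothetical `compose_banner` below models only that observed behavior; it is not taken from the OpenSSH sources, it omits the CR LF a real server sends, and the empty-addendum case (no separating space) is an assumption.

```python
def compose_banner(software_version: str, addendum: str) -> str:
    """Toy model of the SSH identification string described in this thread.

    Assumptions: the addendum is used verbatim (quote characters are kept
    as-is) and is preceded by one space only when it is non-empty. The CR LF
    line terminator a real server sends is left out.
    """
    separator = " " if addendum else ""
    return f"SSH-2.0-{software_version}{separator}{addendum}"


if __name__ == "__main__":
    version = "OpenSSH_6.1_hpn13v11"  # version string quoted in this thread
    for addendum in ("FreeBSD-20120901", '""', "-", ""):
        print(f"{addendum!r:20} -> {compose_banner(version, addendum)}")
```

Under this model, `VersionAddendum ""` yields a banner ending in two literal quote characters, which is why Miroslav settles for `VersionAddendum -` instead.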


Re: sshd didn't run after upgrade to FreeBSD 8.4

2013-06-19 Thread Kimmo Paasiala
On Thu, Jun 20, 2013 at 3:15 AM, Miroslav Lachman 000.f...@quip.cz wrote:
 This is not true; it will add two quotes to the banner:
 SSH-2.0-OpenSSH_6.1_hpn13v11 ""

OK, this is crazy. If you put one space after the VersionAddendum
keyword, you get exactly what you want: an empty VersionAddendum
string. If there's no space but a newline right after the
VersionAddendum keyword, sshd(8) complains about the line and refuses
to start. So this is OK (without the single quotes; they are only there
to show where the lines end):

'VersionAddendum '

But this is not:

'VersionAddendum'

What are the OpenSSH devs thinking?

-Kimmo
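Kimmo's finding can be captured in a small, hypothetical model: what matters is whether anything, even a single blank, follows the keyword before the end of the line. The sketch below reproduces the behavior reported in this thread (including the "Missing argument." text) under that assumption; it is not the actual sshd(8) parser, and `parse_version_addendum` and `MissingArgument` are illustrative names.

```python
import re


class MissingArgument(ValueError):
    """Raised when nothing at all follows the keyword on the line."""


# Keyword, then optionally one blank followed by the rest of the line.
_LINE = re.compile(r"[ \t]*(\S+)(?:[ \t](.*))?")


def parse_version_addendum(line: str, lineno: int = 1) -> str:
    """Toy model of the VersionAddendum parsing observed in this thread.

    - Keyword followed directly by the newline: rejected ("Missing argument.").
    - Keyword followed by at least one blank: skip the blanks and keep the
      rest verbatim, so a lone trailing space yields an empty addendum.
    """
    line = line.rstrip("\r\n")  # the newline itself is never part of the value
    m = _LINE.fullmatch(line)
    if m is None or m.group(1).lower() != "versionaddendum":
        raise ValueError(f"line {lineno}: not a VersionAddendum line")
    rest = m.group(2)
    if rest is None:
        raise MissingArgument(f"line {lineno}: Missing argument.")
    return rest.lstrip(" \t")


if __name__ == "__main__":
    for raw in ("VersionAddendum \n", "VersionAddendum\n", 'VersionAddendum ""\n'):
        try:
            print(repr(raw), "->", repr(parse_version_addendum(raw, lineno=19)))
        except MissingArgument as exc:
            print(repr(raw), "->", exc)
```

One consequence of this model: because the accepted form differs from the rejected one only by invisible trailing whitespace, an editor that strips trailing blanks would silently turn a working file into one that keeps sshd from starting.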