Re: 7.2 dies in zfs
Adam McDougall wrote:

On Sat, Nov 21, 2009 at 11:36:43AM -0800, Jeremy Chadwick wrote:

On Sat, Nov 21, 2009 at 08:07:40PM +0100, Johan Hendriks wrote:

Randy Bush ra...@psg.com wrote:

imiho, zfs can not be called production ready if it crashes if you do not stand on your left leg, put your right hand in the air, and burn some eye of newt.

This is not a rant, but where do you read that ZFS on FreeBSD 7.2 has been marked as production ready? As far as I know, ZFS is called production ready on FreeBSD 8.0. If you boot your system, it will probably tell you it is still experimental. Try running FreeBSD 7-STABLE to get the latest ZFS version, which on FreeBSD is 13. On 7.2 it is still at 6 (if I remember it right).

RELENG_7 uses ZFS v13, RELENG_8 uses ZFS v18. RELENG_7 and RELENG_8 both, more or less, behave the same way with regards to ZFS. Both panic on kmem exhaustion. No one has answered my question as far as what's needed to stabilise ZFS on either 7.x or 8.x.

I have a stable public ftp/http/rsync/cvsupd mirror that runs ZFS v13. It has been stable since mid May. I have not had a kmem panic on any of my ZFS systems for a long time. It's a matter of making sure there is enough kmem at boot (not depending on kmem_size_max) and that it is big enough that fragmentation does not cause a premature allocation failure due to the lack of a large enough contiguous chunk. This requires the platform to support a kmem size that is big enough... i386 can barely muster 1.6G and sometimes that might not be enough. I'm pretty sure all of my currently existing ZFS systems are amd64, where the kmem can now be huge.

On the busy fileserver with 20 gigs of RAM running FreeBSD 8.0-RC2 #21: Tue Oct 27 21:45:41 EDT 2009, I currently have:

    vfs.zfs.arc_max=16384M
    vfs.zfs.arc_min=4096M
    vm.kmem_size=18G

The arc settings here are to try to encourage it to favor the arc cache instead of whatever else Inactive memory in 'top' contains.

Very interesting. For my iSCSI backend (running istgt from ports), I had to set arc_max below 128M to stop the iSCSI initiators from generating timeouts when the cache flushed. (This is on a system with a MegaRAID 8308ELP handling the disk back end, with the disks in two RAID5 arrays of four disks each, zpooled as one big pool.) With arc_max above 128M, ZFS periodically ate all available resources flushing to disk, leaving istgt waiting, so the iSCSI initiators timed out and had to reconnect. The iSCSI initiators are the built-in software initiator in VMware ESX 4i.

//Svein

--
Svein Skogen, System Admin, stillbilde.net
Picture Gallery: https://gallery.stillbilde.net/v/svein/

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
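For reference, the tunables quoted in this message are boot-time settings, so they would normally go into /boot/loader.conf. The fragment below is only a sketch: the first three values are the ones Adam lists for his 20 GB amd64 fileserver, while the commented-out line uses a made-up figure, since Svein only says "below 128M".

```
# /boot/loader.conf
# Values as posted for a 20 GB amd64 fileserver; not a general recommendation.
vm.kmem_size="18G"
vfs.zfs.arc_max="16384M"
vfs.zfs.arc_min="4096M"

# iSCSI backend case: cap the ARC instead. The post only says "below 128M";
# 100M is a hypothetical placeholder.
#vfs.zfs.arc_max="100M"
```

The settings take effect at the next boot; the values in use can be read back afterwards with sysctl vm.kmem_size vfs.zfs.arc_max vfs.zfs.arc_min.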
Re: 8.0-RC USB problem -- how to recover a damaged USB stick
On Sunday 22 November 2009 04:40:27 Guojun Jin wrote:

Does anyone know if it is possible to recover such a damaged USB stick?

Hi,

There are several recovery tools in /usr/ports for this kind of task, for example photorec.

--HPS
Re: 8.0-RC USB/FS problem
On Sunday 22 November 2009 05:38:13 Guojun Jin wrote:

Tried on the USB hard drive: deleted slice 3 and recreated it with two partitions, s3d and s3e. I was happy because dump/restore succeeded on s3d, and thought it was just a partition format issue; but the system crashed during dump/restore on s3e, and the partition lost its file system type.

    wolf# mount /dev/da0s3e /mnt
    WARNING: /mnt was not properly dismounted
    /mnt: mount pending error: blocks 35968 files 0
    wolf# fsck da0s3e
    fsck: Could not determine filesystem type
    wolf# bsdlabel da0s3
    # /dev/da0s3:
    8 partitions:
    #        size    offset    fstype   [fsize bsize bps/cpg]
      c: 175735035         0    unused        0     0         # raw part, don't edit
      d:  18874368         0    4.2BSD        0     0     0
      e: 156860667  18874368    4.2BSD        0     0     0

Therefore, I tried using fsck_ufs directly on both the USB hard drive and the USB stick to clean up the file systems. All data has been recovered now.

The machine has run FreeBSD 6.1 all the way to 7.2 without such a problem. How can we determine what went wrong in 8.0? FS or USB?

Hi,

Error 5 means IO error, so probably the transport layer, USB or lower, is to blame. Some things to check:

1) Make sure the connection for your memory stick is OK.
2) Make sure there is enough power for your memory stick.

Regarding memory sticks: other operating systems do a port bus reset when the device has a problem. On FreeBSD we just try a software reset via the control endpoint.

I guess that it is a device problem you are seeing. The new USB stack in FreeBSD is faster than the old one, and maybe the faster queueing of mass storage requests triggers some hidden bugs in your device.

When the problem happens, try:

    sysctl hw.usb.umass.debug=-1

--HPS
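One way to act on the debugging suggestion is to bracket the failing operation with the sysctl and save the kernel messages afterwards. The following is only a sketch under stated assumptions: the helper names run and capture_umass_debug and the DRY_RUN switch are made up for illustration, and it assumes the umass debug output ends up in the kernel message buffer read by dmesg. If the machine panics outright, the log has to come from a serial console or a crash dump instead.

```sh
#!/bin/sh
# Hypothetical helper: with DRY_RUN set, print each command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: capture_umass_debug LOGFILE COMMAND [ARGS...]
capture_umass_debug() {
    logfile=$1
    shift
    run sysctl hw.usb.umass.debug=-1    # enable all umass debug output
    run "$@"                            # the operation that triggers the fault
    status=$?
    run sysctl hw.usb.umass.debug=0     # switch debugging back off
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ dmesg > $logfile"
    else
        dmesg > "$logfile"              # save the kernel message buffer
    fi
    return "$status"
}
```

A call such as capture_umass_debug /var/tmp/umass.log mount /dev/da0s3e /mnt (run as root) would then leave the debug lines in /var/tmp/umass.log; setting DRY_RUN=1 first only prints the commands.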
BIOS resource allocation and FreeBSD ACPI
Hi,

I see this problem over and over again. Some time ago I created this PR:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/135070

and I just saw it has been duplicated:

http://www.freebsd.org/cgi/query-pr.cgi?pr=140751

Maybe the latter one should be closed as a duplicate.

Anyway, I think I have seen this problem reported for more than 10 different laptops on the lists and the forums. Maybe it's time for someone to fix this issue.

I'm willing to donate money if someone can take this on and fix it (yes, I'm serious, I think it's worth it).

regards,
mgp
Re: 7.2 dies in zfs
On Sun, Nov 22, 2009 at 10:00:03AM +0100, Svein Skogen (listmail account) wrote:

Very interesting. For my iSCSI backend (running istgt from ports), I had to set arc_max below 128M to stop the iSCSI initiators from generating timeouts when the cache flushed. (This is on a system with a MegaRAID 8308ELP handling the disk back end, with the disks in two RAID5 arrays of four disks each, zpooled as one big pool.) With arc_max above 128M, ZFS periodically ate all available resources flushing to disk, leaving istgt waiting, so the iSCSI initiators timed out and had to reconnect. The iSCSI initiators are the built-in software initiator in VMware ESX 4i.

//Svein

I could understand that happening. I've seen situations in the past where my kmem was smaller than I wanted it to be, and within a few days the overall ZFS disk IO would become incredibly slow, because intense external memory pressure on the ARC made it try to flush out the ARC far too often.

Assuming you have a large amount of RAM, I wonder if setting kmem_size, arc_min and arc_max sufficiently large and using modern code would help, as long as you made sure other processes on the machine don't squeeze down Wired memory in top too much. In such a situation, I would expect it to operate fine while the ARC has enough kmem to expand as much as it wants to. It might hit a wall later, or, given enough ARC, the reclamation might be tolerable. Or, if 128M ARC is good enough for you, leave it :)
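The sizing advice above can be turned into a small sanity check. This is a sketch resting on an assumption drawn from the thread, not on an official rule: the ARC should fit inside kmem, so it expects arc_min <= arc_max < kmem_size. The function names to_bytes and arc_fits are made up for illustration.

```sh
#!/bin/sh
# Convert a size such as 128M or 18G to bytes (K, M, G suffixes, powers of 1024).
to_bytes() {
    num=${1%[KMGkmg]}
    case $1 in
        *[Kk]) echo $((num * 1024)) ;;
        *[Mm]) echo $((num * 1024 * 1024)) ;;
        *[Gg]) echo $((num * 1024 * 1024 * 1024)) ;;
        *)     echo "$num" ;;
    esac
}

# arc_fits KMEM_SIZE ARC_MAX ARC_MIN
# Succeeds only if arc_min <= arc_max < kmem_size (assumed rule of thumb).
arc_fits() {
    kmem=$(to_bytes "$1")
    amax=$(to_bytes "$2")
    amin=$(to_bytes "$3")
    [ "$amin" -le "$amax" ] && [ "$amax" -lt "$kmem" ]
}

# The fileserver settings quoted in this thread.
if arc_fits 18G 16384M 4096M; then
    echo "ARC limits fit inside kmem"
else
    echo "ARC limits do not fit inside kmem"
fi
```

The check says nothing about fragmentation or about memory pressure from other processes, which are the other two concerns raised in the thread.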
HEADS UP: removal of PECOFF support in RELENG_[67]
Hi,

I'd like to give you a heads up that I intend to also remove PECOFF support from the stable/7 and stable/6 branches.

PECOFF support is non-working and unmaintained in those FreeBSD releases and has lately still seen public security problems. PECOFF support is already gone in the upcoming 8.0 RELEASE and in the 9-CURRENT development branch.

Should no valid complaints come up saying that someone needs (and actively uses, *cough*) PECOFF support on FreeBSD, it will be removed on November 29th 2009 00:00 UTC at the earliest (in about one week).

/bz

--
Bjoern A. Zeeb
It will not break if you know what you are doing.
openoffice stuck in _umtx_op
Hello!

I'm trying to start OOo, and it hangs at start-up, after popping up its banner page. ktrace shows the following, slowly repeating, sequence of events:

    [...]
    32726 soffice.bin CALL  _umtx_op(0x805d09060,0x8,0x1,0x805d09040,0x7fbfeef0)
    32726 soffice.bin RET   _umtx_op -1 errno 60 Operation timed out
    32726 soffice.bin CALL  gettimeofday(0x7fbfef70,0)
    32726 soffice.bin RET   gettimeofday 0
    32726 soffice.bin CALL  clock_gettime(0,0x7fbfef00)
    32726 soffice.bin RET   clock_gettime 0
    32726 soffice.bin CALL  _umtx_op(0x805d09060,0x8,0x1,0x805d09040,0x7fbfeef0)
    32726 soffice.bin RET   _umtx_op -1 errno 60 Operation timed out
    32726 soffice.bin CALL  gettimeofday(0x7fbfef70,0)
    32726 soffice.bin RET   gettimeofday 0
    32726 soffice.bin CALL  clock_gettime(0,0x7fbfef00)
    32726 soffice.bin RET   clock_gettime 0
    32726 soffice.bin CALL  _umtx_op(0x805d09060,0x8,0x1,0x805d09040,0x7fbfeef0)
    [...]

What's happening? `ipcs -a' does not show anything extraordinary and there is nothing in syslog either. The machine is running 7.2-STABLE/amd64 (as of Oct 25).

Any ideas? Thanks!

-mi
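When a trace repeats like this, a per-syscall tally makes the loop easier to see. The following is a minimal sketch that assumes kdump output in the column layout shown in the message (pid, command, CALL or RET, then the syscall); the function name summarize_kdump is made up for illustration.

```sh
#!/bin/sh
# Tally CALL lines per syscall and failed RET lines per syscall/errno pair.
# Reads kdump output on stdin; prints the most frequent entries first.
summarize_kdump() {
    awk '
        $3 == "CALL" { name = $4; sub(/\(.*/, "", name); calls[name]++ }
        $3 == "RET" && $6 == "errno" { fails[$4 " errno " $7]++ }
        END {
            for (s in calls) print calls[s], "calls", s
            for (f in fails) print fails[f], "failed", f
        }
    ' | sort -rn
}
```

Typical use would be to attach with ktrace -p 32726, stop tracing after a while with ktrace -C, and then run kdump | summarize_kdump. In the excerpt above, every _umtx_op call returns errno 60 (ETIMEDOUT), which is consistent with a thread repeatedly waiting, with a timeout, on a lock or condition that is never signalled.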
Re: FreeBSD 7.x hang-on-boot on Dell 1950
On Fri, Nov 13, 2009 at 9:44 PM, Jeremy Chadwick free...@jdc.parodius.com wrote:

This 1950 may predate that a bit, but I'm not sure how to nail it down exactly, other than by its hardware components. Anyways, 7.0 does the same thing --- still wedged.

I haven't seen anyone recommend this as a test method yet -- disabling fdc prior to the kernel booting via the loader prompt:

- Press 6 at the menu,
- At the loader prompt, type:

    set hint.fdc.0.disabled=1
    boot -v

  (or without -v; your choice)

You shouldn't need to set hint.fd.0.disabled=1, since fd0 would normally bind to fdc0; disable the latter and you disable the lesser. The intention here is to rule out the device attachment failures from fdc as the source of the deadlock.

Entertainingly, it does not. Apparently that hint doesn't stop the code from trying to attach fdc0 when acpi says so. I suppose I need to know the console command to disable both acpi and fdc, but it still wedges at "device_attach: fdc0 attach returned 6" with the above.
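As for the console command asked about here: the ACPI driver can be disabled from the loader prompt with a hint of the same form as the fdc one. A sketch of the combined sequence, assuming the stock loader and that nothing else in loader.conf overrides these hints:

```
set hint.acpi.0.disabled=1
set hint.fdc.0.disabled=1
boot -v
```

Hints set this way last for a single boot only; to make them permanent they would have to go into /boot/loader.conf or /boot/device.hints.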
RE: 8.0-RC USB/FS problem
After more intensive diagnosis, it looks more related to the USB layer. I repeated the following process a few times, and the crash happened at a different USB access phase each time.

    dd if=/dev/zero of=/dev/da0 count=1000 bs=4k
    sysinstall
      partition
        slice 1 (da0s1)  18GB     ID=12
        slice 2 (da0s2)  10-15GB  ID=165
        slice 3 (da0s3)  rest     ID=165
        W --- OK
      label
        da0s3d  9GB   /mnt
        da0s3e  rest  /dist
        W --- da0s3e device is not configured.

    w# ll /dev/da0*    # after sysinstall did partition + W at 1st time
    crw-r-----  1 root  operator    0,  97 Nov 22 11:23 /dev/da0
    crw-r-----  1 root  operator    0,  98 Nov 22 11:23 /dev/da0s1
    crw-r-----  1 root  operator    0,  99 Nov 22 11:23 /dev/da0s2
    crw-r-----  1 root  operator    0, 100 Nov 22 11:23 /dev/da0s3

    # ll /dev/da0*     # after sysinstall start at 2nd time
    crw-r-----  1 root  operator    0,  97 Nov 22 11:27 /dev/da0

System crashed. The crash log is available at http://www.daemonfun.com/archives/pub/USB/crash1-reset.bz2 (all logs are based on hw.usb.umass.debug=-1).

After the system rebooted, I repeated the above process. This time da0s3e was mounted on /dist, but da0s3d could not be mounted. It turned out that newfs failed inside the labeling process in sysinstall. I ran newfs manually on da0s3d; it still could not be mounted on /mnt, and access to it caused a crash. The crash log is available at http://www.daemonfun.com/archives/pub/USB/newfs

I tried the entire process again. This time both partitions were formatted (newfs) inside the labeling process (sysinstall), but the system crashed during dump/restore on da0s3e (/dist). The crash log is available at http://www.daemonfun.com/archives/pub/USB/usb-log.crash2.bz2, which is a huge one. It contains two parts: the dump/restore from IDE to da0s3d (passed), and the dump/restore to da0s3e (crashed).

I am going to reinstall the system with the new 8.0-RELEASE ISO from Nov 21 to see if anything improves.
Re: FreeBSD 7.x hang-on-boot on Dell 1950
On Thu, 2009-11-12 at 16:56 -0500, Zaphod Beeblebrox wrote:

I've now verified that 8.0-RC3 does the same thing, BTW. Anyways... no. There is no floppy option in the BIOS. It's not in any of the sub-menus (which is how Dell's BIOS is organized). I'm also not-so-sure that the floppy is where it's stopping because the non-ACPI boot doesn't

Dell BIOS absolutely sucks. You get what you pay for. We have 25+ 9th gen systems. Revision 1, with the older HT Xeons, are all lemons. The only thing that runs stable on them is ESXi. R3 of the 1950 with the PERC6 and DRAC5 works fine on 6.3/amd64, 7.2, etc.

~BAS
Re: FreeBSD 7.x hang-on-boot on Dell 1950
On Sun, Nov 22, 2009 at 9:09 PM, Zaphod Beeblebrox zbee...@gmail.com wrote:

Entertainingly, it does not. Apparently that hint doesn't stop the code from trying to attach fdc0 when acpi says so. I suppose I need to know the console command to disable both acpi and fdc, but it still wedges at "device_attach: fdc0 attach returned 6" with the above.

OK. With both floppy and acpi disabled, it dies calling start_init several times, the last being /stand/sysinstall (which should work). I don't see it starting the other CPUs. It hangs hard... no keyboard working (i.e. no caps lock).