Re: 7.2 dies in zfs

2009-11-22 Thread Svein Skogen (listmail account)

Adam McDougall wrote:
 On Sat, Nov 21, 2009 at 11:36:43AM -0800, Jeremy Chadwick wrote:
 
   
   On Sat, Nov 21, 2009 at 08:07:40PM +0100, Johan Hendriks wrote:
Randy Bush ra...@psg.com wrote:
 imiho, zfs can not be called production ready if it crashes if you
 do not stand on your left leg, put your right hand in the air, and
 burn some eye of newt.

This is not a rant, but where did you read that ZFS on FreeBSD 7.2 has
been marked as production ready?
As far as I know, ZFS is called production ready as of FreeBSD 8.0.

If you boot your system, it will probably tell you it is still experimental.

Try running FreeBSD 7-STABLE to get the latest ZFS version, which on
FreeBSD is 13.
On 7.2 it is still at 6 (if I remember it right).
   
   RELENG_7 uses ZFS v13, RELENG_8 uses ZFS v18.
   
   RELENG_7 and RELENG_8 both, more or less, behave the same way with
   regards to ZFS.  Both panic on kmem exhaustion.  No one has answered my
   question as far as what's needed to stabilise ZFS on either 7.x or 8.x.
 
 I have a stable public ftp/http/rsync/cvsupd mirror that runs ZFS v13.
 It has been stable since mid-May.  I have not had a kmem panic on any
 of my ZFS systems for a long time; it's a matter of making sure there is
 enough kmem at boot (not depending on kmem_size_max) and that it is big enough
 that fragmentation does not cause a premature allocation failure due to the lack
 of a large-enough contiguous chunk.  This requires the platform to support a
 kmem size that is big enough... i386 can barely muster 1.6G and sometimes
 that might not be enough.  I'm pretty sure all of my currently existing ZFS
 systems are amd64 where the kmem can now be huge.  On the busy fileserver with
 20 gigs of ram running FreeBSD 8.0-RC2 #21: Tue Oct 27 21:45:41 EDT 2009,
 I currently have:
 vfs.zfs.arc_max=16384M
 vfs.zfs.arc_min=4096M
 vm.kmem_size=18G
 The arc settings here are to try to encourage it to favor the arc cache
 instead of whatever else Inactive memory in 'top' contains.

Very interesting. For my iSCSI backend (running istgt from ports), I had
to set arc_max below 128M to stop iSCSI initiators generating
timeouts when the cache flushed. (This is on a system with a MegaRAID
8308ELP handling the disk back end, with the disks in two RAID5 arrays
of four disks each, zpooled as one big pool.)

When I had arc_max above 128M, ZFS periodically ate all
available resources to flush to disk, leaving istgt waiting, and the
iSCSI initiators timed out and had to reconnect. The iSCSI initiators
are the built-in software initiator in VMware ESX 4i.

//Svein

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: 8.0-RC USB problem -- how to recover a damaged USB stick

2009-11-22 Thread Hans Petter Selasky
On Sunday 22 November 2009 04:40:27 Guojun Jin wrote:
 Does anyone know if it is possible to recover such a damaged USB stick?

Hi,

There are several recovery tools in /usr/ports for this kind of task.

For example, photorec.

--HPS



Re: 8.0-RC USB/FS problem

2009-11-22 Thread Hans Petter Selasky
On Sunday 22 November 2009 05:38:13 Guojun Jin wrote:
 Tried on the USB hard drive:

 I deleted slice 3 and recreated it with two partitions, s3d and s3e.
 I was happy because dump/restore on s3d succeeded, and I thought it was just
 a partition format issue; but the system crashed during dump/restore on s3e,
 and the partition lost its file system type.

 wolf# mount /dev/da0s3e /mnt
 WARNING: /mnt was not properly dismounted
 /mnt: mount pending error: blocks 35968 files 0
 wolf# fsck da0s3e
 fsck: Could not determine filesystem type
 wolf# bsdlabel da0s3
 # /dev/da0s3:
 8 partitions:
 #        size    offset    fstype   [fsize bsize bps/cpg]
   c: 175735035         0    unused        0     0         # raw part, don't edit
   d:  18874368         0    4.2BSD        0     0     0
   e: 156860667  18874368    4.2BSD        0     0     0

 Therefore, I ran fsck_ufs directly on both the USB hard drive and the USB
 stick to clean up the file systems. All the data is back now.

 The machine has run FreeBSD 6.1 all the way through 7.2 without such a
 problem. How can we determine what is going wrong in 8.0: FS or USB?
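
As a quick sanity check on the label quoted above, the sketch below verifies that partitions d and e tile the whole slice. The numbers are read from the bsdlabel output, assuming the run-together columns split as size 175735035 for c, size 18874368 at offset 0 for d, and size 156860667 at offset 18874368 for e. The 512-byte sector size is also an assumption.

```sh
#!/bin/sh
# Check that partitions d and e cover the slice described by the "c" entry.
# All values are sector counts as read (by assumption) from the label above.
c_size=175735035
d_off=0;        d_size=18874368
e_off=18874368; e_size=156860667

# e should start where d ends, and e should end where c ends.
[ $((d_off + d_size)) -eq "$e_off" ] && echo "d and e are contiguous"
[ $((e_off + e_size)) -eq "$c_size" ] && echo "e ends at the end of the slice"

# Size of d in GiB, assuming 512-byte sectors: sectors * 512 / 2^30.
echo "d is $((d_size * 512 / 1073741824)) GiB"
```

If both checks print, the label is at least internally consistent, which would suggest the damage is in the file system data rather than in the partition layout.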

Hi,

Error 5 means IO error, so probably the transport layer, USB or lower, is to 
blame.
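
For readers decoding such numbers, here is a minimal lookup sketch. The helper name freebsd_errno is hypothetical; the numbers and messages follow FreeBSD's errno list, and other systems may number them differently (60 in particular is not the timeout error everywhere). Values 6 and 60 are included only because they appear in other messages in this digest.

```sh
#!/bin/sh
# Hypothetical helper: map a few FreeBSD errno numbers to name and message.
# Only the values that show up in this digest are covered.
freebsd_errno() {
    case "$1" in
        5)  echo "EIO: Input/output error" ;;
        6)  echo "ENXIO: Device not configured" ;;
        60) echo "ETIMEDOUT: Operation timed out" ;;
        *)  echo "unknown: $1"; return 1 ;;
    esac
}

freebsd_errno 5
```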

Some things to check:

1) Make sure the connection for your memory stick is Ok.
2) Make sure there is enough power for your memory stick.

Regarding memory sticks:

Other operating systems do a port bus reset when the device has a problem. On 
FreeBSD we just try a software reset via the control endpoint. I guess that it 
is a device problem you are seeing. The new USB stack in FreeBSD is faster than
the old one, and maybe the faster queueing of mass storage requests triggers
some hidden bugs in your device.

When the problem happens try:

sysctl hw.usb.umass.debug=-1

--HPS


BIOS resource allocation and FreeBSD ACPI

2009-11-22 Thread Mario Pavlov
 Hi,
I see this problem over and over again...
some time ago I created this PR: 
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/135070
and I just saw it has been duplicated: 
http://www.freebsd.org/cgi/query-pr.cgi?pr=140751
Maybe the latter one should be closed as a duplicate...
Anyway, I think I have seen this problem reported for more than 10 different
laptops on the lists and the forums... Maybe it's time for someone to fix this
issue... I'm willing to donate money if someone can take this on and fix it
(yes, I'm serious, I think it's worth it).

regards,
mgp



Re: 7.2 dies in zfs

2009-11-22 Thread Adam McDougall
On Sun, Nov 22, 2009 at 10:00:03AM +0100, Svein Skogen (listmail account) wrote:

  Very interesting. For my iscsi backend (running istgt from ports), I had
  to change the arc_max below 128M to stop iSCSI initiators generating
  timeouts when the cache flushed. (This is on a system with a megaraid
  8308ELP handling the disk back end, with the disks in two RAID5 arrays
  of four disks each, zpooled as one big pool).
  
  When I had more than 128M arc_max, zfs on regular times ate all
  available resources to flush to disk, leaving the istgt waiting, and
  iSCSI initiators timed out and had to reconnect. The iSCSI initiators
  are the built-in software initator in VMWare ESX 4i.
  
  //Svein
  
I could understand that happening.  I've seen situations in the past where my
kmem was smaller than I wanted it to be, and within a few days the overall
ZFS disk IO would become incredibly slow because it was trying to flush out
the ARC way too often because of external intense memory pressure on the ARC.
Assuming you have a large amount of ram, I wonder if setting kmem_size, arc_min
and arc_max sufficiently large and using modern code would help as long as you
made sure other processes on the machine don't squeeze down Wired memory in top
too much.  In such a situation, I would expect it to operate fine while the ARC
has enough kmem to expand as much as it wants to, and it might either hit a wall
later or perhaps given enough ARC the reclamation might be tolerable.  Or, if 
128M ARC is good enough for you, leave it :)
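
A minimal sketch of how such a combination could be written down and sanity-checked before rebooting. The helper zfs_loader_conf is hypothetical, it assumes the three tunables are set in /boot/loader.conf, and the rule it enforces (arc_min <= arc_max < kmem_size) is only my reading of the discussion above, since the ARC needs room inside kmem.

```sh
#!/bin/sh
# Hypothetical helper: print loader.conf lines for a fixed kmem size and
# ARC bounds, refusing combinations that cannot fit. Arguments are whole
# megabytes: kmem_size, arc_min, arc_max.
zfs_loader_conf() {
    kmem_mb=$1 arc_min_mb=$2 arc_max_mb=$3
    # Assumed rule: the ARC must fit inside kmem with some room to spare.
    if [ "$arc_min_mb" -gt "$arc_max_mb" ] || [ "$arc_max_mb" -ge "$kmem_mb" ]; then
        echo "need arc_min <= arc_max < kmem_size" >&2
        return 1
    fi
    printf 'vm.kmem_size="%sM"\n' "$kmem_mb"
    printf 'vfs.zfs.arc_min="%sM"\n' "$arc_min_mb"
    printf 'vfs.zfs.arc_max="%sM"\n' "$arc_max_mb"
}

# The fileserver settings quoted earlier in this thread: 18G kmem, 4096M to 16384M ARC.
zfs_loader_conf 18432 4096 16384
```

The printed lines would go into /boot/loader.conf and take effect at the next boot. How much headroom to leave between arc_max and kmem_size is workload dependent and not something this sketch tries to decide.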


HEADS UP: removal of PECOFF support in RELENG_[67]

2009-11-22 Thread Bjoern A. Zeeb

Hi,

I'd like to give you a heads up that I intend to also remove PECOFF
support from the stable/7 and stable/6 branches.  PECOFF support is
non-working and unmaintained in those FreeBSD releases and has still seen
public security problems recently.

PECOFF support is already gone in the upcoming 8.0-RELEASE and the
9-CURRENT development branch.


Should no valid complaints come up saying that someone needs (and
actively uses *cough*) PECOFF support on FreeBSD, it'll be removed no
earlier than November 29th 2009 00:00 UTC (in about one week).


/bz

--
Bjoern A. Zeeb It will not break if you know what you are doing.


openoffice stuck in _umtx_op

2009-11-22 Thread Mikhail T.
Hello!

I'm trying to start OOo, and it hangs at start-up -- after popping up
its banner page.

ktrace shows the following, slowly repeating, sequence of events:

[...]
 32726 soffice.bin CALL  _umtx_op(0x805d09060,0x8,0x1,0x805d09040,0x7fbfeef0)
 32726 soffice.bin RET   _umtx_op -1 errno 60 Operation timed out
 32726 soffice.bin CALL  gettimeofday(0x7fbfef70,0)
 32726 soffice.bin RET   gettimeofday 0
 32726 soffice.bin CALL  clock_gettime(0,0x7fbfef00)
 32726 soffice.bin RET   clock_gettime 0
 32726 soffice.bin CALL  _umtx_op(0x805d09060,0x8,0x1,0x805d09040,0x7fbfeef0)
 32726 soffice.bin RET   _umtx_op -1 errno 60 Operation timed out
 32726 soffice.bin CALL  gettimeofday(0x7fbfef70,0)
 32726 soffice.bin RET   gettimeofday 0
 32726 soffice.bin CALL  clock_gettime(0,0x7fbfef00)
 32726 soffice.bin RET   clock_gettime 0
 32726 soffice.bin CALL  _umtx_op(0x805d09060,0x8,0x1,0x805d09040,0x7fbfeef0)
[...]

What's happening? `ipcs -a' does not show anything extraordinary and
there is nothing in syslog either...

The machine is running 7.2-STABLE/amd64 (as of Oct 25). Any ideas? Thanks!
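
To see how often the timed wait expires over a longer capture, a small filter like the one below could be run over the kdump output. The function name count_umtx_timeouts is hypothetical, and the field layout (pid, command, RET, syscall, return value, "errno", number) is assumed from the excerpt above.

```sh
#!/bin/sh
# Hypothetical helper: count _umtx_op returns that failed with errno 60
# in kdump-style text read from standard input.
count_umtx_timeouts() {
    awk '$3 == "RET" && $4 == "_umtx_op" && $5 == "-1" && $7 == "60" { n++ }
         END { print n + 0 }'
}

# Demo on two lines copied from the trace; on a real capture one would run
#   kdump -f ktrace.out | count_umtx_timeouts
printf '%s\n' \
    ' 32726 soffice.bin RET   _umtx_op -1 errno 60 Operation timed out' \
    ' 32726 soffice.bin RET   gettimeofday 0' | count_umtx_timeouts
```

A steadily growing count with nothing else in between would be consistent with a thread polling on a lock or condition that is never signalled, though the trace alone does not say which one.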

-mi



Re: FreeBSD 7.x hang-on-boot on Dell 1950

2009-11-22 Thread Zaphod Beeblebrox
On Fri, Nov 13, 2009 at 9:44 PM, Jeremy Chadwick free...@jdc.parodius.com wrote:


  This 1950 may predate that a bit, but I'm not sure how to nail it down
  exactly, other than by its hardware components.  Anyways, 7.0 does the
  same thing --- still wedged.

 I haven't seen anyone recommend this as a test method yet -- disabling
 fdc prior to the kernel booting via the loader prompt:

 - Press 6 at the menu,
 - At the loader prompt, type:

  set hint.fdc.0.disabled=1
  boot -v   (or without -v; your choice)

 You shouldn't need to set hint.fd.0.disabled=1, since fd0 would
 normally bind to fdc0; disable the latter and you disable the lesser.

 The intention here is to rule out the device attachment failures from
 fdc as the source of the deadlock.


Entertainingly, it does not.  Apparently that hint doesn't stop the code from
trying to attach fdc0 when ACPI says so.  I suppose I need to know the
console command to disable both ACPI and fdc.

With the above, it still wedges at device_attach: fdc0 attach returned 6.
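
For reference, a sketch of what disabling both at the loader prompt might look like. hint.fdc.0.disabled is the hint quoted above; hint.acpi.0.disabled is assumed here to be the hint that the boot menu's ACPI-disabled option sets, so treat it as unverified for this machine.

```text
set hint.acpi.0.disabled=1
set hint.fdc.0.disabled=1
boot -v
```

If the attach of fdc0 is driven by ACPI, turning ACPI off may be what actually keeps fdc0 from being probed; the second line is kept only so that both suspects are ruled out in one boot.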


RE: 8.0-RC USB/FS problem

2009-11-22 Thread Guojun Jin
From more intensive diagnosis, it looks more related to the USB layer.

I repeated the following process a few times, and the crash happened at a
different USB access phase each time.

dd if=/dev/zero of=/dev/da0 count=1000 bs=4k
sysinstall
partition 
 slice 1 (da0s1) 18GB ID=12
 slice 2 (da0s2) 10-15GB Id=165
 slice 3 (da0s3) rest ID=165
 W --- OK
label
 da0s3d 9GB /mnt
 da0s3e rest/dist
 W --- da0s3e  device is not configured.

w# ll /dev/da0*  # after sysinstall did partition + W at 1st time
crw-r-----  1 root  operator    0,  97 Nov 22 11:23 /dev/da0
crw-r-----  1 root  operator    0,  98 Nov 22 11:23 /dev/da0s1
crw-r-----  1 root  operator    0,  99 Nov 22 11:23 /dev/da0s2
crw-r-----  1 root  operator    0, 100 Nov 22 11:23 /dev/da0s3

# ll /dev/da0*  # after sysinstall start at 2nd time
crw-r-----  1 root  operator    0,  97 Nov 22 11:27 /dev/da0
System crashed

The crash log is available at
http://www.daemonfun.com/archives/pub/USB/crash1-reset.bz2
(All logs are based on hw.usb.umass.debug=-1)

After a system reboot I repeated the above process; da0s3e was mounted on
/dist, but da0s3d could not be mounted.
It turned out that newfs had failed inside the labeling step in sysinstall. I
ran newfs on da0s3d manually, but it still could not be mounted on /mnt, and
accessing it caused a crash.
The crash log is available at http://www.daemonfun.com/archives/pub/USB/newfs

I tried the entire process again. This time both partitions were formatted
(newfs) inside the labeling step (sysinstall), but the system crashed during
dump/restore on da0s3e (/dist).
The crash log is available at
http://www.daemonfun.com/archives/pub/USB/usb-log.crash2.bz2, which is a huge one.
It contains two parts: the dump/restore from IDE to da0s3d (passed), and the
dump/restore to da0s3e (crashed).

I am going to reinstall the system with the new 8.0-RELEASE ISO from Nov 21 to
see if anything improves.




Re: FreeBSD 7.x hang-on-boot on Dell 1950

2009-11-22 Thread Brian Seklecki
On Thu, 2009-11-12 at 16:56 -0500, Zaphod Beeblebrox wrote:
 
 I've now verified that 8.0-RC3 does the same thing, BTW.
 
 Anyways... no.  There is no floppy option in the BIOS.  It's not in 

Dell BIOS absolutely sucks.  You get what you pay for.

We have 25+ 9th gen systems.  Revision 1, with the older HT Xeons, are
all lemons.  

The only thing that runs stable on them is ESXi.

R3 of the 1950 with the PERC6 and DRAC5 works fine on 6.3/amd64, 7.2,
etc.

~BAS

 any of the sub-menus (which is how Dell's BIOS is organized).  I'm
 also not-so-sure that the floppy is where it's stopping because the
 non-ACPI boot doesn't 



Re: FreeBSD 7.x hang-on-boot on Dell 1950

2009-11-22 Thread Zaphod Beeblebrox
On Sun, Nov 22, 2009 at 9:09 PM, Zaphod Beeblebrox zbee...@gmail.com wrote:




 Entertainingly, it does not.  Apparently that hint doesn't stop the code
 from trying to attach fdc0 when ACPI says so.  I suppose I need to know the
 console command to disable both ACPI and fdc.

 With the above, it still wedges at device_attach: fdc0 attach returned 6.


OK.  With both floppy and ACPI disabled, it dies calling start_init
several times, the last being /stand/sysinstall (which should work).  I don't
see it starting the other CPUs.  It hangs hard... no keyboard working (i.e.,
no caps lock).