Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 09/01/2011 05:50, Randy Bush wrote:

  given i have raid or raidz1, can i move to raidz2?

  # zpool status
    pool: tank
   state: ONLINE
   scrub: none requested
  config:

          NAME        STATE     READ WRITE CKSUM
          tank        ONLINE       0     0     0
            raidz1    ONLINE       0     0     0
              ad4s2   ONLINE       0     0     0
              ad8s2   ONLINE       0     0     0
              ad6s1   ONLINE       0     0     0
              ad10s1  ONLINE       0     0     0

  or

  # zpool status
    pool: tank
   state: ONLINE
   scrub: none requested
  config:

          NAME              STATE     READ WRITE CKSUM
          tank              ONLINE       0     0     0
            mirror          ONLINE       0     0     0
              label/disk01  ONLINE       0     0     0
              label/disk00  ONLINE       0     0     0
            mirror          ONLINE       0     0     0
              label/disk02  ONLINE       0     0     0
              label/disk03  ONLINE       0     0     0

Not without backing up your current data, destroying the existing zpool(s) and rebuilding from scratch.

Note: raidz2 on 4 disks doesn't really win you anything over 2 x mirror pairs of disks, and the RAID10 mirror is going to be rather more performant.

Cheers,

Matthew
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
Brill! Thanks :)

Joe

On 8 Jan 2011, at 09:50, Jeremy Chadwick free...@jdc.parodius.com wrote:

  On Sat, Jan 08, 2011 at 09:14:19AM +, Josef Karthauser wrote:

    On 7 Jan 2011, at 17:30, Artem Belevich fbsdl...@src.cx wrote:

      One way to get a specific ratio for *your* pool would be to collect
      record size statistics from your pool using "zdb -L -b pool" and
      then calculate the L2ARC:ARC ratio based on average record size.
      I'm not sure, though, whether L2ARC stores records in compressed or
      uncompressed form.

    Can someone point me to a reference describing the various ZFS caches
    available? What's the ARC and ZIL? I've been running some ZFS for a
    few years now, and must have missed this entire subject :/

  ARC:   http://en.wikipedia.org/wiki/ZFS#Cache_management
  L2ARC: http://en.wikipedia.org/wiki/ZFS#Storage_pools
  L2ARC: http://blogs.sun.com/brendan/entry/test
  Both:  http://www.c0t0d0s0.org/archives/5329-Some-insight-into-the-read-cache-of-ZFS-or-The-ARC.html
  Both:  http://nilesh-joshi.blogspot.com/2010/07/zfs-revisited.html
  ZIL:   http://blogs.sun.com/perrin/entry/the_lumberjack
  ZIL:   http://blogs.sun.com/realneel/entry/the_zfs_intent_log

  Enjoy.
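[For readers following along: the ARC and L2ARC statistics discussed above are exported on FreeBSD through the kstat sysctl tree, so they can be watched on a live box. A minimal check might look like the sketch below; the OID names are as they appear on 8-STABLE, but the exact set varies by ZFS version, so treat the list as an assumption to verify on your system.]

  # Overall ARC size and hit/miss counters
  sysctl kstat.zfs.misc.arcstats.size \
         kstat.zfs.misc.arcstats.hits \
         kstat.zfs.misc.arcstats.misses

  # L2ARC counters (only meaningful if cache devices are attached)
  sysctl kstat.zfs.misc.arcstats.l2_size \
         kstat.zfs.misc.arcstats.l2_hits \
         kstat.zfs.misc.arcstats.l2_misses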
Re: New ZFSv28 patchset for 8-STABLE
On 12/16/2010 01:44 PM, Martin Matuska wrote:

  Hi everyone,

  following the announcement of Pawel Jakub Dawidek (p...@freebsd.org) I
  am providing a ZFSv28 testing patch for 8-STABLE.

  Link to the patch:
  http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

I've got an IO hang with dedup enabled (not sure it's related; I've started to rewrite all data on the pool, which makes a heavy load). The processes are in various states:

  65747 1001  1  54  10 28620K 24360K tx->tx  0   6:58  0.00% cvsup
  80383 1001  1  54  10 40616K 30196K select  1   5:38  0.00% rsync
   1501 www   1  44   0  7304K  2504K zio->i  0   2:09  0.00% nginx
   1479 www   1  44   0  7304K  2416K zio->i  1   2:03  0.00% nginx
   1477 www   1  44   0  7304K  2664K zio->i  0   2:02  0.00% nginx
   1487 www   1  44   0  7304K  2376K zio->i  0   1:40  0.00% nginx
   1490 www   1  44   0  7304K  1852K zfs     0   1:30  0.00% nginx
   1486 www   1  44   0  7304K  2400K zfsvfs  1   1:05  0.00% nginx

And everything which wants to touch the pool is/becomes dead. Procstat says about one process:

  # procstat -k 1497
    PID    TID COMM   TDNAME  KSTACK
   1497 100257 nginx  -       mi_switch sleepq_wait __lockmgr_args
                              vop_stdlock VOP_LOCK1_APV null_lock
                              VOP_LOCK1_APV _vn_lock nullfs_root lookup
                              namei vn_open_cred kern_openat syscallenter
                              syscall Xfast_syscall
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
Hi,

On 9 January 2011 19:44, Matthew Seaman m.sea...@infracaninophile.co.uk wrote:

  Not without backing up your current data, destroying the existing
  zpool(s) and rebuilding from scratch.

  Note: raidz2 on 4 disks doesn't really win you anything over 2 x mirror
  pairs of disks, and the RAID10 mirror is going to be rather more
  performant.

I would have thought the probabilities of failure to be slightly different. Sure, out of 4 disks, 2 can fail in both configurations. *But* in raidz2, any two of the four can fail. In RAID10, the two disks that fail must be in different mirror pairs, otherwise you lose it all. As such, the resilience to failure of a raidz2 is far greater than that of a RAID10 system.
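[To make the combinatorics here concrete -- this enumeration is added for illustration and is not from the original mail. Label the four disks 1-4, with the RAID10 mirror pairs being (1,2) and (3,4). The six possible two-disk failures break down as:]

  1+2   raidz2 survives   RAID10 loses the pool (whole mirror gone)
  3+4   raidz2 survives   RAID10 loses the pool (whole mirror gone)
  1+3   raidz2 survives   RAID10 survives
  1+4   raidz2 survives   RAID10 survives
  2+3   raidz2 survives   RAID10 survives
  2+4   raidz2 survives   RAID10 survives

[So raidz2 survives 6 of the 6 two-disk failure combinations; RAID10 survives 4 of 6.]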
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 09/01/2011 09:01, Jean-Yves Avenard wrote:

  On 9 January 2011 19:44, Matthew Seaman wrote:

    Note: raidz2 on 4 disks doesn't really win you anything over 2 x
    mirror pairs of disks, and the RAID10 mirror is going to be rather
    more performant.

  I would have thought the probabilities of failure to be slightly
  different. Sure, out of 4 disks, 2 can fail in both configurations.
  *But* in raidz2, any two of the four can fail. In RAID10, the two disks
  that fail must be in different mirror pairs, otherwise you lose it all.
  As such, the resilience to failure of a raidz2 is far greater than that
  of a RAID10 system.

So you sacrifice performance 100% of the time based on the very unlikely possibility of drives 1+2 or 3+4 failing simultaneously, compared to the similarly unlikely possibility of drives 1+3 or 1+4 or 2+3 or 2+4 failing simultaneously?[*] That's not a trade-off worth making IMHO. If the data is that valuable, you should be making copies of it to some independent machine all the time and backing up at frequent intervals, which backups you keep off-site in disaster-proof storage.

Cheers,

Matthew

[*] All of this mathematics is pretty suspect, because if two drives fail simultaneously in a machine, the chances are the failures are not independent, but due to some external cause [eg. like the case fan breaking and the box toasting itself]. In which case, the comparative chance of whatever it is affecting three or four drives at once renders the whole argument pointless.
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 9 January 2011 21:03, Matthew Seaman m.sea...@infracaninophile.co.uk wrote:

  So you sacrifice performance 100% of the time based on the very unlikely
  possibility of drives 1+2 or 3+4 failing simultaneously, compared to the
  similarly unlikely possibility of drives 1+3 or 1+4 or 2+3 or 2+4
  failing simultaneously?[*] That's not a trade-off worth making IMHO. If
  the data is that valuable, you should be making copies of it to some
  independent machine all the time and backing up at frequent intervals,
  which backups you keep off-site in disaster-proof storage.

But this is not what you first wrote. You said the effects were identical; they are not.

Now, if you want to favour performance over redundancy, that's ultimately up to the user...

Plus, honestly, the difference in performance between raidz and RAID10 is also close to being insignificant.
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
Hi, all,

On 09.01.2011 at 11:03, Matthew Seaman wrote:

  [*] All of this mathematics is pretty suspect, because if two drives
  fail simultaneously in a machine, the chances are the failures are not
  independent, but due to some external cause [...] In which case, the
  comparative chance of whatever it is affecting three or four drives at
  once renders the whole argument pointless.

I assume you are familiar with these papers?

http://queue.acm.org/detail.cfm?id=1317403
http://queue.acm.org/detail.cfm?id=1670144

Short version: as hard disk sizes increase to 2 TB and beyond while the URE rate stays on the order of 1 in 10^14 bits read, the probability of encountering an URE during the rebuild of a single-parity RAID approaches 1.

Best regards,
Patrick

--
punkt.de GmbH * Kaiserallee 13a * 76133 Karlsruhe
Tel. 0721 9109 0 * Fax 0721 9109 100
i...@punkt.de   http://www.punkt.de
Managing Director: Jürgen Egeling   AG Mannheim 108285
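[A back-of-the-envelope calculation, added here for illustration using the commonly quoted consumer-drive spec of one URE per 10^14 bits read -- an assumption about the drives in question. Rebuilding a degraded 4-disk raidz1 of 2 TB drives requires reading the three surviving disks in full:]

  bits read           = 3 disks x 2x10^12 bytes x 8 bits/byte = 4.8x10^13
  expected UREs       = 4.8x10^13 / 10^14                    ~= 0.48
  P(at least one URE) = 1 - exp(-0.48)                       ~= 38%

[With double parity (raidz2), a URE hit during a single-disk rebuild is still correctable from the remaining redundancy, which is the point the papers above make.]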
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 09/01/2011 10:24, Jean-Yves Avenard wrote:

  On 9 January 2011 21:03, Matthew Seaman wrote:

    So you sacrifice performance 100% of the time based on the very
    unlikely possibility of drives 1+2 or 3+4 failing simultaneously,
    compared to the similarly unlikely possibility of drives 1+3 or 1+4
    or 2+3 or 2+4 failing simultaneously?

  But this is not what you first wrote.

What I said was:

  Note: raidz2 on 4 disks doesn't really win you anything over 2 x mirror
  pairs of disks, and the RAID10 mirror is going to be rather more
  performant.

  You said the effects were identical; they are not.

Which is certainly not saying the effects are identical. It's saying the difference is too small to worry about.

  Plus, honestly, the difference in performance between raidz and RAID10
  is also close to being insignificant.

That's not my experience. It depends on what sort of workload you have. If you're streaming very large files, I'd expect RAID10 and RAIDZ to be about equal. If you're doing lots of randomly distributed small IOs, then RAID10 is going to win hands down.

Cheers,

Matthew
Re: New ZFSv28 patchset for 8-STABLE
On 01/09/2011 10:00 AM, Attila Nagy wrote:

  On 12/16/2010 01:44 PM, Martin Matuska wrote:

    following the announcement of Pawel Jakub Dawidek (p...@freebsd.org)
    I am providing a ZFSv28 testing patch for 8-STABLE.

    Link to the patch:
    http://people.freebsd.org/~mm/patches/zfs/v28/stable-8-zfsv28-20101215.patch.xz

  I've got an IO hang with dedup enabled (not sure it's related; I've
  started to rewrite all data on the pool, which makes a heavy load).
  [...] And everything which wants to touch the pool is/becomes dead.

No, it's not related. One of the disks in the RAIDZ2 pool went bad:

  (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
  (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
  (da4:arcmsr0:0:4:0): SCSI status: Check Condition
  (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)

and it seems it froze the whole zpool. Removing the disk by hand solved the problem. I've seen this previously on other machines with ciss. I wonder why ZFS didn't throw it out of the pool.
Re: New ZFSv28 patchset for 8-STABLE
On 01/01/2011 08:09 PM, Artem Belevich wrote:

  On Sat, Jan 1, 2011 at 10:18 AM, Attila Nagy b...@fsn.hu wrote:

    What I see:
    - increased CPU load
    - decreased L2 ARC hit rate, decreased SSD (ad[46]) hit rate, and
      therefore increased hard disk load (IOPS graph)
    ...
    Any ideas on what could cause these? I haven't upgraded the pool
    version and nothing was changed in the pool or in the file system.

  The fact that L2 ARC is full does not mean that it contains the right
  data. Initial L2ARC warm-up happens at a much higher rate than the rate
  L2ARC is updated after it's been filled initially. Even accelerated
  warm-up took almost a day in your case. In order for L2ARC to warm up
  properly you may have to wait quite a bit longer. My guess is that it
  should slowly improve over the next few days as data goes through L2ARC
  and those bits that are hit more often take up residence there. The
  larger your data set, the longer it will take for L2ARC to catch the
  right data.

  Do you have similar graphs from the pre-patch system just after reboot?
  I suspect that it may show similarly abysmal L2ARC hit rates initially,
  too.

I've finally found the time to read the v28 patch and figured out the problem: the default for vfs.zfs.l2arc_noprefetch was changed to 1, so prefetched data is no longer served from the L2ARC devices. This is a major hit in my case. Setting it back to 0 (i.e. using prefetched data on L2ARC again) restored the previous hit rates and lowered the load on the hard disks significantly.
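[For anyone wanting to try the same change: the knob is a regular ZFS sysctl/loader tunable. A minimal sketch follows; the sysctl exists in the v28 code, though whether it can be flipped at runtime or only at boot should be verified on your particular build.]

  # check the current value
  sysctl vfs.zfs.l2arc_noprefetch

  # revert to the old behaviour of caching prefetched data in L2ARC
  sysctl vfs.zfs.l2arc_noprefetch=0

  # or make it permanent across reboots in /boot/loader.conf:
  #   vfs.zfs.l2arc_noprefetch="0"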
Panic 8.2 PRERELEASE WRITE_DMA48
The last half year I've been installing FreeBSD on several machines. I installed it on my main desktop system a few weeks ago, which normally runs Linux, but I get this panic under heavy disk I/O. It even happened during the initial sysinstall, although I have also completed several buildworlds without problems.

I can trigger it easily by accessing /usr (UFS) and a Linux ext partition simultaneously, e.g. by copying large files to the /usr partition.

Just bought a serial cable to enable the serial console of the various FreeBSD installations, which is of good use for this problem, because a crash dump is not written. Full boot output in the attachment.

Sun Jan  9 10:11:17 CET 2011
unknown: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=274799820
ata2: timeout waiting to issue command
ata2: error issuing WRITE_DMA48 command
g_vfs_done():ad4s2f[WRITE(offset=28915105792, length=131072)]error = 6
/usr: got error 6 while accessing filesystem
panic: softdep_deallocate_dependencies: unrecovered I/O error
cpuid = 0
KDB: stack backtrace:
#0 0xc08e0f77 at kdb_backtrace+0x47
#1 0xc08b2037 at panic+0x117
#2 0xc0ae2ecd at softdep_deallocate_dependencies+0x3d
#3 0xc0925590 at brelse+0x90
#4 0xc092829a at bufdone_finish+0x3fa
#5 0xc092830d at bufdone+0x4d
#6 0xc092bdf9 at cluster_callback+0x89
#7 0xc09282f7 at bufdone+0x37
#8 0xc0850ad5 at g_vfs_done+0x85
#9 0xc09224d9 at biodone+0xb9
#10 0xc084da69 at g_io_schedule_up+0x79
#11 0xc084e0a8 at g_up_procbody+0x68
#12 0xc0886fc1 at fork_exit+0x91
#13 0xc0bcc144 at fork_trampoline+0x8
Uptime: 2h56m27s
Physical memory: 1515 MB
Dumping 177 MB: ata2: timeout waiting to issue command
ata2: error issuing WRITE_DMA command

** DUMP FAILED (ERROR 5) **
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 09/01/2011 10:14, Patrick M. Hausen wrote:

  I assume you are familiar with these papers?

  http://queue.acm.org/detail.cfm?id=1317403
  http://queue.acm.org/detail.cfm?id=1670144

  Short version: as hard disk sizes increase to 2 TB and beyond while the
  URE rate stays on the order of 1 in 10^14 bits read, the probability of
  encountering an URE during the rebuild of a single-parity RAID
  approaches 1.

Yes. Rotating magnetic media seems to be bumping up against some intrinsic performance/reliability limits to the year-on-year doubling of capacity. Having to add more and more extra drives to ensure the same level of reliability is not a winning proposition in the long term.

Roll on solid state storage. I particularly like the sound of HP and Hynix's memristor technology. If memristors pan out, then they are going to replace both D-RAM and hard drives, and eventually replace transistors as the basic building block for electronic logic circuits. Five to ten years from now, hardware design is going to be very different, and the software that runs on it will have to be radically redesigned to match. Think what that means.

  * You don't have to *save* a file, ever. If it's in memory, it's in
    persistent storage.

  * The effect on RDBMS performance is going to be awesome -- none of
    that time-consuming waiting for sync-to-disk.

  * A computer should be able to survive a power outage of a few seconds
    and carry on where it left off, without specially going into
    hibernation mode.

  * Similarly, reboot will be at the flick of a switch -- pretty much
    instant-on.

  * Portables will look a lot more like iPads or other tablet devices,
    and will have battery lifetimes of several days. About the only
    significant difference is one will have a hinge down the middle and a
    built-in keyboard, while the other will only have the touch screen.

Oh, and let's not forget the beneficial effects of *no moving parts* and *lower power consumption* on system reliability. Now all we need are the telcos to lay multi-Gb/s capacity fibre to every house and business, and things will start to get very interesting indeed.

Cheers,

Matthew
Re: New ZFSv28 patchset for 8-STABLE
On Sun, Jan 09, 2011 at 12:49:27PM +0100, Attila Nagy wrote:

  No, it's not related. One of the disks in the RAIDZ2 pool went bad:

  (da4:arcmsr0:0:4:0): READ(6). CDB: 8 0 2 10 10 0
  (da4:arcmsr0:0:4:0): CAM status: SCSI Status Error
  (da4:arcmsr0:0:4:0): SCSI status: Check Condition
  (da4:arcmsr0:0:4:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)

  and it seems it froze the whole zpool. Removing the disk by hand solved
  the problem. I've seen this previously on other machines with ciss. I
  wonder why ZFS didn't throw it out of the pool.

Hold on a minute. An unrecoverable read error does not necessarily mean the drive is bad; it could mean that the individual LBA that was attempted to be read resulted in ASC 0x11 (MEDIUM ERROR), e.g. a bad block was encountered. I would check SMART stats on the disk (since these are probably SATA, given use of arcmsr(4)) and provide those. *That* will tell you if the disk is bad. I'll help you decode the attribute values if you provide them.

My understanding is that a single LBA read failure should not warrant ZFS marking the disk UNAVAIL in the pool. It should have incremented the READ error counter and that's it. Did you receive a *single* error for the disk and then things went catatonic? If the entire system got wedged (a soft wedge, e.g. the kernel is still alive but nothing's happening in userland), that could be a different problem -- either with ZFS or arcmsr(4).

Does ZFS have some sort of timeout value internal to itself where it will literally mark a disk UNAVAIL in the case that repeated I/O transactions take too long? What is its error recovery methodology?

Speaking strictly about Solaris 10 and ZFS: I have seen, many times, a system soft-wedge after repeated I/O errors (read or write) are spewed out on the console for a single SATA disk (via AHCI), but only when the disk is used as a sole root filesystem disk (no mirror/raidz). My impression is that ZFS isn't the problem in this scenario. In most cases, post-mortem debugging on my part shows that the disks encountered some CRC errors (indicating cabling issues, etc.), sometimes as few as 2, but something else went crazy -- or possibly ZFS couldn't mark the disk UNAVAIL (if it has that logic) because it's a single disk associated with root.

Hardware in this scenario are Hitachi SATA disks with an ICH ESB2 controller; software is Solaris 10 (Generic_142901-06) with ZFS v15.
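[Since the disks in this thread sit behind an Areca controller, a plain smartctl -a /dev/da4 typically won't reach them; smartmontools documents an Areca pass-through for this. A sketch of what checking the disk in slot 5 might look like -- the -d areca,N syntax is documented smartmontools usage, but whether your smartmontools build supports it on FreeBSD's /dev/arcmsr0, and which slot number applies, are assumptions to verify locally:]

  # SMART attributes for the disk in Areca slot 5 on the first controller
  smartctl -a -d areca,5 /dev/arcmsr0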
Re: Panic 8.2 PRERELEASE WRITE_DMA48
On Sun, Jan 09, 2011 at 12:33:10PM +0100, Tom Vijlbrief wrote:

  The last half year I've been installing FreeBSD on several machines. I
  installed it on my main desktop system a few weeks ago, which normally
  runs Linux, but I get this panic under heavy disk I/O. [...]

  unknown: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=274799820
  ata2: timeout waiting to issue command
  ata2: error issuing WRITE_DMA48 command
  g_vfs_done():ad4s2f[WRITE(offset=28915105792, length=131072)]error = 6
  /usr: got error 6 while accessing filesystem
  panic: softdep_deallocate_dependencies: unrecovered I/O error
  [...]

Can you please provide output from the following commands (after installing ports/sysutils/smartmontools, which should be version 5.40 or later, in case you haven't updated your ports tree):

$ dmesg
$ smartctl -a /dev/ad4

The SMART output should act as a verifier as to whether or not you really do have a bad block on your disk (which is what a READ/WRITE_DMA48 error can sometimes indicate).

You may also want to boot the machine in single-user mode and do a manual fsck /dev/ad4s2f. It's been proven in the past that background_fsck doesn't manage to address all issues.
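[For reference, the single-user procedure being suggested would look roughly like this. This is a sketch: the partition name is taken from the panic message above, and the -y flag, which auto-answers fsck's repair prompts, is an optional convenience.]

  # boot into single-user mode from the loader menu, then run a
  # foreground check on the damaged /usr filesystem (repeat until clean):
  fsck -y /dev/ad4s2f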
Re: New ZFSv28 patchset for 8-STABLE
On 01/09/2011 01:18 PM, Jeremy Chadwick wrote:

  Hold on a minute. An unrecoverable read error does not necessarily mean
  the drive is bad [...] I would check SMART stats on the disk (since
  these are probably SATA, given use of arcmsr(4)) and provide those.
  *That* will tell you if the disk is bad. I'll help you decode the
  attribute values if you provide them.

You are right, and I gave incorrect information. There are a lot more errors for that disk in the logs, and the zpool was frozen.

I tried to offline the given disk. That helped in the ciss case, where the symptom is the same or something similar: there is no IO for ages, then something small, and nothing for long seconds/minutes, and there are no errors logged. zpool status reported no errors, and the dmesg was clear too. There I could find the bad disk by watching gstat output: when the very small amount of IO was done, there was one disk with response times well above a second, while the others responded quickly. There the zpool offline helped. Here it did not; the command just got hung, like everything else.

So what I did then: I got into the areca-cli and searched for errors. One disk was set to failed and it seemed to be the cause. I removed it (and did a camcontrol rescan, but I'm not sure whether it was necessary or not), and suddenly the zpool offline finished and everything went back to normal.

But there are two controllers in the system, and now I see that the above disk is on ctrl 1, while the one I removed is on ctrl 2. I was misled by their identical positions. So now I have an offlined disk (which produces read errors, but I couldn't see them in the zpool output) and another, which is shown as failed in the RAID controller and got removed by hand (and solved the situation):

        NAME                 STATE     READ WRITE CKSUM
        data                 DEGRADED     0     0     0
          raidz2-0           DEGRADED     0     0     0
            label/disk20-01  ONLINE       0     0     0
            label/disk20-02  ONLINE       0     0     0
            label/disk20-03  ONLINE       0     0     0
            label/disk20-04  ONLINE       0     0     0
            label/disk20-05  OFFLINE      0     0     0
            label/disk20-06  ONLINE       0     0     0
            label/disk20-07  ONLINE       0     0     0
            label/disk20-08  ONLINE       0     0     0
            label/disk20-09  ONLINE       0     0     0
            label/disk20-10  ONLINE       0     0     0
            label/disk20-11  ONLINE       0     0     0
            label/disk20-12  ONLINE       0     0     0
          raidz2-1           DEGRADED     0     0     0
            label/disk21-01  ONLINE       0     0     0
            label/disk21-02  ONLINE       0     0     0
            label/disk21-03  ONLINE       0     0     0
            label/disk21-04  ONLINE       0     0     0
Re: New ZFSv28 patchset for 8-STABLE
Once upon a time, this was a known problem with the arcmsr driver not correctly interacting with ZFS, resulting in this behavior. Since I'm presuming that the arcmsr driver update which was intended to fix this behavior (in my case, at least) is in your nightly build, it's probably worth pinging the arcmsr driver maintainer about this.

- Rich

On Sun, Jan 9, 2011 at 7:18 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:

  Hold on a minute. An unrecoverable read error does not necessarily mean
  the drive is bad; it could mean that the individual LBA that was
  attempted to be read resulted in ASC 0x11 (MEDIUM ERROR), e.g. a bad
  block was encountered. [...]

  Does ZFS have some sort of timeout value internal to itself where it
  will literally mark a disk UNAVAIL in the case that repeated I/O
  transactions take too long? What is its error recovery methodology?

  Speaking strictly about Solaris 10 and ZFS: I have seen, many times, a
  system soft-wedge after repeated I/O errors (read or write) are spewed
  out on the console for a single SATA disk (via AHCI), but only when the
  disk is used as a sole root filesystem disk (no mirror/raidz). [...]
  Hardware in this scenario are Hitachi SATA disks with an ICH ESB2
  controller; software is Solaris 10 (Generic_142901-06) with ZFS v15.
Re: New ZFSv28 patchset for 8-STABLE
On Sun, Jan 09, 2011 at 01:42:13PM +0100, Attila Nagy wrote:

  You are right, and I gave incorrect information. There are a lot more
  errors for that disk in the logs, and the zpool was frozen.

  I tried to offline the given disk. That helped in the ciss case, where
  the symptom is the same or something similar [...] There the zpool
  offline helped. Here it did not; the command just got hung, like
  everything else.

  So what I did then: I got into the areca-cli and searched for errors.
  One disk was set to failed and it seemed to be the cause. I removed it
  (and did a camcontrol rescan, but I'm not sure whether it was necessary
  or not), and suddenly the zpool offline finished and everything went
  back to normal. But there are two controllers in the system, and now I
  see that the above disk is on ctrl 1, while the one I removed is on
  ctrl 2. I was misled by their identical positions.

  So now I have an offlined disk (which produces read errors, but I
  couldn't see them in the zpool output) and another, which is shown as
  failed in the RAID controller and got removed by hand (and solved the
  situation):

        NAME                 STATE     READ WRITE CKSUM
        data                 DEGRADED     0     0     0
          raidz2-0           DEGRADED     0     0     0
            label/disk20-01  ONLINE       0     0     0
            [...]
            label/disk20-05  OFFLINE      0     0     0
            [...]
          raidz2-1           DEGRADED     0     0     0
            label/disk21-01  ONLINE       0     0     0
            label/disk21-02  ONLINE       0     0     0
Re: Panic 8.2 PRERELEASE WRITE_DMA48
I've run many fscks on /usr in single-user because I had soft update inconsistencies; no DMA errors during those repairs.

smartctl 5.40 2010-10-16 r3189 [FreeBSD 8.2-PRERELEASE i386] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     SAMSUNG SpinPoint F1 DT series
Device Model:     SAMSUNG HD103UJ
Serial Number:    S13PJ9BQC02902
Firmware Version: 1AA01113
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 3b
Local Time is:    Sun Jan  9 16:40:24 2011 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (11811) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 198) minutes.
Conveyance self-test routine
recommended polling time:        (  21) minutes.
SCT capabilities:              (0x003f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   078   078   011    Pre-fail  Always       -       7580
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       399
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   253   253   051    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   100   100   015    Pre-fail  Offline      -       10097
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       2375
 10 Spin_Retry_Count        0x0033   100   100   051    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       392
 13 Read_Soft_Error_Rate    0x000e   100   100   000    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   000    Pre-fail  Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   052   000    Old_age   Always       -       43 (Min/Max 42/45)
194 Temperature_Celsius     0x0022   056   050   000    Old_age   Always       -       44 (Min/Max 42/46)
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       20728126
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
Specifying root mount options on diskless boot.
Daniel Braniss writes:

  I have it pxebooting nicely and running with an NFS root, but it then
  reports locking problems: devd, syslogd, moused (and maybe others) lock
  their PID file to protect against multiple instances. Unfortunately,
  these daemons all start before statd/lockd and so the locking fails and
  reports "operation not supported".

Are you mounting /var via NFS? We have been running FreeBSD diskless for several years, and have never run into this problem -- but we use a memory filesystem. The memory filesystem can be quite small. Our methods are documented at

http://www.nber.org/sys-admin/FreeBSD-diskless.html

If that isn't the problem, can you guess what we are doing differently to avoid it?

I note that the response to your message from danny offers the ability to pass arguments to the nfs mount command, but also seems to offer a fix for the fact that classes are not supported under PXE:

http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/90368

I hope danny will offer a patch to mainline code - it would be an important improvement (and already promised in the documentation).

(I am sorry if this doesn't thread properly - I just joined the list after seeing the message.) The thread is available at:

http://lists.freebsd.org/pipermail/freebsd-stable/2011-January/060854.html

Daniel Feenberg
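[The memory-filesystem approach for /var mentioned above can be configured entirely from rc.conf on a diskless client. A minimal sketch -- varmfs, varsize, tmpmfs and tmpsize are standard rc.conf knobs, but the sizes shown are illustrative choices, not values from the NBER setup:]

  # /etc/rc.conf on the diskless client
  varmfs="YES"        # mount an md(4) memory filesystem on /var
  varsize="32m"       # small is fine; PID files and logs only
  tmpmfs="YES"        # same trick for /tmp
  tmpsize="20m"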
Re: Panic 8.2 PRERELEASE WRITE_DMA48
On Sun, Jan 09, 2011 at 04:41:43PM +0100, Tom Vijlbrief wrote:

  I've run many fscks on /usr in single-user because I had soft update
  inconsistencies; no DMA errors during those repairs.

There's no 1:1 ratio between running fsck on a filesystem and seeing a DMA error. I should explain what I mean by that: just because you receive a read or write error from a disk during operation doesn't mean fsck will induce it. fsck simply checks filesystem tables and so on for integrity; it doesn't do the equivalent of a bad block scan, nor does it check (read) every data block referenced by an inode. So if you have a filesystem which has a bad block somewhere within a data block, fsck almost certainly won't catch this. ZFS, on the other hand (specifically a zpool scrub), would/should induce such.

The reason I advocated booting into single-user and running a fsck manually is because there's confirmation that background fsck doesn't catch/handle all filesystem consistency errors that a foreground fsck does. This is why I continue to advocate background_fsck=no in rc.conf(5). That's for another discussion though.

Let's review the disk:

  === START OF INFORMATION SECTION ===
  Model Family:     SAMSUNG SpinPoint F1 DT series
  Device Model:     SAMSUNG HD103UJ
  [...]

  ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  [...]
  195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       20728126
  [...]
  199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       1
  200 Multi_Zone_Error_Rate   0x000a   100   100   000    Old_age   Always       -       0
  201 Soft_Read_Error_Rate    0x000a   100   100   000    Old_age   Always       -       0

Your drive looks fine. Attribute 195 isn't anything to worry about (vendor-specific encoding makes this number appear large). Attribute 199 indicates one CRC error, but again nothing to worry about -- though it could explain a single error during the lifetime of the drive (impossible to determine when it happened).

SMART Self-test log structure revision number 1
Num  Test_Description    Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        2361        -
# 2  Short offline       Completed without error       00%        2205        -
# 3  Short offline       Completed without error       00%        2138        -
# 4  Extended offline    Completed without error       00%        2109        -
# 5  Short offline       Completed without error       00%        2105        -
# 6  Short offline       Completed without error       00%        2092        -
# 7  Short offline       Completed without error       00%
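[For anyone reproducing this kind of check: the self-test log above is populated by asking the drive to test itself, then reading the results back. A typical sequence with standard smartctl usage, using the device name from this thread:]

  # kick off a short (couple of minutes) or long (full-surface) self-test
  smartctl -t short /dev/ad4
  smartctl -t long /dev/ad4

  # later, read back the self-test log and the attribute table
  smartctl -l selftest /dev/ad4
  smartctl -A /dev/ad4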
Re; NFS performance
It has been suggested that I move this thread to freebsd-stable. The thread so far (deficient NFS performance in FreeBSD 8):

http://lists.freebsd.org/pipermail/freebsd-hackers/2011-January/034006.html

I updated my kernel to FreeBSD 8.2-PRERELEASE. This improved my throughput, but still not to the level I got from 7.3-STABLE. Here's an updated table from my original message:

Observed bytes per second (dd if=filename of=/dev/null bs=65536):

  Destination machine:        Source machine:
                              mattapan   scollay   sullivan
  wonderland/7.3-STABLE         870K       5.2M      1.8M
  wonderland/8.1-STABLE         496K       690K      420K
  wonderland/8.2-PRERELEASE     800K       1.2M      447K

Furthermore, I was still able to induce the "NFS server not responding" message with 8.2-PRERELEASE. So I applied the patch from Rick Macklem. The throughput did not change, but I haven't seen the "NFS server not responding" message yet.

As to an earlier question about NFS options: I'm not setting any, so they are whatever the automounter uses by default.

-- George
Re: Specifying root mount options on diskless boot.
  Daniel Braniss writes:

    I have it pxebooting nicely and running with an NFS root, but it then
    reports locking problems: devd, syslogd, moused (and maybe others)
    lock their PID file to protect against multiple instances.
    Unfortunately, these daemons all start before statd/lockd and so the
    locking fails and reports "operation not supported".

  Are you mounting /var via nfs?

You can use the nolockd mount option to make locking happen locally in the client. (Only a problem if the file(s) being locked are concurrently shared with other clients.) I don't know if this would fix your diskless problem.

rick
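[As a concrete illustration of that suggestion -- nolockd is documented in mount_nfs(8), but the server path and mount point below are made up:]

  # one-off mount with local (client-side) locking
  mount -t nfs -o nolockd server:/export/var /var

  # or the equivalent /etc/fstab line
  server:/export/var  /var  nfs  rw,nolockd  0  0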
Re: Re; NFS performance
  It has been suggested that I move this thread to freebsd-stable. The
  thread so far (deficient NFS performance in FreeBSD 8):

  http://lists.freebsd.org/pipermail/freebsd-hackers/2011-January/034006.html

  I updated my kernel to FreeBSD 8.2-PRERELEASE. This improved my
  throughput, but still not to the level I got from 7.3-STABLE. Here's an
  updated table from my original message:

  Observed bytes per second (dd if=filename of=/dev/null bs=65536):

    Destination machine:        Source machine:
                                mattapan   scollay   sullivan
    wonderland/7.3-STABLE         870K       5.2M      1.8M
    wonderland/8.1-STABLE         496K       690K      420K
    wonderland/8.2-PRERELEASE     800K       1.2M      447K

  Furthermore, I was still able to induce the "NFS server not responding"
  message with 8.2-PRERELEASE. So I applied the patch from Rick Macklem.
  The throughput did not change, but I haven't seen the "NFS server not
  responding" message yet.

So, did the patch get rid of the 1min+ stalls you reported earlier?

Beyond that, all I can suggest is trying to fiddle with some of the options on the net device driver, such as rxcsum, txcsum and tso. (I think tso has had some issues for some drivers, but I don't know any specifics.) When I've seen poor NFS perf. it has usually been a problem at the network device driver level. (If you have a different kind of network card handy, you could try swapping them -- basically one with a different net chipset, so that it uses a different net device driver.)

rick
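[The driver-option fiddling Rick mentions is done with ifconfig. A sketch -- the interface name em0 is an assumption; substitute whatever your dmesg shows:]

  # see which offload features the NIC currently advertises
  ifconfig em0

  # disable TSO and checksum offload one at a time, retesting NFS after each
  ifconfig em0 -tso
  ifconfig em0 -txcsum
  ifconfig em0 -rxcsum

  # re-enable later with: ifconfig em0 tso txcsum rxcsum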
Re: Panic 8.2 PRERELEASE WRITE_DMA48
2011/1/9 Jeremy Chadwick free...@jdc.parodius.com:

  errno 6 is "device not configured". ad4 is on a Silicon Image controller
  (thankfully a reliable model). Sadly AHCI (ahci.ko) isn't in use here; I
  would advocate switching to it (your device names will change however)
  and see if these errors continue (they'll appear as SCSI CAM errors
  though). ahci_load="yes" in /boot/loader.conf should be enough.
  smartmontools does know to talk ATA to /dev/adaX (that's not a typo)
  disks.

Made that change, ahci is loaded:

[...@swanbsd /usr/home/tom]$ kldstat
Id Refs Address    Size     Name
 1   16 0xc040     bd9998   kernel
 2    1 0xc0fda000 88a8     snd_emu10k1.ko
 3    3 0xc0fe3000 579b0    sound.ko
 4    1 0xc103b000 4df90c   nvidia.ko
 5    1 0xc151b000 c108     ahci.ko

[...@swanbsd /usr/home/tom]$ cat /boot/loader.conf
snd_emu10k1_load="YES"
nvidia_load="YES"
ahci_load="YES"
#console="comconsole"

But the device naming is unchanged:

[...@swanbsd /usr/home/tom]$ ls /dev/a*
/dev/acd0   /dev/ad0s2  /dev/ad4s1   /dev/ad4s2e  /dev/apm0
/dev/acd1   /dev/ad0s5  /dev/ad4s2   /dev/ad4s2f  /dev/ata
/dev/acpi   /dev/ad0s6  /dev/ad4s2a  /dev/ad4s3   /dev/atkbd0
/dev/ad0    /dev/ad0s7  /dev/ad4s2b  /dev/ad4s4   /dev/audit
/dev/ad0s1  /dev/ad4    /dev/ad4s2d  /dev/ad4s5
Re: Panic 8.2 PRERELEASE WRITE_DMA48
On Sun, Jan 09, 2011 at 09:02:16PM +0100, Tom Vijlbrief wrote:

  Made that change, ahci is loaded [...] But the device naming is
  unchanged.

I'm sorry, I gave you incorrect advice; I'm used to Intel controllers with AHCI, not Silicon Image controllers. Silicon Image controllers have their own driver: siis(4). Please change ahci_load="yes" to siis_load="yes".
RE: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks
On 6 January 2011 22:26, Chris Forgeron cforge...@acsi.ca wrote: You know, these days I'm not as happy with SSD's for ZIL. I may blog about some of the speed results I've been getting over the last 6mo-1yr that I've been running them with ZFS. I think people should be using hardware RAM drives. You can get old Gigabyte i-RAM drives with 4 gig of memory for the cost of a 60 gig SSD, and it will trounce the SSD for speed. (I'm making an updated comment on my previous comment. Sorry for the topic drift, but I think this is important to consider) I decided to do some tests between my Gigabyte i-RAM and OCZ Vertex 2 SSD. I've found that they are both very similar for Random 4K-aligned Write speed (I was receiving around 17,000 IOPS on both, slightly faster ms access time for the i-RAM). Now, if you're talking 512b aligned writes (which is what ZFS is unless you've tweaked the ashift value) you're going to win with an i-RAM device. The OCZ Drops down to ~6000 IOPS for 512b random writes. Please note, that's on a used Vertex 2. A fresh Vertex 2 was giving me 28,000 IOPS on 4k aligned writes - Faster than the i-RAM. But with more time, it will be slower than the i-RAM due to SSD fade. I'm seriously considering trading in my ZIL SSD's for i-RAM devices, they are around the same price if you can still find them, and they won't degrade like an SSD does. ZIL doesn't need much storage space. I think 12 gig (3 I-RAM's) would do nicely, and would give me an aggregate IOPS close to a ddrdrive for under $500. I did some testing with SSD Fade recently, here's the link to my blog on it if anyone cares for more detail - http://christopher-technicalmusings.blogspot.com/2011/01/ssd-fade-its-real-and-why-you-may-not.html I'm still using SSDs for my ZIL, but I think I'll be switching over to some sort of RAM device shortly. I wish the i-RAM in 3.5 format had proper SATA power connectors on the back so it could plug into my SAS backplane like the OCZ 3.5 SSDs do. As it stands, I'd have to rig something, as my SAN head doesn't have any PCI controller slots for the other i-RAM format. -Original Message- From: owner-freebsd-sta...@freebsd.org [mailto:owner-freebsd-sta...@freebsd.org] On Behalf Of Markiyan Kushnir Sent: Friday, January 07, 2011 8:10 AM To: Jeremy Chadwick Cc: Chris Forgeron; freebsd-stable@freebsd.org; Artem Belevich; Jean-Yves Avenard Subject: Re: ZFS - moving from a zraid1 to zraid2 pool with 1.5tb disks 2011/1/7 Jeremy Chadwick free...@jdc.parodius.com: On Fri, Jan 07, 2011 at 12:29:17PM +1100, Jean-Yves Avenard wrote: On 6 January 2011 22:26, Chris Forgeron cforge...@acsi.ca wrote: You know, these days I'm not as happy with SSD's for ZIL. I may blog about some of the speed results I've been getting over the last 6mo-1yr that I've been running them with ZFS. I think people should be using hardware RAM drives. You can get old Gigabyte i-RAM drives with 4 gig of memory for the cost of a 60 gig SSD, and it will trounce the SSD for speed. I'd put your SSD to L2ARC (cache). Where do you find those though. I've looked and looked and all references I could find was that battery-powered RAM card that Sun used in their test setup, but it's not publicly available.. 
DDRdrive:
  http://www.ddrdrive.com/
  http://www.engadget.com/2009/05/05/ddrdrives-ram-based-ssd-is-snappy-costly/

ACard ANS-9010:
  http://techreport.com/articles.x/16255

GC-RAMDISK (i-RAM) products:
  http://us.test.giga-byte.com/Products/Storage/Default.aspx

Be aware that these products are absurdly expensive for what they offer
(the cost isn't justified), not to mention that in some cases a
bottleneck is imposed by the use of a SATA-150 interface. I'm also not
sure whether all of them offer BBU capability.

In some respects you might be better off just buying more RAM for your
system and making md(4) memory disks to be used as L2ARC (cache). I've
mentioned this in the past, specifically back in the days when the ARC
piece of ZFS on FreeBSD was causing havoc, when I asked whether one
could work around the complexity by using L2ARC on md(4) drives
instead.

> Once you have got the extra RAM, why not just reserve it directly for
> the ARC (via vm.kmem_size[_max] and vfs.zfs.arc_max)?
>
> Markiyan.

I tried this, but couldn't get rc.d/mdconfig2 to do what I wanted on
startup WRT the aforementioned.

--
| Jeremy Chadwick                                  j...@parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, USA |
| Making life hard for others since 1977.             PGP 4BD6C0CB  |
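Both suggestions can be sketched concretely; the sizes, pool name, and
tunable values below are illustrative, not recommendations:

  # A swap-backed md(4) memory disk attached as L2ARC:
  mdconfig -a -t swap -s 2g -u 0    # creates /dev/md0
  zpool add tank cache md0

  # Markiyan's alternative: hand the RAM straight to the ARC via
  # /boot/loader.conf tunables instead of going through L2ARC:
  vm.kmem_size="8G"
  vfs.zfs.arc_max="6G"

The second route is usually the saner one: an md-backed L2ARC costs a
copy of the data plus ARC-resident headers for every cached buffer,
whereas a larger ARC uses the same RAM with no indirection.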
Re: usb errors with 8 stable
It's looking more like a hardware failure. I connected the phone to a
friend's HP Pavilion dv8xxx running FreeBSD 8 r217175, and it was
detected and da devices were created. I've upgraded to r217175 and it
still doesn't work. When I get back to the lab, I'll test the hardware
and go from there.

Thanks for the suggestions; I'll post an update when I know more.

Beach Geek

On Jan 5, 2011 12:38 AM, Jeremy Chadwick free...@jdc.parodius.com wrote:
> On Tue, Jan 04, 2011 at 11:37:48PM -0600, Beach Geek wrote:
> > Compaq Presario 5xxx 2GHz FreeBSD 8 ...
>
> I would start by reviewing the commits to RELENG_8 between the two
> timeframes and try to narrow down which commit may have caused your
> problem.
>
> http://www.freshbsd.org/?branch=RELENG_8&project=freebsd

--
| Jeremy Chadwick                                  j...@parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, USA |
| Making life hard for others since 1977.             PGP 4BD6C0CB  |
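If the hardware checks out and it does come back to bisecting commits,
the same narrowing-down can be done against the svn history; a hedged
sketch (the date range and the sys/dev/usb path are assumptions, not
details from this thread):

  # List stable/8 commits to the USB subtree between the two
  # timeframes being compared:
  svn log -r '{2010-12-01}:{2011-01-05}' \
      svn://svn.freebsd.org/base/stable/8/sys/dev/usb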
Re: Panic 8.2 PRERELEASE WRITE_DMA48
2011/1/9 Jeremy Chadwick free...@jdc.parodius.com:
> I'm sorry, I gave you incorrect advice; I'm used to Intel controllers
> with AHCI, not Silicon Image controllers. Silicon Image controllers
> have their own driver: siis(4). Please change ahci_load=yes to
> siis_load=yes.

Tried it, but no change: according to its man page, the siis(4) driver
does not support the 3512, so I'm probably stuck with the default
driver.

I did the full disk scan (smartctl -t select,0-max /dev/ad4). It
completed overnight with no errors, so the disk is fine.
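For reference, the selective self-test and the commands to read back
its outcome (ad4 as in this thread):

  smartctl -t select,0-max /dev/ad4   # background scan of every LBA
  smartctl -l selftest /dev/ad4       # log of completed tests + results
  smartctl -l selective /dev/ad4      # span table / current progress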
Re: Panic 8.2 PRERELEASE WRITE_DMA48
2011/1/9 Jeremy Chadwick free...@jdc.parodius.com:
> Not to get off topic, but what is causing this? It looks like you have
> a cron job or something very aggressive doing a smartctl -t short
> /dev/ad4 or equivalent. If you have such, please disable this
> immediately. You shouldn't be doing SMART tests with such regularity;
> it accomplishes absolutely nothing, especially the short tests. Let
> the drive operate normally, otherwise run smartd and watch logs
> instead.

I have this default entry (from the author of that file) in smartd.conf
and enabled it on many machines over the years. Is it a bad practice?

  # First (primary) ATA/IDE hard disk. Monitor all attributes, enable
  # automatic online data collection, automatic Attribute autosave, and
  # start a short self-test every day between 2-3am, and a long self test
  # Saturdays between 3-4am.
  /dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)

Thanks for all your feedback.
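For anyone puzzling over that -s expression: the fields are T/MM/DD/d/HH
(test type, month, day of month, day of week with 6 = Saturday, hour),
and '.' is a wildcard. Decoded, with a FreeBSD-flavoured device name
substituted for the sample's Linux /dev/hda:

  # S/../.././02  -> short self-test every day during the 02:00 hour
  # L/../../6/03  -> long self-test every Saturday during the 03:00 hour
  /dev/ad0 -a -o on -S on -s (S/../.././02|L/../../6/03)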
Re: Panic 8.2 PRERELEASE WRITE_DMA48
On Mon, Jan 10, 2011 at 07:13:57AM +0100, Tom Vijlbrief wrote:
> 2011/1/9 Jeremy Chadwick free...@jdc.parodius.com:
> > Not to get off topic, but what is causing this? [...] Let the drive
> > operate normally, otherwise run smartd and watch logs instead.
>
> I have this default entry (from the author of that file) in
> smartd.conf and enabled it on many machines over the years. Is it a
> bad practice?
>
>   # First (primary) ATA/IDE hard disk. Monitor all attributes, enable
>   # automatic online data collection, automatic Attribute autosave, and
>   # start a short self-test every day between 2-3am, and a long self test
>   # Saturdays between 3-4am.
>   /dev/hda -a -o on -S on -s (S/../.././02|L/../../6/03)

I'll have to talk to Bruce Allen about that. Those entries in
smartd.conf are pretty old (meaning they've existed for a very long
time, and chances are Bruce hasn't gone back to revamp them or
reconsider the logic/justification behind them).

I'm an opponent of running SMART tests automatically, given what some
of them do to drives. It's important to remember that most SMART tests
can be done while the drive is in operation, and some of these tests
stress the drive, which could potentially cause timeouts or other I/O
anomalies (data loss is unlikely, but odd errors may occur; it all
depends on the firmware). This is especially important WRT long tests.

For example, on newer 2TB Western Digital Caviar Black drives, a long
test does something that I haven't heard (yes, heard) any other drive
do: it emits a noise that's almost identical to that of a head crash.
It could be scanning a very specific region of LBAs (possibly
out-of-range sectors, e.g. spares) repetitively, but it sounds nothing
like a selective LBA scan. Honestly, it does sound like a head crash.
Is this something you'd really want to be running every 7 days?

I've always advocated that people run smartd only if they want to
monitor attributes, which ultimately are the most important things to
keep an eye on anyway. It's even more important to know how to read
them. :-) 90% of drives out there update their attributes at set
intervals or when the SMART READ DATA command is encountered.

And honestly, I've never seen a SMART short test do anything useful on
any drive I've used (SATA or SCSI; WD, Seagate, Maxtor, Hitachi,
Fujitsu). Long tests are different in this regard. I'm fully aware
that the terms "short" and "long" are vague and don't really tell a
person what the drive is doing behind the scenes. Sadly, that's the
nature of SMART: the tests are defined on a per-vendor (or
per-disk-model!) basis. But as my second paragraph above implies, the
behaviour is not consistent.

So when people ask me "how do I monitor my disks reliably with SMART,
then?", I tell them to either do it by hand (which is what I do), or
run smartd(8) and keep an eye on their logs. This requires some tuning,
and familiarity with which attribute means what, again on a per-drive
or per-vendor basis. It's great that there's no actual standard for
these, isn't it? :-)

--
| Jeremy Chadwick                                  j...@parodius.com |
| Parodius Networking                      http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, USA |
| Making life hard for others since 1977.             PGP 4BD6C0CB  |
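A sketch of the "by hand" monitoring described above; the attribute IDs
listed are the commonly watched ones, though, as noted, their exact
meaning varies by vendor and model:

  smartctl -A /dev/ad4
  # Raw values worth watching:
  #   5   Reallocated_Sector_Ct   - sectors remapped to spares
  #   197 Current_Pending_Sector  - sectors awaiting reallocation
  #   198 Offline_Uncorrectable   - sectors that failed offline scans
  #   199 UDMA_CRC_Error_Count    - cabling/interface errors, not media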