[zfs-discuss] file system under heavy load, how to find out what the cause is?

2011-09-15 Thread Carsten Aulbert


Has anyone any idea what's going on here?

Cheers

carsten
-- 
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6
CaCert Assurer | Get free certificates from http://www.cacert.org/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Resilver/scrub times?

2010-12-20 Thread Carsten Aulbert
Hi

On Sunday 19 December 2010 11:12:32 Tobias Lauridsen wrote:
 sorry to bring the old one up, but I think it is better than making a new one?
 Does anyone have some resilver times from a raidz1/2 pool
 with 5TB+ of data on it?

If you look into the discussion of the past day (or week), you will see that 
the resilver time depends on the amount of writes hitting the system while it 
resilvers. On an idle system you might be able to guesstimate it from the disk 
size and the IOPS the disk and the system can sustain; usually a couple of 
hours should be about right.
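
As a back-of-the-envelope illustration (the numbers below are assumptions for 
the sake of the example, not measurements):

# ~500 GB to rebuild, mostly sequential at ~80 MB/s  -> roughly 1.7 hours
# ~500 GB dominated by random I/O, 100 IOPS x 64 KB  -> ~6.4 MB/s -> roughly 22 hours
# The real figure sits somewhere in between and grows with concurrent writes.
zpool status -v <pool>    # shows the current estimate and percentage done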

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Carsten Aulbert
Hi all

one of our system just developed something remotely similar:


s06:~# zpool status
  pool: atlashome
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 67h18m, 100.00% done, 0h0m to go
config:

NAME  STATE READ WRITE CKSUM
atlashome DEGRADED 0 0 0
  raidz2-0DEGRADED 0 0 0
c0t0d0ONLINE   0 0 0
c1t0d0ONLINE   0 0 0
c5t0d0ONLINE   0 0 0
replacing-3   DEGRADED 0 0 0
  c7t0d0s0/o  FAULTED  0 0 0  corrupted data
  c7t0d0  ONLINE   0 0 0  678G resilvered

[...]

It's been at 100% done for more than a day now; the system is running fully 
patched Solaris 10 (patch level from September 10th or 13th, I believe).

Does anyone have an idea how it is possible to resilver 678G of data on a 500G 
drive?

s06:~# iostat -En c7t0d0
c7t0d0   Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 
Vendor: ATA  Product: HITACHI HDS7250S Revision: AV0A Serial No:  
Size: 500.11GB 500107861504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 
Illegal Request: 197 Predictive Failure Analysis: 0 

I still have to upgrade the zpool version, but wanted to wait for the resilver 
to complete first.

Any ideas?

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] resilver that never finishes

2010-09-18 Thread Carsten Aulbert
Hi

On Saturday 18 September 2010 10:02:42 Ian Collins wrote:
 
 I see this all the time on a troublesome Thumper.  I believe this
 happens because the data in the pool is continuously changing.

Ah OK, that may be it; there is one particularly active user on this box right now.

Interesting, I've never seen this in the past.

Is there really an end to this, and do I just have to wait?

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Suggested RaidZ configuration...

2010-09-06 Thread Carsten Aulbert
Hi

On Monday 06 September 2010 17:53:44 hatish wrote:
 Im setting up a server with 20x1TB disks. Initially I had thought to setup
 the disks using 2 RaidZ2 groups of 10 discs. However, I have just read the
 Best Practices guide, and it says your group shouldnt have  9 disks. So
 Im thinking a better configuration would be 2 x 7disk RaidZ2 + 1 x 6disk
 RaidZ2. However its 14TB worth of data instead of 16TB.
 
 What are your suggestions and experiences?

Another point is that all vdevs in one pool should be equal, i.e. not mixed like 
2x7 and 1x6 (you will most likely need to force such a configuration anyway).

First, I'd assess what you want/expect from this file system in the end: 
maximum performance, maximum reliability or maximum size - as always, pick two 
;)
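
To make the equal-vdev point concrete: if you decide the 9-disk guideline is 
not a hard limit, the original 2x10 idea would look roughly like this (a hedged 
sketch - the controller/target names are made up and will differ on your box):

zpool create tank \
  raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 c0t8d0 c0t9d0 \
  raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 c1t8d0 c1t9d0

A 2x7 + 1x6 mix would be built the same way, but zpool should complain about 
the mismatched vdev widths and require -f, which is the forcing I mentioned 
above.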

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS development moving behind closed doors

2010-08-16 Thread Carsten Aulbert
On Sunday 15 August 2010 11:56:22 Joerg Moellenkamp wrote:
 And by the way: Wasn't there a
 comment by Linus Torvalds recently that people should move their
 low-quality code into the codebase??? ;)

Yeah, that code should be put into the staging part of the codebase, so 
that (more) people can work on it and improve insufficient-quality code with a 
great idea behind it until it meets the quality bar of the mainline kernel.

As you rightly pointed out, this is a development model which works nicely 
with open source in an open environment where developers are spread all around 
the globe and have widely varying programming skills. I don't think that 
something like this would work in a (possibly much smaller) corporate 
environment/software engineering group.

That said, I think it's actually a very good thing to have this opportunity 
to push low-quality/non-conforming software into a controlled environment for 
polishing.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Find out which of many FS from a zpool is busy?

2010-04-22 Thread Carsten Aulbert
Hi all,

sorry if this is in any FAQ - then I've clearly missed it.

Is there an easy, or at least straightforward, way to determine which of n ZFS 
file systems is currently under heavy NFS load?

Once upon a time, when one had old-style file systems and exported each of them 
as a whole, iostat -x came in handy; with zpools, however, this is not the case 
anymore, right?

Imagine

zpool create tank ... (many devices here)
zfs set sharenfs=on tank
zfs create tank/a
zfs create tank/b
zfs create tank/c
[...]
zfs create tank/z

Now you have this lovely number of ZFS file systems, but how do you find out 
which user is currently (ab)using the system the most?

Cheers
Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Find out which of many FS from a zpool is busy?

2010-04-22 Thread Carsten Aulbert
Hi

On Thursday 22 April 2010 16:33:51 Peter Tribble wrote:
 fsstat?
 
 Typically along the lines of
 
 fsstat /tank/* 1
 

Sh**, I knew about fsstat but never even tried to run it on many file 
systems at once. D'oh.

*sigh* Well, at least a good one for the archives...
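
For the archives, then, a couple of hedged invocations (the mount points are 
from the example above; exact option support may vary by release):

fsstat /tank/a /tank/b /tank/c 1   # one line per file system, every second
fsstat zfs 5                       # all mounted ZFS file systems combined, every 5 s
fsstat -i /tank/* 5                # focus on read/write (I/O) activity

From there it is usually obvious which file system - and hence which user - is 
generating the load.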

Thanks a lot!

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS-8000-8A: Able to go back to normal without destroying whole pool?

2010-04-11 Thread Carsten Aulbert
Hi all,

on Friday night two disks in one raidz2 vdev decided to die within a couple of 
minutes. Swapping drives and resilvering one at a time worked quite OK; 
however, now I'm faced with a nasty problem:

s07:~# zpool status -v
  pool: atlashome  
 state: ONLINE 
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A  
 scrub: resilver completed with 1 errors on Sat Apr 10 10:23:14 2010

[...]
errors: Permanent errors have been detected in the following files:

atlashome/BACKUP/userA:0x962de4

The web page is pretty generic (IMHO). I would like to restore this file from 
backup (or actually from its origin) or simply unlink it permanently. But how do 
I find this blob, and how do I fix it without rebuilding the full pool (is it 
really a pool issue, or only a file system issue)?
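
A hedged pointer for finding the blob: the 0x962de4 part should be the object 
number inside that dataset, and zdb can usually map it back to a path (output 
format varies between releases):

zdb -ddddd atlashome/BACKUP/userA 0x962de4   # look for the 'path' line in the dump

Once the path is known, removing or restoring that one file and re-running a 
scrub should clear the error without touching the rest of the pool.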

TIA for any advice

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on a 11TB HW RAID-5 controller

2010-03-24 Thread Carsten Aulbert
Hi

On Wednesday 24 March 2010 17:01:31 Dusan Radovanovic wrote:

  connected to P212 controller in RAID-5. Could someone direct me or suggest
  what I am doing wrong. Any help is greatly appreciated.
 

I don't know what went wrong, but I would work around it like this:

My suggestion would be to configure the HW RAID controller to act as a dumb 
JBOD controller and thus make the 12 disks visible to the OS.

Then you can start playing around with ZFS on these disks, e.g. creating 
different pools:

zpool create testpool raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
  raidz c0t6d0 c0t7d0 c0t8d0 c0t9d0 c0t10d0 c0t11d0

(Caveat: this is off the top of my head and might be very wrong.) This would 
create something like a RAID50.

Then I would start reading, reading and testing and testing :)

HTH

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thoughts on ZFS Pool Backup Strategies

2010-03-18 Thread Carsten Aulbert
Hi all

On Thursday 18 March 2010 13:54:52 Joerg Schilling wrote:
 If you have no technical issues to discuss, please stop insulting
 people/products.
 
 We are on OpenSolaris and we don't like this kind of discussions on the
  mailing lists. Please act collaborative.
 

May I suggest this to both of you.


 It has been widely discussed here already that the output of zfs send
  cannot be used as a backup.

That depends on the exact definition of backup; if I may take this from 
Wikipedia: 

"In information technology, a backup or the process of backing up refers to 
making copies of data so that these additional copies may be used to restore 
the original after a data loss event."

In this regard zfs send *could* be a backup tool, provided you have the means 
of deciphering the blob coming out of it. OTOH, if I used zfs send together 
with zfs receive to replicate data to another machine/location and put the 
label 'backup' onto the receiver, this would also count as a backup from which 
you can restore everything, or parts of it.
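
A minimal sketch of what I mean (host and dataset names are made up):

zfs snapshot tank/data@2010-03-18
zfs send tank/data@2010-03-18 | ssh backuphost zfs receive backuppool/data

# and later, incrementally against the previous snapshot:
zfs snapshot tank/data@2010-03-19
zfs send -i tank/data@2010-03-18 tank/data@2010-03-19 | \
    ssh backuphost zfs receive backuppool/data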

In the case of 'star', the blob coming out of it might also be useless if you 
don't have star (or another tool) around for deciphering it - very unlikely, but 
still possible ;)

Of course your (plural!) definitions of backup may vary, so I would propose 
settling on one first before exchanging blows...

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs/sol10u8 less stable than in sol10u5?

2010-02-04 Thread Carsten Aulbert
Hi all,

it might not be a ZFS issue (and thus on the wrong list), but maybe there's 
someone here who might be able to give us a good hint:

We are operating 13 X4500s and have started to play with non-Sun-blessed SSDs in 
them. As we were running Solaris 10u5 before and wanted to use the SSDs as log 
devices, we upgraded to the latest and greatest 10u8 and changed the zpool 
layout[1]. However, on the first machine we found many, many problems with 
various disks failing in different vdevs (I wrote about this on this list in 
December, IIRC).

After going through this with Sun, they gave us hints but mostly blamed (maybe 
rightfully) the Intel X25-E in there; we considered the 3.5" to 2.5" converter 
to be at fault as well. So for the next test we placed the SSD into the tray 
without a conversion unit, but that box (a different one) failed with the same 
problems.

Now, we learned from this experience and did the same to yet another box, but 
without the SSD, i.e. jumpstarted the box, installed 10u8, redid the zpool 
and started to fill in data. In today's scrub this suddenly happened:

s09:~# zpool status   
  pool: atlashome 
 state: DEGRADED  
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors  
using 'zpool clear' or replace the device with 'zpool replace'. 
   see: http://www.sun.com/msg/ZFS-8000-9P  
 scrub: resilver in progress for 0h9m, 3.89% done, 4h2m to go   
config: 

NAME  STATE READ WRITE CKSUM
atlashome DEGRADED 0 0 0
  raidz1  ONLINE   0 0 0
c0t0d0ONLINE   0 0 0
c1t0d0ONLINE   0 0 0
c4t0d0ONLINE   0 0 0
c6t0d0ONLINE   0 0 0
c7t0d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c0t1d0ONLINE   0 0 0
c1t1d0ONLINE   0 0 0
c4t1d0ONLINE   0 0 0
c5t1d0ONLINE   0 0 0
c6t1d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c7t1d0ONLINE   0 0 1
c0t2d0ONLINE   0 0 0
c1t2d0ONLINE   0 0 2
c4t2d0ONLINE   0 0 0
c5t2d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c6t2d0ONLINE   0 0 0
c7t2d0ONLINE   0 0 0
c0t3d0ONLINE   0 0 0
c1t3d0ONLINE   0 0 0
c4t3d0ONLINE   0 0 0
  raidz1  DEGRADED 0 0 0
c5t3d0ONLINE   0 0 0
c6t3d0ONLINE   0 0 0
c7t3d0ONLINE   0 0 0
c1t4d0ONLINE   0 0 1
spare DEGRADED 0 0 0
  c4t4d0  DEGRADED 5 011  too many errors
  c0t4d0  ONLINE   0 0 0  5.38G resilvered
  raidz1  ONLINE   0 0 0
c5t4d0ONLINE   0 0 0
c6t4d0ONLINE   0 0 0
c7t4d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 0
c1t5d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c4t5d0ONLINE   0 0 0
c5t5d0ONLINE   0 0 0
c6t5d0ONLINE   0 0 0
c7t5d0ONLINE   0 0 0
c0t6d0ONLINE   0 0 1
  raidz1  ONLINE   0 0 0
c1t6d0ONLINE   0 0 0
c4t6d0ONLINE   0 0 0
c5t6d0ONLINE   0 0 0
c6t6d0ONLINE   0 0 0
c7t6d0ONLINE   0 0 1
  raidz1  ONLINE   0 0 0
c0t7d0ONLINE   0 0 0
c1t7d0ONLINE   0 0 0
c4t7d0ONLINE   0 0 0
c5t7d0ONLINE   0 0 0
c6t7d0ONLINE   0 0 0
spares
  c0t4d0  INUSE currently in use
  c7t7d0  AVAIL


Also similar to the other hosts is the much, much higher soft/hard error 
count in iostat:

s09:~# iostat -En|grep Soft
c2t0d0   Soft Errors: 1 Hard Errors: 2 Transport 

Re: [zfs-discuss] How to grow ZFS on growing pool?

2010-02-02 Thread Carsten Aulbert
Hi Jörg,

On Tuesday 02 February 2010 16:40:50 Joerg Schilling wrote:
 After that, the zpool did notice that there is more space:
 
 zpool list
 NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
 test   476M  1,28M   475M 0%  ONLINE  -
 

That's the size it already had right after the initial creation; after 
exporting and importing it again I get:

# zpool list
NAMESIZE   USED  AVAILCAP  HEALTH  ALTROOT
test976M   252K   976M 0%  ONLINE  -

 the ZFS however did not grow:
 
 zfs list
 NAME USED  AVAIL  REFER  MOUNTPOINT
 test 728K   251M   297K  /test
 

# zfs list test
NAME   USED  AVAIL  REFER  MOUNTPOINT
test   139K   549M  37.5K  /test


I think you fell into the trap that zpool just adds up all rows; this is 
especially visible on a Thumper under heavy load, where the read and write 
operations per time slice for each vdev seem to be just the sums of the 
individual devices underneath.

But this still does not explain why the pool is larger after exporting and 
reimporting.
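
For the archives, the usual way to make the pool pick up the extra space is 
something like this (a hedged sketch; pool/device names are made up, and the 
autoexpand property and 'zpool online -e' only exist on sufficiently new 
zpool/OS versions):

zpool export test
zpool import test        # re-reads the labels and can pick up the grown device
zpool list test

# on newer bits, without the export/import dance:
zpool set autoexpand=on test
zpool online -e test c1t0d0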

Cheers

Carsten

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-21 Thread Carsten Aulbert
On Thursday 21 January 2010 10:29:16 Edward Ned Harvey wrote:
  zpool create -f testpool mirror c0t0d0 c1t0d0 mirror c4t0d0 c6t0d0
   mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0 mirror c6t1d0 c7t1d0
  mirror c0t2d0 c1t2d0
   mirror c4t2d0 c5t2d0 mirror c6t2d0 c7t2d0 mirror c0t3d0 c1t3d0
  mirror c4t3d0 c5t3d0
   mirror c6t3d0 c7t3d0 mirror c0t4d0 c1t4d0 mirror c4t4d0 c6t4d0
  mirror c0t5d0 c1t5d0
   mirror c4t5d0 c5t5d0 mirror c6t5d0 c7t5d0 mirror c0t6d0 c1t6d0
  mirror c4t6d0 c5t6d0
   mirror c6t6d0 c7t6d0 mirror c0t7d0 c1t7d0 mirror c4t7d0 c5t7d0
  mirror c6t7d0 c7t7d0
   mirror c7t0d0 c7t4d0
 
 This looks good.  But you probably want to stick a spare in there, and
  add a SSD disk specified by log

May I jump in here and ask how people are using SSDs reliably in an X4500? So 
far we have had very little success with X25-E drives and a converter from 3.5 
to 2.5 inches; two systems have shown pretty bad instabilities with that setup.

Anyone with a success here?
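
For reference, the spare and log device suggested above would be added roughly 
like this (a hedged sketch; device names are made up):

zpool add testpool spare c5t0d0   # hot spare
zpool add testpool log c7t4d0     # separate ZIL (slog) device, e.g. the SSD
zpool status testpool             # spare shows up under 'spares', the SSD under 'logs'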

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-21 Thread Carsten Aulbert
Hi

On Friday 22 January 2010 07:04:06 Brad wrote:
 Did you buy the SSDs directly from Sun?  I've heard there could possibly be
  firmware that's vendor specific for the X25-E.

No.

So far I've heard that they are not readily available, as certification 
procedures are still underway (apart from that, the 8850 firmware should be OK, 
but that's just what I've heard).

C
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help needed to find out where the problem is

2009-11-27 Thread Carsten Aulbert
Hi all,

On Thursday 26 November 2009 17:38:42 Cindy Swearingen wrote:
 Did anything about this configuration change before the checksum errors
 occurred?
 

No, this machine has been running in this configuration for a couple of weeks now.

 The errors on c1t6d0 are severe enough that your spare kicked in.
 
Yes, and overnight more spares would have kicked in, had they been available:

s13:~# zpool status 
  pool: atlashome   
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors  
using 'zpool clear' or replace the device with 'zpool replace'. 
   see: http://www.sun.com/msg/ZFS-8000-9P  
 scrub: resilver completed after 5h46m with 0 errors on Thu Nov 26 15:55:22 
2009
config: 

NAME  STATE READ WRITE CKSUM
atlashome DEGRADED 0 0 0
  raidz1  ONLINE   0 0 0
c0t0d0ONLINE   0 0 0
c1t0d0ONLINE   0 0 0
c5t0d0ONLINE   0 0 0
c7t0d0ONLINE   0 0 0
c8t0d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c0t1d0ONLINE   0 0 0
c1t1d0ONLINE   0 0 0
c5t1d0ONLINE   0 0 1
c6t1d0ONLINE   0 0 6
c7t1d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c8t1d0ONLINE   0 0 0
c0t2d0ONLINE   0 0 0
c1t2d0ONLINE   0 0 0
c5t2d0ONLINE   0 0 3
c6t2d0ONLINE   0 0 1
  raidz1  ONLINE   0 0 0
c7t2d0ONLINE   0 0 0
c8t2d0ONLINE   0 0 1
c0t3d0ONLINE   0 0 0
c1t3d0ONLINE   0 0 0
c5t3d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c6t3d0ONLINE   0 0 0
c7t3d0ONLINE   0 0 0
c8t3d0ONLINE   0 0 0
c0t4d0ONLINE   0 0 0
c1t4d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c5t4d0ONLINE   0 0 0
c7t4d0ONLINE   0 0 0
c8t4d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 1
c1t5d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c5t5d0ONLINE   0 0 0
c6t5d0ONLINE   0 0 0
c7t5d0ONLINE   0 0 0
c8t5d0ONLINE   0 0 1
c0t6d0ONLINE   0 0 0
  raidz1  DEGRADED 0 0 0
spare DEGRADED 0 0 0
  c1t6d0  DEGRADED 6 017  too many errors
  c8t7d0  ONLINE   0 0 0  130G resilvered
c5t6d0ONLINE   0 0 0
c6t6d0DEGRADED 0 041  too many errors
c7t6d0DEGRADED 1 014  too many errors
c8t6d0ONLINE   0 0 1
  raidz1  ONLINE   0 0 0
c0t7d0ONLINE   0 0 0
c1t7d0ONLINE   0 0 1
c5t7d0ONLINE   0 0 0
c6t7d0ONLINE   0 0 0
c7t7d0ONLINE   0 0 0
logs
  c6t4d0  ONLINE   0 0 0
spares
  c8t7d0  INUSE currently in use

errors: No known data errors
 You can use the fmdump -eV command to review the disk errors that FMA has
 detected. This command can generate a lot of output but you can see if
 the checksum errors on the disks are transient or if they occur repeatedly.
 

Hmm, the output does not seem to stop. After the file reached about 1.3 GB I 
stopped it. There seem to be a few different event types in there:
Nov 04 2009 15:54:08.039456458 ereport.fs.zfs.checksum
nvlist version: 0
class = ereport.fs.zfs.checksum
ena = 0x403c56a7d4a1
detector = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0xea7c0de1586275c7
vdev = 0xfca535aa8bbc70d1
(end detector)

pool = atlashome
pool_guid = 0xea7c0de1586275c7
pool_context = 0
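
(For the archives: when the full -eV output is this unwieldy, a one-line-per-event 
view is usually enough to see whether the checksum errors cluster on particular 
devices or times; a hedged sketch assuming standard awk/sort/uniq are available:)

fmdump -e | tail -100                                  # recent events, one line each
fmdump -e | awk 'NR>1 {print $NF}' | sort | uniq -c    # rough count per error class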

Re: [zfs-discuss] Help needed to find out where the problem is

2009-11-27 Thread Carsten Aulbert
Hi Bob

On Friday 27 November 2009 17:19:22 Bob Friesenhahn wrote:
 
 It is interesting that in addition to being in the same vdev, the
 disks encountering serious problems are all target 6.  Besides
 something at the zfs level, there could be some some issue at the
 device driver, or underlying hardware level.  Or maybe just bad luck.
 
 As I recall, Albert Chin-A-Young posted about a pool failure where
 many devices in the same raidz2 vdev spontaneously failed somehow (in
 his case the whole pool was lost).  He is using different hardware but
 this looks somewhat similar.

It looks quite similar to this one:

http://www.mail-archive.com/storage-disc...@opensolaris.org/msg06125.html

We swapped the drive, resilvering is almost through, and the vdev is showing 
a large number of errors:

 raidz1DEGRADED 0 0 1
spare   DEGRADED 0 0 8.81M
  replacing DEGRADED 0 0 0
c1t6d0s0/o  FAULTED  6 017  corrupted data
c1t6d0  ONLINE   0 0 0  120G resilvered
  c8t7d0ONLINE   0 0 0  120G resilvered
c5t6d0  ONLINE   0 0 0
c6t6d0  DEGRADED 0 041  too many errors
c7t6d0  DEGRADED 1 014  too many errors
c8t6d0  ONLINE   0 0 1


If having all sixes is a problem, maybe we should try a diagonal approach 
next time (or solve the n-queens problem on a rectangular Thumper 
layout)...

I guess after resilvering the next steps will be zpool clear and a new scrub, 
but I fear that will show errors again.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Help needed to find out where the problem is

2009-11-27 Thread Carsten Aulbert
Hi Ross,

On Friday 27 November 2009 21:31:52 Ross Walker wrote:
 I would plan downtime to physically inspect the cabling.

There is not much cabling, as the disks are directly connected to a large 
backplane (Sun Fire X4500).

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Help needed to find out where the problem is

2009-11-26 Thread Carsten Aulbert
Hi all,

on an X4500 with a relatively well-patched Sol10u8:

# uname -a
SunOS s13 5.10 Generic_141445-09 i86pc i386 i86pc

I started a scrub after about 2 weeks of operation and am seeing a lot of 
checksum errors:

s13:~# zpool status 
  pool: atlashome   
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors  
using 'zpool clear' or replace the device with 'zpool replace'. 
   see: http://www.sun.com/msg/ZFS-8000-9P  
 scrub: resilver in progress for 1h17m, 8.96% done, 13h5m to go 
config: 

NAME  STATE READ WRITE CKSUM
atlashome DEGRADED 0 0 0
  raidz1  ONLINE   0 0 0
c0t0d0ONLINE   0 0 0
c1t0d0ONLINE   0 0 0
c5t0d0ONLINE   0 0 0
c7t0d0ONLINE   0 0 0
c8t0d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c0t1d0ONLINE   0 0 0
c1t1d0ONLINE   0 0 0
c5t1d0ONLINE   0 0 0
c6t1d0ONLINE   0 0 6
c7t1d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c8t1d0ONLINE   0 0 0
c0t2d0ONLINE   0 0 0
c1t2d0ONLINE   0 0 0
c5t2d0ONLINE   0 0 2
c6t2d0ONLINE   0 0 1
  raidz1  ONLINE   0 0 0
c7t2d0ONLINE   0 0 0
c8t2d0ONLINE   0 0 0
c0t3d0ONLINE   0 0 0
c1t3d0ONLINE   0 0 0
c5t3d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c6t3d0ONLINE   0 0 0
c7t3d0ONLINE   0 0 0
c8t3d0ONLINE   0 0 0
c0t4d0ONLINE   0 0 0
c1t4d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c5t4d0ONLINE   0 0 0
c7t4d0ONLINE   0 0 0
c8t4d0ONLINE   0 0 0
c0t5d0ONLINE   0 0 1
c1t5d0ONLINE   0 0 0
  raidz1  ONLINE   0 0 0
c5t5d0ONLINE   0 0 0
c6t5d0ONLINE   0 0 0
c7t5d0ONLINE   0 0 0
c8t5d0ONLINE   0 0 1
c0t6d0ONLINE   0 0 0
  raidz1  DEGRADED 0 0 0
spare DEGRADED 0 0 0
  c1t6d0  DEGRADED 6 017  too many errors
  c8t7d0  ONLINE   0 0 0  11.8G resilvered
c5t6d0ONLINE   0 0 0
c6t6d0ONLINE   0 0 0
c7t6d0ONLINE   0 0 1
c8t6d0ONLINE   0 0 1
  raidz1  ONLINE   0 0 0
c0t7d0ONLINE   0 0 0
c1t7d0ONLINE   0 0 1
c5t7d0ONLINE   0 0 0
c6t7d0ONLINE   0 0 0
c7t7d0ONLINE   0 0 0
logs
  c6t4d0  ONLINE   0 0 0
spares
  c8t7d0  INUSE currently in use


So far it seems that the pool survived, but I'm a bit worried about how to track 
down the cause of this.

Any suggestions on how to proceed?

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to discover disks?

2009-07-06 Thread Carsten Aulbert
Hi

Hua-Ying Ling wrote:
 How do I discover the disk name to use for zfs commands such as:
 c3d0s0?  I tried using format command but it only gave me the first 4
 letters: c3d1.  Also why do some command accept only 4 letter disk
 names and others require 6 letters?


Usually I find

cfgadm -a

helpful enough for that (maybe adding '| grep disk' to it).

Why sometimes 4 and sometimes 6 characters:

c3d1 - this would be disk #1 on controller #3
c3d0s0 - this would be slice #0 (a partition) on disk #0 on controller #3

Usually there is also a t# (target) in there, e.g.:

cfgadm -a|grep disk |head
sata0/0::dsk/c0t0d0disk connectedconfigured   ok
sata0/1::dsk/c0t1d0disk connectedconfigured   ok
sata0/2::dsk/c0t2d0disk connectedconfigured   ok
sata0/3::dsk/c0t3d0disk connectedconfigured   ok
sata0/4::dsk/c0t4d0disk connectedconfigured   ok
sata0/5::dsk/c0t5d0disk connectedconfigured   ok
sata0/6::dsk/c0t6d0disk connectedconfigured   ok
sata0/7::dsk/c0t7d0disk connectedconfigured   ok
sata1/0::dsk/c1t0d0disk connectedconfigured   ok
sata1/1::dsk/c1t1d0disk connectedconfigured   ok


HTH

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import: Cannot mount,

2009-06-29 Thread Carsten Aulbert
Hi

A small addendum: it seems that all sub-file-systems below /atlashome/BACKUP are
already mounted by the time /atlashome/BACKUP itself is tried to be mounted:

# zfs get all atlashome/BACKUP|head -15
NAME  PROPERTY   VALUE  SOURCE
atlashome/BACKUP  type   filesystem -
atlashome/BACKUP  creation   Thu Oct  9 16:30 2008  -
atlashome/BACKUP  used   9.95T  -
atlashome/BACKUP  available  1.78T  -
atlashome/BACKUP  referenced 172K   -
atlashome/BACKUP  compressratio  1.47x  -
atlashome/BACKUP  mountedno -
atlashome/BACKUP  quota  none   default
atlashome/BACKUP  reservationnone   default
atlashome/BACKUP  recordsize 32Kinherited from
atlashome
atlashome/BACKUP  mountpoint /atlashome/BACKUP  default
atlashome/BACKUP  sharenfs   on inherited from
atlashome
atlashome/BACKUP  checksum   on default
atlashome/BACKUP  compressionon local

while
# ls -l /atlashome/BACKUP | wc -l
  33


Is there any way to force zpool import to re-order that? I could delete
all the stuff under BACKUP; however, given its size, I'd really rather not.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import: Cannot mount,

2009-06-29 Thread Carsten Aulbert
Hi

Mark J Musante wrote:
 
 Do a zpool export first, and then check to see what's in /atlashome.  My
 bet is that the BACKUP directory is still there.  If so, do an rmdir on
 /atlashome/BACKUP and then try the import again.

Sorry, I meant to copy this earlier:

s11 console login: root
Password:
Last login: Mon Jun 29 10:37:47 on console
Sun Microsystems Inc.   SunOS 5.10  Generic January 2005
s11:~# zpool export atlashome
s11:~# ls -l /atlashome
/atlashome: No such file or directory
s11:~# zpool import atlashome
cannot mount '/atlashome/BACKUP': directory is not empty
s11:~# ls -l /atlashome/BACKUP/|wc -l
  33
s11:~#

Thus you see that zpool import probably does the wrong thing(TM) (or
mounts in the wrong order).

Any idea?

Cheers

Carsten

PS: I opened a case for this but am still waiting for the call back. Once the
problem is solved, I can post the case ID for further reference.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool import: Cannot mount,

2009-06-29 Thread Carsten Aulbert
Hi Mark,

Mark J Musante wrote:
 
 OK, looks like you're running into CR 6827199.
 
 There's a workaround for that as well.  After the zpool import, manually
 zfs umount all the datasets under /atlashome/BACKUP.  Once you've done
 that, the BACKUP directory will still be there.  Manually mount the
 dataset that corresponds to /atlashome/BACKUP, and then try 'zfs mount -a'.

I did that (I needed to rmdir the directories under BACKUP) and then it
finally worked - and, best of all, even after a reboot it was able to
mount all file systems again.
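
For the archives, the workaround boiled down to something like this (a sketch; 
the child dataset name is a placeholder for whatever lives under BACKUP):

zpool import atlashome                 # children mount, the BACKUP parent fails
zfs umount atlashome/BACKUP/userA      # repeat for every dataset under BACKUP
rmdir /atlashome/BACKUP/*              # drop the leftover empty mount points
zfs mount atlashome/BACKUP             # now the parent mounts cleanly
zfs mount -a                           # and the rest follows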

Great and a lot of thanks!

One question:

Where can I find more about CR 6827199? I logged into sun.com with my
service-contract-enabled log-in, but I cannot find it there (or the
search function does not like me very much).

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How do I mirror zfs rpool, x4500?

2009-03-18 Thread Carsten Aulbert
Hi Tim,

Tim wrote:
 
 How does any of that affect an x4500 with onboard controllers that can't
 ever be moved?

Well, consider one box being installed from CD (external USB-CD) and
another one which is jumpstarted via the network. The results usually
are two different boot device names :(

Q: Is there an easy way to reset this without breaking everything?

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread Carsten Aulbert
Hi all,

I was just reading
http://blogs.sun.com/dap/entry/zfs_compression

and would like to know what people's experience is with enabling
compression in ZFS.

In principle I don't think it's a bad thing, especially when the
CPUs are fast enough that it may even improve performance, since the hard
drives might otherwise be the bottleneck. However, I'm missing two aspects:

o What happens when a user opens a file and does a lot of seeking
inside it? For example, our scientists use a data format where
quite compressible data is contained in stretches and the file header
contains a dictionary of where each stretch of data starts. If these files
are compressed on disk, what will happen with ZFS? Will it just make
educated guesses, or does it have to read all of the typically 30-150 MB
of the file and then do the seeking from buffer caches?

o Another problem I see (but which probably isn't one): a user is accessing a
file via an NFS-exported ZFS, appending a line of text and closing the file
(and hopefully also flushing everything correctly). Then the user opens it
again, appends another line of text, ... Imagine this happening
a few times per second. How will ZFS react to this pattern? Will it only
open the final record of the file, uncompress it, add the data,
recompress it, flush it to disk and report that back to the user's
processes? Is there a potential problem here?
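
For completeness, the knobs these questions touch can be inspected and changed 
per file system; a minimal sketch with made-up dataset names:

zfs get compression,compressratio,recordsize tank/data
zfs set compression=on tank/data    # only newly written blocks get compressed
zfs set recordsize=32k tank/data    # optional: match the record size to the access pattern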

Cheers (and sorry if these questions are stupid ones)

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread Carsten Aulbert
Hi Richard,

Richard Elling wrote:
 
 Files are not compressed in ZFS.  Blocks are compressed.

Sorry, yes, I was not specific enough.

 
 If the compression of the blocks cannot gain more than 12.5% space savings,
 then the block will not be compressed.  If your file contains
 compressable parts
 and uncompressable parts, then (depending on the size/blocks) it may be
 partially compressed.


I guess the block size is related (or equal) to the record size set for
this file system, right?

What will happen then if I have a file which contains a header that
fits into 1 or 2 blocks, followed by stretches of data which are,
say, 500 kB each (for simplicity), and which could be visualized as sitting in
a rectangle with M rows and N columns? Since the file system has no way
of knowing details about the file, it will cut the file into blocks and
store them compressed or uncompressed, as you have written. However, what
happens if the typical usage pattern is to read only columns of the
rectangle, i.e. read the header, seek to the start of stretch #1, then
seek to stretch #N+1, ...?

Can ZFS make educated guesses where the seek targets might be, or will it
read the file block by block until it reaches the target position? In
the latter case it might be quite inefficient if the file is huge and
has a large variance in compressibility.

 
 The file will be cached in RAM. When the file is closed and synced, the
 data
 will be written to the ZIL and ultimately to the data set.  I don't
 think there
 is a fundamental problem here... you should notice the NFS sync behaviour
 whether the backing store is ZFS or some other file system. Using a slog
 or nonvolatile write cache will help performance for such workloads.


Thanks, that's the answer I was hoping for :)

 They are good questions :-)

Good :)

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] seeking in ZFS when data is compressed

2009-03-16 Thread Carsten Aulbert
Darren, Richard,

thanks a lot for the very good answers. Regarding the seeking, I was
probably misled by the belief that the block size was like an
impenetrable block into which as much data as possible is squeezed
(like .Z files would be if you first compressed and then cut the data
into blocks).

Thanks a lot!

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-10 Thread Carsten Aulbert
Hi,

I've followed this thread a bit and I think there are some correct
points on every side of the discussion, but here I see a misconception (at
least I think it is one):

D. Eckert schrieb:
 (..)
 Dave made a mistake pulling out the drives with out exporting them first.
 For sure also UFS/XFS/EXT4/.. doesn't like that kind of operations but only 
 with ZFS you risk to loose ALL your data.
 that's the point!
 (...)
 
 I did that many times after performing the umount cmd with ufs/reiserfs 
 filesystems on USB external drives. And they never complainted or got 
 corrupted.

Think of ZFS as an entity which cannot live without the underlying zpool.
You can have reiserfs, jfs, ext?, xfs - you name it - on any logical
device, as it will only live on this one, and when you umount it, it is
safe to power it off, yank the disk out, whatever, since there is no
other layer between the file system and the logical disk partition/slice/...

However, as soon as you add another layer (say RAID, which in this
analogy is somewhat like the zpool), you might also lose data: take a
RAID0 setup, umount reiserfs/ufs/whatever, pull a disc out of the
RAID and destroy it or change a few sectors on it. When you then mount
the file system again, it is utterly broken and lost. Or - which might be
worse - you might end up with silent data corruption you will never
notice unless you try to open the data block which is damaged.

However, in your case you have some checksum error in the file system on
a single hard disk which might have been caused by some accident. ZFS is
good in the respect that it can tell you that something is broken, but
without a mirror or parity device it won't be able to fix the data out
of thin air.

I cannot claim to fully understand what happened to your devices, so
please take my written stuff with a grain of salt.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Introducing zilstat

2009-02-04 Thread Carsten Aulbert
Hi Richard,

Richard Elling schrieb:
 
 Yes.  I've got a few more columns in mind, too.  Does anyone still use
 a VT100? :-)

Only when using ILOM ;)

(apologies to anyone using a 72-chars-per-line MUA, the following lines are longer):

Thanks for the great tool, it showed something very interesting yesterday:

s06: TIME                  N-MBytes  N-MBytes/s  N-Max-Rate  B-MBytes  B-MBytes/s  B-Max-Rate
s06: 2009 Feb  4 14:37:11         5           0           0        10           0           1
s06: 2009 Feb  4 14:37:26         6           0           1        12           0           1
s06: 2009 Feb  4 14:37:41         4           0           0        10           0           1
s06: 2009 Feb  4 14:37:56         5           0           1        11           0           1
s06: 2009 Feb  4 14:38:11         6           0           1        11           0           2
s06: 2009 Feb  4 14:38:26         7           0           1        13           0           2
s06: 2009 Feb  4 14:38:41        10           0           2        17           1           3
s06: 2009 Feb  4 14:38:56         4           0           0         9           0           1
s06: 2009 Feb  4 14:39:11         5           0           1        11           0           1
s06: 2009 Feb  4 14:39:26         7           0           0        13           0           1
s06: 2009 Feb  4 14:39:41         7           0           2        13           0           3
s06: 2009 Feb  4 14:39:56         6           0           1        11           0           2
s06: 2009 Feb  4 14:40:11         6           0           1        12           0           1
s06: 2009 Feb  4 14:40:26         6           0           0        13           0           1
s06: 2009 Feb  4 14:40:41         5           0           0        10           0           1
s06: 2009 Feb  4 14:40:56         6           0           1        12           0           1
s06: 2009 Feb  4 14:41:11         4           0           0         9           0           1
[..]
so far, the box was almost idle, a little bit later:
s06: 2009 Feb  4 14:53:41         2           0           0         5           0           0
s06: 2009 Feb  4 14:53:56         1           0           0         3           0           0
s06: 2009 Feb  4 14:54:11         1           0           0         4           0           0
s06: 2009 Feb  4 14:54:26         1           0           0         3           0           0
s06: 2009 Feb  4 14:54:41         2           0           0         5           0           0
s06: 2009 Feb  4 14:54:56       604          40         171       702          46         198
s06: 2009 Feb  4 14:55:11       816          54         130       939          62         154
s06: 2009 Feb  4 14:55:26         2           0           0         4           0           0
s06: 2009 Feb  4 14:55:41         2           0           0         4           0           0
s06: 2009 Feb  4 14:55:56         1           0           0         3           0           0
s06: 2009 Feb  4 14:56:11         3           0           0         6           0           1
s06: 2009 Feb  4 14:56:26         1           0           0         3           0           0
[...]
s06: 2009 Feb  4 16:13:11         1           0           0         3           0           0
s06: 2009 Feb  4 16:13:26         2           0           0         5           0           0
s06: 2009 Feb  4 16:13:41       389          25          97       477          31         119
s06: 2009 Feb  4 16:13:56       505          33         193       599          39         218
s06: 2009 Feb  4 16:14:11         2           0           0         4           0           0
s06: 2009 Feb  4 16:14:26         3           0           0         5           0           1
s06: 2009 Feb  4 16:14:41         1           0           0         3           0           0
s06: 2009 Feb  4 16:14:56         2           0           0         6           0           1
s06: 2009 Feb  4 16:15:11         4           0           2        10           0           4
s06: 2009 Feb  4 16:15:26         0           0           0         1           0           0
s06: 2009 Feb  4 16:15:41       128           8          94       168          11         123
s06: 2009 Feb  4 16:15:56      1081          72         212      1305          87         279
s06: 2009 Feb  4 16:16:11       262          17          99       317          21         122
s06: 2009 Feb  4 16:16:26         0           0           0         0           0           0

just showing a few bursts...

Given that this is the output of 'zilstat.ksh -M -t 15', I guess we should 
really look into a fast device for it, right?

Do you have any hint which numbers are reasonable on an X4500 and which are 
approaching serious problems?

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Expert hint for replacing 3.5 SATA drive in X4500 with SSD for ZIL

2009-02-02 Thread Carsten Aulbert
Hi all,

We would like to replace one of the 3.5 inch SATA drives in our Thumpers
with an SSD (and put the ZIL on this device). We are currently
looking into this in a bit more detail and would like to ask for
input from people who already have experience with single- vs. multi-level-cell
SSDs, read- and write-optimized devices (if these really exist) and so on.

If possible I would like this discussion to take place on the list, but if
people want to suggest brand names/model numbers I'll be happy to accept
them off-list as well.
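
(For context, attaching such a device to an existing pool is a one-liner; a 
hedged sketch with made-up names - note that *removing* a log device again 
requires a sufficiently new zpool version:)

zpool add atlashome log c6t4d0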

Thanks a lot in advance

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Expert hint for replacing 3.5 SATA drive in X4500 with SSD for ZIL

2009-02-02 Thread Carsten Aulbert
Just a brief addendum

Something like this (or a fully DRAM-based device, if available in a 3.5
inch form factor) might also be interesting to test:

http://www.platinumhdd.com/

any thoughts?

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] checksum errors on Sun Fire X4500

2009-01-22 Thread Carsten Aulbert
Hi Jay,

Jay Anderson schrieb:
 I have b105 running on a Sun Fire X4500, and I am constantly seeing checksum 
 errors reported by zpool status. The errors are showing up over time on every 
 disk in the pool. In normal operation there might be errors on two or three 
 disks each day, and sometimes there are enough errors so it reports too many 
 errors, and the disk goes into a degraded state. I have had to remove the 
 spares from the pool because otherwise the spares get pulled into the pool to 
 replace the drives. There are no reported hardware problems with any of the 
 drives. I have run scrub multiple times, and this also generates checksum 
 errors. After the scrub completes the checksums continue to occur during 
 normal operation.
 
 This problem also occurred with b103. Before that Solaris 10u4 was installed 
 on the server, and it never had any checksum errors. With the OpenSolaris 
 builds I am running CIFS Server, and that's the only difference in server 
 function from when Solaris 10u4 was installed on it.
 
 Is this a known issue? Any suggestions or workarounds?

We had something similar: two or three disk slots started to act
weird and failed quite often - usually starting with a high error rate.
After we had exchanged two hard drives, the Sun hotline initiated an exchange
of the backplane - essentially the chassis was replaced.

Since then, we have not encountered anything like this anymore.

So it *might* be the backplane or a broken Marvell controller, but it's
hard to judge.

HTH

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hung when import zpool

2009-01-08 Thread Carsten Aulbert
Hi

Qin Ming Hua wrote:
 bash-3.00# zpool import mypool
 ^C^C
 
 it hung when i try to re-import the zpool, has anyone  see this before?
 

How long did you wait?

Once a zpool import took 1-2 hours to complete (it was seemingly stuck at
a ~30 GB file system which it needed to do some work on).

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Benchmarking ZFS via NFS

2009-01-08 Thread Carsten Aulbert
Hi all,

among many other things, I recently restarted benchmarking ZFS-over-NFS3
performance between an X4500 (host) and Linux clients. I last used iozone
quite a while ago and am still a bit at a loss understanding the
results. The automatic mode is pretty OK (and generates nice 3D plots
for the people higher up the ladder), but someone gave me the hint to use
multiple threads for testing the ops/s, and here I'm a bit at a loss how
to interpret the results and whether the values are reasonable or not.

Here is the current example - can anyone with deeper knowledge tell me
if these are reasonable values to start with?

Thanks a lot

Carsten

Iozone: Performance Test of File I/O
Version $Revision: 3.315 $
Compiled for 64 bit mode.
Build: linux-AMD64

Contributors:William Norcott, Don Capps, Isom Crawford, Kirby
Collins
 Al Slater, Scott Rhine, Mike Wisner, Ken Goss
 Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
 Randy Dunlap, Mark Montague, Dan Million, Gavin
Brebner,
 Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy,
 Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root.

Run began: Wed Jan  7 09:31:49 2009

Multi_buffer. Work area 16777216 bytes
OPS Mode. Output is in operations per second.
Record Size 8 KB
SYNC Mode.
File size set to 4194304 KB
Command line used: ../iozone3_315/src/current/iozone -m -t 8 -T
-O -r 8k -o -s 4G iozone
Time Resolution = 0.01 seconds.
Processor cache size set to 1024 Kbytes.
Processor cache line size set to 32 bytes.
File stride size set to 17 * record size.
Throughput test with 8 threads
Each thread writes a 4194304 Kbyte file in 8 Kbyte records

Children see throughput for  8 initial writers  =4925.20 ops/sec
Parent sees throughput for  8 initial writers   =4924.65 ops/sec
Min throughput per thread   = 615.61 ops/sec
Max throughput per thread   = 615.69 ops/sec
Avg throughput per thread   = 615.65 ops/sec
Min xfer=  524219.00 ops

Children see throughput for  8 rewriters=4208.45 ops/sec
Parent sees throughput for  8 rewriters =4208.42 ops/sec
Min throughput per thread   = 525.88 ops/sec
Max throughput per thread   = 526.22 ops/sec
Avg throughput per thread   = 526.06 ops/sec
Min xfer=  523944.00 ops

Children see throughput for  8 readers  =   11986.99 ops/sec
Parent sees throughput for  8 readers   =   11986.46 ops/sec
Min throughput per thread   =1481.13 ops/sec
Max throughput per thread   =1512.71 ops/sec
Avg throughput per thread   =1498.37 ops/sec
Min xfer=  513361.00 ops

Children see throughput for 8 re-readers=   12017.70 ops/sec
Parent sees throughput for 8 re-readers =   12017.22 ops/sec
Min throughput per thread   =1486.72 ops/sec
Max throughput per thread   =1520.35 ops/sec
Avg throughput per thread   =1502.21 ops/sec
Min xfer=  512761.00 ops

Children see throughput for 8 reverse readers   =   25741.62 ops/sec
Parent sees throughput for 8 reverse readers=   25735.91 ops/sec
Min throughput per thread   =3141.50 ops/sec
Max throughput per thread   =3282.11 ops/sec
Avg throughput per thread   =3217.70 ops/sec
Min xfer=  501956.00 ops

Children see throughput for 8 stride readers=1434.73 ops/sec
Parent sees throughput for 8 stride readers =1434.71 ops/sec
Min throughput per thread   = 122.51 ops/sec
Max throughput per thread   = 297.87 ops/sec
Avg throughput per thread   = 179.34 ops/sec
Min xfer=  215638.00 ops

Children see throughput for 8 random readers= 529.83 ops/sec
Parent sees throughput for 8 random readers = 529.83 ops/sec
Min throughput per thread   =  55.63 ops/sec
Max throughput per thread   = 101.03 ops/sec
Avg throughput per thread   =  

Re: [zfs-discuss] Benchmarking ZFS via NFS

2009-01-08 Thread Carsten Aulbert
Hi Bob.

Bob Friesenhahn wrote:
 Here is the current example - can anyone with deeper knowledge tell me
 if these are reasonable values to start with?
 
 Everything depends on what you are planning do with your NFS access. For
 example, the default blocksize for zfs is 128K.  My example tests
 performance when doing I/O with small 8K blocks (like a database), which
 will severely penalize zfs configured for 128K blocks.
 [...]

My plans don't count here; I need to optimize for what the users want, and
they don't have a clue what they will be doing 6 months from now. So I
guess all detailed planning will fail anyway and I'm just searching for
the one size that fits almost all...

 
 My experience with iozone is that it refuses to run on an NFS client of
 a Solaris server using ZFS since it performs a test and then refuses to
 work since it says that the filesystem is not implemented correctly. 
 Commenting a line of code in iozone will get over this hurdle.  This
 seems to be a religious issue with the iozone maintainer.

Interesting; I've been running this on a Linux client accessing a ZFS
file system from one of our Thumpers without any source modifications
or problems.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 'zfs recv' is very slow

2009-01-06 Thread Carsten Aulbert
Hi,

Brent Jones wrote:
 
 Using mbuffer can speed it up dramatically, but this seems like a hack
 without addressing a real problem with zfs send/recv.
 Trying to send any meaningful sized snapshots from say an X4540 takes
 up to 24 hours, for as little as 300GB changerate.

I have not found a solution yet either. But it seems to depend highly on
the distribution of file sizes, the number of files per directory, or
whatever. The last tests I made still showed more than 50 hours for 700
GB and ~45 hours for 5 TB (both were null tests where zfs send
wrote to /dev/null).
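
For the archives, the mbuffer-based pipeline mentioned above typically looks 
something like this (a hedged sketch; host, dataset and buffer sizes are made up):

zfs send tank/data@snap | mbuffer -s 128k -m 1G | \
    ssh otherhost 'mbuffer -s 128k -m 1G | zfs receive -F tank/data'

The buffers decouple the bursty send stream from the network and the receive 
side, which is where the speed-up comes from.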

Cheers from a still puzzled Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS send fails incremental snapshot

2009-01-04 Thread Carsten Aulbert
Hi Brent,

Brent Jones wrote:
 I am using 2008.11 with the Timeslider automatic snapshots, and using
 it to automatically send snapshots to a remote host every 15 minutes.
 Both sides are X4540's, with the remote filesystem mounted read-only
 as I read earlier that would cause problems.
 The snapshots send fine for several days, I accumulate many snapshots
 at regular intervals, and they are sent without any problems.
 Then I will get the dreaded:
 
 cannot receive incremental stream: most recent snapshot of pdxfilu02
 does not match incremental source
 
 

Which command line are you using?

Maybe you need to do a rollback first (zfs receive -F)?
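
Something along these lines (a hedged sketch; snapshot, pool and host names are 
made up):

zfs send -i tank/data@2009-01-04-00:00 tank/data@2009-01-04-00:15 | \
    ssh pdxfilu02 zfs receive -F pool/data

The -F on the receiving side rolls the target file system back to the most 
recent snapshot before applying the incremental stream, which is usually what 
this error message is complaining about.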

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2009-01-01 Thread Carsten Aulbert
Hi Marc (and all the others),

Marc Bevand wrote:

 So Carsten: Mattias is right, you did not simulate a silent data corruption 
 error. hdparm --make-bad-sector just introduces a regular media error that 
 *any* RAID level can detect and fix.

OK, I'll need to go back to our tests performed months ago, but my
feeling now is that we didn't do it right in the first place. It will take
some time to retest that.

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-30 Thread Carsten Aulbert
Hi Marc,

Marc Bevand wrote:
 Carsten Aulbert carsten.aulbert at aei.mpg.de writes:
 In RAID6 you have redundant parity, thus the controller can find out
 if the parity was correct or not. At least I think that to be true
 for Areca controllers :)
 
 Are you sure about that ? The latest research I know of [1] says that 
 although an algorithm does exist to theoretically recover from
 single-disk corruption in the case of RAID-6, it is *not* possible to
 detect dual-disk corruption with 100% certainty. And blindly running
 the said algorithm in such a case would even introduce corruption on a
 third disk.


Well, I probably need to wade through the paper (and recall Galois field
theory) before answering this. We did a few tests in a 16-disk RAID6
where we wrote data to the RAID, powered the system down, pulled out one
disk, inserted it into another computer and changed the sector checksum
of a few sectors (using hdparm's --make-bad-sector option). Then we
reinserted it into the original box, powered it up and ran a volume
check, and the controller did indeed find the corrupted sectors and
repaired them correctly without destroying data on another disk (as far
as we know and tested).

For the other point: dual-disk corruption can (to my understanding)
never be healed by the controller, since there is no redundant
information available to check against. I don't recall if we performed
some tests on that part as well, but maybe we should, to learn
how the controller behaves. As a matter of fact, at that point it
should just start crying out loud and tell me that it cannot recover
from that. But the chance of this happening should be relatively small,
unless the backplane/controller had a bad hiccup when writing that stripe.

 This is the reason why, AFAIK, no RAID-6 implementation actually
 attempts to recover from single-disk corruption (someone correct me if
 I am wrong).
 

As I said, I know that our Areca 1261ML does detect and correct those
errors - if they are single-disk corruptions.

 The exception is ZFS of course, but it accomplishes single and
 dual-disk corruption self-healing by using its own checksum, which is
 one layer above RAID-6 (therefore unrelated to it).

Yes, very helpful and definitely desirable to have :)
 
 [1] http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf

Thanks for the pointer

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Carsten Aulbert
Hi all,

Bob Friesenhahn wrote:
 My understanding is that ordinary HW raid does not check data 
 correctness.  If the hardware reports failure to successfully read a 
 block, then a simple algorithm is used to (hopefully) re-create the 
 lost data based on data from other disks.  The difference here is that 
 ZFS does check the data correctness (at the CPU) for each read while 
 HW raid depends on the hardware detecting a problem, and even if the 
 data is ok when read from disk, it may be corrupted by the time it 
 makes it to the CPU.

AFAIK this is not done during normal operation (unless a disk that is asked
for a sector cannot deliver that sector).

 
 ZFS's scrub algorithm forces all of the written data to be read, with 
 validation against the stored checksum.  If a problem is found, then 
 an attempt to correct is made from redundant storage using traditional 
 RAID methods.

That's exactly what volume checking on standard HW controllers does as
well: read all data and compare it with the parity.

This is exactly the reason why RAID6 should always be chosen over RAID5:
in the event of a wrong parity check, a RAID5 controller can only say
"oops, I have found a problem but cannot correct it", since it does not
know whether the parity or any of the n data bits is wrong. In RAID6 you
have redundant parity, so the controller can find out whether the parity
was correct or not. At least I think that is true for Areca
controllers :)

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vs HardWare raid - data integrity?

2008-12-28 Thread Carsten Aulbert
Hi Bob,

Bob Friesenhahn wrote:

 AFAIK this is not done during the normal operation (unless a disk asked
 for a sector cannot get this sector).
 
 ZFS checksum validates all returned data.  Are you saying that this fact
 is incorrect?
 

No, sorry, too long in front of a computer today I guess: I was referring
to hardware RAID controllers; AFAIK these usually do not check the
validity of data unless a disc returns an error. My understanding of
ZFS is exactly that: data is checked in the CPU against the stored
checksum.

 That's exactly what volume checking for standard HW controllers does as
 well. Read all data and compare it with parity.
 
 What if the data was corrupted prior to parity generation?
 

Well, that is bad luck; the same is true if your ZFS box has faulty memory
and the computed checksum is right for the data on disk, but wrong with
respect to the file under consideration.

Sorry for the confusion

Cheers

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SMART data

2008-12-21 Thread Carsten Aulbert


Mam Ruoc wrote:
 Carsten wrote:
 I will ask my boss about this (since he is the one
 mentioned in the
 copyright line of smartctl ;)), please stay tuned.
 
 How is this going? I'm very interested too... 

Not much happening right now, December meetings, holiday season, ...

But thanks for pinging me - I tend to forget such things.

Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SMART data

2008-12-08 Thread Carsten Aulbert
Hi all,

Miles Nordin wrote:
 rl == Rob Logan [EMAIL PROTECTED] writes:
 
 rl the sata framework uses the sd driver so its:
 
 yes but this is a really tiny and basically useless amount of output
 compared to what smartctl gives on Linux with SATA disks, where SATA
 disks also use the sd driver (the same driver Linux uses for SCSI
 disks).
 
 In particular, the reallocated sector count and raw read error rates
 are missing, as is the very useful offline self test interface and the
 sometimes useful last-5-errors log.
 

I will ask my boss about this (since he is the one mentioned in the
copyright line of smartctl ;)), please stay tuned.

Cheers
Carsten
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Asymmetric zpool load

2008-12-03 Thread Carsten Aulbert
Ross wrote:
 Aha, found it!  It was this thread, also started by Carsten :)
 http://www.opensolaris.org/jive/thread.jspa?threadID=78921&tstart=45

Did I? Darn, I need to get a brain upgrade.

But yes, there it was mainly focused on zfs send/receive being slow -
but maybe these are also linked.

What I will try today/this week:

Put some stress on the system with bonnie and other tools, try to find
slow disks and see whether this could be the main problem, but also look
into more vdevs and then possibly move to raidz to somehow compensate for
the lost disk space. Since we have 4 cold spares on the shelf plus SMS
warnings on disk failures (that is, if fma catches them), the risk
involved should be tolerable.

More later.

Carsten


Re: [zfs-discuss] Asymmetric zpool load

2008-12-03 Thread Carsten Aulbert
Carsten Aulbert wrote:

 Put some stress on the system with bonnie and other tools, try to find
 slow disks and see whether this could be the main problem, but also look
 into more vdevs and then possibly move to raidz to somehow compensate for
 the lost disk space. Since we have 4 cold spares on the shelf plus SMS
 warnings on disk failures (that is, if fma catches them), the risk
 involved should be tolerable.

First result with bonnie: during the "writing intelligently..." phase I
see this as a 2-minute average:

zpool iostats:

              capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
atlashome   1.70T  19.2T    225  1.49K   342K   107M
  raidz2     550G  6.28T     74    409   114K  32.6M
    c0t0d0      -      -      0    314  32.3K  2.51M
    c1t0d0      -      -      0    315  31.8K  2.52M
    c4t0d0      -      -      0    313  31.3K  2.52M
    c6t0d0      -      -      0    315  32.3K  2.51M
    c7t0d0      -      -      0    326  32.8K  2.50M
    c0t1d0      -      -      0    309  33.9K  2.52M
    c1t1d0      -      -      0    313  33.4K  2.51M
    c4t1d0      -      -      0    314  33.4K  2.52M
    c5t1d0      -      -      0    308  32.8K  2.52M
    c6t1d0      -      -      0    314  31.3K  2.51M
    c7t1d0      -      -      0    311  31.8K  2.52M
    c0t2d0      -      -      0    309  31.8K  2.52M
    c1t2d0      -      -      0    313  31.8K  2.51M
    c4t2d0      -      -      0    315  31.8K  2.52M
    c5t2d0      -      -      0    307  32.8K  2.52M
  raidz2     567G  6.26T     64    529  96.5K  36.3M
    c6t2d0      -      -      1    368  74.2K  2.79M
    c7t2d0      -      -      1    366  74.2K  2.80M
    c0t3d0      -      -      1    364  75.8K  2.80M
    c1t3d0      -      -      1    365  75.2K  2.80M
    c4t3d0      -      -      1    368  76.8K  2.80M
    c5t3d0      -      -      1    362  76.3K  2.80M
    c6t3d0      -      -      1    366  77.9K  2.80M
    c7t3d0      -      -      1    365  76.8K  2.80M
    c0t4d0      -      -      1    361  76.8K  2.80M
    c1t4d0      -      -      1    363  75.8K  2.80M
    c4t4d0      -      -      1    366  76.3K  2.80M
    c6t4d0      -      -      1    364  78.4K  2.80M
    c7t4d0      -      -      1    370  78.9K  2.79M
    c0t5d0      -      -      1    365  77.3K  2.80M
    c1t5d0      -      -      1    364  74.7K  2.80M
  raidz2     620G  6.64T     86    582   131K  37.9M
    c4t5d0      -      -     18    382  1.16M  2.74M
    c5t5d0      -      -     10    380   674K  2.74M
    c6t5d0      -      -     18    378  1.15M  2.73M
    c7t5d0      -      -      9    384   628K  2.74M
    c0t6d0      -      -     18    377  1.16M  2.74M
    c1t6d0      -      -     10    383   680K  2.75M
    c4t6d0      -      -     19    379  1.21M  2.73M
    c5t6d0      -      -     10    383   691K  2.75M
    c6t6d0      -      -     19    379  1.21M  2.73M
    c7t6d0      -      -     10    383   676K  2.72M
    c0t7d0      -      -     18    374  1.19M  2.75M
    c1t7d0      -      -     10    381   676K  2.74M
    c4t7d0      -      -     19    380  1.22M  2.74M
    c5t7d0      -      -     10    382   696K  2.74M
    c6t7d0      -      -     18    381  1.17M  2.74M
    c7t7d0      -      -      9    386   631K  2.75M
----------  -----  -----  -----  -----  -----  -----

iostat -Mnx 120:
extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c2t0d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c3t0d0
    0.0    1.4    0.0    0.0  0.0  0.0    1.5    0.4   0   0 c5t0d0
    0.6  351.5    0.0    2.6  0.4  0.1    1.2    0.2   3   8 c7t0d0
    0.6  336.3    0.0    2.6  0.1  0.1    0.4    0.2   3   7 c0t0d0
    0.6  340.8    0.0    2.6  0.2  0.1    0.6    0.2   3   7 c1t0d0
    0.6  330.6    0.0    2.6  0.1  0.1    0.3    0.2   3   7 c5t1d0
    0.6  336.7    0.0    2.6  0.1  0.1    0.3    0.2   3   7 c4t0d0
    0.6  331.8    0.0    2.6  0.1  0.1    0.3    0.2   3   7 c0t1d0
    0.6  339.0    0.0    2.6  0.4  0.1    1.1    0.2   3   7 c7t1d0
    0.6  335.4    0.0    2.6  0.1  0.1    0.4    0.2   3   7 c1t1d0
    0.6  329.2    0.0    2.6  0.1  0.1    0.3    0.2   3   7 c5t2d0
    0.6  343.7    0.0    2.6  0.3  0.1    0.7    0.2   3   7 c4t1d0
    0.6  331.8    0.0    2.6  0.1  0.1    0.3    0.2   2   7 c0t2d0
    1.2  396.3    0.1    2.9  0.3  0.1    0.7    0.2   4   8 c7t2d0
    0.6  336.7    0.0    2.6  0.1  0.1    0.4    0.2   3   7 c1t2d0
    0.6  341.9    0.0    2.6  0.2  0.1    0.7    0.2   3   7 c4t2d0
    1.3  390.7    0.1    2.9  0.3  0.1    0.8    0.2   4   9 c5t3d0
    1.3  396.7    0.1    2.9  0.3  0.1    0.8    0.2   4   9 c7t3d0
    1.3  393.6    0.1    2.9  0.2  0.1    0.6    0.2   4   9 c0t3d0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t4d0
    1.3  396.2    0.1    2.9  0.2  0.1    0.5

[zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Carsten Aulbert
Hi all,

We are running pretty large vdevs since the initial testing showed that
our setup was not too far off the optimum. However, under real-world load
we see some quite weird behaviour:

The system itself is an X4500 with 500 GB drives, and right now the system
seems to be under heavy load: ls takes minutes to return on only a few
hundred entries, while top shows 10% kernel time and the rest idle.

zpool iostat -v atlashome 60 shows (not the first output):

              capacity     operations    bandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
atlashome   2.11T  18.8T  2.29K 36  71.7M   138K
  raidz2     466G  6.36T    493     11  14.9M  34.1K
c0t0d0  -  - 48  5  1.81M  3.52K
c1t0d0  -  - 48  5  1.81M  3.46K
c4t0d0  -  - 48  5  1.81M  3.27K
c6t0d0  -  - 48  5  1.81M  3.40K
c7t0d0  -  - 47  5  1.81M  3.40K
c0t1d0  -  - 47  5  1.81M  3.20K
c1t1d0  -  - 47  6  1.81M  3.59K
c4t1d0  -  - 47  6  1.81M  3.53K
c5t1d0  -  - 47  5  1.81M  3.33K
c6t1d0  -  - 48  6  1.81M  3.67K
c7t1d0  -  - 48  6  1.81M  3.66K
c0t2d0  -  - 48  5  1.82M  3.42K
c1t2d0  -  - 48  6  1.81M  3.56K
c4t2d0  -  - 48  6  1.81M  3.54K
c5t2d0  -  - 48  5  1.81M  3.41K
  raidz2     732G  6.10T    800     12  24.6M  52.3K
c6t2d0  -  -139  5  7.52M  4.54K
c7t2d0  -  -139  5  7.52M  4.81K
c0t3d0  -  -140  5  7.52M  4.98K
c1t3d0  -  -139  5  7.51M  4.47K
c4t3d0  -  -139  5  7.51M  4.82K
c5t3d0  -  -139  5  7.51M  4.99K
c6t3d0  -  -139  5  7.52M  4.44K
c7t3d0  -  -139  5  7.52M  4.78K
c0t4d0  -  -139  5  7.52M  4.97K
c1t4d0  -  -139  5  7.51M  4.60K
c4t4d0  -  -139  5  7.51M  4.86K
c6t4d0  -  -139  5  7.51M  4.99K
c7t4d0  -  -139  5  7.51M  4.52K
c0t5d0  -  -139  5  7.51M  4.78K
c1t5d0  -  -138  5  7.51M  4.94K
  raidz2 960G  6.31T  1.02K 12  32.2M  52.0K
c4t5d0  -  -178  5  9.29M  4.79K
c5t5d0  -  -178  5  9.28M  4.64K
c6t5d0  -  -179  5  9.29M  4.44K
c7t5d0  -  -178  4  9.26M  4.26K
c0t6d0  -  -178  5  9.28M  4.78K
c1t6d0  -  -178  5  9.20M  4.58K
c4t6d0  -  -178  5  9.26M  4.25K
c5t6d0  -  -177  4  9.21M  4.18K
c6t6d0  -  -178  5  9.29M  4.69K
c7t6d0  -  -177  5  9.26M  4.61K
c0t7d0  -  -177  5  9.29M  4.34K
c1t7d0  -  -177  5  9.24M  4.28K
c4t7d0  -  -177  5  9.29M  4.78K
c5t7d0  -  -177  5  9.27M  4.75K
c6t7d0  -  -177  5  9.29M  4.34K
c7t7d0  -  -177  5  9.27M  4.28K
--  -  -  -  -  -  -

Questions:
(a) Why does the first vdev not get an equal share of the load?
(b) Why is a large raidz2 so bad? When I use a standard Linux box with
hardware raid6 over 16 disks I usually get more bandwidth and at least
about the same small-file performance.
(c) Would several smaller vdevs help much? And which layout would be a
good compromise between space, performance and reliability? 46 disks have
so few prime factors...

Thanks a lot

Carsten


Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Carsten Aulbert
Hi Miles,

Miles Nordin wrote:
 ca == Carsten Aulbert [EMAIL PROTECTED] writes:
 
 ca (a) Why the first vdev does not get an equal share
 ca of the load
 
 I don't know.  but, if you don't add all the vdev's before writing
 anything, there's no magic to make them balance themselves out.  Stuff
 stays where it's written.  I'm guessing you did add them at the same
 time, and they still filled up unevenly?
 

Yes, they are created all in one go (even on the same command line) and
only then are filled - either naturally over time or via zfs
send/receive (all on Sol10u5). So yes, it seems they fill up unevenly.

 'zpool iostat' that you showed is the place I found to see how data is
 spread among vdev's.
 
 ca  (b) Why is a large raidz2 so bad? When I use a
 ca standard Linux box with hardware raid6 over 16 disks I usually
 ca get more bandwidth and at least about the same small file
 ca performance
 
 obviously there are all kinds of things going on but...the standard
 answer is, traditional RAID5/6 doesn't have to do full stripe I/O.
 ZFS is more like FreeBSD's RAID3: it gets around the NVRAMless-RAID5
 write hole by always writing a full stripe, which means all spindles
 seek together and you get the seek performance of 1 drive (per vdev).
 Linux RAID5/6 just gives up and accepts a write hole, AIUI, but
 because the stripes are much fatter than a filesystem block, you'll
 sometimes get the record you need by seeking a subset of the drives
 rather than all of them, which means the drives you didn't seek have
 the chance to fetch another record.
 
 If you're saying you get worse performance than a single spindle, I'm
 not sure why.


No, I think a single disk would be much less performant. However, I'm a
bit disappointed by the overall performance of the boxes, and right now we
have users who experience extremely slow performance.

But thanks already for the insight.

Cheers

Carsten


Re: [zfs-discuss] Asymmetric zpool load

2008-12-02 Thread Carsten Aulbert
Bob Friesenhahn wrote:
 You may have one or more slow disk drives which slow down the whole
 vdev due to long wait times.  If you can identify those slow disk drives
 and replace them, then overall performance is likely to improve.
 
 The problem is that under severe load, the vdev with the highest backlog
 will be used the least.  One or more slow disks in the vdev will slow
 down the whole vdev.  It takes only one slow disk to slow down the whole
 vdev.

Hmm, since I only started with Solaris this year: is there a way to
identify a slow disk? In principle these should all be identical
Hitachi Deathstar^WDeskstar drives and should only show the usual
manufacturing variation.
 
 ZFS commits the writes to all involved disks in a raidz2 before
 proceeding with the next write.  With so many disks, you are asking for
 quite a lot of fortuitous luck in that everything must be working
 optimally.  Compounding the problem is that I understand that when the
 stripe width exceeds the number of segmented blocks from the data to be
 written (ZFS is only willing to dice to a certain minimum size), then
 only a subset of the disks will be used, wasting potential I/O
 bandwidth.  Your stripes are too wide.
 

Ah, ok, that's one of the first reasonable explanations (which I
understand) of why large zpools might be bad. So far I was not able to
track that down and only found the standard magic rule not to exceed
10 drives - but our (synthetic) tests had not shown significant
drawbacks. But I guess we might be bitten by it now.

 (c) Would the use of several smaller vdev would help much? And which
 layout would be a good compromise for getting space as well as
 performance and reliability? 46 disks have so few prime factors
 
 Yes, more vdevs should definitely help quite a lot for dealing with
 real-world muti-user loads.  One raidz/raidz2 vdev provides (at most)
 the IOPs of a single disk.
 
 There is a point of diminishing returns and your layout has gone far
 beyond this limit.

Thanks for the insight, I guess I need to experiment with empty boxes to
get into a better state!

Cheers

Carsten


[zfs-discuss] zfs snapshot stalled?

2008-10-20 Thread Carsten Aulbert
Hi all,

I've just seen something weird. On a zpool which looks a bit busy right
now (~100 read ops/s, 100 kB/s) I started a zfs snapshot about an hour
ago. Until now, taking a snapshot usually took a few seconds at most,
even for largish ~TByte file systems. I don't know whether the read IOs
are related to the snapshot itself or whether another user is causing
them, since I did not look before taking the snapshot.

My remaining questions after searching the web:

(1) Is it common that snapshots can take this long?
(2) Is there a way to stop it if one assumes something went wrong, i.e.
is there a special signal I could send it?

Thanks for any hint

Carsten
-- 
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31


Re: [zfs-discuss] zfs snapshot stalled?

2008-10-20 Thread Carsten Aulbert
Hi again,

brief update:

the process ended successfully (at least a snapshot was created) after
close to 2 hrs. Since the load is still the same as before taking the
snapshot I blame other users' processes reading from that array for the
long snapshot duration.

Carsten Aulbert wrote:

 My remaining questions after searching the web:
 
 (1) Is it common that snapshots can take this long?
 (2) Is there a way to stop it if one assumes something went wrong, i.e.
 is there a special signal I could send it?
 
 Thanks for any hint
 
 Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-18 Thread Carsten Aulbert
Hi

Miles Nordin wrote:
 r == Ross  [EMAIL PROTECTED] writes:
 
  r figures so close to 10MB/s.  All three servers are running
  r full duplex gigabit though
 
 there is one tricky way 100Mbit/s could still bite you, but it's
 probably not happening to you.  It mostly affects home users with
 unmanaged switches:
 
   http://www.smallnetbuilder.com/content/view/30212/54/
   http://virtualthreads.blogspot.com/2006/02/beware-ethernet-flow-control.html
 
 because the big switch vendors all use pause frames safely:
 
  http://www.networkworld.com/netresources/0913flow2.html -- pause frames as 
 interpreted by netgear are harmful

That rings a bell. Ross, are you using NFS via UDP or TCP? Could it be
that your network has different performance levels for different
transport types? For our network we have disabled pause frames completely
and rely only on TCP's internal mechanisms to prevent flooding/blocking.
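
If you want to double-check which transport a given mount actually uses,
something like this should work (a rough sketch; server:/export and /mnt
are placeholders, and the mount options differ a bit between Solaris and
Linux clients):

nfsstat -m | grep proto                          # shows proto=tcp or proto=udp per mount
mount -F nfs -o proto=tcp server:/export /mnt    # Solaris-style mount forcing TCP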

Carsten

PS: the job with the ~25k files adding up to 800 GB is now done - zfs send
took only 52 hrs, i.e. an average of ~4.5 MB/s :(


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Carsten Aulbert
Hi Scott,

Scott Williamson wrote:
 You seem to be using dd for write testing. In my testing I noted that
 there was a large difference in write speed between using dd to write
 from /dev/zero and using other files. Writing from /dev/zero always
 seemed to be fast, reaching the maximum of ~200MB/s and using cp which
 would perform poorer the fewer the vdevs.

You are right, the write benchmarks were done with dd just to have some
bulk figures, since zeros can usually be generated fast enough.
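
For reference, the dd runs were of this general form (a sketch only; the
path, block size and count are placeholders rather than the exact
parameters used, and /dev/zero is of course a best case, particularly
once compression comes into play):

dd if=/dev/zero of=/atlashome/ddtest bs=128k count=80000   # ~10 GB sequential write
dd if=/atlashome/ddtest of=/dev/null bs=128k               # sequential read-back (beware ARC caching)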

 
 This also impacted the zfs send speed, as with fewer vdevs in RaidZ2 the
 disks seemed to spend most of their time seeking during the send.
 

That seems a bit too simplistic to me. If you compare raidz with raidz2,
it seems that raidz2 is not too bad with fewer vdevs. I wish there was a
way for zfs send to avoid so many seeks. The 1 TB file system is still
being sent via zfs send, now close to 48 hours in.

Cheers

Carsten

PS: We still have a spare thumper sitting around, maybe I'll give it a
try with 5 vdevs.


Re: [zfs-discuss] Improving zfs send performance

2008-10-16 Thread Carsten Aulbert
Hi Ross

Ross wrote:
 Now though I don't think it's network at all.  The end result from that 
 thread is that we can't see any errors in the network setup, and using 
 nicstat and NFS I can show that the server is capable of 50-60MB/s over the 
 gigabit link.  Nicstat also shows clearly that both zfs send / receive and 
 mbuffer are only sending 1/5 of that amount of data over the network.
 
 I've completely run out of ideas of my own (but I do half expect there's a 
 simple explanation I haven't thought of).  Can anybody think of a reason why 
 both zfs send / receive and mbuffer would be so slow?

Try to separate the two things:

(1) Try /dev/zero -> mbuffer --- network --- mbuffer > /dev/null

That should give you wirespeed

(2) Try zfs send | mbuffer > /dev/null

That should give you an idea how fast zfs send really is locally.
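
Spelled out, the two tests could look roughly like this (host name, port,
dataset and buffer size are placeholders; mbuffer's -I/-O options do the
network part without needing socat):

# (1) network only
receiver$ mbuffer -I 9090 -s 128k -m 1024M > /dev/null
sender$   cat /dev/zero | mbuffer -s 128k -m 1024M -O receiver:9090

# (2) zfs send only, locally
sender$   zfs send tank/fs@snap | mbuffer -s 128k -m 1024M > /dev/null

If (1) runs at wire speed while (2) is slow, the bottleneck is on the
sending side rather than in the network.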

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi all,

Carsten Aulbert wrote:
 More later.

OK, I'm completely puzzled right now (and sorry for this lengthy email).
My first (and currently only) idea was that the size of the files is
related to this effect, but that does not seem to be the case:

(1) A 185 GB zfs file system was transferred yesterday at a speed of
about 60 MB/s to two different servers. The histogram of file sizes looks
like this:

72822 files were investigated, total size is: 185.82 Gbyte

Summary of file sizes [bytes]:
zero:  2
1 - 2 0
2 - 4 1
4 - 8 3
8 - 16   26
16 - 32   8
32 - 64   6
64 - 128 29
128 - 256    11
256 - 512    13
512 - 1024   17
1024 - 2k    33
2k - 4k      45
4k - 8k    9044
8k - 16k     60
16k - 32k    41
32k - 64k    19
64k - 128k   22
128k - 256k  12
256k - 512k   5
512k - 1024k   1218  **
1024k - 2M    16004  *
2M - 4M   46202

4M - 8M   0
8M - 16M  0
16M - 32M 0
32M - 64M 0
64M - 128M0
128M - 256M   0
256M - 512M   0
512M - 1024M  0
1024M - 2G0
2G - 4G   0
4G - 8G   0
8G - 16G  1

(2) Currently a much larger file system is being transferred, the same
script (even the same incarnation, i.e. process) is now running close to
22 hours:

28549 files were investigated, total size is: 646.67 Gbyte

Summary of file sizes [bytes]:
zero:   4954  **
1 - 2 0
2 - 4 0
4 - 8 1
8 - 161
16 - 32   0
32 - 64   0
64 - 128  1
128 - 256 0
256 - 512 9
512 - 1024   71
1024 - 2k 1
2k - 4k    1095  **
4k - 8k    8449  *
8k - 16k   2217  
16k - 32k   503  ***
32k - 64k 1
64k - 128k1
128k - 256k   1
256k - 512k   0
512k - 1024k  0
1024k - 2M0
2M - 4M   0
4M - 8M  16
8M - 16M  0
16M - 32M 0
32M - 64M 11218

64M - 128M0
128M - 256M   0
256M - 512M   0
512M - 1024M  0
1024M - 2G0
2G - 4G   5
4G - 8G   1
8G - 16G  3
16G - 32G 1


When watching zpool iostat I get this (30 second average, NOT the first
output):

              capacity     operations    bandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
atlashome   3.54T  17.3T    137      0  4.28M      0
  raidz2 833G  6.00T  1  0  30.8K  0
c0t0d0  -  -  1  0  2.38K  0
c1t0d0  -  -  1  0  2.18K  0
c4t0d0  -  -  0  0  1.91K  0
c6t0d0  -  -  0  0  1.76K  0
c7t0d0  -  -  0  0  1.77K  0
c0t1d0  -  -  0  0  1.79K  0
c1t1d0  -  -  0  0  1.86K  0
c4t1d0  -  -  0  0  1.97K  0
c5t1d0  -  -  0  0  2.04K  0
c6t1d0  -  -  1  0  2.25K  0
c7t1d0  -  -  1  0  2.31K  0
c0t2d0  -  -  1  0  2.21K  0
c1t2d0  -  -  0  0  1.99K  0
c4t2d0  -  -  0  0  1.99K  0
c5t2d0  -  -  1  0  2.38K  0
  raidz2    1.29T  5.52T     67      0  2.09M      0
c6t2d0  -  - 58  0   143K  0
c7t2d0  -  - 58  0   141K  0
c0t3d0  -  - 53  0   131K  0
c1t3d0  -  - 53  0   130K  0
c4t3d0  -  - 58  0   143K  0
c5t3d0  -  - 58  0   145K  0
c6t3d0  -  - 59  0   147K  0
c7t3d0  -  - 59  0   146K  0
c0t4d0  -  - 59  0   145K  0
c1t4d0  -  - 58  0   145K  0
c4t4d0  -  - 58  0   145K  0
c6t4d0  -  - 58  0   143K  0
c7t4d0  -  - 58  0   143K  0
c0t5d0  -  - 58  0   145K  0
c1t5d0  -  - 58  0   144K  0
  raidz2    1.43T  5.82T     69      0  2.16M      0
c4t5d0  -  - 62  0   141K  0
c5t5d0  -  - 60  0   138K  0
c6t5d0  -  - 59  0   135K  0
c7t5d0  -  - 60  0   138K  0
c0t6d0  -  - 62

Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi Ross

Ross Smith wrote:
 Thanks, that got it working.  I'm still only getting 10MB/s, so it's not 
 solved my problem - I've still got a bottleneck somewhere, but mbuffer is a 
 huge improvement over standard zfs send / receive.  It makes such a 
 difference when you can actually see what's going on.

I'm currently trying to investigate this a bit. One of our users' home
directories is extremely slow to 'zfs send'. The send started yesterday
afternoon at about 16:00+0200, is still running, and has copied less than
50% of the whole tree:

On the receiving side zfs get tells me:

atlashome/BACKUP/XXX  used   193G   -
atlashome/BACKUP/XXX  available  17.2T  -
atlashome/BACKUP/XXX  referenced 193G   -
atlashome/BACKUP/XXX  compressratio  1.81x  -

So close to 350 GB have been transferred and about 500 GB are still to go.

More later.

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi Richard,

Richard Elling wrote:
 Since you are reading, it depends on where the data was written.
 Remember, ZFS dynamic striping != RAID-0.
 I would expect something like this if the pool was expanded at some
 point in time.

No, the RAID was set up in one go right after jumpstarting the box.

 (2) The disks should be able to perform much much faster than they
 currently output data at, I believe it;s 2008 and not 1995.
   
 
 X4500?  Those disks are good for about 75-80 random iops,
 which seems to be about what they are delivering.  The dtrace
 tool, iopattern, will show the random/sequential nature of the
 workload.


I need to read about this a bit and will try to analyze it.

 (3) The four cores of the X4500 are dying of boredom, i.e. idle 95% all
 the time.

 Has anyone a good idea, where the bottleneck could be? I'm running out
 of ideas.
   
 
 I would suspect the disks.  30 second samples are not very useful
 to try and debug such things -- even 1 second samples can be
 too coarse.  But you should take a look at 1 second samples
 to see if there is a consistent I/O workload.
 -- richard
 

Without doing too much statistics yet (if needed I can easily do that),
it looks like this:


              capacity     operations    bandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
atlashome   3.54T  17.3T    256      0  7.97M      0
  raidz2 833G  6.00T  0  0  0  0
c0t0d0  -  -  0  0  0  0
c1t0d0  -  -  0  0  0  0
c4t0d0  -  -  0  0  0  0
c6t0d0  -  -  0  0  0  0
c7t0d0  -  -  0  0  0  0
c0t1d0  -  -  0  0  0  0
c1t1d0  -  -  0  0  0  0
c4t1d0  -  -  0  0  0  0
c5t1d0  -  -  0  0  0  0
c6t1d0  -  -  0  0  0  0
c7t1d0  -  -  0  0  0  0
c0t2d0  -  -  0  0  0  0
c1t2d0  -  -  0  0  0  0
c4t2d0  -  -  0  0  0  0
c5t2d0  -  -  0  0  0  0
  raidz2    1.29T  5.52T    133      0  4.14M      0
c6t2d0  -  -117  0   285K  0
c7t2d0  -  -114  0   279K  0
c0t3d0  -  -106  0   261K  0
c1t3d0  -  -114  0   282K  0
c4t3d0  -  -118  0   294K  0
c5t3d0  -  -125  0   308K  0
c6t3d0  -  -126  0   311K  0
c7t3d0  -  -118  0   293K  0
c0t4d0  -  -119  0   295K  0
c1t4d0  -  -120  0   298K  0
c4t4d0  -  -120  0   291K  0
c6t4d0  -  -106  0   257K  0
c7t4d0  -  - 96  0   236K  0
c0t5d0  -  -109  0   267K  0
c1t5d0  -  -114  0   282K  0
  raidz2    1.43T  5.82T    123      0  3.83M      0
c4t5d0  -  -108  0   242K  0
c5t5d0  -  -104  0   236K  0
c6t5d0  -  -104  0   239K  0
c7t5d0  -  -107  0   245K  0
c0t6d0  -  -108  0   248K  0
c1t6d0  -  -106  0   245K  0
c4t6d0  -  -108  0   250K  0
c5t6d0  -  -112  0   258K  0
c6t6d0  -  -114  0   261K  0
c7t6d0  -  -110  0   253K  0
c0t7d0  -  -109  0   248K  0
c1t7d0  -  -109  0   246K  0
c4t7d0  -  -108  0   243K  0
c5t7d0  -  -108  0   244K  0
c6t7d0  -  -106  0   240K  0
c7t7d0  -  -109  0   244K  0
--  -  -  -  -  -  -

The IOPS vary between about 70 and 140; the interesting bit is that the
first raidz2 does not get any hits at all :(

Cheers

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-15 Thread Carsten Aulbert
Hi again

Brent Jones wrote:
 
 Scott,
 
 Can you tell us the configuration that you're using that is working for you?
 Were you using RaidZ, or RaidZ2? I'm wondering what the sweetspot is
 to get a good compromise in vdevs and usable space/performance


Some time ago I ran some tests to find this out:

(1) create a new zpool
(2) copy a user's home to it (always the same one, ~25 GB IIRC)
(3) zfs send to /dev/null
(4) evaluate & continue the loop (see the sketch below)
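
In shell terms, such a loop might look like this (a minimal sketch; the
device lists, source path and snapshot name are placeholders, not the
exact commands that were run):

for layout in "mirror c0t1d0 c1t1d0 mirror c4t1d0 c5t1d0" \
              "raidz2 c0t1d0 c1t1d0 c4t1d0 c5t1d0 c6t1d0 c7t1d0"; do
    zpool create -f testpool $layout           # word-splitting of $layout is intended
    rsync -a /atlashome/someuser/ /testpool/
    zfs snapshot testpool@bench
    ptime zfs send testpool@bench > /dev/null  # timing on stderr, stream discarded
    zpool destroy -f testpool
done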

I did this for fully mirrored setups as well as raidz and raidz2; the
results were mixed:

https://n0.aei.uni-hannover.de/cgi-bin/twiki/view/ATLAS/ZFSBenchmarkTest#ZFS_send_performance_relevant_fo

The caveat here might be that, in retrospect, this seemed like a good
home file system, i.e. one which was quite fast.

If you don't want to bother with the table:

Mirrored setup never exceeded 58 MB/s and was getting faster the more
small mirrors you used.

RaidZ had its sweetspot with a configuration of '6 6 6 6 6 6 5 5', i.e.
6 or 5 disks per RaidZ and 8 vdevs

RaidZ2 finally was best at '10 9 9 9 9', i.e. 5 vdevs but not much worse
with only 3, i.e. what we are currently using to get more storage space
(gains us about 2 TB/box).

Cheers

Carsten


Re: [zfs-discuss] Improving zfs send performance

2008-10-14 Thread Carsten Aulbert
Hi again,

Thomas Maier-Komor wrote:
 Carsten Aulbert wrote:
 Hi Thomas,
 I don't know socat or what benefit it gives you, but have you tried
 using mbuffer to send and receive directly (options -I and -O)?

I thought we tried that in the past and with socat it seemed faster, but
I just made a brief test and I got (/dev/zero -> remote /dev/null) 330
MB/s with mbuffer+socat and 430MB/s with mbuffer alone.

 Additionally, try to set the block size of mbuffer to the recordsize of
 zfs (usually 128k):
 receiver$ mbuffer -I sender:1 -s 128k -m 2048M | zfs receive
 sender$ zfs send blabla | mbuffer -s 128k -m 2048M -O receiver:1

We are using 32k since many of our users use tiny files (and then I need
to reduce the buffer size because of this 'funny' error):

mbuffer: fatal: Cannot address so much memory
(32768*65536=21474836481544040742911).

Does this qualify for a bug report?
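
Side note: 32768*65536 is exactly 2 GiB, so this looks like a 32-bit
overflow in mbuffer's blocks-times-blocksize computation. If that is the
cause, staying a bit below 2 GiB of total buffer should avoid it, e.g.
(dataset, host and port being placeholders again):

zfs send tank/fs@snap | mbuffer -s 32k -m 1920M -O receiver:9090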

Thanks for the hint of looking into this again!

Cheers

Carsten


[zfs-discuss] Improving zfs send performance

2008-10-13 Thread Carsten Aulbert
Hi all,

although I'm running all this in a Sol10u5 X4500, I hope I may ask this
question here. If not, please let me know where to head to.

We are running several X4500s with only 3 raidz2 vdevs per pool since we
want quite a bit of storage space[*], but the performance we get when
using zfs send is sometimes really lousy. Of course this depends on
what's in the file system, but when doing a few backups today I have seen
the following:

receiving full stream of atlashome/[EMAIL PROTECTED] into
atlashome/BACKUP/[EMAIL PROTECTED]
in @ 11.1 MB/s, out @ 11.1 MB/s, 14.9 GB total, buffer   0% full
summary: 14.9 GByte in 45 min 42.8 sec - average of 5708 kB/s

So, a mere 15 GB were transferred in 45 minutes, and another user's home,
which is quite large (7 TB), took more than 42 hours to be transferred.
Since all this goes over a 10 Gb/s network and the CPUs are all idle, I
would really like to know why

* zfs send is so slow and
* how can I improve the speed?

Thanks a lot for any hint

Cheers

Carsten

[*] we have done quite a few tests with more zpools but were not able to
improve the speed substantially. For this particular bad file system I
still need to histogram the file sizes.

-- 
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31


Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Carsten Aulbert
Hi

Darren J Moffat wrote:

 
 What are you using to transfer the data over the network ?
 

Initially just plain ssh, which was way too slow; now we use mbuffer on
both ends and transfer the data over a socket via socat. I know that
mbuffer already allows this, but in a few tests socat seemed to be faster.

Sorry for not writing this into the first email.

Cheers

Carsten




Re: [zfs-discuss] Improving zfs send performance

2008-10-13 Thread Carsten Aulbert
Hi Thomas,

Thomas Maier-Komor wrote:

 
 Carsten,
 
 the summary looks like you are using mbuffer. Can you elaborate on what
 options you are passing to mbuffer? Maybe changing the blocksize to be
 consistent with the recordsize of the zpool could improve performance.
 Is the buffer running full or is it empty most of the time? Are you sure
 that the network connection is 10Gb/s all the way through from machine
 to machine?

Well spotted :)

right now plain mbuffer with plenty of buffer (-m 2048M) on both ends,
and I have not seen any buffer exceed the 10% watermark level. The
network connections are via Neterion XFrame II Sun Fire NICs, then via
CX4 cables to our core switch (Woven Systems EFX 1000), to which both
boxes are directly connected. netperf tells me that the TCP performance
is close to 7.5 Gbit/s duplex, and if I use

cat /dev/zero | mbuffer | socat --- socat | mbuffer > /dev/null

I easily see speeds of about 350-400 MB/s so I think the network is fine.
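
Written out, the socat-based pipe is roughly of this form (the port is a
placeholder and the exact options may differ from what we actually run):

receiver$ socat -u TCP-LISTEN:9090,reuseaddr STDOUT | mbuffer -s 128k -m 2048M > /dev/null
sender$   cat /dev/zero | mbuffer -s 128k -m 2048M | socat -u STDIN TCP:receiver:9090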

Cheers

Carsten


[zfs-discuss] zpool replace is stuck

2008-10-10 Thread Carsten Aulbert
Hi,

on a Solaris 10u5 box (X4500) with latest patches (Oct 8) one disk was
marked as failed. We replaced it yesterday, I configured it via cfgadm
and told ZFS to replace it with the replacement:

cfgadm -c configure sata1/4
zpool replace atlashome c1t4d0

Initially it looked good: resilvering started, but when I looked a few
hours later I found the zpool still degraded and the replacement disk
also marked as failed, although the resilvering looked complete
(according to zpool status):

zpool status

s08:~# zpool status
  pool: atlashome
 state: DEGRADED
 scrub: resilver completed with 0 errors on Thu Oct  9 13:51:52 2008
config:

NAME  STATE READ WRITE CKSUM
atlashome DEGRADED 0 0 0
  raidz2  ONLINE   0 0 0
[...]
  raidz2  DEGRADED 0 0 0
[...]
c0t4d0    ONLINE   0 0 0
replacing DEGRADED 0 3.00K 18
  c1t4d0s0/o  UNAVAIL  0   277 0  cannot open
  c1t4d0  *0 0 0
c4t4d0    ONLINE   0 0 0
[...]
* at that point this disk was OFFLINE IIRC, now it's marked ONLINE, see
below

I tried to get it back online by disconnecting the SATA port and then
reconnecting it. Apparently that worked (the disk is still ONLINE after
more than 12 hours), but I'm still stuck in the same place. ZFS seems to
think that the replacement is still going on, and I don't know how to
continue.

I'm currently backing up the files from that box (luckily only about 1
TB), but I would like to know how to solve this:

(1) export/import the file system after the backup?
(2) Destroy the pool and re-initialise it?
(3) Anything else?
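
Or, since zpool status claims the resilver completed: would it be safe to
simply detach the stale old half of the replacing vdev and see whether
the pool cleans itself up? A hedged sketch, untested here:

zpool detach atlashome c1t4d0s0/o
zpool status atlashome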

Thanks a lot for a brief hint!

Cheers

Carsten

-- 
Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics
Callinstrasse 38, 30167 Hannover, Germany
Phone/Fax: +49 511 762-17185 / -17193
http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31