Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2010-01-08 Thread Jack Kielsmeier
I'm thinking that the issue is simply with zfs destroy, not with dedup or 
compression.

Yesterday I decided to do some iSCSI testing: I created a new 1TB dataset in my 
pool, with neither compression nor dedup enabled.

After copying about 700GB of data from my Windows box (NTFS on top of the iSCSI 
disk), I decided I didn't want to use it, so I attempted to delete the dataset.

Once again, the command froze. I removed the ZFS cache file and am now trying 
to import my pool... again. This time, the memory fills up QUICKLY: I hit 8GB 
used in about an hour, then the box completely freezes.

iostat shows each of my disks being read at about 10 MB/s up until the freeze.

It does not matter if I limit l2arc size in /etc/system, the behavior is the 
same.
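(For reference, the kind of /etc/system limit I mean looks like this, assuming 
the ARC cap is the relevant knob here; the value is arbitrary:)

  * cap the ARC at 4 GB (value in bytes)
  set zfs:zfs_arc_max = 0x100000000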


Re: [zfs-discuss] (Practical) limit on the number of snapshots?

2010-01-08 Thread Peter van Gemert
 By having a snapshot you
 are not releasing the 
 space forcing zfs to allocate new space from other
 parts of a disk 
 drive. This may lead (depending on workload) to more
 fragmentation, less 
 localized data (more and longer seeks).
 

ZFS uses COW (copy on write) during writes. This means that it first has to 
find a new location for the data and when this data is written, the original 
block is released. When using snapshots, the original block is not released.

I don't think the use of snapshots will alter the way data is fragmented or 
localized on disk.
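(For what it's worth, a quick way to see how much space snapshots are holding on 
to; the dataset name is just an example:)

  # usedbysnapshots is space kept only because snapshots still reference old blocks
  zfs list -o name,used,usedbysnapshots,usedbydataset tank/home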

---
PeterVG


Re: [zfs-discuss] $100 SSD = 5x faster dedupe

2010-01-08 Thread marty scholes
--- On Thu, 1/7/10, Tiernan OToole lsmart...@gmail.com wrote:
 Sorry to hijack the thread, but can you
 explain your setup? Sounds interesting, but need more
 info...

This is just a home setup to amuse me and placate my three boys, each of whom 
has several Windows instances running under Virtualbox.

Server is a Sun v40z: quad 2.4 GHz Opterons with 16GB of RAM.  Internal bays hold 
a pair of 73GB drives as a mirrored rpool, a pair of 36GB drives as spares for 
the array, plus a 146GB drive I use as cache for the usb pool (a single 320GB 
SATA drive).

The array is an HP MSA30 with 14x36GB drives configured as RAIDZ3 using the 
spares listed above with auto snapshots as the tank pool. Tank is synchronized 
hourly to the usb pool.

It's all connected via four HP 4000M switches (one at the server and one at 
each workstation) which are meshed via gigabit fiber.

Two workstations are triple-head sunrays.

One station is a single sunray 150 integrated unit.

This is a work in progress with plenty of headroom to grow.  I started the 
build in November and have less than $1200 into it so far.

Thanks for letting me hijack the thread by sharing!

Cheers,
Marty


  


Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Frank Batschulat (Home)
On Wed, 23 Dec 2009 03:02:47 +0100, Mike Gerdts mger...@gmail.com wrote:

 I've been playing around with zones on NFS a bit and have run into
 what looks to be a pretty bad snag - ZFS keeps seeing read and/or
 checksum errors.  This exists with S10u8 and OpenSolaris dev build
 snv_129.  This is likely a blocker for anything thinking of
 implementing parts of Ed's Zones on Shared Storage:

 http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss

 The OpenSolaris example appears below.  The order of events is:

 1) Create a file on NFS, turn it into a zpool
 2) Configure a zone with the pool as zonepath
 3) Install the zone, verify that the pool is healthy
 4) Boot the zone, observe that the pool is sick
[...]
 r...@soltrain19# zoneadm -z osol boot

 r...@soltrain19# zpool status osol
   pool: osol
  state: DEGRADED
 status: One or more devices has experienced an unrecoverable error.  An
 attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
 using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: none requested
 config:

 NAME  STATE READ WRITE CKSUM
 osol  DEGRADED 0 0 0
   /mnt/osolzone/root  DEGRADED 0 0   117  too many errors

 errors: No known data errors

Hey Mike, you're not the only victim of these strange CHKSUM errors. I hit 
the same during my slightly different testing, where I'm NFS-mounting an 
entire, pre-existing remote file living in the zpool on the NFS server and using 
that to create a zpool and install zones into it.

I've filed today:

6915265 zpools on files (over NFS) accumulate CKSUM errors with no apparent 
reason

Here's the relevant piece worth investigating out of it (leaving out the actual 
setup etc.). As in your case, creating the zpool and installing the zone into it 
still gives a healthy zpool, but immediately after booting the zone, the zpool 
served over NFS accumulated CHKSUM errors.

Of particular interest are the 'cksum_actual' values as reported by Mike for his 
test case here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html

compared to the 'cksum_actual' values I got in the fmdump error output on 
my test case/system.

Note that the NFS server's zpool that is serving and sharing the file we use is 
healthy.

With the zone now halted on my test system, checking fmdump:

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort 
| uniq -c | sort -n | tail
   2cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 
0x7cd81ca72df5ccc0
   2cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 
0x3d2827dd7ee4f21
   6cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 
0x983ddbb8c4590e40
*A   6cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 
0x89715e34fbf9cdc0
*B   7cksum_actual = 0x0 0x0 0x0 0x0
*C  11cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 
0x280934efa6d20f40
*D  14cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 
0x7e0aef335f0c7f00
*E  17cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 
0xd4f1025a8e66fe00
*F  20cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 
0x7f84b11b3fc7f80
*G  25cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 
0x82804bc6ebcfc0

osoldev.root./export/home/batschul.= zpool status -v
  pool: nfszone
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
nfszone DEGRADED 0 0 0
  /nfszone  DEGRADED 0 0   462  too many errors

errors: No known data errors

==

now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail

   2cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 
0x290cbce13fc59dce
*D   3cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 
0x7e0aef335f0c7f00
*E   3cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 
0xd4f1025a8e66fe00
*B   4cksum_actual = 0x0 0x0 0x0 0x0
   4cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 
0x330107da7c4bcec0
   5cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 
0x4e0b3a8747b8a8
*C   6cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 
0x280934efa6d20f40
*A   6

Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Darren J Moffat

Frank Batschulat (Home) wrote:

This just can't be an accident, there must be some coincidence and thus there's 
a good chance
that these CHKSUM errors must have a common source, either in ZFS or in NFS ?


What are you using for on the wire protection with NFS ?  Is it shared 
using krb5i or do you have IPsec configured ?  If not I'd recommend 
trying one of those and see if your symptoms change.
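For example, something along these lines on the NFS server (the dataset name is 
a placeholder):

  # share the dataset with Kerberos integrity protection on the wire
  zfs set sharenfs=sec=krb5i tank/zones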


--
Darren J Moffat


[zfs-discuss] Detecting quota limits

2010-01-08 Thread Martijn de Munnik
Hi List,

We create a zfs filesystem for each user's homedir. I would like to
monitor their usage, and when a user approaches his quota I would like to
receive a warning by mail. Does anybody have a script available which does
this job and can be run from a cron job? Or even better, is this a built-in
feature of zfs?
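Something along these lines is what I had in mind (untested sketch; the
tank/home prefix, the 90% threshold and the mail recipient are placeholders):

  #!/bin/ksh
  # warn when any home filesystem is above 90% of its quota
  for fs in $(zfs list -H -o name -r tank/home); do
          used=$(zfs get -Hp -o value used $fs)
          quota=$(zfs get -Hp -o value quota $fs)
          [ "$quota" = "0" ] && continue        # no quota set on this fs
          pct=$((used * 100 / quota))
          if [ $pct -ge 90 ]; then
                  echo "$fs is at ${pct}% of its quota" | \
                      mailx -s "quota warning: $fs" root
          fi
  done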

thanks,
Martijn

-- 
YoungGuns
Kasteleinenkampweg 7b
5222 AX 's-Hertogenbosch
T. 073 623 56 40
F. 073 623 56 39
www.youngguns.nl
KvK 18076568


Re: [zfs-discuss] (Practical) limit on the number of snapshots?

2010-01-08 Thread Robert Milkowski

On 08/01/2010 12:40, Peter van Gemert wrote:

By having a snapshot you
are not releasing the
space forcing zfs to allocate new space from other
parts of a disk
drive. This may lead (depending on workload) to more
fragmentation, less
localized data (more and longer seeks).

 

ZFS uses COW (copy on write) during writes. This means that it first has to 
find a new location for the data and when this data is written, the original 
block is released. When using snapshots, the original block is not released.

I don't think the use of snapshots will alter the way data is fragmented or 
localized on disk.

---
PeterVG
   


Well, it will (depending on workload).
For example, let's say you have an 80GB disk drive as a pool with a 
single db file which is 1GB in size.
Now no snapshots are created and you are constantly modifying logical 
blocks in the file. ZFS will release each old block and re-use it 
later on, so all current data should stay roughly within the first 2GB of 
the disk drive and therefore be highly localized.


Now if you create a snapshot while modifying data, then another one 
and another one, you end up in a situation where the free blocks are 
available further and further along the disk drive. By the time you have almost 
filled the disk drive, even if you then delete all snapshots, your active 
data will be scattered all over the disk (assuming you were not modifying 
100% of the data between snapshots). It won't be highly localized 
anymore.


--
Robert Milkowski
http://milek.blogspot.com




Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Mike Gerdts
On Fri, Jan 8, 2010 at 6:55 AM, Darren J Moffat darr...@opensolaris.org wrote:
 Frank Batschulat (Home) wrote:

 This just can't be an accident, there must be some coincidence and thus
 there's a good chance
 that these CHKSUM errors must have a common source, either in ZFS or in
 NFS ?

 What are you using for on the wire protection with NFS ?  Is it shared using
 krb5i or do you have IPsec configured ?  If not I'd recommend trying one of
 those and see if your symptoms change.

Shouldn't a scrub pick that up?  And why would there be no errors from
zoneadm install, which under the covers does a pkg image create
followed by *multiple* pkg install invocations?  No checksum errors
pop up there.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Mike Gerdts
On Fri, Jan 8, 2010 at 6:51 AM, James Carlson carls...@workingcode.com wrote:
 Frank Batschulat (Home) wrote:
 This just can't be an accident, there must be some coincidence and thus 
 there's a good chance
 that these CHKSUM errors must have a common source, either in ZFS or in NFS ?

 One possible cause would be a lack of substantial exercise.  The man
 page says:

         A regular file. The use of files as a backing  store  is
         strongly  discouraged.  It  is  designed  primarily  for
         experimental purposes, as the fault tolerance of a  file
         is  only  as  good  as  the file system of which it is a
         part. A file must be specified by a full path.

 Could it be that 'discouraged' and 'experimental' mean 'not tested as
 thoroughly as you might like, and certainly not a good idea in any sort
 of production environment'?

 It sounds like a bug, sure, but the fix might be to remove the option.

This unsupported feature is supported with the use of Sun Ops Center
2.5 when a zone is put on a NAS Storage Library.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] Thin device support in ZFS?

2010-01-08 Thread Daniel Carosone
Yet another way to thin-out the backing devices for a zpool on a
thin-provisioned storage host, today: resilver. 

If your zpool has some redundancy across the SAN backing LUNs, simply
drop and replace one at a time and allow zfs to resilver only the
blocks currently in use onto the replacement LUN.
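A sketch, with hypothetical LUN names:

  # replace one backing LUN at a time; the resilver copies only allocated blocks
  zpool replace tank c0t600A0B8000AAAAAAd0 c0t600A0B8000BBBBBBd0
  # wait for the resilver to finish before doing the next LUN
  zpool status tank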

--
Dan.



[zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)

2010-01-08 Thread Lutz Schumann
Hello, 

today I wanted to test that the failure of the L2ARC device is not crucial to 
the pool. I added an Intel X25-M Postville (160GB) as a cache device to a 4-disk 
mirror pool. Then I started a SYNC iozone on the pool:

iozone -ec -r 32k -s 2048m -l 2 -i 0 -i 2 -o

Pool: 

pool 
  mirror-0
disk1
disk2
  mirror-1
disk3
disk4
cache
  intel-postville-ssd

Then I pulled the power cable of the SSD device (not the SATA connector) and 
from that moment on, all pool-related commands hang (e.g. zpool iostat -v).

I've waited 20 minutes now - still hangs :(

I can log in to the system itself (after some time, so the whole system is 
sluggish), so the syspool (which is a separate device) is OK. Release is 
snv_104.

dmesg shows: 

Jan  8 15:21:42 nexenta gda: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci-...@14,1/i...@1/c...@1,0 (Disk6):
Jan  8 15:21:42 nexenta Error for command 'write sector'Error 
Level: Informational
Jan  8 15:21:42 nexenta gda: [ID 107833 kern.notice]Sense Key: aborted 
command
Jan  8 15:21:42 nexenta gda: [ID 107833 kern.notice]Vendor 'Gen-ATA ' error 
code: 0x3
Jan  8 15:21:47 nexenta genunix: [ID 698548 kern.notice] ata_disk_start: select 
failed
Jan  8 15:21:47 nexenta gda: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci-...@14,1/i...@1/c...@0,0 (Disk5):
Jan  8 15:21:47 nexenta Error for command 'write sector'Error 
Level: Informational
Jan  8 15:21:47 nexenta gda: [ID 107833 kern.notice]Sense Key: aborted 
command
Jan  8 15:21:47 nexenta gda: [ID 107833 kern.notice]Vendor 'Gen-ATA ' error 
code: 0x3
Jan  8 15:21:52 nexenta genunix: [ID 698548 kern.notice] ata_disk_start: select 
failed
Jan  8 15:21:57 nexenta scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci-...@14,1/i...@1 (ata9):

lspci: 

00:00.0 Host bridge: ATI Technologies Inc RX780/RX790 Chipset Host Bridge
00:02.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (external gfx0 
port A)
00:05.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express 
gpp port B)
00:06.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express 
gpp port C)
00:0a.0 PCI bridge: ATI Technologies Inc RD790 PCI to PCI bridge (PCI express 
gpp port F)
00:11.0 IDE interface: ATI Technologies Inc SB700/SB800 SATA Controller [IDE 
mode]
00:12.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:12.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:12.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:13.0 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI0 Controller
00:13.1 USB Controller: ATI Technologies Inc SB700 USB OHCI1 Controller
00:13.2 USB Controller: ATI Technologies Inc SB700/SB800 USB EHCI Controller
00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 3c)
00:14.1 IDE interface: ATI Technologies Inc SB700/SB800 IDE Controller
00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA)
00:14.3 ISA bridge: ATI Technologies Inc SB700/SB800 LPC host controller
00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge
00:14.5 USB Controller: ATI Technologies Inc SB700/SB800 USB OHCI2 Controller
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] 
HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address 
Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM 
Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] 
Miscellaneous Control
01:00.0 VGA compatible controller: nVidia Corporation G72 [GeForce 7300 SE/7200 
GS] (rev a1)
02:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet 
Controller (Copper) (rev 06)
03:00.0 Ethernet controller: Intel Corporation 82572EI Gigabit Ethernet 
Controller (Copper) (rev 06)
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI 
Express Gigabit Ethernet controller (rev 02)
05:07.0 Ethernet controller: Intel Corporation 82541PI Gigabit Ethernet 
Controller (rev 05)
05:0e.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 
Controller (PHY/Link)


Anyone seen something like this?

Hardware is a standard Gigabyte mainboard with on-board SATA.

Regards,


Re: [zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)

2010-01-08 Thread Lutz Schumann
Ok, 

I have now waited 30 minutes - still hung. After that I also pulled the SATA 
cable to the L2ARC device - still no success (I waited 10 minutes).

After 10 minutes I put the L2ARC device back (SATA + power).

20 seconds after that the system continued to run.

dmesg shows: 

Jan  8 15:41:57 nexenta scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci-...@14,1/i...@1 (ata9):
Jan  8 15:41:57 nexenta timeout: early timeout, target=1 lun=0
Jan  8 15:41:57 nexenta gda: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci-...@14,1/i...@1/c...@1,0 (Disk6):
Jan  8 15:41:57 nexenta Error for command 'write sector'Error 
Level: Informational
Jan  8 15:41:57 nexenta gda: [ID 107833 kern.notice]Sense Key: aborted 
command
Jan  8 15:41:57 nexenta gda: [ID 107833 kern.notice]Vendor 'Gen-ATA ' error 
code: 0x3
Jan  8 15:42:01 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, 
TYPE: Fault, VER: 1, SEVERITY: Major
Jan  8 15:42:01 nexenta EVENT-TIME: Fri Jan  8 15:41:59 CET 2010
Jan  8 15:42:01 nexenta PLATFORM: GA-MA770-UD3, CSN:  , HOSTNAME: nexenta
Jan  8 15:42:01 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Jan  8 15:42:01 nexenta EVENT-ID: aca93a91-e013-c1b8-a5b7-fff547b2a61e
Jan  8 15:42:01 nexenta DESC: The number of I/O errors associated with a ZFS 
device exceeded
Jan  8 15:42:01 nexenta  acceptable levels.  Refer to 
http://sun.com/msg/ZFS-8000-FD for more information.
Jan  8 15:42:01 nexenta AUTO-RESPONSE: The device has been offlined and marked 
as faulted.  An attempt
Jan  8 15:42:01 nexenta  will be made to activate a hot spare if 
available.
Jan  8 15:42:01 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Jan  8 15:42:01 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad 
device.
Jan  8 15:42:13 nexenta fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, 
TYPE: Fault, VER: 1, SEVERITY: Major
Jan  8 15:42:13 nexenta EVENT-TIME: Fri Jan  8 15:42:12 CET 2010
Jan  8 15:42:13 nexenta PLATFORM: GA-MA770-UD3, CSN:  , HOSTNAME: nexenta
Jan  8 15:42:13 nexenta SOURCE: zfs-diagnosis, REV: 1.0
Jan  8 15:42:13 nexenta EVENT-ID: 781fa01d-394f-c24d-b900-c114d1cd9d06
Jan  8 15:42:13 nexenta DESC: The number of I/O errors associated with a ZFS 
device exceeded
Jan  8 15:42:13 nexenta  acceptable levels.  Refer to 
http://sun.com/msg/ZFS-8000-FD for more information.
Jan  8 15:42:13 nexenta AUTO-RESPONSE: The device has been offlined and marked 
as faulted.  An attempt
Jan  8 15:42:13 nexenta  will be made to activate a hot spare if 
available.
Jan  8 15:42:13 nexenta IMPACT: Fault tolerance of the pool may be compromised.
Jan  8 15:42:13 nexenta REC-ACTION: Run 'zpool status -x' and replace the bad 
device.

.. the device is seen as faulted:

  pool: data
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jan  8 15:42:03 2010
config:

NAMESTATE READ WRITE CKSUM
dataONLINE   0 0 0
  mirrorONLINE   0 0 0
c3d0ONLINE   0 0 0
c6d0ONLINE   0 0 0  512 resilvered
  mirrorONLINE   0 0 0
c3d1ONLINE   0 0 0
c4d0ONLINE   0 0 0
cache
  c6d1  FAULTED  0   499 0  too many errors

.. however zpool iostat -v still shows the device 

r...@nexenta:/export/home/admin# zpool iostat -v 1
   capacity operationsbandwidth
pool used  avail   read  write   read  write
--  -  -  -  -  -  -
data 209G  1.61T  0129  0  4.64M
  mirror 104G   824G  0 64  0  2.34M
c3d0-  -  0 64  0  2.34M
c6d0-  -  0 64  0  2.34M
  mirror 104G   824G  0 64  0  2.31M
c3d1-  -  0 64  0  2.31M
c4d0-  -  0 64  0  2.31M
cache   -  -  -  -  -  -
  c6d1   137M   149G  0  0  0  0
--  -  -  -  -  -  -
syspool 2.18G   462G  0  0  0  0
  c4d1s02.18G   462G  0  0  0  0
--  -  -  -  -  -  -

So this seems to be a hardware issue.

I would expect that there is some general in-kernel timeout for I/Os so that 
strangely failing and unresponsive devices (and real failures are like this) are 
taken offline.

Did I miss something? Is there a tunable (/etc/system)?

Thanks for your responses :)


Re: [zfs-discuss] (Practical) limit on the number of snapshots?

2010-01-08 Thread David Dyer-Bennet

On Fri, January 8, 2010 07:51, Robert Milkowski wrote:
 On 08/01/2010 12:40, Peter van Gemert wrote:
 By having a snapshot you
 are not releasing the
 space forcing zfs to allocate new space from other
 parts of a disk
 drive. This may lead (depending on workload) to more
 fragmentation, less
 localized data (more and longer seeks).


 ZFS uses COW (copy on write) during writes. This means that it first has
 to find a new location for the data and when this data is written, the
 original block is released. When using snapshots, the original block is
 not released.

 I don't think the use of snapshots will alter the way data is fragmented
 or localized on disk.

 Well, it will (depending on workload).
 For example - lets say you have a 80GB disk drive as a pool with a
 single db file which is 1GB in size.
 Now no snapshots are created and you constantly are modyfing logical
 blocks in the file. As ZFS will release the old block and will re-use it
 later on so all current data should be roughly within the first 2GB of
 the disk drive therefore highly localized.

I thought block re-use was delayed to allow for TXG rollback, though? 
They'll certainly get reused eventually, but I think they get reused later
rather than sooner.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



[zfs-discuss] I/O errors after zfs promote back and forth

2010-01-08 Thread Nils Goroll

Hi,

I have just observed the following issue and I would like to ask if it is 
already known:


I'm using zones on ZFS filesystems which were cloned from a common template 
(which is itself an original filesystem). A couple of weeks ago, I did a pkg 
image-update, so all zone roots got cloned again and the new zone roots got 
promoted. I then decided to undo the update and promoted the original zone roots 
again.


So until today, the zone template was dependent upon one of the zone roots, and 
when I promoted it again to restore the intended order, all zones effectively 
crashed. When trying to execute processes in them, I got exec failures like this 
one:


# zlogin ZONE
[Connected to zone 'ZONE' pts/2]
zlogin: exec failure: I/O error

Is this issue known to anyone already?

Thank you, Nils



Re: [zfs-discuss] I/O errors after zfs promote back and forth

2010-01-08 Thread Nils Goroll

BTW, this was on snv_111b - sorry I forgot to mention.


Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Mike Gerdts
On Fri, Jan 8, 2010 at 5:28 AM, Frank Batschulat (Home)
frank.batschu...@sun.com wrote:
[snip]
 Hey Mike, you're not the only victim of these strange CHKSUM errors, I hit
 the same during my slightely different testing, where I'm NFS mounting an
 entire, pre-existing remote file living in the zpool on the NFS server and use
 that to create a zpool and install zones into it.

What does your overall setup look like?

Mine is:

T5220 + Sun System Firmware 7.2.4.f 2009/11/05 18:21
   Primary LDom
  Solaris 10u8
  Logical Domains Manager 1.2,REV=2009.06.25.09.48 + 142840-03
  Guest Domain 4 vcpus + 15 GB memory
 OpenSolaris snv_130
(this is where the problem is observed)

I've seen similar errors on Solaris 10 in the primary domain and on a
M4000.  Unfortunately Solaris 10 doesn't show the checksums in the
ereport.  There I noticed a mixture between read errors and checksum
errors - and lots more of them.  This could be because the S10 zone
was a full root SUNWCXall compared to the much smaller default ipkg
branded zone.  On the primary domain running Solaris 10...

(this command was run some time ago)
primary-domain# zpool status myzone
  pool: myzone
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
myzone  DEGRADED 0 0 0
  /foo/20g  DEGRADED 4.53K 0   671  too many errors

errors: No known data errors


(this was run today, many days after previous command)
primary-domain# fmdump -eV | egrep zio_err | uniq -c | head
   1zio_err = 5
   1zio_err = 50
   1zio_err = 5
   1zio_err = 50
   1zio_err = 5
   1zio_err = 50
   2zio_err = 5
   1zio_err = 50
   3zio_err = 5
   1zio_err = 50


Note that even though I had thousands of read errors the zone worked
just fine. I would have never known (suspected?) there was a problem
if I hadn't run zpool status or the various FMA commands.


 I've filed today:

 6915265 zpools on files (over NFS) accumulate CKSUM errors with no apparent 
 reason

Thanks.  I'll open a support call to help get some funding on it...

 here's the relevant piece worth investigating out of it (leaving out the 
 actual setup etc..)
 as in your case, creating the zpool and installing the zone into it still 
 gives
 a healthy zpool, but immediately after booting the zone, the zpool served 
 over NFS
 accumulated CHKSUM errors.

 of particular interest are the 'cksum_actual' values as reported by Mike for 
 his
 test case here:

 http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html

 if compared to the 'chksum_actual' values I got in the fmdump error output on 
 my test case/system:

 note, the NFS servers zpool that is serving and sharing the file we use is 
 healthy.

 zone halted now on my test system, and checking fmdump:

 osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | 
 sort | uniq -c | sort -n | tail
   2    cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 
 0x7cd81ca72df5ccc0
   2    cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 
 0x3d2827dd7ee4f21
   6    cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 
 0x983ddbb8c4590e40
 *A   6    cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 
 0x89715e34fbf9cdc0
 *B   7    cksum_actual = 0x0 0x0 0x0 0x0
 *C  11    cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 
 0x280934efa6d20f40
 *D  14    cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 
 0x7e0aef335f0c7f00
 *E  17    cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 
 0xd4f1025a8e66fe00
 *F  20    cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 
 0x7f84b11b3fc7f80
 *G  25    cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 
 0x82804bc6ebcfc0

 osoldev.root./export/home/batschul.= zpool status -v
  pool: nfszone
  state: DEGRADED
 status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scrub: none requested
 config:

        NAME        STATE     READ WRITE CKSUM
        nfszone     DEGRADED     0     0     0
          /nfszone  DEGRADED     0     0   462  too many errors

 errors: No known data errors

 ==

 now compare this with Mike's error output as posted here:

 

Re: [zfs-discuss] (Practical) limit on the number of snapshots?

2010-01-08 Thread Robert Milkowski

On 08/01/2010 14:50, David Dyer-Bennet wrote:

On Fri, January 8, 2010 07:51, Robert Milkowski wrote:
   

On 08/01/2010 12:40, Peter van Gemert wrote:
 

By having a snapshot you
are not releasing the
space forcing zfs to allocate new space from other
parts of a disk
drive. This may lead (depending on workload) to more
fragmentation, less
localized data (more and longer seeks).


 

ZFS uses COW (copy on write) during writes. This means that it first has
to find a new location for the data and when this data is written, the
original block is released. When using snapshots, the original block is
not released.

I don't think the use of snapshots will alter the way data is fragmented
or localized on disk.
   
   

Well, it will (depending on workload).
For example - lets say you have a 80GB disk drive as a pool with a
single db file which is 1GB in size.
Now no snapshots are created and you constantly are modyfing logical
blocks in the file. As ZFS will release the old block and will re-use it
later on so all current data should be roughly within the first 2GB of
the disk drive therefore highly localized.
 

I thought block re-use was delayed to allow for TXG rollback, though?
They'll certainly get reused eventually, but I think they get reused later
rather than sooner.

   


Yes, there is a delay, but IIRC it is only several transactions, while the 
above scenario in practice usually means a snapshot a day and keeping 30 of 
them.


--
Robert Milkowski
http://milek.blogspot.com



Re: [zfs-discuss] link in zpool upgrade -v broken

2010-01-08 Thread Cindy Swearingen

Hi Ian,

I see the problem. In your included URL below, you didn't
include the /N suffix as included in the zpool upgrade
output.

CR 6898657 is still filed to identify the change.

If you copy and paste the URL from the zpool upgrade -v output:

http://www.opensolaris.org/os/community/zfs/version/N

You will be redirected to the new version page:

http://hub.opensolaris.org/bin/view/Community+Group+zfs/N

See the output below.

Thanks,

Cindy

# zpool upgrade -v
This system is currently running ZFS pool version 22.

The following versions are supported:

VER  DESCRIPTION
---  
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history
 5   Compression using the gzip algorithm
 6   bootfs pool property
 7   Separate intent log devices
 8   Delegated administration
 9   refquota and refreservation properties
 10  Cache devices
 11  Improved scrub performance
 12  Snapshot properties
 13  snapused property
 14  passthrough-x aclinherit
 15  user/group space accounting
 16  stmf property support
 17  Triple-parity RAID-Z
 18  Snapshot user holds
 19  Log device removal
 20  Compression using zle (zero-length encoding)
 21  Deduplication
 22  Received properties

For more information on a particular version, including supported 
releases, see:


http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.



On 01/07/10 16:52, Ian Collins wrote:

http://www.opensolaris.org/os/community/zfs/version/

No longer exists.  Is there a bug for this yet?




Re: [zfs-discuss] zpool iostat -v hangs on L2ARC failure (SATA, 160 GB Postville)

2010-01-08 Thread Lutz Schumann
Ok, after browsing I found that the SATA disks are not shown via cfgadm. 
I found http://opensolaris.org/jive/message.jspa?messageID=287791tstart=0 
which states that you have to set the mode to AHCI to enable hot-plug etc.

However I still think the plain IDE driver also needs a timeout to handle disk 
failures, because cables etc. can fail.

I looked in the BIOS and it seems the disks are in IDE mode. There is an AHCI 
mode, however I do not know if I can switch without reinstalling.

Is it possible to set AHCI without reinstalling OSol?

Regards


Re: [zfs-discuss] (Practical) limit on the number of snapshots?

2010-01-08 Thread Bob Friesenhahn

On Fri, 8 Jan 2010, Peter van Gemert wrote:


I don't think the use of snapshots will alter the way data is 
fragmented or localized on disk.


What happens after a snapshot is deleted?

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


[zfs-discuss] ZFS partially hangs when removing an rpool mirrored disk while having some IO on another pool on another partition of the same disk

2010-01-08 Thread Arnaud Brand
Hello,



Sorry for the (very) long subject, but I've pinpointed the problem to this exact 
situation.

I know about the other threads related to hangs, but in my case there was no 
'zfs destroy' involved, nor any compression or deduplication.



To make a long story short, when

- a disk contains 2 partitions  (p1=32GB, p2=1800 GB) and

- p1 is used as part of a zfs mirror of rpool and

- p2 is used as part of a raidz (tested raidz1 and raidz2) of tank and

- some serious work  is underway on tank (tested write, copy, scrub),

then physically removing the disk makes zfs partially hang. Putting the 
physical disk back does not help.



For the long story :



About the hardware :

1 x intel X25E (64GB SSD), 15x2TB SATA drives (7xWD, 8xHitachi), 2xQuadCore 
Xeon, 12GB RAM, 2xAreca-1680 (8-ports SAS controller), tyan S7002 mainboard.



About the software / firmware :

Opensolaris b130 installed on the SSD drive, on the first 32 GB.

The areca cards are configured as a JBOD and are running the latest release 
firmware.



Initial setup :

We created a 32GB partition on all of the 2TB drives and mirrored the system 
partition, giving us a 16-way rpool mirror.

The rest of the 2TB drives's space was put in a second partition and used for a 
raidz2 pool (named tank)



Problem :

Whenever we physically removed a disk from its tray while doing some speed 
testing on the tank pool, the system hung.

At that time I hadn't read all the threads about zfs hangs and couldn't 
determine whether the whole system was hung or just zfs.

In order to pinpoint the problem, we made another setup.



Second setup :

I  reduced the number of partitions in the rpool mirror down to 3 (p1 from the 
SSD, p1 from a 2TB drive on the same controller as the SSD and p1 from a 2TB 
drive on the other controller).



Problem :

When the system is quiet, I am able to physically remove any disk, plug it back 
and resilver it.

When I am putting some load on the tank pool, I can remove any disk that does 
*not* contain the rpool mirror (I can plug it back and resilver it while the 
load keeps running without noticeable performance impact).



When I am putting some load on the tank pool, I cannot physically remove a disk 
that also contains a mirror of the rpool or zfs partially hangs.

When I say partially, I mean that :

- zpool iostat -v tank 5 freezes

- if I run any zpool command related to rpool, I'm stuck (zpool clear rpool 
c4t0d7s0 for example or zpool status rpool)



I can't launch new programs, but already launched programs continue to run (at 
least in an ssh session, since gnome becomes more and more frozen as you move 
from window to window).



From ssh sessions :



- prstat shows that only gnome-system-monitor, xorg, ssh, bash and various 
*stat utils (prstat, fstat, iostat, mpstat) are consuming some CPU.



- zpool iostat -v tank 5 is frozen (It freezes when I issue a zpool clear rpool 
c4t0d7s0 in another session)



- iostat -xn is not stuck but shows all zeroes since the very moment zpool 
iostat froze (which is quite strange if you look at the fsstat output hereafter). 
NB: when I say all zeroes, I really mean it; it's not zero dot something, it's 
zero dot zero.



- mpstat shows normal activity (almost nothing since this is a test machine, so 
only a few percent are used, but it still shows some activity and refreshes 
correctly)

CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl

  00   0  125   428  109  1131400   2512   0   0  98

  10   0   2056   16   442210   277   11   1   0  88

  20   0  163   152   13  3091300  13704   0   0  96

  30   0   19   111   41   900400800   0   0 100

  40   0   69   192   17   660300200   0   0 100

  50   0   10617   920400   1670   0   0 100

  60   0   96   191   25   740410 50   0   0 100

  70   0   16586   630310590   0   0 100



- fsstat -F 5 shows all zeroes but for the zfs line (the figures hereunder stay 
almost the same over time)

new  name   name  attr  attr lookup rddir  read read  write write

file remov  chng   get   setops   ops   ops bytes   ops bytes

0 0 0 1,25K 0  2,51K 0   803 11,0M   473 11,0M zfs



- disk leds show no activity



- I cannot run any other command (neither from ssh, nor from gnome)



- I cannot open another ssh session (I don't even get the login prompt in putty)



- I can successfully ping the machine



- I cannot establish a new CIFS session (the login prompt should not appear 
since the machine is in an Active Directory domain, but when it's stuck the 
prompt appears and I cannot authenticate. I guess it's related to ldap or 
kerberos or whatever cannot be read on rpool), but an already active session 
will stay open (last time I even managed to create a text file with a few lines

Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Mike Gerdts
On Fri, Jan 8, 2010 at 9:11 AM, Mike Gerdts mger...@gmail.com wrote:
 I've seen similar errors on Solaris 10 in the primary domain and on a
 M4000.  Unfortunately Solaris 10 doesn't show the checksums in the
 ereport.  There I noticed a mixture between read errors and checksum
 errors - and lots more of them.  This could be because the S10 zone
 was a full root SUNWCXall compared to the much smaller default ipkg
 branded zone.  On the primary domain running Solaris 10...

I've written a dtrace script to get the checksums on Solaris 10.
Here's what I see with NFSv3 on Solaris 10.

# zoneadm -z zone1 halt ; zpool export pool1 ; zpool import -d
/mnt/pool1 pool1 ; zoneadm -z zone1 boot ; sleep 30 ; pkill dtrace

# ./zfs_bad_cksum.d
Tracing...
dtrace: error on enabled probe ID 9 (ID 43443:
fbt:zfs:zio_checksum_error:return): invalid address (0x301b363a000) in
action #4 at DIF offset 20
dtrace: error on enabled probe ID 9 (ID 43443:
fbt:zfs:zio_checksum_error:return): invalid address (0x3037f746000) in
action #4 at DIF offset 20
cccdtrace:
error on enabled probe ID 9 (ID 43443:
fbt:zfs:zio_checksum_error:return): invalid address (0x3026e7b) in
action #4 at DIF offset 20
cc
Checksum errors:
   3 : 0x130e01011103 0x20108 0x0 0x400 (fletcher_4_native)
   3 : 0x220125cd8000 0x62425980c08 0x16630c08296c490c
0x82b320c082aef0c (fletcher_4_native)
   3 : 0x2f2a0a202a20436f 0x7079726967687420 0x2863292032303031
0x2062792053756e20 (fletcher_4_native)
   3 : 0x3c21444f43545950 0x452048544d4c2050 0x55424c494320222d
0x2f2f5733432f2f44 (fletcher_4_native)
   3 : 0x6005a8389144 0xc2080e6405c200b6 0x960093d40800
0x9eea007b9800019c (fletcher_4_native)
   3 : 0xac044a6903d00163 0xa138c8003446 0x3f2cd1e100b10009
0xa37af9b5ef166104 (fletcher_4_native)
   3 : 0xbaddcafebaddcafe 0xc 0x0 0x0 (fletcher_4_native)
   3 : 0xc4025608801500ff 0x1018500704528210 0x190103e50066
0xc34b90001238f900 (fletcher_4_native)
   3 : 0xfe00fc01fc42fc42 0xfc42fc42fc42fc42 0xfffc42fc42fc42fc
0x42fc42fc42fc42fc (fletcher_4_native)
   4 : 0x4b2a460a 0x0 0x4b2a460a 0x0 (fletcher_4_native)
   4 : 0xc00589b159a00 0x543008a05b673 0x124b60078d5be
0xe3002b2a0b605fb3 (fletcher_4_native)
   4 : 0x130e010111 0x32000b301080034 0x10166cb34125410
0xb30c19ca9e0c0860 (fletcher_4_native)
   4 : 0x130e010111 0x3a201080038 0x104381285501102
0x418016996320408 (fletcher_4_native)
   4 : 0x130e010111 0x3a201080038 0x1043812c5501102
0x81802325c080864 (fletcher_4_native)
   4 : 0x130e010111 0x3a0001c01080038 0x1383812c550111c
0x818975698080864 (fletcher_4_native)
   4 : 0x1f81442e9241000 0x2002560880154c00 0xff10185007528210
0x19010003e566 (fletcher_4_native)
   5 : 0xbab10c 0xf 0x53ae 0xdd549ae39aa1ba20 (fletcher_4_native)
   5 : 0x130e010111 0x3ab01080038 0x1163812c550110b
0x8180a7793080864 (fletcher_4_native)
   5 : 0x61626300 0x0 0x0 0x0 (fletcher_4_native)
   5 : 0x8003 0x3df0d6a1 0x0 0x0 (fletcher_4_native)
   6 : 0xbab10c 0xf 0x5384 0xdd549ae39aa1ba20 (fletcher_4_native)
   7 : 0xbab10c 0xf 0x0 0x9af5e5f61ca2e28e (fletcher_4_native)
   7 : 0x130e010111 0x3a201080038 0x104381265501102
0xc18c7210c086006 (fletcher_4_native)
   7 : 0x275c222074650a2e 0x5c222020436f7079 0x7269676874203139
0x38392041540a2e5c (fletcher_4_native)
   8 : 0x130e010111 0x3a0003101080038 0x1623812c5501131
0x8187f66a4080864 (fletcher_4_native)
   9 : 0x8a000801010c0682 0x2eed0809c1640513 0x70200ff00026424
0x18001d16101f0059 (fletcher_4_native)
  12 : 0xbab10c 0xf 0x0 0x45a9e1fc57ca2aa8 (fletcher_4_native)
  30 : 0xbaddcafebaddcafe 0xbaddcafebaddcafe 0xbaddcafebaddcafe
0xbaddcafebaddcafe (fletcher_4_native)
  47 : 0x0 0x0 0x0 0x0 (fletcher_4_native)
  92 : 0x130e01011103 0x10108 0x0 0x200 (fletcher_4_native)

Since I had to guess at what the Solaris 10 source looks like, some
extra eyeballs on the dtrace script are in order.

Mike

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


zfs_bad_cksum.d
Description: Binary data


[zfs-discuss] ZFS Dedup Performance

2010-01-08 Thread James Lee
I haven't seen much discussion on how deduplication affects performance.
I've enabled dedup on my 4-disk raidz array and have seen a significant
drop in write throughput, from about 100 MB/s to 3 MB/s.  I can't
imagine such a decrease is normal.

 # zpool iostat nest 1 (with dedup enabled):
 ...
 nest1.05T   411G 91 18   197K  2.35M
 nest1.05T   411G147 15   443K  1.98M
 nest1.05T   411G 82 28   174K  3.59M

 # zpool iostat nest 1 (with dedup disabled):
 ...
 nest1.05T   410G  0787  0  96.9M
 nest1.05T   410G  1899   253K  95.0M
 nest1.05T   409G  0533  0  48.5M

I do notice when dedup is enabled that the drives sound like they are
constantly seeking.  iostat shows average service times around 20 ms
which is normal for my drives and prstat shows that my processor and
memory aren't a bottleneck.  What could cause such a marked decrease in
throughput?  Is anyone else experiencing similar effects?

Thanks,

James


Re: [zfs-discuss] ZFS Dedup Performance

2010-01-08 Thread Ray Van Dolson
On Fri, Jan 08, 2010 at 10:00:14AM -0800, James Lee wrote:
 I haven't seen much discussion on how deduplication affects performance.
  I've enabled dudup on my 4-disk raidz array and have seen a significant
 drop in write throughput, from about 100 MB/s to 3 MB/s.  I can't
 imagine such a decrease is normal.

Seems like I've seen other posts with similar numbers (maybe 9MB/s or
so?).

Sounded like adding an SSD for caching really improved performance,
however.
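For example (the device name is a placeholder):

  # add an SSD as an L2ARC cache device so more of the dedup table stays cached
  zpool add nest cache c2t0d0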

 
  # zpool iostat nest 1 (with dedup enabled):
  ...
  nest1.05T   411G 91 18   197K  2.35M
  nest1.05T   411G147 15   443K  1.98M
  nest1.05T   411G 82 28   174K  3.59M
 
  # zpool iostat nest 1 (with dedup disabled):
  ...
  nest1.05T   410G  0787  0  96.9M
  nest1.05T   410G  1899   253K  95.0M
  nest1.05T   409G  0533  0  48.5M
 
 I do notice when dedup is enabled that the drives sound like they are
 constantly seeking.  iostat shows average service times around 20 ms
 which is normal for my drives and prstat shows that my processor and
 memory aren't a bottleneck.  What could cause such a marked decrease in
 throughput?  Is anyone else experiencing similar effects?
 
 Thanks,
 
 James

Ray


Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread James Carlson
Frank Batschulat (Home) wrote:
 This just can't be an accident, there must be some coincidence and thus 
 there's a good chance
 that these CHKSUM errors must have a common source, either in ZFS or in NFS ?

One possible cause would be a lack of substantial exercise.  The man
page says:

 A regular file. The use of files as a backing  store  is
 strongly  discouraged.  It  is  designed  primarily  for
 experimental purposes, as the fault tolerance of a  file
 is  only  as  good  as  the file system of which it is a
 part. A file must be specified by a full path.

Could it be that 'discouraged' and 'experimental' mean 'not tested as
thoroughly as you might like, and certainly not a good idea in any sort
of production environment'?

It sounds like a bug, sure, but the fix might be to remove the option.

-- 
James Carlson 42.703N 71.076W carls...@workingcode.com


Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Frank Batschulat (Home)
On Fri, 08 Jan 2010 13:55:13 +0100, Darren J Moffat darr...@opensolaris.org 
wrote:

 Frank Batschulat (Home) wrote:
 This just can't be an accident, there must be some coincidence and thus 
 there's a good chance
 that these CHKSUM errors must have a common source, either in ZFS or in NFS ?

 What are you using for on the wire protection with NFS ?  Is it shared
 using krb5i or do you have IPsec configured ?  If not I'd recommend
 trying one of those and see if your symptoms change.

Hey Darren, doing krb5i is certainly a good idea for additional protection in 
general. However, I have some doubts that NFS on-the-wire (OTW) corruption would 
produce the exact same wrong checksums in 2 totally different setups and networks, 
as comparing Mike's results and mine showed [see 1].

cheers
frankB

[1]

osoldev.batschul./export/home/batschul.= fmdump -eV | grep cksum_actual | sort 
| uniq -c | sort -n | tail
   2cksum_actual = 0x4bea1a77300 0xf6decb1097980 0x217874c80a8d9100 
0x7cd81ca72df5ccc0
   2cksum_actual = 0x5c1c805253 0x26fa7270d8d2 0xda52e2079fd74 
0x3d2827dd7ee4f21
   6cksum_actual = 0x28e08467900 0x479d57f76fc80 0x53bca4db5209300 
0x983ddbb8c4590e40
*A   6cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 
0x89715e34fbf9cdc0
*B   7cksum_actual = 0x0 0x0 0x0 0x0
*C  11cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 
0x280934efa6d20f40
*D  14cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 
0x7e0aef335f0c7f00
*E  17cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 
0xd4f1025a8e66fe00
*F  20cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 
0x7f84b11b3fc7f80
*G  25cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 
0x82804bc6ebcfc0

==

now compare this with Mike's error output as posted here:

http://www.mail-archive.com/zfs-discuss@opensolaris.org/msg33041.html

# fmdump -eV | grep cksum_actual | sort | uniq -c | sort -n | tail

   2cksum_actual = 0x14c538b06b6 0x2bb571a06ddb0 0x3e05a7c4ac90c62 
0x290cbce13fc59dce
*D   3cksum_actual = 0x175bb95fc00 0x1767673c6fe00 0xfa9df17c835400 
0x7e0aef335f0c7f00
*E   3cksum_actual = 0x2eb772bf800 0x5d8641385fc00 0x7cf15b214fea800 
0xd4f1025a8e66fe00
*B   4cksum_actual = 0x0 0x0 0x0 0x0
   4cksum_actual = 0x1d32a7b7b00 0x248deaf977d80 0x1e8ea26c8a2e900 
0x330107da7c4bcec0
   5cksum_actual = 0x14b8f7afe6 0x915db8d7f87 0x205dc7979ad73 
0x4e0b3a8747b8a8
*C   6cksum_actual = 0x1184cb07d00 0xd2c5aab5fe80 0x69ef5922233f00 
0x280934efa6d20f40
*A   6cksum_actual = 0x348e6117700 0x765aa1a547b80 0xb1d6d98e59c3d00 
0x89715e34fbf9cdc0
*F  16cksum_actual = 0xbaddcafe00 0x5dcc54647f00 0x1f82a459c2aa00 
0x7f84b11b3fc7f80
*G  48cksum_actual = 0x5d6ee57f00 0x178a70d27f80 0x3fc19c3a19500 
0x82804bc6ebcfc0

and observe that the 'cksum_actual' values that ultimately cause our CHKSUM pool 
errors (because they mismatch what was expected) are the SAME for 2 totally 
different client systems and 2 different NFS servers (mine vs. Mike's); 
see the entries marked *A to *G.


Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread James Carlson
Mike Gerdts wrote:
 This unsupported feature is supported with the use of Sun Ops Center
 2.5 when a zone is put on a NAS Storage Library.

Ah, ok.  I didn't know that.

-- 
James Carlson 42.703N 71.076W carls...@workingcode.com


Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Torrey McMahon

On 1/8/2010 10:04 AM, James Carlson wrote:

Mike Gerdts wrote:
   

This unsupported feature is supported with the use of Sun Ops Center
2.5 when a zone is put on a NAS Storage Library.
 

Ah, ok.  I didn't know that.

   


Does anyone know how that works? I can't find it in the docs, no one 
inside of Sun seemed to have a clue when I asked around, etc. RTFM 
gladly taken.



Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Richard Elling

On Jan 8, 2010, at 6:20 AM, Frank Batschulat (Home) wrote:

On Fri, 08 Jan 2010 13:55:13 +0100, Darren J Moffat darr...@opensolaris.org 
 wrote:



Frank Batschulat (Home) wrote:
This just can't be an accident, there must be some coincidence and  
thus there's a good chance
that these CHKSUM errors must have a common source, either in ZFS  
or in NFS ?


What are you using for on the wire protection with NFS ?  Is it  
shared

using krb5i or do you have IPsec configured ?  If not I'd recommend
trying one of those and see if your symptoms change.


Hey Darren, doing krb5i is certainly a good idea for additional  
protection in general,
however I have some doubts that NFS OTW corruption will produce the  
exact same
wrong checksum inside 2 totally different setups and networks, as  
comparing

Mike and my results showed [see 1].


Attach a mirror (not on NFS) and see if the bitmap yields any clues.
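Something like this, assuming a local scratch file at least as large as the 
NFS-backed one (path and size are placeholders):

  # mirror the NFS-backed file vdev with a local file and let it resilver
  mkfile 20g /var/tmp/nfszone.mirror
  zpool attach nfszone /nfszone /var/tmp/nfszone.mirror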
 -- richard



Re: [zfs-discuss] link in zpool upgrade -v broken

2010-01-08 Thread Ian Collins

Cindy Swearingen wrote:

Hi Ian,

I see the problem. In your included URL below, you didn't
include the /N suffix as included in the zpool upgrade
output.


That's correct, N is the version number.  I see it is fixed now, thanks.

--
Ian.



Re: [zfs-discuss] [zones-discuss] Zones on shared storage - a warning

2010-01-08 Thread Mike Gerdts
On Fri, Jan 8, 2010 at 12:28 PM, Torrey McMahon tmcmah...@yahoo.com wrote:
 On 1/8/2010 10:04 AM, James Carlson wrote:

 Mike Gerdts wrote:


 This unsupported feature is supported with the use of Sun Ops Center
 2.5 when a zone is put on a NAS Storage Library.


 Ah, ok.  I didn't know that.



 Does anyone know how that works? I can't find it in the docs, no one inside
 of Sun seemed to have a clue when I asked around, etc. RTFM gladly taken.

Storage libraries are discussed very briefly at:

http://wikis.sun.com/display/OC2dot5/Storage+Libraries

Creation of zones is discussed at:

http://wikis.sun.com/display/OC2dot5/Creating+Zones

I've found no documentation that explains the implementation details.
From looking at a test environment that I have running, it seems to go
like:

1. The storage admin carves out some NFS space and exports it with the
appropriate options to the  various hosts (global zones).

2. In the Ops Center BUI, the ops center admin creates a new storage
library.  He selects type NFS and specifies the hostname and path that
was allocated.

3. The ops center admin associates the storage library with various
hosts.  This causes it to be mounted at
/var/mnt/virtlibs/libraryId on those hosts.  I'll call this $libmnt.

4. When the sysadmin provisions a zone through ops center, a UUID is
allocated and associated with this zone.  I'll call it $zuuid.  A
directory $libmnt/$zuuid is created with a set of directories under
it.

5. As the sysadmin provisions the zone, ops center prompts for the virtual disk
size.  A file of that size is created at $libmnt/$zuuid/virtdisk/data.

6. Ops center creates a zpool:

zpool create -m /var/mnt/oc-zpools/$zuuid/ z$zuuid \
 $libmnt/$zuuid/virtdisk/data

7. The zonepath dataset is created using a uuid that is unique to the zonepath
($puuid): z$zuuid/$puuid.  It has a quota and a reservation set (8G
each in the zpool history I am looking at).

8. The zone is configured with
zonepath=/var/mnt/oc-zpools/$zuuid/$puuid, then installed

Just in case anyone sees this as the right way to do things, I think
it is generally OK with a couple of caveats. The key areas that I would
suggest for improvement are:

- Mount the NFS space with -o forcedirectio; there is no need to
cache data twice (see the sketch after this list).
- Never use UUIDs in paths.  This makes it nearly impossible for a
sysadmin or a support person to look at the output of commands on the
system and understand what it is doing.
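A sketch of the first suggestion (hostname and share path are placeholders, and 
the mount would normally go in the automounter or vfstab rather than being run 
by hand):

  # mount the library share without client-side caching; ZFS caches the data anyway
  mount -F nfs -o forcedirectio nas-server:/export/virtlib /var/mnt/virtlibs/libraryId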

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] ZFS Dedup Performance

2010-01-08 Thread Lutz Schumann
See the reads on the pool with the low I/O ? I suspect reading the DDT causes 
the writes to slow down. 

See this bug 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913566. It seems to 
give some backgrounds. 

Can you test setting primarycache=metadata on the volume you are testing
(example below)? That would be my initial test. It may improve the situation
because your ARC can then be better utilized for the DDT. (This does not make
much sense for production without an SSD cache, because without an L2ARC (aka
SSD) you practically disable all read caching!)
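
A minimal example; tank/testvol is just a placeholder for the volume under
test:

  zfs set primarycache=metadata tank/testvol   # cache only metadata (no user data) in the ARC
  zfs get primarycache tank/testvol            # verify the setting
  zfs set primarycache=all tank/testvol        # revert after the test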

As I read the bug report above, it seems that if the DDT (deduplication table)
does not fit into memory, or is dropped from it, the DDT has to be read from
disk, causing massive random I/O.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Dedup Performance

2010-01-08 Thread Ian Collins

James Lee wrote:

I haven't seen much discussion on how deduplication affects performance.
 I've enabled dedup on my 4-disk raidz array and have seen a significant
drop in write throughput, from about 100 MB/s to 3 MB/s.  I can't
imagine such a decrease is normal.

  

What is your data?

I've found that data which lends itself to deduplication writes slightly 
faster, while data that does not (video, ISO images) writes dramatically 
slower. So I turn dedup (and compression) off for filesystems 
containing such data.
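
For example (the dataset name is a placeholder; both properties affect
newly written blocks only):

  zfs set dedup=off tank/media
  zfs set compression=off tank/media
  zfs get dedup,compression tank/media    # confirm the current settings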


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Dedup Performance

2010-01-08 Thread James Dickens
On Fri, Jan 8, 2010 at 1:44 PM, Ian Collins i...@ianshome.com wrote:

 James Lee wrote:

 I haven't seen much discussion on how deduplication affects performance.
  I've enabled dedup on my 4-disk raidz array and have seen a significant
 drop in write throughput, from about 100 MB/s to 3 MB/s.  I can't
 imagine such a decrease is normal.



 What is your data?

 I have seen the same. fsstat reports 4-7 seconds of small writes, then
bursts of 40-80MB/s, but without dedup I see 80-150MB/s writes on my 4x 500GB
SATA drives, split between two controllers, with 6GB of RAM and about 1.5TB of
storage with 1.2TB used. If I disable dedup, speed goes back up. While doing
dedup writes, zfs destroy pool/filesystem takes about 100x as long as usual,
even if the filesystem being destroyed is empty; reports say it's far worse
when over 100GB of data is on a drive. My dedup ratio for the pool is 1.15x.
Read performance seems about the same or slightly faster; I didn't really
benchmark that workload since my clients seem to be the bottleneck.

As money is tight at the moment I don't have the funds for an SSD to test
with, but I do have spare space on an under-utilized disk to try. I haven't
researched the effect of adding and removing (if that is possible) L2ARC or
ZIL log slices on a pool; it would be great to enable a 5-50GB slice off a
SATA drive as a log device for greater performance.
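
For reference, a rough sketch of the commands involved -- the device/slice
names are made up, and removing a log device needs a sufficiently recent
pool version, so check your pool version before relying on it:

  zpool add tank log c1t2d0s3      # add a slice as a separate intent log (slog)
  zpool add tank cache c1t2d0s4    # add a slice as an L2ARC cache device

  zpool remove tank c1t2d0s4       # cache devices can be removed at any time
  zpool remove tank c1t2d0s3       # log removal only on newer pool versions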


James Dickens
uadmin.blogspot.com

I've found that data which lends itself to deduplication writes slightly
 faster, while data that does not (video, ISO images) writes dramatically
 slower. So I turn dedup (and compression) off for filesystems containing
 such data.

 --
 Ian.


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Dedup Performance

2010-01-08 Thread James Lee
On 01/08/2010 02:42 PM, Lutz Schumann wrote:
 Do you see the reads on the pool while the I/O is low? I suspect
 reading the DDT causes the writes to slow down.
 
 See this bug:
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6913566.
 It seems to give some background.
 
 Can you test setting primarycache=metadata on the volume you are
 testing? That would be my initial test. It may improve the situation
 because your ARC can then be better utilized for the DDT. (This does
 not make much sense for production without an SSD cache, because
 without an L2ARC (aka SSD) you practically disable all read caching!)
 
 As I read the bug report above, it seems that if the DDT
 (deduplication table) does not fit into memory, or is dropped from it,
 the DDT has to be read from disk, causing massive random I/O.

The symptoms described in that bug report do match up with mine.  I have
also experienced long hang times (1hr) destroying a dataset while the
disk just thrashes.

I tried setting primarycache=metadata, but that did not help.  I
pulled the DDT statistics for my pool, but don't know how to determine
its physical size-on-disk from that.  If deduplication ends up requiring
a separate sort-of log device, that will be a real shame.

 # zdb -DD nest
 DDT-sha256-zap-duplicate: 780321 entries, size 338 on disk, 174 in core
 DDT-sha256-zap-unique: 6188123 entries, size 335 on disk, 164 in core
 
 DDT histogram (aggregated over all DDTs):
 
 bucket  allocated   referenced  
 __   __   __
 refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
 --   --   -   -   -   --   -   -   -
  15.90M752G729G729G5.90M752G729G729G
  2 756K   94.0G   93.7G   93.6G1.48M188G187G187G
  45.36K152M   80.3M   81.5M22.4K618M325M330M
  8  258   4.05M   1.93M   2.00M2.43K   36.7M   16.3M   16.9M
 16   30434K 42K   50.9K  597   10.2M824K   1003K
 325255K   65.5K   66.6K  204   10.5M   3.26M   3.30M
 64   20   2.02M906K910K1.41K141M   62.0M   62.2M
1284  2K  2K   2.99K  723362K362K541K
2561 512 512 766  277138K138K207K
5122  1K  1K   1.50K1.62K830K830K   1.21M
  Total6.65M846G823G823G7.41M941G917G917G
 
 dedup = 1.11, compress = 1.03, copies = 1.00, dedup * compress / copies = 1.14
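
If the "size N on disk / N in core" figures are average bytes per entry
(just a guess on my part), a rough estimate from the two header lines
would be:

  echo $(( 780321 * 174 + 6188123 * 164 ))   # ~1.1 GB of DDT in core
  echo $(( 780321 * 338 + 6188123 * 335 ))   # ~2.3 GB of DDT on disk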

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] unable to zfs destroy

2010-01-08 Thread Rob Logan

This one has me a little confused. Ideas?

j...@opensolaris:~# zpool import z
cannot mount 'z/nukeme': mountpoint or dataset is busy
cannot share 'z/cle2003-1': smb add share failed
j...@opensolaris:~# zfs destroy z/nukeme
internal error: Bad exchange descriptor
Abort (core dumped)
j...@opensolaris:~# adb core
core file = core -- program ``/sbin/zfs'' on platform i86pc
SIGABRT: Abort
$c
libc_hwcap1.so.1`_lwp_kill+0x15(1, 6, 80462a8, fee9bb5e)
libc_hwcap1.so.1`raise+0x22(6, 0, 80462f8, fee7255a)
libc_hwcap1.so.1`abort+0xf2(8046328, fedd, 8046328, 8086570, 8086970, 400)
libzfs.so.1`zfs_verror+0xd5(8086548, 813, fedc5178, 804635c)
libzfs.so.1`zfs_standard_error_fmt+0x225(8086548, 32, fedc5178, 808acd0)
libzfs.so.1`zfs_destroy+0x10e(808acc8, 0, 0, 80479c8)
destroy_callback+0x69(808acc8, 8047910, 80555ec, 8047910)
zfs_do_destroy+0x31f(2, 80479c8, 80479c4, 80718dc)
main+0x26a(3, 80479c4, 80479d4, 8053fdf)
_start+0x7d(3, 8047ae4, 8047ae8, 8047af0, 0, 8047af9)
^d
j...@opensolaris:~# uname -a
SunOS opensolaris 5.11 snv_130 i86pc i386 i86pc
j...@opensolaris:~# zpool status -v z
  pool: z
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h39m, 19.15% done, 2h46m to go
config:

NAMESTATE READ WRITE CKSUM
z   ONLINE   0 0 2
  c3t0d0s7  ONLINE   0 0 4
  c3t1d0s7  ONLINE   0 0 0
  c2d0  ONLINE   0 0 4

errors: Permanent errors have been detected in the following files:

z/nukeme:0x0

j...@opensolaris:~# zfs list z/nukeme
NAME   USED  AVAIL  REFER  MOUNTPOINT
z/nukeme  49.0G   496G  49.0G  /z/nukeme
j...@opensolaris:~# zdb -d z/nukeme 0x0
zdb: can't open 'z/nukeme': Device busy

There is also no mount point /z/nukeme.

Any ideas how to nuke /z/nukeme?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] I/O Read starvation

2010-01-08 Thread bank kus
dd if=/dev/urandom of=largefile.txt bs=1G count=8

cp largefile.txt ./test/1.txt 
cp largefile.txt ./test/2.txt 

That's it; the system is now totally unusable after launching the two 8G copies.
Until these copies finish, no other application is able to launch completely.
Checking prstat shows them to be in the sleep state.

Question:
 I'm guessing this is because ZFS doesn't use CFQ and one process is allowed
to queue up all its I/O reads ahead of other processes?

 Is there a concept of priority among I/O reads? I only ask because if root
launches some GUI application, it doesn't start up until both copies are
done. So there is no concept of priority? Needless to say, this does not
happen on Linux 2.6...
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ssd pool + ssd cache ?

2010-01-08 Thread Francois
Hi list,

Experimental question ...
Imagine a pool made of SSD disks: is there any benefit in adding an SSD cache
device to it? What would the real impact be?

Thx.

--
Francois
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss