Re: [zfs-discuss] How to destroy your system in funny way with ZFS

2009-12-27 Thread James Lever
Hi Tomas,

On 27/12/2009, at 7:25 PM, Tomas Bodzar wrote:

 pfexec zpool set dedup=verify rpool
 pfexec zfs set compression=gzip-9 rpool
 pfexec zfs set devices=off rpool/export/home
 pfexec zfs set exec=off rpool/export/home
 pfexec zfs set setuid=off rpool/export/home

GRUB doesn’t support gzip - so you will need to unset that and hope that it can 
still boot with what has already been written to disk.  It is possible you will need to 
back up and reinstall.

I learnt this one the hard way - don’t use gzip compression on the root of your 
rpool (you can use it on child filesystems that are not involved in the boot process, 
though).
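
For example, something like this (untested here; adjust as needed) should put the
root filesystem back to a setting the GRUB ZFS reader can handle:

pfexec zfs set compression=on rpool    # 'on' defaults to lzjb, which GRUB can read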

HTH,
James
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to destroy your system in funny way with ZFS

2009-12-27 Thread Cyril Plisko
On Sun, Dec 27, 2009 at 11:25 AM, Tomas Bodzar bodz...@openbsd.cz wrote:
 Hi all,

 I installed another OpenSolaris (snv_129) in VirtualBox 3.1.0 on Windows 
 because snv_130 doesn't boot anymore after installation of the VirtualBox guest 
 additions. Older builds before snv_129 were running fine too. I like some 
 features of this OS, but now I've ended up with something funny.

 I installed a default snv_129, installed the guest additions (reboot), put 
 'set noexec_user_stack=1 ; set noexec_user_stack_log=1' in /etc/system (reboot) 
 and then:

 pfexec zpool set dedup=verify rpool
 pfexec zfs set compression=gzip-9 rpool
 pfexec zfs set devices=off rpool/export/home
 pfexec zfs set exec=off rpool/export/home
 pfexec zfs set setuid=off rpool/export/home

 After that I rebooted and ended up with the error which you can see in the attachment. 
 Booting with the -s or -sv flags doesn't help; it's in a loop and every time it will 
 end in this state without a chance to get to a login prompt, either CLI or GUI.

 Is there a way to do something with it, or is it hosed? :-) I want to know 
 just for my own knowledge and the knowledge of others, thanks to the forum archive. No 
 data or anything similar was in this VM, as it was only a test.


gzip compression is not supported by the GRUB ZFS reader. You should avoid
using it for the boot filesystem. You may try to revert the compression setting
to off or on (which defaults to lzjb) and try to boot that way
(that is, if you didn't rewrite any critical data after setting gzip
compression).


-- 
Regards,
Cyril
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to destroy your system in funny way with ZFS

2009-12-27 Thread Tomas Bodzar
Uh, but why did the system allow that if it won't boot that way? And how do I revert it, as I 
can't boot even to single-user mode? Is there a way to do that with a Live CD?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to destroy your system in funny way with ZFS

2009-12-27 Thread Thomas Burgess
You should be able to boot with the live CD and then import the pool, I would
think...


On Sun, Dec 27, 2009 at 4:40 AM, Tomas Bodzar bodz...@openbsd.cz wrote:

 Uh, but why system allowed that if it's not running? And how to revert it
 as I can't boot even to single user mode? Is there a way to do that with
 Live CD?
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-27 Thread Leonid Kogan

On 12/26/2009 10:41 AM, Saso Kiselkov wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Would an upgrade to the development repository of 2010.02 do the same?
I'd like to avoid having to do a complete reinstall, since I've got
quite a bit of custom software in the system already in various places
and recompiling and fine-tuning would take me another 1-2 days.

Regards,
- --
Saso

Leonid Kogan wrote:
   

Try b130.
http://genunix.org/

Cheers,
LK


On 12/26/2009 12:59 AM, Saso Kiselkov wrote:
 

Hi,

I tried it and I got the following error message:

# zfs set logbias=throughput content
cannot set property for 'content': invalid property 'logbias'

Is it because I'm running some older version which does not have this
feature? (2009.06)

Regards,
--
Saso

Leonid Kogan wrote:

   

Hi there,
Try to:
zfs set logbias=throughput yourdataset

Good luck,
LK


 


   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAks1zCIACgkQRO8UcfzpOHA1SQCaAqK+2v/+lQnuaXPc4pOju7UC
oaIAoNKJO3oOr4DCdCXHCp+vf2/Ri2mW
=pmGr
-END PGP SIGNATURE-
   

AFAIK yes.

LK

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to destroy your system in funny way with ZFS

2009-12-27 Thread Tomas Bodzar
So I booted from the Live CD and then:

zpool import
pfexec zpool import -f rpool
pfexec zfs set compression=off rpool
pfexec zpool export rpool

and rebooted, but still the same problem.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Supermicro AOC-USAS-L8i

2009-12-27 Thread Muhammed Syyid
Hi,
I just picked up one of these cards and have a few questions.
After installing it I can see it via scanpci, but none of the devices I've connected to 
it show up in iostat -En. Is there anything specific I need to do to 
enable them?

Do any of you experience the bug mentioned below? (I'm worried about using the card and 
losing my data.)
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6894775
http://opensolaris.org/jive/thread.jspa?threadID=117702&tstart=1
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] adding extra drives without creating a second parity set

2009-12-27 Thread Michael Armstrong
Hi, I currently have 4x 1TB drives in a raidz configuration. I want to  
add another 2x 1TB drives; however, if I simply zpool add, I will only  
gain an extra 1TB of space, as it will create a second raidz set inside  
the existing tank/pool. Is there a way to add my new drives into the  
existing raidz without losing even more space and without rebuilding the  
entire pool from the beginning? If not, is this something being worked  
on currently? Thanks and merry Xmas!



On 25 Dec 2009, at 20:00, zfs-discuss-requ...@opensolaris.org wrote:




Today's Topics:

  1. Re: Benchmarks results for ZFS + NFS, using SSD's as  slog
 devices (ZIL) (Freddie Cash)
  2. Re: Benchmarks results for ZFS + NFS,  using SSD's as  slog
 devices (ZIL) (Richard Elling)
  3. Re: Troubleshooting dedup performance (Michael Herf)
  4. ZFS write bursts cause short app stalls (Saso Kiselkov)


--

Message: 1
Date: Thu, 24 Dec 2009 17:34:32 PST
From: Freddie Cash fjwc...@gmail.com
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Benchmarks results for ZFS + NFS, using
SSD's as  slog devices (ZIL)
Message-ID: 2086438805.291261704902840.javamail.tweb...@sf-app1
Content-Type: text/plain; charset=UTF-8


Mattias Pantzare wrote:
That  would leave us with three options;

1) Deal with it and accept performance as it is.
2) Find a way to speed things up further for this
workload
3) Stop trying to use ZFS for this workload


Option 4 is to re-do your pool, using fewer disks per raidz2 vdev,  
giving more vdevs to the pool, and thus increasing the IOps for the  
whole pool.


14 disks in a single raidz2 vdev is going to give horrible IO,  
regardless of how fast the individual disks are.


Redoing it with 6-disk raidz2 vdevs, or even 8-drive raidz2 vdevs  
will give you much better throughput.
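
As a rough sketch (the pool and device names below are placeholders, not your
actual layout), that means building the pool from several narrower raidz2 vdevs
instead of one wide one:

# two 6-disk raidz2 vdevs in a single pool (placeholder device names)
zpool create tank \
    raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
    raidz2 c1t6d0 c1t7d0 c1t8d0 c1t9d0 c1t10d0 c1t11d0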


Freddie
--
This message posted from opensolaris.org


--

Message: 2
Date: Thu, 24 Dec 2009 17:39:11 -0800
From: Richard Elling richard.ell...@gmail.com
To: Freddie Cash fjwc...@gmail.com
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Benchmarks results for ZFS + NFS,using
SSD's as  slog devices (ZIL)
Message-ID: b8134afb-e6f1-4c62-a93b-d5826587b...@gmail.com
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

On Dec 24, 2009, at 5:34 PM, Freddie Cash wrote:


Mattias Pantzare wrote:
That  would leave us with three options;

1) Deal with it and accept performance as it is.
2) Find a way to speed things up further for this
workload
3) Stop trying to use ZFS for this workload


Option 4 is to re-do your pool, using fewer disks per raidz2 vdev,
giving more vdevs to the pool, and thus increasing the IOps for the
whole pool.

14 disks in a single raidz2 vdev is going to give horrible IO,
regardless of how fast the individual disks are.

Redoing it with 6-disk raidz2 vdevs, or even 8-drive raidz2 vdevs
will give you much better throughput.


At this point it is useful to know that if you do not have a
separate log, then the ZIL uses the pool and its data protection
scheme.  In other words, each ZIL write will be a raidz2 stripe
with its associated performance.
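
If ZIL latency turns out to be the bottleneck, a dedicated log device can be
added to an existing pool without rebuilding it; a minimal sketch (the device
name is a placeholder for your SSD):

# add a separate intent-log (slog) device to the pool
zpool add tank log c3t0d0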
 -- richard



--

Message: 3
Date: Thu, 24 Dec 2009 21:22:28 -0800
From: Michael Herf mbh...@gmail.com
To: Richard Elling richard.ell...@gmail.com
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Troubleshooting dedup performance
Message-ID:
c65729770912242122k1c3f9cf4hdfa1c17789393...@mail.gmail.com
Content-Type: text/plain; charset=ISO-8859-1

FWIW, I just disabled prefetch, and my dedup + zfs recv seems to be
running visibly faster (somewhere around 3-5x faster).

echo zfs_prefetch_disable/W0t1 | mdb -kw

Anyone else see a result like this?

I'm using the read bandwidth from the sending pool from zpool
iostat -x 5 to estimate transfer rate, since I assume the write rate
would be lower when dedup is working.

mike

p.s. Note to set it back to the default behavior:
echo zfs_prefetch_disable/W0t0 | mdb -kw
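
The mdb change only lasts until the next reboot. If the gain holds up, the usual
way to make it stick (a sketch; I haven't verified it on this particular build)
is an /etc/system entry:

# /etc/system -- keep ZFS file-level prefetch disabled across reboots
set zfs:zfs_prefetch_disable = 1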


--

Message: 4
Date: Fri, 25 Dec 2009 18:57:32 +0100
From: Saso Kiselkov skisel...@gmail.com
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] ZFS write bursts cause short app stalls
Message-ID: 4b34fd0c.8090...@gmail.com
Content-Type: text/plain; charset=ISO-8859-1

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I've started porting a video streaming 

Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-27 Thread Robert Milkowski

On 26/12/2009 12:22, Saso Kiselkov wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Thank you, the post you mentioned helped me move a bit forward. I tried
putting:

zfs:zfs_txg_timeout = 1

BTW, you can tune it on a live system without needing to reboot.

mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
zfs_txg_timeout:
zfs_txg_timeout:30
mi...@r600:~# echo zfs_txg_timeout/W0t1 | mdb -kw
zfs_txg_timeout:0x1e=   0x1
mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
zfs_txg_timeout:
zfs_txg_timeout:1
mi...@r600:~# echo zfs_txg_timeout/W0t30 | mdb -kw
zfs_txg_timeout:0x1 =   0x1e
mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
zfs_txg_timeout:
zfs_txg_timeout:30
mi...@r600:~#

--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to destroy your system in funny way with ZFS

2009-12-27 Thread Sriram Narayanan
You could revert to the @install snapshot (via the livecd) and see if
that works for you.
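
Roughly something like this from the live CD (a sketch only -- the boot environment
name below is an assumption, so check 'zfs list -t snapshot' for the real dataset
and snapshot names first):

pfexec zpool import -f rpool
pfexec zfs rollback -r rpool/ROOT/opensolaris@install   # BE name is a guess
pfexec zpool export rpool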

-- Sriram

On 12/27/09, Tomas Bodzar bodz...@openbsd.cz wrote:
 So I booted from Live CD and then :

 zpool import
 pfexec zpool import -f rpool
 pfexec zfs set compression=off rpool
 pfexec zpool export rpool

 and reboot but still same problem.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


-- 
Sent from my mobile device
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] adding extra drives without creating a second parity set

2009-12-27 Thread Tim Cook
On Fri, Dec 25, 2009 at 5:49 PM, Michael Armstrong michael.armstr...@me.com
 wrote:

 Hi, I currently have 4x 1tb drives in a raidz configuration. I want to add
 another 2 x 1tb drives, however if i simply zpool add, i will only gain an
 extra 1tb of space as it will create a second raidz set inside the existing
 tank/pool. Is there a way to add my new drives into the existing raidz
 without losing even more space without rebuilding the entire pool from the
 beginning? if not, is this something being worked on currently? thanks and
 merry xmas!




No, you cannot currently expand a raid-z, and there is no ETA on being able
to do so.

-- 
--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Can I destroy a Zpool without importing it?

2009-12-27 Thread Havard Kruger
Hi, I'm in the process of building a new fileserver and I'm currently playing 
around with various operating systems. I created a pool in Solaris before I 
decided to try OpenSolaris as well, so I installed OpenSolaris 2009.06, but I 
forgot to destroy the pool I created in Solaris, so now I can't import it 
because the ZFS version in Solaris is newer than the one in OpenSolaris.

And I cannot seem to find a way to destroy the pool without importing it 
first. I guess I could format the drives in another OS, but that is a lot more 
work than it should be. Is there any way to do this in OpenSolaris?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz data loss stories?

2009-12-27 Thread Al Hopper
I know I'm a bit late to contribute to this thread, but I'd still like to
add my $0.02.  My gut feel is that we (generally) don't yet understand the
subtleties of disk drive failure modes as they relate to 1.5 or 2Tb+ drives.
 Why?  Because those large drives have not been widely available until
relatively recently.

There's a tendency to extrapolate from one's existing knowledge base and
understanding of how/why drives fail (or degrade), basing our expected
outcome on some extension of that knowledge.  In the case of
the current generation of high capacity drives, that may or may not be
appropriate.  We simply don't know!  Mainly because the hard drive
manufacturers, those engineering gods and providers of ever increasing
storage density, don't communicate their acquired and evolving knowledge as
it relates to disk reliability (or failure) mechanisms.

In this case I feel, as a user, it's best to take a very conservative
approach and err on the side of safety by using raidz3 when high capacity
drives are being deployed.  Over time, a consensus based understanding of
the failure modes will emerge and then, from a user perspective, we can have
a clearer understanding of the risks of data loss and its relation to
different ZFS pool configurations.

Personally, I was surprised at how easily I was able to take out a 1Tb WD
Caviar black drive by moving a 1U server with the drives spinning.  Earlier
drive generations (500Gb or smaller) tolerated this abuse with no signs of
degradation.  So I know that high capacity drives are a lot more sensitive
to mechanical abuse - I can only assume that 2Tb drives are probably even
more sensitive and that shock mounting, to reduce vibration induced by a
bunch of similar drives operating in the same box, is probably a smart
move.

Likewise, my previous experience has seen how a given percentage of disk
drives would fail in the 2 or 3 week period following a temperature
excursion in a data center environment.  Sometimes everyone knows about
that event, and sometimes the folks doing A/C work over a holiday weekend
will forget to publish the details of what went wrong! :)   Again - the
same doubts continue to nag me: are the current 1.5Tb+ drives more likely to
suffer degradation due to a temperature excursion over a relatively small
time period?  If the drive firmware does its job and remaps damaged sectors
or tracks transparently, we, as the users, won't know - until it happens one
time too many!!

Regards,

-- 
Al Hopper  Logical Approach Inc,Plano,TX a...@logical-approach.com
  Voice: 972.379.2133 Timezone: US CDT
OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007
http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] repost - high read iops

2009-12-27 Thread Richard Elling

OK, I'll take a stab at it...

On Dec 26, 2009, at 9:52 PM, Brad wrote:


repost - Sorry for ccing the other forums.

I'm running into an issue where there seems to be a high number of  
read iops hitting disks and physical free memory is fluctuating  
between 200MB - 450MB out of 16GB total. We have the l2arc  
configured on a 32GB Intel X25-E ssd and slog on another 32GB X25-E  
ssd.


OK, this shows that memory is being used... a good thing.

According to our tester, Oracle writes are extremely slow (high  
latency).


OK, this is a workable problem statement... another good thing.


Below is a snippet of iostat:

r/s w/s Mr/s Mw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
4898.3 34.2 23.2 1.4 0.1 385.3 0.0 78.1 0 1246 c1
0.0 0.8 0.0 0.0 0.0 0.0 0.0 16.0 0 1 c1t0d0
401.7 0.0 1.9 0.0 0.0 31.5 0.0 78.5 1 100 c1t1d0
421.2 0.0 2.0 0.0 0.0 30.4 0.0 72.3 1 98 c1t2d0
403.9 0.0 1.9 0.0 0.0 32.0 0.0 79.2 1 100 c1t3d0
406.7 0.0 2.0 0.0 0.0 33.0 0.0 81.3 1 100 c1t4d0
414.2 0.0 1.9 0.0 0.0 28.6 0.0 69.1 1 98 c1t5d0
406.3 0.0 1.8 0.0 0.0 32.1 0.0 79.0 1 100 c1t6d0
404.3 0.0 1.9 0.0 0.0 31.9 0.0 78.8 1 100 c1t7d0
404.1 0.0 1.9 0.0 0.0 34.0 0.0 84.1 1 100 c1t8d0
407.1 0.0 1.9 0.0 0.0 31.2 0.0 76.6 1 100 c1t9d0
407.5 0.0 2.0 0.0 0.0 33.2 0.0 81.4 1 100 c1t10d0
402.8 0.0 2.0 0.0 0.0 33.5 0.0 83.2 1 100 c1t11d0
408.9 0.0 2.0 0.0 0.0 32.8 0.0 80.3 1 100 c1t12d0
9.6 10.8 0.1 0.9 0.0 0.4 0.0 20.1 0 17 c1t13d0
0.0 22.7 0.0 0.5 0.0 0.5 0.0 22.8 0 33 c1t14d0


You are getting 400+ IOPS @ 4 KB out of HDDs.  Count your lucky stars.
Don't expect that kind of performance as normal, it is much better than
normal.

Is this an indicator that we need more physical memory? From http://blogs.sun.com/brendan/entry/test 
, the order that a read request is satisfied is:


   0) Oracle SGA

1) ARC
2) vdev cache of L2ARC devices
3) L2ARC devices
4) vdev cache of disks
5) disks

Using arc_summary.pl, we determined that prefetch was not helping  
much, so we disabled it.


CACHE HITS BY DATA TYPE:
Demand Data: 22% 158853174
Prefetch Data: 17% 123009991 ---not helping???
Demand Metadata: 60% 437439104
Prefetch Metadata: 0% 2446824

The write iops started to kick in more and latency reduced on  
spinning disks:


0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 c0t0d0
1629.0 968.0 17.4 7.3 0.0 35.9 0.0 13.8 0 1088 c1
0.0 1.9 0.0 0.0 0.0 0.0 0.0 1.7 0 0 c1t0d0
126.7 67.3 1.4 0.2 0.0 2.9 0.0 14.8 0 90 c1t1d0
129.7 76.1 1.4 0.2 0.0 2.8 0.0 13.7 0 90 c1t2d0
128.0 73.9 1.4 0.2 0.0 3.2 0.0 16.0 0 91 c1t3d0
128.3 79.1 1.3 0.2 0.0 3.6 0.0 17.2 0 92 c1t4d0
125.8 69.7 1.3 0.2 0.0 2.9 0.0 14.9 0 89 c1t5d0
128.3 81.9 1.4 0.2 0.0 2.8 0.0 13.1 0 89 c1t6d0
128.1 69.2 1.4 0.2 0.0 3.1 0.0 15.7 0 93 c1t7d0
128.3 80.3 1.4 0.2 0.0 3.1 0.0 14.7 0 91 c1t8d0
129.2 69.3 1.4 0.2 0.0 3.0 0.0 15.2 0 90 c1t9d0
130.1 80.0 1.4 0.2 0.0 2.9 0.0 13.6 0 89 c1t10d0
126.2 72.6 1.3 0.2 0.0 2.8 0.0 14.2 0 89 c1t11d0
129.7 81.0 1.4 0.2 0.0 2.7 0.0 12.9 0 88 c1t12d0
90.4 41.3 1.0 4.0 0.0 0.2 0.0 1.2 0 6 c1t13d0
0.0 24.3 0.0 1.2 0.0 0.0 0.0 0.2 0 0 c1t14d0


latency is reduced, but you are also now only seeing 200 IOPS,
not 400+ IOPS.  This is closer to what you would see as a max
for HDDs.

I cannot tell which device is the cache device.  I would expect
to see one disk with significantly more reads than the others.
What do the l2arc stats show?
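
For example, something along these lines should dump the L2ARC counters from the
ARC kstats (treat it as a sketch; the exact statistic names can vary between builds):

kstat -p zfs:0:arcstats | grep l2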

Is it true that if your MFU stats start to go over 50%, then more memory  
is needed?


That is a good indicator. It means that most of the cache entries are
frequently used. Grow your SGA and you should see this go down.


CACHE HITS BY CACHE LIST:
Anon: 10% 74845266 [ New Customer, First Cache Hit ]
Most Recently Used: 19% 140478087 (mru) [ Return Customer ]
Most Frequently Used: 65% 475719362 (mfu) [ Frequent Customer ]
Most Recently Used Ghost: 2% 20785604 (mru_ghost) [ Return Customer  
Evicted, Now Back ]
Most Frequently Used Ghost: 1% 9920089 (mfu_ghost) [ Frequent  
Customer Evicted, Now Back ]

CACHE HITS BY DATA TYPE:
Demand Data: 22% 158852935
Prefetch Data: 17% 123009991
Demand Metadata: 60% 437438658
Prefetch Metadata: 0% 2446824

My theory is that since there's not enough memory for the ARC to cache  
data, it hits the L2ARC, where it can't find the data and has to query  
the disk for the request. This causes contention between reads and  
writes, causing the service times to inflate.


If you have a choice of where to use memory, always choose closer to
the application. Try a larger SGA first.  Be aware of large page  
stealing --

consider increasing the SGA immediately after a reboot and before the
database or applications are started.
 -- richard


uname: 5.10 Generic_141445-09 i86pc i386 i86pc
Sun Fire X4270: 11+1 raidz (SAS)
   l2arc Intel X25-E
   slog Intel X25-E
Thoughts?
--
This message posted from opensolaris.org
___
zfs-discuss mailing list

Re: [zfs-discuss] How to destroy your system in funny way with ZFS

2009-12-27 Thread Bob Friesenhahn

On Sun, 27 Dec 2009, Cyril Plisko wrote:


gzip compression is not supported in GRUB zfs reader. You should avoid
using it for boot filesystem. If may try to revert compression setting
to off or on (which defaults to lzjb) and try to boot that way.
(That is if you didn't rewrite any critical data after setting gzip
compression).


After changing the compression back to something which is supported, 
the boot archive could be manually re-generated:


  bootadm update-archive -R /a

where '/a' is a mount of the alternate root (the root on the disk).

If any data is accessed which still uses gzip, then there will still 
be no joy.
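
From the live CD the whole sequence would look roughly like this (a sketch; the
root dataset name is an assumption, so check 'zfs list' after the import):

pfexec zpool import -f -R /a rpool
pfexec zfs mount rpool/ROOT/opensolaris      # root dataset name is a guess
pfexec bootadm update-archive -R /a
pfexec zpool export rpool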


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can I destroy a Zpool without importing it?

2009-12-27 Thread Sriram Narayanan
OpenSolaris has a newer version of ZFS than Solaris. What you have is
a pool that was not marked as exported for use on a different OS
install.

Simply force import the pool using zpool import -f

-- Sriram

On 12/27/09, Havard Kruger hva...@broadpark.no wrote:
 Hi, in the process of building a new fileserver and I'm currently playing
 around with various operating systems, I created a pool in Solaris, before I
 decided to try OpenSolaris aswell, so I installed OpenSolaris 20009.06, but
 I forgot to destroy the pool I created in Solaris, so now I can't import it
 because it's a newer version of ZFS in Solaris then it is in OpenSolaris.

 And I can not seem to find a way to destroy the pool without importing it
 first. I guess I could format the drives in another OS, but that is alot
 more work then it should be. Is there any way to do this in OpenSolaris?
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


-- 
Sent from my mobile device
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can I destroy a Zpool without importing it?

2009-12-27 Thread Sriram Narayanan
Also, if you don't care about the existing pool and want to create a
new pool on the same devices, you can go ahead and do so.

The format command will list the storage devices available to you.
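
In other words, something like the following would overwrite the old labels and
start fresh (a sketch; the pool and device names are placeholders, and -f is
needed because the devices still carry the old pool's labels):

pfexec zpool create -f tank raidz c1t0d0 c1t1d0 c1t2d0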

-- Sriram

On 12/27/09, Sriram Narayanan sri...@belenix.org wrote:
 opensolaris has a newer version of ZFS than Solaris. What you have is
 a pool that was not marked as exported for use on a different OS
 install.

 Simply force import the pool using zpool import -f

 -- Sriram

 On 12/27/09, Havard Kruger hva...@broadpark.no wrote:
 Hi, in the process of building a new fileserver and I'm currently playing
 around with various operating systems, I created a pool in Solaris, before
 I
 decided to try OpenSolaris aswell, so I installed OpenSolaris 20009.06,
 but
 I forgot to destroy the pool I created in Solaris, so now I can't import
 it
 because it's a newer version of ZFS in Solaris then it is in OpenSolaris.

 And I can not seem to find a way to destroy the pool without importing it
 first. I guess I could format the drives in another OS, but that is alot
 more work then it should be. Is there any way to do this in OpenSolaris?
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


 --
 Sent from my mobile device


-- 
Sent from my mobile device
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can I destroy a Zpool without importing it?

2009-12-27 Thread Colin Raven
Are there any negative consequences as a result of a force import? I mean
STUNT; Sudden Totally Unexpected and Nasty Things
-Me

On Sun, Dec 27, 2009 at 17:55, Sriram Narayanan sri...@belenix.org wrote:

 opensolaris has a newer version of ZFS than Solaris. What you have is
 a pool that was not marked as exported for use on a different OS
 install.

 Simply force import the pool using zpool import -f

 -- Sriram

 On 12/27/09, Havard Kruger hva...@broadpark.no wrote:
  Hi, in the process of building a new fileserver and I'm currently playing
  around with various operating systems, I created a pool in Solaris,
 before I
  decided to try OpenSolaris aswell, so I installed OpenSolaris 20009.06,
 but
  I forgot to destroy the pool I created in Solaris, so now I can't import
 it
  because it's a newer version of ZFS in Solaris then it is in OpenSolaris.
 
  And I can not seem to find a way to destroy the pool without importing it
  first. I guess I could format the drives in another OS, but that is alot
  more work then it should be. Is there any way to do this in OpenSolaris?
  --
  This message posted from opensolaris.org
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

 --
 Sent from my mobile device
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-27 Thread Saso Kiselkov
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Thanks for the mdb syntax - I wasn't sure how to set it using mdb at
runtime, which is why I used /etc/system. I was quite intrigued to find
out that the Solaris kernel was in fact designed to be tuned at
runtime using a generic debugging mechanism, rather than, like other
traditional kernels, using a defined kernel settings interface (sysctl
comes to mind).

Anyway, upgrading to b130 helped my issue and I hope that by the time we
start selling this product, OpenSolaris 2010.02 comes out, so that I can
tell people to just grab the latest stable OpenSolaris release, rather
than having to go to a development branch or tuning kernel parameters to
even get the software working as it should.

Regards,
- --
Saso

Robert Milkowski wrote:
 On 26/12/2009 12:22, Saso Kiselkov wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Thank you, the post you mentioned helped me move a bit forward. I tried
 putting:

 zfs:zfs_txg_timeout = 1
 btw: you can tune it on a live system without a need to do reboots.
 
 mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
 zfs_txg_timeout:
 zfs_txg_timeout:30
 mi...@r600:~# echo zfs_txg_timeout/W0t1 | mdb -kw
 zfs_txg_timeout:0x1e=   0x1
 mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
 zfs_txg_timeout:
 zfs_txg_timeout:1
 mi...@r600:~# echo zfs_txg_timeout/W0t30 | mdb -kw
 zfs_txg_timeout:0x1 =   0x1e
 mi...@r600:~# echo zfs_txg_timeout/D | mdb -k
 zfs_txg_timeout:
 zfs_txg_timeout:30
 mi...@r600:~#
 

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAks3lGkACgkQRO8UcfzpOHBzcwCgyDlxr94I9r8kHbVEkTt1lu0Y
AOIAmgJnZ5nZw8j7FS+irrJWJ4RBup0Q
=0g8/
-END PGP SIGNATURE-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Upgrading from snv_121 to snv_129+

2009-12-27 Thread Errol Neal
I'm looking at upgrading a box serving a lun via iSCSI (Comstar) currently 
running snv_121. 
The initiators are cluster pair running SLES11.

Any gotchas that I should be aware of?
Sys Specs are:

2 Xeon E5410 procs
8 GB RAM
12 10K Savios
1  X25-E as ZIL
Supermicro rebranded LSI SAS controller

The drives are configured as a single mirrored pool

Thanks in advance,

EN
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris

2009-12-27 Thread Austin
I don't know how much progress has been made on this, but back when I moved 
from FreeBSD (an older version, maybe the first to have stable ZFS) to Solaris, 
this couldn't be done since they were not quite compatible yet. I got some new 
drives since the ones I had were dated, copied the data to the new Solaris 
system with a network connection, and then tried to import the old drives to 
see if it could be done. If I remember correctly (which I might not), they 
imported, but the data wasn't there. I know it didn't work.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2009-12-27 Thread tom wagner
4 Gigabytes.  The hang on my system happens much faster. I can watch the drives 
light up and run iostat but 3 minutes in like clockwork everything gets hung 
and I'm left with a blinking cursor at the console that newlines but doesn't do 
anything. Although if I run kmdb and hit f1-a I can get into the debugger.

I'm thinking upon import it sees the destroy never finished and tries again 
during import and the same thing that hung the system during the original 
destroy is hanging the pool again and again during these imports.  for the heck 
of it I even tried using the -F to roll the uber block, but no joy.  I wouldn't 
recommend this as it can be destructive, but since I pulled the mirrored drive 
physically out of my pool, I still have a good copy of the original pool

I'm really worried that I won't get this data back. I'm hoping its just a 
resource leak issue or something rather than corrupted metadata from destroying 
a dedup zvol.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-27 Thread Roch Bourbonnais


On 26 Dec 09, at 04:47, Tim Cook wrote:




On Fri, Dec 25, 2009 at 11:57 AM, Saso Kiselkov  
skisel...@gmail.com wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

I've started porting a video streaming application to opensolaris on
ZFS, and am hitting some pretty weird performance issues. The thing  
I'm

trying to do is run 77 concurrent video capture processes (roughly
430Mbit/s in total) all writing into separate files on a 12TB J4200
storage array. The disks in the array are arranged into a single  
RAID-0

ZFS volume (though I've tried different RAID levels, none helped). CPU
performance is not an issue (barely hitting 35% utilization on a  
single

CPU quad-core X2250). I/O bottlenecks can also be ruled out, since the
storage array's sequential write performance is around 600MB/s.

The problem is the bursty behavior of ZFS writes. All the capture
processes do, in essence is poll() on a socket and then read() and
write() any available data from it to a file. The poll() call is done
with a timeout of 250ms, expecting that if no data arrives within 0.25
seconds, the input is dead and recording stops (I tried increasing  
this

value, but the problem still arises, although not as frequently). When
ZFS decides that it wants to commit a transaction group to disk (every
30 seconds), the system stalls for a short amount of time and  
depending

on the number capture of processes currently running, the poll() call
(which usually blocks for 1-2ms), takes on the order of hundreds of  
ms,
sometimes even longer. I figured that I might be able to resolve  
this by

lowering the txg timeout to something like 1-2 seconds (I need ZFS to
write as soon as data arrives, since it will likely never be
overwritten), but I couldn't find any tunable parameter for it  
anywhere

on the net. On FreeBSD, I think this can be done via the
vfs.zfs.txg_timeout sysctl. A glimpse into the source at
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/txg.c
on line 40 made me worry that somebody maybe hard-coded this value  
into

the kernel, in which case I'd be pretty much screwed in opensolaris.

Any help would be greatly appreciated.

Regards,
- --
Saso




Hang on... if you've got 77 concurrent threads going, I don't see  
how that's a sequential I/O load.  To the backend storage it's  
going to look like the equivalent of random I/O.



I see this posted once in a while and I'm not sure where that comes  
from. Sequential workloads are important inasmuch as the FS/VM can  
detect them and issue large requests to disk (followed by cache hits)  
instead of multiple small ones.  The detection for ZFS is done at the  
file level, and so the fact that one has N concurrent streams going is  
not relevant.
On writes, ZFS and the copy-on-write model make the sequential/random  
distinction not very defining. All writes are targeting free blocks.


-r


I'd also be surprised to see 12 1TB disks supporting 600MB/sec  
throughput and would be interested in hearing where you got those  
numbers from.


Is your video capture doing 430MB or 430Mbit?

--
--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




smime.p7s
Description: S/MIME cryptographic signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to configure simple RAID-Z on OpenSolaris?

2009-12-27 Thread Hillel Lubman
Maybe this question was already asked here, so I'm sorry for the redundancy. 

What is the minimal number of hard drives for enabling RAID-Z on OpenSolaris? Is 
it possible to have only 4 identical hard drives and to install a whole system 
on them with software RAID-Z underneath? Or is enabling RAID-Z possible only on 
some additional 4 disks besides the one from which the system boots?

Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Moving a pool from FreeBSD 8.0 to opensolaris

2009-12-27 Thread Thomas Burgess
This isn't an option for me.  The current machine is going to be totally
upgraded:

New motherboard, new RAM (ECC), new controller cards and 9 new hard drives.


The current pool is 3 raidz1 vdevs with 4 drives each (all 1 TB).

It's about 65% full.

If I have to use some other filesystem that is an option, but I need to be
able to use the 9 new disks to somehow back up the data on the FreeBSD 8.0
system, then destroy the old pool, create a new pool using the same drives
in OpenSolaris with the new motherboard/controllers/memory, and copy the
data to the new pool.
Afterward I intend to create more vdevs (I haven't decided which way to go
yet. I could do 5 4-disk raidz vdevs, 4 5-disk raidz vdevs, or
maybe something weird like 2 7-disk raidz vdevs with 1 6-disk raidz
vdev. I don't know yet, but that's beside the point.)
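
For the copy step itself, a recursive snapshot plus zfs send/receive between the
two pools is the usual route; roughly (a sketch with placeholder pool and snapshot
names, assuming the FreeBSD build supports zfs send -R):

zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs receive -d newpool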


On Sun, Dec 27, 2009 at 2:07 PM, Austin shyguy91...@atwelm.com wrote:

 I don't know how much progress has been made on this, but back when I moved
 from FreeBSD (an older version, maybe the first to have stable ZFS) to
 Solaris, this couldn't be done since they were not quite compatible yet. I
 got some new drives since the ones I had were dated, copied the data to the
 new Solaris system with a network connection, and then tried to import the
 old drives to see if it could be done. If I remember correctly (which I
 might not), they imported, but the data wasn't there. I know it didn't work.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to configure simple RAID-Z on OpenSolaris?

2009-12-27 Thread Bob Friesenhahn

On Sun, 27 Dec 2009, Hillel Lubman wrote:


May be this question was already asked here, so I'm sorry for redundancy.

What is a minimal amount of hard drives for enabling RAID-Z on 
OpenSolaris? Is it possible to have only 4 identical hard drives, 
and to install a whole system on them with software RAID-Z 
underneath? Or enabling RAID-Z is possible only on some additional 4 
disks to the one from which the system boots?


You can use as little as three drives for raidz but OpenSolaris can 
only boot from a single drive, or a mirror pair.  It can't boot from 
raidz.  This means that you need to dedicate one or two drives (or 
partitions) for the root pool.  If you don't mind losing some 
performance, you could use partitioning to put a 30GB partition on the 
first two drives, use that for a mirrored bootable root pool, and then 
use the remainder for a different pool with two hog partitions (from 
first two disks) plus two drives in one raidz vdev.  This allows you 
to use raidz across all four drives, but losing 30GB from each drive, 
and using partitioning on the first two drives.
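
A sketch of what that layout might look like once the slices exist (slice and
device names here are assumptions, and the root pool is normally created by the
installer rather than by hand):

# mirrored, bootable root pool on the 30GB slices of the first two disks
zpool create rpool mirror c0t0d0s0 c0t1d0s0
# data pool: the two large leftover slices plus the two whole disks in one raidz vdev
zpool create tank raidz c0t0d0s1 c0t1d0s1 c0t2d0 c0t3d0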


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to configure simple RAID-Z on OpenSolaris?

2009-12-27 Thread Hillel Lubman
Bob Friesenhahn wrote:
 OpenSolaris can only boot from a single drive, or a mirror pair. It can't 
 boot from
 raidz. This means that you need to dedicate one or two drives (or
 partitions) for the root pool.

Thanks, that's what I wanted to know. But why can't OpenSolaris boot from its 
own RAID-Z? Is it a GRUB related limitation? It would make sense to be able to 
boot from RAID-Z if it's such an integral part of ZFS.

Regards,

Hillel.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to configure simple RAID-Z on OpenSolaris?

2009-12-27 Thread Bob Friesenhahn

On Sun, 27 Dec 2009, Hillel Lubman wrote:


Thanks, that's what I wanted to know. But why can't OpenSolaris boot 
from its own RAID-Z? Is it a GRUB related limitation? It would make 
sense to be able to boot from RAID-Z if it's such an integral part 
of ZFS.


Yes, it is a GRUB limitation.  Anything put into GRUB becomes 
GPL-encumbered.  There is also the issue that GRUB can not communicate 
with the fault monitoring system (FMA) since it is not running yet, 
and GRUB is unlikely to be able to make correct recovery decisions for 
complex fault scenarios.  Keeping things simple improves system 
reliability.  I have heard that FreeBSD has a RAID-Z boot available.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [osol-help] zfs destroy stalls, need to hard reboot

2009-12-27 Thread Brent Jones
On Sun, Dec 27, 2009 at 12:55 AM, Stephan Budach stephan.bud...@jvm.de wrote:
 Brent,

 I had known about that bug a couple of weeks ago, but that bug has been filed 
 against v111 and we're at v130. I have also searched the ZFS part of this 
 forum and really couldn't find much about this issue.

 The other issue I noticed is that, contrary to the statements I read, once zfs 
 is underway destroying a big dataset, other operations do not 
 continue to work. When destroying the 
 3 TB dataset, the other zvol that had been exported via iSCSI stalled as well, 
 and that's really bad.

 Cheers,
 budy
 --
 This message posted from opensolaris.org
 ___
 opensolaris-help mailing list
 opensolaris-h...@opensolaris.org


I just tested your claim, and you appear to be correct.

I created a couple dummy ZFS filesystems, loaded them with about 2TB,
exported them via CIFS, and destroyed one of them.
The destroy took the usual amount of time (about 2 hours), and
actually, quite to my surprise, all I/O on the ENTIRE zpool stalled.
I don't recall seeing this prior to 130; in fact, I know I would have
noticed this, as we create and destroy large ZFS filesystems very
frequently.

So it seems the original issue I reported many months back has
actually gained some new negative impacts  :(

I'll try to escalate this with my Sun support contract, but Sun
support still isn't very familiar/clued in about OpenSolaris, so I
doubt I will get very far.

Cross posting to ZFS-discuss also, as other may have seen this and
know of a solution/workaround.



-- 
Brent Jones
br...@servuhome.net
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2009-12-27 Thread Jack Kielsmeier
It sounds like you have less data on yours; perhaps that is why yours freezes 
faster.

Whatever mine is doing during the import, it reads my disks now for nearly 
24 hours, and then starts writing to the disks.

The reads start out fast, then they just sit, going at something like 20k / 
second on each disk in my raidz1 pool.

As soon as it's done reading whatever it's reading, it starts to write; that is 
when the freeze happens.

I think the folks here from Sun that have been assisting here are on holiday 
break. I'm guessing there won't be further assistance from them until after the 
first of the year.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2009-12-27 Thread Jack Kielsmeier
Here is iostat output of my disks being read:

r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   45.30.0   27.60.0  0.0  0.60.0   13.3   0  60 c3d0
   44.30.0   27.00.0  0.0  0.30.07.7   0  34 c3d1
   43.50.0   27.40.0  0.0  0.50.0   12.6   0  55 c4d0
   41.10.0   24.90.0  0.0  0.30.08.0   0  33 c4d1

very very slow

It didn't used to take as long to freeze for me, but every time I restart the 
process, the 'reading' portion of the zpool import seems to take much longer.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2009-12-27 Thread scottford
I have a pool in the same state.  I deleted a fileset that was compressed and 
deduped and had a bunch of zero blocks in it.  The delete ran for a while and 
then it hung.  Trying to import with any combination of -f or -fF or -fFX gives 
the same results you guys get.  zdb -eud shows all my filesets, and the one I 
deleted gives error 16, inconsistent data.  My pool has 12 750GB drives in 2 
raidz2 vdevs of 6 drives each.  I have 12GB of RAM and a Core i7.  When I run 
the import it can return as fast as a few hours or take as long as 3 days depending 
on the options I choose.  Each run does end with a system hang.  At some points 
all 8 logical CPUs are running at 50% for hours.
I don't mind rolling back to a previous consistent state; I just need some help 
getting it right.
I started on snv_128 and have since upgraded to 129.  Haven't tried 130 yet.  I 
tried limiting the ARC size according to another post and it just took longer 
to get to the system hang.

Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Upgrading from snv_121 to snv_129+

2009-12-27 Thread Eric D. Mudama

On Sun, Dec 27 at 12:57, Errol Neal wrote:

I'm looking at upgrading a box serving a lun via iSCSI (Comstar) currently 
running snv_121.
The initiators are cluster pair running SLES11.

Any gotchas that I should be aware of?


I bumped into 4 of the ~12 warnings in the release notes.  I don't
have a link offhand, but be prepared.  This was on a 101 to 129
upgrade.

packagemanager still doesn't function on my system, with no
resolution, but pkg does.

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-27 Thread Tim Cook
On Sun, Dec 27, 2009 at 1:38 PM, Roch Bourbonnais
roch.bourbonn...@sun.comwrote:


 On 26 Dec 09, at 04:47, Tim Cook wrote:



 On Fri, Dec 25, 2009 at 11:57 AM, Saso Kiselkov skisel...@gmail.com
 wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 I've started porting a video streaming application to opensolaris on
 ZFS, and am hitting some pretty weird performance issues. The thing I'm
 trying to do is run 77 concurrent video capture processes (roughly
 430Mbit/s in total) all writing into separate files on a 12TB J4200
 storage array. The disks in the array are arranged into a single RAID-0
 ZFS volume (though I've tried different RAID levels, none helped). CPU
 performance is not an issue (barely hitting 35% utilization on a single
 CPU quad-core X2250). I/O bottlenecks can also be ruled out, since the
 storage array's sequential write performance is around 600MB/s.

 The problem is the bursty behavior of ZFS writes. All the capture
 processes do, in essence is poll() on a socket and then read() and
 write() any available data from it to a file. The poll() call is done
 with a timeout of 250ms, expecting that if no data arrives within 0.25
 seconds, the input is dead and recording stops (I tried increasing this
 value, but the problem still arises, although not as frequently). When
 ZFS decides that it wants to commit a transaction group to disk (every
 30 seconds), the system stalls for a short amount of time and depending
 on the number capture of processes currently running, the poll() call
 (which usually blocks for 1-2ms), takes on the order of hundreds of ms,
 sometimes even longer. I figured that I might be able to resolve this by
 lowering the txg timeout to something like 1-2 seconds (I need ZFS to
 write as soon as data arrives, since it will likely never be
 overwritten), but I couldn't find any tunable parameter for it anywhere
 on the net. On FreeBSD, I think this can be done via the
 vfs.zfs.txg_timeout sysctl. A glimpse into the source at

 http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/txg.c
 on line 40 made me worry that somebody maybe hard-coded this value into
 the kernel, in which case I'd be pretty much screwed in opensolaris.

 Any help would be greatly appreciated.

 Regards,
 - --
 Saso




 Hang on... if you've got 77 concurrent threads going, I don't see how
 that's a sequential I/O load.  To the backend storage it's going to look
 like the equivalent of random I/O.



 I see this posted once in  a while and I'm not sure where that comes from.
 Sequential workloads are important inasmuch as the FS/VM can detect and
 issue large request to disk (followed by cache hits) instead of multiple
 small ones.  The detection for ZFS is done at the file level and so the fact
 that one has N concurrent streams going is not relevant.
 On writes ZFS and the Copy-On-Write model makes sequential/random
 distinction not very defining. All writes are targetting free blocks.

 -r



That is ONLY true when there's significant free space available/a fresh
pool.  Once those files have been deleted and the blocks put back into the
free pool, they're no longer sequential on disk, they're all over the
disk.  So it makes a VERY big difference.  I'm not sure why you'd be shocked
someone would bring this up.

-- 
--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-27 Thread Bob Friesenhahn

On Sun, 27 Dec 2009, Tim Cook wrote:


That is ONLY true when there's significant free space available/a 
fresh pool.  Once those files have been deleted and the blocks put 
back into the free pool, they're no longer sequential on disk, 
they're all over the disk.  So it makes a VERY big difference.  I'm 
not sure why you'd be shocked someone would bring this up.   --


While I don't know what zfs actually does, I do know that it performs 
large disk allocations (e.g. 1MB) and then parcels 128K zfs blocks 
from those allocations.  If the zfs designers are wise, then they will 
use knowledge of sequential access to ensure that all of the 128K 
blocks from a metaslab allocation are pre-assigned for use by that 
file, and they will try to choose metaslabs which are followed by free 
metaslabs, or close to other free metaslabs.  This approach would tend 
to limit the sequential-access damage caused by COW and free block 
fragmentation on a dirty disk.


This sort of planning is not terribly different than detecting 
sequential read I/O and scheduling data reads in advance of 
application requirements.  If you can intelligently pre-fetch data 
blocks, then you can certainly intelligently pre-allocate data blocks.


Today I did an interesting (to me) test where I ran two copies of 
iozone at once on huge (up to 64GB) files.  The results were somewhat 
amazing to me.  The cause of the amazement was that I noticed that the 
reported data rates from iozone did not drop very much (e.g. a 
single-process write rate of 359MB/second dropped to 298MB/second with 
two processes).  This clearly showed that zfs is doing quite a lot of 
smart things when writing files and that it is optimized for 
several/many writers rather than just one.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Extremely bad performance - hw failure?

2009-12-27 Thread Morten-Christian Bernson
Lately the zfs pool in my home server has degraded to a state where it can be 
said it doesn't work at all.  Read speed is slower than I can read from the 
internet on my slow DSL line... This is compared to just a short while ago, 
when I could read from it at over 50MB/sec over the network.

My setup:
Running latest Solaris 10: # uname -a
SunOS solssd01 5.10 Generic_142901-02 i86pc i386 i86pc

# zpool status DATA
  pool: DATA
 state: ONLINE
config:
        NAME        STATE     READ WRITE CKSUM
        DATA        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
        spares
          c0t2d0    AVAIL
errors: No known data errors

# zfs list -r DATA
NAME   USED  AVAIL  REFER  MOUNTPOINT
DATA  3,78T   229G  3,78T  /DATA

All of the drives in this pool are 1.5TB Western Digital Green drives. I am not 
seeing any error messages in /var/adm/messages, and fmdump -eV shows no 
errors...   However, I am seeing some soft faults in iostat -eEn:
  errors ---
  s/w h/w trn tot device
  2   0   0   2 c0t0d0
  1   0   0   1 c1t0d0
  2   0   0   2 c2t1d0
151   0   0 151 c2t2d0
151   0   0 151 c2t3d0
153   0   0 153 c2t4d0
153   0   0 153 c2t5d0
  2   0   0   2 c0t1d0
  3   0   0   3 c0t2d0
  0   0   0   0 solssd01:vold(pid531)
c0t0d0   Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: Sun  Product: STK RAID INT Revision: V1.0 Serial No:
Size: 31.87GB 31866224128 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c1t0d0   Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: _NEC Product: DVD_RW ND-3500AG Revision: 2.16 Serial No:
Size: 0.00GB 0 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c2t1d0   Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1113 Serial No:
Size: 750.16GB 750156373504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c2t2d0   Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB 1500301909504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 151 Predictive Failure Analysis: 0
c2t3d0   Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB 1500301909504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 151 Predictive Failure Analysis: 0
c2t4d0   Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB 1500301909504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 153 Predictive Failure Analysis: 0
c2t5d0   Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB 1500301909504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 153 Predictive Failure Analysis: 0
c0t1d0   Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: Sun  Product: STK RAID INT Revision: V1.0 Serial No:
Size: 31.87GB 31866224128 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c0t2d0   Soft Errors: 3 Hard Errors: 0 Transport Errors: 0
Vendor: Sun  Product: STK RAID INT Revision: V1.0 Serial No:
Size: 1497.86GB 1497859358208 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3 Predictive Failure Analysis: 0

I am curious as to why the counter for Illegal Request goes up all the time.  
The machine was rebooted ~11 hours ago, and the counter goes up all the time when I try 
to use the pool...

The machine is quite a powerful one, and top shows no CPU load, no iowait and 
plenty of available memory.  The machine basically doesn't do anything at the 
moment, yet it can take several minutes to copy a 300MB file from somewhere 
in the pool to /tmp/...
# top
last pid:  1383;  load avg:  0.01,  0.00,  0.00;  up 0+10:47:57 
 01:39:17
55 processes: 54 sleeping, 1 on cpu
CPU states: 99.0% idle,  0.0% user,  1.0% kernel,  0.0% iowait,  0.0% swap
Kernel: 193 ctxsw, 3 trap, 439 intr, 298 syscall, 3 flt
Memory: 8186M phys mem, 4699M free mem, 2048M total swap, 2048M free swap

I thought I might have run into problems described here on the forums with the 
ARC and fragmentation, 

Re: [zfs-discuss] Extremely bad performance - hw failure?

2009-12-27 Thread Richard Elling
The best place to start looking at disk-related performance problems  
is iostat.

Slow disks will show high service times.  There are many options, but I
usually use something like:
iostat -zxcnPT d 1

Ignore the first line.  Look at the service times.  They should be
below 10ms for good performance.
 -- richard
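
For example, a one-liner along these lines will flag only the slow devices (a
rough sketch, untested; it assumes the standard -xn layout shown elsewhere in
this thread, with asvc_t in column 8 and the device name in column 11):

  # iostat -xn 1 | nawk '$11 ~ /^c/ && $8+0 > 10 { print $11, "asvc_t =", $8 }'

Anything that shows up there persistently deserves a closer look.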


On Dec 27, 2009, at 4:52 PM, Morten-Christian Bernson wrote:

Lately my zfs pool in my home server has degraded to a state where  
it can be said it doesn't work at all.  Read speed is slower than I  
can read from the internet on my slow DSL line... This is compared  
to just a short while ago, when I could read from it at over 50MB/ 
sec over the network.


My setup:
Running latest Solaris 10: # uname -a
SunOS solssd01 5.10 Generic_142901-02 i86pc i386 i86pc

# zpool status DATA
 pool: DATA
state: ONLINE
config:
   NAME        STATE     READ WRITE CKSUM
   DATA        ONLINE       0     0     0
     raidz1    ONLINE       0     0     0
       c2t5d0  ONLINE       0     0     0
       c2t4d0  ONLINE       0     0     0
       c2t3d0  ONLINE       0     0     0
       c2t2d0  ONLINE       0     0     0
   spares
     c0t2d0    AVAIL
errors: No known data errors

# zfs list -r DATA
NAME   USED  AVAIL  REFER  MOUNTPOINT
DATA  3,78T   229G  3,78T  /DATA

All of the drives in this pool are 1.5tb western digital green  
drives. I am not seeing any error messages in /var/adm/messages, and  
fmdump -eV shows no errors...   However, I am seeing some soft  
faults in iostat -eEn:

 errors ---
 s/w h/w trn tot device
 2   0   0   2 c0t0d0
 1   0   0   1 c1t0d0
 2   0   0   2 c2t1d0
151   0   0 151 c2t2d0
151   0   0 151 c2t3d0
153   0   0 153 c2t4d0
153   0   0 153 c2t5d0
 2   0   0   2 c0t1d0
 3   0   0   3 c0t2d0
 0   0   0   0 solssd01:vold(pid531)
c0t0d0   Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: Sun  Product: STK RAID INT Revision: V1.0 Serial No:
Size: 31.87GB 31866224128 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c1t0d0   Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: _NEC Product: DVD_RW ND-3500AG Revision: 2.16 Serial No:
Size: 0.00GB 0 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
c2t1d0   Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: SAMSUNG HD753LJ  Revision: 1113 Serial No:
Size: 750.16GB 750156373504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c2t2d0   Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB 1500301909504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 151 Predictive Failure Analysis: 0
c2t3d0   Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB 1500301909504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 151 Predictive Failure Analysis: 0
c2t4d0   Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB 1500301909504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 153 Predictive Failure Analysis: 0
c2t5d0   Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
Vendor: ATA  Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB 1500301909504 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 153 Predictive Failure Analysis: 0
c0t1d0   Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: Sun  Product: STK RAID INT Revision: V1.0 Serial No:
Size: 31.87GB 31866224128 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0
c0t2d0   Soft Errors: 3 Hard Errors: 0 Transport Errors: 0
Vendor: Sun  Product: STK RAID INT Revision: V1.0 Serial No:
Size: 1497.86GB 1497859358208 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3 Predictive Failure Analysis: 0

I am curious as to why the counter for Illegal request goes up all  
the time.  The machine was rebooted ~11 hours ago, and it goes up  
all the time when I try to use the pool...


The machine is quite a powerful one, and top shows no CPU load, no  
iowait and plenty of available memory.  The machine basically doesn't  
do anything at the moment, yet it can take several minutes to copy  
a 300MB file from somewhere in the pool to /tmp/...

# top
last pid:  1383;  load avg:  0.01,  0.00,  0.00;  up 0+10:47:57

Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-27 Thread Bob Friesenhahn

On Sun, 27 Dec 2009, Tim Cook wrote:

How is that going to prevent blocks being spread all over the disk 
when you've got files several GB in size being written concurrently 
and deleted at random?  And then throw in a mix of small files as 
well, kiss that goodbye.


There would certainly be blocks spread all over the disk, but a 
(possible) seek every 1MB of data is not too bad (not considering 
metadata seeks).  If the pool is allowed to get very full, then 
optimizations based on pre-allocated space stop working.
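
A rough back-of-the-envelope check (my own assumed numbers: ~8ms average seek
and ~100MB/s streaming transfer, so ~10ms to move 1MB):

  # echo 'scale=1; 1000/(10+8)' | bc
  55.5

That is roughly 55MB/s per drive even in the worst case of one seek per MB,
about half of streaming speed, which is indeed not too bad.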


Pre-allocating data blocks is also not going to cure head seek and 
the latency it induces on slow 7200/5400RPM drives.


But if the next seek to a data block is on a different drive, that 
drive can be seeking for the next block while the current block is 
already being read.


On a new, empty pool, or a pool that's been filled completely and 
emptied several times?  It's not amazing to me on a new pool.  I 
would be surprised to see you accomplish this feat repeatedly after 
filling and emptying the drives.  It's a drawback of every 
implementation of copy-on-write I've ever seen.  By its very 
nature, I have no idea how you would avoid it.


This is a 2 year old pool which is typically filled (to about 80%) and 
emptied (reduced to 25%) many times.  However, when it is emptied, 
all of the new files get removed since the extra space is used for 
testing.  I have only seen this pool get faster over time.


For example, when the pool was first created, iozone only measured a 
single-thread large-file (64GB) write rate of 148MB/second but now it 
is up to 380MB/second with the same hardware.  The performance 
improvement is due to improvements to Solaris 10 software and array 
(STK2540) firmware.


Original vs current:

  KB  reclen   write rewritereadreread
67108864 256  148995  165041   463519   453896
67108864 256  380286  377397   551060   550414
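
For reference, a single-thread iozone run of that shape would look something
like the following; the exact options are a guess on my part, not the command
that produced the numbers above, and /testpool/iozone.tmp is a placeholder:

  # iozone -s 64g -r 256k -i 0 -i 1 -f /testpool/iozone.tmp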

Here is an ancient blog entry where Jeff Bonwick discusses ZFS block 
allocation:


  http://blogs.sun.com/bonwick/entry/zfs_block_allocation

and a somewhat newer one where Jeff describes space maps:

  http://blogs.sun.com/bonwick/entry/space_maps

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write bursts cause short app stalls

2009-12-27 Thread Tim Cook
On Sun, Dec 27, 2009 at 8:40 PM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Sun, 27 Dec 2009, Tim Cook wrote:

  How is that going to prevent blocks being spread all over the disk when
 you've got files several GB in size being written concurrently and deleted
 at random?  And then throw in a mix of small files as well, kiss that
 goodbye.


 There would certainly be blocks spread all over the disk, but a (possible)
 seek every 1MB of data is not too bad (not considering metadata seeks).  If
 the pool is allowed to get very full, then optimizations based on
 pre-allocated space stop working.


I guess it depends entirely on the space map :)



  Pre-allocating data blocks is also not going to cure head seek and the
 latency it induces on slow 7200/5400RPM drives.


 But if the next seek to a data block is on a different drive, that drive
 can be seeking for the next block while the current block is already being
 read.


Well of course.  The "just throw more disks at the problem" argument is
valid in almost all situations.  But expecting the same performance from
drives that are full and well-used as from drives that are new and empty is,
in my experience, crazy.  My point from the start was that you will see a
significant performance decrease as time passes and fragmentation sets in.





  On a new, empty pool, or a pool that's been filled completely and emptied
 several times?  It's not amazing to me on a new pool.  I would be surprised
 to see you accomplish this feat repeatedly after filling and emptying the
 drives.  It's a drawback of every implementation of copy-on-write I've ever
 seen.  By its very nature, I have no idea how you would avoid it.


 This is a 2 year old pool which is typically filled (to about 80%) and
 emptied (reduced to 25%) many times.  However, when it is emptied, all
 of the new files get removed since the extra space is used for testing.  I
 have only seen this pool get faster over time.

 For example, when the pool was first created, iozone only measured a
 single-thread large-file (64GB) write rate of 148MB/second but now it is up
 to 380MB/second with the same hardware.  The performance improvement is due
 to improvements to Solaris 10 software and array (STK2540) firmware.

 Original vs current:

  KB  reclen   write rewritereadreread
67108864 256  148995  165041   463519   453896
67108864 256  380286  377397   551060   550414


C'mon, saying all I did was change code and firmware isn't a valid
comparison at all.  Ignoring that, I'm still referring to multiple streams
which create random I/O to the backend disk.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS pool unusable after attempting to destroy a dataset with dedup enabled

2009-12-27 Thread Jack Kielsmeier
One thing that bugged me is that I cannot ssh to my box as myself while a zpool 
import is running. It just hangs after accepting my password.

I had to convert root from a role to a user and ssh as root to my box.

I now know why this is: when I log in, /usr/sbin/quota gets called. This must 
run a zfs or zpool command to get quota information, which hangs during an 
import.
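
If it is the stock /etc/profile that calls quota (it is on the systems I have
looked at), one possible workaround until the import finishes is to comment
that line out; a sketch only, adjust to taste:

  # grep /usr/sbin/quota /etc/profile
          /usr/sbin/quota
  # vi /etc/profile      (change that line to:  # /usr/sbin/quota)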
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] cannot receive new filesystem stream: invalid backup stream

2009-12-27 Thread Albert Chin
I have two snv_126 systems. I'm trying to zfs send a recursive snapshot
from one system to another:
  # zfs send -v -R tww/opt/chro...@backup-20091225 |\
  ssh backupserver zfs receive -F -d -u -v tww
  ...
  found clone origin tww/opt/chroots/a...@ab-1.0
  receiving incremental stream of tww/opt/chroots/ab-...@backup-20091225 into 
tww/opt/chroots/ab-...@backup-20091225
  cannot receive new filesystem stream: invalid backup stream

If I do the following on the origin server:
  # zfs destroy -r tww/opt/chroots/ab-1.0
  # zfs list -t snapshot -r tww/opt/chroots | grep ab-1.0 
  tww/opt/chroots/a...@ab-1.0
  tww/opt/chroots/hppa1.1-hp-hpux11...@ab-1.0
  tww/opt/chroots/hppa1.1-hp-hpux11...@ab-1.0
  ...
  # zfs list -t snapshot -r tww/opt/chroots | grep ab-1.0 |\
  while read a; do zfs destroy $a; done
then another zfs send like the above, the zfs send/receive succeeds.
However, if I then perform a few operations like the following:
  zfs snapshot tww/opt/chroots/a...@ab-1.0
  zfs clone tww/opt/chroots/a...@ab-1.0 tww/opt/chroots/ab-1.0
  zfs rename tww/opt/chroots/ab/hppa1.1-hp-hpux11.00 
tww/opt/chroots/ab-1.0/hppa1.1-hp-hpux11.00
  zfs rename tww/opt/chroots/hppa1.1-hp-hpux11...@ab 
tww/opt/chroots/hppa1.1-hp-hpux11...@ab-1.0
  zfs destroy tww/opt/chroots/ab/hppa1.1-hp-hpux11.00
  zfs destroy tww/opt/chroots/hppa1.1-hp-hpux11...@ab
  zfs snapshot tww/opt/chroots/hppa1.1-hp-hpux11...@ab
  zfs clone tww/opt/chroots/hppa1.1-hp-hpux11...@ab 
tww/opt/chroots/ab/hppa1.1-hp-hpux11.00
  zfs rename tww/opt/chroots/ab/hppa1.1-hp-hpux11.11 
tww/opt/chroots/ab-1.0/hppa1.1-hp-hpux11.11
  zfs rename tww/opt/chroots/hppa1.1-hp-hpux11...@ab 
tww/opt/chroots/hppa1.1-hp-hpux11...@ab-1.0
  zfs destroy tww/opt/chroots/ab/hppa1.1-hp-hpux11.11
  zfs destroy tww/opt/chroots/hppa1.1-hp-hpux11...@ab
  zfs snapshot tww/opt/chroots/hppa1.1-hp-hpux11...@ab
  zfs clone tww/opt/chroots/hppa1.1-hp-hpux11...@ab 
tww/opt/chroots/ab/hppa1.1-hp-hpux11.11
  ...
and then perform another zfs send/receive, the error above occurs. Why?

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Extremely bad performance - hw failure?

2009-12-27 Thread Joe Little
I've had this happen to me too. I found some dtrace scripts at the
time that showed that the file system was spending too much time
finding available 128k blocks or the like, as I was near full on each
disk, even though combined I still had 140GB left of my 3TB pool. The
SPA code, I believe it was, was spending too much time walking the
available pool for contiguous space for new writes, and this
affected both read and write performance dramatically (measured in
kb/sec).
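
Without digging up those scripts, a quick way to see the same symptom is to
profile kernel PCs for a while and look at which functions dominate (a sketch,
run as root):

  # dtrace -n 'profile-997 /arg0/ { @[func(arg0)] = count(); } tick-30s { exit(0); }'

If functions with names like metaslab_* or space_map_* sit near the top of the
output, the box is spending its time hunting for free space rather than doing
I/O.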

I was able to alleviate the pressure, so to speak, by adjusting the
recordsize for the pool down to 8k (32k is probably more advisable)
and from there I could then start to clear out space. Anything below
10% available space seems to cause ZFS to start behaving poorly, and
going lower increases the problems. But the root cause was
metadata management on pools w/ less than 5-10% disk space left.
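
For anyone wanting to try the same thing, the knob is just the dataset's
recordsize property; the pool name below is hypothetical, and the change only
affects blocks written after it is made:

  # zfs set recordsize=32k tank
  # zfs get recordsize tank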

In my case, I had lots of symlinks, lots of small files, and also
dozens of snapshots. My pool was a RAID10 (aka, 3 mirror sets
striped).


On Sun, Dec 27, 2009 at 4:52 PM, Morten-Christian Bernson m...@uib.no wrote:
 Lately my zfs pool in my home server has degraded to a state where it can be 
 said it doesn't work at all.  Read speed is slower than I can read from the 
 internet on my slow DSL line... This is compared to just a short while ago, 
 when I could read from it at over 50MB/sec over the network.

 My setup:
 Running latest Solaris 10: # uname -a
 SunOS solssd01 5.10 Generic_142901-02 i86pc i386 i86pc

 # zpool status DATA
  pool: DATA
  state: ONLINE
 config:
        NAME        STATE     READ WRITE CKSUM
        DATA        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
        spares
          c0t2d0    AVAIL
 errors: No known data errors

 # zfs list -r DATA
 NAME                               USED  AVAIL  REFER  MOUNTPOINT
 DATA                              3,78T   229G  3,78T  /DATA

 All of the drives in this pool are 1.5tb western digital green drives. I am 
 not seeing any error messages in /var/adm/messages, and fmdump -eV shows no 
 errors...   However, I am seeing some soft faults in iostat -eEn:
   errors ---
  s/w h/w trn tot device
  2   0   0   2 c0t0d0
  1   0   0   1 c1t0d0
  2   0   0   2 c2t1d0
 151   0   0 151 c2t2d0
 151   0   0 151 c2t3d0
 153   0   0 153 c2t4d0
 153   0   0 153 c2t5d0
  2   0   0   2 c0t1d0
  3   0   0   3 c0t2d0
  0   0   0   0 solssd01:vold(pid531)
 c0t0d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
 Vendor: Sun      Product: STK RAID INT     Revision: V1.0 Serial No:
 Size: 31.87GB 31866224128 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 2 Predictive Failure Analysis: 0
 c1t0d0           Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
 Vendor: _NEC     Product: DVD_RW ND-3500AG Revision: 2.16 Serial No:
 Size: 0.00GB 0 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 1 Predictive Failure Analysis: 0
 c2t1d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: SAMSUNG HD753LJ  Revision: 1113 Serial No:
 Size: 750.16GB 750156373504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 2 Predictive Failure Analysis: 0
 c2t2d0           Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
 Size: 1500.30GB 1500301909504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 151 Predictive Failure Analysis: 0
 c2t3d0           Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
 Size: 1500.30GB 1500301909504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 151 Predictive Failure Analysis: 0
 c2t4d0           Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
 Size: 1500.30GB 1500301909504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 153 Predictive Failure Analysis: 0
 c2t5d0           Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
 Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
 Size: 1500.30GB 1500301909504 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 153 Predictive Failure Analysis: 0
 c0t1d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
 Vendor: Sun      Product: STK RAID INT     Revision: V1.0 Serial No:
 Size: 31.87GB 31866224128 bytes
 Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
 Illegal Request: 2 Predictive Failure Analysis: 0
 c0t2d0           Soft Errors: 3 Hard Errors: 0 

Re: [zfs-discuss] repost - high read iops

2009-12-27 Thread Brad
Richard - the l2arc is c1t13d0.  What tools can be used to show the l2arc stats?
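
The L2ARC counters live in the same arcstats kstat as the rest of the ARC
statistics, so something like this should do it (field names can vary a little
between builds):

  # kstat -p zfs:0:arcstats | egrep 'l2_(size|hdr_size|hits|misses|read_bytes|write_bytes)'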

  raidz1      2.68T   580G    543    453  4.22M  3.70M
    c1t1d0        -      -    258    102   689K   358K
    c1t2d0        -      -    256    103   684K   354K
    c1t3d0        -      -    258    102   690K   359K
    c1t4d0        -      -    260    103   687K   354K
    c1t5d0        -      -    255    101   686K   358K
    c1t6d0        -      -    263    103   685K   354K
    c1t7d0        -      -    259    101   689K   358K
    c1t8d0        -      -    259    103   687K   354K
    c1t9d0        -      -    260    102   689K   358K
    c1t10d0       -      -    263    103   686K   354K
    c1t11d0       -      -    260    102   687K   359K
    c1t12d0       -      -    263    104   684K   354K
  c1t14d0      396K  29.5G      0     65      7  3.61M
cache              -      -      -      -      -      -
  c1t13d0     29.7G  11.1M    157     84  3.93M  6.45M

We've added 16GB to the box bring the overall total to 32GB.
arc_max is set to 8GB:
set zfs:zfs_arc_max = 8589934592
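
To confirm the limit actually took effect after boot, the arcstats kstat is
handy; c_max should report the same 8589934592:

  # kstat -p zfs:0:arcstats:c_max
  zfs:0:arcstats:c_max    8589934592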

arc_summary output:
ARC Size:
 Current Size: 8192 MB (arcsize)
 Target Size (Adaptive):   8192 MB (c)
 Min Size (Hard Limit):1024 MB (zfs_arc_min)
 Max Size (Hard Limit):8192 MB (zfs_arc_max)

ARC Size Breakdown:
 Most Recently Used Cache Size:          39%    3243 MB (p)
 Most Frequently Used Cache Size:        60%    4948 MB (c-p)

ARC Efficency:
 Cache Access Total: 154663786
 Cache Hit Ratio:      41%       64221251       [Defined State for buffer]
 Cache Miss Ratio:     58%       90442535       [Undefined State for Buffer]
 REAL Hit Ratio:       41%       64221251       [MRU/MFU Hits Only]

 Data Demand   Efficiency:38%
 Data Prefetch Efficiency:DISABLED (zfs_prefetch_disable)

CACHE HITS BY CACHE LIST:
  Anon:                        --%    Counter Rolled.
  Most Recently Used:          17%    8906 (mru)            [ Return Customer ]
  Most Frequently Used:        82%    53102345 (mfu)        [ Frequent Customer ]
  Most Recently Used Ghost:    14%    9427708 (mru_ghost)   [ Return Customer Evicted, Now Back ]
  Most Frequently Used Ghost:   6%    4344287 (mfu_ghost)   [ Frequent Customer Evicted, Now Back ]
CACHE HITS BY DATA TYPE:
  Demand Data:84%5108
  Prefetch Data:   0%0
  Demand Metadata:15%9777143
  Prefetch Metadata:   0%0
CACHE MISSES BY DATA TYPE:
  Demand Data:96%87542292
  Prefetch Data:   0%0
  Demand Metadata: 3%2900243
  Prefetch Metadata:   0%0


Also disabled file-level prefetch and vdev cache max:
set zfs:zfs_prefetch_disable = 1
set zfs:zfs_vdev_cache_max = 0x1

After reading about some issues with concurrent I/Os, I tweaked the setting down 
from 35 to 1 and it reduced the response times greatly (2 - 8ms):
set zfs:zfs_vdev_max_pending=1
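
If you want to experiment without rebooting, the same variable can usually be
poked on the live kernel with mdb (this assumes it is still a 32-bit int on
your build; check before writing):

  # echo zfs_vdev_max_pending/D | mdb -k        # read the current value
  # echo zfs_vdev_max_pending/W0t10 | mdb -kw   # set it to 10 on the fly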

It did increase the actv... I'm still unsure about the side-effects here:
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0
    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
 2295.2  398.7    4.2    7.2  0.0 18.6    0.0    6.9   0 1084 c1
    0.0    0.8    0.0    0.0  0.0  0.0    0.0    0.1   0   0 c1t0d0
  190.3   22.9    0.4    0.0  0.0  1.5    0.0    7.0   0  87 c1t1d0
  180.9   20.6    0.3    0.0  0.0  1.7    0.0    8.5   0  95 c1t2d0
  195.0   43.0    0.3    0.2  0.0  1.6    0.0    6.8   0  93 c1t3d0
  193.2   21.7    0.4    0.0  0.0  1.5    0.0    6.8   0  88 c1t4d0
  195.7   34.8    0.3    0.1  0.0  1.7    0.0    7.5   0  97 c1t5d0
  186.8   20.6    0.3    0.0  0.0  1.5    0.0    7.3   0  88 c1t6d0
  188.4   21.0    0.4    0.0  0.0  1.6    0.0    7.7   0  91 c1t7d0
  189.6   21.2    0.3    0.0  0.0  1.6    0.0    7.4   0  91 c1t8d0
  193.8   22.6    0.4    0.0  0.0  1.5    0.0    7.1   0  91 c1t9d0
  192.6   20.8    0.3    0.0  0.0  1.4    0.0    6.8   0  88 c1t10d0
  195.7   22.2    0.3    0.0  0.0  1.5    0.0    6.7   0  88 c1t11d0
  184.7   20.3    0.3    0.0  0.0  1.4    0.0    6.8   0  84 c1t12d0
    7.3   82.4    0.1    5.5  0.0  0.0    0.0    0.2   0   1 c1t13d0
    1.3   23.9    0.0    1.3  0.0  0.0    0.0    0.2   0   0 c1t14d0

I'm still in talks with the DBA about raising the SGA from 4GB to 
6GB to see if it'll help.

The changes that showed the most improvement were disabling file/device level 
prefetch and reducing concurrent I/Os from 35 to 1 (I tried 10 but it didn't 
help much).  Is there anything else that could be tweaked to increase write 
performance?  Record sizes are set accordingly: 8K, and 128K for the redo logs.
-- 
This