[zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Haudy Kazemi

Hello,

I've looked around Google and the zfs-discuss archives but have not been 
able to find a good answer to this question (and the related questions 
that follow it):


How well does ZFS handle unexpected power failures? (e.g. environmental 
power failures, power supply dying, etc.)

Does it consistently gracefully recover?
Should having a UPS be considered a (strong) recommendation, or a "don't 
even think about running without it" item?
Are there any communications/interfacing caveats to be aware of when 
choosing the UPS?


In this particular case, we're talking about a home file server running 
OpenSolaris 2009.06.  Actual environment power failures are generally 
< 1 per year.  I know there are a few blog articles about this type of 
application, but I don't recall seeing any (or any detailed) discussion 
about power failures and UPSes as they relate to ZFS.  I did see that 
the ZFS Evil Tuning Guide says cache flushes are done every 5 seconds.


Here is one post that didn't get any replies about a year ago after 
someone had a power failure, then UPS battery failure while copying data 
to a ZFS pool:

http://lists.macosforge.org/pipermail/zfs-discuss/2008-July/000670.html

Both theoretical answers and real life experiences would be appreciated 
as the former tells me where ZFS is needed while the latter tells me 
where it has been or is now.


Thanks,

-hk
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Ross
 backup windows using primarily iSCSI. When those
 writes occur to my RaidZ volume, all activity pauses until the writes
 are fully flushed.

The more I read about this, the worse it sounds.  The thing is, I can see where 
the ZFS developers are coming from - in theory this is a more efficient use of 
the disk, and with that being the slowest part of the system, there probably is 
a slight benefit in computational time.

However, it completely breaks any process like this that can't afford 3-5s 
delays in processing, it makes ZFS a nightmare for things like audio or video 
editing (where it would otherwise be a perfect fit), and it's also horrible 
from the perspective of the end user.

Does anybody know if a L2ARC would help this?  Does that work off a different 
queue, or would reads still be blocked?

I still think a simple solution to this could be to split the ZFS writes into 
smaller chunks.  That creates room for reads to be squeezed in (with the ratio 
of reads to writes something that should be automatically balanced by the 
software), but you still get the benefit of ZFS write ordering with all the 
work that's gone into perfecting that.  

Regardless of whether there are reads or not, your data is always going to be 
written to disk in an optimized fashion, and you could have a property on the 
pool that specifies how finely chopped up writes should be, allowing this to be 
easily tuned.
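The chunking idea above can be sketched abstractly. This is purely an illustration of the scheduling concept, not ZFS code: `drain_txg`, `chunk_size`, and `reads_per_chunk` are invented names, and a real implementation would live at the vdev queue level.

```python
from collections import deque

def drain_txg(pending_writes, read_queue, chunk_size=8, reads_per_chunk=4):
    """Flush a TXG's writes in small chunks, admitting a bounded number
    of queued reads between chunks so readers are never starved for
    seconds at a time."""
    schedule = []
    writes = deque(pending_writes)
    reads = deque(read_queue)
    while writes:
        # Issue one chunk of ordered writes (preserves ZFS write ordering).
        for _ in range(min(chunk_size, len(writes))):
            schedule.append(("W", writes.popleft()))
        # Squeeze in a few reads before the next write chunk.
        for _ in range(min(reads_per_chunk, len(reads))):
            schedule.append(("R", reads.popleft()))
    schedule.extend(("R", r) for r in reads)  # any leftover reads
    return schedule

ops = drain_txg(range(16), ["r0", "r1", "r2"])
```

With the ratio exposed as a pool property, as suggested, the same loop could be tuned toward throughput (large chunks) or latency (small chunks).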

We're considering ZFS as storage for our virtualization solution, and this 
could be a big concern.  We really don't want the entire network pausing for 
3-5 seconds any time there is a burst of write activity.
-- 
This message posted from opensolaris.org


[zfs-discuss] Scrub restarting on Solaris 10 Update 7.

2009-06-30 Thread Ian Collins
I'm trying to scrub a pool on a backup server running Solaris 10 Update 
7 and the scrub restarts each time a snap is received.


I thought this was fixed in update 6? 


The machine was recently upgraded from update 5, which did have the issue.

--
Ian.



Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Ross
I've seen enough people suffer from corrupted pools that a UPS is definitely 
good advice.  However, I'm running a (very low usage) ZFS server at home and 
it's suffered through at least half a dozen power outages without any problems 
at all.

I do plan to buy a UPS as soon as I can, but it seems to be surviving very well 
so far.


Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Monish Shah

A related question:  If you are on a UPS, is it OK to disable ZIL?

The evil tuning guide says "The ZIL is an essential part of ZFS and should 
never be disabled."  However, if you have a UPS, what can go wrong that 
really requires the ZIL?
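For reference, the knob in question on this vintage of (Open)Solaris is the zil_disable tunable that the evil tuning guide warns about. Shown only to make the question concrete, not as a recommendation:

```shell
# Disable the ZIL on the running kernel (reverts at reboot; only
# filesystems mounted after the change are affected):
echo zil_disable/W0t1 | mdb -kw

# Persistent form, in /etc/system:
#   set zfs:zil_disable = 1
```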


Opinions?

Monish

- Original Message - 
From: Ross no-re...@opensolaris.org

To: zfs-discuss@opensolaris.org
Sent: Tuesday, June 30, 2009 3:04 PM
Subject: Re: [zfs-discuss] ZFS, power failures, and UPSes


I've seen enough people suffer from corrupted pools that a UPS is 
definitely good advice.  However, I'm running a (very low usage) ZFS 
server at home and it's suffered through at least half a dozen power 
outages without any problems at all.


I do plan to buy a UPS as soon as I can, but it seems to be surviving very 
well so far.



Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Scott Lawson



Haudy Kazemi wrote:

Hello,

I've looked around Google and the zfs-discuss archives but have not 
been able to find a good answer to this question (and the related 
questions that follow it):


How well does ZFS handle unexpected power failures? (e.g. 
environmental power failures, power supply dying, etc.)

Does it consistently gracefully recover?
Mostly. Unless you are unlucky. Backups are your friend in *any* 
environment though.
Should having a UPS be considered a (strong) recommendation or a 
"don't even think about running without it" item?
There has been quite an interesting thread on this over the last few 
months. I won't repeat my comments, but it is there in digital posterity 
on the zfs-discuss archives.


Certainly in a large environment with a lot of data being written, one 
should consider this a mandatory requirement if you care about your 
data, particularly if there are many links in your storage chain that 
could suffer data corruption from a power failure.


Are there any communications/interfacing caveats to be aware of when 
choosing the UPS?


In this particular case, we're talking about a home file server 
running OpenSolaris 2009.06.  
As far as a home server goes, particularly if it is not write intensive, 
you will 'most likely' be fine. I have a home server with a v120 
running S10 u6 with a D1000 and 7 x 300 GB SCSI disks in a RAIDZ2 that 
has seen numerous power interruptions with no faults. This machine is a 
Samba server for my Macs and printing business.

I also have another mail / web server on another v120 which 
experiences the same power faults and regularly bounces back without 
issues.  But your mileage may vary. It all really depends on how much 
you care about the data.

I haven't used OpenSolaris specifically, as I prefer the generally 
better-supported S10 releases. (Yes, I know you can get support for 
OpenSolaris, but I tend to be conservative and standardize as much as 
possible. I do have millions of files stored on ZFS volumes for our 
Uni, and I sleep well ;))


Actual environment power failures are generally < 1 per year.  I know 
there are a few blog articles about this type of application, but I 
don't recall seeing any (or any detailed) discussion about power 
failures and UPSes as they relate to ZFS.  I did see that the ZFS Evil 
Tuning Guide says cache flushes are done every 5 seconds.
The flush time you mention is based on older versions of ZFS; newer 
ones can have a flush interval as long as 30 seconds now, I believe.


Here is one post that didn't get any replies about a year ago after 
someone had a power failure, then UPS battery failure while copying 
data to a ZFS pool:

http://lists.macosforge.org/pipermail/zfs-discuss/2008-July/000670.html

Both theoretical answers and real life experiences would be 
appreciated as the former tells me where ZFS is needed while the later 
tells me where it has been or is now.


Thanks,

-hk


Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Andre van Eyssen

On Tue, 30 Jun 2009, Monish Shah wrote:

The evil tuning guide says "The ZIL is an essential part of ZFS and should 
never be disabled."  However, if you have a UPS, what can go wrong that 
really requires the ZIL?


Without addressing a single ZFS-specific issue:

* panics
* crashes
* hardware failures
- dead RAM
- dead CPU
- dead systemboard
- dead something else
* natural disasters
* UPS failure
* UPS failure (must be said twice)
* Human error (what does this button do?)
* Cabling problems (say, where did my disks go?)
* Malicious actions (Fired? Let me turn their power off!)

That's just a warm-up; I'm sure people can add the ZFS-specific 
reasons, and also point out the fallacy that a UPS does anything more 
than mitigate one particular single point of failure.


Don't forget to buy two UPSes and split your machine across both. And 
don't forget to actually maintain the UPS. And check the batteries. And 
schedule a load test.


The single best way to learn about the joys of UPS behaviour is to sit 
down and have a drink with a facilities manager who has been doing the job 
for at least ten years. At least you'll hear some funny stories about the 
day a loose screw on one floor took out a house UPS and 100+ hosts and NEs 
with it.


Andre.


--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses http://www2.purplecow.org
purplecow.org: PCOWpix http://pix.purplecow.org



Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Doug Baker - Sun UK - Support Engineer

Monish Shah wrote:

A related question:  If you are on a UPS, is it OK to disable ZIL?

The evil tuning guide says "The ZIL is an essential part of ZFS and 
should never be disabled."  However, if you have a UPS, what can go 
wrong that really requires the ZIL?


The UPS.



Opinions?

Monish

- Original Message - From: Ross no-re...@opensolaris.org
To: zfs-discuss@opensolaris.org
Sent: Tuesday, June 30, 2009 3:04 PM
Subject: Re: [zfs-discuss] ZFS, power failures, and UPSes


I've seen enough people suffer from corrupted pools that a UPS is 
definitely good advice.  However, I'm running a (very low usage) ZFS 
server at home and it's suffered through at least half a dozen power 
outages without any problems at all.


I do plan to buy a UPS as soon as I can, but it seems to be surviving 
very well so far.




--
Dr Doug Baker
Sun Microsystems Systems Support Engineer.
UK Mission Critical Solution Centre.
Tel : 0870 600 3222


Re: [zfs-discuss] Useful Emulex tunable for i386

2009-06-30 Thread Bob Friesenhahn

On Sun, 28 Jun 2009, Bob Friesenhahn wrote:


On Sun, 28 Jun 2009, Bob Friesenhahn wrote:
Today I experimented with doubling this value to 688128 and was happy to 
see a large increase in sequential read performance from my ZFS pool, which 
is based on six mirror vdevs.  Sequential read performance jumped from 
552787 KB/s to 799626 KB/s.  It seems that the default driver buffer size 
interferes with zfs's ability to double the read performance by balancing 
the reads across the mirror devices.  Now the read performance is almost 2X 
the write performance.


Grumble.  This may be a bit of a red herring.


Perhaps this Emulex tunable was not entirely a red herring.

Doubling the default for this tunable made a difference to my 
application.  It dropped total real execution time from 2:45:03.152 to 
2:24:25.675.  That is a pretty large improvement.


If I run two copies of my application at once and divide up the work, 
the execution time is 1:42:32.42.  Even with two (or three) copies of 
the application running, it seems that zfs is still the bottleneck 
since the square-wave of system CPU utilization becomes even more 
prominent, indicating that all readers are blocked during the TXG 
sync.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Bob Friesenhahn

On Tue, 30 Jun 2009, Ross wrote:


However, it completely breaks any process like this that can't 
afford 3-5s delays in processing, it makes ZFS a nightmare for 
things like audio or video editing (where it would otherwise be a 
perfect fit), and it's also horrible from the perspective of the end 
user.


Yes.  I updated the image at 
http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-stalls.png 
so that it shows the execution impact with more processes running. 
This is taken with three processes running in parallel so that there 
can be no doubt that I/O is being globally blocked and it is not just 
misbehavior of a single process.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Neal Pollack

On 06/30/09 03:00 AM, Andre van Eyssen wrote:

On Tue, 30 Jun 2009, Monish Shah wrote:

The evil tuning guide says "The ZIL is an essential part of ZFS and 
should never be disabled."  However, if you have a UPS, what can go 
wrong that really requires the ZIL?


Without addressing a single ZFS-specific issue:

* panics
* crashes
* hardware failures
- dead RAM
- dead CPU
- dead systemboard
- dead something else
* natural disasters
* UPS failure
* UPS failure (must be said twice)
* Human error (what does this button do?)
* Cabling problems (say, where did my disks go?)
* Malicious actions (Fired? Let me turn their power off!)

That's just a warm-up; I'm sure people can add both the ZFS-specific 
reasons and also the fallacy that a UPS does anything more than 
mitigate one particular single point of failure.


Actually, they do quite a bit more than that. They create jobs, 
generate revenue for battery manufacturers and for the techs that 
change batteries and do PM maintenance on the large units.  Let's not 
forget that they add significant revenue to the transportation 
industry, given their weight for shipping.


In the last 28 years of doing this stuff, I've found a few times that 
the UPS has actually worked and lasted as long as the outage.  Many 
other times, the unit has failed (circuits), or the batteries are 
beyond their service life.  But really, something approaching 40% of 
the time they actually work out OK.

So they also create repair and recycling jobs. :-)




Don't forget to buy two UPSes and split your machine across both. And 
don't forget to actually maintain the UPS. And check the batteries. 
And schedule a load test.


The single best way to learn about the joys of UPS behaviour is to sit 
down and have a drink with a facilities manager who has been doing the 
job for at least ten years. At least you'll hear some funny stories 
about the day a loose screw on one floor took out a house UPS and 100+ 
hosts and NEs with it.


Andre.






Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Bob Friesenhahn

On Tue, 30 Jun 2009, Neal Pollack wrote:

Actually, they do quite a bit more than that. They create jobs, 
generate revenue for battery manufacturers, and tech's that change 
batteries and do PM maintenance on the large units.  Let's not


It sounds like this is a responsibility which should be moved to the 
US federal government, since UPSs create jobs.


In the last 28 years of doing this stuff, I've found a few times 
that the UPS has actually worked and lasted as long as the outage.


I have seen UPSs help quite a lot for short glitches lasting seconds, 
or a minute.  Otherwise the outage is usually longer than the UPS can 
stay up, since the problem requires human attention.


A standby generator is needed for any long outages.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Erik Trimble

Bob Friesenhahn wrote:

On Tue, 30 Jun 2009, Neal Pollack wrote:

Actually, they do quite a bit more than that. They create jobs, 
generate revenue for battery manufacturers, and tech's that change 
batteries and do PM maintenance on the large units.  Let's not


It sounds like this is a responsibility which should be moved to the 
US federal goverment since UPSs create jobs.


Actually, I think UPS already employs some 410,000+ people, making it 
the 3rd largest private employer in the USA. (5th overall, if you 
include the Federal Gov't and the US Postal Service).


*wink*


In the last 28 years of doing this stuff, I've found a few times that 
the UPS has actually worked and lasted as long as the outage.


I have seen UPSs help quite a lot for short glitches lasting seconds, 
or a minute.  Otherwise the outage is usually longer than the UPSs can 
stay up since the problem required human attention.


A standby generator is needed for any long outages.

Bob


As someone who has spent enough time doing data center work, I can 
attest to the fact that UPSes are really useful only as 
extremely-short-interval solutions. A dozen or so minutes, at best.


The best design I've seen was for an old BBN (hey, remember them!) site 
just outside of Cambridge, MA.  It took in utility power, ran it through 
a conditioner setup, and then through this nice switch thing.  The 
switch took three inputs: utility, a local diesel generator, and a line 
of marine batteries.  The switch itself was internally redundant (which 
isn't hard to do, it's '50s tech), so you could draw power from any of 
them (or even all 3 at once).  Nothing really fancy; it was simple, with 
no semiconductor stuff to fail - just all '50s-ish hardwired circuitry. I 
don't even think there was a transistor in the whole shebang. Lots of 
capacitors, though.   :-)



The gist of the whole thing was that if utility power was out more than 
5 minutes, there was no good predictor of how long it would remain out 
- I saw a nice little graph that showed no real prediction of outage 
time based on existing outage length (i.e. if the power has been out X 
minutes, you can expect it to be restored in Y minutes...).   I suspect 
it was based on something like 20 years of accumulated data or so...


The end of this is simple:  UPSes should give you enough time to start 
the gen-pack.  If you are having problems with your gen-pack, you'll 
never have enough UPS time to fix it (and, it's not cost-effective to 
try to make it so), so FIX YOUR GEN PACK BEFORE the outage.  Which means 
- TEST it, and TEST it, and TEST it again!



For home use, I set my UPS to immediately shut down anything attached to 
it for /any/ service outage.  Large enough batteries to handle anything 
more than a couple of minutes are frankly a fire-hazard for the home, 
not to mention a maintenance PITA.


--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Jason King
On Tue, Jun 30, 2009 at 1:36 PM, Erik Trimble erik.trim...@sun.com wrote:
 Bob Friesenhahn wrote:

 On Tue, 30 Jun 2009, Neal Pollack wrote:

 Actually, they do quite a bit more than that. They create jobs, generate
 revenue for battery manufacturers, and tech's that change batteries and do
 PM maintenance on the large units.  Let's not

 It sounds like this is a responsibility which should be moved to the US
 federal goverment since UPSs create jobs.

 Actually, I think UPS already employs some 410,000+ people, making it the
 3rd largest private employer in the USA. (5th overall, if you include the
 Federal Gov't and the US Postal Service).

 wink


 In the last 28 years of doing this stuff, I've found a few times that the
 UPS has actually worked and lasted as long as the outage.

 I have seen UPSs help quite a lot for short glitches lasting seconds, or a
 minute.  Otherwise the outage is usually longer than the UPSs can stay up
 since the problem required human attention.

 A standby generator is needed for any long outages.

 Bob

 As someone who has spend enough time doing data center work, I can attest to
 the fact that UPSes are really useful only as extremely-short-interval
 solutions. A dozen or so minutes, at best.

 The best design I've see was for an old BBN (hey, remember them!) site just
 outside of Cambridge, MA.  It took in utility power, ran it through a
 conditioner setup, and then through this nice switch thing.  The switch took
 three inputs:  Utility, a local diesel generator, and a line of marine
 batteries.  The switch itself was internally redundant (which isn't hard to
 do, it's 50's tech), so you could draw power from any (or even all 3 at
 once).  Nothing really fancy; it was simple, with no semiconductor stuff to
 fail - just all 50-ish hardwired circuitry. I don't even think there was a
 transistor in the whole shebang. Lots of capacitors, though.   :-)


 The jist of the whole thing was, that if utility power was out more than 5
 minutes, there was not good predictor of how long it would remain out - I
 saw a nice little graph that showed no real good prediction of outage time
 based on existing outage length (i.e. if the power has been out X minutes,
 you can expect it to be restored in Y minutes...).   I suspect it was
 something like 20 years of accumulated data or so...

 The end of this is simple:  UPSes should give you enough time to start the
 gen-pack.  If you are having problems with your gen-pack, you'll never have
 enough UPS time to fix it (and, it's not cost-effective to try to make it
 so), so FIX YOUR GEN PACK BEFORE the outage.  Which means - TEST it, and
 TEST it, and TEST it again!

Slight corollary -- just because you have a generator and test it
doesn't mean you can assume you can get fuel in a timely manner (so
still be prepared to shutdown if needed).  I have seen places whose DR
plans completely rely on the assumption there will never be any
problems refueling their generators.  However, last year after Ike
hit, one of AT&T's central offices lost power because it ran out of
fuel (and couldn't get refilled in time).



 For home use, I set my UPS to immediately shut down anything attached to it
 for /any/ service outage.  Large enough batteries to handle anything more
 than a couple of minutes are frankly a fire-hazard for the home, not to
 mention a maintenance PITA.

 --
 Erik Trimble
 Java System Support
 Mailstop:  usca22-123
 Phone:  x17195
 Santa Clara, CA



Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Scott Meilicke
For what it is worth, I too have seen this behavior when load testing our zfs 
box. I used iometer and the RealLife profile (1 worker, 1 target, 65% reads, 
60% random, 8k, 32 IOs in the queue). When writes are being dumped, reads drop 
close to zero, from 600-700 read IOPS to 15-30 read IOPS.

zpool iostat data01 1

Where data01 is my pool name

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data01  55.5G  20.4T691  0  4.21M  0
data01  55.5G  20.4T632  0  3.80M  0
data01  55.5G  20.4T657  0  3.93M  0
data01  55.5G  20.4T669  0  4.12M  0
data01  55.5G  20.4T689  0  4.09M  0
data01  55.5G  20.4T488  1.77K  2.94M  9.56M
data01  55.5G  20.4T 29  4.28K   176K  23.5M
data01  55.5G  20.4T 25  4.26K   165K  23.7M
data01  55.5G  20.4T 20  3.97K   133K  22.0M
data01  55.6G  20.4T170  2.26K  1.01M  11.8M
data01  55.6G  20.4T678  0  4.05M  0
data01  55.6G  20.4T625  0  3.74M  0
data01  55.6G  20.4T685  0  4.17M  0
data01  55.6G  20.4T690  0  4.04M  0
data01  55.6G  20.4T679  0  4.02M  0
data01  55.6G  20.4T664  0  4.03M  0
data01  55.6G  20.4T699  0  4.27M  0
data01  55.6G  20.4T423  1.73K  2.66M  9.32M
data01  55.6G  20.4T 26  3.97K   151K  21.8M
data01  55.6G  20.4T 34  4.23K   223K  23.2M
data01  55.6G  20.4T 13  4.37K  87.1K  23.9M
data01  55.6G  20.4T 21  3.33K   136K  18.6M
data01  55.6G  20.4T468496  2.89M  1.82M
data01  55.6G  20.4T687  0  4.13M  0

-Scott


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Bob Friesenhahn

On Mon, 29 Jun 2009, Lejun Zhu wrote:


With ZFS write throttle, the number 2.5GB is tunable. From what I've 
read in the code, it is possible to e.g. set 
zfs:zfs_write_limit_override = 0x8000000 (bytes) to make it write 
128M instead.


This works, and the difference in behavior is profound.  Now it is a 
matter of finding the best value which optimizes both usability and 
performance.  A tuning for 384 MB:


# echo zfs_write_limit_override/W0t402653184 | mdb -kw
zfs_write_limit_override:   0x30000000  =   0x18000000

CPU is smoothed out quite a lot and write latencies (as reported by a 
zio_rw.d dtrace script) are radically different than before.
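For what it's worth, a value that should survive reboots can go in /etc/system; this is just the persistent form of the same zfs_write_limit_override tunable (0x18000000 = 402653184 bytes = 384 MB):

```shell
# /etc/system -- persist the TXG write-limit cap across reboots;
# remove the line and reboot to return to default write-throttle behavior
set zfs:zfs_write_limit_override = 0x18000000
```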


Perfmeter display for 256 MB:
http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-256mb.png

Perfmeter display for 384 MB:
http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-384mb.png

Perfmeter display for 768 MB:
http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-768mb.png

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Miles Nordin
 ms == Monish Shah mon...@indranetworks.com writes:
 sl == Scott Lawson scott.law...@manukau.ac.nz writes:
 np == Neal Pollack neal.poll...@sun.com writes:

ms If you are on a UPS, is it OK to disable ZIL?

sl I have seen numerous UPS' failures over the years,

yeah at my place in NYC we've had more problems with the UPS than with
the service.  At the very least a UPS needs to switch off for new
batteries every two years, and the raw service does not go out that
often for me.

It starts to make more sense to use a UPS if you have dual power
supplies, dual UPS's, bypass switches.  Or crappy aboveground power.

anyway, typical machines panic because of bugs a lot more often than
either UPS or line problems.

**BUT THIS IS ALL BESIDE THE POINT**!

The ZIL is for implementing fsync() for databases and also the part of
NFS that allows servers to reboot without client data loss.  It has
*NOTHING TO DO* with losing your entire pool.  Disabling the ZIL does
not make catastrophic pool loss more likely, not even a little bit!

Unfortunately some software developer decided to write a bunch of DIRE
WARNINGS to SCARE PEOPLE INTO ASSUMPTIONS leading them to use the
maximum amount of code of which said developer is justly proud,
regardless of whether they're using it for the right reason or not.

oddly, I don't think disabling ZIL will make catastrophic loss more
likely for databases running above the ZFS, either, because unlike
non-COW filesystems ZFS never recovers to a state where writes appear
to have happened out-of-order prior to the crash.  Yes, disabling the
ZIL could break the 'D' in ACID for databases running above that ZFS,
but in a way that rolls them back in time, not makes them become
corrupt.  Running without ZIL is as if a snapshot were taken at each
TXG commit time, and on reboot after a crash you recover to the most
recent TXG-snapshot that fully committed, thus databases will be
``crash-consistent'' even without the ZIL, unless I'm mistaken.
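That rollback-not-corruption argument can be modeled with a toy simulation (purely illustrative; a "txg" here is just a batch of writes, and a crash keeps only the fully committed batches):

```python
def crash_recover(write_batches, committed_txgs):
    """Model ZFS-without-ZIL recovery: the on-disk state after a crash
    is exactly the writes of the first `committed_txgs` batches, in
    order -- a prefix of history, never a reordering of it."""
    state = []
    for txg in write_batches[:committed_txgs]:  # later batches vanish whole
        state.extend(txg)
    return state

history = [["a", "b"], ["c"], ["d", "e"]]  # three txgs of writes
after_crash = crash_recover(history, committed_txgs=2)
# after_crash is a prefix of the full write history: later writes are
# lost, but no surviving write ever appears before one it depended on.
```

A non-COW filesystem, by contrast, could surface "c" without "a", which is the out-of-order recovery that actually corrupts databases.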

Adding an SSD *does* make catastrophic pool loss more likely, because
if you break the SSD and then export the pool, you can never import it
again.  so, adding an SSD for the ZIL as a suggestive good-little-boy
alternative to disabling the ZIL makes catastrophic loss of the entire
pool more likely, not less.

The advantage of rolling with ZIL is, if you're using NFS you should
be able to crash and reboot the server without the clients noticing.
Also MTA's that accept messages, databases that confirm orders and
bookings, won't lose anything they've accepted or confirmed in the
crash (if everything else works).  I wish ZIL could be enabled and
disabled per filesystem instead of per kernel.




Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Brent Jones
On Tue, Jun 30, 2009 at 12:25 PM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:
 On Mon, 29 Jun 2009, Lejun Zhu wrote:

 With ZFS write throttle, the number 2.5GB is tunable. From what I've read
 in the code, it is possible to e.g. set zfs:zfs_write_limit_override =
 0x8000000 (bytes) to make it write 128M instead.

 This works, and the difference in behavior is profound.  Now it is a matter
 of finding the best value which optimizes both usability and performance.
  A tuning for 384 MB:

 # echo zfs_write_limit_override/W0t402653184 | mdb -kw
 zfs_write_limit_override:       0x30000000      =       0x18000000

 CPU is smoothed out quite a lot and write latencies (as reported by a
 zio_rw.d dtrace script) are radically different than before.

 Perfmeter display for 256 MB:
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-256mb.png

 Perfmeter display for 384 MB:
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-384mb.png

 Perfmeter display for 768 MB:
 http://www.simplesystems.org/users/bfriesen/zfs-discuss/perfmeter-768mb.png

 Bob
 --
 Bob Friesenhahn
 bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
 GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Maybe there could be a supported ZFS tuneable (per file system even?)
that is optimized for 'background' tasks, or 'foreground'.

Beyond that, I will give this tuneable a shot and see how it impacts
my own workload.

Thanks!

-- 
Brent Jones
br...@servuhome.net


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Bob Friesenhahn

On Tue, 30 Jun 2009, Brent Jones wrote:


Maybe there could be a supported ZFS tuneable (per file system even?)
that is optimized for 'background' tasks, or 'foreground'.

Beyond that, I will give this tuneable a shot and see how it impacts
my own workload.


Note that this issue does not apply at all to NFS service, database 
service, or any other usage which does synchronous writes.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread David Magda

On Jun 30, 2009, at 14:08, Bob Friesenhahn wrote:

I have seen UPSs help quite a lot for short glitches lasting  
seconds, or a minute.  Otherwise the outage is usually longer than  
the UPSs can stay up since the problem required human attention.


A standby generator is needed for any long outages.


Can't remember where I read the claim, but supposedly if power isn't  
restored within about ten minutes, then it will probably be out for a  
few hours. If this 'statistic' is true, it would mean that your UPS  
should last (say) fifteen minutes, and after that you really need a  
generator.


At $WORK we currently have about thirty minutes worth of juice at full  
load, but as time drags on and we start shutting down less essential  
stuff we can increase that. The PBX and security system have their own  
UPSes in their own racks, so there are two layers of battery there.




Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Rob Logan

 CPU is smoothed out quite a lot
yes, but the area under the CPU graph is less, so the
rate of real work performed is less, so the entire
job took longer (albeit smoother).

Rob


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Ross
Interesting to see that it makes such a difference, but I wonder what effect it
has on ZFS's write ordering, and its attempts to prevent fragmentation?

By reducing the write buffer, are you losing those benefits?

Although on the flip side, I guess this leaves you no worse off than any other
filesystem, and as SSD drives take off, fragmentation is going to be less and
less of an issue.
-- 
This message posted from opensolaris.org


Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Scott Lawson



David Magda wrote:

On Jun 30, 2009, at 14:08, Bob Friesenhahn wrote:

I have seen UPSs help quite a lot for short glitches lasting seconds, 
or a minute.  Otherwise the outage is usually longer than the UPSs 
can stay up since the problem required human attention.


A standby generator is needed for any long outages.


Can't remember where I read the claim, but supposedly if power isn't 
restored within about ten minutes, then it will probably be out for a 
few hours. If this 'statistic' is true, it would mean that your UPS 
should last (say) fifteen minutes, and after that you really need a 
generator.
Most UPSes from any vendor are designed to run for around twelve minutes
at full load, so that would appear to back that claim up, and from my
experience that is pretty much on the money...
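That rule of thumb is easy to sanity-check with back-of-the-envelope arithmetic; all figures below are hypothetical, chosen only to show the calculation:

```python
def runtime_minutes(battery_wh, load_w, inverter_eff=0.9):
    """Rough UPS runtime estimate: usable stored energy divided by load.

    Ignores battery aging and discharge-rate effects, which shorten real
    runtimes further -- so treat the result as an upper bound.
    """
    return battery_wh * inverter_eff / load_w * 60

# A hypothetical 200 Wh battery string feeding a 900 W load:
print(round(runtime_minutes(200, 900), 1))  # prints 12.0 (minutes)
```

Halving the load roughly doubles the runtime, which is why shedding less essential gear as an outage drags on buys so much extra time.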


At $WORK we currently have about thirty minutes worth of juice at full 
load, but as time drags on and we start shutting down less essential 
stuff we can increase that. The PBX and security system have their own 
UPSes in their own racks, so there are two layers of battery there.
The problem comes when the power cut happens in the middle of the night
and you aren't there. Then you either need an automated shutdown system,
triggered by traps from the UPS (shutting things down in the correct
order), or a generator. At about this point the generator becomes a very
good option. The no-generator scenario above needs to be tested regularly
to maintain its validity, which is a royal pain in the neck. Gen sets are
worth their weight in gold; I can't even count how many times in the last
few years they have saved our bacon (through both planned and unplanned
outages).




Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Bob Friesenhahn

On Tue, 30 Jun 2009, Rob Logan wrote:


CPU is smoothed out quite a lot

yes, but the area under the CPU graph is less, so the
rate of real work performed is less, so the entire
job took longer (albeit smoother).


For the purpose of illustration, the case showing the huge sawtooth 
was when running three processes at once.  The period/duration of the 
sawtooth was pretty similar, but the magnitude changed.


I agree that there is a size which provides the best balance of 
smoothness and application performance.  Probably the value should be 
dialed down to just below the point where the sawtooth occurs.


More at 11.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS write I/O stalls

2009-06-30 Thread Scott Meilicke
 On Tue, 30 Jun 2009, Bob Friesenhahn wrote:
 
 Note that this issue does not apply at all to NFS
 service, database 
 service, or any other usage which does synchronous
 writes.

I see read starvation with NFS. I was using iometer on a Windows VM,
connecting to an NFS mount on an OpenSolaris 2008.11 physical box. iometer
params: 65% read, 60% random, 8k blocks, 32 outstanding IO requests, 1
worker, 1 target.

NFS Testing
            capacity     operations    bandwidth
pool      used  avail   read  write   read  write
--  -  -  -  -  -  -
data01  59.6G  20.4T 46 24   757K  3.09M
data01  59.6G  20.4T 39 24   593K  3.09M
data01  59.6G  20.4T 45 25   687K  3.22M
data01  59.6G  20.4T 45 23   683K  2.97M
data01  59.6G  20.4T 33 23   492K  2.97M
data01  59.6G  20.4T 16 41   214K  1.71M
data01  59.6G  20.4T  3  2.36K  53.4K  30.4M
data01  59.6G  20.4T  1  2.23K  20.3K  29.2M
data01  59.6G  20.4T  0  2.24K  30.2K  28.9M
data01  59.6G  20.4T  0  1.93K  30.2K  25.1M
data01  59.6G  20.4T  0  2.22K  0  28.4M
data01  59.7G  20.4T 21295   317K  4.48M
data01  59.7G  20.4T 32 12   495K  1.61M
data01  59.7G  20.4T 35 25   515K  3.22M
data01  59.7G  20.4T 36 11   522K  1.49M
data01  59.7G  20.4T 33 24   508K  3.09M
data01  59.7G  20.4T 35 23   536K  2.97M
data01  59.7G  20.4T 32 23   483K  2.97M
data01  59.7G  20.4T 37 37   538K  4.70M

While writes are being committed to the ZIL all the time, periodic dumping to 
the pool still occurs, and during those times reads are starved. Maybe this 
doesn't happen in the 'real world'?
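The starvation is visible mechanically in the numbers above. A rough sketch that parses a few of those rows (column meanings assumed from `zpool iostat` output: read and write operations are the fourth and fifth fields):

```python
def num(s):
    """Parse zpool iostat's abbreviated numbers, e.g. '2.36K' -> 2360.0."""
    units = {"K": 1e3, "M": 1e6, "G": 1e9, "T": 1e12}
    return float(s[:-1]) * units[s[-1]] if s[-1] in units else float(s)

# A subset of the rows pasted above: steady state, then the TXG flush.
rows = """\
data01 59.6G 20.4T 33 23 492K 2.97M
data01 59.6G 20.4T 16 41 214K 1.71M
data01 59.6G 20.4T 3 2.36K 53.4K 30.4M
data01 59.6G 20.4T 1 2.23K 20.3K 29.2M
data01 59.6G 20.4T 0 2.24K 30.2K 28.9M""".splitlines()

for line in rows:
    f = line.split()
    reads, writes = num(f[3]), num(f[4])
    flag = "  <- TXG flush, reads starved" if writes > 1000 else ""
    print(f"read_ops={reads:6.0f}  write_ops={writes:7.0f}{flag}")
```

Read operations drop from the 30s-40s to near zero exactly while write operations jump to over 2,000/s, which matches the stall behavior described earlier in the thread.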

-Scott


Re: [zfs-discuss] Any news on deduplication?

2009-06-30 Thread Andre van Eyssen

On Tue, 30 Jun 2009, MC wrote:


Any news on the ZFS deduplication work being done?  I hear Jeff Bonwick might 
speak about it this month.


Yes, it is definitely on the agenda for Kernel Conference Australia 
(http://www.kernelconference.net) - you should come along!


--
Andre van Eyssen.
mail: an...@purplecow.org  jabber: an...@interact.purplecow.org
purplecow.org: UNIX for the masses http://www2.purplecow.org
purplecow.org: PCOWpix http://pix.purplecow.org



Re: [zfs-discuss] ZFS, power failures, and UPSes

2009-06-30 Thread Ian Collins

David Magda wrote:

On Jun 30, 2009, at 14:08, Bob Friesenhahn wrote:

I have seen UPSs help quite a lot for short glitches lasting seconds, 
or a minute.  Otherwise the outage is usually longer than the UPSs 
can stay up since the problem required human attention.


A standby generator is needed for any long outages.


Can't remember where I read the claim, but supposedly if power isn't 
restored within about ten minutes, then it will probably be out for a 
few hours. If this 'statistic' is true, it would mean that your UPS 
should last (say) fifteen minutes, and after that you really need a 
generator.
Or run your systems off DC and get as much backup as you have room (and 
budget!) for in batteries.  I once visited a central exchange with 48 
hours of battery capacity...


--
Ian.
