Re: [zfs-discuss] [ldoms-discuss] Solaris 10 patch 137137-09 broke LDOM

2008-11-16 Thread James Black
When installing the 137137-09 patch it ran out of / space, just like
http://www.opensolaris.org/jive/thread.jspa?threadID=82413&tstart=0
However, I tried the six recovery steps and they didn't work.
I just rebuilt the LDOM, attached the LDOM image files from the old system,
and did a zpool import to recover my ZFS data.
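
For anyone hitting the same thing, the recovery boils down to attaching the
old vdisk image files to the rebuilt guest and importing the pool from inside
it (a sketch; the pool name "tank" is hypothetical):

  zpool import          # scan the newly attached vdisks for importable pools
  zpool import tank     # import the pool found on the old images; add -f only
                        # if it is reported as potentially active elsewhere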
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fwd: [osol-announce] IMPT: Do not use SXCE Build 102

2008-11-16 Thread Johan Hartzenberg
On Sun, Nov 16, 2008 at 11:44 PM, Jeff Bonwick <[EMAIL PROTECTED]> wrote:

> These are the conditions:
>
> (1) The bug is specific to the root pool.  Other pools are unaffected.
> (2) It is triggered by doing a 'zpool online' while I/O is in flight.
> (3) Item (2) can be triggered by syseventd.
> (4) The bug is new in build 102.  Builds 101 and earlier are fine.
>
> I believe the following should be a viable workaround until build 103:
>
> (1) svcadm disable -t sysevent
> (2) Don't run zpool online on your root pool
>
> Jeff
>


Hi Jeff,

Thank you for the details.  A few more questions:  Does booting into build
102 do a 'zpool online' on the root pool?  The above 'disable -t' is
"temporary", i.e. only until the next reboot - is there any specific reason
for doing it that way?  And a last question:  What do I lose when I disable
"sysevent"?

Thank you,
  _Johan



-- 
Any sufficiently advanced technology is indistinguishable from magic.
   Arthur C. Clarke

My blog: http://initialprogramload.blogspot.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + OpenSolaris for home NAS?

2008-11-16 Thread Bill Werner
> If you want a small system that is pre-built, look at
> every possible
> permutation/combination of the Dell Vostro 200 box.

I agree, the Vostro 200 systems are an excellent deal.  Update to the latest 
BIOS and they will recognize 8GB of RAM.

The ONE problem with them is that Dell does not enable AHCI, so SATA access
is slower than it needs to be.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirror and RaidZ on only 3 disks

2008-11-16 Thread Miles Nordin
> "mb" == Martin Blom <[EMAIL PROTECTED]> writes:

mb> if I'm risking it more than usual when the procedure is done?

yeah, that is my opinion: when the procedure is done, using ZFS
without a backup is risking the data more than using UFS or ext3
without a backup.  Is that a clear statement?


I can ramble on, but maybe that's all you care to hear.

ZFS or not, I've become a big believer that filesystem-level backup (not
just RAID or snapshots) is always important for data that must last a
decade.  That doesn't mean don't use ZFS; it means this is the time
to start building proper backup into your budget.

With huge amounts of data, you can get into a situation where you need
to make a copy of the data, and you've nowhere to put it and no time
and money to create a space for it, and you find yourself painted into
a corner.  This is not as much the case if you've bought a single big
external drive---you can afford to buy another drive, and the new
drive works instantly---but with big RAIDs you have to
save/budget/build to make space to copy them.  

Why would you suddenly need a copy?  Well, maybe you need to carry the
data with you to deliver it to someone else.  You risk damaging the
copy you're physically transporting, so you should always have a
stationary copy too.  Maybe you need to (I'm repeating myself again so
maybe just reread my other post) change the raid stripe arrangement
(ex., widen the stripe when you add a fourth disk, otherwise you end
up stuck with raidz(3disk) * 2 when you could have the same capacity
and more protection with raidz2(6disk) * 1), or remove a slog, or work
around a problem by importing the pool on an untrustworthy SXCE
release or hacked zfs code that might destroy everything, or you want
to test-upgrade the pool to a new version to see if it fixes a problem
but think you might want to downgrade it to the old zpool version if
you run into other problems.  Without a copy, you will be so fearful
of every reasonable step you take that you will make overly cautious
decisions and function slowly.

The possible exception to needing backups is two-level
filesystems like GlusterFS or googlefs or maybe samfs.  These are
mid-layer filesystems that are backed by ordinary filesystems beneath
them, not block devices, and they replicate the data across the
ordinary filesystems.  Some of these may have worst-case recovery
schemes that are pretty good, especially if you have a few big files
rather than many tiny ones, so you can live with getting just the
insides of the file back like fsck gives you in lost+found.  And they
don't use RAID-like parity/FEC schemes; rather, they only make
mirror-like copies of the files, and they usually have the ability to
evacuate an underlying filesystem, so you're less likely to be painted
into a corner like I described---you always own enough disk for two or
three complete copies.  But I don't have experience here---that's my
point.  Maybe backups are still needed for these, maybe not, but at
least for all block-backed filesystems, my experience says that you
need a backup, because within a decade you'll make a mistake or hit a
corruption bug or need a copy.
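
For ZFS itself, a filesystem-level copy into a second pool can be as simple
as a recursive snapshot piped into a receive (a sketch with hypothetical pool
names, on builds that have 'zfs send -R'):

  zfs snapshot -r tank@backup-20081116
  zfs send -R tank@backup-20081116 | zfs receive -dF backuppool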


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub

2008-11-16 Thread Richard Elling
dick hoogendijk wrote:
> Can I do a zpool scrub on a running server without affecting
> webserving / email serving? I read it is an I/O-intensive operation.
>   

No, it is a read I/O-intensive operation :-)

> Does that mean the server has to be idle? Or better still: go into
> maintenance (init S)? I guess not, but still..
>   


ugh, that would be a bitter pill.  I don't know anyone who would
tolerate a requirement to go into milestone/single-user for scrubs.

The way it works is that ZFS schedules I/O with a priority scheme.
Scrubs have a lower priority than most other operations.  Regular
reads and writes will have higher priority.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fwd: [osol-announce] IMPT: Do not use SXCE Build 102

2008-11-16 Thread Jeff Bonwick
These are the conditions:

(1) The bug is specific to the root pool.  Other pools are unaffected.
(2) It is triggered by doing a 'zpool online' while I/O is in flight.
(3) Item (2) can be triggered by syseventd.
(4) The bug is new in build 102.  Builds 101 and earlier are fine.

I believe the following should be a viable workaround until build 103:

(1) svcadm disable -t sysevent
(2) Don't run zpool online on your root pool
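
Spelled out as commands (note that 'disable -t' is deliberately temporary, so
it will not survive a reboot and may need to be repeated until a fixed build
is installed):

  svcadm disable -t sysevent    # stop syseventd until the next reboot
  svcs sysevent                 # verify the service is now disabled
  # ...and simply avoid running 'zpool online' against the root pool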

Jeff

On Sun, Nov 16, 2008 at 03:12:03PM -0600, Gary Mills wrote:
> On Sat, Nov 15, 2008 at 05:41:56PM -0600, Al Hopper wrote:
> > Heads up! and apologies to folks subscribed to os-announce.
> 
> Argh, and I just live-upgraded to build 102.  I searched for this bug
> number in three bug databases without success.  Does it affect ZFS
> root only, as long as I don't use `offline' or `online'?  Do I have
> to boot back to 101?
> 
> > -- Forwarded message --
> > From: Derek Cicero <[EMAIL PROTECTED]>
> > Date: Sat, Nov 15, 2008 at 1:14 PM
> > Subject: [osol-announce] IMPT: Do not use SXCE Build 102
> > To: [EMAIL PROTECTED], os-discuss
> > <[EMAIL PROTECTED]>
> > 
> > Due to the following bug, I have removed build 102 from the Download page.
> > 
> >  6771840 zpool online on ZFS root can panic system
> > 
> > It apparently may cause data corruption and may have been implicated in
> > damage to one or more systems that have upgraded to build 102 or
> > beyond.
> 
> -- 
> -Gary Mills--Unix Support--U of M Academic Computing and Networking-
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scrub

2008-11-16 Thread Andrew Gabriel
dick hoogendijk wrote:
> Can I do a zpool scrub on a running server without affecting
> webserving / email serving? I read it is an I/O-intensive operation.
> Does that mean the server has to be idle? Or better still: go into
> maintenance (init S)? I guess not, but still..
>   

It used to have a really bad effect on the performance of my Nevada 
system (builds back in the 70's), but more recently, I don't notice it 
much (and I think it's taking longer). I don't know if someone has done 
something to reduce the load it places on the system, or maybe it's 
something to do with switching to the nv_sata driver, but whatever, the 
impact is much less in current Nevada builds.

You don't need to drop into single user or shut down the app.  I would start
it at the beginning of a quiet period though, and monitor the system
performance, at least the first time you run it.  You can cancel it if
you start seeing bad performance.

Note there was a bug whereby issuing the zpool status command as a
privileged user caused the scrub to restart (so if you kept looking, it
never appeared to be getting anywhere).  It's fixed now in Nevada (I don't
know about S10).  The safest thing, if you aren't sure, is to issue the
zpool status command only as a non-privileged user, which doesn't cause a
scrub restart.
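
In practice, a first scrub can be run and watched like this (a sketch; "tank"
is a hypothetical pool name):

  zpool scrub tank       # start the scrub
  zpool status tank      # check progress; per the note above, run this as a
                         # non-privileged user on builds with the restart bug
  zpool scrub -s tank    # stop the scrub if the performance impact is too high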

-- 
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still more questions WRT selecting a mobo for small ZFS RAID

2008-11-16 Thread Bob Friesenhahn
On Sun, 16 Nov 2008, Richard Elling wrote:
>
> Let's do some math.  A generally accepted Soft Error Rate (SER) for
> DRAMs is
> 1,000 FITs or an Annualized Failure Rate (AFR) of 0.88%.  If a non-ECC DIMM
> has 8 chips then your AFR is 7%, or 14% for 16 chip DIMMs.  My desktop
> has 4 DIMMs at 16-chips each, so I should expect an AFR of 56%.  Since these
> are soft errors, a RAM test program may not detect it.

This does not consider the possibility of a motherboard problem.  In 
my case, a partial motherboard failure caused many ECC events.  It was 
as if a couple of DIMMs were failing.  Solaris/ECC did the right thing 
to isolate the failing parts so I was not aware of the problem at all 
except for a fault report.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] scrub

2008-11-16 Thread dick hoogendijk
Can I do a zpool scrub on a running server without affecting
webserving / email serving? I read it is an I/O-intensive operation.
Does that mean the server has to be idle? Or better still: go into
maintenance (init S)? I guess not, but still..

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS sxce snv101 ++
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance

2008-11-16 Thread Mike Futerko
Hello list,


I have a system with 2x 1.8 GHz AMD CPUs, 4G of ECC RAM, 7T RAID-Z pool
on Areca controller with about 400 file systems on OpenSolaris snv_101.

The problem is that it takes a very long time to take or delete a snapshot
and to sync incremental snapshots to the backup system.


System load is quite low I'd say, CPU is 98% idle:
load average: 0.09, 0.13, 0.26

IOPs are low as well:
              capacity     operations    bandwidth
pool        used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data        5.62T  2.32T     98    115  7.72M  2.35M
data        5.62T  2.32T    547    227  47.9M   864K
data        5.62T  2.32T    204     58  15.9M   616K
data        5.62T  2.32T      4      0   256K      0
data        5.62T  2.32T     20      0   399K      0
data        5.62T  2.32T     99     47  9.68M   264K
data        5.62T  2.32T      0     11  6.93K  38.1K
data        5.62T  2.32T      0    455    506  1.90M
data        5.62T  2.32T    250     21  17.0M   420K
data        5.62T  2.32T    150    235  10.7M  1.34M
data        5.62T  2.32T    305      0  16.0M      0
data        5.62T  2.32T    137  3.42K  12.9M  16.8M
data        5.62T  2.32T    107      0  13.2M      0
data        5.62T  2.32T     56      0  4.97M      0
data        5.62T  2.32T    200    296  23.6M  1.70M


mpstat output:
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0  160   0  690  1152  568 1133   89   68  1440  25995   8   0  87
  1  154   0  108  4424 3241 1388  102   68  1370  24815   7   0  88
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  06   0   83   594  365  2860   3130   6160   7   0  93
  10   00   524  141  6692   2710   3210   2   0  98
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   0   55   575  353  2800   1530   4831   6   0  93
  10   00   462  142  6103   1750   4501   2   0  97
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   454  210  3230   1910   7630   3   0  97
  10   00   288  166  2970   1530   3380   2   0  98
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   398  172  2130   1310   6260   1   0  99
  10   00   252  154  2450   1510   2490   0   0 100
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  00   00   461  229  2920   1710   5010   3   0  97
  10   00   290  149  3394   1210   4020   2   0  98



What could be wrong when ZFS operations like creating a file system or
taking/destroying a snapshot (not to mention listing snapshots, which takes
ages) take minutes to complete?

Is there something I can look at that would help determine where the
bottleneck is, or what is wrong?
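
A few data points that usually help narrow this kind of problem down (a
sketch; the dataset names are hypothetical):

  ptime zfs snapshot data/somefs@timing-test   # how long does one snapshot take?
  ptime zfs destroy data/somefs@timing-test
  zfs list -t filesystem | wc -l               # number of file systems
  zfs list -t snapshot | wc -l                 # number of snapshots; very large
                                               # counts make listing slow
  zpool status -v data                         # errors, or a scrub/resilver running?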


Thanks in advance for any advice,
Mike
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fwd: [osol-announce] IMPT: Do not use SXCE Build 102

2008-11-16 Thread Gary Mills
On Sat, Nov 15, 2008 at 05:41:56PM -0600, Al Hopper wrote:
> Heads up! and apologies to folks subscribed to os-announce.

Argh, and I just live-upgraded to build 102.  I searched for this bug
number in three bug databases without success.  Does it affect ZFS
root only, as long as I don't use `offline' or `online'?  Do I have
to boot back to 101?

> -- Forwarded message --
> From: Derek Cicero <[EMAIL PROTECTED]>
> Date: Sat, Nov 15, 2008 at 1:14 PM
> Subject: [osol-announce] IMPT: Do not use SXCE Build 102
> To: [EMAIL PROTECTED], os-discuss
> <[EMAIL PROTECTED]>
> 
> Due to the following bug, I have removed build 102 from the Download page.
> 
>  6771840 zpool online on ZFS root can panic system
> 
> It apparently may cause data corruption and may have been implicated in
> damage to one or more systems that have upgraded to build 102 or
> beyond.

-- 
-Gary Mills--Unix Support--U of M Academic Computing and Networking-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still more questions WRT selecting a mobo for small ZFS RAID

2008-11-16 Thread Ian Collins
 On Mon 17/11/08 09:17 , Richard Elling [EMAIL PROTECTED] sent:
> Ian Collins wrote:
>
> > ZFS also uses system RAM in a way it hasn't been used before.  Memory
> > that would have been unused or holding static pages is now churning
> > rapidly, in a way similar to memory testers like memtest86. Random
> > patterns are cycling through RAM like never before, greatly increasing
> > the chances for hitting a bad bit or addressing error.  I've had RAM
> > faults that have taken hours with memtest86 to hit the trigger bit
> > pattern that would have gone unnoticed for years if I hadn't seen data
> > corruption with ZFS.
> >
> > ZFS may turn out to be the ultimate RAM soak tester!
>
> :-)  no, not really.  SERs are more of a problem for idle DRAM because
> the probability of a SER affecting you is a function of the time the
> data has been sitting in RAM waiting to be affected by upsets.
>
Maybe not for soft errors, but more so for hard errors.  The last faulty DIMM I 
had had been in use for more than a year before I started using ZFS on that 
system.  Within a few days I had I/O errors reported by ZFS.

It may have been a coincidence, but I don't believe in those!

-- 
Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still more questions WRT selecting a mobo for

2008-11-16 Thread gnomad
Henrik Johansson writes:

> I looked at this a month back, I was leaning towards
> intel for
> performance and power consumption but went for AMD
> due to lack of ECC
> support in most of the Intel chipsets.

This seems to be the crux of my indecision as well.

> I went for a AM2+ GeForce 8200 motherboard which
> seemed more stable  
> with Solaris than 8300.

The problem I have been having is that the best I can say about the in-production
AMD hardware is that some of it is simply "more stable" than the rest.

-g.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still more questions WRT selecting a mobo for

2008-11-16 Thread gnomad
Al Hopper writes:

> I'm going to be somewhat rude and bypass your list of
> detailed
> questions - but give you my thoughts on a motherboard
> recommendation
> (and other hardware).

No worries, you've pretty much confirmed things I already knew.  ;-)

> a) related to the 1Tb disks, I'd highly recommend the
> WD Caviar Black
> drive.  It's fast and the firmware does a great job on
> different
> workloads that vary between large file sequential
> read (workloads) to
> (workloads that demand) lots of small random
> reads/writes.  Their
> "dual processor" controller architecture really
> works.

I have been a fan of Seagate for the past few years, but it seems as if they 
have taken a big dive in the past six months.  I was planning to go with WD for
this project, though probably Green over Black, as heat and noise are the 
primary concerns.

> b) If I were building a system today, I'd go Intel -
>  even though I'm
> an AMD fanboy - but I can't recommend AMD today ...
> unfortunately.

Aside from the ECC issue, of course.

> c) RAM is the most important attribute of a ZFS based
> server.  Think
> lots of RAM.  Unfortunately, Intel has turned the
> market into a
> two-tier market, with the lower (price) tier limited
> to 4 DIMM slots.
> So, pick a board that has been tested with 4 * 2Gb or
> 4 * 4Gb DIMM
> configs and plan on building a system with at least
> 4*2Gb DIMMs today.
> 
> c1) If you have a choice, based on your budgetary
> constraints, between
> (for example) 4*1G of "performance" RAM and 4*2Gb of
> "value" (main
> stream performance) RAM - go with value RAM.
>  Whatever you do,  PLEASE
> maximize system memory capacity.

I was planning to go with 2 x 2G sticks (total 4G) in a four slot mobo which 
would allow me to upgrade to 8G if necessary.  I think 4G should be sufficient 
as I will be the only user for now.

> d) The P45 based boards are a no-brainer.  Great
> performance, good
> pricing, reasonable power consumption and highly
> mature.

While I would agree the P45 is mature in terms of mobo support, I have not seen
any indication that those motherboards are mature in terms of Solaris support.

> e) If the board is going to be *only* used as a NAS,
> the current CPU
> "sweet spot" is, IMHO, the Intel  Intel Core 2 Duo
> E7200 (45nm, 2.53
> GHz, 3MB L2 Cache).  Plenty of "horsepower",
> low-power consumption,
> nice cache capacity and priced to go!

I was actually thinking the E5200 which seems nearly as powerful at 2/3 the 
price.

> f) If you intend to use the box for other demanding
> tasks (for
> example, running other OS under VirtualBox) and need
> more CPU power,
> I'd pick the E8400 (dual core).   But remember, the
> priority is RAM
> capacity first, upgraded CPU second.  I really think
> that the E7200
> will work well in your application.

And I am planning to select a mobo that will take the latest 45 nm quad cores 
should I decide to do that upgrade down the road.

> I really don't think you can go wrong with any Intel
> based system that
> has had a halfway decent review report card.

The real question is how solid the P45/ICH10 support is with Solaris, and 
whether the lack of ECC support negates much of the advantage of the
P45/Core2.  I have nothing against AMD (I was an AMD guy prior to the Core 
architecture) but I just have not seen much in the way of solid reports from 
the AMD mobo chipsets currently in production.

-g.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still more questions WRT selecting a mobo for small ZFS RAID

2008-11-16 Thread Richard Elling
[EMAIL PROTECTED] wrote:
>> RTL8211C IP checksum offload is broken.  You can disable it, but you
>> have to edit /etc/system.  See CR 6686415 for details.
>> http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6686415
>> -- richard
>> 
>
>
> I think the proper way to state this is "the driver doesn't properly 
> support checksum offload".  (In many of the newer realtek cards the
> way the offload is done is different)
>   

Yes, this is a better way to say it.  The bug is in, or can be solved 
by, the driver.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still more questions WRT selecting a mobo for small ZFS RAID

2008-11-16 Thread Richard Elling
Ian Collins wrote:
> Al Hopper wrote:
>   
>> On Sat, Nov 15, 2008 at 9:26 AM, Richard Elling <[EMAIL PROTECTED]> wrote:
>>   
>> 
>>> dick hoogendijk wrote:
>>> 
>>>   
 On Sat, 15 Nov 2008 18:49:17 +1300
 Ian Collins <[EMAIL PROTECTED]> wrote:


   
 
> [EMAIL PROTECTED] wrote:
>
> 
>   
>>  > WD Caviar Black drive [...] Intel E7200 2.53GHz 3MB L2
>>  > The P45 based boards are a no-brainer
>>
>> 16G of DDR2-1066 with P45 or
>>   8G of ECC DDR2-800 with 3210 based boards
>>
>> That is the question.
>>
>>
>>   
>> 
> I guess the answer is how valuable is your data?
>
> 
>   
 I disagree. The answer is: go for the 16G and make backups. The 16G
 system will work far more "easy" and I may be lucky but in the past
 years I did not have ZFS issues with my non-ECC ram ;-)

   
 
>>> You are lucky.  I recommend ECC RAM for any data that you care
>>> about.  Remember, if there is a main memory corruption, that may
>>> impact the data that ZFS writes which will negate any on-disk
>>> redundancy.  And yes, this does occur -- check the archives for the
>>> tales of woe.
>>> 
>>>   
>> I agree with your recommendation Richard.  OTOH I've built/used a
>> bunch of systems over several years that were mostly non ECC equipped
>> and only lost one DIMM along the way.  So I guess I've been lucky also
>> - but IMHO the failure rate for RAM these days is pretty small[1].
>> I've also been around hundreds of SPARC boxes and, again, very, few
>> RAM failures (one is all that I can remember).
>>
>>   
>> 
> I think the situation will change with the current expansion in RAM
> sizes.  Five years ago with mainly 32 bit x86 systems, 4G of ram was a
> lot (even on most Sparc boxes).  Today 32 and 64GB are becoming common. 
> Desktop systems have seen similar growth.
>   

Let's do some math.  A generally accepted Soft Error Rate (SER) for 
DRAMs is
1,000 FITs or an Annualized Failure Rate (AFR) of 0.88%.  If a non-ECC DIMM
has 8 chips then your AFR is 7%, or 14% for 16 chip DIMMs.  My desktop
has 4 DIMMs at 16-chips each, so I should expect an AFR of 56%.  Since these
are soft errors, a RAM test program may not detect it.
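
For reference, the arithmetic behind those numbers (a rough sketch that simply
sums the per-chip rates rather than compounding them, which is close enough at
these magnitudes):

  # FIT = failures per 10^9 device-hours; one year is about 8760 hours
  echo 'scale=5; 1000 * 8760 / 10^9' | bc   # .00876 -> ~0.88% AFR per chip
  echo 'scale=5; .00876 * 8'  | bc          # .07008 -> ~7%  for an 8-chip DIMM
  echo 'scale=5; .00876 * 64' | bc          # .56064 -> ~56% for 4 DIMMs x 16 chips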

ECC will dramatically reduce the system-level effects of SERs.  Extended ECC
will further reduce this by about 2 orders of magnitude.

> ZFS also uses system RAM in a way it hasn't been used before.  Memory
> that would have been unused or holding static pages is now churning
> rapidly, in a way similar to memory testers like memtest86. Random patterns
> are cycling through RAM like never before, greatly increasing the chances
> for hitting a bad bit or addressing error.  I've had RAM faults that
> have taken hours with memtest86 to hit the trigger bit pattern that
> would have gone unnoticed for years if I hadn't seen data corruption
> with ZFS.
>
> ZFS may turn out to be the ultimate RAM soak tester!
>   

:-)  no, not really.  SERs are more of a problem for idle DRAM because the
probability of a SER affecting you is a function of the time the data 
has been
sitting in RAM waiting to be affected by upsets.

Note: there are some studies suggesting a correlation between SERs and
hard faults.  In practice, it doesn't really matter why or how the fault
occurred, the solution is ECC, Extended ECC, or memory mirroring.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] continuous replication

2008-11-16 Thread Mertol Özyöney
Hi All;

Accessing the same data (RAID group) from different controllers does slow
down the system considerably.
All modern controllers require the administrator to choose a primary
controller for each RAID group.
Two controllers accessing the same data will require the drive interface to
switch between ports, the controllers will not be able to optimize head
movement, caching will suffer due to duplicate records on both controllers,
and a lot of data will have to be transferred between the controllers...

Only very few disk systems support multi-controller access to the same data,
and when you read their best-practice documents you will notice that this is
not recommended.

Best regards
Mertol 


Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email [EMAIL PROTECTED]



-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Mattias Pantzare
Sent: Friday, November 14, 2008 11:48 PM
To: David Pacheco
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] continuous replication

> I think you're confusing our clustering feature with the remote
> replication feature. With active-active clustering, you have two closely
> linked head nodes serving files from different zpools using JBODs
> connected to both head nodes. When one fails, the other imports the
> failed node's pool and can then serve those files. With remote
> replication, one appliance sends filesystems and volumes across the
> network to an otherwise separate appliance. Neither of these is
> performing synchronous data replication, though.

That is _not_ active-active, that is active-passive.



If you have a active-active system I can access the same data via both
controllers at the same time. I can't if it works like you just
described. You can't call it active-active just because different
volumes are controlled by different controllers. Most active-passive
RAID controllers can do that.

The data sheet talks about active-active clusters, how does that work?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Status of the ADM

2008-11-16 Thread Mertol Özyöney
Hi All ;

 

Is there any update on the status of ADM?

 

Best regards

Mertol

Mertol Ozyoney 
Storage Practice - Sales Manager

Sun Microsystems, TR
Istanbul TR
Phone +902123352200
Mobile +905339310752
Fax +90212335
Email   [EMAIL PROTECTED]

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS hangs on laptop

2008-11-16 Thread Galen
I am NOT on a notebook and I am having this problem.  The hang/pause is less
intense when I don't use compression with ZFS.  This issue happens on my large
zpool as well as on my boot zfs volume (single disk).

Here's my scanpci output:

pci bus 0x0000 cardnum 0x00 function 0x00: vendor 0x10de device 0x02f0
 nVidia Corporation C51 Host Bridge

pci bus 0x0000 cardnum 0x00 function 0x01: vendor 0x10de device 0x02fa
 nVidia Corporation C51 Memory Controller 0

pci bus 0x0000 cardnum 0x00 function 0x02: vendor 0x10de device 0x02fe
 nVidia Corporation C51 Memory Controller 1

pci bus 0x0000 cardnum 0x00 function 0x03: vendor 0x10de device 0x02f8
 nVidia Corporation C51 Memory Controller 5

pci bus 0x0000 cardnum 0x00 function 0x04: vendor 0x10de device 0x02f9
 nVidia Corporation C51 Memory Controller 4

pci bus 0x0000 cardnum 0x00 function 0x05: vendor 0x10de device 0x02ff
 nVidia Corporation C51 Host Bridge

pci bus 0x0000 cardnum 0x00 function 0x06: vendor 0x10de device 0x027f
 nVidia Corporation C51 Memory Controller 3

pci bus 0x0000 cardnum 0x00 function 0x07: vendor 0x10de device 0x027e
 nVidia Corporation C51 Memory Controller 2

pci bus 0x0000 cardnum 0x02 function 0x00: vendor 0x10de device 0x02fc
 nVidia Corporation C51 PCI Express Bridge

pci bus 0x0000 cardnum 0x03 function 0x00: vendor 0x10de device 0x02fd
 nVidia Corporation C51 PCI Express Bridge

pci bus 0x0000 cardnum 0x04 function 0x00: vendor 0x10de device 0x02fb
 nVidia Corporation C51 PCI Express Bridge

pci bus 0x0000 cardnum 0x05 function 0x00: vendor 0x10de device 0x0245
 nVidia Corporation C51 [Quadro NVS 210S/GeForce 6150LE]

pci bus 0x0000 cardnum 0x09 function 0x00: vendor 0x10de device 0x0270
 nVidia Corporation MCP51 Host Bridge

pci bus 0x0000 cardnum 0x0a function 0x00: vendor 0x10de device 0x0260
 nVidia Corporation MCP51 LPC Bridge

pci bus 0x0000 cardnum 0x0a function 0x01: vendor 0x10de device 0x0264
 nVidia Corporation MCP51 SMBus

pci bus 0x0000 cardnum 0x0a function 0x02: vendor 0x10de device 0x0272
 nVidia Corporation MCP51 Memory Controller 0

pci bus 0x0000 cardnum 0x0b function 0x00: vendor 0x10de device 0x026d
 nVidia Corporation MCP51 USB Controller

pci bus 0x0000 cardnum 0x0b function 0x01: vendor 0x10de device 0x026e
 nVidia Corporation MCP51 USB Controller

pci bus 0x0000 cardnum 0x0d function 0x00: vendor 0x10de device 0x0265
 nVidia Corporation MCP51 IDE

pci bus 0x0000 cardnum 0x0e function 0x00: vendor 0x10de device 0x0266
 nVidia Corporation MCP51 Serial ATA Controller

pci bus 0x0000 cardnum 0x0f function 0x00: vendor 0x10de device 0x0267
 nVidia Corporation MCP51 Serial ATA Controller

pci bus 0x0000 cardnum 0x10 function 0x00: vendor 0x10de device 0x026f
 nVidia Corporation MCP51 PCI Bridge

pci bus 0x0000 cardnum 0x10 function 0x01: vendor 0x10de device 0x026c
 nVidia Corporation MCP51 High Definition Audio

pci bus 0x0000 cardnum 0x14 function 0x00: vendor 0x10de device 0x0269
 nVidia Corporation MCP51 Ethernet Controller

pci bus 0x0000 cardnum 0x18 function 0x00: vendor 0x1022 device 0x1100
 Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology 
Configuration

pci bus 0x0000 cardnum 0x18 function 0x01: vendor 0x1022 device 0x1101
 Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map

pci bus 0x0000 cardnum 0x18 function 0x02: vendor 0x1022 device 0x1102
 Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller

pci bus 0x0000 cardnum 0x18 function 0x03: vendor 0x1022 device 0x1103
 Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control

pci bus 0x0002 cardnum 0x00 function 0x00: vendor 0x1095 device 0x3132
 Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller

pci bus 0x0003 cardnum 0x00 function 0x00: vendor 0x1095 device 0x3132
 Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller

pci bus 0x0004 cardnum 0x09 function 0x00: vendor 0x1095 device 0x3114
 Silicon Image, Inc. SiI 3114 [SATALink/SATARaid] Serial ATA Controller
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs boot - U6 kernel patch breaks sparc boot

2008-11-16 Thread Ed Clark
Hi,

> > 
> > 1. a copy of the 137137-09 patchadd log if you have
> one available 
> 
> cp it to
> http://iws.cs.uni-magdeburg.de/~elkner/137137-09/
> Can't spot anything unusual.
>   

thanks for the info - what you provided here is the patch pkg installation
log; what I was actually after was the patchadd log (i.e. the messages output
to the terminal) -- both the patchadd log and the console log on reboot should
have shown errors which would have provided hints as to what the problem was


> 2. an indication of anything particular about the
> system configuration, ie. mirrored root 
> 
> No mirrors/raid:
> 
> # format
> Searching for disks...done
> 
> 
> AVAILABLE DISK SELECTIONS:
> 0. c0t0d0   alt 2 hd 4 sec 737>
>  /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
> c0t1d0   sec 606>
>  /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>  c0t2d0 
>  /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
>  c0t3d0 
>  /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
> put from the following commands run against root fs
> where 137137-09 was applied
> > 
> > ls -l usr/platform/sun4u/lib/fs/*/bootblk
> > ls -l platform/sun4u/lib/fs/*/bootblk
> > sum usr/platform/sun4u/lib/fs/*/bootblk
> > sum platform/sun4u/lib/fs/*/bootblk
> > dd if=/dev/rdsk/ of=/tmp/bb bs=1b iseek=1
> count=15
> > cmp /tmp/bb usr/platform/sun4u/lib/fs/ufs/bootblk
> > cmp /tmp/bb platform/sun4u/lib/fs/ufs/bootblk
> > prtvtoc /dev/rdsk/
> 
> also cp to
> http://iws.cs.uni-magdeburg.de/~elkner/137137-09/
> Seems to be ok, too.
>   

now the df/prtvtoc output was most useful :

137137-09 delivers sparc newboot, and the problem here appears to be that a
root fs slice of 256M falls well below the minimum size required for sparc
newboot to operate nominally -- due to the lack of space in /, I suspect that
the 137137-09 postpatch failed to copy the ~180MB failsafe archive
(/platform/sun4u/failsafe) to your system, and that the ~80MB boot archive
(/platform/sun4u/boot_archive) was not created correctly on the reboot after
applying 137137-09

the 'seek failed' error message you see on boot is coming from the ufs bootblk
fcode, which I suspect is due to it not being able to load the corrupt boot_archive

you should be able to get your system to boot by doing the following

1. net/CD/DVD boot the system using a recent update release, u5/u6 should work, 
not sure about u4 or earlier 
2. mount the root fs slice, cd to the mount point
3. ls -l platform/sun4u
4. rm -f platform/sun4u/boot_archive
5. sbin/bootadm -a update_all
6. ls -l platform/sun4u
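
put together as commands, assuming the root slice is c0t0d0s0 and it gets
mounted at /a (both names are assumptions, adjust to your layout):

  # after booting single-user from net/CD/DVD
  mount /dev/dsk/c0t0d0s0 /a
  cd /a
  ls -l platform/sun4u              # note the current (corrupt) boot_archive
  rm -f platform/sun4u/boot_archive
  sbin/bootadm -a update_all        # rebuild the boot archive (step 5 above)
  ls -l platform/sun4u              # boot_archive should now be ~80MB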

the boot_archive file should build successfully, and you should see something 
like the following

# ls -la platform/sun4u
total 168008
drwxr-xr-x   4 root sys  512 Nov 16 07:36 .
drwxr-xr-x  40 root sys 1536 Nov 16 05:36 ..
-rw-r--r--   1 root root 84787200 Nov 16 07:36 boot_archive
-rw-r--r--   1 root sys     71808 Oct  3 14:28 bootlst
drwxr-xr-x   9 root sys  512 Nov 16 05:10 kernel
drwxr-xr-x   4 root bin  512 Nov 16 05:36 lib
-rw-r--r--   1 root sys  1084048 Oct  3 14:28 wanboot
# 


boot_archive corruption will be a recurrent problem on your configuration, 
every time the system determines that boot_archive needs to be rebuilt on 
reboot -- a very inelegant workaround would be to 'rm -f 
/platform/sun4u/boot_archive' every time before rebooting the system

A better option would be to reinstall the system, choosing a disk layout
adequate for newboot

hth,
Ed
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirror and RaidZ on only 3 disks

2008-11-16 Thread Bob Friesenhahn
On Sun, 16 Nov 2008, Ross wrote:

> Well yes, but he doesn't sound too worried about performance, and 
> I'm not aware of any other issues with splitting drives?

Besides some possible loss of performance, splitting drives tends to 
blow natural redundancy models where you want as little coupling as 
possible.  If the part of the drive that zfs is using fails, you don't 
want to have to worry about whether the data in other partitions is 
still recoverable so maybe you should delay repair of the zfs pool 
while you investigate the rest of the drive.  You just want to slap in 
a new drive and wait for the pool to recover as quickly as possible. 
It is best to minimize commonality between redundant components to the 
maximum extent possible.

Bob
==
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirror and RaidZ on only 3 disks

2008-11-16 Thread Martin Blom
Miles Nordin wrote:
>
> mb> 5) Given that this is all cheap PC hardware ... can I move a
> mb> disk from a broken controller to another
>
> zpool export, zpool import.
>   
I was testing with the rpool, but "zpool import -f" when booting from the
CD did the trick. Thanks for the hint.
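
For a non-root pool, the usual sequence for moving disks to another controller
is just export and import (a sketch; "tank" is a hypothetical pool name):

  zpool export tank    # on the old controller, while the pool is still accessible
  # move the disks, then:
  zpool import         # scan for importable pools on the new controller
  zpool import tank    # -f is only needed if the pool was not cleanly exported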
> If the pool is only DEGRADED it would be nice to do it online, but I
> don't know a way to do that.
>
> mb> How does this idea sound to you?
>
> I think you need a backup in a separate pool or a non-ZFS filesystem.
> The backup could be .tar files or an extracted rsync copy, but somehow
> I think you need protection against losing the whole pool to software
> bugs or operator error.  There are other cases where you might want to
> destroy and recreate the pool, like wanting to remove a slog or change
> the raidz/raidz2/mirror level, but I think that's not why you need it.
> You need it for protection.  losing the pool is really possible.
>   
I do intend to keep backups both before and after, but are you referring
to the actual migration or to when everything is completed? I know the data
is at risk while transferring the old content and when attaching the
third drive; what I'm worried about is whether I'm risking it more than
usual once the procedure is done.

-- 
 Martin Blom --- [EMAIL PROTECTED] 
Eccl 1:18 http://martin.blom.org/



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirror and RaidZ on only 3 disks

2008-11-16 Thread Ross
Well yes, but he doesn't sound too worried about performance, and I'm not aware 
of any other issues with splitting drives?

And if you did want performance later, it would probably be possible to add a 
flash drive for cache once the prices drop.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best SXCE version for ZFS Home Server

2008-11-16 Thread Henrik Johansson

On Nov 16, 2008, at 11:23 AM, Vincent Boisard wrote:

> I just found this: http://www.sun.com/software/solaris/whats_new.jsp
>
> It lists Solaris 10 features and is a first hint at what features are
> in.
>
> Another question: My MoBo has a JMB (363 I think) SATA controller. I  
> know support is included now in sxce, but I don't know for s10U6.
>
> Is there a changelog for S10U6 somewhere like for SXCE ?

Have a look at the bugids in the patches for S10U6, like the kernel  
patch 137137-09. There are lists of all the new patches in the  
documentation for the release at docs.sun.com.

Henrik Johansson
http://sparcv9.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirror and RaidZ on only 3 disks

2008-11-16 Thread dick hoogendijk
On Sun, 16 Nov 2008 03:13:59 PST
Ross <[EMAIL PROTECTED]> wrote:

> I don't know much about working with slices in Solaris I'm afraid,
> but to me that sounds like a pretty good setup for a home server, and
> I can't see why the layout would cause you any problems.
> 
> In theory you'll be able to swap controllers without any problems
> too.  That's one of the real benefits of ZFS for me, although it's
> not something I've had to put into practice yet.

One minor issue: zfs works best on whole disks.
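
For example (hypothetical device names), handing ZFS whole disks lets it label
them itself and enable the drives' write caches, which slices don't get
automatically:

  # whole disks: ZFS writes its own label and turns on the write cache
  zpool create tank mirror c1t0d0 c1t1d0
  # slices work too, but without the automatic write-cache benefit:
  # zpool create tank mirror c1t0d0s4 c1t1d0s4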

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS sxce snv101 ++
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best SXCE version for ZFS Home Server

2008-11-16 Thread Vincent Boisard
Silly me,

I didn't scroll down far enough to see the bug description on the patch
readme page.

Vincent

On Sun, Nov 16, 2008 at 12:24 PM, Vincent Boisard <[EMAIL PROTECTED]>wrote:

> Hi,
>
>
>> Have a look at the bugids in the patches for S10U6, like the kernel patch
>> 137137-09. There are lists of all the new patches in the documentation for
>> the release at docs.sun.com.
>>
>
> Thank you for the hint, but I can only see the patches and not the bugid
> description as I don't have any service plan with sun.
>
> Vincent
>
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best SXCE version for ZFS Home Server

2008-11-16 Thread Vincent Boisard
Hi,


> Have a look at the bugids in the patches for S10U6, like the kernel patch
> 137137-09. There are lists of all the new patches in the documentation for
> the release at docs.sun.com.
>

Thank you for the hint, but I can only see the patches and not the bugid
description as I don't have any service plan with sun.

Vincent
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mirror and RaidZ on only 3 disks

2008-11-16 Thread Ross
I don't know much about working with slices in Solaris I'm afraid, but to me 
that sounds like a pretty good setup for a home server, and I can't see why the 
layout would cause you any problems.

In theory you'll be able to swap controllers without any problems too.  That's 
one of the real benefits of ZFS for me, although it's not something I've had to 
put into practice yet.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best SXCE version for ZFS Home Server

2008-11-16 Thread Vincent Boisard
I just found this: http://www.sun.com/software/solaris/whats_new.jsp

It lists Solaris 10 features and is a first hint at what features are in.

Another question: My MoBo has a JMB (363 I think) SATA controller. I know
support is included now in sxce, but I don't know for s10U6.

Is there a changelog for S10U6 somewhere like for SXCE ?

Thanks,

Vincent
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [ldoms-discuss] Solaris 10 patch 137137-09 broke LDOM

2008-11-16 Thread Casper . Dik

>I've tried using S10 U6 to reinstall the boot file (instead of U5) over
>jumpstart as it's an LDOM, and noticed another error.
>
>Boot device: /[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]  File and 
>args: -s
>Requesting Internet Address for 0:14:4f:f9:84:f3
>boot: cannot open kernel/sparcv9/unix
>Enter filename [kernel/sparcv9/unix]:
>
>Has anyone seen this error on U6 jumpstart or is it just me?

Make sure that you use an inetboot for s10u6 so it properly loads the boot
archive.

Casper

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Best SXCE version for ZFS Home Server

2008-11-16 Thread Vincent Boisard
Hi,


> If Zone Cloning via ZFS snapshots is the only feature you miss in S10u6,
> then you should reconsider.  Writing a script to implement this yourself
> will require only a little experimentation.
>

It is the only feature I miss now because it is the only one I know of;
I don't know exactly which features have been backported to S10U6.
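
For what it's worth, the ZFS half of such a zone-cloning script is just a
snapshot plus a clone (a rough sketch; the dataset names are hypothetical and
the zonecfg/zoneadm steps for registering the new zone are omitted):

  zfs snapshot rpool/zones/template@gold
  zfs clone rpool/zones/template@gold rpool/zones/newzone
  # then configure the new zone with its zonepath at the cloned dataset's
  # mountpoint and attach/boot it with zoneadm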

Vincent
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Still more questions WRT selecting a mobo for small ZFS RAID

2008-11-16 Thread dick hoogendijk
On Sat, 15 Nov 2008 13:38:53 -0600
"Al Hopper" <[EMAIL PROTECTED]> wrote:

>  So I guess I've been lucky also
> - but IMHO the failure rate for RAM these days is pretty small[1].
> I've also been around hundreds of SPARC boxes and, again, very, few
> RAM failures (one is all that I can remember).
> 
> Risk management is exactly that.  You have to determine where the risk
> is and how important it is and how likely it is to bite.  And then
> allocate costs from your budget to minimize that risk.

So I guess I do have to go for ECC ram when I build a new server.
I also understood from what you wrote ("The P45 based boards are a
no-brainer") that Intel MoBo's are a no-go when you want ECC ram -and- want
it a little cheap.

So, what -is- a really good MB that supports ECC ram (min. 8GB) and what
processor is recommended?

-- 
Dick Hoogendijk -- PGP/GnuPG key: 01D2433D
+ http://nagual.nl/ | SunOS sxce snv101 ++
+ All that's really worth doing is what we do for others (Lewis Carrol)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss