Re: [zfs-discuss] Poor relative performance of SAS over SATA drives

2011-10-27 Thread Jason J. W. Williams
 if you get rid of the HBA and the log device, and run with the ZIL
 disabled (if your workload is compatible with a disabled ZIL).

By "get rid of the HBA" I assume you mean putting in a battery-backed RAID
card instead?
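
(Side note for anyone trying the disabled-ZIL test quoted above: the old
global switch lives in /etc/system and takes effect at the next boot -- only
appropriate if the workload can survive losing the last few seconds of
synchronous writes after a crash. A minimal sketch:

    * /etc/system -- disables the ZIL globally; use with care
    set zfs:zil_disable = 1

On bits that have the per-dataset sync property, 'zfs set sync=disabled
<dataset>' is the finer-grained equivalent.)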

-J
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [OpenIndiana-discuss] Question about WD drives with Super Micro systems

2011-08-06 Thread Jason J. W. Williams
WD's drives have gotten better over the last few years, but their quality is still 
not very good. I doubt they test their drives extensively for heavy-duty server 
configs, particularly since you don't see them inside any of the major server 
manufacturers' boxes. 

Hitachi in particular does well in mass storage configs. 

-J

Sent via iPhone

Is your email Premiere?

On Aug 6, 2011, at 10:45, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:

 Hi all
 
 We have a few servers with WD Black (and some green) drives on Super Micro 
 systems. We've seen both drives work well with direct attach, but with LSI 
 controllers and Super Micro's SAS expanders, well, that's another story. With 
 those SAS expanders, we've seen numerous drives being kicked out and flagged 
 as bad during high load (typically scrub/resilver). We have not seen this on 
 the units we have with Hitachi or Seagate drives. After a drive is kicked 
 out, we run a test on it using WD's tool, and in many (or most) cases we 
 find the drive to be error-free. We've seen these issues on several machines, 
 so hardware failure seems unlikely to be the cause.
 
 Has anyone here used WD drives with LSI controllers (3801/3081/9211) in 
 Super Micro machines? Any success stories?
 
 Vennlige hilsener / Best regards
 
 roy
 --
 Roy Sigurd Karlsbakk
 (+47) 97542685
 r...@karlsbakk.net
 http://blogg.karlsbakk.net/
 --
 In all pedagogy it is essential that the curriculum be presented intelligibly. It 
 is an elementary imperative for all pedagogues to avoid excessive use of idioms 
 of foreign origin. In most cases, adequate and relevant synonyms exist in 
 Norwegian.
 
 ___
 OpenIndiana-discuss mailing list
 openindiana-disc...@openindiana.org
 http://openindiana.org/mailman/listinfo/openindiana-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [OpenIndiana-discuss] Question about WD drives with Super Micro systems

2011-08-06 Thread Jason J. W. Williams
This might be related to your issue:

http://blog.mpecsinc.ca/2010/09/western-digital-re3-series-sata-drives.html

On Saturday, August 6, 2011, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:
 In my experience, SATA drives behind SAS expanders just don't work.
 They fail in the manner you
 describe, sooner or later. Use SAS and be happy.

 Funny thing is, Hitachi and Seagate drives work stably, whereas WD drives
tend to fail rather quickly.

 Vennlige hilsener / Best regards

 roy
 --
 Roy Sigurd Karlsbakk
 (+47) 97542685
 r...@karlsbakk.net
 http://blogg.karlsbakk.net/
 --
 In all pedagogy it is essential that the curriculum be presented intelligibly.
It is an elementary imperative for all pedagogues to avoid excessive use of
idioms of foreign origin. In most cases, adequate and relevant synonyms exist
in Norwegian.

 ___
 OpenIndiana-discuss mailing list
 openindiana-disc...@openindiana.org
 http://openindiana.org/mailman/listinfo/openindiana-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Long import due to spares.

2010-10-05 Thread Jason J. W. Williams
Just for history as to why Fishworks was running on this box...we were
in the beta program and have upgraded along the way. This box is an
X4240 with 16x 146GB disks running the Feb 2010 release of FW with
de-dupe.

We were getting ready to re-purpose the box and were getting our data off.
We then deleted a filesystem that was using de-duplication, and the box
suddenly froze while the pool showed frantic activity.

After several failed attempts to recover the box to a usable state (days
of importing got nowhere), we reloaded the boot drives with Nexenta 3.0
(b134), which was our goal anyway. When we tried to import this pool
again, it finally imported after 24 hours, but with the error
that the two spares were FAULTED with too many errors.

The controller is an LSI 1068E-IR.

Normally I'd believe the drive was dead, except: both spares? Could
this be related to the deleted de-dupe filesystem?

-J
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Unusual Resilver Result

2010-09-30 Thread Jason J. W. Williams
Hi,

I just replaced a drive (c12t5d0 in the listing below). For the first 6
hours of the resilver I saw no issues. However, sometime during the last
hour of the resilver, the new drive and two others in the same RAID-Z2 stripe
threw a couple of checksum errors. Also, two of the other drives in the stripe
decided sometime in the last hour that they needed to resilver small amounts of
data (128K and 64K respectively). The OS is snv126.

My two questions are:

Should I be worried about these checksum errors?

What caused the small resilverings on c8t5d0 and c11t5d0 which were not
replaced or otherwise touched?

Thank you in advance.

-J

  pool: zpool_db_css
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 7h0m with 0 errors on Thu Sep 30 04:59:49
2010
config:

NAME STATE READ WRITE CKSUM
zpool_db_css  ONLINE   0 0 0
  raidz2-0   ONLINE   0 0 0
c7t5d0   ONLINE   0 0 0
c8t5d0   ONLINE   0 0 4  128K resilvered
c10t5d0  ONLINE   0 0 0
c11t5d0  ONLINE   0 0 2  64K resilvered
c12t5d0  ONLINE   0 0 3  61.0G resilvered
c13t5d0  ONLINE   0 0 0
  raidz2-1   ONLINE   0 0 0
c7t6d0   ONLINE   0 0 0
c8t6d0   ONLINE   0 0 0
c10t6d0  ONLINE   0 0 0
c11t6d0  ONLINE   0 0 0
c12t6d0  ONLINE   0 0 0
c13t6d0  ONLINE   0 0 0
  raidz2-2   ONLINE   0 0 0
c7t7d0   ONLINE   0 0 0
c8t7d0   ONLINE   0 0 0
c10t7d0  ONLINE   0 0 0
c11t7d0  ONLINE   0 0 0
c12t7d0  ONLINE   0 0 0
c13t7d0  ONLINE   0 0 0
spares
  c13t4d0AVAIL
  c12t4d0AVAIL
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unusual Resilver Result

2010-09-30 Thread Jason J. W. Williams
Thanks Tuomas. I'll run the scrub. It's an aging X4500.
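
For the archives, that roughly boils down to (pool name taken from the
status output in my earlier mail):

    zpool clear zpool_db_css      # reset the error counters
    zpool scrub zpool_db_css      # re-read and verify every block
    zpool status -v zpool_db_css  # check whether new checksum errors appear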

-J

On Thu, Sep 30, 2010 at 3:25 AM, Tuomas Leikola tuomas.leik...@gmail.comwrote:

 On Thu, Sep 30, 2010 at 9:08 AM, Jason J. W. Williams 
 jasonjwwilli...@gmail.com wrote:


 Should I be worried about these checksum errors?


 Maybe. Your disks, cabling, or disk controller probably had some issue
 that caused them. Or maybe sunspots are to blame.

 Run a scrub often and monitor if there are more, and if there is a pattern
 to them. Have backups. Maybe switch hardware one by one to see if that
 helps.


 What caused the small resilverings on c8t5d0 and c11t5d0 which were not
 replaced or otherwise touched?


 It was the checksum errors. ZFS automatically read the good data from the
 remaining redundancy and rewrote the broken blocks with correct data. If you run zpool
 clear and zpool scrub you will notice these checksum errors have vanished.
 If they were caused by botched writes, no new errors should appear.
 If they were botched reads, you may see some new ones appearing  :(

 So, not critical yet but something to keep an eye on.

 Tuomas

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Long resilver time

2010-09-27 Thread Jason J. W. Williams
134 it is. This is an OpenSolaris rig that's going to be replaced within the
next 60 days, so I just need to get it to something that won't throw false
checksum errors like the 120-123 builds do and that has decent rebuild times.

Future boxes will be NexentaStor.

Thank you guys. :)

-J

On Sun, Sep 26, 2010 at 2:21 PM, Richard Elling rich...@nexenta.com wrote:

 On Sep 26, 2010, at 1:16 PM, Roy Sigurd Karlsbakk wrote:
  Upgrading is definitely an option. What is the current snv favorite
  for ZFS stability? I apologize, with all the Oracle/Sun changes I
  haven't been paying as close attention to bug reports on zfs-discuss
  as I used to.
 
  OpenIndiana b147 is the latest binary release, but it also includes
  the fix for
  CR6494473, ZFS needs a way to slow down resilvering
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
  http://www.openindiana.org
 
  Are you sure upgrading to OI is safe at this point? 134 is stable unless
 you start fiddling with dedup, and OI is hardly tested. For a production
 setup, I'd recommend 134

 For a production setup?  For production I'd recommend something that is
 supported, preferably NexentaStor 3 (which is b134 + important ZFS fixes
 :-)
  -- richard

 --
 OpenStorage Summit, October 25-27, Palo Alto, CA
 http://nexenta-summit2010.eventbrite.com

 Richard Elling
 rich...@nexenta.com   +1-760-896-4422
 Enterprise class storage for everyone
 www.nexenta.com





 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Long resilver time

2010-09-27 Thread Jason J. W. Williams
Err...I meant Nexenta Core.

-J

On Mon, Sep 27, 2010 at 12:02 PM, Jason J. W. Williams 
jasonjwwilli...@gmail.com wrote:

 134 it is. This is an OpenSolaris rig that's going to be replaced within
 the next 60 days, so I just need to get it to something that won't throw
 false checksum errors like the 120-123 builds do and that has decent rebuild
 times.

 Future boxes will be NexentaStor.

 Thank you guys. :)

 -J

 On Sun, Sep 26, 2010 at 2:21 PM, Richard Elling rich...@nexenta.comwrote:

 On Sep 26, 2010, at 1:16 PM, Roy Sigurd Karlsbakk wrote:
  Upgrading is definitely an option. What is the current snv favorite
  for ZFS stability? I apologize, with all the Oracle/Sun changes I
  haven't been paying as close attention to bug reports on zfs-discuss
  as I used to.
 
  OpenIndiana b147 is the latest binary release, but it also includes
  the fix for
  CR6494473, ZFS needs a way to slow down resilvering
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6494473
  http://www.openindiana.org
 
  Are you sure upgrading to OI is safe at this point? 134 is stable unless
 you start fiddling with dedup, and OI is hardly tested. For a production
 setup, I'd recommend 134

 For a production setup?  For production I'd recommend something that is
 supported, preferably NexentaStor 3 (which is b134 + important ZFS fixes
 :-)
  -- richard

 --
 OpenStorage Summit, October 25-27, Palo Alto, CA
 http://nexenta-summit2010.eventbrite.com

 Richard Elling
 rich...@nexenta.com   +1-760-896-4422
 Enterprise class storage for everyone
 www.nexenta.com





 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intermittent ZFS hang

2010-09-27 Thread Jason J. W. Williams
If one were sticking with OpenSolaris for the short term, is something older
than 134 more stable/less buggy? We're not using de-dupe.

-J

On Thu, Sep 23, 2010 at 6:04 PM, Richard Elling richard.ell...@gmail.comwrote:

 Hi Charles,
 There are quite a few bugs in b134 that can lead to this. Alas, due to the
 new
 regime, there was a period of time where the distributions were not being
 delivered. If I were in your shoes, I would upgrade to OpenIndiana b147
 which
 has 26 weeks of maturity and bug fixes over b134.

 http://www.openindiana.org
  -- richard



 On Sep 23, 2010, at 2:48 PM, Charles J. Knipe wrote:

  So, I'm still having problems with intermittent hangs on write with my
 ZFS pool.  Details from my original post are below.  Since posting that,
 I've gone back and forth with a number of you, and gotten a lot of useful
 advice, but I'm still trying to get to the root of the problem so I can
 correct it.  Since the original post I have:
 
  -Gathered a great deal of information in the form of kernel thread dumps,
 zio_state dumps, and live crash dumps while the problem is happening.
  -Been advised that my ruling out of dedupe was probably premature, as I
 still likely have a good deal of deduplicated data on-disk.
  -Checked just about every log and counter that might indicate a hardware
 error, without finding one.
 
  I was wondering at this point if someone could give me some pointers on
 the following:
  1. Given the dumps and diagnostic data I've gathered so far, is there a
 way I can determine for certain where in the ZFS driver I'm spending so much
 time hanging?  At the very least I'd like to try to determine whether it is,
 in-fact a deduplication issue.
  2. If it is, in fact, a deduplication issue, would my only recourse be a
 new pool and a send/receive operation?  The data we're storing is VMFS
 volumes for ESX.  We're tossing around the idea of creating new volumes in
 the same pool (now that dedupe is off) and migrating VMs over in small
 batches.  The theory is that we would be writing non-deduped data this way,
 and when we were done we could remove the deduplicated volumes.  Is this
 sound?
 
  Thanks again for all the help!
 
  -Charles
 
  Howdy,
 
  We're having a ZFS performance issue over here that I
  was hoping you guys could help me troubleshoot.  We
  have a ZFS pool made up of 24 disks, arranged into 7
  raid-z devices of 4 disks each.  We're using it as an
  iSCSI back-end for VMWare and some Oracle RAC
  clusters.
 
  Under normal circumstances performance is very good
  both in benchmarks and under real-world use.  Every
  couple days, however, I/O seems to hang for anywhere
  between several seconds and several minutes.  The
  hang seems to be a complete stop of all write I/O.
  The following zpool iostat illustrates:
 
  pool0   2.47T  5.13T    120      0   293K      0
  pool0   2.47T  5.13T    127      0   308K      0
  pool0   2.47T  5.13T    131      0   322K      0
  pool0   2.47T  5.13T    144      0   347K      0
  pool0   2.47T  5.13T    135      0   331K      0
  pool0   2.47T  5.13T    122      0   295K      0
  pool0   2.47T  5.13T    135      0   330K      0
 
  While this is going on our VMs all hang, as do any
  zfs create commands or attempts to touch/create
  files in the zfs pool from the local system.  After
  several minutes the system un-hangs and we see very
  high write rates before things return to normal
  across the board.
 
  Some more information about our configuration:  We're
  running OpenSolaris svn-134.  ZFS is at version 22.
  Our disks are 15kRPM 300gb Seagate Cheetahs, mounted
  in Promise J610S Dual enclosures, hanging off a Dell
  SAS 5/e controller.  We'd tried out most of this
  configuration previously on OpenSolaris 2009.06
  without running into this problem.  The only thing
  that's new, aside from the newer OpenSolaris/ZFS is
  a set of four SSDs configured as log disks.
 
  At first we blamed de-dupe, but we've disabled that.
  Next we suspected the SSD log disks, but we've seen
  the problem with those removed, as well.
 
  Has anyone seen anything like this before?  Are there
  any tools we can use to gather information during the
  hang which might be useful in determining what's
  going wrong?
 
  Thanks for any insights you may have.
 
  -Charles
  --
  This message posted from opensolaris.org
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 --
 OpenStorage Summit, October 25-27, Palo Alto, CA
 http://nexenta-summit2010.eventbrite.com
 ZFS and performance consulting
 http://www.RichardElling.com












 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org

[zfs-discuss] Long resilver time

2010-09-26 Thread Jason J. W. Williams
I just witnessed a resilver that took 4 hours for 27 GB of data. The setup is 3x 
RAID-Z2 stripes with 6 disks per RAID-Z2. The disks are 500 GB in size. No checksum errors. 

It seems like an exorbitantly long time. The other 5 disks in the stripe with 
the replaced disk were at 90% busy and ~150 IO/s each during the resilver. Does 
this seem unusual to anyone else? Could it be due to heavy fragmentation, or do 
I have a disk in the stripe going bad? Post-resilver no disk is above 30% util 
or noticeably higher than any other disk. 

Thank you in advance. (kernel is snv123)

-J

Sent via iPhone

Is your e-mail Premiere?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Long resilver time

2010-09-26 Thread Jason J. W. Williams
Upgrading is definitely an option. What is the current snv favorite for ZFS 
stability? I apologize, with all the Oracle/Sun changes I haven't been paying 
as close attention to bug reports on zfs-discuss as I used to. 

-J

Sent via iPhone

Is your e-mail Premiere?

On Sep 26, 2010, at 10:22, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:

 
 I just witnessed a resilver that took 4h for 27gb of data. Setup is 3x 
 raid-z2 stripes with 6 disks per raid-z2. Disks are 500gb in size. No 
 checksum errors. 
 
 It seems like an exorbitantly long time. The other 5 disks in the stripe with 
 the replaced disk were at 90% busy and ~150io/s each during the resilver. 
 Does this seem unusual to anyone else? Could it be due to heavy fragmentation 
 or do I have a disk in the stripe going bad? Post-resilver no disk is above 
 30% util or noticeably higher than any other disk. 
 
 Thank you in advance. (kernel is snv123)
 It surely seems a long time for 27 gigs. Scrub takes its time, but for this 
 50TB setup with currently ~29TB used, on WD Green drives (yeah, I know 
 they're bad, but I didn't know that at the time I installed the box, and they 
 have worked flawlessly for a year or so), scrub takes a bit of time, but 
 nothing comparable to what you're reporting.
 
scrub: scrub completed after 47h57m with 0 errors on Fri Sep  3 16:57:26 
 2010
 
 Also, snv123 is quite old, is upgrading to 134 an option?
  
 Vennlige hilsener / Best regards
 
 roy
 --
 Roy Sigurd Karlsbakk
 (+47) 97542685
 r...@karlsbakk.net
 http://blogg.karlsbakk.net/
 --
 In all pedagogy it is essential that the curriculum be presented intelligibly. It 
 is an elementary imperative for all pedagogues to avoid excessive use of idioms 
 of foreign origin. In most cases, adequate and relevant synonyms exist in 
 Norwegian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] [storage-discuss] ZFS iscsi snapshot - VSS compatible?

2009-01-07 Thread Jason J. W. Williams
Since iSCSI operates at the block level, I don't think the file-level
intelligence you're asking for is feasible in the iSCSI target itself. VSS
works at the filesystem level, on NTFS partitions or over CIFS.
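
At the ZFS level, the best you can get for an iSCSI-exported zvol is a
crash-consistent point-in-time snapshot; any quiescing has to happen on the
initiator side (which is what VSS coordinates on Windows). A sketch with a
hypothetical volume name:

    zfs snapshot tank/iscsivol@before-copy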

-J

On Wed, Jan 7, 2009 at 5:06 PM, Mr Stephen Yum sosu...@yahoo.com wrote:
 Hi all,

 If I want to make a snapshot of an iscsi volume while there's a transfer 
 going on, is there a way to detect this and either 1) not include the file 
 being transferred, or 2) wait until the transfer is finished before making 
 the snapshot?

 If I understand correctly, this is what Microsoft's VSS is supposed to do. Am 
 I right?

 Right now, when there is a transfer going on while making the snapshot, I 
 always end up with a corrupt file (understandably so, since the file transfer 
 is unfinished).

 S





 ___
 storage-discuss mailing list
 storage-disc...@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/storage-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] 3ware support

2008-02-12 Thread Jason J. W. Williams
X4500 problems seconded. We're still having issues with port resets due to
the Marvell driver, though they seem considerably more transient and
less likely to lock up the entire system in the more recent (> b72)
OpenSolaris builds.

-J

On Feb 12, 2008 9:35 AM, Carson Gaspar [EMAIL PROTECTED] wrote:
 Tim wrote:
 

  A much cheaper (and probably the BEST supported) card is the Supermicro
  based on the Marvell chipset.  This is the same chipset that is used in
  the Thumper X4500, so you know that the folks at Sun are doing their due
  diligence to make sure the drivers are solid.

 Except the drivers _aren't_ solid, at least in Solaris(tm). The
 OpenSolaris drivers may have been fixed (I know a lot of work is going
 into them, but I haven't tested them), but those fixes have not made it
 back into the supported realm.

 So if you need to run a supported OS, I'd skip the Marvell chips if
 possible, at least for now.

 --
 Carson

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LVM on ZFS

2008-01-21 Thread Jason J. W. Williams
Hey Thiago,

SVM is a direct replacement for LVM. Also, you'll notice about a 30%
performance boost if you move from LVM to SVM; at least that's what we saw
when we moved a couple of years ago.
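
That said, if the goal is simply LVM-style volume management, ZFS itself
covers it directly. A sketch with hypothetical device names and sizes:

    zpool create black c1t0d0 c1t1d0   # the pool plays the role of the volume group
    zfs create -V 10g black/lv00       # emulated volume; newfs/UFS can sit on top of it
    zfs create black/data              # or skip newfs entirely and use a native ZFS filesystem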

-J

On Jan 21, 2008 8:09 AM, Thiago Sobral [EMAIL PROTECTED] wrote:
 Hi folks,

 I need to manage volumes like LVM does on Linux or AIX, and I think that
 ZFS can solve this issue.

 I read the SVM specification, and it certainly won't be the
 solution that I'll adopt. I don't have Veritas here.

 I created a pool named black and a volume lv00, then created a
 filesystem with the 'newfs' command:
 #newfs /dev/zvol/rdsk/black/lv00

 Is this the right way?
 What is the best way to manage volumes in Solaris?
 Do you have a URL or document describing this?

 cheers,

 TS


 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] De-duplication in ZFS

2008-01-21 Thread Jason J. W. Williams
It'd be a really nice feature. Combined with baked-in replication it
would be a nice alternative to our DD appliances.

-J

On Jan 21, 2008 2:03 PM, John Martinez [EMAIL PROTECTED] wrote:

 Great question. I've been wondering this myself over the past few
 weeks, as de-dup is becoming more popular a term in our IT department.

 -john

 On Jan 20, 2008, at 5:40 PM, Narayan Venkat wrote:

  Hi,
 
  Is de-duplication in ZFS an active project?  If so, can somebody
  share details about how it's going to be implemented?
 
  Thanks.
 
  NV

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] MySQL/ZFS backup program posted.

2008-01-17 Thread Jason J. W. Williams
Hey Y'all,

I've posted the program (SnapBack) my company developed internally for
backing up production MySQL servers using ZFS snapshots:
http://blogs.digitar.com/jjww/?itemid=56

Hopefully, it'll save other folks some time. We use it a lot for
standing up new MySQL slaves as well.
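
This isn't SnapBack itself -- just the bare ZFS primitives a tool like this
wraps, with hypothetical dataset names (the value of the script is the MySQL
flushing/locking it coordinates around them):

    zfs snapshot data/mysql@20080117-0300
    zfs send data/mysql@20080117-0300 | ssh slave zfs receive data/mysql-seed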

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hello,

There seems to be a persistent issue we have with ZFS where one of the
SATA disks in a zpool on a Thumper starts throwing sense errors: ZFS
does not offline the disk and instead hangs all zpools across the
system. If it is not caught soon enough, application data ends up in
an inconsistent state. We've had this issue with b54 through b77 (as
of last night).

Reading through the archives, we don't seem to be the only folks with this
issue. Are there any plans to fix this behavior? It really makes
ZFS less than desirable/reliable.

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hi Albert,

Thank you for the link. ZFS isn't offlining the disk in b77.

-J

On Jan 3, 2008 3:07 PM, Albert Chin
[EMAIL PROTECTED] wrote:

 On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
  There seems to be a persistent issue we have with ZFS where one of the
  SATA disk in a zpool on a Thumper starts throwing sense errors, ZFS
  does not offline the disk and instead hangs all zpools across the
  system. If it is not caught soon enough, application data ends up in
  an inconsistent state. We've had this issue with b54 through b77 (as
  of last night).
 
  We don't seem to be the only folks with this issue reading through the
  archives. Are there any plans to fix this behavior? It really makes
  ZFS less than desirable/reliable.

 http://blogs.sun.com/eschrock/entry/zfs_and_fma

 FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:
   http://www.opensolaris.org/os/community/arc/caselog/2007/283/
   http://www.opensolaris.org/os/community/on/flag-days/all/

 --
 albert chin ([EMAIL PROTECTED])
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Not Offlining Disk on SCSI Sense Error (X4500)

2008-01-03 Thread Jason J. W. Williams
Hi Eric,

Hard to say. I'll use MDB next time it happens for more info. The
applications using any zpool lock up.

-J

On Jan 3, 2008 3:33 PM, Eric Schrock [EMAIL PROTECTED] wrote:
 When you say "starts throwing sense errors", does that mean every I/O to
 the drive will fail, or some arbitrary percentage of I/Os will fail?  If
 it's the latter, ZFS is trying to do the right thing by recognizing
 these as transient errors, but eventually the ZFS diagnosis should kick
 in.  What does '::spa -ve' in 'mdb -k' show in one of these situations?
 How about '::zio_state'?

 - Eric


 On Thu, Jan 03, 2008 at 03:11:39PM -0700, Jason J. W. Williams wrote:
  Hi Albert,
 
  Thank you for the link. ZFS isn't offlining the disk in b77.
 
  -J
 
  On Jan 3, 2008 3:07 PM, Albert Chin
  [EMAIL PROTECTED] wrote:
  
   On Thu, Jan 03, 2008 at 02:57:08PM -0700, Jason J. W. Williams wrote:
There seems to be a persistent issue we have with ZFS where one of the
SATA disk in a zpool on a Thumper starts throwing sense errors, ZFS
does not offline the disk and instead hangs all zpools across the
system. If it is not caught soon enough, application data ends up in
an inconsistent state. We've had this issue with b54 through b77 (as
of last night).
   
We don't seem to be the only folks with this issue reading through the
archives. Are there any plans to fix this behavior? It really makes
ZFS less than desirable/reliable.
  
   http://blogs.sun.com/eschrock/entry/zfs_and_fma
  
   FMA For ZFS Phase 2 (PSARC/2007/283) was integrated in b68:
 http://www.opensolaris.org/os/community/arc/caselog/2007/283/
 http://www.opensolaris.org/os/community/on/flag-days/all/
  
   --
   albert chin ([EMAIL PROTECTED])
   ___
   zfs-discuss mailing list
   zfs-discuss@opensolaris.org
   http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 --
 Eric Schrock, FishWorkshttp://blogs.sun.com/eschrock

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance with Oracle

2007-12-05 Thread Jason J. W. Williams
Seconded. Redundant controllers means you get one controller that
locks them both up, as much as it means you've got backup.

Best Regards,
Jason

On Mar 21, 2007 4:03 PM, Richard Elling [EMAIL PROTECTED] wrote:
 JS wrote:
  I'd definitely prefer owning a sort of SAN solution that would basically 
  just be trays of JBODs exported through redundant controllers, with 
  enterprise level service. The world is still playing catch up to integrate 
  with all the possibilities of zfs.

 It was called the A5000, later A5100 and A5200.  I've still
 got the scars and Torrey looks like one of the X-men.  If you think
 that a disk drive vendor can write better code than an OS/systems
 vendor, then you're due for a sad realization.
   -- richard

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] X4500 ILOM thinks disk 20 is faulted, ZFS thinks not.

2007-12-04 Thread Jason J. W. Williams
Hey Guys,

Have any of y'all seen a condition where the ILOM considers a disk
faulted (status is 3 instead of 1), but ZFS keeps writing to the disk
and doesn't report any errors? I'm going to do a scrub tomorrow and
see what comes back. I'm curious what caused the ILOM to fault the
disk. Any advice is greatly appreciated.

Best Regards,
Jason

P.S.
The system is running OpenSolaris Build 54.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X4500 ILOM thinks disk 20 is faulted, ZFS thinks not.

2007-12-04 Thread Jason J. W. Williams
Hi Ralf,

Thank you for the suggestion. About half of the disks are reporting
1968-1969 in the Soft Errors field.  All disks are reporting 1968 in
the Illegal Request field. There don't appear to be any other
errors; all other counters are 0. The Illegal Request count seems a
little fishy...like iostat -E doesn't like the X4500 for some reason.
Thank you again for your help.

Best Regards,
Jason

On Dec 4, 2007 2:54 AM, Ralf Ramge [EMAIL PROTECTED] wrote:
 Jason J. W. Williams wrote:
  Have any of y'all seen a condition where the ILOM considers a disk
  faulted (status is 3 instead of 1), but ZFS keeps writing to the disk
  and doesn't report any errors? I'm going to do a scrub tomorrow and
  see what comes back. I'm curious what caused the ILOM to fault the
  disk. Any advice is greatly appreciated.
 
 What does `iostat -E` tell you?

 I've experienced several times that ZFS is very fault tolerant - a bit
 too tolerant for my taste - when it comes to faulting a disk. I saw
 external FC drives with hundreds or even thousands of errors, even
 entire hanging loops or drives with hardware trouble, and neither ZFS
 nor /var/adm/messages reported a problem. So I prefer examining the
 iostat output over `zpool status` - but with the unattractive side
 effect that it's not possible to reset the error count which iostat
 reports without a reboot, so this method is not suitable for monitoring
 purposes.

 --

 Ralf Ramge
 Senior Solaris Administrator, SCNA, SCSA

 Tel. +49-721-91374-3963
 [EMAIL PROTECTED] - http://web.de/

 1&1 Internet AG
 Brauerstraße 48
 76135 Karlsruhe

 Registered at Amtsgericht Montabaur, HRB 6484

 Executive Board: Henning Ahlert, Ralph Dommermuth, Matthias Ehrlich, Andreas Gauger, 
 Thomas Gottschlich, Matthias Greve, Robert Hoffmann, Norbert Lang, Achim Weiss
 Chairman of the Supervisory Board: Michael Scheeren


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Yager on ZFS

2007-11-09 Thread Jason J. W. Williams
 A quick Google of ext3 fsck did not yield obvious examples of why people 
 needed to run fsck on ext3, though it did remind me that by default ext3 runs 
 fsck just for the hell of it every N (20?) mounts - could that have been part 
 of what you were seeing?


I'm not sure if that's what Robert meant, but that's been my
experience with ext3. In fact, that little behavior caused a rather
lengthy bit of downtime for another company in our colo facility
this week as a result of a facility-required reboot. Frankly, ext3 is
an abortion of a filesystem. I'm somewhat surprised it's being used as
a counterexample to the claim that journaling filesystems are no less
reliable than ZFS. XFS or ReiserFS are both better examples than ext3.

The primary use case for end-to-end checksumming in our environment
has been exonerating the storage path when data corruption occurs. It's
been crucial in a couple of instances in proving to our DB vendor that
the corruption was caused by their code and not the OS, drivers, HBA,
FC network, array, etc.

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Count objects/inodes

2007-11-09 Thread Jason J. W. Williams
Hi Guys,

Someone asked me how to count the number of inodes/objects in a ZFS
filesystem, and I wasn't exactly sure. 'zdb -dv filesystem' seems
like a likely candidate, but I wanted to find out for sure. As to why
you'd want to know this: I don't know their reasoning, but I assume it
has to do with the maximum number of files a ZFS filesystem can
support (2^48, no?). Thank you in advance for your help.
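
Two rough ways I can think of, treated as sketches: 'zdb -d <pool/fs>' prints
an object count for the dataset, and a plain userland count of files and
directories is:

    find /mountpoint -xdev | wc -l

The zdb number includes ZFS-internal objects, so the two won't match exactly.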

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fracture Clone Into FS

2007-10-18 Thread Jason J. W. Williams
Hi Bill,

You've got it 99%. I want to roll E back to, say, B, and keep G intact.
I really don't care about C, D, or F. Essentially, B is where I want to
roll back to, but in case B's data copy doesn't improve what I'm
trying to fix, I want to have a copy of G's data around so I can go back
to how it was.

My order of operations would be something like this:

1.)  Snapshot filesystem to preserve current state (snapshot F).
2.)  Create clone of F (clone G).
3.)  Roll the filesystem back to snapshot B.
4.)  Maintain clone G data even though filesystem is at B.

My concerns are:

1.) If I roll back to B after creating the clone, it will erase F and
thereby the dependent clone G.
2.) If I promote the clone G, G will become the active filesystem data
copy, when what I want is for B to be the active data copy; I just want to
keep G around.

I apologize that this is coming out so confusingly. Please let me know
if this is clear at all.

I guess, put simply, I'd like to be able to roll back
to any particular snapshot without having to lose any newer snapshot,
thereby giving the ability to roll forward and backward.

Thank you in advance very much!

Best Regards,
Jason

On 10/18/07, Bill Moore [EMAIL PROTECTED] wrote:
 I may not be understanding your usage case correctly, so bear with me.

 Here is what I understand your request to be.  Time is increasing from
 left to right.

 A -- B -- C -- D -- E
  \
   - F -- G

 Where E and G are writable filesystems and the others are snapshots.

 I think you're saying that you want to, for example, keep G and roll E
 back to A, keeping A, B, F, and G.

 If that's correct, I think you can just clone A (getting H), promote H,
 then delete C, D, and E.  That would leave you with:

 A -- H
 \
  -- B -- F -- G

 Is that anything at all like what you're after?


 --Bill

 On Wed, Oct 17, 2007 at 10:00:03PM -0600, Jason J. W. Williams wrote:
  Hey Guys,
 
  Its not possible yet to fracture a snapshot or clone into a
  self-standing filesystem is it? Basically, I'd like to fracture a
  snapshot/clone into is own FS so I can rollback past that snapshot in
  the original filesystem and still keep that data.
 
  Thank you in advance.
 
  Best Regards,
  Jason
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Fracture Clone Into FS

2007-10-18 Thread Jason J. W. Williams
Hi Bill,

Thinking about this a little more, would this provide the ability to
maintain both B's and G's data for a rollback followed by a possible roll
forward?

1.) Create a clone of snapshot_B (clone_B).
2.) Create a new current snapshot (snapshot_F).
3.) Create a clone of snapshot_F (clone_F).
4.) Promote clone_B.
5.) If clone_B's data doesn't work out, promote clone_F to roll forward.
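
In command form, those steps would look roughly like this (hypothetical
pool/filesystem names, and not a verified recipe -- promotion reparents
snapshots, so I'd test it on scratch data first):

    zfs clone tank/fs@snapshot_B tank/clone_B    # 1
    zfs snapshot tank/fs@snapshot_F              # 2
    zfs clone tank/fs@snapshot_F tank/clone_F    # 3
    zfs promote tank/clone_B                     # 4
    # 5: if clone_B's data doesn't work out, promote clone_F instead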

Thank you in advance.

Best Regards,
Jason

On 10/18/07, Jason J. W. Williams [EMAIL PROTECTED] wrote:
 Hi Bill,

 You've got it 99%. I want to roll E back to say B, and keep G intact.
 I really don't care about C, D or F. Essentially, B is where I want to
 roll back to, but in case B's data copy doesn't improve what I'm
 trying to fix I want to have copy of G's data around so I can go back
 to how it.

 My order of operations would be something like this:

 1.)  Snapshot filesystem to preserve current state (snapshot F).
 2.)  Create clone of F (clone G).
 3.)  Roll the filesystem back to snapshot B.
 4.)  Maintain clone G data even though filesystem is at B.

 My concerns are:

 1.) If I rollback to B after creating the clone, it will erase F and
 thereby the dependent clone G.
 2.) If I promote the clone G, G will be the active filesystem data
 copy, when I want B to be the active data copy, I just want to keep G
 around.

 I apologize that this is coming out so confusingly. Please let me know
 if this is clear at all.

 I guess in a simple way, you could say I'd like to be able to rollback
 to any particular snapshot without having to lose any newer snapshot.
 Thereby giving the ability to roll-forward and backward.

 Thank you in advance very much!

 Best Regards,
 Jason

 On 10/18/07, Bill Moore [EMAIL PROTECTED] wrote:
  I may not be understanding your usage case correctly, so bear with me.
 
  Here is what I understand your request to be.  Time is increasing from
  left to right.
 
  A -- B -- C -- D -- E
   \
- F -- G
 
  Where E and G are writable filesystems and the others are snapshots.
 
  I think you're saying that you want to, for example, keep G and roll E
  back to A, keeping A, B, F, and G.
 
  If that's correct, I think you can just clone A (getting H), promote H,
  then delete C, D, and E.  That would leave you with:
 
  A -- H
  \
   -- B -- F -- G
 
  Is that anything at all like what you're after?
 
 
  --Bill
 
  On Wed, Oct 17, 2007 at 10:00:03PM -0600, Jason J. W. Williams wrote:
   Hey Guys,
  
   Its not possible yet to fracture a snapshot or clone into a
   self-standing filesystem is it? Basically, I'd like to fracture a
   snapshot/clone into is own FS so I can rollback past that snapshot in
   the original filesystem and still keep that data.
  
   Thank you in advance.
  
   Best Regards,
   Jason
   ___
   zfs-discuss mailing list
   zfs-discuss@opensolaris.org
   http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Fracture Clone Into FS

2007-10-17 Thread Jason J. W. Williams
Hey Guys,

It's not possible yet to fracture a snapshot or clone into a
self-standing filesystem, is it? Basically, I'd like to fracture a
snapshot/clone into its own FS so I can roll back past that snapshot in
the original filesystem and still keep that data.

Thank you in advance.

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Direct I/O ability with zfs?

2007-10-03 Thread Jason J. W. Williams
Hi Dale,

We're testing out the enhanced arc_max enforcement (tracking DNLC
entries) using Build 72 right now. Hopefully it will fix the memory
creep, which, it seems to me, is the only real downside to ZFS for DB work.
Frankly, our DB loads have improved performance with ZFS. I
suspect it's because we are write-heavy.

-J

On 10/3/07, Dale Ghent [EMAIL PROTECTED] wrote:
 On Oct 3, 2007, at 10:31 AM, Roch - PAE wrote:

  If the DB cache is made large enough to consume most of memory,
  the ZFS copy will quickly be evicted to stage other I/Os on
  their way to the DB cache.
 
  What problem does that pose ?

 Personally, I'm still not completely sold on the performance
 (performance as in ability, not speed) of ARC eviction. Often times,
 especially during a resilver, a server with ~2GB of RAM free under
 normal circumstances will dive down to the minfree floor, causing
 processes to be swapped out. We've had to take to manually
 constraining ARC max size so this situation is avoided. This is on
 s10u2/3. I haven't tried anything heavy duty with Nevada simply
 because I don't put Nevada in production situations.

 Anyhow, in the case of DBs, ARC indeed becomes a vestigial organ. I'm
 surprised that this is being met with skepticism considering that
 Oracle highly recommends direct IO be used,  and, IIRC, Oracle
 performance was the main motivation to adding DIO to UFS back in
 Solaris 2.6. This isn't a problem with ZFS or any specific fs per se,
 it's the buffer caching they all employ. So I'm a big fan of seeing
 6429855 come to fruition.

 /dale
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS ARC DNLC Limitation

2007-09-24 Thread Jason J. W. Williams
Hello All,

A while back (Feb '07), when we noticed ZFS was hogging all the memory
on the system, y'all were kind enough to help us use the arc_max
tunable to attempt to limit that usage to a hard value. Unfortunately,
at the time a sticky problem was that the hard limit did not include
DNLC entries generated by ZFS.

I've been watching the list since then and trying to watch the Nevada
commits. I haven't noticed that anything has been committed back so
that arc_max truly enforces the maximum amount of memory ZFS is allowed to
consume (including DNLC entries). Has this been corrected and I just
missed it? Thank you in advance for any help.

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS Snapshot destroy to

2007-05-11 Thread Jason J. W. Williams

Hey All,

Is it possible (or even technically feasible) for ZFS to have a
"destroy to" feature? Basically, destroy any snapshot older than a
certain date?

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Snapshot destroy to

2007-05-11 Thread Jason J. W. Williams

Hi Mark,

Thank you very much. That's what I was kind of afraid of. It's fine to
script it; it would just be nice to have a built-in function. :-) Thank
you again.
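
Something like this minimal sketch, based on your suggestion (the cutoff is
precomputed in seconds since the epoch, and it only echoes the destroys):

    cutoff=1178841600   # example value, roughly May 2007
    zfs list -H -t snapshot -o name | while read snap; do
        ctime=`zfs get -H -p -o value creation "$snap"`
        [ "$ctime" -lt "$cutoff" ] && echo zfs destroy "$snap"
    done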

Best Regards,
Jason

On 5/11/07, Mark J Musante [EMAIL PROTECTED] wrote:

On Fri, 11 May 2007, Jason J. W. Williams wrote:

 Is it possible (or even technically feasible) for zfs to have a destroy
 to feature? Basically destroy any snapshot older than a certain date?

Sorta-kinda.  You can use 'zfs get' to get the creation time of a
snapshot.  If you give it -p, it'll provide the seconds-since-epoch time
so, with a little fancy footwork, this is scriptable.


Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] C'mon ARC, stay small...

2007-03-30 Thread Jason J. W. Williams

Hi Guys,

Rather than starting a new thread, I thought I'd continue this one.
I've been running Build 54 on a Thumper since mid-January and wanted
to ask a question about the zfs_arc_max setting. We set it to
0x100000000 (4GB); however, it's creeping over that until our kernel
memory usage is nearly 7GB (::memstat output inserted below).

This is a database server, so I was curious whether the DNLC would have this
effect over time, as it does quite quickly when dealing with small
files. Would it be worth upgrading to Build 59?

Thank you in advance!

Best Regards,
Jason

Page Summary                Pages             MB   %Tot
------------           ----------  -------------  -----
Kernel                    1750044           6836    42%
Anon                      1211203           4731    29%
Exec and libs                7648             29     0%
Page cache                 220434            861     5%
Free (cachelist)           318625           1244     8%
Free (freelist)            659607           2576    16%

Total                     4167561          16279
Physical                  4078747          15932
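
A quick way to compare the Kernel figure above with the ARC's own accounting
(read-only, so it should be safe on a live box, assuming the arc symbol is
laid out as in the transcript quoted below):

    echo "arc::print -d size c c_max" | mdb -k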


On 3/23/07, Roch - PAE [EMAIL PROTECTED] wrote:


With the latest Nevada, setting zfs_arc_max in /etc/system is
sufficient. Playing with mdb on a live system is more
tricky and is what caused the problem here.

-r

[EMAIL PROTECTED] writes:
  Jim Mauro wrote:
 
   All righty...I set c_max to 512MB, c to 512MB, and p to 256MB...
  
 arc::print -tad
   {
...
   c02e29e8 uint64_t size = 0t299008
   c02e29f0 uint64_t p = 0t16588228608
   c02e29f8 uint64_t c = 0t33176457216
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t33176457216
   ...
   }
  c02e2a08 /Z 0x20000000
    arc+0x48:   0x7b9789000 =   0x20000000
  c02e29f8 /Z 0x20000000
    arc+0x38:   0x7b9789000 =   0x20000000
  c02e29f0 /Z 0x10000000
    arc+0x30:   0x3dcbc4800 =   0x10000000
 arc::print -tad
   {
   ...
   c02e29e8 uint64_t size = 0t299008
   c02e29f0 uint64_t p = 0t268435456  -- p
   is 256MB
   c02e29f8 uint64_t c = 0t536870912  -- c
   is 512MB
   c02e2a00 uint64_t c_min = 0t1070318720
   c02e2a08 uint64_t c_max = 0t536870912--- c_max is
   512MB
   ...
   }
  
   After a few runs of the workload ...
  
 arc::print -d size
   size = 0t536788992

  
  
   Ah - looks like we're out of the woods. The ARC remains clamped at 512MB.
 
 
  Is there a way to set these fields using /etc/system?
  Or does this require a new or modified init script to
  run and do the above with each boot?
 
  Darren
 
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: ZFS memory and swap usage

2007-03-19 Thread Jason J. W. Williams

Hi Rainer,

While I would recommend upgrading to Build 54 or newer to use the
system tunable, it's not that big of a deal to set the ARC at boot-up.
We've done it on a T2000 for a while, until we could take it down for
an extended period of time to upgrade it.

I definitely WOULD NOT run a database on ZFS without it. You will run
out of RAM, and depending on how your DB responds to being out of RAM,
you could get some very undesirable results.

Just my two cents.

-J

On 3/19/07, Rainer Heilke [EMAIL PROTECTED] wrote:

The updated information states that the kernel setting is only for the current 
Nevada build. We are not going to use the kernel debugger method to change the 
setting on a live production system (and do this every time we need to reboot).

We're back to trying to set their expectations more realistically, and using proper tools 
to measure memory usage. As I stated at the outset, they are trying to start up a 10GB 
SGA database within two minutes to simulate the start-up of five 2GB 
databases at boot-up. I sincerely doubt they are going to start all five databases 
simultaneously within two minutes on a regular boot-up.

So, what is the best use of the OS tools (vmstat, etc.) to show them how this 
would really occur?

Rainer


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] X2200-M2

2007-03-12 Thread Jason J. W. Williams

Hi Brian,

To my understanding the X2100 M2 and X2200 M2 are basically the same
board OEM'd from Quanta...except the 2200 M2 has two sockets.

As to ZFS and their weirdness, it would seem to me that fixing it
would be more an issue of the SATA/SCSI driver. I may be wrong here.

-J

On 3/12/07, Brian Hechinger [EMAIL PROTECTED] wrote:

After the interesting revelations about the X2100 and its hot-swap abilities,
what are the abilities of the X2200-M2's disk subsystem, and is ZFS going to
tickle any weirdness out of them?

-brian
--
The reason I don't use Gnome: every single other window manager I know of is
very powerfully extensible, where you can switch actions to different mouse
buttons. Guess which one is not, because it might confuse the poor users?
Here's a hint: it's not the small and fast one.--Linus
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-27 Thread Jason J. W. Williams

Hi Przemol,

I think migration is a really important feature...think I said that...
;-)  SAN/RAID is not awful...frankly, there hasn't been a better solution
(outside of NetApp's WAFL) until ZFS. SAN/RAID just has its own
reliability issues that you accept unless you don't have to...ZFS :-)

-J

On 2/27/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

On Thu, Feb 22, 2007 at 12:21:50PM -0700, Jason J. W. Williams wrote:
 Hi Przemol,

 I think Casper had a good point bringing up the data integrity
 features when using ZFS for RAID. Big companies do a lot of things
 just because that's the certified way that end up biting them in the
 rear. Trusting your SAN arrays is one of them. That all being said,
 the need to do migrations is a very valid concern.

Jason,

I don't claim that SAN/RAID solutions are the best and don't have any
mistakes/failures/problems. But if SAN/RAID is so bad, why do companies
using them survive?

Imagine also that some company has been using SAN/RAID for a few years
and doesn't have any problems (or has one every few months). Also from time to
time they need to migrate between arrays (for whatever reason). Now you come
and say that they have unreliable SAN/RAID, and you offer something new (ZFS)
which is going to make it much more reliable, but migration to another array
will be painful. What do you think they will choose?

BTW: I am a fan of ZFS. :-)

przemol


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ARGHH. An other panic!!

2007-02-26 Thread Jason J. W. Williams

Hi Gino,

Was there more than one LUN in the RAID-Z using the port you disabled?

-J

On 2/26/07, Gino Ruopolo [EMAIL PROTECTED] wrote:

Hi Jason,

Saturday we ran some tests and found that disabling an FC port under heavy load 
(MPxIO enabled) often leads to a panic (using a RAID-Z!).
No problems with UFS ...

later,
Gino


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELIOS and ZFS cache

2007-02-22 Thread Jason J. W. Williams

Hi Eric,

Everything Mark said.

We as a customer ran into this running MySQL on a Thumper (and T2000).
We solved it on the Thumper by limiting the ARC to 4GB:

/etc/system: set zfs:zfs_arc_max = 0x100000000  #4GB

This has worked marvelously over the past 50 days. The ARC stays
around 5-6GB now, leaving 11GB for the DB.

Best Regards,
Jason

On 2/22/07, Mark Maybee [EMAIL PROTECTED] wrote:

This issue has been discussed a number of times in this forum.
To summarize:

ZFS (specifically, the ARC) will try to use *most* of the systems
available memory to cache file system data.  The default is to
max out at physmem-1GB (i.e., use all of physical memory except
for 1GB).  In the face of memory pressure, the ARC will give up
memory, however there are some situations where we are unable to
free up memory fast enough for an application that needs it (see
example in the HELIOS note below).  In these situations, it may
be necessary to lower the ARCs maximum memory footprint, so that
there is a larger amount of memory immediately available for
applications.  This is particularly relevant in situations where
there is a known amount of memory that will always be required for
use by some application (databases often fall into this category).
The tradeoff here is that the ARC will not be able to cache as much
file system data, and that could impact performance.

For example, if you know that an application will need 5GB on a
36GB machine, you could set the arc maximum to 30GB (0x780000000).

In ZFS on s10 prior to update 4, you can only change the arc max
size via explicit actions with mdb(1):

# mdb -kw
> arc::print -a c_max
<address> c_max = <current-max>
> <address>/Z <new-max>

In the current opensolaris nevada bits, and in s10u4, you can use
the system variable 'zfs_arc_max' to set the maximum arc size.  Just
set this in /etc/system.
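
For example, capping the ARC at 30GB on that 36GB machine would be a
one-line /etc/system entry (sketch; it takes effect at the next boot):

    set zfs:zfs_arc_max = 0x780000000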

-Mark

Erik Vanden Meersch wrote:

 Could someone please provide comments or solution for this?

 Subject: Solaris 10 ZFS problems with database applications


 HELIOS TechInfo #106
 


 Tue, 20 Feb 2007

 Solaris 10 ZFS problems with database applications
 --

 We have tested Solaris 10 release 11/06 with ZFS without any problems
 using all HELIOS UB based products, including very high load tests.

 However, we learned from customers that some database solutions (Sybase
 and Oracle are known cases) may, when allocating a large amount of memory,
 slow down or even freeze the system for up to a minute. This can
 result in RPC timeout messages and service interrupts for HELIOS
 processes. ZFS basically uses most memory for file caching.
 Freeing this ZFS memory for the database memory allocation can result
 in serious delays. This does not occur when using HELIOS products
 only.

 The HELIOS test system was using 4GB of memory.
 The customer production machine was using 16GB of memory.


 Contact your Sun representative about how to limit the ZFS cache and what
 else to consider when using ZFS in your workflow.

 Check also with your application vendor for recommendations using ZFS
 with their applications.


 Best regards,

 HELIOS Support

 HELIOS Software GmbH
 Steinriede 3
 30827 Garbsen (Hannover)
 Germany

 Phone:  +49 5131 709320
 FAX:+49 5131 709325
 http://www.helios.de

 --
 http://www.sun.com/solaris  * Erik Vanden Meersch *
 Solution Architect

 *Sun Microsystems, Inc.*
 Phone x48835/+32-2-704 8835
 Mobile 0479/95 05 98
 Email [EMAIL PROTECTED]


 

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-02-22 Thread Jason J. W. Williams

Hi Przemol,

I think Casper had a good point bringing up the data integrity
features when using ZFS for RAID. Big companies do a lot of things
just because that's the certified way that end up biting them in the
rear. Trusting your SAN arrays is one of them. That all being said,
the need to do migrations is a very valid concern.

Best Regards,
Jason

On 2/22/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

On Wed, Feb 21, 2007 at 04:43:34PM +0100, [EMAIL PROTECTED] wrote:

 I cannot let you say that.
 Here in my company we are very interested in ZFS, but we do not care
 about the RAID/mirror features, because we already have a SAN with
 RAID-5 disks, and dual fabric connection to the hosts.

 But you understand that these underlying RAID mechanisms give absolutely
 no guarantee about data integrity, but only that some data was found where
 some (possibly other) data was written?  (RAID5 never verifies the
 checksum is correct on reads; it only uses it to reconstruct data when
 reads fail)

But you understand that he perhaps knows that, but so far nothing wrong
has happened [*] and migration is still a very important feature for him?

[*] almost every big company has its data center with SAN and FC
connections with RAID-5 or RAID-10 in their storage arrays
and they are treated as reliable

przemol

--
Wpadka w kosciele - zobacz  http://link.interia.pl/f19ea

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs best practice for 2U SATA iSCSI NAS

2007-02-19 Thread Jason J. W. Williams

Hi Nicholas,


Actually Virtual Iron, they have a nice system at the moment with live
migration of windows guest.


Ah. We looked at them for some Windows DR. They do have a nice product.


3. Which leads to: coming from Debian, how easy is system updates?  I
remember with OpenBSD system updates used to be a pain.


Not a pain, but coming from Debian/Gentoo not great either. Packaging
is one of the last areas where Solaris really needs an upgrade. You
might want to take a look at Nexenta, which is OpenSolaris with GNU
userland and apt-get. Works pretty well. Once installed you can update
it to Build 56 to get the iSCSI target.

-J
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs best practice for 2U SATA iSCSI NAS

2007-02-17 Thread Jason J. W. Williams

Hi Nicholas,

ZFS itself is very stable and very effective as fast FS in our
experience. If you browse the archives of the list you'll see that NFS
performance is pretty acceptable, with some performance/RAM quirks
around small files:

http://www.opensolaris.org/jive/message.jspa?threadID=19858
http://www.opensolaris.org/jive/thread.jspa?threadID=18394

To my understanding  the iSCSI driver is undergoing significant
performance improvements...maybe someone close to this can help?


If by VI you are referring to VMware Infrastructure...you won't get
any support from VMware if you're using the iSCSI target on Solaris as
its not approved by them. Not that this is really a problem in my
experience as VMware tech support is pretty terrible anyway.



Some questions:
1. how stable is zfs? i'm tolarent to some sweat work to fix problems
but data loss is unacceptable


We haven't experienced any data loss, and have had some pretty nasty
things thrown at it (FC array rebooted unexpectedly).


2. If drives need to be pulled and put into a new chasis does zfs
handle them having new device names and being out of order?


My understanding and experience here is yes. It'll read the ZFS labels
off the drives/slices.


3. Is it possible to hot swap drives with raidz(2)


Depends on your underlying hardware. To my knowledge hot-swapping is
not dependent on the RAID-level at all.
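
As a rough sketch (the attachment-point and device names here are
hypothetical and depend entirely on your controller), a hot swap of a
failed raidz2 member typically looks something like:

  # cfgadm -c unconfigure sata1/3    # offline the failed disk's attachment point
  (physically swap the drive)
  # cfgadm -c configure sata1/3      # bring the replacement online
  # zpool replace tank c1t3d0        # resilver onto the new disk
  # zpool status tank                # watch the resilver progress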


4. How does performance compare with 'brand name' storage systems?


No clue if you're referring to NetApp. Does anyone else know?

-J
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS or UFS - what to do?

2007-01-29 Thread Jason J. W. Williams

Hi Jeff,

Maybe I mis-read this thread, but I don't think anyone was saying that
using ZFS on-top of an intelligent array risks more corruption. Given
my experience, I wouldn't run ZFS without some level of redundancy,
since it will panic your kernel in a RAID-0 scenario where it detects
a LUN is missing and can't fix it. That being said, I wouldn't run
anything but ZFS anymore. When we had some database corruption issues
awhile back, ZFS made it very simple to prove it was the DB. Just did
a scrub and boom, verification that the data was laid down correctly.
RAID-5 will have better random read performance than RAID-Z for reasons
Robert had to beat into my head. ;-) But if you really need that
performance, perhaps RAID-10 is what you should be looking at? Someone
smarter than I can probably give a better idea.
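
For reference, the scrub check mentioned above is just the following,
with a hypothetical pool name:

  # zpool scrub tank
  # zpool status -v tank    # shows scrub progress and any checksum errors found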

Regarding the failure detection, is anyone on the list have the
ZFS/FMA traps fed into a network management app yet? I'm curious what
the experience with it is?

Best Regards,
Jason

On 1/29/07, Jeffery Malloch [EMAIL PROTECTED] wrote:

Hi Guys,

SO...

From what I can tell from this thread ZFS is VERY fussy about managing 
writes, reads and failures.  It wants to be bit perfect.  So if you use the 
hardware that comes with a given solution (in my case an Engenio 6994) to manage 
failures you risk a) bad writes that don't get picked up due to corruption from 
write cache to disk b) failures due to data changes that ZFS is unaware of that 
the hardware imposes when it tries to fix itself.

So now I have a $70K+ lump that's useless for what it was designed for.  I 
should have spent $20K on a JBOD.  But since I didn't do that, it sounds like a 
traditional model works best (ie. UFS et al) for the type of hardware I have.  
No sense paying for something and not using it.  And by using ZFS just as a 
method for ease of file system growth and management I risk much more 
corruption.

The other thing I haven't heard is why NOT to use ZFS.  Or people who don't 
like it for some reason or another.

Comments?

Thanks,

Jeff

PS - the responses so far have been great and are much appreciated!  Keep 'em 
coming...


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-29 Thread Jason J. W. Williams

Thank you for the detailed explanation. It is very helpful to
understand the issue. Is anyone successfully using SNDR with ZFS yet?

Best Regards,
Jason

On 1/26/07, Jim Dunham [EMAIL PROTECTED] wrote:

Jason J. W. Williams wrote:
 Could the replication engine eventually be integrated more tightly
 with ZFS?
Not in its present form. The architecture and implementation of
Availability Suite is driven off block-based replication at the device
level (/dev/rdsk/...), something that allows the product to replicate
any Solaris file system, database, etc., without any knowledge of what
it is actually replicating.

To pursue ZFS replication in the manner of Availability Suite, one needs
to see what replication looks like from an abstract point of view. So
simplistically, remote replication is like the letter 'h', where the
left side of the letter is the complete I/O path on the primary node,
the horizontal part of the letter is the remote replication network
link, and the right side of the letter is only the bottom half of the
complete I/O path on the secondary node.

Next ZFS would have to have its functional I/O path split into two
halves, a top and bottom piece.  Next we configure replication, the
letter 'h', between two given nodes, running both a top and bottom piece
of ZFS on the source node, and just the bottom half of ZFS on the
secondary node.

Today, the SNDR component of Availability Suite works like the letter
'h' today, where we split the Solaris I/O stack into a top and bottom
half. The top half is that software (file system, database or
application I/O) that directs its I/Os to the bottom half (raw device,
volume manager or block device).

So all that needs to be done is to design and build a new variant of the
letter 'h', and find the place to separate ZFS into two pieces.

- Jim Dunham


 That would be slick alternative to send/recv.

 Best Regards,
 Jason

 On 1/26/07, Jim Dunham [EMAIL PROTECTED] wrote:
 Project Overview:

 I propose the creation of a project on opensolaris.org, to bring to
 the community two Solaris host-based data services; namely volume
 snapshot and volume replication. These two data services exist today
 as the Sun StorageTek Availability Suite, a Solaris 8, 9 & 10,
 unbundled product set, consisting of Instant Image (II) and Network
 Data Replicator (SNDR).

 Project Description:

 Although Availability Suite is typically known as just two data
 services (II & SNDR), there is an underlying Solaris I/O filter
 driver framework which supports these two data services. This
 framework provides the means to stack one or more block-based, pseudo
 device drivers on to any pre-provisioned cb_ops structure, [
 
http://www.opensolaris.org/os/article/2005-03-31_inside_opensolaris__solaris_driver_programming/#datastructs
 ], thereby shunting all cb_ops I/O into the top of a developed filter
 driver, (for driver specific processing), then out the bottom of this
 filter driver, back into the original cb_ops entry points.

 Availability Suite was developed to interpose itself on the I/O stack
 of a block device, providing a filter driver framework with the means
 to intercept any I/O originating from an upstream file system,
 database or application layer I/O. This framework provided the means
 for Availability Suite to support snapshot and remote replication
 data services for UFS, QFS, VxFS, and more recently the ZFS file
 system, plus various databases like Oracle, Sybase and PostgreSQL,
 and also application I/Os. By providing a filter driver at this point
 in the Solaris I/O stack, it allows for any number of data services
 to be implemented, without regard to the underlying block storage
 that they will be configured on. Today, as a snapshot and/or
 replication solution, the framework allows both the source and
 destination block storage device to not only differ in physical
 characteristics (DAS, Fibre Channel, iSCSI, etc.), but also logical
 characteristics such as in RAID type, volume managed storage (i.e.,
 SVM, VxVM), lofi, zvols, even ram disks.

 Community Involvement:

 By providing this filter-driver framework, two working filter drivers
 (II & SNDR), and an extensive collection of supporting software and
 utilities, it is envisioned that those individuals and companies that
 adopt OpenSolaris as a viable storage platform, will also utilize and
 enhance the existing II & SNDR data services, plus have offered to
 them the means in which to develop their own block-based filter
 driver(s), further enhancing the use and adoption on OpenSolaris.

 A very timely example that is very applicable to Availability Suite
 and the OpenSolaris community, is the recent announcement of the
 Project Proposal: lofi [ compression & encryption ] -
 http://www.opensolaris.org/jive/click.jspa?messageID=26841. By
 leveraging both the Availability Suite and the lofi OpenSolaris
 projects, it would be highly probable to not only offer compression &
 encryption to lofi devices (as already proposed

Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Jason J. W. Williams

Hi Guys,

I seem to remember the Massive Array of Independent Disk guys ran into
a problem I think they called static friction, where idle drives would
fail on spin up after being idle for a long time:
http://www.eweek.com/article2/0,1895,1941205,00.asp

Would that apply here?

Best Regards,
Jason

On 1/29/07, Toby Thain [EMAIL PROTECTED] wrote:


On 29-Jan-07, at 9:04 PM, Al Hopper wrote:

 On Mon, 29 Jan 2007, Toby Thain wrote:

 Hi,

 This is not exactly ZFS specific, but this still seems like a
 fruitful place to ask.

 It occurred to me today that hot spares could sit in standby (spun
 down) until needed (I know ATA can do this, I'm supposing SCSI does
 too, but I haven't looked at a spec recently). Does anybody do this?
 Or does everybody do this already?

 I don't work with enough disk storage systems to know what is the
 industry
 norm.  But there are 3 broad categories of disk drive spares:

 a) Cold Spare.  A spare where the power is not connected until it is
 required.  [1]

 b) Warm Spare.  A spare that is active but placed into a low power
 mode. ...

 c) Hot Spare.  A spare that is spun up and ready to accept
 read/write/position (etc) requests.

Hi Al,

Thanks for reminding me of the distinction. It seems very few
installations would actually require (c)?


 Does the tub curve (chance of early life failure) imply that hot
 spares should be burned in, instead of sitting there doing nothing
 from new? Just like a data disk, seems to me you'd want to know if a
 hot spare fails while waiting to be swapped in. Do they get tested
 periodically?

 The ideal scenario, as you already allude to, would be for the disk
 subsystem to initially configure the drive as a hot spare and send it
 periodic test events for, say, the first 48 hours.

For some reason that's a little shorter than I had in mind, but I
take your word that that's enough burn-in for semiconductors, motors,
servos, etc.

 This would get it
 past the first segment of the bathtub reliability curve ...

 If saving power was the highest priority, then the ideal situation
 would
 be where the disk subsystem could apply/remove power to the spare
 and move
 it from warm to cold upon command.

I am surmising that it would also considerably increase the spare's
useful lifespan versus hot and spinning.


 One trick with disk subsystems, like ZFS that have yet to have
 the FMA
 type functionality added and which (today) provide for hot spares
 only, is
 to initially configure a pool with one (hot) spare, and then add a
 2nd hot
 spare, based on installing a brand new device, say, 12 months
 later.  And
 another spare 12 months later.  What you are trying to achieve,
 with this
 strategy, is to avoid the scenario whereby mechanical systems, like
 disk
 drives, tend to wear out within the same general, relatively short,
 timeframe.

 One (obvious) issue with this strategy, is that it may be
 impossible to
 purchase the same disk drive 12 and 24 months later.  However, it's
 always
 possible to purchase a larger disk drive

...which is not guaranteed to be compatible with your storage
subsystem...!

--Toby

 and simply commit to the fact
 that the extra space provided by the newer drive will be wasted.

 [1] The most common example is a disk drive mounted on a carrier
 but not
 seated within the disk drive enclosure.  Simple push in when
 required.
 ...
 Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
 OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
  OpenSolaris Governing Board (OGB) Member - Feb 2006

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-29 Thread Jason J. W. Williams

Hi Jim,

Thank you very much for the heads up. Unfortunately, we need the
write-cache enabled for the application I was thinking of combining
this with. Sounds like SNDR and ZFS need some more soak time together
before you can use both to their full potential together?

Best Regards,
Jason

On 1/29/07, Jim Dunham [EMAIL PROTECTED] wrote:

Jason,
 Thank you for the detailed explanation. It is very helpful to
 understand the issue. Is anyone successfully using SNDR with ZFS yet?
Of the opportunities I've been involved with the answer is yes, but so
far I've not seen SNDR with  ZFS in a production environment, but that
does not mean they don't exists. It was not until late June '06, that
AVS 4.0, Solaris 10 and ZFS were generally available, and to date AVS
has not been made available for the Solaris Express, Community Release,
but it will be real soon.

While I have your attention, there are two issues between ZFS and AVS
that needs mentioning.

1). When ZFS is given an entire LUN to place in a ZFS storage pool, ZFS
detects this, enabling SCSI write caching on the LUN, and also opens the
LUN with exclusive access, preventing other data services (like AVS)
from accessing this device. The work-around is to manually format the
LUN, typically placing all the blocks into a single partition, then just
place this partition into the ZFS storage pool. ZFS detects that it does
not own the entire LUN, so it doesn't enable write caching, which means it
also doesn't open the LUN with exclusive access; therefore AVS and
ZFS can share the same LUN.
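
A minimal sketch of that work-around, with hypothetical device names:

  # format c2t0d0                  # put all blocks into a single slice, e.g. s0
  # zpool create tank c2t0d0s0     # pool on the slice: no exclusive open, no write cache

versus handing ZFS the whole LUN, where it enables the write cache and
takes exclusive access:

  # zpool create tank c2t0d0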

I thought about submitting an RFE to have ZFS provide a means to
override this restriction, but I am not 100% certain that a ZFS
filesystem directly accessing a write-cached enabled LUN is the same
thing as a replicated ZFS filesystem accessing a write-cached enabled
LUN. Even though AVS is write-order consistent, there are disaster
recovery scenarios, when enacted, where block-order, verses write-order
I/Os are issued.

2). One has to be very cautious in using zpool import -f (forced
import), especially on a LUN or LUNs into which SNDR is actively
replicating. If ZFS complains that the storage pool was not cleanly
exported when issuing a zpool import ..., and one attempts a zpool
import -f without checking the active replication state, they are
sure to panic Solaris. Of course this failure scenario is no different
than accessing a LUN or LUNs on dual-ported or SAN-based storage when
another Solaris host is still accessing the ZFS filesystem, or
controller based replication, as they are all just different operational
scenarios of the same issue, data blocks changing out from underneath
the ZFS filesystem, and its CRC checking mechanisms.
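
A hedged sketch of the safer sequence (pool name hypothetical; sndradm -P
assumes the AVS utilities are installed on the importing host):

  # zpool import            # first list importable pools and their last-known state
  # sndradm -P              # confirm SNDR replication into these LUNs is quiesced
  # zpool import -f tank    # only force the import once replication has stopped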

Jim


 Best Regards,
 Jason



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] hot spares - in standby?

2007-01-29 Thread Jason J. W. Williams

Hi Toby,

You're right. The healthcheck would definitely find any issues. I
misinterpreted your comment to that effect as a question and didn't
quite latch on. A zpool MAID-mode with that healthcheck might also be
interesting on something like a Thumper for pure-archival, D2D backup
work. Would dramatically cut down on the power. What do y'all think?

Best Regards,
Jason

On 1/29/07, Toby Thain [EMAIL PROTECTED] wrote:


On 29-Jan-07, at 11:02 PM, Jason J. W. Williams wrote:

 Hi Guys,

 I seem to remember the Massive Array of Independent Disk guys ran into
 a problem I think they called static friction, where idle drives would
 fail on spin up after being idle for a long time:

You'd think that probably wouldn't happen to a spare drive that was
spun up from time to time. In fact this problem would be (mitigated
and/or) caught by the periodic health check I suggested.

--T

 http://www.eweek.com/article2/0,1895,1941205,00.asp

 Would that apply here?

 Best Regards,
 Jason

 On 1/29/07, Toby Thain [EMAIL PROTECTED] wrote:

 On 29-Jan-07, at 9:04 PM, Al Hopper wrote:

  On Mon, 29 Jan 2007, Toby Thain wrote:
 
  Hi,
 
  This is not exactly ZFS specific, but this still seems like a
  fruitful place to ask.
 
  It occurred to me today that hot spares could sit in standby (spun
  down) until needed (I know ATA can do this, I'm supposing SCSI
 does
  too, but I haven't looked at a spec recently). Does anybody do
 this?
  Or does everybody do this already?
 
  I don't work with enough disk storage systems to know what is the
  industry
  norm.  But there are 3 broad categories of disk drive spares:
 
  a) Cold Spare.  A spare where the power is not connected until
 it is
  required.  [1]
 
  b) Warm Spare.  A spare that is active but placed into a low power
  mode. ...
 
  c) Hot Spare.  A spare that is spun up and ready to accept
  read/write/position (etc) requests.

 Hi Al,

 Thanks for reminding me of the distinction. It seems very few
 installations would actually require (c)?

 
  Does the tub curve (chance of early life failure) imply that hot
  spares should be burned in, instead of sitting there doing nothing
  from new? Just like a data disk, seems to me you'd want to know
 if a
  hot spare fails while waiting to be swapped in. Do they get tested
  periodically?
 
  The ideal scenario, as you already allude to, would be for the disk
  subsystem to initially configure the drive as a hot spare and
 send it
  periodic test events for, say, the first 48 hours.

 For some reason that's a little shorter than I had in mind, but I
 take your word that that's enough burn-in for semiconductors, motors,
 servos, etc.

  This would get it
  past the first segment of the bathtub reliability curve ...
 
  If saving power was the highest priority, then the ideal situation
  would
  be where the disk subsystem could apply/remove power to the spare
  and move
  it from warm to cold upon command.

 I am surmising that it would also considerably increase the spare's
 useful lifespan versus hot and spinning.

 
  One trick with disk subsystems, like ZFS that have yet to have
  the FMA
  type functionality added and which (today) provide for hot spares
  only, is
  to initially configure a pool with one (hot) spare, and then add a
  2nd hot
  spare, based on installing a brand new device, say, 12 months
  later.  And
  another spare 12 months later.  What you are trying to achieve,
  with this
  strategy, is to avoid the scenario whereby mechanical systems, like
  disk
  drives, tend to wear out within the same general, relatively
 short,
  timeframe.
 
  One (obvious) issue with this strategy, is that it may be
  impossible to
  purchase the same disk drive 12 and 24 months later.  However, it's
  always
  possible to purchase a larger disk drive

 ...which is not guaranteed to be compatible with your storage
 subsystem...!

 --Toby

  and simply commit to the fact
  that the extra space provided by the newer drive will be wasted.
 
  [1] The most common example is a disk drive mounted on a carrier
  but not
  seated within the disk drive enclosure.  Simple push in when
  required.
  ...
  Al Hopper  Logical Approach Inc, Plano, TX.  [EMAIL PROTECTED]
 approach.com
 Voice: 972.379.2133 Fax: 972.379.2134  Timezone: US CDT
  OpenSolaris.Org Community Advisory Board (CAB) Member - Apr 2005
   OpenSolaris Governing Board (OGB) Member - Feb 2006

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS or UFS - what to do?

2007-01-26 Thread Jason J. W. Williams

Hi Jeff,

We're running a FLX210 which I believe is an Engenio 2884. In our case
it also is attached to a T2000. ZFS has run VERY stably for us with
data integrity issues at all.

We did have a significant latency problem caused by ZFS flushing the
write cache on the array after every write, but that can be fixed by
configuring your array to ignore cache flushes. The instructions for
Engenio products are here: http://blogs.digitar.com/jjww/?itemid=44
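
As an aside, a host-side alternative on builds that ship the
zfs_nocacheflush tunable (an assumption to verify for your release) is to
set it in /etc/system; this is only safe when the array cache is
battery-backed:

  * assumption: build ships this tunable; battery-backed array cache only
  set zfs:zfs_nocacheflush = 1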

We use the config for a production database, so I can't speak to the
NFS issues. All I would mention is to watch the RAM consumption by
ZFS.

Does anyone on the list have a recommendation for ARC sizing with NFS?

Best Regards,
Jason


On 1/26/07, Jeffery Malloch [EMAIL PROTECTED] wrote:

Hi Folks,

I am currently in the midst of setting up a completely new file server using a pretty 
well loaded Sun T2000 (8x1GHz, 16GB RAM) connected to an Engenio 6994 product (I work for 
LSI Logic so Engenio is a no brainer).  I have configured a couple of zpools from Volume 
groups on the Engenio box - 1x2.5TB and 1x3.75TB.  I then created sub zfs systems below 
that and set quotas and sharenfs'd them so that it appears that these file 
systems are dynamically shrinkable and growable.  It looks very good...  I can see 
the correct file system sizes on all types of machines (Linux 32/64bit and of course 
Solaris boxes) and if I resize the quota it's picked up in NFS right away.  But I would 
be the first in our organization to use this in an enterprise system so I definitely have 
some concerns that I'm hoping someone here can address.

1.  How stable is ZFS?  The Engenio box is completely configured for RAID5 with 
hot spares and write cache (8GB) has battery backup so I'm not too concerned 
from a hardware side.  I'm looking for an idea of how stable ZFS itself is in 
terms of corruptability, uptime and OS stability.

2.  Recommended config.  Above, I have a fairly simple setup.  In many of the 
examples the granularity is home directory level and when you have many many 
users that could get to be a bit of a nightmare administratively.  I am really 
only looking for high level dynamic size adjustability and am not interested in 
its built in RAID features.  But given that, any real world recommendations?

3.  Caveats?  Anything I'm missing that isn't in the docs that could turn into 
a BIG gotchya?

4.  Since all data access is via NFS we are concerned that 32 bit systems 
(Mainly Linux and Windows via Samba) will not be able to access all the data 
areas of a 2TB+ zpool even if the zfs quota on a particular share is less than 
that.  Can anyone comment?

The bottom line is that with anything new there is cause for concern.  
Especially if it hasn't been tested within our organization.  But the 
convenience/functionality factors are way too hard to ignore.

Thanks,

Jeff


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS or UFS - what to do?

2007-01-26 Thread Jason J. W. Williams

Correction: ZFS has run VERY stably for us with  data integrity
issues at all. should read ZFS has run VERY stably for us with  NO
data integrity issues at all.


On 1/26/07, Jason J. W. Williams [EMAIL PROTECTED] wrote:

Hi Jeff,

We're running a FLX210 which I believe is an Engenio 2884. In our case
it also is attached to a T2000. ZFS has run VERY stably for us with
data integrity issues at all.

We did have a significant latency problem caused by ZFS flushing the
write cache on the array after every write, but that can be fixed by
configuring your array to ignore cache flushes. The instructions for
Engenio products are here: http://blogs.digitar.com/jjww/?itemid=44

We use the config for a production database, so I can't speak to the
NFS issues. All I would mention is to watch the RAM consumption by
ZFS.

Does anyone on the list have a recommendation for ARC sizing with NFS?

Best Regards,
Jason


On 1/26/07, Jeffery Malloch [EMAIL PROTECTED] wrote:
 Hi Folks,

 I am currently in the midst of setting up a completely new file server using a pretty 
well loaded Sun T2000 (8x1GHz, 16GB RAM) connected to an Engenio 6994 product (I work for 
LSI Logic so Engenio is a no brainer).  I have configured a couple of zpools from Volume 
groups on the Engenio box - 1x2.5TB and 1x3.75TB.  I then created sub zfs systems below that 
and set quotas and sharenfs'd them so that it appears that these file systems 
are dynamically shrinkable and growable.  It looks very good...  I can see the correct file 
system sizes on all types of machines (Linux 32/64bit and of course Solaris boxes) and if I 
resize the quota it's picked up in NFS right away.  But I would be the first in our 
organization to use this in an enterprise system so I definitely have some concerns that I'm 
hoping someone here can address.

 1.  How stable is ZFS?  The Engenio box is completely configured for RAID5 
with hot spares and write cache (8GB) has battery backup so I'm not too concerned 
from a hardware side.  I'm looking for an idea of how stable ZFS itself is in 
terms of corruptability, uptime and OS stability.

 2.  Recommended config.  Above, I have a fairly simple setup.  In many of the 
examples the granularity is home directory level and when you have many many users 
that could get to be a bit of a nightmare administratively.  I am really only 
looking for high level dynamic size adjustability and am not interested in its 
built in RAID features.  But given that, any real world recommendations?

 3.  Caveats?  Anything I'm missing that isn't in the docs that could turn 
into a BIG gotchya?

 4.  Since all data access is via NFS we are concerned that 32 bit systems 
(Mainly Linux and Windows via Samba) will not be able to access all the data areas 
of a 2TB+ zpool even if the zfs quota on a particular share is less than that.  
Can anyone comment?

 The bottom line is that with anything new there is cause for concern.  
Especially if it hasn't been tested within our organization.  But the 
convenience/functionality factors are way too hard to ignore.

 Thanks,

 Jeff


 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: How much do we really want zpool remove?

2007-01-26 Thread Jason J. W. Williams

To be fair, you can replace vdevs with same-sized or larger vdevs online.
The issue is that you cannot replace with smaller vdevs nor can you
eliminate vdevs.  In other words, I can migrate data around without
downtime, I just can't shrink or eliminate vdevs without send/recv.
This is where the philosophical disconnect lies.  Everytime we descend
into this rathole, we stir up more confusion :-(


We did just this to move off RAID-5 LUNs that were the vdevs for a
pool, to RAID-10 LUNs. Worked very well, and as Richard said was done
all on-line. Doesn't really address the shrinking issue though. :-)
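
For reference, that on-line migration boils down to repeated zpool replace
operations, one LUN at a time (device names hypothetical):

  # zpool replace tank c3t0d0 c5t0d0   # swap a RAID-5 LUN for a same-size-or-larger RAID-10 LUN
  # zpool status tank                  # wait for the resilver to complete before the next swap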

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] multihosted ZFS

2007-01-26 Thread Jason J. W. Williams

You could use SAN zoning of the affected LUN's to keep multiple hosts
from seeing the zpool.  When failover time comes, you change the zoning
to make the LUN's visible to the new host, then import.  When the old
host reboots, it won't find any zpool.  Better safe than sorry


Or change the LUN masking on the array. Depending on your switch that
can be less disruptive, and depending on your storage array might be
able to be scripted.
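
Either way, the host-side half of the failover is the same; a minimal
sketch with a hypothetical pool name:

  # zpool export tank    # on the old host, if it is still reachable
  (re-zone or re-mask the LUNs to the new host)
  # zpool import tank    # on the new host; add -f only if a clean export was impossible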

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Project Proposal: Availability Suite

2007-01-26 Thread Jason J. W. Williams

Could the replication engine eventually be integrated more tightly
with ZFS? That would be slick alternative to send/recv.

Best Regards,
Jason

On 1/26/07, Jim Dunham [EMAIL PROTECTED] wrote:

Project Overview:

I propose the creation of a project on opensolaris.org, to bring to the community 
two Solaris host-based data services; namely volume snapshot and volume 
replication. These two data services exist today as the Sun StorageTek Availability 
 Suite, a Solaris 8, 9 & 10, unbundled product set, consisting of Instant Image 
(II) and Network Data Replicator (SNDR).

Project Description:

 Although Availability Suite is typically known as just two data services (II & 
SNDR), there is an underlying Solaris I/O filter driver framework which supports 
these two data services. This framework provides the means to stack one or more 
block-based, pseudo device drivers on to any pre-provisioned cb_ops structure, [ 
http://www.opensolaris.org/os/article/2005-03-31_inside_opensolaris__solaris_driver_programming/#datastructs
 ], thereby shunting all cb_ops I/O into the top of a developed filter driver, (for 
driver specific processing), then out the bottom of this filter driver, back into 
the original cb_ops entry points.

Availability Suite was developed to interpose itself on the I/O stack of a 
block device, providing a filter driver framework with the means to intercept 
any I/O originating from an upstream file system, database or application layer 
I/O. This framework provided the means for Availability Suite to support 
snapshot and remote replication data services for UFS, QFS, VxFS, and more 
recently the ZFS file system, plus various databases like Oracle, Sybase and 
PostgreSQL, and also application I/Os. By providing a filter driver at this 
point in the Solaris I/O stack, it allows for any number of data services to be 
implemented, without regard to the underlying block storage that they will be 
configured on. Today, as a snapshot and/or replication solution, the framework 
allows both the source and destination block storage device to not only differ 
in physical characteristics (DAS, Fibre Channel, iSCSI, etc.), but also logical 
characteristics such as in RAID type, volume managed storage (i.e., SVM, VxVM), 
lofi, zvols, even ram disks.

Community Involvement:

 By providing this filter-driver framework, two working filter drivers (II & SNDR), 
and an extensive collection of supporting software and utilities, it is envisioned that 
those individuals and companies that adopt OpenSolaris as a viable storage platform, 
will also utilize and enhance the existing II  SNDR data services, plus have 
offered to them the means in which to develop their own block-based filter driver(s), 
further enhancing the use and adoption on OpenSolaris.

A very timely example that is very applicable to Availability Suite and the OpenSolaris 
 community, is the recent announcement of the Project Proposal: lofi [ compression & 
 encryption ] - http://www.opensolaris.org/jive/click.jspa?messageID=26841. By 
leveraging both the Availability Suite and the lofi OpenSolaris projects, it would be 
 highly probable to not only offer compression & encryption to lofi devices (as already 
proposed), but by collectively leveraging these two project, creating the means to support 
file systems, databases and applications, across all block-based storage devices.

Since Availability Suite has strong technical ties to storage, please look for email 
discussion for this project at: storage-discuss at opensolaris dot org

A complete set of Availability Suite administration guides can be found at: 
http://docs.sun.com/app/docs?p=coll%2FAVS4.0


Project Lead:

Jim Dunham http://www.opensolaris.org/viewProfile.jspa?username=jdunham

Availability Suite - New Solaris Storage Group


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-24 Thread Jason J. W. Williams

Hi Wee,

Having snapshots in the filesystem that work so well is really nice.
How are y'all quiescing the DB?

Best Regards,
J

On 1/24/07, Wee Yeh Tan [EMAIL PROTECTED] wrote:

On 1/25/07, Bryan Cantrill [EMAIL PROTECTED] wrote:
 ...
 after all, what was ZFS going to do with that expensive but useless
 hardware RAID controller?  ...

I almost rolled over reading this.

This is exactly what I went through when we moved our database server
out from Vx** to ZFS.  We had a 3510 and were thinking how best to
configure the RAID.  In the end, we ripped out the controller board
and used the 3510 as a JBOD directly attached to the server.  My DBA
was so happy with this setup (especially with the snapshot capability)
he is asking for another such setup.


--
Just me,
Wire ...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] need advice: ZFS config ideas for X4500 Thumper?

2007-01-23 Thread Jason J. W. Williams

Hi Neal,

We've been getting pretty good performance out of RAID-Z2 with 3x
6-disk RAID-Z2 stripes. More stripes mean better performance all
around...particularly on random reads. But as a file-server that's
probably not a concern. With RAID-Z2 it seems to me two hot spares are
quite sufficient, but I'll defer to others with more knowledge.
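
For reference, a minimal sketch of that 3x 6-disk RAID-Z2 layout with two
hot spares (device names hypothetical):

  # zpool create tank \
      raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 \
      spare  c3t0d0 c3t1d0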

Best Regards,
Jason

On 1/23/07, Neal Pollack [EMAIL PROTECTED] wrote:

Hi:   (Warning, new zfs user question)

I am setting up an X4500 for our small engineering site file server.
It's mostly for builds, images, doc archives, certain workspace
archives, misc
data.

I'd like a trade off between space and safety of data.  I have not set
up a large
ZFS system before, and have only played with simple raidz2 with 7 disks.
After reading
http://blogs.sun.com/relling/entry/raid_recommendations_space_vs_mttdl;
I am leaning toward a RAID-Z2 config with spares, for approx 15
terabytes, but I
do not yet understand the nomenclature and exact config details.
For example, the graph/chart shows that 7+2 RAID-Z2  with spares would
be a good
balance in capacity and data safety, but I do not know what to do with
that number, how
it maps to an actual setup?   Does that type of config also provide a
balance between
performance and data safety?

Can someone provide an actual example of how the config should look?
If I save two disks for the boot, how do the other 46 disks get configured
between spares and zfs groups?

Thanks,

Neal

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: Re: Re: External drive enclosures + Sun

2007-01-23 Thread Jason J. W. Williams

I believe the SmartArray is an LSI design, like the Dell PERC, isn't it?

Best Regards,
Jason

On 1/23/07, Robert Suh [EMAIL PROTECTED] wrote:

People trying to hack together systems might want to look
at the HP DL320s

http://h10010.www1.hp.com/wwpc/us/en/ss/WF05a/15351-241434-241475-241475
-f79-3232017.html

12 drive bays, Intel Woodcrest, SAS (and SATA) controller.  If you snoop
around, you
might be able to find drive carriers on eBay or elsewhere (*cough*
search HP drive sleds
HP drive carriers)  $3k for the chassis.  A mini thumper.

Though I'm not sure if Solaris supports the Smart Array controller.

Rob

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of mike
Sent: Monday, January 22, 2007 1:17 PM
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Re: Re: Re: Re: External drive enclosures +
Sun


I'm dying here - does anyone know when or even if they will support
these?

I had this whole setup planned out but it requires eSATA + port
multipliers.

I want to use ZFS, but currently cannot in that fashion. I'd still
have to buy some [more expensive, noisier, bulky internal drive]
solution for ZFS. Unless anyone has other ideas. I'm looking to run a
5-10 drive system (with easy ability to expand) in my home office; not
in a datacenter.

Even opening up to iSCSI seems to not get me much - there aren't any
SOHO type NAS enclosures that act as iSCSI targets. There are however
handfuls of eSATA based 4, 5, and 10 drive enclosures perfect for
this... but all require the port multiplier support.



On 1/22/07, Frank Cusack [EMAIL PROTECTED] wrote:
 Unfortunately, Solaris does not support SATA port multipliers (yet) so
 I think you're pretty limited in how many esata drives you can
connect.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] need advice: ZFS config ideas for X4500 Thumper?

2007-01-23 Thread Jason J. W. Williams

Hi Peter,

Perhaps I'm a bit dense, but I've been befuddled by the x+y notation
myself. Is it X stripes consisting of Y disks?

Best Regards,
Jason

On 1/23/07, Peter Tribble [EMAIL PROTECTED] wrote:

On 1/23/07, Neal Pollack [EMAIL PROTECTED] wrote:
 Hi:   (Warning, new zfs user question)

 I am setting up an X4500 for our small engineering site file server.
 It's mostly for builds, images, doc archives, certain workspace
 archives, misc
 data.

...
 Can someone provide an actual example of how the config should look?
 If I save two disks for the boot, how do the other 46 disks get configured
 between spares and zfs groups?

What I ended up with was working with 8+2 raidz2 vdevs. It could have been
4+2, but 8+2 gives you more space, and that was more important than
performance. (The performance of the 8+2 is easily adequate for our needs.)
And with 46 drives to play with I can have 4 lots of that. At the moment I
have
6 hot-spares (I may take some of those out later, but at the moment I don't
need them).

So the config looks like:

zpool create images \
raidz2 c{0,1,4,6,7}t0d0 c{1,4,5,6,7}t1d0 \
raidz2 c{0,4,5,6,7}t2d0 c{0,1,5,6,7}t3d0 \
raidz2 c{0,1,4,6,7}t4d0 c{0,1,4,6,7}t5d0 \
raidz2 c{0,1,4,5,7}t6d0 c{0,1,4,5,6}t7d0 \
spare c0t1d0 c1t2d0 c4t3d0 c5t5d0 c6t6d0 c7t7d0

this spreads everything across all the controllers, and with no more than 2
disks on each controller I could survive the rather unlikely event of a
controller failure (unless it's the controller with the boot drives...).

--
-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] need advice: ZFS config ideas for X4500 Thumper?

2007-01-23 Thread Jason J. W. Williams

Hi Peter,

Ah! That clears it up for me. Thank you.

Best Regards,
Jason

On 1/23/07, Peter Tribble [EMAIL PROTECTED] wrote:

On 1/23/07, Jason J. W. Williams [EMAIL PROTECTED] wrote:
 Hi Peter,

 Perhaps I'm a bit dense, but I've been befuddled by the x+y notation
 myself. Is it X stripes consisting of Y disks?


Sorry. Took a short cut on that bit. It's x data disks + y parity. So in the
case of raidz1, y=1; in the case of raidz2, y=2. And ideally x should
be a power of 2. (So 8+2 is a raidz2 stripe of 10 disks in total.)

I've always used this notation, but now I think about it I'm not sure
how universal it is.

--

-Peter Tribble
http://www.petertribble.co.uk/ - http://ptribble.blogspot.com/

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Thumper Origins Q

2007-01-23 Thread Jason J. W. Williams

Hi All,

This is a bit off-topic...but since the Thumper is the poster child
for ZFS I hope its not too off-topic.

What are the actual origins of the Thumper? I've heard varying stories
in word and print. It appears that the Thumper was the original server
Bechtolsheim designed at Kealia as a massive video server. However,
when we were first told about it a year ago through Sun contacts
Thumper was described as a part of a scalabe iSCSI storage system,
where Thumpers would be connected to a head (which looked a lot like a
pair of X4200s) via iSCSI that would then present the storage over
iSCSI and NFS. Recently, other sources mentioned they were told about
the same time that Thumper was part of the Honeycomb project.

So I was curious if anyone had any insights into the history/origins
of the Thumper...or just wanted to throw more rumors on the fire. ;-)

Thanks in advance for your indulgence.

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Synchronous Mount?

2007-01-23 Thread Jason J. W. Williams

Hi Prashanth,

My company did a lot of LVM+XFS vs. SVM+UFS testing in addition to
ZFS. Overall, LVM's overhead is abysmal. We witnessed performance hits
of 50%+. SVM only reduced performance by about 15%. ZFS was similar,
though a tad higher.

Also, my understanding is you can't write to a ZFS snapshot...unless
you clone it. Perhaps, someone who knows more than I can clarify.

Best Regards,
Jason

On 1/23/07, Prashanth Radhakrishnan [EMAIL PROTECTED] wrote:


  Is there someway to synchronously mount a ZFS filesystem?
  '-o sync' does not appear to be honoured.

 No there isn't. Why do you think it is necessary?

Specifically, I was trying to compare ZFS snapshots with LVM snapshots on
Linux. One of the tests does writes to an ext3FS (that's on top of an LVM
snapshot) mounted synchronously, in order to measure the real
Copy-on-write overhead. So, I was wondering if I could do the same with
ZFS. Seems not.

Thanks.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Thumper Origins Q

2007-01-23 Thread Jason J. W. Williams

Wow. That's an incredibly cool story. Thank you for sharing it! Does
the Thumper today pretty much resemble what you saw then?

Best Regards,
Jason

On 1/23/07, Bryan Cantrill [EMAIL PROTECTED] wrote:


 This is a bit off-topic...but since the Thumper is the poster child
 for ZFS I hope its not too off-topic.

 What are the actual origins of the Thumper? I've heard varying stories
 in word and print. It appears that the Thumper was the original server
 Bechtolsheim designed at Kealia as a massive video server.

That's correct -- it was originally called the StreamStor.  Speaking
personally, I first learned about it in the meeting with Andy that I
described here:

  http://blogs.sun.com/bmc/entry/man_myth_legend

I think it might be true that this was the first that anyone in Solaris
had heard of it.  Certainly, it was the first time that Andy had ever
heard of ZFS.  It was a very high bandwidth conversation, at any rate. ;)

After the meeting, I returned post-haste to Menlo Park, where I excitedly
described the box to Jeff Bonwick, Bill Moore and Bart Smaalders.  Bill
said something like I gotta see this thing and sometime later (perhaps
the next week?) Bill, Bart and I went down to visit Andy.  Andy gave
us a much more detailed tour, with Bill asking all sorts of technical
questions about the hardware (many of which were something like how did
you get a supplier to build that for you?!).  After the tour, Andy
took the three of us to lunch, and it was one of those moments that I
won't forget:  Bart, Bill, Andy and I sitting in the late afternoon Palo
Alto sun, with us very excited about his hardware, and Andy very excited
about our software.  Everyone realized that these two projects -- born
independently -- were made for each other, that together they would change
the market.  It was one of those rare moments that reminds you why you got
into this line of work -- and I feel lucky to have shared in it.

- Bryan

--
Bryan Cantrill, Solaris Kernel Development.   http://blogs.sun.com/bmc


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Synchronous Mount?

2007-01-23 Thread Jason J. W. Williams

Hi Prashanth,

This was about a year ago. I believe I ran bonnie++ and IOzone tests.
Tried also to simulate an OLTP load. The 15-20% overhead for ZFS was
vs. UFS on a raw disk...UFS on SVM was almost exactly 15% lower
performance than raw UFS. UFS and XFS on raw disk were pretty similar
in terms of performance, until you got into small files...then XFS
bogged down really badly. None of this was testing with snapshots, so
I'm not sure of the effect there.

I can attest we're running ZFS right now in production on a Thumper
serving two MySQL instances, under an 80/20 write/read load. We use
ZFS snapshots as our primary backup mechanism (flush/lock the tables,
flush the logs, snap, release the locks). At the moment we have 60 ZFS
snapshots across 4 filesystems (one FS per zpool). Our primary
database zpool has 26 of those snapshots, and the primary DB log zpool
has another 26 snapshots. Overall, we haven't noticed any performance
degradation in our database serving performance. I don't have hard
benchmark numbers for you on this, but anecdotally it works very well.
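
A hedged sketch of that backup sequence (pool, filesystem and snapshot
names are hypothetical; the read lock must be held in the same mysql
session until the snapshots have been taken):

  mysql> FLUSH TABLES WITH READ LOCK;
  mysql> FLUSH LOGS;
  (from another shell, while the lock is held)
  # zfs snapshot data/mysql@backup-20070123
  # zfs snapshot logs/mysql@backup-20070123
  mysql> UNLOCK TABLES;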

There have been some folks complaining here of snapshot numbers in the
200+ range causing performance problems on a single FS.  We don't plan
to have more than about 40 snapshots on an FS right now.

Hope this is somewhat helpful. Its been a long time (2+ years) since
I've used Ext3 on a Linux system, so I couldn't give you a comparative
benchmark. Good luck! :-)

Best Regards,
Jason

On 1/23/07, Prashanth Radhakrishnan [EMAIL PROTECTED] wrote:

Hi Jason,

 My company did a lot of LVM+XFS vs. SVM+UFS testing in addition to
 ZFS. Overall, LVM's overhead is abysmal. We witnessed performance hits
 of 50%+. SVM only reduced performance by about 15%. ZFS was similar,
 though a tad higher.

Yes, LVM snapshots' overhead is high. But I've seen that as you start
increasing the chunksize, they get better (though, with higher space
usage).

So, you saw performance reductions as much as 15% with ZFS
clones/snapshots. I'm curious to know what tests and ZFS config (# of
snapshots/clones) you ran on.

I ran bonnie++ and didn't notice any perceptible drops in the numbers.
Though my config had only upto 3 clones and 3 snapshots for each of them.

 Also, my understanding is you can't write to a ZFS snapshot...unless
 you clone it. Perhaps, someone who knows more than I can clarify.

Right. I wanted to check if creating snapshots affected the performance of
the origin FS/clone.

Thanks,
Prashanth

 On 1/23/07, Prashanth Radhakrishnan [EMAIL PROTECTED] wrote:
 
Is there someway to synchronously mount a ZFS filesystem?
'-o sync' does not appear to be honoured.
  
   No there isn't. Why do you think it is necessary?
 
  Specifically, I was trying to compare ZFS snapshots with LVM snapshots on
  Linux. One of the tests does writes to an ext3FS (that's on top of an LVM
  snapshot) mounted synchronously, in order to measure the real
  Copy-on-write overhead. So, I was wondering if I could do the same with
  ZFS. Seems not.
 
  Thanks.
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: External drive enclosures + Sun Server for massstorage

2007-01-22 Thread Jason J. W. Williams

Hi Frank,

I'm sure Richard will check it out. He's a very good guy and not
trying to jerk you around. I'm sure the hostility isn't warranted. :-)

Best Regards,
Jason

On 1/22/07, Frank Cusack [EMAIL PROTECTED] wrote:

On January 22, 2007 10:03:14 AM -0800 Richard Elling
[EMAIL PROTECTED] wrote:
 Toby Thain wrote:
   To be clear: the X2100 drives are neither hotswap nor hotplug under
   Solaris. Replacing a failed drive requires a reboot.

 I do not believe this is true, though I don't have one to test.

Well if you won't accept multiple technically adept people's word on it,
I highly suggest you get one to test instead of speculating.

  If this
 were true, then we would have had to rewrite the disk drivers to not allow
 us to open a device more than once, even if we also closed the device.
 I can't imagine anyone allowing such code to be written.

Obviously you have not rewritten the disk drivers to do this, so this is
the wrong line of reasoning.

 However, I don't believe this is the context of the issue.  I believe that
 this release note deals with the use of NVRAID (NVidia's MCP RAID
 controller)
 which does not have a systems management interface under Solaris.  The
 solution is to not use NVRAID for Solaris.  Rather, use the proven
 techniques
 that we've been using for decades to manage hot plugging drives.

No, the release note is not about NVRAID.

 In short, the release note is confusing, so ignore it.  Use x2100 disks as
 hot pluggable like you've always used hot plug disks in Solaris.

Again, NO these drives are not hot pluggable and the release note is
accurate.  PLEASE get a system to test.  Or take our word for it.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: External drive enclosures + Sun Server for mass

2007-01-22 Thread Jason J. W. Williams

Hi David,

Depending on the I/O you're doing the X4100/X4200 are much better
suited because of the dual HyperTransport buses. As a storage box with
GigE outputs you've got a lot more I/O capacity with two HT buses than
one. That plus the X4100 is just a more solid box. The X2100 M2 while
a vast improvement over the X2100 in terms of reliability and
features, is still an OEM'd whitebox. We use the X2100 M2s for
application servers, but for anything that needs solid reliability or
I/O we go Galaxy.

Best Regards,
Jason

On 1/22/07, David J. Orman [EMAIL PROTECTED] wrote:

 Not to be picky, but the X2100 and X2200 series are
 NOT
 designed/targeted for disk serving (they don't even
 have redundant power
 supplies).  They're compute-boxes.  The X4100/X4200
 are what you are
 looking for to get a flexible box more oriented
 towards disk i/o and
 expansion.

I don't see those as being any better suited to external discs other than:

#1 - They have the capacity for redundant PSUs, which is irrelevant to my needs.
#2 - They only have PCI Express slots, and I can't find any good external SATA 
interface cards on PCI Express

I can't wrap my head around the idea that I should buy a lot more than I need, 
which still doesn't serve my purposes. The 4 disks in an x4100 still aren't 
enough, and the machine is a fair amount more costly. I just need mirrored boot 
drives, and an external disk array.

 That said (if you're set on an X2200 M2), you are
 probably better off
 getting a PCI-E SCSI controller, and then attaching
 it to an external
 SCSI-SATA JBOD.  There are plenty of external JBODs
 out there which use
 Ultra320/Ultra160 as a host interface and SATA as a
 drive interface.
 Sun will sell you a supported SCSI controller with
 the X2200 M2 (the
 Sun StorageTek PCI-E Dual Channel Ultra320 SCSI
 HBA).

 SCSI is far better for a host attachment mechanism
 than eSATA if you
 plan on doing more than a couple of drives, which it
 sounds like you
 are. While the SCSI HBA is going to cost quite a bit
 more than an eSATA
 HBA, the external JBODs run about the same, and the
 total difference is
 going to be $300 or so across the whole setup (which
 will cost you $5000
 or more fully populated). So the cost to use SCSI vs
 eSATA as the host-
 attach is a rounding error.

I understand your comments in some ways, in others I do not. It sounds like we're moving backwards 
in time. Exactly why is SCSI better than SAS/SATA for external devices? From my 
experience (with other OSs/hardware platforms) the opposite is true. A nice SAS/SATA controller 
with external ports (especially those that allow multiple SAS/SATA drives via one cable - whichever 
tech you use) works wonderfully for me, and I get a nice thin/clean cable which makes cable 
management much more enjoyable in higher density situations.

I also don't agree with the logic of "just spend a mere $300 extra to use older 
technology!"

$300 may not be much to large business, but things like this nickle and dime 
small business owners. There's a lot of things I'd prefer to spend $300 on than 
an expensive SCSI HBA which offers no advantages over a SAS counterpart, in 
fact offers disadvantages instead.

Your input is of course highly valued, and it's quite possible I'm missing an important 
piece to the puzzle somewhere here, but I am not convinced this is the ideal solution - 
simply a "stick with the old stuff, it's easier" solution, which I am very much 
against.

Thanks,
David


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: External drive enclosures + Sun Server for massstorage

2007-01-22 Thread Jason J. W. Williams

Hi Guys,

The original X2100 was a pile of doggie doo-doo. All of our problems
with it go back to the atrocious quality of the nForce 4 Pro chipset.
The NICs in particular are just crap. The M2s are better, but the
MCP55 chipset has not resolved all of its flakiness issues. That being
said, Sun designed that case with hot-plug bays; if Solaris isn't going
to support them, then they shouldn't be there, in my opinion.

Best Regards,
Jason

On 1/22/07, Frank Cusack [EMAIL PROTECTED] wrote:

  In short, the release note is confusing, so ignore it.  Use x2100
  disks as hot pluggable like you've always used hot plug disks in
  Solaris.

 Again, NO these drives are not hot pluggable and the release note is
 accurate.  PLEASE get a system to test.  Or take our word for it.

hmm I think I may have just figured out the problem here.

YES the x2100 is that bad.  I too found it quite hard to believe that
Sun would sell this without hot plug drives.  It seems like a step
backwards.

(and of course I don't mean that the x2100 is awful, it's great
hardware and very well priced ... now if only hot plug worked!)

My main issue is that the x2100 is advertised as hot plug working.
You have to dig pretty deep -- deeper than would be expected of a
typical buyer -- to find that Solaris does not support it.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re: External drive enclosures + Sun Server for mass

2007-01-22 Thread Jason J. W. Williams

Hi David,

Glad to help! I don't want to bad-mouth the X2100 M2s that much,
because they have been solid. I believe the M2s are made/designed just
for Sun by Quanta Computer (http://www.quanta.com.tw/e_default.htm)
whereas the mobo in the original X2100 was a Tyan Tiger with some
slight modifications. That all being said, the problem is that Nvidia
chipset. The MCP55 in the X2100 M2 is an alright chipset, the nForce 4
Pro just had bugs.

Best Regards,
Jason

On 1/22/07, David J. Orman [EMAIL PROTECTED] wrote:

 Hi David,

 Depending on the I/O you're doing the X4100/X4200 are
 much better
 suited because of the dual HyperTransport buses. As a
 storage box with
 GigE outputs you've got a lot more I/O capacity with
 two HT buses than
 one. That plus the X4100 is just a more solid box.

That much makes sense, thanks for clearing that up.

 The X2100 M2 while
 a vast improvement over the X2100 in terms of
 reliability and
 features, is still an OEM'd whitebox. We use the
 X2100 M2s for
 application servers, but for anything that needs
 solid reliability or
 I/O we go Galaxy.

Ahh. That explains a lot. Thank you once again!

Sounds like the X2* is the red-headed stepchild of Sun's product line. They 
should slap disclaimers up on the product information pages so we know better 
than to purchase into something that doesn't fully function.

Still unclear on the SAS/SATA solutions, but hopefully that'll progress further 
now in the thread.

Cheers,
David


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Understanding ::memstat in terms of the ARC

2007-01-22 Thread Jason J. W. Williams

Hello all,

I have a question. Below are two ::memstat outputs about 5 days apart.
The interesting thing is the anonymous memory shows 2GB, though the
two major hogs of that memory (two MySQL instances) claim to be
consuming about 6.2GB (checked via pmap).

Also, it seems like the ARC keeps creeping the kernel memory over the
4GB limit I set for the ARC (zfs_arc_max). What I was also curious
about is whether ZFS affects the cachelist line, or if that is just for
UFS. Thank you in advance!
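
For cross-checking, assuming the build exposes the ZFS arcstats kstat (it may not
on older Nevada bits), the ARC size and its cap can be read directly; a quick sketch:

  # current ARC size, target, and configured maximum, in bytes
  kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max

  # kernel page breakdown for comparison
  echo "::memstat" | mdb -k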

Best Regards,
Jason

01/17/2007  02:28:50 GMT 2007
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1485925              5804   36%
Anon                       855812              3343   21%
Exec and libs                7438                29    0%
Page cache                   3863                15    0%
Free (cachelist)           185235               723    4%
Free (freelist)           1629288              6364   39%

Total                     4167561             16279
Physical                  4078747             15932

01/22/2007 01:17:32 GMT 2007
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1534184              5992   37%
Anon                       538054              2101   13%
Exec and libs                7497                29    0%
Page cache                  18550                72    0%
Free (cachelist)          1384165              5406   33%
Free (freelist)            685111              2676   16%

Total                     4167561             16279
Physical                  4078747             15932
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] External drive enclosures + Sun Server for mass storage

2007-01-20 Thread Jason J. W. Williams

Hi Shannon,

The markup is still pretty high on a per-drive basis. That being said,
$1-2/GB is darn low for the capacity in a server. Plus, you're also
paying for having enough HyperTransport I/O to feed the PCI-E I/O.

Does anyone know what problems they had with the 250GB version of the
Thumper that caused them to pull it?

Best Regards,
Jason

On 1/20/07, Shannon Roddy [EMAIL PROTECTED] wrote:

Frank Cusack wrote:

 thumper (x4500) seems pretty reasonable ($/GB).

 -frank


I am always amazed that people consider thumper to be reasonable in
price.  450% or more markup per drive from street price in July 2006
numbers doesn't seem reasonable to me, even after subtracting the cost
of the system.  I like the x4500, I wish I had one.  But, I can't pay
what Sun wants for it.  So, instead, I am stuck buying lower end Sun
systems and buying third party SCSI/SATA JBODs.  I like Sun.  I like
their products, but I can't understand their storage pricing most of the
time.

-Shannon

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] External drive enclosures + Sun Server for mass storage

2007-01-19 Thread Jason J. W. Williams

Hi David,

I don't know if your company qualifies as a startup under Sun's regs
but you can get an X4500/Thumper for $24,000 under this program:
http://www.sun.com/emrkt/startupessentials/

Best Regards,
Jason

On 1/19/07, David J. Orman [EMAIL PROTECTED] wrote:

Hi,

I'm looking at Sun's 1U x64 server line, and at most they support two drives. 
This is fine for the root OS install, but obviously not sufficient for many 
users.

Specifically, I am looking at the: http://www.sun.com/servers/x64/x2200/ 
X2200M2.

It only has a "Riser card assembly with two internal 64-bit, 8-lane, low-profile, 
half-length PCI-Express slots" for expansion.

What I'm looking for is a SAS/SATA card that would allow me to add an external 
SATA enclosure (or some such device) to add storage. The supported list on the 
HCL is pretty slim, and I see no PCI-E stuff. A card that supports SAS would be 
*ideal*, but I can settle for normal SATA too.

So, anybody have any good suggestions for these two things:

#1 - SAS/SATA PCI-E card that would work with the Sun X2200M2.
#2 - Rack-mountable external enclosure for SAS/SATA drives, supporting hot swap 
of drives.

Basically, I'm trying to get around using Sun's extremely expensive storage 
solutions while waiting on them to release something reasonable now that ZFS 
exists.

Cheers,
David


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: What SATA controllers are people using for ZFS?

2007-01-18 Thread Jason J. W. Williams

Hi Frank,

Sun doesn't support the X2100 SATA controller on Solaris 10? That's
just bizarre.

-J

On 1/18/07, Frank Cusack [EMAIL PROTECTED] wrote:

THANK YOU Naveen, Al Hopper, others, for sinking yourselves into the
shit world of PC hardware and [in]compatibility and coming up with
well qualified white box solutions for S10.

I strongly prefer to buy Sun kit, but I am done waiting for Sun to support
the SATA controller on the x2100.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Jason J. W. Williams

Hi Anantha,

I was curious why segregating at the FS level would provide adequate
I/O isolation? Since all FS are on the same pool, I assumed flogging a
FS would flog the pool and negatively affect all the other FS on that
pool?

Best Regards,
Jason

On 1/17/07, Anantha N. Srirama [EMAIL PROTECTED] wrote:

You're probably hitting the same wall/bug that I came across; ZFS in all 
versions up to and including Sol10U3 generates excessive I/O when it encounters 
'fsync' or if any of the files were opened with the 'O_DSYNC' option.

I do believe Oracle (or any DB for that matter) opens its files with the O_DSYNC 
option. During normal times it does result in excessive I/O, but it is probably 
well under your system capacity (it was in our case). But when you are doing 
backups or clones (Oracle clones using RMAN or copying of db files?) you are 
going to flood the I/O subsystem, and that's when the ZFS excessive I/O 
starts to put a hurt on DB performance.

Here are a few suggestions that can give you interim relief:

- Segregate your I/O at the filesystem level; the bug is at the filesystem level, 
not the ZFS pool level. By this I mean ensure the online redo logs are in a ZFS FS 
that nobody else uses, and the same for control files (see the sketch after this 
list). As long as the writes to control files and online redo logs are met, your 
system will be happy.
- Ensure that your clone and RMAN (if you're going to disk) write to a separate 
ZFS FS that contains no production files.
- If the above two items don't give you relief, then relocate the online redo 
log and control files to a UFS filesystem. No need to downgrade the entire ZFS 
setup to something else.
- Consider Oracle ASM (DB version permitting); it works very well. Why deal with 
VxFS?
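
As a rough illustration of the first suggestion (pool and filesystem names here are 
made up; treat this as a sketch, not a tested recipe):

  # one ZFS filesystem per I/O class, all in the same pool
  zfs create dbpool/redo      # online redo logs only
  zfs create dbpool/control   # control files only
  zfs create dbpool/data      # datafiles
  zfs create dbpool/backup    # RMAN / clone target, kept away from production files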

Feel free to drop me a line; I have over 17 years of Oracle DB experience and 
love to troubleshoot problems like this. I have another vested interest too: we're 
considering ZFS for widespread use in our environment and any experience is 
good for us.


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] Re: Heavy writes freezing system

2007-01-17 Thread Jason J. W. Williams

Hi Robert,

I see. So it really doesn't get around the idea of putting DB files
and logs on separate spindles?

Best Regards,
Jason

On 1/17/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Jason,

Wednesday, January 17, 2007, 11:24:50 PM, you wrote:

JJWW Hi Anantha,

JJWW I was curious why segregating at the FS level would provide adequate
JJWW I/O isolation? Since all FS are on the same pool, I assumed flogging a
JJWW FS would flog the pool and negatively affect all the other FS on that
JJWW pool?

because of the bug which forces all outstanding writes in a file
system to commit to storage when there is a single fsync to one file.
Now when you separate data into different file systems, the bug will
affect only data in that file system, which could greatly reduce the impact
on performance if it's done right.

--
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Eliminating double path with ZFS's volume manager

2007-01-16 Thread Jason J. W. Williams

Hi Philip,

I'm not an expert, so I'm afraid I don't know what to tell you. I'd
call Apple Support and see what they say. As horrid as they are at
Enterprise support they may be the best ones to clarify if
multipathing is available without Xsan.


Best Regards,
Jason

On 1/16/07, Philip Mötteli [EMAIL PROTECTED] wrote:

 Looks like its got a half-way decent multipath
 design:
 http://docs.info.apple.com/article.html?path=Xsan/1.1/
 en/c3xs12.html

Great, but that is with Xsan. If I don't exchange our Hitachi with an Xsan, I 
don't have this 'cvadmin'.


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Eliminating double path with ZFS's volume manager

2007-01-15 Thread Jason J. W. Williams

Hi Torrey,

I think it does if you buy Xsan. It's still a separate product, isn't
it? Though it's more like QFS + MPXIO.

Best Regards,
Jason

On 1/15/07, Torrey McMahon [EMAIL PROTECTED] wrote:

Robert Milkowski wrote:

 2. I believe it's definitely possible to just correct your config under
 Mac OS without any need to use another fs or volume manager; however,
 going to zfs could be a good idea anyway


That implies that MacOS has some sort of native SCSI multipathing like
Solaris Mpxio. Does such a beast exist?

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS direct IO

2007-01-15 Thread Jason J. W. Williams

Hi Roch,

You mentioned improved ZFS performance in the latest Nevada build (60
right now?)...I was curious if one would notice much of a performance
improvement between 54 and 60? Also, does anyone think the zfs_arc_max
tunable support will be made available as a patch to S10U3, or would
that wait until U4? Thank you in advance!

Best Regards,
Jason

On 1/15/07, Roch - PAE [EMAIL PROTECTED] wrote:


Jonathan Edwards writes:
 
  On Jan 5, 2007, at 11:10, Anton B. Rang wrote:
 
   DIRECT IO is a set of performance optimisations to circumvent
   shortcomings of a given filesystem.
  
   Direct I/O as generally understood (i.e. not UFS-specific) is an
   optimization which allows data to be transferred directly between
   user data buffers and disk, without a memory-to-memory copy.
  
   This isn't related to a particular file system.
  
 
  true .. directio(3) is generally used in the context of *any* given
  filesystem to advise it that an application buffer to system buffer
  copy may get in the way or add additional overhead (particularly if
  the filesystem buffer is doing additional copies.)  You can also look
  at it as a way of reducing more layers of indirection particularly if
  I want the application overhead to be higher than the subsystem
  overhead.  Programmatically .. less is more.

Direct IO makes good sense when the target disk sectors are
set a priori. But in the context of ZFS, would you rather
have 10 direct disk I/Os, or 10 bcopies and 2 I/Os (say that
was possible)?

As for reads, I can see that when the load is cached in the
disk array and we're running 100% CPU, the extra copy might
be noticeable. Is this the situation that longs for DIO?
What % of a system is spent in the copy? What is the added
latency that comes from the copy? Is DIO the best way to
reduce the CPU cost of ZFS?

The current Nevada code base has quite nice performance
characteristics (and certainly quirks); there are many
further efficiency gains to be reaped from ZFS. I just don't
see DIO on top of that list for now. Or at least someone
needs to spell out what ZFS/DIO is and how much better it
is expected to be (back-of-the-envelope calculation accepted).

Reading RAID-Z subblocks on filesystems that have checksum
disabled might be interesting. That would avoid some disk
seeks. To serve the subblocks directly or not is a
separate matter; it's a small deal compared to the feature
itself. How about disabling the DB checksum (it can't fix
the block anyway) and do mirroring?

-r


  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Eliminating double path with ZFS's volume manager

2007-01-15 Thread Jason J. W. Williams

Hi Torrey,

Looks like its got a half-way decent multipath design:
http://docs.info.apple.com/article.html?path=Xsan/1.1/en/c3xs12.html

Whether or not it works is another story I suppose. ;-)

Best Regards,
Jason

On 1/15/07, Torrey McMahon [EMAIL PROTECTED] wrote:

Got me. However, transport multipathing - Like Mpxio, DLM, VxDMP, etc. -
is usually separated from the filesystem layers.

Jason J. W. Williams wrote:
 Hi Torrey,

 I think it does if you buy Xsan. Its still a separate product isn't
 it? Thought its more like QFS + MPXIO.

 Best Regards,
 Jason

 On 1/15/07, Torrey McMahon [EMAIL PROTECTED] wrote:
 Robert Milkowski wrote:
 
  2. I belive it's definitely possible to just correct your config under
  Mac OS without any need to use other fs or volume manager, however
  going to zfs could be a good idea anyway


 That implies that MacOS has some sort of native SCSI multipathing like
 Solaris Mpxio. Does such a beast exist?

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] Replacing a drive in a raidz2 group

2007-01-13 Thread Jason J. W. Williams

Hi Robert,

Will build 54 offline the drive?

Best Regards,
Jason

On 1/13/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Jason,

Saturday, January 13, 2007, 12:06:57 AM, you wrote:

JJWW Hi Robert,

JJWW We've experienced luck with flaky SATA drives in our STK array by
JJWW unseating and reseating the drive to cause a reset of the firmware. It
JJWW may be a bad drive, or the firmware may just have hit a bug. Hope its
JJWW the latter! :-D

JJWW I'd be interested why the hot-spare didn't kick in. I thought the FMA
JJWW integration would detect read errors.

FMA did, but the ZFS+FMA integration isn't there yet in U3.

--
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Replacing a drive in a raidz2 group

2007-01-12 Thread Jason J. W. Williams

Hi Robert,

We've had luck with flaky SATA drives in our STK array by
unseating and reseating the drive to force a reset of the firmware. It
may be a bad drive, or the firmware may just have hit a bug. Hope it's
the latter! :-D

I'd be interested why the hot-spare didn't kick in. I thought the FMA
integration would detect read errors.

Best Regards,
Jason

On 1/12/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello zfs-discuss,

  One of our drives in an x4500 is failing - it periodically
  disconnects/reconnects. ZFS only reports READ errors, and no hot-spare
  kicked in automatically, which is expected currently.

  So I issued zpool replace with a hot-spare drive.

  Now it takes forever, and it seems like ZFS is rebuilding the drive using
  checksums - wouldn't it be much faster if it just copied data from
  the drive being replaced (like attaching a mirror)?
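
  For reference, the commands involved look roughly like this (device names are
  illustrative, not from this x4500):

    # replace the flaky disk with the spare and watch the resilver progress
    zpool replace tank c0t1d0 c4t7d0
    zpool status -v tank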


--
Best regards,
 Robert  mailto:[EMAIL PROTECTED]
 http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Limit ZFS Memory Utilization

2007-01-11 Thread Jason J. W. Williams

Hi Mark,

That does help tremendously. How does ZFS decide which zio cache to
use? I apologize if this has already been addressed somewhere.

Best Regards,
Jason

On 1/11/07, Mark Maybee [EMAIL PROTECTED] wrote:

Al Hopper wrote:
 On Wed, 10 Jan 2007, Mark Maybee wrote:

 Jason J. W. Williams wrote:
 Hi Robert,

 Thank you! Holy mackerel! That's a lot of memory. With that type of a
 calculation my 4GB arc_max setting is still in the danger zone on a
 Thumper. I wonder if any of the ZFS developers could shed some light
 on the calculation?

 In a worst-case scenario, Robert's calculations are accurate to a
 certain degree:  If you have 1GB of dnode_phys data in your arc cache
 (that would be about 1,200,000 files referenced), then this will result
 in another 3GB of related data held in memory: vnodes/znodes/
 dnodes/etc.  This related data is the in-core data associated with
an accessed file.  It's not quite true that this data is not evictable,
it *is* evictable, but the space is returned from these kmem caches
only after the arc has cleared its blocks and triggered the free of
the related data structures (and even then, the kernel will need to
do a kmem_reap to reclaim the memory from the caches).  The
 fragmentation that Robert mentions is an issue because, if we don't
 free everything, the kmem_reap may not be able to reclaim all the
 memory from these caches, as they are allocated in slabs.

 We are in the process of trying to improve this situation.
  snip .

 Understood (and many Thanks).  In the meantime, is there a rule-of-thumb
 that you could share that would allow mere humans (like me) to calculate
 the best values of zfs:zfs_arc_max and ncsize, given the that machine has
 nGb of RAM and is used in the following broad workload scenarios:

 a) a busy NFS server
 b) a general multiuser development server
 c) a database server
 d) an Apache/Tomcat/FTP server
 e) a single user Gnome desktop running U3 with home dirs on a ZFS
 filesystem

 It would seem, from reading between the lines of previous emails,
 particularly the ones you've (Mark M) written, that there is a rule of
 thumb that would apply given a standard or modified ncsize tunable??

 I'm primarily interested in a calculation that would allow settings that
 would reduce the possibility of the machine descending into swap hell.

Ideally, there would be no need for any tunables; ZFS would always do
the right thing.   This is our grail.  In the meantime, I can give some
recommendations, but there is no rule of thumb that is going to work
in all circumstances.

ncsize: As I have mentioned previously, there are overheads
associated with caching vnode data in ZFS.  While
the physical on-disk data for a znode is only 512bytes,
the related in-core cost is significantly higher.
Roughly, you can expect that each ZFS vnode held in
the DNLC will cost about 3K of kernel memory.

So, you need to set ncsize appropriately for how much
memory you are willing to devote to it.  500,000 entries
is going to cost you 1.5GB of memory.

zfs_arc_max: This is the maximum amount of memory you want the
ARC to be able to use.  Note that the ARC won't
necessarily use this much memory: if other applications
need memory, the ARC will shrink to accommodate.
Although, also note that the ARC *can't* shrink if all
of its memory is held.  For example, data in the DNLC
cannot be evicted from the ARC, so this data must first
be evicted from the DNLC before the ARC can free up
space (this is why it is dangerous to turn off the ARCs
ability to evict vnodes from the DNLC).

Also keep in mind that the ARC size does not account for
many in-core data structures used by ZFS (znodes/dnodes/
dbufs/etc).  Roughly, for every 1MB of cached file
pointers, you can expect another 3MB of memory used
outside of the ARC.  So, in the example above, where
ncsize is 500,000, the ARC is only seeing about 400MB
of the 1.5GB consumed.  As I have stated previously,
we consider this a bug in the current ARC accounting
that we will soon fix.  This is only an issue in
environments where many files are being accessed.  If
the number of files accessed is relatively low, then
the ARC size will be much closer to the actual memory
consumed by ZFS.

So, in general, you should not really need to tune
zfs_arc_max.  However, in environments where you have
specific applications that consume known quantities of
memory (e.g. database), it will likely

Re: [zfs-discuss] Solid State Drives?

2007-01-11 Thread Jason J. W. Williams

Hello all,

Just my two cents on the issue. The Thumper is proving to be a
terrific database server in all aspects except latency. While the
latency is acceptable, being able to add some degree of battery-backed
write cache that ZFS could use would be phenomenal.

Best Regards,
Jason

On 1/11/07, Jonathan Edwards [EMAIL PROTECTED] wrote:


On Jan 11, 2007, at 15:42, Erik Trimble wrote:

 On Thu, 2007-01-11 at 10:35 -0800, Richard Elling wrote:
 The product was called Sun PrestoServ.  It was successful for
 benchmarking
 and such, but unsuccessful in the market because:

  + when there is a failure, your data is spread across multiple
fault domains

  + it is not clusterable, which is often a requirement for data
centers

  + it used a battery, so you had to deal with physical battery
replacement and all of the associated battery problems

  + it had yet another device driver, so integration was a pain

 Google for it and you'll see all sorts of historical perspective.
   -- richard


 Yes, I remember (and used) PrestoServ. Back in the SPARCcenter 1000
 days. :-)

as do i .. (keep your batteries charged!! and don't panic!)

 And yes, local caching makes the system non-clusterable.

not necessarily .. i like the javaspaces approach to coherency, and
companies like gigaspaces have done some pretty impressive things
with in memory SBA databases and distributed grid architectures ..
intelligent coherency design with a good distribution balance for
local, remote, and redundant can go a long way in improving your
cache numbers.

 However, all
 the other issues are common to a typical HW raid controller, and many
 people use host-based HW controllers just fine and don't find their
 problems to be excessive.

True given most workloads, but in general it's the coherency issues
that drastically affect throughput on shared controllers particularly
as you add and distribute the same luns or data across different
control processors.  Add too many and your cache hit rates might fall
in the toilet.

.je
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Limit ZFS Memory Utilization

2007-01-10 Thread Jason J. W. Williams

Hi Guys,

After reading through the discussion on this regarding ZFS memory
fragmentation on snv_53 (and forward) and going through our
::kmastat...looks like ZFS is sucking down about 544 MB of RAM in the
various caches. About 360MB of that is in the zio_buf_65536 cache.
Next most notable is 55MB in zio_buf_32768, and 36MB in zio_buf_16384.
I don't think that's too bad but worth keeping track of. At this
point our kernel memory growth seems to have slowed, with it hovering
around 5GB, and the anon column is mostly what's growing now (as
expected...MySQL).
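
A one-liner sketch for keeping an eye on those caches over time (nothing beyond
stock mdb is assumed):

  # snapshot the ZFS zio buffer caches from the live kernel
  echo "::kmastat" | mdb -k | grep zio_buf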

Most of the problem in the discussion thread on this seemed to be
related to a lot of DNLC entries due to the workload of a file server.
How would this affect a database server with operations in only a
couple of very large files? Thank you in advance.

Best Regards,
Jason

On 1/10/07, Jason J. W. Williams [EMAIL PROTECTED] wrote:

Sanjeev & Robert,

Thanks guys. We put that in place last night and it seems to be doing
a much better job of consuming less RAM. We set it to 4GB and each of
our 2 MySQL instances on the box to a max of 4GB. So hopefully a slush
of 4GB on the Thumper is enough. I would be interested in what the
other ZFS modules' memory behaviors are. I'll take a perusal through
the archives. In general it seems to me that a max cap for ZFS, whether
set through a series of individual tunables or a single root tunable,
would be very helpful.

Best Regards,
Jason

On 1/10/07, Sanjeev Bagewadi [EMAIL PROTECTED] wrote:
 Jason,

 Robert is right...

 The point is that the ARC is the caching module of ZFS, and the majority of the memory
 is consumed through the ARC.
 Hence, by limiting the c_max of the ARC we are limiting the amount the ARC consumes.

 However, other modules of ZFS consume memory as well, but that may not be as
 significant as the ARC.

 Experts, please correct me if I am wrong here.

 Thanks and regards,
 Sanjeev.

 Robert Milkowski wrote:

 Hello Jason,
 
 Tuesday, January 9, 2007, 10:28:12 PM, you wrote:
 
 JJWW Hi Sanjeev,
 
 JJWW Thank you! I was not able to find anything as useful on the subject as
 JJWW that!  We are running build 54 on an X4500, would I be correct in my
 JJWW reading of that article that if I put set zfs:zfs_arc_max =
 JJWW 0x100000000 #4GB in my /etc/system, ZFS will consume no more than
 JJWW 4GB? Thank you in advance.
 
 That's the idea; however, it's not working that way now - under some
 circumstances ZFS can still consume much more memory - see other
 recent posts here.
 
 
 


 --
 Solaris Revenue Products Engineering,
 India Engineering Center,
 Sun Microsystems India Pvt Ltd.
 Tel:x27521 +91 80 669 27521




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Adding disk to a RAID-Z?

2007-01-10 Thread Jason J. W. Williams

Hi Kyle,

I think there was a lot of talk about this behavior on the RAIDZ2 vs.
RAID-10 thread. My understanding from that discussion was that every
write stripes the block across all disks in a RAIDZ/Z2 group, thereby
making writes to the group no faster than writes to a single disk.
However, reads are much faster, as all the disks are activated in the
read process.

The default config on the X4500 we received recently was RAIDZ-groups
of 6 disks (across the 6 controllers) striped together into one large
zpool.

Best Regards,
Jason

On 1/10/07, Kyle McDonald [EMAIL PROTECTED] wrote:

Robert Milkowski wrote:
 Hello Kyle,

 Wednesday, January 10, 2007, 5:33:12 PM, you wrote:

 KM Remember though that it's been mathematically figured that the
 KM disadvantages to RaidZ start to show up after 9 or 10 drives. (That's

 Well, nothing like this was proved and definitely not mathematically.

  It's just common sense advice - for many users, keeping raidz groups
  below 9 disks should give good enough performance. However, if someone
  creates a raidz group of 48 disks, he/she probably also expects
  performance, and in general raid-z wouldn't offer it.



It's very possible I misstated something. :)

I thought I had read, though, something like: over 9 or so disks would
mean that each FS block would be written to less than a single disk
block on each disk?

Or maybe it was that waiting to read from all drives for files less than
a FS block would suffer?

Ahhh...  I can't remember what the effects were thought to be. I thought
there was some theoretical math involved though.

I do remember people advising against it though. Not just on a
performance basis, but also on a increased risk of failure basis. I
think it was just seen as a good balancing point.

-Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] Re: Adding disk to a RAID-Z?

2007-01-10 Thread Jason J. W. Williams

Hi Robert,

I read the following section from
http://blogs.sun.com/roch/entry/when_to_and_not_to as indicating
random writes to a RAID-Z had the performance of a single disk
regardless of the group size:


Effectively, as a first approximation, an N-disk RAID-Z group will
behave as a single device in terms of delivered random input
IOPS. Thus a 10-disk group of devices each capable of 200 IOPS will
globally act as a 200-IOPS capable RAID-Z group.



Best Regards,
Jason

On 1/10/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Jason,

Wednesday, January 10, 2007, 10:54:29 PM, you wrote:

JJWW Hi Kyle,

JJWW I think there was a lot of talk about this behavior on the RAIDZ2 vs.
JJWW RAID-10 thread. My understanding from that discussion was that every
JJWW write stripes the block across all disks on a RAIDZ/Z2 group, thereby
JJWW making writing the group no faster than writing to a single disk.
JJWW However reads are much faster, as all the disk are activated in the
JJWW read process.

The opposite, actually. Because of COW, writing (modifying as well)
will give you up to N-1 disks' performance for raid-z1 and N-2 disks' performance for
raid-z2. However, reading can be slow in the case of many small random reads,
as to read each fs block you've got to wait for all data disks in a
group.


JJWW The default config on the X4500 we received recently was RAIDZ-groups
JJWW of 6 disks (across the 6 controllers) striped together into one large
JJWW zpool.

However, the problem with that config is the lack of a hot spare.
Of course it depends what you want (and there was no hot-spare support
in U2, which is the OS installed at the factory so far).


--
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[4]: [zfs-discuss] Limit ZFS Memory Utilization

2007-01-10 Thread Jason J. W. Williams

Hi Robert,

We've got the default ncsize. I didn't see any advantage to increasing
it outside of NFS serving...which this server is not. For speed, the
X4500 is shaping up to be a killer MySQL platform. Between the blazing
fast procs and the sheer number of spindles, its performance is
tremendous. If MySQL Cluster had full disk-based support, scale-out
with X4500s a la Greenplum would be a terrific solution.

At this point, the ZFS memory gobbling is the main roadblock to being
a good database platform.

Regarding the paging activity, we too saw tremendous paging of up to
24% of the X4500s CPU being used for that with the default arc_max.
After changing it to 4GB, we haven't seen anything much over 5-10%.

Best Regards,
Jason

On 1/10/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Jason,

Thursday, January 11, 2007, 12:36:46 AM, you wrote:

JJWW Hi Robert,

JJWW Thank you! Holy mackerel! That's a lot of memory. With that type of a
JJWW calculation my 4GB arc_max setting is still in the danger zone on a
JJWW Thumper. I wonder if any of the ZFS developers could shed some light
JJWW on the calculation?

JJWW That kind of memory loss makes ZFS almost unusable for a database system.


If you leave ncsize at its default value then I believe it won't consume
that much memory.


JJWW I agree that a page cache similar to UFS would be much better.  Linux
JJWW works similarly to free pages, and it has been effective enough in the
JJWW past. Though I'm equally unhappy about Linux's tendency to grab every
JJWW bit of free RAM available for filesystem caching, and then cause
JJWW massive memory thrashing as it frees it for applications.

A page cache won't be better - just better memory control for ZFS caches
is strongly desired. Unfortunately, from time to time ZFS makes servers
page enormously :(


--
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Limit ZFS Memory Utilization

2007-01-08 Thread Jason J. W. Williams

Sanjeev,

Could you point me in the right direction as to how to convert the
following GCC compile flags to Studio 11 compile flags? Any help is
greatly appreciated. We're trying to recompile MySQL to give a
stacktrace and core file to track down exactly why it's
crashing...hopefully it will illuminate whether memory truly is the issue.
Thank you very much in advance!

-felide-constructors
-fno-exceptions -fno-rtti
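
For what it's worth, a rough (untested) mapping for Sun Studio 11 CC: -fno-exceptions
is approximately -features=no%except, -fno-rtti is approximately -features=no%rtti,
and there is no direct switch for -felide-constructors (Studio elides copies on its
own). Something like the following, with -g kept for usable stack traces:

  # hypothetical flags for a Studio 11 build of MySQL
  CXX=CC
  CXXFLAGS="-g -xO3 -features=no%except -features=no%rtti"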

Best Regards,
Jason

On 1/7/07, Sanjeev Bagewadi [EMAIL PROTECTED] wrote:

Jason,

There is no documented way of limiting the memory consumption.
The ARC section of ZFS tries to adapt to the memory pressure of the system.
However, in your case it is probably not quick enough, I guess.

One way of limiting the memory consumption would be to limit arc.c_max.
This (arc.c_max) is set to 3/4 of the memory available (or 1GB less than
memory available).
This is done when the ZFS is loaded (arc_init()).

You should be able to change the value of arc.c_max through mdb and set
it to the value
you want. Exercise caution while setting it. Make sure you don't have
active zpools during this operation.
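
From memory, the mdb sequence looks something like the one below; the exact syntax
may differ on your build, the address printed will differ per system, and
0x100000000 (4GB) is just an example value:

  # mdb -kw
  > arc::print -a c_max
  <address> c_max = 0x...
  > <address>/Z 0x100000000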

Thanks and regards,
Sanjeev.

Jason J. W. Williams wrote:

 Hello,

 Is there a way to set a max memory utilization for ZFS? We're trying
 to debug an issue where the ZFS is sucking all the RAM out of the box,
 and its crashing MySQL as a result we think. Will ZFS reduce its cache
 size if it feels memory pressure? Any help is greatly appreciated.

 Best Regards,
 Jason
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel:x27521 +91 80 669 27521



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Limit ZFS Memory Utilization

2007-01-08 Thread Jason J. W. Williams

We're not using the Enterprise release, but we are working with them.
It looks like MySQL is crashing due to lack of memory.

-J

On 1/8/07, Toby Thain [EMAIL PROTECTED] wrote:


On 8-Jan-07, at 11:54 AM, Jason J. W. Williams wrote:

 ...We're trying to recompile MySQL to give a
 stacktrace and core file to track down exactly why its
 crashing...hopefully it will illuminate if memory truly is the issue.

If you're using the Enterprise release, can't you get MySQL's
assistance with this?

--Toby



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Limit ZFS Memory Utilization

2007-01-07 Thread Jason J. W. Williams

Hello,

Is there a way to set a max memory utilization for ZFS? We're trying
to debug an issue where ZFS is sucking all the RAM out of the box,
and we think it's crashing MySQL as a result. Will ZFS reduce its cache
size if it feels memory pressure? Any help is greatly appreciated.

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solid State Drives?

2007-01-05 Thread Jason J. W. Williams

Could this ability (a separate ZIL device) coupled with an SSD give
something like a Thumper the write-latency benefit of a battery-backed
write cache?
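
For the record, the pool syntax that eventually shipped for separate log devices
looks roughly like this (device names are placeholders; not available on the builds
discussed in this thread):

  # mirrored data vdevs plus a dedicated (e.g. NVRAM or SSD) log device
  zpool create tank mirror c0t0d0 c1t0d0 mirror c0t1d0 c1t1d0 log c2t0d0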

Best Regards,
Jason

On 1/5/07, Neil Perrin [EMAIL PROTECTED] wrote:



Robert Milkowski wrote On 01/05/07 11:45,:
 Hello Neil,

 Friday, January 5, 2007, 4:36:05 PM, you wrote:

 NP I'm currently working on putting the ZFS intent log on separate devices
 NP which could include seperate disks and nvram/solid state devices.
 NP This would help any application using fsync/O_DSYNC - in particular
 NP DB and NFS. From protoyping considerable peformanace improvements have
 NP been seen.

 Can you share any results from prototype testing?

I'd prefer not to just yet as I don't want to raise expectations unduly.
When testing I was using a simple local benchmark, whereas
I'd prefer to run something more official such as TPC.
I'm also missing a few required features in the prototype which
may affect performance.

Hopefully I can provide some results soon, but even those will
be unofficial.

Neil.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] RAIDZ2 vs. ZFS RAID-10

2007-01-03 Thread Jason J. W. Williams

Hello All,

I was curious if anyone had run a benchmark on the IOPS performance of
RAIDZ2 vs RAID-10? I'm getting ready to run one on a Thumper and was
curious what others had seen. Thank you in advance.

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAIDZ2 vs. ZFS RAID-10

2007-01-03 Thread Jason J. W. Williams

Hi Richard,

Hmm...that's interesting. I wonder if it's worth benchmarking RAIDZ2
if those are the results you're getting. The testing is to see the
performance gain we might get for MySQL moving off the FLX210 to an
active/passive pair of X4500s. Was hoping with that many SATA disks
RAIDZ2 would provide a nice safety net.

Best Regards,
Jason

On 1/3/07, Richard Elling [EMAIL PROTECTED] wrote:

Jason J. W. Williams wrote:
 Hello All,

 I was curious if anyone had run a benchmark on the IOPS performance of
 RAIDZ2 vs RAID-10? I'm getting ready to run one on a Thumper and was
 curious what others had seen. Thank you in advance.

I've been using a simple model for small, random reads.  In that model,
the performance of a raidz[12] set will be approximately equal to a single
disk.  For example, if you have 6 disks, then the performance for the
6-disk raidz2 set will be normalized to 1, and the performance of a 3-way
dynamic stripe of 2-way mirrors will have a normalized performance of 6.
I'd be very interested to see if your results concur.

The models for writes or large reads are much more complicated because
of the numerous caches of varying size and policy throughout the system.
The small, random read workload will be largely unaffected by caches and
you should see the performance as predicted by the disk rpm and seek time.
  -- richard


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RAIDZ2 vs. ZFS RAID-10

2007-01-03 Thread Jason J. W. Williams

Just got an interesting benchmark. I made two zpools:

RAID-10 (9x 2-way RAID-1 mirrors: 18 disks total)
RAID-Z2 (3x 6-way RAIDZ2 groups: 18 disks total)

Copying 38.4GB of data from the RAID-Z2 to the RAID-10 took 307
seconds. Deleted the data from the RAID-Z2. Then copying the 38.4GB of
data from the RAID-10 to the RAID-Z2 took 258 seconds. Would have
expected the RAID-10 to write data more quickly.

It's interesting to me that the RAID-10 pool registered the 38.4GB of
data as 38.4GB, whereas the RAID-Z2 registered it as 56.4GB.
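
For anyone wanting to reproduce the two layouts, the pool creation looks roughly
like this (disk names are illustrative; extend the mirror pairs to 9 and the raidz2
groups to 3 for the 18-disk configs above):

  # RAID-10 style: two-way mirrors striped together
  zpool create r10pool mirror c0t0d0 c1t0d0 mirror c0t1d0 c1t1d0

  # RAID-Z2 style: six-disk raidz2 groups striped together
  zpool create rz2pool raidz2 c0t0d0 c1t0d0 c2t0d0 c3t0d0 c4t0d0 c5t0d0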

Best Regards,
Jason

On 1/3/07, Jason J. W. Williams [EMAIL PROTECTED] wrote:

Hi Richard,

Hmmthat's interesting. I wonder if its worth benchmarking RAIDZ2
if those are the results you're getting. The testing is to see the
performance gain we might get for MySQL moving off the FLX210 to an
active/passive pair of X4500s. Was hoping with that many SATA disks
RAIDZ2 would provide a nice safety net.

Best Regards,
Jason

On 1/3/07, Richard Elling [EMAIL PROTECTED] wrote:
 Jason J. W. Williams wrote:
  Hello All,
 
  I was curious if anyone had run a benchmark on the IOPS performance of
  RAIDZ2 vs RAID-10? I'm getting ready to run one on a Thumper and was
  curious what others had seen. Thank you in advance.

 I've been using a simple model for small, random reads.  In that model,
 the performance of a raidz[12] set will be approximately equal to a single
 disk.  For example, if you have 6 disks, then the performance for the
 6-disk raidz2 set will be normalized to 1, and the performance of a 3-way
 dynamic stripe of 2-way mirrors will have a normalized performance of 6.
 I'd be very interested to see if your results concur.

 The models for writes or large reads are much more complicated because
 of the numerous caches of varying size and policy throughout the system.
 The small, random read workload will be largely unaffected by caches and
 you should see the performance as predicted by the disk rpm and seek time.
   -- richard



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] RAIDZ2 vs. ZFS RAID-10

2007-01-03 Thread Jason J. W. Williams

Hi Robert,

Our X4500 configuration is multiple 6-way (across controllers) RAID-Z2
groups striped together. Currently, 3 RZ2 groups. I'm about to test
write performance against ZFS RAID-10. I'm curious why RAID-Z2
performance should be good? I assumed it was an analog to RAID-6. In
our recent experience, RAID-5 - due to the two reads, an XOR calc, and a
write op per write instruction - is usually much slower than RAID-10
(two write ops). Any advice is greatly appreciated.

Best Regards,
Jason

On 1/3/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Jason,

Wednesday, January 3, 2007, 11:11:31 PM, you wrote:

JJWW Hi Richard,

JJWW Hmmthat's interesting. I wonder if its worth benchmarking RAIDZ2
JJWW if those are the results you're getting. The testing is to see the
JJWW performance gain we might get for MySQL moving off the FLX210 to an
JJWW active/passive pair of X4500s. Was hoping with that many SATA disks
JJWW RAIDZ2 would provide a nice safety net.

Well, you weren't thinking about one big raidz2 group?

To get more performance you can create one pool with many smaller
raidz2 groups - that way your worst-case read performance should
increase approximately N times, where N is the number of raidz2 groups.

However, keep in mind that write performance should be really good
with raidz2.

--
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] RAIDZ2 vs. ZFS RAID-10

2007-01-03 Thread Jason J. W. Williams

Hi Robert,

That makes sense. Thank you. :-) Also, it was zpool I was looking at.
zfs always showed the correct size.

-J

On 1/3/07, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Jason,

Wednesday, January 3, 2007, 11:40:38 PM, you wrote:

JJWW Just got an interesting benchmark. I made two zpools:

JJWW RAID-10 (9x 2-way RAID-1 mirrors: 18 disks total)
JJWW RAID-Z2 (3x 6-way RAIDZ2 group: 18 disks total)

JJWW Copying 38.4GB of data from the RAID-Z2 to the RAID-10 took 307
JJWW seconds. Deleted the data from the RAID-Z2. Then copying the 38.4GB of
JJWW data from the RAID-10 to the RAID-Z2 took 258 seconds. Would have
JJWW expected the RAID-10 to write data more quickly.

Actually, with 18 disks in raid-10, in theory you get write performance
equal to a stripe of 9 disks. With 18 disks in 3 raidz2 groups of 6 disks each, you
should expect something like (6-2)*3 = 12 disks, so equal to 12 disks
in a stripe.

JJWW Its interesting to me that the RAID-10 pool registered the 38.4GB of
JJWW data as 38.4GB, whereas the RAID-Z2 registered it as 56.4.

If you checked with zpool - then it's ok - it reports disk usage
including parity overhead. If zfs list showed you those numbers, then
either you're using old snv bits or s10U2, as it was corrected some
time ago (in U3).


--
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Re[2]: RAIDZ2 vs. ZFS RAID-10

2007-01-03 Thread Jason J. W. Williams

Hi Anton,

Thank you for the information. That is exactly our scenario. We're 70%
write heavy, and given the nature of the workload, our typical writes
are 10-20K. Again the information is much appreciated.

Best Regards,
Jason

On 1/3/07, Anton B. Rang [EMAIL PROTECTED] wrote:

 In our recent experience RAID-5 due to the 2 reads, a XOR calc and a
 write op per write instruction is usually much slower than RAID-10
 (two write ops). Any advice is  greatly appreciated.

 RAIDZ and RAIDZ2 do not suffer from this malady (the RAID5 write hole).

1. This isn't the write hole.

2. RAIDZ and RAIDZ2 suffer from read-modify-write overhead when updating a file 
in writes of less than 128K, but not when writing a new file or issuing large 
writes.
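
A related, hedged aside: a commonly suggested mitigation for the small-update case
is matching the dataset recordsize to the database I/O size before the data files
are created (the pool/filesystem names below are made up):

  # e.g. 16K records for a 16K-page database workload; only affects files written afterwards
  zfs set recordsize=16k dbpool/data
  zfs get recordsize dbpool/data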


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS over NFS extra slow?

2007-01-02 Thread Jason J. W. Williams

Hi Brad,

I believe benr experienced the same/similar issue here:
http://www.opensolaris.org/jive/message.jspa?messageID=77347

If it is the same, I believe its a known ZFS/NFS interaction bug, and
has to do with small file creation.
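
If you want to confirm it is sync latency, the blunt diagnostic of this era is
temporarily disabling the ZIL (understand the data-loss exposure first; this is a
test, not a recommendation):

  # in /etc/system, then reboot; remove once testing is done
  set zfs:zil_disable = 1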

Best Regards,
Jason

On 1/2/07, Brad Plecs [EMAIL PROTECTED] wrote:

I had a user report extreme slowness on a ZFS filesystem mounted over NFS over 
the weekend.
After some extensive testing, the extreme slowness appears to only occur when a 
ZFS filesystem is mounted over NFS.

One example is doing a 'gtar xzvf php-5.2.0.tar.gz'... over NFS onto a ZFS 
filesystem.  this takes:

real5m12.423s
user0m0.936s
sys 0m4.760s

Locally on the server (to the same ZFS filesystem) takes:

real0m4.415s
user0m1.884s
sys 0m3.395s

The same job over NFS to a UFS filesystem takes

real1m22.725s
user0m0.901s
sys 0m4.479s

Same job locally on server to same UFS filesystem:

real0m10.150s
user0m2.121s
sys 0m4.953s


This is easily reproducible even with single large files, but the multiple-small-files 
case seems to illustrate some awful sync latency between each file.

Any idea why ZFS over NFS is so bad?  I saw the threads that talk about an 
fsync penalty,
but they don't seem relevant since the local ZFS performance is quite good.


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: Re[2]: [zfs-discuss] Re: Difference between ZFS and UFS with one LUN froma SAN

2006-12-26 Thread Jason J. W. Williams

Hi Robert,

MPxIO had correctly moved the paths. More than one path to controller
A was OK, and one path to controller A for each LUN was active when
controller B was rebooted.  I have a hunch that the array was at
fault, because it also rebooted a Windows server with LUNs only on
Controller A. In the case of the Windows server, Engenio's RDAC was
handling multipathing. Overall, not a big deal; I just wouldn't trust
the array to do a hitless commanded controller failover or firmware
upgrade.
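
For the record, the sort of checks worth running before and after a controller
reboot (output and device names vary by setup; mpathadm only if it's installed on
your rev):

  # confirm MPxIO device mappings and that each LUN still answers
  stmsboot -L
  format </dev/null
  mpathadm list lu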

-J

On 12/22/06, Robert Milkowski [EMAIL PROTECTED] wrote:

Hello Jason,

Friday, December 22, 2006, 5:55:38 PM, you wrote:

JJWW Just for what its worth, when we rebooted a controller in our array
JJWW (we pre-moved all the LUNs to the other controller), despite using
JJWW MPXIO ZFS kernel panicked. Verified that all the LUNs were on the
JJWW correct controller when this occurred. Its not clear why ZFS thought
JJWW it lost a LUN but it did. We have done cable pulling using ZFS/MPXIO
JJWW before and that works very well. It may well be array-related in our
JJWW case, but I hate anyone to have a false sense of security.

Did you first check (with format for example) if LUNs were really
accessible? If MPxIO worked ok and at least one path is ok then ZFS
won't panic.

--
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Difference between ZFS and UFS with one LUN froma SAN

2006-12-22 Thread Jason J. W. Williams

Just for what it's worth, when we rebooted a controller in our array
(we pre-moved all the LUNs to the other controller), despite using
MPXIO, ZFS kernel panicked. Verified that all the LUNs were on the
correct controller when this occurred. It's not clear why ZFS thought
it lost a LUN, but it did. We have done cable pulling using ZFS/MPXIO
before and that works very well. It may well be array-related in our
case, but I'd hate for anyone to have a false sense of security.

-J

On 12/22/06, Tim Cook [EMAIL PROTECTED] wrote:

This may not be the answer you're looking for, but I don't know if it's
something you've thought of.  If you're pulling a LUN from an expensive
array, with multiple HBA's in the system, why not run mpxio?  If you ARE
running mpxio, there shouldn't be an issue with a path dropping.  I have
the setup above in my test lab and pull cables all the time and have yet
to see a zfs kernel panic.  Is this something you've considered?  I
haven't seen the bug in question, but I definitely have not run into it
when running mpxio.

--Tim

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Shawn Joy
Sent: Friday, December 22, 2006 7:35 AM
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] Re: Difference between ZFS and UFS with one LUN
froma SAN

OK,

But let's get back to the original question.

Does ZFS provide you with fewer features than UFS does on one LUN from a
SAN (i.e., is it less stable)?

ZFS on the contrary checks every block it reads and is able to find the
mirror
or reconstruct the data in a raidz config.
Therefore ZFS uses only valid data and is able to repair the data
blocks
automatically.
This is not possible in a traditional filesystem/volume manager
configuration.

The above is fine if I have two LUNs. But my original question was about
only having one LUN.

What about kernel panics from ZFS if, for instance, access to one
controller goes away for a few seconds or minutes? Normally UFS would
just sit there and warn that it has lost access to the controller. Then when
the controller returns, after a short period, the warnings go away and
the LUN continues to operate. The admin can then research further into
why the controller went away. With ZFS, the above will panic the system
and possibly cause other corruption on other LUNs due to this panic? I
believe this was discussed in other threads? I also believe there is a
bug filed against this? If so, when should we expect this bug to be
fixed?


My understanding of ZFS is that it functions better in an environment
where we have JBODs attached to the hosts. This way ZFS takes care of
all of the redundancy? But what about SAN environments where customers
have spent big money to invest in storage? I know of one instance where
a customer has a growing need for more storage space. Their environment
uses many inodes. Due to the UFS inode limitation, when creating LUNs
over one TB, they would have to quadruple the amount of storage used in
their SAN in order to hold all of the files. A possible solution to this
inode issue would be ZFS. However, they have experienced kernel panics in
their environment when a controller dropped offline.

Any body have a solution to this?

Shawn


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Difference between ZFS and UFS with one LUN froma SAN

2006-12-22 Thread Jason J. W. Williams

Hi Tim,

One switch environment, two ports going to the host, 4 ports going to
the storage. Switch is a Brocade SilkWorm 3850 and the HBA is a
dual-port QLA2342. Solaris rev is S10 update 3. Array is a StorageTek
FLX210 (Engenio 2884)

The LUNs had moved to the other controller and MPXIO had shown the
paths change as a result, so it was a bit bizarre. Rebooting the other
controller shouldn't have done anything, but it did. Could have been
the array.

-J

On 12/22/06, Tim Cook [EMAIL PROTECTED] wrote:

Always good to hear others' experiences, J.  Maybe I'll try firing up the
Nexan today and downing a controller to see how that affects it vs.
downing a switch port/pulling cable.  My first intuition is time-out
values.  A cable pull will register differently than a blatant time-out
depending on where it occurs.  IE: Pulling the cable from the back of
the server will register instantly, vs. the storage timing out 3
switches away.  I'm sure you're aware of that, but just an FYI for
others following the thread less familiar with SAN technology.
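
For reference, the host-side time-out knobs in question live in /etc/system;
a sketch, showing the stock 60-second value -- the right numbers depend on
the array and should really come from the storage vendor:

  * command time-out before the driver gives up on an I/O
  * ssd covers FC devices, sd covers SCSI/SATA devices
  set ssd:ssd_io_time = 0x3c
  set sd:sd_io_time = 0x3c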

To get a little more background:

What kind of an array is it?

How do you have the controllers set up?  Active/active?  Active/passive?
In other words, do you have array-side failover occurring as well, or is it
in *dummy mode*?

Do you have multiple physical paths, i.e. each controller port and each
server port hitting different switches?

What HBAs are you using?  What switches?

What version of snv are you running, and which driver?

Yay for slow Fridays before x-mas; I have a bit of time to play in the
lab today.

--Tim

-Original Message-
From: Jason J. W. Williams [mailto:[EMAIL PROTECTED]
Sent: Friday, December 22, 2006 10:56 AM
To: Tim Cook
Cc: Shawn Joy; zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] Re: Difference between ZFS and UFS with one
LUN froma SAN

Just for what it's worth, when we rebooted a controller in our array
(we pre-moved all the LUNs to the other controller), ZFS kernel panicked
despite our using MPxIO. We verified that all the LUNs were on the correct
controller when this occurred. It's not clear why ZFS thought it lost a LUN,
but it did. We have done cable pulling using ZFS/MPxIO before and that works
very well. It may well be array-related in our case, but I'd hate for anyone
to have a false sense of security.

-J

On 12/22/06, Tim Cook [EMAIL PROTECTED] wrote:
 This may not be the answer you're looking for, but I don't know if it's
 something you've thought of.  If you're pulling a LUN from an expensive
 array, with multiple HBAs in the system, why not run mpxio?  If you ARE
 running mpxio, there shouldn't be an issue with a path dropping.  I have
 the setup above in my test lab and pull cables all the time and have yet
 to see a zfs kernel panic.  Is this something you've considered?  I
 haven't seen the bug in question, but I definitely have not run into it
 when running mpxio.
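
 For reference, on S10 turning it on is basically one command plus a reboot,
 after which the LUNs show up under a single scsi_vhci device name:

   stmsboot -e    # enable MPxIO on the FC ports (prompts for the reboot)
   stmsboot -L    # after the reboot, maps old device names to the new ones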

 --Tim

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Shawn Joy
 Sent: Friday, December 22, 2006 7:35 AM
 To: zfs-discuss@opensolaris.org
 Subject: [zfs-discuss] Re: Difference between ZFS and UFS with one LUN
 froma SAN

 OK,

 But let's get back to the original question.

 Does ZFS provide you with fewer features than UFS does on one LUN from a
 SAN (i.e., is it less stable)?

 ZFS, on the contrary, checks every block it reads and is able to fetch a
 good copy from the mirror or reconstruct the data in a raidz config.
 Therefore ZFS uses only valid data and is able to repair bad data blocks
 automatically. This is not possible in a traditional filesystem/volume-
 manager configuration.

 The above is fine if I have two LUNs, but my original question was about
 having only one LUN.

 What about kernel panics from ZFS if, for instance, access to one
 controller goes away for a few seconds or minutes? Normally UFS would just
 sit there and warn that it has lost access to the controller. Then, when
 the controller returns after a short period, the warnings go away and the
 LUN continues to operate. The admin can then research further into why the
 controller went away. With ZFS, the above will panic the system and
 possibly cause corruption on other LUNs as a result of the panic. I believe
 this was discussed in other threads, and that there is a bug filed against
 it. If so, when should we expect this bug to be fixed?


 My understanding of ZFS is that it functions better in an environment
 where we have JBODs attached to the hosts, so that ZFS takes care of all
 of the redundancy. But what about SAN environments where customers have
 spent big money on storage? I know of one instance where a customer has a
 growing need for more storage space. Their environment uses many inodes.
 Due to the UFS inode limitation, when creating LUNs over one TB they would
 have to quadruple the amount of storage used in their SAN in order to hold
 all of the files. A possible solution to this inode issue would be ZFS.
 However they have experienced

Re: [zfs-discuss] What SATA controllers are people using for ZFS?

2006-12-21 Thread Jason J. W. Williams

Hi Naveen,

I believe the newer LSI cards work pretty well with Solaris.

Best Regards,
Jason

On 12/20/06, Naveen Nalam [EMAIL PROTECTED] wrote:

Hi,

This may not be the right place to post, but I'm hoping someone here is
running a reliably working 12-drive system with ZFS who can tell me what
hardware they are using.

I have on order with my server vendor a pair of 12-drive servers that I want
to use with ZFS for our company file stores. We're trying to use Supermicro
PDSME motherboards, and each has two Supermicro MV8 SATA cards. He's found
that Solaris 10U3 doesn't work on these systems, and I just read a post today
(and an older one) on this group about how the Marvell-based cards lock up. I
can't afford lockups, since this is very critical and expensive data that is
being stored.

My goal is a single-CPU board that works with Solaris, with some way to get
12 drives plus 2 system boot drives plugged into it. I don't see any suitable
SATA cards on the Sun HCL.

Are there any 4-port PCIe cards that people know reliably work? The Adaptec
1430SA looks nice, but I have no idea if it works. I could potentially get
two 4-port PCIe cards, a 2-port PCI SATA card (for boot), and the 4 ports on
the motherboard - 14 drives total - and cough up the extra cash for a
supported dual-CPU motherboard (though I'm only using one CPU).
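
Once a candidate card is in a box, the quick sanity check I'd do before
trusting data to it is roughly:

  cfgadm -al       # controllers/attachment points; disks connected/configured
  format           # every drive I expect should be listed here
  zpool status -x  # once a test pool exists, "all pools are healthy" is the goal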

Any advice greatly appreciated.

Thanks!
Naveen


This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS and SE 3511

2006-12-20 Thread Jason J. W. Williams

Hi Toby,

My understanding on the subject of SATA firmware reliability vs. FC/SCSI is
that it's mostly related to SATA firmware being a lot younger. The FC/SCSI
firmware that's out there has been debugged for 10 years or so, so it has a
lot fewer hiccoughs. Pillar Data Systems told us once that they found most
of their failed SATA disks were just fine when examined, so their policy is
to issue a RESET to the drive when a SATA error is detected, then retry the
write/read and keep trucking. If they continue to get SATA errors, then
they'll fail the drive.

Looking at the latest Engenio SATA products, I believe they do the same
thing. It's probably unfair to expect defect rates out of SATA firmware
equivalent to firmware that's been around for a long time...particularly
with the price pressures on SATA. SAS may suffer the same issue, though SAS
drives seem to have 1,000,000-hour MTBF ratings like their traditional
FC/SCSI counterparts. On a side note, we experienced a path failure to a
drive in our SATA Engenio array (an older model); simply popping the drive
out and back in fixed the issue, and we haven't had any notifications since.
A RESET and RETRY would have been nice behavior to have, since popping and
reinserting triggered a rebuild of the drive.
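
On the ZFS side, when a whole LUN or path bounces like that, the
reset-and-retry equivalent is easy enough to do by hand -- pool and device
names below are placeholders:

  zpool status -v tank        # which device is faulted, plus error counters
  zpool online tank c3t5d0    # bring it back once the path reappears
  zpool clear tank c3t5d0     # reset the counters so the next hiccup shows up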

Best Regards,
Jason

On 12/19/06, Toby Thain [EMAIL PROTECTED] wrote:


On 19-Dec-06, at 2:42 PM, Jason J. W. Williams wrote:

 I do see this note in the 3511 documentation: Note - Do not use a
 Sun StorEdge 3511 SATA array to store single instances of data. It
 is more suitable for use in configurations where the array has a
 backup or archival role.

 My understanding of this particular scare-tactic wording (it's also in
 the SANnet II OEM version of the manual, almost verbatim) is that it has
 mostly to do with the relative unreliability of SATA firmware versus
 SCSI/FC firmware.

That's such a sad sentence to have to read.

Either prices are unrealistically low, or the revenues aren't being
invested properly?

--Toby

 It's possible that the disks are lower-quality SATA disks too, but that
 was not what was relayed to us when we looked at buying the 3511 from Sun
 or the DotHill version (SANnet II).


 Best Regards,
 Jason
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS in a SAN environment

2006-12-20 Thread Jason J. W. Williams

Not sure. I don't see an advantage to moving off UFS for boot pools. :-)

-J

On 12/20/06, James C. McPherson [EMAIL PROTECTED] wrote:

Jason J. W. Williams wrote:
 I agree with others here that the kernel panic is undesired behavior.
 If ZFS would simply offline the zpool and not kernel panic, that would
 obviate my request for an informational message. It'd be pretty darn
 obvious what was going on.

What about the root/boot pool?


James C. McPherson
--
Solaris kernel software engineer, system admin and troubleshooter
   http://www.jmcp.homeunix.com/blog
Find me on LinkedIn @ http://www.linkedin.com/in/jamescmcpherson


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: ZFS and SE 3511

2006-12-19 Thread Jason J. W. Williams

I do see this note in the 3511 documentation: Note - Do not use a Sun StorEdge 3511 
SATA array to store single instances of data. It is more suitable for use in 
configurations where the array has a backup or archival role.


My understanding of this particular scare-tactic wording (it's also in
the SANnet II OEM version of the manual, almost verbatim) is that it has
mostly to do with the relative unreliability of SATA firmware versus
SCSI/FC firmware. It's possible that the disks are lower-quality SATA
disks too, but that was not what was relayed to us when we looked at
buying the 3511 from Sun or the DotHill version (SANnet II).


Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS in a SAN environment

2006-12-19 Thread Jason J. W. Williams

 Shouldn't there be a big warning when configuring a pool
 with no redundancy and/or should that not require a -f flag ?

why?  what if the redundancy is below the pool .. should we
warn that ZFS isn't directly involved in redundancy decisions?


Because if the host controller port goes flaky and starts introducing
checksum errors at the block level (a lady reported exactly this a few weeks
ago), ZFS will kernel panic, and most users won't expect it. It seems to me
users should be warned about the real possibility of a kernel panic if they
don't implement redundancy at the zpool level. Just my 2 cents.
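
A periodic scrub is the cheap way to catch that kind of flaky-port damage
before it turns fatal, and it also shows why pool-level redundancy matters --
pool name below is a placeholder:

  zpool scrub tank        # re-read and verify every block's checksum
  zpool status -v tank    # per-device CKSUM counters; with a mirror or raidz
                          # the bad blocks get repaired, on a single LUN they
                          # can only be reported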

Best Regards,
Jason
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

