Re: [zfs-discuss] pool layout vs resilver times

2013-01-05 Thread Dave Pooser
On 1/5/13 11:42 AM, Russ Poyner rpoy...@engr.wisc.edu wrote:

I'm configuring a box with 24x 3TB consumer SATA drives

snip

The box is a supermicro with 36 bays controlled through a single LSI
9211-8i.

My recollection is that it's far from best practice to have SATA drives
connected to a SAS expander; better to either use SAS drives or use one of
the Supermicro chassis designs that don't use expanders in the
backplane and control the drives with multiple LSI cards. If you've
already purchased the configuration as described, you may be a little
stuck with it, but my understanding is that the combination of SATA drives
and SAS expanders is a large economy-sized bucket of pain.
-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sonnet Tempo SSD supported?

2012-12-03 Thread Dave Pooser
On 12/3/12 5:28 PM, Peter Tripp pe...@psych.columbia.edu wrote:

This product only makes sense if you're trying to run OpenIndiana on a
Mac Pro, which in my experience is more trouble than it's worth, but to
each their own I guess.

I could make a case for it in some other environments. Say you're using a
SuperMicro 4U chassis with 24x 3.5" drives split into two zpools and you'd
like to use SSDs for L2ARC and ZIL. If you mirror each ZIL and use single
drives for each L2ARC, that's 6 drive bays you'd be sacrificing-- or you
could use 3 PCI slots, which might be available depending on your
configuration, and that lets you combine nearline SAS hard drives (to play
nicely with SAS expanders) and SATA SSDs (because SAS SSDs are painfully
expensive).

Obviously, this all depends on the controller in use on the cards-- I'll
probably be getting one to play with in the Jan-Feb timeframe, but as of
now I have no knowledge of that subject.
-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-16 Thread Dave Pooser
On 9/16/12 10:40 AM, Richard Elling richard.ell...@gmail.com wrote:

With a zvol of 8K blocksize, 4K sector disks, and raidz you will get 12K
(data
plus parity) written for every block, regardless of how many disks are in
the set.
There will also be some metadata overhead, but I don't know of a metadata
sizing formula for the general case.

So the bad news is, 4K sector disks with small blocksize zvols tend to
have space utilization more like mirroring. The good news is that
performance
is also more like mirroring.
 -- richard

Ok, that makes sense. And since there's no way to change the blocksize of
a zvol after creation (AFAIK) I can either live with the size, find 3TB
drives with 512-byte sectors (I think Seagate Constellations would work)
and do yet another send/receive, or create a new zvol with a larger
blocksize and copy the files from one zvol to the other. (Leaning toward
option 3 because the files are mostly largish graphics files and the like.)
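
(In command form, option 3 would presumably be something like the following --
the pool/volume names, size, and blocksize here are just illustrative:)

zfs create -V 6T -o volblocksize=128K archive2/RichRAID2
# share the new zvol out over COMSTAR, then copy the files zvol-to-zvol from the Mac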

Thanks for the help!
-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-15 Thread Dave Pooser
The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB.
That... doesn't look right. (Comparing zfs list -t snapshot and looking at
the 5.34 ref for the snapshot vs zfs list on the new system and looking at
space used.)

Is this a problem? Should I be panicking yet?

Well, the zfs send/receive finally finished, at a size of 9.56TB (apologies
for the HTML, it was the only way I could make the columns readable):

root@archive:/home/admin# zfs get all archive1/RichRAID
NAME               PROPERTY              VALUE                  SOURCE
archive1/RichRAID  type                  volume                 -
archive1/RichRAID  creation              Fri Sep 14  4:17 2012  -
archive1/RichRAID  used                  9.56T                  -
archive1/RichRAID  available             1.10T                  -
archive1/RichRAID  referenced            9.56T                  -
archive1/RichRAID  compressratio         1.00x                  -
archive1/RichRAID  reservation           none                   default
archive1/RichRAID  volsize               5.08T                  local
archive1/RichRAID  volblocksize          8K                     -
archive1/RichRAID  checksum              on                     default
archive1/RichRAID  compression           off                    default
archive1/RichRAID  readonly              off                    default
archive1/RichRAID  copies                1                      default
archive1/RichRAID  refreservation        none                   default
archive1/RichRAID  primarycache          all                    default
archive1/RichRAID  secondarycache        all                    default
archive1/RichRAID  usedbysnapshots       0                      -
archive1/RichRAID  usedbydataset         9.56T                  -
archive1/RichRAID  usedbychildren        0                      -
archive1/RichRAID  usedbyrefreservation  0                      -
archive1/RichRAID  logbias               latency                default
archive1/RichRAID  dedup                 off                    default
archive1/RichRAID  mlslabel              none                   default
archive1/RichRAID  sync                  standard               default
archive1/RichRAID  refcompressratio      1.00x                  -
archive1/RichRAID  written               9.56T                  -

So used is 9.56TB, volsize is 5.08TB (the logical size of the volume as
exported). The Mac connected to the FC target sees a 5.6TB volume with
5.1TB used, so that makes sense-- but where did the other 4TB go?

(I'm about at the point where I'm just going to create and export another
volume on a second zpool and then let the Mac copy from one zvol to the
other-- this is starting to feel like voodoo here.)
-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-14 Thread Dave Pooser
I need a bit of a sanity check here.

1) I have a RAIDZ2 of 8x 1TB drives, so 6TB usable, running on an ancient
version of OpenSolaris (snv_134 I think). On that zpool (miniraid) I have
a zvol (RichRAID) that's using almost the whole FS. It's shared out via
COMSTAR Fibre Channel target mode. I'd like to move that zvol to a newer
server with a larger zpool. Sounds like a job for ZFS send/receive, right?

2) Since ZFS send/receive is snapshot-based I need to create a snapshot.
Unfortunately I did not realize that zvols require disk space sufficient
to duplicate the zvol, and my zpool wasn't big enough. After a false start
(zpool add is dangerous when low on sleep) I added a 250GB mirror and a
pair of 3TB mirrors to miniraid and was able to successfully snapshot the
zvol: miniraid/RichRAID@exportable (I ended up booting off an OI 151a5 USB
stick to make that work, since I don't believe snv_134 could handle a 3TB
disk).
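
(For the record, that boiled down to something like this -- the device names
here are placeholders:)

zpool add miniraid mirror c3t0d0 c3t1d0   # the 250GB pair
zpool add miniraid mirror c4t0d0 c4t1d0   # first 3TB pair
zpool add miniraid mirror c4t2d0 c4t3d0   # second 3TB pair
zfs snapshot miniraid/RichRAID@exportable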

3) Now it's easy, right? I enabled root login via SSH on the new host,
which is running a zpool archive1 consisting of a single RAIDZ2 of 3TB
drives using ashift=12, and did a ZFS send:
zfs send miniraid/RichRAID@exportable | ssh root@newhost zfs receive
archive1/RichRAID

It asked for the root password, I gave it that password, and it was off
and running. GigE ain't super fast, but I've got time.

The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB.
That... doesn't look right. (Comparing zfs list -t snapshot and looking at
the 5.34 ref for the snapshot vs zfs list on the new system and looking at
space used.)

Is this a problem? Should I be panicking yet?

-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Question on 4k sectors

2012-07-18 Thread Dave U . Random
Hi. Is the problem with ZFS supporting 4k sectors or is the problem mixing
512 byte and 4k sector disks in one pool, or something else? I have seen
a lot of discussion on the 4k issue but I haven't understood what the actual
problem ZFS has with 4k sectors is. It's getting harder and harder to find
large disks with 512 byte sectors so what should we do? TIA...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Zombie damaged zpool won't die

2012-05-29 Thread Dave Pooser
In the beginning, I created a mirror named DumpFiles on FreeBSD. Later, I
decided to move those drives to a new Solaris 11 server-- but rather than
import the old pool I'd create a new pool. And I liked the DumpFiles name,
so I stuck with it.

Oops.

Now whenever I run zpool import, it shows a faulted zpool that I can't
import and can't delete:
root@backbone:/home/dpooser# zpool import
  pool: DumpFiles
id: 16375225052759912554
 state: FAULTED
status: The pool was last accessed by another system.
action: The pool cannot be imported due to damaged devices or data.
The pool may be active on another system, but can be imported using
the '-f' flag.
   see: http://www.sun.com/msg/ZFS-8000-EY
config:

DumpFilesFAULTED  corrupted data
  mirror-0   ONLINE
c8t5000C5001B03A749d0p0  ONLINE
c9t5000C5001B062211d0p0  ONLINE

I deleted the new DumpFiles pool; no change. The -f flag doesn't help with
the import, and I've deleted the zpool.cache and rebooted without any
luck. Any suggestions appreciated-- there is no data on those drives that
I'm worried about, but I'd like to get rid of that error.

-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] 4k sector support in Solaris 11?

2012-02-16 Thread Dave Pooser
If I want to use a batch of new Seagate 3TB Barracudas with Solaris 11,
will zpool let me create a new pool with ashift=12 out of the box or will
I need to play around with a patched zpool binary (or the iSCSI loopback)?
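
(Either way, I assume I can double-check what I ended up with after creation
by dumping the pool config -- 'tank' here being whatever the pool gets called:)

zdb tank | grep ashift    # hoping to see ashift: 12 on these drives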
-- 
Dave Pooser
Manager of Information Services
Alford Media http://www.alfordmedia.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS on Dell with FreeBSD

2011-10-19 Thread Dave Pooser
On 10/19/11 9:14 AM, Albert Shih albert.s...@obspm.fr wrote:

When we buy a MD1200 we need a RAID PERC H800 card on the server

No, you need a card that includes 2 external x4 SFF8088 SAS connectors.
I'd recommend an LSI SAS 9200-8e HBA flashed with the IT firmware-- then
it presents the individual disks and ZFS can handle redundancy and
recovery.
-- 
Dave Pooser
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZIL, L2ARC, rpool -- partitions and mirrors oh my!

2011-07-07 Thread Dave Pooser
Putting together a server for a friend's recording studio. He's planning
to do audio editing off the server, so low latency is a big deal. My plan
is to create a pool of two 8-drive RAIDZ2 vdevs and then accelerate
them... But how?

OS is going to be the latest OpenIndiana. I have a pair of 40GB SSDs (Crucial)
with good write speeds and a pair of 64GB SSDs with good read speeds.
I'd like to mirror the root pool.

My initial thought was to mirror the 40GB SSDs for the ZIL and partition the
two 64s: mirror two slices for the rpool and use two slices for the L2ARC. If
there's a smarter way to do it, suggestions gratefully accepted. My
current ZFS storage servers are all built around sustained reads/sustained
writes, so tuning the ZIL and L2ARC is still outside my experience.
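
(In zpool terms I'm picturing something like this, with made-up device names --
the rpool mirror on the first slices of the 64GB SSDs would be set up at
install time:)

zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0 \
                  raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0
zpool add tank log mirror c3t0d0 c3t1d0      # the two 40GB SSDs as a mirrored ZIL
zpool add tank cache c3t2d0s1 c3t3d0s1       # second slice of each 64GB SSD as L2ARC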
-- 
Dave Pooser
Manager of Information Services
Alford Media Services, Inc.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-23 Thread Dave U . Random
Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com
wrote:

Well ...
Slice all 4 drives into 13G and 60G.
Use a mirror of 13G for the rpool.
Use 4x 60G in some way (raidz, or stripe of mirrors) for tank
Use a mirror of 13G appended to tank

Hi Edward! Thanks for your post. I think I understand what you are saying
but I don't know how to actually do most of that. If I am going to make a
new install of Solaris 10, does it give me the option to slice and dice my
disks and to issue zpool commands? Until now I have only used Solaris on
Intel boxes and used both complete drives as a mirror.

Can you please tell me what are the steps to do your suggestion?

I imagine I can slice the drives in the installer and then setup a 4 way
root mirror (stupid but as you say not much choice) on the 13G section. Or
maybe one root mirror on two slices and then have 13G aux storage left to
mirror for something like /var/spool? What would you recommend? I didn't
understand what you suggested about appending a 13G mirror to tank. Would
that be something like RAID10 without actually being RAID10 so I could still
boot from it? How would the system use it?

In this setup the installer will put everything on the root mirror, so I will
have to move things around later? Like /var and /usr or whatever I don't
want on the root mirror? And then I just make a RAID10 like Jim was saying
with the other 4x 60G slices? How should I move mountpoints that aren't
separate ZFS filesystems?

The only conclusion you can draw from that is:  First take it as a given
that you can't boot from a raidz volume.  Given, you must have one mirror.

Thanks, I will keep it in mind.

Then you raidz all the remaining space that's capable of being put into a
raidz...  And what you have left is a pair of unused space, equal to the
size of your boot volume.  You either waste that space, or you mirror it
and put it into your tank.

So RAID10 sounds like the only reasonable choice since there is an even
number of slices. I mean, is RAIDZ1 even possible with 4 slices?
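
(Here is my reading of your suggestion in command form, with made-up device
names and s0/s1 as the 13G/60G slices -- corrections welcome, since I've
never actually done this:)

# rpool mirror on two of the 13G slices is created by the installer
zpool create tank mirror c0t0d0s1 c0t1d0s1 mirror c0t2d0s1 c0t3d0s1   # 4x 60G as a stripe of mirrors
zpool add tank mirror c0t2d0s0 c0t3d0s0                               # leftover 13G slices appended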
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-22 Thread Dave U . Random
Hello!

I don't see the problem. Install the OS onto a mirrored partition, and
configure all the remaining storage however you like - raid or mirror or
whatever.

I didn't understand your point of view until I read the next paragraph.

My personal preference, assuming 4 disks, since the OS is mostly reads and
only a little bit of writes, is to create a 4-way mirrored 100G partition
for the OS, and the remaining 900G of each disk (or whatever) becomes
either a stripe of mirrors or raidz, as appropriate in your case, for the
storagepool.

Oh, you are talking about 1T drives and my servers are all 4x73G! So it's a
fairly big deal since I have little storage to waste and still want to be
able to survive losing one drive. I should have given the numbers at the
beginning, sorry. Given this meager storage do you have any suggestions?
Thank you.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server with 4 drives, how to configure ZFS?

2011-06-21 Thread Dave U . Random
Hello Jim! I understood ZFS doesn't like slices but from your reply maybe I
should reconsider. I have a few older servers with 4 bays x 73G. If I make a
root mirror pool and swap on the other 2 as you suggest, then I would have
about 63G x 4 left over. If so then I am back to wondering what to do about
4 drives. Is raidz1 worthwhile in this scenario? That is less redundancy
than a mirror and much less than a 3-way mirror, isn't it? Is it even
possible to do raidz2 on 4 slices? Or would two 2-way mirrors be better? I
don't understand what RAID10 is, is it simply a stripe of two mirrors? Or
would it be best to do a 3 way mirror and a hot spare? I would like to be
able to tolerate losing one drive without loss of integrity.

I will be doing new installs of Solaris 10. Is there an option in the
installer for me to issue ZFS commands and set up pools or do I need to
format the disks before installing and if so how do I do that? Thank you.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is another drive worth anything? [Summary]

2011-06-02 Thread Dave U . Random
Many thanks to all who responded. I learned a lot from this thread! For now
I have decided to make a 3 way mirror because of the read performance. I
don't want to take a risk on an unmirrored drive.

Instead of replying to everyone separately I am following the Sun Managers
system since I read that newsgroup occasionally also. Here's a summary of
the responses.

Jim Klimov wrote:

Well, you can use this drive as a separate scratch area, as a separate
single-disk pool, without redundancy. You'd have a separate spindle for
some dedicated tasks with data you're okay with losing.

I thought about that and I really don't like losing data. I also don't
generate much temporary data so I love ZFS because it makes mirroring
easy. On my other systems where I don't have ZFS I run hourly backups from
drive to drive. Consumer drives are pretty good these days but you never
know when one will fail. I had a failure recently on a Linux box and
although I didn't lose data because I back up hourly it's still annoying to
deal with. If I hadn't had another good drive with that data on it I would
have lost critical data.

You can also make the rpool a three-way mirror which may increase read
speeds if you have enough concurrency. And when one drive breaks, your
rpool is still mirrored.

I think that's the best suggestion. I didn't realize a 3 way mirror would
help performance but you and several others said it does, so that's what I
will do. Thanks for the suggestions, Jim.


Roy pointed out a theoretical 50% read increase when adding the third drive.

Thanks Roy!


Edward Ned Harvey wrote:

In my benchmarking, I found 2-way mirror reads 1.97x the speed of a single
disk, and a 3-way mirror reads 2.91x a single disk.

Always great having hard data to base a decision on! That helped me make my
decision! Thanks Edward!


Jim Klimov answered a question that came up based on comments that read
performance was improved in a three way mirror:

Writes in a mirror are deemed to be not faster than the slowest disk - all
two or three drives must commit a block before it is considered written
(in sync write mode), likewise for TXG sync but with some optimization by
caching and write-coalescing.

Thanks Jim! Good to know.


Edward Ned Harvey pointed out: "If you make it a 3-way mirror, your write
performance will be unaffected, but your read performance will increase 50%
over a 2-way mirror.  All 3 drives can read different data simultaneously
for the net effect of 3x a single disk read performance."

Bob clarified the theoretical benefit of adding a third drive to a mirror by
saying "I think that a read performance increase of (at most) 33.3% is more
correct.  You might obtain (at most) 50% over one disk by mirroring it. Zfs
makes a random selection of which disk to read from in a mirror set so the
improvement is not truly linear."

Thanks guys, that makes sense.


Daniel Carosone suggested keeping the extra drive around in case of a
failure and in the meantime using an SSD in the 3rd SATA slot. He pointed
out a few other options that could help with performance besides creating a
3 way mirror when he wrote:

Namely, leave the third drive on the shelf as a cold spare, and use the
third sata connector for an ssd, as L2ARC, ZIL or even possibly both
(which will affect selection of which device to use).

That's not an option for me right now, but I am planning to revisit SSDs
when consumer drives are reliable enough and don't have wear issues.
Right now overall integrity and long service life are more important
than absolute performance on this box. Since I have the integrity covered
with the ZFS mirror I could add an SSD, but I really don't want to deal with
another failure if I don't have to. I do want additional performance
if I can afford it, but not at the expense of possible data loss.

Daniel also wrote:

L2ARC is likely to improve read latency (on average) even more than a
third submirror.  ZIL will be unmirrored, but may improve writes at an
acceptable risk for development system.  If this risk is acceptable, you
may wish to consider whether setting sync=disabled is also acceptable at
least for certain datasets.

I don't know what L2ARC is, but I'll take a look on the net. I did hear
about ZIL but don't understand it fully; I figured spending 500G on a ZIL
would be unwise. By that I mean I understand the ZIL doesn't require much
storage, but since I don't have an identical drive I can't add a drive or
slice with less storage than the other drives in a mirror to that mirror,
so I would be forced to waste a lot of storage to implement a ZIL.

Finally, if you're considering spending money, can you increase the RAM
instead?  If so, do that first.

This mobo is maxed out at 4G; it's a socket 775 board I bought a couple of
years ago. I have always seen the benefits of more RAM and I agree with you
it helps more than people generally believe. Next time I buy a new box I am
hoping to go with 8 to 16G although on 

Re: [zfs-discuss] Good SLOG devices?

2011-03-02 Thread Dave Pooser
On 3/2/11 9:42 AM, David Dyer-Bennet d...@dd-b.net wrote:

Says call for price.  I know what that means, it means If you have to
ask, you can't afford it.

I called. It's $3k -- not a fit for my archive servers, but an interesting
idea for a database server I'm building.

Probably not a great product for the home hobbyist, though.  :^)
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Format returning bogus controller info

2011-02-28 Thread Dave Pooser
On 2/27/11 11:13 PM, James C. McPherson j...@opensolaris.org wrote:

/pci@0,0/pci8086,340c@5/pci1000,3020@0
and
/pci@0,0/pci8086,340e@7/pci1000,3020@0

which are in different slots on your motherboard and connected to
different PCI Express Root Ports - which should help with transfer
rates amongst other things. Have a look at /usr/share/hwdata/pci.ids
for 340[0-9a-f] after the line which starts with 8086.

That's the information I needed; I now have the drives allocated across
multiple controllers for the fault-tolerance I was looking for.

Thanks for all your help-- not only can I fully, unequivocally retract my
"failed bit" crack, but I just ordered two more of these cards for my next
project!  :^)
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Format returning bogus controller info

2011-02-28 Thread Dave Pooser
On 2/28/11 4:23 PM, Garrett D'Amore garr...@nexenta.com wrote:

Drives are ordered in the order they are *enumerated* when they *first*
show up in the system.  *Ever*.

Is the same true of controllers? That is, will c12 remain c12 or
/pci@0,0/pci8086,340c@5 remain /pci@0,0/pci8086,340c@5 even if other
controllers are active?
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Format returning bogus controller info

2011-02-27 Thread Dave Pooser
On 2/27/11 5:15 AM, James C. McPherson j...@opensolaris.org wrote:

On 27/02/11 05:24 PM, Dave Pooser wrote:
On 2/26/11 7:43 PM, Bill Sommerfeld sommerf...@hamachi.org wrote:

On your system, c12 is the mpxio virtual controller; any disk which is
potentially multipath-able (and that includes the SAS drives) will
appear as a child of the virtual controller (rather than appear as the
child of two or more different physical controllers).

Hmm... That makes sense, except that my drives are all SATA because I'm
cheap^H^H^H fiscally conservative.  :^)

They're attached to a SAS hba, which is doing translations for them
using SATL - SAS to ATA Translation Layer.

Yeah, but they're still not multipathable, are they?

'stmsboot -L' displayed no mappings,

this is because mpt_sas(7d) controllers - which you have - are using
MPxIO by default. Running stmsboot -L will only show mappings if you've
enabled or disabled MPxIO

 but I went ahead and tried stmsboot
-d to disable multipathing;

... and now you have disabled MPxIO, stmsboot -L should show mappings.

Nope:
locadmin@bigdawg2:~# stmsboot -L
stmsboot: MPXIO disabled

after reboot instead of seeing nine disks on a
single controller I now see ten different controllers (in a machine that
has four PCI controllers and one motherboard controller):

This is a side effect of how your expanders are configured to operate
on your motherboard.

But there shouldn't be any expanders in the system-- the front backplane
has six SFF-8087 ports to control 24 drives, and the rear backplane has
three more SFF-8087 ports to control 12 more drives. Each of those ports
is connected directly to an SFF-8087 port on an LSI 9211-8i controller,
except that the ninth port is connected to the integrated LSI 2008
controller on the motherboard.

If you're lucky, your expanders and the enclosure that they're
configured into will show up with one or more SES targets. If
that's the case, you might be able to see bay numbers with the
fmtopo command - when you run it as root:

# /usr/lib/fm/fmd/fmtopo -V

If this doesn't work for you, then you'll have to resort to the
tried and tested use of dd to /dev/null for each disk, and see
which lights blink.

I can live with that-- but I really want to know what (real, not virtual)
controllers disks are connected to; I want to build 3 8-disk RAIDz2 vdevs
now (with room for a fourth for expansion later) and I really want to make
sure each of those vdevs has fewer than three disks per controller so a
single controller failure can degrade my vdevs but not kill them.
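
(The dd-and-watch-the-lights approach is fine for mapping bays, by the way --
presumably just something like this against each disk in turn, the device path
being an example:)

dd if=/dev/rdsk/c13t5000CCA222DF92A0d0p0 of=/dev/null bs=1048576 count=4096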

Probably my next step is going to be to take a look with Nexenta Core or
FreeBSD (or maybe SolEx11 for a temporary eval) and see if either of those
gives me a saner view, but other suggestions would be appreciated.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Format returning bogus controller info

2011-02-27 Thread Dave Pooser
On 2/27/11 11:18 AM, Roy Sigurd Karlsbakk r...@karlsbakk.net wrote:

I cannot but agree. On Linux and Windoze (haven't tested FreeBSD), drives
connected to an LSI9211 show up in the correct order, but not on
OI/osol/S11ex (IIRC), and fmtopo doesn't always show a mapping between
device name and slot, since that relies on the SES hardware being
properly supported. The answer I've got for this issue is, it's not an
issue, since it's that way by design etc. This doesn't make sense when
Linux/Windows show the drives in the correct order. IMHO this looks more
like a design flaw in the driver code

Especially since the SAS3081 cards work as expected. I guess I'll start
looking for some more of the 3Gb SAS controllers and chalk the 9211s up as
a failed bit.
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Format returning bogus controller info

2011-02-27 Thread Dave Pooser
On 2/27/11 4:07 PM, James C. McPherson j...@opensolaris.org wrote:

I misread your initial email, sorry.

No worries-- I probably could have written it more clearly.

So your disks are connected to separate PHYs on the HBA, by virtue
of their cabling. You can see this for yourself by looking at the
iport@xx element in the physical paths:

1. c13t5000CCA222DF92A0d0
/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@10/disk@w5000cca222df92a0,0

2. c14t5000CCA222DF8FBEd0
/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@1/disk@w5000cca222df8fbe,0

The xx part is a bitmask, starting from 0, which gives you an
indication of which PHY the device is attached to.

Your disk #1 above is connected to iport@10, which is PHY #4 when
you have x1 ports:


PHY   iport@
 0    1
 1    2
 2    4
 3    8
 4    10
 5    20
 6    40
 7    80

OK, bear with me for a moment because I'm feeling extra dense this evening.

The PHY tells me which port on the HBA I'm connected to. What tells me
which HBA? That's the information I care most about, and if that
information is contained up there I'll do a happy dance and head on in to
the office to start building zpools.


With the information above about the PHY/iport relationship, I
hope you can now see better what your physical layout is. Also,
please remember that using MPxIO means you have a single virtual
controller, and the driver stack handles the translation to physical
for you so you don't have to worry about that aspect. Of course,
if you want to worry about it, feel free.

Well, I want to make sure that a single controller failure can't cause any
of my RAIDz2 vdevs to fault. I know I can do that manually by building the
vdevs in such a way that no more than two drives are on a single
controller. If the virtual controller is smart enough to do that
automagically-- when I'm using SATA disks and a backplane that doesn't
support multipathing-- then I have no complaints and I owe you a beer or
three the next time you're in the Dallas area. But that seems unlikely to
me, and so I think I have to worry about it. I'd love to be wrong, though!

Personally, having worked on the mpt_sas(7d) project, I'm disappointed
that you believe the card and its driver are a failed bit.


I'd like to revise and extend my remarks and replace that with a
suboptimal choice for this project. In fact, if I can't make this work my
backup plan is to take some of my storage towers that have only one HBA,
put the 9211s in them and grab the LSISAS3081 cards out of those towers
for this beast. So those cards will still get productive use -- not a
failed bit, at worst just not serving the purpose I had in mind.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Format returning bogus controller info

2011-02-27 Thread Dave Pooser
On 2/27/11 10:06 PM, James C. McPherson j...@opensolaris.org wrote:

I've arranged these by devinfo path:

1st controller

c10t2d0   
/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@4/disk@p2,0
c15t5000CCA222E006B6d0
/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@8/disk@w5000cca222e006b6,0
c13t5000CCA222DF92A0d0
/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@10/disk@w5000cca222df92a0,0
c12t5000CCA222E0533Fd0
/pci@0,0/pci8086,340a@3/pci1000,72@0/iport@20/disk@w5000cca222e0533f,0

The most likely reason why you're seeing a c10t2d0 is because the
disk is failing to respond in the required fashion for a particular
SCSI INQUIRY command when the disk is attached to the system.

That's an inexpensive SSD used as the boot disk, so it's different enough
from the other devices I can't say I'm stunned that it behaves differently.

2nd controller
c16t5000CCA222DDD7BAd0
/pci@0,0/pci8086,340c@5/pci1000,3020@0/iport@2/disk@w5000cca222ddd7ba,0


3rd controller
c14t5000CCA222DF8FBEd0
/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@1/disk@w5000cca222df8fbe,0
c18t5000CCA222DEAFE6d0
/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@2/disk@w5000cca222deafe6,0
c19t5000CCA222E0A3DEd0
/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@4/disk@w5000cca222e0a3de,0
c20t5000CCA222E046B7d0
/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@8/disk@w5000cca222e046b7,0
c17t5000CCA222DF3CECd0
/pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@20/disk@w5000cca222df3cec,0

So I mentioned I'm dense tonight, right? Is the key there where it says
340x@y, so each controller will have a different letter associated
with it and a different number after the @? (That is, presumably in this
system there's a 340b@4 and a 340d@6 if I add more drives and try 'format'
again?)

I'd like to revise and extend my remarks and replace that with a
suboptimal choice for this project.
Not knowing your other requirements for the project, I'll settle
for this version :)


Actually at this point I think I have to re-revise it to just fine for
this project had I brains enough to comprehend the output of 'format'.
:^)
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Format returning bogus controller info

2011-02-26 Thread Dave Pooser
The hardware: SuperMicro 847A chassis (36 drive bays in 4U) -- A means
there are 9 SFF-8087 ports on the backplanes, each controlling 4 drives;
no expanders here.
SuperMicro X8DTH-6F motherboard with integrated LSI 2008 SAS chipset,
flashed to IT firmware, connected to one backplane port.
Four LSI 9211-8i SAS controllers, flashed to IT firmware, each connected
to two backplane ports

The OS: OpenSolaris b134, installed off a USB stick created using the
instructions at 
http://blogs.sun.com/clayb/entry/creating_opensolaris_usb_sticks_is

The problem:
 While trying to add drives one at a time so I can identify them for later
unlike any I've seen before, and out of nine disks added after the boot
drive all nine are attached to c12 -- and no single controller has more
than eight ports.

The output of format:
locadmin@bigdawg2:~# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c10t2d0 <DEFAULT cyl 9965 alt 2 hd 224 sec 56>
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@4/disk@p2,0
       1. c12t5000CCA222DDD7BAd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g5000cca222ddd7ba
       2. c12t5000CCA222DEAFE6d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g5000cca222deafe6
       3. c12t5000CCA222DF3CECd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g5000cca222df3cec
       4. c12t5000CCA222DF8FBEd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g5000cca222df8fbe
       5. c12t5000CCA222DF92A0d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g5000cca222df92a0
       6. c12t5000CCA222E0A3DEd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g5000cca222e0a3de
       7. c12t5000CCA222E006B6d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g5000cca222e006b6
       8. c12t5000CCA222E046B7d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g5000cca222e046b7
       9. c12t5000CCA222E0533Fd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /scsi_vhci/disk@g5000cca222e0533f
Specify disk (enter its number): ^C


Any suggestions?
--
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Format returning bogus controller info

2011-02-26 Thread Dave Pooser
On 2/26/11 7:43 PM, Bill Sommerfeld sommerf...@hamachi.org wrote:

On your system, c12 is the mpxio virtual controller; any disk which is
potentially multipath-able (and that includes the SAS drives) will
appear as a child of the virtual controller (rather than appear as the
child of two or more different physical controllers).

Hmm... That makes sense, except that my drives are all SATA because I'm
cheap^H^H^H fiscally conservative.  :^)

'stmsboot -L' displayed no mappings, but I went ahead and tried stmsboot
-d to disable multipathing; after reboot instead of seeing nine disks on a
single controller I now see ten different controllers (in a machine that
has four PCI controllers and one motherboard controller):

locadmin@bigdawg2:~# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c10t2d0 <DEFAULT cyl 9965 alt 2 hd 224 sec 56>
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@4/disk@p2,0
       1. c13t5000CCA222DF92A0d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@10/disk@w5000cca222df92a0,0
       2. c14t5000CCA222DF8FBEd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@1/disk@w5000cca222df8fbe,0
       3. c15t5000CCA222E006B6d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@8/disk@w5000cca222e006b6,0
       4. c16t5000CCA222DDD7BAd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci8086,340c@5/pci1000,3020@0/iport@2/disk@w5000cca222ddd7ba,0
       5. c17t5000CCA222DF3CECd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@20/disk@w5000cca222df3cec,0
       6. c18t5000CCA222DEAFE6d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@2/disk@w5000cca222deafe6,0
       7. c19t5000CCA222E0A3DEd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@4/disk@w5000cca222e0a3de,0
       8. c20t5000CCA222E046B7d0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci8086,340e@7/pci1000,3020@0/iport@8/disk@w5000cca222e046b7,0
       9. c21t5000CCA222E0533Fd0 <DEFAULT cyl 60798 alt 2 hd 255 sec 252>
          /pci@0,0/pci8086,340a@3/pci1000,72@0/iport@20/disk@w5000cca222e0533f,0


So now I'm more baffled than I started. Any other suggestions will be
gratefully accepted...

-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] vidoe files residing on zfs used from cifs fail to work

2010-11-21 Thread Dave Pooser
On 11/21/10 Nov 21, 8:43 PM, Harry Putnam rea...@newsguy.com wrote:

When *.mov files reside on a windows host, and assuming your browser
has the right plugins, you can open them with either quicktime player
or firefox (which also uses the quicktime player).

But I find if the files are on a zfs server the same files fail to
play.

Is it a local phenomenon or a common problem?

We don't have that problem, and we have roughly 25TB of QuickTime files on
an OpenSolaris box shared over CIFS to mostly Mac clients.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Apparent SAS HBA failure-- now what?

2010-11-06 Thread Dave Pooser
My setup: A SuperMicro 24-drive chassis with Intel dual-processor
motherboard, three LSI SAS3081E controllers, and 24 SATA 2TB hard drives,
divided into three pools with each pool a single eight-disk RAID-Z2. (Boot
is an SSD connected to motherboard SATA.)

This morning I got a cheerful email from my monitoring script: Zchecker has
discovered a problem on bigdawg. The full output is below, but I have one
unavailable pool and two degraded pools, with all my problem disks connected
to controller c10. I have multiple spare controllers available.

First question-- is there an easy way to identify which controller is c10?
Second question-- What is the best way to handle replacement (of either the
bad controller or of all three controllers if I can't identify the bad
controller)? I was thinking that I should be able to shut the server down,
remove the controller(s), install the replacement controller(s), check to
see that all the drives are visible, run zpool clear for each pool and then
do another scrub to verify the problem has been resolved. Does that sound
like a good plan?
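
(In command form, after the hardware swap that would presumably be just:)

zpool clear uberdisk1 ; zpool scrub uberdisk1
zpool clear uberdisk2 ; zpool scrub uberdisk2
zpool clear uberdisk3 ; zpool scrub uberdisk3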

===
pool: uberdisk1
 state: UNAVAIL
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool
clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub in progress for 3h7m, 24.08% done, 9h52m to go
config:

NAME STATE READ WRITE CKSUM
uberdisk1    UNAVAIL 55 0 0  insufficient replicas
  raidz2     UNAVAIL  112 0 0  insufficient replicas
c9t0d0   ONLINE   0 0 0
c9t1d0   ONLINE   0 0 0
c9t2d0   ONLINE   0 0 0
c10t0d0  UNAVAIL 4330 0  experienced I/O failures
c10t1d0  REMOVED  0 0 0
c10t2d0  ONLINE  74 0 0
c11t1d0  ONLINE   0 0 0
c11t2d0  ONLINE   0 0 0

errors: 1 data errors, use '-v' for a list

  pool: uberdisk2
 state: DEGRADED
 scrub: scrub in progress for 3h3m, 32.26% done, 6h24m to go
config:

NAME STATE READ WRITE CKSUM
uberdisk2    DEGRADED 0 0 0
  raidz2 DEGRADED 0 0 0
c9t3d0   ONLINE   0 0 0
c9t4d0   ONLINE   0 0 0
c9t5d0   ONLINE   0 0 0
c10t3d0  REMOVED  0 0 0
c10t4d0  REMOVED  0 0 0
c11t3d0  ONLINE   0 0 0
c11t4d0  ONLINE   0 0 0
c11t5d0  ONLINE   0 0 0

errors: No known data errors

  pool: uberdisk3
 state: DEGRADED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run 'zpool
clear'.
   see: http://www.sun.com/msg/ZFS-8000-HC
 scrub: scrub in progress for 2h58m, 31.95% done, 6h19m to go
config:

NAME STATE READ WRITE CKSUM
uberdisk3    DEGRADED 1 0 0
  raidz2 DEGRADED 4 0 0
c9t6d0   ONLINE   0 0 0
c9t7d0   ONLINE   0 0 0
c10t5d0  ONLINE   5 0 0
c10t6d0  ONLINE  9894 0
c10t7d0  REMOVED  0 0 0
c11t6d0  ONLINE   0 0 0
c11t7d0  ONLINE   0 0 0
c11t8d0  ONLINE   0 0 0

errors: 1 data errors, use '-v' for a list

-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apparent SAS HBA failure-- now what?

2010-11-06 Thread Dave Pooser
Description : The number of I/O errors associated with a ZFS device exceeded
              acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.

Response    : The device has been offlined and marked as faulted.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

--------------- ------------------------------------  --------------  --------
TIME            EVENT-ID                              MSG-ID          SEVERITY
--------------- ------------------------------------  --------------  --------
Nov 06 06:33:23 896d10f1-fa11-69bb-ae78-d18a56fd3288  ZFS-8000-HC     Major

Fault class : fault.fs.zfs.io_failure_wait
Affects     : zfs://pool=uberdisk1
              faulted but still in service
Problem in  : zfs://pool=uberdisk1
              faulty

Description : The ZFS pool has experienced currently unrecoverable I/O
              failures.  Refer to http://sun.com/msg/ZFS-8000-HC for more
              information.

Response    : No automated response will be taken.

Impact      : Read and write I/Os cannot be serviced.

Action      : Make sure the affected devices are connected, then run
              'zpool clear'.

--------------- ------------------------------------  --------------  --------
TIME            EVENT-ID                              MSG-ID          SEVERITY
--------------- ------------------------------------  --------------  --------
Nov 06 06:33:30 989d0590-9e27-cd11-cba5-d7dbf7127ce1  ZFS-8000-FD     Major

Fault class : fault.fs.zfs.vdev.io
Affects     : zfs://pool=uberdisk3/vdev=e0209de35309a6f8
              faulted but still in service
Problem in  : zfs://pool=uberdisk3/vdev=e0209de35309a6f8
              faulty

Description : The number of I/O errors associated with a ZFS device exceeded
              acceptable levels.  Refer to http://sun.com/msg/ZFS-8000-FD
              for more information.

Response    : The device has been offlined and marked as faulted.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.

--------------- ------------------------------------  --------------  --------
TIME            EVENT-ID                              MSG-ID          SEVERITY
--------------- ------------------------------------  --------------  --------
Nov 06 06:33:51 a2d736ac-14e9-cbf7-db28-84e25bfd4a3e  ZFS-8000-HC     Major

Fault class : fault.fs.zfs.io_failure_wait
Affects     : zfs://pool=uberdisk3
              faulted but still in service
Problem in  : zfs://pool=uberdisk3
              faulty

Description : The ZFS pool has experienced currently unrecoverable I/O
              failures.  Refer to http://sun.com/msg/ZFS-8000-HC for more
              information.

Response    : No automated response will be taken.

Impact      : Read and write I/Os cannot be serviced.

Action      : Make sure the affected devices are connected, then run
              'zpool clear'.

-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apparent SAS HBA failure-- now what?

2010-11-06 Thread Dave Pooser
 Errors: 8
Vendor: ATA  Product: Hitachi HDS72202 Revision: A20N Serial No:
Size: 2000.40GB 2000398934016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t1d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 8
Vendor: ATA  Product: Hitachi HDS72202 Revision: A20N Serial No:
Size: 2000.40GB 2000398934016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t2d0  Soft Errors: 0 Hard Errors: 2 Transport Errors: 16
Vendor: ATA  Product: Hitachi HDS72202 Revision: A20N Serial No:
Size: 2000.40GB 2000398934016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t3d0  Soft Errors: 0 Hard Errors: 3 Transport Errors: 13
Vendor: ATA  Product: Hitachi HDS72202 Revision: A20N Serial No:
Size: 2000.40GB 2000398934016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t4d0  Soft Errors: 0 Hard Errors: 2 Transport Errors: 19
Vendor: ATA  Product: Hitachi HDS72202 Revision: A20N Serial No:
Size: 2000.40GB 2000398934016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t5d0  Soft Errors: 0 Hard Errors: 1 Transport Errors: 1
Vendor: ATA  Product: Hitachi HDS72202 Revision: A20N Serial No:
Size: 2000.40GB 2000398934016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t6d0  Soft Errors: 0 Hard Errors: 2 Transport Errors: 12
Vendor: ATA  Product: Hitachi HDS72202 Revision: A20N Serial No:
Size: 2000.40GB 2000398934016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 2 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
c10t7d0  Soft Errors: 0 Hard Errors: 0 Transport Errors: 9
Vendor: ATA  Product: Hitachi HDS72202 Revision: A20N Serial No:
Size: 2000.40GB 2000398934016 bytes
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0

-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apparent SAS HBA failure-- now what?

2010-11-06 Thread Dave Pooser
On 11/6/10 Nov 6, 2:35 PM, Khushil Dep khushil@gmail.com wrote:

Similar to what I've seen before, SATA disks in a 846 chassis with hardware
and transport errors. Though in that occasion it was an E2 chassis with
interposers. How long has this system been up? Is it production or can you
offline and check all firmware on lsi controllers are up to date and match
each other?

It's been up for about 6 months. I can offline them.

Do an fmdump -u UUID -V on those faults and get the serial numbers of disks
that have failed. Trial and error unless you wrote down which went where I'm
afraid.

Here's the thing, though-- I'm really not at all sure it's the disks that
failed. The idea that coincidentally I'm going to have had eight of 24 disks
report major errors, all at the same time (because I scrub weekly and didn't
catch any errors last scrub), all on the same controller-- well, that seems
much less likely than the idea that I just have a bad controller that needs
replacing.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Changing vdev controller

2010-10-22 Thread Dave
I have a 14 drive pool, in a 2x 7 drive raidz2, with l2arc and slog devices 
attached. 
I had a port go bad on one of my controllers (both are sat2-mv8), so I need to 
replace it (I have no spare ports on either card). My spare controller is a LSI 
1068 based 8 port card. 

My plan is to remove the l2arc and slog from the pool (to try and minimize any 
glitches), export the pool, change the controller, re-import and then add the
l2arc and slog. Is that basically the correct process, or are there any tips 
for avoiding potential issues?
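
Roughly, with placeholder pool and device names for the slog and l2arc:

zpool remove tank c5t0d0        # l2arc (cache) device
zpool remove tank c5t1d0        # slog (log) device
zpool export tank
# swap the controller, boot, then:
zpool import tank
zpool add tank log c5t1d0
zpool add tank cache c5t0d0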

Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS disk space monitoring with SNMP

2010-10-02 Thread Dave
I just query for the percentage in use via snmp (net-snmp)

In my snmpd.conf I have:
extend .1.3.6.1.4.1.2021.60 drive15 /usr/gnu/bin/sh /opt/utils/zpools.ksh rpool 
space


and the zpools.ksh is:

#!/bin/ksh
export PATH=/usr/bin:/usr/sbin:/sbin
export LD_LIBRARY_PATH=/usr/lib
zpool list -H -o capacity ${1} | sed -e 's/%//g'
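
Then from the monitoring side you can pull the value back with a walk of that
OID (hostname and community string below are just examples):

snmpwalk -v2c -c public storagehost .1.3.6.1.4.1.2021.60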
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] dedup status

2010-09-30 Thread Dave
Can you provide some specifics to see how bad the writes are?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Opensolaris is apparently dead

2010-08-14 Thread Dave Pooser
On 8/14/10 Aug 14, 2:57 PM, Edward Ned Harvey sh...@nedharvey.com wrote:

Or Btrfs. It may not be ready for production now, but it could become a
serious alternative to ZFS in one year's time or so. (I have been using

I will much sooner pay for sol11 instead of use btrfs.  Stability & speed &
maturity greatly outweigh a few hundred dollars a year, if you run your
business on it.

Flip side is that if Oracle convinces enough people that ZFS is a shrinking
market (how long do you think the BSDs will support a proprietary
filesystem?) then there will be a lot more interest in the BTRFS project,
much of it from the same folks who have experience producing
enterprise-grade ZFS. Speaking for myself, if Solaris 11 doesn't include
COMSTAR I'm going to have to take a serious look at another alternative for
our show storage towers.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread Dave Pacheco

David Dyer-Bennet wrote:

My full backup still doesn't complete.  However, instead of hanging the
entire disk subsystem as it did on 111b, it now issues error messages. 
Errors at the end.

[...]

cannot receive incremental stream: most recent snapshot of
bup-wrack/fsfs/zp1/ddb does not
match incremental source
bash-4.0$

The bup-wrack pool was newly-created, empty, before this backup started.

The backup commands were:

zfs send -Rv $srcsnap | zfs recv -Fudv $BUPPOOL/$HOSTNAME/$FS

I don't see how anything could be creating snapshots on bup-wrack while
this was running.  That pool is not normally mounted (it's on a single
external USB drive, I plug it in for backups).  My script for doing
regular snapshots of zp1 and rpool doesn't reference any of the bup-*
pools.

I don't see how this snapshot mismatch can be coming from anything but the
send/receive process.

There are quite a lot of snapshots; dailies for some months, 2-hour ones
for a couple of weeks.  Most of them are empty or tiny.

Next time I will try WITHOUT -v on both ends, and arrange to capture the
expanded version of the command with all the variables filled in, but I
don't expect any different outcome.

Any other ideas?



Is it possible that snapshots were renamed on the sending pool during 
the send operation?


-- Dave


--
David Pacheco, Sun Microsystems Fishworks. http://blogs.sun.com/dap/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Problems with big ZFS send/receive in b134

2010-08-10 Thread Dave Pacheco

David Dyer-Bennet wrote:

On Tue, August 10, 2010 13:23, Dave Pacheco wrote:

David Dyer-Bennet wrote:

My full backup still doesn't complete.  However, instead of hanging the
entire disk subsystem as it did on 111b, it now issues error messages.
Errors at the end.

[...]

cannot receive incremental stream: most recent snapshot of
bup-wrack/fsfs/zp1/ddb does not
match incremental source
bash-4.0$

The bup-wrack pool was newly-created, empty, before this backup started.

The backup commands were:

zfs send -Rv $srcsnap | zfs recv -Fudv $BUPPOOL/$HOSTNAME/$FS

I don't see how anything could be creating snapshots on bup-wrack while
this was running.  That pool is not normally mounted (it's on a single
external USB drive, I plug it in for backups).  My script for doing
regular snapshots of zp1 and rpool doesn't reference any of the bup-*
pools.

I don't see how this snapshot mismatch can be coming from anything but
the
send/receive process.

There are quite a lot of snapshots; dailies for some months, 2-hour ones
for a couple of weeks.  Most of them are empty or tiny.

Next time I will try WITHOUT -v on both ends, and arrange to capture the
expanded version of the command with all the variables filled in, but I
don't expect any different outcome.

Any other ideas?


Is it possible that snapshots were renamed on the sending pool during
the send operation?


I don't have any scripts that rename a snapshot (in fact I didn't know it
was possible until just now), and I don't have other users with permission
to make snapshots (either delegated or by root access).  I'm not using the
Sun auto-snapshot thing, I've got a much-simpler script of my own (hence I
know what it does).  So I don't at the moment see how one would be getting
renamed.

It's possible that a snapshot was *deleted* on the sending pool during the
send operation, however.  Also that snapshots were created (however, a
newly created one would be after the one specified in the zfs send -R, and
hence should be irrelevant).  (In fact it's certain that snapshots were
created and I'm nearly certain of deleted.)

If that turns out to be the problem, that'll be annoying to work around
(I'm making snapshots every two hours and deleting them after a couple of
weeks).  Locks between admin scripts rarely end well, in my experience. 
But at least I'd know what I had to work around.


Am I looking for too much here?  I *thought* I was doing something that
should be simple and basic and frequently used nearly everywhere, and
hence certain to work.  What could go wrong?, I thought :-).  If I'm
doing something inherently dicey I can try to find a way to back off; as
my primary backup process, this needs to be rock-solid.



It's certainly a reasonable thing to do and it should work.  There have 
been a few problems around deleting and renaming snapshots as they're 
being sent, but the delete issues were fixed in build 123 by having 
zfs_send hold snapshots being sent (as long as you've upgraded your pool 
past version 18), and it sounds like you're not doing renames, so your 
problem may be unrelated.


-- Dave

--
David Pacheco, Sun Microsystems Fishworks. http://blogs.sun.com/dap/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused about consumer drives and zfs can someone help?

2010-07-24 Thread Dave
I've been looking at using consumer 2.5" drives also; I think the ones I've
settled on are the Hitachi 7K500 500GB. These are 7200 rpm -- I'm concerned the
5400s might be a little too low performance-wise. The main reasons for Hitachi
were that performance seems to be among the top 2 or 3 in the laptop drive
segment, I've found Hitachi to be pretty reliable, and perhaps most importantly
there is the Hitachi Feature Tool, which allows you to disable the head unload
feature. You don't need to set it on each boot -- the setting is persistent
across reboots.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Legality and the future of zfs...

2010-07-15 Thread Dave Pooser
 Ok guys, can we please kill this thread about commodity versus enterprise
 hardware?

+1
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS bug - CVE-2010-2392

2010-07-15 Thread Dave Pooser
Looks like the bug affects builds through snv_137. Patches are available from the
usual location-- https://pkg.sun.com/opensolaris/support for OpenSolaris.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] preparing for future drive additions

2010-07-14 Thread Dave Pooser
On 7/14/10 Jul 14, 2:58 PM, Daniel Taylor dan...@kaweb.co.uk wrote:

 I was thinking of mirroring the drives and then converting to raidz some how?

Not possible. You can start with a mirror and then add another mirror; the
pool will spread data across both mirrors in a way analogous* to RAID 10.

*You can't really compare ZFS to conventional RAID implementations, but if
you look at it from 50,000 feet and squint you get the similarities.
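
A minimal sketch of that growth path (device names are hypothetical):

# zpool create tank mirror c0t0d0 c0t1d0
   ...later, when the second pair of drives arrives...
# zpool add tank mirror c0t2d0 c0t3d0

Data already on the first mirror stays where it is; new writes get spread
across both mirrors.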
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Legality and the future of zfs...

2010-07-13 Thread Dave Pooser
On 7/12/10 Jul 12, 10:49 AM, Linder, Doug doug.lin...@merchantlink.com
wrote:

 Out of sheer curiosity - and I'm not disagreeing with you, just wondering -
 how does ZFS make money for Oracle when they don't charge for it?  Do you
 think it's such an important feature that it's a big factor in customers
 picking Solaris over other platforms?

I'm looking at a new web server for the company, and am considering Solaris
specifically because of ZFS. (Oracle's lousy sales model-- specifically the
unwillingness to give a price for a Solaris support contract without my
having to send multiple emails to multiple addresses-- may yet push me back
to my default CentOS platform, but to the extent that Oracle is even in the
running it's because of ZFS.)
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Please trim posts

2010-06-11 Thread Dave Koelmeyer
I trimmed, and then got complained at by a mailing list user that the context 
of what I was replying to was missing. Can't win :P
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send to S7000

2010-05-26 Thread Dave Pacheco

Martijn de Munnik wrote:
I have several home directories on a Solaris server. I want to move 
these home directories to a S7000 storage. I know I can use zfs send | 
zfs receive to move zfs filesystems. Can this be done to a S7000 storage 
using ssh?



No. Check out the shadow migration feature, described in the 
administration guide:


http://wikis.sun.com/display/FishWorks/Documentation

-- Dave

--
David Pacheco, Sun Microsystems Fishworks. http://blogs.sun.com/dap/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance drop during scrub?

2010-05-02 Thread Dave Pooser
On 5/2/10 3:12 PM, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:

 On the flip-side, using 'zfs scrub' puts more stress on the system
 which may make it more likely to fail.  It increases load on the power
 supplies, CPUs, interfaces, and disks.  A system which might work fine
 under normal load may be stressed and misbehave under scrub.  Using
 scrub on a weak system could actually increase the chance of data
 loss.

If my system is going to fail under the stress of a scrub, it's going to
fail under the stress of a resilver. From my perspective, I'm not as scared
of data corruption as I am of data corruption *that I don't know about.* I
only keep backups for a finite amount of time. If I scrub every week, and my
zpool dies during a scrub, then I know it's time to pull out last week's
backup, where I know (thanks to scrubbing) the data was not corrupt. I've
lived the experience where a user comes to me because he tried to open a
seven-year-old file and it was corrupt. Not a blankety-blank thing I could
do, because we only retain backup tapes for four years and the four-year-old
tape had a backup of the file post-corruption.

Data loss may be unavoidable, but that's why we keep backups. It's the
invisible data loss that makes life suboptimal.
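
For what it's worth, scheduling the weekly scrub is a one-line cron job; a
minimal sketch, with a hypothetical pool name, in root's crontab:

0 2 * * 0 /usr/sbin/zpool scrub tank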
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-26 Thread Dave Pooser
On 4/26/10 10:10 AM, Richard Elling richard.ell...@gmail.com wrote:

 SAS shines with multiple connections to one or more hosts.  Hence, SAS
 is quite popular when implementing HA clusters.

So that would be how one builds something like the active/active controller
failover in standalone RAID boxes. Is there a good resource on doing
something like that with an OpenSolaris storage server? I could see that as
a project I might want to attempt.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mac OS X clients with ZFS server

2010-04-25 Thread Dave Pooser
On 4/25/10 6:07 PM, Rich Teer rich.t...@rite-group.com wrote:

 Sounds fair enough!  Let's move this to email; meanwhile, what's the
 packet sniffing incantation I need to use?  On Solaris I'd use snoop,
 but I don't htink Mac OS comes with that!

Use Wireshark (formerly Ethereal); works great for me. It does require X11
on your machine.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] Mac OS X clients with ZFS server

2010-04-25 Thread Dave Pooser
On 4/25/10 6:11 PM, Rich Teer rich.t...@rite-group.com wrote:

 I tried going to that URL, but got a 404 error...  :-(  What's the correct
 one, please?

http://code.google.com/p/maczfs/
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SAS vs SATA: Same size, same speed, why SAS?

2010-04-25 Thread Dave Pooser
I'm building another 24-bay rackmount storage server, and I'm considering
what drives to put in the bays. My chassis is a Supermicro SC846A, so the
backplane supports SAS or SATA; my controllers are LSI3081E, again
supporting SAS or SATA.

Looking at drives, Seagate offers an enterprise (Constellation) 2TB 7200RPM
drive in both SAS and SATA configurations; the SAS model offers one quarter
the buffer (16MB vs 64MB on the SATA model), the same rotational speed, and
costs 10% more than its enterprise SATA twin. (They also offer a Barracuda
XT SATA drive; it's roughly 20% less expensive than the Constellation drive,
but rated at 60% the MTBF of the others and a predicted rate of
nonrecoverable errors an order of magnitude higher.)

Assuming I'm going to be using three 8-drive RAIDz2 configurations, and
further assuming this server will be used for backing up home directories
(lots of small writes/reads), how much benefit will I see from the SAS
interface?
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Identifying drives

2010-04-25 Thread Dave Pooser
I have one storage server with 24 drives, spread across three controllers
and split into three RAIDz2 pools. Unfortunately, I have no idea which bay
holds which drive. Fortunately, this server is used for secondary storage so
I can take it offline for a bit. My plan is to use zpool export to take each
pool offline and then dd to do a sustained read off each drive in turn and
watch the blinking lights to see which drive is which. In a nutshell:
zpool export uberdisk1
zpool export uberdisk2
zpool export uberdisk3
dd if=/dev/rdsk/c9t0d0 of=/dev/null
dd if=/dev/rdsk/c9t1d0 of=/dev/null
 [etc. 22 more times]
zpool import uberdisk1
zpool import uberdisk2
zpool import uberdisk3

Are there any glaring errors in my reasoning here? My thinking is I should
probably identify these disks before any problems develop, in case of
erratic read errors that are enough to make me replace the drive without
being enough to make the hardware ID it as bad.
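
A rough sketch of the same idea as a loop, so I don't have to babysit 24
individual dd commands (device list, block size, and read length are
assumptions; depending on labeling the raw device may need an s0/p0 suffix):

for d in c9t0d0 c9t1d0 c9t2d0; do
    echo "lighting up $d"
    dd if=/dev/rdsk/$d of=/dev/null bs=1024k count=4096
done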
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD best practices

2010-04-18 Thread Dave Vrona
 
 On 18 apr 2010, at 00.52, Dave Vrona wrote:
 
  Ok, so originally I presented the X-25E as a
 reasonable approach.  After reading the follow-ups,
 I'm second guessing my statement.
  
  Any decent alternatives at a reasonable price?
 
 How much is reasonable? :-)

How about $1000 per device?  $2000 for a mirrored pair.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD best practices

2010-04-18 Thread Dave Vrona
The Acard device mentioned in this thread looks interesting:

http://opensolaris.org/jive/thread.jspa?messageID=401719#401719
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD best practices

2010-04-18 Thread Dave Vrona
Or, DDRDrive X1 ?  Would the X1 need to be mirrored?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD best practices

2010-04-18 Thread Dave Vrona
 IMHO, whether a dedicated log device needs redundancy
 (mirrored), should
 be determined by the dynamics of each end-user
 environment (zpool version,
 goals/priorities, and budget).
 

Well, I populate a chassis with dual HBAs because my _perception_ is they tend 
to fail more than other cards.  

Please help me with my perception of the X1.  :-)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] SSD best practices

2010-04-17 Thread Dave Vrona
Hi all,

I'm planning a new build based on a SuperMicro chassis with 16 bays.  I am 
looking to use up to 4 of the bays for SSD devices.

After reading many posts about SSDs I believe I have a _basic_ understanding of 
a reasonable approach to utilizing SSDs for ZIL and L2ARC.

Namely:

ZIL:  Intel X-25E
L2ARC:  Intel X-25M

So, I am somewhat unclear about a couple of details surrounding the deployment 
of these devices.

1) Mirroring.  Leaving cost out of it, should ZIL and/or L2ARC SSDs be mirrored 
?

2) ZIL write cache.  It appears some have disabled the write cache on the 
X-25E.  This results in a 5 fold performance hit but it eliminates a potential 
mechanism for data loss.  Is this valid?  If I can mirror ZIL, I imagine this 
is no longer a concern?

3) SATA devices on a SAS backplane.  Assuming the main drives are SAS, what 
impact do the SATA SSDs have?  Any performance impact?  I realize I could use 
an onboard SATA controller for the SSDs however this complicates things in 
terms of the mounting of these drives.

thanks !
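
For reference on (1), attaching the devices once the pool exists looks
roughly like this (pool and device names are hypothetical). The log can be
mirrored; cache devices cannot be, they just stripe:

# zpool add tank log mirror c4t0d0 c4t1d0
# zpool add tank cache c4t2d0 c4t3d0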
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD best practices

2010-04-17 Thread Dave Vrona
Ok, so originally I presented the X-25E as a reasonable approach.  After 
reading the follow-ups, I'm second guessing my statement.

Any decent alternatives at a reasonable price?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Areca ARC-1680 on OpenSolaris 2009.06?

2010-04-09 Thread Dave Pooser
Now that Erik has made me all nervous about my 3xRAIDz2 of 8x2TB 7200RPM
disks approach, I'm considering moving forward using more and smaller 2.5
disks instead. The problem is that at eight drives per LSI 3018, I run out
of PCIe slots quickly. The ARC-1680 cards would appear to offer greater
drive densities, but a quick Google search shows that they've overpromised
and underdelivered on Solaris support in the past. Is anybody currently
using those cards on OpenSolaris?
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS: clarification on meaning of the autoreplace property

2010-03-17 Thread Dave Johnson
From pages 29,83,86,90 and 284 of the 10/09 Solaris ZFS Administration
guide, it sounds like a disk designated as a hot spare will:
1. Automatically take the place of a bad drive when needed
2. The spare will automatically be detached back to the spare
   pool when a new device is inserted and brought up to replace the
   original compromised one.

Should this work the same way for slices?

I have four active disks in a RAID 10 configuration
for a storage pool, and the same disks are used
for mirrored root configurations, but only
one of the possible mirrored root slice
pairs is currently active.

I wanted to designate slices on a 5th disk as
hot spares for the two existing pools, so
after partitioning the 5th disk (#4) identical
to the four existing disks, I ran:

# zpool add rpool spare c0t4d0s0
# zpool add store1 spare c0t4d0s7
# zpool status
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s0  ONLINE   0 0 0
c0t1d0s0  ONLINE   0 0 0
spares
  c0t4d0s0AVAIL

errors: No known data errors

  pool: store1
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
store1ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s7  ONLINE   0 0 0
c0t1d0s7  ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t2d0s7  ONLINE   0 0 0
c0t3d0s7  ONLINE   0 0 0
spares
  c0t4d0s7AVAIL

errors: No known data errors
--
So It looked like everything was set up how I was
hoping until I emulated a disk failure by pulling
one of the online disks. The root pool responded
how I expected, but the storage pool, on slice 7,
did not appear to perform the autoreplace:

Not too long after pulling one of the online disks:


# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress for 0h0m, 10.02% done, 0h5m to go
config:

NAMESTATE READ WRITE CKSUM
rpool   DEGRADED 0 0 0
  mirrorDEGRADED 0 0 0
c0t0d0s0ONLINE   0 0 0
spare   DEGRADED84 0 0
  c0t1d0s0  REMOVED  0 0 0
  c0t4d0s0  ONLINE   0 084  329M resilvered
spares
  c0t4d0s0  INUSE currently in use

errors: No known data errors

  pool: store1
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
store1ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t0d0s7  ONLINE   0 0 0
c0t1d0s7  ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c0t2d0s7  ONLINE   0 0 0
c0t3d0s7  ONLINE   0 0 0
spares
  c0t4d0s7AVAIL

errors: No known data errors

I was able to convert the state of store1 to DEGRADED by
writing to a file in that storage pool, but it always listed
the spare as available. This was at the same time as it showed
c0t1d0s7 as REMOVED in the same pool.

Based on the manual, I expected the system to bring a
reinserted disk back on line automatically, but zpool status
still showed it as REMOVED. To get it back on line:

# zpool detach rpool c0t4d0s0
# zpool clear rpool
# zpool clear store1

Then status showed *both* pools resilvering. So the questions are:

1. Does autoreplace work on slices, or just complete disks?
2. Is there a problem replacing a bad disk with the same disk
   to get the autoreplace function to work?
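
For completeness: autoreplace defaults to off, so it is worth confirming it is
actually enabled on both pools before re-running the pull test. A quick sketch
using the pool names above:

# zpool get autoreplace rpool store1
# zpool set autoreplace=on rpool
# zpool set autoreplace=on store1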
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: clarification on meaning of the autoreplace propert

2010-03-17 Thread Dave Johnson
 Hi Dave,
 
 I'm unclear about the autoreplace behavior with one
 spare that is
 connected to two pools. I don't see how it could work
 if the autoreplace 
 property is enabled on both pools, which formats and
 replaces a spare

Because I already partitioned the disk into slices. Then
I indicated the proper slice as the spare.

 disk that might be in-use in another pool (?) Maybe I
 misunderstand.
 
 1. I think autoreplace behavior might be inconsistent
 when a device is
 removed. CR 6935332 was filed recently but is not
 available yet through
 our public bug database.
 
 2. The current issue with adding a spare disk to a
 ZFS root pool is that 
 if a root pool mirror disk fails and the spare kicks
 in, the bootblock
 is not applied automatically. We're working on
 improving this
 experience.

While the bootblock may not have been applied automatically,
the root pool did show resilvering, but the storage pool
did not (at least per the status report)

 
 My advice would be to create a 3-way mirrored root
 pool until we have a
 better solution for root pool spares.

That would be sort of a different topic. I'm just interested
in understanding the functionality of the hot spare at this
point.

 
 3. For simplicity and ease of recovery, consider
 using your disks as
 whole disks, even though you must use slices for the
 root pool.

I can't do this with a RAID 10 configuration on the
storage pool, and a mirrored root pool. I only have
places for 5 disks on a 2RU/ 3.5 drive server

 If one disk is part of two pools and it fails, two
 pool are impacted. 

Yes. This is why I used slices instead of a whole disk
for the hot spare.

 The beauty of ZFS is no longer having to deal with
 slice administration, 
 except for the root pool.
 
 I like your mirror pool configurations but I would
 simplify it by
 converting store1 to using whole disks, and keep
 separate spare disks.`

I would have done that from the beginning with more
chassis space.

 One for the store1 pool, and either create a 3-way
 mirrored root pool
 or keep a spare disk connected to the system but
 unconfigured.

I still need confirmation on whether the hot spare function
will work with slices. I saw no errors when executing the commands
for the hot spare slices, but I got this funny response when I ran the 
test
 
Dave
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Listing snapshots in a pool

2010-02-21 Thread Dave

Try:

zfs list -r -t snapshot zp1
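
If you only want the top-level snapshots, a minimal sketch of the grep
approach mentioned below (top-level snapshots are the ones whose names start
with the pool name followed by '@'):

# zfs list -H -o name -t snapshot | grep '^zp1@'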

--
Dave

On 2/21/10 5:23 PM, David Dyer-Bennet wrote:

I thought this was simple.  Turns out not to be.

bash-3.2$ zfs list -t snapshot zp1
cannot open 'zp1': operation not applicable to datasets of this type

Fails equally on all the variants of pool name that I've tried,
including zp1/ and zp1/@ and such.

You can do zfs list -t snapshot and get a list of all snapshots in all
pools. You can do zfs list -r -t snapshot zp1 and get a recursive list
of snapshots in zp1. But you can't, with any options I've tried, get a
list of top-level snapshots in a given pool. (It's easy, of course, with
grep, to get the bigger list and then filter out the subset you want).

Am I missing something? Has this been added after snv_111b?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Speed question: 8-disk RAIDZ2 vs 10-disk RAIDZ3

2010-02-16 Thread Dave Pooser
I currently am getting good speeds out of my existing system (8x 2TB in a
RAIDZ2 exported over fibre channel), but there's no such thing as too much
speed, and these other two drive bays are just begging for drives in
them. If I go to 10x 2TB in a RAIDZ3, will the extra spindles increase
speed, or will the extra parity writes reduce speed, or will the two factors
offset and leave things a wash?
(My goal is to be able to survive one controller failure, so if I add
more drives I'll have to add redundancy to compensate for the fact that one
controller would then be able to take out three drives.)
I've considered adding a drive for the ZIL instead, but my experiments
in disabling the ZIL (using the evil tuning guide at
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Disabl
ing_the_ZIL_.28Don.27t.29) didn't show any speed increase. (I know it's a
bad idea to run the system with the ZIL disabled; I disabled it only to measure its
impact on my write speeds and re-enabled it after testing was complete.)

Current system:
OpenSolaris dev release b132
Intel S5500BC mainboard (latest firmware)
Intel E5506 Xeon 2.13GHz
8GB RAM
3x LSI 3018 PCIe SATA controllers (latest IT firmware)
8x 2TB Hitachi 7200RPM SATA drives (2 connected to each LSI and 2 to
motherboard SATA ports)
2x 60GB Imation M-class SSD (boot mirror)
Qlogic 2440 PCIe Fibre Channel HBA
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Speed question: 8-disk RAIDZ2 vs 10-disk RAIDZ3

2010-02-16 Thread Dave Pooser
 If I go to 10x 2TB in a RAIDZ3, will the extra spindles increase
 speed, or will the extra parity writes reduce speed, or will the two factors
 offset and leave things a wash?

I should mention that the usage of this system is as storage for large
(5-300GB) video files, so what's most important is sequential write speed.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export

2010-02-14 Thread Dave Pooser
   0   6 c8t1d0
0.0  191.00.0 1816.2  0.0  0.10.00.5   0   6 c9t0d0
0.0  191.00.0 1816.2  0.0  0.10.00.5   0   6 c9t1d0

-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export

2010-02-14 Thread Dave Pooser
 So which hard drives are connected to which controllers?
 And what device drivers are those controllers using?

   0. c7t0d0 DEFAULT cyl 7764 alt 2 hd 255 sec 63
  /p...@0,0/pci8086,3...@3/pci1000,3...@0/s...@0,0
   1. c7t1d0 ATA-Hitachi HDS72202-A20N-1.82TB
  /p...@0,0/pci8086,3...@3/pci1000,3...@0/s...@1,0
   2. c8t0d0 ATA-Hitachi HDS72202-A20N-1.82TB
  /p...@0,0/pci8086,3...@7/pci1000,3...@0/s...@0,0
   3. c8t1d0 ATA-Hitachi HDS72202-A20N-1.82TB
  /p...@0,0/pci8086,3...@7/pci1000,3...@0/s...@1,0
   4. c9t0d0 ATA-Hitachi HDS72202-A20N-1.82TB
  /p...@0,0/pci8086,3...@9/pci1000,3...@0/s...@0,0
   5. c9t1d0 ATA-Hitachi HDS72202-A20N-1.82TB
  /p...@0,0/pci8086,3...@9/pci1000,3...@0/s...@1,0
   6. c10d0 DEFAULT cyl 7764 alt 2 hd 255 sec 63
  /p...@0,0/pci-...@1f,2/i...@0/c...@0,0
   7. c10d1 Hitachi-   JK1131YAGP8N3-0001-1.82TB
  /p...@0,0/pci-...@1f,2/i...@0/c...@1,0
   8. c11d0 Hitachi-   JK1131YAGZE4Z-0001-1.82TB
  /p...@0,0/pci-...@1f,2/i...@1/c...@0,0
   9. c11d1 Hitachi-   JK1131YAGGMT9-0001-1.82TB
  /p...@0,0/pci-...@1f,2/i...@1/c...@1,0

 Strange that you say
 that there are two hard drives
 per controllers, but three drives are showing
 high %b.
 
 And strange that you have c7,c8,c9,c10,c11
 which looks like FIVE controllers!

c7, c8 and c9 are LSI controllers using the MPT driver. The motherboard has
6 SATA ports which are presented as two controllers (presumably c10 and c11)
one for ports 0-3 and one for ports 4 and 5; both currently use the PCI-IDE
drivers.

And as you say, it's odd that there are three drives on c10 and c11, since
they should have only two of the raidz2 drives; I need to go double-check my
cabling. The way it's *supposed* to be configured is:

c7: two RAIDZ2 drives and one of the boot mirror drives
c8: two RAIDZ2 drives
c9: two RAIDZ2 drives
c10: one RAIDZ2 drive and one of the boot mirror drives
c11: one RAIDZ2 drive

(The theory here is that since this server is going to spend its life being
shipped places in the back of a truck I want to make sure that no single
controller failure can either render it unbootable or destroy the RAIDZ2.)

That said, I think that this is probably *a* tuning problem but not *the*
tuning problem, since I was getting acceptable performance over CIFS and
miserable performance over FC. Richard Elling suggested I try the latest dev
release to see if I'm encountering a bug that forces synchronous writes, so
I'm off to straighten out my controller distribution, check to see if I have
write caching turned off on the motherboard ports, install the b132 build,
and possibly grab some dinner while I'm about it. I'll report back to the
list with any progress or lack thereof.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export

2010-02-14 Thread Dave Pooser
 on my motherboard, i can make the onboard sata ports show up as IDE or SATA,
 you may look into that.  It would probably be something like AHCI mode.

Yeah, I changed the motherboard setting from enhanced to AHCI and now
those ports show up as SATA.
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Painfully slow RAIDZ2 as fibre channel COMSTAR export

2010-02-14 Thread Dave Pooser
 I'm off to straighten out my controller distribution, check to see if I have
 write caching turned off on the motherboard ports, install the b132 build,
 and possibly grab some dinner while I'm about it. I'll report back to the
 list with any progress or lack thereof.

OK, the issue seems to be resolved now-- I'm seeing write speeds in excess
of 160MB/s. What I did to fix things:
1) Redistributed drives across controllers to match my actual
configuration-- thanks to Nigel for pointing that one out
2) Set my motherboard controller to AHCI mode-- thanks to Richard and Thomas
for suggesting that. Once I made that change I no longer saw the raidz
contains devices of different sizes error, so it looks like Bob was right
about the source of that error
3) Upgraded to OpenSolaris 2010.03 preview b132 which appears to correct a
problem in 2009.06 where iSCSI (and apparently FC) forced all writes to be
synchronous -- thanks to Richard for that pointer.

Five hours from tearing my hair out to toasting a success-- this list is a
great resource!
-- 
Dave Pooser, ACSA
Manager of Information Services
Alford Media  http://www.alfordmedia.com


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Dave

Use create-lu to give the clone a different GUID:

sbdadm create-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab
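
Roughly, the whole sequence would look like this (generic dataset names and a
placeholder GUID; the add-view step assumes the default host and target
groups):

# zfs clone pool/vol@snap pool/vol-clone
# sbdadm create-lu /dev/zvol/rdsk/pool/vol-clone
# stmfadm add-view 600144f0...        (the GUID printed by create-lu)

Because create-lu generates a fresh GUID, the clone can be exported alongside
the original LU instead of colliding with it the way import-lu does.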

--
Dave

On 2/8/10 10:34 AM, Scott Meilicke wrote:

Thanks Dan.

When I try the clone then import:

pfexec zfs clone 
data01/san/gallardo/g...@zfs-auto-snap:monthly-2009-12-01-00:00 
data01/san/gallardo/g-testandlab
pfexec sbdadm import-lu /dev/zvol/rdsk/data01/san/gallardo/g-testandlab

The sbdadm import-lu gives me:

sbdadm: guid in use

which makes sense, now that I see it. The man pages make it look like I cannot 
give it another GUID during the import. Any other thoughts? I *could* delete 
the current lu, import, get my data off and reverse the process, but that would 
take the current volume off line, which is not what I want to do.

Thanks,
Scott

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Mounting a snapshot of an iSCSI volume using Windows

2010-02-08 Thread Dave
Ah, I didn't see the original post. If you're using an old COMSTAR 
version prior to build 115, maybe the metadata placed at the first 64K 
of the volume is causing problems?


http://mail.opensolaris.org/pipermail/storage-discuss/2009-September/007192.html

The clone and create-lu process works for mounting cloned volumes under 
linux with b130. I don't have any windows clients to test with.


--
Dave


On 2/8/10 11:23 AM, Scott Meilicke wrote:

Sure, but that will put me back into the original situation.

-Scott

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS + fsck

2009-11-05 Thread Dave Koelmeyer
Thanks for taking the time to write this - very useful info :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Crazy Phantom Zpools Again

2009-09-18 Thread Dave Abrahams
I just did a fresh reinstall of OpenSolaris and I'm again seeing
the phenomenon described in 
http://article.gmane.org/gmane.os.solaris.opensolaris.zfs/26259
which I posted many months ago and got no reply to.

Can someone *please* help me figure out what's going on here?

Thanks in Advance,
--
Dave Abrahams
BoostPro Computing
http://boostpro.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS commands hang after several zfs receives

2009-09-15 Thread Dave
 The case has been identified and I've just received
 an IDR,which I will 
 test next week.  I've been told the issue is fixed in
 update 8, but I'm 
 not sure if there is an nv fix target.
 

Anyone know if there Is an opensolaris fix for this issue and when?

These seem to be related.  
http://www.opensolaris.org/jive/thread.jspa?threadID=112808
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool status OK but zfs filesystem seems hung

2009-09-14 Thread Dave
Thanks for the reply but this seems to be a bit different.  

A couple of things I failed to mention:
1) this is a secondary pool and not the root pool.
2) the snapshots are trimmed to only keep 80 or so.

The system boots and runs fine. It's just an issue for this secondary pool
and filesystem. It seems to be directly related to I/O-intensive operations,
as the (full) backup seems to trigger it; I've never seen it happen with
incremental backups...


Thanks.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool status OK but zfs filesystem seems hung

2009-09-14 Thread Dave
Hello all,
  I have a situation where zpool status shows no known data errors but all 
processes on a specific filesystem are hung.  This has happened 2 times before 
since we installed Opensolaris 2009.06 snv_111b. For instance there are two 
files systems in this pool 'zfs get all' on one filesystem returns with out 
issue when ran on the other filesystem it hangs.  Also a 'df -h' hangs, etc.

This file system has many different operation running on it;
1) It receives incremental snapshot every 30 minutes continuously.
2) every night a clone is made from one of the received snapshot streams, then a
filesystem backup is taken on that clone (the backup is a directory traversal);
once the backup completes, the clone is destroyed.

We tried to upgrade to the latest build but ran into the current checksum
issue in build snv_122, so we rolled back.

# uname -a
SunOS lahar2 5.11 snv_111b i86pc i386 i86pc

# zpool status zdisk1
  pool: zdisk1
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
zdisk1  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c7t1d0  ONLINE   0 0 0
c7t2d0  ONLINE   0 0 0
c7t3d0  ONLINE   0 0 0
c7t4d0  ONLINE   0 0 0
c7t5d0  ONLINE   0 0 0
spares
  c7t6d0AVAIL

errors: No known data errors


The filesystem is currently in this 'hung' state. Are there any commands I can
run to help debug the issue?

TIA
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] check a zfs rcvd file

2009-09-02 Thread Dave

Dick Hoogendijk wrote:


Some time ago there was some discussion on zfs send | rcvd TO A FILE.
Apart from the disadvantages, which I now know, someone mentioned a CHECK 
to be at least sure that the file itself was OK (without one or more 
bits that fell over). I lost this reply and would love to hear this 
check again. In other words how can I be sure of the validity of the 
received file in the next command line:


# zfs send -Rv rp...@090902 > /backup/snaps/rpool.090902

I only want to know how to check the integrity of the received file.



You should be able to generate a sha1sum/md5sum of the zfs send stream 
on the fly with 'tee':


# zfs send -R rp...@090902 | tee /backups/snaps/rpool.090902 | sha1sum

compare the output of that with the sha1sum of the file on-disk:
# sha1sum /backups/snaps/rpool.090902

This only guarantees that the file contains the exact same bits as the 
zfs send stream. It does not verify the ZFS format/integrity of the 
stream - the only way to do that is to zfs recv the stream into ZFS.


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-28 Thread Dave
Thanks, Trevor. I understand the RFE/CR distinction. What I don't 
understand is how this is not a bug that should be fixed in all solaris 
versions.


The related ID 6612830 says it was fixed in Sol 10 U6, which was a while 
ago. I am using OpenSolaris, so I would really appreciate confirmation 
that it has been fixed in OpenSolaris as well. I can't tell by the info 
on the bugs DB - it seems like it hasn't been fixed in OpenSolaris. If 
it has, then the status should reflect it as Fixed/Closed in the bug 
database...


--
Dave


Trevor Pretty wrote:

Dave

Yep, that's an RFE (Request For Enhancement); that's how things are 
reported to engineers to fix things inside Sun.  If it's an honest-to-
goodness CR = bug (however, it normally needs a real support-paying 
customer to have a problem to go from RFE to CR) the responsible 
engineer evaluates it, and eventually gets it fixed, or not. When I 
worked at Sun I logged a lot of RFEs; only a few were accepted as bugs 
and fixed.


Click on the new Search link and look at the type and state menus. It 
gives you an idea of the states an RFE and a CR go through. It's probably 
documented somewhere, but I can't find it. Part of the joy of Sun 
putting out in public something most other vendors would not dream of doing.


Oh, and it doesn't help that both RFEs and CRs are labelled bug at 
http://bugs.opensolaris.org/


So. Looking at your RFE.

It tells you which version of Nevada it was reported against 
(translating this into an OpenSolaris version is easy - NOT!)


Look at *Related Bugs* 6612830 
http://bugs.opensolaris.org/bugdatabase/view_bug.do;jsessionid=e49afb42be7df0f5f17ec9c2d711?bug_id=6612830 



This will tell you the

*Responsible Engineer* Richard Morris

and when it was fixed

*Release Fixed* , solaris_10u6(s10u6_01) (*Bug ID:*2160894 
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=2160894) 


Although, as nothing in life is guaranteed, it looks like another bug, 
2160894, has been identified, and that's not yet on bugs.opensolaris.org.


Hope that helps.

Trevor


Dave wrote:

Just to make sure we're looking at the same thing:

http://bugs.opensolaris.org/view_bug.do?bug_id=6761786

This is not an issue of auto snapshots. If I have a ZFS server that 
exports 300 zvols via iSCSI and I have daily snapshots retained for 14 
days, that is a total of 4200 snapshots. According to the link/bug 
report above it will take roughly 5.5 hours to import my pool (even when 
the pool is operating perfectly fine and is not degraded or faulted).


This is obviously unacceptable to anyone in an HA environment. Hopefully 
someone close to the issue can clarify.


--
Dave

Blake wrote:
  

I think the value of auto-snapshotting zvols is debatable.  At least,
there are not many folks who need to do this.

What I'd rather see is a default property of 'auto-snapshot=off' for zvols.

Blake

On Thu, Aug 27, 2009 at 4:29 PM, Tim Cookt...@cook.ms wrote:


On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers re...@lengers.com wrote:
  

Dave,

Its logged as an RFE (Request for Enhancement) not as a CR (bug).

The status is 3-Accepted/  P1  RFE

RFE's are generally looked at in a much different way then a CR.

..Remco


Seriously?  It's considered works as designed for a system to take 5+
hours to boot?  Wow.

--Tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


  

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  






www.eagle.co.nz http://www.eagle.co.nz/ 

This email is confidential and may be legally privileged. If received in 
error please destroy and immediately notify us.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-28 Thread Dave

Richard Elling wrote:

On Aug 28, 2009, at 12:15 AM, Dave wrote:

Thanks, Trevor. I understand the RFE/CR distinction. What I don't 
understand is how this is not a bug that should be fixed in all 
solaris versions.


In a former life, I worked at Sun to identify things like this that 
affect availability
and lobbied to get them fixed. There are opposing forces at work: the 
functionality
is correct as designed versus availability folks think it should go 
faster. It is difficult
to build the case that code changes should be made for availability when 
other
workarounds exist. It will be more fruitful for you to examine the 
implementation and
see if there is a better way to improve the efficiencies of your 
snapshot processes.
For example, the case can be made for a secondary data store containing 
long-term
snapshots which can allow you to further optimize the primary data store 
for

performance and availability.
 -- richard


This is unfortunate, but it seems this may be the only option if I want 
to import a pool within a reasonable amount of time. It's very 
frustrating to know that it can be fixed (evidenced by the S10U6 fix), 
but won't be fixed in Nevada/OpenSolaris - or so it seems.


It may be filed as an RFE, but in my opinion it is most definitely a bug.

--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Dave
Can anyone from Sun comment on the status/priority of bug ID 6761786? 
Seems like this would be a very high priority bug, but it hasn't been 
updated since Oct 2008.


Has anyone else with thousands of volume snapshots experienced the hours 
long import process?


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Status/priority of 6761786

2009-08-27 Thread Dave

Just to make sure we're looking at the same thing:

http://bugs.opensolaris.org/view_bug.do?bug_id=6761786

This is not an issue of auto snapshots. If I have a ZFS server that 
exports 300 zvols via iSCSI and I have daily snapshots retained for 14 
days, that is a total of 4200 snapshots. According to the link/bug 
report above it will take roughly 5.5 hours to import my pool (even when 
the pool is operating perfectly fine and is not degraded or faulted).


This is obviously unacceptable to anyone in an HA environment. Hopefully 
someone close to the issue can clarify.


--
Dave

Blake wrote:

I think the value of auto-snapshotting zvols is debatable.  At least,
there are not many folks who need to do this.

What I'd rather see is a default property of 'auto-snapshot=off' for zvols.

Blake

On Thu, Aug 27, 2009 at 4:29 PM, Tim Cookt...@cook.ms wrote:


On Thu, Aug 27, 2009 at 3:24 PM, Remco Lengers re...@lengers.com wrote:

Dave,

Its logged as an RFE (Request for Enhancement) not as a CR (bug).

The status is 3-Accepted/  P1  RFE

RFE's are generally looked at in a much different way then a CR.

..Remco


Seriously?  It's considered works as designed for a system to take 5+
hours to boot?  Wow.

--Tim

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to find poor performing disks

2009-08-26 Thread Dave Koelmeyer
Maybe you can run a Dtrace probe using Chime?

http://blogs.sun.com/observatory/entry/chime

Initial Traces - Device IO
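
A couple of command-line sketches for the same question: watch per-device
service times with iostat, or aggregate per-device I/O latency with the
DTrace io provider (this is essentially the stock example from the DTrace
guide, so treat it as a starting point):

# iostat -xnz 5        (look for devices with outsized asvc_t and %b)

# dtrace -n '
  io:::start { start[args[0]->b_edev, args[0]->b_blkno] = timestamp; }
  io:::done /start[args[0]->b_edev, args[0]->b_blkno]/ {
      @[args[1]->dev_statname] =
          quantize(timestamp - start[args[0]->b_edev, args[0]->b_blkno]);
      start[args[0]->b_edev, args[0]->b_blkno] = 0;
  }'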
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Another user looses his pool (10TB) in this case and 40

2009-07-31 Thread Dave Stubbs
 I don't mean to be offensive Russel, but if you do
 ever return to ZFS, please promise me that you will
 never, ever, EVER run it virtualized on top of NTFS
 (a.k.a. worst file system ever) in a production
 environment. Microsoft Windows is a horribly
 unreliable operating system in situations where
 things like protecting against data corruption are
 important. Microsoft knows this

Oh WOW!  Whether or not our friend Russel virtualized on top of NTFS (he didn't 
- he used raw disk access) this point is amazing!  System5 - based on this 
thread I'd say you can't really make this claim at all.  Solaris suffered a 
crash and the ZFS filesystem lost EVERYTHING!  And there aren't even any 
recovery tools?  

HANG YOUR HEADS!!!

Recovery from the same situation is EASY on NTFS.  There are piles of tools out 
there that will recover the file system, and failing that, locate and extract 
data.  The key parts of the file system are stored in multiple locations on the 
disk just in case.  It's been this way for over 10 years.  I'd say it seems 
from this thread that my data is a lot safer on NTFS than it is on ZFS!  

I can't believe my eyes as I read all these responses blaming system 
engineering and hiding behind ECC memory excuses and well, you know, ZFS is 
intended for more Professional systems and not consumer devices, etc etc.  My 
goodness!  You DO realize that Sun has this website called opensolaris.org 
which actually proposes to have people use ZFS on commodity hardware, don't 
you?  I don't see a huge warning on that site saying ATTENTION:  YOU PROBABLY 
WILL LOSE ALL YOUR DATA.  

I recently flirted with putting several large Unified Storage 7000 systems on 
our corporate network.  The hype about ZFS is quite compelling and I had 
positive experience in my lab setting.  But because of not having Solaris 
capability on our staff we went in another direction instead.

Reading this thread, I'm SO glad we didn't put ZFS in production in ANY way.  
Guys, this is the real world.  Stuff happens.  It doesn't matter what the 
reason is - hardware lying about cache commits, out-of-order commits, failure 
to use ECC memory, whatever.  It is ABSOLUTELY unacceptable for the filesystem 
to be entirely lost.  No excuse or rationalization of any type can be 
justified.  There MUST be at least the base suite of tools to deal with this 
stuff.  without it, ZFS simply isn't ready yet.  

I am saving a copy of this thread to show my colleagues and also those Sun 
Microsystems sales people that keep calling.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zfs deduplication

2009-07-31 Thread Dave McDorman
I don't think Sun is at liberty to discuss ZFS deduplication at this point in time:

http://www.itworld.com/storage/71307/sun-tussles-de-duplication-startup

Hopefully, the matter is resolved and discussions can proceed openly.

Send lawyers, guns and money. - Warren Zevon
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Tunable iSCSI timeouts - ZFS over iSCSI fix

2009-07-29 Thread Dave
Anyone (Ross?) creating ZFS pools over iSCSI connections will want to 
pay attention to snv_121 which fixes the 3 minute hang after iSCSI disk 
problems:


http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=649

Yay!
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] APPLE: ZFS need bug corrections instead of new func! Or?

2009-06-20 Thread Dave



Haudy Kazemi wrote:



I think a better question would be: what kind of tests would be most
promising for turning some subclass of these lost pools reported on
the mailing list into an actionable bug?

my first bet would be writing tools that test for ignored sync cache
commands leading to lost writes, and apply them to the case when iSCSI
targets are rebooted but the initiator isn't.

I think in the process of writing the tool you'll immediately bump
into a defect, because you'll realize there is no equivalent of a
'hard' iSCSI mount like there is in NFS.  and there cannot be a strict
equivalent to 'hard' mounts in iSCSI, because we want zpool redundancy
to preserve availability when an iSCSI target goes away.  I think the
whole model is wrong somehow.
  
I'd surely hope that a ZFS pool with redundancy built on iSCSI targets 
could survive the loss of some targets whether due to actual failures or 
necessary upgrades to the iSCSI targets (think OS upgrades + reboots on 
the systems that are offering iSCSI devices to the network.)




I've had a mirrored zpool created from solaris iSCSI target servers in 
production since April 2008. I've had disks die and reboots of the 
target servers - ZFS has handled them very well. My biggest wish is to 
be able to tune the iSCSI timeout value so ZFS can failover reads/writes 
to the other half of the mirror quicker than it does now (about 180 
seconds on my config). A minor gripe considering the features that ZFS 
provides.


I've also had the zfs server (the initiator aggregating the mirrored 
disks) unintentionally power cycled with the iscsi zpool imported. The 
pool re-imported and scrubbed fine.


ZFS is definitely my FS of choice - by far.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server Cloning With ZFS?

2009-06-19 Thread Dave Ringkor
Cindy, my question is: what system-specific info is maintained that 
would need to be changed?  To take my example, my E450, homer, has disks that 
are failing and it's a big clunky server anyway, and management wants to 
decommission it.  But we have an old 220R racked up doing nothing, and it's not 
scheduled for disposal.  

What would be wrong with this:
1) Create a recursive snapshot of the root pool on homer.
2) zfs send this snapshot to a file on some NFS server.
3) Boot my 220R (same architecture as the E450) into single user mode from a 
DVD.
4) Create a zpool on the 220R's local disks.
5) zfs receive the snapshot created in step 2 to the new pool.
6) Set the bootfs property.
7) Reboot the 220R.
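
Hedging on flag details, the steps above might look roughly like this
(snapshot name, NFS path, disk and BE names are all made up, and on SPARC
you'd also want to install the boot block on the new pool's disk, which
isn't in the list above):

(on homer)
# zfs snapshot -r rpool@migrate
# zfs send -R rpool@migrate > /net/nfshost/export/homer-rpool.zsend

(on the 220R, booted single-user from the DVD)
# zpool create rpool c0t0d0s0
# zfs receive -Fd rpool < /net/nfshost/export/homer-rpool.zsend
# zpool set bootfs=rpool/ROOT/s10be rpool
# installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0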

Now my 220R comes up as homer, with its IP address, users, root pool 
filesystems, any software that was installed in the old homer's root pool, etc.

Since ZFS filesystems don't care about the underlying disk structure -- they 
only care about the pool, and I've already created a pool for them on the 220R 
using the disks it has, there shouldn't be any storage-type system-specific 
info to change, right?  And sure, the 220R might have a different number and 
speed of CPUs, and more or less RAM than the E450 had.  But when you upgrade a 
server in place you don't have to manually configure the CPUs or RAM, and how 
is this different?

The only thing I can think of that I might need to change, in order to bring up 
my 220R and have it be homer, is the network interfaces, from hme to bge or 
whatever.  And that's a simple config setting.

I don't care about Flash.  Actually, if you wanted to provision new servers 
based on a golden image like you can with Flash, couldn't you just take a 
recursive snapshot of a zpool as above, receive it in an empty zpool on 
another server, set your bootfs, and do a sys-unconfig?

So my big question is: with a server on ZFS root, what system-specific info 
would still need to be changed?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Things I Like About ZFS

2009-06-19 Thread Dave Ringkor
I'll start:

- The commands are easy to remember -- all two of them.  Which is easier, SVM 
or ZFS, to mirror your disks?  I've been using SVM for years and still have to 
break out the manual to use metadb, metainit, metastat, metattach, metadetach, 
etc.  I hardly ever have to break out the ZFS manual.  I can actually remember 
the commands and options to do things.  Don't even start me on VxVM.

- Boasting to the unconverted.  We still have a lot of VxVM and SVM on Solaris, 
and LVM on AIX, in the office.  The other admins are always having issues with 
storage migrations, full filesystems, Live Upgrade, corrupted root filesystems, 
etc.  I love being able to offer solutions to their immediate problems, and 
follow it up with, You know, if your box was on ZFS this wouldn't be an issue.
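
To make the first point concrete, here is roughly what mirroring a root disk
looks like on each side (device names hypothetical; on SPARC the ZFS case
still wants installboot run against the second disk):

ZFS:
# zpool attach rpool c0t0d0s0 c0t1d0s0

SVM, from memory (which is rather the point):
# metadb -a -f -c 3 c0t0d0s7 c0t1d0s7
# metainit -f d10 1 1 c0t0d0s0
# metainit d20 1 1 c0t1d0s0
# metainit d0 -m d10
# metaroot d0
  (reboot, then)
# metattach d0 d20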
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Server Cloning With ZFS?

2009-06-17 Thread Dave Ringkor
So I had an E450 running Solaris 8 with VxVM encapsulated root disk.  I 
upgraded it to Solaris 10 ZFS root using this method:

- Unencapsulate the root disk
- Remove VxVM components from the second disk
- Live Upgrade from 8 to 10 on the now-unused second disk
- Boot to the new Solaris 10 install
- Create a ZFS pool on the now-unused first disk
- Use Live Upgrade to migrate root filesystems to the ZFS pool
- Add the now-unused second disk to the ZFS pool as a mirror

Now my E450 is running Solaris 10 5/09 with ZFS root, and all the same users, 
software, and configuration that it had previously.  That is pretty slick in 
itself.  But the server itself is dog slow and more than half the disks are 
failing, and maybe I want to clone the server on new(er) hardware.

With ZFS, this should be a lot simpler than it used to be, right?  A new server 
has new hardware, new disks with different names and different sizes.  But that 
doesn't matter anymore.  There's a procedure in the ZFS manual to recover a 
corrupted server by using zfs receive to reinstall a copy of the boot 
environment into a newly created pool on the same server.  But what if I used 
zfs send to save a recursive snapshot of my root pool on the old server, booted 
my new server (with the same architecture) from the DVD in single user mode and 
created a ZFS pool on its local disks, and did zfs receive to install the boot 
environments there?  The filesystems don't care about the underlying disks.  
The pool hides the disk specifics.  There's no vfstab to edit.  

Off the top of my head, all I can think to have to change is the network 
interfaces.  And that change is as simple as cd /etc ; mv hostname.hme0 
hostname.qfe0 or whatever.  Is there anything else I'm not thinking of?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR# 6574286, remove slog device

2009-05-20 Thread Dave



Richard Elling wrote:

Will Murnane wrote:

On Wed, May 20, 2009 at 12:42, Miles Nordin car...@ivy.net wrote:
 

djm == Darren J Moffat darr...@opensolaris.org writes:
  

  djm a) it was highly dangerous and involved using multiple
  djm different zfs kernel modules was well as

however...utter hogwash!  Nothing is ``highly dangerous'' when your
pool is completely unreadable.


It is if you turn your unreadable but fixable pool into a
completely unrecoverable pool.  If my pool loses its log disk, I'm
waiting for an official tool to fix it.
  


Whoa.

The slog is a top-level vdev like the others.  The current situation is 
that

loss of a top-level vdev results in a pool that cannot be imported. If you
are concerned about the loss of a top-level vdev, then you need to protect
them.  For slogs, mirrors work.  For the main pool, mirrors and raidz[12]
work.

There was a conversation regarding whether it would be a best practice
to always mirror the slog. Since the recovery from slog failure modes is
better than that of the other top-level vdevs, the case for recommending
a mirrored slog is less clear. If you are paranoid, then mirror the slog.
-- richard



I can't test this myself at the moment, but the reporter of Bug ID 
6733267 says even one failed slog from a pair of mirrored slogs will 
prevent an exported zpool from being imported. Has anyone tested this 
recently?


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR# 6574286, remove slog device

2009-05-19 Thread Dave

Paul B. Henson wrote:

I was checking with Sun support regarding this issue, and they say The CR
currently has a high priority and the fix is understood. However, there is
no eta, workaround, nor IDR.

If it's a high priority, and it's known how to fix it, I was curious as to
why has there been no progress? As I understand, if a failure of the log
device occurs while the pool is active, it automatically switches back to
an embedded pool log. It seems removal would be as simple as following the
failure path to an embedded log, and then update the pool metadata to
remove the log device. Is it more complicated than that? We're about to do
some testing with slogs, and it would make me a lot more comfortable to
deploy one in production if there was a backout plan :)...


If you don't have mirrored slogs and the slog fails, you may lose any 
data that was in a txg group waiting to be committed to the main pool 
vdevs - you will never know if you lost any data or not.


I think this thread is the latest discussion about slogs and their behavior:

https://opensolaris.org/jive/thread.jspa?threadID=102392&tstart=0

--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] CR# 6574286, remove slog device

2009-05-19 Thread Dave



Eric Schrock wrote:


On May 19, 2009, at 12:57 PM, Dave wrote:


If you don't have mirrored slogs and the slog fails, you may lose any 
data that was in a txg group waiting to be committed to the main pool 
vdevs - you will never know if you lost any data or not.


None of the above is correct.  First off, you only lose data if the slog 
fails *and* the machine panics/reboots before the transaction group is 
synced (5-30s by default depending on load, though there is a CR filed 
to immediately sync on slog failure).  You will not lose any data once 
the txg is synced - syncing the transaction group does not require 
reading from the slog, so failure of the log device does not impact 
normal operation.




Thanks for correcting my statement. There is still a potential window of 
roughly 60 seconds for data loss if there are 2 transaction groups 
waiting to sync with a 30-second txg commit timer, correct?


The latter half of the above statement is also incorrect.  Should you 
find yourself in the double-failure described above, you will get an FMA 
fault that describes the nature of the problem and the implications.  If 
the slog is truly dead, you can 'zpool clear' (or 'fmadm repair') the 
fault and use whatever data you still have in the pool.  If the slog is 
just missing, you can insert it and continue without losing data.  In no 
cases will ZFS silently continue without committed data.




How will it know that data was actually lost? Or does it just alert you 
that it's possible data was lost?


There's also the worry that the pool is not importable if you did have 
the double failure scenario and the log really is gone. Re: bug ID 
6733267 . E.g. if you had done a 'zpool import -o cachefile=none mypool'.
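
For reference, the recovery path Eric describes above would look roughly like this 
(pool name is a placeholder):

zpool status -x tank    # shows the faulted log device
fmadm faulty            # shows the corresponding FMA fault
zpool clear tank        # accept the loss and carry on with the data still in the pool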


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] reboot when copying large amounts of data

2009-03-12 Thread Dave



Tim wrote:



On Thu, Mar 12, 2009 at 2:22 PM, Blake blake.ir...@gmail.com wrote:


I've managed to get the data transfer to work by rearranging my disks
so that all of them sit on the integrated SATA controller.

So, I feel pretty certain that this is either an issue with the
Supermicro aoc-sat2-mv8 card, or with PCI-X on the motherboard (though
I would think that the integrated SATA would also be using the PCI
bus?).

The motherboard, for those interested, is an HD8ME-2 (not, I now find
after buying this box from Silicon Mechanics, a board that's on the
Solaris HCL...)

http://www.supermicro.com/Aplus/motherboard/Opteron2000/MCP55/h8dme-2.cfm

So now I'm considering one of LSI's HBAs - what do list members think
about this device:

http://www.provantage.com/lsi-logic-lsi00117~7LSIG03X.htm



I believe the MCP55's SATA controllers are actually PCI-E based.


I use Tyan 2927 motherboards. They have on-board nVidia MCP55 chipsets, 
which is the same chipset as the X4500 (IIRC). I wouldn't trust the 
MCP55 chipset in OpenSolaris. I had random disk hangs even while the 
machine was mostly idle.


In Feb 2008 I bought AOC-SAT2-MV8 cards and moved all my drives to these 
add-in cards. I haven't had any issues with drives hanging since. There 
do not seem to be any problems with the SAT2-MV8 under heavy load in 
my servers from what I've seen.


When the SuperMicro AOC-USAS-L8i came out later last year, I started 
using them instead. They work better than the SAT2-MV8s.


This card needs a 3U or bigger case:
http://www.supermicro.com/products/accessories/addon/AOC-USAS-L8i.cfm

This is the low profile card that will fit in a 2U:
http://www.supermicro.com/products/accessories/addon/AOC-USASLP-L8i.cfm

They both work in normal PCI-E slots on my Tyan 2927 mobos.

Finding good non-Sun hardware that works very well under OpenSolaris is 
frustrating to say the least. Good luck.


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs related google summer of code ideas - your vote

2009-03-06 Thread Dave

C. Bergström wrote:

Bob Friesenhahn wrote:
I don't know if anyone has noticed that the topic is google summer of 
code.  There is only so much that a starving college student can 
accomplish from a dead-start in 1-1/2 months.  The ZFS equivalent of 
eliminating world hunger is not among the tasks which may be 
reasonably accomplished, yet tasks at this level of effort is all that 
I have seen mentioned here.
May I interject a bit.. I'm silently collecting this task list and even 
outside of gsoc may help try to arrange it from a community 
perspective.  Of course this will be volunteer based unless /we/ get a 
sponsor or sun beats /us/ to it.  So all the crazy ideas welcome..




I would really like to see a feature like 'zfs diff f...@snap1 
f...@othersnap' that would report the paths of files that have either been 
added, deleted, or changed between snapshots. If this could be done at 
the ZFS level instead of the application level it would be very cool.
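
A hypothetical invocation, just to make the request concrete (dataset and snapshot 
names are invented; as far as I know no such command exists in current builds):

zfs diff tank/home@monday tank/home@tuesday    # would list paths added, removed, or changed between the snapshots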


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs related google summer of code ideas - your vote

2009-03-04 Thread Dave

Gary Mills wrote:

On Wed, Mar 04, 2009 at 01:20:42PM -0500, Miles Nordin wrote:

gm == Gary Mills mi...@cc.umanitoba.ca writes:

gm I suppose my RFE for two-level ZFS should be included,

Not that my opinion counts for much, but I wasn't deaf to it---I did
respond.


I appreciate that.


I thought it was kind of based on mistaken understanding.  It included
this strangeness of the upper ZFS ``informing'' the lower one when
corruption had occured on the network, and the lower ZFS was supposed
to do something with the physical disks...to resolve corruption on the
network?  why?  IIRC several others pointed out the same bogosity.


It's simply a consequence of ZFS's end-to-end error detection.
There are many different components that could contribute to such
errors.  Since only the lower ZFS has data redundancy, only it can
correct the error.  Of course, if something in the data path
consistently corrupts the data regardless of its origin, it won't be
able to correct the error.  The same thing can happen in the simple
case, with one ZFS over physical disks.


I would argue against building this into ZFS. Any corruption happening 
on the wire should not be the responsibility of ZFS. If you want to make 
sure your data is not corrupted over the wire, use IPSec. If you want to 
prevent corruption in RAM, use ECC sticks, etc.


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs related google summer of code ideas - your vote

2009-03-04 Thread Dave



Gary Mills wrote:

On Wed, Mar 04, 2009 at 06:31:59PM -0700, Dave wrote:

Gary Mills wrote:

On Wed, Mar 04, 2009 at 01:20:42PM -0500, Miles Nordin wrote:

gm == Gary Mills mi...@cc.umanitoba.ca writes:

   gm I suppose my RFE for two-level ZFS should be included,

It's simply a consequence of ZFS's end-to-end error detection.
There are many different components that could contribute to such
errors.  Since only the lower ZFS has data redundancy, only it can
correct the error.  Of course, if something in the data path
consistently corrupts the data regardless of its origin, it won't be
able to correct the error.  The same thing can happen in the simple
case, with one ZFS over physical disks.
I would argue against building this into ZFS. Any corruption happening 
on the wire should not be the responsibility of ZFS. If you want to make 
sure your data is not corrupted over the wire, use IPSec. If you want to 
prevent corruption in RAM, use ECC sticks, etc.


But what if the `wire' is a SCSI bus?  Would you want ZFS to do error
correction in that case?  There are many possible wires.  Every
component does its own error checking of some sort, but in its own
domain.  This brings us back to end-to-end error checking again. Since
we are designing a filesystem, that's where the reliability should
reside.



ZFS can't eliminate or prevent all errors. You should have a split 
backplane/multiple controllers and a minimum 2-way mirror if you're 
concerned about this from a local component POV.


Same with iSCSI. For this reason I run a minimum 2-way mirror from my ZFS 
server over 2 different NICs, across 2 gigabit switches w/trunking, to two 
different disk shelves. I do not stack ZFS layers, since it degrades 
performance and really doesn't provide any benefit.


What's your reason for stacking zpools? I can't recall the original 
argument for this.


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS disable startup import

2009-03-02 Thread Dave

smart trams wrote:

Hi All,

   
   What I want is a way to disable the startup import process of ZFS, so that on every server reboot I can manually import the pools and mount them on the required mount points.

   zpool attributes like mountpoint=legacy or canmount only affect pool mounting 
behavior; I have found no command for disabling the startup import process. My systems 
are Solaris running on SPARC.

   Why do I need this feature? Good question! I have an active/standby clustered 
environment with one shared SAN disk and two servers. The shared disk holds one ZFS 
pool [xpool] that must always be imported and mounted on exactly one server at any 
time. When the active server dies, my cluster software [Veritas Cluster] 
detects the problem, imports 'xpool' [with the -f switch] on the standby server, 
and starts the applications.
   Everything is happy up to that point. But when the failed server boots up, it tries 
to claim the 'xpool' pool and lists it as one of its pools. Note that I'm not talking 
about mounting at any mountpoint, only about listing it among its current pools. 
The problem then is that the two nodes both try to write to the pool, and the pool 
becomes inconsistent! What I want is to disable this ZFS behaviour and force it to 
wait until my cluster software decides which server is the active one.



Use the cachefile=none option whenever you import the pool on either server:

zpool import -o cachefile=none xpool
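
In a VCS failover script this would look roughly like the following (a sketch only; 
use -f only after the cluster has decided the other node is really down):

zpool import -o cachefile=none -f xpool    # on the node taking over
zpool export xpool                         # on the node releasing the pool during a clean failback

With cachefile=none the pool is never recorded in /etc/zfs/zpool.cache, so neither 
node will try to import it automatically at boot.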

--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Confused about zfs recv -d, apparently

2009-02-22 Thread Dave

Frank Cusack wrote:

When you try to backup the '/' part of the root pool, it will get
mounted on the altroot itself, which is of course already occupied.
At that point, the receive will fail.

So far as I can tell, mounting the received filesystem is the last
step in the process.  So I guess maybe you could replicate everything
except '/', finally replicate '/' and just ignore the error message.
I haven't tried this.  You have to do '/' last because the receive
stops at that point even if there is more data in the stream.


Wouldn't it be relatively easy to add an option to 'zfs receive' to 
ignore/not mount the received filesystem, or set the canmount option to 
'no' when receiving? Is there an RFE for this, or has it been added to a 
more recent release already?
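
Something like this is what I'm after (pool names invented); I believe more recent 
builds did grow a -u (do not mount) flag on zfs receive, though I haven't checked 
which build:

zfs send -R tank/data@snap | zfs receive -u -d backuppool
zfs set canmount=noauto backuppool/data    # alternatively, keep the target from mounting automatically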

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Supermicro AOC-USAS-L8i

2009-02-12 Thread Dave



Will Murnane wrote:

On Thu, Feb 12, 2009 at 20:05, Tim t...@tcsac.net wrote:

Are you selectively ignoring responses to this thread or something?  Dave
has already stated he *HAS IT WORKING TODAY*.

No, I saw that post.  However, I saw one unequivocal "it doesn't work"
earlier (even if I can't show it to you), which implies to me that
whether the card works or not in a particular setup is somewhat
finicky.  So here's one datapoint:

Dave wrote:

Yes. I have an AOC-USAS-L8i working in a regular PCI-E slot in my Tyan 2927
motherboard.

but the thread that Brandon linked to does not contain a datapoint.

For what it's worth, I think these are the only two datapoints I've
seen; most threads about this card end up debating back and forth
whether it will work, with nobody actually buying and testing the
card.



I can tell you that the USAS-L8i absolutely works fine with a Tyan 2927 
in a Chenbro RM31616 3U rackmount chassis. In fact, I have two of the 
USAS-L8i in this chassis because I forgot that, unlike the 8-port 
AOC-SAT2-MV8, the USAS-L8i can support up to 122 drives.


I have 8 drives connected to the first USAS-L8i. They are set up in a 
raidz-2 and I get 90-120MB/sec read and 60-75MB/sec write during my 
rsyncs from linux machines (this solaris box is only used to store 
backup data).


I plan on removing the second USAS-L8i and connect all 16 drives to the 
first USAS-L8i when I need more storage capacity. I have no doubt that 
it will work as intended. I will report to the list otherwise.


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-12 Thread Dave

Blake wrote:

I'm sure it's very hard to write good error handling code for hardware
events like this.

I think, after skimming this thread (a pretty wild ride), we can at
least decide that there is an RFE for a recovery tool for zfs -
something to allow us to try to pull data from a failed pool.  That
seems like a reasonable tool to request/work on, no?



The ability to force a rollback to an older uberblock in order to be 
able to access the pool (in the case of a corrupt current uberblock) 
should be the ZFS developers' very top priority, IMO. I'd offer to do it 
myself, but I have nowhere near the ability to do so.
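
Purely as a sketch of the kind of interface I'm hoping for (pool name invented; 
nothing like this exists in the builds I'm running, as far as I know):

zpool import -F -n tank    # dry run: report how far back it would need to roll
zpool import -F tank       # discard the last few transactions and import the pool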


--
Dave
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Two zvol devices one volume?

2009-02-12 Thread Dave



Henrik Johansson wrote:
I tried to export the zpool as well, and I got this; the strange part is 
that it sometimes still thinks that the ubuntu-01-dsk01 dataset exists:


# zpool export zpool01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist
cannot unmount '/zpool01/dump': Device busy

But:
# zfs destroy zpool01/xvm/dsk/ubuntu-01-dsk01
cannot open 'zpool01/xvm/dsk/ubuntu-01-dsk01': dataset does not exist

Regards


I have seen this 'phantom dataset' with a pool on nv93. I created a 
zpool, created a dataset, then destroyed the zpool. When creating a new 
zpool on the same partitions/disks as the destroyed zpool, upon export I 
receive the same message as you describe above, even though I never 
created the dataset in the new pool.


Creating a dataset of the same name and then destroying it doesn't seem 
to get rid of it, either.


I never did remember to file a bug for it...
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] importing zpools after a remote replication from different sites

2009-02-11 Thread Dave
You can also import pools by their unique ID instead of by name. If the 
pool is not imported, 'zpool import' with no arguments should list the 
pool IDs. If the pool is imported, 'zpool get guid poolname' will list 
the pool ID.


Beware that if the zpools have the same mountpoints set within any of 
their datasets, then that may cause problems when importing more than 
one zpool at a time.
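
A rough sketch (pool IDs and names are made up):

zpool import                                              # with no arguments, lists importable pools and their IDs
zpool import 1234567890123456 sitea_pool                  # import by ID under a new name
zpool import -R /mnt/sitea 9876543210987654 siteb_pool    # an altroot keeps mountpoints from clashing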


Frank Cusack wrote:
On February 11, 2009 6:17:58 PM +0200 Rafael Friedlander r...@sun.com 
wrote:

In a scenario where multiple sites replicate their zpools (EMC storage,
hardware based replication) to a single storage in a central site, and
given that all zpools have the same name, can the host in the central
site correctly identify and mount the different zpools correctly?


yes, the hostid that last mounted the zpool is stored in the zpool.  Solaris
will skip zpools that were mounted by someone else.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Supermicro AOC-USAS-L8i

2009-02-11 Thread Dave

Brent wrote:

Does anyone know if this card will work in a standard pci express slot?


Yes. I have an AOC-USAS-L8i working in a regular PCI-E slot in my Tyan 
2927 motherboard.


The AOC-SAT2-MV8 also works in a regular PCI slot (although it is a PCI-X 
card).

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS: unreliable for professional usage?

2009-02-10 Thread Dave



D. Eckert wrote:

(...)
You don't move a pool with 'zfs umount', that only unmounts a single zfs
filesystem within a pool, but the pool is still active.. 'zpool export'
releases the pool from the OS, then 'zpool import' on the other machine.
(...)

with all respect: I have never read anything so illogical and ridiculous.


You are not listening and you are not learning. You do not seem to 
understand the fundamentals of ZFS.




I have a single zpool set up over the entire available disk space on an 
external USB drive without any other filesystems inside this particular pool.

so how on earth should I be sure that the pool is still a live pool inside the 
operating system if the output of the 'mount' cmd tells me the pool is no longer 
attached to the root FS?

this doesn't make sense at all and it is a vulnerability of ZFS.


'mount' is not designed to know anything about the storage *pools*. Yes, 
you unmounted the filesystem and mount shows it is not mounted. This 
does not mean the zpool is not still imported and active.




so if the output of the mount cmd tells you the FS / ZPOOL is not mounted, I 
can't see any reason why the filesystem should still be up and running, 
because I just unmounted the only available ZPOOL.


No, you did not unmount the zpool.


And by the way: After performing: 'zpool umount usbhdd1' I can NOT access any 
single file inside /usbhdd1.


There is no 'zpool unmount' command.



What else should be released from the OS FS than a single zpool containing no 
other sub Filesystems?


Again, you have not 'released' the zpool.



Why? The answer is quite simple: The pool is unmounted and no longer hooked up 
to the system's filesystem. So what should prevent me from unplugging the USB 
cable?



Again, you are not understanding the fundamentals of ZFS. You may have 
unmounted the *filesystem*, but not the zpool. You yanked a disk 
containing a live, imported zpool.


Since the advice and information offered to you in this thread has been 
completely disregarded, the only thing left to say is: RTFM.
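
For the record, the sequence for moving that USB pool between machines is simply:

zpool export usbhdd1    # on the machine that currently has the pool; only now is it safe to unplug
zpool import usbhdd1    # on the machine the drive is plugged into next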

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Time taken Backup using ZFS Send Receive

2009-02-02 Thread Dave
Upgrading to b105 seems to improve zfs send/recv quite a bit. See this 
thread:

http://www.opensolaris.org/jive/message.jspa?messageID=330988
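
If you want to see where the time goes while the send runs, watching pool 
throughput can help (pool name is a placeholder):

zpool iostat -v tank 5    # per-vdev bandwidth, refreshed every 5 seconds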

--
Dave

Kok Fong Lau wrote:
 I have been using ZFS send and receive for a while and I noticed that when I 
 try to do a send on a zfs file system of about 3 gig plus it took only about 
 3 minutes max.
 
 zfs send application/sam...@back  >  /backup/sample.zfs
 
 However when I tried to send a file system that's about 20 gig, it took 
 almost an hour.  I would have expected that since 3 gig took 3 mins, then 20 
 gig should take 20 mins instead of 60 mins or more.
 
 Is there something that I'm doing wrong, or could I look into any logs / 
 enable any logs to find out what is going on?
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Can I do this? [SEC=UNCLASSIFIED]

2009-01-13 Thread Dave
Don't try to mount the same zvol on two different machines at the same 
time. You will end up with a corrupted pool. EXPORT the zpool from your 
mac first.

If you run 'zpool import -d /dev/zvol/dsk/<zfs dataset>' on the Solaris 
box, your zpool from the Mac iSCSI volume should show up. If it shows up 
then you can probably import it, but beware that there may be 
incompatibilities and bugs in either the Solaris or Mac ZFS code that 
may cause you to lose your data.

--
Dave

LEES, Cooper wrote:
 M,
 
 Just taking a stab at it.
 
 Yes. This should work - well mounting it locally through ISCSI - there 
 may be a smarter way ... ??
 
 Install iscsi client (if you don't already have it installed):
 pfexec pkg install SUNWiscs
 
 Then follow the documentation on mounting ISCSI luns on Opensolaris site.
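 
 Roughly, sendtargets discovery looks like this (the target address is made up):
 
 iscsiadm add discovery-address 192.168.1.10:3260
 iscsiadm modify discovery --sendtargets enable
 devfsadm -i iscsi        # create device nodes for the discovered LUNs
 format                   # the new iSCSI LUN should now show up as a disk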
 
 Ta,
 ---
 Cooper Ry Lees
 A boring old UNIX Administrator - Information Management Services (IMS)
 Australian Nuclear Science and Technology Organisation
 T  +61 2 9717 3853
 F  +61 2 9717 9273
 M  +61 403 739 446
 E  cooper.l...@ansto.gov.au
 www.ansto.gov.au
 
 Important: This transmission is intended only for the use of 
 the addressee. It is confidential and may contain privileged information 
 or copyright material. If you are not the intended recipient, any use or 
 further disclosure of this communication is strictly forbidden. If you 
 have received this transmission in error, please notify me immediately 
 by telephone and delete all copies of this transmission as well as any 
 attachments.
 
 On 14/01/2009, at 10:17 AM, M wrote:
 
 I currently am sharing out zfs ISCSI volumes from a solaris server to 
 a Mac. I installed ZFS locally on the mac, created a local zfs pool 
 and put a zfs filesystem on the local volume. Can I now mount the 
 volume on the Solaris server and see the data?
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 
 
 
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

