Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
Nathan, Keep in mind the iSCSI target is only in OpenSolaris at this time.

On 05/30/2007 10:15 PM, Nathan Huisman wrote: snip

= QUESTION #1 What is the best way to mirror two ZFS pools in order to achieve a sort of HA storage system? I don't want to have to physically swap my disks into another system if any of the hardware on the ZFS server dies. If I have the following configuration, what is the best way to mirror these in near real time? BOX 1 (JBOD-ZFS) BOX 2 (JBOD-ZFS) I've seen the zfs send and receive commands but I'm not sure how well that would work with a close to real time mirror.

If you want this to be redundant (and very scalable) you will want at least 2x BOX 1's and 2x BOX 2's. IPMP with redundant GbE switches + NICs as well. Do not use zfs send/recv. Use Sun Cluster 3.2 for HA-ZFS. http://docs.sun.com/app/docs/doc/820-0335/6nc35dge2?a=view There is potential for data loss if the active ZFS node crashes before outstanding transaction groups commit for non-synchronous writes, but the ZVOL (and the underlying ext3fs) should not become corrupt (hasn't happened to me yet). Can someone from the ZFS team comment on this?

= QUESTION #2 Can ZFS be exported via iSCSI, then imported as a disk to a Linux system and formatted with another file system? I wish to use ZFS as a block-level file system for my virtual machines, specifically using Xen. If this is possible, how stable is it?

This is possible and is stable in my experience. It scales well if you design your infrastructure correctly.

How is error checking handled if the ZFS volume is exported via iSCSI and then the block device is formatted to ext3? Will ZFS still be able to check for errors?

Yes, ZFS will detect/correct block-level errors in ZVOLs as long as you have a redundant zpool configuration (see the note below about LVM).

If this all works, are there ways to expand a ZFS iSCSI-exported volume and then expand the ext3 file system on the remote host?

I haven't tested it myself (yet), but it should be possible. You might have to export and re-import the iSCSI target on the Xen dom0 and then resize the ext3 partition (e.g. using 'parted'). If that doesn't work there are other ways to accomplish this.

= QUESTION #3 How does ZFS handle a bad drive? What process must I go through in order to take out a bad drive and replace it with a good one?

If you have a redundant zpool configuration you replace the failed disk and then issue a 'zpool replace'.

= QUESTION #4 What is a good way to back up this HA storage unit? Snapshots will provide an easy way to do it live, but should it be dumped into a tape library, or a third offsite ZFS pool using zfs send/receive, or ... ?

Send snapshots to another server that has a RAIDZ (or RAIDZ2) zpool (you want space over performance/redundancy for backup; the opposite of the *MIRRORS* you will want to use for the HA-ZFS cluster storage nodes). From this node you can dump to tape, etc.

= QUESTION #5 Does the following setup work? BOX 1 (JBOD) - iscsi export - BOX 2 ZFS. In other words, can I set up a bunch of thin storage boxes with low CPU and RAM instead of using SAS or FC to supply the JBOD to the ZFS server?

Yes. And ZFS+iSCSI makes this relatively cheap.

I very strongly recommend against using LVM to handle the mirroring. *You will lose the ability to correct data corruption* at the ZFS level. It also does not scale well, increases complexity, increases cost, and reduces throughput over iSCSI to your ZFS nodes. Leave volume management and redundancy to ZFS.

Set up your Xen dom0 boxes to have a redundant path to your ZVOLs over iSCSI. Send your data _one time_ to your ZFS nodes. Let ZFS handle the mirroring and then send that to your iSCSI LUNs on the storage nodes. Make sure you set up half of each mirror in the zpool with a disk from a separate storage node. Be wary of layering ZFS/ZVOLs like this. There are multiple ways to set up your storage nodes (plain iscsitadm or using ZVOLs), and if you use ZVOLs you may want to disable checksum there and leave that to your ZFS nodes.

Other:
- Others have reported that Sil3124-based SATA expansion cards work well with Solaris.
- Test your failover times between ZFS nodes (BOX 2s). Having lots of iSCSI shares/filesystems can cause this to be slow. Hopefully this will be improved with parallel zpool device mounting in the future.
- ZVOLs are not sparse by default. I prefer this, but if you really want to use sparse ZVOLs there is a switch for it in 'zfs create'.
- This will work, but TEST, TEST, TEST for your particular scenario.
- Yes, this can be built for less than $30k US for your storage size requirement.
- I get ~150MB/s throughput on this setup with 2 storage nodes of 6 disks each. It appears as a ~3TB mirror on the ZFS nodes.
- Use build 64 or later, as there is a ZVOL bug in b63 if I'm not mistaken. Probably a good idea to read through the open ZFS bugs, too.
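[Editor's sketch] For concreteness, a minimal sketch of the ZVOL/iSCSI pieces described above (pool, volume and disk names are invented; the shareiscsi property assumes an OpenSolaris build that includes the iSCSI target):

  # create a ZVOL on the ZFS node and export it as an iSCSI target for a Xen dom0
  zfs create -V 200G datapool/xen-dom1      # add -s for a sparse volume
  zfs set shareiscsi=on datapool/xen-dom1
  # Question #3: replace a failed disk in a redundant pool (same slot, new disk)
  zpool replace datapool c2t3d0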
Re: [zfs-discuss] current state of play with ZFS boot and install?
I second that... I am trying to figure out what is missing so that I can use ZFS exclusively... right now as far as I know two major obstacles are no support from installer and issues with live update. Are both of those expected to be resolved this year? On 5/30/07, Carl Brewer [EMAIL PROTECTED] wrote: Out of curiosity, I'm wondering if Lori, or anyone else who actually writes the stuff, has any sort of a 'current state of play' page that describes the latest OS ON release and how it does ZFS boot and installs? There's blogs all over the place, of course, which have a lot of stale information, but is there a 'the current release supports this, and this is how you install it' page anywhere, or somewhere in particular to watch? I've been playing with ZFS boot since around b34 or whenever it was that it first started to be able to be used as a boot partition with the temporary ufs partition hack, but I understand it's moved beyond that. I've been downloading and playing with the ON builds every now and then, but haven't found (haven't looked in the right places?) anywhere where each build has this is what this build does differently, this is what works and how documented. can someone belt me with a cluestick please? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] current state of play with ZFS boot and install?
I've been using the zfsbootkit to modify my jumpstart images. As far as I know, the kit is the current process for zfs boot until further notice. http://www.opensolaris.org/os/community/install/files/zfsboot-kit-20060418.i386.tar.bz2 See readme in the package. On Thu, 2007-05-31 at 02:06 -0700, Marko Milisavljevic wrote: I second that... I am trying to figure out what is missing so that I can use ZFS exclusively... right now as far as I know two major obstacles are no support from installer and issues with live update. Are both of those expected to be resolved this year? On 5/30/07, Carl Brewer [EMAIL PROTECTED] wrote: Out of curiosity, I'm wondering if Lori, or anyone else who actually writes the stuff, has any sort of a 'current state of play' page that describes the latest OS ON release and how it does ZFS boot and installs? There's blogs all over the place, of course, which have a lot of stale information, but is there a 'the current release supports this, and this is how you install it' page anywhere, or somewhere in particular to watch? I've been playing with ZFS boot since around b34 or whenever it was that it first started to be able to be used as a boot partition with the temporary ufs partition hack, but I understand it's moved beyond that. I've been downloading and playing with the ON builds every now and then, but haven't found (haven't looked in the right places?) anywhere where each build has this is what this build does differently, this is what works and how documented. can someone belt me with a cluestick please? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS Directory
I wanted to know how ZFS finds the entry for a file in its directory object. Any links to the code would be highly appreciated. Thanks, regards kanishk ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
On Thu, 31 May 2007, Darren J Moffat wrote: Since you are doing iSCSI and may not be running ZFS on the initiator (client) then I highly recommend that you run with IPsec using at least AH (or ESP with Authentication) to protect the transport. Don't assume that your network is reliable. ZFS won't help you here if it isn't running on the iSCSI initiator, and even if it is it would need two targets to be able to repair.

[Hi Darren] That's a curious recommendation! You don't think that TCP/IP is reliable enough to provide iSCSI data integrity? What errors and error rates have you seen?

Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
Al Hopper wrote: On Thu, 31 May 2007, Darren J Moffat wrote: Since you are doing iSCSI and may not be running ZFS on the initiator (client) then I highly recommend that you run with IPsec using at least AH (or ESP with Authentication) to protect the transport. Don't assume that your network is reliable. ZFS won't help you here if it isn't running on the iSCSI initiator, and even if it is it would need two targets to be able to repair.

[Hi Darren] That's a curious recommendation! You don't think that TCP/IP is reliable enough to provide iSCSI data integrity?

No I don't. Also, I don't personally think that the access control model of iSCSI is sufficient, and I trust IPsec more in that respect. Personally I would actually like to see IPsec AH be the default for all traffic that isn't otherwise doing a cryptographically strong integrity check of its own.

What errors and error rates have you seen?

I have seen switches flip bits in NFS traffic such that the TCP checksum still matched yet the data was corrupted. One of the ways we saw this was when files were being checked out of SCCS: the SCCS checksum failed. Another way we saw it was the compiler failing to compile untouched code. Just as with ZFS we don't trust the HBA and the disks to give us correct data, with iSCSI the network is your HBA and cabling, and in part your disk controller as well. Defence in depth is a common mantra in the security geek world; I take that forward to protecting the data in transit too, even when it isn't purely for security reasons. -- Darren J Moffat ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
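[Editor's sketch] For reference, what Darren is suggesting on Solaris would look roughly like an /etc/inet/ipsecinit.conf policy entry that applies AH to the iSCSI port (the address is invented; check ipsecconf(1M) for the exact keywords on your build):

  # apply AH to iSCSI traffic (TCP port 3260) between this host and the target
  { raddr 192.168.10.2 ulp tcp dport 3260 } ipsec { auth_algs any sa shared }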
Re: [zfs-discuss] ZFS Directory
Kanishk, Directories are implemented as ZAP objects. Look at the routines in that order : - zfs_lookup() - zfs_dirlook() - zfs_dirent_lock() - zap_lookup Hope that helps. Regards, Sanjeev. kanishk wrote: i wanted to know how does ZFS finds an entry of a file from its dirctory object. anylinks to the code will be highly appriciated. thankx regards kanishk ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Solaris Revenue Products Engineering, India Engineering Center, Sun Microsystems India Pvt Ltd. Tel:x27521 +91 80 669 27521 ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] current state of play with ZFS boot and install?
I am running the zfsboot from b62. So far, it has been recommended that I not upgrade to a newer build. Malachi On 5/31/07, Mike Dotson [EMAIL PROTECTED] wrote: I've been using the zfsbootkit to modify my jumpstart images. As far as I know, the kit is the current process for zfs boot until further notice. http://www.opensolaris.org/os/community/install/files/zfsboot-kit-20060418.i386.tar.bz2 See readme in the package. On Thu, 2007-05-31 at 02:06 -0700, Marko Milisavljevic wrote: I second that... I am trying to figure out what is missing so that I can use ZFS exclusively... right now as far as I know two major obstacles are no support from installer and issues with live update. Are both of those expected to be resolved this year? On 5/30/07, Carl Brewer [EMAIL PROTECTED] wrote: Out of curiosity, I'm wondering if Lori, or anyone else who actually writes the stuff, has any sort of a 'current state of play' page that describes the latest OS ON release and how it does ZFS boot and installs? There's blogs all over the place, of course, which have a lot of stale information, but is there a 'the current release supports this, and this is how you install it' page anywhere, or somewhere in particular to watch? I've been playing with ZFS boot since around b34 or whenever it was that it first started to be able to be used as a boot partition with the temporary ufs partition hack, but I understand it's moved beyond that. I've been downloading and playing with the ON builds every now and then, but haven't found (haven't looked in the right places?) anywhere where each build has this is what this build does differently, this is what works and how documented. can someone belt me with a cluestick please? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] Which IO transfer size are zfs using when writing to disk ?
Erik Lund wrote: The parameters are: maxphys=0x200000, sd_max_xfer_size = 0x200000 and sd_max_xfer_size = 0x200000. Can I be sure that the IO transfer is 2MB or ?

Use iostat or another tool (DTrace iosnoop.d) to see for sure. Note that these are upper limits...

I want to line up my Sun ST6140 for optimal performance and therefore need to know the transfer size.

OK, but since this is zfs-discuss, I presume you know that ZFS uses a maximum block size of 128 kBytes. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
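[Editor's sketch] A quick way to see the I/O sizes actually being issued (nothing ST6140-specific; any device works): sample iostat and divide throughput by operation counts, for example:

  # average I/O size per device is roughly (kr/s + kw/s) / (r/s + w/s)
  iostat -xnz 5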
[zfs-discuss] Re: how to move a zfs file system between disks
It is not possible to use send and receive if the pool is not imported. It is however possible to use send and receive when the file system is not mounted. --chris This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
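[Editor's sketch] For example (pool and dataset names invented), moving an unmounted file system to another pool with send/receive:

  zfs unmount tank/data
  zfs snapshot tank/data@move
  zfs send tank/data@move | zfs receive newpool/data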
Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
On 5/31/07, Darren J Moffat [EMAIL PROTECTED] wrote: Since you are doing iSCSI and may not be running ZFS on the initiator (client) then I highly recommend that you run with IPsec using at least AH (or ESP with Authentication) to protect the transport. Don't assume that your network is reliable. ZFS won't help you here if it isn't running on the iSCSI initiator, and even if it is it would need two targets to be able to repair. If you don't intend to encrypt the iSCSI headers / payloads, why not just use the header and data digests that are part of the iSCSI protocol? Thanks, - Ryan -- UNIX Administrator http://prefetch.net ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
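[Editor's sketch] On the Solaris initiator side the digests Ryan mentions can be turned on with iscsiadm; roughly (see iscsiadm(1M) for the exact option names on your build):

  # enable CRC32 header and data digests for the iSCSI initiator node
  iscsiadm modify initiator-node --headerdigest CRC32
  iscsiadm modify initiator-node --datadigest CRC32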
Re: [zfs-discuss] current state of play with ZFS boot and install?
Carl's request for a current state of play is a reasonable one. I have modified this page: http://www.opensolaris.org/os/community/zfs/boot/netinstall/ to include a list of status updates. I will keep it current so that anyone who wants to know how to install zfs boot using a netinstall can get a working combination of Solaris community release and the netinstall kit. Lori Malachi de Ælfweald wrote: I am running the zfsboot from b62. So far, it has been recommended that I not upgrade to a newer build. Malachi On 5/31/07, *Mike Dotson* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: I've been using the zfsbootkit to modify my jumpstart images. As far as I know, the kit is the current process for zfs boot until further notice. http://www.opensolaris.org/os/community/install/files/zfsboot-kit-20060418.i386.tar.bz2 http://www.opensolaris.org/os/community/install/files/zfsboot-kit-20060418.i386.tar.bz2 See readme in the package. On Thu, 2007-05-31 at 02:06 -0700, Marko Milisavljevic wrote: I second that... I am trying to figure out what is missing so that I can use ZFS exclusively... right now as far as I know two major obstacles are no support from installer and issues with live update. Are both of those expected to be resolved this year? On 5/30/07, Carl Brewer [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Out of curiosity, I'm wondering if Lori, or anyone else who actually writes the stuff, has any sort of a 'current state of play' page that describes the latest OS ON release and how it does ZFS boot and installs? There's blogs all over the place, of course, which have a lot of stale information, but is there a 'the current release supports this, and this is how you install it' page anywhere, or somewhere in particular to watch? I've been playing with ZFS boot since around b34 or whenever it was that it first started to be able to be used as a boot partition with the temporary ufs partition hack, but I understand it's moved beyond that. I've been downloading and playing with the ON builds every now and then, but haven't found (haven't looked in the right places?) anywhere where each build has this is what this build does differently, this is what works and how documented. can someone belt me with a cluestick please? This message posted from opensolaris.org http://opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Mike Dotson ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS consistency guarantee
Hi Folks, how can I guarantee consistency for ZFS snapshots? If I am running a db or any other app on my ZFS and want to take a snapshot, is there any filesystem-equivalent command to quiesce the ZFS before taking a snapshot, or do I have to rely on the app itself? Can I do something like lockfs or the like? If I take snapshots on the storage, how can I guarantee consistency of those snapshots? Any methods to quiesce the FS after which I can take snapshots on storage? Thanks for any inputs. This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS consistency guarantee
ganesh wrote: Hi Folks, how can i guarantee consistency for the ZFS snapshots?. If i am running a db or any other app on my ZFS and want to take a snapshot is there is any filesystem equivalent command to quiesce the ZFS before taking a snapshot or do i have to rely on the app itself?. You almost always have to quiesce the app in order to flush its buffers. Can i do something like lockfs or the like?. If i take snapshost on the storage, how can i guarantee consistency on those snapshosts?. Any methods to quiesce the FS after which i can take snapshosts on storage?. zfs snapshot -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Overview (rollup) of recent activity on zfs-discuss
For background on what this is, see:
http://www.opensolaris.org/jive/message.jspa?messageID=24416#24416
http://www.opensolaris.org/jive/message.jspa?messageID=25200#25200

= zfs-discuss 05/01 - 05/15 =

Size of all threads during period:

Thread size  Topic
-----------  -----
 36  gzip compression throttles system?
 26  Lots of overhead with ZFS - what am I doing wrong?
 22  Motley group of discs?
 15  ZFS Support for remote mirroring
 13  Need guidance on RAID 5, ZFS, and RAIDZ on home file server
 12  Optimal strategy (add or replace disks) to build a cheap and raidz?
 11  ZFS over a layered driver interface
 10  ZFS not utilizing all disks
 10  Resilvering speed?
  9  zfs boot image conversion kit is posted
  8  How does ZFS write data to disks?
  7  Will this work?
  7  Filesystem Benchmark
  6  setup_install_server, cpio and zfs : fix needed ?
  6  ZFS vs UFS2 overhead and may be a bug?
  6  ZFS Storage Pools Recommendations for Productive Environments
  6  Odd zpool create error
  5  recovered state after system crash
  5  Zpool, RaidZ how it spreads its disk load?
  5  Samba and ZFS ACL Question
  5  Remove files when at quota limit
  5  Lost in boot loop..
  5  Is this a workable ORACLE disaster recovery solution?
  4  zpool create -f ... fails on disk with previous
  4  zfs and jbod-storage
  4  does every fsync() require O(log n) platter-writes?
  4  ZFS raid on removable media for backups/temporary use possible?
  4  ZFS Snapshot destroy to
  4  Permanently removing vdevs from a pool
  4  Issue with adding existing EFI disks to a zpool
  4  A quick ZFS question: RAID-Z Disk Replacement + Growth ?
  4  ZFS: Under The Hood at LOSUG (16/05/07)
  3  zpool create -f ... fails on disk with previous UFS on it
  3  iscsitadm local_name in ZFS
  3  Will this work?]
  3  Very Large Filesystems
  3  Q: recreate pool?
  3  Multiple filesystem costs? Directory sizes?
  3  Force rewriting of all data, to push stripes onto newly added devices?
  3  Extremely long ZFS destroy operations
  3  Clear corrupted data
  3  Boot disk clone with zpool present
  3  Automatic rotating snapshots
  2  zfs tcsh command completion
  2  zfs lost function
  2  zdb -l goes wild about the labels
  2  tape-backup software (was: Very Large Filesystems)
  2  snv63: kernel panic on import
  2  Solaris Backup Server
  2  Motley group of discs? (doing it right, or right now)
  2  External eSata ZFS raid possible?
  2  Best way to migrate filesystems to ZFS?
  2  ARC, mmap, pagecache...
  2  3320 JBOD setup
  1  zpool status faulted, but raid1z status is online?
  1  zpool list and df -k difference
  1  zpool import - arc problem?
  1  zpool command causes a crash of my server
  1  zfs send/receive question
  1  zfs performance on fuse (Linux) compared to other fs
  1  zfs dataset option relations
  1  thoughts on ZFS copies
  1  simple Raid-Z question
  1  crash
  1  ZFS with raidz
  1  ZFS in S10update 4
  1  ZFS improvements
  1  ZFS and Oracle db production deployment
  1  ZFS Boot: Dividing up the name space
  1  Who modified my ZFS receive destination?
  1  Summary: Poor man's backup by attaching/detaching mirror drives on a _striped_ pool?
  1  Optimal strategy (add or replace disks) tobuild a cheap and raidz?
  1  Optimal strategy (add or replace disks) to build acheap and raidz?
  1  Move data from the zpool (root) to a zfs file system
  1  Filesystem full not reported in /var/adm/messages
  1  Benchmarking
  1  Benchmark which models ISP workloads
  1  B62 AHCI and ZFS

Posting activity by person for period:

# of posts  By
----------  --
 15  matthew.ahrens at sun.com (matthew ahrens)
 15  ian at ianshome.com (ian collins)
 14  richard.elling at sun.com (richard elling)
 13  rmilkowski at task.gda.pl (robert milkowski)
 11  me at tomservo.cc (mario goebbels)
 11  marko at cognistudio.com (marko milisavljevic)
 10  jk at tools.de (jürgen keil)
  7  al at logical-approach.com (al hopper)
  6  toby at smartgames.ca (toby thain)
  6  tmcmahon2
Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
zfs-boot crowd: I said I'd try to come up with a procedure for liveupgrading the netinstalled zfs-root setup, but I haven't found time to do so yet (I'm focusing on getting this supported in install for real). So while I hate to retreat into the I never said you could upgrade this configuration excuse, that's what I'm going to do, at least for now. I might get a chance to work on a liveupgrade procedure in the next couple of weeks. In the meantime, if someone else wants to take a shot at it and post the results, go ahead. Lori Malachi de Ælfweald wrote: No, I did mean 'snapshot -r' but I thought someone on the list said that the '-r' wouldn't work until b63... hmmm... Well, realistically, all of us new to this should probably know how to patch our system before we put any useful data on it anyway, right? :) Thanks, Mal On 5/25/07, *Constantin Gonzalez* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Hi Malachi, Malachi de Ælfweald wrote: I'm actually wondering the same thing because I have b62 w/ the ZFS bits; but need the snapshot's -r functionality. you're lucky, it's already there. From my b62 machine's man zfs: zfs snapshot [-r] [EMAIL PROTECTED]|[EMAIL PROTECTED] Creates a snapshot with the given name. See the Snapshots section for details. -rRecursively create snapshots of all descendant datasets. Snapshots are taken atomically, so that all recursive snapshots correspond to the same moment in time. Or did you mean send -r? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS consistency guarantee
how can i guarantee consistency for the ZFS snapshots?. Filesystem consistency or application/data consistency? If i am running a db or any other app on my ZFS and want to take a snapshot is there is any filesystem equivalent command to quiesce the ZFS before taking a snapshot or do i have to rely on the app itself?. Because ZFS is taking the snapshot, it is able to guarantee filesystem consistency. However, it cannot speak to the data or application contents. You have to do that, and ensure it has a consistent on-disk image at the time of the snapshot. This is the same as any other snapshot or copy technique would require. Can i do something like lockfs or the like?. If i take snapshost on the storage, how can i guarantee consistency on those snapshosts?. Any methods to quiesce the FS after which i can take snapshosts on storage?. At the filesystem level, that's all taken care of. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOShttp://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
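[Editor's sketch] A minimal sketch of the pattern Darren describes (pool and dataset names invented; the quiesce steps are entirely application-specific, the Oracle commands are only an example):

  # 1. quiesce the application, e.g. ALTER DATABASE BEGIN BACKUP for Oracle
  # 2. take the snapshot (ZFS guarantees it is filesystem-consistent)
  zfs snapshot datapool/oradata@nightly
  # 3. resume the application, e.g. ALTER DATABASE END BACKUP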
[zfs-discuss] zpool relayout
Just a quick question. If I create a raidz pool but then later find that I need more space I can add another raidz set to the pool but what happens to data already in the pool? Does a relayout occur or does zfs work towards balancing I/O to the pool across the 2 raidz sets only as new data is written? Also, is it possible to explicitly request a relayout; for example can I convert a raidz1 pool to a raidz2 pool? This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool relayout
Just a quick question. If I create a raidz pool but then later find that I need more space I can add another raidz set to the pool but what happens to data already in the pool? Does a relayout occur or does zfs work towards balancing I/O to the pool across the 2 raidz sets only as new data is written? Technically, raidz describes a vdev in a pool, not a pool itself. So yes, you can add another raidz to the pool. New data is striped across both components, but weighted to the empty one to try to balance things out a bit over time. No relayout occurs. Also, is it possible to explicitly request a relayout; for example can I convert a raidz1 pool to a raidz2 pool? Not today. My assumption is that other items (like zpool shrink/evacuation) are being targeted as a higher priority. -- Darren Dunham [EMAIL PROTECTED] Senior Technical Consultant TAOShttp://www.taos.com/ Got some Dr Pepper? San Francisco, CA bay area This line left intentionally blank to confuse you. ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
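[Editor's sketch] For example (device names invented), growing a pool by adding a second raidz vdev:

  # original pool with one raidz vdev
  zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0
  # later: add a second raidz vdev; new writes stripe across both, existing data stays put
  zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0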
Re: [zfs-discuss] ZFS + ISCSI + LINUX QUESTIONS
Al Hopper wrote: On Thu, 31 May 2007, David Anderson wrote: snip . Other: -Others have reported that Sil3124 based SATA expansion cards work well with Solaris.

[Sorry - don't mean to hijack this interesting thread] I believe that there is a serious bug with the si3124 driver that has not been addressed. Ben Rockwood and I have seen it firsthand, and a quick look at the Hg logs shows that si3124.c has not been changed in 6 months. Basic description of the bug: under heavy load (lots of I/O ops/Sec) all data from the drive(s) will completely stop for an extended period of time - 60 to 90+ Seconds. There was a recent discussion of the same issue on the Solaris on x86 list ([EMAIL PROTECTED]) - several experienced x86ers have seen this bug and found the current driver unusable. Interestingly, one individual said (paraphrased) ... don't see any issues and then later ... now I see it and it was there the entire time.

Recommendation: If you plan to use the 3124 driver, test it yourself under heavy load. A simple test with one disk drive will suffice. In my case, it was plainly obvious with one (ex Sun M20) drive and a UFS filesystem - all I was doing was tarring up /export/home to another drive. Periodically the tar process would simply stop (iostat went flatline) - it looked like the system was going to crash - then (after 60+ Secs) the tar process continued as if nothing had happened. This was repeated 4 or 5 times before the 'tar cvf' (of around 40Mb of data) completed successfully.

Regards, Al Hopper Logical Approach Inc, Plano, TX. [EMAIL PROTECTED] Voice: 972.379.2133 Fax: 972.379.2134 Timezone: US CDT OpenSolaris Governing Board (OGB) Member - Apr 2005 to Mar 2007 http://www.opensolaris.org/os/community/ogb/ogb_2005-2007/ ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Does the si3124 bug Al Hopper mentioned have something to do with the ERROR below? I hit it in the warlock build step of my workspace, but I did not change the si3124 code at all...

warlock -c ../../common/io/warlock/si3124.wlcmd si3124.ll \
../sd/sd.ll ../sd/sd_xbuf.ll \
-l ../scsi/scsi_capabilities.ll -l ../scsi/scsi_control.ll -l ../scsi/scsi_watch.ll -l ../scsi/scsi_data.ll -l ../scsi/scsi_resource.ll -l ../scsi/scsi_subr.ll -l ../scsi/scsi_hba.ll -l ../scsi/scsi_transport.ll -l ../scsi/scsi_confsubr.ll -l ../scsi/scsi_reset_notify.ll \
-l ../cmlb/cmlb.ll \
-l ../sata/sata.ll \
-l ../warlock/ddi_dki_impl.ll
The following variables don't seem to be protected consistently: dev_info::devi_state
*** Error code 10
make: Fatal error: Command failed for target `si3124.ok'
Current working directory /net/greatwall/workspaces/wifi_rtw/usr/src/uts/intel/si3124
*** Error code 1
The following command caused the error: cd ../si3124; make clean; make warlock
make: Fatal error: Command failed for target `warlock.sata'
Current working directory /net/greatwall/workspaces/wifi_rtw/usr/src/uts/intel/warlock

- Michael ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zpool relayout
On 31 May, 2007 - Vic Engle sent me these 0,6K bytes: Just a quick question. If I create a raidz pool but then later find that I need more space I can add another raidz set to the pool but what happens to data already in the pool? Does a relayout occur or does zfs work towards balancing I/O to the pool across the 2 raidz sets only as new data is written? If you have a raidz of say 500G, filled with 300G of data.. then you add another raidz of 500G and start writing.. ZFS will put more data on the second raidz thing to even out the distribution.. Also, is it possible to explicitly request a relayout; for example can I convert a raidz1 pool to a raidz2 pool? Currently no. /Tomas -- Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/ |- Student at Computing Science, University of Umeå `- Sysadmin at {cs,acc}.umu.se ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] ZFS vs UFS performance measurement
Hi folks, We have the following disks and we want to create a STRIPE: c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0 What we would like to measure is how the following two STRIPEs perform: a STRIPE created using Solaris Volume Manager, and a STRIPE created with ZFS. How can I achieve the exact same STRIPE (w.r.t. the interleaving, stripe size, etc.)? We want to make sure that the two STRIPE configurations are identical. Any pointers? Thanks in Advance _D This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS vs UFS performance measurement
Durga Deep Tirunagari wrote: Hi folks, We have the following disks and we want to create a STRIPE: c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0 What we would like to measure is how the following two STRIPEs perform: a STRIPE created using Solaris Volume Manager, and a STRIPE created with ZFS. How can I achieve the exact same STRIPE (w.r.t. the interleaving, stripe size, etc.)? We want to make sure that the two STRIPE configurations are identical. Any pointers?

This is not possible because SVM uses a fixed stripe allocation and ZFS uses dynamic stripe allocation. Reads and writes for ZFS will only use the number of disks needed, not all of the disks (every time) like SVM. In other words, it is like comparing apples and oranges, at least as far as the RAID implementations are concerned. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
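[Editor's sketch] The closest equivalent setups look roughly like this (device names are from the original post; the 128k interlace is only an example chosen to match ZFS's default 128k recordsize):

  # SVM: single 12-way stripe with a fixed 128k interlace
  metainit d10 1 12 c7t2d0s0 c7t3d0s0 c7t4d0s0 c7t5d0s0 c7t8d0s0 c7t9d0s0 \
      c8t2d0s0 c8t3d0s0 c8t4d0s0 c8t5d0s0 c8t8d0s0 c8t9d0s0 -i 128k
  newfs /dev/md/rdsk/d10

  # ZFS: dynamic stripe across the same disks; allocation is decided per write
  zpool create tank c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 \
      c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0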
[zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
I've had at least some success (tried it once so far) doing a BFU to cloned filesystem from a b62 zfs root system, I could probably document that if there is interest. I have not tried taking a new ISO and installing the new packages ontop of a cloned fileystem though. On 5/31/07, Lori Alt [EMAIL PROTECTED] wrote: zfs-boot crowd: I said I'd try to come up with a procedure for liveupgrading the netinstalled zfs-root setup, but I haven't found time to do so yet (I'm focusing on getting this supported in install for real). So while I hate to retreat into the I never said you could upgrade this configuration excuse, that's what I'm going to do, at least for now. I might get a chance to work on a liveupgrade procedure in the next couple of weeks. In the meantime, if someone else wants to take a shot at it and post the results, go ahead. Lori Malachi de Ælfweald wrote: No, I did mean 'snapshot -r' but I thought someone on the list said that the '-r' wouldn't work until b63... hmmm... Well, realistically, all of us new to this should probably know how to patch our system before we put any useful data on it anyway, right? :) Thanks, Mal On 5/25/07, *Constantin Gonzalez* [EMAIL PROTECTED] mailto: [EMAIL PROTECTED] wrote: Hi Malachi, Malachi de Ælfweald wrote: I'm actually wondering the same thing because I have b62 w/ the ZFS bits; but need the snapshot's -r functionality. you're lucky, it's already there. From my b62 machine's man zfs: zfs snapshot [-r] [EMAIL PROTECTED]|[EMAIL PROTECTED] Creates a snapshot with the given name. See the Snapshots section for details. -rRecursively create snapshots of all descendant datasets. Snapshots are taken atomically, so that all recursive snapshots correspond to the same moment in time. Or did you mean send -r? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs migration
Sorry to bother you but something is not clear to me regarding this process. OK, let's say I have two internal disks (73gb each) and I am mirroring them... now I want to replace those two mirrored disks with one LUN that is on SAN and is around 100gb. I do meet the requirement of having more than 73gb of storage, but do I need only something like 73gb at minimum, or do I actually need two LUNs of 73gb or more since I have it mirrored? My goal is simply to move the data off two mirrored disks onto one single SAN device... Any ideas if what I am planning to do is doable? Or do I need to use zfs send and receive and just update everything and switch when I am done? Or do I just add this SAN disk to the existing pool and then remove the mirror somehow? I would just have to make sure that all data is off that disk... is there any option to evacuate data off that mirror? Here is what I exactly have:

bash-3.00# zpool list
NAME      SIZE    USED    AVAIL    CAP    HEALTH    ALTROOT
mypool    68G     52.9G   15.1G    77%    ONLINE    -
bash-3.00# zpool status
  pool: mypool
 state: ONLINE
 scrub: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
errors: No known data errors
bash-3.00#

On Tue, 29 May 2007, Cyril Plisko wrote: On 5/29/07, Krzys [EMAIL PROTECTED] wrote: Hello folks, I have a question. Currently I have a zfs pool (mirror) on two internal disks... I wanted to connect that server to SAN, then add more storage to this pool (double the space), then start using it. Then what I wanted to do is just take the internal disks out of that pool and use SAN only. Is there any way to do that with zfs pools? Is there any way to move data from those internal disks to external disks?

You can zpool replace your disks with other disks, provided that you have the same number of new disks and they are of the same or greater size -- Regards, Cyril

___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS vs UFS performance measurement
Richard Elling wrote: Durga Deep Tirunagari wrote: Hi folks, We have the following disks and we want to create a STRIPE: c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0 What we would like to measure is how the following two STRIPEs perform: a STRIPE created using Solaris Volume Manager, and a STRIPE created with ZFS. How can I achieve the exact same STRIPE (w.r.t. the interleaving, stripe size, etc.)? We want to make sure that the two STRIPE configurations are identical. Any pointers?

This is not possible because SVM uses a fixed stripe allocation and ZFS uses dynamic stripe allocation. Reads and writes for ZFS will only use the number of disks needed, not all of the disks (every time) like SVM. In other words, it is like comparing apples and oranges, at least as far as the RAID implementations are concerned. -- richard

Hi Richard, Please suggest an alternative? Please advise on what's the best course of action. _D ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS boot: Now, how can I do a pseudo live upgrade?
Jason King wrote: I've had at least some success (tried it once so far) doing a BFU to cloned filesystem from a b62 zfs root system, I could probably document that if there is interest. Yep, been there too, weather's nice :-) http://blogs.sun.com/timf/entry/an_easy_way_to_manage (and previously http://blogs.sun.com/timf/entry/zfs_mountrootadm ) I have not tried taking a new ISO and installing the new packages ontop of a cloned fileystem though. I seem to remember trying something like that before, didn't work. I suspect there's more to it than that unfortunately - would love to have the time to play about more with upgrade hacks. cheers, tim On 5/31/07, *Lori Alt* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: zfs-boot crowd: I said I'd try to come up with a procedure for liveupgrading the netinstalled zfs-root setup, but I haven't found time to do so yet (I'm focusing on getting this supported in install for real). So while I hate to retreat into the I never said you could upgrade this configuration excuse, that's what I'm going to do, at least for now. I might get a chance to work on a liveupgrade procedure in the next couple of weeks. In the meantime, if someone else wants to take a shot at it and post the results, go ahead. Lori Malachi de Ælfweald wrote: No, I did mean 'snapshot -r' but I thought someone on the list said that the '-r' wouldn't work until b63... hmmm... Well, realistically, all of us new to this should probably know how to patch our system before we put any useful data on it anyway, right? :) Thanks, Mal On 5/25/07, *Constantin Gonzalez* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] mailto: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: Hi Malachi, Malachi de Ælfweald wrote: I'm actually wondering the same thing because I have b62 w/ the ZFS bits; but need the snapshot's -r functionality. you're lucky, it's already there. From my b62 machine's man zfs: zfs snapshot [-r] [EMAIL PROTECTED]|[EMAIL PROTECTED] Creates a snapshot with the given name. See the Snapshots section for details. -rRecursively create snapshots of all descendant datasets. Snapshots are taken atomically, so that all recursive snapshots correspond to the same moment in time. Or did you mean send -r? Best regards, Constantin -- Constantin GonzalezSun Microsystems GmbH, Germany Platform Technology Group, Global Systems Engineering http://www.sun.de/ Tel.: +49 89/4 60 08-25 91 http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ http://blogs.sun.com/constantin/ Sitz d. Ges.: Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten Amtsgericht Muenchen: HRB 161028 Geschaeftsfuehrer: Marcel Schneider, Wolfgang Engels, Dr. Roland Boemer Vorsitzender des Aufsichtsrates: Martin Haering ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org mailto:zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss -- Tim Foster, Sun Microsystems Inc, Solaris Engineering Ops http://blogs.sun.com/timf ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
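[Editor's sketch] The clone-then-upgrade idea Jason and Tim describe looks roughly like this (dataset names invented; as Lori notes the procedure is not yet supported, so treat it as a sketch only):

  # snapshot the running b62 root and clone it as an alternate boot environment
  zfs snapshot rootpool/rootfs@b62
  zfs clone rootpool/rootfs@b62 rootpool/rootfs-new
  # BFU (or pkgadd the new bits) into the clone, update the GRUB entry to
  # point at the clone, then boot it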
Re: [zfs-discuss] ZFS vs UFS performance measurement
[EMAIL PROTECTED] wrote: Richard Elling wrote: Durga Deep Tirunagari wrote: Hi folks, We have the following disks :and we want to create a STRIPE c7t2d0 c7t3d0 c7t4d0 c7t5d0 c7t8d0 c7t9d0 c8t2d0 c8t3d0 c8t4d0 c8t5d0 c8t8d0 c8t9d0 What we would like to measure is how the following two STRIPES perform STRIPE ( Created using Solaris Volume Manager ) STRIPE created with ZFS. How can I acheive the exact STRIPE ( w.r.t to the interleaving, stripe size, etc... ). We want to make sure that the two STRIPE configurations are identical. Any pointers ? This is not possible because SVM uses a fixed stripe allocation and ZFS uses dynamic stripe allocation. Reads and writes for ZFS will only use the number of disks needed, not all of the disks (every time) like SVM. In other words, it is like comparing apples and oranges, at least as far as RAID implementations is concerned. -- richard hi Richard, Please suggest an alternative ?. Please advice on whats the best course of action What are you trying to accomplish? The PAE group in Sun has a team working on ZFS performance and characterization. Some results have been blogged externally. It may be that your question is already answered someplace. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Delegated Administration?
Is it possible to give a user control of a ZFS filesystem such that the user can create their own file systems within it, take snapshots, etc.? Thanks, Haik This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] Thoughts on CF/SSDs [was: ZFS - Use h/w raid or not?Thoughts.Considerations.]
Hi Mike, more thoughts below... Ellis, Mike wrote: Hey Richard, thanks for sparking the conversation... This is a very interesting topic (especially if you take it out of the HPC we need 1000 servers to have this minimal boot image space into general purpose/enterprise computing) CF cards aren't generally very fast, so the solid state disk vendors are putting them into hard disk form factors with SAS/SATA interfaces. These will be more interesting because they are really fast and can employ more sophisticated data protection methods -- like magnetic disk drives :-) Based on your earlier note, it appears you're not planning to use cheapo free after rebate CF cards :-) (The cheap-ones would probably be perfect for ZFS a-la cheap-o-JBOD). The price of flash memory has dropped by 50% this year. Expect this trend to follow Moore's law. Having boot disks mirrored across controllers has had sys-admins sleep better over the years (especially in FC-loop-cases with both drives on the same loop... Sigh). If the USB-bus one might hang these fancy FC-cards on is robust enough then perhaps a single battle hardened CF-card will suffice... (although zfs ditto-blocks or some form of protection might still be considered a good thing?) Having 2 cards would certainly make the unlikely replacement of a card a LOT more straight-forward than a single-card failure... Much of this would depend on the quality of these CF-cards and how they put up under load/stress/time Disagree. With two cards, you have to implement software mirroring of some sort. While ZFS is a step in the right direction (simplifying the process) it is unproven for long term system administration. The costs of implementing software mirroring occur in the complexity of managing the software environment over time as upgrades and patches occur. Reliability tends to trump availability for this reason. -- If we're going down this CF-boot path, many of us are going to have to re-think our boot-environment quite a bit. We've been spoiled with 36+ GB mirrored-boot drives for some time now (if you do a lot of PATCHING, you'll find that even those can get tight But that's a discussion for a different day) I don't think most enterprise boot disk layouts are going to fit (even unmirrored) onto a single 4GB CF-card. So we'll have to play some games where we start splitting off /opt, /var, (which is fairly read-write intensive when you have process-accounting etc. running) onto some other non-CF filesystem (likely a SAN of some variety). At some point the hackery a 4GB CF-card is going to force us to do, is going to become more complex than just biting the bullet and doing a full multipath-ed SAN-boot calling it a day. (or perhaps some future iSCSI/NFS boot for the SAN-averse) 4 GBytes is possible, but 8 GBytes ( $100 today) will be more common. 16 GByte CFs are still above $100... wait a few months. These are often used for the high-end digital cameras, where there is no redundancy, so the photography sites might be a good source of quality evaluations. Seriously though... If (say in some HPC/grid space?) you can stick your ENTIRE boot environment onto a 4GB CF-card, why not just do the SAN, NFS/iSCSI boot thing instead? (what ever happened to: http://blogs.sun.com/dweibel/entry/sprint_snw_2006#comments ) Good question. You can build an NFS service which is much more reliable than a disk, quite easily in fact. Some people get all upset about that, though. N.B. a client only needs the NFS service to be available when an I/O operation is started. 
Once you boot and have been running for a while, most stuff should be cached in main memory and your reliance on the NFS boot server is reduced. This makes analysis of the reliability of such systems difficult. -- But lets explore the CF thing some more... There is something there, although I think Sun might have to provide some best-practices/suggestions as to how customers that don't run a minimum-config-no-local-apps, pacct, monitoring, etc. solaris environment are best to use something like this. Use it as a pivot boot onto the real root-image? That would delegate the CF-card to little more than a rescue/utility image Kinda cool, but not earth-shattering I would think (especially for those already utilizing wanboot for such purposes) On my list of things to do is measure the actual block reuse patterns. For ZFS, this isn't really interesting because of the COW. For UFS, we do expect some hot spots. But even then, there is some debate over whether the problems will hit in metadata first (file appends do not rewrite original data, so logs aren't interesting). Since UFS metadata is not redundant (unlike ZFS) the issues may get tricky. Somewhere on my list of things to do... and it isn't a trivial data collection exercise. -- Splitting off /var and friends from the boot environment (and still packing the boot env say on a ditto-block 4GB FC card)
Re: [zfs-discuss] zfs migration
Krzys wrote: Sorry to bother you but something is not clear to me regarding this process.. Ok, lets sat I have two internal disks (73gb each) and I am mirror them... now I want to replace those two mirrored disks into one LUN that is on SAN and it is around 100gb. Now I do meet one requirement of having more than 73gb of storage but do I need only something like 73gb at minimum or do I actually need two luns of 73gb or more since I have it mirrored? You can attach any number of devices to a mirror. You can detach all but one of the devices from a mirror. Obviously, when the number is one, you don't currently have a mirror. The resulting logical size will be equivalent to the smallest device. My goal is simple to move data of two mirrored disks into one single SAN device... Any ideas if what I am planning to do is duable? or do I need to use zfs send and receive and just update everything and switch when I am done? or do I just add this SAN disk to the existing pool and then remove mirror somehow? I would just have to make sure that all data is off that disk... is there any option to evacuate data off that mirror? The ZFS terminology is attach and detach A replace is an attach followed by detach. It is a good idea to verify that the sync has completed before detaching. zpool status will show the current status. -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
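[Editor's sketch] Spelled out for the case in this thread (verify with zpool status that the resilver has finished before detaching anything):

  # attach the SAN LUN to the existing mirror, let it resilver, then drop the old disks
  zpool attach mypool c1t2d0 emcpower0a
  zpool status mypool        # wait for the resilver to complete
  zpool detach mypool c1t2d0
  zpool detach mypool c1t3d0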
Re: [zfs-discuss] Delegated Administration?
Haik Aftandilian wrote: Is it possible to give a user control of a ZFS filesystem such that the user can create their own file systems within it, take snapshots, etc.? Thanks, Haik Support for this should be available within the next month or two. You should check out PSARC/2006/465 http://www.opensolaris.org/jive/thread.jspa?messageID=47766 -Mark ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
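[Editor's sketch] For reference, once that work integrates, the delegation is expected to look roughly like this (user and dataset names invented; see the PSARC case for the final syntax):

  # let user 'haik' create file systems, mount them and take snapshots under tank/home/haik
  zfs allow haik create,mount,snapshot tank/home/haik
  zfs allow tank/home/haik        # show the delegated permissions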
Re: [zfs-discuss] Thoughts on CF/SSDs [was: ZFS - Use h/w raid or not?Thoughts.Considerations.]
Richard Elling wrote: CF cards aren't generally very fast, so the solid state disk vendors are putting them into hard disk form factors with SAS/SATA interfaces. Timing is everything... a new standard might help... let's call it miCard http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=199703805 -- richard ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
[zfs-discuss] zfs boot error recovery
hi all, I would like to ask some questions regarding best practices for ZFS recovery if disk errors occur. Currently I have zfs boot (nv62) and the following setup: 2 si3124 controllers (each with 4 SATA disks), 8 SATA disks, same size, same type. I have two pools: a) rootpool b) datapool. The rootpool is a mirrored pool, where every disk has a slice (s0, which is 5% of the whole disk) devoted to the rootpool, just for mirroring. The rest of each disk (s1) is added to the datapool, which is raidz. My idea is that if any disk is corrupt I am still able to boot. Now I have some questions:

a) If I want to be able to boot from every disk in case of error, I have to set up grub on every disk, so that if the controller picks that disk as the boot disk, the rootpool can still be loaded from it.

b) What is the best way to replace a disk as fast as possible? Adding a disk as a hot spare for the raidz is a good idea, but I would also like to replace the disk during runtime as simply as possible. The problem is that for the root pool the disks are labeled (the slices thingy), so I cannot simply detach the volumes, replace the disk and attach them again; I have to format the disk so that the slicing exists. Is there some clever way to automatically re-label a replacement disk?

c) si3124-related question: is it possible to simply hot swap the disk? (I have the disks in special hot-swappable units, but have no experience with hot swapping under Solaris, so I would like some feedback.)

d) Do you have best practices for systems like the above? What are the best resources on the web for learning about monitoring the health of a ZFS system (like email notifications in case of disk failures...)?

Thanks in advance -- Jakob ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
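[Editor's sketch] On question (a), installing grub onto each disk of the mirrored root pool looks like this (the disk name is an example; repeat for every disk whose s0 slice belongs to the rootpool):

  # x86: put the grub stages into the boot slice of each root-pool disk
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0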
Re: [zfs-discuss] Delegated Administration?
Support for this should be available within the next month or two. You should check out PSARC/2006/465 http://www.opensolaris.org/jive/thread.jspa?messageID=47766 This is what I was looking for. Thanks, Haik ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] zfs migration
Hmm, I am having some problems, I did follow what you suggested and here is what I did: bash-3.00# zpool status pool: mypool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM mypool ONLINE 0 0 0 mirrorONLINE 0 0 0 c1t2d0 ONLINE 0 0 0 c1t3d0 ONLINE 0 0 0 errors: No known data errors bash-3.00# zpool detach mypool c1t3d0 bash-3.00# zpool status pool: mypool state: ONLINE scrub: none requested config: NAMESTATE READ WRITE CKSUM mypool ONLINE 0 0 0 c1t2d0ONLINE 0 0 0 errors: No known data errors so now I have only one disk in my pool... Now, the c1t2d0 disk is a 72fb SAS drive. I am trying to replace it with SAN 100GB LUN (emcpower0a) bash-3.00# format Searching for disks...done AVAILABLE DISK SELECTIONS: 0. c1t0d0 SUN72G cyl 14087 alt 2 hd 24 sec 424 /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 1. c1t1d0 SUN72G cyl 14087 alt 2 hd 24 sec 424 /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 2. c1t2d0 SEAGATE-ST973401LSUN72G-0556-68.37GB /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 3. c1t3d0 FUJITSU-MAY2073RCSUN72G-0501-68.37GB /[EMAIL PROTECTED],60/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL PROTECTED],0 4. c2t5006016041E035A4d0 DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0 5. c2t5006016941E035A4d0 DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /[EMAIL PROTECTED],70/[EMAIL PROTECTED]/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0 6. c3t5006016841E035A4d0 DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /[EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0 7. c3t5006016141E035A4d0 DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /[EMAIL PROTECTED],70/[EMAIL PROTECTED],2/SUNW,[EMAIL PROTECTED]/[EMAIL PROTECTED],0/[EMAIL PROTECTED],0 8. emcpower0a DGC-RAID5-0324 cyl 51198 alt 2 hd 256 sec 16 /pseudo/[EMAIL PROTECTED] Specify disk (enter its number): ^D so I do run replace command and I get and error: bash-3.00# zpool replace mypool c1t2d0 emcpower0a cannot replace c1t2d0 with emcpower0a: device is too small Any idea what I am doing wrong? Why it thinks that emcpower0a is too small? Regards, Chris On Thu, 31 May 2007, Richard Elling wrote: Krzys wrote: Sorry to bother you but something is not clear to me regarding this process.. Ok, lets sat I have two internal disks (73gb each) and I am mirror them... now I want to replace those two mirrored disks into one LUN that is on SAN and it is around 100gb. Now I do meet one requirement of having more than 73gb of storage but do I need only something like 73gb at minimum or do I actually need two luns of 73gb or more since I have it mirrored? You can attach any number of devices to a mirror. You can detach all but one of the devices from a mirror. Obviously, when the number is one, you don't currently have a mirror. The resulting logical size will be equivalent to the smallest device. My goal is simple to move data of two mirrored disks into one single SAN device... Any ideas if what I am planning to do is duable? or do I need to use zfs send and receive and just update everything and switch when I am done? 
Or do I just add this SAN disk to the existing pool and then remove the mirror somehow? I would just have to make sure that all data is off that disk... Is there any option to evacuate data off that mirror?

The ZFS terminology is attach and detach. A replace is an attach followed by a detach. It is a good idea to verify that the sync has completed before detaching; zpool status will show the current status.
 -- richard
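[Editor's note: a minimal sketch of the attach-then-detach migration Richard describes, using the pool and device names from Chris's output. Whether emcpower0a is accepted depends on the size/label issue discussed below, and the resilver time depends on the amount of data in the pool.]

# Attach the SAN LUN as a second side of the mirror (pool stays online)
zpool attach mypool c1t2d0 emcpower0a

# Wait until zpool status reports the resilver has completed
zpool status mypool

# Only then detach the old internal disk
zpool detach mypool c1t2d0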
Re: [zfs-discuss] Thoughts on CF/SSDs [was: ZFS - Use h/w raid or not? Thoughts. Considerations.]
On May 31, 2007 1:59:04 PM -0700 Richard Elling [EMAIL PROTECTED] wrote:

CF cards aren't generally very fast, so the solid state disk vendors are putting them into hard disk form factors with SAS/SATA interfaces. These ...

If CF cards aren't fast, how will putting them into a different form factor make them faster?

-frank
Re: [zfs-discuss] zfs migration
On 5/31/07, Krzys [EMAIL PROTECTED] wrote:

So I run the replace command and I get an error:
bash-3.00# zpool replace mypool c1t2d0 emcpower0a
cannot replace c1t2d0 with emcpower0a: device is too small

Try 'zpool attach mypool c1t2d0 emcpower0a' instead; see
http://docs.sun.com/app/docs/doc/819-5461/6n7ht6qrt?a=view .

Will
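[Editor's note: a hedged sketch for checking why ZFS considers emcpower0a too small. The idea is to compare the usable sector count of the slice ZFS is being offered with the slice currently backing the pool; the device/slice names follow Chris's output, and the exact slice in use may differ on his system.]

# Sector count of the slice currently backing the pool
# (whole-disk vdevs typically sit on s0 under an EFI label)
prtvtoc /dev/rdsk/c1t2d0s0

# Sector count of the slice being offered as the new mirror side
prtvtoc /dev/rdsk/emcpower0a

# If the emcpower0a slice covers only part of the 100GB LUN, relabel it with
# format(1M) or fmthard so the slice spans the whole device, then retry the
# attach.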
Re: [zfs-discuss] Thoughts on CF/SSDs [was: ZFS - Use h/w raid or not? Thoughts. Considerations.]
Frank Cusack wrote:
On May 31, 2007 1:59:04 PM -0700 Richard Elling [EMAIL PROTECTED] wrote:
CF cards aren't generally very fast, so the solid state disk vendors are putting them into hard disk form factors with SAS/SATA interfaces. These ...
If CF cards aren't fast, how will putting them into a different form factor make them faster?

Well, if I were doing that, I'd use DRAM and provide enough on-board capacitance and a small processor to copy the contents of the DRAM to flash on power failure.

- Bart

--
Bart Smaalders            Solaris Kernel Performance
[EMAIL PROTECTED]         http://blogs.sun.com/barts
Re: [zfs-discuss] zfs boot error recovery
On 5/31/07, Jakob Praher [EMAIL PROTECTED] wrote:

c) si 3224 related question: is it possible to simply hot swap the disk? (I have the disks in special hot-swappable units, but have no experience with hot swapping under Solaris, so I would like some feedback.)

As it happens, I tried this recently - albeit on a different card - and it went well. I have a Marvell 88SX6081 controller, and removing a disk caused no undue panic (as far as I can tell). When I added a new disk, the kernel detected it immediately, and then I had to run 'cfgadm -c configure scsi0/1' or something like that. Then it Just Worked.

I don't know if this is recommended or not... but it worked for me.

Will
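[Editor's note: for what it's worth, a rough sketch of that hot-swap sequence on a SATA-framework controller. The attachment-point name (sata1/3) and the pool/device names are examples and will differ per system; check cfgadm -al first.]

# List attachment points and find the slot holding the swapped disk
cfgadm -al

# Configure the newly inserted disk (attachment point is an example)
cfgadm -c configure sata1/3

# Have ZFS resilver onto the new disk in the same slot (names are examples)
zpool replace datapool c1t3d0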
[zfs-discuss] Re: zfs boot error recovery
Jakob Praher wrote:

hi all,

I would like to ask some questions regarding best practices for ZFS recovery if disk errors occur. Currently I have zfs boot (nv62) and the following setup:

2 si3224 controllers (each with 4 SATA disks)
8 SATA disks, same size, same type

I have two pools: a) rootpool, b) datapool. The rootpool is a mirrored pool, where every disk has a slice (s0, which is 5% of the whole disk) devoted to the rootpool, just for mirroring. The rest of each disk (s1) is added to the datapool, which is raidz. My idea is that if any disk is corrupt I am still able to boot.

Now I have some questions:

a) If I want to boot from every disk in case of error, I have to set up grub on every disk, so that if the controller selects that disk for booting, the rootpool can be loaded from it.

b) What is the best way to replace a disk as fast as possible? Adding a disk as a hot spare for the raidz is a good idea, but I would also like to replace a disk at runtime as simply as possible. The problem is that for the root pool the disks are labeled (the slices thingy), so I cannot simply detach the volumes, replace the disk, and attach them again; I have to format the disk so that the slicing exists. Is there some clever way to automatically re-label a replacement disk? I found out that storing or getting the label information from another disk should work:

prtvtoc /dev/rdsk/s2 | fmthard -s - /dev/rdsk/s2

For instance, I could simply store the labels of all disks in the root pool, which should be available as long as any of the 8 disks is still available. So in case of a repair I simply have to fmthard the disk before attaching the replacement.

c) si 3224 related question: is it possible to simply hot swap the disk? (I have the disks in special hot-swappable units, but have no experience with hot swapping under Solaris, so I would like some feedback.)

d) Do you have best practices for systems like the one above? What are the best resources on the web for learning about monitoring the health of a ZFS system (like email notifications in case of disk failures)?

thanks in advance
-- Jakob
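[Editor's note: a hedged sketch of the relabel-and-reattach steps for questions (a) and (b), assuming an x86 system booting via grub. The disk names (c2t0d0 as a surviving mirror member, c2t4d0 as the replacement) are placeholders for Jakob's actual c-t-d numbers.]

# (b) Copy the slice table from a surviving disk to the replacement
prtvtoc /dev/rdsk/c2t0d0s2 | fmthard -s - /dev/rdsk/c2t4d0s2

# (a) Install grub on the replacement's root slice so it stays bootable
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t4d0s0

# Rejoin the slices to their pools (same device names, so one-argument replace)
zpool replace rootpool c2t4d0s0
zpool replace datapool c2t4d0s1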