Re: [zfs-discuss] Logical Units and ZFS send / receive

2010-08-04 Thread Terry Hull



> From: Richard Elling 
> Date: Wed, 4 Aug 2010 18:40:49 -0700
> To: Terry Hull 
> Cc: "zfs-discuss@opensolaris.org" 
> Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive
> 
> On Aug 4, 2010, at 1:27 PM, Terry Hull wrote:
>>> From: Richard Elling 
>>> Date: Wed, 4 Aug 2010 11:05:21 -0700
>>> Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive
>>> 
>>> On Aug 3, 2010, at 11:58 PM, Terry Hull wrote:
 I have a logical unit created with sbdadm create-lu that I am replicating
 with zfs send / receive between 2 build 134 hosts.   These LUs are iSCSI
 targets used as VMFS filesystems and ESX RDMs mounted on a Windows 2003
 machine.   The zfs pool names are the same on both machines.  The replication
 seems to be going correctly.  However, when I try to use the LUs on the
 server I am replicating the data to, I have issues.   Here is the scenario:
 
 The LUs are created as sparse.  Here is the process I'm going through after
 the snapshots are replicated to a secondary machine:
>>> 
>>> How did you replicate? In b134, the COMSTAR metadata is placed in
>>> hidden parameters in the dataset. These are not transferred via zfs send,
>>> by default.  This metadata includes the LU.
>>> -- richard
>> 
>> Does the -p option on the zfs send solve that problem?
> 
> I am unaware of a "zfs send -p" option.  Did you mean the -R option?
> 
> The LU metadata is stored in the stmf_sbd_lu property.  You should be able
> to get/set it.
> 

On the source machine I did a

zfs get -H stmf_sbd_lu pool-name

In my case that gave me:

tank/iscsi/bg-man5-vmfs stmf_sbd_lu
554c4442534e555307020702
010001843000b7010100ff862005
00c01200
180009fff1030010600144f0fa354000
4c4f9edb0003
7461
6e6b2f69736373692f62672d6d616e352d766d6673002f6465762f7a766f6c2f7264736b2f74
616e6b2f69736373692f62672d6d616e352d766d667300e70100
002200ff080 local

(But it was all one line.)

I cut the numeric section out above and then did a

zfs set stmf_sbd_lu=(above cut section) pool_name

and that seemed to work.  However, when I did a

stmfadm import-lu /dev/zvol/rdsk/pool

I still get a meta data error.

However, when I do a zfs get -H stmf_sbd_lu pool_name on the secondary
system, it now matches the results on the first system.

BTW:  The zfs send -p option is described as "Send Properties"

It seems like this should not be so hard to transfer an LU with zfs
send/receive.   
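
For reference, here is the full sequence I am attempting, with example
dataset and snapshot names:

On the source host:
zfs snapshot tank/iscsi/lu1@repl
zfs send tank/iscsi/lu1@repl | ssh secondary zfs receive tank/iscsi/lu1
zfs get -H -o value stmf_sbd_lu tank/iscsi/lu1     (copy the hex string)

On the secondary host:
zfs set stmf_sbd_lu=<hex string from the source> tank/iscsi/lu1
stmfadm import-lu /dev/zvol/rdsk/tank/iscsi/lu1
stmfadm list-lu -v

The import-lu step is the one that still fails with the meta data error.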


>> What else is not sent
>> by default?   In other words, am I better off sending the metadata with the
>> zfs send, or am I better off just creating the GUID once I get the data
>> transferred?  
> 
> I don't think this is a GUID issue.
>  -- richard
> 
> -- 
> Richard Elling
> rich...@nexenta.com   +1-760-896-4422
> Enterprise class storage for everyone
> www.nexenta.com
> 
--
Terry Hull


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Splitting root mirror to prep for re-install

2010-08-04 Thread Chris Josephes
> You can also use the "zpool split" command and save
> yourself having to do the zfs send|zfs recv step -
> all the data will be preserved.
> 
> "zpool split rpool preserve" does essentially
> everything up to and including the "zpool export
> preserve" commands you listed in your original email.
>  Just don't try to boot off it.

Gotta love OpenSolaris.

Just did a test run with "zpool split -n rpool preserve", and it looks like 
it'd be the easiest way to go about the process.
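
Roughly the sequence I plan to run (only the dry run is tested so far):

zpool split -n rpool preserve      # dry run, shows the would-be config
zpool split rpool preserve         # real split; the new pool is left exported
zpool import preserve              # bring it in to verify the data
zfs list -r preserve
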
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Splitting root mirror to prep for re-install

2010-08-04 Thread Chris Josephes
>
> 
> So, after rebuilding, you don't want to restore the
> same OS that you're
> currently running.  But there are some files you'd
> like to save for after
> you reinstall.  Why not just copy them off somewhere,
> in a tarball or
> something like that?

It's about 200+ gigs of files.  If I had a third drive, empty for all this, I'd 
do that in a heartbeat.

> 
> 
> > Given a rpool with disks c7d0s0 and c6d0s0, I think
> the following
> > process will do what I need:
> > 
> > 1. Run these commands
> > 
> > # zpool detach rpool c6d0s0
> > # zpool create preserve c6d0s0
> 
> The only reason you currently have the rpool in a
> slice (s0) is because
> that's a requirement for booting.  If you aren't
> planning to boot from the
> device after breaking it off the mirror ... Maybe
> just use the whole device
> instead of the slice.
> 
> zpool create preserve c6d0
> 
> 
> > # zfs create export/home
> > # zfs send rpool/export/home | zfs receive
> preserve/home
> > # zfs send (other filesystems)
> > # zpool export preserve
> 
> These are not right.  It should be something more
> like this:
> zfs create -o readonly=on preserve/rpool_export_home
> zfs snapshot rpool/export/h...@fubarsnap
> zfs send rpool/export/h...@fubarsnap | zfs receive -F
> preserve/rpool_export_home
> 
> And finally
> zpool export preserve
> 

Good catch on the readonly.  The snapshot wouldn't hurt either.  The zfs 
manpage on svn_133 suggests that I could do the whole send/receive directly 
against the filesystems without a snapshot, but one extra step isn't going to 
hurt.


> 
> > 2. Build out new host with svn_134, placing new
> root pool on c6d0s0 (or
> > whatever it's called on the new SATA controller)
> 
> Um ... I assume that's just a type-o ... 
> Yes, install fresh.  No, don't overwrite the existing
> "preserve" disk.
> 

Yeah, typo.

> For that matter, why break the mirror at all?  Just
> install the OS again,
> onto a single disk, which implicitly breaks the
> mirror.  Then when it's all
> done, use "zpool import" to import the other half of
> the mirror, which you
> didn't overwrite.
> 

I was worried about how "zpool import" would identify it.  If I just detach the 
disk from the mirror, would it still consider itself a part of "rpool"?  If so, 
how would ZFS handle two disks that belong to two distinct pools with the same 
name?
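
If it does, I assume I'd have to go by the numeric pool id and rename it
on import, something like this (the id below is made up):

zpool import                              # lists importable pools with their ids
zpool import 6789012345678901234 oldrpool # import that half under a new name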

> 
> > 3. Run zpool import against "preserve", copy over
> data that should be
> > migrated.
> > 
> > 4. Rebuild the mirror by destroying the "preserve"
> pool and attaching
> > c7d0s0 to the rpool mirror.
> > 
> > Am I missing anything?
> 
> If you blow away the partition table of the 2nd disk
> (as I suggested above,
> but now retract) then you'll have to recreate the
> partition table of the
> second disk.  So you only attach s0 to s0.
> 
> After attaching, and resilvering, you'll want to
> installgrub on the 2nd
> disk, or else it won't be bootable after the first
> disk fails.  See the ZFS
> Troubleshooting Guide for details.

Yep.  I keep forgetting about the installgrub part.  And the future plan would 
be to use the whole disk instead of just a slice.
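
For my own notes, the rebuild should look roughly like this (device names
per my current layout):

zpool attach rpool c6d0s0 c7d0s0
(wait for the resilver to finish: zpool status rpool)
installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c7d0s0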

> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discu
> ss
>
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Logical Units and ZFS send / receive

2010-08-04 Thread Richard Elling
On Aug 4, 2010, at 1:27 PM, Terry Hull wrote:
>> From: Richard Elling 
>> Date: Wed, 4 Aug 2010 11:05:21 -0700
>> Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive
>> 
>> On Aug 3, 2010, at 11:58 PM, Terry Hull wrote:
>>> I have a logical unit created with sbdadm create-lu that I am replicating
>>> with zfs send  / receive between 2 build 134 hosts.   These LUs are iSCSI
>>> targets used as VMFS filesystems and ESX RDMs mounted on a Windows 2003
>>> machine.   The zfs pool names are the same on both machines.  The replication
>>> seems to be going correctly.  However, when I try to use the LUs on the
>>> server I am replicating the data to, I have issues.   Here is the scenario:
>>> 
>>> The LUs are created as sparse.  Here is the process I’m going through after
>>> the snapshots are replicated to a secondary machine:
>> 
>> How did you replicate? In b134, the COMSTAR metadata is placed in
>> hidden parameters in the dataset. These are not transferred via zfs send,
>> by default.  This metadata includes the LU.
>> -- richard
> 
> Does the -p option on the zfs send solve that problem?

I am unaware of a "zfs send -p" option.  Did you mean the -R option?

The LU metadata is stored in the stmf_sbd_lu property.  You should be able
to get/set it.

> What else is not sent
> by default?   In other words, am I better off sending the metadata with the
> zfs send, or am I better off just creating the GUID once I get the data
> transferred?  

I don't think this is a GUID issue.
 -- richard

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Splitting root mirror to prep for re-install

2010-08-04 Thread Mark Musante

You can also use the "zpool split" command and save yourself having to do the 
zfs send|zfs recv step - all the data will be preserved.

"zpool split rpool preserve" does essentially everything up to and including 
the "zpool export preserve" commands you listed in your original email.  Just 
don't try to boot off it.

On 4 Aug 2010, at 20:58, Edward Ned Harvey wrote:

>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Chris Josephes
>> 
>> I have a host running svn_133 with a root mirror pool that I'd like to
>> rebuild with a fresh install on new hardware; but I still have data on
>> the pool that I would like to preserve.
> 
> So, after rebuilding, you don't want to restore the same OS that you're
> currently running.  But there are some files you'd like to save for after
> you reinstall.  Why not just copy them off somewhere, in a tarball or
> something like that?
> 
> 
>> Given a rpool with disks c7d0s0 and c6d0s0, I think the following
>> process will do what I need:
>> 
>> 1. Run these commands
>> 
>> # zpool detach rpool c6d0s0
>> # zpool create preserve c6d0s0
> 
> The only reason you currently have the rpool in a slice (s0) is because
> that's a requirement for booting.  If you aren't planning to boot from the
> device after breaking it off the mirror ... Maybe just use the whole device
> instead of the slice.
> 
> zpool create preserve c6d0
> 
> 
>> # zfs create export/home
>> # zfs send rpool/export/home | zfs receive preserve/home
>> # zfs send (other filesystems)
>> # zpool export preserve
> 
> These are not right.  It should be something more like this:
> zfs create -o readonly=on preserve/rpool_export_home
> zfs snapshot rpool/export/h...@fubarsnap
> zfs send rpool/export/h...@fubarsnap | zfs receive -F
> preserve/rpool_export_home
> 
> And finally
> zpool export preserve
> 
> 
>> 2. Build out new host with svn_134, placing new root pool on c6d0s0 (or
>> whatever it's called on the new SATA controller)
> 
> Um ... I assume that's just a type-o ... 
> Yes, install fresh.  No, don't overwrite the existing "preserve" disk.
> 
> For that matter, why break the mirror at all?  Just install the OS again,
> onto a single disk, which implicitly breaks the mirror.  Then when it's all
> done, use "zpool import" to import the other half of the mirror, which you
> didn't overwrite.
> 
> 
>> 3. Run zpool import against "preserve", copy over data that should be
>> migrated.
>> 
>> 4. Rebuild the mirror by destroying the "preserve" pool and attaching
>> c7d0s0 to the rpool mirror.
>> 
>> Am I missing anything?
> 
> If you blow away the partition table of the 2nd disk (as I suggested above,
> but now retract) then you'll have to recreate the partition table of the
> second disk.  So you only attach s0 to s0.
> 
> After attaching, and resilvering, you'll want to installgrub on the 2nd
> disk, or else it won't be bootable after the first disk fails.  See the ZFS
> Troubleshooting Guide for details.
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Splitting root mirror to prep for re-install

2010-08-04 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Chris Josephes
> 
> I have a host running svn_133 with a root mirror pool that I'd like to
> rebuild with a fresh install on new hardware; but I still have data on
> the pool that I would like to preserve.

So, after rebuilding, you don't want to restore the same OS that you're
currently running.  But there are some files you'd like to save for after
you reinstall.  Why not just copy them off somewhere, in a tarball or
something like that?


> Given a rpool with disks c7d0s0 and c6d0s0, I think the following
> process will do what I need:
> 
> 1. Run these commands
> 
> # zpool detach rpool c6d0s0
> # zpool create preserve c6d0s0

The only reason you currently have the rpool in a slice (s0) is because
that's a requirement for booting.  If you aren't planning to boot from the
device after breaking it off the mirror ... Maybe just use the whole device
instead of the slice.

zpool create preserve c6d0


> # zfs create export/home
> # zfs send rpool/export/home | zfs receive preserve/home
> # zfs send (other filesystems)
> # zpool export preserve

These are not right.  It should be something more like this:
zfs create -o readonly=on preserve/rpool_export_home
zfs snapshot rpool/export/h...@fubarsnap
zfs send rpool/export/h...@fubarsnap | zfs receive -F
preserve/rpool_export_home

And finally
zpool export preserve


> 2. Build out new host with svn_134, placing new root pool on c6d0s0 (or
> whatever it's called on the new SATA controller)

Um ... I assume that's just a type-o ... 
Yes, install fresh.  No, don't overwrite the existing "preserve" disk.

For that matter, why break the mirror at all?  Just install the OS again,
onto a single disk, which implicitly breaks the mirror.  Then when it's all
done, use "zpool import" to import the other half of the mirror, which you
didn't overwrite.


> 3. Run zpool import against "preserve", copy over data that should be
> migrated.
> 
> 4. Rebuild the mirror by destroying the "preserve" pool and attaching
> c7d0s0 to the rpool mirror.
> 
> Am I missing anything?

If you blow away the partition table of the 2nd disk (as I suggested above,
but now retract) then you'll have to recreate the partition table of the
second disk.  So you only attach s0 to s0.

After attaching, and resilvering, you'll want to installgrub on the 2nd
disk, or else it won't be bootable after the first disk fails.  See the ZFS
Troubleshooting Guide for details.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Splitting root mirror to prep for re-install

2010-08-04 Thread Chris Josephes
I have a host running svn_133 with a root mirror pool that I'd like to rebuild 
with a fresh install on new hardware; but I still have data on the pool that I 
would like to preserve.

Given a rpool with disks c7d0s0 and c6d0s0, I think the following process will 
do what I need:

1. Run these commands

# zpool detach rpool c6d0s0
# zpool create preserve c6d0s0
# zfs create export/home
# zfs send rpool/export/home | zfs receive preserve/home
# zfs send (other filesystems)
# zpool export preserve

2. Build out new host with svn_134, placing new root pool on c6d0s0 (or 
whatever it's called on the new SATA controller)

3. Run zpool import against "preserve", copy over data that should be migrated.

4. Rebuild the mirror by destroying the "preserve" pool and attaching c7d0s0 to 
the rpool mirror.


Am I missing anything?

--
Chris
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Richard Elling
On Aug 4, 2010, at 9:03 AM, Eduardo Bragatto wrote:

> On Aug 4, 2010, at 12:26 AM, Richard Elling wrote:
> 
>> The tipping point for the change in the first fit/best fit allocation 
>> algorithm is
>> now 96%. Previously, it was 70%. Since you don't specify which OS, build,
>> or zpool version, I'll assume you are on something modern.
> 
> I'm running Solaris 10 10/09 s10x_u8wos_08a, ZFS Pool version 15.

Then the first fit/best fit threshold is 96%.

>> NB, "zdb -m" will show the pool's metaslab allocations. If there are no 100%
>> free metaslabs, then it is a clue that the allocator might be working extra 
>> hard.
> 
> On the first two VDEVs there are no allocations 100% free (most are nearly 
> full)... The two newer ones, however, do have several allocations of 128GB 
> each, 100% free.
> 
> If I understand correctly in that scenario the allocator will work extra, is 
> that correct?

Yes, and this can be measured, but...

>> OK, so how long are they waiting?  Try "iostat -zxCn" and look at the
>> asvc_t column.  This will show how the disk is performing, though it
>> won't show the performance delivered by the file system to the
>> application.  To measure the latter, try "fsstat zfs" (assuming you are
>> on a Solaris distro)
> 
> Checking with iostat, I noticed the average wait time to be between 40ms and 
> 50ms for all disks. Which doesn't seem too bad.

... actually, that is pretty bad.  Look for an average around 10 ms and peaks
around 20ms.  Solve this problem first -- the system can do a huge amount of
allocations for any algorithm in 1ms.
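
For example, 10-second samples:

iostat -zxCn 10

and, as with most stat tools, skip the first report -- it is the average
since boot.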

> And this is the output of fsstat:
> 
> # fsstat zfs
> new  name   name  attr  attr lookup rddir  read read  write write
> file remov  chng   get   set    ops   ops   ops bytes   ops bytes
> 3.26M 1.34M 3.22M  161M 13.4M  1.36G  9.6M 10.5M  899G 22.0M  625G zfs

Unfortunately, the first line is useless, it is the summary since boot.  Try 
adding a sample interval to see how things are moving now.
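
For example:

fsstat zfs 10

gives 10-second samples after that first since-boot line.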

> 
> However I did have CPU spikes at 100% where the kernel was taking all cpu 
> time.

Again, this can be analyzed using baseline performance analysis techniques.
The "prstat" command should show how CPU is being used.  I'm not running
Solaris 10 10/09, but IIRC, it has the ZFS enhancement where CPU time is 
attributed to the pool, as seen in prstat.
 -- richard

> 
> I have reduced my zfs_arc_max parameter as it seemed the applications were 
> struggling for RAM and things are looking better now
> 
> Thanks for your time,
> Eduardo Bragatto.
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Bob Friesenhahn

On Wed, 4 Aug 2010, Eduardo Bragatto wrote:


I will also start using rsync v3 to reduce the memory foot print, so I might 
be able to give back some RAM to ARC, and I'm thinking maybe going to 16GB 
RAM, as the pool is quite large and I'm sure more ARC wouldn't hurt.


It is definitely a wise idea to use rsync v3.  Previous versions had 
to recurse the whole tree on both sides (storing what was 
learned in memory) before doing anything.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Corrupt file without filename

2010-08-04 Thread Cindy Swearingen

Because this is a non-redundant root pool, you should still
check fmdump -eV to make sure the corrupted files aren't
due to some ongoing disk problems.

cs

On 08/04/10 13:45, valrh...@gmail.com wrote:

Oooh... Good call!

I scrubbed the pool twice, then it showed a real filename from an old snapshot 
that I had attempted to delete before (like a month ago), and gave an error, 
which I subsequently forgot about. I deleted the snapshot and cleaned up a few 
other snapshots, cleared the error, rescrubbed. And now, no more corrupt file. 
Nice!

Love this forum... thanks so much!

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Logical Units and ZFS send / receive

2010-08-04 Thread Terry Hull

> From: Richard Elling 
> Date: Wed, 4 Aug 2010 11:05:21 -0700
> Subject: Re: [zfs-discuss] Logical Units and ZFS send / receive
> 
> On Aug 3, 2010, at 11:58 PM, Terry Hull wrote:
>> I have a logical unit created with sbdadm create-lu that I am replicating
>> with zfs send  / receive between 2 build 134 hosts.   These LUs are iSCSI
>> targets used as VMFS filesystems and ESX RDMs mounted on a Windows 2003
>> machine.   The zfs pool names are the same on both machines.  The replication
>> seems to be going correctly.  However, when I try to use the LUs on the
>> server I am replicating the data to, I have issues.   Here is the scenario:
>> 
>> The LUs are created as sparse.  Here is the process I'm going through after
>> the snapshots are replicated to a secondary machine:
> 
> How did you replicate? In b134, the COMSTAR metadata is placed in
> hidden parameters in the dataset. These are not transferred via zfs send,
> by default.  This metadata includes the LU.
>  -- richard

Does the -p option on the zfs send solve that problem? What else is not sent
by default?   In other words, am I better off sending the metadata with the
zfs send, or am I better off just creating the GUID once I get the data
transferred?  

--
Terry Hull
Network Resource Group, Inc.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Corrupt file without filename

2010-08-04 Thread valrh...@gmail.com
Oooh... Good call!

I scrubbed the pool twice, then it showed a real filename from an old snapshot 
that I had attempted to delete before (like a month ago), and gave an error, 
which I subsequently forgot about. I deleted the snapshot and cleaned up a few 
other snapshots, cleared the error, rescrubbed. And now, no more corrupt file. 
Nice!

Love this forum... thanks so much!
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to identify user-created zfs filesystems?

2010-08-04 Thread Mark J Musante


You can use 'zpool history -l syspool' to show the username of the person 
who created the dataset.  The history is in a ring buffer, so if too many 
pool operations have happened since the dataset was created, the 
information is lost.
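
For example, to see just the creates and who ran them:

zpool history -l syspool | grep 'zfs create'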



On Wed, 4 Aug 2010, Peter Taps wrote:


Folks,

In my application, I need to present user-created filesystems. For my test, I created a 
zfs pool called mypool and two file systems called cifs1 and cifs2. However, when I run 
"zfs list," I see a lot more entries:

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.31M  1.95G    33K  /volumes/mypool
mypool/cifs1            1.12M  1.95G  1.12M  /volumes/mypool/cifs1
mypool/cifs2              44K  1.95G    44K  /volumes/mypool/cifs2
syspool                 3.58G  4.23G  35.5K  legacy
syspool/dump             716M  4.23G   716M  -
syspool/rootfs-nmu-000  1.85G  4.23G  1.36G  legacy
syspool/rootfs-nmu-001  53.5K  4.23G  1.15G  legacy
syspool/swap            1.03G  5.19G  71.4M  -

I just need to present cifs1 and cifs2 to the user. Is there a property on the 
filesystem that I can use to determine user-created filesystems?

Thank you in advance for your help.

Regards,
Peter
--
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to identify user-created zfs filesystems?

2010-08-04 Thread Cindy Swearingen

Hi Peter,

I don't think we have any property that determines who created
the file system.

Would this work instead:

# zfs list -r mypool
NAME           USED  AVAIL  REFER  MOUNTPOINT
mypool         172K   134G    33K  /mypool
mypool/cifs1    31K   134G    31K  /mypool/cifs1
mypool/cifs2    31K   134G    31K  /mypool/cifs2

Or, take a look at user properties, which is text that
you can apply to a file system for whatever purpose you
choose.
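
For example, something like this (the property name is arbitrary, it just
has to contain a colon):

# zfs set info:created-by=user mypool/cifs1
# zfs set info:created-by=user mypool/cifs2
# zfs get -r -s local -o name,value info:created-by mypool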

Thanks,

Cindy


On 08/04/10 12:55, Peter Taps wrote:

Folks,

In my application, I need to present user-created filesystems. For my test, I created a 
zfs pool called mypool and two file systems called cifs1 and cifs2. However, when I run 
"zfs list," I see a lot more entries:

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.31M  1.95G    33K  /volumes/mypool
mypool/cifs1            1.12M  1.95G  1.12M  /volumes/mypool/cifs1
mypool/cifs2              44K  1.95G    44K  /volumes/mypool/cifs2
syspool                 3.58G  4.23G  35.5K  legacy
syspool/dump             716M  4.23G   716M  -
syspool/rootfs-nmu-000  1.85G  4.23G  1.36G  legacy
syspool/rootfs-nmu-001  53.5K  4.23G  1.15G  legacy
syspool/swap            1.03G  5.19G  71.4M  -

I just need to present cifs1 and cifs2 to the user. Is there a property on the 
filesystem that I can use to determine user-created filesystems?

Thank you in advance for your help.

Regards,
Peter

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Eduardo Bragatto

On Aug 4, 2010, at 11:18 AM, Bob Friesenhahn wrote:

Assuming that your impressions are correct, are you sure that your  
new disk drives are similar to the older ones?  Are they an  
identical model?  Design trade-offs are now often resulting in  
larger capacity drives with reduced performance.


Yes, the disks are the same, no problems there.


On Aug 4, 2010, at 2:11 PM, Bob Friesenhahn wrote:


On Wed, 4 Aug 2010, Eduardo Bragatto wrote:


Checking with iostat, I noticed the average wait time to be between  
40ms and 50ms for all disks. Which doesn't seem too bad.


Actually, this is quite high.  I would not expect such long wait  
times except for when under extreme load such as a benchmark.  If  
the wait times are this long under normal use, then there is  
something wrong.


That's a backup server, I usually have 10 rsync instances running  
simultaneously so there's a lot of random disk access going on -- I  
think that explains the high average time. Also, I recently enabled  
graphing of the IOPS per disk (reading it using net-snmp) and I see  
most disks are operating near their limit -- except for some disks  
from the older VDEVs which is what I'm trying to address here.


However I did have CPU spikes at 100% where the kernel was taking  
all cpu time.


I have reduced my zfs_arc_max parameter as it seemed the  
applications were struggling for RAM and things are looking better  
now


Odd.  What type of applications are you running on this system?  Are  
applications running on the server competing with client accesses?



I noticed some of those rsync processes were using almost 1GB of RAM  
each and the server has only 8GB. I started seeing the server swapping  
a bit during the cpu spikes at 100%, so I figured it would be better  
to cap ARC and leave some room for the rsync processes.
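
For the record, capping the ARC is just an /etc/system entry, something
like this (4 GB here as an example; the value is in bytes and takes
effect after a reboot):

set zfs:zfs_arc_max = 0x100000000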


I will also start using rsync v3 to reduce the memory foot print, so I  
might be able to give back some RAM to ARC, and I'm thinking maybe  
going to 16GB RAM, as the pool is quite large and I'm sure more ARC  
wouldn't hurt.


Thanks,
Eduardo Bragatto.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] How to identify user-created zfs filesystems?

2010-08-04 Thread Peter Taps
Folks,

In my application, I need to present user-created filesystems. For my test, I 
created a zfs pool called mypool and two file systems called cifs1 and cifs2. 
However, when I run "zfs list," I see a lot more entries:

# zfs list
NAME                     USED  AVAIL  REFER  MOUNTPOINT
mypool                  1.31M  1.95G    33K  /volumes/mypool
mypool/cifs1            1.12M  1.95G  1.12M  /volumes/mypool/cifs1
mypool/cifs2              44K  1.95G    44K  /volumes/mypool/cifs2
syspool                 3.58G  4.23G  35.5K  legacy
syspool/dump             716M  4.23G   716M  -
syspool/rootfs-nmu-000  1.85G  4.23G  1.36G  legacy
syspool/rootfs-nmu-001  53.5K  4.23G  1.15G  legacy
syspool/swap            1.03G  5.19G  71.4M  -

I just need to present cifs1 and cifs2 to the user. Is there a property on the 
filesystem that I can use to determine user-created filesystems?

Thank you in advance for your help.

Regards,
Peter
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] vdev using more space

2010-08-04 Thread Karl Rossing

 Hi,

We have a server running b134. The server runs xen and uses a vdev as 
the storage.


The xen image is running nevada 134.

I took a snapshot last night to move the xen image to another server.

NAME USED  AVAIL  REFER  MOUNTPOINT
vpool/host/snv_130 32.8G  11.3G  37.7G  -
vpool/host/snv_...@2010-03-31  3.27G  -  13.8G  -
vpool/host/snv_...@2010-08-03   436M  -  37.7G  -

It's also worth noting that vpool/host/snv_130 is a clone of at least two
other snapshots.


I then did a zfs send of vpool/host/snv_...@2010-08-03 and got a 39GB file.
A zfs send of vpool/host/snv_...@2010-03-31 gave a file of 15GB.

I don't understand why the file is 39GB, since df -h inside the xen
image on vpool/host/snv_130 shows:

Filesystem            size   used  avail  capacity  Mounted on
rpool/ROOT/snv_130     39G    12G    22G       35%  /

It would be nice if the zfs send file were roughly the same size as the
space used inside the xen machine.
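
For what it's worth, this is the space accounting I'm looking at to try
to explain the difference:

zfs get used,referenced,usedbysnapshots,usedbydataset,origin vpool/host/snv_130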


Karl

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS performance Tuning

2010-08-04 Thread Richard Elling
On Aug 4, 2010, at 3:22 AM, TAYYAB REHMAN wrote:
> Hi,
> I am working with ZFS these days and I am facing some performance issues
> reported by the application team: they say writes are very slow on ZFS
> compared with UFS. Kindly send me some good references or book links. I
> will be very thankful to you.

Hi Tayyab, 
Please start with the ZFS Best Practices Guide.
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Bob Friesenhahn

On Wed, 4 Aug 2010, Eduardo Bragatto wrote:


Checking with iostat, I noticed the average wait time to be between 40ms and 
50ms for all disks. Which doesn't seem too bad.


Actually, this is quite high.  I would not expect such long wait times 
except for when under extreme load such as a benchmark.  If the wait 
times are this long under normal use, then there is something wrong.


However I did have CPU spikes at 100% where the kernel was taking all cpu 
time.


I have reduced my zfs_arc_max parameter as it seemed the applications were 
struggling for RAM and things are looking better now


Odd.  What type of applications are you running on this system?  Are 
applications running on the server competing with client accesses?


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Logical Units and ZFS send / receive

2010-08-04 Thread Richard Elling
On Aug 3, 2010, at 11:58 PM, Terry Hull wrote:
> I have a logical unit created with sbdadm create-lu that I am replicating 
> with zfs send  / receive between 2 build 134 hosts.   These LUs are iSCSI 
> targets used as VMFS filesystems and ESX RDMs mounted on a Windows 2003 
> machine.   The zfs pool names are the same on both machines.  The replication 
> seems to be going correctly.  However, when I try to use the LUs on the 
> server I am replicating the data to, I have issues.   Here is the scenario:  
> 
> The LUs are created as sparse.  Here is the process I’m going through after 
> the snapshots are replicated to a secondary machine:

How did you replicate? In b134, the COMSTAR metadata is placed in 
hidden parameters in the dataset. These are not transferred via zfs send,
by default.  This metadata includes the LU.
 -- richard

>   • Original machine:  svccfg export -a stmf > /tmp/stmf.cfg
>   • Copy stmf.cfg to second machine:
>   • Secondary machine:  svcadm disable stmf
>   • svccfg delete xtmf
>   • cd /var/svc/manifest
>   • svccfg import system/stmf.xml
>   • svcadm disable stmf
>   • svcadm import /tmp/stmf.cfg
> 
> At this point stmfadm list-lu -v shows the SCSI LUs all as “unregistered” 
> 
> When I try to import the LUs I get: stmfadm: meta data error
> 
> I am using the command:
> stmfadm import-lu /dev/zvol/rdsk/pool-name
> 
> to import the LU
> 
> It is as if the pool does not exist.  However, I can verify that the pool 
> does actually exist with zfs list and with zfs list –t snapshot to show the 
> snapshot that I replicated.   
> 
> 
> Any suggestions?  
> --
> Terry Hull
> Network Resource Group, Inc.
> 
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

-- 
Richard Elling
rich...@nexenta.com   +1-760-896-4422
Enterprise class storage for everyone
www.nexenta.com



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LTFS and LTO-5 Tape Drives

2010-08-04 Thread David Magda
On Wed, August 4, 2010 12:25, valrh...@gmail.com wrote:
> Actually, no. I could care less about incrementals, and multivolume
> handling. My purpose is to have occasional, long-term archival backup of
> big experimental data sets. The challenge is keeping everything organized,
> and readable several years later, where I only need to recall a small
> subset of what's on the tape. The idea that the tape has a browseable
> filesystem is therefore extremely useful in principle.
>
> Has anyone actually tried this with OpenSolaris? The LTFS websites I've
> seen only talk about Mac and Linux support, but if it's supported on
> Linux, in principle the (open-source?) drivers should be portable, no?

I can understand the desire and convenience of a browsable file system,
but I'd trust the long-term accessibility of the (POSIX) tar format more
than most other things. Perhaps have one tape with tar, and other with
this LTFS thing, so you have your bases covered (e.g., in case one tape is
damaged, or if LTFS is just a buzzword/fad).
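
E.g., the tar copy could be as simple as this (device and path made up):

tar -cvf /dev/rmt/0 /export/data/run-2010-07
tar -tvf /dev/rmt/0     # verify the listing afterwards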

I'm assuming you're referring to:

http://en.wikipedia.org/wiki/Linear_Tape_File_System

If Linux and Mac (which can be considered a variant of FreeBSD) are
covered, then it should technically be possible to modify it to support
Solaris. I'm sure the authors of the software would be interested in
patches (assuming it's open-source).


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Corrupt file without filename

2010-08-04 Thread Cindy Swearingen

Maybe it is a temporary file.

You might try running a scrub to see if it goes away.

I would also use fmdump -eV to see if this disk is
having problems.

Thanks,

Cindy

On 08/04/10 01:05, valrh...@gmail.com wrote:

I have one corrupt file in my rpool, but when I run "zpool status -v", I don't 
get a filename, just an address. Any idea how to fix this? Here's the output:

p...@dellt7500:~# zpool status -v rpool 
  pool: rpool

 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  c4t0d0s0  ONLINE   0 0 0

errors: Permanent errors have been detected in the following files:

rpool/export/home/plu:<0x12491>
p...@dellt7500:~#

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-04 Thread Ross Walker
On Aug 4, 2010, at 12:04 PM, Roch  wrote:

> 
> Ross Walker writes:
>> On Aug 4, 2010, at 9:20 AM, Roch  wrote:
>> 
>>> 
>>> 
>>> Ross Asks: 
>>> So on that note, ZFS should disable the disks' write cache,
>>> not enable them  despite ZFS's COW properties because it
>>> should be resilient. 
>>> 
>>> No, because ZFS builds resiliency on top of unreliable parts. it's able to 
>>> deal
>>> with contained failures (lost state) of the disk write cache. 
>>> 
>>> It can then export LUNS that have WC enabled or
>>> disabled. But if we enable the WC on the exported LUNS, then
>>> the consumer of these LUNS must be able to say the same.
>>> The discussion at that level then needs to focus on failure groups.
>>> 
>>> 
>>> Ross also Said :
>>> I asked this question earlier, but got no answer: while an
>>> iSCSI target is presented WCE does it respect the flush
>>> command? 
>>> 
>>> Yes. I would like to say "obviously" but it's been anything
>>> but.
>> 
>> Sorry to probe further, but can you expand on but...
>> 
>> Just if we had a bunch of zvols exported via iSCSI to another Solaris
>> box which used them to form another zpool and had WCE turned on would
>> it be reliable? 
>> 
> 
> Nope. That's because all the iSCSI are in the same fault
> domain as they share a unified back-end cache. What works,
> in principle, is mirroring SCSI channels hosted on 
> different storage controllers (or N SCSI channels on N
> controller in a raid group).
> 
> Which is why keeping the WC set to the default, is really
> better in general.

Well I was actually talking about two backend Solaris storage servers serving 
up storage over iSCSI to a front-end Solaris server serving ZFS over NFS, so I 
have redundancy there, but want the storage to be performant, so I want the 
iSCSI to have WCE, yet I want it to be reliable and have it honor cache flush 
requests from the front-end NFS server.
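
Concretely, on each back-end box I was thinking of something along these
lines (I may have the property name wrong, and the GUID is a placeholder):

stmfadm list-lu -v                          # shows whether writeback cache is enabled
stmfadm modify-lu -p wcd=false <lu-guid>    # wcd = write cache disable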

Does that make sense? Is it possible?

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LTFS and LTO-5 Tape Drives

2010-08-04 Thread valrh...@gmail.com
Actually, no. I could care less about incrementals, and multivolume handling. 
My purpose is to have occasional, long-term archival backup of big experimental 
data sets. The challenge is keeping everything organized, and readable several 
years later, where I only need to recall a small subset of what's on the tape. 
The idea that the tape has a browseable filesystem is therefore extremely 
useful in principle.

Has anyone actually tried this with OpenSolaris? The LTFS websites I've seen 
only talk about Mac and Linux support, but if it's supported on Linux, in 
principle the (open-source?) drivers should be portable, no?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] snapshot space - miscalculation?

2010-08-04 Thread Scott Meilicke
Are there other file systems underneath daten/backups that have snapshots?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-04 Thread Roch

Ross Walker writes:
 > On Aug 4, 2010, at 9:20 AM, Roch  wrote:
 > 
 > > 
 > > 
 > >  Ross Asks: 
 > >  So on that note, ZFS should disable the disks' write cache,
 > >  not enable them  despite ZFS's COW properties because it
 > >  should be resilient. 
 > > 
 > > No, because ZFS builds resiliency on top of unreliable parts. it's able to 
 > > deal
 > > with contained failures (lost state) of the disk write cache. 
 > > 
 > > It can then export LUNS that have WC enabled or
 > > disabled. But if we enable the WC on the exported LUNS, then
 > > the consumer of these LUNS must be able to say the same.
 > > The discussion at that level then needs to focus on failure groups.
 > > 
 > > 
 > >  Ross also Said :
 > >  I asked this question earlier, but got no answer: while an
 > >  iSCSI target is presented WCE does it respect the flush
 > >  command? 
 > > 
 > > Yes. I would like to say "obviously" but it's been anything
 > > but.
 > 
 > Sorry to probe further, but can you expand on but...
 > 
 > Just if we had a bunch of zvols exported via iSCSI to another Solaris
 > box which used them to form another zpool and had WCE turned on would
 > it be reliable? 
 > 

Nope. That's because all the iSCSI are in the same fault
domain as they share a unified back-end cache. What works,
in principle, is mirroring SCSI channels hosted on 
different storage controllers (or N SCSI channels on N
controller in a raid group).

Which is why keeping the WC set to the default, is really
better in general.

-r

 > -Ross
 > 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Eduardo Bragatto

On Aug 4, 2010, at 12:20 AM, Khyron wrote:


I notice you use the word "volume" which really isn't accurate or
appropriate here.


Yeah, it didn't seem right to me, but I wasn't sure about the  
nomenclature, thanks for clarifying.



You may want to get a bit more specific and choose from the oldest
datasets THEN find the smallest of those oldest datasets and
send/receive it first.  That way, the send/receive completes in less
time, and when you delete the source dataset, you've now created
more free space on the entire pool but without the risk of a single
dataset exceeding your 10 TiB of workspace.


That makes sense, I'll try send/receiving a few of those datasets and  
see how it goes. I believe I can find the ones that were created  
before the two new VDEVs were added, by comparing the creation time  
from "zfs get creation"



ZFS' copy-on-write nature really wants no less than 20% free because
you never update data in place; a new copy is always written to disk.


Right, and my problem is that I have two VDEVs with less than 10% free  
at this point -- although the other two have around 50% free each.



You might want to consider turning on compression on your new datasets
too, especially if you have free CPU cycles to spare.  I don't know  
how
compressible your data is, but if it's fairly compressible, say lots  
of text,

then you might get some added benefit when you copy the old data into
the new datasets.  Saving more space, then deleting the source  
dataset,

should help your pool have more free space, and thus influence your
writes for better I/O balancing when you do the next (and the next)  
dataset

copies.


Unfortunately the data taking most of the space is already compressed,
so while I would gain some space from the many text files I also have,
those are not the majority of my content, and the effort would probably
not justify the small gain.


Thanks
Eduardo Bragatto
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Eduardo Bragatto

On Aug 4, 2010, at 12:26 AM, Richard Elling wrote:

The tipping point for the change in the first fit/best fit  
allocation algorithm is
now 96%. Previously, it was 70%. Since you don't specify which OS,  
build,

or zpool version, I'll assume you are on something modern.


I'm running Solaris 10 10/09 s10x_u8wos_08a, ZFS Pool version 15.

NB, "zdb -m" will show the pool's metaslab allocations. If there are  
no 100%
free metaslabs, then it is a clue that the allocator might be  
working extra hard.


On the first two VDEVs there are no allocations 100% free (most are  
nearly full)... The two newer ones, however, do have several  
allocations of 128GB each, 100% free.


If I understand correctly in that scenario the allocator will work  
extra, is that correct?



OK, so how long are they waiting?  Try "iostat -zxCn" and look at the
asvc_t column.  This will show how the disk is performing, though it
won't show the performance delivered by the file system to the
application.  To measure the latter, try "fsstat zfs" (assuming you  
are

on a Solaris distro)


Checking with iostat, I noticed the average wait time to be between  
40ms and 50ms for all disks. Which doesn't seem too bad.


And this is the output of fsstat:

# fsstat zfs
new  name   name  attr  attr lookup rddir  read read  write write
file remov  chng   get   set    ops   ops   ops bytes   ops bytes
3.26M 1.34M 3.22M  161M 13.4M  1.36G  9.6M 10.5M  899G 22.0M  625G zfs

However I did have CPU spikes at 100% where the kernel was taking all  
cpu time.


I have reduced my zfs_arc_max parameter as it seemed the applications  
were struggling for RAM and things are looking better now


Thanks for your time,
Eduardo Bragatto.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Restripe

2010-08-04 Thread Bob Friesenhahn

On Tue, 3 Aug 2010, Eduardo Bragatto wrote:

You're a funny guy. :)

Let me re-phrase it: I'm sure I'm getting degradation in performance as my 
applications are waiting more on I/O now than they used to do (based on CPU 
utilization graphs I have). The impression part is that the reason is the 
limited space in those two volumes -- as I said, I already experienced bad 
performance on zfs systems running nearly out of space before.


Assuming that your impressions are correct, are you sure that your new 
disk drives are similar to the older ones?  Are they an identical 
model?  Design trade-offs are now often resulting in larger capacity 
drives with reduced performance.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-04 Thread Ross Walker
On Aug 4, 2010, at 9:20 AM, Roch  wrote:

> 
> 
>  Ross Asks: 
>  So on that note, ZFS should disable the disks' write cache,
>  not enable them  despite ZFS's COW properties because it
>  should be resilient. 
> 
> No, because ZFS builds resiliency on top of unreliable parts. it's able to 
> deal
> with contained failures (lost state) of the disk write cache. 
> 
> It can then export LUNS that have WC enabled or
> disabled. But if we enable the WC on the exported LUNS, then
> the consumer of these LUNS must be able to say the same.
> The discussion at that level then needs to focus on failure groups.
> 
> 
>  Ross also Said :
>  I asked this question earlier, but got no answer: while an
>  iSCSI target is presented WCE does it respect the flush
>  command? 
> 
> Yes. I would like to say "obviously" but it's been anything
> but.

Sorry to probe further, but can you expand on but...

Just if we had a bunch of zvols exported via iSCSI to another Solaris box which 
used them to form another zpool and had WCE turned on would it be reliable?

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS performance Tuning

2010-08-04 Thread TAYYAB REHMAN
Hi,
I am working with ZFS these days and I am facing some performance issues
reported by the application team: they say writes are very slow on ZFS
compared with UFS. Kindly send me some good references or book links.
I will be very thankful to you.

BR,
Tayyab
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-04 Thread Matt Connolly
On 04/08/2010, at 2:13, Roch Bourbonnais  wrote:

> 
> Le 27 mai 2010 à 07:03, Brent Jones a écrit :
> 
>> On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
>>  wrote:
>>> I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:
>>> 
>>> sh-4.0# zfs create rpool/iscsi
>>> sh-4.0# zfs set shareiscsi=on rpool/iscsi
>>> sh-4.0# zfs create -s -V 10g rpool/iscsi/test
>>> 
>>> The underlying zpool is a mirror of two SATA drives. I'm connecting from a 
>>> Mac client with global SAN initiator software, connected via Gigabit LAN. 
>>> It connects fine, and I've initialised a mac format volume on that iScsi 
>>> volume.
>>> 
>>> Performance, however, is terribly slow, about 10 times slower than an SMB 
>>> share on the same pool. I expected it would be very similar, if not faster 
>>> than SMB.
>>> 
>>> Here's my test results copying 3GB data:
>>> 
>>> iScsi:  44m01s  1.185MB/s
>>> SMB share:  4m27s  11.73MB/s
>>> 
>>> Reading (the same 3GB) is also worse than SMB, but only by a factor of 
>>> about 3:
>>> 
>>> iScsi:  4m36s  11.34MB/s
>>> SMB share:  1m45s  29.81MB/s
>>> 
> 
>  
> 
> Not unexpected. Filesystems have readahead code to prefetch enough to cover 
> the latency of the read request. iSCSI only responds to the request.
> Put a filesystem on top of iscsi and try again.

As I indicated above, there is a mac filesystem on the iscsi volume.

Matt. 
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-04 Thread Roch

   
  Ross Asks: 
  So on that note, ZFS should disable the disks' write cache,
  not enable them  despite ZFS's COW properties because it
  should be resilient. 

No, because ZFS builds resiliency on top of unreliable parts. It's able to deal
with contained failures (lost state) of the disk write cache. 

It can then export LUNS that have WC enabled or
disabled. But if we enable the WC on the exported LUNS, then
the consumer of these LUNS must be able to say the same.
The discussion at that level then needs to focus on failure groups.


  Ross also Said :
  I asked this question earlier, but got no answer: while an
  iSCSI target is presented WCE does it respect the flush
  command? 

Yes. I would like to say "obviously" but it's been anything
but.


-r

Ross Walker writes:
 > On Aug 4, 2010, at 3:52 AM, Roch  wrote:
 > 
 > > 
 > > Ross Walker writes:
 > > 
 > >> On Aug 3, 2010, at 12:13 PM, Roch Bourbonnais  
 > >> wrote:
 > >> 
 > >>> 
 > >>> Le 27 mai 2010 à 07:03, Brent Jones a écrit :
 > >>> 
 >  On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
 >   wrote:
 > > I've set up an iScsi volume on OpenSolaris (snv_134) with these 
 > > commands:
 > > 
 > > sh-4.0# zfs create rpool/iscsi
 > > sh-4.0# zfs set shareiscsi=on rpool/iscsi
 > > sh-4.0# zfs create -s -V 10g rpool/iscsi/test
 > > 
 > > The underlying zpool is a mirror of two SATA drives. I'm connecting 
 > > from a Mac client with global SAN initiator software, connected via 
 > > Gigabit LAN. It connects fine, and I've initialiased a mac format 
 > > volume on that iScsi volume.
 > > 
 > > Performance, however, is terribly slow, about 10 times slower than an 
 > > SMB share on the same pool. I expected it would be very similar, if 
 > > not faster than SMB.
 > > 
 > > Here's my test results copying 3GB data:
 > > 
 > > iScsi:  44m01s  1.185MB/s
 > > SMB share:  4m27s  11.73MB/s
 > > 
 > > Reading (the same 3GB) is also worse than SMB, but only by a factor of 
 > > about 3:
 > > 
 > > iScsi:  4m36s  11.34MB/s
 > > SMB share:  1m45s  29.81MB/s
 > > 
 > >>> 
 > >>>  
 > >>> 
 > >>> Not unexpected. Filesystems have readahead code to prefetch enough to 
 > >>> cover the latency of the read request. iSCSI only responds to the 
 > >>> request.
 > >>> Put a filesystem on top of iscsi and try again.
 > >>> 
 > >>> For writes, iSCSI is synchronous and SMB is not. 
 > >> 
 > >> It may be with ZFS, but iSCSI is neither synchronous nor asynchronous is 
 > >> is simply SCSI over IP.
 > >> 
 > > 
 > > Hey Ross,
 > > 
 > > Nothing to do with ZFS here, but you're right to point out
 > > that iSCSI is neither. It was just that in the context of
 > > this test (and 99+% of iSCSI usage) it will be. SMB is
 > > not. Thus a large discrepancy on the write test.
 > > 
 > > Resilient storage, by default, should expose iSCSI channels
 > > with write caches disabled.
 > 
 > 
 > So on that note, ZFS should disable the disks' write cache, not enable them  
 > despite ZFS's COW properties because it should be resilient.
 > 
 > 
 > >> It is the application using the iSCSI protocol that
 > > determines whether it is synchronous, issue a flush after
 > > write, or asynchronous, wait until target flushes.
 > >> 
 > > 
 > > True.
 > > 
 > >> I think the ZFS developers didn't quite understand that
 > > and wanted strict guidelines like NFS has, but iSCSI doesn't
 > > have those, it is a lower level protocol than NFS is, so
 > > they forced guidelines on it and violated the standard. 
 > >> 
 > >> -Ross
 > >> 
 > > 
 > > Not True. 
 > > 
 > > 
 > > ZFS exposes LUNS (or ZVOL) and while at first we didn't support
 > > DKIOCSETWCE, we now do. So a ZFS LUN can be whatever you
 > > need it to be.
 > 
 > I asked this question earlier, but got no answer: while an iSCSI target is 
 > presented WCE does it respect the flush command?
 > 
 > > Now in the context of iSCSI luns hosted by a resilient
 > > storage system, enabling write caches is to be used only in
 > > very specific circumstances. The situation is not symmetrical
 > > with WCE in disks of a JBOD since that can be setup with
 > > enough redundancy to deal with potential data loss. When
 > > using a resilient storage, you need to trust the storage for
 > > persistence of SCSI commands and building a resilient system
 > > on top of write cache enabled SCSI channels is not trivial.
 > 
 > Not true, advertise WCE, support flush and tagged command queuing and the 
 > initiator will be able to use the resilient storage appropriate for its 
 > needs.
 > 
 > -Ross
 > 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-04 Thread Ross Walker
On Aug 4, 2010, at 3:52 AM, Roch  wrote:

> 
> Ross Walker writes:
> 
>> On Aug 3, 2010, at 12:13 PM, Roch Bourbonnais  
>> wrote:
>> 
>>> 
>>> Le 27 mai 2010 à 07:03, Brent Jones a écrit :
>>> 
 On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
  wrote:
> I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:
> 
> sh-4.0# zfs create rpool/iscsi
> sh-4.0# zfs set shareiscsi=on rpool/iscsi
> sh-4.0# zfs create -s -V 10g rpool/iscsi/test
> 
> The underlying zpool is a mirror of two SATA drives. I'm connecting from 
> a Mac client with global SAN initiator software, connected via Gigabit 
> LAN. It connects fine, and I've initialised a mac format volume on that 
> iScsi volume.
> 
> Performance, however, is terribly slow, about 10 times slower than an SMB 
> share on the same pool. I expected it would be very similar, if not 
> faster than SMB.
> 
> Here's my test results copying 3GB data:
> 
> iScsi:  44m01s  1.185MB/s
> SMB share:  4m27s  11.73MB/s
> 
> Reading (the same 3GB) is also worse than SMB, but only by a factor of 
> about 3:
> 
> iScsi:  4m36s  11.34MB/s
> SMB share:  1m45s  29.81MB/s
> 
>>> 
>>>  
>>> 
>>> Not unexpected. Filesystems have readahead code to prefetch enough to cover 
>>> the latency of the read request. iSCSI only responds to the request.
>>> Put a filesystem on top of iSCSI and try again.
>>> 
>>> For writes, iSCSI is synchronous and SMB is not. 
>> 
>> It may be with ZFS, but iSCSI is neither synchronous nor asynchronous; it is 
>> simply SCSI over IP.
>> 
> 
> Hey Ross,
> 
> Nothing to do with ZFS here, but you're right to point out
> that iSCSI is neither. It was just that in the context of
> this test (and 99+% of iSCSI usage) it will be. SMB is
> not. Thus a large discrepancy on the write test.
> 
> Resilient storage, by default, should expose iSCSI channels
> with write caches disabled.


So on that note, ZFS should disable the disks' write caches rather than enable 
them, despite ZFS's COW properties, because it should be resilient.


>> It is the application using the iSCSI protocol that determines whether it is 
>> synchronous (issue a flush after each write) or asynchronous (wait until the 
>> target flushes).
>> 
> 
> True.
> 
>> I think the ZFS developers didn't quite understand that and wanted strict 
>> guidelines like NFS has; but iSCSI doesn't have those, it is a lower-level 
>> protocol than NFS, so they forced guidelines onto it and violated the 
>> standard.
>> 
>> -Ross
>> 
> 
> Not True. 
> 
> 
> ZFS exposes LUNS (or ZVOL) and while at first we didn't support
> DKIOCSETWCE, we now do. So a ZFS LUN can be whatever you
> need it to be.

I asked this question earlier but got no answer: when an iSCSI target is 
presented as WCE, does it respect the flush command?

> Now in the context of iSCSI luns hosted by a resilient
> storage system, enabling write caches is to be used only in
> very specific circumstances. The situation is not symmetrical
> with WCE in disks of a JBOD since that can be setup with
> enough redundancy to deal with potential data loss. When
> using a resilient storage, you need to trust the storage for
> persistence of SCSI commands and building a resilient system
> on top of write cache enabled SCSI channels is not trivial.

Not true: advertise WCE, support flush and tagged command queuing, and the 
initiator will be able to use the resilient storage as appropriate for its needs.

-Ross

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LTFS and LTO-5 Tape Drives

2010-08-04 Thread Joerg Schilling
"valrh...@gmail.com"  wrote:

> Has anyone looked into the new LTFS on LTO-5 for tape backups? Any idea how 
> this would work with ZFS? I'm presuming ZFS send / receive are not going to 
> work. But it seems rather appealing to have the metadata stored properly with 
> the data, and to be able to browse files directly instead of having to rely 
> on backup software, however nice tar may be. Has anyone used this with 
> OpenSolaris, or have an opinion on how this would work in practice? Thanks!

What do you understand by "nice tar"?

For a backup, you need reliable incrementals (you get this from star)
and reliable multi-volume handling (this is what you also get from star).

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] iScsi slow

2010-08-04 Thread Roch

Ross Walker writes:

 > On Aug 3, 2010, at 12:13 PM, Roch Bourbonnais  
 > wrote:
 > 
 > > 
 > > Le 27 mai 2010 à 07:03, Brent Jones a écrit :
 > > 
 > >> On Wed, May 26, 2010 at 5:08 AM, Matt Connolly
 > >>  wrote:
 > >>> I've set up an iScsi volume on OpenSolaris (snv_134) with these commands:
 > >>> 
 > >>> sh-4.0# zfs create rpool/iscsi
 > >>> sh-4.0# zfs set shareiscsi=on rpool/iscsi
 > >>> sh-4.0# zfs create -s -V 10g rpool/iscsi/test
 > >>> 
 > >>> The underlying zpool is a mirror of two SATA drives. I'm connecting from 
 > >>> a Mac client with globalSAN initiator software, connected via Gigabit 
 > >>> LAN. It connects fine, and I've initialised a Mac-format volume on that 
 > >>> iScsi volume.
 > >>> 
 > >>> Performance, however, is terribly slow, about 10 times slower than an 
 > >>> SMB share on the same pool. I expected it would be very similar to, if 
 > >>> not faster than, SMB.
 > >>> 
 > >>> Here are my test results copying 3GB of data:
 > >>> 
 > >>> iScsi:      44m01s  1.185MB/s
 > >>> SMB share:   4m27s  11.73MB/s
 > >>> 
 > >>> Reading (the same 3GB) is also worse than SMB, but only by a factor of 
 > >>> about 3:
 > >>> 
 > >>> iScsi:       4m36s  11.34MB/s
 > >>> SMB share:   1m45s  29.81MB/s
 > >>> 
 > > 
 > >  
 > > 
 > > Not unexpected. Filesystems have readahead code to prefetch enough to 
 > > cover the latency of the read request. iSCSI only responds to the request.
 > > Put a filesystem on top of iSCSI and try again.
 > > 
 > > For writes, iSCSI is synchronous and SMB is not. 
 > 
 > It may be with ZFS, but iSCSI is neither synchronous nor asynchronous; it is 
 > simply SCSI over IP.
 > 

Hey Ross,

Nothing to do with ZFS here, but you're right to point out
that iSCSI is neither. It was just that in the context of
this test (and 99+% of iSCSI usage) it will be. SMB is
not. Thus a large discrepancy on the write test.

Resilient storage, by default, should expose iSCSI channels
with write caches disabled.

 > It is the application using the iSCSI protocol that determines whether it is 
 > synchronous (issue a flush after each write) or asynchronous (wait until the 
 > target flushes).
 > 

True.

 > I think the ZFS developers didn't quite understand that and wanted strict 
 > guidelines like NFS has; but iSCSI doesn't have those, it is a lower-level 
 > protocol than NFS, so they forced guidelines onto it and violated the 
 > standard.
 > 
 > -Ross
 > 

Not True. 


ZFS exposes LUNS (or ZVOL) and while at first we didn't support
DKIOCSETWCE, we now do. So a ZFS LUN can be whatever you
need it to be.
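
For what it's worth, a minimal sketch of how this is typically toggled on a
COMSTAR LU (assuming the wcd -- "write cache disabled" -- LU property and a
placeholder GUID; check stmfadm(1M) on your build before relying on it):

  # List LUs and their current settings; the writeback cache state shows
  # up in the verbose output:
  stmfadm list-lu -v

  # Disable the write cache on a given LU:
  stmfadm modify-lu -p wcd=true <lu-guid>

  # Or enable it, if the backing store is trusted to stay consistent:
  stmfadm modify-lu -p wcd=false <lu-guid>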

Now in the context of iSCSI luns hosted by a resilient
storage system, enabling write caches is to be used only in
very specific circumstances. The situation is not symmetrical
with WCE in disks of a JBOD since that can be setup with
enough redundancy to deal with potential data loss. When
using a resilient storage, you need to trust the storage for
persistence of SCSI commands and building a resilient system
on top of write cache enabled SCSI channels is not trivial.





Then Matt points out:

  As I indicated above, there is a Mac filesystem on the iSCSI volume.
  Matt. 


On the read side, single-threaded performance is very much controlled by the 
readahead. Each filesystem implements something different; the fact that you 
got 3X more throughput with SMB than with the Mac filesystem (HFS+?) simply 
means that SMB had a 3X larger readahead buffer than HFS+.
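
A quick way to see the readahead effect from the initiator side is to compare
a raw sequential read of the LUN with a read of the same data through the
filesystem mounted on it. A rough sketch, assuming a Mac initiator where the
LUN shows up as /dev/rdisk2 and is mounted at /Volumes/iscsitest (both names
are placeholders):

  # Raw read straight off the iSCSI LUN -- no filesystem prefetch involved:
  dd if=/dev/rdisk2 of=/dev/null bs=128k count=8192

  # The same amount of data read through a file on the filesystem sitting
  # on top of the LUN -- filesystem readahead now gets to hide the latency:
  dd if=/Volumes/iscsitest/bigfile of=/dev/null bs=128k count=8192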

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Corrupt file without filename

2010-08-04 Thread valrh...@gmail.com
I have one corrupt file in my rpool, but when I run "zpool status -v", I don't 
get a filename, just an address. Any idea how to fix this? Here's the output:

p...@dellt7500:~# zpool status -v rpool 
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
rpool   ONLINE   0 0 0
  c4t0d0s0  ONLINE   0 0 0

errors: Permanent errors have been detected in the following files:

rpool/export/home/plu:<0x12491>
p...@dellt7500:~#
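
One approach that is often suggested for this situation (a sketch, not a
guaranteed fix): the <0x12491> notation generally means the object no longer
maps to a live path, for example a file that has since been deleted or that
only exists in a snapshot. A scrub plus a status re-check, and optionally zdb
to map the object number (0x12491 is 74897 decimal) back to a name, can
narrow it down:

  zpool scrub rpool
  zpool status -v rpool      # re-check once the scrub completes

  # Try to map the object number to a filename; treat the output as a
  # diagnostic aid only:
  zdb -dddd rpool/export/home/plu 74897

  # If the error refers to a since-deleted file, clearing and re-scrubbing
  # usually removes the stale entry once no new errors are found:
  zpool clear rpool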
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Logical Units and ZFS send / receive

2010-08-04 Thread Terry Hull

I have a logical unit created with sbdadm create-lu that I am replicating
with zfs send / receive between two build 134 hosts.  These LUs are iSCSI
targets used as VMFS filesystems and as ESX RDMs mounted on a Windows 2003
machine.  The ZFS pool names are the same on both machines.  The replication
seems to be going correctly.  However, when I try to use the LUs on the
server I am replicating the data to, I have issues.  Here is the scenario:

The LUs are created as sparse.  Here is the process I'm going through after
the snapshots are replicated to a secondary machine:
* Original machine:  svccfg export -a stmf > /tmp/stmf.cfg
* Copy stmf.cfg to second machine:
* Secondary machine:  svcadm disable stmf
* svccfg delete stmf
* cd /var/svc/manifest
* svccfg import system/stmf.xml
* svcadm disable stmf
* svccfg import /tmp/stmf.cfg

At this point stmfadm list-lu -v shows the SCSI LUs all as "unregistered".

When I try to import the LUs I get: stmfadm: meta data error

I am using the command:
stmfadm import-lu /dev/zvol/rdsk/pool-name

to import the LU

It is as if the pool does not exist.  However, I can verify that the pool
does actually exist with zfs list and with zfs list -t snapshot to show the
snapshot that I replicated.
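
For reference, a sketch of the sanity checks worth running on the receiving
side before the import (dataset and path names below are placeholders):

  # Confirm the received zvol and its snapshot are really there:
  zfs list -t volume
  zfs list -t snapshot

  # import-lu expects the full zvol device path (pool plus any intermediate
  # dataset names), so make sure the device node exists:
  ls -lL /dev/zvol/rdsk/pool-name/dataset-name
  stmfadm import-lu /dev/zvol/rdsk/pool-name/dataset-name

  # Then see what COMSTAR thinks it has:
  stmfadm list-lu -v
  sbdadm list-lu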


Any suggestions?  
--
Terry Hull
Network Resource Group, Inc.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss