Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Hung-Sheng Tsao Ph.D.


One possible way (a rough command sketch follows the list):
1) break the mirror
2) install a new HDD and format it
3) create a new zpool on the new HDD with 4k blocks
4) create a new BE on the new pool with the old root pool as source (I am not
sure which version of Solaris or OpenSolaris you are using; the procedure
may differ depending on the version)
5) activate the new BE
6) boot the new BE
7) destroy the old zpool
8) replace the old HDD with the second new HDD
9) format that HDD
10) attach it to the new root pool
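
For illustration only, on a Solaris-ish system the steps might look roughly
like this; the device names (c0t1d0 old, c0t2d0 new), the pool name rpool2 and
the BE name newBE are placeholders, and steps 4/5 use lucreate/luactivate on
S10-era releases or beadm on S11:

   # zpool detach rpool c0t1d0s0              (1: break the mirror)
   # format c0t2d0                            (2: label the new disk, slice s0)
   # zpool create rpool2 c0t2d0s0             (3: new pool; force ashift=12 if needed)
   # lucreate -n newBE -p rpool2              (4: or 'beadm create -p rpool2 newBE' on S11)
   # luactivate newBE                         (5: or 'beadm activate newBE' on S11)
   # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t2d0s0   (x86: make it bootable)
   # init 6                                   (6: boot the new BE)
   # zpool destroy rpool                      (7: retire the old pool)
   ... swap in the second new disk, label it like the first (8-9), then:
   # zpool attach rpool2 c0t2d0s0 c0t1d0s0    (10: re-mirror)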
regards



On 6/15/2012 8:14 AM, Hans J Albertsson wrote:

I've got my root pool on a mirror on two 512-byte-blocksize disks.
I want to move the root pool to two 2 TB disks with 4k blocks.
The server only has room for two disks. I do have an eSATA connector,
though, and a suitable external cabinet for connecting one extra disk.


How would I go about migrating/expanding the root pool to the larger 
disks so I can then use the larger disks for booting?


I have no extra machine to use.



Sent from my Android mobile






Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Hans J Albertsson
I suppose I must start by labelling the new disk properly, and give the s0 
partition to zpool, so the new zpool can be booted?




Sent from my Android mobile

Hung-Sheng Tsao Ph.D. laot...@gmail.com wrote:
one possible way:
1) break the mirror
2) install a new HDD and format it
3) create a new zpool on the new HDD with 4k blocks
4) create a new BE on the new pool with the old root pool as source (not sure
which version of Solaris or OpenSolaris you are using; the procedure may
differ depending on the version)
5) activate the new BE
6) boot the new BE
7) destroy the old zpool
8) replace the old HDD with the second new HDD
9) format that HDD
10) attach it to the new root pool
regards





Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Jim Klimov

2012-06-15 16:14, Hans J Albertsson wrote:

I've got my root pool on a mirror on 2 512 byte blocksize disks.
I want to move the root pool to two 2 TB disks with 4k blocks.
The server only has room for two disks. I do have an esata connector,
though, and a suitable external cabinet for connecting one extra disk.

How would I go about migrating/expanding the root pool to the larger
disks so I can then use the larger disks for booting?

I have no extra machine to use.


I think this question was recently asked and discussed on another list;
my suggestion would be more low-level than that suggested by others:

0) Boot from a LiveCD/LiveUSB so that your rpool's environment
   doesn't change during the migration, and so that you can
   ultimately rename your new rpool to its old name.
   It is not fatal if you don't use a LiveMedia environment,
   but it can be problematic to rename a running rpool, and
   some of your programs might depend on its known name as
   recorded in some config file or service properties.

1) Break the existing mirror, reducing it to a single-disk pool

2) Install the new disk, slice it, create an rpool2 on it.
   NOTE that you might not want all 2TB to be the rpool2,
   but rather you might dedicate several tens of GBs to
   a root-pool partition or slice, and store the rest as a
   data pool - perhaps implemented with different choices
   on caching, dedup, etc.
   NOTE also that you might need to apply some tricks to
   enforce that the new pool uses ashift=12 if that (4KB)
   is your hardware native sector size. We had some info
   recently on the mailing lists and carried that over to
   illumos wiki: 
http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks
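
   (For illustration, the sd.conf override described on that wiki page looks
   roughly like the lines below; the vendor/product string is a placeholder
   and must match your drive exactly as reported by its inquiry data:

      # /kernel/drv/sd.conf excerpt; then run 'update_drv -vf sd' or reboot
      sd-config-list = "ATA     ST2000DM001-9YN1", "physical-block-size:4096";

   Afterwards a pool created on that disk should show ashift=12, e.g. via
   'zdb -C rpool2 | grep ashift'.)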


3) # zfs snapshot -r rpool@20120615-preMigration
4) # zfs send -R rpool@20120615-preMigration | \
 zfs recv -vFd rpool2
   NOTE this assumes you do want the whole old rpool into rpool2.
   If you decide you want something on a data pool, i.e. the
   /export/* datasets - you'd have to make that pool and send
   the datasets there in a similar manner, and send the root pool
   datasets not in one recursive command, but in several sets i.e.
   for rpool/ROOT and rpool/swap and rpool/dump in the default
   layout.

5) # zpool get all rpool
   # zpool get all rpool2

   Compare the pool settings. Carry over the local changes with
   # zpool set property=value rpool2
   You'll likely change bootfs, failmode, maybe some others.

6) installgrub onto the new disk so it becomes bootable

7) If you're on live media, try to rename the new rpool2 to
   become rpool, i.e.:
   # zpool export rpool2
   # zpool export rpool
   # zpool import -N rpool rpool2
   # zpool export rpool

8) Reboot, disconnecting your remaining old disk, and hope that
   the new pool boots okay. It should ;)
   When it's ok, attach the second new disk to the system and
   slice it similarly (prtvtoc|fmthard usually helps, google it).
   Then attach the new second disk's slices to your new rpool
   (and data pool if you've made one), installgrub onto the second
   disk - and you're done.
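
Pulling steps 3)-8) together at the command level, a rough sketch (device
names, the BE name 'mybe' and the pool names are placeholders made up for
illustration; adjust them to your own layout):

   # zfs snapshot -r rpool@20120615-preMigration
   # zfs send -R rpool@20120615-preMigration | zfs recv -vFd rpool2
   # zpool set bootfs=rpool2/ROOT/mybe rpool2
   # zpool set failmode=continue rpool2        (only if the old pool had it set)
   # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t0d0s0
       (x86/GRUB; on SPARC use installboot with the zfs bootblk instead)
   ... rename rpool2 to rpool as in step 7, reboot off the new disk, then
   clone the label onto the second new disk and re-mirror:
   # prtvtoc /dev/rdsk/c3t0d0s2 | fmthard -s - /dev/rdsk/c3t1d0s2
   # zpool attach rpool c3t0d0s0 c3t1d0s0
   # installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t1d0s0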

HTH,
//Jim Klimov


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Jim Klimov

2012-06-15 17:18, Jim Klimov wrote:

7) If you're on live media, try to rename the new rpool2 to
become rpool, i.e.:
# zpool export rpool2
# zpool export rpool
# zpool import -N rpool rpool2
# zpool export rpool


Oops, bad typo in the third line; it should be:

 # zpool export rpool2
 # zpool export rpool
 # zpool import -N rpool2 rpool
 # zpool export rpool

Sorry,
//Jim


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Johannes Totz
On 15/06/2012 13:22, Sašo Kiselkov wrote:
> On 06/15/2012 02:14 PM, Hans J Albertsson wrote:
>> I've got my root pool on a mirror on 2 512 byte blocksize disks. I
>> want to move the root pool to two 2 TB disks with 4k blocks. The
>> server only has room for two disks. I do have an esata connector,
>> though, and a suitable external cabinet for connecting one extra disk.
>>
>> How would I go about migrating/expanding the root pool to the
>> larger disks so I can then use the larger disks for booting?
>> I have no extra machine to use.
>
> Suppose we call the disks like so:
>
>   A, B: your old 512-block drives
>   X, Y: your new 2TB drives
>
> The easiest way would be to simply:
>
> 1) zpool set autoexpand=on rpool
> 2) offline the A drive
> 3) physically replace it with the X drive
> 4) do a zpool replace on it and wait for it to resilver

When sector size differs, attaching it is going to fail (at least on FreeBSD).
You might not get around a send/receive cycle...

> 5) offline the B drive
> 6) physically replace it with the Y drive
> 7) do a zpool replace on it and wait for it to resilver
>
> At this point, you should have a 2TB rpool (thanks to the
> autoexpand=on in step 1). Unfortunately, to my knowledge, there is no
> way to convert an ashift=9 pool (512-byte sectors) to an ashift=12 pool
> (4k sectors). Perhaps some great ZFS guru can shed more light on this.
>
> --
> Saso





Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Sašo Kiselkov
On 06/15/2012 03:35 PM, Johannes Totz wrote:
 
> When sector size differs, attaching it is going to fail (at least on FreeBSD).
> You might not get around a send/receive cycle...

Jim Klimov has already posted a way better guide, which rebuilds the
pool using the old one's data, so yeah, the replace route I recommended
here is rendered moot.

--
Saso


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Hung-Sheng Tsao Ph.D.

Yes.
Which version of Solaris or BSD are you using?
For BSD I do not know the steps to create a new BE (boot environment).
For S10, OpenSolaris, and Solaris Express (and maybe other OpenSolaris
forks), you use Live Upgrade.
For S11 you use beadm.
regards



On 6/15/2012 9:13 AM, Hans J Albertsson wrote:
I suppose I must start by labelling the new disk properly, and give
the s0 partition to zpool, so the new zpool can be booted?


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Hung-Sheng Tsao Ph.D.

By the way:
when you format, start at cylinder 1; do not use cylinder 0.
Depending on the version of Solaris, you may not be able to use 2TB as root.
regards


On 6/15/2012 9:53 AM, Hung-Sheng Tsao Ph.D. wrote:

Yes.
Which version of Solaris or BSD are you using?
For BSD I do not know the steps to create a new BE (boot environment).
For S10, OpenSolaris, and Solaris Express (and maybe other OpenSolaris
forks), you use Live Upgrade.
For S11 you use beadm.
regards





Re: [zfs-discuss] NFS asynchronous writes being written to ZIL

2012-06-15 Thread Richard Elling
[Phil beat me to it]
Yes, the 0s are a result of integer division in DTrace/kernel.

On Jun 14, 2012, at 9:20 PM, Timothy Coalson wrote:

 Indeed they are there, shown with 1 second interval.  So, it is the
 client's fault after all.  I'll have to see whether it is somehow
 possible to get the server to write cached data sooner (and hopefully
 asynchronous), and the client to issue commits less often.  Luckily I
 can live with the current behavior (and the SSDs shouldn't give out
 any time soon even being used like this), if it isn't possible to
 change it.

If this is the proposed workload, then it is possible to tune the DMU to
manage commits more efficiently. In an ideal world, it does this automatically,
but the algorithms are based on a bandwidth calculation and those are not
suitable for HDD capacity planning. The efficiency goal would be to do less
work, more often and there are two tunables that can apply:

1. the txg_timeout controls the default maximum transaction group commit
interval and is set to 5 seconds on modern ZFS implementations.

2. the zfs_write_limit is a size limit for txg commit. The idea is that a txg 
will
be committed when the size reaches this limit, rather than waiting for the
txg_timeout. For streaming writes, this can work better than tuning the 
txg_timeout.
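
For what it's worth, on illumos-derived systems these are typically adjusted
either live with mdb or persistently in /etc/system; the values below are
purely illustrative, not recommendations:

   # echo zfs_txg_timeout/W0t1 | mdb -kw                (commit every 1 s instead of 5 s)
   # echo zfs_write_limit_override/Z0x8000000 | mdb -kw (cap a txg at 128 MB)

   /etc/system equivalents:
   set zfs:zfs_txg_timeout = 1
   set zfs:zfs_write_limit_override = 0x8000000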

 -- richard

 
 Thanks for all the help,
 Tim
 
 On Thu, Jun 14, 2012 at 10:30 PM, Phil Harman phil.har...@gmail.com wrote:
 On 14 Jun 2012, at 23:15, Timothy Coalson tsc...@mst.edu wrote:
 
 The client is using async writes, that include commits. Sync writes do not
 need commits.
 
 Are you saying nfs commit operations sent by the client aren't always
 reported by that script?
 
 They are not reported in your case because the commit rate is less than one 
 per second.
 
 DTrace is an amazing tool, but it does dictate certain coding compromises, 
 particularly when it comes to output scaling, grouping, sorting and 
 formatting.
 
 In this script the commit rate is calculated using integer division. In your 
 case the sample interval is 5 seconds, so up to 4 commits per second will be 
 reported as a big fat zero.
 
 If you use a sample interval of 1 second you should see occasional commits. 
 We know they are there because we see a non-zero commit time.
 
 

-- 

ZFS and performance consulting
http://www.RichardElling.com


Re: [zfs-discuss] NFS asynchronous writes being written to ZIL

2012-06-15 Thread Richard Elling
On Jun 14, 2012, at 1:35 PM, Robert Milkowski wrote:

 The client is using async writes, that include commits. Sync writes do
 not need commits.
 
 What happens is that the ZFS transaction group commit occurs at more-
 or-less regular intervals, likely 5 seconds for more modern ZFS
 systems. When the commit occurs, any data that is in the ARC but not
 commited in a prior transaction group gets sent to the ZIL
 
 Are you sure? I don't think this is the case unless I misunderstood you or
 this is some recent change to Illumos.

Need to make sure we are clear here: there is time between the txg being
closed and the txg being on disk. During that period, a sync write of the
data in the closed txg is written to the ZIL.

 Whatever is being committed when zfs txg closes goes directly to pool and
 not to zil. Only sync writes will go to zil right away (and not always, see
 logbias, etc.) and to arc to be committed later to a pool when txg closes.

In this specific case, there are separate log devices, so logbias doesn't apply.
 -- richard

-- 

ZFS and performance consulting
http://www.RichardElling.com


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Hung-Sheng Tsao Ph.D.

Hi,
what is the version of Solaris (uname -a output)?
regards


On 6/15/2012 10:37 AM, Hung-Sheng Tsao Ph.D. wrote:

By the way:
when you format, start at cylinder 1; do not use cylinder 0.
Depending on the version of Solaris, you may not be able to use 2TB as root.
regards




Re: [zfs-discuss] NFS asynchronous writes being written to ZIL

2012-06-15 Thread Timothy Coalson
Thanks for the suggestions.  I think it would also depend on whether
the nfs server has tried to write asynchronously to the pool in the
meantime, which I am unsure how to test, other than making the txgs
extremely frequent and watching the load on the log devices.  As for
the integer division giving misleading zeros, one possible solution is
to add (delay-1) to the count before dividing by delay, so if there
are any, it will show at least 1 (or you could get fancy and do fixed
point numbers).
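
In shell-arithmetic terms (the numbers are just to show the rounding):

   $ echo $(( 4 / 5 ))               # floor division: 4 commits in a 5 s interval -> 0
   0
   $ echo $(( (4 + 5 - 1) / 5 ))     # round up instead: anything non-zero shows at least 1
   1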

As for very frequent txgs, I imagine this could cause more
fragmentation (more metadata written and discarded more frequently),
is there a way to estimate or test for the impact of it?  Depending on
how it allocates the metadata blocks, I suppose it could write it to
the blocks recently vacated by old metadata due to the previous txg,
and have almost no impact until a snapshot is taken, is it smart
enough to do this?

Tim



Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Cindy Swearingen

Hi Hans,

It's important to identify your OS release to determine whether
booting from a 4k disk is supported.

Thanks,

Cindy



On 06/15/12 06:14, Hans J Albertsson wrote:

I've got my root pool on a mirror on 2 512 byte blocksize disks.
I want to move the root pool to two 2 TB disks with 4k blocks.
The server only has room for two disks. I do have an esata connector,
though, and a suitable external cabinet for connecting one extra disk.

How would I go about migrating/expanding the root pool to the larger
disks so I can then use the larger disks for booting?

I have no extra machine to use.



Sent from my Android mobile




Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread John Martin

On 06/15/12 15:52, Cindy Swearingen wrote:


It's important to identify your OS release to determine whether
booting from a 4k disk is supported.


In addition, whether the drive is really 4096p or 512e/4096p.


Re: [zfs-discuss] NFS asynchronous writes being written to ZIL

2012-06-15 Thread Timothy Coalson
On Fri, Jun 15, 2012 at 12:56 PM, Timothy Coalson tsc...@mst.edu wrote:
 Thanks for the suggestions.  I think it would also depend on whether
 the nfs server has tried to write asynchronously to the pool in the
 meantime, which I am unsure how to test, other than making the txgs
 extremely frequent and watching the load on the log devices.

I didn't want to reboot the main file server to test this, so I used
zilstat on the backup nfs server (which has nearly identical hardware
and configuration, but doesn't have SSDs for a separate ZIL) to see if
I could estimate the difference it would make, and the story got
stranger: it wrote far less data to the ZIL for the same copy
operation (single 8GB file):

$ sudo ./zilstat -M -l 20 -p backuppool txg
waiting for txg commit...
      txg   N-MB   N-MB/s   N-Max-Rate   B-MB   B-MB/s   B-Max-Rate    ops   <=4kB   4-32kB   >=32kB
  2833307      1        0            1      1        0            1     15       0        0       15
  2833308      0        0            0      0        0            0      0       0        0        0
  2833309      1        0            1      1        0            1      8       0        0        8
  2833310      0        0            0      0        0            0      4       0        0        4
  2833311      1        0            0      1        0            0      9       0        0        9
  2833312      0        0            0      0        0            0      0       0        0        0
  2833313      2        0            2      2        0            2     21       0        0       21
  2833314      7        1            7      8        1            8     63       0        0       63
  2833315      1        0            1      2        0            2     18       0        0       18
  2833316      0        0            0      0        0            0      5       0        0        5

A small sample from the server with SSD log devices doing the same operation:

$ sudo ./zilstat -M -l 20 -p mainpool txg
waiting for txg commit...
      txg   N-MB   N-MB/s   N-Max-Rate   B-MB   B-MB/s   B-Max-Rate     ops   <=4kB   4-32kB   >=32kB
  2808483    989      197          593   1967      393         1180   15010       0        0    15010
  2808484    599       99          208   1134      189          393    8653       0        0     8653
  2808485      0        0            0      0        0            0       0       0        0        0
  2808486    137       27          126    255       51          235    1953       0        0     1953
  2808487    460       92          460    859      171          859    6555       0        0     6555
  2808488    530       75          530   1031      147         1031    7871       0        0     7871

Setting logbias=throughput makes the server with the SSD log devices
act the same as the server without them, as far as I can tell, which I
somewhat expected.  However, I did not expect use of separate log
devices to change how often ZIL ops are performed, other than to raise
the upper limit if the device can service more IOPS.  Additionally,
nfssvrtop showed a lower value for Com_t when not using the separate
log device (2.1s with logbias=latency, 0.24s with throughput).
Copying a folder with small files and subdirectories pushes the server
to ~400 ZIL ops per txg with logbias=throughput, so it shouldn't be
the device performance making it only issue ~15 ops per txg copying a
large file without using a separate log device.  I am thinking of
transplanting one of the SSDs temporarily for testing, but I would be
interested to know the cause of this behavior.  I don't know why more
asynchronous writes seem to be making it into txgs without being
caught by an nfs commit when a separate log device isn't used.
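
For reference, the logbias comparison above just toggles the property on the
shared dataset and re-runs the copy; the dataset name here is a placeholder:

   # zfs set logbias=throughput mainpool/export/shared
   # zfs set logbias=latency mainpool/export/shared      (latency is the default)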

Tim


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Jim Klimov

2012-06-16 0:05, John Martin wrote:

It's important to know...

...whether the drive is really 4096p or 512e/4096p.


BTW, is there a surefire way to learn that programmatically
from Solaris or its derivatives (i.e. from SCSI driver options,
format/scsi/inquiry, SMART or some similar way)? Or, if the
drive lies and says its sectors are 512b while they are physically
4KB, is it undetectable except by reading vendor specs?

Thanks,
//Jim


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Timothy Coalson
On Fri, Jun 15, 2012 at 5:35 PM, Jim Klimov jimkli...@cos.ru wrote:
 2012-06-16 0:05, John Martin wrote:

 Its important to know...

 ...whether the drive is really 4096p or 512e/4096p.


 BTW, is there a surefire way to learn that programmatically
 from Solaris or its derivates

prtvtoc device should show the block size the OS thinks it has.  Or
you can use format, select the disk from a list that includes the
model number and size, and use verify.
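
For example (device name is a placeholder):

   # prtvtoc /dev/rdsk/c5t0d0s2 | grep bytes/sector
   # format -e      (pick the disk from the list, then run 'verify')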

Tim


Re: [zfs-discuss] Migrating 512 byte block zfs root pool to 4k disks

2012-06-15 Thread Timothy Coalson
Sorry, if you meant distinguishing between true 512 and emulated
512/4k, I don't know; it may be vendor-specific whether they
expose it through device commands at all.

Tim



Re: [zfs-discuss] Recovery of RAIDZ with broken label(s)

2012-06-15 Thread Scott Aitken
On Fri, Jun 15, 2012 at 10:54:34AM +0200, Stefan Ring wrote:
>> Have you also mounted the broken image as /dev/lofi/2?
>>
>> Yep.
>
> Wouldn't it be better to just remove the corrupted device? This worked
> just fine in my case.

Hi Stefan,

when you say remove the device, I assume you mean simply make it unavailable
for import (I can't remove it from the vdev).

This is what happens (lofi/2 is the drive which ZFS thinks has corrupted
data):

root@openindiana-01:/mnt# zpool import -d /dev/lofi
  pool: ZP-8T-RZ1-01
id: 9952605666247778346
 state: FAULTED
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        ZP-8T-RZ1-01              FAULTED  corrupted data
          raidz1-0                ONLINE
            12339070507640025002  UNAVAIL  corrupted data
            /dev/lofi/5           ONLINE
            /dev/lofi/4           ONLINE
            /dev/lofi/3           ONLINE
            /dev/lofi/1           ONLINE
root@openindiana-01:/mnt# lofiadm -d /dev/lofi/2
root@openindiana-01:/mnt# zpool import -d /dev/lofi
  pool: ZP-8T-RZ1-01
id: 9952605666247778346
 state: FAULTED
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-3C
config:

        ZP-8T-RZ1-01              FAULTED  corrupted data
          raidz1-0                DEGRADED
            12339070507640025002  UNAVAIL  cannot open
            /dev/lofi/5           ONLINE
            /dev/lofi/4           ONLINE
            /dev/lofi/3           ONLINE
            /dev/lofi/1           ONLINE

So in the second import, it complains that it can't open the device, rather
than saying it has corrupted data.

It's interesting that even though 4 of the 5 disks are available, it still
can't import it as DEGRADED.

Thanks again.
Scott