Re: [zfs-discuss] partitioned cache devices

2013-03-19 Thread Ian Collins

Andrew Werchowiecki wrote:


Thanks for the info about slices, I may give that a go later on. I'm 
not keen on that because I have clear evidence (as in zpools set up 
this way, right now, working, without issue) that GPT partitions of 
the style shown above work, and I want to see why it doesn't work in 
my setup rather than simply ignoring it and moving on.




Didn't you read Richard's post? You can have only one Solaris partition 
at a time.


Your original example failed when you tried to add a second.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] partitioned cache devices

2013-03-15 Thread Ian Collins

Andrew Werchowiecki wrote:


Hi all,

I'm having some trouble with adding cache drives to a zpool, anyone 
got any ideas?


muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2

Password:

cannot open '/dev/dsk/c25t10d1p2': I/O error

muslimwookie@Pyzee:~$

I have two SSDs in the system. I've created an 8GB partition on each 
drive for use as a mirrored write cache. I also have the remainder of 
each drive partitioned for use as read cache (L2ARC). However, when 
attempting to add it I get the error above.




Create one 100% Solaris partition and then use format to create two slices.
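
A minimal sketch of that layout (the first SSD's name comes from the post 
above; the second SSD's name and the slice sizes are assumptions):

# On each SSD, create a single Solaris2 fdisk partition spanning the disk,
# then use format's partition menu to make two slices, e.g. an 8GB s0 and
# the remainder as s1 (both steps are interactive):
fdisk /dev/rdsk/c25t10d1p0
format            # select c25t10d1, then use the partition menu

# Add the slices, not the fdisk partitions, to the pool:
zpool add aggr0 log mirror c25t10d1s0 c25t11d1s0
zpool add aggr0 cache c25t10d1s1 c25t11d1s1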

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Ian Collins

Robert Milkowski wrote:


Solaris 11.1 (free for non-prod use).



But a ticking bomb if you use a cache device.

--

Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Ian Collins

Robert Milkowski wrote:

Robert Milkowski wrote:

Solaris 11.1 (free for non-prod use).


But a ticking bomb if you use a cache device.


It's been fixed in an SRU (although this is only for customers with a support
contract - still, it will be in 11.2 as well).

Then, I'm sure there are other bugs which are fixed in S11 and not in
Illumos (and vice-versa).



There may well be, but in seven+ years of using ZFS, this was the first 
one to cost me a pool.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SVM ZFS

2013-02-26 Thread Ian Collins

Alfredo De Luca wrote:
On Wed, Feb 27, 2013 at 10:36 AM, Paul Kraus p...@kraus-haus.org wrote:


On Feb 26, 2013, at 6:19 PM, Jim Klimov jimkli...@cos.ru wrote:

 Ah, I forgot to mention - ufsdump|ufsrestore was at some time also
 a recommended way of such transition ;)

The last time I looked at using ufsdump/ufsrestore for
this, ufsrestore was NOT aware of ZFS ACL semantics. That was under
Solaris 10, but I would be surprised if the ufsrestore code has
changed since then.




What about Solaris Live Upgrade?



It's been a long time, but I'm sure LU only supports UFS to ZFS migration 
for the root pool.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Ian Collins

Bob Friesenhahn wrote:

On Tue, 26 Feb 2013, Richard Elling wrote:

Consider using different policies for different data. For traditional file systems, you
had relatively few policy options: readonly, nosuid, quota, etc. With ZFS, dedup and
compression are also policy options. In your case, dedup for your media is not likely
to be a good policy, but dedup for your backups could be a win (unless you're using
something that already doesn't back up duplicate data -- eg most backup utilities).
A way to approach this is to think of your directory structure and create file systems
to match the policies. For example:

I am finding that rsync with the right options (to directly
block-overwrite) plus zfs snapshots is providing me with pretty
amazing deduplication for backups without even enabling
deduplication in zfs.  Now backup storage goes a very long way.


We do the same for all of our legacy operating system backups. Take a 
snapshot then do an rsync - an excellent way of maintaining 
incremental backups for those.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Distro Advice

2013-02-26 Thread Ian Collins

Bob Friesenhahn wrote:

On Wed, 27 Feb 2013, Ian Collins wrote:

I am finding that rsync with the right options (to directly
block-overwrite) plus zfs snapshots is providing me with pretty
amazing deduplication for backups without even enabling
deduplication in zfs.  Now backup storage goes a very long way.

We do the same for all of our legacy operating system backups. Take a
snapshot then do an rsync - an excellent way of maintaining incremental
backups for those.

Magic rsync options used:

-a --inplace --no-whole-file --delete-excluded

This causes rsync to overwrite the file blocks in place rather than
writing to a new temporary file first.  As a result, zfs COW produces
primitive deduplication of at least the unchanged blocks (by writing
nothing) while writing new COW blocks for the changed blocks.
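
A minimal sketch of that cycle, with made-up source and backup dataset names:

# pull the latest state into the backup filesystem, overwriting in place...
rsync -a --inplace --no-whole-file --delete-excluded root@host1:/export/home/ /backup/host1/
# ...then freeze it as today's backup
zfs snapshot backuppool/host1@$(date +%Y%m%d)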


Do these options impact performance or reduce the incremental stream sizes?

I just use -a --delete and the snapshots don't take up much space 
(compared with the incremental stream sizes).


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there performance penalty when adding vdev to existing pool

2013-02-20 Thread Ian Collins

Peter Wood wrote:

I'm using OpenIndiana 151a7, zpool v28, zfs v5.

When I bought my storage servers I intentionally left HDD slots 
available so I could add another vdev when needed and delay immediate 
expenses.


After reading some posts on the mailing list I'm getting concerned 
about degrading performance due to unequal distribution of data among 
the vdevs. I still have a chance to migrate the data away, add all 
drives and rebuild the pools and start fresh.


Before going that road I was hoping to hear your opinion on what will 
be the best way to handle this.


System: Supermicro with 36 hdd bays. 28 bays filled with 3TB SAS 7.2K 
enterprise drives. 8 bays available to add another vdev to the pool.


Pool configuration:

snip

#

Will adding another vdev hurt the performance?


How full is the pool?

When I've added a vdev (or grown an existing one), I used zfs send to make a
copy of a suitably large filesystem, then deleted the original and
renamed the copy.  I had to do this a couple of times to redistribute
data, but it saved a lot of downtime.
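
A rough sketch of that shuffle, with made-up dataset names (anything written 
to the original after the snapshot needs a final incremental send before the 
rename):

zfs snapshot tank/data@move
zfs send tank/data@move | zfs receive tank/data.new
# quiesce users of tank/data (and send a last incremental if required), then:
zfs destroy -r tank/data
zfs rename tank/data.new tank/data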

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there performance penalty when adding vdev to existing pool

2013-02-20 Thread Ian Collins

Bob Friesenhahn wrote:

On Thu, 21 Feb 2013, Sašo Kiselkov wrote:


On 02/21/2013 12:27 AM, Peter Wood wrote:

Will adding another vdev hurt the performance?

In general, the answer is: no. ZFS will try to balance writes to
top-level vdevs in a fashion that assures even data distribution. If
your data is equally likely to be hit in all places, then you will not
incur any performance penalties. If, OTOH, newer data is more likely to
be hit than old data, then yes, newer data will be served from fewer
spindles. In that case it is possible to do a send/receive of the
affected datasets into new locations and then rename them.

You have this reversed.  The older data is served from fewer spindles
than data written after the new vdev is added. Performance with the
newer data should be improved.


Not if the pool is close to full, when new data will end up on fewer 
spindles (the new or extended vdev).


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Is there performance penalty when adding vdev to existing pool

2013-02-20 Thread Ian Collins

Peter Wood wrote:

Currently the pool is about 20% full:
# zpool list pool01
NAME      SIZE  ALLOC   FREE  EXPANDSZ   CAP  DEDUP  HEALTH  ALTROOT
pool01   65.2T  15.4T  49.9T         -   23%  1.00x  ONLINE  -
#



So you will be about 15% full after adding a new vdev.

Unless you are likely to get too close to filling the enlarged pool, you 
will probably be OK performance-wise.  The old data access times will be 
no worse, and the new data will be better.


If you can spread some of your old data around after adding the new vdev, 
do so.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL

2013-02-17 Thread Ian Collins

Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:

From: Tim Cook [mailto:t...@cook.ms]

We can agree to disagree.

I think you're still operating under the auspices of Oracle wanting to have an
open discussion.  This is patently false.

I'm just going to respond to this by saying thank you, Cindy, Casper, Neil, and 
others, for all the help over the years.  I think we all agree it was cooler 
when opensolaris was open, but things are beyond our control, so be it.  Moving 
forward, I don't expect Oracle to be any more open than MS or Apple or Google, 
which is to say, I understand there's stuff you can't talk about, and support 
you can't give freely or openly.  But to the extent you're still able to 
discuss publicly known things, thank you.


+1.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] HELP! RPool problem

2013-02-16 Thread Ian Collins

Sašo Kiselkov wrote:

On 02/16/2013 09:49 PM, John D Groenveld wrote:

Boot with kernel debugger so you can see the panic.

Sadly, though, without access to the source code, all he can do at that
point is log a support ticket with Oracle (assuming he has paid his
support fees) and hope it will get picked up by somebody there. People
on this list have few, if any, ways of helping out.


If he can boot from recent install media and import the pool, that's a 
pretty good indicator that the problem has been fixed. He can then 
upgrade to whatever he booted from (which could be OI or Solaris 11.1) 
and recover his data.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL

2013-02-16 Thread Ian Collins

Toby Thain wrote:

Signed up, thanks.

The ZFS list has been very high value and I thank everyone whose wisdom
I have enjoyed, especially people like you Sašo, Mr Elling, Mr
Friesenhahn, Mr Harvey, the distinguished Sun and Oracle engineers who
post here, and many others.

Let the Illumos list thrive.


This list certainly has been high value for ZFS users (I think I 
subscribed the day it started!).


One of its main advantages is that it has been platform agnostic.  We see 
Solaris, Illumos, BSD and more recently ZFS on Linux questions all given 
the same respect.


I do hope we can get another, platform agnostic, home for this list.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs-discuss mailing list opensolaris EOL

2013-02-16 Thread Ian Collins

Richard Elling wrote:

On Feb 16, 2013, at 10:16 PM, Bryan Horstmann-Allen b...@mirrorshades.net 
wrote:


+--
| On 2013-02-17 18:40:47, Ian Collins wrote:
|

One of its main advantages is that it has been platform agnostic.  We see
Solaris, Illumos, BSD and more recently ZFS on Linux questions all given the
same respect.

I do hope we can get another, platform agnostic, home for this list.

As the guy who provides the illumos mailing list services, and as someone who
has deeply vested interests in seeing ZFS thrive on all platforms, I'm happy to
suggest that we'd welcome all comers on z...@lists.illumos.org.

+1


Me too.

One list is certainly better than 1!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow zfs writes

2013-02-12 Thread Ian Collins

Ram Chander wrote:


Hi Roy,
You are right, so it looks like a re-distribution issue. Initially 
there were two vdevs with 24 disks (disks 0-23) for close to a year, 
after which we added 24 more disks and created additional 
vdevs. The initial vdevs are filled up and so write speed declined. 
Now, how do I find the files that are present in a given vdev or disk? 
That way I can remove them and re-copy them back to distribute the 
data. Is there any other way to solve this?



The only way is to avoid the problem in the first place by not mixing
vdev sizes in a pool.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Slow zfs writes

2013-02-12 Thread Ian Collins

Jim Klimov wrote:

On 2013-02-12 10:32, Ian Collins wrote:

Ram Chander wrote:

Hi Roy,
You are right, so it looks like a re-distribution issue. Initially there
were two vdevs with 24 disks (disks 0-23) for close to a year, after
which we added 24 more disks and created additional vdevs. The
initial vdevs are filled up and so write speed declined. Now, how do I
find the files that are present in a given vdev or disk? That way I can
remove them and re-copy them back to distribute the data. Is there any other way to solve this?


The only way is to avoid the problem in the first place by not mixing
vdev sizes in a pool.



I was a bit quick off the mark there, I didn't notice that some vdevs 
were older than others.



Well, that imbalance is there - in the zpool status printout we see
raidz1 top-level vdevs of size 5, 5, 12, 7, 7, 7 disks and some 5 spares
- which seems to sum up to 48 ;)


The vdev sizes are about (including parity space) 14, 14, 22, 19, 19, 
19TB respectively and 127TB total.  So even if the data is balanced, the 
performance of this pool will still start to degrade once ~84TB (about 
2/3 full) are used.


So the only viable long term solution is a rebuild, or putting bigger 
drives in the two smallest vdevs.


In the short term, when I've had similar issues I used zfs send to copy 
a large filesystem within the pool then renamed the copy to the original 
name and deleted the original.  This can be repeated until you have an 
acceptable distribution.


One last thing: unless this is some form of backup pool, or the data on 
it isn't important, avoid raidz vdevs in such a large pool!


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Bizarre receive error

2013-02-08 Thread Ian Collins
I recently had to recover a lot of data from my backup pool which is on 
a Solaris 11 system.  I'm now sending regular snapshots back to the pool 
and all was well until the pool became nearly full.  I then started 
getting receive failures:


receiving incremental stream of tank/vbox/windows@Wednesday_1800 into 
backup/vbox/windows@Wednesday_1800
zfs_receive: Can't mount a version 6 file system on a version 33 pool. 
Pool must be upgraded to mount this file system.


When I freed up space on the pool, the errors stopped:

receiving incremental stream of tank/vbox/windows@Wednesday_1800 into 
backup/vbox/windows@Wednesday_1800

received 380MB stream in 18 seconds (21.1MB/sec)

On the Solaris 11.1 sender:

zfs get -H version tank/vbox/windows
tank/vbox/windows   version 5   -

Odd!  I assume an error code was being misreported.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-23 Thread Ian Collins

Jim Klimov wrote:

On 2013-01-23 09:41, casper@oracle.com wrote:

Yes and no: the system reserves a lot of additional memory (Solaris
doesn't over-commit swap) and swap is needed to support those
reservations.  Also, some pages are dirtied early on and never touched
again; those pages should not be kept in memory.


I believe, by the symptoms, that this is what happens often
in particular to Java processes (app-servers and such) - I do
regularly see these have large VM sizes and much (3x) smaller
RSS sizes.


Being swapped out is probably the best thing that can be done to most 
Java processes :)


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RFE: Un-dedup for unique blocks

2013-01-22 Thread Ian Collins

Darren J Moffat wrote:

It is a mechanism for part of the storage system above the disk (eg
ZFS) to inform the disk that it is no longer using a given set of blocks.

This is useful when using an SSD - see Saso's excellent response on that.

However it can also be very useful when your disk is an iSCSI LUN.  It
allows the filesystem layer (eg ZFS or NTFS, etc) when on iSCSI LUN that
advertises SCSI UNMAP to tell the target there are blocks in that LUN it
isn't using any more (eg it just deleted some blocks).


That is something I have been waiting a long time for!  I have to run a 
periodic 'fill the pool with zeros' cycle on a couple of iSCSI-backed 
pools to reclaim free space.
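
The cycle itself is nothing clever - roughly the following, with made-up 
names.  It only helps if the zeros actually reach the backing LUN (so 
compression off on the dataset being filled) and the target's storage can 
collapse or reclaim the zero blocks:

zfs create -o compression=off ipool/zerofill
# runs until the pool is nearly full, so do it during a quiet period
dd if=/dev/zero of=/ipool/zerofill/zeros bs=1M
zfs destroy ipool/zerofill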


I guess the big question is: do Oracle storage appliances advertise SCSI 
UNMAP?


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Odd snapshots exposed in Solaris 11.1

2013-01-21 Thread Ian Collins

Since upgrading to Solaris 11.1, I've started seeing snapshots like

tank/vbox/shares%VMs

appearing with zfs list -t snapshot.

I thought snapshots with a % in their name were private objects created 
during a send/receive operation.  These snapshots don't have many 
properties:


zfs get all tank/vbox/shares%VMs
NAME                  PROPERTY    VALUE                  SOURCE
tank/vbox/shares%VMs  creation    Tue Jan 15  9:15 2013  -
tank/vbox/shares%VMs  mountpoint  /vbox/shares           -
tank/vbox/shares%VMs  share.*     ...                    local
tank/vbox/shares%VMs  zoned       off                    default

Which is causing one of my scripts grief.
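
For now the workaround is just to filter them out when building the list, e.g.:

# skip the private %-named objects
zfs list -H -t snapshot -o name | grep -v '%'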

Does anyone know why these are showing up?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-14 Thread Ian Collins

Cindy Swearingen wrote:

Hi Jamie,

Yes, that is correct.

The S11u1 version of this bug is:

https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599

and has this notation which means Solaris 11.1 SRU 3.4:

Changeset pushed to build 0.175.1.3.0.4.0

Hello Cindy,

I really really hope this will be a public update.  Within a week of 
upgrading to 11.1 I hit this bug and I had to rebuild my main pool. I'm 
still restoring backups.


Without this fix, 11.1 is a bomb waiting to go off!

--
Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool error in metadata:0x0

2012-12-08 Thread Ian Collins

Jim Klimov wrote:

I've had this error on my pool since over a year ago, when I
posted and asked about it. The general consensus was that this
is only fixable by recreation of the pool, and that if things
don't die right away, the problem may be benign (i.e. in some
first blocks of the MOS that are in practice written once and not
really used nor relied upon).

In detailed zpool status this error shows as:
metadata:0x0

By analogy to other errors in unnamed files, this was deemed to
be the MOS dataset, object number 0.


Unlike you, I haven't had the time or patience to dig deeper into this!  
The only times I have seen this error are in iSCSI pools, when the target 
machine's pool became full, causing bizarre errors in the iSCSI client 
pools.


Once the underlying problem was fixed and the pools imported and 
exported, the error went away.


This might enable you to recreate the error for testing.

--
Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs on SunFire X2100M2 with hybrid pools

2012-11-28 Thread Ian Collins

Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Jim Klimov

I really hope someone better versed in compression - like Saso -
would chime in to say whether gzip-9 vs. lzjb (or lz4) sucks in
terms of read-speeds from the pools. My HDD-based assumption is
in general that the less data you read (or write) on platters -
the better, and the spare CPU cycles can usually take the hit.

Oh, I can definitely field that one -
The lzjb compression (the default as long as you just turn compression on without 
specifying any other detail) is very fast compression, similar to lzo.  It generally has 
no noticeable CPU overhead, but it saves you a lot of time and space for highly 
repetitive things like text files (source code) and sparse zero-filled files and stuff 
like that.  I personally always enable this.  compression=on

zlib (gzip) is more powerful, but *way* slower.  Even the fastest level gzip-1 
uses enough CPU cycles that you probably will be CPU limited rather than IO 
limited.


I haven't seen that for a long time.  When gzip compression was first 
introduced, it would cause writes on a Thumper to be CPU bound.  It was 
all but unusable on that machine.  Today with better threading, I barely 
notice the overhead on the same box.



There are very few situations where this option is better than the default lzjb.


That part I do agree with!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS Appliance as a general-purpose server question

2012-11-22 Thread Ian Collins

On 11/23/12 05:50, Jim Klimov wrote:

On 2012-11-22 17:31, Darren J Moffat wrote:

Is it possible to use the ZFS Storage appliances in a similar
way, and fire up a Solaris zone (or a few) directly on the box
for general-purpose software; or to shell-script administrative
tasks such as the backup archive management in the global zone
(if that concept still applies) as is done on their current
Solaris-based box?

No, it is a true appliance: it might look like it has Solaris underneath
but it is just based on Solaris.

You can script administrative tasks but not using bash/ksh style
scripting you use the ZFSSA's own scripting language.

So, the only supported (or even possible) way is indeed to use it
as a NAS for file or block IO from another head running the database
or application servers?..


Yes.


I wonder if it would make weird sense to get the boxes, forfeit the
cool-looking Fishworks, and install Solaris/OI/Nexenta/whatever to
get the most flexibility and bang for a buck from the owned hardware...
Or, rather, shop for the equivalent non-appliance servers...


As Tim Cook says, that would be a very expensive option.

I'm sure Oracle dropped the Thumper line because they competed head on 
with the appliances and gave way more flexibility.


If you are experienced with Solaris and ZFS, you will find using 
appliances very frustrating!  You can't use the OS as you would like and 
you have to go through support when you would otherwise fix things 
yourself.  In my part of the world, that isn't much fun.


Buy an equivalent JBOD and head unit and pretend you have a new Thumper.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel DC S3700

2012-11-21 Thread Ian Collins

On 11/14/12 12:28, Jim Klimov wrote:

On 2012-11-13 22:56, Mauricio Tavares wrote:

Trying again:

Intel just released those drives. Any thoughts on how nicely they will
play in a zfs/hardware raid setup?

Seems interesting - fast, assumed reliable and consistent in its IOPS
(according to marketing talk), addresses power loss reliability (acc.
to datasheet):

* Endurance Rating - 10 drive writes/day over 5 years while running
JESD218 standard

* The Intel SSD DC S3700 supports testing of the power loss capacitor,
which can be monitored using the following SMART attribute: (175, AFh).

snip

All in all, I can't come up with anything offensive against it quickly
;) One possible nit regards the ratings being geared towards 4KB block
(which is not unusual with SSDs), so it may be further from announced
performance with other block sizes - i.e. when caching ZFS metadata.


I can't help thinking these drives would be overkill for an L2ARC device.  
All of the expensive controller hardware is geared to boosting random 
write IOPS, which is somewhat wasted on a write-slowly, read-often device.  
The enhancements would be good for a ZIL, but the smallest drive is at 
least an order of magnitude too big...


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Woeful performance from an iSCSI pool

2012-11-21 Thread Ian Collins
I look after a remote server that has two iSCSI pools.  The volumes for 
each pool are sparse volumes and a while back the target's storage 
became full, causing weird and wonderful corruption issues until they 
managed to free some space.


Since then, one pool has been reasonably OK, but the other has terrible 
performance receiving snapshots.  Despite both iSCSI devices using the 
same IP connection, iostat shows one with reasonable service times while 
the other shows really high (up to 9 seconds) service times and 100% 
busy.  This kills performance for snapshots with many random file 
removals and additions.


I'm currently zero filling the bad pool to recover space on the target 
storage to see if that improves matters.


Has anyone else seen similar behaviour with previously degraded iSCSI 
pools?


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Woeful performance from an iSCSI pool

2012-11-21 Thread Ian Collins

On 11/22/12 10:15, Ian Collins wrote:

I look after a remote server that has two iSCSI pools.  The volumes for
each pool are sparse volumes and a while back the target's storage
became full, causing weird and wonderful corruption issues until they
managed to free some space.

Since then, one pool has been reasonably OK, but the other has terrible
performance receiving snapshots.  Despite both iSCSI devices using the
same IP connection, iostat shows one with reasonable service times while
the other shows really high (up to 9 seconds) service times and 100%
busy.  This kills performance for snapshots with many random file
removals and additions.

I'm currently zero filling the bad pool to recover space on the target
storage to see if that improves matters.

Has anyone else seen similar behaviour with previously degraded iSCSI
pools?

As a data point, both pools are being zero filled with dd.  A 30 second 
iostat sample shows one device getting more than double the write 
throughput of the other:


    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
    0.2   64.0    0.0   50.1  0.0  5.6    0.7   87.9   4  64 c0t600144F096C94AC74ECD96F20001d0
    5.6   44.9    0.0   18.2  0.0  5.8    0.3  115.7   2  76 c0t600144F096C94AC74FF354B2d0


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Dedicated server running ESXi with no RAID card, ZFS for storage?

2012-11-13 Thread Ian Collins

On 11/14/12 15:20, Dan Swartzendruber wrote:

Well, I think I give up for now.  I spent quite a few hours over the last
couple of days trying to get gnome desktop working on bare-metal OI,
followed by virtualbox.  Supposedly that works in headless mode with RDP for
management, but nothing but fail for me.  Found quite a few posts on various
forums of people complaining that RDP with external auth doesn't work (or
not reliably), and that was my experience.  The final straw was when I
rebooted the OI server as part of cleaning things up, and... It hung.  Last
line in verbose boot log is 'ucode0 is /pseudo/ucode@0'.  I power-cycled it
to no avail.  Even tried a backup BE from hours earlier, to no avail.
Likely whatever was bunged happened prior to that.  If I could get something
that ran like xen or kvm reliably for a headless setup, I'd be willing to
give it a try, but for now, no...


SmartOS.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Strange mount -a problem in Solaris 11.1

2012-10-31 Thread Ian Collins
On 10/31/12 23:35, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Ian Collins

I have a recently upgraded (to Solaris 11.1) test system that fails
to mount its filesystems on boot.

Running zfs mount -a results in the odd error

#zfs mount -a
internal error
Invalid argument

truss shows the last call as

ioctl(3, ZFS_IOC_OBJECT_STATS, 0xF706BBB0)

The system boots up fine in the original BE.  The root (only) pool is on a
single drive.

Any ideas?

devfsadm -Cv
rm /etc/zfs/zpool.cache
init 6



That was a big enough stick to fix it.  Nasty bug nonetheless.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Strange mount -a problem in Solaris 11.1

2012-10-30 Thread Ian Collins
I have a recently upgraded (to Solaris 11.1) test system that fails 
to mount its filesystems on boot.


Running zfs mount -a results in the odd error

#zfs mount -a
internal error
Invalid argument

truss shows the last call as

ioctl(3, ZFS_IOC_OBJECT_STATS, 0xF706BBB0)

The system boots up fine in the original BE.  The root (only) pool is on a 
single drive.


Any ideas?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send to older version

2012-10-18 Thread Ian Collins

On 10/18/12 21:09, Michel Jansens wrote:

Hi,

I've been using a Solaris 10 update 9 machine for some time to replicate 
filesystems from different servers through zfs send|ssh zfs receive.
This was done to store  disaster recovery pools. The DR zpools are made from  
sparse files (to allow for easy/efficient backup to tape).

Now I've installed a Solaris 11 machine and a SmartOS one.
When I try to replicate the pools from those machines, I get an error because 
the newer filesystem/pool versions have features/properties that are not 
supported on the Solaris 10u9 machine.
Is there a way (apart from rsync) to send a snapshot from a newer zpool to an 
older one?


You have to create pools/filesystems with the older versions used by the 
destination machine.
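
For example (the version numbers here are assumptions - check what the 
Solaris 10u9 box actually reports, and note that not every release lets you 
set version at creation time):

# on the Solaris 10u9 destination, list the supported versions:
zpool upgrade -v
zfs upgrade -v

# on the newer senders, keep the replicated filesystems at a version
# the destination understands:
zfs create -o version=4 tank/export/dr-data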


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS best practice for FreeBSD?

2012-10-13 Thread Ian Collins

On 10/13/12 22:13, Jim Klimov wrote:

2012-10-13 0:41, Ian Collins wrote:

On 10/13/12 02:12, Edward Ned Harvey
(opensolarisisdeadlongliveopensolaris) wrote:

There are at least a couple of solid reasons *in favor* of partitioning.

#1  It seems common, at least to me, that I'll build a server with
let's say, 12 disk slots, and we'll be using 2T disks or something
like that.  The OS itself only takes like 30G which means if I don't
partition, I'm wasting 1.99T on each of the first two disks.  As a
result, when installing the OS, I always partition rpool down to ~80G
or 100G, and I will always add the second partitions of the first
disks to the main data pool.

How do you provision a spare in that situation?

Technically - you can lay out the spare disks similarly and attach
the partitions or slices as spares for the pools.


I probably didn't make myself clear, so I'll try again!

Assuming the intention is to get the most storage from your drives: if 
you add the remainder of the space on the drives you have partitioned 
for the root pool to the main pool, giving a mix of device sizes in the 
pool, how do you provision a spare?


That's why I have never done this.  I use whole drives everywhere and as 
you mention further down, use the spare space in the root pool for 
scratch filesystems.



However, in servers I've seen there were predominantly different
layout designs:

1) Dedicated root disks/mirrors - small enough for rpool/swap
tasks, nowadays perhaps SSDs or CF cards - especially if care
was taken to use the rpool device mostly for reads and place
all writes like swap and logs onto other pools;

2) For smaller machines with 2 or 4 disks, a partition (slice)
is made for rpool sized about 10-20Gb, and the rest is for
data pool vdevs. In case of 4-disk machines, the rpool can be
a two-way mirror and the other couple of disks can host swap
and/or dump in an SVM or ZFS mirror for example. The data pool
components are identically sized and form a mirror, raid10 or
a raidz1; rarely a raidz2 - that is assumed to have better
resilience to loss of ANY two disks than a raid10 resilient
to loss of CORRECT two disks (from different mirrors).

3) For todays computers with all disks being big, I'd also
make a smallish rpool, a large data pool on separate disks,
and use the extra space on the disks with rpool for something
else - be it swap in SVM-mirrored partition, a scratch pool
for incoming data or tests, etc.


Most of the systems I have built this year are 2U boxes with 8 to 12 
(2TB) drives.  I expect these are very common at the moment.  I use your 
third option, but I tend to just create a big rpool mirror and add a 
scratch filesystem rather than partitioning the drives.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Using L2ARC on an AdHoc basis.

2012-10-13 Thread Ian Collins

On 10/14/12 10:02, Michael Armstrong wrote:

Hi Guys,

I have a portable pool i.e. one that I carry around in an enclosure. However, 
any SSD I add for L2ARC, will not be carried around... meaning the cache drive will 
become unavailable from time to time.

My question is: will random removal of the cache drive put the pool into a degraded 
state, or affect the integrity of the pool at all? Additionally, how adversely will this affect 
warm up...
Or will moving the enclosure between machines with and without cache just 
automatically work, and offer benefits when cache is available, and fewer 
benefits when it isn't?


Why bother with cache devices at all if you are moving the pool around?  
As you hinted above, the cache can take a while to warm up and become 
useful.


You should zpool remove the cache device before exporting the pool.
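
Something like this, with placeholder pool and device names:

zpool remove portable c2t0d0     # cache devices can be removed from a live pool
zpool export portable
# ...and on a machine that does have an SSD to spare:
zpool import portable
zpool add portable cache c5t0d0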

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS best practice for FreeBSD?

2012-10-12 Thread Ian Collins
On 10/13/12 02:12, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:

There are at least a couple of solid reasons *in favor* of partitioning.

#1  It seems common, at least to me, that I'll build a server with let's say, 
12 disk slots, and we'll be using 2T disks or something like that.  The OS 
itself only takes like 30G which means if I don't partition, I'm wasting 1.99T 
on each of the first two disks.  As a result, when installing the OS, I always 
partition rpool down to ~80G or 100G, and I will always add the second 
partitions of the first disks to the main data pool.


How do you provision a spare in that situation?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question

2012-10-08 Thread Ian Collins

On 10/08/12 20:08, Tiernan OToole wrote:
Ok, so, after reading a bit more of this discussion and after playing 
around at the weekend, I have a couple of questions to ask...


1: Do my pools need to be the same? For example, the pool in the 
datacenter is 2 1TB drives in a mirror; in house I have 5 200GB virtual 
drives in RAIDZ1, giving 800GB usable. If I am backing up stuff to the 
home server, can I still do a ZFS send, even though the underlying system 
is different?


Yes you can, just make sure you have enough space!
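
A quick sanity check, with made-up names (the dry-run estimate needs a zfs 
send that supports -n):

zfs list -o space tank/data      # what the source dataset references
zpool list homepool              # free space in the destination pool
zfs send -nv tank/data@snap      # estimated stream size, where supported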


2: If I give out a partition as an iSCSI LUN, can this be ZFS sent 
as normal, or is there any difference?




It can be sent as normal.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question

2012-10-05 Thread Ian Collins

On 10/05/12 21:36, Jim Klimov wrote:

2012-10-05 11:17, Tiernan OToole wrote:

Also, as a follow-up question, but slightly unrelated: when it comes to
the ZFS send, I could use SSH to do the send directly to the machine...
Or I could upload the compressed, and possibly encrypted, dump to the
server... Which, for resume-ability and speed, would be suggested? And
if I were to go with an upload option, any suggestions on what I should
use?

As for this, the answer depends on network bandwidth, reliability,
and snapshot file size - ultimately, on the probability and retry
cost of an error during transmission.

Many posters on the list strongly object to using files as storage
for snapshot streams, because in reliability this is (may be) worse
than a single-disk pool and bitrot on it - a single-bit error in
a snapshot file can render it and all newer snapshots invalid and
un-importable.

Still, given enough scratch space on the sending and receiving sides
and a bad (slow, glitchy) network in-between, I did go with compressed
files of zfs-send streams (perhaps making recursion myself and using
smaller files of one snapshot each - YMMV). For compression on multiCPU
senders I can strongly suggest pigz --fast $filename (I did have
problems in pigz-1.7.1 compressing several files with one command,
maybe that's fixed now). If you're tight on space/transfer size more
than on CPU, you can try other parallel algos - pbzip2, p7zip, etc.
Likewise, you can also pass the file into an encryptor of your choice.


I do have to suffer a slow, glitchy WAN to a remote server, and rather 
than send stream files, I broke the data on the remote server into a 
more fine-grained set of filesystems than I would normally.  In this 
case, I made the directories under what would have been the leaf 
filesystems into filesystems themselves.


By spreading the data over more filesystems, the individual incremental 
sends are smaller, so there is less data to resend if the link burps 
during a transfer.


--
Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Building an On-Site and Off-Size ZFS server, replication question

2012-10-05 Thread Ian Collins
On 10/06/12 07:57, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Frank Cusack

On Fri, Oct 5, 2012 at 3:17 AM, Ian Collinsi...@ianshome.com  wrote:
I do have to suffer a slow, glitchy WAN to a remote server, and rather than
send stream files, I broke the data on the remote server into a more fine-
grained set of filesystems than I would normally.  In this case, I made the
directories under what would have been the leaf filesystems into filesystems
themselves.

Meaning you also broke the data on the LOCAL server into the same set of
more granular filesystems?  Or is it now possible to zfs send a subdirectory of
a filesystem?

zfs create instead of mkdir

As Ian said - he didn't zfs send subdirs, he made filesystems where he 
otherwise would have used subdirs.



That's right.

I do have a lot of what would appear to be unnecessary filesystems, but 
after losing the WAN 3 days into a large transfer, a change of tactic 
was required!


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Should clearing the share property on a clone unshare the origin?

2012-09-25 Thread Ian Collins
I've noticed on a Solaris 11 system that when I clone a filesystem and 
change the share property:


#zfs clone -p -o atime=off filesystem@snapshot clone

#zfs set -c share=name=old share clone

#zfs set share=name=new NFS share clone

#zfs set sharenfs=on clone

The origin filesystem is no longer shared (the clone is successfully 
shared).  The share and sharenfs properties on the origin filesystem are 
unchanged.


I have to run zfs share on the origin filesystem to restore the share.

Feature or a bug??

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] all in one server

2012-09-19 Thread Ian Collins

On 09/19/12 02:38 AM, Sašo Kiselkov wrote:

On 09/18/2012 04:31 PM, Eugen Leitl wrote:


Can I actually have a year's worth of snapshots in
zfs without too much performance degradation?

Each additional dataset (not sure about snapshots, though) increases
boot times slightly, however, I've seen pools with several hundred
datasets without any serious issues, so yes, it is possible. Be
prepared, though, that the data volumes might be substantial (depending
on your overall data turn-around per unit time between the snapshots).


The boot overhead for many (in my case 1200) filesystems isn't as bad 
as it was.  On our original Thumper I had to amalgamate all our user 
home directories into one filesystem due to slow boot.  Now I have split 
them again to send over a slow WAN...


Large numbers of snapshots (10's of thousands) don't appear to impact 
boot times.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zvol vs zfs send/zfs receive

2012-09-14 Thread Ian Collins

On 09/15/12 04:46 PM, Dave Pooser wrote:

I need a bit of a sanity check here.

1) I have a RAIDZ2 of 8 1TB drives, so 6TB usable, running on an ancient
version of OpenSolaris (snv_134 I think). On that zpool (miniraid) I have
a zvol (RichRAID) that's using almost the whole FS. It's shared out via
COMSTAR Fibre Channel target mode. I'd like to move that zvol to a newer
server with a larger zpool. Sounds like a job for ZFS send/receive, right?

2) Since ZFS send/receive is snapshot-based I need to create a snapshot.
Unfortunately I did not realize that zvols require disk space sufficient
to duplicate the zvol, and my zpool wasn't big enough.


To do what?

A snapshot only starts to consume space when data in the 
filesystem/volume changes.



After a false start
(zpool add is dangerous when low on sleep) I added a 250GB mirror and a
pair of 3TB mirrors to miniraid and was able to successfully snapshot the
zvol: miniraid/RichRAID@exportable (I ended up booting off an OI 151a5 USB
stick to make that work, since I don't believe snv_134 could handle a 3TB
disk).

3) Now it's easy, right? I enabled root login via SSH on the new host,
which is running a zpool archive1 consisting of a single RAIDZ2 of 3TB
drives using ashift=12, and did a ZFS send:
zfs send miniraid/RichRAID@exportable | ssh root@newhost zfs receive
archive1/RichRAID

It asked for the root password, I gave it that password, and it was off
and running. GigE ain't super fast, but I've got time.

The problem: so far the send/recv appears to have copied 6.25TB of 5.34TB.
That... doesn't look right. (Comparing zfs list -t snapshot and looking at
the 5.34 ref for the snapshot vs zfs list on the new system and looking at
space used.)

Is this a problem? Should I be panicking yet?


No.

Do you have compression on one side but not the other?  Either way, 
let things run to completion.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scripting incremental replication data streams

2012-09-12 Thread Ian Collins
On 09/13/12 07:44 AM, Edward Ned Harvey 
(opensolarisisdeadlongliveopensolaris) wrote:


I send a replication data stream from one host to another. (and receive).

I discovered that after receiving, I need to remove the auto-snapshot 
property on the receiving side, and set the readonly property on the 
receiving side, to prevent accidental changes (including auto-snapshots.)


Question #1: Actually, do I need to remove the auto-snapshot on the 
receiving side? Or is it sufficient to simply set the readonly 
property? Will the readonly property prevent auto-snapshots from occurring?


So then, sometime later, I want to send an incremental replication 
stream. I need to name an incremental source snap on the sending 
side... which needs to be the latest matching snap that exists on both 
sides.


Question #2: What's the best way to find the latest matching snap on 
both the source and destination? At present, it seems, I'll have to 
build a list of sender snaps, and a list of receiver snaps, and parse 
and search them, till I find the latest one that exists in both. For 
shell scripting, this is very non-trivial.




That's pretty much how I do it.  Get the two (sorted) sets of snapshots, 
remove those that only exist on the remote end (ageing) and send those 
that only exist locally.  The first incremental pair will be the last 
common snapshot and the first unique local snapshot.


I haven't tried this in a script, but it's quite straightforward in C++ 
using the standard library set container and algorithms.
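
It isn't too bad in shell either.  A sketch, assuming snapshot names sort in 
creation order (e.g. date-stamped names) and made-up dataset/host names:

zfs list -H -d 1 -t snapshot -o name tank/data | sed 's/.*@//' | sort > /tmp/local
ssh backuphost zfs list -H -d 1 -t snapshot -o name backup/data | sed 's/.*@//' | sort > /tmp/remote
base=$(comm -12 /tmp/local /tmp/remote | tail -1)    # newest snapshot on both sides
latest=$(comm -23 /tmp/local /tmp/remote | tail -1)  # newest local-only snapshot
zfs send -i @$base tank/data@$latest | ssh backuphost zfs receive backup/data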


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] scripting incremental replication data streams

2012-09-12 Thread Ian Collins

On 09/13/12 10:23 AM, Timothy Coalson wrote:
Unless I'm missing something, they didn't solve the matching 
snapshots thing yet; from their site:


To Do:

Additional error handling for mismatched snapshots (last destination 
snap no longer exists on the source) walk backwards through the remote 
snaps until a common snapshot is found and destroy non-matching remote 
snapshots




That's what I do as part of my destroy snapshots not on the source 
check.  Over many years of managing various distributed systems, I've 
discovered that the apparently simple tends to get complex!


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] what have you been buying for slog and l2arc?

2012-08-29 Thread Ian Collins

On 08/ 4/12 09:50 PM, Eugen Leitl wrote:

On Fri, Aug 03, 2012 at 08:39:55PM -0500, Bob Friesenhahn wrote:


Extreme write IOPS claims in consumer SSDs are normally based on large
write caches which can lose even more data if there is a power failure.

Intel 311 with a good UPS would seem to be a reasonable tradeoff.


The 313 series looks like a consumer-priced SLC drive aimed at the recent 
trend in Windows cache drives.


Should be worth a look.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Benefits of enabling compression in ZFS for the zones

2012-07-10 Thread Ian Collins

On 07/10/12 09:25 PM, Jordi Espasa Clofent wrote:

Hi all,

By default I'm using ZFS for all the zones:

admjoresp@cyd-caszonesrv-15:~$ zfs list
NAME USED  AVAIL  REFER  MOUNTPOINT
opt 4.77G  45.9G   285M  /opt
opt/zones   4.49G  45.9G29K  /opt/zones
opt/zones/glad-gm02-ftcl01   367M  45.9G   367M  /opt/zones/glad-gm02-ftcl01
opt/zones/glad-gp02-ftcl01   502M  45.9G   502M  /opt/zones/glad-gp02-ftcl01
opt/zones/glad-gp02-ftcl02  1.21G  45.9G  1.21G  /opt/zones/glad-gp02-ftcl02
opt/zones/mbd-tcasino-02 257M  45.9G   257M  /opt/zones/mbd-tcasino-02
opt/zones/mbd-tcasino-04 281M  45.9G   281M  /opt/zones/mbd-tcasino-04
opt/zones/mbfd-gp02-ftcl01   501M  45.9G   501M  /opt/zones/mbfd-gp02-ftcl01
opt/zones/mbfd-gp02-ftcl02   475M  45.9G   475M  /opt/zones/mbfd-gp02-ftcl02
opt/zones/mbhd-gp02-ftcl01   475M  45.9G   475M  /opt/zones/mbhd-gp02-ftcl01
opt/zones/mbhd-gp02-ftcl02   507M  45.9G   507M  /opt/zones/mbhd-gp02-ftcl02

However, I have the compression disabled in all of them.

According to this Oracle whitepaper
http://www.oracle.com/technetwork/server-storage/solaris10/solaris-zfs-in-containers-wp-167903.pdf:

The next example demonstrates the compression property. If compression
is enabled, Oracle Solaris ZFS will transparently compress all of the
data before it is written to disk. The benefits of compression
are both saved disk space and possible write speed improvements.

What exactly does POSSIBLE write speed improvements mean?


With compression enabled, less data has to be written to disk, so N 
bytes write in roughly N/(compression ratio) of the time.


On most systems, the performance cost of compressing and uncompressing 
data is relatively low.

As you can see above, I don't usually have any space problems, so if I'm
going to enable the compression flag it has to be because of the write
speed improvements.


I always enable compression by default and only turn it off for 
filesystems I know hold incompressible data, such as media files.
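
For example, with placeholder names:

zfs set compression=on tank           # child filesystems inherit it
zfs set compression=off tank/media    # already-compressed video, photos, etc.
zfs get compressratio tank            # see what it is actually saving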


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scenario sanity check

2012-07-09 Thread Ian Collins

On 07/10/12 05:26 AM, Brian Wilson wrote:

Yep, thanks, and to answer Ian with more detail on what TruCopy does.
TruCopy mirrors between the two storage arrays, with software running on
the arrays, and keeps a list of dirty/changed 'tracks' while the mirror
is split. I think they call it something other than 'tracks' for HDS,
but, whatever.  When it resyncs the mirrors it sets the target luns
read-only (which is why I export the zpools first), and the source array
reads the changed tracks, and writes them across dedicated mirror ports
and fibre links to the target array's dedicated mirror ports, which then
brings the target luns up to synchronized. So, yes, like Richard says,
there is IO, but it's isolated to the arrays, and it's scheduled as
lower priority on the source array than production traffic. For example
it can take an hour or more to re-synchronize a particularly busy 250 GB
lun. (though you can do more than one at a time without it taking longer
or impacting production any more unless you choke the mirror links,
which we do our best not to do) That lower priority, dedicated ports on
the arrays, etc, all makes the noticaeble impact on the production
storage luns from the production server as un-noticable as I can make it
in my environment.


Thank you for the background on TruCopy.   Reading the above, it looks 
like you can have a pretty long time without a true copy!  I guess my view 
on replication is that you are always going to have X number of I/O 
operations, and how dense they are depends on how up to date you want 
your copy to be.


What I still don't understand is why a service interruption is 
preferable to a wee bit more I/O?


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scenario sanity check

2012-07-06 Thread Ian Collins

On 07/ 7/12 08:34 AM, Brian Wilson wrote:

Hello,

I'd like a sanity check from people more knowledgeable than myself.
I'm managing backups on a production system.  Previously I was using
another volume manager and filesystem on Solaris, and I've just switched
to using ZFS.

My model is -
Production Server A
Test Server B
Mirrored storage arrays (HDS TruCopy if it matters)
Backup software (TSM)

Production server A sees the live volumes.
Test Server B sees the TruCopy mirrors of the live volumes.  (it sees
the second storage array, the production server sees the primary array)

Production server A shuts down zone C, and exports the zpools for zone C.
Production server A splits the mirror to secondary storage array,
leaving the mirror writable.
Production server A re-imports the pools for zone C, and boots zone C.
Test Server B imports the ZFS pool using -R /backup.
Backup software backs up the mounted mirror volumes on Test Server B.

Later in the day after the backups finish, a script exports the ZFS
pools on test server B, and re-establishes the TruCopy mirror between
the storage arrays.


That looks awfully complicated.   Why don't you just clone a snapshot 
and back up the clone?
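
i.e. something along these lines, with placeholder names:

zfs snapshot pool/zoneC@backup
zfs clone -o readonly=on pool/zoneC@backup pool/zoneC-backup
# point TSM at the clone's mountpoint, then clean up:
zfs destroy pool/zoneC-backup
zfs destroy pool/zoneC@backup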


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scenario sanity check

2012-07-06 Thread Ian Collins

On 07/ 7/12 11:29 AM, Brian Wilson wrote:

On 07/ 6/12 04:17 PM, Ian Collins wrote:

On 07/ 7/12 08:34 AM, Brian Wilson wrote:

Hello,

I'd like a sanity check from people more knowledgeable than myself.
I'm managing backups on a production system.  Previously I was using
another volume manager and filesystem on Solaris, and I've just switched
to using ZFS.

My model is -
Production Server A
Test Server B
Mirrored storage arrays (HDS TruCopy if it matters)
Backup software (TSM)

Production server A sees the live volumes.
Test Server B sees the TruCopy mirrors of the live volumes.  (it sees
the second storage array, the production server sees the primary array)

Production server A shuts down zone C, and exports the zpools for
zone C.
Production server A splits the mirror to secondary storage array,
leaving the mirror writable.
Production server A re-imports the pools for zone C, and boots zone C.
Test Server B imports the ZFS pool using -R /backup.
Backup software backs up the mounted mirror volumes on Test Server B.

Later in the day after the backups finish, a script exports the ZFS
pools on test server B, and re-establishes the TruCopy mirror between
the storage arrays.

That looks awfully complicated.   Why don't you just clone a snapshot
and back up the clone?


Taking a snapshot and cloning incurs IO.  Backing up the clone incurs a
lot more IO reading off the disks and going over the network.  These
aren't acceptable costs in my situation.


So splitting a mirror and reconnecting it doesn't incur I/O?


The solution is complicated if you're starting from scratch.  I'm
working in an environment that already had all the pieces in place
(offsite synchronous mirroring, a test server to mount stuff up on,
scripts that automated the storage array mirror management, etc).  It
was setup that way specifically to accomplish short downtime outages for
cold backups with minimal or no IO hit to production.  So while it's
complicated, when it was put together it was also the most obvious thing
to do to drop my backup window to almost nothing, and keep all the IO
from the backup from impacting production.  And like I said, with a
different volume manager, it's been rock solid for years.

So, to ask the sanity check more specifically -
Is it reasonable to expect ZFS pools to be exported, have their luns
change underneath, then later import the same pool on those changed
drives again?


If you were splitting ZFS mirrors to read data from one half all would 
be sweet (and you wouldn't have to export the pool).  I guess the 
question here is what does TruCopy do under the hood when you re-connect 
the mirror?
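
For comparison, splitting at the ZFS level looks roughly like this (made-up 
names; zpool split needs a pool built from mirrors and a reasonably recent 
zpool version):

zpool split tank tankcopy            # detach one side of each mirror into a new pool
# on the backup host, once it can see those LUNs:
zpool import -R /backup tankcopy
# ...run the backup, then:
zpool export tankcopy
# re-attach the split devices on the production host to resilver (device names assumed):
zpool attach tank c0t0d0 c0t1d0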


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sol11 missing snapshot facility

2012-07-05 Thread Ian Collins

On 07/ 5/12 06:52 PM, Carsten John wrote:

Hello everybody,


for some reason I can not find the zfs-autosnapshot service facility any more. 
I already reinstalled time-slider, but it refuses to start:


RuntimeError: Error reading SMF schedule instances
Details:
['/usr/bin/svcs', '-H', '-o', 'state', 
'svc:/system/filesystem/zfs/auto-snapshot:monthly'] failed with exit code 1
svcs: Pattern 'svc:/system/filesystem/zfs/auto-snapshot:monthly' doesn't match 
any instances


Have you looked with svcs -a?

# svcs -a | grep zfs
disabled   Jul_02   svc:/system/filesystem/zfs/auto-snapshot:daily
disabled   Jul_02   svc:/system/filesystem/zfs/auto-snapshot:frequent
disabled   Jul_02   svc:/system/filesystem/zfs/auto-snapshot:hourly
disabled   Jul_02   svc:/system/filesystem/zfs/auto-snapshot:monthly
disabled   Jul_02   svc:/system/filesystem/zfs/auto-snapshot:weekly
disabled   Jul_02   svc:/application/time-slider/plugin:zfs-send

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sol11 missing snapshot facility

2012-07-05 Thread Ian Collins

On 07/ 5/12 09:25 PM, Carsten John wrote:


Hi Ian,

yes, I already checked that:

svcs -a | grep zfs
disabled   11:50:39 svc:/application/time-slider/plugin:zfs-send

is the only service I get listed.


Odd.

How did you install?

Is the manifest there 
(/lib/svc/manifest/system/filesystem/auto-snapshot.xml)?


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sol11 missing snapshot facility

2012-07-05 Thread Ian Collins

On 07/ 5/12 11:32 PM, Carsten John wrote:

-Original message-
To: Carsten Johncj...@mpi-bremen.de;
CC: zfs-discuss@opensolaris.org;
From:   Ian Collinsi...@ianshome.com
Sent:   Thu 05-07-2012 11:35
Subject:Re: [zfs-discuss] Sol11 missing snapshot facility

On 07/ 5/12 09:25 PM, Carsten John wrote:


Hi Ian,

yes, I already checked that:

svcs -a | grep zfs
disabled   11:50:39 svc:/application/time-slider/plugin:zfs-send

is the only service I get listed.


Odd.

How did you install?

Is the manifest there
(/lib/svc/manifest/system/filesystem/auto-snapshot.xml)?


Hi Ian,

I installed from CD/DVD, but it might have been in a rush, as I needed to 
replace a broken machine as quickly as possible.

The manifest is there:


ls /lib/svc/manifest/system/filesystem/
.  .. auto-snapshot.xml  autofs.xml 
local-fs.xml   minimal-fs.xml rmvolmgr.xml   root-fs.xml
ufs-quota.xml  usr-fs.xml



Running svcadm restart manifest-import should load it, or give you 
some idea why it won't load.
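
In case it helps, the sequence I'd try is (standard SMF commands, nothing 
site-specific):

  svcadm restart manifest-import
  svcs -xv manifest-import
  # the auto-snapshot instances should now be listed
  svcs -a | grep auto-snapshot
  # or import the manifest by hand if the restart doesn't pick it up
  svccfg import /lib/svc/manifest/system/filesystem/auto-snapshot.xml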


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?

2012-07-02 Thread Ian Collins

On 05/29/12 08:42 AM, Richard Elling wrote:

On May 28, 2012, at 2:48 AM, Ian Collins wrote:

On 05/28/12 08:55 PM, Sašo Kiselkov wrote:

..
If the drives show up at all, chances are you only need to work around
the power-up issue in Dell HDD firmware.

Here's what I had to do to get the drives going in my R515:
/kernel/drv/sd.conf

sd-config-list = "SEAGATE ST3300657SS", "power-condition:false",
                 "SEAGATE ST2000NM0001", "power-condition:false";

(that's for Seagate 300GB 15k SAS and 2TB 7k2 SAS drives, depending on
your drive model the strings might differ)


How would that work when the drive type is unknown (to format)?  I 
assumed if sd knows the type, so will format.


I haven't looked at the code recently, but if it is the same parser as 
used elsewhere,
then a partial match should work. Can someone try it out and report 
back to the list?

sd-config-list = "SEAGATE ST", "power-condition:false";



Well I finally got back to testing this box...

Yes, that shorthand fixes the power-up issue (tested from a cold start).
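
For anyone else hitting this, the workaround boils down to something like the 
following (a sketch - the vendor/product strings have to match what your own 
drives report):

  # append to /kernel/drv/sd.conf
  sd-config-list = "SEAGATE ST", "power-condition:false";
  # then reload the sd driver so the new property takes effect
  update_drv -vf sd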

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very sick iSCSI pool

2012-07-02 Thread Ian Collins

On 07/ 1/12 08:57 PM, Ian Collins wrote:

On 07/ 1/12 10:20 AM, Fajar A. Nugraha wrote:

On Sun, Jul 1, 2012 at 4:18 AM, Ian Collinsi...@ianshome.com   wrote:

On 06/30/12 03:01 AM, Richard Elling wrote:

Hi Ian,
Chapter 7 of the DTrace book has some examples of how to look at iSCSI
target
and initiator behaviour.

Thanks Richard, I 'll have a look.

I'm assuming the pool is hosed?

Before making that assumption, I'd try something simple first:
- reading from the imported iscsi disk (e.g. with dd) to make sure
it's not iscsi-related problem
- import the disk in another host, and try to read the disk again, to
make sure it's not client-specific problem
- possibly restart the iscsi server, just to make sure

Booting the initiator host from a live DVD image and attempting to
import the pool gives the same error report.


The pool's data appears to be recoverable when I import it read only.

The storage appliance is so full they can't delete files from it!  Now 
that shouldn't have caused problems with a fixed-size volume, but who 
knows?
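
For reference, the read-only recovery path is roughly as follows (the altroot, 
snapshot and destination names are illustrative):

  zpool import -o readonly=on -R /a fileserver
  # existing snapshots can still be sent as streams from a read-only pool
  zfs send -R fileserver/somefs@lastsnap | ssh otherhost zfs receive -d rescue
  # or fall back to a plain file-level copy of the filesystems mounted under /a
  rsync -a /a/somefs/ /net/otherhost/rescue/somefs/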


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very sick iSCSI pool

2012-07-01 Thread Ian Collins

On 07/ 1/12 10:20 AM, Fajar A. Nugraha wrote:

On Sun, Jul 1, 2012 at 4:18 AM, Ian Collinsi...@ianshome.com  wrote:

On 06/30/12 03:01 AM, Richard Elling wrote:

Hi Ian,
Chapter 7 of the DTrace book has some examples of how to look at iSCSI
target
and initiator behaviour.


Thanks Richard, I 'll have a look.

I'm assuming the pool is hosed?

Before making that assumption, I'd try something simple first:
- reading from the imported iscsi disk (e.g. with dd) to make sure
it's not iscsi-related problem
- import the disk in another host, and try to read the disk again, to
make sure it's not client-specific problem
- possibly restart the iscsi server, just to make sure


Booting the initiator host from a live DVD image and attempting to 
import the pool gives the same error report.

I suspect the problem is with your oracle storage appliance. But since
you say there's no errors there, then the simple tests should make
sure whethere it's client, disk, or zfs problem.


So did I.

I'll get the admin for that system to dig a little deeper and export a 
new volume to see if I can create a new pool.
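
The quick checks I have in mind, for reference (the LUN name is the one from 
the zpool status output; s0 assumes an EFI-labelled whole-disk vdev):

  # read the raw device to rule out transport problems
  dd if=/dev/rdsk/c0t600144F096C94AC74ECD96F20001d0s0 of=/dev/null bs=1M count=1024
  # per-device error counters kept by sd
  iostat -En c0t600144F096C94AC74ECD96F20001d0
  # the initiator's view of the target
  iscsiadm list target -v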


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Very sick iSCSI pool

2012-06-30 Thread Ian Collins

On 06/30/12 03:01 AM, Richard Elling wrote:

Hi Ian,
Chapter 7 of the DTrace book has some examples of how to look at iSCSI 
target

and initiator behaviour.


Thanks Richard, I 'll have a look.

I'm assuming the pool is hosed?


 -- richard

On Jun 28, 2012, at 10:47 PM, Ian Collins wrote:

I'm trying to work out the cause and a remedy for a very sick iSCSI pool 
on a Solaris 11 host.


The volume is exported from an Oracle storage appliance and there are 
no errors reported there.  The host has no entries in its logs 
relating to the network connections.


Any zfs or zpool commands that change the state of the pool (such as 
zfs mount or zpool export) hang and can't be killed.


fmadm faulty reports:

Jun 27 14:04:24 536fb2ad-1fca-c8b2-fc7d-f5a4a94c165d  ZFS-8000-FD 
   Major


Host: taitaklsc01
Platform: SUN-FIRE-X4170-M2-SERVER  Chassis_id  : 1142FMM02N
Product_sn  : 1142FMM02N

Fault class : fault.fs.zfs.vdev.io
Affects : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
 faulted but still in service
Problem in  : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
 faulted but still in service

Description : The number of I/O errors associated with a ZFS device 
exceeded
acceptable levels.  Refer to 
http://sun.com/msg/ZFS-8000-FD

 for more information.

The zpool status paints a very gloomy picture:

 pool: fileserver
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
   continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Fri Jun 29 11:59:59 2012
   858K scanned out of 15.7T at 43/s, (scan is slow, no estimated time)
   567K resilvered, 0.00% done
config:

       NAME                                     STATE   READ WRITE CKSUM
       fileserver                               ONLINE     0 1.16M     0
         c0t600144F096C94AC74ECD96F20001d0      ONLINE     0 1.16M     0  (resilvering)


errors: 1557164 data errors, use '-v' for a list

Any ideas how to determine the cause of the problem and remedy it?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422










--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Very sick iSCSI pool

2012-06-28 Thread Ian Collins
I'm trying to work out the cause and a remedy for a very sick iSCSI pool on a 
Solaris 11 host.


The volume is exported from an Oracle storage appliance and there are no 
errors reported there.  The host has no entries in its logs relating to 
the network connections.


Any zfs or zpool commands that change the state of the pool (such as zfs 
mount or zpool export) hang and can't be killed.


fmadm faulty reports:

Jun 27 14:04:24 536fb2ad-1fca-c8b2-fc7d-f5a4a94c165d  ZFS-8000-FDMajor

Host: taitaklsc01
Platform: SUN-FIRE-X4170-M2-SERVER  Chassis_id  : 1142FMM02N
Product_sn  : 1142FMM02N

Fault class : fault.fs.zfs.vdev.io
Affects : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
  faulted but still in service
Problem in  : zfs://pool=fileserver/vdev=68c1bdefa6f97db8
  faulted but still in service

Description : The number of I/O errors associated with a ZFS device exceeded
 acceptable levels.  Refer to 
http://sun.com/msg/ZFS-8000-FD

  for more information.

The zpool status paints a very gloomy picture:

  pool: fileserver
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Jun 29 11:59:59 2012
858K scanned out of 15.7T at 43/s, (scan is slow, no estimated time)
567K resilvered, 0.00% done
config:

        NAME                                     STATE   READ WRITE CKSUM
        fileserver                               ONLINE     0 1.16M     0
          c0t600144F096C94AC74ECD96F20001d0      ONLINE     0 1.16M     0  (resilvering)


errors: 1557164 data errors, use '-v' for a list

Any ideas how to determine the cause of the problem and remedy it?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?

2012-05-28 Thread Ian Collins

On 05/ 7/12 04:08 PM, Ian Collins wrote:

On 05/ 7/12 03:42 PM, Greg Mason wrote:

I am currently trying to get two of these things running Illumian. I don't have 
any particular performance requirements, so I'm thinking of using some sort of 
supported hypervisor, (either RHEL and KVM or VMware ESXi) to get around the 
driver support issues, and passing the disks through to an Illumian guest.

The H310 does indeed support pass-through (the non-raid mode), but one thing to 
keep in mind is that I was only able to configure a single boot disk. I 
configured the rear two drives into a hardware raid 1 and set the virtual disk 
as the boot disk so that I can still boot the system if an OS disk fails.

Once Illumos is better supported on the R720 and the PERC H310, I plan to get 
rid of the hypervisor silliness and run Illumos on bare metal.

Thank you for the feedback Greg.

Using a hypervisor layer is our fall-back position.  My next attempt
would be SmartOs if I can't get the cards swapped (the R720 currently
has a Broadcom 5720 NIC).


To follow up, the H310 appears to be useless in non-raid mode.

The drives do show up in Solaris 11 format, but they show up as unknown, 
unformatted drives.  One oddity is the box has two SATA SSDs which also 
show up in the card's BIOS, but present OK to Solaris.


I'd like to re-FLASH the cards, but I don't think Dell would be too 
happy with me doing that on an evaluation system...


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?

2012-05-28 Thread Ian Collins

On 05/28/12 08:55 PM, Sašo Kiselkov wrote:

On 05/28/2012 10:48 AM, Ian Collins wrote:

To follow up, the H310 appears to be useless in non-raid mode.

The drives do show up in Solaris 11 format, but they show up as
unknown, unformatted drives.  One oddity is the box has two SATA
SSDs which also show up the card's BIOS, but present OK to
Solaris.

I'd like to re-FLASH the cards, but I don't think Dell would be
too happy with me doing that on an evaluation system...

If the drives show up at all, chances are you only need to work around
the power-up issue in Dell HDD firmware.

Here's what I had to do to get the drives going in my R515:
/kernel/drv/sd.conf

sd-config-list = "SEAGATE ST3300657SS", "power-condition:false",
                 "SEAGATE ST2000NM0001", "power-condition:false";

(that's for Seagate 300GB 15k SAS and 2TB 7k2 SAS drives, depending on
your drive model the strings might differ)


How would that work when the drive type is unknown (to format)?  I 
assumed if sd knows the type, so will format.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?

2012-05-28 Thread Ian Collins

On 05/28/12 10:53 PM, Sašo Kiselkov wrote:

On 05/28/2012 11:48 AM, Ian Collins wrote:

On 05/28/12 08:55 PM, Sašo Kiselkov wrote:

On 05/28/2012 10:48 AM, Ian Collins wrote:

To follow up, the H310 appears to be useless in non-raid mode.

The drives do show up in Solaris 11 format, but they show up as
unknown, unformatted drives.  One oddity is the box has two SATA
SSDs which also show up the card's BIOS, but present OK to
Solaris.

I'd like to re-FLASH the cards, but I don't think Dell would be
too happy with me doing that on an evaluation system...

If the drives show up at all, chances are you only need to work around
the power-up issue in Dell HDD firmware.

Here's what I had to do to get the drives going in my R515:
/kernel/drv/sd.conf

sd-config-list = "SEAGATE ST3300657SS", "power-condition:false",
                 "SEAGATE ST2000NM0001", "power-condition:false";

(that's for Seagate 300GB 15k SAS and 2TB 7k2 SAS drives, depending on
your drive model the strings might differ)

How would that work when the drive type is unknown (to format)?  I
assumed if sd knows the type, so will format.

Simply take out the drive and have a look at the label.


Tricky when the machine is on a different continent!

Joking aside, *I* know what the drive is, the OS as far as I can tell 
doesn't.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?

2012-05-28 Thread Ian Collins

On 05/28/12 11:01 PM, Sašo Kiselkov wrote:

On 05/28/2012 12:59 PM, Ian Collins wrote:

On 05/28/12 10:53 PM, Sašo Kiselkov wrote:

On 05/28/2012 11:48 AM, Ian Collins wrote:

On 05/28/12 08:55 PM, Sašo Kiselkov wrote:

On 05/28/2012 10:48 AM, Ian Collins wrote:

To follow up, the H310 appears to be useless in non-raid mode.

The drives do show up in Solaris 11 format, but they show up as
unknown, unformatted drives.  One oddity is the box has two SATA
SSDs which also show up the card's BIOS, but present OK to
Solaris.

I'd like to re-FLASH the cards, but I don't think Dell would be
too happy with me doing that on an evaluation system...

If the drives show up at all, chances are you only need to work around
the power-up issue in Dell HDD firmware.

Here's what I had to do to get the drives going in my R515:
/kernel/drv/sd.conf

sd-config-list = "SEAGATE ST3300657SS", "power-condition:false",
                 "SEAGATE ST2000NM0001", "power-condition:false";

(that's for Seagate 300GB 15k SAS and 2TB 7k2 SAS drives, depending on
your drive model the strings might differ)

How would that work when the drive type is unknown (to format)?  I
assumed if sd knows the type, so will format.

Simply take out the drive and have a look at the label.

Tricky when the machine is on a different continent!

Joking aside, *I* know what the drive is, the OS as far as I can tell
doesn't.

Can you have a look at your /var/adm/messages or dmesg to check whether
the OS is complaining about failed to power up on the relevant drives?
If yes, then the above fix should work for you, all you need to do is
determine the exact manufacturer and model to enter into sd.conf and
reload the driver via update_drv -vf sd.


Yes I do see that warning for the non-raid drives.

The problem is I'm booting from a remote ISO image, so I can't alter 
/kernel/drv/sd.conf.


I'll play more tomorrow, typing on a remote console inside an RDP 
session running in a VNC session on a virtual machine is interesting :)


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] need information about ZFS API

2012-05-28 Thread Ian Collins

On 05/29/12 08:32 AM, Richard Elling wrote:

Hi Dhiraj,

On May 27, 2012, at 11:28 PM, Dhiraj Bhandare wrote:


Hi All

I would like to create a sample application for ZFS using C++/C and 
libzfs.
I am very new to ZFS, I would like to have an some information about 
ZFS API.

Even some sample code will be useful.
Looking for help and constructive suggestion.


libzfs is a private interface (see Solaris man page for attributes)
It was not designed to be used directly by external programmers.
I can't comment on what Oracle might or might not be doing, but for the
open source community, there is a project underway called libzfs_core
that is developing a stable library for external consumers. For more info,
see
http://smartos.org/2012/01/13/the-future-of-libzfs/



That's good news.  It's a shame it wasn't announced here.

I'm one of the many Matt refers to in the presentation who has been 
using libzfs.  I started using it pretty much the week ZFS shipped and 
it was a long while after that I discovered the private nature of 
the interface (through a posting here asking for an enhancement!).


Since then I have been using a thin wrapper to decouple my applications 
from changes to the API.  Generally the API has been stable for basic 
operations such as iteration and accessing properties.  Not so for send 
and receive!


I have a simple (150 line) C++ wrapper that supports iteration and 
property access I'm happy to share if anyone is interested.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hard Drive Choice Question

2012-05-17 Thread Ian Collins

On 05/17/12 02:53 AM, Paul Kraus wrote:

 I have a small server at home (HP Proliant Micro N36) that I use
for file, DNS, DHCP, etc. services. I currently have a zpool of four
mirrored 1 TB Seagate ES2 SATA drives. Well, it was a zpool of four
until last night when one of the drives died. ZFS did it's job and all
the data is still OK.

 The drive is still under warranty and is going back to Seagate,
but it raised an issue. I want to pick up a spare drive or two so that
I don't have to wait for shipping delays when a drive fails. I was
just going to pick up another 1 TB ES2 or two, but I find that those
drives are no longer available (I bought mine in 2009, warranty is up
in 2014).

 What do people like today for 7x24 operation SATA drives? I am
willing to consider 2TB, but don't really need the extra capacity (but
if that is all the market offers, I don't have to use the other half
:-) I found a Seagate Constellation ES 2 TB for about $350 (which is
more than I really want to spend, I got the ES2 1TB drives for about
$130 when I bought them). I have been sticking with Seagate as I am
comfortable with them, but am willing to look at others. The only
thing I insist on is that the drive be rated for 7x24 operation.


I wouldn't be too fussed about 7x24 rating in a home server.

I still have a set of 10 regular Seagate drives I bought in 2007 that 
were spinning non stop for four years in a very hostile environment (my 
garage!).  They simply refuse to die and I'm still using them in various 
test systems.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Unexpected error adding a cache device to existing pool

2012-05-14 Thread Ian Collins
On a Solaris 11 system I have a pool that was originally built with a 
log and a cache device on a single SSD.  The SSD died and I realised I 
should have a mirrored log, so I've just tried to replace the log and cache 
with a pair of SSDs.


Adding the log was OK:

zpool add -f export log mirror c10t3d0s0 c10t4d0s0

But adding the cache fails:

zpool add -f export cache c10t3d0s1 c10t4d0s1
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c10t3d0s2 is part of active ZFS pool export. Please see zpool(1M).
/dev/dsk/c10t3d0s1 overlaps with /dev/dsk/c10t3d0s2

Now that looks impossible to repair, as s2 can't be removed.  The SSD 
partition table is:


Total disk cylinders available: 19932 + 2 (reserved cylinders)

Part  TagFlag Cylinders SizeBlocks
  0 unassignedwm   0 -  2674   16.00GB(2675/0/0)   33555200
  1 unassignedwm2675 - 19931  103.22GB(17257/0/0) 216471808
  2 backupwu   0 - 19931  119.22GB(19932/0/0) 250027008

Is there a solution?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Unexpected error adding a cache device to existing pool

2012-05-14 Thread Ian Collins

On 05/14/12 10:32 PM, Carson Gaspar wrote:

On 5/14/12 2:02 AM, Ian Collins wrote:

Adding the log was OK:

zpool add -f export log mirror c10t3d0s0 c10t4d0s0

But adding the cache fails:

zpool add -f export cache c10t3d0s1 c10t4d0s1
invalid vdev specification
the following errors must be manually repaired:
/dev/dsk/c10t3d0s2 is part of active ZFS pool export. Please see zpool(1M).
/dev/dsk/c10t3d0s1 overlaps with /dev/dsk/c10t3d0s2

The only solution I know of is to get rid of the whole-disk slice s2
from the disk label. I ended up using prtvtoc to dump the table, editing
it by hand, and feeding it to fmthard.

You could also try making s0 start at cylinder 1 instead of zero, so
zpool doesn't see a magic number on s2, but I don't know if that will be
enough.



Thank you for the suggestions Carson.

Making s0 start at cylinder 1 did the trick.  I'm sure I didn't have to 
do that when I originally built the pool, but that was back on Solaris 
11 Express.
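
For anyone else stuck with the overlapping s2 slice, Carson's prtvtoc/fmthard 
route looks like this (a sketch; double check the edited table before writing 
it back):

  prtvtoc /dev/rdsk/c10t3d0s2 > /tmp/c10t3d0.vtoc
  # edit the table: move the start of s0 off sector 0, or drop the s2 entry
  vi /tmp/c10t3d0.vtoc
  fmthard -s /tmp/c10t3d0.vtoc /dev/rdsk/c10t3d0s2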


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Strange hang during snapshot receive

2012-05-11 Thread Ian Collins

On 05/11/12 02:01 AM, Mike Gerdts wrote:

On Thu, May 10, 2012 at 5:37 AM, Ian Collinsi...@ianshome.com  wrote:

I have an application I have been using to manage data replication for a
number of years.  Recently we started using a new machine as a staging
server (not that new, an x4540) running Solaris 11 with a single pool built
from 7x6 drive raidz.  No dedup and no reported errors.

On that box, and nowhere else, I see empty snapshots taking 17 or 18 seconds
to write.  Everywhere else they return in under a second.


Have you installed any SRUs?  If not, you could be seeing:


The machine was at SRU 3.


7060894 zfs recv is excruciatingly slow

which is fixed in Solaris 11 SRU 5.


Thanks Mike, that appears to be it.  Updating to SRU 6 fixed the issue.

If you are using zones and are using any https pkg(5) origins (such as
https://pkg.oracle.com/solaris/support), I suggest reading
https://forums.oracle.com/forums/thread.jspa?threadID=2380689tstart=15
before updating to SRU 6 (SRU 5 is fine, however).  The fix for the
problem mentioned in that forums thread should show up in an upcoming
SRU via CR 7157313.



Luckily I have a local repository!
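
For anyone checking their own systems, confirming the SRU level and updating 
from a local repository amounts to (the repository path is illustrative):

  pkg info entire
  # point the publisher at the local repository and update
  pkg set-publisher -G '*' -g file:///export/ipsrepo solaris
  pkg update --accept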

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Strange hang during snapshot receive

2012-05-10 Thread Ian Collins
I have an application I have been using to manage data replication for a 
number of years.  Recently we started using a new machine as a staging 
server (not that new, an x4540) running Solaris 11 with a single pool 
built from 7x6 drive raidz.  No dedup and no reported errors.


On that box, and nowhere else, I see empty snapshots taking 17 or 18 
seconds to write.  Everywhere else they return in under a second.


Using truss and the last published source code, it looks like the pause 
is between a printf and the call to zfs_ioctl, and there aren't any 
other function calls between them:


100.5124  0.0004  open(/dev/zfs, O_RDWR|O_EXCL)                = 10
100.7582  0.0001  read(7, \0\0\0\0\0\0\0\0ACCBBAF5.., 312)     = 312
100.7586  0.      read(7, 0x080464F8, 0)                       = 0
100.7591  0.      time()                                       = 1336628656
100.7653  0.0035  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040CF0)   = 0
100.7699  0.0022  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)   = 0
100.7740  0.0016  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040580)   = 0
100.7787  0.0026  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x080405B0)   = 0
100.7794  0.0001  write(1,  r e c e i v i n g   i n.., 75)     = 75
118.3551  0.6927  ioctl(8, ZFS_IOC_RECV, 0x08042570)           = 0
118.3596  0.0010  ioctl(8, ZFS_IOC_OBJSET_STATS, 0x08040900)   = 0
118.3598  0.      time()                                       = 1336628673
118.3600  0.      write(1,  r e c e i v e d   3 1 2.., 45)     = 45

zpool iostat (1 second interval) for the period is:

tank        12.5T  6.58T    175      0   271K      0
tank        12.5T  6.58T    176      0   299K      0
tank        12.5T  6.58T    189      0   259K      0
tank        12.5T  6.58T    156      0   231K      0
tank        12.5T  6.58T    170      0   243K      0
tank        12.5T  6.58T    252      0   295K      0
tank        12.5T  6.58T    179      0   200K      0
tank        12.5T  6.58T    214      0   258K      0
tank        12.5T  6.58T    165      0   210K      0
tank        12.5T  6.58T    154      0   178K      0
tank        12.5T  6.58T    186      0   221K      0
tank        12.5T  6.58T    184      0   215K      0
tank        12.5T  6.58T    218      0   248K      0
tank        12.5T  6.58T    175      0   228K      0
tank        12.5T  6.58T    146      0   194K      0
tank        12.5T  6.58T     99    258   209K  1.50M
tank        12.5T  6.58T    196    296   294K  1.31M
tank        12.5T  6.58T    188    130   229K   776K

Can anyone offer any insight or further debugging tips?
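
One further check I have in mind is to confirm the time really is spent inside 
the ioctl rather than in my application or the transport, along these lines 
(<pid> is the receiving process; the probe times every ioctl it issues, not 
just ZFS_IOC_RECV):

  dtrace -p <pid> -n '
    syscall::ioctl:entry /pid == $target/ { self->t = timestamp; }
    syscall::ioctl:return /self->t/ {
      @["ioctl latency (ns)"] = quantize(timestamp - self->t);
      self->t = 0;
    }'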

Thanks.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Hung zfs destroy

2012-05-07 Thread Ian Collins

On 05/ 8/12 08:36 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Ian Collins

On a Solaris 11 (SR3) system I have a zfs destroy process what appears
to be doing nothing and can't be killed.  It has used 5 seconds of CPU
in a day and a half, but truss -p won't attach.  No data appears to have
been removed.  The dataset (but not the pool) is busy.

I thought this was an old problem that was fixed long ago in Solaris 10
(I had several temporary patches over the years), but it appears to be
alive and well.

How big is your dataset?


Small, 15GB.


  On what type of disks/pool?


Single iSCSI volume.


zfs destroy does indeed take time (unlike zpool destroy.)  A couple of days
might be normal expected behavior, depending on your configuration.  You
didn't specify if you have dedup...  Dedup will greatly hurt your zfs
destroy speed, too.


I've yet to find a system with enough RAM to make dedup worthwhile!

After 5 days, a grand total of 1.2GB has been removed and the process
responded to kill -9 and exited...

I just re-ran the command and it completed in 2 seconds.  Well, odd.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Has anyone used a Dell with a PERC H310?

2012-05-06 Thread Ian Collins
I'm trying to configure a DELL R720 (not a pleasant experience) which 
has an H710p card fitted.


The H710p definitely doesn't support JBOD, but the H310 looks like it 
might (the data sheet mentions non-RAID).  Has anyone used one with ZFS?


Thanks,

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Has anyone used a Dell with a PERC H310?

2012-05-06 Thread Ian Collins

On 05/ 7/12 03:42 PM, Greg Mason wrote:

I am currently trying to get two of these things running Illumian. I don't have 
any particular performance requirements, so I'm thinking of using some sort of 
supported hypervisor, (either RHEL and KVM or VMware ESXi) to get around the 
driver support issues, and passing the disks through to an Illumian guest.

The H310 does indeed support pass-through (the non-raid mode), but one thing to 
keep in mind is that I was only able to configure a single boot disk. I 
configured the rear two drives into a hardware raid 1 and set the virtual disk 
as the boot disk so that I can still boot the system if an OS disk fails.

Once Illumos is better supported on the R720 and the PERC H310, I plan to get 
rid of the hypervisor silliness and run Illumos on bare metal.


Thank you for the feedback Greg.

Using a hypervisor layer is our fall-back position.  My next attempt 
would be SmartOs if I can't get the cards swapped (the R720 currently 
has a Broadcom 5720 NIC).


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Hung zfs destroy

2012-05-04 Thread Ian Collins
On a Solaris 11 (SR3) system I have a zfs destroy process that appears 
to be doing nothing and can't be killed.  It has used 5 seconds of CPU 
in a day and a half, but truss -p won't attach.  No data appears to have 
been removed.  The dataset (but not the pool) is busy.


I thought this was an old problem that was fixed long ago in Solaris 10 
(I had several temporary patches over the years), but it appears to be 
alive and well.


Any hints?
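
The only debugging I can think of so far is to look at where the process is 
stuck (<pid> is the zfs destroy process):

  # user-level stack (this may hang too if the process is stuck in the kernel)
  pstack <pid>
  # kernel stacks for the process
  echo "0t<pid>::pid2proc | ::walk thread | ::findstack -v" | mdb -k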

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-26 Thread Ian Collins

On 04/26/12 10:12 PM, Jim Klimov wrote:

On 2012-04-26 2:20, Ian Collins wrote:

On 04/26/12 09:54 AM, Bob Friesenhahn wrote:

On Wed, 25 Apr 2012, Rich Teer wrote:

Perhaps I'm being overly simplistic, but in this scenario, what would
prevent
one from having, on a single file server, /exports/nodes/node[0-15],
and then
having each node NFS-mount /exports/nodes from the server? Much simplier
than
your example, and all data is available on all machines/nodes.

This solution would limit bandwidth to that available from that single
server. With the cluster approach, the objective is for each machine
in the cluster to primarily access files which are stored locally.
Whole files could be moved as necessary.

Distributed software building faces similar issues, but I've found once
the common files have been read (and cached) by each node, network
traffic becomes one way (to the file server). I guess that topology
works well when most access to shared data is read.

Which reminds me: older Solarises used to have a nifty-looking
(via descriptions) cachefs, apparently to speed up NFS clients
and reduce traffic, which we did not get to really use in real
life. AFAIK Oracle EOLed it for Solaris 11, and I am not sure
it is in illumos either.


I don't think it even made it into Solaris 10.  I used to use it with 
Solaris 8 back in the days when 100Mb switches were exotic!

Does caching in current Solaris/illumos NFS client replace those
benefits, or did the project have some merits of its own (like
caching into local storage of client, so that the cache was not
empty after reboot)?

It did have local backing store, but my current desktop has more RAM 
than that Solaris 8 box had disk and my network is 100 times faster, so 
it doesn't really matter any more.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Ian Collins

On 04/26/12 09:54 AM, Bob Friesenhahn wrote:

On Wed, 25 Apr 2012, Rich Teer wrote:

Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simplier
than
your example, and all data is available on all machines/nodes.

This solution would limit bandwidth to that available from that single
server.  With the cluster approach, the objective is for each machine
in the cluster to primarily access files which are stored locally.
Whole files could be moved as necessary.


Distributed software building faces similar issues, but I've found once 
the common files have been read (and cached) by each node, network 
traffic becomes one way (to the file server).  I guess that topology 
works well when most access to shared data is read.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cluster vs nfs

2012-04-25 Thread Ian Collins

On 04/26/12 10:34 AM, Paul Archer wrote:

2:34pm, Rich Teer wrote:


On Wed, 25 Apr 2012, Paul Archer wrote:


Simple. With a distributed FS, all nodes mount from a single DFS. With NFS,
each node would have to mount from each other node. With 16 nodes, that's
what, 240 mounts? Not to mention your data is in 16 different
mounts/directory
structures, instead of being in a unified filespace.

Perhaps I'm being overly simplistic, but in this scenario, what would prevent
one from having, on a single file server, /exports/nodes/node[0-15], and then
having each node NFS-mount /exports/nodes from the server?  Much simplier
than
your example, and all data is available on all machines/nodes.


That assumes the data set will fit on one machine, and that machine won't be a
performance bottleneck.


Aren't those general considerations when specifying a file server?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Two disks giving errors in a raidz pool, advice needed

2012-04-22 Thread Ian Collins

On 04/23/12 01:47 PM, Manuel Ryan wrote:
Hello, I have looked around this mailing list and other virtual spaces 
and I wasn't able to find a situation similar to this weird one.


I have a 6-disk raidz pool (zfs v15). After a scrub, the status of the 
pool and all disks still show up as ONLINE but two of the disks are 
starting to give me errors and I do have fatal data corruption. The 
disks seems to be failing differently :


disk 2 has 78 (not growing) read errors, 43k (growing) write errors 
and 3 (not growing) checksum errors.


disk 5 has 0 read errors, 0 write errors but 7.4k checksum errors 
(growing).


Data corruption is around 22k files.

I plan to replace both disks. Which disk do you think should be 
replaced first to lose as little data as possible?


I was thinking of replacing disk 5 first as it seems to have a lot of 
silent data corruption, so maybe it's a bad idea to use its output 
to replace disk 2. Also, checksum and read errors on disk 2 do not seem 
to be growing while I used the pool to back up data (corrupted files could 
not be accessed, but a lot of files were fine) but write errors are 
growing extremely fast. So reading uncorrupted data from disk 2 seems 
to be working but writing on it seems to be problematic.


Do you guys also think I should change disk 5 first or am I missing 
something ?


If it were my data, I'd set the pool read only, backup, rebuild and 
restore.  You do risk further data loss (maybe even pool loss) while the 
new drive is resilvering.


I would only use raidz for unimportant data, or for a copy of data from 
a more robust pool.
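
A rough outline of what I mean, with made-up pool and device names:

  # stop new writes and take a copy before stressing the suspect disks
  zpool export tank
  zpool import -o readonly=on tank
  rsync -a /tank/ /some/other/storage/
  # only then replace the drives, one at a time, letting each resilver finish
  zpool replace tank c0t5d0 c0t7d0
  zpool status -v tank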


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Improving snapshot write performance

2012-04-11 Thread Ian Collins
I use an application with a fairly large receive data buffer (256MB) to 
replicate data between sites.


I have noticed the buffer becoming completely full when receiving 
snapshots for some filesystems, even over a slow (~2MB/sec) WAN 
connection.  I assume this is due to the changes being widely scattered.


Is there any way to improve this situation?

Thanks,

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving snapshot write performance

2012-04-11 Thread Ian Collins

On 04/12/12 04:17 AM, Richard Elling wrote:

On Apr 11, 2012, at 1:34 AM, Ian Collins wrote:

I use an application with a fairly large receive data buffer (256MB) 
to replicate data between sites.


I have noticed the buffer becoming completely full when receiving 
snapshots for some filesystems, even over a slow (~2MB/sec) WAN 
connection.  I assume this is due to the changes being widely scattered.


Widely scattered on the sending side, receiving side should be mostly 
contiguous...


That's what I originally thought.

unless you are mostly full or there is some other cause of slow 
writes. The usual disk-oriented
performance analysis will show if this is the case. Most likely, 
something else is going on here.




Odd.  The pool is a single iSCSI volume exported from a 7320 and there 
is 18TB free.


I see the same issues with local replications on our LAN.  The 
filesystems that appear to write slowly are ones containing many small 
files, such as office documents.


Over the WAN, the receive buffer high water mark is usually the TCP 
receive window size, except for the apparently slow filesystems.


I'll add some more diagnostics.
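
The diagnostics I have in mind, for reference (the pool name is illustrative):

  # watch the receiving pool while a slow filesystem is coming in
  zpool iostat -v tank 5
  iostat -xnz 5
  # ARC hit/miss counters before and after the receive
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses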

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving snapshot write performance

2012-04-11 Thread Ian Collins

On 04/12/12 09:00 AM, Jim Klimov wrote:

2012-04-11 23:55, Ian Collins wrote:

Odd. The pool is a single iSCSI volume exported from a 7320 and there is
18TB free.

Lame question: is that 18Tb free on the pool inside the
iSCSI volume, or on the backing pool on 7320?

I mean that as far as the external pool is concerned,
the zvol's blocks are allocated - even if the internal
pool considers them deleted but did not zero them out
and/or TRIM them explicitly.

Thus there may be lags due to fragmentation on the backing
external pool (physical on 7320), especially if it is
not very free and/or ifs free space is already too heavily
fragmented into many small bubbles.


I'll check, but I see the same effect with local replications as well.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving snapshot write performance

2012-04-11 Thread Ian Collins

On 04/12/12 09:51 AM, Peter Jeremy wrote:

On 2012-Apr-11 18:34:42 +1000, Ian Collinsi...@ianshome.com  wrote:

I use an application with a fairly large receive data buffer (256MB) to
replicate data between sites.

I have noticed the buffer becoming completely full when receiving
snapshots for some filesystems, even over a slow (~2MB/sec) WAN
connection.  I assume this is due to the changes being widely scattered.

As Richard pointed out, the write side should be mostly contiguous.


Is there any way to improve this situation?

Is the target pool nearly full (so ZFS is spending lots of time searching
for free space)?

Do you have dedupe enabled on the target pool?  This would force ZFS to
search the DDT to write blocks - this will be expensive, especially if
you don't have enough RAM.

Do yoy have a high compression level (gzip or gzip-N) on the target
filesystems, without enough CPU horsepower?

Do you have a dying (or dead) disk in the target pool?



No to all of the above!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Puzzling problem with zfs receive exit status

2012-03-29 Thread Ian Collins

On 03/29/12 10:46 PM, Borja Marcos wrote:

Hello,

I hope someone has an idea.

I have a replication program that copies a dataset from one server to another 
one. The replication mechanism is the obvious one, of course:

  zfs send -Ri from snapshot(n-1) snapshot(n) > file
scp the file to the remote machine (I do it this way instead of using a pipeline so that a 
network error won't interrupt a receive data stream)
and on the remote machine,
zfs receive -Fd pool

It's been working perfectly for months, no issues. However, yesterday we began 
to see something weird: the zfs receive being executed on the remote machine is 
exiting with an exit status of 1, even though the replication is finished, and 
I see the copied snapshots on the remote machine.

Any ideas? It's really puzzling. It seems that the replication is working (a 
zfs list -t snapshot shows the new snapshots correctly applied to the dataset) 
but I'm afraid there's some kind of corruption.


Does zfs receive produce any warnings?  Have you tried adding -v?
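
Something along these lines on the receiving machine should show whether the 
non-zero status comes from zfs receive itself or from something after it (the 
file name is just a placeholder for the copied stream):

  zfs receive -Fdv pool < /path/to/stream.file
  echo "zfs receive exit status: $?"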

--
 Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Receive failing with invalid backup stream error

2012-03-09 Thread Ian Collins

On 03/10/12 01:48 AM, Jim Klimov wrote:

2012-03-09 9:24, Ian Collins wrote:

I sent the snapshot to a file, copied the file to the remote host and
piped the file into zfs receive. That worked and I was able to send
further snapshots with ssh.

Odd.

Is it possible that in case of zfs send ... | ssh | zfs recv
piping, the two ZFS processes can have some sort of dialog and
misunderstanding in your case; while zfs-sending to a file has
no dialog and some commonly-working default format/assumptions?

As a wild guess, two systems might have different opinions for
example regarding dedup during dialog, while it was not even
considered when passing through files?


Both systems are identical (same hardware, same SRU).

The receive also fails if the output of the libzfs zfs_send() function 
is connected through a socket to zfs_receive() on the other box.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs send/receive script

2012-03-09 Thread Ian Collins

On 03/10/12 02:48 AM, Cameron Hanover wrote:

On Mar 6, 2012, at 8:26 AM, Carsten John wrote:


Hello everybody,

I set up a script to replicate all zfs filesystems (some 300 user home directories in 
this case) within a given pool to a mirror machine. The basic idea is to send 
the snapshots incrementally if the corresponding snapshot exists on the remote side or send 
a complete snapshot if no corresponding previous snapshot is available

The setup basically works, but from time to time (within a run over all 
filesystems) I get error messages like:

"cannot receive new filesystem stream: dataset is busy" or

"cannot receive incremental filesystem stream: dataset is busy"

I've seen similar error messages from a script I've written, as well.  Mine 
does create a lock file and won't run if a `zfs send` is already in progress.
My only guess is that the second (or third, or...) filesystem starts sending to 
the receiving host before the latter has fully finished the `zfs recv` process. 
 I've considered putting a 5 second pause between successive processes, but the 
errors are intermittent enough that it's pretty low on my to-do list.


I have also seen the same issue (a long time ago) and the application I 
use for replication still has a one second pause between sends to fix  
the problem.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Receive failing with invalid backup stream error

2012-03-08 Thread Ian Collins

On 03/ 3/12 11:57 AM, Ian Collins wrote:

Hello,

I am having problems sending some snapshots between two fully up to date
Solaris 11 systems:

zfs send -i tank/live/fs@20120226_0705 tank/live/fs@20120226_1105 | ssh
remote zfs receive -vd fileserver/live
receiving incremental stream of tank/live/fs@20120226_1105 into
fileserver/live/fs@20120226_1105
cannot receive incremental stream: invalid backup stream

Both pools and filesystems are at the latest revision.  Most the other
filesystems in the pool can be sent without issues.

The filesystem was upgraded yesterday, which is when the problems
started.  The snapshots are from 26/02.

Other filesystems that were upgraded yesterday receive fine, so I don't
think the problem is directly related to the upgrade.

Any ideas?

I haven't had a solution from support yet, but I do have a workaround if 
anyone else encounters the same problem.


I sent the snapshot to a file, coped the file to the remote host and 
piped the file into zfs receive.  That worked and I was able to send 
further snapshots with ssh.


Odd.
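
If it happens again I'll compare the stream headers; zstreamdump makes that 
easy enough (the path is the file from the workaround above):

  zstreamdump -v < /path/to/snapshot.stream | head -40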

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Receive failing with invalid backup stream error

2012-03-02 Thread Ian Collins

Hello,

I am having problems sending some snapshots between two fully up to date 
Solaris 11 systems:


zfs send -i tank/live/fs@20120226_0705 tank/live/fs@20120226_1105 | ssh 
remote zfs receive -vd fileserver/live
receiving incremental stream of tank/live/fs@20120226_1105 into 
fileserver/live/fs@20120226_1105

cannot receive incremental stream: invalid backup stream

Both pools and filesystems are at the latest revision.  Most of the other 
filesystems in the pool can be sent without issues.


The filesystem was upgraded yesterday, which is when the problems 
started.  The snapshots are from 26/02.


Other filesystems that were upgraded yesterday receive fine, so I don't 
think the problem is directly related to the upgrade.


Any ideas?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs diff performance

2012-02-28 Thread Ian Collins

On 02/28/12 12:53 PM, Ulrich Graef wrote:

Hi Ian,

On 26.02.12 23:42, Ian Collins wrote:

I had high hopes of significant performance gains using zfs diff in
Solaris 11 compared to my home-brew stat based version in Solaris 10.
However the results I have seen so far have been disappointing.

Testing on a reasonably sized filesystem (4TB), a diff that listed 41k
changes took 77 minutes. I haven't tried my old tool, but I would
expect the same diff to take a couple of hours.

Size does not matter (at least here).
How many files do you have and do you have enough cache in main memory
(25% of ARC) or cache device (set to metadata only).


Last time I looked, about 10 million files.

If you are able to manage that every dnode (512 Byte) is in the ARC or
the L2ARC then your compare will fly!

When your are doing too much other stuff (do you IO? Do you have
applications running?)
They will move dnode data out of the direct access and compare needs to
read a lot from disk.


There was a send running from the same pool.


You are comparing a measurement with a guess. That is not a valid test.


The guess is based on the last time I ran my old diff tool.


The box is well specified, an x4270 with 96G of RAM and a FLASH
accelerator card used for log and cache.

Number of files/size of files is missing.


As I said, about 10 million, with sizes ranging from bytes to gigabytes.

How much of the pool is used (in %)?


63%

Perhaps the recordsize is lowered, then
How much is used for the cache.
Did you set secondarycache=metadata?


No.


When, is your burn in long enough, that all the metadata is on fast devices?
How large is your L2ARC?


72GB.

What is running in parallel to your test?
What is the disk configuration (you know: disks are slow)?


A stripe of five 2-way mirrors.


Do you use de-duplication (does not directly harm the performance, but
needs memory
and slows down zfs diff through that)?


No dedup!


Tell me the hit rates of the cache (metadata and data in ARC and L2ARC).
Good?


I'll have to check next time I run a diff.
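
For reference, the counters to look at are just the arcstats kstats, e.g.:

  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses \
      zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses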

Raidz or mirror?

Are there any ways to improve diff performance?


Yes. Mainly memory. Or use less files.


Tell that to the users!

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zfs diff performance

2012-02-26 Thread Ian Collins
I had high hopes of significant performance gains using zfs diff in 
Solaris 11 compared to my home-brew stat based version in Solaris 10.  
However the results I have seen so far have been disappointing.


Testing on a reasonably sized filesystem (4TB), a diff that listed 41k 
changes took 77 minutes.  I haven't tried my old tool, but I would 
expect the same diff to take a couple of hours.


The box is well specified, an x4270 with 96G of RAM  and a FLASH 
accelerator card used for log and cache.


Are there any ways to improve diff performance?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Server upgrade

2012-02-17 Thread Ian Collins

On 02/17/12 03:54 AM, Edward Ned Harvey wrote:


If you consider paying for solaris - at Oracle, you just pay them for An
OS and they don't care which one you use.  Could be oracle linux, solaris,
or solaris express.  I would recommend solaris 11 express based on personal
experience.  It gets bugfixes and new features sooner than commercial
solaris.


Solaris 11 express is long gone.

You don't just pay them for "an OS".  Compare the sensible support 
pricing for their Linux offering to the ridiculous price for Solaris.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Strange send failure

2012-02-08 Thread Ian Collins

Hello,

I'm attempting a dry run of sending the root dataset of a zone from one 
Solaris 11 host to another:


sudo zfs send -r rpool/zoneRoot/zone@to_send | sudo ssh remote zfs 
receive -ven fileserver/zones


But I'm seeing

cannot receive: stream has unsupported feature, feature flags = 24

The source pool version is 31, the remote pool version is 33.  Both the 
source filesystem and parent on the remote box are version 5.


I've never seen this before, any clues?

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Improving L1ARC cache efficiency with dedup

2011-12-08 Thread Ian Collins

On 12/ 9/11 12:39 AM, Darren J Moffat wrote:

On 12/07/11 20:48, Mertol Ozyoney wrote:

Unfortunately the answer is no. Neither L1 nor L2 cache is dedup aware.

The only vendor I know that can do this is NetApp.

In fact, most of our functions, like replication, are not dedup aware.
For example, technically it's possible to optimize our replication so that
it does not send data chunks if a data chunk with the same checksum
exists on the target, without enabling dedup on target and source.

We already do that with 'zfs send -D':

   -D

   Perform dedup processing on the stream. Deduplicated
   streams  cannot  be  received on systems that do not
   support the stream deduplication feature.





Is there any more published information on how this feature works?
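
From the man page text above, usage is just the extra flag on the sending 
side, e.g. (names illustrative):

  zfs send -D -R tank/fs@snap | ssh remote zfs receive -d backup

How the duplicate blocks are actually encoded in the stream is the part I'm 
curious about.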

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] First zone creation - getting ZFS error

2011-12-08 Thread Ian Collins

On 12/ 9/11 11:37 AM, Betsy Schwartz wrote:

 On Dec 7, 2011, at 9:50 PM, Ian Collins i...@ianshome.com wrote:

On 12/ 7/11 05:12 AM, Mark Creamer wrote:


Since the zfs dataset datastore/zones is created, I don't understand what the 
error is trying to get me to do. Do I have to do:

zfs create datastore/zones/zonemaster

before I can create a zone in that path? That's not in the documentation, so I 
didn't want to do anything until someone can point out my error for me. Thanks 
for your help!


You shouldn't have to, but it won't do any harm.

If you don't get any further, try zones-discuss.

I would also try it without the /zones mountpoint. Putting the zone root dir on 
an alternate mountpoint caused problems for us. Try creating /datastore/zones 
for a zone root home, or just make the zones in  /datastore

Solaris seems to get very easily confused when zone root is anything out of the 
ordinary ( and it really bites you at patch time!)


It shouldn't.

On all my systems, I have:

NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/zoneRoot  11.6G   214G    40K  /zoneRoot

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] First zone creation - getting ZFS error

2011-12-07 Thread Ian Collins

On 12/ 7/11 05:12 AM, Mark Creamer wrote:
I'm running OI 151a. I'm trying to create a zone for the first time, 
and am getting an error about zfs. I'm logged in as me, then su - to 
root before running these commands.


I have a pool called datastore, mounted at /datastore

Per the wiki document 
http://wiki.openindiana.org/oi/Building+in+zones, I first created the 
zfs file system (note that the command syntax in the document appears 
to be wrong, so I did the options I wanted separately):


zfs create datastore/zones
zfs set compression=on datastore/zones
zfs set mountpoint=/zones datastore/zones

zfs list shows:

NAME                         USED  AVAIL  REFER  MOUNTPOINT
datastore                   28.5M  7.13T  57.9K  /datastore
datastore/dbdata            28.1M  7.13T  28.1M  /datastore/dbdata
datastore/zones             55.9K  7.13T  55.9K  /zones
rpool                       27.6G   201G    45K  /rpool
rpool/ROOT                  2.89G   201G    31K  legacy
rpool/ROOT/openindiana      2.89G   201G  2.86G  /
rpool/dump                  12.0G   201G  12.0G  -
rpool/export                5.53M   201G    32K  /export
rpool/export/home           5.50M   201G    32K  /export/home
rpool/export/home/mcreamer  5.47M   201G  5.47M  /export/home/mcreamer
rpool/swap                  12.8G   213G   137M  -

Then I went about creating the zone:

zonecfg -z zonemaster
create
set autoboot=true
set zonepath=/zones/zonemaster
set ip-type=exclusive
add net
set physical=vnic0
end
exit

That all goes fine, then...

zoneadm -z zonemaster install

which returns...

ERROR: the zonepath must be a ZFS dataset.
The parent directory of the zonepath must be a ZFS dataset so that the
zonepath ZFS dataset can be created properly.


That's odd, it should have worked.

Since the zfs dataset datastore/zones is created, I don't understand 
what the error is trying to get me to do. Do I have to do:


zfs create datastore/zones/zonemaster

before I can create a zone in that path? That's not in the 
documentation, so I didn't want to do anything until someone can point 
out my error for me. Thanks for your help!



You shouldn't have to, but it won't do any harm.

If you don't get any further, try zones-discuss.
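
If it still complains, I'd check the dataset really is mounted at /zones and 
then create the zone's dataset explicitly, something like:

  zfs get mounted,mountpoint datastore/zones
  zfs create datastore/zones/zonemaster
  # zoneadm insists the zonepath directory is mode 700 at install time
  chmod 700 /zones/zonemaster
  zoneadm -z zonemaster install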

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Confusing zfs error message

2011-11-26 Thread Ian Collins
I was trying to destroy a filesystem and I was baffled by the following 
error:


zfs destroy -r rpool/test/opt
cannot destroy 'rpool/test/opt/csw@2001_1405': dataset already exists

zfs destroy -r rpool/test/opt/csw@2001_1405
cannot destroy 'rpool/test/opt/csw@2001_1405': snapshot is cloned

It turns out there was a zfs receive writing to the filesystem.

A more sensible error would have been "dataset is busy".
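
For anyone else caught by this, the quick way to spot the culprit before 
destroying is something like:

  # is anything receiving into the tree?
  pgrep -fl 'zfs receive'
  # and are any of the snapshots held?
  zfs holds -r rpool/test/opt@2001_1405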

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Compression

2011-11-22 Thread Ian Collins

On 11/23/11 04:58 PM, Jim Klimov wrote:

2011-11-23 7:39, Matt Breitbach wrote:

So I'm looking at files on my ZFS volume that are compressed, and I'm
wondering to myself, self, are the values shown here the size on disk, or
are they the pre-compressed values.  Google gives me no great results on
the first few pages, so I headed here.

Alas, I can't give a good hint about VMWare - which values
it uses. But here are some numbers it might see (likely
du or ls sizes are in play):

Locally on a ZFS-enabled system you can use ls to normally
list your files. This would show you the logical POSIX file
size, including any referenced-but-not-allocated sparse blocks
(logical size = big, physical size = zero), etc.
Basically, this just gives a range of byte numbers that you
can address in the file, and depending on the underlying FS
all or not all of these bytes are backed by physical storage 1:1.

If you use du on the ZFS filesystem, you'll see the logical
storage size, which takes into account compression and sparse
bytes. So the du size should be not greater than ls size.


It can be significantly bigger:

ls -sh x
   2 x

du -sh x
  1K   x
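
If the real question is how much compression is saving on a dataset as a
whole, the compressratio property is the least ambiguous number to look at
('pool/fs' is just a placeholder here):

zfs get compression,compressratio,used pool/fs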

-- Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-15 Thread Ian Collins

On 11/16/11 01:01 PM, Eric D. Mudama wrote:

On Wed, Nov 16 at  3:05, Anatoly wrote:

Good day,

The speed of send/recv is around 30-60 MBytes/s for an initial send and
17-25 MBytes/s for incrementals. I have seen lots of setups, from 1 disk
to 100+ disks in a pool, but the speed doesn't vary to any degree.
As I understand it, 'zfs send' is the limiting factor. I did tests by
sending to /dev/null; it was still too slow and did not scale at all.
None of the CPU/memory/disk activity was at peak load, so there is
room for improvement.

My belief is that the initial/incremental difference comes down to how
efficiently the data is laid out in the pool in each case, not to
something inherent in the send/recv process itself.

There are various send/recv improvements (e.g. don't use SSH as a
tunnel) but even that shouldn't be capping you at 17MBytes/sec.

My incrementals get me ~35MB/s consistently.  Each incremental is
10-50GB worth of transfer.


While my incremental sizes are much smaller, the rates I see for dense 
incrementals (large blocks of changes, such as media files) are about the 
same.  I do see much lower rates for more scattered changes (such as 
filesystems with documents).
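
One thing worth trying when the transport is suspected rather than the send
itself is to put a buffer between the two ends instead of a bare ssh pipe. A
rough sketch with mbuffer, where the host name, port and buffer sizes are only
placeholders:

# receiving side
mbuffer -s 128k -m 1G -I 9090 | zfs receive -F tank/backup

# sending side
zfs send tank/data@snap | mbuffer -s 128k -m 1G -O backuphost:9090

It won't fix a slow 'zfs send', but it does rule the network and ssh overhead
in or out.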


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Have receiving snapshots become slower?

2011-11-15 Thread Ian Collins

On 11/14/11 04:00 AM, Jeff Savit wrote:

On 11/12/2011 03:04 PM, Ian Collins wrote:


It turns out this was a problem with e1000g interfaces.  When we 
swapped over to an igb port, the problem went away.


Ian,  could you summarize what the e1000g problem was? It might be 
interesting or useful for the list. If you don't want to do that, but 
are willing to tell me off-list that would be appreciated. (Just out 
of curiosity).


I was seeing high latency (2-4 seconds each) when sending a large number 
of small snapshots, say a series of incremental snapshots for a 
filesystem that hadn't changed.
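
A crude way to reproduce the measurement is to time an effectively empty
incremental between two snapshots of an unchanged filesystem (dataset,
snapshot and host names are placeholders):

time zfs send -i tank/fs@a tank/fs@b | ssh backuphost zfs receive tank/fs

That is where the 2-4 seconds per snapshot showed up over e1000g, and where it
disappeared over igb.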

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Have receiving snapshots become slower?

2011-11-12 Thread Ian Collins

On 09/30/11 08:12 AM, Ian Collins wrote:

   On 09/30/11 08:03 AM, Bob Friesenhahn wrote:

On Fri, 30 Sep 2011, Ian Collins wrote:

Slowing down replication is not a good move!

Do you prefer pool corruption? ;-)

Probably they fixed a dire bug and this is the cost of the fix.


Could be.  I think I'll raise a support case to find out why.  This is
making it difficult for me to meet a replication guarantee.

It turns out this was a problem with e1000g interfaces.  When we swapped 
over to an igb port, the problem went away.


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to set up solaris os and cache within one SSD

2011-11-11 Thread Ian Collins

On 11/11/11 08:52 PM, darkblue wrote:



2011/11/11 Ian Collins i...@ianshome.com

On 11/11/11 02:42 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org]
On Behalf Of darkblue

1 * XEON 5606
1 * Supermicro X8DT3-LN4F
6 * 4G RECC RAM
22 * WD RE3 1T harddisk
4 * intel 320 (160G) SSD
1 * supermicro 846E1-900B chassis

I just want to say, this isn't supported hardware, and
although many people will say they do this without problem,
I've heard just as many people (including myself) saying it's
unstable that way.


I've never had issues with Supermicro boards.  I'm using a similar
model and everything on the board is supported.

I recommend buying either the oracle hardware or the nexenta
on whatever they recommend for hardware.

Definitely DO NOT run the free version of solaris without
updates and expect it to be reliable.


That's a bit strong.  Yes I do regularly update my supported
(Oracle) systems, but I've never had problems with my own build
Solaris Express systems.

I waste far more time on (now luckily legacy) fully supported
Solaris 10 boxes!


what does it mean?


Solaris 10 live upgrade is a pain in the arse!  It gets confused when you have 
lots of filesystems, clones and zones.


I am going to install Solaris 10 u10 on this server. Will there be any 
compatibility problems? And which version of Solaris, or Solaris-derived 
distribution, do you suggest for building storage with the above hardware?


I'm running 11 Express now, upgrading to Solaris 11 this weekend.  Unless you 
have a good reason to use Solaris 10, use Solaris 11 or OpenIndiana.

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to set up solaris os and cache within one SSD

2011-11-10 Thread Ian Collins

On 11/11/11 02:42 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of darkblue

1 * XEON 5606
1 * Supermicro X8DT3-LN4F
6 * 4G RECC RAM
22 * WD RE3 1T harddisk
4 * intel 320 (160G) SSD
1 * supermicro 846E1-900B chassis

I just want to say, this isn't supported hardware, and although many people 
will say they do this without problem, I've heard just as many people 
(including myself) saying it's unstable that way.


I've never had issues with Supermicro boards.  I'm using a similar model 
and everything on the board is supported.

I recommend buying either the oracle hardware or the nexenta on whatever they 
recommend for hardware.

Definitely DO NOT run the free version of solaris without updates and expect it 
to be reliable.


That's a bit strong.  Yes I do regularly update my supported (Oracle) 
systems, but I've never had problems with my own build Solaris Express 
systems.


I waste far more time on (now luckily legacy) fully supported Solaris 10 
boxes!


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Stream versions in Solaris 10.

2011-11-04 Thread Ian Collins

 On 11/ 5/11 02:37 PM, Matthew Ahrens wrote:
On Wed, Oct 19, 2011 at 1:52 AM, Ian Collins i...@ianshome.com wrote:


 I just tried sending from an oi151a system to a Solaris 10 backup
server and the server barfed with

zfs_receive: stream is unsupported version 17

I can't find any documentation linking stream version to release,
so does anyone know the Update 10 stream version?


The stream version here is actually the zfs send stream version, which 
is different from the zpool (SPA) and zfs (ZPL) version numbers.


17 is DMU_BACKUP_FEATURE_SA_SPILL (16) + DMU_SUBSTREAM (1).  The 
SA_SPILL feature is enabled when sending a filesystem of version 5 
(System attributes) or later.


So the problem is that you are sending a version 5 zfs filesystem to a 
system that does not support filesystem version 5.



Thank you Matt.

Are these DMU details documented anywhere?  I'm familiar with the SPA 
and ZPL defines in zfs.h.


Odd coincidence: I was reading your blog when this reply came through!
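
For anyone else hitting this mismatch, a few commands help pin down what each
side understands (dataset and snapshot names are placeholders, and zstreamdump
may not be present on older releases):

zfs get version tank/fs       # 5 or later implies the SA_SPILL stream feature
zfs upgrade -v                # filesystem versions this host supports
zpool upgrade -v              # pool versions this host supports
zfs send tank/fs@snap | zstreamdump | head   # shows the stream header, including the version info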

--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Log disk with all ssd pool?

2011-10-28 Thread Ian Collins

 On 10/28/11 07:04 PM, Mark Wolek wrote:


Still kicking around this idea and didn’t see it addressed in any of 
the threads before the forum closed.


If one made an all-SSD pool, would a log/cache drive just slow you 
down? Would a ZIL slow you down?




I would guess not; you would still be spreading your IOPS. I haven't 
tried an all-SSD pool, but I have tried adding a lump of spinning rust 
as a log to a pool of identical drives, and it did give a small improvement 
to NFS performance.
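
Since log devices can be removed again on any recent pool version, it is cheap
to test on a scratch pool (device names are placeholders):

zpool add tank log c3t0d0     # add the candidate log device
zpool iostat -v tank 5        # watch where the writes land
zpool remove tank c3t0d0      # take it out again if it doesn't help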


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Stream versions in Solaris 10.

2011-10-19 Thread Ian Collins
 I just tried sending from an oi151a system to a Solaris 10 backup 
server and the server barfed with


zfs_receive: stream is unsupported version 17

I can't find any documentation linking stream version to release, so 
does anyone know the Update 10 stream version?


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] about btrfs and zfs

2011-10-18 Thread Ian Collins

 On 10/19/11 03:12 AM, Paul Kraus wrote:

On Tue, Oct 18, 2011 at 9:13 AM, Darren J Moffat
darr...@opensolaris.org  wrote:

On 10/18/11 14:04, Jim Klimov wrote:

2011-10-18 16:26, Darren J Moffat wrote:

ZFS does slightly bias new vdevs for new writes, so that we will get
to a more even spread. It doesn't go and move already-written blocks
onto the new vdevs, though. So while there isn't an admin interface for
rebalancing, ZFS does do something in this area.

This is implemented in metaslab_alloc_dva()


http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/zfs/metaslab.c


See lines 1356-1378


And the admin interface would be what exactly?..

As I said, there isn't one, because that isn't how it works today: it is all
automatic and only applies to new writes.

I was pointing out that ZFS does do 'something', not that it has an exactly
matching feature.

I have done a poor man's rebalance by copying data after adding
devices. I know this is not a substitute for a real online rebalance,
but it gets the job done (if you can take the data offline; I do it a
small chunk at a time).


I do the same.

Whether you do the balance by hand or the filesystem does it for you, the data 
still has to be moved around, which can be resource intensive.  I'd 
rather do that at a time of my choosing.
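
For the archives, the send/receive flavour of the by-hand rebalance looks
something like this, assuming there is enough free space for a second copy and
nothing writes to the dataset while it runs (names are placeholders):

zfs snapshot tank/data@move
zfs send tank/data@move | zfs receive tank/data-new   # new blocks spread across all vdevs
zfs destroy -r tank/data
zfs rename tank/data-new tank/data

Anything written to tank/data after the snapshot is not carried over, which is
another reason to do it a filesystem at a time during quiet periods.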


--
Ian.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

