[zfs-discuss] Scrubbing the ZIL

2008-11-21 Thread Chris Gerhard
If you have a separate ZIL device, is there any way to scrub the data in it?

I appreciate that the data in the ZIL is only there for a short time, but since 
it is never read back in normal operation, a misbehaving ZIL device that was just 
throwing the data away could go unnoticed for many months; you would only discover 
the problem when you reboot and go to read the ZIL to replay it.

So is there any way to verify that the ZIL device is working as expected (i.e. can 
return the data written to it) while the system is running?

--chris
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Scrubbing the ZIL

2008-11-21 Thread Darren J Moffat
Chris Gerhard wrote:
 If you have a separate ZIL device is there any way to scrub the data in it?

zpool scrub traverses the ZIL regardless of whether it is on a slog device 
or in one of the normal pool devices.
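
For reference (pool name is a placeholder):

# zpool scrub tank        # traverses all pool data, including the ZIL blocks
# zpool status -v tank    # check for any read/checksum errors afterwards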

 I appreciate that the data in the ZIL is only there for a short time but 
 since it is never read if you had a misbehaving ZIL device that was just 
 throwing the data away you could potentially run like this for many months 
 and only discover the problem when you reboot and go to read the ZIL to 
 replay it.
 
 So is there anyway to verify the ZIL device is working as expected (ie can 
 return the data written into it) while the system is running?

Do a sync write, which will cause the ZIL to be used, then before the txg 
is committed run 'zdb -ivv poolname'.

Or, if you are feeling really brave and don't mind exporting the pool, you 
can use the undocumented test capability and run 'zpool freeze', then do 
your writes (even a normal non-sync write will be enough here), then export and 
reimport the pool.
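
A rough sketch of that second approach (pool, dataset and file names below are 
placeholders; 'zpool freeze' is an undocumented test hook, so only try it on a 
pool whose contents you can afford to lose):

# zpool freeze tank                            # stop txg syncing; new writes live only in the ZIL
# cp /var/tmp/testfile /tank/fs/               # this write is recorded in the intent log
# zpool export tank
# zpool import tank                            # the import has to replay the ZIL from the slog
# cksum /tank/fs/testfile /var/tmp/testfile    # compare to confirm the data came back intact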

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Peculiar disk loading on raidz2

2008-11-21 Thread Will Murnane
On Fri, Nov 21, 2008 at 14:35, Charles Menser [EMAIL PROTECTED] wrote:
 I have a 5 drive raidz2 pool which I have a iscsi share on. While
 backing up a MacOS drive to it I noticed some very strange access
 patterns, and wanted to know if what I am seeing is normal, or not.

 There are times when all five drives are accessed equally, and there
 are times when only three of them are seeing any load.
What does zpool status say?  How are the drives connected?  To what
controller(s)?

This  could just be some degree of asynchronicity showing up.  Take a
look at these two:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
main_pool    852G  3.70T    361  1.30K  2.78M  10.1M
  raidz2     852G  3.70T    361  1.30K  2.78M  10.1M
    c5t5d0      -      -    180    502  1.25M  3.57M
    c5t3d0      -      -    205    330  1.30M  2.73M
    c5t4d0      -      -    239    489  1.43M  2.81M
    c5t2d0      -      -    205     17  1.25M  26.1K
    c5t1d0      -      -    248     13  1.41M  25.1K
----------  -----  -----  -----  -----  -----  -----

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
main_pool    852G  3.70T     10  2.02K  77.7K  15.8M
  raidz2     852G  3.70T     10  2.02K  77.7K  15.8M
    c5t5d0      -      -      2    921   109K  6.52M
    c5t3d0      -      -      9    691   108K  5.63M
    c5t4d0      -      -      9    962   105K  5.97M
    c5t2d0      -      -      9  1.30K   167K  8.50M
    c5t1d0      -      -      2  1.23K   150K  8.54M
----------  -----  -----  -----  -----  -----  -----

For c5t5d0, the two samples total 3.57 + 6.52 = 10.09 MB of writes;
for c5t3d0, 2.73 + 5.63 = 8.36 MB;
for c5t4d0, 2.81 + 5.97 = 8.78 MB;
for c5t2d0, (~0) + 8.50 = 8.50 MB;
and for c5t1d0, (~0) + 8.54 = 8.54 MB.

So over time, the amount written to each drive is approximately the
same.  This being the case, I don't think I'd worry about it too
much... but a scrub is a fairly cheap way to get peace of mind.
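
(For what it's worth, samples like the above come from zpool iostat, and a scrub 
is a single command; the pool name is taken from the output above:)

# zpool iostat -v main_pool 10   # per-vdev ops and bandwidth every 10 seconds
# zpool scrub main_pool          # re-read and verify every block in the background
# zpool status -v main_pool      # watch scrub progress and any errors it finds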

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] So close to better, faster, cheaper....

2008-11-21 Thread Kam
Posted for my friend Marko:

I've been reading up on ZFS with the idea to build a home NAS.

My ideal home NAS would have:

- high performance via striping
- fault tolerance with selective use of multiple copies attribute
- cheap by getting the most efficient space utilization possible (not raidz, 
not mirroring)
- scalability


I was hoping to start with four 1 TB disks in a single striped pool, with only some 
filesystems set to copies=2.

I would be able to survive a single disk failure for the data that was on the 
copies=2 filesystems.

(trusting that I had enough free space across multiple disks that copies=2 
writes were not placed on the same physical disk)
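
(A minimal sketch of that layout; pool, dataset and device names are invented 
for illustration:)

# zpool create tank c0t0d0 c0t1d0 c0t2d0 c0t3d0   # plain 4-disk stripe, no raidz or mirror
# zfs create tank/media                           # default copies=1
# zfs create -o copies=2 tank/important           # ditto blocks for the data that matters
# zfs set copies=3 tank/important                 # later; only affects newly written data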

I could grow this filesystem just by adding single disks.

Theoretically, at some point in time I would switch to copies=3 to increase my 
chances of surviving two disk failures. The block checksums would be useful for 
early detection of failing disks.


The major snag I discovered is that if a striped pool loses a disk, I can still 
read and write the remaining data, but I cannot reboot and remount a partial 
piece of the stripe, even with -f.

For example, if I lost some of my single-copy data, I'd like to still access the 
good data, pop in a new (potentially larger) disk, re-copy the important data so 
the multiple copies get rebuilt, and not have to rebuild the entire pool structure.


So the feature request would be for zfs to allow selective disk removal from 
striped pools, with the
resultant data loss, but any data that survived, either by chance (living on 
the remaining disks) or
policy (multiple copies) would still be accessible.

Is there some underlying reason in zfs that precludes this functionality?

If the filesystem partially survives while a striped pool member disk fails 
and the box is still up, why not after a reboot?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-21 Thread Pawel Tecza
Hello All,

This is my zfs list:

# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     10,5G  3,85G    61K  /rpool
rpool/ROOT                9,04G  3,85G    18K  legacy
rpool/ROOT/opensolaris    89,7M  3,85G  5,44G  legacy
rpool/ROOT/opensolaris-1  8,95G  3,85G  5,52G  legacy
rpool/dump                 256M  3,85G   256M  -
rpool/export               747M  3,85G    19K  /export
rpool/export/home          747M  3,85G   747M  /export/home
rpool/swap                 524M  3,85G   524M  -

Today I've created one snapshot as below:

# zfs snapshot rpool/ROOT/[EMAIL PROTECTED]

Unfortunately I can't see it, because the `zfs list` command doesn't show it:

# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     10,5G  3,85G    61K  /rpool
rpool/ROOT                9,04G  3,85G    18K  legacy
rpool/ROOT/opensolaris    89,7M  3,85G  5,44G  legacy
rpool/ROOT/opensolaris-1  8,95G  3,85G  5,52G  legacy
rpool/dump                 256M  3,85G   256M  -
rpool/export               747M  3,85G    19K  /export
rpool/export/home          747M  3,85G   747M  /export/home
rpool/swap                 524M  3,85G   524M  -

I know the snapshot exists, because I can't create the same again:

# zfs snapshot rpool/ROOT/[EMAIL PROTECTED]
cannot create snapshot 'rpool/ROOT/[EMAIL PROTECTED]': dataset 
already exists

Isn't that strange? How can you explain that?

I use OpenSolaris 2008.11 snv_101a:

# uname -a
SunOS oklahoma 5.11 snv_101a i86pc i386 i86pc Solaris

My best regards,

Pawel

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-21 Thread Ahmed Kamal
zfs list -t snapshot ?

On Sat, Nov 22, 2008 at 1:14 AM, Pawel Tecza [EMAIL PROTECTED] wrote:

 Hello All,

 This is my zfs list:

 # zfs list
 NAME                       USED  AVAIL  REFER  MOUNTPOINT
 rpool                     10,5G  3,85G    61K  /rpool
 rpool/ROOT                9,04G  3,85G    18K  legacy
 rpool/ROOT/opensolaris    89,7M  3,85G  5,44G  legacy
 rpool/ROOT/opensolaris-1  8,95G  3,85G  5,52G  legacy
 rpool/dump                 256M  3,85G   256M  -
 rpool/export               747M  3,85G    19K  /export
 rpool/export/home          747M  3,85G   747M  /export/home
 rpool/swap                 524M  3,85G   524M  -

 Today I've created one snapshot as below:

 # zfs snapshot rpool/ROOT/[EMAIL PROTECTED]

 Ufortunately I can't see it, because `zfs list` command doesn't show it:

 # zfs list
 NAME                       USED  AVAIL  REFER  MOUNTPOINT
 rpool                     10,5G  3,85G    61K  /rpool
 rpool/ROOT                9,04G  3,85G    18K  legacy
 rpool/ROOT/opensolaris    89,7M  3,85G  5,44G  legacy
 rpool/ROOT/opensolaris-1  8,95G  3,85G  5,52G  legacy
 rpool/dump                 256M  3,85G   256M  -
 rpool/export               747M  3,85G    19K  /export
 rpool/export/home          747M  3,85G   747M  /export/home
 rpool/swap                 524M  3,85G   524M  -

 I know the snapshot exists, because I can't create the same again:

 # zfs snapshot rpool/ROOT/[EMAIL PROTECTED]
 cannot create snapshot 'rpool/ROOT/[EMAIL PROTECTED]': dataset
 already exists

 Is it a strange? How can you explain that?

 I use OpenSolaris 2008.11 snv_101a:

 # uname -a
 SunOS oklahoma 5.11 snv_101a i86pc i386 i86pc Solaris

 My best regards,

 Pawel

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-21 Thread Prabahar Jeyaram
'zfs list' by default does not list the snapshots.

You need to use '-t snapshot' option with zfs list to view the snapshots.
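
For example (on builds with this behaviour; older releases may not have '-t all'):

# zfs list -t snapshot             # snapshots only
# zfs list -t all                  # filesystems, volumes and snapshots together
# zfs list -r -t snapshot rpool    # snapshots under one dataset tree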

--
Prabahar.

On Sat, Nov 22, 2008 at 12:14:47AM +0100, Pawel Tecza wrote:
 Hello All,
 
 This is my zfs list:
 
 # zfs list
 NAME                       USED  AVAIL  REFER  MOUNTPOINT
 rpool                     10,5G  3,85G    61K  /rpool
 rpool/ROOT                9,04G  3,85G    18K  legacy
 rpool/ROOT/opensolaris    89,7M  3,85G  5,44G  legacy
 rpool/ROOT/opensolaris-1  8,95G  3,85G  5,52G  legacy
 rpool/dump                 256M  3,85G   256M  -
 rpool/export               747M  3,85G    19K  /export
 rpool/export/home          747M  3,85G   747M  /export/home
 rpool/swap                 524M  3,85G   524M  -
 
 Today I've created one snapshot as below:
 
 # zfs snapshot rpool/ROOT/[EMAIL PROTECTED]
 
 Ufortunately I can't see it, because `zfs list` command doesn't show it:
 
 # zfs list
 NAME                       USED  AVAIL  REFER  MOUNTPOINT
 rpool                     10,5G  3,85G    61K  /rpool
 rpool/ROOT                9,04G  3,85G    18K  legacy
 rpool/ROOT/opensolaris    89,7M  3,85G  5,44G  legacy
 rpool/ROOT/opensolaris-1  8,95G  3,85G  5,52G  legacy
 rpool/dump                 256M  3,85G   256M  -
 rpool/export               747M  3,85G    19K  /export
 rpool/export/home          747M  3,85G   747M  /export/home
 rpool/swap                 524M  3,85G   524M  -
 
 I know the snapshot exists, because I can't create the same again:
 
 # zfs snapshot rpool/ROOT/[EMAIL PROTECTED]
 cannot create snapshot 'rpool/ROOT/[EMAIL PROTECTED]': dataset 
 already exists
 
 Is it a strange? How can you explain that?
 
 I use OpenSolaris 2008.11 snv_101a:
 
 # uname -a
 SunOS oklahoma 5.11 snv_101a i86pc i386 i86pc Solaris
 
 My best regards,
 
 Pawel
 
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-21 Thread Andrew Gabriel
Pawel Tecza wrote:
 But I still don't understand why `zfs list` doesn't display snapshots
 by default. I saw it in the Net many times at the examples of zfs usage.
   

It was changed.

zfs list -t all

gives you everything, like zfs list used to.

-- 
Andrew
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-21 Thread Pawel Tecza
Prabahar Jeyaram wrote:
 'zfs list' by default does not list the snapshots.

 You need to use '-t snapshot' option with zfs list to view the snapshots.
Hello Prabahar,

Thank you very much for your fast explanation! Did `zfs list` always
work that way, or is it the default behaviour of the latest version?
I'm sure I can google many examples of `zfs list` showing snapshots
in the results.

Cheers,

Pawel

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-21 Thread Malachi de Ælfweald
It used to. Although, with Time Slider now, I agree that it shouldn't by
default.

Malachi

On Fri, Nov 21, 2008 at 3:29 PM, Pawel Tecza [EMAIL PROTECTED] wrote:

 Ahmed Kamal pisze:
  zfs list -t snapshot ?
 Hi Ahmed,

 Thanks a lot for the hint! It works. I didn't know that I have so many
 snapshots :D

 # zfs list -t snapshot
 NAME                                                  USED  AVAIL  REFER  MOUNTPOINT
 [EMAIL PROTECTED]                                       20K      -    58K  -
 rpool/[EMAIL PROTECTED]                                   0      -    18K  -
 rpool/ROOT/[EMAIL PROTECTED]                          1,47G      -  2,74G  -
 rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-17-22:12:17     124K      -  4,83G  -
 rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-17-22:13:16     119K      -  4,83G  -
 rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-17-22:13:59    16,6M      -  4,84G  -
 rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-17-23:21:06    76,4M      -  4,83G  -
 rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-19-15:51:21    65,7M      -  5,44G  -
 rpool/ROOT/[EMAIL PROTECTED]:30:33                    12,6M      -  5,51G  -
 rpool/ROOT/[EMAIL PROTECTED]:03:43                     248K      -  5,52G  -
 rpool/ROOT/[EMAIL PROTECTED]                           178K      -  5,52G  -
 rpool/[EMAIL PROTECTED]                                  15K      -    19K  -
 rpool/export/[EMAIL PROTECTED]                           19K      -    21K  -

 But I still don't understand why `zfs list` doesn't display snapshots
 by default. I saw it in the Net many times at the examples of zfs usage.

 Have a nice weekend! :)

 Pawel

 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-21 Thread David Pacheco
Pawel Tecza wrote:
 But I still don't understand why `zfs list` doesn't display snapshots
 by default. I saw it in the Net many times at the examples of zfs usage.

This was PSARC/2008/469 - excluding snapshot info from 'zfs list'

http://opensolaris.org/os/community/on/flag-days/pages/2008091003/

-- Dave


-- 
David Pacheco, Sun Microsystems Fishworks. http://blogs.sun.com/dap/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS fragmentation with MySQL databases

2008-11-21 Thread Vincent Kéravec
I just tried ZFS on one of our slaves and got some really bad performance.

When I started the server yesterday, it was able to keep up with the main server 
without problems, but after two days of continuous operation the server is being 
crushed by I/O.

After running the dtrace script iopattern, I noticed that the workload is now 
100% random I/O. Copying the database (140 GB) from one directory to another 
took more than 4 hours without any other tasks running on the server, and all 
the reads on tables that had been updated were random. Keeping an eye on 
iopattern and zpool iostat, I saw that when the system was accessing files that 
had not been changed the disks were reading sequentially at more than 50 MB/s, but 
when reading files that changed often the speed dropped to 2-3 MB/s.

The server has plenty of disk space, so it should not have such a level of file 
fragmentation in such a short time.

For information, I'm using Solaris 10/08 with a mirrored root pool on two 1 TB 
SATA hard disks (slow with random I/O). I'm using MySQL 5.0.67 with the MyISAM engine. 
The ZFS recordsize is 8k as recommended in the ZFS guide.
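
(For reference, that property was set roughly like this before loading the data; 
the dataset name is a placeholder, and recordsize only affects files written 
after it is set:)

# zfs set recordsize=8k tank/mysql
# zfs get recordsize tank/mysql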
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-21 Thread Jens Elkner
On Fri, Nov 21, 2008 at 03:42:17PM -0800, David Pacheco wrote:
 Pawel Tecza wrote:
  But I still don't understand why `zfs list` doesn't display snapshots
  by default. I saw it in the Net many times at the examples of zfs usage.
 
 This was PSARC/2008/469 - excluding snapshot info from 'zfs list'
 
 http://opensolaris.org/os/community/on/flag-days/pages/2008091003/

The incomplete one - where is the '-t all' option? It's really annoying,
error-prone and time-consuming to type stories on the command line ...
Does anybody remember the "keep it small and simple" thing?

Regards,
jel.
-- 
Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany Tel: +49 391 67 12768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] `zfs list` doesn't show my snapshot

2008-11-21 Thread Tim
On Fri, Nov 21, 2008 at 9:38 PM, Jens Elkner [EMAIL PROTECTED] wrote:



 The uncomplete one - where is the '-t all' option? It's really annoying,
 error prone, time consuming to type stories on the command line ...
 Does anybody remember the keep it small and simple thing?

 Regards,
 jel.
 --
 Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
 Department of Computer Science   Geb. 29 R 027, Universitaetsplatz 2
 39106 Magdeburg, Germany Tel: +49 391 67 12768
 __



How is defaulting to output that makes the command unusable to the majority
of their customers keeping it simple?  Their choice of implementation does
leave something to be desired, though... I would think it would make more
sense to have something like 'zfs list snapshots', and if you wanted to
limit that to a specific pool, 'zfs list snapshots poolname'.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] RC1 Zfs writes a lot slower when running X

2008-11-21 Thread zerk
Hi,

I have OpenSolaris on an Amd64 Asus-A8NE with 2 GB of RAM and 4x320 GB SATA 
drives in raidz1.

With dd, I can write at close to the disks' maximum speed of 80 MB/s each, for a 
total of 250 MB/s, if I have no X session at all (only the console tty).
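
(The exact invocation isn't shown above; the test was something along these 
lines, with a made-up output path:)

# dd if=/dev/zero of=/tank/ddtest bs=1024k count=8192   # ~8 GB sequential write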

But as soon as I have an X session running, the write speed drops to about 
120 MB/s. It's even worse if I have a VBoxHeadless running with an idle win2k3 
guest inside: it drops to 30 MB/s.

The CPU is at 0% in both cases and nothing else is using the array either. I tried 
to investigate with DTrace without success...

Anyone have a clue what could be going on?

Thanks

Zerk
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X

2008-11-21 Thread Tim
On Fri, Nov 21, 2008 at 11:33 PM, zerk [EMAIL PROTECTED] wrote:

 Hi,

 I have OpenSolaris on an Amd64 Asus-A8NE with 2gig of Rams and 4x320 gig
 sata drives in raidz1.

 With dd, I can write at quasi disk maximum speed of 80meg each for a total
 of 250meg/s if I have no Xsession at all (only console tty).

 But as soon as I have an Xsession running, the write speed drops to about
 120MB/s.
 Its even worse if I have a VBoxHeadless running with an idle win2k3 inside.
 It drops to 30 MB/s.

 The CPU is at 0% in both cases and nothing is using the array either. I
 tried to investigate with DTrace without success...

 Anyone have a clue of what could be going on?

 Thanks

 Zerk
 --


Ya, you're using gobs of RAM that was previously being used by ZFS for
caching.  I would venture to guess that if you stuck another 2 GB of RAM in there
you'd see far less of a *hit* from X or a VM.
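
(If you want to confirm that, one way is to watch the ARC and overall memory use 
with standard Solaris tools:)

# kstat -p zfs:0:arcstats:size    # current ARC size in bytes
# echo ::memstat | mdb -k         # overall memory usage breakdown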

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] So close to better, faster, cheaper.... zfs stripe pool survival

2008-11-21 Thread MC
 Posted for my friend Marko:
 
 I've been reading up on ZFS with the idea to build a
 home NAS.
 
 My ideal home NAS would have:
 
 - high performance via striping
 - fault tolerance with selective use of multiple
 copies attribute
 - cheap by getting the most efficient space
 utilization possible (not raidz, not mirroring)
 - scalability
 
 
 I was hoping to start with 4 1TB disks, in a single
 striped pool with only some filesystems
 set to copies=2.
 
 I would be able to survive a single disk failure for
 my data which was on the copies2 filesystem.
 
 (trusting that I had enough free space across
 multiple disks that copies2 writes were not placed
 on the same physical disk)
 
 I could grow this filesystem just by adding single
 disks.
 
 Theoretically, at some point in time I would switch
 to copies=3 to increase my chances of surviving
 two disk failures. The block checksums would be a
 useful in early detection of failed disks.
 
 
 The major snag I discovered is that if a striped pool
 loses a disk, I can still read and write from
 the remaining data, but I cannot reboot and remount a
 partial piece of the stripe, even with -f.
 
 For example, if I lost some of my single copies
 data, I'd like to still access the good data, pop in
 a
 new (potentially larger) disk, re cp the important
 data to have multiple copies rebuilt, and not have
 to rebuild the entire pool structure.
 
 
 So the feature request would be for zfs to allow
 selective disk removal from striped pools, with the
 resultant data loss, but any data that survived,
 either by chance (living on the remaining disks) or
 policy (multiple copies) would still be accessible.
 
 Is there some underlying reason in zfs that precludes
 this functionality?
 
 If the filesystem partially-survives while the
 striped pool member disk fails and the box is still
 up, why not after a reboot?

You may never get a good answer to this, so I'll give it to you straight up.  
ZFS doesn't do this because no business using Sun products wants to do this.  
Thus nobody at Sun ever made ZFS do this.  Maybe you can convince someone at 
Sun to care about this feature, but I doubt it, because it is a pretty fringe 
use case.

In the end you can probably work around this problem, though.  Striping doesn't 
improve performance that much and it doesn't provide that much more space.  
Next year we'll be using 2 TB hard drives, and when you can make a 6 TB raidz 
array with 4 hard drives one year and a 7.5 TB one the year after, and put them 
both in the same pool so it looks like 13.5 TB coming from 8 drives that can 
tolerate one disk failure in each 4-drive group, that isn't too shabby.
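
(A hedged sketch of that growth path; device names are invented:)

# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0   # year 1: first 4-disk raidz vdev
# zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0      # year 2: add a second raidz vdev

Each top-level raidz vdev can lose one disk, and the pool stripes across both.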
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS fragmentation with MySQL databases

2008-11-21 Thread Kees Nuyt
[Default] On Fri, 21 Nov 2008 17:20:48 PST, Vincent Kéravec
[EMAIL PROTECTED] wrote:

 I just try ZFS on one of our slave and got some really
 bad performance.
 
 When I start the server yesterday, it was able to keep
 up with the main server without problem but after two
 days of consecutive run the server is crushed by IO.
 
 After running the dtrace script iopattern, I notice
 that the workload is now 100% Random IO. Copying the
 database (140Go) from one directory to an other took
 more than 4 hours without any other tasks running on
 the server, and all the reads on table that where
 updated where random... Keeping an eye on iopattern and
 zpool iostat I saw that when the systems was accessing
 file that have not been changed the disk was reading
 sequentially at more than 50Mo/s but when reading files
 that changed often the speed got down to 2-3 Mo/s.

Good observation and analysis.
 
 The server has plenty of diskplace so it should not
 have such a level of file fragmentation in such a short
 time.

My explanation would be: whenever a block within a file
changes, ZFS has to write it to another location (copy on
write), so the previous version isn't immediately lost.

ZFS will try to keep the new version of the block close to
the original one, but after several changes to the same
database page, things get pretty messed up and logically
sequential I/O becomes pretty much physically random indeed.

The original blocks will eventually be added to the free list
and reused, so proximity can be restored, but it will never
be 100% sequential again.
The effect is larger when many snapshots are kept, because
older block versions are not freed, or when the same block
is changed very often and free-list updating has to be
postponed.

That is the trade-off between "always consistent" and "fast".

 For information I'm using solaris 10/08 with a mirrored
 root pool on two 1Tb Sata harddisk (slow with random
 io). I'm using MySQL 5.0.67 with MyISAM engine. The zfs
 recordsize is 8k as recommended on the zfs guide.

I would suggest enlarging the MyISAM buffers.
The InnoDB engine does copy-on-write within its data files,
so things might be different there. 
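
(Purely as an illustration of that suggestion, not tuning advice; in my.cnf 
something along the lines of:)

[mysqld]
key_buffer_size = 512M    # MyISAM index cache; size it to your RAM and index footprint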
-- 
  (  Kees Nuyt
  )
c[_]
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss