[zfs-discuss] Scrubbing the ZIL
If you have a separate ZIL device, is there any way to scrub the data in it? I appreciate that the data in the ZIL is only there for a short time, but since it is never read, a misbehaving ZIL device that was simply throwing the data away could run like this for many months, and you would only discover the problem when you reboot and go to read the ZIL to replay it. So is there any way to verify that the ZIL device is working as expected (i.e. can return the data written to it) while the system is running?

--chris
Re: [zfs-discuss] Scrubbing the ZIL
Chris Gerhard wrote:
> If you have a separate ZIL device is there any way to scrub the data in it?

zpool scrub traverses the ZIL regardless of whether or not it is on a slog device or on one of the normal pool devices.

> [...] So is there any way to verify the ZIL device is working as expected
> (i.e. can return the data written into it) while the system is running?

Do a sync write, which will cause the ZIL to be used, then before the txg is committed run 'zdb -ivv poolname'. Or, if you are feeling really brave and don't mind exporting the pool, you can use the undocumented test capability: run 'zpool freeze', do your writes (even a normal non-sync write will be enough here), then export and reimport the pool.

-- Darren J Moffat
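To make that concrete, a minimal sketch of the sequence (the pool name 'tank' and the copied file are placeholders, and 'zpool freeze' is the undocumented, test-only facility mentioned above - don't run it on a pool you care about):

   zdb -ivv tank            # dump intent log (ZIL) entries on the live pool

   zpool freeze tank        # test-only: stop txg commits so writes stay in the ZIL
   cp /etc/motd /tank/fs/   # any write now lands in the intent log
   zpool export tank
   zpool import tank        # the ZIL is replayed here; a slog that silently
                            # dropped the writes shows up as errors now,
                            # rather than months later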
Re: [zfs-discuss] Peculiar disk loading on raidz2
On Fri, Nov 21, 2008 at 14:35, Charles Menser [EMAIL PROTECTED] wrote:
> I have a 5 drive raidz2 pool which I have an iscsi share on. While backing
> up a MacOS drive to it I noticed some very strange access patterns, and
> wanted to know if what I am seeing is normal, or not. There are times when
> all five drives are accessed equally, and there are times when only three
> of them are seeing any load.

What does zpool status say? How are the drives connected? To what controller(s)? This could just be some degree of asynchronicity showing up. Take a look at these two:

                capacity     operations    bandwidth
 pool         used  avail   read  write   read  write
 ----------  -----  -----  -----  -----  -----  -----
 main_pool    852G  3.70T    361  1.30K  2.78M  10.1M
   raidz2     852G  3.70T    361  1.30K  2.78M  10.1M
     c5t5d0      -      -    180    502  1.25M  3.57M
     c5t3d0      -      -    205    330  1.30M  2.73M
     c5t4d0      -      -    239    489  1.43M  2.81M
     c5t2d0      -      -    205     17  1.25M  26.1K
     c5t1d0      -      -    248     13  1.41M  25.1K
 ----------  -----  -----  -----  -----  -----  -----

                capacity     operations    bandwidth
 pool         used  avail   read  write   read  write
 ----------  -----  -----  -----  -----  -----  -----
 main_pool    852G  3.70T     10  2.02K  77.7K  15.8M
   raidz2     852G  3.70T     10  2.02K  77.7K  15.8M
     c5t5d0      -      -      2    921   109K  6.52M
     c5t3d0      -      -      9    691   108K  5.63M
     c5t4d0      -      -      9    962   105K  5.97M
     c5t2d0      -      -      9  1.30K   167K  8.50M
     c5t1d0      -      -      2  1.23K   150K  8.54M
 ----------  -----  -----  -----  -----  -----  -----

For c5t5d0, a total of 3.57+6.52 MB of I/O happens: 10.09 MB; for c5t3d0, a total of 2.73+5.63 MB: 8.36 MB; for c5t4d0, a total of 2.81+5.97 MB: 8.78 MB; for c5t2d0, a total of (~0)+8.50 MB: 8.50 MB; and for c5t1d0, a total of (~0)+8.54 MB: 8.54 MB. So over time, the amount written to each drive is approximately the same. This being the case, I don't think I'd worry about it too much... but a scrub is a fairly cheap way to get peace of mind.

Will
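(If you want to watch this yourself and then verify the on-disk data, the standard commands are below; 'main_pool' is the pool name from the output above:)

   zpool iostat -v main_pool 5   # per-vdev load, one sample every 5 seconds
   zpool status -v main_pool     # pool layout and per-device error counters
   zpool scrub main_pool         # re-read and checksum every block in the background
   zpool status main_pool        # shows scrub progress and any errors it repaired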
[zfs-discuss] So close to better, faster, cheaper....
Posted for my friend Marko:

I've been reading up on ZFS with the idea to build a home NAS. My ideal home NAS would have:

- high performance via striping
- fault tolerance via selective use of the multiple-copies attribute
- cheap, by getting the most efficient space utilization possible (not raidz, not mirroring)
- scalability

I was hoping to start with 4 1TB disks in a single striped pool, with only some filesystems set to copies=2. I would be able to survive a single disk failure for the data on the copies=2 filesystems (trusting that I had enough free space across multiple disks that copies=2 writes were not placed on the same physical disk). I could grow this filesystem just by adding single disks. Theoretically, at some point I would switch to copies=3 to increase my chances of surviving two disk failures. The block checksums would be useful for early detection of failed disks.

The major snag I discovered is that if a striped pool loses a disk, I can still read and write the remaining data, but I cannot reboot and remount a partial piece of the stripe, even with -f. For example, if I lost some of my single-copy data, I'd like to still access the good data, pop in a new (potentially larger) disk, re-copy the important data to have multiple copies rebuilt, and not have to rebuild the entire pool structure.

So the feature request would be for zfs to allow selective disk removal from striped pools, with the resultant data loss, but any data that survived, either by chance (living on the remaining disks) or policy (multiple copies), would still be accessible. Is there some underlying reason in zfs that precludes this functionality? If the filesystem partially survives while a striped pool member disk fails and the box is still up, why not after a reboot?
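(For reference, the setup described above would be built roughly like this; the pool name and device names are placeholders:)

   zpool create tank c0t0d0 c0t1d0 c0t2d0 c0t3d0   # plain stripe, no pool-level redundancy
   zfs create tank/important
   zfs set copies=2 tank/important                 # extra block copies for this fs only
   zpool add tank c0t4d0                           # later growth: add another single disk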
[zfs-discuss] `zfs list` doesn't show my snapshot
Hello All,

This is my zfs list:

# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     10,5G  3,85G    61K  /rpool
rpool/ROOT                9,04G  3,85G    18K  legacy
rpool/ROOT/opensolaris    89,7M  3,85G  5,44G  legacy
rpool/ROOT/opensolaris-1  8,95G  3,85G  5,52G  legacy
rpool/dump                 256M  3,85G   256M  -
rpool/export               747M  3,85G    19K  /export
rpool/export/home          747M  3,85G   747M  /export/home
rpool/swap                 524M  3,85G   524M  -

Today I've created one snapshot as below:

# zfs snapshot rpool/ROOT/[EMAIL PROTECTED]

Unfortunately I can't see it, because the `zfs list` command doesn't show it:

# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     10,5G  3,85G    61K  /rpool
rpool/ROOT                9,04G  3,85G    18K  legacy
rpool/ROOT/opensolaris    89,7M  3,85G  5,44G  legacy
rpool/ROOT/opensolaris-1  8,95G  3,85G  5,52G  legacy
rpool/dump                 256M  3,85G   256M  -
rpool/export               747M  3,85G    19K  /export
rpool/export/home          747M  3,85G   747M  /export/home
rpool/swap                 524M  3,85G   524M  -

I know the snapshot exists, because I can't create the same one again:

# zfs snapshot rpool/ROOT/[EMAIL PROTECTED]
cannot create snapshot 'rpool/ROOT/[EMAIL PROTECTED]': dataset already exists

Isn't it strange? How can you explain that?

I use OpenSolaris 2008.11 snv_101a:

# uname -a
SunOS oklahoma 5.11 snv_101a i86pc i386 i86pc Solaris

My best regards,

Pawel
Re: [zfs-discuss] `zfs list` doesn't show my snapshot
zfs list -t snapshot ?

On Sat, Nov 22, 2008 at 1:14 AM, Pawel Tecza [EMAIL PROTECTED] wrote:
> Today I've created one snapshot as below:
>
> # zfs snapshot rpool/ROOT/[EMAIL PROTECTED]
>
> Unfortunately I can't see it, because the `zfs list` command doesn't
> show it. [...]
Re: [zfs-discuss] `zfs list` doesn't show my snapshot
'zfs list' by default does not list the snapshots. You need to use the '-t snapshot' option with zfs list to view the snapshots.

-- Prabahar.

On Sat, Nov 22, 2008 at 12:14:47AM +0100, Pawel Tecza wrote:
> Today I've created one snapshot as below:
>
> # zfs snapshot rpool/ROOT/[EMAIL PROTECTED]
>
> Unfortunately I can't see it, because the `zfs list` command doesn't
> show it. [...]
Re: [zfs-discuss] `zfs list` doesn't show my snapshot
Pawel Tecza wrote:
> But I still don't understand why `zfs list` doesn't display snapshots by
> default. I saw it on the Net many times in examples of zfs usage.

It was changed. 'zfs list -t all' gives you everything, like 'zfs list' used to.

-- Andrew
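(To summarize the syntax on builds with the new behaviour:)

   zfs list                        # filesystems and volumes only
   zfs list -t snapshot            # snapshots only
   zfs list -t all                 # everything, i.e. the old default behaviour
   zfs list -t snapshot -r rpool   # snapshots under one pool or dataset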
Re: [zfs-discuss] `zfs list` doesn't show my snapshot
Prabahar Jeyaram wrote:
> 'zfs list' by default does not list the snapshots. You need to use the
> '-t snapshot' option with zfs list to view the snapshots.

Hello Prabahar,

Thank you very much for your fast explanation!

Did `zfs list` always work that way, or is this the default behaviour of the latest version? I'm sure I can google many examples of `zfs list` with snapshots in the result.

Cheers,

Pawel
Re: [zfs-discuss] `zfs list` doesn't show my snapshot
It used to. Although, with the Time Slider now, I agree that it shouldn't by default.

Malachi

On Fri, Nov 21, 2008 at 3:29 PM, Pawel Tecza [EMAIL PROTECTED] wrote:
> Ahmed Kamal wrote:
> > zfs list -t snapshot ?
>
> Hi Ahmed,
>
> Thanks a lot for the hint! It works. I didn't know that I have so many
> snapshots :D
>
> # zfs list -t snapshot
> NAME                                                  USED  AVAIL  REFER  MOUNTPOINT
> [EMAIL PROTECTED]                                      20K      -    58K  -
> rpool/[EMAIL PROTECTED]                                  0      -    18K  -
> rpool/ROOT/[EMAIL PROTECTED]                         1,47G      -  2,74G  -
> rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-17-22:12:17    124K      -  4,83G  -
> rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-17-22:13:16    119K      -  4,83G  -
> rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-17-22:13:59   16,6M      -  4,84G  -
> rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-17-23:21:06   76,4M      -  4,83G  -
> rpool/ROOT/[EMAIL PROTECTED]:-:2008-11-19-15:51:21   65,7M      -  5,44G  -
> rpool/ROOT/[EMAIL PROTECTED]:30:33                   12,6M      -  5,51G  -
> rpool/ROOT/[EMAIL PROTECTED]:03:43                    248K      -  5,52G  -
> rpool/ROOT/[EMAIL PROTECTED]                          178K      -  5,52G  -
> rpool/[EMAIL PROTECTED]                                15K      -    19K  -
> rpool/export/[EMAIL PROTECTED]                         19K      -    21K  -
>
> But I still don't understand why `zfs list` doesn't display snapshots by
> default. I saw it on the Net many times in examples of zfs usage.
>
> Have a nice weekend! :)
>
> Pawel
Re: [zfs-discuss] `zfs list` doesn't show my snapshot
Pawel Tecza wrote:
> But I still don't understand why `zfs list` doesn't display snapshots by
> default. I saw it on the Net many times in examples of zfs usage.

This was PSARC/2008/469 - excluding snapshot info from 'zfs list':
http://opensolaris.org/os/community/on/flag-days/pages/2008091003/

-- Dave

-- David Pacheco, Sun Microsystems Fishworks. http://blogs.sun.com/dap/
[zfs-discuss] ZFS fragmentation with MySQL databases
I just tried ZFS on one of our slaves and got some really bad performance. When I started the server yesterday, it was able to keep up with the main server without problem, but after two days of consecutive running the server is crushed by IO.

After running the dtrace script iopattern, I noticed that the workload is now 100% random IO. Copying the database (140 GB) from one directory to another took more than 4 hours, without any other tasks running on the server, and all the reads on tables that were updated were random... Keeping an eye on iopattern and zpool iostat, I saw that when the system was accessing files that have not been changed, the disk was reading sequentially at more than 50 MB/s, but when reading files that change often the speed dropped to 2-3 MB/s.

The server has plenty of disk space, so it should not have such a level of file fragmentation in such a short time.

For information, I'm using Solaris 10/08 with a mirrored root pool on two 1TB SATA hard disks (slow with random IO). I'm using MySQL 5.0.67 with the MyISAM engine. The zfs recordsize is 8k, as recommended in the zfs guide.
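(For reference, that recordsize tuning is a one-liner; 'tank/mysql' here stands in for whatever dataset holds the data files, and note it only affects files written after the change:)

   zfs set recordsize=8k tank/mysql   # match the database page size
   zfs get recordsize tank/mysql      # verify the setting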
Re: [zfs-discuss] `zfs list` doesn't show my snapshot
On Fri, Nov 21, 2008 at 03:42:17PM -0800, David Pacheco wrote:
> Pawel Tecza wrote:
> > But I still don't understand why `zfs list` doesn't display snapshots
> > by default. I saw it on the Net many times in examples of zfs usage.
>
> This was PSARC/2008/469 - excluding snapshot info from 'zfs list':
> http://opensolaris.org/os/community/on/flag-days/pages/2008091003/

An incomplete one - where is the '-t all' option? It's really annoying, error-prone and time-consuming to type stories on the command line... Does anybody remember the "keep it small and simple" thing?

Regards,
jel.

-- Otto-von-Guericke University http://www.cs.uni-magdeburg.de/
Department of Computer Science, Geb. 29 R 027, Universitaetsplatz 2
39106 Magdeburg, Germany, Tel: +49 391 67 12768
Re: [zfs-discuss] `zfs list` doesn't show my snapshot
On Fri, Nov 21, 2008 at 9:38 PM, Jens Elkner [EMAIL PROTECTED] wrote:
> An incomplete one - where is the '-t all' option? It's really annoying,
> error-prone and time-consuming to type stories on the command line...
> Does anybody remember the "keep it small and simple" thing?

How is defaulting to output that makes the command unusable to the majority of their customers keeping it simple? Their choice of implementation does leave something to be desired, though... I would think it would make more sense to have something like 'zfs list snapshots', and if you wanted to limit that to a specific pool, 'zfs list snapshots poolname'.

--Tim
[zfs-discuss] RC1 Zfs writes a lot slower when running X
Hi,

I have OpenSolaris on an AMD64 Asus A8NE with 2 GB of RAM and 4x320 GB SATA drives in raidz1. With dd, I can write at the quasi-maximum disk speed of 80 MB/s each, for a total of 250 MB/s, if I have no X session at all (only a console tty). But as soon as I have an X session running, the write speed drops to about 120 MB/s. It's even worse if I have a VBoxHeadless running with an idle win2k3 inside: it drops to 30 MB/s.

The CPU is at 0% in both cases and nothing else is using the array. I tried to investigate with DTrace without success... Anyone have a clue what could be going on?

Thanks

Zerk
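(A dd write test of the kind described might look like this; the target file and sizes are placeholders, not the exact command used:)

   # time a 4 GB sequential write to the raidz pool
   ptime dd if=/dev/zero of=/tank/ddtest bs=1024k count=4096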
Re: [zfs-discuss] RC1 Zfs writes a lot slower when running X
On Fri, Nov 21, 2008 at 11:33 PM, zerk [EMAIL PROTECTED] wrote:
> With dd, I can write at the quasi-maximum disk speed of 80 MB/s each, for
> a total of 250 MB/s, if I have no X session at all (only a console tty).
> But as soon as I have an X session running, the write speed drops to
> about 120 MB/s. [...]

Ya, you're using gobs of RAM that was normally being used by ZFS for caching. I would venture to guess that if you stuck another 2 GB of RAM in there, you'd see far less of a *hit* from X or a VM.

--Tim
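One way to watch that happen is with the standard ZFS kstats (no extra tools needed); the ARC should visibly shrink as X and the VM take memory:

   kstat -m zfs -n arcstats | grep -w size   # current ARC size, in bytes
   kstat -m zfs -n arcstats | grep -w c      # current ARC target size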
Re: [zfs-discuss] So close to better, faster, cheaper.... zfs stripe pool survival
> So the feature request would be for zfs to allow selective disk removal
> from striped pools, with the resultant data loss, but any data that
> survived, either by chance (living on the remaining disks) or policy
> (multiple copies), would still be accessible. Is there some underlying
> reason in zfs that precludes this functionality?

You may never get a good answer to this, so I'll give it to you straight up: ZFS doesn't do this because no business using Sun products wants to do this, and thus nobody at Sun ever made ZFS do this. Maybe you can convince someone at Sun to care about this feature, but I doubt it, because it is a pretty fringe use case.

In the end you can probably work around this problem, though. Striping doesn't improve performance that much, and it doesn't provide that much more space. Next year we'll be using 2TB hard drives, and when you can make a 6TB raidz array with 4 hard drives one year and a 7.5TB one the year after, and put them both in the same pool so it looks like 13.5TB coming from 8 drives that can tolerate one of four plus one of four drives failing, that isn't too shabby.
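Spelling out that capacity arithmetic, with the hypothetical drive sizes above (raidz loses one disk per vdev to parity):

   4 x 2.0TB raidz -> (4-1) x 2.0TB = 6.0TB usable
   4 x 2.5TB raidz -> (4-1) x 2.5TB = 7.5TB usable
   one pool, two raidz vdevs: 6.0TB + 7.5TB = 13.5TB,
   surviving one failed disk in each vdev

The second vdev would be added later with something like the following, where the pool and device names are placeholders:

   zpool add tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0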
Re: [zfs-discuss] ZFS fragmentation with MySQL databases
On Fri, 21 Nov 2008 17:20:48 PST, Vincent Kéravec [EMAIL PROTECTED] wrote:
> After running the dtrace script iopattern, I noticed that the workload is
> now 100% random IO. [...] Keeping an eye on iopattern and zpool iostat, I
> saw that when the system was accessing files that have not been changed,
> the disk was reading sequentially at more than 50 MB/s, but when reading
> files that change often the speed dropped to 2-3 MB/s.

Good observation and analysis.

> The server has plenty of disk space, so it should not have such a level
> of file fragmentation in such a short time.

My explanation would be: whenever a block within a file changes, zfs has to write it at another location (copy on write), so the previous version isn't immediately lost. Zfs will try to keep the new version of the block close to the original one, but after several changes to the same database page, things get pretty messed up and logically sequential I/O becomes pretty much physically random indeed. The original blocks will eventually be added to the freelist and reused, so proximity can be restored, but it will never be 100% sequential again. The effect is larger when many snapshots are kept, because older block versions are not freed, or when the same block is changed very often and freelist updating has to be postponed. That is the trade-off between "always consistent" and "fast".

> I'm using MySQL 5.0.67 with the MyISAM engine. The zfs recordsize is 8k,
> as recommended in the zfs guide.

I would suggest enlarging the MyISAM buffers. The InnoDB engine does copy on write within its data files, so things might be different there.

-- ( Kees Nuyt ) c[_]
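A starting point for that, in my.cnf - the value here is purely illustrative and should be sized to the machine's RAM:

   [mysqld]
   key_buffer_size = 1G   # MyISAM index cache; bigger means fewer random index reads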