Re: [zfs-discuss] Slow zfs writes
Ram Chander wrote:
> Hi Roy, you are right, so it looks like a redistribution issue. Initially there were two vdevs with 24 disks (disk 0-23) for close to a year. After that we added 24 more disks and created additional vdevs. The initial vdevs are filled up, so write speed declined. Now, how do I find which files are present on a given vdev or disk? That way I can remove them and copy them back to distribute the data. Is there any other way to solve this?

The only way is to avoid the problem in the first place by not mixing vdev sizes in a pool.

--
Ian.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] ZFS monitoring
On Mon, Feb 11, 2013 at 05:39:27PM +0100, Jim Klimov wrote:
> On 2013-02-11 17:14, Borja Marcos wrote:
>> On Feb 11, 2013, at 4:56 PM, Tim Cook wrote:
>>> The zpool iostat output has all sorts of statistics I think would be useful/interesting to record over time.
>>
>> Yes, thanks :) I think I will add them, I just started with the esoteric ones. Anyway, is there still no better way to read them than running zpool iostat and parsing the output?
>
> I believe in this case you'd have to run it as a continuous process and parse all outputs after the first one (which is the since-boot aggregate statistic, IIRC). Also note that on problems with the ZFS engine itself, zpool may lock up and thus halt your program, so have it ready to abort an outstanding statistics read after a timeout and perhaps log an error. And if pools are imported or exported while it runs, the zpool iostat output changes dynamically, so you basically need to parse its text structure every time.
>
> zpool iostat -v might be even more interesting, though, as it lets you see per-vdev statistics and perhaps notice imbalances, etc. All that said, I don't know whether this data is also available as some set of kstats - that would probably be a lot better for your purpose. Inspect the zpool source to see where it gets its numbers from, and perhaps implement and RTI the relevant kstats if they aren't there yet ;) On the other hand, I am not certain how Solaris-based kstats interact with or correspond to structures in FreeBSD (or Linux, for that matter).

I made the kstat data available on FreeBSD via the 'kstat' sysctl tree:

# sysctl kstat

--
Pawel Jakub Dawidek    http://www.wheelsystems.com
FreeBSD committer      http://www.FreeBSD.org
Am I Evil? Yes, I Am!  http://tupytaj.pl
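A minimal Python sketch of such a collector, following the advice above: run `zpool iostat -v` continuously, discard the first (since-boot) sample block, and guard every read with a timeout in case zpool itself hangs. The pool name 'tank' and the seven-column layout are assumptions; column layout differs between platforms and zpool versions, so check yours first:

```python
import select
import subprocess

def parse_iostat_line(line):
    """Parse one data line of `zpool iostat -v` output.

    Returns (name, alloc, free, read_ops, write_ops, read_bw, write_bw)
    with values kept as the human-readable strings zpool prints
    (e.g. '1.2T', '356K'), or None for header/separator lines.
    """
    fields = line.split()
    if len(fields) != 7:          # blank line or 'capacity/operations/bandwidth' banner
        return None
    if fields[0] == 'pool' or set(fields[0]) == {'-'}:
        return None               # column header or dashed separator
    return tuple(fields)

def iostat_rows(pool='tank', interval=5, timeout=30.0):
    """Yield parsed rows from a continuously running `zpool iostat -v`.

    zpool can lock up if the ZFS engine itself is stuck, so each read
    is guarded by select() with a timeout (Unix only).  Callers should
    discard the first sample block: it reports since-boot aggregates,
    not per-interval rates.  'tank' is a placeholder pool name.
    """
    proc = subprocess.Popen(['zpool', 'iostat', '-v', pool, str(interval)],
                            stdout=subprocess.PIPE, text=True)
    try:
        while True:
            ready, _, _ = select.select([proc.stdout], [], [], timeout)
            if not ready:
                raise TimeoutError('zpool iostat stopped responding')
            line = proc.stdout.readline()
            if not line:
                break
            row = parse_iostat_line(line)
            if row is not None:
                yield row
    finally:
        proc.kill()
```

The per-line parser is split out so it can be reworked independently when the text structure changes, as Jim warns it can.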
Re: [zfs-discuss] Slow zfs writes
On 2013-02-12 10:32, Ian Collins wrote:
> Ram Chander wrote:
>> Hi Roy, you are right, so it looks like a redistribution issue. Initially there were two vdevs with 24 disks (disk 0-23) for close to a year. After that we added 24 more disks and created additional vdevs. The initial vdevs are filled up, so write speed declined. Now, how do I find which files are present on a given vdev or disk? That way I can remove them and copy them back to distribute the data. Is there any other way to solve this?
>
> The only way is to avoid the problem in the first place by not mixing vdev sizes in a pool.

Well, that imbalance is there - in the zpool status printout we see raidz1 top-level vdevs of 5, 5, 12, 7, 7 and 7 disks plus some 5 spares, which seems to sum up to 48 ;) Depending on disk sizes, it is possible that the tlvdev sizes in gigabytes were kept the same (i.e. a raidz set with twice as many disks of half the size), but we have no information on that detail and it is unlikely. With all the disk sets in one pool, that would still rather imbalance the load among spindles and IO buses.

Besides all that, with the older tlvdevs being fuller than the newer ones, there is an imbalance that would not have been avoided even by keeping vdev sizes equal: writes into the newer ones are likely to quickly find available holes, while writes into the older ones are more fragmented and need longer data inspection to find a hole - if not outright gang-block fragmentation. These two effects are, I believe, the basis of the performance drop on full pools, the deciding factor being the mix of IO patterns and the fragmentation of data and holes.

I think there were developments in illumos ZFS to direct more writes onto devices with more available space; I am not sure whether the average write latency of a tlvdev is also monitored and taken into account during write-targeting decisions (which would also cover the case of failing devices that take longer to respond).
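As an illustration of that free-space-biased write targeting, here is a toy model only - NOT the real illumos metaslab allocator, which also weighs metaslab activation, offsets and more:

```python
import random

def pick_tlvdev(free_bytes):
    """Choose a top-level vdev index, weighted by free space.

    Toy model of free-space-biased write targeting: vdevs with more
    free space proportionally attract more of the next allocations,
    so emptier (newer) vdevs fill faster and the pool slowly evens
    out on its own as data is written and rewritten.
    """
    total = sum(free_bytes)
    if total <= 0:
        raise ValueError('no free space anywhere')
    r = random.uniform(0, total)
    acc = 0
    for i, free in enumerate(free_bytes):
        acc += free
        if r <= acc:
            return i
    return len(free_bytes) - 1
```

With free space of, say, 10 units on an old tlvdev and 90 on a new one, roughly nine writes in ten land on the new vdev, which is the rebalancing tendency described above.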
I am not sure which portions have been completed and integrated into the common illumos-gate. As was suggested, you can use zpool iostat -v 5 to monitor IOs to the pool with a fan-out per tlvdev and per disk, and watch for possible patterns there. Do keep in mind, however, that for a non-failed raidz set you should see reads only from the data disks of a particular stripe; the parity disks are not read unless a checksum mismatch occurs. On average the data should be spread over all disks in such a manner that there is no dedicated parity disk, but with small IOs you are likely to notice this effect.

If the budget permits, I'd suggest building (or leasing) another system with balanced disk sets and replicating all data onto it, then repurposing the older system - for example, as a backup of the newer box (also after remaking its disk layout).

As for the question of which files are on the older disks: as a rule of thumb, you can compare the file creation/modification times with the date you expanded the pool ;) Closer inspection can be done with a ZDB walk to print the DVA block addresses of a file's blocks (the DVA includes the number of the top-level vdev), but that would take some time - first to determine which files you want to inspect (likely some band of sizes) and then to do the zdb walks.

Good luck,
//Jim
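A sketch of the zdb approach: dump a file's block pointers with something like `zdb -ddddd pool/dataset <object-id>` and pull the top-level vdev index out of each DVA. The `<vdev:offset:asize>` DVA notation (offset and asize in hex) is what zdb prints, but exact formatting varies between releases, so treat this parser as a starting point:

```python
import re

# vdev:offset:asize, e.g. '0:35a4000:2000'; first field is decimal,
# the other two hex
_DVA = re.compile(r'(\d+):([0-9a-fA-F]+):([0-9a-fA-F]+)')

def dva_tlvdev(dva):
    """Return the top-level vdev number encoded in a DVA string.

    The first DVA field is the top-level vdev index, which tells you
    which raidz set holds that copy of the block - and hence whether
    the file's data sits on the old or the new tlvdevs.
    """
    m = _DVA.search(dva)
    if not m:
        raise ValueError('no DVA found in %r' % dva)
    return int(m.group(1))
```

Feeding this the DVA lines from a zdb dump and tallying the indices gives a per-file picture of which tlvdevs hold its blocks.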
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
On 02/10/13 12:01, Koopmann, Jan-Peter wrote:
> Why should it? Unless you do a shrink on the vmdk and use a ZFS variant with SCSI UNMAP support (I believe currently only Nexenta, but correct me if I am wrong) the blocks will not be freed, will they?

Solaris 11.1 has ZFS with SCSI UNMAP support.

--
Darren J Moffat
Re: [zfs-discuss] ZFS monitoring
On Feb 12, 2013, at 11:25 AM, Pawel Jakub Dawidek wrote:
> I made the kstat data available on FreeBSD via the 'kstat' sysctl tree:

Yes, I am using that data. I wasn't sure how to get something meaningful from it, but I found the arcstats.pl script and I am using it as a model. Suggestions are always welcome, though :)

(The sample pages I put on devilator.froblua.com aren't using the better organized graphs yet; it's just a crude parameter dump.)

Borja.
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
>> Unless you do a shrink on the vmdk and use a ZFS variant with SCSI UNMAP support (I believe currently only Nexenta, but correct me if I am wrong) the blocks will not be freed, will they?
>
> Solaris 11.1 has ZFS with SCSI UNMAP support.

Freeing unused blocks also works perfectly well with fstrim (Linux) consuming an iSCSI zvol served up by oi151a6.
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
Darren,

On 02/12/2013 11:25 AM, Darren J Moffat wrote:
> On 02/10/13 12:01, Koopmann, Jan-Peter wrote:
>> Why should it? Unless you do a shrink on the vmdk and use a ZFS variant with SCSI UNMAP support (I believe currently only Nexenta, but correct me if I am wrong) the blocks will not be freed, will they?
>
> Solaris 11.1 has ZFS with SCSI UNMAP support.

Seem to have skipped that one... Are there any related tools, e.g. to release all-zero blocks or the like? Of course it's then up to the admin to know what all this is about, or to wreck the data.

Thomas
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
On 02/12/13 15:07, Thomas Nau wrote:
> Seem to have skipped that one... Are there any related tools, e.g. to release all-zero blocks or the like?

No tools; ZFS does it automatically when freeing blocks, provided the underlying device advertises the functionality. ZFS zvols shared over COMSTAR advertise SCSI UNMAP as well.

--
Darren J Moffat
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
> No tools; ZFS does it automatically when freeing blocks, provided the underlying device advertises the functionality. ZFS zvols shared over COMSTAR advertise SCSI UNMAP as well.

If a system was running something older, e.g. Solaris 11, the free blocks will not be marked as such on the server, even after the system upgrades to Solaris 11.1. There might be a way to force that by disabling compression, creating a large file full of NULs, and then removing it. But you need to check first that this actually has some effect before you even try.

Casper
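A hedged sketch of Casper's NUL-fill trick (a dd equivalent would do as well). The `headroom` guard is my addition so the filesystem is never driven completely full, which ZFS handles badly; and as Casper says, verify on a test zvol that this actually shrinks the backing store before relying on it:

```python
import os

def zero_fill(path, chunk=1 << 20, headroom=1 << 30, max_bytes=None):
    """Fill the filesystem holding `path` with NUL blocks, then delete.

    Once the delete frees the blocks, a kernel with SCSI UNMAP support
    should pass the frees down to the thin-provisioned backing store.
    Compression must be off while this runs, or the zeros collapse to
    nothing and no blocks are ever allocated.  `headroom` stops the
    fill well short of 100% full; `max_bytes` is an optional safety
    cap.  Returns the number of bytes written.
    """
    buf = b'\0' * chunk
    written = 0
    fsdir = os.path.dirname(os.path.abspath(path))
    with open(path, 'wb') as f:
        while max_bytes is None or written < max_bytes:
            st = os.statvfs(fsdir)
            if st.f_bavail * st.f_frsize <= headroom:
                break
            f.write(buf)
            f.flush()
            os.fsync(f.fileno())   # keep the statvfs free-space reading honest
            written += chunk
    os.remove(path)
    return written
```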
Re: [zfs-discuss] Freeing unused space in thin provisioned zvols
On 02/10/2013 01:01 PM, Koopmann, Jan-Peter wrote:
> Why should it? I believe currently only Nexenta, but correct me if I am wrong.

The code was mainlined a while ago; see:

https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/io/comstar/lu/stmf_sbd/sbd.c#L3702-L3730
https://github.com/illumos/illumos-gate/blob/master/usr/src/uts/common/fs/zfs/zvol.c#L1697-L1754

Thanks should go to the guys at Nexenta for contributing this to the open-source effort.

Cheers,
--
Saso
Re: [zfs-discuss] Slow zfs writes
Jim Klimov wrote:
> On 2013-02-12 10:32, Ian Collins wrote:
>> Ram Chander wrote:
>>> Hi Roy, you are right, so it looks like a redistribution issue. [...]
>>
>> The only way is to avoid the problem in the first place by not mixing vdev sizes in a pool.

I was a bit quick off the mark there; I didn't notice that some vdevs were older than others.

> Well, that imbalance is there - in the zpool status printout we see raidz1 top-level vdevs of 5, 5, 12, 7, 7 and 7 disks plus some 5 spares - which seems to sum up to 48 ;)

The vdev sizes are about (including parity space) 14, 14, 22, 19, 19 and 19TB respectively, 127TB in total. So even if the data were balanced, the performance of this pool would still start to degrade once ~84TB (about 2/3 full) are used. The only viable long-term solution is a rebuild, or putting bigger drives in the two smallest vdevs.

In the short term, when I've had similar issues I used zfs send to copy a large filesystem within the pool, then renamed the copy to the original name and deleted the original. This can be repeated until you have an acceptable distribution.

One last thing: unless this is some form of backup pool, or the data on it isn't important, avoid raidz vdevs in such a large pool!

--
Ian.
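Ian's copy-and-rename trick could be scripted along these lines. Dataset and snapshot names are placeholders, and the `run` parameter exists only so the command sequence can be exercised without a real pool; note you need enough free space for a second full copy, and anything using the old mountpoint must be stopped before the rename:

```python
import subprocess

def rebalance_dataset(pool, ds, run=subprocess.check_call):
    """Redistribute one dataset by copying it within the pool.

    The send|recv writes a fresh copy, which the allocator spreads
    over all top-level vdevs including the new, emptier ones; the
    original copy (concentrated on the old vdevs) is then destroyed.
    The 'rebal' suffix is a placeholder; pass a recording function as
    `run` to dry-run the command sequence first.
    """
    src = '%s/%s' % (pool, ds)
    tmp = '%s/%s.rebal' % (pool, ds)
    run(['zfs', 'snapshot', src + '@rebal'])
    # a shell pipeline keeps the send/recv pair simple
    run(['sh', '-c', 'zfs send %s@rebal | zfs recv %s' % (src, tmp)])
    run(['zfs', 'destroy', '-r', src])
    run(['zfs', 'rename', tmp, src])            # the copy takes the old name
    run(['zfs', 'destroy', src + '@rebal'])     # drop the working snapshot
```

Repeating this over the largest datasets approximates the "copy until the distribution is acceptable" loop described above.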