Re: [zfs-discuss] L2ARC, block based or file based?
mattba...@gmail.com said:

> We're looking at buying some additional SSDs for L2ARC (as well as
> additional RAM to support the increased L2ARC size), and I'm wondering if
> we NEED to plan for them to be large enough to hold the entire file, or if
> ZFS can cache the most heavily used parts of a single file.
>
> After watching arcstat (Mike Harsch's updated version) and arc_summary,
> I'm still not sure what to make of it. It's rare that the L2ARC (14GB)
> hits double digits in %hit, whereas the ARC (3GB) is frequently >80% hit.

I'm not sure of the answer to your initial question (file-based vs. block-based), but I may have an explanation for the stats you're seeing. We have a system here with 96GB of RAM and also the Sun F20 flash accelerator card (96GB), most of which is used for L2ARC.

Note that data is not written into the L2ARC until it is evicted from the ARC (e.g., when something newer or more frequently used needs ARC space). So my interpretation of the high hit rates on the in-RAM ARC and the low hit rates on the L2ARC is that the working set of data fits mostly in RAM, and the system seldom needs to go to the L2ARC for more.

Regards,
Marion

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
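The eviction behavior described above can be sketched with a toy cache model. This is a hypothetical two-level LRU simulation, not the real ARC's MRU/MFU algorithm; the block counts loosely mirror the 3GB ARC / 14GB L2ARC ratio from the thread. When the working set fits in the first level, the second level sees essentially no traffic:

```python
from collections import OrderedDict
import random

class ArcL2ArcSim:
    """Toy model: LRU ARC; blocks evicted from the ARC land in an LRU L2ARC."""
    def __init__(self, arc_blocks, l2arc_blocks):
        self.arc = OrderedDict()       # first-level cache (RAM)
        self.l2 = OrderedDict()        # second-level cache (SSD)
        self.arc_cap = arc_blocks
        self.l2_cap = l2arc_blocks
        self.arc_hits = self.l2_hits = self.misses = 0

    def read(self, block):
        if block in self.arc:
            self.arc.move_to_end(block)        # refresh LRU position
            self.arc_hits += 1
            return
        if block in self.l2:
            self.l2_hits += 1
            del self.l2[block]                 # promote back into the ARC
        else:
            self.misses += 1
        self.arc[block] = True
        if len(self.arc) > self.arc_cap:
            # Eviction is the ONLY way a block enters the L2ARC.
            evicted, _ = self.arc.popitem(last=False)
            self.l2[evicted] = True
            if len(self.l2) > self.l2_cap:
                self.l2.popitem(last=False)

random.seed(1)
sim = ArcL2ArcSim(arc_blocks=3000, l2arc_blocks=14000)
# Working set of 2,500 distinct blocks -- smaller than the ARC itself.
for _ in range(200_000):
    sim.read(random.randrange(2500))

total = sim.arc_hits + sim.l2_hits + sim.misses
print(f"ARC hit %:   {100 * sim.arc_hits / total:.1f}")
print(f"L2ARC hit %: {100 * sim.l2_hits / total:.1f}")
```

In this run the ARC hit rate settles near 99% while the L2ARC hit rate stays at zero, matching the pattern described above: blocks only reach the L2ARC via eviction, and nothing is evicted when RAM holds the working set.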
[zfs-discuss] L2ARC, block based or file based?
I'm sorry to be asking such a basic question that would seem easily found on Google, but after 30 minutes of googling and looking through this list's archives, I haven't found a definitive answer: is the L2ARC caching scheme based on files or blocks?

The reason I ask: we have several databases that are stored in single large files of 500GB or more. So is L2ARC doing us any good if the entire file can't be cached at once? We're looking at buying some additional SSDs for L2ARC (as well as additional RAM to support the increased L2ARC size), and I'm wondering if we NEED to plan for them to be large enough to hold the entire file, or if ZFS can cache the most heavily used parts of a single file.

After watching arcstat (Mike Harsch's updated version) and arc_summary, I'm still not sure what to make of it. It's rare that the L2ARC (14GB) hits double digits in %hit, whereas the ARC (3GB) is frequently >80% hit.

TIA,
matt
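For rough sizing, it may help to see how many records a cache far smaller than the file can still hold if caching is record-granular. A back-of-envelope sketch: the 500GB and 14GB figures are from the post above; the recordsizes are assumptions (128K being the ZFS default, 8K a common database tuning), not settings from the poster's system:

```python
# Back-of-envelope: how much of a large database file a record-granular
# cache can hold. A 14 GiB cache covers the same 2.8% of the file either
# way, but the number of independently cacheable hot spots differs.
file_size  = 500 * 2**30       # 500 GiB database file (from the post)
l2arc_size = 14 * 2**30        # 14 GiB L2ARC (from the post)

for recordsize in (128 * 2**10, 8 * 2**10):   # assumed recordsizes
    records_cached = l2arc_size // recordsize
    fraction = l2arc_size / file_size
    print(f"recordsize {recordsize // 2**10:>3}K: "
          f"{records_cached:,} hottest records cached, "
          f"{100 * fraction:.1f}% of the file")
```

The point of the sketch: if caching happens per record rather than per file, the cache holds the hottest hundred-thousand-plus records regardless of total file size, so "can't fit the whole file" does not by itself make the cache useless.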
Re: [zfs-discuss] zfs defragmentation via resilvering?
On Thu, 12 Jan 2012, Edward Ned Harvey wrote:

> Suppose you have a 1G file open, and a snapshot of this file is on disk
> from a previous point in time.
>
>     for ( i=0 ; i < 1 trillion ; i++ ) {
>         seek(random integer in range [0 to 1G]);
>         write(4k);
>     }
>
> Something like this would quickly write a bunch of separate, scattered
> 4k blocks at different offsets within the file. Every 32 of these 4k
> writes would be write-coalesced into a single 128k on-disk block.
> Sometime later, you read the whole file sequentially, such as with cp or
> tar or cat. The first 4k come from this 128k block... the next 4k come
> from another 128k block... the next 4k come from yet another 128k
> block... Essentially, the file has become very fragmented and scattered
> about on the physical disk. Every 4k read results in a random disk seek.

Are you talking about some other filesystem, or are you talking about zfs? Because zfs does not work like that...

However, I did ignore the additional fragmentation due to using raidz-type formats. These break the 128K block into smaller chunks, so there can be more fragmentation. The worst-case fragmentation percentage for 8K blocks (and 512-byte sectors) is 6.25% (100 * 1/((8*1024)/512)).

> You seem to be assuming that reading a 512b disk sector and its
> neighboring 512b sector counts as contiguous blocks. And since there are
> guaranteed to be exactly 256 sectors in every 128k filesystem block,
> there is no fragmentation for 256 contiguous sectors, guaranteed.
> Unfortunately, the 512b sector size is just an arbitrary number (and
> variable, actually 4k on modern disks), and the resultant percentage of
> fragmentation is equally arbitrary.

Yes, I am saying that zfs writes its data in contiguous chunks (filesystem blocksize in the case of mirrors).
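The worst-case figure quoted above (6.25% for 8K blocks on 512-byte sectors) is easy to reproduce. A quick sketch using the same formula, also showing the 128K case for comparison:

```python
# Worst-case fragmentation percentage by the sector-based measure used in
# the thread: a filesystem block is written as N contiguous sectors, so at
# most 1 in N sector-to-sector transitions can be a discontinuity.
sector = 512                              # bytes per sector (thread's assumption)

for blocksize in (8 * 1024, 128 * 1024):
    sectors_per_block = blocksize // sector
    worst_case_pct = 100 * 1 / sectors_per_block
    print(f"{blocksize // 1024:>3}K blocks: {sectors_per_block:>3} sectors/block, "
          f"worst-case fragmentation {worst_case_pct:.4g}%")
```

This also makes the arbitrariness argument concrete: redo the same arithmetic with 4K sectors and the percentages change by a factor of eight, even though nothing about the on-disk layout has changed.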
> To produce a number that actually matters, what you need to do is
> calculate the percentage of time the disk is able to deliver payload,
> versus the percentage of time the disk is performing time-wasting
> "overhead" operations: seek and latency.

Yes, latency is the critical factor.

> That's 944 times larger than the largest 128k block size currently in
> zfs, and obviously larger still compared to what you mentioned: 4k or 8k
> recordsizes, or 512b disk sectors...

Yes, fragmentation is still important even with 128K chunks.

> I would call that 100% fragmentation, because there are no contiguously
> aligned sequential blocks on disk anywhere. But again, any measure of
> "percent fragmentation" is purely arbitrary, unless you know (a) which type

I agree that the notion of percent fragmentation is arbitrary. I used one that I invented, which is based on underlying disk sectors rather than filesystem blocks.

Bob

--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
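The payload-versus-overhead calculation suggested above can be sketched numerically. All drive parameters here are assumptions for a generic 7200 RPM disk (8.5 ms average seek, half-revolution rotational latency, 120 MB/s sustained transfer), not figures from the thread:

```python
# Fraction of wall-clock time spent delivering payload vs. seeking, as a
# function of how much contiguous data is read per random seek.
seek_ms = 8.5                                  # assumed average seek
rotational_latency_ms = 0.5 * 60_000 / 7200    # half a revolution at 7200 RPM
transfer_mb_per_s = 120.0                      # assumed sustained transfer rate

for io_bytes in (4 * 1024, 128 * 1024, 4 * 1024 * 1024):
    transfer_ms = io_bytes / (transfer_mb_per_s * 1e6) * 1000
    overhead_ms = seek_ms + rotational_latency_ms
    payload_pct = 100 * transfer_ms / (transfer_ms + overhead_ms)
    print(f"{io_bytes // 1024:>5}K per seek: "
          f"{payload_pct:5.1f}% of time delivering data")
```

With these assumed numbers the overhead per random I/O is about 12.7 ms, so a 4k read spends well under 1% of its time delivering data, and even a full 128K block stays under 10%; only multi-megabyte contiguous runs push the disk toward mostly-payload operation. That is the sense in which fragmentation at 128K granularity still matters.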
Re: [zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Jim Klimov
>
> Perhaps I need to specify some usecases more clearly:

Actually, I'm not sure you do need to specify usecases more clearly, because the idea is obviously awesome. The main problem, if you're interested, is getting attention. Maybe it's more work than I know, but I agree with you: at first blush it doesn't sound like much work.

I think the most compelling use case you mentioned was the ability to resume an interrupted zfs send. It's one of those things where it's not super-super useful (most people are content with whatever snapshot and zfs send scheme they already have today), but if it's not much work, then maybe it's worthwhile anyway.

But there's a finite amount of development resources, and there are other features in higher demand (such as BP rewrite, etc.). Why would Oracle or Nexenta care about devoting the effort? Maybe it's possible; maybe there just isn't enough motivation...