Re: [zfs-discuss] Injection of ZFS snapshots into existing data, and replacement of older snapshots with zfs recv without truncating newer ones

2012-01-13 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Jim Klimov
 
 Perhaps I need to specify some usecases more clearly:

Actually, I'm not sure you do need to specify use cases more clearly,
because the idea is obviously awesome.  The main problem, if you're
interested, is getting attention.  Maybe it's more work than I know, but I
agree with you: at first blush it doesn't sound like much work.

I think the most compelling use case you mentioned was the ability to resume
an interrupted zfs send.

It's one of those things where it's not super-super useful (most people are
content with whatever snapshot and zfs send scheme they already have today),
but if it's not much work, then maybe it's worthwhile anyway.

But there's a finite amount of development resources, and other features
are in higher demand (such as BP rewrite).  Why would Oracle or
Nexenta care to devote the effort?  Maybe it's possible; maybe there
just isn't enough motivation...



Re: [zfs-discuss] zfs defragmentation via resilvering?

2012-01-13 Thread Bob Friesenhahn

On Thu, 12 Jan 2012, Edward Ned Harvey wrote:

Suppose you have a 1G file open, and a snapshot of this file is on disk from
a previous point in time.
for ( i=0 ; i < 1 trillion ; i++ ) {
seek(random integer in range[0 to 1G]);
write(4k);
}

Something like this would quickly try to write a bunch of separate and
scattered 4k blocks at different offsets within the file.  Every 32 of these
4k writes would be write-coalesced into a single 128k on-disk block.

Sometime later, you read the whole file sequentially, e.g. with cp or tar or
cat.  The first 4k come from this 128k block...  The next 4k come from
another 128k block...  The next 4k come from yet another 128k block...
Essentially, the file has become very fragmented and scattered about on the
physical disk.  Every 4k read results in a random disk seek.
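
For reference, a rough sketch (plain Python, not ZFS code) of the access
pattern being described; the 1G file, 4k writes, and 128k records are the
numbers from the example above, and the behavior is simplified to an
assumption that each 128k record receiving a 4k overwrite is rewritten to a
new location:

import random

FILE_SIZE = 1 << 30          # 1 GiB file, as in the example
RECORD    = 128 * 1024       # 128k filesystem record
records   = FILE_SIZE // RECORD

rewritten = set()            # records whose current copy has been relocated
for _ in range(100_000):     # each pass models one random 4k overwrite
    offset = random.randrange(0, FILE_SIZE)
    rewritten.add(offset // RECORD)

print(f"{len(rewritten) / records:.1%} of the 128k records have been relocated;")
print("a later sequential read pays roughly one seek per relocated record.")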


Are you talking about some other filesystem or are you talking about 
zfs?  Because zfs does not work like that ...


However, I did ignore the additional fragmentation due to using raidz 
type formats.  These break the 128K block into smaller chunks and so 
there can be more fragmentation.
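
A rough illustration of that point (the vdev widths below are assumed
examples, and the on-disk layout is simplified): each block is split across
the data disks of a raidz vdev, so the contiguous chunk held by any one disk
is much smaller than the block itself.

RECORD = 128 * 1024
for disks, parity in ((6, 1), (6, 2), (10, 2)):   # assumed example vdev widths
    data_disks = disks - parity
    print(f"raidz{parity}, {disks} disks: ~{RECORD // data_disks // 1024}K of the "
          f"128K block per data disk")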



The worst case fragmentation percentage for 8K blocks (and 512-byte sectors)
is 6.25% (100 * 1 / ((8 * 1024) / 512)).


You seem to be assuming that a 512b disk sector and its neighboring 512b
sector count as contiguous blocks.  And since there are guaranteed to
be exactly 256 sectors in every 128k filesystem block, then there is no
fragmentation for 256 contiguous sectors, guaranteed.  Unfortunately, the
512b sector size is just an arbitrary number (and variable, actually 4k on
modern disks), and the resultant percentage of fragmentation is equally
arbitrary.


Yes, I am saying that zfs writes its data in contiguous chunks 
(filesystem blocksize in the case of mirrors).



To produce a number that actually matters, what you need to do is calculate
the percentage of time the disk is able to deliver payload, versus the
percentage of time the disk is performing time-wasting overhead operations:
seek and rotational latency.


Yes, latency is the critical factor.
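
A back-of-the-envelope version of that calculation (the seek time, rotational
latency, and transfer rate below are assumed generic 7200 rpm figures, not
measurements from anyone's pool):

SEEK_MS      = 8.0                  # assumed average seek time
ROTATION_MS  = 60_000 / 7200 / 2    # assumed average rotational latency, ~4.2 ms
TRANSFER_MBS = 120.0                # assumed sustained transfer rate, MB/s

def payload_fraction(read_bytes):
    """Fraction of time spent delivering payload for one random read."""
    transfer_ms = read_bytes / (TRANSFER_MBS * 1024 * 1024) * 1000
    return transfer_ms / (SEEK_MS + ROTATION_MS + transfer_ms)

for size in (4 * 1024, 128 * 1024, 1024 * 1024):
    print(f"{size // 1024:>5}k random reads: {payload_fraction(size):6.1%} payload time")

With those assumed numbers, 4k random reads spend well under 1% of the time
moving data, and even 128k random reads only reach a few percent, which is
the sense in which seek and latency overhead dominate.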


That's 944 times larger than the 128k maximum block size currently in zfs,
and obviously larger still compared to what you mentioned: 4k or 8k
recordsizes or 512b disk sectors...


Yes, fragmentation is still important even with 128K chunks.


I would call that 100% fragmentation, because there are no contiguously
aligned sequential blocks on disk anywhere.  But again, any measure of
percent fragmentation is purely arbitrary, unless you know (a) which type


I agree that the notion of percent fragmentation is arbitrary.  I used
one that I invented, which is based on underlying disk sectors rather
than filesystem blocks.
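
In code form, that metric, as quoted above (100 * 1 / ((8 * 1024) / 512) =
6.25% for 8K blocks), is just the reciprocal of sectors-per-block, as I read
it; the 128K case is included for comparison:

def worst_case_fragmentation(block_size, sector_size=512):
    """Worst case: at most one discontinuity per block's worth of sectors."""
    return 100.0 / (block_size / sector_size)

for bs in (8 * 1024, 128 * 1024):
    print(f"{bs // 1024:>4}K blocks, 512b sectors: {worst_case_fragmentation(bs):.3f}%")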


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,  http://www.GraphicsMagick.org/


[zfs-discuss] L2ARC, block based or file based?

2012-01-13 Thread Matt Banks
I'm sorry to be asking such a basic question that would seem to be easily found
on Google, but after 30 minutes of googling and looking through this list's
archives, I haven't found a definitive answer.

Is the L2ARC caching scheme based on files or blocks?

The reason I ask: We have several databases that are stored in single large 
files of 500GB or more.

So, is L2ARC doing us any good if the entire file can't be cached at once?

We're looking at buying some additional SSDs for L2ARC (as well as additional
RAM to support the increased L2ARC size), and I'm wondering if we NEED to plan
for them to be large enough to hold the entire file or if ZFS can cache the
most heavily used parts of a single file.

After watching arcstat (Mike Harsch's updated version) and arc_summary, I'm
still not sure what to make of it.  It's rare that the L2ARC (14GB) hits double
digits in %hit, whereas the ARC (3GB) is frequently at an 80% hit rate.


TIA
matt


Re: [zfs-discuss] L2ARC, block based or file based?

2012-01-13 Thread Marion Hakanson
mattba...@gmail.com said:
 We're looking at buying some additional SSDs for L2ARC (as well as additional
 RAM to support the increased L2ARC size), and I'm wondering if we NEED to plan
 for them to be large enough to hold the entire file or if ZFS can cache the
 most heavily used parts of a single file.
 
 After watching arcstat (Mike Harsch's updated version) and arc_summary, I'm
 still not sure what to make of it.  It's rare that the L2ARC (14GB) hits
 double digits in %hit, whereas the ARC (3GB) is frequently at an 80% hit rate.

I'm not sure of the answer to your initial question (file-based vs. block-based),
but I may have an explanation for the stats you're seeing.  We have a system
here with 96GB of RAM and also the Sun F20 flash accelerator card (96GB),
most of which is used for L2ARC.

Note that data is not written into the L2ARC until it is evicted from the
ARC (e.g. when something newer or more frequently used needs ARC space).
So, my interpretation of the high hit rates on the in-RAM ARC, and low hit
rates on the L2ARC, is that the working set of data fits mostly in RAM,
and the system seldom needs to go to the L2ARC for more.
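
A toy model of that feedback loop (plain LRU tiers, which the real ARC/L2ARC
are not; the 3000/14000 sizes just echo the 3GB/14GB figures above): the
second tier is filled only with what the first tier evicts, so a working set
that fits in tier 1 leaves tier 2 nearly idle.

from collections import OrderedDict
import random

class TwoTierCache:
    def __init__(self, l1_size, l2_size):
        self.l1, self.l2 = OrderedDict(), OrderedDict()
        self.l1_size, self.l2_size = l1_size, l2_size
        self.hits = {"l1": 0, "l2": 0, "miss": 0}

    def read(self, block):
        if block in self.l1:                       # hit in the in-RAM tier
            self.hits["l1"] += 1
            self.l1.move_to_end(block)
            return
        if block in self.l2:                       # hit in the second tier
            self.hits["l2"] += 1
            del self.l2[block]
        else:
            self.hits["miss"] += 1
        self.l1[block] = True
        if len(self.l1) > self.l1_size:
            evicted, _ = self.l1.popitem(last=False)
            self.l2[evicted] = True                # tier 2 is fed only by evictions
            if len(self.l2) > self.l2_size:
                self.l2.popitem(last=False)

cache = TwoTierCache(l1_size=3000, l2_size=14000)
for _ in range(200_000):
    cache.read(random.randrange(2500))             # working set fits in tier 1
print(cache.hits)                                  # nearly every hit lands in "l1"

Rerun it with a working set larger than tier 1 (say random.randrange(10_000))
and the "l2" hit count climbs, which matches the intuition that the L2ARC only
starts paying off once the hot data no longer fits in RAM.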

Regards,

Marion

