Re: [zfs-discuss] how l2arc works?

2009-07-02 Thread Zhu, Lejun
For now the L2ARC has to be warmed up again after every reboot. See bug
6662467.
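
If you want to watch the cache warm back up after a reboot, the L2ARC
counters are exposed through the zfs:0:arcstats kstat (the same data
"kstat -m zfs -n arcstats" prints). Here is a minimal libkstat sketch;
the l2_size/l2_hits/l2_misses field names are assumptions based on what
current arcstats output shows, so adjust them to whatever your build
exports:

    /* Poll a few L2ARC counters from the zfs:0:arcstats kstat. */
    #include <stdio.h>
    #include <kstat.h>

    static uint64_t
    arcstat(kstat_t *ksp, const char *name)
    {
            kstat_named_t *kn = kstat_data_lookup(ksp, (char *)name);
            return (kn != NULL ? kn->value.ui64 : 0);
    }

    int
    main(void)
    {
            kstat_ctl_t *kc = kstat_open();
            kstat_t *ksp;

            if (kc == NULL ||
                (ksp = kstat_lookup(kc, "zfs", 0, "arcstats")) == NULL ||
                kstat_read(kc, ksp, NULL) == -1) {
                    perror("arcstats");
                    return (1);
            }
            printf("l2_size   %llu\n", (u_longlong_t)arcstat(ksp, "l2_size"));
            printf("l2_hits   %llu\n", (u_longlong_t)arcstat(ksp, "l2_hits"));
            printf("l2_misses %llu\n", (u_longlong_t)arcstat(ksp, "l2_misses"));
            kstat_close(kc);
            return (0);
    }

Build with "cc -o l2stat l2stat.c -lkstat" and run it before and after a
reboot; l2_size growing again from near zero is the warm-up in progress.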


From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Joseph Mocker
Sent: July 3, 2009 7:51
To: zfs-discuss@opensolaris.org
Subject: [zfs-discuss] how l2arc works?

Hello,

I was wondering if someone could point me to any information describing how the 
l2arc works?

I attached an SSD as a cache device to the root pool of a 2009.11 system. 
Although the cache has started filling up (per zpool iostat -v), when I do a 
reboot I still hear quite a bit of disk activity. I was hoping, in the best 
case, that most of the disk access needed to boot would have been served 
through the SSD.

Does the cache fill only on writes? And what is the cache replacement policy 
if/when the cache becomes full?

I did notice some information regarding limits on the speed at which the 
l2arc populates, and I think bug 6748030 discusses a turbo warmup.

Thanks for any information.

  --joe


Re: [zfs-discuss] ZFS write I/O stalls

2009-07-01 Thread Zhu, Lejun
Actually it seems to be 3/4:

dsl_pool.c (around line 391):

    zfs_write_limit_max = ptob(physmem) >> zfs_write_limit_shift;
    zfs_write_limit_inflated = MAX(zfs_write_limit_min,
        spa_get_asize(dp->dp_spa, zfs_write_limit_max));

While spa_get_asize is:

spa_misc.c (around line 1249):

    uint64_t
    spa_get_asize(spa_t *spa, uint64_t lsize)
    {
            /*
             * For now, the worst case is 512-byte RAID-Z blocks, in which
             * case the space requirement is exactly 2x; so just assume that.
             * Add to this the fact that we can have up to 3 DVAs per bp,
             * and we have to multiply by a total of 6x.
             */
            return (lsize * 6);
    }

Which will result in:
    zfs_write_limit_inflated = MAX((32 << 20), (ptob(physmem) >> 3) * 6);
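
With the default zfs_write_limit_shift of 3 and zfs_write_limit_min of
32MB, that works out to 6/8 = 3/4 of physical memory. A quick sketch of
the same arithmetic (defaults taken from the source above; memory given
in bytes here instead of ptob(physmem) for simplicity):

    /* Reproduce the inflated write limit computation.  Assumes the
     * default zfs_write_limit_shift = 3 and zfs_write_limit_min = 32MB. */
    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
            uint64_t physmem_bytes = 20ULL << 30;          /* e.g. 20GB of RAM */
            uint64_t write_limit_max = physmem_bytes >> 3; /* 1/8 of memory */
            uint64_t write_limit_min = 32ULL << 20;        /* 32MB floor */
            uint64_t inflated = write_limit_max * 6;       /* spa_get_asize() 6x */

            if (inflated < write_limit_min)
                    inflated = write_limit_min;

            /* For 20GB of RAM this prints 16106127360, i.e. 15GB = 3/4. */
            printf("zfs_write_limit_inflated = %llu bytes\n",
                (unsigned long long)inflated);
            return (0);
    }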

Bob Friesenhahn wrote:
> Even if I set zfs_write_limit_override to 8053063680 I am unable to
> achieve the massive writes that Solaris 10 (141415-03) sends to my
> drive array by default.
>
> When I read the blog entry at
> http://blogs.sun.com/roch/entry/the_new_zfs_write_throttle, I see this
> statement:
>
>   The new code keeps track of the amount of data accepted in a TXG and
>   the time it takes to sync. It dynamically adjusts that amount so that
>   each TXG sync takes about 5 seconds (txg_time variable). It also
>   clamps the limit to no more than 1/8th of physical memory.
>
> On my system I see that the "about 5 seconds" rule is being followed,
> but I see no sign of the limit being clamped to 1/8th of physical
> memory. There is no sign of clamping at all. The written data is
> captured and does take about 5 seconds to write (a good estimate).
>
> On my system with 20GB of RAM and the ARC memory limit set to 10GB
> (zfs:zfs_arc_max = 0x280000000), the maximum zfs_write_limit_override
> value I can set is on the order of 8053063680, yet this results in a
> much smaller amount of data being written per write cycle than the
> default Solaris 10 operation. The default operation is 24 seconds of
> no write activity followed by 5 seconds of write.
>
> On my system, 1/8 of memory would be 2.5GB. If I set the
> zfs_write_limit_override value to 2684354560 then it seems that about
> 1.2 seconds of data is captured for write. In this case I see 5
> seconds of no write followed by maybe a second of write.
>
> This causes me to believe that the algorithm is not implemented in
> Solaris 10 as described.
>
> Bob
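
For what it's worth, the throttle the blog post describes is just a
feedback loop: measure how long the last txg took to sync and scale the
next txg's limit so the sync time converges on txg_time, clamped by the
limits above. A rough sketch of that idea (this is not the actual
dsl_pool.c logic, only the behaviour the quoted text describes; the 5
second target and the 1/8-of-memory clamp come from that description):

    /* Feedback loop sketch: size the next txg so it syncs in ~5 seconds,
     * clamped to 1/8 of physical memory and a 32MB floor. */
    #include <stdint.h>

    #define TXG_TIME_MS     5000ULL         /* target ~5 second syncs */

    static uint64_t
    next_write_limit(uint64_t bytes_synced, uint64_t sync_time_ms,
        uint64_t physmem_bytes)
    {
            /* Scale the last txg's size by (target time / observed time). */
            uint64_t limit = bytes_synced * TXG_TIME_MS / (sync_time_ms + 1);
            uint64_t clamp = physmem_bytes >> 3;    /* 1/8 of memory */
            uint64_t floor = 32ULL << 20;           /* 32MB floor */

            if (limit > clamp)
                    limit = clamp;
            if (limit < floor)
                    limit = floor;
            return (limit);
    }

Whether the clamp actually kicks in on your system is exactly the
question you are raising, of course.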