Hi all,

On my oi_148a system I'm now in the process of "evacuating"
data from my "dcpool" (an iSCSI device with a ZFS pool inside),
which is hosted as a volume in my physical "pool" on hard disks
(6-disk raidz2). The "dcpool" was configured to dedup all data
inside it, while the backing volume "pool/dcpool" was compressed,
so as to separate the two processes. I have decided to scrap this
experiment, and now I'm copying the data back by reading files
from "dcpool" and writing them into compressed+deduped datasets
in "pool".

I often see two interesting conditions in this setup:

1) The process is rather slow (I think due to the dedup involved -
   even though, by my calculations, the whole DDT should fit in
   my 8 GB of RAM). Still, kernel processing time often peaks
   at close to 50%, and there is often quite a bit of idle time
   on top of that. Since this is a dual-core box, it seems
   plausible that some kernel-side cycle is limited to one core.

   Does anyone know whether the DDT tree walk, the search for
   available block ranges in metaslabs, or whatever other lengthy
   cycles there may be, are done sequentially (single-threaded)?
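
   For reference, the way I plan to dig into this myself is to
   watch per-CPU kernel time and sample kernel stacks while the
   copy runs - a rough sketch, assuming only the stock mpstat and
   lockstat tools (the 5-second interval and 30-second run are
   arbitrary):

# mpstat 5
  (does one CPU sit near 100% sys while the other stays idle?)
# lockstat -kIW -D 20 sleep 30
  (top 20 kernel functions by profiling samples over 30 seconds)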

   Below is my current DDT sizing. I still do not know which
   value to trust as the per-entry DDT size in RAM - the one
   returned by MDB or the one reported by ZDB (and if neither,
   what exactly are those "in core" and "on disk" values?
   I've asked before but got no replies...)

# zdb -D -e 1601233584937321596
DDT-sha256-zap-ditto: 68 entries, size 1807 on disk, 240 in core
DDT-sha256-zap-duplicate: 1970815 entries, size 1134 on disk, 183 in core
DDT-sha256-zap-unique: 4376290 entries, size 1158 on disk, 187 in core

dedup = 1.38, compress = 1.07, copies = 1.01, dedup * compress / copies = 1.46

# zdb -D -e dcpool
DDT-sha256-zap-ditto: 388 entries, size 380 on disk, 200 in core
DDT-sha256-zap-duplicate: 5421787 entries, size 311 on disk, 176 in core
DDT-sha256-zap-unique: 16841361 entries, size 284 on disk, 145 in core

dedup = 1.34, compress = 1.00, copies = 1.00, dedup * compress / copies = 1.34

# echo ::sizeof ddt_entry_t | mdb -k
sizeof (ddt_entry_t) = 0x178
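
   (Side note on the sizing question: if I read the zdb usage
   right, an extra -D prints the full DDT histogram by reference
   count, which I can post if it helps:)

# zdb -DD -e 1601233584937321596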

   Since I'm writing to "pool" (queried by its GUID above),
   my box's performance primarily depends on that pool's DDT -
   I guess. In the worst case that's about 6.35 million entries
   times 376 bytes (0x178) = roughly 2.4 GB, which is well below
   my computer's 8 GB of RAM (and fits the ARC metadata report
   below).

   However the "dcpool"'s current DDT is clearly big, about
   23mil entries * 376 bytes = 8.6Gb.
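
   Just to lay the arithmetic out for both candidate per-entry
   sizes (plain multiplication of the entry counts and sizes
   quoted above; 376 bytes is mdb's figure, the others are zdb's
   "in core" figures):

   pool,   376 B/entry:  (68 + 1970815 + 4376290) * 376        ~= 2.39e9 bytes ~= 2.4 GB
   pool,   zdb in-core:  68*240 + 1970815*183 + 4376290*187    ~= 1.18e9 bytes ~= 1.2 GB
   dcpool, 376 B/entry:  (388 + 5421787 + 16841361) * 376      ~= 8.37e9 bytes ~= 8.4 GB
   dcpool, zdb in-core:  388*200 + 5421787*176 + 16841361*145  ~= 3.40e9 bytes ~= 3.4 GB

   So depending on which per-entry size is the right one, the
   "dcpool" DDT either clearly does not fit in RAM or nearly
   does - which is exactly why I'd like to know which figure
   to trust.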

2) As seen below, the ARC (metadata included) currently takes up 3.7 GB.
   According to prstat, all of the global-zone processes use about 180 MB,
   and ZFS is the only filesystem on this box.
   So the second question is: what uses the rest of the RAM - roughly
   3.3 GB, once the ARC, the ~1 GB of free memory and the processes
   are subtracted from the 8 GB total?

   This picture repeats consistently on every boot, as long as I
   use the pool extensively for reading and/or writing. It seems
   to be some sort of kernel buffering or working memory (cached
   metaslab allocation tables, maybe?) that is not part of the
   ARC - yet it is even bigger than the ARC.

   What is it? Can it be controlled (so that it does not decrease
   performance when the ARC and/or DDT need more RAM), or at
   least queried?
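
   The closest I've come to querying it so far is the kernel
   memory summary and the kmem cache statistics from mdb - a
   rough sketch, with no ZFS-specific breakdown that I know of:

# echo ::memstat | mdb -k
  (page-count breakdown by consumer: Kernel, Anon, Page cache, Free, ...)
# echo ::kmastat | mdb -k | grep zio_
  (per-cache memory usage of the zio_buf_*/zio_data_buf_* caches,
   where ZFS I/O buffers live)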

# ./tuning/arc_summary.pl | egrep -v 'mdb|set zfs:' | head -18 | grep ": "; echo ::arc | mdb -k | grep meta_
         Physical RAM:  8183 MB
         Free Memory :  993 MB
         LotsFree:      127 MB
         Current Size:             3705 MB (arcsize)
         Target Size (Adaptive):   3705 MB (c)
         Min Size (Hard Limit):    3072 MB (zfs_arc_min)
         Max Size (Hard Limit):    6656 MB (zfs_arc_max)
         Most Recently Used Cache Size:          90%    3342 MB (p)
         Most Frequently Used Cache Size:         9%    362 MB (c-p)
arc_meta_used             =      2617 MB
arc_meta_limit            =      6144 MB
arc_meta_max              =      4787 MB

Thanks for any insights,
//Jim Klimov
