On Apr 27, 2011, at 9:26 PM, Edward Ned Harvey <opensolarisisdeadlongliveopensola...@nedharvey.com> wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Neil Perrin
>>
>> No, that's not true. The DDT is just like any other ZFS metadata and
>> can be split over the ARC, cache device (L2ARC), and the main pool
>> devices. An infrequently referenced DDT block will get evicted from
>> the ARC to the L2ARC, then evicted from the L2ARC.
>
> When somebody has their "baseline" system, and they're thinking about
> adding dedup and/or cache, I'd like to understand the effect of not
> having enough ram. Obviously the impact will be performance, but
> precisely...

Precision is only possible if you know what the data looks like...

> At bootup, I presume the arc & l2arc are all empty. So all the DDT
> entries reside in the pool. As the system reads things (anything,
> files, etc.) from the pool, it will populate the arc, and follow
> fill-rate policies to populate the l2arc over time. Every entry in
> l2arc requires 200 bytes of arc, regardless of what type of entry it
> is. (A DDT entry in l2arc consumes just as much arc memory as any
> other type of l2arc entry.) (Ummm... What's the point of that? Aren't
> DDT entries 270 bytes and ARC references 200 bytes?

No. DDT entries vary in size.

> Seems like a very questionable benefit to allow DDT entries to get
> evicted into L2ARC.) So the ram consumption caused by the presence of
> l2arc will initially be zero after bootup, and it will grow over time
> as the l2arc populates, up to a maximum which is determined linearly
> as 200 bytes * the number of entries that can fit in the l2arc. Of
> course that number varies based on the size of each entry and the size
> of the l2arc, but at least you can estimate and establish upper and
> lower bounds.

The upper and lower bounds vary by 256x (block sizes range from 512
bytes to 128 KB), unless you know what the data looks like more
precisely.
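To put rough numbers on those bounds, here is a back-of-the-envelope
sketch (mine, not from the thread, and not ZFS source): it takes the
~200 bytes of ARC per L2ARC entry quoted above and block sizes anywhere
between 512 bytes and 128 KB; the 100 GB cache device is just an example
figure.

    /*
     * Illustrative only: estimate the ARC memory consumed by L2ARC
     * headers for a cache device of a given size, assuming ~200 bytes
     * of ARC per L2ARC entry and block sizes from 512 B to 128 KB.
     */
    #include <stdio.h>

    #define HDR_BYTES   200             /* approx. ARC cost per L2ARC entry */
    #define MIN_BLOCK   512             /* smallest ZFS block size */
    #define MAX_BLOCK   (128 * 1024)    /* largest (default max) block size */

    int
    main(void)
    {
            long long l2arc_bytes = 100LL * 1024 * 1024 * 1024; /* 100 GB */

            long long min_entries = l2arc_bytes / MAX_BLOCK; /* all 128K */
            long long max_entries = l2arc_bytes / MIN_BLOCK; /* all 512B */

            printf("ARC overhead if all blocks are 128K: %lld MB\n",
                min_entries * HDR_BYTES / (1024 * 1024));
            printf("ARC overhead if all blocks are 512B: %lld MB\n",
                max_entries * HDR_BYTES / (1024 * 1024));
            /* The two bounds differ by MAX_BLOCK / MIN_BLOCK = 256x. */
            return (0);
    }

For the same 100 GB device this prints roughly 156 MB versus 40,000 MB
of ARC overhead, which is why no single rule of thumb works without
knowing what the data looks like.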
> So that's how the l2arc consumes system memory in arc. The penalty of
> insufficient ram, in conjunction with enabled L2ARC, is insufficient
> arc availability for other purposes - maybe the whole arc is consumed
> by l2arc entries, and so the arc doesn't have any room for other stuff
> like commonly used files.

I've never seen this.

> Worse yet, your arc consumption could be so large that PROCESSES don't
> fit in ram anymore. In this case, your processes get pushed out to
> swap space, which is really bad.

[For Solaris, illumos, and NexentaOS] This will not happen unless the
ARC size is at arc_min. At that point you are already close to a severe
memory shortfall.

> Correct me if I'm wrong, but the dedup sha256 checksum happens in
> addition to (not instead of) the fletcher2 integrity checksum.

You are mistaken. When dedup is enabled, the sha256 checksum is used for
the block instead of fletcher, not in addition to it.

> So after bootup, while the system is reading a bunch of data from the
> pool, all those reads are not populating the arc/l2arc with DDT
> entries. Reads are just populating the arc and l2arc with other stuff.

The L2ARC is populated by a separate thread that watches the
to-be-evicted list. The L2ARC fill rate is also throttled, so that under
a severe shortfall, blocks will be evicted without being placed in the
L2ARC.

> DDT entries don't get into the arc/l2arc until something tries to do a
> write.

No, the DDT entry contains the references to the actual data.

> When performing a write, dedup calculates the checksum of the block to
> be written, and then it needs to figure out if that's a duplicate of
> another block that's already on disk somewhere.
>
> So (I guess this part) there's probably a tree-structure (I'll use the
> subdirectories and files analogy even though I'm certain that's not
> technically correct) on disk.

Implemented as an AVL tree.

> You need to find the DDT entry, if it exists, for the block whose
> checksum is 1234ABCD. So you start by looking under the 1 directory,
> and from there look for the 2 subdirectory, and then the 3
> subdirectory, [...etc...] If you encounter "not found" at any step,
> then the DDT entry doesn't already exist and you decide to create a
> new one. But if you get all the way down to the C subdirectory and it
> contains a file named "D," then you have found a possible dedup hit -
> the checksum matched another block that's already on disk. Now the DDT
> entry is stored in ARC just like anything else you read from disk.

DDT is metadata, not data, so it is more constrained than data entries
in the ARC.

> So the point is - whenever you do a write, and the calculated DDT
> entry is not already in the ARC/L2ARC, the system will actually
> perform several small reads looking for the DDT entry before it
> finally knows whether the DDT entry exists. So the penalty of
> performing a write, with dedup enabled, and the relevant DDT entry not
> already in ARC/L2ARC, is very large. What originated as a single write
> quickly becomes several small reads plus a write, because the
> necessary DDT entry was not already available.
>
> The penalty of insufficient ram, in conjunction with dedup, is
> terrible write performance.
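To make that write-path penalty concrete, here is a toy sketch of the
control flow described above. It is illustrative only: toy_checksum(),
ddt_lookup_cache(), ddt_lookup_ondisk(), and the ddt_entry fields are
hypothetical stand-ins, not the real DDT interfaces.

    /*
     * Toy sketch of a dedup'd write (illustrative only, not ZFS code).
     * The point: when the DDT entry is not cached, a single logical
     * write turns into several small metadata reads plus the write.
     */
    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct ddt_entry {
            uint8_t  dde_checksum[32]; /* checksum of the block contents */
            uint64_t dde_refcnt;       /* block pointers sharing this data */
    } ddt_entry_t;

    /* Stand-in for sha256; dedup needs a strong cryptographic hash. */
    static void
    toy_checksum(const void *buf, size_t len, uint8_t out[32])
    {
            const uint8_t *p = buf;
            uint64_t h = 14695981039346656037ULL; /* FNV-1a offset basis */

            for (size_t i = 0; i < len; i++) {
                    h ^= p[i];
                    h *= 1099511628211ULL;        /* FNV-1a prime */
            }
            memset(out, 0, 32);
            memcpy(out, &h, sizeof (h));
    }

    /* Hypothetical: probe the in-core (ARC-resident) part of the DDT. */
    static ddt_entry_t *
    ddt_lookup_cache(const uint8_t cksum[32])
    {
            return (NULL);  /* pretend the entry isn't cached */
    }

    /* Hypothetical: walk the on-disk DDT; each probe is a small read. */
    static ddt_entry_t *
    ddt_lookup_ondisk(const uint8_t cksum[32])
    {
            printf("cache miss: small reads to search the on-disk DDT\n");
            return (NULL);  /* pretend no duplicate exists */
    }

    static void
    dedup_write(const void *buf, size_t len)
    {
            uint8_t cksum[32];
            ddt_entry_t *dde;

            toy_checksum(buf, len, cksum);  /* 1. checksum the block */

            if ((dde = ddt_lookup_cache(cksum)) != NULL) {
                    dde->dde_refcnt++;      /* 2. cached hit: cheap */
                    return;
            }
            if ((dde = ddt_lookup_ondisk(cksum)) != NULL) {
                    dde->dde_refcnt++;      /* 3. hit, after several reads */
                    return;
            }
            /* 4. No duplicate: write the data and insert a new entry. */
            printf("writing new block and inserting a new DDT entry\n");
    }

    int
    main(void)
    {
            char block[4096] = "example payload";

            dedup_write(block, sizeof (block));
            return (0);
    }

The number of reads behind step 3 depends on how much of the DDT is
cached; when little or none of it is, each uncached lookup touches
multiple metadata blocks, which is where the write penalty comes from.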