On Sun, May 04, 2025 at 05:54:34AM -0700, Nikhil Kumar Veldanda wrote: > 3. Resulting on-disk layouts for zstd > > ZSTD (nodict) — datum on‑disk layout > > +----------------------------------+ > | va_header (uint32) | > +----------------------------------+ > | va_tcinfo (uint32) | ← top two bits = 11 (extended) > +----------------------------------+ > | cmp_alg (uint8) | ← (ZSTD_NODICT) > +----------------------------------+ > | compressed bytes … | ← ZSTD frame > +----------------------------------+
This makes sense, yes. You are allocating an extra byte after va_tcinfo that serves as a redirection if the three bits dedicated to the compression method are set. > ZSTD(dict) — datum on‑disk layout > +----------------------------------+ > | va_header (uint32) | > +----------------------------------+ > | va_tcinfo (uint32) | ← top two bits = 11 (extended) > +----------------------------------+ > | cmp_alg (uint8) | ← (ZSTD_DICT) > +----------------------------------+ > | dict_id (uint32) | ← dictionary OID > +----------------------------------+ > | compressed bytes … | ← ZSTD frame > +----------------------------------+ > > I hope this updated design addresses your concerns. I would appreciate > any further feedback you may have. Thanks again for your guidance—it's > been very helpful. That makes sense as well structurally if we include a dictionary for each value. Not sure that we need that much space, for this purpose, though. We are going to need the extra byte anyway AFAIK, so better to start with that. I have been reading 0001 and I'm finding that the integration does not seem to fit much with the existing varatt_external, making the whole result slightly confusing. A simple thing: the last bit that we can use is in varatt_external's va_extinfo, where the patch is using VARATT_4BCE_MASK to track that we need to go beyond varatt_external to know what kind of compression information we should use. This is an important point, and it is not documented around varatt_external which still assumes that the last bit could be used for a compression method. With what you are doing in 0001 (or even 0002), this becomes wrong. Shouldn't we have a new struct portion in varattrib_4b's union for this purpose at least (I don't recall that we rely on varattrib_4b's size which would get larger with this extra byte for the new extended data with the three bits set for the compression are set in va_extinfo, correct me if I'm wrong here). -- Michael
signature.asc
Description: PGP signature