On Dec 20, 2010, at 2:16 PM, Jeff Johnson wrote:
>
> But remember there's a 16-32x overkill built into the original Bloom filter
> parameters,
> so its pretty clear a Bloom filter is gonna start "winning" soon, based
> solely on minimum
> size criteria, compared to a string representation.
>
> Next will be to measure specific per-package parameterized Bloom filters with
> the original e=0.0001
> false positive rate.
>
Here's some slightly adjusted output to carry around per-package Bloom filters:
[...@rhel6 lib]$ ./tbf
npkgs: 1221
Dirnames: 722712 bytes (20886 items)
with XZDIO: 62416 bytes
Bloom filter: false positives: 0.0001
Uncompressed: 62204 bytes
with XZDIO: 59612 bytes
A per-packge Bloom filter has 16 extra bytes of overhead:
"BFL3" magic as in "Bloom filter using lookup3"
n
m
k
followed by the bytes in the bit map.
About the only subtlety was ensuring that n >= 10 to avoid rpmbdParams()
from resetting n = 10000.
Since the XZ compressed output (including useful overhead) is already smaller
than
the simplistic (i.e. no markup overhead), I'll leave the rest to perl-URPM to
figure out.
Note that _ALL_ file paths and versionless Provides: can be represented
in Bloom filters, and will often (because of the assumption n >= 10)
fit within the per-package Bloom filters I just described.
Note also that 16b of per-packge overhead is overkill as well. One could
easily cut the overhead in half simply by changing to "B3" for magic and
writing uint16_t rather than uint32_t ... hang on ... here's that result:
[...@rhel6 lib]$ ./tbf
npkgs: 1221
Dirnames: 722712 bytes (20886 items)
with XZDIO: 62416 bytes
Bloom filter: false positives: 0.0001
Uncompressed: 62204 bytes
with XZDIO: 59328 bytes
Hmmm hardly enough to worry about (assuming I got the lib/tbf.c code right).
For that matter, compression could likely be _ENTIRELY_ skipped because the
"Uncompressed:"
no. of bytes (which is only the bitmaps, no per-package overhead) is approx.
the same as
the XZ compressed size (there is some modest saving because of XZ compression).
hth
73 de Jeff
______________________________________________________________________
RPM Package Manager http://rpm5.org
Developer Communication List [email protected]