Re: Bloom Filters

2018-10-11 Thread Jeff King
On Thu, Oct 11, 2018 at 08:33:58AM -0400, Derrick Stolee wrote: > > I don't know if this is a fruitful path at all or not. I was mostly just > > satisfying my own curiosity on the bitmap encoding question. But I'll > > post the patches, just to show my work. The first one is the same > > initial

Re: Bloom Filters

2018-10-11 Thread Derrick Stolee
On 10/9/2018 7:12 PM, Jeff King wrote: On Tue, Oct 09, 2018 at 05:14:50PM -0400, Jeff King wrote: Hmph. It really sounds like we could do better with a custom RLE solution. But that makes me feel like I'm missing something, because surely I can't invent something better than the state of the

Re: Bloom Filters

2018-10-09 Thread Jeff King
On Tue, Oct 09, 2018 at 05:14:50PM -0400, Jeff King wrote: > Hmph. It really sounds like we could do better with a custom RLE > solution. But that makes me feel like I'm missing something, because > surely I can't invent something better than the state of the art in a > simple thought experiment,

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Jeff King
On Tue, Oct 09, 2018 at 03:03:08PM -0400, Derrick Stolee wrote: > > I wonder if Roaring does better here. > > In these sparse cases, usually Roaring will organize the data as "array > chunks" which are simply lists of the values. The thing that makes this > still compressible is that we store

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Derrick Stolee
On 10/9/2018 2:46 PM, Jeff King wrote: On Tue, Oct 09, 2018 at 09:48:20AM -0400, Derrick Stolee wrote: [I snipped all of the parts about bloom filters that seemed entirely reasonable to me ;) ] Imagine we have that list. Is a bloom filter still the best data structure for each commit? At

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Jeff King
On Tue, Oct 09, 2018 at 09:48:20AM -0400, Derrick Stolee wrote: > [I snipped all of the parts about bloom filters that seemed entirely > reasonable to me ;) ] > > Imagine we have that list. Is a bloom filter still the best data > > structure for each commit? At the point that we have the

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Ævar Arnfjörð Bjarmason
On Tue, Oct 09 2018, Derrick Stolee wrote: > The filter needs to store every path that would be considered "not > TREESAME". It can't store wildcards, so you would need to evaluate the > wildcard and test all of those paths individually (not a good idea). If full paths are stored, yes, But

Re: Bloom filters for have/want negotiation

2015-09-13 Thread Michael Haggerty
On 09/12/2015 07:16 AM, Shawn Pearce wrote: > On Fri, Sep 11, 2015 at 2:13 PM, Michael Haggerty > wrote: >> I have been thinking about Wilhelm Bierbaum's talk at the last GitMerge >> conference [1] in which he describes a scheme for using Bloom filters to >> make the

Re: Bloom filters for have/want negotiation

2015-09-12 Thread Junio C Hamano
Shawn Pearce writes: > The worst case is due to a bug in the negotiation. With nothing > common, the client just goes on forever until it reaches roots > (something is wrong with MAX_IN_VAIN). We saw 56,318 have lines ... a > 2.6 MiB section. But smart HTTP gzips, so this

Re: Bloom filters for have/want negotiation

2015-09-12 Thread Shawn Pearce
On Sat, Sep 12, 2015 at 12:01 PM, Junio C Hamano wrote: > Shawn Pearce writes: > >> The worst case is due to a bug in the negotiation. With nothing >> common, the client just goes on forever until it reaches roots >> (something is wrong with MAX_IN_VAIN).

Re: Bloom filters for have/want negotiation

2015-09-11 Thread Junio C Hamano
Michael Haggerty writes: > 1. The server advertises the references that it has in the way that it > is currently done. > 2. The client advertises the objects that it has (or some subset of > them; see below) via a Bloom filter. > 3. The server sends the client the packfile

Re: Bloom filters for have/want negotiation

2015-09-11 Thread Shawn Pearce
On Fri, Sep 11, 2015 at 2:13 PM, Michael Haggerty wrote: > I have been thinking about Wilhelm Bierbaum's talk at the last GitMerge > conference [1] in which he describes a scheme for using Bloom filters to > make the initial reference advertisement less expensive. ... > But