Re: Bloom Filters

2018-10-11 Thread Jeff King
On Thu, Oct 11, 2018 at 08:33:58AM -0400, Derrick Stolee wrote: > > I don't know if this is a fruitful path at all or not. I was mostly just > > satisfying my own curiosity on the bitmap encoding question. But I'll > > post the patches, just to show my work. The first one is the same > > initial

Re: Bloom Filters

2018-10-11 Thread Derrick Stolee
On 10/9/2018 7:12 PM, Jeff King wrote: On Tue, Oct 09, 2018 at 05:14:50PM -0400, Jeff King wrote: Hmph. It really sounds like we could do better with a custom RLE solution. But that makes me feel like I'm missing something, because surely I can't invent something better than the state of the

Re: Bloom Filters

2018-10-09 Thread Jeff King
On Tue, Oct 09, 2018 at 05:14:50PM -0400, Jeff King wrote: > Hmph. It really sounds like we could do better with a custom RLE > solution. But that makes me feel like I'm missing something, because > surely I can't invent something better than the state of the art in a > simple thought experiment,

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Jeff King
On Tue, Oct 09, 2018 at 03:03:08PM -0400, Derrick Stolee wrote: > > I wonder if Roaring does better here. > > In these sparse cases, usually Roaring will organize the data as "array > chunks" which are simply lists of the values. The thing that makes this > still compressible is that we store

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Derrick Stolee
On 10/9/2018 2:46 PM, Jeff King wrote: On Tue, Oct 09, 2018 at 09:48:20AM -0400, Derrick Stolee wrote: [I snipped all of the parts about bloom filters that seemed entirely reasonable to me ;) ] Imagine we have that list. Is a bloom filter still the best data structure for each commit

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Jeff King
On Tue, Oct 09, 2018 at 09:48:20AM -0400, Derrick Stolee wrote: > [I snipped all of the parts about bloom filters that seemed entirely > reasonable to me ;) ] > > Imagine we have that list. Is a bloom filter still the best data > > structure for each commit? At the

Re: Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Ævar Arnfjörð Bjarmason
On Tue, Oct 09 2018, Derrick Stolee wrote: > The filter needs to store every path that would be considered "not > TREESAME". It can't store wildcards, so you would need to evaluate the > wildcard and test all of those paths individually (not a good idea). If full paths are stored, yes, But

Bloom Filters (was Re: We should add a "git gc --auto" after "git clone" due to commit graph)

2018-10-09 Thread Derrick Stolee
te the wildcard and test all of those paths individually (not a good idea). At least not by itself. If we imagine that the commit-graph also had an alphabetized list of every path in every tree, then it's easy: apply the glob to that list once to get a set of concrete paths, and then query the

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-15 Thread Johannes Schindelin
Hi Kuba, On Mon, 14 May 2018, Jakub Narebski wrote: > [... lots and lots of discussions...] > > All right. > > Here is my [allegedly] improved version, which assumes that we always > want to start from commit with maximum generation number (there may be > more than one such commit). > > Let's

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-14 Thread Jakub Narebski
clones, replace-objects, and grafts So the goal of current v1.0 phase is to introduce generation numbers. use them for better performance ("low hanging fruit"), ensure that it is automatic and safe -- thus useable for an ordinary user. > > Commit-graph v1.1: > * Place commit-grap

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-14 Thread Derrick Stolee
correct behavior with shallow clones, replace-objects, and grafts Commit-graph v1.1: * Place commit-graph storage in the_repository * 'git tag --merged' use generation numbers * 'git log --graph' use generation numbers Commit-graph v1.X: * Protocol v2 verb for sending commit-graph * Bloom filter

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-12 Thread Jakub Narebski
t data (and possibly also indexes data) dominates, and the cost of using reachability indexes is negligible. [3]: https://colab.research.google.com/drive/1V-U7_slu5Z3s5iEEMFKhLXtaxSu5xyzg On the gripping hand the construction of algorithms in those future steps would be, I think, affected by what

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-07 Thread Derrick Stolee
s already present in VSTS (Visual Studio Team Services - the server counterpart of GVFS: Git Virtual File System) at Microsoft: AV> - VSTS adds bloom filters to know which paths have changed on the commit AV> - tree-same check in the bloom filter is fast; speeds up file history checks AV&g

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-05 Thread Jakub Narebski
ld skip straight to the merge base > and keep walking. Another solution that I thought of is to use the same mechanism that commit-graph file uses for storing merges: store Bloom filters for first two parents, and if there are more parents (octopus merge), store Bloom filters for the remain

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-04 Thread Ævar Arnfjörð Bjarmason
On Fri, May 04 2018, Jakub Narebski wrote: (Just off-the cuff here and I'm surely about to be corrected by Derrick...) > * What to do about merge commits, and octopus merges in particular? > Should Bloom filter be stored for each of the parents? How to ensure > fast access then

Re: [RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-04 Thread Ævar Arnfjörð Bjarmason
On Fri, May 04 2018, Jakub Narebski wrote: > With early parts of commit-graph feature (ds/commit-graph and > ds/lazy-load-trees) close to being merged into "master", see > https://public-inbox.org/git/xmqq4ljtz87g@gitster-ct.c.googlers.com/ > I think it would be good idea to think what other

[RFC] Other chunks for commit-graph, part 1 - Bloom filters, topo order, etc.

2018-05-04 Thread Jakub Narebski
or directory was changed in given commit, for queries such as "git log -- " or "git blame ". This is something that according to "Git Merge contributor summit notes" [2] is already present in VSTS (Visual Studio Team Services - the server counterpart of GVFS: Git Virtual Fi

Re: Bloom filters for have/want negotiation

2015-09-13 Thread Michael Haggerty
On 09/12/2015 07:16 AM, Shawn Pearce wrote: > On Fri, Sep 11, 2015 at 2:13 PM, Michael Haggerty <mhag...@alum.mit.edu> > wrote: >> I have been thinking about Wilhelm Bierbaum's talk at the last GitMerge >> conference [1] in which he describes a scheme for using

Re: Bloom filters for have/want negotiation

2015-09-12 Thread Junio C Hamano
Shawn Pearce writes: > The worst case is due to a bug in the negotiation. With nothing > common, the client just goes on forever until it reaches roots > (something is wrong with MAX_IN_VAIN). We saw 56,318 have lines ... a > 2.6 MiB section. But smart HTTP gzips, so this

Re: Bloom filters for have/want negotiation

2015-09-12 Thread Shawn Pearce
On Sat, Sep 12, 2015 at 12:01 PM, Junio C Hamano wrote: > Shawn Pearce writes: > >> The worst case is due to a bug in the negotiation. With nothing >> common, the client just goes on forever until it reaches roots >> (something is wrong with MAX_IN_VAIN).

Bloom filters for have/want negotiation

2015-09-11 Thread Michael Haggerty
I have been thinking about Wilhelm Bierbaum's talk at the last GitMerge conference [1] in which he describes a scheme for using Bloom filters to make the initial reference advertisement less expensive. In his scheme (if I understand correctly) the client starts off the conversation by passing

Re: Bloom filters for have/want negotiation

2015-09-11 Thread Junio C Hamano
Michael Haggerty writes: > 1. The server advertises the references that it has in the way that it > is currently done. > 2. The client advertises the objects that it has (or some subset of > them; see below) via a Bloom filter. > 3. The server sends the client the packfile

Re: Bloom filters for have/want negotiation

2015-09-11 Thread Shawn Pearce
On Fri, Sep 11, 2015 at 2:13 PM, Michael Haggerty <mhag...@alum.mit.edu> wrote: > I have been thinking about Wilhelm Bierbaum's talk at the last GitMerge > conference [1] in which he describes a scheme for using Bloom filters to > make the initial reference advertisement