Re: [PATCH 2/6] commit: add generation number to struct commmit

2018-04-03 Thread Jeff King
On Wed, Apr 04, 2018 at 12:17:06AM +0100, Ramsay Jones wrote:

> >> Is there any reason to believe this would be too small of a value in the
> >> future?  Or is a 32 bit unsigned good enough?
> > 
> > The linux kernel took ~10 years to produce 500k commits. Even assuming
> > those were all linear (and they're not), that gives us ~80,000 years of
> > leeway. So even if the pace of development speeds up or we have a
> > quicker project, it still seems we have a pretty reasonable safety
> > margin.
> 
> I didn't read the patches closely, but isn't it ~20,000 years?
> 
> Given that '#define GENERATION_NUMBER_MAX 0x3FFF', that is. ;-)

What, I'm supposed to read the patches before responding? Heresy.

-Peff


Re: [PATCH 2/6] commit: add generation number to struct commmit

2018-04-03 Thread Ramsay Jones


On 03/04/18 19:28, Jeff King wrote:
> On Tue, Apr 03, 2018 at 11:05:36AM -0700, Brandon Williams wrote:
> 
>> On 04/03, Derrick Stolee wrote:
>>> The generation number of a commit is defined recursively as follows:
>>>
>>> * If a commit A has no parents, then the generation number of A is one.
>>> * If a commit A has parents, then the generation number of A is one
>>>   more than the maximum generation number among the parents of A.
>>>
>>> Add a uint32_t generation field to struct commit so we can pass this
>>
>> Is there any reason to believe this would be too small of a value in the
>> future?  Or is a 32 bit unsigned good enough?
> 
> The linux kernel took ~10 years to produce 500k commits. Even assuming
> those were all linear (and they're not), that gives us ~80,000 years of
> leeway. So even if the pace of development speeds up or we have a
> quicker project, it still seems we have a pretty reasonable safety
> margin.

I didn't read the patches closely, but isn't it ~20,000 years?

Given that '#define GENERATION_NUMBER_MAX 0x3FFF', that is. ;-)

ATB,
Ramsay Jones




Re: [PATCH 2/6] commit: add generation number to struct commmit

2018-04-03 Thread Stefan Beller
On Tue, Apr 3, 2018 at 11:28 AM, Jeff King  wrote:
> On Tue, Apr 03, 2018 at 11:05:36AM -0700, Brandon Williams wrote:
>
>> On 04/03, Derrick Stolee wrote:
>> > The generation number of a commit is defined recursively as follows:
>> >
>> > * If a commit A has no parents, then the generation number of A is one.
>> > * If a commit A has parents, then the generation number of A is one
>> >   more than the maximum generation number among the parents of A.
>> >
>> > Add a uint32_t generation field to struct commit so we can pass this
>>
>> Is there any reason to believe this would be too small of a value in the
>> future?  Or is a 32 bit unsigned good enough?
>
> The linux kernel took ~10 years to produce 500k commits. Even assuming
> those were all linear (and they're not),

... which you meant in terms of DAG, where a linear history is the worst case
for generation numbers.

I first read it the other way round, as the best case w.r.t. timing

~/linux$ git log --oneline |wc -l
721223
$ git log --oneline --since 2012 |wc -l
421853
$ git log --oneline --since 2011 |wc -l
477155

The number of commits is growing exponentially, though the exponential
part is very small and the YoY growth can be estimated using linear
interpolation.

In linux, the release is a natural synchronization point IIUC as well
as on a regular schedule. So an interesting question to ask there would
be whether the delta in generation number goes up over time, or if the
DAG just gets wider (=more parallel)

> that gives us ~80,000 years of
> leeway. So even if the pace of development speeds up or we have a
> quicker project, it still seems we have a pretty reasonable safety
> margin.

Thanks for the estimate.
Stefan


Re: [PATCH 2/6] commit: add generation number to struct commmit

2018-04-03 Thread Brandon Williams
On 04/03, Jeff King wrote:
> On Tue, Apr 03, 2018 at 11:05:36AM -0700, Brandon Williams wrote:
> 
> > On 04/03, Derrick Stolee wrote:
> > > The generation number of a commit is defined recursively as follows:
> > > 
> > > * If a commit A has no parents, then the generation number of A is one.
> > > * If a commit A has parents, then the generation number of A is one
> > >   more than the maximum generation number among the parents of A.
> > > 
> > > Add a uint32_t generation field to struct commit so we can pass this
> > 
> > Is there any reason to believe this would be too small of a value in the
> > future?  Or is a 32 bit unsigned good enough?
> 
> The linux kernel took ~10 years to produce 500k commits. Even assuming
> those were all linear (and they're not), that gives us ~80,000 years of
> leeway. So even if the pace of development speeds up or we have a
> quicker project, it still seems we have a pretty reasonable safety
> margin.
> 
> -Peff

I figured as much, but just wanted to check since the windows folks
seems to produce commits pretty quickly.

-- 
Brandon Williams


Re: [PATCH 2/6] commit: add generation number to struct commmit

2018-04-03 Thread Derrick Stolee

On 4/3/2018 2:28 PM, Jeff King wrote:

On Tue, Apr 03, 2018 at 11:05:36AM -0700, Brandon Williams wrote:


On 04/03, Derrick Stolee wrote:

The generation number of a commit is defined recursively as follows:

* If a commit A has no parents, then the generation number of A is one.
* If a commit A has parents, then the generation number of A is one
   more than the maximum generation number among the parents of A.

Add a uint32_t generation field to struct commit so we can pass this

Is there any reason to believe this would be too small of a value in the
future?  Or is a 32 bit unsigned good enough?

The linux kernel took ~10 years to produce 500k commits. Even assuming
those were all linear (and they're not), that gives us ~80,000 years of
leeway. So even if the pace of development speeds up or we have a
quicker project, it still seems we have a pretty reasonable safety
margin.


That, and larger projects do not have linear histories. Despite having 
almost 2 million reachable commits, the Windows repository has maximum 
generation number ~100,000.


Thanks,
-Stolee


Re: [PATCH 2/6] commit: add generation number to struct commmit

2018-04-03 Thread Jeff King
On Tue, Apr 03, 2018 at 11:05:36AM -0700, Brandon Williams wrote:

> On 04/03, Derrick Stolee wrote:
> > The generation number of a commit is defined recursively as follows:
> > 
> > * If a commit A has no parents, then the generation number of A is one.
> > * If a commit A has parents, then the generation number of A is one
> >   more than the maximum generation number among the parents of A.
> > 
> > Add a uint32_t generation field to struct commit so we can pass this
> 
> Is there any reason to believe this would be too small of a value in the
> future?  Or is a 32 bit unsigned good enough?

The linux kernel took ~10 years to produce 500k commits. Even assuming
those were all linear (and they're not), that gives us ~80,000 years of
leeway. So even if the pace of development speeds up or we have a
quicker project, it still seems we have a pretty reasonable safety
margin.

-Peff


Re: [PATCH 2/6] commit: add generation number to struct commmit

2018-04-03 Thread Jonathan Tan
On Tue,  3 Apr 2018 12:51:39 -0400
Derrick Stolee  wrote:

> The generation number of a commit is defined recursively as follows:
> 
> * If a commit A has no parents, then the generation number of A is one.
> * If a commit A has parents, then the generation number of A is one
>   more than the maximum generation number among the parents of A.
> 
> Add a uint32_t generation field to struct commit so we can pass this
> information to revision walks. We use two special values to signal
> the generation number is invalid:
> 
> GENERATION_NUMBER_UNDEF 0x
> GENERATION_NUMBER_NONE 0
> 
> The first (_UNDEF) means the generation number has not been loaded or
> computed. The second (_NONE) means the generation number was loaded
> from a commit graph file that was stored before generation numbers
> were computed.
> 
> Signed-off-by: Derrick Stolee 

This looks straightforward and correct, thanks. I think some of the
description above should appear as code comments.

> +#define GENERATION_NUMBER_UNDEF 0x
> +#define GENERATION_NUMBER_NONE 0

I would include the description above here as documentation, and would
replace "was stored before generation numbers were computed" by "was
written by a version of Git that did not support generation numbers".


Re: [PATCH 2/6] commit: add generation number to struct commmit

2018-04-03 Thread Brandon Williams
On 04/03, Derrick Stolee wrote:
> The generation number of a commit is defined recursively as follows:
> 
> * If a commit A has no parents, then the generation number of A is one.
> * If a commit A has parents, then the generation number of A is one
>   more than the maximum generation number among the parents of A.
> 
> Add a uint32_t generation field to struct commit so we can pass this

Is there any reason to believe this would be too small of a value in the
future?  Or is a 32 bit unsigned good enough?

> information to revision walks. We use two special values to signal
> the generation number is invalid:
> 
> GENERATION_NUMBER_UNDEF 0x
> GENERATION_NUMBER_NONE 0
> 
> The first (_UNDEF) means the generation number has not been loaded or
> computed. The second (_NONE) means the generation number was loaded
> from a commit graph file that was stored before generation numbers
> were computed.
> 
> Signed-off-by: Derrick Stolee 
> ---
>  alloc.c| 1 +
>  commit-graph.c | 2 ++
>  commit.h   | 3 +++
>  3 files changed, 6 insertions(+)
> 
> diff --git a/alloc.c b/alloc.c
> index cf4f8b61e1..1a62e85ac3 100644
> --- a/alloc.c
> +++ b/alloc.c
> @@ -94,6 +94,7 @@ void *alloc_commit_node(void)
>   c->object.type = OBJ_COMMIT;
>   c->index = alloc_commit_index();
>   c->graph_pos = COMMIT_NOT_FROM_GRAPH;
> + c->generation = GENERATION_NUMBER_UNDEF;
>   return c;
>  }
>  
> diff --git a/commit-graph.c b/commit-graph.c
> index 1fc63d541b..d24b947525 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -264,6 +264,8 @@ static int fill_commit_in_graph(struct commit *item, 
> struct commit_graph *g, uin
>   date_low = get_be32(commit_data + g->hash_len + 12);
>   item->date = (timestamp_t)((date_high << 32) | date_low);
>  
> + item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
> +
>   pptr = &item->parents;
>  
>   edge_value = get_be32(commit_data + g->hash_len);
> diff --git a/commit.h b/commit.h
> index e57ae4b583..3cadd386f3 100644
> --- a/commit.h
> +++ b/commit.h
> @@ -10,6 +10,8 @@
>  #include "pretty.h"
>  
>  #define COMMIT_NOT_FROM_GRAPH 0x
> +#define GENERATION_NUMBER_UNDEF 0x
> +#define GENERATION_NUMBER_NONE 0
>  
>  struct commit_list {
>   struct commit *item;
> @@ -24,6 +26,7 @@ struct commit {
>   struct commit_list *parents;
>   struct tree *tree;
>   uint32_t graph_pos;
> + uint32_t generation;
>  };
>  
>  extern int save_commit_buffer;
> -- 
> 2.17.0.rc0
> 

-- 
Brandon Williams