Re: Premerging topics (was: [RFD] annnotating a pair of commit objects?)

2013-04-23 Thread Johan Herland
On Wed, Apr 10, 2013 at 10:35 PM, Antoine Pelisse apeli...@gmail.com wrote:
 The goal is to propose a structure for storing and pre-merging pairs of 
 commits.

 Data-structure
 ==

 We could use a note ref to store the pre-merge information. Each commit
 would be annotated with a blob containing the list of pre-merges (one
 sha1 per line with sha1 pointing to a merge commit). The commit on the
 other side of a merge would also be annotated.
 The choice of the refname could be done like we do with notes:
 - Have a default value
 - Have a default value configured in config
 - Use a specific value when merging/creating the pre-merges

 Here are my concerns:

 Pros
 
 1. Notes allow dynamic annotation a commit
 2. If we manage to fix 4, we can easily download all pre-merges from a
 remote host by fetching the ref (or clean by deleting the ref).
 3. Conflicts on pre-merge notes could probably be resolved by concatenation.

 Cons
 
 4. Checking connectivity means opening the blob and parsing it

Can you solve this problem with a tree object, instead of inventing a
specially-formatted blob?

I.e. given pre-merge info P for a merge between commits A and B: A is
annotated by a tree object that contains all pre-merges where A is
involved. Each entry in the tree object has a filename and a blob
SHA1; we store other commit involved in this pre-merge (B) in the
filename, and the pre-merge data (P) in the blob SHA1.

Conversely, B is annotated with a tree object containing all pre-merge
entries concerning B, and within there, is an entry called A, pointing
to the same P.

You still need special handling in that you have to know that
refs/notes/pre-merge (or whatever it's called) stores tree objects
instead of blobs, and how to interpret the contents of those trees,
but you'll need such knowledge in any case.

A nice side-effect of using tree objects to store pre-merge entries,
is that you can do a tree-level merge of them, and it'll do the Right
Thing (assuming two-way merge with no common history), i.e. perform a
union merge, and leave you to handle conflicts of individual
pre-merges (i.e. you'll only get conflicts when both sides offer
different pre-merge data for A + B).

 5. Regular notes and pre-merge notes have to be handled separately
 because of 4.

Sort of, but you get that for any automated usage of notes for a
specific purpose. Just look at the notes-cache mechanism in
notes-cache.{h,c}. That's another example of functionality layered on
top of notes that makes assumptions on how its notes trees are
structured.

 I'm hoping we can keep the pros and avoid the cons, but I'm kind of
 stuck here. Help would be really appreciated (or maybe this is a totally
 wrong direction, and I would also like to know ;)

 Merging (Using what we saved)
 =
 The goal is to merge branches J and B using existing pre-merges.

 E0. Create an empty stack S
 E1. Create set of commits 'J..B' and 'B..J' (that is probably already done)
 E2. For each commit C in smallest(J..B, B..J), execute E3
 E3. For each premerge P in notes-premerge(C), execute E4
 E4. If one of both parents of P belongs to biggest(J..B, B..J), stack P in S

I don't think _both_ parents of P can belong to biggest(J..B, B..J).
AFAICS J..B and B..J must always be completely disjoint sets of
commits (this is easier to see when you consider that A..B is
equivalent to B ^A for any commits A and B), and in E2/E3, you have
already made sure that P has a parent in one of them. There is then no
way that the same parent can occur in the other set, so you have at
most _one_ parent in the other set.

 E5. Merge J and B using all pre-merges from S

This is where things get complicated... :)

First there is one important thing that I have not seen a decision on
yet (maybe this was discussed in an earlier thread?):

Given pre-merge data P for commit A and B, does P encode the merge of
the entire history up to A with the entire history up to B, or does it
only encode the merging of the changes introduced in A with the
changes introduced in B? In other words, are we merging snapshots or
diffs?

In the former case, we only need to find the most recent commits A and
B on their respective branches - for which P exists - and then execute
that one P (or at most two Ps, if there is a criss-cross pre-merge
situation). In the other case, however, we need to enumerate all Ps
that apply to the two branches, and find a way to execute them
chronologically, dealing with missing Ps and conflicting Ps along the
way. IMHO, only the former approach seems practically solvable.

So you do not need to enumerate all Ps in J..B vs. B..J, you only need
to find the _most_ _recent_ P, and execute that one.

 Let's consider that |J..B| is smaller than |B..J|.
 E0 is executed only once
 E1 is O(|J..B| + |B..J|)
 E2 is O(|J..B|)
 E3 is O(|J..B| x the average number of pre-merge per commits P_avg)
 E4 is executed for each parent (let's say it's two/constant, after 

Re: Premerging topics (was: [RFD] annnotating a pair of commit objects?)

2013-04-23 Thread Antoine Pelisse
On Tue, Apr 23, 2013 at 8:34 AM, Johan Herland jo...@herland.net wrote:
 On Wed, Apr 10, 2013 at 10:35 PM, Antoine Pelisse apeli...@gmail.com wrote:
 Data-structure
 ==
 We could use a note ref to store the pre-merge information. Each commit
 would be annotated with a blob containing the list of pre-merges (one
 sha1 per line with sha1 pointing to a merge commit). The commit on the
 other side of a merge would also be annotated.
 The choice of the refname could be done like we do with notes:
 - Have a default value
 - Have a default value configured in config
 - Use a specific value when merging/creating the pre-merges
[snipped]
 Cons
 
 4. Checking connectivity means opening the blob and parsing it

 Can you solve this problem with a tree object, instead of inventing a
 specially-formatted blob?

That looks like a good idea.

 I.e. given pre-merge info P for a merge between commits A and B: A is
 annotated by a tree object that contains all pre-merges where A is
 involved. Each entry in the tree object has a filename and a blob
 SHA1; we store other commit involved in this pre-merge (B) in the
 filename, and the pre-merge data (P) in the blob SHA1.

But P is a commit(/merge with two parents), not a blob. Can we have trees
pointing to commits instead of blobs ?

 A nice side-effect of using tree objects to store pre-merge entries,
 is that you can do a tree-level merge of them, and it'll do the Right
 Thing (assuming two-way merge with no common history), i.e. perform a
 union merge, and leave you to handle conflicts of individual
 pre-merges (i.e. you'll only get conflicts when both sides offer
 different pre-merge data for A + B).

I'm definitely not sure what you mean here, especially with tree-level
merge. Also I think we could be doing three-way merges. But I'm
no merge-strategies expert, so I'm kind of confused.

 5. Regular notes and pre-merge notes have to be handled separately
 because of 4.

 Sort of, but you get that for any automated usage of notes for a
 specific purpose. Just look at the notes-cache mechanism in
 notes-cache.{h,c}. That's another example of functionality layered on
 top of notes that makes assumptions on how its notes trees are
 structured.

Thanks, I will have a look at it.

 The goal is to merge branches J and B using existing pre-merges.

 E0. Create an empty stack S
 E1. Create set of commits 'J..B' and 'B..J' (that is probably already done)
 E2. For each commit C in smallest(J..B, B..J), execute E3
 E3. For each premerge P in notes-premerge(C), execute E4
 E4. If one of both parents of P belongs to biggest(J..B, B..J), stack P in S

 I don't think _both_ parents of P can belong to biggest(J..B, B..J).
 AFAICS J..B and B..J must always be completely disjoint sets of
 commits (this is easier to see when you consider that A..B is
 equivalent to B ^A for any commits A and B), and in E2/E3, you have
 already made sure that P has a parent in one of them. There is then no
 way that the same parent can occur in the other set, so you have at
 most _one_ parent in the other set.

I agree with that. After step E3, one of the parent belongs to one of
the two branches. Step E4 makes sure the other parent belongs to the other
branch (and not another unrelated branch).

 E5. Merge J and B using all pre-merges from S

 This is where things get complicated... :)

 First there is one important thing that I have not seen a decision on
 yet (maybe this was discussed in an earlier thread?):

 Given pre-merge data P for commit A and B, does P encode the merge of
 the entire history up to A with the entire history up to B, or does it
 only encode the merging of the changes introduced in A with the
 changes introduced in B? In other words, are we merging snapshots or
 diffs?

 In the former case, we only need to find the most recent commits A and
 B on their respective branches - for which P exists - and then execute
 that one P (or at most two Ps, if there is a criss-cross pre-merge
 situation). In the other case, however, we need to enumerate all Ps
 that apply to the two branches, and find a way to execute them
 chronologically, dealing with missing Ps and conflicting Ps along the
 way. IMHO, only the former approach seems practically solvable.

 So you do not need to enumerate all Ps in J..B vs. B..J, you only need
 to find the _most_ _recent_ P, and execute that one.

Indeed, we only need to know the most recent. That's a good point.

   B1   B2B3
O---o---o-o
|  / \  \
|P1  P2 M
|/| /
 \__o_o___o
   J1   J2  J3

In this use-case, we can use P1 to compute P2, and then we only need P2
to compute the real merge M. The idea would be to use P1 as an implicit
third parent to the pre-merge P2, and then P2 as an implicit third
parent to the real merge M.

 $ git pre-merge topicA topicB topicC

 to find, resolve and store all interactions between the topics.

 Let's leave out octopus merges for now, and only 

Re: Premerging topics (was: [RFD] annnotating a pair of commit objects?)

2013-04-23 Thread Johan Herland
On Tue, Apr 23, 2013 at 4:51 PM, Antoine Pelisse apeli...@gmail.com wrote:
 On Tue, Apr 23, 2013 at 8:34 AM, Johan Herland jo...@herland.net wrote:
 On Wed, Apr 10, 2013 at 10:35 PM, Antoine Pelisse apeli...@gmail.com wrote:
 Data-structure
 ==
 We could use a note ref to store the pre-merge information. Each commit
 would be annotated with a blob containing the list of pre-merges (one
 sha1 per line with sha1 pointing to a merge commit). The commit on the
 other side of a merge would also be annotated.
 The choice of the refname could be done like we do with notes:
 - Have a default value
 - Have a default value configured in config
 - Use a specific value when merging/creating the pre-merges
[snipped]
 Cons
 
 4. Checking connectivity means opening the blob and parsing it

 Can you solve this problem with a tree object, instead of inventing a
 specially-formatted blob?

 That looks like a good idea.

 I.e. given pre-merge info P for a merge between commits A and B: A is
 annotated by a tree object that contains all pre-merges where A is
 involved. Each entry in the tree object has a filename and a blob
 SHA1; we store other commit involved in this pre-merge (B) in the
 filename, and the pre-merge data (P) in the blob SHA1.

 But P is a commit(/merge with two parents), not a blob. Can we have trees
 pointing to commits instead of blobs ?

Sort of. We do so when recording submodules in regular git trees. The
submodule is recorded as a tree entry whose type is commit, name is the
subdir containing the submodule, and SHA1 is the commit ID recorded for
that submodule at this point in history. So it definitely _can_ be done.
That said, I'm not sure whether it's actually a good idea in this case...

I apologize for not having followed the initial discussion. I did not
know that the pre-merge information would be stored as commits.
I figured since it would only contain a partial merge resolution, it
could be represented as something like a tree object containing (only)
the pre-resolved paths...

 A nice side-effect of using tree objects to store pre-merge entries,
 is that you can do a tree-level merge of them, and it'll do the Right
 Thing (assuming two-way merge with no common history), i.e. perform a
 union merge, and leave you to handle conflicts of individual
 pre-merges (i.e. you'll only get conflicts when both sides offer
 different pre-merge data for A + B).

 I'm definitely not sure what you mean here, especially with tree-level
 merge. Also I think we could be doing three-way merges. But I'm
 no merge-strategies expert, so I'm kind of confused.

I was simply thinking of the simple two-way merge that can be done between
two independent trees:
- All paths that exist in only one tree, exist unchanged in the result
- All paths that are identical in both trees exist unchanged in the result
- All paths that exist in both trees, but are different, yield a conflict

In your pre-merge domain, this would become (for given commit A, B, C):
- If one tree records a pre-merge for (A,B) and the other for (A,C), then
  both (A,B) and (A,C) will exist in the result
- If both trees records the _same_ pre-merge resolution for (A,B), then
  that will also exist in the result
- If one tree records one way of pre-merging (A,B), and the other tree
  records a _different_ way of pre-merging (A,B), then there will be a
  conflict that needs resolving.

 5. Regular notes and pre-merge notes have to be handled separately
 because of 4.

 Sort of, but you get that for any automated usage of notes for a
 specific purpose. Just look at the notes-cache mechanism in
 notes-cache.{h,c}. That's another example of functionality layered on
 top of notes that makes assumptions on how its notes trees are
 structured.

 Thanks, I will have a look at it.

 The goal is to merge branches J and B using existing pre-merges.

 E0. Create an empty stack S
 E1. Create set of commits 'J..B' and 'B..J' (that is probably already done)
 E2. For each commit C in smallest(J..B, B..J), execute E3
 E3. For each premerge P in notes-premerge(C), execute E4
 E4. If one of both parents of P belongs to biggest(J..B, B..J), stack P in S

 I don't think _both_ parents of P can belong to biggest(J..B, B..J).
 AFAICS J..B and B..J must always be completely disjoint sets of
 commits (this is easier to see when you consider that A..B is
 equivalent to B ^A for any commits A and B), and in E2/E3, you have
 already made sure that P has a parent in one of them. There is then no
 way that the same parent can occur in the other set, so you have at
 most _one_ parent in the other set.

 I agree with that. After step E3, one of the parent belongs to one of
 the two branches. Step E4 makes sure the other parent belongs to the other
 branch (and not another unrelated branch).

 E5. Merge J and B using all pre-merges from S

 This is where things get complicated... :)

 First there is one important thing that I have not seen a decision on
 yet (maybe this was 

Re: Premerging topics (was: [RFD] annnotating a pair of commit objects?)

2013-04-22 Thread Antoine Pelisse
Any comment on that ? I think anyone using a Topic Workflow could
use that feature and that it would be a nice addition to the project.
Maybe I'm totally wrong in the proposal below (please tell me !), but
there are some unanswered question that prevents me from starting (and
I'd really like this to be discussed before actually starting).

On Wed, Apr 10, 2013 at 10:35 PM, Antoine Pelisse apeli...@gmail.com wrote:

 The goal is to propose a structure for storing and pre-merging pairs of 
 commits.

 Data-structure
 ==

 We could use a note ref to store the pre-merge information. Each commit
 would be annotated with a blob containing the list of pre-merges (one
 sha1 per line with sha1 pointing to a merge commit). The commit on the
 other side of a merge would also be annotated.

 The choice of the refname could be done like we do with notes:
 - Have a default value
 - Have a default value configured in config
 - Use a specific value when merging/creating the pre-merges

 Here are my concerns:

 Pros
 
 1. Notes allow dynamic annotation a commit
 2. If we manage to fix 4, we can easily download all pre-merges from a
 remote host by fetching the ref (or clean by deleting the ref).
 3. Conflicts on pre-merge notes could probably be resolved by concatenation.

 Cons
 
 4. Checking connectivity means opening the blob and parsing it
 5. Regular notes and pre-merge notes have to be handled separately
 because of 4.

 I'm hoping we can keep the pros and avoid the cons, but I'm kind of
 stuck here. Help would be really appreciated (or maybe this is a totally
 wrong direction, and I would also like to know ;)

 Merging (Using what we saved)
 =
 The goal is to merge branches J and B using existing pre-merges.

 E0. Create an empty stack S
 E1. Create set of commits 'J..B' and 'B..J' (that is probably already done)
 E2. For each commit C in smallest(J..B, B..J), execute E3
 E3. For each premerge P in notes-premerge(C), execute E4
 E4. If one of both parents of P belongs to biggest(J..B, B..J), stack P in S
 E5. Merge J and B using all pre-merges from S

 Let's consider that |J..B| is smaller than |B..J|.
 E0 is executed only once
 E1 is O(|J..B| + |B..J|)
 E2 is O(|J..B|)
 E3 is O(|J..B| x the average number of pre-merge per commits P_avg)
 E4 is executed for each parent (let's say it's two/constant, after all
 the topic is pair of commits), so still O(|J..B| x P_avg)
 E5 I don't know (how it can be done, and what would be the resulting
 time complexity)

 So the time cost for steps E0 to E4 is O(|J..B| + |B..J| x P_avg)

 Tools (Save the pre-merges)
 ===

 Of course we need several tools to maintain the list of premerges, and
 to easily compute them. For example, it would be nice to be able to do
 something like:

 $ git pre-merge topicA topicB topicC

 to find, resolve and store all interactions between the topics. We could
 then easily derive to something that would allow to pre-merge a new
 topic with all topics already merged in master..pu (for example).

 Anyway, this task is left for latter.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Premerging topics (was: [RFD] annnotating a pair of commit objects?)

2013-04-10 Thread Antoine Pelisse
The goal is to propose a structure for storing and pre-merging pairs of commits.

Data-structure
==

We could use a note ref to store the pre-merge information. Each commit
would be annotated with a blob containing the list of pre-merges (one
sha1 per line with sha1 pointing to a merge commit). The commit on the
other side of a merge would also be annotated.

The choice of the refname could be done like we do with notes:
- Have a default value
- Have a default value configured in config
- Use a specific value when merging/creating the pre-merges

Here are my concerns:

Pros

1. Notes allow dynamic annotation a commit
2. If we manage to fix 4, we can easily download all pre-merges from a
remote host by fetching the ref (or clean by deleting the ref).
3. Conflicts on pre-merge notes could probably be resolved by concatenation.

Cons

4. Checking connectivity means opening the blob and parsing it
5. Regular notes and pre-merge notes have to be handled separately
because of 4.

I'm hoping we can keep the pros and avoid the cons, but I'm kind of
stuck here. Help would be really appreciated (or maybe this is a totally
wrong direction, and I would also like to know ;)

Merging (Using what we saved)
=
The goal is to merge branches J and B using existing pre-merges.

E0. Create an empty stack S
E1. Create set of commits 'J..B' and 'B..J' (that is probably already done)
E2. For each commit C in smallest(J..B, B..J), execute E3
E3. For each premerge P in notes-premerge(C), execute E4
E4. If one of both parents of P belongs to biggest(J..B, B..J), stack P in S
E5. Merge J and B using all pre-merges from S

Let's consider that |J..B| is smaller than |B..J|.
E0 is executed only once
E1 is O(|J..B| + |B..J|)
E2 is O(|J..B|)
E3 is O(|J..B| x the average number of pre-merge per commits P_avg)
E4 is executed for each parent (let's say it's two/constant, after all
the topic is pair of commits), so still O(|J..B| x P_avg)
E5 I don't know (how it can be done, and what would be the resulting
time complexity)

So the time cost for steps E0 to E4 is O(|J..B| + |B..J| x P_avg)

Tools (Save the pre-merges)
===

Of course we need several tools to maintain the list of premerges, and
to easily compute them. For example, it would be nice to be able to do
something like:

$ git pre-merge topicA topicB topicC

to find, resolve and store all interactions between the topics. We could
then easily derive to something that would allow to pre-merge a new
topic with all topics already merged in master..pu (for example).

Anyway, this task is left for latter.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html