RE: Rename conflicts in the index

2013-04-02 Thread Edward Thomson
Edward Thomson [ethom...@microsoft.com] wrote:
 Junio C Hamano [mailto:gis...@pobox.com] wrote:
   * Path A may have only stage #1, while path B and C has only stage
 #2 and stage #3 (the user would have to notice these three
 correspond to each other, and resolve manually).
 
 You would want to annotate B at stage #2 seems to have been at A
 in the original (similarly for C#3) if you choose to do so.
 
 If we're going to make changes to the way conflicts are recorded in the main
 index, then I would prefer this approach.  It is unambiguous and all data
 about all sides are recorded, including the names that items had in their
 respective branches.

Junio, did you have additional thoughts on this?

What would you like from me to proceed?  If the aforementioned seems
reasonable, I can update Documentation/technical/index-format.txt and
we can iron out the details in that fashion?

Thanks-
-ed
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Rename conflicts in the index

2013-04-02 Thread Junio C Hamano
Edward Thomson ethom...@microsoft.com writes:

 Edward Thomson [ethom...@microsoft.com] wrote:
 Junio C Hamano [mailto:gis...@pobox.com] wrote:
   * Path A may have only stage #1, while path B and C has only stage
 #2 and stage #3 (the user would have to notice these three
 correspond to each other, and resolve manually).
 
 You would want to annotate B at stage #2 seems to have been at A
 in the original (similarly for C#3) if you choose to do so.
 
 If we're going to make changes to the way conflicts are recorded in the main
 index, then I would prefer this approach.  It is unambiguous and all data
 about all sides are recorded, including the names that items had in their
 respective branches.

 Junio, did you have additional thoughts on this?

Not at this moment.

I think we have covered the principles (do not unnecessarily
duplicate information, do not break existing implementations
unnecessarily, etc.) already, and we know how we want to record one
side renamed A to B, the other side renamed A to C case, but I do
not think the discussion covered all cases yet.




RE: Rename conflicts in the index

2013-04-02 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
  Junio, did you have additional thoughts on this?
 
 Not at this moment.
 
 I think we have covered the principles (do not unnecessarily duplicate
 information, do not break existing implementations unnecessarily, etc.)
 already, and we know how we want to record one side renamed A to B, the
 other side renamed A to C case, but I do not think the discussion covered
 all cases yet.

Sorry, I'm not sure what you're asking for - would you just like some more
examples of what this looks like with aforementioned exotic conflict types?
Or are you looking for something more strict - BNF format, for example?

-ed


Re: Rename conflicts in the index

2013-04-02 Thread Junio C Hamano
Edward Thomson ethom...@microsoft.com writes:

 Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
  Junio, did you have additional thoughts on this?
 
 Not at this moment.
 
 I think we have covered the principles (do not unnecessarily duplicate
 information, do not break existing implementations unnecessarily, etc.)
 already, and we know how we want to record one side renamed A to B, the
 other side renamed A to C case, but I do not think the discussion covered
 all cases yet.

 Sorry, I'm not sure what you're asking for - would you just like some more
 examples of what this looks like with aforementioned exotic conflict types?
 Or are you looking for something more strict - BNF format, for example?

Ehh, I wasn't asking for anything ;-)

You asked if I had any additional thoughts, I answered there is
nothing at this moment based on what I saw so far.  It is not my
immediate itch to update the index with more rename information, but
it is yours, so I would imagine you would know what cases you would
want to improve the end user experience better than I do ;-).

If I were solving the issue, I would probably proceed like this:

 * Start from a rough sketch of what extra information I would want
   to store in the new index extension section.

 * Teach read-cache.c to read from the new extension and keep it in
   an in-core data structure, and read from the in-core data
   structure and serialize it to write to the extension section.

 * Perhaps enhance update-index so that it can read textual
   representation of the contents of the new extension section, turn
   it into the in-core representation, so that it can write it out
   to the index file, as a debugging/development aid.

 * Teach read-cache.c to read from the new extension and keep it in
   an in-core data structure.

 * Teach wt-status.c to read from that in-core data structure and
   improve the presentation of the cases I care about using that
   information.  Use the update-index development aid to prepare
   various cases you care about.

   - If the kind of information that is stored in the new extension
     turns out to be insufficient, go back to the beginning and
     iterate.

   - If the use of the in-core data structure here turns out to be
     awkward, go back one step and iterate.

   - As I cover one more case, I would add a test to the test suite
     so that we would know what cases are covered and what the
     expected end-user presentation should be.

 * Once the result of the above covers all the cases I care about,
   then update merge-recursive.c to prepare the in-core data
   structure to be written out as the extension section.

As I iterate, the rough sketch will hopefully cover all the cases I
care about and I'll be ready to write them down as an update to the
document somewhere in Documentation/technical/api-*.
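
A minimal sketch of the "read from the new extension" step, assuming the
generic extension framing documented in index-format.txt (a 4-byte
signature, a 32-bit big-endian size, then the payload; a signature whose
first byte is 'A'..'Z' marks an optional extension).  The signature "RNCF"
and its payload are made up for illustration; only the framing is taken
from the index format.

```python
import struct


def make_extension(signature: str, payload: bytes) -> bytes:
    """Frame a payload as signature + 32-bit big-endian size + data."""
    return signature.encode("ascii") + struct.pack(">I", len(payload)) + payload


def walk_extensions(ext_area: bytes):
    """Yield (signature, optional, payload) for each extension record.

    ext_area is only the extension region of an index file, i.e. the bytes
    after the entries and before the trailing checksum.
    """
    pos = 0
    while pos < len(ext_area):
        if pos + 8 > len(ext_area):
            raise ValueError("truncated extension header")
        signature = ext_area[pos:pos + 4]
        (size,) = struct.unpack(">I", ext_area[pos + 4:pos + 8])
        payload = ext_area[pos + 8:pos + 8 + size]
        if len(payload) != size:
            raise ValueError("truncated extension payload")
        # First byte in 'A'..'Z' means readers that do not know the
        # extension may skip it.
        optional = 65 <= signature[0] <= 90
        yield signature.decode("ascii"), optional, payload
        pos += 8 + size


# "RNCF" is a hypothetical signature for the rename-conflict section.
area = make_extension("TREE", b"cache-tree data") + make_extension(
    "RNCF", b"b\x002\x00a\x00"
)
found = [(sig, opt, len(data)) for sig, opt, data in walk_extensions(area)]
print(found)
```

Because the section is optional, an implementation that does not know the
signature can step over it using the size field and still see the
higher-stage entries in the main index.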

Thanks.


Re: Rename conflicts in the index

2013-04-02 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 If I were solving the issue, I would probably proceed like this:

  * Start from a rough sketch of what extra information I would want
to store in the new index extension section.

  * Teach read-cache.c to read from the new extension and keep it in
an in-core data structure, and read from the in-core data
structure and serialize it to write to the extension section.

  * Perhaps enhance update-index so that it can read textual
representation of the contents of the new extension section, turn
it into the in-core representation, so that it can write it out
to the index file, as a debugging/development aid.

  * Teach read-cache.c to read from the new extension and keep it in
an in-core data structure.

Sorry, this is a dup of the second one.  Please ignore.

Also, all "you" in this section should read "I" (because this is a
description of "If I were solving it").

  * Teach wt-status.c to read from that in-core data structure and
improve the presentation of the cases I care about using that
information.  Use the update-index development aid to prepare
various cases you care about.

 - If the kind of information that is stored in the new extension
   turns out to be insufficient, go back to the beginning and
   iterate.

 - If the use of the in-core data structure here turns out to be
   awkward, go back one step and iterate.

 - As I cover one more case, I would add a test to the test suite
   so that we would know what cases are covered and what the
   expected end-user presentation should be.

  * Once the result of the above covers all the cases I care about,
then update merge-recursive.c to prepare the in-core data
structure to be written out as the extension section.

 As I iterate, the rough sketch will hopefully cover all the cases I
 care about and I'll be ready to write them down as an update to the
 document somewhere in Documentation/technical/api-*.

 Thanks.


RE: Rename conflicts in the index

2013-04-01 Thread Edward Thomson
Junio C Hamano [mailto:gis...@pobox.com] wrote:
 As long as the format will be backward compatible to allow existing users use
 existing tools to deal with cases the existing tools can handle, then that is
 OK.  I didn't get that impression which is where my "non starter" came from.

I see now.  Thank you for the clarification.  I apologize if I was not clear
about this; indeed, the duplication of data in my proposed extension was
specifically to avoid any compatibility problems amongst clients.

In particular, when we have a "rename in ours, edit in theirs" conflict, we
store the conflict at the new (ours) path.  If, for example, I rename a->b
in my branch and merge a branch that edits a:

mode hash 1 b
mode hash 2 b
mode hash 3 b

This prohibits us from storing anything else in the theirs side at that
path, so if I were to have added b in their branch in addition to modifying
a, I cannot record it.
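
A minimal sketch of that collision, assuming an index modelled as a mapping
keyed by (path, stage).  The Entry type, the add_entry helper and the blob
names are made up for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    mode: str
    sha: str
    stage: int
    path: str


def add_entry(index: dict, entry: Entry) -> bool:
    """Store entry; return False if its (path, stage) slot is already taken."""
    key = (entry.path, entry.stage)
    if key in index:
        return False
    index[key] = entry
    return True


index = {}
# Rename a->b in ours, edit of a in theirs: all three stages land on b.
add_entry(index, Entry("100644", "base-a", 1, "b"))
add_entry(index, Entry("100644", "ours-b", 2, "b"))
add_entry(index, Entry("100644", "theirs-a-edited", 3, "b"))
# Theirs also added an unrelated file named b: stage 3 of b is occupied.
recorded = add_entry(index, Entry("100644", "theirs-new-b", 3, "b"))
print(recorded)
```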

I was assuming that any change to this behavior would be a breaking one,
which is where the new section came from.

  * Path A may have only stage #1, while path B and C has only stage
#2 and stage #3 (the user would have to notice these three
correspond to each other, and resolve manually).
 
You would want to annotate B at stage #2 seems to have been at A
in the original (similarly for C#3) if you choose to do so.

If we're going to make changes to the way conflicts are recorded in the
main index, then I would prefer this approach.  It is unambiguous and all
data about all sides are recorded, including the names that items had in
their respective branches.

I would think that this might be a burden on current tools, however.
Now if I rename a->b in just my branch, my conflict will be recorded as:

mode hash 1 a
mode hash 2 b
mode hash 3 a

And current git-status will not look at any rename annotations to know
how to report this.
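
A minimal sketch of why that is, assuming a reporter that only looks at
which stages exist for each path.  The labels mirror the wording git status
uses for unmerged paths; the classify helper itself is a simplification
made up for illustration.

```python
from collections import defaultdict

# Set of stages present for a path -> label shown for that path.
LABELS = {
    frozenset({1}): "both deleted",
    frozenset({2}): "added by us",
    frozenset({3}): "added by them",
    frozenset({1, 2}): "deleted by them",
    frozenset({1, 3}): "deleted by us",
    frozenset({2, 3}): "both added",
    frozenset({1, 2, 3}): "both modified",
}


def classify(entries):
    """entries: iterable of (stage, path) pairs for unmerged paths."""
    stages = defaultdict(set)
    for stage, path in entries:
        stages[path].add(stage)
    return {path: LABELS[frozenset(s)] for path, s in sorted(stages.items())}


# a at stage 1, b at stage 2, a at stage 3, as in the listing above.
report = classify([(1, "a"), (2, "b"), (3, "a")])
print(report)
```

Nothing in the per-path result connects a and b, which is the information
the rename annotation would have to supply.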

However, maybe this is not as big a problem as I'm concerned it would be.

  * You can choose to favor our choice, and have path B with three
stages (if we guessed wrong and the user wants to move it to C,
the user can resolve and then git mv the path).

I think this approach suffers from the drawback that the current approach
has, wherein this conflicts when they had path B, also, as noted above.

I think that if you were to put both B and C with all three stages, this
would be problematic for the same reason.

 *1* Instead of a three-way merge that inspects only the endpoints,
 you might get a better rename trail if you looked at the
 histories of both branches.  It would be a lot more expensive
 than the simple three-way, but burning CPU cycles is better than
 burning human neurons.

For the record, I like this approach very much.  It's not something that
libgit2 will be able to tackle in the near future; we're in a sort of
walk-before-you-can-run situation with merge at the moment, as you can
probably see.  But any improvement that avoids burning neurons is a
valuable one.

Thanks-

-ed


RE: Rename conflicts in the index

2013-03-27 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
  I would propose that this not simply track rename conflicts, but all
  conflicts.
 
 That is a no starter.

So.  Can you explain to me why this would be a non starter?  Can you suggest
some alternate strategy here?

Maybe there's something I'm fundamentally misunderstanding.  It seems that
at present, git will:

1. Detect rename conflicts when performing a merge (at least,
   git-merge-recursive will, which is the default.)

2. If the rename itself caused a conflict (eg, renamed in one side, added in
   the other) then the merge cannot succeed.

3. The resultant index is written as if renames were not detected, which
   means - at best - it records the files that went into the name conflict
   and git status reports an "added in ours" conflict, which is a pretty
   disappointing conflict.  Often, though, many of the files will not
   exist at higher stage entries, since without rename detection, they
   would have not been conflicts.  At worst, one side is staged, there are
   no conflicts in the index and the user can commit (and thus lose the
   other side.)

Thus it's not like we could add some extension that merely records the names
that produced the rename conflicts and point them at the higher stage entries
in the index.  That would require that they actually be in the index.

Thanks-
-ed


Re: Rename conflicts in the index

2013-03-27 Thread Junio C Hamano
Edward Thomson ethom...@microsoft.com writes:

 Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
  I would propose that this not simply track rename conflicts, but all
  conflicts.
 
 That is a no starter.

 So.  Can you explain to me why this would be a non starter?

At least two, IIRC.  One is the consequence of the other.

We do not gratuitously break existing implementations.  If no
conflict is stored as higher-stage index entries in an index that
has your index extension, no existing implementation can read a
conflicted index written by your implementation and have users
resolve conflicts.

When a path originally at A is moved to B on only one branch, and
there are content-level conflicts between the changes made by one
branch (while going from A to B) and by the other branch (while
keeping it at A), we would end up having three stages for path B
without any trace of path A.  I do not offhand know how much it
helps to learn A in such a situation in the real life, but we are
indeed losing information, and I do not have any problem with an
extension that records in the index the fact that in the two (of the
three) commits involved in the merge, the path was at A.

But people have been successfully using existing versions of Git
without that information to merge branches with renames, and
resolving the content-level conflicts.  Your tool that
_additionally_ records "This path that currently has three stages
for B was at A in the common ancestor (i.e. stage #1) and that
branch (either stage #2 or stage #3)" does not _have_ _to_ break
these users by removing the three stages for B from the main index.

Also we do not duplicate information unnecessarily.  What we have been
losing in the above is the fact that two of the three had the contents
we have at path B in the resulting unmerged index at path A.  That
information might be useful as well, but there is no reason to write
another copy of mode or SHA-1 for any of the three variants.
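
A minimal sketch of that principle, assuming the extension stores only
"this (path, stage) was at another path" notes and leaves mode and SHA-1 in
the main index.  The Entry type, the origin_notes mapping and the blob names
are made up for illustration.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    mode: str
    sha: str
    stage: int
    path: str


# Main index: three stages for b, exactly what existing tools expect.
main_index = [
    Entry("100644", "sha-base", 1, "b"),
    Entry("100644", "sha-ours", 2, "b"),
    Entry("100644", "sha-theirs", 3, "b"),
]

# Hypothetical extension payload: (path, stage) -> path used on that side.
# No mode or SHA-1 is repeated here.
origin_notes = {("b", 1): "a", ("b", 3): "a"}


def describe(entry: Entry) -> str:
    """Render an entry, adding the original path when a note exists."""
    original = origin_notes.get((entry.path, entry.stage))
    where = f" (was at {original})" if original else ""
    return f"{entry.mode} {entry.sha} {entry.stage} {entry.path}{where}"


lines = [describe(e) for e in main_index]
print("\n".join(lines))
```

A reader that ignores origin_notes still sees an ordinary three-stage
conflict on b.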

As I said, you do not live in the world where you are writing
something like Git from scratch.  Perhaps you do, but then the
result will not be Git and we wouldn't be discussing that system on
this mailing list.






RE: Rename conflicts in the index

2013-03-27 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 We do not gratuitously break existing implementations.  If no conflict is
 stored as higher-stage index entries in an index that has your index
 extension, no existing implementation can read a conflicted index written
 by your implementation and have users resolve conflicts.

I'm not suggesting that anybody stop writing >0 stage entries.

 When a path originally at A is moved to B on only one branch, and there are
 content-level conflicts between the changes made by one branch (while going
 from A to B) and by the other branch (while keeping it at A), we would end up
 having three stages for path B without any trace of path A.  I do not offhand
 know how much it helps to learn A in such a situation in the real life, but
 we are indeed losing information, and I do not have any problem with an
 extension that records in the index the fact that in the two (of the three)
 commits involved in the merge, the path was at A.

What you've described is true only for a certain class of rename conflicts,
for example the rename/edit conflict you've described above.

It's also true if you were to rename some item 'a' to 'b' in both branches.
But when 'b' is sufficiently dissimilar to become a rewrite, then I end up
with a rename of a->b on one side and deleting a and adding b on the other.
The result is a mysterious "added by us" conflict:

100644 e2dd530c9f31550a2b0c90773ccde056929d6d66 2   b

Worse yet is if I don't do the rename in my side, but I just add a new b so
that in theirs I've renamed a to b and in mine I have both a and b.  When I
do the merge, I'm told I have conflicts, except that I don't:

100644 08d4f831774aed5d4c6cb496affefd4020dce40c 0   b

The other branch's b is long gone and exists only as a dirty file in the
workdir.

 But people have been successfully using existing versions of Git without that
 information to merge branches with renames, and resolving the content-level
 conflicts.

But you aren't afforded the option to resolve content-level conflicts if you
don't know where the conflict came from.  For example, in a rename 1->2
conflict, we dutifully detect that a was renamed to both b and c and fail,
but that fact is never given to the index.  This conflict could be fed into
a merge tool or, better, automerged, with the user only needing to pick a
path:

100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 1   a
100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 3   b
100644 421c9102b8562ad227ba773ab1cf6bbed7b7496d 2   c

I hate to sound like a broken record here, but without some more data in the
index - anywhere, really - any tool that doesn't have the luxury of emitting
data about what happened to stdout certainly can't infer anything about what
happened in the merge.
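
A minimal sketch of the automerge mentioned above, assuming the tool has
been told that a, b and c are one file under three names (the link the
index does not carry today).  The trivial_merge helper applies the usual
three-way rule to blob ids only; it is made up for illustration.

```python
from typing import Optional


def trivial_merge(base: str, ours: str, theirs: str) -> Optional[str]:
    """Return the merged blob id if no content merge is needed, else None."""
    if ours == theirs:
        return ours
    if base == ours:
        return theirs
    if base == theirs:
        return ours
    return None


sha = "421c9102b8562ad227ba773ab1cf6bbed7b7496d"
# path -> (stage, blob id), taken from the listing above.
sides = {"a": (1, sha), "b": (3, sha), "c": (2, sha)}
by_stage = {stage: blob for stage, blob in sides.values()}

content = trivial_merge(by_stage[1], by_stage[2], by_stage[3])
# The content resolves by itself; only the final name is left to the user.
candidate_paths = sorted(p for p, (stage, _) in sides.items() if stage != 1)
print(content, candidate_paths)
```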

-ed


Re: Rename conflicts in the index

2013-03-27 Thread Junio C Hamano
Edward Thomson ethom...@microsoft.com writes:

 Junio C Hamano [mailto:gits...@pobox.com] wrote:
 We do not gratuitously break existing implementations.  If no conflict is
 stored as higher-stage index entries in an index that has your index
 extension, no existing implementation can read a conflicted index written
 by your implementation and have users resolve conflicts.

 I'm not suggesting that anybody stop writing >0 stage entries.

Ah, OK, then I misread your original message.  You said

 Having a single canonical location is preferable - if the index
 contains a CONF section (and the client supports it), it would use
 that.  Otherwise, the client would look at stage >0 entries.

which I read as "an index with this extension would not have higher
stage entries, an index without the extension records higher stage
entries".  As long as the format will be backward compatible to
allow existing users to use existing tools to deal with cases the
existing tools can handle, then that is OK.  I didn't get that
impression, which is where my "non starter" came from.

 What you've described is true only for a certain class of rename conflicts,
 for example the rename/edit conflict you've described above.

As you asked me to explain why it was a non starter, I only
illustrated with a "renamed trivially, with content level conflict"
example that shows why dropping higher-stage entries in the main
index would not be acceptable.  The previous message did not even
mean to cover any cases the *new* feature you have in mind is trying
to address.  Again, if it hurts existing users handling cases
existing tools used to handle, that makes it a non starter.

How a new feature is designed, and how an extension is added to help
that new feature, is a different matter.  My original "That's a non
starter" message didn't even go that far.

In any case, the principle of "always record the state 'merge'
stopped to ask for help as higher stage entries to give existing
tools and users a chance to manually resolve, and augment with
optional extension to record additional information that might help,
but do not gratuitously waste bytes on redundant information" would
apply to other exotic cases you would want to tackle with the new
feature, I would think.

If one branch moves path A in the original to path B and the other
one moved it to path C, for example, we can record it in different
ways, even in the main index.

 * Path A may have only stage #1, while paths B and C have only stage
   #2 and stage #3 respectively (the user would have to notice these
   three correspond to each other, and resolve manually).

   You would want to annotate "B at stage #2 seems to have been at A
   in the original" (similarly for C#3) if you choose to do so.

 * You can choose to favor our choice, and have path B with three
   stages (if we guessed wrong and the user wants to move it to C,
   the user can resolve and then "git mv" the path).

   You would want to annotate "the other side wanted to have B at
   stage #3 at C" in that case.

 * Or you may want to have in the main index both B and C (but not
   A) with all three stages (the user would have to choose which
   one survives, but discarding the other side with git rm would
   be easy).

   You would want to annotate the origin of the stage #1 for path B
   and C (these were originally at A), stage #2 for B (the other
   branch wants to have it at C), stage #3 for C (we want to have it
   at B).

There may be other ways, and I do not offhand know what the current
merge-recursive implementation does, but both of the latter two
sound equally usable and reasonable ways, even without the
annotation.  And with your annotation that records different paths,
the conflict may become even easier to resolve.
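
A minimal sketch of the three candidate layouts listed above, written as
(stage, path, blob) rows for the "A renamed to B in ours, to C in theirs"
case.  The layouts helper, its key names and the blob placeholders are made
up for illustration; this is not what merge-recursive writes.

```python
def layouts(base_sha, ours_sha, theirs_sha):
    """Return the three candidate main-index layouts for A->B vs A->C."""
    stages = ((1, base_sha), (2, ours_sha), (3, theirs_sha))
    # 1. Each path keeps only the stage of the side that used that name.
    split = [(1, "A", base_sha), (2, "B", ours_sha), (3, "C", theirs_sha)]
    # 2. Favor our name: all three stages are recorded at B.
    favor_ours = [(stage, "B", blob) for stage, blob in stages]
    # 3. Both new names carry all three stages; A is not recorded.
    both = [(stage, path, blob) for path in ("B", "C") for stage, blob in stages]
    return {"split": split, "favor_ours": favor_ours, "both": both}


result = layouts("base", "ours", "theirs")
paths = {name: sorted({p for _, p, _ in rows}) for name, rows in result.items()}
print(paths)
```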

I still do not need to duplicate mode, SHA-1 in the extensions to
do the above, or do I?

If the original path A was removed and a new path B was added, with
contents that are modified from A beyond recognition, at the merge
time you wouldn't know where B came from or where A went, so
annotating "A at stage #1" to say it went to B is nonsense.  If
you have an algorithm to do so [*1*], you would be better off detecting
it as a rename.


[Footnote]

*1* Instead of a three-way merge that inspects only the endpoints,
you might get a better rename trail if you looked at the
histories of both branches.  It would be a lot more expensive
than the simple three-way, but burning CPU cycles is better than
burning human neurons.


RE: Rename conflicts in the index

2013-03-26 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
 
  I would propose that we store the data about the file in conflict as
  it occurred through the renames.  For example, in a rename 1->2
  conflict where A was renamed to both B and C, you would have a single
  conflict entry containing the data for A, B and C.  This would allow
  us to provide more detailed information to the user - and allow them
  to (say) choose a single name to proceed with.
 
  Is this something that has value to core git as well?  Alternately, is
  there something particularly stupid about this proposal?
 
 I do not offhand see anything particularly stupid; a new optional index
 extension section CACHE_EXT_RENAME_CONFLICT might be a good addition.
 
 Is "one side moves A to B while the other side moves it to C" the only case,
 or is it just an example?  Off the top of my head, "one side moves A to x
 while the other side moves B to x/y" would also be something we would want
 to know.  I am sure there are other cases that need to be considered.
 
 I do not think we can discuss the design at the concrete level until the
 proposal spells out to cover all interesting cases in order for
 implementations to agree on the common semantics.

Sorry about the delay here:  besides getting busy with some other things,
I wanted both a complete writeup and to have taken a pass at a test
implementation of this in libgit2 to make sure it seemed like a reasonably
sensible approach.

I would propose a new extension, 'CONF', to handle conflict data, differing
from the stage >0 entries in the index in that this extension tracks the
conflicting file across names if the underlying merge engine has support
for renames.

I made an attempt to keep the entry data similar to other entries in the
index.  I would propose that entries in the conflict are as follows:

Flags
  Four octets that describe the conflict.  Data includes:

  0x01  HAS_ANCESTOR
    There is a file in the common ancestor branch that contributes
    to this conflict.  Its data will follow.
  0x02  HAS_OURS
    There is a file in our branch that contributes to this conflict.
    Its data will follow.
  0x04  HAS_THEIRS
    There is a file in their branch that contributes to this conflict.
    Its data will follow.

  0x08  NAME_CONFLICT_OURS
    This item has a path in our branch that overlaps a different
    item in their branch.  (Eg, this conflict represents the our
    side of a rename/add conflict.)
  0x10  NAME_CONFLICT_THEIRS
    This item has a path in their branch that overlaps a different
    item in our branch.  (Eg, this conflict represents the theirs
    side of a rename/add conflict.)

  0x20  DF_CONFLICT_FILE
    This is the file involved in a directory/file conflict.
  0x40  DF_CONFLICT_CHILD
    This is a child of a directory involved in a directory/file conflict.

  Other bits are reserved.

Conflict Sides
  The data about one side of a conflict will contain:
  mode (ASCII string representation of octal, null-terminated)
  path (null terminated)
  sha1 (raw bytes)

The conflict sides will be written in this order:
  Ancestor (if HAS_ANCESTOR is set)
  Ours (if HAS_OURS is set)
  Theirs (if HAS_THEIRS is set)
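
A minimal sketch of one conflict entry in the layout described above: four
flag octets, then for each side present a NUL-terminated ASCII octal mode, a
NUL-terminated path and 20 raw SHA-1 bytes.  The big-endian byte order for
the flags and the UTF-8 path encoding are assumptions, since the proposal
does not state them; the function names are made up for illustration.

```python
import struct
from enum import IntFlag


class ConflictFlags(IntFlag):
    HAS_ANCESTOR = 0x01
    HAS_OURS = 0x02
    HAS_THEIRS = 0x04
    NAME_CONFLICT_OURS = 0x08
    NAME_CONFLICT_THEIRS = 0x10
    DF_CONFLICT_FILE = 0x20
    DF_CONFLICT_CHILD = 0x40


# Sides are written in this order when their HAS_* bit is set.
SIDE_BITS = (
    ConflictFlags.HAS_ANCESTOR,
    ConflictFlags.HAS_OURS,
    ConflictFlags.HAS_THEIRS,
)


def pack_conflict(flags, sides):
    """sides: list of (mode, path, sha1) in ancestor/ours/theirs order."""
    if sum(1 for bit in SIDE_BITS if flags & bit) != len(sides):
        raise ValueError("HAS_* bits do not match the number of sides")
    out = struct.pack(">I", int(flags))  # byte order is an assumption
    for mode, path, sha1 in sides:
        if len(sha1) != 20:
            raise ValueError("sha1 must be 20 raw bytes")
        out += format(mode, "o").encode("ascii") + b"\0"
        out += path.encode("utf-8") + b"\0"
        out += sha1
    return out


def unpack_conflict(data):
    """Inverse of pack_conflict; returns (flags, sides)."""
    (raw,) = struct.unpack(">I", data[:4])
    flags = ConflictFlags(raw)
    pos, sides = 4, []
    for bit in SIDE_BITS:
        if not flags & bit:
            continue
        end = data.index(b"\0", pos)
        mode = int(data[pos:end].decode("ascii"), 8)
        pos = end + 1
        end = data.index(b"\0", pos)
        path = data[pos:end].decode("utf-8")
        pos = end + 1
        sides.append((mode, path, data[pos:pos + 20]))
        pos += 20
    return flags, sides


flags = (
    ConflictFlags.HAS_ANCESTOR
    | ConflictFlags.HAS_OURS
    | ConflictFlags.NAME_CONFLICT_OURS
)
sides = [(0o100644, "A", b"\x01" * 20), (0o100644, "B", b"\x02" * 20)]
blob = pack_conflict(flags, sides)
decoded = unpack_conflict(blob)
print(len(blob), decoded[0])
```

The reader relies on the HAS_* bits to know how many sides follow, so flags
and sides must agree; pack_conflict rejects entries where they do not.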

I would propose that this not simply track rename conflicts, but all
conflicts.  Having a single canonical location is preferable - if the index
contains a CONF section (and the client supports it), it would use that.
Otherwise, the client would look at stage >0 entries.

I would propose that another extension, 'RSVD', track these conflicts once
they are resolved.  The format would be the same - when a conflict is
resolved from the CONF the entry will be placed as-is in the RSVD.

Examples are not an exhaustive list, but should help elucidate the name
and d/f conflicts:

Normal edit / edit conflict, where A is edited in ours and theirs:

  Conflict one:
    Flags = HAS_ANCESTOR|HAS_OURS|HAS_THEIRS
    Entry 1 = A [Ancestor]
    Entry 2 = A [Ours]
    Entry 3 = A [Theirs]

Rename / add conflict, where A is renamed to B in ours and B is added in
theirs:

  Conflict one:
    Flags = HAS_ANCESTOR|HAS_OURS|HAS_THEIRS|NAME_CONFLICT_OURS
    Entry 1 = A [Ancestor]
    Entry 2 = B [Ours]
    Entry 3 = A [Theirs]
  Conflict two:
    Flags = HAS_THEIRS|NAME_CONFLICT_THEIRS
    Entry 1 = File B [Theirs]
D/F conflict, where some file A is deleted in theirs, and a directory
A is created with file child:

  Conflict one:
    Flags = HAS_ANCESTOR|HAS_OURS|DF_CONFLICT_FILE
    Entry 1 = A [Ancestor]
    Entry 2 = A [Ours]
  Conflict two:
    Flags = HAS_THEIRS|DF_CONFLICT_CHILD
    Entry 1 = A/child [Theirs]

Thanks for your input on this.

-ed


Re: Rename conflicts in the index

2013-03-26 Thread Junio C Hamano
Edward Thomson ethom...@microsoft.com writes:

 I would propose a new extension, 'CONF', to handle conflict data, differing
 from the stage >0 entries in the index in that this extension tracks the
 conflicting file across names if the underlying merge engine has support
 for renames.

 I made an attempt to keep the entry data similar to other entries in the
 index.  I would propose that entries in the conflict are as follows:

 Flags
   Four octets that describe the conflict.  Data includes:

   0x01  HAS_ANCESTOR
 There is a file in the common ancestor branch that contributes
 to this conflict.  Its data will follow.
   0x02  HAS_OURS
 There is a file in our branch that contributes to this conflict.
 Its data will follow.
   0x04  HAS_THEIRS
 There is a file in their branch that contributes to this conflict.
 Its data will follow.

   0x08  NAME_CONFLICT_OURS
 This item has a path in our branch that overlaps a different
 item in their branch.  (Eg, this conflict represents the our
 side of a rename/add conflict.)
   0x10  NAME_CONFLICT_THEIRS
 This item has a path in their branch that overlaps a different
 item in our branch.  (Eg, this conflict represents the theirs
 side of a rename/add conflict.)

   0x20  DF_CONFLICT_FILE
 This is the file involved in a directory/file conflict.
   0x40  DF_CONFLICT_CHILD
 This is a child of a directory involved in a directory/file conflict.

   Other bits are reserved.

 Conflict Sides
   The data about one side of a conflict will contain:
   mode (ASCII string representation of octal, null-terminated)
   path (null terminated)
   sha1 (raw bytes)

 The conflict sides will be written in this order:
   Ancestor (if HAS_ANCESTOR is set)
   Ours (if HAS_OURS is set)
   Theirs (if HAS_THEIRS is set)

Puzzled.  Most of the above, except NAME_CONFLICT_{OURS,THEIRS}
bits, look like totally pointless duplication.

When you are working with Git, you have to be prepared to read from
the datafile like the index that other people (and your previous
version) created, and you also have to make sure you do not make
what you write out unusable by other people without a good reason.

So your tool needs code to see higher stage entries in the main
index to find mode,sha1 for the conflicted paths even without the
index extension anyway, and if your tool does also perform merges,
you would need to strive for writing the main index with conflicted
entries so that implementations that do not yet understand your
extension can keep operating.  For some types of extensions, the
latter may be hard (and that is why I stopped at "you would need to
strive for", and not "you must"), but for the one under discussion,
I do not think it is the case (by the way "CONF" sounds as if it is
some sort of configuration data).

If you are starting a brand new system from scratch, keeping only
the resolved entries in the main index and having a separate section
for conflicts might be also a valid design choice, but you do not
live in that world if you are discussing the design on this mailing
list.

 I would propose that this not simply track rename conflicts, but all
 conflicts.

That is a no starter.


RE: Rename conflicts in the index

2013-03-26 Thread Edward Thomson

Junio C Hamano [mailto:gits...@pobox.com] wrote:
 Edward Thomson ethom...@microsoft.com writes:
  I would propose a new extension, 'CONF', to handle conflict data,
  differing from the stage >0 entries in the index in that this
  extension tracks the conflicting file across names if the underlying
  merge engine has support for renames.
 
  I made an attempt to keep the entry data similar to other entries in
  the index.  I would propose that entries in the conflict are as follows:
 
  Flags
Four octets that describe the conflict.  Data includes:
 
0x01  HAS_ANCESTOR
  There is a file in the common ancestor branch that contributes
  to this conflict.  Its data will follow.
0x02  HAS_OURS
  There is a file in our branch that contributes to this conflict.
  Its data will follow.
0x04  HAS_THEIRS
  There is a file in their branch that contributes to this conflict.
  Its data will follow.
 
0x08  NAME_CONFLICT_OURS
  This item has a path in our branch that overlaps a different
  item in their branch.  (Eg, this conflict represents the our
  side of a rename/add conflict.)
0x10  NAME_CONFLICT_THEIRS
  This item has a path in their branch that overlaps a different
  item in our branch.  (Eg, this conflict represents the theirs
  side of a rename/add conflict.)
 
0x20  DF_CONFLICT_FILE
  This is the file involved in a directory/file conflict.
0x40  DF_CONFLICT_CHILD
  This is a child of a directory involved in a directory/file conflict.
 
Other bits are reserved.
 
  Conflict Sides
The data about one side of a conflict will contain:
mode (ASCII string representation of octal, null-terminated)
path (null terminated)
sha1 (raw bytes)
 
  The conflict sides will be written in this order:
Ancestor (if HAS_ANCESTOR is set)
Ours (if HAS_OURS is set)
Theirs (if HAS_THEIRS is set)
 
 Puzzled.  Most of the above, except the NAME_CONFLICT_{OURS,THEIRS} bits, looks
 like totally pointless duplication.

Obviously, HAS_ANCESTOR / HAS_OURS / HAS_THEIRS are there to tell a reader
whether there is data to be read or not.  (This is similar to how a mode of 0
in the REUC indicates that the rest of the record should not be read.)
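
A rough sketch, in Python, of how a reader might use those three bits when decoding one entry of the proposed layout. Several details are assumptions that the proposal does not state: the four flag octets are read as a big-endian integer, the sha1 is taken to be 20 raw bytes, and paths are kept as bytes. The function names are hypothetical.

```python
import struct

HAS_ANCESTOR = 0x01
HAS_OURS = 0x02
HAS_THEIRS = 0x04
SHA1_LEN = 20  # size of a raw SHA-1 digest in bytes

# Sides are written in this order, each only if its flag bit is set.
SIDE_BITS = (
    ("ancestor", HAS_ANCESTOR),
    ("ours", HAS_OURS),
    ("theirs", HAS_THEIRS),
)


def _read_cstring(buf, pos):
    """Return (bytes up to the next NUL, position just past the NUL)."""
    end = buf.index(b"\0", pos)
    return buf[pos:end], end + 1


def parse_conflict_entry(buf, pos=0):
    """Decode one entry; returns (flags, sides, next_pos)."""
    (flags,) = struct.unpack_from(">I", buf, pos)  # assumed byte order
    pos += 4
    sides = {}
    for name, bit in SIDE_BITS:
        if not flags & bit:
            continue  # nothing was written for this side
        mode_text, pos = _read_cstring(buf, pos)
        path, pos = _read_cstring(buf, pos)
        sha1 = buf[pos:pos + SHA1_LEN]
        pos += SHA1_LEN
        sides[name] = (int(mode_text, 8), path, sha1)
    return flags, sides, pos


def build_conflict_entry(sides, extra_flags=0):
    """Encode one entry from {side_name: (mode, path, sha1)}."""
    flags = extra_flags
    body = b""
    for name, bit in SIDE_BITS:
        if name in sides:
            mode, path, sha1 = sides[name]
            flags |= bit
            body += format(mode, "o").encode("ascii") + b"\0"
            body += path + b"\0" + sha1
    return struct.pack(">I", flags) + body


if __name__ == "__main__":
    entry = build_conflict_entry({
        "ancestor": (0o100644, b"A", b"\x01" * SHA1_LEN),
        "ours": (0o100644, b"B", b"\x02" * SHA1_LEN),
        "theirs": (0o100644, b"C", b"\x03" * SHA1_LEN),
    })
    print(parse_conflict_entry(entry))
```

Because a side with a clear bit occupies no bytes at all, the reader cannot skip it by length and has to consult the flags before each side.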

 When you are working with Git, you have to be prepared to read from the
 datafile like the index that other people (and your previous
 version) created, and you also have to make sure you do not make what you
 write out unusable by other people without a good reason.

I'm acutely aware that you need to be able to read an index that other
people created - that's the problem at hand.  git does not produce an
index that allows anyone (including itself) to reason about rename
conflicts.  It doesn't even bother to write high-stage conflict entries
in many instances, so you can have an instance where git tells you that
a conflict occurred but one of those files is staged anyway, the other
is just dirty in the workdir and you can commit immediately thereafter.

While it's obviously possible to handle this situation (is file A
in conflict?  Look in the rename conflict extension.  Not there?  Okay,
look in the index.), that's not exactly elegant.  My goal here was
to have a single source for conflicts.
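
The two-step lookup described above can be sketched in Python as follows. The data shapes are purely illustrative assumptions: the rename conflict extension is modeled as a list of {side: path} dictionaries and the main index as a mapping from path to the set of stages present.

```python
def find_conflict(path, rename_conflicts, index_stages):
    """Look in the rename conflict data first, then fall back to the index.

    Returns ("extension", entry), ("index", stages) or None.
    """
    # Step 1: is the path named by any side of a recorded rename conflict?
    for entry in rename_conflicts:
        if path in entry.values():
            return ("extension", entry)
    # Step 2: otherwise, does the main index hold higher-stage entries?
    stages = index_stages.get(path, set())
    if any(stage > 0 for stage in stages):
        return ("index", sorted(stages))
    return None


if __name__ == "__main__":
    # Hypothetical state: A renamed to B and C; D is an ordinary conflict.
    rename_conflicts = [{"ancestor": "A", "ours": "B", "theirs": "C"}]
    index_stages = {"D": {1, 2, 3}, "E": {0}}
    for name in ("B", "D", "E"):
        print(name, find_conflict(name, rename_conflicts, index_stages))
```

Every caller has to repeat both steps in the same order, which is the duplication a single source of conflict data would avoid.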

 (by the way CONF sounds as if it is some sort of configuration data).

There are only four letters, and not everything's as easy as TREE.  REUC,
for example, sounds like a donkey, though I suppose it depends on the
language in question.

-ed


Re: Rename conflicts in the index

2013-03-13 Thread Junio C Hamano
Edward Thomson ethom...@microsoft.com writes:

 I would propose that we store the data about the file in conflict as it
 occurred through the renames.  For example, in a rename 1->2 conflict where
 A was renamed to both B and C, you would have a single conflict entry
 containing the data for A, B and C.  This would allow us to provide more
 detailed information to the user - and allow them to (say) choose a single
 name to proceed with.

 Is this something that has value to core git as well?  Alternately, is
 there something particularly stupid about this proposal?

I do not offhand see anything particularly stupid; a new optional
index extension section CACHE_EXT_RENAME_CONFLICT might be a good
addition.

Is "one side moves A to B while the other side moves it to C" the
only case, or is it just an example?  Off the top of my head, "one
side moves A to x while the other side moves B to x/y" would also be
something we would want to know.  I am sure there are other cases
that need to be considered.

I do not think we can discuss the design at the concrete level until
the proposal spells out all the interesting cases it covers, so that
implementations can agree on the common semantics.


RE: Rename conflicts in the index

2013-03-13 Thread Edward Thomson
Junio C Hamano [mailto:gits...@pobox.com] wrote:
 I do not offhand see anything particularly stupid; a new optional index
 extension section CACHE_EXT_RENAME_CONFLICT might be a good addition.
 
 Is "one side moves A to B while the other side moves it to C" the only
 case, or is it just an example?  Off the top of my head, "one side moves
 A to x while the other side moves B to x/y" would also be something we
 would want to know.  I am sure there are other cases that need to be
 considered.

Yes, that was just an example.  Certainly I was intending that all conflicts
that arose from renames would end up here, since one can't really reason
about why the merge tool created a conflict by looking at the index alone.
Even knowing the merge tool's similarity algorithms, this would be awfully
expensive to piece back together, even if the index did contain non-zero
stage entries for all the items that were involved in the conflicts.

That said, my rather naive initial thought was that we could repeat *all*
conflicts in this area.  This would give tools that knew how to understand
this the ability to go to a single place for conflict data, rather than
producing some merge of high-stage entries that comprise non-rename
conflicts and data from the rename conflict area for rename conflicts.

Thanks-
-ed