Re: Storing state in $GIT_DIR

2005-08-26 Thread Eric W. Biederman
Martin Langhoff [EMAIL PROTECTED] writes:

 On 8/26/05, Eric W. Biederman [EMAIL PROTECTED] wrote:
 Thinking about it, going from arch to git should be just a matter
 of checking sha1 hashes, possibly back to the beginning of the
 arch tree.

 Yup, though actually replaying the tree to compute the hashes is
 something I just _won't_ do ;)

I guess if you have the tla branch names it won't be necessary.
If you are careful how you do the import you can have two parallel
imports of the same data and produce exactly the same git tree.
That is largely why I care about a stable algorithm for the hashes.
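
A minimal sketch of why identical input gives identical object names, shown for the simplest case (a blob). The function name git_blob_id is only illustrative. Commit objects additionally hash the author, committer, dates and message, so an importer would have to derive those deterministically from the arch data as well.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object id git assigns to a blob with this content."""
    # git hashes a short header ("blob <size>\0") followed by the raw bytes
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# Two independent imports that feed in the same bytes get the same id.
first_import = git_blob_id(b"hello world\n")
second_import = git_blob_id(b"hello world\n")
assert first_import == second_import
print(first_import)  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The printed id matches what `git hash-object` reports for the same bytes.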

 Going from git to arch is the trickier mapping, because you
 need to know the full repo--category--branch--version--patch
 mapping.

 My plan doesn't include git-arch support... yet...

One of my interests, if I get the time to worry about it,
is to get an scm that is a sufficient superset of what other
scms do so that it can serve as a bidirectional gateway.

git is fairly close to what is needed to implement that.

Hmm.  I wonder if a git metadata branch in general is sufficient to
store information that does not map to git natively?

 Hmm.  Thinking about arch from a git perspective, arch tags every
 commit.  So the really sane thing to do (I think) is to create
 a git tag object for every arch commit.

 Now I like that interesting idea. It doesn't solve all my problems,
 but is a reasonable mapping point. Will probably do it.

 With patch trading (Martin I think I know what you are referring to)
 arch does seem to have a concept that does not map very well to git,
 and this I think is a failing in git.

 I won't get into _that_ flamewar ;)

<pouts> No flamewar </pouts>

 My plan for merges is to detect up to what point two branches are
 fully merged, and mark that in git -- because that is what git
 considers a merge. The rest will be known to the importer, but to
 nothing else.

I looked at least back to the StGit announcement and it helped to
clarify my thinking.  A patch is equivalent to a branch with
just one change. This makes cherry picking a single patch roughly
equivalent to describing that patch as a single commit branch
at the fork point from the common ancestor of the two branches,
and then having the single commit merged.

The original branch that was cherry-picked from can really only be
represented as a graft, like the original Linux kernel history.

The shortcoming I see in git-applypatch is that it doesn't attempt
to find the original base of a patch and instead simply assumes it
is against the current tree.

There is a similar shortcoming in git-diff-tree, where it reports
the commit that you are on when you take the diff, but it does not
report the commit the diff is against.

..

Thinking a little more there is also a connection with reverting
patches.  Cherry picking changes from a branch may also be thought of
as reverting all of the other changes from a branch and then merging
the branch.

The practical question behind all of these things is whether there is
a form that will allow future merges to realize the same change has
already been applied, so they can skip it the second time.
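
One possible form, sketched under assumptions: give each change a fingerprint that depends only on the lines it adds and removes, so the same change is recognized even when it lands at a different offset. The name change_fingerprint and the normalization rules are illustrative choices, similar in spirit to what `git patch-id` computes but not a reimplementation of it.

```python
import hashlib

def change_fingerprint(diff_text: str) -> str:
    """Hash only the added/removed lines of a unified diff.

    Hunk headers (line numbers) and context lines are ignored, and
    whitespace is stripped, so the fingerprint survives the change being
    applied at a different offset. Assumption: a removed line whose text
    starts with "-- " would be mistaken for a file header; a real tool
    would parse hunks properly.
    """
    digest = hashlib.sha1()
    for line in diff_text.splitlines():
        if line.startswith(("--- ", "+++ ")):
            # file header: keep the path, drop any trailing timestamp
            line = line.split("\t")[0]
        elif not line.startswith(("+", "-")):
            # hunk header, context line, or other noise
            continue
        digest.update("".join(line.split()).encode() + b"\n")
    return digest.hexdigest()

def already_applied(diff_text: str, applied_fingerprints: set) -> bool:
    """True if a change with the same fingerprint was recorded before."""
    return change_fingerprint(diff_text) in applied_fingerprints
```

An importer could record the fingerprint of every change it applies and consult already_applied before replaying a patch from darcs, tla or quilt.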

Inter-operating with darcs, tla, quilt, and raw diff/patch brings up
these issues.

So my practical questions are:
- What information can the current git merge algorithms, and more
  sophisticated merge algorithms, use to avoid conflicts when
  the same changes are merged into the same branch multiple times?

- Is the git meta data sufficient to represent the history that
  sophisticated merge algorithms can use?

- Is the git meta data sufficient to represent the result
  of sufficient meta data operations?

- Is the current representation of a reverted change sufficient
  for the merge algorithms, or could they do a better job if
  they knew a change was a revert of a previous change?

I'm just trying to think through the issues that working with patch
based systems bring up.

Eric
-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Storing state in $GIT_DIR

2005-08-25 Thread Martin Langhoff
Linus, 

I like the solution you are suggesting, but I suspect it will create
more problems than it will solve, and while the coolness factor is
drawing me in, we ain't gonna need it, as the XP people say.

More below...

On 8/26/05, Linus Torvalds [EMAIL PROTECTED] wrote:
 Git won't care, so it will work, but things like clone/pull etc also won't
 actually ever look there, so it will only work for that one repo.

Storing things there _works_ in the sense that it will be ignored, and
that is fine with me. So I could just be lazy and have it strictly
tied to the repo. In practice, if you are tracking an external Arch
repo, you really have it scripted, and use a dedicated git repo for
that.

Not using a dedicated repo is quite... messy. If you do other things
in that particular repo, the import script may find it dirty, and mess
things up on import. And after the import, you'll probably run
git-push-script --all because it's bringing a dynamically growing
forest of heads from the arch repo. That's another reason why your
private branches should be elsewhere.

OTOH, storing the metadata in a branch will allow us to run the import
in alternating repositories. But as Junio points out, unless I can
guarantee that the metadata and the tree are in sync, I cannot
trivially resume the import cycle from a new repo.

 The git solution to this (which nobody has ever _used_, but which
 technically is wonderful) is to have a side branch that does not share
 any commits (or files, for that matter) in common with the real branch,
 and which is used to track any metadata. In fact, you can obviously have
 any number of side branches.

A couple of days ago, playing with the import, I realised that the git
repo can hold unrelated projects, too, if you just commit orphan trees
as new heads. I mean - it was a bug in my script but I thought it was
cool. ;)

 The way to maintain a metadata branch is to have not only a different
 branch name (obviously), but also use a totally different index file, so
 that you can index both branches in parallel, and you don't actually need
 to check out one or the other.
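
A rough sketch of the separate-index idea, assuming the importer drives git from a script. git reads the index location from the GIT_INDEX_FILE environment variable; the helper names, the index path and the branch name in the usage comment are hypothetical.

```python
import os
import subprocess

def metadata_env(index_path, base_env=None):
    """Copy the environment and point git at an alternate index file."""
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_INDEX_FILE"] = index_path
    return env

def run_git(args, index_path, repo_dir="."):
    """Run a git command against the alternate index and return its stdout."""
    result = subprocess.run(
        ["git"] + list(args),
        cwd=repo_dir,
        env=metadata_env(index_path),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout

# Hypothetical usage: load the metadata branch into its own index and
# list it, without touching the working tree or the default index.
#   run_git(["read-tree", "arch-meta"], "/path/to/repo/.git/arch-meta-index")
#   run_git(["ls-files"], "/path/to/repo/.git/arch-meta-index")
```

Because the default index is never touched, the real branch can stay checked out while the metadata branch is read or updated.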

Hmmm. Now that's voodoo magic! I was thinking of reading the file by
asking directly for the object by its sha, or doing a checkout in a
tmpdir. Interesting.

cheers,


martin


Re: Storing state in $GIT_DIR

2005-08-25 Thread Linus Torvalds


On Fri, 26 Aug 2005, Martin Langhoff wrote:
 
 OTOH, storing the metadata in a branch will allow us to run the import
 in alternating repositories. But as Junio points out, unless I can
 guarantee that the metadata and the tree are in sync, I cannot
 trivially resume the import cycle from a new repo.

But you can.

Remember: the metadata is the pointers to the original git conversion, and 
objects are immutable.

In other words, if you just have a last commit pointer in your 
meta-data, then git is _by_definition_ in sync. There's never anything to 
get out of sync, because objects aren't going to change.

So you can think of your meta-data as a strange kind of head ref. Or 
rather, a _collection_ of these strange refs.

And it doesn't matter if somebody ends up committing on top of an arch 
import. The metadata by definition doesn't know about it, so the import 
head doesn't move anywhere (if you do git and arch work in parallel, you 
can then merge the two heads with git, of course).

Linus


Re: Storing state in $GIT_DIR

2005-08-25 Thread Martin Langhoff
On 8/26/05, Linus Torvalds [EMAIL PROTECTED] wrote:
  OTOH, storing the metadata in a branch will allow us to run the import
  in alternating repositories. But as Junio points out, unless I can
  guarantee that the metadata and the tree are in sync, I cannot
  trivially resume the import cycle from a new repo.
 
 But you can.
 
 Remember: the metadata is the pointers to the original git conversion, and
 objects are immutable.
 
 In other words, if you just have a last commit pointer in your
 meta-data, then git is _by_definition_ in sync. There's never anything to
 get out of sync, because objects aren't going to change.

Hmmm. That repo is in sync, but there are no guarantees that they will
travel together to a different repo. In fact, the push/pull
infrastructure wants to push/pull one head at a time.

And if they are not in sync, I have no way of knowing. Hmpf. I lie:
the arch metadata could keep track of what it expects the last head
commits to be, and complain bitterly if something smells rotten.
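
A minimal sketch of that check, assuming the metadata records one expected commit id per imported branch. The function name and the strict equality test are assumptions; a looser check could accept any head that merely descends from the expected commit.

```python
def check_heads(expected, actual):
    """Compare the heads recorded in the arch metadata with the repo's heads.

    Both arguments map branch names to commit ids. Returns a list of
    human-readable complaints; an empty list means nothing smells rotten.
    """
    problems = []
    for branch, commit_id in sorted(expected.items()):
        found = actual.get(branch)
        if found is None:
            problems.append("%s: missing, expected %s" % (branch, commit_id))
        elif found != commit_id:
            problems.append("%s: at %s, expected %s" % (branch, found, commit_id))
    return problems
```

The importer would run this before resuming and refuse to continue if the list is non-empty.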

let me think about it ;)


martin


Re: Storing state in $GIT_DIR

2005-08-25 Thread Junio C Hamano
Martin Langhoff [EMAIL PROTECTED] writes:

 In other words, if you just have a last commit pointer in your
 meta-data, then git is _by_definition_ in sync. There's never anything to
 get out of sync, because objects aren't going to change.

 Hmmm. That repo is in sync, but there are no guarantees that they will
 travel together to a different repo. In fact, the push/pull
 infrastructure wants to push/pull one head at a time.

Wrong as of last week ;-), and definitely wrong since this morning.

 And if they are not in sync, I have no way of knowing. Hmpf. I lie:
 the arch metadata could keep track of what it expects the last head
 commits to be, and complain bitterly if something smells rotten.

What Linus suggests is doable by using an object that can hold
a pointer to at least one commit---you use that to record the
head commit of the corresponding git branch that the arch
metainfo represents.

You only pull arch metainfo branch; the objects associated with
the corresponding git branch head will be pulled together when
you pull it.  You do not have to tell git to pull git-part of
the commit chain.  There is no need to worry about version skew
when you use git this way.
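
The reason there is no skew can be sketched as a reachability walk: whatever the metainfo head points at is part of what gets pulled. This models the object store as a plain dict from object id to the ids it references; the names are illustrative, not git internals.

```python
def reachable(objects, start):
    """Return every object id reachable from start.

    objects maps an object id to the list of ids it points at
    (parents, trees, blobs, tagged objects).
    """
    seen = set()
    stack = [start]
    while stack:
        object_id = stack.pop()
        if object_id in seen:
            continue
        seen.add(object_id)
        stack.extend(objects[object_id])
    return seen
```

If the metainfo head lists the git branch head among its pointers, the git commit chain is in the reachable set, so it travels with the metainfo.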

Now, among the existing object types, there are only two kinds
of objects you can use for this.  If the only thing you need to
record is some textual information with one pointer to git
branch head, then you can use tag that points at the git head,
and store everything else as the tag comment.  This is doable
but unwieldy.

You could abuse a commit object as well; you store commit
objects (such as the corresponding git branch head) as parent
commits, and put everything else in a tree that is associated
with that commit.
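
A sketch of what such an abused commit could look like at the byte level, assuming the usual commit layout (a tree line, parent lines, author and committer lines, a blank line, then the message). The function name, identity string and timestamp are made up for illustration.

```python
import hashlib

def build_metadata_commit(tree_id, tracked_heads, message,
                          ident="Arch Importer <importer@example.invalid>",
                          when="1125014400 +0000"):
    """Build the raw body of a commit whose parents are the tracked heads.

    tree_id names a tree holding the arch metadata files; tracked_heads
    are the git commits the metadata refers to. Returns (object id, body).
    """
    lines = ["tree " + tree_id]
    lines += ["parent " + head for head in tracked_heads]
    lines.append("author %s %s" % (ident, when))
    lines.append("committer %s %s" % (ident, when))
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    # the object id is the SHA-1 of "commit <size>\0" plus the body
    object_id = hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()
    return object_id, body
```

Since the tracked heads are ordinary parents, anything that walks or transfers the metadata commit also reaches them.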




Re: Storing state in $GIT_DIR

2005-08-25 Thread Junio C Hamano
Linus Torvalds [EMAIL PROTECTED] writes:

 That kind of extension shouldn't be too hard, and might make tags much 
 more generally usable (ie you could say "I sign these n official 
 releases" or something).

Well, I admit that once I advocated changing tag to bag, but
one problem is how you would dereference something like that.

v0.99.5^0 means look at the named object v0.99.5, dereference
it repeatedly until you get a non-tag, and take the result,
which had better be a commit.  If a tag can contain more than
one pointer, I do not know what it means.
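
The dereferencing rule can be sketched as a loop over a toy object store, here a dict of dicts with "type" and "target" keys (an assumed model, not git's storage). The loop relies on each tag having exactly one target, which is where a multi-pointer "bag" would leave the next step undefined.

```python
def peel(objects, name):
    """Follow tag objects until a non-tag object is reached."""
    obj = objects[name]
    while obj["type"] == "tag":
        # a tag has exactly one target, so the next step is unambiguous
        obj = objects[obj["target"]]
    return obj

def resolve_caret_zero(objects, name):
    """Model of <name>^0: peel tags, then insist on a commit."""
    obj = peel(objects, name)
    if obj["type"] != "commit":
        raise ValueError("%s does not peel to a commit" % name)
    return obj
```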




