Re: Use case (was Re: Should branches be objects?)

2014-06-25 Thread Junio C Hamano
Nico Williams n...@cryptonector.com writes:

 On Tue, Jun 24, 2014 at 6:09 AM, Theodore Ts'o ty...@mit.edu wrote:
 ...
 This seems pretty close to what we have with signed tags.  When I send
 a pull request to Linus, I create a signed tag which contains a
 message about a set of commits, and this message is automatically
 included in the pull request message generated with git
 request-pull, and when Linus merges my pull request, the
 cryptographically signed tag, along with the message, date of the
 signature, etc., is preserved for all posterity.

 Thanks for pointing this out.  Signed tags are objects -- that's a
 clear and strong precedent..

Sounds as if you are interpreting what Ted said as a supporting
argument for having branches as a separate type of object, but the
way I read it was "signed tags are sufficient for what you want to
do; adding a new branch type does not make much sense at this
point".
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Use case (was Re: Should branches be objects?)

2014-06-25 Thread Theodore Ts'o
On Wed, Jun 25, 2014 at 10:42:49AM -0700, Junio C Hamano wrote:
 Nico Williams n...@cryptonector.com writes:
 
  On Tue, Jun 24, 2014 at 6:09 AM, Theodore Ts'o ty...@mit.edu wrote:
  ...
  This seems pretty close to what we have with signed tags.  When I send
  a pull request to Linus, I create a signed tag which contains a
  message about a set of commits, and this message is automatically
  included in the pull request message generated with git
  request-pull, and when Linus merges my pull request, the
  cryptographically signed tag, along with the message, date of the
  signature, etc., is preserved for all posterity.
 
  Thanks for pointing this out.  Signed tags are objects -- that's a
  clear and strong precedent..
 
 Sounds as if you are interpreting what Ted said as a supporting
 argument for having branches as a separate type of object, but the
 way I read it was "signed tags are sufficient for what you want to
 do; adding a new branch type does not make much sense at this
 point".

Yes, that's what I was saying.  If you want to record a reliable "who
pushed this" (or "who requested this to be pulled"), you really want
to use a GPG signature, since otherwise the identity of the pusher can
be completely faked --- especially if you have a tiered system
where you have sub-maintainers in the mix.  So if you want any kind of
auditability long after the fact, you want digital signatures, and so
a signed tag maps exactly to what you want --- modulo needing a
standardized Linus Torvalds bot.  But the nice thing about creating
such an automated pull request processing system is that it doesn't
require making any changes to core git.

If you insist that it has to be done via a git push, I suspect it
wouldn't be that hard to add changes to Gerrit (which already has a
concept of access control over which ssh keys are allowed to push a
change), and extend it to include a hook that validates whether the
push included a signed tag.  Again, no core changes needed to git, or
to the repository format.
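
A rough sketch of the kind of server-side check described above, written as a plain git pre-receive hook rather than as a Gerrit extension.  The function name check_push and the VERIFY variable are illustrative assumptions, not existing git or Gerrit interfaces.  "git verify-tag" only confirms the signature against the server's GPG keyring, so mapping signers to an access control list would still have to be added.

```shell
#!/bin/sh
# Sketch only: reject pushed tags that do not carry a verifiable signature.
# VERIFY is a hypothetical knob so the verifier can be swapped out; by
# default it is "git verify-tag", which fails for unsigned tags and for
# refs that point at non-tag objects.
VERIFY=${VERIFY:-git verify-tag}
# All-zero object id (assumes a SHA-1 repository, 40 hex digits).
ZERO=0000000000000000000000000000000000000000

# Reads "<old-id> <new-id> <refname>" lines, the format git feeds to a
# pre-receive hook, and returns non-zero if any pushed tag fails to verify.
check_push() {
    status=0
    while read -r old new ref; do
        case "$ref" in
        refs/tags/*)
            # An all-zero new id means the ref is being deleted; skip it.
            [ "$new" = "$ZERO" ] && continue
            if ! $VERIFY "$new" >/dev/null 2>&1; then
                echo "rejected $ref: not a verifiable signed tag" >&2
                status=1
            fi
            ;;
        esac
    done
    return $status
}
```

Installed as hooks/pre-receive, the script would end with a bare check_push call so that it reads the ref updates git passes on standard input.  This only vets tags that are pushed; requiring every push to carry one is a further policy decision.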

- Ted


Re: Use case (was Re: Should branches be objects?)

2014-06-24 Thread John Keeping
On Mon, Jun 23, 2014 at 10:20:14PM -0500, Nico Williams wrote:
 The Illumos repo, like OpenSolaris before it, and Solaris itself at
 Sun (and now at Oracle) requires that fixes be broken down into small
 commits, with related fixes, tests, and docs changes all typically in
 separate commits, but all pushed together, so that a single push of N
 commits is a logical set of changes (e.g., to be backed out together
 if, say, any one of them breaks a build).  With git the only way to
 record this grouping at push time is with a post-receive hook that
 does the recording (which is what the Illumos repo does, sending email
 to a list about all the commits pushed in one go).

Have you considered using merges for this instead?  If each set of
related changes is its own branch, then if you merge with `--no-ff` so
that a merge commit is always created, you can identify the set of
related changes with:

git log ${MERGE_COMMIT}^1..${MERGE_COMMIT}^2

There are some interesting effects with reverting merge commits,
particularly if you want to merge the same set of changes at a later
date, but this seems like the Git way of identifying related commits.


Re: Use case (was Re: Should branches be objects?)

2014-06-24 Thread Theodore Ts'o
On Mon, Jun 23, 2014 at 10:20:14PM -0500, Nico Williams wrote:
 
 Now, suppose that branches were objects.  Then at push time one might
 push with a message about the set of commits being pushed, and this
 message (and time of push, and pusher ID) would get recorded in the
 branch object.  At fetch time the branch objects' histories would be
 pulled (but usually never pushed), and would be available for browsing
 with git log at remotes/remote/branch.  Each commit of the branch
 object (as it were) would record each logical set of commits.

This seems pretty close to what we have with signed tags.  When I send
a pull request to Linus, I create a signed tag which contains a
message about a set of commits, and this message is automatically
included in the pull request message generated with git
request-pull, and when Linus merges my pull request, the
cryptographically signed tag, along with the message, date of the
signature, etc., is preserved for all posterity.
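
The tagging side of this workflow might look roughly like the sketch below.  The helper names tag_for_pull and tag_subject and the TAG_MODE variable are made up for this example; only "git tag", "git for-each-ref" and "git request-pull" are real commands.

```shell
#!/bin/sh
# Sketch of creating a tag that carries a message about a set of commits.
# TAG_MODE is a made-up knob for this example: "-s" (the default) creates a
# GPG-signed tag and needs a configured signing key; "-a" creates an
# unsigned annotated tag, which is enough to try the commands out.
TAG_MODE=${TAG_MODE:--s}

# $1 = tag name, $2 = message describing the set of commits being offered
tag_for_pull() {
    git tag "$TAG_MODE" -m "$2" "$1" HEAD
}

# Print the first line of the message stored in the tag object itself.
tag_subject() {
    git for-each-ref --format='%(contents:subject)' "refs/tags/$1"
}

# The pull request text would then come from something like:
#   git request-pull <start> <url> <end>
# with the tag name given as <end>.
```

Because the message lives in the tag object rather than in a ref or a log, it travels with the tag wherever the tag is fetched.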

 Problem: if pushing via an intermediary the push metadata would get
 lost.  This would argue for either a stronger still notion of related
 commits, or none stronger than what exists now (because ETOOMUCH).
 But this branch object concept could also be just right: if pushing
 through an intermediary (what at Sun was called a project gate) then
 it becomes that intermediary's (gatekeeper's) job to squash, rebase,
 regroup, edit, drop, reword, ... commits.

With signed tags, the metadata is preserved even when the set of
commits is sent via an intermediary.

It seems the major difference is that it's a pull model, where some
projects seem much happier with a push model.  But that sounds like
what is needed is that someone replaces Linus Torvalds with a shell
script --- namely, an e-mail bot that receives pull requests, checks
the signed tag against an access control list, and if it is an
authorized committer, accepts the pull request automatically (or
rejects it if there are merge conflicts).

Not that I am suggesting for even a second that Linus could be fully
replaced by a shell script.  For example, he handles trivial merge
conflicts, and more importantly, applies an "oh my G*d you must be
kidding" taste filter on incoming pull requests, which I think would
be hard to automate.  Then again, neural networks have automatically
evolved to recognize cat videos, so we can't rule it out in the
future.  :-)

Cheers,

- Ted


Re: Use case (was Re: Should branches be objects?)

2014-06-24 Thread Nico Williams
On Tue, Jun 24, 2014 at 6:09 AM, Theodore Ts'o ty...@mit.edu wrote:

 On Mon, Jun 23, 2014 at 10:20:14PM -0500, Nico Williams wrote:
 
  Now, suppose that branches were objects.  Then at push time one might
  push with a message about the set of commits being pushed, and this
  message (and time of push, and pusher ID) would get recorded in the
  branch object.  At fetch time the branch objects' histories would be
  pulled (but usually never pushed), and would be available for browsing
  with git log at remotes/remote/branch.  Each commit of the branch
  object (as it were) would record each logical set of commits.

 This seems pretty close to what we have with signed tags.  When I send
 a pull request to Linus, I create a signed tag which contains a
 message about a set of commits, and this message is automatically
 included in the pull request message generated with git
 request-pull, and when Linus merges my pull request, the
 cryptographically signed tag, along with the message, date of the
 signature, etc., is preserved for all posterity.

Thanks for pointing this out.  Signed tags are objects -- that's a
clear and strong precedent..  That's another thing that branches as
objects could have: signatures of pushed commits (separately from the
commits themselves).

 It seems the major difference is that it's a pull model, where some
 projects seem much happier with a push model.  But that sounds like
 what is needed is that someone replaces Linus Torvalds with a shell
 script --- namely, an e-mail bot that receives pull requests, checks
 the signed tag against an access control list, and if it is an
 authorized committer, accepts the pull request automatically (or
 rejects it if there are merge conflicts).

Shell script, protocol..  The git push protocol is convenient.  The
fact that git supports patches-via-email, push, and pull models
is a great aspect of git.  Why disadvantage the push case, when
it's so popular (e.g., via github and such)?

Nico


Use case (was Re: Should branches be objects?)

2014-06-23 Thread Nico Williams
(thinking more about this, digesting Jonathan's response...)

The Illumos repo, like OpenSolaris before it, and Solaris itself at
Sun (and now at Oracle) requires that fixes be broken down into small
commits, with related fixes, tests, and docs changes all typically in
separate commits, but all pushed together, so that a single push of N
commits is a logical set of changes (e.g., to be backed out together
if, say, any one of them breaks a build).  With git the only way to
record this grouping at push time is with a post-receive hook that
does the recording (which is what the Illumos repo does, sending email
to a list about all the commits pushed in one go).

Now, suppose that branches were objects.  Then at push time one might
push with a message about the set of commits being pushed, and this
message (and time of push, and pusher ID) would get recorded in the
branch object.  At fetch time the branch objects' histories would be
pulled (but usually never pushed), and would be available for browsing
with git log at remotes/remote/branch.  Each commit of the branch
object (as it were) would record each logical set of commits.

Side effects besides addressing the contiguous and related commit grouping need:

 - no more need to sign-off on cherry-picks: the branch will record
the pusher's ID, which can then be taken as the person signing off;

 - branch objects substantially replace/augment reflogs;

 - no need to amend commits: just push an empty set of commits just
to update the branch object with a note!

The UI would mostly consist of an option to git push to include a push
message, and a way to review branch history (much like git log -g, but
with access to the push-time metadata).  Also along for the ride: a
way to get the new metadata in post-receive hooks.

Problem: if pushing via an intermediary the push metadata would get
lost.  This would argue for either a stronger still notion of related
commits, or none stronger than what exists now (because ETOOMUCH).
But this branch object concept could also be just right: if pushing
through an intermediary (what at Sun was called a project gate) then
it becomes that intermediary's (gatekeeper's) job to squash, rebase,
regroup, edit, drop, reword, ... commits.

Just a thought,

Nico


Should branches be objects?

2014-06-19 Thread Nico Williams
[I'm a list newbie here, but a git power user.]

If branches were objects...

 - one could see the history of branches, including

 - how commits were grouped when pushed/pulled (push 5 commits, and
the branch object will record that its head moved by those five
commits at once)

 - rebase history (git log branch-object - better than git reflog!)

 - object transactional APIs would be used to update branches

Branch objects might be purely local, recording what was done in a
local repo to a branch, but they might be pullable, to make branch
history viewable in clones.

Just a thought,

Nico


Re: Should branches be objects?

2014-06-19 Thread Jonathan Nieder
Hi,

Nico Williams wrote:

  - one could see the history of branches, including

Interesting.  'git log -g' is good for getting that information
locally, but the protocol doesn't have a way to get it from a remote
server so you have to ssh in.  Ronnie (cc-ed) and I were talking
recently about whether it would make sense to update git protocol to
have a way to get at the remote reflogs more easily --- would that be
useful to you?

  - how commits were grouped when pushed/pulled (push 5 commits, and
 the branch object will record that its head moved by those five
 commits at once)

The reflog on the server (if enabled) records this.

  - rebase history (git log branch-object - better than git reflog!)

The local reflog ('git log -g branch') records this.

  - object transactional APIs would be used to update branches

Ronnie's recent ref-transaction code does this.
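
The point about the server-side reflog can be illustrated with a local bare repository standing in for the server.  This is a sketch under assumptions: demo_server_reflog and the branch name "topic" are invented, and the log is read through direct filesystem access, which is exactly the limitation mentioned above for real remotes.

```shell
#!/bin/sh
# Sketch: a push of several commits lands as one reflog entry on the
# receiving side.
demo_server_reflog() (
    set -e
    export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.invalid
    export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.invalid
    top=$(mktemp -d)

    # "Server": a bare repository.  Reflogs are off by default in bare
    # repositories, so turn them on explicitly.
    git init -q --bare "$top/server.git"
    git -C "$top/server.git" config core.logAllRefUpdates true

    # "Client": two commits, pushed in one go.
    git init -q "$top/work"
    cd "$top/work"
    git commit -q --allow-empty -m "first"
    git commit -q --allow-empty -m "second"
    git push -q "$top/server.git" HEAD:refs/heads/topic

    # One line per ref update recorded on the server.
    git -C "$top/server.git" log -g --format='%gd: %gs' topic
    cd /
    rm -rf "$top"
)
```

Two commits were pushed, but the branch moved once, so the reflog holds a single entry for it.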

Thanks and hope that helps,
Jonathan


Re: Should branches be objects?

2014-06-19 Thread Nico Williams
On Thu, Jun 19, 2014 at 6:46 PM, Jonathan Nieder jrnie...@gmail.com wrote:
 Nico Williams wrote:

  - one could see the history of branches, including

 Interesting.  'git log -g' is good for getting that information
 locally, but the protocol doesn't have a way to get it from a remote
 server so you have to ssh in.  Ronnie (cc-ed) and I were talking
 recently about whether it would make sense to update git protocol to
 have a way to get at the remote reflogs more easily --- would that be
 useful to you?

Yes and no.  I've thought about that same concept, but:

a) reflogs include information about what's done to the workspace
(checkout...) that's not relevant to any branch,

b) reflogs aren't objects, which ISTM has caused transactional issues
(even if they are fixed or soon to be),

c) the fewer kinds of things, the more elegant the design, so maybe
reflogs ought to be objects themselves, which is one thought that led
me to branches should be objects.

Another thought that led me there is that I often do:

$ git checkout -b ${branch}-rebase1
$ git rebase -i master
...
$ git checkout -b ${branch}-rebase2
$ git rebase -i master
...

I iterate through this until a set of commits is the way the upstream wants it.

No one really needs that history, except me: possibly to show my
boss/customer, possibly to put together a list of changes I've done to
show the upstream maintainer, ...   Yes, this is in the reflog, but...
it's mixed up with unrelated stuff.

Also, I'd like to be able to git diff
branch-version..same-branch-diff-branch-version.  Again, for my
own purposes in collating changes I've done to previously submitted
PRs.

Now, I can do that as I always have, but it litters my branch namespace.

Lastly, there are people who just don't get rebasing.  They think it's
horrible because it changes the truth.  You've met them, I'm certain.
Branches as objects might help mollify them.

  - how commits were grouped when pushed/pulled (push 5 commits, and
 the branch object will record that its head moved by those five
 commits at once)

 The reflog on the server (if enabled) records this.

Yeah, though as you point out I can't see it.

  - rebase history (git log branch-object - better than git reflog!)

 The local reflog ('git log -g branch') records this.

See above.

  - object transactional APIs would be used to update branches

 Ronnie's recent ref-transaction code does this.

Speaking of which: are there any power failure corruption cases left
in git?  How is this tested?

Nico


Re: Should branches be objects?

2014-06-19 Thread Nico Williams
Another thing is that branches as objects could store a lot more
information, like:

 - the merge-base and HEAD for a rebase (and the --onto)

 - the interactive rebase plan!  (and diffs to what would have been
the non-interactive plan)

 - the would-be no-op non-interactive rebase plan post rebase (again,
to elucidate what commit splitting and such things occurred during a
rebase)

Nico


Re: Should branches be objects?

2014-06-19 Thread Jonathan Nieder
Nico Williams wrote:

 a) reflogs include information about what's done to the workspace
 (checkout...) that's not relevant to any branch,

Nope, reflogs just record changes to refs and information about why
they happened.

 b) reflogs aren't objects, which ISTM has caused transactional issues
 (even if they are fixed or soon to be),

Not sure I understand.  Do you mean that if reflogs were named by their
content then they wouldn't need to be renamed when a ref is renamed?
Or are you referring to some other atomicity issue?

[...]
 $ git checkout -b ${branch}-rebase1
 $ git rebase -i master
 ...
 $ git checkout -b ${branch}-rebase2
 $ git rebase -i master
 ...

 I iterate through this until a set of commits is the way the upstream wants it.

 No one really needs that history, except me: possibly to show my
 boss/customer, possibly to put together a list of changes I've done to
 show the upstream maintainer, ...   Yes, this is in the reflog, but...
 it's mixed up with unrelated stuff.

Yes, this isn't something we do well at all.  It would be nice to have a
tool that can take two versions of a branch (from different refs, taken
from the reflog, or whatever) and visually represent what happened to
corresponding commits.

Thomas Rast started work on such a thing called tbdiff, which you can
find at https://github.com/trast/tbdiff.

[...]
 Also, I'd like to be able to git diff
 branch-version..same-branch-diff-branch-version.  Again, for my
 own purposes in collating changes I've done to previously submitted
 PRs.

Do you mean 'git diff mybranch mybranch@{3}' /
'git diff mybranch mybranch@{3.days.ago}'?
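
A small sketch of how the reflog syntax above can be wrapped for comparing branch versions.  The helper names branch_history and diff_against_earlier are illustrative only; the underlying commands are plain "git log -g" and "git diff".

```shell
#!/bin/sh
# Sketch: inspect where a branch has pointed, and diff against an earlier
# position using reflog syntax.

# List the positions recorded for branch $1, newest first.
branch_history() {
    git log -g --format='%gd %h %gs' "$1"
}

# Diff from where branch $1 pointed $2 updates ago to where it points now.
diff_against_earlier() {
    git diff "$1@{$2}" "$1"
}
```

For example, "diff_against_earlier mybranch 3" runs "git diff mybranch@{3} mybranch", i.e. old to new, the reverse direction of the commands quoted above.  Reflog entries are local and expire eventually, so this only works for versions the local repository still remembers.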

[...]
  - object transactional APIs would be used to update branches

 Ronnie's recent ref-transaction code does this.

 Speaking of which: are there any power failure corruption cases left
 in git?  How is this tested?

What kind of power failure corruption are you talking about?  Git
usually updates files by writing a completely new file and then
renaming it into place, so depending on your filesystem this means it
is very hard or very easy to lose data with a power failure. :)

If you're on one of those filesystems where it is very easy and you
lose power a lot, you'll probably want to enable the
core.fsyncobjectfiles configuration option.  It might be worth adding
another knob like that for the other files git writes if someone is
interested.
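
For reference, enabling that option for a single repository could look like the sketch below.  enable_loose_fsync is a hypothetical helper name; newer Git releases document a broader core.fsync setting that supersedes this option, so check the git-config manual for the version in use.

```shell
#!/bin/sh
# Sketch: turn on fsync for loose object files in one repository.
# $1 is the path to the repository.
enable_loose_fsync() {
    git -C "$1" config core.fsyncObjectFiles true
}
```

This covers loose object files only; it does not change how refs or other files are written.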

Jonathan


Re: Should branches be objects?

2014-06-19 Thread Jeff King
On Thu, Jun 19, 2014 at 06:01:47PM -0700, Jonathan Nieder wrote:

  Speaking of which: are there any power failure corruption cases left
  in git?  How is this tested?
 
 What kind of power failure corruption are you talking about?  Git
 usually updates files by writing a completely new file and then
 renaming it into place, so depending on your filesystem this means it
 is very hard or very easy to lose data with a power failure. :)

We use git-core on ext4 at GitHub, and we certainly have seen our share
of machines failing unexpectedly. We haven't seen any problems of this
nature[1] (but note that we journal data writes; you should also be fine
with ordered data writes, but data=writeback is likely disastrous).

 If you're on one of those filesystems where it is very easy and you
 lose power a lot, you'll probably want to enable the
 core.fsyncobjectfiles configuration option.  It might be worth adding
 another knob like that for the other files git writes if someone is
 interested.

You probably know this already Jonathan, but to be clear:

Git always fsyncs pack writes. That knob controls fsyncing of loose
object files, but nothing else. So ref writes (and writing packed-refs)
could be corrupted on a filesystem that doesn't order data and metadata
writes (and there is currently no way to tell git to do otherwise).

My recommendation would be to steer clear of or reconfigure such systems,
but it also would not be very hard to add an optional fsync in those
cases.

-Peff

[1] We did have one case where after a crash packfiles would end up
corrupted, but it turned out to be bad RAM in a battery-backed RAID
card that was transparently caching (and losing) the writes.
There's not much git can do when fsync lies to it, nor much the
kernel can do when the hardware lies to it. :)