Re: [PATCH v4] doc/SubmittingPatches: correct subject guidance

2017-11-10 Thread Josh Triplett
On Fri, Nov 10, 2017 at 03:02:50PM +, Adam Dinwoodie wrote:
> The examples and common practice for adding markers such as "RFC" or
> "v2" to the subject of patch emails is to have them within the same
> brackets as the "PATCH" text, not after the closing bracket.  Further,
> the practice of `git format-patch` and the like, as well as what appears
> to be the more common pratice on the mailing list, is to use "[RFC
> PATCH]", not "[PATCH/RFC]".
> 
> Update the SubmittingPatches article to match and to reference the
> `format-patch` helper arguments, and also make some minor text
> clarifications in the area.
> 
> Signed-off-by: Adam Dinwoodie <a...@dinwoodie.org>
> Helped-by: Eric Sunshine <sunsh...@sunshineco.com>

This looks great! Thank you for updating this documentation.

Reviewed-by: Josh Triplett <j...@joshtriplett.org>

> ---
> 
> Notes:
> Changes since v3:
> - Clarified meaning of "RFC" per Eric's suggestion
> - Made the impact of --subject-prefix and friends clearer per Eric's
>   suggestion
> 
> Thank you for your nitpicking, Eric, it's useful and very much
> appreciated :)
> 
>  Documentation/SubmittingPatches | 19 ---
>  1 file changed, 12 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 558d465b6..89f239071 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -184,21 +184,26 @@ lose tabs that way if you are not careful.
>  
>  It is a common convention to prefix your subject line with
>  [PATCH].  This lets people easily distinguish patches from other
> -e-mail discussions.  Use of additional markers after PATCH and
> -the closing bracket to mark the nature of the patch is also
> -encouraged.  E.g. [PATCH/RFC] is often used when the patch is
> -not ready to be applied but it is for discussion, [PATCH v2],
> -[PATCH v3] etc. are often seen when you are sending an update to
> -what you have previously sent.
> +e-mail discussions.  Use of markers in addition to PATCH within
> +the brackets to describe the nature of the patch is also
> +encouraged.  E.g. [RFC PATCH] (where RFC stands for "request for
> +comments") is often used to indicate a patch needs further
> +discussion before being accepted, [PATCH v2], [PATCH v3] etc.
> +are often seen when you are sending an update to what you have
> +previously sent.
>  
> -"git format-patch" command follows the best current practice to
> +The "git format-patch" command follows the best current practice to
>  format the body of an e-mail message.  At the beginning of the
>  patch should come your commit message, ending with the
>  Signed-off-by: lines, and a line that consists of three dashes,
>  followed by the diffstat information and the patch itself.  If
>  you are forwarding a patch from somebody else, optionally, at
>  the beginning of the e-mail message just before the commit
>  message starts, you can put a "From: " line to name that person.
> +To change the default "[PATCH]" in the subject to "[]", use
> +`git format-patch --subject-prefix=`.  As a shortcut, you
> +can use `--rfc` instead of `--subject-prefix="RFC PATCH"`, or
> +`-v ` instead of `--subject-prefix="PATCH v"`.
>  
>  You often want to add additional explanation about the patch,
>  other than the commit message itself.  Place such "cover letter"
> -- 
> 2.14.3
> 


Re: [RFC PATCH 2/4] change submodule push test to use proper repository setup

2017-10-11 Thread Josh Triplett
On Tue, Oct 10, 2017 at 11:39:21AM -0700, Stefan Beller wrote:
> On Tue, Oct 10, 2017 at 6:03 AM, Heiko Voigt <hvo...@hvoigt.net> wrote:
> > but in the long run my goal
> > for submodules is and always was: Make them behave as close to files as
> > possible. And why should a 'git add submodule' not magically do
> > everything it can to make submodules just work? I can look into a patch
> > for that if people agree here...
> 
> I'd love to see this implemented. I cc'd Josh (the author of git-series), who
> may disagree with this, or has some good input how to go forward without
> breaking git-series.

git-series doesn't use the git-submodule command at all, nor does it
construct series trees using git-add or any other git command-line tool;
it constructs gitlinks directly. Most of the time, it doesn't even make
sense to `git checkout` a series branch. Modifying commands like git-add
and similar to automatically manage .gitmodules won't cause any issue at
all, as long as git itself doesn't start rejecting or complaining about
repositories that have gitlinks without a .gitmodules file.

- Josh Triplett


Re: Regarding "git log" on "git series" metadata

2016-11-07 Thread Josh Triplett
On Mon, Nov 07, 2016 at 04:42:04PM +0700, Duy Nguyen wrote:
> On Mon, Nov 7, 2016 at 8:18 AM, Josh Triplett <j...@joshtriplett.org> wrote:
> > Once we have gitrefs, you have both alternatives: reachable (gitref) or
> > not reachable (gitlink).
> >
> > However, if you want some way to mark reachable objects as not
> > reachable, such as for a sparse checkout, external large-object storage,
> > or similar, then you can use a single unified mechanism for that whether
> > working with gitrefs, trees, or blobs.
> 
> How? Whether an object reachable or not is baked in the definition (of
> either gitlink or gitref). I don't think you can have a "maybe
> reachable" type then rely on an external source to determine
> reachability,

You'd have various "reachable by default" entries in trees, including
trees, blobs, and gitrefs, and then have an external mechanism (likely
via .git/config) to say "ignore objects with these hashes/paths".  For
instance, you might say "ignore all objects only reachable from the path
'assets/video/*' within a commit's tree".  With the right set of client
and server extensions, you could then avoid downloading those objects.


Re: Regarding "git log" on "git series" metadata

2016-11-06 Thread Josh Triplett
On Sun, Nov 06, 2016 at 12:17:10PM -0800, Jacob Keller wrote:
> On Sun, Nov 6, 2016 at 9:33 AM, Josh Triplett <j...@joshtriplett.org> wrote:
> > On Sun, Nov 06, 2016 at 09:14:56AM -0800, Junio C Hamano wrote:
> >> Josh Triplett <j...@joshtriplett.org> writes:
> >> > We could, but if we (or one of the many third-party git implementations)
> >> > miss a case, gitlinks+reachability may appear to work in many cases with
> >> > dataloss afterward, while gitrefs will fail early and not appear
> >> > functional.
> >>
> >> I wonder what happens if we do not introduce the "gitref" but
> >> instead change the behaviour of "gitlink" to imply an optional
> >> reachability.  That is, when enumerating what is reachable in your
> >> repository, if you see a gitlink and if you notice that you locally
> >> have the target of that gitlink, you follow, but if you know you
> >> lack it, you do not error out.  This may be making things too
> >> complex to feasibily implement by simplify them ;-) and I see a few
> >> immediate fallout that needs to be thought through (i.e. downsides)
> >> and a few upsides, too.  I am feeling feverish and not thinking
> >> straight, so I won't try to weigh pros-and-cons.
> >>
> >> This would definitely need protocol extension when transferring
> >> objects across repositories.
> >
> > It'd also need a repository format extension locally.  Otherwise, if you
> > ever touched that repository with an older git (or a tool built on an
> > older libgit2 or JGit or other library), you could lose data.
> >
> > It does seem conceptually appealing, though.  In an ideal world, the
> > original version of gitlink would have had opt-out reachability (and
> > .gitmodules with an external repository reference could count as opting
> > out).
> >
> > But I can't think of any case where it's OK for a git implementation to
> > not know about this reachability extension and still operate on the
> > gitlink.  And given that, it might as well use a new object type that
> > the old version definitely won't think it understands.
> >
> > - Josh Triplett
> 
> That's still only true if the receiving end runs fsck, isn't it? I
> suppose that's a large number of receivers, and at least there are
> ways post-push to determine that objects don't make sense to that
> version of git.
> 
> I think using a new mode is the safest way, and it allows easily
> implementing RefTrees as well as other projects. Additionally, if we
> *wanted* additional "opt-in / opt-out" support we could add this by
> default to gitrefs,and they could (possibly) replace gitlinks in the
> future?

Once we have gitrefs, you have both alternatives: reachable (gitref) or
not reachable (gitlink).

However, if you want some way to mark reachable objects as not
reachable, such as for a sparse checkout, external large-object storage,
or similar, then you can use a single unified mechanism for that whether
working with gitrefs, trees, or blobs.


Re: Regarding "git log" on "git series" metadata

2016-11-06 Thread Josh Triplett
On Sun, Nov 06, 2016 at 09:14:56AM -0800, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> > We could, but if we (or one of the many third-party git implementations)
> > miss a case, gitlinks+reachability may appear to work in many cases with
> > dataloss afterward, while gitrefs will fail early and not appear
> > functional.
> 
> I wonder what happens if we do not introduce the "gitref" but
> instead change the behaviour of "gitlink" to imply an optional
> reachability.  That is, when enumerating what is reachable in your
> repository, if you see a gitlink and if you notice that you locally
> have the target of that gitlink, you follow, but if you know you
> lack it, you do not error out.  This may be making things too
> complex to feasibily implement by simplify them ;-) and I see a few
> immediate fallout that needs to be thought through (i.e. downsides)
> and a few upsides, too.  I am feeling feverish and not thinking
> straight, so I won't try to weigh pros-and-cons.  
> 
> This would definitely need protocol extension when transferring
> objects across repositories.

It'd also need a repository format extension locally.  Otherwise, if you
ever touched that repository with an older git (or a tool built on an
older libgit2 or JGit or other library), you could lose data.

It does seem conceptually appealing, though.  In an ideal world, the
original version of gitlink would have had opt-out reachability (and
.gitmodules with an external repository reference could count as opting
out).

But I can't think of any case where it's OK for a git implementation to
not know about this reachability extension and still operate on the
gitlink.  And given that, it might as well use a new object type that
the old version definitely won't think it understands.

- Josh Triplett


Re: Regarding "git log" on "git series" metadata

2016-11-06 Thread Josh Triplett
On Sat, Nov 05, 2016 at 09:50:07PM -0700, Jacob Keller wrote:
> On Sat, Nov 5, 2016 at 1:25 PM, Josh Triplett <j...@joshtriplett.org> wrote:
> > On Sat, Nov 05, 2016 at 09:21:58PM +0100, Christian Couder wrote:
> >> On Sat, Nov 5, 2016 at 4:18 PM, Josh Triplett <j...@joshtriplett.org> 
> >> wrote:
> >> > On Sat, Nov 05, 2016 at 01:45:27PM +0100, Christian Couder wrote:
> >> >> And with what Peff says above it looks like we will need ways
> >> >> configure and tweak commit reachability with gitlink/gitref anyway. So
> >> >> the point of gitref compared to gitlink would be that they just have a
> >> >> different reachability by default. But couldn't that be replaced by a
> >> >> default rule saying that when a gitlink is reached "this way or that
> >> >> way" then the commit reachability should be enforced, and otherwise it
> >> >> should not be?
> >> >
> >> > Any version of git unaware of that rule, though, would consider objects
> >> > only reachable by gitlink as unreachable and delete them, causing data
> >> > loss.  Likewise for a server not aware of that rule.  And a server
> >> > unaware of that rule would not supply those objects to a client pulling
> >> > such a branch.
> >>
> >> Yeah, so you would really need an up-to-date server and client to
> >> store the git-series data.
> >> But anyway if we create a gitref object, you would also need
> >> up-to-date servers and clients to make it work.
> >
> > Agreed, but gitrefs have the advantage of failing safe, rather than
> > failing with dataloss.
> >
> > - Josh Triplett
> 
> Isn't the "failing safe" only true if the client disconnects when a
> server doesn't advertise "i understand gitrefs"? So couldn't we, as
> part of the rules for reachability advertise a capability that does a
> similar thing and fails safe as well?

We could, but if we (or one of the many third-party git implementations)
miss a case, gitlinks+reachability may appear to work in many cases with
dataloss afterward, while gitrefs will fail early and not appear
functional.


Re: Regarding "git log" on "git series" metadata

2016-11-05 Thread Josh Triplett
On Sat, Nov 05, 2016 at 09:21:58PM +0100, Christian Couder wrote:
> On Sat, Nov 5, 2016 at 4:18 PM, Josh Triplett <j...@joshtriplett.org> wrote:
> > On Sat, Nov 05, 2016 at 01:45:27PM +0100, Christian Couder wrote:
> >> And with what Peff says above it looks like we will need ways
> >> configure and tweak commit reachability with gitlink/gitref anyway. So
> >> the point of gitref compared to gitlink would be that they just have a
> >> different reachability by default. But couldn't that be replaced by a
> >> default rule saying that when a gitlink is reached "this way or that
> >> way" then the commit reachability should be enforced, and otherwise it
> >> should not be?
> >
> > Any version of git unaware of that rule, though, would consider objects
> > only reachable by gitlink as unreachable and delete them, causing data
> > loss.  Likewise for a server not aware of that rule.  And a server
> > unaware of that rule would not supply those objects to a client pulling
> > such a branch.
> 
> Yeah, so you would really need an up-to-date server and client to
> store the git-series data.
> But anyway if we create a gitref object, you would also need
> up-to-date servers and clients to make it work.

Agreed, but gitrefs have the advantage of failing safe, rather than
failing with dataloss.

- Josh Triplett


Re: Regarding "git log" on "git series" metadata

2016-11-05 Thread Josh Triplett
On Sat, Nov 05, 2016 at 01:45:27PM +0100, Christian Couder wrote:
> And with what Peff says above it looks like we will need ways
> configure and tweak commit reachability with gitlink/gitref anyway. So
> the point of gitref compared to gitlink would be that they just have a
> different reachability by default. But couldn't that be replaced by a
> default rule saying that when a gitlink is reached "this way or that
> way" then the commit reachability should be enforced, and otherwise it
> should not be?

Any version of git unaware of that rule, though, would consider objects
only reachable by gitlink as unreachable and delete them, causing data
loss.  Likewise for a server not aware of that rule.  And a server
unaware of that rule would not supply those objects to a client pulling
such a branch.

So I don't think "gitlink defined as reachable" quite works, unless we
make some other format-incompatible change that forces clients and
servers touching that gitlink to know about that reachability rule.  (In
the absence of a hack such as making the same commit a parent.)


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On November 4, 2016 7:48:17 PM MDT, Jeff King <p...@peff.net> wrote:
>On Fri, Nov 04, 2016 at 04:34:34PM -0700, Jacob Keller wrote:
>
>> > You might also want fallback rules for storing gitrefs on "old"
>servers
>> > (e.g., backfilling gitrefs you need if the server didn't them in
>the
>> > initial fetch). But I guess storing any gitrefs on such a server is
>> > inherently dangerous, because the server might prune them at any
>time.
>> 
>> Is it possible currently for a protocol extension to result in "oh
>the
>> server doesn't support this so I'm going to stop pushing"?
>
>Yes, it would be easy for the client to abort if the server fails to
>advertise a particular extension.

And the reverse (old client, new server) should work as well? 

- Josh Triplett



Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On Fri, Nov 04, 2016 at 04:37:34PM -0700, Jacob Keller wrote:
> On Fri, Nov 4, 2016 at 2:55 PM, Josh Triplett <j...@joshtriplett.org> wrote:
> > That said, I'd *love* to have gitrefs available, for a wide variety of
> > applications, and I can see an argument for introducing them and waiting
> > a few years for them to become universally available, similar to the
> > process gitlinks went through.
> >
> > But I'd also love to have a backward-compatible solution.
> >
> > - Josh Triplett
> 
> I think that you won't really find a backwards compatible solution
> other than something like automatically generating refs for each point
> of history. I know that gerrit does something like this by storing
> each version in "refs/changes/id/version" or something along those
> lines. I think this might actually be cleaner than your parent links
> hack, and could be used as a fallback for when gitrefs don't work,
> though you'd have to code exactly how to tell what to push to a
> repository when pushing a series?

I'm not sure what the advantage of that would be, and it would mean that
if you ever have one branch without pushing the other(s), you'd get
severe time-delated breakage due to pruning.  (And if you pushed the
series without the other ref(s), its history would look right but then
you couldn't access the underlying versions of the patch series.)

One of my design goals was to *not* need a special "git series push" or
"git series pull"; you should just be able to use git push and git pull,
and you can set up normal refspecs.

That said, I could fairly easily generate the existing format with
artificial parent refs for backward compatibility, and provide a way to
use the new gitref-based storage format if you know that all your
servers and clients can handle it.  I'm also open to other suggestions
for how to make such a transition while still working with every git
server and git client that exists today.


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On Fri, Nov 04, 2016 at 03:49:07PM -0400, Jeff King wrote:
> On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote:
> 
> > I agree with your assessment here. The main difficulty in implementing
> > gitrefs is to ensure that they actually do get picked up by
> > reachability checks to prevent dropping commits. I'm not sure how easy
> > this is, but I would much rather we go this route rather than
> > continuing along with the hack. This seems like the ideal solution,
> > since it solves the entire problem and doesn't need more hacks bolted
> > on.
> 
> I think the main complication is that the reachability rules are used
> during object transfer. So you'd probably want to introduce some
> protocol extension to say "I understand gitrefs", so that when one side
> says "I have sha1 X and its reachable objects", we know whether they are
> including gitrefs there. And likewise receivers with
> transfer.fsckObjects may complain about the new gitref tree mode
> (fortunately a new object type shouldn't be needed).
> 
> You might also want fallback rules for storing gitrefs on "old" servers
> (e.g., backfilling gitrefs you need if the server didn't them in the
> initial fetch). But I guess storing any gitrefs on such a server is
> inherently dangerous, because the server might prune them at any time.
> 
> So perhaps a related question is: how can gitrefs be designed such that
> existing servers reject them (rather than accepting the push and then
> later throwing away half the data). It would be easy to notice in the
> client during a push that we are sending gitrefs to a server which does
> not claim that capability. But it seems more robust if it is the server
> who decides "I will not accept these bogus objects".

This seems like the critical problem, here.  The parent hack I used in
git-series might be a hack, but it transparently works with old servers
and clients.  So, for instance, I can push a git-series ref to github,
with no changes required on github's part.  If git added gitrefs, and I
started using them in git-series, then that'd eliminate parent hack and
allow many standard git tools to work naturally on git-series commits
and history, but it'd also mean that people couldn't push git-series
commits to any server until that server updates git.

That said, I'd *love* to have gitrefs available, for a wide variety of
applications, and I can see an argument for introducing them and waiting
a few years for them to become universally available, similar to the
process gitlinks went through.

But I'd also love to have a backward-compatible solution.

- Josh Triplett


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On Fri, Nov 04, 2016 at 09:47:41PM +0100, Christian Couder wrote:
> On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamano  wrote:
> >
> > Imagine we invent a new tree entry type, "gitref", that is similar
> > to "gitlink" in that it can record a commit object name in a tree,
> > but unlike "gitlink" it does imply reachability.  And you do not add
> > phony parents to your commit object.  A tree that has "gitref"s in
> > it is about annotating the commits in the same repository (e.g. the
> > tree references two commits, "base" and "tip", to point into a slice
> > of the main history).  And it is perfectly sensible for such a
> > pointer to imply reachability---after all it serves different
> > purposes from "gitlink".
> 
> The more I think about this (and also about how to limit ref
> advertisements as recently discussed in
> https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/),
> the more I think about Shawn's RefTree:
> 
> https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/
> 
> Couldn't a RefTree be used to store refs that point to the base
> commit, the tip commit and the blob that contains the cover letter,
> and maybe also a ref pointing to the RefTree of the previous version
> of the series?

That's really interesting!  The Software Heritage project is working on
something similar, because they want to store all the refs as part of
their data model as well.  I'll point them to the reftree work.

If upstream git supported RefTree, I could potentially use that for
git-series.  However, I do want a commit message and history for the
series itself, and using refs in the reftree to refer to the parents
seems like abusing reftree to recreate commits, in a reversal of the
hack of using commit parents as a reftree. :)

What if, rather than storing a hash reference to a reftree as a single
reference and replacing it with no history, a reftree could be
referenced from a commit and have history?  (That would also allow
tagging a version of the reftree.)


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On Fri, Nov 04, 2016 at 10:57:09AM -0700, Junio C Hamano wrote:
> After your talk at LPC2016, I was thinking about your proposal to
> give an option to hide certain parents from "git log" traversal.
> 
> While I do not think we would terribly mind a new feature in the
> core to support third-party additions like "git series" better, I
> think this particular one is a big mistake that we shouldn't take.
[...]
> I think this is backwards.  The root cause of the issue you have
> with "gitk" is because you added something that is *NOT* a parent to
> your commit.  We shouldn't have to add a mechanism to filter
> something that shouldn't have been added there in the first place.
> 
> I am wondering if an alternative approach would work better.
> 
> Imagine we invent a new tree entry type, "gitref", that is similar
> to "gitlink" in that it can record a commit object name in a tree,
> but unlike "gitlink" it does imply reachability.  And you do not add
> phony parents to your commit object.  A tree that has "gitref"s in
> it is about annotating the commits in the same repository (e.g. the
> tree references two commits, "base" and "tip", to point into a slice
> of the main history).  And it is perfectly sensible for such a
> pointer to imply reachability---after all it serves different
> purposes from "gitlink".

I absolutely agree with this, and I'd love to have gitref or similar in
core git.  Given the availability of that mechanism, I'd love to use it
in git-series.  (And in git submodule, as well, for other projects.)

The one critical issue there, though: that would break backward
compatibility with old versions of git.  No old version of git could
push, pull, gc, repack, or otherwise touch a repository that used this
feature.

The advantages of the approach (viewing and manipulating the series with
pure git) seem sufficiently high to make that worth considering, but it
is a significant downside.

> Another alternative that I am negative about (but is probably a
> better hack than how you abused the "parent" link) might be to add a
> new commit object header field that behaves similarly to "parent"
> only in that it implies reachability.  But recording the extra
> parent in commit object was not something you wanted to do in the
> first place (i.e. your series processing is done solely on the
> contents of the tree, and you do not read this extra parent). If you
> need to add an in-tree reference to another commit in your future
> versions of "git series", with either this variant or your original
> implementation, you would end up needing adding more "parent" (or
> pseudo parent) only to preserve reachability.  At that point, I
> think it makes more sense to have entries in the tree to directly
> ensure reachability, if you want these entries to always point at an
> in-tree object.

This would similarly break compatibility with old git, as old git
wouldn't follow those reachability-only links from commits, so it could
throw away the data.

One approach compatible with old git would be to continue adding the
relevant commits as artificial parents, but have a separate commit
metadata field that says which parents to ignore; old git would then do
the right thing, as long as it doesn't rewrite the commit entirely.

That does have the same disadvantages of having to duplicate the
information in both the tree and the parent list, though; it's the same
class of hack, just with improved usability.  I'd much rather use
gitrefs.

> I am afraid that I probably am two steps ahead of myself, because I
> am reasonably sure that it is quite possible that I have overlooked
> something trivially obvious that makes the "gitref" approach
> unworkable.

gitref seems like a good idea to me, as long as we can sort out the
compatibility story.


Re: [PATCH] t4014-format-patch: Adjust git_version regex to better handle distro changes to DEF_VER

2016-10-09 Thread Josh Triplett
On October 9, 2016 7:53:23 PM PDT, Jeremy Huddleston Sequoia 
<jerem...@apple.com> wrote:
>Regressed-in: 480871e09ed2e5275b4ba16b278681e5a8c122ae
>Signed-off-by: Jeremy Huddleston Sequoia <jerem...@apple.com>
>CC: Josh Triplett <j...@joshtriplett.org>
>CC: Junio C Hamano <gits...@pobox.com>

Looks reasonable to me. Didn't realize git versions could have spaces.
Reviewed-by: Josh Triplett <j...@joshtriplett.org>

> t/t4014-format-patch.sh | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
>index 8d90a6e..33f6940 100755
>--- a/t/t4014-format-patch.sh
>+++ b/t/t4014-format-patch.sh
>@@ -754,7 +754,7 @@ test_expect_success 'format-patch
>--ignore-if-in-upstream HEAD' '
>   git format-patch --ignore-if-in-upstream HEAD
> '
> 
>-git_version="$(git --version | sed "s/.* //")"
>+git_version="$(git --version | sed "s/git version //")"
> 
> signature() {
>   printf "%s\n%s\n\n" "-- " "${1:-$git_version}"




Re: git 2.10.1 test regression in t4014-format-patch.sh

2016-10-09 Thread Josh Triplett
On October 9, 2016 5:15:22 PM PDT, Jeremy Huddleston Sequoia 
 wrote:
>Hey Josh,
>
>Hope you're doing well.
>
>I wanted to let you know that this patch of yours, which landed in git
>2.10.1, introduced some test failures, seen on macOS.
>
>Let me know if you need any additional information to track these down.
>
>Thanks,
>Jeremy
>
>
>not ok 65 - format-patch default signature
>#  
>#  git format-patch --stdout -1 | tail -n 3 >output &&
>#  signature >expect &&
>#  test_cmp expect output
>#  
>
>not ok 132 - format-patch --base
>#  
>#  git checkout side &&
>#  git format-patch --stdout --base=HEAD~3 -1 | tail -n 7 >actual 
>&&
>#  echo >expected &&
>#  echo "base-commit: $(git rev-parse HEAD~3)" >>expected &&
>#  echo "prerequisite-patch-id: $(git show --patch HEAD~2 | git
>patch-id --stable | awk "{print \$1}")" >>expected &&
>#  echo "prerequisite-patch-id: $(git show --patch HEAD~1 | git
>patch-id --stable | awk "{print \$1}")" >>expected &&
>#  signature >> expected &&
>#  test_cmp expected actual
>#  

Can you run the test with the option to show the expected and actual strings?

Did the testsuite run with the wrong git somehow?




[PATCH v3] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-19 Thread Josh Triplett
This provides a shorter and more convenient alias for
--subject-prefix='RFC PATCH'.

Includes documentation in the format-patch manpage, and a new test
covering --rfc.

Signed-off-by: Josh Triplett <j...@joshtriplett.org>
---
v3:
- Fix an error message referring to --subject-prefix
- Expand the acronym "RFC"
v2:
- Add documentation to the format-patch manpage
- Call subject_prefix_callback rather than reimplementing it
- Update test to move expectations inside

 Documentation/git-format-patch.txt |  8 +++-
 builtin/log.c  | 10 +-
 t/t4014-format-patch.sh|  9 +
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index 9624c84..9b200b3 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -19,7 +19,8 @@ SYNOPSIS
   [--start-number ] [--numbered-files]
   [--in-reply-to=Message-Id] [--suffix=.]
   [--ignore-if-in-upstream]
-  [--subject-prefix=Subject-Prefix] [(--reroll-count|-v) ]
+  [--rfc] [--subject-prefix=Subject-Prefix]
+  [(--reroll-count|-v) ]
   [--to=] [--cc=]
   [--[no-]cover-letter] [--quiet] [--notes[=]]
   []
@@ -172,6 +173,11 @@ will want to ensure that threading is disabled for `git 
send-email`.
allows for useful naming of a patch series, and can be
combined with the `--numbered` option.
 
+--rfc::
+   Alias for `--subject-prefix="RFC PATCH"`. RFC means "Request For
+   Comments"; use this when sending an experimental patch for
+   discussion rather than application.
+
 -v ::
 --reroll-count=::
Mark the series as the -th iteration of the topic. The
diff --git a/builtin/log.c b/builtin/log.c
index 92dc34d..c657900 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1112,6 +1112,11 @@ static int subject_prefix_callback(const struct option 
*opt, const char *arg,
return 0;
 }
 
+static int rfc_callback(const struct option *opt, const char *arg, int unset)
+{
+   return subject_prefix_callback(opt, "RFC PATCH", unset);
+}
+
 static int numbered_cmdline_opt = 0;
 
 static int numbered_callback(const struct option *opt, const char *arg,
@@ -1419,6 +1424,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
N_("start numbering patches at  instead of 1")),
OPT_INTEGER('v', "reroll-count", _count,
N_("mark the series as Nth re-roll")),
+   { OPTION_CALLBACK, 0, "rfc", , NULL,
+   N_("Use [RFC PATCH] instead of [PATCH]"),
+   PARSE_OPT_NOARG | PARSE_OPT_NONEG, rfc_callback },
{ OPTION_CALLBACK, 0, "subject-prefix", , N_("prefix"),
N_("Use [] instead of [PATCH]"),
PARSE_OPT_NONEG, subject_prefix_callback },
@@ -1557,7 +1565,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
if (numbered && keep_subject)
die (_("-n and -k are mutually exclusive."));
if (keep_subject && subject_prefix)
-   die (_("--subject-prefix and -k are mutually exclusive."));
+   die (_("--subject-prefix/--rfc and -k are mutually 
exclusive."));
rev.preserve_subject = keep_subject;
 
argc = setup_revisions(argc, argv, , _r_opt);
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index b0579dd..ed4d3c2 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1073,6 +1073,15 @@ test_expect_success 'empty subject prefix does not have 
extra space' '
test_cmp expect actual
 '
 
+test_expect_success '--rfc' '
+   cat >expect <<-\EOF &&
+   Subject: [RFC PATCH 1/1] header with . in it
+   EOF
+   git format-patch -n -1 --stdout --rfc >patch &&
+   grep ^Subject: patch >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success '--from=ident notices bogus ident' '
test_must_fail git format-patch -1 --stdout --from=foo >patch
 '

base-commit: 6ebdac1bab966b720d776aa43ca188fe378b1f4b
-- 
git-series 0.8.10


Re: [PATCH v2] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-19 Thread Josh Triplett
On Mon, Sep 19, 2016 at 04:46:06PM -0700, Jacob Keller wrote:
> On Mon, Sep 19, 2016 at 4:40 PM, Josh Triplett <j...@joshtriplett.org> wrote:
> > On Mon, Sep 19, 2016 at 04:34:35PM -0700, Jeff King wrote:
> >> As far as your patch goes, I'd be OK with defining:
> >>
> >>   --rfc::
> >>   Pretend as if `--subject-prefix='RFC PATCH'` was given.
> >>
> 
> Would:
> 
> Shorthand for `--subject-prefix='RFC PATCH'`
> 
> be a better reading? I feel like using "pretend" is a bit weird here.

My patch used "Alias for"; if you prefer "Shorthand for" I'm
indifferent. :)


Re: [PATCH v2] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-19 Thread Josh Triplett
On Mon, Sep 19, 2016 at 04:34:35PM -0700, Jeff King wrote:
> On Mon, Sep 19, 2016 at 01:44:08PM -0700, Josh Triplett wrote:
> 
> > On Mon, Sep 19, 2016 at 10:49:17AM -0700, Junio C Hamano wrote:
> > > Andrew Donnellan <andrew.donnel...@au1.ibm.com> writes:
> > > 
> > > > Sounds good to me. Agreed that "RFC" is essentially the only prefix
> > > > other than "PATCH" that I see, at least in the kernel.
> > > 
> > > Around here I think we saw WIP too, and that makes me lean towards
> > > Peff's earlier suggestion to allow an end-user supplied string in
> > > front of PATCH, i.e. "-P RFC" => "--subject-prefix='RFC PATCH'",
> > > even though I understand that those who _ONLY_ care about RFC would
> > > prefer --rfc (5 keystrokes) over "-P RFC" (6 keystrokes).
> > 
> > I do share the concern raised elsewhere in the thread that adding new
> > format-patch short options potentially conflicts with diff/rev-list
> > short options.  If you're not worried about that, I'd be happy to add
> > (and document and test) -P.  However, I'd still advocate adding --rfc as
> > well; it's a common case, and "-P RFC" is actually rather more
> > keystrokes when you count shifting. :)
> > 
> > There might also be some value in steering people towards "RFC" (since a
> > WIP is in a way an RFC).
> 
> Good point. This may be an opportunity to be just slightly opinionated
> and nudge people towards a micro-standard.
> 
> I was curious what we have used over the years on the git list. So here
> are the top hits for:
> 
>   # start with a maildir archive
>   find git/cur -type f |
> 
>   # grab first subject line from each mail
>   xargs grep -i -m1 -h ^subject: |
> 
>   # pull out only bracketed text, ignore "re: [PATCH]"
>   perl -lne '/subject:\s*(\[.*?\])/i and print $1' |
> 
>   # normalize numbers; note that a long patch series will be
>   # over-represented, since it gets one hit per message
>   perl -pe 's/[0-9]+/X/g' |
> 
>   # and then sort by count
>   sort | uniq -c | sort -rn
> 
> The top 5 hits are:
> 
>   26252 [PATCH X/X]
>   18255 [PATCH]
>   17262 [PATCH vX X/X]
>2330 [PATCH vX]
>2297 [PATCHvX X/X]
> 
> which is not surprising (our "-v" uses "PATCH vX", which is why that's
> so much more common). After that it starts to get interesting, but let's
> do two further transformations before the sort to coalesce similar
> cases:
> 
>   # drop versioning entirely
>   perl -pe 's/\s*vX//' |
> 
>   # drop multipart subjects
>   perl -pe 's{\s*X/X}{}' |
> 
> That gives us:
> 
>   67081 [PATCH]
>1286 [PATCH/RFC]
>1169 [RFC/PATCH]
> 863 [JGIT PATCH]
> 832 [ANNOUNCE]
> 797 [RFC]
> 675 [RFC PATCH]
> 537 [StGit PATCH]
> 524 [EGIT PATCH]
> 367 [BUG]
> 169 [StGIT PATCH]
> 158 [GUILT]
> 142 [PATCH RFC]
> 131 [WIP PATCH]
> 129 [PATCH/WIP]
> 115 [TopGit PATCH]
> 115 [PATCH VX]
> ...
> 
> Some of those are obviously uninteresting (like "ANNOUNCE" or "BUG").
> But I see a few interesting patterns:
> 
>   - the "slash" form of RFC/PATCH or PATCH/RFC seems more common than a
> straight prefix (2400 versus about 800)
> 
>   - both RFC/PATCH and PATCH/RFC seems similarly popular (but "RFC
> PATCH" is much more popular than "PATCH RFC")
> 
>   - WIP is a lot less popular; it seems reasonable that it's a synonym
> for RFC and I don't mind pushing people in that direction
> 
>   - there are a non-trivial number of patches for other projects (JGIT,
> EGIT, StGit, etc). This is somewhat unique to git, where we discuss
> a lot of related projects on the list. But I wonder if other
> projects would use subsystems in a similar way (though I guess for
> the kernel, there are separate subsystems lists, so the "to" or "cc"
> header becomes the more interesting tag).

The kernel mostly uses "[PATCH] subsystem: ...".  Occasionally I see
"[PATCH somegitrepo ...] ..." when it's necessary to explicitly say
whose git repo the patch needs to go through, but that's pretty rare.

> As far as your patch goes, I'd be OK with defining:
> 
>   --rfc::
>   Pretend as if `--subject-prefix='RFC PATCH'` was given.
> 
> for now. If we later add:
> 
>   -P ::
>   Pretend as if `--subject-prefix=' PATCH'` was given.
> 
> then `--rfc` naturally becomes:
> 
>   --rfc::
>   Pretend as if `-P RFC` was given.
> 
> in a backwards-compatible way. It doesn't have to all come at once, and
> it sounds like `-P` may not be as useful for the kernel (though I'd be
> interested if somebody wanted to do a similar count; I don't have a copy
> of the kernel archive handy).

Sounds reasonable to me.  I'll send v3 of --rfc with the requested
additional documentation fix, then.


Re: [PATCH v2] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-19 Thread Josh Triplett
On Mon, Sep 19, 2016 at 10:49:17AM -0700, Junio C Hamano wrote:
> Andrew Donnellan  writes:
> 
> > Sounds good to me. Agreed that "RFC" is essentially the only prefix
> > other than "PATCH" that I see, at least in the kernel.
> 
> Around here I think we saw WIP too, and that makes me lean towards
> Peff's earlier suggestion to allow an end-user supplied string in
> front of PATCH, i.e. "-P RFC" => "--subject-prefix='RFC PATCH'",
> even though I understand that those who _ONLY_ care about RFC would
> prefer --rfc (5 keystrokes) over "-P RFC" (6 keystrokes).

I do share the concern raised elsewhere in the thread that adding new
format-patch short options potentially conflicts with diff/rev-list
short options.  If you're not worried about that, I'd be happy to add
(and document and test) -P.  However, I'd still advocate adding --rfc as
well; it's a common case, and "-P RFC" is actually rather more
keystrokes when you count shifting. :)

There might also be some value in steering people towards "RFC" (since a
WIP is in a way an RFC).

> >> +--rfc::
> >> +  Alias for `--subject-prefix="RFC PATCH"`. Use this when
> >> +  sending an experimental patch for discussion rather than
> >> +  application.
> >
> > Perhaps mention the phrase "Request For Comment" for the benefit of
> > those who aren't familiar ...
> 
> Good point.

I'll add that to the documentation in v3.


[PATCH v2] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-17 Thread Josh Triplett
This provides a shorter and more convenient alias for
--subject-prefix='RFC PATCH'.

Includes documentation in the format-patch manpage, and a new test
covering --rfc.

Signed-off-by: Josh Triplett <j...@joshtriplett.org>
---
v2:
- Add documentation to the format-patch manpage
- Call subject_prefix_callback rather than reimplementing it
- Update test to move expectations inside

 Documentation/git-format-patch.txt |  8 +++-
 builtin/log.c  |  8 
 t/t4014-format-patch.sh|  9 +
 3 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index 9624c84..b9590a5 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -19,7 +19,8 @@ SYNOPSIS
   [--start-number ] [--numbered-files]
   [--in-reply-to=Message-Id] [--suffix=.]
   [--ignore-if-in-upstream]
-  [--subject-prefix=Subject-Prefix] [(--reroll-count|-v) ]
+  [--rfc] [--subject-prefix=Subject-Prefix]
+  [(--reroll-count|-v) ]
   [--to=] [--cc=]
   [--[no-]cover-letter] [--quiet] [--notes[=]]
   []
@@ -172,6 +173,11 @@ will want to ensure that threading is disabled for `git 
send-email`.
allows for useful naming of a patch series, and can be
combined with the `--numbered` option.
 
+--rfc::
+   Alias for `--subject-prefix="RFC PATCH"`. Use this when
+   sending an experimental patch for discussion rather than
+   application.
+
 -v ::
 --reroll-count=::
Mark the series as the -th iteration of the topic. The
diff --git a/builtin/log.c b/builtin/log.c
index 92dc34d..5757d91 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1112,6 +1112,11 @@ static int subject_prefix_callback(const struct option 
*opt, const char *arg,
return 0;
 }
 
+static int rfc_callback(const struct option *opt, const char *arg, int unset)
+{
+   return subject_prefix_callback(opt, "RFC PATCH", unset);
+}
+
 static int numbered_cmdline_opt = 0;
 
 static int numbered_callback(const struct option *opt, const char *arg,
@@ -1419,6 +1424,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
N_("start numbering patches at  instead of 1")),
OPT_INTEGER('v', "reroll-count", _count,
N_("mark the series as Nth re-roll")),
+   { OPTION_CALLBACK, 0, "rfc", , NULL,
+   N_("Use [RFC PATCH] instead of [PATCH]"),
+   PARSE_OPT_NOARG | PARSE_OPT_NONEG, rfc_callback },
{ OPTION_CALLBACK, 0, "subject-prefix", , N_("prefix"),
N_("Use [] instead of [PATCH]"),
PARSE_OPT_NONEG, subject_prefix_callback },
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index b0579dd..ed4d3c2 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1073,6 +1073,15 @@ test_expect_success 'empty subject prefix does not have 
extra space' '
test_cmp expect actual
 '
 
+test_expect_success '--rfc' '
+   cat >expect <<-\EOF &&
+   Subject: [RFC PATCH 1/1] header with . in it
+   EOF
+   git format-patch -n -1 --stdout --rfc >patch &&
+   grep ^Subject: patch >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success '--from=ident notices bogus ident' '
test_must_fail git format-patch -1 --stdout --from=foo >patch
 '

base-commit: 6ebdac1bab966b720d776aa43ca188fe378b1f4b
-- 
git-series 0.8.10


Re: [PATCH] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-16 Thread Josh Triplett
On Fri, Sep 16, 2016 at 05:49:22PM -0700, Jeff King wrote:
> On Fri, Sep 16, 2016 at 10:27:45AM -0700, Josh Triplett wrote:
> 
> > By far, the most common subject-prefix I've seen other than "PATCH" is
> > "RFC PATCH" (or occasionally "PATCH RFC").  Seems worth optimizing for
> > the common case, to avoid having to spell it out the long way as
> > --subject-prefix='RFC PATCH'.
> 
> "RFC" is the most common one for me, too. And if it ends here, I'm OK
> with it. But I'm a little worried with ending up with a proliferation of
> options.

I haven't seen a significant number of variations on subject prefixes; I
can't think of any other prefix I've seen often enough to suggest an
option.

> If we had a short-option for --subject-prefix, then:
> 
>   -P RFC
> 
> is not so bad compared to "--rfc". But if you want to spell it as "RFC
> PATCH" that's getting a bit longer. We could have a short option for
> "tag this in the subject prefix _in addition_ to writing PATCH". And
> then you could do:
> 
>   -T RFC
> 
> I dunno. One other thing to consider is that format-patch takes
> arbitrary diff options, so we'd want to avoid stomping on them with any
> short options (which is why I used "-T" instead of "-t", though I find
> it unlikely that many people use the latter with format-patch). That's a
> point in favor of --rfc, I think.

I agree; the short option space seems more contentious.  And in any
case, I find --rfc more ergonomic than "-T RFC". :)

> >  builtin/log.c   | 10 ++
> >  t/t4014-format-patch.sh |  9 +
> >  2 files changed, 19 insertions(+), 0 deletions(-)
> 
> Documentation?

Oops, thanks.  I'll send v2 shortly.

> > +static int rfc_callback(const struct option *opt, const char *arg, int 
> > unset)
> > +{
> > +   subject_prefix = 1;
> > +   ((struct rev_info *)opt->value)->subject_prefix = xstrdup("RFC PATCH");
> > +   return 0;
> > +}
> 
> I was going to complain that you don't free() the previous value, but
> actually the other callers do not xstrdup() in the first place (and we
> do not need to do so here, either, as it's a string literal). We
> actually _do_ allocate a new copy when reading the value from config,
> but it's probably not a big deal in practice to leak that.
> 
> I also wonder if you could implement this as just:
> 
>   return subject_prefix_callback(opt, "RFC PATCH", unset);
> 
> And then if you write the documentation as:
> 
>   --rfc::
>   Behave as if --subject-prefix="RFC PATCH" was specified.
> 
> then it will be trivially correct. :)

Nice idea; will do.

> > +cat >expect <<'EOF'
> > +Subject: [RFC PATCH 1/1] header with . in it
> > +EOF
> > +test_expect_success '--rfc' '
> > +   git format-patch -n -1 --stdout --rfc >patch &&
> > +   grep ^Subject: patch >actual &&
> > +   test_cmp expect actual
> > +'
> 
> Our usual style these days is to set up expectations inside the test
> blocks (and use "<<-" to get nice indentation; we also typically use
> "\EOF" but that's purely style).

I copied this from a test immediately above it. :)

I can change it easily enough, though.

- Josh Triplett


[PATCH] format-patch: Add --rfc for the common case of [RFC PATCH]

2016-09-16 Thread Josh Triplett
This provides a shorter and more convenient alias for
--subject-prefix='RFC PATCH'.

Add a test covering --rfc.

Signed-off-by: Josh Triplett <j...@joshtriplett.org>
---

By far, the most common subject-prefix I've seen other than "PATCH" is
"RFC PATCH" (or occasionally "PATCH RFC").  Seems worth optimizing for
the common case, to avoid having to spell it out the long way as
--subject-prefix='RFC PATCH'.

 builtin/log.c   | 10 ++
 t/t4014-format-patch.sh |  9 +
 2 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 92dc34d..48d6a38 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1112,6 +1112,13 @@ static int subject_prefix_callback(const struct option 
*opt, const char *arg,
return 0;
 }
 
+static int rfc_callback(const struct option *opt, const char *arg, int unset)
+{
+   subject_prefix = 1;
+   ((struct rev_info *)opt->value)->subject_prefix = xstrdup("RFC PATCH");
+   return 0;
+}
+
 static int numbered_cmdline_opt = 0;
 
 static int numbered_callback(const struct option *opt, const char *arg,
@@ -1419,6 +1426,9 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
N_("start numbering patches at  instead of 1")),
OPT_INTEGER('v', "reroll-count", _count,
N_("mark the series as Nth re-roll")),
+   { OPTION_CALLBACK, 0, "rfc", , NULL,
+   N_("Use [RFC PATCH] instead of [PATCH]"),
+   PARSE_OPT_NOARG | PARSE_OPT_NONEG, rfc_callback },
{ OPTION_CALLBACK, 0, "subject-prefix", , N_("prefix"),
N_("Use [] instead of [PATCH]"),
PARSE_OPT_NONEG, subject_prefix_callback },
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index b0579dd..81b0498 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1073,6 +1073,15 @@ test_expect_success 'empty subject prefix does not have 
extra space' '
test_cmp expect actual
 '
 
+cat >expect <<'EOF'
+Subject: [RFC PATCH 1/1] header with . in it
+EOF
+test_expect_success '--rfc' '
+   git format-patch -n -1 --stdout --rfc >patch &&
+   grep ^Subject: patch >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success '--from=ident notices bogus ident' '
test_must_fail git format-patch -1 --stdout --from=foo >patch
 '

base-commit: 6ebdac1bab966b720d776aa43ca188fe378b1f4b
-- 
git-series 0.8.10


Re: [PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-14 Thread Josh Triplett
On Wed, Sep 14, 2016 at 03:57:36PM -0700, Junio C Hamano wrote:
> Junio C Hamano  writes:
> 
> > I do not mind doing it myself, but I am already in today's
> > integration cycle (which will merge a handful of topics to
> > 'master'), so I won't get around to it for some time.  If you are
> > inclined to, please be my guest ;-)
> 
> I queued this on top for now; I think it can be just squashed into
> your patch.  Please say "I agree" and I'll make it happen, or say
> "that's wrong" followed by a replacement patch ;-).

"I agree". :)

I'd suggest squashing in an *additional* patch to the testsuite to
ensure the presence of the blank line:

diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 535857e..8d90a6e 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -1515,8 +1515,9 @@ test_expect_success 'format-patch -o overrides 
format.outputDirectory' '
 
 test_expect_success 'format-patch --base' '
git checkout side &&
-   git format-patch --stdout --base=HEAD~3 -1 | tail -n 6 >actual &&
-   echo "base-commit: $(git rev-parse HEAD~3)" >expected &&
+   git format-patch --stdout --base=HEAD~3 -1 | tail -n 7 >actual &&
+   echo >expected &&
+   echo "base-commit: $(git rev-parse HEAD~3)" >>expected &&
echo "prerequisite-patch-id: $(git show --patch HEAD~2 | git patch-id 
--stable | awk "{print \$1}")" >>expected &&
echo "prerequisite-patch-id: $(git show --patch HEAD~1 | git patch-id 
--stable | awk "{print \$1}")" >>expected &&
signature >> expected &&


Re: [PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-09 Thread Josh Triplett
On Fri, Sep 09, 2016 at 01:51:04PM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > On Fri, Sep 09, 2016 at 12:41:56PM -0700, Junio C Hamano wrote:
> >> So here is a suggested replacement.  I notice that in the MIME case,
> >> we do not leave any blank line between the last line of the patch
> >> and the baseinfo, which makes it look a bit strange, e.g. output of
> >> "format-patch --attach=mimemime -1" may end like this:
> >> 
> >> +   test_write_lines 1 2 >expect &&
> >> +   test_cmp expect actual
> >> +'
> >> +
> >>  test_expect_success 'format-patch --pretty=mboxrd' '
> >> sp=" " &&
> >> cat >msg <<-INPUT_END &&
> >> base-commit: 6ebdac1bab966b720d776aa43ca188fe378b1f4b
> >> 
> >> --mimemime--
> >> 
> >> We may want to tweak it a bit further.
> >>  ...
> >
> > Looks good to me.
> 
> Thanks.
> 
> Do you mean that the base information that appears immediately after
> the patch text (either for MIME case or plain-text) does not bother
> you, though?

Sorry, I should have clarified that further.  I meant that the
additional tests looked good to me.

As it turns out, the patch I used to test this on happened to have a
blank line as the last line of context before the base-commit line, so
I'd overlooked this in the non-MIME case.  The issue you mentioned does
apply to both the MIME and non-MIME cases, and I agree that it needs
fixing.  It doesn't seem like a functional issue, but aesthetically it
doesn't look good.

Do you plan to make that change to print an additional blank line
(likely inside print_bases), or should I?


Re: [PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-09 Thread Josh Triplett
On Fri, Sep 09, 2016 at 12:41:56PM -0700, Junio C Hamano wrote:
> So here is a suggested replacement.  I notice that in the MIME case,
> we do not leave any blank line between the last line of the patch
> and the baseinfo, which makes it look a bit strange, e.g. output of
> "format-patch --attach=mimemime -1" may end like this:
> 
> +   test_write_lines 1 2 >expect &&
> +   test_cmp expect actual
> +'
> +
>  test_expect_success 'format-patch --pretty=mboxrd' '
> sp=" " &&
> cat >msg <<-INPUT_END &&
> base-commit: 6ebdac1bab966b720d776aa43ca188fe378b1f4b
> 
> ------mimemime--
> 
> We may want to tweak it a bit further.
> 
> -- >8 --
> From: Josh Triplett <j...@joshtriplett.org>
> Date: Wed, 7 Sep 2016 18:12:01 -0700
> Subject: [PATCH] format-patch: show base info before email signature
> 
> Any text below the "-- " for the email signature gets treated as part of
> the signature, and many mail clients will trim it from the quoted text
> for a reply.  Move it above the signature, so people can reply to it
> more easily.
> 
> Similarly, when producing the patch as a MIME attachment, the
> original code placed the base info after the attached part, which
> would be discarded.  Move the base info to the end of the part,
> still inside the part boundary.
> 
> Add tests for the exact format of the email signature, and add tests
> to ensure that the base info appears before the email signature when
> producing a plain-text output, and that it appears before the part
> boundary when producing a MIME attachment.
> 
> Signed-off-by: Josh Triplett <j...@joshtriplett.org>
> Signed-off-by: Junio C Hamano <gits...@pobox.com>

Looks good to me.

>  builtin/log.c   |  4 ++--
>  t/t4014-format-patch.sh | 30 +-
>  2 files changed, 27 insertions(+), 7 deletions(-)
> 
> diff --git a/builtin/log.c b/builtin/log.c
> index 92dc34d..d69d5e6 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1042,7 +1042,6 @@ static void make_cover_letter(struct rev_info *rev, int 
> use_stdout,
>   diff_flush();
>  
>   fprintf(rev->diffopt.file, "\n");
> - print_signature(rev->diffopt.file);
>  }
>  
>  static const char *clean_message_id(const char *msg_id)
> @@ -1720,6 +1719,7 @@ int cmd_format_patch(int argc, const char **argv, const 
> char *prefix)
>   make_cover_letter(, use_stdout,
> origin, nr, list, branch_name, quiet);
>   print_bases(, rev.diffopt.file);
> + print_signature(rev.diffopt.file);
>   total++;
>   start_number--;
>   }
> @@ -1779,13 +1779,13 @@ int cmd_format_patch(int argc, const char **argv, 
> const char *prefix)
>   if (!use_stdout)
>   rev.shown_one = 0;
>   if (shown) {
> + print_bases(, rev.diffopt.file);
>   if (rev.mime_boundary)
>   fprintf(rev.diffopt.file, "\n--%s%s--\n\n\n",
>  mime_boundary_leader,
>  rev.mime_boundary);
>   else
>   print_signature(rev.diffopt.file);
> - print_bases(, rev.diffopt.file);
>   }
>   if (!use_stdout)
>   fclose(rev.diffopt.file);
> diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
> index b0579dd..535857e 100755
> --- a/t/t4014-format-patch.sh
> +++ b/t/t4014-format-patch.sh
> @@ -754,9 +754,22 @@ test_expect_success 'format-patch 
> --ignore-if-in-upstream HEAD' '
>   git format-patch --ignore-if-in-upstream HEAD
>  '
>  
> +git_version="$(git --version | sed "s/.* //")"
> +
> +signature() {
> + printf "%s\n%s\n\n" "-- " "${1:-$git_version}"
> +}
> +
> +test_expect_success 'format-patch default signature' '
> + git format-patch --stdout -1 | tail -n 3 >output &&
> + signature >expect &&
> + test_cmp expect output
> +'
> +
>  test_expect_success 'format-patch --signature' '
> - git format-patch --stdout --signature="my sig" -1 >output &&
> - grep "my sig" output
> + git format-patch --stdout --signature="my sig" -1 | tail -n 3 >output &&
> + signature "my sig" >expect &&
> + test_cmp expect output
>  '
>  
>  test_expect_success 'format-pat

Re: [PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-08 Thread Josh Triplett
On Thu, Sep 08, 2016 at 11:34:15AM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > Any text below the "-- " for the email signature gets treated as part of
> > the signature, and many mail clients will trim it from the quoted text
> > for a reply.  Move it above the signature, so people can reply to it
> > more easily.
> >
> > Add tests for the exact format of the email signature, and add tests to
> > ensure the email signature appears last.
> >
> > (Patch by Junio Hamano; tests by Josh Triplett.)
> > Signed-off-by: Josh Triplett <j...@joshtriplett.org>
> > ---
> >
> > Does the above seem reasonable, for a patch that incorporates the
> > proposed patch from Message-Id
> > xmqqh99rpud4@gitster.mtv.corp.google.com and adds tests?
> 
> Other than that I'd probably retitle it,

Ah, true, I should have titled it "format-patch: move base commit ...".

> your problem description
> looks perfect.  I am still not sure if the code does a reasonable
> thing in MIME case, though.

It *looks* correct to me.

> Thanks for tying the loose ends anyway.
> 
> > diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
> > index b0579dd..a4af275 100755
> > --- a/t/t4014-format-patch.sh
> > +++ b/t/t4014-format-patch.sh
> > @@ -754,9 +754,22 @@ test_expect_success 'format-patch 
> > --ignore-if-in-upstream HEAD' '
> > git format-patch --ignore-if-in-upstream HEAD
> >  '
> >  
> > +git_version="$(git --version | sed "s/.* //")"
> > +
> > +signature() {
> > +   printf "%s\n%s\n\n" "-- " "${1:-$git_version}"
> > +}
> 
> Hmph.  I would actually have expected that you would force a fixed
> and an easily noticeable string via format.signature for the purpose
> of the test,

One of the git tests already did that.  I just modified that test to
test the exact signature format and that it appears at the end, rather
than just grepping to check that the signature string appears somewhere.
Then when doing so, I realized that I should check the default case too
(at which point that test change probably should have gone in a separate
patch).

> but I guess this test covers a lot more than what the
> purpose of the main part of the patch does (i.e. enforces that the
> default signature must be made from the version string of Git).  It
> is not a bad thing to test, but it probably does not belong to this
> change.  If you _were_ to split the patch in two, that is where I
> probably would split, i.e. "we didn't test what the default signature
> looks like, or we didn't make sure --signature option overrides the
> default signature, so let's test it" as the preliminary preparation,
> followed by "having base info after sig is inconvenient, let's move
> it and make sure base info stays before sig with additional test" as
> the second (and primary) patch.
>
> But a single patch is fine.
> 
> Thanks.

If any other change ends up being necessary, I'll split the patch in v2.

- Josh Triplett


[PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-07 Thread Josh Triplett
Any text below the "-- " for the email signature gets treated as part of
the signature, and many mail clients will trim it from the quoted text
for a reply.  Move it above the signature, so people can reply to it
more easily.

Add tests for the exact format of the email signature, and add tests to
ensure the email signature appears last.

(Patch by Junio Hamano; tests by Josh Triplett.)
Signed-off-by: Josh Triplett <j...@joshtriplett.org>
---

Does the above seem reasonable, for a patch that incorporates the
proposed patch from Message-Id
xmqqh99rpud4@gitster.mtv.corp.google.com and adds tests?
Alternatively, feel free to split this patch into two, the first with
you as the author.  I can confirm that the code change doesn't break any
existing tests; only the new tests added here check for it.  So a
two-patch series wouldn't result in any breakage after the first patch.

 builtin/log.c   |  4 ++--
 t/t4014-format-patch.sh | 22 +-
 2 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 92dc34d..d69d5e6 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1042,7 +1042,6 @@ static void make_cover_letter(struct rev_info *rev, int 
use_stdout,
diff_flush();
 
fprintf(rev->diffopt.file, "\n");
-   print_signature(rev->diffopt.file);
 }
 
 static const char *clean_message_id(const char *msg_id)
@@ -1720,6 +1719,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
make_cover_letter(, use_stdout,
  origin, nr, list, branch_name, quiet);
print_bases(, rev.diffopt.file);
+   print_signature(rev.diffopt.file);
total++;
start_number--;
}
@@ -1779,13 +1779,13 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
if (!use_stdout)
rev.shown_one = 0;
if (shown) {
+   print_bases(, rev.diffopt.file);
if (rev.mime_boundary)
fprintf(rev.diffopt.file, "\n--%s%s--\n\n\n",
   mime_boundary_leader,
   rev.mime_boundary);
else
print_signature(rev.diffopt.file);
-   print_bases(, rev.diffopt.file);
}
if (!use_stdout)
fclose(rev.diffopt.file);
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index b0579dd..a4af275 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -754,9 +754,22 @@ test_expect_success 'format-patch --ignore-if-in-upstream 
HEAD' '
git format-patch --ignore-if-in-upstream HEAD
 '
 
+git_version="$(git --version | sed "s/.* //")"
+
+signature() {
+   printf "%s\n%s\n\n" "-- " "${1:-$git_version}"
+}
+
+test_expect_success 'format-patch default signature' '
+   git format-patch --stdout -1 | tail -n 3 >output &&
+   signature >expect &&
+   test_cmp expect output
+'
+
 test_expect_success 'format-patch --signature' '
-   git format-patch --stdout --signature="my sig" -1 >output &&
-   grep "my sig" output
+   git format-patch --stdout --signature="my sig" -1 | tail -n 3 >output &&
+   signature "my sig" >expect &&
+   test_cmp expect output
 '
 
 test_expect_success 'format-patch with format.signature config' '
@@ -1502,12 +1515,11 @@ test_expect_success 'format-patch -o overrides 
format.outputDirectory' '
 
 test_expect_success 'format-patch --base' '
git checkout side &&
-   git format-patch --stdout --base=HEAD~3 -1 >patch &&
-   grep "^base-commit:" patch >actual &&
-   grep "^prerequisite-patch-id:" patch >>actual &&
+   git format-patch --stdout --base=HEAD~3 -1 | tail -n 6 >actual &&
echo "base-commit: $(git rev-parse HEAD~3)" >expected &&
echo "prerequisite-patch-id: $(git show --patch HEAD~2 | git patch-id 
--stable | awk "{print \$1}")" >>expected &&
echo "prerequisite-patch-id: $(git show --patch HEAD~1 | git patch-id 
--stable | awk "{print \$1}")" >>expected &&
+   signature >> expected &&
test_cmp expected actual
 '
 
base-commit: 6ebdac1bab966b720d776aa43ca188fe378b1f4b
-- 
git-series 0.8.10


Re: [RFC/PATCH v2 0/3] patch-id for merges

2016-09-07 Thread Josh Triplett
On Wed, Sep 07, 2016 at 06:01:01PM -0400, Jeff King wrote:
> Here's a re-roll of the series I posted at:
> 
>   
> http://public-inbox.org/git/20160907075346.z6wtmqnfc6bsu...@sigill.intra.peff.net/
> 
> Basically, it drops the time for "format-patch --cherry-pick" on a
> particular case from 3 minutes down to 3 seconds, by avoiding diffs
> on merge commits. Compared to v1, it fixes the totally-broken handling
> of commit_patch_id() pointed out by Johannes.
> 
> We can drop the diffs on the merge commits because they're quite broken,
> as discussed in the commit message of patch 3 (they don't take into
> account any parent except the first). So what do we do when somebody
> asks for the patch-id of a merge commit?
> 
> This is still marked RFC, because there are really two approaches here,
> and I'm not sure which one is better for "format-patch --base". I'd like
> to get input from Xiaolong Ye (who worked on --base), and Josh Triplett
> (who has proposed some patches in that area, and is presumably using
> them).

Thanks.

I'd love to see a more resilient patch-id mechanism, to make it easier
to match up patches between branches.  I don't think it makes sense to
talk about the patch-id of a merge commit (though it might make sense
for a merge which makes additional changes not present in any of the
parents).  Even if someone wants to match up merge commits with merge
commits, I don't think that should happen via patch-id; I think that
should happen in terms of "what patches does this merge introduce",
without constructing a merge-patch-id via a Merkle tree of commit
patch-ids.

So, I think this patch series makes sense (modulo the comments about the
commit message in patch 3).  We already don't respect merge commits when
doing format-patch; this seems consistent with that.  If we ever make it
possible for format-patch to handle merge commits, then we should also
allow it to have merge commits as prerequisites.

- Josh Triplett


Re: format-patch base-commit: moving to above the patch?

2016-09-07 Thread Josh Triplett
On Wed, Sep 07, 2016 at 09:06:31AM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > Currently, format-patch puts base-commit and prerequisite-patch-id
> > information below the patch, and below the email signature.  Most mail
> > clients automatically trim everything below the signature marker as
> > unimportant when quoting a mail for a reply, which would make it
> > difficult for someone to reply, quote the base-commit, and say something
> > like "I don't have this commit, where did it come from?" or "Can you
> > please rebase this on ...".
> >
> > Might it make sense to move this information adjacent to the diffstat,
> > instead?  Or, at least, above the email signature?
> 
> I personally feel that it would be annoying to have them near
> diffstat, especially given that unbounded many prereq patches can be
> listed.  It would not be too bad to flip the order between the call
> to print_signature() and print_bases(), though.

I can live with that; having it above the signature was a much bigger
concern for me than moving it above the patch.

> The extent of the change needed to (note: not even compile-tested)
> does does not look too bad, either.
> 
> I did not carefully think what the right adjustment for the MIME
> case is, though.

Seems plausible to me.

> I would expect some tests that expect the current order of the tail
> end of the output to break, which you would need to adjust.  And if
> there is no such test right now, you should add one, as your inquiry
> and this patch _sets_ a concrete expectation as to what should come
> before the signature line, which future updates should not break.

I can do that; arguably we should also have a test that nothing *except*
the git version appears after the signature line.

> 
>  builtin/log.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/builtin/log.c b/builtin/log.c
> index 92dc34d..d69d5e6 100644
> --- a/builtin/log.c
> +++ b/builtin/log.c
> @@ -1042,7 +1042,6 @@ static void make_cover_letter(struct rev_info *rev, int 
> use_stdout,
>   diff_flush();
>  
>   fprintf(rev->diffopt.file, "\n");
> - print_signature(rev->diffopt.file);
>  }
>  
>  static const char *clean_message_id(const char *msg_id)
> @@ -1720,6 +1719,7 @@ int cmd_format_patch(int argc, const char **argv, const 
> char *prefix)
>   make_cover_letter(, use_stdout,
> origin, nr, list, branch_name, quiet);
>   print_bases(, rev.diffopt.file);
> + print_signature(rev.diffopt.file);
>   total++;
>   start_number--;
>   }
> @@ -1779,13 +1779,13 @@ int cmd_format_patch(int argc, const char **argv, 
> const char *prefix)
>   if (!use_stdout)
>   rev.shown_one = 0;
>   if (shown) {
> + print_bases(, rev.diffopt.file);
>   if (rev.mime_boundary)
>   fprintf(rev.diffopt.file, "\n--%s%s--\n\n\n",
>  mime_boundary_leader,
>  rev.mime_boundary);
>   else
>   print_signature(rev.diffopt.file);
> - print_bases(, rev.diffopt.file);
>   }
>   if (!use_stdout)
>   fclose(rev.diffopt.file);


Re: [PATCH 9/9] rebase -i: rearrange fixup/squash lines using the rebase--helper

2016-09-03 Thread Josh Triplett
On Fri, Sep 02, 2016 at 06:23:42PM +0200, Johannes Schindelin wrote:
> Let's reimplement this with linear complexity (using a hash map to
> match the commits' subject lines) for the common case; Sadly, the
> fixup/squash feature's design neglected performance considerations,
> allowing arbitrary prefixes (read: `fixup! hell` will match the
> commit subject `hello world`), which means that we are stuck with
> quadratic performance in the worst case.

If the performance of that case matters enough, we can do better than
quadratic complexity: maintain a trie of the subjects, allowing prefix
lookups.  (Or hash all the prefixes, which you can do in linear time on
a string: hash next char, save hash, repeat.)  However, that would
pessimize the normal case of either a complete subject or a sha1, due to
the extra time taken constructing the data structure.  Probably not
worth it, if you assume that most "fixup!" subjects come from `git
commit --fixup` or similar automated means.


format-patch base-commit: moving to above the patch?

2016-09-01 Thread Josh Triplett
Currently, format-patch puts base-commit and prerequisite-patch-id
information below the patch, and below the email signature.  Most mail
clients automatically trim everything below the signature marker as
unimportant when quoting a mail for a reply, which would make it
difficult for someone to reply, quote the base-commit, and say something
like "I don't have this commit, where did it come from?" or "Can you
please rebase this on ...".

Might it make sense to move this information adjacent to the diffstat,
instead?  Or, at least, above the email signature?


Re: Extending "extended SHA1" syntax to traverse through gitlinks?

2016-08-24 Thread Josh Triplett
On Wed, Aug 24, 2016 at 07:05:17PM +0200, Jakub Narębski wrote:
> W dniu 24.08.2016 o 16:20, Josh Triplett pisze:
> > On Wed, Aug 24, 2016 at 03:16:56PM +0200, Jakub Narębski wrote:
> [...]
> >> Not really.
> >>
> >> The above means only that the support for new syntax would be not
> >> as easy as adding it to 'git rev-parse' (and it's built-in equivalent),
> >> except for the case where submodule uses the same object database as
> >> supermodule.
> >>
> >> So it wouldn't be as easy (on conceptual level) as adding support
> >> for ':/' or '^{/}'.  It would be at least as
> >> hard, if not harder, as adding support for '@{-1}' and its '-'
> >> shortcut.
> > 
> > Depends on which cases you want to handle.  In the most general case,
> > you'd need to find and process the applicable .gitmodules file, which
> > would only work if you started from the top-level tree, not a random
> > treeish.  On the other hand, in the most general case, you don't
> > necessarily even have the module you need, because .git/modules only
> > contains the modules the *current* version needed, not every past
> > version.
> 
> There is an additional problem, namely that directory with submodule
> can be renamed.
> 
> I don't know if there is an existing API, but assuming modern
> git-submodule (with repository in .git/modules) you would have to
> do the following steps for ://:
> 
>  * look up :.gitmodules for module which 'path'
>is ; let's say it is named 
>  * check if : commit object
>is present in .git/modules/
>  * look up this object

This also assumes your lookup started with a  and not an
intermediate , but that'll work in many cases.

> > As an alternate approach (pun intended): treat every module in
> > .git/modules as an alternate and just look up the object by hash.  
> 
> This could be a good fallback, to search through all submodules.
> 
> > Or, teach git-submodule to store all the objects for submodules in the
> > supermodule's .git/objects (and teach git's reachability algorithm to
> > respect refs in .git/modules, or store their refs in
> > .git/refs/submodules/ or in a namespace).
> 
> And fallback to this fallback could be searching through supermodule
> object repository.

I'd flip those around: first search registered .gitmodules, then look up
the object in the superproject (since you have it at hand), and then
maybe search every submodule.

> >> Josh, what was the reason behind proposing this feature? Was it
> >> conceived as adding completeness to gitrevisions syntax, a low-hanging
> >> fruit?  It isn't (the latter).  Or was it some problem with submodule
> >> handling that you would want to use this syntax for?
> > 
> > This wasn't an abstract/theoretical completeness issue.  I specifically
> > wanted this syntax for practical use with actual trees containing
> > gitlinks, motivated by having a tool that creates and uses such
> > gitlinks. :)
> 
> Could you explain what you need in more detail?  Is it a fragment
> of history of submodule, a contents of a file at given point of
> superproject history, diff between file-in-submodule and something
> else, or what?

As part of git-series, I have commits, whose trees contain various
gitlinks, such as "series" and "base".  Those gitlinks point to commits
in the same repository.  I'd like to use those gitlinks everywhere I
could use any other committish, such as a branch name.  In particular,
I'd like to write things like some_feature:series:path/to/file ("what
does path/to/file look like in the current version of some_feature"),
some_feature:series^ ("what's the second-to-last commit in
some_feature"), some_feature~5:series:path/to/file ("what did
path/to/file look like in an older version of some_feature"), or
some_feature~5:base..some_feature~5:series~2 ("all but the last two
patches in some_feature~5").  Those should work with show, diff,
format-patch, log, etc.

> >> As for usefulness: this fills the hole in accessing submodules, one
> >> that could be handled by combining plumbing-level commands.  Namely,
> >> there are 5 states of submodule (as I understand it)
> >>
> >>  * recorded in ref / commit in supermodule
> >>  * recorded in the index in supermodule
> >>  - recorded in ref / commit in submodule
> >>  - recorded in the index in submodule
> >>  - state of worktree in submodule
> >>
> >> The last three can be easyly acessed by cd-ing to submodule.  The first
> >> two are not easy to get, AFAIUC.
> > 
> > Right.  I primarily care about those first two cases, especially the
> > first one: given a commit containing a gitlink, how can I easily dig
> > into the linked commit?
> 
> All right.
> 
> Though you can cobble it with plumbing... just saying.

Sure, but that makes the expression much more complex.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Extending "extended SHA1" syntax to traverse through gitlinks?

2016-08-24 Thread Josh Triplett
On Wed, Aug 24, 2016 at 03:16:56PM +0200, Jakub Narębski wrote:
> W dniu 24.08.2016 o 07:36, Junio C Hamano pisze:
> > Jakub Narębski  writes:
> > 
> >> The point is that submodule has it's own object database.  It might
> >> be the same as superproject's, but you need to handle submodule objects
> >> being in separate submodule repository anyway.  Common repository is
> >> just a special case.
> >>
> >> By the way, this also means that proposed "extended extended SHA1"
> >> syntax would be useful to user's of submodules...
> > 
> > Not really.
> > 
> > I think that you gave a prime example why ://
> > is not a useful thing for submodules.  When the syntax resolves to a
> > 40-hex object name, that object name by itself is not useful.
> > 
> > You also need to carry an additional piece of information that lets
> > you identify the location of the repository, in which the object
> > name is valid, in the current user's context (i.e. somewhere in the
> > superproject where the submodule lives).  In other words, you'd need
> > to carry : around anyway for the object name to be
> > useful, so there is no good reason why anybody should insist that
> > the plumbing level resolve :// directly to an
> > object name in the first place.
> 
> Not really.
> 
> The above means only that the support for new syntax would be not
> as easy as adding it to 'git rev-parse' (and it's built-in equivalent),
> except for the case where submodule uses the same object database as
> supermodule.
> 
> So it wouldn't be as easy (on conceptual level) as adding support
> for ':/' or '^{/}'.  It would be at least as
> hard, if not harder, as adding support for '@{-1}' and its '-'
> shortcut.

Depends on which cases you want to handle.  In the most general case,
you'd need to find and process the applicable .gitmodules file, which
would only work if you started from the top-level tree, not a random
treeish.  On the other hand, in the most general case, you don't
necessarily even have the module you need, because .git/modules only
contains the modules the *current* version needed, not every past
version.

As an alternate approach (pun intended): treat every module in
.git/modules as an alternate and just look up the object by hash.  Or,
teach git-submodule to store all the objects for submodules in the
supermodule's .git/objects (and teach git's reachability algorithm to
respect refs in .git/modules, or store their refs in
.git/refs/submodules/ or in a namespace).

> Josh, what was the reason behind proposing this feature? Was it
> conceived as adding completeness to gitrevisions syntax, a low-hanging
> fruit?  It isn't (the latter).  Or was it some problem with submodule
> handling that you would want to use this syntax for?

This wasn't an abstract/theoretical completeness issue.  I specifically
wanted this syntax for practical use with actual trees containing
gitlinks, motivated by having a tool that creates and uses such
gitlinks. :)

> As for usefulness: this fills the hole in accessing submodules, one
> that could be handled by combining plumbing-level commands.  Namely,
> there are 5 states of submodule (as I understand it)
> 
>  * recorded in ref / commit in supermodule
>  * recorded in the index in supermodule
>  - recorded in ref / commit in submodule
>  - recorded in the index in submodule
>  - state of worktree in submodule
> 
> The last three can be easyly acessed by cd-ing to submodule.  The first
> two are not easy to get, AFAIUC.

Right.  I primarily care about those first two cases, especially the
first one: given a commit containing a gitlink, how can I easily dig
into the linked commit?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Extending "extended SHA1" syntax to traverse through gitlinks?

2016-08-23 Thread Josh Triplett
On Mon, Aug 22, 2016 at 08:39:19PM +0200, Jakub Narębski wrote:
> W dniu 21.08.2016 o 16:26, Josh Triplett pisze:
> > On Sun, Aug 21, 2016 at 03:46:36PM +0200, Jakub Narębski wrote:
> >> W dniu 21.08.2016 o 00:50, Josh Triplett pisze:
> >>> Currently, if you have a branch "somebranch" that contains a gitlink
> >>> "somecommit", you can write "somebranch:somecommit" to refer to the
> >>> commit, just like a tree or blob.  ("man git-rev-parse" defines this
> >>> syntax in the "SPECIFYING REVISIONS" section.)  You can use this
> >>> anywhere you can use a committish, including "git show
> >>> somebranch:somecommit", "git log somebranch:somecommit..anotherbranch",
> >>> or even "git format-patch -1 somebranch:somecommit".
> >>>
> >>> However, you cannot traverse *through* the gitlink to look at files
> >>> inside its own tree, or to look at other commits relative to that
> >>> commit.  For instance, "somebranch:somecommit:somefile" and
> >>> "somebranch:somecommit~3" do not work.
> >>
> >> Note that there is the same problem traversing through trees:
> >> while 'git cat-file -p HEAD:subdir/file' works, the 'HEAD:subdir:file'
> >> doesn't:
> >>
> >>   $ git cat-file -p HEAD:subdir:file
> >>   fatal: Not a valid object name HEAD:subdir:file
> > 
> > Interesting point; if extending this syntax anyway, any treeish ought to
> > work, not just a committish.
> 
> Actually, because you can use simply "HEAD:subdir/file" I'd rather
> it didn't work (no two ways of access), unless we can get it for free.

Agreed.  I suspect we'd get it for free if we introduced a syntax for
traversing through commits (by allowing that syntax to work with any
treeish), but if not, I certainly don't see any value in adding a second
syntax for accessing tree contents.

> >>> I'd love to have a syntax that allows traversing through the gitlink to
> >>> other files or commits.  Ideally, I'd suggest the syntax above, as a
> >>> natural extension of the existing extended syntax.
> >>
> >> And with the above manual resolving, you can see the problem with
> >> implementing it: the git-cat-file (in submodule) and git-rev-parse
> >> (in supermodule) are across repository boundary.
> > 
> > Only if the gitlink points to a commit that doesn't exist in the same
> > repository.  A gitlink can point to a commit you already have.
> 
> The idea of submodules is that tree object in superproject includes
> link to commit of subproject (so called gitlink).  Tree object is
> in superproject repository, while gitlinked commit is in submodule
> repository.
> 
> True, with modern Git the submodule repository is embedded in .git
> area of superproject, with '.git' in submodule being gitling file,
> but by design those objects are in different repositories, in different
> object databases.

git-submodule handles them that way by default, yes.  But a gitlink
doesn't inherently have to point to a separate repository, and even a
submodule could point to an object available in the same repository
(perhaps via another ref).

git-series creates such gitlinks, for instance.

> >> Also the problem with proposed syntax is that is not very visible.
> >> But perhaps it is all right.  Maybe :/ as separator would be better,
> >> or using parentheses or braces?
> > 
> > It seems as visible as the standard commit:path syntax; the second colon
> > seems just as visible as the first.  :/ already has a different meaning
> > (text search), so that would introduce inconsistency.
> 
> Actually ":/" has a special meaning only if it is at beginning:

True, but it seems inconsistent to have :/ mean search if at the
beginning, or traversal if not.

> But perhaps '//' would be better.

That does seem unambiguous, and it can't conflict with an existing file.
Does it seem reasonable to allow that for the initial commit as well
('committish//file', as well as 'commit//gitlink//file')?

Also, while that handles traversal into the tree contained in the
gitlinked commit, what about navigating by commit (using '~' and '^',
for instance)?  Does it seem reasonable to allow those as well, perhaps
only if you use // to reach the gitlink?  For instance,
'commit//gitlink~3', or 'commit//gitlink^{tree}'?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adding more namespace support to git

2016-08-22 Thread Josh Triplett
On Mon, Aug 22, 2016 at 07:36:31PM +0100, Richard wrote:
> On 21 Aug 2016 15:07, "Josh Triplett" <j...@joshtriplett.org> wrote:
> > I'd like to see it work more automatically than that.  Perhaps a
> > separate environment variable to set the client-side namespace?
> 
> How about a config option? That could be set globally, per repository, in
> the environment or on the command line.

That might work, though you wouldn't normally want to set it globally or
per-repository (since it affects access to a repository and you'd
typically want to use multiple different values or it wouldn't have much
point).

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Extending "extended SHA1" syntax to traverse through gitlinks?

2016-08-21 Thread Josh Triplett
On Sun, Aug 21, 2016 at 03:46:36PM +0200, Jakub Narębski wrote:
> W dniu 21.08.2016 o 00:50, Josh Triplett pisze:
> > Currently, if you have a branch "somebranch" that contains a gitlink
> > "somecommit", you can write "somebranch:somecommit" to refer to the
> > commit, just like a tree or blob.  ("man git-rev-parse" defines this
> > syntax in the "SPECIFYING REVISIONS" section.)  You can use this
> > anywhere you can use a committish, including "git show
> > somebranch:somecommit", "git log somebranch:somecommit..anotherbranch",
> > or even "git format-patch -1 somebranch:somecommit".
> > 
> > However, you cannot traverse *through* the gitlink to look at files
> > inside its own tree, or to look at other commits relative to that
> > commit.  For instance, "somebranch:somecommit:somefile" and
> > "somebranch:somecommit~3" do not work.
> 
> Note that there is the same problem traversing through trees:
> while 'git cat-file -p HEAD:subdir/file' works, the 'HEAD:subdir:file'
> doesn't:
> 
>   $ git cat-file -p HEAD:subdir:file
>   fatal: Not a valid object name HEAD:subdir:file

Interesting point; if extending this syntax anyway, any treeish ought to
work, not just a committish.

> Though you can do resolve step manually
> 
>   $ git cat-file -p $(git rev-parse HEAD:subdir):file
> 
> This works.

True, but that seems quite inconvenient.

> > I'd love to have a syntax that allows traversing through the gitlink to
> > other files or commits.  Ideally, I'd suggest the syntax above, as a
> > natural extension of the existing extended syntax.
> 
> And with the above manual resolving, you can see the problem with
> implementing it: the git-cat-file (in submodule) and git-rev-parse
> (in supermodule) are across repository boundary.

Only if the gitlink points to a commit that doesn't exist in the same
repository.  A gitlink can point to a commit you already have.

> Also the problem with proposed syntax is that is not very visible.
> But perhaps it is all right.  Maybe :/ as separator would be better,
> or using parentheses or braces?

It seems as visible as the standard commit:path syntax; the second colon
seems just as visible as the first.  :/ already has a different meaning
(text search), so that would introduce inconsistency.

> > (That syntax would potentially introduce ambiguity if you had a file
> > named "somecommit:somefile" or "somecommit~3".  That doesn't seem like a
> > problem, though; the existing syntax already doesn't support accessing a
> > file named "x..y" or "x...y", so scripts already can't expect to access
> > arbitrary filenames with that syntax without some kind of quoting, which
> > we also don't have.)
> 
> Errr... what?
> 
>   $ echo A..B >A..B
>   $ git add A..B
>   $ git commit -m 'A..B added'
>   [master 2d69af9] A..B added
>1 file changed, 1 insertion(+), 1 deletion(-)
>create mode 100644 A..B
>   $ git show HEAD:A..B
>   A..B

I stand corrected; I didn't find that.  I thought rev parsing worked
independently from the repository, and didn't have any automagic
detection based on the contents of the repository?

This seems ambiguous, and (AFAICT) not documented.  If HEAD:A and B both
refer to a commit, in addition to the blob A..B, which will HEAD:A..B
refer to?  I did test the HEAD:gitlink..anotherbranch case, and it does
parse as a range.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adding more namespace support to git

2016-08-21 Thread Josh Triplett
On Sun, Aug 21, 2016 at 12:30:16PM +0100, Richard wrote:
> On 21 August 2016 at 03:05, Josh Triplett <j...@joshtriplett.org> wrote:
> > Unfortunately, I think at this point, GIT_NAMESPACE has to exclusively
> > refer to the namespace for the remote end, to avoid breakage.  Which
> > means any automatic pervasive support for namespaces on the local side
> > would need to use a different mechanism.  (In addition to applying to
> > ref enumeration, this would also need to apply to the local end of
> > refspecs.)  And this new mechanism would need to not affect the remote
> > end, to allow remapping the local end while accessing an un-namespaced
> > (or differently namespaced) remote.
> 
> The problem for hooks is that it is implicitly inherited,
> so it could work if upload-pack receive-pack and http-backend work
> with GIT_NAMESPACE set,
> but everything else that wants to use a namespace has to set
> --namespace on the command-line.

I'd like to see it work more automatically than that.  Perhaps a
separate environment variable to set the client-side namespace?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Most recent revision that contains a string

2016-08-21 Thread Josh Triplett
On Sun, Aug 21, 2016 at 10:13:33AM +0200, Andreas Schwab wrote:
> On Aug 20 2016, Josh Triplett <j...@joshtriplett.org> wrote:
> > If you want to find a change that introduces or removes a particular
> > string, you could use "git log -S".  That doesn't allow regexes,
> 
> It does, actually, see --pickaxe-regex.

Thanks; I found that after looking up "git log -G", which Eric Wong
mentioned in another reply.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Adding more namespace support to git

2016-08-20 Thread Josh Triplett
On Sat, Aug 20, 2016 at 08:07:00PM +0100, Richard wrote:
> I work on a git server called gitano.
> We've been using and recommending cgit for the web UI.
> 
> I've been working on adding git namespace support to both,
> so that we can separate administrative branches from code branches.
> 
> Because git is not namespace aware for anything but git-upload-pack
> and git-receive-pack,
> I've had to implement namespace parsing in cgit
> for listing branches, showing logs, displaying notes and commit decorations.
> It might be more useful if this support was added to git itself,
> so other git servers could make use of it so there's less duplicated code.
> 
> I think the way to do this would be to make the low-level ref reading 
> functions,
> read_raw_ref, for_each_reflog_ent*, reflog_exists etc.,
> interpret the ref they are passed as being relative to the current git
> namespace.
> 
> Since when upload-pack and receive-pack run hooks they leave GIT_NAMESPACE set
> there are hook scripts that expect that the current namespace is ignored,
> so commands that now want to be namespace aware would have to opt-in.

That seems really unfortunate.  While at the time we wanted to start
with namespace support in upload-pack and receive-pack (and
http-backend) because those would allow using it as a server-side
storage format, I don't think we realized that leaving GIT_NAMESPACE in
the hook environment would completely prevent other git commands from
automatically handling namespaces.

And conversely, we can't just have upload-pack and receive-pack start
removing it from the hook environment, because a hook might expect to
read the current namespace from it (and then run git commands that the
hook expects will ignore it).

(This also affects libgit2; I recently added a function to libgit2 to
interpret various git environment variables, including GIT_NAMESPACE.
If git commands can't just use that automatically, that'll need to
change too, to avoid unexpected behavior in hooks.)

For that matter, someone could run "GIT_NAMESPACE=foo git push
remotename branchname" or
"GIT_NAMESPACE=foo git clone remotename", and based on the current
behavior, they'd expect to have the namespace apply to the remote end,
but not the local end.

Unfortunately, I think at this point, GIT_NAMESPACE has to exclusively
refer to the namespace for the remote end, to avoid breakage.  Which
means any automatic pervasive support for namespaces on the local side
would need to use a different mechanism.  (In addition to applying to
ref enumeration, this would also need to apply to the local end of
refspecs.)  And this new mechanism would need to not affect the remote
end, to allow remapping the local end while accessing an un-namespaced
(or differently namespaced) remote.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Most recent revision that contains a string

2016-08-20 Thread Josh Triplett
On Sat, Aug 20, 2016 at 02:41:35PM -0700, Nikolaus Rath wrote:
> Hello,
> 
> What's the easiest way to find the most recent revision (of any file in
> the repository, including those that have been deleted in the current
> HEAD) that contains a given string?
> 
> I was hoping that "git grep" would do this (like in Mercurial), but as
> far as I can tell it only greps through the working copy. Or is there a
> trick that I'm missing?

git grep can search through any arbitrary blob, tree, commit, or tag.
So you can search through HEAD~10, or HEAD~10:path/to/directory, or
HEAD~10:path/to/file.  (You can also search the index with --cached, and
various other options exist as well.)

If you want to find a change that introduces or removes a particular
string, you could use "git log -S".  That doesn't allow regexes, but it
might do what you want.  "git grep" will find occurrences of the string
in the current version, and if it has been removed, "git log -S" will
find the removal.

I don't know of any way to do that in one file-oriented step, searching
backward through the first occurrence of every path, including deleted
paths.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Extending "extended SHA1" syntax to traverse through gitlinks?

2016-08-20 Thread Josh Triplett
Currently, if you have a branch "somebranch" that contains a gitlink
"somecommit", you can write "somebranch:somecommit" to refer to the
commit, just like a tree or blob.  ("man git-rev-parse" defines this
syntax in the "SPECIFYING REVISIONS" section.)  You can use this
anywhere you can use a committish, including "git show
somebranch:somecommit", "git log somebranch:somecommit..anotherbranch",
or even "git format-patch -1 somebranch:somecommit".

However, you cannot traverse *through* the gitlink to look at files
inside its own tree, or to look at other commits relative to that
commit.  For instance, "somebranch:somecommit:somefile" and
"somebranch:somecommit~3" do not work.

I'd love to have a syntax that allows traversing through the gitlink to
other files or commits.  Ideally, I'd suggest the syntax above, as a
natural extension of the existing extended syntax.

(That syntax would potentially introduce ambiguity if you had a file
named "somecommit:somefile" or "somecommit~3".  That doesn't seem like a
problem, though; the existing syntax already doesn't support accessing a
file named "x..y" or "x...y", so scripts already can't expect to access
arbitrary filenames with that syntax without some kind of quoting, wich
we also don't have.)

Does this seem reasonable?  Would a patch introducing such syntax
(including documentation and tests) be acceptable?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Draft of Git Rev News edition 18

2016-08-16 Thread Josh Triplett
On Tue, Aug 16, 2016 at 09:27:04PM +, Eric Wong wrote:
> Josh Triplett <j...@joshtriplett.org> wrote:
> > On Tue, Aug 16, 2016 at 09:30:27AM +, Eric Wong wrote:
> > > Jakub Narębski <jna...@gmail.com> wrote:
> > > > It's a great pity that https://public-inbox.org/ is just
> > > > directory index, not a true home page.
> > > 
> > > +Cc m...@public-inbox.org
> > > 
> > > I'm not sure one could do better while staying true to the
> > > minimalist nature of plain-text email.
> > > 
> > > In the spirit of decentralization, there may not be /a/
> > > homepage, but many.   Everything is meant to clonable with each
> > > public-inbox, so maybe every public-inbox will have a code
> > > branch attached to it with the source+docs bundled.
> > 
> > It'd be nice if it had a prominent list of all lists available; as far
> > as I can tell, the main page has no link to /git/.
> 
> I'm not sure that's necessary; most of the traffic seems to come
> from /git/MESSAGE_ID/ links posted by others.  So it's
> probably more inside-out exposure than anything.

If someone hears about public-inbox, it's nice to know what other lists
they can use it for.

> As for other projects, I'm not aware of anybody else using it,
> yet.  I have some small projects using it, but most of those are
> one-off throwaways and I'm not comfortable promoting those along
> with public-inbox.  I admit: I'm not comfortable promoting
> anything I do, really.

Please take this as encouragement to do so.  I'd love to see the
public-inbox equivalent to the main page of https://lists.debian.org/ ,
as an example.  (And I'd love to have public-inbox archives of Debian
mailing lists.)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: upload-pack/ls-remote: showing non-HEAD symbolic refs?

2016-08-16 Thread Josh Triplett
On Tue, Aug 16, 2016 at 05:15:51PM -0400, Jeff King wrote:
> On Tue, Aug 16, 2016 at 02:11:41PM -0700, Josh Triplett wrote:
> 
> > > For HTTPS, I'd just as soon use HTTP-level features.
> > 
> > ALPN, used carefully, could potentially allow eliminating one round-trip
> > compared to HTTPS, and could also allow full-duplex communication.
> 
> I'd love to have a real full-duplex git-over-https. I looked into
> WebSockets at one point, but it looked non-trivial and I gave up.

WebSockets would be non-trivial, and require server configuration as
well, but it could work.

> But if we had a real full-duplex connection over https, I think there
> would be no reason for git:// to continue existing (we'd probably keep
> ssh as it's a useful protocol for authentication, though).

Agreed.

Using ALPN wouldn't actually end up using HTTPS; it would negotiate with
the server and end up connected directly to a git program speaking an
arbitrary protocol over TLS.  Many web servers already support ALPN to
negotiate HTTP/2, so this seems plausible.

Another alternative would be to define a framing for a full-duplex
git-upload-pack connection inside a single HTTP/2 connection; HTTP/2
already effectively allows full-duplex asynchronous conversations.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: upload-pack/ls-remote: showing non-HEAD symbolic refs?

2016-08-16 Thread Josh Triplett
On Tue, Aug 16, 2016 at 04:54:27PM -0400, Jeff King wrote:
> On Tue, Aug 16, 2016 at 01:31:50PM -0700, Josh Triplett wrote:
> 
> > > You can dig up the discussion on the list under the name "protocol v2",
> > > but basically yes, that approach has been considered. It's a little
> > > gross just because it leaves other protocols behind http (and it is not
> > > necessarily a good idea to push people into http, because it has some
> > > fundamental drawbacks over the other protocols because of its
> > > half-duplex nature).
> > 
> > I looked through the "protocol v2" threads, but couldn't find any
> > discussions of using HTTP headers.  I found some mentions of using
> > additional query parameters on the git-upload-pack request, which would
> > break compatibility with existing servers (they won't just ignore the
> > extra parameter).
> 
> Probably the most interesting recent discussion is the sub-thread of
> this patch:
> 
>   
> http://public-inbox.org/git/1460747949-3514-5-git-send-email-dtur...@twopensource.com/
> 
> which you might have missed because it only messages "v2 protocol" in
> the body.

Thanks for the link.

> But basically, I think you get the gist of it. We need to pass
> information from the client to the server _before_ the initial
> capability advertisement. For HTTP, we can do it via specialized headers
> or query parameters. For other protocols, we're stuck with some kind of
> try-and-fallback hack.
> 
> That means those protocols may diverge slightly from HTTP, but at least
> they would differ only in the "bootstrap v2" bit (and that would
> eventually become irrelevant as everybody moves to v2).
> 
> > --client-caps could work for SSH as well, it just requires an extra
> > round-trip to check for --client-caps.  Call git-upload-pack
> > --supports-client-caps, ignore any output (which with current git will
> > consist of a usage message), see if it returns a 0 exit code, if so,
> > call git-upload-pack --client-caps='...', and if not just call
> > git-upload-pack.  (A new git-upload-pack-2 binary would also work, but
> > that seems like overkill.)  I don't see any way around the extra round
> > trip here that would preserve backward compatibility with existing SSH
> > servers (which may force clients to *only* run exactly the command
> > "git-upload-pack" and nothing else).
> 
> Yep, that's about it. For ssh, I suspect we could optimistically try:
> 
>   git upload-pack --v2; test $? = 129 && git-upload-pack
> 
> and then fallback to just "git-upload-pack". That would work without an
> extra round-trip on real shell-capable servers, and eventually work on
> restricted ones.

True, that seems completely sensible.

> That doesn't help git://, though.

I'd love to see plaintext git:// disappear through lack of use.

> There are proposals floating around for basically easing into it with
> config. Have a "remote.*.v2" option you can set locally to enable (or
> disable) it. Default to "false". When there are enough v2 servers around
> to make it worthwhile, flip the default to "auto" which will do the
> probing (at some minor expense of handling fallbacks). Optionally we
> could record the last response for "auto" and use that going forward.

That sounds sensible.

> > Another possibility, which would work for both HTTPS and
> > git-protocol-over-TLS, would be to use ALPN.
> 
> Do people actually use git-over-TLS? There's no core support AFAIK, so
> you'd have to hack it up with a client proxy and git-remote-ext.

I've seen the idea bounced around and prototyped in the context of
supporting authenticated push to git://

> For HTTPS, I'd just as soon use HTTP-level features.

ALPN, used carefully, could potentially allow eliminating one round-trip
compared to HTTPS, and could also allow full-duplex communication.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: upload-pack/ls-remote: showing non-HEAD symbolic refs?

2016-08-16 Thread Josh Triplett
On Tue, Aug 16, 2016 at 11:50:11AM -0700, Stefan Beller wrote:
> On Tue, Aug 16, 2016 at 11:28 AM, Jeff King <p...@peff.net> wrote:
> > On Tue, Aug 16, 2016 at 10:34:44AM -0700, Josh Triplett wrote:
> >
> >> > Sadly you cannot use a capability to fix that, because all of this
> >> > happens before the client agrees to any capabilities (you can find
> >> > discussion of a "v2" protocol on the list which solves this, but it's
> >> > sort of languishing in the design phase).
> >>
> >> As a potential 1.1 version, which could work in a backward-compatible
> >> way with existing servers and no additional round-trip: what if, in the
> >> smart HTTP protocol, the client advertised client capabilities with an
> >> additional HTTP header (e.g.  "Git-Client-Caps: symrefs othershiny
> >> featurenames"?  git-http-backend could then pass those capabilities to
> >> git-upload-pack (--client-caps='...'), which could take them into
> >> account in the initial response?
> >>
> >> That wouldn't work as a single-pass approach for SSH, since the client
> >> can't know if the server's upload-pack supports --client-caps, but it
> >> would work for the smart HTTP protocol.
> >
> > You can dig up the discussion on the list under the name "protocol v2",
> > but basically yes, that approach has been considered. It's a little
> > gross just because it leaves other protocols behind http (and it is not
> > necessarily a good idea to push people into http, because it has some
> > fundamental drawbacks over the other protocols because of its
> > half-duplex nature).
> 
> Some more thoughts on protocol v2 (the good parts to be attributed to
> jrnie...@gmail.com):
> 
> * In case of http we can use the http header and know information
>   about the client, i.e. if it may support the following ideas:
> 
> * Instead of introducing a new protocol we introduce a capability\
>   "resend-ref-advertisement" and only advertise very few refs (e.g.
>   only the branches, not the pending changes in case of Gerrit)
> 
> * The client can then ignore the refs advertisement and ask for a resend
>   of the refs with more specification,
>   e.g. "want refs/heads/*", so allowing more than just sha1s in the
>   want line but complex things like branch patterns.

That seems like something a client could advertise via client
capabilities.  For instance, if the client doesn't want the initial refs
list, only the server capabilities, it could say "capsonly".  (It might
also be reasonable to cache the knowledge that a server supports client
capabilities, though not any specific capability.)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: upload-pack/ls-remote: showing non-HEAD symbolic refs?

2016-08-16 Thread Josh Triplett
On Tue, Aug 16, 2016 at 02:28:52PM -0400, Jeff King wrote:
> On Tue, Aug 16, 2016 at 10:34:44AM -0700, Josh Triplett wrote:
> 
> > > Sadly you cannot use a capability to fix that, because all of this
> > > happens before the client agrees to any capabilities (you can find
> > > discussion of a "v2" protocol on the list which solves this, but it's
> > > sort of languishing in the design phase).
> > 
> > As a potential 1.1 version, which could work in a backward-compatible
> > way with existing servers and no additional round-trip: what if, in the
> > smart HTTP protocol, the client advertised client capabilities with an
> > additional HTTP header (e.g.  "Git-Client-Caps: symrefs othershiny
> > featurenames"?  git-http-backend could then pass those capabilities to
> > git-upload-pack (--client-caps='...'), which could take them into
> > account in the initial response?
> > 
> > That wouldn't work as a single-pass approach for SSH, since the client
> > can't know if the server's upload-pack supports --client-caps, but it
> > would work for the smart HTTP protocol.
> 
> You can dig up the discussion on the list under the name "protocol v2",
> but basically yes, that approach has been considered. It's a little
> gross just because it leaves other protocols behind http (and it is not
> necessarily a good idea to push people into http, because it has some
> fundamental drawbacks over the other protocols because of its
> half-duplex nature).

I looked through the "protocol v2" threads, but couldn't find any
discussions of using HTTP headers.  I found some mentions of using
additional query parameters on the git-upload-pack request, which would
break compatibility with existing servers (they won't just ignore the
extra parameter).

--client-caps could work for SSH as well, it just requires an extra
round-trip to check for --client-caps.  Call git-upload-pack
--supports-client-caps, ignore any output (which with current git will
consist of a usage message), see if it returns a 0 exit code, if so,
call git-upload-pack --client-caps='...', and if not just call
git-upload-pack.  (A new git-upload-pack-2 binary would also work, but
that seems like overkill.)  I don't see any way around the extra round
trip here that would preserve backward compatibility with existing SSH
servers (which may force clients to *only* run exactly the command
"git-upload-pack" and nothing else).

Another possibility, which would work for both HTTPS and
git-protocol-over-TLS, would be to use ALPN.

git:// , however, would require a new service name, and that service
would have to accept client caps in-band.  That doesn't seem nearly as
important to support, though.  And that approach doesn't seem preferable
for HTTP, which would require modification to server configuration to
support the new path rather than an HTTP header to the existing path.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: upload-pack/ls-remote: showing non-HEAD symbolic refs?

2016-08-16 Thread Josh Triplett
On Tue, Aug 16, 2016 at 12:31:45PM -0400, Jeff King wrote:
> On Tue, Aug 16, 2016 at 09:18:39AM -0700, Josh Triplett wrote:
> 
> > Commit 5e7dcad771cb873e278a0571b46910d7c32e2f6c in September 2013 added
> > support to upload-pack to show the symbolic target of non-HEAD symbolic
> > refs.  However, commit d007dbf7d6d647dbcf0f357545f43f36dec46f3b in
> > November 2013 reverted that, because it used a capability to transmit
> > the information, and capabilities have a limited size (limited by the
> > pkt-line format which can't send lines longer than 64k) and can't
> > transmit an arbitrary number of symrefs.
> > 
> > (Incidentally, couldn't the same problem occur if the HEAD points to a
> > long enough path to exceed 64k?
> 
> Yes. But it's a lot easier to say "your 64k symref is ridiculous; don't
> do that" than it is to say "oh, you happened to have a lot of symrefs in
> your repository, so we overflowed and failed".
> 
> Besides which, the whole protocol cannot handle refnames larger than
> 64k, so it's not a new problem.

Absolutely agreed.  I mentioned it only because if a server provided any
mechanism to send it symbolic ref targets, this could lead to a
situation where you could push a repository that you could not
subsequently pull.  Depending on the protocols involved, that could
potentially require manual admin intervention, or could even result in a
DoS.

Not something that can arise with git itself, as send-pack/receive-pack
doesn't support sending symbolic ref targets, but I could imagine a
server doing so to allow setting HEAD.

> > I'd like to be able to see the targets of non-HEAD symbolic refs for a
> > repository (symbolic refs under refs/).  I'm interested in extending
> > upload-pack to expose those somehow.  What seems like a sensible format
> > to do so?
> > 
> > Would it make sense to advertise a new capability for symbolic ref
> > targets, which would allow the client to send back a dedicated request
> > for the targets of all symrefs?
> 
> It will definitely require a new capability. You cannot just send a
> "\0symref=..." trailer after each ref, because older clients treat
> multiple "\0" trailers as overwriting one another (so it essentially
> overwrites the old capabilities).
> 
> Sadly you cannot use a capability to fix that, because all of this
> happens before the client agrees to any capabilities (you can find
> discussion of a "v2" protocol on the list which solves this, but it's
> sort of languishing in the design phase).

As a potential 1.1 version, which could work in a backward-compatible
way with existing servers and no additional round-trip: what if, in the
smart HTTP protocol, the client advertised client capabilities with an
additional HTTP header (e.g.  "Git-Client-Caps: symrefs othershiny
featurenames"?  git-http-backend could then pass those capabilities to
git-upload-pack (--client-caps='...'), which could take them into
account in the initial response?

That wouldn't work as a single-pass approach for SSH, since the client
can't know if the server's upload-pack supports --client-caps, but it
would work for the smart HTTP protocol.

> So you are stuck introducing a new phase into the protocol, which is
> probably rather tricky (especially with the http protocol, which is very
> sensitive to extra round-trips). I guess the least-invasive way would be
> to communicate the desires in the "want" phase, and then have the server
> dump it out with the packfile. Like:
> 
>   - server claims "I support symref-wants" in the capability phase
> 
>   - during the negotiation phase, in addition to "want" and "have", the
> client may send "symref " packets. Probably  should be a
> wildcard to avoid having to ask about each ref individually.
> 
>   - before outputting the packfile in the final phase, if any "symref"
> wants were sent by the client, the server dumps a list of "symref
>  " packets, followed by a flush packet.
> 
> That should Just Work over the existing http protocol without requiring
> an extra request.

It'd require one extra request for git ls-remote, which normally doesn't
need the second round-trip, but that still seems reasonable.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


upload-pack/ls-remote: showing non-HEAD symbolic refs?

2016-08-16 Thread Josh Triplett
Commit 5e7dcad771cb873e278a0571b46910d7c32e2f6c in September 2013 added
support to upload-pack to show the symbolic target of non-HEAD symbolic
refs.  However, commit d007dbf7d6d647dbcf0f357545f43f36dec46f3b in
November 2013 reverted that, because it used a capability to transmit
the information, and capabilities have a limited size (limited by the
pkt-line format which can't send lines longer than 64k) and can't
transmit an arbitrary number of symrefs.

(Incidentally, couldn't the same problem occur if the HEAD points to a
long enough path to exceed 64k?  Unlikely to arise in practice, but
theoretically possible for a constructed repository.  Not a major
problem at the moment, since send-pack doesn't seem to support
*sending* symbolic refs, but it would become a problem given any
mechanism to send symbolic refs to the server.)

I'd like to be able to see the targets of non-HEAD symbolic refs for a
repository (symbolic refs under refs/).  I'm interested in extending
upload-pack to expose those somehow.  What seems like a sensible format
to do so?

Would it make sense to advertise a new capability for symbolic ref
targets, which would allow the client to send back a dedicated request
for the targets of all symrefs?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Draft of Git Rev News edition 18

2016-08-16 Thread Josh Triplett
On Tue, Aug 16, 2016 at 09:30:27AM +, Eric Wong wrote:
> Jakub Narębski  wrote:
> > It's a great pity that https://public-inbox.org/ is just
> > directory index, not a true home page.
> 
> +Cc m...@public-inbox.org
> 
> I'm not sure one could do better while staying true to the
> minimalist nature of plain-text email.
> 
> In the spirit of decentralization, there may not be /a/
> homepage, but many.   Everything is meant to clonable with each
> public-inbox, so maybe every public-inbox will have a code
> branch attached to it with the source+docs bundled.

It'd be nice if it had a prominent list of all lists available; as far
as I can tell, the main page has no link to /git/.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-08-15 Thread Josh Triplett
On Mon, Aug 15, 2016 at 12:17:11PM -0600, Simon Glass wrote:
> On 1 August 2016 at 12:37, Josh Triplett <j...@joshtriplett.org> wrote:
> > On Mon, Aug 01, 2016 at 09:14:54AM -0600, Stephen Warren wrote:
> >> On 07/29/2016 12:40 AM, Josh Triplett wrote:
> >> > I'd like to announce a project I've been working on for a while:
> >> >
> >> > git-series provides a tool for managing patch series with git, tracking
> >> > the "history of history". git series tracks changes to the patch series
> >> > over time, including rebases and other non-fast-forwarding changes. git
> >> > series also tracks a cover letter for the patch series, formats the
> >> > series for email, and prepares pull requests.
> >>
> >> Just as an FYI, I wouldn't be surprised if there's some overlap, or
> >> potential for merging of tools, between this tool and the "patman" tool
> >> that's part of the U-Boot source tree:
> >>
> >> http://git.denx.de/?p=u-boot.git;a=blob;f=tools/patman/README;h=e36857dedea1d0dbafa41732aaf9bf0988d63f38;hb=HEAD
> >
> > Interesting tool; thanks for the link.
> >
> > As far as I can tell from that documentation, patman doesn't track old
> > versions of a patch series; you rebase to modify patches or change
> > patman tags (embedded in commit messages), and nothing preserves the
> > previous version.  And it tracks the cover letter and similar in one of
> > the commit messages in the series, so previous versions of that don't
> > get saved either.  If you wanted to track the history of your changes,
> > you'd have to use branch names or similar.
> 
> That's right. Normally you would keep the old branch around, or tag
> it. Of course old branches are often based on older versions the
> upstream repo, so they are not that useful for diiff, etc. But the
> normal procedure when updating a series to a new version is:
> 
> git checkout -b wibble-v2 wibble
> git rebase upstream/master

That's the workflow I used before git-series, as well.  Having to create
versioned branch names motivated creating git-series; the branch names
in the git-series documentation ("feature-v8-rebased-4.6-alice-fix") are
*reduced* versions of actual branch names used for internal projects.

> > In addition, tracking metadata in commit messages only works with a
> > patches-by-mail workflow where the messages get processed when
> > generating patches; that doesn't work for please-pull workflows.
> 
> Can you explain what a please-pull workflow looks like, and what tags
> are expected?

You push the branch somewhere, as a branch or tag, and then use git
request-pull or otherwise tell someone "please pull from here".  They
pull the *exact* commit hashes you pushed, including whatever you based
them on.  That means they get the exact commit messages you pushed.  So,
if you have any inline metadata in the commit message, that would end up
in the project history.  The Linux kernel and other projects object to
getting those kinds of bits in commit messages; I've seen many patches
rejected because they included a Gerrit Change-Id.

Tracking the history and cover letter in a separate string of "series
commits" allows the underlying patch series to contain the exact commits
you want upstream to pull, without any postprocessing required.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-08-10 Thread Josh Triplett
On August 9, 2016 11:37:31 PM HST, Richard Ipsum 
<richard.ip...@codethink.co.uk> wrote:
>On Thu, Aug 04, 2016 at 12:40:58PM -1000, Josh Triplett wrote:
>> On Wed, Aug 03, 2016 at 08:12:02PM +0100, Richard Ipsum wrote:
>> > On Thu, Jul 28, 2016 at 11:40:55PM -0700, Josh Triplett wrote:
>> > > I'd welcome any feedback, whether on the interface and workflow,
>the
>> > > internals and collaboration, ideas on presenting diffs of patch
>series,
>> > > or anything else.
>> > 
>> > One other nice thing I've noticed about this tool is the
>> > way series behave like regular git branches: I specify the name
>> > of the series and from then on all other commands act on that
>> > series until told otherwise.
>> 
>> Thanks; I spent a while thinking about that part of the workflow.  I
>> save the current series as a symbolic ref SHEAD, and everything
>operates
>> on SHEAD.  (I should probably add support for running things like
>"git
>> series log" or "git series format" on a different series, because
>right
>> now "until told otherwise" doesn't include a way to tell it
>otherwise.)
>
>Apologies for this delayed response,
>I needed time to gather my thoughts,
>and also to fix the perl libgit2 binding to allow me to use
>your symbolic ref suggestion. :p

Yeah, during git-series development I ended up doing some work on both libgit2 
and git2-rs. :)

>Though it turns out that libgit2 doesn't currently allow
>me to write arbitrary data to a symbolic ref as git-symbolic-ref(1)
>will,
>so this still needs to be fixed somehow.

What arbitrary data do you need to write?

Also, note that you want to put your symbolic ref in refs/, not directly in 
.git, so that git takes it into account for object reachability.

>> > git-appraise looks as though it might also have this behaviour.
>> > I think it's a nice way to do it, since you don't generally
>> > perform more than one review simultaneously. So I may well
>> > use this idea in git-candidate if it's okay. :)
>> 
>> By all means.  For a review tool like git-candidate, it seems like
>you'd
>> want even more contextual information, to make it easier to specify
>> things like "comment on file F line L".  For instance, what if you
>> spawned the diff to review in an editor, with plenty of extra context
>> and a file extension that'll cause most editors to recognize it as a
>> patch (and specifically a git-candidate patch to allow specialized
>> editor modes), and told people to add their comments after the line
>they
>> applied to?  When the editor exits successfully, you can scan the
>file,
>> detect the added lines, and save those as comments.  You could figure
>> out the appropriate line by looking for the diff hunk headers and
>> counting line numbers.
>
>I really like this idea, the current interface for commenting is a
>little
>tedious I find.
>
>> 
>> If you use a format-patch diff that includes the headers and commit
>> message, you could also support commenting on those in the same way.
>> Does the notedb format support commenting on those?
>
>Comments in notedb are just a git note keyed on the sha of the
>commit being commented on, I'm not certain what advantage a
>format-patch
>diff provides in this case?

I meant for opening in an editor to write email-reply-style comments. The 
review tool and review storage format should allow commenting on commit 
messages, not just diffs.

>I've been closely following the 'patch submission process' thread,
>and given the discussion there I'm having doubts over the value
>of comments in git-candidate vs the mailing list. It seems to me that
>git-candidate has many of the disadvantages of Github/Gitlab when it
>comes to comments, for example, there is no threading.

That's not inherent, though. You could allow commenting on a comment easily 
enough. (Of course, at some point you've recreated email-style in-reply-to 
headers...)

>Also the system would be less open than the mailing list, since,
>as it stands currently you would require push access to the repository
>to comment on anything.

You'd need a federation mechanism.

>It may be worth reflecting that one reason some organisations
>have switched away from mailing list reviews to Github/Gitlab is that
>they provide patch tracking, where the mailing list provides none,
>so patches there can be 'lost'. So instead of trying to reimplement
>an entire Gerrit/Github/Gitlab ui on the commandline, I wonder whether
>it would be sufficient to add the minimum functionality necessary
>to provide git with native patch tracking, and leave comments f

Re: patch submission process, was Re: [PATCH v6 06/16] merge_recursive: abort properly upon errors

2016-08-10 Thread Josh Triplett
On Wed, Aug 10, 2016 at 09:30:01AM +0200, Jakub Narębski wrote:
> On 10 August 2016 at 02:55, Josh Triplett <j...@joshtriplett.org> wrote:
> > On Tue, Aug 09, 2016 at 06:28:00PM +, Eric Wong wrote:
> >> Some of these problems I hope public-inbox (or something like
> >> it) can fix and turn the tide towards email, again.
> >
> > This really seems like the dichotomy that drives people towards central
> > services like GitHub or GitLab.  We need an alternative that doesn't
> > involve email, or at the very least, doesn't require people to use email
> > directly.  Half of the pain in the process comes from coaxing email
> > clients that don't treat mail text as sacrosanct to leave it alone and
> > not mangle it.  (Some of that would go away if we accepted attachments
> > with inline disposition, but not all of it.  All of it would go away if
> > the submission process just involved "git push" to an appropriate
> > location.)
> 
> But submission is less important than review. And for review it is
> usually better (except gigantic series) to have patch text for review
> with the review.

Agreed.  However, submission typically requires more work than review,
because the patch text must remain applicable.  For review, as long as
the email client you use to respond doesn't do something horrible like
*re-wrap* the quoted patch text, the result will work as a review.

Ideally, I'd love to see 1) a review UI that stores line-by-line reviews
into a common format and can translate those to email, and 2) a
mechanism to translate reviews written by email and quoting into the
review format and store them with the repository.

> And (meta)-versioning of series.

I've got a documented format for that. :)

> And place for proof-of-concept / weather-balon patches...

Same place as all other patches, just with an "RFC" tag on them.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch submission process, was Re: [PATCH v6 06/16] merge_recursive: abort properly upon errors

2016-08-09 Thread Josh Triplett
On Tue, Aug 09, 2016 at 06:28:00PM +, Eric Wong wrote:
> Some of these problems I hope public-inbox (or something like
> it) can fix and turn the tide towards email, again.

This really seems like the dichotomy that drives people towards central
services like GitHub or GitLab.  We need an alternative that doesn't
involve email, or at the very least, doesn't require people to use email
directly.  Half of the pain in the process comes from coaxing email
clients that don't treat mail text as sacrosanct to leave it alone and
not mangle it.  (Some of that would go away if we accepted attachments
with inline disposition, but not all of it.  All of it would go away if
the submission process just involved "git push" to an appropriate
location.)

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch submission process, was Re: [PATCH v6 06/16] merge_recursive: abort properly upon errors

2016-08-09 Thread Josh Triplett
On Tue, Aug 09, 2016 at 06:57:05AM -0400, Jeff King wrote:
> On Tue, Aug 09, 2016 at 10:11:30AM +0200, Michael J Gruber wrote:
> 
> > Maybe two more points of input for the discussion:
> > 
> > off-line capabilities
> > =
> > 
> > The current workflow here seems to work best when you are subscribed to
> > the git-ml and have your own (local, maybe selective) copy of git-ml in
> > your (text-based) MUA (mutt, (al)pine, emacs, ...) that can jump right
> > into git-am and such directly. I'm not sure how important the "off-line"
> > aspect of that is for some of us, and how that could be replicated on
> > GitHub - while PRs and such may be Git based behind the scenes there
> > seems to be no way to clone that info and work from a local clone.
> > (Don't know if GitLab is more open.)
> 
> You can pull it all out via GitHub's HTTP API, but the question is what
> format you would use to store it locally (and which tools you would then
> use to play with it).
> 
> I haven't really tried this lately, though, so I don't know if there is
> information that the API would be missing.
> 
> I do have a dream of writing a tool that sucks in GitHub PRs to a fake
> email thread, lets me make my responses inline in an editor, and then
> pushes it back up as PR comments (finding the right positions based on
> the quoted context).

You might try https://github.com/joeyh/github-backup
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] format-patch: Add a config option format.from to set the default for --from

2016-08-07 Thread Josh Triplett
On Mon, Aug 08, 2016 at 12:54:41AM -0400, Jeff King wrote:
> On Sun, Aug 07, 2016 at 06:42:07PM -1000, Josh Triplett wrote:
> 
> > > Drop trailing comma after the last enum definition (trailing comma
> > > after the last element in an array is OK, though).
> > 
> > I realize this code didn't get included in the final version, but for
> > future reference, what's the rationale for this?  I tend to include a
> > final comma in cases like these (and likewise for initializers) to avoid
> > needing to change the last line when introducing a new element, reducing
> > noise in diffs.  I hadn't seen anything in any of the coding style
> > documentation talking about trailing commas (either pro or con).
> 
> Portability; some compilers choke on it. C89 allows trailing commas in
> array initialization but _not_ in enums. Most compilers allow it anyway
> (though gcc complains with -Wpedantic).
> 
> This definitely broke the build on real systems early in Git's history
> (I think the AIX compiler was one culprit),

Thanks for the explanation.  I assume such compilers also don't accept
C99?

> but at this point it's
> possible that all of those compilers have died off. It would be nice if
> we could start using it (for exactly the reasons you give).
> Unfortunately there's not a good way to know except "introduce it and
> see if people complain".

Fair enough.  I'll let someone else be the test case for that. :)

Perhaps the next Git user survey could ask "what compiler (including
version) do you use to compile Git", and perhaps "does it accept the
following code:"?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] format-patch: Add a config option format.from to set the default for --from

2016-08-07 Thread Josh Triplett
On Mon, Aug 01, 2016 at 01:32:49PM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > +static char *from;
> >  static const char *signature = git_version_string;
> >  static const char *signature_file;
> >  static int config_cover_letter;
> > @@ -807,6 +808,17 @@ static int git_format_config(const char *var, const 
> > char *value, void *cb)
> > base_auto = git_config_bool(var, value);
> > return 0;
> > }
> > +   if (!strcmp(var, "format.from")) {
> > +   int b = git_config_maybe_bool(var, value);
> > +   free(from);
> > +   if (b < 0)
> > +   from = xstrdup(value);
> > +   else if (b)
> > +   from = xstrdup(git_committer_info(IDENT_NO_DATE));
> > +   else
> > +   from = NULL;
> > +   return 0;
> > +   }
> 
> One potential issue I see here is that if we ever plan to switch the
> default, we may want to issue a warning message to users who do not
> have any format.from configured when they do run the program on a
> commit that will get a different result before and after the switch
> in a release of Git before that default switch happens.  The message
> would say something like "you are formatting somebody else's commit.
> the output will change in future versions of Git and show an explicit
> in-body From: header; if you want to keep the current behaviour, set
> format.from configuration variable to false".

The previous discussion between you and Jeff King seemed to suggest that
a mention in the release notes might suffice, rather than a "noisy
deprecation" warning.

> But you cannot tell by looking at from that is NULL between two
> cases, it is NULL because we defaulted to it (in which case we do
> want to warn), or the user set it explicitly to false (we do not
> want to give the warning).

If we wanted to issue such a warning, I'd suggest a separate boolean
"from_set", set when either the configuration or command line sets
"from", and then the code that handles "from" could emit a warning to
stderr if it would produce different results and !from_set.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] format-patch: Add a config option format.from to set the default for --from

2016-08-07 Thread Josh Triplett
On Mon, Aug 01, 2016 at 02:18:47PM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> > +enum from {
> > +   FROM_AUTHOR,
> > +   FROM_USER,
> > +   FROM_VALUE,
> 
> Drop trailing comma after the last enum definition (trailing comma
> after the last element in an array is OK, though).

I realize this code didn't get included in the final version, but for
future reference, what's the rationale for this?  I tend to include a
final comma in cases like these (and likewise for initializers) to avoid
needing to change the last line when introducing a new element, reducing
noise in diffs.  I hadn't seen anything in any of the coding style
documentation talking about trailing commas (either pro or con).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] format-patch: Add a config option format.from to set the default for --from

2016-08-07 Thread Josh Triplett
On Sun, Aug 07, 2016 at 06:59:09PM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > I'd actually seriously considered this exact approach, which I preferred
> > as well, but I'd discarded it because I figured it'd get rejected.
> > Given your suggestion, and Junio's comment, I'll go with this version.
> 
> Sorry, but your response is soo delayed that I am not sure what you
> are agreeing with

I'm on vacation right now.  I was agreeing with your comment that you
didn't care for the change in v2 to use an enum.

> and also am not sure if you are planning to reroll
> what has already been happily accepted to 'next', which is not quite
> welcome.

I didn't realize you had already taken the patch series into next; I'd
assumed from the various comments that you expected me to reroll it
before you'd take it.

Would you like me to write something up for the release notes regarding
plans to change the default?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/2] format-patch: Add a config option format.from to set the default for --from

2016-08-07 Thread Josh Triplett
On Mon, Aug 01, 2016 at 01:38:47PM -0400, Jeff King wrote:
> On Sat, Jul 30, 2016 at 12:11:11PM -0700, Josh Triplett wrote:
> 
> > +enum from {
> > +   FROM_AUTHOR,
> > +   FROM_USER,
> > +   FROM_VALUE,
> > +};
> > +
> > +static void set_from(enum from type, const char *value)
> > +{
> > +   free(from);
> > +   switch (type) {
> > +   case FROM_AUTHOR:
> > +   from = NULL;
> > +   break;
> > +   case FROM_USER:
> > +   from = xstrdup(git_committer_info(IDENT_NO_DATE));
> > +   break;
> > +   case FROM_VALUE:
> > +   from = xstrdup(value);
> > +   break;
> > +   }
> > +}
> 
> Thanks for looking into reducing the duplication. TBH, I am not sure it
> is really an improvement, just because of the amount of boilerplate (and
> this function interface is kind of weird, because of the rules for when
> "value" should or should not be NULL).
> 
> I guess another way to do it would be:
> 
>   #define FROM_AUTO_IDENT ((const char *)(intptr_t)1))
>   void set_from(const char *value)
>   {
>   if (value == FROM_AUTO_IDENT)
>   value = git_committer_info(IDENT_NO_DATE);
>   free(from);
>   from = xstrdup_or_null(value);
>   }

I'd actually seriously considered this exact approach, which I preferred
as well, but I'd discarded it because I figured it'd get rejected.
Given your suggestion, and Junio's comment, I'll go with this version.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch submission process, was Re: [PATCH v6 06/16] merge_recursive: abort properly upon errors

2016-08-05 Thread Josh Triplett
On Mon, Sep 17, 2001 at 10:00:00AM +, Stefan Beller wrote:
> But both send-email as well as mail-patch-series as well as git-series
> are all about the *sending* part. Not about the back and forth part, i.e.
> these don't deal with: "here is a fixup on top". And by that I mean
> receiving mails and applying them. git-am is there as a front-end
> once you obtained the mail, but from what I get, your original problem
> is to get up to date with the latest state, that is either in pu or a proposed
> fixup mail on top of your series?

git-series, at least, is intended to handle the back-and-forth: if you
actually publish the series and not just the final result, someone could
pull the series, make a (non-fast-forwarding) change to that, make a new
series commit, and publish their modified version of your series.  You
could then incorporate that change.  One of the use cases I developed it
for was collaborative development of a patch series.

(That workflow still needs a lot more tool assistance to become fully
usable, not least of which to assist with the process of merging changes
to the series.  Working on that.)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-08-04 Thread Josh Triplett
On Wed, Aug 03, 2016 at 08:12:02PM +0100, Richard Ipsum wrote:
> On Thu, Jul 28, 2016 at 11:40:55PM -0700, Josh Triplett wrote:
> > I'd welcome any feedback, whether on the interface and workflow, the
> > internals and collaboration, ideas on presenting diffs of patch series,
> > or anything else.
> 
> One other nice thing I've noticed about this tool is the
> way series behave like regular git branches: I specify the name
> of the series and from then on all other commands act on that
> series until told otherwise.

Thanks; I spent a while thinking about that part of the workflow.  I
save the current series as a symbolic ref SHEAD, and everything operates
on SHEAD.  (I should probably add support for running things like "git
series log" or "git series format" on a different series, because right
now "until told otherwise" doesn't include a way to tell it otherwise.)

One fun detail that took a couple of iterations to get right: I keep
separate "staged" and "working" versions per-series, so even with
outstanding changes to the cover letter, base, or series, you can always
detach or checkout another series without losing anything.  If you
switch back, all your staged and unstaged changes will remain staged and
unstaged where you left them.  That solves the "checkout a different
series with modifications to the current series" case.

> git-appraise looks as though it might also have this behaviour.
> I think it's a nice way to do it, since you don't generally
> perform more than one review simultaneously. So I may well
> use this idea in git-candidate if it's okay. :)

By all means.  For a review tool like git-candidate, it seems like you'd
want even more contextual information, to make it easier to specify
things like "comment on file F line L".  For instance, what if you
spawned the diff to review in an editor, with plenty of extra context
and a file extension that'll cause most editors to recognize it as a
patch (and specifically a git-candidate patch to allow specialized
editor modes), and told people to add their comments after the line they
applied to?  When the editor exits successfully, you can scan the file,
detect the added lines, and save those as comments.  You could figure
out the appropriate line by looking for the diff hunk headers and
counting line numbers.

If you use a format-patch diff that includes the headers and commit
message, you could also support commenting on those in the same way.
Does the notedb format support commenting on those?

> I haven't found time to use the tool to do any serious review
> yet, but I'll try and post some more feedback when I do.

Thanks!

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/2] format-patch: Transition the default to --from to avoid spoofed mails

2016-08-01 Thread Josh Triplett
On Mon, Aug 01, 2016 at 01:47:24PM -0400, Jeff King wrote:
> On Sat, Jul 30, 2016 at 12:11:05PM -0700, Josh Triplett wrote:
> 
> > Josh Triplett (2):
> >   format-patch: Add a config option format.from to set the default for 
> > --from
> >   format-patch: Default to --from
> 
> By the way, I notice that the threading between your patches and cover
> letter are broken. Since I see you are also working on a tool for
> handling such things, I'd suspect the tool (or your workflow) has a bug.
> :)

My workflow, fortunately. :)

> The message-id of this message is:
> 
>   <20160730191104.2ps5k7eji7aqgufg@x>
> 
> but the patches have both "References" and "In-Reply-To" set to:
> 
>   
> <cover.4d006cadf197f80d899ad7d7d56d8ba41f574adf.1469905775.git-series.j...@joshtriplett.org>
> 
> I also see your MUA is mutt, and I think I saw you mention using "mutt
> -H" elsewhere. IIRC, when I started using a similar workflow years ago,
> I tried the same thing and had the same problem: "-H" treats the input
> file as a template, not a message, and thus generates a new message-id.
> 
> I switched to using mutt's internal "resend-message" function, which
> does a more literal re-send. I don't think I ever found a way to
> convince mutt to do a resend from the command line.

I actually tried using mutt's resend function (alt-e) with an mbox; I
checked the Message-Id and In-Reply-To headers, and they looked correct.
I've used mutt -H successfully before without breaking threads.  The
Debian mutt packages recently upgraded to the "neomutt" fork; I wonder
if something broke recently?

Thanks for letting me know; I'll investigate and try to figure out the
problem.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-08-01 Thread Josh Triplett
On Mon, Aug 01, 2016 at 09:14:54AM -0600, Stephen Warren wrote:
> On 07/29/2016 12:40 AM, Josh Triplett wrote:
> > I'd like to announce a project I've been working on for a while:
> > 
> > git-series provides a tool for managing patch series with git, tracking
> > the "history of history". git series tracks changes to the patch series
> > over time, including rebases and other non-fast-forwarding changes. git
> > series also tracks a cover letter for the patch series, formats the
> > series for email, and prepares pull requests.
> 
> Just as an FYI, I wouldn't be surprised if there's some overlap, or
> potential for merging of tools, between this tool and the "patman" tool
> that's part of the U-Boot source tree:
> 
> http://git.denx.de/?p=u-boot.git;a=blob;f=tools/patman/README;h=e36857dedea1d0dbafa41732aaf9bf0988d63f38;hb=HEAD

Interesting tool; thanks for the link.

As far as I can tell from that documentation, patman doesn't track old
versions of a patch series; you rebase to modify patches or change
patman tags (embedded in commit messages), and nothing preserves the
previous version.  And it tracks the cover letter and similar in one of
the commit messages in the series, so previous versions of that don't
get saved either.  If you wanted to track the history of your changes,
you'd have to use branch names or similar.

In addition, tracking metadata in commit messages only works with a
patches-by-mail workflow where the messages get processed when
generating patches; that doesn't work for please-pull workflows.

patman does have quite a few interesting ideas, though.  git-series
needs some way of handling To/Cc addresses for patches and the cover
letter (beyond just scripts/get_maintainer.pl), and more automatic
handling of series versioning (v2, v3, ...) and associated series
changelogs.  Suggestions welcome.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-08-01 Thread Josh Triplett
On Mon, Aug 01, 2016 at 07:55:54AM +, Eric Wong wrote:
> Christian Couder <christian.cou...@gmail.com> wrote:
> > On Fri, Jul 29, 2016 at 12:10 PM, Richard Ipsum
> > <richard.ip...@codethink.co.uk> wrote:
> > > On Thu, Jul 28, 2016 at 11:40:55PM -0700, Josh Triplett wrote:
> > > [snip]
> > >>
> > >> I'd welcome any feedback, whether on the interface and workflow, the
> > >> internals and collaboration, ideas on presenting diffs of patch series,
> > >> or anything else.
> 
> > > I'm particularly interested in trying to establish a standard for
> > > storing review data in git. I've got a prototype for doing that[3],
> > > and an example tool that uses it[4]. The tool is still incomplete/buggy 
> > > though.
> > 
> > There is also git-appraise (https://github.com/google/git-appraise)
> > written in Go to store code review data in Git.
> > It looks like it stores its data in git notes and can be integrated
> > with Rust (https://github.com/Nemo157/git-appraise-rs).
> 
> I'm not convinced another format/standard is needed besides the
> email workflow we already use for git and kernel development.

Not all projects use a patches-by-email workflow, or want to.  To the
extent that tools and projects use some other workflow, standardizing
the format they use to store patch reviews (including per-line
annotations, approvals, test results, etc) seems preferable to having
each tool use its own custom format.

> I also see the reliance on an after-the-fact search engine
> (which can be tuned/replaced) as philosophically inline with
> what git does, too, such as not having rename tracking and
> doing delayed deltafication.

Storing review data in git doesn't mean it needs to end up in the
history of the project itself; it can use after-the-fact annotations on
a commit.

> Email also has the advantage of having existing tooling, and
> being (at least for now) federated without a single point of
> failure.

Storing review data in git makes it easy to push and pull it, which can
provide the basis for a federated system.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/2] format-patch: Add a config option format.from to set the default for --from

2016-07-30 Thread Josh Triplett
This helps users who would prefer format-patch to default to --from, and
makes it easier to change the default in the future.

Signed-off-by: Josh Triplett <j...@joshtriplett.org>
---
 Documentation/config.txt   | 10 ++-
 builtin/log.c  | 46 +--
 contrib/completion/git-completion.bash |  1 +-
 t/t4014-format-patch.sh| 40 +++-
 4 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 8b1aee4..bd34774 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1253,6 +1253,16 @@ format.attach::
value as the boundary.  See the --attach option in
linkgit:git-format-patch[1].
 
+format.from::
+   Provides the default value for the `--from` option to format-patch.
+   Accepts a boolean value, or a name and email address.  If false,
+   format-patch defaults to `--no-from`, using commit authors directly in
+   the "From:" field of patch mails.  If true, format-patch defaults to
+   `--from`, using your committer identity in the "From:" field of patch
+   mails and including a "From:" field in the body of the patch mail if
+   different.  If set to a non-boolean value, format-patch uses that
+   value instead of your committer identity.  Defaults to false.
+
 format.numbered::
A boolean which can enable or disable sequence numbers in patch
subjects.  It defaults to "auto" which enables it only if there
diff --git a/builtin/log.c b/builtin/log.c
index fd1652f..dbd2da7 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -719,6 +719,7 @@ static void add_header(const char *value)
 static int thread;
 static int do_signoff;
 static int base_auto;
+static char *from;
 static const char *signature = git_version_string;
 static const char *signature_file;
 static int config_cover_letter;
@@ -731,6 +732,28 @@ enum {
COVER_AUTO
 };
 
+enum from {
+   FROM_AUTHOR,
+   FROM_USER,
+   FROM_VALUE,
+};
+
+static void set_from(enum from type, const char *value)
+{
+   free(from);
+   switch (type) {
+   case FROM_AUTHOR:
+   from = NULL;
+   break;
+   case FROM_USER:
+   from = xstrdup(git_committer_info(IDENT_NO_DATE));
+   break;
+   case FROM_VALUE:
+   from = xstrdup(value);
+   break;
+   }
+}
+
 static int git_format_config(const char *var, const char *value, void *cb)
 {
if (!strcmp(var, "format.headers")) {
@@ -807,6 +830,16 @@ static int git_format_config(const char *var, const char 
*value, void *cb)
base_auto = git_config_bool(var, value);
return 0;
}
+   if (!strcmp(var, "format.from")) {
+   int b = git_config_maybe_bool(var, value);
+   if (b < 0)
+   set_from(FROM_VALUE, value);
+   else if (b)
+   set_from(FROM_USER, NULL);
+   else
+   set_from(FROM_AUTHOR, NULL);
+   return 0;
+   }
 
return git_log_config(var, value, cb);
 }
@@ -1199,16 +1232,12 @@ static int cc_callback(const struct option *opt, const 
char *arg, int unset)
 
 static int from_callback(const struct option *opt, const char *arg, int unset)
 {
-   char **from = opt->value;
-
-   free(*from);
-
if (unset)
-   *from = NULL;
+   set_from(FROM_AUTHOR, NULL);
else if (arg)
-   *from = xstrdup(arg);
+   set_from(FROM_VALUE, arg);
else
-   *from = xstrdup(git_committer_info(IDENT_NO_DATE));
+   set_from(FROM_USER, NULL);
return 0;
 }
 
@@ -1384,7 +1413,6 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
int quiet = 0;
int reroll_count = -1;
char *branch_name = NULL;
-   char *from = NULL;
char *base_commit = NULL;
struct base_tree_info bases;
 
@@ -1433,7 +1461,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
0, to_callback },
{ OPTION_CALLBACK, 0, "cc", NULL, N_("email"), N_("add Cc: 
header"),
0, cc_callback },
-   { OPTION_CALLBACK, 0, "from", , N_("ident"),
+   { OPTION_CALLBACK, 0, "from", NULL, N_("ident"),
N_("set From address to  (or committer ident 
if absent)"),
PARSE_OPT_OPTARG, from_callback },
OPT_STRING(0, "in-reply-to", _reply_to, N_("message-id"),
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 10f6d52.

[PATCH v2 0/2] format-patch: Transition the default to --from to avoid spoofed mails

2016-07-30 Thread Josh Triplett
As discussed, this patch series allows transitioning the default
behavior of format-patch to --from, to avoid spoofing mails when
formatting commits not written by the user.

The first patch introduces the format.from option to set the default
value of format-patch --from.  This patch doesn't change the default
behavior of format-patch, so it can go in without any transition.

The second patch changes the default to --from.  If you'd like to delay
this patch for a release and mention the planned change in the release
notes, let me know and I'll provide text for the release notes; if you
don't think this needs a transition period, you can go ahead and apply
the second patch.

v2: Unify the various places setting from into a single helper function.

Josh Triplett (2):
  format-patch: Add a config option format.from to set the default for --from
  format-patch: Default to --from

 Documentation/config.txt   | 10 -
 builtin/log.c  | 47 +++
 contrib/completion/git-completion.bash |  1 +-
 t/t4014-format-patch.sh| 68 +--
 4 files changed, 114 insertions(+), 12 deletions(-)

-- 
git-series 0.8.7
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/2] format-patch: Default to --from

2016-07-30 Thread Josh Triplett
This avoids spoofing mails when formatting commits not written by the
user.

Add tests for the new default, and fix tests whose expected output
depended on the old default.

Signed-off-by: Josh Triplett <j...@joshtriplett.org>
---
 Documentation/config.txt |  2 +-
 builtin/log.c|  1 +
 t/t4014-format-patch.sh  | 28 +---
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index bd34774..2310877 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1261,7 +1261,7 @@ format.from::
`--from`, using your committer identity in the "From:" field of patch
mails and including a "From:" field in the body of the patch mail if
different.  If set to a non-boolean value, format-patch uses that
-   value instead of your committer identity.  Defaults to false.
+   value instead of your committer identity.  Defaults to true.
 
 format.numbered::
A boolean which can enable or disable sequence numbers in patch
diff --git a/builtin/log.c b/builtin/log.c
index dbd2da7..bcca974 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1486,6 +1486,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
OPT_END()
};
 
+   set_from(FROM_USER, NULL);
extra_hdr.strdup_strings = 1;
extra_to.strdup_strings = 1;
extra_cc.strdup_strings = 1;
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index b0579dd..fa35cbe 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -652,7 +652,7 @@ EOF
 
 test_expect_success 'format-patch -p suppresses stat' '
 
-   git format-patch -p -2 &&
+   git format-patch --no-from -p -2 &&
sed -e "1,/^\$/d" -e "/^+5/q" < 
0001-This-is-an-excessively-long-subject-line-for-a-messa.patch > output &&
test_cmp expect output
 
@@ -973,7 +973,7 @@ check_author() {
echo content >>file &&
git add file &&
GIT_AUTHOR_NAME=$1 git commit -m author-check &&
-   git format-patch --stdout -1 >patch &&
+   git format-patch --no-from --stdout -1 >patch &&
sed -n "/^From: /p; /^ /p; /^$/q" actual &&
test_cmp expect actual
 }
@@ -1089,6 +1089,18 @@ test_expect_success '--from=ident replaces author' '
test_cmp expect patch.head
 '
 
+test_expect_success 'Default uses committer ident' '
+   git format-patch -1 --stdout >patch &&
+   cat >expect <<-\EOF &&
+   From: C O Mitter <commit...@example.com>
+
+   From: A U Thor <aut...@example.com>
+
+   EOF
+   sed -ne "/^From:/p; /^$/p; /^---$/q" patch.head &&
+   test_cmp expect patch.head
+'
+
 test_expect_success '--from uses committer ident' '
git format-patch -1 --stdout --from >patch &&
cat >expect <<-\EOF &&
@@ -1101,6 +1113,16 @@ test_expect_success '--from uses committer ident' '
test_cmp expect patch.head
 '
 
+test_expect_success '--no-from suppresses default --from' '
+   git format-patch -1 --stdout --no-from >patch &&
+   cat >expect <<-\EOF &&
+   From: A U Thor <aut...@example.com>
+
+   EOF
+   sed -ne "/^From:/p; /^$/p; /^---$/q" patch.head &&
+   test_cmp expect patch.head
+'
+
 test_expect_success '--from omits redundant in-body header' '
git format-patch -1 --stdout --from="A U Thor <aut...@example.com>" 
>patch &&
cat >expect <<-\EOF &&
@@ -1129,7 +1151,7 @@ test_expect_success 'in-body headers trigger content 
encoding' '
 append_signoff()
 {
C=$(git commit-tree HEAD^^{tree} -p HEAD) &&
-   git format-patch --stdout --signoff $C^..$C >append_signoff.patch &&
+   git format-patch --no-from --stdout --signoff $C^..$C 
>append_signoff.patch &&
sed -n -e "1,/^---$/p" append_signoff.patch |
egrep -n "^Subject|Sign|^$"
 }
-- 
git-series 0.8.7
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] format-patch: Add a config option format.from to set the default for --from

2016-07-30 Thread Josh Triplett
On Sat, Jul 30, 2016 at 11:40:34AM -0400, Jeff King wrote:
> On Sat, Jul 30, 2016 at 02:41:56AM -0700, Josh Triplett wrote:
> 
> > @@ -807,6 +808,17 @@ static int git_format_config(const char *var, const 
> > char *value, void *cb)
> > base_auto = git_config_bool(var, value);
> > return 0;
> > }
> > +   if (!strcmp(var, "format.from")) {
> > +   int b = git_config_maybe_bool(var, value);
> > +   free(from);
> > +   if (b < 0)
> > +   from = xstrdup(value);
> > +   else if (b)
> > +   from = xstrdup(git_committer_info(IDENT_NO_DATE));
> > +   else
> > +   from = NULL;
> > +   return 0;
> > +   }
> 
> This "free old, then handle tri-state" mirrors the code in the parseopt
> callback pretty closely. I wonder if they could share the logic (it is
> not many lines, but we would want the logic to stay identical). I
> suspect the helper function would end up with more boilerplate than it's
> worth, though, trying to handle the unset and default cases.

I looked at trying to share that code for exactly that reason, but
didn't find a convenient way to share the two, because from_callback
checked two separate variables (unset and arg), while the logic above
checks one.  So, while the *bodies* of the three-way if are duplicated,
the *conditions* aren't.

However, if you'd like to avoid the duplication between the three
values, I can do that with a set_from function that takes an enum and a
new value; it'll actually increase lines of code, but remove the
duplication (as well as the second patch's third copy of setting from to
the committer info).

> > diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
> > index 1206c48..b0579dd 100755
> > --- a/t/t4014-format-patch.sh
> > +++ b/t/t4014-format-patch.sh
> > @@ -229,6 +229,46 @@ check_patch () {
> > grep -e "^Subject:" "$1"
> >  }
> >  
> > +test_expect_success 'format.from=false' '
> > +
> > +   git -c format.from=false format-patch --stdout master..side |
> > +   sed -e "/^\$/q" >patch &&
> > +   check_patch patch &&
> > +   ! grep "^From: C O Mitter <commit...@example.com>\$" patch
> > +'
> 
> These tests follow a different style from the "--from" tests later in
> the script (and your second patch does follow it, and puts its test
> close there). Any reason not to have all of the "from" tests together,
> and using the same style?

The tests covered different things.  The later --from tests made sure
that --from behaved as expected.  These tests made sure that format.from
and --from/--no-from interacted in the expected way, with the
command-line options overriding the configuration.  So, I put them next
to the tests for other options like format.to and format.cc, which
tested the same thing (overriding those with --no-to, --no-cc, etc).

> Overall, the whole thing looks cleanly done, and I don't mind it going
> in as-is. These are just two things I noticed while reading it over.

I'll send a v2 with the code duplication fixed.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] format-patch: Transition the default to --from to avoid spoofed mails

2016-07-30 Thread Josh Triplett
As discussed, this patch series allows transitioning the default
behavior of format-patch to --from, to avoid spoofing mails when
formatting commits not written by the user.

The first patch introduces the format.from option to set the default
value of format-patch --from.  This patch doesn't change the default
behavior of format-patch, so it can go in without any transition.

The second patch changes the default to --from.  If you'd like to delay
this patch for a release and mention the planned change in the release
notes, let me know and I'll provide text for the release notes; if you
don't think this needs a transition period, you can go ahead and apply
the second patch.

Josh Triplett (2):
  format-patch: Add a config option format.from to set the default for --from
  format-patch: Default to --from

 Documentation/config.txt   | 10 -
 builtin/log.c  | 14 +-
 contrib/completion/git-completion.bash |  1 +-
 t/t4014-format-patch.sh| 68 +--
 4 files changed, 89 insertions(+), 4 deletions(-)

-- 
git-series 0.8.7
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] format-patch: Add a config option format.from to set the default for --from

2016-07-30 Thread Josh Triplett
This helps users who would prefer format-patch to default to --from, and
makes it easier to change the default in the future.

Signed-off-by: Josh Triplett <j...@joshtriplett.org>
---
 Documentation/config.txt   | 10 +++-
 builtin/log.c  | 13 -
 contrib/completion/git-completion.bash |  1 +-
 t/t4014-format-patch.sh| 40 +++-
 4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 8b1aee4..bd34774 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1253,6 +1253,16 @@ format.attach::
value as the boundary.  See the --attach option in
linkgit:git-format-patch[1].
 
+format.from::
+   Provides the default value for the `--from` option to format-patch.
+   Accepts a boolean value, or a name and email address.  If false,
+   format-patch defaults to `--no-from`, using commit authors directly in
+   the "From:" field of patch mails.  If true, format-patch defaults to
+   `--from`, using your committer identity in the "From:" field of patch
+   mails and including a "From:" field in the body of the patch mail if
+   different.  If set to a non-boolean value, format-patch uses that
+   value instead of your committer identity.  Defaults to false.
+
 format.numbered::
A boolean which can enable or disable sequence numbers in patch
subjects.  It defaults to "auto" which enables it only if there
diff --git a/builtin/log.c b/builtin/log.c
index fd1652f..1f116be 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -719,6 +719,7 @@ static void add_header(const char *value)
 static int thread;
 static int do_signoff;
 static int base_auto;
+static char *from;
 static const char *signature = git_version_string;
 static const char *signature_file;
 static int config_cover_letter;
@@ -807,6 +808,17 @@ static int git_format_config(const char *var, const char 
*value, void *cb)
base_auto = git_config_bool(var, value);
return 0;
}
+   if (!strcmp(var, "format.from")) {
+   int b = git_config_maybe_bool(var, value);
+   free(from);
+   if (b < 0)
+   from = xstrdup(value);
+   else if (b)
+   from = xstrdup(git_committer_info(IDENT_NO_DATE));
+   else
+   from = NULL;
+   return 0;
+   }
 
return git_log_config(var, value, cb);
 }
@@ -1384,7 +1396,6 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
int quiet = 0;
int reroll_count = -1;
char *branch_name = NULL;
-   char *from = NULL;
char *base_commit = NULL;
struct base_tree_info bases;
 
diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 10f6d52..4393033 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2181,6 +2181,7 @@ _git_config ()
format.attach
format.cc
format.coverLetter
+   format.from
format.headers
format.numbered
format.pretty
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 1206c48..b0579dd 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -229,6 +229,46 @@ check_patch () {
grep -e "^Subject:" "$1"
 }
 
+test_expect_success 'format.from=false' '
+
+   git -c format.from=false format-patch --stdout master..side |
+   sed -e "/^\$/q" >patch &&
+   check_patch patch &&
+   ! grep "^From: C O Mitter <commit...@example.com>\$" patch
+'
+
+test_expect_success 'format.from=true' '
+
+   git -c format.from=true format-patch --stdout master..side |
+   sed -e "/^\$/q" >patch &&
+   check_patch patch &&
+   grep "^From: C O Mitter <commit...@example.com>\$" patch
+'
+
+test_expect_success 'format.from with address' '
+
+   git -c format.from="F R Om <f...@example.com>" format-patch --stdout 
master..side |
+   sed -e "/^\$/q" >patch &&
+   check_patch patch &&
+   grep "^From: F R Om <f...@example.com>\$" patch
+'
+
+test_expect_success '--no-from overrides format.from' '
+
+   git -c format.from="F R Om <f...@example.com>" format-patch --no-from 
--stdout master..side |
+   sed -e "/^\$/q" >patch &&
+   check_patch patch &&
+   ! grep "^From: F R Om <f...@example.com>\$" patch
+'
+
+test_expect_success '--from overrides format.from' '
+
+   git -c format.from="F R Om <f...@exampl

[PATCH 2/2] format-patch: Default to --from

2016-07-30 Thread Josh Triplett
This avoids spoofing mails when formatting commits not written by the
user.

Add tests for the new default, and fix tests whose expected output
depended on the old default.

Signed-off-by: Josh Triplett <j...@joshtriplett.org>
---
 Documentation/config.txt |  2 +-
 builtin/log.c|  1 +
 t/t4014-format-patch.sh  | 28 +---
 3 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index bd34774..2310877 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1261,7 +1261,7 @@ format.from::
`--from`, using your committer identity in the "From:" field of patch
mails and including a "From:" field in the body of the patch mail if
different.  If set to a non-boolean value, format-patch uses that
-   value instead of your committer identity.  Defaults to false.
+   value instead of your committer identity.  Defaults to true.
 
 format.numbered::
A boolean which can enable or disable sequence numbers in patch
diff --git a/builtin/log.c b/builtin/log.c
index 1f116be..53b2f62 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -1472,6 +1472,7 @@ int cmd_format_patch(int argc, const char **argv, const 
char *prefix)
extra_hdr.strdup_strings = 1;
extra_to.strdup_strings = 1;
extra_cc.strdup_strings = 1;
+   from = xstrdup(git_committer_info(IDENT_NO_DATE));
init_log_defaults();
git_config(git_format_config, NULL);
init_revisions(, prefix);
diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index b0579dd..fa35cbe 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -652,7 +652,7 @@ EOF
 
 test_expect_success 'format-patch -p suppresses stat' '
 
-   git format-patch -p -2 &&
+   git format-patch --no-from -p -2 &&
sed -e "1,/^\$/d" -e "/^+5/q" < 
0001-This-is-an-excessively-long-subject-line-for-a-messa.patch > output &&
test_cmp expect output
 
@@ -973,7 +973,7 @@ check_author() {
echo content >>file &&
git add file &&
GIT_AUTHOR_NAME=$1 git commit -m author-check &&
-   git format-patch --stdout -1 >patch &&
+   git format-patch --no-from --stdout -1 >patch &&
sed -n "/^From: /p; /^ /p; /^$/q" actual &&
test_cmp expect actual
 }
@@ -1089,6 +1089,18 @@ test_expect_success '--from=ident replaces author' '
test_cmp expect patch.head
 '
 
+test_expect_success 'Default uses committer ident' '
+   git format-patch -1 --stdout >patch &&
+   cat >expect <<-\EOF &&
+   From: C O Mitter <commit...@example.com>
+
+   From: A U Thor <aut...@example.com>
+
+   EOF
+   sed -ne "/^From:/p; /^$/p; /^---$/q" patch.head &&
+   test_cmp expect patch.head
+'
+
 test_expect_success '--from uses committer ident' '
git format-patch -1 --stdout --from >patch &&
cat >expect <<-\EOF &&
@@ -1101,6 +1113,16 @@ test_expect_success '--from uses committer ident' '
test_cmp expect patch.head
 '
 
+test_expect_success '--no-from suppresses default --from' '
+   git format-patch -1 --stdout --no-from >patch &&
+   cat >expect <<-\EOF &&
+   From: A U Thor <aut...@example.com>
+
+   EOF
+   sed -ne "/^From:/p; /^$/p; /^---$/q" patch.head &&
+   test_cmp expect patch.head
+'
+
 test_expect_success '--from omits redundant in-body header' '
git format-patch -1 --stdout --from="A U Thor <aut...@example.com>" 
>patch &&
cat >expect <<-\EOF &&
@@ -1129,7 +1151,7 @@ test_expect_success 'in-body headers trigger content 
encoding' '
 append_signoff()
 {
C=$(git commit-tree HEAD^^{tree} -p HEAD) &&
-   git format-patch --stdout --signoff $C^..$C >append_signoff.patch &&
+   git format-patch --no-from --stdout --signoff $C^..$C 
>append_signoff.patch &&
sed -n -e "1,/^---$/p" append_signoff.patch |
egrep -n "^Subject|Sign|^$"
 }
-- 
git-series 0.8.7
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-29 Thread Josh Triplett
On Sat, Jul 30, 2016 at 01:47:42AM -0400, Jeff King wrote:
> On Fri, Jul 29, 2016 at 09:50:55PM -0700, Josh Triplett wrote:
> 
> > I would propose the following then:
> > 
> > - I'll write a patch adding a config option format.from, along with
> >   documentation, without changing the default.
> > - The release notes for the version of git introducing that config
> >   option should mention, prominently, the plan to change the default in
> >   a future version of git.
> > - A subsequent release can change the default.  No major rush to do
> >   this.
> > 
> > Does that sound reasonable?
> 
> That sounds fine to me.
> 
> I do have to admit that after reading through the "format.*" section of
> git-config(1), there is quite a bit that is configurable in it. So
> perhaps we do not need to be as careful about behavior changes as I
> thought.
> 
> So if you wanted to squish steps 2 and 3 together, that would also be OK
> by me.

I'll send two separate patches, and I'll leave it up to Junio whether to
apply the second one right away or wait a release.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-29 Thread Josh Triplett
On Fri, Jul 29, 2016 at 06:58:00PM -0400, Jeff King wrote:
> On Thu, Jul 28, 2016 at 07:08:02PM -0700, Josh Triplett wrote:
> 
> > > I think on the whole that defaulting to "--from" would help more people
> > > than hurt them, but if we do believe there are scripts that would be
> > > regressed, it probably needs a deprecation period.
> > 
> > I don't think it's likely that there are scripts that would be regressed
> > (and I think it's likely that there are scripts that would be
> > progressed), but I'd also have no objection to a deprecation period.
> 
> I'm on the fence, so I'd leave the final decision on whether to deal
> with deprecation to you (who will write the patch) and Junio (as the
> maintainer).

OK, see the plan proposed below then.

> > I just confirmed that with the default changed, --no-from works to
> > return to the current behavior, so we don't need a new option.  And
> > --no-from has worked for a long time, so scripts won't need to care if
> > they're working with an old version of git.
> > 
> > I can provide a patch implementing a new config option to set the
> > format-patch --from default ("false" for --no-from, "true" for --from,
> > or a string value for --from=value).
> 
> I'd be surprised if the config option is actually useful to anybody in
> practice (the distinction is not "this my preference" as much as "in
> what context am I calling the program?"). It is a convenient way to do
> deprecations (introduce an option, then flip the default, leaving the
> option as an escape hatch for anybody who has been broken), though.

It also allows people to change their local default in advance of git
changing the default.

> > Do you think this needs the kind of very noisy deprecation period that
> > push.default had, where anyone without the git-config option set gets a
> > warning to stderr?  Or do you think it would suffice to provide a
> > warning in the release notes for a while and then change the default?
> 
> IMHO the noisy deprecations haven't really been all that beneficial.
> Even after the length push.default one, I think we still had people who
> had skipped enough versions to miss it, and quite a few people became
> confused or annoyed by the spew of text.
> 
> OTOH, I'm not sure most people read the release notes carefully, either.
> It would be nice if we could make the change and count on people to
> notice it in 'next' or even 'master' before the release, but empirically
> that does not happen.
> 
> So I dunno. If we consider the risk minor, perhaps a mention in the
> release notes for version X, and then the change in X+1 would be fine.
> That gives some opportunity for release-note readers to pipe up. It's
> not foolproof, but it would give us more confidence.

I would propose the following then:

- I'll write a patch adding a config option format.from, along with
  documentation, without changing the default.
- The release notes for the version of git introducing that config
  option should mention, prominently, the plan to change the default in
  a future version of git.
- A subsequent release can change the default.  No major rush to do
  this.

Does that sound reasonable?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-07-29 Thread Josh Triplett
On Fri, Jul 29, 2016 at 01:44:44PM +0100, Richard Ipsum wrote:
> On Fri, Jul 29, 2016 at 04:04:26AM -0700, Josh Triplett wrote:
> > I hope to use git notes with git-series in the future, by putting
> > another gitlink under the git-series for notes related to the series.
> > I'd intended that for more persistent notes; putting them in the series
> > solves some of the problems related to notes refs, pushing/pulling, and
> > collaboration.  Using notes for review comments makes sense as well,
> > whether in a series or in a separate ref.
> 
> Sounds interesting, can you explain how this works in more detail?

The tree within a git-series commit includes a blob "cover" for the
cover letter, a gitlink "base" for the base commit, and a gitlink
"series" for the top of the series.  I could add a gitlink "notes",
which acts like a notes ref; then, each version of the series would have
its own notes ref.  As with the series, git-series would track the
"history of history"; since git-notes themselves use git history to
store a set of notes, git-series would store the history of the notes.
So if you add, remove, or change a note, git-series would track that as
a change to the notes ref.  If you merge/rebase/etc the notes ref to
merge notes, git-series would track that too.  A different series would
have a different set of notes, so you wouldn't be limited to
one notes ref per repository.

This doesn't solve the problem of merging notes, but it *does* mean you
have a full history of the changes to notes, not just the notes
themselves.

Something similar might work for the Gerrit notesdb.

> > > I've been considering taking the perl-notedb prototype and writing
> > > a C library for it with bindings for other languages (i.e. Rust).
> > 
> > A C library based on libgit2 seems like a good idea; ideally the
> > bindings could interoperate with git2-rs.  (Alternatively, Rust can
> > *export* a C interface, so you could write directly with git2-rs. :) )
> 
> Certainly a fair alternative, though it may arguably be safer to write
> the C and export to other languages, as cool as Rust looks it's not
> established the way C is, so may be a slightly riskier foundation,
> in my view.

I was mostly joking there.  Rust makes that potentially reasonable,
unlike most languages that can consume but not easily provide a C API,
but that doesn't make it the ideal solution quite yet. :)

> And ofcourse in C we have native access to libgit2.

Right.

> > One of the items on my long-term TODO list is a completely federated
> > GitHub; I've been looking at other aspects of that, but federated
> > reviews/comments/etc seem critical to that as well.
>
> I agree.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-07-29 Thread Josh Triplett
On Fri, Jul 29, 2016 at 11:10:11AM +0100, Richard Ipsum wrote:
> On Thu, Jul 28, 2016 at 11:40:55PM -0700, Josh Triplett wrote:
> [snip]
> > 
> > I'd welcome any feedback, whether on the interface and workflow, the
> > internals and collaboration, ideas on presenting diffs of patch series,
> > or anything else.
> > 
> 
> This looks awesome!
> 
> I've been working on some similar stuff for a while also.[1][2]
> 
> I'm particularly interested in trying to establish a standard for
> storing review data in git. I've got a prototype for doing that[3],
> and an example tool that uses it[4]. The tool is still incomplete/buggy 
> though.

Looks promising, though!

> There seem to be a number of us trying to solve this in our different ways,
> it would be great to coordinate our efforts.

These definitely seem like a family of related problems.  I'd like to
use git-series as a format for storing iterations on things like GitHub
pull-requests or Gerrit patch versions (in the latter case, overcoming
Gerrit's limitations on only handling one patch at a time).  Integrating
reviews with that seems helpful.

> The prototype library I have is partly the result of some discussion and work
> with the Gerrit folks, since they were thinking about this problem
> before I even started writing git-candidate, and solved it with Notedb.[5]
> 
> Let me know if you'd like to work together on this,

I'd love to.

I'll be presenting git-series at LinuxCon North America; will you be
there by any chance?  If not, perhaps we could meet by IRC or some other
medium and talk about this family of problems.

I hope to use git notes with git-series in the future, by putting
another gitlink under the git-series for notes related to the series.
I'd intended that for more persistent notes; putting them in the series
solves some of the problems related to notes refs, pushing/pulling, and
collaboration.  Using notes for review comments makes sense as well,
whether in a series or in a separate ref.

> I've been considering taking the perl-notedb prototype and writing
> a C library for it with bindings for other languages (i.e. Rust).

A C library based on libgit2 seems like a good idea; ideally the
bindings could interoperate with git2-rs.  (Alternatively, Rust can
*export* a C interface, so you could write directly with git2-rs. :) )

One of the items on my long-term TODO list is a completely federated
GitHub; I've been looking at other aspects of that, but federated
reviews/comments/etc seem critical to that as well.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[ANNOUNCE] git-series: track changes to a patch series over time

2016-07-29 Thread Josh Triplett
I'd like to announce a project I've been working on for a while:

git-series provides a tool for managing patch series with git, tracking
the "history of history". git series tracks changes to the patch series
over time, including rebases and other non-fast-forwarding changes. git
series also tracks a cover letter for the patch series, formats the
series for email, and prepares pull requests.

This makes it easier to collaborate on a patch series, distribution
package, backport, or any other development process that includes
rebasing or non-fast-forward development.

A patch series typically goes through multiple iterations before
submission; the path from idea to RFC to [PATCHv12 1/8] includes many
invocations of git rebase -i. However, while Git tracks and organizes
commits quite well, it doesn't actually track changes to a patch series
at all, outside of the ephemeral reflog. This makes it a challenge to
collaborate on a patch series, distribution package, backport, or any
other development process that includes rebasing or non-fast-forward
development.

Typically, tracking the evolution of a patch series over time involves
moving part of the version control outside of git. You can move the
patch series from git into quilt or a distribution package, and then
version the patch files with git, losing the power of git's tools. Or,
you can keep the patch series in git, and version it via multiple named
branches; however, names like feature-v2, feature-v3-typofix, and
feature-v8-rebased-4.6-alice-fix sound like filenames from corporate
email, not modern version control. And either way, git doesn't track
your cover letter at all.

git-series tracks both a patch series and its evolution within the same
git repository. git-series works entirely with existing git features,
allowing git to push and pull a series to any git repository along with
other branches and tags. Each time you change the patch series, whether
fast-forwarding or not, you can "git series commit" a new version of the
patch series, complete with commit message.

You can rebase a patch series with "git series rebase -i", format it for
submission with "git series format", or send a "please pull" request with
"git series req".  git-series knows the base of your series, so you
don't need to count patches or find a commit hash to run rebase or
format.

If you're interested in trying git-series, see
https://github.com/git-series/git-series for installation instructions
and a "getting started" guide.

I've also documented the internal storage format of git-series at
https://github.com/git-series/git-series/blob/master/INTERNALS.md ,
including the details for how git-series ensures git can always reach,
push, and pull a series.

I'd welcome any feedback, whether on the interface and workflow, the
internals and collaboration, ideas on presenting diffs of patch series,
or anything else.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-28 Thread Josh Triplett
On Thu, Jul 28, 2016 at 08:16:19PM -0400, Jeff King wrote:
> The question in my mind is whether people actually use format-patch for
> things besides emailing, and if the final destination is something other
> than "git am".  It is a handy format because it is the least-lossy way
> to move commits around external to git itself.  That's why "rebase" used
> it originally. If the final destination is "am" (as it is for rebase),
> then in-body headers are OK, because we know it understands those. If
> not, then it's a regression.
> 
> I think on the whole that defaulting to "--from" would help more people
> than hurt them, but if we do believe there are scripts that would be
> regressed, it probably needs a deprecation period.

I don't think it's likely that there are scripts that would be regressed
(and I think it's likely that there are scripts that would be
progressed), but I'd also have no objection to a deprecation period.

I just confirmed that with the default changed, --no-from works to
return to the current behavior, so we don't need a new option.  And
--no-from has worked for a long time, so scripts won't need to care if
they're working with an old version of git.

I can provide a patch implementing a new config option to set the
format-patch --from default ("false" for --no-from, "true" for --from,
or a string value for --from=value).

Do you think this needs the kind of very noisy deprecation period that
push.default had, where anyone without the git-config option set gets a
warning to stderr?  Or do you think it would suffice to provide a
warning in the release notes for a while and then change the default?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-28 Thread Josh Triplett
On Thu, Jul 28, 2016 at 02:37:04PM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > I'd like to propose changing the default behavior of git-format-patch to
> > --from (and adding a --from-author option to override, and perhaps a
> > config setting).  This will not change the output *except* when
> > formatting patches authored by someone else.  git-am and git-send-email
> > both handle the --from format without any issues.
> 
> I see this in "format-patch --help":
> 
>Note that this option is only useful if you are actually
>sending the emails and want to identify yourself as the
>sender, but retain the original author (and git am will
>correctly pick up the in-body header).  Note also that
>git send-email already handles this transformation for
>you, and this option should not be used if you are
>feeding the result to git send-email.
> 
> The first one says "only useful", but it seems what it really means
> is "it becomes no-op if you are sending your own patch anyway".  So
> that one does not worry me.  What is most worrysome is the latter
> half of the last sentence.  Is it really "should not be", or is it
> merely "use of this option is just a waste of time, as you would get
> exactly the same result anyway"?  If it is the latter, that is fine.

As far as I can tell, it's the latter.  git send-email can do this same
transformation, but handles mails that already have the transformation
done to them without any issue.

> One thing I absolutely do not want to see is people to start
> repeating their own ident on in-body "From: " header when they send
> their own patch.  That would waste everybody's time pointing out
> "You do not have to do that, it merely adds noise".  As long as you
> can guarantee that your change won't increase the rate of that, I am
> fine with the proposal.

git format-patch with --from *only* adds an in-body "From:" if the
commit author differs from the local committer identity.  So, as far as
I can tell, the only scenario that would produce additional in-body "From:"
headers here would be if someone had failed to configure their git
identity, and manually set the author for their own commits.  (In which
case, they'd also have a broken "From:" in any cover letter they
generated.)

So, it seems exceedingly unlikely to me that this would result in
unnecessary in-body "From:" headers.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-28 Thread Josh Triplett
On Thu, Jul 28, 2016 at 05:56:03PM -0400, Jeff King wrote:
> Another way to think about it is that "--from" is a no-brainer when you
> really are going to email the patches (and that's why it is has always
> been the default behavior in git-send-email). But if you _aren't_ going
> to mail the patches, retaining the original headers is more convenient.
> It's not clear to me how many non-mail users of format-patch there are
> (certainly rebase is one of them, but because it uses "am" on the
> receiving side, I think everything should Just Work).

I've seen various people use git format-patch to produce a patch file to
email around, or a patch file to commit to a patches directory in a
distribution package.  I don't think any such tools would work
incorrectly with the --from output, though for purely aesthetic reasons,
I can imagine not wanting to include your own email address when not
sending the patch by email.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-28 Thread Josh Triplett
On Thu, Jul 28, 2016 at 03:14:48PM -0700, Junio C Hamano wrote:
> Jeff King  writes:
> > I think the original reason I did not make "--from" the default is that
> > I was worried about breaking consumers which do not know how to handle
> > in-body headers.
> 
> That's a fair concern.
> 
> So going back to Josh's original problem description:
> 
> While git-send-email knows how to change the patch mails to use your own
> address as "From:" and add a "From:" line to the body for the author,
> any other tool used to send emails doesn't do that.
> 
> I wonder how these "any other tool" (that reads the format-patch
> output, i.e. mbox file with one mail per file each, and sends each
> as a piece of e-mail, without paying attention who you, the tool's
> user, are and blindly send them with the original "From:" and other
> headers intact in the header part of the message) are used in the
> wild to send patch submissions.  /usr/bin/mail or /usr/bin/Mail
> would not be among them, as I suspect they would place everything in
> the body part, and the would do so without stripping the "From "
> line that exists before each e-mail message.

mutt -H would be one example; I regularly use that to send mails.
(It'll override "From:" if it doesn't know the address in it, which
loses the author information entirely; it'll work fine with the --from
format.) git-imap-send would be another example; its behavior would vary
by mail client.  Both of those should always work fine with a mail
produced via --from; they'll just ignore the in-body "From:" and send
the mail.  They'd tend to do the wrong thing with a mail produced
without using --from though.

I don't know what people who end up sending From-spoofed mails to LKML
are using, but I've seen such mails regularly.  I also get occasional
blowback from someone who sent such mails including patches I authored
with my address spoofed as "From:".  And I've also seen someone flamed
for sending patches to a mailing list for review with spoofed "From:"
addresses.

I can think of aesthetic reasons to want the non-"--from" format (for
instance, sticking patch files into a non-git-based tool like quilt or a
distribution packaging system, and not wanting your own email address
included), but I can't think of any tool that would produce incorrect
results if handed the --from format.  That seems like an argument for
switching the default, and adding a --from-author option or similar to
get the current output.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC] git-format-patch: default to --from to avoid spoofed mails?

2016-07-28 Thread Josh Triplett
When git-format-patch formats a patch authored by someone other than
yourself, it defaults to filling in the "From:" field of the email from
the commit author.  If you explicitly pass the --from option,
git-format-patch will instead use your own committer identity as the
"From:", and then put a "From:" line at the top of the body if the
commit author differs.  (git-am know to use that as the commit author
when applying.)

While git-send-email knows how to change the patch mails to use your own
address as "From:" and add a "From:" line to the body for the author,
any other tool used to send emails doesn't do that.  I've seen more than
a few mails sent to various mailing lists and patch review tools with a
spoofed "From:" field pointing to the commit author, typically without
the knowledge of the author, which can lead to interesting surprises.

I'd like to propose changing the default behavior of git-format-patch to
--from (and adding a --from-author option to override, and perhaps a
config setting).  This will not change the output *except* when
formatting patches authored by someone else.  git-am and git-send-email
both handle the --from format without any issues.

Before I write such a patch: does anyone see a problem with such a
change?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: format-patch with pager.format-patch=true gets very confused

2016-07-17 Thread Josh Triplett
On Sun, Jul 17, 2016 at 02:41:48PM +0200, Johannes Schindelin wrote:
> Hi Josh,
> 
> On Sat, 16 Jul 2016, Josh Triplett wrote:
> 
> > git-config(1) documents the ability to enable or disable the pager (or
> > set a command-specific pager) for any command by setting
> > pager.=true.  For most commands, this seems to work as expected.
> > However, setting pager.format-patch=true (or setting it to any specific
> > pager) breaks badly: the pager spawns, with no output in it, and the
> > pager doesn't respond to keystrokes (which makes it difficult to quit).
> > 
> > I think this may occur because format-patch's "reopen_stdout" interacts
> > badly with the pager.
> > 
> > I think it makes sense for "format-patch --stdout" to respect
> > pager.format-patch, but for format-patch *without* stdout to ignore
> > pager.* and *never* spawn a pager, given that its only output (the list
> > of patch files) goes to "realstdout".
> 
> As per http://article.gmane.org/gmane.comp.version-control.git/299451,
> the `js/log-to-diffopt-file` patch series will be merged to `master` soon.
> This patch series avoids the reopen() altogether and should fix the
> problem you experience.
> 
> Since it is already in `next`, it should be relatively easy for you to
> build and confirm. Would you kindly do that?

I can confirm that that fixes the problem.  Thanks!

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


format-patch with pager.format-patch=true gets very confused

2016-07-16 Thread Josh Triplett
git-config(1) documents the ability to enable or disable the pager (or
set a command-specific pager) for any command by setting
pager.=true.  For most commands, this seems to work as expected.
However, setting pager.format-patch=true (or setting it to any specific
pager) breaks badly: the pager spawns, with no output in it, and the
pager doesn't respond to keystrokes (which makes it difficult to quit).

I think this may occur because format-patch's "reopen_stdout" interacts
badly with the pager.

I think it makes sense for "format-patch --stdout" to respect
pager.format-patch, but for format-patch *without* stdout to ignore
pager.* and *never* spawn a pager, given that its only output (the list
of patch files) goes to "realstdout".

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gc and repack ignore .git/*HEAD when checking reachability

2016-07-09 Thread Josh Triplett
On Sat, Jul 09, 2016 at 09:35:24AM +0200, Johannes Schindelin wrote:
> On Fri, 8 Jul 2016, Junio C Hamano wrote:
> > Josh Triplett <j...@joshtriplett.org> writes:
> > 
> > > That sounds reasonable.  And if they *do* end up taking any time to
> > > traverse, it's because they weren't reachable from other anchoring
> > > points, so taking the extra time to traverse them seems fine.
> > 
> > The only thing that is hard is to clearly define _what_ are the new
> > anchoring points.
> > 
> > It cannot be "anything directly under .git that has all-caps name
> > that ends with _HEAD".  The ones we write we know are going to be
> > removed at some point in time (e.g. "git reset", "git bisect reset",
> > "git merge --abort", etc.).  We do not have any control on random
> > ones that the users and third-party tools leave behind, holding onto
> > irrelevant objects forever.
> 
> Please note that bisect already uses the (transient) refs/bisect/
> namespace. So I do not think we need to take specific care of the
> BISECT_* files.
> 
> If we had thought of it back then, we could have used such a transient
> namespace also for FETCH_HEAD, CHERRY_PICK_HEAD and also for detached
> HEADs (which we should have called "unnamed branches").
> 
> Now, how about special-casing *just* these legacy files in gc: HEAD,
> FETCH_HEAD, MERGE_HEAD and CHERRY_PICK_HEAD? Any new transient refs should
> live in the refs/ namespace, which is already handled.

That seems workable as well; in that case, we should also document this
(in the git-gc manpage at a minimum), and explicitly suggest creating
refs in refs/ but outside of refs/heads/ and refs/tags/, rather than
directly in .git/.

> BTW this issue is getting much more problematic when you have a lot of
> worktrees, some of which operate on detached HEADs. Which I do.
> 
> Ciao,
> Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gc and repack ignore .git/*HEAD when checking reachability

2016-07-08 Thread Josh Triplett
On Fri, Jul 08, 2016 at 07:50:53PM -0400, Theodore Ts'o wrote:
> On Fri, Jul 08, 2016 at 01:30:06PM -0700, Junio C Hamano wrote:
> > 
> > I can imagine I'd start hacking on a project that I rarely touch, give up
> > resolving a "git pull" from an unconfigured place (hence, some stuff is
> > only reachable from FETCH_HEAD), and after 2*N days, come back
> > to the repository and find that I cannot continue working on it.
> 
> Sure, but that's something that could happen today, and no one has
> really complained, have they?

Until now. :)

> > Turning the rule to "*_HEAD we know about, and those we don't that
> > are young" would not change the situation, as I may be depending on
> > some third-party tool that uses its OWN_HEAD just like we use
> > FETCH_HEAD in the above example.
> > 
> > So I dunno if that is a good solution. If we are going to declare that
> > transient stuff will now be kept, i.e. keeping them alive is no longer
> > end user's responsibility, then probably we should make it end user's
> > responsibility to clean things up.
> 
> Well, the question is what does "transient" stuff really mean?  If we
> keep them forever, then are they really any different from stuff under
> refs/heads?

No, they're not.  And I don't think they should be; HEAD itself is *not*
transient, as a detached HEAD can reference valuable non-transient
objects.

In an ideal world, HEAD and all other such refs would live somewhere
under refs, to avoid the special case.

> Maybe pester the user if there is stale *_HEAD files, but don't
> actually get rid of the objects?

Why pester at all?  Just leave them, and if the user has large objects
they don't care about and wants to decrease the size of the repository,
they can follow the advice from the git-gc manpage: "If you are
expecting some objects to be collected and they aren’t, check all of
those locations and decide whether it makes sense in your case to remove
those references."  (Where "those locations" will need to expand to
mention "*HEAD".)

(Also, I'd suggest "*HEAD", possibly limited to characters matching
[-_A-Z], rather than "*_HEAD".)

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gc and repack ignore .git/*HEAD when checking reachability

2016-07-08 Thread Josh Triplett
On Fri, Jul 08, 2016 at 10:14:33AM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > That sounds reasonable.  And if they *do* end up taking any time to
> > traverse, it's because they weren't reachable from other anchoring
> > points, so taking the extra time to traverse them seems fine.
> 
> The only thing that is hard is to clearly define _what_ are the new
> anchoring points.
> 
> It cannot be "anything directly under .git that has all-caps name
> that ends with _HEAD".  The ones we write we know are going to be
> removed at some point in time (e.g. "git reset", "git bisect reset",
> "git merge --abort", etc.).  We do not have any control on random
> ones that the users and third-party tools leave behind, holding onto
> irrelevant objects forever.

"We don't understand it, so it must not be important" does not seem like
a safe approach.  An object matching "[_A-Z]*HEAD" might act more like a
detached HEAD, referencing objects the user wants to hold onto
permanently.  And even FETCH_HEAD might point to objects the user
expected to stick around until they chose to ignore them; what makes
FETCH_HEAD a less legitimate head to work with than refs/heads/scratch?
(Or, for that matter, what about an ugly merge in MERGE_HEAD that the
user put off until after a vacation?)

The only downside of "holding onto irrelevant objects forever" is
storage space, which the user can easily reclaim by removing HEADs they
don't want.  The downside of pruning objects the user wanted is data
loss.  Why not just mark all those objects as roots for reachability
permanently, and let the user remove them if they want to prune those
objects?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: gc and repack ignore .git/*HEAD when checking reachability

2016-07-08 Thread Josh Triplett
On Thu, Jul 07, 2016 at 09:34:02PM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> > This could result in data loss, if a user expected that having an object
> > referenced from those places would protect it from pruning.
> 
> Yeah, luckily, nobody expects such.  I do not think any of our
> document says nothing other than HEAD like CHERRY_PICK_HEAD is
> reachability anchoring point; they are designed to be transient.

I can imagine at least one scenario that would result in data loss here:
git pull a URL (not referenced via any ref other than
FETCH_HEAD/MERGE_HEAD), get a merge conflict, get halfway through
resolving it, set that repository aside for a while, do something that
triggers a gc, then attempt to finish and commit.

Unlikely, but not impossible.  Same reason the reachability logic looks
at the index.

(I originally encountered this because I intended to add another
HEAD-like ref in .git, so I started investigating the logic around such
HEADs.)

> Because they are designed to be transient, I do not think there is
> any downside (other than the initial start-up cost) to including
> them in reachability computation.  Because they are meant to be
> transient, the objects anchored by them would be reachable from
> other anchoring points anyway.

That sounds reasonable.  And if they *do* end up taking any time to
traverse, it's because they weren't reachable from other anchoring
points, so taking the extra time to traverse them seems fine.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


gc and repack ignore .git/*HEAD when checking reachability

2016-07-07 Thread Josh Triplett
The manpage for git gc says:
> git gc tries very hard to be safe about the garbage it collects. In
> particular, it will keep not only objects referenced by your current
> set of branches and tags, but also objects referenced by the index,
> remote-tracking branches, refs saved by git filter-branch in
> refs/original/, or reflogs (which may reference commits in branches
> that were later amended or rewound).

gc, repack, and anything else that uses the machinery in reachable.c
will also check HEAD, to include objects only reachable from a detached
HEAD (which the manpage should document).

However, unreachable.c does not check any other ref that sits directly
in the .git directory, such as MERGE_HEAD, FETCH_HEAD, or
CHERRY_PICK_HEAD.  To test this, try creating a new empty repository
with "git init repo ; cd repo", then use "git fetch URL" to fetch a
repository into FETCH_HEAD, then run "git repack -a -d -f", and then
"git show FETCH_HEAD".  This similarly affects "git gc", which will
unpack all the objects from the pack and leave them loose.

This could result in data loss, if a user expected that having an object
referenced from those places would protect it from pruning.

I think the right fix for this would involve having
mark_reachable_objects in reachable.c add all refs that match
.git/*HEAD, not just .git/HEAD itself.  (I'd suggest matching .git/*HEAD
rather than hardcoding the list of "special" refs, to provide
compatibility with any other tool or future version of git that
introduces another such ref.)  This seems fairly easily done with a new
variant of do_head_ref that includes all such refs, along with a
one-line change to mark_reachable_objects to use it.

Does this seem like a reasonable approach?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Easiest way to clone over an existing directory?

2016-06-15 Thread Josh Triplett
Currently, every time I set up a new system, I run the following:

git clone $MY_HOMEDIR
mv home/.git .
rm -r home
git checkout -f

This seems like an odd dance to go through.  But I can't just git clone
into ~ directly, because git clone will not clone into an existing
non-empty directory.

(I could use "git clone -n" to avoid the unnecessary checkout, but the
files are small, and it wouldn't remove the need to rmdir so the number
of commands would remain the same.)

Does some better way exist to handle this?  And if not, would it make
sense for git clone to have an option to clone into an existing
directory (which should also avoid setting junk_work_tree)?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Properties of trees referencing commit objects (mode 160000)?

2016-03-21 Thread Josh Triplett
On Mon, Mar 21, 2016 at 01:57:13AM -0400, Jeff King wrote:
> On Sun, Mar 20, 2016 at 04:22:02PM -0700, Josh Triplett wrote:
> 
> > I want to track the evolution of a patch series or other commit history,
> > through non-fast-forwarding actions like rebase, rebase -i, or commit
> > --amend.  Similar in spirit to reflog, but with intentional commits and
> > commit messages, and most importantly with the ability to share and
> > collaborate on it.  For each version of the patch series, I plan to
> > track the commit at the end of the series, and optionally the commit at
> > the base of the patch series (to simplify tracking of rebasing); I'll
> > use the tree object associated with the commit to track the cover
> > letter, and perhaps meta-changelog entries associated with v2/v3/vN of
> > the series.
> > 
> > Patch series almost always need to evolve through non-fast-forwarding
> > history.  And I've seen that done in two ways: either pull the patch
> > series out of git, put it in quilt or similar, and track the quilt
> > series with git; or pull the versioning of the patch series out of git
> > and track it with branch names like feature-v2, feature-2016-03-20,
> > feature-rebased, and feature-rebased-4.4-fixed-foo-fixed-bar.  That last
> > one closely matches a real-world example I've recently seen.  And that
> > starts to look a lot like the naming of files and documents in
> > organizations that haven't yet discovered the wonders of version
> > control.
> > 
> > I'd like to have the best of both worlds: handle the patch series in
> > git, *and* handle the versioning of the patch series in git.
> 
> It seems like you could represent this in git by storing a merge commit
> for each revision of the patch series, with one parent as the current
> real tip of the series, and the other as the merge-commit for the
> previous revision of the series. The tree would be the same as current
> tip commit's tree. You could store metadata about the series in the
> merge commits (presumably including some marker to say "I'm a special
> series-revision commit, not a regular merge").

That's exactly what I'm planning to do now, ever since your initial
response said submodules wouldn't work.

> That's roughly the same as having "feature-v1", "feature-v2", etc, in a
> tree, except that you have to walk down the parent pointers to discover
> each entry, rather than walking the tree.
> 
> The resulting history would be viewable with naive "git log", but you
> would probably want to use "--first-parent" to see just the revisions,
> or some to-be-invented option to see just the most recent commits (e.g.,
> something like: if the commit subject is "magic-token: my-series", then
> skip this commit, show the second parent, and then walk down the first
> parent chain, skipping commits until we hit a commit that doesn't have
> "magic-token: my-series" in it).

Rather than relying on the numeric index of the commit parents for
semantic value, I planned to use the tree object.  I can still use
gitlinks for named references to commits from within the series, as long
as the commits linked from those trees have references via parents to
keep them alive.  For instance, /series within the commit can refer to
the commit object at the top of the series, /base (if present) can refer
to the base of the series (e.g. v4.4), and /cover (a blob) can contain
the cover letter.  Then "git series format" automatically knows the base
to start from, and "git series log" or "git series diff" will know the
difference between "reordered patches" and "rebased on a new base".

The working copy and .git/HEAD will point to the last commit in the
current version of the patch series, so that tools like "git rebase -i"
and similar can Just Work; "git series" will manage the separate ref
independently.

I'll need to provide a variety of additional tools here for showing what
has changed in a patch series; I'd eventually like to support a sensible
"git series diff" that can show things like "rebased on this base,
reordered three patches, dropped one, edited another, and changed the
cover letter".

> And it would just work for transferring revisions. You'd probably have
> one ref per patch series (pointing to the marker commit of the most
> recent version), and the "real" clean history of the project would never
> include any of the revision markers at all (though it could if you
> really wanted to).

Exactly the plan.  And you won't ever actually check out those refs into
your working copy; you'll check out the patch series commits they
reference instead.

> But I will

Re: Properties of trees referencing commit objects (mode 160000)?

2016-03-20 Thread Josh Triplett
On Sun, Mar 20, 2016 at 04:07:25PM -0400, Jeff King wrote:
> On Sun, Mar 20, 2016 at 11:45:24AM -0700, Josh Triplett wrote:
> > > No, we do not follow "gitlinks" like this for reachability. Neither for
> > > pruning, nor for object transfer via push/fetch. So you'd need to have a
> > > separate reference to it (or history containing it).
> > 
> > Argh.  If I have a pile of disconnected commits, is there anything git
> > *would* follow to see them, other than a pile of refs?
> 
> I don't think so. We follow refs, tag "object" fields, and commit
> "parent" fields to get to commits, but never tree entries.

That does make sense, given your point (2) below about efficiency.

> And I don't think it works to just tweak the mode to 100644 for the
> commit entry; the checkout code will complain that it was expecting a
> blob and got a commit.

That much I assumed. :)

> > I suppose I could artificially generate a stack of merge commits with
> > those otherwise disconnect commits as parents, which would let me
> > reference them all from a single ref.  Still unsatisfying, though.
> 
> Yes, though you can do it all in one single merge commit. The tree of
> that merge commit wouldn't matter, it could literally just be a pointer
> to all of the parents (we used to have some limits on the number of
> parents, but I don't think we do anymore?).

I didn't want to assume that all the software assuming upper bounds on
octopus merges had disappeared.  And more importantly...

> If you just threw away that stacked merge and made a new one any time it
> was updated, I think it wouldn't end up very efficient for fetching. The
> client would say "I have giant-merge-commit X", and the server would say
> "well, I need to send you giant-merge-commit Y", but if it does not any
> longer have X, it cannot realize that Y includes 99% of what is in X.

...that.  And the linearly growing cost of recreating and working with
the merge commit as the number of parent commits grows.

> So I guess you'd want a history chain of those merges. I'm not quite
> sure what your application is, so I don't know if that would be a pain
> or not.

As it turns out, that fits fairly naturally.

I want to track the evolution of a patch series or other commit history,
through non-fast-forwarding actions like rebase, rebase -i, or commit
--amend.  Similar in spirit to reflog, but with intentional commits and
commit messages, and most importantly with the ability to share and
collaborate on it.  For each version of the patch series, I plan to
track the commit at the end of the series, and optionally the commit at
the base of the patch series (to simplify tracking of rebasing); I'll
use the tree object associated with the commit to track the cover
letter, and perhaps meta-changelog entries associated with v2/v3/vN of
the series.

Patch series almost always need to evolve through non-fast-forwarding
history.  And I've seen that done in two ways: either pull the patch
series out of git, put it in quilt or similar, and track the quilt
series with git; or pull the versioning of the patch series out of git
and track it with branch names like feature-v2, feature-2016-03-20,
feature-rebased, and feature-rebased-4.4-fixed-foo-fixed-bar.  That last
one closely matches a real-world example I've recently seen.  And that
starts to look a lot like the naming of files and documents in
organizations that haven't yet discovered the wonders of version
control.

I'd like to have the best of both worlds: handle the patch series in
git, *and* handle the versioning of the patch series in git.

> > (I'd also be tempted to ask whether a patch to teach git to follow
> > gitlinks for reachability and/or object transfer would be acceptable.)
> 
> As always, I reserve all judgement on patches until I see them. :)

Of course.  Obviously it's not possible to accept a patch that doesn't
exist yet, but it's possible to reject a patch that doesn't exist yet
based solely on description, saving the trouble of writing it. :)

> But I tend to think it would end up rather complex. Off the top of my
> head, these are the three gotchas I can think of:
> 
>   1. The reachability rules must be agreed upon between both sides of
>  the transfer. So you would need a protocol extension for "please
>  consider gitlinks reachable" so that the sender knows to include
>  them, and that when the client advertises commit X that can reach
>  Y via gitlink, it can assume that the client does not need Y.
> 
>   2. There may be some issues with efficiently finding shared history
>  during a transfer.  That process usually just looks at the commits,
>  and then we hand the set of have/want commits to the pack
>  generation code to figure out the full 

Re: Properties of trees referencing commit objects (mode 160000)?

2016-03-20 Thread Josh Triplett
On Sun, Mar 20, 2016 at 03:30:27PM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > On Sun, Mar 20, 2016 at 12:18:04AM -0400, Jeff King wrote:
> >> On Sat, Mar 19, 2016 at 03:13:48PM -0700, Josh Triplett wrote:
> >> 
> >> > I'm building some tools to track commit objects, and I'm thinking of
> >> > using submodule-style references to commit objects in tree objects (mode
> >> > 16) to do so.  I'm trying to figure out some of the properties of
> >> > that.
> >> > 
> >> > Can a commit object referenced that way live in the same repository,
> >> > rather than some external repository?
> >> 
> >> Yes, it can be in the same repository, but...
> >
> > Will git clone/checkout/etc handle it properly in that case, in the
> > absence of a .gitmodules file?  Or would it only work with custom tools?
> 
> That depends on the definition of "proper".  The default "proper"
> way for the core Git for submodules/gitlinks is to create an empty
> directory.  If you want to populate a working tree for that, you'd
> need "git submodule" support, but because you are writing "some
> tools" on your own, there probably is a reason why you do not want
> to use "git submodule", so I am guessing that your definition of
> "proper" does not match either core Git or "git submodule" considers
> "proper"?  In which case you would need to implement your own
> semantics (you may not even want to have an empty directory there,
> for example).

I can live with "creates an empty directory and doesn't choke or
complain".  I'd kinda hoped that git might notice it already has the
commit, and do something like checking out the files of that commit into
the working directory, but it isn't critical.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Properties of trees referencing commit objects (mode 160000)?

2016-03-20 Thread Josh Triplett
On Sun, Mar 20, 2016 at 12:18:04AM -0400, Jeff King wrote:
> On Sat, Mar 19, 2016 at 03:13:48PM -0700, Josh Triplett wrote:
> 
> > I'm building some tools to track commit objects, and I'm thinking of
> > using submodule-style references to commit objects in tree objects (mode
> > 16) to do so.  I'm trying to figure out some of the properties of
> > that.
> > 
> > Can a commit object referenced that way live in the same repository,
> > rather than some external repository?
> 
> Yes, it can be in the same repository, but...

Will git clone/checkout/etc handle it properly in that case, in the
absence of a .gitmodules file?  Or would it only work with custom tools?

> > Will git treat such a reference as keeping the commit object (and
> > everything recursively referenced by it) live and reachable?  If that
> > commit object is only reachable by the tree, and not by following the
> > parents of any commit directly referenced from refs/*, will git discard
> > it as unreachable?
> 
> No, we do not follow "gitlinks" like this for reachability. Neither for
> pruning, nor for object transfer via push/fetch. So you'd need to have a
> separate reference to it (or history containing it).

Argh.  If I have a pile of disconnected commits, is there anything git
*would* follow to see them, other than a pile of refs?

I suppose I could artificially generate a stack of merge commits with
those otherwise disconnect commits as parents, which would let me
reference them all from a single ref.  Still unsatisfying, though.

Also, thanks, "gitlink" was the term I was trying to think of.

(I'd also be tempted to ask whether a patch to teach git to follow
gitlinks for reachability and/or object transfer would be acceptable.)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Properties of trees referencing commit objects (mode 160000)?

2016-03-19 Thread Josh Triplett
I'm building some tools to track commit objects, and I'm thinking of
using submodule-style references to commit objects in tree objects (mode
16) to do so.  I'm trying to figure out some of the properties of
that.

Can a commit object referenced that way live in the same repository,
rather than some external repository?

Will git treat such a reference as keeping the commit object (and
everything recursively referenced by it) live and reachable?  If that
commit object is only reachable by the tree, and not by following the
parents of any commit directly referenced from refs/*, will git discard
it as unreachable?

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can't git stash after using git add -N

2016-03-19 Thread Josh Triplett
On Tue, Mar 15, 2016 at 09:51:35PM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> 
> > As far as I can tell, if I run "git add -N" on a file, and then commit
> > without adding the file contents, it gets committed as an empty file.
> 
> Is that true?  Git once worked like that in earlier days, but I
> think write-tree (hence commit) would simply ignore intent-to-add
> entries from its resulting tree.

Git 2.7.0 does appear to commit an empty file if I commit after git add
-N.

> > Could stash save it exactly as if I'd done "git add" of an empty file at
> > that path and then filled in the contents without adding them?
> 
> As I said, there is no space for a tree object to say "this one
> records an empty blob but it actually was an intent-to-add entry"
> and "this other one records an empty blob and it indeed is an empty
> blob".  So "stash pop" (or "stash apply") would fundamentally be
> unable to resurrect the exact state after "add -N".

How completely crazy would it be to use a non-standard mode bit for
that?

> >> "git rm --cached" the path and then running "stash save" would be a
> >> workaround, but then you'd probably need to use "--untracked" hack
> >> when you run "stash save" if you are stashing because you are going
> >> to do something to the same path in the cleaned-up working tree.
> >
> > Right; I do specifically want to save the working-tree files.
> 
> Then "git add" that path before "stash save" would probably be a
> better workaround.

I ended up using rm --cached and stash -u, which worked OK, though I
then had to manually restore the add -N state.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Can't git stash after using git add -N

2016-03-15 Thread Josh Triplett
On Tue, Mar 15, 2016 at 04:46:48PM -0700, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> > After using "git add -N", "git stash" can no longer stash:
> 
> I think this is unfortunately one of the fundamental limitations
> that comes from the way how "stash" is implemented.  It uses three
> tree objects (i.e. HEAD's tree that represents where you started at,
> the tree that results by writing the index out as a tree, and
> another tree that would result if you added all the changes you made
> to the working tree to the index and then writing the resulting
> index out as a tree), but there are some states in the index that
> cannot be represented as a tree object.  "I know I would eventually
> want to add this new path, but I cannot decide with what contents
> just yet" aka "git add -N" is one of them (the other notable state
> that cannot be represented as a tree object is paths with unresolved
> conflicts in the index).

As far as I can tell, if I run "git add -N" on a file, and then commit
without adding the file contents, it gets committed as an empty file.
Could stash save it exactly as if I'd done "git add" of an empty file at
that path and then filled in the contents without adding them?

> "git rm --cached" the path and then running "stash save" would be a
> workaround, but then you'd probably need to use "--untracked" hack
> when you run "stash save" if you are stashing because you are going
> to do something to the same path in the cleaned-up working tree.

Right; I do specifically want to save the working-tree files.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Can't git stash after using git add -N

2016-03-15 Thread Josh Triplett
After using "git add -N", "git stash" can no longer stash:

/tmp/temp$ git init
Initialized empty Git repository in /tmp/temp/.git/
/tmp/temp$ echo a > a
/tmp/temp$ git add a
/tmp/temp$ git commit -m 'Initial commit of a'
[master (root-commit) d7242c4] Initial commit of a
 1 file changed, 1 insertion(+)
 create mode 100644 a
/tmp/temp$ echo b > b
/tmp/temp$ git add -N b
/tmp/temp$ git stash
error: Entry 'b' not uptodate. Cannot merge.
Cannot save the current worktree state

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resumable git clone?

2016-03-02 Thread Josh Triplett
On Wed, Mar 02, 2016 at 12:41:20AM -0800, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> > If you clone a repository, and the connection drops, the next attempt
> > will have to start from scratch.  This can add significant time and
> > expense if you're on a low-bandwidth or metered connection trying to
> > clone something like Linux.
> 
> For this particular issue, your friendly k.org administrator already
> has a solution.  Torvalds/linux.git is made into a bundle weekly
> with
> 
> $ git bundle create clone.bundle --all
> 
> and the result placed on k.org CDN.  So low-bandwidth cloners can
> grab it over resumable http, clone from the bundle, and then fill
> the most recent part by fetching from k.org already.
> 
> The tooling to allow this kind of "bundle" (and possibly other forms
> of "CDN offload" material) transparently used by "git clone" was the
> proposal by Shawn Pearce mentioned elsewhere in this thread.

That does help in the case of cloning torvalds/linux.git from
kernel.org, and I'd love to see it used transparently.

However, even with that, I still also see value in a resumable git clone
(or git pull) for many other repositories elsewhere, with a somewhat
lower pull-to-push ratio than kernel.org.  Supporting resumption based
on objects, without the repository needing to generate and keep around a
bundle, seems preferable for such repositories.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resumable git clone?

2016-03-02 Thread Josh Triplett
On Wed, Mar 02, 2016 at 12:31:16AM -0800, Junio C Hamano wrote:
> Josh Triplett <j...@joshtriplett.org> writes:
> > I think several simpler optimizations seem
> > preferable, such as binary object names, and abbreviating complete
> > object sets ("I have these commits/trees and everything they need
> > recursively; I also have this stack of random objects.").
> 
> Given the way pack stream is organized (i.e. commits first and then
> trees and blobs that belong to the same delta chain together), and
> our assumed goal being to salvage objects from an interrupted
> transfer of a packfile, you are unlikely to ever see "I have these
> commits/trees and everything they need" that are salvaged from such
> a failed transfer.  So I doubt such an optimization is worth doing.

True for the resumable clone case.  For that optimization, I was
thinking of the "pull during the merge window" case that Al Viro was
also interested in optimizing.

> Besides it is very expensive to compute (the computation is done on
> the client side, so the cycles burned and the time the user has to
> wait is of much less concern, though); you'd essentially be doing
> "git fsck" to find the "dangling" objects.

Trading client-side computation for bandwidth can potentially be
worthwhile if you have plenty of local compute but a slow and metered
link.

> The list of what would be transferred needs to come in full from the
> server end, as the list names objects that the receiving end may not
> have seen, but the response by the client could be encoded much
> tightly.  For the full list of N objects from the server, we can
> think of your response to be a bitstream of N bits, each on-bit in
> which signals an unwanted object in the list.  You can optimize this
> transfer by RLE compressing the bitstream, for example.
> 
> As git-over-HTTP is stateless, however, you cannot assume that the
> server side remembers what it sent to the client (instead, the
> client side needs to re-post what it heard from the server in the
> previous exchange to allow the server side to use it after
> validating).  So "objects at these indices in your list" kind of
> optimization may not work very well in that environment.  I'd
> imagine that an exchange of "Here are the list of objects", "Give me
> these objects" done naively in full 40-hex object names would work
> OK there, though.

Good point.  Between statelessness and Duy's point about the client list
usually being smaller than the server list, perhaps it would make sense
to not have the server send a list at all, and just have the client send
its own list.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Resumable git clone?

2016-03-02 Thread Josh Triplett
On Wed, Mar 02, 2016 at 03:22:17PM +0700, Duy Nguyen wrote:
> On Wed, Mar 2, 2016 at 3:13 PM, Josh Triplett <j...@joshtriplett.org> wrote:
> > On Wed, Mar 02, 2016 at 02:30:24AM +, Al Viro wrote:
> >> On Tue, Mar 01, 2016 at 05:40:28PM -0800, Stefan Beller wrote:
> >>
> >> > So throwing away half finished stuff while keeping the front load?
> >>
> >> Throw away the object that got truncated and ones for which delta chain
> >> doesn't resolve entirely in the transferred part.
> >>
> >> > > indexing the objects it
> >> > > contains, and then re-running clone and not having to fetch those
> >> > > objects.
> >> >
> >> > The pack is not deterministic for a given repository. When creating
> >> > the pack, you may encounter races between threads, such that the order
> >> > in a pack differs.
> >>
> >> FWIW, I wasn't proposing to recreate the remaining bits of that _pack_;
> >> just do the normal pull with one addition: start with sending the list
> >> of sha1 of objects you are about to send and let the recepient reply
> >> with "I already have , don't bother with those".  And exclude
> >> those from the transfer.  Encoding for the set being available is an
> >> interesting variable here - might be plain list of sha1, might be its
> >> complement ("I want the following subset"), might be "145th to 1029th,
> >> 1517th and 1890th to 1920th of the list you've sent"; which form ends
> >> up more efficient needs to be found experimentally...
> >
> > As a simple proposal, the server could send the list of hashes (in
> > approximately the same order it would send the pack), the client could
> > send back a bitmap where '0' means "send it" and '1' means "got that one
> > already", and the client could compress that bitmap.  That gives you the
> > RLE and similar without having to write it yourself.  That might not be
> > optimal, but it would likely set a high bar with minimal effort.
> 
> We have an implementation of EWAH bitmap compression, so compressing
> is not a problem.
> 
> But I still don't see why it's more efficient to have the server send
> the hash list to the client. Assume you need to transfer N objects.
> That direction makes you always send N hashes. But if the client sends
> the list of already fetched objects, M, then M <= N. And we won't need
> to send the bitmap. What did I miss?

M can potentially be larger than N if you have many remotes and branches
in your local repository that the server doesn't have.  However, that
certainly wouldn't be the common case, and in that case heuristics on
the client side could help there in determining a subset to send.

I can't think of any good argument for the server's hash list; a
client-sent list does seem reasonable.

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >