Re: [PATCH 4/5] grep: optionally recurse into submodules

2016-11-04 Thread Jonathan Tan
On Thu, Oct 27, 2016 at 3:38 PM, Brandon Williams  wrote:
> diff --git a/builtin/grep.c b/builtin/grep.c
> index 8887b6add..f34f16df9 100644
> --- a/builtin/grep.c
> +++ b/builtin/grep.c
> @@ -18,12 +18,20 @@
>  #include "quote.h"
>  #include "dir.h"
>  #include "pathspec.h"
> +#include "submodule.h"
>
>  static char const * const grep_usage[] = {
> N_("git grep [] [-e]  [...] [[--] ...]"),
> NULL
>  };
>
> +static const char *super_prefix;

I think that the super_prefix changes could be in its own patch.

> +static int recurse_submodules;
> +static struct argv_array submodule_options = ARGV_ARRAY_INIT;

I guess this has to be static because it is shared by multiple threads.

> +
> +static int grep_submodule_launch(struct grep_opt *opt,
> +const struct grep_source *gs);
> +
>  #define GREP_NUM_THREADS_DEFAULT 8
>  static int num_threads;
>
> @@ -174,7 +182,10 @@ static void *run(void *arg)
> break;
>
> opt->output_priv = w;
> -   hit |= grep_source(opt, >source);
> +   if (w->source.type == GREP_SOURCE_SUBMODULE)
> +   hit |= grep_submodule_launch(opt, >source);
> +   else
> +   hit |= grep_source(opt, >source);

It seems to me that GREP_SOURCE_SUBMODULE is of a different nature
than the other GREP_SOURCE_.* - in struct work_item, could we instead
have another variable that distinguishes between submodules and
"native" sources? This might also assuage Junio's concerns in
 about the nature of the
sources.

That variable could also be the discriminant for a tagged union, such
that we have "struct grep_source" for the "native" sources and a new
struct (holding only submodule-relevant information) for the
submodule.

> +/*
> + * Prep grep structures for a submodule grep
> + * sha1: the sha1 of the submodule or NULL if using the working tree
> + * filename: name of the submodule including tree name of parent
> + * path: location of the submodule
> + */
> +static int grep_submodule(struct grep_opt *opt, const unsigned char *sha1,
> + const char *filename, const char *path)
> +{
> +   if (!(is_submodule_initialized(path) &&
> + is_submodule_checked_out(path))) {
> +   warning("skiping submodule '%s%s' since it is not initialized 
> and checked out",
> +   super_prefix ? super_prefix: "",
> +   path);
> +   return 0;
> +   }
> +
> +#ifndef NO_PTHREADS
> +   if (num_threads) {
> +   add_work(opt, GREP_SOURCE_SUBMODULE, filename, path, sha1);
> +   return 0;
> +   } else
> +#endif
> +   {
> +   struct work_item w;
> +   int hit;
> +
> +   grep_source_init(, GREP_SOURCE_SUBMODULE,
> +filename, path, sha1);
> +   strbuf_init(, 0);
> +   opt->output_priv = 
> +   hit = grep_submodule_launch(opt, );
> +
> +   write_or_die(1, w.out.buf, w.out.len);
> +
> +   grep_source_clear();
> +   strbuf_release();
> +   return hit;
> +   }

This is at least the third invocation of this "if pthreads, add work,
otherwise do it now" pattern - could this be extracted into its own
function (in another patch)? Ideally, there would also be exactly one
function in which the grep_source.* functions are invoked, and both
"run" and the non-pthread code path can use it.

> +}
> +
> +static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec,
> + int cached)

This line isn't modified other than the line break, as far as I can
tell, so I wouldn't break it.

> diff --git a/t/t7814-grep-recurse-submodules.sh 
> b/t/t7814-grep-recurse-submodules.sh
> new file mode 100755
> index 0..b670c70cb
> --- /dev/null
> +++ b/t/t7814-grep-recurse-submodules.sh
> @@ -0,0 +1,99 @@
> +#!/bin/sh
> +
> +test_description='Test grep recurse-submodules feature
> +
> +This test verifies the recurse-submodules feature correctly greps across
> +submodules.
> +'
> +
> +. ./test-lib.sh
> +

Would it be possible to also test it while num_threads is zero? (Or,
if num_threads is already zero, to test it while it is not zero?)


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Junio C Hamano
Junio C Hamano  writes:

>> I think the main complication is that the reachability rules are used
>> during object transfer.

One should not type after spending 20+ waking hours on plane and
airport.  I missed it when I wrote my first response, but yes, the
reachability that originates from inside a tree object indeed is a
problem, as we do not dig into trees while doing the want/have/ack
exhange.



Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Jeff King
On Fri, Nov 04, 2016 at 09:41:06PM -0700, Junio C Hamano wrote:

> > I think the main complication is that the reachability rules are used
> > during object transfer. So you'd probably want to introduce some
> > protocol extension to say "I understand gitrefs", so that when one side
> > says "I have sha1 X and its reachable objects", we know whether they are
> > including gitrefs there. And likewise receivers with
> > transfer.fsckObjects may complain about the new gitref tree mode
> > (fortunately a new object type shouldn't be needed).
> 
> Quite honestly I do not think backward compatibility here matters.
> When gitlinks were introduced, a repository that was created with
> gitlink capable version of Git would have failed "git fsck" that is
> not gitlink aware, and I think this new "link with reachability" is
> the same deal.  No existing implemention understands a tree entry
> whose mode bits are 14 or whatever new bit pattern we would
> assign to this thing.  You have to wait until both ends understand
> the new thing, and that is perfectly OK.

I'm OK with saying "if you use the gitref feature, you cannot push or
pull those objects with remotes that do not understand it".  But unlike
gitlink, if we fail to notice the situation, we run into a case where we
might silently lose objects, which is bad. So I think we need to be a
bit more careful.

I don't think the problems are insurmountable. I just think that's where
the real complexity is, not in the changes to teach a single git about
gitrefs.

I'm happy to stand back and let you or Josh figure out all the corner
cases. :)

-Peff


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Junio C Hamano
Christian Couder  writes:

> Couldn't a RefTree be used to store refs that point to the base
> commit,

I think it is the other way around.  With the new "gitref" thing
that is a pointer to an in-repository commit, RefTree can be
naturally implemented.


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Jeff King
On Fri, Nov 04, 2016 at 09:55:23PM -0600, Josh Triplett wrote:

> >> Is it possible currently for a protocol extension to result in "oh
> >the
> >> server doesn't support this so I'm going to stop pushing"?
> >
> >Yes, it would be easy for the client to abort if the server fails to
> >advertise a particular extension.
> 
> And the reverse (old client, new server) should work as well?

Yes. The server says "I know about gitrefs" and if the client does not
respond with "I also know about gitrefs", then the server can act
appropriately (e.g., for a fetch, bail if the fetched content includes
gitrefs).

-Peff


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote:
>
>> I agree with your assessment here. The main difficulty in implementing
>> gitrefs is to ensure that they actually do get picked up by
>> reachability checks to prevent dropping commits. I'm not sure how easy
>> this is, but I would much rather we go this route rather than
>> continuing along with the hack. This seems like the ideal solution,
>> since it solves the entire problem and doesn't need more hacks bolted
>> on.
>
> I think the main complication is that the reachability rules are used
> during object transfer. So you'd probably want to introduce some
> protocol extension to say "I understand gitrefs", so that when one side
> says "I have sha1 X and its reachable objects", we know whether they are
> including gitrefs there. And likewise receivers with
> transfer.fsckObjects may complain about the new gitref tree mode
> (fortunately a new object type shouldn't be needed).

Quite honestly I do not think backward compatibility here matters.
When gitlinks were introduced, a repository that was created with
gitlink capable version of Git would have failed "git fsck" that is
not gitlink aware, and I think this new "link with reachability" is
the same deal.  No existing implemention understands a tree entry
whose mode bits are 14 or whatever new bit pattern we would
assign to this thing.  You have to wait until both ends understand
the new thing, and that is perfectly OK.

Besides, I think the point of having this discussion is that Josh
did a good prototyping work of "git series" to discover what he can
do in that area of "keeping track of history of history" and what
operations are useful, without wasting time on mucking with the
object model and traversal machinery that are available to him.  

Now his prototyping is at the point where he knows at least one
enhancement to the core that would help him to redo the prototype in
the right way.  And I do not mind helping him from the core side.


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On November 4, 2016 7:48:17 PM MDT, Jeff King  wrote:
>On Fri, Nov 04, 2016 at 04:34:34PM -0700, Jacob Keller wrote:
>
>> > You might also want fallback rules for storing gitrefs on "old"
>servers
>> > (e.g., backfilling gitrefs you need if the server didn't them in
>the
>> > initial fetch). But I guess storing any gitrefs on such a server is
>> > inherently dangerous, because the server might prune them at any
>time.
>> 
>> Is it possible currently for a protocol extension to result in "oh
>the
>> server doesn't support this so I'm going to stop pushing"?
>
>Yes, it would be easy for the client to abort if the server fails to
>advertise a particular extension.

And the reverse (old client, new server) should work as well? 

- Josh Triplett



Re: [PATCH v2 1/6] submodules: add helper functions to determine presence of submodules

2016-11-04 Thread Jonathan Tan

On 10/31/2016 03:38 PM, Brandon Williams wrote:

+   struct strbuf buf = STRBUF_INIT;
+   char *submodule_url = NULL;
+
+   strbuf_addf(, "submodule.%s.url", module->name);
+   ret = !git_config_get_string(buf.buf, _url);
+
+   free(submodule_url);
+   strbuf_release();
+   }
+
+   return ret;
+}
+
+/*
+ * Determine if a submodule has been checked out at a given 'path'
+ */
+int is_submodule_checked_out(const char *path)
+{
+   int ret = 0;
+   struct strbuf buf = STRBUF_INIT;
+
+   strbuf_addf(, "%s/.git", path);
+   ret = file_exists(buf.buf);
+
+   strbuf_release();


In this and the previous function, you can use xstrfmt.


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Jeff King
On Fri, Nov 04, 2016 at 04:34:34PM -0700, Jacob Keller wrote:

> > You might also want fallback rules for storing gitrefs on "old" servers
> > (e.g., backfilling gitrefs you need if the server didn't them in the
> > initial fetch). But I guess storing any gitrefs on such a server is
> > inherently dangerous, because the server might prune them at any time.
> 
> Is it possible currently for a protocol extension to result in "oh the
> server doesn't support this so I'm going to stop pushing"?

Yes, it would be easy for the client to abort if the server fails to
advertise a particular extension.

What I would worry about more is that "somehow" an older client gets
hold of history with a gitref, and then pushes it. It would be nice if
even an old server said "nope, I don't understand this and I won't take
it" rather than propagating the data to a server that will throw it
away.

> Right. I'm assuming tree objects don't get checked for invalid mode
> already? If they do, we could just change the mode to something
> unsupported currently. But... that seems like it might not be the case
> because it requires checking every tree object coming in?
> 
> I'm not familiar with what sort of checking already exists... Thoughts?

If the server sets receive.fsckObjects, then fsck_tree() runs and will
reject any non-standard mode. That option is not the default, though
some big hosters set it (GitHub does, but I am actually not that worried
about GitHub; if gitrefs support materialized I would probably ship it
there fairly promptly).

-Peff


Re: [PATCH v1 12/19] Documentation/config: add splitIndex.maxPercentChange

2016-11-04 Thread Christian Couder
On Tue, Nov 1, 2016 at 8:19 PM, Junio C Hamano  wrote:
> Christian Couder  writes:
>
>> +splitIndex.maxPercentChange::
>> + When the split index feature is used, this specifies the
>> + percent of entries the split index can contain compared to the
>> + whole number of entries in both the split index and the shared
>> + index before a new shared index is written.
>> + The value should be between 0 and 100. If the value is 0 then
>> + a new shared index is always written, if it is 100 a new
>> + shared index is never written.
>
> Hmph.  The early part of the description implies this will kick in
> only when some other conditions (i.e. the bit in the index or the
> other configuration) are met, but if this disables the split index
> when it is set to 0, would we even need the other configuration
> variable?  IOW, perhaps we can do without core.splitIndex?

I think it is easier for user to be able to just set core.splitIndex
to true to enable split-index.
If we ask them to instead set splitIndex.maxPercentChange to a
"reasonable value like 20" to enable it, they will start to wonder why
20 and not 50 or 100 and might be unnecessarily confused.

Also it might be good to make it possible to have other values than
"true" and "false" for core.splitIndex in the future.
For example if we decide to first enable split-index only on
$GIT_DIR/index, we might later want an "everywhere" or "always" value
for core.splitIndex if we find a way to enable it on other indexes.

We might also want an "auto" mode in the future when split-index will
work really well, so that it can be turned on automatically on all the
repos where it makes sense (like when the index contains more than
1 entries for example).

>> + By default the value is 20, so a new shared index is written
>> + if the number of entries in the split index would be greater
>> + than 20 percent of the total number of entries.
>> + See linkgit:git-update-index[1].


Re: [PATCH v1 09/19] config: add git_config_get_max_percent_split_change()

2016-11-04 Thread Christian Couder
On Tue, Nov 1, 2016 at 8:13 PM, Junio C Hamano  wrote:
>
>>> +   return -1;
>
> Perhaps do the usual
>
> return error(_("..."));
>
> here?

Ok, it will be in the next version.


Re: [PATCH v3] transport: add protocol policy config option

2016-11-04 Thread Brandon Williams
On 11/04, Jeff King wrote:
> On Fri, Nov 04, 2016 at 02:35:57PM -0700, Stefan Beller wrote:
> 
> > On Fri, Nov 4, 2016 at 1:58 PM, Brandon Williams  wrote:
> > > On 11/04, Brandon Williams wrote:
> > >> Signed-off-by: Brandon Williams 
> > >
> > > Is there an acceptable way to give credit to Jeff for helping with this 
> > > patch?
> > 
> > What about:
> > Helped-by: Jeff King 
> 
> That, or often I would write:
> 
>   Based on a patch by Jeff King 
> 
> in the commit message. Basically anything is OK _except_ forging
> signed-off-by, because it has a very specific meaning. So let me also
> say that I am happy to give my:
> 
>   Signed-off-by: Jeff King 
> 
> to the original (which you should add in, to make clear that the
> copyright issues are OK).
> 
> In some cases it makes sense to just roll somebody's patch into your
> series, and then build on top. I'm fine with it all going into a single
> patch here.
> 
> -Peff

Oh if it would be more clear I can easily break it up into two patches.

-- 
Brandon Williams


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On Fri, Nov 04, 2016 at 04:37:34PM -0700, Jacob Keller wrote:
> On Fri, Nov 4, 2016 at 2:55 PM, Josh Triplett  wrote:
> > That said, I'd *love* to have gitrefs available, for a wide variety of
> > applications, and I can see an argument for introducing them and waiting
> > a few years for them to become universally available, similar to the
> > process gitlinks went through.
> >
> > But I'd also love to have a backward-compatible solution.
> >
> > - Josh Triplett
> 
> I think that you won't really find a backwards compatible solution
> other than something like automatically generating refs for each point
> of history. I know that gerrit does something like this by storing
> each version in "refs/changes/id/version" or something along those
> lines. I think this might actually be cleaner than your parent links
> hack, and could be used as a fallback for when gitrefs don't work,
> though you'd have to code exactly how to tell what to push to a
> repository when pushing a series?

I'm not sure what the advantage of that would be, and it would mean that
if you ever have one branch without pushing the other(s), you'd get
severe time-delated breakage due to pruning.  (And if you pushed the
series without the other ref(s), its history would look right but then
you couldn't access the underlying versions of the patch series.)

One of my design goals was to *not* need a special "git series push" or
"git series pull"; you should just be able to use git push and git pull,
and you can set up normal refspecs.

That said, I could fairly easily generate the existing format with
artificial parent refs for backward compatibility, and provide a way to
use the new gitref-based storage format if you know that all your
servers and clients can handle it.  I'm also open to other suggestions
for how to make such a transition while still working with every git
server and git client that exists today.


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Jacob Keller
On Fri, Nov 4, 2016 at 2:55 PM, Josh Triplett  wrote:
> That said, I'd *love* to have gitrefs available, for a wide variety of
> applications, and I can see an argument for introducing them and waiting
> a few years for them to become universally available, similar to the
> process gitlinks went through.
>
> But I'd also love to have a backward-compatible solution.
>
> - Josh Triplett

I think that you won't really find a backwards compatible solution
other than something like automatically generating refs for each point
of history. I know that gerrit does something like this by storing
each version in "refs/changes/id/version" or something along those
lines. I think this might actually be cleaner than your parent links
hack, and could be used as a fallback for when gitrefs don't work,
though you'd have to code exactly how to tell what to push to a
repository when pushing a series?

Thanks,
Jake


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Jacob Keller
On Fri, Nov 4, 2016 at 12:49 PM, Jeff King  wrote:
> I think the main complication is that the reachability rules are used
> during object transfer. So you'd probably want to introduce some
> protocol extension to say "I understand gitrefs", so that when one side
> says "I have sha1 X and its reachable objects", we know whether they are
> including gitrefs there. And likewise receivers with
> transfer.fsckObjects may complain about the new gitref tree mode
> (fortunately a new object type shouldn't be needed).
>
> You might also want fallback rules for storing gitrefs on "old" servers
> (e.g., backfilling gitrefs you need if the server didn't them in the
> initial fetch). But I guess storing any gitrefs on such a server is
> inherently dangerous, because the server might prune them at any time.
>

Is it possible currently for a protocol extension to result in "oh the
server doesn't support this so I'm going to stop pushing"? This would
be a rather hard transition, but it would at least ensure that pushing
to a server which doesn't support gitrefs would fail rather than
silently accept objects and then discard them later? I think this is
the only real transition unless we can make a change that old servers
object to already.

> So perhaps a related question is: how can gitrefs be designed such that
> existing servers reject them (rather than accepting the push and then
> later throwing away half the data). It would be easy to notice in the
> client during a push that we are sending gitrefs to a server which does
> not claim that capability. But it seems more robust if it is the server
> who decides "I will not accept these bogus objects".
>
> I haven't thought all that hard about this. That's just my initial
> thoughts on what sound hard. Tweaking the reachability code doesn't seem
> all that bad; we already know all of the spots that care about
> S_ISGITLINK(). It may even be that some of those spots work out of the
> box (because gitlinks are usually about telling the graph-walking code
> that we _don't_ care about reachability; we do by default for trees and
> blobs).

Right. I'm assuming tree objects don't get checked for invalid mode
already? If they do, we could just change the mode to something
unsupported currently. But... that seems like it might not be the case
because it requires checking every tree object coming in?

I'm not familiar with what sort of checking already exists... Thoughts?

>
> I'd be surprised if all such sites work out of the box, though. Even if
> they see "ah, sha1 X is referenced by tree Y and isn't a gitlink, and
> therefore should be reachable", they need to also note that "X" is a
> commit and recursively walk its objects.
>

They won't all work out of the box, but it shouldn't be much work to
do this part.

> -Peff


Re: [PATCH v3] transport: add protocol policy config option

2016-11-04 Thread Jeff King
On Fri, Nov 04, 2016 at 02:35:57PM -0700, Stefan Beller wrote:

> On Fri, Nov 4, 2016 at 1:58 PM, Brandon Williams  wrote:
> > On 11/04, Brandon Williams wrote:
> >> Signed-off-by: Brandon Williams 
> >
> > Is there an acceptable way to give credit to Jeff for helping with this 
> > patch?
> 
> What about:
> Helped-by: Jeff King 

That, or often I would write:

  Based on a patch by Jeff King 

in the commit message. Basically anything is OK _except_ forging
signed-off-by, because it has a very specific meaning. So let me also
say that I am happy to give my:

  Signed-off-by: Jeff King 

to the original (which you should add in, to make clear that the
copyright issues are OK).

In some cases it makes sense to just roll somebody's patch into your
series, and then build on top. I'm fine with it all going into a single
patch here.

-Peff


Re: [PATCH v3] transport: add protocol policy config option

2016-11-04 Thread Jeff King
On Fri, Nov 04, 2016 at 01:55:33PM -0700, Brandon Williams wrote:

> Previously the `GIT_ALLOW_PROTOCOL` environment variable was used to
> specify a whitelist of protocols to be used in clone/fetch/pull
> commands.  This patch introduces new configuration options for more
> fine-grained control for allowing/disallowing protocols.  This also has
> the added benefit of allowing easier construction of a protocol
> whitelist on systems where setting an environment variable is
> non-trivial.

Good rationale.

> Now users can specify a policy to be used for each type of protocol via
> the 'protocol..allow' config option.  A default policy for all
> unknown protocols can be set with the 'protocol.allow' config option.

I think "unconfigured" is a better word here than "unknown", as it would
apply to known protocols like "https", etc.

That made me wonder if "unknown" would be a better behavior, but I'm
pretty sure it is not. It is harder to explain, and I think would be
less convenient in practice. I.e., you really do want:

  git config protocol.allow never
  git config protocol.https.allow always

to allow nothing but https.

> If no user configured default is made git, by default, will allow
> known-safe protocols (http, https, git, ssh, file), disallow
> known-dangerous protocols (ext), and have a default poliy of `user` for
> all other protocols.

I think this is a good way of thinking about it. The order of
enforcement becomes:

  - GIT_ALLOW_PROTOCOL; environment variables always take precedence
over config, so this makes sense. And it also is nice to put the
blunt hammer at the front for backwards-compatibility.

  - protocol-specific config

  - protocol-generic config

  - built-in defaults (known-safe, known-scary, unknown)

which seems right.

Also, s/poliy/policy/.

> The supported policies are `always`, `never`, and `user`.  The `user`
> policy can be used to configure a protocol to be usable when explicitly
> used by a user, while disallowing it for commands which run
> clone/fetch/pull commands without direct user intervention (e.g.
> recursive initialization of submodules).  Commands which can potentially
> clone/fetch/pull from untrusted repositories without user intervention
> can export `GIT_PROTOCOL_FROM_USER` with a value of '0' to prevent
> protocols configured to the `user` policy from being used.

Makes sense. I know "user" came from me. I don't know if there is a
better word to describe it. I originally called it "cmdline", but that
seemed too obscure (especially when a tool external to git sets it).
Something like "trusted" might make sense (we allow it only in a
more-trusted setting), but it's kind of vague. And it also doesn't leave
room for there to be more types of trust in the future. So "user" is
probably reasonable (or perhaps "user-only" or similar).

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 27069ac..5d845c4 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2308,6 +2308,31 @@ pretty.::
>   Note that an alias with the same name as a built-in format
>   will be silently ignored.
>  
> +protocol.allow::
> + If set, provide a user defined default policy for all protocols which
> + don't explicitly have a policy (protocol..allow).  By default,
> + if unset, known-safe protocols (http, https, git, ssh, file) have a
> + default policy of `always`, known-dangerous protocols (ext) have a
> + default policy of `never`, and all other protocols have a default policy
> + of `user`.  Supported policies:
> ++
> +--
> +
> +* `always` - protocol is always able to be used.
> +
> +* `never` - protocol is never able to be used.
> +
> +* `user` - protocol is only able to be used when `GIT_PROTOCOL_FROM_USER` is
> +  either unset or has a value of 1.  This policy should be used when you 
> want a
> +  protocol to be usable by the user but don't want it used by commands which
> +  execute clone/fetch/pull commands without user input, e.g. recursive
> +  submodule initialization.

Makes sense. I wonder if it would be good to emphasize _directly_ usable
here. I.e., "...when you want a protocol to be directly usable by the
user but don't want...".

Should clone/fetch/pull also include push?

> +protocol..allow::
> + Set a policy to be used by protocol  with clone/fetch/pull 
> commands.
> +

Nice that this matches protocol.allow, so we don't need to re-explain
that.

Should the list of protocols be here? I know they're covered under
GIT_ALLOW_PROTOCOL already, but if this is the preferred system, we
should probably explain them here, and then just have GIT_ALLOW_PROTOCOL
refer the user.

> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index ab7215e..ab25580 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -1150,13 +1150,13 @@ of clones and fetches.
>   cloning a repository to make a backup).
>  
>  `GIT_ALLOW_PROTOCOL`::
> - If set, provide a 

Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Christian Couder
On Fri, Nov 4, 2016 at 10:19 PM, Josh Triplett  wrote:
> On Fri, Nov 04, 2016 at 09:47:41PM +0100, Christian Couder wrote:
>> On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamano  wrote:
>> >
>> > Imagine we invent a new tree entry type, "gitref", that is similar
>> > to "gitlink" in that it can record a commit object name in a tree,
>> > but unlike "gitlink" it does imply reachability.  And you do not add
>> > phony parents to your commit object.  A tree that has "gitref"s in
>> > it is about annotating the commits in the same repository (e.g. the
>> > tree references two commits, "base" and "tip", to point into a slice
>> > of the main history).  And it is perfectly sensible for such a
>> > pointer to imply reachability---after all it serves different
>> > purposes from "gitlink".
>>
>> The more I think about this (and also about how to limit ref
>> advertisements as recently discussed in
>> https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/),
>> the more I think about Shawn's RefTree:
>>
>> https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/
>>
>> Couldn't a RefTree be used to store refs that point to the base
>> commit, the tip commit and the blob that contains the cover letter,
>> and maybe also a ref pointing to the RefTree of the previous version
>> of the series?
>
> That's really interesting!  The Software Heritage project is working on
> something similar, because they want to store all the refs as part of
> their data model as well.  I'll point them to the reftree work.

Yeah, I know them :-) and I think I have already told Stefano
Zacchiroli about this, but I am not sure anymore.
Anyway I am CC'ing him.

> If upstream git supported RefTree, I could potentially use that for
> git-series.  However, I do want a commit message and history for the
> series itself, and using refs in the reftree to refer to the parents
> seems like abusing reftree to recreate commits, in a reversal of the
> hack of using commit parents as a reftree. :)

Yeah, maybe :-) But the properties of the existing Git objects we
already use wouldn't change at all.

> What if, rather than storing a hash reference to a reftree as a single
> reference and replacing it with no history,

In what I suggest the history is kept because the new reftree has a
ref that points to the old one it is replacing.

Yeah, this reftree history maybe seen as "redundant" with the commit
history, but in my opinion this can be seen as a "feature" that will
prevent us from "mucking" too much with the commit object.

> a reftree could be
> referenced from a commit and have history?  (That would also allow
> tagging a version of the reftree.)

I think that tags are already allowed to point to any kind of Git
object, so tagging a reftree should be allowed anyway if we add a
reftree object.


Re: gitk: avoid obscene memory consumption

2016-11-04 Thread Stefan Beller
On Fri, Nov 4, 2016 at 12:49 PM, Markus Hitter  wrote:
>
> Hello all,

+cc Paul Mackeras, who maintains gitk.

>
> after Gitk brought my shabby development machine (Core2Duo, 4 GB RAM, Ubuntu 
> 16.10, no swap to save the SSD) to its knees once more than I'm comfortable 
> with, I decided to investigate this issue.
>
> Result of this investigation is, my Git repo has a commit with a diff of some 
> 365'000 lines and Gitk tries to display all of them, consuming more than 1.5 
> GB of memory.
>
> The solution is to cut off diffs at 50'000 lines for the display. This 
> consumes about 350 MB RAM, still a lot. These first 50'000 lines are shown, 
> followed by a copyable message on how to view the full diff on the command 
> line. Diffs shorter than this limit are displayed as before.

Bikeshedding: I'd argue to even lower the number to 5-10k lines.

>
> To test the waters whether such a change is welcome, here's the patch as I 
> currently use it. If this patch makes sense I'll happily apply change 
> requests and bring it more in line with Git's patch submission expectations.

I have never contributed to gitk myself,
which is hosted at git://ozlabs.org/~paulus/gitk
though I'd expect these guide lines would roughly apply:
https://github.com/git/git/blob/master/Documentation/SubmittingPatches


Re: [PATCH v3] transport: add protocol policy config option

2016-11-04 Thread Stefan Beller
On Fri, Nov 4, 2016 at 1:55 PM, Brandon Williams  wrote:
> Previously the `GIT_ALLOW_PROTOCOL` environment variable was used to
> specify a whitelist of protocols to be used in clone/fetch/pull
> commands.  This patch introduces new configuration options for more
> fine-grained control for allowing/disallowing protocols.  This also has
> the added benefit of allowing easier construction of a protocol
> whitelist on systems where setting an environment variable is
> non-trivial.
>
> Now users can specify a policy to be used for each type of protocol via
> the 'protocol..allow' config option.  A default policy for all
> unknown protocols can be set with the 'protocol.allow' config option.
> If no user configured default is made git, by default, will allow
> known-safe protocols (http, https, git, ssh, file), disallow
> known-dangerous protocols (ext), and have a default poliy of `user` for
> all other protocols.
>
> The supported policies are `always`, `never`, and `user`.  The `user`
> policy can be used to configure a protocol to be usable when explicitly
> used by a user, while disallowing it for commands which run
> clone/fetch/pull commands without direct user intervention (e.g.
> recursive initialization of submodules).  Commands which can potentially
> clone/fetch/pull from untrusted repositories without user intervention
> can export `GIT_PROTOCOL_FROM_USER` with a value of '0' to prevent
> protocols configured to the `user` policy from being used.
>
> Signed-off-by: Brandon Williams 
> ---
>  Documentation/config.txt |  25 
>  Documentation/git.txt|  19 +++---
>  git-submodule.sh |  12 ++--
>  t/lib-proto-disable.sh   | 132 
> +++
>  t/t5509-fetch-push-namespaces.sh |   1 +
>  t/t5802-connect-helper.sh|   1 +
>  transport.c  |  73 +-
>  7 files changed, 235 insertions(+), 28 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 27069ac..5d845c4 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2308,6 +2308,31 @@ pretty.::
> Note that an alias with the same name as a built-in format
> will be silently ignored.
>
> +protocol.allow::
> +   If set, provide a user defined default policy for all protocols which
> +   don't explicitly have a policy (protocol..allow).  By default,

Use hyphens (`protocol..allow`) to highlight the config option.

By default, if unset, ... have a default policy ...
sounds strange. How about just dropping the first 4 words here:

 Known-safe protocols (http, https, git, ssh, file) have a
 default policy of `always`, known-dangerous protocols (ext) have a
 default policy of `never`, and all other protocols have a default policy
 of `user`.  Supported policies:


What happens if protocol.allow is set to true?



> +   if unset, known-safe protocols (http, https, git, ssh, file) have a
> +   default policy of `always`, known-dangerous protocols (ext) have a
> +   default policy of `never`, and all other protocols have a default 
> policy
> +   of `user`.  Supported policies:
> ++
> +--
> +
> +* `always` - protocol is always able to be used.
> +
> +* `never` - protocol is never able to be used.
> +
> +* `user` - protocol is only able to be used when `GIT_PROTOCOL_FROM_USER` is
> +  either unset or has a value of 1.  This policy should be used when you 
> want a
> +  protocol to be usable by the user but don't want it used by commands which
> +  execute clone/fetch/pull commands without user input, e.g. recursive
> +  submodule initialization.
> +
> +--
> +
> +protocol..allow::
> +   Set a policy to be used by protocol  with clone/fetch/pull 
> commands.

How does this interact with protocol.allow?

When protocol.allow is set, this overrides the specific protocol.
If protocol is not set, it overrides the specific protocol as well(?)


> +
>  pull.ff::
> By default, Git does not create an extra merge commit when merging
> a commit that is a descendant of the current commit. Instead, the
> diff --git a/Documentation/git.txt b/Documentation/git.txt
> index ab7215e..ab25580 100644
> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -1150,13 +1150,13 @@ of clones and fetches.
> cloning a repository to make a backup).
>
>  `GIT_ALLOW_PROTOCOL`::
> -   If set, provide a colon-separated list of protocols which are
> -   allowed to be used with fetch/push/clone. This is useful to
> -   restrict recursive submodule initialization from an untrusted
> -   repository. Any protocol not mentioned will be disallowed (i.e.,
> -   this is a whitelist, not a blacklist). If the variable is not
> -   set at all, all protocols are enabled.  The protocol names
> -   currently used by git are:
> +   The new way to configure allowed 

gitk: avoid obscene memory consumption

2016-11-04 Thread Markus Hitter

Hello all,

after Gitk brought my shabby development machine (Core2Duo, 4 GB RAM, Ubuntu 
16.10, no swap to save the SSD) to its knees once more than I'm comfortable 
with, I decided to investigate this issue.

Result of this investigation is, my Git repo has a commit with a diff of some 
365'000 lines and Gitk tries to display all of them, consuming more than 1.5 GB 
of memory.

The solution is to cut off diffs at 50'000 lines for the display. This consumes 
about 350 MB RAM, still a lot. These first 50'000 lines are shown, followed by 
a copyable message on how to view the full diff on the command line. Diffs 
shorter than this limit are displayed as before.

To test the waters whether such a change is welcome, here's the patch as I 
currently use it. If this patch makes sense I'll happily apply change requests 
and bring it more in line with Git's patch submission expectations. The patch 
is made against git(k) version 2.9.3, the one coming with latest Ubuntu. Please 
also note that this is the first time I wrote some Tcl code, so the strategy 
used might not follow best Tcl practices.

$ diff -uw /usr/bin/gitk.org /usr/bin/gitk
--- /usr/bin/gitk.org   2016-08-16 22:32:47.0 +0200
+++ /usr/bin/gitk   2016-11-04 20:06:14.805920404 +0100
@@ -7,6 +7,15 @@
 # and distributed under the terms of the GNU General Public Licence,
 # either version 2, or (at your option) any later version.
 
+# Markus: trying to limit memory consumption. It happened that
+# complex commits led to more than 1.5 GB of memory usage.
+#
+# The problem was identified to be caused by extremely long diffs. The
+# commit leading to this research had some 365'000 lines of diff, consuming
+# these 1.5 GB when drawn into the canvas. The solution is to limit diffs to
+# 50'000 lines and skipping the rest. In case of a cutoff, a CLI command for
+# getting the full diff is shown.
+
 package require Tk
 
 proc hasworktree {} {
@@ -7956,6 +7965,7 @@
 
 proc getblobdiffs {ids} {
 global blobdifffd diffids env
+global parseddifflines
 global treediffs
 global diffcontext
 global ignorespace
@@ -7987,6 +7997,7 @@
 }
 fconfigure $bdf -blocking 0 -encoding binary -eofchar {}
 set blobdifffd($ids) $bdf
+set parseddifflines 0
 initblobdiffvars
 filerun $bdf [list getblobdiffline $bdf $diffids]
 }
@@ -8063,20 +8074,34 @@
 
 proc getblobdiffline {bdf ids} {
 global diffids blobdifffd
+global parseddifflines
 global ctext
 
 set nr 0
+set maxlines 5
 $ctext conf -state normal
 while {[incr nr] <= 1000 && [gets $bdf line] >= 0} {
+incr parseddifflines
+if {$parseddifflines >= $maxlines} {
+break
+}
if {$ids != $diffids || $bdf != $blobdifffd($ids)} {
catch {close $bdf}
return 0
}
parseblobdiffline $ids $line
 }
+if {$parseddifflines >= $maxlines} {
+$ctext insert end "\n--" hunksep
+$ctext insert end " Lines exceeding $maxlines skipped " hunksep
+$ctext insert end "--\n\n" hunksep
+$ctext insert end "To get a full diff, run\n\n" hunksep
+$ctext insert end "  git diff-tree -p -C --cc $ids\n\n" hunksep
+$ctext insert end "on the command line.\n" hunksep
+}
 $ctext conf -state disabled
 blobdiffmaybeseehere [eof $bdf]
-if {[eof $bdf]} {
+if {[eof $bdf] || $parseddifflines >= $maxlines} {
catch {close $bdf}
return 0
 }
@@ -9093,6 +9118,7 @@
 
 proc diffcommits {a b} {
 global diffcontext diffids blobdifffd diffinhdr currdiffsubmod
+global parseddifflines
 
 set tmpdir [gitknewtmpdir]
 set fna [file join $tmpdir "commit-[string range $a 0 7]"]
@@ -9114,6 +9140,7 @@
 set blobdifffd($diffids) $fd
 set diffinhdr 0
 set currdiffsubmod ""
+set parseddifflines 0
 filerun $fd [list getblobdiffline $fd $diffids]
 }
 

Cheers,
Markus

-- 
- - - - - - - - - - - - - - - - - - -
Dipl. Ing. (FH) Markus Hitter
http://www.jump-ing.de/


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On Fri, Nov 04, 2016 at 03:49:07PM -0400, Jeff King wrote:
> On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote:
> 
> > I agree with your assessment here. The main difficulty in implementing
> > gitrefs is to ensure that they actually do get picked up by
> > reachability checks to prevent dropping commits. I'm not sure how easy
> > this is, but I would much rather we go this route rather than
> > continuing along with the hack. This seems like the ideal solution,
> > since it solves the entire problem and doesn't need more hacks bolted
> > on.
> 
> I think the main complication is that the reachability rules are used
> during object transfer. So you'd probably want to introduce some
> protocol extension to say "I understand gitrefs", so that when one side
> says "I have sha1 X and its reachable objects", we know whether they are
> including gitrefs there. And likewise receivers with
> transfer.fsckObjects may complain about the new gitref tree mode
> (fortunately a new object type shouldn't be needed).
> 
> You might also want fallback rules for storing gitrefs on "old" servers
> (e.g., backfilling gitrefs you need if the server didn't them in the
> initial fetch). But I guess storing any gitrefs on such a server is
> inherently dangerous, because the server might prune them at any time.
> 
> So perhaps a related question is: how can gitrefs be designed such that
> existing servers reject them (rather than accepting the push and then
> later throwing away half the data). It would be easy to notice in the
> client during a push that we are sending gitrefs to a server which does
> not claim that capability. But it seems more robust if it is the server
> who decides "I will not accept these bogus objects".

This seems like the critical problem, here.  The parent hack I used in
git-series might be a hack, but it transparently works with old servers
and clients.  So, for instance, I can push a git-series ref to github,
with no changes required on github's part.  If git added gitrefs, and I
started using them in git-series, then that'd eliminate parent hack and
allow many standard git tools to work naturally on git-series commits
and history, but it'd also mean that people couldn't push git-series
commits to any server until that server updates git.

That said, I'd *love* to have gitrefs available, for a wide variety of
applications, and I can see an argument for introducing them and waiting
a few years for them to become universally available, similar to the
process gitlinks went through.

But I'd also love to have a backward-compatible solution.

- Josh Triplett


[PATCH] doc: fill in omitted word

2016-11-04 Thread Kristoffer Haugsbakk
Signed-off-by: Kristoffer Haugsbakk 
---
 Documentation/gitcore-tutorial.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/gitcore-tutorial.txt 
b/Documentation/gitcore-tutorial.txt
index 4546fa0..9860517 100644
--- a/Documentation/gitcore-tutorial.txt
+++ b/Documentation/gitcore-tutorial.txt
@@ -25,7 +25,7 @@ you want to understand Git's internals.
 The core Git is often called "plumbing", with the prettier user
 interfaces on top of it called "porcelain". You may not want to use the
 plumbing directly very often, but it can be good to know what the
-plumbing does for when the porcelain isn't flushing.
+plumbing does for you when the porcelain isn't flushing.
 
 Back when this document was originally written, many porcelain
 commands were shell scripts. For simplicity, it still uses them as
-- 
2.10.2



Re: [PATCH v3] transport: add protocol policy config option

2016-11-04 Thread Stefan Beller
On Fri, Nov 4, 2016 at 1:58 PM, Brandon Williams  wrote:
> On 11/04, Brandon Williams wrote:
>> Signed-off-by: Brandon Williams 
>
> Is there an acceptable way to give credit to Jeff for helping with this patch?

What about:
Helped-by: Jeff King 


>
> --
> Brandon Williams


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On Fri, Nov 04, 2016 at 09:47:41PM +0100, Christian Couder wrote:
> On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamano  wrote:
> >
> > Imagine we invent a new tree entry type, "gitref", that is similar
> > to "gitlink" in that it can record a commit object name in a tree,
> > but unlike "gitlink" it does imply reachability.  And you do not add
> > phony parents to your commit object.  A tree that has "gitref"s in
> > it is about annotating the commits in the same repository (e.g. the
> > tree references two commits, "base" and "tip", to point into a slice
> > of the main history).  And it is perfectly sensible for such a
> > pointer to imply reachability---after all it serves different
> > purposes from "gitlink".
> 
> The more I think about this (and also about how to limit ref
> advertisements as recently discussed in
> https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/),
> the more I think about Shawn's RefTree:
> 
> https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/
> 
> Couldn't a RefTree be used to store refs that point to the base
> commit, the tip commit and the blob that contains the cover letter,
> and maybe also a ref pointing to the RefTree of the previous version
> of the series?

That's really interesting!  The Software Heritage project is working on
something similar, because they want to store all the refs as part of
their data model as well.  I'll point them to the reftree work.

If upstream git supported RefTree, I could potentially use that for
git-series.  However, I do want a commit message and history for the
series itself, and using refs in the reftree to refer to the parents
seems like abusing reftree to recreate commits, in a reversal of the
hack of using commit parents as a reftree. :)

What if, rather than storing a hash reference to a reftree as a single
reference and replacing it with no history, a reftree could be
referenced from a commit and have history?  (That would also allow
tagging a version of the reftree.)


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Josh Triplett
On Fri, Nov 04, 2016 at 10:57:09AM -0700, Junio C Hamano wrote:
> After your talk at LPC2016, I was thinking about your proposal to
> give an option to hide certain parents from "git log" traversal.
> 
> While I do not think we would terribly mind a new feature in the
> core to support third-party additions like "git series" better, I
> think this particular one is a big mistake that we shouldn't take.
[...]
> I think this is backwards.  The root cause of the issue you have
> with "gitk" is because you added something that is *NOT* a parent to
> your commit.  We shouldn't have to add a mechanism to filter
> something that shouldn't have been added there in the first place.
> 
> I am wondering if an alternative approach would work better.
> 
> Imagine we invent a new tree entry type, "gitref", that is similar
> to "gitlink" in that it can record a commit object name in a tree,
> but unlike "gitlink" it does imply reachability.  And you do not add
> phony parents to your commit object.  A tree that has "gitref"s in
> it is about annotating the commits in the same repository (e.g. the
> tree references two commits, "base" and "tip", to point into a slice
> of the main history).  And it is perfectly sensible for such a
> pointer to imply reachability---after all it serves different
> purposes from "gitlink".

I absolutely agree with this, and I'd love to have gitref or similar in
core git.  Given the availability of that mechanism, I'd love to use it
in git-series.  (And in git submodule, as well, for other projects.)

The one critical issue there, though: that would break backward
compatibility with old versions of git.  No old version of git could
push, pull, gc, repack, or otherwise touch a repository that used this
feature.

The advantages of the approach (viewing and manipulating the series with
pure git) seem sufficiently high to make that worth considering, but it
is a significant downside.

> Another alternative that I am negative about (but is probably a
> better hack than how you abused the "parent" link) might be to add a
> new commit object header field that behaves similarly to "parent"
> only in that it implies reachability.  But recording the extra
> parent in commit object was not something you wanted to do in the
> first place (i.e. your series processing is done solely on the
> contents of the tree, and you do not read this extra parent). If you
> need to add an in-tree reference to another commit in your future
> versions of "git series", with either this variant or your original
> implementation, you would end up needing adding more "parent" (or
> pseudo parent) only to preserve reachability.  At that point, I
> think it makes more sense to have entries in the tree to directly
> ensure reachability, if you want these entries to always point at an
> in-tree object.

This would similarly break compatibility with old git, as old git
wouldn't follow those reachability-only links from commits, so it could
throw away the data.

One approach compatible with old git would be to continue adding the
relevant commits as artificial parents, but have a separate commit
metadata field that says which parents to ignore; old git would then do
the right thing, as long as it doesn't rewrite the commit entirely.

That does have the same disadvantages of having to duplicate the
information in both the tree and the parent list, though; it's the same
class of hack, just with improved usability.  I'd much rather use
gitrefs.

> I am afraid that I probably am two steps ahead of myself, because I
> am reasonably sure that it is quite possible that I have overlooked
> something trivially obvious that makes the "gitref" approach
> unworkable.

gitref seems like a good idea to me, as long as we can sort out the
compatibility story.


Re: [PATCH v3] transport: add protocol policy config option

2016-11-04 Thread Brandon Williams
On 11/04, Brandon Williams wrote:
> Signed-off-by: Brandon Williams 

Is there an acceptable way to give credit to Jeff for helping with this patch?

-- 
Brandon Williams


[PATCH v3] transport: add protocol policy config option

2016-11-04 Thread Brandon Williams
Previously the `GIT_ALLOW_PROTOCOL` environment variable was used to
specify a whitelist of protocols to be used in clone/fetch/pull
commands.  This patch introduces new configuration options for more
fine-grained control for allowing/disallowing protocols.  This also has
the added benefit of allowing easier construction of a protocol
whitelist on systems where setting an environment variable is
non-trivial.

Now users can specify a policy to be used for each type of protocol via
the 'protocol..allow' config option.  A default policy for all
unknown protocols can be set with the 'protocol.allow' config option.
If no user configured default is made git, by default, will allow
known-safe protocols (http, https, git, ssh, file), disallow
known-dangerous protocols (ext), and have a default poliy of `user` for
all other protocols.

The supported policies are `always`, `never`, and `user`.  The `user`
policy can be used to configure a protocol to be usable when explicitly
used by a user, while disallowing it for commands which run
clone/fetch/pull commands without direct user intervention (e.g.
recursive initialization of submodules).  Commands which can potentially
clone/fetch/pull from untrusted repositories without user intervention
can export `GIT_PROTOCOL_FROM_USER` with a value of '0' to prevent
protocols configured to the `user` policy from being used.

Signed-off-by: Brandon Williams 
---
 Documentation/config.txt |  25 
 Documentation/git.txt|  19 +++---
 git-submodule.sh |  12 ++--
 t/lib-proto-disable.sh   | 132 +++
 t/t5509-fetch-push-namespaces.sh |   1 +
 t/t5802-connect-helper.sh|   1 +
 transport.c  |  73 +-
 7 files changed, 235 insertions(+), 28 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 27069ac..5d845c4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2308,6 +2308,31 @@ pretty.::
Note that an alias with the same name as a built-in format
will be silently ignored.
 
+protocol.allow::
+   If set, provide a user defined default policy for all protocols which
+   don't explicitly have a policy (protocol..allow).  By default,
+   if unset, known-safe protocols (http, https, git, ssh, file) have a
+   default policy of `always`, known-dangerous protocols (ext) have a
+   default policy of `never`, and all other protocols have a default policy
+   of `user`.  Supported policies:
++
+--
+
+* `always` - protocol is always able to be used.
+
+* `never` - protocol is never able to be used.
+
+* `user` - protocol is only able to be used when `GIT_PROTOCOL_FROM_USER` is
+  either unset or has a value of 1.  This policy should be used when you want a
+  protocol to be usable by the user but don't want it used by commands which
+  execute clone/fetch/pull commands without user input, e.g. recursive
+  submodule initialization.
+
+--
+
+protocol..allow::
+   Set a policy to be used by protocol  with clone/fetch/pull 
commands.
+
 pull.ff::
By default, Git does not create an extra merge commit when merging
a commit that is a descendant of the current commit. Instead, the
diff --git a/Documentation/git.txt b/Documentation/git.txt
index ab7215e..ab25580 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -1150,13 +1150,13 @@ of clones and fetches.
cloning a repository to make a backup).
 
 `GIT_ALLOW_PROTOCOL`::
-   If set, provide a colon-separated list of protocols which are
-   allowed to be used with fetch/push/clone. This is useful to
-   restrict recursive submodule initialization from an untrusted
-   repository. Any protocol not mentioned will be disallowed (i.e.,
-   this is a whitelist, not a blacklist). If the variable is not
-   set at all, all protocols are enabled.  The protocol names
-   currently used by git are:
+   The new way to configure allowed protocols is done through the config
+   interface, though this setting takes precedences.  See
+   linkgit:git-config[1] for more details.  If set, provide a
+   colon-separated list of protocols which are allowed to be used with
+   fetch/push/clone.  Any protocol not mentioned will be disallowed (i.e.,
+   this is a whitelist, not a blacklist).  The protocol names currently
+   used by git are:
 
  - `file`: any local file-based path (including `file://` URLs,
or local paths)
@@ -1174,6 +1174,11 @@ of clones and fetches.
  - any external helpers are named by their protocol (e.g., use
`hg` to allow the `git-remote-hg` helper)
 
+`GIT_PROTOCOL_FROM_USER`::
+   Set to 0 to prevent protocols used by fetch/push/clone which are
+   configured to the `user` state.  This is useful to restrict recursive
+   submodule initialization from an untrusted 

Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Jeff King
On Fri, Nov 04, 2016 at 12:19:55PM -0700, Jacob Keller wrote:

> I agree with your assessment here. The main difficulty in implementing
> gitrefs is to ensure that they actually do get picked up by
> reachability checks to prevent dropping commits. I'm not sure how easy
> this is, but I would much rather we go this route rather than
> continuing along with the hack. This seems like the ideal solution,
> since it solves the entire problem and doesn't need more hacks bolted
> on.

I think the main complication is that the reachability rules are used
during object transfer. So you'd probably want to introduce some
protocol extension to say "I understand gitrefs", so that when one side
says "I have sha1 X and its reachable objects", we know whether they are
including gitrefs there. And likewise receivers with
transfer.fsckObjects may complain about the new gitref tree mode
(fortunately a new object type shouldn't be needed).

You might also want fallback rules for storing gitrefs on "old" servers
(e.g., backfilling gitrefs you need if the server didn't them in the
initial fetch). But I guess storing any gitrefs on such a server is
inherently dangerous, because the server might prune them at any time.

So perhaps a related question is: how can gitrefs be designed such that
existing servers reject them (rather than accepting the push and then
later throwing away half the data). It would be easy to notice in the
client during a push that we are sending gitrefs to a server which does
not claim that capability. But it seems more robust if it is the server
who decides "I will not accept these bogus objects".

I haven't thought all that hard about this. That's just my initial
thoughts on what sound hard. Tweaking the reachability code doesn't seem
all that bad; we already know all of the spots that care about
S_ISGITLINK(). It may even be that some of those spots work out of the
box (because gitlinks are usually about telling the graph-walking code
that we _don't_ care about reachability; we do by default for trees and
blobs).

I'd be surprised if all such sites work out of the box, though. Even if
they see "ah, sha1 X is referenced by tree Y and isn't a gitlink, and
therefore should be reachable", they need to also note that "X" is a
commit and recursively walk its objects.

-Peff


Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Christian Couder
On Fri, Nov 4, 2016 at 6:57 PM, Junio C Hamano  wrote:
>
> Imagine we invent a new tree entry type, "gitref", that is similar
> to "gitlink" in that it can record a commit object name in a tree,
> but unlike "gitlink" it does imply reachability.  And you do not add
> phony parents to your commit object.  A tree that has "gitref"s in
> it is about annotating the commits in the same repository (e.g. the
> tree references two commits, "base" and "tip", to point into a slice
> of the main history).  And it is perfectly sensible for such a
> pointer to imply reachability---after all it serves different
> purposes from "gitlink".

The more I think about this (and also about how to limit ref
advertisements as recently discussed in
https://public-inbox.org/git/20161024132932.i42rqn2vlpocq...@sigill.intra.peff.net/),
the more I think about Shawn's RefTree:

https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/

Couldn't a RefTree be used to store refs that point to the base
commit, the tip commit and the blob that contains the cover letter,
and maybe also a ref pointing to the RefTree of the previous version
of the series?


[PATCH 5/5] check-ref-format: New --stdin option

2016-11-04 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 Documentation/git-check-ref-format.txt | 10 --
 builtin/check-ref-format.c | 34 +++---
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt 
b/Documentation/git-check-ref-format.txt
index e9a2657..5a213ce 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -10,8 +10,9 @@ SYNOPSIS
 [verse]
 'git check-ref-format' [--report-errors] [--normalize]
[--[no-]allow-onelevel] [--refspec-pattern]
-   
-'git check-ref-format' [--report-errors] --branch 
+| --stdin
+'git check-ref-format' [--report-errors] --branch
+| --stdin
 
 DESCRIPTION
 ---
@@ -109,6 +110,11 @@ OPTIONS
If any ref does not check OK, print a message to stderr.
 (By default, git check-ref-format is silent.)
 
+--stdin::
+   Instead of checking on ref supplied on the command line,
+   read refs, one per line, from stdin.  The exit status is
+   0 if all the refs were OK.
+
 
 EXAMPLES
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 559d5c2..87f52fa 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -76,6 +76,7 @@ static int check_one_ref_format(const char *refname)
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
int i;
+   int use_stdin = 0;
 
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(builtin_check_ref_format_usage);
@@ -93,6 +94,8 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
check_branch = 1;
else if (!strcmp(argv[i], "--report-errors"))
report_errors = 1;
+   else if (!strcmp(argv[i], "--stdin"))
+   use_stdin = 1;
else
usage(builtin_check_ref_format_usage);
}
@@ -100,8 +103,33 @@ int cmd_check_ref_format(int argc, const char **argv, 
const char *prefix)
if (check_branch && (flags || normalize))
usage(builtin_check_ref_format_usage);
 
-   if (! (i == argc - 1))
-   usage(builtin_check_ref_format_usage);
+   if (!use_stdin) {
+   if (! (i == argc - 1))
+   usage(builtin_check_ref_format_usage);
+
+   return check_one_ref_format(argv[i]);
+   } else {
+   char buffer[2048];
+   int worst = 0;
 
-   return check_one_ref_format(argv[i]);
+   if (! (i == argc))
+   usage(builtin_check_ref_format_usage);
+
+   while (fgets(buffer, sizeof(buffer), stdin)) {
+   char *newline = strchr(buffer, '\n');
+   if (!newline) {
+   fprintf(stderr, "%s --stdin: missing final 
newline or line too long\n", *argv);
+   exit(127);
+   }
+   *newline = 0;
+   int got = check_one_ref_format(buffer);
+   if (got > worst)
+   worst = got;
+   }
+   if (!feof(stdin)) {
+   perror("reading from stdin");
+   exit(127);
+   }
+   return worst;
+   }
 }
-- 
2.10.1



[PATCH 4/5] check-ref-format: New --report-errors option

2016-11-04 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 Documentation/git-check-ref-format.txt |  8 ++--
 builtin/check-ref-format.c | 10 --
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-check-ref-format.txt 
b/Documentation/git-check-ref-format.txt
index 8611a99..e9a2657 100644
--- a/Documentation/git-check-ref-format.txt
+++ b/Documentation/git-check-ref-format.txt
@@ -8,10 +8,10 @@ git-check-ref-format - Ensures that a reference name is well 
formed
 SYNOPSIS
 
 [verse]
-'git check-ref-format' [--normalize]
+'git check-ref-format' [--report-errors] [--normalize]
[--[no-]allow-onelevel] [--refspec-pattern]

-'git check-ref-format' --branch 
+'git check-ref-format' [--report-errors] --branch 
 
 DESCRIPTION
 ---
@@ -105,6 +105,10 @@ OPTIONS
with a status of 0.  (`--print` is a deprecated way to spell
`--normalize`.)
 
+--report-errors::
+   If any ref does not check OK, print a message to stderr.
+(By default, git check-ref-format is silent.)
+
 
 EXAMPLES
 
diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 020ebe8..559d5c2 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -9,7 +9,7 @@
 
 static const char builtin_check_ref_format_usage[] =
 "git check-ref-format [--normalize] [] \n"
-"   or: git check-ref-format --branch ";
+"   or: git check-ref-format [] --branch ";
 
 /*
  * Return a copy of refname but with leading slashes removed and runs
@@ -51,6 +51,7 @@ static int check_ref_format_branch(const char *arg)
 static int normalize = 0;
 static int check_branch = 0;
 static int flags = 0;
+static int report_errors = 0;
 
 static int check_one_ref_format(const char *refname)
 {
@@ -61,8 +62,11 @@ static int check_one_ref_format(const char *refname)
got = check_branch
? check_ref_format_branch(refname)
: check_refname_format(refname, flags);
-   if (got)
+   if (got) {
+   if (report_errors)
+   fprintf(stderr, "bad ref format: %s\n", refname);
return 1;
+   }
if (normalize) {
printf("%s\n", refname);
free((void*)refname);
@@ -87,6 +91,8 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
flags |= REFNAME_REFSPEC_PATTERN;
else if (!strcmp(argv[i], "--branch"))
check_branch = 1;
+   else if (!strcmp(argv[i], "--report-errors"))
+   report_errors = 1;
else
usage(builtin_check_ref_format_usage);
}
-- 
2.10.1



[PATCH 0/5] git check-ref-format --stdin --report-errors

2016-11-04 Thread Ian Jackson
I wanted to be able to syntax check lots of proposed refs quickly
(please don't ask why - it's complicated!)

So I added a --stdin option to git-check-ref-format.  Also it has
--report-errors now too so you can get some kind of useful error
message if it complains.

It's still not really a good batch mode but it's good enough for my
use case.  To improve it would involve a new command line option to
offer a suitable stdout output format.

There are three small refactoring patches and the two patches with new
options and corresponding docs.

Thanks for your attention.

FYI I am not likely to need this again in the near future: it's a
one-off use case.  So my effort for rework is probably limited.  I
thought I'd share what I'd done in what I hope is a useful form,
anyway.

Regards,
Ian.


[PATCH 2/5] check-ref-format: Refactor to make --branch code more common

2016-11-04 Thread Ian Jackson
We are going to want to permit other options with --branch.

So, replace the special case with just an entry for --branch in the
parser for ordinary options, and check for option compatibility at the
end.

No overall functional change.

Signed-off-by: Ian Jackson 
---
 builtin/check-ref-format.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index 4d56caa..f12c19c 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -49,13 +49,19 @@ static int check_ref_format_branch(const char *arg)
 }
 
 static int normalize = 0;
+static int check_branch = 0;
 static int flags = 0;
 
 static int check_one_ref_format(const char *refname)
 {
+   int got;
+
if (normalize)
refname = collapse_slashes(refname);
-   if (check_refname_format(refname, flags))
+   got = check_branch
+   ? check_ref_format_branch(refname)
+   : check_refname_format(refname, flags);
+   if (got)
return 1;
if (normalize)
printf("%s\n", refname);
@@ -68,9 +74,6 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(builtin_check_ref_format_usage);
 
-   if (argc == 3 && !strcmp(argv[1], "--branch"))
-   return check_ref_format_branch(argv[2]);
-
for (i = 1; i < argc && argv[i][0] == '-'; i++) {
if (!strcmp(argv[i], "--normalize") || !strcmp(argv[i], 
"--print"))
normalize = 1;
@@ -80,9 +83,15 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
flags &= ~REFNAME_ALLOW_ONELEVEL;
else if (!strcmp(argv[i], "--refspec-pattern"))
flags |= REFNAME_REFSPEC_PATTERN;
+   else if (!strcmp(argv[i], "--branch"))
+   check_branch = 1;
else
usage(builtin_check_ref_format_usage);
}
+
+   if (check_branch && (flags || normalize))
+   usage(builtin_check_ref_format_usage);
+
if (! (i == argc - 1))
usage(builtin_check_ref_format_usage);
 
-- 
2.10.1



[PATCH 1/5] check-ref-format: Refactor out check_one_ref_format

2016-11-04 Thread Ian Jackson
We are going to want to reuse this.  No functional change right now.

It currently has a hidden memory leak if --normalize is used.

Signed-off-by: Ian Jackson 
---
 builtin/check-ref-format.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index eac4994..4d56caa 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -48,12 +48,22 @@ static int check_ref_format_branch(const char *arg)
return 0;
 }
 
+static int normalize = 0;
+static int flags = 0;
+
+static int check_one_ref_format(const char *refname)
+{
+   if (normalize)
+   refname = collapse_slashes(refname);
+   if (check_refname_format(refname, flags))
+   return 1;
+   if (normalize)
+   printf("%s\n", refname);
+}
+
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
 {
int i;
-   int normalize = 0;
-   int flags = 0;
-   const char *refname;
 
if (argc == 2 && !strcmp(argv[1], "-h"))
usage(builtin_check_ref_format_usage);
@@ -76,13 +86,5 @@ int cmd_check_ref_format(int argc, const char **argv, const 
char *prefix)
if (! (i == argc - 1))
usage(builtin_check_ref_format_usage);
 
-   refname = argv[i];
-   if (normalize)
-   refname = collapse_slashes(refname);
-   if (check_refname_format(refname, flags))
-   return 1;
-   if (normalize)
-   printf("%s\n", refname);
-
-   return 0;
+   return check_one_ref_format(argv[i]);
 }
-- 
2.10.1



[PATCH 3/5] check-ref-format: Abolish leak of collapsed refname

2016-11-04 Thread Ian Jackson
collapse_slashes always returns a value from xmallocz.

Right now this leak is not very interesting, since we only call
check_one_ref_format once.

Signed-off-by: Ian Jackson 
---
 builtin/check-ref-format.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/check-ref-format.c b/builtin/check-ref-format.c
index f12c19c..020ebe8 100644
--- a/builtin/check-ref-format.c
+++ b/builtin/check-ref-format.c
@@ -63,8 +63,10 @@ static int check_one_ref_format(const char *refname)
: check_refname_format(refname, flags);
if (got)
return 1;
-   if (normalize)
+   if (normalize) {
printf("%s\n", refname);
+   free((void*)refname);
+   }
 }
 
 int cmd_check_ref_format(int argc, const char **argv, const char *prefix)
-- 
2.10.1



Re: Regarding "git log" on "git series" metadata

2016-11-04 Thread Jacob Keller
On Fri, Nov 4, 2016 at 10:57 AM, Junio C Hamano  wrote:
> I think this is backwards.  The root cause of the issue you have
> with "gitk" is because you added something that is *NOT* a parent to
> your commit.  We shouldn't have to add a mechanism to filter
> something that shouldn't have been added there in the first place.
>
> I am wondering if an alternative approach would work better.
>
> Imagine we invent a new tree entry type, "gitref", that is similar
> to "gitlink" in that it can record a commit object name in a tree,
> but unlike "gitlink" it does imply reachability.  And you do not add
> phony parents to your commit object.  A tree that has "gitref"s in
> it is about annotating the commits in the same repository (e.g. the
> tree references two commits, "base" and "tip", to point into a slice
> of the main history).  And it is perfectly sensible for such a
> pointer to imply reachability---after all it serves different
> purposes from "gitlink".
>

I agree with your assessment here. The main difficulty in implementing
gitrefs is to ensure that they actually do get picked up by
reachability checks to prevent dropping commits. I'm not sure how easy
this is, but I would much rather we go this route rather than
continuing along with the hack. This seems like the ideal solution,
since it solves the entire problem and doesn't need more hacks bolted
on.

It would of course mean some work for people who previously used git
series as you would want to re-write the commits to drop the parent
links and become gitrefs instead of gitlinks. However, this can
(probably?) be solved by some sort of use of the filter-branch code.

I don't think you've hit upon any trivially obvious unworkable things.
It is probably somewhat complex to make the reachability checks detect
in-tree gitrefs but I don't think it would be impossible.

Thanks,
Jake


Regarding "git log" on "git series" metadata

2016-11-04 Thread Junio C Hamano
After your talk at LPC2016, I was thinking about your proposal to
give an option to hide certain parents from "git log" traversal.

While I do not think we would terribly mind a new feature in the
core to support third-party additions like "git series" better, I
think this particular one is a big mistake that we shouldn't take.

For those listening from sidelines, here is a version of my
understanding of "git series":

 * "git series" wants to represent a patch series evolution.  It is
   a history of history, and each element of this evolution is
   represented by:

   - a commit object, that is used to describe what this reroll of
 the topic is about, and its parent links point at previous
 rerolls (it could be a merge of two independent incarnations of
 a series).

   - the tree contained in the commit object records the base commit
 where the topic forks from the main history, and the tip commit
 where the topic ends.  These are pointers into the main history
 DAG.

   - the tree may have other metadata, an example of which is the
 cover letter contents to be used when the topic becomes ready
 for re-submission.  There may be more metadata you would want
 to add in the future versions of "git series".

   Needless to say, the commits that represent the history of a
   series record a tree that is completely differently shaped.  The
   only relation between the series history and main history is that
   the former has pointers into the latter.

 * You chose to represent the base and tip commit object as gitlinks
   in the tree of a series commit, simply because it was a way that
   was already implemented to record a commit object name in a tree.

 * However, because gitlink is designed to be used for "external"
   things (the prominent example is submodule), recording these as
   gitlinks would guarantee that they will get GCed as a series
   progresses, the main history rewound and rewritten thereby making
   the base and tip recorded in the older part of the series history
   unreachable from the main history.  Because you want to make sure
   that base and tip objects will stay in the repository even after
   the topic branch in the main history gets rewound, this is not
   what you want.

 * In order to workaround that reachability issue, the hack you
   invented is to add the tip commit as a "parent" of a commit that
   represents one step in the series.  This may guarantee the
   reachability---as long as a commit in a series history is
   reachable from a ref, the tip and base commits will be reachable
   from there even if they are rebased away from the main history.
   But of course, there are downsides.

 * Due to this hack, feeding "gitk" (or "git log") a commit in the
   series history will give you nonsense results.  You are not
   interested in traversing or viewing the commits in the main
   history.

 * Because of the above, you propose another hack to tell the
   revision traversal machinery to optionally omit a parent commit
   that appear as a gitlink in the tree.

I think this is backwards.  The root cause of the issue you have
with "gitk" is because you added something that is *NOT* a parent to
your commit.  We shouldn't have to add a mechanism to filter
something that shouldn't have been added there in the first place.

I am wondering if an alternative approach would work better.

Imagine we invent a new tree entry type, "gitref", that is similar
to "gitlink" in that it can record a commit object name in a tree,
but unlike "gitlink" it does imply reachability.  And you do not add
phony parents to your commit object.  A tree that has "gitref"s in
it is about annotating the commits in the same repository (e.g. the
tree references two commits, "base" and "tip", to point into a slice
of the main history).  And it is perfectly sensible for such a
pointer to imply reachability---after all it serves different
purposes from "gitlink".

Another alternative that I am negative about (but is probably a
better hack than how you abused the "parent" link) might be to add a
new commit object header field that behaves similarly to "parent"
only in that it implies reachability.  But recording the extra
parent in commit object was not something you wanted to do in the
first place (i.e. your series processing is done solely on the
contents of the tree, and you do not read this extra parent). If you
need to add an in-tree reference to another commit in your future
versions of "git series", with either this variant or your original
implementation, you would end up needing adding more "parent" (or
pseudo parent) only to preserve reachability.  At that point, I
think it makes more sense to have entries in the tree to directly
ensure reachability, if you want these entries to always point at an
in-tree object.

I am afraid that I probably am two steps ahead of myself, because I
am reasonably sure that it is quite possible that I have overlooked
something trivially 

Re: [PATCH] diffcore-delta: remove unused parameter to diffcore_count_changes()

2016-11-04 Thread Tobias Klauser
On 2016-11-04 at 17:37:14 +0100, Jeff King  wrote:
> On Fri, Nov 04, 2016 at 11:24:36AM +0100, Tobias Klauser wrote:
> 
> > The delta_limit parameter to diffcore_count_changes() has been unused
> > since commit c06c79667c95 ("diffcore-rename: somewhat optimized.").
> > Remove the parameter and adjust all callers.
> 
> Sounds like a good idea to get rid of an unused parameter, but I think
> this went away in ba23bbc8e (diffcore-delta: make change counter to byte
> oriented again., 2006-03-04).

Ugh, I must have fat-fingered the commit id. Will update the description
accordingly for v2.

Thanks!
Tobias


Re: [PATCH] branch: remove unused parameter to create_branch()

2016-11-04 Thread Tobias Klauser
On 2016-11-04 at 17:30:12 +0100, Jeff King  wrote:
> On Fri, Nov 04, 2016 at 04:19:49PM +0100, Tobias Klauser wrote:
> 
> > The name parameter to create_branch() has been unused since commit
> > 55c4a673070f ("Prevent force-updating of the current branch"). Remove
> > the parameter and adjust the callers accordingly. Also remove the
> > parameter from the function's documentation comment.
> 
> This seemed eerily familiar, and it turns out I wrote this as a
> preparatory step for a different topic a while back, but never finished
> it.
> 
> So clearly a good change, though we might want to explain a bit more why
> it's correct that the parameter is unused. Here's what I wrote:
> 
>   This function used to have the caller pass in the current value of
>   HEAD, in order to make sure we didn't clobber HEAD.  In 55c4a6730,
>   that logic moved to validate_new_branchname(), which just resolves
>   HEAD itself. The parameter to create_branch is now unused.

Ah, I didn't know about the history of this parameter. It clearly makes
sense to explain this in the patch description.

> I also ended up reformatting the documentation comment, but that's
> purely optional. My patch is below for reference. Feel free to grab any
> bits of it that you agree with.

I like your documentation comment much better as IMO it's easier to read
and to identify the individual parameters.

Given these facts, I guess it's better if my patch is dropped and yours
is applied instead :)

Thanks a lot!


Re: [PATCH] diffcore-delta: remove unused parameter to diffcore_count_changes()

2016-11-04 Thread Jeff King
On Fri, Nov 04, 2016 at 11:24:36AM +0100, Tobias Klauser wrote:

> The delta_limit parameter to diffcore_count_changes() has been unused
> since commit c06c79667c95 ("diffcore-rename: somewhat optimized.").
> Remove the parameter and adjust all callers.

Sounds like a good idea to get rid of an unused parameter, but I think
this went away in ba23bbc8e (diffcore-delta: make change counter to byte
oriented again., 2006-03-04).

The patch itself looks good.

-Peff


Re: git -C has unexpected behaviour

2016-11-04 Thread Johannes Schindelin
Hi Felix,

On Fri, 4 Nov 2016, Felix Nairz wrote:

> Now, to the unexpected part, which I think is a bug:
> 
> If the TestData folder is there, but empty (I deleted all the files),
> then running
> 
> git -C .\TestData reset --hard
> 
> will NOT throw me an error but run
> 
> git reset --hard
> 
> on the git repository (not the submodule in the sub-directory!),
> without warning, or error.

I *think* that this is actually intended. Please note that -C is *not* a
synonym of --git-dir. It is more of a synonym of

cd .\TestData
git reset --hard

In other words, you probably expected -C to specify the top-level
directory of a worktree, and you expected Git to error out when it is not.
Instead, Git will treat -C as a directory *from where to start*; It *can*
be a subdirectory of the top-level directory of the worktree.

Please note that

git --git-dir=.\TestData\.git reset --hard

will not work, though, as it mistakes the current directory for the
top-level directory of the worktree. What you want to do is probably

git -C .\TestData --git-dir=.git reset --hard

This will tell Git to change the current working directory to the
top-level of your intended worktree, *and* state that the repository needs
to be in .git (which can be a file containing "gitdir: ",
which is the default in submodules).

If the repository is *not* found, this command will exit with a failure.

Ciao,
Johannes


Re: git -C has unexpected behaviour

2016-11-04 Thread Stefan Beller
On Fri, Nov 4, 2016 at 7:28 AM, Felix Nairz  wrote:
> Hi guys,
>
> I ran into some really weird git behaviour today.
>
> My git --version is: git version 2.8.1.windows.1
>
> We have a git repository with a submodule called TestData. The data in
> there is modified and reset as part of our unit tests.
>
> The submodule is a sub-folder of the git repository called TestData.
> So the relative path from the git repository to the submodule is
> .\TestData
>
> If I delete the entire TestData folder and run
> git -C .\TestData reset --hard
>
> I will get the following error:
> git : fatal: Cannot change to '.\TestData': No such file or directory
> This is as expected.
>
>
> Now, to the unexpected part, which I think is a bug:
>
> If the TestData folder is there, but empty (I deleted all the files),

And "all the files" includes the ".git" file, which git uses to find out if
it is in a git repository. So it keeps walking up until it finds a .git
file/directory, which is the parent project.

Once a git directory is found, the main function of git initializes some
data structures, e.g. the "path prefix" inside the repository which would be
" .\TestData" in your case. then the actual command is found and run.

So what it is doing is:
"Suppose you are in the TestData directory of the parent project and then
run the command ..."

My gut reaction was to propose to check if any GITLINK (submodule)
is a prefix of said "path prefix" and then rather initialize and operate on
the submodule.

However I do not think this is a good idea:
* Git wants to be fast and checking if we are in any submodule
slows down the common case.
* Historically commands in un-initialized or deinitialized submodules
behave as if in the parent project. I think if we'd fix this issue, other
people would complain as their workflow is harmed.

>
> Because of this we have had losses of uncommitted changes a few times
> now (loosing a few days of work, and getting a bit paranoid),

* commit early, commit often such that the losses are less than a few days.
* do not remove the submodule directory as a whole thing
  (make sure the .git file is there and not wiped)
* instead use "git -C TestData clean -dffx && git -C TestData reset --hard"
   https://git-scm.com/docs/git-clean


> could never find the root cause for this until today, where I found
> out that it happens when the TestData directory is empty.

I am undecided if it is a bug or a feature. Fixing this as a bug
would be a performance penalty. Not fixing it may incur data losses.
I dunno.

>
> Thank for looking into this, and I am looking forward to hear your
> opinions about this.
>
> Best Regards, Felix Nairz


Re: [PATCH] branch: remove unused parameter to create_branch()

2016-11-04 Thread Jeff King
On Fri, Nov 04, 2016 at 04:19:49PM +0100, Tobias Klauser wrote:

> The name parameter to create_branch() has been unused since commit
> 55c4a673070f ("Prevent force-updating of the current branch"). Remove
> the parameter and adjust the callers accordingly. Also remove the
> parameter from the function's documentation comment.

This seemed eerily familiar, and it turns out I wrote this as a
preparatory step for a different topic a while back, but never finished
it.

So clearly a good change, though we might want to explain a bit more why
it's correct that the parameter is unused. Here's what I wrote:

  This function used to have the caller pass in the current value of
  HEAD, in order to make sure we didn't clobber HEAD.  In 55c4a6730,
  that logic moved to validate_new_branchname(), which just resolves
  HEAD itself. The parameter to create_branch is now unused.

I also ended up reformatting the documentation comment, but that's
purely optional. My patch is below for reference. Feel free to grab any
bits of it that you agree with.

-- >8 --
Subject: [PATCH] create_branch: drop unused "head" parameter

This function used to have the caller pass in the current
value of HEAD, in order to make sure we didn't clobber HEAD.
In 55c4a6730, that logic moved to validate_new_branchname(),
which just resolves HEAD itself. The parameter to
create_branch is now unused.

Since we have to update and re-wrap the docstring describing
the parameters anyway, let's take this opportunity to break
it out into a list, which makes it easier to find the
parameters.

Signed-off-by: Jeff King 
---
 branch.c   |  3 +--
 branch.h   | 22 ++
 builtin/branch.c   |  4 ++--
 builtin/checkout.c |  2 +-
 4 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/branch.c b/branch.c
index a5a8dcbd0..0d459b3cf 100644
--- a/branch.c
+++ b/branch.c
@@ -228,8 +228,7 @@ N_("\n"
 "will track its remote counterpart, you may want to use\n"
 "\"git push -u\" to set the upstream config as you push.");
 
-void create_branch(const char *head,
-  const char *name, const char *start_name,
+void create_branch(const char *name, const char *start_name,
   int force, int reflog, int clobber_head,
   int quiet, enum branch_track track)
 {
diff --git a/branch.h b/branch.h
index b2f964933..3103eb9ad 100644
--- a/branch.h
+++ b/branch.h
@@ -4,15 +4,21 @@
 /* Functions for acting on the information about branches. */
 
 /*
- * Creates a new branch, where head is the branch currently checked
- * out, name is the new branch name, start_name is the name of the
- * existing branch that the new branch should start from, force
- * enables overwriting an existing (non-head) branch, reflog creates a
- * reflog for the branch, and track causes the new branch to be
- * configured to merge the remote branch that start_name is a tracking
- * branch for (if any).
+ * Creates a new branch, where:
+ *
+ *   - name is the new branch name
+ *
+ *   - start_name is the name of the existing branch that the new branch should
+ * start from
+ *
+ *   - force enables overwriting an existing (non-head) branch
+ *
+ *   - reflog creates a reflog for the branch
+ *
+ *   - track causes the new branch to be configured to merge the remote branch
+ * that start_name is a tracking branch for (if any).
  */
-void create_branch(const char *head, const char *name, const char *start_name,
+void create_branch(const char *name, const char *start_name,
   int force, int reflog,
   int clobber_head, int quiet, enum branch_track track);
 
diff --git a/builtin/branch.c b/builtin/branch.c
index d5d93a8c0..60cc5c8e8 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -807,7 +807,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
 * create_branch takes care of setting up the tracking
 * info and making sure new_upstream is correct
 */
-   create_branch(head, branch->name, new_upstream, 0, 0, 0, quiet, 
BRANCH_TRACK_OVERRIDE);
+   create_branch(branch->name, new_upstream, 0, 0, 0, quiet, 
BRANCH_TRACK_OVERRIDE);
} else if (unset_upstream) {
struct branch *branch = branch_get(argv[0]);
struct strbuf buf = STRBUF_INIT;
@@ -853,7 +853,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
strbuf_release();
 
branch_existed = ref_exists(branch->refname);
-   create_branch(head, argv[0], (argc == 2) ? argv[1] : head,
+   create_branch(argv[0], (argc == 2) ? argv[1] : head,
  force, reflog, 0, quiet, track);
 
/*
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 9b2a5b31d..512492aad 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -630,7 +630,7 @@ static void update_refs_for_switch(const struct 

[PATCH] branch: remove unused parameter to create_branch()

2016-11-04 Thread Tobias Klauser
The name parameter to create_branch() has been unused since commit
55c4a673070f ("Prevent force-updating of the current branch"). Remove
the parameter and adjust the callers accordingly. Also remove the
parameter from the function's documentation comment.

Signed-off-by: Tobias Klauser 
---
 branch.c   |  3 +--
 branch.h   | 15 +++
 builtin/branch.c   |  4 ++--
 builtin/checkout.c |  2 +-
 4 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/branch.c b/branch.c
index a5a8dcbd0ed9..0d459b3cfe50 100644
--- a/branch.c
+++ b/branch.c
@@ -228,8 +228,7 @@ N_("\n"
 "will track its remote counterpart, you may want to use\n"
 "\"git push -u\" to set the upstream config as you push.");
 
-void create_branch(const char *head,
-  const char *name, const char *start_name,
+void create_branch(const char *name, const char *start_name,
   int force, int reflog, int clobber_head,
   int quiet, enum branch_track track)
 {
diff --git a/branch.h b/branch.h
index b2f964933270..8e63d1b6f964 100644
--- a/branch.h
+++ b/branch.h
@@ -4,15 +4,14 @@
 /* Functions for acting on the information about branches. */
 
 /*
- * Creates a new branch, where head is the branch currently checked
- * out, name is the new branch name, start_name is the name of the
- * existing branch that the new branch should start from, force
- * enables overwriting an existing (non-head) branch, reflog creates a
- * reflog for the branch, and track causes the new branch to be
- * configured to merge the remote branch that start_name is a tracking
- * branch for (if any).
+ * Creates a new branch, where name is the new branch name, start_name
+ * is the name of the existing branch that the new branch should start
+ * from, force enables overwriting an existing (non-head) branch, reflog
+ * creates a reflog for the branch, and track causes the new branch to
+ * be configured to merge the remote branch that start_name is a
+ * tracking branch for (if any).
  */
-void create_branch(const char *head, const char *name, const char *start_name,
+void create_branch(const char *name, const char *start_name,
   int force, int reflog,
   int clobber_head, int quiet, enum branch_track track);
 
diff --git a/builtin/branch.c b/builtin/branch.c
index d5d93a8c03fe..60cc5c8e8da0 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -807,7 +807,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
 * create_branch takes care of setting up the tracking
 * info and making sure new_upstream is correct
 */
-   create_branch(head, branch->name, new_upstream, 0, 0, 0, quiet, 
BRANCH_TRACK_OVERRIDE);
+   create_branch(branch->name, new_upstream, 0, 0, 0, quiet, 
BRANCH_TRACK_OVERRIDE);
} else if (unset_upstream) {
struct branch *branch = branch_get(argv[0]);
struct strbuf buf = STRBUF_INIT;
@@ -853,7 +853,7 @@ int cmd_branch(int argc, const char **argv, const char 
*prefix)
strbuf_release();
 
branch_existed = ref_exists(branch->refname);
-   create_branch(head, argv[0], (argc == 2) ? argv[1] : head,
+   create_branch(argv[0], (argc == 2) ? argv[1] : head,
  force, reflog, 0, quiet, track);
 
/*
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 9b2a5b31d423..512492aad909 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -630,7 +630,7 @@ static void update_refs_for_switch(const struct 
checkout_opts *opts,
}
}
else
-   create_branch(old->name, opts->new_branch, new->name,
+   create_branch(opts->new_branch, new->name,
  opts->new_branch_force ? 1 : 0,
  opts->new_branch_log,
  opts->new_branch_force ? 1 : 0,
-- 
2.9.0




git -C has unexpected behaviour

2016-11-04 Thread Felix Nairz
Hi guys,

I ran into some really weird git behaviour today.

My git --version is: git version 2.8.1.windows.1

We have a git repository with a submodule called TestData. The data in
there is modified and reset as part of our unit tests.

The submodule is a sub-folder of the git repository called TestData.
So the relative path from the git repository to the submodule is
.\TestData

If I delete the entire TestData folder and run
git -C .\TestData reset --hard

I will get the following error:
git : fatal: Cannot change to '.\TestData': No such file or directory
This is as expected.


Now, to the unexpected part, which I think is a bug:

If the TestData folder is there, but empty (I deleted all the files),
then running

git -C .\TestData reset --hard

will NOT throw me an error but run

git reset --hard

on the git repository (not the submodule in the sub-directory!),
without warning, or error. This is easy to reproduce by having an
empty .\TestData folder, and just changing any file in your git
repository before running

git -C .\TestData reset --hard

and seeing the local file changes gone.

Because of this we have had losses of uncommitted changes a few times
now (loosing a few days of work, and getting a bit paranoid), but
could never find the root cause for this until today, where I found
out that it happens when the TestData directory is empty.

Thank for looking into this, and I am looking forward to hear your
opinions about this.

Best Regards, Felix Nairz


[no subject]

2016-11-04 Thread 2107105957
Bug.private.program.



[PATCH] diffcore-delta: remove unused parameter to diffcore_count_changes()

2016-11-04 Thread Tobias Klauser
The delta_limit parameter to diffcore_count_changes() has been unused
since commit c06c79667c95 ("diffcore-rename: somewhat optimized.").
Remove the parameter and adjust all callers.

Signed-off-by: Tobias Klauser 
---
 diff.c| 2 +-
 diffcore-break.c  | 1 -
 diffcore-delta.c  | 1 -
 diffcore-rename.c | 4 
 diffcore.h| 1 -
 5 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/diff.c b/diff.c
index 8981477c436d..ec8728362dae 100644
--- a/diff.c
+++ b/diff.c
@@ -2023,7 +2023,7 @@ static void show_dirstat(struct diff_options *options)
if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) {
diff_populate_filespec(p->one, 0);
diff_populate_filespec(p->two, 0);
-   diffcore_count_changes(p->one, p->two, NULL, NULL, 0,
+   diffcore_count_changes(p->one, p->two, NULL, NULL,
   , );
diff_free_filespec_data(p->one);
diff_free_filespec_data(p->two);
diff --git a/diffcore-break.c b/diffcore-break.c
index 881a74f29e4f..c64359f489c8 100644
--- a/diffcore-break.c
+++ b/diffcore-break.c
@@ -73,7 +73,6 @@ static int should_break(struct diff_filespec *src,
 
if (diffcore_count_changes(src, dst,
   >cnt_data, >cnt_data,
-  0,
   _copied, _added))
return 0;
 
diff --git a/diffcore-delta.c b/diffcore-delta.c
index 2ebedb32d18a..ebe70fb06851 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -166,7 +166,6 @@ int diffcore_count_changes(struct diff_filespec *src,
   struct diff_filespec *dst,
   void **src_count_p,
   void **dst_count_p,
-  unsigned long delta_limit,
   unsigned long *src_copied,
   unsigned long *literal_added)
 {
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 54a2396653df..f7444c86bde3 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -145,7 +145,6 @@ static int estimate_similarity(struct diff_filespec *src,
 * call into this function in that case.
 */
unsigned long max_size, delta_size, base_size, src_copied, 
literal_added;
-   unsigned long delta_limit;
int score;
 
/* We deal only with regular files.  Symlink renames are handled
@@ -191,11 +190,8 @@ static int estimate_similarity(struct diff_filespec *src,
if (!dst->cnt_data && diff_populate_filespec(dst, 0))
return 0;
 
-   delta_limit = (unsigned long)
-   (base_size * (MAX_SCORE-minimum_score) / MAX_SCORE);
if (diffcore_count_changes(src, dst,
   >cnt_data, >cnt_data,
-  delta_limit,
   _copied, _added))
return 0;
 
diff --git a/diffcore.h b/diffcore.h
index c11b8465fc8e..623024135478 100644
--- a/diffcore.h
+++ b/diffcore.h
@@ -142,7 +142,6 @@ extern int diffcore_count_changes(struct diff_filespec *src,
  struct diff_filespec *dst,
  void **src_count_p,
  void **dst_count_p,
- unsigned long delta_limit,
  unsigned long *src_copied,
  unsigned long *literal_added);
 
-- 
2.9.0