Re: [PATCH 05/22] sequencer: allow the sequencer to take custody of malloc()ed data

2016-08-29 Thread Johannes Sixt

On 29.08.2016 at 23:59, Jakub Narębski wrote:

On 29.08.2016 at 10:04, Johannes Schindelin writes:

-#define REPLAY_OPTS_INIT { -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, NULL, NULL, 0, 0, NULL }
+#define REPLAY_OPTS_INIT { -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, NULL, NULL, 0, 0, NULL, NULL, 0, 0 }


Nb. it is a pity that we cannot use named initializers for structs,
the so-called designated inits.  They would make this macro more readable.


It is actually pointless to add the 0's and NULL's here. This should be
sufficient:


#define REPLAY_OPTS_INIT { -1, -1 }

because initialization with 0 (or NULL) is the default for any omitted 
members.
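
To spell out both points with a toy example (a sketch, not code from
the patch): C zero-initializes any aggregate members the initializer
omits, and designated initializers, were they allowed, would make the
remaining values self-documenting:

    struct toy_opts { int command, subcommand; int edit; void *revs; };

    /* positional: trailing members default to 0/NULL */
    #define TOY_OPTS_INIT { -1, -1 }

    /* designated (C99), if it were available here */
    #define TOY_OPTS_INIT_C99 { .command = -1, .subcommand = -1 }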


-- Hannes



Re: Subtree Split Includes Commits Outside Prefix Directory

2016-08-29 Thread ELI
Hey David,

Did this give you the repro case that you needed?
- Harpreet "Eli" Sangha


On Thu, Jun 23, 2016 at 7:32 PM, ELI  wrote:
> Attempting to resend without HTML...
>
> - Harpreet "Eli" Sangha
>
>
> On Thu, Jun 23, 2016 at 7:18 PM, ELI  wrote:
>> Sorry for the delayed response... your email somehow found its way into my
>> Gmail spam folder.
>>
>> I've created a simple reproduction case and hosted the test repositories on
>> BitBucket for sharing:
>> https://bitbucket.org/eliptus/subtree-test-sup
>> https://bitbucket.org/eliptus/subtree-test-suba
>> https://bitbucket.org/eliptus/subtree-test-subb
>> https://bitbucket.org/eliptus/subtree-test-subc
>>
>> Quick Repro Step:
>> git -C Sup subtree push --prefix=c SubC working
>>
>> You can reproduce the behavior I described previously with the test
>> repositories by simply attempting a subtree push from Sup to SubC.  The
>> result is that SubC contains commit history for changes made exclusively to
>> SubA and SubB.  Below are details of how I got to this state.
>>
>> Test Setup:
>>
>> Created four new git repositories with initial commits: Sup, SubA, SubB,
>> SubC
>>
>> The branch "working" in repository SubC still reflects the state after this
>> step.
>>
>> Performed "subtree add" for SubA, SubB, and SubC into Sup with prefixes a,
>> b, and c, respectively.
>> Added additional commits directly in repositories SubA and SubB.
>>
>> The branch "master" in repositories SubA and SubB still reflect the state
>> after this step.
>>
>> Performed "subtree pull" from SubA and SubB to Sup.
>> Added a commit in repository Sup that modifies prefix "c".
>>
>> The branch "master" in repository Sup still reflects the state after this
>> step.
>>
>> Performed "subtree push" to SubC from Sup.
>>
>> The branch "master" in repository SubC still reflects the state after this
>> step.
>> A "git log --patch master" in repository SubC shows commit's made in SubA
>> and SubB.
>>
>>
>> Regards,
>>
>> On Sat, May 21, 2016 at 4:06 PM David A. Greene 
>> wrote:
>>>
>>> ELI  writes:
>>>
>>> > I then reviewed the commit history of contrib/subtree/git-subtree.sh
>>> > and determined that the last successful subtree push was performed
>>> > prior to the integration of this change:
>>> >
>>> > https://git.kernel.org/cgit/git/git.git/commit/contrib/subtree/git-subtree.sh?id=933cfeb90b5d03b4096db6d60494a6eedea25d03
>>> >
>>> > As a next step, I reversed that patch on my local install of git
>>> > subtree, and the result was a successful subtree push.
>>>
>>> So you're saying that this patch caused a regression?
>>>
>>> > Unfortunately, I have not yet reproduced this with a test main project
>>> > and subprojects, and I cannot make the project I observed it in
>>> > public.
>>>
>>> I very much want to see a testcase for this.  I'm planning to
>>> fundamentally rewrite the split code this year and want to make sure it
>>> covers everything it does now and fixes a few bugs that have been
>>> exposed lately.
>>>
>>> It's tough to revert that patch since it fixed a problem for someone and
>>> we don't have a testcase demonstrating the problem you encountered.  Not
>>> saying your problem isn't important but we need to understand it and
>>> have a way to flag it before fixing or hiding it with a revert of the
>>> above patch.
>>>
>>>-David
>>
>> --
>> - Harpreet "Eli" Sangha


Re: [PATCH 4/6] require_clean_work_tree: ensure that the index was read

2016-08-29 Thread Junio C Hamano
Junio C Hamano  writes:

> I am not sure if it should be left as the responsibility of the
> caller (i.e. check the_index.initialized to bark at a caller that
> forgets to read from an index) ...

Scratch that.  That would not work in a freshly created repository
before doing any "git add".  An empty index is a normal state, so
warning "You called me without reading the index" would not just be
annoying, it would be simply wrong.

It was OK to have "we ensure that we've read into the_index from
some index file" as a fallback in a helper function designed to be
used in a "run once and exit" program, but I'd say as a part of
"libified" helper set, we should just make it a responsibility of
the caller to make sure whatever index entries the caller wants to
have in the_index to be used when calling this helper function.

During the course of developing whatever you are building that calls
this function, perhaps you were bitten by an unpopulated the_index,
_not_ because you genuinely could _not_ find the logical place that
is the best location to read the index file at in your code flow,
but simply because you forgot to read one and with that hindsight,
you _know_ what is the logical right place the index file should
have been read.  That is what I am guessing anyway.

And I further guess that this is a well-meaning attempt to _help_
others not having to worry about _when_ exactly the index file is
read.

But I do not think it is being helpful in the longer term.  When to
read the index file and when to discard the contents matters (and
for callers to which it does not matter, they can always read it at
the very beginning of the program, because by definition it does not
matter).  So let's not do this patch, unless there is some other
good reason to have it.




Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
This is great! Thanks Jake. If you happen to have the patch ID it
would be helpful.

Uma

On Mon, Aug 29, 2016 at 5:02 PM, Jacob Keller  wrote:
> On Mon, Aug 29, 2016 at 4:15 PM, Junio C Hamano  wrote:
>> Uma Srinivasan  writes:
>>> This fixes my issue but what do you think? Is this the right way to
>>> fix it? Is there a better way?
>>
>> I think we already have a helper function that does a lot better
>> than "does it have a file called HEAD" to ask "is this a git
>> directory?" and its name I think is "is_git_directory" (I know, we
>> are not imaginative when naming our functions).
>>
>> As to whether the check makes sense in the context of this function,
>> I am not expert enough to judge.  I'd expect Jens, Heiko and/or Stefan to
>> know better than I do.
>
> One of my patches adds an "is_git_directory()" call to this, and if that
> fails, falls back to checking the .gitmodules and git-config for
> information regarding the submodule should it no longer be checked
> out. I suspect this patch will address your concern.
>
> Thanks,
> Jake


Re: git submodules implementation question

2016-08-29 Thread Jacob Keller
On Mon, Aug 29, 2016 at 4:15 PM, Junio C Hamano  wrote:
> Uma Srinivasan  writes:
>> This fixes my issue but what do you think? Is this the right way to
>> fix it? Is there a better way?
>
> I think we already have a helper function that does a lot better
> than "does it have a file called HEAD" to ask "is this a git
> directory?" and its name I think is "is_git_directory" (I know, we
> are not imaginative when naming our functions).
>
> As to whether the check makes sense in the context of this function,
> I am not expert enough to judge.  I'd expect Jens, Heiko and/or Stefan to
> know better than I do.

One of my patches adds an "is_git_directory()" call to this, and if that
fails, falls back to checking the .gitmodules and git-config for
information regarding the submodule should it no longer be checked
out. I suspect this patch will address your concern.

Thanks,
Jake


Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
Yes, is_git_directory() is much better. Thanks for the pointer.

I will submit a patch unless I hear more suggestions from others.

Uma



On Mon, Aug 29, 2016 at 4:15 PM, Junio C Hamano  wrote:
> Uma Srinivasan  writes:
>
>> On Mon, Aug 29, 2016 at 2:13 PM, Uma Srinivasan  
>> wrote:
>>> Ok that makes sense. Thanks much.
>>>
>>> Uma
>>>
>> With respect to my original problem with a corrupted .git directory
>> under the submodule directory, I am thinking of adding the following 4
>> lines marked with ### to is_submodule_modified() to detect the
>> corrupted dir and die quickly instead of forking several child
>> processes:
>>
>>    strbuf_addf(&buf, "%s/.git", path);
>>    git_dir = read_gitfile(buf.buf);
>>    if (!git_dir) {
>>  ###     strbuf_addf(&head_ref, "%s/HEAD", buf.buf);
>>  ###     if (strbuf_read_file(&head_ref, head_ref.buf, 0) < 0) {
>>  ###             die("Corrupted .git dir in submodule %s", path);
>>  ###     }
>>            git_dir = buf.buf;
>>    }
>>
>> This fixes my issue but what do you think? Is this the right way to
>> fix it? Is there a better way?
>
> I think we already have a helper function that does a lot better
> than "does it have a file called HEAD" to ask "is this a git
> directory?" and its name I think is "is_git_directory" (I know, we
> are not imaginative when naming our functions).
>
> As to whether the check makes sense in the context of this function,
> I am not expert enough to judge.  I'd expect Jens, Heiko and/or Stefan to
> know better than I do.


Re: git submodules implementation question

2016-08-29 Thread Junio C Hamano
Uma Srinivasan  writes:

> On Mon, Aug 29, 2016 at 2:13 PM, Uma Srinivasan  
> wrote:
>> Ok that makes sense. Thanks much.
>>
>> Uma
>>
> With respect to my original problem with a corrupted .git directory
> under the submodule directory, I am thinking of adding the following 4
> lines marked with ### to is_submodule_modified() to detect the
> corrupted dir and die quickly instead of forking several child
> processes:
>
>    strbuf_addf(&buf, "%s/.git", path);
>    git_dir = read_gitfile(buf.buf);
>    if (!git_dir) {
>  ###     strbuf_addf(&head_ref, "%s/HEAD", buf.buf);
>  ###     if (strbuf_read_file(&head_ref, head_ref.buf, 0) < 0) {
>  ###             die("Corrupted .git dir in submodule %s", path);
>  ###     }
>            git_dir = buf.buf;
>    }
>
> This fixes my issue but what do you think? Is this the right way to
> fix it? Is there a better way?

I think we already have a helper function that does a lot better
than "does it have a file called HEAD" to ask "is this a git
directory?" and its name I think is "is_git_directory" (I know, we
are not imaginative when naming our functions).

As to whether the check makes sense in the context of this function,
I am not expert enough to judge.  I'd expect Jens, Heiko and/or Stefan to
know better than I do.
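
To make that concrete, the proposed check could use the helper roughly
like this (a sketch of the suggestion, not a submitted patch):

    strbuf_addf(&buf, "%s/.git", path);
    git_dir = read_gitfile(buf.buf);
    if (!git_dir) {
            /* not a gitfile; is the directory itself a valid git dir? */
            if (!is_git_directory(buf.buf))
                    die("Corrupted .git dir in submodule %s", path);
            git_dir = buf.buf;
    }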


Re: [PATCH 5/6] Export also the has_un{staged,committed}_changed() functions

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> diff --git a/wt-status.h b/wt-status.h
> index cc4e5a3..75833c1 100644
> --- a/wt-status.h
> +++ b/wt-status.h
> @@ -115,6 +115,8 @@ void status_printf_ln(struct wt_status *s, const char 
> *color, const char *fmt, .
>  __attribute__((format (printf, 3, 4)))
>  void status_printf(struct wt_status *s, const char *color, const char *fmt, 
> ...);
>  
> +int has_unstaged_changes(void);
> +int has_uncommitted_changes(void);
>  int require_clean_work_tree(const char *action, const char *hint, int 
> gently);
>  
>  #endif /* STATUS_H */

These are good interfaces, if the caller wants to know these bits
independently.


Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
With respect to my original problem with a corrupted .git directory
under the submodule directory, I am thinking of adding the following 4
lines marked with ### to is_submodule_modified() to detect the
corrupted dir and die quickly instead of forking several child
processes:

   strbuf_addf(&buf, "%s/.git", path);
   git_dir = read_gitfile(buf.buf);
   if (!git_dir) {
 ###     strbuf_addf(&head_ref, "%s/HEAD", buf.buf);
 ###     if (strbuf_read_file(&head_ref, head_ref.buf, 0) < 0) {
 ###             die("Corrupted .git dir in submodule %s", path);
 ###     }
           git_dir = buf.buf;
   }

This fixes my issue but what do you think? Is this the right way to
fix it? Is there a better way?

Thanks,
Uma

On Mon, Aug 29, 2016 at 2:13 PM, Uma Srinivasan  wrote:
> Ok that makes sense. Thanks much.
>
> Uma
>
> On Mon, Aug 29, 2016 at 2:09 PM, Junio C Hamano  wrote:
>> On Mon, Aug 29, 2016 at 2:03 PM, Uma Srinivasan  
>> wrote:
>>> On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano  wrote:

 A top-level superproject can have a submodule bound at its "dir/"
 directory, and "dir/.git" can either be a gitfile which you can read
 with read_gitfile() and point into somewhere in ".git/modules/" of
 the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
 Git directory.  So at the top of a superproject, you could do

 git clone $URL ./dir2
 git add dir2

 to clone an independent project into dir2 directory, and add it as a
 new submodule.  The fallback is to support such a layout.

>>> Thanks for the reply. However, in this case
>>>
>>>   git clone $URL ./dir2
>>>   git add dir2
>>>
>>> how will "dir2" ever get registered as a submodule?
>>
>> With a separate invocation of "git config -f .gitmodules", of course.
>> The layout to use gitfile to point into .git/modules/ is a more recent
>> invention than the submodule support itself that "git add" knows about.
>> The code needs to support both layouts as well as it can, and that
>> is what the "can we read it as gitfile?  If not, that directory itself may
>> be a git repository" codepath you asked about is doing.


Re: [PATCH 4/6] require_clean_work_tree: ensure that the index was read

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> The function would otherwise pretend to work fine, but totally ignore
> the working directory.

s/^T/Unless the caller has already read the index, t/;

I am not sure if it should be left as the responsibility of the
caller (i.e. check the_index.initialized to bark at a caller that
forgets to read from an index) or it is OK to unconditionally read
from $GIT_INDEX_FILE in a truly libified function.  A caller that
wants to read from a custom (temporary) index file can choose to
make sure it does read_index_from() from its custom file before
calling this function, but if it forgets to do so, the penalty is
more severe than for ordinary callers who would have read from the normal
index file anyway.

Even though the word-lego issue would make this particular API not
yet suitable, "who is responsible for reading the index" is an
orthogonal issue that we can discuss even on this version, so I
thought I'd bring it up.

>
> Signed-off-by: Johannes Schindelin 
> ---
>  wt-status.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/wt-status.c b/wt-status.c
> index 792dda9..33dd76f 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1804,6 +1804,13 @@ int require_clean_work_tree(const char *action, const 
> char *hint, int gently)
>   struct lock_file *lock_file = xcalloc(1, sizeof(*lock_file));
>   int err = 0;
>  
> + if (read_cache() < 0) {
> + error(_("Could not read index"));
> + if (gently)
> + return -1;
> + exit(1);
> + }
> +
>   hold_locked_index(lock_file, 0);
>   refresh_cache(REFRESH_QUIET);
>   update_index_if_able(&the_index, lock_file);


Re: [PATCH 2/6] pull: make code more similar to the shell script again

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> +static int require_clean_work_tree(const char *action, const char *hint,
> + int gently)
>  {
>   struct lock_file *lock_file = xcalloc(1, sizeof(*lock_file));
> - int do_die = 0;
> + int err = 0;
>  
>   hold_locked_index(lock_file, 0);
>   refresh_cache(REFRESH_QUIET);
> @@ -376,20 +377,26 @@ static void die_on_unclean_work_tree(void)
>   rollback_lock_file(lock_file);
>  
>   if (has_unstaged_changes()) {
> - error(_("Cannot pull with rebase: You have unstaged changes."));
> - do_die = 1;
> + error(_("Cannot %s: You have unstaged changes."), action);
> ...
>   if (!autostash)
> - die_on_unclean_work_tree();
> + require_clean_work_tree("pull with rebase",
> + "Please commit or stash them.", 0);
>  
>   if (get_rebase_fork_point(rebase_fork_point, repo, *refspecs))
>   hashclr(rebase_fork_point);

Splicing an English/C phrase 'pull with rebase' into a
_("localizable %s string") makes the life of i18n team hard.

Can we do this differently?

If you are eventually going to expose this function as public API, I
think the right approach would be to enumerate the possible error
conditions this function can diagnose and return them to the caller,
i.e.

#define WT_STATUS_DIRTY_WORKTREE 01
#define WT_STATUS_DIRTY_INDEX    02

static int require_clean_work_tree(void)
{
int status = 0;
...
if (has_unstaged_changes())
status |= WT_STATUS_DIRTY_WORKTREE;
if (has_uncommitted_changes())
status |= WT_STATUS_DIRTY_INDEX;
return status;
}

Then die_on_unclean_work_tree() can be made as a thin-wrapper that
calls it and shows the pull-specific error message.
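
For illustration, such a wrapper might look roughly like this (a
sketch building on the enumeration above, not code from the series):

    static void die_on_unclean_work_tree(void)
    {
            int status = require_clean_work_tree();

            if (status & WT_STATUS_DIRTY_WORKTREE)
                    error(_("Cannot pull with rebase: You have unstaged changes."));
            if (status & WT_STATUS_DIRTY_INDEX)
                    error(_("Your index contains uncommitted changes."));
            if (status)
                    die(_("Please commit or stash them."));
    }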


Re: [PATCH 1/6] pull: drop confusing prefix parameter of die_on_unclean_work_tree()

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> -static int has_unstaged_changes(const char *prefix)
> +static int has_unstaged_changes(void)
>  {
>   struct rev_info rev_info;
>   int result;
>  
> - init_revisions(&rev_info, prefix);
> + init_revisions(&rev_info, NULL);
>   DIFF_OPT_SET(&rev_info.diffopt, IGNORE_SUBMODULES);
>   DIFF_OPT_SET(&rev_info.diffopt, QUICK);
>   diff_setup_done(&rev_info.diffopt);
> @@ -344,7 +344,7 @@ static int has_unstaged_changes(const char *prefix)
>  /**
>   * Returns 1 if there are uncommitted changes, 0 otherwise.
>   */
> -static int has_uncommitted_changes(const char *prefix)
> +static int has_uncommitted_changes(void)
>  {
>   struct rev_info rev_info;
>   int result;
> @@ -352,7 +352,7 @@ static int has_uncommitted_changes(const char *prefix)
>   if (is_cache_unborn())
>   return 0;
>  
> - init_revisions(&rev_info, prefix);
> + init_revisions(&rev_info, NULL);

I agree these are good changes.  The prefix being ignored is a mere
accident and a time-bomb waiting to go off.  As you mentioned in the
log message, we do not even want to pass a pathspec to limit the "is
there anything modified" check, which may be affected with prefix,
but we want a full-tree diff no matter where we started in these two
functions.

Good.


Re: Reducing CPU load on git server

2016-08-29 Thread W. David Jarvis
> Pegging CPU for a few seconds doesn't sound out-of-place for
> pack-objects serving a fetch or clone on a large repository. And I can
> certainly believe "minutes", especially if it was not serving a fetch,
> but doing repository maintenance on a large repository.
>
> Talk to GitHub Enterprise support folks about what kind of process
> monitoring and accounting is available. Recent versions of GHE can
> easily tell things like which repositories and processes are using the
> most CPU, RAM, I/O, and network, which ones are seeing a lot of
> parallelism, etc.

We have an open support thread with the GHE support folks, but the
only feedback we've gotten so far on this subject is that they believe
our CPU load is being driven by the quantity of fetch operations (an
average of 16 fetch requests per second during normal business hours,
so 4 requests per second per subscriber box). About 4,000 fetch
requests on our main repository per day.

> None of those operations is triggered by client fetches. You'll see
> "rev-list" for a variety of operations, so that's hard to pinpoint. But
> I'm surprised that "prune" is a common one for you. It is run as part of
> the post-push, but I happen to know that the version that ships on GHE
> is optimized to use bitmaps, and to avoid doing any work when there are
> no loose objects that need pruning in the first place.

Would regular force-pushing trigger prune operations? Our engineering
body loves to force-push.

>> I might be misunderstanding this, but if the subscriber is already "up
>> to date" modulo a single updated ref tip, then this problem shouldn't
>> occur, right? Concretely: if ref A is built off of ref B, and the
>> subscriber already has B when it requests A, that shouldn't cause this
>> behavior, but it would cause this behavior if the subscriber didn't
>> have B when it requested A.
>
> Correct. So this shouldn't be a thing you are running into now, but it's
> something that might be made worse if you switch to fetching only single
> refs.

But in theory if we were always up-to-date (since we'd always fetch
any updated ref) we wouldn't run into this problem? We could have a
cron job to ensure that we run a full git fetch every once in a while
but I'd hope that if this was written properly we'd almost always have
the most recent commit for any dependency ref.

> That really sounds like repository maintenance. Repacks of
> torvalds/linux (including all of its forks) on github.com take ~15
> minutes of CPU. There may be some optimization opportunities there (I
> have a few things I'd like to explore in the next few months), but most
> of it is pretty fundamental. It literally takes a few minutes just to
> walk the entire object graph for that repo (that's one of the more
> extreme cases, of course, but presumably you are hosting some large
> repositories).
>
> Maintenance like that should be a very occasional operation, but it's
> possible that you have a very busy repo.

Our primary repository is fairly busy. It has about 1/3 the commits of
Linux and about 1/3 the refs, but seems otherwise to be on the same
scale. And, of course, it both hasn't been around for as long as Linux
has and has been experiencing exponential growth, which means its
current activity is higher than it has ever been before -- might put
it on a similar scale to Linux's current activity.

> OK, I double-checked, and your version should be coalescing identical
> fetches.
>
> Given that, and that a lot of the load you mentioned above is coming
> from non-fetch sources, it sounds like switching anything with your
> replica fetch strategy isn't likely to help much. And a multi-tiered
> architecture won't help if the load is being generated by requests that
> are serving the web-views directly on the box.
>
> I'd really encourage you to talk with GitHub Support about performance
> and clustering. It sounds like there may be some GitHub-specific things
> to tweak. And it may be that the load is just too much for a single
> machine, and would benefit from spreading the load across multiple git
> servers.

What surprised us is that we had been running this on an r3.4xlarge
(16vCPU) on AWS for two years without too much issue. Then in a span
of months we started experiencing massive CPU load, which forced us to
upgrade the box to one with 32 vCPU (and better CPUs). We just don't
understand what the precise driver of load is here.

As noted above, we are talking with GitHub about performance -- we've
also been pushing them to start working on a clustering plan, but my
impression has been that they're reluctant to go down that path. I
suspect that we use GHE much more aggressively than the typical GHE
client, but I could be wrong about that.

 - V

-- 

venanti.us
203.918.2328



Re: [PATCH 01/20] cache: convert struct cache_entry to use struct object_id

2016-08-29 Thread brian m. carlson
On Mon, Aug 29, 2016 at 05:43:41PM +0200, Johannes Schindelin wrote:
> Hi Kuba,
> 
> On Mon, 29 Aug 2016, Jakub Narębski wrote:
> 
> > I wonder if writing this patch series (or rather the following one)
> > would be helped by using one of semantic patch tools, such as
> > Coccinelle[1], spdiff[2], or Undebt[3]...
> > 
> > [1]: http://coccinelle.lip6.fr/
> 
> If previous work by Brian is any indication, he did use Coccinelle and the
> commit message actually shows the definition used for the transformation.

Yes, that is the case.  I used Coccinelle because it's been used with
success on LKML and it seems to work for the job.  I'll make a note in
the future that the semantic patch obviously doesn't include the
actual struct change.

My goal with using a tool is that it's less error-prone and it helps
reviewers have more confidence in the changes.  Also, it makes large
changes far less tedious.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204




Re: [PATCH v6 12/13] convert: add filter.<driver>.process option

2016-08-29 Thread Junio C Hamano
larsxschnei...@gmail.com writes:

> +In case the filter cannot or does not want to process the content,
> +it is expected to respond with an "error" status. Depending on the
> +`filter.<driver>.required` flag Git will interpret that as error
> +but it will not stop or restart the filter process.
> +
> +packet:  git< status=error\n
> +packet:  git< 
> +
> +
> +In case the filter cannot or does not want to process the content
> +as well as any future content for the lifetime of the Git process,
> +it is expected to respond with an "error-all" status. Depending on
> +the `filter.<driver>.required` flag Git will interpret that as error
> +but it will not stop or restart the filter process.
> +
> +packet:  git< status=error-all\n
> +packet:  git< 
> +

This part of the document is well-written to help filter-writers.

One thing that was unclear from the above to me, when read as a
potential filter-writer, is when I am supposed to exit(2).  After I
tell Git with error-all (I would have called it "abort", but that's
OK) that I desire no further communication, am I free to go?  Or do
I wait until Git somehow disconnects (perhaps by closing the packet
stream I have been reading)?

> +If the filter dies during the communication or does not adhere to
> +the protocol then Git will stop the filter process and restart it
> +with the next file that needs to be processed.

Hmph, is there a reason not to retry a half-converted-and-failed
blob with the fresh process?  Note that this is not "you must do it
that way", and it is not even "I think doing so may be a better
idea".  I merely want to know the reason behind this decision.

> +After the filter has processed a blob it is expected to wait for
> +the next "key=value" list containing a command. When the Git process
> +terminates, it will send a kill signal to the filter in that stage.

The "kill" may not be very nice.  As Git side _knows_ that the
filter is waiting for the next command, having an explicit
"shutdown" command would give the filter a chance to implement a
clean exit--it may have some housekeeping tasks it wants to perform
once it is done.  The "explicit shutdown" could just be "the pipe
gets closed", so from the implementation point of view there may not
be anything you need to further add to this patch (after all, when
we exit, the pipes to them would be closed), but the shutdown
protocol and the expectation on the behaviour of filter processes
would need to be documented.
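
To illustrate the "pipe gets closed" variant, a filter could structure
its main loop like this (illustrative only; plain line reads stand in
for the real pkt-line protocol):

    #include <stdio.h>

    int main(void)
    {
            char line[1024];

            /* read "key=value" commands until Git closes the pipe */
            while (fgets(line, sizeof(line), stdin)) {
                    /* ... parse the command and process the blob ... */
            }

            /* EOF is the implicit shutdown: housekeeping, then clean exit */
            fprintf(stderr, "filter: shutting down cleanly\n");
            return 0;
    }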

> +If a `filter.<driver>.clean` or `filter.<driver>.smudge` command
> +is configured then these commands always take precedence over
> +a configured `filter.<driver>.process` command.

It may make more sense to give precedence to the .process (which is
a late-comer) if defined, ignoring .clean and .smudge, than the
other way around.

> +Please note that you cannot use an existing `filter.<driver>.clean`
> +or `filter.<driver>.smudge` command with `filter.<driver>.process`
> +because the former two use a different inter-process communication
> +protocol than the latter one.

Would it be a useful sample program we can ship in contrib/ if you
created a "filter adapter" that reads these two configuration
variables and acts as a filter.<driver>.process?

During an imaginary session of "git add .", I think I found where
you start THE filter process upon the first path that needs to be
filtered with one for the configured <driver>, and I think the same
place is where you reuse THE filter process, but I am not sure where
you are cleaning up by killing the filter once all paths are added.
Wouldn't you need some hooks at strategic places after such bulk
operation to tell the multi-file-filter machinery to walk all the
entries in cmd_process_map and tell the remaining filter processes
that they have no more tasks, or something?  Are you relying on
these processes to exit upon a read failure after we exit and the
pipe going to the filter is severed?

Thanks.


Re: [PATCH 18/20] builtin/am: convert to struct object_id

2016-08-29 Thread brian m. carlson
On Mon, Aug 29, 2016 at 03:02:56PM +0800, Paul Tan wrote:
> Hi Brian,
> 
> On Mon, Aug 29, 2016 at 7:27 AM, brian m. carlson
>  wrote:
> > Signed-off-by: brian m. carlson 
> > ---
> >  builtin/am.c | 138 
> > +--
> >  1 file changed, 69 insertions(+), 69 deletions(-)
> 
> I looked through this patch, and the conversion looks faithful and
> straightforward to me. Just two minor comments:
> 
> > diff --git a/builtin/am.c b/builtin/am.c
> > index 739b34dc..632d4288 100644
> > --- a/builtin/am.c
> > +++ b/builtin/am.c
> > @@ -1053,10 +1053,10 @@ static void am_setup(struct am_state *state, enum 
> > patch_format patch_format,
> > else
> > write_state_text(state, "applying", "");
> >
> > -   if (!get_sha1("HEAD", curr_head)) {
> > -   write_state_text(state, "abort-safety", sha1_to_hex(curr_head));
> > +   if (!get_oid("HEAD", &curr_head)) {
> > +   write_state_text(state, "abort-safety", oid_to_hex(&curr_head));
> > if (!state->rebasing)
> > -   update_ref("am", "ORIG_HEAD", curr_head, NULL, 0,
> > +   update_ref("am", "ORIG_HEAD", curr_head.hash, NULL, 0,
> > UPDATE_REFS_DIE_ON_ERR);
> 
> I noticed that you used update_ref_oid() in other places of this
> patch. Perhaps this should use update_ref_oid() as well for
> consistency?

I'll do that in a reroll.

> > @@ -1665,9 +1665,8 @@ static int fall_back_threeway(const struct am_state 
> > *state, const char *index_pa
> >   */
> >  static void do_commit(const struct am_state *state)
> >  {
> > -   unsigned char tree[GIT_SHA1_RAWSZ], parent[GIT_SHA1_RAWSZ],
> > - commit[GIT_SHA1_RAWSZ];
> > -   unsigned char *ptr;
> > +   struct object_id tree, parent, commit;
> > +   struct object_id *ptr;
> 
> Ah, I just noticed that this is a very poorly named variable. Whoops.
> Since we are here, should we rename this to something like "old_oid"?
> Also, this should probably be a "const struct object_id *" as well, I
> think.

I'll fix that.  Thanks for the review.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204




Re: [PATCH 05/22] sequencer: allow the sequencer to take custody of malloc()ed data

2016-08-29 Thread Jakub Narębski
On 29.08.2016 at 10:04, Johannes Schindelin writes:

> The sequencer is our attempt to lib-ify cherry-pick. Yet it behaves
> like a one-shot command when it reads its configuration: memory is
> allocated and released only when the command exits.
> 
> This is kind of okay for git-cherry-pick, which *is* a one-shot
> command. All the work to make the sequencer its work horse was
> done to allow using the functionality as a library function, though,
> including proper clean-up after use.
> 
> This patch introduces an API to pass the responsibility of releasing
> certain memory to the sequencer.

So how is this API meant to be used?  From the following patches
(which I shouldn't have to read to understand this one) it looks
like it is about strdup'ed strings from option parsing.
Or would there be something more in the future?

Would sequencer as a library function be called multiple times,
or only once?


I'm trying to find out how this is solved in other places of Git
code, and I have stumbled upon free_util in string_list...
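
From the later patches, the intended use appears to be along these
lines (hypothetical call site; the field name is made up for
illustration):

    /*
     * Hand a strdup()ed option value to the sequencer, which will
     * free() it in remove_sequencer_state().
     */
    opts->some_option = sequencer_entrust(opts, xstrdup(arg));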

> 
> Signed-off-by: Johannes Schindelin 
> ---
>  sequencer.c | 13 +
>  sequencer.h |  8 +++-
>  2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index c4b223b..b5be0f9 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -114,9 +114,22 @@ static int has_conforming_footer(struct strbuf *sb, 
> struct strbuf *sob,
>   return 1;
>  }
>  
> +void *sequencer_entrust(struct replay_opts *opts, void *set_me_free_after_use)
> +{
> + ALLOC_GROW(opts->owned, opts->owned_nr + 1, opts->owned_alloc);
> + opts->owned[opts->owned_nr++] = set_me_free_after_use;
> +
> + return set_me_free_after_use;

I was wondering what this 'set_me_free_after_use' parameter is about;
wouldn't it be more readable if this parameter was called 'owned_data'
or 'owned_ptr'?

> +}
> +
>  static void remove_sequencer_state(const struct replay_opts *opts)
>  {
>   struct strbuf dir = STRBUF_INIT;
> + int i;
> +
> + for (i = 0; i < opts->owned_nr; i++)
> + free(opts->owned[i]);

I guess you can remove owned data in any order, regardless of whether
you stored a struct or its members first...

> + free(opts->owned);
>  
>   strbuf_addf(&dir, "%s", get_dir(opts));
>   remove_dir_recursively(&dir, 0);
> diff --git a/sequencer.h b/sequencer.h
> index c955594..20b708a 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -43,8 +43,14 @@ struct replay_opts {
>  
>   /* Only used by REPLAY_NONE */
>   struct rev_info *revs;
> +
> + /* malloc()ed data entrusted to the sequencer */
> + void **owned;
> + int owned_nr, owned_alloc;

I'm not sure about naming conventions for those types of data, but
wouldn't 'owned_data' be a better name?  I could be wrong here...

>  };
> -#define REPLAY_OPTS_INIT { -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, NULL, NULL, 0, 0, NULL }
> +#define REPLAY_OPTS_INIT { -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, NULL, NULL, 0, 0, NULL, NULL, 0, 0 }

Nb. it is a pity that we cannot use named initializers for structs,
the so-called designated inits.  They would make this macro more readable.

> +
> +void *sequencer_entrust(struct replay_opts *opts, void *set_me_free_after_use);
>  
>  int sequencer_pick_revisions(struct replay_opts *opts);
>  
> 



Re: [PATCH v4 0/3] diff-highlight: add support for --graph option

2016-08-29 Thread Junio C Hamano
Brian Henderson  writes:

> How does this look?
>
> Drawing the graph helped me a lot in figuring out what I was
> actually testing. thanks!

Yeah, I also am pleased to see the picture of what is being tested
in the test script.

With your sign-off, they would have been almost perfect ;-).

> Brian Henderson (3):
>   diff-highlight: add some tests.
>   diff-highlight: add failing test for handling --graph output.
>   diff-highlight: add support for --graph output.
>
>  contrib/diff-highlight/Makefile  |   5 +
>  contrib/diff-highlight/diff-highlight|  19 +-
>  contrib/diff-highlight/t/Makefile|  22 +++
>  contrib/diff-highlight/t/t9400-diff-highlight.sh | 223 
> +++
>  4 files changed, 263 insertions(+), 6 deletions(-)
>  create mode 100644 contrib/diff-highlight/Makefile
>  create mode 100644 contrib/diff-highlight/t/Makefile
>  create mode 100755 contrib/diff-highlight/t/t9400-diff-highlight.sh


Re: [PATCH 14/22] sequencer: prepare for rebase -i's commit functionality

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> +/* We will introduce the 'interactive rebase' mode later */
> +#define IS_REBASE_I() 0

I do not see a point in naming this all caps.  The use site would be
a lot more pleasant to read when the reader does not have to care if
this is implemented as a preprocessor macro or a helper function.
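
Something like this, sketching the suggestion:

    /* We will introduce the 'interactive rebase' mode later */
    static int is_rebase_i(void)
    {
            return 0;
    }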

> @@ -377,20 +387,72 @@ static int is_index_unchanged(void)
>   return !hashcmp(active_cache_tree->sha1, head_commit->tree->object.oid.hash);
>  }
>  
> +static char **read_author_script(void)
> +{
> + struct strbuf script = STRBUF_INIT;
> + int i, count = 0;
> + char *p, *p2, **env;
> + size_t env_size;
> +
> + if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
> + return NULL;
> +
> + for (p = script.buf; *p; p++)
> + if (skip_prefix(p, "'''", (const char **)&p2))
> + strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
> + else if (*p == '\'')
> + strbuf_splice(&script, p-- - script.buf, 1, "", 0);
> + else if (*p == '\n') {
> + *p = '\0';
> + count++;
> + }

Hmph.  What is this loop doing?  Is it decoding a sq-quoted buffer
or something?  Don't we have a helper function to do that?
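
For context, the buffer comes from the author-script file that
interactive rebase writes, which holds sq-quoted shell assignments
along these lines (values illustrative):

    GIT_AUTHOR_NAME='A U Thor'
    GIT_AUTHOR_EMAIL='author@example.com'
    GIT_AUTHOR_DATE='@1234567890 +0000'

so the loop is indeed undoing shell single-quoting by hand.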

> + env_size = (count + 1) * sizeof(*env);
> + strbuf_grow(&script, env_size);
> + memmove(script.buf + env_size, script.buf, script.len);
> + p = script.buf + env_size;
> + env = (char **)strbuf_detach(&script, NULL);
> +
> + for (i = 0; i < count; i++) {
> + env[i] = p;
> + p += strlen(p) + 1;
> + }
> + env[count] = NULL;
> +
> + return env;
> +}
> +
>  /*
>   * If we are cherry-pick, and if the merge did not result in
>   * hand-editing, we will hit this commit and inherit the original
>   * author date and name.
>   * If we are revert, or if our cherry-pick results in a hand merge,
> - * we had better say that the current user is responsible for that.
> + * we had better say that the current user is responsible for that
> + * (except, of course, while running an interactive rebase).
>   */

The added "(except, ...)" reads as if "even if we are reverting, if
that is done as part of an interactive rebase, the authorship rule
for a revert does not apply".

If that is not what you meant, i.e. if you did not mean to imply
that "rebase -i" doing a revert is a normal thing, this needs to be
rephrased to avoid the misinterpretation.


Re: Reducing CPU load on git server

2016-08-29 Thread Dennis Kaarsemaker
On ma, 2016-08-29 at 13:57 -0700, W. David Jarvis wrote:

> >  * If you do need branches consider archiving stale tags/branches
> > after some time. I implemented this where I work, we just have a
> > $REPO-archive.git with every tag/branch ever created for a given
> > $REPO.git, and delete refs after a certain time.
> 
> This is something else that we're actively considering. Why did your
> company implement this -- was it to reduce load, or just to clean up
> your repositories? Did you notice any change in server load?

At 50k refs, ref negotiation gets expensive, especially over http

D. 


Re: Reducing CPU load on git server

2016-08-29 Thread Jeff King
On Mon, Aug 29, 2016 at 12:16:20PM -0700, W. David Jarvis wrote:

> > Do you know which processes are generating the load? git-upload-pack
> > does the negotiation, and then pack-objects does the actual packing.
> 
> When I look at expensive operations (ones that I can see consuming
> 90%+ of a CPU for more than a second), there are often pack-objects
> processes running that will consume an entire core for multiple
> seconds (I also saw one pack-object counting process run for several
> minutes while using up a full core).

Pegging CPU for a few seconds doesn't sound out-of-place for
pack-objects serving a fetch or clone on a large repository. And I can
certainly believe "minutes", especially if it was not serving a fetch,
but doing repository maintenance on a large repository.

Talk to GitHub Enterprise support folks about what kind of process
monitoring and accounting is available. Recent versions of GHE can
easily tell things like which repositories and processes are using the
most CPU, RAM, I/O, and network, which ones are seeing a lot of
parallelism, etc.

> rev-list shows up as a pretty active CPU consumer, as do prune and
> blame-tree.
> 
> I'd say overall that in terms of high-CPU consumption activities,
> `prune` and `rev-list` show up the most frequently.

None of those operations is triggered by client fetches. You'll see
"rev-list" for a variety of operations, so that's hard to pinpoint. But
I'm surprised that "prune" is a common one for you. It is run as part of
the post-push, but I happen to know that the version that ships on GHE
is optimized to use bitmaps, and to avoid doing any work when there are
no loose objects that need pruning in the first place.

Blame-tree is a GitHub-specific command (it feeds the main repository
view page), and is a known CPU hog. There's more clever caching for that
coming down the pipe, but it's not shipped yet.

> On the subject of prune - I forgot to mention that the `git fetch`
> calls from the subscribers are running `git fetch --prune`. I'm not
> sure if that changes the projected load profile.

That shouldn't change anything; the pruning is purely a client side
thing.

> > Maybe. If pack-objects is where your load is coming from, then
> > counter-intuitively things sometimes get _worse_ as you fetch less. The
> > problem is that git will generally re-use deltas it has on disk when
> > sending to the clients. But if the clients are missing some of the
> > objects (because they don't fetch all of the branches), then we cannot
> > use those deltas and may need to recompute new ones. So you might see
> > some parts of the fetch get cheaper (negotiation, pack-object's
> > "Counting objects" phase), but "Compressing objects" gets more
> > expensive.
> 
> I might be misunderstanding this, but if the subscriber is already "up
> to date" modulo a single updated ref tip, then this problem shouldn't
> occur, right? Concretely: if ref A is built off of ref B, and the
> subscriber already has B when it requests A, that shouldn't cause this
> behavior, but it would cause this behavior if the subscriber didn't
> have B when it requested A.

Correct. So this shouldn't be a thing you are running into now, but it's
something that might be made worse if you switch to fetching only single
refs.

> See comment above about a long-running counting objects process. I
> couldn't tell which of our repositories it was counting, but it went
> for about 3 minutes with full core utilization. I didn't see us
> counting pack-objects frequently but it's an expensive operation.

That really sounds like repository maintenance. Repacks of
torvalds/linux (including all of its forks) on github.com take ~15
minutes of CPU. There may be some optimization opportunities there (I
have a few things I'd like to explore in the next few months), but most
of it is pretty fundamental. It literally takes a few minutes just to
walk the entire object graph for that repo (that's one of the more
extreme cases, of course, but presumably you are hosting some large
repositories).

Maintenance like that should be a very occasional operation, but it's
possible that you have a very busy repo.

> > There's nothing in upstream git to help smooth these loads, but since
> > you mentioned GitHub Enterprise, I happen to know that it does have a
> > system for coalescing multiple fetches into a single pack-objects. I
> > _think_ it's in GHE 2.5, so you might check which version you're
> > running (and possibly also talk to GitHub Support, who might have more
> > advice; there are also tools for finding out which git processes are
> > generating the most load, etc).
> 
> We're on 2.6.4 at the moment.

OK, I double-checked, and your version should be coalescing identical
fetches.

Given that, and that a lot of the load you mentioned above is coming
from non-fetch sources, it sounds like switching anything with your
replica fetch strategy isn't likely to help much. And a multi-tiered
architecture won't help if the load is being generated by requests that
are serving the web-views directly on the box.

I'd really encourage you to talk with GitHub Support about performance
and clustering. It sounds like there may be some GitHub-specific things
to tweak. And it may be that the load is just too much for a single
machine, and would benefit from spreading the load across multiple git
servers.

Re: [PATCH v2] blame: fix segfault on untracked files

2016-08-29 Thread Junio C Hamano
Thomas Gummerer  writes:

>> The point of this fix is not that we show the exact error message,
>> but we fail in a controlled manner.  I think
>>
>> test_expect_success 'blame untracked file in empty repo' '
>>>untracked &&
>>test_must_fail git blame untracked
>>'
>>
>> is sufficient.
>
> Yeah, I agree with that.
>
> Thanks for the review!

Thanks for a quick follow-up fix during -rc period.

>  builtin/blame.c  | 3 ++-
>  t/t8002-blame.sh | 5 +
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 7ec7823..a5bbf91 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -2244,7 +2244,8 @@ static void verify_working_tree_path(struct commit 
> *work_tree, const char *path)
>   pos = cache_name_pos(path, strlen(path));
>   if (pos >= 0)
>   ; /* path is in the index */
> - else if (!strcmp(active_cache[-1 - pos]->name, path))
> + else if (-1 - pos < active_nr &&
> +  !strcmp(active_cache[-1 - pos]->name, path))
>   ; /* path is in the index, unmerged */
>   else
>   die("no such path '%s' in HEAD", path);
> diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> index ff09ace..ab79de9 100755
> --- a/t/t8002-blame.sh
> +++ b/t/t8002-blame.sh
> @@ -6,6 +6,11 @@ test_description='git blame'
>  PROG='git blame -c'
>  . "$TEST_DIRECTORY"/annotate-tests.sh
>  
> +test_expect_success 'blame untracked file in empty repo' '
> + >untracked &&
> + test_must_fail git blame untracked
> +'
> +
>  PROG='git blame -c -e'
>  test_expect_success 'blame --show-email' '
>   check_count \
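
For readers unfamiliar with the convention the fix above relies on:
cache_name_pos() returns the entry's position when the path is present
in the index, and -insertion_point - 1 when it is not, so "-1 - pos"
recovers where the path would be inserted.  A sketch (not code from
the patch):

    int pos = cache_name_pos(path, strlen(path));
    if (pos < 0) {
            int insert_at = -1 - pos;
            /*
             * Only look at active_cache[insert_at] if it exists; the
             * unpatched code indexed past the end of an empty index.
             */
            if (insert_at < active_nr)
                    ; /* path may be present in an unmerged state */
    }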


Re: [PATCH v2 10/14] sequencer: lib'ify read_populate_opts()

2016-08-29 Thread Junio C Hamano
Junio C Hamano  writes:

>> +if (git_config_from_file(populate_opts_cb, git_path_opts_file(), *opts) < 0)
>> +return error(_("Malformed options sheet: %s"),
>> +git_path_opts_file());
>> +return 0;
>
> As discussed, perhaps have a comment immediately before calling
> config-from-file that says that the call could die when it is fed a
> syntactically broken file, but we ignore it for now because we will
> be writing the file we have written, or something?

Obviously we will be _READING_ here ;-)


Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
Ok that makes sense. Thanks much.

Uma

On Mon, Aug 29, 2016 at 2:09 PM, Junio C Hamano  wrote:
> On Mon, Aug 29, 2016 at 2:03 PM, Uma Srinivasan  
> wrote:
>> On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano  wrote:
>>>
>>> A top-level superproject can have a submodule bound at its "dir/"
>>> directory, and "dir/.git" can either be a gitfile which you can read
>>> with read_gitfile() and point into somewhere in ".git/modules/" of
>>> the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
>>> Git directory.  So at the top of a superproject, you could do
>>>
>>> git clone $URL ./dir2
>>> git add dir2
>>>
>>> to clone an independent project into dir2 directory, and add it as a
>>> new submodule.  The fallback is to support such a layout.
>>>
>> Thanks for the reply. However, in this case
>>
>>   git clone $URL ./dir2
>>   git add dir2
>>
>> how will "dir2" ever get registered as a submodule?
>
> With a separate invocation of "git config -f .gitmodules", of course.
> The layout to use gitfile to point into .git/modules/ is a more recent
> invention than the submodule support itself that "git add" knows about.
> The code needs to support both layouts as well as it can, and that
> is what the "can we read it as gitfile?  If not, that directory itself may
> be a git repository" codepath you asked about is doing.


Re: [PATCH 20/22] sequencer: remember do_recursive_merge()'s return value

2016-08-29 Thread Junio C Hamano
Jakub Narębski  writes:

> On 29.08.2016 at 10:06, Johannes Schindelin writes:
>
>> diff --git a/sequencer.c b/sequencer.c
>> index 5ec956f..0614b90 100644
>> --- a/sequencer.c
>> +++ b/sequencer.c
>> @@ -623,7 +623,7 @@ static int do_pick_commit(enum todo_command command, 
>> struct commit *commit,
>>  const char *base_label, *next_label;
>>  struct commit_message msg = { NULL, NULL, NULL, NULL };
>>  struct strbuf msgbuf = STRBUF_INIT;
>> -int res, unborn = 0, allow;
>> +int res = 0, unborn = 0, allow;
>
Not that I am against this part of the change, making the
initialization explicit, but why are we initializing automatic
variables with 0, which would be the default value anyway?

Because an on-stack "auto" begins its life with an undefined value,
unlike a file-scope static (and global variables) that can be in BSS
segment.

> I thought our coding
> guidelines discourage initializing with 0 or NULL...

You are confused between the two, I am afraid.
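
A minimal illustration of the distinction (not from the patch):

    static int file_scope;          /* BSS: starts life as 0 */

    void example(void)
    {
            int on_stack;           /* indeterminate until assigned */
            static int func_static; /* also zero-initialized */
    }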



Re: git submodules implementation question

2016-08-29 Thread Uma Srinivasan
Thanks for the reply. However, in this case

  git clone $URL ./dir2
  git add dir2

how will "dir2" ever get registered as a submodule? I don't see
how one can reach the "is_submodule_modified" routine for the scenario
above.

My understanding is that a sub-directory can be registered as a
submodule only with the "git submodule add " command. In this case
only a gitlink file is created within the sub-directory and not a .git
subdirectory. Please correct me if I am wrong.

Thanks again,
Uma

On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano  wrote:
> Uma Srinivasan  writes:
>
>> git_dir = read_gitfile(buf.buf);
>> if (!git_dir)
>>
>> git_dir = buf.buf;
>>
>> Can anyone explain to me why we are replacing a failed reading of a
>> git file with the original sub directory name?
>
> A top-level superproject can have a submodule bound at its "dir/"
> directory, and "dir/.git" can either be a gitfile which you can read
> with read_gitfile() and point into somewhere in ".git/modules/" of
> the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
> Git directory.  So at the top of a superproject, you could do
>
> git clone $URL ./dir2
> git add dir2
>
> to clone an independent project into dir2 directory, and add it as a
> new submodule.  The fallback is to support such a layout.
>


Re: git submodules implementation question

2016-08-29 Thread Junio C Hamano
On Mon, Aug 29, 2016 at 2:03 PM, Uma Srinivasan  wrote:
> On Mon, Aug 29, 2016 at 1:03 PM, Junio C Hamano  wrote:
>>
>> A top-level superproject can have a submodule bound at its "dir/"
>> directory, and "dir/.git" can either be a gitfile which you can read
>> with read_gitfile() and point into somewhere in ".git/modules/" of
>> the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
>> Git directory.  So at the top of a superproject, you could do
>>
>> git clone $URL ./dir2
>> git add dir2
>>
>> to clone an independent project into dir2 directory, and add it as a
>> new submodule.  The fallback is to support such a layout.
>>
> Thanks for the reply. However, in this case
>
>   git clone $URL ./dir2
>   git add dir2
>
> how will "dir2" ever get registered as a submodule?

With a separate invocation of "git config -f .gitmodules", of course.
The layout to use gitfile to point into .git/modules/ is a more recent
invention than the submodule support itself that "git add" knows about.
> The code needs to support both layouts as well as it can, and that
is what the "can we read it as gitfile?  If not, that directory itself may
be a git repository" codepath you asked about is doing.


Re: [PATCH v2 0/4] cat-file: optionally convert to worktree version

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> When third-party tools need to access to contents of blobs in the
> database, they might be more interested in the worktree version than in
> the "clean" version of said contents.

Just a friendly reminder before you completely shift your attention
to unrelated topics.  I think this topic is almost there; let's not
stretch ourselves too thin by nibbling-here-nibbling-there and leaving
loose ends untied.

Thanks.



Re: Reducing CPU load on git server

2016-08-29 Thread W. David Jarvis
>  * Consider having that queue of yours just send the pushed payload
> instead of "pull this", see git-bundle. This can turn this sync entire
> thing into a static file distribution problem.

As far as I know, GHE doesn't support this out of the box. We've asked
them for essentially this, though. Due to the nature of our license we
may not be able to configure something like this on the server
instance ourselves.

>  * It's not clear from your post why you have to worry about all these
> branches, surely your Chef instances just need the "master" branch,
> just push that around.

We allow deployments from non-master branches, so we do need multiple
branches. We also use the replication fleet as the target for our
build system, which needs to be able to build essentially any branch
on any repository.

>  * If you do need branches consider archiving stale tags/branches
> after some time. I implemented this where I work, we just have a
> $REPO-archive.git with every tag/branch ever created for a given
> $REPO.git, and delete refs after a certain time.

This is something else that we're actively considering. Why did your
company implement this -- was it to reduce load, or just to clean up
your repositories? Did you notice any change in server load?

>  * If your problem is that you're CPU bound on the master have you
> considered maybe solving this with something like NFS, i.e. replace
> your ad-hoc replication with just a bunch of "slave" boxes that mount
> the remote filesystem.

This is definitely an interesting idea. It'd be a significant
architectural change, though, and not one I'm sure we'd be able to get
support for.

>  * Or, if you're willing to deal with occasional transitory repo
> corruption (just retry?): rsync.

I think this is a cost we're not necessarily okay with having to deal with.

>  * Theres's no reason for why your replication chain needs to be
> single-level if master CPU is really the issue. You could have master
> -> N slaves -> N*X slaves, or some combination thereof.

This was discussed above - if the primary driver of load is the first
fetch, then moving to a multi-tiered architecture will not solve our
problems.

>  * Does it really even matter that your "slave" machines are all
> up-to-date? We have something similar at work but it's just a minutely
> cronjob that does "git fetch" on some repos, since the downstream
> thing (e.g. the chef run) doesn't run more than once every 30m or
> whatever anyway.

It does, because we use the replication fleet for our build server.

 - V

-- 

venanti.us
203.918.2328



Re: [PATCH v2 12/14] sequencer: lib'ify save_head()

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

>   strbuf_addf(&buf, "%s\n", head);
>   if (write_in_full(fd, buf.buf, buf.len) < 0)
> - die_errno(_("Could not write to %s"), git_path_head_file());
> + return error_errno(_("Could not write to %s"),
> +git_path_head_file());

Same comment around a left-over lockfile applies to this.  An extra
rollback being minimally intrusive also applies here, I think.

>   if (commit_lock_file(&head_lock) < 0)
> - die(_("Error wrapping up %s."), git_path_head_file());
> + return error(_("Error wrapping up %s."), git_path_head_file());
> + return 0;
>  }
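
Concretely, the minimally intrusive version would be something like
this (a sketch; the lock variable name is assumed):

    strbuf_addf(&buf, "%s\n", head);
    if (write_in_full(fd, buf.buf, buf.len) < 0) {
            rollback_lock_file(&head_lock);
            return error_errno(_("Could not write to %s"),
                               git_path_head_file());
    }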


Re: [PATCH v2 00/14] Lib'ify quite a few functions in sequencer.c

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> This patch series is one of the half dozen patch series left to move the
> bulk of rebase -i into a builtin.

This was a lot easier to understand compared to the previous round,
and overall looked alright.

Thanks.


Re: [PATCH v2 10/14] sequencer: lib'ify read_populate_opts()

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> Instead of dying there, let the caller high up in the callchain notice
> the error and handle it (by dying, still).
>
> The only caller of read_populate_opts(), sequencer_continue() can
> already return errors, so its caller must be already prepared to
> handle error returns, and with this step, we make it notice an error
> return from this function.
>
> So this is a safe conversion to make read_populate_opts() callable
> from new callers that want it not to die, without changing the
> external behaviour of anything existing.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  sequencer.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index e11b24f..be6020a 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -808,12 +808,14 @@ static int populate_opts_cb(const char *key, const char 
> *value, void *data)
>   return 0;
>  }
>  
> -static void read_populate_opts(struct replay_opts **opts_ptr)
> +static int read_populate_opts(struct replay_opts **opts)
>  {
>   if (!file_exists(git_path_opts_file()))
> - return;
> - if (git_config_from_file(populate_opts_cb, git_path_opts_file(), *opts_ptr) < 0)
> - die(_("Malformed options sheet: %s"), git_path_opts_file());
> + return 0;
> + if (git_config_from_file(populate_opts_cb, git_path_opts_file(), *opts) < 0)
> + return error(_("Malformed options sheet: %s"),
> + git_path_opts_file());
> + return 0;

As discussed, perhaps have a comment immediately before calling
config-from-file that says that the call could die when it is fed a
syntactically broken file, but we ignore it for now because we will
be writing the file we have written, or something?

>  }
>  
>  static int walk_revs_populate_todo(struct commit_list **todo_list,
> @@ -1021,8 +1023,8 @@ static int sequencer_continue(struct replay_opts *opts)
>  
>   if (!file_exists(git_path_todo_file()))
>   return continue_single_pick();
> - read_populate_opts(&opts);
> - if (read_populate_todo(&todo_list, opts))
> + if (read_populate_opts(&opts) ||
> + read_populate_todo(&todo_list, opts))
>   return -1;
>  
>   /* Verify that the conflict has been resolved */
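
For illustration, such a guard comment might read as follows (only a
sketch with hypothetical wording, not what was eventually committed):

    static int read_populate_opts(struct replay_opts **opts)
    {
            if (!file_exists(git_path_opts_file()))
                    return 0;
            /*
             * The opts file is written by the sequencer itself, so it
             * is expected to be well-formed; git_config_from_file()
             * can still die() on a syntactically broken file, and we
             * do not protect against that here.
             */
            if (git_config_from_file(populate_opts_cb,
                                     git_path_opts_file(), *opts) < 0)
                    return error(_("Malformed options sheet: %s"),
                                 git_path_opts_file());
            return 0;
    }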


Re: [PATCH v2 08/14] sequencer: lib'ify read_and_refresh_cache()

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> Instead of dying there, let the caller high up in the callchain
> notice the error and handle it (by dying, still).
>
> There are two call sites of read_and_refresh_cache(), one of which is
> pick_commits(), whose callers were already prepared to do the right
> thing given an "error" return from it by an earlier patch, so the
> conversion is safe.
>
> The other one, sequencer_pick_revisions() was also prepared to relay
> an error return back to its caller in all remaining cases in an
> earlier patch.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  sequencer.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index c006cae..e30aa82 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -640,18 +640,21 @@ static int prepare_revs(struct replay_opts *opts)
>   return 0;
>  }
>  
> -static void read_and_refresh_cache(struct replay_opts *opts)
> +static int read_and_refresh_cache(struct replay_opts *opts)
>  {
>   static struct lock_file index_lock;
>   int index_fd = hold_locked_index(&index_lock, 0);
>   if (read_index_preload(&the_index, NULL) < 0)
> - die(_("git %s: failed to read the index"), action_name(opts));
> + return error(_("git %s: failed to read the index"),
> + action_name(opts));
>   refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED, NULL, NULL, 
> NULL);
>   if (the_index.cache_changed && index_fd >= 0) {
>   if (write_locked_index(&the_index, &index_lock, COMMIT_LOCK))
> - die(_("git %s: failed to refresh the index"), 
> action_name(opts));
> + return error(_("git %s: failed to refresh the index"),
> + action_name(opts));
>   }
> + rollback_lock_file(&index_lock);
> + return 0;
>  }

With the current set of callers, a caller that notices an error from
this function will immediately exit without doing any further
damage.

So in that sense, this is a "safe" conversion.

But is it a sensible conversion?  When the caller wants to do
anything else (e.g. clean-up and try something else, perhaps read
the index again), the caller can't, as the index is still locked,
because even though the code knows that the lock will not be
released until the process exit, it chose to return error without
releasing the lock.

For a file-scope static helper, that probably is sufficient.  But if
this can be reached from a public entry point in the API, the caller
of that entry point will find this not-so-useful, I would think.

I suspect doing the "right thing" to future-proof it may not be too
much more work.

static int read_and_refresh_cache(struct replay_opts *opts)
{
+       int retval = 0; /* assume success */
        ...
        if (read_index_preload(...) < 0) {
                retval = error(...);
                goto finish;
        }
        refresh_index(...);
        if (...changed...) {
                if (write_locked_index(...))
                        retval = error(...);
        }
+finish:
        rollback_lock_file(&index_lock);
        return retval;
}

or something like that on top?
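
Spelled out in full, that sketch would read something like the
following (assembled from the hunk quoted above; still a sketch of the
idea, not the committed code):

    static int read_and_refresh_cache(struct replay_opts *opts)
    {
            static struct lock_file index_lock;
            int index_fd = hold_locked_index(&index_lock, 0);
            int retval = 0; /* assume success */

            if (read_index_preload(&the_index, NULL) < 0) {
                    retval = error(_("git %s: failed to read the index"),
                                   action_name(opts));
                    goto finish;
            }
            refresh_index(&the_index, REFRESH_QUIET|REFRESH_UNMERGED,
                          NULL, NULL, NULL);
            if (the_index.cache_changed && index_fd >= 0) {
                    if (write_locked_index(&the_index, &index_lock,
                                           COMMIT_LOCK))
                            retval = error(_("git %s: failed to refresh the index"),
                                           action_name(opts));
            }
    finish:
            rollback_lock_file(&index_lock);
            return retval;
    }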


Re: [PATCH v2 05/14] sequencer: lib'ify do_pick_commit()

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> Instead of dying there, let the caller high up in the callchain notice
> the error and handle it (by dying, still).
>
> The only two callers of do_pick_commit(), pick_commits() and
> single_pick(), already check the return value and pass it on to their
> callers, so their callers must already be prepared to handle error
> returns, and with this step, we make them notice an error return from
> this function.
>
> So this is a safe conversion to make do_pick_commit() callable from
> new callers that want it not to die, without changing the external
> behaviour of anything existing.
>
> While at it, remove the superfluous space.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  sequencer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/sequencer.c b/sequencer.c
> index f6cdf65..7eef512 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -464,7 +464,7 @@ static int do_pick_commit(struct commit *commit, struct 
> replay_opts *opts)
>* to work on.
>*/
>   if (write_cache_as_tree(head, 0, NULL))
> - die (_("Your index file is unmerged."));
> + return error(_("Your index file is unmerged."));

Makes sense.

do_pick_commit() still calls fast_forward_to(), which was silently
half libified in 3/14 but can still die in checkout_fast_forward(),
though.


Re: [PATCH 20/22] sequencer: remember do_recursive_merge()'s return value

2016-08-29 Thread Jakub Narębski
On 29.08.2016 at 10:06, Johannes Schindelin writes:

> diff --git a/sequencer.c b/sequencer.c
> index 5ec956f..0614b90 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -623,7 +623,7 @@ static int do_pick_commit(enum todo_command command, 
> struct commit *commit,
>   const char *base_label, *next_label;
>   struct commit_message msg = { NULL, NULL, NULL, NULL };
>   struct strbuf msgbuf = STRBUF_INIT;
> - int res, unborn = 0, allow;
> + int res = 0, unborn = 0, allow;

Not that I am against this part of the change, making initialization
explicit, but why are we initializing automatic variables with 0,
which would be the default value anyway?  I thought our coding
guidelines discourage initializing with 0 or NULL...

Puzzled,
-- 
Jakub Narębski



Re: [PATCH v2 03/14] sequencer: lib'ify write_message()

2016-08-29 Thread Junio C Hamano
Johannes Schindelin  writes:

> Instead of dying there, let the caller high up in the callchain
> notice the error and handle it (by dying, still).
>
> The only caller of write_message(), do_pick_commit(), already checks
> the return value and passes it on to its callers, so its caller must
> already be prepared to handle error returns, and with this step, we
> make it notice an error return from this function.
> ...
> @@ -223,7 +226,7 @@ static int fast_forward_to(const unsigned char *to, const 
> unsigned char *from,
>  
>   read_cache();
>   if (checkout_fast_forward(from, to, 1))
> - exit(128); /* the callee should have complained already */
> + return -1; /* the callee should have complained already */

This hunk does not seem to have anything to do with write_message()
conversion, other than that its only caller, do_pick_commit(), also
calls write_message().  checkout_fast_forward() itself can die when
it cannot write the index, though, so this hunk may deserve to be in
its own patch that teaches checkout_fast_forward() to instead return
an error if/when its caller desires.

With the updated message, the series has become far easier to review,
and reordering it to put sequencer-pick-revisions at the beginning
makes quite a lot of sense.


Re: [PATCH 03/22] sequencer: avoid unnecessary indirection

2016-08-29 Thread Jakub Narębski
On 29.08.2016 at 10:04, Johannes Schindelin writes:

> We really do not need the *pointer to a* pointer to the options in
> the read_populate_opts() function.

Right.
 
> Signed-off-by: Johannes Schindelin 
> ---
>  sequencer.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 4d2b4e3..14ef79b 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -809,11 +809,11 @@ static int populate_opts_cb(const char *key, const char 
> *value, void *data)
>   return 0;
>  }
>  
> -static int read_populate_opts(struct replay_opts **opts)
> +static int read_populate_opts(struct replay_opts *opts)

Especially since the other *_populate_*() functions use 'struct replay_opts *opts':

   read_populate_todo(struct commit_list **todo_list, struct replay_opts *opts)
   walk_revs_populate_todo(struct commit_list **todo_list, struct replay_opts 
*opts)

Though they use **todo_list, because they modify this list;
maybe that was why read_populate_opts was using **opts instead
of *opts?

>  {
>   if (!file_exists(git_path_opts_file()))
>   return 0;
> - if (git_config_from_file(populate_opts_cb, git_path_opts_file(), *opts) 
> < 0)
> + if (git_config_from_file(populate_opts_cb, git_path_opts_file(), opts) 
> < 0)
>   return error(_("Malformed options sheet: %s"),
>   git_path_opts_file());
>   return 0;
> @@ -1038,7 +1038,7 @@ static int sequencer_continue(struct replay_opts *opts)
>  
>   if (!file_exists(git_path_todo_file()))
>   return continue_single_pick();
> - if (read_populate_opts(&opts) ||
> + if (read_populate_opts(opts) ||
>   read_populate_todo(&todo_list, opts))
>   return -1;
>  
> 



Re: Reducing CPU load on git server

2016-08-29 Thread Ævar Arnfjörð Bjarmason
On Sun, Aug 28, 2016 at 9:42 PM, W. David Jarvis
 wrote:
> I've run into a problem that I'm looking for some help with. Let me
> describe the situation, and then some thoughts.

Just a few points that you may not have considered, and I didn't see
mentioned in this thread:

 * Consider having that queue of yours just send the pushed payload
instead of "pull this", see git-bundle. This can turn this entire sync
thing into a static file distribution problem.

 * It's not clear from your post why you have to worry about all these
branches, surely your Chef instances just need the "master" branch,
just push that around.

 * If you do need branches consider archiving stale tags/branches
after some time. I implemented this where I work, we just have a
$REPO-archive.git with every tag/branch ever created for a given
$REPO.git, and delete refs after a certain time.

 * If your problem is that you're CPU bound on the master have you
considered maybe solving this with something like NFS, i.e. replace
your ad-hoc replication with just a bunch of "slave" boxes that mount
the remote filesystem.

 * Or, if you're willing to deal with occasional transitory repo
corruption (just retry?): rsync.

 * There's no reason why your replication chain needs to be
single-level if master CPU is really the issue. You could have master
-> N slaves -> N*X slaves, or some combination thereof.

 * Does it really even matter that your "slave" machines are all
up-to-date? We have something similar at work but it's just a minutely
cronjob that does "git fetch" on some repos, since the downstream
thing (e.g. the chef run) doesn't run more than once every 30m or
whatever anyway.


Re: git submodules implementation question

2016-08-29 Thread Junio C Hamano
Uma Srinivasan  writes:

> git_dir = read_gitfile(buf.buf);
> if (!git_dir)
>         git_dir = buf.buf;
>
> Can anyone explain to me why we are replacing a failed reading of a
> git file with the original sub directory name?

A top-level superproject can have a submodule bound at its "dir/"
directory, and "dir/.git" can either be a gitfile, which you can read
with read_gitfile() and which points somewhere into ".git/modules/" of
the top-level superproject.  "dir/.git" can _ALSO_ be a fully valid
Git directory.  So at the top of a superproject, you could do

git clone $URL ./dir2
git add dir2

to clone an independent project into dir2 directory, and add it as a
new submodule.  The fallback is to support such a layout.
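
In code terms, the fallback looks roughly like this (a sketch with an
assumed helper name and context, not the actual function in git):

    /*
     * Resolve where a submodule's git directory lives: "dir/.git"
     * may be a gitfile pointing into ".git/modules/" of the
     * superproject, or it may itself be a full git directory.
     */
    static const char *resolve_submodule_git_dir(struct strbuf *buf,
                                                 const char *path)
    {
            const char *git_dir;

            strbuf_addf(buf, "%s/.git", path);
            git_dir = read_gitfile(buf->buf); /* NULL if not a gitfile */
            if (!git_dir)
                    git_dir = buf->buf;       /* a plain git directory */
            return git_dir;
    }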



Re: [PATCH] p3400: make test script executable

2016-08-29 Thread Junio C Hamano
René Scharfe  writes:

> Signed-off-by: Rene Scharfe 
> ---
> This script was added by v2.10.0-rc0~3^2.

Thanks.  Will merge to 'master'.

>
>  t/perf/p3400-rebase.sh | 0
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  mode change 100644 => 100755 t/perf/p3400-rebase.sh
>
> diff --git a/t/perf/p3400-rebase.sh b/t/perf/p3400-rebase.sh
> old mode 100644
> new mode 100755


Re: [PATCH 02/22] sequencer: use memoized sequencer directory path

2016-08-29 Thread Jakub Narębski
On 29.08.2016 at 10:04, Johannes Schindelin writes:
> Signed-off-by: Johannes Schindelin 
> ---
>  builtin/commit.c |  2 +-
>  sequencer.c  | 11 ++-
>  sequencer.h  |  5 +
>  3 files changed, 8 insertions(+), 10 deletions(-)

Just a sidenote: it would probably be easier to read with *.h before
*.c (at least for this particular one).

> diff --git a/builtin/commit.c b/builtin/commit.c
> index 77e3dc8..0221190 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -173,7 +173,7 @@ static void determine_whence(struct wt_status *s)
>   whence = FROM_MERGE;
>   else if (file_exists(git_path_cherry_pick_head())) {
>   whence = FROM_CHERRY_PICK;
> - if (file_exists(git_path(SEQ_DIR)))
> + if (file_exists(git_path_seq_dir()))
>   sequencer_in_use = 1;
>   }
>   else

So it is more "Use memoized sequencer directory path" than
"sequencer: use memoized sequencer directory path" - it replaces
all occurrences of SEQ_DIR,... that's why it can be removed from
'sequencer.h'.

Though perhaps I misunderstood the "sequencer: " prefix there.  Don't
mind me then.

> diff --git a/sequencer.c b/sequencer.c
> index b6481bb..4d2b4e3 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -21,10 +21,11 @@
>  const char sign_off_header[] = "Signed-off-by: ";
>  static const char cherry_picked_prefix[] = "(cherry picked from commit ";
>  
> -static GIT_PATH_FUNC(git_path_todo_file, SEQ_TODO_FILE)
> -static GIT_PATH_FUNC(git_path_opts_file, SEQ_OPTS_FILE)
> -static GIT_PATH_FUNC(git_path_seq_dir, SEQ_DIR)
> -static GIT_PATH_FUNC(git_path_head_file, SEQ_HEAD_FILE)
> +GIT_PATH_FUNC(git_path_seq_dir, "sequencer")
> +
> +static GIT_PATH_FUNC(git_path_todo_file, "sequencer/todo")
> +static GIT_PATH_FUNC(git_path_opts_file, "sequencer/opts")
> +static GIT_PATH_FUNC(git_path_head_file, "sequencer/head")

This also makes the ordering of memoized-path variables more
sensible.  Good work.

>  
>  static int is_rfc2822_line(const char *buf, int len)
>  {
> @@ -112,7 +113,7 @@ static void remove_sequencer_state(void)
>  {
>   struct strbuf seq_dir = STRBUF_INIT;
>  
> - strbuf_addstr(&seq_dir, git_path(SEQ_DIR));
> + strbuf_addstr(&seq_dir, git_path_seq_dir());
>   remove_dir_recursively(&seq_dir, 0);
>   strbuf_release(&seq_dir);
>  }
> diff --git a/sequencer.h b/sequencer.h
> index 2ca096b..c955594 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -1,10 +1,7 @@
>  #ifndef SEQUENCER_H
>  #define SEQUENCER_H
>  
> -#define SEQ_DIR       "sequencer"
> -#define SEQ_HEAD_FILE "sequencer/head"
> -#define SEQ_TODO_FILE "sequencer/todo"
> -#define SEQ_OPTS_FILE "sequencer/opts"
> +const char *git_path_seq_dir(void);

Right, I see this matches other git_path_*() functions declared in cache.h

>  
>  #define APPEND_SIGNOFF_DEDUP (1u << 0)
>  
> 



Re: [PATCH v13 00/14] libify apply and use lib in am, part 3

2016-08-29 Thread Junio C Hamano
Junio C Hamano  writes:

> Christian Couder  writes:
>
>> Highlevel view of the patches in the series
>> ~~~
>>
>> This is "part 3" of the full patch series. I am resending only the
>> last 14 patches of the full series as "part 3", because I don't want
>> to resend the first 27 patches of v10 nearly as is.
>
> Just to make sure, you are sending the first 11 of these 14 exactly
> as-is, right?  I didn't spot anything different other than 12 and 13
> that replaced the "alternate index" step from the previous round.

In any case, I didn't spot anything glaringly wrong in the updated
part of the patches.  I'll wait for a few days and then mark it as
'Will merge to next' unless we hear comments from others.

Thanks.


Re: Notation for current branch?

2016-08-29 Thread Jacob Keller
On Mon, Aug 29, 2016 at 12:49 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
>>> What's wrong with simply using 'HEAD' for scripting?
>>
>> When you want to display the current branch to the user, e.g. when
>> scripting a shell prompt or similar use
>
> Wait.  Even if a hypothetical version of Git understood @@ as "the
> current branch", how would you use that notation, instead of HEAD,
> to decide what to "display the current branch to the user"?
>
> You obviously will not be doing
>
> echo @@
>
> You would be doing something around "git symbolic-ref" at the very
> least, and wouldn't you be feeding HEAD to that symbolic-ref anyway
> while doing so?
>

Hmm. You are right, I misunderstood the original question. Use of
"git symbolic-ref" seems like the right thing if you need to obtain
the current branch name, and there's no reason to not just use HEAD
there.

Thanks,
Jake


[PATCH v2] blame: fix segfault on untracked files

2016-08-29 Thread Thomas Gummerer
Since 3b75ee9 ("blame: allow to blame paths freshly added to the index",
2016-07-16) git blame also looks at the index to determine if there is a
file that was freshly added to the index.

cache_name_pos returns -pos - 1 in case no match is found, or
if the name matches, but the entry has a stage other than 0.  As git
blame should work for unmerged files, it uses strcmp to determine
whether the name of the returned position matches, in which case the
file exists, but is merely unmerged, or if the file actually doesn't
exist in the index.

If the repository is empty, or if the file would lexicographically be
sorted as the last file in the repository, -cache_name_pos - 1 is
outside of the length of the active_cache array, causing git blame to
segfault.  Guard against that, and die() normally to restore the old
behaviour.

Reported-by: Simon Ruderich 
Helped-by: Junio C Hamano 
Signed-off-by: Thomas Gummerer 
---
> The point of this fix is not that we show the exact error message,
> but we fail in a controlled manner.  I think
>
> test_expect_success 'blame untracked file in empty repo' '
>         >untracked &&
>         test_must_fail git blame untracked
> '
>
> is sufficient.

Yeah, I agree with that.

Thanks for the review!

 builtin/blame.c  | 3 ++-
 t/t8002-blame.sh | 5 +++++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 7ec7823..a5bbf91 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2244,7 +2244,8 @@ static void verify_working_tree_path(struct commit 
*work_tree, const char *path)
pos = cache_name_pos(path, strlen(path));
if (pos >= 0)
; /* path is in the index */
-   else if (!strcmp(active_cache[-1 - pos]->name, path))
+   else if (-1 - pos < active_nr &&
+!strcmp(active_cache[-1 - pos]->name, path))
; /* path is in the index, unmerged */
else
die("no such path '%s' in HEAD", path);
diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
index ff09ace..ab79de9 100755
--- a/t/t8002-blame.sh
+++ b/t/t8002-blame.sh
@@ -6,6 +6,11 @@ test_description='git blame'
 PROG='git blame -c'
 . "$TEST_DIRECTORY"/annotate-tests.sh
 
+test_expect_success 'blame untracked file in empty repo' '
+   >untracked &&
+   test_must_fail git blame untracked
+'
+
 PROG='git blame -c -e'
 test_expect_success 'blame --show-email' '
check_count \
-- 
2.10.0.rc2.246.g4fd2c60



Re: Notation for current branch?

2016-08-29 Thread Junio C Hamano
Jacob Keller  writes:

>> What's wrong with simply using 'HEAD' for scripting?
>
> When you want to display the current branch to the user, e.g. when
> scripting a shell prompt or similar use

Wait.  Even if a hypothetical version of Git understood @@ as "the
current branch", how would you use that notation, instead of HEAD,
to decide what to "display the current branch to the user"?

You obviously will not be doing

echo @@

You would be doing something around "git symbolic-ref" at the very
least, and wouldn't you be feeding HEAD to that symbolic-ref anyway
while doing so?



Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-29 Thread Junio C Hamano
Lars Schneider  writes:

> I see. Thanks for the explanation.

I expect the updated log message to explain the issue correctly
then.

>> And even on POSIX systems, if you are doing a long-running helper
>> any open file descriptor in the parent process when the long-running
>> helper is spawned will become leaked fd.  CLOEXEC is a possible
>> solution (but not necessarily the only or the best one) to the fd
>> leak in this case.
>> 
>> How much does the code that spawns these long-running helpers know
>> about the file descriptors that happen to be open?
>
> Nothing really.

OK.

>> The parent is
>> very likely to have pack windows open into .pack files and they need
>> to be closed on the child side after fork(2) starts the child
>> process but before execve(2) runs the helper, if we want to avoid
>> file descriptor leaks.
>
> I think I understand what you are saying. However, during my tests
> .pack file fd's were never a problem.

I do not expect that during the lifetime of your long-running helper
anybody would try to unlink an existing packfile, so it is unlikely
that the "cannot unlink an open file on Windows" issue would come up.
And the cross-platform problem I pointed out is a fd leak; leaks would
not show up until you run out of the resource, just like you
wouldn't notice a small memory leak here and there UNLESS you actively
look for them.  I would be surprised if your "tests" found anything.

> How would I find the open .pack file fd's?  Should I go through
> /proc/PID/fd? Why is this no problem for other longer running
> commands such as the git-credential-cache--daemon or git-daemon?

Nobody said this is "no problem for others".  While discussing the
patch that started this thread, we noticed that any open file
descriptor the main process has when the long-running clean/smudge
helper is spawned _is_ a problem.  Other helpers may share the same
problem, and if they do, that is all the more reason that fixing the
file descriptor leak is a good thing.

The primary thing I was wondering was if we should open the window
into packfiles with CLOEXEC, just like we recently started opening
the tempfiles and lockfiles with the flag.  The reason why I asked
if the site that spawns (i.e. the run_command API) has easy access to
the list of file descriptors that we opened ONLY for ourselves is
because that would make it possible to consider an alternative
approach: close them before execve(2) in the run_command API.  That
would save us from having to sprinkle existing calls to open(2) with
CLOEXEC.  But if that is not the case, opening them with CLOEXEC is
a much better solution than going outside the program to "find" them
in non-portable ways.

Thanks.
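
For reference, the CLOEXEC idea in its simplest form looks like this
(a generic POSIX sketch, not the actual read-cache change):

    #include <fcntl.h>

    /*
     * Opening with O_CLOEXEC marks the descriptor close-on-exec, so
     * a spawned helper does not inherit it across execve(2).  On
     * Windows the flag is defined as O_NOINHERIT.
     */
    static int open_not_inherited(const char *path)
    {
            return open(path, O_RDONLY | O_CLOEXEC);
    }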


Re: [PATCH v5 09/12] doc: revisions - define `reachable`

2016-08-29 Thread Philip Oakley

From: "Jakub Narębski" 

On 29.08.2016 at 15:21, Philip Oakley writes:

From: "Jakub Narębski" 
Sent: Sunday, August 28, 2016 2:01 PM

On 12.08.2016 at 09:07, Philip Oakley writes:

[...]


+For these commands,
+specifying a single revision, using the notation described in the
+previous section, means the set of commits `reachable` from the given
+commit.

[...]

+
+A commit's reachable set is the commit itself and the commits in
+its ancestry chain.
+


It is all right, but perhaps it would be better to just repeat:

  +The set of commits reachable from a given commit is the commit itself
  +and all the commits in its ancestry chain.


It's very easy to go around in circles here. The original issue was
the A..B notation for the case where A is a direct descendant of B,
such that new users, or users more familiar with range notations from
elsewhere, would expect that the A..B range is *inclusive*, rather
than an open / closed interval. It was the addressing of that problem
that led to the updating of the 'reachability' definition.


All right, I can see that.  It is a worthwhile goal.


The main part of my sentence formation was to make the 'reachable'
part the defining element, rather than being a feature of the set.
Maybe it's using the 'set' viewpoint that is distracting?


On one hand, the "A commit's reachable set is ..." approach puts
'reachable' upfront.  On the other hand it introduces new terminology,
namely 'reachable set' (or 'reachable set of a commit' to be more
exact)...  it doesn't read that well to me, but I am not a native
speaker.

But as I wrote, this is quite all right anyway
--
Jakub Narębski

--
Thanks.

Philip 



Re: Reducing CPU load on git server

2016-08-29 Thread W. David Jarvis
> So your load is probably really spiky, as you get thundering herds of
> fetchers after every push (the spikes may have a long flatline at the
> top, as it takes time to process the whole herd).

It is quite spiky, yes. At the moment, however, the replication fleet
is relatively small (at the moment it's just 4 machines). We had 6
machines earlier this month and we had hoped that terminating two of
them would lead to a material drop in CPU usage, but we didn't see a
really significant reduction.

> Yes, though I'd be surprised if this negotiation is that expensive in
> practice. In my experience it's not generally, and even if we ended up
> traversing every commit in the repository, that's on the order of a few
> seconds even for large, active repositories.
>
> In my experience, the problem in a mass-fetch like this ends up being
> pack-objects preparing the packfile. It has to do a similar traversal,
> but _also_ look at all of the trees and blobs reachable from there, and
> then search for likely delta-compression candidates.
>
> Do you know which processes are generating the load? git-upload-pack
> does the negotiation, and then pack-objects does the actual packing.

When I look at expensive operations (ones that I can see consuming
90%+ of a CPU for more than a second), there are often pack-objects
processes running that will consume an entire core for multiple
seconds (I also saw one pack-object counting process run for several
minutes while using up a full core). rev-list shows up as a pretty
active CPU consumer, as do prune and blame-tree.

I'd say overall that in terms of high-CPU consumption activities,
`prune` and `rev-list` show up the most frequently.

On the subject of prune - I forgot to mention that the `git fetch`
calls from the subscribers are running `git fetch --prune`. I'm not
sure if that changes the projected load profile.

> Maybe. If pack-objects is where your load is coming from, then
> counter-intuitively things sometimes get _worse_ as you fetch less. The
> problem is that git will generally re-use deltas it has on disk when
> sending to the clients. But if the clients are missing some of the
> objects (because they don't fetch all of the branches), then we cannot
> use those deltas and may need to recompute new ones. So you might see
> some parts of the fetch get cheaper (negotiation, pack-object's
> "Counting objects" phase), but "Compressing objects" gets more
> expensive.

I might be misunderstanding this, but if the subscriber is already "up
to date" modulo a single updated ref tip, then this problem shouldn't
occur, right? Concretely: if ref A is built off of ref B, and the
subscriber already has B when it requests A, that shouldn't cause this
behavior, but it would cause this behavior if the subscriber didn't
have B when it requested A.

> This is particularly noticeable with shallow fetches, which in my
> experience are much more expensive to serve.

I don't think we're doing shallow fetches anywhere in this system.

> Jakub mentioned bitmaps, and if you are using GitHub Enterprise, they
> are enabled. But they won't really help here. They are essentially
> cached information that git generates at repack time. But if we _just_
> got a push, then the new objects to fetch won't be part of the cache,
> and we'll fall back to traversing them as normal.  On the other hand,
> this should be a relatively small bit of history to traverse, so I'd
> doubt that "Counting objects" is that expensive in your case (but you
> should be able to get a rough sense by watching the progress meter
> during a fetch).

See comment above about a long-running counting objects process. I
couldn't tell which of our repositories it was counting, but it went
for about 3 minutes with full core utilization. I didn't see us
counting pack-objects frequently but it's an expensive operation.

> I'd suspect more that delta compression is expensive (we know we just
> got some new objects, but we don't know if we can make good deltas
> against the objects the client already has). That's a gut feeling,
> though.
>
> If the fetch is small, that _also_ shouldn't be too expensive. But
> things add up when you have a large number of machines all making the
> same request at once. So it's entirely possible that the machine just
> gets hit with a lot of 5s CPU tasks all at once. If you only have a
> couple cores, that takes many multiples of 5s to clear out.

I think this would show up if I was sitting around running `top` on
the machine, but that doesn't end up being what I see. That might just
be a function of there being a relatively small number of replication
machines, I'm not sure. But I'm not noticing 4 of the same tasks get
spawned simultaneously, which says to me that we're either utilizing a
cache or there's some locking behavior involved.

> There's nothing in upstream git to help smooth these loads, but since
> you mentioned GitHub Enterprise, I happen to know that it does have a
> system for coalescing 

Re: [PATCH v13 00/14] libify apply and use lib in am, part 3

2016-08-29 Thread Junio C Hamano
Christian Couder  writes:

> Highlevel view of the patches in the series
> ~~~
>
> This is "part 3" of the full patch series. I am resending only the
> last 14 patches of the full series as "part 3", because I don't want
> to resend the first 27 patches of v10 nearly as is.

Just to make sure, you are sending the first 11 of these 14 exactly
as-is, right?  I didn't spot anything different other than 12 and 13
that replaced the "alternate index" step from the previous round.



Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-29 Thread Lars Schneider

> On 29 Aug 2016, at 20:05, Junio C Hamano  wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider 
>> 
>> Consider the case of a file that requires filtering and is present in
>> branch A but not in branch B. If A is the current HEAD and we checkout B
>> then the following happens:
>> 
>> 1. ce_compare_data() opens the file
>> 2.   index_fd() detects that the file requires to run a clean filter and
>> calls index_stream_convert_blob()
>> 4. index_stream_convert_blob() calls convert_to_git_filter_fd()
>> 5.   convert_to_git_filter_fd() calls apply_filter() which creates a
>> new long running filter process (in case it is the first file
>> of this kind to be filtered)
>> 6.   The new filter process inherits all file handles. This is the
>> default on Linux/OSX and is explicitly defined in the
>> `CreateProcessW` call in `mingw.c` on Windows.
>> 7. ce_compare_data() closes the file
>> 8. Git unlinks the file as it is not present in B
>> 
>> The unlink operation does not work on Windows because the filter process
>> has still an open handle to the file. Apparently that is no problem on
>> Linux/OSX. Probably because "[...] the two file descriptors share open
>> file status flags" (see fork(2)).
> 
> Wait, a, minute.  "that is no problem" may be true as long as "that"
> is "unlinking the now-gone file in the filesystem", but the reason
> does not have anything to do with the "open-file status flags";
> unlike Windows, you _can_ unlink file that has an open file
> descriptor on it.

I see. Thanks for the explanation.

> 
> And even on POSIX systems, if you are doing a long-running helper
> any open file descriptor in the parent process when the long-running
> helper is spawned will become leaked fd.  CLOEXEC is a possible
> solution (but not necessarily the only or the best one) to the fd
> leak in this case.
> 
> How much does the code that spawns these long-running helpers know
> about the file descriptors that happen to be open?

Nothing really.

>  The parent is
> very likely to have pack windows open into .pack files and they need
> to be closed on the child side after fork(2) starts the child
> process but before execve(2) runs the helper, if we want to avoid
> file descriptor leaks.

I think I understand what you are saying. However, during my tests
.pack file fd's were never a problem. I use start_command() [1]
which wraps the fork() and exec calls [2].

How would I find the open .pack file fd's? Should I go through 
/proc/PID/fd? Why is this no problem for other longer running 
commands such as the git-credential-cache--daemon or git-daemon?

Thanks,
Lars


[1] https://github.com/larsxschneider/git/blob/protocol-filter/v6/convert.c#L566
[2] 
https://github.com/larsxschneider/git/blob/protocol-filter/v6/run-command.c#L345-L412




Re: [PATCH 1/2] gitk: align the commit summary format to the documentation

2016-08-29 Thread Junio C Hamano
Johannes Sixt  writes:

> Am 26.08.2016 um 20:24 schrieb Junio C Hamano:
>> Beat Bolli  writes:
>>> In 175d38c ("SubmittingPatches: document how to reference previous commits",
>>> 2016-07-28) the format for referring to older commits was specified.
>>
>> is easier to read when pasted into a sentence than what the recent
>> update 175d38ca ("SubmittingPatches: document how to reference
>> previous commits", 2016-07-28) suggests to do, i.e.
>
> While it may be easier to read due to the extra mark-up, the resulting
> text where such a quotation appears does not flow well, IMO. A commit
> message text that references another commit reads more fluently
> without the quotes around the summary line because the quoted text is
> not so much a quotation that must be marked, but a parenthetical
> statement.
>
> I absolutely welcome the proposed change to gitk, because I always
> edit out the double-quotes.

I think that is highly subjective, and as you very well may know,
I've been referring to commits without a double-quote pair, and have
an obvious bias for something I am used to ;-)

I do not see the "" as introducing a quotation.  I just view it as
very similar to the "" in the following sentence:

The commit whose title is "foo bar" did not consider there is
also need to consider baz.

The whole thing is inside a () pair, so I agree that with or without
the "" pair, it is possible to see where the title ends.  So I do not
have a strong opinion either way.



Re: [PATCH 1/2] gitk: align the commit summary format to the documentation

2016-08-29 Thread Jeff King
On Mon, Aug 29, 2016 at 11:17:19AM -0700, Junio C Hamano wrote:

> > While it may be easier to read due to the extra mark-up, the resulting
> > text where such a quotation appears does not flow well, IMO. A commit
> > message text that references another commit reads more fluently
> > without the quotes around the summary line because the quoted text is
> > not so much a quotation that must be marked, but a parenthetical
> > statement.
> >
> > I absolutely welcome the proposed change to gitk, because I always
> > edit out the double-quotes.
> 
> I think that is highly subjective, and as you very well may know,
> I've been referring to commits without double-quote pair, and have
> an obvious bias for something I am used to ;-)
> 
> I do not see the "" as introducing a quotation.  I just view it as
> very similar to the "" in the following sentence:
> 
> The commit whose title is "foo bar" did not consider there is
> also need to consider baz.
> 
> The whole thing is inside () pair, so I agree that with or without
> "" pair, it is possible to see where the title ends.  So I do not
> have a strong opinion either way.

I have an alias which produces similar output, without the double-quotes
(probably because I stole it from you originally).

I have noticed over the years that the output is occasionally ugly when
the commit names have parentheses themselves. E.g.:

  $ git config alias.ll
  !git --no-pager log -1 --pretty='tformat:%h (%s, %ad)' --date=short

  $ git ll 7e97e10
  7e97e10 (die(_("BUG")): avoid translating bug messages, 2016-07-26)

  $ git ll fa90ab4
  fa90ab4 (t3404: fix a grammo (commands are ran -> commands are run), 
2016-06-29)

Adding quotes can help with that. OTOH, I think it just introduces the
same problem with a different character. E.g.:

  $ git ll be00b57
  be00b57 (provide an initializer for "struct object_info", 2016-08-11)

  $ git llq be00b57
  be00b57 ("provide an initializer for "struct object_info"", 2016-08-11)

Perhaps one could write a script to find a custom pretty non-conflicting
delimiter for each case, but I don't know if it's worth the effort. :)

-Peff


Re: Crash when using git blame on untracked file

2016-08-29 Thread Junio C Hamano
Thomas Gummerer  writes:

> Subject: [PATCH] blame: fix segfault on untracked files
>
> Since 3b75ee9 ("blame: allow to blame paths freshly added to the index",
> 2016-07-16) git blame also looks at the index to determine if there is a
> file that was freshly added to the index.
>
> cache_name_pos returns -pos - 1 in case no match is found, or
> if the name matches, but the entry has a stage other than 0.  As git
> blame should work for unmerged files, it uses strcmp to determine
> whether the name of the returned position matches, in which case the
> file exists, but is merely unmerged, or if the file actually doesn't
> exist in the index.
>
> If the repository is empty, or if the file would lexicographically be
> sorted as the last file in the repository, -cache_name_pos - 1 is
> outside of the length of the active_cache array, causing git blame to
> segfault.  Guard against that, and die() normally to restore the old
> behaviour.
>
> Reported-by: Simon Ruderich 
> Signed-off-by: Thomas Gummerer 
> ---

This is a recent regression and unfortunately is also in 2.9.3; the
patch looks obviously correct.

>  builtin/blame.c  | 3 ++-
>  t/t8002-blame.sh | 7 +++++++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/blame.c b/builtin/blame.c
> index 7ec7823..a5bbf91 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -2244,7 +2244,8 @@ static void verify_working_tree_path(struct commit 
> *work_tree, const char *path)
>   pos = cache_name_pos(path, strlen(path));
>   if (pos >= 0)
>   ; /* path is in the index */
> - else if (!strcmp(active_cache[-1 - pos]->name, path))
> + else if (-1 - pos < active_nr &&
> +  !strcmp(active_cache[-1 - pos]->name, path))
>   ; /* path is in the index, unmerged */
>   else
>   die("no such path '%s' in HEAD", path);
> diff --git a/t/t8002-blame.sh b/t/t8002-blame.sh
> index ff09ace..7983bb7 100755
> --- a/t/t8002-blame.sh
> +++ b/t/t8002-blame.sh
> @@ -6,6 +6,13 @@ test_description='git blame'
>  PROG='git blame -c'
>  . "$TEST_DIRECTORY"/annotate-tests.sh
>  
> +test_expect_success 'blame untracked file in empty repo' '
> + touch untracked &&
> + test_must_fail git blame untracked 2>actual.err &&
> + echo "fatal: no such path '\''untracked'\'' in HEAD" >expected.err &&
> + test_cmp expected.err actual.err
> +'

The point of this fix is not that we show the exact error message,
but we fail in a controlled manner.  I think

test_expect_success 'blame untracked file in empty repo' '
        >untracked &&
        test_must_fail git blame untracked
'

is sufficient.

Thanks.

>  PROG='git blame -c -e'
>  test_expect_success 'blame --show-email' '
>   check_count \


Re: [PATCH] Documentation/SubmittingPatches: add quotes to advised commit reference

2016-08-29 Thread Junio C Hamano
Jakub Narębski  writes:

> On 27.08.2016 at 00:42, Junio C Hamano  writes:
>> Stefan Beller  writes:
>
>> -- >8 --
>> From: Beat Bolli 
>
> ???

The primary value the change adds is to make readers aware of the
gitk feature.  That comes from the primary author.  Not something I
added, Stefan added or Heiko added.


Re: [PATCH v6 00/13] Git filter protocol

2016-08-29 Thread Junio C Hamano
Lars Schneider  writes:

>> On 25 Aug 2016, at 13:07, larsxschnei...@gmail.com wrote:
>> 
>> From: Lars Schneider 
>> 
>> The goal of this series is to avoid launching a new clean/smudge filter
>> process for each file that is filtered.
>> 
>> A short summary about v1 to v5 can be found here:
>> https://git.github.io/rev_news/2016/08/17/edition-18/
>> 
>> This series is also published on web:
>> https://github.com/larsxschneider/git/pull/10
>
> Hi Junio,
>
> I am working on v7 of this patch series and I noticed that it does
> not merge cleanly anymore (not even with git/git master). Should
> I rebase my patch series? If yes, what would be the most
> convenient base for you?

Depends on what you are conflicting with. I think you want to use
something that has at least 05d1ed61 ("mingw: ensure temporary file
handles are not inherited by child processes", 2016-08-22).  Perhaps
2.10 final?


Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-29 Thread Junio C Hamano
larsxschnei...@gmail.com writes:

> From: Lars Schneider 
>
> Consider the case of a file that requires filtering and is present in
> branch A but not in branch B. If A is the current HEAD and we checkout B
> then the following happens:
>
> 1. ce_compare_data() opens the file
> 2.   index_fd() detects that the file requires to run a clean filter and
>  calls index_stream_convert_blob()
> 4. index_stream_convert_blob() calls convert_to_git_filter_fd()
> 5.   convert_to_git_filter_fd() calls apply_filter() which creates a
>  new long running filter process (in case it is the first file
>  of this kind to be filtered)
> 6.   The new filter process inherits all file handles. This is the
>  default on Linux/OSX and is explicitly defined in the
>  `CreateProcessW` call in `mingw.c` on Windows.
> 7. ce_compare_data() closes the file
> 8. Git unlinks the file as it is not present in B
>
> The unlink operation does not work on Windows because the filter process
> has still an open handle to the file. Apparently that is no problem on
> Linux/OSX. Probably because "[...] the two file descriptors share open
> file status flags" (see fork(2)).

Wait a minute.  "that is no problem" may be true as long as "that"
is "unlinking the now-gone file in the filesystem", but the reason
does not have anything to do with the "open-file status flags";
unlike Windows, you _can_ unlink a file that has an open file
descriptor on it.
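
To illustrate the POSIX behaviour being described (a minimal,
self-contained sketch; the file name is made up):

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            int fd = open("victim.tmp", O_RDWR | O_CREAT, 0600);
            if (fd < 0)
                    return 1;
            /* On POSIX this succeeds even while fd is open;
             * on Windows the unlink would fail instead. */
            if (unlink("victim.tmp") < 0)
                    perror("unlink");
            write(fd, "still writable\n", 15);
            close(fd); /* the inode is reclaimed here */
            return 0;
    }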

And even on POSIX systems, if you are doing a long-running helper
any open file descriptor in the parent process when the long-running
helper is spawned will become leaked fd.  CLOEXEC is a possible
solution (but not necessarily the only or the best one) to the fd
leak in this case.

How much does the code that spawns these long-running helpers know
about the file descriptors that happen to be open?  The parent is
very likely to have pack windows open into .pack files and they need
to be closed on the child side after fork(2) starts the child
process but before execve(2) runs the helper, if we want to avoid
file descriptor leaks.

> Fix this problem by opening files in read-cache with the `O_CLOEXEC`
> flag to ensure that the file descriptor does not remain open in a newly
> spawned process. `O_CLOEXEC` is defined as `O_NOINHERIT` on Windows. A
> similar fix for temporary file handles was applied on Git for Windows
> already:
> https://github.com/git-for-windows/git/commit/667b8b51ec850c3e1c7d75dee69dc13c29d1f162

After a few iterations, the final version of the same commit is now
in our tree as d5cb9cbd ("Git 2.10-rc2", 2016-08-26).


[PATCH v1] pack-protocol: fix maximum pkt-line size

2016-08-29 Thread larsxschneider
From: Lars Schneider 

According to LARGE_PACKET_MAX in pkt-line.h the maximal length of a
pkt-line packet is 65520 bytes. The pkt-line header takes 4 bytes and
therefore the pkt-line data component must not exceed 65516 bytes.

Signed-off-by: Lars Schneider 
---

This patch was part of my "Git filter protocol" series [1]. Stefan
suggested to submit this patch individually as it is not strictly
required in the series [2].

Thanks,
Lars


[1] http://public-inbox.org/git/20160825110752.31581-1-larsxschnei...@gmail.com/
[2] 
http://public-inbox.org/git/cagz79kajn-yf28+ls2jyqss6jt-oj2jw-saxqn-oe5xaxsy...@mail.gmail.com/


 Documentation/technical/protocol-common.txt | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/technical/protocol-common.txt 
b/Documentation/technical/protocol-common.txt
index bf30167..ecedb34 100644
--- a/Documentation/technical/protocol-common.txt
+++ b/Documentation/technical/protocol-common.txt
@@ -67,9 +67,9 @@ with non-binary data the same whether or not they contain the 
trailing
 LF (stripping the LF if present, and not complaining when it is
 missing).

-The maximum length of a pkt-line's data component is 65520 bytes.
-Implementations MUST NOT send pkt-line whose length exceeds 65524
-(65520 bytes of payload + 4 bytes of length data).
+The maximum length of a pkt-line's data component is 65516 bytes.
+Implementations MUST NOT send pkt-line whose length exceeds 65520
+(65516 bytes of payload + 4 bytes of length data).

 Implementations SHOULD NOT send an empty pkt-line ("0004").

--
2.9.2
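
For illustration, the framing these numbers describe can be sketched
as follows (not git's actual pkt-line.c API; a minimal stand-alone
example):

    #include <stdio.h>

    #define LARGE_PACKET_MAX 65520
    #define LARGE_PACKET_DATA_MAX (LARGE_PACKET_MAX - 4)

    /*
     * The 4-byte header gives the total length in hex, including
     * the header itself, so the payload may be at most 65516 bytes.
     */
    static int write_pkt_line(FILE *out, const char *data, size_t len)
    {
            if (len > LARGE_PACKET_DATA_MAX)
                    return -1; /* callers must split larger payloads */
            fprintf(out, "%04x", (unsigned int)(len + 4));
            fwrite(data, 1, len, out);
            return 0;
    }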



Re: [PATCH v6 10/13] convert: generate large test files only once

2016-08-29 Thread Junio C Hamano
Lars Schneider  writes:

>> On 25 Aug 2016, at 21:17, Stefan Beller  wrote:
>> 
>>> On Thu, Aug 25, 2016 at 4:07 AM,   wrote:
>>> From: Lars Schneider 
>>> 
>>> Generate more interesting large test files
>> 
>> How are the large test files more interesting?
>> (interesting in the sense of covering more potential bugs?
>> easier to debug? better to maintain, or just a pleasant read?)
>
> The old large test file was 1MB of zeros and 1 byte with a one, repeated 2048 
> times.
>
> Since the filter uses 64k packets we would test a large number of
> identical-looking packets.
>
> That's why I thought the pseudo random content is more interesting.

I guess my real question is why it is not just a single invocation
of test-genrandom that gives you the whole test file; if you are
using 20MB, the simplest would be to grab 20MB out of test-genrandom.
With that hopefully you won't see a large number of identical-looking
packets, no?


Re: [PATCH v6 10/13] convert: generate large test files only once

2016-08-29 Thread Junio C Hamano
larsxschnei...@gmail.com writes:

> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index 7b45136..34c8eb9 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -4,6 +4,15 @@ test_description='blob conversion via gitattributes'
>  
>  . ./test-lib.sh
>  
> +if test_have_prereq EXPENSIVE
> +then
> + T0021_LARGE_FILE_SIZE=2048
> + T0021_LARGISH_FILE_SIZE=100
> +else
> + T0021_LARGE_FILE_SIZE=30
> + T0021_LARGISH_FILE_SIZE=2
> +fi

Minor: do we need T0021_ prefix?  What are you trying to avoid
collisions with?

> + git checkout -- test test.t test.i &&
> +
> + mkdir generated-test-data &&
> + for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
> + do
> + RANDOM_STRING="$(test-genrandom end $i | tr -dc "A-Za-z0-9" )"
> + ROT_RANDOM_STRING="$(echo $RANDOM_STRING | ./rot13.sh )"

In earlier iteration of loop with lower $i, what guarantees that
some bytes survive "tr -dc"?

> + # Generate 1MB of empty data and 100 bytes of random characters

100 bytes?  It seems to me that you are giving 1MB and then $i-byte
or less (which sometimes can be zero) of random string.

> + # printf "$(test-genrandom start $i)"
> + printf "%1048576d" 1 >>generated-test-data/large.file &&
> + printf "$RANDOM_STRING" >>generated-test-data/large.file &&
> + printf "%1048576d" 1 >>generated-test-data/large.file.rot13 &&
> + printf "$ROT_RANDOM_STRING" 
> >>generated-test-data/large.file.rot13 &&
> +
> + if test $i = $T0021_LARGISH_FILE_SIZE
> + then
> + cat generated-test-data/large.file 
> >generated-test-data/largish.file &&
> + cat generated-test-data/large.file.rot13 
> >generated-test-data/largish.file.rot13
> + fi
> + done

This "now we are done with the loop, so copy them to the second
pair" needs to be in the loop?  Shouldn't it come after 'done'?

I do not quite get the point of this complexity.  You are using
exactly the same seed "end" every time, so in the first round you
have 1M of SP, letter '1', letter 'S' (from the genrandom), then
in the second round you have 1M of SP, letter '1', letter 'S' and
letter 'p' (the last two from the genrandom), and go on.  Is it
significant for the purpose of your test that the cruft inserted
between the repetition of 1M of SP gets longer by one byte but they
all share the same prefix (e.g. "1S", "1Sp", "1SpZ", "1SpZT",
... are what you insert between a large run of spaces)?


Re: [PATCH 01/22] sequencer: use static initializers for replay_opts

2016-08-29 Thread Jakub Narębski
On 29.08.2016 at 11:19, Dennis Kaarsemaker writes:
> On Mon, 2016-08-29 at 10:03 +0200, Johannes Schindelin wrote:
> 
>> +#define REPLAY_OPTS_INIT { -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, 
>> NULL, NULL, 0, 0, NULL }
> 
> This looked off to me, as it replaces memset(..., 0, ...) so is not
> 100% equivalent. But the changed functions both set opts.action and
> call parse_args which sets opts.subcommand.

This information would be nice to have in the commit message.

-- 
Jakub Narębski



[PATCH v4 3/3] diff-highlight: add support for --graph output.

2016-08-29 Thread Brian Henderson
---
 contrib/diff-highlight/diff-highlight| 19 +--
 contrib/diff-highlight/t/t9400-diff-highlight.sh |  2 +-
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index ffefc31..9280c88 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -21,6 +21,10 @@ my $RESET = "\x1b[m";
 my $COLOR = qr/\x1b\[[0-9;]*m/;
 my $BORING = qr/$COLOR|\s/;
 
+# The patch portion of git log -p --graph should only ever have preceding | and
+# not / or \ as merge history only shows up on the commit line.
+my $GRAPH = qr/$COLOR?\|$COLOR?\s+/;
+
 my @removed;
 my @added;
 my $in_hunk;
@@ -32,12 +36,12 @@ $SIG{PIPE} = 'DEFAULT';
 while (<>) {
if (!$in_hunk) {
print;
-   $in_hunk = /^$COLOR*\@/;
+   $in_hunk = /^$GRAPH*$COLOR*\@/;
}
-   elsif (/^$COLOR*-/) {
+   elsif (/^$GRAPH*$COLOR*-/) {
push @removed, $_;
}
-   elsif (/^$COLOR*\+/) {
+   elsif (/^$GRAPH*$COLOR*\+/) {
push @added, $_;
}
else {
@@ -46,7 +50,7 @@ while (<>) {
@added = ();
 
print;
-   $in_hunk = /^$COLOR*[\@ ]/;
+   $in_hunk = /^$GRAPH*$COLOR*[\@ ]/;
}
 
# Most of the time there is enough output to keep things streaming,
@@ -163,6 +167,9 @@ sub highlight_pair {
}
 }
 
+# we split either by $COLOR or by character. This has the side effect of
+# leaving in graph cruft. It works because the graph cruft does not contain "-"
+# or "+"
 sub split_line {
local $_ = shift;
return utf8::decode($_) ?
@@ -211,8 +218,8 @@ sub is_pair_interesting {
my $suffix_a = join('', @$a[($sa+1)..$#$a]);
my $suffix_b = join('', @$b[($sb+1)..$#$b]);
 
-   return $prefix_a !~ /^$COLOR*-$BORING*$/ ||
-  $prefix_b !~ /^$COLOR*\+$BORING*$/ ||
+   return $prefix_a !~ /^$GRAPH*$COLOR*-$BORING*$/ ||
+  $prefix_b !~ /^$GRAPH*$COLOR*\+$BORING*$/ ||
   $suffix_a !~ /^$BORING*$/ ||
   $suffix_b !~ /^$BORING*$/;
 }
diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh 
b/contrib/diff-highlight/t/t9400-diff-highlight.sh
index 54e11fe..e42232d 100755
--- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -209,7 +209,7 @@ test_expect_failure 'diff-highlight highlights mismatched 
hunk size' '
 
 # TODO add multi-byte test
 
-test_expect_failure 'diff-highlight works with the --graph option' '
+test_expect_success 'diff-highlight works with the --graph option' '
dh_test_setup_history &&
 
# topo-order so that the order of the commits is the same as with 
--graph
-- 
2.9.3



[PATCH v4 0/3] diff-highlight: add support for --graph option

2016-08-29 Thread Brian Henderson
How does this look?

Drawing the graph helped me a lot in figuring out what I was actually testing. 
Thanks!

Brian Henderson (3):
  diff-highlight: add some tests.
  diff-highlight: add failing test for handling --graph output.
  diff-highlight: add support for --graph output.

 contrib/diff-highlight/Makefile  |   5 +
 contrib/diff-highlight/diff-highlight|  19 +-
 contrib/diff-highlight/t/Makefile|  22 +++
 contrib/diff-highlight/t/t9400-diff-highlight.sh | 223 +++
 4 files changed, 263 insertions(+), 6 deletions(-)
 create mode 100644 contrib/diff-highlight/Makefile
 create mode 100644 contrib/diff-highlight/t/Makefile
 create mode 100755 contrib/diff-highlight/t/t9400-diff-highlight.sh

-- 
2.9.3



[PATCH v4 1/3] diff-highlight: add some tests.

2016-08-29 Thread Brian Henderson
---
 contrib/diff-highlight/Makefile  |   5 +
 contrib/diff-highlight/t/Makefile|  22 +++
 contrib/diff-highlight/t/t9400-diff-highlight.sh | 163 +++
 3 files changed, 190 insertions(+)
 create mode 100644 contrib/diff-highlight/Makefile
 create mode 100644 contrib/diff-highlight/t/Makefile
 create mode 100755 contrib/diff-highlight/t/t9400-diff-highlight.sh

diff --git a/contrib/diff-highlight/Makefile b/contrib/diff-highlight/Makefile
new file mode 100644
index 000..9018724
--- /dev/null
+++ b/contrib/diff-highlight/Makefile
@@ -0,0 +1,5 @@
+# nothing to build
+all:
+
+test:
+   $(MAKE) -C t
diff --git a/contrib/diff-highlight/t/Makefile 
b/contrib/diff-highlight/t/Makefile
new file mode 100644
index 000..5ff5275
--- /dev/null
+++ b/contrib/diff-highlight/t/Makefile
@@ -0,0 +1,22 @@
+-include ../../../config.mak.autogen
+-include ../../../config.mak
+
+# copied from ../../t/Makefile
+SHELL_PATH ?= $(SHELL)
+SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
+T = $(wildcard t[0-9][0-9][0-9][0-9]-*.sh)
+
+all: test
+test: $(T)
+
+.PHONY: help clean all test $(T)
+
+help:
+   @echo 'Run "$(MAKE) test" to launch test scripts'
+   @echo 'Run "$(MAKE) clean" to remove trash folders'
+
+$(T):
+   @echo "*** $@ ***"; '$(SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS)
+
+clean:
+   $(RM) -r 'trash directory'.*
diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh 
b/contrib/diff-highlight/t/t9400-diff-highlight.sh
new file mode 100755
index 000..7c303f7
--- /dev/null
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -0,0 +1,163 @@
+#!/bin/sh
+
+test_description='Test diff-highlight'
+
+CURR_DIR=$(pwd)
+TEST_OUTPUT_DIRECTORY=$(pwd)
+TEST_DIRECTORY="$CURR_DIR"/../../../t
+DIFF_HIGHLIGHT="$CURR_DIR"/../diff-highlight
+
+CW="$(printf "\033[7m")"   # white
+CR="$(printf "\033[27m")"  # reset
+
+. "$TEST_DIRECTORY"/test-lib.sh
+
+if ! test_have_prereq PERL
+then
+   skip_all='skipping diff-highlight tests; perl not available'
+   test_done
+fi
+
+# dh_test is a test helper function which takes 3 file names as parameters. The
+# first 2 files are used to generate diff and commit output, which is then
+# piped through diff-highlight. The 3rd file should contain the expected output
+# of diff-highlight (minus the diff/commit header, ie. everything after and
+# including the first @@ line).
+dh_test () {
+   a="$1" b="$2" &&
+
+   cat >patch.exp &&
+
+   {
+   cat "$a" >file &&
+   git add file &&
+   git commit -m "Add a file" &&
+
+   cat "$b" >file &&
+   git diff file >diff.raw &&
+   git commit -am "Update a file" &&
+   git show >commit.raw
+   } >/dev/null &&
+
+   "$DIFF_HIGHLIGHT" <diff.raw >diff.act &&
+   "$DIFF_HIGHLIGHT" <commit.raw >commit.act &&
+   test_cmp patch.exp diff.act &&
+   test_cmp patch.exp commit.act
+}
+
+test_strip_patch_header () {
+   sed -n '/^@@/,$p' $*
+}
+
+test_expect_success 'diff-highlight highlights the beginning of a line' '
+   cat >a <<-\EOF &&
+   aaa
+   bbb
+   ccc
+   EOF
+
+   cat >b <<-\EOF &&
+   aaa
+   0bb
+   ccc
+   EOF
+
+   dh_test a b <<-EOF
+   @@ -1,3 +1,3 @@
+aaa
+   -${CW}b${CR}bb
+   +${CW}0${CR}bb
+ccc
+   EOF
+'
+
+test_expect_success 'diff-highlight highlights the end of a line' '
+   cat >a <<-\EOF &&
+   aaa
+   bbb
+   ccc
+   EOF
+
+   cat >b <<-\EOF &&
+   aaa
+   bb0
+   ccc
+   EOF
+
+   dh_test a b <<-EOF
+   @@ -1,3 +1,3 @@
+aaa
+   -bb${CW}b${CR}
+   +bb${CW}0${CR}
+ccc
+   EOF
+'
+
+test_expect_success 'diff-highlight highlights the middle of a line' '
+   cat >a <<-\EOF &&
+   aaa
+   bbb
+   ccc
+   EOF
+
+   cat >b <<-\EOF &&
+   aaa
+   b0b
+   ccc
+   EOF
+
+   dh_test a b <<-EOF
+   @@ -1,3 +1,3 @@
+aaa
+   -b${CW}b${CR}b
+   +b${CW}0${CR}b
+ccc
+   EOF
+'
+
+test_expect_success 'diff-highlight does not highlight whole line' '
+   cat >a <<-\EOF &&
+   aaa
+   bbb
+   ccc
+   EOF
+
+   cat >b <<-\EOF &&
+   aaa
+   000
+   ccc
+   EOF
+
+   dh_test a b <<-EOF
+   @@ -1,3 +1,3 @@
+aaa
+   -bbb
+   +000
+ccc
+   EOF
+'
+
+test_expect_failure 'diff-highlight highlights mismatched hunk size' '
+   cat >a <<-\EOF &&
+   aaa
+   bbb
+   EOF
+
+   cat >b <<-\EOF &&
+ 

[PATCH v4 2/3] diff-highlight: add failing test for handling --graph output.

2016-08-29 Thread Brian Henderson
---
 contrib/diff-highlight/t/t9400-diff-highlight.sh | 60 
 1 file changed, 60 insertions(+)

diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh 
b/contrib/diff-highlight/t/t9400-diff-highlight.sh
index 7c303f7..54e11fe 100755
--- a/contrib/diff-highlight/t/t9400-diff-highlight.sh
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -49,6 +49,55 @@ test_strip_patch_header () {
sed -n '/^@@/,$p' $*
 }
 
+# dh_test_setup_history generates a contrived graph such that we have at least
+# 1 nesting (E) and 2 nestings (F).
+#
+#    A branch
+#   /
+#  D---E---F master
+#
+#  git log --all --graph
+#  * commit
+#  |    A
+#  | * commit
+#  | |    F
+#  | * commit
+#  |/
+#  |    E
+#  * commit
+#       D
+#
+dh_test_setup_history () {
+   echo "file1" >file1 &&
+   echo "file2" >file2 &&
+   echo "file3" >file3 &&
+
+   cat file1 >file &&
+   git add file &&
+   git commit -m "D" &&
+
+   git checkout -b branch &&
+   cat file2 >file &&
+   git commit -am "A" &&
+
+   git checkout master &&
+   cat file2 >file &&
+   git commit -am "E" &&
+
+   cat file3 >file &&
+   git commit -am "F"
+}
+
+left_trim () {
+   "$PERL_PATH" -pe 's/^\s+//'
+}
+
+trim_graph () {
+   # graphs start with * or |
+   # followed by a space or / or \
+   "$PERL_PATH" -pe 's@^((\*|\|)( |/|\\))+@@'
+}
+
 test_expect_success 'diff-highlight highlights the beginning of a line' '
cat >a <<-\EOF &&
aaa
@@ -160,4 +209,15 @@ test_expect_failure 'diff-highlight highlights mismatched hunk size' '
 
 # TODO add multi-byte test
 
+test_expect_failure 'diff-highlight works with the --graph option' '
+   dh_test_setup_history &&
+
+   # topo-order so that the order of the commits is the same as with --graph
+   # trim graph elements so we can do a diff
+   # trim leading space because our trim_graph is not perfect
+   git log --branches -p --topo-order | "$DIFF_HIGHLIGHT" | left_trim >graph.exp &&
+   git log --branches -p --graph | "$DIFF_HIGHLIGHT" | trim_graph | left_trim >graph.act &&
+   test_cmp graph.exp graph.act
+'
+
 test_done
-- 
2.9.3



Re: Reducing CPU load on git server

2016-08-29 Thread Jeff King
On Mon, Aug 29, 2016 at 12:46:27PM +0200, Jakub Narębski wrote:

> > So your load is probably really spiky, as you get thundering herds of
> > fetchers after every push (the spikes may have a long flatline at the
> > top, as it takes time to process the whole herd).
> 
> One solution I have heard about, in the context of web cache, to reduce
> the thundering herd problem (there caused by cache expiring at the same
> time in many clients) was to add some random or quasi-random distribution
> to expiration time.  In your situation adding a random delay with some
> specified deviation could help.

That smooths the spikes, but you still have to serve all of the requests
eventually. So if your problem is that the load spikes and the system
slows to a crawl as a result (or runs out of RAM, etc), then
distributing the load helps. But if you have enough load that your
system is constantly busy, queueing the load in a different order just
shifts it around.

GHE will also introduce delays into starting git when load spikes, but
that's a separate system from the one that coalesces identical requests.

> I wonder if this system for coalescing multiple fetches is something
> generic, or is it something specific to GitHub / GitHub Enterprise
> architecture?  If it is the former, would it be considered for
> upstreaming, and if so, when it would be in Git itself?

I've already sent upstream the patch for a "hook" that sits between
upload-pack and pack-objects (and it will be in v2.10). So that can call
an arbitrary script which can then make scheduling policy for
pack-objects, coalesce similar requests, etc.

GHE has a generic tool for coalescing program invocations that is not
Git-specific at all (it compares its stdin and command line arguments to
decide when two requests are identical, runs the command on its
arguments, and then passes the output to all members of the herd). That
_might_ be open-sourced in the future, but I don't have a specific
timeline.

> One thing to note: if you have repositories which are to have the
> same contents, you can distribute the pack-file to them and update
> references without going through Git.  It can be done on push
> (push to master, distribute to mirrors), or as part of fetch
> (master fetches from central repository, distributes to mirrors).
> I think; I have never managed a large set of replicated Git repositories.

Doing it naively has some gotchas, because you want to make sure you
have all of the necessary objects. But if you are going this route,
probably distributing a git-bundle is the simplest way.

> > Generally no, they should not conflict. Writes into the object database
> > can happen simultaneously. Ref updates take a per-ref lock, so you
> > should generally be able to write two unrelated refs at once. The big
> > exception is that ref deletion requires taking a repo-wide lock, but
> > that presumably wouldn't be a problem for your case.
> 
> Doesn't Git avoid taking locks, and use lockless synchronization
> mechanisms (though possibly equivalent to locks)?  I think it takes
> lockfile to update reflog together with reference, but if reflogs
> are turned off (and I think they are off for bare repositories by
> default), ref update uses "atomic file write" (write + rename)
> and compare-and-swap primitive.  Updating repository is lock-free:
> first update repository object database, then reference.

There is a lockfile to make the compare-and-swap atomic, but yes, it's
fundamentally based around the compare-and-swap. I don't think that
matters to the end user though. Fundamentally they will see "I hoped to
move from X to Y, but somebody else wrote Z, aborting", which is the
same as "I did not win the lock race, aborting".

The point is that updating two different refs is generally independent,
and updating the same ref is not.
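
To make that concrete, here is a minimal stand-alone sketch of the
compare-and-swap idea (an illustration under simplified assumptions,
not git's actual refs code; the file names and values are made up):
the O_CREAT|O_EXCL open of the "*.lock" file is the mutual exclusion,
and rename() is the atomic publish step.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Move "path" from old_value to new_value; return 0 on success, -1 if
 * we lost the lock race or the stored value was not old_value. */
static int cas_update(const char *path, const char *old_value,
		      const char *new_value)
{
	char lock[4096], cur[64] = "";
	FILE *f;
	int fd;

	snprintf(lock, sizeof(lock), "%s.lock", path);
	fd = open(lock, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return -1;			/* somebody else holds the lock */

	if ((f = fopen(path, "r")) != NULL) {
		if (fgets(cur, sizeof(cur), f))
			cur[strcspn(cur, "\n")] = '\0';
		fclose(f);
	}
	if (strcmp(cur, old_value)) {		/* "somebody else wrote Z" */
		close(fd);
		unlink(lock);
		return -1;
	}
	if (write(fd, new_value, strlen(new_value)) != (ssize_t)strlen(new_value)) {
		close(fd);
		unlink(lock);
		return -1;
	}
	if (close(fd) < 0 || rename(lock, path) < 0) {
		unlink(lock);
		return -1;
	}
	return 0;
}

int main(void)
{
	FILE *f = fopen("ref", "w");

	if (!f)
		return 1;
	fputs("X\n", f);
	fclose(f);
	/* move the ref from X to Y; a concurrent updater loses the race */
	return cas_update("ref", "X", "Y") ? 1 : 0;
}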

> I guess that trying to replicate DGit approach that GitHub uses, see
> "Introducing DGit" (http://githubengineering.com/introducing-dgit)
> is currently out of question?

Minor nitpick (that you don't even have any way of knowing about, so
maybe more of a public service announcement). GitHub will stop using the
"DGit" name because it's too confusingly similar to "Git" (and "Git" is
trademarked by the project). There's a new blog post coming that
mentions the name change, and that historic one will have a note added
retroactively. The new name is "GitHub Spokes" (get it, Hub, Spokes?).

But in response to your question, I'll caution that replicating it is a
lot of work. :)

Since the original problem report mentions GHE, I'll note that newer
versions of GHE do support clustering and can share the git load across
multiple Spokes servers. So in theory that could make the replica layer
go away entirely, because it all happens behind the scenes.

-Peff

PS Sorry, I generally try to avoid hawking GitHub wares on the list, but
   since the OP mentioned GHE specifically, and because there aren't
   really generic solutions to most of these things, I do think 

Re: [PATCH v14 08/27] bisect--helper: `is_expected_rev` & `check_expected_revs` shell function in C

2016-08-29 Thread Junio C Hamano
Pranit Bauva  writes:

> Hey Junio,
>
> On Thu, Aug 25, 2016 at 3:43 AM, Junio C Hamano  wrote:
>> Pranit Bauva  writes:
>>
>>> +static int is_expected_rev(const char *expected_hex)
>>> +{
>>> + struct strbuf actual_hex = STRBUF_INIT;
>>> + int res = 0;
>>> + if (strbuf_read_file(&actual_hex, git_path_bisect_expected_rev(), 0) >= 0) {
>>> + strbuf_trim(&actual_hex);
>>> + res = !strcmp(actual_hex.buf, expected_hex);
>>
>> If it is known to have 40-hex:
>>
>>  (1) accepting ">= 0" seems way too lenient.  You only expect a
>>  41-byte file (or 42 if somebody would write CRLF, but I do not
>>  think anybody other than yourself is expected to write into
>>  this file, and you do not write CRLF yourself);
>>
>>  (2) strbuf_trim() is overly loose.  You only want to trim the
>>  terminating LF and it is an error to have other trailing
>>  whitespaces.
>>
>> I think the latter is not a new problem and it is OK to leave it
>> as-is; limiting (1) to >= 40 may still be a good change, though,
>> because it makes the intention of the code clearer.
>
> I can do the first change. Since this wasn't present in the shell
> code, I will also mention it in the commit message.

There is no need for that, as that was present in the original.  The
original compared the string with something that is known to be
40-hex, so anything shorter than 40 wouldn't have matched.

It is merely a trivial and obvious optimization to do that in the C
rewrite.
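
For illustration, the tightened check could look like this (just a
sketch of the suggested change, not the actual patch; it assumes the
series' git_path_bisect_expected_rev() helper and git's strbuf API):

static int is_expected_rev(const char *expected_hex)
{
	struct strbuf actual_hex = STRBUF_INIT;
	int res = 0;

	/* a well-formed file holds 40 hex digits plus a terminating LF */
	if (strbuf_read_file(&actual_hex, git_path_bisect_expected_rev(), 0) >= 40) {
		strbuf_trim(&actual_hex);
		res = !strcmp(actual_hex.buf, expected_hex);
	}
	strbuf_release(&actual_hex);
	return res;
}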


Re: [PATCH v14 12/27] bisect--helper: `get_terms` & `bisect_terms` shell function in C

2016-08-29 Thread Junio C Hamano
Pranit Bauva  writes:

>>> +static int bisect_terms(struct bisect_terms *terms, const char **argv, int 
>>> argc)
>>> +{
>>> + int i;
>>> +
>>> + if (get_terms(terms)) {
>>> + fprintf(stderr, _("no terms defined\n"));
>>> + return -1;
>>> + }
>>> + if (argc == 0) {
>>> + printf(_("Your current terms are %s for the old state\nand "
>>> +"%s for the new state.\n"), terms->term_good.buf,
>>> +terms->term_bad.buf);
>>> + return 0;
>>> + }
>>> +
>>> + for (i = 0; i < argc; i++) {
>>> + if (!strcmp(argv[i], "--term-good"))
>>> + printf("%s\n", terms->term_good.buf);
>>> + else if (!strcmp(argv[i], "--term-bad"))
>>> + printf("%s\n", terms->term_bad.buf);
>>> + else
>>> + printf(_("invalid argument %s for 'git bisect "
>>> +   "terms'.\nSupported options are: "
>>> +   "--term-good|--term-old and "
>>> +   "--term-bad|--term-new."), argv[i]);
>>> + }
>>
>> The original took only one and gave one answer (and errored out when
>> the user asked for more), but this one loops.  I can see either way
>> is OK and do not think of a good reason to favor one over the other;
>> unless there is a strong reason why you need this extended behaviour
>> that allows users to ask multiple questions, I'd say we should keep
>> the original behaviour.
>
> True! I can just use return error() instead of printf. Also I noticed
> that this is printing to stdout while the original printed it to
> stderr. Thanks!

The original that you removed, because the above takes over its role, is
this part of your patch.

-bisect_terms () {
-   get_terms
-   if ! test -s "$GIT_DIR/BISECT_TERMS"
-   then
-   die "$(gettext "no terms defined")"
-   fi
-   case "$#" in
-   0)
-   gettextln "Your current terms are $TERM_GOOD for the old state
-and $TERM_BAD for the new state."
-   ;;
-   1)
-   arg=$1
-   case "$arg" in
-   --term-good|--term-old)
-   printf '%s\n' "$TERM_GOOD"
-   ;;
-   --term-bad|--term-new)
-   printf '%s\n' "$TERM_BAD"
-   ;;
-   *)
-   die "$(eval_gettext "invalid argument \$arg for 
'git bisect terms'.
-Supported options are: --term-good|--term-old and --term-bad|--term-new.")"
-   ;;
-   esac
-   ;;
-   *)
-   usage ;;
-   esac
-}
-

The fprintf() that says "no terms defined" can be made error().  The
"invalid argument" message used to be die in the original, and
should be sent to the standard error stream as you noticed.

But a bigger difference is that the original made sure that the
caller asked one question at a time.  "terms --term-good --term-bad"
was answered with a "usage" message.  That is no longer true in the
rewrite.
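
Putting these points together, the function could look roughly like
this (a sketch only, following the names used in the series; the exact
messages and the extra-argument handling are illustrative):

static int bisect_terms(struct bisect_terms *terms, const char **argv, int argc)
{
	if (get_terms(terms))
		return error(_("no terms defined"));

	if (argc == 0) {
		printf(_("Your current terms are %s for the old state\nand "
			 "%s for the new state.\n"),
		       terms->term_good.buf, terms->term_bad.buf);
		return 0;
	}
	if (argc != 1)
		return error(_("'git bisect terms' takes at most one argument"));

	if (!strcmp(argv[0], "--term-good") || !strcmp(argv[0], "--term-old"))
		printf("%s\n", terms->term_good.buf);
	else if (!strcmp(argv[0], "--term-bad") || !strcmp(argv[0], "--term-new"))
		printf("%s\n", terms->term_bad.buf);
	else
		return error(_("invalid argument %s for 'git bisect terms'.\n"
			       "Supported options are: --term-good|--term-old and "
			       "--term-bad|--term-new."), argv[0]);
	return 0;
}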



Re: [PATCH v14 07/27] bisect--helper: `bisect_reset` shell function in C

2016-08-29 Thread Junio C Hamano
Pranit Bauva  writes:

>> with the original
>>
>> case $# in
>> 0)  reset to the branch ;;
>> 1)  reset to the commit ;;
>> *)  give usage and die ;;
>> esac
>>
>> and took the difference and reacted "ah, excess parameters are not
>> diagnosed in this function".
>>
>> Your caller does complain about excess parameters without giving
>> usage, and that is what I missed.
>>
>> I am not sure if you intended to change the behaviour in this case
>> to avoid giving the usage string; I tend to think it is a good
>> change, but I didn't see it mentioned in the proposed commit log,
>> which also contributed to my not noticing the test in the caller.
>
> I could include this in the commit message.

Nah, it was something anybody could notice with 2 more minutes of
reading and pondering from the patch text alone.  Not worth spending
more time on the log message on this one.

Thanks.


Re: [PATCH v6 00/13] Git filter protocol

2016-08-29 Thread Lars Schneider

> On 25 Aug 2016, at 13:07, larsxschnei...@gmail.com wrote:
> 
> From: Lars Schneider 
> 
> The goal of this series is to avoid launching a new clean/smudge filter
> process for each file that is filtered.
> 
> A short summary about v1 to v5 can be found here:
> https://git.github.io/rev_news/2016/08/17/edition-18/
> 
> This series is also published on web:
> https://github.com/larsxschneider/git/pull/10

Hi Junio,

I am working on v7 of this patch series and I noticed that it does
not merge cleanly anymore (not even with git/git master). Should
I rebase my patch series? If yes, what would be the most
convenient base for you?

Thanks,
Lars



Re: [PATCH 01/20] cache: convert struct cache_entry to use struct object_id

2016-08-29 Thread Johannes Schindelin
Hi Kuba,

On Mon, 29 Aug 2016, Jakub Narębski wrote:

> W dniu 29.08.2016 o 01:27, brian m. carlson pisze:
> 
> > Convert struct cache_entry to use struct object_id by applying the
> > following semantic patch and the object_id transforms from contrib:
> > 
> > @@
> > struct cache_entry E1;
> > @@
> > - E1.sha1
> > + E1.oid.hash
> > 
> > @@
> > struct cache_entry *E1;
> > @@
> > - E1->sha1
> > + E1->oid.hash
> 
> I wonder if writing this patch series (or rather the following one)
> would be helped by using one of semantic patch tools, such as
> Coccinelle[1], spdiff[2], or Undebt[3]...
> 
> [1]: http://coccinelle.lip6.fr/

If previous work by Brian is any indication, he did use Coccinelle and the
commit message actually shows the definition used for the transformation.
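
For readers unfamiliar with semantic patches: applied to simplified,
stand-in definitions (illustration only, not git's real structures),
the transform mechanically rewrites every direct hash access:

#include <string.h>

struct object_id {
	unsigned char hash[20];
};

struct cache_entry {
	struct object_id oid;	/* was: unsigned char sha1[20]; */
};

/* a call site after the transform: ce->sha1 became ce->oid.hash */
static void set_entry_hash(struct cache_entry *ce, const unsigned char *sha1)
{
	memcpy(ce->oid.hash, sha1, sizeof(ce->oid.hash));
}

int main(void)
{
	struct cache_entry ce;
	unsigned char sha1[20] = { 0 };

	set_entry_hash(&ce, sha1);
	return 0;
}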

Ciao,
Johannes

Re: [PATCH v5 09/12] doc: revisions - define `reachable`

2016-08-29 Thread Jakub Narębski
W dniu 29.08.2016 o 15:21, Philip Oakley pisze:
> From: "Jakub Narębski" 
> Sent: Sunday, August 28, 2016 2:01 PM
>> W dniu 12.08.2016 o 09:07, Philip Oakley pisze:
[...]

>>> +For these commands,
>>> +specifying a single revision, using the notation described in the
>>> +previous section, means the set of commits `reachable` from the given
>>> +commit.
[...]
>>> +
>>> +A commit's reachable set is the commit itself and the commits in
>>> +its ancestry chain.
>>> +
>>
>> It is all right, but perhaps it would be better to just repeat:
>>
>>  +Set of commits reachable from given commit is the commit itself
>>  +and all the commits in its ancestry chain.
> 
> It's very easy to go around in circles here. The original issue was
> the A..B notation for the case where A is a direct descendant of B,
> such that new users, or users more familiar with range notations from
> elsewhere, would expect that the A..B range is *inclusive*, rather
> than an open / closed interval. It was the addressing of that problem
> that lead to the updating of the 'reachability' definition.

All right, I can see that.  It is a worthwhile goal.

> The main part of my sentence formation was to make the 'reachable'
> part the defining element, rather than being a feature of the set.
> Maybe it's using the 'set' viewpoint that is distracting?

On one hand, the "A commit's reachable set is ..." approach puts
'reachable' upfront.  On the other hand it introduces new terminology,
namely 'reachable set' (or 'reachable set of a commit' to be more
exact)...  it doesn't read that well to me, but I am not a native
speaker.

But as I wrote, this is quite all right anyway
-- 
Jakub Narębski



Re: [PATCH 01/20] cache: convert struct cache_entry to use struct object_id

2016-08-29 Thread Jakub Narębski
W dniu 29.08.2016 o 01:27, brian m. carlson pisze:

> Convert struct cache_entry to use struct object_id by applying the
> following semantic patch and the object_id transforms from contrib:
> 
> @@
> struct cache_entry E1;
> @@
> - E1.sha1
> + E1.oid.hash
> 
> @@
> struct cache_entry *E1;
> @@
> - E1->sha1
> + E1->oid.hash

I wonder if writing this patch series (or rather the following one)
would be helped by using one of semantic patch tools, such as
Coccinelle[1], spdiff[2], or Undebt[3]...

[1]: http://coccinelle.lip6.fr/
[2]: http://www.diku.dk/~jespera/doc.html
[3]: 
http://engineeringblog.yelp.com/2016/08/undebt-how-we-refactored-3-million-lines-of-code.html

Best,
-- 
Jakub Narębski



[PATCH v2] t/Makefile: add a rule to re-run previously-failed tests

2016-08-29 Thread Johannes Schindelin
While developing patch series, it is a good practice to run the test
suite from time to time, just to make sure that obvious bugs are caught
early.  With complex patch series, it is common to run `make -j15 -k
test`, i.e.  run the tests in parallel and *not* stop at the first
failing test but continue. This has the advantage of identifying
possibly multiple problems in one big test run.

It is particularly important to reduce the turn-around time this way on
Windows, where the test suite takes 45 minutes on the computer on which
this patch was developed.

The most convenient way to determine which tests failed after running
the entire test suite in parallel is to look for left-over "trash
directory.t*" subdirectories in the t/ subdirectory. However, as was
pointed out by Jeff King, those directories might live outside t/ when
overridden using the --root=<directory> option, to which the Makefile
has no access. The next best method is to grep explicitly for failed
tests in the test-results/ directory, which the Makefile *can* access.

This patch automates the process of determining which tests failed
previously and re-running them.

Note that we need to be careful to inspect only the *newest* entries in
test-results/: this directory contains files of the form
t<test-number>-<pid>.counts and is only removed wholesale when running the
*entire* test suite, not when running individual tests. We ensure that
with a little sed magic on `ls -t`'s output that simply skips a line
when its test name was already seen on an earlier line.

Signed-off-by: Johannes Schindelin 
---

The patch is unfortunately no longer as trivial as before, but it
now also works with --root=..., i.e. when the user overrode the
location of the trash directories.

Published-As: https://github.com/dscho/git/releases/tag/failing-tests-v2
Fetch-It-Via: git fetch https://github.com/dscho/git failing-tests-v2
Interdiff vs v1:

 diff --git a/t/Makefile b/t/Makefile
 index c402a9ec..8aa6a72 100644
 --- a/t/Makefile
 +++ b/t/Makefile
 @@ -35,7 +35,12 @@ all: $(DEFAULT_TEST_TARGET)
  test: pre-clean $(TEST_LINT)
$(MAKE) aggregate-results-and-cleanup
  
 -failed: $(patsubst trash,,$(patsubst directory.%,%.sh,$(wildcard trash\ directory.t[0-9]*)))
 +failed:
 +  @failed=$$(cd '$(TEST_RESULTS_DIRECTORY_SQ)' && \
 +  grep -l '^failed [1-9]' $$(ls -t *.counts | \
 +  sed 'G;h;/^\(t[^.]*\)-[0-9]*\..*\n\1-[0-9]*\./d;P;d') | \
 +  sed -n 's/-[0-9]*\.counts$$/.sh/p') && \
 +  test -z "$$failed" || $(MAKE) $$failed
  
  prove: pre-clean $(TEST_LINT)
@echo "*** prove ***"; $(PROVE) --exec '$(SHELL_PATH_SQ)' 
$(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS)


 t/Makefile | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/t/Makefile b/t/Makefile
index d613935..8aa6a72 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -35,6 +35,13 @@ all: $(DEFAULT_TEST_TARGET)
 test: pre-clean $(TEST_LINT)
$(MAKE) aggregate-results-and-cleanup
 
+failed:
+   @failed=$$(cd '$(TEST_RESULTS_DIRECTORY_SQ)' && \
+   grep -l '^failed [1-9]' $$(ls -t *.counts | \
+   sed 'G;h;/^\(t[^.]*\)-[0-9]*\..*\n\1-[0-9]*\./d;P;d') | \
+   sed -n 's/-[0-9]*\.counts$$/.sh/p') && \
+   test -z "$$failed" || $(MAKE) $$failed
+
 prove: pre-clean $(TEST_LINT)
@echo "*** prove ***"; $(PROVE) --exec '$(SHELL_PATH_SQ)' 
$(GIT_PROVE_OPTS) $(T) :: $(GIT_TEST_OPTS)
$(MAKE) clean-except-prove-cache
-- 
2.10.0.rc1.114.g2bd6b38

base-commit: d5cb9cbd64165153a318e1049f8bf14b09a16b11


Re: [PATCH v5 09/12] doc: revisions - define `reachable`

2016-08-29 Thread Philip Oakley

From: "Jakub Narębski" 
Sent: Sunday, August 28, 2016 2:01 PM

W dniu 12.08.2016 o 09:07, Philip Oakley pisze:
[...]


 History traversing commands such as `git log` operate on a set
-of commits, not just a single commit.  To these commands,
-specifying a single revision with the notation described in the
-previous section means the set of commits reachable from that
-commit, following the commit ancestry chain.
+of commits, not just a single commit.
+
+For these commands,
+specifying a single revision, using the notation described in the
+previous section, means the set of commits `reachable` from the given
+commit.


Why such strange formatting?  Is it to keep original contents
for better blame / word diff?


Strange as in 'not reflowed'? - yes, that was the reason. It can be hard to
see the wood for the trees if there is a lot of reflow going on, as the
reflow noise can hide the key change.


I did slightly abuse notation by quoting `reachable` to indicate that it's a
term with a precise definition that can be confusing to those who don't
know it. I also split out the reachability parts into their own paragraphs.





+
+A commit's reachable set is the commit itself and the commits in
+its ancestry chain.
+


It is all right, but perhaps it would be better to just repeat:

 +Set of commits reachable from given commit is the commit itself
 +and all the commits in its ancestry chain.


It's very easy to go around in circles here. The original issue was the A..B 
notation for the case where A is a direct descendant of B, such that new 
users, or users more familiar with range notations from elsewhere, would 
expect that the A..B range is *inclusive*, rather than an open / closed 
interval. It was the addressing of that problem that lead to the updating of 
the 'reachability' definition.


The main part of my sentence formation was to make the 'reachable' part the 
defining element, rather than being a feature of the set. Maybe it's using 
the 'set' viewpoint that is distracting?


--
Jakub Narębski






Re: [PATCH 12/15] sequencer: lib'ify save_opts()

2016-08-29 Thread Johannes Schindelin
Hi Junio,

On Fri, 26 Aug 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >  static int pick_commits(struct commit_list *todo_list, struct replay_opts 
> > *opts)
> > @@ -1128,9 +1130,9 @@ int sequencer_pick_revisions(struct replay_opts *opts)
> > return -1;
> > if (get_sha1("HEAD", sha1) && (opts->action == REPLAY_REVERT))
> > return error(_("Can't revert as initial commit"));
> > -   if (save_head(sha1_to_hex(sha1)))
> > +   if (save_head(sha1_to_hex(sha1)) ||
> > +   save_opts(opts))
> > return -1;
> > -   save_opts(opts);
> 
> I think it is much easier to read to keep this on a single line.  It
> would be more verbose but an even easier would be to keep these two
> operations two separate steps, i.e.
> 
> if (save_head())
> return -1;
> if (save_opts())
> return -1;

Done,
Dscho


Re: [PATCH 07/15] sequencer: lib'ify read_populate_opts()

2016-08-29 Thread Johannes Schindelin
Hi Junio,

On Fri, 26 Aug 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > -static void read_populate_opts(struct replay_opts **opts_ptr)
> > +static int read_populate_opts(struct replay_opts **opts)
> >  {
> > if (!file_exists(git_path_opts_file()))
> > -   return;
> > -   if (git_config_from_file(populate_opts_cb, git_path_opts_file(), 
> > *opts_ptr) < 0)
> > -   die(_("Malformed options sheet: %s"), git_path_opts_file());
> > +   return 0;
> > +   if (git_config_from_file(populate_opts_cb, git_path_opts_file(), *opts) 
> > < 0)
> > +   return error(_("Malformed options sheet: %s"),
> > +   git_path_opts_file());
> > +   return 0;
> 
> This may not be sufficient to avoid die(), unless we know that the
> file we are reading is syntactically sound.  git_config_from_file()
> will die in config.c::git_parse_source() when the config_source sets
> die_on_error, and it is set in config.c::do_config_from_file().
> 
> The source we are reading from is created when the sequencer
> machinery starts and is only written by save_opts() which is
> called by the sequencer machinery using git_config_set*() calls,
> so I think it is OK to assume that we won't hit errors that would
> cause git_config_from_file() to die, at least for now.

I amended the commit message.

Ciao,
Dscho


Re: [PATCH 08/20] streaming: make stream_blob_to_fd take struct object_id

2016-08-29 Thread Johannes Schindelin
Hi Brian,

On Sun, 28 Aug 2016, brian m. carlson wrote:

> Since all of its callers have been updated, modify stream_blob_to_fd to
> take a struct object_id.
> 
> Signed-off-by: brian m. carlson 

I reviewed the patches until here, and they all look very good to me.

Will continue to review after clearing out my brain with other things (it
is mesmerizing to sift through sha1->oid changes for too long).

Ciao,
Dscho


Re: [PATCH] Documentation/SubmittingPatches: add quotes to advised commit reference

2016-08-29 Thread Michael J Gruber
Junio C Hamano venit, vidit, dixit 27.08.2016 00:42:
> Stefan Beller  writes:
> 
>> Junio finds it is easier to read text when the commit subject is quoted.
>>
>> Signed-off-by: Stefan Beller 
>> ---
>>  Documentation/SubmittingPatches | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
>> index 500230c..a591229 100644
>> --- a/Documentation/SubmittingPatches
>> +++ b/Documentation/SubmittingPatches
>> @@ -123,7 +123,7 @@ archive, summarize the relevant points of the discussion.
>>  
>>  If you want to reference a previous commit in the history of a stable
>>  branch use the format "abbreviated sha1 (subject, date)". So for example
>> -like this: "Commit f86a374 (pack-bitmap.c: fix a memleak, 2015-03-30)
>> +like this: "Commit f86a374 ("pack-bitmap.c: fix a memleak", 2015-03-30)
>>  noticed [...]".
> 
> Thanks, but it is not sufficient as you would not see the need for
> quoting without the example.
> 
> My preference is not a main deciding factor in this case anyway.  A
> more important reason why it makes sense to quote (aside from the
> fact that it makes more sense when reading) is because we already
> have a tool support for that.
> 
> Perhaps something like this instead?
> 
> -- >8 --
> From: Beat Bolli 
> Subject: SubmittingPatches: use gitk's "Copy commit summary" format
> Date: Fri, 26 Aug 2016 18:59:01 +0200
> 
> Update the suggestion in 175d38ca ("SubmittingPatches: document how
> to reference previous commits", 2016-07-28) on the format to refer
> to a commit to match what gitk has been giving since last year with
> its "Copy commit summary" command; also mention this as one of the
> ways to obtain a commit reference in this format.
> 
> Signed-off-by: Beat Bolli 
> ---
>  Documentation/SubmittingPatches | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
> index 500230c..15adb86 100644
> --- a/Documentation/SubmittingPatches
> +++ b/Documentation/SubmittingPatches
> @@ -122,9 +122,14 @@ without external resources. Instead of giving a URL to a mailing list
>  archive, summarize the relevant points of the discussion.
>  
>  If you want to reference a previous commit in the history of a stable
> -branch use the format "abbreviated sha1 (subject, date)". So for example
> -like this: "Commit f86a374 (pack-bitmap.c: fix a memleak, 2015-03-30)
> -noticed [...]".
> +branch, use the format "abbreviated sha1 (subject, date)",
> +with the subject enclosed in a pair of double-quotes, like this:
> +
> +Commit f86a374 ("pack-bitmap.c: fix a memleak", 2015-03-30)
> +noticed that ...
> +
> +The "Copy commit summary" command of gitk can be used to obtain this
> +format.

or (an alias for) the command

git show -s --date=format:%F --pretty='tformat:%h ("%s", %ad)'

>  
>  (3) Generate your patch using Git tools out of your commits.
> 



Re: [PATCH 04/22] sequencer: future-proof remove_sequencer_state()

2016-08-29 Thread Johannes Schindelin
Hi Dennis,

On Mon, 29 Aug 2016, Johannes Schindelin wrote:

> On Mon, 29 Aug 2016, Dennis Kaarsemaker wrote:
> 
> > On ma, 2016-08-29 at 10:04 +0200, Johannes Schindelin wrote:
> > 
> > > +   if (read_and_refresh_cache(opts))
> > > +   return -1;
> > > +
> > 
> > This doesn't seem to be related to the get_dir changes?
> 
> Good eyes.
> 
> Let me investigate why I have it here...

Unfortunately my reflogs got corrupted by the git-worktree
implementation, so I cannot go back that far.

Looking at the code, and after running the tests, I am convinced that it
is a leftover of some misguided attempt to implement "git rebase -i
--abort" in sequencer_rollback().

I removed this hunk from the patch.

Again, Thank you so much for your review!
Dscho

Re: [PATCH 00/22] Prepare the sequencer for the upcoming rebase -i patches

2016-08-29 Thread Johannes Schindelin
Hi Dennis,

On Mon, 29 Aug 2016, Dennis Kaarsemaker wrote:

> On ma, 2016-08-29 at 10:03 +0200, Johannes Schindelin wrote:
> 
> > Therefore I would be most grateful for every in-depth review.
> 
> Tried to do that, but could come up only with a few nits. I think the
> approach is sensible.

Thank you for the review!

Ciao,
Dscho


Re: [PATCH 20/22] sequencer: remember do_recursive_merge()'s return value

2016-08-29 Thread Johannes Schindelin
Hi Dennis,

On Mon, 29 Aug 2016, Dennis Kaarsemaker wrote:

> On ma, 2016-08-29 at 10:06 +0200, Johannes Schindelin wrote:
> 
> > The return value of do_recursive_merge() may be positive (indicating merge
> > conflicts), se let's OR later error conditions so as not to overwrite them
> > with 0.
> 
> s/se/so/?

Good eyes.

Fixed,
Dscho


Re: [PATCH 15/22] sequencer: introduce a helper to read files written by scripts

2016-08-29 Thread Johannes Schindelin
Hi Dennis,

On Mon, 29 Aug 2016, Dennis Kaarsemaker wrote:

> On ma, 2016-08-29 at 10:06 +0200, Johannes Schindelin wrote:
> > +   if (strbuf_read_file(buf, path, 0) < 0) {
> > +   warning_errno("could not read '%s'", path);
> > +   return 0;
> > +   }
> > +
> > +   if (buf->len > orig_len && buf->buf[buf->len - 1] == '\n') {
> > +   if (--buf->len > orig_len && buf->buf[buf->len - 1] == '\r')
> > +   --buf->len;
> > +   buf->buf[buf->len] = '\0';
> > +   }
> 
> Why not use open + strbuf_getline instead of hand-rolling a newline
> eradicator?

Because strbuf_getline() erases the strbuf instead of appending to it
(which is what we sometimes need when converting shell scripts to C).

Ciao,
Dscho

Re: [PATCH 12/22] sequencer: refactor the code to obtain a short commit name

2016-08-29 Thread Johannes Schindelin
Hi Dennis,

On Mon, 29 Aug 2016, Dennis Kaarsemaker wrote:

> On ma, 2016-08-29 at 10:05 +0200, Johannes Schindelin wrote:
> 
> 
> 
> I fail to see the point of this patch, would you mind enlightening me?

Two reasons:

1) by refactoring it into a function, the code is more DRY (with all the
advantages that come with it, such as: only a single point to change if
changing the behavior)

2) it is easier to reuse the code in upcoming patches (that would be in
the next patch series)

Will amend the commit message.

Ciao,
Dscho


Re: [PATCH 04/22] sequencer: future-proof remove_sequencer_state()

2016-08-29 Thread Johannes Schindelin
Hi Dennis,

On Mon, 29 Aug 2016, Dennis Kaarsemaker wrote:

> On ma, 2016-08-29 at 10:04 +0200, Johannes Schindelin wrote:
> 
> > +   if (read_and_refresh_cache(opts))
> > +   return -1;
> > +
> 
> This doesn't seem to be related to the get_dir changes?

Good eyes.

Let me investigate why I have it here...

Ciao,
Dscho

Re: [PATCH 01/22] sequencer: use static initializers for replay_opts

2016-08-29 Thread Johannes Schindelin
Hi Dennis,

On Mon, 29 Aug 2016, Dennis Kaarsemaker wrote:

> On ma, 2016-08-29 at 10:03 +0200, Johannes Schindelin wrote:
> 
> > +#define REPLAY_OPTS_INIT { -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, 
> > NULL, NULL, 0, 0, NULL }
> 
> This looked off to me, as it replaces memset(..., 0, ...) so is not
> 100% equivalent. But the changed functions both set opts.action and
> call parse_args which sets opts.subcommand.

Okay... Do you want me to change anything?

Ciao,
Dscho


Re: [PATCH 01/20] cache: convert struct cache_entry to use struct object_id

2016-08-29 Thread Johannes Schindelin
Hi Brian,

On Sun, 28 Aug 2016, brian m. carlson wrote:

> Convert struct cache_entry to use struct object_id by applying the
> following semantic patch and the object_id transforms from contrib:
> 
> @@
> struct cache_entry E1;
> @@
> - E1.sha1
> + E1.oid.hash
> 
> @@
> struct cache_entry *E1;
> @@
> - E1->sha1
> + E1->oid.hash
> 
> Signed-off-by: brian m. carlson 

Makes sense, iff you find:

> diff --git a/cache.h b/cache.h
> index b780a91a..a679484e 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -173,7 +173,7 @@ struct cache_entry {
>   unsigned int ce_flags;
>   unsigned int ce_namelen;
>   unsigned int index; /* for link extension */
> - unsigned char sha1[20];
> + struct object_id oid;
>   char name[FLEX_ARRAY]; /* more */
>  };

(which is unfortunately buried deep in the diff...)

Ciao,
Dscho


Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]

2016-08-29 Thread Johannes Schindelin
Hi Kuba,

On Sun, 28 Aug 2016, Jakub Narębski wrote:

> W dniu 25.08.2016 o 15:21, Johannes Schindelin pisze:
> > On Mon, 22 Aug 2016, Jakub Narębski wrote:
> >> W dniu 22.08.2016 o 15:18, Johannes Schindelin pisze:
> >>
> >>> So unfortunately this thread has devolved. Which is sad. Because all
> >>> I wanted is to have a change in Git's submission process that would
> >>> not exclude *so many* developers. That is really all I care about.
> >>> Not about tools. Not about open vs proprietary, or standards.
> >>>
> >>> I just want developers who are already familiar with Git, and come
> >>> up with an improvement to Git itself, to be able to contribute it
> >>> without having to pull out their hair in despair.
> >>
> >> What is lacking in using submitGit tool for those that have problems
> >> with sending patches via email?
> > 
> > Where do I start? And where do I stop? Here is a *very* brief list of
> > issues from the top of my head (and the history of my web browser):
> 
> Let me reorder those issues and group them into specific categories.

I am afraid that this is not the direction I was interested in.

You asked how submitGit fell short of my goal to provide a more convenient
and more efficient submission process, and I listed a couple of reasons.

I am really more interested in a better process, than in a point-by-point
discussion of the reasons I listed.

And some of those discussions really only distract. This comment, for
example, seems to be designed to veer off in a direction that, quite
frankly, does not matter for what I *really* wanted to discuss:

> > - submitGit requires you to go to a separate website to interact with the
> >   submitGit web app. Would be so much nicer if it were a bot operating on
> >   PRs.
> 
> [...]
> 
> Also, for some people registering on GitHub is "yet another service"... ;-)

I am really curious how this is supposed to keep the discussion rational.

*Of course* I implied that it is bad enough to require contributors to
register with one web service, and that requiring to register with *yet
another service* is just excessive.

Sheesh.

> > - You cannot Cc: people explicitly:
> >   https://github.com/rtyley/submitgit/issues/31
> > 
> > - submitGit does not include any interdiff
> 
> These are, I think, mainly related to lack of support for series *iteration*,
> for sending new version of patch series / of pull request.

Yes, of course. Because that is what makes the process so particularly
tedious, thankyouverymuch.

> I don't know how well GitHub pull requests deal with iteration and
> refinement, and what is available as API to apps such as submitGit.

Hmm. Then it may be a good idea if I let you find out before we continue
to discuss these bullet points that I never wanted to discuss.

> > - submitGit would require a substantial effort from me to learn how to
> >   extend/fix it, to run the web app locally and run its tests. That is a
> >   rather steep hurdle.
> 
> Well, you cannot extend/fix GitHub at all yourself, can you? ;-P

Sure you can. There is an API and you can register hooks. You can do even
more advanced stuff [*1*].

> >> Submitting changes in Git comes in three phases:
> >>  - submit email with patches
> >>  - review and discuss patch
> >>  - apply patches from email
> > 
> > You forgot a really crucial step. Maybe you did not go through dozens of
> > iterations in your patch series as I regularly do, or something, so it is
> > probably easy for you to forget:
> > 
> >   - find the commit in question, run rebase -i and patch it as suggested
> > 
> > This is something that costs me quite some time to do. It is easily the
> > most annoying aspect of the mail list-based approach for me.
> 
> I probably don't have as many topics in flight, and maybe the number of
> iterations is smaller, but I don't remember having troubles with that.

Well, aren't you lucky.

I bet you did not intend to sound condescending there, even.

> > It is only projects such as Linux, Cygwin and Git itself who refuse to
> > allow for tools that would let the majority of potential contributors
> > stick with their favorite way to read and write mails (I am talking
> > about users of GMail and Outlook, of course).
> 
> Those are projects that started before GitHub (for obvious reasons), and
> which created a good enough workflow based on email.  It might be that
> they are ossified relics, or it might be that they don't want to trade
> advantages of email based workflow for advantages of web app based
> workflow.

Those are projects that started before Git was invented. Yet all three
projects traded the advantages of patches and tarballs for advantages of
using Git.

> First, email clients and Usenet news readers support threading.  I
> haven't found a good web app that supports threading well (though it
> might be a matter of small set of such apps examined by me).  They allow
> marking and labeling posts to return back later.
> 
> Second, email allows offline work, and handle 

Re: Reducing CPU load on git server

2016-08-29 Thread Jakub Narębski
W dniu 29.08.2016 o 07:47, Jeff King pisze:
> On Sun, Aug 28, 2016 at 12:42:52PM -0700, W. David Jarvis wrote:
> 
>> The actual replication process works as follows:
>>
>> 1. The primary git server receives a push and sends a webhook with the
>> details of the push (repo, ref, sha, some metadata) to a "publisher"
>> box
>>
>> 2. The publisher enqueues the details of the webhook into a queue
>>
>> 3. A fleet of "subscriber" (replica) boxes each reads the payload of
>> the enqueued message. Each of these then tries to either clone the
>> repository if they don't already have it, or they run `git fetch`.
> 
> So your load is probably really spiky, as you get thundering herds of
> fetchers after every push (the spikes may have a long flatline at the
> top, as it takes time to process the whole herd).

One solution I have heard about, in the context of web cache, to reduce
the thundering herd problem (there caused by cache expiring at the same
time in many clients) was to add some random or quasi-random distribution
to expiration time.  In your situation adding a random delay with some
specified deviation could help.
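
For instance (a hypothetical subscriber-side helper; the numbers and
the uniform distribution are made up for illustration):

#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* sleep base +/- deviation seconds, so that the replicas in the herd
 * do not all hit the primary at the same instant */
static void jittered_sleep(unsigned int base, unsigned int deviation)
{
	int offset = (int)(rand() % (2 * deviation + 1)) - (int)deviation;
	long delay = (long)base + offset;

	if (delay > 0)
		sleep((unsigned int)delay);
}

int main(void)
{
	srand((unsigned int)time(NULL) ^ (unsigned int)getpid());
	jittered_sleep(30, 10);	/* fetch after 20..40 seconds */
	return 0;
}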

Note however that it is, I think, incompatible (to some extent) with
the "caching" solution, where the 'thundering herd' gets served the same
packfile.  Or at least one solution can reduce the positive effect
of the other.

>> 1. We currently run a blanket `git fetch` rather than specifically
>> fetching the ref that was pushed. My understanding from poking around
>> the git source code is that this causes the replication server to send
>> a list of all of its ref tips to the primary server, and the primary
>> server then has to verify and compare each of these tips to the ref
>> tips residing on the server.

[...]
> There's nothing in upstream git to help smooth these loads, but since
> you mentioned GitHub Enterprise, I happen to know that it does have a
> system for coalescing multiple fetches into a single pack-objects. I
> _think_ it's in GHE 2.5, so you might check which version you're
> running (and possibly also talk to GitHub Support, who might have more
> advice; there are also tools for finding out which git processes are
> generating the most load, etc).

I wonder if this system for coalescing multiple fetches is something
generic, or is it something specific to GitHub / GitHub Enterprise
architecture?  If it is the former, would it be considered for
upstreaming, and if so, when it would be in Git itself?


One thing to note: if you have repositories which are to have the
same contents, you can distribute the pack-file to them and update
references without going through Git.  It can be done on push
(push to master, distribute to mirrors), or as part of fetch
(master fetches from central repository, distributes to mirrors).
I think; I have never managed a large set of replicated Git repositories.

If mirrors can get out of sync, you would need to ensure that the
repository doing the actual fetch / receiving the actual push is
the least common denominator, that is, it looks like it is lagging behind
all other mirrors in the set.  There is no problem if a repository gets
a packfile with more objects than it needs.

>> In other words, let's imagine a world in which we ditch our current
>> repo-level locking mechanism entirely. Let's also presume we move to
>> fetching specific refs rather than using blanket fetches. Does that
>> mean that if a fetch for ref A and a fetch for ref B are issued at
>> roughly the exact same time, the two will be able to be executed at
>> once without running into some git-internal locking mechanism on a
>> granularity coarser than the ref? i.e. are fetch A and fetch B going
>> to be blocked on the other's completion in any way? (let's presume
>> that ref A and ref B are not parents of each other).
> 
> Generally no, they should not conflict. Writes into the object database
> can happen simultaneously. Ref updates take a per-ref lock, so you
> should generally be able to write two unrelated refs at once. The big
> exception is that ref deletion requires taking a repo-wide lock, but
> that presumably wouldn't be a problem for your case.

Doesn't Git avoid taking locks, and use lockless synchronization
mechanisms (though possibly equivalent to locks)?  I think it takes
lockfile to update reflog together with reference, but if reflogs
are turned off (and I think they are off for bare repositories by
default), ref update uses "atomic file write" (write + rename)
and compare-and-swap primitive.  Updating repository is lock-free:
first update repository object database, then reference.

That said, it might be that the per-repository global lock that you
use is beneficial, limiting the amount of concurrent access; but
it could also be detrimental, with global-lock contention being the
cause of stalls and latency.

>> The ultimate goal for us is just figuring out how we can best reduce
>> the CPU load on the primary instance so that we don't find ourselves
>> in a situation where we're not able to run 

Re: [PATCH v6 09/13] convert: modernize tests

2016-08-29 Thread Lars Schneider

> On 26 Aug 2016, at 22:03, Junio C Hamano  wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider 
>> 
>> Use `test_config` to set the config, check that files are empty with
>> `test_must_be_empty`, compare files with `test_cmp`, and remove spaces
>> after ">" and "<".
> 
> All of the above are good things to do, but the first one needs to
> be done a bit carefully.
> 
> It is unclear in the above description if you made sure that no
> subsequent test depends on the settings left by earlier test before
> replacing "git config" with "test_config".

I assumed that I would see test failures if a subsequent test depends
on the settings left by earlier tests. I'll add this comment to the
commit message.


>> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
>> index 7bac2bc..7b45136 100755
>> --- a/t/t0021-conversion.sh
>> +++ b/t/t0021-conversion.sh
>> @@ -13,8 +13,8 @@ EOF
>> chmod +x rot13.sh
>> 
>> test_expect_success setup '
>> -git config filter.rot13.smudge ./rot13.sh &&
>> -git config filter.rot13.clean ./rot13.sh &&
>> +test_config filter.rot13.smudge ./rot13.sh &&
>> +test_config filter.rot13.clean ./rot13.sh &&
>> 
>>  {
>>  echo "*.t filter=rot13"
> 
> For example, after this conversion, filter.rot13.* will be reverted
> back to unconfigured once setup is done.
> 
>> test_expect_success check '
>> 
>> -cmp test.o test &&
>> -cmp test.o test.t &&
>> +test_cmp test.o test &&
>> +test_cmp test.o test.t &&
>> 
>>  # ident should be stripped in the repository
>>  git diff --raw --exit-code :test :test.i &&
> 
> It happens to be true that this subsequent test does not do any
> operation to cause data come from and go into the object database
> for any path that match the pattern "*.t", because for whatever
> reason the previous "setup" step happens to do more than just
> "setup".  It already exercised the filtering by doing "git add" and
> "git checkout".
> 
> If we were writing the script t0021 from scratch today, we would
> have used test_config *AND* squashed there two tests into one
> (i.e. making it a single "the filter and ident operation" test).
> Then it is crystal clear that additional tests on commands that may
> involve filtering will always be added to that combined test
> (e.g. we may try "cat-file --filters" to ensure that rot13 filter is
> excercised).
> 
> But because we are not doing that, it may become tempting to add
> test for a new command that pays attention to a filter to either of
> the test, and it would have worked OK if this patch weren't there.
> Such an addition will break the test, because in the second "check"
> test, the filter commands are no longer active with this patch.
> 
> So while I do appreciate the change for the longer term, I am not
> quite happy with it.  It just looks like an invitation for future
> mistakes.

I'll follow your judgement here. However, from my point of view a future
addition that causes test failures is not an issue, because these test
failures would be analyzed and the tests refactored accordingly.


>> @@ -130,7 +130,7 @@ test_expect_success 'filter shell-escaped filenames' '
>> 
>>  # delete the files and check them out again, using a smudge filter
>>  # that will count the args and echo the command-line back to us
>> -git config filter.argc.smudge "sh ./argc.sh %f" &&
>> +test_config filter.argc.smudge "sh ./argc.sh %f" &&
>>  rm "$normal" "$special" &&
>>  git checkout -- "$normal" "$special" &&
> 
> This one is clearly OK.  Anything related to argc filter only
> appears in this single test so it is not just OK to use test_config,
> but it makes our intention very clear that nobody else would use the
> argc filter.  I think similar reasoning would apply to the test_config
> conversion in the remainder of the script.

OK, then I will keep these test_config's and revert the first one.


> As to other types of changes you did in this, I see no problem with
> them.


Thanks,
Lars


Re: [PATCH 00/22] Prepare the sequencer for the upcoming rebase -i patches

2016-08-29 Thread Dennis Kaarsemaker
On ma, 2016-08-29 at 10:03 +0200, Johannes Schindelin wrote:

> Therefore I would be most grateful for every in-depth review.

Tried to do that, but could come up only with a few nits. I think the
approach is sensible.

D.


Re: [PATCH 20/22] sequencer: remember do_recursive_merge()'s return value

2016-08-29 Thread Dennis Kaarsemaker
On ma, 2016-08-29 at 10:06 +0200, Johannes Schindelin wrote:

> The return value of do_recursive_merge() may be positive (indicating merge
> conflicts), se let's OR later error conditions so as not to overwrite them
> with 0.

s/se/so/?

D.


Re: improve 'git pull' status message

2016-08-29 Thread Jakub Narębski
W dniu 29.08.2016 o 11:11, KES pisze:
> 
> When we do git pull -v --rebase
> 
> We got this:
> remote: Counting objects: 7, done.
> remote: Compressing objects: 100% (7/7), done.
> remote: Total 7 (delta 5), reused 0 (delta 0)
> Unpacking objects: 100% (7/7), done.
> From ssh://slab/alexclear/ontico
>2b541e2..2c17d76  master -> origin/master
> Changes from 2b541e2bbd23ab5c375c4ce1e0fae5255470a5e7 to 
> 2c17d767934f7f6784d2e0411c7a3a9bfc9c4d08:
>  x | 37 ++---
>  1 file changed, 22 insertions(+), 15 deletions(-)
> First, rewinding head to replay your work on top of it...
> 
> 
> It would be better if this
> 2b541e2bbd23ab5c375c4ce1e0fae5255470a5e7 to 
> 2c17d767934f7f6784d2e0411c7a3a9bfc9c4d08
> were replaced by this
> 2b541e2bbd23ab5c375c4ce1e0fae5255470a5e7..2c17d767934f7f6784d2e0411c7a3a9bfc9c4d08
> 
> This will allow us to copy/paste this string into 'diff' command
> git diff 
> 2b541e2bbd23ab5c375c4ce1e0fae5255470a5e7..2c17d767934f7f6784d2e0411c7a3a9bfc9c4d08

You can copy the range from the fetch line, "2b541e2..2c17d76"; it is
the same range / set of revisions.

Shortened SHA-1 works as well as full SHA-1.

-- 
Jakub Narębski


Re: [PATCH 15/22] sequencer: introduce a helper to read files written by scripts

2016-08-29 Thread Dennis Kaarsemaker
On ma, 2016-08-29 at 10:06 +0200, Johannes Schindelin wrote:
> +   if (strbuf_read_file(buf, path, 0) < 0) {
> +   warning_errno("could not read '%s'", path);
> +   return 0;
> +   }
> +
> +   if (buf->len > orig_len && buf->buf[buf->len - 1] == '\n') {
> +   if (--buf->len > orig_len && buf->buf[buf->len - 1] == '\r')
> +   --buf->len;
> +   buf->buf[buf->len] = '\0';
> +   }

Why not use open + strbuf_getline instead of hand-rolling a newline
eradicator?

D.


Re: [PATCH v6 06/13] pkt-line: add functions to read/write flush terminated packet streams

2016-08-29 Thread Lars Schneider

> On 26 Aug 2016, at 19:21, Junio C Hamano  wrote:
> 
> Lars Schneider  writes:
> 
>> OK, what function names would be more clear from your point of view?
>> 
>> write_packetized_stream_from_fd()
>> write_packetized_stream_from_buf()
>> read_packetized_stream_to_buf()
> 
> Would
> 
>write_packetized_from_fd()
>write_packetized_from_buf()
>read_packetized_to_buf()
> 
> be shorter, still understandable, be consistent with the existing
> convention to name write_/read_ a function that shuffles bytes via a
> file descriptor, and to the point, perhaps?

Agreed.

Thanks,
Lars


Re: [PATCH v6 05/13] pkt-line: add packet_write_gently()

2016-08-29 Thread Lars Schneider

> On 26 Aug 2016, at 19:15, Junio C Hamano  wrote:
> 
> Lars Schneider  writes:
> 
>>> Do you anticipate future need of non-gently variant of this
>>> function?  If so, perhaps a helper that takes a boolean "am I
>>> working for the gently variant?" may help share more code.
>> 
>> With helper you mean "an additional boolean parameter"? I don't 
>> see a need for a non-gently variant right now but I will
>> add this parameter if you think it is a good idea. How would the
>> signature look like?
>> 
>> int packet_write_gently(const int fd_out, const char *buf, size_t size, int 
>> gentle)
>> 
>> This would follow type_from_string_gently() in object.h.
> 
> I actually imagined it would be more like your packet_write_fmt vs
> packet_write_fmt_gently pair of functions.  If you do not have an
> immediate need for a non-gentle packet_write() right now, but you
> still forsee that it is likely some other caller may want one, you
> could still prepare for it by doing a static
> 
>   packet_write_1(const int fd_out, const char *buf, size_t size, int gentle)
> 
> and make packet_write_gently() call it with gentle=1, without
> actually introducing packet_write() nobody yet calls.

I see. In that case I would like to keep packet_write_gently() as is
because I don't see the need for a non-gently variant right now.

If there is a need for packet_write() then we could just add it and
move the packet_write_gently() code to packet_write_1() following your
suggestion. No caller would need to change for this refactoring.

If you strongly disagree then I would use the "two function" approach
you suggested above right away, though.

Thanks,
Lars
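
For reference, the "two function" approach sketched above could look
roughly like this (illustration only, reusing pkt-line's existing
helpers such as LARGE_PACKET_MAX, xsnprintf() and write_in_full(); the
real series may differ):

static int packet_write_1(const int fd_out, const char *buf, size_t size,
			  int gentle)
{
	char header[5];
	size_t packet_size = size + 4;

	if (packet_size > LARGE_PACKET_MAX) {
		if (gentle)
			return error("packet write failed - data exceeds max packet size");
		die("packet write failed - data exceeds max packet size");
	}
	/* a pkt-line is a 4-digit hex length prefix followed by the payload */
	xsnprintf(header, sizeof(header), "%04x", (unsigned int)packet_size);
	if (write_in_full(fd_out, header, 4) < 0 ||
	    write_in_full(fd_out, buf, size) < 0) {
		if (gentle)
			return error("packet write failed");
		die("packet write failed");
	}
	return 0;
}

int packet_write_gently(const int fd_out, const char *buf, size_t size)
{
	return packet_write_1(fd_out, buf, size, 1);
}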

Re: [PATCH 12/22] sequencer: refactor the code to obtain a short commit name

2016-08-29 Thread Dennis Kaarsemaker
On ma, 2016-08-29 at 10:05 +0200, Johannes Schindelin wrote:



I fail to see the point of this patch, would you mind enlightening me?

D.


Re: [PATCH 04/22] sequencer: future-proof remove_sequencer_state()

2016-08-29 Thread Dennis Kaarsemaker
On ma, 2016-08-29 at 10:04 +0200, Johannes Schindelin wrote:

> +   if (read_and_refresh_cache(opts))
> +   return -1;
> +

This doesn't seem to be related to the get_dir changes?

D.


Re: [PATCH 01/22] sequencer: use static initializers for replay_opts

2016-08-29 Thread Dennis Kaarsemaker
On ma, 2016-08-29 at 10:03 +0200, Johannes Schindelin wrote:

> +#define REPLAY_OPTS_INIT { -1, -1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, NULL, 
> NULL, 0, 0, NULL }

This looked off to me, as it replaces memset(..., 0, ...) so is not
100% equivalent. But the changed functions both set opts.action and
call parse_args which sets opts.subcommand.

D.


improve status message

2016-08-29 Thread KES
Hi

When we do git pull -v --rebase

We got this:
remote: Counting objects: 7, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 7 (delta 5), reused 0 (delta 0)
Unpacking objects: 100% (7/7), done.
From ssh://slab/alexclear/ontico
   2b541e2..2c17d76  master -> origin/master
Changes from 2b541e2bbd23ab5c375c4ce1e0fae5255470a5e7 to 
2c17d767934f7f6784d2e0411c7a3a9bfc9c4d08:
 x | 37 ++---
 1 file changed, 22 insertions(+), 15 deletions(-)
First, rewinding head to replay your work on top of it...


It would be better if this
2b541e2bbd23ab5c375c4ce1e0fae5255470a5e7 to 
2c17d767934f7f6784d2e0411c7a3a9bfc9c4d08
were replaced by this
2b541e2bbd23ab5c375c4ce1e0fae5255470a5e7..2c17d767934f7f6784d2e0411c7a3a9bfc9c4d08

This will allow us to copy/paste this string into 'diff' command
git diff 
2b541e2bbd23ab5c375c4ce1e0fae5255470a5e7..2c17d767934f7f6784d2e0411c7a3a9bfc9c4d08

By comparison, the first form gives an error:
git diff 2b541e2bbd23ab5c375c4ce1e0fae5255470a5e7 to 
2c17d767934f7f6784d2e0411c7a3a9bfc9c4d08
fatal: ambiguous argument 'to': unknown revision or path not in the working 
tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'

Thank you.


[PATCH 11/22] sequencer: get rid of the subcommand field

2016-08-29 Thread Johannes Schindelin
The subcommands are used exactly once, at the very beginning of
sequencer_pick_revisions(), to determine what to do. This is an
unnecessary level of indirection: we can simply call the correct
function to begin with. So let's do that.

While at it, ensure that the subcommands return an error code so that
they do not have to die() all over the place (bad practice for library
functions...).

Signed-off-by: Johannes Schindelin 
---
 builtin/revert.c | 36 
 sequencer.c  | 35 +++
 sequencer.h  | 13 -
 3 files changed, 31 insertions(+), 53 deletions(-)

diff --git a/builtin/revert.c b/builtin/revert.c
index 7365559..c9ae4dc 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -71,7 +71,7 @@ static void verify_opt_compatible(const char *me, const char *base_opt, ...)
die(_("%s: %s cannot be used with %s"), me, this_opt, base_opt);
 }
 
-static void parse_args(int argc, const char **argv, struct replay_opts *opts)
+static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
 {
const char * const * usage_str = revert_or_cherry_pick_usage(opts);
const char *me = action_name(opts);
@@ -115,25 +115,15 @@ static void parse_args(int argc, const char **argv, struct replay_opts *opts)
if (opts->keep_redundant_commits)
opts->allow_empty = 1;
 
-   /* Set the subcommand */
-   if (cmd == 'q')
-   opts->subcommand = REPLAY_REMOVE_STATE;
-   else if (cmd == 'c')
-   opts->subcommand = REPLAY_CONTINUE;
-   else if (cmd == 'a')
-   opts->subcommand = REPLAY_ROLLBACK;
-   else
-   opts->subcommand = REPLAY_NONE;
-
/* Check for incompatible command line arguments */
-   if (opts->subcommand != REPLAY_NONE) {
+   if (cmd) {
char *this_operation;
-   if (opts->subcommand == REPLAY_REMOVE_STATE)
+   if (cmd == 'q')
this_operation = "--quit";
-   else if (opts->subcommand == REPLAY_CONTINUE)
+   else if (cmd == 'c')
this_operation = "--continue";
else {
-   assert(opts->subcommand == REPLAY_ROLLBACK);
+   assert(cmd == 'a');
this_operation = "--abort";
}
 
@@ -156,7 +146,7 @@ static void parse_args(int argc, const char **argv, struct replay_opts *opts)
"--edit", opts->edit,
NULL);
 
-   if (opts->subcommand != REPLAY_NONE) {
+   if (cmd) {
opts->revs = NULL;
} else {
struct setup_revision_opt s_r_opt;
@@ -174,6 +164,14 @@ static void parse_args(int argc, const char **argv, struct replay_opts *opts)
 
if (argc > 1)
usage_with_options(usage_str, options);
+
+   if (cmd == 'q')
+   return sequencer_remove_state(opts);
+   if (cmd == 'c')
+   return sequencer_continue(opts);
+   if (cmd == 'a')
+   return sequencer_rollback(opts);
+   return sequencer_pick_revisions(opts);
 }
 
 int cmd_revert(int argc, const char **argv, const char *prefix)
@@ -185,8 +183,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
opts.edit = 1;
opts.action = REPLAY_REVERT;
git_config(git_default_config, NULL);
-   parse_args(argc, argv, &opts);
-   res = sequencer_pick_revisions(&opts);
+   res = run_sequencer(argc, argv, &opts);
if (res < 0)
die(_("revert failed"));
return res;
@@ -199,8 +196,7 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
 
opts.action = REPLAY_PICK;
git_config(git_default_config, NULL);
-   parse_args(argc, argv, &opts);
-   res = sequencer_pick_revisions(&opts);
+   res = run_sequencer(argc, argv, &opts);
if (res < 0)
die(_("cherry-pick failed"));
return res;
diff --git a/sequencer.c b/sequencer.c
index 1b65202..ba1fd05 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -127,7 +127,7 @@ void *sequencer_entrust(struct replay_opts *opts, void *set_me_free_after_use)
return set_me_free_after_use;
 }
 
-static void remove_sequencer_state(const struct replay_opts *opts)
+int sequencer_remove_state(struct replay_opts *opts)
 {
struct strbuf dir = STRBUF_INIT;
int i;
@@ -141,6 +141,8 @@ static void remove_sequencer_state(const struct replay_opts *opts)
strbuf_addf(, "%s", get_dir(opts));
remove_dir_recursively(, 0);
strbuf_release();
+
+   return 0;
 }
 
 static const char *action_name(const struct replay_opts *opts)
@@ -941,7 +943,7 @@ static int rollback_single_pick(void)
return reset_for_rollback(head_sha1);
 }
 
-static int sequencer_rollback(struct replay_opts *opts)

  1   2   >