Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-22 Thread Junio C Hamano
Stefan Saasen  writes:

> Anyway, long story short. We're interested to help but I'm not
> entirely sure what that would look like at the moment. Are there
> formed ideas floating around or would you be looking for some form of
> proposal instead?

I am not proposing anything or looking for proposals myself,
actually.  It is just somebody expressed interest in having tested
older maintenance track that is kept alive in the past, so I was
merely trying to help connect you with that old thread.

If those who are interested in having such LTS track(s) need
something specific from me, and if it will not be unrealistic
maintenance burden, I am willing to help.  That's all.

For example, LTS group for whatever reason may nominate 2.2.x track
as a base that they want to keep alive longer than other maintenance
tracks and promise to test changes to them to keep it stable.  Then
I can help the effort by making sure people's bugfix patches would
apply down to 2.2.x track (often people make mistake of using newer
facility to fix or test the fix for an ancient bug, and bugfix topic
branch ends up forked at a point much newer than where it should
be).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-21 Thread Stefan Saasen
> Perhaps companies like Atlassian that rely on the stability of the
> open source Git can spare some resources and join forces with like
> minded folks on LTS of older maintenance tracks, if they are truly
> interested in.

We certainly can and would like to. I'm not entirely sure what that
would entail though?

>From reading through $gmane/264365 I've identified the following
responsibilities/opportunities to help:

>- Monitor "git log --first-parent maint-lts..master" and find
>  the tip of topic branches that need to be down-merged;
>
>- Down-merge such topics to maint-lts; this might involve
>  cherry-picking instead of merge, as the bugfix topics may
>  originally be done on the codebase newer than maint-lts;

and more importantly testing the maint-lts version to ensure
backported changes don't introduce regressions and the maint-lts
branch is stable.

This suggests specific, spaced LTS versions but in the same thread you
mention maint-2.1or maint-2.2.
So a different model could be maintaining old versions in a sliding
window fashion (e.g. critical issues would be backported to the last 6
months worth of git releases).

Maybe I'm getting ahead of myself here :)

Anyway, long story short. We're interested to help but I'm not
entirely sure what that would look like at the moment. Are there
formed ideas floating around or would you be looking for some form of
proposal instead?

Cheers,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-21 Thread Stefan Saasen
>> I've noticed Peff's patches on pu which suggest they will be available
>> in git 2.5?
>
> Being on 'pu' (or 'next' for that matter) is not a suggestion for a
> change to appear in any future version at all, even though it often
> means that it would soon be merged to 'master' and will be in the
> upcoming release to be on 'next' in early part of a development
> cycle.  Some larger topics would stay on 'next' for a few cycles.
>
>> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?
>
> The topic will hopefully be merged to 'master' after 2.4 final is
> released end of this month, down to 'maint' early May and will ship
> with 2.4.1, unless there is unforeseen issues discovered in the
> change while people try it out while it is in 'next' (which will
> happen today, hopefully).

Thanks for the clarification Junio.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-21 Thread Junio C Hamano
Junio C Hamano  writes:

> Stefan Saasen  writes:
>
>> I've noticed Peff's patches on pu which suggest they will be available
>> in git 2.5?
>
> Being on 'pu' (or 'next' for that matter) is not a suggestion for a
> change to appear in any future version at all, even though it often
> means that it would soon be merged to 'master' and will be in the
> upcoming release to be on 'next' in early part of a development
> cycle.  Some larger topics would stay on 'next' for a few cycles.
>
>> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?
>
> The topic will hopefully be merged to 'master' after 2.4 final is
> released end of this month, down to 'maint' early May and will ship
> with 2.4.1, unless there is unforeseen issues discovered in the
> change while people try it out while it is in 'next' (which will
> happen today, hopefully).

... and then if I do not forget and if the topic is really important
for real-world users, I am OK to merge it down to 2.3 and even 2.2
maintenance tracks later.  But that will happen only after the topic
hits 'maint', which will happen only after the topic hits 'master'.

What you _can_ help is the "if I do not forget" part ;-)  Also see a
similar discussion we had recently

  http://thread.gmane.org/gmane.comp.version-control.git/264365

The key sentence from my part in the thread is 

> When I say "the tip of 'master' is meant to be more stable than
> any tagged versions", I do mean it.

and the reasoning behind it that is given in the paragraph before
that, though.

Perhaps companies like Atlassian that rely on the stability of the
open source Git can spare some resources and join forces with like
minded folks on LTS of older maintenance tracks, if they are truly
interested in.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-21 Thread Junio C Hamano
Stefan Saasen  writes:

> I've noticed Peff's patches on pu which suggest they will be available
> in git 2.5?

Being on 'pu' (or 'next' for that matter) is not a suggestion for a
change to appear in any future version at all, even though it often
means that it would soon be merged to 'master' and will be in the
upcoming release to be on 'next' in early part of a development
cycle.  Some larger topics would stay on 'next' for a few cycles.

> Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?

The topic will hopefully be merged to 'master' after 2.4 final is
released end of this month, down to 'maint' early May and will ship
with 2.4.1, unless there is unforeseen issues discovered in the
change while people try it out while it is in 'next' (which will
happen today, hopefully).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-20 Thread Stefan Saasen
>> If it is critical to some people, they can downmerge to their custom
>> old installations of Git they maintain with ease, of course, and
>> that "with ease" part is the reason why I try to apply fixes to tip
>> of the original topic branch even though they were merged to the
>> mainline eons ago ;-).
>
> I think it is a bigger deal for folks who do not ship a custom
> installation, but expect to ship a third-party system that interacts
> with whatever version of git their customers happen to have (in which
> case they can only recommend their customers to upgrade).

Yes, this is the situation we are facing. We allow our customers to
use the git version that is supported/available on their OS (within a
certain range of supported versions) so our customers usually don't
compile from source.

> Either way, though, I do not think it is the upstream Git project's
> problem.

That's fair enough, I was mostly enquiring about the official git
versions this will land in so that we can advise customers what git
version to use (or not to use).

I've noticed Peff's patches on pu which suggest they will be available
in git 2.5?
Do you Junio, have plans to merge them to maint (2.3.x) and/or next (2.4)?

While I certainly agree that this is specific to Git on NFS and not a
more widespread git performance problem, I'd love to be able to
message something other than "skip all the git version between and
including git 2.2 - 2.4".

I appreciate your consideration and thanks again for the swift response on this.

Cheers,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-20 Thread Jeff King
On Mon, Apr 20, 2015 at 01:12:54PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Either way, though, I do not think it is the upstream Git project's
> > problem.
> 
> The commit to pick where to queue the fixes actually is my problem,
> as I have this illusion that I'd be helping these derived works by
> making it easier for them to merge, not cherry-pick.

True, I had just meant the actual rolling of the releases.

> But I would imagine that they may go the cherry-pick route anyway,
> in which case I may be wasting my time worrying about them X-<.

FWIW, I typically cherry-pick rather than merge. The resulting history
is not as nice, but it means I don't have to think as hard about the
history when doing so. It also means that topics may not be as well
tested (e.g., they may have been implicitly relying on some other thing
that happened upstream that I did _not_ cherry-pick). But we treat even
cherry-picked upstream topics as their own feature branches, and do our
normal internal testing and review.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-20 Thread Junio C Hamano
Jeff King  writes:

> Either way, though, I do not think it is the upstream Git project's
> problem.

The commit to pick where to queue the fixes actually is my problem,
as I have this illusion that I'd be helping these derived works by
making it easier for them to merge, not cherry-pick.

But I would imagine that they may go the cherry-pick route anyway,
in which case I may be wasting my time worrying about them X-<.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-20 Thread Jeff King
On Mon, Apr 20, 2015 at 01:04:11PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > ... But I don't know
> > if this counts as critical (it is for you, certainly, but I don't think
> > that many people are affected, as the crucial factor here is really the
> > slow NFS filesystem operations).
> 
> If it is critical to some people, they can downmerge to their custom
> old installations of Git they maintain with ease, of course, and
> that "with ease" part is the reason why I try to apply fixes to tip
> of the original topic branch even though they were merged to the
> mainline eons ago ;-).

I think it is a bigger deal for folks who do not ship a custom
installation, but expect to ship a third-party system that interacts
with whatever version of git their customers happen to have (in which
case they can only recommend their customers to upgrade).

I don't know how Stash or GitLab installations work. GitHub ships our
own custom git (which I maintain), though we are already on 2.3.x.

Either way, though, I do not think it is the upstream Git project's
problem.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-20 Thread Junio C Hamano
Jeff King  writes:

> ... But I don't know
> if this counts as critical (it is for you, certainly, but I don't think
> that many people are affected, as the crucial factor here is really the
> slow NFS filesystem operations).

If it is critical to some people, they can downmerge to their custom
old installations of Git they maintain with ease, of course, and
that "with ease" part is the reason why I try to apply fixes to tip
of the original topic branch even though they were merged to the
mainline eons ago ;-).

Thanks.  The patches look good from cursory reading.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-20 Thread Jeff King
On Sat, Apr 18, 2015 at 01:35:51PM +1000, Stefan Saasen wrote:

> Here are the timings for the two patches:
> [...]

Thanks, that matches what I was hoping for.

> My tweaked version of your second patch is:
> [...]
> -   return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
> +   if (!find_pack_entry(sha1, &e))
> +  return 0;
> +   if (e.p->freshened)
> +  return 1;
> +   return e.p->freshened = freshen_file(e.p->pack_name);
>  }

Whooops, yeah, setting the flag is probably helpful. :)

We usually try to avoid assignments in a return like this, so I've
written it out a little more verbosely in my final version. I'll send
those patches in a moment.

  [1/2]: sha1_file: freshen pack objects before loose
  [2/2]: sha1_file: only freshen packs once per run

> Is there a chance to backport those changes to the 2.2+ branches?

That's up to Junio. These patches can be applied straight to the
jk/prune-mtime topic. Usually he would then merge the topic up to
"maint", which at this would potentially become the next v2.3.x. If an
issue is critical (e.g., a security vulnerability), he'll sometimes
merge and roll maintenance releases for older versions. But I don't know
if this counts as critical (it is for you, certainly, but I don't think
that many people are affected, as the crucial factor here is really the
slow NFS filesystem operations).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-17 Thread Stefan Saasen
> If it's not a problem, I'd love to see timings for your case with just
> the first patch, and then with both.

Thanks for the swift response, much appreciated Jeff!

Here are the timings for the two patches:

Patch 1 on top of 33d4221c79

   Elapsed   System  User
 Min.   :6.110   Min.   :0.6700   Min.   :0.3600
 1st Qu.:6.580   1st Qu.:0.6900   1st Qu.:0.3900
 Median :7.260   Median :0.7100   Median :0.4100
 Mean   :7.347   Mean   :0.7248   Mean   :0.4214
 3rd Qu.:8.000   3rd Qu.:0.7400   3rd Qu.:0.4600
 Max.   :8.860   Max.   :0.8700   Max.   :0.5100

I've had to slightly tweak your second patch (`freshened` was never
set) but applying the modified patch yielded even better results
compared to patch 1:

   Elapsed   System  User
 Min.   :0.38   Min.   :0.03000   Min.   :0.2900
 1st Qu.:0.38   1st Qu.:0.04000   1st Qu.:0.3100
 Median :0.39   Median :0.06000   Median :0.3200
 Mean   :0.43   Mean   :0.05667   Mean   :0.3519
 3rd Qu.:0.42   3rd Qu.:0.07000   3rd Qu.:0.3600
 Max.   :0.68   Max.   :0.08000   Max.   :0.5700

This is pretty much back to the "before" state.
The graph really tells the whole story:
https://bytebucket.org/snippets/ssaasen/GeRE/raw/7367353a58c50ccd7c493af40ffb6ba1533e1490/git-merge-timing-patched.png
(After is the change in #33d4221, Before the parent of #33d4221 and so on)
The graph and the NFS stats can be found here:
https://bitbucket.org/snippets/ssaasen/GeRE

My tweaked version of your second patch is:

diff --git a/cache.h b/cache.h
index 51ee856..8982055 100644
--- a/cache.h
+++ b/cache.h
@@ -1168,6 +1168,7 @@ extern struct packed_git {
int pack_fd;
unsigned pack_local:1,
 pack_keep:1,
+   freshened:1,
 do_not_close:1;
unsigned char sha1[20];
/* something like ".git/objects/pack/x.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index bc6322e..c0ccd4b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned
char *sha1)
 static int freshen_packed_object(const unsigned char *sha1)
 {
struct pack_entry e;
-   return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+   if (!find_pack_entry(sha1, &e))
+  return 0;
+   if (e.p->freshened)
+  return 1;
+   return e.p->freshened = freshen_file(e.p->pack_name);
 }

 int write_sha1_file(const void *buf, unsigned long len, const char
*type, unsigned char *returnsha1)



The only change is that I assign the result of `freshen_file` to the
`freshened` flag.

I've only ran this with the test case I was using before but it looks
like this is pretty much fixing the merge time changes we observed.

Thanks again for the swift response. I've got my test setup sitting
here, happy to rerun the tests if that'd be useful.

Is there a chance to backport those changes to the 2.2+ branches?

> You may also be interested in:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/266370
>
> which addresses another performance problem related to the
> freshen/recent code in v2.2.

Thanks for the pointer, I'll have a look at that as well.

Cheers,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-17 Thread Junio C Hamano
Jeff King  writes:

> If it's not a problem, I'd love to see timings for your case with just
> the first patch, and then with both.

Thanks for two quick progress patches.

> You may also be interested in:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/266370
>
> which addresses another performance problem related to the
> freshen/recent code in v2.2.

That, too.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Performance regression due to #33d4221: write_sha1_file: freshen existing objects

2015-04-17 Thread Jeff King
On Fri, Apr 17, 2015 at 05:30:22PM +1000, Stefan Saasen wrote:

> The merge is created in a temporary location that uses alternates. The
> temporary repository is on a local disk, the alternate object database
> on an NFS mount.

Is the alternate writeable? If we can't freshen the object, we fall back
to storing the object locally, which could have a performance impact.
But it looks from your tables below like the utime() call is succeeding,
so that is probably not what is happening here.

> My current hypothesis is that the additional `access`, but more
> importantly the additional `utime` calls are responsible in the
> increased merge times that we see.

Yeah, that makes sense from your tables. The commit in question flips
the order of the loose/packed check, and the packed check should be much
faster on your NFS mount. Can you try:

diff --git a/sha1_file.c b/sha1_file.c
index 88f06ba..822aaef 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3014,7 +3014,7 @@ int write_sha1_file(const void *buf, unsigned long len, 
const char *type, unsign
write_sha1_file_prepare(buf, len, type, sha1, hdr, &hdrlen);
if (returnsha1)
hashcpy(returnsha1, sha1);
-   if (freshen_loose_object(sha1) || freshen_packed_object(sha1))
+   if (freshen_packed_object(sha1) || freshen_loose_object(sha1))
return 0;
return write_loose_object(sha1, hdr, hdrlen, buf, len, 0);
 }

I think that should clear up the access() calls, but leave the utime()
ones.

> Looking at the detailed strace shows that utime will be called
> repeatedly in same cases (e.g.
> https://bitbucket.org/snippets/ssaasen/oend shows an example where the
> same packfile will be updated more than 4000 times in a single merge).
> 
> http://www.spinics.net/lists/git/msg240106.html discusses a potential
> improvement for this case. Would that be an acceptable avenue to
> improve this situation?

I think so. Here's a tentative patch:

diff --git a/cache.h b/cache.h
index 3d3244b..72c6888 100644
--- a/cache.h
+++ b/cache.h
@@ -1174,6 +1174,7 @@ extern struct packed_git {
int pack_fd;
unsigned pack_local:1,
 pack_keep:1,
+freshened:1,
 do_not_close:1;
unsigned char sha1[20];
/* something like ".git/objects/pack/x.pack" */
diff --git a/sha1_file.c b/sha1_file.c
index 822aaef..f27cbf1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2999,7 +2999,11 @@ static int freshen_loose_object(const unsigned char 
*sha1)
 static int freshen_packed_object(const unsigned char *sha1)
 {
struct pack_entry e;
-   return find_pack_entry(sha1, &e) && freshen_file(e.p->pack_name);
+   if (!find_pack_entry(sha1, &e))
+   return 0;
+   if (e.p->freshened)
+   return 1;
+   return freshen_file(e.p->pack_name);
 }
 
 int write_sha1_file(const void *buf, unsigned long len, const char *type, 
unsigned char *returnsha1)


If it's not a problem, I'd love to see timings for your case with just
the first patch, and then with both.

You may also be interested in:

  http://thread.gmane.org/gmane.comp.version-control.git/266370

which addresses another performance problem related to the
freshen/recent code in v2.2.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html