Re: slow git-cherry-pick.

2013-12-04 Thread Paweł Sikora
On Wednesday 04 of December 2013 08:07:23 Duy Nguyen wrote:
> On Wed, Dec 4, 2013 at 3:13 AM, Thomas Rast  wrote:
> > Paweł Sikora  writes:
> > 
> > Umm, there's a gem here that the thread missed so far:
> >> my git repo isn't very big[1] but it's checked out on the linear lvm
> >> where random i/o generally hurts and strace shows that current git version
> >> performs 2x{lstat}+1x{open,read,close} [2] on whole checkout before
> >           ^
> > 
> > There's no reason why it should do the lstat() *twice* for every file.
> > But Paweł is right; the code path roughly goes like this:
> > 
> > int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
> > {
> >         [...]
> >         res = sequencer_pick_revisions(&opts);
> > 
> > int sequencer_pick_revisions(struct replay_opts *opts)
> > {
> >         [...]
> >         read_and_refresh_cache(opts);
> >         [...]
> >         return pick_commits(todo_list, opts);
> > }
> > 
> > static int pick_commits(struct commit_list *todo_list, struct replay_opts *opts)
> > {
> >         [...]
> >         read_and_refresh_cache(opts);
> > 
> > I'm too tired to dig further, but AFAICT it's just a rather obvious case
> > of duplication of effort.
> 
> That's something to optimize, but it's single commit picking,
> sequencer_pick_revisions() should call single_pick() instead of
> pick_commits().
> 
> The read+close on the whole checkout looks like there's a problem with
> the refresh operation, and git decides to read the files and verify the
> SHA-1 by content. Pawel, if you run "strace git update-index --refresh" twice,
> does it still show 1 stat + 1 read for every entry on the second try?

the 'git update-index --refresh' runs quickly and strace shows only lstat()
on every file. i see no massive open/read actions in this case.

$ strace -o strace-try1.log git update-index --refresh
hmdb: needs update
$ strace -o strace-try2.log git update-index --refresh
hmdb: needs update

$ grep -c lstat strace-try1.log 
33793
$ grep -c lstat strace-try2.log
33793
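
a per-syscall tally makes the two runs easier to compare than counting lstat alone. a minimal sketch, assuming the logs were written with plain "strace -o" (no -f, so every line starts with the syscall name); count_syscalls is a hypothetical helper, not part of git or strace:

```shell
# Hypothetical helper: print "<count> <syscall>" for every syscall found in
# an strace log, most frequent first.
count_syscalls() {
    # Only lines that begin with "name(" are counted; signal and exit
    # markers such as "+++ exited with 0 +++" are skipped.
    awk '/^[a-z_0-9]+\(/ { name = $0; sub(/\(.*/, "", name); n[name]++ }
         END { for (s in n) print n[s], s }' "$1" | sort -k1,1nr -k2,2
}
```

running "count_syscalls strace-try2.log" would then show whether open/read/close appear in anything like the same numbers as lstat.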

-- 
gpg key fingerprint = 60B4 9886 AD53 EB3E 88BB 1EB5 C52E D01B 683B 9411
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: slow git-cherry-pick.

2013-12-03 Thread Duy Nguyen
On Wed, Dec 4, 2013 at 3:13 AM, Thomas Rast  wrote:
> Paweł Sikora  writes:
>>
>
> Umm, there's a gem here that the thread missed so far:
>
>> my git repo isn't very big[1] but it's checked out on the linear lvm
>> where random i/o generally hurts and strace shows that current git version
>> performs 2x{lstat}+1x{open,read,close} [2] on whole checkout before
>           ^
>
> There's no reason why it should do the lstat() *twice* for every file.
> But Paweł is right; the code path roughly goes like this:
>
> int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
> {
>         [...]
>         res = sequencer_pick_revisions(&opts);
>
> int sequencer_pick_revisions(struct replay_opts *opts)
> {
>         [...]
>         read_and_refresh_cache(opts);
>         [...]
>         return pick_commits(todo_list, opts);
> }
>
> static int pick_commits(struct commit_list *todo_list, struct replay_opts *opts)
> {
>         [...]
>         read_and_refresh_cache(opts);
>
>
> I'm too tired to dig further, but AFAICT it's just a rather obvious case
> of duplication of effort.

That's something to optimize, but it's single commit picking,
sequencer_pick_revisions() should call single_pick() instead of
pick_commits().

The read+close on the whole checkout looks like there's a problem with
the refresh operation, and git decides to read the files and verify the
SHA-1 by content. Pawel, if you run "strace git update-index --refresh" twice,
does it still show 1 stat + 1 read for every entry on the second try?
-- 
Duy


Re: slow git-cherry-pick.

2013-12-03 Thread Thomas Rast
Paweł Sikora  writes:
>

Umm, there's a gem here that the thread missed so far:

> my git repo isn't very big[1] but it's checked out on the linear lvm
> where random i/o generally hurts and strace shows that current git version
> performs 2x{lstat}+1x{open,read,close} [2] on whole checkout before
           ^

There's no reason why it should do the lstat() *twice* for every file.
But Paweł is right; the code path roughly goes like this:

int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
{
        [...]
        res = sequencer_pick_revisions(&opts);

int sequencer_pick_revisions(struct replay_opts *opts)
{
        [...]
        read_and_refresh_cache(opts);
        [...]
        return pick_commits(todo_list, opts);
}

static int pick_commits(struct commit_list *todo_list, struct replay_opts *opts)
{
        [...]
        read_and_refresh_cache(opts);


I'm too tired to dig further, but AFAICT it's just a rather obvious case
of duplication of effort.
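
One way to check whether the doubled lstat() really hits every path is to tally paths in the strace log. A rough sketch, assuming a log written with plain "strace -o" (no -f) and paths without embedded double quotes; repeated_lstat_paths is a hypothetical helper name:

```shell
# Hypothetical helper: list every path that was lstat()ed more than once in
# an strace log, as "<count> <path>".
repeated_lstat_paths() {
    # With '"' as the field separator, $2 is the quoted path argument.
    awk -F'"' '/^lstat\("/ { n[$2]++ }
               END { for (p in n) if (n[p] > 1) print n[p], p }' "$1" | sort -k2
}
```

If the two read_and_refresh_cache() calls are the cause, nearly every tracked file should show up here with a count of 2.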

-- 
Thomas Rast
t...@thomasrast.ch


Re: slow git-cherry-pick.

2013-12-03 Thread Paweł Sikora
On Monday 25 of November 2013 09:26:40 Junio C Hamano wrote:
> Paweł Sikora  writes:
> > On Sunday 24 of November 2013 19:47:10 Duy Nguyen wrote:
> >> On Sun, Nov 24, 2013 at 5:45 PM, Paweł Sikora wrote:
> >> > i've recently reinstalled a fresh system (fc20-beta) on my workstation
> >> > and observing a big slowdown on git cherry-pick operation
> >> > (git-1.8.4.2-1).
> >> > the previous centos installation with an old git version works faster
> >> > (few seconds per cherry pick). now the same operation takes >1 min.
> >> 
> >> What is the git version before the reinstallation?
> > 
> > git-1.7.11.3-1.el5.rf.
> > 
> > i've checked this version on another machine with centos-5.$latest
> > and it does a similar amount of stat/read operations quickly (~6s).
> > this "fast" centos-5 machine has /home on raid-0 (2x500GB) while
> > my "slow (>1min)" workstation has /home on linear lvm (250G+1T).
> > 
> > so, i suppose that my "slow" working copy crosses disks boundary
> > or spread over 1TB drive and the random git i/o impacts performance.
> > 
> > the question still remains - does the git need to scan whole checkout
> > during picking well defined set of files?
> 
> We do update-index to see what local changes you have upfront in
> order to avoid stomping on them (and we do not know upfront what
> paths the cherry-picked commit would change, given that there may be
> renames involved), so the answer is unfortunately yes, we would need
> to do lstat(2) the whole thing.

this is quite weird for me (user). git pull also fetches objects from
server and aborts user's action if working copy contains uncommitted
modifications on files that will be modified by incoming objects.

on the other hand, cherry-pick needs to stat() the whole working
copy to establish a similar precondition. this looks like a suboptimal
implementation.

> Doing that lstat(2) more lazily and do away with the update-index
> might be possible, but I suspect that may be quite a lot of work.

maybe you can use the existing implementation used by 'pull' ?


-- 
gpg key fingerprint = 60B4 9886 AD53 EB3E 88BB  1EB5 C52E D01B 683B 9411


Re: slow git-cherry-pick.

2013-11-25 Thread Junio C Hamano
Paweł Sikora  writes:

> On Sunday 24 of November 2013 19:47:10 Duy Nguyen wrote:
>> On Sun, Nov 24, 2013 at 5:45 PM, Paweł Sikora  wrote:
>> > i've recently reinstalled a fresh system (fc20-beta) on my workstation
>> > and observing a big slowdown on git cherry-pick operation (git-1.8.4.2-1).
>> > the previous centos installation with an old git version works faster
>> > (few seconds per cherry pick). now the same operation takes >1 min.
>> 
>> What is the git version before the reinstallation?
>
> git-1.7.11.3-1.el5.rf.
>
> i've checked this version on another machine with centos-5.$latest
> and it does a similar amount of stat/read operations quickly (~6s).
> this "fast" centos-5 machine has /home on raid-0 (2x500GB) while
> my "slow (>1min)" workstation has /home on linear lvm (250G+1T).
>
> so, i suppose that my "slow" working copy crosses disks boundary
> or spread over 1TB drive and the random git i/o impacts performance.
>
> the question still remains - does the git need to scan whole checkout
> during picking well defined set of files?

We do update-index to see what local changes you have upfront in
order to avoid stomping on them (and we do not know upfront what
paths the cherry-picked commit would change, given that there may be
renames involved), so the answer is unfortunately yes, we would need
to do lstat(2) the whole thing.

Doing that lstat(2) more lazily and do away with the update-index
might be possible, but I suspect that may be quite a lot of work.
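
The up-front check can be approximated from the shell. This is only a rough sketch using the plumbing commands "git update-index" and "git diff-files"; worktree_is_clean is a hypothetical name, and this is not claimed to be what cherry-pick runs internally:

```shell
# Hypothetical helper: succeed only if no tracked file in the working tree
# differs from the index.
worktree_is_clean() {
    # Refresh the cached stat information; this lstat()s every tracked file.
    # -q keeps going when entries need an update, so the status is ignored.
    git update-index -q --refresh >/dev/null 2>&1
    # Exit status 0 means the working tree matches the index.
    git diff-files --quiet
}
```

Note that the refresh step walks the whole index regardless of which paths the operation will eventually touch.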



Re: slow git-cherry-pick.

2013-11-24 Thread Paweł Sikora
On Sunday 24 of November 2013 19:47:10 Duy Nguyen wrote:
> On Sun, Nov 24, 2013 at 5:45 PM, Paweł Sikora  wrote:
> > i've recently reinstalled a fresh system (fc20-beta) on my workstation
> > and observing a big slowdown on git cherry-pick operation (git-1.8.4.2-1).
> > the previous centos installation with an old git version works faster
> > (few seconds per cherry pick). now the same operation takes >1 min.
> 
> What is the git version before the reinstallation?

git-1.7.11.3-1.el5.rf.

i've checked this version on another machine with centos-5.$latest
and it does a similar amount of stat/read operations quickly (~6s).
this "fast" centos-5 machine has /home on raid-0 (2x500GB) while
my "slow (>1min)" workstation has /home on linear lvm (250G+1T).

so, i suppose that my "slow" working copy crosses disks boundary
or spread over 1TB drive and the random git i/o impacts performance.

the question still remains - does the git need to scan whole checkout
during picking well defined set of files?

> Do you cherry-pick on one commit or a commit range?

single commit.

-- 
gpg key fingerprint = 60B4 9886 AD53 EB3E 88BB  1EB5 C52E D01B 683B 9411


Re: slow git-cherry-pick.

2013-11-24 Thread Duy Nguyen
On Sun, Nov 24, 2013 at 5:45 PM, Paweł Sikora  wrote:
> i've recently reinstalled a fresh system (fc20-beta) on my workstation
> and observing a big slowdown on git cherry-pick operation (git-1.8.4.2-1).
> the previous centos installation with an old git version works faster
> (few seconds per cherry pick). now the same operation takes >1 min.

What is the git version the reinstallation? Do you cherry-pick on one
commit or a commit range?
-- 
Duy


Re: slow git-cherry-pick.

2013-11-24 Thread Duy Nguyen
On Sun, Nov 24, 2013 at 7:47 PM, Duy Nguyen  wrote:
> On Sun, Nov 24, 2013 at 5:45 PM, Paweł Sikora  wrote:
>> i've recently reinstalled a fresh system (fc20-beta) on my workstation
>> and observing a big slowdown on git cherry-pick operation (git-1.8.4.2-1).
>> the previous centos installation with an old git version works faster
>> (few seconds per cherry pick). now the same operation takes >1 min.
>
> What is the git version the reinstallation? Do you cherry-pick on one

I accidentally the word "before" in the first question..

> commit or a commit range?
-- 
Duy


slow git-cherry-pick.

2013-11-24 Thread Paweł Sikora
Hi,

i've recently reinstalled a fresh system (fc20-beta) on my workstation
and observing a big slowdown on git cherry-pick operation (git-1.8.4.2-1).
the previous centos installation with an old git version works faster
(few seconds per cherry pick). now the same operation takes >1 min.

my git repo isn't very big[1] but it's checked out on the linear lvm
where random i/o generally hurts and strace shows that current git version
performs 2x{lstat}+1x{open,read,close} [2] on whole checkout before
cherry-pick. there is also a .gitattributes search on all levels,
which does another ton of i/o. it looks like git-status in action, but
why on the whole repo while cherry-pick touches a limited set of files?

is it a bug or feature?

BR,
Paweł.

please CC me on reply.


[1]
$ du -sh .git/objects/
4.2G	.git/objects/
$ find sources -type f|wc -l
9536
$ find buildenv -type f|wc -l
14637

[2]
lstat("buildenv/boost-1.51.0/include/boost/bimap.hpp", {st_mode=S_IFREG|0664, st_size=387, ...}) = 0
lstat("buildenv/boost-1.51.0/include/boost/bimap.hpp", {st_mode=S_IFREG|0664, st_size=387, ...}) = 0
open("buildenv/boost-1.51.0/include/boost/bimap.hpp", O_RDONLY) = 5
read(5, "// Boost.Bimap\n//\n// Copyright ("..., 387) = 387
close(5)

-- 
gpg key fingerprint = 60B4 9886 AD53 EB3E 88BB  1EB5 C52E D01B 683B 9411