RE: [ANNOUNCE] Git v2.5.0-rc0

2015-06-26 Thread steve.norman
Jeff / Duy,

 Fixes since v2.4
  * Access to objects in repositories that borrow from another one on a
slow NFS server unnecessarily got more expensive due to recent code
becoming more cautious in a naive way not to lose objects to pruning.
(merge ee1c6c3 jk/prune-mtime later to maint).
 
  * "Have we lost a race with competing repack?" check was too
expensive, especially while receiving a huge object transfer
that runs index-pack (e.g. clone or fetch).
(merge 0eeb077 jk/index-pack-reduce-recheck later to maint).

I was reading through the 2.5 release notes and spotted the above two entries 
and I think they are probably fixes related to the NFS slowness I reported last 
month.

Did some tests this morning and things are much improved:

Version 2.4.5
  - Local clone of https://github.com/git/git - 8s
  - Clone to NFS - 1m 3s

Version 2.5.0.rc0
  - Local - 7s
  - NFS - 13s

Thanks for the effort in addressing this, and sorry I missed the patch earlier 
so I didn't get to check it out sooner. 

Steve

RE: Troubleshoot clone issue to NFS.

2015-06-05 Thread steve.norman
On Sunday, May 24, 2015 @ 10:01 AM Duy Nguyen [mailto:pclo...@gmail.com] did 
scribble:

 In case you want to back away from option 2 because it starts to leak
 raciness, which your old commit tried to fix in the first place. I
 think the only other place that tests for lots of non-existent loose
 objects is write_sha1_file (e.g. tar -xf bigtarball.tar.gz; cd
 bigtarball; git init; git add .). But the number of calls should be
 much smaller compared to index-pack and it does not use has_sha1_file,
 it uses check_and_freshen_file() instead.
 
 There are other places where has_sha1_file() may return 0, but I think
 the number of calls is even smaller to bother (shallow.c,
 fetch-pack.c, apply.c, bulk-checkin.c)
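For anyone following along, the check_and_freshen_file() behaviour referred to above is essentially "if the loose object file already exists, refresh its mtime so pruning will not treat it as stale, and report it as present". A minimal self-contained sketch of that idea (illustrative only; the helper name and the main() driver are mine, not git's code):

#include <stdio.h>
#include <unistd.h>
#include <utime.h>

/*
 * Illustration of the "check and freshen" idea discussed above:
 * if the file already exists, bump its mtime (so a concurrent prune
 * will not consider it old) and report that it exists.  This is a
 * simplified stand-in, not git's actual check_and_freshen_file().
 */
static int check_and_freshen_path(const char *path, int freshen)
{
	if (access(path, F_OK))
		return 0;		/* missing: the caller has to write it */
	if (freshen && utime(path, NULL) < 0)
		return 0;		/* could not freshen; treat as missing */
	return 1;			/* present (and freshened) */
}

int main(int argc, char **argv)
{
	if (argc > 1)
		printf("%s: %s\n", argv[1],
		       check_and_freshen_path(argv[1], 1) ? "exists" : "missing");
	return 0;
}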

Any updates / further thoughts on this?

Thanks,

Steve

RE: Troubleshoot clone issue to NFS.

2015-06-05 Thread steve.norman
On Friday, June 05, 2015 @ 1:18 PM Jeff King [mailto:p...@peff.net] did 
scribble:

 Sorry, I haven't had a chance to look at it further. It's still on my todo
 list. My plan is:
 
   1. Devise some torture tests to see whether my patch series is in
  fact racy on Linux.
 
   2. Assuming it is, scrap it and make a has_sha1_file_quick() which
  might sometimes return a false negative (if somebody else is
  repacking). Use that in index-pack (and possibly other places, but
  we can start with index-pack).
 
 If we skip step 1 out of pessimism (which I think is a reasonable thing
 to do), then step 2 should not be all that much work. I'm going to be
 offline for a few days, though, so I won't get to it until next week at
 the earliest. If you (or someone else) wants to take a stab at it,
 please feel free.
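To make the tradeoff concrete: the safe lookup re-reads the pack directory and retries when an object is not found (guarding against a concurrent repack), whereas a "quick" lookup skips that step and accepts a rare false negative. A self-contained sketch of the two control flows, using stand-in helpers rather than git's real internals (the names below are illustrative, not the eventual patch):

#include <stdio.h>

/*
 * The three helpers below are stand-ins for git internals
 * (find_pack_entry(), loose-object lookup, reprepare_packed_git()),
 * not the real functions.
 */
static int find_in_known_packs(const char *sha1) { (void)sha1; return 0; }
static int find_loose(const char *sha1)          { (void)sha1; return 0; }
static void rescan_pack_directory(void)
{
	puts("  (re-reading objects/pack -- expensive on NFS)");
}

static int has_object_safe(const char *sha1)
{
	if (find_in_known_packs(sha1) || find_loose(sha1))
		return 1;
	/* Miss: a concurrent repack may have moved it; re-scan and retry. */
	rescan_pack_directory();
	return find_in_known_packs(sha1);
}

static int has_object_quick(const char *sha1)
{
	/*
	 * Same lookups, but no re-scan on a miss: a simultaneous repack
	 * may cause a false negative, which the caller must tolerate.
	 */
	return find_in_known_packs(sha1) || find_loose(sha1);
}

int main(void)
{
	puts("safe lookup of a missing object:");
	printf("  -> %d\n", has_object_safe("deadbeef"));
	puts("quick lookup of a missing object:");
	printf("  -> %d\n", has_object_quick("deadbeef"));
	return 0;
}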

I've been off myself and wanted to make sure I hadn't missed anything in 
the email threads while I was away, as there were rather a lot of them.

I'm not feeling confident enough to make the changes myself at the moment. 
I think what you are saying is that has_sha1_file_quick() would be the version
from before your change in 45e8a74, and index-pack would then use that instead,
but I could be barking up the wrong tree completely.

Thanks,

Steve


RE: Troubleshoot clone issue to NFS.

2015-05-22 Thread steve.norman
On Friday, May 22, 2015 @ 8:12 AM Jeff King did scribble:

  In builtin/index-pack.c, replace the line collision_test_needed =
  has_sha1_file(sha1); with collision_test_needed = 0;. Security is
  compromised but for this test it should be ok. Then clone again. I
  hope the new number gets down close to v1.8.4.1.
 
 Yeah, I think that is a good starting point. I timed a local clone
 before and after 45e8a74; there is a small difference on my system
 (about 5%), but it goes away with your suggestion.
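For clarity, the experiment amounts to a one-line change in builtin/index-pack.c, something like the following illustrative hunk (for timing tests only, since it disables the collision check entirely):

--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
-	collision_test_needed = has_sha1_file(sha1);
+	collision_test_needed = 0;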

Tested this change on a couple of versions, first of all on the revision
where things go wrong for me:

~ $ bin/git --version
git version 1.8.4.1.g45e8a74.dirty

~ $ time bin/git clone https://github.com/git/git test
snip

real    0m7.105s
user    0m9.566s
sys     0m0.989s

~ $ time bin/git clone https://github.com/git/git /sami/test
snip

real    0m14.411s
user    0m9.703s
sys     0m1.374s

This is more in line with what I see normally.   Also tested on master:

~ $ bin/git --version
git version 2.4.1.217.g6c1249c.dirty

~ $ time bin/git clone https://github.com/git/git test
snip

real    0m5.946s
user    0m9.111s
sys     0m1.332s


~ $ time bin/git clone https://github.com/git/git /sami/test
snip

real    0m12.344s
user    0m9.187s
sys     0m1.579s

So similar on the latest as well.

 The problem with has_sha1_file() prior to v1.8.4.2 is that it is racy
 with respect to simultaneous operations; we might claim we do not have
 an object, when in fact we do. As you noted, usually has_sha1_file()
 returns true (i.e., we look up objects that we expect to have), and the
 performance impact is minimal.
 
 But for code paths where _not_ having the object is normal, the impact
 is much greater. So I think there are two possibilities for improving
 this:
 
   1. Find places where we expect the object will not exist (like the
  collision_test check you pointed out) and use a
  has_sha1_file_fast that accepts that it may very occasionally
  erroneously return false. In this case it would mean potentially
  skipping a collision check, but I think that is OK. That could have
  security implications, but only if an attacker:
 
a. has broken sha1 to generate a colliding object
 
b. can manipulate the victim into repacking in a loop
 
c. can manipulate the victim into fetching (or receiving a push)
   simultaneously with (b)
 
  at which point they can try to race the repack procedure to add
  their colliding object to the repository. It seems rather unlikely
  (especially part a).
 
   2. Make reprepare_packed_git() cheaper in the common case that nothing
  has changed. It would probably be enough to stat(objects/pack).
  We know that packfiles themselves do not change; we may only add or
  delete them. And since the hierarchy of objects/pack is flat, we
  know that the mtime on that directory will change if any packs are
  added or removed.
 
  Of course, we are still doing an extra stat() for each has_sha1_file
  call. Whether that helps for the NFS case depends on whether stat()
  is significantly cheaper than opendir/readdir/closedir. On my local
  disk, the hacky patch below did seem to give me back the 5% lost by
  45e8a74 (I did it directly on master, since that old commit does
  not have the stat_validity infrastructure).
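To make option 2 concrete, the shape of the check is: remember the last stat() of objects/pack and skip the expensive directory re-scan when nothing there has changed. A standalone sketch of that idea follows; the real patch uses git's stat_validity infrastructure and compares more fields, so treat this as an illustration only:

#include <stdio.h>
#include <sys/stat.h>

/*
 * Sketch of "make reprepare_packed_git() cheaper": keep a snapshot of
 * the last stat() of the pack directory and only pay for a full
 * re-scan when the directory looks different.  Illustrative only.
 */
struct dir_snapshot {
	int valid;
	struct stat st;
};

static int pack_dir_unchanged(struct dir_snapshot *snap, const char *path)
{
	struct stat st;

	if (stat(path, &st) < 0) {
		snap->valid = 0;	/* cannot tell; force a re-scan */
		return 0;
	}
	if (snap->valid &&
	    st.st_mtime == snap->st.st_mtime &&
	    st.st_ino == snap->st.st_ino &&
	    st.st_size == snap->st.st_size)
		return 1;		/* same directory state: no packs added or removed */
	snap->st = st;			/* remember for the next check */
	snap->valid = 1;
	return 0;
}

int main(void)
{
	static struct dir_snapshot snap;
	const char *pack_dir = ".git/objects/pack";	/* example path */

	/* The first call records the state; later calls are a single cheap stat(). */
	printf("first check:  %s\n",
	       pack_dir_unchanged(&snap, pack_dir) ? "skip re-scan" : "re-scan packs");
	printf("second check: %s\n",
	       pack_dir_unchanged(&snap, pack_dir) ? "skip re-scan" : "re-scan packs");
	return 0;
}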

Also tested master with the patch provided:

~ $ bin/git --version
git version 2.4.1.217.g6c1249c.dirty

~ $ time git clone https://github.com/git/git test

real    0m8.263s
user    0m10.550s
sys     0m3.763s

~ $ time git clone https://github.com/git/git /sami/test

real    1m3.286s
user    0m12.149s
sys     0m9.192s

So the patch isn't reducing the time taken when cloning to NAS.

Here are the top calls from strace

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.68   19.946171          46    437398     45436 futex
  4.63    1.017637          22     46141         5 read
  2.37    0.521161           4    141429           pread
  0.73    0.161442           3     47130         9 write
  0.42    0.093146           0    188645    188621 access
  0.38    0.083033          26      3209       181 open
  0.32    0.069587           0    188613      1146 stat
  0.23    0.050855          12      4082      3925 lstat
  0.11    0.023317           8      2979         1 fstat
  0.04    0.009134           0     35696         3 recvfrom
  0.03    0.007666        1917         4           wait4
  0.02    0.004478           1      3923           madvise
  0.01    0.002291           0     17858           poll
  0.01    0.002155           0     17851           select
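As a rough cross-check of those numbers: the access() and stat() counts (188,645 and 188,613, nearly all of them failing) work out to roughly one failed lookup per object in the ~186,000-object pack, which would be consistent with a per-object loose-object existence check that misses every time.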

Thanks for looking into this.

Steve

RE: Troubleshoot clone issue to NFS.

2015-05-22 Thread steve.norman
On Friday, May 22, 2015 @ 11:06 AM Duy Nguyen did write:

 Strange. Maybe there is something else... Anyway some numbers from me.
 This is nfs3 hosted by Raspberry Pi, accessed over wireless. I just
 run index-pack on git.git pack instead of full clone.
 
  - v1.8.4.1 34s
  - v1.8.4.2 519s (oh.. wireless..)
  - v1.8.4.2 without has_sha1_file() in index-pack.c 33s
  - v1.8.4.2 + Jeff's mtime patch 36s

Just ran the test again and it was 12 seconds. There are too many versions of git 
available on the machine and I think I might have run the wrong one earlier:

~ $ time bin/git clone https://github.com/git/git test
Cloning into 'test'...
remote: Counting objects: 186015, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 186015 (delta 8), reused 7 (delta 7), pack-reused 186000
Receiving objects: 100% (186015/186015), 61.33 MiB | 30.35 MiB/s, done.
Resolving deltas: 100% (135512/135512), done.
Checking connectivity... done.

real    0m6.931s
user    0m10.011s
sys     0m2.685s

~ $ time bin/git clone https://github.com/git/git /sami/test
Cloning into '/sami/test'...
remote: Counting objects: 186015, done.
remote: Compressing objects: 100% (8/8), done.
remote: Total 186015 (delta 8), reused 7 (delta 7), pack-reused 186000
Receiving objects: 100% (186015/186015), 61.33 MiB | 29.91 MiB/s, done.
Resolving deltas: 100% (135512/135512), done.
Checking connectivity... done.
Checking out files: 100% (2795/2795), done.

real    0m13.169s
user    0m9.961s
sys     0m3.480s

So I would say that was in line with the timings for the other versions I have 
tested.

And looking back at the email from this morning, I did run the wrong bin/git. 
Sorry, totally my fault on that. 

If I can test anything else then let me know.

Thanks,

Steve
--


Troubleshoot clone issue to NFS.

2015-05-21 Thread steve.norman
In setting up some new git servers I was trying to test the performance of some 
NFS-mounted volumes, and compared to local disk (although this is a virtualized 
server, so not truly local) cloning to NFS was taking a long time.

Here are some timings:

~ $ time bin/git clone https://github.com/git/git test
Cloning into 'test'...
remote: Counting objects: 185964, done.
remote: Compressing objects: 100% (276/276), done.
remote: Total 185964 (delta 203), reused 32 (delta 32), pack-reused 185656
Receiving objects: 100% (185964/185964), 61.42 MiB | 26.16 MiB/s, done.
Resolving deltas: 100% (135454/135454), done.
Checking connectivity... done.

real    0m8.156s
user    0m10.569s
sys     0m3.857s

~ $ time bin/git clone https://github.com/git/git /sami/test
Cloning into '/sami/test'...
remote: Counting objects: 185964, done.
remote: Compressing objects: 100% (276/276), done.
remote: Total 185964 (delta 203), reused 32 (delta 32), pack-reused 185656
Receiving objects: 100% (185964/185964), 61.42 MiB | 10.15 MiB/s, done.
Resolving deltas: 100% (135454/135454), done.
Checking connectivity... done.
Checking out files: 100% (2795/2795), done.

real    0m58.685s
user    0m12.153s
sys     0m7.619s

So cloning to NFS is 50 seconds slower on average (I've run this a lot over the 
last few days and it does appear to be consistent and not a network / github 
connectivity issue).  Tests creating files on the NFS with dd didn't show that 
much difference to the local disk (and were sometimes quicker).

Volume mount differences are:

/dev/mapper/rootvg-homelv on /home type ext4 (rw,nodev)
nfsserver:/vol/sami/repos on /sami type nfs 
(rw,bg,nfsvers=3,tcp,hard,nointr,timeo=600,rsize=32768,wsize=32768,addr=10.1.1.1)

And there doesn't appear to be any issue with NFS retransmissions:

/sami $ nfsstat
Client rpc stats:
calls      retrans    authrefrsh
11719847   0          11720190

This morning I did some more digging, as when I tried this on a newly built 
server the NFS times were slower than local disk, but only by around 6-10 
seconds.  The new server had git 1.7.1 installed on it compared to 2.4.0 on 
the machine I've been testing on.  So I built a number of versions of git and 
tested each one to try and find the point where it changed:

v1.8.0     11.363 s
v1.8.3     13.597 s
v1.8.4     13.958 s
v1.8.4.1   14.563 s
v1.8.4.2   1m 0s
v1.8.4.3   1m 9s
v1.8.4.5   1m 1s
v1.8.5     1m 0s
v1.8.5.6   1m 0s
v1.9.0     1m 38s

v2.4.0     58s
v2.4.1     58s

So there appears to be a change in 1.8.4.2 that made this issue appear for me.  
Looking at the release notes, the only thing I can see that might be related is:

* When an object is not found after checking the packfiles and then
   loose object directory, read_sha1_file() re-checks the packfiles to
   prevent racing with a concurrent repacker; teach the same logic to
   has_sha1_file().

So the questions are:

1) Should I expect a clone to NFS to be that much slower?
2) Is there anything I could do to speed it up? (I've tried --bare, as that is 
what the repositories will be when stored on NFS, and there wasn't really a 
difference.)
3) What else can I use in git to compare performance on local disk versus NFS, 
to see if it is just clones that are affected?
4) I assume I could bisect between 1.8.4.1 and 1.8.4.2 to find exactly where 
things get worse for me?

Thanks for any help,

Steve





RE: Troubleshoot clone issue to NFS.

2015-05-21 Thread steve.norman
On Thu, May 21, 2015 at 9:31 PM, Duy Nguyen [mailto:pclo...@gmail.com] did 
scribble:
  In case an object is not found, the pack directory is re-read, which
  might cause some increased load on NFS.
  has_sha1_file() not finding the object should not happen often..
 
 That last statement is probably very wrong, but I have no time to test this
 now. In index-pack, there is a has_sha1_file() call for the file collision
 test. That call on a fresh clone would fail for _every_ object in the
 (new) pack, and the cost of re-preparing the pack files could be sky high...

Confirmed with bisect that it is that commit:

~/git $ git bisect bad
45e8a7487339c0f0ea28244ef06851308d07387c is the first bad commit
commit 45e8a7487339c0f0ea28244ef06851308d07387c
Author: Jeff King p...@peff.net
Date:   Fri Aug 30 15:14:13 2013 -0400

I have an strace log for that build, but it is 153 megabytes so I probably 
shouldn't attach it; the call summary is below:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 93.71   39.670084         103    386835      2171 futex
  3.16    1.338572           7    190550       181 open
  1.56    0.658786          18     37450         3 read
  0.62    0.262740           2    141390           pread
  0.41    0.171526           5     37313         9 write
  0.18    0.076166           0    188859    188835 access
  0.11    0.047941           0    374712           getdents
  0.11    0.045174          11      4067      3910 lstat
  0.06    0.023630           0    190425           close
  0.04    0.017995           6      2975         1 fstat
  0.02    0.007668        1917         4           wait4
  0.01    0.004150           1      5065           madvise
  0.01    0.003548           0     16090         8 recvfrom
  0.00    0.001872           0      8048           select
  0.00    0.001870          11       173         1 mkdir
  0.00    0.000872           0      8055           poll
  0.00    0.000262          22        12        12 readlink
  0.00    0.000185           0      1217      1146 stat
  0.00    0.000158           0       457           mprotect
  0.00    0.000074           0       298           mmap
  0.00    0.000069           1       109         8 rt_sigreturn
  0.00    0.000047           0       159           brk
  0.00    0.000021           1        17           getcwd
  0.00    0.000000           0        42         3 lseek
  0.00    0.000000           0        92           munmap
  0.00    0.000000           0        35           rt_sigaction
  0.00    0.000000           0         9           rt_sigprocmask
  0.00    0.000000           0         8         3 ioctl
  0.00    0.000000           0        11           pipe
  0.00    0.000000           0         3           dup
  0.00    0.000000           0         8           dup2
  0.00    0.000000           0         6           setitimer
  0.00    0.000000           0        11         1 socket
  0.00    0.000000           0         8         7 connect
  0.00    0.000000           0         8           sendto
  0.00    0.000000           0         2           recvmsg
  0.00    0.000000           0         1           bind
  0.00    0.000000           0         1           getsockname
  0.00    0.000000           0         3         1 getpeername
  0.00    0.000000           0         2           setsockopt
  0.00    0.000000           0         2           getsockopt
  0.00    0.000000           0         8           clone
  0.00    0.000000           0         5           execve
  0.00    0.000000           0         3           uname
  0.00    0.000000           0       100           fcntl
  0.00    0.000000           0         2           fsync
  0.00    0.000000           0        13           chdir
  0.00    0.000000           0        14           rename
  0.00    0.000000           0         2           link
  0.00    0.000000           0         5           unlink
  0.00    0.000000           0         2           symlink
  0.00    0.000000           0         9           chmod
  0.00    0.000000           0         6           getrlimit
  0.00    0.000000           0         2           sysinfo
  0.00    0.000000           0         8           getuid
  0.00    0.000000           0         1           statfs
  0.00    0.000000           0         5           arch_prctl
  0.00    0.000000           0         1           gettid
  0.00    0.000000           0         5           set_tid_address
  0.00    0.000000           0        13           set_robust_list
------ ----------- ----------- --------- --------- ----------------
100.00   42.333410               1594736    196300 total
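A back-of-the-envelope check on these numbers: 374,712 getdents calls over a pack of roughly 186,000 objects is about two directory reads per object (a readdir pass typically takes one call that returns the entries plus one that returns zero), which fits the theory above that the pack directory is re-read once for every object whose collision test misses.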

Is there anything else I can provide or test?

Thanks for the help,

Steve