Re: Anomaly with the new code - Re: git-svn performance

2014-10-28 Thread Eric Wong
Hin-Tak Leung ht...@users.sourceforge.net wrote:
 Eric Wong normalper...@yhbt.net wrote:
  Which SVN version are you using?  I'm cloning (currently on r373xx)
  https://svn.r-project.org/R using --stdlayout and
  unable to see memory growth of the git-svn Perl process beyond 40M
  (on a 32-bit system).
 
 git-svn hit 45M and took 11:44 to finish.   My ping times to
 svn.r-project.org is around 150ms (I'm running this from a server in
 Fremont, California).  I'll keep the repo around and periodically fetch
 to see how it runs.
 
 I'll apply the 10 patches against 2.1.0 and see then. As I wrote
 in my last reply, my 3rd clone took about 8 hours to finish,
 and the max resident size is about 700MB (according to GNU time).

The time command is not a good measurement since it includes child
process memory use (which may be file-backed mmap for git repack or
git cat-file --batch).  My measurements are just the RSS of the
git-svn Perl process (from ps aux or VmRSS in /proc/$PID/status
on Linux).
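
A minimal sketch of the per-process measurement described above, assuming
Linux and Python 3. The helper names parse_vmrss_kb and read_vmrss_kb are
hypothetical; only the VmRSS field of /proc/$PID/status comes from the
message itself.

```python
import re
from typing import Optional


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in kB) found in the text of /proc/<pid>/status.

    Returns None when the field is absent (e.g. for kernel threads).
    """
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_vmrss_kb(pid: int) -> Optional[int]:
    """Read the current resident set size of a single process (Linux only).

    Unlike GNU time, this looks at one PID and does not fold in children.
    """
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kb(fh.read())
```

Calling read_vmrss_kb() with the PID of the git-svn Perl process at intervals
during a fetch, and keeping the largest value seen, would approximate the peak
RSS of that one process.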
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Anomaly with the new code - Re: git-svn performance

2014-10-27 Thread Eric Wong
Hin-Tak Leung ht...@users.sourceforge.net wrote:
 On Sat, Oct 25, 2014 00:34 BST Eric Wong wrote:
 0006 is insufficient and incompatible with older SVN.
 I pushed git-svn: reload RA every log-window-size
 (commit dfa72fdb96befbd790f623bb2909a347176753c2) instead
 which saves much more memory:

 it is fetching against the new clone taking twice as long and
 consuming twice as much memory.

Which SVN version are you using?  I'm cloning (currently on r373xx)
https://svn.r-project.org/R using --stdlayout and
unable to see memory growth of the git-svn Perl process beyond 40M
(on a 32-bit system).

I also tried http:// (not https), svn+ssh:// on my local (64-bit) system
and did not see memory growth, either:

http://mid.gmane.org/20141027014033.ga4...@dcvr.yhbt.net

I'm using svn 1.6.17 on Debian stable in all cases.


Re: Anomaly with the new code - Re: git-svn performance

2014-10-27 Thread Eric Wong
Eric Wong normalper...@yhbt.net wrote:
 Which SVN version are you using?  I'm cloning (currently on r373xx)
 https://svn.r-project.org/R using --stdlayout and
 unable to see memory growth of the git-svn Perl process beyond 40M
 (on a 32-bit system).

git-svn hit 45M and took 11:44 to finish.   My ping time to
svn.r-project.org is around 150ms (I'm running this from a server in
Fremont, California).  I'll keep the repo around and periodically fetch
to see how it runs.


Re: Anomaly with the new code - Re: git-svn performance

2014-10-27 Thread Hin-Tak Leung
--
On Mon, Oct 27, 2014 06:38 GMT Eric Wong wrote:

Which SVN version are you using?  I'm cloning (currently on r373xx)
https://svn.r-project.org/R using --stdlayout and
unable to see memory growth of the git-svn Perl process beyond 40M
(on a 32-bit system).

I also tried http:// (not https), svn+ssh:// on my local (64-bit) system
and did not see memory growth, either:

    http://mid.gmane.org/20141027014033.ga4...@dcvr.yhbt.net

I'm using svn 1.6.17 on Debian stable in all cases.

The memory consumption does seem to go up a good deal after about r48xxx
(the total being about r67xxx now), when there are a fair number of
branches. Seeing as you seem to be able to make the memory consumption
drop further, I'll rebuild git now, dropping/adding those patches.

I also just realised that /usr/bin/time -v git svn fetch --all also
includes the periodic automatic garbage collection from git itself when
fetching more than a certain number of commits, so it may not be accurate
once git svn's memory consumption drops below a certain level. Is there
any way of coping with that?

I made a 3rd clone yesterday - it took 8 hours 15 minutes, and 
Command being timed: git svn fetch --all
User time (seconds): 6897.80
System time (seconds): 18853.08
Percent of CPU this job got: 86%
Elapsed (wall clock) time (h:mm:ss or m:ss): 8:14:00
...
Maximum resident set size (kbytes): 675436

and fetching the next 8 commits:

$ /usr/bin/time -v git svn fetch --all
M   doc/NEWS.Rd
r66871 = 0a7f50fc04dee174229513a0d80fecfcd12975ca (refs/remotes/trunk)
...
M   doc/manual/R-exts.texi
r66879 = ede68f65df714c3ba283579d85105393c1eccc80 (refs/remotes/trunk)
Auto packing the repository in background for optimum performance.
See git help gc for manual housekeeping.
Command being timed: git svn fetch --all
User time (seconds): 856.82
System time (seconds): 29.78
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 15:03.39
...
Maximum resident set size (kbytes): 791088

and quite similar against the 2nd clone, but against the first clone (which
was created by fetching every few days over a few years):

Command being timed: git svn fetch --all
User time (seconds): 518.00
System time (seconds): 28.62
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 9:16.84
...
Maximum resident set size (kbytes): 403160

So it seems the first clone is rather different from the recent ones. I
haven't got round to comparing the branches yet - it is actually easier
than I thought, since I only need to compare the branch HEADs. (I already
mentioned that trunk is different, due to a blank vs 3-word commit message
about 2 years ago - I reckon I might see similar issues in the other
branches - I'll go and write a script to check that now).
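
A minimal sketch of such a branch HEAD comparison, assuming Python 3 and two
clones on the local disk. The function names are hypothetical and this is not
the script actually used; it relies only on git for-each-ref with the
%(refname) and %(objectname) format fields.

```python
import subprocess
from typing import Dict, List, Tuple


def parse_refs(output: str) -> Dict[str, str]:
    """Parse lines of '<refname> <sha1>' into a {refname: sha1} dict."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            name, sha = line.split()
            refs[name] = sha
    return refs


def list_remote_heads(repo: str) -> Dict[str, str]:
    """Ask git for every ref under refs/remotes in the given clone."""
    out = subprocess.run(
        ["git", "-C", repo, "for-each-ref",
         "--format=%(refname) %(objectname)", "refs/remotes"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_refs(out)


def differing_heads(old: Dict[str, str],
                    new: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (ref, old_sha, new_sha) for refs that are missing or differ."""
    result = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name, "missing"), new.get(name, "missing")
        if a != b:
            result.append((name, a, b))
    return result
```

For example, differing_heads(list_remote_heads("R"), list_remote_heads("R-2"))
would list the branches whose HEAD commit differs between the two clones. A
differing commit ID only shows that the histories diverge somewhere (such as
the commit message difference on trunk), not where.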

All recent fetches were done with git 2.1.0 patched with the 6 patches I
mentioned, on Fedora 20 x86_64.

BTW, I have been meaning to ask - are you the same Eric Wong who maintained
some Chinese packages on Debian some years ago? :-)


Re: Anomaly with the new code - Re: git-svn performance

2014-10-27 Thread Hin-Tak Leung
--
On Mon, Oct 27, 2014 16:56 GMT Eric Wong wrote:

Eric Wong normalper...@yhbt.net wrote:
 Which SVN version are you using?  I'm cloning (currently on r373xx)
 https://svn.r-project.org/R using --stdlayout and
 unable to see memory growth of the git-svn Perl process beyond 40M
 (on a 32-bit system).

git-svn hit 45M and took 11:44 to finish.   My ping times to
svn.r-project.org is around 150ms (I'm running this from a server in
Fremont, California).  I'll keep the repo around and periodically fetch
to see how it runs.

I'll apply the 10 patches against 2.1.0 and see then. As I wrote
in my last reply, my 3rd clone took about 8 hours to finish,
and the max resident size is about 700MB (according to GNU time).

AFAIK the hosting server is in northern Europe (Copenhagen?),
so it should be faster for me, fetching from the UK.


Re: Anomaly with the new code - Re: git-svn performance

2014-10-25 Thread Eric Wong
Hin-Tak Leung ht...@users.sourceforge.net wrote:
 On Sat, Oct 25, 2014 00:34 BST Eric Wong wrote:
 Hin-Tak Leung ht...@users.sourceforge.net wrote:
  0006-git-svn-clear-global-SVN-pool-between-get_log-invoca.patch   
 
 0006 is insufficient and incompatible with older SVN.
 I pushed git-svn: reload RA every log-window-size
 (commit dfa72fdb96befbd790f623bb2909a347176753c2) instead
 which saves much more memory:
 
 it is fetching against the new clone taking twice as long and
 consuming twice as much memory.

Ugh, I've only tested "git-svn: reload RA every log-window-size" with
file:// repos so far, so it looks like I'll need to set up remote repos
on my test system to test.


Re: Anomaly with the new code - Re: git-svn performance

2014-10-25 Thread Hin-Tak Leung
--
On Sat, Oct 25, 2014 00:34 BST Eric Wong wrote:

Hin-Tak Leung ht...@users.sourceforge.net wrote:
 I keep tabs of a particular svn repository over many years
 and run git svn fetch --all every few days. So that's the old clone.
 Since this discussion started, I made a new one with git 2.1.0 patched
 with the first two patches below, a couple of weeks ago. And I ran
 'git svn fetch --all' on both every few days since.
 
 I have added a few more patches, so the whole list is the 6
 below against 2.1.0. The latest fetch is really strange - the fetch against
 the new clone took almost twice as long and uses almost twice
 as much memory, vs against the old. 17 min, 800 MB vs 10 min 400MB.
 Details below. Maybe this is a performance issue about how the clones
 were made?

Memory usage seems to grow with the amount of revisions fetched,
see below.  And higher memory means slower fork() on Linux systems.


But this is fetching the same number of revisions, and the same revisions,
to keep the two clones in sync. So the issue is about how distant history
is stored and used/searched, I think.

 0001-git-svn-only-look-at-the-new-parts-of-svn-mergeinfo.patch
 0002-git-svn-only-look-at-the-root-path-for-svn-mergeinfo.patch   
 0003-git-svn-reduce-check_cherry_pick-cache-overhead.patch
 0004-git-svn-cache-only-mergeinfo-revisions.patch 

 0006-git-svn-clear-global-SVN-pool-between-get_log-invoca.patch   

0006 is insufficient and incompatible with older SVN.
I pushed git-svn: reload RA every log-window-size
(commit dfa72fdb96befbd790f623bb2909a347176753c2) instead
which saves much more memory:


It is the fetch against the new clone that is taking twice as long and
consuming twice as much memory.

http://mid.gmane.org/20141024225352.gb31...@dcvr.yhbt.net

But there still seems to be some slow growth with many revisions
which is not mergeinfo-related.

 0007-git-svn-remove-mergeinfo-rev-caching.patch 

I think it is also safe to remove the _rev_list memoization since
it uses a lot of memory.  The remaining caches should be tiny
(but useful, I think).



Anomaly with the new code - Re: git-svn performance

2014-10-24 Thread Hin-Tak Leung
I have kept tabs on a particular svn repository over many years,
running git svn fetch --all every few days. So that's the old clone.
Since this discussion started, I made a new one a couple of weeks ago with
git 2.1.0 patched with the first two patches below. And I have run
'git svn fetch --all' on both every few days since.

I have added a few more patches, so the whole list is the 6
below against 2.1.0. The latest fetch is really strange - the fetch against
the new clone took almost twice as long and used almost twice as much
memory as the fetch against the old: 17 min, 800 MB vs 10 min, 400 MB.
Details below. Maybe this is a performance issue about how the clones
were made?

0001-git-svn-only-look-at-the-new-parts-of-svn-mergeinfo.patch
0002-git-svn-only-look-at-the-root-path-for-svn-mergeinfo.patch   
0003-git-svn-reduce-check_cherry_pick-cache-overhead.patch
0004-git-svn-cache-only-mergeinfo-revisions.patch 
0006-git-svn-clear-global-SVN-pool-between-get_log-invoca.patch   
0007-git-svn-remove-mergeinfo-rev-caching.patch 

(I dropped #5 because it doesn't seem interesting?)

---
$ /usr/bin/time -v git svn fetch --all
...
Command being timed: git svn fetch --all
User time (seconds): 622.20
System time (seconds): 12.52
Percent of CPU this job got: 98%
Elapsed (wall clock) time (h:mm:ss or m:ss): 10:42.21
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 399588
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 320
Minor (reclaiming a frame) page faults: 383987
Voluntary context switches: 2088
Involuntary context switches: 68304
Swaps: 0
File system inputs: 168288
File system outputs: 148960
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
[Hin-Tak@localhost R]$ cd ../R-2/
[Hin-Tak@localhost R-2]$ /usr/bin/time -v git svn fetch --all
M   src/library/stats/R/hclust.R
M   src/library/stats/R/dendrogram.R
r66853 = 7c18b2e4084529d5912cf789c045f2eab7d4083c (refs/remotes/trunk)
M   doc/manual/R-exts.texi
r66854 = bc7b131e34eaf04859fede1ecedb796c0a33be02 (refs/remotes/trunk)
M   doc/manual/R-exts.texi
Checking svn:mergeinfo changes since r66844: 6 sources, 1 changed
W:svn cherry-pick ignored (/trunk:66824,66854) - missing 1084 commit(s) (eg 6453a2d844e27f2963ba87142028b023c50385ef)
r66855 = de5daf8db948732fa96c3d5b32077d8057e2a7e7 (refs/remotes/R-3-1-branch)
M   src/modules/internet/internet.c
r66856 = a1e9300c6dd49ec4c3dd11f861bca0dbe3ca65b4 (refs/remotes/trunk)
M   doc/manual/R-admin.texi
r66857 = eb5f3175e67a806482c39def71246f5d18bf8660 (refs/remotes/trunk)
M   doc/manual/R-admin.texi
Checking svn:mergeinfo changes since r66855: 6 sources, 1 changed
W:svn cherry-pick ignored (/trunk:66854,66857) - missing 1086 commit(s) (eg e8cc0c31ddeeea3f8fa1ad47105d09a2c19e1a98)
r66858 = 10c8013f103d57c8a717b738e2a51c8d397c88f0 (refs/remotes/R-3-1-branch)
M   VERSION
r66859 = 0f865f247da3191431bb17bcc3c307e8735dbd97 (refs/remotes/R-3-1-branch)
Command being timed: git svn fetch --all
User time (seconds): 1023.06
System time (seconds): 15.30
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 17:27.65
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 785332
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 884
Minor (reclaiming a frame) page faults: 527668
Voluntary context switches: 2792
Involuntary context switches: 107718
Swaps: 0
File system inputs: 194704
File system outputs: 170032
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

---
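
To put numbers on the comparison, here is a minimal sketch (Python 3) that
pulls the elapsed time and maximum RSS out of the two /usr/bin/time -v reports
above and computes the ratios. The function and variable names are
hypothetical; the field values are copied from the output above. Note that the
maximum RSS reported by GNU time may come from a child process rather than
from the git-svn Perl process itself.

```python
import re


def elapsed_seconds(report: str) -> float:
    """Convert the 'Elapsed (wall clock) time' field (h:mm:ss or m:ss) to seconds."""
    value = re.search(
        r"Elapsed \(wall clock\) time \(h:mm:ss or m:ss\):\s*(\S+)", report
    ).group(1)
    seconds = 0.0
    for part in value.split(":"):
        # Each colon-separated field is worth 60 times the one after it.
        seconds = seconds * 60 + float(part)
    return seconds


def max_rss_kb(report: str) -> int:
    """Extract the 'Maximum resident set size (kbytes)' field."""
    return int(
        re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", report).group(1)
    )


# Fields copied from the two runs above (old clone "R", new clone "R-2").
OLD_RUN = """Elapsed (wall clock) time (h:mm:ss or m:ss): 10:42.21
Maximum resident set size (kbytes): 399588"""
NEW_RUN = """Elapsed (wall clock) time (h:mm:ss or m:ss): 17:27.65
Maximum resident set size (kbytes): 785332"""

time_ratio = elapsed_seconds(NEW_RUN) / elapsed_seconds(OLD_RUN)
rss_ratio = max_rss_kb(NEW_RUN) / max_rss_kb(OLD_RUN)
print(f"elapsed: {time_ratio:.2f}x, max RSS: {rss_ratio:.2f}x")
```

On these figures the new clone takes roughly 1.6 times as long and peaks at
roughly twice the resident set size.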


Re: Anomaly with the new code - Re: git-svn performance

2014-10-24 Thread Eric Wong
Hin-Tak Leung ht...@users.sourceforge.net wrote:
 I keep tabs of a particular svn repository over many years
 and run git svn fetch --all every few days. So that's the old clone.
 Since this discussion started, I made a new one with git 2.1.0 patched
 with the first two patches below, a couple of weeks ago. And I ran
 'git svn fetch --all' on both every few days since.
 
 I have added a few more patches, so the whole list is the 6
 below against 2.1.0. The latest fetch is really strange - the fetch against
 the new clone took almost twice as long and uses almost twice
 as much memory, vs against the old. 17 min, 800 MB vs 10 min 400MB.
 Details below. Maybe this is a performance issue about how the clones
 were made?

Memory usage seems to grow with the number of revisions fetched,
see below.  And higher memory means slower fork() on Linux systems.

 0001-git-svn-only-look-at-the-new-parts-of-svn-mergeinfo.patch
 0002-git-svn-only-look-at-the-root-path-for-svn-mergeinfo.patch   
 0003-git-svn-reduce-check_cherry_pick-cache-overhead.patch
 0004-git-svn-cache-only-mergeinfo-revisions.patch 

 0006-git-svn-clear-global-SVN-pool-between-get_log-invoca.patch   

0006 is insufficient and incompatible with older SVN.
I pushed "git-svn: reload RA every log-window-size"
(commit dfa72fdb96befbd790f623bb2909a347176753c2) instead,
which saves much more memory:

http://mid.gmane.org/20141024225352.gb31...@dcvr.yhbt.net

But there still seems to be some slow growth with many revisions
which is not mergeinfo-related.

 0007-git-svn-remove-mergeinfo-rev-caching.patch 

I think it is also safe to remove the _rev_list memoization since
it uses a lot of memory.  The remaining caches should be tiny
(but useful, I think).