On 8/9/2007 5:41 PM, Dustin J. Mitchell wrote:
I've attached a patch to do just that. This patch also fixes a typo in
the RevisionSet constructor which triggered when end_rev was None,
although that code remains unused. I did remove the support for a
nonexistent end_rev from analyze_revs.
Yes, thanks for that. I think it used to be used :)
I don't have the birds-eye understanding to know if the
repositories-equal check
get_repo_root(branch_target) == get_repo_root(source_url)
is going to cause an extra round-trip to the server. I don't *think* it
will, assuming that branch_target is a working copy. Can anyone
confirm?
It will still cause an extra roudtrip because at this point we should
have ever executed "svn info source_url".
I debugged this further and noticed that the extra roundtrip is already
present in the trunk version if you use -S, and it's caused by this code
in main():
source_pathid = target_to_pathid(source)
if str(cmd) == "init" and \
source_pathid == target_to_pathid("."):
error("cannot init integration source path '%s'\nIts
repository-relative path must "
"differ from the repository-relative path of the
current directory." % source_pathid)
The target_to_pathid() call requires to get the reporoot.
I think that this shows how hard it is to notice this kind of
performance issue now that svnmerge.py code has grown in complexity and
features (and abstraction layers). Thus, I believe that an approach like
that in your patch, that tries to avoid a server roundtrip explicitally
within an unrelated function (which is just trying to get its job done)
does not scale quite well.
I think we should try to fix things at the *bottom* of the chain, that
is in the guts of the functions that physically execute svn calls.
The attached patch does just that. It adds an internal cache in
get_reporoot, so that the function becomes smarter and you can now use
it everywhere (get_reporoot() itself or any abstraction layer built upon
it) without having to code defensively. I tested this patch and it looks
OK. I also verified that it avoids the already existing
extra-roundtrip when using "-S" on the command line.
I plan to commit this, but I would like your review on it. With this
patch in, your original one-liner to just use source_url instead of
branch_url is OK because it does not cause performance regressions anymore.
(also please double check the log commit format, thanks!)
--
Giovanni Bajo
[[[
Improve performance by avoiding redundant server round-trips when
using -S.
This patch introduces an internal cache in get_repo_root() which
allows the function to automatically detect if a given URL is part
of an already analyzed repository without doing a server roundtrip.
Since I was at it, I also removed a useless global declaration for
the other similar cache used by get_svninfo().
* svnmerge.py
(get_svninfo): Remove global declaration.
(get_repo_root): Update and use the new _cache_reporoot,
storing repository roots that were already found.
* svnmerge_test.py
(TestCase_SvnMerge.svnmerge2): empty the new cache.
]]]
Index: svnmerge_test.py
===================================================================
--- svnmerge_test.py (revision 26025)
+++ svnmerge_test.py (working copy)
@@ -202,6 +202,7 @@
# Clear svnmerge's internal cache before running any
# commands.
svnmerge._cache_svninfo = {}
+ svnmerge._cache_reporoot = {}
ret = svnmerge.main(args)
except SystemExit, e:
Index: svnmerge.py
===================================================================
--- svnmerge.py (revision 26025)
+++ svnmerge.py (working copy)
@@ -769,7 +769,6 @@
"""Extract the subversion information for a target (through 'svn info').
This function uses an internal cache to let clients query information
many times."""
- global _cache_svninfo
if _cache_svninfo.has_key(target):
return _cache_svninfo[target]
info = {}
@@ -789,13 +788,16 @@
return info["URL"]
return target
+_cache_reporoot = {}
def get_repo_root(target):
"""Compute the root repos URL given a working-copy path, or a URL."""
# Try using "svn info WCDIR". This works only on SVN clients >= 1.3
if not is_url(target):
try:
info = get_svninfo(target)
- return info["Repository Root"]
+ root = info["Repository Root"]
+ _cache_reporoot[root] = None
+ return root
except KeyError:
pass
url = target_to_url(target)
@@ -803,10 +805,21 @@
else:
url = target
+ # Go through the cache of the repository roots. This avoids extra
+ # server round-trips if we are asking the root of different URLs
+ # in the same repository (the cache in get_svninfo() cannot detect
+ # that of course and would issue a remote command).
+ assert is_url(url)
+ for r in _cache_reporoot:
+ if url.startswith(r):
+ return r
+
# Try using "svn info URL". This works only on SVN clients >= 1.2
try:
info = get_svninfo(url)
- return info["Repository Root"]
+ root = info["Repository Root"]
+ _cache_reporoot[root] = None
+ return root
except LaunchError:
pass
@@ -818,6 +831,7 @@
try:
launchsvn('proplist "%s"' % temp)
except LaunchError:
+ _cache_reporoot[url] = None
return url
url = temp
_______________________________________________
Svnmerge mailing list
[email protected]
http://www.orcaware.com/mailman/listinfo/svnmerge