On 8/9/2007 5:41 PM, Dustin J. Mitchell wrote:

I've attached a patch to do just that.  This patch also fixes a typo in
the RevisionSet constructor which triggered when end_rev was None,
although that code remains unused.  I did remove the support for a
nonexistent end_rev from analyze_revs.

Yes, thanks for that. I think it used to be used :)

I don't have the birds-eye understanding to know if the
repositories-equal check
  get_repo_root(branch_target) == get_repo_root(source_url)
is going to cause an extra round-trip to the server.  I don't *think* it
will, assuming that branch_target is a working copy.  Can anyone
confirm?

It will still cause an extra roudtrip because at this point we should have ever executed "svn info source_url".

I debugged this further and noticed that the extra roundtrip is already present in the trunk version if you use -S, and it's caused by this code in main():

        source_pathid = target_to_pathid(source)
        if str(cmd) == "init" and \
               source_pathid == target_to_pathid("."):
error("cannot init integration source path '%s'\nIts repository-relative path must " "differ from the repository-relative path of the current directory." % source_pathid)


The target_to_pathid() call requires to get the reporoot.

I think that this shows how hard it is to notice this kind of performance issue now that svnmerge.py code has grown in complexity and features (and abstraction layers). Thus, I believe that an approach like that in your patch, that tries to avoid a server roundtrip explicitally within an unrelated function (which is just trying to get its job done) does not scale quite well.

I think we should try to fix things at the *bottom* of the chain, that is in the guts of the functions that physically execute svn calls.

The attached patch does just that. It adds an internal cache in get_reporoot, so that the function becomes smarter and you can now use it everywhere (get_reporoot() itself or any abstraction layer built upon it) without having to code defensively. I tested this patch and it looks OK. I also verified that it avoids the already existing extra-roundtrip when using "-S" on the command line.

I plan to commit this, but I would like your review on it. With this patch in, your original one-liner to just use source_url instead of branch_url is OK because it does not cause performance regressions anymore.

(also please double check the log commit format, thanks!)
--
Giovanni Bajo

[[[
Improve performance by avoiding redundant server round-trips when
using -S.

This patch introduces an internal cache in get_repo_root() which
allows the function to automatically detect if a given URL is part
of an already analyzed repository without doing a server roundtrip.

Since I was at it, I also removed a useless global declaration for
the other similar cache used by get_svninfo().

* svnmerge.py
  (get_svninfo): Remove global declaration.
  (get_repo_root): Update and use the new _cache_reporoot,
    storing repository roots that were already found.

* svnmerge_test.py
  (TestCase_SvnMerge.svnmerge2): empty the new cache.
]]]

Index: svnmerge_test.py
===================================================================
--- svnmerge_test.py (revision 26025)
+++ svnmerge_test.py (working copy)
@@ -202,6 +202,7 @@
                 # Clear svnmerge's internal cache before running any
                 # commands.
                 svnmerge._cache_svninfo = {}
+                svnmerge._cache_reporoot = {}
 
                 ret = svnmerge.main(args)
             except SystemExit, e:
Index: svnmerge.py
===================================================================
--- svnmerge.py (revision 26025)
+++ svnmerge.py (working copy)
@@ -769,7 +769,6 @@
     """Extract the subversion information for a target (through 'svn info').
     This function uses an internal cache to let clients query information
     many times."""
-    global _cache_svninfo
     if _cache_svninfo.has_key(target):
         return _cache_svninfo[target]
     info = {}
@@ -789,13 +788,16 @@
         return info["URL"]
     return target
 
+_cache_reporoot = {}
 def get_repo_root(target):
     """Compute the root repos URL given a working-copy path, or a URL."""
     # Try using "svn info WCDIR". This works only on SVN clients >= 1.3
     if not is_url(target):
         try:
             info = get_svninfo(target)
-            return info["Repository Root"]
+            root = info["Repository Root"]
+            _cache_reporoot[root] = None
+            return root
         except KeyError:
             pass
         url = target_to_url(target)
@@ -803,10 +805,21 @@
     else:
         url = target
 
+    # Go through the cache of the repository roots. This avoids extra
+    # server round-trips if we are asking the root of different URLs
+    # in the same repository (the cache in get_svninfo() cannot detect
+    # that of course and would issue a remote command).
+    assert is_url(url)
+    for r in _cache_reporoot:
+        if url.startswith(r):
+            return r
+
     # Try using "svn info URL". This works only on SVN clients >= 1.2
     try:
         info = get_svninfo(url)
-        return info["Repository Root"]
+        root = info["Repository Root"]
+        _cache_reporoot[root] = None
+        return root
     except LaunchError:
         pass
 
@@ -818,6 +831,7 @@
         try:
             launchsvn('proplist "%s"' % temp)
         except LaunchError:
+            _cache_reporoot[url] = None
             return url
         url = temp
 
_______________________________________________
Svnmerge mailing list
[email protected]
http://www.orcaware.com/mailman/listinfo/svnmerge

Reply via email to