Re: [Svnmerge] request for review: #2863 - analyze_source_revs fix

Giovanni Bajo Fri, 10 Aug 2007 02:17:15 -0700

On 8/9/2007 5:41 PM, Dustin J. Mitchell wrote:

I've attached a patch to do just that.  This patch also fixes a typo in
the RevisionSet constructor which triggered when end_rev was None,
although that code remains unused.  I did remove the support for a
nonexistent end_rev from analyze_revs.


Yes, thanks for that. I think it used to be used :)

I don't have the birds-eye understanding to know if the
repositories-equal check
  get_repo_root(branch_target) == get_repo_root(source_url)
is going to cause an extra round-trip to the server.  I don't *think* it
will, assuming that branch_target is a working copy.  Can anyone
confirm?

It will still cause an extra roudtrip because at this point we shouldhave ever executed "svn info source_url".

I debugged this further and noticed that the extra roundtrip is alreadypresent in the trunk version if you use -S, and it's caused by this codein main():


        source_pathid = target_to_pathid(source)
        if str(cmd) == "init" and \
               source_pathid == target_to_pathid("."):

error("cannot init integration source path '%s'\nItsrepository-relative path must ""differ from the repository-relative path of thecurrent directory." % source_pathid)



The target_to_pathid() call requires to get the reporoot.

I think that this shows how hard it is to notice this kind ofperformance issue now that svnmerge.py code has grown in complexity andfeatures (and abstraction layers). Thus, I believe that an approach likethat in your patch, that tries to avoid a server roundtrip explicitallywithin an unrelated function (which is just trying to get its job done)does not scale quite well.

I think we should try to fix things at the *bottom* of the chain, thatis in the guts of the functions that physically execute svn calls.

The attached patch does just that. It adds an internal cache inget_reporoot, so that the function becomes smarter and you can now useit everywhere (get_reporoot() itself or any abstraction layer built uponit) without having to code defensively. I tested this patch and it looksOK. I also verified that it avoids the already existingextra-roundtrip when using "-S" on the command line.

I plan to commit this, but I would like your review on it. With thispatch in, your original one-liner to just use source_url instead ofbranch_url is OK because it does not cause performance regressions anymore.


(also please double check the log commit format, thanks!)
--
Giovanni Bajo

[[[
Improve performance by avoiding redundant server round-trips when
using -S.

This patch introduces an internal cache in get_repo_root() which
allows the function to automatically detect if a given URL is part
of an already analyzed repository without doing a server roundtrip.

Since I was at it, I also removed a useless global declaration for
the other similar cache used by get_svninfo().

* svnmerge.py
  (get_svninfo): Remove global declaration.
  (get_repo_root): Update and use the new _cache_reporoot,
    storing repository roots that were already found.

* svnmerge_test.py
  (TestCase_SvnMerge.svnmerge2): empty the new cache.
]]]

Index: svnmerge_test.py
===================================================================
--- svnmerge_test.py (revision 26025)
+++ svnmerge_test.py (working copy)
@@ -202,6 +202,7 @@
                 # Clear svnmerge's internal cache before running any
                 # commands.
                 svnmerge._cache_svninfo = {}
+                svnmerge._cache_reporoot = {}
 
                 ret = svnmerge.main(args)
             except SystemExit, e:
Index: svnmerge.py
===================================================================
--- svnmerge.py (revision 26025)
+++ svnmerge.py (working copy)
@@ -769,7 +769,6 @@
     """Extract the subversion information for a target (through 'svn info').
     This function uses an internal cache to let clients query information
     many times."""
-    global _cache_svninfo
     if _cache_svninfo.has_key(target):
         return _cache_svninfo[target]
     info = {}
@@ -789,13 +788,16 @@
         return info["URL"]
     return target
 
+_cache_reporoot = {}
 def get_repo_root(target):
     """Compute the root repos URL given a working-copy path, or a URL."""
     # Try using "svn info WCDIR". This works only on SVN clients >= 1.3
     if not is_url(target):
         try:
             info = get_svninfo(target)
-            return info["Repository Root"]
+            root = info["Repository Root"]
+            _cache_reporoot[root] = None
+            return root
         except KeyError:
             pass
         url = target_to_url(target)
@@ -803,10 +805,21 @@
     else:
         url = target
 
+    # Go through the cache of the repository roots. This avoids extra
+    # server round-trips if we are asking the root of different URLs
+    # in the same repository (the cache in get_svninfo() cannot detect
+    # that of course and would issue a remote command).
+    assert is_url(url)
+    for r in _cache_reporoot:
+        if url.startswith(r):
+            return r
+
     # Try using "svn info URL". This works only on SVN clients >= 1.2
     try:
         info = get_svninfo(url)
-        return info["Repository Root"]
+        root = info["Repository Root"]
+        _cache_reporoot[root] = None
+        return root
     except LaunchError:
         pass
 
@@ -818,6 +831,7 @@
         try:
             launchsvn('proplist "%s"' % temp)
         except LaunchError:
+            _cache_reporoot[url] = None
             return url
         url = temp

_______________________________________________
Svnmerge mailing list
[email protected]
http://www.orcaware.com/mailman/listinfo/svnmerge

Re: [Svnmerge] request for review: #2863 - analyze_source_revs fix

Reply via email to