Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-21 Thread Doug Robinson
Julian:

Given the required RA protocol changes, when could this change be shipped?
 What version of SVN?

Thank you.

Doug


On Wed, Feb 19, 2014 at 10:06 AM, Julian Foad <julianf...@btopenworld.com> wrote:

 Marc Strapetz wrote:
  Julian Foad wrote:
  It looks like we have an agreement in principle. Would you like to file
 an
  enhancement issue?
 
  Great. I've filed an issue now:
 
  http://subversion.tigris.org/issues/show_bug.cgi?id=4469
 
  Would you please review the various attributes (Subcomponent, ...)?

 That's great, thanks. I added a reference to this email thread, added
 myself to the CC list, and tweaked the type from 'feature' to 'enhancement'
 (just my personal interpretation) and schedule from '---' to 'unscheduled'
 (which just indicates I've thought about it and am stating that it's not
 currently tied to any particular release, it doesn't mean it has to stay
 that way).

 I talked with Brane about this and we discussed how it might make more
 sense to do a higher level API. Instead of asking "what is the absolute
 difference in the mergeinfo representations?" it could ask "What merges and
 other interesting events have occurred in the lifetime of this path?".
 There are a couple of reasons.

 The API as sketched so far is pretty straightforward, but even so the
 effort needed to implement it is not trivial. It requires RA protocol
 changes as well as all the layers of API change. The mergeinfo
 representation is subject to change. It feels like a backward step to
 invest effort in adding more support that is tied specifically to the
 current format.

 SmartSVN and other front ends like to be able to draw a merge graph. Even
 the 'svn mergeinfo' command-line command now draws a little ASCII-art graph
 showing limited information about the most recent merge. At present they
 all have to interpret mergeinfo themselves, at a pretty low level, and the
 interpretation is subtle and poorly understood. (I don't understand the
 edge cases related to adds and deletes properly, and I've been working with
 it for years.)

 So it seems like a good idea to encapsulate the interpretation of
 mergeinfo a bit more, and expose data in a form that is geared specifically
 towards explaining the history in the way that users can understand it.
 Maybe think of it as an extended 'log' operation, adding a small number of
 new notification types such as:

   * there is a full merge into here, bringing in all the new changes
   from PATH up to REV;
   * there is a partial merge to here, bringing in some changes
   from PATH between REV1 and REV2;

 What do you think of that sort of interface? Does your code already
 calculate something like that?

 - Julian




-- 
Douglas B. Robinson | *Senior Product Manager*

WANdisco // *Non-Stop Data*

t. 925-396-1125
e. doug.robin...@wandisco.com




Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-21 Thread Branko Čibej
On 21.02.2014 15:50, Doug Robinson wrote:
 Julian:

 Given the required RA protocol changes, when could this change be
 shipped?  What version of SVN?

We treat a protocol extension the same way as an API extension: new
protocol-level features can only appear in minor version releases (e.g.,
1.9.0 or 1.10.0), and they must be implemented in such a way that they
do not affect older clients.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. br...@wandisco.com


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-21 Thread Julian Foad
Doug Robinson wrote:

 Julian:
 
 Given the required RA protocol changes, when could this change be
 shipped?  What version of SVN?


Hi Doug. A change like that could be shipped in a 1.x.0 version.

- Julian


 Julian Foad wrote:
 Marc Strapetz wrote:
 Julian Foad wrote:
 It looks like we have an agreement in principle. Would you like to file an
 enhancement issue?

 Great. I've filed an issue now:

 http://subversion.tigris.org/issues/show_bug.cgi?id=4469

[...]

 I talked with Brane about this and we discussed how it might make more
 sense to do a higher level API. [...]


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-19 Thread Marc Strapetz
On 18.02.2014 15:26, Julian Foad wrote:
 Marc Strapetz wrote:
 On 17.02.2014 18:36, Julian Foad wrote:
   Marc Strapetz wrote:
   Hence an API like the following should work well for us:

   interface MergeinfoDiffCallback {
     void mergeinfoDiff(int revision,
                        Map<String, Mergeinfo> pathToAddedMergeinfo,
                        Map<String, Mergeinfo> pathToRemovedMergeinfo);
   }

   void getMergeinfoDiff(String rootPath,
                         long fromRev, long toRev,
                         MergeinfoDiffCallback callback)
                         throws ClientException;

   This should give us all mergeinfo which affects any path at or below
   rootPath.
 [...]
 let's use the simpler version that's sufficient for your use case.

 That will be fine.
 [...]
 From cache perspective it's easier to build the cache starting at r0:
 [...] Anyway, I agree that receiving mergeinfo for more recent
 revisions first is reasonable as well. Hence if you say the effort is
 the same, then we could allow both: fromRev <= toRev, in which case we
 will receive mergeinfo in ascending order and fromRev > toRev in which
 case it will be descending order?
 
 Could do. It seems like a relatively minor decision.
 
 [...] important that ranges for which no mergeinfo diff is present
   will be processed quickly on the server-side, otherwise we could run
   into some kind of endless loop, if the cache building process is
   shutdown and resumed frequently.

   [...] There is a client-side work-around: request ranges of say a thousand
 revisions at a time, and then you can easily keep track of how many of these
 requests have been completed.

 OK, that will work.
 
 It looks like we have an agreement in principle. Would you like to file an 
 enhancement issue?

Great. I've filed an issue now:

http://subversion.tigris.org/issues/show_bug.cgi?id=4469

Would you please review the various attributes (Subcomponent, ...)?

-Marc




Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-19 Thread Julian Foad
Marc Strapetz wrote:
 Julian Foad wrote:
 It looks like we have an agreement in principle. Would you like to file an 
 enhancement issue?
 
 Great. I've filed an issue now:
 
 http://subversion.tigris.org/issues/show_bug.cgi?id=4469
 
 Would you please review the various attributes (Subcomponent, ...)?

That's great, thanks. I added a reference to this email thread, added myself to 
the CC list, and tweaked the type from 'feature' to 'enhancement' (just my 
personal interpretation) and schedule from '---' to 'unscheduled' (which just 
indicates I've thought about it and am stating that it's not currently tied to 
any particular release, it doesn't mean it has to stay that way).

I talked with Brane about this and we discussed how it might make more sense to 
do a higher level API. Instead of asking "what is the absolute difference in 
the mergeinfo representations?" it could ask "What merges and other interesting 
events have occurred in the lifetime of this path?". There are a couple of 
reasons.

The API as sketched so far is pretty straightforward, but even so the effort 
needed to implement it is not trivial. It requires RA protocol changes as well 
as all the layers of API change. The mergeinfo representation is subject to 
change. It feels like a backward step to invest effort in adding more support 
that is tied specifically to the current format.

SmartSVN and other front ends like to be able to draw a merge graph. Even the 
'svn mergeinfo' command-line command now draws a little ASCII-art graph showing 
limited information about the most recent merge. At present they all have to 
interpret mergeinfo themselves, at a pretty low level, and the interpretation 
is subtle and poorly understood. (I don't understand the edge cases related to 
adds and deletes properly, and I've been working with it for years.)

So it seems like a good idea to encapsulate the interpretation of mergeinfo a 
bit more, and expose data in a form that is geared specifically towards 
explaining the history in the way that users can understand it. Maybe think of 
it as an extended 'log' operation, adding a small number of new notification 
types such as:

  * there is a full merge into here, bringing in all the new changes
      from PATH up to REV;
  * there is a partial merge to here, bringing in some changes
      from PATH between REV1 and REV2;
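
As a rough illustration of the interface described above, the two notification types could be modelled as small value classes delivered through a log-style callback. This is only a sketch: every name in it (MergeEvent, FullMerge, PartialMerge, MergeEventCallback) is hypothetical, and nothing like it exists in JavaHL or the core libraries today.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base type for one merge-related event on a path. */
abstract class MergeEvent {
    final long revision;      // revision in which the merge was committed
    final String targetPath;  // path that received the merge ("here")
    final String sourcePath;  // PATH the changes were merged from

    MergeEvent(long revision, String targetPath, String sourcePath) {
        this.revision = revision;
        this.targetPath = targetPath;
        this.sourcePath = sourcePath;
    }

    abstract String describe();
}

/** "Full merge into here, bringing in all the new changes from PATH up to REV". */
final class FullMerge extends MergeEvent {
    final long upToRev;

    FullMerge(long revision, String targetPath, String sourcePath, long upToRev) {
        super(revision, targetPath, sourcePath);
        this.upToRev = upToRev;
    }

    @Override
    String describe() {
        return "r" + revision + ": full merge into " + targetPath
                + " of all new changes from " + sourcePath + " up to r" + upToRev;
    }
}

/** "Partial merge to here, bringing in some changes from PATH between REV1 and REV2". */
final class PartialMerge extends MergeEvent {
    final long rev1;
    final long rev2;

    PartialMerge(long revision, String targetPath, String sourcePath, long rev1, long rev2) {
        super(revision, targetPath, sourcePath);
        this.rev1 = rev1;
        this.rev2 = rev2;
    }

    @Override
    String describe() {
        return "r" + revision + ": partial merge into " + targetPath
                + " of some changes from " + sourcePath
                + " between r" + rev1 + " and r" + rev2;
    }
}

/** Hypothetical receiver, analogous in spirit to a log message callback. */
interface MergeEventCallback {
    void mergeEvent(MergeEvent event);
}

class MergeEventDemo {
    /** Feeds events through a callback and collects one description per event. */
    static List<String> describeAll(List<MergeEvent> events) {
        List<String> out = new ArrayList<>();
        MergeEventCallback cb = e -> out.add(e.describe());
        for (MergeEvent e : events) {
            cb.mergeEvent(e);
        }
        return out;
    }
}
```

A client drawing merge arrows would then only need the source path, target path and revision bounds from each event, without parsing svn:mergeinfo itself.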

What do you think of that sort of interface? Does your code already calculate 
something like that?

- Julian



Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-19 Thread Marc Strapetz
On 19.02.2014 16:06, Julian Foad wrote:
 Marc Strapetz wrote:
 Julian Foad wrote:
 It looks like we have an agreement in principle. Would you like
 to file an enhancement issue?
 
 Great. I've filed an issue now:
 
 http://subversion.tigris.org/issues/show_bug.cgi?id=4469
 
 Would you please review the various attributes (Subcomponent,
 ...)?
 
 [...]
 
 SmartSVN and other front ends like to be able to draw a merge graph.
 Even the 'svn mergeinfo' command-line command now draws a little
 ASCII-art graph showing limited information about the most recent
 merge. At present they all have to interpret mergeinfo themselves, at
 a pretty low level, and the interpretation is subtle and poorly
 understood. (I don't understand the edge cases related to adds and
 deletes properly, and I've been working with it for years.)
 So it seems like a good idea to encapsulate the interpretation of
 mergeinfo a bit more, and expose data in a form that is geared
 specifically towards explaining the history in the way that users can
 understand it. Maybe think of it as an extended 'log' operation,
 adding a small number of new notification types such as:
 
 * there is a full merge into here, bringing in all the new changes 
 from PATH up to REV;
 * there is a partial merge to here, bringing in
 some changes from PATH between REV1 and REV2;
 
 What do you think of that sort of interface?

That definitely sounds good. Just to note that the
extended-log-information should be easily receivable and cacheable for
the entire repository and it must be rich enough to easily extract
information for a specific path.

Examples:

- allow to include/exclude subtree merges for merge arrows

- allow merge arrow display for sub-directories and individual files

Ultimately, when having received all extended-log-information for all
revisions, one should be able to recreate raw svn:mergeinfo for all
paths of all revisions. I think this will guarantee that we won't miss
any possible use case when defining the protocol and data structures.
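
The completeness criterion above can be sketched as a replay: start from empty mergeinfo and apply each revision's diff in ascending order. The sketch below assumes a simplified representation in which a path's mergeinfo is just a set of opaque "source:ranges" strings; the class and field names are hypothetical, and real mergeinfo handling (range arithmetic, inheritance, empty versus absent properties) is more involved.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class MergeinfoReplay {
    /** One cached record: what changed in a single revision (hypothetical shape). */
    static final class Diff {
        final long revision;
        final Map<String, Set<String>> added;    // path -> mergeinfo entries added
        final Map<String, Set<String>> removed;  // path -> mergeinfo entries removed

        Diff(long revision, Map<String, Set<String>> added, Map<String, Set<String>> removed) {
            this.revision = revision;
            this.added = added;
            this.removed = removed;
        }
    }

    /**
     * Rebuilds path -> mergeinfo entries as of targetRev by applying the
     * diffs (which must be sorted by ascending revision) to an empty state.
     */
    static Map<String, Set<String>> stateAt(List<Diff> ascendingDiffs, long targetRev) {
        Map<String, Set<String>> state = new TreeMap<>();
        for (Diff d : ascendingDiffs) {
            if (d.revision > targetRev) {
                break;  // later revisions do not affect the requested state
            }
            // Apply removals first, dropping paths whose mergeinfo becomes empty.
            for (Map.Entry<String, Set<String>> e : d.removed.entrySet()) {
                Set<String> current = state.get(e.getKey());
                if (current != null) {
                    current.removeAll(e.getValue());
                    if (current.isEmpty()) {
                        state.remove(e.getKey());
                    }
                }
            }
            // Then apply additions, copying into sets owned by the state map.
            for (Map.Entry<String, Set<String>> e : d.added.entrySet()) {
                state.computeIfAbsent(e.getKey(), k -> new TreeSet<>()).addAll(e.getValue());
            }
        }
        return state;
    }
}
```

Note that this simplified replay cannot tell an empty svn:mergeinfo property from a missing one, which is one of the details a real protocol would have to carry explicitly.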

 Does your code already calculate something like that?

Yes, and I recall having a hard time when writing this code :)

-Marc


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-18 Thread Marc Strapetz
On 17.02.2014 18:36, Julian Foad wrote:
 Marc Strapetz wrote:
 
 ... I'll dig into the cache code ...

 I did that now and the storage is quite simple: we have a main file
 which contains the diff (added, removed) for every path in every
 revision and a revision-based index file with constant record length (to
 quickly locate entries in the main file).

 This storage allows to efficiently query for the mergeinfo diff for a
 path in a certain revision. That's sufficient to build the merge arrows.
 Assembling the complete mergeinfo for a certain revision is hard with
 this cache, but actually not necessary for our use case.
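
The storage layout described above (a main file of per-revision diffs plus an index with constant record length) might look roughly like the following. This is a minimal in-memory sketch, not SmartSVN's actual code: the class name, the 12-byte record layout and the use of a string payload are all assumptions made for illustration, and a real cache would use files with seek access instead of byte arrays.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RevisionIndexedStore {
    // Assumed index record layout: 8-byte offset into the main data + 4-byte length.
    private static final int RECORD_LEN = 12;

    private final ByteArrayOutputStream main = new ByteArrayOutputStream();
    private final ByteArrayOutputStream index = new ByteArrayOutputStream();
    private long nextRevision = 0;

    /** Appends the serialized diff for the next revision (empty string = no change). */
    void append(String serializedDiff) {
        byte[] payload = serializedDiff.getBytes(StandardCharsets.UTF_8);
        ByteBuffer record = ByteBuffer.allocate(RECORD_LEN);
        record.putLong(main.size()).putInt(payload.length);
        index.write(record.array(), 0, RECORD_LEN);
        main.write(payload, 0, payload.length);
        nextRevision++;
    }

    /** Looks up one revision: the index position is simply revision * RECORD_LEN. */
    String read(long revision) {
        if (revision < 0 || revision >= nextRevision) {
            throw new IllegalArgumentException("revision not cached: " + revision);
        }
        ByteBuffer record =
                ByteBuffer.wrap(index.toByteArray(), (int) (revision * RECORD_LEN), RECORD_LEN);
        long offset = record.getLong();
        int length = record.getInt();
        return new String(main.toByteArray(), (int) offset, length, StandardCharsets.UTF_8);
    }
}
```

Because every index record has the same length, locating the diff for a revision is a single multiplication and two reads, which matches the "quickly locate entries" property mentioned above.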

 Hence an API like the following should work well for us:

 interface MergeinfoDiffCallback {
   void mergeinfoDiff(int revision,
                      Map<String, Mergeinfo> pathToAddedMergeinfo,
                      Map<String, Mergeinfo> pathToRemovedMergeinfo);
 }

 void getMergeinfoDiff(String rootPath,
                       long fromRev, long toRev,
                       MergeinfoDiffCallback callback)
                       throws ClientException;

 This should give us all mergeinfo which affects any path at or below
 rootPath.

 When disregarding our particular use case, a more consistent API could be:

 void getMergeinfoDiff(Iterable<String> paths,
                       long fromRev, long toRev,
                       Mergeinfo.Inheritance inherit,
                       boolean includeDescendants,
                       MergeinfoDiffCallback callback)
                       throws ClientException;
 
 I want to discourage callers from knowing or caring how the mergeinfo is 
 stored, so I want to leave out the 'inherit' parameter.
 
 I also think it makes sense not to offer the options of ignoring descendants 
 (that is, subtree mergeinfo), or specifying multiple paths. After all, this 
 is not a low level API to be used for implementing the mergeinfo subsystem, 
 it's a high level query.
 
 So let's use the simpler version that's sufficient for your use case.

That will be fine.

 The mergeinfo diff should be received starting at fromRev and ending at
 toRev. No callback is expected if there is no mergeinfo diff for a
 certain revision. Depending on the server-side storage, we may require
 to always have fromRev <= toRev or always fromRev >= toRev. If it
 doesn't matter, better have always fromRev <= toRev (for reasons given
 below).
 
 The same procedure could work either forwards or backwards, it doesn't really 
 matter as long as you know which way it is going. Often it is useful to know 
 about the more recent changes first, and have the option to look back right 
 to revision 0 if necessary.

From cache perspective it's easier to build the cache starting at r0:
then cache files will contain information for older revision at lower
positions. This allows to crop files easily at a certain revision and
rebuild them from there. That's something we do, if a Log message is
modified from within the GUI (it might not play a role for mergeinfo,
though). Anyway, I agree that receiving mergeinfo for more recent
revisions first is reasonable as well. Hence if you say the effort is
the same, then we could allow both: fromRev <= toRev, in which case we
will receive mergeinfo in ascending order and fromRev > toRev in which
case it will be descending order?

 Regarding the usage, let's assume always fromRev <= toRev, then we will
 invoke

 getMergeinfoDiff(cacheRoot, 0, head, callback)

 This should start returning mergeinfo diff immediately, starting at
 revision 0, so we quickly make at least a bit of progress. Now, if the
 cache building process is shutdown and restarted later, it will resume
 with the latest known revision:

 getMergeinfoDiff(cacheRoot, latestKnownRevision, head, callback)

 This procedure will be performed until we have caught up with head.
 Note, that the latestKnownRevision is the last revision for which we
 have received a callback. Depending on the server-side storage, this may
 be different from the current revision which the server is currently
 processing at the time the cache building process is shutdown. Hence it
 will be important that ranges for which no mergeinfo diff is present
 will be processed quickly on the server-side, otherwise we could run
 into some kind of endless loop, if the cache building process is
 shutdown and resumed frequently.
 
 Yes -- if the server takes a long time to work its way through a large range 
 of (say a million) revisions where there are no mergeinfo changes, there is 
 no graceful way to stop the procedure part way through, and no way to 
 discover how far it has searched when you kill it. Maybe that is not 
 important. There is a client-side work-around: request ranges of say a 
 thousand revisions at a time, and then you can easily keep track of how many 
 of these requests have been completed.

OK, that will work.

-Marc


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-18 Thread Julian Foad
Marc Strapetz wrote:
 On 17.02.2014 18:36, Julian Foad wrote:
  Marc Strapetz wrote:
  Hence an API like the following should work well for us:
 
  interface MergeinfoDiffCallback {
    void mergeinfoDiff(int revision,
                       Map<String, Mergeinfo> pathToAddedMergeinfo,
                       Map<String, Mergeinfo> pathToRemovedMergeinfo);
  }
 
  void getMergeinfoDiff(String rootPath,
                        long fromRev, long toRev,
                        MergeinfoDiffCallback callback)
                        throws ClientException;
 
  This should give us all mergeinfo which affects any path at or below
  rootPath.
[...]
 let's use the simpler version that's sufficient for your use case.
 
 That will be fine.
[...]
 From cache perspective it's easier to build the cache starting at r0:
 [...] Anyway, I agree that receiving mergeinfo for more recent
 revisions first is reasonable as well. Hence if you say the effort is
 the same, then we could allow both: fromRev <= toRev, in which case we
 will receive mergeinfo in ascending order and fromRev > toRev in which
 case it will be descending order?

Could do. It seems like a relatively minor decision.

 [...] important that ranges for which no mergeinfo diff is present
  will be processed quickly on the server-side, otherwise we could run
  into some kind of endless loop, if the cache building process is
  shutdown and resumed frequently.
 
  [...] There is a client-side work-around: request ranges of say a thousand
 revisions at a time, and then you can easily keep track of how many of these
 requests have been completed.
 
 OK, that will work.

It looks like we have an agreement in principle. Would you like to file an 
enhancement issue?

http://subversion.tigris.org/issues/

When you are logged in, that page includes links for filing a new issue. Please 
note that filing an issue doesn't affect whether or when the work will be done, 
but it's useful as a central place to refer to the task.

Do you have the resources to work on implementing this or are you looking for a 
volunteer?

- Julian


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-17 Thread Julian Foad
Marc Strapetz wrote:

 ... I'll dig into the cache code ...
 
 I did that now and the storage is quite simple: we have a main file
 which contains the diff (added, removed) for every path in every
 revision and a revision-based index file with constant record length (to
 quickly locate entries in the main file).
 
 This storage allows to efficiently query for the mergeinfo diff for a
 path in a certain revision. That's sufficient to build the merge arrows.
 Assembling the complete mergeinfo for a certain revision is hard with
 this cache, but actually not necessary for our use case.
 
 Hence an API like the following should work well for us:
 
 interface MergeinfoDiffCallback {
   void mergeinfoDiff(int revision,
                      Map<String, Mergeinfo> pathToAddedMergeinfo,
                      Map<String, Mergeinfo> pathToRemovedMergeinfo);
 }
 
 void getMergeinfoDiff(String rootPath,
                       long fromRev, long toRev,
                       MergeinfoDiffCallback callback)
                       throws ClientException;
 
 This should give us all mergeinfo which affects any path at or below
 rootPath.
 
 When disregarding our particular use case, a more consistent API could be:
 
 void getMergeinfoDiff(Iterable<String> paths,
                       long fromRev, long toRev,
                       Mergeinfo.Inheritance inherit,
                       boolean includeDescendants,
                       MergeinfoDiffCallback callback)
                       throws ClientException;

I want to discourage callers from knowing or caring how the mergeinfo is 
stored, so I want to leave out the 'inherit' parameter.

I also think it makes sense not to offer the options of ignoring descendants 
(that is, subtree mergeinfo), or specifying multiple paths. After all, this is 
not a low level API to be used for implementing the mergeinfo subsystem, it's a 
high level query.

So let's use the simpler version that's sufficient for your use case.


 The mergeinfo diff should be received starting at fromRev and ending at
 toRev. No callback is expected if there is no mergeinfo diff for a
 certain revision. Depending on the server-side storage, we may require
 to always have fromRev <= toRev or always fromRev >= toRev. If it
 doesn't matter, better have always fromRev <= toRev (for reasons given
 below).

The same procedure could work either forwards or backwards, it doesn't really 
matter as long as you know which way it is going. Often it is useful to know 
about the more recent changes first, and have the option to look back right to 
revision 0 if necessary.
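
To make the ordering question concrete, here is a small sketch of a delivery loop that walks forwards when fromRev <= toRev and backwards otherwise, and issues no callback for revisions without a mergeinfo diff. The Callback interface and the string payload are simplified, hypothetical stand-ins for the MergeinfoDiffCallback proposed in this thread; the sorted map plays the role of whatever the server actually stores.

```java
import java.util.Map;
import java.util.NavigableMap;

class OrderedDelivery {
    /** Simplified stand-in for the proposed callback (assumed shape). */
    interface Callback {
        void mergeinfoDiff(long revision, String diff);
    }

    /**
     * Delivers the diffs between fromRev and toRev inclusive.
     * Ascending order if fromRev <= toRev, descending order otherwise.
     * Revisions absent from the map have no diff and trigger no callback.
     */
    static void deliver(NavigableMap<Long, String> diffsByRevision,
                        long fromRev, long toRev, Callback callback) {
        long low = Math.min(fromRev, toRev);
        long high = Math.max(fromRev, toRev);
        NavigableMap<Long, String> range = diffsByRevision.subMap(low, true, high, true);
        if (fromRev > toRev) {
            range = range.descendingMap();  // newest revision first
        }
        for (Map.Entry<Long, String> e : range.entrySet()) {
            callback.mergeinfoDiff(e.getKey(), e.getValue());
        }
    }
}
```

With this shape, supporting both directions costs the caller nothing extra: the argument order alone selects the direction.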

 Regarding the usage, let's assume always fromRev <= toRev, then we will
 invoke
 
 getMergeinfoDiff(cacheRoot, 0, head, callback)
 
 This should start returning mergeinfo diff immediately, starting at
 revision 0, so we quickly make at least a bit of progress. Now, if the
 cache building process is shutdown and restarted later, it will resume
 with the latest known revision:
 
 getMergeinfoDiff(cacheRoot, latestKnownRevision, head, callback)
 
 This procedure will be performed until we have caught up with head.
 Note, that the latestKnownRevision is the last revision for which we
 have received a callback. Depending on the server-side storage, this may
 be different from the current revision which the server is currently
 processing at the time the cache building process is shutdown. Hence it
 will be important that ranges for which no mergeinfo diff is present
 will be processed quickly on the server-side, otherwise we could run
 into some kind of endless loop, if the cache building process is
 shutdown and resumed frequently.

Yes -- if the server takes a long time to work its way through a large range of 
(say a million) revisions where there are no mergeinfo changes, there is no 
graceful way to stop the procedure part way through, and no way to discover how 
far it has searched when you kill it. Maybe that is not important. There is a 
client-side work-around: request ranges of say a thousand revisions at a time, 
and then you can easily keep track of how many of these requests have been 
completed.
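
The client-side work-around could be sketched as follows. RangeRequest is a hypothetical stand-in for a call such as getMergeinfoDiff(cacheRoot, from, to, callback); the point is only that progress is recorded per completed chunk, so a restart never re-scans more than one chunk even when a long stretch of revisions produces no callbacks.

```java
class ChunkedFetch {
    /** Hypothetical stand-in for one range request against the server. */
    interface RangeRequest {
        void fetch(long fromRev, long toRev);
    }

    /**
     * Requests revisions startRev..head in chunks of at most chunkSize.
     * Returns the highest revision known to be fully processed; a real
     * client would persist this value after every chunk and pass
     * (value + 1) as startRev when resuming.
     */
    static long fetchInChunks(long startRev, long head, long chunkSize, RangeRequest request) {
        long done = startRev - 1;  // nothing processed yet
        while (done < head) {
            long from = done + 1;
            long to = Math.min(from + chunkSize - 1, head);
            request.fetch(from, to);
            done = to;  // chunk finished, even if it produced no callbacks
        }
        return done;
    }
}
```

For example, fetching revisions 0 to 2500 with a chunk size of 1000 issues three requests: 0-999, 1000-1999 and 2000-2500.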

OK, that sounds good enough.

- Julian


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-17 Thread Julian Foad
I took a stab at writing the JavaHL boiler-plate code for this, as attached, 
though I'm unfamiliar with JavaHL. It seems to require modifying 5 java files 
and creating 3 new ones. Is that right, JavaHL experts? It seems a lot.

The implementation in the core library is empty, as yet, in the attached patch.

- Julian


  interface MergeinfoDiffCallback {
    void mergeinfoDiff(int revision,
                       Map<String, Mergeinfo> pathToAddedMergeinfo,
                       Map<String, Mergeinfo> pathToRemovedMergeinfo);
  }
 
  void getMergeinfoDiff(String rootPath,
                        long fromRev, long toRev,
                        MergeinfoDiffCallback callback)
                        throws ClientException;

Add boiler-plate code in JavaHL for a new API to get per-revision mergeinfo
diffs.

Suggested by: Marc Strapetz <marc.strapetz{_AT_}syntevo.com>

* subversion/include/svn_ra.h,
  subversion/libsvn_ra/ra_loader.c
  (svn_ra_get_mergeinfo): New function, with an empty implementation.

In subversion/bindings/javahl/:

* native/MergeinfoDiffCallback.h,
  native/MergeinfoDiffCallback.cpp
New files, copied from LogMessageCallback.* and adjusted.

* native/org_apache_subversion_javahl_remote_RemoteSession.cpp
  (Java_org_apache_subversion_javahl_remote_RemoteSession_getMergeinfoDiffs):
New function.

* native/RemoteSession.h,
  native/RemoteSession.cpp
  (getMergeinfoDiffs): New method.

* src/org/apache/subversion/javahl/callback/MergeinfoDiffCallback.java
  New file, copied from LogMessageCallback.java and adjusted.

* src/org/apache/subversion/javahl/ISVNRemote.java
  (getMergeinfoDiffs): New method.

* src/org/apache/subversion/javahl/remote/RemoteSession.java
  (svn_ra_get_mergeinfo_diffs): New function.
--This line, and those below, will be ignored--

Index: subversion/bindings/javahl/native/MergeinfoDiffCallback.cpp
===================================================================
--- subversion/bindings/javahl/native/MergeinfoDiffCallback.cpp	(revision 1568992)
+++ subversion/bindings/javahl/native/MergeinfoDiffCallback.cpp	(working copy)
@@ -17,60 +17,65 @@
  *KIND, either express or implied.  See the License for the
  *specific language governing permissions and limitations
  *under the License.
  * 
  * @endcopyright
  *
- * @file LogMessageCallback.cpp
- * @brief Implementation of the class LogMessageCallback
+ * @file MergeinfoDiffCallback.cpp
+ * @brief Implementation of the class MergeinfoDiffCallback
  */
 
-#include "LogMessageCallback.h"
+#include "MergeinfoDiffCallback.h"
 #include "CreateJ.h"
 #include "EnumMapper.h"
 #include "JNIUtil.h"
 #include "svn_time.h"
 #include "svn_sorts.h"
 #include "svn_compat.h"
 
 /**
- * Create a LogMessageCallback object
+ * Create a MergeinfoDiffCallback object
  * @param jcallback the Java callback object.
  */
-LogMessageCallback::LogMessageCallback(jobject jcallback)
+MergeinfoDiffCallback::MergeinfoDiffCallback(jobject jcallback)
 {
   m_callback = jcallback;
 }
 
 /**
- * Destroy a LogMessageCallback object
+ * Destroy a MergeinfoDiffCallback object
  */
-LogMessageCallback::~LogMessageCallback()
+MergeinfoDiffCallback::~MergeinfoDiffCallback()
 {
   // The m_callback does not need to be destroyed because it is the
-  // passed in parameter to the Java SVNClientInterface.logMessages
+  // passed in parameter to the Java ISVNRemote.getMergeinfoDiffs
   // method.
 }
 
 svn_error_t *
-LogMessageCallback::callback(void *baton,
- svn_log_entry_t *log_entry,
- apr_pool_t *pool)
+MergeinfoDiffCallback::callback(void *baton,
+svn_revnum_t revision,
+svn_mergeinfo_t *added_mergeinfo,
+svn_mergeinfo_t *deleted_mergeinfo,
+apr_pool_t *pool)
 {
   if (baton)
-    return static_cast<LogMessageCallback *>(baton)->singleMessage(
-        log_entry, pool);
+    return static_cast<MergeinfoDiffCallback *>(baton)->singleMessage(
+        revision, added_mergeinfo, deleted_mergeinfo, pool);
 
   return SVN_NO_ERROR;
 }
 
 /**
- * Callback called for a single log message
+ * Callback called for a single mergeinfo diff
  */
 svn_error_t *
-LogMessageCallback::singleMessage(svn_log_entry_t *log_entry, apr_pool_t *pool)
+MergeinfoDiffCallback::singleMessage(svn_revnum_t revision,
+ svn_mergeinfo_t *added_mergeinfo,
+ svn_mergeinfo_t *deleted_mergeinfo,
+ apr_pool_t *pool)
 {
   JNIEnv *env = JNIUtil::getEnv();
 
   // Create a local frame for our references
   env->PushLocalFrame(LOCAL_FRAME_SIZE);
   if (JNIUtil::isJavaExceptionThrown())
@@ -78,55 +83,41 @@ LogMessageCallback::singleMessage(svn_lo
 
   // The method id will not change during the time this library is
   // loaded, so it can be cached.
 

Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-17 Thread Branko Čibej
On 17.02.2014 22:25, Julian Foad wrote:
 I took a stab at writing the JavaHL boiler-plate code for this, as attached, 
 though I'm unfamiliar with JavaHL. It seems to require modifying 5 java files 
 and creating 3 new ones. Is that right, JavaHL experts? It seems a lot.

It's about right. Welcome to Java and JNI.

If this were a real attempt, we'd want to use the new jniwrapper for the
native code; see, for example, NativeStream.hpp/.cpp.

-- Brane

-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. br...@wandisco.com


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-14 Thread Julian Foad
Marc Strapetz wrote:
 For SmartSVN we are optionally displaying merge arrows in the Revision
 Graph. Here is a sample image, how this looks like:
 
 http://imgur.com/MzrLq00
 
 From the JavaHL sources I understand that there is currently only one
 method to retrieve server-side mergeinfo and this one works on a single
 revision only:
 
 Map<String, Mergeinfo> getMergeinfo(Iterable<String> paths,
                                     long revision,
                                     Mergeinfo.Inheritance inherit,
                                     boolean includeDescendants)

Right. This is a wrapper around the core library function 
svn_ra_get_mergeinfo().

 This makes the Merge Arrow feature practically unusable for larger graphs.
 
 To improve performance, in earlier versions we were using a client-side
 mergeinfo cache (similar as the main log-cache, which TSVN is using as
 well). However, populating this cache (i.e. querying for mergeinfo for
 *every* revision of the repository) often resulted in bringing the
 entire Apache server down, especially if many users were building their
 log cache at the same time.
 
 To address these problems, it would be great to have a more powerful
 API, which allows either to retrieve all mergeinfo for a *revision
 range* or for a *set of revisions*.

The request for a more powerful API certainly makes sense, but what form of API?

In the Subversion project source code:

  # How many lines/bytes of mergeinfo in trunk, right now?
  $ svn pg -R svn:mergeinfo | wc -lc
    245   24063

  # How many branches and tags?
  $ svn ls ^/subversion/tags/ ^/subversion/branches/ | wc -l
  288

  # Approx. total lines/bytes mergeinfo per revision?
  $ echo $((245 * 289)) $((24063 * 289))
  70805 6954207

So in each revision  there are roughly 70,000 lines of mergeinfo, occupying 7 
MB in plain text representation.

The mergeinfo properties change whenever a merge is done. All other commits 
leave all the mergeinfo unchanged. So mergeinfo is unchanged in, what, 99% of 
revisions?

It doesn't seem logical to simply request all the mergeinfo for each revision 
in turn, and return it all in raw form.

Can we think of a better way to design the API so that it returns the 
interesting data without all the redundancy? Basically I think we want to 
describe changes to mergeinfo, rather than raw mergeinfo.
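
A minimal sketch of what "describing changes" could mean: given the mergeinfo of two consecutive revisions, report only the entries that were added or removed per path. It assumes mergeinfo is represented as path -> set of opaque "source:ranges" strings, which is a deliberate simplification; real mergeinfo diffing works on revision ranges, and the class name here is hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class MergeinfoDiffSketch {
    /** Entries present in 'after' but not in 'before', grouped by path. */
    static Map<String, Set<String>> added(Map<String, Set<String>> before,
                                          Map<String, Set<String>> after) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : after.entrySet()) {
            Set<String> extra = new TreeSet<>(e.getValue());
            extra.removeAll(before.getOrDefault(e.getKey(), Collections.emptySet()));
            if (!extra.isEmpty()) {
                result.put(e.getKey(), extra);  // only paths that actually changed
            }
        }
        return result;
    }

    /** Entries present in 'before' but not in 'after': the same test, reversed. */
    static Map<String, Set<String>> removed(Map<String, Set<String>> before,
                                            Map<String, Set<String>> after) {
        return added(after, before);
    }
}
```

For the roughly 99% of revisions where no merge happened, both results are empty, so nothing at all would need to be sent for them.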

- Julian



 Querying a set of revisions would be more flexible and would allow to
 generate merge arrows on the fly. On the other hand, to alleviate the
 server, it's desirable to cache retrieved mergeinfo on the client-side
 anyway, hence a range query would be fine as well.
 
 -Marc




Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-14 Thread Branko Čibej
On 14.02.2014 11:38, Julian Foad wrote:
 Marc Strapetz wrote:
 For SmartSVN we are optionally displaying merge arrows in the Revision
 Graph. Here is a sample image, how this looks like:

 http://imgur.com/MzrLq00

 From the JavaHL sources I understand that there is currently only one
 method to retrieve server-side mergeinfo and this one works on a single
 revision only:
 Map<String, Mergeinfo> getMergeinfo(Iterable<String> paths,
                                     long revision,
                                     Mergeinfo.Inheritance inherit,
                                     boolean includeDescendants)
 Right. This is a wrapper around the core library function 
 svn_ra_get_mergeinfo().

 This makes the Merge Arrow feature practically unusable for larger graphs.

 To improve performance, in earlier versions we were using a client-side
 mergeinfo cache (similar as the main log-cache, which TSVN is using as
 well). However, populating this cache (i.e. querying for mergeinfo for
 *every* revision of the repository) often resulted in bringing the
 entire Apache server down, especially if many users were building their
 log cache at the same time.

 To address these problems, it would be great to have a more powerful
 API, which allows either to retrieve all mergeinfo for a *revision
 range* or for a *set of revisions*.
 The request for a more powerful API certainly makes sense, but what form of 
 API?

 In the Subversion project source code:

   # How many lines/bytes of mergeinfo in trunk, right now?
   $ svn pg -R svn:mergeinfo | wc -lc
     245   24063

   # How many branches and tags?
   $ svn ls ^/subversion/tags/ ^/subversion/branches/ | wc -l
   288

   # Approx. total lines/bytes mergeinfo per revision?
   $ echo $((245 * 289)) $((24063 * 289))
   70805 6954207

 So in each revision  there are roughly 70,000 lines of mergeinfo, occupying 7 
 MB in plain text representation.

 The mergeinfo properties change whenever a merge is done. All other commits 
 leave all the mergeinfo unchanged. So mergeinfo is unchanged in, what, 99% of 
 revisions?

 It doesn't seem logical to simply request all the mergeinfo for each revision 
 in turn, and return it all in raw form.

 Can we think of a better way to design the API so that it returns the 
 interesting data without all the redundancy? Basically I think we want to 
 describe changes to mergeinfo, rather than raw mergeinfo.

I wonder, Julian, could something like this be useful for improving
merge in general?

We know that clients can cache most of the mergeinfo in the repository,
if they want to; I just don't have any feeling for how much sense it
would make to maintain such a cache, and if it can be made smart enough
to speed up merging significantly.

-- Brane


-- 
Branko Čibej | Director of Subversion
WANdisco // Non-Stop Data
e. br...@wandisco.com


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-14 Thread Julian Foad
I (Julian Foad) wrote:

 Can we think of a better way to design the API so that it returns the 
 interesting data without all the redundancy? Basically I think we want to 
 describe changes to mergeinfo, rather than raw mergeinfo.

Marc,

Perhaps a better way to ask the question is: Can I encourage you to write the
API that you want? You already designed a cache for the data. What is the shape
of the data in your cache, and can the API get the data you want in the form
you want it, directly? We'd be glad to help implement it. Even if you start
with an API which simply iterates over a range of revisions, at least that
would allow for the possibility of improving the efficiency internally at
various layers.

- Julian


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-14 Thread Marc Strapetz
On 14.02.2014 11:38, Julian Foad wrote:
 Marc Strapetz wrote:
 For SmartSVN we are optionally displaying merge arrows in the Revision
 Graph. Here is a sample image, how this looks like:

 http://imgur.com/MzrLq00

 From the JavaHL sources I understand that there is currently only one
 method to retrieve server-side mergeinfo and this one works on a single
 revision only:

 Map<String, Mergeinfo> getMergeinfo(Iterable<String> paths,
 long revision,
 Mergeinfo.Inheritance inherit,
 boolean includeDescendants)
 
 Right. This is a wrapper around the core library function 
 svn_ra_get_mergeinfo().
 
 This makes the Merge Arrow feature practically unusable for larger graphs.

 To improve performance, in earlier versions we were using a client-side
 mergeinfo cache (similar as the main log-cache, which TSVN is using as
 well). However, populating this cache (i.e. querying for mergeinfo for
 *every* revision of the repository) often resulted in bringing the
 entire Apache server down, especially if many users were building their
 log cache at the same time.

 To address these problems, it would be great to have a more powerful
 API, which allows either to retrieve all mergeinfo for a *revision
 range* or for a *set of revisions*.
 
 The request for a more powerful API certainly makes sense, but what form of 
 API?
 
 In the Subversion project source code:
 
   # How many lines/bytes of mergeinfo in trunk, right now?
   $ svn pg -R svn:mergeinfo | wc -lc
     245   24063
 
   # How many branches and tags?
   $ svn ls ^/subversion/tags/ ^/subversion/branches/ | wc -l
   288
 
   # Approx. total lines/bytes mergeinfo per revision?
   $ echo $((245 * 289)) $((24063 * 289))
   70805 6954207
 
 So in each revision  there are roughly 70,000 lines of mergeinfo, occupying 7 
 MB in plain text representation.
 
 The mergeinfo properties change whenever a merge is done. All other commits 
 leave all the mergeinfo unchanged. So mergeinfo is unchanged in, what, 99% of 
 revisions?
 
 It doesn't seem logical to simply request all the mergeinfo for each revision 
 in turn, and return it all in raw form.
 
 Can we think of a better way to design the API so that it returns the 
 interesting data without all the redundancy? Basically I think we want to 
 describe changes to mergeinfo, rather than raw mergeinfo.

True, actually on the client side we are interested in the diff anyway. So
some kind of callback:

interface MergeInfoDiffCallback {
  void mergeInfoDiff(int revision, Mergeinfo added, Mergeinfo removed);
}

would be convenient. This would work for revision ranges as well as a
set of revisions.
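
To make the added/removed split concrete, here is a minimal sketch of how
such a diff could be computed from two mergeinfo snapshots. It assumes a
simplified model in which mergeinfo is a map from merge source path to a set
of merged revisions; MergeinfoDelta and its methods are hypothetical names
for illustration and are not part of JavaHL.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a mergeinfo diff: merge source path -> revisions.
// This is NOT the JavaHL Mergeinfo class, only a simplified model.
final class MergeinfoDelta {
    final Map<String, Set<Long>> added = new TreeMap<>();
    final Map<String, Set<Long>> removed = new TreeMap<>();

    // Computes what was added to and removed from 'older' to reach 'newer'.
    static MergeinfoDelta between(Map<String, Set<Long>> older,
                                  Map<String, Set<Long>> newer) {
        MergeinfoDelta delta = new MergeinfoDelta();
        collect(newer, older, delta.added);
        collect(older, newer, delta.removed);
        return delta;
    }

    // Records every revision that is listed in 'from' but not in 'other'.
    private static void collect(Map<String, Set<Long>> from,
                                Map<String, Set<Long>> other,
                                Map<String, Set<Long>> out) {
        for (Map.Entry<String, Set<Long>> e : from.entrySet()) {
            Set<Long> diff = new TreeSet<>(e.getValue());
            Set<Long> known = other.get(e.getKey());
            if (known != null) {
                diff.removeAll(known);
            }
            if (!diff.isEmpty()) {
                out.put(e.getKey(), diff);
            }
        }
    }

    // True when the two snapshots were identical, i.e. no callback is needed.
    boolean isEmpty() {
        return added.isEmpty() && removed.isEmpty();
    }
}
```

In this model a revision that does not touch mergeinfo yields an empty delta,
which is the case where no callback would need to be made at all.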

-Marc


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-14 Thread Marc Strapetz
 Can we think of a better way to design the API so that it returns the 
 interesting data without all the redundancy? Basically I think we want to 
 describe changes to mergeinfo, rather than raw mergeinfo.
 
 Marc,
 
 Perhaps a better way to ask the question is: Can I encourage you to write the
 API that you want? You already designed a cache for the data. What is the
 shape of the data in your cache, and can the API get the data you want in the
 form you want it, directly? We'd be glad to help implement it. Even if you
 start with an API which simply iterates over a range of revisions, at least
 that would allow for the possibility of improving the efficiency internally
 at various layers.

Looks like our emails have crossed :) I'll dig into the cache code and
will try to come back with a more detailed API suggestion soon.

-Marc




Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-14 Thread Julian Foad
Marc Strapetz wrote:

  Can we think of a better way to design the API so that it returns the
  interesting data without all the redundancy? Basically I think we want
  to describe changes to mergeinfo, rather than raw mergeinfo.
 
  Marc,
 
  Perhaps a better way to ask the question is: Can I encourage you to write
  the API that you want? You already designed a cache for the data. What is
  the shape of the data in your cache, and can the API get the data you want
  in the form you want it, directly? We'd be glad to help implement it. Even
  if you start with an API which simply iterates over a range of revisions,
  at least that would allow for the possibility of improving the efficiency
  internally at various layers.
 
 Looks like our emails have crossed :) I'll dig into the cache code and
 will try to come back with a more detailed API suggestion soon.

Excellent! Thanks.

- Julian


Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-14 Thread Julian Foad
Branko Čibej wrote:

 On 14.02.2014 11:38, Julian Foad wrote:

 Can we think of a better way to design the API so that it returns the
 interesting data without all the redundancy? Basically I think we want
 to describe changes to mergeinfo, rather than raw mergeinfo.
 
 I wonder, Julian, could something like this be useful for merge in general?
 
 We know that clients can cache most of the mergeinfo in the
 repository, if they want to; I just don't have any feeling for how
 much sense it would make to maintain such a cache, and if it can be
 made smart enough to speed up merging significantly.


I wasn't sure how much mergeinfo we fetch in a typical merge so I tried some 
merges with current svn branches. They all fetched mergeinfo either two or 
three times, all at the head revision, and the time taken to fetch it was not a 
substantial portion of the overall merge time. So I think the answer is we 
wouldn't currently benefit from this within the scope of one merge. (A 
persistent cache on the client machine is a different matter.)


- Julian



Re: RFE: API for an efficient retrieval of server-side mergeinfo data

2014-02-14 Thread Marc Strapetz
On 14.02.2014 14:18, Marc Strapetz wrote:
 Can we think of a better way to design the API so that it returns the 
 interesting data without all the redundancy? Basically I think we want to 
 describe changes to mergeinfo, rather than raw mergeinfo.

 Marc,

 Perhaps a better way to ask the question is: Can I encourage you to write the
 API that you want? You already designed a cache for the data. What is the
 shape of the data in your cache, and can the API get the data you want in the
 form you want it, directly? We'd be glad to help implement it. Even if you
 start with an API which simply iterates over a range of revisions, at least
 that would allow for the possibility of improving the efficiency internally
 at various layers.
 
 Looks like our emails have crossed :) I'll dig into the cache code and
 will try to come back with a more detailed API suggestion soon.

I did that now and the storage is quite simple: we have a main file
which contains the diff (added, removed) for every path in every
revision and a revision-based index file with constant record length (to
quickly locate entries in the main file).

This storage allows us to efficiently query the mergeinfo diff for a
path in a certain revision. That's sufficient to build the merge arrows.
Assembling the complete mergeinfo for a certain revision is hard with
this cache, but actually not necessary for our use case.
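
A minimal sketch of how a constant-record-length index can locate entries in
a main file. The 12-byte record layout (8-byte offset plus 4-byte length) and
the RevisionIndex class are assumptions made for illustration; the actual
SmartSVN cache format is not described in this thread. The index is kept in
memory here so the example runs without touching the file system.

```java
import java.nio.ByteBuffer;

// Hypothetical index layout: one fixed-size record per revision, holding the
// offset of that revision's entry in the main file and the entry's length.
final class RevisionIndex {
    static final int RECORD_SIZE = Long.BYTES + Integer.BYTES; // 12 bytes

    private final ByteBuffer records;

    RevisionIndex(int revisionCount) {
        // A freshly allocated buffer is zero-filled, so a length of 0 can
        // mean "no mergeinfo diff stored for this revision".
        records = ByteBuffer.allocate(revisionCount * RECORD_SIZE);
    }

    void put(int revision, long offset, int length) {
        int pos = revision * RECORD_SIZE;
        records.putLong(pos, offset);
        records.putInt(pos + Long.BYTES, length);
    }

    // Because every record has the same size, lookup is a single
    // multiplication and no scan of the index is needed.
    long offsetOf(int revision) {
        return records.getLong(revision * RECORD_SIZE);
    }

    int lengthOf(int revision) {
        return records.getInt(revision * RECORD_SIZE + Long.BYTES);
    }
}
```

With a layout like this, finding the diff for revision r costs one seek into
the index and one seek into the main file, regardless of repository size.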

Hence an API like the following should work well for us:

interface MergeinfoDiffCallback {
  void mergeinfoDiff(int revision,
                     Map<String, Mergeinfo> pathToAddedMergeinfo,
                     Map<String, Mergeinfo> pathToRemovedMergeinfo);
}

void getMergeinfoDiff(String rootPath,
  long fromRev, long toRev,
  MergeinfoDiffCallback callback)
  throws ClientException;

This should give us all mergeinfo which affects any path at or below
rootPath.

When disregarding our particular use case, a more consistent API could be:

void getMergeinfoDiff(Iterable<String> paths,
                      long fromRev, long toRev,
                      Mergeinfo.Inheritance inherit,
                      boolean includeDescendants,
                      MergeinfoDiffCallback callback)
  throws ClientException;

The mergeinfo diff should be received starting at fromRev and ending at
toRev. No callback is expected if there is no mergeinfo diff for a
certain revision. Depending on the server-side storage, we may need to
require always fromRev <= toRev or always fromRev >= toRev. If it
doesn't matter, it is better to always have fromRev <= toRev (for reasons
given below).

Regarding usage, let's assume that always fromRev <= toRev; then we will
invoke

getMergeinfoDiff(cacheRoot, 0, head, callback)

This should start returning mergeinfo diffs immediately, starting at
revision 0, so we quickly make at least a bit of progress. Now, if the
cache building process is shut down and restarted later, it will resume
with the latest known revision:

getMergeinfoDiff(cacheRoot, latestKnownRevision, head, callback)

This procedure will be repeated until we have caught up with head.
Note that latestKnownRevision is the last revision for which we
have received a callback. Depending on the server-side storage, this may
differ from the revision which the server is processing at the time the
cache building process is shut down. Hence it will be important that
ranges for which no mergeinfo diff is present are processed quickly on
the server side; otherwise we could run into some kind of endless loop
if the cache building process is shut down and resumed frequently.
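
The resume procedure can be sketched as follows. Everything here is a
hypothetical stand-in: FakeServer simulates the proposed server call and an
interrupted run (via a callback budget), DiffCallback reduces the diff to a
plain string, and CacheBuilder plays the role of the client-side cache. None
of these names exist in JavaHL.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical callback, with the diff reduced to a plain string payload.
interface DiffCallback {
    void mergeinfoDiff(long revision, String diff);
}

// Hypothetical stand-in for the server: reports stored diffs in ascending
// revision order and simulates a shutdown after 'budget' callbacks.
final class FakeServer {
    private final TreeMap<Long, String> diffs = new TreeMap<>();

    void addDiff(long revision, String diff) {
        diffs.put(revision, diff);
    }

    void getMergeinfoDiff(long fromRev, long toRev, int budget, DiffCallback cb) {
        int delivered = 0;
        for (Map.Entry<Long, String> e
                : diffs.subMap(fromRev, true, toRev, true).entrySet()) {
            if (delivered++ == budget) {
                return; // simulated shutdown of the cache building process
            }
            cb.mergeinfoDiff(e.getKey(), e.getValue());
        }
    }
}

final class CacheBuilder {
    final TreeMap<Long, String> cache = new TreeMap<>();
    long latestKnownRevision = 0;

    // One run of the cache build: resumes at the last revision for which a
    // callback was received and continues towards head.
    void run(FakeServer server, long head, int budget) {
        server.getMergeinfoDiff(latestKnownRevision, head, budget, (rev, diff) -> {
            cache.put(rev, diff);      // idempotent: a re-delivered revision overwrites
            latestKnownRevision = rev; // only advances when a callback arrives
        });
    }
}
```

Because the range is resumed at latestKnownRevision itself, that revision is
delivered a second time, so the cache write has to be idempotent. The sketch
also shows the concern raised above: a run that ends before any callback
arrives leaves latestKnownRevision unchanged, so the next run starts from the
same place.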

-Marc