date:20100915

On Wed, Sep 15, 2010 at 06:52:07AM -, hwri...@apache.org wrote:
 ==
 --- subversion/trunk/subversion/include/svn_string.h (original)
 +++ subversion/trunk/subversion/include/svn_string.h Wed Sep 15 06:52:06 2010
 @@ -253,6 +253,15 @@ svn_stringbuf_chop(svn_stringbuf_t *str,
  void
  svn_stringbuf_fillchar(svn_stringbuf_t *str, unsigned char c);
  
 +/** Append a single character @a byte onto @a targetstr.
 + *
 + * reallocs if necessary. @a targetstr is affected, nothing else is.
 + * @since New in 1.7.
 + */
 +void
 +svn_stringbuf_appendbyte(svn_stringbuf_t *targetstr,
 + char byte);
 +

The docstring should list advantages svn_stringbuf_appendbyte(buf, c)
has over svn_stringbuf_appendbytes(buf, c, 1). We need to understand
where the performance benefits really come from. IIRC the benefit was
dependent on the optimizer in the compiler to some extent, is this correct?

Stefan

Re: svn commit: r995603 - /subversion/branches/performance/subversion/libsvn_subr/svn_string.c

On Fri, Sep 10, 2010 at 03:32:14PM +0200, Stefan Sperling wrote:
 On Thu, Sep 09, 2010 at 10:58:06PM -, stef...@apache.org wrote:
  Author: stefan2
  Date: Thu Sep  9 22:58:06 2010
  New Revision: 995603
  
  URL: http://svn.apache.org/viewvc?rev=995603view=rev
  Log:
  Try really hard to help the compiler generate good code
  for this frequently called function.
  
  * subversion/libsvn_subr/svn_string.c
(svn_stringbuf_appendbyte): reformulate
  
  Modified:
  subversion/branches/performance/subversion/libsvn_subr/svn_string.c
  
  Modified: 
  subversion/branches/performance/subversion/libsvn_subr/svn_string.c
  URL: 
  http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_subr/svn_string.c?rev=995603r1=995602r2=995603view=diff
  ==
  --- subversion/branches/performance/subversion/libsvn_subr/svn_string.c 
  (original)
  +++ subversion/branches/performance/subversion/libsvn_subr/svn_string.c Thu 
  Sep  9 22:58:06 2010
  @@ -403,8 +403,11 @@ svn_stringbuf_appendbyte(svn_stringbuf_t
 apr_size_t old_len = str-len;
 if (str-blocksize  old_len + 1)
   {
  -  str-data[old_len] = byte;
  -  str-data[old_len+1] = '\0';
  +  char *dest = str-data;
  +
  +  dest[old_len] = byte;
  +  dest[old_len+1] = '\0';
  +
 str-len = old_len+1;
   }
 else
  @@ -412,7 +415,8 @@ svn_stringbuf_appendbyte(svn_stringbuf_t
 /* we need to re-allocate the string buffer
  * - let the more generic implementation take care of that part
  */
  -  svn_stringbuf_appendbytes(str, byte, 1);
  +  char b = byte;
  +  svn_stringbuf_appendbytes(str, b, 1);
   }
   }
   
  
 
 If you're expecting this function to stay written as it now,
 you need to add a comment explaining why it is laid out the way it is.
 Otherwise people might change it, and the resulting generated code will
 be less optimal again.
 
 That said, I have my doubts whether tweaking C code in order to tune the
 output of certain compilers in certain versions is a useful thing to do.
 Will this still work as expected when new compiler versions are released?
 It seems you'd rather want to fix the optimizer of the compiler.

This has been merged into trunk now, so I'm bumping this thread to
repeat my question above.

Stefan

Re: svn commit: r997203 - in /subversion/trunk: ./ subversion/include/ subversion/libsvn_client/ subversion/libsvn_diff/ subversion/libsvn_fs_fs/ subversion/libsvn_ra_svn/ subversion/libsvn_repos/ sub

On Wed, Sep 15, 2010 at 9:45 AM, Stefan Sperling s...@elego.de wrote:
 On Wed, Sep 15, 2010 at 06:52:07AM -, hwri...@apache.org wrote:
 ==
 --- subversion/trunk/subversion/include/svn_string.h (original)
 +++ subversion/trunk/subversion/include/svn_string.h Wed Sep 15 06:52:06 2010
 @@ -253,6 +253,15 @@ svn_stringbuf_chop(svn_stringbuf_t *str,
  void
  svn_stringbuf_fillchar(svn_stringbuf_t *str, unsigned char c);

 +/** Append a single character @a byte onto @a targetstr.
 + *
 + * reallocs if necessary. @a targetstr is affected, nothing else is.
 + * @since New in 1.7.
 + */
 +void
 +svn_stringbuf_appendbyte(svn_stringbuf_t *targetstr,
 +                         char byte);
 +

 The docstring should list advantages svn_stringbuf_appendbyte(buf, c)
 has over svn_stringbuf_appendbytes(buf, c, 1). We need to understand
 where the performance benefits really come from. IIRC the benefit was
 dependent on the optimizer in the compiler to some extent, is this correct?

I don't think that's completely correct, but I'll leave it to Stefan
F. to determine that.

Since this function has now been merged to trunk, updates to the doc
string should be done on trunk.  But I feel comfortable allowing
Stefan to make these docstring edits and commit without prior review,
instead of requiring him to mail the patch to the list.

-Hyrum

Re: svn commit: r995603 - /subversion/branches/performance/subversion/libsvn_subr/svn_string.c

2010-09-15 Thread Branko Čibej

 On 15.09.2010 09:46, Stefan Sperling wrote:
 On Fri, Sep 10, 2010 at 03:32:14PM +0200, Stefan Sperling wrote:
 On Thu, Sep 09, 2010 at 10:58:06PM -, stef...@apache.org wrote:
 Author: stefan2
 Date: Thu Sep  9 22:58:06 2010
 New Revision: 995603

 URL: http://svn.apache.org/viewvc?rev=995603view=rev
 Log:
 Try really hard to help the compiler generate good code
 for this frequently called function.

 * subversion/libsvn_subr/svn_string.c
   (svn_stringbuf_appendbyte): reformulate

 Modified:
 subversion/branches/performance/subversion/libsvn_subr/svn_string.c

 Modified: 
 subversion/branches/performance/subversion/libsvn_subr/svn_string.c
 URL: 
 http://svn.apache.org/viewvc/subversion/branches/performance/subversion/libsvn_subr/svn_string.c?rev=995603r1=995602r2=995603view=diff
 ==
 --- subversion/branches/performance/subversion/libsvn_subr/svn_string.c 
 (original)
 +++ subversion/branches/performance/subversion/libsvn_subr/svn_string.c Thu 
 Sep  9 22:58:06 2010
 @@ -403,8 +403,11 @@ svn_stringbuf_appendbyte(svn_stringbuf_t
apr_size_t old_len = str-len;
if (str-blocksize  old_len + 1)
  {
 -  str-data[old_len] = byte;
 -  str-data[old_len+1] = '\0';
 +  char *dest = str-data;
 +
 +  dest[old_len] = byte;
 +  dest[old_len+1] = '\0';
 +
str-len = old_len+1;
  }
else
 @@ -412,7 +415,8 @@ svn_stringbuf_appendbyte(svn_stringbuf_t
/* we need to re-allocate the string buffer
 * - let the more generic implementation take care of that part
 */
 -  svn_stringbuf_appendbytes(str, byte, 1);
 +  char b = byte;
 +  svn_stringbuf_appendbytes(str, b, 1);
  }
  }
  

 If you're expecting this function to stay written as it now,
 you need to add a comment explaining why it is laid out the way it is.
 Otherwise people might change it, and the resulting generated code will
 be less optimal again.

 That said, I have my doubts whether tweaking C code in order to tune the
 output of certain compilers in certain versions is a useful thing to do.
 Will this still work as expected when new compiler versions are released?
 It seems you'd rather want to fix the optimizer of the compiler.
 This has been merged into trunk now, so I'm bumping this thread to
 repeat my question above.


The two optimizations shown above are indeed superfluous in C, since
there are no function calls with possible side effects between creating
the temp var and using it. A good compiler will do them itself, a great
compiler will look for possible global optimizations and might do who
knows what -- might even create a specialized inline for the second case.

It would be different if the results were used after the function call,
since we cannot use restrict because of our standards compatibility.
But as it is ... which compiler is this supposed to tickle to generate
better code? Whether something gets passed in a register in a call
depends on the platform ABI and inlining, not on the presence of an
extra local variable (which may just get optimized out anyway).

-- Brane

Public API documentation

I've been thinking about ways that our API documentation could be made
a bit cleaner.

Currently, we use a lot of prose to describe APIs, their parameters,
and other nuances.  I find this approach rather unique, and in some
ways obfuscating.  We have great documentation in that we cover a lot
of ground, but I wonder if we can be more consistent and clear.

For instance, look at the description of svn_client_checkout3():
http://people.apache.org/~hwright/svn/doc/api/trunk/group__clnt__wc__checkout.html#ga31e87d0db53bf1158eaceb6552c63bae

It's is really difficult to determine what each parameter does or what
possible errors are returned, without reading the entire docstring.
Maybe that's intended, but perhaps a strategy which enumerates the
parameters would be a bit better.  I've mocked up with this may look
like:
http://people.apache.org/~hwright/svn/doc/api/temp/group__clnt__wc__checkout.html#ga31e87d0db53bf1158eaceb6552c63bae

You'll note that there isn't a complex discussion of client contexts,
depth, or peg revisions, but rather links to a common location where
such concepts can be discussed in-depth.  I've also tried to detect
the various errors which may be returned, and enumerate them
specifically, rather than buried in the prose description of the API.
I think this strategy will lead to more consistent and maintainable
documentation across the project (and easier identifiable deviations
from standard behavior).

Thoughts?

-Hyrum

RE: svn commit: r997026 - /subversion/trunk/subversion/mod_dav_svn/repos.c

2010-09-15 Thread Bert Huijben

 -Original Message-
 From: cmpil...@apache.org [mailto:cmpil...@apache.org]
 Sent: dinsdag 14 september 2010 20:09
 To: comm...@subversion.apache.org
 Subject: svn commit: r997026 -
 /subversion/trunk/subversion/mod_dav_svn/repos.c

 Author: cmpilato
 Date: Tue Sep 14 18:08:59 2010
 New Revision: 997026

 URL: http://svn.apache.org/viewvc?rev=997026view=rev
 Log:
 For issue #3709 (Inconsistency between svn list and svn
 checkout), teach the mod_dav_svn tree walker logic to authorize
 access to the paths it touches.

Is it possible to port this back to 1.6.x?

Bert

RE: svn commit: r997237 - in /subversion/trunk/subversion/bindings/javahl/native: ClientContext.cpp ClientContext.h SVNClient.cpp

2010-09-15 Thread Bert Huijben



 -Original Message-
 From: hwri...@apache.org [mailto:hwri...@apache.org]
 Sent: woensdag 15 september 2010 11:18
 To: comm...@subversion.apache.org
 Subject: svn commit: r997237 - in
 /subversion/trunk/subversion/bindings/javahl/native: ClientContext.cpp
 ClientContext.h SVNClient.cpp
 
 Author: hwright
 Date: Wed Sep 15 09:17:34 2010
 New Revision: 997237
 
 URL: http://svn.apache.org/viewvc?rev=997237view=rev
 Log:
 Revert r997228.  Apparently, some of the improved error handling added
 in that
 revision wasn't to friendly with the tests.  It will require further
 digging,
 but I'm reverting in the intrim.
 
 Modified:
 
 subversion/trunk/subversion/bindings/javahl/native/ClientContext.cpp
 subversion/trunk/subversion/bindings/javahl/native/ClientContext.h
 subversion/trunk/subversion/bindings/javahl/native/SVNClient.cpp
 
 Modified:
 subversion/trunk/subversion/bindings/javahl/native/ClientContext.cpp
 URL:
 http://svn.apache.org/viewvc/subversion/trunk/subversion/bindings/javah
 l/native/ClientContext.cpp?rev=997237r1=997236r2=997237view=diff
 ===
 ===
 ---
 subversion/trunk/subversion/bindings/javahl/native/ClientContext.cpp
 (original)
 +++
 subversion/trunk/subversion/bindings/javahl/native/ClientContext.cpp
 Wed Sep 15 09:17:34 2010
 @@ -97,20 +97,20 @@ ClientContext::~ClientContext()
  env-DeleteGlobalRef(m_jctx);
  }
 
 -svn_error_t *
 -ClientContext::getContext(svn_client_ctx_t **ctx, CommitMessage
 *message)
 +svn_client_ctx_t *
 +ClientContext::getContext(CommitMessage *message)
  {
  SVN::Pool *requestPool = JNIUtil::getRequestPool();
  apr_pool_t *pool = requestPool-pool();
  svn_auth_baton_t *ab;
 -
 -*ctx = persistentCtx;
 +svn_client_ctx_t *ctx = persistentCtx;
 +//SVN_JNI_ERR(svn_client_create_context(ctx, pool), NULL);
 
  const char *configDir = m_configDir.c_str();
  if (m_configDir.length() == 0)
  configDir = NULL;
 -SVN_ERR(svn_config_get_config(((*ctx)-config), configDir,
 pool));
 -svn_config_t *config = (svn_config_t *) apr_hash_get((*ctx)-
 config,
 +SVN_JNI_ERR(svn_config_get_config((ctx-config), configDir,
 pool), NULL);
 +svn_config_t *config = (svn_config_t *) apr_hash_get(ctx-config,
 
 SVN_CONFIG_CATEGORY_CONFIG,
 
 APR_HASH_KEY_STRING);
 
 @@ -118,8 +118,10 @@ ClientContext::getContext(svn_client_ctx
  apr_array_header_t *providers;
 
  /* Populate the registered providers with the platform-specific
 providers */
 -
 SVN_ERR(svn_auth_get_platform_specific_client_providers(providers,
 -config,
 pool));
 +
 SVN_JNI_ERR(svn_auth_get_platform_specific_client_providers(providers,
 +
 config,
 +pool),
 +NULL);


This doesn't look right (in the old and new code). If you use a client session 
multiple time, you shouldn't recreate the auth baton on every invocation, but 
only if there is a reason to recreate.

The auth baton also handles some caching for the auth providers. (At least it 
stores the used passwords that weren't saved to disk, but I think it also 
stores some context state for a few providers)

Bert

Re: Public API documentation

Hyrum K. Wright wrote:
I've been thinking about ways that our API documentation could be made
a bit cleaner.

Currently, we use a lot of prose to describe APIs, their parameters,
and other nuances. I find this approach rather unique, and in some
ways obfuscating. We have great documentation in that we cover a lot
of ground, but I wonder if we can be more consistent and clear.

For instance, look at the description of svn_client_checkout3():
http://people.apache.org/~hwright/svn/doc/api/trunk/group__clnt__wc__checkout.html#ga31e87d0db53bf1158eaceb6552c63bae

It's is really difficult to determine what each parameter does or what
possible errors are returned, without reading the entire docstring.
Maybe that's intended, but perhaps a strategy which enumerates the
parameters would be a bit better. I've mocked up with this may look
like:
http://people.apache.org/~hwright/svn/doc/api/temp/group__clnt__wc__checkout.html#ga31e87d0db53bf1158eaceb6552c63bae

You'll note that there isn't a complex discussion of client contexts,
depth, or peg revisions, but rather links to a common location where
such concepts can be discussed in-depth. I've also tried to detect
the various errors which may be returned, and enumerate them
specifically, rather than buried in the prose description of the API.
I think this strategy will lead to more consistent and maintainable
documentation across the project (and easier identifiable deviations
from standard behavior).

Thoughts?

-Hyrum

+1.

You know what, I've been a bit skeptical of the list-of-params style of
documentation, as sometimes I've seen documentation that says @param
revision - the revision number; @param path - the path, without saying
the path of what, what kind of path, relative to what.

But your mock-up is very good. I particularly like that it refers to
the canonical definitions of depth, client_ctx (saying which parts of
that are relevant) and operative/peg revisions, instead of trying to
explain them yet again in yet another way.

I also like the way we can easily check whether the correct set of
parameters is documented. In the past we have often changed function
parameters (when revving the function, or just renaming existing params)
and have failed to notice that the doc string no longer refers to the
real params.

This style won't fit all functions - for example, I don't think it would
be useful to rewrite the doc string of svn_client_checkout2() which is
currently:

Similar to svn_client_checkout3() but with allow_unver_obstructions
always set to FALSE, and depth set according to recurse: if recurse is
TRUE, depth is svn_depth_infinity, if recurse is FALSE, depth is
svn_depth_files.

So +1 on starting to use this style and on committing your
svn_client_checkout3() doc string to start the ball rolling.

- Julian

Re: Public API documentation

On Wed, Sep 15, 2010 at 11:35 AM, Julian Foad julian.f...@wandisco.com wrote:
Hyrum K. Wright wrote:
I've been thinking about ways that our API documentation could be made
a bit cleaner.

For instance, look at the description of svn_client_checkout3():
http://people.apache.org/~hwright/svn/doc/api/trunk/group__clnt__wc__checkout.html#ga31e87d0db53bf1158eaceb6552c63bae

Thoughts?

-Hyrum

+1.

And we get the added benefit of allowing doxygen to help us by
identifying undocumented or documented-but-not-existent parameters.

This style won't fit all functions - for example, I don't think it would
be useful to rewrite the doc string of svn_client_checkout2() which is
currently:

Agreed. That is essentially inherit the docstring for checkout3(),
but with the following delta.

So +1 on starting to use this style and on committing your
svn_client_checkout3() doc string to start the ball rolling.

Thanks. Though I'll probably wait a day or two, to let any dissenters
make themselves known. :)

-Hyrum

Re: svn commit: r997237 - in /subversion/trunk/subversion/bindings/javahl/native: ClientContext.cpp ClientContext.h SVNClient.cpp

On Wed, Sep 15, 2010 at 11:34 AM, Bert Huijben b...@qqmail.nl wrote:


 -Original Message-
 From: hwri...@apache.org [mailto:hwri...@apache.org]
 Sent: woensdag 15 september 2010 11:18
 To: comm...@subversion.apache.org
 Subject: svn commit: r997237 - in
 /subversion/trunk/subversion/bindings/javahl/native: ClientContext.cpp
 ClientContext.h SVNClient.cpp

 Author: hwright
 Date: Wed Sep 15 09:17:34 2010
 New Revision: 997237

 URL: http://svn.apache.org/viewvc?rev=997237view=rev
 Log:
 Revert r997228.  Apparently, some of the improved error handling added
 in that
 revision wasn't to friendly with the tests.  It will require further
 digging,
 but I'm reverting in the intrim.

 Modified:

 subversion/trunk/subversion/bindings/javahl/native/ClientContext.cpp
     subversion/trunk/subversion/bindings/javahl/native/ClientContext.h
     subversion/trunk/subversion/bindings/javahl/native/SVNClient.cpp

 Modified:
 subversion/trunk/subversion/bindings/javahl/native/ClientContext.cpp
 URL:
 http://svn.apache.org/viewvc/subversion/trunk/subversion/bindings/javah
 l/native/ClientContext.cpp?rev=997237r1=997236r2=997237view=diff
 ===
 ===
 ---
 subversion/trunk/subversion/bindings/javahl/native/ClientContext.cpp
 (original)
 +++
 subversion/trunk/subversion/bindings/javahl/native/ClientContext.cpp
 Wed Sep 15 09:17:34 2010
 @@ -97,20 +97,20 @@ ClientContext::~ClientContext()
      env-DeleteGlobalRef(m_jctx);
  }

 -svn_error_t *
 -ClientContext::getContext(svn_client_ctx_t **ctx, CommitMessage
 *message)
 +svn_client_ctx_t *
 +ClientContext::getContext(CommitMessage *message)
  {
      SVN::Pool *requestPool = JNIUtil::getRequestPool();
      apr_pool_t *pool = requestPool-pool();
      svn_auth_baton_t *ab;
 -
 -    *ctx = persistentCtx;
 +    svn_client_ctx_t *ctx = persistentCtx;
 +    //SVN_JNI_ERR(svn_client_create_context(ctx, pool), NULL);

      const char *configDir = m_configDir.c_str();
      if (m_configDir.length() == 0)
          configDir = NULL;
 -    SVN_ERR(svn_config_get_config(((*ctx)-config), configDir,
 pool));
 -    svn_config_t *config = (svn_config_t *) apr_hash_get((*ctx)-
 config,
 +    SVN_JNI_ERR(svn_config_get_config((ctx-config), configDir,
 pool), NULL);
 +    svn_config_t *config = (svn_config_t *) apr_hash_get(ctx-config,

 SVN_CONFIG_CATEGORY_CONFIG,

 APR_HASH_KEY_STRING);

 @@ -118,8 +118,10 @@ ClientContext::getContext(svn_client_ctx
      apr_array_header_t *providers;

      /* Populate the registered providers with the platform-specific
 providers */
 -
 SVN_ERR(svn_auth_get_platform_specific_client_providers(providers,
 -                                                            config,
 pool));
 +
 SVN_JNI_ERR(svn_auth_get_platform_specific_client_providers(providers,
 +
 config,
 +                                                                pool),
 +                NULL);


 This doesn't look right (in the old and new code). If you use a client 
 session multiple time, you shouldn't recreate the auth baton on every 
 invocation, but only if there is a reason to recreate.

 The auth baton also handles some caching for the auth providers. (At least it 
 stores the used passwords that weren't saved to disk, but I think it also 
 stores some context state for a few providers)

The reason we have to recreate the auth baton is because the user
could change the various prompters in between invocations of client
APIs.  If the prompter was immutable, we wouldn't have to do this.

(There may be a clever way of coding around it, I just haven't gotten
there yet.)

-Hyrum

Re: [Issue 3474] making a new subdir, moving files into it and then renaming the subdir, breaks history of the moved files

2010-09-15 Thread Philip Martin

Julian Foad julian.f...@wandisco.com writes:

 On Tue, 2010-09-14, Daniel Shahaf wrote:
 Johan Corveleyn wrote on Sat, Sep 11, 2010 at 00:02:16 +0200:
  On Fri, Sep 10, 2010 at 11:45 PM,  joha...@tigris.org wrote:
   http://subversion.tigris.org/issues/show_bug.cgi?id=3474
  
   --- Additional comments from joha...@tigris.org Fri Sep 10 14:45:17 
   -0700 2010 ---
   This issue seems to be fixed on trunk. The described scenario now goes 
   as follows:
  
   (assuming we're in a working copy with a versioned file a.txt)
   [[[
   $ svn mkdir subdir
   A subdir # same as in 1.6
  
   $ svn mv a.txt subdir
   A subdir\a.txt
   D a.txt  # same as in 1.6
  
   $ svn st
   A   subdir
   A  +subdir\a.txt
   D   a.txt# same as in 1.6
  
   $ svn mv subdir subdir2
   A subdir2
   D subdir\a.txt
   D subdir # different! will ask on dev list about 
   this.
  
  Is the above output an expected change of behavior?

 I believe that's a bug.  It should display the same as it did in 1.6,
 unless and until we decide to change it.

The notification rules for 1.6 are a bit haphazard, I suspect some of
the behaviour is not the result of deliberate decisions but simply a
consequence of the way the code is structured.

For example, in 1.6 move/delete notifies deletes for all nodes in a
tree, apart from nodes within an added directory.  There is no reason
for that, beyond it being the way the code is written.

For moves/copies the add notification is for all nodes in a tree apart
from nodes within a copy source directory that is normal or copied.
(I may not have got that exactly right!)

These rules are not documented as far as I can see, so the question is
whether we have to maintain this arbitrary behaviour.

Perhaps we should simply notify for all nodes?  Perhaps each
individual client should be deciding which notifications to suppress?

-- 
Philip

1.6.13 release

It's been 3 months since the last 1.6.x release, and we've got a few
items in STATUS or CHANGES which could warrant a release.  I'd like to
roll in 1 or 2 weeks, so please go through STATUS and do some review.
If folks have thoughts on the timeliness of this release, or other
changes which ought to be included, please discuss below.

(Maybe this can be an activity for those attending the apache retreat
this weekend?)

Thanks,
-Hyrum

RE: [Issue 3474] making a new subdir, moving files into it and then renaming the subdir, breaks history of the moved files

2010-09-15 Thread Bert Huijben

 -Original Message-
 From: Philip Martin [mailto:philip.mar...@wandisco.com]
 Sent: woensdag 15 september 2010 12:08
 To: Julian Foad
 Cc: Daniel Shahaf; Johan Corveleyn; Subversion Development
 Subject: Re: [Issue 3474] making a new subdir, moving files into it and
 then renaming the subdir, breaks history of the moved files

 Julian Foad julian.f...@wandisco.com writes:

  On Tue, 2010-09-14, Daniel Shahaf wrote:
  Johan Corveleyn wrote on Sat, Sep 11, 2010 at 00:02:16 +0200:
   On Fri, Sep 10, 2010 at 11:45 PM,  joha...@tigris.org wrote:
http://subversion.tigris.org/issues/show_bug.cgi?id=3474

--- Additional comments from joha...@tigris.org Fri Sep 10
 14:45:17 -0700 2010 ---
This issue seems to be fixed on trunk. The described scenario
 now goes as follows:

(assuming we're in a working copy with a versioned file a.txt)
[[[
$ svn mkdir subdir
A subdir # same as in 1.6

$ svn mv a.txt subdir
A subdir\a.txt
D a.txt  # same as in 1.6

$ svn st
A   subdir
A  +subdir\a.txt
D   a.txt# same as in 1.6

$ svn mv subdir subdir2
A subdir2
D subdir\a.txt
D subdir # different! will ask on dev list
 about this.

   Is the above output an expected change of behavior?

  I believe that's a bug.  It should display the same as it did in 1.6,
  unless and until we decide to change it.

 The notification rules for 1.6 are a bit haphazard, I suspect some of
 the behaviour is not the result of deliberate decisions but simply a
 consequence of the way the code is structured.

 For example, in 1.6 move/delete notifies deletes for all nodes in a
 tree, apart from nodes within an added directory.  There is no reason
 for that, beyond it being the way the code is written.

 For moves/copies the add notification is for all nodes in a tree apart
 from nodes within a copy source directory that is normal or copied.
 (I may not have got that exactly right!)

 These rules are not documented as far as I can see, so the question is
 whether we have to maintain this arbitrary behaviour.

 Perhaps we should simply notify for all nodes?  Perhaps each
 individual client should be deciding which notifications to suppress?

+1 on moving the choice to the clients. (svn has different requirements then
clients like Subclipse).

Personally (and for AnkhSVN) I don't have an issue with notifying adds and
deletes for the root only. (The fact that it is an add or delete tells that
it's operation changes the state of all descendants)

Bert

Re: [Issue 3474] making a new subdir, moving files into it and then renaming the subdir, breaks history of the moved files

2010-09-15 Thread Philip Martin

Bert Huijben b...@qqmail.nl writes:

 -Original Message-
 From: Philip Martin [mailto:philip.mar...@wandisco.com]
 
 Perhaps we should simply notify for all nodes?  Perhaps each
 individual client should be deciding which notifications to suppress?

 +1 on moving the choice to the clients. (svn has different requirements then
 clients like Subclipse).

 Personally (and for AnkhSVN) I don't have an issue with notifying adds and
 deletes for the root only. (The fact that it is an add or delete tells that
 it's operation changes the state of all descendants)

Root-only delete notification is much easier to implement in the
library than in the client.  That's because delete notifications for
children tend to happen before the parent, and the application has to
be organised to allow it to determine that a child will be followed by
a parent.

-- 
Philip

Re: sasl mechanisms order

On Tue, Sep 07, 2010 at 02:33:28PM +0700, Victor Sudakov wrote:
 This time it has compiled, but does not work.
 
 svn list svn://admin/test/ works OK (IMHO because the ANONYMOUS
 mechanism is sufficient for that) but svn co svn://admin/test/
 dumps core immediately.

Hi Victor,

I hope I have found the problem. Does the patch below work better?

Thanks,
Stefan

Index: subversion/libsvn_subr/config_file.c
===
--- subversion/libsvn_subr/config_file.c(revision 997080)
+++ subversion/libsvn_subr/config_file.c(working copy)
@@ -759,6 +759,11 @@ svn_config_ensure(const char *config_dir, apr_pool
  NL
 ###  may be cached to disk.NL
 ###   username   Specifies the default username.   NL
+###   preferred-sasl-mechanism   Specifies which SASL mechanismNL
+###  among the ones offered by the NL
+###  server should be tried first. NL
+###  See the SASL documentation forNL
+###  a list of mechanisms available.   NL
 ###NL
 ### Set store-passwords to 'no' to avoid storing passwords in the  NL
 ### auth/ area of your config directory.  It defaults to 'yes',NL
Index: subversion/libsvn_ra_svn/cyrus_auth.c
===
--- subversion/libsvn_ra_svn/cyrus_auth.c   (revision 997080)
+++ subversion/libsvn_ra_svn/cyrus_auth.c   (working copy)
@@ -27,6 +27,7 @@
 #include apr_thread_mutex.h
 #include apr_version.h
 
+#include svn_config.h
 #include svn_types.h
 #include svn_string.h
 #include svn_error.h
@@ -720,6 +721,67 @@ svn_error_t *svn_ra_svn__get_addresses(const char
   return SVN_NO_ERROR;
 }
 
+
+/* Return one or more SASL mechanisms from MECHLIST.
+ * SESS is the session baton.
+ * If a preferred SASL mechanism has been defined in the configuration,
+ * prefer it if it occurs within MECHLIST. Else, fall back to EXTERNAL,
+ * then ANONYMOUS, then let SASL decide.
+ * Potentially allocate the returned list of mechanisms in RESULT_POOL.
+ * Use SCRATCH_POOL for temporary allocations. */
+static const char *
+get_sasl_mechanisms(svn_ra_svn__session_baton_t *sess,
+apr_array_header_t *mechlist,
+apr_pool_t *result_pool,
+apr_pool_t *scratch_pool)
+{
+  const char *mechstring = ;
+  svn_config_t *cfg;
+
+  cfg = sess-config ? apr_hash_get(sess-config, SVN_CONFIG_CATEGORY_SERVERS,
+APR_HASH_KEY_STRING) : NULL;
+  if (cfg)
+{
+  const char *server_group;
+  const char *preferred_mech;
+
+  server_group = svn_config_find_group(cfg, sess-hostname,
+   SVN_CONFIG_SECTION_GROUPS,
+   scratch_pool);
+  if (server_group)
+svn_config_get(cfg, preferred_mech, server_group,
+   SVN_CONFIG_OPTION_PREFERRED_SASL_MECHANISM, NULL);
+  else
+preferred_mech = NULL;
+
+  if (preferred_mech  svn_ra_svn__find_mech(mechlist, preferred_mech))
+return preferred_mech;
+}
+
+  if (svn_ra_svn__find_mech(mechlist, EXTERNAL))
+return EXTERNAL;
+  else if (svn_ra_svn__find_mech(mechlist, ANONYMOUS))
+return ANONYMOUS;
+  else
+{
+  int i;
+
+  /* Create a string containing the list of mechanisms,
+   * separated by spaces. */
+  for (i = 0; i  mechlist-nelts; i++)
+{
+  svn_ra_svn_item_t *elt = APR_ARRAY_IDX(mechlist, i,
+  svn_ra_svn_item_t);
+  mechstring = apr_pstrcat(result_pool,
+   mechstring,
+   i == 0 ?  :  ,
+   elt-u.word, NULL);
+}
+}
+
+  return mechstring;
+}
+
 svn_error_t *
 svn_ra_svn__do_cyrus_auth(svn_ra_svn__session_baton_t *sess,
   apr_array_header_t *mechlist,
@@ -734,7 +796,6 @@ svn_ra_svn__do_cyrus_auth(svn_ra_svn__session_bato
  array terminator). */
   sasl_callback_t callbacks[3];
   cred_baton_t cred_baton;
-  int i;
 
   if (!sess-is_tunneled)
 {
@@ -742,24 +803,7 @@ svn_ra_svn__do_cyrus_auth(svn_ra_svn__session_bato
 sess-conn, pool));
 }
 
-  /* Prefer EXTERNAL, then ANONYMOUS, then let SASL decide. */
-  if (svn_ra_svn__find_mech(mechlist, EXTERNAL))
-mechstring = EXTERNAL;
-  else if (svn_ra_svn__find_mech(mechlist, ANONYMOUS))
-mechstring = ANONYMOUS;
-  else
-{
-  /* Create a string containing the list of mechanisms, separated by 
spaces.

Re: svn diff optimization to make blame faster?

2010-09-15 Thread Johan Corveleyn

On Sun, Sep 5, 2010 at 1:53 AM, Johan Corveleyn jcor...@gmail.com wrote:
 On Tue, Aug 24, 2010 at 11:11 AM, Philip Martin
 philip.mar...@wandisco.com wrote:
 Johan Corveleyn jcor...@gmail.com writes:
 On Sun, Aug 22, 2010 at 4:02 PM, Branko Čibej br...@xbc.nu wrote:
 svn_diff uses basically the same algorithm as GNU diff but implemented
 slightly differently and IIRC it doesn't have some of GNU diff's
 optimizations. I'm sure it can be speeded up, but haven't a clue about
 how much.

 Ok, thanks. In the meantime I saw that there is not that much
 difference anymore between GNU diff and svn_diff, after running the
 latter from a release build, and disabling my anti-virus (which makes
 me wonder why my anti-virus slows down svn_diff (impact when opening
 the modified datasource), but not on GNU diff).

 The big difference between Subversion's diff and GNU diff is that GNU
 uses heuristics to cut short the diff algorithm whereas Subversion
 grinds on to the end.  Certain files trigger the heuristics and then
 GNU diff is much faster than Subversion.  Typically the file has a
 large number of matches and non-matches repeated through the file,
 machine generated files sometimes fit this pattern.

 GNU diff's heuristics work well so when they trigger the resulting
 diff is usually good enough.  They can be disabled using the --minimal
 option and using that makes GNU diff performance much more like
 Subversion.

 Hmm, I thought harder about this. If I try --minimal with GNU diff,
 it's just as fast, still significantly faster than svn diff.

 Then I read a little bit more about diff algorithms, like the blog
 entry that Greg Hudson suggested in the other thread [1], about the
 Patience Diff algorithm [2]. In there the following two lines caught
 my eye, first steps in the diff algo:

 [[[
 1) Match the first lines of both if they're identical, then match the
 second, third, etc. until a pair doesn't match.
 2) Match the last lines of both if they're identical, then match the
 next to last, second to last, etc. until a pair doesn't match.
 ]]]
 (then the longest common subsequence algorithm kicks in)

 Now, these steps are not specific to Patience Diff, they are common to
 most diff algorithms. And I bet it's a much more efficient to exclude
 the common prefix/suffix this way, than to include them into the
 entire lcs algorithm.

 Right now, svn diff doesn't do this AFAICS. However, GNU diff does,
 and I think that's where most of the performance difference comes
 from. For big files, where a couple of lines are modified somewhere in
 the middle (and the first 3 and last 3 lines are identical), I
 think this can make a big difference.

 Thoughts?

 If no-one else fancies this, I think I'll take a stab at implementing
 this optimization (first with a POC to see if it's significant).
 Unless feedback tells me it's not worth it, not a good idea, ...

Some update on this: I have implemented this for svn_diff (excluding
the identical prefix and suffix of both files, and only then starting
to fill up the token tree and let the lcs-agorithm to its thing). It
makes a *huge* difference. On my bigfile.xml (1.5 Mb) with only one
line changed, the call to svn_diff_diff is ~10 times faster (15-20 ms
vs. 150-170 ms).

This means blame is much faster for such big files as well, because
diff accounts for 90% of the client-side work of blame. This means
that in most cases the client cpu won't be the bottleneck anymore, so
it's more constrained by server-side and network. In my tests, blame
was about twice as fast for bigfile.xml (since all my tests were done
with client and server on the same machine, I'm guessing the server is
now the bottleneck (as well as them running on the same machine)).
Same factor of performance increase for a medium-sized file (a java
source file of 300 Kb).

The patch is still very crude, with lots of code duplication, and
everything stuffed in one big function (and I may have overlooked some
edge conditions), so I don't feel it's ready to post yet (don't want
to waste people's time :)). I'll try to clean it up as soon as I can
and post a reasonable version to the list.

Some notes already:
- I basically wrote a new function datasources_open in diff_file.c
(similar to datasource_open), that opens both files (original and
modified), and compares bytes from the beginning until the first
nonmatching byte (counting newlines when it sees them), and backwards
from the end to find the first (or last, depending how you look at it)
nonmatching byte. The number of prefix_lines is then used to determine
the starting offset of the tokens (lines) that will later be
extracted.
- I think I should generalize this function to take an array of
datasources, and not just two, so it can be used by diff3 and diff4 as
well. But for now I didn't want to take on that extra complication.
- The code is complicated by the fact that the svn_diff code
(diff_file.c) works with chunks of data (doesn't read the entire
file(s)

Re: svn diff optimization to make blame faster?

On Wed, Sep 15, 2010 at 02:20:05PM +0200, Johan Corveleyn wrote:
 Just to know how hard/fast I should pursue this: is there any chance
 that a change like this could still make it into 1.7 (maybe even
 backported to 1.6?), provided that the patch is of good quality and
 all... ?

Depending on how invasive the change is, and how quickly we get it
reviewed and tested, there's a chance that this can make the 1.7 release.
For 1.6 it's probably not worth the effort.

Stefan

Re: svn commit: r993183 - in /subversion/trunk/subversion: include/ libsvn_diff/ libsvn_ra_serf/ libsvn_subr/ mod_dav_svn/reports/

On Tue, 2010-09-07, s...@apache.org wrote:
 Author: stsp
 Date: Tue Sep  7 00:09:46 2010
 New Revision: 993183
 
 URL: http://svn.apache.org/viewvc?rev=993183view=rev
 Log:
 Introduce a new family of functions to parse numbers from strings.

Looks useful. Thanks.

 The goal is to replace direct calls to functions like atoi() and apr_atoi64(),
 which have very inconvenient error handling to the point where a lot of
 our code simply ignores conversion errors and continues operating with
 the converted data in potentially undefined state.
 The new functions, on the other hand, report conversion errors via the
 standard Subversion error handling interface, thereby not allowing callers
 to ignore conversion errors.
 
 This commit also switches selected pieces of code over to the new functions.
 More conversion to come.
 
 * subversion/include/svn_string.h
   (svn_cstring_strtoi64, svn_cstring_atoi64, svn_cstring_atoi,
svn_cstring_strtoui64, svn_cstring_atoui64, svn_cstring_atoui): Declare.
 
 * subversion/libsvn_subr/svn_string.c
   (): Include svn_private_config.h for the _() gettext macro.
   (svn_cstring_strtoi64, svn_cstring_strtoui64, svn_cstring_atoi64,
svn_cstring_atoi): New.

And the other two functions as well?  (Just missing from the log
message.)

 * subversion/libsvn_ra_serf/serf.c
   (dirent_walker): Use svn_cstring_atoi64() instead of apr_atoi64().
 
 * subversion/libsvn_diff/parse-diff.c
   (parse_offset): Call svn_cstring_strtoui64() instead of calling
apr_atoi64() and performing manual overflow checking.
 
 * subversion/mod_dav_svn/reports/log.c
   (dav_svn__log_report): Use svn_cstring_atoi() instead of atoi() for
parsing CDATA of the limit element.
 
 * subversion/mod_dav_svn/reports/replay.c
   (dav_svn__replay_report): Use svn_cstring_strtoi64() instead of atoi() for
parsing CDATA of the send-deltas element.
 
 * subversion/libsvn_subr/win32_xlate.c
   (get_page_id_from_name): Use svn_cstring_atoui() instead of atoi() to parse
the number of a codepage.
 
 * subversion/libsvn_subr/io.c
   (svn_io_read_version_file): Use svn_cstring_atoi() instead of atoi().
 
 * subversion/libsvn_subr/hash.c
   (svn_hash_read): Use svn_cstring_atoi() instead of atoi().


 Modified: subversion/trunk/subversion/include/svn_string.h
 +/**
 + * Parse the C string @a str into a 64 bit number, and return it in @a *n.
 + * Assume that the number is represented in base @a base.
 + * Raise an error if conversion fails (e.g. due to overflow), or if the
 + * converted number is smaller than @a minval or larger than @a maxval. 

What are the semantics regarding 'str' - does it ignore leading spaces,
a plus sign, decimal point, trailing spaces, trailing non-spaces?
Preferably refer to a standard function to make this clear without
trying to spell it all out in full.

 + * @since New in 1.7.
 + */
 +svn_error_t *
 +svn_cstring_strtoi64(apr_int64_t *n, const char *str,
 + apr_int64_t minval, apr_int64_t maxval,
 + int base);
 +
 +/**
 + * Parse the C string @a str into a 64 bit number, and return it in @a *n.
 + * Assume that the number is represented in base 10.
 + * Raise an error if conversion fails (e.g. due to overflow).
 + *
 + * @since New in 1.7.
 + */
 +svn_error_t *
 +svn_cstring_atoi64(apr_int64_t *n, const char *str);
 +
 +/**
 + * Parse the C string @a str into a 32 bit number, and return it in @a *n.

What does parse into a 32 bit number mean when 'int' is not 32 bits?

If the intention is to limit this to 32-bit numbers its name should
reflect it.  (The data type *could* stay as 'int' because we are allowed
to assume it's at least 32 bits, but I don't know if we'd want to do
that.)

Otherwise it should support the range of 'int' and not assume 32 bits.

 + * Assume that the number is represented in base 10.
 + * Raise an error if conversion fails (e.g. due to overflow).
 + *
 + * @since New in 1.7.
 + */
 +svn_error_t *
 +svn_cstring_atoi(int *n, const char *str);
 +
 +/**
 + * Parse the C string @a str into an unsigned 64 bit number, and return
 + * it in @a *n. Assume that the number is represented in base @a base.
 + * Raise an error if conversion fails (e.g. due to overflow), or if the
 + * converted number is smaller than @a minval or larger than @a maxval. 
 + *
 + * @since New in 1.7.
 + */
 +svn_error_t *
 +svn_cstring_strtoui64(apr_uint64_t *n, const char *str,
 +  apr_uint64_t minval, apr_uint64_t maxval,
 +  int base);
 +
 +/**
 + * Parse the C string @a str into an unsigned 64 bit number, and return
 + * it in @a *n. Assume that the number is represented in base 10.
 + * Raise an error if conversion fails (e.g. due to overflow).
 + *
 + * @since New in 1.7.
 + */
 +svn_error_t *
 +svn_cstring_atoui64(apr_uint64_t *n, const char *str);
 +
 +/**
 + * Parse the C string @a str into an unsigned 32 bit number, and return
 + * it in @a *n. Assume that the number is represented in base 10.
 + * Raise

Re: svn commit: r993183 - in /subversion/trunk/subversion: include/ libsvn_diff/ libsvn_ra_serf/ libsvn_subr/ mod_dav_svn/reports/

Also printf formatting type mismatches:

subversion/libsvn_subr/svn_string.c: In function
'svn_cstring_strtoui64':
subversion/libsvn_subr/svn_string.c:662: format '%lu' expects type 'long
unsigned int', but argument 5 has type 'apr_uint64_t'
subversion/libsvn_subr/svn_string.c:662: format '%lu' expects type 'long
unsigned int', but argument 6 has type 'apr_uint64_t'
subversion/libsvn_subr/svn_string.c: In function 'svn_cstring_strtoi64':
subversion/libsvn_subr/svn_string.c:706: format '%ld' expects type 'long
int', but argument 5 has type 'apr_int64_t'
subversion/libsvn_subr/svn_string.c:706: format '%ld' expects type 'long
int', but argument 6 has type 'apr_int64_t'

(These should use APR_[U]INT64_T_FMT.  Doing so is complicated by the
fact that the format string localization tools don't like string
concatenation with macros, and there's no pool easily available to do a
temporary printf.)

- Julian

Re: Public API documentation

2010-09-15 Thread Uwe Stuehler

On Mi, 2010-09-15 at 10:35 +0100, Julian Foad wrote:
 Hyrum K. Wright wrote:
  I've been thinking about ways that our API documentation could be made
  a bit cleaner.
...
  Thoughts?
  
  -Hyrum
 
 +1.
 
 You know what, I've been a bit skeptical of the list-of-params style of
 documentation, as sometimes I've seen documentation that says @param
 revision - the revision number; @param path - the path, without saying
 the path of what, what kind of path, relative to what.
 
 But your mock-up is very good.  I particularly like that it refers to
 the canonical definitions of depth, client_ctx (saying which parts of
 that are relevant) and operative/peg revisions, instead of trying to
 explain them yet again in yet another way.

Indeed, I find it very good and it adds a lot of value by referencing
important concepts and giving a concise list of return values.

This style also feels a lot easier to handle, in my opinion, if I refer
to the documentation in order to use other language bindings.  Could be
due to fewer uses of C syntax and marking parameters as in/out.  I don't
feel burdened with the task to read everything and can quickly check a
single parameter or return value's description.

 This style won't fit all functions - for example, I don't think it would
 be useful to rewrite the doc string of svn_client_checkout2() which is
 currently:
 
   Similar to svn_client_checkout3() but with allow_unver_obstructions
 always set to FALSE, and depth set according to recurse: if recurse is
 TRUE, depth is svn_depth_infinity, if recurse is FALSE, depth is
 svn_depth_files.

I think that is true. It would fit many of the current version functions
but describing differences to older versions, for example, may be easier
done with a descriptive text.

Cheers,
Uwe

Re: svn diff optimization to make blame faster?

Johan Corveleyn wrote on Wed, Sep 15, 2010 at 14:20:05 +0200:
 The patch is still very crude, with lots of code duplication, and
 everything stuffed in one big function (and I may have overlooked some
 edge conditions), so I don't feel it's ready to post yet (don't want
 to waste people's time :)). I'll try to clean it up as soon as I can
 and post a reasonable version to the list.

+1, sounds like you have some interesting stuff working there. :-)

Re: svn commit: r997249 - in /subversion/trunk/subversion: include/svn_client.h libsvn_client/patch.c svn/cl.h svn/main.c svn/patch-cmd.c tests/cmdline/patch_tests.py tests/libsvn_client/client-test.c

s...@apache.org wrote on Wed, Sep 15, 2010 at 10:05:15 -:
 Author: stsp
 Date: Wed Sep 15 10:05:14 2010
 New Revision: 997249
 
 URL: http://svn.apache.org/viewvc?rev=997249view=rev
 Log:
 Add an --old-patch-target-names option to svn patch.
 This option is useful in two cases:
 
  1) A patch contains names like
 --- foo.c
 +++ foo.c.new
 and should be applied to foo.c.
 
  2) A patch contains names like
 --- foo.c.orig
 +++ foo.c
 and should be applied in reverse to foo.c, e.g. to undo prior application
 of the patch.

What happens if a user forgets to supply the new option?  (Does svn
complain that 'foo.c.new is nonexistent/unversioned'?)

For case (2): suppose I have such a patch (was emailed to me).  I
applied it using 'svn patch $patchfile'.  Now I want to unapply it; so
so I use 'svn patch --rd $patchfile' or 'svn patch --rd --optn $patchfile'?  
AIUI, currently only the latter will work?

Is the UI 'svn patch' always uses the filename in the /^+++/ line?

Re: svn commit: r993183 - in /subversion/trunk/subversion: include/ libsvn_diff/ libsvn_ra_serf/ libsvn_subr/ mod_dav_svn/reports/

On Wed, Sep 15, 2010 at 02:14:36PM +0100, Julian Foad wrote:
 On Tue, 2010-09-07, s...@apache.org wrote:
  Author: stsp
  Date: Tue Sep  7 00:09:46 2010
  New Revision: 993183
  
  URL: http://svn.apache.org/viewvc?rev=993183view=rev
  Log:
  Introduce a new family of functions to parse numbers from strings.
 
 Looks useful. Thanks.
 
  The goal is to replace direct calls to functions like atoi() and 
  apr_atoi64(),
  which have very inconvenient error handling to the point where a lot of
  our code simply ignores conversion errors and continues operating with
  the converted data in potentially undefined state.
  The new functions, on the other hand, report conversion errors via the
  standard Subversion error handling interface, thereby not allowing callers
  to ignore conversion errors.
  
  This commit also switches selected pieces of code over to the new functions.
  More conversion to come.
  
  * subversion/include/svn_string.h
(svn_cstring_strtoi64, svn_cstring_atoi64, svn_cstring_atoi,
 svn_cstring_strtoui64, svn_cstring_atoui64, svn_cstring_atoui): Declare.
  
  * subversion/libsvn_subr/svn_string.c
(): Include svn_private_config.h for the _() gettext macro.
(svn_cstring_strtoi64, svn_cstring_strtoui64, svn_cstring_atoi64,
 svn_cstring_atoi): New.
 
 And the other two functions as well?  (Just missing from the log
 message.)

Will fix that.

  Modified: subversion/trunk/subversion/include/svn_string.h
  +/**
  + * Parse the C string @a str into a 64 bit number, and return it in @a *n.
  + * Assume that the number is represented in base @a base.
  + * Raise an error if conversion fails (e.g. due to overflow), or if the
  + * converted number is smaller than @a minval or larger than @a maxval. 
 
 What are the semantics regarding 'str' - does it ignore leading spaces,
 a plus sign, decimal point, trailing spaces, trailing non-spaces?
 Preferably refer to a standard function to make this clear without
 trying to spell it all out in full.

It's C-string, i.e. NUL-terminated. The number format is the same as
apr_strtoi64() expects, which is equivalent to the C standard function
strtol().

  + * @since New in 1.7.
  + */
  +svn_error_t *
  +svn_cstring_strtoi64(apr_int64_t *n, const char *str,
  + apr_int64_t minval, apr_int64_t maxval,
  + int base);
  +
  +/**
  + * Parse the C string @a str into a 64 bit number, and return it in @a *n.
  + * Assume that the number is represented in base 10.
  + * Raise an error if conversion fails (e.g. due to overflow).
  + *
  + * @since New in 1.7.
  + */
  +svn_error_t *
  +svn_cstring_atoi64(apr_int64_t *n, const char *str);
  +
  +/**
  + * Parse the C string @a str into a 32 bit number, and return it in @a *n.
 
 What does parse into a 32 bit number mean when 'int' is not 32 bits?
 
 If the intention is to limit this to 32-bit numbers its name should
 reflect it.  (The data type *could* stay as 'int' because we are allowed
 to assume it's at least 32 bits, but I don't know if we'd want to do
 that.)
 
 Otherwise it should support the range of 'int' and not assume 32 bits.

I'll just change that to say integer. Internally, it uses an int,
and assumes that any value from APR_INT32_MIN to APR_INT32_MAX will fit.

  Modified: subversion/trunk/subversion/libsvn_subr/svn_string.c
  +svn_error_t *
  +svn_cstring_strtoui64(apr_uint64_t *n, const char *str,
  +  apr_uint64_t minval, apr_uint64_t maxval,
  +  int base)
  +{
  +  apr_int64_t parsed;
  +
  +  /* We assume errno is thread-safe. */
  +  errno = 0; /* APR-0.9 doesn't always set errno */
  +
  +  /* ### We're throwing away half the number range here.
  +   * ### Maybe implement our own number parser? */
 
 What use is this function, compared to _strtoi64(), if it doesn't
 actually support the top half of the range?

Forward planning :)
As soon as apr_strtoui64() exists, we can switch this function to use
it without much effort.
 
  +  parsed = apr_strtoi64(str, NULL, base);
  +  if (errno != 0)
  +return svn_error_return(
  + svn_error_createf(SVN_ERR_INCORRECT_PARAMS, NULL,
  +   _(Could not convert '%s' into a number),
  +   str));
  +  if (parsed  0 || parsed  minval || parsed  maxval)
 
 Forget parsed  0; the parsed  minval term will always catch that
 case because minval can't be negative.

I'd prefer to keep it for clarity until this function can work with
unsigned integers only. Note also a later commit which forced the minval
and maxval comparisons to be signed.

  +return svn_error_return(
  + svn_error_createf(SVN_ERR_INCORRECT_PARAMS, NULL,
  +   _(Number '%s' is out of range), str));
  +  *n = parsed;
  +  return SVN_NO_ERROR;
  +}
  +
  +svn_error_t *
  +svn_cstring_atoui64(apr_uint64_t *n, const char *str)
  +{
  +  return svn_error_return(svn_cstring_strtoui64(n, str, 0,
  +

Re: svn commit: r993183 - in /subversion/trunk/subversion: include/ libsvn_diff/ libsvn_ra_serf/ libsvn_subr/ mod_dav_svn/reports/

On Wed, Sep 15, 2010 at 02:27:21PM +0100, Julian Foad wrote:
 Also printf formatting type mismatches:
 
 subversion/libsvn_subr/svn_string.c: In function
 'svn_cstring_strtoui64':
 subversion/libsvn_subr/svn_string.c:662: format '%lu' expects type 'long
 unsigned int', but argument 5 has type 'apr_uint64_t'
 subversion/libsvn_subr/svn_string.c:662: format '%lu' expects type 'long
 unsigned int', but argument 6 has type 'apr_uint64_t'
 subversion/libsvn_subr/svn_string.c: In function 'svn_cstring_strtoi64':
 subversion/libsvn_subr/svn_string.c:706: format '%ld' expects type 'long
 int', but argument 5 has type 'apr_int64_t'
 subversion/libsvn_subr/svn_string.c:706: format '%ld' expects type 'long
 int', but argument 6 has type 'apr_int64_t'
 
 (These should use APR_[U]INT64_T_FMT.  Doing so is complicated by the
 fact that the format string localization tools don't like string
 concatenation with macros, and there's no pool easily available to do a
 temporary printf.)

The translation issues is why I didn't use the APR_[U]INT64_T_FMT.
It's either using it and make life hard for translators, or have these
warnings... neither solution is perfect. Or is there a better solution?

We have a similar define for svn_revnum_t BTW, but it's been deprecated
for the same reason.

Stefan

Re: Public API documentation

On 09/15/2010 05:35 AM, Julian Foad wrote:
 Hyrum K. Wright wrote:
 I've been thinking about ways that our API documentation could be made
 a bit cleaner.

[...]

 It's is really difficult to determine what each parameter does or what
 possible errors are returned, without reading the entire docstring.
 Maybe that's intended, but perhaps a strategy which enumerates the
 parameters would be a bit better.  I've mocked up with this may look
 like:
 http://people.apache.org/~hwright/svn/doc/api/temp/group__clnt__wc__checkout.html#ga31e87d0db53bf1158eaceb6552c63bae

I would suggest that yes, it is intended that folks read the entire
docstring for a function before using it.  But I get that sometimes it
might be nice to more quickly lookup a particular parameter's general
meaning without swimming through tons of context.

Julian Foad wrote:
 But your mock-up is very good.  I particularly like that it refers to
 the canonical definitions of depth, client_ctx (saying which parts of
 that are relevant) and operative/peg revisions, instead of trying to
 explain them yet again in yet another way.

+1

 I also like the way we can easily check whether the correct set of
 parameters is documented.

Yup.


-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand



signature.asc
Description: OpenPGP digital signature

Re: Public API documentation

2010-09-15 Thread Daniel Trebbien

Hyrum K. Wright hyrum_wright at mail.utexas.edu writes:
 Thanks.  Though I'll probably wait a day or two, to let any dissenters
 make themselves known. :)

-1/2. There are parts of your idea that I like, such as linking to common
documentation, enumerating all of the errors that can be returned, and
specifying whether parameters are in, out, or in/out, but in general I do not
like the idea of changing the style of documentation to focus on documentation
of the parameters.

I am a new user of the API, so in some ways I think that my initial impression
is representative of others newbies' initial impression of the documentation.
And, my initial impression was that I liked how the all of the parameters were
explained in enough detail to be able to actually use the function. In the case
of `svn_client_checkout3`, for example, I like how the documentation currently
points out that the most important field of the ctx is `notify_func2`. The
proposed idea of simply linking to the documentation of `svn_client_ctx_t`
would not be very helpful to a beginner, I think, because there are *so many*
fields of the `svn_client_ctx_t` structure, and it is not immediately clear
which ones affect the behavior of `svn_client_checkout3`. Also, other details
were trimmed away in the mock-up (e.g.: the basic description went from the
detailed Checkout a working copy of `URL` at `revision`, looked up at
`peg_revision`, using `path` as the root directory of the newly checked out
working copy, and authenticating with the authentication baton cached in `ctx`
to Checkout a working copy from a repository;  the description of
`result_rev` is currently more detailed than in the proposed description).

The use of structures to pass in bundles of parameters is problematic for
the proposed style of documentation, because it is not obvious how the fields
of the bundles of parameters should be listed as parameters. Do
you document `ctx-notify_func2` as an [in] parameter? Also, Julian's example
of `svn_client_checkout2` is another problem with the proposed style.

Re: svn commit: r993183 - in /subversion/trunk/subversion: include/ libsvn_diff/ libsvn_ra_serf/ libsvn_subr/ mod_dav_svn/reports/

2010-09-15 Thread Philip Martin

Stefan Sperling s...@elego.de writes:

 The translation issues is why I didn't use the APR_[U]INT64_T_FMT.
 It's either using it and make life hard for translators, or have these
 warnings... neither solution is perfect. Or is there a better solution?

Have two format strings and pick the right one based on sizeof(long).
It's tricky to select at compile time but it could be done at run-time
as this string only occurs in error messages and so is not performance
critical.  Or it could be done at configure time, borrowing code from
APR, or we could get APR to provide APR_LONG_IS[32|64]BITS.

-- 
Philip

Re: Copies via update -- net win or net loss? (Was: svn commit: r996914 ...)

C. Michael Pilato wrote on Tue, Sep 14, 2010 at 10:58:20 -0400:
 On 09/14/2010 10:25 AM, rhuij...@apache.org wrote:
  Author: rhuijben
  Date: Tue Sep 14 14:25:52 2010
  New Revision: 996914
  
  URL: http://svn.apache.org/viewvc?rev=996914view=rev
  Log:
  When mixing the two major hacks of the update editor (copy_from location
  and file externals), we can get into an unexpected state. Update an
  assertion to handle this state properly and slightly update the expected
  test result.
 
 In light of our vision for the future regarding the pristine cache, and the
 seeming flakiness of the special-case code added to handle copies during
 update, should we kill that feature?

Could someone summarize the feature/hack/special-case being discussed
here?  Is it about using the copyfrom info to optimize some over-the-wire
transmissions during an update, as opposed to always asking for the fulltext
of a copied file?  (e.g., I found locate_copyfrom() in update-editor.c)

Thanks,

Daniel.

 I've never been convinced that it was
 truly beneficial as written anyway -- it seems to just sorta throws what
 ifs across the wire and then burdens the client with verification and
 handling before falling back to doing what it has always done for non-copied
 files.
 
 -- 
 C. Michael Pilato cmpil...@collab.net
 CollabNet  www.collab.net  Distributed Development On Demand

Re: svn commit: r993183 - in /subversion/trunk/subversion: include/ libsvn_diff/ libsvn_ra_serf/ libsvn_subr/ mod_dav_svn/reports/

Stefan Sperling wrote:
 On Wed, Sep 15, 2010 at 02:14:36PM +0100, Julian Foad wrote:
  On Tue, 2010-09-07, s...@apache.org wrote:
   * subversion/include/svn_string.h
 (svn_cstring_strtoi64, svn_cstring_atoi64, svn_cstring_atoi,
  svn_cstring_strtoui64, svn_cstring_atoui64, svn_cstring_atoui): 
   Declare.
   
   * subversion/libsvn_subr/svn_string.c
 (): Include svn_private_config.h for the _() gettext macro.
 (svn_cstring_strtoi64, svn_cstring_strtoui64, svn_cstring_atoi64,
  svn_cstring_atoi): New.
  
  And the other two functions as well?  (Just missing from the log
  message.)
 
 Will fix that.
 
   Modified: subversion/trunk/subversion/include/svn_string.h
   +/**
   + * Parse the C string @a str into a 64 bit number, and return it in @a 
   *n.
   + * Assume that the number is represented in base @a base.
   + * Raise an error if conversion fails (e.g. due to overflow), or if the
   + * converted number is smaller than @a minval or larger than @a maxval. 
  
  What are the semantics regarding 'str' - does it ignore leading spaces,
  a plus sign, decimal point, trailing spaces, trailing non-spaces?
  Preferably refer to a standard function to make this clear without
  trying to spell it all out in full.
 
 It's C-string, i.e. NUL-terminated. The number format is the same as
 apr_strtoi64() expects, which is equivalent to the C standard function
 strtol().

Great, either of those will make a good reference from the doc string.

   + * @since New in 1.7.
   + */
   +svn_error_t *
   +svn_cstring_strtoi64(apr_int64_t *n, const char *str,
   + apr_int64_t minval, apr_int64_t maxval,
   + int base);

   +/**
   + * Parse the C string @a str into a 32 bit number, and return it in @a 
   *n.
  
  What does parse into a 32 bit number mean when 'int' is not 32 bits?
  
  If the intention is to limit this to 32-bit numbers its name should
  reflect it.  (The data type *could* stay as 'int' because we are allowed
  to assume it's at least 32 bits, but I don't know if we'd want to do
  that.)
  
  Otherwise it should support the range of 'int' and not assume 32 bits.
 
 I'll just change that to say integer.

Great.

  Internally, it uses an int,
 and assumes that any value from APR_INT32_MIN to APR_INT32_MAX will fit.

That's the inconsistency I mentioned below: it claims to support 'int'
but the implementation enforces a 32-bit range limit regardless of the
size of 'int'.  It should enforce the range of the actual 'int' type
instead.  (If the code uses 64-bit internally and assumes that 'int' is
= 64 bits, I'd be OK with that.)

   Modified: subversion/trunk/subversion/libsvn_subr/svn_string.c
   +svn_error_t *
   +svn_cstring_strtoui64(apr_uint64_t *n, const char *str,
   +  apr_uint64_t minval, apr_uint64_t maxval,
   +  int base)
   +{
   +  apr_int64_t parsed;
   +
   +  /* We assume errno is thread-safe. */
   +  errno = 0; /* APR-0.9 doesn't always set errno */
   +
   +  /* ### We're throwing away half the number range here.
   +   * ### Maybe implement our own number parser? */
  
  What use is this function, compared to _strtoi64(), if it doesn't
  actually support the top half of the range?
 
 Forward planning :)
 As soon as apr_strtoui64() exists, we can switch this function to use
 it without much effort.

OK, that's fine if we're only using it to read counts of real things,
like real file sizes.  If we want to read arbitrary 64-bit numbers such
as digests or virtual memory addresses, then we might need the upper
range to work.

What's the failure mode?  From a quick look at the present code I think
it will return an error like Number '(2^63)' is out of range '[0, (2^64
- 1)]' which is not exactly correct but at least it leads the developer
to the right part of the source code.

Please add a note in the doc string that these two functions only
support numbers up to 2^63-1.  We can remove the restiction and the note
in some later version.

 
   +  parsed = apr_strtoi64(str, NULL, base);
   +  if (errno != 0)
   +return svn_error_return(
   + svn_error_createf(SVN_ERR_INCORRECT_PARAMS, NULL,
   +   _(Could not convert '%s' into a number),
   +   str));
   +  if (parsed  0 || parsed  minval || parsed  maxval)
  
  Forget parsed  0; the parsed  minval term will always catch that
  case because minval can't be negative.
 
 I'd prefer to keep it for clarity until this function can work with
 unsigned integers only. Note also a later commit which forced the minval
 and maxval comparisons to be signed.

Oh, oops - I see: the comparison with minval can't catch it because it
would convert both args to unsigned.

The present version says:

  if ((errno == ERANGE  (val == APR_INT64_MIN || val == APR_INT64_MAX)) ||
  val  0 || (apr_uint64_t)val  minval || (apr_uint64_t)val  maxval)
return svn_error_return(

Re: svn commit: r996581 - in /subversion/trunk/subversion: libsvn_diff/parse-diff.c tests/cmdline/patch_tests.py

Stefan Sperling wrote on Tue, Sep 14, 2010 at 21:18:26 +0200:
 On Tue, Sep 14, 2010 at 12:49:06PM +0100, Julian Foad wrote:
  Right.  The issue seems to be How do I determine what path I should
  apply the patch to?  If the patch style is
  
  --- wc/file
  +++ wc/file
  
  or
  
  --- old_wc/file
  +++ new_wc/file
  
  then you're going to be stripping the first component so it doesn't
  matter which path you use.  If the patch is just to one file and looks
  something like
  
  --- wc/file.old
  +++ wc/file
  
  or
  
  --- wc/file
  +++ wc/file.new
  
  or
  
  --- wc/file.old
  +++ wc/file.new
  
  then obviously it's not so simple and the patch consumer (person) may
  need to be able to tell svn patch which path it should look at if
  we're to be able to handle cases like this.
 
 Yes. I've been trying to come up with a good UI for selecting the right
 filename for svn patch to use. Any ideas?

Paraphrasing my commit reply a few hours ago, how about:

* Always use the name from the /^+++/ line by default (regardless of --rd).
* Have a --strip-extension flag.  (Effect: filename loses everything after the 
last dot)
* Have a the following config option:
[[[
# ~/.subversion/config
[miscellany]
patch-strip-extensions = .new .old .orig .org ~ \
 .merge-left .merge-right .working
]]]
for extensions that are stripped automagically.

Sounds sane?  (I haven't thought about this /too/ much, so beware of
bugs in the above idea)

Daniel

Re: Copies via update -- net win or net loss? (Was: svn commit: r996914 ...)

On 09/15/2010 12:01 PM, Daniel Shahaf wrote:
 C. Michael Pilato wrote on Tue, Sep 14, 2010 at 10:58:20 -0400:
 On 09/14/2010 10:25 AM, rhuij...@apache.org wrote:
 Author: rhuijben
 Date: Tue Sep 14 14:25:52 2010
 New Revision: 996914

 URL: http://svn.apache.org/viewvc?rev=996914view=rev
 Log:
 When mixing the two major hacks of the update editor (copy_from location
 and file externals), we can get into an unexpected state. Update an
 assertion to handle this state properly and slightly update the expected
 test result.

 In light of our vision for the future regarding the pristine cache, and the
 seeming flakiness of the special-case code added to handle copies during
 update, should we kill that feature?
 
 Could someone summarize the feature/hack/special-case being discussed
 here?  Is it about using the copyfrom info to optimize some over-the-wire
 transmissions during an update, as opposed to always asking for the fulltext
 of a copied file?  (e.g., I found locate_copyfrom() in update-editor.c)

That's the one.

The server does a little extra work to calculate a copyfrom path, a little
extra work to authorize that path.  If it all checks out, the server sends
that copyfrom info plus any additional content/prop deltas against that
copyfrom source to the client instead of just transmitting the new contents
of the copied thing in full.

If the client sees copyfrom info, it then goes to try to find that same
thing in the working copy.  If it manages to find something, it does a local
copy and then applies those extra deltas (if any) from the server.  If it
doesn't find the copyfrom source locally, it then asks the server for the
pristine version of the copyfrom source so it has something to apply those
deltas to.

Yeah.  locate_copyfrom().  Eek.

I don't doubt that in some cases, this code might save someone a bit of
network traffic.  But our vision for this code in the future involves much
more reliable, less-heuristically-driven approaches.  SHA1 checksums into a
database keyed on as much, etc.  I seriously believe we'd be better off
losing this codepath today, if only so we can simplify what is already a
very complex operation (update).

At this point, I'm not even really asking for permission any more.  If
somebody doesn't beat me to it, I'll gut the feature myself after I wrap up
some other things that hold my attention right now.

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand



signature.asc
Description: OpenPGP digital signature

[PATCH] NODES table presence values

Bert, Erik, Greg...

I think the schema should not disallow the 'excluded' presence in NODES
table where op_depth  0 (which corresponds roughly to old
WORKING_NODE).  There are already 'copy' cases where it is used, and
seems useful and right.

I also think the schema should not disallow 'absent'.  We have already
talked about how, if you allow a sub-tree copy (for example) to have a
child that is 'absent' (unauthorized), you won't be able to commit that
copy.  But I don't think that means that the schema should forbid ever
representing such a case.  I think if we forbid it at this level we
would possibly be making the rules more complex than they need to be and
would possibly run into problems later when we might find that actually
there are good cases to be made for representing such a state.

So I think we should simplify the NODES table doc string as follows:

[[[
Index: subversion/libsvn_wc/wc-metadata.sql
===
--- subversion/libsvn_wc/wc-metadata.sql(revision 997335)
+++ subversion/libsvn_wc/wc-metadata.sql(working copy)
@@ -709,38 +709,13 @@ CREATE TABLE NODES (
   /* Is this node present or has it been excluded for some reason?
 
  In case 'op_depth' is equal to 0, this is part of the BASE tree; in
  that case, all presence values except 'base-deleted' are allowed.
 
  In case 'op_depth' is greater than 0, this is part of a layer of
- working nodes; in that case, the following presence values apply:
-
- Only allowed values: normal, not-present, incomplete, base-deleted.
- (the others do not make sense for the WORKING tree)
-
- normal: this node has been added/copied/moved-here. There may be an
-   underlying BASE node at this location, implying this is a replace.
-   Scan upwards from here looking for copyfrom or moved_here values
-   to detect the type of operation constructing this node.
-
- not-present: the node (or parent) was originally copied or moved-here.
-   A subtree of that source has since been deleted. There may be
-   underlying BASE node to replace. For a move-here or copy-here, the
-   records are simply removed rather than switched to not-present.
-   Note this reflects a deletion only. It is not possible move-away
-   nodes from the WORKING tree. The purported destination would receive
-   a copy from the original source of a copy-here/move-here, or if the
-   nodes were plain adds, those nodes would be shifted to that target
-   for addition.
-
- incomplete: nodes are being added into the WORKING tree, and the full
-   information about this node is not (yet) present.
-
- base-deleted: the underlying BASE node has been marked for deletion due
-   to a delete or a move-away (see the moved_to column to determine
-   which), and has not been replaced.  */
+ working nodes; in that case, all presence values are allowed.  */
   presence  TEXT NOT NULL,
 
   /* NULL depth means default (typically svn_depth_infinity) */
   /* ### depth on WORKING? seems this is a BASE-only concept. how do
  ### you do files on an added-directory? can't really ignore
  ### the subdirs! */
]]]

Thoughts?

- Julian

Re: svn commit: r997249 - in /subversion/trunk/subversion: include/svn_client.h libsvn_client/patch.c svn/cl.h svn/main.c svn/patch-cmd.c tests/cmdline/patch_tests.py tests/libsvn_client/client-test.c

On Wed, Sep 15, 2010 at 04:51:15PM +0200, Daniel Shahaf wrote:
 s...@apache.org wrote on Wed, Sep 15, 2010 at 10:05:15 -:
  Author: stsp
  Date: Wed Sep 15 10:05:14 2010
  New Revision: 997249
  
  URL: http://svn.apache.org/viewvc?rev=997249view=rev
  Log:
  Add an --old-patch-target-names option to svn patch.
  This option is useful in two cases:
  
   1) A patch contains names like
  --- foo.c
  +++ foo.c.new
  and should be applied to foo.c.
  
   2) A patch contains names like
  --- foo.c.orig
  +++ foo.c
  and should be applied in reverse to foo.c, e.g. to undo prior 
  application
  of the patch.
 
 What happens if a user forgets to supply the new option?  (Does svn
 complain that 'foo.c.new is nonexistent/unversioned'?)

$ svn patch test.diff
Skipped missing target: 'foo.new'
Summary of conflicts:
  Skipped paths: 1

 For case (2): suppose I have such a patch (was emailed to me).  I
 applied it using 'svn patch $patchfile'.  Now I want to unapply it; so
 so I use 'svn patch --rd $patchfile' or 'svn patch --rd --optn $patchfile'?  
 AIUI, currently only the latter will work?

Which one works depends on the way path names appear in the patch.

Say you have a diff produced with svn diff.
That will work with either combination of options.

$ svn diff
Index: alpha
===
--- alpha   (revision 2)
+++ alpha   (working copy)
@@ -1 +1,2 @@
 alpha
+aaa

You need the new option only to handle cases where people used e.g. the
UNIX diff tool to create a patch using left/right names they made up
while producing the diff.

 Is the UI 'svn patch' always uses the filename in the /^+++/ line?

Yes. By default, that's the name that is used.

If you reverse the diff, the filenames get swapped as well.
So if you use the --optn option, the filenames get swapped again, or not at
all, depending which way you look at it :)

This UI may not be ideal, and I'm open for suggestions on how we could
improve it. Maybe there's a way to handle this without user interaction,
or a better name for the new option?

The UNIX patch tool prompts for a filename when it cannot find a target.
We could do the same, defining yet another prompt callback type.

But I don't like the fact that UNIX patch does not do tab-completion in its
prompt. I think the lack of tab-completion really affects usability.
So if svn had a prompt for this, I'd like it to support tab-completion.

So the new --optn option is an easier way to implement this.
Maybe it's sufficient? Note that having the option around doesn't hurt
even if we decide to implement prompting later.

Stefan

Re: Public API documentation

On Wed, Sep 15, 2010 at 5:07 PM, Daniel Trebbien dtrebb...@gmail.com wrote:
 Hyrum K. Wright hyrum_wright at mail.utexas.edu writes:
 Thanks.  Though I'll probably wait a day or two, to let any dissenters
 make themselves known. :)

 -1/2. There are parts of your idea that I like, such as linking to common
 documentation, enumerating all of the errors that can be returned, and
 specifying whether parameters are in, out, or in/out, but in general I do not
 like the idea of changing the style of documentation to focus on documentation
 of the parameters.

 I am a new user of the API, so in some ways I think that my initial impression
 is representative of others newbies' initial impression of the documentation.
 And, my initial impression was that I liked how the all of the parameters were
 explained in enough detail to be able to actually use the function. In the 
 case
 of `svn_client_checkout3`, for example, I like how the documentation currently
 points out that the most important field of the ctx is `notify_func2`. The

Actually, the important fields are notify_func2, and the auth_baton,
but the current docs make zero mention of the later.

 proposed idea of simply linking to the documentation of `svn_client_ctx_t`
 would not be very helpful to a beginner, I think, because there are *so many*
 fields of the `svn_client_ctx_t` structure, and it is not immediately clear
 which ones affect the behavior of `svn_client_checkout3`.

Then we need to make it clear, such as saying used for authentication
and notification.  In the client context docs, we'd have something
like if used for authentication... and if used for
notification

 Also, other details
 were trimmed away in the mock-up (e.g.: the basic description went from the
 detailed Checkout a working copy of `URL` at `revision`, looked up at
 `peg_revision`, using `path` as the root directory of the newly checked out
 working copy, and authenticating with the authentication baton cached in 
 `ctx`
 to Checkout a working copy from a repository;  the description of
 `result_rev` is currently more detailed than in the proposed description).

But that's my point.  We spent a lot of time writing the *exact same*
documentation for many APIs.  Instead of write of `URL` at
`revision`, looked up at `peg_revision`, using `path` as the root
directory of the newly checked out working copy, for every single API
which takes a URL, peg revision and revision, I'd prefer to only write
that text once, where it can then be referenced from every API which
needs it.  (Kinda like a documentation subroutine. :)

In the bits that are referenced often, such as svn_client_ctx_t, it
would be important that we are explicit about what the members mean,
and what they do, much more than we currently are.  And I pray we're
currently being consistent on how we use those members between APIs.

By the way, I think your perspective is very valuable.  It's been a
long time since I was new to the Subversion APIs, and have since lost
that perspective.  But I also use the API documentation on a regular
basis, demonstrating that we should cater to both new and old API
consumers.

 The use of structures to pass in bundles of parameters is problematic for
 the proposed style of documentation, because it is not obvious how the fields
 of the bundles of parameters should be listed as parameters. Do
 you document `ctx-notify_func2` as an [in] parameter?

We document all of the client context as an [in] parameter.  But yeah,
having a massive bunch of stuff is not very conducive for
documentation (or for maintenance, it turns out).

 Also, Julian's example
 of `svn_client_checkout2` is another problem with the proposed style.

We'd leave that as-is.

A side goal of this little project is to make the documentation better
parable, so that automagic docs of the bindings, for instance, could
be generated.  But that's really tangential to human readability.

Thanks for the feedback,
-Hyrum

Re: svn commit: r997249 - in /subversion/trunk/subversion: include/svn_client.h libsvn_client/patch.c svn/cl.h svn/main.c svn/patch-cmd.c tests/cmdline/patch_tests.py tests/libsvn_client/client-test.c

On Wed, Sep 15, 2010 at 8:42 PM, Stefan Sperling s...@elego.de wrote:
 On Wed, Sep 15, 2010 at 04:51:15PM +0200, Daniel Shahaf wrote:
 s...@apache.org wrote on Wed, Sep 15, 2010 at 10:05:15 -:
  Author: stsp
  Date: Wed Sep 15 10:05:14 2010
  New Revision: 997249
 
  URL: http://svn.apache.org/viewvc?rev=997249view=rev
  Log:
  Add an --old-patch-target-names option to svn patch.
  This option is useful in two cases:
 
   1) A patch contains names like
      --- foo.c
      +++ foo.c.new
      and should be applied to foo.c.
 
   2) A patch contains names like
      --- foo.c.orig
      +++ foo.c
      and should be applied in reverse to foo.c, e.g. to undo prior 
  application
      of the patch.

 What happens if a user forgets to supply the new option?  (Does svn
 complain that 'foo.c.new is nonexistent/unversioned'?)

 $ svn patch test.diff
 Skipped missing target: 'foo.new'
 Summary of conflicts:
  Skipped paths: 1

 For case (2): suppose I have such a patch (was emailed to me).  I
 applied it using 'svn patch $patchfile'.  Now I want to unapply it; so
 so I use 'svn patch --rd $patchfile' or 'svn patch --rd --optn $patchfile'?
 AIUI, currently only the latter will work?

 Which one works depends on the way path names appear in the patch.

 Say you have a diff produced with svn diff.
 That will work with either combination of options.

 $ svn diff
 Index: alpha
 ===
 --- alpha       (revision 2)
 +++ alpha       (working copy)
 @@ -1 +1,2 @@
  alpha
 +aaa

 You need the new option only to handle cases where people used e.g. the
 UNIX diff tool to create a patch using left/right names they made up
 while producing the diff.

 Is the UI 'svn patch' always uses the filename in the /^+++/ line?

 Yes. By default, that's the name that is used.

 If you reverse the diff, the filenames get swapped as well.
 So if you use the --optn option, the filenames get swapped again, or not at
 all, depending which way you look at it :)

 This UI may not be ideal, and I'm open for suggestions on how we could
 improve it. Maybe there's a way to handle this without user interaction,
 or a better name for the new option?

 The UNIX patch tool prompts for a filename when it cannot find a target.
 We could do the same, defining yet another prompt callback type.

 But I don't like the fact that UNIX patch does not do tab-completion in its
 prompt. I think the lack of tab-completion really affects usability.
 So if svn had a prompt for this, I'd like it to support tab-completion.

 So the new --optn option is an easier way to implement this.
 Maybe it's sufficient? Note that having the option around doesn't hurt
 even if we decide to implement prompting later.

How about just adding '--patch-left' and '--patch-right' or something.
 The either both have to used, or neither, and if they are, use those
names as the once in the patch.  If people can use the UNIX diff tool
to create left/right names they just made up, let's just let them
worry about feeding those names back into Subversion (rather than
guessing).

-Hyrum

Fwd: Regression for issue:2362 ?

2010-09-15 Thread Weirdan

Forwarding to dev@ as this seems to me an obvious regression (unfortunately
I got no reply from users@)

-- Forwarded message --
From: Weirdan weir...@gmail.com
Date: Wed, Sep 8, 2010 at 2:50 PM
Subject: Regression for issue:2362 ?
To: us...@subversion.apache.org

Hello

I'm using svn v1.6.12 and it seems that calling 'svn info --xml' on a file
in an unversioned directory still causes svn to produce broken xml:

[weir...@localhost /home/sam/trunk] mkdir dir

[weir...@localhost /home/sam/trunk] touch dir/file

[weir...@localhost /home/sam/trunk] svn info --xml dir/file

?xml version=1.0?

info

svn: 'dir' is not a working copy

[weir...@localhost /home/sam/trunk] svn --version

svn, version 1.6.12 (r955767)

   compiled Aug  5 2010, 15:04:19

 Copyright (C) 2000-2009 CollabNet.

Subversion is open source software, see http://subversion.tigris.org/

This product includes software developed by CollabNet (
 http://www.Collab.Net/).

 The following repository access (RA) modules are available:

 * ra_neon : Module for accessing a repository via WebDAV protocol using
 Neon.

  - handles 'http' scheme

  - handles 'https' scheme

* ra_svn : Module for accessing a repository using the svn network protocol.

  - with Cyrus SASL authentication

  - handles 'svn' scheme

* ra_local : Module for accessing a repository on local disk.

  - handles 'file' scheme

* ra_serf : Module for accessing a repository via WebDAV protocol using
 serf.

  - handles 'http' scheme

  - handles 'https' scheme

 [weir...@localhost /home/sam/trunk]

While the output is not exactly the same as in this ticket:
http://subversion.tigris.org/issues/show_bug.cgi?id=2887 ,  it seems to be
similar enough to call this a regression.

I'd comment on original issue, but it doesn't seem to be possible, so I'm
posting this to the mailing list hoping someone from dev team would pick
this up.

-- 
  Best regards,
  Bruce Weirdan mailto:
weir...@gmail.com

-- 
  Best regards,
  Bruce Weirdan mailto:
weir...@gmail.com

Re: svn commit: r993183 - in /subversion/trunk/subversion: include/ libsvn_diff/ libsvn_ra_serf/ libsvn_subr/ mod_dav_svn/reports/

On Wed, Sep 15, 2010 at 05:12:43PM +0100, Julian Foad wrote:
+ * @since New in 1.7.
+ */
+svn_error_t *
+svn_cstring_strtoi64(apr_int64_t *n, const char *str,
+ apr_int64_t minval, apr_int64_t maxval,
+ int base);
 
+/**
+ * Parse the C string @a str into a 32 bit number, and return it in @a 
*n.
   
   What does parse into a 32 bit number mean when 'int' is not 32 bits?
   
   If the intention is to limit this to 32-bit numbers its name should
   reflect it.  (The data type *could* stay as 'int' because we are allowed
   to assume it's at least 32 bits, but I don't know if we'd want to do
   that.)
   
   Otherwise it should support the range of 'int' and not assume 32 bits.
  
  I'll just change that to say integer.
 
 Great.
 
   Internally, it uses an int,
  and assumes that any value from APR_INT32_MIN to APR_INT32_MAX will fit.
 
 That's the inconsistency I mentioned below: it claims to support 'int'
 but the implementation enforces a 32-bit range limit regardless of the
 size of 'int'.  It should enforce the range of the actual 'int' type
 instead.  (If the code uses 64-bit internally and assumes that 'int' is
 = 64 bits, I'd be OK with that.)

Should we use INT_MIN and INT_MAX so?

Modified: subversion/trunk/subversion/libsvn_subr/svn_string.c
+svn_error_t *
+svn_cstring_strtoui64(apr_uint64_t *n, const char *str,
+  apr_uint64_t minval, apr_uint64_t maxval,
+  int base)
+{
+  apr_int64_t parsed;
+
+  /* We assume errno is thread-safe. */
+  errno = 0; /* APR-0.9 doesn't always set errno */
+
+  /* ### We're throwing away half the number range here.
+   * ### Maybe implement our own number parser? */
   
   What use is this function, compared to _strtoi64(), if it doesn't
   actually support the top half of the range?
  
  Forward planning :)
  As soon as apr_strtoui64() exists, we can switch this function to use
  it without much effort.
 
 OK, that's fine if we're only using it to read counts of real things,
 like real file sizes.  If we want to read arbitrary 64-bit numbers such
 as digests or virtual memory addresses, then we might need the upper
 range to work.
 
 What's the failure mode?  From a quick look at the present code I think
 it will return an error like Number '(2^63)' is out of range '[0, (2^64
 - 1)]' which is not exactly correct but at least it leads the developer
 to the right part of the source code.
 
 Please add a note in the doc string that these two functions only
 support numbers up to 2^63-1.  We can remove the restiction and the note
 in some later version.

If we allowed ourselves to use strtoull() directly instead of using
apr_strtoi64(), this problem would go away. Apparently it's part of C99
only though. If we decide to do this, I'll skip apr_strtoi64() as well,
and just call strtoll() directly.

We could also use strtol() and strtoul(), which are C89.

+  parsed = apr_strtoi64(str, NULL, base);
+  if (errno != 0)
+return svn_error_return(
+ svn_error_createf(SVN_ERR_INCORRECT_PARAMS, NULL,
+   _(Could not convert '%s' into a 
number),
+   str));
+  if (parsed  0 || parsed  minval || parsed  maxval)
   
   Forget parsed  0; the parsed  minval term will always catch that
   case because minval can't be negative.
  
  I'd prefer to keep it for clarity until this function can work with
  unsigned integers only. Note also a later commit which forced the minval
  and maxval comparisons to be signed.
 
 Oh, oops - I see: the comparison with minval can't catch it because it
 would convert both args to unsigned.
 
 The present version says:
 
   if ((errno == ERANGE  (val == APR_INT64_MIN || val == APR_INT64_MAX)) ||
   val  0 || (apr_uint64_t)val  minval || (apr_uint64_t)val  maxval)
 return svn_error_return(
  svn_error_createf(SVN_ERR_INCORRECT_PARAMS, NULL,
_(Number '%s' is out of range '[%lu, %lu]'),
str, minval, maxval));
 
 It forces the comparisons to be unsigned.  AFAIK that's the same as C's
 rules for signed/unsigned comparisons of the same size anyway.

It might be, but I can't keep all rules of C in my head.
So it's easier to be explicit.
 
 What's with the ERANGE  (val == ...) test?  I notice the Posix spec
 for 'strtol' says when it sets errno to ERANGE it also returns LONG_MIN
 or LONG_MAX, but what's the point of testing that, even if
 apr_strtoi64() does the same (which it doesn't document)?  When reading
 this code, it looks like you're deliberately excluding some other ERANGE
 conditions from this test and allowing them to be treated as success
 conditions, but I don't think that's what you mean.  Strip the redundant
 tests!

POSIX does that to allow the caller to tell whether underflow or
overflow

Re: svn commit: r995475 - /subversion/trunk/subversion/libsvn_repos/load.c

On Fri, Sep 10, 2010 at 04:02:52PM +0200, Branko Čibej wrote:
  On 10.09.2010 15:26, Stefan Sperling wrote:
  I know we're using C89, but maybe it's time to move on and upgrade to C99
  where the benefits are desirable? When Subversion was started, C89 was about
  a decade old, and C99 is just as old now...
 
 Microsoft's C compiler, to name only one, still does not provide most of
 the C99 features. I don't know about standard library support.
 
 However, I don't see where you gain with using strtol(). First of all,
 it was already in C89 and has effectively the same interface as the APR
 conversions. C99 added strtoll() but it has the same interface. What's
 the benefit?

Raising this thread again, because there is a benefit to using strtol()'s
unsigned cousin, strtoul(). APR does not provide an interface for it.
Can we use it? Can we also use its 64-bit cousin strtoull()? If we can use
those two, we might as well use strtol() and strtoul() and skip apr_strtoi64()
completely.

Stefan

Re: svn commit: r995475 - /subversion/trunk/subversion/libsvn_repos/load.c

On Wed, Sep 15, 2010 at 9:03 PM, Stefan Sperling s...@elego.de wrote:
 On Fri, Sep 10, 2010 at 04:02:52PM +0200, Branko Čibej wrote:
  On 10.09.2010 15:26, Stefan Sperling wrote:
  I know we're using C89, but maybe it's time to move on and upgrade to C99
  where the benefits are desirable? When Subversion was started, C89 was 
  about
  a decade old, and C99 is just as old now...

 Microsoft's C compiler, to name only one, still does not provide most of
 the C99 features. I don't know about standard library support.

 However, I don't see where you gain with using strtol(). First of all,
 it was already in C89 and has effectively the same interface as the APR
 conversions. C99 added strtoll() but it has the same interface. What's
 the benefit?

 Raising this thread again, because there is a benefit to using strtol()'s
 unsigned cousin, strtoul(). APR does not provide an interface for it.
 Can we use it? Can we also use its 64-bit cousin strtoull()? If we can use
 those two, we might as well use strtol() and strtoul() and skip apr_strtoi64()
 completely.

Adding an implementation to APR isn't a problem, but it would take a
while to trickle down (we'd still need a private implementation).

-Hyrum

Re: svn commit: r997249 - in /subversion/trunk/subversion: include/svn_client.h libsvn_client/patch.c svn/cl.h svn/main.c svn/patch-cmd.c tests/cmdline/patch_tests.py tests/libsvn_client/client-test.c

On Wed, Sep 15, 2010 at 08:53:17PM +0200, Hyrum K. Wright wrote:
 How about just adding '--patch-left' and '--patch-right' or something.
  The either both have to used, or neither, and if they are, use those
 names as the once in the patch.

I don't like these names because it's not clear what left and right refer to.
Also, why would you need to use both these options together?

 If people can use the UNIX diff tool
 to create left/right names they just made up, let's just let them
 worry about feeding those names back into Subversion (rather than
 guessing).

One goal of svn patch is to facilitate application of arbitrary unidiffs.
The person generating the patch may not be the same person worrying
about applying the patch to a Subversion working copy.

In any case, people can always edit the patch file. But having an option
like this saves a bit of manual edits in the cases described.

Stefan

Re: svn commit: r996581 - in /subversion/trunk/subversion: libsvn_diff/parse-diff.c tests/cmdline/patch_tests.py

On Wed, Sep 15, 2010 at 06:15:58PM +0200, Daniel Shahaf wrote:
 Stefan Sperling wrote on Tue, Sep 14, 2010 at 21:18:26 +0200:
  On Tue, Sep 14, 2010 at 12:49:06PM +0100, Julian Foad wrote:
   Right.  The issue seems to be How do I determine what path I should
   apply the patch to?  If the patch style is
   
   --- wc/file
   +++ wc/file
   
   or
   
   --- old_wc/file
   +++ new_wc/file
   
   then you're going to be stripping the first component so it doesn't
   matter which path you use.  If the patch is just to one file and looks
   something like
   
   --- wc/file.old
   +++ wc/file
   
   or
   
   --- wc/file
   +++ wc/file.new
   
   or
   
   --- wc/file.old
   +++ wc/file.new
   
   then obviously it's not so simple and the patch consumer (person) may
   need to be able to tell svn patch which path it should look at if
   we're to be able to handle cases like this.
  
  Yes. I've been trying to come up with a good UI for selecting the right
  filename for svn patch to use. Any ideas?
 
 Paraphrasing my commit reply a few hours ago, how about:
 
 * Always use the name from the /^+++/ line by default (regardless of --rd).
 * Have a --strip-extension flag.  (Effect: filename loses everything after 
 the last dot)
 * Have a the following config option:
 [[[
 # ~/.subversion/config
 [miscellany]
 patch-strip-extensions = .new .old .orig .org ~ \
  .merge-left .merge-right .working
 ]]]
 for extensions that are stripped automagically.
 
 Sounds sane?  (I haven't thought about this /too/ much, so beware of
 bugs in the above idea)

Looking at the patch(1) man page, it sounds like we don't really need
to reinvent the wheel here:

 patch will choose the file name by performing the following steps, with
 the first match used:

 1.   If patch is operating in strict IEEE Std 1003.1-2008 (``POSIX'')
  mode, the first of the ``old'', ``new'' and ``index'' file names
  that exist is used.  Otherwise, patch will examine either the
  ``old'' and ``new'' file names or, for a non-context diff, the
  ``index'' file name, and choose the file name with the fewest path
  components, the shortest basename, and the shortest total file name
  length (in that order).

 2.   If no file exists, patch checks for the existence of the files in an
  SCCS or RCS directory (using the appropriate prefix or suffix) using
  the criteria specified above.  If found, patch will attempt to get
  or check out the file.

 3.   If no suitable file was found to patch, the patch file is a context
  or unified diff, and the old file was zero length, the new file name
  is created and used.

 4.   If the file name still cannot be determined, patch will prompt the
  user for the file name to use.

I suppose we could implement the method described in step 1.
That would make us compatible to UNIX patch, and should resolve the two
use cases I have in mind.

Step 2 doesn't apply to us. Step 3 is used already. Step 4 could be
implemented as well (but I'd prefer to do that post-1.7).

Thoughts?

Stefan

Pending 1.6.x backport proposals.

I've committed  a bunch of changes for issue #3709 and related situations
(r996884, r997026, r997457, r997466, r997471, and r997474).  I *do* plan to
propose them for backport to 1.6.x, but that work will have to wait until
tomorrow.  Just letting folks know.

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand



signature.asc
Description: OpenPGP digital signature

Pending 1.6.x backport proposals.

I've committed  a bunch of changes for issue #3709 and related situations
(r996884, r997026, r997457, r997466, r997471, and r997474).  I *do* plan to
propose them for backport to 1.6.x, but that work will have to wait until
tomorrow.  Just letting folks now.

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Distributed Development On Demand



signature.asc
Description: OpenPGP digital signature

Re: svn commit: r997403 - /subversion/trunk/subversion/libsvn_wc/wc-metadata.sql

On Wed, Sep 15, 2010 at 13:19,  julianf...@apache.org wrote:
...
 -    - ### A comment on 'repos_id' and 'repos_relpath' says they may be null;
 -      is this true and wanted?

Yes. That meant use the parent's information, and extend it to
compute this node's value.

 +    - The Node-Rev group points to the corresponding repository node-rev.

 -    - ### A comment on 'revnum' says, this could be NULL for non-present
 -      nodes -- no info; is this true and wanted?
 +    - If not used (as specified by the 'presence' table above), the values
 +      are undefined.
 +      ### Perhaps we should set them to null to make it clearer.

Yes, we should.


   Content columns: (kind, properties, depth, target, checksum)
                       --  -  --  
 @@ -165,6 +177,7 @@ Overview of BASE_NODE columns:
     - Includes a copy of the corresponding date and author rev-props.

     - If not used (as specified by the 'presence' table above), all null.
 +      ### Not checked; in practice these columns may have undefined values.

They should never be undefined. They should be null.

We should never have undefined information in our schema. That leads
to madness...

Cheers,
-g

Re: svn commit: r997334 - /subversion/trunk/subversion/libsvn_wc/wc-metadata.sql

On Wed, Sep 15, 2010 at 09:45,  julianf...@apache.org wrote:
...
 +++ subversion/trunk/subversion/libsvn_wc/wc-metadata.sql Wed Sep 15 13:45:12 
 2010
 @@ -618,6 +618,21 @@ PRAGMA user_version =
    BASE nodes and on top of other WORKING nodes, due to nested tree structure
    changes. The layers are modelled using the op_depth column.

 +   An 'operation depth' refers to the number of directory levels down from
 +   the WC root at which a tree-change operation (delete, add?, copy, move)
 +   was performed.  A row's 'op_depth' does NOT refer to the depth of its own
 +   'local_relpath', but rather to the depth of the nearest tree change that
 +   affects that node.
 +
 +   The row with op_depth=0 for any given local relpath represents the base
 +   node that is created and updated by checkout, update, switch and commit
 +   post-processing.  The row with the highest op_depth for a particular
 +   local_relpath represents the working version.  Any rows with intermediate

Strictly speaking: wc_id, local_relpath

We just happen to only ever have one wc_id value right now. In the
future, should one metadata store be used for multiple working copies,
then the wc_id value will become very meaningful.

...

Cheers,
-g

Re: svn commit: r996811 - in /subversion/trunk/subversion/libsvn_wc: upgrade.c wc.h

On Tue, Sep 14, 2010 at 06:08,  rhuij...@apache.org wrote:
 Author: rhuijben
 Date: Tue Sep 14 10:08:35 2010
 New Revision: 996811

 URL: http://svn.apache.org/viewvc?rev=996811view=rev
 Log:
 Move some defines that are only used on upgrades from wc.h to upgrade.c.

 * subversion/libsvn_wc/upgrade.c
  (SVN_WC__BASE_EXT,
   SVN_WC__WORK_EXT,
   SVN_WC__REVERT_EXT): Move here...

In the past, I've dropped the SVN prefix when these became local.

Cheers,
-g

Re: svn commit: r997493 - /subversion/object-model/

On Wed, Sep 15, 2010 at 16:39,  hwri...@apache.org wrote:
 Author: hwright
 Date: Wed Sep 15 20:39:44 2010
 New Revision: 997493

 URL: http://svn.apache.org/viewvc?rev=997493view=rev
 Log:
 Create the object-model branch, wherein we will attempt to put into practice
 some of the ideas defined in notes/object-model.txt.

 This is not just because I like syncing branches with trunk.  No, really.

 (And yes, gstein, there will be a BRANCH-README forthcoming.)

Oh, I don't need to see a BRANCH-README in that directory. Maybe a
TOPLEVEL-README or somesuch would be appropriate.

:-P


Cheers,
-g

Re: svn commit: r997493 - /subversion/object-model/

On Wed, Sep 15, 2010 at 10:43 PM, Greg Stein gst...@gmail.com wrote:
 On Wed, Sep 15, 2010 at 16:39,  hwri...@apache.org wrote:
 Author: hwright
 Date: Wed Sep 15 20:39:44 2010
 New Revision: 997493

 URL: http://svn.apache.org/viewvc?rev=997493view=rev
 Log:
 Create the object-model branch, wherein we will attempt to put into practice
 some of the ideas defined in notes/object-model.txt.

 This is not just because I like syncing branches with trunk.  No, really.

 (And yes, gstein, there will be a BRANCH-README forthcoming.)

 Oh, I don't need to see a BRANCH-README in that directory. Maybe a
 TOPLEVEL-README or somesuch would be appropriate.

Har har.  I'm just trying to inflate rev numbers to get r1M.

See r997494, -5.

-Hyrum

UTF-8 NFC/NFD paths issue

2010-09-15 Thread Erik Huelsmann

Yesterday, I was talking to CMike about our long-standing issue with UTF-8
strings designating a certain path not neccessarily being equal to other
strings designating the same path. The issue has to do with NFC (composed)
and NFD (decomposed) representation of Unicode characters. CMike nicely
called the issue the Erik Huelsmann issue yesterday :-)

The issue consists of two parts:
 1. The repository which should determine that paths being added by a commit
are unique, regardless of their encoding (NFC/NFD)
 2. The client which should detect that the pathnames coming in from the
filesystem may differ in encoding from what's in the working copy
administrative files [this is mainly an issue on the Mac:
http://subversion.tigris.org/issues/show_bug.cgi?id=2464]

Mike, the thing I have been trying to find around our filesystem
implementation is where an editor drive adding a path [add_directory() or
add_file()] checks whether the file already exists. The check at that point
should be encoding independent, for example by making all paths NFC (or NFD)
before comparison. You could use utf8proc (
http://www.flexiguided.de/publications.utf8proc.en.html) to do the
normalization - it's very light-weight in contrast to ICU which provides the
same fuctionality, but has a much broader scope.

The problem I was telling you about is that I was looking in libsvn_fs_base
to find where the existence check is performed, but I couldn't find it.

Basically what I was trying to do is: do what we do now (ie fail if the path
exists and succeed if it doesn't), with the only difference that the paths
used for comparison are guarenteed to be the same normalization - meaning
they are the same byte sequence when they're equal unicode.


Bye,


Erik.

Re: UTF-8 NFC/NFD paths issue

Erik Huelsmann wrote on Wed, Sep 15, 2010 at 23:20:06 +0200:
 Yesterday, I was talking to CMike about our long-standing issue with UTF-8
 strings designating a certain path not neccessarily being equal to other
 strings designating the same path. The issue has to do with NFC (composed)
 and NFD (decomposed) representation of Unicode characters. CMike nicely
 called the issue the Erik Huelsmann issue yesterday :-)
 
 The issue consists of two parts:
  1. The repository which should determine that paths being added by a commit
 are unique, regardless of their encoding (NFC/NFD)

Will you assume that all paths in the repository have been
Unicode-canonicalized prior to entering the repository?

If yes, then we infer that no two in-repository paths (which are
bytewise different) canonicalize to the same byte sequence.  Which is
pretty useful precondition to have, i.e., what /can/ svn do on a legacy
repository where some two paths are bytewise-different yet Unicode-equal?

  2. The client which should detect that the pathnames coming in from the
 filesystem may differ in encoding from what's in the working copy
 administrative files [this is mainly an issue on the Mac:
 http://subversion.tigris.org/issues/show_bug.cgi?id=2464]
 
...
 Basically what I was trying to do is: do what we do now (ie fail if the path
 exists and succeed if it doesn't), with the only difference that the paths
 used for comparison are guarenteed to be the same normalization - meaning
 they are the same byte sequence when they're equal unicode.

Re: UTF-8 NFC/NFD paths issue