RE: log --search test failures on trunk and 1.8.x

2013-04-22 Thread Bert Huijben


 -Original Message-
 From: Stefan Sperling [mailto:s...@elego.de]
 Sent: maandag 22 april 2013 00:08
 To: Ivan Zhakov
 Cc: Branko Čibej; Bert Huijben; dev@subversion.apache.org
 Subject: Re: log --search test failures on trunk and 1.8.x
 
 On Sun, Apr 21, 2013 at 07:11:14PM +0400, Ivan Zhakov wrote:
  So I propose the following plan:
  1. Make 'log --search case-sensitive in trunk and 1.8.x.
  2. Merge utf8proc stuff to trunk
  3. Implement svn_utf__casefold() using utf8proc
  4. Implement 'log --isearch' using
 
 No --isearch please. It did exist on trunk but we made --search
 case-insensitive in r1388530 to avoid having too many options.
 
 Has anyone tried my APR patch on windows yet? I'd be interested to
 know whether or not that helps... if you are running Windows and
 care about this issue, please let me know if my APR patch helps so
 that I can prepare a fix for APR and SVN 1.8.x. I think we can do
 better than making --search case-sensitive just because of a bug in APR.
 
 I don't see why we couldn't ship a private and fixed version of
 apr_fnmatch() for use by log --search in 1.8.x, to avoid undefined
 behaviour within tolower() via apr_fnmatch() without requiring future
 APR versions. Would this not be a good way of fixing this?

What would this fix?

This doesn't make apr_fnmatch use proper case folding, as that needs proper
UTF-8 processing and following locale rules. 

The hack will make apr_fnmatch whatever is the current locale casing and
current chararacter encoding rules on the platform for our UTF-8 characters
in the log message. (Which are known to not always be strict utf-8). And
even there it ignores multibyte characters by handing them to tolower one at
a time.

This is just stacking undefined behaviour. 

If you would first convert the characters back to the system encoding and
then pass things to apr_fnmatch it would be good enough for an 'svn'
implementation for me, but probably not for our libraries.

Like Ivan suggested: for our libraries we want strict behavior over all
platforms, no undefined behavior.

Bert





Re: log --search test failures on trunk and 1.8.x

2013-04-22 Thread 'Stefan Sperling'
On Mon, Apr 22, 2013 at 11:22:11AM +0200, Bert Huijben wrote:
  -Original Message-
  From: Stefan Sperling [mailto:s...@elego.de]
  Sent: maandag 22 april 2013 00:08
  To: Ivan Zhakov
  Cc: Branko Čibej; Bert Huijben; dev@subversion.apache.org
  Subject: Re: log --search test failures on trunk and 1.8.x
  
  On Sun, Apr 21, 2013 at 07:11:14PM +0400, Ivan Zhakov wrote:
   So I propose the following plan:
   1. Make 'log --search case-sensitive in trunk and 1.8.x.
   2. Merge utf8proc stuff to trunk
   3. Implement svn_utf__casefold() using utf8proc
   4. Implement 'log --isearch' using
  
  No --isearch please. It did exist on trunk but we made --search
  case-insensitive in r1388530 to avoid having too many options.
  
  Has anyone tried my APR patch on windows yet? I'd be interested to
  know whether or not that helps... if you are running Windows and
  care about this issue, please let me know if my APR patch helps so
  that I can prepare a fix for APR and SVN 1.8.x. I think we can do
  better than making --search case-sensitive just because of a bug in APR.
  
  I don't see why we couldn't ship a private and fixed version of
  apr_fnmatch() for use by log --search in 1.8.x, to avoid undefined
  behaviour within tolower() via apr_fnmatch() without requiring future
  APR versions. Would this not be a good way of fixing this?
 
 What would this fix?

Our custom version of apr_fnmatch() would not pass values to tolower()
which cannot be represented as unsigned char, thus eliminating undefined
behaviour and avoiding assertion failures on Windows. We would put this
custom fnmatch() into 'svn' (not the libraries) to specifically address
the assertion failures caused by svn log --search.

Is that more clear and does it make sense?

 This doesn't make apr_fnmatch use proper case folding, as that needs proper
 UTF-8 processing and following locale rules. 

AFAIK, we are talking about assertion failures in the test suite on
Windows. The discussion has balooned into adding locale-specific
case-folding to apr_fnmatch() in order to make log --search truly
case-insensitive in all locales. But that is a separate issue from
assertion failures in tests. I want to fix the assertions, I do not
want to open up the can of worms required to have locale-aware
case-insensitive matching in apr_fnmatch() (at least I don't want
to open it right now -- perhaps later but it would be a change to APR,
not Subversion).


RE: log --search test failures on trunk and 1.8.x

2013-04-22 Thread Bert Huijben


 -Original Message-
 From: 'Stefan Sperling' [mailto:s...@elego.de]
 Sent: maandag 22 april 2013 11:46
 To: Bert Huijben
 Cc: 'Ivan Zhakov'; 'Branko Čibej'; dev@subversion.apache.org
 Subject: Re: log --search test failures on trunk and 1.8.x
 
 On Mon, Apr 22, 2013 at 11:22:11AM +0200, Bert Huijben wrote:
   -Original Message-
   From: Stefan Sperling [mailto:s...@elego.de]
   Sent: maandag 22 april 2013 00:08
   To: Ivan Zhakov
   Cc: Branko Čibej; Bert Huijben; dev@subversion.apache.org
   Subject: Re: log --search test failures on trunk and 1.8.x
  
   On Sun, Apr 21, 2013 at 07:11:14PM +0400, Ivan Zhakov wrote:
So I propose the following plan:
1. Make 'log --search case-sensitive in trunk and 1.8.x.
2. Merge utf8proc stuff to trunk
3. Implement svn_utf__casefold() using utf8proc
4. Implement 'log --isearch' using
  
   No --isearch please. It did exist on trunk but we made --search
   case-insensitive in r1388530 to avoid having too many options.
  
   Has anyone tried my APR patch on windows yet? I'd be interested to
   know whether or not that helps... if you are running Windows and
   care about this issue, please let me know if my APR patch helps so
   that I can prepare a fix for APR and SVN 1.8.x. I think we can do
   better than making --search case-sensitive just because of a bug in APR.
  
   I don't see why we couldn't ship a private and fixed version of
   apr_fnmatch() for use by log --search in 1.8.x, to avoid undefined
   behaviour within tolower() via apr_fnmatch() without requiring future
   APR versions. Would this not be a good way of fixing this?
 
  What would this fix?
 
 Our custom version of apr_fnmatch() would not pass values to tolower()
 which cannot be represented as unsigned char, thus eliminating undefined
 behaviour and avoiding assertion failures on Windows. We would put this
 custom fnmatch() into 'svn' (not the libraries) to specifically address
 the assertion failures caused by svn log --search.
 
 Is that more clear and does it make sense?
 
  This doesn't make apr_fnmatch use proper case folding, as that needs
 proper
  UTF-8 processing and following locale rules.
 
 AFAIK, we are talking about assertion failures in the test suite on
 Windows. The discussion has balooned into adding locale-specific
 case-folding to apr_fnmatch() in order to make log --search truly
 case-insensitive in all locales. But that is a separate issue from
 assertion failures in tests. I want to fix the assertions, I do not
 want to open up the can of worms required to have locale-aware
 case-insensitive matching in apr_fnmatch() (at least I don't want
 to open it right now -- perhaps later but it would be a change to APR,
 not Subversion).

The assertion shows a design problem which we should handle for future 
compatibility and you suggest just adding some bandages to patch/hide the test 
failure?

The current code is broken and the suggestion you do is like the solution 
mostly vetoed by most of the responders in this thread: assuming there is only 
us-english, by using a function that has platform specific behavior.

(tolower() is locale and platform character encoding dependent. You should 
never just pass individual UTF-8 bytes to it)



Personally, I don't care if 'svn' does such an assumption about the world and 
leave the bugfixing to future releases, but libsvn_* should never have such 
assumptions in it. 

But then please just hardcode converting 'a-z' to 'A-Z' for the log messages 
instead of pretending tolower() handles everything else for you.  This has 
strictly defined behavior and shows that we need a better implementation later. 
While using tolower() with a hack around it just ships undefined behavior.


Bert



Re: log --search test failures on trunk and 1.8.x

2013-04-22 Thread Branko Čibej
On 22.04.2013 12:59, Bert Huijben wrote:

 -Original Message-
 From: 'Stefan Sperling' [mailto:s...@elego.de]
 Sent: maandag 22 april 2013 11:46
 To: Bert Huijben
 Cc: 'Ivan Zhakov'; 'Branko Čibej'; dev@subversion.apache.org
 Subject: Re: log --search test failures on trunk and 1.8.x

 On Mon, Apr 22, 2013 at 11:22:11AM +0200, Bert Huijben wrote:
 -Original Message-
 From: Stefan Sperling [mailto:s...@elego.de]
 Sent: maandag 22 april 2013 00:08
 To: Ivan Zhakov
 Cc: Branko Čibej; Bert Huijben; dev@subversion.apache.org
 Subject: Re: log --search test failures on trunk and 1.8.x

 On Sun, Apr 21, 2013 at 07:11:14PM +0400, Ivan Zhakov wrote:
 So I propose the following plan:
 1. Make 'log --search case-sensitive in trunk and 1.8.x.
 2. Merge utf8proc stuff to trunk
 3. Implement svn_utf__casefold() using utf8proc
 4. Implement 'log --isearch' using
 No --isearch please. It did exist on trunk but we made --search
 case-insensitive in r1388530 to avoid having too many options.

 Has anyone tried my APR patch on windows yet? I'd be interested to
 know whether or not that helps... if you are running Windows and
 care about this issue, please let me know if my APR patch helps so
 that I can prepare a fix for APR and SVN 1.8.x. I think we can do
 better than making --search case-sensitive just because of a bug in APR.

 I don't see why we couldn't ship a private and fixed version of
 apr_fnmatch() for use by log --search in 1.8.x, to avoid undefined
 behaviour within tolower() via apr_fnmatch() without requiring future
 APR versions. Would this not be a good way of fixing this?
 What would this fix?
 Our custom version of apr_fnmatch() would not pass values to tolower()
 which cannot be represented as unsigned char, thus eliminating undefined
 behaviour and avoiding assertion failures on Windows. We would put this
 custom fnmatch() into 'svn' (not the libraries) to specifically address
 the assertion failures caused by svn log --search.

 Is that more clear and does it make sense?

 This doesn't make apr_fnmatch use proper case folding, as that needs
 proper
 UTF-8 processing and following locale rules.
 AFAIK, we are talking about assertion failures in the test suite on
 Windows. The discussion has balooned into adding locale-specific
 case-folding to apr_fnmatch() in order to make log --search truly
 case-insensitive in all locales. But that is a separate issue from
 assertion failures in tests. I want to fix the assertions, I do not
 want to open up the can of worms required to have locale-aware
 case-insensitive matching in apr_fnmatch() (at least I don't want
 to open it right now -- perhaps later but it would be a change to APR,
 not Subversion).
 The assertion shows a design problem which we should handle for future 
 compatibility and you suggest just adding some bandages to patch/hide the 
 test failure?

 The current code is broken and the suggestion you do is like the solution 
 mostly vetoed by most of the responders in this thread: assuming there is 
 only us-english, by using a function that has platform specific behavior.

 (tolower() is locale and platform character encoding dependent. You should 
 never just pass individual UTF-8 bytes to it)



 Personally, I don't care if 'svn' does such an assumption about the world and 
 leave the bugfixing to future releases, but libsvn_* should never have such 
 assumptions in it. 

 But then please just hardcode converting 'a-z' to 'A-Z' for the log messages 
 instead of pretending tolower() handles everything else for you.  This has 
 strictly defined behavior and shows that we need a better implementation 
 later. While using tolower() with a hack around it just ships undefined 
 behavior.

I tend to agree. As it is, it would be better to make --search
case-sensitive, or remove it from 1.8, than to tell people it's
case-insensitive but just happens not to work as advertised.

-- Brane

-- 
Branko Čibej
Director of Subversion | WANdisco | www.wandisco.com



Re: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py

2013-04-22 Thread C. Michael Pilato
On 04/20/2013 05:00 AM, Bert Huijben wrote:
 Instead of patching more and more corner cases with individual extra ra
 calls I think svn_client_log should call svn_client__repos_locations()
 once to obtain the history of the path to look at over the entire range
 of versions. (=over MAX(rev):MIN(rev)) and then use that information
 directly as the paths for the rest of the log calls.

Agreed.  Was thinking exactly the same thing.

I wonder, though:  isn't this rev-range support better situated in the RA
layer itself, extended up to the server?  I ask because IIRC, the fallback
implementation of svn_ra_get_repos_locations() uses none other than the
'log' functionality.

So I see this:

  BEST CASE:  client's RA layer and server can handle log-with-ranges

  FALLBACK 1:  client RA layer works around server's lack of log-with-ranges
  support by using get_locations() and a series of log requests

  FALLBACK 2:  client RA layer works around server's lack of log-with-ranges
  support and lack of get_locations support by using a single
  log request from which disinteresting revisions are merely filtered
  out.


-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Enterprise Cloud Development



signature.asc
Description: OpenPGP digital signature


RE: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py

2013-04-22 Thread Bert Huijben


 -Original Message-
 From: C. Michael Pilato [mailto:cmpil...@collab.net]
 Sent: maandag 22 april 2013 15:46
 To: Bert Huijben
 Cc: dev@subversion.apache.org
 Subject: Re: svn commit: r1469982 - in /subversion/trunk/subversion:
 include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c
 tests/cmdline/log_tests.py
 
 On 04/20/2013 05:00 AM, Bert Huijben wrote:
  Instead of patching more and more corner cases with individual extra ra
  calls I think svn_client_log should call svn_client__repos_locations()
  once to obtain the history of the path to look at over the entire range
  of versions. (=over MAX(rev):MIN(rev)) and then use that information
  directly as the paths for the rest of the log calls.
 
 Agreed.  Was thinking exactly the same thing.
 
 I wonder, though:  isn't this rev-range support better situated in the RA
 layer itself, extended up to the server?  I ask because IIRC, the fallback
 implementation of svn_ra_get_repos_locations() uses none other than the
 'log' functionality.
 
 So I see this:
 
   BEST CASE:  client's RA layer and server can handle log-with-ranges
 
   FALLBACK 1:  client RA layer works around server's lack of log-with-ranges
   support by using get_locations() and a series of log requests

I think this should work for 1.5+ servers.

   FALLBACK 2:  client RA layer works around server's lack of log-with-ranges
   support and lack of get_locations support by using a single
   log request from which disinteresting revisions are merely filtered
   out.

+1

Given that we already branched for 1.8 I would suggest backporting FALLBACK 1 
to 1.8 and leaving the BEST CASE for 1.9.

But maybe if somebody starts now, he/she can get it ready for 1.8.
(Would be nice to see the performance improvements sooner)

The fallback 1/2 cases can be implemented as a bugfix even for 1.7 if there is 
enough interest as they don't change public APIs.

Bert



Re: Pristine text missing - cleanup doesn't work

2013-04-22 Thread Julian Foad
I have filed this as issue #4357, Pristine text missing - cleanup doesn't 
work, with a reference to this email thread.

I thought of the following possible improvements, which I have noted in the doc 
string of pristine_cleanup_wcroot():

 * TODO: At least check that any zero refcount is really correct, before
 *   using it.
 *
 * TODO: Ideas for possible extra clean-up operations:
 *
 *   * Check and correct all the refcounts.  Identify any rows missing
 * from the 'pristine' table.  (Create a temporary index for speed
 * if necessary?)
 *
 *   * Check the checksums.  (Very expensive to check them all, so find
 * a way to not check them all.)
 *
 *   * Check for pristine files missing from disk but referenced in the
 * 'pristine' table.
 *
 *   * Repair any pristine files missing from disk and/or rows missing
 * from the 'pristine' table and/or bad checksums.  Generally
 * requires contacting the server, so requires support at a higher
 * level than this function.
 *
 *   * Identify any pristine text files on disk that are not referenced
 * in the DB, and delete them.
 *
 * TODO: Provide feedback about any errors found and any corrections made.

Also, here is the svn-fetch-pristine-by-sha1.sh script that I mentioned in my 
initial email but forgot to attach there.

- Julian


Johan Corveleyn wrote:
 On Thu, Apr 18, 2013 at 10:05 PM, Bert Huijben b...@qqmail.nl wrote:
 Johan Corveleyn reported that he heard multiple reports of this being caused
 by earlier SvnKit versions. I have not got a single bugreport of this type
 for AnkhSvn via our error reporting infrastructure.

 I don’t see how anybody using SQLite can cause this as the updating of the
 reference count is only handled by the triggers...

 But SvnKit implemented their own clone, which used to have problems in this
 field...
 
 Indeed. But those errors have since been fixed (have not seen such
 problems anymore in the last half year or so).
 
 However ...
 
 On Thu, Apr 18, 2013 at 9:55 PM, Julian Foad julianf...@btopenworld.com 
 wrote:
 ...
 I think we should do something to improve the situation.  Fortunately it
 seems to be relatively rare, as not many people have complained about it,
 but there are over 5000 pages listed by Google with that error message.
 
 Big +1 from me.
 
 See my own prodding for improvement in this area.
 
http://svn.haxx.se/dev/archive-2012-09/0304.shtml
 http://mail-archives.apache.org/mod_mbox/subversion-dev/201206.mbox/%3CCAB84uBUJeZDHjhKP6Fz2hoU_g1oHUZyUq_wNVNC=ppyju7h...@mail.gmail.com%3E
 http://svn.haxx.se/dev/archive-2011-07/0001.shtml
 
 FWIW, I don't buy the argument that svn / sqlite can't break. Bugs (or
 rare cosmic ray disturbances or whatever) do happen. Having an 'svn
 repair' / 'svn validate' command or something (like SmartSVN's
 'validate admin area') would definitely be nice.#!/bin/bash
# Given a pristine file's SHA1, which is missing from the PRISTINE table but
# referenced from the NODES table, download it into the pristine store and
# insert a PRISTINE table row.

SHA1=$1

set -e

PRISTINE=.svn/pristine/${SHA1:0:2}/$SHA1.svn-base
echo In NODES table:
svnsqlite3 'select * from nodes where checksum=$sha1$'$SHA1''
RRP=$(svnsqlite3 'select repos_path from nodes where checksum=$sha1$'$SHA1'')
REV=$(svnsqlite3 'select revision from nodes where checksum=$sha1$'$SHA1'')
echo In PRISTINE table:
svnsqlite3 'select * from pristine where checksum=$sha1$'$SHA1''
if [ -e $PRISTINE ]; then
  echo 2 Pristine file already exists: $PRISTINE
  exit 1
fi
echo Downloading '^/$RRP@$REV'...
svn cat ^/$RRP@$REV  $PRISTINE
V=$(sha1sum $PRISTINE); V=${V:0:40}
echo SHA1: $V
V=$(md5sum $PRISTINE); MD5=${V:0:32}
echo MD5:  $MD5
LEN=$(stat --printf=%s $PRISTINE)
echo Len:  $LEN
echo Inserting PRISTINE table row...
#echo svnsqlite3 'insert into pristine values ($sha1$'$SHA1', null, '$LEN', 1, $md5 $'$MD5')'
svnsqlite3 'insert into pristine values ($sha1$'$SHA1', null, '$LEN', 1, $md5 $'$MD5')'
svnsqlite3 'select * from pristine where checksum=$sha1$'$SHA1''


AW: Making the Windows Build Easier

2013-04-22 Thread Markus Schaber
Hi,

Berts SharpSVN build also includes a way to pull all the dependencies, using 
msbuild.

However, it does not work with the express editions yet, AFAICS, so your script 
clearly has at least one advantage. :-)

Will it work using the VS 2010 command prompt?

Best regards

Markus Schaber

CODESYS® a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com | CODESYS store: 
http://store.codesys.com
CODESYS forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade 
register: Kempten HRB 6186 | Tax ID No.: DE 167014915

 -Ursprüngliche Nachricht-
 Von: Ben Reser [mailto:b...@reser.org]
 Gesendet: Sonntag, 14. April 2013 02:46
 An: Subversion Development
 Betreff: Making the Windows Build Easier
 
 As we've discussed previously a big part of the problem in getting the
 Windows build working has been getting the dependencies built.
 Everyone I've talked to manages to get this done, then has no idea how
 to duplicate what they've done.
 
 This past week I wanted to use some static analysis tools that were only
 available on Windows.  So I set out to build Subversion on Windows.  I
 spent about 2 days fighting with it, taking notes the whole way so I
 could try and reproduce it.  On the 3rd day I realized that taking notes
 wasn't going to work and that by the time I finished I'd have some
 instructions that left some important detail out or didn't explain it
 fully.  So I started writing a script to automate this annoying process.
 
 The result is the build-svn-deps-win.pl script that I've commited in
 r1467714.
 
 This script is far from perfect.  It still needs some work on it, but
 it's probably about 90% of the way there to turning setting up a Windows
 development environment into about an hour effort from a couple day slog
 that you have no hope of being able to reproduce.
 
 A lot of credit goes to pburba for his blog post here:
 http://blogs.collab.net/subversion/building-subversion-on-windows-a-
 walk-through
 
 Without it I would have spent even more time doing this than I did.
 
 So with all that said, here are some details.
 
 It downloads the following things:
 bdb
 zlib
 pcre
 httpd
 apr
 apr-util
 apr-iconv
 sqlite-amalgamation
 serf
 
 It builds all of the above expect for sqlite and serf which get built as
 part of the Subversion build itself (zlib could be built there as well
 but I built it so that mod_deflate would work in httpd).
 
 The script is built to use an entirely modern tool chain and
 dependencies.  Everything is the current released version.  That means
 httpd-2.4.4, Visual Studio 2012, etc...  It does not build or deal with
 neon, though that could be trivially added.
 
 Unfortunately, it does not work with Visual Studio 2012 Windows Desktop
 Express due to the lack of devenv.(com|exe).  However, the script is
 written to produce a the binaries in a way that they can be packaged.
 There's still some work to do here in making the package as minimal as
 possible (some of it possibly in our build system).
 
 At current the script only builds a Win32 Release build.  But I expect
 to change that in the near future.
 
 So without further ado, here are some instructions:
 
 1) Get the dependencies.  If you have a full version of Visual Studio
 (not an Express version) you can follow a) or b) depending on your
 preference.  For Visual Studio Express you have to follow b).
 
 a) Pick a location to do the build and make a directory to build the
 deps, in my case I put it in C:\Users\breser\svn-trunk-deps.  Download
 the script at
 http://svn.apache.org/repos/asf/subversion/trunk/tools/dev/build-svn-
 deps-win.pl
 and put it in this directory.  Open the script in a text editor and make
 sure you install the few dependencies you'll need (Perl, Python, 7-Zip
 and CMake).  Python actually may not be needed for this but it's needed
 for the Subversion build itself.  If presented the option to add the
 commands to your path accept it (Python and Perl will do this by default,
 CMake you have to explicitly decide to do this).  7-zip doesn't have one
 so make a note of where you installed it for later.
 Once you have the dependencies downloaded open the VS2012 x86 Native
 Tools Command Prompt (usually found in the Visual Studio Tools group).
 cd to your directory and run build-svn-deps-win.pl.  If 7z.exe is not in
 C:\Program Files\7-Zip\7z.exe you can override this by passing
 SEVEN_ZIP=C:\Path\To\7z.exe to the script.  The script will then
 download and build the dependencies for you.  For me on my VM setup it
 takes about 45 minutes for this to run.
 
 b) Download http://ben.reser.org/svn-windows-deps/svn-trunk-deps-win32-
 release-20130413.7z
 and extract it.  You can 

RE: Pristine text missing - cleanup doesn't work

2013-04-22 Thread Bert Huijben


 -Original Message-
 From: Julian Foad [mailto:julianf...@btopenworld.com]
 Sent: maandag 22 april 2013 16:34
 To: Johan Corveleyn
 Cc: Bert Huijben; Subversion Development
 Subject: Re: Pristine text missing - cleanup doesn't work
 
 I have filed this as issue #4357, Pristine text missing - cleanup doesn't 
 work,
 with a reference to this email thread.
 
 I thought of the following possible improvements, which I have noted in the
 doc string of pristine_cleanup_wcroot():
 
  * TODO: At least check that any zero refcount is really correct, before
  *   using it.

We already do this in debug builds.

The foreign key check enforces this, but at a certainly not-null performance 
cost (understatement). This is why I haven't enabled this for release builds.
(The original reason for not enabling them was that it required a newer Sqlite 
version)

You can't delete a PRISTINE row that is used with foreign keys enabled. (That 
part is currently not backed by an index... the performance killer. And 
optimized away by our reference counting via triggers)

  * TODO: Ideas for possible extra clean-up operations:
  *
  *   * Check and correct all the refcounts.  Identify any rows missing
  * from the 'pristine' table.  (Create a temporary index for speed
  * if necessary?)

The index you need is already there, or the triggers that update them on any 
change of NODES would be terribly slow.

With foreign key checking enabled you can't add a row to NODES pointing to a 
non-existing PRISTINE row.

  *
  *   * Check the checksums.  (Very expensive to check them all, so find
  * a way to not check them all.)
  *
  *   * Check for pristine files missing from disk but referenced in the
  * 'pristine' table.
  *
  *   * Repair any pristine files missing from disk and/or rows missing
  * from the 'pristine' table and/or bad checksums.  Generally
  * requires contacting the server, so requires support at a higher
  * level than this function.
  *
  *   * Identify any pristine text files on disk that are not referenced
  * in the DB, and delete them.
  *
  * TODO: Provide feedback about any errors found and any corrections
 made.
 
 Also, here is the svn-fetch-pristine-by-sha1.sh script that I mentioned in
 my initial email but forgot to attach there.

Bert



RE: Pristine text missing - cleanup doesn't work

2013-04-22 Thread Bert Huijben


 -Original Message-
 From: Bert Huijben [mailto:b...@qqmail.nl]
 Sent: maandag 22 april 2013 17:18
 To: 'Julian Foad'; 'Johan Corveleyn'
 Cc: 'Subversion Development'
 Subject: RE: Pristine text missing - cleanup doesn't work
 
 
 
  -Original Message-
  From: Julian Foad [mailto:julianf...@btopenworld.com]
  Sent: maandag 22 april 2013 16:34
  To: Johan Corveleyn
  Cc: Bert Huijben; Subversion Development
  Subject: Re: Pristine text missing - cleanup doesn't work
 
  I have filed this as issue #4357, Pristine text missing - cleanup doesn't
 work,
  with a reference to this email thread.
 
  I thought of the following possible improvements, which I have noted in
 the
  doc string of pristine_cleanup_wcroot():
 
   * TODO: At least check that any zero refcount is really correct, before
   *   using it.
 
 We already do this in debug builds.
 
 The foreign key check enforces this, but at a certainly not-null performance
 cost (understatement). This is why I haven't enabled this for release builds.
 (The original reason for not enabling them was that it required a newer Sqlite
 version)
 
 You can't delete a PRISTINE row that is used with foreign keys enabled. (That
 part is currently not backed by an index... the performance killer. And
 optimized away by our reference counting via triggers)

If we perform a check to verify the reference count before using it for cleanup 
we should just remove the reference count updating triggers. This gives a few 
percent performance boost to every update/add/delete on the NODES table.


Given that we never update the reference count manually I would be very 
surprised if we could ever find a reason why somebody not using the Sqlite db 
directly (via Sqlite or a third party sqlite db implementation) would get the 
refcount in a broken state.

Bert



Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema

2013-04-22 Thread Julian Foad
I noticed that a fresh 1.8.x (pre-release) WC has a different schema from a WC 
created by 1.7.8 and upgraded by 1.8.x.  (I haven't tried different 1.7.x 
versions.)  The differences are:


--- schema-1.7.8-upgraded-to-1.8-dev
+++ schema-1.8-dev

-   file_external  TEXT,
+   file_external  INTEGER,


-CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
-    wc_id, parent_relpath);
+CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
+    wc_id, parent_relpath, local_relpath);

-CREATE INDEX I_NODES_PARENT ON NODES (
-    wc_id, parent_relpath, op_depth);
+CREATE UNIQUE INDEX I_NODES_PARENT ON NODES (
+    wc_id, parent_relpath, local_relpath, op_depth);


This looks like a 1.8 release blocker.

Filed as issue #4358 http://subversion.tigris.org/issues/show_bug.cgi?id=4358.

- Julian


--
Join WANdisco's free daily demo sessions on Scaling Subversion for the 
Enterprise
http://www.wandisco.com/training/webinars



Re: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py

2013-04-22 Thread Paul Burba
On Sat, Apr 20, 2013 at 5:00 AM, Bert Huijben b...@qqmail.nl wrote:

 -Original Message-
 From: pbu...@apache.org [mailto:pbu...@apache.org]
 Sent: vrijdag 19 april 2013 20:22
 To: comm...@subversion.apache.org
 Subject: svn commit: r1469982 - in /subversion/trunk/subversion:
 include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c
 tests/cmdline/log_tests.py

 Author: pburba
 Date: Fri Apr 19 18:21:36 2013
 New Revision: 1469982

 URL: http://svn.apache.org/r1469982
 Log:
 2nd attempt to fix issue #4355 'svn_client_log5 broken with multiple
 revisions which span a rename'.

 Hi,

 This patch and the few before extend svn_client_log to call more and more 
 history operations on the same path.

Hi Bert,

The few before?  There is only one (rr1469515) that I see.  Are you
referring to work older than what I've done in the prior week?  If so
could you point out the specific revs? I want to be sure I know what
you are referring to.

 (The path resolving via svn_client__repos_locations() is on the file system 
 layer essentially the same thing as walking the history on the filesystem 
 layer.)

 Instead of patching more and more corner cases with individual extra ra calls 
 I think svn_client_log should call svn_client__repos_locations()  once to 
 obtain the history of the path to look at over the entire range of versions. 
 (=over MAX(rev):MIN(rev)) and then use that information directly as the paths 
 for the rest of the log calls.

How exactly would svn_client__repos_locations be useful?  Do you mean
svn_ra_get_location_segments?

 libsvn_client's log operation is already a very slow operation over high 
 latency links with mod_dav and this patching with extra ra requests is making 
 it worse and worse.

I reverted r1469515 and r1469982 (excepting the new log tests of
course, I left those in), so we are back at square one.

 Performing this operation once should improve the log performance 
 considerable, making the multi range version much better for script and api 
 users.

 Bert


On Mon, Apr 22, 2013 at 9:46 AM, C. Michael Pilato cmpil...@collab.net wrote:
 On 04/20/2013 05:00 AM, Bert Huijben wrote:
 Instead of patching more and more corner cases with individual extra ra
 calls I think svn_client_log should call svn_client__repos_locations()
 once to obtain the history of the path to look at over the entire range
 of versions. (=over MAX(rev):MIN(rev)) and then use that information
 directly as the paths for the rest of the log calls.

 Agreed.  Was thinking exactly the same thing.

 I wonder, though:  isn't this rev-range support better situated in the RA
 layer itself, extended up to the server?  I ask because IIRC, the fallback
 implementation of svn_ra_get_repos_locations()

You too mean svn_ra_get_location_segments or svn_ra_get_locations I assume?

  uses none other than the 'log' functionality.

 So I see this:

   BEST CASE:  client's RA layer and server can handle log-with-ranges

   FALLBACK 1:  client RA layer works around server's lack of log-with-ranges
   support by using get_locations() and a series of log requests

   FALLBACK 2:  client RA layer works around server's lack of log-with-ranges
   support and lack of get_locations support by using a single
   log request from which disinteresting revisions are merely filtered
   out.

--
Paul T. Burba
CollabNet, Inc. -- www.collab.net -- Enterprise Cloud Development
Skype: ptburba

On Mon, Apr 22, 2013 at 10:21 AM, Bert Huijben b...@qqmail.nl wrote:


 -Original Message-
 From: C. Michael Pilato [mailto:cmpil...@collab.net]
 Sent: maandag 22 april 2013 15:46
 To: Bert Huijben
 Cc: dev@subversion.apache.org
 Subject: Re: svn commit: r1469982 - in /subversion/trunk/subversion:
 include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c
 tests/cmdline/log_tests.py

 On 04/20/2013 05:00 AM, Bert Huijben wrote:
  Instead of patching more and more corner cases with individual extra ra
  calls I think svn_client_log should call svn_client__repos_locations()
  once to obtain the history of the path to look at over the entire range
  of versions. (=over MAX(rev):MIN(rev)) and then use that information
  directly as the paths for the rest of the log calls.

 Agreed.  Was thinking exactly the same thing.

 I wonder, though:  isn't this rev-range support better situated in the RA
 layer itself, extended up to the server?  I ask because IIRC, the fallback
 implementation of svn_ra_get_repos_locations() uses none other than the
 'log' functionality.

 So I see this:

   BEST CASE:  client's RA layer and server can handle log-with-ranges

   FALLBACK 1:  client RA layer works around server's lack of log-with-ranges
   support by using get_locations() and a series of log requests

 I think this should work for 1.5+ servers.

   FALLBACK 2:  client RA layer works around server's lack of log-with-ranges
   support and lack of get_locations support by using a single

Re: Pristine text missing - cleanup doesn't work

2013-04-22 Thread Julian Foad
Bert Huijben wrote:
  * TODO: At least check that any zero refcount is really correct, before
  *       using it.
 
  We already do this in debug builds.

Hmm, you're right, when using a current 1.8-dev build and a fresh WC.

When I tested this scenario the other day it was in an old WC that has been 
upgraded from time to time, and is now supposedly at 1.8.  There (I just 
repeated to make sure) if I set the refcount to 0, then 'svn cleanup' quietly 
deletes the 'pristine' table row and the file.

The DB schema in that WC does not match what is should be for 1.8 -- it doesn't 
have any 'REFERENCES PRISTINE (checksum)' clauses in it.  It may have got that 
way by being upgraded incrementally through several 1.7-era development 
versions, so we need not be too concerned about it.

(While investigating, I discovered and filed issue #4358 Svn WC 1.8 upgrade 
from 1.7 - wrong schema, 
http://subversion.tigris.org/issues/show_bug.cgi?id=4358.  The schema 
differences there do *not* include a missing 'REFERENCES' clause.)

I tested with a checkout containing just one file in four freshly created WCs:

- checkout with 1.6.20, upgrade with 1.8.x
- checkout with 1.6.20, upgrade with 1.7.8, upgrade with 1.8.x
- checkout with 1.7.8, upgrade with 1.8.x
- checkout with 1.8.x

and in each of these, I ran:

- sqlite3 'update pristine set refcount=0'
- svn cleanup  # fails: 'foreign key constraint failed'
- svn cat foo  # succeeds
- sqlite3 'update pristine set refcount=1'
- svn cleanup  # succeeds

In each case, 'svn cleanup' errored out with an 'foreign key constraint failed' 
error, preventing any further damage.

So that seems to be fairly safe when the foreign key constraint checking is 
enabled.

  The foreign key check enforces this, but at a certainly not-null performance
  cost (understatement). This is why I haven't enabled this for release 
builds.
  (The original reason for not enabling them was that it required a newer 
Sqlite
  version)
 
  You can't delete a PRISTINE row that is used with foreign keys enabled. 
(That
  part is currently not backed by an index... the performance killer. And
  optimized away by our reference counting via triggers)
 
 If we perform a check to verify the reference count before using it for 
 cleanup 
 we should just remove the reference count updating triggers. This gives a few 
 percent performance boost to every update/add/delete on the NODES table.

If we removed the ref-count-updating triggers, then we wouldn't be able to 
select pristines for deletion based on refount==0.  What would we do instead?

 Given that we never update the reference count manually I would be very 
 surprised if we could ever find a reason why somebody not using the Sqlite db 
 directly (via Sqlite or a third party sqlite db implementation) would get the 
 refcount in a broken state.

Well, it has happened before and it will probably happen again.

- Julian



Re: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py

2013-04-22 Thread C. Michael Pilato
On 04/22/2013 01:22 PM, Paul Burba wrote:
 I wonder, though:  isn't this rev-range support better situated in the RA
 layer itself, extended up to the server?  I ask because IIRC, the fallback
 implementation of svn_ra_get_repos_locations()
 
 You too mean svn_ra_get_location_segments or svn_ra_get_locations I assume?

Yes, sorry.  I confess I was trying to dredge up the API name from memory.
It's svn_ra_get_location_segments() that I had in mind.  Sorry for the
confusion.

-- 
C. Michael Pilato cmpil...@collab.net
CollabNet  www.collab.net  Enterprise Cloud Development



signature.asc
Description: OpenPGP digital signature


Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema

2013-04-22 Thread Julian Foad
I (Julian Foad) wrote:

 I noticed that a fresh 1.8.x (pre-release) WC has a different schema from a 
 WC 
 created by 1.7.8 and upgraded by 1.8.x.  (I haven't tried different 1.7.x 
 versions.)  The differences are:
 
 --- schema-1.7.8-upgraded-to-1.8-dev
 +++ schema-1.8-dev
 
 -   file_external  TEXT,
 +   file_external  INTEGER,
 
 -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
 -    wc_id, parent_relpath);
 +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
 +    wc_id, parent_relpath, local_relpath);
 
 -CREATE INDEX I_NODES_PARENT ON NODES (
 -    wc_id, parent_relpath, op_depth);
 +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES (
 +    wc_id, parent_relpath, local_relpath, op_depth);
 
 
 This looks like a 1.8 release blocker.

Before anyone points out that these schema differences may not in themselves 
cause wrong functionality, I think the very fact that 'upgrade' doesn't make 
the schema identical is a blocker issue in itself.  I don't know, and haven't 
tried to find out, whether there is any functional brokenness here.  If the 
upgrade missed making these changes, then we definitely need to review this 
part of the upgrade to ensure it isn't missing any data changes that should be 
made along with these schema changes.

- Julian


 Filed as issue #4358 
 http://subversion.tigris.org/issues/show_bug.cgi?id=4358.


RE: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py

2013-04-22 Thread Bert Huijben


 -Original Message-
 From: Paul Burba [mailto:ptbu...@gmail.com]
 Sent: maandag 22 april 2013 19:23
 To: Bert Huijben
 Cc: Michael Pilato; Subversion Development
 Subject: Re: svn commit: r1469982 - in /subversion/trunk/subversion:
 include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c
 tests/cmdline/log_tests.py
 
 On Sat, Apr 20, 2013 at 5:00 AM, Bert Huijben b...@qqmail.nl wrote:
 
  -Original Message-
  From: pbu...@apache.org [mailto:pbu...@apache.org]
  Sent: vrijdag 19 april 2013 20:22
  To: comm...@subversion.apache.org
  Subject: svn commit: r1469982 - in /subversion/trunk/subversion:
  include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c
  tests/cmdline/log_tests.py
 
  Author: pburba
  Date: Fri Apr 19 18:21:36 2013
  New Revision: 1469982
 
  URL: http://svn.apache.org/r1469982
  Log:
  2nd attempt to fix issue #4355 'svn_client_log5 broken with multiple
  revisions which span a rename'.
 
  Hi,
 
  This patch and the few before extend svn_client_log to call more and more
 history operations on the same path.
 
 Hi Bert,
 
 The few before?  There is only one (rr1469515) that I see.  Are you
 referring to work older than what I've done in the prior week?  If so
 could you point out the specific revs? I want to be sure I know what
 you are referring to.

I remember some other recent changes to the log function (Related to its 
strange set of arguments), but I can't find the references here.
 
  (The path resolving via svn_client__repos_locations() is on the file system
 layer essentially the same thing as walking the history on the filesystem
 layer.)
 
  Instead of patching more and more corner cases with individual extra ra
 calls I think svn_client_log should call svn_client__repos_locations()  once 
 to
 obtain the history of the path to look at over the entire range of versions.
 (=over MAX(rev):MIN(rev)) and then use that information directly as the
 paths for the rest of the log calls.
 
 How exactly would svn_client__repos_locations be useful?  Do you mean
 svn_ra_get_location_segments?

svn_client__repos_locations() is a libsvn_client wrapper of 
svn_ra_get_locations(), which is similar to svn_ra_get_location_segments(), but 
returns result in a harder to use format. The reason I used that function in my 
e-mail is because that is the function that implements the real work inside 
resolve_rev_and_url()

Bert

 
  libsvn_client's log operation is already a very slow operation over high
 latency links with mod_dav and this patching with extra ra requests is making
 it worse and worse.
 
 I reverted r1469515 and r1469982 (excepting the new log tests of
 course, I left those in), so we are back at square one.
 
  Performing this operation once should improve the log performance
 considerable, making the multi range version much better for script and api
 users.
 
  Bert
 



RE: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema

2013-04-22 Thread Bert Huijben


 -Original Message-
 From: Julian Foad [mailto:julianf...@btopenworld.com]
 Sent: maandag 22 april 2013 18:51
 To: Subversion Development
 Subject: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
 
 I noticed that a fresh 1.8.x (pre-release) WC has a different schema from
a
 WC created by 1.7.8 and upgraded by 1.8.x.  (I haven't tried different
1.7.x
 versions.)  The differences are:
 
 
 --- schema-1.7.8-upgraded-to-1.8-dev
 +++ schema-1.8-dev
 
 -   file_external  TEXT,
 +   file_external  INTEGER,

This doesn't matter for our use of Sqlite. We only use NULL vs set, but
integer documents what we now store in file_external.

Sqlite doesn't implement an ALTER table statement that can update this, so I
don't think we should try to change this for 1.8.

 
 
 -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
 -    wc_id, parent_relpath);
 +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
 +    wc_id, parent_relpath, local_relpath);
 
 -CREATE INDEX I_NODES_PARENT ON NODES (
 -    wc_id, parent_relpath, op_depth);
 +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES (
 +    wc_id, parent_relpath, local_relpath, op_depth);

I'll look into these tomorrow, but the performance difference is not that
large. And as the names are 100% identical they can never cause problems
upgrading to future versions. (And SQL guarantees that this can't affect our
queries in any other way than performance)

Bert



Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema

2013-04-22 Thread Julian Foad
Bert Huijben wrote:

 Julian Foad wrote:
  --- schema-1.7.8-upgraded-to-1.8-dev
  +++ schema-1.8-dev
 
  -   file_external  TEXT,
  +   file_external  INTEGER,
 
 This doesn't matter for our use of Sqlite. We only use NULL vs set, but
 integer documents what we now store in file_external.
 
 Sqlite doesn't implement an ALTER table statement that can update this, so I
 don't think we should try to change this for 1.8.

  -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
  -    wc_id, parent_relpath);
  +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
  +    wc_id, parent_relpath, local_relpath);
 
  -CREATE INDEX I_NODES_PARENT ON NODES (
  -    wc_id, parent_relpath, op_depth);
  +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES (
  +    wc_id, parent_relpath, local_relpath, op_depth);

For 1.8, what's the point of still including 'parent_relpath' in the index, 
when we know that every 'local_relpath' value starts with 'parent_relpath'?  
Doesn't that just make the index a bit bigger and a bit slower than

CREATE UNIQUE INDEX I_NODES_PARENT ON NODES (
    wc_id, local_relpath, op_depth);

?

 I'll look into these tomorrow, but the performance difference is not that
 large. And as the names are 100% identical they can never cause problems
 upgrading to future versions. (And SQL guarantees that this can't affect our
 queries in any other way than performance)

BTW, Brane has started working on this issue.

- Julian


RE: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema

2013-04-22 Thread Bert Huijben


 -Original Message-
 From: Julian Foad [mailto:julianf...@btopenworld.com]
 Sent: maandag 22 april 2013 20:47
 To: Bert Huijben
 Cc: 'Subversion Development'
 Subject: Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
 
 Bert Huijben wrote:
 
  Julian Foad wrote:
   --- schema-1.7.8-upgraded-to-1.8-dev
   +++ schema-1.8-dev
 
   -   file_external  TEXT,
   +   file_external  INTEGER,
 
  This doesn't matter for our use of Sqlite. We only use NULL vs set, but
  integer documents what we now store in file_external.
 
  Sqlite doesn't implement an ALTER table statement that can update this,
so
 I
  don't think we should try to change this for 1.8.
 
   -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
   -    wc_id, parent_relpath);
   +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
   +    wc_id, parent_relpath, local_relpath);
 
   -CREATE INDEX I_NODES_PARENT ON NODES (
   -    wc_id, parent_relpath, op_depth);
   +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES (
   +    wc_id, parent_relpath, local_relpath, op_depth);
 
 For 1.8, what's the point of still including 'parent_relpath' in the
index, when
 we know that every 'local_relpath' value starts with
 'parent_relpath'?  Doesn't that just make the index a bit bigger and a bit
 slower than
 
 CREATE UNIQUE INDEX I_NODES_PARENT ON NODES (
     wc_id, local_relpath, op_depth);

We also have that index..., but we query nodes in 3 ways:

* Where local_relpath = 'something' (exact lookup)
* Where parent_relpath = 'something' (everything in a directory)
* Where local_relpath  './' and local_relpath  '0' (all
descendants)

This index is used for that second variant and by making it a unique index
instead of one with the same value multiple times it provides a stable
order, cheaper index-updates (via exact lookup) and cheaper lookups for the
case where we only want values that are cached in the index.

 
 ?
 
  I'll look into these tomorrow, but the performance difference is not
that
  large. And as the names are 100% identical they can never cause problems
  upgrading to future versions. (And SQL guarantees that this can't affect
our
  queries in any other way than performance)
 
 BTW, Brane has started working on this issue.

There should be some examples in earlier format bumps where we drop one
index and add another one.

This change can safely made part of the last existing format bump for 1.8.

Bert



Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema

2013-04-22 Thread Julian Foad
Bert Huijben wrote:

 Julian Foad wrote:
  -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
  -    wc_id, parent_relpath);
  +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE (
  +    wc_id, parent_relpath, local_relpath);

  -CREATE INDEX I_NODES_PARENT ON NODES (
  -    wc_id, parent_relpath, op_depth);
  +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES (
  +    wc_id, parent_relpath, local_relpath, op_depth);
 
  For 1.8, what's the point of still including 'parent_relpath' in the
 index, when  we know that every 'local_relpath' value starts with
  'parent_relpath'?  Doesn't that just make the index a bit bigger and
 a bit  slower than
 
  CREATE UNIQUE INDEX I_NODES_PARENT ON NODES (
      wc_id, local_relpath, op_depth);
 
 We also have that index..., but we query nodes in 3 ways:
 
 * Where local_relpath = 'something' (exact lookup)
 * Where parent_relpath = 'something' (everything in a directory)
 * Where local_relpath  './' and local_relpath  '0' (all
 descendants)
 
 This index is used for that second variant and by making it a unique index
 instead of one with the same value multiple times it provides a stable
 order, cheaper index-updates (via exact lookup) and cheaper lookups for the
 case where we only want values that are cached in the index.

That makes sense in itself, but then the original (wc_id, local_relpath, 
op_depth) index is redundant.  So wouldn't it be better to add 'parent_relpath' 
to the primary key:

-  PRIMARY KEY (wc_id, local_relpath, op_depth)
+  PRIMARY KEY (wc_id, parent_relpath, local_relpath, op_depth)

and not have a second index?  Then there would only be one index to update, and 
all the other goodness would still be there.

- Julian


Re: Pristine text missing - cleanup doesn't work

2013-04-22 Thread Johan Corveleyn
On Mon, Apr 22, 2013 at 4:34 PM, Julian Foad julianf...@btopenworld.com wrote:
 I have filed this as issue #4357, Pristine text missing - cleanup doesn't 
 work, with a reference to this email thread.

 I thought of the following possible improvements, which I have noted in the 
 doc string of pristine_cleanup_wcroot():

  * TODO: At least check that any zero refcount is really correct, before
  *   using it.
  *
  * TODO: Ideas for possible extra clean-up operations:
  *
  *   * Check and correct all the refcounts.  Identify any rows missing
  * from the 'pristine' table.  (Create a temporary index for speed
  * if necessary?)
  *
  *   * Check the checksums.  (Very expensive to check them all, so find
  * a way to not check them all.)
  *
  *   * Check for pristine files missing from disk but referenced in the
  * 'pristine' table.
  *
  *   * Repair any pristine files missing from disk and/or rows missing
  * from the 'pristine' table and/or bad checksums.  Generally
  * requires contacting the server, so requires support at a higher
  * level than this function.
  *
  *   * Identify any pristine text files on disk that are not referenced
  * in the DB, and delete them.
  *
  * TODO: Provide feedback about any errors found and any corrections made.

 Also, here is the svn-fetch-pristine-by-sha1.sh script that I mentioned in 
 my initial email but forgot to attach there.


Great. Thanks for driving this, Julian.

Naively I'd say that any validation / repair that SmartSVN does (and
perhaps more) is interesting for inclusion in the core (or if not in
the core, a tool that's supported by the Subversion project is an
alternative).

--
Johan


Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema

2013-04-22 Thread Julian Foad
Julian Foad wrote:
 Bert Huijben wrote:
  We also have that index..., but we query nodes in 3 ways:
 
  * Where local_relpath = 'something' (exact lookup)
  * Where parent_relpath = 'something' (everything in a directory)
  * Where local_relpath  './' and local_relpath  '0' (all
  descendants)
 
  This index is used for that second variant and by making it a unique index
  instead of one with the same value multiple times it provides a stable
  order, cheaper index-updates (via exact lookup) and cheaper lookups for the
  case where we only want values that are cached in the index.
 
 That makes sense in itself, but then the original (wc_id, local_relpath, 
 op_depth) index is redundant.  So wouldn't it be better to add 
 'parent_relpath' to the primary key:
 
 -  PRIMARY KEY (wc_id, local_relpath, op_depth)
 +  PRIMARY KEY (wc_id, parent_relpath, local_relpath, op_depth)
 
 and not have a second index?  Then there would only be one index to update, 
 and 
 all the other goodness would still be there.

Ah... but that index couldn't be used for queries that only provide a 
local_relpath.  Although parent_relpath is a prefix of local_relpath, a plain 
'local_relpath' index is ordered by lexical order where the '/' character is 
not special (ordering example: foo0bar, foo/bar, fooZbar), whereas the local 
relpaths in a (parent_relpath, local_relpath) index would be would be in a 
different order (foo/bar, foo0bar, fooZbar).

I wish we indexed the tables by (parent_relpath, basename) instead of 
duplicating the whole parent_relpath in the local_relpath column; that would 
make things like this easier.

Sorry for the noise.

- Julian



Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema

2013-04-22 Thread Alan Barrett

On Mon, 22 Apr 2013, Bert Huijben wrote:

Sqlite doesn't implement an ALTER table statement that can update this, so I
don't think we should try to change this for 1.8.


You can copy the data to a temporary table; drop the old table; create a
new table with the new schema; and then copy everything back from the
temporary table.  Do it all in a transaction and it should be reasonably
efficient.  Like this:

BEGIN;
CREATE TEMP TABLE temp_foo AS SELECT * FROM foo;
DROP TABLE foo;
CREATE TABLE foo (col1 TYPE1, col2 TYPE2, col3 TYPE3);
INSERT INTO foo SELECT col1, col2, col3 FROM temp_foo;
DROP TABLE temp_foo;
COMMIT;

--apb (Alan Barrett)