RE: log --search test failures on trunk and 1.8.x
-Original Message- From: Stefan Sperling [mailto:s...@elego.de] Sent: maandag 22 april 2013 00:08 To: Ivan Zhakov Cc: Branko Čibej; Bert Huijben; dev@subversion.apache.org Subject: Re: log --search test failures on trunk and 1.8.x On Sun, Apr 21, 2013 at 07:11:14PM +0400, Ivan Zhakov wrote: So I propose the following plan: 1. Make 'log --search case-sensitive in trunk and 1.8.x. 2. Merge utf8proc stuff to trunk 3. Implement svn_utf__casefold() using utf8proc 4. Implement 'log --isearch' using No --isearch please. It did exist on trunk but we made --search case-insensitive in r1388530 to avoid having too many options. Has anyone tried my APR patch on windows yet? I'd be interested to know whether or not that helps... if you are running Windows and care about this issue, please let me know if my APR patch helps so that I can prepare a fix for APR and SVN 1.8.x. I think we can do better than making --search case-sensitive just because of a bug in APR. I don't see why we couldn't ship a private and fixed version of apr_fnmatch() for use by log --search in 1.8.x, to avoid undefined behaviour within tolower() via apr_fnmatch() without requiring future APR versions. Would this not be a good way of fixing this? What would this fix? This doesn't make apr_fnmatch use proper case folding, as that needs proper UTF-8 processing and following locale rules. The hack will make apr_fnmatch whatever is the current locale casing and current chararacter encoding rules on the platform for our UTF-8 characters in the log message. (Which are known to not always be strict utf-8). And even there it ignores multibyte characters by handing them to tolower one at a time. This is just stacking undefined behaviour. If you would first convert the characters back to the system encoding and then pass things to apr_fnmatch it would be good enough for an 'svn' implementation for me, but probably not for our libraries. Like Ivan suggested: for our libraries we want strict behavior over all platforms, no undefined behavior. Bert
Re: log --search test failures on trunk and 1.8.x
On Mon, Apr 22, 2013 at 11:22:11AM +0200, Bert Huijben wrote: -Original Message- From: Stefan Sperling [mailto:s...@elego.de] Sent: maandag 22 april 2013 00:08 To: Ivan Zhakov Cc: Branko Čibej; Bert Huijben; dev@subversion.apache.org Subject: Re: log --search test failures on trunk and 1.8.x On Sun, Apr 21, 2013 at 07:11:14PM +0400, Ivan Zhakov wrote: So I propose the following plan: 1. Make 'log --search case-sensitive in trunk and 1.8.x. 2. Merge utf8proc stuff to trunk 3. Implement svn_utf__casefold() using utf8proc 4. Implement 'log --isearch' using No --isearch please. It did exist on trunk but we made --search case-insensitive in r1388530 to avoid having too many options. Has anyone tried my APR patch on windows yet? I'd be interested to know whether or not that helps... if you are running Windows and care about this issue, please let me know if my APR patch helps so that I can prepare a fix for APR and SVN 1.8.x. I think we can do better than making --search case-sensitive just because of a bug in APR. I don't see why we couldn't ship a private and fixed version of apr_fnmatch() for use by log --search in 1.8.x, to avoid undefined behaviour within tolower() via apr_fnmatch() without requiring future APR versions. Would this not be a good way of fixing this? What would this fix? Our custom version of apr_fnmatch() would not pass values to tolower() which cannot be represented as unsigned char, thus eliminating undefined behaviour and avoiding assertion failures on Windows. We would put this custom fnmatch() into 'svn' (not the libraries) to specifically address the assertion failures caused by svn log --search. Is that more clear and does it make sense? This doesn't make apr_fnmatch use proper case folding, as that needs proper UTF-8 processing and following locale rules. AFAIK, we are talking about assertion failures in the test suite on Windows. The discussion has balooned into adding locale-specific case-folding to apr_fnmatch() in order to make log --search truly case-insensitive in all locales. But that is a separate issue from assertion failures in tests. I want to fix the assertions, I do not want to open up the can of worms required to have locale-aware case-insensitive matching in apr_fnmatch() (at least I don't want to open it right now -- perhaps later but it would be a change to APR, not Subversion).
RE: log --search test failures on trunk and 1.8.x
-Original Message- From: 'Stefan Sperling' [mailto:s...@elego.de] Sent: maandag 22 april 2013 11:46 To: Bert Huijben Cc: 'Ivan Zhakov'; 'Branko Čibej'; dev@subversion.apache.org Subject: Re: log --search test failures on trunk and 1.8.x On Mon, Apr 22, 2013 at 11:22:11AM +0200, Bert Huijben wrote: -Original Message- From: Stefan Sperling [mailto:s...@elego.de] Sent: maandag 22 april 2013 00:08 To: Ivan Zhakov Cc: Branko Čibej; Bert Huijben; dev@subversion.apache.org Subject: Re: log --search test failures on trunk and 1.8.x On Sun, Apr 21, 2013 at 07:11:14PM +0400, Ivan Zhakov wrote: So I propose the following plan: 1. Make 'log --search case-sensitive in trunk and 1.8.x. 2. Merge utf8proc stuff to trunk 3. Implement svn_utf__casefold() using utf8proc 4. Implement 'log --isearch' using No --isearch please. It did exist on trunk but we made --search case-insensitive in r1388530 to avoid having too many options. Has anyone tried my APR patch on windows yet? I'd be interested to know whether or not that helps... if you are running Windows and care about this issue, please let me know if my APR patch helps so that I can prepare a fix for APR and SVN 1.8.x. I think we can do better than making --search case-sensitive just because of a bug in APR. I don't see why we couldn't ship a private and fixed version of apr_fnmatch() for use by log --search in 1.8.x, to avoid undefined behaviour within tolower() via apr_fnmatch() without requiring future APR versions. Would this not be a good way of fixing this? What would this fix? Our custom version of apr_fnmatch() would not pass values to tolower() which cannot be represented as unsigned char, thus eliminating undefined behaviour and avoiding assertion failures on Windows. We would put this custom fnmatch() into 'svn' (not the libraries) to specifically address the assertion failures caused by svn log --search. Is that more clear and does it make sense? This doesn't make apr_fnmatch use proper case folding, as that needs proper UTF-8 processing and following locale rules. AFAIK, we are talking about assertion failures in the test suite on Windows. The discussion has balooned into adding locale-specific case-folding to apr_fnmatch() in order to make log --search truly case-insensitive in all locales. But that is a separate issue from assertion failures in tests. I want to fix the assertions, I do not want to open up the can of worms required to have locale-aware case-insensitive matching in apr_fnmatch() (at least I don't want to open it right now -- perhaps later but it would be a change to APR, not Subversion). The assertion shows a design problem which we should handle for future compatibility and you suggest just adding some bandages to patch/hide the test failure? The current code is broken and the suggestion you do is like the solution mostly vetoed by most of the responders in this thread: assuming there is only us-english, by using a function that has platform specific behavior. (tolower() is locale and platform character encoding dependent. You should never just pass individual UTF-8 bytes to it) Personally, I don't care if 'svn' does such an assumption about the world and leave the bugfixing to future releases, but libsvn_* should never have such assumptions in it. But then please just hardcode converting 'a-z' to 'A-Z' for the log messages instead of pretending tolower() handles everything else for you. This has strictly defined behavior and shows that we need a better implementation later. While using tolower() with a hack around it just ships undefined behavior. Bert
Re: log --search test failures on trunk and 1.8.x
On 22.04.2013 12:59, Bert Huijben wrote: -Original Message- From: 'Stefan Sperling' [mailto:s...@elego.de] Sent: maandag 22 april 2013 11:46 To: Bert Huijben Cc: 'Ivan Zhakov'; 'Branko Čibej'; dev@subversion.apache.org Subject: Re: log --search test failures on trunk and 1.8.x On Mon, Apr 22, 2013 at 11:22:11AM +0200, Bert Huijben wrote: -Original Message- From: Stefan Sperling [mailto:s...@elego.de] Sent: maandag 22 april 2013 00:08 To: Ivan Zhakov Cc: Branko Čibej; Bert Huijben; dev@subversion.apache.org Subject: Re: log --search test failures on trunk and 1.8.x On Sun, Apr 21, 2013 at 07:11:14PM +0400, Ivan Zhakov wrote: So I propose the following plan: 1. Make 'log --search case-sensitive in trunk and 1.8.x. 2. Merge utf8proc stuff to trunk 3. Implement svn_utf__casefold() using utf8proc 4. Implement 'log --isearch' using No --isearch please. It did exist on trunk but we made --search case-insensitive in r1388530 to avoid having too many options. Has anyone tried my APR patch on windows yet? I'd be interested to know whether or not that helps... if you are running Windows and care about this issue, please let me know if my APR patch helps so that I can prepare a fix for APR and SVN 1.8.x. I think we can do better than making --search case-sensitive just because of a bug in APR. I don't see why we couldn't ship a private and fixed version of apr_fnmatch() for use by log --search in 1.8.x, to avoid undefined behaviour within tolower() via apr_fnmatch() without requiring future APR versions. Would this not be a good way of fixing this? What would this fix? Our custom version of apr_fnmatch() would not pass values to tolower() which cannot be represented as unsigned char, thus eliminating undefined behaviour and avoiding assertion failures on Windows. We would put this custom fnmatch() into 'svn' (not the libraries) to specifically address the assertion failures caused by svn log --search. Is that more clear and does it make sense? This doesn't make apr_fnmatch use proper case folding, as that needs proper UTF-8 processing and following locale rules. AFAIK, we are talking about assertion failures in the test suite on Windows. The discussion has balooned into adding locale-specific case-folding to apr_fnmatch() in order to make log --search truly case-insensitive in all locales. But that is a separate issue from assertion failures in tests. I want to fix the assertions, I do not want to open up the can of worms required to have locale-aware case-insensitive matching in apr_fnmatch() (at least I don't want to open it right now -- perhaps later but it would be a change to APR, not Subversion). The assertion shows a design problem which we should handle for future compatibility and you suggest just adding some bandages to patch/hide the test failure? The current code is broken and the suggestion you do is like the solution mostly vetoed by most of the responders in this thread: assuming there is only us-english, by using a function that has platform specific behavior. (tolower() is locale and platform character encoding dependent. You should never just pass individual UTF-8 bytes to it) Personally, I don't care if 'svn' does such an assumption about the world and leave the bugfixing to future releases, but libsvn_* should never have such assumptions in it. But then please just hardcode converting 'a-z' to 'A-Z' for the log messages instead of pretending tolower() handles everything else for you. This has strictly defined behavior and shows that we need a better implementation later. While using tolower() with a hack around it just ships undefined behavior. I tend to agree. As it is, it would be better to make --search case-sensitive, or remove it from 1.8, than to tell people it's case-insensitive but just happens not to work as advertised. -- Brane -- Branko Čibej Director of Subversion | WANdisco | www.wandisco.com
Re: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py
On 04/20/2013 05:00 AM, Bert Huijben wrote: Instead of patching more and more corner cases with individual extra ra calls I think svn_client_log should call svn_client__repos_locations() once to obtain the history of the path to look at over the entire range of versions. (=over MAX(rev):MIN(rev)) and then use that information directly as the paths for the rest of the log calls. Agreed. Was thinking exactly the same thing. I wonder, though: isn't this rev-range support better situated in the RA layer itself, extended up to the server? I ask because IIRC, the fallback implementation of svn_ra_get_repos_locations() uses none other than the 'log' functionality. So I see this: BEST CASE: client's RA layer and server can handle log-with-ranges FALLBACK 1: client RA layer works around server's lack of log-with-ranges support by using get_locations() and a series of log requests FALLBACK 2: client RA layer works around server's lack of log-with-ranges support and lack of get_locations support by using a single log request from which disinteresting revisions are merely filtered out. -- C. Michael Pilato cmpil...@collab.net CollabNet www.collab.net Enterprise Cloud Development signature.asc Description: OpenPGP digital signature
RE: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py
-Original Message- From: C. Michael Pilato [mailto:cmpil...@collab.net] Sent: maandag 22 april 2013 15:46 To: Bert Huijben Cc: dev@subversion.apache.org Subject: Re: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py On 04/20/2013 05:00 AM, Bert Huijben wrote: Instead of patching more and more corner cases with individual extra ra calls I think svn_client_log should call svn_client__repos_locations() once to obtain the history of the path to look at over the entire range of versions. (=over MAX(rev):MIN(rev)) and then use that information directly as the paths for the rest of the log calls. Agreed. Was thinking exactly the same thing. I wonder, though: isn't this rev-range support better situated in the RA layer itself, extended up to the server? I ask because IIRC, the fallback implementation of svn_ra_get_repos_locations() uses none other than the 'log' functionality. So I see this: BEST CASE: client's RA layer and server can handle log-with-ranges FALLBACK 1: client RA layer works around server's lack of log-with-ranges support by using get_locations() and a series of log requests I think this should work for 1.5+ servers. FALLBACK 2: client RA layer works around server's lack of log-with-ranges support and lack of get_locations support by using a single log request from which disinteresting revisions are merely filtered out. +1 Given that we already branched for 1.8 I would suggest backporting FALLBACK 1 to 1.8 and leaving the BEST CASE for 1.9. But maybe if somebody starts now, he/she can get it ready for 1.8. (Would be nice to see the performance improvements sooner) The fallback 1/2 cases can be implemented as a bugfix even for 1.7 if there is enough interest as they don't change public APIs. Bert
Re: Pristine text missing - cleanup doesn't work
I have filed this as issue #4357, Pristine text missing - cleanup doesn't work, with a reference to this email thread. I thought of the following possible improvements, which I have noted in the doc string of pristine_cleanup_wcroot(): * TODO: At least check that any zero refcount is really correct, before * using it. * * TODO: Ideas for possible extra clean-up operations: * * * Check and correct all the refcounts. Identify any rows missing * from the 'pristine' table. (Create a temporary index for speed * if necessary?) * * * Check the checksums. (Very expensive to check them all, so find * a way to not check them all.) * * * Check for pristine files missing from disk but referenced in the * 'pristine' table. * * * Repair any pristine files missing from disk and/or rows missing * from the 'pristine' table and/or bad checksums. Generally * requires contacting the server, so requires support at a higher * level than this function. * * * Identify any pristine text files on disk that are not referenced * in the DB, and delete them. * * TODO: Provide feedback about any errors found and any corrections made. Also, here is the svn-fetch-pristine-by-sha1.sh script that I mentioned in my initial email but forgot to attach there. - Julian Johan Corveleyn wrote: On Thu, Apr 18, 2013 at 10:05 PM, Bert Huijben b...@qqmail.nl wrote: Johan Corveleyn reported that he heard multiple reports of this being caused by earlier SvnKit versions. I have not got a single bugreport of this type for AnkhSvn via our error reporting infrastructure. I don’t see how anybody using SQLite can cause this as the updating of the reference count is only handled by the triggers... But SvnKit implemented their own clone, which used to have problems in this field... Indeed. But those errors have since been fixed (have not seen such problems anymore in the last half year or so). However ... On Thu, Apr 18, 2013 at 9:55 PM, Julian Foad julianf...@btopenworld.com wrote: ... I think we should do something to improve the situation. Fortunately it seems to be relatively rare, as not many people have complained about it, but there are over 5000 pages listed by Google with that error message. Big +1 from me. See my own prodding for improvement in this area. http://svn.haxx.se/dev/archive-2012-09/0304.shtml http://mail-archives.apache.org/mod_mbox/subversion-dev/201206.mbox/%3CCAB84uBUJeZDHjhKP6Fz2hoU_g1oHUZyUq_wNVNC=ppyju7h...@mail.gmail.com%3E http://svn.haxx.se/dev/archive-2011-07/0001.shtml FWIW, I don't buy the argument that svn / sqlite can't break. Bugs (or rare cosmic ray disturbances or whatever) do happen. Having an 'svn repair' / 'svn validate' command or something (like SmartSVN's 'validate admin area') would definitely be nice.#!/bin/bash # Given a pristine file's SHA1, which is missing from the PRISTINE table but # referenced from the NODES table, download it into the pristine store and # insert a PRISTINE table row. SHA1=$1 set -e PRISTINE=.svn/pristine/${SHA1:0:2}/$SHA1.svn-base echo In NODES table: svnsqlite3 'select * from nodes where checksum=$sha1$'$SHA1'' RRP=$(svnsqlite3 'select repos_path from nodes where checksum=$sha1$'$SHA1'') REV=$(svnsqlite3 'select revision from nodes where checksum=$sha1$'$SHA1'') echo In PRISTINE table: svnsqlite3 'select * from pristine where checksum=$sha1$'$SHA1'' if [ -e $PRISTINE ]; then echo 2 Pristine file already exists: $PRISTINE exit 1 fi echo Downloading '^/$RRP@$REV'... svn cat ^/$RRP@$REV $PRISTINE V=$(sha1sum $PRISTINE); V=${V:0:40} echo SHA1: $V V=$(md5sum $PRISTINE); MD5=${V:0:32} echo MD5: $MD5 LEN=$(stat --printf=%s $PRISTINE) echo Len: $LEN echo Inserting PRISTINE table row... #echo svnsqlite3 'insert into pristine values ($sha1$'$SHA1', null, '$LEN', 1, $md5 $'$MD5')' svnsqlite3 'insert into pristine values ($sha1$'$SHA1', null, '$LEN', 1, $md5 $'$MD5')' svnsqlite3 'select * from pristine where checksum=$sha1$'$SHA1''
AW: Making the Windows Build Easier
Hi, Berts SharpSVN build also includes a way to pull all the dependencies, using msbuild. However, it does not work with the express editions yet, AFAICS, so your script clearly has at least one advantage. :-) Will it work using the VS 2010 command prompt? Best regards Markus Schaber CODESYS® a trademark of 3S-Smart Software Solutions GmbH Inspiring Automation Solutions 3S-Smart Software Solutions GmbH Dipl.-Inf. Markus Schaber | Product Development Core Technology Memminger Str. 151 | 87439 Kempten | Germany Tel. +49-831-54031-979 | Fax +49-831-54031-50 E-Mail: m.scha...@codesys.com | Web: http://www.codesys.com | CODESYS store: http://store.codesys.com CODESYS forum: http://forum.codesys.com Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915 -Ursprüngliche Nachricht- Von: Ben Reser [mailto:b...@reser.org] Gesendet: Sonntag, 14. April 2013 02:46 An: Subversion Development Betreff: Making the Windows Build Easier As we've discussed previously a big part of the problem in getting the Windows build working has been getting the dependencies built. Everyone I've talked to manages to get this done, then has no idea how to duplicate what they've done. This past week I wanted to use some static analysis tools that were only available on Windows. So I set out to build Subversion on Windows. I spent about 2 days fighting with it, taking notes the whole way so I could try and reproduce it. On the 3rd day I realized that taking notes wasn't going to work and that by the time I finished I'd have some instructions that left some important detail out or didn't explain it fully. So I started writing a script to automate this annoying process. The result is the build-svn-deps-win.pl script that I've commited in r1467714. This script is far from perfect. It still needs some work on it, but it's probably about 90% of the way there to turning setting up a Windows development environment into about an hour effort from a couple day slog that you have no hope of being able to reproduce. A lot of credit goes to pburba for his blog post here: http://blogs.collab.net/subversion/building-subversion-on-windows-a- walk-through Without it I would have spent even more time doing this than I did. So with all that said, here are some details. It downloads the following things: bdb zlib pcre httpd apr apr-util apr-iconv sqlite-amalgamation serf It builds all of the above expect for sqlite and serf which get built as part of the Subversion build itself (zlib could be built there as well but I built it so that mod_deflate would work in httpd). The script is built to use an entirely modern tool chain and dependencies. Everything is the current released version. That means httpd-2.4.4, Visual Studio 2012, etc... It does not build or deal with neon, though that could be trivially added. Unfortunately, it does not work with Visual Studio 2012 Windows Desktop Express due to the lack of devenv.(com|exe). However, the script is written to produce a the binaries in a way that they can be packaged. There's still some work to do here in making the package as minimal as possible (some of it possibly in our build system). At current the script only builds a Win32 Release build. But I expect to change that in the near future. So without further ado, here are some instructions: 1) Get the dependencies. If you have a full version of Visual Studio (not an Express version) you can follow a) or b) depending on your preference. For Visual Studio Express you have to follow b). a) Pick a location to do the build and make a directory to build the deps, in my case I put it in C:\Users\breser\svn-trunk-deps. Download the script at http://svn.apache.org/repos/asf/subversion/trunk/tools/dev/build-svn- deps-win.pl and put it in this directory. Open the script in a text editor and make sure you install the few dependencies you'll need (Perl, Python, 7-Zip and CMake). Python actually may not be needed for this but it's needed for the Subversion build itself. If presented the option to add the commands to your path accept it (Python and Perl will do this by default, CMake you have to explicitly decide to do this). 7-zip doesn't have one so make a note of where you installed it for later. Once you have the dependencies downloaded open the VS2012 x86 Native Tools Command Prompt (usually found in the Visual Studio Tools group). cd to your directory and run build-svn-deps-win.pl. If 7z.exe is not in C:\Program Files\7-Zip\7z.exe you can override this by passing SEVEN_ZIP=C:\Path\To\7z.exe to the script. The script will then download and build the dependencies for you. For me on my VM setup it takes about 45 minutes for this to run. b) Download http://ben.reser.org/svn-windows-deps/svn-trunk-deps-win32- release-20130413.7z and extract it. You can
RE: Pristine text missing - cleanup doesn't work
-Original Message- From: Julian Foad [mailto:julianf...@btopenworld.com] Sent: maandag 22 april 2013 16:34 To: Johan Corveleyn Cc: Bert Huijben; Subversion Development Subject: Re: Pristine text missing - cleanup doesn't work I have filed this as issue #4357, Pristine text missing - cleanup doesn't work, with a reference to this email thread. I thought of the following possible improvements, which I have noted in the doc string of pristine_cleanup_wcroot(): * TODO: At least check that any zero refcount is really correct, before * using it. We already do this in debug builds. The foreign key check enforces this, but at a certainly not-null performance cost (understatement). This is why I haven't enabled this for release builds. (The original reason for not enabling them was that it required a newer Sqlite version) You can't delete a PRISTINE row that is used with foreign keys enabled. (That part is currently not backed by an index... the performance killer. And optimized away by our reference counting via triggers) * TODO: Ideas for possible extra clean-up operations: * * * Check and correct all the refcounts. Identify any rows missing * from the 'pristine' table. (Create a temporary index for speed * if necessary?) The index you need is already there, or the triggers that update them on any change of NODES would be terribly slow. With foreign key checking enabled you can't add a row to NODES pointing to a non-existing PRISTINE row. * * * Check the checksums. (Very expensive to check them all, so find * a way to not check them all.) * * * Check for pristine files missing from disk but referenced in the * 'pristine' table. * * * Repair any pristine files missing from disk and/or rows missing * from the 'pristine' table and/or bad checksums. Generally * requires contacting the server, so requires support at a higher * level than this function. * * * Identify any pristine text files on disk that are not referenced * in the DB, and delete them. * * TODO: Provide feedback about any errors found and any corrections made. Also, here is the svn-fetch-pristine-by-sha1.sh script that I mentioned in my initial email but forgot to attach there. Bert
RE: Pristine text missing - cleanup doesn't work
-Original Message- From: Bert Huijben [mailto:b...@qqmail.nl] Sent: maandag 22 april 2013 17:18 To: 'Julian Foad'; 'Johan Corveleyn' Cc: 'Subversion Development' Subject: RE: Pristine text missing - cleanup doesn't work -Original Message- From: Julian Foad [mailto:julianf...@btopenworld.com] Sent: maandag 22 april 2013 16:34 To: Johan Corveleyn Cc: Bert Huijben; Subversion Development Subject: Re: Pristine text missing - cleanup doesn't work I have filed this as issue #4357, Pristine text missing - cleanup doesn't work, with a reference to this email thread. I thought of the following possible improvements, which I have noted in the doc string of pristine_cleanup_wcroot(): * TODO: At least check that any zero refcount is really correct, before * using it. We already do this in debug builds. The foreign key check enforces this, but at a certainly not-null performance cost (understatement). This is why I haven't enabled this for release builds. (The original reason for not enabling them was that it required a newer Sqlite version) You can't delete a PRISTINE row that is used with foreign keys enabled. (That part is currently not backed by an index... the performance killer. And optimized away by our reference counting via triggers) If we perform a check to verify the reference count before using it for cleanup we should just remove the reference count updating triggers. This gives a few percent performance boost to every update/add/delete on the NODES table. Given that we never update the reference count manually I would be very surprised if we could ever find a reason why somebody not using the Sqlite db directly (via Sqlite or a third party sqlite db implementation) would get the refcount in a broken state. Bert
Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
I noticed that a fresh 1.8.x (pre-release) WC has a different schema from a WC created by 1.7.8 and upgraded by 1.8.x. (I haven't tried different 1.7.x versions.) The differences are: --- schema-1.7.8-upgraded-to-1.8-dev +++ schema-1.8-dev - file_external TEXT, + file_external INTEGER, -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( - wc_id, parent_relpath); +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( + wc_id, parent_relpath, local_relpath); -CREATE INDEX I_NODES_PARENT ON NODES ( - wc_id, parent_relpath, op_depth); +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES ( + wc_id, parent_relpath, local_relpath, op_depth); This looks like a 1.8 release blocker. Filed as issue #4358 http://subversion.tigris.org/issues/show_bug.cgi?id=4358. - Julian -- Join WANdisco's free daily demo sessions on Scaling Subversion for the Enterprise http://www.wandisco.com/training/webinars
Re: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py
On Sat, Apr 20, 2013 at 5:00 AM, Bert Huijben b...@qqmail.nl wrote: -Original Message- From: pbu...@apache.org [mailto:pbu...@apache.org] Sent: vrijdag 19 april 2013 20:22 To: comm...@subversion.apache.org Subject: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py Author: pburba Date: Fri Apr 19 18:21:36 2013 New Revision: 1469982 URL: http://svn.apache.org/r1469982 Log: 2nd attempt to fix issue #4355 'svn_client_log5 broken with multiple revisions which span a rename'. Hi, This patch and the few before extend svn_client_log to call more and more history operations on the same path. Hi Bert, The few before? There is only one (rr1469515) that I see. Are you referring to work older than what I've done in the prior week? If so could you point out the specific revs? I want to be sure I know what you are referring to. (The path resolving via svn_client__repos_locations() is on the file system layer essentially the same thing as walking the history on the filesystem layer.) Instead of patching more and more corner cases with individual extra ra calls I think svn_client_log should call svn_client__repos_locations() once to obtain the history of the path to look at over the entire range of versions. (=over MAX(rev):MIN(rev)) and then use that information directly as the paths for the rest of the log calls. How exactly would svn_client__repos_locations be useful? Do you mean svn_ra_get_location_segments? libsvn_client's log operation is already a very slow operation over high latency links with mod_dav and this patching with extra ra requests is making it worse and worse. I reverted r1469515 and r1469982 (excepting the new log tests of course, I left those in), so we are back at square one. Performing this operation once should improve the log performance considerable, making the multi range version much better for script and api users. Bert On Mon, Apr 22, 2013 at 9:46 AM, C. Michael Pilato cmpil...@collab.net wrote: On 04/20/2013 05:00 AM, Bert Huijben wrote: Instead of patching more and more corner cases with individual extra ra calls I think svn_client_log should call svn_client__repos_locations() once to obtain the history of the path to look at over the entire range of versions. (=over MAX(rev):MIN(rev)) and then use that information directly as the paths for the rest of the log calls. Agreed. Was thinking exactly the same thing. I wonder, though: isn't this rev-range support better situated in the RA layer itself, extended up to the server? I ask because IIRC, the fallback implementation of svn_ra_get_repos_locations() You too mean svn_ra_get_location_segments or svn_ra_get_locations I assume? uses none other than the 'log' functionality. So I see this: BEST CASE: client's RA layer and server can handle log-with-ranges FALLBACK 1: client RA layer works around server's lack of log-with-ranges support by using get_locations() and a series of log requests FALLBACK 2: client RA layer works around server's lack of log-with-ranges support and lack of get_locations support by using a single log request from which disinteresting revisions are merely filtered out. -- Paul T. Burba CollabNet, Inc. -- www.collab.net -- Enterprise Cloud Development Skype: ptburba On Mon, Apr 22, 2013 at 10:21 AM, Bert Huijben b...@qqmail.nl wrote: -Original Message- From: C. Michael Pilato [mailto:cmpil...@collab.net] Sent: maandag 22 april 2013 15:46 To: Bert Huijben Cc: dev@subversion.apache.org Subject: Re: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py On 04/20/2013 05:00 AM, Bert Huijben wrote: Instead of patching more and more corner cases with individual extra ra calls I think svn_client_log should call svn_client__repos_locations() once to obtain the history of the path to look at over the entire range of versions. (=over MAX(rev):MIN(rev)) and then use that information directly as the paths for the rest of the log calls. Agreed. Was thinking exactly the same thing. I wonder, though: isn't this rev-range support better situated in the RA layer itself, extended up to the server? I ask because IIRC, the fallback implementation of svn_ra_get_repos_locations() uses none other than the 'log' functionality. So I see this: BEST CASE: client's RA layer and server can handle log-with-ranges FALLBACK 1: client RA layer works around server's lack of log-with-ranges support by using get_locations() and a series of log requests I think this should work for 1.5+ servers. FALLBACK 2: client RA layer works around server's lack of log-with-ranges support and lack of get_locations support by using a single
Re: Pristine text missing - cleanup doesn't work
Bert Huijben wrote: * TODO: At least check that any zero refcount is really correct, before * using it. We already do this in debug builds. Hmm, you're right, when using a current 1.8-dev build and a fresh WC. When I tested this scenario the other day it was in an old WC that has been upgraded from time to time, and is now supposedly at 1.8. There (I just repeated to make sure) if I set the refcount to 0, then 'svn cleanup' quietly deletes the 'pristine' table row and the file. The DB schema in that WC does not match what is should be for 1.8 -- it doesn't have any 'REFERENCES PRISTINE (checksum)' clauses in it. It may have got that way by being upgraded incrementally through several 1.7-era development versions, so we need not be too concerned about it. (While investigating, I discovered and filed issue #4358 Svn WC 1.8 upgrade from 1.7 - wrong schema, http://subversion.tigris.org/issues/show_bug.cgi?id=4358. The schema differences there do *not* include a missing 'REFERENCES' clause.) I tested with a checkout containing just one file in four freshly created WCs: - checkout with 1.6.20, upgrade with 1.8.x - checkout with 1.6.20, upgrade with 1.7.8, upgrade with 1.8.x - checkout with 1.7.8, upgrade with 1.8.x - checkout with 1.8.x and in each of these, I ran: - sqlite3 'update pristine set refcount=0' - svn cleanup # fails: 'foreign key constraint failed' - svn cat foo # succeeds - sqlite3 'update pristine set refcount=1' - svn cleanup # succeeds In each case, 'svn cleanup' errored out with an 'foreign key constraint failed' error, preventing any further damage. So that seems to be fairly safe when the foreign key constraint checking is enabled. The foreign key check enforces this, but at a certainly not-null performance cost (understatement). This is why I haven't enabled this for release builds. (The original reason for not enabling them was that it required a newer Sqlite version) You can't delete a PRISTINE row that is used with foreign keys enabled. (That part is currently not backed by an index... the performance killer. And optimized away by our reference counting via triggers) If we perform a check to verify the reference count before using it for cleanup we should just remove the reference count updating triggers. This gives a few percent performance boost to every update/add/delete on the NODES table. If we removed the ref-count-updating triggers, then we wouldn't be able to select pristines for deletion based on refount==0. What would we do instead? Given that we never update the reference count manually I would be very surprised if we could ever find a reason why somebody not using the Sqlite db directly (via Sqlite or a third party sqlite db implementation) would get the refcount in a broken state. Well, it has happened before and it will probably happen again. - Julian
Re: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py
On 04/22/2013 01:22 PM, Paul Burba wrote: I wonder, though: isn't this rev-range support better situated in the RA layer itself, extended up to the server? I ask because IIRC, the fallback implementation of svn_ra_get_repos_locations() You too mean svn_ra_get_location_segments or svn_ra_get_locations I assume? Yes, sorry. I confess I was trying to dredge up the API name from memory. It's svn_ra_get_location_segments() that I had in mind. Sorry for the confusion. -- C. Michael Pilato cmpil...@collab.net CollabNet www.collab.net Enterprise Cloud Development signature.asc Description: OpenPGP digital signature
Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
I (Julian Foad) wrote: I noticed that a fresh 1.8.x (pre-release) WC has a different schema from a WC created by 1.7.8 and upgraded by 1.8.x. (I haven't tried different 1.7.x versions.) The differences are: --- schema-1.7.8-upgraded-to-1.8-dev +++ schema-1.8-dev - file_external TEXT, + file_external INTEGER, -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( - wc_id, parent_relpath); +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( + wc_id, parent_relpath, local_relpath); -CREATE INDEX I_NODES_PARENT ON NODES ( - wc_id, parent_relpath, op_depth); +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES ( + wc_id, parent_relpath, local_relpath, op_depth); This looks like a 1.8 release blocker. Before anyone points out that these schema differences may not in themselves cause wrong functionality, I think the very fact that 'upgrade' doesn't make the schema identical is a blocker issue in itself. I don't know, and haven't tried to find out, whether there is any functional brokenness here. If the upgrade missed making these changes, then we definitely need to review this part of the upgrade to ensure it isn't missing any data changes that should be made along with these schema changes. - Julian Filed as issue #4358 http://subversion.tigris.org/issues/show_bug.cgi?id=4358.
RE: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py
-Original Message- From: Paul Burba [mailto:ptbu...@gmail.com] Sent: maandag 22 april 2013 19:23 To: Bert Huijben Cc: Michael Pilato; Subversion Development Subject: Re: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py On Sat, Apr 20, 2013 at 5:00 AM, Bert Huijben b...@qqmail.nl wrote: -Original Message- From: pbu...@apache.org [mailto:pbu...@apache.org] Sent: vrijdag 19 april 2013 20:22 To: comm...@subversion.apache.org Subject: svn commit: r1469982 - in /subversion/trunk/subversion: include/private/svn_client_private.h libsvn_client/log.c libsvn_client/ra.c tests/cmdline/log_tests.py Author: pburba Date: Fri Apr 19 18:21:36 2013 New Revision: 1469982 URL: http://svn.apache.org/r1469982 Log: 2nd attempt to fix issue #4355 'svn_client_log5 broken with multiple revisions which span a rename'. Hi, This patch and the few before extend svn_client_log to call more and more history operations on the same path. Hi Bert, The few before? There is only one (rr1469515) that I see. Are you referring to work older than what I've done in the prior week? If so could you point out the specific revs? I want to be sure I know what you are referring to. I remember some other recent changes to the log function (Related to its strange set of arguments), but I can't find the references here. (The path resolving via svn_client__repos_locations() is on the file system layer essentially the same thing as walking the history on the filesystem layer.) Instead of patching more and more corner cases with individual extra ra calls I think svn_client_log should call svn_client__repos_locations() once to obtain the history of the path to look at over the entire range of versions. (=over MAX(rev):MIN(rev)) and then use that information directly as the paths for the rest of the log calls. How exactly would svn_client__repos_locations be useful? Do you mean svn_ra_get_location_segments? svn_client__repos_locations() is a libsvn_client wrapper of svn_ra_get_locations(), which is similar to svn_ra_get_location_segments(), but returns result in a harder to use format. The reason I used that function in my e-mail is because that is the function that implements the real work inside resolve_rev_and_url() Bert libsvn_client's log operation is already a very slow operation over high latency links with mod_dav and this patching with extra ra requests is making it worse and worse. I reverted r1469515 and r1469982 (excepting the new log tests of course, I left those in), so we are back at square one. Performing this operation once should improve the log performance considerable, making the multi range version much better for script and api users. Bert
RE: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
-Original Message- From: Julian Foad [mailto:julianf...@btopenworld.com] Sent: maandag 22 april 2013 18:51 To: Subversion Development Subject: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema I noticed that a fresh 1.8.x (pre-release) WC has a different schema from a WC created by 1.7.8 and upgraded by 1.8.x. (I haven't tried different 1.7.x versions.) The differences are: --- schema-1.7.8-upgraded-to-1.8-dev +++ schema-1.8-dev - file_external TEXT, + file_external INTEGER, This doesn't matter for our use of Sqlite. We only use NULL vs set, but integer documents what we now store in file_external. Sqlite doesn't implement an ALTER table statement that can update this, so I don't think we should try to change this for 1.8. -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( - wc_id, parent_relpath); +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( + wc_id, parent_relpath, local_relpath); -CREATE INDEX I_NODES_PARENT ON NODES ( - wc_id, parent_relpath, op_depth); +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES ( + wc_id, parent_relpath, local_relpath, op_depth); I'll look into these tomorrow, but the performance difference is not that large. And as the names are 100% identical they can never cause problems upgrading to future versions. (And SQL guarantees that this can't affect our queries in any other way than performance) Bert
Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
Bert Huijben wrote: Julian Foad wrote: --- schema-1.7.8-upgraded-to-1.8-dev +++ schema-1.8-dev - file_external TEXT, + file_external INTEGER, This doesn't matter for our use of Sqlite. We only use NULL vs set, but integer documents what we now store in file_external. Sqlite doesn't implement an ALTER table statement that can update this, so I don't think we should try to change this for 1.8. -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( - wc_id, parent_relpath); +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( + wc_id, parent_relpath, local_relpath); -CREATE INDEX I_NODES_PARENT ON NODES ( - wc_id, parent_relpath, op_depth); +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES ( + wc_id, parent_relpath, local_relpath, op_depth); For 1.8, what's the point of still including 'parent_relpath' in the index, when we know that every 'local_relpath' value starts with 'parent_relpath'? Doesn't that just make the index a bit bigger and a bit slower than CREATE UNIQUE INDEX I_NODES_PARENT ON NODES ( wc_id, local_relpath, op_depth); ? I'll look into these tomorrow, but the performance difference is not that large. And as the names are 100% identical they can never cause problems upgrading to future versions. (And SQL guarantees that this can't affect our queries in any other way than performance) BTW, Brane has started working on this issue. - Julian
RE: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
-Original Message- From: Julian Foad [mailto:julianf...@btopenworld.com] Sent: maandag 22 april 2013 20:47 To: Bert Huijben Cc: 'Subversion Development' Subject: Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema Bert Huijben wrote: Julian Foad wrote: --- schema-1.7.8-upgraded-to-1.8-dev +++ schema-1.8-dev - file_external TEXT, + file_external INTEGER, This doesn't matter for our use of Sqlite. We only use NULL vs set, but integer documents what we now store in file_external. Sqlite doesn't implement an ALTER table statement that can update this, so I don't think we should try to change this for 1.8. -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( - wc_id, parent_relpath); +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( + wc_id, parent_relpath, local_relpath); -CREATE INDEX I_NODES_PARENT ON NODES ( - wc_id, parent_relpath, op_depth); +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES ( + wc_id, parent_relpath, local_relpath, op_depth); For 1.8, what's the point of still including 'parent_relpath' in the index, when we know that every 'local_relpath' value starts with 'parent_relpath'? Doesn't that just make the index a bit bigger and a bit slower than CREATE UNIQUE INDEX I_NODES_PARENT ON NODES ( wc_id, local_relpath, op_depth); We also have that index..., but we query nodes in 3 ways: * Where local_relpath = 'something' (exact lookup) * Where parent_relpath = 'something' (everything in a directory) * Where local_relpath './' and local_relpath '0' (all descendants) This index is used for that second variant and by making it a unique index instead of one with the same value multiple times it provides a stable order, cheaper index-updates (via exact lookup) and cheaper lookups for the case where we only want values that are cached in the index. ? I'll look into these tomorrow, but the performance difference is not that large. And as the names are 100% identical they can never cause problems upgrading to future versions. (And SQL guarantees that this can't affect our queries in any other way than performance) BTW, Brane has started working on this issue. There should be some examples in earlier format bumps where we drop one index and add another one. This change can safely made part of the last existing format bump for 1.8. Bert
Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
Bert Huijben wrote: Julian Foad wrote: -CREATE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( - wc_id, parent_relpath); +CREATE UNIQUE INDEX I_ACTUAL_PARENT ON ACTUAL_NODE ( + wc_id, parent_relpath, local_relpath); -CREATE INDEX I_NODES_PARENT ON NODES ( - wc_id, parent_relpath, op_depth); +CREATE UNIQUE INDEX I_NODES_PARENT ON NODES ( + wc_id, parent_relpath, local_relpath, op_depth); For 1.8, what's the point of still including 'parent_relpath' in the index, when we know that every 'local_relpath' value starts with 'parent_relpath'? Doesn't that just make the index a bit bigger and a bit slower than CREATE UNIQUE INDEX I_NODES_PARENT ON NODES ( wc_id, local_relpath, op_depth); We also have that index..., but we query nodes in 3 ways: * Where local_relpath = 'something' (exact lookup) * Where parent_relpath = 'something' (everything in a directory) * Where local_relpath './' and local_relpath '0' (all descendants) This index is used for that second variant and by making it a unique index instead of one with the same value multiple times it provides a stable order, cheaper index-updates (via exact lookup) and cheaper lookups for the case where we only want values that are cached in the index. That makes sense in itself, but then the original (wc_id, local_relpath, op_depth) index is redundant. So wouldn't it be better to add 'parent_relpath' to the primary key: - PRIMARY KEY (wc_id, local_relpath, op_depth) + PRIMARY KEY (wc_id, parent_relpath, local_relpath, op_depth) and not have a second index? Then there would only be one index to update, and all the other goodness would still be there. - Julian
Re: Pristine text missing - cleanup doesn't work
On Mon, Apr 22, 2013 at 4:34 PM, Julian Foad julianf...@btopenworld.com wrote: I have filed this as issue #4357, Pristine text missing - cleanup doesn't work, with a reference to this email thread. I thought of the following possible improvements, which I have noted in the doc string of pristine_cleanup_wcroot(): * TODO: At least check that any zero refcount is really correct, before * using it. * * TODO: Ideas for possible extra clean-up operations: * * * Check and correct all the refcounts. Identify any rows missing * from the 'pristine' table. (Create a temporary index for speed * if necessary?) * * * Check the checksums. (Very expensive to check them all, so find * a way to not check them all.) * * * Check for pristine files missing from disk but referenced in the * 'pristine' table. * * * Repair any pristine files missing from disk and/or rows missing * from the 'pristine' table and/or bad checksums. Generally * requires contacting the server, so requires support at a higher * level than this function. * * * Identify any pristine text files on disk that are not referenced * in the DB, and delete them. * * TODO: Provide feedback about any errors found and any corrections made. Also, here is the svn-fetch-pristine-by-sha1.sh script that I mentioned in my initial email but forgot to attach there. Great. Thanks for driving this, Julian. Naively I'd say that any validation / repair that SmartSVN does (and perhaps more) is interesting for inclusion in the core (or if not in the core, a tool that's supported by the Subversion project is an alternative). -- Johan
Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
Julian Foad wrote: Bert Huijben wrote: We also have that index..., but we query nodes in 3 ways: * Where local_relpath = 'something' (exact lookup) * Where parent_relpath = 'something' (everything in a directory) * Where local_relpath './' and local_relpath '0' (all descendants) This index is used for that second variant and by making it a unique index instead of one with the same value multiple times it provides a stable order, cheaper index-updates (via exact lookup) and cheaper lookups for the case where we only want values that are cached in the index. That makes sense in itself, but then the original (wc_id, local_relpath, op_depth) index is redundant. So wouldn't it be better to add 'parent_relpath' to the primary key: - PRIMARY KEY (wc_id, local_relpath, op_depth) + PRIMARY KEY (wc_id, parent_relpath, local_relpath, op_depth) and not have a second index? Then there would only be one index to update, and all the other goodness would still be there. Ah... but that index couldn't be used for queries that only provide a local_relpath. Although parent_relpath is a prefix of local_relpath, a plain 'local_relpath' index is ordered by lexical order where the '/' character is not special (ordering example: foo0bar, foo/bar, fooZbar), whereas the local relpaths in a (parent_relpath, local_relpath) index would be would be in a different order (foo/bar, foo0bar, fooZbar). I wish we indexed the tables by (parent_relpath, basename) instead of duplicating the whole parent_relpath in the local_relpath column; that would make things like this easier. Sorry for the noise. - Julian
Re: Issue #4358 - Svn WC 1.8 upgrade from 1.7 - wrong schema
On Mon, 22 Apr 2013, Bert Huijben wrote: Sqlite doesn't implement an ALTER table statement that can update this, so I don't think we should try to change this for 1.8. You can copy the data to a temporary table; drop the old table; create a new table with the new schema; and then copy everything back from the temporary table. Do it all in a transaction and it should be reasonably efficient. Like this: BEGIN; CREATE TEMP TABLE temp_foo AS SELECT * FROM foo; DROP TABLE foo; CREATE TABLE foo (col1 TYPE1, col2 TYPE2, col3 TYPE3); INSERT INTO foo SELECT col1, col2, col3 FROM temp_foo; DROP TABLE temp_foo; COMMIT; --apb (Alan Barrett)