Re: [fossil-users] can fossil try harder on sync failure?
Thus said Matt Welland on Wed, 16 Apr 2014 09:01:28 -0700:

> Could fossil silently retry a couple times instead of giving up so
> easily?

Not silent, but it can retry:

http://www.fossil-scm.org/index.html/info/76bc297e96211b50d7b7e518ba45663c80889f1f

This still won't avoid the occasional fork if the user answers ``Yes'' to the question, but it will try as many times as you configure it to try.

Andy
--
TAI64 timestamp: 4000539a77ca

___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On 02/05/14 19:57, Andy Bradford wrote:
>> Artifacts sent: 0 received: 895
>> Error: Database error: database is locked: {UPDATE event SET
>> mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN
>> (SELECT mid FROM time_fudge);} #key_3_2
>> [...]
>> As you say, it is highly reproducible, but it requires quite a bit of
>> time to trigger sometimes.
>
> This particular error hasn't come up since this checkin (which didn't
> make it into Fossil 1.28, so it's only in trunk or on branch-1.28):
> http://www.fossil-scm.org/index.html/info/b4dffdac5e706980d911a0e672526ad461ec0640
> I wonder if you could try again with a build from trunk?

I've been using later versions of fossil for both the NetBSD and pkgsrc repositories since this discussion took place, and I have had one {COMMIT} error, but other than that it has worked great. I'm so happy to be able to nuke the git repositories I have been using as a work-around.

I'm very, very happy about this fix -- it changes a lot for me (all of it for the better).

--
Kind Regards,
Jan
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, May 9, 2014 at 5:08 AM, Andy Bradford <amb-fos...@bradfords.org> wrote:
> Thus said Doug Franklin on Thu, 08 May 2014 23:00:03 -0400:
>> Does SQLite support nested transactions? If so, that would seem to be
>> worth considering.
> It does appear to support them:
> https://www.sqlite.org/lang_transaction.html

It doesn't directly support them, but fossil/libfossil add a level of abstraction which simulates them. The notable requirement is that one use the [lib]fossil C APIs to begin/end transactions, as opposed to using BEGIN/END directly. Fossil has an assertion in place to catch the case where COMMIT is called directly from SQL code while a C-initiated transaction is open.

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
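[For illustration, here is a minimal sketch of how such an abstraction can simulate nested transactions with a depth counter, roughly in the spirit of fossil's db_begin_transaction()/db_end_transaction(). The stub db_exec() and the variable names are assumptions for this sketch, not fossil's actual code.]

```c
#include <assert.h>
#include <string.h>

/* Sketch: simulate nested transactions over a database that only
** supports one open transaction at a time. A nesting counter ensures
** that only the outermost begin/end pair emits a real BEGIN/COMMIT.
** db_exec() is a stand-in for real SQL execution; it just records
** the last statement that would have run. */

static int nTxnLevel = 0;      /* current nesting depth */
static char zLastSql[64];      /* last statement "executed" */

static void db_exec(const char *zSql){
  strncpy(zLastSql, zSql, sizeof(zLastSql)-1);
}

void db_begin_transaction(void){
  if( nTxnLevel==0 ) db_exec("BEGIN");   /* real BEGIN only at depth 0 */
  nTxnLevel++;
}

void db_end_transaction(int rollbackFlag){
  assert( nTxnLevel>0 );
  nTxnLevel--;
  if( nTxnLevel==0 ){
    /* only the outermost end actually commits (or rolls back) */
    db_exec(rollbackFlag ? "ROLLBACK" : "COMMIT");
  }
  /* inner "ends" are no-ops, which is what simulates nesting */
}
```

This also shows why calling BEGIN/COMMIT directly from SQL would break the scheme: a direct COMMIT would end the single real transaction while the counter still believes work is in flight.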
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, May 7, 2014 at 12:59 AM, Andy Bradford <amb-sendok-1402034378.gfecjnjggaibliman...@bradfords.org> wrote:
> Thus said Rich Neswold on Wed, 16 Apr 2014 15:40:23 -0500:
>> It would be nice if fossil would break the pull into smaller
>> transactions which contain valid timeline commits so, if there's a
>> database timeout, the next time I try to pull it can continue where
>> it left off.
> I've been working a bit on implementing a per round-trip commit as
> suggested by Richard and it does commit in smaller transactions,
> though not all of them will be valid timeline commits:
> http://www.fossil-scm.org/index.html/info/d02f144d708e89299ae28a2b99eeb829a6799c5f
> Basically it does a commit each round trip and defers execution of
> hooks until the last round-trip happens. I'm not convinced this is
> correct behavior---specifically, should it execute them even if there
> is an error during sync?

I was thinking of attacking the problem a little higher up (since I'm way too nervous to touch the low-level stuff): the idea is to add a command line option to indicate that you want a partial sync (e.g. --pull-limit 1). This option would only be honored for pulls -- if pushes are occurring, ignore the option, because it complicates finding an interruption point for both pulls and pushes.

Process cards as they come in and decrement the counter when a card that represents a checkpoint has been completed. When the counter reaches zero, break the outer loop (set 'go' to 0):

https://www.fossil-scm.org/index.html/artifact/dace4194506b2ea732ca27f68300b156816e403a?ln=1482

When the loop is exited, all the database closing hooks are run and we simply haven't transferred all the history. Issuing another pull will transfer N more artifacts. Eventually, the full history will be transferred. Of course, if the command line option isn't given, then process cards until the sender says they're done sending.
--
Rich
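[The --pull-limit idea above can be sketched as a small loop. This is a hypothetical illustration only: the Card types, aStream, and sync_loop() are invented stand-ins, not fossil's real xfer code, which processes protocol cards streamed from the server.]

```c
#include <assert.h>

/* Sketch: process incoming cards, count completed checkpoints, and
** stop the outer sync loop once the limit is reached. */

typedef enum { CARD_FILE, CARD_CHECKPOINT, CARD_DONE } CardType;

/* A canned stream of cards, simulating what the server sends. */
static const CardType aStream[] = {
  CARD_FILE, CARD_FILE, CARD_CHECKPOINT,
  CARD_FILE, CARD_CHECKPOINT,
  CARD_FILE, CARD_DONE
};

/* Returns the number of cards processed before the loop exits.
** pullLimit<=0 means "no limit": run until the sender is done. */
int sync_loop(int pullLimit){
  int go = 1;          /* outer loop flag, as in client_sync() */
  int nProcessed = 0;
  int i = 0;
  while( go ){
    CardType c = aStream[i++];
    nProcessed++;
    if( c==CARD_DONE ){
      go = 0;                       /* sender has nothing more to send */
    }else if( c==CARD_CHECKPOINT && pullLimit>0 ){
      if( --pullLimit==0 ) go = 0;  /* partial sync: stop here */
    }
  }
  return nProcessed;
}
```

A subsequent invocation would then resume from the first un-transferred artifact, so repeated runs of `fossil pull` eventually transfer the full history.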
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Rich Neswold on Thu, 08 May 2014 15:18:43 -0500:

> I was thinking of attacking the problem a little higher up (since I'm
> way too nervous touching the low-level stuff):

So did I initially, though my first thought was simply to have autosync try multiple times when failing (in the autosync-tries branch). Then Richard mentioned that it could be done by simply committing more frequently, and so I focused on that approach. I think it actually works quite well, and I even added some protections to handle corner cases where a user might receive a partial sync but then attempt to update/merge to a checkin that is not complete:

http://www.fossil-scm.org/index.html/info/f2adddfe601d33c98974f9c645e8aceb9622aa86

One is free to force the update/merge if one desires with the --force-missing option.

It would be interesting to get some actual testing with the repository that was mentioned to roll back after a 1GB sync, to see how it does. Make sure it's a spare clone repository just in case, though I haven't seen any problems in my testing.

Thoughts?

Andy
--
TAI64 timestamp: 4000536c3088
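[The autosync-tries idea mentioned above amounts to a bounded retry loop. A minimal sketch, assuming an invented try_sync() stand-in for the real sync call and a configurable maximum:]

```c
#include <assert.h>

/* Sketch: retry a failing autosync a configurable number of times
** before giving up, instead of failing on the first error. */

static int nFailuresLeft;   /* test harness: how many syncs fail first */

static int try_sync(void){
  if( nFailuresLeft>0 ){ nFailuresLeft--; return 0; }  /* failure */
  return 1;                                            /* success */
}

/* Attempt the sync up to maxTries times; return the attempt number
** on which it succeeded, or 0 if every attempt failed. */
int autosync_with_retries(int maxTries){
  int i;
  for(i=1; i<=maxTries; i++){
    if( try_sync() ) return i;   /* succeeded on attempt i */
  }
  return 0;                      /* all attempts failed */
}
```

Note the contrast with the per-round-trip-commit approach: retrying repeats the whole exchange, while committing per round-trip preserves the progress already made.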
Re: [fossil-users] can fossil try harder on sync failure?
On 2014-05-08 16:18, Rich Neswold wrote:
> On Wed, May 7, 2014 at 12:59 AM, Andy Bradford wrote:
>> Thus said Rich Neswold on Wed, 16 Apr 2014 15:40:23 -0500:
>>> It would be nice if fossil would break the pull into smaller
>>> transactions which contain valid timeline commits so, if there's a
>>> database timeout, the next time I try to pull it can continue where
>>> it left off.

Does SQLite support nested transactions? If so, that would seem to be worth considering.

--
Thanks,
DougF (KG4LMZ)
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Doug Franklin on Thu, 08 May 2014 23:00:03 -0400:
> Does SQLite support nested transactions? If so, that would seem to be
> worth considering.

It does appear to support them:

https://www.sqlite.org/lang_transaction.html

Andy
--
TAI64 timestamp: 4000536c46e4
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, May 8, 2014 at 10:08 PM, Andy Bradford <amb-fos...@bradfords.org> wrote:
> Thus said Doug Franklin on Thu, 08 May 2014 23:00:03 -0400:
>> Does SQLite support nested transactions? If so, that would seem to be
>> worth considering.
> It does appear to support them:
> https://www.sqlite.org/lang_transaction.html

I don't think nested transactions would help the problem I'm hoping will get solved.

--
Rich
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Rich Neswold on Wed, 16 Apr 2014 15:40:23 -0500:
> It would be nice if fossil would break the pull into smaller
> transactions which contain valid timeline commits so, if there's a
> database timeout, the next time I try to pull it can continue where it
> left off.

I've been working a bit on implementing a per round-trip commit as suggested by Richard, and it does commit in smaller transactions, though not all of them will be valid timeline commits:

http://www.fossil-scm.org/index.html/info/d02f144d708e89299ae28a2b99eeb829a6799c5f

Basically, it does a commit each round trip and defers execution of hooks until the last round-trip happens. I'm not convinced this is correct behavior---specifically, should it execute them even if there is an error during sync?

Also, there is one potential surprise factor involved after a partial sync occurs, but it's hard to predict how often it will actually happen. It's possible that there are phantoms in the repository that will manifest themselves if the transfer is interrupted at just the right time, autosync is turned off, and one attempts to update to a version that has those phantoms. It seems that this particular behavior has been in Fossil since 2011, but it was perhaps difficult to expose because Fossil would roll back the entire sync if there were any failures. It won't result in data loss, but it may make things confusing if a commit is made while in this state, as the files will show up as Deleted in the checkin even though there was no indication that they would be deleted (except the warnings/REMOVE that happened during the update).

When the phantoms are encountered when running ``fossil update,'' you will see a warning about ``content missing'' and Fossil will then remove the files from the current checkout and report them as being REMOVEd. ``fossil status,'' however, will not know about that and will report that the current checkout is up-to-date.
Here's the relevant code:

http://www.fossil-scm.org/index.html/artifact/64d8e49634442edde612084f8b60f4185630d8be?ln=108,111

I'm not sure what the correct behavior should be. If we remove the continue on line 110, fossil will not remove the files, but will attempt to merge an empty file with whatever exists (or replace the current file with a 0-byte file if it is current). Neither seems to be the optimal way to handle this. Another option would be to have ``fossil update'' abort when it sees phantoms, thus making it even more difficult to accidentally check in file deletes. Also, I'm not sure how much it would take to only accept ``valid timeline commits'' as you suggested.

Feedback would be appreciated.

Thanks,
Andy
--
TAI64 timestamp: 40005369cbeb
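[The "abort when it sees phantoms" option can be sketched as a pre-flight check. This is a hypothetical illustration: the File struct, contentMissing flag, and update_ok() are invented for the sketch and do not correspond to fossil's actual update code.]

```c
#include <assert.h>

/* Sketch: before touching the checkout, scan the target checkin for
** artifacts whose content is missing (phantoms) and refuse to proceed
** unless the user explicitly forced the update. This avoids silently
** REMOVE-ing files that could later be checked in as deletions. */

typedef struct {
  const char *zName;   /* file name in the target checkin */
  int contentMissing;  /* 1 if the artifact is only a phantom */
} File;

/* Returns 1 if the update may proceed, 0 if it should abort. */
int update_ok(const File *aFile, int nFile, int forceMissing){
  int i;
  for(i=0; i<nFile; i++){
    if( aFile[i].contentMissing && !forceMissing ){
      return 0;   /* phantom found: abort rather than remove files */
    }
  }
  return 1;
}
```

With such a check, a --force-missing style flag would restore the current permissive behavior for users who know the content will arrive on a later pull.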
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, May 7, 2014 at 1:59 AM, Andy Bradford <amb-fos...@bradfords.org> wrote:
> Basically it does a commit each round trip and defers execution of
> hooks until the last round-trip happens.

That is scary. The purpose of the hooks is to verify that all of the content in the repository is still accessible. Before each commit, the hooks run to verify that all of the artifacts can still be un-deltaed and uncompressed and that they survive those operations intact.

Suppose some future change to Fossil introduces a bug that causes the delta or compress operations to lose information so that historical artifacts are no longer recoverable. The hooks are intended to detect that problem *before* it can permanently damage the repository. Doing a commit without running the hooks disables that very important safety mechanism.

--
D. Richard Hipp
d...@sqlite.org
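[The safety net described here can be sketched as a queue of artifacts re-verified just before COMMIT. This is a simplified stand-in for the verify_before_commit() mechanism, with an invented artifact_ok() check; the real code re-extracts each artifact through the delta and decompression paths.]

```c
#include <assert.h>

/* Sketch: artifacts written during a transaction are queued, and just
** before COMMIT each one is re-extracted and checked. Any failure
** turns the intended COMMIT into a ROLLBACK, so a delta/compression
** bug cannot permanently damage the repository. */

#define MAX_QUEUE 16

static int aQueued[MAX_QUEUE];   /* rids of artifacts to re-verify */
static int nQueued = 0;

/* Called whenever an artifact is written during the transaction. */
void queue_for_verification(int rid){
  assert( nQueued < MAX_QUEUE );
  aQueued[nQueued++] = rid;
}

/* Stand-in for re-extracting an artifact; negative rid = corrupted. */
static int artifact_ok(int rid){ return rid >= 0; }

/* Run at commit time. Returns 1 to COMMIT, 0 to ROLLBACK.
** Clears the queue either way. */
int verify_at_commit(void){
  int i, ok = 1;
  for(i=0; i<nQueued; i++){
    if( !artifact_ok(aQueued[i]) ) ok = 0;  /* unrecoverable artifact */
  }
  nQueued = 0;
  return ok;
}
```

The point of the thread is visible in the structure: if commits happen per round-trip but this verification is deferred to the end, the window in which corruption can be durably committed grows accordingly.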
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Richard Hipp on Wed, 07 May 2014 07:06:31 -0400:
> The purpose of the hooks is to verify that all of the content in the
> repository is still accessible. Before each commit, the hooks run to
> verify that all of the artifacts can still be un-deltaed and
> uncompressed and they survive those operations intact.

Hmm, that does indeed sound problematic to disable, and it certainly should not be done if it can compromise the integrity of the artifacts. Perhaps I misunderstood the purpose of this block of code in manifest_crosslink_end():

http://www.fossil-scm.org/index.html/artifact/05e0e4bec391ca300d1a6fc30fc19c0a12454be1?ln=1506,1518

It's a simple change to restore these lines to call manifest_crosslink_end(MC_PERMIT_HOOKS) for each round-trip instead of just once at the end:

http://www.fossil-scm.org/index.html/artifact/ab14c3fbb94acf319a0bf4e60ba8c8f8b98975e1?ln=1917,1922

Thanks,
Andy
--
TAI64 timestamp: 4000536a4432
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, May 7, 2014 at 10:32 AM, Andy Bradford <amb-fos...@bradfords.org> wrote:
> Hmm, that does indeed sound problematic to be disabled and it
> certainly should not be done if it can compromise the integrity of
> the artifacts. Perhaps I misunderstood the purpose of this block of
> code in manifest_crosslink_end():
> http://www.fossil-scm.org/index.html/artifact/05e0e4bec391ca300d1a6fc30fc19c0a12454be1?ln=1506,1518

We might be talking about different hooks. I'm concerned about the verify_before_commit hook implemented here:

http://www.fossil-scm.org/fossil/artifact/615e25ed6?ln=94-104

--
D. Richard Hipp
d...@sqlite.org
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, May 7, 2014 at 5:02 PM, Richard Hipp <d...@sqlite.org> wrote:
> We might be talking about different hooks. I'm concerned about the
> verify_before_commit hook implemented here:
> http://www.fossil-scm.org/fossil/artifact/615e25ed6?ln=94-104

My understanding (from having elided it in libfossil) is that MC_PERMIT_HOOKS refers to commit hooks (TH1/TCL code).

Sidebar: the verify-before-commit hook was one of the first features libfossil got, because it's such a godsend to not have to worry so much before writing to the db.

--
- stephan beal
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Richard Hipp on Wed, 07 May 2014 11:02:55 -0400:
> We might be talking about different hooks. I'm concerned about the
> verify_before_commit hook implemented here:
> http://www.fossil-scm.org/fossil/artifact/615e25ed6?ln=94-104

Yes, it does appear that we were talking about different hooks. I did not alter anything with content_put_ex or verify_before_commit. I must admit, modifying this part of the code has been scary, so any code review is welcome. :-)

From what I could tell, the MC_PERMIT_HOOK passed into manifest_crosslink_end() mainly dealt with custom TH1 scripts that might be executed, but seemed to have no bearing on whether or not things should be COMMITed or ROLLBACKed.

Thanks for looking at it.

Andy
--
TAI64 timestamp: 4000536a5169
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Andy Bradford on 04 May 2014 22:44:21 -0600:
> What have I missed? Perhaps with the per round-trip commit it is not
> really necessary to also have a COMMIT if the network drops, as
> implemented in [1317331eed]?

OK, as it turns out, one potential resolution is to simply call fossil_fatal when autosync fails during update, here:

http://www.fossil-scm.org/index.html/artifact/f90dabeaf78a319b0b3b8791c0dded8d2f6170ec?ln=132

This does have one side-effect when autosync is enabled, but it does perhaps further distinguish update from checkout. The side-effect is that if we call fossil_fatal here, it will not be possible to update to a different revision than the current checkout while the network is down. It will be possible, however, to do a checkout (or, if autosync is disabled, updates work).

Andy
--
TAI64 timestamp: 400053673bb2
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Richard Hipp on Thu, 17 Apr 2014 11:33:59 -0400:
>> Would that be a valid strategy? Couldn't we end up with a partial
>> state which we can't work from until the pull finishes to completion?
> The logic (in manifest.c) is designed to be able to deal with partial
> state transfers. I'm not saying there are definitely no bugs, but I'm
> pretty sure it does work.

I've made some changes on the per-round-trip-commit branch that implement what you suggested, but I believe I've run into a potential bug---though perhaps introduced by my most recent changes. It now successfully COMMITs with each round-trip, and if I interrupt the transfer, the next time I pull it only gets those things that were missed in the previous sync. This part works great as far as I can tell!

However, with checkin [1317331eed] it now allows errors that happen during the http_exchange to actually be returned to the caller, so a network failure will also result in a COMMIT (rather than a fatal error). It seems that this latter behavior introduces (or uncovers) a serious problem. While the sync operation was running, if the network connection is killed at just the right moment, the sync operation does indeed fail, but then the last set of changes gets committed and the update continues as follows:

$ fossil up
Autosync: http://amb@remote:8080/
Round-trips: 11  Artifacts sent: 0  received: 101
server did not reply
Pull finished with 16512 bytes sent, 16267948 bytes received
Autosync failed
content missing for file.77
UPDATE file.1
...
REMOVE file.77
...
updated-to: 57487563d6208c04cbbeee3efa0280cb9166000a 2014-05-05 03:37:46 UTC
changes: 100 files modified.

Now, I may think that my repository is in a good state, but it is not, as one file is entirely missing, and fossil update will not restore it. If I do another update, Fossil shows that the rest of the missing artifacts were pulled down, but it still does not restore the files reported as having missing content.

Also, it does not even show that this file is missing (e.g. neither a MISSING file.1 nor a DELETE file.1), and if I make a change (say to file.1) and then checkin, the commit actually includes file.77 as a DELETED file, even though there was no visual indication that this event would occur. I even tried a rebuild, and it still will not restore the missing file. If I close/open the local repository, it actually does finally bring back the file:

$ fossil op ../clone.fossil
file.77
project-name: unnamed
repository:   /tmp/clone/../clone.fossil
local-root:   /tmp/clone/
config-db:    /home/amb/.fossil
project-code: 43748f4be07be41523019a2c4532effbc3f5a02f
checkout:     57487563d6208c04cbbeee3efa0280cb9166000a 2014-05-05 03:37:46 UTC
parent:       a2330f3775d7a939d9f0dd448bca639c1208505d 2014-05-05 03:30:47 UTC
leaf:         open
tags:         trunk
comment:      three (user: amb)
checkins:     4

Notice that the UUID matches that which was received during the sync operation. Any ideas what might be going on here? Sometimes I've seen as much as 75% of the files end up with ``content missing.'' If I compile without [1317331eed], this particular problem doesn't happen, and all http_exchange errors are treated as fatal, which results in an eventual ROLLBACK of the current round-trip.

What have I missed? Perhaps with the per round-trip commit it is not really necessary to also have a COMMIT if the network drops, as implemented in [1317331eed]?

Thanks,
Andy
--
TAI64 timestamp: 400053671747
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 10:12 AM, Rich Neswold <rich.nesw...@gmail.com> wrote:
> On Wed, Apr 16, 2014 at 3:40 PM, Rich Neswold wrote:
>> It would be nice if fossil would break the pull into smaller
>> transactions which contain valid timeline commits so, if there's a
>> database timeout, the next time I try to pull it can continue where
>> it left off.
> The first few times that my pulls failed, there was no obvious change
> to the timeline, so I assumed none of the data was being saved. After
> the last timeout, however, there were some new entries from the NetBSD
> project. So maybe new pulls start where the previous left off after
> all.

Although syncs/pulls appear to make progress even when a failure occurs, I'd still like to see fossil pull breaking the request into multiple smaller transactions. A single transaction for the entire request doesn't scale at all.

My main NetBSD fossil repo is 11G. I want to keep a copy on another machine so my local changes are stored in more than one location. Last night, my backup was 2G (because I hadn't synced in a while), so I started a fossil pull and then went home. This morning, the pull had been aborted by a signal 2, and my local directory showed the following:

[~/repo]$ ls -l
total 351919880
-rw-r--r-- 1 neswold   2335277056 May  1 16:03 netbsd.fossil
-rw-r--r-- 1 neswold      9273344 May  2 09:37 netbsd.fossil-shm
-rw-r--r-- 1 neswold 177838427024 May  2 06:32 netbsd.fossil-wal

That's right: my write-ahead file is 177 GB (16x the expected size of the final repository!) I'm running fossil sqlite and it's slowly trying to apply the transaction, but I really don't have any hope it will succeed -- I'm just curious what it will do. More than likely, I'll delete this repo and clone it again.

There have to be points during a sync/pull at which the target repository is in a stable, consistent state. The transaction could be committed there and a new one started.
Or maybe add a command-line option to pull/sync which lets the user select how many artifacts to pull over; then the user can run the command multiple times until nothing is left to transfer.

--
Rich
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, May 2, 2014 at 10:10 AM, Rich Neswold <rich.nesw...@gmail.com> wrote:
> That's right, my write-ahead file is 177 GB (16x the expected size of
> the final repository!) I'm doing a fossil sqlite and it's slowly
> trying to apply the transaction, but I really don't have any hope it
> will succeed -- I'm just curious what it will do.

It looks like fossil sqlite was simply verifying the integrity of the database. It ended up deleting the 177 GB of work it did overnight.

--
Rich
Re: [fossil-users] can fossil try harder on sync failure?
On 18/04/14 17:52, Matt Welland wrote:
> Just FYI, I'm seeing this kind of message quite often. This is due to
> overlapping clone operations on large fossils on relatively slow disk.
> [---]
> Artifacts sent: 0 received: 895
> Error: Database error: database is locked: {UPDATE event SET
> mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN
> (SELECT mid FROM time_fudge);} #key_3_2

That error is the reason I had to switch over to the git port of the NetBSD fossil repository. At first I thought it was a fossil-on-BSD problem, but I got it on Linux as well. As you say, it is highly reproducible, but it sometimes requires quite a bit of time to trigger. I'm not running on NFS, but I get the exact same behavior.

--
Kind Regards,
Jan
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Jan Danielsson on Fri, 02 May 2014 17:39:20 +0200:
> Artifacts sent: 0 received: 895
> Error: Database error: database is locked: {UPDATE event SET
> mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN
> (SELECT mid FROM time_fudge);} #key_3_2
> [...]
> As you say, it is highly reproducible, but it requires quite a bit of
> time to trigger sometimes.

This particular error hasn't come up since this checkin (which didn't make it into Fossil 1.28, so it's only in trunk or on branch-1.28):

http://www.fossil-scm.org/index.html/info/b4dffdac5e706980d911a0e672526ad461ec0640

I wonder if you could try again with a build from trunk?

Thanks,
Andy
--
TAI64 timestamp: 40005363dc90
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Matt Welland on Wed, 16 Apr 2014 09:01:28 -0700:
> Autosync: ssh://host/path/project.fossil
> Round-trips: 1  Artifacts sent: 0  received: 0
> Error: Database error: database is locked: {UPDATE event SET
> mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN
> (SELECT mid FROM time_fudge);}

Have you tried running the latest from trunk on your fossil server? You can test this easily, without impacting existing users via SSH, by installing the new version to a different location on the server and then cloning with a URL of:

fossil clone ssh://host/path/project.fossil?fossil=/path/to/new/fossil clone.fossil

I tried the latest from trunk and I don't see this particular error anymore. If this also goes away for you, then you simply need to update your servers (no client updates should be necessary).

Andy
--
TAI64 timestamp: 40005354cb1b
Re: [fossil-users] can fossil try harder on sync failure?
On Mon, Apr 21, 2014 at 12:38 AM, Andy Bradford <amb-fos...@bradfords.org> wrote:
> Have you tried running the latest from trunk on your fossil server?

Yes! This is fixed on latest! Any idea which commit fixes the problem? I guess we should switch to this not-officially-released version, but what other issues am I likely to run into? Strictly speaking, I'd feel more comfortable with a version 1.28 patched with whatever fixes the bug, rather than taking on the myriad of changes made since 1.28 was released. What do people advise?

> You can test this easily without impacting existing users via SSH by
> installing the new version to a different location on the server and
> then cloning with a URL of:
> fossil clone ssh://host/path/project.fossil?fossil=/path/to/new/fossil clone.fossil
> I tried the latest from trunk and I don't see this particular error
> anymore. If this also goes away for you, then you simply need to
> update your servers (no client updates should be necessary).

--
Matt
-=- 90% of the nations wealth is held by 2% of the people. Bummer to be in the majority...
Re: [fossil-users] can fossil try harder on sync failure?
On Mon, Apr 21, 2014 at 6:26 PM, Matt Welland <estifo...@gmail.com> wrote:
> What do people advise?

Historically speaking, there has been little or no reason not to rely on the tip of the trunk. Rarely, something gets put in which breaks the build, but that doesn't happen often and is always fixed quickly.

i've used Fossil daily since the end of 2007, and the only copy of the fossil binary on my machines is the one under my clone of the main repo. That has occasionally bitten me (requiring me to go download a binary), but only when i'm tinkering on fossil, can't compile it, and have already cleaned up the old binary (so can't stash/revert my changes).

--
- stephan beal
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Matt Welland on Mon, 21 Apr 2014 09:26:25 -0700:
> Yes! This is fixed on latest! Any idea which commit fixes the problem?

Will you tell me exactly which version of fossil it is? E.g., run ``fossil version'' with the fossil binary that exhibits the problem on the server.

Thanks,
Andy
--
TAI64 timestamp: 40005355f2da
Re: [fossil-users] can fossil try harder on sync failure?
On Mon, Apr 21, 2014 at 9:40 PM, Andy Bradford <amb-fos...@bradfords.org> wrote:
> Will you tell me exactly which version of fossil it is? E.g., run
> ``fossil version'' with the fossil binary that exhibits the problem on
> the server.

It is the version 1.28 downloaded from the downloads page, which should be 3d49f04587.

BTW, note that it is the same fossil binary accessible from both the client and server perspective, as it is off of NFS. I don't think this matters, but thought I'd mention it.

--
Matt
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Matt Welland on Mon, 21 Apr 2014 09:26:25 -0700:
> Yes! This is fixed on latest! Any idea which commit fixes the problem?

I ran fossil bisect to figure out where the fix came into trunk. [I must say, this is the first time I've used fossil bisect, and it was quite handy!] Here is the last BAD commit that had the problem:

http://www.fossil-scm.org/index.html/timeline?dp=ab00f2b007d5229d

And the commit just after that, by drh, fixes it [b4dffdac5e]. It was also merged into the 1.28 branch (branch-1.28):

http://www.fossil-scm.org/index.html/info/ebac09bcf72fbed9b389c07766a931264df9e304

So if you feel better sticking with Fossil version 1.28, you can update to the latest on branch-1.28.

Andy
--
TAI64 timestamp: 40005355fff8
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Richard Hipp on Thu, 17 Apr 2014 11:13:38 -0400:
> Would this really require a big change? Seems like about all you have
> to do is COMMIT after each round-trip to the server, rather than
> waiting to COMMIT at the very end. Or, just COMMIT instead of ROLLBACK
> after getting a server timeout.

I think Fossil already does the latter; or I just read the code wrong. At the end of client_sync() it calls db_end_transaction(0):

http://www.fossil-scm.org/index.html/artifact/dace4194506b2ea7?ln=1936

Which will cause a COMMIT to happen unless there are errors (with commit hooks):

http://www.fossil-scm.org/index.html/artifact/17595c8a94256a4d?ln=162,185

Am I wrong?

Andy
--
TAI64 timestamp: 40005352ed3d
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Matt Welland on Wed, 16 Apr 2014 09:01:28 -0700: fossil commit cfgdat tests -m Added another drc test Autosync: ssh://host/path/project.fossil Round-trips: 1 Artifacts sent: 0 received: 0 Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);} Round-trips: 1 Artifacts sent: 0 received: 0 Pull finished with 360 bytes sent, 280 bytes received Autosync failed continue in spite of sync failure (y/N)? n I've done a fair bit of profiling with this, and this seems to happen primarily with the test-http command (the default sync method for SSH clients). I don't know what the history is behind the test-http command, but my guess is that it was really not intended to be a heavily used sync method for shared repositories. I'm not really sure why this particular database locking error happens so frequently with test-http, but not at all with http. This is happening in manifest_crosslink_end() when it's trying to fudge times. If I force my SSH command to use http instead of test-http, this error disappears entirely and I only ever see an occasional locking error due to multiple committers when I try to commit large change sets (like a 10,000 line, 840K change set); same behavior as standard HTTP/HTTPS transports in my environment (slow disk/cpu/network). Are all your users using SSH to access shared repositories? Or do you just have a few users using SSH? Perhaps it would be better to switch to using SSH keys and forced commands to cause fossil to use http instead of test-http? This does require a bit more setup. For example, each .fossil has to have the remote_user_ok configuration enabled so you can set up the REMOTE_USER environment variable for them. This is because there currently is no mechanism to use Fossil authentication while using SSH as the transport, and fossil http requires it if you want to commit. 
I suppose an alternative configuration would be to give nobody/anonymous users the ability to write, which may be acceptable if SSH authentication is the only allowed sync method. The only drawback that I see there is that the rcvfrom information would show up as having come from nobody, e.g., User: amb Received From: nobody @ 192.168.1.9 on 2014-04-20 04:33:35 I think one thing I've learned from all this is that forks and database locking errors occur much more frequently on slow hardware and large change sets. Also, I seem to be able to cause forking that goes undetected (without a warning). All of this probably explains why it is difficult to reproduce except on older hardware. As for making sync try harder, we could certainly just loop X number of times if we think it is worth it (not sure how feasible it will be to make it silent, or if there will be other side effects). Here I have it loop 10 times before bailing. As you can see, it failed once, but then succeeded the second time and received updates that indicate it is out of sync: $ fossil ci -m synctest2 Autosync: ssh://fossil/tmp/test.fossil Round-trips: 1 Artifacts sent: 0 received: 0 Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);} Round-trips: 1 Artifacts sent: 0 received: 0 Pull finished with 314 bytes sent, 280 bytes received Autosync failed Autosync: ssh://fossil/tmp/test.fossil Round-trips: 3 Artifacts sent: 0 received: 102 Pull finished with 3451 bytes sent, 170661 bytes received would fork. update first or use --allow-fork. 
There was also a sync failure on the first committer after it successfully committed the artifacts: $ fossil ci -m synctest1 Autosync: ssh://fossil/tmp/test.fossil Round-trips: 1 Artifacts sent: 0 received: 0 Pull finished with 316 bytes sent, 229 bytes received New_Version: 04e7debfa4f29ee3c1635007e3f380f0a0630366 Autosync: ssh://fossil/tmp/test.fossil Round-trips: 3 Artifacts sent: 101 received: 0 Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);} Round-trips: 3 Artifacts sent: 101 received: 0 Sync finished with 179617 bytes sent, 3234 bytes received Autosync failed Autosync: ssh://fossil/tmp/test.fossil Round-trips: 1 Artifacts sent: 0 received: 1 Sync finished with 4916 bytes sent, 2724 bytes received Thoughts? Andy -- TAI64 timestamp: 4000535358db ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
Just FYI, I'm seeing this kind of message quite often. This is due to overlapping clone operations on large fossils on relatively slow disk. Round-trips: 1 Artifacts sent: 0 received: 0 Round-trips: 1 Artifacts sent: 0 received: 109 Round-trips: 2 Artifacts sent: 0 received: 109 Round-trips: 2 Artifacts sent: 0 received: 773 Round-trips: 3 Artifacts sent: 0 received: 773 Round-trips: 3 Artifacts sent: 0 received: 895 Round-trips: 4 Artifacts sent: 0 received: 895 Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);} On Thu, Apr 17, 2014 at 1:56 PM, Joerg Sonnenberger jo...@britannica.bec.de wrote: On Thu, Apr 17, 2014 at 02:06:26PM -0500, Rich Neswold wrote: On Thu, Apr 17, 2014 at 1:46 PM, Joerg Sonnenberger jo...@britannica.bec.de wrote: Please note that while moving to a newer, faster server I also moved the source to /cvsroot to match real CVS. That was responsible for quite a few changes. So I'm sync'ing a completely new repository on top of mine? No, just that the original RCS files moved, which in some cases changes the way certain RCS keywords are expanded during the fossil conversion. Joerg ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users -- Matt -=- 90% of the nations wealth is held by 2% of the people. Bummer to be in the majority... ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
Sorry for the multiple mails but I have a little more info. I can reliably reproduce this. Just do two simultaneous clones via ssh from a large fossil. This is on NFS. It happens very quickly, so fossil is giving up pretty fast. On Fri, Apr 18, 2014 at 8:52 AM, Matt Welland estifo...@gmail.com wrote: Just FYI, I'm seeing this kind of message quite often. This is due to overlapping clone operations on large fossils on relatively slow disk. Round-trips: 1 Artifacts sent: 0 received: 0 Round-trips: 1 Artifacts sent: 0 received: 109 Round-trips: 2 Artifacts sent: 0 received: 109 Round-trips: 2 Artifacts sent: 0 received: 773 Round-trips: 3 Artifacts sent: 0 received: 773 Round-trips: 3 Artifacts sent: 0 received: 895 Round-trips: 4 Artifacts sent: 0 received: 895 Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);} On Thu, Apr 17, 2014 at 1:56 PM, Joerg Sonnenberger jo...@britannica.bec.de wrote: On Thu, Apr 17, 2014 at 02:06:26PM -0500, Rich Neswold wrote: On Thu, Apr 17, 2014 at 1:46 PM, Joerg Sonnenberger jo...@britannica.bec.de wrote: Please note that while moving to a newer, faster server I also moved the source to /cvsroot to match real CVS. That was responsible for quite a few changes. So I'm sync'ing a completely new repository on top of mine? No, just that the original RCS files moved, which in some cases changes the way certain RCS keywords are expanded during the fossil conversion. Joerg ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users -- Matt -=- 90% of the nations wealth is held by 2% of the people. Bummer to be in the majority... -- Matt -=- 90% of the nations wealth is held by 2% of the people. Bummer to be in the majority... 
___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, Apr 18, 2014 at 6:00 PM, Matt Welland estifo...@gmail.com wrote: I can reliably reproduce this. Just do two simultaneous clones via ssh from a large fossil. This is on NFS. It happens very quickly so fossil is giving up pretty fast. NFS w/ db file == fundamentally bad idea. db.c sets the default busy timeout to 5 seconds. -- - stephan beal http://wanderinghorse.net/home/stephan/ http://gplus.to/sgbeal Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
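The 5-second default Stephan mentions corresponds to SQLite's busy timeout, which controls how long a connection waits for a competing lock before reporting "database is locked". A minimal sketch of what that setting does, using Python's stdlib sqlite3 rather than fossil's actual C code in db.c (the function name here is illustrative, not fossil's):

```python
import sqlite3

def open_with_busy_timeout(path, seconds=5):
    """Open a SQLite database that waits up to `seconds` for a competing
    lock instead of failing immediately with 'database is locked'."""
    conn = sqlite3.connect(path, timeout=seconds)  # driver-level wait
    # Engine-level busy handler, expressed in milliseconds:
    conn.execute("PRAGMA busy_timeout = %d" % (seconds * 1000))
    return conn

conn = open_with_busy_timeout(":memory:")
timeout_ms = conn.execute("PRAGMA busy_timeout").fetchone()[0]
print(timeout_ms)  # 5000
```

Raising the timeout only helps when the competing writer finishes within the window; a long-running clone transaction can still outlast it, which matches the failures reported in this thread.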
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, Apr 18, 2014 at 9:21 AM, Stephan Beal sgb...@googlemail.com wrote: NFS w/ db file == fundamentally bad idea. db.c sets the default busy timeout to 5 seconds. So you are recommending we abandon fossil because of this? Storing the files on local disk is not an option for us. Also, other than being a little slow, storing fossils on NFS has not been an issue. I did some more testing and this is unique to using ssh and it occurs on local disk just as fast as on NFS. Anyone sharing fossils using ssh will run into this sooner or later. This is using 1.28 -- Matt -=- 90% of the nations wealth is held by 2% of the people. Bummer to be in the majority... ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, Apr 18, 2014 at 6:47 PM, Matt Welland estifo...@gmail.com wrote: So you are recommending we abandon fossil because of this? Storing the files on local disk is not an option for us. Also, other than being a little slow, storing fossils on NFS has not been an issue. Search this page for NFS: http://sqlite.org/howtocorrupt.html I did some more testing and this is unique to using ssh and it occurs on local disk just as fast as on NFS. Then you're lucky. Anyone sharing fossils using ssh will run into this sooner or later. This is using 1.28 SSH is not the problem - NFS is historically problematic when it comes to file locking. i've seen apps slow down by a factor of a hundred when using locking over NFS, and heard/read many horror stories of shared file corruption over buggy NFSes. -- - stephan beal http://wanderinghorse.net/home/stephan/ http://gplus.to/sgbeal Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, Apr 18, 2014 at 12:47 PM, Matt Welland estifo...@gmail.com wrote: I did some more testing and this is unique to using ssh and it occurs on local disk just as fast as on NFS. I don't have NFS set up anywhere so I cannot test that. But I can do multiple ssh clones from a different machine and when I do, everything works fine. I've tried as many as three different, simultaneous clones of the same repo, all running at the same time. I've used both an old mac and a beagleboard as the server. (Client is always my linux desktop.) It always works. I cannot recreate the problem. Do you have any additional hints for me? -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
NFS is not needed to reproduce this. Simultaneous parallel cloning via ssh from one file is giving me this every single time. Could it be an OS dependency? I'm on SuSe Linux (SLES11). I downloaded the binary from fossil-scm.org and tested again and get exactly the same issue. I do happen to be cloning from and to the same host. On Fri, Apr 18, 2014 at 10:12 AM, Richard Hipp d...@sqlite.org wrote: On Fri, Apr 18, 2014 at 12:47 PM, Matt Welland estifo...@gmail.comwrote: I did some more testing and this is unique to using ssh and it occurs on local disk just as fast as on NFS. I don't have NFS set up anywhere so I cannot test that. But I can do multiple ssh clones from a different machine and when I do, everything works fine. I've tried as many three different, simultaneous clones of the same repo, all running at the same time. I've used both an old mac and a beagleboard as the server. (Client is always my linux desktop.) It always works. I cannot recreate the problem. Do you have any additional hints for me? -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users -- Matt -=- 90% of the nations wealth is held by 2% of the people. Bummer to be in the majority... ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, Apr 18, 2014 at 1:32 PM, Matt Welland estifo...@gmail.com wrote: NFS is not needed to reproduce this. Simultaneous parallel cloning via ssh from one file is giving me this every single time. Could it be an OS dependency? I'm on SuSe Linux (SLES11). I downloaded the binary from fossil-scm.org and tested again and get exactly the same issue. I do happen to be cloning from and to the same host. Tried again here, running three simultaneous clones of the same repo, but this time ssh to the same host. Still no errors. -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
How big is the repo? The one I'm cloning is 420 MB. Perhaps that is a factor? On Fri, Apr 18, 2014 at 10:39 AM, Richard Hipp d...@sqlite.org wrote: On Fri, Apr 18, 2014 at 1:32 PM, Matt Welland estifo...@gmail.com wrote: NFS is not needed to reproduce this. Simultaneous parallel cloning via ssh from one file is giving me this every single time. Could it be an OS dependency? I'm on SuSe Linux (SLES11). I downloaded the binary from fossil-scm.org and tested again and get exactly the same issue. I do happen to be cloning from and to the same host. Tried again here, running three simultaneous clones of the same repo, but this time ssh to the same host. Still no errors. -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users -- Matt -=- 90% of the nations wealth is held by 2% of the people. Bummer to be in the majority... ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, Apr 18, 2014 at 1:41 PM, Matt Welland estifo...@gmail.com wrote: How big is the repo? The one I'm cloning is 420 MB. Perhaps that is a factor? I was using SQLite, 55MB. The biggest repo I have at hand is System.Data.SQLite at 264MB. I just did three simultaneous ssh clones of it without any issues. -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, Apr 18, 2014 at 10:03 AM, Stephan Beal sgb...@googlemail.comwrote: NFS is historically problematic when it comes to file locking. This is true. However, technology doesn't stop evolving. The locking on NFS on the systems I use seems pretty rock solid. I push sqlite3 to extremes on NFS and there have been challenges, but all considered it is quite remarkable how well it performs. As I mentioned in a previous email, the built-in timeout mechanism in sqlite3 seems to tie up the database, and using shorter timeouts and delaying a short while before trying again really seemed to improve throughput. Overall I'd say be cautious, but don't hesitate to keep sqlite3 in your tool chest even if you have to work on NFS. Note that the issue I'm seeing is happening with no NFS. -- Matt -=- 90% of the nations wealth is held by 2% of the people. Bummer to be in the majority... ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Fri, Apr 18, 2014 at 01:50:08PM -0400, Richard Hipp wrote: On Fri, Apr 18, 2014 at 1:41 PM, Matt Welland estifo...@gmail.com wrote: How big is the repo? The one I'm cloning is 420 MB. Perhaps that is a factor? I was using SQLite, 55MB. The biggest repo I have at hand is System.Data.SQLite at 264MB. I just did three simultaneous ssh clones of it without any issues. Maybe delete mode vs. WAL mode? -- Martin G. ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
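Martin's question matters because the journal mode changes who blocks whom: in the default rollback-journal ("delete") mode a writer excludes readers, while in WAL mode readers proceed against a consistent snapshot while a writer is active. A sketch of the difference using stdlib sqlite3 (toy database, not a real fossil repository):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "repo.fossil")

# A fresh file database starts in rollback-journal ("delete") mode;
# the journal_mode pragma both switches and reports the mode.
conn = sqlite3.connect(path)
mode_before = conn.execute("PRAGMA journal_mode").fetchone()[0]      # 'delete'
mode_after = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]   # 'wal'
conn.close()

# In WAL mode, a reader succeeds even while another connection holds an
# open write transaction; in delete mode the same read could fail with
# SQLITE_BUSY ("database is locked").
writer = sqlite3.connect(path, isolation_level=None)  # autocommit: explicit BEGIN
reader = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE t(x)")
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO t VALUES (1)")
count = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # snapshot: 0
writer.execute("COMMIT")
print(mode_before, mode_after, count)
```

Note that WAL mode is generally documented as not working over network filesystems, so it would not help the NFS configurations discussed earlier in the thread.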
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Matt Welland on Fri, 18 Apr 2014 08:52:09 -0700: Round-trips: 1 Artifacts sent: 0 received: 0 Round-trips: 1 Artifacts sent: 0 received: 109 Round-trips: 2 Artifacts sent: 0 received: 109 Round-trips: 2 Artifacts sent: 0 received: 773 Round-trips: 3 Artifacts sent: 0 received: 773 Round-trips: 3 Artifacts sent: 0 received: 895 Round-trips: 4 Artifacts sent: 0 received: 895 Error: Database error: database is locked: {UPDATE event SET mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN (SELECT mid FROM time_fudge);} #key_3_2 What version of Fossil produced this output? What version of fossil was on the remote side? Thanks, Andy -- TAI64 timestamp: 40005351c34f ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Matt Welland on Fri, 18 Apr 2014 10:32:26 -0700: Could it be an OS dependency? I'm on SuSe Linux (SLES11). No, I can reproduce it on OpenBSD. I'm looking at it more closely to see what might be causing it. Basically, you need a long commit in progress and then try to sync. I can also reproduce it if I am committing via HTTP and trying to pull via SSH. Andy -- TAI64 timestamp: 40005351c5da ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Matt Welland on Fri, 18 Apr 2014 10:41:39 -0700: How big is the repo? The one I'm cloning is 420 MB. Perhaps that is a factor? No, the problem appears to be the difference between using test-http and http as the remote command. The default behavior for the Fossil client is to send a remote ``fossil test-http /path'' to the server. If instead I force the Fossil client to talk to: fossil http /path/to/fossil Everything works as expected (e.g. no locking issues). Andy -- TAI64 timestamp: 40005351c9cc ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
Thus said Andy Bradford on 18 Apr 2014 18:56:09 -0600: Everything works as expected (e.g. no locking issues). I spoke too soon. If I give the Fossil user permissions (e.g. don't clone as nobody) then the issue arises again. It doesn't appear to be isolated to just SSH. I can cause locking errors with HTTP too, including some interesting behavior like files unexpectedly being removed from the cloned repository. For example, after modifying a test repo and then starting a sync via HTTP, I started a sync from another HTTP client which caused this: $ f sync Sync with http://amb@fossil:8080/ Round-trips: 24 Artifacts sent: 24 received: 0 Error: Database error: database is locked: {COMMIT} Round-trips: 24 Artifacts sent: 24 received: 0 Sync finished with 3840799 bytes sent, 65248 bytes received After the second sync completed (having received a partial set of artifacts), I did an update and suddenly files began being removed (presumably due to the partial commit above). After letting the original commit run to completion another time, I did an update in the second but the files have not come back: On the first committer: $ f stat | grep checkout checkout: 4393b959511a32ec949f32900049d1195226f8d4 2014-04-19 01:05:46 UTC $ ls | wc -l 101 On the second: $ f stat | grep checkout checkout: 4393b959511a32ec949f32900049d1195226f8d4 2014-04-19 01:05:46 UTC $ ls | wc -l 31 Only after closing and opening the local repository was I able to make the files come back, but I wonder what would have happened had I tried to commit files during the time that it was in this state. It's possible that using SSH as a client makes it easier to cause problems, but it does appear to be possible to run into issues with HTTP as well. Once I even got told that a fork had happened (both clients had autosync on). But it seems to be due in part to large commits and slow disk. I'll keep looking at it as I get time. 
Andy -- TAI64 timestamp: 40005351d3f6 ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, Apr 16, 2014 at 10:40 PM, Rich Neswold rich.nesw...@gmail.comwrote: It would be nice if fossil would break the pull into smaller transactions which contain valid timeline commits so, if there's a database timeout, the next time I try to pull it can continue where it left off. That's a very interesting idea. That's not something for a weekend hack (it would require bigger changes), but that would certainly be of benefit in libfossil once it is far enough along to sync. There's no specific reason why it has to internally track the transient sync data the same way fossil(1) does. e.g. it might make sense to buffer it all to an extra table and then feed that table to the part which does the real work. -- - stephan beal http://wanderinghorse.net/home/stephan/ http://gplus.to/sgbeal Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, Apr 16, 2014 at 3:40 PM, Rich Neswold rich.nesw...@gmail.com wrote: It would be even nicer if it didn't throw away partial pull data on a DB timeout: I'm trying to pull the latest NetBSD changes (to pull in the Heartbleed fixes) and my session keeps failing with the fudge time error. Unfortunately, this means all the data it transferred (sometimes over 1GB!) gets rolled back and I have to try again later. It would be nice if fossil would break the pull into smaller transactions which contain valid timeline commits so, if there's a database timeout, the next time I try to pull it can continue where it left off. I may be confused and I'm definitely ignorant of fossil internals. The first few times that my pulls failed, there was no obvious change to the timeline so I assumed none of the data was being saved. After the last timeout, however, there were some new entries from the NetBSD project. So maybe new pulls start where the previous one left off after all. (The heartbleed bug probably caused many changes to several NetBSD branches, so there are probably many more entries to pull than normal.) I'll hit Mr. Sonnenberger's server a few more times throughout the day and see if I can eventually complete a pull. -- Rich ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 11:03 AM, Stephan Beal sgb...@googlemail.comwrote: On Wed, Apr 16, 2014 at 10:40 PM, Rich Neswold rich.nesw...@gmail.comwrote: It would be nice if fossil would break the pull into smaller transactions which contain valid timeline commits so, if there's a database timeout, the next time I try to pull it can continue where it left off. That's a very interesting idea. That's not something for a weekend hack (it would require bigger changes), but that would certainly be of benefit in libfossil once it is far enough along to sync. There's no specific reason why it has to internally track the transient sync data the same way fossil(1) does. e.g. it might make sense to buffer it all to an extra table and then feed that table to the part which does the real work. Would this really require a big change? Seems like about all you have to do is COMMIT after each round-trip to the server, rather than waiting to COMMIT at the very end. Or, just COMMIT instead of ROLLBACK after getting a server timeout. -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 5:13 PM, Richard Hipp d...@sqlite.org wrote: That's a very interesting idea. That's not something for a weekend hack (it would require bigger changes), Would this really require a big change? i kinda made a conservative guess there ;). Seems like about all you have to do is COMMIT after each round-trip to the server, rather than waiting to COMMIT at the very end. Or, just COMMIT instead of ROLLBACK after getting a server timeout. Would that be a valid strategy? Couldn't we end up with a partial state which we can't work from until the pull finishes to completion? -- - stephan beal http://wanderinghorse.net/home/stephan/ http://gplus.to/sgbeal Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 11:16 AM, Stephan Beal sgb...@googlemail.comwrote: Seems like about all you have to do is COMMIT after each round-trip to the server, rather than waiting to COMMIT at the very end. Or, just COMMIT instead of ROLLBACK after getting a server timeout. Would that be a valid strategy? Couldn't we end up with a partial state which we can't work from until the pull finishes to completion? The logic (in manifest.c) is designed to be able to deal with partial state transfers. I'm not saying there are definitely no bugs, but I'm pretty sure it does work. -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
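The commit-per-round-trip idea that drh endorses above can be sketched in a few lines: apply incoming artifacts in batches, commit each batch, and on a lock error keep everything already committed rather than rolling the whole pull back. This is a toy model in stdlib sqlite3 with a simplified one-column `artifact` table, not fossil's real schema or its sync code in manifest.c:

```python
import sqlite3

def apply_in_batches(conn, artifacts, batch_size=100):
    """COMMIT once per batch (one simulated round-trip), so that a lock
    error mid-sync keeps the batches already committed instead of
    rolling the entire pull back."""
    applied = 0
    for i in range(0, len(artifacts), batch_size):
        batch = artifacts[i:i + batch_size]
        try:
            with conn:  # one transaction per batch; commits on success
                conn.executemany("INSERT INTO artifact(content) VALUES (?)",
                                 [(a,) for a in batch])
            applied += len(batch)
        except sqlite3.OperationalError:
            break  # locked: keep prior batches, resume on the next sync
    return applied

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE artifact(content)")
n = apply_in_batches(conn, ["blob-%d" % i for i in range(250)])
print(n)  # 250
```

The scheme only works if, as drh says of manifest.c, the consumer of the data is designed to cope with a partially transferred state on the next run.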
Re: [fossil-users] can fossil try harder on sync failure?
I'm not sure if this is relevant but I found with sqlite3 that in situations with high contention for a database (multiple coincident reads/writes), backing off and trying again in a half second rather than relying on the sqlite3 timeout seemed to increase overall throughput and reliability. I suspect that if, on the server side, there are multiple concurrent readers and a concurrent writer, a brief release of the read lock to allow any pending writers to complete their work might improve overall throughput and decrease the number of sqlite3 errors. One fossil I work with is getting over a hundred commits a day and the sqlite3 failures result in a fork every few weeks. I'm glad to say the database itself has proven resistant to corruption under this heavy load. On Thu, Apr 17, 2014 at 8:33 AM, Richard Hipp d...@sqlite.org wrote: On Thu, Apr 17, 2014 at 11:16 AM, Stephan Beal sgb...@googlemail.comwrote: Seems like about all you have to do is COMMIT after each round-trip to the server, rather than waiting to COMMIT at the very end. Or, just COMMIT instead of ROLLBACK after getting a server timeout. Would that be a valid strategy? Couldn't we end up with a partial state which we can't work from until the pull finishes to completion? The logic (in manifest.c) is designed to be able to deal with partial state transfers. I'm not saying there are definitely no bugs, but I'm pretty sure it does work. -- D. Richard Hipp d...@sqlite.org ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users -- Matt -=- 90% of the nations wealth is held by 2% of the people. Bummer to be in the majority... ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
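Matt's back-off-and-retry pattern (application-level retries with a short sleep instead of sitting in SQLite's internal busy wait) can be sketched like this in stdlib sqlite3; the function name and the retry parameters are illustrative, not anything from fossil:

```python
import random
import sqlite3
import time

def execute_with_backoff(conn, sql, params=(), attempts=10, delay=0.5):
    """Retry a statement that fails with 'database is locked', sleeping
    briefly (with jitter) between attempts so a pending writer can
    finish, rather than relying on SQLite's internal busy timeout."""
    for attempt in range(attempts):
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or attempt == attempts - 1:
                raise  # unrelated error, or out of retries
            time.sleep(delay + random.uniform(0, delay))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x)")
execute_with_backoff(conn, "INSERT INTO t VALUES (?)", (42,))
value = conn.execute("SELECT x FROM t").fetchone()[0]
print(value)  # 42
```

The jitter is the point: if every blocked client waits exactly the same interval, they all collide again; randomizing the sleep spreads the retries out, which is plausibly why Matt saw better throughput than with the fixed busy-timeout alone.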
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 10:12:44AM -0500, Rich Neswold wrote: The first few times that my pulls failed, there was no obvious change to the timeline so I assumed none of the data was being saved. After the last timeout, however, there were some new entries from the NetBSD project. So maybe new pulls start were the previous left off after all. (The heartbleed bug probably caused many changes to several NetBSD branches, so there are probably many more entries to pull than normal.) Please note that while moving to a newer, faster server I also moved the source to /cvsroot to match real CVS. That was responsible for quite a few changes. Joerg ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 11:13:38AM -0400, Richard Hipp wrote: Would this really require a big change? Seems like about all you have to do is COMMIT after each round-trip to the server, rather than waiting to COMMIT at the very end. Or, just COMMIT instead of ROLLBACK after getting a server timeout. Yes, please. Even for local syncs, the overhead should be small. For remote operations, net latency should eat everything... That reminds me, the other problem with the network protocol is its synchronous nature. Consider the case of having enough phantoms to issue the next round before processing the answer of the server. Sending that request in parallel while processing the answer would significantly increase throughput. Joerg ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 1:46 PM, Joerg Sonnenberger jo...@britannica.bec.de wrote: Please note that while moving to a newer, faster server I also moved to source to /cvsroot to match real CVS. That was responsible for quite a few changes. So I'm sync'ing a completely new repository on top of mine? A fossil repository doesn't have a UUID to tell if I shouldn't pull from a remote anymore? Like, if for some strange reason, Mr. Sonnenberger decided to replace the NetBSD repo with the fossil repo, I'd be pulling fossil source into my repo/ticket/wiki without a warning? Or is my ignorance showing again? :) -- Rich ___ fossil-users mailing list fossil-users@lists.fossil-scm.org http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 3:06 PM, Rich Neswold rich.nesw...@gmail.com wrote:
> So I'm sync'ing a completely new repository on top of mine?

Every project has a project-id, which is supposed to be unique. Fossil
recognizes when the project-ids do not match and refuses to sync. That
said, there is nothing to prevent a clever individual, like Joerg, from
manually setting a duplicate project-id using raw SQL statements. But on
the other hand, why would he do that?

--
D. Richard Hipp
d...@sqlite.org
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 12:06 PM, Rich Neswold rich.nesw...@gmail.com wrote:
> On Thu, Apr 17, 2014 at 1:46 PM, Joerg Sonnenberger
> jo...@britannica.bec.de wrote:
>> Please note that while moving to a newer, faster server I also moved
>> the source to /cvsroot to match real CVS. That was responsible for
>> quite a few changes.
>
> So I'm sync'ing a completely new repository on top of mine? A fossil
> repository doesn't have a UUID to tell if I shouldn't pull from a remote
> anymore?

A project in fossil has a project id ... called a project-code. Example:

    fossil info  ; on my Tcl checkout
    => project-name: Tcl Source Code
       project-code: 1ec9da4c469c29f4717e2a967fe6b916d9c8c06e

Fossil will not push/pull between repos of different project codes.

My understanding of Joerg's mail was that he moved files around in the
repository to match a specific directory structure, not that he created a
new project.

> Like, if for some strange reason, Mr. Sonnenberger decided to replace
> the NetBSD repo with the fossil repo, I'd be pulling fossil source into
> my repo/ticket/wiki without a warning? Or is my ignorance showing
> again? :)

--
Andreas Kupries
Senior Tcl Developer
Code to Cloud: Smarter, Safer, Faster(tm)
F: 778.786.1133
andre...@activestate.com http://www.activestate.com
Learn about Stackato for Private PaaS: http://www.activestate.com/stackato
EuroTcl'2014, July 12-13, Munich, GER -- http://www.eurotcl.tcl3d.org/
21'st Tcl/Tk Conference: Nov 10-14, Portland, OR, USA --
http://www.tcl.tk/community/tcl2014/
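The project-code check described above can be sketched in a few lines. Fossil stores the project-code in the repository's `config` table; the `may_sync` and `demo_repo` helpers below are invented for illustration and only mimic, in simplified form, fossil's refusal to push/pull between repositories whose project-codes differ.

```python
import sqlite3

def project_code(conn):
    # fossil keeps the project-code in the repository's config table
    row = conn.execute(
        "SELECT value FROM config WHERE name='project-code'").fetchone()
    return row[0] if row else None

def may_sync(local, remote):
    """Refuse sync unless both repos carry the same project-code."""
    code_l, code_r = project_code(local), project_code(remote)
    return code_l is not None and code_l == code_r

def demo_repo(code):
    # minimal stand-in for a fossil repository database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE config(name TEXT PRIMARY KEY, value CLOB)")
    conn.execute("INSERT INTO config VALUES ('project-code', ?)", (code,))
    return conn

a = demo_repo("1ec9da4c469c29f4717e2a967fe6b916d9c8c06e")
b = demo_repo("1ec9da4c469c29f4717e2a967fe6b916d9c8c06e")
c = demo_repo("deadbeef")
```

With this check, a clone of Joerg's repository keeps syncing across his server move because the project-code is unchanged; only a genuinely different project would be refused.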
Re: [fossil-users] can fossil try harder on sync failure?
On Thu, Apr 17, 2014 at 2:10 PM, Richard Hipp d...@sqlite.org wrote:
> Every project has a project-id, which is supposed to be unique. Fossil
> recognizes when the project-ids do not match and refuses to sync. That
> said, there is nothing to prevent a clever individual, like Joerg, from
> manually setting a duplicate project-id using raw SQL statements. But on
> the other hand, why would he do that?

Good! So the extra long pull times are simply due to Joerg doing some
housekeeping. Thanks for the information!

--
Rich
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, Apr 16, 2014 at 6:01 PM, Matt Welland estifo...@gmail.com wrote:
> Error: Database error: database is locked: {UPDATE event SET
> mtime=(SELECT m1 FROM time_fudge WHERE mid=objid) WHERE objid IN
> (SELECT mid FROM time_fudge);}
> Round-trips: 1 Artifacts sent: 0 received: 0
> Pull finished with 360 bytes sent, 280 bytes received
> Autosync failed
> continue in spite of sync failure (y/N)? n
>
> Could fossil silently retry a couple times instead of giving up so
> easily? If the user says y and continues then we get forks in the
> timeline which are very confusing to non-experts.

Isn't the db being locked a sign that a fork is almost imminent? If
someone is writing to the repo and that lock is blocking your autosync,
then a fork has possibly already happened (or will if autosync retries,
either automatically or because the user tapped Y).

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, Apr 16, 2014 at 9:14 AM, Stephan Beal sgb...@googlemail.com wrote:
> On Wed, Apr 16, 2014 at 6:01 PM, Matt Welland estifo...@gmail.com wrote:
>> Could fossil silently retry a couple times instead of giving up so
>> easily? If the user says y and continues then we get forks in the
>> timeline which are very confusing to non-experts.
>
> Isn't the db being locked a sign that a fork is almost imminent? If
> someone is writing to the repo and that lock is blocking your autosync,
> then a fork has possibly already happened (or will if autosync retries,
> either automatically or because the user tapped Y).

Yes, exactly. Presumably a commit from someone else is in progress. All
fossil has to do is wait a second and then try the sync again, and then
report the "fossil will fork" message if appropriate, or follow through
with the commit if the overlapping commit was on a different branch.

--
Matt
-=-
90% of the nation's wealth is held by 2% of the people. Bummer to be in
the majority...
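The wait-and-retry behaviour Matt describes is a standard way to handle SQLITE_BUSY. Below is a hedged Python/sqlite3 sketch of the idea; `run_with_retry` is a name invented for this example, not anything in fossil.

```python
import sqlite3
import time

def run_with_retry(conn, sql, retries=5, delay=1.0):
    """Retry a statement a few times when the database is locked,
    instead of giving up after the first failure (illustrative
    sketch of the suggested behaviour, not fossil's code)."""
    for attempt in range(retries):
        try:
            return conn.execute(sql)
        except sqlite3.OperationalError as e:
            if "locked" not in str(e) or attempt == retries - 1:
                raise  # a different error, or out of retries
            time.sleep(delay)  # assume the other writer finishes soon

conn = sqlite3.connect(":memory:")
run_with_retry(conn, "CREATE TABLE t(x)")
run_with_retry(conn, "INSERT INTO t VALUES (1)")
```

At the C level, SQLite already offers this via `sqlite3_busy_timeout()` (or the `busy_timeout` pragma), which makes SQLite itself sleep and retry for up to the given number of milliseconds before returning SQLITE_BUSY.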
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, Apr 16, 2014 at 6:22 PM, Matt Welland estifo...@gmail.com wrote:
> ... then try the sync again, and then report the "fossil will fork"
> message if appropriate, or follow through with the commit if the
> overlapping commit was on a different branch.

Ah, right - i didn't think that through to the next step. That does
indeed sound like it would be an improvement. This weekend is a four-day
one for us in southern Germany (for Easter), so i'll see if i can tinker
with this if someone doesn't beat me to it.

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, Apr 16, 2014 at 06:26:49PM +0200, Stephan Beal wrote:
> Ah, right - i didn't think that through to the next step. That does
> indeed sound like it would be an improvement. This weekend is a four-day
> one for us in southern Germany (for Easter), so i'll see if i can tinker
> with this if someone doesn't beat me to it.

It would also be nice if clone didn't abort with removal of the
repository on such errors. pull/push should return an error etc. There
are a bunch of basic usability issues in this area. This is made worse by
pull not being read-only...

Joerg
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, Apr 16, 2014 at 6:35 PM, Joerg Sonnenberger
jo...@britannica.bec.de wrote:
> It would also be nice if clone didn't abort with removal of the
> repository on such errors. pull/push should return an error etc. There
> are a bunch of basic usability issues in this area. This is made worse
> by pull not being read-only...

i make no promises but will look into it.

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Re: [fossil-users] can fossil try harder on sync failure?
On Wed, Apr 16, 2014 at 11:35 AM, Joerg Sonnenberger
jo...@britannica.bec.de wrote:
> It would also be nice if clone didn't abort with removal of the
> repository on such errors. pull/push should return an error etc. There
> are a bunch of basic usability issues in this area. This is made worse
> by pull not being read-only...

It would be even nicer if it didn't throw away partial pull data on a DB
timeout: I'm trying to pull the latest NetBSD changes (to pull in the
Heartbleed fixes) and my session keeps failing with the fudge time error.
Unfortunately, this means all the data it transferred (sometimes over
1GB!) gets rolled back and I have to try again later.

It would be nice if fossil would break the pull into smaller transactions
which contain valid timeline commits so, if there's a database timeout,
the next time I try to pull it can continue where it left off.

--
Rich
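The chunked, resumable pull Rich asks for can be sketched like this. Everything here is a hypothetical illustration in Python/sqlite3: the `pull_chunked` helper, the chunk size, and the one-column `blob` table are invented for the example, and a real implementation would additionally need to cut chunks only at points where the timeline is self-consistent.

```python
import sqlite3

def pull_chunked(conn, artifacts, chunk=100):
    """Commit every `chunk` artifacts so a mid-pull failure only
    loses the current chunk; a retry skips artifacts that earlier
    attempts already stored (illustrative sketch)."""
    stored = 0
    batch = []
    for uuid, content in artifacts:
        if conn.execute("SELECT 1 FROM blob WHERE uuid=?",
                        (uuid,)).fetchone():
            continue  # resume point: already saved by a prior attempt
        batch.append((uuid, content))
        if len(batch) >= chunk:
            with conn:  # commit this chunk
                conn.executemany("INSERT INTO blob VALUES (?,?)", batch)
            stored += len(batch)
            batch = []
    if batch:  # flush the final partial chunk
        with conn:
            conn.executemany("INSERT INTO blob VALUES (?,?)", batch)
        stored += len(batch)
    return stored

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blob(uuid TEXT PRIMARY KEY, content BLOB)")
first = pull_chunked(conn, [(f"u{i}", b"data") for i in range(5)], chunk=2)
second = pull_chunked(conn, [(f"u{i}", b"data") for i in range(5)], chunk=2)
```

On the second call everything is already present, so nothing is re-stored: the 1GB-rollback problem disappears because each committed chunk survives a later timeout.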