Re: [fossil-users] unclustered vs private vs phantom and not syncing content
Thus said Andy Bradford on 12 Jul 2014 13:36:58 -0600:

> 2) The artifact rid was in the unclustered table, but when create_cluster() ran it prematurely removed it from the table.

I have been able to successfully reproduce/cause this.

When a large number of artifacts are being transferred, they produce phantoms on the server side of the sync operation. Eventually, the unclustered table grows large enough that create_cluster() starts cleaning house and building a new cluster artifact to replace all the entries in the unclustered table. Then it deletes everything that it didn't just create as part of creating clusters, including phantoms for which content has not yet arrived (most importantly a checkin artifact in which the manifest references a lot of other files).

If no other artifacts reference the artifacts that were phantoms in the unclustered table, now deleted, then the content seemingly disappears to other clients that are trying to sync. If the content eventually gets incorporated in other manifests then it will eventually sync, because the clients will discover the artifacts in those manifests, mark them as phantoms and then request them with gimme cards.

This is most easily reproduced by simply doing this in a clone of a repository:

    $ jot 1500 | while read x; do dd if=/dev/urandom bs=1k count=1 | hexdump > file.$x; done
    $ fossil ci -m bigupdate --branch big
    $ fossil up trunk
    $ echo $RANDOM > file.1
    $ fossil ci -m back

Now, never merge in the ``big'' branch and clients that have previously cloned the server repository will never see the checkin to the branch unless they use --verily. Otherwise, if the branch is merged into trunk (or the checkin is edited and the branch closed), then suddenly it will appear to those clients (assuming those changes don't get deleted from the unclustered table first).

For repositories that are very active, and have smallish commits, this won't likely ever present itself.
I have confirmed that the change in the cluster-changes branch actually does correct this, but I would like to solicit other alternatives if anyone has any suggestions. Basically, I made Fossil ignore phantoms when deleting from the unclustered table:

http://www.fossil-scm.org/index.html/vdiff?from=619fa857c9330c10&to=5c6891b2ab10c4d0&sbs=1

Suggestions?

Thanks,

Andy
--
TAI64 timestamp: 400053ce2090
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
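The effect of sparing phantoms during the cleanup can be illustrated with a small model. This is only a sketch under stated assumptions, not Fossil's actual code: it uses Python's sqlite3 module with single-column stand-ins for the unclustered and phantom tables, and the names make_db and cleanup are hypothetical.

```python
import sqlite3


def make_db():
    """Build an in-memory stand-in for the two tables discussed above."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE unclustered(rid INTEGER PRIMARY KEY);
        CREATE TABLE phantom(rid INTEGER PRIMARY KEY);
        """
    )
    # Assumed layout: rids 1-3 have content, rid 4 is a phantom whose
    # content has not arrived yet, rid 5 is the freshly built cluster.
    db.executemany(
        "INSERT INTO unclustered VALUES (?)", [(1,), (2,), (3,), (4,), (5,)]
    )
    db.execute("INSERT INTO phantom VALUES (4)")
    return db


def cleanup(db, new_cluster_rids, keep_phantoms):
    """Delete unclustered rows after clustering, optionally sparing phantoms."""
    placeholders = ",".join("?" for _ in new_cluster_rids)
    sql = f"DELETE FROM unclustered WHERE rid NOT IN ({placeholders})"
    if keep_phantoms:
        sql += (
            " AND NOT EXISTS (SELECT 1 FROM phantom"
            " WHERE phantom.rid = unclustered.rid)"
        )
    db.execute(sql, list(new_cluster_rids))
    return [row[0] for row in db.execute("SELECT rid FROM unclustered ORDER BY rid")]


if __name__ == "__main__":
    print(cleanup(make_db(), [5], keep_phantoms=False))  # phantom rid 4 is lost
    print(cleanup(make_db(), [5], keep_phantoms=True))   # phantom rid 4 survives
```

In this model the phantom row survives the cleanup only when the extra NOT EXISTS clause is present, so it can still be clustered once its content arrives.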
Re: [fossil-users] unclustered vs private vs phantom and not syncing content
On Tue, Jul 22, 2014 at 10:27 AM, Andy Bradford amb-fos...@bradfords.org wrote:

> Suggestions?

Only one: Keep it up! That was impressive investigatory work!

It seems to me that you've discovered that fossil does indeed (unintentionally) support a form of branch-specific sync ;).

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do. -- Bigby Wolf
Re: [fossil-users] unclustered vs private vs phantom and not syncing content
On 22 July 2014 10:27, Andy Bradford amb-fos...@bradfords.org wrote:

> Thus said Andy Bradford on 12 Jul 2014 13:36:58 -0600:
>
>> 2) The artifact rid was in the unclustered table, but when create_cluster() ran it prematurely removed it from the table.
>
> I have been able to successfully reproduce/cause this.
>
> When a large number of artifacts are being transferred, they produce phantoms on the server side of the sync operation. Eventually, the unclustered table grows large enough that create_cluster() starts cleaning house and building a new cluster artifact to replace all the entries in the unclustered table. Then it deletes everything that it didn't just create as part of creating clusters, including phantoms for which content has not yet arrived (most importantly a checkin artifact in which the manifest references a lot of other files).
>
> If no other artifacts reference the artifacts that were phantoms in the unclustered table, now deleted, then the content seemingly disappears to other clients that are trying to sync. If the content eventually gets incorporated in other manifests then it will eventually sync, because the clients will discover the artifacts in those manifests, mark them as phantoms and then request them with gimme cards.
>
> This is most easily reproduced by simply doing this in a clone of a repository:
>
>     $ jot 1500 | while read x; do dd if=/dev/urandom bs=1k count=1 | hexdump > file.$x; done
>     $ fossil ci -m bigupdate --branch big
>     $ fossil up trunk
>     $ echo $RANDOM > file.1
>     $ fossil ci -m back
>
> Now, never merge in the ``big'' branch and clients that have previously cloned the server repository will never see the checkin to the branch unless they use --verily. Otherwise, if the branch is merged into trunk (or the checkin is edited and the branch closed), then suddenly it will appear to those clients (assuming those changes don't get deleted from the unclustered table first).

Thank you for the effort, this is or has been a long-standing issue indeed.
I can't seem to reproduce what you describe - either that, or I'm missing the point (did you mean 'merge' as in 'fossil merge'?). I'm assuming you left out 'fossil add' (or 'addremove') twice in your example.

I tried your example on a single host, hopefully to exclude complexity added by any physical network. (Do you think it's necessary to use 2 different hosts to reproduce the issue like you described?) I cloned using http:// before adding files, and then updated from within the cloned repo's workdir. (I can see the artifacts being received on the cloned repo's side, so I guess the attempt really ends there.)

Longish typescript follows:

---
    michai@main:/fossils$ f ver
    This is fossil version 1.30 [619fa857c9] 2014-07-19 19:20:25 UTC
    michai@main:/fossils$ grep fossil /etc/inetd.conf
    http stream tcp nowait.1000 root /usr/local/bin/f /usr/local/bin/f http /fossils
    michai@main:/fossils$ f new --date-override 2014-01-01 ab.fossil
    project-id: e0b53c254d86b6445060df9c65a9017134b348db
    server-id: c72a02a0849d982ca8066b812521a1f7cde187af
    admin-user: michai (initial password is 794d90)
    michai@main:/fossils$ mkdir f
    michai@main:/fossils$ cd f
    michai@main:/fossils/f$ f open ../ab.fossil
    project-name: unnamed
    repository: /fossils/f/../ab.fossil
    local-root: /fossils/f/
    config-db:/home/michai/.fossil
    project-code: e0b53c254d86b6445060df9c65a9017134b348db
    checkout: b58cc4d9818973107a8acba469dda6edd4ba9683 2014-01-01 00:00:00 UTC
    leaf: open
    tags: trunk
    comment: initial empty check-in (user: michai)
    checkins: 1
    michai@main:/fossils/f$ pushd /tmp
    /tmp /fossils/f
    michai@main:/tmp$ mkdir f
    michai@main:/tmp$ cd f
    michai@main:/tmp/f$ f clone http://localhost/ab ab.fossil
    Round-trips: 1 Artifacts sent: 0 received: 0
    Round-trips: 1 Artifacts sent: 0 received: 1
    Round-trips: 2 Artifacts sent: 0 received: 1
    Round-trips: 2 Artifacts sent: 0 received: 3
    Clone finished with 461 bytes sent, 1155 bytes received
    Rebuilding repository meta-data...
    0.0% complete... 100.0% complete...
    project-id: e0b53c254d86b6445060df9c65a9017134b348db
    server-id: 25482d35a0445e5710395d12423b421e90b9f4be
    admin-user: michai (password is e4a7ef)
    michai@main:/tmp/f$ mkdir f
    michai@main:/tmp/f$ cd f
    michai@main:/tmp/f/f$ f open ../ab.fossil
    project-name: unnamed
    repository: /tmp/f/f/../ab.fossil
    local-root: /tmp/f/f/
    config-db:/home/michai/.fossil
    project-code: e0b53c254d86b6445060df9c65a9017134b348db
    checkout: b58cc4d9818973107a8acba469dda6edd4ba9683 2014-01-01 00:00:00 UTC
    leaf: open
    tags: trunk
    comment: initial empty check-in (user: michai)
    checkins: 1
    michai@main:/tmp/f/f$ popd
    /fossils/f
    michai@main:/fossils/f$ cat /tmp/f.sh
    #!/bin/sh
    jot 1500 | while read x; do dd if=/dev/urandom bs=1k count=1 | hexdump > file.$x; done
    f addr
    f ci -m bigupdate --branch big
    f up trunk
    echo $RANDOM > file.1
    f add file.1
    f ci -m back
Re: [fossil-users] unclustered vs private vs phantom and not syncing content
Thus said Michai Ramakers on Tue, 22 Jul 2014 12:35:03 +0200:

> I can't seem to reproduce what you describe - either that, or I'm missing the point (did you mean 'merge' as in 'fossil merge'?). I'm assuming you left out 'fossil add' (or 'addremove') twice in your example.

Yes, I left out a few steps (sorry). It was assumed that the 1500 files already exist in the repository and the changes are just updates (but essentially a 100% rewrite of each file due to the randomness). Also, the entire lump of changes has to be large enough that max-download comes into play and multiple sync operations occur as a result during the checkin.

I don't think it matters whether these are new files or modified files (I just used edits because I was trying multiple variations), so after generating all the files, you could do ``fossil addremove'' to get the big change set.

> I tried your example on a single host, hopefully to exclude complexity added by any physical network. (Do you think it's necessary to use 2 different hosts to reproduce the issue like you described?) I cloned using http:// before adding files, and then updated from within the cloned repo's workdir.

More steps I left out... No, I did this all on one host. I created the repo and started fossil server with the repo. Then I cloned it 2 times. In one clone I made the changes and then, after the last checkin, I did an update in the second clone. It never received the artifact for the checkin (because it wasn't in the unclustered table and not mentioned in any other manifests).

Also, as far as the Fossil version is concerned, though I think any should suffice, I was using [619fa857c933].

Thanks for attempting to confirm the problem.

Andy
--
TAI64 timestamp: 400053ce837e
Re: [fossil-users] unclustered vs private vs phantom and not syncing content
On 22 July 2014 17:29, Andy Bradford amb-fos...@bradfords.org wrote:

> Thus said Michai Ramakers on Tue, 22 Jul 2014 12:35:03 +0200:
>
>> I can't seem to reproduce what you describe - either that, or I'm missing the point (did you mean 'merge' as in 'fossil merge'?). I'm assuming you left out 'fossil add' (or 'addremove') twice in your example.
>
> Yes, I left out a few steps (sorry). It was assumed that the 1500 files already exist in the repository and the changes are just updates (but essentially a 100% rewrite of the file due to the randomness). Also, the entire lump of changes has to be large enough that max-download comes into play and there are multiple sync operations that occur as a result during the checkin.
>
> I don't think it matters whether these are new files or modified files (I just used edits because I was trying multiple variations), so after generating all the files, you could do ``fossil addremove'' to get the big change set.
>
>> I tried your example on a single host, hopefully to exclude complexity added by any physical network. (Do you think it's necessary to use 2 different hosts to reproduce the issue like you described?) I cloned using http:// before adding files, and then updated from within the cloned repo's workdir.
>
> More steps I left out... No, I did this all on one host. I created the repo and started fossil server with the repo. Then I cloned it 2 times. In one clone I made the changes and then after the last checkin, I did an update in the second clone. It never received the artifact for the checkin (because it wasn't on the unclustered artifact and not mentioned in any other manifests).
>
> Also, as far as the Fossil version is concerned, though I think any should suffice, I was using [619fa857c933].

ahh, right :-) Now everything works (breaks) perfectly.

I tried to mimic the actual situation I had earlier (http://lists.fossil-scm.org:8080/pipermail/fossil-users/2013-August/013629.html), except on 1 host like you suggest, using 2 clones.

I don't / didn't use branches other than trunk (which still breaks, using your example - good). Effectively committed the 1500 files onto trunk from within the 1st clone's workdir, and didn't follow it by an additional commit. Sync from within the 2nd clone's workdir received iirc 161 out of approx 1500 artifacts, after which the timeline didn't show the commit.

Following that by a single added/committed file from within the 1st clone's workdir again, and a sync from within the 2nd clone's workdir, retrieved everything up to and including the last single-file commit.

So... this seems exactly what I saw happen here at that time; thx again for the effort, I'm very happy this seems pinpointed!

Michai
Re: [fossil-users] unclustered vs private vs phantom and not syncing content
Thus said Michai Ramakers on Tue, 22 Jul 2014 17:58:24 +0200:

> I don't / didn't use branches other than trunk (which still breaks, using your example - good).

Yes, this should work with trunk as long as there are no commits that follow the one which caused the exclusion of artifacts from clusters, as you mentioned.

> Following that by a single added/committed file from within the 1st clone's workdir again, and a sync from within the 2nd clone's workdir, retrieved everything up to and including the last single-file commit.

In this case, the artifact is indirectly available through other manifests which get picked up and then pulled, but not through any cluster artifacts.

Thanks,

Andy
--
TAI64 timestamp: 400053ce8e79
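The indirect path described above (artifacts discovered through manifests, marked as phantoms, then requested with gimme cards) can be modelled roughly as follows. This is a toy sketch, not the sync protocol itself: the function name gimme_requests, the dictionary layout, and the short hash strings are all hypothetical.

```python
def gimme_requests(manifest_refs, have):
    """Return referenced artifact hashes that are not held locally.

    manifest_refs: dict mapping a received manifest's hash to the list of
                   artifact hashes it references (parents and files).
    have:          set of hashes whose content is already present.
    """
    wanted = []
    for refs in manifest_refs.values():
        for h in refs:
            if h not in have and h not in wanted:
                # In the real system this would become a phantom and
                # later be requested with a gimme card.
                wanted.append(h)
    return wanted


if __name__ == "__main__":
    # The follow-up commit "m2" names the big checkin "m1" as its parent,
    # so a client that received "m2" learns that "m1" exists.
    print(gimme_requests({"m2": ["m1", "f9"]}, have={"m2", "f9"}))
```

If no received manifest mentions the big checkin, as in the unmerged-branch case, the returned list never contains it, which matches the behaviour reported in this thread.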
Re: [fossil-users] unclustered vs private vs phantom and not syncing content
On 12 July 2014 21:36, Andy Bradford amb-fos...@bradfords.org wrote:

> I've been trying to investigate the problem that has been reported over time and so far I haven't been able to reproduce it or come to any conclusive decision regarding what might be the cause.
>
> When (in Fossil versions) did this problem first get noticed or start happening?

FWIW, for me this happened first using fossil versions 1.26 [3ca6979514] 2013-07-23 18:57:25 UTC and 1.25 [a6dad6508c] 2013-06-14 07:19:58 UTC on client and server, respectively (see this post: http://lists.fossil-scm.org:8080/pipermail/fossil-users/2013-August/013629.html).

I have next to no clue on fossil's source and innards, so I can't really comment on what you typed in this mail - anyway, happy hunting.

Michai
[fossil-users] unclustered vs private vs phantom and not syncing content
Hello,

I've been trying to investigate the problem that has been reported over time and so far I haven't been able to reproduce it or reach any firm conclusion about the cause.

When (in Fossil versions) did this problem first get noticed or start happening?

Thanks to Donny Ward, we at least have one Fossil repository that seems to reveal one possible cause. For whatever reason, one of the checkin artifacts in the timeline did not make it into a cluster. I've been trying to determine how this could come about and so far can only see a few possibilities (none of them a solid cause):

1) When the content was received for the checkin, something failed to put the artifact rid in the unclustered table.

2) The artifact rid was in the unclustered table, but when create_cluster() ran it prematurely removed it from the table.

3) The content was originally private (private content does not get put into the unclustered table) and somehow got marked as public.

4) There is a bug in this loop that sometimes prematurely removes artifact rids from the unclustered table:

http://www.fossil-scm.org/index.html/artifact/37f2afbbd186bf5cef90c57b7fa1acd7097977cd?ln=692,701

I haven't been able to find any way that 1 could happen.

The only way I can imagine 2 could happen is if there is a phantom for the checkin artifact, the content hasn't yet arrived, and create_cluster() is run, which will cause the rid of the artifact to be removed from the unclustered table so that it never makes it into a cluster. I'm not sure how this could happen, but it is potentially a problem (I started working on this particular scenario in the cluster-changes branch).

For 3, I see that there is some code that will remove rids from the private table if they are received and made public:

http://www.fossil-scm.org/index.html/artifact/37f2afbbd186bf5cef90c57b7fa1acd7097977cd?ln=172
http://www.fossil-scm.org/index.html/artifact/ef8da0287cc50af631daec6886f56b458aa1e4fc?ln=738,745

But if they are in the private table, then necessarily they will not have been added to the unclustered table:

http://www.fossil-scm.org/index.html/artifact/ef8da0287cc50af631daec6886f56b458aa1e4fc?ln=572,575
http://www.fossil-scm.org/index.html/artifact/ef8da0287cc50af631daec6886f56b458aa1e4fc?ln=596,598

Which means that they will never show up in a cluster (even after being made public) and they will not be in the unclustered table. So why does the content_make_public() function exist? Is it possible to switch private content from private to public? I didn't think it was, but perhaps this is also one potential way that a checkin could fail to sync, because removing the rid from the private table won't put the rid into the unclustered table.

Finally, for option 4, this block of code only kicks in for large unclustered tables where the number of rows currently added to a new cluster is >=800 and the number of artifacts not yet clustered is >rows+100. Again, I cannot see where there is any problem, but a number of people have indicated that it happens when there are a lot of files in the checkin; this was also the case with Donny's a9b134481708 artifact. It was a checkin that had 1251 F-cards in the manifest, which means there were probably a lot of entries in unclustered until the next time a pull happened. I have tried to reproduce it with various large numbers of commits and haven't yet been able to cause it.

At any rate, Donny, were you doing anything with private content for checkin a9b134481708 in your repository? Or did you have any kind of automated pull that might have coincided with the time that the checkin was being committed to the repository? Did you shun the checkin artifact and then unshun it?

Other thoughts? Should cluster artifacts have a D-card in them? Might make it easier to correlate the timeline checkins with when they get created.

Thanks,

Andy
--
TAI64 timestamp: 400053c18e7d
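The batching condition described for option 4 can be made concrete with a short sketch. It is one reading of the rule as stated in this message (close a cluster once it holds at least 800 rows and more than rows+100 artifacts remain unclustered), not a copy of the linked loop; the function name split_into_clusters and its parameters batch and slack are hypothetical.

```python
def split_into_clusters(uuids, batch=800, slack=100):
    """Group uuids into clusters following the rule described above."""
    clusters = []
    current = []
    remaining = len(uuids)  # artifacts not yet placed in a closed cluster
    for u in uuids:
        current.append(u)
        # Close the cluster only when it is large enough AND enough
        # unclustered artifacts would still be left over afterwards.
        if len(current) >= batch and remaining > len(current) + slack:
            clusters.append(current)
            remaining -= len(current)
            current = []
    if current:
        clusters.append(current)  # whatever is left forms the final cluster
    return clusters


if __name__ == "__main__":
    # 1251 entries, matching the number of F-cards mentioned above.
    sizes = [len(c) for c in split_into_clusters([str(i) for i in range(1251)])]
    print(sizes)
```

Under this reading, a table of 1251 entries is split into one cluster of 800 and one of 451, while a table of 850 entries stays in a single cluster because fewer than rows+100 artifacts would remain.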