Re: [fossil-users] unclustered vs private vs phantom and not syncing content

2014-07-22 Thread Andy Bradford
Thus said Andy Bradford on 12 Jul 2014 13:36:58 -0600:

> 2)  The  artifact   rid  was  in  the  unclustered   table,  but  when
> create_cluster() ran it prematurely removed it from the table.

I  have been  able to  successfully reproduce/cause  this. When  a large
number of artifacts are being transferred, they produce phantoms on the
server side  of the  sync operation.  Eventually, the  unclustered table
grows  large  enough that  create_cluster()  starts  cleaning house  and
building  a new  cluster  artifact to  replace all  the  entries in  the
unclustered table. Then it deletes everything that it didn't just create
as part of  creating clusters, including phantoms for  which content has
not  yet arrived  (most  importantly  a checkin  artifact  in which  the
manifest references a lot of other files).
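
A minimal sketch of the sequence described above, not Fossil's actual code. It assumes two simplified tables named after the ones in the post (unclustered and phantom), each holding only a rid; the function name cluster_and_clean and the rid values are hypothetical.

```python
import sqlite3

def open_model():
    """In-memory stand-in for the two tables the post talks about."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE unclustered(rid INTEGER PRIMARY KEY);
        CREATE TABLE phantom(rid INTEGER PRIMARY KEY);
    """)
    return db

def cluster_and_clean(db, cluster_rid):
    """Model of the pre-fix behavior described above (an assumption)."""
    # Only artifacts whose content has arrived are listed in the cluster.
    members = [row[0] for row in db.execute(
        "SELECT rid FROM unclustered"
        " WHERE rid NOT IN (SELECT rid FROM phantom) ORDER BY rid")]
    # The new cluster artifact itself becomes an unclustered entry...
    db.execute("INSERT INTO unclustered(rid) VALUES (?)", (cluster_rid,))
    # ...and everything else is deleted, phantoms included.
    db.execute("DELETE FROM unclustered WHERE rid <> ?", (cluster_rid,))
    return members

db = open_model()
db.executemany("INSERT INTO unclustered(rid) VALUES (?)",
               [(1,), (2,), (3,), (4,)])
db.execute("INSERT INTO phantom(rid) VALUES (3)")  # content not here yet

members = cluster_and_clean(db, cluster_rid=5)
left = [row[0] for row in db.execute("SELECT rid FROM unclustered")]
print("cluster members:", members)
print("unclustered now:", left)
```

In this model rid 3 ends up in neither the cluster nor the unclustered table, which is the state the post says makes the checkin invisible to other clients.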

If no other artifacts reference the  artifacts that were phantoms on the
unclustered table, now deleted, then the content seemingly disappears to
other clients  that are trying to  sync. If the content  eventually gets
incorporated in  other manifests  then it  will eventually  sync because
they  will discover  the  artifacts  in those  manifests,  mark them  as
phantoms and then request them with gimme cards.

This is  most easily  reproduced by simply  doing this in  a clone  of a
repository:

$ jot 1500 | while read x; do dd if=/dev/urandom bs=1k count=1 | hexdump > file.$x; done
$ fossil ci -m bigupdate --branch big
$ fossil up trunk
$ echo $RANDOM > file.1
$ fossil ci -m back

Now, never merge in the ``big''  branch and clients that have previously
cloned the  server repository will never  see the checkin to  the branch
unless they use --verily. Otherwise, if  the branch is merged into trunk
(or the checkin is edited and  the branch closed), then suddenly it will
appear to those  clients (assuming those changes don't  get deleted from
the unclustered table first).

For repositories that are very  active, and have smallish commits, this 
won't likely ever present itself.   

I have confirmed that the  change in the cluster-changes branch actually
does correct  this, but I  would like  to solicit other  alternatives if
anyone has  any suggestions.  Basically, I  made Fossil  ignore phantoms
when deleting from the unclustered table:

http://www.fossil-scm.org/index.html/vdiff?from=619fa857c9330c10&to=5c6891b2ab10c4d0&sbs=1
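
The idea of ignoring phantoms during the delete can be sketched as follows. This is an illustration only, not the change in the cluster-changes branch: the table layout is simplified to a single rid column, and the helper names cleanup_unclustered and build are hypothetical.

```python
import sqlite3

def build():
    """Unclustered holds rids 1, 2, 3 plus a new cluster (6); 3 is a phantom."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE unclustered(rid INTEGER PRIMARY KEY);
        CREATE TABLE phantom(rid INTEGER PRIMARY KEY);
        INSERT INTO unclustered(rid) VALUES (1), (2), (3), (6);
        INSERT INTO phantom(rid) VALUES (3);
    """)
    return db

def cleanup_unclustered(db, new_cluster_rids, keep_phantoms):
    """Delete clustered entries, optionally leaving phantoms in place."""
    # rid 0 is never a real artifact; it keeps the IN list non-empty.
    keep = [0] + list(new_cluster_rids)
    marks = ",".join("?" for _ in keep)
    sql = f"DELETE FROM unclustered WHERE rid NOT IN ({marks})"
    if keep_phantoms:
        # Phantoms have no content yet, so they were never clustered.
        sql += " AND rid NOT IN (SELECT rid FROM phantom)"
    db.execute(sql, keep)
    return [row[0] for row in
            db.execute("SELECT rid FROM unclustered ORDER BY rid")]

before_fix = cleanup_unclustered(build(), [6], keep_phantoms=False)
after_fix = cleanup_unclustered(build(), [6], keep_phantoms=True)
print("without the extra condition:", before_fix)
print("with the extra condition:   ", after_fix)
```

With the extra condition the phantom stays in the unclustered table, so it can still be clustered once its content arrives.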

Suggestions?

Thanks,

Andy
-- 
TAI64 timestamp: 400053ce2090


___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users


Re: [fossil-users] unclustered vs private vs phantom and not syncing content

2014-07-22 Thread Stephan Beal
On Tue, Jul 22, 2014 at 10:27 AM, Andy Bradford amb-fos...@bradfords.org
wrote:

> Suggestions?


Only one: Keep it up! That was impressive investigatory work! It seems to
me that you've discovered that fossil does indeed (unintentionally) support
a form of branch-specific sync ;).


-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do. -- Bigby Wolf


Re: [fossil-users] unclustered vs private vs phantom and not syncing content

2014-07-22 Thread Michai Ramakers
On 22 July 2014 10:27, Andy Bradford amb-fos...@bradfords.org wrote:
> Thus said Andy Bradford on 12 Jul 2014 13:36:58 -0600:
>
>> 2)  The  artifact   rid  was  in  the  unclustered   table,  but  when
>> create_cluster() ran it prematurely removed it from the table.
>
> I  have been  able to  successfully reproduce/cause  this. When  a large
> number of artifacts are being transferred, they produce phantoms on the
> server side  of the  sync operation.  Eventually, the  unclustered table
> grows  large  enough that  create_cluster()  starts  cleaning house  and
> building  a new  cluster  artifact to  replace all  the  entries in  the
> unclustered table. Then it deletes everything that it didn't just create
> as part of  creating clusters, including phantoms for  which content has
> not  yet arrived  (most  importantly  a checkin  artifact  in which  the
> manifest references a lot of other files).
>
> If no other artifacts reference the  artifacts that were phantoms on the
> unclustered table, now deleted, then the content seemingly disappears to
> other clients  that are trying to  sync. If the content  eventually gets
> incorporated in  other manifests  then it  will eventually  sync because
> they  will discover  the  artifacts  in those  manifests,  mark them  as
> phantoms and then request them with gimme cards.
>
> This is  most easily  reproduced by simply  doing this in  a clone  of a
> repository:
>
> $ jot 1500 | while read x; do dd if=/dev/urandom bs=1k count=1 | hexdump > file.$x; done
> $ fossil ci -m bigupdate --branch big
> $ fossil up trunk
> $ echo $RANDOM > file.1
> $ fossil ci -m back
>
> Now, never merge in the ``big''  branch and clients that have previously
> cloned the  server repository will never  see the checkin to  the branch
> unless they use --verily. Otherwise, if  the branch is merged into trunk
> (or the checkin is edited and  the branch closed), then suddenly it will
> appear to those  clients (assuming those changes don't  get deleted from
> the unclustered table first).

Thank you for the effort, this is or has been a long-standing issue indeed.

I can't seem to reproduce what you describe - either that, or I'm
missing the point (did you mean 'merge' as in 'fossil merge'?). I'm
assuming you left out 'fossil add' (or 'addremove') twice in your
example.

I tried your example on a single host, hopefully to exclude complexity
added by any physical network. (Do you think it's necessary to use 2
different hosts to reproduce the issue like you described?) I cloned
using http:// before adding files, and then updated from within the
cloned repo's workdir.

(I can see the artifacts being received on the cloned repo's side, so
I guess the attempt really ends there.)

Longish typescript follows:

---

michai@main:/fossils$ f ver
This is fossil version 1.30 [619fa857c9] 2014-07-19 19:20:25 UTC
michai@main:/fossils$ grep fossil /etc/inetd.conf
http stream tcp nowait.1000 root /usr/local/bin/f /usr/local/bin/f http /fossils
michai@main:/fossils$ f new --date-override 2014-01-01 ab.fossil
project-id: e0b53c254d86b6445060df9c65a9017134b348db
server-id:  c72a02a0849d982ca8066b812521a1f7cde187af
admin-user: michai (initial password is 794d90)
michai@main:/fossils$ mkdir f
michai@main:/fossils$ cd f
michai@main:/fossils/f$ f open ../ab.fossil
project-name: unnamed
repository:   /fossils/f/../ab.fossil
local-root:   /fossils/f/
config-db:/home/michai/.fossil
project-code: e0b53c254d86b6445060df9c65a9017134b348db
checkout: b58cc4d9818973107a8acba469dda6edd4ba9683 2014-01-01 00:00:00 UTC
leaf: open
tags: trunk
comment:  initial empty check-in (user: michai)
checkins: 1
michai@main:/fossils/f$ pushd /tmp
/tmp /fossils/f
michai@main:/tmp$ mkdir f
michai@main:/tmp$ cd f
michai@main:/tmp/f$ f clone http://localhost/ab ab.fossil
Round-trips: 1   Artifacts sent: 0  received: 0
Round-trips: 1   Artifacts sent: 0  received: 1
Round-trips: 2   Artifacts sent: 0  received: 1
Round-trips: 2   Artifacts sent: 0  received: 3
Clone finished with 461 bytes sent, 1155 bytes received
Rebuilding repository meta-data...
  0.0% complete...
  100.0% complete...
project-id: e0b53c254d86b6445060df9c65a9017134b348db
server-id:  25482d35a0445e5710395d12423b421e90b9f4be
admin-user: michai (password is e4a7ef)
michai@main:/tmp/f$ mkdir f
michai@main:/tmp/f$ cd f
michai@main:/tmp/f/f$ f open ../ab.fossil
project-name: unnamed
repository:   /tmp/f/f/../ab.fossil
local-root:   /tmp/f/f/
config-db:/home/michai/.fossil
project-code: e0b53c254d86b6445060df9c65a9017134b348db
checkout: b58cc4d9818973107a8acba469dda6edd4ba9683 2014-01-01 00:00:00 UTC
leaf: open
tags: trunk
comment:  initial empty check-in (user: michai)
checkins: 1
michai@main:/tmp/f/f$ popd
/fossils/f
michai@main:/fossils/f$ cat /tmp/f.sh
#!/bin/sh

jot 1500 | while read x; do dd if=/dev/urandom bs=1k count=1 | hexdump > file.$x; done
f addr
f ci -m bigupdate --branch big
f up trunk
echo $RANDOM > file.1
f add file.1
f ci -m back

Re: [fossil-users] unclustered vs private vs phantom and not syncing content

2014-07-22 Thread Andy Bradford
Thus said Michai Ramakers on Tue, 22 Jul 2014 12:35:03 +0200:

> I can't  seem to  reproduce what  you describe -  either that,  or I'm
> missing the  point (did you mean  'merge' as in 'fossil  merge'?). I'm
> assuming  you left  out 'fossil  add' (or  'addremove') twice  in your
> example.

Yes, I left out a few steps  (sorry). It was assumed that the 1500 files
already exist  in the repository and  the changes are just  updates (but
essentially a 100% rewrite of the file due to the randomness). Also, the
entire lump  of changes has to  be large enough that  max-download comes
into play and there are multiple  sync operations that occur as a result
during the checkin. I don't think it matters whether these are new files
or  modified files  (I just  used edits  because I  was trying  multiple
variations), so  after generating all  the files, you could  do ``fossil
addremove'' to get the big change set.
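
For systems without jot or hexdump, the file-generation step could be approximated along these lines. This is a hedged sketch: the function name write_random_files is hypothetical, and the hex text it writes differs in layout from hexdump output, which should not matter since only the size and randomness of the change set are relevant here.

```python
import os
import tempfile
from pathlib import Path

def write_random_files(directory, count=1500, size=1024):
    """Write file.1 .. file.<count>, each holding `size` random bytes as hex."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for i in range(1, count + 1):
        path = directory / f"file.{i}"
        # Fresh random content each run, so an existing file is fully rewritten.
        path.write_text(os.urandom(size).hex() + "\n")
        paths.append(path)
    return paths

# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo = write_random_files(tmp, count=3, size=16)
    demo_names = [p.name for p in demo]
    demo_lengths = [len(p.read_text()) for p in demo]
    print(demo_names, demo_lengths)
```

Run with the default arguments inside a checkout, it would be followed by ``fossil addremove'' and the checkin, as described above.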

> I tried your example on a single host, hopefully to exclude complexity
> added by any  physical network. (Do you think it's  necessary to use 2
> different hosts to  reproduce the issue like you  described?) I cloned
> using http://  before adding files,  and then updated from  within the
> cloned repo's workdir.

More steps I left out...

No, I did  this all on one  host. I created the repo  and started fossil
server with the repo. Then I cloned it  2 times. In one clone I made the
changes and then after  the last checkin, I did an  update in the second
clone. It never received the artifact for the checkin (because it wasn't
in the unclustered table and not mentioned in any other manifests).

Also, as  far as  the Fossil  version is concerned,  though I  think any
should suffice, I was using [619fa857c933].

Thanks for attempting to confirm the problem.

Andy
--
TAI64 timestamp: 400053ce837e


Re: [fossil-users] unclustered vs private vs phantom and not syncing content

2014-07-22 Thread Michai Ramakers
On 22 July 2014 17:29, Andy Bradford amb-fos...@bradfords.org wrote:
> Thus said Michai Ramakers on Tue, 22 Jul 2014 12:35:03 +0200:
>
>> I can't  seem to  reproduce what  you describe -  either that,  or I'm
>> missing the  point (did you mean  'merge' as in 'fossil  merge'?). I'm
>> assuming  you left  out 'fossil  add' (or  'addremove') twice  in your
>> example.
>
> Yes, I left out a few steps  (sorry). It was assumed that the 1500 files
> already exist  in the repository and  the changes are just  updates (but
> essentially a 100% rewrite of the file due to the randomness). Also, the
> entire lump  of changes has to  be large enough that  max-download comes
> into play and there are multiple  sync operations that occur as a result
> during the checkin. I don't think it matters whether these are new files
> or  modified files  (I just  used edits  because I  was trying  multiple
> variations), so  after generating all  the files, you could  do ``fossil
> addremove'' to get the big change set.
>
>> I tried your example on a single host, hopefully to exclude complexity
>> added by any  physical network. (Do you think it's  necessary to use 2
>> different hosts to  reproduce the issue like you  described?) I cloned
>> using http://  before adding files,  and then updated from  within the
>> cloned repo's workdir.
>
> More steps I left out...
>
> No, I did  this all on one  host. I created the repo  and started fossil
> server with the repo. Then I cloned it  2 times. In one clone I made the
> changes and then after  the last checkin, I did an  update in the second
> clone. It never received the artifact for the checkin (because it wasn't
> in the unclustered table and not mentioned in any other manifests).
>
> Also, as  far as  the Fossil  version is concerned,  though I  think any
> should suffice, I was using [619fa857c933].

ahh, right :-) Now everything works (breaks) perfectly.

I tried to mimic the actual situation I had earlier
(http://lists.fossil-scm.org:8080/pipermail/fossil-users/2013-August/013629.html),
except on 1 host like you suggest, using 2 clones.

I don't / didn't use branches other than trunk (which still breaks,
using your example - good).

Effectively committed the 1500 files onto trunk from within the 1st
clone's workdir, and didn't follow it by an additional commit.
Sync from within the 2nd clone's workdir received iirc 161 out of
approx 1500 artifacts, after which the timeline didn't show the
commit.
Following that by a single added/committed file from within the 1st
clone's workdir again, and a sync from within the 2nd clone's workdir,
retrieved everything up to and including the last single-file commit.

So... this seems exactly what I saw happen here at that time; thx
again for the effort, I'm very happy this seems pinpointed!

Michai


Re: [fossil-users] unclustered vs private vs phantom and not syncing content

2014-07-22 Thread Andy Bradford
Thus said Michai Ramakers on Tue, 22 Jul 2014 17:58:24 +0200:

> I don't  / didn't use branches  other than trunk (which  still breaks,
> using your example - good).

Yes, this should  work with trunk as  long as there are  no commits that
follow the one which caused the exclusion of artifacts from clusters, as
you mentioned.

> Following that  by a single  added/committed file from within  the 1st
> clone's workdir again, and a sync from within the 2nd clone's workdir,
> retrieved everything up to and including the last single-file commit.

In  this  case,  the  artifact is  indirectly  available  through  other
manifests  which get  picked up  and then  pulled, but  not through  any
cluster artifacts.
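
The two discovery paths can be illustrated with a toy model. It is a sketch under assumptions, not the sync protocol itself: "advertised" stands for whatever the server announces through clusters and its unclustered table, "references" maps a manifest to the artifacts it mentions, and all artifact names are made up.

```python
def discover(advertised, references):
    """Return every artifact a client can learn about and fetch."""
    known = set()
    pending = list(advertised)  # known by name, content not yet fetched
    while pending:
        artifact = pending.pop()
        if artifact in known:
            continue
        known.add(artifact)  # stands in for a gimme request being answered
        # A fetched manifest names further artifacts, which become phantoms.
        pending.extend(references.get(artifact, ()))
    return known

# Hypothetical repository: "big" is the large checkin that fell out of
# the unclustered table; "followup" is a later checkin that refers to it.
references = {
    "big": ["file.1", "file.2"],
    "followup": ["big"],
}

without_followup = discover({"initial"}, references)
with_followup = discover({"initial", "followup"}, references)
print(sorted(without_followup))
print(sorted(with_followup))
```

In the first call nothing leads to the large checkin, so it stays invisible; in the second, the later checkin pulls it and its files in indirectly.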

Thanks,

Andy
--
TAI64 timestamp: 400053ce8e79


Re: [fossil-users] unclustered vs private vs phantom and not syncing content

2014-07-15 Thread Michai Ramakers
On 12 July 2014 21:36, Andy Bradford amb-fos...@bradfords.org wrote:

> I've been trying to investigate the  problem that has been reported over
> time and  so far  I haven't  been able to  reproduce it  or come  to any
> conclusive decision regarding  what might be the cause.  When (in Fossil
> versions) did this problem first get noticed or start happening?

FWIW, for me this happened first using fossil versions
  1.26 [3ca6979514] 2013-07-23 18:57:25 UTC and
  1.25 [a6dad6508c] 2013-06-14 07:19:58 UTC
on client and server, respectively (see this post:
http://lists.fossil-scm.org:8080/pipermail/fossil-users/2013-August/013629.html).

I have next to no clue on fossil's source and innards, so I can't
really comment on what you typed in this mail - anyway, happy hunting.

Michai


[fossil-users] unclustered vs private vs phantom and not syncing content

2014-07-12 Thread Andy Bradford
Hello,

I've been trying to investigate the  problem that has been reported over
time and  so far  I haven't  been able to  reproduce it  or come  to any
conclusive decision regarding  what might be the cause.  When (in Fossil
versions) did this problem first get noticed or start happening?

Thanks to Donny Ward, we at  least have one Fossil repository that seems
to reveal  one possible cause. For  whatever reason, one of  the checkin
artifacts in  the timeline  did not  make it into  a cluster.  I've been
trying to determine how this could come  about and so far can only see a
few possibilities (none of them a solid cause):

1) When  the content was received  for the checkin, something  failed to
put the artifact rid in the unclustered table.

2)  The   artifact  rid   was  in  the   unclustered  table,   but  when
create_cluster() ran it prematurely removed it from the table.

3) The content was originally private  (private content does not get put
into unclustered table) and somehow got marked as public.

4)  There is  a  bug in  this loop  that  sometimes prematurely  removes
artifact rids from the unclustered table:

http://www.fossil-scm.org/index.html/artifact/37f2afbbd186bf5cef90c57b7fa1acd7097977cd?ln=692,701

I haven't been able to find any way that 1 could happen.

The  only  way  I  can  imagine  2   could  happen  is  if  there  is  a
phantom for  the checkin artifact,  the content hasn't yet  arrived, and
create_cluster() is run  which will cause the rid of  the artifact to be
removed from  the unclustered  table and  it will never  make it  into a
cluster.  I'm not  sure how  this could  happen, but  it is  potentially
a  problem  (I  started  working  on this  particular  scenario  in  the
cluster-changes branch).

For 3,  I see that  there is  some code that  will remove rids  from the
private table if they are received and made public:

http://www.fossil-scm.org/index.html/artifact/37f2afbbd186bf5cef90c57b7fa1acd7097977cd?ln=172
http://www.fossil-scm.org/index.html/artifact/ef8da0287cc50af631daec6886f56b458aa1e4fc?ln=738,745

But if  they are in  the private table,  then necessarily they  will not
have been added to the unclustered table:

http://www.fossil-scm.org/index.html/artifact/ef8da0287cc50af631daec6886f56b458aa1e4fc?ln=572,575
http://www.fossil-scm.org/index.html/artifact/ef8da0287cc50af631daec6886f56b458aa1e4fc?ln=596,598

Which means that they  will never show up in a  cluster (even after made
public) and they will  not be in the unclustered table.  So why does the
content_make_public() function  exist? Is it possible  to switch private
content from private to public? I  didn't think it was, but perhaps this
is  also one  potential way  that a  checkin could  fail to  sync if  it
happens because  removing the rid from  the private table won't  put the
rid into the unclustered table.
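
The concern in scenario 3 can be shown with a small model. It only restates the reasoning above and makes no claim about what content_make_public() really does: the tables are reduced to a rid column, and the helpers receive and make_public are hypothetical stand-ins.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE private(rid INTEGER PRIMARY KEY);
    CREATE TABLE unclustered(rid INTEGER PRIMARY KEY);
""")

def receive(db, rid, is_private):
    """Private content is recorded as private and skips unclustered."""
    table = "private" if is_private else "unclustered"
    db.execute(f"INSERT INTO {table}(rid) VALUES (?)", (rid,))

def make_public(db, rid):
    """Assumed behavior: only the private marker is removed."""
    db.execute("DELETE FROM private WHERE rid = ?", (rid,))

def present(db, table, rid):
    row = db.execute(f"SELECT 1 FROM {table} WHERE rid = ?", (rid,)).fetchone()
    return row is not None

receive(db, 7, is_private=True)
make_public(db, 7)
still_private = present(db, "private", 7)
queued_for_cluster = present(db, "unclustered", 7)
print("private:", still_private, "unclustered:", queued_for_cluster)
```

If the real code behaves like this model, rid 7 is public but will never be offered to a cluster.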

Finally,  for option  4, this  block  of code  only kicks  in for  large
unclustered tables  where the number  of rows  currently added to  a new
cluster is >=800 and the number of artifacts not yet clustered
is >rows+100. Again, I cannot see where there is any problem, but a number
of people have indicated  that it happens when there are  a lot of files
in  the  checkin---this  was  also  the  case  of  Donny's  a9b134481708
artifact. It was  a checkin that had 1251 F-cards  in the manifest which
means there were probably a lot of entries in unclustered until the next
time a  pull happened. I have  tried to reproduce it  with various large
numbers of commits and haven't yet been able to cause it.
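
To make the threshold in option 4 concrete, here is a sketch of how such a split condition would divide a large unclustered table. The comparisons (rows >= 800 and remaining > rows + 100) and the bookkeeping that subtracts each finished cluster from the remaining count are assumptions; the function name cluster_sizes is hypothetical, and 1251 is used only as an illustrative table size.

```python
def cluster_sizes(unclustered_count, min_rows=800, slack=100):
    """Sizes of the clusters produced for a given number of entries."""
    sizes = []
    rows = 0                       # entries added to the current cluster
    remaining = unclustered_count  # entries not yet in a finished cluster
    for _ in range(unclustered_count):
        rows += 1
        # Close the cluster early only if plenty of entries still follow.
        if rows >= min_rows and remaining > rows + slack:
            sizes.append(rows)
            remaining -= rows
            rows = 0
    if rows > 0:
        sizes.append(rows)  # whatever is left forms the final cluster
    return sizes

print(cluster_sizes(850))
print(cluster_sizes(1251))
print(cluster_sizes(1700))
```

Under these assumptions a table must exceed 900 entries before the early split ever happens, which fits the observation that only very large checkins are affected.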

At any  rate, Donny, were  you doing  anything with private  content for
checkin a9b134481708  in your repository?  Or did  you have any  kind of
automated pull that might have coincided  with the time that the checkin
was being committed to the repository? Did you shun the checkin artifact
and then unshun it?

Other thoughts?

Should cluster artifacts have a D-card  in them? Might make it easier to
correlate the timeline checkins with when they get created.

Thanks,

Andy
-- 
TAI64 timestamp: 400053c18e7d

