Re: [fossil-users] Backups of deconstructed fossil repositories

2018-06-29 Thread Thomas Levine
On Sun, Jun 17, 2018, at 20:05, Warren Young wrote:
> However, I’ll also give a counterargument to the whole idea: you 
> probably aren’t saving anything in the end.  An intelligent deconstruct 
> + backup probably saves no net I/O over just re-copying the Fossil repo 
> DB to the destination unless the destination is *much* slower than the 
> machine being backed up.
> 
> (rsync was created for the common case where networks are much slower 
> than the computers they connect.  rsync within a single computer is 
> generally no faster than cp -r, and sometimes slower, unless you take 
> the mtime optimization mentioned above.)
> 
> The VM/ZFS + snapshots case has a similar argument against it: if you’re 
> using snapshots to back up a Fossil repo, deconstruction isn’t helpful.  
> The snapshot/CoW mechanism will only clone the changed disk blocks in 
> the repo.
> 
> So, what problem are you solving?  If it isn’t the slow-networks 
> problem, I suspect you’ve got an instance of the premature optimization 
> problem here.  If you go ahead and implement it, measure before 
> committing the change, and if you measure a meaningful difference, 
> document the conditions to help guide expectations.

I want my approximately daily backups to be small.

I currently version the fossil SQLite files in borg, and I am considering 
versioning the artefact dumps instead. I figure they will change less often than 
the SQLite files and will also be smaller, because they lack caches.

But the backups are already very small.

I suppose I could test this.


Re: [fossil-users] Backups of deconstructed fossil repositories

2018-06-17 Thread Richard Hipp
On 6/17/18, Thomas Levine <_...@thomaslevine.com> wrote:
> As content is added to a fossil repository, files in the corresponding
> deconstructed repository never change; they are only added. Most backup
> software will track changes to the deconstructed repository with great
> efficiency.
>
> I should thus take my backups of the deconstructed repositories, yes?

Fossil itself tracks changes with great efficiency.  The best backup
of a fossil repository is a clone.

The self-hosting Fossil repo at https://fossil-scm.org/ is backed up
by two clones, one at https://www2.fossil-scm.org/ and the other at
https://www3.fossil-scm.org/site.cgi.  Each of these clones is in a
separate data center in a different part of the world.  The second
clone uses a different ISP (DigitalOcean instead of Linode).  Both
clones sync to the master hourly via a cron job.

-- 
D. Richard Hipp
d...@sqlite.org


Re: [fossil-users] Backups of deconstructed fossil repositories

2018-06-17 Thread Stephan Beal
On Sun, Jun 17, 2018 at 10:08 PM Warren Young  wrote:

> On Jun 17, 2018, at 2:05 PM, Warren Young  wrote:
> >
> > If you’re willing to gamble that, if the first test returns true, the
> > second will also return true, it buys you a big increase in speed.  The
> > gamble is worth taking as long as the files’ modification timestamps are
> > trustworthy.
>
> I just remembered something: “fossil up” purposely does not modify the
> mtimes of the files it writes to match the mtime of the file in the
> repository because it can cause difficult-to-diagnose build system errors.
> Writing changed files out with the current wall time as the mtime is more
> likely to cause correct builds.
>

To that I'm going to add that fossil doesn't actually store any file
timestamps! It only records the time of a commit. When fossil is asked
"what's the timestamp for file X?", the answer is really the timestamp of
the last commit in which that file was modified.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf


Re: [fossil-users] Backups of deconstructed fossil repositories

2018-06-17 Thread Warren Young
On Jun 17, 2018, at 2:05 PM, Warren Young  wrote:
> 
> If you’re willing to gamble that, if the first test returns true, the 
> second will also return true, it buys you a big increase in speed.  The 
> gamble is worth taking as long as the files’ modification timestamps are 
> trustworthy.

I just remembered something: “fossil up” purposely does not modify the mtimes 
of the files it writes to match the mtime of the file in the repository because 
it can cause difficult-to-diagnose build system errors.  Writing changed files 
out with the current wall time as the mtime is more likely to cause correct 
builds.

I wonder if the fossil deconstruct mechanism does the same thing.  If so, then 
you can’t take the rsync mtime optimization without changing that behavior.


Re: [fossil-users] Backups of deconstructed fossil repositories

2018-06-17 Thread Warren Young
On Jun 17, 2018, at 12:16 PM, Thomas Levine <_...@thomaslevine.com> wrote:
> 
> One inconvenience I noted is that the deconstruct command always writes
> artefacts to the filesystem, even if a file of the appropriate name and
> size and contents already exists.

You might want to split that observation into two, as rsync does:

- name, size, and modification date match
- contents also match

If you’re willing to gamble that, if the first test returns true, the second 
will also return true, it buys you a big increase in speed.  The gamble is 
worth taking as long as the files’ modification timestamps are trustworthy.

When the timestamps aren’t trustworthy, you do the first test, then if that 
returns true, also do the second as extra assurance.
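
To make the two tests concrete, here is a minimal C sketch of that two-level 
comparison, written against plain POSIX stat() and stdio rather than rsync’s 
actual internals (the function names here are made up for illustration):

    #include <stdio.h>
    #include <string.h>
    #include <sys/stat.h>

    /* Quick test: same size and same mtime? */
    static int quick_match(const char *a, const char *b){
      struct stat sa, sb;
      if( stat(a, &sa) || stat(b, &sb) ) return 0;
      return sa.st_size==sb.st_size && sa.st_mtime==sb.st_mtime;
    }

    /* Thorough test: byte-for-byte identical contents? */
    static int contents_match(const char *a, const char *b){
      FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb");
      int same = (fa!=0 && fb!=0);
      char bufa[8192], bufb[8192];
      size_t na, nb;
      while( same ){
        na = fread(bufa, 1, sizeof(bufa), fa);
        nb = fread(bufb, 1, sizeof(bufb), fb);
        if( na!=nb || memcmp(bufa, bufb, na)!=0 ) same = 0;
        if( na==0 ) break;  /* both files exhausted */
      }
      if( fa ) fclose(fa);
      if( fb ) fclose(fb);
      return same;
    }

    /* Skip the copy when the quick test passes; when mtimes are not
    ** trustworthy, confirm with the thorough test as well. */
    static int can_skip_copy(const char *src, const char *dst, int trustMtime){
      if( !quick_match(src, dst) ) return 0;
      return trustMtime ? 1 : contents_match(src, dst);
    }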

> Would the developers welcome a flag
> to blob_write_to_file in src/blob.c to skip the writing of a new
> artefact file if the file already exists?

In addition to your backup case, it might also benefit snapshotting mechanisms 
found in many virtual machine systems and in some of the more advanced 
filesystems.  (ZFS, btrfs, APFS…)

However, I’ll also give a counterargument to the whole idea: you probably 
aren’t saving anything in the end.  An intelligent deconstruct + backup 
probably saves no net I/O over just re-copying the Fossil repo DB to the 
destination unless the destination is *much* slower than the machine being 
backed up.

(rsync was created for the common case where networks are much slower than the 
computers they connect.  rsync within a single computer is generally no faster 
than cp -r, and sometimes slower, unless you take the mtime optimization 
mentioned above.)

The VM/ZFS + snapshots case has a similar argument against it: if you’re using 
snapshots to back up a Fossil repo, deconstruction isn’t helpful.  The 
snapshot/CoW mechanism will only clone the changed disk blocks in the repo.

So, what problem are you solving?  If it isn’t the slow-networks problem, I 
suspect you’ve got an instance of the premature optimization problem here.  If 
you go ahead and implement it, measure before committing the change, and if you 
measure a meaningful difference, document the conditions to help guide 
expectations.


[fossil-users] Backups of deconstructed fossil repositories

2018-06-17 Thread Thomas Levine
As content is added to a fossil repository, files in the corresponding
deconstructed repository never change; they are only added. Most backup
software will track changes to the deconstructed repository with great
efficiency.

I should thus take my backups of the deconstructed repositories, yes?
That is, should I back up the SQLite database format of the fossil
repository or the deconstructed directory format of the repository?

One inconvenience I noted is that the deconstruct command always writes
artefacts to the filesystem, even if a file of the appropriate name and
size and contents already exists. Would the developers welcome a flag
to blob_write_to_file in src/blob.c to skip the writing of a new
artefact file if the file already exists? That is, rebuild_step in
src/rebuild.c would check for the existence of the file corresponding to
the artefact's hash, and if such a file already exists (even if its
content is wrong), rebuild_step would skip writing this artefact.
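
A rough sketch of the kind of check being proposed, using plain POSIX calls
and a made-up function name rather than Fossil's actual blob.c or rebuild.c
internals:

    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical illustration only; not Fossil's blob_write_to_file().
    ** Write pData/nData to zFilename unless a file of that name already
    ** exists.  Returns the number of bytes written, or 0 if the write
    ** was skipped or failed. */
    static long write_artefact_if_missing(
      const char *zFilename,  /* content-addressed name, i.e. the hash */
      const void *pData,
      long nData
    ){
      FILE *out;
      long nWritten;
      if( access(zFilename, F_OK)==0 ){
        return 0;  /* file already present: trust the name, skip the write */
      }
      out = fopen(zFilename, "wb");
      if( out==0 ) return 0;
      nWritten = (long)fwrite(pData, 1, (size_t)nData, out);
      fclose(out);
      return nWritten;
    }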