Re: [fossil-users] Backup traffic

2018-07-20 Thread Richard Hipp
On 7/20/18, John P. Rouillard  wrote:
> Does a clone/sync grab passwords and user accounts as well? I thought those
> weren't copied in the clone but were private to the repository.

If you have Admin or Setup privilege, you can do "fossil config sync user"
-- 
D. Richard Hipp
d...@sqlite.org


Re: [fossil-users] Backup traffic

2018-07-20 Thread Warren Young
On Jul 20, 2018, at 3:32 PM, John P. Rouillard  wrote:
> 
> Does a clone/sync grab passwords and user accounts as well? I thought those
> weren't copied in the clone but were private to the repository.

You get a copy of the user table *if* you clone while logged in as a user 
with Setup privileges.  It might also work with Admin, but I haven’t checked.

Otherwise, you’re right: Fossil strips the user table contents while cloning, 
on purpose.


Re: [fossil-users] Backup traffic

2018-07-20 Thread John P. Rouillard
Hi all:

Florian Balmer writes:
>Richard Hipp:
>> ... create your backups by cloning and syncing ...
>
>Thank you for your comments.
>
>I see, this completely makes sense.
>
>The process of "restoring" a repository from backup would include
>copying database files, as syncing from backup → original might not
>work if something's gone awry with the original. My main concern here
>is that the cloned backup really includes everything from the original
>(configuration, etc.). But hearing again (haven't you already outlined
>the "cloning as backup strategy" recently, on this list?) that it
>works for the experts should give me the faith to trust it.

Does a clone/sync grab passwords and user accounts as well? I thought those
weren't copied in the clone but were private to the repository.

--
-- rouilj
John Rouillard
===
My employers don't acknowledge my existence much less my opinions.


Re: [fossil-users] Backup traffic

2018-07-20 Thread Warren Young
On Jul 20, 2018, at 5:04 AM, Richard Hipp  wrote:
> 
> create your backups by cloning and syncing

…with Admin privileges.  Otherwise, you won’t get important things like the 
user table.  After the first clone, each backup should consist of both a 
“fossil sync” and a “fossil conf pull all”.

While you could recreate the user *list* from Fossil check-in contents, 
rebuild the user table, and then do whatever dance you normally do to pass 
out user passwords and get them changed to something secure, it’s better to 
just back all of that up to begin with.
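
A minimal sketch of that periodic backup step, assuming an existing clone at 
a hypothetical path and a remote URL that carries Setup-capable credentials 
(both names are made up here):

#!/usr/bin/env python3
# Refresh an existing backup clone: pull the artifacts, then pull the
# configuration areas (settings, skin, user table, ...).
import subprocess

BACKUP_REPO = "/backups/project.fossil"            # assumed existing clone
REMOTE = "https://setup-user@example.org/project"  # assumed Setup-capable URL

def fossil(*args):
    # Run a fossil subcommand against the backup clone; fail loudly on error.
    subprocess.run(["fossil", *args, "-R", BACKUP_REPO], check=True)

fossil("sync", REMOTE)
fossil("configuration", "pull", "all", REMOTE)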


Re: [fossil-users] Backup traffic

2018-07-20 Thread Warren Young
On Jul 20, 2018, at 2:12 AM, Florian Balmer  wrote:
> 
> There's a lot of backup traffic

Quantify “a lot.”  

Do you have benchmark numbers showing that the current load is too high, and 
that your wished-for changes will reduce load to acceptable levels?

> (This was also the main reason for my complaining about the leftover
> WAL and SHM files, recently, which accumulated in my backup logs.
> Because in the end, WAL and SHM have to be kept together with the
> SQLite database, as they might contain valuable information?)

The greater concern is that if these files are present after all clients have 
disconnected from the DB, it means you’ve got a DB client that is dying without 
closing the DB properly.  That’s a problem in its own right, but it might also 
mean that the last transaction never made it into the journal before the 
program died, so it’s effectively rolled back when the journal is replayed.

Rather than worry over the resulting WAL size, I’d find out why the DB client 
is dying early and fix that, so that the WAL ends up being deleted entirely 
upon a clean DB shutdown.
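
To see that behaviour in isolation (plain SQLite via Python, with a 
hypothetical throwaway file name, not Fossil itself): the -wal and -shm files 
exist while a WAL-mode connection is open, and disappear once the last 
connection closes cleanly.

#!/usr/bin/env python3
import os
import sqlite3

DB = "demo.sqlite"   # throwaway database, nothing to do with a Fossil repo

conn = sqlite3.connect(DB)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE IF NOT EXISTS t(x)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
print("open:  ", [f for f in (DB + "-wal", DB + "-shm") if os.path.exists(f)])

conn.close()   # clean shutdown: SQLite checkpoints and removes the WAL/SHM files
print("closed:", [f for f in (DB + "-wal", DB + "-shm") if os.path.exists(f)])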

> From peeking at the Fossil timeline, my question is, will the new
> "backoffice processing" cause even more frequent updates to the main
> repository database, i.e. with the pids stored in the configuration
> table, and updated after each web page display?

How many checkins, syncs, etc. do you have per day?

I find it odd that some people get so itchy over DB concurrency and such with 
Fossil when highly active projects might have 40 or so commits per day.  
Amortized evenly over an 8-hour work day, that’s only one every 12 minutes.  
With real-world bursty traffic, there’s still an excellent chance that on every 
DB update, there is no write contention at all.

> Does anybody care about the repository
> database, holding all your valuable contents, being modified
> frequently with simple non-contents state information?

If I didn’t trust it to withstand that, I wouldn’t trust it to hold my unique 
work products, either.


Re: [fossil-users] Backup traffic

2018-07-20 Thread Florian Balmer
Just one more thought:

Copying database files (as opposed to cloning) also preserves any hand-made
meta changes; for example, I remember adjusting the page size and journal
mode of older repositories when the defaults for new Fossil repositories
were changed.

Of course, `fossil rebuild --wal' after the sync can help with things
like these, but the database file checksum will definitely change and
trigger a complete backup of the rebuilt repository.

I think I need to come away from my traditional "copy a file and get
exactly what you had" way of thinking ...

--Florian


Re: [fossil-users] Backup traffic

2018-07-20 Thread Florian Balmer
Richard Hipp:

> ... create your backups by cloning and syncing ...

Thank you for your comments.

I see, this completely makes sense.

The process of "restoring" a repository from backup would include
copying database files, as syncing from backup → original might not
work if something's gone awry with the original. My main concern here
is that the cloned backup really includes everything from the original
(configuration, etc.). But hearing again (haven't you already outlined
the "cloning as backup strategy" recently, on this list?) that it
works for the experts should give me the faith to trust it.

Backing up "hot" databases is currently not a concern with my private,
traditional-style CGI-served repositories.

I would like to have some "rotating" backup, with a way to step back
through complete repository states, i.e. day-by-day, for up to one
week, so I could catch the "last good" one if I notice something
wrong. Copying and then replacing duplicate files with hard links is
an extremely straightforward and space-efficient way to achieve this.
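
A rough sketch of that rotation (all paths hypothetical, and it assumes the
repositories are not being written while the copy runs, per the caveat
elsewhere in this thread about copying live databases):

#!/usr/bin/env python3
# One dated snapshot directory per day; files identical to yesterday's copy
# are hard-linked, changed or new files are copied fresh.
import filecmp
import os
import shutil
from datetime import date, timedelta

SOURCE = "/srv/fossil"     # directory holding the *.fossil repositories
BACKUPS = "/backups"       # one dated subdirectory per day

today = os.path.join(BACKUPS, date.today().isoformat())
yesterday = os.path.join(BACKUPS, (date.today() - timedelta(days=1)).isoformat())
os.makedirs(today, exist_ok=True)

for name in os.listdir(SOURCE):
    src = os.path.join(SOURCE, name)
    if not os.path.isfile(src):
        continue
    old = os.path.join(yesterday, name)
    new = os.path.join(today, name)
    if os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
        os.link(old, new)       # identical to yesterday: share one copy
    else:
        shutil.copy2(src, new)  # changed or new: take a full copy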

I will try the same approach with cloning new repositories (some extra
logic required) and syncing existing ones. But it may not be possible
to detect unchanged / duplicate repository database files this way, as
some internally stored last-sync or last-access time stamps might
always result in a different database file, I assume.

--Florian


Re: [fossil-users] Backup traffic

2018-07-20 Thread Richard Hipp
On 7/20/18, Florian Balmer  wrote:
> But what is a good
> strategy to minimize backup traffic, if repository databases change
> that often?
>

Don't back up by copying the database file.  That is not safe to do
anyhow, unless you shut down Fossil during the copy, because otherwise
the database file might change while it is being copied, resulting in
a corrupt copy.  Instead, create your backups by cloning and syncing.
That is what DVCSes are designed to do.

The canonical Fossil self-hosting repository, and the SQLite source
repository that Fossil was created to manage, are both backed up this
way.  There are three separate servers, each in a separate,
geographically distributed data center, managed by two independent
ISPs.  These repos are all synced with one another automatically by a
cron job.

One cool bonus feature of this approach is that the "backups" are live
repositories that can be directly accessed (as
https://www2.fossil-scm.org/ and
https://www3.fossil-scm.org/site.cgi), so it is easy to verify that
the backups are really happening and that they are correct.
-- 
D. Richard Hipp
d...@sqlite.org


Re: [fossil-users] Backup traffic

2018-07-20 Thread Florian Balmer
Stephan Beal:

> .. i'm not sure that i like it enough to justify the idea of
> maintaining two files where one file is sufficient.

The current implementation uses a single cookie database shared by
all repositories in the same directory, which can be excluded from
backups, and deleted (or, better, emptied by an SQL script) to log
everybody off. But it's possible to modify the code to use a single
cookie database per system, or one per repository.

> .. the login cookie db could become a point of locking contention ...

Would WAL mode prevent this, mostly?
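
(As a rough illustration of what WAL buys here: readers and a writer on the
same file do not block each other, so contention would essentially be limited
to concurrent writers. A standalone sketch with a made-up schema and file
name, unrelated to the patch below:)

#!/usr/bin/env python3
import sqlite3

DB = "cookie-demo.sqlite"

writer = sqlite3.connect(DB)
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("PRAGMA busy_timeout=5000")   # wait up to 5s on a busy db
writer.execute("CREATE TABLE IF NOT EXISTS user(uid INTEGER, cookie TEXT)")
writer.execute("BEGIN IMMEDIATE")            # hold the write lock open
writer.execute("INSERT INTO user VALUES (1, 'abc')")

reader = sqlite3.connect(DB)
reader.execute("PRAGMA busy_timeout=5000")
# In WAL mode this read proceeds even while the write transaction is open.
print(reader.execute("SELECT count(*) FROM user").fetchall())

writer.commit()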

Below are my current "works for me on Windows and FreeBSD" patches. I
hope we still have the same definitions of "surprisingly simple" :)

The 2nd patch is only required with my previous patch to change ETag
generation to produce a "login-time-sensitive" hash.

I'm sorry e-mail processing may insert one or two unwanted line breaks
after column 72, as Fossil seems to use a source code line length
limit of 80 chars.

Some notes are included directly with the patch file headers, but I'd
like to emphasize that I haven't bothered making things work with
login groups, so far.

I'd be happy to do more work towards a more generalized "separate
(shared) database for non-repository contents, such as 'volatile' or
'system-specific' state information" approach, should this be
considered interesting for Fossil.

--Florian

= Patch for Fossil [e08f9c04] ==

Baseline: Fossil [e08f9c0423]

Proof-of-concept to outsource login cookie information to a separate
database named "cookiestore", saved as "fossil-cookiestore.sqlite" in
the directory of the main repository database, and attached on demand.

The "cookiestore" database is left attached until shutdown; it may be
safer to have it detached explicitly as soon as possible.

HTTP cache handlers, and any other code relying on "user.cexpire", must
query "cookiestore.user.cexpire" instead.

Support to share login credentials across login groups is not
implemented by this patch; in fact, this may even break login group
features.

Admins changing their own password through the /setup_uedit page (not
through the /login page) are no longer logged out automatically.

To prevent writes to the main repository database caused by read-only
web server access, the "PRAGMA optimize" call needs to be removed, and
the "access_log" feature needs to be disabled (the logs could be
recorded to a plain text file, or outsourced to a separate database, if
required).

There may be more elegant SQL queries to work with the connected tables,
either by using JOINs, or FOREIGN KEYs (yet the latter have been
disabled by Fossil).

Windows batch file to dump or tweak the "cookiestore" database:

:: @echo off
:: setlocal
:: set c=fossil-cookiestore.sqlite
:: if not exist "%c%" goto:eof
:: (
:: echo ATTACH '%c%' AS 'c';
:: echo -- PRAGMA c.journal_mode;
:: echo -- PRAGMA c.page_size;
:: echo -- PRAGMA c.auto_vacuum;
:: echo SELECT * FROM c.user;
:: echo -- UPDATE c.user SET cexpire=0;
:: ) | fossil sql --no-repository

Index: src/login.c
==
--- src/login.c
+++ src/login.c
@@ -143,10 +143,53 @@
 */
 static char *abbreviated_project_code(const char *zFullCode){
   return mprintf("%.16s", zFullCode);
 }

+/*
+** Attach the fossil-cookiestore.sqlite db to store login cookies.
+*/
+void attach_cookiestore()
+{
+  static int attached_cookiestore = 0;
+  char *zDBName;
+  Blob bDBFullName;
+  char *zProjCode;
+
+  if (attached_cookiestore) return;
+
+  zDBName = mprintf("%s/../fossil-cookiestore.sqlite",g.zRepositoryName);
+  file_canonical_name(zDBName,&bDBFullName,0);
+  sqlite3_free(zDBName);
+  db_attach(blob_str(&bDBFullName),"cookiestore");
+  blob_reset(&bDBFullName);
+
+  /* Initialize */
+  db_multi_exec(
+"CREATE TABLE IF NOT EXISTS cookiestore.user( "
+"repo TEXT, uid INTEGER, login TEXT, "
+"cookie TEXT, ipaddr TEXT, cexpire DATETIME,"
+"PRIMARY KEY (repo, uid), "
+"UNIQUE (repo, uid, login) ON CONFLICT REPLACE );");
+  /* Clear expired cookies */
+  zProjCode = db_get("project-code",NULL);
+  db_multi_exec(
+"DELETE FROM cookiestore.user WHERE "
+"repo=%Q AND cexpire 0) && "Invalid user data.");
+  attach_cookiestore();
   zHash = db_text(0,
-  "SELECT cookie FROM user"
-  " WHERE uid=%d"
+  "SELECT cookie FROM cookiestore.user"
+  " WHERE repo=%Q AND uid=%d"
   "   AND ipaddr=%Q"
   "   AND cexpire>julianday('now')"
   "   AND length(cookie)>30",
-  uid, zRemoteAddr);
+  zProjCode, uid, zRemoteAddr);
   if( zHash==0 ) zHash = db_text(0, "SELECT hex(randomblob(25))");
   zCookie = login_gen_user_cookie_value(zUsername, zHash);
   cgi_set_cookie(zCookieName, zCookie, login_cookie_path(), expires);
   record_login_attempt(zUsername, zIpAddr, 1);
   db_multi_exec(
-"UPDATE user SET cookie=%Q, ipaddr=%Q, "
-"  cexpire=julianday('now')+%d/86400.0 WHERE uid=%d",
-

Re: [fossil-users] Backup traffic

2018-07-20 Thread Stephan Beal
On Fri, Jul 20, 2018 at 10:13 AM Florian Balmer 
wrote:

> I have created a (surprisingly simple) patch to attach a separate
> login cookie database (shared among all repositories in the same
> directory), so that plain login and logout actions will no longer
> cause repository database writes. With admin and user logs turned off,
> and "PRAGMA optimize" removed, the repository database is only touched
> if there's new contents, or new configuration settings.
>
> What are your comments on this? Does anybody care about the repository
> database, holding all your valuable contents, being modified
> frequently with simple non-contents state information?


This behaviour doesn't bother me at all (in 10 years of using Fossil), but
if a patch for working around it is simple and non-intrusive, i would
consider it to be an interesting feature (with the caveat that it might
impact future changes).

i conceptually like the idea of the login cookie/timestamps being in a
separate db, but i'm not sure that i like it enough to justify the idea of
maintaining two files where one file is sufficient. That wouldn't really
impact me much, as i keep all of my hosted .fsl files in one directory, but
for a hoster like chiselapp, where each repo is (probably) in its own
directory, it doubles the number of fossil-related files. One _potential_
problem i see, but it's largely hypothetical, is that the login cookie db
could become a point of locking contention if it is used together with many
very active .fsl files. That is probably only possible if several of those
repos are _extremely_ active, though.

-- 
- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of
those who insist on a perfect world, freedom will have to do." -- Bigby Wolf


[fossil-users] Backup traffic

2018-07-20 Thread Florian Balmer
As much as I like the simplicity of keeping the full repository
history in one single SQLite database, I see a minor downside.

There's a lot of backup traffic if "non-contents changes" (such as
the admin and user logs, the login cookies, but also the "PRAGMA
optimize" information) cause updates to the repository database,
marking it dirty for the next backup cycle.

(This was also the main reason for my complaining about the leftover
WAL and SHM files, recently, which accumulated in my backup logs.
Because in the end, WAL and SHM have to be kept together with the
SQLite database, as they might contain valuable information?)

From peeking at the Fossil timeline, my question is, will the new
"backoffice processing" cause even more frequent updates to the main
repository database, i.e. with the pids stored in the configuration
table, and updated after each web page display?

I have created a (surprisingly simple) patch to attach a separate
login cookie database (shared among all repositories in the same
directory), so that plain login and logout actions will no longer
cause repository database writes. With admin and user logs turned off,
and "PRAGMA optimize" removed, the repository database is only touched
if there's new contents, or new configuration settings.

What are your comments on this? Does anybody care about the repository
database, holding all your valuable contents, being modified
frequently with simple non-contents state information? Given the
reliability of SQLite, we probably shouldn't care. But what is a good
strategy to minimize backup traffic, if repository databases change
that often?

--Florian