Re: [fossil-users] Backup traffic
Richard Hipp:

> I think this problem has been addressed in a more general way
> on the latest trunk. Please let me know if you find otherwise.

This works fine (tested only on Windows, so far), thank you very much!

--Florian
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] Backup traffic
On 7/21/18, Florian Balmer wrote:
>
> The current tip version of Fossil still
> exhibits the behavior summarized here:
>
> https://www.mail-archive.com/fossil-users@lists.fossil-scm.org/msg27269.html
>

I think this problem has been addressed in a more general way on the latest trunk. Please let me know if you find otherwise.

--
D. Richard Hipp
d...@sqlite.org
Re: [fossil-users] Backup traffic
> There's been some changes to fossil_exit() in the meantime, I'll check
> these, and report back here.

I was wrong, the changes were to fossil_fatal() and fossil_panic(), and not to fossil_exit().

The current tip version of Fossil still exhibits the behavior summarized here:

https://www.mail-archive.com/fossil-users@lists.fossil-scm.org/msg27269.html

--Florian
Re: [fossil-users] Backup traffic
Warren Young:

> Quantify “a lot.”

I have some rarely committed-to but frequently web-accessed repositories (with login), and I see daily backups of the modified repository database, even though I'm sure I haven't committed anything. It's like "hey, what's going on there with my babies?" every time, but I need to get used to it.

> I’d find out why the DB client is dying early and fix that, so that
> the WAL ends up being deleted entirely upon a clean DB shutdown.

I think I found it:

https://www.mail-archive.com/fossil-users@lists.fossil-scm.org/msg27269.html

There's a call to fossil_exit() from within a db_step()...db_finalize() block, and calling fossil_exit() only after db_finalize() fixed it.

There's been some changes to fossil_exit() in the meantime, I'll check these, and report back here.

> I find it odd that some people get so itchy over DB concurrency and
> such with Fossil when highly active projects might have 40 or so
> commits per day.

I'm not worried by this. Stephan just wondered if a shared cookie database may be prone to lock contention, if I got him right. I'd assume the main bottleneck to be high-frequency, read-only, no-login web access (for a renowned project), in which case the cookie database doesn't even need to be attached, and not the frequency of commits.

--Florian
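The leftover-WAL symptom discussed above can be reproduced outside Fossil. The following is only a sketch using Python's standard sqlite3 module, not Fossil's code: a child process (a stand-in for a client that dies early) commits in WAL mode and then exits via os._exit() without closing the database, after which a second connection opens and closes the file cleanly. The file name demo.sqlite and table t are made up for the illustration, and the mechanism differs from the fossil_exit() case, where an unfinalized statement keeps the connection from closing.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Child script: commit in WAL mode, then exit without closing the connection.
UNCLEAN_WRITER = """
import os, sqlite3, sys
conn = sqlite3.connect(sys.argv[1])
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE IF NOT EXISTS t(x)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()
os._exit(0)  # skip all cleanup, so the database is never closed
"""


def wal_left_behind(db_path):
    """Return True if a -wal file sits next to the database file."""
    return os.path.exists(db_path + "-wal")


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.sqlite")  # hypothetical file name

        # 1. A client that dies early leaves the WAL file on disk.
        subprocess.run([sys.executable, "-c", UNCLEAN_WRITER, db_path], check=True)
        after_unclean_exit = wal_left_behind(db_path)

        # 2. The committed row is still there; closing the last connection
        #    cleanly checkpoints the WAL and removes the file.
        conn = sqlite3.connect(db_path)
        rows = conn.execute("SELECT count(*) FROM t").fetchone()[0]
        conn.close()
        after_clean_close = wal_left_behind(db_path)

        return after_unclean_exit, rows, after_clean_close


if __name__ == "__main__":
    print(demo())
```

In this sketch the committed row survives the unclean exit; what is lost is only the cleanup of the WAL file, which matches the observation that the files pile up in backup logs.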
Re: [fossil-users] Backup traffic
On 7/20/18, John P. Rouillard wrote:
> Does a clone/sync grab passwords and user accounts as well? I thought those
> weren't copied in the clone but were private to the repository.

If you have Admin or Setup privilege, you can do "fossil config sync user"

--
D. Richard Hipp
d...@sqlite.org
Re: [fossil-users] Backup traffic
On Jul 20, 2018, at 3:32 PM, John P. Rouillard wrote:
>
> Does a clone/sync grab passwords and user accounts as well? I thought those
> weren't copied in the clone but were private to the repository.

You get a copy of the users table *if* you clone while logged in with a user with Setup privileges. It might also work with Admin, but I haven’t checked.

Otherwise, you’re right: Fossil strips the user table contents while cloning, on purpose.
Re: [fossil-users] Backup traffic
Hi all:

In message , Florian Balmer writes:
>Richard Hipp:
>> ... create your backups by cloning and syncing ...
>
>Thank you for your comments.
>
>I see, this completely makes sense.
>
>The process of "restoring" a repository from backup would include
>copying database files, as syncing from backup → original might not
>work if something's gone awry with the original. My main concern here
>is that the cloned backup really includes everything from the original
>(configuration, etc.). But hearing again (haven't you already outlined
>the "cloning as backup strategy" recently, on this list?) that it
>works for the experts should give me the faith to trust it.

Does a clone/sync grab passwords and user accounts as well? I thought those weren't copied in the clone but were private to the repository.

--
-- rouilj
John Rouillard
===
My employers don't acknowledge my existence much less my opinions.
Re: [fossil-users] Backup traffic
On Jul 20, 2018, at 5:04 AM, Richard Hipp wrote:
>
> create your backups by cloning and syncing

…with Admin privileges. Otherwise, you won’t get important things like the user table.

After the first clone, each backup should consist of both a “fossil sync” as well as a “fossil conf pull all”.

While you can recreate the user *list* from Fossil checkin contents and then recreate the users table and do whatever dance it is you do to pass out user passwords and get them changed to something secure, it’s better to just back all that up to begin with.
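The clone-then-sync routine described above could be scripted roughly as follows. This is a sketch, not a vetted backup tool: the remote URL and backup path are placeholders, the URL is assumed to carry a login with Admin or Setup privilege, and the use of the -R option to select the backup repository for "sync" and "configuration pull" is my assumption about the command line, to be checked against "fossil help" for your version.

```python
import os
import subprocess


def backup_commands(remote_url, backup_path, fossil="fossil"):
    """Return the fossil command lines for one backup run.

    First run: clone the remote into backup_path.
    Later runs: sync content, then pull all configuration areas.
    """
    if not os.path.exists(backup_path):
        return [[fossil, "clone", remote_url, backup_path]]
    return [
        [fossil, "sync", "-R", backup_path],
        # "conf" in the message above is the abbreviated form of "configuration".
        [fossil, "configuration", "pull", "all", "-R", backup_path],
    ]


def run_backup(remote_url, backup_path):
    """Execute the commands in order, stopping at the first failure."""
    for argv in backup_commands(remote_url, backup_path):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Placeholder values; substitute a real admin URL and target path.
    for argv in backup_commands("https://admin@example.org/repo", "/backup/repo.fossil"):
        print(" ".join(argv))
```

Keeping the command construction separate from execution makes it possible to review what would run, for example from a cron job, before any repository is touched.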
Re: [fossil-users] Backup traffic
On Jul 20, 2018, at 2:12 AM, Florian Balmer wrote:
>
> There's a lot of backup traffic

Quantify “a lot.” Do you have benchmark numbers showing that the current load is too high, and that your wished-for changes will reduce load to acceptable levels?

> (This was also the main reason for my complaining about the leftover
> WAL and SHM files, recently, which accumulated in my backup logs.
> Because in the end, WAL and SHM have to be kept together with the
> SQLite database, as they might contain valuable information?)

The greater concern is that if these files are present after all clients have disconnected from the DB, it means you’ve got a DB client that is dying without closing the DB properly. That’s a problem in its own right, but it might also mean that the last transaction run might not have hit the journal before the program died, so it’s effectively rolled back upon replay of the journal.

Rather than worry over the resulting WAL size, I’d find out why the DB client is dying early and fix that, so that the WAL ends up being deleted entirely upon a clean DB shutdown.

> From peeking at the Fossil timeline, my question is, will the new
> "backoffice processing" cause even more frequent updates to the main
> repository database, i.e. with the pids stored in the configuration
> table, and updated after each web page display?

How many checkins, syncs, etc. do you have per day?

I find it odd that some people get so itchy over DB concurrency and such with Fossil when highly active projects might have 40 or so commits per day. Amortized evenly over an 8-hour work day, that’s only one every 12 minutes. With real-world bursty traffic, there’s still an excellent chance that on every DB update, there is no write contention at all.

> Does anybody care about the repository
> database, holding all your valuable contents, being modified
> frequently with simple non-contents state information?

If I didn’t trust it to withstand that, I wouldn’t trust it to hold my unique work products, either.
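The 12-minute figure above is plain arithmetic, spelled out here as a small sketch. The second function goes one step beyond the message: it estimates what fraction of the working day the database would be write-locked, under the purely hypothetical assumption that each write holds the lock for 0.1 seconds.

```python
def mean_minutes_between(events_per_day, active_hours):
    """Average gap in minutes if events are spread evenly over the active hours."""
    return active_hours * 60 / events_per_day


def write_locked_fraction(events_per_day, active_hours, seconds_per_write):
    """Share of the active period during which a write lock is held."""
    return events_per_day * seconds_per_write / (active_hours * 3600)


if __name__ == "__main__":
    # 40 commits amortized over an 8-hour work day.
    print(mean_minutes_between(40, 8))  # 12.0
    # 0.1 s per write is an assumed value, not a measurement.
    print(write_locked_fraction(40, 8, 0.1))
```

Even with a lock time ten times longer than assumed, the locked share of the day would stay far below one percent, which is the point being made about write contention.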
Re: [fossil-users] Backup traffic
Just one more thought:

Copying database files (vs. cloning) also includes any hand-made meta changes, for example I remember adjusting the page size and journal mode for older repositories, when the defaults for new Fossil repositories were changed.

Of course `fossil rebuild --wal' after the sync can help with things like these, but the database file checksum will definitely change and trigger a complete backup, for the rebuilt repository.

I think I need to come away from my traditional "copy a file and get exactly what you had" way of thinking ...

--Florian
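Page size and journal mode are properties of the SQLite file itself, which is why a file copy carries them along. A sketch of how to inspect them with Python's standard sqlite3 module follows; the file name repo.sqlite and the value 8192 are arbitrary choices for the demonstration and say nothing about Fossil's defaults.

```python
import os
import sqlite3
import tempfile


def file_level_settings(db_path):
    """Return (page_size, journal_mode) as stored in the database file."""
    conn = sqlite3.connect(db_path)
    try:
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]
        journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    finally:
        conn.close()
    return page_size, journal_mode


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "repo.sqlite")  # hypothetical file name
        conn = sqlite3.connect(path)
        # The page size has to be chosen before the file is first written.
        conn.execute("PRAGMA page_size=8192")
        conn.execute("CREATE TABLE t(x)")
        # WAL mode is persistent: it is recorded in the file header.
        conn.execute("PRAGMA journal_mode=WAL")
        conn.close()
        # A fresh connection reads both settings back from the file.
        return file_level_settings(path)


if __name__ == "__main__":
    print(demo())
```

Running the same check against the original and the backup repository is one way to see whether such hand-made adjustments have to be reapplied after a restore.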
Re: [fossil-users] Backup traffic
Richard Hipp:

> ... create your backups by cloning and syncing ...

Thank you for your comments.

I see, this completely makes sense.

The process of "restoring" a repository from backup would include copying database files, as syncing from backup → original might not work if something's gone awry with the original. My main concern here is that the cloned backup really includes everything from the original (configuration, etc.). But hearing again (haven't you already outlined the "cloning as backup strategy" recently, on this list?) that it works for the experts should give me the faith to trust it.

Backing up "hot" databases is currently not a concern with my private, traditional-style CGI-served repositories.

I would like to have some "rotating" backup, with a way to go back certain steps with the complete repository, i.e. day-by-day, for up to one week, so I could catch the "last good" if I notice something wrong. Copying and replacing duplicate files with hard-links is an extremely straightforward and space-efficient process to achieve this.

I will try the same with cloning new (some extra logic required) and syncing existing repositories. But it may not be possible to detect unchanged / duplicate repository database files, like this, as some internally stored last sync or URL last access time stamps might always result in a different database file, I assume.

--Florian
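The hard-link rotation described above might look like the sketch below. It is an illustration under assumptions, not the script actually in use: snapshots are compared by SHA-256 checksum, the function names and the day1/day2/day3 layout are invented, and it only applies when source and snapshots live on a file system that supports hard links.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Checksum a file in chunks so large repositories need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_snapshot(source, previous, target):
    """Create target from source; return True if it was hard-linked.

    If the previous snapshot has the same checksum as the source, the new
    snapshot becomes a hard link to it and uses no additional space.
    """
    if previous and os.path.exists(previous) and sha256_of(previous) == sha256_of(source):
        os.link(previous, target)
        return True
    shutil.copy2(source, target)
    return False


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "repo.fossil")
        days = [os.path.join(tmp, name) for name in ("day1", "day2", "day3")]
        with open(source, "wb") as handle:
            handle.write(b"unchanged repository bytes")
        first = store_snapshot(source, None, days[0])      # nothing to compare with
        second = store_snapshot(source, days[0], days[1])  # identical, so linked
        with open(source, "ab") as handle:
            handle.write(b" plus a cookie update")
        third = store_snapshot(source, days[1], days[2])   # changed, so copied
        return first, second, third, os.path.samefile(days[0], days[1])


if __name__ == "__main__":
    print(demo())
```

The third snapshot shows the concern raised in the message: any byte that changes in the database file, even a login cookie, forces a full new copy.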
Re: [fossil-users] Backup traffic
On 7/20/18, Florian Balmer wrote:
> But what is a good
> strategy to minimize backup traffic, if repository databases change
> that often?
>

Don't back up by copying the database file (which is not safe to do anyhow, unless you shut down Fossil during the copy, because otherwise the database file might change while it is being copied, resulting in a corrupt copy).

Instead, create your backups by cloning and syncing. That is what DVCSes are designed to do.

The canonical Fossil self-hosting repository, and the SQLite source repository that Fossil was created to manage, are both backed up this way. There are three separate servers, each in separate geographically distributed data centers, managed by two independent ISPs. These repos are all synced with one another automatically using a cron-job.

One cool bonus feature of this approach is that the "backups" are live repositories, that can be directly accessed (as https://www2.fossil-scm.org/ and https://www3.fossil-scm.org/site.cgi) so it is easy to verify that the backups are really happening and that they are correct.

--
D. Richard Hipp
d...@sqlite.org
Re: [fossil-users] Backup traffic
Stephan Beal:

> .. i'm not sure that i like it enough to justify the idea of
> maintaining two files where one file is sufficient.

The current implementation uses one single cookie database shared for all repositories in the same directory, which can be excluded from backups, and deleted (or, better, emptied by SQL script) to have everybody logged off. But it's possible to modify the code to use one single cookie database per system, or per repository.

> .. the login cookie db could become a point of locking contention ...

Would WAL mode prevent this, mostly?

Below are my current "works for me on Windows and FreeBSD" patches. I hope we still have the same definitions of "surprisingly simple" :)

The 2nd patch is only required with my previous patch to change ETag generation to produce a "login-time-sensitive" hash.

I'm sorry e-mail processing may insert one or two unwanted line breaks after column 72, as Fossil seems to use a source code line length limit of 80 chars.

Some notes are included directly with the patch file headers, but I'd like to emphasize that I haven't bothered making things work with login groups, so far.

I'd be happy to do more work towards a more generalized "separate (shared) database for non-repository contents, such as 'volatile' or 'system-specific' state information" approach, should this be considered interesting for Fossil.

--Florian

= Patch for Fossil [e08f9c04] ==

Baseline: Fossil [e08f9c0423]

Proof-of-concept to outsource login cookie information to a separate database named "cookiestore", saved as "fossil-cookiestore.sqlite" in the directory of the main repository database, and attached on demand.

The "cookiestore" database is left attached until shutdown; it may be safer to have it detached explicitly as soon as possible.

HTTP cache handlers, and any other code relying on "user.cexpire", must query "cookiestore.user.cexpire", instead.

Support to share login credentials across login groups is not implemented by this patch; in fact, this may even break login group features.

Admins changing their own password through the /setup_uedit page (not through the /login page) are no longer logged out automatically.

To prevent writes to the main repository database caused by read-only web server access, the "PRAGMA optimize" call needs to be removed, and the "access_log" feature needs to be disabled (the logs could be recorded to a plain text file, or outsourced to a separate database, if required).

There may be more elegant SQL queries to work with the connected tables, either by using JOINs, or FOREIGN KEYs (yet the latter have been disabled by Fossil).

Windows batch file to dump or tweak the "cookiestore" database:

:: @echo off
:: setlocal
:: set c=fossil-cookiestore.sqlite
:: if not exist "%c%" goto:eof
:: (
:: echo ATTACH '%c%' AS 'c';
:: echo -- PRAGMA c.journal_mode;
:: echo -- PRAGMA c.page_size;
:: echo -- PRAGMA c.auto_vacuum;
:: echo SELECT * FROM c.user;
:: echo -- UPDATE c.user SET cexpire=0;
:: ) | fossil sql --no-repository

Index: src/login.c
==
--- src/login.c
+++ src/login.c
@@ -143,10 +143,53 @@
 */
 static char *abbreviated_project_code(const char *zFullCode){
   return mprintf("%.16s", zFullCode);
 }

+/*
+** Attach the fossil-cookiestore.sqlite db to store login cookies.
+*/
+void attach_cookiestore()
+{
+  static int attached_cookiestore = 0;
+  char *zDBName;
+  Blob bDBFullName;
+  char *zProjCode;
+
+  if (attached_cookiestore) return;
+
+  zDBName = mprintf("%s/../fossil-cookiestore.sqlite",g.zRepositoryName);
+  file_canonical_name(zDBName,&bDBFullName,0);
+  sqlite3_free(zDBName);
+  db_attach(blob_str(&bDBFullName),"cookiestore");
+  blob_reset(&bDBFullName);
+
+  /* Initialize */
+  db_multi_exec(
+    "CREATE TABLE IF NOT EXISTS cookiestore.user( "
+    "repo TEXT, uid INTEGER, login TEXT, "
+    "cookie TEXT, ipaddr TEXT, cexpire DATETIME,"
+    "PRIMARY KEY (repo, uid), "
+    "UNIQUE (repo, uid, login) ON CONFLICT REPLACE );");
+  /* Clear expired cookies */
+  zProjCode = db_get("project-code",NULL);
+  db_multi_exec(
+    "DELETE FROM cookiestore.user WHERE "
+    "repo=%Q AND cexpire 0) && "Invalid user data.");
+  attach_cookiestore();
   zHash = db_text(0,
-      "SELECT cookie FROM user"
-      " WHERE uid=%d"
+      "SELECT cookie FROM cookiestore.user"
+      " WHERE repo=%Q AND uid=%d"
       " AND ipaddr=%Q"
       " AND cexpire>julianday('now')"
       " AND length(cookie)>30",
-      uid, zRemoteAddr);
+      zProjCode, uid, zRemoteAddr);
   if( zHash==0 ) zHash = db_text(0, "SELECT hex(randomblob(25))");
   zCookie = login_gen_user_cookie_value(zUsername, zHash);
   cgi_set_cookie(zCookieName, zCookie, login_cookie_path(), expires);
   record_login_attempt(zUsername, zIpAddr, 1);
   db_multi_exec(
-    "UPDATE user SET cookie=%Q, ipaddr=%Q, "
-    " cexpire=julianday('now')+%d/86400.0 WH
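The idea behind the patch, keeping login cookies in an attached database so that logins do not rewrite the repository file, can be tried in isolation. The sketch below uses Python's standard sqlite3 module rather than Fossil's db_* helpers; the table definition is taken from the patch, while the stand-in "config" table, the project code "abc123" and the sample user are invented for the demonstration.

```python
import hashlib
import os
import sqlite3
import tempfile


def file_digest(path):
    """SHA-256 of a file, used to detect any change to the repository file."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        repo_path = os.path.join(tmp, "repo.fossil")
        store_path = os.path.join(tmp, "fossil-cookiestore.sqlite")

        # Stand-in for a repository database (not Fossil's real schema).
        conn = sqlite3.connect(repo_path)
        conn.execute("CREATE TABLE config(name TEXT PRIMARY KEY, value TEXT)")
        conn.execute("INSERT INTO config VALUES ('project-code', 'abc123')")
        conn.commit()
        conn.close()
        before = file_digest(repo_path)

        # "Login": attach the cookie store and write only to it.
        conn = sqlite3.connect(repo_path)
        conn.execute("ATTACH DATABASE ? AS cookiestore", (store_path,))
        conn.execute(
            "CREATE TABLE IF NOT EXISTS cookiestore.user( "
            "repo TEXT, uid INTEGER, login TEXT, "
            "cookie TEXT, ipaddr TEXT, cexpire DATETIME, "
            "PRIMARY KEY (repo, uid), "
            "UNIQUE (repo, uid, login) ON CONFLICT REPLACE )"
        )
        conn.execute(
            "INSERT INTO cookiestore.user(repo, uid, login, cookie, ipaddr, cexpire) "
            "VALUES (?, ?, ?, hex(randomblob(25)), ?, julianday('now') + 1)",
            ("abc123", 1, "florian", "127.0.0.1"),
        )
        conn.commit()
        valid = conn.execute(
            "SELECT count(*) FROM cookiestore.user "
            "WHERE repo = ? AND cexpire > julianday('now')",
            ("abc123",),
        ).fetchone()[0]
        conn.close()

        # The repository file is byte-for-byte unchanged after the login.
        return before == file_digest(repo_path), valid


if __name__ == "__main__":
    print(demo())
```

Because only the attached file is written, a checksum-based backup would see the repository as unchanged after logins. Whether sharing one cookie store among many busy repositories causes lock contention is a separate question that this sketch does not address.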
Re: [fossil-users] Backup traffic
On Fri, Jul 20, 2018 at 10:13 AM Florian Balmer wrote:

> I have created a (surprisingly simple) patch to attach a separate
> login cookie database (shared among all repositories in the same
> directory), so that plain login and logout actions will no longer
> cause repository database writes. With admin and user logs turned off,
> and "PRAGMA optimize" removed, the repository database is only touched
> if there's new contents, or new configuration settings.
>
> What's your comments to this? Does anybody care about the repository
> database, holding all your valuable contents, being modified
> frequently with simple non-contents state information?

This behaviour doesn't bother me at all (in 10 years of using Fossil), but if a patch for working around it is simple and non-intrusive, i would consider it to be an interesting feature (with the caveat that it might impact future changes). i conceptually like the idea of the login cookie/timestamps being in a separate db, but i'm not sure that i like it enough to justify the idea of maintaining two files where one file is sufficient. That wouldn't really impact me much, as i keep all of my hosted .fsl files in one directory, but for a hoster like chiselapp, where each repo is (probably) in its own directory, it doubles the number of fossil-related files.

One _potential_ problem i see, but it's largely hypothetical, is that the login cookie db could become a point of locking contention if it is used together with many very active .fsl files. That is probably only possible if several of those repos are _extremely_ active, though.

--
- stephan beal
http://wanderinghorse.net/home/stephan/
"Freedom is sloppy. But since tyranny's the only guaranteed byproduct of those who insist on a perfect world, freedom will have to do." -- Bigby Wolf
[fossil-users] Backup traffic
As much as I like the simplicity of keeping the full repository history in one single SQLite database, I see a minor downside.

There's a lot of backup traffic, if "non-contents changes" (such as the admin and user logs, the login cookies, but also the "PRAGMA optimize" information) are causing updates to the repository database, marking it dirty for the next backup cycle.

(This was also the main reason for my complaining about the leftover WAL and SHM files, recently, which accumulated in my backup logs. Because in the end, WAL and SHM have to be kept together with the SQLite database, as they might contain valuable information?)

From peeking at the Fossil timeline, my question is, will the new "backoffice processing" cause even more frequent updates to the main repository database, i.e. with the pids stored in the configuration table, and updated after each web page display?

I have created a (surprisingly simple) patch to attach a separate login cookie database (shared among all repositories in the same directory), so that plain login and logout actions will no longer cause repository database writes. With admin and user logs turned off, and "PRAGMA optimize" removed, the repository database is only touched if there's new contents, or new configuration settings.

What's your comments to this? Does anybody care about the repository database, holding all your valuable contents, being modified frequently with simple non-contents state information? Given the reliability of SQLite, we probably shouldn't care. But what is a good strategy to minimize backup traffic, if repository databases change that often?

--Florian