[Pan-users] Database stability discussion

Duncan Sun, 10 Aug 2025 02:15:50 -0700

Dominique Dumont via Pan-users posted on Sat, 09 Aug 2025 17:44:23 +0200
as excerpted:


> Pan with sqlite [...]

> I've not tested much header download. It works, but I don't really know
> what happens during a crash when downloading.
> 
> That said, a crash may corrupt the database. I've not seen it often, but
> it can happen.

So this has been my unstated-until-now concern. I actually had your 
first-in-thread status message marked unread until I had time to try to 
write up something on it.[1]

I suppose most computer-experienced folks have at some point experienced 
at least one "catastrophic" database corruption failure.  Hopefully we've 
mostly learned the importance of timely backups in the process, but we all 
carry those scars and often fear dealing with "databased" data -- as 
opposed to plain-text data where at worst, partial recovery is available 
via manual text-editing.

A big example personally is when the kdepim folks "jumped the akonadi 
shark" back in the mid-kde4 (4.6) era, and kmail repeatedly ate my 
emails.  They still had plain-text backups fortunately and could (mostly?) 
automatically recover the data from them, but hours of recovery, 
repeatedly, for /email/, which is at its core '80s plain-text technology 
and *NOT* rocket science, and which I and so many others depend on to be 
*rock* *solid*, was simply /not/ something I was willing to accept or 
tolerate.

So I ended up jumping the kmail ship I'd been on for over a decade, to the 
still plain-text-based claws-mail (which ironically I had considered before 
choosing kmail instead, back when I did MS -> Linux at the turn of the 
century), where its plain-text scriptability is a primary selling feature, 
so they will NOT be jumping on the binary-format database bandwagon any 
time soon!

OTOH, there /are/ reasonably reliable databases out there -- it has been 
quite a while since I had to resort to backups to recover everything 
databased in my firefox profile, for instance.  And while I chose the 
plain-text claws-mail as I didn't need/want the extra features, the 
database-backed thunderbird and evolution email clients are generally 
considered reasonably reliable as well.  Plus of course there are the many 
commercial databases and all the TB of data they safeguard out there, but 
in part their reliability is surely due to professional management with 
5-9s uptime requirements and thus professionally managed geographically 
diversified multi-level hot- and cold- backups.  Obviously that's rather 
far from the use-case of the guy/gal with a single physical machine 
hosting his/her database apps, with no redundant power supplies and likely 
no backups beyond what the apps do automatically.

So it's certainly possible to have a reasonably stable binary-format 
database solution, as long as folks sufficiently familiar with database 
stability coding techniques are handling it.  Which is, I very strongly 
suspect, why Charles ultimately didn't do it despite recognizing it 
needed doing -- he simply wasn't sure he could pull off the stability he 
himself wanted.  And while I'm not as sure on the reasoning, that could 
be why Heinrich never attempted it either.

So anyway, this has been not a small, if as yet unstated, concern for me.  
Yes, I know the reasons and have watched pan's scalability struggle for 
nearly two and a half decades now, and I 100% agree the move to a database 
is long overdue, but... I'm still fearful.

Here's hoping it goes well! May you truly be the coder saving the day in 
that regard with pan as you have been in just stepping up as upstream pan 
maintainer in the first place!  =:^)

Some followup questions in response to your details, below...

> Since the DB contains servers, group and article, a corrupted DB means a
> restart from scratch (unless you have a backup).

Backups are good, particularly when running experimental-branch database 
code! =:^)
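
For what it's worth, a consistent copy of a live SQLite file can be taken 
without stopping the app first.  The following is only a minimal sketch 
using Python's standard sqlite3 module, not anything from pan itself: the 
function name backup_sqlite and the file paths are hypothetical, and I'm 
assuming the database is an ordinary single-file SQLite database.

```python
import sqlite3


def backup_sqlite(src_path: str, dest_path: str) -> str:
    """Copy a SQLite file via the online backup API, then check the copy.

    Returns the first row of PRAGMA integrity_check on the copy, which is
    the string "ok" when SQLite finds no problems.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup() copies page by page and yields a consistent
        # snapshot even if another connection is writing to the source.
        src.backup(dest)
        result = dest.execute("PRAGMA integrity_check").fetchone()[0]
    finally:
        dest.close()
        src.close()
    return result
```

Run from cron or similar, something like this would at least keep the 
backup interval short; whether the copy is worth keeping is what the 
integrity check result is for.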

> To limit the consequence of a corrupted DB, I think I need to split the
> DB in 3 separate files:
> - server (mostly filled manually by user)
> - groups
> - articles and headers
> 
> Hopefully, only article and headers would be corrupted in case of a
> crash.

That makes a lot of sense.  Is this likely to be simple enough to be 
implemented soon?  I've been considering switching to the database branch, 
and after that split might be an opportune time to do so.
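
To picture what a three-file split could look like at the SQLite level, 
here is a rough sketch in Python.  It is purely illustrative and not the 
sqlite branch's actual code: the function name open_split_db, the schema 
aliases (groups_db, articles_db) and the table layouts are all hypothetical.  
It relies only on SQLite's ATTACH DATABASE, which lets one connection see 
several files at once.

```python
import sqlite3


def open_split_db(servers_path, groups_path, articles_path):
    """Open three SQLite files behind one connection (hypothetical layout)."""
    # The servers file is the "main" database of the connection.
    conn = sqlite3.connect(servers_path)
    # ATTACH creates the file if it does not exist yet.
    conn.execute("ATTACH DATABASE ? AS groups_db", (groups_path,))
    conn.execute("ATTACH DATABASE ? AS articles_db", (articles_path,))
    conn.execute(
        "CREATE TABLE IF NOT EXISTS main.server "
        "(id INTEGER PRIMARY KEY, host TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS groups_db.newsgroup "
        "(id INTEGER PRIMARY KEY, server_id INTEGER, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles_db.article "
        "(id INTEGER PRIMARY KEY, group_id INTEGER, "
        "message_id TEXT, subject TEXT)"
    )
    conn.commit()
    return conn
```

With a layout like this, a damaged articles file could be deleted and 
recreated empty while the server and group files stay untouched.  One 
caveat: SQLite only keeps a transaction that spans attached files atomic 
when the journal mode is not WAL.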

Meanwhile, as I've been writing this I've been thinking about my current 
pan usage.  Surely the database backend will help with binaries group 
scaling, and (as I've mentioned) I do hope to get back into that... but in 
practice it's a theoretical "some day" that may never really happen for me 
personally.

But my current use-case, in practice, is archiving text-groups, never 
expiring headers and with a (multi-gig dedicated partition) cache several 
times bigger than my multi-decade accumulation so it won't be expiring 
anything either.

In the case of my ISP's old groups, I'm /literally/ archiving them, as 
they're no longer a provider and while some of the groups do appear on the 
public newsgroup tree, what I have is archived from their original servers 
and likely no longer publicly available (the NSA surely has them archived 
too but that's not public).  My "news server" configured for them is set 
to zero connections and of course the DNS address is now invalid, so 
what's in my pan text-instance cache for them is /literally/ archived -- 
if that was corrupted without backup, it couldn't be replaced, but by the 
same token, it's no longer updated, so a backup will always remain 
"current" for that server.

The other server I have configured for my text instance is gmane, which of 
course is itself an archive in news-message on news-server form of various 
mailing lists including this one.  As an archive gmane doesn't normally 
expire either, tho as a respectable public server it does honor the 
X-No-Archive header and (I think) expires such messages after two weeks or 
so (or else it never makes them accessible at all; I don't recall which, 
tho I do remember reading a discussion on it, which is how I know the 
header exists in the first place).

Obviously the gmane lists/groups could be refetched if corrupted locally 
as long as it remains a public server, but I'd prefer not to need to do 
that, certainly beyond the year-ish I might worst-case go between backups 
but ideally beyond say a month (a more reliable backup frequency in any 
case).

So... not talking about the currently experimental database stuff, where 
not keeping regular backups would be flat stupid (aka, defining the value 
of the data not backed up as literally too trivial to be worth backing up, 
despite the experimental nature of the database it's stored in and despite 
whatever claims may be made to the contrary, because the backups or lack 
thereof define in practice what idle claims cannot)... but about the 
longer term case where pan's database backend is considered stable...

How good a fit for my text-post-archive use-case do you believe pan's 
database backend, once stable, will be?  I'm assuming that any database-
corrupted messages no longer on the server and not on a local backup 
either simply won't be recoverable...

Regardless, this should be a somewhat different use-case to try the 
database code on... as long as I'm keeping good backups!

Might there be an option to still build with the old text-based backend 
for text-message-archive-use-cases like mine?  (Not that I really consider 
that practical, but maybe...)

Or perhaps more practical, call the new database version pan3 or some such 
(or start its pre-stable release versions at 1.9xx and bump to 2.0 on 
stabilization), and continue maintaining pan 0.xxx as the text-based 
version?

Or maybe I should look into a more traditional text-based news client for 
that archiving use-case?  (OTOH, current pan has the convenience of being 
able to handle the occasionally posted binary, say screenshots attached to 
messages here on the pan-user list/group, without issue, while many 
traditional text-based news clients require jumping thru hoops for 
binaries.)

Any recommendations on other news clients that are still around and 
maintained, if so?  (Sure, I can and will look myself when necessary, but 
this is early-stage thinking via typing, ATM.)

Or maybe I should just run something like a leafnode local server, as 
arguably a more appropriate news archive in the first place?  Then I can 
have it be the unexpiring archive and let pan's database corrupt and be 
rebuilt from a fresh pull from leafnode as necessary?

And assuming I do decide I need to look for a pan archive-alternative, how 
long do you anticipate it'll be before (a) a database-stable 
release-version pan is available, and (b) the older text-based pan bitrots 
to the point it's no longer easy to build against reasonably current and 
distro-available libraries?  (Obviously the latter depends somewhat on the 
distro, but a comparison to things like gtk2, python2, etc. would give 
people a good idea for their own distro.)

> Note that, as long as sqlite branch is not merged, I don't guarantee
> that the DB stays compatible. Updating the branch may mean that DB needs
> to be destroyed.

Useful to have specified... and a practical reason I should probably wait 
until that database split to switch, given I've not done so yet! =:^)

---
[1]  Due to being on the autism spectrum my messages, as regulars surely 
know, tend to the long side (understatement) because we/I tend to see a 
complex picture where properly addressing one subject, to us, means 
covering how it touches all sorts of other stuff most other folks consider 
tangentially related at best.  But invariably, at least one such element 
ends up in followup discussion anyway, so choosing what to eliminate is a 
real struggle.  Meaning anything beyond a trivial message often takes /hours/ 
to compose, either "properly" covering the subject, or editing and 
reediting out all the "irrelevant" stuff to folks /not/ on the spectrum, 
that isn't irrelevant to the complex picture we see in our head, plus 
sometimes adjusting tone to something hopefully not so "emotionally 
insensitive" as well.  Such editing really is a painful process because to 
us/me we're eliminating context necessary to properly grasp the larger 
picture, and it really can take hours.  But I've accepted it as an often 
necessary aspect of the "masking" ASD people must do to be accepted by 
elements of mainstream society that don't understand the problem, and more 
often as I've recognized and accepted my ASD, I'll spend an hour or two, 
then scrap it and either give up as not worth it after all, or start from 
scratch.  (Tho I never consider such aborted attempts wasted, because they 
often allowed me to better grasp whatever concept myself in the process of 
writing it down, and worst case, I figure if it was worth it to me to 
spend that time I /needed/ to do so, even if I'm the only one that saw 
it.)

This aside is surely a case in point (I did scrap an earlier attempt at 
this post a day or two ago, leaving it marked unread to reply to later), 
but at least it's footnoted instead of parentheticalized in the main text.

So on the lists/groups I cope by marking messages I intend to follow up on 
later as unread.  FWIW I probably get back to half of them... The others 
usually eventually get marked read a year or two later when I give up 
all hope of them even being relevant any longer, tho occasionally I'll 
decide one is still relevant and bring back a thread from the dead.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users
