Dominique Dumont via Pan-users posted on Sat, 09 Aug 2025 17:44:23 +0200 as excerpted:
> Pan with sqlite

[...]

> I've not tested much header download. It works, but I don't really know
> what happens during a crash when downloading.
>
> That said, a crash may corrupt the database. I've not seen it often, but
> it can happen.

So this has been my unstated-until-now concern. I actually had your first-in-thread status message marked unread until I had time to try to write up something on it.[1]

I suppose most computer-experienced folks have at some point experienced at least one "catastrophic" database corruption failure. Hopefully we've mostly learned the importance of timely backups in the process, but we all carry those scars and often fear dealing with "databased" data -- as opposed to plain-text data where at worst, partial recovery is available via manual text-editing.

A big example personally is when the kdepim folks "jumped the akonadi shark" back in the mid-kde4 (4.6) era, and kmail repeatedly ate my emails. They still had plain-text backups fortunately and could (mostly?) automatically recover the data from them, but hours of recovery, repeatedly, for /email/, which is at its core '80s plain-text technology and *NOT* rocket science, and which I and so many others depend on to be *rock* *solid*, was simply /not/ something I was willing to accept or tolerate. So I ended up jumping the kmail ship I'd been on for over a decade, to the still plain-text-based claws-mail (which ironically I had considered but chose kmail back when I did MS -> Linux at the turn of the century), where its plain-text scriptability is a primary selling feature, so they will NOT be jumping the binary-format database bandwagon any time soon!

OTOH, there /are/ reasonably reliable databases out there -- it has been quite a while since I had to resort to backups to recover everything databased in my firefox profile, for instance.
And while I chose the plain-text claws-mail as I didn't need/want the extra features, the database-backed thunderbird and evolution email clients are generally considered reasonably reliable as well. Plus of course there's the many commercial databases and all the TB of data they safeguard out there, but in part their reliability is surely due to professional management with 5-9s uptime requirements and thus professionally managed geographically diversified multi-level hot- and cold-backups. Obviously that's rather far from the use-case of the guy/gal with a single physical machine hosting his/her database apps, with no redundant power supplies and likely no backups beyond what the apps do automatically.

So it's certainly possible to have a reasonably stable binary-format database solution, as long as folks sufficiently familiar with database stability coding techniques are handling it. Which is, I very strongly suspect, why Charles ultimately didn't do it despite his recognizing it needed done -- he simply wasn't sure he could pull off the stability he himself wanted. And while I'm not as sure on the reasoning, that could be why Heinrich never attempted it either.

So anyway, this has been not a small, if as yet unstated, concern for me. Yes I know the reasons and have now seen pan's scalability struggle for nearing two and a half decades now, and I 100% agree it's long past time the move to database needs done, but... I'm still fearful. Here's hoping it goes well! May you truly be the coder saving the day in that regard with pan as you have been in just stepping up as upstream pan maintainer in the first place! =:^)

Some followup questions in response to your details, below...

> Since the DB contains servers, group and article, a corrupted DB means a
> restart from scratch (unless you have a backup).

Backups are good, particularly when running experimental-branch database code!
=:^)

> To limit the consequence of a corrupted DB, I think I need to split the
> DB in 3 separate files:
> - server (mostly filled manually by user)
> - groups
> - articles and headers
>
> Hopefully, only article and headers would be corrupted in case of a
> crash.

That makes a lot of sense. Is this likely to be simple enough to be implemented soon? I've been considering switching to the database branch, and after that split might be an opportune time to do so.

Meanwhile, as I've been writing this I've been thinking about my current pan usage. Surely the database backend will help with binaries group scaling, and (as I've mentioned) I do hope to get back into that... but in practice it's a theoretical "some day" that may never really happen for me personally.

But my current use-case, in practice, is archiving text-groups, never expiring headers and with a (multi-gig dedicated partition) cache several times bigger than my multi-decade accumulation so it won't be expiring anything either.

In the case of my ISP's old groups, I'm /literally/ archiving them, as they're no longer a provider and while some of the groups do appear on the public newsgroup tree, what I have is archived from their original servers and likely no longer publicly available (the NSA surely has them archived too but that's not public). My "news server" configured for them is set to zero connections and of course the DNS address is now invalid, so what's in my pan text-instance cache for them is /literally/ archived -- if that was corrupted without backup, it couldn't be replaced, but by the same token, it's no longer updated, so a backup will always remain "current" for that server.

The other server I have configured for my text instance is gmane, which of course is itself an archive in news-message on news-server form of various mailing lists including this one.
As an archive gmane doesn't normally expire either, tho as a respectable public server it does honor the x-noarchive header and (I think) expires them after two weeks or so (else it never makes them accessible at all, IDR tho I do remember reading a discussion on it, the reason I know the header exists in the first place). Obviously the gmane lists/groups could be refetched if corrupted locally as long as it remains a public server, but I'd prefer not to need to do that, certainly beyond the year-ish I might worst-case go between backups but ideally beyond say a month (a more reliable backup frequency in any case).

So... not talking about the currently experimental database stuff, where not keeping regular backups would be flat stupid (aka, defining the value of the data not backed up as literally too trivial to be worth backing up despite the experimental nature of the database it's stored in and despite whatever claims may be made to the contrary, because the backups or lack thereof defined in practice what idle claims could not)... but about the longer term case where pan's database backend is considered stable...

How good a fit for my text-post-archive use-case do you believe pan's database backend, once stable, will be? I'm assuming that any database-corrupted messages no longer on the server and not on a local backup either simply won't be recoverable... Regardless, this should be a somewhat different use-case to try the database code on... as long as I'm keeping good backups!

Might there be an option to still build with the old text-based backend for text-message-archive-use-cases like mine? (Not that I really consider that practical, but maybe...)

Or perhaps more practical, call the new database version pan3 or some such (or start its pre-stable release versions at 1.9xx and bump to 2.0 on stabilization), and continue maintaining pan 0.xxx as the text-based version?
Or maybe I should look into a more traditional text-based news client for that archiving use-case? (OTOH, current pan has the convenience of being able to handle the occasionally posted binary, say screenshots attached to messages here on the pan-user list/group, without issue, while many traditional text-based news clients require jumping thru hoops for binaries.) Any recommendations on other news clients that are still around and maintained if so? (Sure I can and will when necessary look myself, but early-stage thinking via typing, ATM.)

Or maybe I should just run something like a leafnode local server, as arguably a more appropriate news archive in the first place? Then I can have it be the unexpiring archive and let pan's database corrupt and be rebuilt from a fresh pull from leafnode as necessary?

And assuming I do decide I need to look for a pan archive-alternative, how long do you anticipate it'll be before (a) a database-stable release-version pan is available, and (b) the older text-based pan bitrots to the point it's no longer easy to build against reasonably current and distro-available libraries? (Obviously the latter depends somewhat on the distro, but compared to things like gtk2, python2, etc, thus giving people a good idea for their distro by comparison.)

> Note that, as long as sqlite branch is not merged, I don't guarantee
> that the DB stays compatible. Updating the branch may mean that DB needs
> to be destroyed.

Useful to have specified... and a practical reason I should probably wait until that database split to switch, given I've not done so yet! =:^)

---
[1] Due to being on the autism spectrum my messages, as regulars surely know, tend to the long side (understatement) because we/I tend to see a complex picture where properly addressing one subject, to us, means covering how it touches all sorts of other stuff most other folks consider tangentially related at best.
But invariably, at least one such element ends up in followup discussion anyway, so choosing what to eliminate is a real struggle. Meaning anything above trivial messages often takes /hours/ to compose, either "properly" covering the subject, or editing and reediting out all the "irrelevant" stuff to folks /not/ on the spectrum, that isn't irrelevant to the complex picture we see in our head, plus sometimes adjusting tone to something hopefully not so "emotionally insensitive" as well.

Such editing really is a painful process because to us/me we're eliminating context necessary to properly grasp the larger picture, and it really can take hours. But I've accepted it as an often necessary aspect of the "masking" ASD people must do to be accepted by elements of mainstream society that don't understand the problem, and more often as I've recognized and accepted my ASD, I'll spend an hour or two, then scrap it and either give up as not worth it after all, or start from scratch. (Tho I never consider such aborted attempts wasted, because they often allowed me to better grasp whatever concept myself in the process of writing it down, and worst case, I figure if it was worth it to me to spend that time I /needed/ to do so, even if I'm the only one that saw it.)

This aside is surely a case in point (I did scrap an earlier attempt at this post a day or two ago, leaving it marked unread to reply to later), but at least it's footnoted instead of parentheticalized in the main text.

So on the lists/groups I cope by marking messages I intend to followup on later as unread. FWIW I probably get back to half of them... The others usually eventually getting marked read a year or two later when I give up all hope of it even being relevant any longer, tho occasionally I'll decide it's still relevant and bring back a thread from the dead.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
Richard Stallman