Robin Laing posted on Tue, 29 Jul 2025 19:24:42 -0600 as excerpted:

> I have been using pan for over a decade and the last time I tried pan a
> few days ago (Jul 27), it almost crashed my machine by using all the
> available memory, with usage over 16G and the rest of the swap on a
> machine with 32G ram and 8G swap. Before that it froze up a couple of
> times downloading headers in the large groups for 100 days.
>
> Would it be worth removing the full .pan2 directory and starting fresh
> to see if problems clear up?
>
> Groups I look at have a large number of articles. Due to obfuscation
> usage, they get a very large number, in the 100's of thousands in a few
> months. There are original posts as well.

So it has been a while since I did a "big picture" post. This takes the long way around to a direct answer to your question, but hopefully it eventually gets there...

High-level overview: Pan has always stored its "group world-view" in RAM. Historically that has at times triggered "large group" scaling issues, as post volume and server retention competed with the per-app RAM capacity of the day. Various efficiencies have mitigated the problem temporarily, but ultimately pan needs to migrate to a "moving window view" in RAM while keeping its overall world-view only on permanent storage. Efforts are ongoing...

The first problems (that I was around for, at least) were seen in the 00s, with 32-bit machines and a normal per-app RAM capacity of 1 or 2 gig depending on kernel memory model (32-bit addressable was 4 gig, but generally half of that was kernel-space, leaving only 2 gig max addressable in user-space). Back then pan was tracking uncompressed messages using list-widgets, server retention tended to be perhaps ten days at best, a few hundred-thousand messages, and pan would run into memory-capacity issues at ~200K messages.

Physical-machine coping mechanisms included switching to 64-bit machines to kill the 4-gig limit or, on 32-bit, switching to the less efficient 4G/4G memory model, with separate kernel and user address spaces, to raise the per-app memory cap to 4G. (A Linux patch was available but apparently never made it to mainline/Linus. I switched to amd64 in late 2003 and had "cheap" servers with lower retention, sometimes only hours on the biggest groups, so I never personally ran into that limit, but I saw the complaints on-list.)

Code-wise, then-pan-maintainer Charles Kerr switched to a more memory-efficient list-widget and "compressed" some usage by storing common strings, like frequent poster names or fragments of large post subject lines, only once and referring to them using shorter symbols, only expanding them for display or actual download.

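To make the "store common strings only once" idea concrete, here is a minimal Python sketch. It is only an illustration of the general technique, not pan's actual code (pan itself is C/C++); the SymbolTable class, its method names and the sample headers are all made up for this example.

```python
class SymbolTable:
    """Store each distinct string once and hand out small integer symbols."""

    def __init__(self):
        self._ids = {}       # string -> symbol (an int)
        self._strings = []   # symbol -> string

    def intern(self, text):
        """Return the symbol for text, adding text on first sight."""
        symbol = self._ids.get(text)
        if symbol is None:
            symbol = len(self._strings)
            self._ids[text] = symbol
            self._strings.append(text)
        return symbol

    def lookup(self, symbol):
        """Expand a symbol back to its string, e.g. for display."""
        return self._strings[symbol]

    def __len__(self):
        return len(self._strings)


# Hypothetical (author, subject) header data, for illustration only.
headers = [
    ("poster@example.invalid", "Big file [01/50]"),
    ("poster@example.invalid", "Big file [02/50]"),
    ("other@example.invalid", "Re: pan memory use"),
]

table = SymbolTable()
# Each header now holds two small ints instead of two full strings.
compact = [(table.intern(author), table.intern(subject))
           for author, subject in headers]
```

The repeated poster name is stored once no matter how many headers refer to it, which is where the saving comes from on groups with a few very prolific posters.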
At the pan config level, users could work around the problem by expiring "headers" faster if they were on high-retention servers, a technique that continues to help today.

Both the coding efficiencies and going 64-bit helped, but only temporarily. Even on 64-bit platforms with 8-16 gig of RAM, high-end for that time, pan would at first still run into issues at around half a million "headers", tho the memory-efficiency techniques helped and I remember people reporting that pan could now handle ~750K headers before it started struggling.

Around that time Charles introduced the pan rewrite from C to C++ as well, and that rewrite kept the memory issues in mind. That and generally increasing memory capacities helped for a while too, but pan would still run into scaling issues at around 1-1.2 million headers, tho by this point it wasn't so much a memory-capacity issue per se as general compute inefficiency in the entire pan approach. That remains the case today, though with a good machine I believe pan can handle perhaps a couple million headers now.

Meanwhile, Charles had the idea of redesigning pan's entire approach, switching to a database model where pan basically never tracked the whole picture at once as it does currently, but instead fed the header information to a database and ran queries to get a viewing window on whatever information it was dealing with at the time. The database backend was going to be sqlite.

Unfortunately, I don't believe Charles was really comfortable with database programming, and whatever experiments he might have tried, he never presented publicly. I think he got frustrated as he ran into issues he didn't have the database programming experience to solve, and "real life" intruded. I remember he tried to find another pan maintainer to take over, but news/nntp has always been somewhat niche, and nobody stepped forward.

Eventually he moved on to other things and pan development was effectively abandoned for a few years, tho distro package maintainers tried to keep it at least updated/building/running against still-maintained libraries.

Petr Kovar did eventually step up as upstream pan maintainer for a period, but he was actually a gtk/gnome translator, not a programmer. While he did his best and we users were glad he did, mostly he took distro and user patches and applied them upstream, not really doing a whole lot of his own development.

Then Heinrich Mueller appeared, and pan had an upstream maintainer who could and did implement many new pan features once again. It was Heinrich who finally implemented yenc posting, and Heinrich who implemented the long-discussed rules integration with scoring, so we could for instance configure pan to auto-delete (pre-download) messages that scored to "ignored" level, auto-mark-read messages that scored "low" (less than zero but not to ignored/-9999), display but not auto-download "headers" for normal messages, and auto-download (or cache) "watched" (+9999) messages.

(Connections-limit diversion...)

Heinrich also implemented, but then on community request reverted, the ability to GUI-configure more than four connections per server. The problem was that GNKSA had a rule of max four connections per server. I think most will agree that seems a bit anachronistic today: pay servers usually allow double-digit connections, as they want you to use up your allotment and buy more, while the remaining free and ISP servers must strictly limit connections server-side to prevent abuse, so a client-level connection limit seems of little relevance. But Charles took quite a bit of pride in pan being 100% GNKSA compliant, including many other factors such as quote/reply format, plain text not HTML, etc, and pan's user community has tended to likewise place quite some emphasis on that.
But it was personally my own fear, and the active on-pan-list community seemed to agree, that despite the lack of current relevance of that individual GNKSA limit, should pan eliminate that limit and lose its 100% GNKSA compliance, it'd be a slippery slope and way too easy for pan to lose many of the other GNKSA features that have made pan and its community what it is over the decades.

So after an informal on-list vote, Heinrich reverted that change. In terms of connection limits, tho, it's worth noting that the GNKSA wording does have a loophole, which pan does exploit. While in keeping with GNKSA pan (the pan GUI) does not allow more than four connections per server to be configured, should a pan user text-edit the pan config to say 5 or 20 or 50 or whatever connections, pan will indeed honor that. So that's how to get around the pan and GNKSA limit, should one desire to.

However, it's worth noting that pan connections are efficient enough that the number of connections is seldom the bottleneck. Instead, the bottleneck is usually the allowed connection speed, either that of the internet connection itself or the speed the server allows. Of course on slow machines the machine itself can occasionally be the bottleneck as well, but it's very seldom that pan's GUI config limit of four connections per server actually turns out to be the real bottleneck, so that limit doesn't turn out to be much of an issue for most in any case. Nevertheless, the text-edit config option is there for people who /think/ they need more.

With Heinrich's changes, pan finally could be considered basically feature complete, but for two things, one of which I at least don't believe is appropriate for pan as a general-purpose news client anyway. That first one is batch uploading: pan isn't, and I don't believe ever will be, the best "batch uploader".
There are other tools for that, and arguably the workflow for that doesn't really match that of the general-use news client that pan targets in any case. But Heinrich did implement general-purpose yenc uploading, thereby checking off that feature required for a well-rounded general-purpose news client.

The other one is the subject at hand. But before that, to wrap up a loose thread...

While Heinrich did basically "feature-complete" pan, I'm not entirely sure what actually happened to him. Did he simply lose interest after that? Did "real life" happen, say he got a family and didn't really have time for pan any more? Did the always-niche case of news leave his life as a factor, so he simply didn't have a personal use-case for pan any longer and lost interest? In any case, while I definitely remember Charles /trying/ to find a new upstream maintainer to hand off to, and failing, Heinrich was a different story. He seemed both to appear out of nowhere to go great guns on pan for a while, feature-completing it like I said, and then to disappear into nowhere.

Unfortunately that left pan orphaned without an upstream maintainer again for a while. Meanwhile, in real life, the always-niche interest in news/nntp seemed to diverge even further from that of the mainstream, and the distro package maintainers that had carried pan through its previous orphan state didn't seem so interested now. Pan, like news/nntp in general, has diverged far enough from mainstream interest that when it lost strong upstream maintainership, many distros (Debian being a significant exception!) dropped it. Without (primarily) distro maintainers creating and maintaining the necessary patches, pan was increasingly in danger of losing the updates that allowed it to continue to build against current libraries.

But I DID say Debian was an exception, the very fortunate exception in this case, as Dominique Dumont, Debian's pan package maintainer, ultimately stepped up to be the upstream pan maintainer as well. Pan was in very real danger of stale-code death when he stepped forward, completing and stabilizing the port to gtk3 and other current libraries just in time, as many distros are now dropping or have already dropped gtk2 support. In the process he fixed quite a few bugs, and has actually started introducing new pan code once again. =:^)

That brings us back to the subject at hand: multi-million-message scalability, implemented in the form of a database backend. That is pan's biggest current challenge and the second of the two remaining yet-to-implement features. Charles had talked about it but never implemented it, and it was the one big feature Heinrich never to my knowledge even attempted, but now Dominique's attempting it.

While I personally run live-git pan, Dominique is, entirely reasonably, doing that work in a separate development branch that I've not tried, so I don't know the current status. Last he posted, however, he was working on implementing the database backend first for some less critical stuff, before attempting the real scalability challenge stuff.

He may well post a status update reply here, but meanwhile, what about your current problem, using the current pan code?

To finally answer your question directly...

Yes, removing ~/.pan2 to clear the problem may be worth it, altho personally I'd use a bisect troubleshooting process here, backing up and removing only parts of my config and cache to test, seeing where the actual problem is.

In particular, tasks.nzb is the current pan task list and could possibly be corrupted in a crash. Removing it should only clear whatever tasks pan had queued at its freeze/crash, without killing the existing config, and is thus a reasonably safe and limited test.

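As a rough illustration of that first bisect step, here is a small Python sketch that backs up the pan data dir and then moves tasks.nzb aside rather than deleting it. The function name and the backup naming scheme are made up for this example, only the tasks.nzb filename and the ~/.pan2 default come from the text above, and it assumes pan is not running while it does its work. Note that the copy includes the cache, so it may take a while and need some disk space.

```python
import shutil
import time
from pathlib import Path


def set_aside_task_list(pan_home):
    """Back up the pan data dir, then move tasks.nzb out of pan's way.

    Returns (backup_dir, moved_file), where moved_file is None if there
    was no tasks.nzb to move. Names here are illustrative assumptions.
    """
    pan_home = Path(pan_home).expanduser()
    stamp = time.strftime("%Y%m%d-%H%M%S")

    # Full copy next to the original, e.g. ~/.pan2.bak-20250730-101500
    backup = pan_home.with_name(pan_home.name + ".bak-" + stamp)
    shutil.copytree(pan_home, backup)

    tasks = pan_home / "tasks.nzb"
    if not tasks.exists():
        return backup, None

    # Rename instead of delete, so the queue can be restored if it
    # turns out not to be the problem.
    aside = pan_home / ("tasks.nzb.aside-" + stamp)
    tasks.rename(aside)
    return backup, aside
```

Calling set_aside_task_list("~/.pan2") and then starting pan tests whether the queued tasks were the trigger. If pan still misbehaves, rename the set-aside file back to tasks.nzb and move on to the next candidate file.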
To try to deal with the scaling issues on multi-million-header groups, I'd recommend setting as short an expiry as you can reasonably deal with, say two or three days if you download daily. Certainly, on groups with several hundred thousand headers per day, I'd try to keep expiry under a week if at all possible, as that's going to directly affect pan's memory use and scalability due to its current "keep the big picture in main memory" model.

Also consider cache size. Pan's default cache size is already quite small for big-binary groups and you probably don't want to reduce it further, but if you've increased it to multiple gigs you might try something smaller. Depending on your workflow, however, especially if it involves a multi-stage download-to-cache before sorting and saving off (as my normal workflow does), you may run into the problem I had: too small a cache, so messages end up deleted out of cache before you can actually process them. With groups that size you may have to change such a workflow to be more in line with pan's apparently designed workflow of saving off binaries directly, without the intermediate local-cache-first, then sort, and only /then/ save selected binaries approach that I actually prefer. Again, with such large/active groups the cache size may directly affect pan's ability to scale, and too large a cache could be problematic.

And of course you can try deleting the cache (with or without backing it up to restore if the removal doesn't help) if you think it's corrupted, but with binary groups pan will soon fill it again in any case, and you will of course lose the work of having already downloaded those messages to cache for any you're still working on.

Beyond that, crashes could have corrupted files in the groups subdir, which is where pan stores the symbol-mapping I mentioned above.
Pan will recreate these if necessary, but given its big-picture-in-memory strategy, unless a file's corrupted, deleting it isn't going to do pan much good or save any memory, as pan will just have to recreate it.

The newsgroups.* files track various newsgroup-related state, and can be recreated if necessary, of course with loss of state such as read-message tracking, etc. preferences.xml is the main preferences file, servers.xml is server config and newsrc mapping, and of course the newsrc files track group state too.

As I said, I'd prefer to back up my pan data dir and test deleting only individual files to troubleshoot where the problem is, and the above should help with that, but if you prefer you can of course just blow the whole thing away and start over instead.

Meanwhile, one other config-related hint you may find helpful. ~/.pan2 is actually only the default. If pan finds the PAN_HOME variable set in the environment it inherits at start, it'll look there instead. I actually use this along with a pan wrapper script to set up multiple pan instances, each with its own config. Here I run separate binary vs text instances, for instance, which could be very helpful if you want say much longer text group expiry while you're doing as short a binary group expiry as possible to work around pan's scaling issues. Here, I don't expire my text groups at all, but obviously that's not practical for most binary groups, particularly at the high post volume you're dealing with. (I have a separate test instance too, so state for groups I'm just browsing temporarily doesn't end up in my more permanent instance config.) Setting up multiple separate pan instances with different expiry for each isn't a problem! =:^)

This of course assumes you know how to set up such wrappers yourself. If you need help with that, ask.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."
Richard Stallman

_______________________________________________
Pan-users mailing list
Pan-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/pan-users
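
For anyone wanting a starting point for the PAN_HOME wrapper approach described in the message above, here is a minimal Python sketch. It is only one possible layout, not the poster's actual script: the instance names, the per-instance directories and the function names are all assumptions for illustration, and it assumes a pan binary is on your PATH. The only thing taken from the message is that pan uses the directory named by PAN_HOME in place of ~/.pan2.

```python
import os
import subprocess

# Hypothetical instance names and data dirs; adjust to taste.
INSTANCES = {
    "text": "~/.pan2-text",
    "binary": "~/.pan2-binary",
    "test": "~/.pan2-test",
}


def pan_environment(instance, base_env=None):
    """Return a copy of the environment with PAN_HOME set for instance."""
    env = dict(os.environ if base_env is None else base_env)
    env["PAN_HOME"] = os.path.expanduser(INSTANCES[instance])
    return env


def launch_pan(instance, extra_args=()):
    """Start pan with its own data dir and return pan's exit status."""
    env = pan_environment(instance)
    # Creating the dir up front is harmless if it already exists.
    os.makedirs(env["PAN_HOME"], exist_ok=True)
    return subprocess.call(["pan", *extra_args], env=env)
```

With this, launch_pan("binary") and launch_pan("text") start instances with entirely separate config, cache and group state, so each can have its own expiry settings.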