JCA <[EMAIL PROTECTED]> posted [EMAIL PROTECTED], excerpted below, on Mon, 25 Jun 2007 06:50:26 -0700:

> For the last few weeks some idiot has taken to flooding sci.crypt
> (and possibly other groups) with junk. The postings are spoofed to
> appear as coming from regulars in the group, and the contents of the
> postings are just random drivel.
>
> Anybody know a rule, or set of rules, to filter them out? It would
> appear that the bogus postings all come from a specific news provider -
> things like
>
> news.highwinds-media.com!hw-filter.lga!newsfe04.lga.POSTED!53ab2750
>
> but I don't know how to filter this out.

<mode=rant>

This is one reason I've pushed for a long time to have scoring/filtering (since before pan had scoring, when it was all binary-decision filtering; that's how long) that could match anywhere in the post, including the body or headers not in the overviews. The problem is, the stuff in the overviews can generally be entirely controlled by the poster, so if they want to be deliberately disruptive and continuously modify this info in order to evade scoring systems like pan's, there's unfortunately not a lot that the poor users of such clients can do.

The catch is that in order to score/filter on things not in the overviews, the post must be downloaded first. For better or for worse, Charles' position has always seemed to emphasize scoring in order to choose /what/ to download (and/or what to delete without downloading), simply trusting that the overview data used to make such decisions won't be deliberately obfuscated to keep such scoring/filters from working.

My position, OTOH, is that while it's a bonus if a useful score can be used to ignore (ultimately, to kill/delete) or watch (ultimately, to auto-download or at least mark for download) before downloading, the mere fact that the post must be downloaded first doesn't mean the war is already lost.
It still takes time to view the message, and if automated tools (scoring/filtering) can be used either to prioritize the viewing (in the case of watch or positive scores) or to allow mark-read or deletion without actual viewing (in the case of ignore or negative scores), well, the war is still won, tho admittedly not as easily.

Unfortunately, while I'd much rather have had effective filtering based on /anything/ in the message than scoring restricted to overview data only, and while I've been a very active volunteer here on the pan lists/groups, your problem and mine don't seem to hit enough people to be very high on the priority list. Years ago, when I originally filed the request, Charles stated that yes, he agreed that sort of thing would be useful. However, for him it was pretty much in the "nice to have at some point" category, and thus was "blueskied" (aka "backburnered") into never-never-land.

BTW, even the official slrn scorefile documentation (slrn's scorefile format is what pan uses) says non-overview headers can be matched, tho it takes pains to point out that this is less efficient, since the posts must be downloaded before those scores will match.

Of course, Charles has always been quite open to patches, and I've little doubt that if someone with the skills had submitted a patch implementing this functionality, we'd not be talking about it now, as it'd work as well as overview scoring does. Unfortunately, that's not a set of skills I have, and no one else seems to have had the itch to scratch, so the functionality remains "bluesky", nice to have "someday".

OTOH, the very fact that I'm still here means that, regardless of whether this particular feature I'd sure like has been implemented, pan continues to work better for me than the alternatives, so I guess I can't complain too strenuously.
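To make the overview-versus-download distinction concrete, here's a minimal Python sketch. It is not pan's (or slrn's) code: the OVERVIEW_FIELDS set, the (field, regex, points) rule tuples, and the sample article with its documentation-range IP are all hypothetical. It just shows that a rule on a non-overview header such as NNTP-Posting-Host can never fire on overview data, but does fire once the full article is in hand.

```python
import email
import re
from email.message import Message

# Hypothetical approximation of the fields a server's overview (XOVER/OVER)
# data carries; the exact set varies by server.
OVERVIEW_FIELDS = {"subject", "from", "date", "message-id",
                   "references", "bytes", "lines", "xref"}

# A rule is (field, regex, points); the pseudo-field "body" means the body.
Rule = tuple[str, str, int]


def score_overview(overview: dict[str, str], rules: list[Rule]) -> int:
    """Score before download: only rules on overview fields can ever match."""
    total = 0
    for field, pattern, points in rules:
        key = field.lower()
        if key in OVERVIEW_FIELDS and re.search(pattern, overview.get(key, "")):
            total += points
    return total


def score_full(raw_article: str, rules: list[Rule]) -> int:
    """Score after download: any header, or the body, can be matched."""
    msg: Message = email.message_from_string(raw_article)
    total = 0
    for field, pattern, points in rules:
        if field.lower() == "body":
            text = msg.get_payload()
        else:
            text = msg.get(field, "")  # header lookup is case-insensitive
        # Multipart payloads come back as a list; this sketch skips them.
        if isinstance(text, str) and re.search(pattern, text):
            total += points
    return total


# Hypothetical spoofed article; 192.0.2.7 is a documentation-only address.
RAW = (
    "From: Some Regular <regular@example.invalid>\n"
    "Subject: Re: key sizes\n"
    "Message-ID: <junk.4411@spoof.invalid>\n"
    "NNTP-Posting-Host: 192.0.2.7\n"
    "\n"
    "random drivel\n"
)
OVERVIEW = {"subject": "Re: key sizes",
            "from": "Some Regular <regular@example.invalid>",
            "message-id": "<junk.4411@spoof.invalid>"}
RULES = [("NNTP-Posting-Host", r"^192\.0\.2\.7$", -9999)]

print(score_overview(OVERVIEW, RULES))  # 0: that header isn't in the overview
print(score_full(RAW, RULES))           # -9999: matched after download
```

The cheap pass only sees what the poster controls in the overview; the post-download pass can still mark-read or delete the junk without anyone actually having to read it.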
</mode=rant>

Meanwhile, despite the fact that we're left fighting with the equivalent of our hands tied behind our backs, there's still a slight chance you can find something useful to match. I assume you've already found nothing useful to match in the subject or author headers, and that date, group, line-count, xref, etc., are too generic to be useful. That leaves one remaining possibility: the message-ID.

If you are lucky and this guy isn't an expert at this yet, the message-ID header, which *IS* part of the overview headers, will contain something identifying that can be scored on, hopefully without matching a bunch of other posts in the process. Since the message-ID is (or is supposed to be) unique for each post, you'll have to use "contains" or regex-type matching rather than an exact match. You'll also have to hand-edit the score in your scorefile, altho you can get most of the way there using pan's GUI.

Of course, you first have to see whether there's a part of the message-ID that's unique to him but present in all his messages. Turn view headers on and check that header in several of his messages. You'll likely want to compare the message-IDs of other regulars as well, just to be sure you won't over-match.

If you find something useful to match, select one of his messages and add a score on it based on the References header, which pan will auto-fill with the message-ID. You'll need to edit out the part that changes, of course. Once you have it set up, add the score (without rescoring), but keep the view scores dialog open. Then load the scorefile in your favorite text editor and find the score (it should be at the end). Edit the References line, changing it to Message-ID. Save the file, and back in pan, NOW hit close and rescore in the view article's score window. If you got it right, that should do it, and it won't match anyone else's real posts.
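The over-match check and the References-to-Message-ID hand edit described above can be sketched in Python. Everything here is hypothetical: the message-IDs are made up, pattern_is_safe and references_to_message_id are illustrative helpers, and the scorefile entry assumes an slrn-style layout with one "Header: regex" condition per line, which may not match byte-for-byte what pan writes. Python's regex dialect may also differ from the one pan uses, so keep real patterns simple.

```python
import re

# Hypothetical message-IDs copied from "view headers" on several posts.
SPOOFER_IDS = ["<junk.4411.zz9@spoof.invalid>",
               "<junk.9023.zz9@spoof.invalid>",
               "<junk.17.zz9@spoof.invalid>"]
REGULAR_IDS = ["<abc.123@news.example.net>",
               "<4411.zz8@other.invalid>"]


def pattern_is_safe(pattern: str, his_ids: list[str], other_ids: list[str]) -> bool:
    """True if the regex matches all of his IDs and none of the regulars'."""
    rx = re.compile(pattern)
    return (all(rx.search(mid) for mid in his_ids)
            and not any(rx.search(mid) for mid in other_ids))


def references_to_message_id(scorefile: str, pattern: str) -> str:
    """Rewrite the last 'References:' condition as a 'Message-ID:' one.

    Assumes slrn-style layout: one 'Header: regex' condition per line.
    """
    lines = scorefile.splitlines()
    for i in range(len(lines) - 1, -1, -1):  # the new score is at the end
        stripped = lines[i].lstrip()
        if stripped.startswith("References:"):
            indent = lines[i][: len(lines[i]) - len(stripped)]
            lines[i] = f"{indent}Message-ID: {pattern}"
            break
    else:
        raise ValueError("no References condition found")
    return "\n".join(lines) + "\n"


# The part that stays the same across his sample IDs (the changing part cut).
PATTERN = r"\.zz9@spoof\.invalid>$"
print("safe:", pattern_is_safe(PATTERN, SPOOFER_IDS, REGULAR_IDS))

# Hypothetical entry as the GUI might leave it, before the hand edit.
SCOREFILE = ("[sci.crypt]\n"
             "Score: -9999\n"
             "    References: <junk\\.4411\\.zz9@spoof\\.invalid>\n")
print(references_to_message_id(SCOREFILE, PATTERN))
```

Run the check against as many of his and the regulars' message-IDs as you can collect; a pattern that passes on three samples can still over-match on the fourth.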
As I said tho, the good attackers won't overlook the message-ID. They'll set it themselves rather than leaving it to their provider, and you'll have no reliable way to score their posts. The best attackers won't just fake the message-ID; they'll make it look like the ones used by the regular they're impersonating, so matching it will unfortunately match that regular's real posts as well.

BTW, that highwinds-media entry looks familiar. My ISP (Cox) outsources its news service to them, so all Cox users get that stamp. If it's actually a Cox user, however, and not some other user of the same server, a number of other headers will show up as well, including an unencrypted NNTP-Posting-Host, an X-Complaints-To header listing [EMAIL PROTECTED], and an X-Trace header listing the same user IP as the NNTP-Posting-Host and the same server as the POSTED entry. If it doesn't have those elements, it's probably not a Cox user anyway. Unfortunately, none of those headers normally appear in the overviews, so pan can't properly score against them. =8^(

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

_______________________________________________
Pan-users mailing list
[email protected]
http://lists.nongnu.org/mailman/listinfo/pan-users
