I talked to Justin and he is not aware of a SURFnet machine. They are a
targeted sponsor who has helped us. I've sent one email to see if I can
dig anything up, but I must concur.
Dave's research shows that "trap-proc.spamassassin.org and
spam-upload.spamassassin.org both resolve to 192.87.106.247 which is a
SURFnet IP in the Netherlands. I am not able to ping or nmap any
response to that IP."
With that, we are giving up on the mass-check centralized box and
shutting down the trap-proc and sought processing.
These are the tasks:
1 - Find out more from SURFnet about the 192.87.106.247 machine [KAM]
2 - [Joe, can you help?] Find out what Sonic is trapping and where they
are sending it.
3 - Look at Sought 2.0 [Perhaps with AXB and Dave Jones]
4 - Delete the backup data on sa-vm1, move /usr/local/spamassassin to a
different mount so that entire partition can be returned to Infra [DAVE]
5 - Open Infra ticket to return data. [KAM]
So Joe can you do the following:
- Please deprecate the old colo box. Thank you for hosting and
providing it on behalf of the project!
- the ASF would like to recognize Sonic as a sponsor. Any update on the
paperwork? If paperwork is an issue, I can grandfather you without it.
- How does our bandwidth usage look to Sonic? Should we bump it up/down?
Dave:
Any chance you can handle these tasks or let me know if they already
were done?
- Put an A record for the box into DNS for apachesf.spamassassin.org
- Add to SysAdmins Docs / wiki the new apachesf and remove colo and note
that trap-proc is no longer online.
On 1/16/2018 7:19 PM, Dave Jones wrote:
On 01/16/2018 09:00 AM, Kevin A. McGrail wrote:
Joe and Dave + SASA:
I have combed my notes. Here's what I have. I have removed some
password info from it, but I think it can help rebuild the process.
Dave, can you take a look? I can get you passwords. Talon1 still exists,
spamassassin-vm does not, I have a backup of spamassassin-vm from 3.7
months ago.
Regards,
KAM
#1 - Some boxes are just names for other boxes
trap-proc.spamassassin.org. Sonic has scripts set up to archive
collected spam to that server.
trap-proc.spamassassin.org and spam-upload.spamassassin.org both
resolve to 192.87.106.247 which is a SURFnet IP in the Netherlands. I
am not able to ping or nmap any response to that IP.
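The "just names for other boxes" check can be repeated for any list of
hostnames. The following is a minimal sketch, not something that exists
on any of our boxes: group_by_address and resolve_ipv4 are hypothetical
names, and it only looks at the first IPv4 address per name.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Optional


def resolve_ipv4(hostname: str) -> Optional[str]:
    """Return the first IPv4 address for hostname, or None if lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


def group_by_address(
    hostnames: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve_ipv4,
) -> Dict[Optional[str], List[str]]:
    """Group hostnames by the address they resolve to.

    Names that share a key are aliases for the same box. Names that
    fail to resolve are collected under the key None.
    """
    groups: Dict[Optional[str], List[str]] = defaultdict(list)
    for name in hostnames:
        groups[resolver(name)].append(name)
    return dict(groups)
```

Calling group_by_address(["trap-proc.spamassassin.org",
"spam-upload.spamassassin.org"]) should put both names under one key if
they still point at the same IP. It says nothing about whether the box
answers; that still needs ping or nmap.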
#2 - My notes from spamassassin-vm.apache.org that catastrophically
died:
this was the traps cron that needs to be added on spamassassin-vm
20 2 * * * rsync -rze ssh --whole-file --size-only --delete j...@trap-proc.spamassassin.org:/home/jm/cor/. /export/home/bbmass/uploadedcorpora/traps/.
DONE - add this traps account
DONE - fix perms for /export/home/bbmass/uploadedcorpora/traps/
DONE - add cron job
Since this IP is not responding to anything, I'm not sure how long this
has been offline. I can't see what purpose this would be serving
based on the masscheck flow that worked last May on sa-vm1.apache.org.
The old spamassassin-vm.apache.org used to run a buildbot version for
on-demand masschecking of uploaded corpus but we don't have that set up
again anywhere. If we do set this up again, I don't want to put this
on sa-vm1.apache.org without any swap space. I don't see the value
of uploading ham/spam and running the masscheck centrally anymore.
#3 - From April 2017
Let me know if you are not the correct person to talk to about this, but
we are having issues reaching trap-proc.spamassassin.org. It looks like
we have some scripts set up to archive collected spam to that server,
and I haven't seen a successful connection for a few days now.
--
Grant Keller
System Operations
grant.kel...@sonic.com
#4 - from 2014
The box at Sonic is the backend for the SpamAssassin spamtraps feed.
To be honest, I am not sure anyone or anything is consuming the
collected data at this stage -- it should probably be shut down,
unless someone wants to take it over?
My vote is to shut it down.
incoming.spamassassin.org : this is the spamtrap machine at Sonic.
Basically, qpsmtpd handles the incoming SMTP traffic, handing it off via
a Gearman queue to "gears" -- a set of scripts running in the background
which filter out noise, crap, bounces, etc., then buffer them to mbox
files and upload.
/home/trap contains the code, /home/trapper is the output files.
/etc/init.d/gears starts the scripts which compose it, copying them to
/tmpfs first so they don't hit the disk where possible, for speed.
The main config file is at /home/trap/code/gears/config .
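For anyone rebuilding the "filter, then buffer to mbox" stage, here is
a rough sketch of the idea using only the Python standard library. It
is not the gears code (which I have not seen in this form): the bounce
heuristic, looks_like_bounce and buffer_to_mbox are all assumptions for
illustration, and the Gearman queue and the upload step are left out.

```python
import mailbox
from email.message import Message
from typing import Iterable


def looks_like_bounce(msg: Message) -> bool:
    """Heuristic only: null return path, DSN report, or mailer-daemon sender."""
    return_path = (msg.get("Return-Path") or "").strip()
    if return_path == "<>":
        return True
    if msg.get_content_type() == "multipart/report":
        return True
    sender = (msg.get("From") or "").lower()
    return "mailer-daemon" in sender


def buffer_to_mbox(messages: Iterable[Message], mbox_path: str) -> int:
    """Append every non-bounce message to an mbox file; return how many were kept.

    Assumes a single writer, so no locking is done here.
    """
    box = mailbox.mbox(mbox_path)  # created if it does not exist yet
    kept = 0
    try:
        for msg in messages:
            if looks_like_bounce(msg):
                continue  # drop noise before it reaches the buffer
            box.add(msg)
            kept += 1
    finally:
        box.close()  # close() flushes pending writes to disk
    return kept
```

A real filter would need more rules than this (the original talks
about noise and crap as well as bounces), but the shape is the same:
decide per message, append the keepers to one mbox per batch.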
The buffered mbox files are then uploaded to my S3 account, using an
IAM credential which can only access one single bucket called
"mailtrap". After 1 day those files are auto-expired.
This stuff all appears to be working ok, although the volume is pretty
high (and I suspect it's costing me a fair bit of money even despite
the auto-expiration!)
The gears logs show no recent activity. I think this has stopped
being fed data/files a long time ago.
Next step is spamassassin2.zones.apache.org, which has an alias of
"trap-proc.spamassassin.org" in DNS. A cron on my user account runs
"/home/trapscripts/copy_to_corpus" which (at least at some point)
appears to have selected a randomised subset of uploaded spam corpora
into /home/jm/cor/spam and /home/jm/cor/nonspam. Those directories are
now empty, so I think this part may have broken at some point in 2013 :(
I can't track down the script which downloads files from the S3
account, annoyingly!
Again, everything there runs as "jm".
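Since copy_to_corpus itself is not reproduced here, this is only a
guess at what "selected a randomised subset" could look like if someone
rewrote it. The function name, the fraction parameter and the seed are
all hypothetical; the real script may have picked files differently.

```python
import random
import shutil
from pathlib import Path
from typing import List, Optional, Union

PathLike = Union[str, Path]


def copy_random_subset(
    src_dir: PathLike,
    dest_dir: PathLike,
    fraction: float,
    seed: Optional[int] = None,
) -> List[Path]:
    """Copy a random fraction of the files in src_dir into dest_dir.

    Returns the destination paths of the copied files.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Sort first so a given seed always picks the same files.
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    count = int(len(files) * fraction)
    chosen = random.Random(seed).sample(files, count)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(p, dest / p.name)) for p in chosen]
```

For example, copy_random_subset("uploads/spam", "cor/spam", 0.1) would
copy roughly a tenth of the files; both directory names there are
placeholders, not the paths from the notes above.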
Again, don't see the value of running a centralized masscheck anymore
especially on sa-vm1.apache.org.
Finally there is talon1. The host is talon1.pccc.com; username "jm",
password is in
spamassassin2.zones.apache.org/root/sought_rules_info.txt (readable
only by root).
I thought the SOUGHT rulesets became "unsupported" a long time ago. If
you want me to check out talon1.pccc.com, send me the creds.
That host is being used to generate the SOUGHT rulesets, and as far as
I can see (apologies, I haven't been monitoring it at all recently!) it
still seems to be doing so. It all runs from the "jm" user account,
every 4 hours from cron; see "crontab -l". Part of the process is to
rsync-over-ssh the ham and spam corpus from jm@spamassassin2.
Then the final step of that script is to publish the files to my server
at taint.org, by "svn commit"ing in ~/sought on talon1. That directory
commits back to an svn repo on that host over svn+ssh, then SSHes to
that host and runs a script; that generates GPG signatures, updates the
DNS records and pushes it to the Cloudfront/S3 bucket for
rules.yerp.org. If/when you guys take this over, this bit definitely
needs to be moved to another host and account, since that's my main
personal server ;)
Having said that, I'm happy to hand over the credentials to all the
other "jm" accounts named above. I've put the passwords into
spamassassin2.zones.apache.org/root/sought_rules_info.txt (readable
only by root).
Feel free to take over those accounts and do what you will with them ;)
Sorry I haven't handed this over earlier -- even reverse-engineering
all this took quite a lot of effort. Legacy systems suck!
I definitely agree!
The trap data comes from: It's partly a typical spamtrapping MX
capturing dead domains, and partly /etc/aliases forwards from other
ISPs around the world, who are following the "how to donate your
spamtrap to SA" instructions on the wiki. Note that the latter means
that we have to do a bunch of stripping off forwarding steps when/if
we act on that data.
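One reading of "stripping off forwarding steps" is dropping the
Received headers that the donor's forwarding hop and our own MX added,
so the message looks the way it did when it first hit the donor's
server. That reading is an assumption, as are the function name and the
idea that a fixed hop count per donor is enough. A sketch:

```python
import email
from email.message import Message


def strip_forwarding_hops(msg: Message, hops: int) -> Message:
    """Remove the newest `hops` Received headers, keeping header order.

    Received headers are prepended by each relay, so the first ones in
    the message are the most recent (forwarding) hops.
    """
    headers = msg.items()  # snapshot of (name, value) pairs in order
    for name in {name for name, _ in headers}:
        del msg[name]  # deletes every occurrence of that header
    skipped = 0
    for name, value in headers:
        if name.lower() == "received" and skipped < hops:
            skipped += 1
            continue
        msg[name] = value  # assignment appends, so order is preserved
    return msg
```

Usage would be something like strip_forwarding_hops(
email.message_from_string(raw_text), 2), where 2 is whatever number of
extra hops a given donor's forward adds. Anything else the forwarder
rewrote (envelope sender, added X- headers) is not handled here.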
The domains are MX records hanging off existing, live domains; e.g. I'd
add a "mx.taint.org", seed a few email addresses in those domains, e.g.
for web scrapers, then MX the entire domain to the traps machine.
> - the alias forwards: where are they pointing to?
Essentially there's a *@incoming.spamassassin.org catch-all, and the
alias forwards redirect spam into named addresses there.
sought_rules_info.txt:
jm@talon1 password: <removed>
incoming.spamassassin.org = traps machine: u root p <removed>
u jm p <removed>
jm account on zones2: <removed>
Doing about 74000 messages per day as of 2/5/2014
DONE - 1 - Get SSH access working - Pinged Justin on 4/22 - WORKS AS
of 4/23 GOING TO 76.191.162.2
DONE - 2 - why does incoming.spamassassin.org have two IPs? - Emailed
Justin
incoming.spamassassin.org. 3507 IN A 76.191.162.2
incoming.spamassassin.org. 3507 IN A 75.101.166.134
Huh. I had no idea we were still doing that ;) That is the
Mailchannels spamtrap IP. If you remember back in 2008 (private@ was
cc'd), they donated spamtrap hosting to us, in exchange for spam
data. We eventually moved off the donated spamtrap server (in EC2)
which they were paying for, to the current one in PCCC. It looks like
we never changed the 50:50 split setup though on the MX record (and
I'd forgotten about it). I think we can probably turn that off now...
76.191.162.2 is our one. I've just verified that I'm able to SSH to
it as root.
DONE - 2a - Remove 75.101.166.134 from incoming.spamassassin.org. DNS
entry
I think I added the second incoming A record (didn't see the first
one) to the new sa-vm1.apache.org just because I found an Apache HTTPD
virtual host setup on the old box for incoming but it's not doing
anything.
3 - more?
On 1/15/2018 11:49 AM, Dave Jones wrote:
--
Dave