As a follow-up to, but off-topic from the bug report ...

> ------- Additional Comments From [EMAIL PROTECTED] 2004-01-25 02:18 -------
> I don't like the idea of having to run mass-checks manually and
> extracting domain names to check from that -- mostly because most
> people won't do it.
>
> How about this:
>
> - Extract registerable domain part using reportedly existing heuristics
>   (hostpart.spammer.co.uk -> spammer.co.uk)
Over the weekend, I collected 3,600 host names associated with 16,300 URLs extracted from about 80,000 spam messages going back to August of this year. They're sorted in reverse dot order, for example:

    trimtram.net
    trinketreach.net
    www.try4free.net
    www.ultrastats.net
    umbrellacover.net
    www.usagov.net
    www.usaskylink.net
    ns.usenetsolution.net
    www.vacationpromo.net
    mysite.verizon.net
    viva-x.net
    www.vivato.net
    bradford.hfwnflvzxb.wealthnation.net
    lane.nerbq.wealthnation.net
    www.whitephantom.net
    www.whitetrashsluts.net
    www.whoringfor-college.net
    www.wideep.net

As you can see, the wealthnation.net entries, for example, sort together even though their host name prefixes differ.

Question: is there a Perl package that can boil these down to their domain name part, suitable for a whois lookup? Where I'm going with this is to try to build a database of domains that share the same registrar, technical point of contact, and so on.

One approach I thought of was to try a whois on each fully qualified host name above and, if it doesn't succeed, remove the first component and try again, and so on, but that's not very elegant.

Regarding whois, I tried a few of the domains in the list and noticed that whois turned up empty for them. Is there a database somewhere that relates domain names to their registrar, or to a server that will reply with their whois info?
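To make the two ideas above concrete, here is a minimal sketch in Python (not Perl, and not an existing package). It shows the suffix heuristic from the quoted comment and the list of names the strip-and-retry approach would query. The suffix set and the function names `registrable_domain`, `whois_candidates`, and `group_by_domain` are my own assumptions for illustration. A real run would need a maintained suffix list, and no whois query is actually made here.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny sample of two-label suffixes.
# This set is an assumption; real data needs a maintained list.
MULTI_LABEL_SUFFIXES = {"co.uk", "org.uk", "com.au", "co.jp"}


def registrable_domain(host):
    """Return the suffix plus one label, e.g. www.vivato.net -> vivato.net."""
    labels = host.lower().rstrip(".").split(".")
    # Use a two-label suffix when the host ends in one, else the bare TLD.
    suffix_len = 2 if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES else 1
    if len(labels) <= suffix_len:
        # Nothing to trim: the name is a bare suffix or a single label.
        return ".".join(labels)
    return ".".join(labels[-(suffix_len + 1):])


def whois_candidates(host):
    """List the names the strip-and-retry approach would try, in order."""
    labels = host.lower().rstrip(".").split(".")
    # Drop one leading label at a time, stopping before the bare TLD.
    return [".".join(labels[i:]) for i in range(len(labels) - 1)]


def group_by_domain(hosts):
    """Map each registrable domain to the host names seen under it."""
    groups = defaultdict(list)
    for host in hosts:
        groups[registrable_domain(host)].append(host)
    return dict(groups)
```

With the sample above, `group_by_domain` would put both wealthnation.net hosts under one key, which is the unit a whois lookup wants. The suffix table is the weak point of the heuristic: any two-label suffix missing from it (the co.uk case in the quoted comment is the obvious example) gets cut one label too short.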
