Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
> > We just created an URL signature algorithm to be able to query an entire > URL at our URIBL: > > https://spfbl.net/en/uribl/ > > Now we are able to blacklist any malicious shortener URL > > > Leandro, > > Thanks for all you do! And good luck with that. But there are a few > potential problems. When I analyzed Google's shortners about a month ago, I > found that a VERY large percentage of the most malicious shortened URLs > were a situation where the spammers were generating a unique shortner for > each individual message/recipient-address. This causes the following HUGE > problems (at least for THESE particular shortners) when publishing a > full-URL dnsbl: > Thank you for all those observations! > (1) much of what you populate your rbldnsd file with is going to be > totally ineffective for anyone since it ONLY applied to whatever single > email address where the spam was original sent (where you had trapped it) - > everyone else is going to get DIFFERENT shortners for the spam from these > same campaigns that are sent to their users. > You are right, but we do not use rbldnsd. We have our own DNSBL implementation that uses a more efficient data structure. Anyway, I think it is not a good idea to list each shortener, as you said. Maybe we should list a shortener only after three complaints about it. We will find out what works best over time. > (2) get ready for EXTREME rbldnsd bloat. You're gonna need a LOT of RAM > eventually? And if you ever distribute via rsync, those are going to be > HUGE rsync files (and then THEY will need a lot of RAM). Sadly, most of > that bloat is going to come from entries that are doing absolutely nothing > for anyone. > That is it! We use a VM with 16GB of RAM, and the software is using about 10GB to keep more than 30 million records in memory. That is about 350 bytes per record. Our software has an expiration mechanism, so this memory usage is not growing too fast now. But we must always keep an eye on it. 
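Leandro's two ideas here, listing a shortener only after several complaints and expiring old records so memory stays bounded, can be sketched as a small in-memory structure. This is a minimal Python sketch under stated assumptions, not SPFBL's implementation: the class name `ComplaintList`, the threshold of three and the seven-day TTL are hypothetical. (For scale: 10 GB spread over 30 million records is roughly 330 to 360 bytes per record, depending on whether a GB means 10^9 or 2^30 bytes, which is consistent with the figure above.)

```python
import time
from typing import Dict, Optional, Tuple


class ComplaintList:
    """Hypothetical in-memory list keyed by URL signature.

    A key counts as listed only after `threshold` complaints, and a key
    with no new complaint for `ttl` seconds expires and is forgotten.
    """

    def __init__(self, threshold: int = 3, ttl: float = 7 * 24 * 3600) -> None:
        self.threshold = threshold  # assumed: "three complaints" per shortener
        self.ttl = ttl              # assumed expiry window, in seconds
        self._entries: Dict[str, Tuple[int, float]] = {}  # key -> (count, last_seen)

    def complain(self, key: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        count, last_seen = self._entries.get(key, (0, now))
        if now - last_seen > self.ttl:
            count = 0  # stale entry: start counting again
        self._entries[key] = (count + 1, now)  # each complaint refreshes the timer

    def is_listed(self, key: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        entry = self._entries.get(key)
        if entry is None:
            return False
        count, last_seen = entry
        if now - last_seen > self.ttl:
            del self._entries[key]  # expired: free the memory
            return False
        return count >= self.threshold

    def purge(self, now: Optional[float] = None) -> int:
        """Drop every expired entry and return how many were removed."""
        now = time.time() if now is None else now
        expired = [k for k, (_, seen) in self._entries.items() if now - seen > self.ttl]
        for k in expired:
            del self._entries[k]
        return len(expired)
```

Refreshing the timer on every complaint keeps an actively abused shortener listed while quiet ones age out, which is one way to keep memory from growing without bound.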
> (3) You might be revealing your spam traps to the spammers. In cases where > the spammers are sending that 1-to-1 spam to single recipient shortners, > then all they gave to do is enumerate through their list of shortners, > checking them against your list - and they INSTANTLY get a list of every > recipient address that triggers a listing on your DNSBL. If you want to > destroy the effectiveness of your own DNSBL's spam traps - be my guest. But > if you're getting 3rd party spam feeds (paid or free) - then know that > you're then screwing over your 3rd party spam feed's spam traps - and those > OTHER anti-spam system that rely on such feeds, which will then diminish in > quality. (unless you are filtering OUT these MANY 1-to-1 shortner spams) > Spamtraps are not the only thing that will trigger this listing. Active users will trigger it too, through their complaints. The spammer will not know which address is a spamtrap and which is an active user. > Maybe there is enough OTHER shortners (that are sending the same shortners > to multiple recipients) to make this worthwhile? But the bloat from the > ones that are uniquely generated could be a challenge, and could > potentially cause a MASSIVE amount of useless queries. I'd be very > interested to see what PERCENTAGE of such queries generated a hit! > > Meanwhile, in my analysis I did about a month ago, about 80% of Google's > shortners found in egregious spams (that did this one-to-one > shorter-to-recipient tactic)... were all banging on one of ONLY a dozen > different spammers' domains. Therefore, doing a lookup on these and then > checking the domain found at the base of the link it redirects to... is a > more effective strategy for these - whereas, for THESE 80% of egregious > google shortners, a full URL lookup is worthless, consuming resources > without a single hit. > That is right. We see the same situation here. But checking the first URL is not the only thing we do. 
Our script can follow shortener redirections and catch the spammer by the last URL of the redirection chain: https://www.dropbox.com/s/5aorrijafw5ygk0/uribl.pl?dl=0 Spammers can be caught by any shortener they use, or by the dozen domains those shorteners hide. > Alternatively, you may have found a way to filter out these types of > individualized shortners, to prevent that bloat? But even then, everyone > should know that while your new list might be helpful, it would be good for > others to know your new list isn't applicable to a large percentage of > spammy shortners, since it is still useless against these individualized > shortners. > I think we all must cause as much work for spammers as they cause for us. If the spammer uses individualized shorteners, we can list each one by cross-referencing it with the already-listed domains at the end of the redirection chain. If they use individualized URL domains, we can list each one by cross-referencing it with the listed IP those domains resolve to (the same machine for all of the spammer's domains). We can make it more and more expensive for spammers. But we must work together to do it. > NOTE: Google has made
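The redirect-following and cross-referencing described above can be sketched offline. In this minimal Python sketch, the `redirects` mapping stands in for the HTTP `Location` headers a real script would fetch, and `resolve` stands in for DNS; the function names, the hop limit and the example hosts are hypothetical and are not taken from the uribl.pl script linked above.

```python
from typing import Dict, Iterable, Optional, Set
from urllib.parse import urlsplit


def final_url(url: str, redirects: Dict[str, str], max_hops: int = 10) -> str:
    """Follow a redirect chain and return the last URL reached.

    `redirects` maps a URL to the URL it redirects to; in a real filter this
    would come from HTTP Location headers. Stops on loops or after max_hops.
    """
    seen: Set[str] = set()
    for _ in range(max_hops):
        if url in seen or url not in redirects:
            break
        seen.add(url)
        url = redirects[url]
    return url


def shorteners_to_list(
    shortened: Iterable[str],
    redirects: Dict[str, str],
    listed_domains: Set[str],
    resolve: Dict[str, str],
    listed_ips: Set[str],
) -> Set[str]:
    """Pick the individualized short URLs whose chain ends at a listed domain,
    or at a domain that resolves to an already-listed IP."""
    result: Set[str] = set()
    for short in shortened:
        host = urlsplit(final_url(short, redirects)).hostname or ""  # already lowercased
        ip: Optional[str] = resolve.get(host)
        if host in listed_domains or (ip is not None and ip in listed_ips):
            result.add(short)
    return result
```

The hop limit and the `seen` set keep a malicious redirect loop from hanging the filter.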
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
> > > > Then the frequency is 10 per second, not 100ms. Querying more often > > > is a higher frequency. > > > > That is it! 10 per second or one every 100ms. The first is a flow rate > and > > the second is a frequency. > > One every 100ms is a frequency, agreed. > > Two every 100ms is a higher frequency, and means faster requests. > > One every 50ms is the same rate as two every 100ms, therefore it is also a > higher frequency than one every 100ms. > You are right! My mistake. I just fixed the information on the website. Thanks! > > Regards, > > > Antony. > > -- > I wasn't sure about having a beard at first, but then it grew on me. > >
Re: OT: Frequency vs. Period (was Re: The "goo.gl" shortner...)
2018-04-03 11:57 GMT-03:00 Dianne Skoll: > On Tue, 3 Apr 2018 11:09:38 -0300 > Leandro wrote: > > > This means, for example, your system do 10 queries at same second, > > then the query frequency is 100ms. > > In SI units, frequency has the unit s^(-1) and period has the unit s, > where s stands for "second" > > So 100ms is the period, and 10/s is the frequency. Basic dimensional > analysis. > You are right! My mistake. I just fixed the information on the website. Thanks! > > Regards, > > Dianne. >
Re: OT: Congratulations Dianne
Thank you everyone. I hope this leads to good things for email filtering. > Sorry, but what is AppRiver, and what is Roaring Penguin, and who is > Dianne? Answers to those questions are all a Google query away. It is off-topic for Spamassassin, I grant you, and hence the OT: tag. Thank you again everyone, but I think this thread should end. Regards, Dianne.
Re: OT: Congratulations Dianne
Axb writes: > AppRiver Acquires Roaring Penguin > > https://globenewswire.com/news-release/2018/03/26/1453063/0/en/AppRiver-Acquires-Roaring-Penguin.html Sorry, but what is AppRiver, and what is Roaring Penguin, and who is Dianne? It seems like people are responding as if this isn't spam, so I'm actually kind of curious, because if you have no idea and are just here because of SpamAssassin, it's a bit of a weird message.
Re: OT: Congratulations Dianne
>Excellent! Dianne, I hope you benefited greatly in this acquisition! Rob, let's add some beers to your hope to celebrate it! :-) PedroD
Re: OT: Congratulations Dianne
On 4/3/2018 1:18 PM, Axb wrote: AppRiver Acquires Roaring Penguin https://globenewswire.com/news-release/2018/03/26/1453063/0/en/AppRiver-Acquires-Roaring-Penguin.html Excellent! Dianne, I hope you benefited greatly in this acquisition! -- Rob McEwen https://www.invaluement.com
Re: OT: Congratulations Dianne
$Congrats{'Dianne'} += 1 ; # :-) PedroD >On Tuesday, April 3, 2018, 7:18:08 PM GMT+2, Axb wrote: >AppRiver Acquires Roaring Penguin >>https://globenewswire.com/news-release/2018/03/26/1453063/0/en/AppRiver-Acquires-Roaring-Penguin.html
OT: Congratulations Dianne
AppRiver Acquires Roaring Penguin https://globenewswire.com/news-release/2018/03/26/1453063/0/en/AppRiver-Acquires-Roaring-Penguin.html
Re: OT: Frequency vs. Period (was Re: The "goo.gl" shortner...)
Note: Goo.gl is being shut down. https://www.engadget.com/2018/03/30/google-shutting-down-goo-gl-url-shortening-service/ Apologies if I already noted that here. -- Kevin A. McGrail Asst. Treasurer & VP Fundraising, Apache Software Foundation Chair Emeritus Apache SpamAssassin Project https://www.linkedin.com/in/kmcgrail - 703.798.0171 On Tue, Apr 3, 2018 at 10:57 AM, Dianne Skoll wrote: > On Tue, 3 Apr 2018 11:09:38 -0300 > Leandro wrote: > > > This means, for example, your system do 10 queries at same second, > > then the query frequency is 100ms. > > In SI units, frequency has the unit s^(-1) and period has the unit s, > where s stands for "second" > > So 100ms is the period, and 10/s is the frequency. Basic dimensional > analysis. > > Regards, > > Dianne. >
Re: Spam from addresses where full name mirrors left-hand side of address
On Tue, 3 Apr 2018, RW wrote: > On Mon, 2 Apr 2018 11:33:27 -0700 (PDT) John Hardin wrote: > > On Mon, 2 Apr 2018, Amir Caspi wrote: > > > many organizations -- especially government or other large orgs -- also use firstname.middleinitial.lastname as their user part. > > So require a minimum length for the middle part: > > header THREE_WORD_MONTY From =~ /(\w+) (\w{2,}) (\w+) <\1.\2.\3/ > > > A meta rule using multi-dots could work, by either looking for specific keywords or matching with other spammy indicators... but by itself there's no real way to distinguish these AFAICT. I think a meta rule is the only safe way to go, but personally I would _NOT_ use a rule like the one suggested where the quoted part equals the user part, since every firstname.lastname address will get caught that way. > > Your comment is valid, but the suggested rule requires three parts, so won't hit on firstname.lastname-style mailbox naming. > > However, since it's looking for periods, it won't hit the dash- and underscore-delimited versions. > It looks for . not \. Ah, yes, my mistake. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- The world has enough Mouse Clicking System Engineers. -- Dave Pooser --- 10 days until Thomas Jefferson's 275th Birthday
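RW's point is that an unescaped `.` in a regex matches any character, so the rule also fires on dash- and underscore-delimited user parts. The quick Python check below compares the THREE_WORD_MONTY pattern as posted with a variant using `\.`; Python's `re` treats `\w`, `{2,}` and backreferences like `\1` the same way Perl does for this pattern. The sample headers are made up.

```python
import re
from typing import Tuple

# The THREE_WORD_MONTY pattern as posted: the dots are unescaped.
AS_POSTED = re.compile(r"(\w+) (\w{2,}) (\w+) <\1.\2.\3")
# Same pattern with literal dots, matching only period-delimited user parts.
DOTS_ONLY = re.compile(r"(\w+) (\w{2,}) (\w+) <\1\.\2\.\3")


def check(from_header: str) -> Tuple[bool, bool]:
    """Return (matches as posted, matches with escaped dots)."""
    return bool(AS_POSTED.search(from_header)), bool(DOTS_ONLY.search(from_header))


for header in (
    "John Quincy Adams <John.Quincy.Adams@example.com>",
    "John Quincy Adams <John-Quincy-Adams@example.com>",
    "John Quincy Adams <John_Quincy_Adams@example.com>",
    "John Q Adams <John.Q.Adams@example.com>",
):
    print(check(header), header)
```

As posted, the rule does hit the dash- and underscore-delimited variants; escaping the dots would restrict it to period-delimited user parts. The single-letter middle initial is skipped either way because of `\w{2,}`.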
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
On Tue, 3 Apr 2018 11:21:35 -0400 Rob McEwen wrote: > Thanks for all you do! And good luck with that. But there are a few > potential problems. When I analyzed Google's shortners about a month > ago, I found that a VERY large percentage of the most malicious > shortened URLs were a situation where the spammers were generating a > unique shortner for each individual message/recipient-address. We found that too, but in most cases, they generated the unique URLs by adding query parameters to the same base URL, sort of like this: http://malware.net/?id=znsjdsjau http://malware.net/?id=aosu94e etc... and then shortening them. So if you blacklist just the base URL, you cover those all off, assuming you expand out shortened URLs as part of your processing, of course. > Meanwhile, in my analysis I did about a month ago, about 80% of > Google's shortners found in egregious spams (that did this one-to-one > shorter-to-recipient tactic)... were all banging on one of ONLY a > dozen different spammers' domains. Therefore, doing a lookup on these > and then checking the domain found at the base of the link it > redirects to... is a more effective strategy for these - whereas, for > THESE 80% of egregious google shortners, a full URL lookup is > worthless, consuming resources without a single hit. Yep, that's what we found too. Regards, Dianne.
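Collapsing the per-recipient variants to their base URL, after the shortener has been expanded, can be sketched with Python's `urllib.parse`. The function name is hypothetical, and dropping the whole query string and fragment is an assumption; a real filter may need to keep paths or parameters that carry meaning.

```python
from urllib.parse import urlsplit, urlunsplit


def base_url(url: str) -> str:
    """Strip the query string and fragment so per-recipient variants collapse
    to one key. Scheme and host are lowercased; the path is kept as-is."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", "", ""))


expanded = ["http://malware.net/?id=znsjdsjau", "http://malware.net/?id=aosu94e"]
print({base_url(u) for u in expanded})
```

Both example URLs reduce to the same key, so one blacklist entry covers every per-recipient variant.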
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
On 4/3/2018 9:27 AM, Leandro wrote: > We just created an URL signature algorithm to be able to query an entire URL at our URIBL: > https://spfbl.net/en/uribl/ > Now we are able to blacklist any malicious shortener URL Leandro, Thanks for all you do! And good luck with that. But there are a few potential problems. When I analyzed Google's shorteners about a month ago, I found that a VERY large percentage of the most malicious shortened URLs were a situation where the spammers were generating a unique shortener for each individual message/recipient-address. This causes the following HUGE problems (at least for THESE particular shorteners) when publishing a full-URL dnsbl: (1) much of what you populate your rbldnsd file with is going to be totally ineffective for anyone, since it ONLY applies to whatever single email address the spam was originally sent to (where you had trapped it) - everyone else is going to get DIFFERENT shorteners for the spam from these same campaigns that are sent to their users. (2) get ready for EXTREME rbldnsd bloat. You're gonna need a LOT of RAM eventually? And if you ever distribute via rsync, those are going to be HUGE rsync files (and then THEY will need a lot of RAM). Sadly, most of that bloat is going to come from entries that are doing absolutely nothing for anyone. (3) You might be revealing your spam traps to the spammers. In cases where the spammers are sending that 1-to-1 spam to single recipient shorteners, then all they have to do is enumerate through their list of shorteners, checking them against your list - and they INSTANTLY get a list of every recipient address that triggers a listing on your DNSBL. If you want to destroy the effectiveness of your own DNSBL's spam traps - be my guest. But if you're getting 3rd party spam feeds (paid or free) - then know that you're then screwing over your 3rd party spam feed's spam traps - and those OTHER anti-spam systems that rely on such feeds, which will then diminish in quality.
(unless you are filtering OUT these MANY 1-to-1 shortener spams) Maybe there are enough OTHER shorteners (that are sending the same shorteners to multiple recipients) to make this worthwhile? But the bloat from the ones that are uniquely generated could be a challenge, and could potentially cause a MASSIVE amount of useless queries. I'd be very interested to see what PERCENTAGE of such queries generated a hit! Meanwhile, in my analysis I did about a month ago, about 80% of Google's shorteners found in egregious spams (that did this one-to-one shortener-to-recipient tactic)... were all banging on one of ONLY a dozen different spammers' domains. Therefore, doing a lookup on these and then checking the domain found at the base of the link it redirects to... is a more effective strategy for these - whereas, for THESE 80% of egregious Google shorteners, a full URL lookup is worthless, consuming resources without a single hit. Alternatively, you may have found a way to filter out these types of individualized shorteners, to prevent that bloat? But even then, everyone should know that while your new list might be helpful, it would be good for others to know your new list isn't applicable to a large percentage of spammy shorteners, since it is still useless against these individualized shorteners. NOTE: Google has made some improvements recently, and I haven't yet analyzed how much those improvements have changed any of these things I've mentioned? PS - the alphanumeric code at the end of these shorteners tends to be case-sensitive, while the rest of the URL is NOT case sensitive (and they also work with both "https" and "http")... so you might want to standardize this on (1) https and (2) everything lower case up until the code at the end of the shortener - before the MD5 is calculated. Otherwise, it could easily break if the spammer just mixes up the capitalization of the shortener URL up until the code at the end of the shortener. -- Rob McEwen https://www.invaluement.com
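The normalization suggested in the PS (force https, lowercase everything up to the trailing code, then hash) might look like this. It is a minimal Python sketch, assuming the case-sensitive code is the last path segment and that any query string or fragment can be dropped; it does not reproduce SPFBL's actual signature algorithm, which is not described in this thread, and `normalize_shortener` and `signature` are hypothetical names.

```python
import hashlib
from urllib.parse import urlsplit


def normalize_shortener(url: str) -> str:
    """Force https and lowercase everything except the trailing code.

    Assumes the case-sensitive code is the last path segment; any query
    string or fragment is dropped.
    """
    parts = urlsplit(url.strip())
    prefix, _, code = parts.path.rstrip("/").rpartition("/")
    return f"https://{parts.netloc.lower()}{prefix.lower()}/{code}"


def signature(url: str) -> str:
    """Hypothetical signature: MD5 hex digest of the normalized URL."""
    return hashlib.md5(normalize_shortener(url).encode("utf-8")).hexdigest()
```

With this, `HTTP://Goo.GL/AbC12` and `https://goo.gl/AbC12` produce the same hash, while `https://goo.gl/abc12` (a different short code) does not.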
Re: Spam from addresses where full name mirrors left-hand side of address
On Mon, 2 Apr 2018 11:33:27 -0700 (PDT) John Hardin wrote: > On Mon, 2 Apr 2018, Amir Caspi wrote: > > > many organizations -- especially government or other > > large orgs -- also use firstname.middleinitial.lastname as their > > user part. > > So require a minimum length for the middle part: > >header THREE_WORD_MONTY From =~ /(\w+) (\w{2,}) (\w+) <\1.\2.\3/ > > > A meta rule using multi-dots could work, by either looking for > > specific keywords or matching with other spammy indicators... but > > by itself there's no real way to distinguish these AFAICT. I think > > a meta rule is the only safe way to go, but personally I would > > _NOT_ use a rule like the one suggested where the quoted part > > equals the user part, since every firstname.lastname address will > > get caught that way. > > Your comment is valid, but the suggested rule requires three parts, > so won't hit on firstname.lastname-style mailbox naming. > > However, since it's looking for periods, it won't hit the dash- and > underscore-delimited versions. It looks for . not \.
OT: Frequency vs. Period (was Re: The "goo.gl" shortner...)
On Tue, 3 Apr 2018 11:09:38 -0300 Leandro wrote: > This means, for example, your system do 10 queries at same second, > then the query frequency is 100ms. In SI units, frequency has the unit s^(-1) and period has the unit s, where s stands for "second" So 100ms is the period, and 10/s is the frequency. Basic dimensional analysis. Regards, Dianne.
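Dianne's dimensional argument can be restated in code: the period is the time between queries, and the frequency is its reciprocal. A trivial Python sketch; the function names are hypothetical.

```python
def period_seconds(queries: int, window_seconds: float) -> float:
    """Average time between queries (unit: s)."""
    return window_seconds / queries


def frequency_hz(period: float) -> float:
    """Queries per second (unit: s^-1), the reciprocal of the period."""
    return 1.0 / period


for n in (10, 20):
    t = period_seconds(n, 1.0)
    print(f"{n} queries in 1 s -> period {t * 1000:.0f} ms, frequency {frequency_hz(t):.0f}/s")
```

Halving the period from 100 ms to 50 ms doubles the frequency from 10/s to 20/s, which is why a 50 ms spacing is a higher frequency, not a lower one.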
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
On Tuesday 03 April 2018 at 16:43:09, Leandro wrote: > 2018-04-03 11:35 GMT-03:00 RW: > > On Tue, 3 Apr 2018 11:09:38 -0300 Leandro wrote: > > > 2018-04-03 10:34 GMT-03:00 Antony Stone: > > > > "IMPORTANT: Current limit is 100 ms per IP block. Lower frequencies > > > > require contribution. Please contact us informing your IP or range, > > > > for further details." > > > > > > This means, for example, your system do 10 queries at same second, > > > then the query frequency is 100ms. > > > > Then the frequency is 10 per second, not 100ms. Querying more often > > is a higher frequency. > > That is it! 10 per second or one every 100ms. The first is a flow rate and > the second is a frequency. One every 100ms is a frequency, agreed. Two every 100ms is a higher frequency, and means faster requests. One every 50ms is the same rate as two every 100ms, therefore it is also a higher frequency than one every 100ms. Regards, Antony. -- I wasn't sure about having a beard at first, but then it grew on me. Please reply to the list; please *don't* CC me.
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
2018-04-03 11:35 GMT-03:00 RW: > On Tue, 3 Apr 2018 11:09:38 -0300 > Leandro wrote: > > > 2018-04-03 10:34 GMT-03:00 Antony Stone < > > antony.st...@spamassassin.open.source.it>: > > > > "IMPORTANT: Current limit is 100 ms per IP block. Lower frequencies > > > require contribution. Please contact us informing your IP or range, > > > for further details." > > > > > > > > > This means, for example, your system do 10 queries at same second, > > then the query frequency is 100ms. > > Then the frequency is 10 per second, not 100ms. Querying more often > is a higher frequency. > That is it! 10 per second or one every 100ms. The first is a flow rate and the second is a frequency.
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
> > > > > > > "IMPORTANT: Current limit is 100 ms per IP block. Lower frequencies > > > require contribution. Please contact us informing your IP or range, for > > > further details." > > > > This means, for example, your system do 10 queries at same second, then > the > > query frequency is 100ms. > > Yes, I got that bit. > > How big is an IP block? > Maybe I did not understand your question, but our DNSBL lists individual IPs, even /128 for IPv6. Sometimes our system lists entire blocks, like /64 for IPv6 and /24 for IPv4. > > > > Please could you explain what this means; what limitations are imposed > on > > > use of this service - specifically what is an "IP block", and do you > really > > > mean "lower frequencies require contribution"? Surely that should be > > > "higher"? > > > > Yes, I am sure. Lets use the same example above, but now your system do > 20 > > queries at same second, then the query frequency becomes 50ms, less than > > first case. > > Ah; I would call 50ms the interval and 20 queries per second the frequency. > > Thanks for the explanation. > > You are always welcome!
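One reading of "100 ms per IP block" is a minimum spacing of 100 ms between queries from the same block, i.e. at most 10 queries per second. The sketch below assumes the block is the /24 (IPv4) or /64 (IPv6) mentioned above for listings; whether SPFBL's rate limit uses the same granularity is an assumption, and `ip_block` and `BlockRateLimiter` are hypothetical names.

```python
import ipaddress
from typing import Dict


def ip_block(addr: str) -> str:
    """Map an address to its assumed block: /24 for IPv4, /64 for IPv6."""
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 64
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


class BlockRateLimiter:
    """Allow at most one query per block every `min_interval` seconds."""

    def __init__(self, min_interval: float = 0.1) -> None:  # 100 ms
        self.min_interval = min_interval
        self._last: Dict[str, float] = {}  # block -> time of last allowed query

    def allow(self, addr: str, now: float) -> bool:
        block = ip_block(addr)
        last = self._last.get(block)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the last allowed query from this block
        self._last[block] = now
        return True
```

Rejected queries do not reset the timer, so a client that backs off to one query every 100 ms is always served.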
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
On Tue, 3 Apr 2018 11:09:38 -0300 Leandro wrote: > 2018-04-03 10:34 GMT-03:00 Antony Stone < > antony.st...@spamassassin.open.source.it>: > > "IMPORTANT: Current limit is 100 ms per IP block. Lower frequencies > > require contribution. Please contact us informing your IP or range, > > for further details." > > > > > This means, for example, your system do 10 queries at same second, > then the query frequency is 100ms. Then the frequency is 10 per second, not 100ms. Querying more often is a higher frequency.
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
On Tuesday 03 April 2018 at 16:09:38, Leandro wrote: > 2018-04-03 10:34 GMT-03:00 Antony Stone: > > On Tuesday 03 April 2018 at 15:27:11, Leandro wrote: > > > Hey guys. We just created an URL signature algorithm to be able to > > > query an entire URL at our URIBL: > > > > > > https://spfbl.net/en/uribl/ > > > > I don't think I understand the following statement on that page: > > > > "IMPORTANT: Current limit is 100 ms per IP block. Lower frequencies > > require contribution. Please contact us informing your IP or range, for > > further details." > > This means, for example, your system do 10 queries at same second, then the > query frequency is 100ms. Yes, I got that bit. How big is an IP block? > > Please could you explain what this means; what limitations are imposed on > > use of this service - specifically what is an "IP block", and do you really > > mean "lower frequencies require contribution"? Surely that should be > > "higher"? > > Yes, I am sure. Lets use the same example above, but now your system do 20 > queries at same second, then the query frequency becomes 50ms, less than > first case. Ah; I would call 50ms the interval and 20 queries per second the frequency. Thanks for the explanation. Antony. -- 90% of networking problems are routing problems. 9 of the remaining 10% are routing problems in the other direction. The remaining 1% might be something else, but check the routing anyway. Please reply to the list; please *don't* CC me.
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
2018-04-03 10:34 GMT-03:00 Antony Stone <antony.st...@spamassassin.open.source.it>: > On Tuesday 03 April 2018 at 15:27:11, Leandro wrote: > > > Hey guys. We just created an URL signature algorithm to be able to query an entire URL at our URIBL: > > > > https://spfbl.net/en/uribl/ > > I don't think I understand the following statement on that page: > > "IMPORTANT: Current limit is 100 ms per IP block. Lower frequencies require contribution. Please contact us informing your IP or range, for further details." > This means, for example, that if your system does 10 queries in the same second, then the query frequency is 100ms. > > Please could you explain what this means; what limitations are imposed on use of this service - specifically what is an "IP block", and do you really mean "lower frequencies require contribution"? Surely that should be "higher"? > Yes, I am sure. Let's use the same example, but now your system does 20 queries in the same second; then the query frequency becomes 50ms, lower than in the first case. > > Thanks, > > Antony. > > -- > There's a good theatrical performance about puns on in the West End. It's a play on words. > > Please reply to the list; please *don't* CC me.
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
On Tuesday 03 April 2018 at 15:27:11, Leandro wrote: > Hey guys. We just created an URL signature algorithm to be able to query an > entire URL at our URIBL: > > https://spfbl.net/en/uribl/ I don't think I understand the following statement on that page: "IMPORTANT: Current limit is 100 ms per IP block. Lower frequencies require contribution. Please contact us informing your IP or range, for further details." Please could you explain what this means; what limitations are imposed on use of this service - specifically what is an "IP block", and do you really mean "lower frequencies require contribution"? Surely that should be "higher"? Thanks, Antony. -- There's a good theatrical performance about puns on in the West End. It's a play on words. Please reply to the list; please *don't* CC me.
Re: The "goo.gl" shortner is OUT OF CONTROL (+ invaluement's response)
Hey guys. We just created a URL signature algorithm that makes it possible to query an entire URL at our URIBL: https://spfbl.net/en/uribl/ Now we are able to blacklist any malicious shortener URL. Next, I will think about a public complaint interface that automatically lists any correctly identified malicious sample using some simple AI. All you have to do now is implement an SA plugin that computes this signature and performs the URIBL query. Regards, Leandro SPFBL.net
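For anyone starting on such a plugin: DNSBLs are conventionally queried by prepending a key to the list's zone and resolving the name as an A record, with an answer in 127.0.0.0/8 meaning "listed". A full URL cannot be used directly as a DNS label (labels are limited to 63 characters, and URLs contain characters like `/` and `:`), which is one reason to hash it. This is a minimal Python sketch assuming an MD5 hex signature (MD5 is assumed only because it comes up elsewhere in the thread) and a placeholder zone `uribl.example.invalid`; neither is confirmed as SPFBL's actual format, so check https://spfbl.net/en/uribl/ for the real query syntax.

```python
import hashlib
import socket

# Placeholder zone: NOT the real SPFBL URIBL zone.
ZONE = "uribl.example.invalid"


def query_name(url: str, zone: str = ZONE) -> str:
    """Build a DNSBL-style query name: <md5-hex-of-url>.<zone>."""
    signature = hashlib.md5(url.encode("utf-8")).hexdigest()  # 32 hex chars fits one label
    return f"{signature}.{zone}"


def is_listed(url: str, zone: str = ZONE) -> bool:
    """Resolve the query name; an answer in 127.0.0.0/8 conventionally means 'listed'."""
    try:
        answer = socket.gethostbyname(query_name(url, zone))
    except socket.gaierror:  # NXDOMAIN or lookup failure: treat as not listed
        return False
    return answer.startswith("127.")
```

A real SpamAssassin plugin would be written in Perl and would use SpamAssassin's own asynchronous DNS machinery rather than a blocking resolver call; this only illustrates the query shape.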
Re: This sucks
Hello Giovanni, On Tue, Apr 03, 2018 at 11:04:46AM +0200, Giovanni Bechis wrote: > if you start spamd from /root and you use a perl module that is using "use > lib 'lib';" or similar piece of code the relevant code will not load because > the user spamd is running on (spamd or whichever you have configured) will > not have access to $PWD. Thank you very much - this makes sense. NetAddr uses such a construct and I can confirm that triggering a DNS query before setuid is called will make the problem go away. Despite what has already been said about starting spamd from /root I think this should be addressed because people might stumble over it while doing debugging. Regards, Michael Brunnbauer -- ++ Michael Brunnbauer ++ netEstate GmbH ++ Geisenhausener Straße 11a ++ 81379 München ++ Tel +49 89 32 19 77 80 ++ Fax +49 89 32 19 77 89 ++ E-Mail bru...@netestate.de ++ http://www.netestate.de/ ++ ++ Sitz: München, HRB Nr.142452 (Handelsregister B München) ++ USt-IdNr. DE221033342 ++ Geschäftsführer: Michael Brunnbauer, Franz Brunnbauer ++ Prokurist: Dipl. Kfm. (Univ.) Markus Hendel
Re: This sucks
On Mon, Apr 02, 2018 at 03:09:34AM +0200, Michael Brunnbauer wrote: [...] > So being in /root when started changes the behavior of spamd. Is it possible > that this is a timing issue? Could "\# 4 7f03" be some unprocessed > response that would be converted to 127.0.0.3 a moment later? Or is there > some other explanation for this? > if you start spamd from /root and you use a Perl module that is using "use lib 'lib';" or a similar piece of code, the relevant code will not load, because the user spamd is running as (spamd or whichever you have configured) will not have access to $PWD. Giovanni