On 12/3/2009 3:02 PM, Peter Saint-Andre wrote:
On 12/2/09 2:22 PM, Jesse Thompson wrote:
Peter Saint-Andre wrote:
On 11/25/09 11:53 AM, Jesse Thompson wrote:
Peter Saint-Andre wrote:
I think that the key for the 'right/best' anti-SPAM XMPP solution is to
involve regular/polite XMPP users in any way.
I have my doubts that normal users will bother to flag messages as spam.
However, given that I have only ever received a few spam messages over
XMPP (and even those I wasn't 100% sure about), perhaps it would not be
such a huge burden.
I like the idea of account-level reputation.  The current, most
troublesome, battlefront in the war against email spam is dealing
with spammer-created freemail accounts,

Most of the large, public XMPP IM services essentially offer "freechat"
accounts. The use of CAPTCHAs at, e.g., jabber.org is a small hurdle.

CAPTCHAs won't stop them from creating accounts.

Agreed. That's why I said the hurdle was small. :)

This idea is probably too elaborate, but I'll throw it out there since it would actually leverage CAPTCHAs nicely.

Would it be possible for the server to force the sender to solve a CAPTCHA for every "new" conversation between users that have not already authorized each other in their rosters?
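
A rough sketch of that gate, just to make the idea concrete. Everything here is hypothetical (the CaptchaGate class, its fields, the example JIDs); it is not the API of any XMPP server, and it assumes a solved challenge stays valid for that sender/recipient pair.

```python
from dataclasses import dataclass, field


@dataclass
class CaptchaGate:
    """Decide whether an inbound message needs a CAPTCHA first.

    Hypothetical helper, not part of any real XMPP server API.
    """
    # recipient bare JID -> set of sender bare JIDs authorized in the roster
    rosters: dict = field(default_factory=dict)
    # (sender, recipient) pairs that have already solved a challenge
    solved: set = field(default_factory=set)

    def needs_captcha(self, sender: str, recipient: str) -> bool:
        if sender in self.rosters.get(recipient, set()):
            return False  # already authorized in the roster: never challenge
        return (sender, recipient) not in self.solved

    def record_solved(self, sender: str, recipient: str) -> None:
        # Assumption: one solved challenge covers all later messages
        # from this sender to this recipient.
        self.solved.add((sender, recipient))
```

The server would call needs_captcha() on the first stanza of a conversation and hold the message until record_solved() has been called for that pair.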

And here is another off-the-wall idea: is there some way to implement greylisting in XMPP? The idea here is to initially tempfail (if that is even possible in XMPP) a "conversation", but then accept it when the sending server retries.
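
For illustration, the bookkeeping side of greylisting could look something like the sketch below. It is keyed on the sender/recipient JID pair, which is an assumption (email greylisting usually also includes the client IP), and the Greylist class and min_delay value are made up. How a "tempfail" would actually be signalled in XMPP is left open, as in the question above.

```python
import time


class Greylist:
    """Tempfail the first attempt from an unknown (sender, recipient) pair."""

    def __init__(self, min_delay=60.0, clock=time.monotonic):
        self.min_delay = min_delay   # seconds a sender must wait before a retry counts
        self.clock = clock           # injectable so the logic can be tested
        self.first_seen = {}         # (sender, recipient) -> time of first attempt

    def check(self, sender, recipient):
        """Return "tempfail" or "accept" for this delivery attempt."""
        key = (sender, recipient)
        now = self.clock()
        if key not in self.first_seen:
            self.first_seen[key] = now
            return "tempfail"        # first contact: ask the sender to retry
        if now - self.first_seen[key] < self.min_delay:
            return "tempfail"        # retried too quickly
        return "accept"              # a patient retry: let it through
```

The bet is the same as in email: a legitimate server retries later, while bulk senders tend not to bother.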

And another idea: is there a way to implement content scanning? If I get an unsolicited message from someone not in my roster, can the client or server send the content to a service for classification?


Take a look at this
list of email "phishing reply dropbox" email addresses that we have been
collecting over the past year or two.

https://aper.svn.sourceforge.net/svnroot/aper/phishing_reply_addresses

Nice. And some of those double as IM addresses.

and with phished account credentials
on closed systems.

I think we've seen less of this on the XMPP network because we don't
have very good web integration.

No, the phishers just ask the users to reply via email with their
account credentials.  The link above is a list of these reply
destination email accounts.

Or, they put up a web form somewhere.

You would be surprised how many users will give away their credentials
to anyone that asks.

Sadly, you're probably right.

You could apply an account-level reputation system at the server as well
as the client.

An XMPP operator could set up the server to block domains whose
trustworthy account ratio is below their tolerance level.  This would
effectively block domains that have only spammers.  But it would not
block domains like jabber.org or gmail that are trustworthy but have
spammers signing up for free accounts.
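
As a minimal sketch of that domain-level check: the function name, the tolerance of 0.5 and the min_accounts floor are all invented for illustration, and where the trusted/total counts come from is exactly the open question in this thread.

```python
def domain_allowed(trusted, total, tolerance=0.5, min_accounts=10):
    """Return True if stanzas from a domain should be accepted.

    trusted and total are counts of accounts seen from that domain.
    tolerance and min_accounts are hypothetical operator-chosen knobs.
    """
    if total < min_accounts:
        return True  # too little data to judge, so do not block
    return trusted / total >= tolerance
```

A spammer-only domain (0 trusted out of 200) is refused, while a large public service with mostly legitimate accounts stays above the tolerance level.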

Agreed.

For spamming accounts in trustworthy domains, the server operator could
set it up to block accounts that meet a certain untrustworthiness
threshold.

So when mydomain.com receives an inbound stanza from [email protected], it
would check the trust score of the sender?

yeah

That could generate a lot of traffic. Perhaps it could be optimized to
check only on the first message received in a chat session (although
"chat session" is mostly undefined at the protocol level).

Yeah. However, keep in mind that the server/client can inherently trust any traffic that is between users that have already authorized each other in their rosters.
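
Putting those two optimizations together might look roughly like this. The InboundFilter class, the score range of 0 to 1 and the 0.3 threshold are assumptions, and lookup_score stands in for whatever reputation service ends up existing. Since "chat session" is undefined, the sketch simply caches the first decision per sender/recipient pair and never expires it.

```python
class InboundFilter:
    """Check a sender's trust score only when it is actually needed."""

    def __init__(self, lookup_score, threshold=0.3):
        self.lookup_score = lookup_score  # callable: bare JID -> score in [0, 1]
        self.threshold = threshold        # hypothetical cut-off
        self.decisions = {}               # (sender, recipient) -> cached bool

    def accept(self, sender, recipient, roster):
        if sender in roster:
            return True                   # roster contacts are inherently trusted
        key = (sender, recipient)
        if key not in self.decisions:
            # Only the first message from this sender triggers a lookup.
            self.decisions[key] = self.lookup_score(sender) >= self.threshold
        return self.decisions[key]
```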


Or, the users could do it at the client level.

That seems like more work. See above about user laziness. :)

My thought was more that ambitious developers will be more able to
integrate it into the clients before it is adopted into server software
and deployed by the operators.  Think of it as a way to bridge the gap.

I'm not opposed to both methods, although I think that development of
clients and servers is about equal in speed these days.

It has more to do with necessity. Right now, too few users of any given service actually have a problem with spam to justify the service operator spending time on implementing anti-spam. Those users who do have a spam problem would gravitate to clients that support anti-spam in the meantime.


Anti-spam scanning was built into email clients well before it became
common on the server side (around 2002).  Once the servers caught up,
the client approach became less effective, but it is still useful in
some situations.

Agreed. And the client-side approach might tie in nicely with rosters.

The key is to figure out how to collect and expose the data in a private
way.

Your thoughts are welcome.

Do you mean the scores need to be private, or the source data needs to
be private?

I was initially thinking of a trust network: I trust someone who is
trusted by the people I trust.  I could then set it up so that people
who are very trustworthy are allowed to send me anonymous messages and I
will auto-authorize into my roster, someone who is completely foreign to
my trust network is blocked from sending messages, and various levels in
between.
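
One way to picture the trust network is as a graph walk: count the hops from me to the sender and map the distance to a policy. This is only a sketch; the hop limits and the policy names ("auto-authorize", "ask-user", "block") are invented here, and the graph is assumed to be a plain dict of who trusts whom.

```python
from collections import deque


def trust_distance(graph, me, sender, max_hops=3):
    """Breadth-first search: hops from me to sender, or None if out of reach."""
    seen = {me}
    queue = deque([(me, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == sender:
            return hops
        if hops < max_hops:
            for peer in graph.get(node, ()):
                if peer not in seen:
                    seen.add(peer)
                    queue.append((peer, hops + 1))
    return None


def policy(distance):
    """Map a trust distance to an action (cut-offs are assumptions)."""
    if distance is None:
        return "block"            # completely foreign to my trust network
    if distance <= 2:
        return "auto-authorize"   # trusted by someone I trust
    return "ask-user"             # one of the levels in between
```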

Some of this data is already available within the server roster
databases, but otherwise it would have to be fed by opt-in contributors.
The problem with this trust network approach is that the data could be
mined by spammers and phishers, so it would need to be kept private
somehow.

Otherwise, traditional DNSBLs (specifically, URIBLs of JIDs) are the way
to go.  It might be possible to work with the existing DNSBL providers
to create a new blacklist of JIDs.

Yes, that's worth exploring, though I'd like a way to query it in XMPP
and not over DNS.

As an analogy, with regard to cross-network IM, we have some clients that rely on transports, and others that implement the protocol directly. The DNS part of the implementation should be relatively easy since clients already look up SRV records, but the UI would be non-trivial regardless of how the client does the query. If it's implemented using service discovery, then the user could configure their client to use an external anti-spam service just like they can use an external transport today. But there has to be someone willing to run the service.

How the anti-spam plugin/service is implemented depends on where the data is stored. If you want to leverage existing URIBLs, then the plugin would have to be capable of querying via DNS. Are DNSBLs still using DNS because it is the best way to do it, or is it just legacy? That's something I'm not sure of.
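
To give a feel for the DNS variant, here is a sketch of a DNSBL-style lookup for a JID. No such list exists as far as this thread knows: the zone "jidbl.example", the SHA-1 hashing of the JID and the lowercase normalization are all assumptions. The only borrowed convention is the usual DNSBL one, where a listed entry resolves to an address in 127.0.0.0/8 and an unlisted one does not resolve.

```python
import hashlib
import socket


def dnsbl_query_name(jid, zone="jidbl.example"):
    """Build the DNS name to query for a bare JID.

    The JID is hashed because '@' is not valid in a hostname label.
    Both the hashing scheme and the zone are hypothetical.
    """
    # Crude normalization; real JID comparison rules are stricter.
    digest = hashlib.sha1(jid.lower().encode("utf-8")).hexdigest()
    return f"{digest}.{zone}"


def is_listed(jid, zone="jidbl.example", resolve=socket.gethostbyname):
    """True if the list answers with a 127.0.0.0/8 address."""
    try:
        answer = resolve(dnsbl_query_name(jid, zone))
    except OSError:
        return False  # lookup failed or name does not exist: treat as not listed
    return answer.startswith("127.")
```

The resolve parameter is there so the same check could be pointed at something other than DNS, for example a query made over XMPP.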

I kind of wish that our service actually got spam so that I could try out some of these ideas. :-) Spammers: hit me!

Jesse



Peter


--
  Jesse Thompson
  Division of Information Technology, University of Wisconsin-Madison
  Email/IM: [email protected]
