[General] Webboard: Antispam algorithm

2013-12-18 Thread bar
Author: fasfuuiios
Email: 
Message:
I forgot to add that this black hat SEO program is still under active 
development, because spam activity has been growing since the end of 
November.

They say on black hat forums that this program can currently recognize 
up to 100,000 text-based captchas, and that it collects these 
questions and sends them to the developers' servers for analysis. 

One way to stop them is to use the database from
http://www.stopforumspam.com/
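
For example, each poster's IP could be checked against that database 
over HTTP. A minimal Python sketch, assuming the public 
api.stopforumspam.org endpoint and its JSON response layout (both 
should be verified against the current StopForumSpam documentation):

import json
import urllib.request

def is_listed_spammer(ip):
    # Assumed endpoint and format; see stopforumspam.com for the real API.
    url = "https://api.stopforumspam.org/api?json&ip=" + ip
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    # "appears" is assumed to be 1 when the IP is in the database.
    return bool(data.get("ip", {}).get("appears"))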

But that needs statistics and preparation, so the simple solution 
described earlier looks much better. It can save a lot of traffic and 
keep search results clean, but it can also harm normal sites. 



Reply: http://www.mnogosearch.org/board/message.php?id=21616



Re: [General] Webboard: Antispam algorithm

2013-12-12 Thread John McCormac

On 11/12/2013 15:59, b...@mnogosearch.org wrote:

> Author: fasfuuiios
> Email:
> Message:
> Currently it looks like there is no way to stop indexing of spammed
> sites. Link spammers even spam this board automatically from time to
> time. That software is very pluggable and can be adapted for any type
> of CMS and submission form.


One approach would be to build a table of problem/spam links. Then if a 
site has any of these toxic links, either drop the website or add the 
toxic link to a regexp.
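
In Python that check could look roughly like the sketch below, with 
the table kept as a plain set and the spam hosts invented for 
illustration:

from urllib.parse import urlparse

# Hypothetical table of known toxic link hosts (a database in practice).
TOXIC_HOSTS = {
    "badpharma.example",
    "casino-spam.example",
}

def has_toxic_link(page_links):
    """True if any outbound link of the page points at a toxic host."""
    return any(urlparse(u).hostname in TOXIC_HOSTS for u in page_links)

A page that matches could then be dropped outright, or its host added 
to a reject pattern (e.g. a Disallow rule) in indexer.conf.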



> I thought about a global, dirty solution that could hunt spam during
> the indexing process. Here is the idea.
>
> -
>
> Say we have a new option for versions 3.4+:
>
> ExternalLinkCount [maxlinks] [maxpages] [nofollow]
>
> maxlinks is the limit on external links per page. (Spammers try to
> add direct links for PageRank etc.)


From the work I do every month on TLD web usage surveys (measuring how 
websites are used in TLDs and the percentages of active/holding 
page/PPC/redirects), link spam is either comment-form spam or spam 
links injected into cracked Joomla/WordPress sites. Comment spam can 
be blocked with a regexp. The injected link spam may be invisible to 
ordinary browsers but visible to search engines due to CSS rules. 
These sites often run an old version of Joomla or WordPress, or a 
vulnerable plug-in, but they do not have many outbound toxic links, 
and those toxic links generally change each month.
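
The simplest inline case of such hidden links can be spotted with 
Python's standard HTML parser; a rough sketch (injections that hide 
links through stylesheets or wrapper elements would not be caught):

from html.parser import HTMLParser

HIDDEN_RULES = ("display:none", "visibility:hidden")

class HiddenLinkFinder(HTMLParser):
    """Collects hrefs of <a> tags hidden with inline CSS."""
    def __init__(self):
        super().__init__()
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        style = (a.get("style") or "").replace(" ", "").lower()
        if a.get("href") and any(r in style for r in HIDDEN_RULES):
            self.hidden_links.append(a["href"])

finder = HiddenLinkFinder()
finder.feed('<a style="display: none" href="http://spam.example/">x</a>')
print(finder.hidden_links)   # ['http://spam.example/']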



> This will delete any page that has more than 20 external links.


Most news sites will have more than this, as they will have a network 
of their own sites plus analytics, advertising, social media and other 
links. Web directories still exist, and they typically have more than 
20 outbound links per page.


(Will post more later.)

Regards...jmcc


[General] Webboard: Antispam algorithm

2013-12-11 Thread bar
Author: fasfuuiios
Email: 
Message:
Currently it looks like there is no way to stop indexing of spammed 
sites. Link spammers even spam this board automatically from time to 
time. That software is very pluggable and can be adapted for any type 
of CMS and submission form. 

I thought about a global, dirty solution that could hunt spam during 
the indexing process. Here is the idea.

-

Say we have a new option for versions 3.4+:

ExternalLinkCount [maxlinks] [maxpages] [nofollow]

maxlinks is the limit on external links per page. (Spammers try to 
add direct links for PageRank etc.)

maxpages is the limit on probably-spammed pages on the same host.

nofollow is true or false: whether links marked rel=nofollow are also 
counted when filtering spam pages.

---

Examples:

ExternalLinkCount 20

This will delete any page that has more than 20 external links.

ExternalLinkCount 20 20

This will automatically ban and remove a site that has more than 20 
pages, each of which has more than 20 external links.

ExternalLinkCount 20 20 true

The same as the previous example, counting links both with and 
without nofollow.

ExternalLinkCount 20 20 false

The same, but counting only direct (followed) links that play with 
PageRank etc.

---
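
A hypothetical Python sketch of these semantics (ExternalLinkCount 
does not exist in mnogoSearch; the helper names and page 
representation below are made up):

from urllib.parse import urlparse

def external_links(page_url, links, count_nofollow):
    """Count off-host links; links is a list of (href, is_nofollow)."""
    host = urlparse(page_url).hostname
    return sum(
        1 for href, is_nofollow in links
        if urlparse(href).hostname not in (None, host)
        and (count_nofollow or not is_nofollow)
    )

def apply_filter(site_pages, maxlinks=20, maxpages=20, nofollow=True):
    """site_pages: {url: links}. Returns (pages to drop, ban whole site)."""
    drop = [url for url, links in site_pages.items()
            if external_links(url, links, nofollow) > maxlinks]
    return drop, len(drop) > maxpages

With nofollow=False only followed links are counted, matching the 
last example above.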

This is not ideal: it can cut normal pages, although webmasters who 
use nofollow as Google recommends are fairly safe. It can cut blog 
pages with tons of good comments, and big scientific pages, catalogs 
and wikis are probably not safe from such dirty filtering either. 

Anyway, this is probably the simplest way to catch sites that have 
tons of spammed pages. With high limits it could probably help.



Here is an example of a site that is currently under a spam attack. 
It generates thousands of such spammed pages. That is why I thought 
about this problem in a very basic but cruel way.

http://www.gksbeton.ru/index.php/peremychki-pb/item/35-novost-1/35-novost-1?start=400


Reply: http://www.mnogosearch.org/board/message.php?id=21609



[General] Webboard: Antispam algorithm

2013-12-11 Thread bar
Author: fasfuuiios
Email: 
Message:
I'm not completely sure that it's a good idea, but it is probably 
better than nothing at all. Of course, it needs testing and analysis. 
I believe that a normal HTML page has no more than 5 external links. 
Currently even paid links are usually limited to 3, and they are 
placed inside the article to avoid Google filter penalties etc. 
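
One way to start that analysis is a histogram of external-link counts 
over a sample crawl; a minimal sketch with hypothetical names:

from collections import Counter
from urllib.parse import urlparse

def external_count(page_url, hrefs):
    # Count links whose host differs from the page's own host.
    host = urlparse(page_url).hostname
    return sum(urlparse(h).hostname not in (None, host) for h in hrefs)

def link_histogram(pages):
    """pages: {url: [href, ...]}; maps external-link count -> pages."""
    return Counter(external_count(u, hs) for u, hs in pages.items())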

Reply: http://www.mnogosearch.org/board/message.php?id=21610
