> On Mar 28, 2019, at 2:10 PM, Konstantin Tokarev <annu...@yandex.ru> wrote:
> 
> 
> 
> 28.03.2019, 23:58, "Alexey Proskuryakov" <a...@webkit.org>:
>> Hello,
>> 
>> The robots.txt file that we have on bugs.webkit.org currently allows search 
>> engines access to 
>> individual bug pages, but not to any bug lists. As a result, search engines 
>> and the Internet Archive only index bugs that were filed before robots.txt 
>> changes a few years ago, and bugs that are directly linked from webpages 
>> elsewhere. These bugs are where most spam content naturally ends up.
>> 
>> This is quite wrong, as indexing just a subset of bugs is not beneficial to 
>> anyone other than spammers. So we can go in either direction:
>> 
>> 1. Allow indexers to enumerate bugs, thus indexing all of them.
>> 
>> Seems reasonable that people should be able to find bugs using search 
>> engines.
> 
> Yes, and it may give even better results than searching Bugzilla directly.
> 
>> On the other hand, we'll need to do something to ensure that indexers don't 
>> destroy Bugzilla performance,
> 
> This can be solved by caching
> 
>> and of course spammers will love having more flexibility.
> 
> rel="nofollow" on all links in comments should be enough to make spamming 
> useless

Theoretically yes… but a couple of Google searches say it doesn’t make a 
difference. Here is one of many:
https://www.seroundtable.com/google-nofollow-link-attribute-failed-comments-26959.html

I expect that spammers don’t really care whether they get a nofollow or not; 
they are mostly unattended scripts anyway.

I’m not opposed to adding this; I just don’t expect it to solve the problem. 
We could measure and see.

Lucas


> 
>> 
>> 2. Block indexing completely.
>> 
>> Seems like no one was bothered by lack of indexing on new bugs so far.
> 
> That's survivorship bias: if nobody can find relevant bugs, nobody will ever 
> complain.
> 
>> 
>> Thoughts?
>> 
>> For reference, here is the current robots.txt content:
>> 
>> $ curl https://bugs.webkit.org/robots.txt
>> User-agent: *
>> Allow: /index.cgi
>> Allow: /show_bug.cgi
>> Disallow: /
>> Crawl-delay: 20
>> 
>> - Alexey
>> 
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev@lists.webkit.org
>> https://lists.webkit.org/mailman/listinfo/webkit-dev
> 
> -- 
> Regards,
> Konstantin
> 
