Re: [webkit-dev] Spam and indexing

Konstantin Tokarev Thu, 28 Mar 2019 14:13:09 -0700

28.03.2019, 23:58, "Alexey Proskuryakov" <a...@webkit.org>:
> Hello,
>
> The robots.txt file that we have on bugs.webkit.org currently allows search 
> engines access to individual bug pages, but not to any bug lists. As a 
> result, search engines and the Internet Archive only index bugs that were 
> filed before robots.txt changes a few years ago, and bugs that are directly 
> linked from webpages elsewhere. These are the bugs where most spam content 
> naturally ends up.
>
> This is quite wrong, as indexing just a subset of bugs is not beneficial to 
> anyone other than spammers. So we can go in either direction:
>
> 1. Allow indexers to enumerate bugs, thus indexing all of them.
>
> Seems reasonable that people should be able to find bugs using search engines.

Yes, and it may even give better results than searching Bugzilla directly.

> On the other hand, we'll need to do something to ensure that indexers don't 
> destroy Bugzilla performance,

This can be solved by caching.

> and of course spammers will love having more flexibility.

Adding rel="nofollow" to all links in comments should be enough to make 
spamming useless.
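
To make the idea concrete, here is a minimal Python sketch of turning a 
plain-text comment into HTML in which every link carries rel="nofollow". 
This is not Bugzilla's actual code; the function name linkify_comment and 
the URL pattern are assumptions made purely for illustration.

```python
import html
import re

# Deliberately simple URL pattern (an assumption; real linkifiers are stricter).
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def linkify_comment(text: str) -> str:
    """Escape a plain-text comment and wrap URLs in rel="nofollow" anchors."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        # Escape the text between links so user input cannot inject markup.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0), quote=True)
        # rel="nofollow" asks search engines not to credit the link target.
        parts.append(f'<a href="{url}" rel="nofollow">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(linkify_comment("See https://example.com/x for details"))
```

Because the attribute is added when the comment is rendered, it would apply 
to every link in every comment, including spam that has not been removed yet.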

>
> 2. Block indexing completely.
>
> Seems like no one was bothered by lack of indexing on new bugs so far.

That's survivorship bias: if nobody can find relevant bugs, nobody will ever 
complain.

>
> Thoughts?
>
> For reference, here is the current robots.txt content:
>
> $ curl https://bugs.webkit.org/robots.txt
> User-agent: *
> Allow: /index.cgi
> Allow: /show_bug.cgi
> Disallow: /
> Crawl-delay: 20
>
> - Alexey
>
> _______________________________________________
> webkit-dev mailing list
> webkit-dev@lists.webkit.org
> https://lists.webkit.org/mailman/listinfo/webkit-dev
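
As a rough illustration of what the quoted robots.txt permits, here is a 
minimal Python sketch using the standard library's urllib.robotparser. The 
crawler name "ExampleBot", the bug id, and the buglist.cgi query are made up 
for the example. Python's parser applies rules in file order; other crawlers 
may resolve Allow/Disallow conflicts differently, although for these 
particular rules the outcome should be the same.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted above, copied verbatim.
ROBOTS_TXT = """\
User-agent: *
Allow: /index.cgi
Allow: /show_bug.cgi
Disallow: /
Crawl-delay: 20
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://bugs.webkit.org"

# An individual bug page matches "Allow: /show_bug.cgi".
bug_allowed = parser.can_fetch("ExampleBot", BASE + "/show_bug.cgi?id=12345")

# A bug list matches only "Disallow: /", so crawlers cannot enumerate bugs.
list_allowed = parser.can_fetch("ExampleBot", BASE + "/buglist.cgi?product=WebKit")

# Requested delay in seconds between requests for this user agent.
delay = parser.crawl_delay("ExampleBot")

print("bug page allowed:", bug_allowed)
print("bug list allowed:", list_allowed)
print("crawl delay:", delay)
```

This matches the behaviour described above: a crawler can fetch a bug page 
only if it already knows the bug's URL from somewhere else.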

-- 
Regards,
Konstantin
