29.03.2019, 19:16, "Alexey Proskuryakov" <a...@webkit.org>:
>> On 28 March 2019, at 14:10, Konstantin Tokarev <annu...@yandex.ru> wrote:
>>
>> 28.03.2019, 23:58, "Alexey Proskuryakov" <a...@webkit.org>:
>>> Hello,
>>>
>>> The robots.txt file that we have on bugs.webkit.org currently allows search
>>> engines access to individual bug pages, but not to any bug lists. As a
>>> result, search engines and the Internet Archive only index bugs that were
>>> filed before robots.txt changes a few years ago, and bugs that are directly
>>> linked from webpages elsewhere. These bugs are where most spam content
>>> naturally ends up on.
>>>
>>> This is quite wrong, as indexing just a subset of bugs is not beneficial to
>>> anyone other than spammers. So we can go in either direction:
>>>
>>> 1. Allow indexers to enumerate bugs, thus indexing all of them.
>>>
>>> Seems reasonable that people should be able to find bugs using search
>>> engines.
>>
>> Yes, and it may give better result even than searching bugzilla directly
>>
>>> On the other hand, we'll need to do something to ensure that indexers don't
>>> destroy Bugzilla performance,
>>
>> This can be solved by caching
>
> Is this something that other Bugzilla instances do? I'm actually not sure how
> caching can be meaningfully applied to Bugzilla. One wants to always see the
> latest updates, and our automation in particular won't be OK with stale data.

I'm not sure whether HTTP-level caching can be used here, but a quick search
turns up this:

https://www.bugzilla.org/releases/5.0.4/release-notes.html#feat_caching_performance

If we can update Bugzilla, it should at least be possible to reduce the number
of database hits when pages are rendered.

> - Alexey
>
>>> and of course spammers will love having more flexibility.
>>
>> rel="nofollow" on all links in comments should be enough to make spamming
>> useless
>>
>>> 2. Block indexing completely.
>>>
>>> Seems like no one was bothered by lack of indexing on new bugs so far.
>>
>> That's survival bias - if nobody can find relevant bugs, nobody will ever
>> complain
>>
>>> Thoughts?
>>>
>>> For reference, here is the current robots.txt content:
>>>
>>> $ curl https://bugs.webkit.org/robots.txt
>>> User-agent: *
>>> Allow: /index.cgi
>>> Allow: /show_bug.cgi
>>> Disallow: /
>>> Crawl-delay: 20
>>>
>>> - Alexey

--
Regards,
Konstantin
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev
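
For anyone who wants to check what the quoted robots.txt permits, here is a
minimal sketch (not part of the original message) that feeds those rules to
Python's standard urllib.robotparser. The bug ID and the buglist.cgi query
string are made-up examples, and real crawlers may interpret the rules
somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted in the thread.
ROBOTS_TXT = """\
User-agent: *
Allow: /index.cgi
Allow: /show_bug.cgi
Disallow: /
Crawl-delay: 20
"""

BASE = "https://bugs.webkit.org"


def build_parser(text: str = ROBOTS_TXT) -> RobotFileParser:
    """Parse robots.txt text locally, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, path: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch BASE + path under the parsed rules."""
    return parser.can_fetch(agent, BASE + path)


if __name__ == "__main__":
    parser = build_parser()
    # Hypothetical paths: one bug page, one bug list, and the front page.
    for path in ("/show_bug.cgi?id=12345",
                 "/buglist.cgi?quicksearch=robots",
                 "/index.cgi"):
        print(path, is_allowed(parser, path))
```

Under this parser, individual bug pages and the index page are fetchable while
bug lists are not, which matches the behaviour described at the start of the
thread.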