29.03.2019, 19:16, "Alexey Proskuryakov" <a...@webkit.org>:
>> On Mar 28, 2019, at 14:10, Konstantin Tokarev <annu...@yandex.ru> wrote:
>>
>> 28.03.2019, 23:58, "Alexey Proskuryakov" <a...@webkit.org>:
>>> Hello,
>>>
>>> The robots.txt file that we have on bugs.webkit.org currently allows search 
>>> engines access to individual bug pages, but not to any bug lists. As a 
>>> result, search engines and the Internet Archive only index bugs that were 
>>> filed before the robots.txt changes a few years ago, and bugs that are directly
>>> linked from webpages elsewhere. These bugs are where most spam content
>>> naturally ends up.
>>>
>>> This is quite wrong, as indexing just a subset of bugs is not beneficial to 
>>> anyone other than spammers. So we can go in either direction:
>>>
>>> 1. Allow indexers to enumerate bugs, thus indexing all of them.
>>>
>>> It seems reasonable that people should be able to find bugs using search
>>> engines.
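A minimal sketch of what option 1 could look like, assuming bug lists are served by Bugzilla's standard `buglist.cgi` page. The robots.txt below is hypothetical, not a proposed final file: it keeps the current rules and adds one `Allow` line for the list page. Python's standard `urllib.robotparser` checks which sample URLs a generic crawler could then fetch; the bug ID and query strings are made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical option-1 policy: current rules plus an Allow line for
# Bugzilla's bug-list page (assumed to be buglist.cgi), so crawlers can
# enumerate bugs. This is NOT the file deployed on bugs.webkit.org.
OPTION1_ROBOTS_TXT = """\
User-agent: *
Allow: /index.cgi
Allow: /show_bug.cgi
Allow: /buglist.cgi
Disallow: /
Crawl-delay: 20
"""

BASE = "https://bugs.webkit.org"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def check(parser: RobotFileParser, paths: list[str]) -> dict[str, bool]:
    """Map each path to whether a generic crawler ("*") may fetch it."""
    return {path: parser.can_fetch("*", BASE + path) for path in paths}


option1 = build_parser(OPTION1_ROBOTS_TXT)
results = check(option1, [
    "/show_bug.cgi?id=12345",       # individual bug page (made-up ID)
    "/buglist.cgi?product=WebKit",  # bug list, crawlable under this policy
    "/query.cgi",                   # search form, still blocked by "Disallow: /"
])
for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
print("Crawl-delay:", option1.crawl_delay("*"))
```

Keeping `Crawl-delay: 20` is one way to limit the load concern raised below, but only for crawlers that honor that directive.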
>>
>> Yes, and it may even give better results than searching Bugzilla directly.
>>
>>> On the other hand, we'll need to do something to ensure that indexers don't 
>>> destroy Bugzilla performance,
>>
>> This can be solved by caching.
>
> Is this something that other Bugzilla instances do? I'm actually not sure how 
> caching can be meaningfully applied to Bugzilla. One wants to always see the 
> latest updates, and our automation in particular won't be OK with stale data.

I'm not sure whether HTTP-level caching can be used here, but a quick search
turns up this:
https://www.bugzilla.org/releases/5.0.4/release-notes.html#feat_caching_performance

If we can update Bugzilla, it should at least be possible to reduce the number
of database hits when pages are rendered.

> - Alexey
>
>>> and of course spammers will love having more flexibility.
>>
>> rel="nofollow" on all links in comments should be enough to make spamming
>> useless.
>>
>>> 2. Block indexing completely.
>>>
>>> It seems like no one has been bothered by the lack of indexing of new bugs so far.
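For comparison, option 2 reduces to a single rule. The sketch below uses a hypothetical block-everything robots.txt and Python's standard `urllib.robotparser` to confirm that every sample path, including individual bug pages, becomes off-limits to compliant crawlers. The sample paths and the `Googlebot` user-agent string are illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical option-2 policy: one rule blocking every path for all crawlers.
BLOCK_ALL_ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

BASE = "https://bugs.webkit.org"

parser = RobotFileParser()
parser.parse(BLOCK_ALL_ROBOTS_TXT.splitlines())  # parse text, no network fetch

paths = ["/", "/index.cgi", "/show_bug.cgi?id=12345"]  # made-up bug ID
blocked = {path: not parser.can_fetch("*", BASE + path) for path in paths}
for path, is_blocked in blocked.items():
    print(f"{path}: {'blocked' if is_blocked else 'allowed'}")

# A named crawler with no dedicated section falls back to the "*" rules.
googlebot_allowed = parser.can_fetch("Googlebot", BASE + "/show_bug.cgi?id=12345")
print("Googlebot may fetch a bug page:", googlebot_allowed)
```

Note that robots.txt is advisory: it only stops crawlers that honor it, and by itself it does not remove pages that search engines have already indexed.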
>>
>> That's survivorship bias: if nobody can find relevant bugs, nobody will ever
>> complain.
>>
>>> Thoughts?
>>>
>>> For reference, here is the current robots.txt content:
>>>
>>> $ curl https://bugs.webkit.org/robots.txt
>>> User-agent: *
>>> Allow: /index.cgi
>>> Allow: /show_bug.cgi
>>> Disallow: /
>>> Crawl-delay: 20
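To make the effect of these rules concrete, here is a small check using Python's standard `urllib.robotparser`. It parses the file above as text (no network access) and asks whether a generic crawler may fetch a few sample URLs. The bug ID and the `buglist.cgi` query are illustrative assumptions, not taken from the thread.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt content quoted above, verbatim.
CURRENT_ROBOTS_TXT = """\
User-agent: *
Allow: /index.cgi
Allow: /show_bug.cgi
Disallow: /
Crawl-delay: 20
"""

BASE = "https://bugs.webkit.org"

parser = RobotFileParser()
parser.parse(CURRENT_ROBOTS_TXT.splitlines())  # parse text directly, no fetch

sample_paths = [
    "/show_bug.cgi?id=12345",       # a single bug page (made-up ID)
    "/buglist.cgi?product=WebKit",  # a bug list (assumed URL shape)
    "/index.cgi",                   # the front page
]
verdicts = {path: parser.can_fetch("*", BASE + path) for path in sample_paths}
for path, allowed in verdicts.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
print("Crawl-delay:", parser.crawl_delay("*"))
```

Python's parser applies the first matching rule in file order, while RFC 9309 (which Google follows) picks the longest matching path. Both interpretations agree here because each `Allow` line precedes, and is more specific than, `Disallow: /`.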
>>>
>>> - Alexey
>>>
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev@lists.webkit.org
>>> https://lists.webkit.org/mailman/listinfo/webkit-dev
>>
>> --
>> Regards,
>> Konstantin


-- 
Regards,
Konstantin