Re: Why is fedoraproject.org only indexed by Google?
On Tue, 27 Jan 2015 10:29:23 -0600 Michael Cronenworth m...@cchtml.com wrote:

> On 01/26/2015 09:46 AM, Kevin Fenzi wrote:
>> I think we added the Crawl-delay several years ago when we were
>> having storage issues. We could definitely try removing it and see if
>> things improve.
>
> 10 seconds may be on the high side, but you may still want to keep it
> and lower it to 1 to 5 seconds. Bing Bot hits my sites like it's
> running a benchmark.

Ok. I changed it to 1. Should sync out over the next little while...

We can see if it makes any difference.

kevin

--
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
Re: Why is fedoraproject.org only indexed by Google?
On 26.1.2015 16:46, Kevin Fenzi wrote:

> On Mon, 26 Jan 2015 07:53:27 -0700 Brandon Vincent brandon.vinc...@asu.edu wrote:
>> On Mon, Jan 26, 2015 at 7:21 AM, Florian Weimer fwei...@redhat.com wrote:
>>> Any idea why?
>>
>> https://lists.fedoraproject.org/robots.txt
>>
>> User-agent: *
>> Crawl-delay: 10
>>
>> From Bing, "This means the higher your crawl delay is, the fewer pages
>> BingBot will crawl. As crawling fewer pages may result in getting less
>> content indexed, we usually do not recommend it, although we also
>> understand that different web sites may have different bandwidth
>> constraints." [1].
>>
>> [1] http://blogs.bing.com/webmaster/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbots-question/
>
> Not sure that explains why there are no results at all though.
>
> I think we added the Crawl-delay several years ago when we were having
> storage issues. We could definitely try removing it and see if things
> improve.

Yes please. We definitely should get archives indexed by search engines!

--
Petr Spacek @ Red Hat
Re: Why is fedoraproject.org only indexed by Google?
On 01/26/2015 09:46 AM, Kevin Fenzi wrote:

> I think we added the Crawl-delay several years ago when we were having
> storage issues. We could definitely try removing it and see if things
> improve.

10 seconds may be on the high side, but you may still want to keep it and lower it to 1 to 5 seconds. Bing Bot hits my sites like it's running a benchmark.
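To put the 10-second versus 1-to-5-second trade-off in rough numbers, here is a minimal Python sketch. It assumes a crawler that treats Crawl-delay as a fixed minimum gap between consecutive requests and crawls around the clock; that interpretation is an assumption, since crawlers differ in how they honor the directive. The function name `max_fetches_per_day` is made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86400


def max_fetches_per_day(crawl_delay_seconds: int) -> int:
    """Upper bound on requests per day for one crawler.

    Assumes the crawler waits exactly `crawl_delay_seconds` between
    requests and never pauses otherwise (an idealized model).
    """
    if crawl_delay_seconds <= 0:
        raise ValueError("crawl delay must be a positive number of seconds")
    return SECONDS_PER_DAY // crawl_delay_seconds


# The values mentioned in the thread: 10 (original), 5 and 1 (suggested range).
for delay in (10, 5, 1):
    print(f"Crawl-delay {delay:>2}: at most {max_fetches_per_day(delay)} pages/day")
```

Under this model a delay of 10 caps a crawler at 8640 pages per day, while a delay of 1 allows up to 86400. For a large list archive, the lower cap could plausibly leave many postings unvisited for a long time.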
Why is fedoraproject.org only indexed by Google?
I recently discovered that all search engines except Google (well, the Google U.S. index) do not cover fedoraproject.org well (specifically, lists.fedoraproject.org).

https://duckduckgo.com/?q=site%3Afedoraproject.org+%22Why+no+Class-Path+manifest+attribute%22
http://www.bing.com/search?q=site%3Afedoraproject.org+%22Why+no+Class-Path+manifest+attribute%22
http://us.ask.com/web?q=site%3Afedoraproject.org+%22Why+no+Class-Path+manifest+attribute%22

Any idea why?

I'm not concerned that specific Fedora search results are buried deep down in the general web search. Many, many mailing list postings are not part of the index *at all*. I find this extremely annoying.

I looked at robots.txt and the HTML code in the mailing list archive, but could not spot any obvious offenders.

--
Florian Weimer / Red Hat Product Security
Re: Why is fedoraproject.org only indexed by Google?
On Mon, Jan 26, 2015 at 7:21 AM, Florian Weimer fwei...@redhat.com wrote:

> Any idea why?

https://lists.fedoraproject.org/robots.txt

User-agent: *
Crawl-delay: 10

From Bing, "This means the higher your crawl delay is, the fewer pages BingBot will crawl. As crawling fewer pages may result in getting less content indexed, we usually do not recommend it, although we also understand that different web sites may have different bandwidth constraints." [1].

[1] http://blogs.bing.com/webmaster/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbots-question/

Brandon Vincent
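For anyone who wants to check what delay a given crawler would see, the following is a small sketch using Python's standard `urllib.robotparser` module (its `crawl_delay` method is available from Python 3.6). The robots.txt text is only the two directives quoted above, pasted in as a string rather than fetched; the helper name `crawl_delay_for` and the `bingbot` user-agent token are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Only the two directives quoted in this thread, not the full live file.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt: str, user_agent: str):
    """Return the Crawl-delay that applies to `user_agent`, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


# No bingbot-specific section exists, so the "*" entry applies.
delay = crawl_delay_for(ROBOTS_TXT, "bingbot")
print(delay)
```

Because the rule is declared under `User-agent: *`, it applies to every crawler that honors Crawl-delay and has no more specific section of its own.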
Re: Why is fedoraproject.org only indexed by Google?
On Mon, 26 Jan 2015 07:53:27 -0700 Brandon Vincent brandon.vinc...@asu.edu wrote:

> On Mon, Jan 26, 2015 at 7:21 AM, Florian Weimer fwei...@redhat.com wrote:
>> Any idea why?
>
> https://lists.fedoraproject.org/robots.txt
>
> User-agent: *
> Crawl-delay: 10
>
> From Bing, "This means the higher your crawl delay is, the fewer pages
> BingBot will crawl. As crawling fewer pages may result in getting less
> content indexed, we usually do not recommend it, although we also
> understand that different web sites may have different bandwidth
> constraints." [1].
>
> [1] http://blogs.bing.com/webmaster/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbots-question/

Not sure that explains why there are no results at all though.

I think we added the Crawl-delay several years ago when we were having storage issues. We could definitely try removing it and see if things improve.

kevin