Re: Why is fedoraproject.org only indexed by Google?

2015-01-28 Thread Kevin Fenzi
On Tue, 27 Jan 2015 10:29:23 -0600
Michael Cronenworth m...@cchtml.com wrote:

 On 01/26/2015 09:46 AM, Kevin Fenzi wrote:
  I think we added the Crawl-delay several years ago when we were
  having storage issues. We could definitely try removing it and see
  if things improve.
 
 10 seconds may be on the high side, but you may still want to keep it
 and lower it to 1 to 5 seconds. Bing Bot hits my sites like it's
 running a benchmark.

Ok. I changed it to 1. Should sync out over the next little while... 

We can see if it makes any difference. 

kevin
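
For a rough sense of what lowering Crawl-delay from 10 to 1 could mean, here is a minimal sketch of the arithmetic. It assumes a single crawler that honors Crawl-delay as a minimum wait between successive requests and ignores response time; real crawlers may behave differently. The function name max_fetches_per_day is made up for this illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_fetches_per_day(crawl_delay_seconds):
    """Upper bound on sequential fetches per day for one crawler.

    Assumes the crawler waits crawl_delay_seconds between requests and
    that each response takes no time, so this is a best-case ceiling.
    """
    if crawl_delay_seconds <= 0:
        raise ValueError("crawl delay must be positive")
    return SECONDS_PER_DAY // crawl_delay_seconds


if __name__ == "__main__":
    # The values discussed in this thread: 10 (old), 5 and 1 (suggested).
    for delay in (10, 5, 1):
        print(f"Crawl-delay {delay:>2}: at most "
              f"{max_fetches_per_day(delay)} fetches/day")
```

Under these assumptions the ceiling rises tenfold, from 8640 to 86400 fetches per day, which matters for an archive with many individual message pages.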


-- 
devel mailing list
devel@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct

Re: Why is fedoraproject.org only indexed by Google?

2015-01-27 Thread Petr Spacek
On 26.1.2015 16:46, Kevin Fenzi wrote:
 On Mon, 26 Jan 2015 07:53:27 -0700 Brandon Vincent
 brandon.vinc...@asu.edu wrote:
 
 On Mon, Jan 26, 2015 at 7:21 AM, Florian Weimer fwei...@redhat.com 
 wrote:
 Any idea why?
 
 https://lists.fedoraproject.org/robots.txt
 
 User-agent: *
 Crawl-delay: 10
 
 From Bing: "This means the higher your crawl delay is, the fewer pages
 BingBot will crawl. As crawling fewer pages may result in getting less
 content indexed, we usually do not recommend it, although we also
 understand that different web sites may have different bandwidth
 constraints." [1]
 
 [1] 
 http://blogs.bing.com/webmaster/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbots-question/

 
 Not sure that explains why there are no results at all though.
 
 I think we added the Crawl-delay several years ago when we were having 
 storage issues. We could definitely try removing it and see if things 
 improve.

Yes please. We definitely should get archives indexed by search engines!

-- 
Petr Spacek  @  Red Hat

Re: Why is fedoraproject.org only indexed by Google?

2015-01-27 Thread Michael Cronenworth

On 01/26/2015 09:46 AM, Kevin Fenzi wrote:
 I think we added the Crawl-delay several years ago when we were having
 storage issues. We could definitely try removing it and see if things
 improve.


10 seconds may be on the high side, but you may still want to keep it and lower 
it to 1 to 5 seconds. Bing Bot hits my sites like it's running a benchmark.


Why is fedoraproject.org only indexed by Google?

2015-01-26 Thread Florian Weimer
I recently discovered that all search engines except Google (well, the
Google U.S. index) do not cover fedoraproject.org well (specifically,
lists.fedoraproject.org).

https://duckduckgo.com/?q=site%3Afedoraproject.org+%22Why+no+Class-Path+manifest+attribute%22
http://www.bing.com/search?q=site%3Afedoraproject.org+%22Why+no+Class-Path+manifest+attribute%22
http://us.ask.com/web?q=site%3Afedoraproject.org+%22Why+no+Class-Path+manifest+attribute%22

Any idea why?

I'm not concerned that specific Fedora search results are buried deep
down the general web search.  Many, many mailing lists postings are not
part of the index *at all*.  I find this extremely annoying.  I looked
at robots.txt and the HTML code in the mailing list archive, but could
not spot any obvious offenders.

-- 
Florian Weimer / Red Hat Product Security

Re: Why is fedoraproject.org only indexed by Google?

2015-01-26 Thread Brandon Vincent
On Mon, Jan 26, 2015 at 7:21 AM, Florian Weimer fwei...@redhat.com wrote:
 Any idea why?

https://lists.fedoraproject.org/robots.txt

User-agent: *
Crawl-delay: 10

From Bing: "This means the higher your crawl delay is, the fewer pages
BingBot will crawl. As crawling fewer pages may result in getting less
content indexed, we usually do not recommend it, although we also
understand that different web sites may have different bandwidth
constraints." [1]

[1] 
http://blogs.bing.com/webmaster/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbots-question/

Brandon Vincent
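
To see what a crawler would read from the rule quoted above, here is a minimal sketch using Python's standard urllib.robotparser. The two-line robots.txt body is copied from this message. The user-agent strings and the helper name crawl_delay_for are assumptions made for the example, and real crawlers may interpret Crawl-delay differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# robots.txt body as quoted in this thread.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt, user_agent):
    """Return the Crawl-delay in seconds that applies to user_agent.

    Returns None when no applicable Crawl-delay is present.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    # Hypothetical user-agent names, chosen only for illustration.
    for agent in ("bingbot", "Googlebot", "DuckDuckBot"):
        print(agent, crawl_delay_for(ROBOTS_TXT, agent))
```

Because the rule sits under "User-agent: *", this parser reports the same 10-second delay for every agent name. Whether a given search engine honors the directive is a separate question that the parser cannot answer.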

Re: Why is fedoraproject.org only indexed by Google?

2015-01-26 Thread Kevin Fenzi
On Mon, 26 Jan 2015 07:53:27 -0700
Brandon Vincent brandon.vinc...@asu.edu wrote:

 On Mon, Jan 26, 2015 at 7:21 AM, Florian Weimer fwei...@redhat.com
 wrote:
  Any idea why?
 
 https://lists.fedoraproject.org/robots.txt
 
 User-agent: *
 Crawl-delay: 10
 
 From Bing: "This means the higher your crawl delay is, the fewer pages
 BingBot will crawl. As crawling fewer pages may result in getting less
 content indexed, we usually do not recommend it, although we also
 understand that different web sites may have different bandwidth
 constraints." [1]
 
 [1]
 http://blogs.bing.com/webmaster/2012/05/03/to-crawl-or-not-to-crawl-that-is-bingbots-question/

Not sure that explains why there are no results at all though. 

I think we added the Crawl-delay several years ago when we were having
storage issues. We could definitely try removing it and see if things
improve. 

kevin


