[Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Michael White
Hi, Our DSpace (v1.4.1) has recently started logging a lot of Internal Server Errors that appear to be caused by a Googlebot. They happen like clockwork every 14 minutes and come in blocks (sometimes lasting several hours). They are all associated with the IP Address

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Dorothea Salo
You found my favorite oldie bug! I'm guessing that item 1893/214 has been withdrawn or deleted. 1.4.1 throws a fit when a crawler tries to browse a page that should begin with a withdrawn or deleted item. I've forgotten the fix (other than upgrade to 1.4.2, in which the bug was squashed), but it

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Tom De Mulder
On Thu, 11 Feb 2010, Michael White wrote:
:session_id=9E40BFD899A2AA5C23E81404AF5B97A5:internal_error:-- URL Was: https://dspace.stir.ac.uk/dspace/browse-title?bottom=1893/214
[snip]
User-agent: *
Disallow: /browse-author
Disallow: /items-by-author

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Graham Triggs
On 11 February 2010 14:37, Tom De Mulder td...@cam.ac.uk wrote: You should add /dspace to the start of those disallowed patterns, because your DSpace URLs start with /dspace after the hostname. You should also ensure that the robots.txt is available at the root of the server... i.e.
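The point about the /dspace prefix can be checked offline before touching the live server. The following is a minimal sketch using Python's standard urllib.robotparser; the helper name is_allowed and the Disallow: /browse-title rule are assumptions chosen for illustration (the rules quoted in the thread cover /browse-author and /items-by-author), while the URL is the one from the error log above. It only models prefix matching as the standard-library parser implements it, which may differ in detail from how Googlebot itself interprets the file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` under the given robots.txt lines.

    Hypothetical helper for illustration; it parses the rules in memory
    instead of fetching robots.txt from a server.
    """
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


# URL taken from the error log quoted in the thread.
URL = "https://dspace.stir.ac.uk/dspace/browse-title?bottom=1893/214"

# Assumed rule without the /dspace prefix: it does not match the URL path.
without_prefix = ["User-agent: *", "Disallow: /browse-title"]

# Same assumed rule with the /dspace prefix: it matches the start of the path.
with_prefix = ["User-agent: *", "Disallow: /dspace/browse-title"]

print(is_allowed(without_prefix, URL))  # True: the crawler is still allowed in
print(is_allowed(with_prefix, URL))     # False: the crawler is now blocked
```

Rules are matched against the URL path from its first character, so a pattern that omits /dspace never matches a path that begins with /dspace/.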

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server

2010-02-11 Thread Michael White
Thanks Dorothea, "You found my favorite oldie bug! I'm guessing that item 1893/214 has been withdrawn or deleted." I must admit, I didn't think to check, but having checked it now, I see that it is actually a Collection homepage (as are the others that I checked from a random sample) - not

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Michael White
On 11 February 2010 14:37, Tom De Mulder td...@cam.ac.uk wrote: You should add /dspace to the start of those disallowed patterns, because your DSpace URLs start with /dspace after the hostname. You should also ensure

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Joseph Greene
Project Manager 325 James Joyce Library University College Dublin Belfield, Dublin 4 353 (0)1 716 7398 joseph.gre...@ucd.ie http://irserver.ucd.ie/dspace/ Message: 1 Date: Thu, 11 Feb 2010 12:30:04 + From: Michael White michael.wh...@stir.ac.uk Subject: [Dspace-tech] Bad robot! Googlebot

Re: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors

2010-02-11 Thread Michael White
2010 12:30:04 + From: Michael White michael.wh...@stir.ac.uk Subject: [Dspace-tech] Bad robot! Googlebot and Internal Server Errors To: dspace-tech@lists.sourceforge.net dspace-tech@lists.sourceforge.net Message-ID: 7c43cb6f3460394f9b5236c0f68d7b6a5d6baa4...@exch2007.ad.stir.ac.uk