Hi All, I'm investigating an issue at the moment where GoogleBot appears to be asking for files on one domain that exist on another and could do with some input from other brains.
The issue is that, for example, a file exists at www.domain-one.co.nz/files/123/executive-summary.pdf (and has for some time), and GoogleBot is now also trying to index that file at www.domain-two.co.nz/files/123/executive-summary.pdf, where it does not exist.

In the last day I've seen 20 different occurrences of this (different domain names, same file name) for a total of 183 requests. The requests have been consistent - when a non-existent file is requested, it's always the same file on the same domains. I can't spot a pattern in the domains and files being requested. According to our Apache access logs, these started yesterday at 1:36pm NZ time and don't appear to have been happening before that.

Some relevant points:
1. The sites in question are all running on a custom in-house CMS and on the same server.
2. The files being requested are stored in separate directories and in separate MySQL databases.
3. I cannot find any reference to these files on the "bad" domain (grepped the database and file contents).

Possible explanations:
1. Google has changed something (e.g. looking harder for duplicate content) and is now asking for files in a way it hadn't previously.
2. We've screwed something up and are unwittingly telling Google that those files exist, e.g. via some kind of site map.
3. Somewhere, someone has made an incorrect index of the sites and GoogleBot is treating those links as authoritative.

What I think I want: when we see a request come in for a non-existent file, we want to know the *reason* why GoogleBot thinks it's OK to ask for that file. I've looked in Google's Webmaster Tools (we don't have it installed yet on an affected domain) and can't find this anywhere. Hopefully this will get us closer to an answer. I'm hoping it's just a facepalm issue, but the multiple domain-ness of it is just weird.

Any other suggestions? I'll post the actual URLs if required but would prefer not to disclose them at this point.
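For anyone wanting to check their own logs for the same pattern, something along these lines could pull out the 404s served to GoogleBot and group them by domain and file. This is only a sketch: it assumes the access log uses Apache's "vhost_combined" layout (virtual host first, then the usual combined fields), which may not match our actual LogFormat, and the function name `googlebot_404s` is made up for illustration. It also trusts the User-Agent string, which anyone can fake.

```python
import re
from collections import Counter

# ASSUMPTION: the log uses Apache's "vhost_combined" layout:
#   %v:%p %h %l %u %t "%r" %>s %O "%{Referer}i" "%{User-Agent}i"
# Adjust the pattern if the server's LogFormat differs.
LOG_LINE = re.compile(
    r'^(?P<vhost>\S+?)(?::\d+)? (?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 responses whose User-Agent mentions Googlebot.

    Returns a Counter keyed by (vhost, path, referer). Lines that do not
    match the assumed log layout are skipped silently.
    """
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        if m.group("status") != "404":
            continue
        if "Googlebot" not in m.group("agent"):
            continue
        hits[(m.group("vhost"), m.group("path"), m.group("referer"))] += 1
    return hits
```

Passing it an open log file (any iterable of lines works) and printing `hits.most_common()` gives one row per domain/file pair with its request count. The referer is kept in the key in case it ever carries a hint about where the link came from.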
Cheers,
- Bob -

--
Bob Brown, [L|W]AMP Web Developer
[email protected], http://www.guru.net.nz
--
NZ PHP Users Group: http://groups.google.com/group/nzphpug
