I would recommend putting a 301 permanent redirect on any content that has
been accessed at an incorrect URL.

You can't always tell why this happens, but it is likely that someone linked
to it incorrectly.

There are also other things you could do if it fits your needs:

   - If the URL is incorrect and irrelevant to the domain, issue a 404, as
   the resource does not exist in that context.
   - If the URL is old but the content has moved, issue a 3xx redirect
   (301 for a permanent move) and/or implement canonical URLs.
   - If you truly think this is an issue on Google's part, you could try
   reporting it to them, but I don't think you would get very far, so I
   would recommend one of the above.
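
A minimal Python sketch of that decision, assuming the CMS can tell which
(host, path) pairs it really serves and which old paths have moved. The
FILES and MOVED tables, the old path /files/old/summary.pdf and the http://
scheme are hypothetical stand-ins, not part of any existing API.

```python
from typing import Dict, Tuple

# Hypothetical: files each site really serves, mapped to their canonical URL.
FILES: Dict[Tuple[str, str], str] = {
    ("www.domain-one.co.nz", "/files/123/executive-summary.pdf"):
        "http://www.domain-one.co.nz/files/123/executive-summary.pdf",
}

# Hypothetical: old paths on a site whose content now lives at a new path.
MOVED: Dict[Tuple[str, str], str] = {
    ("www.domain-one.co.nz", "/files/old/summary.pdf"):
        "/files/123/executive-summary.pdf",
}


def resolve(host: str, path: str) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, extra response headers) for a request."""
    key = (host, path)
    canonical = FILES.get(key)
    if canonical is not None:
        # Served normally; non-HTML files such as PDFs can advertise their
        # canonical URL through an HTTP Link header.
        return 200, {"Link": f'<{canonical}>; rel="canonical"'}
    new_path = MOVED.get(key)
    if new_path is not None:
        # Old URL whose content has moved: permanent redirect.
        return 301, {"Location": f"http://{host}{new_path}"}
    # Not part of this site at all, e.g. a domain-one file requested on
    # domain-two: plain 404.
    return 404, {}
```

With these tables, the domain-two request from the original post gets a
404, while the hypothetical old domain-one path gets a 301 to the current
file.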


David Neilsen | 07 834 3366 | PANmedia ®


On Fri, May 11, 2012 at 12:56 PM, Bob Brown <[email protected]> wrote:

> Hi All,
>
> I'm investigating an issue at the moment where GoogleBot appears to be
> asking for files on one domain that exist on another and could do with
> some input from other brains.
>
> The issue is that for example, a file exists at
> www.domain-one.co.nz/files/123/executive-summary.pdf (and has for some
> time) and GoogleBot is now also trying to index that file at
> www.domain-two.co.nz/files/123/executive-summary.pdf where it does not
> exist.
>
> In the last day I've seen 20 different occurrences of this (different
> domain names, same file name) for a total of 183 requests. These
> requests have been consistent - when a non-existent file is requested
> it's always the same file and same domains. I can't spot a pattern in
> the domains and files being requested. According to our Apache access
> logs these started happening yesterday at 1:36pm NZ time and didn't
> appear to have been happening before that.
>
> Some relevant points:
>
> 1. The sites in question are all running on a custom in-house CMS and
> on the same server.
> 2. The files being requested are stored in separate directories and in
> separate MySQL databases.
> 3. I cannot find any reference to these files on the "bad" domain
> (grepped the database and file contents)
>
> Possible explanations:
>
> 1. Google has changed something (e.g. looking harder for duplicate
> content) and are now asking for files in a way they hadn't previously.
> 2. We've screwed something up and are unwittingly telling Google that
> those files exist with some kind of site map.
> 3. Somewhere someone has made an incorrect index of the sites and
> GoogleBot is treating those links as authoritative.
>
> What I think I want:
>
> When we see a request coming for a non-existent file we want to know
> the *reason* why GoogleBot thinks it's OK to ask for that file. I've
> looked in Google's Webmaster Tools (we don't have it installed yet on
> an affected domain) and can't find this anywhere. Hopefully this will
> get us closer to an answer and I'm hoping that it's just a facepalm
> issue but the multiple domain-ness of it is just weird.
>
> Any other suggestions? I'll post the actual URLs if required but
> would prefer not to disclose them at this point.
>
> Cheers,
>
> - Bob -
>
> --
> Bob Brown, [L|W]AMP Web Developer
> [email protected], http://www.guru.net.nz
>
> --
> NZ PHP Users Group: http://groups.google.com/group/nzphpug
> To post, send email to [email protected]
> To unsubscribe, send email to
> [email protected]
>
