[gnu.org #701121] Duplicate content?
Josh wrote:

Hi,

I search the Emacs manual using Google's site: operator. For example:

http://www.google.com/search?q=site:www.gnu.org/software/emacs+EXAMPLE

I've been noticing that pages in the manual don't always show up in the search results when I use this method. For example, when I search the manual for refill, the Refill page doesn't show up:

http://www.google.com/search?q=site:www.gnu.org/software/emacs+refill

I think this may be due to the fact that the Emacs manual is shown via multiple distinct URLs, which Google perceives as duplicate content:

http://www.gnu.org/software/emacs/manual/html_node/emacs/
http://www.gnu.org/software/libtool/manual/emacs/

Here's an article that addresses the problem of duplicate content:

http://www.google.com/support/webmasters/bin/answer.py?answer=66359

Thanks for maintaining gnu.org. It's one of the most useful sites on the web.

Take care,
Josh

Hello libtool maintainers,

just forwarding the above feedback to you for your consideration. Perhaps one option might be linking to the GNU Emacs manual instead of maintaining an independent copy?

___
https://lists.gnu.org/mailman/listinfo/libtool
Re: [gnu.org #701121] Duplicate content?
On 10/24/2011 11:05 AM, Jason Self via RT wrote:

Josh wrote:

Hi,

I search the Emacs manual using Google's site: operator. For example:

http://www.google.com/search?q=site:www.gnu.org/software/emacs+EXAMPLE

I've been noticing that pages in the manual don't always show up in the search results when I use this method. For example, when I search the manual for refill, the Refill page doesn't show up:

http://www.google.com/search?q=site:www.gnu.org/software/emacs+refill

I think this may be due to the fact that the Emacs manual is shown via multiple distinct URLs, which Google perceives as duplicate content:

http://www.gnu.org/software/emacs/manual/html_node/emacs/
http://www.gnu.org/software/libtool/manual/emacs/

Here's an article that addresses the problem of duplicate content:

http://www.google.com/support/webmasters/bin/answer.py?answer=66359

Thanks for maintaining gnu.org. It's one of the most useful sites on the web.

Take care,
Josh

Hello libtool maintainers,

just forwarding the above feedback to you for your consideration. Perhaps one option might be linking to the GNU Emacs manual instead of maintaining an independent copy?

Do we control this? Using curl I see it redirect with a 301:

* Connected to www.gnu.org (140.186.70.148) port 80 (#0)
GET /software/libtool/manual/emacs HTTP/1.1
User-Agent: curl/7.21.0 (x86_64-redhat-linux-gnu) libcurl/7.21.0 NSS/3.12.10.0 zlib/1.2.5 libidn/1.18 libssh2/1.2.4
Host: www.gnu.org
Accept: */*

HTTP/1.1 301 Moved Permanently
Date: Mon, 24 Oct 2011 18:51:19 GMT
Server: Apache/2.2.14
Location: http://www.gnu.org/software/emacs/manual/html_node/emacs/
Cache-Control: max-age=0
Expires: Mon, 24 Oct 2011 18:51:19 GMT
Vary: Accept-Encoding
Content-Length: 333
Content-Type: text/html; charset=iso-8859-1

I don't see 'emacs' in libtool's web cvs:

http://web.cvs.savannah.gnu.org/viewvc/libtool/manual/?root=libtool

Peter
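For anyone who wants to repeat the curl check above from a script, here is a minimal sketch in Python. It assumes only the standard library's http.client, which does not follow redirects on its own, so the raw status and Location header stay visible. The function names fetch_redirect and is_permanent_redirect are made up for this illustration and are not part of any gnu.org tooling.

```python
import http.client


def fetch_redirect(host, path, port=80, timeout=10):
    """Send one GET request without following redirects.

    Returns a (status, location) tuple, where location is the value of
    the Location header or None if the server did not send one.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect(status, location):
    """True if the response is a permanent redirect (301 or 308) with a target."""
    return status in (301, 308) and bool(location)


# Example use (needs network access, so it is left as a comment):
#   status, location = fetch_redirect("www.gnu.org", "/software/libtool/manual/emacs")
#   print(status, location, is_permanent_redirect(status, location))
```

A 301 response like the one curl reported means the libtool path is served as a permanent redirect to the Emacs manual rather than as a second copy of its pages.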
Re: [gnu.org #701121] Duplicate content?
On 10/24/2011 01:54 PM, Peter O'Gorman wrote:

On 10/24/2011 11:05 AM, Jason Self via RT wrote:

Hello libtool maintainers,

just forwarding the above feedback to you for your consideration. Perhaps one option might be linking to the GNU Emacs manual instead of maintaining an independent copy?

Do we control this? Using curl I see it redirect with a 301:

Never mind, .symlinks controls this. These are there for a reason:

http://lists.gnu.org/archive/html/libtool/2011-04/msg4.html

Peter