As for logging 301s, 302s, and 404s and the crawl scope, I don't think we are interested in checking EOL content for those.
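
To make the scope question concrete, here is a minimal sketch of logging only 301s, 302s, and 404s while skipping EOL content. It is not taken from the sitemap toolkit: the function names, the `EOL_RELEASES` set, and the assumption that the release name is the first path segment of the URL are all hypothetical and only for illustration.

```python
import logging
from urllib.parse import urlparse

# Assumption: illustrative release names only; the real list would come
# from wherever the supported/EOL releases are pulled.
EOL_RELEASES = {"kilo", "liberty"}

# The status codes the proposal wants to keep track of in logs.
TRACKED_STATUSES = {301, 302, 404}

log = logging.getLogger("sitemap")


def is_eol(url, eol_releases=EOL_RELEASES):
    """Return True if the first path segment names an EOL release.

    Assumption: URLs look like https://docs.openstack.org/<release>/...
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    return bool(segments) and segments[0] in eol_releases


def record_status(url, status, eol_releases=EOL_RELEASES):
    """Log redirects and missing pages, skipping EOL content.

    Returns True if the response was logged, False otherwise.
    """
    if status not in TRACKED_STATUSES or is_eol(url, eol_releases):
        return False
    log.warning("%d %s", status, url)
    return True
```

With this shape, a 404 under an EOL release is ignored, which matches the point that broken links there won't be fixed anyway.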
As we are about to approve https://review.openstack.org/#/c/507629/, we also want everybody to understand that broken links found in EOL content won't be fixed, since no updates to EOL content will be provided.

Cheers,
pk

On Thu, 5 Oct 2017 22:51:31 -0400 (EDT) "[email protected]" <[email protected]> wrote:

> Hello all!
>
> As you may be aware, sitemap generation for docs.openstack.org is currently
> done via a manually triggered scrapy process. It currently also scrapes the
> entirety of docs.openstack.org, making processing slow. In order to improve
> the efficiency of this process, I would like to propose the following updates
> to the sitemap generation toolkit:
> * keep track (in logs) of 301s, 302s, and 404s,
> * automatic pull of supported releases,
> * cron-managed automatic updates,
> * setup of Google Webmaster tools (https://www.google.com/webmasters/), and
> * a few style cleanups
>
> Beyond this, implementing more targeted crawling would improve the processing
> speed and scope massively. This is, however, a bit of a complicated matter,
> as it requires us to decide what, exactly, defines scope relevance, in order
> to limit the crawl domain.
>
> These are, of course, only our precursory findings, and we would love to hear
> some feedback about alternate methods and possible tricky aspects of the
> suggested changes. What do you think? Let us know!
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: [email protected]?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
