Hello all!
As you may be aware, sitemap generation for docs.openstack.org is currently
done via a manually triggered scrapy process, which scrapes the entirety of
docs.openstack.org and is therefore slow. To improve the efficiency of this
process, I would like to propose the following updates to the sitemap
generation toolkit:
* tracking (in logs) of 301s, 302s, and 404s,
* automatic pulling of the list of supported releases,
* cron-managed automatic updates,
* setup of Google Webmaster tools (https://www.google.com/webmasters/), and
* a few style cleanups.
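To illustrate the first item, the spider's response callback could record
redirects and missing pages before they are dropped. This is only a sketch:
the names (`track_status`, `STATUS_OF_INTEREST`) are hypothetical and not part
of the current toolkit.

```python
from collections import Counter
import logging

logger = logging.getLogger("sitemap")

# Hypothetical: the status codes we want surfaced in the logs.
STATUS_OF_INTEREST = {301, 302, 404}

# Running tally of interesting responses seen during a crawl.
status_counts = Counter()

def track_status(url, status):
    """Log 301/302/404 responses so stale links show up in the crawl logs.

    Returns True if the status was one of the tracked codes.
    """
    if status in STATUS_OF_INTEREST:
        status_counts[status] += 1
        logger.warning("%d response for %s", status, url)
        return True
    return False
```

Note that a scrapy spider would also need these codes listed in its
`handle_httpstatus_list` (and redirects not auto-followed), since scrapy
normally handles or filters such responses before the callback sees them.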
Beyond this, implementing more targeted crawling would massively improve both
the processing speed and the scope of the results. This is, however, a
somewhat complicated matter, as it requires us to decide what, exactly,
defines scope relevance in order to limit the crawl domain.
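One minimal way to limit the crawl domain, assuming "relevance" is defined as
paths under currently supported releases, might be a URL filter along these
lines. The release names and the helper name (`in_scope`) are illustrative
only; the real list would come from the automatic release pull proposed above.

```python
from urllib.parse import urlparse

# Illustrative only: in practice this set would be populated from
# the automatically pulled list of supported releases.
SUPPORTED_RELEASES = {"liberty", "mitaka"}

def in_scope(url):
    """Keep a URL only if it is on docs.openstack.org and either a
    release-independent top-level page or under a supported release."""
    parts = urlparse(url)
    if parts.netloc != "docs.openstack.org":
        return False
    segments = [s for s in parts.path.split("/") if s]
    # Top-level pages stay in scope; versioned paths must start with
    # a supported release name.
    return not segments or segments[0] in SUPPORTED_RELEASES
```

A filter like this could be plugged into the spider's link extraction so that
out-of-scope pages are never requested in the first place.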
These are, of course, only our preliminary findings, and we would love to hear
some feedback about alternative approaches and potentially tricky aspects of
the suggested changes. What do you think? Let us know!
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev