Hello all!

As you may be aware, sitemap generation for docs.openstack.org is currently 
done via a manually triggered scrapy process, which scrapes the entirety of 
docs.openstack.org and is therefore slow. To improve the efficiency of this 
process, I would like to propose the following updates to the sitemap 
generation toolkit:
    * tracking (in logs) of 301s, 302s, and 404s,
    * automatic pulling of the list of supported releases,
    * cron-managed automatic updates,
    * setup of Google Webmaster Tools (https://www.google.com/webmasters/), and
    * a few style cleanups.
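The status-tracking item could be as simple as recording any redirect or
not-found response the crawler sees. A minimal sketch (the function name and
log format are illustrative, not the actual openstack-doc-tools code):

```python
import logging

# Statuses the sitemap job would record for later review. 301/302 suggest
# pages that should be updated in the sitemap; 404s are candidates for removal.
TRACKED_STATUSES = {301, 302, 404}

def track_status(url, status, tracked=None):
    """Log and return a record for redirect/not-found responses, else None."""
    if tracked is None:
        tracked = TRACKED_STATUSES
    if status in tracked:
        record = "%d %s" % (status, url)
        logging.getLogger("sitemap").info(record)
        return record
    return None
```

In a scrapy spider this check would run per response (with the relevant codes
added to `handle_httpstatus_list` so scrapy passes them through).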
    
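For the cron-managed updates, the job could be a single crontab entry; the
script path and flags below are hypothetical placeholders:

```shell
# Illustrative crontab line: regenerate the docs.openstack.org sitemap every
# Sunday at 02:00 and append output to a log for review.
0 2 * * 0 /usr/local/bin/generate-sitemap --domain docs.openstack.org >> /var/log/sitemap.log 2>&1
```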
Beyond this, implementing more targeted crawling would massively improve both 
processing speed and scope. This is, however, a somewhat complicated matter, 
as it requires us to decide what, exactly, defines scope relevance in order 
to limit the crawl domain.
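One possible definition of scope, sketched below under the assumption that
relevance means "pages under a currently supported release"; the release
names and helper are illustrative only:

```python
from urllib.parse import urlparse

# Hypothetical scope rule: only crawl URLs whose leading path segment names
# a supported release (this list would be pulled automatically, per the
# proposal above, rather than hard-coded).
SUPPORTED_RELEASES = ["queens", "rocky", "latest"]

def in_scope(url, releases=SUPPORTED_RELEASES):
    """True if the URL's first path component is a supported release."""
    path = urlparse(url).path.lstrip("/")
    first = path.split("/", 1)[0]
    return first in releases
```

A predicate like this could be wired into the spider's link-filtering rules so
out-of-scope pages are never fetched at all.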

These are, of course, only our preliminary findings, and we would love to 
hear feedback about alternate methods and possible tricky aspects of the 
suggested changes. What do you think? Let us know!


__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev