On Wed, 2008-02-13 at 16:21 +0100, Stefano Zacchiroli wrote:
> On Tue, Feb 12, 2008 at 09:04:14PM +0000, Adam D. Barratt wrote:
> > following some of my recent commits, I've decided to take the bull by
> > its proverbial horns and look at converting some of our current HTML
> > scraping to use the BTS SOAP interface.
>
> That's great news!
:-)

[...]
> > bts
> > ---
> >
> > The perennial trigger for discussions about replacing HTML scraping
> > with SOAP. Sadly the fact that bts (rather usefully :) supports
> > offline working and local caches of bug content means we're largely
> > stuck with parsing the generated HTML.
>
> Uhm, does it? I might start asking dumb questions since I've never used
> the offline part of bts, but what features are actually provided in
> offline mode? According to the manpage:
>
> * show/bugs clearly should work offline, but in that case we are anyhow
>   showing either an HTML page or a mailbox, so it isn't really related
>   to SOAP or scraping HTML, since in one of the two cases the HTML is
>   actually the final target of our action

If you're using bts cache, or show/bugs with cache mode set to full, you
don't just get an HTML page, but a set of HTML pages, attachments,
mboxes and version graph images.

In theory, one should be able to navigate between and within each of the
pages as if one were online, assuming the relevant files are in the
cache - the version images are displayed, and links to individual
messages, source and binary package bug pages, maintainer bug pages,
attachments, etc. are mangled to refer to the local files (together with
a link to the online version). It doesn't always work, but that's the
theory.

The regexes in mangle_cache_file() and href_to_filename() have made my
head hurt at least once^Wtwice - particularly the ones I wrote ;).

Adam

--
To unsubscribe, send mail to [EMAIL PROTECTED]
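[To give a flavour of the href-to-filename mangling discussed above, here is a rough sketch in Python. It is purely illustrative: the function name, URL patterns and filename scheme below are assumptions for the example, not the actual regexes from devscripts' mangle_cache_file()/href_to_filename(), which live in Perl and handle many more cases.]

```python
import re

def href_to_local(href):
    """Map a BTS href to a local cache filename, or None if we don't
    know how to cache it. Patterns and naming are illustrative only."""
    # A whole bug report page: bugreport.cgi?bug=NNNNNN -> NNNNNN.html
    m = re.match(r'(?:https?://bugs\.debian\.org/)?'
                 r'cgi-bin/bugreport\.cgi\?bug=(\d+)$', href)
    if m:
        return m.group(1) + '.html'
    # An individual message within a bug -> NNNNNN-MSG.html
    m = re.match(r'(?:https?://bugs\.debian\.org/)?'
                 r'cgi-bin/bugreport\.cgi\?bug=(\d+)&msg=(\d+)$', href)
    if m:
        return '%s-%s.html' % (m.group(1), m.group(2))
    # A package bug listing: pkgreport.cgi?pkg=NAME -> NAME.html
    m = re.match(r'(?:https?://bugs\.debian\.org/)?'
                 r'cgi-bin/pkgreport\.cgi\?pkg=([\w.+-]+)$', href)
    if m:
        return m.group(1) + '.html'
    return None  # leave unknown links pointing at the online BTS

print(href_to_local('cgi-bin/bugreport.cgi?bug=123456'))  # 123456.html
```

The real cache mangler also has to rewrite each matched href in the saved HTML to point at the local file while keeping a link to the online version, which is where the headache-inducing regexes come in.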
