Re: [backstage] Screen Scraping Advice ...
Murray, Simon (IED) wrote:

Some time ago I wrote a simple screen scrape script in classic ASP using the Internet Transfer Control (InetCtls.Inet), which had its limitations. I'm interested in using .Net and the HttpWebRequest class, but would welcome any guidance on the subject, particularly when accessing data spanning across multiple pages.

Again, not a .Net answer: I've successfully used Python with Beautiful Soup - http://www.crummy.com/software/BeautifulSoup/

That said, there is a version of Python for .Net called IronPython, so maybe you can get Beautiful Soup to work with that: http://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython

cheers
Scot
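To make the Python suggestion a bit more concrete, here is a minimal sketch of the link-extraction step that a multi-page scrape depends on. It deliberately uses only the standard library's html.parser rather than Beautiful Soup, so it is a stand-in for the idea and not Beautiful Soup's own API; Beautiful Soup copes far better with badly formed markup. The names LinkCollector, extract_links and find_next_link, and the "Next" label, are assumptions made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (absolute_url, link_text) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the anchor currently open, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return every link on the page, in document order."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def find_next_link(html, base_url, label="Next"):
    """Return the URL of the first link whose text starts with `label`.

    The label is an assumption; real sites word their paging links differently.
    """
    for url, text in extract_links(html, base_url):
        if text.lower().startswith(label.lower()):
            return url
    return None
```

Fetching each page (with HttpWebRequest in .Net, or urllib in Python) and calling find_next_link on the result until it returns None is enough to walk a simple paginated listing.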
Re: [backstage] backstage.bbc.co.uk wins innovation new media award
At 13:51 +0100 25/7/06, dotBen (aka Ben Metcalfe) wrote:

Hello all,

I just thought you would like to know that backstage.bbc.co.uk won the innovation award at last night's New Statesman New Media Awards (http://www.newstatesman.co.uk/nma/nma2006/nma2006home.php).

I've written a blog post about it: http://benmetcalfe.com/blog/index.php/2006/07/25/backstagebbccouk-wins-innovation-award/

Although I obviously don't work for the BBC anymore, if I can dedicate any part of the award it would be to you guys -- the community -- to make backstage what it is.

Ben

Oh er. Nice subtitle B A M F 2.0

Gordo
--
Think Feynman/ http://pobox.com/~gordo/ [EMAIL PROTECTED]///

- Sent via the backstage.bbc.co.uk discussion group. To unsubscribe, please visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html. Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/
Re: [backstage] Screen Scraping Advice ...
Murray, Simon (IED) wrote:

I'm interested in using .Net and the HttpWebRequest class, but would welcome any guidance on the subject, particularly when accessing data spanning across multiple pages.

http://www.crummy.com/software/BeautifulSoup/ might be useful? I've heard good things about it.

Our parser of Hansard (for which we have a licence, I should point out) has to cope with things spanning pages. It used to just look for the Next Section link on each page and follow those until they ran out, but these links are occasionally missing, so it now stores all the links from an index page, starts following Next Section links and hopefully works out what to do if one is missing.

--
ATB,
Matthew
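The approach described above (keep the index page's links, follow Next Section links, recover when one is missing) could be sketched roughly as follows. This is not the actual Hansard parser. The recovery rule used here, resuming at the first index link not yet visited, is an assumption, as are the names crawl_sections and fetch and the max_pages guard. fetch is a hypothetical callable that downloads a page and returns its content together with the URL of its Next Section link, or None if the page has none.

```python
def crawl_sections(index_links, fetch, max_pages=1000):
    """Visit pages by following Next Section links, using the index as a fallback.

    index_links: ordered list of page URLs taken from the index page.
    fetch:       hypothetical callable, url -> (content, next_url or None).
    max_pages:   safety limit so a misbehaving site cannot loop forever.
    Returns a list of (url, content) pairs in the order visited.
    """
    seen = set()
    pages = []
    current = index_links[0] if index_links else None

    while current is not None and len(pages) < max_pages:
        seen.add(current)
        content, next_url = fetch(current)
        pages.append((current, content))

        if next_url is None or next_url in seen:
            # The Next Section link is missing (or points somewhere already
            # visited): assume the index order is right and resume at the
            # first index link that has not been seen yet.
            next_url = next((u for u in index_links if u not in seen), None)

        current = next_url

    return pages
```

With an index of s1 to s4 where s2 has lost its Next Section link, the crawl visits s1 and s2 via the link, falls back to the index to find s3, then carries on to s4 and stops once nothing unvisited is left.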