Re: [backstage] Screen Scraping Advice ...

2006-07-26 Thread Scot McSweeney-Roberts




Murray, Simon (IED) wrote:

  
  
  
  
  
  Some time ago I wrote a simple screen-scrape script in classic ASP using the
Internet Transfer Control (InetCtls.Inet), which had its limitations. I'm interested
in using .Net and the HttpWebRequest class, but would welcome any
guidance on the subject, particularly when accessing data spanning
across multiple pages.
  
  


Again, a non-.Net answer: I've successfully used Python with
Beautiful Soup - http://www.crummy.com/software/BeautifulSoup/

That said, there is a version of Python for .Net called IronPython,
so maybe you can get Beautiful Soup to work with that:

http://www.codeplex.com/Wiki/View.aspx?ProjectName=IronPython

cheers

Scot






Re: [backstage] backstage.bbc.co.uk wins innovation new media award

2006-07-26 Thread Gordon Joly

At 13:51 +0100 25/7/06, dotBen (aka Ben Metcalfe) wrote:

Hello all

I just thought you would like to know that backstage.bbc.co.uk won the
innovation award at last night's New Statesman New Media Awards
(http://www.newstatesman.co.uk/nma/nma2006/nma2006home.php).

I've written a blog post about it:
http://benmetcalfe.com/blog/index.php/2006/07/25/backstagebbccouk-wins-innovation-award/

Although I obviously don't work for the BBC anymore, if I can dedicate
any part of the award it would be to you guys -- the community -- to
make backstage what it is.

Ben






Oh er. Nice subtitle

B A M F 2.0


Gordo

--
Think Feynman/
http://pobox.com/~gordo/
[EMAIL PROTECTED]///
-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, please 
visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.  
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/


Re: [backstage] Screen Scraping Advice ...

2006-07-26 Thread Matthew Somerville

Murray, Simon (IED) wrote:
I'm interested in using .Net and the HttpWebRequest class, 
but would welcome any guidance on the subject particularly when 
accessing data spanning across multiple pages.


http://www.crummy.com/software/BeautifulSoup/ might be useful? I've heard 
good things about it.


Our parser of Hansard (for which we have a licence, I should point out) has 
to cope with things spanning pages. It used to just look for the Next 
Section link on each page and follow those links until they stopped, but 
they are occasionally missing, so it now stores all the links from an index 
page, starts following Next Section links, and hopefully works out what to 
do if one is missing.
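
For anyone wanting to try the same idea, here is a rough sketch in Python. It 
is not the actual Hansard parser, just one way the approach could look. It 
uses only the standard library's html.parser rather than Beautiful Soup, and 
the names LinkParser, page_links, crawl_sections and the fetch argument are 
made up for illustration. It assumes the "next" link can be recognised by its 
link text and that the index page lists the sections in reading order.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect (href, link text) pairs from the <a> tags on one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def page_links(html, base_url):
    """Return absolute (url, text) pairs for every link in the HTML."""
    parser = LinkParser()
    parser.feed(html)
    return [(urljoin(base_url, href), text) for href, text in parser.links]


def crawl_sections(index_url, fetch, next_label="Next Section"):
    """Return the section URLs in the order they were visited.

    fetch is any callable that maps a URL to the page's HTML text
    (hypothetical; supply your own HTTP client or a test stub).
    """
    # Store every link from the index page before following anything.
    expected = [url for url, _ in page_links(fetch(index_url), index_url)]
    visited = []
    current = expected[0] if expected else None
    while current is not None and current not in visited:
        visited.append(current)
        links = page_links(fetch(current), current)
        next_url = next((u for u, t in links if t == next_label), None)
        if next_url is None or next_url in visited:
            # Next link missing (or looping back): fall back to the
            # first index entry that has not been visited yet.
            next_url = next((u for u in expected if u not in visited), None)
        current = next_url
    return visited
```

For real pages, fetch could be something like 
lambda url: urllib.request.urlopen(url).read().decode("utf-8"), though a real 
scraper would also want error handling and a polite delay between requests.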

--
ATB,
Matthew
