Re: [backstage] Programatic searching of /programmes

2009-02-25 Thread Andy
In case anyone was wondering, I did go for parsing the results of
/programmes/a-z/by and everything appears to be working fine. I ran
into a slight problem where I forgot to convert HTML entities found in
the page source, but that's mostly fixed (it can't fully handle all
named entities but the Beeb seem to be using numeric ones anyway).
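
For anyone hitting the same entity problem, here is a minimal sketch of the numeric-only conversion described above. It is not the actual code from this project: the class name NumericEntityDecoder and the method decode are made up for illustration, and named entities such as &amp; are deliberately left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the code used in the project above.
final class NumericEntityDecoder {
    // Matches decimal (&#233;) and hexadecimal (&#xE9;) character references.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String decode(String html) {
        Matcher m = NUMERIC_ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one.
            out.append(html, last, m.start());
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // leave invalid references untouched
            }
            last = m.end();
        }
        out.append(html, last, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Caf&#233; &#x26; Bar"));
    }
}
```

Named entities pass through unchanged, which matches the "mostly fixed" caveat: a full fix would need a lookup table of entity names.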

$ java ui.ShowSearchResults 'doctor'
Search Term: doctor
--
b0072v72  Doctor in the House
b006q2x0  Doctor Who
b006q2xb  Doctor Who Confidential
b006mh9v  Doctors

$ java ui.ShowEpisodes 'b006q2x0'
Programme ID: b006q2x0
--
Title: Doctor Who - Series 2, Doomsday
Synopsis: As the human race is caught in an intergalactic war, the
Doctor faces a greater dilemma.
Episode URL: http://www.bbc.co.uk/iplayer/episode/b0074frg
Availability: 2 days left to watch

[SNIP]

All I have to do now is write the program logic and the GUI ;)

Thanks again for all the help.

Andy

-- 
$ fortune
bug, n:
A son of a glitch.
-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, please 
visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.  
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/


Re: [backstage] Programatic searching of /programmes

2009-02-18 Thread Martin Poppy Hatfield
On Tue, Feb 17, 2009 at 9:27 PM, Andy stude.l...@googlemail.com wrote:
 What I'm looking for is a way of sending a query such as Top Gear
 and getting back b006mj59 and preferably the name of the programme
 in case of partial matches.

I did something like this a while ago with Yahoo Pipes -
http://pipes.yahoo.com/mart/programmecode
Dunno if it's any help to you.

MCH


Re: [backstage] Programatic searching of /programmes

2009-02-18 Thread Soulla Stylianou
I know we have a chatbot, built for Backstage eons ago, that tells you
what's on the telly.

I don't think it's what you're looking for, but it's fun to ask the chatbot
what's on tonight etc :)

http://www.daden2.co.uk/chatbots/livebots/charlotte.html

Personally I prefer speaking to Halo

http://www.daden2.co.uk/chatbots/livebots/halo_ajax_sitepal.html

have fun

Soulla

2009/2/18 Martin  Poppy Hatfield mar...@moppy.co.uk

 On Tue, Feb 17, 2009 at 9:27 PM, Andy stude.l...@googlemail.com wrote:
  What I'm looking for is a way of sending a query such as Top Gear
  and getting back b006mj59 and preferably the name of the programme
  in case of partial matches.

 I did something like this a while ago with Yahoo Pipes -
 http://pipes.yahoo.com/mart/programmecode
 Dunno if it's any help to you.

 MCH






Re: [backstage] Programatic searching of /programmes

2009-02-18 Thread Iain Wallace
 The last time I needed to do something like this I tried Search first, but
 ended up using the A-Z on /programmes as the results were much more what I
 was after. The HTML on /programmes is also easy to parse. I don't call using
 an XML parser and XPath screen scraping :)

It's screen scraping if the output wasn't designed to be read by a
machine. Change the format and you've got a broken screen scraper. If
the output were XML, any changes to it would either be
non-destructive to the existing format or would explicitly use a
different version of the API on a different URL or with different
arguments (like the difference between RSS and Atom).

You could use a parser like Beautiful Soup to turn whatever rubbish
you're looking at into perfectly traversable XML but it doesn't change
the fact that the entire thing would break if the page author decided
to juggle the layout around a bit.

That's my rule of thumb about what constitutes screen scraping anyway.
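
To make the fragility concrete, here is a small sketch of a pattern tied to one particular layout. The markup is hypothetical (it is not the real /programmes HTML), and the class name BrittleScraper is invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: the markup below is assumed, not copied from bbc.co.uk.
final class BrittleScraper {
    // Tied to exactly this layout: <li><a href="/programmes/PID">TITLE</a></li>
    private static final Pattern ROW = Pattern.compile(
            "<li><a href=\"/programmes/([a-z0-9]+)\">([^<]+)</a></li>");

    // Counts how many programme rows the pattern still recognises.
    static int countMatches(String html) {
        Matcher m = ROW.matcher(html);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String before = "<li><a href=\"/programmes/b006mj59\">Top Gear</a></li>";
        // Same data, but the page author has added a class attribute.
        String after = "<li class=\"prog\"><a href=\"/programmes/b006mj59\">Top Gear</a></li>";
        System.out.println(countMatches(before));
        System.out.println(countMatches(after));
    }
}
```

The data is identical in both strings; only the presentation changed, and the scraper silently finds nothing.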

Iain


Re: [backstage] Programatic searching of /programmes

2009-02-17 Thread Jonathan Tweed

Hi Andy

On 17 Feb 2009, at 21:27, Andy wrote:


 What I'm looking for is a way of sending a query such as Top Gear
 and getting back b006mj59 and preferably the name of the programme
 in case of partial matches.

 Of course it's possible to spider data from
 http://www.bbc.co.uk/programmes/a-z/by/[LETTER]/all but that would
 require screen scraping and 27 queries (to check for matches that
 aren't at the beginning of the title). But something more efficient
 would be good.


I wouldn't write off the A-Z so quickly, it's actually pretty clever  
and does find partial matches, e.g.:


http://www.bbc.co.uk/programmes/a-z/by/top%20gear/all

returns Best of Top Gear, Top Gear and Top Gear Take Two.
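
A rough sketch of building that URL from a search term in Java follows. The class name AzSearchUrl and method forTerm are hypothetical, and the URL pattern is simply the one quoted above rather than a documented API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; the URL shape is taken from the example above.
final class AzSearchUrl {
    static String forTerm(String term) {
        try {
            // URLEncoder produces form encoding, so spaces come back as '+'.
            // Swap them for %20 to match the path-style URL shown above.
            // A literal '+' in the term is already encoded as %2B by this point.
            String encoded = URLEncoder.encode(term, "UTF-8").replace("+", "%20");
            return "http://www.bbc.co.uk/programmes/a-z/by/" + encoded + "/all";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(forTerm("top gear"));
    }
}
```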

The last time I needed to do something like this I tried Search first,  
but ended up using the A-Z on /programmes as the results were much  
more what I was after. The HTML on /programmes is also easy to parse.  
I don't call using an XML parser and XPath screen scraping :)


Cheers
Jonathan


Re: [backstage] Programatic searching of /programmes

2009-02-17 Thread Andy
2009/2/17 Jonathan Tweed jonat...@tweed.name:
 I wouldn't write off the A-Z so quickly, it's actually pretty clever and
 does find partial matches, e.g.:

 http://www.bbc.co.uk/programmes/a-z/by/top%20gear/all

 returns Best of Top Gear, Top Gear and Top Gear Take Two.

Wow, I didn't realise A-Z could do that, I assumed it just listed
programmes by the first letter. Clearly it's much more than that.

 The HTML on /programmes is also easy to parse. I don't call using
 an XML parser and XPath screen scraping :)

Not really sure what XPath is but the HTML does look quite simple.
Should be able to extract what I want with a Regex or two.
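
For comparison with the regex route, XPath is a query language for selecting nodes in an XML document, and Java ships an implementation in javax.xml.xpath. The sketch below assumes a well-formed, made-up fragment (the real /programmes markup is not reproduced in this thread), and the class name XPathSketch is invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustration only: SAMPLE is assumed markup, not the real page source.
final class XPathSketch {
    static final String SAMPLE =
            "<ul>"
          + "<li><a href=\"/programmes/b006mj59\">Top Gear</a></li>"
          + "<li><a href=\"/programmes/b006q2x0\">Doctor Who</a></li>"
          + "</ul>";

    // Returns one "PID<tab>TITLE" string per link found in the XML.
    static String[] programmes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "//li/a" selects every <a> that is a direct child of an <li>.
            NodeList links = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//li/a", doc, XPathConstants.NODESET);
            String[] out = new String[links.getLength()];
            for (int i = 0; i < links.getLength(); i++) {
                Element a = (Element) links.item(i);
                String href = a.getAttribute("href");
                String pid = href.substring(href.lastIndexOf('/') + 1);
                out[i] = pid + "\t" + a.getTextContent();
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input as XML", e);
        }
    }

    public static void main(String[] args) {
        for (String line : programmes(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

This only works if the page parses as XML; anything less well-formed would need a more forgiving HTML parser first.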

Thanks for your help

Andy

-- 
Computers are like air conditioners.  Both stop working, if you open windows.
-- Adam Heath