RE: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-25 Thread Jeremy Stone
Hi Graeme

Get in touch with me off list and we can sort this out.

thanks

Jem Stone.


-Original Message-
From: [EMAIL PROTECTED] on behalf of Robin Doran
Sent: Tue 7/24/2007 9:50 PM
To: backstage@lists.bbc.co.uk; backstage@lists.bbc.co.uk
Subject: RE: Uploading the BBC programme catalogue to freebase (was RE: 
[backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any 
APIs yet?))
 
Hi Graeme,
 
The robots.txt file was accidentally dropped from the new release. We will be 
re-introducing it because of the concerns and complaints raised about personal 
data appearing in external search engines when the service was launched.
 
On the subject of scraping the data, I've asked the catalogue.bbc.co.uk team to 
clarify the terms of use on the data to see if that will help answer your 
question but if you have a specific request then I would recommend using the 
Contact Us page http://catalogue.bbc.co.uk/catalogue/infax/contact
Regards,
 



From: [EMAIL PROTECTED] on behalf of Graeme West
Sent: Tue 7/24/2007 20:39
To: backstage@lists.bbc.co.uk
Subject: Re: Uploading the BBC programme catalogue to freebase (was RE: 
[backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any 
APIs yet?))


Hi all, 
Sorry to re-open an old thread - just wondering what the position is on 
scraping the catalogue.bbc.co.uk test site? I say this because I'm trying a 
little experiment - ingesting the whole catalogue into our Fedora repository ( 
http://www.fedora.info ) to be cross-referenced with the 200+ hours of BBC 
audio and video which we legally hold in our legacy repository as per our 
deposit agreement with the BBC ( 
http://www.spokenword.ac.uk/using-audio-video/copyright/ ).

The reason I ask is that I've constructed a set of scripts which scrape the 
catalogue.bbc.co.uk archive's RDF files. I've already got a 'master' list of 
all programme URLs (the script to generate that took a pretty long time on a 
JANET connection), but having started the crawler grabbing the actual RDF 
streams for each programme, I can see that this is going to involve a pretty 
large amount of data transfer.

FYI, my crawler uses Wget and respects robots.txt files. There's no robots.txt 
file on catalogue.bbc.co.uk so it seems to be fair game, but there is one on 
open.bbc.co.uk - I'm scraping from the former obviously. Clearly there's a 
licensing issue with copying the content but I'm only trying this as a 
technical experiment at this stage anyway - it will not be publicly available.

--
Graeme West
Spoken Word Services
Glasgow Caledonian University

Email: [EMAIL PROTECTED]
Project web site: 
http://www.spokenword.ac.uk/


On 9 Jul 2007, at 21:30, Brendan Quinn wrote:


I was considering entering a hack for Hack Day around that very thing.
But then they went and made me one of the judges ;-)

Wanna help? A simple set of scripts that scrape the archive (er I mean
call that big RESTful API) and post entries/updates to the freebase
sandbox server would be an interesting experiment.

I agree that freebase is an amazing resource, especially when the
programme data is curated properly:

compare
http://www.freebase.com/view/?id=%239202a8c04000641f80012406 
with
http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
!

There may be some rights issues around what would basically amount to
opening up the programme catalogue under the creative commons
attribution license, where the attribution wouldn't go to the BBC but to
Freebase...

Brendan.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Oliver Cole
Sent: 09 July 2007 20:51
To: backstage@lists.bbc.co.uk
Subject: [backstage] Programme Catalogue vs. Freebase (was: BBC
Programme Catalogue -any APIs yet?)

I've been following the Programme Catalogue since it was announced, and
it's pretty interesting.

I do however have a question for the BBC people on the list - have you
considered simply uploading all the information to Freebase[1]? I can
understand that you might want to keep it in house, but if you merged it
with the wealth of information on Freebase you can do exponentially
more.

For example, if it was properly integrated you could run a query that
would tell me how many of the contributors to Spooks series 2 were born
in London.

Regards,
Oli

[1] http://www.freebase.com - A very cool structured database, currently
handling 2.3 million instances of 870 'types'

-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, 
please visit 

Re: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-24 Thread Graeme West

Hi all,
Sorry to re-open an old thread - just wondering what the position is  
on scraping the catalogue.bbc.co.uk test site? I say this because I'm  
trying a little experiment - ingesting the whole catalogue into our  
Fedora repository ( http://www.fedora.info ) to be cross-referenced  
with the 200+ hours of BBC audio and video which we legally hold in  
our legacy repository as per our deposit agreement with the BBC  
( http://www.spokenword.ac.uk/using-audio-video/copyright/ ).


The reason I ask is that I've constructed a set of scripts which  
scrape the catalogue.bbc.co.uk archive's RDF files. I've already got  
a 'master' list of all programme URLs (the script to generate that  
took a pretty long time on a JANET connection), but having started  
the crawler grabbing the actual RDF streams for each programme, I can  
see that this is going to involve a pretty large amount of data  
transfer.


FYI, my crawler uses Wget and respects robots.txt files. There's no  
robots.txt file on catalogue.bbc.co.uk so it seems to be fair game,  
but there is one on open.bbc.co.uk - I'm scraping from the former  
obviously. Clearly there's a licensing issue with copying the content  
but I'm only trying this as a technical experiment at this stage  
anyway - it will not be publicly available.
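
As a rough illustration of the robots.txt check described above, here is a minimal Python sketch. It is not the Wget-based crawler itself: it uses the standard library's urllib.robotparser instead, and the user-agent name, the sample robots.txt rules and the contributor URL are all invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string for the crawler.
USER_AGENT = "example-catalogue-crawler"

# Hypothetical robots.txt content; the real file's rules are not known here.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /catalogue/infax/contributor/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, agent=USER_AGENT):
    """Keep only the URLs that the parsed rules permit for this agent."""
    return [url for url in urls if parser.can_fetch(agent, url)]


parser = build_parser(SAMPLE_ROBOTS)
urls = [
    "http://catalogue.bbc.co.uk/catalogue/infax/programme/LLVM414K",
    # Invented path, used only to show a disallowed URL being filtered out.
    "http://catalogue.bbc.co.uk/catalogue/infax/contributor/12345",
]
permitted = allowed_urls(parser, urls)
delay = parser.crawl_delay(USER_AGENT)  # seconds to wait between requests
```

If a site serves no robots.txt at all, a crawler built this way would treat every URL as permitted, which is the "fair game" situation described above; it says nothing about licensing.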


--
Graeme West
Spoken Word Services
Glasgow Caledonian University

Email: [EMAIL PROTECTED]
Project web site:
http://www.spokenword.ac.uk/


On 9 Jul 2007, at 21:30, Brendan Quinn wrote:


I was considering entering a hack for Hack Day around that very thing.
But then they went and made me one of the judges ;-)

Wanna help? A simple set of scripts that scrape the archive (er I mean
call that big RESTful API) and post entries/updates to the freebase
sandbox server would be an interesting experiment.

I agree that freebase is an amazing resource, especially when the
programme data is curated properly:

compare
http://www.freebase.com/view/?id=%239202a8c04000641f80012406
with
http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
!

There may be some rights issues around what would basically amount to
opening up the programme catalogue under the creative commons
attribution license, where the attribution wouldn't go to the BBC but to
Freebase...

Brendan.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Oliver Cole
Sent: 09 July 2007 20:51
To: backstage@lists.bbc.co.uk
Subject: [backstage] Programme Catalogue vs. Freebase (was: BBC
Programme Catalogue -any APIs yet?)

I've been following the Programme Catalogue since it was announced, and
it's pretty interesting.

I do however have a question for the BBC people on the list - have you
considered simply uploading all the information to Freebase[1]? I can
understand that you might want to keep it in house, but if you merged it
with the wealth of information on Freebase you can do exponentially
more.

For example, if it was properly integrated you could run a query that
would tell me how many of the contributors to Spooks series 2 were born
in London.

Regards,
Oli

[1] http://www.freebase.com - A very cool structured database, currently
handling 2.3 million instances of 870 'types'

-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe,
please visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/


Email has been scanned for viruses by Altman Technologies' email  
management service - www.altman.co.uk/emailsystems




RE: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-24 Thread Robin Doran
Hi Graeme,
 
The robots.txt file was accidentally dropped from the new release. We will be 
re-introducing it because of the concerns and complaints raised about personal 
data appearing in external search engines when the service was launched.
 
On the subject of scraping the data, I've asked the catalogue.bbc.co.uk team to 
clarify the terms of use on the data to see if that will help answer your 
question but if you have a specific request then I would recommend using the 
Contact Us page http://catalogue.bbc.co.uk/catalogue/infax/contact
Regards,
 



From: [EMAIL PROTECTED] on behalf of Graeme West
Sent: Tue 7/24/2007 20:39
To: backstage@lists.bbc.co.uk
Subject: Re: Uploading the BBC programme catalogue to freebase (was RE: 
[backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any 
APIs yet?))


Hi all, 
Sorry to re-open an old thread - just wondering what the position is on 
scraping the catalogue.bbc.co.uk test site? I say this because I'm trying a 
little experiment - ingesting the whole catalogue into our Fedora repository ( 
http://www.fedora.info ) to be cross-referenced with the 200+ hours of BBC 
audio and video which we legally hold in our legacy repository as per our 
deposit agreement with the BBC ( 
http://www.spokenword.ac.uk/using-audio-video/copyright/ ).

The reason I ask is that I've constructed a set of scripts which scrape the 
catalogue.bbc.co.uk archive's RDF files. I've already got a 'master' list of 
all programme URLs (the script to generate that took a pretty long time on a 
JANET connection), but having started the crawler grabbing the actual RDF 
streams for each programme, I can see that this is going to involve a pretty 
large amount of data transfer.

FYI, my crawler uses Wget and respects robots.txt files. There's no robots.txt 
file on catalogue.bbc.co.uk so it seems to be fair game, but there is one on 
open.bbc.co.uk - I'm scraping from the former obviously. Clearly there's a 
licensing issue with copying the content but I'm only trying this as a 
technical experiment at this stage anyway - it will not be publicly available.

--
Graeme West
Spoken Word Services
Glasgow Caledonian University

Email: [EMAIL PROTECTED]
Project web site: 
http://www.spokenword.ac.uk/


On 9 Jul 2007, at 21:30, Brendan Quinn wrote:


I was considering entering a hack for Hack Day around that very thing.
But then they went and made me one of the judges ;-)

Wanna help? A simple set of scripts that scrape the archive (er I mean
call that big RESTful API) and post entries/updates to the freebase
sandbox server would be an interesting experiment.

I agree that freebase is an amazing resource, especially when the
programme data is curated properly:

compare
http://www.freebase.com/view/?id=%239202a8c04000641f80012406 
with
http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
!

There may be some rights issues around what would basically amount to
opening up the programme catalogue under the creative commons
attribution license, where the attribution wouldn't go to the BBC but to
Freebase...

Brendan.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Oliver Cole
Sent: 09 July 2007 20:51
To: backstage@lists.bbc.co.uk
Subject: [backstage] Programme Catalogue vs. Freebase (was: BBC
Programme Catalogue -any APIs yet?)

I've been following the Programme Catalogue since it was announced, and
it's pretty interesting.

I do however have a question for the BBC people on the list - have you
considered simply uploading all the information to Freebase[1]? I can
understand that you might want to keep it in house, but if you merged it
with the wealth of information on Freebase you can do exponentially
more.

For example, if it was properly integrated you could run a query that
would tell me how many of the contributors to Spooks series 2 were born
in London.

Regards,
Oli

[1] http://www.freebase.com - A very cool structured database, currently
handling 2.3 million instances of 870 'types'





RE: [backstage] Re: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-10 Thread Michael Smethurst
Just a reminder that the top of the pops data is available under creative 
commons and has an xml representation (that could probably do with some work) 
and has musicbrainz ids and musicbrainz has been uploaded to freebase in its 
entirety

Unfortunately it's under an attribution licence but like tom says about dr who, 
totp is clearly [a] BBC programme[s], I'm not sure it's the end of
the world...

It's something i've been meaning to do for weeks but keep getting bored reading 
the api docs. If you wanna give it a go and need anything more from me, shout...

http://bbc-hackday.dyndns.org:2821/




-Original Message-
From: [EMAIL PROTECTED] on behalf of Tom Loosemore
Sent: Mon 7/9/2007 10:48 PM
To: backstage@lists.bbc.co.uk
Subject: Re: [backstage] Re: Uploading the BBC programme catalogue to freebase 
(was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme 
Catalogue -any APIs yet?))
 
On 09/07/07, Oliver Cole [EMAIL PROTECTED] wrote:
 On Mon, 2007-07-09 at 21:30 +0100, Brendan Quinn wrote:
  I was considering entering a hack for Hack Day around that very thing.
  But then they went and made me one of the judges ;-)
 
  Wanna help? A simple set of scripts that scrape the archive (er I mean
  call that big RESTful API) and post entries/updates to the freebase
  sandbox server would be an interesting experiment.

 I've not yet (bulk) posted data on Freebase - I'll take a look at this
 when I'm more au fait with it.

  compare
  http://www.freebase.com/view/?id=%239202a8c04000641f80012406
  with
  http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
  !

 Freebase is still in alpha as far as I know - those who can't see the
 first link can see a screenshot at:
 http://cornflakes.imen.org.uk/~oli/DrWho.png

 Those who are particularly interested can feel free to ask me for one of
 my remaining 4 invites - and I imagine Brendan has some too.

  There may be some rights issues around what would basically amount to
  opening up the programme catalogue under the creative commons
  attribution license, where the attribution wouldn't go to the BBC but to
  Freebase...

 Well, the RDF for the catalogue links to
 http://backstage.bbc.co.uk/archives/2005/05/api_licence.html:

 The BBC grants to You a ... non-sublicensable right to copy...

 Further:

 d. not publish, distribute or otherwise make the APIs available,
 (including in any Work You create), in a way that would enable other
 people to download or use the APIs other than as set out in this
 Licence.

standard backstage API licence -  it was the only one lying around at
the time... (nov 2005)

 I don't see any legal way that we can export the data to Freebase and
 relicense it as CC-BY.

yeah... the attribution back to BBC kinda matters... though given the
programmes are clearly BBC programmes, I'm not sure it's the end of
the world...


 Would you be able to get the appropriate BBC people to get this done?

I'll do a bit of lobbying...

RE: [backstage] Re: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-10 Thread Michael Smethurst
one final thought. getting both the programme catalogue and totp into freebase 
would allow us combine

http://catalogue.bbc.co.uk/catalogue/infax/programme/LLVM414K

and

http://bbc-hackday.dyndns.org:2821/totp/episode/vj3n

(the only episode i remember)

and start to make links from bbc programmes to musicbrainz which would be a 
good thing


-Original Message-
From: [EMAIL PROTECTED] on behalf of Michael Smethurst
Sent: Tue 7/10/2007 8:20 AM
To: backstage@lists.bbc.co.uk
Subject: RE: [backstage] Re: Uploading the BBC programme catalogue to freebase 
(was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme 
Catalogue -any APIs yet?))
 
Just a reminder that the top of the pops data is available under creative 
commons and has an xml representation (that could probably do with some work) 
and has musicbrainz ids and musicbrainz has been uploaded to freebase in its 
entirety

Unfortunately it's under an attribution licence but like tom says about dr who, 
totp is clearly [a] BBC programme[s], I'm not sure it's the end of
the world...

It's something i've been meaning to do for weeks but keep getting bored reading 
the api docs. If you wanna give it a go and need anything more from me, shout...

http://bbc-hackday.dyndns.org:2821/




-Original Message-
From: [EMAIL PROTECTED] on behalf of Tom Loosemore
Sent: Mon 7/9/2007 10:48 PM
To: backstage@lists.bbc.co.uk
Subject: Re: [backstage] Re: Uploading the BBC programme catalogue to freebase 
(was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme 
Catalogue -any APIs yet?))
 
On 09/07/07, Oliver Cole [EMAIL PROTECTED] wrote:
 On Mon, 2007-07-09 at 21:30 +0100, Brendan Quinn wrote:
  I was considering entering a hack for Hack Day around that very thing.
  But then they went and made me one of the judges ;-)
 
  Wanna help? A simple set of scripts that scrape the archive (er I mean
  call that big RESTful API) and post entries/updates to the freebase
  sandbox server would be an interesting experiment.

 I've not yet (bulk) posted data on Freebase - I'll take a look at this
 when I'm more au fait with it.

  compare
  http://www.freebase.com/view/?id=%239202a8c04000641f80012406
  with
  http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
  !

 Freebase is still in alpha as far as I know - those who can't see the
 first link can see a screenshot at:
 http://cornflakes.imen.org.uk/~oli/DrWho.png

 Those who are particularly interested can feel free to ask me for one of
 my remaining 4 invites - and I imagine Brendan has some too.

  There may be some rights issues around what would basically amount to
  opening up the programme catalogue under the creative commons
  attribution license, where the attribution wouldn't go to the BBC but to
  Freebase...

 Well, the RDF for the catalogue links to
 http://backstage.bbc.co.uk/archives/2005/05/api_licence.html:

 The BBC grants to You a ... non-sublicensable right to copy...

 Further:

 d. not publish, distribute or otherwise make the APIs available,
 (including in any Work You create), in a way that would enable other
 people to download or use the APIs other than as set out in this
 Licence.

standard backstage API licence -  it was the only one lying around at
the time... (nov 2005)

 I don't see any legal way that we can export the data to Freebase and
 relicense it as CC-BY.

yeah... the attribution back to BBC kinda matters... though given the
programmes are clearly BBC programmes, I'm not sure it's the end of
the world...


 Would you be able to get the appropriate BBC people to get this done?

I'll do a bit of lobbying...

Re: [backstage] Re: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-10 Thread Jonathan Powell

On 7/10/07, Tom Loosemore [EMAIL PROTECTED] wrote:


it'd be a much easier sell chez auntie if Freebase itself didn't
demand attribution for any use of the content and data within it...
why should the BBC give up attribution (its form of pseudo revenue) on
its data and hand over attribution of credit and link juice to
Freebase? Why does Freebase insist on its own attribution licence for
3rd party use of all content and data other people upload?

As I was pondering the answer to this question, it struck me that it's
not clear on what basis Freebase.com exists - by which I mean
Freebase.com the entity, rather than the content and data within it.
Is Freebase.com a charity? Is it a not-for-profit? Is it fully
commercial?   Or is it just in its understandably confused early
stages? The FAQ is not particularly forthcoming.
http://www.freebase.com/signin/faq

Not much chance of Auntie handing over its data to someone else to
apply their - valuable - attribution licence to with so much
uncertainty - however cool it would be from a tech and product and
play angle.

On 10/07/07, Michael Smethurst [EMAIL PROTECTED] wrote:
 Just a reminder that the top of the pops data is available under creative
 commons and has an xml representation (that could probably do with some
 work) and has musicbrainz ids and musicbrainz has been uploaded to freebase
 in its entirety

 Unfortunately it's under an attribution licence but like tom says about dr
 who, totp is clearly [a] BBC programme[s], I'm not sure it's the end of
 the world...

 It's something i've been meaning to do for weeks but keep getting bored
 reading the api docs. If you wanna give it a go and need anything more from
 me, shout...

 http://bbc-hackday.dyndns.org:2821/




 -Original Message-
 From: [EMAIL PROTECTED] on behalf of Tom Loosemore
 Sent: Mon 7/9/2007 10:48 PM
 To: backstage@lists.bbc.co.uk
 Subject: Re: [backstage] Re: Uploading the BBC programme catalogue to
freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC
Programme Catalogue -any APIs yet?))

 On 09/07/07, Oliver Cole [EMAIL PROTECTED] wrote:
  On Mon, 2007-07-09 at 21:30 +0100, Brendan Quinn wrote:
   I was considering entering a hack for Hack Day around that very thing.
   But then they went and made me one of the judges ;-)
  
   Wanna help? A simple set of scripts that scrape the archive (er I mean
   call that big RESTful API) and post entries/updates to the freebase
   sandbox server would be an interesting experiment.
 
  I've not yet (bulk) posted data on Freebase - I'll take a look at this
  when I'm more au fait with it.
 
   compare
   http://www.freebase.com/view/?id=%239202a8c04000641f80012406
   with
   http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
   !
 
  Freebase is still in alpha as far as I know - those who can't see the
  first link can see a screenshot at:
  http://cornflakes.imen.org.uk/~oli/DrWho.png
 
  Those who are particularly interested can feel free to ask me for one of
  my remaining 4 invites - and I imagine Brendan has some too.
 
   There may be some rights issues around what would basically amount to
   opening up the programme catalogue under the creative commons
   attribution license, where the attribution wouldn't go to the BBC but to
   Freebase...
 
  Well, the RDF for the catalogue links to
  http://backstage.bbc.co.uk/archives/2005/05/api_licence.html:
 
  The BBC grants to You a ... non-sublicensable right to copy...
 
  Further:
 
  d. not publish, distribute or otherwise make the APIs available,
  (including in any Work You create), in a way that would enable other
  people to download or use the APIs other than as set out in this
  Licence.

 standard backstage API licence -  it was the only one lying around at
 the time... (nov 2005)

  I don't see any legal way that we can export the data to Freebase and
  relicense it as CC-BY.

 yeah... the attribution back to BBC kinda matters... though given the
 programmes are clearly BBC programmes, I'm not sure it's the end of
 the world...


  Would you be able to get the appropriate BBC people to get this done?

 I'll do a bit of lobbying...



It always comes down to licensing issues, doesn't it?
I have no problem with the BBC's licensing for this data, but I do agree
that freebase seems a little sinister or over-eager. It feels like a lawsuit
waiting to happen :)

The BBC Programme Catalogue is an excellent resource that could just really
do with some better API search result data. There's no XML data supplied for

RE: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-09 Thread Chris Sizemore
http://catalogue.bbc.co.uk/catalogue/infax/series/DR+WHO

holy synonymous concepts, batman... 
(http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO)

point is, it would be easy to merge these on freebase, nearly impossible 
directly in the BBC Programme Catalogue context...
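
To make the merging idea concrete, here is a small Python sketch of folding synonymous series titles into one canonical entry. It is only an illustration under stated assumptions: the alias table, the record layout and the programme identifiers are hypothetical, and it does not use any Freebase or Programme Catalogue API.

```python
from collections import defaultdict

# Hypothetical alias table mapping a variant series title to a canonical one.
ALIASES = {"DR WHO": "DOCTOR WHO"}


def canonical(title):
    """Upper-case the title, collapse whitespace, then apply known aliases."""
    key = " ".join(title.upper().split())
    return ALIASES.get(key, key)


def merge_series(records):
    """Group (series title, programme id) pairs under the canonical title."""
    merged = defaultdict(list)
    for title, programme_id in records:
        merged[canonical(title)].append(programme_id)
    return dict(merged)


# Invented records standing in for entries filed under both series pages.
records = [
    ("DR WHO", "prog-1"),
    ("DOCTOR WHO", "prog-2"),
    ("Doctor  Who", "prog-3"),
]
merged = merge_series(records)
```

The alias table is the curated part: someone still has to assert that the two titles mean the same series, which is the kind of editing a community-maintained database makes easy.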

suppose this all has to do with the different purposes of the 2 products... 

arguably, the BBC has done its part by making the Catalogue data available via 
RDF and Atom? if freebase is a useful (interim) destination for this data, 
isn't the assumption that the community will make it happen? (hint, hint?)


best--

--cs

-Original Message-
From: [EMAIL PROTECTED] on behalf of Brendan Quinn
Sent: Mon 7/9/2007 9:30 PM
To: backstage@lists.bbc.co.uk
Subject: Uploading the BBC programme catalogue to freebase (was RE: [backstage] 
Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))
 
I was considering entering a hack for Hack Day around that very thing.
But then they went and made me one of the judges ;-)

Wanna help? A simple set of scripts that scrape the archive (er I mean
call that big RESTful API) and post entries/updates to the freebase
sandbox server would be an interesting experiment.

I agree that freebase is an amazing resource, especially when the
programme data is curated properly:

compare
http://www.freebase.com/view/?id=%239202a8c04000641f80012406 
with
http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
!

There may be some rights issues around what would basically amount to
opening up the programme catalogue under the creative commons
attribution license, where the attribution wouldn't go to the BBC but to
Freebase...

Brendan.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Oliver Cole
Sent: 09 July 2007 20:51
To: backstage@lists.bbc.co.uk
Subject: [backstage] Programme Catalogue vs. Freebase (was: BBC
Programme Catalogue -any APIs yet?)

I've been following the Programme Catalogue since it was announced, and
it's pretty interesting.

I do however have a question for the BBC people on the list - have you
considered simply uploading all the information to Freebase[1]? I can
understand that you might want to keep it in house, but if you merged it
with the wealth of information on Freebase you can do exponentially
more.

For example, if it was properly integrated you could run a query that
would tell me how many of the contributors to Spooks series 2 were born
in London.

Regards,
Oli

[1] http://www.freebase.com - A very cool structured database, currently
handling 2.3 million instances of 870 'types'




[backstage] Re: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-09 Thread Oliver Cole
On Mon, 2007-07-09 at 21:30 +0100, Brendan Quinn wrote:
 I was considering entering a hack for Hack Day around that very thing.
 But then they went and made me one of the judges ;-)
 
 Wanna help? A simple set of scripts that scrape the archive (er I mean
 call that big RESTful API) and post entries/updates to the freebase
 sandbox server would be an interesting experiment.

I've not yet (bulk) posted data on Freebase - I'll take a look at this
when I'm more au fait with it.

 compare
 http://www.freebase.com/view/?id=%239202a8c04000641f80012406 
 with
 http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
 !

Freebase is still in alpha as far as I know - those who can't see the
first link can see a screenshot at:
http://cornflakes.imen.org.uk/~oli/DrWho.png

Those who are particularly interested can feel free to ask me for one of
my remaining 4 invites - and I imagine Brendan has some too.

 There may be some rights issues around what would basically amount to
 opening up the programme catalogue under the creative commons
 attribution license, where the attribution wouldn't go to the BBC but to
 Freebase...

Well, the RDF for the catalogue links to
http://backstage.bbc.co.uk/archives/2005/05/api_licence.html:

The BBC grants to You a ... non-sublicensable right to copy...

Further:

d. not publish, distribute or otherwise make the APIs available,
(including in any Work You create), in a way that would enable other
people to download or use the APIs other than as set out in this
Licence.


I don't see any legal way that we can export the data to Freebase and
relicense it as CC-BY.

I don't think it can be done without a relicensing of the catalogue - I
guess it's lucky you didn't go ahead and write that script at Hack Day :)

Would you be able to get the appropriate BBC people to get this done?

Regards,

Oli

 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Oliver Cole
 Sent: 09 July 2007 20:51
 To: backstage@lists.bbc.co.uk
 Subject: [backstage] Programme Catalogue vs. Freebase (was: BBC
 Programme Catalogue -any APIs yet?)
 
 I've been following the Programme Catalogue since it was announced, and
 it's pretty interesting.
 
 I do however have a question for the BBC people on the list - have you
 considered simply uploading all the information to Freebase[1]? I can
 understand that you might want to keep it in house, but if you merged it
 with the wealth of information on Freebase you can do exponentially
 more.
 
 For example, if it was properly integrated you could run a query that
 would tell me how many of the contributors to Spooks series 2 were born
 in London.
 
 Regards,
 Oli
 
 [1] http://www.freebase.com - A very cool structured database, currently
 handling 2.3 million instances of 870 'types'
 


[backstage] RE: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-09 Thread Oliver Cole
On Mon, 2007-07-09 at 22:05 +0100, Chris Sizemore wrote:
 holy synonymous concepts, batman... 
 point is, it would be easy to merge these on freebase, nearly
 impossible directly in the BBC Programme Catalogue context...

Indeed, Freebase is superior in this regard.

 arguably, the BBC has done its part by making the Catalogue data
 available via RDF and Atom? if freebase is a useful (interim)
 destination for this data, isn't the assumption that the community
 will make it happen? (hint, hint?)

I believe the community would make it happen if the license allowed it
to happen. The problem is that the BBC took the obvious step of applying
the Backstage API license to the data on the Catalogue, which is more of
a database than an API... See my other post for license discussion.

Regards,
Oli
 
 
 best--
 
 --cs
 
 -Original Message-
 From: [EMAIL PROTECTED] on behalf of Brendan Quinn
 Sent: Mon 7/9/2007 9:30 PM
 To: backstage@lists.bbc.co.uk
 Subject: Uploading the BBC programme catalogue to freebase (was RE:
 [backstage] Programme Catalogue vs. Freebase (was: BBC Programme
 Catalogue -any APIs yet?))
 
 I was considering entering a hack for Hack Day around that very thing.
 But then they went and made me one of the judges ;-)
 
 Wanna help? A simple set of scripts that scrape the archive (er I mean
 call that big RESTful API) and post entries/updates to the freebase
 sandbox server would be an interesting experiment.
 
 I agree that freebase is an amazing resource, especially when the
 programme data is curated properly:
 
 compare
 http://www.freebase.com/view/?id=%239202a8c04000641f80012406
 with
 http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
 !
 
 There may be some rights issues around what would basically amount to
 opening up the programme catalogue under the creative commons
 attribution license, where the attribution wouldn't go to the BBC but to
 Freebase...
 
 Brendan.
 
 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Oliver Cole
 Sent: 09 July 2007 20:51
 To: backstage@lists.bbc.co.uk
 Subject: [backstage] Programme Catalogue vs. Freebase (was: BBC
 Programme Catalogue -any APIs yet?)
 
 I've been following the Programme Catalogue since it was announced, and
 it's pretty interesting.
 
 I do however have a question for the BBC people on the list - have you
 considered simply uploading all the information to Freebase[1]? I can
 understand that you might want to keep it in house, but if you merged it
 with the wealth of information on Freebase you can do exponentially
 more.
 
 For example, if it was properly integrated you could run a query that
 would tell me how many of the contributors to Spooks series 2 were born
 in London.
 
 Regards,
 Oli
 
 [1] http://www.freebase.com - A very cool structured database, currently
 handling 2.3 million instances of 870 'types'
 
 
 
 



Re: [backstage] Re: Uploading the BBC programme catalogue to freebase (was RE: [backstage] Programme Catalogue vs. Freebase (was: BBC Programme Catalogue -any APIs yet?))

2007-07-09 Thread Tom Loosemore

On 09/07/07, Oliver Cole [EMAIL PROTECTED] wrote:

On Mon, 2007-07-09 at 21:30 +0100, Brendan Quinn wrote:
 I was considering entering a hack for Hack Day around that very thing.
 But then they went and made me one of the judges ;-)

 Wanna help? A simple set of scripts that scrape the archive (er I mean
 call that big RESTful API) and post entries/updates to the freebase
 sandbox server would be an interesting experiment.

I've not yet (bulk) posted data on Freebase - I'll take a look at this
when I'm more au fait with it.

 compare
 http://www.freebase.com/view/?id=%239202a8c04000641f80012406
 with
 http://open.bbc.co.uk/catalogue/infax/series/DOCTOR+WHO
 !

Freebase is still in alpha as far as I know - those who can't see the
first link can see a screenshot at:
http://cornflakes.imen.org.uk/~oli/DrWho.png

Those who are particularly interested can feel free to ask me for one of
my remaining 4 invites - and I imagine Brendan has some too.

 There may be some rights issues around what would basically amount to
 opening up the programme catalogue under the creative commons
 attribution license, where the attribution wouldn't go to the BBC but to
 Freebase...

Well, the RDF for the catalogue links to
http://backstage.bbc.co.uk/archives/2005/05/api_licence.html:

The BBC grants to You a ... non-sublicensable right to copy...

Further:

d. not publish, distribute or otherwise make the APIs available,
(including in any Work You create), in a way that would enable other
people to download or use the APIs other than as set out in this
Licence.


standard backstage API licence -  it was the only one lying around at
the time... (nov 2005)


I don't see any legal way that we can export the data to Freebase and
relicense it as CC-BY.


yeah... the attribution back to BBC kinda matters... though given the
programmes are clearly BBC programmes, I'm not sure it's the end of
the world...



Would you be able to get the appropriate BBC people to get this done?


I'll do a bit of lobbying...