Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-10 Thread Billy Abbott

On Tue, 10 Jun 2008, Andy wrote:


> Just tried it out. I did notice the text from Hansard was not actually
> the same as what was said, is this common?


As it says in the bullet points to the right of the video and text:

"Hansard is not a verbatim transcript, so spoken words might differ 
slightly from the printed version."


Making things more formal or more followable than when the MPs are talking 
over each other seems to be very normal.


--billy
-
Sent via the backstage.bbc.co.uk discussion group.  To unsubscribe, please 
visit http://backstage.bbc.co.uk/archives/2005/01/mailing_list.html.  
Unofficial list archive: http://www.mail-archive.com/backstage@lists.bbc.co.uk/


Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-10 Thread Andy
Etienne Pollard wrote:
> You might be interested to learn about a new project that has just
> been launched by TheyWorkForYou.com - an online video archive of the
> House of Commons, with video clips posted in Flash video format
> alongside the text of speeches from Hansard.

Just tried it out. I did notice the text from Hansard was not actually
the same as what was said, is this common?

For instance Hansard text says:
> I am a little worried by the example that the hon. Gentleman has just given

But the video says:
> I am a little worried by the example that he's highlighted

See: http://www.theyworkforyou.com/debate/?id=2008-06-06a.1050.2
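To see exactly where the printed record and the spoken words part company, a word-level diff of the two versions quoted above is enough. This is a minimal sketch using the Python standard library's difflib; the function name word_diff is just for illustration and has nothing to do with how TheyWorkForYou itself works.

```python
import difflib


def word_diff(hansard: str, spoken: str) -> list[tuple[str, str, str]]:
    """Return (operation, hansard_words, spoken_words) for every span
    where the printed record and the spoken words differ."""
    a, b = hansard.split(), spoken.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # skip the runs of words both versions share
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


if __name__ == "__main__":
    hansard = ("I am a little worried by the example that "
               "the hon. Gentleman has just given")
    spoken = "I am a little worried by the example that he's highlighted"
    for op, printed, said in word_diff(hansard, spoken):
        print(f"{op}: Hansard {printed!r} / spoken {said!r}")
```

For this example it reports a single "replace": "the hon. Gentleman has just given" in Hansard against "he's highlighted" on the video.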

All in all though an excellent service and let's hope this can get more
people interested in politics.

Keep up the good work!

Andy




Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-06 Thread Frank Wales

John O'Donovan wrote:

> Now that you know what happens I bet you won't do that again...


Actually, I think that behaviour is a bug, but as I'm now out of
scratch pantaloons to test with, I'll leave it for others more
versed in surprise linguo-tailoring incidents to investigate.
--
Frank Wales [EMAIL PROTECTED]


RE: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-06 Thread John O'Donovan
Now that you know what happens I bet you won't do that again...
 
Cheers,
 
jod



From: [EMAIL PROTECTED] on behalf of Frank Wales
Sent: Thu 6/5/2008 22:57
To: backstage@lists.bbc.co.uk
Subject: Re: [backstage] Video recordings of the House of Commons on 
TheyWorkForYou.com



John O'Donovan wrote:
> If you swear on this list for example, your trousers will fall down like
> a comedy clown.

Huh.  I did not know that.

But how sensitive is this language-sensitive depant-o-tron?  Let's find out...

What word starts with 'f' and ends in 'uck'?



















Firetruck!








Hey, look at that, my pants are still up.
They're on fire, but they're still up.
--
Frank Wales [EMAIL PROTECTED]




Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-05 Thread Frank Wales

John O'Donovan wrote:
> If you swear on this list for example, your trousers will fall down like
> a comedy clown.


Huh.  I did not know that.

But how sensitive is this language-sensitive depant-o-tron?  Let's find out...

What word starts with 'f' and ends in 'uck'?



















Firetruck!








Hey, look at that, my pants are still up.
They're on fire, but they're still up.
--
Frank Wales [EMAIL PROTECTED]


RE: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-05 Thread John O'Donovan
No comment.
 
There are of course many ways to ensure unparliamentary language is
not used.
 
The BBC has its own. 
 
If you swear on this list for example, your trousers will fall down like
a comedy clown.
 
Cheers,
 
jod



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Thomas Leitch
Sent: 05 June 2008 10:35
To: backstage@lists.bbc.co.uk
Subject: RE: [backstage] Video recordings of the House of Commons on
TheyWorkForYou.com


Well done Etienne, 
 
A fantastic piece of work...
 
 
But I would have to take issue with your view John, of Hansard being an
entirely representative view of what went on in the various chambers...
http://news.bbc.co.uk/1/hi/uk_politics/7187907.stm  ;-)
 
 





From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John O'Donovan
Sent: 04 June 2008 19:17
To: backstage@lists.bbc.co.uk
    Subject: RE: [backstage] Video recordings of the House of
Commons on TheyWorkForYou.com


Hi - as part of the Digital Democracy project we will be looking
at ways to improve the quantity and quality of coverage, as well as
tagging and metadata developments, some of which will be automated and
produce better metadata at source.
 
One of the challenges here is that much of the metadata does not
come from the BBC. Lining up transcripts and other metadata with video
is difficult to do reliably in an automated way as there is so much
room for error. Also the captions available at source are not a
replacement for the full transcript produced by Hansard.
 
There is a very early overview of the principles for the DD
project here...


http://www.bbc.co.uk/blogs/bbcinternet/2008/02/digital_democracy.html
 
The way MySociety have approached this simplifies a difficult
task and makes the video more accessible as a result.
 
It is a great way to democratise the process of democratising
democracy
 
Cheers,

jod

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Brian Butterworth
Sent: 04 June 2008 12:48
To: backstage@lists.bbc.co.uk
    Subject: Re: [backstage] Video recordings of the House of
Commons on TheyWorkForYou.com


Phil,


2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:


I'm sure one of the first computing acronyms I
ever learnt was GIGO...

http://en.wikipedia.org/wiki/GIGO



Yes, I know it. Take a look at Etienne's reply for one
aspect of the details and why the captions may also count as garbage.

Another important point is that the video captioner
they've put together matches video to Hansard, rather than just the
captions - that is, to the official record of what was said, rather than
what was actually said, which is an important distinction.


I still can't help thinking that this should be done "at
source".  I thought Auntie was supposed to give good tagging?
 



Phil




2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:


   However, a clear text feed of the data would keep the data pure,
   surely?


   Seriously, where would the fun in that be?

   Phil 'timestamp-tastic' Wilson





-- 
Please email me back if you need any more help.

Brian Butterworth

http://www.ukfree.tv - independent digital
television and switchover advice, since 2002







--

Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-05 Thread Etienne Pollard
On Thu, Jun 5, 2008 at 10:34 AM, Thomas Leitch <[EMAIL PROTECTED]> wrote:
> Well done Etienne,
>
> A fantastic piece of work...

Thank you for the compliment.  It wasn't just me, by any means - there
were also very significant amounts of work done by Matthew Somerville
and other members of the mySociety team.  And we stood on the
shoulders of some giants, in particular the people who created the
ffmpeg/mplayer suite, and the lighttpd crowd.

> But I would have to take issue with your view John, of Hansard being an
> entirely representative view of what went on in the various chambers...
> http://news.bbc.co.uk/1/hi/uk_politics/7187907.stm  ;-)

-- etienne


RE: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-05 Thread Thomas Leitch
Well done Etienne, 
 
A fantastic piece of work...
 
 
But I would have to take issue with your view John, of Hansard being an
entirely representative view of what went on in the various chambers...
http://news.bbc.co.uk/1/hi/uk_politics/7187907.stm  ;-)
 
 





From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John O'Donovan
Sent: 04 June 2008 19:17
To: backstage@lists.bbc.co.uk
Subject: RE: [backstage] Video recordings of the House of
Commons on TheyWorkForYou.com


Hi - as part of the Digital Democracy project we will be looking
at ways to improve the quantity and quality of coverage, as well as
tagging and metadata developments, some of which will be automated and
produce better metadata at source.
 
One of the challenges here is that much of the metadata does not
come from the BBC. Lining up transcripts and other metadata with video
is difficult to do reliably in an automated way as there is so much
room for error. Also the captions available at source are not a
replacement for the full transcript produced by Hansard.
 
There is a very early overview of the principles for the DD
project here...


http://www.bbc.co.uk/blogs/bbcinternet/2008/02/digital_democracy.html
 
The way MySociety have approached this simplifies a difficult
task and makes the video more accessible as a result.
 
It is a great way to democratise the process of democratising
democracy
 
Cheers,

jod

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Brian Butterworth
Sent: 04 June 2008 12:48
To: backstage@lists.bbc.co.uk
Subject: Re: [backstage] Video recordings of the House of
Commons on TheyWorkForYou.com


Phil,


2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:


I'm sure one of the first computing acronyms I
ever learnt was GIGO...

http://en.wikipedia.org/wiki/GIGO



Yes, I know it. Take a look at Etienne's reply for one
aspect of the details and why the captions may also count as garbage.

Another important point is that the video captioner
they've put together matches video to Hansard, rather than just the
captions - that is, to the official record of what was said, rather than
what was actually said, which is an important distinction.


I still can't help thinking that this should be done "at
source".  I thought Auntie was supposed to give good tagging?
 



Phil




2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:


   However, a clear text feed of the data would keep the data pure,
   surely?


   Seriously, where would the fun in that be?

   Phil 'timestamp-tastic' Wilson





-- 
Please email me back if you need any more help.

Brian Butterworth

http://www.ukfree.tv - independent digital
television and switchover advice, since 2002







-- 

Brian Butterworth

http://www.ukfree.tv - independent digital television and
switchover advice, since 2002 



Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-05 Thread Etienne Pollard
On Wed, Jun 4, 2008 at 7:17 PM, John O'Donovan <[EMAIL PROTECTED]> wrote:
> The way MySociety have approached this simplifies a difficult task and makes
> the video more accessible as a result.
>
> It is a great way to democratise the process of democratising democracy

Glad to hear that you like it!

-- etienne


RE: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread John O'Donovan
Hi - as part of the Digital Democracy project we will be looking at ways
to improve the quantity and quality of coverage, as well as tagging and
metadata developments, some of which will be automated and produce
better metadata at source.
 
One of the challenges here is that much of the metadata does not come
from the BBC. Lining up transcripts and other metadata with video is
difficult to do reliably in an automated way as there is so much room
for error. Also the captions available at source are not a replacement
for the full transcript produced by Hansard.
 
There is a very early overview of the principles for the DD project
here...
http://www.bbc.co.uk/blogs/bbcinternet/2008/02/digital_democracy.html
 
The way MySociety have approached this simplifies a difficult task and
makes the video more accessible as a result.
 
It is a great way to democratise the process of democratising democracy
 
Cheers,

jod

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Brian Butterworth
Sent: 04 June 2008 12:48
To: backstage@lists.bbc.co.uk
Subject: Re: [backstage] Video recordings of the House of Commons on
TheyWorkForYou.com


Phil,


2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:


I'm sure one of the first computing acronyms I ever
learnt was GIGO...

http://en.wikipedia.org/wiki/GIGO



Yes, I know it. Take a look at Etienne's reply for one aspect of
the details and why the captions may also count as garbage.

Another important point is that the video captioner they've put
together matches video to Hansard, rather than just the captions - that
is, to the official record of what was said, rather than what was
actually said, which is an important distinction.


I still can't help thinking that this should be done "at source".  I
thought Auntie was supposed to give good tagging?
 



Phil




2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:


   However, a clear text feed of the data would keep the data pure,
   surely?


   Seriously, where would the fun in that be?

   Phil 'timestamp-tastic' Wilson





-- 
Please email me back if you need any more help.

Brian Butterworth

http://www.ukfree.tv - independent digital television
and switchover advice, since 2002







-- 

Brian Butterworth

http://www.ukfree.tv - independent digital television and switchover
advice, since 2002 


Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Martin Deutsch
On 6/4/08, Etienne Pollard <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 4, 2008 at 10:55 AM, Brian Butterworth
> <[EMAIL PROTECTED]> wrote:
> > What I was saying was that the old Freeview version of BBC Parliament used
> > to have a quarter-screen picture and the information that is now in the
> > Astons was provided using MHEG5.  This was clear text (to keep the bandwidth
> > down) not bitmap graphics.
>
> Forgive my ignorance, but what is an Aston?
Aston Broadcast Systems made a rather popular line of TV caption
generating equipment - what are sometimes known as 'lower third
graphics' are frequently referred to in the UK generically as Astons.

> > OCRing is never going to be brilliant, given the semi-transparent nature of
> > the captions on BBC Parliament.
> >
> > However, a clear text feed of the data would keep the data pure, surely?
>
> The machines that put the captions up on the screen have internal
> text-based logs, to which we have access.  However, since this is
> basically just pulling logfiles off a set of operational machines this
> access isn't 100% reliable.  The data in the log files is of variable
> quality, since there are some speeches that are not captioned, and
> other times captions aren't actually speeches (e.g. reaction shot of
> previous speaker during a long speech can prompt a back and forth of
> captions, even though the same person is speaking throughout the
> changeover in captions).  So although we use the logfiles to get an
> approximate fix, we had to resort to the timestamping game for
> accuracy.

Likewise, the caption may not appear as soon as the speaker does - a
friend of mine spent most of a summer in a BBC Parliament
transmission gallery, captioning House of Lords coverage in real time.
It took a while, but she got quite good at recognising peers by their
beards.
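If that captioning delay is roughly consistent, one could estimate it from speeches whose start times have already been corrected by hand and then shift the remaining caption times earlier. This is a rough sketch under stated assumptions: the function names, the use of the median, and the sample numbers are all hypothetical, not a description of what TheyWorkForYou does.

```python
from statistics import median


def estimate_lag(caption_times, corrected_times):
    """Median delay in seconds between when speeches really started
    (hand-corrected times) and when their captions appeared."""
    if len(caption_times) != len(corrected_times) or not caption_times:
        raise ValueError("need equal-length, non-empty lists")
    return median(c - t for c, t in zip(caption_times, corrected_times))


def shift_captions(caption_times, lag):
    """Move caption times earlier by the estimated lag, never before 0."""
    return [max(0.0, c - lag) for c in caption_times]


if __name__ == "__main__":
    lag = estimate_lag([12.0, 75.0, 130.0], [8.0, 70.0, 127.0])
    print(lag)                                 # 4.0
    print(shift_captions([200.0, 2.0], lag))   # [196.0, 0.0]
```

The median is used so that one caption the operator was very slow on (an unfamiliar beard, say) doesn't drag the estimate for every other speech.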

 - martin


RE: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Christopher Woods
> Aston is a company who provide systems for generating 
> on-screen graphics for live programmes - however it's also 
> used as a generic term for those same graphics.  So the kind 
> of graphics like you get on the News where they'll say "Nick 
> Higham reporting", the name of an interviewee or similar.

It's amazing how manual the whole process is, still... And amusing for me
(not for them, I'm sure) when little mistakes creep into live broadcasts :D
When I was lucky enough to get a tour round the Mailbox studios (an
unexpected one-off perk from one of my uni course's lecturers), I was quite
surprised when we got to go into the control room during a news broadcast and
suddenly had Natasha Kaplinsky's disembodied voice shouting "ASTON ON"
"ASTON..." "...ASTON OFF" at the vision mixer!

Personally I think Brian Blessed's voice would have been a better motivator ;)



Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Brian Butterworth
Phil,

2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:

> I'm sure one of the first computing acronyms I ever learnt was GIGO...
>>
>> http://en.wikipedia.org/wiki/GIGO
>>
>
> Yes, I know it. Take a look at Etienne's reply for one aspect of the
> details and why the captions may also count as garbage.
>
> Another important point is that the video captioner they've put together
> matches video to Hansard, rather than just the captions - that is, to the
> official record of what was said, rather than what was actually said, which
> is an important distinction.


I still can't help thinking that this should be done "at source".  I thought
Auntie was supposed to give good tagging?


>
>
> Phil
>
>
>> 2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:
>>
>>However, a clear text feed of the data would keep the data pure,
>>surely?
>>
>>
>>Seriously, where would the fun in that be?
>>
>>Phil 'timestamp-tastic' Wilson
>>
>>
>>
>>
>>
>> --
>> Please email me back if you need any more help.
>>
>> Brian Butterworth
>>
>> http://www.ukfree.tv - independent digital television and switchover
>> advice, since 2002
>>
>



-- 

Brian Butterworth

http://www.ukfree.tv - independent digital television and switchover advice,
since 2002


Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Brian Butterworth
2008/6/4 Etienne Pollard <[EMAIL PROTECTED]>:

> On Wed, Jun 4, 2008 at 10:55 AM, Brian Butterworth
> <[EMAIL PROTECTED]> wrote:
> > I thought they were trying to do OCR on the captions from the DVB-T
> stream.
> >
> > What I was saying was that the old Freeview version of BBC Parliament
> used
> > to have a quarter-screen picture and the information that is now in the
> > Astons was provided using MHEG5.  This was clear text (to keep the
> bandwidth
> > down) not bitmap graphics.
>
> Forgive my ignorance, but what is an Aston?


Sorry, it's a genericized trademark for "captions" overlaid on TV output.

http://en.wikipedia.org/wiki/Aston_Broadcast_Systems



>
> > OCRing is never going to be brilliant, given the semi-transparent nature
> of
> > the captions on BBC Parliament.
> >
> > However, a clear text feed of the data would keep the data pure, surely?
>
> The machines that put the captions up on the screen have internal
> text-based logs, to which we have access.  However, since this is
> basically just pulling logfiles off a set of operational machines this
> access isn't 100% reliable.


The MHEG5 service was 100% reliable, so I would conjecture that it is possible
to get them reliably.


>  The data in the log files is of variable
> quality, since there are some speeches that are not captioned, and
> other times captions aren't actually speeches (e.g. reaction shot of
> previous speaker during a long speech can prompt a back and forth of
> captions, even though the same person is speaking throughout the
> changeover in captions).  So although we use the logfiles to get an
> approximate fix, we had to resort to the timestamping game for
> accuracy.


IMHO this is just a clear case of GIGO.  The best thing would be for whoever is
operating the captions for BBC Parliament to be provided with the ability to
correctly tag the content in the first place.  The taxpayer (not Licence Fee
payer) is paying for this to be done already, it seems just crazy that they
can't do it, ahem, properly.

I'm not attacking the idea of the workaround, I'm just saying that it would
be best for the data to be prepared correctly at source and then
distributed.




>
>
> Hope that helps,
>
> -- etienne
>



-- 


Brian Butterworth

http://www.ukfree.tv - independent digital television and switchover advice,
since 2002


Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Matthew Somerville

Brian Butterworth wrote:
> I thought they were trying to do OCR on the captions from the DVB-T
> stream.


No, we have clear text. As it says in the blog post :-)


> However, a clear text feed of the data would keep the data pure, surely?


Sadly not (trust me, I've spent some time on this!) - even ignoring some 
missing data (so we'd have to do this for them anyway), when there's a long 
debate sometimes the captioning simply shows a summary of what's going on 
rather than someone's name (especially if they're a minister so we "know" 
who they are); captions don't cover quick interruptions, which can really 
mess things up if there's a lot of going back and forth between two people; 
etc. etc. :)
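One way to set aside the summary captions, as opposed to the ones that name a member, is to check each caption against a list of member names. This is a hypothetical sketch: the (time, text) caption format, the assumption that a name sits on the first line of a caption, and the example names are all invented for illustration.

```python
def split_captions(captions, roster):
    """Split (time, text) caption entries into those whose first line is a
    known member's name and those that aren't (e.g. debate summaries)."""
    known = {name.casefold() for name in roster}
    named, other = [], []
    for time, text in captions:
        lines = text.splitlines()
        first_line = lines[0].strip() if lines else ""
        if first_line.casefold() in known:
            named.append((time, first_line))
        else:
            other.append((time, text))
    return named, other


if __name__ == "__main__":
    roster = ["Jane Smith", "John Doe"]          # hypothetical members
    captions = [
        (10.0, "Jane Smith\nShadow Minister"),
        (95.0, "Second reading: Example Bill"),  # summary, not a name
        (140.0, "JOHN DOE"),
    ]
    named, other = split_captions(captions, roster)
    print(named)  # [(10.0, 'Jane Smith'), (140.0, 'JOHN DOE')]
    print(other)  # [(95.0, 'Second reading: Example Bill')]
```

This only deals with the summary problem. Uncaptioned interruptions never show up in the caption log at all, so no amount of filtering recovers them.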


ATB,
Matthew



Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Phil Wilson

> I'm sure one of the first computing acronyms I ever learnt was GIGO...
>
> http://en.wikipedia.org/wiki/GIGO


Yes, I know it. Take a look at Etienne's reply for one aspect of the details and why the 
captions may also count as garbage.


Another important point is that the video captioner they've put together matches video to 
Hansard, rather than just the captions - that is, to the official record of what was said, 
rather than what was actually said, which is an important distinction.


Phil



> 2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:
>
>>> However, a clear text feed of the data would keep the data pure,
>>> surely?
>>
>> Seriously, where would the fun in that be?
>>
>> Phil 'timestamp-tastic' Wilson
>
> --
> Please email me back if you need any more help.
>
> Brian Butterworth
>
> http://www.ukfree.tv - independent digital television and switchover
> advice, since 2002



RE: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Andrew Bowden

> Forgive my ignorance, but what is an Aston?

Aston is a company who provide systems for generating on-screen graphics
for live programmes - however it's also used as a generic term for those
same graphics.  So the kind of graphics like you get on the News where
they'll say "Nick Higham reporting", the name of an interviewee or
similar.
 



Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Matthew Somerville

Phil Wilson wrote:

> Phil 'timestamp-tastic' Wilson


People are catching up on you, Phil, better get back to it! ;-)

ATB,
Matthew


Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Brian Butterworth
Phil,

I'm sure one of the first computing acronyms I ever learnt was GIGO...

http://en.wikipedia.org/wiki/GIGO

2008/6/4 Phil Wilson <[EMAIL PROTECTED]>:

> However, a clear text feed of the data would keep the data pure, surely?
>>
>
> Seriously, where would the fun in that be?
>
> Phil 'timestamp-tastic' Wilson
>
>



-- 
Please email me back if you need any more help.

Brian Butterworth

http://www.ukfree.tv - independent digital television and switchover advice,
since 2002


Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Etienne Pollard
On Wed, Jun 4, 2008 at 10:55 AM, Brian Butterworth
<[EMAIL PROTECTED]> wrote:
> I thought they were trying to do OCR on the captions from the DVB-T stream.
>
> What I was saying was that the old Freeview version of BBC Parliament used
> to have a quarter-screen picture and the information that is now in the
> Astons was provided using MHEG5.  This was clear text (to keep the bandwidth
> down) not bitmap graphics.

Forgive my ignorance, but what is an Aston?

> OCRing is never going to be brilliant, given the semi-transparent nature of
> the captions on BBC Parliament.
>
> However, a clear text feed of the data would keep the data pure, surely?

The machines that put the captions up on the screen have internal
text-based logs, to which we have access.  However, since this is
basically just pulling logfiles off a set of operational machines this
access isn't 100% reliable.  The data in the log files is of variable
quality, since there are some speeches that are not captioned, and
other times captions aren't actually speeches (e.g. reaction shot of
previous speaker during a long speech can prompt a back and forth of
captions, even though the same person is speaking throughout the
changeover in captions).  So although we use the logfiles to get an
approximate fix, we had to resort to the timestamping game for
accuracy.
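The reaction-shot problem described above (captions flipping A, B, A while A keeps talking) could be partly cleaned up before any matching by dropping short captions that sit between two captions for the same person. A sketch only: the (start_seconds, speaker) entry format and the 20-second threshold are assumptions, not taken from the mySociety code.

```python
def collapse_flicker(entries, min_duration=20.0):
    """Clean a caption log of (start_seconds, speaker) entries, sorted by time.

    A caption lasting less than min_duration seconds whose neighbours both
    name the same other speaker (say a reaction shot of the previous speaker)
    is dropped, then consecutive entries for one speaker are merged, keeping
    the earliest start time."""
    kept = []
    for i, (start, speaker) in enumerate(entries):
        if 0 < i < len(entries) - 1:
            prev_speaker = entries[i - 1][1]
            next_start, next_speaker = entries[i + 1]
            short = next_start - start < min_duration
            if short and prev_speaker == next_speaker != speaker:
                continue  # brief cutaway: the surrounding speaker never stopped
        if kept and kept[-1][1] == speaker:
            continue  # same person still speaking
        kept.append((start, speaker))
    return kept


if __name__ == "__main__":
    log = [(0.0, "A"), (300.0, "B"), (310.0, "A"), (600.0, "C")]
    print(collapse_flicker(log))  # [(0.0, 'A'), (600.0, 'C')]
```

The trade-off is that a genuine interruption by B lasting under the threshold disappears too, which is one reason a human pass is still needed for accuracy.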

Hope that helps,

-- etienne


Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Phil Wilson

> However, a clear text feed of the data would keep the data pure, surely?


Seriously, where would the fun in that be?

Phil 'timestamp-tastic' Wilson
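The thread doesn't say how answers from the timestamping game are combined when several players time the same speech. One plausible approach, sketched here with an invented function name and an assumed 10-second tolerance, is to take the median of the answers that broadly agree with each other.

```python
from statistics import median


def consensus_timestamp(submissions, tolerance=10.0):
    """Combine several players' start times (in seconds) for one speech.

    Takes the submissions lying within `tolerance` seconds of the overall
    median and returns their median, provided at least two agree; otherwise
    returns None so the speech can be offered to more players."""
    if len(submissions) < 2:
        return None
    centre = median(submissions)
    agreeing = [t for t in submissions if abs(t - centre) <= tolerance]
    if len(agreeing) < 2:
        return None
    return median(agreeing)


if __name__ == "__main__":
    print(consensus_timestamp([101.0, 99.0, 100.0, 400.0]))  # 100.0
```

Requiring agreement, rather than trusting the first answer, is what would stop one careless or mischievous player from spoiling a timestamp.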


Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Brian Butterworth
Matthew,

I thought they were trying to do OCR on the captions from the DVB-T stream.


What I was saying was that the old Freeview version of BBC Parliament used
to have a quarter-screen picture and the information that is now in the
Astons was provided using MHEG5.  This was clear text (to keep the bandwidth
down) not bitmap graphics.

OCRing is never going to be brilliant, given the semi-transparent nature of
the captions on BBC Parliament.

However, a clear text feed of the data would keep the data pure, surely?

Sorry if I've missed something.
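If the caption data were published as a live text file, as suggested above, reading it would be straightforward. A sketch, assuming a purely hypothetical line format of HH:MM:SS, a tab, then the caption text (no actual feed format is described in this thread):

```python
def parse_feed_line(line):
    """Parse one line of a hypothetical caption feed, 'HH:MM:SS<TAB>caption'.

    Returns (seconds_since_midnight, caption_text), or None if the line
    doesn't match that format."""
    stamp, sep, text = line.rstrip("\n").partition("\t")
    if not sep:
        return None
    try:
        hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    except ValueError:  # wrong number of fields, or not integers
        return None
    if not (0 <= hours < 24 and 0 <= minutes < 60 and 0 <= seconds < 60):
        return None
    return hours * 3600 + minutes * 60 + seconds, text.strip()


if __name__ == "__main__":
    feed = ["14:32:05\tJane Smith\n", "garbled line\n", "14:33:40\tJohn Doe\n"]
    parsed = [p for p in map(parse_feed_line, feed) if p is not None]
    print(parsed)  # [(52325, 'Jane Smith'), (52420, 'John Doe')]
```

A clean feed would remove the OCR step, but the entries would still carry the same content as the on-screen captions, summaries and missing interruptions included.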

2008/6/4 Matthew Somerville <[EMAIL PROTECTED]>:

> Brian Butterworth wrote:
>
>> But why on earth is this being done this way?
>>
>
> If by Astons you mean the superimposed captions, then if you had read the
> text below (and the blog posting linked to), you would see that we did try
> exactly that and it sadly just wasn't good enough.
>
> ATB,
> Matthew
>
>> The Astons on the channel carry the information anyway, and we know that
>> this can be fed into another computer system, as the MHEG5 version of BBC
>> Parliament showed.
>>
>> It can't be that hard for BBC Parliament to provide the feed of information
>> that is used to generate the Astons (and the former MHEG5 service) as a live
>> text file (or something).
>>
>
>> 2008/6/3 Etienne Pollard <[EMAIL PROTECTED]>:
>> > See the blog posting at
>> >
>> > http://www.mysociety.org/2008/06/01/video-recordings-of-the-house-of-commons-on-theyworkforyoucom/
>> > for the full announcement.
>>
>
>> > Matching up individual speeches to video cuepoints is actually done in
>> > two stages - firstly, the CaptionerBot makes an approximate match for
>> > some of the speeches in Hansard using the raw BBC captions, and then
>> > we ask the general public to improve on the work of CaptionerBot using
>> > our simple and addictive online game (league table, prizes, etc).
>>



-- 

Brian Butterworth

http://www.ukfree.tv - independent digital television and switchover advice,
since 2002


Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Matthew Somerville

Brian Butterworth wrote:

> But why on earth is this being done this way?


If by Astons you mean the superimposed captions, then if you had read the 
text below (and the blog posting linked to), you would see that we did try 
exactly that and it sadly just wasn't good enough.


ATB,
Matthew

> The Astons on the channel carry the information anyway, and we know that
> this can be fed into another computer system, as the MHEG5 version of
> BBC Parliament showed.
>
> It can't be that hard for BBC Parliament to provide the feed of
> information that is used to generate the Astons (and the former MHEG5
> service) as a live text file (or something).



> 2008/6/3 Etienne Pollard <[EMAIL PROTECTED]>:
>> See the blog posting at
>>
>> http://www.mysociety.org/2008/06/01/video-recordings-of-the-house-of-commons-on-theyworkforyoucom/
>> for the full announcement.
>
>> Matching up individual speeches to video cuepoints is actually done in
>> two stages - firstly, the CaptionerBot makes an approximate match for
>> some of the speeches in Hansard using the raw BBC captions, and then
>> we ask the general public to improve on the work of CaptionerBot using
>> our simple and addictive online game (league table, prizes, etc).



Re: [backstage] Video recordings of the House of Commons on TheyWorkForYou.com

2008-06-04 Thread Brian Butterworth
Generally a great idea.

But why on earth is this being done this way?

The Astons on the channel carry the information anyway, and we know that
this can be fed into another computer system, as the MHEG5 version of BBC
Parliament showed.

It can't be that hard for BBC Parliament to provide the feed of information
that is used to generate the Astons (and the former MHEG5 service) as a live
text file (or something).



2008/6/3 Etienne Pollard <[EMAIL PROTECTED]>:

> Hello,
>
> You might be interested to learn about a new project that has just
> been launched by TheyWorkForYou.com - an online video archive of the
> House of Commons, with video clips posted in Flash video format
> alongside the text of speeches from Hansard.  You can view them on the
> website, or you can embed clips of the individual speeches on your
> blog or personal website by copying and pasting a bit of HTML that is
> listed below each clip on theyworkforyou.com.  See the blog posting at
>
> http://www.mysociety.org/2008/06/01/video-recordings-of-the-house-of-commons-on-theyworkforyoucom/
> for the full announcement.
>
> The key thing now is that we need your help to match up ~28,000
> speeches with the video footage (we've already got about 4,300 done).
> We've built a really simple, hyper-addictive website for people to
> use, complete with league tables and prizes (the rare and coveted
> mySociety hoodies).  You can find it right now at
> http://www.theyworkforyou.com/video/ - if you want to appear on the
> league table then take 30 seconds and register a username.  It's crowd
> sourcing applied to video timestamping - using our simple and
> remarkably addictive online game (with league tables, and did I
> mention the prizes?).
>
> Matching up individual speeches to video cuepoints is actually done in
> two stages - firstly, the CaptionerBot makes an approximate match for
> some of the speeches in Hansard using the raw BBC captions, and then
> we ask the general public to improve on the work of CaptionerBot using
> our simple and addictive online game (league table, prizes, etc).
>
> The video is taken from BBC Parliament, chopped up and transcoded into
> Flash video format (generic Flash 6, iirc), and served up to the
> general public using lighttpd and mod_flv_streaming.  This lets us
> give you direct access to any point in the video file just by
> specifying a parameter in the URL that indicates seconds elapsed since
> the start of the file.  The backend processing system uses lots of
> open source software to download and process live footage of the House
> of Commons from BBC Parliament (ffmpeg, mplayer, mencoder, yamdi, and
> quite a lot of Perl), and the BBC web API to get the schedule
> information it needs to extract the live coverage.
>
> Now, please help us out by timestamping some video!
> http://www.theyworkforyou.com/video/ is the place to be...
>
> All the best,
>
> Etienne
> --
> Etienne Pollard
> [EMAIL PROTECTED]
> +44 (0) 7946 415 996
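This note concerns the mod_flv_streaming point in the quoted announcement above. A player can only start decoding cleanly at a keyframe. yamdi's role in a pipeline like this is to inject an onMetaData keyframe index, made of parallel `keyframes.times` and `keyframes.filepositions` arrays, which makes that lookup cheap. Here is a minimal sketch of the lookup. It assumes the index has already been read out of the FLV into two Python lists. `seek_point`, `seek_url` and the sample numbers are hypothetical. The announcement describes the URL parameter as seconds elapsed, but stock lighttpd mod_flv_streaming is usually documented as taking a byte offset in `start`. Treat both the parameter name and its units as assumptions.

```python
from bisect import bisect_right
from urllib.parse import urlencode


def seek_point(times: list[float], filepositions: list[int], seconds: float) -> tuple[float, int]:
    """Return (keyframe_time, byte_offset) of the last keyframe at or before `seconds`.

    `times` and `filepositions` are parallel lists, shaped like the keyframe
    index yamdi writes into an FLV's onMetaData tag; `times` must be sorted.
    """
    if not times or len(times) != len(filepositions):
        raise ValueError("need a non-empty, parallel keyframe index")
    # bisect_right gives the first keyframe strictly after `seconds`;
    # the one before it is the latest safe place to begin playback.
    i = max(bisect_right(times, seconds) - 1, 0)
    return times[i], filepositions[i]


def seek_url(base_url: str, byte_offset: int) -> str:
    # Hypothetical: assumes the server reads a byte offset from `start`.
    return f"{base_url}?{urlencode({'start': byte_offset})}"
```

Take a made-up index with times [0.0, 2.0, 4.0, 6.0] and file positions [13, 5000, 9800, 15000]. A request for 5.3 seconds snaps back to the keyframe at 4.0 seconds, byte 9800, which gives http://example.com/clip.flv?start=9800 (example.com is a placeholder). Snapping backwards means a clip starts slightly early rather than missing a speaker's opening words.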



-- 
Please email me back if you need any more help.

Brian Butterworth

http://www.ukfree.tv - independent digital television and switchover advice,
since 2002