On 12/10/09 4:54 PM, Weston Ruter wrote:
I've been working on a web app which reads text in a web page,
highlighting each word as it is read. For this to be possible, a
Text-To-Speech API is needed which is able to:
(1) generate the speech audio from some text, and
(2) include the time indices for when each of the words in the text is
spoken.
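To illustrate why (2) matters, here is a minimal sketch of the highlighting step, assuming the API returned one start time per word. The `word_timings` data and the `word_index_at` helper are hypothetical and do not correspond to any existing TTS API.

```python
from bisect import bisect_right

# Hypothetical output of requirement (2): (word, start time in seconds) pairs,
# sorted by start time. The values are made up for illustration.
word_timings = [("Hello", 0.00), ("World", 0.45), ("again", 0.90)]


def word_index_at(timings, playback_time):
    """Return the index of the word being spoken at playback_time.

    Returns None if playback has not yet reached the first word.
    """
    starts = [start for _, start in timings]
    # bisect_right counts how many words have started at or before playback_time.
    i = bisect_right(starts, playback_time) - 1
    return i if i >= 0 else None


# A web app would call this on each playback-time update and highlight the result.
current = word_index_at(word_timings, 0.5)
if current is not None:
    print(word_timings[current][0])
```

Without the time indices, the app has no way to know which word to highlight at a given playback position.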
Microsoft has its Sapi.SpVoice API via ActiveXObject which does (1) but
not (2) apparently. There are web services (usable in conjunction with
HTML5 Audio) which also do (1) such as the iSpeech API
<http://www.ispeech.org/api> and Google Translate's TTS
<http://translate.google.com/translate_tts?q=Hello%2C+World&tl=en>, but
none that I have found do (2). In any case, web services aren't
ideal, since they require the audio to be transferred over the
network, which could take a significant amount of time.
Is anyone aware of any work done to develop a standard TTS API for the
Web? Operating systems already have this functionality built-in, and
it's a shame that web apps can't make use of it. If Google Gears were
alive, it would've been a good place to prototype this, but alas…
You probably want to ask the W3C Multimodal Interaction Working Group.
There are specifications like XHTML+Voice and SALT
(neither is really a W3C specification) and (old) proposals like
MMI-CSS.
-Olli