(Sending this a second time. Hopefully the whatwg list doesn't bounce it back.)
On 12/11/09 6:05 AM, Bjorn Bringert wrote:
Thanks for the discussion - cool to see more interest today as well
(http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-December/024453.html)
I've hacked up a proof-of-concept JavaScript API for speech
recognition and synthesis. It adds a navigator.speech object with
these functions:
void listen(ListenCallback callback, ListenOptions options);
void speak(DOMString text, SpeakCallback callback, SpeakOptions options);
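Read as a sketch, the two calls might be chained like this. The stub for navigator.speech, the shape of the result object, and the option names (e.g. `language`) are all assumptions for illustration, not part of the actual plugin:

```javascript
// Hypothetical usage of the proposed navigator.speech API.
// navigator.speech is stubbed so the sketch runs standalone;
// the real plugin would provide it natively.
var navigator = {
  speech: {
    listen: function (callback, options) {
      // A real recognizer would capture audio; the stub returns a
      // canned result (field names are assumptions).
      callback({ utterance: "pepperoni pizza", confidence: 0.9 });
    },
    speak: function (text, callback, options) {
      callback();  // a real engine would call this when TTS finishes
    }
  }
};

var spoken = null;
navigator.speech.listen(function (result) {
  // Echo the recognized utterance back via TTS.
  var reply = "You said: " + result.utterance;
  navigator.speech.speak(reply, function () {
    spoken = reply;
  }, { language: "en-US" });  // option name is an assumption
}, {});
```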
So if I read the examples correctly, you're not using grammars anywhere.
I wonder how well that works in real-world cases. Of course, if the
speech recognizer can handle everything well without grammars, result
validation could be done in JS after the result is received from the
recognizer. But I think having support for grammars simplifies coding
and can make speech dialogs somewhat more manageable.
W3C has already standardized things like
http://www.w3.org/TR/speech-grammar/ and
http://www.w3.org/TR/semantic-interpretation/
and the latter one works quite nicely with JS.
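For instance, a semantic interpretation result in the SISR style is just an ECMAScript object, so application-level validation in JS is straightforward. The result shape below is purely illustrative (what a pizza-ordering grammar might emit), not taken from any spec:

```javascript
// Illustrative: a recognizer using an SRGS pizza grammar with semantic
// interpretation tags might hand back an object like this.
var interpretation = { size: "large", topping: "pepperoni" };

// Application-level validation, done in plain JS after recognition:
function validOrder(o) {
  var sizes = ["small", "medium", "large"];
  return sizes.indexOf(o.size) !== -1 && typeof o.topping === "string";
}

console.log(validOrder(interpretation));  // true for the object above
```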
Again, I think this kind of discussion should happen in the W3C
Multimodal Interaction WG. Though I'm not sure how actively or how
openly that working group operates at the moment.
-Olli
The implementation uses an NPAPI plugin for the Android browser that
wraps the existing Android speech APIs. The code is available at
http://code.google.com/p/speech-api-browser-plugin/
There are some simple demo apps in
http://code.google.com/p/speech-api-browser-plugin/source/browse/trunk/android-plugin/demos/
including:
- English to Spanish speech-to-speech translation
- Google search by speaking a query
- The obligatory pizza ordering system
- A phone number dialer
Comments appreciated!
/Bjorn
On Fri, Dec 4, 2009 at 2:51 PM, Olli Pettay <[email protected]> wrote:
Indeed the API should be something significantly simpler than X+V.
Microsoft has (had?) support for SALT. That API is pretty simple and
provides speech recognition and TTS.
The API could probably be even simpler than SALT.
IIRC, there was an extension for Firefox to support SALT (well, there was
also an extension to support X+V).
If the platform/OS provides ASR and TTS, adding a JS API for it should
be pretty simple. X+V tries to handle some logic using the VoiceXML FIA
(Form Interpretation Algorithm), but I think it would be more web-like
to provide a pure JS API (similar to SALT).
Integrating visual and voice input could be done in scripts. I'd assume
there would be script libraries to handle multimodal input integration
- especially if there are touch and gesture events too. (Classic
multimodal map applications would become possible on the web.)
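As a sketch of what such script-level integration could look like (all names here are hypothetical, not from any library): a small helper that pairs a speech result with the most recent touch point, in the style of a classic "put that there" map interaction:

```javascript
// Hypothetical script-level fusion of voice and touch input.
// A real page would feed this from its speech and touch event handlers.
function MultimodalFusion(windowMs) {
  this.windowMs = windowMs;  // how close in time the two inputs must be
  this.lastTouch = null;
}
MultimodalFusion.prototype.onTouch = function (x, y, time) {
  this.lastTouch = { x: x, y: y, time: time };
};
MultimodalFusion.prototype.onSpeech = function (command, time) {
  // Combine only if a touch arrived within the time window.
  if (this.lastTouch && time - this.lastTouch.time <= this.windowMs) {
    return { command: command, x: this.lastTouch.x, y: this.lastTouch.y };
  }
  return null;  // no recent touch to anchor the spoken command to
};

var fusion = new MultimodalFusion(2000);
fusion.onTouch(120, 80, 1000);        // user taps the map
var action = fusion.onSpeech("zoom here", 1500);  // then speaks
// action is { command: "zoom here", x: 120, y: 80 }
```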
But all of this is something that should possibly be designed in or with
the W3C Multimodal working group. I know their current architecture is
way more complex, but X+V, SALT and even Multimodal-CSS have been
discussed in that working group.
-Olli
On 12/3/09 2:50 AM, Dave Burke wrote:
We're envisaging a simpler programmatic API that looks familiar to the
modern Web developer while avoiding the legacy of dialog system
languages.
Dave
On Wed, Dec 2, 2009 at 7:25 PM, João Eiras <[email protected]> wrote:
On Wed, 02 Dec 2009 12:32:07 +0100, Bjorn Bringert
<[email protected]> wrote:
We've been watching our colleagues build native apps that use
speech
recognition and speech synthesis, and would like to have JavaScript
APIs that let us do the same in web apps. We are thinking about
creating a lightweight and implementation-independent API that lets
web apps use speech services. Is anyone else interested in that?
Bjorn Bringert, David Singleton, Gummi Hafsteinsson
This exists already, but only Opera supports it, although there are
problems with the library we use for speech recognition.
http://www.w3.org/TR/xhtml+voice/
http://dev.opera.com/articles/view/add-voice-interactivity-to-your-site/
Would be nice to revive that specification and get vendor buy-in.
--
João Eiras
Core Developer, Opera Software ASA, http://www.opera.com/