Re: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization

Olli Pettay Mon, 09 Jan 2012 08:36:40 -0800

On 01/09/2012 06:17 PM, Young, Milan wrote:

To clarify, are you interested in developing the entirety of the JS API
we developed in the HTML Speech XG, or just the subset proposed by
Google?


Not sure if you sent the reply to me only on purpose.
CCing the WG and XG lists.

Since, from a practical point of view, the API+protocol the XG defined
is a huge thing to implement at once, it makes sense to implement it in
pieces. Something like:
(1) Initial API implementation: some subset of what the XG defined.
    Not necessarily exactly what Google proposed, but something close to
    it. Support for remote speech services could be in the initial API,
    but if the UA doesn't implement the protocol, it would just fail when
    trying to connect to remote services.
(2) Simultaneously or later (depending on the protocol standardization
    in the IETF or elsewhere), support remote speech services.
(3) Implement more of the API the XG defined (if needed by web
    developers or web services).
(4) Implement <reco>? I'm not at all convinced we need the reco element,
    since automatic value binding makes it just a bit strange and
    inconsistent.

This is the way web APIs tend to evolve: first implement something
quite small, and then add new features if/when needed.




-Olli


Thanks


-----Original Message-----
From: Olli Pettay [mailto:olli.pet...@helsinki.fi]
Sent: Monday, January 09, 2012 8:13 AM
To: Arthur Barstow
Cc: ext Satish S; Peter Beverloo; Glen Shires; public-webapps@w3.org;
public-xg-htmlspe...@w3.org; Dan Burnett
Subject: Re: Speech Recognition and Text-to-Speech Javascript API -
seeking feedback for eventual standardization

On 01/09/2012 04:59 PM, Arthur Barstow wrote:

Hi All,

As I indicated in [1], WebApps already has a relatively large number
of specs in progress and the group has agreed to add some new specs.
As such, to review any new charter addition proposals, I think we need
at least the following:

1. Relatively clear scope of the feature(s). (This information should
be detailed enough for WG members with relevant IP to be able to make
an IP assessment.)

2. Editor commitment(s)

3. Implementation commitments from at least two WG members

Is this really a requirement nowadays?
Is there, for example, a commitment to implement the File System API?
http://dev.w3.org/2009/dap/file-system/file-dir-sys.html

But anyway, I'm interested in implementing the speech API, and as far as
I know, other people involved with Mozilla have also shown interest.


4. Testing commitment(s)

Re the APIs in this thread -> I think Glen's API proposal [2]
adequately addresses #1 above, and his previous responses imply support
for #2, but it would be good for Glen et al. to confirm. Re #3, other
than Google, I don't believe any other implementor has voiced their
support for WebApps adding these APIs. As such, I think we need
additional input on implementation support (e.g. Apple, Microsoft,
Mozilla, Opera, etc.).

It doesn't matter too much to me in which group the API will be
developed (except that I'm against doing it in the HTML WG).
WebApps is a reasonably good place (if there are no IP issues).




-Olli


Re the markup question -> WebApps does have some precedent for defining
markup (e.g. XBL2, Widget XML config). I don't have a strong opinion on
whether or not WebApps should include the type of markup in the XG
Report. I think the next step here is for WG members to submit comments
on this question. In particular, proponents of including markup in
WebApps' charter should respond to #1-4 above.

-AB

[1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1474.html
[2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html




On 1/5/12 6:49 AM, ext Satish S wrote:


2) How does the draft integrate with the existing <input speech>
API [1]? It seems to me as if it'd be best to define both the attribute
and the DOM APIs in a single specification, also because they share
several events (yet don't seem to be interchangeable) and the
attribute already has an implementation.


The <input speech> API proposal was implemented as <input
x-webkit-speech> in Chromium a while ago. A lot of the developer
feedback we received was about finer-grained control, including a
Javascript API and letting the web application decide how to present
the user interface rather than tying it to the <input> element.

The HTML Speech Incubator Group's final report [1] includes a <reco>
element which addresses both these concerns and provides automatic
binding of speech recognition results to existing HTML elements. We
are not sure if the WebApps WG is a good place to work on
standardising such markup elements, hence did not include it in the
simplified Javascript API [2]. If there is sufficient interest and
scope in the WebApps WG charter for the Javascript API and markup, we
are happy to combine them both in the proposal.

[1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
[2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html




Thanks,
Peter

[1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0020/api-draft.html



On Thu, Jan 5, 2012 at 07:15, Glen Shires <gshi...@google.com> wrote:

As Dan Burnett wrote below: The HTML Speech Incubator Group [1] has
recently wrapped up its work on use cases, requirements, and proposals
for adding automatic speech recognition (ASR) and text-to-speech (TTS)
capabilities to HTML. The work of the group is documented in the
group's Final Report. [2]

The members of the group intend this work to be input to one or more
working groups, in W3C and/or other standards development organizations
such as the IETF, as an aid to developing full standards in this space.

Because that work was so broad, Art Barstow asked (below) for a
relatively specific proposal. We at Google are proposing that a subset
of it be accepted as a work item by the Web Applications WG.
Specifically, we are proposing this Javascript API [3], which enables
web developers to incorporate speech recognition and synthesis into
their web pages. This simplified subset enables developers to use
scripting to generate text-to-speech output and to use speech
recognition as an input for forms, continuous dictation and control,
and it supports the majority of use-cases in the Incubator Group's
Final Report.

We welcome your feedback and ask that the Web Applications WG
consider accepting this Javascript API [3] as a work item.

[1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter
[2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
[3] API: http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html


Bjorn Bringert
Satish Sampath
Glen Shires

On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires <gshi...@google.com> wrote:


Milan,
The IDLs contained in both documents are in the same format and order,
so it's relatively easy to compare the two side-by-side. The semantics
of the attributes, methods and events have not changed, and both IDLs
link directly to the definitions contained in the Speech XG Final
Report.

As you mention, we agree that the protocol portions of the Speech XG
Final Report are most appropriate for consideration by a group such as
IETF, and believe such work can proceed independently, particularly
because the Speech XG Final Report has provided a roadmap for these to
remain compatible.

Also, as shown in the Speech XG Final Report - Overview, the "Speech Web
API" is not dependent on the "Speech Protocol", and a "Default Speech"
service can be used for local or remote speech recognition and
synthesis.


Glen Shires


On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan <milan.yo...@nuance.com> wrote:


Hello Glen,



The proposal says that it contains a "simplified subset of the
JavaScript API". Could you please clarify which elements of the
HTMLSpeech recommendation's JavaScript API were omitted? I think this
would be the most efficient way for those of us familiar with the XG
recommendation to evaluate the new proposal.

I'd also appreciate clarification on how you see the protocol being
handled. In the HTMLSpeech group we were thinking about this as a
hand-in-hand relationship between W3C and IETF, like WebSockets. Is
this still your (and Google's) vision?



Thanks





From: Glen Shires [mailto:gshi...@google.com]
Sent: Thursday, December 22, 2011 11:14 AM
To: public-webapps@w3.org; Arthur Barstow
Cc: public-xg-htmlspe...@w3.org; Dan Burnett
Subject: Re: HTML Speech XG Completes, seeks feedback for eventual
standardization



We at Google believe that a scripting-only (Javascript) subset of the
API defined in the Speech XG Incubator Group Final Report is of
appropriate scope for consideration by the WebApps WG.

The enclosed scripting-only subset supports the majority of the
use-cases and samples in the XG proposal. Specifically, it enables
web-pages to generate speech output and to use speech recognition as
an input for forms, continuous dictation and control. The Javascript
API will allow web pages to control activation and timing and to
handle results and alternatives.

We welcome your feedback and ask that the Web Applications WG consider
accepting this as a work item.



Bjorn Bringert
Satish Sampath
Glen Shires



On Tue, Dec 13, 2011 at 11:39 AM, Glen Shires <gshi...@google.com> wrote:


We at Google believe that a scripting-only (Javascript) subset of the
API defined in the Speech XG Incubator Group Final Report [1] is of
appropriate scope for consideration by the WebApps WG.

A scripting-only subset supports the majority of the use-cases and
samples in the XG proposal. Specifically, it enables web-pages to
generate speech output and to use speech recognition as an input for
forms, continuous dictation and control. The Javascript API will allow
web pages to control activation and timing and to handle results and
alternatives.




As Dan points out above, we envision that different portions of the
Incubator Group Final Report are applicable to different working groups
"in W3C and/or other standards development organizations such as the
IETF". This scripting API subset does not preclude other groups from
pursuing standardization of relevant HTML markup or underlying
transport protocols, and indeed the Incubator Group Final Report
defines a potential roadmap so that such additions can remain
compatible.

To make this more concrete, Google will provide to this mailing list a
specific proposal extracted from the Incubator Group Final Report that
includes only those portions we believe are relevant to WebApps, with
links back to the Incubator Report as appropriate.



Bjorn Bringert
Satish Sampath
Glen Shires



[1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/



On Tue, Dec 13, 2011 at 5:32 AM, Dan Burnett <dburn...@voxeo.com> wrote:


Thanks for the info, Art. To be clear, I personally am *NOT* proposing
adding any specs to WebApps, although others might. My email below as a
Chair of the group is merely to inform people of this work and ask for
feedback.
I expect that your information will be useful for others who might wish
for some of this work to continue in WebApps.

-- dan



On Dec 13, 2011, at 7:06 AM, Arthur Barstow wrote:

Hi Dan,

WebApps already has a relatively large number of specs in progress (see
[PubStatus]) and the group has agreed to add some additional specs (see
[CharterChanges]). As such, please provide a relatively specific
proposal about the features/specs you and other proponents would like
to add to WebApps.

Regarding the level of detail for your proposal, I think a reasonable
precedent is something like the Gamepad and Pointer/MouseLock proposals
(see [CharterChanges]). (Perhaps this could be achieved by identifying
specific sections in the XG's Final Report?)

-Art Barstow

[PubStatus] http://www.w3.org/2008/webapps/wiki/PubStatus#API_Specifications
[CharterChanges] http://www.w3.org/2008/webapps/wiki/CharterChanges#Additions_Agreed


On 12/12/11 5:25 PM, ext Dan Burnett wrote:

Dear WebApps people,

The HTML Speech Incubator Group [1] has recently wrapped up its work on
use cases, requirements, and proposals for adding automatic speech
recognition (ASR) and text-to-speech (TTS) capabilities to HTML. The
work of the group is documented in the group's Final Report. [2]

The members of the group intend this work to be input to one or more
working groups, in W3C and/or other standards development organizations
such as the IETF, as an aid to developing full standards in this space.

Whether the W3C work happens in a new Working Group or an existing one,
we are interested in collecting feedback on the Incubator Group's work.
We are specifically interested in input from the members of the WebApps
Working Group.

If you have any feedback to share, please send it to, or cc, the
group's mailing list (public-xg-htmlspe...@w3.org). This will allow
comments to be archived in one consistent location for use by whatever
group takes up this work.


Dan Burnett, Co-Chair
HTML Speech Incubator Group


[1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter
[2] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/


p.s. This feedback request is being sent to the following groups:
WebApps, HTML, Audio, DAP, Voice Browser, Multimodal Interaction
