Re: Speech Recognition and Text-to-Speech Javascript API - seeking feedback for eventual standardization

Arthur Barstow Mon, 09 Jan 2012 07:03:06 -0800

Hi All,

As I indicated in [1], WebApps already has a relatively large number of specs in progress and the group has agreed to add some new specs. As such, to review any new charter addition proposals, I think we need at least the following:

1. Relatively clear scope of the feature(s). (This information should be detailed enough for WG members with relevant IP to be able to make an IP assessment.)


2. Editor commitment(s)

3. Implementation commitments from at least two WG members

4. Testing commitment(s)

Re the APIs in this thread -> I think Glen's API proposal [2] adequately addresses #1 above and his previous responses imply support for #2 but it would be good for Glen, et al. to confirm. Re #3, other than Google, I don't believe any other implementor has voiced their support for WebApps adding these APIs. As such, I think we need additional input on implementation support (e.g. Apple, Microsoft, Mozilla, Opera, etc.).

Re the markup question -> WebApps does have some precedent for defining markup (e.g. XBL2, Widget XML config). I don't have a strong opinion on whether or not WebApps should include the type of markup in the XG Report. I think the next step here is for WG members to submit comments on this question. In particular, proponents of including markup in WebApps' charter should respond to #1-4 above.


-AB

[1] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/1474.html

[2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html



On 1/5/12 6:49 AM, ext Satish S wrote:


    2) How does the draft incorporate with the existing <input speech>
    API[1]? It seems to me as if it'd be best to define both the attribute
    and the DOM APIs in a single specification, also because they share
    several events (yet don't seem to be interchangeable) and the
    attribute already has an implementation.

The <input speech> API proposal was implemented as <input x-webkit-speech> in Chromium a while ago. A lot of the developer feedback we received was about finer grained control including a Javascript API and letting the web application decide how to present the user interface rather than tying it to the <input> element.

The HTML Speech Incubator Group's final report [1] includes a <reco> element which addresses both these concerns and provides automatic binding of speech recognition results to existing HTML elements. We are not sure if the WebApps WG is a good place to work on standardising such markup elements, hence did not include it in the simplified Javascript API [2]. If there is sufficient interest and scope in the WebApps WG charter for the Javascript API and markup, we are happy to combine them both in the proposal.


[1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/

[2] http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html



    Thanks,
    Peter

    [1] http://lists.w3.org/Archives/Public/public-xg-htmlspeech/2011Feb/att-0020/api-draft.html

    On Thu, Jan 5, 2012 at 07:15, Glen Shires <gshi...@google.com> wrote:
    > As Dan Burnett wrote below: The HTML Speech Incubator Group [1] has recently
    > wrapped up its work on use cases, requirements, and proposals for adding
    > automatic speech recognition (ASR) and text-to-speech (TTS) capabilities to
    > HTML. The work of the group is documented in the group's Final Report. [2]
    > The members of the group intend this work to be input to one or more
    > working groups, in W3C and/or other standards development organizations such
    > as the IETF, as an aid to developing full standards in this space.
    >
    > Because that work was so broad, Art Barstow asked (below) for a relatively
    > specific proposal. We at Google are proposing that a subset of it be
    > accepted as a work item by the Web Applications WG. Specifically, we are
    > proposing this Javascript API [3], which enables web developers to
    > incorporate speech recognition and synthesis into their web pages.
    > This simplified subset enables developers to use scripting to generate
    > text-to-speech output and to use speech recognition as an input for forms,
    > continuous dictation and control, and it supports the majority of use-cases
    > in the Incubator Group's Final Report.
    >
    > We welcome your feedback and ask that the Web Applications WG
    > consider accepting this Javascript API [3] as a work item.
    >
    > [1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter
    > [2] report: http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
    > [3] API: http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/att-1696/speechapi.html
    >
    > Bjorn Bringert
    > Satish Sampath
    > Glen Shires
    >
    > On Thu, Dec 22, 2011 at 11:38 AM, Glen Shires <gshi...@google.com> wrote:
    >>
    >> Milan,
    >> The IDLs contained in both documents are in the same format and order, so
    >> it's relatively easy to compare the two side-by-side. The semantics of the
    >> attributes, methods and events have not changed, and both IDLs link directly
    >> to the definitions contained in the Speech XG Final Report.
    >>
    >> As you mention, we agree that the protocol portions of the Speech XG Final
    >> Report are most appropriate for consideration by a group such as IETF, and
    >> believe such work can proceed independently, particularly because the Speech
    >> XG Final Report has provided a roadmap for these to remain compatible.
    >> Also, as shown in the Speech XG Final Report - Overview, the "Speech Web
    >> API" is not dependent on the "Speech Protocol" and a "Default Speech"
    >> service can be used for local or remote speech recognition and synthesis.
    >>
    >> Glen Shires
    >>
    >>
    >> On Thu, Dec 22, 2011 at 10:32 AM, Young, Milan <milan.yo...@nuance.com> wrote:
    >>>
    >>> Hello Glen,
    >>>
    >>> The proposal says that it contains a “simplified subset of the JavaScript
    >>> API”. Could you please clarify which elements of the HTMLSpeech
    >>> recommendation’s JavaScript API were omitted? I think this would be the
    >>> most efficient way for those of us familiar with the XG recommendation to
    >>> evaluate the new proposal.
    >>>
    >>> I’d also appreciate clarification on how you see the protocol being
    >>> handled. In the HTMLSpeech group we were thinking about this as a
    >>> hand-in-hand relationship between W3C and IETF like WebSockets. Is this
    >>> still your (and Google’s) vision?
    >>>
    >>> Thanks
    >>>
    >>> From: Glen Shires [mailto:gshi...@google.com]
    >>> Sent: Thursday, December 22, 2011 11:14 AM
    >>> To: public-webapps@w3.org; Arthur Barstow
    >>> Cc: public-xg-htmlspe...@w3.org; Dan Burnett
    >>> Subject: Re: HTML Speech XG Completes, seeks feedback for eventual
    >>> standardization
    >>>
    >>>
    >>>
    >>> We at Google believe that a scripting-only (Javascript) subset of the API
    >>> defined in the Speech XG Incubator Group Final Report is of appropriate
    >>> scope for consideration by the WebApps WG.
    >>>
    >>> The enclosed scripting-only subset supports the majority of the use-cases
    >>> and samples in the XG proposal. Specifically, it enables web-pages to
    >>> generate speech output and to use speech recognition as an input for forms,
    >>> continuous dictation and control. The Javascript API will allow web pages to
    >>> control activation and timing and to handle results and alternatives.
    >>>
    >>> We welcome your feedback and ask that the Web Applications WG consider
    >>> accepting this as a work item.
    >>>
    >>>
    >>>
    >>> Bjorn Bringert
    >>>
    >>> Satish Sampath
    >>>
    >>> Glen Shires
    >>>
    >>>
    >>>
    >>> On Tue, Dec 13, 2011 at 11:39 AM, Glen Shires <gshi...@google.com> wrote:
    >>>
    >>> We at Google believe that a scripting-only (Javascript) subset of the API
    >>> defined in the Speech XG Incubator Group Final Report [1] is of appropriate
    >>> scope for consideration by the WebApps WG.
    >>>
    >>> A scripting-only subset supports the majority of the use-cases and
    >>> samples in the XG proposal. Specifically, it enables web-pages to generate
    >>> speech output and to use speech recognition as an input for forms,
    >>> continuous dictation and control. The Javascript API will allow web pages to
    >>> control activation and timing and to handle results and alternatives.
    >>>
    >>> As Dan points out above, we envision that different portions of the
    >>> Incubator Group Final Report are applicable to different working groups "in
    >>> W3C and/or other standards development organizations such as the IETF".
    >>> This scripting API subset does not preclude other groups from pursuing
    >>> standardization of relevant HTML markup or underlying transport protocols,
    >>> and indeed the Incubator Group Final Report defines a potential roadmap such
    >>> that such additions can be compatible.
    >>>
    >>> To make this more concrete, Google will provide to this mailing list a
    >>> specific proposal extracted from the Incubator Group Final Report, that
    >>> includes only those portions we believe are relevant to WebApps, with links
    >>> back to the Incubator Report as appropriate.
    >>>
    >>>
    >>>
    >>> Bjorn Bringert
    >>>
    >>> Satish Sampath
    >>>
    >>> Glen Shires
    >>>
    >>>
    >>>
    >>> [1] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
    >>>
    >>>
    >>>
    >>> On Tue, Dec 13, 2011 at 5:32 AM, Dan Burnett <dburn...@voxeo.com> wrote:
    >>>
    >>> Thanks for the info, Art. To be clear, I personally am *NOT* proposing
    >>> adding any specs to WebApps, although others might. My email below as a
    >>> Chair of the group is merely to inform people of this work and ask for
    >>> feedback.
    >>> I expect that your information will be useful for others who might wish
    >>> for some of this work to continue in WebApps.
    >>>
    >>> -- dan
    >>>
    >>>
    >>>
    >>> On Dec 13, 2011, at 7:06 AM, Arthur Barstow wrote:
    >>>
    >>> > Hi Dan,
    >>> >
    >>> > WebApps already has a relatively large number of specs in progress (see
    >>> > [PubStatus]) and the group has agreed to add some additional specs (see
    >>> > [CharterChanges]). As such, please provide a relatively specific proposal
    >>> > about the features/specs you and other proponents would like to add to
    >>> > WebApps.
    >>> >
    >>> > Regarding the level of detail for your proposal, I think a reasonable
    >>> > precedent is something like the Gamepad and Pointer/MouseLock proposals
    >>> > (see [CharterChanges]). (Perhaps this could be achieved by identifying
    >>> > specific sections in the XG's Final Report?)
    >>> >
    >>> > -Art Barstow
    >>> >
    >>> > [PubStatus]
    >>> > http://www.w3.org/2008/webapps/wiki/PubStatus#API_Specifications
    >>> > [CharterChanges]
    >>> > http://www.w3.org/2008/webapps/wiki/CharterChanges#Additions_Agreed
    >>> >
    >>> > On 12/12/11 5:25 PM, ext Dan Burnett wrote:
    >>> >> Dear WebApps people,
    >>> >>
    >>> >> The HTML Speech Incubator Group [1] has recently wrapped up its work
    >>> >> on use cases, requirements, and proposals for adding automatic speech
    >>> >> recognition (ASR) and text-to-speech (TTS) capabilities to HTML. The work
    >>> >> of the group is documented in the group's Final Report. [2]
    >>> >>
    >>> >> The members of the group intend this work to be input to one or more
    >>> >> working groups, in W3C and/or other standards development organizations such
    >>> >> as the IETF, as an aid to developing full standards in this space.
    >>> >> Whether the W3C work happens in a new Working Group or an existing
    >>> >> one, we are interested in collecting feedback on the Incubator Group's work.
    >>> >> We are specifically interested in input from the members of the WebApps
    >>> >> Working Group.
    >>> >>
    >>> >> If you have any feedback to share, please send it to, or cc, the
    >>> >> group's mailing list (public-xg-htmlspe...@w3.org). This will allow
    >>> >> comments to be archived in one consistent location for use by whatever group
    >>> >> takes up this work.
    >>> >>
    >>> >>
    >>> >> Dan Burnett, Co-Chair
    >>> >> HTML Speech Incubator Group
    >>> >>
    >>> >>
    >>> >> [1] charter: http://www.w3.org/2005/Incubator/htmlspeech/charter
    >>> >> [2] http://www.w3.org/2005/Incubator/htmlspeech/XGR-htmlspeech/
    >>> >>
    >>> >> p.s. This feedback request is being sent to the following groups:
    >>> >> WebApps, HTML, Audio, DAP, Voice Browser, Multimodal Interaction
    >>>
    >>>
    >>>
    >>>
    >>
    >>
    >
