Re: Intent to ship: Web Speech API

2019-10-14 Thread Henri Sivonen
On Sat, Oct 12, 2019 at 12:29 PM Andre Natal  wrote:
> We tried to capture everything here [1], so please if you don't see your
> question addressed in this document, just give us a shout either here in
> the thread or directly.
...
> [1]
> https://docs.google.com/document/d/1BE90kgbwE37fWoQ8vqnsQ3YMiJCKJSvqQwa463yCN1Y/edit?ts=5da0f63f#

Thanks. It doesn't address the question of what the UI in Firefox is
like. Following the links for experimenting with the UI on one's own
leads to https://mdn.github.io/web-speech-api/speech-color-changer/ ,
which doesn't work in Nightly even with prefs flipped.

(Trying that example in Chrome shows that Chrome presents the
permission prompt as a matter of sharing the microphone with
mdn.github.io as if this was WebRTC, which suggests that mdn.github.io
decides where the audio goes. Chrome does not surface that, if I
understand correctly how this API works in Chrome, the audio is
instead sent to a destination of Chrome's choosing and not to a
destination of mdn.github.io's choosing. The example didn't work for
me in Safari.)
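
For readers who have not used the recognition half of the API, here is a
minimal TypeScript sketch of what a page like the color-changer demo does.
Only `SpeechRecognition`, Chrome's prefixed `webkitSpeechRecognition`,
`lang`, `onresult`, `start()` and the `results[i][j].transcript` shape come
from the Web Speech API; the names `RecognitionLike`, `findRecognitionCtor`
and `listenOnce` are hypothetical helpers made up for this example. Note that
the page only ever sees transcripts, so nothing in this sketch says where the
browser sends the audio.

```typescript
// Minimal structural types so the sketch compiles without DOM typings.
interface RecognitionAlternative {
  transcript: string;
  confidence: number;
}

interface RecognitionLike {
  lang: string;
  onresult:
    | ((event: { results: ArrayLike<ArrayLike<RecognitionAlternative>> }) => void)
    | null;
  start(): void;
}

type RecognitionCtor = new () => RecognitionLike;

// Feature-detect the constructor; Chrome exposes it under a webkit prefix.
function findRecognitionCtor(scope: Record<string, unknown>): RecognitionCtor | null {
  const candidate = scope["SpeechRecognition"] ?? scope["webkitSpeechRecognition"];
  return typeof candidate === "function" ? (candidate as RecognitionCtor) : null;
}

// Start one recognition session and hand the first transcript to `onText`.
// Returns false when the API is not available in `scope`.
function listenOnce(
  scope: Record<string, unknown>,
  onText: (text: string) => void
): boolean {
  const Ctor = findRecognitionCtor(scope);
  if (Ctor === null) return false;

  const recognition = new Ctor();
  recognition.lang = "en-US";
  recognition.onresult = (event) => {
    const firstResult = event.results[0];
    if (firstResult && firstResult[0]) onText(firstResult[0].transcript);
  };
  recognition.start(); // the browser's permission prompt appears around here
  return true;
}
```

In a browser this would be called as
`listenOnce(window as unknown as Record<string, unknown>, console.log)`.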

--
Henri Sivonen
hsivo...@mozilla.com
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: Web Speech API

2019-10-12 Thread Andre Natal
Hi Fabrice,

Thanks for letting us know. I'm cc'ing jvo here so she can open them.


On Sun, Oct 13, 2019, 1:53 AM Fabrice Desre  wrote:

>  Hi André :)
>
> The links to the last 3 docs seem to not be publicly accessible:
> - Are you adding voice commands to Firefox?
>  -> mana is not public.
>
> What’s next?
>   -> private google doc.
>
> Have a question not addressed here?
>   -> private slack channel.
>
> On 10/12/19 3:14 PM, Andre Natal wrote:
> > The doc should be open now, please let us know if you still can't access it.
> >
> > On Sat, Oct 12, 2019, 4:40 PM  wrote:
> >
> >> The link to the FAQ is posted in the public group, in a thread meant for
> >> audiences outside MoCo.  Please consider opening the doc to be readable to
> >> everyone, or at least copy the questions which already have answers (that
> >> you consider "done") in a reply to this thread.
> >>
> >> Thanks
> >> Tomislav
> >>
> >>
> >> On Saturday, October 12, 2019 at 11:29:55 AM UTC+2, Andre Natal wrote:
> >>> sorry for the delay, but besides the patch itself we were working on an FAQ
> >>> to address all the questions raised in this thread along with others we got
> >>> from other teams.
> >>>
> >>>
> >>> [1]
> >>> https://docs.google.com/document/d/1BE90kgbwE37fWoQ8vqnsQ3YMiJCKJSvqQwa463yCN1Y/edit?ts=5da0f63f#


Re: Intent to ship: Web Speech API

2019-10-12 Thread Andre Natal
I believe there was a slight misunderstanding. The current work covers only
the recognition part of the API. The synthesis part landed a while ago and
is already enabled by default. You can find some documentation here:

https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

On Sat, Oct 12, 2019 at 4:23 PM Gijs Kruitbosch wrote:

> The document says "The WebSpeech API allows websites to enable speech
> input within their experiences." and that it is emphatically NOT
> "Text-to-speech/narration".
>
> This doesn't correspond to the original email here:
>
> > As of October 11th, the Emerging Technologies Team intend to turn "Web
> > Speech API" on by default in *Nightly only* on Mac, Windows, and Linux. It
> > has been developed behind the "media.webspeech.recognition.*" and
> > "media.webspeech.synth" preference.
>
> What is the "synth" part if not speech synthesis ie TTS ?
>
> ~ Gijs
>
>
> On 12/10/2019 10:29, Andre Natal wrote:
> > Hello everyone,
> >
> > sorry for the delay, but besides the patch itself we were working on an FAQ
> > to address all the questions raised in this thread along with others we got
> > from other teams.
> >
> > We tried to capture everything here [1], so please if you don't see your
> > question addressed in this document, just give us a shout either here in
> > the thread or directly.
> >
> > Also see below the actual phab [2] and the bug [3] for more information.
> >
> > [1]
> > https://docs.google.com/document/d/1BE90kgbwE37fWoQ8vqnsQ3YMiJCKJSvqQwa463yCN1Y/edit?ts=5da0f63f#
> >
> > [2] https://phabricator.services.mozilla.com/D26047
> >
> > [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1248897
> >
> > Thanks,
> >
> > Andre
> >
> > On Wed, Oct 9, 2019 at 4:40 AM Marcos Caceres wrote:
> >
> >> On Monday, October 7, 2019 at 12:55:23 PM UTC+11, Marcos Caceres wrote:
> >>> As of October 11th, the Emerging Technologies Team intend to turn "Web
> >>> Speech API" on by default in *Nightly only* on Mac, Windows, and Linux. It
> >>> has been developed behind the "media.webspeech.recognition.*" and
> >>> "media.webspeech.synth" preference.
> >>
> >> Note that because this is only being pref'ed on in Nightly, it should be
> >> considered a kind of "intent to experiment". This is to allow the ET team
> >> to get a better understanding of what needs to be fixed to get better
> >> interop and what needs to be fixed in the spec. Concerns with the current
> >> spec are outlined in:
> >>
> >> https://github.com/mozilla/standards-positions/issues/170
> >>
> >> Collaboration with Google folks is ongoing to address some of those at the
> >> spec level.

-- 
Thanks,

Andre


Re: Intent to ship: Web Speech API

2019-10-12 Thread Fabrice Desre
 Hi André :)

The links to the last 3 docs seem to not be publicly accessible:
- Are you adding voice commands to Firefox?
 -> mana is not public.

What’s next?
  -> private google doc.

Have a question not addressed here?
  -> private slack channel.

On 10/12/19 3:14 PM, Andre Natal wrote:
> The doc should be open now, please let us know if you still can't access it.
> 
> On Sat, Oct 12, 2019, 4:40 PM  wrote:
> 
>> The link to the FAQ is posted in the public group, in a thread meant for
>> audiences outside MoCo.  Please consider opening the doc to be readable to
>> everyone, or at least copy the questions which already have answers (that
>> you consider "done") in a reply to this thread.
>>
>> Thanks
>> Tomislav
>>
>>
>> On Saturday, October 12, 2019 at 11:29:55 AM UTC+2, Andre Natal wrote:
>>> sorry for the delay, but besides the patch itself we were working on an FAQ
>>> to address all the questions raised in this thread along with others we got
>>> from other teams.
>>>
>>>
>>> [1]
>>> https://docs.google.com/document/d/1BE90kgbwE37fWoQ8vqnsQ3YMiJCKJSvqQwa463yCN1Y/edit?ts=5da0f63f#


Re: Intent to ship: Web Speech API

2019-10-12 Thread Andre Natal
The doc should be open now, please let us know if you still can't access it.

On Sat, Oct 12, 2019, 4:40 PM  wrote:

> The link to the FAQ is posted in the public group, in a thread meant for
> audiences outside MoCo.  Please consider opening the doc to be readable to
> everyone, or at least copy the questions which already have answers (that
> you consider "done") in a reply to this thread.
>
> Thanks
> Tomislav
>
>
> On Saturday, October 12, 2019 at 11:29:55 AM UTC+2, Andre Natal wrote:
> > sorry for the delay, but besides the patch itself we were working on an FAQ
> > to address all the questions raised in this thread along with others we got
> > from other teams.
> >
> >
> > [1]
> > https://docs.google.com/document/d/1BE90kgbwE37fWoQ8vqnsQ3YMiJCKJSvqQwa463yCN1Y/edit?ts=5da0f63f#


Re: Intent to ship: Web Speech API

2019-10-12 Thread tomica
The link to the FAQ is posted in the public group, in a thread meant for 
audiences outside MoCo.  Please consider opening the doc to be readable to 
everyone, or at least copy the questions which already have answers (that you 
consider "done") in a reply to this thread.

Thanks
Tomislav


On Saturday, October 12, 2019 at 11:29:55 AM UTC+2, Andre Natal wrote:
> sorry for the delay, but besides the patch itself we were working on an FAQ
> to address all the questions raised in this thread along with others we got
> from other teams.
> 
> 
> [1] 
> https://docs.google.com/document/d/1BE90kgbwE37fWoQ8vqnsQ3YMiJCKJSvqQwa463yCN1Y/edit?ts=5da0f63f#


Re: Intent to ship: Web Speech API

2019-10-12 Thread Gijs Kruitbosch
The document says "The WebSpeech API allows websites to enable speech 
input within their experiences." and that it is emphatically NOT 
"Text-to-speech/narration".


This doesn't correspond to the original email here:

> As of October 11th, the Emerging Technologies Team intend to turn "Web
> Speech API" on by default in *Nightly only* on Mac, Windows, and Linux. It
> has been developed behind the "media.webspeech.recognition.*" and
> "media.webspeech.synth" preference.

What is the "synth" part if not speech synthesis, i.e. TTS?

~ Gijs


On 12/10/2019 10:29, Andre Natal wrote:
> Hello everyone,
>
> sorry for the delay, but besides the patch itself we were working on an FAQ
> to address all the questions raised in this thread along with others we got
> from other teams.
>
> We tried to capture everything here [1], so please if you don't see your
> question addressed in this document, just give us a shout either here in
> the thread or directly.
>
> Also see below the actual phab [2] and the bug [3] for more information.
>
> [1]
> https://docs.google.com/document/d/1BE90kgbwE37fWoQ8vqnsQ3YMiJCKJSvqQwa463yCN1Y/edit?ts=5da0f63f#
>
> [2] https://phabricator.services.mozilla.com/D26047
>
> [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1248897
>
> Thanks,
>
> Andre
>
> On Wed, Oct 9, 2019 at 4:40 AM Marcos Caceres  wrote:
>
>> On Monday, October 7, 2019 at 12:55:23 PM UTC+11, Marcos Caceres wrote:
>>> As of October 11th, the Emerging Technologies Team intend to turn "Web
>>> Speech API" on by default in *Nightly only* on Mac, Windows, and Linux. It
>>> has been developed behind the "media.webspeech.recognition.*" and
>>> "media.webspeech.synth" preference.
>>
>> Note that because this is only being pref'ed on in Nightly, it should be
>> considered a kind of "intent to experiment". This is to allow the ET team
>> to get a better understanding of what needs to be fixed to get better
>> interop and what needs to be fixed in the spec. Concerns with the current
>> spec are outlined in:
>>
>> https://github.com/mozilla/standards-positions/issues/170
>>
>> Collaboration with Google folks is ongoing to address some of those at the
>> spec level.


Re: Intent to ship: Web Speech API

2019-10-12 Thread Andre Natal
Hello everyone,

Sorry for the delay. Besides the patch itself, we were working on an FAQ
to address all the questions raised in this thread, along with others we got
from other teams.

We tried to capture everything here [1], so if you don't see your question
addressed in this document, please give us a shout either here in the thread
or directly.

Also see below the actual phab [2] and the bug [3] for more information.

[1]
https://docs.google.com/document/d/1BE90kgbwE37fWoQ8vqnsQ3YMiJCKJSvqQwa463yCN1Y/edit?ts=5da0f63f#


[2] https://phabricator.services.mozilla.com/D26047

[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1248897

Thanks,

Andre

On Wed, Oct 9, 2019 at 4:40 AM Marcos Caceres  wrote:

> On Monday, October 7, 2019 at 12:55:23 PM UTC+11, Marcos Caceres wrote:
> > As of October 11th, the Emerging Technologies Team intend to turn "Web
> > Speech API" on by default in *Nightly only* on Mac, Windows, and Linux. It
> > has been developed behind the "media.webspeech.recognition.*" and
> > "media.webspeech.synth" preference.
>
> Note that because this is only being pref'ed on in Nightly, it should be
> considered a kind of "intent to experiment". This is to allow the ET team
> to get a better understanding of what needs to be fixed to get better
> interop and what needs to be fixed in the spec. Concerns with the current
> spec are outlined in:
>
> https://github.com/mozilla/standards-positions/issues/170
>
> Collaboration with Google folks is ongoing to address some of those at the
> spec level.


-- 
Thanks,

Andre


Re: Intent to ship: Web Speech API

2019-10-08 Thread Marcos Caceres
On Monday, October 7, 2019 at 12:55:23 PM UTC+11, Marcos Caceres wrote:
> As of October 11th, the Emerging Technologies Team intend to turn "Web Speech 
> API" on by default in *Nightly only* on Mac, Windows, and Linux. It has been 
> developed behind the "media.webspeech.recognition.*" and 
> "media.webspeech.synth" preference. 
> 

Note that because this is only being pref'ed on in Nightly, it should be
considered a kind of "intent to experiment". This is to allow the ET team to
get a better understanding of what needs to be fixed to get better interop and
what needs to be fixed in the spec. Concerns with the current spec are
outlined in:

https://github.com/mozilla/standards-positions/issues/170

Collaboration with Google folks is ongoing to address some of those at the spec
level.


Re: Intent to ship: Web Speech API

2019-10-08 Thread Marcos Caceres
(Apologies for top-posting. I've asked the folks from ET to reply to the 
questions - Andre said he will respond soon! I was just helping them post the 
Intent, but I'm personally not involved with the implementation so I can't 
answer these really good questions... I'm just helping with our process stuff 
:)).

On Monday, October 7, 2019 at 8:32:18 PM UTC+11, Jonathan Kew wrote:
> On 07/10/2019 09:53, Henri Sivonen wrote:
> > On Mon, Oct 7, 2019 at 5:00 AM Marcos Caceres  wrote:
> 
> >>   - speech is processed in our cloud servers, not on device.
> > 
> > What should one read to understand the issues that lead to this change?
> 
> +1. This seems like a change of direction which has *huge* implications 
> for issues like availability (the feature doesn't work if my device is 
> offline?), privacy (my device is sending microphone input to the 
> cloud?), and cost (how much of my expensive metered data does this 
> gobble up?) that need to be openly considered and discussed.
> 
> The original "Intent to prototype" seemed to be about an entirely 
> device-local feature, which means it had fundamentally different 
> characteristics.
> 
> Thanks,
> 
> JK



Re: Intent to ship: Web Speech API

2019-10-07 Thread Jonathan Kew

On 07/10/2019 09:53, Henri Sivonen wrote:
> On Mon, Oct 7, 2019 at 5:00 AM Marcos Caceres  wrote:
>
>>   - speech is processed in our cloud servers, not on device.
>
> What should one read to understand the issues that lead to this change?

+1. This seems like a change of direction which has *huge* implications
for issues like availability (the feature doesn't work if my device is
offline?), privacy (my device is sending microphone input to the
cloud?), and cost (how much of my expensive metered data does this
gobble up?) that need to be openly considered and discussed.

The original "Intent to prototype" seemed to be about an entirely
device-local feature, which means it had fundamentally different
characteristics.


Thanks,

JK


Re: Intent to ship: Web Speech API

2019-10-07 Thread Gijs Kruitbosch

On 07/10/2019 02:55, Marcos Caceres wrote:
>  - speech is processed in our cloud servers, not on device.


Is this the case for both recognition and synthesizing? It's not clear 
from this concise description.


Also, hasn't window.speechSynthesis been shipped before now? It's used 
from e.g. reader mode's "narrate" functionality, and has been for quite 
a while, including on release...
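
For reference, the synthesis half of the API is small enough to show in a
few lines. This is a minimal TypeScript sketch of how a web page can speak a
string; it is not a description of how reader mode's "narrate" is
implemented. `speechSynthesis.speak()`, `SpeechSynthesisUtterance`, `lang`
and `rate` are real Web Speech API members; `SynthScope`, `UtteranceLike` and
`narrate` are hypothetical names used only so the example is self-contained.

```typescript
// Structural stand-ins for the DOM types, so no DOM typings are required.
interface UtteranceLike {
  text: string;
  lang: string;
  rate: number;
}

interface SynthScope {
  speechSynthesis?: { speak(utterance: UtteranceLike): void };
  SpeechSynthesisUtterance?: new (text: string) => UtteranceLike;
}

// Speak `text` if the synthesis API is present; return false otherwise.
function narrate(scope: SynthScope, text: string, lang: string = "en-US"): boolean {
  const synth = scope.speechSynthesis;
  const Utterance = scope.SpeechSynthesisUtterance;
  if (!synth || !Utterance) return false;

  const utterance = new Utterance(text);
  utterance.lang = lang; // BCP 47 language tag
  utterance.rate = 1;    // 1 is the default speaking rate
  synth.speak(utterance);
  return true;
}
```

In a browser, `narrate(window as unknown as SynthScope, "Hello")` would queue
the utterance on the page's `speechSynthesis` object.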


~ Gijs


Re: Intent to ship: Web Speech API

2019-10-07 Thread Henri Sivonen
On Mon, Oct 7, 2019 at 5:00 AM Marcos Caceres  wrote:
>  - The updated implementation more closely aligns with Chrome's 
> implementation - meaning we get better interop across significant sites.

What site can one try to get an idea of what the user interface is like?

>  - speech is processed in our cloud servers, not on device.

What should one read to understand the issues that lead to this change?

-- 
Henri Sivonen
hsivo...@mozilla.com


Re: Intent to ship: Web Speech API - Synthesis

2015-11-02 Thread Xidorn Quan
On Tue, Nov 3, 2015 at 9:42 AM, Eitan Isaacson  wrote:
> As of this week, I intend to turn on the speech synthesis API on by default
> for the desktop browser. It has been enabled in b2g for the last few years.
> Thanks to work by Makoto Kato, and Yash Girdhar, we now have support for
> speech in all desktop platforms.
>
> The pref that will be flipped is "media.webspeech.synth.enabled". The bug
> tracking speech in desktop platforms is bug 1003439.
>
> The spec is okay-ish. There are no MDN docs yet, all in due time, I guess.

There is a template you may want to follow for Intent to ship email.
If there was no Intent to implement email before, probably some
content there should be included as well.
https://wiki.mozilla.org/WebAPI/ExposureGuidelines#Email_templates

I think the main things absent here are: target release, devtools bug,
and other UAs' status.

- Xidorn


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-18 Thread Andre Natal
Chris,

I have been discussing this with the Sphinx leads, and we can build models
from audiobooks as well.

This approach saves a lot of time and improves quality, since the narration
is accurate and clear.

We are currently defining a way to create Hindi and Brazilian Portuguese
models.

Thanks

Andre
On Oct 30, 2014 5:47 PM, Chris Hofmann chofm...@mozilla.com wrote:

 On 10/30/14 5:24 PM, smaug wrote:

 On 10/31/2014 02:21 AM, smaug wrote:

 Intent to ship is too strong for this.
 We need to first have implementation landed and tested ;)

 I wouldn't ship the implementation in desktop FF without plenty of more
 testing.


 But I guess the question is what people think about shipping the
 pocketsphinx + API, even if disabled by default.

 Andre, we need some numbers here. How much does Pocketsphinx increase
 binary size? or download size?
 When the pref is enabled, how much does it use memory on desktop, what
 about on b2g?


  This is important work and the competition is ramping quickly after many
 years of promises about this year being the year of voice recognition.  We
 will probably fall behind quickly if we don't get something going here in
 the next year.

 Can you also talk a bit about what the plan and set of challenges look
 like for expanding the supported languages, and how these would impact the
 numbers ollie has asked for?

 The place we really need this is b2g, but phones are only shipping in
 international markets right now so english only is not all that helpful.

 -chofmann



 -Olli


 On 10/31/2014 01:18 AM, Andre Natal wrote:

 I've been researching speech recognition in Firefox for two years. First
 SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
 [1] embedded in Gecko C++ layer, project that I had the luck to develop
 for
 Google Summer of Code with the mentoring of Olli Pettay, Guilherme
 Gonçalves, Steven Lee, Randell Jesup plus others and with the
 management of
 Sandip Kamat.

 The implementation already works in B2G, Fennec and all FF desktop
 versions, and the first language supported will be english. The API and
 implementation are in conformity with W3C standard [2]. The preference
 to
 enable it is: media.webspeech.service.default = pocketsphinx

 The required patches to achieve this are:

   - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
   - Embed english models. Bug 1065911 [4]
   - Change SpeechGrammarList to store grammars inside SpeechGrammar
 objects.
 Bug 1088336 [5]
   - Creation of a SpeechRecognitionService for Pocketsphinx. Bug
 1051148 [6]
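
 As an illustration of the grammar items in the list above, here is a small
 TypeScript sketch that builds a grammar string and hands it to a grammar
 list. It assumes the grammar is written in JSGF (a format pocketsphinx can
 read); the message itself does not say which format is used.
 `addFromString(grammar, weight)` is a real `SpeechGrammarList` method;
 `buildJsgf`, `attachGrammar` and `GrammarListLike` are hypothetical names
 for this example only.

```typescript
// Build a one-rule JSGF grammar such as:
//   #JSGF V1.0; grammar colors; public <color> = red | green ;
function buildJsgf(
  grammarName: string,
  ruleName: string,
  phrases: readonly string[]
): string {
  if (phrases.length === 0) {
    throw new Error("a grammar needs at least one phrase");
  }
  const alternatives = phrases.join(" | ");
  return `#JSGF V1.0; grammar ${grammarName}; public <${ruleName}> = ${alternatives} ;`;
}

// Structural stand-in for SpeechGrammarList, so no DOM typings are required.
interface GrammarListLike {
  addFromString(grammar: string, weight?: number): void;
}

// Register the grammar with weight 1 (the highest relative weight).
function attachGrammar(list: GrammarListLike, grammar: string): void {
  list.addFromString(grammar, 1);
}
```

 A page would typically assign the populated list to the `grammars` property
 of its recognition object before calling `start()`.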


 Also, other important features that we don't have patches for yet:
   - Relax VAD strategy to be less strict and avoid stopping in the middle of
     speech when speaking low volume phonemes [7]
   - Integrate or develop a grapheme to phoneme algorithm to realtime
     generator when compiling grammars [8]
   - Include and build models for other languages [9]
   - Continuous and wordspotting recognition [10]
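
 To illustrate what "relaxing" a VAD (voice activity detection) strategy can
 mean, here is a generic TypeScript sketch of an energy-based detector with a
 "hangover" window: end-of-speech is only declared after several consecutive
 quiet frames, so a single low-volume phoneme does not stop recognition. This
 is an assumed, textbook-style technique, not the VAD actually used in Gecko
 or pocketsphinx; `frameRms`, `findEndOfSpeech`, the threshold and the
 hangover length are all hypothetical.

```typescript
// Root-mean-square energy of one frame of PCM samples in the range [-1, 1].
function frameRms(frame: readonly number[]): number {
  if (frame.length === 0) return 0;
  let sumOfSquares = 0;
  for (const sample of frame) sumOfSquares += sample * sample;
  return Math.sqrt(sumOfSquares / frame.length);
}

// Returns the index of the frame at which end-of-speech is declared, or -1
// if speech never started or never ended. A larger `hangoverFrames` value
// makes the detector less strict about short quiet stretches.
function findEndOfSpeech(
  frames: readonly (readonly number[])[],
  threshold: number,
  hangoverFrames: number
): number {
  let speechStarted = false;
  let quietRun = 0;
  for (let i = 0; i < frames.length; i++) {
    const frame = frames[i] ?? [];
    if (frameRms(frame) >= threshold) {
      speechStarted = true;
      quietRun = 0; // any loud frame resets the quiet counter
    } else if (speechStarted) {
      quietRun++;
      if (quietRun > hangoverFrames) return i;
    }
  }
  return -1;
}
```

 With a hangover of one frame, two quiet frames in a row end the utterance;
 raising the hangover lets the speaker get through a short quiet stretch
 without being cut off.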

 The wip repo is here [11] and this Air Mozilla video [12] plus this wiki
 has more detailed info [13].

 At this comment you can see a cpu usage on flame while recognition is
 happening [14]

 I wish to hear your comments.

 Thanks,

 Andre Natal

 [1] http://cmusphinx.sourceforge.net/
 [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
 [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
 [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
 [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
 [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
 [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
 [8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
 [9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
 https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
 [10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
 [11] https://github.com/andrenatal/gecko-dev
 [12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/
 (Jump
 to 12:00)
 [13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
 [14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14





Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-14 Thread Sandip Kamat
Hi Andre, I suggest we update the wiki with these sizes (as well as answers
to the other questions in this thread) so we can use it as a central place
for info.

-Sandip 

- Original Message -

 From: Andre Natal ana...@gmail.com
 To: smaug sm...@welho.com
 Cc: Sandip Kamat ska...@mozilla.com, dev-platform@lists.mozilla.org
 Sent: Saturday, November 8, 2014 8:50:44 PM
 Subject: Re: Intent to ship: Web Speech API - Speech Recognition with
 Pocketsphinx

 Hi Olli,

  How much does Pocketsphinx increase binary size? or download size?

 In the past it was suggested that we avoid shipping the models with the
 packages and instead create a preferences panel in the apps to allow users
 to download the models they want.

 About the size of the pocketsphinx libraries themselves: on Mac OS they sum
 to ~2.3 MB [1]. I don't know which type of compression the build system
 applies when compiling/packaging, but it should be efficient enough.

 [1]
 MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
 /usr/local/lib/libsphinxbase.a
 2184 -rw-r--r-- 1 root admin 1114840 Jul 7 14:39
 /usr/local/lib/libsphinxbase.a
 MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
 /usr/local/lib/libpocketsphinx.a
 2352 -rw-r--r-- 1 root admin 1201240 Jul 7 14:52
 /usr/local/lib/libpocketsphinx.a
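
 The "~2.3" figure can be checked from the byte sizes in the `ls` listing
 above (1114840 and 1201240). The TypeScript below is only a worked
 arithmetic example; the variable names are made up. Keep in mind that these
 are static archives, so the sum is not necessarily the amount by which a
 shipped binary would grow.

```typescript
// Byte sizes as reported in the `ls` listing (the column before the date).
const archives: [string, number][] = [
  ["libsphinxbase.a", 1114840],
  ["libpocketsphinx.a", 1201240],
];

// Sum the sizes of all listed archives.
const totalBytes = archives.reduce((sum, [, bytes]) => sum + bytes, 0);

const megabytes = totalBytes / 1e6;           // decimal megabytes (MB)
const mebibytes = totalBytes / (1024 * 1024); // binary mebibytes (MiB)

// Prints: 2316080 bytes = 2.32 MB = 2.21 MiB
console.log(
  `${totalBytes} bytes = ${megabytes.toFixed(2)} MB = ${mebibytes.toFixed(2)} MiB`
);
```

 So the two archives together come to roughly 2.3 MB in decimal units, or
 about 2.2 MiB.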

  When the pref is enabled, how much does it use memory on desktop, what
  about
  on b2g?
 

 On b2g, it uses memory only after the decoder has been activated and has
 loaded the models. I profiled it on a ZTE Open C; here is the report [2] and
 here the exact snapshot [3]. It seems ~21 MB is used after loading the models.

 On desktop Mac OS Nightly, the memory usage was ~11 MB.

 [2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
 [3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0

   -Olli
  
 

 On 10/31/2014 01:18 AM, Andre Natal wrote:

 I've been researching speech recognition in Firefox for two years. First
 SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
 [1] embedded in Gecko C++ layer, project that I had the luck to develop for
 Google Summer of Code with the mentoring of Olli Pettay, Guilherme
 Gonçalves, Steven Lee, Randell Jesup plus others and with the management of
 Sandip Kamat.

 The implementation already works in B2G, Fennec and all FF desktop
 versions, and the first language supported will be english. The API and
 implementation are in conformity with W3C standard [2]. The preference to
 enable it is: media.webspeech.service.default = pocketsphinx

 The required patches to achieve this are:

   - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
   - Embed english models. Bug 1065911 [4]
   - Change SpeechGrammarList to store grammars inside SpeechGrammar
     objects. Bug 1088336 [5]
   - Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148 [6]

 Also, other important features that we don't have patches for yet:
   - Relax VAD strategy to be less strict and avoid stopping in the middle of
     speech when speaking low volume phonemes [7]
   - Integrate or develop a grapheme to phoneme algorithm to realtime
     generator when compiling grammars [8]
   - Include and build models for other languages [9]
   - Continuous and wordspotting recognition [10]

 The wip repo is here [11] and this Air Mozilla video [12] plus this wiki
 has more detailed info [13].

 At this comment you can see a cpu usage on flame while recognition is
 happening [14]

 I wish to hear your comments.

 Thanks,

 Andre Natal

 [1] http://cmusphinx.sourceforge.net/
 [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
 [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
 [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
 [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
 [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
 [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
 [8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
 [9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
 https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
 [10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
 [11] https://github.com/andrenatal/gecko-dev
 [12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/
 (Jump to 12:00)
 [13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
 [14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14

Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-14 Thread Sandip Kamat
Hi Olli, In general for FxOS devices, the thought is to let the OEMs decide
which language models they would like to ship preloaded. That way partners can
choose based on region, but users could also directly download the packages
they like. For now, since we are at a very early stage, we just have English
support. We need help to build and test other language models in parallel.

Sandip 

- Original Message -

 From: Andre Natal ana...@gmail.com
 To: smaug sm...@welho.com
 Cc: Sandip Kamat ska...@mozilla.com, dev-platform@lists.mozilla.org
 Sent: Saturday, November 8, 2014 8:50:44 PM
 Subject: Re: Intent to ship: Web Speech API - Speech Recognition with
 Pocketsphinx

 Hi Olli,

  How much does Pocketsphinx increase binary size? or download size?

 In the past it was suggested that we avoid shipping the models with the
 packages and instead create a preferences panel in the apps to allow users
 to download the models they want.

 About the size of the pocketsphinx libraries themselves: on Mac OS they sum
 to ~2.3 MB [1]. I don't know which type of compression the build system
 applies when compiling/packaging, but it should be efficient enough.

 [1]
 MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
 /usr/local/lib/libsphinxbase.a
 2184 -rw-r--r-- 1 root admin 1114840 Jul 7 14:39
 /usr/local/lib/libsphinxbase.a
 MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa
 /usr/local/lib/libpocketsphinx.a
 2352 -rw-r--r-- 1 root admin 1201240 Jul 7 14:52
 /usr/local/lib/libpocketsphinx.a

  When the pref is enabled, how much does it use memory on desktop, what
  about
  on b2g?
 

 On b2g, it uses memory only after the decoder has been activated and has
 loaded the models. I profiled it on a ZTE Open C; here is the report [2] and
 here the exact snapshot [3]. It seems ~21 MB is used after loading the models.

 On desktop Mac OS Nightly, the memory usage was ~11 MB.

 [2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
 [3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0

   -Olli
  
 


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-09 Thread Andre Natal
Hi Marco.

SpeechRTC was my first attempt with the platform. In early 2013 I did not
have enough knowledge of Gecko internals, and b2g itself was at a very early
stage (at the very beginning, Steven Lee needed to send me patches to make
gUM work properly), so the fastest path was to capture the audio and stream
it online. The great part is that Opus is pretty efficient, and Node.js plus
a speech server wrapping pocketsphinx made the whole round trip really fast.

But I knew that was not ideal for command and control / grammar use, so I
started to research a direct port of pocketsphinx using emscripten. It did
work, but three reasons made me move to a full C++ version:

1) the whole Speech API frontend in Gecko was ready to roll, only waiting for
a backend, and this, as we know, was built in C++;

2) my tests ran very well, but on the Peak [2], for example, it performed
slower than on low-end devices running Android [3];

3) with emscripten, loading the models when the decoder is created on each
reload ended up very slow, and I couldn't figure out how to keep the decoder
instance between tabs and reloads, while in C++ this happens only once, due
to Gecko's architecture.

On Oct 31, 2014 12:27 AM, Marco Chen mc...@mozilla.com wrote:

 Hi Andre,

 It is nice work, and I look forward to voice recognition on B2G.

 Besides this final result, I am also interested in the reasons you
 migrated from SpeechRTC to emscripten to the Web Speech API.
 Could you also share what factors triggered these transitions? That
 could be a lesson learned for us.

 ex: SpeechRTC - voice recognition can't be performed locally.
  emscripten - performance issue? or license issue? or ?

 Thanks,
 Sincerely yours.

 --
 *From: *Andre Natal ana...@gmail.com
 *To: *dev-platform@lists.mozilla.org, Sandip Kamat ska...@mozilla.com,
 Olli.Pettay opet...@mozilla.com
 *Sent: *Friday, October 31, 2014 7:18:06 AM
 *Subject: *Intent to ship: Web Speech API - Speech Recognition with
 Pocketsphinx

 I've been researching speech recognition in Firefox for two years. First
 SpeechRTC, then emscripten, and now the Web Speech API with CMU pocketsphinx
 [1] embedded in the Gecko C++ layer, a project that I had the luck to
 develop for Google Summer of Code under the mentoring of Olli Pettay,
 Guilherme Gonçalves, Steven Lee, Randell Jesup plus others, and with the
 management of Sandip Kamat.

 The implementation already works in B2G, Fennec and all FF desktop
 versions, and the first language supported will be English. The API and
 implementation conform to the W3C standard [2]. The preference to
 enable it is: media.webspeech.service.default = pocketsphinx

 The patches required to achieve this are:

  - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
  - Embed English models. Bug 1065911 [4]
  - Change SpeechGrammarList to store grammars inside SpeechGrammar objects.
    Bug 1088336 [5]
  - Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148 [6]

 Also, other important features for which we don't have patches yet:

  - Relax the VAD strategy to be less strict and avoid stopping in the
    middle of speech when low-volume phonemes are spoken [7]
  - Integrate or develop a grapheme-to-phoneme algorithm for real-time
    generation when compiling grammars [8]
  - Include and build models for other languages [9]
  - Continuous and word-spotting recognition [10]

 The WIP repo is here [11], and this Air Mozilla video [12] plus this wiki
 [13] have more detailed info.

 In this comment you can see the CPU usage on Flame while recognition is
 happening [14]

 I wish to hear your comments.

 Thanks,

 Andre Natal

 [1] http://cmusphinx.sourceforge.net/
 [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
 [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
 [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
 [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
 [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
 [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
 [8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
 [9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
 https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
 [10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
 [11] https://github.com/andrenatal/gecko-dev
 [12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/ (Jump to 12:00)
 [13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
 [14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14


___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-09 Thread Andre Natal
Sorry, I forgot the links:

2 - Speechrtc offline on Firefox OS (Peak): http://youtu.be/FXKXhrRDEb8

3 - Continuous speech recognition on android with poc…:
http://youtu.be/3lTtCFaQF2A


___
dev-platform mailing list

Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-08 Thread Andre Natal
Thanks Nick, I appreciate your help.

I created two versions of the Fennec APK: one [1] with the English models
bundled (43.7 MB), and another [2] without them (34.6 MB). This was the
mozconfig I used [3].
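
As a rough check on those two figures, the size attributable to the bundled
models can be worked out directly. This is only a sketch: the variable names
are made up for illustration, and because the two APKs come from different
versions (34.0a1 and 35.0a1, going by the file names), the difference may not
be due to the models alone.

```python
from decimal import Decimal

# Sizes as quoted in this mail, in MB. Decimal avoids float rounding noise.
with_models_mb = Decimal("43.7")     # Fennec APK with English models bundled [1]
without_models_mb = Decimal("34.6")  # Fennec APK without the models [2]

# Extra size carried by the build that bundles the models.
models_overhead_mb = with_models_mb - without_models_mb
overhead_pct = models_overhead_mb / without_models_mb * 100

print(f"bundled models add {models_overhead_mb} MB "
      f"({overhead_pct:.1f}% over the smaller APK)")
```

That is roughly 9 MB per bundled language, which is the number a
download-on-demand preferences screen would keep out of the base APK.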

Actually, I had a conversation with Jonas Sicking some months ago and we
agreed that the ideal scenario is to let users download the package for the
language they prefer from some sort of preferences screen, instead of
shipping the models bundled into the APK.


[1]
https://www.dropbox.com/s/6snv6e3mqqcs4zi/fennec-34.0a1.en-US.android-arm.apk?dl=0
[2]
https://www.dropbox.com/s/zxxop34unj21r1s/fennec-35.0a1.en-US.android-arm.apk?dl=0
[3]
#DEBUG
#ac_add_options --enable-debug
#ac_add_options --enable-trace-malloc
#ac_add_options --enable-accessibility
#ac_add_options --enable-signmar
ac_add_options --disable-tests

# android options
ac_add_options --enable-application=mobile/android
ac_add_options --with-android-ndk=/Volumes/extra/android-ndk-r8e/
ac_add_options --with-android-sdk=/Volumes/extra/android-sdk-macosx/platforms/android-19/

# FOR ARM
ac_add_options --target=arm-linux-androideabi
mk_add_options MOZ_OBJDIR=./obj-arm-linux-androideabi-debug


# FOR 386
#ac_add_options --target=i386-linux-android
#mk_add_options MOZ_OBJDIR=./objdir-droid-i386

On Thu, Oct 30, 2014 at 9:36 PM, Nick Alexander nalexan...@mozilla.com
wrote:

 On 2014-10-30, 4:18 PM, Andre Natal wrote:

 I've been researching speech recognition in Firefox for two years. First
 SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
 [1] embedded in Gecko C++ layer, project that I had the luck to develop
 for
 Google Summer of Code with the mentoring of Olli Pettay, Guilherme
 Gonçalves, Steven Lee, Randell Jesup plus others and with the management
 of
 Sandip Kamat.

 The implementation already works in B2G, Fennec and all FF desktop
 versions, and the first language supported will be English. The API and
 implementation are in conformity with W3C standard [2]. The preference to
 enable it is: media.webspeech.service.default = pocketsphinx


 First, Andre, let me offer my congratulations on getting this project to
 this point.  We've talked a few times and I've always been impressed.

 Can you point me at Fennec try builds?  I vaguely recall that these speech
 recognition approaches require large pattern matching files, and I'd like
 to see what including the Speech API does to the Fennec APK size.  We're
 pushing pretty hard on reducing our APK size right now because we believe
 it's a big barrier to entry and especially to upgrading older devices.

 Nick

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-08 Thread Andre Natal
Hi Olli,


 How much does Pocketsphinx increase binary size? or download size?

In the past it was suggested that we avoid shipping the models with the
packages and instead create a preferences panel in the apps to let users
download the models they want.

As for the size of the pocketsphinx libraries themselves, on Mac OS they add
up to ~2.3 MB [1]. I don't know which type of compression the build system
applies when compiling/packaging, but it should be efficient enough.

[1]
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libsphinxbase.a
2184 -rw-r--r--  1 root  admin  1114840 Jul  7 14:39 /usr/local/lib/libsphinxbase.a
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libpocketsphinx.a
2352 -rw-r--r--  1 root  admin  1201240 Jul  7 14:52 /usr/local/lib/libpocketsphinx.a
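
For reference, the ~2.3 MB figure can be reproduced by summing the two byte
counts in the listing above. A minimal sketch follows; the dictionary name is
illustrative only, and these are static archives on Mac OS, so the
contribution to a linked and compressed package may well differ.

```python
# Byte counts copied from the `ls -lsa` listing above.
LIB_SIZES_BYTES = {
    "libsphinxbase.a": 1114840,
    "libpocketsphinx.a": 1201240,
}

total_bytes = sum(LIB_SIZES_BYTES.values())
total_mb = total_bytes / 1_000_000        # decimal megabytes
total_mib = total_bytes / (1024 * 1024)   # binary mebibytes

print(f"{total_bytes} bytes = {total_mb:.2f} MB = {total_mib:.2f} MiB")
```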



 When the pref is enabled, how much does it use memory on desktop, what
 about on b2g?

On b2g, it uses memory only after the decoder has been activated and has
loaded the models. I profiled it on a ZTE Open C; here is the report [2] and
here is the exact snapshot [3]. About 21 MB seems to be in use after the
models are loaded.

On desktop Mac OS Nightly, the memory usage was ~11 MB.

[2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
[3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0

 -Olli






___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-08 Thread Andre Natal
Hi Chris.

For new languages, once the decoder is integrated into Gecko, you only
need to build new models (acoustic and language), since the decoder is
language-agnostic.

The model-building procedure is the same for every language. In broad
terms, you need to record thousands of hours of spoken phrases covering all
phones of the target language, from people of different genders, ages,
regions, accents and so on. All this data is compiled and transformed into
the acoustic model.

For the language model, you need to build a phonetic dictionary for that
language, which then allows grapheme-to-phoneme tools (phonetisaurus [1],
for example) to generate phonetic representations in real time for the
words in your grammar.
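
To make the role of the phonetic dictionary concrete, here is a toy sketch of
looking up grammar words in a dictionary and listing the ones that would need
a grapheme-to-phoneme tool. Everything in it is an assumption for
illustration: the three entries, the function names, and the simple
"word PHONE PHONE ..." line format are made up here and are not taken from
any shipped model or from the Gecko patches.

```python
# Toy pronunciation dictionary, one "word PHONE PHONE ..." entry per line.
# Illustrative entries only, not copied from a real model.
TOY_DICT = """\
red R EH D
green G R IY N
blue B L UW
"""


def load_dictionary(text: str) -> dict[str, list[str]]:
    """Parse 'word PH1 PH2 ...' lines into {word: [phones]}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            entries[parts[0].lower()] = parts[1:]
    return entries


def missing_words(grammar_words, dictionary):
    """Return the grammar words that have no dictionary entry, in input order."""
    return [w for w in grammar_words if w.lower() not in dictionary]


pronunciations = load_dictionary(TOY_DICT)
# Words with no entry are the ones a grapheme-to-phoneme step would cover.
needs_g2p = missing_words(["red", "green", "magenta"], pronunciations)
print(needs_g2p)
```

In this sketch only "magenta" has no entry, so it is the one word that would
have to be handed to a grapheme-to-phoneme tool when the grammar is compiled.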

Building models is not a trivial task, and it requires close collaboration
between speech engineers and linguists.

Pocketsphinx offers some models besides English [2], and they have useful
tutorials about acoustic [3] and language [4] model creation.

Thanks,

Andre

[1] https://code.google.com/p/phonetisaurus/
[2]
http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
[3] http://cmusphinx.sourceforge.net/wiki/tutorialam?s[]=acoustics[]=models
[4] http://cmusphinx.sourceforge.net/wiki/tutoriallm



On Thu, Oct 30, 2014 at 10:45 PM, Chris Hofmann chofm...@mozilla.com
wrote:

 On 10/30/14 5:24 PM, smaug wrote:

 On 10/31/2014 02:21 AM, smaug wrote:

 Intent to ship is too strong for this.
 We need to first have implementation landed and tested ;)

 I wouldn't ship the implementation in desktop FF without plenty more
 testing.


 But I guess the question is what people think about shipping the
 pocketsphinx + API, even if disabled by default.

 Andre, we need some numbers here. How much does Pocketsphinx increase
 binary size? or download size?
 When the pref is enabled, how much does it use memory on desktop, what
 about on b2g?


  This is important work, and the competition is ramping up quickly after many
 years of promises about this year being the year of voice recognition.  We
 will probably fall behind quickly if we don't get something going here in
 the next year.

 Can you also talk a bit about what the plan and set of challenges look
 like for expanding the supported languages, and how these would impact the
 numbers Olli has asked for?

 The place we really need this is b2g, but phones are only shipping in
 international markets right now so english only is not all that helpful.

 -chofmann



 -Olli



Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-08 Thread Andre Natal
Thank you Chris, sure we can do it!

Here is a straightforward page listing all the objects and methods of the
Speech API that we are aiming to implement:

https://github.com/andrenatal/webspeechapi/blob/gh-pages/index_clean.html

Maybe we can start from it.

Thanks!

Andre


On Mon, Nov 3, 2014 at 9:58 AM, Chris Mills cmi...@mozilla.com wrote:

 Awesome to see this mail, Andre!

 And remember that we do have the pages set up on MDN ready to be filled in
 also.

 https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

 Once this is shipped, do you think we can find some time to start
 collaborating on these docs?

 Chris Mills
Senior tech writer || Mozilla
 developer.mozilla.org || MDN
cmi...@mozilla.com || @chrisdavidmills





___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-11-03 Thread Chris Mills
Awesome to see this mail, Andre!

And remember that we do have the pages set up on MDN ready to be filled in also.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

Once this is shipped, do you think we can find some time to start collaborating 
on these docs?

Chris Mills
   Senior tech writer || Mozilla
developer.mozilla.org || MDN
   cmi...@mozilla.com || @chrisdavidmills




___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread Nick Alexander

On 2014-10-30, 4:18 PM, Andre Natal wrote:

I've been researching speech recognition in Firefox for two years. First
SpeechRTC, then emscripten, and now the Web Speech API with CMU pocketsphinx
[1] embedded in the Gecko C++ layer, a project that I had the luck to develop
for Google Summer of Code with the mentoring of Olli Pettay, Guilherme
Gonçalves, Steven Lee, Randell Jesup plus others and with the management of
Sandip Kamat.

The implementation already works in B2G, Fennec and all FF desktop
versions, and the first language supported will be English. The API and
implementation conform to the W3C standard [2]. The preference to
enable it is: media.webspeech.service.default = pocketsphinx


First, Andre, let me offer my congratulations on getting this project to 
this point.  We've talked a few times and I've always been impressed.


Can you point me at Fennec try builds?  I vaguely recall that these 
speech recognition approaches require large pattern matching files, and 
I'd like to see what including the Speech API does to the Fennec APK 
size.  We're pushing pretty hard on reducing our APK size right now 
because we believe it's a big barrier to entry and especially to 
upgrading older devices.


Nick


Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread smaug

On 10/31/2014 02:21 AM, smaug wrote:

Intent to ship is too strong for this.
We need to first have implementation landed and tested ;)

I wouldn't ship the implementation in desktop FF without plenty more testing.



But I guess the question is what people think about shipping the Pocketsphinx +
API, even if disabled by default.

Andre, we need some numbers here. How much does Pocketsphinx increase binary
size, or download size?
When the pref is enabled, how much memory does it use on desktop, and what
about on B2G?





-Olli


On 10/31/2014 01:18 AM, Andre Natal wrote:

I've been researching speech recognition in Firefox for two years. First
SpeechRTC, then emscripten, and now the Web Speech API with CMU pocketsphinx
[1] embedded in the Gecko C++ layer, a project that I had the luck to develop
for Google Summer of Code with the mentoring of Olli Pettay, Guilherme
Gonçalves, Steven Lee, Randell Jesup plus others and with the management of
Sandip Kamat.

The implementation already works in B2G, Fennec and all FF desktop
versions, and the first language supported will be English. The API and
implementation conform to the W3C standard [2]. The preference to
enable it is: media.webspeech.service.default = pocketsphinx

The patches required to achieve this are:

  - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
  - Embed English models. Bug 1065911 [4]
  - Change SpeechGrammarList to store grammars inside SpeechGrammar objects.
Bug 1088336 [5]
  - Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148 [6]
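
From a web page's side, the pieces listed above are reached through the
SpeechRecognition and SpeechGrammarList interfaces of the draft [2]. The
following is a minimal TypeScript sketch of that usage, not code from the
patches: buildJsgfGrammar and startRecognition are hypothetical helper names,
the "commands" rule name and the "en-US" language are assumptions, and the
webkit-prefixed fallbacks are included only because some browsers expose the
constructors under that prefix.

```typescript
// Hypothetical helper: builds a one-rule JSGF grammar from a word list.
function buildJsgfGrammar(ruleName: string, words: string[]): string {
  return `#JSGF V1.0; grammar ${ruleName}; public <${ruleName}> = ${words.join(" | ")} ;`;
}

// Hypothetical helper: starts one recognition session if the API is exposed.
// Returns false when the constructors are missing (for example when the
// feature is disabled, or when the code is not running in a browser).
function startRecognition(words: string[], onText: (text: string) => void): boolean {
  const g = globalThis as any;
  const Recognition = g.SpeechRecognition || g.webkitSpeechRecognition;
  const GrammarList = g.SpeechGrammarList || g.webkitSpeechGrammarList;
  if (!Recognition || !GrammarList) {
    return false;
  }

  // Attach the grammar with weight 1, then listen for a single result.
  const grammars = new GrammarList();
  grammars.addFromString(buildJsgfGrammar("commands", words), 1);

  const recognition = new Recognition();
  recognition.grammars = grammars;
  recognition.lang = "en-US"; // assumption: English is the only embedded model
  recognition.onresult = (event: any) => onText(event.results[0][0].transcript);
  recognition.start();
  return true;
}
```

Calling startRecognition(["red", "green", "blue"], console.log) in a build
where the API is enabled would be expected to log the recognized word; outside
a browser the function simply returns false.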


Also, other important features for which we don't have patches yet:
  - Relax the VAD strategy to be less strict and avoid stopping in the middle of
speech when speaking low-volume phonemes [7]
  - Integrate or develop a grapheme-to-phoneme algorithm for real-time
generation when compiling grammars [8]
  - Include and build models for other languages [9]
  - Continuous and word-spotting recognition [10]
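
As an illustration of what relaxing the VAD strategy [7] could mean, here is a
minimal sketch of an energy-threshold end-of-speech detector with a hangover
period: short quiet stretches, such as low-volume phonemes, do not end the
utterance unless they last longer than the hangover. This is a toy under
stated assumptions, not the algorithm used in Gecko; findEndOfSpeech, the
per-frame energies, the threshold and hangoverFrames are all hypothetical.

```typescript
// Hypothetical sketch: energy-based voice activity detection with hangover.
// Returns the index of the frame at which end-of-speech is declared,
// or -1 if the utterance never ends within the given frames.
function findEndOfSpeech(
  frameEnergies: number[],
  threshold: number,
  hangoverFrames: number
): number {
  let speechStarted = false;
  let quietRun = 0; // consecutive below-threshold frames since the last loud frame

  for (let i = 0; i < frameEnergies.length; i++) {
    if (frameEnergies[i] >= threshold) {
      // Loud frame: speech is (still) in progress, so reset the quiet counter.
      speechStarted = true;
      quietRun = 0;
    } else if (speechStarted) {
      // Quiet frame after speech began: only stop once the hangover is exceeded.
      quietRun++;
      if (quietRun > hangoverFrames) {
        return i;
      }
    }
  }
  return -1;
}
```

With energies [0.1, 0.9, 0.8, 0.2, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1] and a
threshold of 0.5, a hangover of 0 frames ends the utterance at frame 3, in the
middle of the dip, while a hangover of 2 frames rides through the dip and ends
it at frame 8.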

The WIP repo is here [11], and this Air Mozilla video [12] and this wiki [13]
have more detailed info.

In this comment you can see a CPU usage flame graph captured while recognition
is happening [14].

I'd like to hear your comments.

Thanks,

Andre Natal

[1] http://cmusphinx.sourceforge.net/
[2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
[5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
[6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
[7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
[8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
[9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
[10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
[11] https://github.com/andrenatal/gecko-dev
[12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/ (Jump
to 12:00)
[13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
[14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14







Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread smaug

Intent to ship is too strong for this.
We need to first have implementation landed and tested ;)

I wouldn't ship the implementation in desktop FF without plenty more testing.



-Olli







Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread Chris Hofmann

On 10/30/14 5:24 PM, smaug wrote:

On 10/31/2014 02:21 AM, smaug wrote:

Intent to ship is too strong for this.
We need to first have implementation landed and tested ;)

I wouldn't ship the implementation in desktop FF without plenty more
testing.




But I guess the question is what people think about shipping the
Pocketsphinx + API, even if disabled by default.


Andre, we need some numbers here. How much does Pocketsphinx increase
binary size, or download size?
When the pref is enabled, how much memory does it use on desktop, and what
about on B2G?



This is important work, and the competition is ramping up quickly after many
years of promises about this year being the year of voice recognition.
We will probably fall behind quickly if we don't get something going
here in the next year.


Can you also talk a bit about what the plan and set of challenges look
like for expanding the supported languages, and how these would impact
the numbers Olli has asked for?


The place we really need this is B2G, but phones are only shipping in
international markets right now, so English-only is not all that helpful.


-chofmann











Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread Mark Hammond

On 31/10/2014 11:45 AM, Chris Hofmann wrote:

The place we really need this is B2G, but phones are only shipping in
international markets right now, so English-only is not all that helpful.


While this doesn't change the point you are making in any way, FWIW, 
Firefox OS phones are on sale in Australia via one of our largest 
electronics retailers:


https://www.jbhifi.com.au/phones/Outright-Mobile-Handsets/zte/zte-open-c-handset-grey/624980/

http://www.gizmodo.com.au/2014/10/jb-hi-fi-is-now-selling-australias-first-firefox-os-phone/

Nice!

Mark



Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx

2014-10-30 Thread Marco Chen
Hi Andre,

This is nice work, and I look forward to voice recognition on B2G.

Besides the final result, I am also interested in the reasons you migrated
from SpeechRTC to emscripten to the Web Speech API.
Could you also share what factors triggered these transitions? That could be
a lesson learned for us.

e.g. SpeechRTC: voice recognition can't be performed locally.
emscripten: performance issue? license issue? something else?

Thanks,
Sincerely yours.

- Original Message -

From: Andre Natal ana...@gmail.com 
To: dev-platform@lists.mozilla.org, Sandip Kamat ska...@mozilla.com, 
Olli.Pettay opet...@mozilla.com 
Sent: Friday, October 31, 2014 7:18:06 AM 
Subject: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx 
