Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Chris,

I was talking with the Sphinx leaders, and we can build models from audiobooks as well. This approach saves a lot of time and improves quality, since the narration is accurate and clear. We are currently defining a way to create Hindi and Brazilian Portuguese models.

Thanks,
Andre

On Oct 30, 2014 5:47 PM, "Chris Hofmann" wrote:

> On 10/30/14 5:24 PM, smaug wrote:
>
>> On 10/31/2014 02:21 AM, smaug wrote:
>>
>>> Intent to ship is too strong for this.
>>> We need to first have the implementation landed and tested ;)
>>>
>>> I wouldn't ship the implementation in desktop FF without plenty more
>>> testing.
>>
>> But I guess the question is what people think about shipping
>> Pocketsphinx + API, even if disabled by default.
>>
>> Andre, we need some numbers here. How much does Pocketsphinx increase
>> binary size? Or download size?
>> When the pref is enabled, how much memory does it use on desktop, and what
>> about on b2g?
>
> This is important work, and the competition is ramping up quickly after many
> years of promises about this year being the year of voice recognition. We
> will probably fall behind quickly if we don't get something going here in
> the next year.
>
> Can you also talk a bit about what the plan and set of challenges look
> like for expanding the supported languages, and how these would impact the
> numbers Olli has asked for?
>
> The place we really need this is b2g, but phones are only shipping in
> international markets right now, so English only is not all that helpful.
>
> -chofmann
>
>>> -Olli
>>>
>>> On 10/31/2014 01:18 AM, Andre Natal wrote:
>>>> I've been researching speech recognition in Firefox for two years. First
>>>> SpeechRTC, then emscripten, and now the Web Speech API with CMU Pocketsphinx
>>>> [1] embedded in the Gecko C++ layer, a project that I had the luck to develop
>>>> for Google Summer of Code with the mentoring of Olli Pettay, Guilherme
>>>> Gonçalves, Steven Lee, Randell Jesup plus others and with the management
>>>> of Sandip Kamat.
>>>>
>>>> The implementation already works in B2G, Fennec and all FF desktop
>>>> versions, and the first language supported will be English. The API and
>>>> implementation conform to the W3C standard [2]. The preference to
>>>> enable it is: media.webspeech.service.default = pocketsphinx
>>>>
>>>> The patches required to achieve this are:
>>>>
>>>> - Import pocketsphinx sources in Gecko. Bug 1051146 [3]
>>>> - Embed English models. Bug 1065911 [4]
>>>> - Change SpeechGrammarList to store grammars inside SpeechGrammar objects.
>>>>   Bug 1088336 [5]
>>>> - Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148 [6]
>>>>
>>>> Also, other important features that we don't have patches for yet:
>>>>
>>>> - Relax the VAD strategy to be less strict and avoid stopping in the middle
>>>>   of speech when speaking low-volume phonemes [7]
>>>> - Integrate or develop a grapheme-to-phoneme algorithm for real-time
>>>>   generation when compiling grammars [8]
>>>> - Include and build models for other languages [9]
>>>> - Continuous and word-spotting recognition [10]
>>>>
>>>> The WIP repo is here [11], and this Air Mozilla video [12] plus this wiki
>>>> [13] have more detailed info.
>>>>
>>>> At this comment you can see CPU usage in a flame graph while recognition
>>>> is happening [14]
>>>>
>>>> I wish to hear your comments.
>>>>
>>>> Thanks,
>>>>
>>>> Andre Natal
>>>>
>>>> [1] http://cmusphinx.sourceforge.net/
>>>> [2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
>>>> [3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
>>>> [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
>>>> [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
>>>> [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
>>>> [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
>>>> [8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
>>>> [9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and
>>>>     https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
>>>> [10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
>>>> [11] https://github.com/andrenatal/gecko-dev
>>>> [12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/ (Jump to 12:00)
>>>> [13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
>>>> [14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
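
The original post talks about grammars stored in SpeechGrammar objects for command-and-control recognition. As a rough illustration only, the sketch below builds a small grammar string, assuming the grammar is written in JSGF, a format Pocketsphinx accepts; the thread itself does not name the format. The helper `build_jsgf`, the grammar name, and the phrases are hypothetical and are not taken from the patches.

```python
def build_jsgf(grammar_name, rule_name, phrases):
    """Return a one-rule JSGF grammar listing the phrases as alternatives.

    Hypothetical helper for illustration; not part of Gecko or Pocketsphinx.
    """
    if not phrases:
        raise ValueError("a grammar needs at least one phrase")
    # JSGF separates alternatives with '|' and ends each statement with ';'.
    alternatives = " | ".join(phrases)
    return (
        f"#JSGF V1.0; grammar {grammar_name}; "
        f"public <{rule_name}> = {alternatives} ;"
    )


# Example phrases are made up for this sketch.
grammar = build_jsgf("commands", "command", ["call", "open camera", "take picture"])
print(grammar)
```

A string like this is the kind of value a page would hand to the recognizer as its grammar; a small closed set of phrases is what makes command-and-control recognition practical on low-end devices.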
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Hi Olli,

In general, for FxOS devices the thought is to let the OEMs decide which language models they would like to ship preloaded. That way partners can choose based on region, and users could also directly download the packages they like. For now, since we are at a very early stage, we only have English support. We need help to build and test other language models in parallel.

Sandip

----- Original Message -----
> From: "Andre Natal"
> To: "smaug"
> Cc: "Sandip Kamat", dev-platform@lists.mozilla.org
> Sent: Saturday, November 8, 2014 8:50:44 PM
> Subject: Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
>
> Hi Olli,
>
> > How much does Pocketsphinx increase binary size? or download size?
>
> In the past it was suggested that we avoid shipping the models with the
> packages and instead create a preferences panel in the apps to let users
> download the models they want.
> As for the size of the Pocketsphinx libraries themselves, on Mac OS they
> total ~2.3 MB [1]. I don't know what kind of compression the build system
> applies when compiling/packaging, but it should be efficient enough.
>
> [1]
> MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libsphinxbase.a
> 2184 -rw-r--r-- 1 root admin 1114840 Jul 7 14:39 /usr/local/lib/libsphinxbase.a
> MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libpocketsphinx.a
> 2352 -rw-r--r-- 1 root admin 1201240 Jul 7 14:52 /usr/local/lib/libpocketsphinx.a
>
> > When the pref is enabled, how much does it use memory on desktop, what
> > about on b2g?
>
> On b2g, it uses memory only after the decoder is activated and has loaded
> the models. I profiled it on a ZTE Open C; here is the report [2] and here
> is the exact snapshot [3]. It seems ~21 MB is used after loading the models.
> On desktop Mac OS Nightly, memory usage was ~11 MB.
>
> [2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
> [3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Hi Andre,

I suggest we update the wiki with these sizes (as well as the answers to the other questions in this thread) so we can use it as a central place for info.

-Sandip
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Sorry, I forgot the links:

2 - SpeechRTC offline on Firefox OS (Peak): http://youtu.be/FXKXhrRDEb8
3 - Continuous speech recognition on android with poc…: http://youtu.be/3lTtCFaQF2A

On Nov 9, 2014 11:12 AM, "Andre Natal" wrote:

> 2) my tests ran very well, but on the Peak [2], for example, they performed
> slower than on low-end devices running Android [3]
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Hi Marco.

SpeechRTC was my first attempt with the platform. In early 2013 I didn't have enough knowledge of Gecko internals, and b2g itself was at a very early stage (at the very beginning, Steven Lee needed to send me patches to make gUM work properly), so the fastest path was to capture the audio and stream it online. The great part is that Opus is pretty efficient, and Node.js plus a speech server wrapping Pocketsphinx made the whole round trip really fast.

But I knew that was not ideal for command and control / grammar, so I started to research a direct port of Pocketsphinx using emscripten. It did work, but three reasons made me move to a full C++ version:

1) the whole Speech API frontend in Gecko was ready to roll and only waiting for a backend, and this, as we know, was built in C++;

2) my tests ran very well, but on the Peak [2], for example, they performed slower than on low-end devices running Android [3];

3) with emscripten, the model loading inside the decoder's creation at each reload ended up very slow, and I couldn't figure out how to keep the decoder instance between tabs and reloads, while in C++ this happens only once, due to Gecko's architecture.

On Oct 31, 2014 12:27 AM, "Marco Chen" wrote:

> Hi Andre,
>
> It is nice work, and we are expecting voice recognition on B2G.
>
> Besides this final result, I am also interested in the reasons you
> migrated from SpeechRTC -> emscripten -> Web Speech API.
> Could you also share what factors triggered these transitions? That
> could be a lesson learned for us.
>
> e.g.: SpeechRTC -> voice recognition can't be performed locally.
> emscripten -> performance issue? or license issue? or ?
>
> Thanks,
> Sincerely yours.
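
Point 3 above contrasts reloading the models on every page reload with loading them once and reusing the decoder. The following is a minimal sketch of that load-once pattern only; it is not Gecko or Pocketsphinx code, and `get_decoder`, `LOADS`, and the language tags are hypothetical names chosen for the example.

```python
import functools

# Records every simulated model load so the effect of caching is visible.
LOADS = []


@functools.lru_cache(maxsize=None)
def get_decoder(language):
    """Create a stand-in decoder, loading its models only on the first call."""
    LOADS.append(language)  # stands in for the slow model load
    return {"language": language, "models_loaded": True}


first = get_decoder("en-US")
second = get_decoder("en-US")  # served from the cache, no second load

print(first is second)  # the same instance is reused
print(len(LOADS))       # number of model loads performed so far
```

Keeping one long-lived instance per language is what removes the per-reload cost described in the message; a page-scoped port pays the load each time the page is created.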
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Thank you Chris, sure we can do it!

Here is a straightforward page with all the objects and methods of the Speech API we are aiming to implement:

https://github.com/andrenatal/webspeechapi/blob/gh-pages/index_clean.html

Maybe we can start from it.

Thanks!

Andre

On Mon, Nov 3, 2014 at 9:58 AM, Chris Mills wrote:

> Awesome to see this mail, Andre!
>
> And remember that we do have the pages set up on MDN ready to be filled in
> also.
>
> https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API
>
> Once this is shipped, do you think we can find some time to start
> collaborating on these docs?
>
> Chris Mills
> Senior tech writer || Mozilla
> developer.mozilla.org || MDN
> cmi...@mozilla.com || @chrisdavidmills
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Hi Chris.

For new languages, once the decoder is integrated into Gecko, you only need to build new models (acoustic and language), since the decoder is language agnostic.

The model-building procedure is the same for every language. In broad terms, you need to record thousands of hours of spoken phrases covering all the phones of the target language, from people of different genders, ages, regions, accents, etc. All this data is compiled and transformed into the acoustic model. For the language model, you need to build a phonetic dictionary for that language, which then allows grapheme-to-phoneme tools (like Phonetisaurus [1], for example) to generate real-time phonetic representations of the words in your grammar.

Building models is not a trivial task, and it requires close collaboration between speech engineers and linguists. Pocketsphinx offers some models besides English [2], and they have useful tutorials about acoustic [3] and language [4] model creation.

Thanks,
Andre

[1] https://code.google.com/p/phonetisaurus/
[2] http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/
[3] http://cmusphinx.sourceforge.net/wiki/tutorialam?s[]=acoustic&s[]=models
[4] http://cmusphinx.sourceforge.net/wiki/tutoriallm

On Thu, Oct 30, 2014 at 10:45 PM, Chris Hofmann wrote:

> Can you also talk a bit about what the plan and set of challenges look
> like for expanding the supported languages, and how these would impact the
> numbers Olli has asked for?
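
To make the role of the phonetic dictionary more concrete, here is a small sketch that looks up grammar words in a dictionary and lists the ones that would still need a grapheme-to-phoneme tool. It assumes a plain "word PHONE PHONE ..." line format similar to the CMU dictionaries; the sample entries, the phone symbols, and the function names are illustrative assumptions, not data or APIs from Pocketsphinx or Phonetisaurus.

```python
def parse_dictionary(text):
    """Parse 'word PH1 PH2 ...' lines into a {word: [phones]} mapping."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            entries[parts[0].lower()] = parts[1:]
    return entries


def split_known_unknown(words, dictionary):
    """Separate grammar words that have a pronunciation from those that do not."""
    known = {w: dictionary[w] for w in words if w in dictionary}
    unknown = [w for w in words if w not in dictionary]
    return known, unknown


# Illustrative entries only; a real dictionary has many thousands of words.
SAMPLE = """\
call K AO L
open OW P AH N
camera K AE M ER AH
"""

dictionary = parse_dictionary(SAMPLE)
known, unknown = split_known_unknown(["open", "camera", "firefox"], dictionary)
print(known)
print(unknown)  # words a grapheme-to-phoneme tool would have to handle
```

Words that end up in `unknown` are the case the message describes: without a real-time grapheme-to-phoneme step, a grammar can only use words the dictionary already covers.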
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Hi Olli,

> How much does Pocketsphinx increase binary size? or download size?

It was suggested in the past that we avoid shipping the models with the packages and instead add a preferences panel to the apps so that users can download the models they want.

As for the Pocketsphinx libraries themselves, on Mac OS they add up to ~2.3 MB [1]. I don't know what kind of compression the build system applies when compiling/packaging, but it should be efficient enough.

[1]
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libsphinxbase.a
2184 -rw-r--r-- 1 root admin 1114840 Jul 7 14:39 /usr/local/lib/libsphinxbase.a
MacBook-Air-de-AndreNatal:gecko-dev andrenatal$ ls -lsa /usr/local/lib/libpocketsphinx.a
2352 -rw-r--r-- 1 root admin 1201240 Jul 7 14:52 /usr/local/lib/libpocketsphinx.a

> When the pref is enabled, how much does it use memory on desktop, what
> about on b2g?

On b2g, it uses memory only after the decoder has been activated and has loaded the models. I profiled it on a ZTE Open C; here is the report [2] and here is the exact snapshot [3]. About 21 MB appears to be in use after the models are loaded. On desktop Mac OS Nightly, memory usage was ~11 MB.

[2] https://www.dropbox.com/s/cf1drl3thkf6mp1/memory-reports?dl=0
[3] https://www.dropbox.com/s/1rt6z9t5h30whn0/Vaani_b2g_openc.png?dl=0

___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
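As a quick check on the ~2.3 MB figure in the reply above, the two byte counts from the quoted `ls -lsa` output can be added up. This is only a sketch: it covers the static archives as installed on one Mac, so the size added to a packaged, compressed build could well differ. The helper names are made up for illustration.

```python
# Byte sizes copied from the `ls -lsa` output quoted in the message.
LIB_SIZES_BYTES = {
    "libsphinxbase.a": 1114840,
    "libpocketsphinx.a": 1201240,
}


def total_bytes(sizes):
    """Sum the sizes of all listed libraries, in bytes."""
    return sum(sizes.values())


def to_mb(n_bytes):
    """Decimal megabytes (1 MB = 1,000,000 bytes)."""
    return n_bytes / 1_000_000


def to_mib(n_bytes):
    """Binary mebibytes (1 MiB = 1,048,576 bytes)."""
    return n_bytes / (1024 * 1024)


total = total_bytes(LIB_SIZES_BYTES)
print(f"{total} bytes = {to_mb(total):.2f} MB = {to_mib(total):.2f} MiB")
# prints: 2316080 bytes = 2.32 MB = 2.21 MiB
```

The ~2.3 MB quoted in the message matches the decimal-megabyte reading; in binary units the same archives come to roughly 2.2 MiB.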
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Thanks Nick, I appreciate your help.

I created two versions of the Fennec APK: one [1] with the English models bundled (43.7 MB), and another [2] without them (34.6 MB). This is the mozconfig I used [3].

Actually, I had a conversation with Jonas Sicking some months ago, and we agreed that the ideal scenario is to let users download the package for the language they prefer from some sort of preferences screen, instead of shipping the models bundled into the APK.

[1] https://www.dropbox.com/s/6snv6e3mqqcs4zi/fennec-34.0a1.en-US.android-arm.apk?dl=0
[2] https://www.dropbox.com/s/zxxop34unj21r1s/fennec-35.0a1.en-US.android-arm.apk?dl=0
[3]
#DEBUG
#ac_add_options --enable-debug
#ac_add_options --enable-trace-malloc
#ac_add_options --enable-accessibility
#ac_add_options --enable-signmar
ac_add_options --disable-tests

# android options
ac_add_options --enable-application=mobile/android
ac_add_options --with-android-ndk="/Volumes/extra/android-ndk-r8e/"
ac_add_options --with-android-sdk="/Volumes/extra/android-sdk-macosx/platforms/android-19/"

# FOR ARM
ac_add_options --target=arm-linux-androideabi
mk_add_options MOZ_OBJDIR=./obj-arm-linux-androideabi-debug

# FOR 386
#ac_add_options --target=i386-linux-android
#mk_add_options MOZ_OBJDIR=./objdir-droid-i386

On Thu, Oct 30, 2014 at 9:36 PM, Nick Alexander wrote:
> First, Andre, let me offer my congratulations on getting this project to
> this point. We've talked a few times and I've always been impressed.
>
> Can you point me at Fennec try builds? I vaguely recall that these speech
> recognition approaches require large pattern matching files, and I'd like
> to see what including the Speech API does to the Fennec APK size. We're
> pushing pretty hard on reducing our APK size right now because we believe
> it's a big barrier to entry and especially to upgrading older devices.
>
> Nick
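To put the two APK sizes from the reply above side by side, here is a small sketch of the arithmetic. It assumes the whole difference comes from the bundled English models; note that the two linked file names carry different version strings (34.0a1 and 35.0a1), so the result is an approximation at best. The function name is hypothetical.

```python
# APK sizes in MB, as reported in the message.
APK_WITH_MODELS_MB = 43.7
APK_WITHOUT_MODELS_MB = 34.6


def model_overhead(with_mb, without_mb):
    """Return (absolute increase in MB, relative increase in percent)."""
    delta = with_mb - without_mb
    return delta, 100 * delta / without_mb


delta_mb, pct = model_overhead(APK_WITH_MODELS_MB, APK_WITHOUT_MODELS_MB)
print(f"models add about {delta_mb:.1f} MB ({pct:.0f}% larger APK)")
# prints: models add about 9.1 MB (26% larger APK)
```

An increase of this order is consistent with the preference, stated in the message, for downloading models on demand rather than bundling them.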
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Awesome to see this mail, Andre!

And remember that we do have the pages set up on MDN ready to be filled in also.

https://developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

Once this is shipped, do you think we can find some time to start collaborating on these docs?

Chris Mills
Senior tech writer || Mozilla
developer.mozilla.org || MDN
cmi...@mozilla.com || @chrisdavidmills

> On 31 Oct 2014, at 02:27, Marco Chen wrote:
>
> Hi Andre,
>
> This is nice work, and I look forward to voice recognition on B2G.
>
> Besides the final result, I am also interested in why you migrated
> from SpeechRTC -> emscripten -> Web Speech API.
> Could you share what triggered each of these transitions? That
> could be a lesson learned for us.
>
> ex: SpeechRTC -> voice recognition can't be performed locally.
> emscripten -> performance issue? or license issue? or ?
>
> Thanks,
> Sincerely yours.
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Hi Andre,

This is nice work, and I look forward to voice recognition on B2G.

Besides the final result, I am also interested in why you migrated from SpeechRTC -> emscripten -> Web Speech API.
Could you share what triggered each of these transitions? That could be a lesson learned for us.

ex: SpeechRTC -> voice recognition can't be performed locally.
emscripten -> performance issue? or license issue? or ?

Thanks,
Sincerely yours.

- Original Message -
From: "Andre Natal"
To: dev-platform@lists.mozilla.org, "Sandip Kamat" , "Olli.Pettay"
Sent: Friday, October 31, 2014 7:18:06 AM
Subject: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
On 31/10/2014 11:45 AM, Chris Hofmann wrote:
> The place we really need this is b2g, but phones are only shipping in
> international markets right now so english only is not all that helpful.

While this doesn't change the point you are making in any way, FWIW, Firefox OS phones are on sale in Australia via one of our largest electronics retailers:

https://www.jbhifi.com.au/phones/Outright-Mobile-Handsets/zte/zte-open-c-handset-grey/624980/
http://www.gizmodo.com.au/2014/10/jb-hi-fi-is-now-selling-australias-first-firefox-os-phone/

Nice!

Mark
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
On 10/30/14 5:24 PM, smaug wrote:
> On 10/31/2014 02:21 AM, smaug wrote:
>> Intent to ship is too strong for this.
>> We need to first have the implementation landed and tested ;)
>>
>> I wouldn't ship the implementation in desktop FF without plenty more
>> testing.
>
> But I guess the question is what people think about shipping the
> Pocketsphinx + API, even if disabled by default.
>
> Andre, we need some numbers here. How much does Pocketsphinx increase
> binary size? or download size?
> When the pref is enabled, how much does it use memory on desktop, what
> about on b2g?

This is important work, and the competition is ramping up quickly after many years of promises about this year being the year of voice recognition. We will probably fall behind quickly if we don't get something going here in the next year.

Can you also talk a bit about what the plan and set of challenges look like for expanding the supported languages, and how these would impact the numbers Olli has asked for?

The place we really need this is b2g, but phones are only shipping in international markets right now, so English only is not all that helpful.

-chofmann
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
Intent to ship is too strong for this.
We need to first have the implementation landed and tested ;)

I wouldn't ship the implementation in desktop FF without plenty more testing.

-Olli
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
On 10/31/2014 02:21 AM, smaug wrote:
> Intent to ship is too strong for this.
> We need to first have the implementation landed and tested ;)
>
> I wouldn't ship the implementation in desktop FF without plenty more
> testing.

But I guess the question is what people think about shipping the Pocketsphinx + API, even if disabled by default.

Andre, we need some numbers here. How much does Pocketsphinx increase binary size? or download size?
When the pref is enabled, how much does it use memory on desktop, what about on b2g?

-Olli
Re: Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
On 2014-10-30, 4:18 PM, Andre Natal wrote:
> I've been researching speech recognition in Firefox for two years. First
> SpeechRTC, then emscripten, and now Web Speech API with CMU pocketsphinx
> [1] embedded in Gecko C++ layer, project that I had the luck to develop for
> Google Summer of Code with the mentoring of Olli Pettay, Guilherme
> Gonçalves, Steven Lee, Randell Jesup plus others and with the management of
> Sandip Kamat.
>
> The implementation already works in B2G, Fennec and all FF desktop
> versions, and the first language supported will be english. The API and
> implementation are in conformity with W3C standard [2]. The preference to
> enable it is: media.webspeech.service.default = pocketsphinx

First, Andre, let me offer my congratulations on getting this project to this point. We've talked a few times and I've always been impressed.

Can you point me at Fennec try builds? I vaguely recall that these speech recognition approaches require large pattern matching files, and I'd like to see what including the Speech API does to the Fennec APK size. We're pushing pretty hard on reducing our APK size right now because we believe it's a big barrier to entry and especially to upgrading older devices.

Nick
Intent to ship: Web Speech API - Speech Recognition with Pocketsphinx
I've been researching speech recognition in Firefox for two years: first SpeechRTC, then emscripten, and now the Web Speech API with CMU Pocketsphinx [1] embedded in the Gecko C++ layer, a project that I had the luck to develop for Google Summer of Code with the mentoring of Olli Pettay, Guilherme Gonçalves, Steven Lee, Randell Jesup plus others, and with the management of Sandip Kamat.

The implementation already works in B2G, Fennec and all FF desktop versions, and the first language supported will be English. The API and implementation conform to the W3C standard [2]. The preference to enable it is: media.webspeech.service.default = pocketsphinx

The patches required to achieve this are:

- Import pocketsphinx sources in Gecko. Bug 1051146 [3]
- Embed English models. Bug 1065911 [4]
- Change SpeechGrammarList to store grammars inside SpeechGrammar objects. Bug 1088336 [5]
- Creation of a SpeechRecognitionService for Pocketsphinx. Bug 1051148 [6]

Also, other important features that we don't have patches for yet:

- Relax the VAD strategy to be less strict and avoid stopping in the middle of speech when low-volume phonemes are spoken [7]
- Integrate or develop a grapheme-to-phoneme algorithm for real-time generation when compiling grammars [8]
- Include and build models for other languages [9]
- Continuous and wordspotting recognition [10]

The WIP repo is here [11], and this Air Mozilla video [12] plus this wiki [13] have more detailed info.

In this comment you can see a flame graph of CPU usage captured while recognition is happening [14].

I'd like to hear your comments.

Thanks,

Andre Natal

[1] http://cmusphinx.sourceforge.net/
[2] https://dvcs.w3.org/hg/speech-api/raw-file/tip/speechapi.html
[3] https://bugzilla.mozilla.org/show_bug.cgi?id=1051146
[4] https://bugzilla.mozilla.org/show_bug.cgi?id=1065911
[5] https://bugzilla.mozilla.org/show_bug.cgi?id=1088336
[6] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148
[7] https://bugzilla.mozilla.org/show_bug.cgi?id=1051604
[8] https://bugzilla.mozilla.org/show_bug.cgi?id=1051554
[9] https://bugzilla.mozilla.org/show_bug.cgi?id=1065904 and https://bugzilla.mozilla.org/show_bug.cgi?id=1051607
[10] https://bugzilla.mozilla.org/show_bug.cgi?id=967896
[11] https://github.com/andrenatal/gecko-dev
[12] https://air.mozilla.org/mozilla-weekly-project-meeting-20141027/ (Jump to 12:00)
[13] https://wiki.mozilla.org/SpeechRTC_-_Speech_enabling_the_open_web
[14] https://bugzilla.mozilla.org/show_bug.cgi?id=1051148#c14