One of the bad things about Google is that it is essentially a giant
billboard. Having said that, I am going to advertise a couple of things.
If you want a speech recognition API that doesn't rely on a tax-exempt
corporation that has more money than the nation of Russia, builds its
products in unsafe overseas sweatshops, charges you $99/year to develop
software for the device you already paid for, eagerly aids the federal
government in unconstitutional spying, or is in the process of
assimilating all of human culture, you might want to check out CMU's speech
recognition toolkit, Sphinx.
http://cmusphinx.sourceforge.net/
Another advantage of Sphinx is that it doesn't rely on internet access
to decode speech. And, someone even wrote a simple Pd extern with Sphinx.
https://github.com/dmedine/recog_tilde
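For the curious, offline decoding with pocketsphinx looks roughly like the
sketch below. This uses the Python bindings; the model/dictionary paths, the
raw audio file name, and the exact API details are assumptions and may differ
between pocketsphinx versions.

# Rough sketch: decode a 16 kHz, 16-bit mono raw audio file offline with
# pocketsphinx. MODELDIR and the input file are placeholders.
from pocketsphinx.pocketsphinx import Decoder

MODELDIR = '/usr/local/share/pocketsphinx/model'  # adjust to your install

config = Decoder.default_config()
config.set_string('-hmm', MODELDIR + '/en-us/en-us')                # acoustic model
config.set_string('-lm', MODELDIR + '/en-us/en-us.lm.bin')          # language model
config.set_string('-dict', MODELDIR + '/en-us/cmudict-en-us.dict')  # pronunciation dictionary
decoder = Decoder(config)

decoder.start_utt()
with open('goforward.raw', 'rb') as f:
    while True:
        buf = f.read(1024)
        if not buf:
            break
        decoder.process_raw(buf, False, False)  # stream raw PCM into the decoder
decoder.end_utt()

if decoder.hyp() is not None:
    print('Best hypothesis:', decoder.hyp().hypstr)

The C API (ps_init(), ps_process_raw(), ps_get_hyp()) follows the same shape,
and as far as I can tell that is what recog~ wraps.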
And yes, it is quite difficult to train Sphinx. Building a dictionary is a
lot of work, and Google and Apple have done it a thousand times better than
anyone else because they have mountains of data and cash and luxury-model
machine learning algorithms... but no one ever said DIY was easy.
On 2/7/15 9:55 AM, Spencer Russell wrote:
I saw a really interesting talk last year by Johan Schalkwyk, the head
of the Google speech recognition group. One of the points he made was
that while Google's algorithms are important, they got a lot more
leverage from the sheer amount of data they have access to. It allows
them to get away with much simpler algorithms. I think that's one of
the biggest problems with trying to compete with Google and Apple on
speech recognition, because OSS developers just don't have access to a
huge corpus of data.
Even though a lot of that data is unlabeled (they don't know what the
actual words are that correspond to the audio), they have a huge amount
of interaction data, so they can, for instance, look at whether the user
tried multiple times with a particular phrase or whether the user
accepted a given transcription.
It seems like if we want an open-source speech recognition package we
should focus on finding ways to get an accessible shared corpus.
Unless there were some tricky licensing, though, I think that corpus would
also benefit the big guys, so their corpus would remain a proper superset
of what's available to OSS developers.
On Sat, Feb 7, 2015, at 11:39 AM, Jonathan Wilkes via Pd-list wrote:
Hi list,
Here's a fun thought-experiment: suppose you're doing a port of Pd,
and the graphics toolkit you're using will include functionality to
hook into Google's speech recognition API. Such an API could make
the software accessible to people who would otherwise find it very
hard to write Pd patches.
However, the API works by shipping off your audio data to Google's
servers, doing the computation on their machines, and sending you
back the results.
Do you use the API in your port, or not?
I'm decidedly not going to use that API, for what I think are obvious
security, privacy, and philosophical reasons. But I'm curious just
how obvious the security and privacy implications are to others
here. How many people would use a speech-patching mechanism that
sends all of their speech to Google?
I'm also increasingly worried by the apparent gap between the usability
of Google's and Apple's products and the seemingly glacial pace at
which _usable_ free software speech recognition is being
developed. My position won't change, but I'm afraid it's becoming
more symbolic than practical as these insecure tools become a natural
part of most people's lives.
-Jonathan
_______________________________________________
[email protected] mailing list
UNSUBSCRIBE and account-management ->
http://lists.puredata.info/listinfo/pd-list