Re: [PD] speech recognition and ethics

Jamie Bullock Sat, 07 Feb 2015 11:30:55 -0800


> On 7 Feb 2015, at 19:12, david medine <[email protected]> wrote:
> 
> One of the bad things about Google is that it is essentially a giant 
> billboard. Having said that, I am going to advertise a couple of things.
> 
> If you want a speech recognition API that doesn't rely on a tax-exempt 
> corporation that has more money than the nation of Russia, builds its 
> products in unsafe overseas sweatshops, charges you $99/year to develop 
> software for the device you already paid for, eagerly aids the federal 
> government in unconstitutional spying, or is in the process of assimilating 
> all of human culture, you might want to check CMU's speech recognition 
> toolkit, Sphinx. 
> http://cmusphinx.sourceforge.net/
> 
> Another advantage of Sphinx is that it doesn't rely on internet access to 
> decode speech. And, someone even wrote a simple Pd extern with Sphinx.  
> https://github.com/dmedine/recog_tilde
> 
> And yes, it is quite difficult to train Sphinx. Building a dictionary is 
> copious work, and Google and Apple have done it 1000 times better than anyone else 
> because they have mountains of data and cash and luxury model machine 
> learning algorithms. . . but no one ever said DIY was easy. 
> 

Depending on what you want to do, some of the pre-existing CMU models / 
dictionaries are quite good. If what you want to do is "voice commands", or 
something else with a highly restricted lexicon, you might be OK. 

Arbitrary speech recognition is a different matter, but then Siri doesn't do 
that well either. It works on a clever set of assumptions about the most common 
use cases.

Best,

Jamie



>> On 2/7/15 9:55 AM, Spencer Russell wrote:
>> I saw a really interesting talk last year by Johan Schalkwyk, the head of 
>> the Google speech recognition group. One of the points he made was that 
>> while Google's algorithms are important, they got a lot more leverage from 
>> the sheer amount of data they have access to. It allows them to get away 
>> with much simpler algorithms. I think that's one of the biggest problems 
>> with trying to compete with Google and Apple on speech recognition, because 
>> OSS developers just don't have access to a huge corpus of data. 
>>  
>> Even though a lot of that data is unlabeled (they don't know what the actual 
>> words are that correspond to the audio), they have a huge amount of 
>> interaction data, so they can for instance look at whether the user tried 
>> multiple times with a particular phrase or whether the user accepted a given 
>> transcription.
>>  
>> It seems like if we want an open-source speech recognition package we should 
>> focus on finding ways to get an accessible shared corpus. Unless there was 
>> some tricky licensing I think that corpus would also benefit the big guys 
>> though, so their corpus would remain a proper superset of what's available 
>> to OSS developers.
>>  
>>  
>>> On Sat, Feb 7, 2015, at 11:39 AM, Jonathan Wilkes via Pd-list wrote:
>>> Hi list,
>>>  
>>> Here's a fun thought-experiment: suppose you're doing a port of Pd, and the 
>>> graphics toolkit you're using will include functionality to hook in to 
>>> Google's speech recognition API.  Such an API could make the 
>>> software accessible to people who would otherwise find it very hard to 
>>> write Pd patches.
>>>  
>>> However, the API works by shipping off your audio data to Google's servers, 
>>> doing the computation on their machines, and sending you back the results.
>>>  
>>> Do you use the API in your port, or not?
>>>  
>>> I'm decidedly not going to use that API, for what I think are 
>>> obvious security, privacy, and philosophical reasons.  But I'm curious just 
>>> how obvious the security and privacy implications are to others here.  How 
>>> many people would use a speech-patching mechanism that sends all your 
>>> speech to Google?
>>>  
>>> I'm also increasingly worried by the apparent gap between the usability of 
>>> Google and Apple's products, and the seemingly glacial pace at which 
>>> _usable_ free software speech recognition is being developed.  My position 
>>> won't change, but I'm afraid it's becoming more symbolic than practical as 
>>> these insecure tools become a natural part of most people's lives.
>>>  
>>> -Jonathan
>>> _______________________________________________
>>> [email protected] mailing list
>>> UNSUBSCRIBE and account-management -> 
>>> http://lists.puredata.info/listinfo/pd-list
