> On 7 Feb 2015, at 19:12, david medine <[email protected]> wrote:
>
> One of the bad things about Google is that it is essentially a giant
> billboard. Having said that, I am going to advertise a couple of things.
>
> If you want a speech recognition API that doesn't rely on a tax-exempt
> corporation that has more money than the nation of Russia, builds its
> products in unsafe overseas sweatshops, charges you $99/year to develop
> software for the device you already paid for, eagerly aids the federal
> government in unconstitutional spying, or is in the process of assimilating
> all of human culture, you might want to check out CMU's speech recognition
> toolkit, Sphinx.
> http://cmusphinx.sourceforge.net/
>
> Another advantage of Sphinx is that it doesn't rely on internet access to
> decode speech. And someone even wrote a simple Pd extern with Sphinx.
> https://github.com/dmedine/recog_tilde
>
> And yes, it is quite difficult to train Sphinx. Building a dictionary is
> copious work, and Google and Apple have done it 1000 times better than anyone
> else because they have mountains of data and cash and luxury model machine
> learning algorithms. . . but no one ever said DIY was easy.
>

Depending on what you want to do, some of the pre-existing CMU models and
dictionaries are quite good. If what you want to do is "voice commands", or
something else with a highly restricted lexicon, you might be OK. Arbitrary
speech recognition is a different matter, but then Siri doesn't do that well
either. It works on a clever set of assumptions about the most common use
cases.

Best,
Jamie

>> On 2/7/15 9:55 AM, Spencer Russell wrote:
>> I saw a really interesting talk last year by Johan Schalkwyk, the head of
>> the Google speech recognition group. One of the points he made was that
>> while Google's algorithms are important, they got a lot more leverage from
>> the sheer amount of data they have access to. It allows them to get away
>> with much simpler algorithms.
>> I think that's one of the biggest problems
>> with trying to compete with Google and Apple on speech recognition, because
>> OSS developers just don't have access to a huge corpus of data.
>>
>> Even though a lot of that data is unlabeled (they don't know what the actual
>> words are that correspond to the audio), they have a huge amount of
>> interaction data, so they can for instance look at whether the user tried
>> multiple times with a particular phrase or whether the user accepted a given
>> transcription.
>>
>> It seems like if we want an open-source speech recognition package we should
>> focus on finding ways to get an accessible shared corpus. Unless there was
>> some tricky licensing, I think that corpus would also benefit the big guys,
>> though, so their corpus would remain a proper superset of what's available
>> to OSS developers.
>>
>>
>>> On Sat, Feb 7, 2015, at 11:39 AM, Jonathan Wilkes via Pd-list wrote:
>>> Hi list,
>>>
>>> Here's a fun thought-experiment: suppose you're doing a port of Pd, and the
>>> graphics toolkit you're using will include functionality to hook into
>>> Google's speech recognition API. Such an API could make the
>>> software accessible to people who would otherwise find it very hard to
>>> write Pd patches.
>>>
>>> However, the API works by shipping off your audio data to Google's servers,
>>> doing the computation on their machines, and sending you back the results.
>>>
>>> Do you use the API in your port, or not?
>>>
>>> I'm decidedly not going to use that API, for what I think are
>>> obvious security, privacy, and philosophical reasons. But I'm curious just
>>> how obvious the security and privacy implications are to others here. How
>>> many people would use a speech-patching mechanism that sends all your
>>> speech to Google?
>>>
>>> I'm also increasingly worried by the apparent gap between the usability of
>>> Google and Apple's products, and the seemingly glacial pace at which
>>> _usable_ free software speech recognition is being developed. My position
>>> won't change, but I'm afraid it's becoming more symbolic than practical as
>>> these insecure tools become a natural part of most people's lives.
>>>
>>> -Jonathan
>>> _______________________________________________
>>> [email protected] mailing list
>>> UNSUBSCRIBE and account-management ->
>>> http://lists.puredata.info/listinfo/pd-list
