There is still the challenge of access to computational power, unless we
build a SETI@home-like distributed speech recognition crawler, which in
and of itself has similar ethical implications.
On 2/7/2015 12:55 PM, Spencer Russell wrote:
I saw a really interesting talk last year by Johan Schalkwyk, the head
of the Google speech recognition group. One of the points he made was
that while Google's algorithms are important, they got a lot more
leverage from the sheer amount of data they have access to. It allows
them to get away with much simpler algorithms. I think that's one of
the biggest problems with trying to compete with Google and Apple on
speech recognition, because OSS developers just don't have access to a
huge corpus of data.
Even though a lot of that data is unlabeled (they don't know what the
actual words are that correspond to the audio), they have a huge
amount of interaction data, so they can, for instance, look at whether
the user tried multiple times with a particular phrase or whether the
user accepted a given transcription.
It seems like if we want an open-source speech recognition package, we
should focus on finding ways to get an accessible shared corpus.
Unless there were some tricky licensing, though, I think that corpus
would also benefit the big guys, so their corpus would remain a proper
superset of what's available to OSS developers.
On Sat, Feb 7, 2015, at 11:39 AM, Jonathan Wilkes via Pd-list wrote:
Hi list,
Here's a fun thought-experiment: suppose you're doing a port of Pd,
and the graphics toolkit you're using will include functionality to
hook into Google's speech recognition API. Such an API could make
the software accessible to people who would otherwise find it very
hard to write Pd patches.
However, the API works by shipping off your audio data to Google's
servers, doing the computation on their machines, and sending you
back the results.
Do you use the API in your port, or not?
I'm decidedly not going to use that API, for what I think are obvious
security, privacy, and philosophical reasons. But I'm curious just
how obvious the security and privacy implications are to others
here. How many people would use a speech-patching mechanism that
sends all your speech to Google?
I'm also increasingly worried by the apparent gap between the
usability of Google's and Apple's products and the seemingly glacial
pace at which _usable_ free software speech recognition is being
developed. My position won't change, but I'm afraid it's becoming
more symbolic than practical as these insecure tools become a natural
part of most people's lives.
-Jonathan
_________________________________________________
[email protected] mailing list
UNSUBSCRIBE and account-management ->
http://lists.puredata.info/listinfo/pd-list
--
Ivica Ico Bukvic, D.M.A.
Associate Professor
Computer Music
ICAT Senior Fellow
DISIS, L2Ork
Virginia Tech
School of Performing Arts – 0141
Blacksburg, VA 24061
(540) 231-6139
[email protected]
www.performingarts.vt.edu
disis.music.vt.edu
l2ork.music.vt.edu