Peter,

While I don’t have direct experience with these programs, there are several 
open-source toolkits that may suit your use.

They are available at http://www.speech.cs.cmu.edu/.  The Sphinx toolkit 
might be the most useful.

Marc Siskin

On Dec 27, 2017, at 10:04 AM, Rick Harrison via use-livecode 
<use-livecode@lists.runrev.com> wrote:

Hi Peter,

What you are trying to do is extremely complex.

Each person’s voiceprint is unique to that person.
The waveform of one person pronouncing a
spoken phrase correctly can look quite different
from someone else’s waveform for the same
phrase, also spoken correctly, due to overtone
harmonics, voice pitch, speaking speed, etc.
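
To illustrate the pitch point, here is a minimal Python sketch (not from the
original message). It uses two pure tones as a stand-in for two voices; the
helper names tone and correlation are hypothetical. Comparing the raw samples
gives almost no similarity even though the pitch differs by only 10%.

```python
import math


def tone(freq_hz, seconds=1.0, rate=8000):
    """Generate a pure sine tone as a list of floats (a toy 'voice')."""
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def correlation(x, y):
    """Pearson correlation of two sample lists, truncated to equal length."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den if den else 0.0


same = correlation(tone(200), tone(200))    # identical pitch
shifted = correlation(tone(200), tone(220))  # pitch raised by 10%
print(f"same pitch: {same:.3f}, shifted pitch: {shifted:.3f}")
```

Real speech is far more complex than a sine wave, but the effect is the same:
sample-by-sample comparison punishes harmless differences in pitch and timing.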

Think about dictation software, and about how
it can still be like having a stupid secretary
which misses 5% or more of the words spoken.

It is a voice recognition problem which requires
a lot of “fuzzy logic” to get it right.  Companies
have spent millions of dollars, and tens of
thousands of hours developing these tools.

You need to find a codebase for this that
has already been developed, and that hopefully
is open-source (good luck with that one);
otherwise you will have to license it from some
company for a steep price.  It will most probably
also be a large program that requires a lot of
CPU resources and memory to run on a device.

Good luck, and let us know if you find a good
solution!

Rick


On Dec 27, 2017, at 7:16 AM, Peter Reid via use-livecode 
<use-livecode@lists.runrev.com> wrote:

I'm developing an app for cheap Android tablets (e.g. Amazon Fire 7in) that 
allows a user to practice speaking a set of words.  The app plays a sample of a 
word and the user then tries to say the same word.  So far the app can play 
sample words and capture the user's attempts for the same words.  The sample 
words and user attempts are uncompressed WAV files.

I'm trying to find the code to do the comparison of 2 WAV files.  Ideally, the 
code will be in the following formats (best first):

1. LiveCode
2. Pseudocode
3. Other code (Python, Java, C++ etc.)
4. Academic papers

I'm considering 2 general methods:

a. Compare 2 voice clips directly
b. Convert 2 voice clips to text (using voice-to-text) and then compare the 
words in text format
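
For method (b), once a voice-to-text engine has produced a string, the
comparison step itself is small. A minimal Python sketch, assuming the
recognised text is already available (word_match is a hypothetical helper, and
the recognition step is not shown):

```python
import difflib


def word_match(expected, recognised):
    """Return a 0.0-1.0 similarity between the target word and the
    recogniser's output, ignoring case and surrounding whitespace."""
    a = expected.strip().lower()
    b = recognised.strip().lower()
    return difflib.SequenceMatcher(None, a, b).ratio()


# Example: an exact match versus a near miss from the recogniser.
print(word_match("Apple", " apple "))
print(word_match("apple", "ample"))
```

A threshold on this ratio would have to be chosen by trial; note that it
measures spelling similarity, which is only a rough proxy for how close the
pronunciation was.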

Note that Ali Lloyd from the LiveCode team has developed various things to 
help.  However I've hit problems as follows:

a. Ali has wrapped a standard Android sound library that compares 2 WAV files 
and gives a percentage match. However the comparison is either far too 
forgiving or far too strict, i.e. highly unreliable.

b. Ali has wrapped a standard Android voice-to-text library which works well 
with the devices he's tried it on.  However, the Amazon tablets do not support 
this Android library!

Given the two developments from Ali both relied on prebuilt black-box code 
(Android Java libraries), I may have to implement a comparison algorithm from 
scratch. A solution that's completely in LiveCode would have several benefits:

i. it may work!
ii. it may work cross-platform
iii. it may be understandable!

General reading around this subject produces recommendations such as using FFTs 
(Fast Fourier Transforms), MFCCs (Mel-Frequency Cepstral Coefficients), etc. but 
I can't find anything that gives an end-to-end method, from sound in to 
comparative score out!
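
One possible shape for such a pipeline, as a rough Python sketch rather than a
proven method: read the WAV samples, cut them into short frames, take an FFT
of each frame, reduce it to a few log band energies (a crude stand-in for
MFCCs, which would add a mel filterbank and a cosine transform), align the two
frame sequences with dynamic time warping so speaking speed matters less, and
map the distance to a score. All function names, the frame and band sizes, and
the scale constant are assumptions for illustration and would need calibrating
against real recordings.

```python
import cmath
import math
import struct
import wave


def read_wav(path):
    """Read a 16-bit PCM WAV file; return (first-channel samples, rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels, rate = wf.getnchannels(), wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    return [v / 32768.0 for v in values[::channels]], rate


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def features(samples, frame=256, hop=128, bands=12):
    """Return one vector of mean-removed log band energies per frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / (frame - 1))
              for i in range(frame)]  # Hann window
    half = frame // 2
    edges = [1 + half * b // bands for b in range(bands + 1)]  # skip DC bin
    vectors = []
    for start in range(0, len(samples) - frame + 1, hop):
        chunk = [s * w for s, w in zip(samples[start:start + frame], window)]
        power = [abs(c) ** 2 for c in fft(chunk)]
        logs = [math.log(sum(power[edges[b]:edges[b + 1]]) + 1e-6)
                for b in range(bands)]
        mean = sum(logs) / bands
        vectors.append([v - mean for v in logs])  # removes overall loudness
    return vectors


def dtw_distance(a, b):
    """Dynamic time warping cost between two frame sequences,
    normalised by their combined length."""
    if not a or not b:
        raise ValueError("both clips must contain at least one frame")
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)
    for fa in a:
        cur = [inf] * (len(b) + 1)
        for j, fb in enumerate(b, start=1):
            cur[j] = math.dist(fa, fb) + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return prev[len(b)] / (len(a) + len(b))


def similarity(samples_a, samples_b, scale=5.0):
    """Map the DTW distance to a 0-100 score; scale is an arbitrary
    assumption that would need tuning on real data."""
    dist = dtw_distance(features(samples_a), features(samples_b))
    return 100.0 * math.exp(-dist / scale)
```

Usage would be along the lines of similarity(read_wav("sample.wav")[0],
read_wav("attempt.wav")[0]), with both files recorded at the same sample rate
(the file names are placeholders). Trimming silence and handling background
noise are left out and would matter in practice.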

Any help with this would be gratefully received!

Peter
--
Peter Reid
Loughborough, UK


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode



---------------
Marc Siskin
Manager, Modern Language Resource Center
Carnegie Mellon University
msis...@andrew.cmu.edu


