GSoC Project Status Update 03: Speech Recognition Facility in Openmoko

2008-06-22 Thread saurabh gupta
Hello everyone,

This is the status update for the GSoC project, Speech Recognition Facility
in Openmoko. This week, most of the time went into writing new code and
optimizing the existing code. I have written many subroutines, including the
forward-backward procedure, LPC and cepstral analysis of framed speech
signals, the Viterbi algorithm, and a training algorithm based on the
segmental K-means method. All of the source code compiles successfully with
the GNU C compiler.
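
To illustrate the Viterbi step listed above, here is a minimal sketch for a
discrete-observation HMM. It is not taken from the project's source: the
names (hmm_t, score_add, viterbi, LOG_ZERO), the model sizes, and the choice
of integer log-scores are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* All sizes below are hypothetical and chosen only for illustration. */
#define N_STATES   3                  /* HMM states per word model    */
#define N_SYMBOLS  4                  /* discrete observation symbols */
#define MAX_FRAMES 64                 /* longest utterance handled    */
#define LOG_ZERO   (INT32_MIN / 2)    /* stands in for log(0)         */

/* Discrete HMM stored as integer log-scores (every value is <= 0). */
typedef struct {
    int32_t log_pi[N_STATES];             /* initial state scores */
    int32_t log_a[N_STATES][N_STATES];    /* transition scores    */
    int32_t log_b[N_STATES][N_SYMBOLS];   /* emission scores      */
} hmm_t;

/* Add two log-scores, clamping at LOG_ZERO so the sum cannot overflow. */
static int32_t score_add(int32_t x, int32_t y)
{
    int64_t s = (int64_t)x + (int64_t)y;
    return s < LOG_ZERO ? LOG_ZERO : (int32_t)s;
}

/*
 * Find the best state sequence for obs[0..n_frames-1] and write it to path.
 * Every obs[t] must be smaller than N_SYMBOLS.
 * Returns the score of the best path, or LOG_ZERO if n_frames is 0 or
 * larger than MAX_FRAMES.
 */
int32_t viterbi(const hmm_t *m, const uint8_t *obs, size_t n_frames,
                uint8_t *path)
{
    uint8_t back[MAX_FRAMES][N_STATES];   /* best predecessor per frame */
    int32_t delta[N_STATES], next[N_STATES];
    size_t t, i, j, best_i;

    if (n_frames == 0 || n_frames > MAX_FRAMES)
        return LOG_ZERO;

    /* Initialisation with the first observation. */
    for (j = 0; j < N_STATES; j++)
        delta[j] = score_add(m->log_pi[j], m->log_b[j][obs[0]]);

    /* Recursion: keep only the best predecessor of each state. */
    for (t = 1; t < n_frames; t++) {
        for (j = 0; j < N_STATES; j++) {
            int32_t best = score_add(delta[0], m->log_a[0][j]);
            best_i = 0;
            for (i = 1; i < N_STATES; i++) {
                int32_t s = score_add(delta[i], m->log_a[i][j]);
                if (s > best) {
                    best = s;
                    best_i = i;
                }
            }
            next[j] = score_add(best, m->log_b[j][obs[t]]);
            back[t][j] = (uint8_t)best_i;
        }
        for (j = 0; j < N_STATES; j++)
            delta[j] = next[j];
    }

    /* Termination: pick the best final state, then backtrack. */
    best_i = 0;
    for (j = 1; j < N_STATES; j++)
        if (delta[j] > delta[best_i])
            best_i = j;

    path[n_frames - 1] = (uint8_t)best_i;
    for (t = n_frames - 1; t > 0; t--)
        path[t - 1] = back[t][path[t]];

    return delta[best_i];
}
```

Working with log-scores turns the products of probabilities into integer
additions, which avoids both floating point and numerical underflow. How the
scores are scaled is left open here.
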
 Several optimizations were made so that the code can run on the ARM
16/32-bit processor, which runs at 266 or 400 MHz maximum. All of the code
uses fixed-point arithmetic. I used external libraries for some of the
subroutines and converted them to fixed-point arithmetic. Another
optimization was to train the HMM models with the segmental K-means
procedure rather than the Baum-Welch algorithm, which requires more
processing because it accounts for all possible hidden state sequences for a
given observation sequence. The segmental K-means method instead uses the
Viterbi algorithm to find the single best state sequence, re-estimates the
HMM parameters from it, and iterates. It has been reported to give good
results with less processing than Baum-Welch. A further optimization
concerns the probability density function. Since this project targets a
small vocabulary (around 5 to 10 words), vector quantization will be used to
produce discrete observation symbols instead of modelling continuous
observation densities. Vector quantization is faster and yields good results
for applications on small embedded devices. The vector quantization source
code is almost finished. Soon after that, the speech recognition code will
be tested on the collected speech samples.
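
As a rough sketch of how vector quantization and fixed-point arithmetic fit
together, the function below maps one feature frame to the index of its
nearest codeword using integer operations only. The Q15 format, the type and
function names (q15_t, vq_dist2, vq_quantize), and the row-major codebook
layout are assumptions for illustration, not the project's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Q15 fixed point (an assumed format): real value = raw / 32768. */
typedef int16_t q15_t;

/* Squared Euclidean distance between two Q15 vectors.
 * The difference of two 16-bit values needs 17 bits and its square does
 * not fit in a signed 32-bit integer, so the sum is kept in 64 bits. */
static int64_t vq_dist2(const q15_t *a, const q15_t *b, size_t dim)
{
    int64_t acc = 0;
    size_t i;

    for (i = 0; i < dim; i++) {
        int32_t d = (int32_t)a[i] - (int32_t)b[i];
        acc += (int64_t)d * d;
    }
    return acc;
}

/*
 * Return the index of the codeword closest to frame.
 * codebook holds n_codes vectors of length dim, stored one after another.
 * n_codes must be at least 1. On a tie the lowest index wins.
 */
size_t vq_quantize(const q15_t *frame, const q15_t *codebook,
                   size_t n_codes, size_t dim)
{
    size_t best = 0;
    size_t k;
    int64_t best_d = vq_dist2(frame, codebook, dim);

    for (k = 1; k < n_codes; k++) {
        int64_t d = vq_dist2(frame, codebook + k * dim, dim);
        if (d < best_d) {
            best_d = d;
            best = k;
        }
    }
    return best;
}
```

The returned index is the discrete observation symbol that a discrete HMM
would consume for that frame, so no per-frame density needs to be evaluated.
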
   I have uploaded all the documents (Design Document version 0.2) and the
source code to the Openmoko svn repository (
https://svn.projects.openmoko.org/svnroot/speech/). Any comments and
suggestions will be highly appreciated.

http://saurabh1403.wordpress.com/

Regards
-- 
Saurabh Gupta
Electronics and Communication Engg.
NSIT,New Delhi
___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: GSoC Project Status Update 03: Speech Recognition Facility in Openmoko

2008-06-22 Thread prishelec
Will it be possible to use it for voice dialing?
You said the vocabulary is 5-10 words. Will that be enough?
It would be cool if I could say "Message to Jane" to open the SMS dialog,
or "Call to Jane" to place a call, presuming Jane is a hot chick ;-)
Is it possible?



Re: GSoC Project Status Update 03: Speech Recognition Facility in Openmoko

2008-06-22 Thread saurabh gupta
Hello,

On Mon, Jun 23, 2008 at 1:05 AM, [EMAIL PROTECTED] wrote:

 Will it be possible to use it for voice dialing?
 You said the vocabulary is 5-10 words. Will that be enough?
 It would be cool if I could say "Message to Jane" to open the SMS dialog,
 or "Call to Jane" to place a call, presuming Jane is a hot chick ;-)
 Is it possible?


Yes, it is of course possible.

However, it requires connected-word speech recognition, which needs
level-building algorithms and proper noise handling, along with grammar
learning for the machine. This project has great scope and can be extended a
long way. In the short duration of a GSoC project, though, I don't think it
will be possible to incorporate these advanced features. The initial aim is
to provide an API through which users can store their own words individually
and associate an activity with each word. When a word is detected, the API
call for the activity associated with that word will be invoked. I have
included these points, along with the scope for advanced models using speech
recognition, in my Design Document. I think that once the individual-word
recognition application is built, the advanced features can be added on top
of it in a newer version.
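
To make the idea of connecting an activity to a stored word more concrete,
here is a minimal sketch of such a registry. Every name in it
(word_registry_t, registry_add, registry_dispatch, demo_count_action) is
hypothetical and only illustrates the shape such an API might take. It does
not describe the interface in the Design Document.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 10   /* assumed limit, matching the 5-10 word vocabulary */

/* Callback run when the associated word has been recognized. */
typedef void (*word_action_fn)(void *user_data);

typedef struct {
    const char *label;       /* name of the stored word (not copied) */
    word_action_fn action;   /* activity connected to the word       */
    void *user_data;         /* passed through to the action         */
} word_entry_t;

typedef struct {
    word_entry_t entries[MAX_WORDS];
    size_t count;
} word_registry_t;

/* Connect an action to a word. Returns the slot index, or -1 if the
 * registry is full or an argument is missing. The label string must
 * stay valid for as long as the registry is used. */
int registry_add(word_registry_t *r, const char *label,
                 word_action_fn action, void *user_data)
{
    if (r->count >= MAX_WORDS || label == NULL || action == NULL)
        return -1;
    r->entries[r->count].label = label;
    r->entries[r->count].action = action;
    r->entries[r->count].user_data = user_data;
    return (int)r->count++;
}

/* Run the action for a recognized word. Returns 0 if an action ran and
 * -1 if the word is unknown. The recognizer itself is out of scope here. */
int registry_dispatch(const word_registry_t *r, const char *label)
{
    size_t i;

    for (i = 0; i < r->count; i++) {
        if (strcmp(r->entries[i].label, label) == 0) {
            r->entries[i].action(r->entries[i].user_data);
            return 0;
        }
    }
    return -1;
}

/* Example action: counts how often it was triggered. A real action
 * might open the dialer or the SMS dialog instead. */
void demo_count_action(void *user_data)
{
    ++*(int *)user_data;
}
```

A caller would register each trained word once, for example with
registry_add(&r, "call", demo_count_action, &hits), and the recognizer would
call registry_dispatch(&r, "call") after detecting that word.
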



-- 
Saurabh Gupta
Electronics and Communication Engg.
NSIT,New Delhi