GSoC Project Status Update 03: Speech Recognition Facility in Openmoko
Hello everyone,

This is the status update for the GSoC project, Speech Recognition Facility in Openmoko.

This week, most of the time went into writing new code and optimizing the existing code. I have written subroutines for the forward-backward procedure, LPC and cepstral analysis of speech signals in frames, the Viterbi algorithm, and training using the segmental K-means method. All the source code compiles successfully with the GNU C compiler.

Several optimizations make the code suitable for the ARM 16/32-bit processor, which runs at 266 MHz or at most 400 MHz. The whole code is written using fixed-point arithmetic. I used external libraries for some subroutines and converted them to fixed-point arithmetic.

Another optimization is the choice of the segmental K-means procedure for training the HMM models rather than the Baum-Welch algorithm, which requires more processing because it accounts for all possible hidden state sequences for a given observation sequence. The segmental K-means method instead uses the Viterbi algorithm to find the single best state sequence, and then iterates to re-estimate and train the HMM model. The segmental K-means method has been shown to give good results and to run faster than Baum-Welch.

The last optimization concerns the probability density function. As this project aims at a small vocabulary (around 5 to 10 words), vector quantization with discrete observation symbols will be used instead of continuous observation densities. Vector quantization is faster and yields good results for applications on small embedded devices. The vector quantization source code is nearly finished. Soon after that, the actual testing of the speech recognition code will be done on the collected speech samples.

I have uploaded all documents (Design Document version 0.2) and the source code to the Openmoko svn repository ( https://svn.projects.openmoko.org/svnroot/speech/). Any comments and suggestions will be highly appreciated.
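To make the combination of vector quantization, fixed-point arithmetic and Viterbi decoding concrete, here is a minimal sketch in C. It is not the code from the svn repository. All names (DiscreteHmm, vq_quantize, viterbi_fixed), the model sizes, and the convention of storing log-probabilities as pre-scaled integers are assumptions made only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define N_STATES   3    /* hypothetical: states per word model */
#define N_SYMBOLS  4    /* hypothetical: VQ codebook size */
#define FEAT_DIM   2    /* hypothetical: features per frame */
#define MAX_FRAMES 64   /* hypothetical: longest utterance, in frames */
#define LOG_ZERO   (INT32_MIN / 2)   /* integer stand-in for log(0) */

/* Discrete HMM whose parameters are log-probabilities scaled to integers
   offline (assumption), so no floating point is needed at run time. */
typedef struct {
    int32_t log_pi[N_STATES];
    int32_t log_a[N_STATES][N_STATES];
    int32_t log_b[N_STATES][N_SYMBOLS];
} DiscreteHmm;

/* Vector quantization: index of the codeword nearest to one frame
   (squared Euclidean distance, integers only). The codebook is a flat
   array of N_SYMBOLS * FEAT_DIM values. */
uint8_t vq_quantize(const int16_t *codebook, const int16_t *frame)
{
    uint8_t best = 0;
    int64_t best_dist = INT64_MAX;
    for (int k = 0; k < N_SYMBOLS; k++) {
        int64_t dist = 0;
        for (int d = 0; d < FEAT_DIM; d++) {
            int32_t diff = (int32_t)frame[d] - codebook[k * FEAT_DIM + d];
            dist += (int64_t)diff * diff;
        }
        if (dist < best_dist) { best_dist = dist; best = (uint8_t)k; }
    }
    return best;
}

/* Add two log-probabilities (both <= 0), saturating at LOG_ZERO. */
static int32_t add_logs_sat(int32_t a, int32_t b)
{
    assert(a <= 0 && b <= 0);
    int64_t sum = (int64_t)a + b;
    return (sum < LOG_ZERO) ? LOG_ZERO : (int32_t)sum;
}

/* Viterbi decoding over VQ symbols. Writes the best state sequence to
   path[0..T-1] and returns its integer log-score. */
int32_t viterbi_fixed(const DiscreteHmm *hmm, const uint8_t *obs, int T,
                      uint8_t *path)
{
    int32_t delta[N_STATES], next[N_STATES];
    uint8_t psi[MAX_FRAMES][N_STATES];   /* back-pointers */
    assert(T > 0 && T <= MAX_FRAMES);

    for (int j = 0; j < N_STATES; j++) {
        assert(obs[0] < N_SYMBOLS);
        delta[j] = add_logs_sat(hmm->log_pi[j], hmm->log_b[j][obs[0]]);
        psi[0][j] = 0;
    }
    for (int t = 1; t < T; t++) {
        assert(obs[t] < N_SYMBOLS);
        for (int j = 0; j < N_STATES; j++) {
            int32_t best = LOG_ZERO;
            int best_i = 0;
            for (int i = 0; i < N_STATES; i++) {
                int32_t s = add_logs_sat(delta[i], hmm->log_a[i][j]);
                if (s > best) { best = s; best_i = i; }
            }
            next[j] = add_logs_sat(best, hmm->log_b[j][obs[t]]);
            psi[t][j] = (uint8_t)best_i;
        }
        for (int j = 0; j < N_STATES; j++)
            delta[j] = next[j];
    }
    int best_state = 0;
    for (int j = 1; j < N_STATES; j++)
        if (delta[j] > delta[best_state]) best_state = j;
    path[T - 1] = (uint8_t)best_state;
    for (int t = T - 1; t > 0; t--)      /* trace the back-pointers */
        path[t - 1] = psi[t][path[t]];
    return delta[best_state];
}
```

Working in the log domain turns the probability products of Viterbi into integer additions, which is one common way to avoid both floating point and underflow on a processor without an FPU. In a segmental K-means style loop, the returned path would be used to reassign frames to states before the model parameters are re-estimated.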
http://saurabh1403.wordpress.com/

Regards,
--
Saurabh Gupta
Electronics and Communication Engg.
NSIT, New Delhi
___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
Re: GSoC Project Status Update 03: Speech Recognition Facility in Openmoko
Will it be possible to use it for voice dialing? You said the vocabulary is 5-10 words. Will that be enough? It would be cool to be able to say "Message to Jane" to open the SMS dialog, or "Call to Jane" to place a call, presuming Jane is a hot chick ;-) Is that possible?
Re: GSoC Project Status Update 03: Speech Recognition Facility in Openmoko
Hello,

On Mon, Jun 23, 2008 at 1:05 AM, [EMAIL PROTECTED] wrote:

Will it be possible to use it with voice dialing? You said vocabulary is 5-10. Will it be enough? It would be cool if I would have a possibility to say: Message to Jane to open sms dialog or Call to Jane to call, presuming Jane is a hot chick ;-) Is it possible?

Yes, it is certainly possible. However, it requires connected-word speech recognition, which needs level-building algorithms and proper noise handling, along with a grammar for the machine to learn. This project has great scope and can be extended a long way. In the short duration of a GSoC project, though, I don't think it will be possible to incorporate these advanced features.

The initial aim is to provide an API through which the user can store his or her own words individually and associate an activity with each word. When a word is detected, the function for the activity associated with that word will be called. I have covered these points, and the scope for more advanced speech recognition models, in my Design Document. Once the isolated-word recognition application is built, the advanced features can be added on top of it.

--
Saurabh Gupta
Electronics and Communication Engg.
NSIT, New Delhi
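The word-to-activity API described in the reply above could look roughly like the following C sketch. This is only an illustration of the idea, not the interface from the Design Document. The names WordTable, word_register and word_dispatch are hypothetical, and so is the assumption that the recognizer reports a detected word as the integer id returned at registration time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORDS 10   /* the thread mentions a vocabulary of about 5 to 10 words */

/* An activity is modelled as a callback with an opaque argument. */
typedef void (*action_fn)(void *user_data);

typedef struct {
    const char *label;      /* human-readable name of the stored word */
    action_fn   action;     /* activity to run when the word is detected */
    void       *user_data;  /* passed through to the action unchanged */
} WordBinding;

typedef struct {
    WordBinding words[MAX_WORDS];
    size_t      count;
} WordTable;

/* Bind an action to a newly stored word. Returns the word id that the
   recognizer is assumed to report later, or -1 if the table is full. */
int word_register(WordTable *t, const char *label, action_fn action,
                  void *user_data)
{
    assert(t != NULL && action != NULL);
    if (t->count >= MAX_WORDS)
        return -1;
    t->words[t->count].label = label;
    t->words[t->count].action = action;
    t->words[t->count].user_data = user_data;
    return (int)t->count++;
}

/* Run the action bound to a detected word. Returns 1 if an action ran,
   0 if the id does not belong to any stored word. */
int word_dispatch(const WordTable *t, int word_id)
{
    assert(t != NULL);
    if (word_id < 0 || (size_t)word_id >= t->count)
        return 0;
    t->words[word_id].action(t->words[word_id].user_data);
    return 1;
}

/* Example action standing in for something like "open the dialer":
   it only counts how often it was invoked. */
void count_action(void *user_data)
{
    ++*(int *)user_data;
}
```

With this layout, an application would register a word such as "call" once, keep the returned id alongside the trained model for that word, and pass the id of the best-scoring model to word_dispatch after each utterance.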