Dear Sir,

I am a final-year student of Electronics and Electrical Engineering at Birla Institute of Technology and Science (BITS), Pilani. I am interested in the following two projects mentored by Ankur India:
1. Speech-based query and result retrieval system for Indian languages
2. Add a language model to speech recognition software for the Bengali language

I have been working in the field of speech processing for the last two years through the following study-oriented projects:

1. Real-time isolated-word recognition and a continuous-word recognizer for a vocabulary of 40 words (implemented on a dSPACE processor using a Simulink interface).
2. Phoneme recognition system and spoken-term detection by a phonetic string-matching approach using HTK.
3. Continuous speech recognition system trained on Assamese data with a vocabulary of 3,000 words, implemented in Sphinx-3.

For the last two semesters I worked under Dr. Solomon Raju, Senior Scientist, CEERI Pilani. In the current semester, as part of my final-year project, I have been working at a speech recognition start-up, Speechwarenet (TIC, IIT Guwahati), under Dr. S. R. M. Prasanna, Professor, IIT Guwahati. I have also worked on Asterisk and developed a voicemail server that exchanges voicemails between different users through the Asterisk interface.

If required, I can send the log files (10.falign_ci_hmm.zip), which are generated only during training in Sphinx-3 and cannot be downloaded from anywhere else, along with the output file from running the HVite decoder in HTK.

As my final-year project I am working on a project sponsored by VoxEdu on American pronunciation practice. The project will extend up to May 15, and I will be able to commit my full time to your projects after that.

Here is my interpretation, in terms of implementation, of the speech-based query and result retrieval system. The system would contain two modules:

1. Speech recognition system
2. User interface using Asterisk, which could use the Festival TTS engine for text-to-speech conversion

1. Speech recognition system: To develop a speech recognizer for recognizing user queries, an acoustic model has to be trained.
This demands a large amount of speech data with corresponding transcriptions; I suppose the data would be made available during project implementation. To train a language model, text data is needed, which should be readily available in the local language in which the system is to be deployed. The system could be trained using either HTK or Sphinx; personally I would recommend Sphinx-3, which is open source. The Sphinx-3 or Sphinx-4 decoder could then be used to recognize the audio file using the trained model; in terms of performance, Sphinx-4 is the better recognizer.

2. User interface using Asterisk: The following tasks could be performed using Asterisk:

1. Receive the call from the user and play an audio prompt (e.g. "What would you like to ask?" or "Ask your query after the beep", followed by a beep), generated either with Festival or by directly playing a previously recorded file.
2. Wait for a certain time (10 seconds or so), then receive and store the user's spoken query in a wave file.
3. Pass the wave file to the Sphinx recognizer and get the recognized text. Search the database for the result of the recognized query and retrieve the answer. Searching the database effectively demands implementation of a speech-tagging algorithm.
4. Answer the user by playing the result using Festival.

Please send me your feedback so that we can discuss the implementation of the project.

--
Nikhil Bhendawade, BITS Pilani
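To make step 3 concrete, the recognize-then-lookup stage could be sketched roughly as below. This is only an illustration under stated assumptions: recognize_wav() is a hypothetical stand-in for an actual call into the Sphinx-3/Sphinx-4 decoder (not a real Sphinx API), and the answer database is shown as a plain in-memory dictionary rather than a real tagged database.

```python
# Sketch of step 3 of the Asterisk flow: take the wave file recorded
# from the caller, obtain the recognized query text, and look up an
# answer to be played back via Festival.

def recognize_wav(wav_path):
    # Hypothetical wrapper around the Sphinx decoder. In the real
    # system this would invoke Sphinx-3/Sphinx-4 on the recorded file
    # (e.g. via a subprocess) and return the best hypothesis string.
    # Stubbed here so the pipeline can be read end to end.
    return "train timings guwahati"

# Toy answer store; a deployed system would query a real database,
# ideally after tagging/normalizing the recognized words.
ANSWERS = {
    "train timings guwahati": "The next train to Guwahati departs at 6 pm.",
}

def answer_query(wav_path):
    # Normalize the recognized text before the lookup.
    query = recognize_wav(wav_path).strip().lower()
    # Fall back to a re-prompt message when no answer is found;
    # Asterisk would then play this text through Festival.
    return ANSWERS.get(query, "Sorry, I could not find an answer. Please try again.")

print(answer_query("/tmp/query.wav"))
```

The same fallback string doubles as the re-prompt of step 1, so the dialplan can loop back to recording a fresh query when the lookup fails.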
_______________________________________________
Project-ideas mailing list
[email protected]
http://lists.ankur.org.in/listinfo.cgi/project-ideas-ankur.org.in
