On Wed, Apr 17, 2013 at 12:24 AM, srikanth ronanki <[email protected]> wrote:
>> Hi Ronanki,
>>
>> Most of the proposal is fine with me, but I would be particularly
>> interested in the points below:
>>
>> a) Collection of data and extension of it to a more general sense,
>> incorporating complex/compound sentence data.
>
> Collection of data is always a tedious task, but I am used to it. I
> recorded a small corpus (400 words + 120 sentences) as part of last
> year's project as well. I can do it with my own lab members if required.
> I have never done the second part. Extending the data to incorporate
> complex/compound sentences seems a good idea. I can definitely try
> that, given a broad picture of what exactly needs to be done.
>

What I had in mind was extending the data to a more general set that
broadly covers our daily speech styles. Please excuse me if I'm not
clear enough.

>> b) A voice activity detection algorithm you can use so that distortion
>> gets reduced
>
> Yes, we can use a voice activity detection (VAD) algorithm to reduce the
> distortion. The simplest way of doing it is to compute the signal-to-noise
> ratio (SNR) and detect voiced regions based on some threshold. We can
> also use time-efficient spectral subtraction methods based on VAD to
> reduce the noise, if any.
>
> I worked along these lines to some extent earlier:
> http://web.iiit.ac.in/~ronanki/speech_enhancement/
>
> (sorry, if you couldn't play those files)
>

I had a look, but unfortunately couldn't play them.

>> c) extension to a global set for porting and extension into other languages
>
> A global phone set is somewhat similar to this type:
> http://homepage.ntlworld.com/stone-catend/trimain1.htm
>
> We can replace the Latin symbols with phones that can be represented
> with English letters.
> For example:
> Telugu - a అ, aa ఆ, i ఇ, ii ఈ, u ఉ, uu ఊ, rq ఋ, rrq ౠ, e ఎ, ee ఏ, ai
> ఐ, o ఒ, oo ఓ, au ఔ
> Bengali - ao অ, aa আ, i ই,ঈ, u উ,ঊ, rq ঋ, rrq ৠ, e এ, oi ঐ, o ও, ou ঔ
>

Fine by me.

>> d) Achieving word sense disambiguation
>>
> I don't have any great idea on this as of now. But, sure, I can work
> on this once I get to know about the corpus/text.
>

Would an annotator along the lines of the one below help?
http://code.google.com/p/ytex/wiki/WordSenseDisambiguation_V08

>> and finally
>>
>> e) working 48 hrs/week (which is equivalent to full-time work)
>
> GSoC requires 30hrs per week and I know that it won't be pretty much
> same with my previous experience. 48hrs per week is the maximum time I
> can work on the project in any given week. Sometimes, I may take a
> week off during the project and sometimes I will work to my full
> potential.
> It all depends on how much interest we bring to the work. I
> would be very happy to work on a project in Indian languages and to
> get a notable accuracy. I am looking forward to doing this project and
> will definitely try my best to meet the goals and milestones as per
> the schedule.
>

Fine again. Thanks for your replies.

Regards,

-- 
Bhavani Shankar
Ubuntu Developer | www.ubuntu.com
https://launchpad.net/~bhavi
_______________________________________________
Project-ideas mailing list
[email protected]
http://lists.ankur.org.in/listinfo.cgi/project-ideas-ankur.org.in
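
For reference, the SNR-threshold idea from point (b) could be sketched roughly as below. This is only a minimal illustration, not the method from the proposal or from the linked speech enhancement page. The function names, the frame length of 160 samples, the 6 dB threshold, and the assumption that the first few frames of a recording contain only background noise are all hypothetical choices made for the example. It marks frames as speech-active rather than distinguishing voiced from unvoiced sounds.

```python
import math


def frame_energies_db(samples, frame_len):
    """Return the mean power (in dB) of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Small constant avoids log10(0) on digitally silent frames.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies


def detect_speech_frames(samples, frame_len=160, noise_frames=5,
                         snr_threshold_db=6.0):
    """Flag each frame as speech (True) or non-speech (False).

    Assumption (hypothetical, for illustration): the first `noise_frames`
    frames contain background noise only, so their average energy can be
    used as the noise floor.
    """
    energies = frame_energies_db(samples, frame_len)
    if len(energies) < noise_frames:
        raise ValueError("signal is too short to estimate the noise floor")
    noise_floor_db = sum(energies[:noise_frames]) / noise_frames
    # Per-frame SNR estimate: frame energy minus the noise floor, in dB.
    return [(e - noise_floor_db) > snr_threshold_db for e in energies]
```

Calling `detect_speech_frames(samples)` on a list of float samples returns one boolean per frame. Frames flagged `False` could then feed the noise estimate for a spectral subtraction step, though that part is not shown here.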
