Hi Gourab,

As I am also interested in the project "Develop a system with multi-lingual capabilities in order to receive answer to user specific queries", I have had some discussions on it with Sankarshan, which you can browse in the list archives. You can also find my proposal at http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/abhi7/20019, which might give you a broad idea of the problem and of what I think could be a good solution.
I welcome your comments on the list, since you won't be able to comment on the proposal page, and I sincerely hope that my proposal helps you frame yours as well :) Also, I have a doubt regarding the approach you have suggested. An approach along the lines of information retrieval may not work so well (or may not be required) for this problem. Since a very specific domain, FAQs, is targeted, we can probably narrow our steps down to well-defined approaches. *For example, we may not require the mentioned dataset, as we might work in a way that needs no conversion from English to Bangla or vice versa.*

Regards
Abhishek

On 28 March 2012 21:24, Gourab Saha <[email protected]> wrote:
>
> I am Gourab here. As I am in the process of drafting my formal project
> proposal, I have a few questions regarding some issues.
> As previously said, I am seriously interested in working on a project in the
> field of "information retrieval" as a part of GSOC2012,
> and in continuing to work in that field for my M.Tech thesis.
>
> During the previous week I made a thorough study of the following project ideas
> you have floated in this area.
>
> 1. Improving models for Cross Language Text Re-use
> 2. Develop a system with multi-lingual capabilities in order to receive
> answer to user specific queries
> 3. Improving information retrieval methods for OCR data sets consisting of
> indic scripts
>
> I have talked with my professors, who have similar research
> interests. Following their valuable suggestions on the above-mentioned project
> ideas, I am on my way to drafting a
> formal proposal.
>
> As you have raised a concern over the licensing of the
> datasets/tools available, I have clarified with my
> professor that the RISOT data set (on which I am planning to work) is freely
> available and not constrained by any license.
> The Lemur toolkit is a completely open source framework for IR software
> development.
> Likewise, trec_eval, a standard tool for
> performance evaluation, is completely open source.
>
> I am writing a brief version of my proposal here. Kindly suggest how it
> can be further improved.
>
> The key idea is to propose and implement a method to improve
> cross-language information retrieval for a
> pair of languages (Bengali/English). We have the RISOT data set, containing
> articles from the ABP Bengali newspaper corpus
> from 2004-2006 as well as The Telegraph English newspaper corpus. The system will
> take a query in English and retrieve
> results from the Bengali corpus. The query will go through
> translation, transliteration,
> blind relevance feedback and query expansion, followed by the final retrieval
> step. I am aiming for good results on a well-accepted accuracy
> measure.
>
> I have a few questions other than the technical issues of the project.
>
> 1. Apart from mentors from the organization (http://ankur.org.in/), can I
> have a mentor from my institution or a foreign university? They will not,
> however, be in any way related to GSOC2012.
>
> 2. If my proposal is accepted and my research this summer leads to a paper
> publication, are there any constraints or to-dos from
> GSOC or from your organization for publishing the paper? As far as I
> understand, Google doesn't have any problem as long as I release my
> code under an open source license.
>
> Kindly let me know of any other issues regarding the proposal (details will
> be included in the final proposal) or any other related impediments.
> Kindly give your valuable feedback, as I am on
> my way to drafting my formal proposal.
> I sincerely hope to work with you this summer.
>
> If you have any questions or concerns, please feel free to send me an
> email at gourab.isikolkata@gmail .com or call me at 9051110501.
>
> Thanks
>
> Gourab Saha
> M.Tech (Computer Science)
> Indian Statistical Institute, Kolkata
> [email protected]
> (+91)9051110501
>
> _______________________________________________
> Project-ideas mailing list
> [email protected]
> http://lists.ankur.org.in/listinfo.cgi/project-ideas-ankur.org.in
