Amila, thanks. In addition to your proposal-writing skill, I encourage you to demonstrate your coding skills to those who vote on your acceptance as a GSoC participant. For example, you could write a class which parses the cut&paste detector arguments into some internal representation. It should contain a main method, a loop to cycle through the arguments, and a usage message. The latter is actually a lightweight way to approach the architectural question of the tool's scope.
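Here is a minimal sketch of what I mean. The -w option, the default window size, and treating positional arguments as source directories are just illustrative placeholders, not a proposed interface for the tool:

import java.util.ArrayList;
import java.util.List;

public class DetectorOptions {
    int windowSize = 25;                               // hypothetical default, in tokens
    final List<String> dirs = new ArrayList<String>(); // source directories to scan

    // Print how the tool is meant to be invoked and stop; keeping this
    // message honest forces an early decision on the tool's scope.
    static void usage() {
        System.err.println("Usage: java DetectorOptions [-w windowSize] dir...");
        System.exit(1);
    }

    static DetectorOptions parse(String[] args) {
        DetectorOptions opts = new DetectorOptions();
        // Cycle through the arguments, consuming option values as we go.
        for (int i = 0; i < args.length; i++) {
            if ("-w".equals(args[i])) {
                if (++i == args.length) usage(); // -w needs a value
                try {
                    opts.windowSize = Integer.parseInt(args[i]);
                } catch (NumberFormatException e) {
                    usage();                     // value was not a number
                }
            } else if (args[i].startsWith("-")) {
                usage();                         // unknown option
            } else {
                opts.dirs.add(args[i]);          // positional argument: a directory
            }
        }
        if (opts.dirs.isEmpty()) usage();        // nothing to scan
        return opts;
    }

    public static void main(String[] args) {
        DetectorOptions opts = parse(args);
        System.out.println("window=" + opts.windowSize + " dirs=" + opts.dirs);
    }
}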
Thanks. A rough sketch of the sliding-window loop you describe is appended below the quoted thread.

2009/4/2 Amila De Silva <[email protected]>:
> Hi Alexei,
> Thanks for the reply!
> I'll send my application asap.
> BR,
> Amila
>
> On 4/1/09, Alexei Fedotov <[email protected]> wrote:
>> Amila,
>> I'm sorry, I have unintentionally marked your mail as read. Please
>> don't hesitate to ping me again if there is no answer.
>>
>> Your method would do the job. Let me just add that making the sliding
>> window size automatically adjustable would have the same linear
>> algorithmic complexity, so it might be a proper investment.
>>
>> Please send a proposal to the official GSoC app now.
>>
>> Thanks!
>>
>> On Fri, Mar 27, 2009 at 6:49 PM, Amila De Silva <[email protected]> wrote:
>>> Hi,
>>> I did a bit of searching on code search engines (Google Code Search,
>>> Krugle and Koders) to find a scalable solution.
>>> As a first step we can set an initial size for the sliding window
>>> (this size can be changed by the user).
>>> When a long string is sent to the search engine, it is tokenized
>>> before searching. As I understand it, there is a limit on the number
>>> of tokens they create; if the query string is too long, after a
>>> certain number of tokens the rest of the string is treated as a
>>> single token.
>>> If we can get this number of tokens, it's better to use it as the
>>> window length (so that the window contains that many tokens).
>>> Let's say this size is n. If a query fails to find any result, the
>>> whole n tokens are removed, the next n tokens are loaded, and the
>>> search is performed again.
>>> If the query returns results, those URLs are recorded (I think it's
>>> better to take only the first 3 or 4 URLs).
>>> Even if the query returns results, the next n tokens are loaded anew.
>>> This way the whole code can be searched much more quickly, preserving
>>> search engine resources. After a list of URLs has been prepared,
>>> an in-depth search can be performed.
>>>
>>> I'd like to hear your comments on this method.
>>>
>>> Best Regards,
>>> Amila
>>>
>>
>> --
>> With best regards / с наилучшими пожеланиями,
>> Alexei Fedotov / Алексей Федотов,
>> http://www.telecom-express.ru/
>> http://people.apache.org/~aaf/
>>

--
With best regards / с наилучшими пожеланиями,
Alexei Fedotov / Алексей Федотов,
http://www.telecom-express.ru/
http://people.apache.org/~aaf/
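For reference, here is a sketch of the sliding-window pass in Java. The SearchEngine interface and its query method are hypothetical placeholders for a real code search API, and the cap of four URLs per window follows the "first 3 or 4" suggestion above; the point is the single pass that advances by n tokens whether or not a window produced hits:

import java.util.ArrayList;
import java.util.List;

public class WindowedSearch {

    // Hypothetical stand-in for a real code search engine API.
    interface SearchEngine {
        List<String> query(String q); // returns result URLs, possibly empty
    }

    static final int MAX_URLS_PER_WINDOW = 4;

    static List<String> search(SearchEngine engine, List<String> tokens, int n) {
        List<String> urls = new ArrayList<String>();
        // One linear pass: advance by n tokens whether or not the
        // current window produced any hits.
        for (int start = 0; start < tokens.size(); start += n) {
            int end = Math.min(start + n, tokens.size());
            String window = join(tokens.subList(start, end));
            List<String> hits = engine.query(window);
            // Record only the first few URLs from each window.
            for (int i = 0; i < hits.size() && i < MAX_URLS_PER_WINDOW; i++) {
                urls.add(hits.get(i));
            }
        }
        return urls; // candidates for the later in-depth comparison
    }

    // Joins tokens with single spaces to form the query string.
    static String join(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        for (String t : tokens) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(t);
        }
        return sb.toString();
    }
}

Note the loop touches each token exactly once, so letting the user adjust n, or adjusting it automatically, keeps the overall complexity linear in the number of tokens.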
