Amila,
I'm sorry, I unintentionally marked your mail as read. Please
don't hesitate to ping me again if you don't get an answer.

Your method would do the job. Let me just add that making the sliding
window size automatically adjustable would keep the same linear
algorithmic complexity, so it might be a worthwhile investment.
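
To illustrate what I mean, here is a minimal Python sketch of one
possible adjustment policy. Everything in it is an assumption made for
illustration: the name scan_adaptive_windows, the search callable
(which stands in for a real search engine query), the size bounds, and
the rule of doubling the window after a hit and halving it after a
miss. The point is only that each token is sent exactly once, so the
number of queries stays linear in the length of the code.

```python
from typing import Callable, List, Sequence, Tuple


def scan_adaptive_windows(
    tokens: Sequence[str],
    search: Callable[[Sequence[str]], List[str]],  # hypothetical engine call
    initial: int = 8,
    smallest: int = 2,
    largest: int = 32,
    max_urls: int = 3,
) -> Tuple[List[str], int]:
    """Scan tokens once with a window whose size adapts to the results.

    Assumed policy: double the window after a hit, halve it after a miss,
    always staying within [smallest, largest].
    """
    if smallest < 1 or largest < smallest:
        raise ValueError("need 1 <= smallest <= largest")
    size = max(smallest, min(largest, initial))
    urls: List[str] = []
    queries = 0
    pos = 0
    while pos < len(tokens):
        window = tokens[pos:pos + size]
        hits = search(window)
        queries += 1
        for url in hits[:max_urls]:  # keep only the first few URLs
            if url not in urls:
                urls.append(url)
        pos += len(window)  # never re-send a token, so the scan is linear
        size = min(largest, size * 2) if hits else max(smallest, size // 2)
    return urls, queries
```

Since the window never drops below smallest tokens, the scan issues at
most ceil(len(tokens) / smallest) queries.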

Please send a proposal to the official GSoC app now.

Thanks!


On Fri, Mar 27, 2009 at 6:49 PM, Amila De Silva <[email protected]> wrote:
> Hi,
> I did a bit of searching on code search engines (Google Code Search,
> Krugle and Koders) to find a scalable solution.
> As a first step, we can set an initial size for the sliding window
> (this size can be changed by the user).
> When a long string is sent to the search engine, it is tokenized
> before searching. As I understand it, there is a limit on the number
> of tokens they create; if the query string is too long, then after a
> certain number of tokens the rest of the string is treated as a
> single token.
> If we can find out this token limit, it's better to use it as the
> window length (so that the window contains that many tokens).
> Let's say this size is n. If a query fails to find any result, all n
> tokens are discarded, the next n tokens are loaded, and the search is
> performed again.
> If the query returns any results, those URLs are recorded (I think
> it's better to take only the first 3 or 4 URLs).
> Even if the query returns results, the next n tokens are loaded
> afresh.
> This way the whole code can be searched much more quickly while
> conserving search engine resources. After a list of URLs has been
> prepared, an in-depth search can be performed.
>
> I'd like to hear your comments on this method.
>
> Best Regards,
> Amila
>
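
For reference, the fixed-window procedure quoted above could be
sketched roughly as follows in Python. This is only a sketch under
stated assumptions: scan_fixed_windows and its parameters are
hypothetical names, and search is a placeholder for whatever call
actually queries the search engine and returns a list of URLs.

```python
from typing import Callable, List, Sequence


def scan_fixed_windows(
    tokens: Sequence[str],
    search: Callable[[Sequence[str]], List[str]],  # hypothetical engine call
    n: int,
    max_urls: int = 3,
) -> List[str]:
    """Query consecutive, non-overlapping windows of n tokens each."""
    if n <= 0:
        raise ValueError("n must be positive")
    urls: List[str] = []
    for start in range(0, len(tokens), n):
        window = tokens[start:start + n]
        hits = search(window)
        # Record only the first few URLs of each successful query.
        for url in hits[:max_urls]:
            if url not in urls:
                urls.append(url)
        # Hit or miss, the next iteration loads the next n tokens.
    return urls
```

The returned URL list is what the later in-depth search would start
from.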



-- 
With best regards / с наилучшими пожеланиями,
Alexei Fedotov / Алексей Федотов,
http://www.telecom-express.ru/
http://people.apache.org/~aaf/
