Hello Tharindu,

Thanks for your question. Simple methods work well for real-life tasks, while neural networks may not. For example, I used regular expressions to find multi-line comments, and yes, most of them were copied. Also, since legal matters are involved, the simpler our algorithm is, the easier it is to understand the implications (and complications) of a scan.
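To illustrate the regular-expression approach mentioned above, here is a minimal sketch. It assumes C-style `/* ... */` block comments; the helper names (`normalize`, `multiline_comments`, `repeated_comments`) are hypothetical and are not taken from the actual scan code. It also ignores comment markers that appear inside string literals, which a real scanner might need to handle.

```python
import re
from collections import Counter

# Matches C-style block comments; DOTALL lets "." span newlines,
# and the non-greedy ".*?" stops at the first closing "*/".
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def normalize(comment):
    """Drop the /* */ markers, leading '*' decoration and extra whitespace,
    so that differently indented copies of one comment compare equal."""
    body = comment[2:-2]
    lines = (line.strip().lstrip("*").strip() for line in body.splitlines())
    return " ".join(part for part in lines if part)


def multiline_comments(source):
    """Return the normalized text of every block comment spanning 2+ lines."""
    return [
        normalize(match.group(0))
        for match in BLOCK_COMMENT.finditer(source)
        if "\n" in match.group(0)
    ]


def repeated_comments(sources):
    """Map each multi-line comment found in more than one file to the
    number of files containing it. `sources` maps file name -> file text."""
    counts = Counter(
        comment
        for text in sources.values()
        for comment in set(multiline_comments(text))
    )
    return {comment: n for comment, n in counts.items() if n > 1}
```

Calling `repeated_comments` on a dictionary of file contents returns only the comments shared between files, which is roughly the "most of them were copied" observation in a form that is easy to review by hand.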
From an architectural point of view, it would be nice to separate the logic into its own module, so that the sliding window could easily be replaced with more sophisticated algorithms, even those you mentioned.

I wish you good luck with GSoC this year.

On Tue, Mar 10, 2009 at 7:45 AM, Tharindu Mathew <[email protected]> wrote:
> Hi,
>
> I'm a student interested in the Cut and Paste detector.
>
> I was wondering about the scope of this project. Does this include parsing
> just a few regex strings to code search? Or are we looking at a
> sophisticated mechanism where even a neural network may be trained to
> identify certain code patterns?
>
> Regards,
>
> Tharindu

--
With best regards,
Alexei Fedotov,
http://people.apache.org/~aaf/
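As a postscript to the architectural point about keeping the detection logic in its own module, here is a rough sketch of what that separation might look like. Everything in it is an assumption made for illustration: the `scan` entry point, the detector signature, and whitespace tokenization are hypothetical, not the project's actual design.

```python
from typing import Callable, List, Set, Tuple

# A detector takes two token lists and returns the fragments they share.
# Any algorithm with this signature can be plugged into scan().
Detector = Callable[[List[str], List[str]], Set[Tuple[str, ...]]]


def windows(tokens, size):
    """All contiguous runs of `size` tokens (empty if the input is shorter)."""
    return {tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)}


def sliding_window_detector(size=5):
    """Build a detector that reports token windows present in both inputs."""
    def detect(left, right):
        return windows(left, size) & windows(right, size)
    return detect


def scan(left_text, right_text, detector):
    """Tokenize on whitespace (an assumption) and delegate to the detector."""
    return detector(left_text.split(), right_text.split())
```

Because `scan` only depends on the detector's signature, a more sophisticated algorithm could replace `sliding_window_detector` without touching the calling code.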
