Hi, my name is Marija Šljivović and I am a student of informatics and
mathematics in Serbia.
I am interested in "RAT 1 Cut&Paste Detector" project. I find this project
very interesting because
there are already several tools which provide finding duplicated code (PMD,
Simian...),
but neither one of them can check code on internet. This will be the
greatest difference between them and Apache RAT.
This is something completely new and I would like to be a part of it.

 I have just set up my application at
http://socghop.appspot.com/student_proposal/show/google/gsoc2009/maka/t123843563294
where I present my ideas. This project really interests me, and I think that
some of my ideas will be useful for the realization of this project no
matter who works on it.

 RAT will work in a similar way to the PMD and CheckStyle Eclipse plug-ins,
but it will retrieve the code for comparison from several search engines.
The tool will have an XML configuration file for each search engine (engines
may change their query syntax).
These files will define the characteristic properties of each search engine
(for example, checking only results written in a particular programming
language).
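
As a rough sketch of this idea, one XML file per engine (for example a
google.xml with attributes like queryUrl and language) could be loaded into
a small configuration class. All names below are only illustrative, not an
existing RAT API:

    import java.io.File;

    import javax.xml.parsers.DocumentBuilderFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;

    // Hypothetical per-engine configuration, loaded from an XML file such as
    // <engine name="google" queryUrl="http://..." language="java" delayMillis="2000"/>
    public class EngineConfig {
        public final String name;
        public final String queryUrl;    // URL template for the engine's search query
        public final String language;    // restrict results to this programming language
        public final long delayMillis;   // minimum pause between two queries (see below)

        private EngineConfig(String name, String queryUrl, String language, long delayMillis) {
            this.name = name;
            this.queryUrl = queryUrl;
            this.language = language;
            this.delayMillis = delayMillis;
        }

        public static EngineConfig load(File xmlFile) throws Exception {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().parse(xmlFile);
            Element root = doc.getDocumentElement();
            return new EngineConfig(
                    root.getAttribute("name"),
                    root.getAttribute("queryUrl"),
                    root.getAttribute("language"),
                    Long.parseLong(root.getAttribute("delayMillis")));
        }
    }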

In order to prevent search engines from mistaking this robot for a DDoS
attack, the tool must support waiting a certain amount of time between any
two queries.
This delay will be defined in the configuration file.
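
For example, a minimal throttle could remember when the last query was sent
to an engine and sleep until the configured delay has passed (QueryThrottle
is just an illustrative name):

    // Minimal throttling sketch: wait the configured number of milliseconds
    // between two consecutive queries to the same engine.
    public class QueryThrottle {
        private final long delayMillis;
        private long lastQueryTime = 0;

        public QueryThrottle(long delayMillis) {
            this.delayMillis = delayMillis;
        }

        public synchronized void waitForNextSlot() throws InterruptedException {
            long elapsed = System.currentTimeMillis() - lastQueryTime;
            if (elapsed < delayMillis) {
                Thread.sleep(delayMillis - elapsed);
            }
            lastQueryTime = System.currentTimeMillis();
        }
    }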

 This tool must support multithreading (checking the source against multiple
search engines at the same time).
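
The simplest version I can imagine is one thread per engine, for example
with an ExecutorService (SearchEngine here is a hypothetical interface I
made up for the sketch, not something that exists in RAT today):

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical interface: one implementation per configured search engine.
    interface SearchEngine {
        void search(String codeChunk);
    }

    // Sketch of checking the same chunk of code against several engines in parallel.
    public class ParallelChecker {
        public void check(final String codeChunk, List<SearchEngine> engines) {
            ExecutorService pool = Executors.newFixedThreadPool(engines.size());
            for (final SearchEngine engine : engines) {
                pool.submit(new Runnable() {
                    public void run() {
                        engine.search(codeChunk);
                    }
                });
            }
            pool.shutdown();   // no new tasks; submitted searches still run to completion
        }
    }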

 I think that this tool must also support pause/continue because of the
potentially long time needed to check large source-code bases with search
engines.
That way, if we stop checking a code base for any reason (loss of internet
connection, for example),
we won't have to start checking from the beginning but can simply continue
from where we stopped last time.
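
A very simple way to do this would be to write the index of the last checked
file into a small checkpoint file after each step and read it back when the
run is resumed (again only a sketch, the file format is my assumption):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.Properties;

    // Sketch of a checkpoint: after every checked file we store its index,
    // so a later run can continue from that position instead of restarting.
    public class Checkpoint {
        private final File file;

        public Checkpoint(File file) {
            this.file = file;
        }

        public void save(int lastCheckedIndex) throws IOException {
            Properties p = new Properties();
            p.setProperty("lastCheckedIndex", Integer.toString(lastCheckedIndex));
            FileOutputStream out = new FileOutputStream(file);
            try {
                p.store(out, "RAT cut&paste detector checkpoint");
            } finally {
                out.close();
            }
        }

        public int load() throws IOException {
            if (!file.exists()) {
                return 0;   // nothing checked yet, start from the beginning
            }
            Properties p = new Properties();
            FileInputStream in = new FileInputStream(file);
            try {
                p.load(in);
            } finally {
                in.close();
            }
            return Integer.parseInt(p.getProperty("lastCheckedIndex", "0"));
        }
    }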

 In addition, a Swing GUI can be made for this tool. It will support
configuration (of the XML files I mentioned before) and manage the whole
process, from selecting the source code for comparison and running the
checking process to viewing the generated report.
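
Just to show the direction I have in mind, the main window could be as
simple as a frame with the main actions (the button handlers are left out,
everything here is only a mock-up):

    import java.awt.FlowLayout;

    import javax.swing.JButton;
    import javax.swing.JFrame;
    import javax.swing.SwingUtilities;

    // Very rough Swing mock-up of the main window; no logic attached yet.
    public class RatGui {
        public static void main(String[] args) {
            SwingUtilities.invokeLater(new Runnable() {
                public void run() {
                    JFrame frame = new JFrame("RAT Cut&Paste Detector");
                    frame.setLayout(new FlowLayout());
                    frame.add(new JButton("Select source code..."));
                    frame.add(new JButton("Edit engine configuration..."));
                    frame.add(new JButton("Start / Pause"));
                    frame.add(new JButton("View report"));
                    frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
                    frame.pack();
                    frame.setVisible(true);
                }
            });
        }
    }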

The biggest challenge will be getting the search engines to cooperate. I
think the whole source code must be checked - not only suspected chunks of
code.
 Because of that, we must learn how each search engine works in order to
avoid sending duplicate queries.
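
One small thing that could help here is a per-engine query cache, so the
same query is never sent twice during one run (again only an idea of mine,
not an existing API):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of a per-engine cache of raw results, keyed by the query string,
    // so a query that was already sent is answered from memory instead.
    public class QueryCache {
        private final Map<String, String> results = new HashMap<String, String>();

        public boolean alreadyQueried(String query) {
            return results.containsKey(query);
        }

        public void store(String query, String rawResult) {
            results.put(query, rawResult);
        }

        public String get(String query) {
            return results.get(query);
        }
    }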



Best regards,
Marija...
