Hi, this is the timeline for this project, which I hope is realistic...

Command line parsing

We decided that this tool has to be a command-line tool. All input
parameters will be passed to the program as command-line arguments
or provided in a configuration file (XML or plain text). This
config file can be loaded automatically (like Ant does) or specified
as a command-line argument.
For now I am thinking about these parameters:
-source code location
-mode in which the checker will work (brute force, heuristic, combined...)
-programming language of the source code (this is very important for the heuristic logic)
-search engines on which the source code will be checked. For this I think
the argument can be the class name of an ISearchEngine implementation, as I
explained in the previous mail.
-things connected with the report (location, format, ...)
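
As a rough sketch of how the parameters above could be read from the
command line: the option names (--source, --mode, --language, --engine,
--report), the defaults and the CheckerOptions class are all assumptions
made for illustration, not a settled design. It is a hand-written parser
using only the JDK; reusing RAT's existing parser may well be the better
choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical holder for the checker's options; every name here is illustrative. */
class CheckerOptions {
    enum Mode { BRUTE_FORCE, HEURISTIC, COMBINED }

    String sourceLocation;                              // required: where the code to check lives
    Mode mode = Mode.BRUTE_FORCE;                       // assumed default
    String language = "java";                           // assumed default
    List<String> engineClassNames = new ArrayList<>();  // ISearchEngine implementation names
    String reportLocation;                              // optional

    /** Parses "--name value" pairs; throws IllegalArgumentException on bad input. */
    static CheckerOptions parse(String[] args) {
        CheckerOptions o = new CheckerOptions();
        for (int i = 0; i < args.length; i += 2) {
            String name = args[i];
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + name);
            }
            String value = args[i + 1];
            switch (name) {
                case "--source":
                    o.sourceLocation = value;
                    break;
                case "--mode":
                    // "brute-force" -> BRUTE_FORCE; unknown modes raise IllegalArgumentException
                    o.mode = Mode.valueOf(value.toUpperCase(Locale.ROOT).replace('-', '_'));
                    break;
                case "--language":
                    o.language = value;
                    break;
                case "--engine":
                    o.engineClassNames.add(value);      // may be given several times
                    break;
                case "--report":
                    o.reportLocation = value;
                    break;
                default:
                    throw new IllegalArgumentException("Unknown option " + name);
            }
        }
        if (o.sourceLocation == null) {
            throw new IllegalArgumentException("--source is required");
        }
        return o;
    }
}
```

Repeating --engine is one way to allow checking against several search
engines in a single run.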

I think 5 days at the beginning of coding is enough for this (I have
never written a command-line argument parser before, but I think I can
use RAT's).

Design of plagiarism module interface
Sliding window module
I have already coded a basic example of this algorithm
(http://b0ss.on.neobee.net/MinuliRad/RAT/). Let's talk about two
versions of this algorithm:
-one for brute-force checking - this is the algorithm I have already made
-one which will use heuristics. It will work like the previous one but with
different input (a source file from which we have removed all getters,
setters and other code chunks we are not interested in), and will use
a list of different "good-to-be-copied-code" recognition methods.
These methods are not part of this algorithm and will probably be
separated from it by an interface. The heuristic methods will be coded
separately from this task and will get more time than the rest of the code.
Because of that I plan to write only several of them in the beginning
and the rest at the end of the timeframe.
The main difference between these two algorithms is that the second one
will rarely call the search engines.
For writing these two algorithms and implementing a few basic
heuristic methods (only for one language), I think 10 days is
enough.
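
To make the difference between the two variants concrete, here is a
minimal sketch. It is not the code from the URL above; the class name
SlidingWindow, the method queries and the use of a Predicate as the
heuristic interface are assumptions for illustration. A window of
consecutive tokens slides over the source, and only the windows the
heuristic accepts become search engine queries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Illustrative sliding window over a token list; names are hypothetical. */
class SlidingWindow {
    /**
     * Returns one query string per window of windowSize consecutive tokens
     * that the worthChecking filter accepts. The window advances by step tokens.
     * Brute force corresponds to a filter that accepts everything (w -> true).
     */
    static List<String> queries(List<String> tokens, int windowSize, int step,
                                Predicate<List<String>> worthChecking) {
        if (windowSize <= 0 || step <= 0) {
            throw new IllegalArgumentException("windowSize and step must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start + windowSize <= tokens.size(); start += step) {
            List<String> window = tokens.subList(start, start + windowSize);
            if (worthChecking.test(window)) {
                // Each accepted window would later be sent to a search engine.
                result.add(String.join(" ", window));
            }
        }
        return result;
    }
}
```

With a filter that accepts every window this behaves like the brute-force
variant; a stricter filter produces fewer queries, which is why the
heuristic variant would call the search engines less often.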

These algorithms will call ISearchEngine (one or more of them). Writing
these parsers will, I hope, take me another 10 days.
I found that some code search engines cannot return correct results if
we send them a large number of tokens. This is quite unexplored land for
me...
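
A possible sketch of both points, under stated assumptions: the method
isFound is only a guess at what ISearchEngine might look like (the real
interface was described in the previous mail and may differ), and
SearchEngines, FakeSearchEngine and the token limit are hypothetical. It
shows loading an implementation by class name and splitting a long token
list into smaller queries for engines that cannot handle many tokens.

```java
import java.util.ArrayList;
import java.util.List;

/** Assumed shape of the interface; the real ISearchEngine may differ. */
interface ISearchEngine {
    /** Returns true if the engine reports at least one match for the query. */
    boolean isFound(List<String> queryTokens);
}

/** Stand-in engine for local experiments; "finds" any query containing "copied". */
class FakeSearchEngine implements ISearchEngine {
    public boolean isFound(List<String> queryTokens) {
        return queryTokens.contains("copied");
    }
}

/** Hypothetical helper methods around ISearchEngine. */
class SearchEngines {
    /** Instantiates an ISearchEngine from its class name (needs a no-argument constructor). */
    static ISearchEngine load(String className) {
        try {
            return Class.forName(className)
                    .asSubclass(ISearchEngine.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load search engine " + className, e);
        }
    }

    /** Splits tokens into consecutive chunks of at most maxTokens each. */
    static List<List<String>> split(List<String> tokens, int maxTokens) {
        if (maxTokens <= 0) {
            throw new IllegalArgumentException("maxTokens must be positive");
        }
        List<List<String>> chunks = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i += maxTokens) {
            int end = Math.min(i + maxTokens, tokens.size());
            chunks.add(new ArrayList<>(tokens.subList(i, end)));
        }
        return chunks;
    }
}
```

What the right token limit is for each real search engine would still
have to be found by experiment.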

After this, the first version will be complete. Then I plan to look back
to see whether the design of the application is good or whether I must
change something.

Testing
This project has to be well-tested. Because of that I will write
tests for every heuristic method. I plan to use JUnit.
Test sources will be loaded from files that exist in the code search
engines and from files which I have manually checked to be unique.
I will probably have test source code for the whole algorithm and for
each of the heuristic methods.

When the tool passes all my tests (and probably on some occasions before
that), I will do real testing on search engines.
There I will do the same thing: use non-plagiarised code that I wrote and
code that I got from source repositories which are indexed by search engines.
These tests, even for a small number of heuristic methods, will take a huge
amount of time; I think they will extend the schedule by 10-15 days.

Because of my poor Internet connection, I plan to set up my own search
engine on localhost and try this tool on it. (I think I will use
Struts2 and Apache Lucene if it is easy to use...)

After that, when the community approves my design and the quality of the
code, I will try to add more heuristic methods, more tests and some
modifications to the algorithms (pause/resume, multithreading... everything
we think is missing).

P.S.
I plan to post this document, and my ideas too, at
http://b0ss.on.neobee.net/MinuliRad/RAT/ where I can update it later.

Regards, Marija Sljivovic
