Marija,
The timeline is great progress, and I like your work breakdown.

I see two things that could be improved. Could you please do the following?

1. Allocate some time for creating a Maven-based build system for
your project. You should use the same build system as the rest of the
RAT project.
2. Sum up the time estimates and come up with completion dates for
each milestone.
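
As a starting point: the estimates in your mail add up to 35-40 days
(5 + 10 + 10 + 10-15), not counting the Maven work. Below is a minimal
sketch of turning them into completion dates. The class and method
names, the start date used in the usage note, and the choice to count
calendar days with the upper bound of the testing range are all my
assumptions, not something you proposed.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns per-milestone day estimates into completion dates. */
class MilestonePlan {

    /** Estimates taken from the proposal, in the order the work is planned. */
    static Map<String, Integer> proposedEstimates() {
        Map<String, Integer> days = new LinkedHashMap<>();
        days.put("Command line parsing", 5);
        days.put("Sliding window algorithms and basic heuristics", 10);
        days.put("Search engine parsers", 10);
        days.put("Testing on search engines", 15); // upper bound of the 10-15 day range
        return days;
    }

    /** Adds the durations cumulatively to the start date, keeping milestone order. */
    static Map<String, LocalDate> completionDates(LocalDate start, Map<String, Integer> days) {
        Map<String, LocalDate> result = new LinkedHashMap<>();
        LocalDate cursor = start;
        for (Map.Entry<String, Integer> entry : days.entrySet()) {
            cursor = cursor.plusDays(entry.getValue()); // calendar days, an assumption
            result.put(entry.getKey(), cursor);
        }
        return result;
    }
}
```

Calling MilestonePlan.completionDates(start, MilestonePlan.proposedEstimates())
with whatever start date you pick gives one date per milestone; the
Maven build time would be one more entry in the map.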

Thanks.


2009/4/8 Marija Šljivović <[email protected]>:
> Hi, this is the timeline for this project, which I hope is realistic...
>
> Command line parsing
>
> We decided that this tool has to be a command-line tool. All input
> parameters will be passed to the program as command-line arguments
> or provided in a configuration file (XML or plain text). This
> config file can be loaded automatically (as Ant does) or specified
> as a command-line argument.
> For now I am thinking about these parameters:
> -source code location
> -mode in which the checker will work (brute force, heuristic, combined...)
> -source code programming language (this is very important for the heuristic logic)
> -search engines on which the source code will be checked. For this I
> think the argument can be the class name of an ISearchEngine
> implementation, as I explained in a previous mail.
> -things connected with the report (location, format or...)
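
A rough sketch of how these parameters could be read, to make the list
concrete. The option names (--source, --mode, --language, --engine,
--report), the CheckerOptions class and the default mode are
hypothetical, and the parser is hand-rolled only to keep the example
self-contained; reusing RAT's existing argument handling may well be
the better choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for the parsed settings; every option name is an assumption. */
class CheckerOptions {
    String source;                                    // source code location
    String mode = "brute-force";                      // brute-force, heuristic, combined...
    String language;                                  // source code programming language
    final List<String> engines = new ArrayList<>();   // ISearchEngine class names
    String report;                                    // report location

    /** Parses "--name value" pairs; --engine may be repeated. */
    static CheckerOptions parse(String[] args) {
        CheckerOptions options = new CheckerOptions();
        for (int i = 0; i < args.length; i++) {
            String name = args[i];
            if (!name.startsWith("--")) {
                throw new IllegalArgumentException("Unexpected argument: " + name);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + name);
            }
            String value = args[++i];
            switch (name) {
                case "--source":   options.source = value; break;
                case "--mode":     options.mode = value; break;
                case "--language": options.language = value; break;
                case "--engine":   options.engines.add(value); break;
                case "--report":   options.report = value; break;
                default:
                    throw new IllegalArgumentException("Unknown option: " + name);
            }
        }
        if (options.source == null) {
            throw new IllegalArgumentException("--source is required");
        }
        return options;
    }
}
```

Loading the same settings from a configuration file would fill the
same object, so the rest of the tool would not need to know where the
values came from.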
>
> I think 5 days at the beginning of coding is enough (I have never
> coded a parser for command-line arguments before, but I think I can
> use RAT's)
>
> Design of plagiarism module interface
> Sliding window module
> I have already coded a basic example of this algorithm
> (http://b0ss.on.neobee.net/MinuliRad/RAT/). Let's talk about two
> versions of this algorithm:
> -one for brute-force checking - this algorithm I have already made
> -one which will use heuristics. It will work like the previous one but
> with different input (a source file from which we have removed all
> getters, setters and other code chunks we are not interested in),
> and will use a list of different "good-to-be-copied-code" recognition
> methods.
> These methods are not part of this algorithm and will probably be
> separated by an interface. These heuristic methods will be coded
> separately from this task and will get more time than the rest of the
> code. Because of that I plan to write only several of them in the
> beginning and the rest at the end of the timeframe.
> The main difference between these two algorithms will be that the
> second will rarely call search engines.
> For writing these two algorithms and implementing a few basic
> heuristic methods (only for one language) I think 10 days is
> enough
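
To check that I read the sliding window idea correctly, here is a
minimal sketch of the brute-force variant as I understand it: split the
source into tokens and turn every run of N consecutive tokens into one
query. The whitespace tokenizer, the class and method names and the
window size are my assumptions; I have not compared this with the code
at your URL. The heuristic variant would run the same loop over input
from which the uninteresting chunks were removed first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a token-based sliding window. */
class SlidingWindow {

    /** Splits source text on whitespace; a real tokenizer would be language-aware. */
    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        for (String token : source.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Returns every run of `size` consecutive tokens, joined by single spaces. */
    static List<String> windows(List<String> tokens, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start + size <= tokens.size(); start++) {
            result.add(String.join(" ", tokens.subList(start, start + size)));
        }
        return result;
    }
}
```

A source with T tokens yields T - N + 1 windows, which is why the
brute-force mode would call the search engines so much more often than
the heuristic one.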
>
> These algorithms will call an ISearchEngine (or several of them). I
> hope writing these parsers will take me another 10 days.
> I found that some code search engines cannot return correct results
> if we send them a large number of tokens. This is quite unexplored
> land to me...
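
One way to keep the token limit out of the algorithms is to cap the
query before it reaches the engine. A minimal sketch follows; the
search method signature is my guess at the interface (your earlier mail
is the real reference), and InMemorySearchEngine and QueryLimiter are
hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Assumed shape of the interface; the real signature may differ. */
interface ISearchEngine {
    /** Returns identifiers of indexed files that contain the query text. */
    List<String> search(String query);
}

/** Toy engine over in-memory documents, standing in for a real code search service. */
class InMemorySearchEngine implements ISearchEngine {
    private final List<String> documents;

    InMemorySearchEngine(List<String> documents) {
        this.documents = documents;
    }

    @Override
    public List<String> search(String query) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < documents.size(); i++) {
            if (documents.get(i).contains(query)) {
                hits.add("doc-" + i);
            }
        }
        return hits;
    }
}

/** Keeps queries under a per-engine token limit. */
class QueryLimiter {
    /** Returns the first maxTokens whitespace-separated tokens of the query. */
    static String truncate(String query, int maxTokens) {
        String[] tokens = query.trim().split("\\s+");
        if (tokens.length <= maxTokens) {
            return String.join(" ", tokens);
        }
        return String.join(" ", Arrays.copyOf(tokens, maxTokens));
    }
}
```

The in-memory engine is also handy for unit tests, since it needs no
network access.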
>
> After this, the first version will be complete. Then I plan to look
> back to see whether the design of the application is good or whether
> I must change something.
>
> Testing
> This project has to be well tested. Because of that I will write
> tests for every heuristic method. I plan to use JUnit.
> Test sources will be loaded from files that exist in code search
> engines and from files which I have manually checked to be unique.
> I will probably have test source code for the whole algorithm and for
> each of the heuristic methods.
>
> When (probably, and on some occasions before) the tool passes all my
> tests, I will do real testing on search engines.
> There I will do the same thing - use non-plagiarised code I wrote and
> code I got from source repositories which are indexed by search
> engines. These tests, even for a small number of heuristic methods,
> will take a huge amount of time, but I think this will extend the
> schedule by 10-15 days.
>
> Because of my poor Internet connection, I plan to set up my own search
> engine on localhost and try this tool on it. (I think I will use
> Struts2 and Apache Lucene if they are easy to use...)
>
> After that, when the community approves my design and the quality of
> the code, I will try to add more heuristic methods, more tests and
> some modifications to the algorithms (pause/resume, multithreading...
> everything we think is missing)
>
> P.S.
> I plan to post this document, and my ideas too, at
> http://b0ss.on.neobee.net/MinuliRad/RAT/ where I can update it later.
>
> Regards, Marija Sljivovic
>



-- 
With best regards / с наилучшими пожеланиями,
Alexei Fedotov / Алексей Федотов,
http://www.telecom-express.ru/
http://people.apache.org/~aaf/
http://harmony.apache.org/
http://code.google.com/p/openmeetings/
