Apache RAT copy&paste detector - tool for detecting copied (plagiarised) code by
searching web code search engines
--------------------------------------------------------------------------------------------------------------------

                 Key: RAT-45
                 URL: https://issues.apache.org/jira/browse/RAT-45
             Project: RAT
          Issue Type: New Feature
         Environment: This improvement to the Apache RAT tool will be written in Java.
Requirements: an OS with a Java runtime environment (RE) already installed, and an Internet connection
            Reporter: Marija Sljivovic


This document describes a new tool that will be included in the Apache RAT
project.
Original idea: http://wiki.apache.org/general/SummerOfCode2009#rat-project

The aim is to create a working, modular, configurable command-line tool
that searches web-based code search engines for possibly plagiarised code
in our code bases.

The tool will be heuristic in nature: it will make guesses about parts of the code.
If it decides that a code part looks like a likely candidate for copy&paste, it will
check whether matching code exists on code search engines.
The code part will be stored in a report if any match is found.
The person reading the report will then decide whether the code was really copied.
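
The flow described above (heuristic guess, then search, then report) could be
sketched roughly as follows. This is only an illustration under stated
assumptions: the class HeuristicCheckSketch, the interface CodeSearchEngine,
and the "count statement-like lines" heuristic are all hypothetical names and
choices, not part of Apache RAT, and the search engine is a fake that does no
network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these types exist in Apache RAT.
class HeuristicCheckSketch {

    /** Stand-in for a web code search engine; a real one would query a remote service. */
    interface CodeSearchEngine {
        boolean hasMatch(String codeFragment);
    }

    /** Assumed heuristic: a fragment is worth checking if it has enough statement-like lines. */
    static boolean worthChecking(String fragment, int minStatements) {
        int statements = 0;
        for (String line : fragment.split("\n")) {
            String t = line.trim();
            // Skip blank lines and comment lines; they say little about copied logic.
            if (t.isEmpty() || t.startsWith("//") || t.startsWith("/*") || t.startsWith("*")) {
                continue;
            }
            // Count lines that end a statement or open a block.
            if (t.endsWith(";") || t.endsWith("{")) {
                statements++;
            }
        }
        return statements >= minStatements;
    }

    /** Returns the fragments that passed the heuristic and were matched by the engine. */
    static List<String> buildReport(List<String> fragments, CodeSearchEngine engine, int minStatements) {
        List<String> report = new ArrayList<>();
        for (String fragment : fragments) {
            if (worthChecking(fragment, minStatements) && engine.hasMatch(fragment)) {
                report.add(fragment);
            }
        }
        return report;
    }

    public static void main(String[] args) {
        // Fake engine for demonstration: "finds" any fragment that mentions quickSort.
        CodeSearchEngine fake = fragment -> fragment.contains("quickSort");
        List<String> fragments = Arrays.asList(
            "// just a comment",
            "int p = partition(a, lo, hi);\nquickSort(a, lo, p - 1);\nquickSort(a, p + 1, hi);",
            "int x = 1;\nint y = 2;\nint z = x + y;");
        for (String hit : buildReport(fragments, fake, 3)) {
            System.out.println("Possible copy, needs human review:\n" + hit);
        }
    }
}
```

The report only lists candidates; as stated above, the final decision stays
with the person reading it.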

The tool will be based on a variant of the sliding-window algorithm.
Each code part that the algorithm generates will be checked by different
heuristic methods and, optionally,
sent to a code search engine for checking.
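
A minimal sketch of a line-based sliding window is shown here to make the idea
concrete. The issue does not say which variant is intended, so the unit
(lines), the fixed window size, and the step are assumptions, and the class
name SlidingWindowSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; the actual variant planned for the tool is not specified.
class SlidingWindowSketch {

    /**
     * Returns every run of windowSize consecutive lines, moving the window
     * forward by step lines each time. Inputs shorter than windowSize yield
     * an empty list.
     */
    static List<String> windows(List<String> lines, int windowSize, int step) {
        if (windowSize <= 0 || step <= 0) {
            throw new IllegalArgumentException("windowSize and step must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start + windowSize <= lines.size(); start += step) {
            // Join the lines currently inside the window into one code part.
            result.add(String.join("\n", lines.subList(start, start + windowSize)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
            "int a = 1;", "int b = 2;", "int c = a + b;", "print(c);", "return c;");
        // Each printed code part is what a heuristic check would then inspect.
        for (String part : windows(source, 3, 1)) {
            System.out.println("--- code part ---");
            System.out.println(part);
        }
    }
}
```

With five input lines, a window of three, and a step of one, this produces
three overlapping code parts. A larger step trades coverage for fewer search
engine queries.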

More information and ideas about this project can be found here:
http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal
