Apache RAT copy&paste detector - tool for detecting copied (plagiarised) code by
searching web code search engines
--------------------------------------------------------------------------------------------------------------------
Key: RAT-45
URL: https://issues.apache.org/jira/browse/RAT-45
Project: RAT
Issue Type: New Feature
Environment: This improvement to the Apache RAT tool will be written in
Java.
Requirements: OS with a Java runtime environment (JRE) already installed and an Internet connection
Reporter: Marija Sljivovic
This document is about implementing a new tool which will be included in the Apache
RAT project.
Original idea: http://wiki.apache.org/general/SummerOfCode2009#rat-project
The aim is to create a working, modular, configurable command-line tool
that searches web-based code search engines for possibly plagiarised code
in our code bases.
The tool will be heuristic in nature: it will make guesses about parts of the code.
If it decides that a code part is a likely candidate for copy&paste, it will check whether
matching code exists on code search engines.
If any match is found, that part of the code will be stored in a report.
The person who reads the report will decide whether the code was really copied.
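The flow described above (heuristic guess, then search engine lookup, then report entry) could be sketched roughly as follows. This is only an illustration: the class CopyPasteCheckSketch, the interface CodeSearchEngine and the method buildReport are hypothetical names, not part of Apache RAT, and the heuristic and the engine are passed in as plain callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the check-and-report flow; all names are illustrative. */
class CopyPasteCheckSketch {

    /** Assumed abstraction over a web code search engine (not a real RAT API). */
    interface CodeSearchEngine {
        boolean hasMatch(String codePart);
    }

    /**
     * Returns the code parts that the heuristic considers likely to be
     * copy&pasted AND for which the search engine reports a match.
     * A human reviewer makes the final decision on each reported part.
     */
    static List<String> buildReport(List<String> codeParts,
                                    Predicate<String> looksCopied,
                                    CodeSearchEngine engine) {
        List<String> report = new ArrayList<>();
        for (String part : codeParts) {
            // Query the engine only for parts that pass the heuristic,
            // so that trivial code does not cause network requests.
            if (looksCopied.test(part) && engine.hasMatch(part)) {
                report.add(part);
            }
        }
        return report;
    }
}
```

Keeping the search engine behind a small interface fits the goal of a modular tool: each engine would get its own implementation, and a stub can stand in for it when no Internet connection is available.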
The algorithm at the base of this tool is a variant of the sliding-window
algorithm.
The code parts that the algorithm generates will be checked by different
heuristic methods and, optionally,
will be sent to a code search engine for checking.
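A minimal sketch of the sliding-window idea, assuming the window moves over source lines one line at a time (the proposal does not fix the unit or the step). The class SlidingWindowSketch, its methods and the character-count heuristic are hypothetical and only show the shape of the approach.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: line-based sliding window plus one toy heuristic. */
class SlidingWindowSketch {

    /**
     * Returns every run of windowSize consecutive lines, joined with '\n',
     * advancing the window start by one line each time.
     */
    static List<String> windows(List<String> lines, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start + windowSize <= lines.size(); start++) {
            result.add(String.join("\n", lines.subList(start, start + windowSize)));
        }
        return result;
    }

    /**
     * Illustrative heuristic only (an assumption, not from the proposal):
     * a window is worth checking if it contains at least minChars
     * non-whitespace characters.
     */
    static boolean worthChecking(String window, int minChars) {
        int count = 0;
        for (int i = 0; i < window.length(); i++) {
            if (!Character.isWhitespace(window.charAt(i))) {
                count++;
            }
        }
        return count >= minChars;
    }
}
```

For a file with four lines and a window size of two, windows returns three overlapping code parts. Only those that pass worthChecking would be candidates for sending to a search engine.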
More information and ideas about this project can be found here:
http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal