[ 
https://issues.apache.org/jira/browse/RAT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marija Sljivovic updated RAT-45:
--------------------------------

    Attachment: apache-rat-pd-src-0.05.zip

After the code review (https://issues.apache.org/jira/browse/RAT-45), it was 
decided to avoid enums with if/switch and to use inheritance and polymorphism 
wherever possible.
This leads to simpler classes. JavaCommentsHeuristicChecker and 
PascalCommentsHeuristicChecker have been written. 
These classes use regular expressions for comment matching. Some tests 
for these classes have been written, too.
Next I will think about the ISearchEngine interface in terms of the Google Code 
Search API and try to write a parser using the libraries for Google Code Search. 
This is a very important part of this tool.
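
To illustrate the inheritance-based design described above, here is a minimal 
sketch of a per-language comment checker driven by regular expressions. The 
class and method names below (CommentsHeuristicCheckerSketch, stripComments, 
hasCodeBesidesComments) are hypothetical and are not taken from the attached 
source; the real JavaCommentsHeuristicChecker and PascalCommentsHeuristicChecker 
may look quite different. The patterns are deliberately simple and do not 
handle comment markers inside string literals.

```java
import java.util.regex.Pattern;

// Hypothetical base type: each language subclass supplies only its comment pattern,
// so no enum or if/switch on the language is needed.
abstract class CommentsHeuristicCheckerSketch {
    /** Regular expression matching the comments of one language. */
    protected abstract Pattern commentPattern();

    /** Returns the code with every matched comment removed. */
    String stripComments(String code) {
        return commentPattern().matcher(code).replaceAll("");
    }

    /** Heuristic: true if anything other than whitespace remains without comments. */
    boolean hasCodeBesidesComments(String code) {
        return !stripComments(code).trim().isEmpty();
    }
}

class JavaCommentsSketch extends CommentsHeuristicCheckerSketch {
    // Matches /* ... */ blocks (possibly multi-line) and // line comments.
    private static final Pattern P =
            Pattern.compile("/\\*[\\s\\S]*?\\*/|//[^\\n]*");

    @Override
    protected Pattern commentPattern() {
        return P;
    }
}

class PascalCommentsSketch extends CommentsHeuristicCheckerSketch {
    // Matches { ... }, (* ... *) and // line comments.
    private static final Pattern P =
            Pattern.compile("\\{[^}]*\\}|\\(\\*[\\s\\S]*?\\*\\)|//[^\\n]*");

    @Override
    protected Pattern commentPattern() {
        return P;
    }
}
```

Callers work only with the base type, so supporting another language means 
adding one more subclass with its own pattern.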

> Apache RAT copy&paste detector - tool for detecting copied(plagiarised) code 
> by searching on web code search engines
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: RAT-45
>                 URL: https://issues.apache.org/jira/browse/RAT-45
>             Project: RAT
>          Issue Type: New Feature
>         Environment: This improvement to the Apache RAT tool will be written in 
> Java.
> Requirements: an OS with the RE already installed and an Internet connection
>            Reporter: Marija Sljivovic
>         Attachments: apache-rat-pd(maven included)0.03.zip, 
> apache-rat-pd-0.02.zip, apache-rat-pd-src-0.04.zip, 
> apache-rat-pd-src-0.05.zip, copyandpaste.zip, 
> copyandpastedetector-src-0.01.zip, pom.xml
>
>   Original Estimate: 2688h
>  Remaining Estimate: 2688h
>
> This document describes a new tool which will be included in the Apache 
> RAT project.
> Original idea: http://wiki.apache.org/general/SummerOfCode2009#rat-project
> The aim is to create a working, modular, configurable command-line tool
> for searching web-based code search engines for possibly plagiarised 
> code in our code bases.
> The tool will be heuristic in nature. It will make guesses about code parts.
> If it decides that a code part is a likely candidate for copy&paste, it will 
> check whether there is matching code on code search engines.
> That code part will be stored in the report if any match is found.
> The person who reads this report will decide whether the code is really 
> copied or not.
> The algorithm at the base of this tool is a variant of the sliding-window 
> algorithm.
> The code parts which the algorithm generates will be checked by different 
> heuristic methods and optionally
> sent to a code search engine for checking.
> More information and ideas about this project can be found here:
> http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal
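
The sliding-window idea in the description above could be sketched roughly as 
follows. This is only an illustration under assumptions: the window is 
measured in lines and moves one line at a time, and the heuristic is passed in 
as a plain predicate. The names SlidingWindowSketch and candidates are 
hypothetical, and the description does not state the actual window size, step 
or heuristics used by the tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class SlidingWindowSketch {
    /**
     * Slides a window of windowSize lines over the source, one line at a time,
     * and returns the windows that the heuristic accepts as worth checking.
     */
    static List<String> candidates(List<String> lines, int windowSize,
                                   Predicate<String> heuristic) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        List<String> result = new ArrayList<>();
        for (int start = 0; start + windowSize <= lines.size(); start++) {
            // Current code part: windowSize consecutive lines joined together.
            String window = String.join("\n", lines.subList(start, start + windowSize));
            if (heuristic.test(window)) {
                // A real tool would send this part to a code search engine here
                // and store it in the report only if a match is found.
                result.add(window);
            }
        }
        return result;
    }
}
```

Keeping the heuristic as a parameter means different checks can be combined or 
swapped without touching the windowing loop.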

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
