[
https://issues.apache.org/jira/browse/RAT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marija Sljivovic updated RAT-45:
--------------------------------
Attachment: apache-rat-pd-src-0.05.zip
Following the code reviews (https://issues.apache.org/jira/browse/RAT-45), it was
decided to replace enums and if/switch statements with inheritance and
polymorphism wherever possible.
This leads to simpler classes. JavaCommentsHeuristicChecker and
PascalCommentsHeuristicChecker have been written.
These classes use regular expressions to match comments. Some tests
for these classes have been written, too.
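The polymorphic design described above might look roughly like the sketch
below. Only the two checker class names come from the attachment; the base
class, the method names, and the regular expressions are assumptions made for
illustration and are not the code in apache-rat-pd-src-0.05.zip. The patterns
are deliberately naive: they do not skip comment markers inside string
literals.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical base type: each language subclass supplies its own comment pattern. */
abstract class CommentsHeuristicChecker {

    /** Regular expression that matches a single comment in the target language. */
    protected abstract Pattern commentPattern();

    /** Counts the comments found in the given source text. */
    int countComments(String code) {
        Matcher matcher = commentPattern().matcher(code);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }
}

/** Matches Java line comments and block comments (assumed pattern). */
class JavaCommentsHeuristicChecker extends CommentsHeuristicChecker {
    private static final Pattern COMMENT =
            Pattern.compile("//[^\\n]*|/\\*.*?\\*/", Pattern.DOTALL);

    @Override
    protected Pattern commentPattern() {
        return COMMENT;
    }
}

/** Matches Pascal brace, paren-star, and line comments (assumed pattern). */
class PascalCommentsHeuristicChecker extends CommentsHeuristicChecker {
    private static final Pattern COMMENT =
            Pattern.compile("\\{.*?\\}|\\(\\*.*?\\*\\)|//[^\\n]*", Pattern.DOTALL);

    @Override
    protected Pattern commentPattern() {
        return COMMENT;
    }
}
```

Supporting another language then means adding one subclass with its own
pattern, with no enum constant or switch branch to extend.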
Next, I will design the ISearchEngine interface around the Google Code Search
API and try to build a parser using the Google Code Search libraries. This is a
very important part of this tool.
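As a starting point for discussion, the interface could be as small as the
sketch below. Only the name ISearchEngine comes from this comment; the method
isCodeFound and the InMemorySearchEngine stand-in are hypothetical. The
stand-in makes no network calls and does not use any Google Code Search
library; it only shows how the rest of the tool could be tested before a real
engine is wired in.

```java
import java.util.List;

/** Hypothetical contract for a code search backend. */
interface ISearchEngine {
    /** Returns true if the engine reports at least one match for the fragment. */
    boolean isCodeFound(String codeFragment);
}

/** Stand-in engine that searches a fixed list of known sources; no network access. */
class InMemorySearchEngine implements ISearchEngine {
    private final List<String> corpus;

    InMemorySearchEngine(List<String> corpus) {
        this.corpus = corpus;
    }

    @Override
    public boolean isCodeFound(String codeFragment) {
        String needle = normalize(codeFragment);
        if (needle.isEmpty()) {
            return false; // nothing meaningful to search for
        }
        for (String document : corpus) {
            if (normalize(document).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Collapses runs of whitespace so formatting differences do not hide a match. */
    private static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }
}
```

An implementation backed by a real search service would implement the same
interface, so the heuristic checkers would not need to know which engine is in
use.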
> Apache RAT copy&paste detector - tool for detecting copied(plagiarised) code
> by searching on web code search engines
> --------------------------------------------------------------------------------------------------------------------
>
> Key: RAT-45
> URL: https://issues.apache.org/jira/browse/RAT-45
> Project: RAT
> Issue Type: New Feature
> Environment: These improvements to the Apache RAT tool will be written in
> Java.
> Requirements: an OS with RE already installed and an Internet connection
> Reporter: Marija Sljivovic
> Attachments: apache-rat-pd(maven included)0.03.zip,
> apache-rat-pd-0.02.zip, apache-rat-pd-src-0.04.zip,
> apache-rat-pd-src-0.05.zip, copyandpaste.zip,
> copyandpastedetector-src-0.01.zip, pom.xml
>
> Original Estimate: 2688h
> Remaining Estimate: 2688h
>
> This document describes a new tool that will be included in the Apache
> RAT project.
> Original idea: http://wiki.apache.org/general/SummerOfCode2009#rat-project
> The aim is to create a working, modular, configurable command-line tool
> that searches web-based code search engines for possibly plagiarised
> code in our code bases.
> The tool will be heuristic in nature. It will make guesses about parts of
> the code.
> If it decides that a part is likely to have been copied and pasted, it will
> check whether matching code exists on code search engines.
> If any match is found, that part of the code will be stored in a report.
> The person who reads the report will decide whether the code was really
> copied.
> The tool will be based on a variant of the sliding-window
> algorithm.
> The code parts that the algorithm generates will be checked by different
> heuristic methods and optionally
> sent to a code search engine for checking.
> More information and ideas about this project can be found here:
> http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal
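The flow in the issue description (sliding window, then heuristic check, then
optional search, then report) can be sketched as follows. This is a minimal
illustration, not the tool's implementation: the class name
SlidingWindowDetector, the method findSuspectWindows, the line-based window,
and the use of Predicate for both the heuristic and the search engine are all
assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch of the sliding-window pass described in the issue. */
class SlidingWindowDetector {

    /**
     * Slides a window of windowSize lines over the source, moving step lines
     * at a time. A window is added to the report only if the heuristic accepts
     * it and the search engine then reports a match. Sources shorter than one
     * window produce an empty report.
     */
    static List<String> findSuspectWindows(List<String> lines,
                                           int windowSize,
                                           int step,
                                           Predicate<String> heuristic,
                                           Predicate<String> searchEngine) {
        if (windowSize <= 0 || step <= 0) {
            throw new IllegalArgumentException("windowSize and step must be positive");
        }
        List<String> report = new ArrayList<>();
        for (int start = 0; start + windowSize <= lines.size(); start += step) {
            String window = String.join("\n", lines.subList(start, start + windowSize));
            // The heuristic runs first, so the (slow) search is skipped for rejected windows.
            if (heuristic.test(window) && searchEngine.test(window)) {
                report.add(window);
            }
        }
        return report;
    }
}
```

Running the cheap heuristic before the search keeps the number of requests to
the search engine down, which matters if the engine limits query rates.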