[ 
https://issues.apache.org/jira/browse/RAT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701091#action_12701091
 ] 

Alexei Fedotov commented on RAT-45:
-----------------------------------

Thanks for the update.

Let me start the code review. Regardless of style, my comments are not
mandatory to fix; they just reflect what I think.

1. The directory name should be aligned with the other RAT artifacts, e.g. 
apache-rat-<a short token>. A three-letter token would be nice.
2. pom.xml is missing. As you will see if you read more about Maven, adding it 
affects the whole directory structure. You would get 
    apache-rat-<the short token>/src/main/java/org/
    apache-rat-<the short token>/src/test/java/org/
3. The package names should be org.apache.rat.<the short token>
4. raport -> report
As a general rule, I suggest using the Eclipse IDE for development. It has an 
embedded spell checker.

5. ReadMe.txt: I like the content. The typical name of this file is given here: 
http://en.wikipedia.org/wiki/README Mixed case comes from a rare Windows dialect.

6. package "common": I believe there should be a "util" package for the various 
manipulators, and a separate package for the language parser implementations.
7. package "tool": well, the whole thing is the tool. This could be "core".
8. package "searchengines": it would be nice to have something shorter, e.g. 
"engines"?

9. ISearchEngine.java: the license text should be the first comment.
As for the search engine, it should provide the following info:
    1. whether a given pattern is found
    2. how often a given pattern appears in different projects
    3. if the pattern is not too common, where it is found

The interface requires more work to reflect this logic (at the very least, the 
number of interface functions does not match).
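
To make the three points above concrete, here is a minimal sketch of what such 
an interface could look like. Only the name ISearchEngine comes from the 
submitted code; the method names, the InMemorySearchEngine class and the 
commonThreshold parameter are assumptions made for illustration, not the 
actual design.

```java
/* The ASF license header would go here, as the first comment in the file. */

import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one method per piece of information listed above. */
interface ISearchEngine {
    /** Returns true if the pattern is found in at least one project. */
    boolean isFound(String pattern);

    /** Returns the number of different projects that contain the pattern. */
    int countProjects(String pattern);

    /**
     * Returns the projects where the pattern is found, or an empty list
     * if the pattern is too common to be worth reporting.
     */
    List<String> findLocations(String pattern);
}

/** In-memory stand-in used only to exercise the interface; not a real engine. */
class InMemorySearchEngine implements ISearchEngine {
    private final Map<String, String> sourceByProject =
            new TreeMap<String, String>();
    private final int commonThreshold; // assumed cut-off for "too common"

    InMemorySearchEngine(int commonThreshold) {
        this.commonThreshold = commonThreshold;
    }

    void addProject(String name, String source) {
        sourceByProject.put(name, source);
    }

    public boolean isFound(String pattern) {
        return countProjects(pattern) > 0;
    }

    public int countProjects(String pattern) {
        int count = 0;
        for (String source : sourceByProject.values()) {
            if (source.contains(pattern)) {
                count++;
            }
        }
        return count;
    }

    public List<String> findLocations(String pattern) {
        List<String> locations = new ArrayList<String>();
        if (countProjects(pattern) > commonThreshold) {
            return locations; // too common: report nothing
        }
        for (Map.Entry<String, String> entry : sourceByProject.entrySet()) {
            if (entry.getValue().contains(pattern)) {
                locations.add(entry.getKey());
            }
        }
        return locations;
    }
}
```

A real engine would query a web code search service instead of a map, but the 
three methods would stay the same.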

10. enum ProgramingLanguages is probably not needed.

11. There are general rules on how to write javadoc comments. I do not ask for 
all methods to be documented this way, but it would be nice to have all 
interface methods documented. This would save us from writing architectural 
documents and from misunderstandings.
http://java.sun.com/j2se/javadoc/writingdoccomments/#format
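
As an illustration of that format, here is a small sketch of a documented 
interface method: a summary sentence first, then details, then the @param, 
@return and @throws tags in that order. The ILanguageParser and SimpleParser 
names and the splitLines method are hypothetical and only serve to show the 
comment layout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser interface, shown only to illustrate the layout. */
interface ILanguageParser {
    /**
     * Splits source text into non-blank lines.
     * <p>
     * Leading and trailing whitespace is removed from every returned line.
     *
     * @param source the source text to split; must not be null
     * @return the trimmed, non-blank lines in their original order
     * @throws IllegalArgumentException if source is null
     */
    List<String> splitLines(String source);
}

/** Straightforward implementation matching the documented contract. */
class SimpleParser implements ILanguageParser {
    public List<String> splitLines(String source) {
        if (source == null) {
            throw new IllegalArgumentException("source must not be null");
        }
        List<String> lines = new ArrayList<String>();
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.length() > 0) {
                lines.add(trimmed);
            }
        }
        return lines;
    }
}
```

With the contract written on the interface, the implementation does not need 
to repeat it.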

Finally, you are doing a great job! Thanks!



> Apache RAT copy&paste detector - tool for detecting copied(plagiarised) code 
> by searching on web code search engines
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: RAT-45
>                 URL: https://issues.apache.org/jira/browse/RAT-45
>             Project: RAT
>          Issue Type: New Feature
>         Environment: This improvements of Apache RAT tool will be written in 
> Java.
> Requirements: OS with RE already installed on  and Internet connection
>            Reporter: Marija Sljivovic
>         Attachments: copyandpaste.zip, copyandpastedetector-src-0.01.zip
>
>   Original Estimate: 2688h
>  Remaining Estimate: 2688h
>
> This document is about implementing a new tool which will be included in the 
> Apache RAT project.
> Original idea: http://wiki.apache.org/general/SummerOfCode2009#rat-project
> The aim is to create a working, modular, configurable command-line tool
> for searching web-based code search engines for possibly plagiarised 
> code in our code bases.
> The tool will be heuristic in nature. It will make guesses about code parts.
> If it decides that a code part is a good candidate for having been copied and 
> pasted, it will check whether there is matching code on code search engines.
> That code part will be stored in the report if any match is found.
> The person who reads this report will decide whether the code is really 
> copied or not.
> The algorithm at the base of this tool is a variant of the sliding-window 
> algorithm.
> The code parts which the algorithm generates will be checked by different 
> heuristic methods and optionally
> will be sent to some code search engine for checking.
> More information and ideas about this project can be found here:
> http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
