[
https://issues.apache.org/jira/browse/RAT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701091#action_12701091
]
Alexei Fedotov commented on RAT-45:
-----------------------------------
Thanks for the update.
Let me start the code review. Regardless of how they are worded, my comments
are not mandatory to fix; they just reflect what I think.
1. The directory name should be aligned with the other RAT artifacts, e.g.
apache-rat-<a short token>. A three-letter token would be nice.
2. pom.xml is missing. If you read more about Maven, you will see that adding
it affects the whole directory structure. You would get
apache-rat-<the short token>/src/main/java/org/
apache-rat-<the short token>/src/test/java/org/
3. The package names should be org.apache.rat.<the short token>
4. raport->report
As a general rule, I suggest using the Eclipse IDE for development. It has an
embedded spell checker.
5. ReadMe.txt: I like the content. The typical name of this file is given here:
http://en.wikipedia.org/wiki/README Mixed case comes from a rare Windows dialect.
6. package "common": I believe there should be a "util" package for the
different manipulators, and a separate package for the language parser
implementations.
7. package "tool": well, the whole thing is the tool. This could be "core".
8. package "searchengines": it would be nice to have something shorter, e.g.
"engines"?
9. ISearchEngine.java: the license text should be the first comment.
As for the search engine, it should provide the following info:
   1. whether a given pattern is found
   2. how often a given pattern appears in different projects
   3. if the pattern is not too common, where it is found
The interface requires more work to reflect this logic (well, at least the
number of interface functions does not match).
10. enum ProgramingLanguages: probably not needed.
11. There are general rules for writing Javadoc comments. I do not request that
all methods be documented in this way, but it would be nice to have all
interface methods documented. This would save us from writing architectural
documents and from misunderstandings.
http://java.sun.com/j2se/javadoc/writingdoccomments/#format
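To illustrate items 9 and 11 together, here is a rough sketch of how the three
pieces of information from item 9 could map onto three interface methods, each
documented in the Javadoc format linked above (a summary sentence first, then
@param and @return tags). Everything in it is an assumption made for
illustration: the name PatternSearchEngine, the method names, the maxProjects
threshold and the in-memory stand-in are hypothetical and are not taken from
the attached code or from any real search engine API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interface: one method per piece of information from item 9. */
interface PatternSearchEngine {
    /**
     * Checks whether a pattern occurs in at least one indexed project.
     *
     * @param pattern the code fragment to look for
     * @return true if the pattern is found, false otherwise
     */
    boolean isFound(String pattern);

    /**
     * Counts the distinct projects in which a pattern appears.
     *
     * @param pattern the code fragment to look for
     * @return the number of projects containing the pattern
     */
    int countProjects(String pattern);

    /**
     * Reports where a pattern is found, unless it is too common to be useful.
     *
     * @param pattern the code fragment to look for
     * @param maxProjects the largest number of matching projects still reported
     * @return the names of the matching projects, or an empty list if the
     *         pattern appears in more than maxProjects projects
     */
    List<String> findLocations(String pattern, int maxProjects);
}

/** Toy in-memory stand-in for a web code search engine; illustration only. */
class InMemorySearchEngine implements PatternSearchEngine {
    // Maps a project name to the source text of that project.
    private final Map<String, String> projects = new LinkedHashMap<>();

    void addProject(String name, String source) {
        projects.put(name, source);
    }

    @Override
    public boolean isFound(String pattern) {
        return countProjects(pattern) > 0;
    }

    @Override
    public int countProjects(String pattern) {
        return matchingProjects(pattern).size();
    }

    @Override
    public List<String> findLocations(String pattern, int maxProjects) {
        List<String> matches = matchingProjects(pattern);
        if (matches.size() > maxProjects) {
            return Collections.emptyList(); // too common, not worth reporting
        }
        return matches;
    }

    // Plain substring search over every stored project, in insertion order.
    private List<String> matchingProjects(String pattern) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : projects.entrySet()) {
            if (entry.getValue().contains(pattern)) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }
}
```

A real engine would query a web service instead of scanning strings, but the
shape of the interface would stay the same: three questions, three methods.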
Finally, you are doing a great job! Thanks!
> Apache RAT copy&paste detector - tool for detecting copied(plagiarised) code
> by searching on web code search engines
> --------------------------------------------------------------------------------------------------------------------
>
> Key: RAT-45
> URL: https://issues.apache.org/jira/browse/RAT-45
> Project: RAT
> Issue Type: New Feature
> Environment: This improvement to the Apache RAT tool will be written in
> Java.
> Requirements: OS with RE already installed and an Internet connection
> Reporter: Marija Sljivovic
> Attachments: copyandpaste.zip, copyandpastedetector-src-0.01.zip
>
> Original Estimate: 2688h
> Remaining Estimate: 2688h
>
> This document is about implementing a new tool which will be included in the
> Apache RAT project.
> Original idea: http://wiki.apache.org/general/SummerOfCode2009#rat-project
> The aim is to create a working, modular, configurable command-line tool
> for searching web-based code search engines for possibly plagiarised
> code in our code bases.
> The tool will be heuristic in nature. It will make guesses about code parts.
> If it decides that a code part is worth copying and pasting, it will check
> whether there is matching code on code search engines.
> That code part will be stored in the report if any match is found.
> The person who reads this report will decide whether the code is really
> copied or not.
> The algorithm at the base of this tool is a variant of the sliding-window
> algorithm.
> The code parts which the algorithm generates will be checked by different
> heuristic methods and optionally
> will be sent to a code search engine for checking.
> More information and ideas about this project can be found here:
> http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal