[ 
https://issues.apache.org/jira/browse/RAT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12704008#action_12704008
 ] 

Aleksey Shipilev commented on RAT-45:
-------------------------------------

Hi Marija!

Overall, the project looks great! I have just a few minor issues to raise:

1. Please follow the Java coding conventions. They are described in this 
comprehensive guide: http://java.sun.com/docs/codeconv/html/CodeConvTOC.doc.html

2. Please use shorter names where possible. If you are running out of names in 
the current package, it's probably time to make a new package, rather than 
resorting to this:

        private int checkByJavaCommentHueristicCheckers(String codeToBeChecked) {
                return checkByJavaSlashSlashCommentHueristicChecker(codeToBeChecked)
                                + checkByJavaSlashStarCommentHueristicChecker(codeToBeChecked);
        }

3. Please use proper OOP idioms. For example, use polymorphism where 
appropriate, instead of this:

                switch (language) {
                case Java:
                        toret = checkByJavaCommentHueristicCheckers(codeToBeChecked) > limit;
                        break;

The switch statement here is a maintenance disaster! You should go for a 
polymorphic call here... and this leads to:

4. Please don't be afraid of making another class if the task decomposition 
calls for it. Your check* methods can go into separate classes implementing the 
same interface. That would be a scalable and maintainable solution, unlike the 
overblown switch-case statements.

5. And to enforce the rules above, use FindBugs early and often. It will break 
most bad coding habits for you. I haven't seen anyone go with the strictest 
checks, but the relaxed checks help a lot. It will also tell you a lot more 
than I can :)

6. I'm eager to try this project in practice, but I got stuck on how to run 
your prototype. Ideally, there should be an intuitive way to check out the 
project, build it, and try it. Maven is good for that.
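
To illustrate points 3 and 4, here is a minimal sketch of how the switch could 
turn into a polymorphic call. All class, method and enum names below are made 
up for illustration and are not taken from the attached prototype. The 
"count the comment markers" scoring is only an assumed stand-in for the real 
heuristics.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; nothing here comes from the RAT-45 attachments.
final class CommentHeuristics {

    enum Language { JAVA }

    /** Each heuristic lives in its own class behind one common interface. */
    interface Heuristic {
        int score(String code);
    }

    /** Counts non-overlapping occurrences of a marker such as "//" or "/*". */
    static final class MarkerCount implements Heuristic {
        private final String marker;

        MarkerCount(String marker) {
            if (marker.isEmpty()) {
                throw new IllegalArgumentException("marker must not be empty");
            }
            this.marker = marker;
        }

        @Override
        public int score(String code) {
            int count = 0;
            for (int i = code.indexOf(marker); i >= 0;
                    i = code.indexOf(marker, i + marker.length())) {
                count++;
            }
            return count;
        }
    }

    /** Adds up the scores of several heuristics. */
    static final class Sum implements Heuristic {
        private final List<Heuristic> parts;

        Sum(Heuristic... parts) {
            this.parts = Arrays.asList(parts);
        }

        @Override
        public int score(String code) {
            int total = 0;
            for (Heuristic part : parts) {
                total += part.score(code);
            }
            return total;
        }
    }

    // Supporting a new language means registering one more entry here.
    private final Map<Language, Heuristic> byLanguage = new EnumMap<>(Language.class);

    CommentHeuristics() {
        byLanguage.put(Language.JAVA,
                new Sum(new MarkerCount("//"), new MarkerCount("/*")));
    }

    /** Replaces the switch: look up the heuristic, then make one polymorphic call. */
    boolean exceedsLimit(Language language, String code, int limit) {
        Heuristic heuristic = byLanguage.get(language);
        if (heuristic == null) {
            throw new IllegalArgumentException("No heuristic for " + language);
        }
        return heuristic.score(code) > limit;
    }

    public static void main(String[] args) {
        CommentHeuristics checker = new CommentHeuristics();
        String code = "int a; // one\n/* two */ int b; // three";
        System.out.println(checker.exceedsLimit(Language.JAVA, code, 2));
    }
}
```

With this shape, the long check* method names shrink to short class names, and 
the caller never needs to know which language it is dealing with.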

You have the potential to go. Go! :)

> Apache RAT copy&paste detector - tool for detecting copied(plagiarised) code 
> by searching on web code search engines
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: RAT-45
>                 URL: https://issues.apache.org/jira/browse/RAT-45
>             Project: RAT
>          Issue Type: New Feature
>         Environment: This improvement to the Apache RAT tool will be written 
> in Java.
> Requirements: OS with RE already installed on  and Internet connection
>            Reporter: Marija Sljivovic
>         Attachments: apache-rat-pd-0.02.zip, copyandpaste.zip, 
> copyandpastedetector-src-0.01.zip
>
>   Original Estimate: 2688h
>  Remaining Estimate: 2688h
>
> This document is about implementing a new tool which will be included in the 
> Apache RAT project.
> Original idea: http://wiki.apache.org/general/SummerOfCode2009#rat-project
> The aim is to create a working, modular, configurable command-line tool
> for searching web-based code search engines for possibly plagiarised 
> code in our code bases.
> The tool will be heuristic in nature. It will make guesses about code parts.
> If it decides that a code part looks likely to have been copy&pasted, it will 
> check whether there is matching code on code search engines.
> That code part will be stored in the report if any match is found.
> The person who reads the report will decide whether the code was really 
> copied or not.
> The algorithm at the base of this tool is a variant of the sliding-window 
> algorithm.
> The code parts which the algorithm generates will be checked by different 
> heuristic methods and optionally
> will be sent to a code search engine for checking.
> More information and ideas about this project can be found here:
> http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
