[ 
https://issues.apache.org/jira/browse/RAT-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12701324#action_12701324
 ] 

Marija Sljivovic commented on RAT-45:
-------------------------------------

Firstly, I would like to thank you for allowing me to be a part of this project.
I will do my best to justify your trust in me.

About the suggestions:
1. I like long, descriptive names, but I agree with you in this case. What
do you think about apache-rat-pd (Plagiarism Detector) or apache-rat-plagdet?
2. I know very little about Maven, but I will learn more about it.
3. Using the reverse domain naming convention is useful. Thanks for the suggestion.
I will do that.
4. It is a typo. I always use the Eclipse spell checker, but it cannot tell when
a package name is misspelled.
5. "Mixed case is from rare Windows dialect." I like this comment. I prefer
README, too.
6. This prototype uses a class for loading source files from the file system, and
when I was deciding where to place this class, I created a common directory.
I believe that I can use org.apache.rat.DirectoryWalker to read the whole
source directory, so FileManiplator will be deleted, and the common directory
along with it.
7, 8. OK.
9. I will think more about this. 
10. The Language enum could be an inner class or something similar. Let's keep
it for a while, and if we decide that it is no longer useful, we will delete it.
11. I like well-documented code. I will try to document this code according to
the standards you gave me. Thank you for the link.
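Regarding item 10, a nested enum is one way to keep Language next to the code that uses it. This is only an illustrative sketch: the enclosing class CodeSearchQuery, the listed languages, their file extensions and the fromFileName helper are all hypothetical and not taken from the prototype.

```java
import java.util.Locale;

public class CodeSearchQuery {
    /** Hypothetical nested enum (implicitly static): languages the tool can query for. */
    enum Language {
        JAVA("java"), C("c"), CPP("cpp"), PYTHON("py"), UNKNOWN("");

        private final String extension;

        Language(String extension) {
            this.extension = extension;
        }

        /** Guesses the language from a file name's extension; UNKNOWN if none matches. */
        static Language fromFileName(String fileName) {
            int dot = fileName.lastIndexOf('.');
            if (dot < 0) {
                return UNKNOWN;
            }
            String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
            for (Language lang : values()) {
                if (lang != UNKNOWN && lang.extension.equals(ext)) {
                    return lang;
                }
            }
            return UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(Language.fromFileName("DirectoryWalker.java")); // prints JAVA
        System.out.println(Language.fromFileName("README"));              // prints UNKNOWN
    }
}
```

If the enum later turns out to be unnecessary, removing a nested type like this touches only its enclosing class.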

About the Google Code Search API: I studied these libraries several days ago.
I was afraid that their licence would be restrictive, but it is the Apache
Licence, so I suppose that I can use this API in the parser for Google Code
Search. There will be no mixing of code at all.
I am even thinking of instantiating all parsers using reflection, including the
Google Code Search parser.
This way, parsers will be plugins for our application, and we will not have
licensing problems with any jar library used by any parser.
We could also have more than one parser for a single code search engine, if
anyone wants to write another plugin.
What do you think about it?
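The reflection-based plugin idea can be sketched as follows. It assumes a hypothetical CodeSearchParser interface that every parser implements and a public no-argument constructor on each parser class; DummyParser is a made-up stand-in for a real engine parser such as the Google Code Search one. Only standard JDK reflection calls are used (Class.forName, asSubclass, getDeclaredConstructor().newInstance()).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical plugin contract: every code search engine parser implements it. */
interface CodeSearchParser {
    /** Name of the code search engine this parser talks to. */
    String engineName();

    /** Returns true if the engine reports a match for the given code fragment. */
    boolean hasMatch(String codeFragment);
}

/** Hypothetical stand-in parser; a real one would query a search engine. */
class DummyParser implements CodeSearchParser {
    public DummyParser() {
    }

    @Override
    public String engineName() {
        return "dummy";
    }

    @Override
    public boolean hasMatch(String codeFragment) {
        return codeFragment.contains("copied");
    }
}

public class ParserLoader {
    /** Instantiates each named class reflectively; fails fast on a bad plugin name. */
    static List<CodeSearchParser> loadParsers(List<String> classNames) {
        List<CodeSearchParser> parsers = new ArrayList<>();
        for (String name : classNames) {
            try {
                Class<? extends CodeSearchParser> type =
                        Class.forName(name).asSubclass(CodeSearchParser.class);
                parsers.add(type.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | ClassCastException e) {
                throw new IllegalArgumentException("Cannot load parser plugin: " + name, e);
            }
        }
        return parsers;
    }

    public static void main(String[] args) {
        // In a real tool the class names would come from configuration or the command line.
        for (CodeSearchParser p : loadParsers(List.of("DummyParser"))) {
            System.out.println(p.engineName() + ": " + p.hasMatch("copied code")); // prints dummy: true
        }
    }
}
```

Because the core only refers to parsers by class name, a parser and the jar libraries it depends on could be packaged separately and put on the classpath at run time. The JDK's java.util.ServiceLoader is an alternative that discovers implementations through META-INF/services entries instead of explicit class names.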

Thank you for these suggestions. I found them very useful.
I will correct the source according to this list very soon.

> Apache RAT copy&paste detector - tool for detecting copied(plagiarised) code 
> by searching on web code search engines
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: RAT-45
>                 URL: https://issues.apache.org/jira/browse/RAT-45
>             Project: RAT
>          Issue Type: New Feature
>         Environment: These improvements to the Apache RAT tool will be written in 
> Java.
> Requirements: an OS with a Java RE installed and an Internet connection
>            Reporter: Marija Sljivovic
>         Attachments: copyandpaste.zip, copyandpastedetector-src-0.01.zip
>
>   Original Estimate: 2688h
>  Remaining Estimate: 2688h
>
> This document is about implementing a new tool which will be included in the
> Apache RAT project.
> Original idea: http://wiki.apache.org/general/SummerOfCode2009#rat-project
> The aim is to create a working, modular, configurable command-line tool
> for searching web-based code search engines for possibly plagiarised
> code in our code bases.
> The tool will be heuristic in nature: it will make guesses about parts of the code.
> If it decides that a piece of code is a likely candidate for copy&paste, it will
> check whether matching code exists on code search engines.
> If any match is found, that part of the code will be stored in a report.
> The person who reads this report will decide whether the code was really copied.
> The algorithm at the base of this tool is a variant of the sliding-window
> algorithm.
> The code parts which the algorithm generates will be checked by different
> heuristic methods and optionally sent to a code search engine for checking.
> More information and ideas about this project can be found here:
> http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal
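A minimal sketch of the sliding-window step described in the quoted proposal, assuming a window measured in source lines that advances one line at a time. The window size, the minimum number of code lines and the countCodeLines filter are hypothetical placeholders for the heuristic methods, and the search-engine query itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowDemo {
    /** A candidate fragment: consecutive lines starting at startLine (1-based). */
    static final class Fragment {
        final int startLine;
        final String text;

        Fragment(int startLine, String text) {
            this.startLine = startLine;
            this.text = text;
        }
    }

    /** Slides a fixed-size window over the lines, one line per step, keeping windows
        that pass the heuristic filter. */
    static List<Fragment> candidates(List<String> lines, int windowSize, int minCodeLines) {
        List<Fragment> result = new ArrayList<>();
        for (int start = 0; start + windowSize <= lines.size(); start++) {
            List<String> window = lines.subList(start, start + windowSize);
            if (countCodeLines(window) >= minCodeLines) {
                result.add(new Fragment(start + 1, String.join("\n", window)));
            }
        }
        return result;
    }

    /** Hypothetical heuristic: counts lines that are not blank, comments or imports. */
    static int countCodeLines(List<String> window) {
        int count = 0;
        for (String line : window) {
            String t = line.trim();
            boolean skip = t.isEmpty() || t.startsWith("//") || t.startsWith("/*")
                    || t.startsWith("*") || t.startsWith("import ");
            if (!skip) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> source = List.of(
                "import java.util.List;",
                "",
                "int a = 1;",
                "int b = a + 2;",
                "// comment",
                "return a * b;");
        // Each surviving fragment would become a candidate query for a code search engine.
        for (Fragment f : candidates(source, 3, 2)) {
            System.out.println("from line " + f.startLine + ":\n" + f.text);
        }
    }
}
```

With a window of 3 lines and at least 2 code lines required, the example keeps the windows starting at lines 2, 3 and 4; the window starting at line 1 is rejected because only one of its lines is code. Overlapping windows catch a copied block even when it does not start on a window boundary, at the cost of more queries, which is one reason to run cheap heuristic filters before contacting a search engine.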

-- 
This message is automatically generated by JIRA.