Marija Šljivović wrote:
> Hi!
> I am working on a copy & paste (plagiarism) detector.
cool

> You can see information about the project and reports of my progress at
> these locations:
> http://wiki.apache.org/general/MarijaSljivovic/SoC2009ApacheRatProposal
> https://issues.apache.org/jira/browse/RAT-45
> or get the source code and binary distributions at:
> http://code.google.com/p/apache-rat-pd/
> Next, I plan to write some misspelling heuristic checkers. These algorithms
> will be able to notice misspelled words in source code.
> That part of the code will then be sent to a code search
> engine (Google Code Search, for example) to check whether it can find
> similar misspellings in public code bases.
> That way we can check whether a piece of code may have been plagiarised.
> Now I am looking for an open source library that can be used for this
> task. I found one: jazzy (http://jazzy.sourceforge.net/) and I think that
> it is good for this purpose.

probably best to make the API pluggable (jazzy is LGPL, but this is good
advice in any case)

> Any suggestions for another solution that is better than jazzy?

I'm not sure whether it would be better, but an alternative approach would
be to use a semi-structured text analysis tool, for example UIMA
(http://incubator.apache.org/uima/) or Lucene

> Work on apache-rat-pd (plagiarism detector) is continuing.

great :-)

- robert
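The misspelling heuristic and the "make the API pluggable" advice discussed above can be sketched together in Java, the language of Apache RAT and jazzy. This is a minimal sketch under stated assumptions: `MisspellingChecker`, `DictionaryMisspellingChecker` and `MisspellingDemo` are hypothetical names, not part of apache-rat-pd or the jazzy API, and the tiny hard-coded word set stands in for a real spell checker. A jazzy-backed (or UIMA/Lucene-based) implementation could sit behind the same interface; the words it flags would be the candidates to look up in a code search engine, a step not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical pluggable interface: callers depend only on this type,
 * so a jazzy-backed or other implementation can be swapped in later.
 */
interface MisspellingChecker {
    /** Returns the words in the given source fragment that look misspelled. */
    List<String> findMisspellings(String sourceFragment);
}

/** Hypothetical naive checker backed by a fixed word set, for illustration only. */
class DictionaryMisspellingChecker implements MisspellingChecker {
    private final Set<String> dictionary;

    DictionaryMisspellingChecker(Set<String> dictionary) {
        this.dictionary = dictionary;
    }

    @Override
    public List<String> findMisspellings(String sourceFragment) {
        List<String> suspects = new ArrayList<>();
        // Split on runs of non-letters, then split camelCase identifiers
        // at each lowercase-to-uppercase boundary.
        for (String token : sourceFragment.split("[^A-Za-z]+")) {
            for (String word : token.split("(?<=[a-z])(?=[A-Z])")) {
                String lower = word.toLowerCase(Locale.ROOT);
                // Skip empty pieces and words of two letters or fewer (mostly noise).
                if (lower.length() > 2 && !dictionary.contains(lower)) {
                    suspects.add(lower);
                }
            }
        }
        return suspects;
    }
}

public class MisspellingDemo {
    public static void main(String[] args) {
        // Set.of requires Java 9 or later.
        MisspellingChecker checker = new DictionaryMisspellingChecker(
                Set.of("calculate", "total", "price", "return", "int"));
        String snippet = "int calcualteTotal(int price) { return price; }";
        System.out.println(checker.findMisspellings(snippet));
    }
}
```

Running `MisspellingDemo` prints `[calcualte]`: the camelCase split separates `calcualte` from `Total`, and only the former is missing from the word set. Because calling code sees only `MisspellingChecker`, replacing the naive checker with a library-backed one does not touch the rest of the detector.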
