Block-quoting and plagiarism are two different questions.

Block-quoting is simple: split the text into sentences or even paragraphs and index each one as a separate document. Facet on the post-analysis text of that field. Then pull the facet counts: any sentence or paragraph with a count greater than one appears in more than one place, so the block quotes will be clear.
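
To make the counting idea concrete outside of Solr, here is a minimal Python sketch. It is not Solr code: the regex sentence splitter and the lowercasing stand in for whatever analyzer you would actually configure, and the names split_blocks and shared_blocks are made up for this illustration.

```python
import re
from collections import Counter


def split_blocks(text):
    """Naive sentence splitter; stands in for the index-time analysis chain."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Lowercase and collapse whitespace so trivial differences do not hide a match.
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def shared_blocks(docs):
    """Map each block found in more than one document to its document count."""
    counts = Counter()
    for text in docs.values():
        # set() so a block repeated inside one document is counted once.
        for block in set(split_blocks(text)):
            counts[block] += 1
    return {block: n for block, n in counts.items() if n > 1}


docs = {
    "post_a": "Solr is a search server. It is built on Lucene.",
    "post_b": "Someone wrote this. It is built on Lucene. I agree.",
}
print(shared_blocks(docs))
```

The per-block counts play the role of the facet counts: anything above one is a candidate block quote.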

Mahout has a scalable implementation of n-gram based document similarity. It calculates distances between all documents and identifies clusters of similar documents. This is a much more general technique and may help you find "obfuscated" plagiarism.
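
For a feel of what n-gram similarity measures, here is a small Python sketch using word shingles and Jaccard overlap. This is not Mahout's API or its exact distance measure, only an assumed stand-in for the general idea; shingles and jaccard are hypothetical helper names, and the clustering step is left out.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity of the two texts' shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        # Texts shorter than n words have no shingles to compare.
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps"
reworded = "the quick brown fox sleeps"
print(jaccard(original, reworded))
```

Comparing every pair of documents this way is quadratic, which is why a scalable implementation matters once the collection gets large.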

Lance

On 07/23/2013 02:33 AM, Furkan KAMACI wrote:
Hi;

Sometimes a large part of a document may also exist in another document, as
in student plagiarism or one blog post quoting another blog post.
Do Solr/Lucene or their libraries (UIMA, OpenNLP, etc.) have any class to
detect this?

