Block-quoting and plagiarism are two different questions.
Block-quoting is simple: split the text into sentences or even paragraphs and index each one as a separate document. Facet on the post-analysis text. Then pull the facet counts: any sentence or paragraph that appears more than once stands out, and those are your block quotes.
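To make the idea concrete outside of Solr, here is a minimal sketch in plain Python. It is not Solr code: the dictionary stands in for the facet counts, the regex sentence splitter and the normalization stand in for your analysis chain, and the names (split_sentences, normalize, shared_sentences, min_words) are made up for illustration.

```python
import re
from collections import defaultdict


def split_sentences(text):
    # Naive splitter (an assumption): break on whitespace that follows . ! or ?
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence):
    # Stand-in for "post-analysis text": lowercase, keep only letters and digits.
    return " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))


def shared_sentences(docs, min_words=5):
    # Map each normalized sentence to the set of documents that contain it.
    seen = defaultdict(set)
    for doc_id, text in docs.items():
        for sentence in split_sentences(text):
            key = normalize(sentence)
            # Skip very short sentences; they repeat by chance.
            if len(key.split()) >= min_words:
                seen[key].add(doc_id)
    # Keep only sentences found in more than one document.
    return {key: sorted(ids) for key, ids in seen.items() if len(ids) > 1}
```

Calling shared_sentences on a dict of {doc_id: text} returns each repeated sentence together with the documents it occurs in. In Solr the same grouping would come from the facet counts rather than from a dictionary held in memory.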
Mahout has a scalable implementation of n-gram based document similarity. It calculates distances between all documents and identifies clusters of similar documents. This is a much more general technique and may help you find "obfuscated" plagiarism.
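For a feel of what n-gram similarity measures, here is a small sketch in plain Python. It is not Mahout's implementation and makes no attempt to scale or to cluster: it assumes word trigrams and Jaccard overlap as the similarity measure, compares every pair directly, and the function names and the 0.5 threshold are invented for the example.

```python
import re
from itertools import combinations


def shingles(text, n=3):
    # Set of word n-grams ("shingles") in the text, after lowercasing.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    # Jaccard similarity: size of the intersection over size of the union.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_pairs(docs, n=3, threshold=0.5):
    # Brute-force comparison of all document pairs; fine for a handful of
    # documents, far too slow for a real corpus.
    sets = {doc_id: shingles(text, n) for doc_id, text in docs.items()}
    pairs = []
    for x, y in combinations(sorted(sets), 2):
        score = jaccard(sets[x], sets[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs
```

Because the score depends on overlapping word sequences rather than on exact sentence matches, a copy with some words swapped or sentences reordered still scores well above unrelated text, which is why this family of techniques can catch lightly reworded copies.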
Lance

On 07/23/2013 02:33 AM, Furkan KAMACI wrote:
Hi; sometimes a large part of one document may also appear in another document, as in student plagiarism or one blog post quoting another. Do Solr/Lucene or their libraries (UIMA, OpenNLP, etc.) have any class to detect this?