Re: [CODE4LIB] Plagiarism checker
My first thought was something like programmatically doing a pairwise diff of the files, 5500 times. I was surprised I couldn't find a utility that just does this. But I did find something called diffuse [1], which lets you graphically compare any number of text files in a diff-like fashion. This would probably at least be able to tell you which files need closer scrutiny. I think you'd presumably have to be able to extract the text from each file; I doubt it would work on raw Word docs or PDFs, so that might be a stopper.

It seems like the realm of source control has a lot of software designed to help with this problem, so there might be other similar things out there. But probably not anything designed to natively handle print-ready files.

-dre.

[1] http://diffuse.sourceforge.net/about.html

On Fri, Jan 23, 2015 at 7:26 AM, Judy Meirose jmeir...@fcsl.edu wrote:
> Can anyone recommend plagiarism-checking software besides Turnitin and SafeAssign? I need to compare about 100 student assignments against each other to make sure they don't copy each other's assignments.
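The pairwise diff described above can be sketched in a few lines of Python. This is only a minimal sketch, assuming the text has already been extracted from each submission: it uses the standard library's difflib.SequenceMatcher as a stand-in for a diff tool, and the function name rank_pairs and the sample file names are hypothetical. For exactly 100 files there are 100 * 99 / 2 = 4,950 pairs to score.

```python
from difflib import SequenceMatcher
from itertools import combinations


def rank_pairs(texts):
    """Return (score, name_a, name_b) for every pair, most similar first.

    `texts` maps a (hypothetical) submission name to its plain text.
    """
    scored = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(texts.items()), 2):
        # Compare word lists rather than characters; autojunk is disabled so
        # frequent words in long essays are not silently ignored.
        matcher = SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False)
        scored.append((matcher.ratio(), name_a, name_b))
    return sorted(scored, reverse=True)


# Made-up sample data standing in for extracted submission text.
submissions = {
    "alice.txt": "The court held that the contract was void for lack of consideration.",
    "bob.txt": "The court held that the contract was void for want of consideration.",
    "carol.txt": "Negligence requires duty, breach, causation, and damages.",
}
ranking = rank_pairs(submissions)
for score, name_a, name_b in ranking:
    print(f"{score:.2f}  {name_a}  {name_b}")
```

High-scoring pairs are only candidates for a closer look (for example in diffuse); a shared assignment prompt or a commonly quoted passage will raise the score as well.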
Re: [CODE4LIB] Plagiarism checker
Just thought I'd pop my head in: TurnItIn does compare against previous submissions (both at your own institution and at others) unless the submitter chooses not to include them in the repository for future checks.

Cheers,
Adam Traub
Electronic Resources Librarian
The Wallace Center
Rochester Institute of Technology
adam.tr...@rit.edu

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mark A. Matienzo
Sent: Friday, January 23, 2015 9:45 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Plagiarism checker

I believe Turnitin and SafeAssign both compare the text of submissions against external sources (e.g., SafeAssign uses ABI/INFORM, among others). I am not certain if they compare submissions against each other.
Re: [CODE4LIB] Plagiarism checker
On Jan 23, 2015, at 9:44 AM, Mark A. Matienzo wrote:

> I believe Turnitin and SafeAssign both compare the text of submissions against external sources (e.g., SafeAssign uses ABI/INFORM, among others). I am not certain if they compare submissions against each other.

My understanding of TurnItIn, at least initially, was that they built their corpus from existing submissions. (When they started up, they had deals with universities to use the service for free or cheaply, so that they could build up their corpus.)

> However, if you're looking for something along the lines of what Dre suggests, you could use ssdeep, which is an implementation of a piecewise hashing algorithm [0]. The issue with that is that you would have to assume that all students would be using the same file format. You could also use something like Tika to extract the text content from all the submissions, and then compare them against each other.

I'd agree on extracting the text. MS Word used to store documents as strings of edits, making it difficult to compare two documents for similarity without parsing the format. (I don't know if they still do this in .docx.)

-Joe
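For .docx specifically, the extraction step can be sketched with the Python standard library alone, since a .docx file is a ZIP archive whose main body lives in word/document.xml. This is a rough sketch under stated assumptions, not a replacement for a full extractor: it reads only the main document part (no headers, footnotes, or legacy .doc files), and the function name docx_paragraphs is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part of a .docx file.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Return the non-empty body paragraphs of a .docx file as strings.

    `source` is a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # A paragraph's text is split across runs; join the <w:t> pieces.
        text = "".join(node.text or "" for node in para.iter(W + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs


# Demo with a stand-in archive that holds only the main document part
# (a real .docx has more parts, but this function reads just this one).
xml = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:body><w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>world.</w:t></w:r></w:p></w:body>"
    "</w:document>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("word/document.xml", xml)
buffer.seek(0)
print(docx_paragraphs(buffer))
```

Joining the runs of each paragraph matters because Word often splits a single sentence across several runs, so comparing the raw XML of two documents would understate how similar their text is.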
Re: [CODE4LIB] Plagiarism checker
I believe Turnitin and SafeAssign both compare the text of submissions against external sources (e.g., SafeAssign uses ABI/INFORM, among others). I am not certain if they compare submissions against each other.

However, if you're looking for something along the lines of what Dre suggests, you could use ssdeep [0], which is an implementation of a piecewise hashing algorithm. The issue with that is that you would have to assume that all students would be using the same file format. You could also use something like Tika [1] to extract the text content from all the submissions, and then compare them against each other.

[0] http://ssdeep.sourceforge.net/
[1] http://tika.apache.org/

Mark

--
Mark A. Matienzo m...@matienzo.org
Director of Technology, Digital Public Library of America

On Fri, Jan 23, 2015 at 8:47 AM, Andreas Orphanides akorp...@ncsu.edu wrote:
> My first thought was something like programmatically doing a pairwise diff of the files, 5500 times. I was surprised I couldn't find a utility that just does this.
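Once the text has been extracted, the "compare them against each other" step could look something like the following. Note that this is not ssdeep's piecewise hashing: it is a simpler stand-in technique (overlapping word n-grams, or "shingles", scored with Jaccard similarity) that needs only the Python standard library. The function names, the shingle size, and the sample sentences are all assumptions made for illustration.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping `size`-word sequences in `text`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(text_a, text_b, size=5):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a or not b:
        # Texts shorter than one shingle cannot be compared meaningfully.
        return 0.0
    return len(a & b) / len(a | b)


# Made-up sample sentences standing in for extracted submission text.
original = "The statute of frauds requires certain contracts to be in writing to be enforceable."
copied = "As noted, the statute of frauds requires certain contracts to be in writing."
unrelated = "A tort is a civil wrong that causes a claimant to suffer loss or harm."

print(round(jaccard(original, copied, size=3), 2))
print(round(jaccard(original, unrelated, size=3), 2))
```

Because the comparison runs on extracted text rather than raw bytes, it does not depend on every student using the same file format, which is the limitation noted above for hashing the files directly.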
[CODE4LIB] Plagiarism checker
Can anyone recommend plagiarism-checking software besides Turnitin and SafeAssign? I need to compare about 100 student assignments against each other to make sure they don't copy each other's assignments.

Thanks.

Judy K. Meirose
Systems Librarian
Florida Coastal School of Law
8787 Baypine Rd
Jacksonville, FL
(904)680-7603

This email transmission, and any documents, files or previous e-mail messages attached to it, may contain confidential, privileged and/or proprietary information for the sole use of the intended recipient(s). If you are not an intended recipient or a person responsible for delivering it to an intended recipient, any disclosure, copying, distribution or use of any of the information contained in or attached to this transmission is strictly prohibited. If you have received this transmission in error, please: (1) immediately notify me by reply e-mail; and (2) destroy the original (and any copies) of this transmission and its attachments without reading or saving in any manner.