[CODE4LIB] Fwd: Job: Digital Library Applications Developer

2015-01-23 Thread Katherine Lynch
Temple University Libraries' software development team is growing!  With
exciting projects currently in development and on the horizon for Temple
University Libraries, this is an opportunity to work as part of a dynamic
and passionate team on highly-active Open Source projects like Hydra,
Fedora Commons, and Blacklight.



*NOTE: For this position, we are willing to consider a telecommuting
arrangement of up to 80% (4 days a week) depending on candidate's
experience and qualifications.*
-- Forwarded message --
From: Katherine Lynch tuf15...@temple.edu
Date: Tue, Dec 2, 2014 at 12:34 PM
Subject: Job: Digital Library Applications Developer
To: CODE4LIB@listserv.nd.edu


** Please excuse any cross-posting **

The Temple University Libraries are seeking a creative and energetic
individual to fill the position of Digital Library Applications Developer.
This position is an opportunity to engage with the active Hydra/Fedora
community and other Open Source communities.  Temple’s federated library
system serves an urban research university with over 1,800 full-time
faculty and a student body of 36,000 that is among the most diverse in the
nation. For more information about Temple and Philadelphia, visit
http://www.temple.edu.

Primary Duties and Responsibilities:

Reporting to the Senior Digital Library Applications Developer and working
closely with others in the Digital Library Initiatives Department, the DLAD
helps develop and maintain the technological infrastructure for Temple
University’s digital library initiatives and services, which includes
preserving and delivering large collections of digital objects with the
Hydra repository framework, and supporting digital scholarship (including
digital humanities), and scholarly communication initiatives throughout the
Library. As part of the development team, the DLAD architects, implements,
tests and deploys new tools and services primarily based on open source
project software, such as Hydra, Fedora Commons, Omeka, VIVO, Scalar, and
Open Journal Systems (OJS), potentially contributing code to those
projects. The DLAD advances professional skills through engagement with the
active Open Source community via training and participation at national and
regional conferences/meet-ups.  Performs other duties as assigned.

Required Education and Experience:

Bachelor’s degree in Computer Science or related field, and at least one
year of experience. An equivalent combination of education and experience
may be considered.

Required Skills and Abilities:

* Demonstrated experience with application development in at least one
major programming language such as Ruby on Rails, PHP, or Java.
* Demonstrated experience with MySQL or other database management systems.
* Demonstrated knowledge of the LAMP stack or similar technology stacks.
* Demonstrated ability to perform effective code testing and QA testing.
* Experience with project requirements gathering.
* Strong organizational and interpersonal skills, demonstrated ability to
work in a collaborative team-based environment, and to communicate well
with IT and non-IT staff.
* Commitment to responsive and innovative service.
* Demonstrated ability to write clear documentation.

Preferred Skills and Abilities:

* Experience with a repository system such as Hydra.
* Familiarity with a Content Management System like Drupal or an exhibit
curation system like Omeka would be a plus.
* Experience working with Open Source software, including multi-platform
integration.
* Experience with version control, test-driven development, and continuous
integration techniques.
* Experience with Linux/Unix operating systems, including scripting.
* Experience working with authentication and authorization protocols,
including LDAP.
* Knowledge of XML/XSLT.
* Familiarity with digital library standards, such as Dublin Core, MARC,
METS, EAD, and OAI-PMH.

Compensation:

Competitive salary and benefits package.

To apply:

To apply for this position, please visit
http://www.temple.edu/hr/departments/employment/jobs_within.htm, click on
Non-Employees Only, and search for job number TU-18555.  For full
consideration, please submit your completed electronic application, along
with a cover letter and resume. Review of applications will begin
immediately and will continue until the position is filled.

Temple University is an Affirmative Action/Equal Opportunity Employer with
a strong commitment to cultural diversity.

-- 

Katherine Lynch, Senior Digital Library Applications Developer
Temple University Library (http://library.temple.edu)
Samuel L. Paley Library, Room 113, 1210 Polett Walk, Philadelphia, PA 19122
Tel: 215-204-2821 | Fax: 215-204-5201 | Email: katherine.ly...@temple.edu






[CODE4LIB] EVENT: Islandora Conference, August 3 - 7 in Charlottetown PEI

2015-01-23 Thread Islandora Community
The Islandora Foundation is thrilled to invite you to the first-ever
Islandora Conference, taking place August 3 - 7, 2015 in the birthplace of
Islandora: Charlottetown, PEI.

This full-week event will consist of sessions from the Islandora
Foundation, interest groups, community presentations, and two full days of
hands-on Islandora training, and will end with a Hackfest where we invite
you to make your mark in the Islandora code and work together with your
fellow Islandorians to complete projects selected by the community.

Our theme for the conference is Community - the Islandora community, the
community of people our institutions serve, the community of researchers
and librarians and developers who work together to curate digital assets,
and the community of open source projects that work together and in
parallel.

Registration is now open, with an Early Bird rate available until the end
of March. Institutional rates are also available for groups of three or
more.

For more information or to sign up for the conference, please visit our
conference website: http://islandora.ca/camps/conference2015

Thank you,

The Islandora Team
commun...@islandora.ca
http://islandora.ca


[CODE4LIB] MARC Validation in a UNIX Environment

2015-01-23 Thread Dana Jemison
Hello!

I'm looking for a MARC validation tool (either binary or XML MARC) to identify 
formatting and structural errors in MARC records, which can be run in a Unix 
environment.  Does anyone know of such a tool, or has anyone built something 
like this which they'd be willing to share?

Thanks so much!

Dana

Dana Jemison
Principal Metadata Analyst
California Digital Library
University of California, Office of the President
415 20th Street, 4th Floor, Office 424B
Oakland, CA 94612-2901
Tel: 510.987.0832
Email: dana.jemi...@ucop.edu


Re: [CODE4LIB] wifi / network use policies

2015-01-23 Thread Alex Byrne
Hi, Nate!

Here's the Internet Use Policy that we display for all our public computers 
when they log on:

These are Pierce County Library System's rules for use of the Internet. 
Failure to use this service appropriately and responsibly may result in 
suspension of Internet use privileges, library privileges, and/or criminal 
prosecution.


* Refrain from deliberately accessing illegal sites.

* Comply with copyright laws or software licensing restrictions.

* Do not make any attempt to damage computer equipment or software;
alter software configurations; cause degradation of system performance.

* Respect the privacy of others.

* Refrain from any activity which is disruptive, libelous or slanderous.

The library staff may request computer users to move to another location or 
vacate a station in order to ensure equitable use for all patrons. Any person 
showing a lack of cooperation with library staff in the use of the Internet is 
subject to being restricted from using it.

Be a smart consumer of information. Evaluate information for accuracy, 
completeness, and validity. Remember, the safest use of the Internet is to 
provide no personal information.

I have read and understand the Library guidelines for Internet use.

(Which can seem a bit hardcore, now that I've typed them out. I wonder how many 
people actually read them when they log in to know what they're agreeing to?)

___

Date:Thu, 22 Jan 2015 16:22:38 +

From:Riley Childs rchi...@cucawarriors.com

Subject: Re: wifi / network use policies



This is the one we use for our Guest/Student network





By accessing this network you agree to the following:



***ALL ACCESS ON THIS NETWORK IS LOGGED***

* You will not utilize this network for any illegal purpose
* You will not utilize this network in such a way that may degrade the
performance of the network for others
* Usage of this network may be revoked at any time for any reason deemed by
Charlotte United Christian Academy
* You agree to hold Charlotte United Christian Academy and Associated
Organizations blameless in any issues that may arise

If you have any issues please contact IT Services



--

Riley Childs

Senior

IT Manager

Library Services Administrator

Charlotte United Christian Academy

office: +1 (704) 537-0331 x101

mobile: +1 (704) 497-2086

web: rileychilds.net

twitter: @RowdyChildren

Checkout our new Online Library Catalog: catalog.cucawarriors.com





From: Code for Libraries CODE4LIB@LISTSERV.ND.EDU on behalf of Nate Hill 
nathanielh...@gmail.com

Sent: Thursday, January 22, 2015 9:11 AM

To: CODE4LIB@LISTSERV.ND.EDU

Subject: [CODE4LIB] wifi / network use policies



Hi all,



I wonder if libraries that manage their own networks, either academic or 
public, would be willing to share their wifi / network use policies with me?  
I'm working with the city of Chattanooga to separate our library's 4th Floor 
GigLab http://blog.giglab.io/ from the city's network.  The 4th Floor is our 
library's beta space / makerspace / civic lab, and we are constantly running 
public experiments of one kind or another here.  Our ISP has given us a 
separate 1gig fiber drop for this space, and we intend to use (or keep using) 
the whole area as a public laboratory to experiment with the network, hardware, 
and software.



So... I need to get a policy to city legal for review and to my board before we 
actually make this separation.  I don't really want to go to jail when someone 
hacks North Korea from the library's GigLab.



Thanks for any documents or input you all might provide,



Nate





--

Nate Hill

nathanielh...@gmail.com

http://4thfloor.chattlibrary.org/

http://www.natehill.net



--
Alexander Byrne
Youth Services Librarian
Pierce County Library System
Current Assignments:
University Place Pierce County Library: 253-548-3307
Steilacoom Pierce County Library: 253-548-3313


Re: [CODE4LIB] MARC Validation in a UNIX Environment

2015-01-23 Thread Terry Reese
I believe MARC::Lint
(http://search.cpan.org/~eijabb/MARC-Lint_1.48/lib/MARC/Lint.pm ) provides
some of that functionality.

--tr



[CODE4LIB] Encrypting EZProxy + SIP2 authentication

2015-01-23 Thread Jane Sandberg
Hi all,

I'd like to have our EZProxy server authenticate users using SIP2,
which is totally supported and documented here:
http://www.oclc.org/support/services/ezproxy/documentation/usr/sip.en.html.

However, I am not enthusiastic about sending unencrypted patron login
information over Telnet or raw sockets, and neither is our ILS
sysadmin.  I'd like to figure out a way to perform the SIP2
authentication/authorization check over SSH, but am not quite sure how
best to do that.  Do either of these approaches make sense?

* Installing stunnel on the EZProxy server to encrypt the outgoing and
incoming SIP2 traffic.

* Writing a custom external script that would handle the whole auth
process: SSHing into our SIP server and seeing if the user is legit.
Here's what EZProxy has to say about this type of option:
http://www.oclc.org/support/services/ezproxy/documentation/usr/external.en.html
-- I'd have to write some code to handle the SIP auth rather than
using EZProxy's built-in option, but my ILS has pretty good
documentation for its SIP implementation.
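
For the second option, the SIP2 side of such a script might look roughly
like the Python sketch below. It assumes a TLS endpoint (stunnel, for
example) sits in front of the SIP2 port on the ILS host; stunnel wraps
traffic in TLS rather than SSH. The host, port, institution ID, and function
names are placeholders, not Evergreen or EZProxy specifics. The sketch
builds a Patron Status Request (message 23) with the standard SIP2 checksum
and looks for the BL (valid patron) and CQ (valid patron password) fields in
the reply. Many SIP2 servers also expect a Login (93) message first, which
is left out here.

```python
import socket
import ssl
from datetime import datetime


def sip2_checksum(message: str) -> str:
    """16-bit two's complement of the byte sum, as four hex digits."""
    total = sum(message.encode("ascii"))
    return format(-total & 0xFFFF, "04X")


def patron_status_request(institution, barcode, password, seq=0, now=None):
    """Build a SIP2 Patron Status Request (message 23)."""
    now = now or datetime.now()
    # SIP2 timestamp: YYYYMMDD, four blanks (local time), HHMMSS
    stamp = now.strftime("%Y%m%d    %H%M%S")
    # "000" is the language code for "unknown"; AC (terminal password) is empty
    body = (f"23000{stamp}AO{institution}|AA{barcode}|AC|AD{password}|"
            f"AY{seq}AZ")
    return body + sip2_checksum(body) + "\r"


def check_patron(host, port, request):
    """Send one request over TLS; True if barcode and password are valid."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(request.encode("ascii"))
            reply = b""
            while b"\r" not in reply:
                chunk = tls.recv(4096)
                if not chunk:
                    break
                reply += chunk
    fields = reply.decode("ascii", "replace").split("|")
    return "BLY" in fields and "CQY" in fields
```

With the first option (stunnel on both ends), none of this code would be
needed: EZProxy's built-in SIP support could simply point at the local
stunnel port.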

Am I missing some simpler option?  Our EZProxy is running on a Windows
machine, by the way, and we use Evergreen as our ILS.  I'd love any
advice or suggestions that you seasoned EZProxy experts can share.

Appreciatively,

  -Jane

-- 
Jane Sandberg
Electronic Resources Librarian
Linn-Benton Community College
sand...@linnbenton.edu / 541-917-4655


[CODE4LIB] Assignment planner-calculator use

2015-01-23 Thread Jason Stirnaman
One of our librarians came across K-State's Assignment Planner 
http://www.lib.k-state.edu/apps/ap/
which is based on Minnesota/Minitex's 
http://sourceforge.net/projects/research-calc/
We're curious to hear:
  1. some anecdotes as to how much use this kind of service gets and
  2. if there are worthy alternatives (free or fee)?

Contact me directly if you prefer.

Thanks,
Jason

Jason Stirnaman, MLS
Application Development, Library and Information Services, IR
University of Kansas Medical Center
jstirna...@kumc.edu
913-588-7319


Re: [CODE4LIB] wifi / network use policies

2015-01-23 Thread Kyle Banerjee
I haven't managed a network for years, but our approach was to provide a
broad statement of what the network was for and to make it clear the
network couldn't be used for malicious or illegal purposes.

The CYA policy is a start but you'll still have to deal with problems such
as people using the network to stalk/harass others, intentionally or
unintentionally attack other systems, and piracy. Balancing user needs with
very real privacy issues, network capacity, and the sad fact that some
people act like jerks when they can hide behind a veil of anonymity is
challenging. I'm glad I don't have to worry about that kind of stuff
anymore.

kyle





Re: [CODE4LIB] Assignment planner-calculator use

2015-01-23 Thread Dhanushka Samarakoon
Hi Jason,
I can answer the first question. Since we launched it in Nov/2013, we have
had 278 assignments scheduled through the system.
Feel free to contact me if you need any other information.
-Dhanushka.




[CODE4LIB] Checksums for objects and not embedded metadata

2015-01-23 Thread Kyle Banerjee
Howdy all,

I've been toying with the idea of embedding DOIs in all our digital assets
and possibly inserting/updating other metadata as well. However, doing this
would alter checksums created using normal methods.

Is there a practical/easy way to checksum only the objects themselves
without the metadata? If the metadata in a tiff or other kind of file is
modified, it does nothing to the actual object. Since providing more
complete metadata within objects makes them more usable/identifiable and
might simplify migrations down the road, it seems like this wouldn't be a
bad way to go.
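
One possible approach, sketched in Python with only the standard library,
is to hash just the parts of the file that define the image. PNG is used
here only because its chunk layout is easy to walk; a TIFF would need its
strip or tile data located through the IFD, or a library that decodes the
pixels. The function names and the choice of chunks (IHDR, PLTE, IDAT) are
assumptions for illustration, not an established convention.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def iter_chunks(data: bytes):
    """Yield (chunk_type, chunk_body) for each chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # 4 length + 4 type + body + 4 CRC


def payload_digest(data: bytes) -> str:
    """SHA-256 over image header, palette, and decompressed pixel data only."""
    digest = hashlib.sha256()
    compressed = b""
    for ctype, body in iter_chunks(data):
        if ctype in (b"IHDR", b"PLTE"):
            digest.update(body)
        elif ctype == b"IDAT":
            compressed += body  # IDAT chunks form one zlib stream
    digest.update(zlib.decompress(compressed))
    return digest.hexdigest()
```

Text chunks such as tEXt or iTXt are skipped, so adding or editing embedded
metadata changes the whole-file checksum but not this digest.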

Thanks,

kyle


Re: [CODE4LIB] MARC Validation in a UNIX Environment

2015-01-23 Thread Scott Prater
We do most of our development work with the Java library MARC4j (you can 
output MARCXML, and validate that):


http://marc4j.tigris.org/

And we've used the MARC tools in YAZ on the command line quite often, too:

http://www.indexdata.com/yaz

The LOC has a comprehensive list of tools:

http://www.loc.gov/marc/marctools.html
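
As a rough illustration of what a structural check on binary MARC (ISO
2709) involves, here is a minimal Python sketch using only the standard
library. It is not taken from any of the tools above, and the function name
is made up for the example. It checks the record length and base address in
the leader, the 12-byte directory entries, and the field and record
terminators. It does no content-level validation of tags, indicators, or
subfields.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
LEADER_LENGTH = 24
ENTRY_LENGTH = 12  # tag (3) + field length (4) + starting position (5)


def check_marc_record(record: bytes) -> list:
    """Return a list of structural problems found in one binary MARC record."""
    errors = []
    if len(record) <= LEADER_LENGTH:
        return ["record is too short to hold a leader and any data"]

    leader = record[:LEADER_LENGTH]
    if not leader[0:5].isdigit():
        errors.append("leader 00-04 (record length) is not numeric")
    elif int(leader[0:5]) != len(record):
        errors.append("leader record length does not match actual length")

    if not record.endswith(RECORD_TERMINATOR):
        errors.append("record does not end with the record terminator")

    if not leader[12:17].isdigit():
        errors.append("leader 12-16 (base address of data) is not numeric")
        return errors
    base = int(leader[12:17])
    if not LEADER_LENGTH < base < len(record):
        errors.append("base address of data is out of range")
        return errors

    directory = record[LEADER_LENGTH:base]
    if directory.endswith(FIELD_TERMINATOR):
        directory = directory[:-1]
    else:
        errors.append("directory does not end with a field terminator")
    if len(directory) % ENTRY_LENGTH:
        errors.append("directory length is not a multiple of 12")

    for i in range(0, len(directory) - ENTRY_LENGTH + 1, ENTRY_LENGTH):
        entry = directory[i:i + ENTRY_LENGTH]
        tag = entry[:3].decode("ascii", "replace")
        if not entry[3:].isdigit():
            errors.append(f"field {tag}: directory entry is not numeric")
            continue
        # End offset = base address + starting position + field length
        end = base + int(entry[7:12]) + int(entry[3:7])
        if end >= len(record):
            errors.append(f"field {tag}: extends past the end of the record")
        elif record[end - 1:end] != FIELD_TERMINATOR:
            errors.append(f"field {tag}: does not end with a field terminator")
    return errors
```

Records in a file can be split on the record terminator byte (0x1D) and
passed to the function one at a time.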

-- Scott




--
Scott Prater
Shared Development Group
General Library System
University of Wisconsin - Madison


Re: [CODE4LIB] wifi / network use policies

2015-01-23 Thread Riley Childs
Another question: Are you talking about a full-on AUP or a short statement
(like the one I provided)?

Sent from my Windows Phone

--
Riley Childs
Senior
Charlotte United Christian Academy
Library Services Administrator
IT Services Administrator
(704) 537-0331x101
(704) 497-2086
rileychilds.net
@rowdychildren
I use Lync (select External Contact on any XMPP chat client)






[CODE4LIB] Allies bingo card

2015-01-23 Thread Andreas Orphanides
Can I just say how much I love this Tech Diversity Bingo Card that was
posted on Geek Feminism recently:

http://www.maleallies.com/

The website is Male Allies but the bingo card is designed to be
applicable to allies of any historically marginalized group in tech. I also
like that -- unlike many other bingo card social commentaries -- it
highlights things that allies can do right rather than what's often done
wrong (yay positive behavior modeling).


[CODE4LIB] Job: Digitization Services Manager at New York Public Library

2015-01-23 Thread jobs
Digitization Services Manager
New York Public Library
Long Island City

_**Overview:**_

  
The Digital Imaging Unit of NYPL Labs is seeking a visionary, inventive
manager to help The New York Public Library to share its vast collections with
the world through digitization. The Manager will oversee the preservation-
grade photography and reformatting of The New York Public Library's rare and
unique holdings (including illuminated manuscripts, prints, rare books,
literary and historical archives, historical maps, one of the world's largest
photography collections, Broadway set and costume designs, and diverse
documents charting the changing landscape of New York City). The Digitization
Services Manager will also serve as a primary architect of a range of new
digitization streams with the goal of dramatically increasing the volume,
speed, and range of NYPL's imaging activities (e.g. rapid book scanning, high-
speed microfilm digitization, and experiments with new approaches). A key
member of the NYPL Labs leadership team, this is a perfect opportunity for an
enthusiastic, problem-solving individual interested in the full digital
library lifecycle, from digitization to the creation of online access
platforms and user-engagement tools (see http://labs.nypl.org/ for some
examples).

  
  
_**About the Department**_:

  
The Digital Imaging Unit is part of the New York Public Library Labs (NYPL
Labs). Based dually at the Library's landmark central branch on 42nd Street,
and at its cutting-edge services center in Long Island City, NYPL Labs is an
interdisciplinary team working to reformat and reposition the Library's
knowledge for the Internet age. Labs combines core digital library capacities
(digitization, metadata, permissions/reproductions etc.) with an award-winning
tech/design and outreach team focused on deepening engagement with digital
collections and data, and fostering new forms of research and creativity.

  
  
_**Responsibilities:**_

  
Under the general direction of the Deputy Director, NYPL Labs:

  
* Supervises the Digital Imaging Unit (DIU) team (7 FTE), including
performance management, training and scheduling.
* Reviews practices, procedures, and policies of the department and revises
them as necessary to improve efficiency and/or workflow.
* Identifies and implements best technical practices for producing and
archiving digital images.
* Assists with the selection of new equipment and follows trends in digital
reformatting.
* Maintains production goals for the unit, ensuring that the staff
development needs are met and the infrastructure is in place to support the
unit.
* Oversees material preparation, quality control, and documentation.
* Liaises with the Library's curators, librarians, exhibitions staff,
metadata staff, repository development team, and others on a wide variety of
projects.
* Oversees the receiving and returning of materials to and from collection
units.
* Oversees the preparation of digital files for archiving, printing and
access purposes.
* Oversees the timely delivery of public order photography requests.
* Troubleshoots capture issues and advises on solutions.
* Handles and assigns special projects.
* Performs related duties as required.

  
_**Qualifications:**_

  
* Bachelor's degree and substantial relevant experience in a research
library or similar institution; or an equivalent combination of education
and experience.
* Successfully demonstrated experience with production work, production
scheduling and attainable goal setting.
* Successfully demonstrated experience training, supervising and evaluating
staff, and demonstrated experience in workflow planning and management,
production of statistics, and an understanding of bibliographic records.
* Successfully demonstrated working knowledge of information technologies
associated with digitizing books, documents and visual materials, including
experience with Adobe Photoshop CS.
* Excellent working knowledge of practices and procedures for copy
photography, including the understanding of cameras, lighting and lenses,
and demonstrated experience changing lenses and handling medium format
camera and lighting hardware.
* Demonstrated understanding of bibliographic control issues within a large
research library environment preferred.
* Knowledge of library preservation issues and successfully demonstrated
experience handling rare and fragile materials preferred.
* Demonstrated experience in a production environment.
* Awareness of leading trends in digitization, and an eagerness to expand
the scope of digitization possibilities at NYPL.

  
_**Work Environment:**_

  
Professional photography studio located at NYPL Library Services Center in
Long Island City, Queens; travel for meetings at various Manhattan research
center locations as required.

  
**Union / Non Union:**  
Local 1930

  
_*TO APPLY, PLEASE VISIT*_


Re: [CODE4LIB] Plagiarism checker

2015-01-23 Thread Andreas Orphanides
My first thought was something like programmatically doing a pairwise diff
of the files, 5500 times. I was surprised I couldn't find a utility that
just does this.

But I did find something called diffuse [1], which allows you to graphically
compare any number of text files in a diff-like fashion. This would
probably at least be able to tell you which files need closer scrutiny.

I think you'd presumably have to be able to extract the text from each
file; I doubt it would work on raw Word docs or PDFs, so that might be a
stopper.

It seems like the realm of source control has a lot of software designed to
help with this problem, so there might be other similar things out there.
But probably not anything designed to natively handle print-ready files.
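
For what it's worth, the pairwise idea can be sketched in a few lines of
Python with the standard library's difflib, assuming the text has already
been extracted from each file. The function name and the 0.8 threshold are
arbitrary choices for illustration, and the similarity ratio only flags
pairs for a closer look; it is not a plagiarism verdict.

```python
from difflib import SequenceMatcher
from itertools import combinations


def rank_pairs(texts: dict, threshold: float = 0.8) -> list:
    """Compare every pair of documents; return the most similar pairs first.

    `texts` maps a file name to its extracted plain text.
    """
    hits = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(texts.items()), 2):
        # autojunk=False keeps frequent characters from being ignored
        ratio = SequenceMatcher(None, text_a, text_b, autojunk=False).ratio()
        if ratio >= threshold:
            hits.append((ratio, name_a, name_b))
    return sorted(hits, reverse=True)
```

Character-level matching like this is slow on long documents; comparing
lists of words instead of raw strings is one way to speed it up.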

-dre.


[1] http://diffuse.sourceforge.net/about.html

On Fri, Jan 23, 2015 at 7:26 AM, Judy Meirose jmeir...@fcsl.edu wrote:

 Can anyone recommend a plagiarism checking software besides Turnitin and
 SafeAssign?  I need to compare about 100 student assignments against each
 other to make sure they don't copy each other's assignments.

 Thanks.

 Judy K. Meirose
 Systems Librarian
 Florida Coastal School of Law
 8787 Baypine Rd
 Jacksonville, FL
 (904)680-7603

 This email transmission, and any documents, files or previous e-mail
 messages attached to it, may contain confidential, privileged and/or
 proprietary information for the sole use of the intended recipient(s). If
 you are not an intended recipient or a person responsible for delivering it
 to an intended recipient, any disclosure, copying, distribution or use of
 any of the information contained in or attached to this transmission is
 strictly prohibited. If you have received this transmission in error,
 please: (1) immediately notify me by reply e-mail; and (2) destroy the
 original (and any copies) of this transmission and its attachments without
 reading or saving in any manner.



Re: [CODE4LIB] Plagiarism checker

2015-01-23 Thread Adam Traub
Just thought I'd pop my head in:

TurnItIn does compare to other previous submissions (both at your own 
institution and others) unless the submitter chooses not to include them in the 
repository for future checks.  

Cheers,
Adam Traub
Electronic Resources Librarian
The Wallace Center
Rochester Institute of Technology
adam.tr...@rit.edu



-Original Message-
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Mark A. 
Matienzo
Sent: Friday, January 23, 2015 9:45 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Plagiarism checker

I believe Turnitin and SafeAssign both compare the text of submissions 
against external sources (e.g., SafeAssign uses ABI/INFORM, among others).
I am not certain if they compare submissions against each other.

However, if you're looking for something along the lines of what Dre suggests, 
you could use ssdeep, which is an implementation of a piecewise hashing 
algorithm [0]. The issue with that is that you would have to assume that all 
students are using the same file format.

You could also use something like Tika [1] to extract the text content from all 
the submissions, and then compare them against each other.

[0] http://ssdeep.sourceforge.net/
[1] http://tika.apache.org/

Mark

--
Mark A. Matienzo m...@matienzo.org
Director of Technology, Digital Public Library of America

On Fri, Jan 23, 2015 at 8:47 AM, Andreas Orphanides akorp...@ncsu.edu
wrote:

 My first thought was something like programatically doing a pairwise 
 diff of the files, 5500 times. I was surprised I couldn't find a 
 utility that just does this.

 But i did find something called diffuse [1], that allows you to 
 graphically compare any number of text files in a diff-like fashion. 
 This would probably at least be able to tell you which files need closer 
 scrutiny.

 I think you'd presumably have to be able to extract the text from each 
 file; I doubt it would work on raw Word docs or PDFs, so that might be 
 a stopper.

 It seems like the realm of source control has a lot of software 
 designed to help with this problem, so there might be other similar things 
 out there.
 But probably not anything designed to natively handle print-ready files.

 -dre.


 [1] http://diffuse.sourceforge.net/about.html

 On Fri, Jan 23, 2015 at 7:26 AM, Judy Meirose jmeir...@fcsl.edu wrote:

  Can anyone recommend a plagiarism checking software besides Turnitin 
  and SafeAssign?  I need to compare about 100 student assignments 
  against each other to make sure they don't copy each other's assignments.
 
  Thanks.
 
  Judy K. Meirose
  Systems Librarian
  Florida Coastal School of Law
  8787 Baypine Rd
  Jacksonville, FL
  (904)680-7603
 
  This email transmission, and any documents, files or previous e-mail
  messages attached to it, may contain confidential, privileged and/or
  proprietary information for the sole use of the intended
  recipient(s). If you are not an intended recipient or a person
  responsible for delivering it to an intended recipient, any
  disclosure, copying, distribution or use of any of the information
  contained in or attached to this transmission is strictly
  prohibited. If you have received this transmission in error,
  please: (1) immediately notify me by reply e-mail; and (2) destroy
  the original (and any copies) of this transmission and its
  attachments without reading or saving in any manner.
 



Re: [CODE4LIB] Plagiarism checker

2015-01-23 Thread Joe Hourcle
On Jan 23, 2015, at 9:44 AM, Mark A. Matienzo wrote:

 I believe Turnitin and SafeAssign both compare the text of submissions
 against external sources (e.g., SafeAssign uses ABI/INFORM, among others).
 I am not certain if they compare submissions against each other.

My understanding of Turnitin, at least initially, was that they
built their corpus from existing submissions.

(They had some deals with universities back when they started up
to use their service for free or cheap, so that they could build
up their corpus.)


 However, if you're looking for something along the lines of what Dre
 suggests, you could use ssdeep, which is an implementation of a piecewise
 hashing algorithm [0]. The issue is that you would have to assume that
 all students would probably be using the same file format.
 
 You could also use something like Tika to extract the text content from
 all the submissions, and then compare them against each other.

I'd agree on extracting the text.  MS Word used to store documents
as strings of edits, making it difficult to compare two
documents for similarity without parsing the format.

(I don't know if they still do this in .docx)
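
For what it's worth, a .docx file is a ZIP archive whose main body is
stored as XML in word/document.xml, so the plain text can be pulled out
without Word. Below is a minimal sketch using only Python's standard
library. The helper name docx_text is hypothetical, and the sketch
ignores headers, footnotes, and text boxes, so treat it as a starting
point rather than a complete parser.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one paragraph per line.

    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W + "p"):
        # A paragraph's visible text is split across <w:t> runs.
        paragraphs.append("".join(t.text or "" for t in para.iter(W + "t")))
    return "\n".join(paragraphs)
```

Once every submission is reduced to plain text this way, the original
file format stops mattering for the comparison step.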

-Joe


Re: [CODE4LIB] Plagiarism checker

2015-01-23 Thread Mark A. Matienzo
I believe Turnitin and SafeAssign both compare the text of submissions
against external sources (e.g., SafeAssign uses ABI/INFORM, among others).
I am not certain if they compare submissions against each other.

However, if you're looking for something along the lines of what Dre
suggests, you could use ssdeep, which is an implementation of a piecewise
hashing algorithm [0]. The issue is that you would have to assume that
all students would probably be using the same file format.
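
To illustrate the general idea behind piecewise hashing, here is a
minimal Python sketch. It is not ssdeep's actual algorithm or API: the
function names (piecewise_hashes, similarity), the window size, and the
boundary rule are all assumptions chosen for illustration. It splits the
input into pieces at content-defined boundaries, hashes each piece, and
scores two inputs by how many piece hashes they share.

```python
import hashlib

WINDOW = 8    # bytes of context used to decide piece boundaries (assumed)
DIVISOR = 16  # a boundary falls roughly every 16 bytes on average (assumed)


def piecewise_hashes(text):
    """Split text at content-defined boundaries and hash each piece."""
    data = text.encode("utf-8")
    pieces = set()
    start = 0
    for i in range(WINDOW, len(data) + 1):
        window = data[i - WINDOW:i]
        # The boundary decision depends only on the last WINDOW bytes, so
        # an insertion early in the input does not move later boundaries.
        # (Recomputed per position for clarity; not a true rolling hash.)
        trigger = int.from_bytes(hashlib.md5(window).digest()[:4], "big")
        if trigger % DIVISOR == 0:
            pieces.add(hashlib.sha1(data[start:i]).hexdigest()[:12])
            start = i
    if start < len(data):
        pieces.add(hashlib.sha1(data[start:]).hexdigest()[:12])
    return pieces


def similarity(a, b):
    """Jaccard overlap of the two inputs' piece hashes, from 0.0 to 1.0."""
    ha, hb = piecewise_hashes(a), piecewise_hashes(b)
    if not ha and not hb:
        return 1.0
    return len(ha & hb) / len(ha | hb)
```

The same approach applied to raw bytes is why the file format matters:
two essays with identical wording but different container formats would
share almost no pieces.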

You could also use something like Tika [1] to extract the text content from
all the submissions, and then compare them against each other.
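
Once the text is extracted, the pairwise comparison can be sketched with
Python's standard library. This is a minimal sketch, assuming the
extracted text is already in memory as a dict keyed by a hypothetical
submission name; rank_pairs and the 0.8 threshold are illustrative
choices, not part of Tika or of any plagiarism tool. For 100 submissions
it makes 100 * 99 / 2 = 4,950 comparisons.

```python
from difflib import SequenceMatcher
from itertools import combinations


def rank_pairs(texts, threshold=0.8):
    """Return (ratio, name_a, name_b) for every pair of submissions whose
    similarity ratio is at or above threshold, most similar first."""
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(sorted(texts.items()), 2):
        # Compare word sequences rather than characters so the score is
        # less sensitive to whitespace and line-wrapping differences.
        matcher = SequenceMatcher(None, text_a.split(), text_b.split(), autojunk=False)
        ratio = matcher.ratio()
        if ratio >= threshold:
            flagged.append((ratio, name_a, name_b))
    return sorted(flagged, reverse=True)
```

A high ratio only marks a pair for closer reading; shared quotations or
a common assignment prompt will also push the score up.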

[0] http://ssdeep.sourceforge.net/
[1] http://tika.apache.org/

Mark

--
Mark A. Matienzo m...@matienzo.org
Director of Technology, Digital Public Library of America

On Fri, Jan 23, 2015 at 8:47 AM, Andreas Orphanides akorp...@ncsu.edu
wrote:

 My first thought was something like programmatically doing a pairwise diff
 of the files, 4,950 times (100 files taken two at a time). I was surprised
 I couldn't find a utility that just does this.

 But I did find something called diffuse [1], which allows you to graphically
 compare any number of text files in a diff-like fashion. This would
 probably at least be able to tell you which files need closer scrutiny.

 I think you'd presumably have to be able to extract the text from each
 file; I doubt it would work on raw Word docs or PDFs, so that might be a
 stopper.

 It seems like the realm of source control has a lot of software designed to
 help with this problem, so there might be other similar things out there.
 But probably not anything designed to natively handle print-ready files.

 -dre.


 [1] http://diffuse.sourceforge.net/about.html

 On Fri, Jan 23, 2015 at 7:26 AM, Judy Meirose jmeir...@fcsl.edu wrote:

  Can anyone recommend plagiarism-checking software besides Turnitin and
  SafeAssign?  I need to compare about 100 student assignments against each
  other to make sure they don't copy each other's assignments.
 
  Thanks.
 
  Judy K. Meirose
  Systems Librarian
  Florida Coastal School of Law
  8787 Baypine Rd
  Jacksonville, FL
  (904)680-7603
 
 



[CODE4LIB] Plagiarism checker

2015-01-23 Thread Judy Meirose
Can anyone recommend plagiarism-checking software besides Turnitin and
SafeAssign?  I need to compare about 100 student assignments against each other 
to make sure they don't copy each other's assignments.

Thanks.

Judy K. Meirose
Systems Librarian
Florida Coastal School of Law
8787 Baypine Rd
Jacksonville, FL
(904)680-7603
