Re: [CODE4LIB] change management system

2010-02-11 Thread Randy Stern

Harvard University Library does separate OPS and DEV.

Only OPS have write permissions on production boxes, and we have a change 
control procedure that is implemented through a series of scripts.


In addition to Development and QA instances of applications, we have 
implemented a Staging server for releases. The staging server is configured 
identically to the production servers, with access to production databases. 
By release I mean deployment of a software update that has already been 
QA'ed in a QA environment writable by developers (QA'ed by library project 
staff who play that role, we do not have a formal QA group).


Developers run a stage script to check code out of source control and 
build on the staging server. After a stage, limited testing is done, by the 
developer usually, on the staging server to confirm that the QA'ed software 
seems to operate properly with production database and file system mounts. 
Once that is done, the developer runs a publish script to let the 
operations staff know that the release is ready for deployment. Operations 
runs a move2prod script to deploy the software, typically to multiple 
production servers. They have a rollback script available should 
something go wrong in the deployment.


For tracking of this process, and for software bug tracking, we use good 
ol' Bugzilla. Before a publish, a bug is entered in an Operations instance 
of Bugzilla, for the change control product. All steps in the release are 
tracked as updates to the bug. A little bit of a distortion of what 
Bugzilla was designed for, but it's working well for us...
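
The release steps described above can be sketched as a small state machine. This is only an illustration: the step names (stage, publish, move2prod, rollback) come from the description, but the Release class, its fields, the release name, and the ordering rules encoded here are assumptions, not the actual Harvard scripts.

```python
from dataclasses import dataclass, field

# Allowed next steps for each state. The step names come from the description
# above; the exact ordering rules are an assumption made for illustration.
ALLOWED = {
    "qa_passed": {"stage"},
    "stage": {"stage", "publish"},   # a developer may re-stage after a fix
    "publish": {"move2prod"},
    "move2prod": {"rollback"},
    "rollback": {"stage"},
}


@dataclass
class Release:
    """Hypothetical change-control record, one per tracked release."""
    name: str
    state: str = "qa_passed"
    history: list = field(default_factory=list)  # mirrors updates to the bug

    def advance(self, step: str, actor: str) -> None:
        if step not in ALLOWED[self.state]:
            raise ValueError(f"{step!r} is not allowed after {self.state!r}")
        # Only operations staff have write permissions on production.
        if step in {"move2prod", "rollback"} and actor != "ops":
            raise PermissionError(f"{step!r} must be run by operations")
        self.state = step
        self.history.append((step, actor))


release = Release("catalog-2.3")  # hypothetical release name
release.advance("stage", "dev")
release.advance("publish", "dev")
release.advance("move2prod", "ops")
print(release.state, release.history)
```

Keeping the history as a plain list of (step, actor) pairs mirrors the idea of recording every step as an update to a single bug.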


- Randy


At 08:55 AM 2/11/2010 -0800, Walker, David wrote:

Thanks to everyone who responded.  The comments have been very helpful!

Is anyone using RT? [1]

Also, I'm curious how many academic libraries are following a formal 
change management process?


By that, I mean: Do you maintain a strict separation between developers 
and operations staff (the people who put the changes into 
production)?  And do you have something like a Change Advisory Board that 
reviews changes before they can be put into production?


Just as background to these questions:

We've been asked to come up with a change management procedure/system for 
a variety of academic technology groups here that have not previously had 
one (at least nothing formal).  But we find the process that the business 
(i.e., PeopleSoft) folks here follow to be a bit too elaborate for our 
purposes.  They use Remedy.


--Dave

[1] http://bestpractical.com/rt

==
David Walker
Library Web Services Manager
California State University
http://xerxes.calstate.edu

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Mark A. 
Matienzo [m...@matienzo.org]

Sent: Thursday, February 11, 2010 5:47 AM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] change management system

I'm inclined to say that any sort of tracking software could be used
for this - it's mostly an issue of creating and sticking with policy
decisions about what the various workflow states are, how things
become triaged, etc. I believe if you define that up front, you could
find Trac or any other tracking/issue system adaptable to what you
want to do.

Mark A. Matienzo
Digital Archivist, Manuscripts and Archives
Yale University Library


[CODE4LIB] DASH Open Access repository developer position

2010-01-15 Thread Randy Stern
The Office for Scholarly Communication of the Harvard University Library 
(http://osc.hul.harvard.edu/osc.php) has a job opening for a Digital 
Library Software Engineer to work primarily on DSpace development of our 
DASH open access repository (http://dash.harvard.edu/). The DASH repository 
implements Harvard's Open Access policies for making the scholarly output 
of Harvard faculty and staff available to the world. If interested in this 
position, please see the job posting for details:


http://jobs.harvard.edu/jobs/summ_req?in_post_id=43042


Re: [CODE4LIB] OCR for handwritten pages

2010-01-13 Thread Randy Stern
Parascript (http://www.parascript.com/) has handwriting recognition 
software, but it only works reliably for things like forms, checks, and 
addresses where there is a lot of dictionary-like context to verify the 
image recognition.  Generalized free-text handwriting recognition is an 
unsolved problem.


At 01:50 PM 1/13/2010 -0700, Han, Yan wrote:

Hello, Colleagues,
Does anyone know of or use any OCR software that works on handwritten 
pages, or at least that you think is better than hiring a student to key 
the text in?

I know of OCR software such as ABBYY, but it does not work on handwriting.

Thanks,
Yan


Re: [CODE4LIB] Recommend book scanner?

2009-05-04 Thread Randy Stern

Printed test sheets:

http://www.diytrade.com/china/4/products/1707979/IEEE_Resolution_Chart.html?r=0

or

http://www.aig-imaging.com/mm5/merchant.mvc?Screen=PRODStore_Code=AIIPIProduct_Code=QA-60Category_Code=Video-Scanner-Resolution-Charts

At 04:54 PM 5/2/2009 -0700, st...@archive.org wrote:

On 5/1/09 8:27 PM, Lars Aronsson wrote:
Does anybody have a printed test sheet that we can scan or photo, and 
then compare the resulting digital images?  It should have lines at 
various densities and areas of different colours, just like an old TV 
test image.  Can you buy such calibration sheets?


archive.org scans typically include a color card target
image near the back (or front) of the book, e.g.

http://www-steve.us.archive.org/public/data/eg/birdsthateverych00doub/birdsthateverych00doub_jp2/birdsthateverych00doub_0371.jp2

typical specs for our scanning rig (scribe) are roughly:

  1 8x8x5' scribe structure
  2 Canon EOS 5Ds
  2 light boxes
  1 orthogonal glass platen and cradle
  1 foot pedal, pulley system
  Linux PC
    LAMP stack
    custom web-based UI
    gphoto, imagemagick, leptonica, rsync
  fast internet


we scan over 1,000 books a day with about 100 scribes like this.


/st...@archive.org


Re: [CODE4LIB] Recommend book scanner?

2009-05-01 Thread Randy Stern
My understanding is that a flatbed or sheetfed document scanner that 
produces 300 dpi will produce much better OCR results than a cheap digital 
camera that produces 300 dpi. The reasons have to do with the resolution 
and distortion of the resulting image. Resolution is defined as the 
number of line pairs per mm that can be resolved (for example when scanning 
a test chart) - in other words, the detail that will show up in character 
images. Distortion is image aberration that can appear at the edges of 
the page image areas, particularly when illumination is not even. A scanner 
has much more even illumination.
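
As a rough illustration of the resolution point above: the sampling limit gives an upper bound on how many line pairs per mm a given dpi can represent, since one line pair needs at least two pixels. The function name is made up for this sketch, and real optics and uneven lighting will resolve less than this bound.

```python
MM_PER_INCH = 25.4


def max_line_pairs_per_mm(dpi: float) -> float:
    """Upper bound on resolvable line pairs per mm at a given sampling rate.

    One line pair (a dark line plus a light line) needs at least two pixels,
    so the bound is half the number of pixels per mm.
    """
    pixels_per_mm = dpi / MM_PER_INCH
    return pixels_per_mm / 2


for dpi in (150, 300, 600):
    print(f"{dpi} dpi -> at most {max_line_pairs_per_mm(dpi):.1f} lp/mm")
```

At 300 dpi the bound works out to about 5.9 line pairs per mm. Two devices that both report 300 dpi can still differ in how close they get to it, which is what a test chart measures.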


At 11:21 AM 5/1/2009 -0700, Erik Hetzner wrote:

At Fri, 1 May 2009 09:51:19 -0500,
Amanda P wrote:

 On the other hand, there are projects like bkrpr [2] and [3],
 home-brew scanning stations built for marginally more than the cost of
 a pair of $100 cameras.

 Cameras around $100 are very low quality. You could get nowhere
 near the dpi recommended for materials that need to be OCRed. Not only
 would the quality of images from cameras be low, but the OCR (even with
 the best software) would probably have many errors. For someone scanning
 items at home this might be ok, but for archival quality, I would not
 recommend cameras. If you are grant funded and the grant provider
 requires a certain level of quality, you need to make sure the scanning
 mechanism you use can scan at that quality.

I know very little about digital cameras, so I hope I get this right.

According to Wikipedia, Google uses (or used) an 11MP camera (Elphel
323). You can get a 12MP camera for about $200.

With a 12MP camera you should easily be able to get 300 DPI images of
book pages and letter size archival documents. For a $100 camera you
can get more or less 300 DPI images of book pages. *

The problems I have always seen with OCR had much more to do with
alignment and artifacts than with DPI. 300 DPI is fine for OCR as far as
my (limited) experience goes - as long as you have quality images.

If your intention is to scan items for preservation, then, yes, you
want higher quality - but I can’t imagine any setup for archival
quality costing anywhere near $1000. If you just want to make scans
and full text OCR available, these setups seem worth looking at -
especially if the software and workflow can be improved.

best,
Erik

* 12 MP seems to equal 4256 x 2848 pixels. To take a ‘scan’ (photo) of
a page at 300 DPI, that page could be as large as about 14.19 x 9.49
inches (dividing pixels by 300). As long as you can get the camera close
enough to the page to not waste much space, you will be getting close to
the 300 DPI range for pages of size 8.5 x 11 inches or less.
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3
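
The footnote arithmetic above can be checked with a short sketch. The function names are invented for illustration, and the calculation assumes the page fills the frame with no wasted border, which a real camera setup will not quite achieve.

```python
def max_page_inches(width_px: int, height_px: int, dpi: int = 300):
    """Largest page (in inches) that the sensor can cover at the given DPI."""
    return width_px / dpi, height_px / dpi


def effective_dpi(width_px: int, height_px: int,
                  page_w_in: float, page_h_in: float) -> float:
    """DPI achieved when the page just fills the frame.

    The long side of the sensor is matched to the long side of the page;
    the tighter of the two axes limits the result.
    """
    long_px, short_px = max(width_px, height_px), min(width_px, height_px)
    long_in, short_in = max(page_w_in, page_h_in), min(page_w_in, page_h_in)
    return min(long_px / long_in, short_px / short_in)


# 4256 x 2848 is the 12 MP sensor size quoted in the footnote.
print(max_page_inches(4256, 2848))
print(effective_dpi(4256, 2848, 8.5, 11))  # letter-size page
```

For a letter-size page the short side is the limit, giving roughly 335 DPI under these idealized assumptions.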


Re: [CODE4LIB] best OCR package?

2009-02-03 Thread Randy Stern
ABBYY FineReader and Nuance OmniPage are the two leading commercial OCR 
products. Both can achieve 98%+ character accuracy on most book-like 
material scanned at 300 dpi.


- Randy Stern (who formerly worked in the OCR industry)
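
To put a 98% character accuracy figure in perspective, here is a small back-of-the-envelope sketch. The page size of 2,000 characters, the average word length of 5, and the assumption that character errors are independent are all illustrative guesses, not figures from either vendor.

```python
def expected_char_errors(chars_on_page: int, char_accuracy: float) -> float:
    """Expected number of misrecognized characters on one page."""
    return chars_on_page * (1.0 - char_accuracy)


def approx_word_accuracy(char_accuracy: float, avg_word_len: int = 5) -> float:
    """Chance that every character in a word is right.

    Assumes character errors are independent, which is a simplification.
    """
    return char_accuracy ** avg_word_len


# Hypothetical page of 2,000 characters at 98% character accuracy.
print(f"{expected_char_errors(2000, 0.98):.0f} character errors per page")
print(f"{approx_word_accuracy(0.98):.1%} of words fully correct")
```

Under these assumptions a page still carries about 40 wrong characters, and roughly one word in ten contains an error, which matters if the text is meant for full-text search.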

At 07:37 AM 2/3/2009 -0500, Nicole Engard wrote:

I'm with Christian - I loved Abbyy FineReader when I used it at both
my previous libraries.  It's very accurate and it's affordable if
you're not using it for mass digitization :) but we never got the
server contract because like Christian said - it is quite expensive.

---

Nicole C. Engard
Open Source Evangelist, LibLime
(888) Koha ILS (564-2457) ext. 714
n...@liblime.com
AIM/Y!/Skype: nengard

http://liblime.com
http://blogs.liblime.com/open-sesame/



On Tue, Feb 3, 2009 at 6:23 AM, MJ Ray m...@phonecoop.coop wrote:
 Alberto Accomazzi aaccoma...@cfa.harvard.edu wrote:
 [...] I know about OCRopus but I have a feeling that
 commercial products still have a significant edge over public domain
 packages. [...]

 OCRopus is released under the Apache License 2.0, which allows
 commercial development.  It is not a public domain package.
 Feel free to use it as a commercial product without fear.

 Hope that helps,
 --
 MJ Ray (slef)
 Webmaster for hire, statistician and online shop builder for a small
 worker cooperative http://www.ttllp.co.uk/ http://mjr.towers.org.uk/
 (Notice http://mjr.towers.org.uk/email.html) tel:+44-844-4437-237



[CODE4LIB] Job Posting - DSpace developer

2008-05-14 Thread Randy Stern

As you may know, Harvard's faculty in the Faculty of Arts and Sciences has
recently voted to provide open access to scholarly articles created by
faculty. This job posting is in support of that goal. If interested, please
contact Randy Stern.

Digital Library Software Engineer
Harvard University Library
Office for Information Systems
Grade 57, One Year Term

Duties & Responsibilities:

Reporting to the Manager of Systems Development in the Office for
Information Systems, serves as the lead developer for software applications
and tools for the implementation of a new institutional repository within
the Harvard University Library. Responsibilities include the configuration,
customization, and on-going support of a DSpace instance at Harvard, as
well as the development, maintenance, and integration of institutional
repository software tools to create an extremely user friendly deposit and
repository management and reporting process. May also integrate
authentication processes and other DSpace modules with Harvard systems.

This position includes customizing the user interface of DSpace, utilizing
XML, XSL, and JSP technologies.

This position requires the ability to grasp a high level view of
requirements from discussions with stakeholders, recommend solutions, and
iteratively translate that into specifications, prototypes, and working
code with accompanying documentation.

Requirements: BA/BS in computer science with a minimum of 4 years
development experience in Java. Ability to produce results and work
independently with general guidance in an environment in which requirements
evolve over time. Strong interpersonal, verbal and written communication
skills. Experience designing, developing, deploying, and managing both
standalone and Internet applications utilizing Unix and World Wide Web
technologies. Experience with Java standalone and web applications, SQL,
JDBC, XML, XSL, HTML.

Desirable: Experience with IT and library systems in a higher education
environment; experience with Open Source software; familiarity with library
metadata standards such as Dublin Core, METS, MODS, and the OAI protocol;
knowledge of associated digital storage formats and conversion principles,
procedures, and operations; strong understanding of information
organization and retrieval technologies used to organize, store, and access
digital content; experience with programming best practices, including
test-driven development and design patterns. Hands-on experience with
DSpace, Perl, CVS, Eclipse, Struts, Tomcat, Cocoon, Maven, Ant a plus.





Randy Stern
Manager of Systems Development
Harvard University Library Office for Information Systems
90 Mount Auburn Street
Cambridge, MA 02138
Tel. +1 (617) 495-3724
Email [EMAIL PROTECTED]


Re: [CODE4LIB] Library data in outside systems?

2007-09-10 Thread Randy Stern

Edward,

Here are a few things we do at Harvard:

1. Viewing of course reserves reading lists in on-line course systems by
faculty and students:
See http://hul.harvard.edu/ois/systems/readinglist/index.html

2. Providing a (currently non-public) OAI data provider for ARTstor to
harvest metadata and images from our visual information access system, VIA.

3. Shortly, we'll be publishing a public OAI data provider URL for
harvesting data from Harvard Virtual Collections,
http://hul.harvard.edu/ois/systems/vc/index.html

4. Enabling various public interface systems for easy crawling by search
engine crawlers. To date, OASIS http://hul.harvard.edu/ois/systems/oasis/
and the Harvard Geospatial Library
http://hul.harvard.edu/ois/systems/hgl/index.html have been enabled.

5. and of course Harvard holdings are represented in OCLC Worldcat.
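
For readers unfamiliar with how a harvester talks to an OAI data provider like those in items 2 and 3, the following sketch builds OAI-PMH ListRecords request URLs. The endpoint is a placeholder (the actual provider URLs are not given here), and the helper name is invented for this example; only the verb and argument names come from the OAI-PMH protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the actual provider URL is not given above.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL.

    Per the protocol, resumptionToken is an exclusive argument: when it is
    present, no other argument besides the verb may be sent.
    """
    if resumption_token is not None:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, resumption_token="abc123"))
```

A harvester would fetch the first URL, then keep requesting with each resumptionToken the provider returns until none is given.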

-Randy


At 08:32 AM 9/10/2007 -0400, you wrote:

Hello All,

I am in the process of preparing a presentation about how libraries are
putting library data into other systems (for example, in a course
management system, or in a social networking site such as Facebook or
MySpace). I am most interested in how libraries are advertising library
holdings out of their ILS in other systems, but any library resource
would be useful. If your library is, or you know of another library that
is, including or otherwise advertising data from the library catalog or
other library data in systems outside of the library, would you kindly
let me know? I have come up with a number of different things with my
own research, but I am sure there is more of this type of thing going on
out there than I'm currently aware of.

Thank you,

Edward Corrado

--
Edward M. Corrado
http://www.tcnj.edu/~corrado/
Systems Librarian
The College of New Jersey
403E TCNJ Library
PO Box 7718 Ewing, NJ 08628-0718
Tel: 609.771.3337  Fax: 609.637.5177
Email: [EMAIL PROTECTED]


[CODE4LIB] Position available at Harvard University Library

2006-09-25 Thread Randy Stern

I apologize if this has already been posted:


The Harvard University Library is seeking a highly competent and
experienced software developer to play a leadership design and development
role in a team creating digital library systems, tools, and delivery
services, as well as the next generation of Harvard's Digital Repository
Service (DRS), a preservation and delivery repository currently managing
and serving over 17 terabytes of electronic images, audio, books, archival
finding aids, harvested web data, and geospatial data, projected to grow
to over 200TB over the next several years.

In this position you will play a lead role in designing and developing
Harvard's next generation digital repository, as well as have the
opportunity to mentor others and work on a variety of interesting projects
with a variety of interesting people!

The Harvard University Library is a unique and exciting place to work,
from our brand new green building situated in Harvard Square, to the range
of technologists from across the University with whom you will interact.
You will work with nationally respected digital library experts to define
requirements and brainstorm solutions. You will be part of a cohesive
software development team that is researching, designing, and deploying an
integrated suite of database-driven, web-based Java applications that
support our existing digital library infrastructure and re-invent it for
the future - a team that works closely with other teams responsible for
production operations, integrated library systems and e-resources, and
digital library projects.

For more information, including detailed position requirements, please see
job position 27813 at
http://jobs.harvard.edu/jobs/summ_req?in_post_id=31254 or contact me at
[EMAIL PROTECTED] or 617-495-3724.