Re: [CODE4LIB] change management system
Harvard University Library does separate OPS and DEV. Only OPS have write permissions on production boxes, and we have a change control procedure that is implemented through a series of scripts.

In addition to Development and QA instances of applications, we have implemented a Staging server for releases. The staging server is configured identically to the production servers, with access to production databases. By release I mean deployment of a software update that has already been QA'ed in a QA environment writable by developers (QA'ed by library project staff who play that role; we do not have a formal QA group).

Developers run a stage script to check code out of source control and build on the staging server. After a stage, limited testing is done, usually by the developer, on the staging server to confirm that the QA'ed software seems to operate properly with production database and file system mounts. Once that is done, the developer runs a publish script to let the operations staff know that the release is ready for deployment. Operations runs a move2prod script to deploy the software, typically to multiple production servers. They have a rollback script available should something go wrong in the deployment.

For tracking of this process, and for software bug tracking, we use good ol' Bugzilla. Before a publish, a bug is entered in an Operations instance of Bugzilla, for the change control product. All steps in the release are tracked as updates to the bug. A little bit of a distortion of what Bugzilla was designed for, but it's working well for us...

- Randy

At 08:55 AM 2/11/2010 -0800, Walker, David wrote:

Thanks to everyone who responded. The comments have been very helpful! Is anyone using RT? [1]

Also, I'm curious how many academic libraries are following a formal change management process. By that, I mean: Do you maintain a strict separation between developers and operations staff (the people who put the changes into production)? And do you have something like a Change Advisory Board that reviews changes before they can be put into production?

Just as background to these questions: We've been asked to come up with a change management procedure/system for a variety of academic technology groups here that have not previously had such (at least nothing formal). But we find the process that the business (i.e., PeopleSoft) folks here follow to be a bit too elaborate for our purposes. They use Remedy.

--Dave

[1] http://bestpractical.com/rt

== David Walker Library Web Services Manager California State University http://xerxes.calstate.edu

From: Code for Libraries [code4...@listserv.nd.edu] On Behalf Of Mark A. Matienzo [m...@matienzo.org] Sent: Thursday, February 11, 2010 5:47 AM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] change management system

I'm inclined to say that any sort of tracking software could be used for this - it's mostly an issue of creating and sticking with policy decisions about what the various workflow states are, how things become triaged, etc. I believe if you define that up front, you could find Trac or any other tracking/issue system adaptable to what you want to do.

Mark A. Matienzo Digital Archivist, Manuscripts and Archives Yale University Library
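The release sequence Randy describes (stage, publish, move2prod, with rollback available to operations) amounts to a small state machine whose steps are logged against one ticket, which is also the kind of up-front workflow definition Mark recommends. A minimal Python sketch of that idea follows. The four action names come from the message; the state names, the `Release` class, and the `history` list are hypothetical illustrations, not Harvard's actual scripts or any Bugzilla API.

```python
# Hypothetical sketch of the stage -> publish -> move2prod workflow.
# State names and the Release class are assumptions for illustration only.

# action -> (state required before the action, state after the action)
TRANSITIONS = {
    "stage": ("qa_passed", "staged"),                 # developer builds on staging
    "publish": ("staged", "ready_for_ops"),           # developer hands off to operations
    "move2prod": ("ready_for_ops", "in_production"),  # operations deploys
    "rollback": ("in_production", "rolled_back"),     # operations reverts
}

# Who may run each action, mirroring the DEV/OPS separation in the message.
ALLOWED_ROLE = {
    "stage": "dev",
    "publish": "dev",
    "move2prod": "ops",
    "rollback": "ops",
}


class Release:
    def __init__(self, name):
        self.name = name
        self.state = "qa_passed"
        self.history = []  # stands in for updates on the change-control bug

    def run(self, action, role):
        required, result = TRANSITIONS[action]
        if ALLOWED_ROLE[action] != role:
            raise PermissionError(f"{role} may not run {action}")
        if self.state != required:
            raise ValueError(f"cannot {action} from state {self.state}")
        self.state = result
        self.history.append((action, role, result))
        return self.state


if __name__ == "__main__":
    release = Release("example-app 1.2")
    release.run("stage", "dev")
    release.run("publish", "dev")
    release.run("move2prod", "ops")
    for step in release.history:
        print(step)
```

Keeping the transitions in a table means a new state (for example a review step) can be added without touching the enforcement logic, whichever tracker ends up holding the history.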
[CODE4LIB] DASH Open Access repository developer position
The Office for Scholarly Communication of the Harvard University Library (http://osc.hul.harvard.edu/osc.php) has a job opening for a Digital Library Software Engineer to work primarily on DSpace development of our DASH open access repository (http://dash.harvard.edu/). The DASH repository implements Harvard's Open Access policies for making the scholarly output of Harvard faculty and staff available to the world. If interested in this position, please see the job posting for details: http://jobs.harvard.edu/jobs/summ_req?in_post_id=43042
Re: [CODE4LIB] OCR for handwritten pages
Parascript (http://www.parascript.com/) has handwriting recognition software, but it only works reliably for things like forms, checks, and addresses where there is a lot of dictionary-like context to verify the image recognition. Generalized free-text handwriting recognition is an unsolved problem.

At 01:50 PM 1/13/2010 -0700, Han, Yan wrote:

Hello, Colleagues, Does anyone know of or use any OCR software that works on handwritten pages, or at least think it is better than hiring a student to key the text in? I know of OCR software such as ABBYY, but it does not work on handwriting. Thanks, Yan
Re: [CODE4LIB] Recommend book scanner?
Printed test sheets: http://www.diytrade.com/china/4/products/1707979/IEEE_Resolution_Chart.html?r=0 or http://www.aig-imaging.com/mm5/merchant.mvc?Screen=PROD&Store_Code=AIIPI&Product_Code=QA-60&Category_Code=Video-Scanner-Resolution-Charts

At 04:54 PM 5/2/2009 -0700, st...@archive.org wrote:

On 5/1/09 8:27 PM, Lars Aronsson wrote: Does anybody have a printed test sheet that we can scan or photo, and then compare the resulting digital images? It should have lines at various densities and areas of different colours, just like an old TV test image. Can you buy such calibration sheets?

archive.org scans typically include a color card target image near the back (or front) of the book, e.g. http://www-steve.us.archive.org/public/data/eg/birdsthateverych00doub/birdsthateverych00doub_jp2/birdsthateverych00doub_0371.jp2

typical specs for our scanning rig (scribe) are roughly:

1 8x8x5' scribe structure
2 Canon EOS 5Ds
2 light boxes
1 orthogonal glass platen and cradle
1 foot pedal, pulley system
Linux PC
LAMP stack
custom web-based UI
gphoto, imagemagick, leptonica, rsync
fast internet

we scan over 1,000 books a day with about 100 scribes like this.

/st...@archive.org
Re: [CODE4LIB] Recommend book scanner?
My understanding is that a flatbed or sheetfed document scanner that produces 300 dpi will produce much better OCR results than a cheap digital camera that produces 300 dpi. The reasons have to do with the resolution and distortion of the resulting image. Resolution here means the number of line pairs per mm that can be resolved (for example when scanning a test chart) - in other words, the detail that will show up in character images. Distortion is image aberration that can appear at the edges of the page image areas, particularly when illumination is not even. A scanner has much more even illumination.

At 11:21 AM 5/1/2009 -0700, Erik Hetzner wrote:

At Fri, 1 May 2009 09:51:19 -0500, Amanda P wrote:

On the other hand, there are projects like bkrpr [2] and [3], home-brew scanning stations built for marginally more than the cost of a pair of $100 cameras.

Cameras around $100 are very low quality. You could get nowhere near the dpi recommended for materials that need to be OCRed. The quality of images from cameras would be not only low, but the OCR (even with the best software) would probably have many errors. For someone scanning items at home this might be ok, but for archival quality, I would not recommend cameras. If you are grant funded and the grant provider requires a certain level of quality, you need to make sure the scanning mechanism you use can scan at that quality.

I know very little about digital cameras, so I hope I get this right. According to Wikipedia, Google uses (or used) an 11MP camera (Elphel 323). You can get a 12MP camera for about $200. With a 12MP camera you should easily be able to get 300 DPI images of book pages and letter size archival documents. For a $100 camera you can get more or less 300 DPI images of book pages. *

The problems I have always seen with OCR had much more to do with alignment and artifacts than with DPI. 300 DPI is fine for OCR as far as my (limited) experience goes - as long as you have quality images.

If your intention is to scan items for preservation, then, yes, you want higher quality - but I can't imagine any setup for archival quality costing anywhere near $1000. If you just want to make full-text OCR of scans available, these setups seem worth looking at - especially if the software workflow can be improved.

best, Erik

* 12 MP seems to equal 4256 x 2848 pixels. To take a "scan" (photo) of a page at 300 DPI, that page would need to be 14.18 x 9.49 inches (dividing pixels / 300). As long as you can get the camera close enough to the image to not waste much space, you will be getting close to the 300 DPI range for images of size 8.5 x 11 or less.

;; Erik Hetzner, California Digital Library ;; gnupg key id: 1024D/01DB07E3
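Erik's footnote arithmetic (pixels divided by target DPI gives the largest page that can be captured at that DPI) can be checked with a few lines of Python. This is only a sketch: the 4256 x 2848 sensor size is the figure quoted in the message, while the function name and the `fill` parameter are illustrative assumptions, and the calculation says nothing about lens sharpness, distortion, or lighting, which are the points Randy raises.

```python
# Effective DPI when a camera frame covers a page.
# Sensor dimensions are the 4256 x 2848 figure quoted in the message;
# the function name and the "fill" parameter are illustrative assumptions.

def effective_dpi(sensor_px, page_in, fill=1.0):
    """Return the DPI achieved when the page occupies `fill` of the frame.

    sensor_px: (long side, short side) in pixels
    page_in:   (long side, short side) in inches
    fill:      fraction of each frame dimension the page may occupy
    """
    long_dpi = sensor_px[0] * fill / page_in[0]
    short_dpi = sensor_px[1] * fill / page_in[1]
    # The tighter dimension limits the usable resolution.
    return min(long_dpi, short_dpi)


SENSOR = (4256, 2848)

# Largest page that still gets 300 DPI: pixels / 300, in inches.
max_page = (SENSOR[0] / 300, SENSOR[1] / 300)
print(max_page)

# Letter-size page (11 x 8.5 in) framed as tightly as its shape allows.
print(round(effective_dpi(SENSOR, (11.0, 8.5))))

# Same page when it only fills 80% of the frame in each direction.
print(round(effective_dpi(SENSOR, (11.0, 8.5), fill=0.8)))
```

Under these assumptions a tightly framed letter-size page comes out a little above 300 DPI, and leaving a margin around the page drops it below 300, which matches the "close to 300 DPI" estimate in the footnote.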
Re: [CODE4LIB] best OCR package?
Abbyy Finereader and Nuance Omnipage are the two leading commercial OCR products. Both can achieve 98%+ character accuracy on most book-like material scanned at 300 dpi.

- Randy Stern (who formerly worked in the OCR industry)

At 07:37 AM 2/3/2009 -0500, Nicole Engard wrote:

I'm with Christian - I loved Abbyy FineReader when I used it at both my previous libraries. It's very accurate and it's affordable if you're not using it for mass digitization :) but we never got the server contract because like Christian said - it is quite expensive.

--- Nicole C. Engard Open Source Evangelist, LibLime (888) Koha ILS (564-2457) ext. 714 n...@liblime.com AIM/Y!/Skype: nengard http://liblime.com http://blogs.liblime.com/open-sesame/

On Tue, Feb 3, 2009 at 6:23 AM, MJ Ray m...@phonecoop.coop wrote:

Alberto Accomazzi aaccoma...@cfa.harvard.edu wrote: [...] I know about OCRopus but I have a feeling that commercial products still have a significant edge over public domain packages. [...]

OCRopus is released under the Apache License 2.0, which allows commercial development. It is not a public domain package. Feel free to use it as a commercial product without fear. Hope that helps,

-- MJ Ray (slef) Webmaster for hire, statistician and online shop builder for a small worker cooperative http://www.ttllp.co.uk/ http://mjr.towers.org.uk/ (Notice http://mjr.towers.org.uk/email.html) tel:+44-844-4437-237
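To put the 98% character accuracy figure in perspective, it helps to convert it into expected errors per page. The short Python sketch below does that; the 2,000 characters per page is an assumed round number for illustration, not a figure from the thread, and the function name is hypothetical.

```python
# Convert a character accuracy rate into expected errors per page.
# The 2000 characters-per-page figure is an assumption for illustration.

def expected_errors(chars_per_page, accuracy):
    """Expected number of misrecognized characters on one page."""
    return chars_per_page * (1.0 - accuracy)


for accuracy in (0.98, 0.99, 0.999):
    errors = expected_errors(2000, accuracy)
    print(f"{accuracy:.1%} accuracy: about {round(errors)} errors per page")
```

Even at 98% accuracy, a page of that assumed length would carry dozens of character errors, which is why the "+" in "98%+" matters for full-text search quality.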
[CODE4LIB] Job Posting - DSpace developer
As you may know, Harvard's faculty in the Faculty of Arts and Sciences has recently voted to provide open access to scholarly articles created by faculty. This job posting is in support of that goal. If interested, please contact Randy Stern.

Digital Library Software Engineer
Harvard University Library Office for Information Systems
Grade 57, One Year Term

Duties & Responsibilities: Reporting to the Manager of Systems Development in the Office for Information Systems, serves as the lead developer for software applications and tools for the implementation of a new institutional repository within the Harvard University Library. Responsibilities include the configuration, customization, and on-going support of a DSpace instance at Harvard, as well as the development, maintenance, and integration of institutional repository software tools to create an extremely user-friendly deposit and repository management and reporting process. May also integrate authentication processes and other DSpace modules with Harvard systems. This position includes customizing the user interface of DSpace, utilizing XML, XSL, and JSP technologies. This position requires the ability to grasp a high-level view of requirements from discussions with stakeholders, recommend solutions, and iteratively translate that into specifications, prototypes, and working code with accompanying documentation.

Requirements: BA/BS in computer science with a minimum of 4 years development experience in Java. Ability to produce results and work independently with general guidance in an environment in which requirements evolve over time. Strong interpersonal, verbal and written communication skills. Experience designing, developing, deploying, and managing both stand-alone and Internet applications utilizing Unix and World Wide Web technologies. Experience with Java stand-alone and web applications, SQL, JDBC, XML, XSL, HTML.

Desirable: Experience with IT and library systems in a higher education environment; experience with Open Source software; familiarity with library metadata standards such as Dublin Core, METS, MODS, and the OAI protocol; knowledge of associated digital storage formats and conversion principles, procedures, and operations; strong understanding of information organization and retrieval technologies used to organize, store, and access digital content; experience with programming best practices, including test-driven development and design patterns. Hands-on experience with DSpace, Perl, CVS, Eclipse, Struts, Tomcat, Cocoon, Maven, Ant a plus.

Randy Stern Manager of Systems Development Harvard University Library Office for Information Systems 90 Mount Auburn Street Cambridge, MA 02138 Tel. +1 (617) 495-3724 Email [EMAIL PROTECTED]
Re: [CODE4LIB] Library data in outside systems?
Edward, Here are a few things we do at Harvard:

1. Viewing of course reserves reading lists in on-line course systems by faculty and students: See http://hul.harvard.edu/ois/systems/readinglist/index.html
2. Providing a (currently non-public) OAI data provider for ARTstor to harvest metadata and images from our visual information access system, VIA.
3. Shortly, we'll be publishing a public OAI data provider URL for harvesting data from Harvard Virtual Collections, http://hul.harvard.edu/ois/systems/vc/index.html
4. Enabling various public interface systems for easy crawling by search engine crawlers. To date, OASIS http://hul.harvard.edu/ois/systems/oasis/ and the Harvard Geospatial Library http://hul.harvard.edu/ois/systems/hgl/index.html have been enabled.
5. And of course Harvard holdings are represented in OCLC WorldCat.

-Randy

At 08:32 AM 9/10/2007 -0400, you wrote:

Hello All, I am in the process of preparing a presentation about how libraries are putting library data into other systems (for example, in a course management system, or in a social networking site such as Facebook or MySpace). I am most interested in how libraries are advertising library holdings out of their ILS in other systems, but any library resource would be useful. If your library is, or you know of another library that is, including or otherwise advertising data from the library catalog or other library data in systems outside of the library, would you kindly let me know? I have come up with a number of different things with my own research, but I am sure there is more of this type of thing going on out there than I'm currently aware of.

Thank you, Edward Corrado

-- Edward M. Corrado http://www.tcnj.edu/~corrado/ Systems Librarian The College of New Jersey 403E TCNJ Library PO Box 7718 Ewing, NJ 08628-0718 Tel: 609.771.3337 Fax: 609.637.5177 Email: [EMAIL PROTECTED]
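For readers unfamiliar with the OAI data providers mentioned in items 2 and 3 of Randy's list, a harvester talks to them with plain HTTP requests that carry a `verb` parameter, as defined by OAI-PMH. The Python sketch below only builds such request URLs. The base URL is a made-up placeholder, not Harvard's endpoint (which the message does not give), and `oai_request` is a hypothetical helper name.

```python
# Build OAI-PMH request URLs for a harvester.
# BASE is a placeholder; the real provider URL is not given in the message.
from urllib.parse import urlencode


def oai_request(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


BASE = "https://repository.example.org/oai"  # hypothetical endpoint

# Ask the provider to describe itself.
print(oai_request(BASE, "Identify"))

# Harvest records as unqualified Dublin Core, the format every
# OAI-PMH provider is required to support.
print(oai_request(BASE, "ListRecords", metadataPrefix="oai_dc"))
```

A real harvester would also fetch each URL, parse the XML response, and follow any `resumptionToken` the provider returns for large result sets; those steps are left out of this sketch.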
[CODE4LIB] Position available at Harvard University Library
I apologize if this has already been posted:

The Harvard University Library is seeking a highly competent and experienced software developer to play a leadership design and development role in a team creating digital library systems, tools, and delivery services, as well as the next generation of Harvard's Digital Repository Service (DRS), a preservation and delivery repository currently managing and serving over 17 terabytes of electronic images, audio, books, archival finding aids, harvested web data, and geospatial data, projected to grow to over 200TB over the next several years. In this position you will play a lead role in designing and developing Harvard's next generation digital repository, as well as have the opportunity to mentor others and work on a variety of interesting projects with a variety of interesting people!

The Harvard University Library is a unique and exciting place to work, from our brand new green building situated in Harvard Square, to the range of technologists from across the University with whom you will interact. You will work with nationally respected digital library experts to define requirements and brainstorm solutions. You will be part of a cohesive software development team that is researching, designing, and deploying an integrated suite of database-driven, web-based Java applications that support our existing digital library infrastructure and re-invent it for the future - a team that works closely with other teams responsible for production operations, integrated library systems and e-resources, and digital library projects.

For more information, including detailed position requirements, please see job position 27813 at http://jobs.harvard.edu/jobs/summ_req?in_post_id=31254 or contact me at [EMAIL PROTECTED] or 617-495-3724.