Re: [CODE4LIB] Vote for NE code4lib meetup location

2008-10-20 Thread Klein, Michael
Looking into the space/time issue this week, folks. I promise. -- Michael B. Klein Digital Initiatives Technology Librarian Boston Public Library (617) 859-2391 [EMAIL PROTECTED] From: Jay Luker [EMAIL PROTECTED] Reply-To: Code for Libraries CODE4LIB@LISTSERV.ND.EDU

Re: [CODE4LIB] OCR PDFs

2008-10-20 Thread Michael Beccaria
It's not exactly what you're looking for, but Microsoft Office comes with a scriptable OCR engine that works on TIFFs. I use it to get text from yearbooks we are scanning so people can look for names and such. While I wouldn't put it on par with ABBYY, it does a pretty decent job. I wrote a simple

[CODE4LIB] marc4j 2.4 released

2008-10-20 Thread Bess Sadler
Dear Code4Libbers, I'm very pleased to announce that for the first time in almost two years there has been a new release of marc4j. Release 2.4 is a minor release in the sense that it shouldn't break any existing code, but it's a major release in the sense that it represents an influx of

[CODE4LIB] Call for Book Reviewers and Announcing a New Reviews Editor: Journal of Web Librarianship

2008-10-20 Thread Jody Condit Fagan
Please excuse cross postings! The Journal of Web Librarianship is pleased to announce Lisa Ennis and Nicole Mitchell as new co-editors of the reviews section beginning with volume 3, issue 1. Lisa is the Systems Librarian at UAB’s Lister Hill Library of the Health Sciences. She received her

Re: [CODE4LIB] 2009 Conference Registration Rates?

2008-10-20 Thread jean rainwater
We're still working to line up sponsors but we hope to be able to keep the registration fee the same as last year - $125. Room rate at the conference hotel is $135 plus tax (free internet in guest rooms). Jean Rainwater Brown University Library Providence, RI 02912 On Mon, Oct 20, 2008 at

[CODE4LIB] Mashed Library UK 2008 - registration is open

2008-10-20 Thread Stephens, Owen
I posted a little while ago that I was organising a 'Mashed Libraries' event. Well, registration for the event is now open at http://www.ukoln.ac.uk/events/mashed-library-2008/ There is no charge for the day, thanks to my employer (Imperial College London), sponsorship from UKOLN

Re: [CODE4LIB] marc4j 2.4 released

2008-10-20 Thread Michael Beccaria
Very cool! I noticed that a feature, MarcDirStreamReader, is capable of iterating over all marc record files in a given directory. Does anyone know of any de-duplicating efforts done with marc4j? For example, libraries that have similar holdings would have their records merged into one record with

[CODE4LIB] JOB ADVERTISEMENT- Web Applications Developer, VCU Libraries

2008-10-20 Thread Jimmy Ghaphery
Web Applications Developer. Virginia Commonwealth University Libraries seeks faculty candidates for advancing the state of the art in the library’s Web environment, making it a rich, functional, and highly engaging experience for the VCU community of users. Position reports to the Web

Re: [CODE4LIB] marc4j 2.4 released

2008-10-20 Thread Bess Sadler
Hi, Mike. I don't know of any off-the-shelf software that does de-duplication of the kind you're describing, but it would be pretty useful. That would be awesome if someone wanted to build something like that into marc4j. Has anyone published any good algorithms for de-duping? As I

Re: [CODE4LIB] marc4j 2.4 released

2008-10-20 Thread Jonathan Rochkind
To me, de-duplication means throwing out some records as duplicates. Are we talking about that, or are we talking about what I call work set grouping and others (erroneously in my opinion) call FRBRization? If the latter, I don't think there is any mature open source software that addresses
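
The distinction drawn here can be made concrete with a rough sketch. Everything in it is an assumption for illustration: records are plain dicts rather than MARC records, and `match_key` is a hypothetical, very crude author/title key, not a real work-identification algorithm. De-duplication keeps one record per key and discards the rest; work set grouping keeps every record and only clusters them.

```python
def match_key(rec):
    """Hypothetical match key: lowercased author + title, punctuation collapsed."""
    text = f"{rec['author']} {rec['title']}".lower()
    # Replace anything that is not a letter or digit with a space, then squeeze spaces.
    return " ".join("".join(ch if ch.isalnum() else " " for ch in text).split())


def deduplicate(records):
    """Keep the first record seen for each key; later matches are thrown out."""
    kept = {}
    for rec in records:
        kept.setdefault(match_key(rec), rec)
    return list(kept.values())


def group_work_sets(records):
    """Keep every record, clustered under its key; nothing is discarded."""
    groups = {}
    for rec in records:
        groups.setdefault(match_key(rec), []).append(rec)
    return groups


records = [
    {"id": 1, "author": "Melville, Herman", "title": "Moby Dick"},
    {"id": 2, "author": "Melville, Herman", "title": "Moby-Dick"},
    {"id": 3, "author": "Austen, Jane", "title": "Emma"},
]
unique = deduplicate(records)
work_sets = group_work_sets(records)
```

With the sample data, `deduplicate` returns two records while `group_work_sets` returns two groups that together still hold all three.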

Re: [CODE4LIB] marc4j 2.4 released

2008-10-20 Thread Kyle Banerjee
Terry Reese wrote a program called RobertCompare a few years back http://oregonstate.edu/~reeset/marcedit/html/robertcompare.html that could compare MARC records and tell you about differences. Perhaps that would be useful. kyle On Mon, Oct 20, 2008 at 11:55 AM, Bess Sadler [EMAIL PROTECTED]

[CODE4LIB] de-dupping (was: marc4j 2.4 released)

2008-10-20 Thread Naomi Dushay
I've wondered if standard number matching (ISBN, LCCN, OCLC, ISSN ...) would be a big piece. Isn't there such a service from OCLC, and another flavor of something-or-other from LibraryThing? - Naomi On Oct 20, 2008, at 12:21 PM, Jonathan Rochkind wrote: To me, de-duplication means
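
A minimal sketch of what standard number matching could look like, using ISBNs only. The record layout (dicts with `id` and `isbns`) and the function names are assumptions made for this example, not part of marc4j or of any OCLC or LibraryThing service. ISBN-10 values are converted to ISBN-13 so both forms of the same number land on one key.

```python
from collections import defaultdict
from typing import Optional


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN to its 13-digit form (978 prefix)."""
    core = "978" + isbn10[:9]
    # ISBN-13 check digit: weights alternate 1, 3 across the first 12 digits.
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Drop hyphens and trailing qualifiers such as '(pbk.)'; return ISBN-13 or None."""
    parts = raw.split()
    token = parts[0] if parts else ""
    cleaned = "".join(ch for ch in token if ch.isdigit() or ch in "xX").upper()
    if len(cleaned) == 10 and cleaned[:9].isdigit():
        return isbn10_to_isbn13(cleaned)
    if len(cleaned) == 13 and cleaned.isdigit():
        return cleaned
    return None  # not recognizable as an ISBN; check digits are not validated here


def group_by_isbn(records):
    """Map each normalized ISBN to the ids of records carrying it (2+ records only)."""
    index = defaultdict(list)
    for rec in records:
        keys = {normalize_isbn(value) for value in rec.get("isbns", [])}
        for key in keys - {None}:
            index[key].append(rec["id"])
    return {key: ids for key, ids in index.items() if len(ids) > 1}


records = [
    {"id": "a", "isbns": ["0-306-40615-2 (pbk.)"]},
    {"id": "b", "isbns": ["9780306406157"]},
    {"id": "c", "isbns": ["0-19-852663-6"]},
]
matches = group_by_isbn(records)
```

This only reports candidate matches. A shared ISBN does not prove two records describe the same item, and records carrying several ISBNs can appear in more than one group, so the output is better treated as input to a review or a second comparison step.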

Re: [CODE4LIB] de-dupping (was: marc4j 2.4 released)

2008-10-20 Thread Min-Yen Kan
Hi all: My student, Yee Fan Tan, and I published a short technical column on record linkage tasks (very similar to the de-dup task discussed here) in February in the Communications of the ACM. Min-Yen Kan and Yee Fan Tan (2008) Record matching in digital library metadata. In Communications of
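
For readers new to record linkage, here is a small sketch of approximate title matching. It is not the method from the column mentioned above; it simply uses the standard library's `difflib.SequenceMatcher`, and the `0.9` threshold and the function names are arbitrary assumptions for illustration.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return " ".join("".join(ch if ch.isalnum() else " " for ch in text.lower()).split())


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_pairs(titles, threshold=0.9):
    """Compare every pair of titles; return (i, j, score) for pairs at or above threshold."""
    pairs = []
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            score = similarity(titles[i], titles[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


titles = [
    "Record matching in digital library metadata",
    "Record Matching in Digital Library Metadata.",
    "Introduction to cataloging",
]
pairs = candidate_pairs(titles)
```

The all-pairs loop is quadratic, so for a real catalog one would first narrow the comparisons, for example by blocking on a shared standard number or a short title prefix.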

[CODE4LIB] XML Workshop

2008-10-20 Thread Patrick Yott
This is being shamelessly cross-posted - all apologies for full mailboxes! WEB DEVELOPMENT WITH XML: DESIGN AND APPLICATIONS, JAN. 5-9, 2009, CHAPEL HILL, NC Washington DC - The Association of Research Libraries (ARL) is pleased to offer once again an in-depth workshop focused on Web development