Re: [CODE4LIB] What can be done to stop deleting of records belonging to users of our Minuteman Library Network in Massachusetts?

2013-10-15 Thread Harper, Cynthia
Perhaps if you exported data from lists that are likely to have items/bibs deleted after you have collected them, you could keep an archive of data. -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of don warner saklad Sent: Friday, October 11, 2013

Re: [CODE4LIB] What can be done to stop deleting of records belonging to users of our Minuteman Library Network in Massachusetts?

2013-10-15 Thread Graeme Williams
I have a background job that wakes up every night and screen scrapes my reading history and lists into a local database, and updates cached availability information -- so I don't have to worry about the problem that Don mentions. However, this is not a solution that scales to Minuteman's 600K+

Re: [CODE4LIB] pdf2txt

2013-10-15 Thread Eric Lease Morgan
On Oct 13, 2013, at 6:21 PM, David Friggens frigg...@waikato.ac.nz wrote: For a limited period of time I am making publicly available a Web-based program called PDF2TXT -- http://bit.ly/1bJRyh8 PDF2TXT extracts the text from an OCRed PDF document The file I tried was digital native

Re: [CODE4LIB] pdf2txt

2013-10-15 Thread Eric Lease Morgan
On Oct 14, 2013, at 1:48 AM, Penelope Campbell penelope.campb...@facs.nsw.gov.au wrote: For a limited period of time I am making publicly available a Web-based program called PDF2TXT -- http://bit.ly/1bJRyh8 As a small special library (solo librarian) in an Australian State Government

Re: [CODE4LIB] pdf2txt

2013-10-15 Thread Eric Lease Morgan
On Oct 14, 2013, at 7:56 AM, Nicolas Franck nicolas.fra...@ugent.be wrote: Could this also be done by Apache Tika? Or do I miss a crucial point? http://tika.apache.org/1.4/gettingstarted.html Nicolas, this looks VERY promising! It seemingly can extract the OCR from a PDF document as well

Re: [CODE4LIB] pdf2txt

2013-10-15 Thread Eric Lease Morgan
On Oct 14, 2013, at 4:49 PM, Robert Haschart rh...@virginia.edu wrote: For a limited period of time I am making publicly available a Web-based program called PDF2TXT -- http://bit.ly/1bJRyh8 Although based on some subsequent messages where you mention tesseract maybe I misunderstood and

Re: [CODE4LIB] ANNOUNCEMENT: Traject MARC-Solr indexer release

2013-10-15 Thread Tom Cramer
++ Jonathan and Bill. 1.) Do you have any thoughts on extending traject to index other types of data--say MODS--into solr, in the future? 2.) What's the etymology of 'traject'? - Tom On Oct 14, 2013, at 8:53 AM, Jonathan Rochkind wrote: Jonathan Rochkind (Johns Hopkins) and Bill Dueber

Re: [CODE4LIB] What can be done to stop deleting of records belonging to users of our Minuteman Library Network in Massachusetts?

2013-10-15 Thread McDonald, Stephen
Don Warner Saklad said: a) Forensics studies deal with how to retrieve deleted unarchived data. So called deleted data is actually available. Computer forensics cannot always get the data back. Television crime shows greatly exaggerate the capabilities of computer forensics. It depends on

Re: [CODE4LIB] What can be done to stop deleting of records belonging to users of our Minuteman Library Network in Massachusetts?

2013-10-15 Thread don warner saklad
Thank you Steve McDonald ! On Tue, Oct 15, 2013 at 12:32 PM, McDonald, Stephen steve.mcdon...@tufts.edu wrote: Don Warner Saklad said: a) Forensics studies deal with how to retrieve deleted unarchived data. So called deleted data is actually available. Computer forensics cannot always get

Re: [CODE4LIB] ANNOUNCEMENT: Traject MARC-Solr indexer release

2013-10-15 Thread Bill Dueber
'traject' means to transmit (e.g., trajectory) -- or at least it did, when people still used it, which they don't. The traject workflow is incredibly general: *a reader* sends *a record* to *an indexing routine* which stuffs...stuff...into a context object which is then sent to *a writer*. We
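
The reader, indexing routine, context object, and writer flow described above can be sketched roughly as follows. This is a minimal illustration in Python, not traject's actual Ruby API: the names `Context`, `run_pipeline`, and `index_title` are hypothetical, and the dictionary record stands in for a MARC record.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical per-record context: the source record plus accumulated output."""
    source_record: dict
    output: dict = field(default_factory=dict)


def run_pipeline(reader, indexing_steps, writer):
    """Send each record from the reader through every indexing step, then to the writer."""
    for record in reader:
        context = Context(source_record=record)
        for step in indexing_steps:
            # Each step stuffs values into the context's output.
            step(record, context)
        writer(context)


def index_title(record, context):
    """Illustrative indexing step: copy a trimmed title into the output."""
    context.output["title"] = [record.get("title", "").strip()]


# Usage: a list acts as the reader, and list.append acts as the writer.
collected = []
run_pipeline(
    reader=[{"title": " Example record "}],
    indexing_steps=[index_title],
    writer=collected.append,
)
```

Because the reader is any iterable and the writer is any callable, either end could in principle be swapped for a different input or output format without touching the indexing steps.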

Re: [CODE4LIB] ANNOUNCEMENT: Traject MARC-Solr indexer release

2013-10-15 Thread Jonathan Rochkind
Yep, what Bill said, I have had thoughts of extending it to other types of input too, it was part of my original design goals. In particular, I was thinking of extending it to arbitrary XML. Unlike MARC, there are many other options for indexing XML into Solr (assuming that's your end goal),

Re: [CODE4LIB] What can be done to stop deleting of records belonging to users of our Minuteman Library Network in Massachusetts?

2013-10-15 Thread don warner saklad
For the My Lists feature, what steps are actually involved in retrieving an altered/deleted listing like [_] Record b2491348 is not available 03-12-2013 by that bibliographic reference code from the 7 month system backup? Perhaps the backup is compressed and searching a compressed file is a barrier

[CODE4LIB] REMINDER: Code4Lib Journal Call for Editors closes Friday October 18

2013-10-15 Thread Shawn Averkamp
The Code4Lib Journal (http://journal.code4lib.org/) is looking for volunteers to join its editorial committee. Editorial committee members work collaboratively to produce the quarterly Code4Lib Journal. Editors are expected to: * Read, discuss, and vote on incoming proposals. * Volunteer to be

Re: [CODE4LIB] What can be done to stop deleting of records belonging to users of our Minuteman Library Network in Massachusetts?

2013-10-15 Thread Kyle Banerjee
Searching compressed files is no big deal. First of all, you can always decompress. But if they've just been compressed and not put in a tarball or some other archive format, you can just use zgrep. However, many if not most files are in structures that don't lend themselves to just scanning for
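
Where zgrep is not available, the same idea (scanning a gzip-compressed text file without unpacking it to disk first) can be sketched with Python's standard gzip module. The function name `grep_gzip` is a hypothetical helper, and the sketch assumes a plain gzip-compressed text file, which, as noted above, is not how many backups are actually structured.

```python
import gzip


def grep_gzip(path, needle):
    """Yield (line_number, line) for each line of a gzip text file containing needle."""
    # "rt" decompresses on the fly and decodes to text; undecodable bytes are replaced.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                yield number, line.rstrip("\n")
```

A call such as `list(grep_gzip("backup.gz", "b2491348"))` would return the matching lines with their line numbers; the file name here is only a placeholder.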

[CODE4LIB] local APIs atop III's Sierra DB

2013-10-15 Thread Thomale, Jason
Hello Code4lib, I'm wondering if any III Sierra users out there have worked on building an API for accessing their ILS data on top of Sierra's Postgres database. Right now I'm looking into possibly building something to serve local needs and use cases, as we're not terribly confident that

Re: [CODE4LIB] local APIs atop III's Sierra DB

2013-10-15 Thread Becky Yoose
Hi Jason, We haven't planned to write our own APIs for Sierra at this point (we're still working on getting Sierra to work in the first place), but Grinnell would be interested in seeing how the process goes for you in terms of local API building. As for the Sierra APIs - III just hired a new

Re: [CODE4LIB] local APIs atop III's Sierra DB

2013-10-15 Thread Francis Kayiwa
On Tue, Oct 15, 2013 at 07:29:01PM +, Thomale, Jason wrote: Hello Code4lib, I'm wondering if any III Sierra users out there have worked on building an API for accessing their ILS data on top of Sierra's Postgres database. Right now I'm looking into possibly building something to serve

[CODE4LIB] Reminder: Makerspaces in libraries survey

2013-10-15 Thread Burke, John
(Please pardon repeated posts). My thanks to everyone who has responded to the survey so far. I am still eager to hear about libraries of all types that have implemented or are planning to implement makerspaces or making activities. Please respond to the survey linked below before October 22.

Re: [CODE4LIB] local APIs atop III's Sierra DB

2013-10-15 Thread Van Mil, James (vanmiljf)
Hi Jason, I've started looking into using ActiveRecord in Rails to plug into the Sierra Postgres tables. I'm still learning how to work with Ruby and Rails, but initial experiments are working: https://github.com/jamesvanmil/ActiveSierra (really have just written a few simple models with

Re: [CODE4LIB] pdf2txt

2013-10-15 Thread Arash.Joorabchi
Eric, You might want to consider using http://www.documentcloud.org to host your users' documents. That would also take care of privacy/authentication concerns. I know of a project in journalism domain (http://overview.ap.org/) which does that. As far as I remember they do provide an API interface

Re: [CODE4LIB] pdf2txt

2013-10-15 Thread Al Matthews
+1 https://www.documentcloud.org/opensource -- Al Matthews Software Developer, Digital Services Unit Atlanta University Center, Robert W. Woodruff Library email: amatth...@auctr.edu; office: 1 404 978 2057 On 10/15/13 4:23 PM, Arash.Joorabchi arash.joorab...@ul.ie wrote: Eric, You might

Re: [CODE4LIB] local APIs atop III's Sierra DB

2013-10-15 Thread Julia Bauder
Jason, To expand on Becky's answer a bit: we haven't written our own APIs yet, but I did write a Sierra driver for VuFind, so I do have some notes that might be useful to you that I'm happy to share. At least, I've learned the hard way some things that you don't want to do when you're querying

Re: [CODE4LIB] ANNOUNCEMENT: Traject MARC-Solr indexer release

2013-10-15 Thread Tom Cramer
Jonathan, Bill, Very interesting--thanks for the replies. While I'm not sure I understand what indexing arbitrary XML into solr might look like, this does prompt me to think it would be interesting to look at Trajecting up some EAD (may I use it as a verb?) into solr, for finding aid

[CODE4LIB] Job: Digital Projects Librarian at Kent State University

2013-10-15 Thread jobs
Kent State University Libraries seeks an experienced and creative Digital Projects Librarian (DPL) who will be responsible for the research, planning, execution and management of digital projects throughout the University Libraries' environment. The DPL will work with programmers and applications

[CODE4LIB] Job: Systems Engineer (2 positions) at Virginia Polytechnic Institute and State University

2013-10-15 Thread jobs
Virginia Tech's Newman Library and the Center for Digital Research and Scholarship (CDRS) are seeking qualified candidates for two Systems Engineers for data initiatives. Incumbents will develop systems that: 1) enable data integration across distributed and heterogeneous local and external data