[CODE4LIB] Job: Head, Information Commons Services at University at Albany, SUNY

2015-06-18 Thread jobs
Head, Information Commons Services University at Albany, SUNY Albany Established in 1844 and designated a University Center of the State University of New York in 1962, the University at Albany's broad mission of excellence in undergraduate and graduate education, research and public service engag

Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Owen Stephens
It may depend on the format of the PDF, but I’ve used the Scraperwiki Python Module ‘pdf2xml’ function to extract text data from PDFs in the past. There is a write up (not by me) at http://schoolofdata.org/2013/08/16/scraping-pdfs-with-python-and-the-scraperwiki-module/

Re: [CODE4LIB] Combining RSS feeds

2015-06-18 Thread Kyle Breneman
You're reading me correctly, Nitin. Thanks everybody for your thoughtful contributions! On Thu, Jun 18, 2015 at 10:53 AM, nitin arora wrote: > I might be totally misunderstanding your example, but it seems you want to > rank items by appropriacy and not just date of publication. > In your examp

Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Matt Sherman
Thanks, that is interesting since we can export from the PDFs, and while the OCR text is a little messy it is in decent shape. I'll certainly look into that. On Thu, Jun 18, 2015 at 3:13 PM, Gordon, Bonnie wrote: > We're actually also working on getting a bibliography from a Word Doc to a > mor

Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Gordon, Bonnie
We're actually also working on getting a bibliography from a Word Doc to a more structured format. We're using regular expressions in LibreOffice Writer to mark up the citations, then insert tabs between the elements, and then copy into a spreadsheet (similar to what's described in http://programmi
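
A minimal sketch of the same idea (mark up each citation with a regular expression, then put tabs between the elements), written in Python's `re` module rather than LibreOffice Writer. The citation layout "Author. Title. Place: Publisher, Year.", the field names, and the sample citation are all assumptions made for illustration; real OCR output would need a looser pattern and some manual cleanup.

```python
import re

# Hypothetical citation layout: "Author. Title. Place: Publisher, Year."
# The named groups are illustrative, not taken from the original thread.
CITATION = re.compile(
    r"^(?P<author>.+?)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<place>[^:.]+):\s+"
    r"(?P<publisher>.+?),\s+"
    r"(?P<year>\d{4})\.?$"
)

FIELDS = ("author", "title", "place", "publisher", "year")


def citation_to_tsv(line):
    """Return the citation as one tab-separated row, or None if it does not match."""
    match = CITATION.match(line.strip())
    if match is None:
        # Leave unmatched lines for manual review instead of guessing.
        return None
    return "\t".join(match.group(name) for name in FIELDS)


if __name__ == "__main__":
    sample = "Smith, John. A Study of Marlowe. London: Example Press, 1998."
    print(citation_to_tsv(sample))
```

Rows produced this way can be pasted into a spreadsheet; lines that return None are the ones that still need hand editing.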

Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Harper, Cynthia
Eric or others, do you know of any utility that converts a PDF and retains coding for where the font or font style changes? Or converts a web page with associated CSS and notes where each font style and HTML text block starts and stops? It seems that would be the starting point for recognizing citation

[CODE4LIB] DLF Forum Proposals due June 22

2015-06-18 Thread Sibyl Schaefer
One final reminder that DLF Forum Proposals are due next Monday, June 22. The DLF Forum is an annual meeting where the digital library community comes together to discover better methods of working through sharing and collaboration. It serves as a resour

Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Matt Sherman
The hope is to take these bibliographies and put them into a more web-searchable/sortable format that researchers can make use of. My colleague was taking some inspiration from the Marlowe Bibliography (https://marlowebibliography.org/), though we are hoping to possibly get a bit more robust

Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Kyle Banerjee
How you want to preprocess and structure the data depends on what you hope to achieve. Can you say more about what you want the end product to look like? kyle On Thu, Jun 18, 2015 at 10:08 AM, Matt Sherman wrote: > That is a pretty good summation of it yes. I appreciate the suggestions, > this

Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Matt Sherman
That is a pretty good summation of it, yes. I appreciate the suggestions; this is a bit of a new realm for me, and while I know what I want it to do and the structure I want to put it in, the conversion process has been eluding me, so thanks for giving me some tools to look into. On Thu, Jun 18, 201

Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Eric Lease Morgan
On Jun 18, 2015, at 12:02 PM, Matt Sherman wrote: > I am working with a colleague on a side project which involves some scanned > bibliographies and making them more web searchable/sortable/browse-able. > While I am quite familiar with the metadata and organization aspects we > need, I am at a

Re: [CODE4LIB] Combining RSS feeds

2015-06-18 Thread Tom Keays
I guess I'd approach it by tweaking your RSS feeds such that the PubDate is the date the event occurs rather than the default of the date the entry was entered into the system. Otherwise, you have to add some sort of non-standard tag (in the description maybe?) and build a system that sorts on that
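
A minimal sketch of the first option above, assuming each feed already carries the event date in its `pubDate` element. The two sample feeds, their titles, and the `merge_feeds` helper are hypothetical; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Two made-up RSS 2.0 feeds whose pubDate holds the event date (an assumption).
FEED_A = """<rss version="2.0"><channel><title>Calendar A</title>
<item><title>Workshop</title><pubDate>Tue, 11 Aug 2015 09:00:00 +0000</pubDate></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Calendar B</title>
<item><title>Lecture</title><pubDate>Mon, 22 Jun 2015 18:00:00 +0000</pubDate></item>
</channel></rss>"""


def merge_feeds(*feeds_xml):
    """Return (datetime, title) pairs from all feeds, earliest pubDate first."""
    items = []
    for xml_text in feeds_xml:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            pub_date = item.findtext("pubDate")
            if pub_date is None:
                # Skip entries that carry no date at all.
                continue
            title = item.findtext("title", default="")
            items.append((parsedate_to_datetime(pub_date), title))
    items.sort(key=lambda pair: pair[0])
    return items


if __name__ == "__main__":
    for when, title in merge_feeds(FEED_A, FEED_B):
        print(when.date(), title)
```

Because the sort key is just the parsed `pubDate`, no non-standard tag is needed once the feeds publish the event date there.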

[CODE4LIB] Code4Lib Regional meetup for MD, DC, VA on 8/11-12

2015-06-18 Thread Kim, Bohyun
Hi all - This is a "Save the Date" notice for code4lib folks in MD, DC, VA area. We are organizing a two-day event on Tue. 8/11 - Wed. 8/12. It will take place at the McKeldin Library at University of Maryland, College Park. More details will be posted on the Wiki page below. The registration w

[CODE4LIB] Jobs: join us at the Getty

2015-06-18 Thread Joshua Gomez
I'm looking for two good library coders to join my team at the Getty Research Institute. Our current team is made up of myself, three more software engineers, and a UX designer (3 female, 2 male). Our current projects include: 1. Rebuilding

[CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-18 Thread Matt Sherman
Hi Code4Libbers, I am working with a colleague on a side project which involves some scanned bibliographies and making them more web searchable/sortable/browse-able. While I am quite familiar with the metadata and organization aspects we need, I am at a bit of a loss on how to automate the proce

Re: [CODE4LIB] Combining RSS feeds

2015-06-18 Thread nitin arora
Real quick: I'll throw this out there. A less obtrusive way to "encode" the date of event might be to stick it in a tag - if that's possible with the software you use for both calendars. It would be with WordPress, for instance. That's to say, depending on what gets placed in the RSS feed from bot

Re: [CODE4LIB] Combining RSS feeds

2015-06-18 Thread nitin arora
I might be totally misunderstanding your example, but it seems you want to rank items by appropriacy and not just date of publication. In your example, the ranking depends on date of occurrence of event and not publication date. Did I read that right? If that's the only criterion, then you might be

[CODE4LIB] Job: Multimedia Instructional Technologist at St. Olaf College

2015-06-18 Thread jobs
Multimedia Instructional Technologist St. Olaf College Northfield The Multimedia Instructional Technologist is a member of a team-oriented staff and shares in the responsibility of providing the St. Olaf community with instructional technology services. The Multimedia Instructional Technologist pr

[CODE4LIB] seeking Python assistant for ALA pre-conference

2015-06-18 Thread Heidi P Frank
Hi, I'm looking for someone who understands Python enough to troubleshoot basic issues and would be willing to help out during a pre-conference session coming up at ALA in San Francisco on Thursday, June 25th. The Python/PyMARC portion of the pre-conference will be in the afternoon from 2:45-4:15p

[CODE4LIB] Job opportunity -- Digital Projects and Services Librarian at Temple University

2015-06-18 Thread Delphine Khanna
*Summary:* Looking for a dynamic work environment and professional growth opportunities? Come join the Digital Library Initiatives team at Temple. We seek an enthusiastic service-oriented Digital Projects and Services Librarian to be involved in a range of collaborative projects including rethinki