Re: [CODE4LIB] indexing pdf files

2009-09-16 Thread Eric Lease Morgan
Eric Morgan wrote: http://infomotions.com/highlights/ Rosalyn Metz wrote: I have librarians that would kill for this. In fact I was talking to one about it the other day. She felt there must be a way to handle active reading and make it portable. This would be great in conjunction with

Re: [CODE4LIB] indexing pdf files

2009-09-16 Thread Cindy Harper
We're just talking about creating an index, not a separate copy of the works, right? because I imagine that copyright has a lot to do with why this type of thing doesn't already exist. On Wed, Sep 16, 2009 at 3:08 PM, Eric Lease Morgan emor...@nd.edu wrote: Eric Morgan wrote:

Re: [CODE4LIB] indexing pdf files

2009-09-16 Thread Eric Lease Morgan
On Sep 16, 2009, at 4:01 PM, Cindy Harper wrote: http://infomotions.com/highlights/ We're just talking about creating an index, not a separate copy of the works, right? because I imagine that copyright has a lot to do with why this type of thing doesn't already exist. No, not just an

[CODE4LIB] indexing pdf files

2009-09-15 Thread Eric Lease Morgan
I have been having fun recently indexing PDF files. For the past six months or so I have been keeping the articles I've read in a pile, and I was rather amazed at the size of the pile. It was about a foot tall. When I read these articles I actively read them -- meaning, I write, scribble,

Re: [CODE4LIB] indexing pdf files

2009-09-15 Thread Rosalyn Metz
Eric, I have librarians that would kill for this. In fact I was talking to one about it the other day. She felt there must be a way to handle active reading and make it portable. This would be great in conjunction with RefWorks or Zotero or something along those lines. Rosalyn On Tue, Sep

Re: [CODE4LIB] indexing pdf files

2009-09-15 Thread Mark A. Matienzo
Eric, 5. Use pdftotext to extract the OCRed text from the PDF and index it along with the MyLibrary metadata using Solr. [3, 4] Have you considered using Solr's ExtractingRequestHandler [1] for the PDFs? We're using it at NYPL with pretty great success. [1]
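
The two approaches mentioned above (running pdftotext yourself versus handing the PDF to Solr's ExtractingRequestHandler) can be sketched roughly as follows. This is a minimal illustration, not anyone's actual setup from the thread: the Solr base URL, the document id, and the `title` field are assumptions, and the handler is assumed to be mounted at its conventional `/update/extract` path.

```python
import subprocess
from urllib.parse import urlencode


def pdftotext_command(pdf_path):
    """Build the pdftotext command line; '-' sends the extracted text to stdout."""
    return ["pdftotext", pdf_path, "-"]


def extract_text(pdf_path):
    """Run pdftotext and return the extracted text (needs pdftotext on the PATH)."""
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def solr_extract_url(solr_base, doc_id, literals=None):
    """Build a URL for Solr's ExtractingRequestHandler.

    Metadata is passed as literal.<field> parameters. The field names are
    hypothetical and must exist in your own Solr schema.
    """
    params = {"literal.id": doc_id, "commit": "true"}
    for field, value in (literals or {}).items():
        params["literal." + field] = value
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    # Assumed local Solr instance and made-up metadata, for illustration only.
    print(solr_extract_url("http://localhost:8983/solr", "article-001",
                           {"title": "Some title"}))
```

With the first approach you would call `extract_text()` and add the result to a Solr document alongside your own metadata; with the second you would POST the PDF itself to the generated URL and let Solr do the extraction.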

Re: [CODE4LIB] indexing pdf files

2009-09-15 Thread Peter Kiraly
changed since that time. Király Péter http://eXtensibleCatalog.org - Original Message - From: Mark A. Matienzo m...@matienzo.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Tuesday, September 15, 2009 3:56 PM Subject: Re: [CODE4LIB] indexing pdf files Eric, 5. Use pdftotext to extract the OCRed

Re: [CODE4LIB] indexing pdf files

2009-09-15 Thread danielle plumer
My (much more primitive) version of the same thing involves reading and annotating articles using my Tablet PC. Although I do get a variety of print publications, I find I don't tend to annotate them as much anymore. I used to use EndNote to do the metadata, then I switched to Zotero. I hadn't

Re: [CODE4LIB] indexing pdf files

2009-09-15 Thread Erik Hatcher
://eXtensibleCatalog.org - Original Message - From: Mark A. Matienzo m...@matienzo.org To: CODE4LIB@LISTSERV.ND.EDU Sent: Tuesday, September 15, 2009 3:56 PM Subject: Re: [CODE4LIB] indexing pdf files Eric, 5. Use pdftotext to extract the OCRed text from the PDF and index it along