Re: Issue 1252 in qubit-toolkit: Add the full-text content of uploaded PDFs to the search index

qubit-toolkit Sun, 03 Apr 2011 08:56:33 -0700

Updates:
        Owner: suhonos

Labels: -Milestone-Release-Post-1.2 -Priority-Low Milestone-Release-1.2 Priority-High

Comment #2 on issue 1252 by [email protected]: Add the full-text content of uploaded PDFs to the search index

http://code.google.com/p/qubit-toolkit/issues/detail?id=1252

I know several OJS-based journals have successfully used pdftotext for indexing their PDF-based articles. It basically extracts text from the PDF and makes it available to whatever backend you have; i.e. it can be used to locate a specific PDF, but it doesn't do in-document highlighting or anything like that.
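
As a rough illustration of that workflow, here is a minimal sketch in Python (chosen only for brevity; Qubit itself is PHP, and a real integration would hand the extracted text to its search backend such as ZSL). It assumes the pdftotext command-line tool is on the PATH. The function names and the toy in-memory index are hypothetical and stand in for whatever backend is actually used.

```python
import re
import shutil
import subprocess
from collections import defaultdict


def pdftotext_command(pdf_path):
    """Build the pdftotext call; "-" as the output file means stdout."""
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def extract_text(pdf_path):
    """Return the text of a PDF, or raise if pdftotext is not installed."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        pdftotext_command(pdf_path),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def add_to_index(index, doc_id, text):
    """Record which documents contain each lowercased word."""
    for word in re.findall(r"\w+", text.lower()):
        index[word].add(doc_id)


def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)


# Toy stand-in for the real search backend (hypothetical).
index = defaultdict(set)
```

With this sketch, indexing an upload would be something like add_to_index(index, "report.pdf", extract_text("report.pdf")), and search(index, "annual report") would return the matching document ids. As noted above, this only locates the PDF; it records nothing about where in the document the words occur.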

MJ

On 2011-03-28, at 12:58 PM, David Juhasz wrote:

Oh, also this:
http://en.wikipedia.org/wiki/Pdftotext

On 28-Mar-11, at 9:56 AM, David Juhasz wrote:

How about this?
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/

Looks like it uses a "pdfinfo" application that I've never heard of, but looks pretty straightforward. I wonder if we could do something simpler (i.e. just indexing the contents) using ghostscript?

David

On 27-Mar-11, at 8:53 AM, Peter Van Garderen wrote:

We'll need some type of support for this in the 1.x branch. Ideally we can make this work with ZSL or, if not, some other PHP component that can plug in relatively painlessly with the existing Qubit architecture and minimum system requirements.

The ugliest hack discussed thus far is to use a full-text search box via Google site search.


