Updates:
        Owner: suhonos
Labels: -Milestone-Release-Post-1.2 -Priority-Low Milestone-Release-1.2 Priority-High

Comment #2 on issue 1252 by [email protected]: Add the full-text content of uploaded PDFs to the search index
http://code.google.com/p/qubit-toolkit/issues/detail?id=1252

I know of several OJS-based journals that have successfully used pdftotext to index their PDF-based articles. It basically extracts the text from the PDF and makes it available to whatever backend you have; i.e., it can be used to locate a specific PDF, but it doesn't do in-document highlighting or anything like that.
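
For illustration only, here is a rough Python sketch of that extraction step (Qubit itself is PHP, so this is not a proposed patch). It assumes the pdftotext binary is installed and on the PATH; the function names are hypothetical, and handing the returned text to the search backend is left out.

```python
import re
import subprocess


def build_pdftotext_command(pdf_path):
    """Return the argv for a pdftotext call that writes UTF-8 text to stdout."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    return ["pdftotext", "-enc", "UTF-8", pdf_path, "-"]


def normalize_text(raw):
    """Collapse runs of whitespace (newlines, form feeds) into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def extract_pdf_text(pdf_path):
    """Run pdftotext on one file and return its text, ready for indexing."""
    # Assumption: pdftotext is on the PATH; check=True raises if it fails.
    result = subprocess.run(
        build_pdftotext_command(pdf_path),
        capture_output=True,
        check=True,
    )
    return normalize_text(result.stdout.decode("utf-8", errors="replace"))
```

The string returned by extract_pdf_text() would then be added as one more field on the record for the uploaded PDF, which is enough to find the document but carries no page or position information for highlighting.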

MJ

On 2011-03-28, at 12:58 PM, David Juhasz wrote:

Oh, also this:
http://en.wikipedia.org/wiki/Pdftotext


On 28-Mar-11, at 9:56 AM, David Juhasz wrote:

How about this?
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/

Looks like it uses a "pdfinfo" application that I've never heard of, but it looks pretty straightforward. I wonder if we could do something simpler (i.e. just indexing the contents) using Ghostscript?

David

On 27-Mar-11, at 8:53 AM, Peter Van Garderen wrote:



We'll need some type of support for this in the 1.x branch. Ideally we can make this work with ZSL or, if not, some other PHP component that can plug in relatively painlessly with the existing Qubit architecture and minimum system requirements.

The ugliest hack discussed thus far is to use a full-text search box via Google site search.

