Updates:
Owner: suhonos
Labels: -Milestone-Release-Post-1.2 -Priority-Low Milestone-Release-1.2 Priority-High
Comment #2 on issue 1252 by [email protected]: Add the full-text
content of uploaded PDFs to the search index
http://code.google.com/p/qubit-toolkit/issues/detail?id=1252
I know several OJS-based journals have successfully used pdftotext to
index their PDF-based articles. It extracts the text from each PDF and
makes it available to whatever search backend you have, i.e., it can be
used to locate a specific PDF, but it doesn't do in-document highlighting
or anything like that.
MJ
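The pdftotext approach described above can be sketched as follows. This is a minimal illustration in Python rather than PHP/ZSL, and it assumes the pdftotext CLI (from poppler-utils or xpdf) is on the PATH. The helpers `extract_text`, `build_index`, and `search` are hypothetical names, and the toy in-memory index stands in for whatever real backend would be used. It shows the limitation MJ mentions: a query returns whole documents only, with no position data for highlighting.

```python
import re
import subprocess
from collections import defaultdict


def extract_text(pdf_path: str) -> str:
    """Run the pdftotext CLI and return the PDF's plain text.

    Assumes pdftotext is installed; passing "-" as the output file
    makes it write the extracted text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def tokenize(text: str) -> list[str]:
    # Lowercased word tokens; a real backend (e.g. ZSL) has its own analyzer.
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs (e.g. PDF paths) containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every query term.

    Only whole documents are located; no term positions are stored,
    so in-document highlighting is not possible.
    """
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

In use, one might build the index with `build_index({p: extract_text(p) for p in pdf_paths})` and then call `search(index, "some words")` to get the matching PDF paths.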
On 2011-03-28, at 12:58 PM, David Juhasz wrote:
Oh, also this:
http://en.wikipedia.org/wiki/Pdftotext
On 28-Mar-11, at 9:56 AM, David Juhasz wrote:
How about this?
http://www.kapustabrothers.com/2008/01/20/indexing-pdf-documents-with-zend_search_lucene/
Looks like it uses a "pdfinfo" application that I've never heard of,
but it looks pretty straightforward. I wonder if we could do something
simpler (i.e., just indexing the contents) using Ghostscript?
David
On 27-Mar-11, at 8:53 AM, Peter Van Garderen wrote:
We'll need some type of support for this in the 1.x branch. Ideally we
can make this work with ZSL or, if not, some other PHP component that
can plug in relatively painlessly with the existing Qubit architecture
and minimum system requirements.
The ugliest hack discussed thus far is to use a full-text search box
via Google site search.
--
You received this message because you are subscribed to the Google Groups "Qubit
Toolkit Issues" group.