A more general way would be to have a function that takes a PDF as input and returns its text. Mark it IMMUTABLE, since PostgreSQL only allows immutable functions in index expressions.
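Here is a minimal sketch of such a PDF-to-text function in Python. It assumes poppler's `pdftotext` command-line tool is installed, which is the tool Stephen Davies mentions using in this thread. `pdf_to_text` and its `cmd` parameter are hypothetical names. The body could be adapted into a PL/Python function that takes a `bytea` argument, which PL/Python passes in as Python `bytes`.

```python
import os
import subprocess
import tempfile


def pdf_to_text(pdf_bytes: bytes, cmd: tuple = ("pdftotext", "-enc", "UTF-8")) -> str:
    """Return the plain text of a PDF given as raw bytes.

    Hypothetical helper: writes the PDF to a temporary file, runs
    poppler's pdftotext on it, and reads the extracted text from stdout
    ("-" as the output file name makes pdftotext write to stdout).
    """
    # pdftotext reads from a file path, so spill the bytes to a temp file first.
    fd, path = tempfile.mkstemp(suffix=".pdf")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(pdf_bytes)
        result = subprocess.run(
            [*cmd, path, "-"],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            check=True,  # raise CalledProcessError if extraction fails
        )
    finally:
        # Always clean up the temp file, even if pdftotext failed.
        os.remove(path)
    # Decode as UTF-8 to match the -enc option; replace any undecodable bytes.
    return result.stdout.decode("utf-8", errors="replace")
```

Because the function runs an external program, a PL/Python version would have to be created in the untrusted `plpython3u` language. Declaring it IMMUTABLE promises that the same bytes always yield the same text. If `pdftotext` is upgraded and its output changes, an index built on the function may need a `REINDEX`. The index itself could then look like `CREATE INDEX docs_fts_idx ON docs USING gin (to_tsvector('english', pdf_to_text(body)));`, where the table, column, and index names are hypothetical. It uses the two-argument form of `to_tsvector` because the one-argument form depends on `default_text_search_config` and is not immutable.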
Then you can index the output of converting that text to a tsvector. You may want to store everything in a tsvector column for ease of review, but functional (expression) indexes make that less necessary.

On Sat, Feb 20, 2016 at 1:10 AM, Stephen Davies <sdav...@sdc.com.au> wrote:
> On 20/02/16 00:24, Bruce Momjian wrote:
>> On Fri, Feb 19, 2016 at 02:49:16PM +0100, s d wrote:
>>> On 19 February 2016 at 14:19, Bruce Momjian <br...@momjian.us> wrote:
>>> > Ah, no. That's not possible
>>> >
>>> > ...not possible, Yet.
>>> >
>>> > PostgreSQL grows by adding the features people need and it's changing
>>> > rapidly.
>>>
>>> I wonder if PLPerl could be used to extract the words from a PDF
>>> document and create a tsvector column from it.
>>>
>>> I don't know about PLPerl (I'm pretty sure it could be used for this
>>> purpose, though). On the other hand, I've written code for this in Python
>>> which should be easy to adapt for PLPython, if necessary.
>>
>> Right, so you would write a PL/Perl or PL/Python trigger function that
>> would populate the tsvector column on every INSERT or UPDATE.
>
> FWIW, I just use pdftotext in my CGI.

--
Best Wishes,
Chris Travers

Efficito: Hosted Accounting and ERP. Robust and Flexible. No vendor lock-in.
http://www.efficito.com/learn_more