A more general approach would be to write a function which takes a PDF in
and returns its text, and to mark that function IMMUTABLE.

Then you can index the output of converting that text to a tsvector.
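
To make that a bit more concrete, here is a rough sketch of what the body of
such a function might look like in Python (and so, with some adaptation, in
PL/Python). It assumes poppler's pdftotext is installed and on the PATH. The
name pdf_to_text and the command parameter are my own inventions for
illustration, not anything that ships with PostgreSQL.

```python
import subprocess


def pdf_to_text(pdf_bytes, command=("pdftotext", "-", "-")):
    """Return the plain text of a PDF supplied as bytes.

    Assumption: poppler's pdftotext is available; the "- -" arguments
    ask it to read the PDF from stdin and write the text to stdout.
    The command parameter is hypothetical and exists only so a
    different extractor can be swapped in.
    """
    result = subprocess.run(
        list(command),
        input=pdf_bytes,          # PDF content is piped to the tool's stdin
        stdout=subprocess.PIPE,   # extracted text is captured from stdout
        stderr=subprocess.PIPE,
        check=True,               # raise CalledProcessError on a non-zero exit
    )
    # Replace undecodable bytes instead of failing on odd encodings.
    return result.stdout.decode("utf-8", errors="replace")
```

If something like this were wrapped in a PL/Python function declared
IMMUTABLE, the index could be an expression index along the lines of
CREATE INDEX ... USING gin (to_tsvector('english', pdf_to_text(doc))), where
doc is a hypothetical bytea column. The two-argument form of to_tsvector is
the one that is immutable and therefore usable in an index. Bear in mind that
IMMUTABLE is a promise you make to the planner: the extracted text could
change if the external tool is upgraded, in which case a REINDEX would be
prudent.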

You may want to pull everything into a tsvector column for ease of review,
but functional indexes make that less important.

On Sat, Feb 20, 2016 at 1:10 AM, Stephen Davies <sdav...@sdc.com.au> wrote:

> On 20/02/16 00:24, Bruce Momjian wrote:
>
>> On Fri, Feb 19, 2016 at 02:49:16PM +0100, s d wrote:
>>
>>> On 19 February 2016 at 14:19, Bruce Momjian <br...@momjian.us> wrote:
>>>      >     Ah, no. That's not possible
>>>      >
>>>      >
>>>      > ...not possible, Yet.
>>>      >
>>>      > PostgreSQL grows by adding the features people need and it's
>>>      > changing rapidly.
>>>
>>>      I wonder if PLPerl could be used to extract the words from a PDF
>>>      document and create a tsvector column from it.
>>>
>>>   I don't know about PLPerl (I'm pretty sure it could be used for this
>>>   purpose, though). On the other hand, I've written code for this in
>>>   Python which should be easy to adapt for PLPython, if necessary.
>>>
>>
>> Right, so you would write a PL/Perl or PL/Python trigger function that
>> would populate the tsvector column on every INSERT or UPDATE.
>>
>> FWIW, I just use pdftotext in my CGI.
>



-- 
Best Wishes,
Chris Travers

