Mark,

That link is a mirror of this mailing list; it's not from 5 months ago.
If you are in the year 2012 please respond with lottery numbers and the
like.



On Mon, Jun 13, 2011 at 9:43 PM, Mark Johnson <
m...@remingtondatabasesolutions.com> wrote:

>
>
> I found another post where you asked the same questions 5 months ago.  Have
> you tested in that time?
> http://www.spinics.net/lists/pgsql-admin/msg19438.html
>
>
> A text search vector is an array of distinct lexemes (less any stopwords)
> and their positions.  Taking your example we get ...
>
> select to_tsvector('the lord of the rings.txt') "answer";
>       answer
> -------------------
> 'lord':2 'rings.txt':5
>
> You can put the length() function around it to just get the number of
> lexemes.  This is the size in terms of number of distinct lexemes, not size
> in terms of space utilization.
>
> select length(to_tsvector('the lord of the rings.txt')) "answer";
>   answer
> --------
>         2
>
> You might find the tsvector data consumes 2x the space required by the
> input text.  It will depend on your configuration and your input data.  Test
> it and let us know what you find.
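>
> One way to run that test, as a minimal sketch: pg_column_size() reports the
> number of bytes used to store a value, so it can be wrapped around the
> tsvector the same way as length().  The "bytes" label is only an
> illustrative alias.  The figure depends on the text search configuration
> and the input, and for a value stored in a table it may reflect
> compression, so treat it as an estimate.
>
> select pg_column_size(to_tsvector('the lord of the rings.txt')) "bytes";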
>
> -Mark
>
> -----Original Message-----
> *From:* Tim [mailto:elatl...@gmail.com]
> *Sent:* Monday, June 13, 2011 03:19 PM
> *To:* pgsql-admin@postgresql.org
> *Subject:* [ADMIN] tsvector limitations
>
> Dear list,
>
> How big of a file would one need to fill the 1MB limit of a tsvector?
> Reading
> http://www.postgresql.org/docs/9.0/static/textsearch-limitations.html seems to
> hint that filling a tsvector is improbable.
>
> Is there an easy way to query the bytes of a tsvector?
> Something like length(tsvector), but bytes(tsvector).
>
> If there is no easy method to query the bytes of a tsvector:
> I realize the answer is highly dependent on the contents of the file, so
> here are 2 random examples:
> How many bytes of a tsvector would a 32MB ascii english unique word list
> make?
> How many bytes of a tsvector would something like "The Lord of the
> Rings.txt" make?
>
> If this limitation is ever hit, is there a common practice for using more
> than one tsvector?
> Using a separate "one to many" table seems like an obvious part of the
> solution, but I would not know how to detect or calculate how much text to
> give each tsvector.
> Assuming tsvectors can't be linked, maybe they would need some overlap.
>
>
> Thanks in advance.
>
>
