Mark,

That link is a mirror of this mailing list; it's not from 5 months ago. If you are in the year 2012, please respond with lottery numbers and the like.

On Mon, Jun 13, 2011 at 9:43 PM, Mark Johnson <m...@remingtondatabasesolutions.com> wrote:

> I found another post where you asked the same questions 5 months ago. Have
> you tested in that time?
> http://www.spinics.net/lists/pgsql-admin/msg19438.html
>
> A text search vector is an array of distinct lexemes (less any stopwords)
> and their positions. Taking your example we get ...
>
> select to_tsvector('the lord of the rings.txt') "answer";
> answer
> -------------------
> 'lord':2 'rings.txt':5
>
> You can put the length() function around it to just get the number of
> lexemes. This is the size in terms of number of distinct lexemes, not size
> in terms of space utilization.
>
> select length(to_tsvector('the lord of the rings.txt')) "answer";
> answer
> --------
> 2
>
> You might find the tsvector data consumes 2x the space required by the
> input text. It will depend on your configuration and your input data. Test
> it and let us know what you find.
>
> -Mark
>
> -----Original Message-----
> *From:* Tim [mailto:elatl...@gmail.com]
> *Sent:* Monday, June 13, 2011 03:19 PM
> *To:* pgsql-admin@postgresql.org
> *Subject:* [ADMIN] tsvector limitations
>
> Dear list,
>
> How big of a file would one need to fill the 1MB limit of a tsvector?
> Reading
> http://www.postgresql.org/docs/9.0/static/textsearch-limitations.html seems to
> hint that filling a tsvector is improbable.
>
> Is there an easy way to query the bytes of a tsvector?
> Something like length(tsvector) but bytes(tsvector).
>
> If there is no easy method to query the bytes of a tsvector:
> I realize the answer is highly dependent on the contents of the file, so
> here are 2 random examples:
> How many bytes of a tsvector would a 32MB ASCII English unique word list
> make?
> How many bytes of a tsvector would something like "The Lord of the
> Rings.txt" make?
>
> If this limitation is ever hit, is there a common practice for using more
> than one tsvector?
> Using a separate "one to many" table seems like an obvious solution piece,
> but I would not know how to detect or calculate how much text to give each
> tsvector.
> Assuming tsvectors can't be linked, maybe they would need some overlap.
>
> Thanks in advance.
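
On the byte-size question in the quoted message: a minimal sketch, assuming PostgreSQL's built-in pg_column_size() is an acceptable measure of "bytes of a tsvector". The 'english' configuration and the aliases (doc, t, and the column names) are assumptions for illustration; the quoted example relied on the server's default text search configuration instead.

```sql
-- Sketch only: three measures for the same input text.
-- 'english' is an assumed configuration; drop the first argument of
-- to_tsvector() to use default_text_search_config as in the quoted example.
SELECT
    octet_length(doc)                           AS input_bytes,      -- size of the raw text
    length(to_tsvector('english', doc))         AS distinct_lexemes, -- lexeme count, not bytes
    pg_column_size(to_tsvector('english', doc)) AS tsvector_bytes    -- bytes used by the datum
FROM (SELECT 'the lord of the rings.txt'::text AS doc) AS t;
```

When run against a stored tsvector column, pg_column_size() reports the size as stored, which may reflect compression, so the figure for a large document can be smaller than the uncompressed value.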