[PERFORM] Storage/Performance and splitting a table

Craig A. James Sat, 19 Nov 2005 10:04:22 -0800

In a recent thread, several people pointed out that in PostgreSQL, UPDATE =
DELETE+INSERT: an UPDATE writes a new copy of the row and leaves the old one
behind as a dead tuple. That got me wondering.


I have a table that, roughly, looks like this:

 create table doc (
    id         integer primary key,
    document   text,
    keywords   tsvector
  );

where "keywords" has a GIST index.  There are about 10 million rows in the 
table, and an average of 20 keywords per document.  I have two questions.

First, I occasionally rebuild the keywords, after which VACUUM FULL ANALYZE
takes a LONG time: about 24 hours. Given that UPDATE = DELETE+INSERT, it
sounds like I'd be better off with something like this:

 create table doc (
    id         integer primary key,
    document   text
  );
 create table keywords (
    id         integer primary key,
    keywords   tsvector
  );

Then I could just drop the GIST index, truncate the keywords table, rebuild
the keywords, and reindex. My suspicion is that VACUUM FULL ANALYZE would then
be quick: there would be no garbage to collect, so all it would have to do is
the ANALYZE part.
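
A minimal sketch of that rebuild step against the split schema, assuming
PostgreSQL 8.3 or later (built-in full-text search). The index name
keywords_gist_idx is hypothetical, and to_tsvector('english', document) only
stands in for however the keywords are actually generated:

 -- Sketch only: keywords_gist_idx and to_tsvector('english', ...) are
 -- assumptions standing in for the real index name and keyword builder.
 begin;
 -- Drop the index first so the bulk load does not maintain it row by row.
 drop index if exists keywords_gist_idx;
 -- TRUNCATE discards the old rows outright instead of leaving dead tuples.
 truncate keywords;
 insert into keywords (id, keywords)
     select id, to_tsvector('english', document) from doc;
 -- Build the GiST index once, over the fully loaded table.
 create index keywords_gist_idx on keywords using gist (keywords);
 commit;
 -- No dead tuples were created, so refreshing statistics should suffice.
 analyze keywords;

Wrapping the rebuild in one transaction means other sessions see either the
old keywords or the new ones, never a half-built table. The trade-off is that
TRUNCATE takes an ACCESS EXCLUSIVE lock, so searches against "keywords" wait
until the commit.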

My second question: with the doc and keywords split into two tables, would
tsearch2/GIST searches be faster? The second schema's "keywords" table holds
only keywords (no documents); does that translate to fewer blocks being read
during a tsearch2/GIST query? Or are the "document" and "keywords" columns of
the first schema already stored separately on disk, so that the size of the
"document" data doesn't affect keyword search performance?

Thanks,
Craig