Hi, I am testing upgrading from RT 3.6.7 on mysql to RT 4.0.2 on postgresql 8.4.8 (debian) and enabling full text search. The database has more that 30000 tickets.
I have been able to convert the database to postgresql and update RT to 4.0.2. I have done the set up for the full text search using defaults except for selecting GIN index instead of the default. When running "/opt/rt4/sbin/rt-fulltext-indexer --all" the indexer runs for hours with the occasional minor warnings as follows: [Thu Sep 15 01:56:27 2011] [warning]: NOTICE: word is too long to be indexed DETAIL: Words longer than 2047 characters are ignored. (/opt/rt4/sbin/rt-fulltext-indexer:370) I think that is ok. However the indexer eventually fails/stops with this error: [Mon Sep 19 02:08:28 2011] [warning]: DBD::Pg::db do failed: ERROR: string is too long for tsvector (3831236 bytes, max 1048575 bytes) at /opt/rt4/sbin/rt-fulltext-indexer line 370. (/opt/rt4/sbin/rt-fulltext-indexer:370) [Mon Sep 19 02:08:28 2011] [crit]: DBD::Pg::db do failed: ERROR: string is too long for tsvector (3831236 bytes, max 1048575 bytes) at /opt/rt4/sbin/rt-fulltext-indexer line 370. (/opt/rt4/sbin/../lib/RT.pm:340) DBD::Pg::db do failed: ERROR: string is too long for tsvector (3831236 bytes, max 1048575 bytes) at /opt/rt4/sbin/rt-fulltext-indexer line 370. related postgresql log: 2011-09-19 02:08:28 UTC ERROR: string is too long for tsvector (3831236 bytes, max 1048575 bytes) 2011-09-19 02:08:28 UTC STATEMENT: UPDATE Attachments SET ContentIndex = to_tsvector($1) WHERE id = $2 I think it is getting to a ticket that has too many unique words so it can't index it and it critically fails and stops indexing any further. If I run "/opt/rt4/sbin/rt-fulltext-indexer --all" again, it stops with that error again quite quickly (after about a minute). I tested doing some content searches for a particular word (using full text search) and not all tickets that contain that word are found. With full text turned off (content search still enabled) all tickets with that word are found. So I think the indexer has not been able to complete indexing all tickets as it is getting a problem with one ticket that halts the indexing. I notice on an older wiki page for PostgreSQLFullText for before RT had fulltext indexing built in http://requesttracker.wikia.com/wiki/PostgreSQLFullText they mention limiting the size of tsvector that will attempt to index: "So I am adding the tsvectors only to those entries with a size < 500KB" I think maybe this has not been implemented for rt 4.0.2? I would appreciate some advice on how I can proceed with getting the rest of my data indexed. I think any of the following would be suitable but I don't know how to implement them (I am not a coder or a dba) and could use some help. Options: - modify the rt-fulltext-indexer script to truncate strings that are "too long for tsvector". or - modify the rt-fulltext-indexer script to skip tickets that have that issue and continue indexing other tickets. or - find out which ticket is causing the problem (hopefully only one) and maybe I can delete it before running the rt-fulltext-indexer script. or - maybe someone can suggest a better alternative Please help. Regards, Anthony -------- RT Training Sessions (http://bestpractical.com/services/training.html) * Chicago, IL, USA September 26 & 27, 2011 * San Francisco, CA, USA October 18 & 19, 2011 * Washington DC, USA October 31 & November 1, 2011 * Melbourne VIC, Australia November 28 & 29, 2011 * Barcelona, Spain November 28 & 29, 2011
