[rt-users] Full text indexing failure (invalid byte sequence for encoding UTF8)
We're currently running RT 4.0.5-3~bpo60+1 (from Debian backports) with Postgresql 8.4.12-0squeeze1.

Recently I tried to enable full text search following the instructions here:

http://blog.bestpractical.com/2011/06/full-text-searching.html

...but ran into this error an hour into the initial rt-fulltext-indexer --all:

[crit]: error: ERROR: invalid byte sequence for encoding UTF8: 0xfc
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by client_encoding. at /usr/sbin/rt-fulltext-indexer-4 line 375. (/usr/share/request-tracker4/lib/RT.pm:351)

Subsequent runs of the same command end with the same error.

The encoding for the rt4 db has been set to utf8 for as long as I can recall. I assume this relates to some data inserted into the db ages ago when client_encoding was something other than utf8, or in a previous version of postgresql which might have been less stringent about input.

There is a FAQ about 'invalid byte sequence for encoding' but I'm not sure that this is the same issue.

Anyone else been through this sort of issue? Would it be better to take the question to a postgresql list?

Ben

-- 
pub 4096R/318B6A97 2009-05-11 Ben Poliakoff b...@reed.edu
Primary key fingerprint: 3F23 EBC8 B73E 92B7 0A67 705A 8219 DCF0 318B 6A97
Re: [rt-users] Full text indexing failure (invalid byte sequence for encoding UTF8)
On Fri, 2013-02-01 at 17:03 -0800, Ben Poliakoff wrote:
> We're currently running RT 4.0.5-3~bpo60+1 (from Debian backports) with Postgresql 8.4.12-0squeeze1.

This is fixed in RT 4.0.9 and above, which resolves the issue by having the indexer skip any attachment with bad data.

RT 4.0.7 and above are also better about not trusting emails that claim to be UTF-8, which prevents the bad data from getting into the database in the first place. Mail mislabeled as UTF-8 is the likely cause here, and older versions of Pg accepted it.

- Alex
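For illustration: the byte 0xfc in the error is "ü" in ISO-8859-1 (Latin-1), and that byte can never appear in valid UTF-8, so PostgreSQL rejects it once it is strict about input. Below is a minimal Python sketch of the "skip the attachment with bad data" idea described above. It is not RT's actual code (RT is written in Perl), and the function names is_valid_utf8 and indexable_attachments, as well as the (id, bytes) input shape, are hypothetical.

```python
import sys


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def indexable_attachments(attachments):
    """Yield (id, text) for attachments whose content is valid UTF-8.

    Hypothetical helper: attachments is an iterable of (id, bytes) pairs.
    Invalid attachments are reported on stderr and skipped, instead of
    letting one bad row abort the whole indexing run.
    """
    for att_id, content in attachments:
        if is_valid_utf8(content):
            yield att_id, content.decode("utf-8")
        else:
            print(f"skipping attachment {att_id}: invalid UTF-8", file=sys.stderr)


if __name__ == "__main__":
    sample = [
        (1, b"plain ascii"),
        (2, "Müller".encode("latin-1")),  # b"M\xfcller": contains the 0xfc byte
        (3, "Müller".encode("utf-8")),    # b"M\xc3\xbcller": valid UTF-8
    ]
    for att_id, text in indexable_attachments(sample):
        print(att_id, text)
```

Checking the bytes before they reach the database keeps the failure local to one attachment; the trade-off is that skipped attachments simply stay out of the full text index.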