[GENERAL] UTF8 conversion revisited

Geoffrey Myers Tue, 29 Mar 2011 12:39:26 -0700

So, we are still having an issue with this and I thought I'd throw this out to the list to see if I'm missing something. Basically, we have identified the tables/fields we need to convert. I'm running the following perl code against the fields and re-inserting the 'fixed' code into the field:


$data =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
                || (ord($1) == 11)
                || ((ord($1) >= 13) && (ord($1) <= 31))
                || ((ord($1) >= 127)) ?"": $1/egs;
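
As a quick sanity check, the substitution can be run in isolation against a byte string that contains 0xbd. This is only a sketch: the helper name strip_non_ascii and the sample string are made up for illustration, and it assumes the field value reaches the script as raw bytes.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution above; the name is illustrative only.
sub strip_non_ascii {
    my ($data) = @_;
    # Drop control characters 0-8, 11, 13-31 and everything from 127 up;
    # tab (9), line feed (10) and form feed (12) are kept.
    $data =~ s/(.)/((ord($1) >= 0) && (ord($1) <= 8))
                || (ord($1) == 11)
                || ((ord($1) >= 13) && (ord($1) <= 31))
                || ((ord($1) >= 127)) ? "" : $1/egs;
    return $data;
}

# Made-up sample value containing the problem byte and a carriage return.
my $sample  = "1\xbd cups\tflour\r\n";
my $cleaned = strip_non_ascii($sample);

print "0xbd ", ($cleaned =~ /\xbd/ ? "still present" : "removed"), "\n";
```

If the byte is removed here, the regex itself is probably not at fault, and the surviving 0xbd more likely sits in a row or column the script never rewrote.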

This appears to be working, as a large number of records are cleaned. Problem is, somehow it's not fixing data that contains the hex value 0xbd, as when I attempt to dump this database and create a new one with the UTF8 encoding I get the following error:


pg_restore: [archiver (db)] Error while PROCESSING TOC:
pg_restore: [archiver (db)] Error from TOC entry 5246; 0 4978675 TABLE DATA cust postgres
pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xbd

As I see it, the perl code above should catch this '0xbd' character, but somehow it is finding its way through.
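
One way to narrow this down would be to test each value for UTF-8 validity before the dump, so the offending rows can be located directly. The sketch below uses the core Encode module; the helper name is_valid_utf8 and the sample values are hypothetical, and it assumes the column contents are available as raw byte strings.

```perl
use strict;
use warnings;
use Encode ();

# Returns 1 if $bytes is a well-formed UTF-8 byte sequence, 0 otherwise.
sub is_valid_utf8 {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed input;
    # LEAVE_SRC stops decode() from modifying $bytes in place.
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    return eval { Encode::decode('UTF-8', $bytes, $check); 1 } ? 1 : 0;
}

# Made-up sample values standing in for field contents.
for my $value ("plain ascii", "1\xbd cups", "1\xc2\xbd cups") {
    my $hex = join " ", map { sprintf "%02x", ord } split //, $value;
    printf "%-4s %s\n", (is_valid_utf8($value) ? "ok" : "BAD"), $hex;
}
```

A lone 0xbd byte is reported as invalid, while the two-byte sequence 0xc2 0xbd (the UTF-8 encoding of U+00BD) passes, which matches the error pg_restore reports.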


Any insights would be greatly appreciated.

--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson

