On Fri, 26 Nov 2004, Hunter Hillegas wrote:
> When I SELECT from a certain table, I see this JDBC exception: > > "Invalid character data was found. This is most likely caused by stored > data containing characters that are invalid for the character set the > database was created in. The most common example of this is storing 8bit > data in a SQL_ASCII database." > > The database is indeed of type SQL_ASCII. The table stores mailing list data > and has about 400,000 rows. > > Looking at the data via psql, I see that some of the rows have strange > characters in them, such as question marks where I would not expect them, > etc... > > What are my options? Is there a way to identify the 'bad' records, or the > ones causing trouble? > To really solve this problem you need to have a correctly encoded database. This will involve a dump and restore process and possibly recoding your data. This is straightforward if you know what encoding your data is, although it will cause some downtime. To detect the bad data you can try various SELECTs with the JDBC driver and see what errors out. The function below will determine if a particular field has data with the high bit set which is something the database really doesn't know what to do with. SELECT pkcolumn, hashighbit(columna), hashighbit(columnb) FROM mytable; Kris Jurka CREATE OR REPLACE FUNCTION hashighbit(text) RETURNS boolean AS ' DECLARE i int; BEGIN i := LENGTH($1); WHILE i > 0 LOOP IF ascii(substring($1, i, 1)) >= 128 THEN RETURN true; END IF; i := i-1; END LOOP; RETURN false; END; ' LANGUAGE 'plpgsql'; ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster