Thanks for your help.
I think a trigger will actually be the easiest. The way I can tell if
there is invalid data is simply to run upper(text) on it; if the data
is invalid, the call fails.
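Roughly what I have in mind (an untested sketch -- the function and
trigger names are made up, and emaildetails/emailbody are the table and
column from my earlier queries):

-- upper() raises "invalid byte sequence" on bad data, so just calling
-- it is enough to reject the row.
create or replace function check_emailbody_encoding() returns trigger as $$
begin
    perform upper(NEW.emailbody);  -- fails if emailbody has invalid bytes
    return NEW;
end;
$$ language plpgsql;

create trigger emaildetails_encoding_check
    before insert or update on emaildetails
    for each row execute procedure check_emailbody_encoding();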
I dumped the fixed database. Now I have a year's worth of backups that I
can't restore this specific table from. It most probably will never
matter. Every once in a while I get asked to check what was in the db
against what is in there now, but this table will probably never be
audited.
Sim
On 11/23/2010 10:33 AM, Dmitriy Igrishin wrote:
Hey Sim,
2010/11/23 Sim Zacks <[email protected]>
On 11/21/2010 05:55 PM, Dmitriy Igrishin wrote:
Hey Sim,
Maybe this helps:
http://blog.tapoueh.org/articles/blog/_Getting_out_of_SQL_ASCII,_part_2.html
That worked to find some of them. I still needed to find a bunch
of others manually, such as 0xa0 and 0xd725, which weren't found
with that function. I finally figured out that

select * from emaildetails where emailbody like '%\xa0%';
select * from emaildetails where emailbody like '%\xd7\x25%';

would show me all those rows.
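For the archives, the same row-by-row check could probably be wrapped
in a plpgsql helper that traps the error, instead of bisecting by hand
(untested sketch; it assumes the table has an integer id column):

create or replace function find_bad_emailbody_rows() returns setof integer as $$
declare
    r record;
begin
    for r in select id, emailbody from emaildetails loop
        begin
            -- upper() throws "invalid byte sequence" on bad data
            perform upper(r.emailbody);
        exception when others then
            return next r.id;
        end;
    end loop;
    return;
end;
$$ language plpgsql;

select * from find_bad_emailbody_rows();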
My 2 big problems now are:
A) how to make sure that these chars are not inserted in the
future. The database should prevent them from being inserted.
Consider using domains -- types based on the generic types but with
constraints attached -- instead of the generic types themselves.
Use regular expressions in the constraints.
As an alternative, you can use triggers for more complex validation,
but in your case domains will IMO work well.
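For example (just a sketch -- the domain name and the pattern are only
illustrative, you would have to write a check that matches your own idea
of "valid" text):

-- allow printable ASCII plus whitespace only, as an example
create domain email_text as text
    check (value ~ '^[ -~[:space:]]*$');

alter table emaildetails alter column emailbody type email_text;

Note that the ALTER re-checks the existing rows, so it will only
succeed once the bad data has been cleaned up.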
B) How to fix the backups that I have so that I can restore them.
As I mentioned, they are being taken with -Fc.
Oops. Why not dump the fixed database?
Sim
2010/11/21 Sim Zacks <[email protected]>
I am using PG 8.2.17 with UTF8 encoding.
"PostgreSQL 8.2.17 on i686-pc-linux-gnu, compiled by GCC gcc
(GCC) 4.1.1 (Gentoo 4.1.1)"
One of my tables somehow has invalid characters in it:
ERROR: invalid byte sequence for encoding "UTF8": 0xa9
HINT: This error can also happen if the byte sequence
does not match the encoding expected by the server, which
is controlled by "client_encoding".
I have already manually found a number of the bad rows by
running queries with text functions (upper) over ranges of
IDs until I narrowed down each specific bad row.
1) Is there a quicker way to get a list of all rows with
invalid characters?
2) Shouldn't the database prevent these rows from being
entered in the first place?
3) I have backups of this database (using -Fc) and I noticed
that on restore, this table is not restored because of this
error. Is there a way to fix the existing backups, or tell
the restore to ignore bad rows instead of erroring out the
whole table?
Thanks
Sim
--
// Dmitriy.
--
// Dmitriy.