The source database is PostgreSQL 7.4.3; the target is 7.4.6. The source database was originally created with 'SQL_ASCII' encoding, but all of its data is actually UTF-8. The target database uses 'UNICODE'. The table giving me problems contains over 400,000 rows. (It holds category data from the dmoz project.) In both the source and target databases there is a unique constraint on the topic key. Upon restore I get duplicate key violations on 12 rows, all of which contain non-latin1 data.
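For reference, here is one way to confirm what each server actually thinks its encoding is (a minimal sketch; 'dbname' is a placeholder):

-- run on both source and target; pg_encoding_to_char maps the numeric id to a name
select datname, encoding, pg_encoding_to_char(encoding)
from pg_database
where datname = 'dbname';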
What I've tried:
==========
1) SQL Dump / Restore
source machine: pg_dump -a -t category -U postgres dbname > /tmp/category.dump.sql
target machine, from within psql: \i /tmp/category.dump.sql
which outputs:
\i /tmp/category.dump.sql
SET
SET
SET
SET
SET
UPDATE 1
SET
psql:/tmp/category.dump.sql:24: ERROR: duplicate key violates unique constraint "category_topic_key"
CONTEXT: COPY category, line 133302: "1227568 503988 Top/Kids_and_Teens/International/Korean/�� 0 0"
SET
UPDATE 1
(I also tried it using INSERTs instead of COPY FROM, but with similar results.)
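With a plain SQL dump it may also be worth checking which client_encoding, if any, pg_dump wrote into the header, since that is what psql will use during the restore (a sketch, using the dump file from above):

head -20 /tmp/category.dump.sql | grep -i client_encoding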
2) Binary (custom) Dump / Restore
source machine: pg_dump -F c -Z 8 -t category -U postgres dbname > category_dump.custom.gz
target machine: pg_restore -d dbname -a category_dump.custom.gz
which churns for a while and then gives me this error:
pg_restore: ERROR: duplicate key violates unique constraint "category_topic_key"
CONTEXT: COPY category, line 133302: "1227568 503988 Top/Kids_and_Teens/International/Korean/�� 0 0"
pg_restore: [archiver (db)] error returned by PQendcopy
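With the custom-format archive, the table of contents can at least be inspected before restoring, to confirm exactly which entries pg_restore is about to run (a sketch, using the archive from above):

pg_restore -l category_dump.custom.gz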
3) Remove the unique constraint.
I removed the unique constraint in the target database so I could at least import the data. After that I was able to see exactly which rows had been duplicated:
select sub.topic, sub.cnt from (select topic, count(*) as cnt from category group by topic) sub where cnt > 1;
topic | cnt
-----------------------------------------------------------------------+-----
Top/Adult/World/Japanese/���������/��� | 2
Top/Adult/World/Japanese/����������/��� /���/�� | 10
Top/Adult/World/Korean/�거 | 4
Top/Adult/World/Korean/��_����/�� | 2
Top/Adult/World/Korean/�� /문� | 2
Top/Adult/World/Korean/���� | 2
Top/Adult/World/Korean/������_X/�� | 2
Top/Kids_and_Teens/International/Japanese/��� | 2
Top/Kids_and_Teens/International/Japanese/������ | 2
Top/Kids_and_Teens/International/Japanese/����������/��� | 4
Top/Kids_and_Teens/International/Japanese/趣������/���� | 2
Top/Kids_and_Teens/International/Korean/�� | 5
(12 rows)
Performing this same query on the source database:
select sub.topic, sub.cnt from (select topic, count(*) as cnt from category group by topic) sub where cnt > 1;
topic | cnt
-------+-----
(0 rows)
4) Attempted to identify the category_id values of the duplicate rows:
select category_id from category where topic = 'Top/Adult/World/Korean/�거';
category_id
-------------
(0 rows)
I believe this failed due to some sort of encoding or font mismatch somewhere between xterm, psql, the database, or even the X clipboard. Note that the data does display correctly when viewed in Mozilla. Still, it would be nice to be able to copy/paste a psql result string, use it as input, and actually find a match!
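One workaround is to let the database do the matching itself: join the table back against the duplicated topics so the multibyte strings never have to be typed or pasted (a minimal sketch):

-- list the category_id of every row whose topic is duplicated,
-- without ever entering the non-ascii string by hand
select c.category_id, c.topic
from category c
join (select topic
      from category
      group by topic
      having count(*) > 1) d on c.topic = d.topic
order by c.topic, c.category_id;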
5) Manual inspection of one of the rows.
I chose the topic 'Top/Adult/World/Korean/�거' to pursue further and executed the following query, looking for multiple instances of that string. There should be 4 according to the duplicates query above.
select category_id, topic from category where topic like 'Top/Adult/World/Korean%';
category_id | topic
-------------+----------------------------------------------
328048 | Top/Adult/World/Korean/���,CD
381025 | Top/Adult/World/Korean/���,CD/백�CD
400131 | Top/Adult/World/Korean/������_X
400136 | Top/Adult/World/Korean/������_X/���
400133 | Top/Adult/World/Korean/������_X/���
5830581 | Top/Adult/World/Korean/미��
5830906 | Top/Adult/World/Korean/����
589823 | Top/Adult/World/Korean/��_����/기�,�체
324253 | Top/Adult/World/Korean
367742 | Top/Adult/World/Korean/�거
378503 | Top/Adult/World/Korean/��
590650 | Top/Adult/World/Korean/�� /��
378504 | Top/Adult/World/Korean/�� /문�
590649 | Top/Adult/World/Korean/�� /� ����
5828700 | Top/Adult/World/Korean/������_X/��
5812536 | Top/Adult/World/Korean/��
5832542 | Top/Adult/World/Korean/����/취�,��
364360 | Top/Adult/World/Korean/��
324254 | Top/Adult/World/Korean/��_����
592044 | Top/Adult/World/Korean/��_����/��
5852704 | Top/Adult/World/Korean/��_����/��
406487 | Top/Adult/World/Korean/��_����/����
365940 | Top/Adult/World/Korean/����
(23 rows)
Yet I see only one row that matches the string exactly: the one with category_id = 367742.
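Two strings that render identically (or identically garbled) in the terminal can still differ at the byte level, so comparing character length against byte length, or dumping the raw bytes as hex, may show whether these rows really are byte-identical. A sketch, assuming the topics contain no backslashes (decode's 'escape' format would misinterpret them):

-- length() counts characters under the database encoding; octet_length() counts bytes
select category_id, length(topic) as chars, octet_length(topic) as bytes,
       encode(decode(topic, 'escape'), 'hex') as topic_hex
from category
where topic like 'Top/Adult/World/Korean/%'
order by topic;

Alternatively, piping unaligned psql output (psql -A -t) through od -c at the shell shows the raw bytes directly.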
6) Attempt to import same data back into source database (7.4.3)
Acting on the theory that this is possibly a new problem in 7.4.6, I tried the following in the source DB (still with SQL_ASCII encoding), which worked just fine:
create table category_tmp as select * from category;
alter table category_tmp add constraint category_tmp_topic_key unique (topic);
NOTICE: ALTER TABLE / ADD UNIQUE will create implicit index "category_tmp_topic_key" for table "category_tmp"
ALTER TABLE
pg_dump -a -t category_tmp -U postgres dbname > category_tmp_dump.sql
psql
delete from category_tmp;
DELETE 461153
\i category_tmp_dump.sql
SET
SET
SET
SET
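A quick way to quantify how differently the two servers compare these strings would be to run the same distinct-count on each side over identical data (a sketch):

-- run on both source and target; identical data should yield identical
-- counts if string comparison behaves the same on both servers
select count(*) as total_rows, count(distinct topic) as distinct_topics
from category;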
7) Experiments with encoding
a) Used GNU recode to recode the SQL dump file to UTF-8. The resulting file was unchanged, meaning the data was already valid UTF-8. (An independent check is sketched after this list.)
b) Updated the encoding in source database to 'UNICODE' to match target database.
update pg_database set encoding = 6 where datname = 'dbname';
c) Re-imported the data back into the source database again (as in 6). Worked fine again.
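A couple of sanity checks for (a) and (b): iconv exits non-zero on invalid input, so it can confirm the dump really is valid UTF-8 without writing a recoded copy, and the pg_database query from earlier verifies that the encoding change took effect (a sketch; paths and names are placeholders):

# confirm the dump is valid UTF-8; iconv fails on the first bad sequence
iconv -f UTF-8 -t UTF-8 /tmp/category.dump.sql > /dev/null && echo "valid UTF-8"
# verify the source database now reports UNICODE (encoding id 6)
psql -d dbname -c "select datname, pg_encoding_to_char(encoding) from pg_database;"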
So at this point I am mostly at a loss. I would have thought that after changing the source DB to UNICODE encoding, it would exhibit the same behavior as the target. I can think of two explanations:
1: initdb does something with the encoding beyond setting pg_database(encoding).
2: there is a bug in 7.4.6 that does not exist in 7.4.3.
I suppose the next step is to create a new DB under 7.4.3 using UNICODE and attempt to import the data in the same manner; a sketch of that follows. But right now I need a break.
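For the record, that next step might look something like this under 7.4.3 ('testdb' is a hypothetical name, and since the dump is data-only, the table and its unique constraint must be recreated first):

# create an empty database with UNICODE encoding
createdb -E UNICODE -U postgres testdb
# recreate the category table and its unique constraint, then load the data-only dump
psql -d testdb -U postgres -f /tmp/category.dump.sql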
Dan Libby
