Re: [GENERAL] Duplicate rows during pg_dump

Adrian Klaver Sat, 24 Oct 2015 12:46:51 -0700

On 10/24/2015 12:35 PM, Chaz Yoon wrote:

I am seeing a duplicate, stale copy of the same row when performing a
pg_dump or copying a specific table, but not when directly selecting
from it. I'm running PostgreSQL 9.3.9 on Amazon RDS, with 9.3.10 client
tools.


It's happening on a users table, which has a primary key and enforces a
unique email address:

                                         Table "public.users"
       Column     |            Type             |                 Modifiers

---------------+-----------------------------+---------------------------------------------------
    id            | integer                     | not null default
nextval('users_id_seq'::regclass)
    email         | character varying(255)      | not null default
''::character varying
    last_activity | timestamp without time zone |
   Indexes:
       "users_pkey" PRIMARY KEY, btree (id)
       "users_unique_email" UNIQUE, btree (email)

I first noticed the problem when doing copying the table to another
database. Roughly this:

   % pg_dump --column-inserts -Fp -h remotedb remote_db > users.txt
   % psql test_db < users.txt
   [...]
   ERROR:  could not create unique index "users_pkey"
   DETAIL:  Key (id)=(123) is duplicated.
   [...]
   ERROR:  could not create unique index "users_unique_email"
   DETAIL:  Key (email)=(this_u...@xyz.com <mailto:this_u...@xyz.com>)
is duplicated.

It appears there's some sort of duplicate record for a single user in
the database. Checking the pg_dump output, I saw that a single user's
record was being exported twice:

   % grep -i this_u...@xyz.com <mailto:this_u...@xyz.com> users.txt
   INSERT INTO users (id, email, last_activity) VALUES (123,
'this_u...@xyz.com <mailto:this_u...@xyz.com>', '2015-10-21
10:32:15.997887');
   INSERT INTO users (id, email, last_activity) VALUES (123,
'this_u...@xyz.com <mailto:this_u...@xyz.com>', '2015-10-02
11:32:58.615743');

The rows were not exactly the same. Connecting to the source database
directly, I tried this:

   remote_db=> select count(1) from users where id = 123;
    count
   -------
        1
   (1 row)

   remote_db=> select count(1) from users where email =
'this_u...@xyz.com <mailto:this_u...@xyz.com>';
    count
   -------
        1
   (1 row)

To eliminate any risk of it being a weird locking issue, I restored a
snapshot of the database into a new RDS instance but I got the same
results. I then tried the following:

   remote_db=> create table users_copy_with_indexes (like users
including defaults including constraints including indexes including
storage including comments);
   CREATE TABLE
   remote_db=> insert into users_copy_with_indexes select * from users;
   ERROR:  duplicate key value violates unique constraint
"users_copy_with_indexes_pkey"
   DETAIL:  Key (id)=(123) already exists.

However, when I created a copy without the indexes, I can see the
duplicate rows:

   remote_db=> create table users_copy_without_indexes (like users);
   CREATE TABLE
   remote_db=> insert into users_copy_without_indexes select * from users;
   INSERT 0 523342
   remote_db=> select count(1) from users_copy_without_indexes where id
= 123;
    count
   -------
        2
   (1 row)
   remote_db=> select count(1) from users_copy_without_indexes where
email = 'this_u...@xyz.com <mailto:this_u...@xyz.com>';
    count
   -------
        2
   (1 row)

Any suggestions for what to look for next? Is it table corruption?

I would say the smoking gun is the copy w/o indexes shows both recordsand the one with indexes only one. I would DROP/CREATE index on theoriginal table, with the usual caveat that this does place a load on thetable. Using Concurrently might help, but I would read the information here:


http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html#SQL-CREATEINDEX-CONCURRENTLY

Building Indexes Concurrently


Chaz



--
Adrian Klaver
adrian.kla...@aklaver.com


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Duplicate rows during pg_dump

Reply via email to