On Mon, Aug 8, 2016 at 12:55 PM, Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 08/08/2016 09:47 AM, Ioana Danes wrote:
>
>>
>>
>> On Mon, Aug 8, 2016 at 12:37 PM, Adrian Klaver
>> <adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>> wrote:
>>
>>     On 08/08/2016 09:28 AM, Ioana Danes wrote:
>>
>>
>>
>>         On Mon, Aug 8, 2016 at 12:19 PM, Adrian Klaver
>>         <adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
>>         <mailto:adrian.kla...@aklaver.com
>>         <mailto:adrian.kla...@aklaver.com>>> wrote:
>>
>>             On 08/08/2016 09:11 AM, Ioana Danes wrote:
>>
>>                 Hi,
>>
>>                 I suspect I am having a case of data corruption. Here
>>         are the
>>                 details:
>>
>>                 I am running postgres 9.4.8:
>>
>>                 postgresql94-9.4.8-1PGDG.rhel7.x86_64
>>                 postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
>>                 postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
>>                 postgresql94-server-9.4.8-1PGDG.rhel7.x86_64
>>
>>                 on CentOS Linux release 7.2.1511 (Core)
>>
>>                 This is happening in a production environment but
>>         luckily on the
>>                 reporting database.
>>                 I have a cluster of 3 databases, db1 and db2 are masters
>> and
>>                 replicate
>>                 between each other and also replicate to db3 (db1 <->
>>         db2, db1
>>                 -> db3,
>>                 db2 -> db3).
>>                 For replication I am using Bucardo.
>>
>>
>>             I would say this is more a question for the Burcardo list:
>>
>>             https://mail.endcrypt.com/mailman/listinfo/bucardo-general
>>         <https://mail.endcrypt.com/mailman/listinfo/bucardo-general>
>>             <https://mail.endcrypt.com/mailman/listinfo/bucardo-general
>>         <https://mail.endcrypt.com/mailman/listinfo/bucardo-general>>
>>
>>             I am just not seeing that replicating two masters on to a
>> single
>>             database is going to end well.
>>
>>
>>         Only one master is active at one time the other one is in stand
>>         by that
>>         is a topic for another discussion but in our case that works well.
>>
>>         That was my first assumption, that it is a kind of a race
>>         condition or a
>>         bug on replication but I quickly ruled that out because that
>>         does not
>>         explain why when I filtered the table by transactionid = 75315815
>> it
>>         shows one record with transactionid 75315811...
>>
>>         select gameplayid, transactionid, encodedplay from
>>         abrazo.matchgameplay
>>         where transactionid in (75315815) order by transactionid;;
>>          gameplayid | transactionid | encodedplay
>>         ------------+---------------+--------------
>>           160019271 |      75315815 | mix:9,0,9
>>           160019269 |      75315815 | mix:9,8,9
>>           160019267 |      75315815 | mix:9,2,2
>>           160019265 |      75315815 | mix:2,2,8
>>           160019263 |      *75315811 *| backup:1,9,1
>>           160019261 |      75315815 | backup:2,0,9
>>
>>         So I don't think it is a replication issue...
>>
>>
>>     Other that, if I am following correctly, it is on the database(db3)
>>     being replicated to. The only way db3 is getting its data is through
>>     replication, is that correct?. On the master databases the data is
>>     correct.
>>
>> OK, let's assume that what you're saying is correct and the replication
>> has a bug, or corruption or whatever that is and the record gets created
>> with transactionid = 75315811. Bucardo replication is trigger based and
>> it is using a copy command to insert the new records into the replicated
>> database.
>>
>> Then how can I explain that my query select gameplayid, transactionid,
>> encodedplay from abrazo.matchgameplay where transactionid in (75315815)
>> order by transactionid; returns me a record with transactionid 75315811???
>>
>
> Corrupted index on db3?
>
> yes


> Might want to look in the db3 logs to see if anything pops out.
>
> I checked the logs, no traces of errors or corruption.


> I just do not know enough about Burcardo to be of much help beyond that.
>


it is trigger based, it saves the ids of the inserted record in a delta
table and then on sync it creates copy commands to the slave. Even if there
is a bug or corruption in that process I don't see how that corrupts the
index on db3...


So it is either replication bug + index corruption on db3 or data
corruption on db3...


In response to Melvin, the query returns no rows:


SELECT n.nspname,
               i.relname,
               i.indexrelname,
               CASE WHEN idx.indisprimary
                         THEN 'pkey'
                          WHEN idx.indisunique
                          THEN 'uidx'
                          ELSE 'idx'
                  END AS type,
                  'INVALID'
  FROM pg_stat_all_indexes i
    JOIN pg_class c     ON (c.oid = i.relid)
    JOIN pg_namespace n ON (n.oid = c.relnamespace)
    JOIN pg_index idx   ON (idx.indexrelid =  i.indexrelid )
WHERE idx.indisvalid = FALSE
ORDER BY 1, 2;

 nspname | relname | indexrelname | type | ?column?
---------+---------+--------------+------+----------
(0 rows)


Thank you for your thoughts,
ioana

>
>
>> Thanks,
>> ioana
>>
>>
>>
>>     --
>>     Adrian Klaver
>>     adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>
>>
>>
>>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>

Reply via email to