On Mon, Aug 8, 2016 at 12:55 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
> On 08/08/2016 09:47 AM, Ioana Danes wrote: > >> >> >> On Mon, Aug 8, 2016 at 12:37 PM, Adrian Klaver >> <adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com>> wrote: >> >> On 08/08/2016 09:28 AM, Ioana Danes wrote: >> >> >> >> On Mon, Aug 8, 2016 at 12:19 PM, Adrian Klaver >> <adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com> >> <mailto:adrian.kla...@aklaver.com >> <mailto:adrian.kla...@aklaver.com>>> wrote: >> >> On 08/08/2016 09:11 AM, Ioana Danes wrote: >> >> Hi, >> >> I suspect I am having a case of data corruption. Here >> are the >> details: >> >> I am running postgres 9.4.8: >> >> postgresql94-9.4.8-1PGDG.rhel7.x86_64 >> postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64 >> postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64 >> postgresql94-server-9.4.8-1PGDG.rhel7.x86_64 >> >> on CentOS Linux release 7.2.1511 (Core) >> >> This is happening in a production environment but >> luckily on the >> reporting database. >> I have a cluster of 3 databases, db1 and db2 are masters >> and >> replicate >> between each other and also replicate to db3 (db1 <-> >> db2, db1 >> -> db3, >> db2 -> db3). >> For replication I am using Bucardo. >> >> >> I would say this is more a question for the Burcardo list: >> >> https://mail.endcrypt.com/mailman/listinfo/bucardo-general >> <https://mail.endcrypt.com/mailman/listinfo/bucardo-general> >> <https://mail.endcrypt.com/mailman/listinfo/bucardo-general >> <https://mail.endcrypt.com/mailman/listinfo/bucardo-general>> >> >> I am just not seeing that replicating two masters on to a >> single >> database is going to end well. >> >> >> Only one master is active at one time the other one is in stand >> by that >> is a topic for another discussion but in our case that works well. >> >> That was my first assumption, that it is a kind of a race >> condition or a >> bug on replication but I quickly ruled that out because that >> does not >> explain why when I filtered the table by transactionid = 75315815 >> it >> shows one record with transactionid 75315811... >> >> select gameplayid, transactionid, encodedplay from >> abrazo.matchgameplay >> where transactionid in (75315815) order by transactionid;; >> gameplayid | transactionid | encodedplay >> ------------+---------------+-------------- >> 160019271 | 75315815 | mix:9,0,9 >> 160019269 | 75315815 | mix:9,8,9 >> 160019267 | 75315815 | mix:9,2,2 >> 160019265 | 75315815 | mix:2,2,8 >> 160019263 | *75315811 *| backup:1,9,1 >> 160019261 | 75315815 | backup:2,0,9 >> >> So I don't think it is a replication issue... >> >> >> Other that, if I am following correctly, it is on the database(db3) >> being replicated to. The only way db3 is getting its data is through >> replication, is that correct?. On the master databases the data is >> correct. >> >> OK, let's assume that what you're saying is correct and the replication >> has a bug, or corruption or whatever that is and the record gets created >> with transactionid = 75315811. Bucardo replication is trigger based and >> it is using a copy command to insert the new records into the replicated >> database. >> >> Then how can I explain that my query select gameplayid, transactionid, >> encodedplay from abrazo.matchgameplay where transactionid in (75315815) >> order by transactionid; returns me a record with transactionid 75315811??? >> > > Corrupted index on db3? > > yes > Might want to look in the db3 logs to see if anything pops out. > > I checked the logs, no traces of errors or corruption. > I just do not know enough about Burcardo to be of much help beyond that. > it is trigger based, it saves the ids of the inserted record in a delta table and then on sync it creates copy commands to the slave. Even if there is a bug or corruption in that process I don't see how that corrupts the index on db3... So it is either replication bug + index corruption on db3 or data corruption on db3... In response to Melvin, the query returns no rows: SELECT n.nspname, i.relname, i.indexrelname, CASE WHEN idx.indisprimary THEN 'pkey' WHEN idx.indisunique THEN 'uidx' ELSE 'idx' END AS type, 'INVALID' FROM pg_stat_all_indexes i JOIN pg_class c ON (c.oid = i.relid) JOIN pg_namespace n ON (n.oid = c.relnamespace) JOIN pg_index idx ON (idx.indexrelid = i.indexrelid ) WHERE idx.indisvalid = FALSE ORDER BY 1, 2; nspname | relname | indexrelname | type | ?column? ---------+---------+--------------+------+---------- (0 rows) Thank you for your thoughts, ioana > > >> Thanks, >> ioana >> >> >> >> -- >> Adrian Klaver >> adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com> >> >> >> > > -- > Adrian Klaver > adrian.kla...@aklaver.com >