On Thu, 2006-06-29 at 21:59 -0400, Tom Lane wrote:
> [ back to the start of the thread... ]
>
> BTW, a couple of thoughts here:
>
> * If my theory about the low-level cause is correct, then reindexing
> sl_log_1 would make the "duplicate key" errors go away, but nonetheless
> you'd have lost data --- the overwritten rows would be gone. I suppose
> that this would result in the slave missing some rows that are present
> on the master. Have you tried comparing slave and master databases to
> see if you can find any discrepancies?
Haven't done that yet - in test we tend to restart the old subscriber as
the new provider and rebuild the cluster. I'll check the logs from our
production failure to figure out what to compare and see what I can
discover.

> * One way that the problem could happen would be if a race condition in
> the kernel allowed an lseek(fd, 0, SEEK_END) to return a value less than
> the true end-of-file (as determined by another process' write()
> extending the EOF just slightly earlier --- ie, lseek fails to note the
> effects of the just-completed write, and returns the prior EOF value).
> PG does have internal locking that should guarantee that the lseek is
> not done until after the write completes ... but could there be a bug in
> the kernel allowing stale data to be returned? The SMP hardware is
> relevant (maybe one processor sees different data than the other) and
> frankly I don't trust NFS very far at all for questions such as this.
> It'd be interesting to see if you can reproduce the problem in a
> database on local storage.

Unfortunately we haven't got any local storage that can stand the sort
of loads we are putting through. With slower storage the CPUs mostly sit
idle, and we are very unlikely to trigger a timing-based bug if that's
what it is.

I'll get back to you with kernel build information tomorrow. We'll also
try to talk to some kernel hackers about this.

Many thanks for your efforts so far.

--
Marc