Re: [ADMIN] Data corruption after SAN snapshot

2012-08-07 Thread Craig Ringer
On 08/08/2012 09:39 AM, Stephen Frost wrote: > Terry, * Terry Schmitt (tschm...@schmittworks.com) wrote: >> So far, executing pg_dumpall seems to be fairly reliable for finding the corrupt objects after my initial data load, but unfortunately much of the corruption has been with indexes which pg_dump

Re: [ADMIN] Data corruption after SAN snapshot

2012-08-07 Thread Stephen Frost
Terry, * Terry Schmitt (tschm...@schmittworks.com) wrote: > So far, executing pg_dumpall > seems to be fairly reliable for finding the corrupt objects after my > initial data load, but unfortunately much of the corruption has been with > indexes which pg_dump will not expose. Shouldn't be too hard

Re: [ADMIN] Data corruption after SAN snapshot

2012-08-07 Thread Terry Schmitt
Thanks Craig. "# Brad's el-ghetto do-our-storage-stacks-lie?-script" I like it already :) I may play around with that. Looks interesting. For everyone else, here's a post describing the use of diskchecker: http://brad.livejournal.com/2116715.html I experimented with sysbench today, which was som

Re: [ADMIN] Data corruption after SAN snapshot

2012-08-07 Thread Stephen Frost
Terry, * Terry Schmitt (tschm...@schmittworks.com) wrote: > The new environment is RHEL 6.x guests running inside Red Hat Virtualization > using XFS and LVM. That's quite the shift, yet you left out any details on this piece. How is the VM connected to the NetApp LUN? What kind of options have

Re: [ADMIN] Data corruption after SAN snapshot

2012-08-07 Thread Craig Ringer
On 08/08/2012 06:23 AM, Terry Schmitt wrote: > Anyone have a solid method to test if fdatasync is working correctly or thoughts on troubleshooting this? Try diskchecker.pl: https://gist.github.com/3177656 . The other obvious step is that you've changed three things, so start isolation testing.
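
As a rough first check before running diskchecker.pl, one can time how many synchronous writes per second a filesystem acknowledges. The sketch below is not diskchecker.pl and is not from the thread; the function name `measure_sync_rate` and its parameters are made up for illustration. It cannot prove durability (only a pull-the-plug test like diskchecker can), but a rate far above what the hardware can plausibly deliver hints that some layer is acknowledging writes from a volatile cache. A single 7200 rpm disk with no write cache, for instance, completes about 120 rotations per second, whereas a SAN with battery-backed cache can legitimately report much higher numbers.

```python
import os
import tempfile
import time


def measure_sync_rate(directory, writes=200, block_size=8192):
    """Return the number of synced writes per second achieved in `directory`.

    Hypothetical helper for illustration; not part of any tool named above.
    """
    # fdatasync is not available on every platform; fall back to fsync.
    sync = getattr(os, "fdatasync", os.fsync)
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same block each time
            os.write(fd, block)
            sync(fd)  # ask the OS to flush the data to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Point this at the filesystem under test (here: the current directory).
    rate = measure_sync_rate(".", writes=200)
    print(f"{rate:.0f} synced writes/sec")
```

Running it once per layer (bare LUN, LVM volume, XFS inside the guest) fits the isolation-testing advice: a sudden jump in the rate between layers points at the layer worth inspecting first.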

Re: [ADMIN] Data corruption after SAN snapshot

2012-08-07 Thread Terry Schmitt
Simon, While I agree with your reply in general and am working that angle and more, I'm hoping to add to my personal tool kit and gain more insight into methods to test fsync and prove without a doubt that it is functioning properly on any given system no matter what type of database I'm running.

Re: [ADMIN] Data corruption after SAN snapshot

2012-08-07 Thread Simon Riggs
On 7 August 2012 23:23, Terry Schmitt wrote: > I have a pretty strange issue that I'm looking for ideas on. > I'm using Postgres Plus Advanced Server 9.1, but I believe this problem is > relevant to Postgres Community. It is certainly possible to be an EDB bug and > I am already working with them