Just for clarity, can you post the proper sequence you're now using to take SAN-based snapshots? I'd like to try this on a new cluster I'm setting up.

Thanks,
Brian

Daniel Keisling <[EMAIL PROTECTED]> 2008-12-04 12:15:
> I've restarted the box and the heartbeat threads and messages are now
> gone. I've taken six snapshots and unmounted the filesystems several
> times and the segmentation faults do not occur.
>
> Thank you so much for looking into this, finding the problem, and
> getting me a fix. I look forward to the 1.4.2 release.
>
> Daniel
>
> > -----Original Message-----
> > From: Sunil Mushran [mailto:[EMAIL PROTECTED]
> > Sent: Thursday, December 04, 2008 11:45 AM
> > To: Daniel Keisling
> > Cc: Joel Becker
> > Subject: Re: [Ocfs2-users] Another node is heartbeating in our slot! errors with LUN removal/addition
> >
> > These could be hb thread that were not killed when you
> > umounted those volumes. Have you restarted the box
> > since you cleaned out those devices?
> >
> > Daniel Keisling wrote:
> > > Sunil,
> > >
> > > I edited /dev/sdo and /dev/sdr and the rest of corrupted devices
> > > disappeared, so there are no more corrupted OCFS2 filesystems when doing
> > > a 'mounted.ocfs2 -f.' However, the 'heartbeating in our slot' error
> > > messages are still coming. The devices in question are not in the
> > > device-mapper maps and are not mounted, but do appear in mounted.ocfs2.
> > > Do I need to do the same procedure and wipe out the signature?
> > >
> > > Dec 4 10:29:35 ausracdbd01 kernel: (26064,2):o2hb_do_disk_heartbeat:770 ERROR: Device "dm-43": another node is heartbeating in our slot!
> > >
> > > [EMAIL PROTECTED] ~]# multipath -ll | grep dm-43
> > > [EMAIL PROTECTED] ~]#
> > >
> > > [EMAIL PROTECTED] ~]# mounted.ocfs2 -f | grep dm-43
> > > /dev/dm-43 ocfs2 ausracdbd01
> > >
> > > [EMAIL PROTECTED] ~]# mounted.ocfs2 -d | grep dm-43
> > > /dev/dm-43 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > >
> > > [EMAIL PROTECTED] ~]# mounted.ocfs2 -d | grep ce7c5099-145f-457b-9644-923202450f31
> > > /dev/sdw1 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > > /dev/sdat1 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > > /dev/sdbq1 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > > /dev/sdcn1 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > > /dev/sddk1 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > > /dev/sdeh1 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > > /dev/sdfe1 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > > /dev/sdgb1 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > > /dev/dm-43 ocfs2 ce7c5099-145f-457b-9644-923202450f31
> > >
> > > Daniel
> > >
> > >> -----Original Message-----
> > >> From: Sunil Mushran [mailto:[EMAIL PROTECTED]
> > >> Sent: Wednesday, December 03, 2008 3:01 PM
> > >> To: Daniel Keisling
> > >> Cc: Joel Becker; Sunil Mushran
> > >> Subject: Re: [Ocfs2-users] Another node is heartbeating in our slot! errors with LUN removal/addition
> > >>
> > >> OK... so now know what the problem is. Filed a bugzilla for this.
> > >> http://oss.oracle.com/bugzilla/show_bug.cgi?id=1053
> > >>
> > >> Instead of waiting for the fix, may be quicker if you fix
> > >> this by hand.
> > >>
> > >> Do you have a binary editor? While we could script this, it
> > >> will be safer if you _fix_ this manually.
> > >>
> > >> Say. you had bvi. The steps for 4K blocksize fs would be:
> > >>
> > >> $ bvi -b 8192 -s 512 /dev/sdo
> > >>
> > >> You will see OCFSV2 signature at the very start. Edit 4F (O) to 00 (.).
> > >> Or something other than Oh. In short, we want to clobber the signature.
> > >> This needs to be repeated for each volume below. If you don't see the
> > >> signature, abort. Means the blocksize is less than 4K... say 2K. In that
> > >> case, it will become "bvi -b 4096 -s 512 DEVICE".
> > >>
> > >> You will know it is fixed when "mounted.ocfs2 -d" does not show any
> > >> of these volumes.
> > >>
> > >> Sunil
> > >>
> > >> Daniel Keisling wrote:
> > >>> [EMAIL PROTECTED] ~]# debugfs.ocfs2 -R "stat //heartbeat" /dev/sdo
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>>
> > >>> [EMAIL PROTECTED] ~]# mount -t debugfs debugfs /debug
> > >>> [EMAIL PROTECTED] ~]# debugfs.ocfs2 -R "stat //heartbeat" /dev/sdo
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>>
> > >>> [EMAIL PROTECTED] ~]# for d in o r al ao bi bl cf ci dc df dz ec ew ez ft fw ; do
> > >>> > echo Device /dev/sd${d} ;
> > >>> > debugfs.ocfs2 -R "stat //heartbeat" /dev/sd${d} ;
> > >>> > done ;
> > >>> Device /dev/sdo
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdr
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdal
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdao
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdbi
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdbl
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdcf
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdci
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sddc
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sddf
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sddz
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdec
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdew
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdez
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdft
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> Device /dev/sdfw
> > >>> stat: OCFS2 directory corrupted '//heartbeat'
> > >>> [EMAIL PROTECTED] ~]#
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Sunil Mushran [mailto:[EMAIL PROTECTED]
> > >>>> Sent: Wednesday, December 03, 2008 1:07 PM
> > >>>> To: Daniel Keisling
> > >>>> Subject: Re: [Ocfs2-users] Another node is heartbeating in our slot! errors with LUN removal/addition
> > >>>>
> > >>>> I think I know what the issue is.
> > >>>>
> > >>>> Can you run the following on your box?
> > >>>> $ debugfs.ocfs2 -R "stat //heartbeat" /dev/sdo
> > >>>>
> > >>>> Email me the output.
> > >>>>
> > >>>> While we are at it, why don't you run this script as it may save
> > >>>> us a roundtrip.
> > >>>>
> > >>>> $ for d in o r al ao bi bl cf ci dc df dz ec ew ez ft fw ; do
> > >>>> echo Device /dev/sd${d} ;
> > >>>> debugfs.ocfs2 -R "stat //heartbeat" /dev/sd${d} ;
> > >>>> done ;
> > >>>>
> > >>>> All this does is dump the inode of the heartbeat inode file. I suspect
> > >>>> these devices. Meaning no writing... only reading.
> > >>>>
> > >>>> Sunil
> > >>>>
> > >>>> Daniel Keisling wrote:
> > >>>>> Yes, please do. I have development time on the machine for the next
> > >>>>> couple of days.
> > >>>>>
> > >>>>>> -----Original Message-----
> > >>>>>> From: Sunil Mushran [mailto:[EMAIL PROTECTED]
> > >>>>>> Sent: Tuesday, December 02, 2008 8:16 PM
> > >>>>>> To: Daniel Keisling
> > >>>>>> Cc: [email protected]
> > >>>>>> Subject: Re: [Ocfs2-users] Another node is heartbeating in our slot! errors with LUN removal/addition
> > >>>>>>
> > >>>>>> Yes. Your diagnosis is correct.
> > >>>>>>
> > >>>>>> ocfs2_hb_ctl segfault is not making any sense. The coredump has not
> > >>>>>> been helpful. I may have to send you a debug build. strace also led
> > >>>>>> me down a blind alley.
> > >>>>>>
> > >>>>>> Let me know if you will be willing to copy a debug build of the
> > >>>>>> ocfs2_hb_ctl util. The coredump from that should help us nail down
> > >>>>>> this issue.
> > >>>>>>
> > >>>>>> Sunil
> > >>>
> > >>> ______________________________________________________________________
> > >>> This email transmission and any documents, files or previous email
> > >>> messages attached to it may contain information that is confidential or
> > >>> legally privileged. If you are not the intended recipient or a person
> > >>> responsible for delivering this transmission to the intended recipient,
> > >>> you are hereby notified that you must not read this transmission and
> > >>> that any disclosure, copying, printing, distribution or use of this
> > >>> transmission is strictly prohibited. If you have received this transmission
> > >>> in error, please immediately notify the sender by telephone or return email
> > >>> and delete the original transmission and its attachments without reading
> > >>> or saving in any manner.
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
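
For anyone repeating the manual fix quoted above, it may help to confirm where the signature sits before opening a device in a binary editor. The following is only a sketch, not something from the thread: the function name `ocfs2_sig_blocksize` is made up for illustration, and it checks just the two offsets Sunil mentions (8192 for a 4K blocksize, 4096 for a 2K blocksize). It only reads from the device and never writes to it.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocfs2-tools): read-only check for the
# "OCFSV2" signature at the two offsets mentioned in the thread.
# Prints the matching blocksize (4096 or 2048); returns non-zero if the
# signature is at neither offset.
ocfs2_sig_blocksize() {
    dev=$1
    for bs in 4096 2048; do
        # Per the thread: offset 8192 for a 4K blocksize, 4096 for 2K.
        off=$((bs * 2))
        # Read 6 bytes at that offset; drop NUL bytes so the shell
        # does not complain about them in the command substitution.
        sig=$(dd if="$dev" bs=1 skip="$off" count=6 2>/dev/null | tr -d '\0')
        if [ "$sig" = "OCFSV2" ]; then
            echo "$bs"
            return 0
        fi
    done
    return 1
}
```

Calling `ocfs2_sig_blocksize /dev/sdo` would print 4096 or 2048 if the signature is still present at one of those offsets. If it prints nothing, the signature was not found there, which is the case where Sunil's advice is to abort rather than edit.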
