Re: [Ocfs2-users] Another node is heartbeating in our slot! errors with LUN removal/addition

Sunil Mushran Mon, 01 Dec 2008 12:42:03 -0800

So the problem you are encountering is with killing the heartbeat by UUID.
You can also kill it by device name.


By now you have the list of heartbeat regions. To get the device name for
a region, do:

$ cat /sys/kernel/config/cluster/CLUSTERNAME/heartbeat/C43CB881C2C84B09BAC14546BF6DCAD9/dev
sdf1

$ ocfs2_hb_ctl -K -d /dev/sdf1
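When there are many regions, it can help to print the device for every region in one pass. The snippet below is a minimal sketch, assuming configfs is mounted at /sys/kernel/config and uses the layout from the cat command above (cluster/CLUSTERNAME/heartbeat/UUID/dev). It also assumes the dev file holds a bare kernel name such as sdf1, which is what the example output shows. The function name list_hb_regions is hypothetical and is not part of ocfs2-tools.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocfs2-tools): print "<region-uuid> <device>"
# for every o2hb heartbeat region found under configfs.
# Assumed layout: <root>/<cluster>/heartbeat/<region-uuid>/dev
list_hb_regions() {
    root="${1:-/sys/kernel/config/cluster}"
    for devfile in "$root"/*/heartbeat/*/dev; do
        # If the glob matched nothing, the shell returns the pattern itself.
        [ -e "$devfile" ] || continue
        region=$(basename "$(dirname "$devfile")")
        dev=$(cat "$devfile")
        if [ -n "$dev" ]; then
            # Assumption: the dev file holds a bare name like "sdf1".
            printf '%s /dev/%s\n' "$region" "$dev"
        else
            printf '%s (no device)\n' "$region"
        fi
    done
}
```

Once you have identified the stale region's device in the output, kill its heartbeat with ocfs2_hb_ctl -K -d as shown above.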

Now make sure that the device is not mounted. It should not be. If it
is, then you have probably used force-uuid-reset to change the UUID of
an active device. In that case, I see no solution other than a node reset.
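The mount check can be scripted as a guard before the kill. This is a minimal sketch, assuming the local mount table is read from /proc/mounts, whose first column is the mount source. The function name is_mounted is hypothetical. It only detects an exact path match, so a device mounted through another name (for example a multipath alias) would be missed.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the given device path appears as the
# source (first column) of a mount table, exit 1 otherwise.
# Only exact path matches are detected; aliases are not resolved.
is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' "$mounts"
}
```

For example, the following only attempts the kill when /dev/sdf1 is absent from the local mount table:

$ is_mounted /dev/sdf1 || ocfs2_hb_ctl -K -d /dev/sdf1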

But before you do this, I would like some more info.

1. strace -o /tmp/hbctl.out ocfs2_hb_ctl -K -u F5F0522D39FC4EB2824C3E68C0B1D589
2. uname -a
3. rpm -qa | grep ocfs2
4. rpm -qf `which ocfs2_hb_ctl`
5. mounted.ocfs2 -d >/tmp/mounted.out

Thanks
Sunil

Daniel Keisling wrote:
> I wrote a script to easily get the heartbeats that should have been
> killed.  However, I get a segmentation fault every time I try to kill
> the "dead" heartbeats:
>
> [EMAIL PROTECTED] tmp]# mounted.ocfs2 -d | grep -i f5f0 | wc -l
> 0
>
> [EMAIL PROTECTED] tmp]# ocfs2_hb_ctl -K -u F5F0522D39FC4EB2824C3E68C0B1D589
> Segmentation fault (core dumped)
>
> The process is still active:
>
> [EMAIL PROTECTED] tmp]# ps -ef | grep -i f5f0
> root       620   169  0 Nov29 ?        00:00:30 [o2hb-F5F0522D39]
> root     22608 18491  0 14:07 pts/4    00:00:00 grep -i f5f0
>
> Attached is the core.
>
> While I can create and mount snapshot filesystems on my development
> node, a dead heartbeat on one of my production nodes is not letting me
> mount the snapshot for a newly presented filesystem (thus causing our
> backups to fail).  What else can I do?  I really don't want to open an
> SR with Oracle...
>
> Thanks,
>
> Daniel

_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users
