Yes, we are working on it. :)

Alexei_Roudnev wrote:
It's all about the same thing - we need a 'single node' mount mode in
OCFSv2, so that a sysadmin is able to mount the volume despite media
errors and without a working cluster.

(Of course, such a mount should print plenty of warnings before going
through.)
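For what it's worth, later ocfs2-tools releases (1.4+, which postdates the 1.2.1 in this thread - an assumption on my part) grew roughly this feature via tunefs.ocfs2's mount-type switch. A guarded sketch, with the device path taken from the thread:

```shell
DEV=/dev/sda4      # the shared device from this thread
WANT_MODE=local    # target mount type

if command -v tunefs.ocfs2 >/dev/null 2>&1; then
  # -M/--mount-type switches the volume between "cluster" and "local";
  # a local volume mounts without a running o2cb cluster stack.
  tunefs.ocfs2 -M "$WANT_MODE" "$DEV"
  mount -t ocfs2 "$DEV" /data
else
  echo "would run: tunefs.ocfs2 -M $WANT_MODE $DEV (tool unavailable here)"
fi
```

Note this permanently marks the volume as local until switched back, so it is a recovery step, not the warning-heavy read-only mode being asked for.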


----- Original Message ----- From: "Holger Brueckner" <[EMAIL PROTECTED]>
To: "Sunil Mushran" <[EMAIL PROTECTED]>
Cc: <[email protected]>
Sent: Friday, September 15, 2006 1:20 AM
Subject: Re: [Ocfs2-users] self fencing and system panic problem after
forced reboot


i guess i found the solution. while dumping some files with debugfs, it
suddenly stopped working and could not be killed. and guess what, media
error on the drive :-/. funny that a filesystem check succeeds.

anyway thx a lot to those who responded.

holger

On Thu, 2006-09-14 at 11:03 -0700, Sunil Mushran wrote:
Not sure why a power outage should cause this.

Do you have the full stack of the oops? It will show the times taken
in the last 24 operations in the hb thread. That should tell us as to
what is up.

Holger Brueckner wrote:
i just discovered the ls, cd, dump and rdump commands in
debugfs.ocfs2. they work fine :-). nevertheless i would really like to
know why mounting and accessing the volume is not possible anymore.
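For anyone finding this thread later, a sketch of that extraction path (device and destination paths are placeholders from this thread):

```shell
DEV=/dev/sda4              # the device from the fsck output below
DEST=/tmp/ocfs2-recovered  # somewhere on a local, non-shared disk
mkdir -p "$DEST"

if command -v debugfs.ocfs2 >/dev/null 2>&1; then
  # debugfs.ocfs2 opens the device read-only unless -w is given, so it
  # is safe even on a volume that panics the kernel when mounted.
  # -R runs a single debugfs command non-interactively.
  debugfs.ocfs2 -R "ls -l /" "$DEV"            # list the root directory
  debugfs.ocfs2 -R "rdump /etc $DEST" "$DEV"   # recursively copy /etc out
else
  echo "debugfs.ocfs2 not installed; commands shown for illustration"
fi
```

Interactively, the same ls/cd/dump/rdump commands work at the `debugfs:` prompt.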

but thanks for the hint pieter

holger brueckner

On Thu, 2006-09-14 at 14:30 +0200, Pieter Viljoen - MWEB wrote:

Hi Holger

Maybe you should try the fscat tools
(http://oss.oracle.com/projects/fscat/), which provide fsls (to list)
and fscp (to copy) files directly from the device.

I have not tried it yet, so good luck!
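A rough sketch of what that might look like - the argument order is a guess from the project description above, not verified usage:

```shell
DEV=/dev/sda4   # placeholder shared device

if command -v fsls >/dev/null 2>&1; then
  # List the filesystem root straight off the block device, without
  # mounting it; the exact invocation is an assumption here.
  fsls "$DEV" /
else
  echo "fscat tools not installed; see the project page for usage"
fi
```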


Pieter Viljoen


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Holger
Brueckner
Sent: Thursday, September 14, 2006 14:17
To: [email protected]
Subject: Re: [Ocfs2-users] self fencing and system panic problem
after forced reboot

side note: setting HEARTBEAT_THRESHOLD to 30 did not help either.
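For reference, the relation between O2CB_HEARTBEAT_THRESHOLD and the write timeout, as documented for OCFS2 1.2 (the 2-second heartbeat iteration is the documented default):

```shell
# timeout_ms = (threshold - 1) * 2000, with one heartbeat iteration
# every 2 seconds.
hb_timeout_ms() { echo $(( ($1 - 1) * 2000 )); }

hb_timeout_ms 7    # default threshold -> 12000 ms, the value in the panic log
hb_timeout_ms 31   # -> 60000 ms, i.e. a full 60 s window
```

So a threshold of 30 already gives a 58 s window - which is consistent with the eventual diagnosis that the timeout was a symptom of a bad drive, not a tuning problem.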

could it be that the synchronization between the daemons does not
work? (e.g. the daemons think the fs is mounted on some nodes and try
to synchronize, but actually the fs isn't mounted on any node?)

i'm rather clueless now. finding a way to access the data and copy it
to the non-shared partitions would help me a lot.

thx

holger brueckner


On Thu, 2006-09-14 at 13:47 +0200, Holger Brueckner wrote:



hello,

i'm running ocfs2 to provide a shared disk throughout a xen cluster.
this setup was working fine until today, when there was a power outage
and all xen nodes were forcefully shut down. whenever i try to
mount/access the ocfs2 partition, the system panics and reboots:

darks:~# fsck.ocfs2 -y -f /dev/sda4
(617,0):__dlm_print_nodes:377 Nodes in my domain
("5BA3969FC2714FFEAD66033486242B58"):
(617,0):__dlm_print_nodes:381  node 0
Checking OCFS2 filesystem in /dev/sda4:
  label:              <NONE>
  uuid:               5b a3 96 9f c2 71 4f fe ad 66 03 34 86 24 2b 58
  number of blocks:   35983584
  bytes per block:    4096
  number of clusters: 4497948
  bytes per cluster:  32768
  max slots:          4

/dev/sda4 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
Pass 0b: Checking inode allocation chains
Pass 0c: Checking extent block allocation chains
Pass 1: Checking inodes and blocks.
[CLUSTER_ALLOC_BIT] Cluster 295771 is marked in the global cluster
bitmap but it isn't in use.  Clear its bit in the bitmap? y
[CLUSTER_ALLOC_BIT] Cluster 2456870 is marked in the global cluster
bitmap but it isn't in use.  Clear its bit in the bitmap? y
[CLUSTER_ALLOC_BIT] Cluster 2683096 is marked in the global cluster
bitmap but it isn't in use.  Clear its bit in the bitmap? y
Pass 2: Checking directory entries.
Pass 3: Checking directory connectivity.
Pass 4a: checking for orphaned inodes
Pass 4b: Checking inodes link counts.
All passes succeeded.
darks:~# mount /data
(622,0):ocfs2_initialize_super:1326 max_slots for this device: 4
(622,0):ocfs2_fill_local_node_info:1019 I am node 0
(622,0):__dlm_print_nodes:377 Nodes in my domain
("5BA3969FC2714FFEAD66033486242B58"):
(622,0):__dlm_print_nodes:381  node 0
(622,0):ocfs2_find_slot:261 slot 2 is already allocated to this node!
(622,0):ocfs2_find_slot:267 taking node slot 2
(622,0):ocfs2_check_volume:1586 File system was not unmounted
cleanly, recovering volume.
kjournald starting.  Commit interval 5 seconds
ocfs2: Mounting device (8,4) on (node 0, slot 2) with ordered data
mode.
(630,0):ocfs2_replay_journal:1181 Recovering node 2 from slot 0 on
device (8,4)
darks:~# (4,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout
to device sda4 after 12000 milliseconds
(4,0):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all
active regions.
Kernel panic - not syncing: ocfs2 is very sorry to be fencing this
system by panicing

ocfs2-tools    1.2.1-1
kernel         2.6.16-xen (with corresponding ocfs2 compiled into
               the kernel)

i already tried the elevator=deadline scheduler option, with no
effect. any further help debugging this issue is greatly appreciated.
are there any other possibilities to get access to the data from
outside the cluster (obviously while the partition isn't mounted)?

thanks for your help

holger brueckner







_______________________________________________
Ocfs2-users mailing list
[email protected]
http://oss.oracle.com/mailman/listinfo/ocfs2-users

