[Ocfs2-users] fsck.ocfs2 loops + hangs but does not check

2016-03-23 Thread Michael Ulbrich
Hi ocfs2-users,

my first post to this list from yesterday probably didn't get through.

Anyway, I've made some progress in the meantime and may now ask more
specific questions ...

I'm having issues with an 11 TB ocfs2 shared filesystem on Debian Wheezy:

Linux s1a 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux

the kernel modules are:

modinfo ocfs2 -> version: 1.5.0

using stock ocfs2-tools 1.6.4-1+deb7u1 from the distribution.

As an alternative I cloned and built the latest ocfs2-tools from
markfasheh's ocfs2-tools on github which should be version 1.8.4.

The filesystem runs on top of drbd, is roughly 40 % used, and has been
suffering from read-only remounts and hanging clients since the last reboot.
These may be DLM problems, but I suspect they stem from some corrupt on-disk
structures. Before that it all ran stable for months.

This situation made me want to run fsck.ocfs2 and now I wonder how to do
that. The filesystem is not mounted.

With the stock ocfs2-tools 1.6.4:

root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
fsck.ocfs2 1.6.4
Checking OCFS2 filesystem in /dev/drbd1:
  Label:  ocfs2_ASSET
  UUID:   6A1A0189A3F94E32B6B9A526DF9060F3
  Number of blocks:   5557283182
  Block size: 2048
  Number of clusters: 2778641591
  Cluster size:   4096
  Number of slots:16

I'm checking fsck_drbd1.log and find that it is making progress in

Pass 0a: Checking cluster allocation chains

until it reaches "chain 73" and goes into an infinite loop filling the
logfile with breathtaking speed.

With the newly built ocfs2-tools 1.8.4 I get:

root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
fsck.ocfs2 1.8.4
Checking OCFS2 filesystem in /dev/drbd1:
  Label:  ocfs2_ASSET
  UUID:   6A1A0189A3F94E32B6B9A526DF9060F3
  Number of blocks:   5557283182
  Block size: 2048
  Number of clusters: 2778641591
  Cluster size:   4096
  Number of slots:16

Again watching the verbose output in fsck_drbd1.log I find that this
time it proceeds up to

Pass 0a: Checking cluster allocation chains
o2fsck_pass0:1360 | found inode alloc 13 at block 13

and stays there without any further progress. I've terminated this
process after waiting for more than an hour.

Now I'm somewhat lost ... and would very much appreciate it if anybody on
this list would share their knowledge and give me a hint on what to do next.

What could be done to get this file system checked and repaired? Am I
missing something important or do I just have to wait a little bit
longer? Is there a version of ocfs2-tools / fsck.ocfs2 which will
perform as expected?

I'm prepared to upgrade the kernel to 3.16.0-0.bpo.4-amd64 but shy away
from taking that risk without any clue of whether that might solve my
problem ...

Thanks in advance ... Michael Ulbrich



Re: [Ocfs2-users] fsck.ocfs2 loops + hangs but does not check

2016-03-24 Thread Michael Ulbrich
Hi Joseph,

ok, got it! Here's the loop in chain 73:

Group Chain: 73   Parent Inode: 13  Generation: 1172963971
CRC32:    ECC: 
##    Block#        Total   Used    Free    Contig  Size
0     4280773632    15872   11487   4385    1774    1984
1     2583263232    15872   5341    10531   5153    1984
2     4543613952    15872   5329    10543   5119    1984
3     4532662272    15872   10753   5119    5119    1984
4     4539963392    15872   3223    12649   7530    1984
5     4536312832    15872   5219    10653   5534    1984
6     4529011712    15872   6047    9825    3359    1984
7     4525361152    15872   4475    11397   5809    1984
8     4521710592    15872   3182    12690   5844    1984
9     4518060032    15872   5881    9991    5131    1984
10    4236966912    15872   10753   5119    5119    1984
11    4098245632    15872   10756   5116    3388    1984
12    4514409472    15872   8826    7046    5119    1984
13    3441144832    15872   15      15857   9680    1984
14    4404892672    15872   7563    8309    5119    1984
15    4233316352    15872   9398    6474    5114    1984
16    448882        15872   6358    9514    5119    1984
17    3901115392    15872   9932    5940    3757    1984
18    4507108352    15872   6557    9315    6166    1984
19    4083643392    15872   571     15301   4914    1984   <--
20    4510758912    15872   4834    11038   6601    1984
21    4492506112    15872   6532    9340    5119    1984
22    4496156672    15872   10753   5119    5119    1984
23    4503457792    15872   10718   5154    5119    1984
...
154   4083643392    15872   571     15301   4914    1984   <--
155   4510758912    15872   4834    11038   6601    1984
156   4492506112    15872   6532    9340    5119    1984
157   4496156672    15872   10753   5119    5119    1984
158   4503457792    15872   10718   5154    5119    1984
...
289   4083643392    15872   571     15301   4914    1984   <--
290   4510758912    15872   4834    11038   6601    1984
291   4492506112    15872   6532    9340    5119    1984
292   4496156672    15872   10753   5119    5119    1984
293   4503457792    15872   10718   5154    5119    1984

etc.

So the loop begins at record #154 and spans 135 records, right?
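For the record, this is roughly how I located the cycle (a minimal sketch in
Python, not part of ocfs2-tools; the records list below only contains a few
of the rows from the dump above, just to show the idea):

    # Minimal sketch: find where the chain of group descriptors starts repeating.
    # 'records' holds (record number, block number) pairs as read from the dump
    # above; only a few rows are included here.

    def find_chain_loop(records):
        """Return (record where a repeat is first seen, cycle length), or None."""
        first_seen = {}                       # block number -> record where first seen
        for rec, blk in records:
            if blk in first_seen:
                return rec, rec - first_seen[blk]
            first_seen[blk] = rec
        return None

    records = [(19, 4083643392), (20, 4510758912), (153, 4090944512),
               (154, 4083643392)]
    print(find_chain_loop(records))           # -> (154, 135): back to record #19's block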

Will backup fs metadata as soon as I have some external storage at hand.

Thanks a lot so far ... Michael

On 03/24/2016 10:41 AM, Joseph Qi wrote:
> Hi Michael,
> It seems that a dead loop happens in chain 73. You have formatted with a 2K
> block and a 4K cluster size, so each chain should have 1522 or 1521 records.
> But at first glance I cannot figure out which block goes wrong, because
> the output you pasted indicates that all the blocks are different. So I
> suggest you investigate all the blocks which belong to chain 73 and try to
> find out if there is a loop there.
> BTW, have you backed up the metadata using o2image?
> 
> Thanks,
> Joseph
> 
> On 2016/3/24 16:40, Michael Ulbrich wrote:
>> Hi Joseph,
>>
>> thanks a lot for your help. It is very much appreciated!
>>
>> I ran debugfs.ocfs2 from ocfs2-tools 1.6.4 on the mounted file system:
>>
>> root@s1a:~# debugfs.ocfs2 -R 'stat //global_bitmap' /dev/drbd1 >
>> debugfs_drbd1.log 2>&1
>>
>> Inode: 13   Mode: 0644   Generation: 1172963971 (0x45ea0283)
>> FS Generation: 1172963971 (0x45ea0283)
>> CRC32:    ECC: 
>> Type: Regular   Attr: 0x0   Flags: Valid System Allocbitmap Chain
>> Dynamic Features: (0x0)
>> User: 0 (root)   Group: 0 (root)   Size: 11381315956736
>> Links: 1   Clusters: 2778641591
>> ctime: 0x54010183 -- Sat Aug 30 00:41:07 2014
>> atime: 0x54010183 -- Sat Aug 30 00:41:07 2014
>> mtime: 0x54010183 -- Sat Aug 30 00:41:07 2014
>> dtime: 0x0 -- Thu Jan  1 01:00:00 1970
>> ctime_nsec: 0x -- 0
>> atime_nsec: 0x -- 0
>> mtime_nsec: 0x -- 0
>> Refcount Block: 0
>> Last Extblk: 0   Orphan Slot: 0
>> Sub Alloc Slot: Global   Sub Alloc Bit: 7
>> Bitmap Total: 2778641591   Used: 1083108631   Free: 1695532960
>> Clusters per Group: 15872   Bits per Cluster: 1
>> Count: 115   Next Free Rec: 115
>> ##   Total      Used      Free       Block#
>> 0    24173056   9429318   14743738   4533995520
>> 1    24173056   9421663   14751393   4548629504
>> 2    24173056   9432421   14740635   4588817408
>> 3    24173056   9427533   14745523   4548692992
>> 4    24173056   9433978   14739078   4508568576
>> 5    24173056   9436974   14736082   4636369920
>> 6    24173056   942
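For what it's worth, Joseph's expected chain length can be cross-checked from
the values quoted above (a quick back-of-the-envelope, nothing more):

    # Rough check of the expected chain length, using the //global_bitmap
    # values quoted above.
    clusters         = 2778641591    # Clusters / Bitmap Total
    clusters_per_grp = 15872         # Clusters per Group
    chains           = 115           # Count

    groups = -(-clusters // clusters_per_grp)    # ceiling division -> 175066 groups
    print(groups, groups / chains)               # 175066 groups, ~1522.3 per chain

So a healthy chain should hold on the order of 1522 group descriptors.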

Re: [Ocfs2-users] fsck.ocfs2 loops + hangs but does not check

2016-03-24 Thread Michael Ulbrich
Hi Joseph,

thanks for this information, although it does not sound too optimistic ...

So, if I understand you correctly, if we had had a metadata backup from
o2image _before_ the crash, we could have looked up the missing info to
remove the loop from group chain 73, right?

But how could the loop issue be fixed while at the same time minimizing the
damage to the data? There is a recent file-level backup from which damaged
or missing files could be restored later.

151   4054438912    15872   2152    13720   10606   1984
152   4094595072    15872   10753   5119    5119    1984
153   4090944512    15872   1818    14054   9646    1984   <--
154   4083643392    15872   571     15301   4914    1984
155   4510758912    15872   4834    11038   6601    1984
156   4492506112    15872   6532    9340    5119    1984

Could you describe a "brute force" way to dd out and edit record #153 so
as to remove the loop while minimizing potential data loss, so that fsck
would have a chance to complete and fix the remaining issues?
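In case it helps to make this concrete, this is the kind of thing I have in
mind for the "dd out" part (a read-only sketch; the block number and the 2K
block size are taken from the output in this thread, and the bg_next_group
offset is only my reading of struct ocfs2_group_desc in ocfs2_fs.h, to be
verified):

    # Read-only sketch of the "dd out" step: dump the group descriptor block of
    # record #153 for offline inspection.

    import struct

    DEVICE     = "/dev/drbd1"
    BLOCK_SIZE = 2048           # fsck reports "Block size: 2048"
    GD_BLKNO   = 4090944512     # block of record #153 in chain 73

    with open(DEVICE, "rb") as dev:
        dev.seek(GD_BLKNO * BLOCK_SIZE)
        desc = dev.read(BLOCK_SIZE)

    with open("group_desc_153.bin", "wb") as out:    # keep an untouched copy
        out.write(desc)

    print("signature:", desc[:8])    # a group descriptor should start with b'GROUP01'
    # bg_next_group is a little-endian u64 in struct ocfs2_group_desc; offset 24
    # is my reading of ocfs2_fs.h -- please double-check before relying on it.
    print("bg_next_group:", struct.unpack_from("<Q", desc, 24)[0])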

Thanks a lot for your help ... Michael

On 03/24/2016 02:10 PM, Joseph Qi wrote:
> Hi Michael,
> So I think the block of record #153 is the one that goes wrong: its next
> pointer points to block 4083643392 of record #19.
> But the problem is that we don't know the right contents of the block of
> record #153; otherwise we could dd it out, edit it and then dd it back in
> to fix it.
> 
> Thanks,
> Joseph
> 
> On 2016/3/24 18:38, Michael Ulbrich wrote:
>> Hi Joseph,
>>
>> ok, got it! Here's the loop in chain 73:
>>
>> Group Chain: 73   Parent Inode: 13  Generation: 1172963971
>> CRC32:    ECC: 
>> ##    Block#        Total   Used    Free    Contig  Size
>> 0     4280773632    15872   11487   4385    1774    1984
>> 1     2583263232    15872   5341    10531   5153    1984
>> 2     4543613952    15872   5329    10543   5119    1984
>> 3     4532662272    15872   10753   5119    5119    1984
>> 4     4539963392    15872   3223    12649   7530    1984
>> 5     4536312832    15872   5219    10653   5534    1984
>> 6     4529011712    15872   6047    9825    3359    1984
>> 7     4525361152    15872   4475    11397   5809    1984
>> 8     4521710592    15872   3182    12690   5844    1984
>> 9     4518060032    15872   5881    9991    5131    1984
>> 10    4236966912    15872   10753   5119    5119    1984
>> 11    4098245632    15872   10756   5116    3388    1984
>> 12    4514409472    15872   8826    7046    5119    1984
>> 13    3441144832    15872   15      15857   9680    1984
>> 14    4404892672    15872   7563    8309    5119    1984
>> 15    4233316352    15872   9398    6474    5114    1984
>> 16    448882        15872   6358    9514    5119    1984
>> 17    3901115392    15872   9932    5940    3757    1984
>> 18    4507108352    15872   6557    9315    6166    1984
>> 19    4083643392    15872   571     15301   4914    1984   <--
>> 20    4510758912    15872   4834    11038   6601    1984
>> 21    4492506112    15872   6532    9340    5119    1984
>> 22    4496156672    15872   10753   5119    5119    1984
>> 23    4503457792    15872   10718   5154    5119    1984
>> ...
>> 154   4083643392    15872   571     15301   4914    1984   <--
>> 155   4510758912    15872   4834    11038   6601    1984
>> 156   4492506112    15872   6532    9340    5119    1984
>> 157   4496156672    15872   10753   5119    5119    1984
>> 158   4503457792    15872   10718   5154    5119    1984
>> ...
>> 289   4083643392    15872   571     15301   4914    1984   <--
>> 290   4510758912    15872   4834    11038   6601    1984
>> 291   4492506112    15872   6532    9340    5119    1984
>> 292   4496156672    15872   10753   5119    5119    1984
>> 293   4503457792    15872   10718   5154    5119    1984
>>
>> etc.
>>
>> So the loop begins at record #154 and spans 135 records, right?
>>
>> Will backup fs metadata as soon as I have some external storage at hand.
>>
>> Thanks a lot so far ... Michael
>>
>> On 03/24/2016 10:41 AM, Joseph Qi wrote:
>>> Hi Michael,
>>> It seems that a dead loop happens in chain 73. You have formatted with a 2K
>>> block and a 4K cluster size, so each chain should have 1522 or 1521 records.
>>> But at first glance I cannot figure out which block goes wrong, because
>>> the output you pasted indicates that all the blocks are different

Re: [Ocfs2-users] fsck.ocfs2 loops + hangs but does not check

2016-03-24 Thread Michael Ulbrich
       5552533504    15872   1       15871   15871   1984
1522   5556184064    15872   1       15871   15871   1984

... all following group chains are similarly structured up to #73 which
looks as follows:

Group Chain: 73   Parent Inode: 13  Generation: 1172963971
CRC32:    ECC: 
##        Block#        Total   Used    Free    Contig  Size
0         2583263232    15872   5341    10531   5153    1984
1         4543613952    15872   5329    10543   5119    1984
2         4532662272    15872   10753   5119    5119    1984
3         4539963392    15872   3223    12649   7530    1984
4         4536312832    15872   5219    10653   5534    1984
5         4529011712    15872   6047    9825    3359    1984
6         4525361152    15872   4475    11397   5809    1984
7         4521710592    15872   3182    12690   5844    1984
8         4518060032    15872   5881    9991    5131    1984
9         4236966912    15872   10753   5119    5119    1984
...
2059651   4299026432    15872   4334    11538   4816    1984
2059652   4087293952    15872   7003    8869    2166    1984
2059653   4295375872    15872   6626    9246    5119    1984
2059654   4288074752    15872   509     15363   9662    1984
2059655   4291725312    15872   6151    9721    5119    1984
2059656   4284424192    15872   10052   5820    5119    1984
2059657   4277123072    15872   7383    8489    5120    1984
2059658   4273472512    15872   14      15858   5655    1984
2059659   4269821952    15872   2637    13235   7060    1984
2059660   4266171392    15872   10758   5114    3674    1984
...

Assuming this would go on forever, I stopped debugfs.ocfs2.

With debugfs.ocfs2 from ocfs2-tools 1.8.4 I get an identical result.

Please let me know if I can provide any further information and help to
fix this issue.

Thanks again + Best regards ... Michael

On 03/24/2016 01:30 AM, Joseph Qi wrote:
> Hi Michael,
> Could you please use debugfs to check the output?
> # debugfs.ocfs2 -R 'stat //global_bitmap' 
> 
> Thanks,
> Joseph
> 
> On 2016/3/24 6:38, Michael Ulbrich wrote:
>> Hi ocfs2-users,
>>
>> my first post to this list from yesterday probably didn't get through.
>>
>> Anyway, I've made some progress in the meantime and may now ask more
>> specific questions ...
>>
>> I'm having issues with an 11 TB ocfs2 shared filesystem on Debian Wheezy:
>>
>> Linux s1a 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
>>
>> the kernel modules are:
>>
>> modinfo ocfs2 -> version: 1.5.0
>>
>> using stock ocfs2-tools 1.6.4-1+deb7u1 from the distribution.
>>
>> As an alternative I cloned and built the latest ocfs2-tools from
>> markfasheh's ocfs2-tools on github which should be version 1.8.4.
>>
>> The filesystem runs on top of drbd, is roughly 40 % used, and has been
>> suffering from read-only remounts and hanging clients since the last reboot.
>> These may be DLM problems, but I suspect they stem from some corrupt on-disk
>> structures. Before that it all ran stable for months.
>>
>> This situation made me want to run fsck.ocfs2 and now I wonder how to do
>> that. The filesystem is not mounted.
>>
>> With the stock ocfs2-tools 1.6.4:
>>
>> root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
>> fsck.ocfs2 1.6.4
>> Checking OCFS2 filesystem in /dev/drbd1:
>>   Label:  ocfs2_ASSET
>>   UUID:   6A1A0189A3F94E32B6B9A526DF9060F3
>>   Number of blocks:   5557283182
>>   Block size: 2048
>>   Number of clusters: 2778641591
>>   Cluster size:   4096
>>   Number of slots:16
>>
>> I'm checking fsck_drbd1.log and find that it is making progress in
>>
>> Pass 0a: Checking cluster allocation chains
>>
>> until it reaches "chain 73" and goes into an infinite loop filling the
>> logfile with breathtaking speed.
>>
>> With the newly built ocfs2-tools 1.8.4 I get:
>>
>> root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
>> fsck.ocfs2 1.8.4
>> Checking OCFS2 filesystem in /dev/drbd1:
>>   Label:  ocfs2_ASSET
>>   UUID:   6A1A0189A3F94E32B6B9A526DF9060F3
>>   Number of blocks:   5557283182
>>   Block size: 2048
>>   Number of clusters: 2778641591
>>   Cluster size:   4096
>>   Number of slots:16
>>
>> Again watching the verbose output in fsck_drbd1.log I find that this
>> time it proceeds up to
>>
>> Pass 0a: Checking cluster allocation chains
>> o2fsck_pass0:1360 | found inode alloc 13 at block 13
>>
>> and stays there wi

Re: [Ocfs2-users] fsck.ocfs2 loops + hangs but does not check

2016-03-25 Thread Michael Ulbrich
Joseph,

thanks again for your help!

Currently I'm dumping 4 TB of data from the broken ocfs2 device to an
external disk. I have shut down the cluster and have the fs mounted
read-only on a single node. It seems that the data structures are still
intact and that the file system problems are confined to internal data
areas (DLM?) which are not in use in the single-node r/o mount case.

Will create a new ocfs2 device and restore the data later.

Besides taking metadata backups with o2image, is there any advice you
would give to avoid similar situations in the future?

All the best ... Michael

On 03/25/2016 01:36 AM, Joseph Qi wrote:
> Hi Michael,
> 
> On 2016/3/24 21:47, Michael Ulbrich wrote:
>> Hi Joseph,
>>
>> thanks for this information, although it does not sound too optimistic ...
>>
>> So, if I understand you correctly, if we had had a metadata backup from
>> o2image _before_ the crash, we could have looked up the missing info to
>> remove the loop from group chain 73, right?
> If we have a metadata backup, we can use o2image to restore it, but
> this may lose some data.
> 
>>
>> But how could the loop issue be fixed while at the same time minimizing the
>> damage to the data? There is a recent file-level backup from which damaged
>> or missing files could be restored later.
>>
>> 151   4054438912    15872   2152    13720   10606   1984
>> 152   4094595072    15872   10753   5119    5119    1984
>> 153   4090944512    15872   1818    14054   9646    1984   <--
>> 154   4083643392    15872   571     15301   4914    1984
>> 155   4510758912    15872   4834    11038   6601    1984
>> 156   4492506112    15872   6532    9340    5119    1984
>>
>> Could you describe a "brute force" way to dd out and edit record #153 so
>> as to remove the loop while minimizing potential data loss, so that fsck
>> would have a chance to complete and fix the remaining issues?
> This is dangerous unless we know exactly what info the block should
> store.
> 
> My idea is to find out the actual block of record #154 and let block
> 4090944512 of record #153 point to it. This must be a bit complicated
> and should be done with a deep understanding of the disk layout.
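A sketch of what such an edit could look like, purely for illustration and
only once the correct successor block is actually known (the field offset is
an assumption taken from struct ocfs2_group_desc in ocfs2_fs.h; as Joseph
says, writing anything like this is dangerous):

    # Purely illustrative: rewrite bg_next_group of the descriptor at GD_BLKNO so
    # the chain continues at NEW_NEXT instead of looping back.  NEW_NEXT is a
    # placeholder -- the whole problem is that the correct value is unknown --
    # and the field offset must be verified against ocfs2_fs.h first.

    import struct

    DEVICE            = "/dev/drbd1"
    BLOCK_SIZE        = 2048
    GD_BLKNO          = 4090944512    # descriptor of record #153
    NEW_NEXT          = 0             # placeholder for the correct next group block
    NEXT_GROUP_OFFSET = 24            # assumed offset of bg_next_group

    with open(DEVICE, "r+b") as dev:
        dev.seek(GD_BLKNO * BLOCK_SIZE + NEXT_GROUP_OFFSET)
        dev.write(struct.pack("<Q", NEW_NEXT))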
> 
> I have gone through the fsck.ocfs2 patches and found that the following
> may help: commit efca4b0f2241 (Break a chain loop in group desc).
> But as you said, you have already upgraded to version 1.8.4, so I'm sorry,
> currently I don't have a better idea.
> 
> Thanks,
> Joseph
>>
>> Thanks a lot for your help ... Michael
>>
>> On 03/24/2016 02:10 PM, Joseph Qi wrote:
>>> Hi Michael,
>>> So I think the block of record #153 is the one that goes wrong: its next
>>> pointer points to block 4083643392 of record #19.
>>> But the problem is that we don't know the right contents of the block of
>>> record #153; otherwise we could dd it out, edit it and then dd it back in
>>> to fix it.
>>>
>>> Thanks,
>>> Joseph
>>>
>>> On 2016/3/24 18:38, Michael Ulbrich wrote:
>>>> Hi Joseph,
>>>>
>>>> ok, got it! Here's the loop in chain 73:
>>>>
>>>> Group Chain: 73   Parent Inode: 13  Generation: 1172963971
>>>> CRC32:    ECC: 
>>>> ##    Block#        Total   Used    Free    Contig  Size
>>>> 0     4280773632    15872   11487   4385    1774    1984
>>>> 1     2583263232    15872   5341    10531   5153    1984
>>>> 2     4543613952    15872   5329    10543   5119    1984
>>>> 3     4532662272    15872   10753   5119    5119    1984
>>>> 4     4539963392    15872   3223    12649   7530    1984
>>>> 5     4536312832    15872   5219    10653   5534    1984
>>>> 6     4529011712    15872   6047    9825    3359    1984
>>>> 7     4525361152    15872   4475    11397   5809    1984
>>>> 8     4521710592    15872   3182    12690   5844    1984
>>>> 9     4518060032    15872   5881    9991    5131    1984
>>>> 10    4236966912    15872   10753   5119    5119    1984
>>>> 11    4098245632    15872   10756   5116    3388    1984
>>>> 12    4514409472    15872   8826    7046    5119    1984
>>>> 13    3441144832    15872   15      15857   9680    1984
>>>> 14    4404892672    15872   7563    8309    5119    1984
>>>> 15    4233316352    15872   9398    6474    5114    1984
>>>> 16    448882        15872   6358    9514    5119    1984

[Ocfs2-users] Mixed mounts w/ different physical block sizes (long post)

2017-09-18 Thread Michael Ulbrich
Hi again,

chatting with a helpful person on the #ocfs2 IRC channel this morning, I was
encouraged to cross-post to ocfs2-devel. For historical background and
further details please see my two previous posts to ocfs2-users from last
week, which are unanswered so far.

Based on my current state of investigation I changed the subject from

"Node 8 doesn't mount / Wrong slot map assignment" to the current "Mixed
mounts ..."

Here we go:

I've learnt that large hard disks increasingly come formatted with a 4k
physical block size.

Now I've created an ocfs2 shared file system on top of drbd on a RAID1
of two 6 TB disks with such a 4k physical block size. File system creation
was done on a hypervisor which actually saw the device as having a 4k
physical sector size.
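To compare what each kernel actually reports for the device, a small sketch
reading the standard sysfs queue attributes (the device name is just an
example):

    # Print the logical and physical block sizes the kernel reports for a device,
    # e.g. to compare what a hypervisor and a VM see for the same backing storage.

    def block_sizes(dev):
        base = f"/sys/block/{dev}/queue/"
        with open(base + "logical_block_size") as f:
            logical = int(f.read())
        with open(base + "physical_block_size") as f:
            physical = int(f.read())
        return logical, physical

    print(block_sizes("vdc"))   # e.g. (512, 512) in a VM vs. (512, 4096) on the hypervisor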

I'm using the default o2cb cluster stack. Version is ocfs2 1.6.4 on
stock Debian 8.

A node (numbered "1" in cluster.conf) mounting this device with 4k
physical blocks produces a strange "times 8" numbering when checking the
heartbeat debug info with 'echo "hb" | debugfs.ocfs2 -n /dev/drbd1':

hb
node: node  seq   generation checksum
   8:1 59bfd253 00bfa1b63f30e494 c518c55a

I'm not sure why the first 2 columns are named "node:" and "node", but I
assume the first "node:" is an index into some internal data structure
(slot map? heartbeat region?) while the second "node" column shows the
actual node number as given in cluster.conf.
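To keep an eye on that mapping while mounting node after node, a quick
sketch that parses the first two columns of the hb dump above and flags
nodes whose index differs from their configured number:

    # Parse 'hb' dump lines of the form "   8:1 59bfd253 ..." into
    # (heartbeat index, configured node number) pairs and flag mismatches.

    def parse_hb(text):
        pairs = []
        for line in text.splitlines():
            line = line.strip()
            if not line or ":" not in line or line.startswith("node"):
                continue                      # skip header and empty lines
            first = line.split()[0]           # e.g. "8:1"
            idx, node = (int(x) for x in first.split(":"))
            pairs.append((idx, node))
        return pairs

    sample = """node: node  seq   generation checksum
       8:1 59bfd253 00bfa1b63f30e494 c518c55a"""
    for idx, node in parse_hb(sample):
        if idx != node:
            print(f"node {node} is heartbeating at index {idx}")   # -> node 1 at index 8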

Now a second node mounts the shared file system, again as a 4k block device:

hb
node: node  seq   generation checksum
   8:1 59bfd36a 00bfa1b63f30e494 d4f79d63
  16:2 59bfd369 7acf8521da342228 4b8cd74d

As it actually happened in my setup of a two-node cluster with 2
hypervisors and 3 virtual machines on top of each (8 nodes in total), when
mounting the fs on the first virtual machine with node number 3 we get:

hb
node: node  seq   generation checksum
   3:3 59bfd413 59eb77b4db07884b 87a5057d
   8:1 59bfd412 00bfa1b63f30e494 e782d86e
  16:2 59bfd413 7acf8521da342228 cd48c018

Uhm, ... wait ... 3 ??

Mounting on further VMs (nodes 4, 5, 6 and 7) leads to:

hb
node: node  seq   generation checksum
   3:3 59bfd413 59eb77b4db07884b 87a5057d
   4:4 59bfd413 debf95d5ff50dc10 3839c791
   5:5 59bfd414 529a98c758325d5b 60080c42
   6:6 59bfd412 14acfb487fa8c8b8 f54cef9d
   7:7 59bfd413 4d2d36de0b0d6b2e 3f1ad275
   8:1 59bfd412 00bfa1b63f30e494 e782d86e
  16:2 59bfd413 7acf8521da342228 cd48c018

Up to this point I did not notice any error or warning in the machines'
console or kernel logs.

And then trying to mount on node 8 finally there's an error:

kern.log node 1:

(o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is
heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c,
0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)

kern.log node 8:

ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
(o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is
heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c,
0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)

(the actual seq and generation values are not from the hb debug dump above)

Now we have a conflict on slot 8.

When I encountered this error for the first time, I didn't know about
heartbeat debug info, slot maps or heartbeat regions and had no idea
what might have gone wrong so I started experimenting and found a
"solution" by swapping nodes 1 <-> 8 in cluster.conf. This leads to the
following layout of the heartbeat region (?):

hb
node: node  seq   generation checksum
   1:1 59bfd412 00bfa1b63f30e494 e782d86e
   3:3 59bfd413 59eb77b4db07884b 87a5057d
   4:4 59bfd413 debf95d5ff50dc10 3839c791
   5:5 59bfd414 529a98c758325d5b 60080c42
   6:6 59bfd412 14acfb487fa8c8b8 f54cef9d
   7:7 59bfd413 4d2d36de0b0d6b2e 3f1ad275
  16:2 59bfd413 7acf8521da342228 cd48c018
  64:8 59bfd413 73a63eb550a33095 f4e074d1

Voila - all 8 nodes mounted, problem solved - let's continue with
getting this cluster ready for production ...

As it turned out, this was in no way a stable configuration: after a
few weeks, spurious reboots (fencing peer) started to happen (drbd losing
its replication connection, all kinds of weird kernel oopses and panics
from drbd and ocfs2). Reboots were usually preceded by bursts of errors like:

Sep 11 00:01:27 web1 kernel: [ 9697.644436]
(o2hb-10254DCA50,515,1):o2hb_check_own_slot:582 ERROR: Heartbeat
sequence mismatch on device (vdc): expected(3:0x743493e99d19e721,
0x59b5b635), 

[Ocfs2-users] Node 8 doesn't mount / Wrong slot map assignment?

2017-09-13 Thread Michael Ulbrich
Hi all,

we have a small (?) problem with a 2-node cluster on Debian 8:

Linux h1b 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2+deb8u2 (2017-06-26)
x86_64 GNU/Linux

ocfs2-tools 1.6.4-3

Two ocfs2 filesystems (drbd0 600 GB w/ 8 slots and drbd1 6 TB w/ 6
slots) are created on top of drbd w/ 4k block and cluster size,
'max_features' enabled.

cluster.conf assigns sequential node numbers 1 - 8. Nodes 1, 2 are the
hypervisors. Nodes 3, 4, 5 are VMs on node 1. Nodes 6, 7, 8 the
corresponding VMs on node 2.

VMs all run Debian 8 as well:

Linux srv2 3.16.0-4-amd64 #1 SMP Debian 3.16.39-1 (2016-12-30) x86_64
GNU/Linux

When mounting drbd0 in order of increasing node numbers and concurrently
watching the 'hb' output from debugfs.ocfs2, we get a clean slot map (?):

hb
node: node  seq   generation checksum
   1:1 59b8d94a fa60f0d8423590d9 edec9643
   2:2 59b8d94c aca059df4670f467 994e3458
   3:3 59b8d949 f03dc9ba8f27582c d4473fc2
   4:4 59b8d94b df5bbdb756e757f8 12a198eb
   5:5 59b8d94a 1af81d94a7cb681b 91fba906
   6:6 59b8d94b 104538f30cdb35fa 8713e798
   7:7 59b8d94b 195658c9fb8ca7f9 5e54edf6
   8:8 59b8d949 dc6bfb46b9cf1ac3 de7a8757

Device drbd1 in contrast yields the following table after mounting on
nodes 1, 2:

hb
node: node  seq   generation checksum
   8:1 59b8d9ba 73a63eb550a33095 f4e074d1
  16:2 59b8d9b9 5c7504c05637983e 07d696ec

Proceeding with the drbd1 mounts on nodes 3, 5, 6 leads us to:

hb
node: node  seq   generation checksum
   3:3 59b8da3b 9443b4b209b16175 f2cc87ec
   5:5 59b8da3c 4b742f709377466f 3ac41cf3
   6:6 59b8da3b d96e2de0a55514f6 335a4d90
   8:1 59b8da3c 73a63eb550a33095 2312c1c4
  16:2 59b8da3d 5c7504c05637983e 659571a1

The problem arises when trying to mount node 8 since its slot is already
occupied by node 1:

kern.log node 1:

(o2hb-0AEE381A14,50990,4):o2hb_check_own_slot:582 ERROR: Another node is
heartbeating on device (drbd1): expected(1:0x18acf7b0b3e5544c,
0x59b8445c), ondisk(8:0xb91302db72a65364, 0x59b8445b)

kern.log node 8:

ocfs2: Mounting device (254,16) on (node 8, slot 7) with ordered data mode.
(o2hb-0AEE381A14,518,1):o2hb_check_own_slot:582 ERROR: Another node is
heartbeating on device (vdc): expected(8:0x18acf7b0b3e5544c,
0x59b8445c), ondisk(1:0x18acf7b0b3e5544c, 0x59b8445c)

This can be "fixed" by exchanging node numbers 1 and 8 in cluster.conf.
Then node 8 will be assigned slot 8, node 2 stays in slot 16, 3 to 7 as
expected. There is no node 16 configured so there's no conflict. But
since we experience some other so far not explainable instabilities with
this ocfs2 device / system during operation further down the road we
decided to take care of and try to fix this issue first.

Somehow the failure is reminiscent of bit-shift or masking problems:

1 << 3 = 8
2 << 3 = 16

But then again - what do I know ...
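To state the suspicion more precisely: the two hypervisors (nodes 1 and 2)
show up at node * 8, i.e. node * (4096 / 512), which fits the mixed
4k/512-byte sector picture described in the post above, while the VMs land
on their own numbers, so node 8 collides with node 1 at index 8. A toy model
of that observation (a hypothesis only, not a description of the real o2hb
code):

    # Toy model of the observed heartbeat indices -- a hypothesis, not a
    # description of the real o2hb implementation.  The hypervisors (nodes 1
    # and 2) appear shifted by a factor of 8 (= 4096 / 512); the VMs do not.

    HYPERVISORS = {1, 2}

    def observed_index(node):
        return node * 8 if node in HYPERVISORS else node

    taken = {}
    for node in range(1, 9):
        idx = observed_index(node)
        if idx in taken:
            print(f"collision: node {node} and node {taken[idx]} both map to index {idx}")
        taken[idx] = node
    # -> collision: node 8 and node 1 both map to index 8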

Tried so far:

A. Created the offending file system with 8 slots instead of 6 -> same issue.
B. Set features to 'default' (disables the 'extended-slotmap' feature) ->
same issue.

We'd very much appreciate any comments on this. Has anything similar
ever been experienced before? Are we completely missing something
important here?

If there's a fix already out for this, any pointers (src files / commits)
to where to look would be greatly appreciated.

Thanks in advance + Best regards ... Michael U.
