Hello,
As I said in the previous thread, the 3 nodes are 3 identical KVM
virtual machines running on the same physical host, which has more than
enough resources (48 CPUs, 256 GB RAM, Gbit network).
I have also tried moving them to other physical servers, but that
didn't help.
I also run constant p
Hi,
It seems that something is wrong with the connection between your
cluster nodes, so no DLM messages can be sent out.
This may cause a node to be fenced and thus crash.
Please check your network, including the switch, Ethernet HBA card,
etc.
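For example, something like this from each node can rule out basic
connectivity problems (the peer name and interface below are
assumptions; 7777 is the usual o2net TCP port):
# reachability and the o2net port on each peer
ping -c 3 node2
nc -zv node2 7777
# link health and error counters on the cluster interface (eth0 assumed)
ethtool eth0 | grep -E 'Speed|Link detected'
ip -s link show eth0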
Thanks,
Changwei
On 2017/11/28 18:07, net...@tan
Hello Netbsd,
What was your problem?
dlm_send_remote_convert_request failed, or hung_task_timeout?
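A quick grep of the kernel log should show which one it is, e.g.:
dmesg | grep -E 'dlm_send_remote_convert_request|hung_task'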
Thanks
Gang
>>>
> Hello,
>
> Servers crashed like 20 times since the last time I wrote to the list.
> Today is the last with:
>
> [ 1901.810483] (php-fpm7.0,822,3):dlm_send_remote_convert_r
Hello,
Servers crashed like 20 times since the last time I wrote to the list.
Today is the last with:
[ 1901.810483] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
[ 1901.918314] (php-fpm7.0,822,3):dlm_send_remote_con
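For reference, error -107 is ENOTCONN, i.e. the o2net socket to node 0
was already torn down when the convert request was sent; it can be
confirmed from the kernel headers:
grep -w ENOTCONN /usr/include/asm-generic/errno.h
# -> #define ENOTCONN 107 /* Transport endpoint is not connected */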
Hi,
In some cases, Apache can crash an OCFS2 cluster, because wrongly
defined read or write privileges on a file or folder can provoke a
deadlock in the cluster.
On Fri, Sep 1, 2017 at 4:23 AM, Adi Kriegisch wrote:
> Hi!
>
> > We were experimenting with the newer version of OCFS2 on Debian 9
Hi!
> We were experimenting with the newer version of OCFS2 on Debian 9
> Stretch inside KVM GUESTS.
[...]
> We have 3 nodes, but it is the same with a single node: when we run an apache
> benchmark on the VM, it crashes (becomes unpingable, unreachable, kernel
> crashlog on virtual console) until destr
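Something like the following ApacheBench run is the kind of load in
question (the URL and parameters here are assumptions, not from the
original report):
# heavy concurrent HTTP load against the VM
ab -n 50000 -c 100 http://node1.example.com/index.php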
Hello List,
We were experimenting with the newer version of OCFS2 on Debian 9
Stretch inside KVM GUESTS.
#1 SMP Debian 4.11.6-1~bpo9+1 (2017-07-09) x86_64 GNU/Linux
Kernels we have tried:
vmlinuz-4.11.0-0.bpo.1-amd64
vmlinuz-4.1.1 custom
vmlinuz-4.9.0-3-amd64
We have 3 nodes, but it is t
On 08/21/2015 03:10 AM, Martin Lund wrote:
> Hello,
>
> We have a 3 node OCFS2 cluster, using:
>
> Kernel: 3.16.0-0.bpo.4-amd64
> ii  ocfs2-tools  1.6.4-1+deb7u1  amd64
>     tools for managing OCFS2 cluster filesystems
>
> Today two of the nodes out of th
Hello,
We have a 3 node OCFS2 cluster, using:
Kernel: 3.16.0-0.bpo.4-amd64
ii  ocfs2-tools  1.6.4-1+deb7u1  amd64
tools for managing OCFS2 cluster filesystems
Today two of the 3 nodes had a partial OCFS2-related kernel panic
(see at
On Fri, Mar 1, 2013 at 10:42 PM, Srinivas Eeda wrote:
> Yes, that was the crash I was referring to, which stopped me from testing my
> other patch on mainline. I think the crashes started with some workqueue
> patches introduced by commit 57b30ae77bf00d2318df711ef9a4d2a9be0a3a2a
> Earlier kernels
[ 1481.620253] o2hb: Unable to stabilize heartbeart on region
1352E2692E704EEB8040E5B8FF560997 (vdb)
What this means is that the device is suspect. o2hb writes are not hitting
the disk. vdb is accepting and
acknowledging the write but spitting out something else during the next
read. Heartbeat de
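One way to test that theory is to dump the heartbeat area twice and see
whether the other nodes' writes actually become visible (this assumes
the hb command of debugfs.ocfs2 from ocfs2-tools):
debugfs.ocfs2 -R "hb" /dev/vdb > /tmp/hb.1
sleep 5
debugfs.ocfs2 -R "hb" /dev/vdb > /tmp/hb.2
# live nodes' sequence/timestamp fields should differ between the dumps
diff /tmp/hb.1 /tmp/hb.2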
Yes, that was the crash I was referring to, which stopped me from testing
my other patch on mainline. I think the crashes started with some
workqueue patches introduced by commit
57b30ae77bf00d2318df711ef9a4d2a9be0a3a2a. Earlier kernels should be fine.
Patch https://lkml.org/lkml/2012/10/18/592
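To check whether a given tree already carries that commit, something
like this works in a clone of mainline:
git merge-base --is-ancestor 57b30ae77bf00d2318df711ef9a4d2a9be0a3a2a HEAD \
    && echo "commit present" || echo "commit absent"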
Hi!
Using 3.8.1, OCFS2 crashes while joining nodes to the cluster.
The cluster consists of 10 nodes; while node3 joins, the kernel on node3 crashes.
(Sometimes later...)
See dmesg below.
Is this a known issue? I didn't test older kernels so far.
node1:
[ 1471.881922] o2dlm: Joining domain 1352E2692E
To: "B Leggett"
Cc: "Jürgen Herrmann" , ocfs2-users@oss.oracle.com
Sent: Thursday, June 30, 2011 1:24:18 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
Try setting /proc/sys/kernel/panic_on_oops to 1. It appears you are
getting oopses but the box keeps running
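That is:
echo 1 > /proc/sys/kernel/panic_on_oops
# make it persistent across reboots:
echo "kernel.panic_on_oops = 1" >> /etc/sysctl.conf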
oss.oracle.com
> Sent: Wednesday, June 29, 2011 5:57:19 PM GMT -05:00 US/Canada Eastern
> Subject: Re: [Ocfs2-users] OCFS2 Crash
>
> On Wed, 29 Jun 2011 16:43:09 -0500 (GMT-05:00), B Leggett wrote:
>> That's troubling, these are really static systems. I know anything
>>
Sent: Wednesday, June 29, 2011 5:57:19 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
On Wed, 29 Jun 2011 16:43:09 -0500 (GMT-05:00), B Leggett wrote:
> That's troubling, these are really static systems. I know anything
> can happen, but to inherit a kernel issue two years later seems nuts.
> Not that your analysis is wrong,
>
> - Original Message -
> From: "Sunil Mushran"
> To: "B Leggett"
> Cc: ocfs2-users@oss.oracle.com
> Sent: Wednesday, June 29, 2011 5:23:40 PM GMT -05:00 US/Canada
> Eastern
> Subject: Re: [Ocfs2-users] OCFS2 Crash
>
> You should ping your kernel
build?
- Original Message -
From: "Sunil Mushran"
To: "B Leggett"
Cc: ocfs2-users@oss.oracle.com
Sent: Wednesday, June 29, 2011 5:23:40 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
You should ping your kernel vendor. While this does not look ocfs
c7 40
> 04 00 00 00 00 c3 56 53 8b 70 04 eb 2c 8b 5e 04 83 eb 1c<8b> 43 18 8d 53 04
> e8 6d 3d fc ff 8b 03 e8 a8 12 ff ff 8d 46 08
>
> - Original Message -
> From: "B Leggett"
> To: ocfs2-users@oss.oracle.com
> Sent: Wednesday, June 29, 2011 3:42:42 P
- Original Message -
From: "B Leggett"
To: ocfs2-users@oss.oracle.com
Sent: Wednesday, June 29, 2011 3:42:42 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
For the list, I accidentally sent it direct to Sunil. My apologies for that.
Bruce
- Original
For the list, I accidentally sent it direct to Sunil. My apologies for that.
Bruce
- Original Message -
From: "B Leggett"
To: "Sunil Mushran"
Sent: Wednesday, June 29, 2011 3:40:52 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
Sunil,
I did
1.2.1? That's 5 years old. We've had a few fixes since then. ;)
You have to catch the oops trace to figure out the reason. One way
to get it is by using netconsole. Check the SLES 10 docs to see how to
configure netconsole, or use whatever is recommended for capturing the
oops log in that release.
O
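A typical netconsole setup looks something like this (the IPs,
interface, and MAC below are placeholders):
# on the crashing node: mirror kernel messages to 192.168.1.100:6666
modprobe netconsole netconsole=@/eth0,6666@192.168.1.100/00:11:22:33:44:55
# on the capture box (netcat flag syntax varies by flavor):
nc -u -l 6666 | tee oops.log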
Hi,
I am running OCFS2 1.2.1 on SLES 10, just the stuff right out of the box.
This is a 3-node cluster that's been running for 2 years with just about zero
modification. The storage is a high-end SAN and the transport is iSCSI. We went
two years without an issue and all of a sudden node 1 in t
Hi,
We're running ocfs2-2.6.18-92.el5-1.4.7-1.el5 on a 12-node Red Hat 5.2
64-bit cluster. We've now experienced the following crash twice:
Apr 11 14:25:55 XXX-XXX03 kernel:
(o2net,19027,5):dlm_assert_master_handler:1837 ERROR: DIE! Mastery
assert from 0, but current owner is 1! (O0
Thanks for the quick response,
the bug:
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1319
Regards,
Piotr Teodorowski
On Tuesday 01 of March 2011 02:55:01 Sunil Mushran wrote:
> Thanks for the bug report. Please can you file a bz and attach
> all the message files. Yes, the problem started with t
Thanks for the bug report. Please can you file a bz and attach
all the message files. Yes, the problem started with the hb
timeout on esiprap01. The problem spread to other nodes, possibly
because of a race in migration. A bz will help us track the issue
more easily.
On 02/28/2011 01:46 AM, Pio
Hi,
Two days ago our OCFS2 cluster crashed. On all nodes the filesystem
was no longer reachable.
Here are the logs of two nodes:
1. (ATTENTION: reverse order of log)
http://pastebin.com/u97mcqX6
2. http://pastebin.com/tGizwSu0
Stefan
Hi Guys,
Upon more investigation, it seems that my ext4 fs is getting data
corruption at the FS level as well. It may be something up with iSER and
stgt after all. I'll do a bit more investigating.
Sorry for the trouble.
Matt
Hi,
I'm getting system (and eventually cluster) crashes on intensive disk
writes on Ubuntu Server 10.04 with my OCFS2 file system.
I have an iSER (infiniband) backed shared disk array with OCFS2 on it.
There are 6 nodes in the cluster, and the heartbeat interface is over a
regular 1GigE conn
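For what it's worth, which link carries o2net/heartbeat traffic is
determined by the ip_address each node declares in
/etc/ocfs2/cluster.conf; a minimal sketch with made-up addresses:
node:
        ip_port = 7777
        ip_address = 192.168.10.11
        number = 0
        name = node1
        cluster = ocfs2

cluster:
        node_count = 6
        name = ocfs2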
Does the slab data have to be from right before a crash? Or can we tell
from just 2-3 days of data collection? After one day it appears
certain numbers from slabinfo only go up.
On 1/17/07, Sunil Mushran <[EMAIL PROTECTED]> wrote:
Could be. But I cannot say for sure until I get the slab/m
Could be. But I cannot say for sure until I get the slab/mem data.
Brian Sieler wrote:
Does this appear to be the same issue as the "OOM Killer" issue
previously reported that would be fixed with ocfs2 1.2.4?
On 1/16/07, Sunil Mushran <[EMAIL PROTECTED]> wrote:
Looks to be running out of
Does this appear to be the same issue as the "OOM Killer" issue
previously reported that would be fixed with ocfs2 1.2.4?
On 1/16/07, Sunil Mushran <[EMAIL PROTECTED]> wrote:
Looks to be running out of lowmem.
# date
# cat /proc/meminfo
# cat /proc/slabinfo
Run a script that dumps the above ev
Looks to be running out of lowmem.
# date
# cat /proc/meminfo
# cat /proc/slabinfo
Run a script that dumps the above every 1 to 5 mins. That should
help explain the cause.
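i.e., a minimal sketch (interval and log path are up to you):
#!/bin/sh
# append a timestamped snapshot of meminfo and slabinfo every 5 minutes
while true; do
    date
    cat /proc/meminfo
    cat /proc/slabinfo
    sleep 300
done >> /var/log/mem-snapshots.log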
Brian Sieler wrote:
Using 2-node clustered file system on DELL/EMC SAN/RHEL
2.6.9-34.0.2.ELsmp x86_64.
Config:
O2CB_HEAR
Using 2-node clustered file system on DELL/EMC SAN/RHEL
2.6.9-34.0.2.ELsmp x86_64.
Config:
O2CB_HEARTBEAT_THRESHOLD=30
Kernel param: elevator=deadline (per FAQ)
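For reference, the threshold above counts 2-second heartbeat
iterations: a node is declared dead after (threshold - 1) * 2 seconds,
so 30 allows roughly 58 seconds of missed beats. It is normally set in
the o2cb sysconfig file (path below assumes RHEL):
# /etc/sysconfig/o2cb
O2CB_HEARTBEAT_THRESHOLD=30
# apply with: service o2cb restart (cluster must be offline/unmounted first)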
These log items appear and the server crashes. This has happened twice now
at three-week intervals, each time during a heavy IO operation: