Hello,
As I said in the previous thread, the 3 nodes are 3 identical KVM
virtual machines running on the same physical host, which has more than
enough resources (48 CPUs, 256 GB RAM, Gbit network).
I have also tried to move them to other physical servers, but that
didn't help.
I also run constant
Hi,
It seems that something is wrong with the connection between your
cluster nodes, so no DLM messages can be sent out.
This may cause a node to be fenced and thus crash.
Please check your network condition, including the switch, Ethernet/HBA
card, etc.
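For reference, error -107 in the dlm_send_remote_convert_request
messages below is ENOTCONN ("Transport endpoint is not connected"). As
a quick sanity check of the o2net TCP links between the nodes (this
assumes the default o2net port 7777; the peer IP is a placeholder):
# ss -tn | grep 7777      (should show ESTABLISHED connections to each peer node)
# ping -c 3 192.168.0.2   (basic reachability over the cluster interface)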
Thanks,
Changwei
Hello Netbsd,
What was your problem?
dlm_send_remote_convert_request failed, or hung_task_timeout?
Thanks
Gang
Hello,
Servers crashed like 20 times since the last time I wrote to the list.
Today is the last with:
[ 1901.810483] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420
ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0
[ 1901.918314]
Hi,
in some cases, Apache can crash an OCFS2 cluster, because misdefined
read or write privileges on a file or folder can provoke a deadlock in
the cluster.
On Fri, Sep 1, 2017 at 4:23 AM, Adi Kriegisch wrote:
Hi!
> We were experimenting with the newer version of OCFS2 on Debian 9
> Stretch inside KVM GUESTS.
[...]
> We have 3 nodes, but it is the same with a single node: when we run an
> Apache benchmark on the VM, it crashes (becomes unpingable, unreachable,
> kernel crashlog on virtual console) until
Hello List,
We were experimenting with the newer version of OCFS2 on Debian 9
Stretch inside KVM GUESTS.
#1 SMP Debian 4.11.6-1~bpo9+1 (2017-07-09) x86_64 GNU/Linux
Kernels we have tried:
vmlinuz-4.11.0-0.bpo.1-amd64
vmlinuz-4.1.1 custom
vmlinuz-4.9.0-3-amd64
We have 3 nodes, but it is
On 08/21/2015 03:10 AM, Martin Lund wrote:
Hello,
We have a 3 node OCFS2 cluster, using:
Kernel: 3.16.0-0.bpo.4-amd64
ii ocfs2-tools 1.6.4-1+deb7u1amd64
tools for managing OCFS2 cluster filesystems
Today two of the 3 nodes had a partial OCFS2-related kernel panic
(see at
[ 1481.620253] o2hb: Unable to stabilize heartbeart on region
1352E2692E704EEB8040E5B8FF560997 (vdb)
What this means is that the device is suspect. o2hb writes are not
hitting the disk. vdb is accepting and acknowledging the write but
spitting out something else during the next read. Heartbeat
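One crude, read-only way to test for that kind of misbehavior is to
read the same blocks twice with the page cache bypassed and compare
checksums (the offset and count here are arbitrary; pick a quiescent
region of the device):
# dd if=/dev/vdb bs=4096 skip=1024 count=16 iflag=direct 2>/dev/null | md5sum
# dd if=/dev/vdb bs=4096 skip=1024 count=16 iflag=direct 2>/dev/null | md5sum
If the checksums differ while nothing is writing to that region, the
device is returning inconsistent data.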
Sent: Wednesday, June 29, 2011 5:57:19 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
On Wed, 29 Jun 2011 16:43:09 -0500 (GMT-05:00), B Leggett wrote:
That's troubling, these are really static systems. I know anything
can happen, but to inherit a kernel issue two years later seems nuts.
Not that your analysis is wrong, just blows
To: B Leggett blegg...@ngent.com
Cc: Jürgen Herrmann juergen.herrm...@xlhost.de, ocfs2-users@oss.oracle.com
Sent: Thursday, June 30, 2011 1:24:18 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
Try setting /proc/sys/kernel/panic_on_oops to 1. It appears you are
getting oopses
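For example (the second line, assuming your sysctl config lives in
/etc/sysctl.conf, just makes the setting persist across reboots):
# echo 1 > /proc/sys/kernel/panic_on_oops
# echo kernel.panic_on_oops = 1 >> /etc/sysctl.conf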
Hi,
I am running OCFS2 1.2.1 on SLES 10, just the stuff right out of the box.
This is a 3 node cluster that's been running for 2 years with just about
zero modification. The storage is a high-end SAN and the transport is
iSCSI. We went two years without an issue and all of a sudden node 1 in
Sent: Wednesday, June 29, 2011 2:42:08 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
1.2.1? That's 5 years old. We've had a few fixes since then. ;)
You have to catch the oops trace to figure out the reason. And one
way to get it is by using netconsole. Check the SLES 10 docs to see how
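A typical netconsole setup looks something like the following (the
interface name eth0, port 6666, target IP, and MAC address are
placeholders for your logging host):
# modprobe netconsole netconsole=@/eth0,6666@192.168.0.2/00:aa:bb:cc:dd:ee
Then on the logging host, capture the UDP stream with something like:
# nc -u -l 6666 | tee oops.log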
83 eb 1c 8b 43 18 8d 53 04 e8 6d 3d fc ff 8b 03 e8 a8 12 ff ff 8d 46 08
- Original Message -
From: B Leggett blegg...@ngent.com
To: ocfs2-users@oss.oracle.com
Sent: Wednesday, June 29, 2011 3:42:42 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
For the list?
- Original Message -
From: Sunil Mushran sunil.mush...@oracle.com
To: B Leggett blegg...@ngent.com
Cc: ocfs2-users@oss.oracle.com
Sent: Wednesday, June 29, 2011 5:23:40 PM GMT -05:00 US/Canada Eastern
Subject: Re: [Ocfs2-users] OCFS2 Crash
You should ping your kernel vendor. While this does not look ocfs2
related, even
Hi,
We're running ocfs2-2.6.18-92.el5-1.4.7-1.el5 on a 12 node Red Hat 5.2
64-bit cluster. We've now experienced the following crash twice:
Apr 11 14:25:55 XXX-XXX03 kernel:
(o2net,19027,5):dlm_assert_master_handler:1837 ERROR: DIE! Mastery
assert from 0, but current owner is 1!
Hi,
Two days ago our OCFS2 cluster crashed. On all nodes the filesystem
was no longer reachable.
Here are the logs of two nodes:
1.: (note: the log is in reverse order)
http://pastebin.com/u97mcqX6
2.: (http://pastebin.com/tGizwSu0)
Stefan
Hi,
I'm getting system (and eventually cluster) crashes on intensive disk
writes on Ubuntu Server 10.04 with my OCFS2 file system.
I have an iSER (infiniband) backed shared disk array with OCFS2 on it.
There are 6 nodes in the cluster, and the heartbeat interface is over a
regular 1GigE
Hi Guys,
Upon further investigation, it seems that my ext4 fs is getting data
corruption at the FS level as well. It may be something up with iSER and
stgt after all. I'll do a bit more investigating.
Sorry for the trouble.
Matt
Does this appear to be the same issue as the OOM Killer issue
previously reported that would be fixed with ocfs2 1.2.4?
Does the slab data have to be right before a crash? Or can we tell
from just 2-3 days of data collection? After one day it appears
certain numbers from slabinfo are only going up.
On 1/17/07, Sunil Mushran [EMAIL PROTECTED] wrote:
Could be. But I cannot say for sure until I get the
Looks to be running out of lowmem.
# date
# cat /proc/meminfo
# cat /proc/slabinfo
Run a script that dumps the above every 1 to 5 mins. That should
help explain the cause.
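A minimal sketch of such a script (the log path and the 5-minute
interval are arbitrary choices):
#!/bin/sh
# Append a timestamped memory and slab snapshot every 5 minutes.
while true; do
    date
    cat /proc/meminfo
    cat /proc/slabinfo
    sleep 300
done >> /var/log/mem-snapshots.log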
Brian Sieler wrote:
Using a 2-node clustered file system on a DELL/EMC SAN, RHEL
2.6.9-34.0.2.ELsmp x86_64.
Config: