Re: [Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

2017-11-29 Thread netbsd
Hello, As I said in the previous thread, the 3 nodes are 3 identical KVM virtual machines running on the same physical host, which has more than enough resources (48 CPUs, 256 GB RAM, Gbit network). I have also tried to move them to other physical servers, but that didn't help. I also run constant

Re: [Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

2017-11-28 Thread Changwei Ge
Hi, It seems that something is wrong with the connection between your cluster nodes, so no dlm message can be sent out. This may cause a node to be fenced and thus crash. Please check your network, including the switch, Ethernet HBA card, etc. Thanks, Changwei On 2017/11/28 18:07,
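One quick way to act on this advice is to verify that each node can reach its peers on the o2net interconnect port (7777/tcp by default). The sketch below is a minimal check, assuming bash, the default port, and placeholder peer IPs that you would replace with the addresses from your cluster.conf:

```shell
#!/bin/bash
# Minimal o2net reachability check. PORT assumes the default o2cb
# interconnect port; the peer IPs below are placeholders.
PORT=7777

check_peer() {
    # Succeeds if a TCP connection to $1:$PORT opens within 3 seconds.
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$PORT" 2>/dev/null
}

for peer in 192.168.1.11 192.168.1.12; do
    if check_peer "$peer"; then
        echo "ok:   $peer:$PORT reachable"
    else
        echo "FAIL: $peer:$PORT unreachable"
    fi
done
```

If a peer shows as unreachable while the node itself is up, look at the path in between (switch, bonding, firewall rules on 7777/tcp) before suspecting OCFS2 itself.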

Re: [Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

2017-11-28 Thread Gang He
Hello Netbsd, What was your problem? dlm_send_remote_convert_request failed, or hung_task_timeout? Thanks Gang >>> > Hello, > > Servers crashed like 20 times since the last time I wrote to the list. > Today is the last with: > > [ 1901.810483]

[Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

2017-11-28 Thread netbsd
Hello, Servers crashed like 20 times since the last time I wrote to the list. Today is the last with: [ 1901.810483] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0 [ 1901.918314]
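For context on the trace above: on Linux, error -107 is ENOTCONN ("Transport endpoint is not connected"), meaning the o2net TCP link to node 0 was already down when the dlm tried to send. One way to decode such codes is to look them up in the kernel errno header; the path below is the usual glibc location but may differ on your distro:

```shell
# Decode kernel errno 107 (ENOTCONN). The header path is an assumption;
# on most glibc-based systems errno values live in asm-generic/errno.h,
# and this prints the ENOTCONN define.
grep -w 107 /usr/include/asm-generic/errno.h \
    || echo "header not found here; check your distro's include path"
```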

Re: [Ocfs2-users] OCFS2 Crash With Multiple Virtual CPUs

2017-09-05 Thread Carlos A. Pedreros Lizama
Hi, in some cases Apache can crash an OCFS2 cluster: misdefined read or write permissions on a file or folder can provoke a deadlock in the cluster. On Fri, Sep 1, 2017 at 4:23 AM, Adi Kriegisch wrote: > Hi! > > > We were experimenting with the newer

Re: [Ocfs2-users] OCFS2 Crash With Multiple Virtual CPUs

2017-09-01 Thread Adi Kriegisch
Hi! > We were experimenting with the newer version of OCFS2 on Debian 9 > Stretch inside KVM GUESTS. [...] > We have 3 nodes, but it is the same with 1 single node when we do apache > benchmark on the vm it crashes (becomes unpingable, unreachable, kernel > crashlog on virtual console) until

[Ocfs2-users] OCFS2 Crash With Multiple Virtual CPUs

2017-08-28 Thread netbsd
Hello List, We were experimenting with the newer version of OCFS2 on Debian 9 Stretch inside KVM GUESTS. #1 SMP Debian 4.11.6-1~bpo9+1 (2017-07-09) x86_64 GNU/Linux Kernels we have tried: vmlinuz-4.11.0-0.bpo.1-amd64 vmlinuz-4.1.1 custom vmlinuz-4.9.0-3-amd64 We have 3 nodes, but it is

Re: [Ocfs2-users] OCFS2 Crash

2015-08-21 Thread Srinivas Eeda
On 08/21/2015 03:10 AM, Martin Lund wrote: Hello, We have a 3 node OCFS2 cluster, using: Kernel: 3.16.0-0.bpo.4-amd64 ii ocfs2-tools 1.6.4-1+deb7u1 amd64 tools for managing OCFS2 cluster filesystems Today two of the nodes out of the 3 had

[Ocfs2-users] OCFS2 Crash

2015-08-21 Thread Martin Lund
Hello, We have a 3 node OCFS2 cluster, using: Kernel: 3.16.0-0.bpo.4-amd64 ii ocfs2-tools 1.6.4-1+deb7u1 amd64 tools for managing OCFS2 cluster filesystems Today two of the three nodes had a partial OCFS2-related kernel panic (see at

Re: [Ocfs2-users] [OCFS2] Crash at o2net_shutdown_sc()

2013-03-01 Thread Sunil Mushran
[ 1481.620253] o2hb: Unable to stabilize heartbeart on region 1352E2692E704EEB8040E5B8FF560997 (vdb) What this means is that the device is suspect. o2hb writes are not hitting the disk. vdb is accepting and acknowledging the write but spitting out something else during the next read. Heartbeat

Re: [Ocfs2-users] OCFS2 Crash

2011-06-30 Thread Herbert van den Bergh
On Wed, 29 Jun 2011 16:43:09 -0500 (GMT-05:00), B Leggett wrote: That's troubling, these are really static systems. I know anything can happen, but to inherit a kernel issue two years later seems

Re: [Ocfs2-users] OCFS2 Crash

2011-06-30 Thread B Leggett
Try setting /proc/sys/kernel/panic_on_oops to 1. It appears you are getting oopses
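For anyone finding this thread later, the suggested setting can be applied like this (a sketch; needs root, and the persistence step assumes a sysctl.conf-style distro):

```shell
# Make the kernel panic instead of limping on after an oops. Requires root.
sysctl -w kernel.panic_on_oops=1

# Optionally reboot automatically 30 seconds after a panic:
sysctl -w kernel.panic=30

# Persist across reboots (append once to /etc/sysctl.conf or a file in
# /etc/sysctl.d/):
#   kernel.panic_on_oops = 1
#   kernel.panic = 30
```

Panicking on oops matters on a cluster filesystem: an oopsed node that keeps running half-broken can hold dlm locks and stall the whole cluster, whereas a panic lets the other nodes fence and recover it.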

[Ocfs2-users] OCFS2 Crash

2011-06-29 Thread B Leggett
Hi, I am running OCFS2 1.2.1 on SLES 10, just the stuff right out of the box. This is a 3 node cluster that's been running for 2 years with just about zero modification. The storage is a high-end SAN and the transport is iSCSI. We went two years without an issue, and all of a sudden node 1 in

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread B Leggett
1.2.1? That's 5 years old. We've had a few fixes since then. ;) You have to catch the oops trace to figure out the reason. And one way to get it is by using netconsole. Check the SLES 10 docs to see how
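A minimal netconsole setup for catching the oops trace, as suggested above. All addresses, the interface name, and the MAC below are placeholders; the module parameter format is src-port@src-ip/dev,dst-port@dst-ip/dst-mac:

```shell
# On the crashing node: send kernel messages over UDP to a log host.
# Replace the IPs, interface, and MAC with your own values. Requires root.
modprobe netconsole \
    netconsole=6666@192.168.1.5/eth0,6666@192.168.1.200/00:11:22:33:44:55

# On the log host: capture the stream (one common way; netcat flags vary
# by implementation).
nc -u -l -p 6666 | tee netconsole.log
```

Because the trace is pushed over the network at oops time, it survives even when the node hangs or panics before anything reaches the local disk.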

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread Sunil Mushran
Code bytes from the oops: 83 eb 1c 8b 43 18 8d 53 04 e8 6d 3d fc ff 8b 03 e8 a8 12 ff ff 8d 46 08 For the list

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread B Leggett
You should ping your kernel vendor. While this does

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread Jürgen Herrmann
You should ping your kernel vendor. While this does not look ocfs2 related, even

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread B Leggett
On Wed, 29 Jun 2011 16:43:09 -0500 (GMT-05:00), B Leggett wrote: That's troubling, these are really static systems. I know anything can happen, but to inherit a kernel issue two years later seems nuts. Not that your analysis is wrong, just blows

[Ocfs2-users] OCFS2 crash

2011-04-15 Thread Thompson, Mark
Hi, We're running ocfs2-2.6.18-92.el5-1.4.7-1.el5, on a 12 node Red Hat 5.2 64bit cluster. We've now experienced the following crash twice: Apr 11 14:25:55 XXX-XXX03 kernel: (o2net,19027,5):dlm_assert_master_handler:1837 ERROR: DIE! Mastery assert from 0, but current owner is 1!

[Ocfs2-users] OCFS2 Crash

2011-01-09 Thread Stefan Priebe - Profihost AG
Hi, Two days ago our OCFS2 cluster crashed. On all nodes the filesystem was no longer reachable. Here are the logs of two nodes: 1.: (ATTENTION: reverse order of log) http://pastebin.com/u97mcqX6 2.: (http://pastebin.com/tGizwSu0) Stefan

[Ocfs2-users] ocfs2 crash on intensive disk write

2010-08-22 Thread Matthew Chan
Hi, I'm getting system (and eventually cluster) crashes on intensive disk writes in ubuntu server 10.04 with my OCFS2 file system. I have an iSER (infiniband) backed shared disk array with OCFS2 on it. There are 6 nodes in the cluster, and the heartbeat interface is over a regular 1GigE

Re: [Ocfs2-users] ocfs2 crash on intensive disk write

2010-08-22 Thread Matthew Chan
Hi Guys, Upon more investigation, it seems that my ext4 fs is getting data corruption at the FS level as well. It may be something up with iSER and stgt after all. I'll do a bit more investigating. Sorry for the trouble. Matt

Re: [Ocfs2-users] OCFS2 crash

2007-01-17 Thread Brian Sieler
Does this appear to be the same issue as the OOM Killer issue previously reported that would be fixed with ocfs2 1.2.4? On 1/16/07, Sunil Mushran [EMAIL PROTECTED] wrote: Looks to be running out of lowmem. # date # cat /proc/meminfo # cat /proc/slabinfo Run a script that dumps the above every

Re: [Ocfs2-users] OCFS2 crash

2007-01-17 Thread Brian Sieler
Does the slab data have to be from right before a crash? Or can we tell from just 2-3 days of data collection? After one day it appears certain numbers from slabinfo are only going up. On 1/17/07, Sunil Mushran [EMAIL PROTECTED] wrote: Could be. But I cannot say for sure till I get the

Re: [Ocfs2-users] OCFS2 crash

2007-01-16 Thread Sunil Mushran
Looks to be running out of lowmem. # date # cat /proc/meminfo # cat /proc/slabinfo Run a script that dumps the above every 1 to 5 mins. That should help explain the cause. Brian Sieler wrote: Using 2-node clustered file system on DELL/EMC SAN/RHEL 2.6.9-34.0.2.ELsmp x86_64. Config:
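The sampling loop Sunil describes can be sketched as a small shell script. The log path and interval are assumptions, and reading /proc/slabinfo typically requires root:

```shell
#!/bin/sh
# Periodically append date, meminfo, and slabinfo to a log so the state
# of lowmem can be reconstructed after a crash.
LOG="${LOG:-/var/tmp/lowmem-watch.log}"
INTERVAL="${INTERVAL:-300}"   # seconds between samples (5 minutes)

snapshot() {
    {
        date
        cat /proc/meminfo
        cat /proc/slabinfo 2>/dev/null   # usually readable by root only
        echo "----"
    } >> "$LOG"
}

snapshot   # take one sample now
# Uncomment to keep sampling until killed:
# while :; do sleep "$INTERVAL"; snapshot; done
```

Comparing LowFree in /proc/meminfo and the largest growing caches in /proc/slabinfo across samples shows whether lowmem is steadily leaking toward exhaustion before the crash.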