Re: [Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

2017-11-29 Thread netbsd
Hello, As I said in the previous thread, the 3 nodes are 3 identical KVM virtual machines running on the same physical host, which has more than enough resources (48 CPUs, 256 GB RAM, Gbit network). I have also tried moving them to other physical servers, but it didn't help. I also run constant p

Re: [Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

2017-11-28 Thread Changwei Ge
Hi, It seems that something is wrong with the connection between the nodes in your cluster, so no dlm message can be sent out. This may cause a node to be fenced and thus to crash. Please check your network, including the switch, Ethernet HBA card, etc. Thanks, Changwei On 2017/11/28 18:07, net...@tan

Re: [Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

2017-11-28 Thread Gang He
Hello Netbsd, What was your problem? dlm_send_remote_convert_request failed, or hung_task_timeout? Thanks Gang >>> > Hello, > > Servers crashed like 20 times since the last time I wrote to the list. > Today is the last with: > > [ 1901.810483] (php-fpm7.0,822,3):dlm_send_remote_convert_r

[Ocfs2-users] OCFS2 CRASH Again and Again, this filesystem is COMPLETE GARBAGE

2017-11-28 Thread netbsd
Hello, Servers crashed like 20 times since the last time I wrote to the list. Today is the last with: [ 1901.810483] (php-fpm7.0,822,3):dlm_send_remote_convert_request:420 ERROR: Error -107 when sending message 504 (key 0x91e4e5c6) to node 0 [ 1901.918314] (php-fpm7.0,822,3):dlm_send_remote_con

Re: [Ocfs2-users] OCFS2 Crash With Multiple Virtual CPUs

2017-09-05 Thread Carlos A. Pedreros Lizama
Hi, in some cases Apache can crash an OCFS2 cluster, because an incorrectly defined read or write permission on a file or folder can provoke a deadlock in the cluster. On Fri, Sep 1, 2017 at 4:23 AM, Adi Kriegisch wrote: > Hi! > > > We were experimenting with the newer version of OCFS2 on Debian 9

Re: [Ocfs2-users] OCFS2 Crash With Multiple Virtual CPUs

2017-09-01 Thread Adi Kriegisch
Hi! > We were experimenting with the newer version of OCFS2 on Debian 9 > Stretch inside KVM GUESTS. [...] > We have 3 nodes, but it is the same with 1 single node when we do apache > benchmark on the vm it crashes (becomes unpingable, unreachable, kernel > crashlog on virtual console) until destr

[Ocfs2-users] OCFS2 Crash With Multiple Virtual CPUs

2017-08-28 Thread netbsd
Hello List, We were experimenting with the newer version of OCFS2 on Debian 9 Stretch inside KVM guests. #1 SMP Debian 4.11.6-1~bpo9+1 (2017-07-09) x86_64 GNU/Linux Kernels we have tried: vmlinuz-4.11.0-0.bpo.1-amd64 vmlinuz-4.1.1 custom vmlinuz-4.9.0-3-amd64 We have 3 nodes, but it is t

Re: [Ocfs2-users] OCFS2 Crash

2015-08-21 Thread Srinivas Eeda
On 08/21/2015 03:10 AM, Martin Lund wrote: > Hello, > > We have a 3 node OCFS2 cluster, using: > > Kernel: 3.16.0-0.bpo.4-amd64 > ii ocfs2-tools 1.6.4-1+deb7u1 amd64 > tools for managing OCFS2 cluster filesystems > > Today two of the nodes out of th

[Ocfs2-users] OCFS2 Crash

2015-08-21 Thread Martin Lund
Hello, We have a 3 node OCFS2 cluster, using: Kernel: 3.16.0-0.bpo.4-amd64 ii ocfs2-tools 1.6.4-1+deb7u1 amd64 tools for managing OCFS2 cluster filesystems Today two of the nodes out of the 3 had some partial OCFS2 related kernel panic (see at

Re: [Ocfs2-users] [OCFS2] Crash at o2net_shutdown_sc()

2013-03-02 Thread richard -rw- weinberger
On Fri, Mar 1, 2013 at 10:42 PM, Srinivas Eeda wrote: > Yes that was the crash I was referring to which stopped me from testing my > other patch on mainline. I think the crashes started since some workqueue > patches introduced by commit 57b30ae77bf00d2318df711ef9a4d2a9be0a3a2a > Earlier kernels

Re: [Ocfs2-users] [OCFS2] Crash at o2net_shutdown_sc()

2013-03-01 Thread Sunil Mushran
[ 1481.620253] o2hb: Unable to stabilize heartbeart on region 1352E2692E704EEB8040E5B8FF560997 (vdb) What this means is that the device is suspect. o2hb writes are not hitting the disk. vdb is accepting and acknowledging the write but spitting out something else during the next read. Heartbeat de

Re: [Ocfs2-users] [OCFS2] Crash at o2net_shutdown_sc()

2013-03-01 Thread Srinivas Eeda
Yes that was the crash I was referring to which stopped me from testing my other patch on mainline. I think the crashes started since some workqueue patches introduced by commit 57b30ae77bf00d2318df711ef9a4d2a9be0a3a2a Earlier kernels should be fine. Patch https://lkml.org/lkml/2012/10/18/592

[Ocfs2-users] [OCFS2] Crash at o2net_shutdown_sc()

2013-03-01 Thread richard -rw- weinberger
Hi! Using 3.8.1, OCFS2 crashes while joining nodes to the cluster. The cluster consists of 10 nodes; while node3 joins, the kernel on node3 crashes. (Sometimes later...) See dmesg below. Is this a known issue? I didn't test older kernels so far. node1: [ 1471.881922] o2dlm: Joining domain 1352E2692E

Re: [Ocfs2-users] OCFS2 Crash

2011-06-30 Thread B Leggett
"B Leggett" Cc: "Jürgen Herrmann" , ocfs2-users@oss.oracle.com Sent: Thursday, June 30, 2011 1:24:18 PM GMT -05:00 US/Canada Eastern Subject: Re: [Ocfs2-users] OCFS2 Crash Try setting /proc/sys/kernel/panic_on_oops to 1. It appears you are getting oopses but the box keeps running
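
The suggestion quoted above can be tried roughly as follows. This is a minimal sketch, assuming a Linux host with the usual /proc layout; the helper name `show_panic_on_oops` is made up for this example, and the commands that change the setting are left as comments because they require root.

```shell
# Sketch only: inspect the current panic-on-oops setting.
# The path is taken from the message above; the helper name is hypothetical.
PANIC_ON_OOPS=/proc/sys/kernel/panic_on_oops

show_panic_on_oops() {
    # Prints 0 (keep running after an oops) or 1 (panic on oops)
    cat "$PANIC_ON_OOPS"
}

show_panic_on_oops

# To enable it until the next reboot (requires root):
#   echo 1 > /proc/sys/kernel/panic_on_oops
# Equivalent sysctl form:
#   sysctl -w kernel.panic_on_oops=1
```

With the value set to 1, an oops stops the node instead of leaving it running in a damaged state, which is the behaviour the reply is asking for.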

Re: [Ocfs2-users] OCFS2 Crash

2011-06-30 Thread Herbert van den Bergh
oss.oracle.com > Sent: Wednesday, June 29, 2011 5:57:19 PM GMT -05:00 US/Canada Eastern > Subject: Re: [Ocfs2-users] OCFS2 Crash > > On Wed, 29 Jun 2011 16:43:09 -0500 (GMT-05:00), B Leggett wrote: >> That's troubling, these are really static systems. I know anything >>

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread B Leggett
ern Subject: Re: [Ocfs2-users] OCFS2 Crash On Wed, 29 Jun 2011 16:43:09 -0500 (GMT-05:00), B Leggett wrote: > That's troubling, these are really static systems. I know anything > can happen, but to inherit a kernel issue two years later seems nuts. > Not that your analysis is wrong,

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread Jürgen Herrmann
> > - Original Message - > From: "Sunil Mushran" > To: "B Leggett" > Cc: ocfs2-users@oss.oracle.com > Sent: Wednesday, June 29, 2011 5:23:40 PM GMT -05:00 US/Canada > Eastern > Subject: Re: [Ocfs2-users] OCFS2 Crash > > You should ping your kernel

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread B Leggett
build? - Original Message - From: "Sunil Mushran" To: "B Leggett" Cc: ocfs2-users@oss.oracle.com Sent: Wednesday, June 29, 2011 5:23:40 PM GMT -05:00 US/Canada Eastern Subject: Re: [Ocfs2-users] OCFS2 Crash You should ping your kernel vendor. While this does not look ocfs

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread Sunil Mushran
c7 40 > 04 00 00 00 00 c3 56 53 8b 70 04 eb 2c 8b 5e 04 83 eb 1c<8b> 43 18 8d 53 04 > e8 6d 3d fc ff 8b 03 e8 a8 12 ff ff 8d 46 08 > > - Original Message - > From: "B Leggett" > To: ocfs2-users@oss.oracle.com > Sent: Wednesday, June 29, 2011 3:42:42 P

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread B Leggett
--- Original Message - From: "B Leggett" To: ocfs2-users@oss.oracle.com Sent: Wednesday, June 29, 2011 3:42:42 PM GMT -05:00 US/Canada Eastern Subject: Re: [Ocfs2-users] OCFS2 Crash For the list, I accidentally sent it direct to Sunil. My apologies for that. Bruce - Original

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread B Leggett
For the list, I accidentally sent it direct to Sunil. My apologies for that. Bruce - Original Message - From: "B Leggett" To: "Sunil Mushran" Sent: Wednesday, June 29, 2011 3:40:52 PM GMT -05:00 US/Canada Eastern Subject: Re: [Ocfs2-users] OCFS2 Crash Sunil, I did

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread Sunil Mushran
1.2.1? That's 5 years old. We've had a few fixes since then. ;) You have to catch the oops trace to figure out the reason. One way to get it is by using netconsole. Check the sles10 docs to see how to configure netconsole. Or, whatever is recommended for capturing the oops log in that release. O
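
As a rough illustration of the netconsole suggestion above, the sketch below builds the module parameter string and prints the modprobe command rather than running it. The parameter layout follows the kernel's netconsole documentation; every port, address, interface name, and MAC below is a placeholder and not a value from this thread, and the helper name `netconsole_param` is hypothetical. The documentation for the release in use remains the authority on the exact setup.

```shell
# Sketch: build the netconsole module parameter and print the modprobe command.
# Parameter format (kernel netconsole documentation):
#   netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]
# All values passed below are placeholders chosen for this example.

netconsole_param() {
    # $1 src-port  $2 src-ip  $3 device  $4 tgt-port  $5 tgt-ip  $6 tgt-mac
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

param=$(netconsole_param 6665 192.0.2.10 eth0 6666 192.0.2.20 00:11:22:33:44:55)

# Printed, not executed: loading the module requires root on the crashing node.
echo "modprobe netconsole $param"
```

A second machine then has to listen for the UDP packets on the target port, for example with netcat or a syslog daemon configured for remote input, so that the oops trace survives the crash.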

[Ocfs2-users] OCFS2 Crash

2011-06-29 Thread B Leggett
Hi, I am running OCFS2 1.2.1 on SLES 10, just the stuff right out of the box. This is a 3 node cluster that's been running for 2 years with just about zero modification. The storage is a high end SAN and the transport is iSCSI. We went two years without an issue and all of a sudden node 1 in t

[Ocfs2-users] OCFS2 crash

2011-04-15 Thread Thompson, Mark
Hi, We're running ocfs2-2.6.18-92.el5-1.4.7-1.el5, on a 12 node Red Hat 5.2 64bit cluster. We've now experienced the following crash twice: Apr 11 14:25:55 XXX-XXX03 kernel: (o2net,19027,5):dlm_assert_master_handler:1837 ERROR: DIE! Mastery assert from 0, but current owner is 1! (O0

Re: [Ocfs2-users] ocfs2 crash with bugs reports (dlmmaster.c)

2011-03-01 Thread Piotr Teodorowski
Thanks for the quick response, the bug: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1319 Regards, Piotr Teodorowski On Tuesday 01 of March 2011 02:55:01 Sunil Mushran wrote: > Thanks for the bug report. Please can you file a bz and attach > all the message files. Yes the problem started with t

Re: [Ocfs2-users] ocfs2 crash with bugs reports (dlmmaster.c)

2011-02-28 Thread Sunil Mushran
Thanks for the bug report. Please can you file a bz and attach all the message files. Yes the problem started with the hb timeout in esiprap01. The problem spread to other nodes possibly because of a race in migration. A bz will help us track the issue more easily. On 02/28/2011 01:46 AM, Pio

[Ocfs2-users] OCFS2 Crash

2011-01-09 Thread Stefan Priebe - Profihost AG
Hi, two days ago our ocfs2 cluster crashed. On all nodes the filesystem was no longer reachable. Here are the logs of two nodes: 1.: (ATTENTION: reverse order of log) http://pastebin.com/u97mcqX6 2.: (http://pastebin.com/tGizwSu0) Stefan

Re: [Ocfs2-users] ocfs2 crash on intensive disk write

2010-08-22 Thread Matthew Chan
Hi Guys, Upon more investigation, it seems that my ext4 fs is getting data corruption at the FS level as well. It may be something up with iSER and stgt after all. I'll do a bit more investigating. Sorry for the trouble. Matt

[Ocfs2-users] ocfs2 crash on intensive disk write

2010-08-21 Thread Matthew Chan
Hi, I'm getting system (and eventually cluster) crashes on intensive disk writes in Ubuntu Server 10.04 with my OCFS2 file system. I have an iSER (InfiniBand) backed shared disk array with OCFS2 on it. There are 6 nodes in the cluster, and the heartbeat interface is over a regular 1GigE conn

Re: [Ocfs2-users] OCFS2 crash

2007-01-17 Thread Brian Sieler
Does the slab data have to be right before a crash? Or can we tell from just 2-3 days of data collection? After one day it appears certain numbers from slabinfo are only going up. On 1/17/07, Sunil Mushran <[EMAIL PROTECTED]> wrote: Could be. But I cannot say for sure until I get the slab/m

Re: [Ocfs2-users] OCFS2 crash

2007-01-17 Thread Sunil Mushran
Could be. But I cannot say for sure until I get the slab/mem data. Brian Sieler wrote: Does this appear to be the same issue as the "OOM Killer" issue previously reported that would be fixed with ocfs2 1.2.4? On 1/16/07, Sunil Mushran <[EMAIL PROTECTED]> wrote: Looks to be running out of

Re: [Ocfs2-users] OCFS2 crash

2007-01-17 Thread Brian Sieler
Does this appear to be the same issue as the "OOM Killer" issue previously reported that would be fixed with ocfs2 1.2.4? On 1/16/07, Sunil Mushran <[EMAIL PROTECTED]> wrote: Looks to be running out of lowmem. # date # cat /proc/meminfo # cat /proc/slabinfo Run a script that dumps the above ev

Re: [Ocfs2-users] OCFS2 crash

2007-01-16 Thread Sunil Mushran
Looks to be running out of lowmem. # date # cat /proc/meminfo # cat /proc/slabinfo Run a script that dumps the above every 1 to 5 mins. That should help explain the cause. Brian Sieler wrote: Using 2-node clustered file system on DELL/EMC SAN/RHEL 2.6.9-34.0.2.ELsmp x86_64. Config: O2CB_HEAR
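
The periodic dump described above could be sketched as follows. This is one possible shape under stated assumptions, not the script used in the thread: the function name `snapshot` and the variables `RUN_LOOP`, `LOG`, and `INTERVAL` are hypothetical, and /proc/slabinfo is normally readable only by root.

```shell
#!/bin/sh
# Sketch: append timestamped copies of /proc/meminfo and /proc/slabinfo to a log.
# snapshot, RUN_LOOP, LOG and INTERVAL are names invented for this example.

snapshot() {
    # $1 = log file to append to
    {
        echo "=== $(date) ==="
        cat /proc/meminfo
        # /proc/slabinfo is usually readable by root only
        cat /proc/slabinfo 2>/dev/null || echo "(slabinfo not readable; run as root)"
    } >> "$1"
}

# The loop runs only when explicitly requested, e.g.:
#   RUN_LOOP=1 LOG=/var/tmp/memlog.txt INTERVAL=60 sh memlog.sh
if [ "${RUN_LOOP:-0}" = 1 ]; then
    while :; do
        snapshot "${LOG:-/var/tmp/memlog.txt}"
        sleep "${INTERVAL:-60}"
    done
fi
```

An interval of 60 to 300 seconds matches the "every 1 to 5 mins" in the message. Writing the log to a local disk rather than to the OCFS2 volume keeps the data available after the node crashes.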

[Ocfs2-users] OCFS2 crash

2007-01-16 Thread Brian Sieler
Using 2-node clustered file system on DELL/EMC SAN/RHEL 2.6.9-34.0.2.ELsmp x86_64. Config: O2CB_HEARTBEAT_THRESHOLD=30 Kernel param: elevator=deadline (per FAQ) These log items appear and the server crashes. Has happened twice now at three week intervals, each time during a heavy IO operation: