Re: [Ocfs2-users] one node kernel panic

2011-10-07 Thread Sunil Mushran
same as UEK5 arise? > > (2011/10/05 1:45), Sunil Mushran wrote: >> int sigprocmask(int how, sigset_t *set, sigset_t *oldset) >> { >> int error; >> >> spin_lock_irq(&current->sighand->siglock); < CRASH >> if (oldset) >> *oldset = current->b

Re: [Ocfs2-users] Kernel Panic / Fencing

2011-10-06 Thread Sunil Mushran
I am unclear. What happens when a server is rebooted (or crashes). Crash the network? Can you expand on this? On 10/06/2011 05:52 PM, Tony Rios wrote: > Hey all, > > I'm running a current version of Ubuntu and we are using OCFS2 across > a cluster of 9 web servers. > Everything works perfectly, so

Re: [Ocfs2-users] Fwd: OCFS drives not syncing

2011-10-05 Thread Sunil Mushran
On 10/05/2011 08:46 AM, Bradlee Landis wrote: > Sorry Sunil, my email replied to you instead of the list. > > On Wed, Oct 5, 2011 at 10:09 AM, Sunil Mushran > wrote: >> ocfs2 is a shared disk cluster file system. It requires a shared disk. >> >> However, if you are

Re: [Ocfs2-users] OCFS drives not syncing

2011-10-05 Thread Sunil Mushran
ocfs2 is a shared disk cluster file system. It requires a shared disk. However, if you are only going to use 2 nodes, you could use drbd, a replicating block device. To ocfs2, it appears as a shared disk. Google drbd and ocfs2 for more. On 10/05/2011 07:15 AM, Bradlee Landis wrote: > I have asked

Re: [Ocfs2-users] one node kernel panic

2011-10-04 Thread Sunil Mushran
int sigprocmask(int how, sigset_t *set, sigset_t *oldset) { int error; spin_lock_irq(&current->sighand->siglock); < CRASH if (oldset) *oldset = current->blocked; ... } current->sighand is NULL. So definitely a race. Generic kernel issue. Ping your kernel

Re: [Ocfs2-users] dlm_lockres_release:507 ERROR: Resource W0000000000000001b027d69b591f15 not on the Tracking list

2011-09-30 Thread Sunil Mushran
On 09/30/2011 06:49 AM, Herman L wrote: > On Thursday, September 29, 2011 2:04 PM Sunil Mushran wrote: >> On 09/29/2011 08:56 AM, Herman L wrote: >>>> On Wednesday, September 21, 2011 4:00 PM, Sunil Mushran wrote: >>>> On 09/21/2011 12:37 PM, Herman L wrote: >&

Re: [Ocfs2-users] dlm_lockres_release:507 ERROR: Resource W0000000000000001b027d69b591f15 not on the Tracking list

2011-09-29 Thread Sunil Mushran
On 09/29/2011 08:56 AM, Herman L wrote: >> On Wednesday, September 21, 2011 4:00 PM, Sunil Mushran wrote: >> On 09/21/2011 12:37 PM, Herman L wrote: >>>>> On 09/19/2011 08:35 AM, Herman L wrote: >>>>> Hi all, >>>>> >>>>>

Re: [Ocfs2-users] Problem with tunefs.ocfs2, similar to fsck.ocfs2 on EL5

2011-09-27 Thread Sunil Mushran
On 09/27/2011 09:12 AM, Ulf Zimmermann wrote: > - -Original Message- >> From: Sunil Mushran [mailto:sunil.mush...@oracle.com] >> Sent: Monday, September 26, 2011 10:09 AM >> To: Ulf Zimmermann >> Cc: ocfs2-users@oss.oracle.com >> Subject: Re: [Ocfs2

Re: [Ocfs2-users] Problem with tunefs.ocfs2, similar to fsck.ocfs2 on EL5

2011-09-26 Thread Sunil Mushran
I'll look at the tunefs issue. But the other one does not make sense. strict_jbd is a compat flag. Mount should work. What is the mount error? As in, in dmesg. On 09/25/2011 04:43 AM, Ulf Zimmermann wrote: > As tunefs.ocfs2 wasn't working for us, I tried to mkfs.ocfs2 the volumes > again with --f

Re: [Ocfs2-users] RE: Linux kernel crash due to ocfs2

2011-09-22 Thread Sunil Mushran
-start.c:231 > self = (struct pthread *) 0x0 > result = > unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-8358199, 0, > 265182513, 0, > 0, 0, 0, 0, 0, 0, 0, 0, 0, 268234624, -5572000, -5571980, 3, 0, > 268367568, 268107764, 0, 570426402, 0, &g

Re: [Ocfs2-users] dlm_lockres_release:507 ERROR: Resource W0000000000000001b027d69b591f15 not on the Tracking list

2011-09-21 Thread Sunil Mushran
es: [ ], >>> inflight=0 >>> Sep 19 08:07:15 server-1 kernel: [3892420.398200] granted queue: >>> Sep 19 08:07:15 server-1 kernel: [3892420.398200] converting queue: >>> Sep 19 08:07:15 server-1 kernel: [3892420.398201] blocked queue: >>> >&g

Re: [Ocfs2-users] 11gr1 RAC + ocfs2 node2 is down and not able to mount the ocfs2 FS on node1

2011-09-19 Thread Sunil Mushran
The connect is failing. One of the main reasons is a firewall. See if iptables is running. Check on both nodes. If so, shut it down or add a rule to allow traffic on the o2cb port. On 09/18/2011 08:57 PM, veeraa bose wrote: Hi All, we are having two node 11gr1 RAC (we have used ocfs2 for CR
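
The firewall advice in this reply could be sketched as a small helper that reads the interconnect port from cluster.conf and prints (without applying) a matching iptables rule. This is only a sketch under assumptions: the function name `o2cb_allow_rules` is made up for illustration, the `ip_port = N` line layout is assumed from typical /etc/ocfs2/cluster.conf files, and the printed rule should be reviewed against the existing firewall policy before use.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocfs2-tools): print one iptables rule
# per distinct ip_port found in a cluster.conf-style file.
# Assumes lines of the form "ip_port = 7777" inside the node: stanzas.
o2cb_allow_rules() {
    conf="${1:-/etc/ocfs2/cluster.conf}"
    awk -F'=' '/^[ \t]*ip_port[ \t]*=/ {
        gsub(/[ \t\r]/, "", $2)   # strip blanks around the port number
        print $2
    }' "$conf" | sort -u | while read -r port; do
        # Only echo the rule; applying it is left to the administrator.
        echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    done
}
```

Running `o2cb_allow_rules /etc/ocfs2/cluster.conf` on each node prints the rule for whatever port that node's configuration names, so the same check can be repeated on both nodes.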

Re: [Ocfs2-users] dlm_lockres_release:507 ERROR: Resource W0000000000000001b027d69b591f15 not on the Tracking list

2011-09-19 Thread Sunil Mushran
I've no idea of the state of the source that you are using. The message is a warning indicating a race. While it probably did not affect the functioning, there is no guarantee that that would be the case the next time around. The closest relevant patch is over 2 years old. http://oss.oracle.com/

Re: [Ocfs2-users] fsck doesn't fix "bad chain"

2011-09-17 Thread Sunil Mushran
Can you save the o2image of the volume when it is in that state. We'll need that for analysis. On 09/16/2011 05:41 AM, Andre Nathan wrote: > Hello > > For a while I had seen errors like this in the kernel logs: > >OCFS2: ERROR (device drbd5): ocfs2_validate_gd_parent: Group >descriptor #69

Re: [Ocfs2-users] Linux kernel crash due to ocfs2

2011-09-16 Thread Sunil Mushran
downloading it. > > Thanks, > > George > > On Thu, 2011-09-15 at 09:45 -0700, Sunil Mushran wrote: >> I was hoping to get a readable stack. Please could you provide a link to >> the coredump. >> >> On 09/15/2011 02:51 AM, Betzos Giorgos wrote: >>> Hello,

Re: [Ocfs2-users] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)

2011-09-15 Thread Sunil Mushran
-F does not run the full fsck. -f does. But I would not recommend running fsck as this corruption is not normal. The inodes in the system directory have been overwritten. That typically means a storage issue. The fs does not create/remove inodes in sysdir. Only the tools do that. You may want to

Re: [Ocfs2-users] Trouble getting node to re-join two node cluster (OCFS2/DRBD Primary/Primary)

2011-09-15 Thread Sunil Mushran
open("/dev/drbd0", O_RDONLY|O_DIRECT) = -1 EMEDIUMTYPE (Wrong medium type) drbd_open() ... if (mdev->state.role != R_PRIMARY) { if (mode & FMODE_WRITE) rv = -EROFS; else if (!allow_oos) rv = -EMEDIUMTYPE;

Re: [Ocfs2-users] The mounting of too many OCFS2 volumes (i.e. 50 or more) per cluster

2011-09-15 Thread Sunil Mushran
That's very old. We have users having 50+ mounts. The one disadvantage is that the o2cb stack heartbeats on all mounts. That problem will be addressed in 1.8 (the tools will be released soon), with global heartbeat (hb volumes are user-configurable). Having said that, the number of volumes depends

Re: [Ocfs2-users] Linux kernel crash due to ocfs2

2011-09-15 Thread Sunil Mushran
/root/o2image.ppc.dbg > 10060000-10090000 rwxp 1006 00:00 0 > [heap] > f768-f7ff rw-p f768 00:00 0 > ff9a-ffaf rw-p ff9a 00:00 0 > [stack] > Aborted (core dumped) > > > On Thu, 2011-09-08 at 12:10 -0700, Sunil Mushran wrote: >> http://oss

Re: [Ocfs2-users] Syslog reports (ocfs2_wq, 15527, 2):ocfs2_orphan_del:1841 ERROR: status = -2

2011-09-15 Thread Sunil Mushran
an_dir:0002 > 14 drwxr-xr-x 2 0 04096 > 22-May-2008 11:57 . > 6 drwxr-xr-x 6 0 04096 > 22-May-2008 11:57 .. > debugfs: ls -l //orphan_dir:0003 > 15 drwxr-xr-x 2 0 04096 > 22-May-2008 11:57

Re: [Ocfs2-users] Syslog reports (ocfs2_wq, 15527, 2):ocfs2_orphan_del:1841 ERROR: status = -2

2011-09-15 Thread Sunil Mushran
The issue that caused it has been fixed. The fix is here. http://oss.oracle.com/git/?p=ocfs2-1.4.git;a=commit;h=b6f3de3fd54026df748bfd1449bbe31b9803f8f7 The actual problem could have happened much earlier. 1.4.4 is showing the messages as it is more aggressive (than 1.4.1) in cleaning up the orph

Re: [Ocfs2-users] No space left on device

2011-09-14 Thread Sunil Mushran
On 09/14/2011 03:21 PM, Florin Andrei wrote: > On 09/14/2011 03:11 PM, Florin Andrei wrote: >> It's a 2-node cluster. I rebooted one node, waited until it came up, and >> now I can create files on that volume: > Nope, it's doing it again. :( > > # touch test > touch: cannot touch `test': No space l

Re: [Ocfs2-users] mount type heartbeat=local

2011-09-14 Thread Sunil Mushran
...@oss.oracle.com [mailto:ocfs2-users-boun...@oss.oracle.com] *On Behalf Of *Sunil Mushran *Sent:* Friday, September 09, 2011 9:46 PM *To:* Hai Tao *Cc:* ocfs2-users@oss.oracle.com *Subject:* Re: [Ocfs2-users] mount type heartbeat=local That's mount type. Yes, we should not have overloaded the

Re: [Ocfs2-users] node_count=0

2011-09-12 Thread Sunil Mushran
It is wrong config. On 09/09/2011 10:15 PM, Hai Tao wrote: I have a two node ocfs2 cluster, and in the /etc/ocfs2/cluster.conf file, the node_count=0 rather than 2. Does this have to be a wrong config, and how would this affect the cluster? Thanks. Hai Tao _

Re: [Ocfs2-users] disable heartbeat nic caused ocfs2 errors

2011-09-12 Thread Sunil Mushran
ocfs2 uses disk heartbeat to detect node liveness. It uses net heartbeat to detect link liveness. Both need to operate for the cluster to function. If the network link between two nodes snaps, then one of the two nodes is fenced. The stack below indicates that the two nodes are not able to commun

Re: [Ocfs2-users] mount type heartbeat=local

2011-09-09 Thread Sunil Mushran
That's mount type. Yes, we should not have overloaded the term "local". On 09/09/2011 07:53 PM, Hai Tao wrote: but this is what I saw in the guide OCFS2 - A Cluster File System For Linux *

Re: [Ocfs2-users] mount type heartbeat=local

2011-09-09 Thread Sunil Mushran
That mount option is appended by mount.ocfs2. It tells users the heartbeat mode. "none" means non-clustered. "local" means the heartbeat region is on the mounted volume. This is the default mode. In 1.8 we have "global" which means the heartbeat region has been configured on 1+ devices. local and

Re: [Ocfs2-users] Linux kernel crash due to ocfs2

2011-09-08 Thread Sunil Mushran
tools-1.4.4-1.el5.ppc > > On Wed, 2011-09-07 at 09:13 -0700, Sunil Mushran wrote: >> version of ocfs2-tools? >> >> On 09/07/2011 09:10 AM, Betzos Giorgos wrote: >>> Hello, >>> >>> I tried what you suggested but here is what I got: >>> &

Re: [Ocfs2-users] Linux kernel crash due to ocfs2

2011-09-07 Thread Sunil Mushran
that particular case. Oracle DB rman backup > files are from 7 to 11Gb. > Maybe Oracle DataGuard was also using on the same fs. > After the crash, when we rebooted the servers, they would crash again. We > then noticed that > the fs was full and we removed some unneeded files. &

Re: [Ocfs2-users] (mount.ocfs2, 3315, 4):ocfs2_global_read_info:403 ERROR: status = 24

2011-09-06 Thread Sunil Mushran
kernel/fs. On 09/06/2011 10:31 PM, Stefan Priebe - Profihost AG wrote: > could you point me to the code? Is it ocfs2tools code or kernel code. > I wasn't able to find it. > > Stefan > > On 06.09.2011 22:52, Sunil Mushran wrote: >> harmless. the message needs to be

Re: [Ocfs2-users] (mount.ocfs2, 3315, 4):ocfs2_global_read_info:403 ERROR: status = 24

2011-09-06 Thread Sunil Mushran
harmless. the message needs to be silenced. On 09/06/2011 01:31 PM, Stefan Priebe - Profihost AG wrote: > Hi List, > > i've upgraded some machines to linux kernel from 2.8.38 to 3.0.4. Now > i'm always seeing this message when mounting an ocfs2 volume: > > [ 38.745584] (mount.ocfs2,3315,4):ocfs2

Re: [Ocfs2-users] Linux kernel crash due to ocfs2

2011-09-02 Thread Sunil Mushran
Can you provide me with the o2image. It includes the entire fs metadata. The size of the image file depends on the number of files/dirs. # o2image /dev/sdX /path/to/image/file So the error is clear. We have underestimated the amount of credits (num of blocks that need to be dirtied in that trans

Re: [Ocfs2-users] dlm locking bug?

2011-09-02 Thread Sunil Mushran
Log what you have in a bz. I can take a look. I doubt you will be able to attach that file though. You'll need to provide me with a link. On 09/02/2011 07:28 AM, Sérgio Surkamp wrote: > Hello, > > We have got a problem this morning with our cluster. > > Cluster setup: > > Servers: > * Two R800 Del

Re: [Ocfs2-users] Zero allocated blocks for nonempty file

2011-09-02 Thread Sunil Mushran
Yes. Files and directories under 3800 bytes (or so) are inlined. The max inline size depends on the features enabled. On 09/02/2011 05:42 AM, Sérgio Surkamp wrote: > Hi. > > I *suppose* its the inline-data feature, as it's permit the allocation > of small files and directories inside the inode its

Re: [Ocfs2-users] Max number of files in OCFS2 file system

2011-08-31 Thread Sunil Mushran
There is no such limit. You are running into a bug that has been fixed in mainline kernel 2.6.35 and is available with the UEK kernel. Upgrade to that kernel, install ocfs2-tools 1.6 and enable the discontig-bg feature. On 08/31/2011 03:15 PM, Omega Xtreme wrote: Hi All, Please I would like to k

Re: [Ocfs2-users] Slow OCFS2 file creation / how to make ocfs2 generally faster?

2011-08-31 Thread Sunil Mushran
On 08/24/2011 09:56 PM, Stefan Priebe - Profihost AG wrote: > > ok here is a new complete test with values before and after and all bonnie > details. > > File creation and seq. delete drops again massively. > Version 1.96 --Sequential Output-- --Sequential Input- --Random- Concurre

Re: [Ocfs2-users] Slow OCFS2 file creation / how to make ocfs2 generally faster?

2011-08-30 Thread Sunil Mushran
Was on vacation. May take me a few days to clean up my mail box. On 08/29/2011 11:35 PM, Stefan Priebe - Profihost AG wrote: > On 22.08.2011 23:27, Sunil Mushran wrote: >> On 08/22/2011 11:11 AM, Stefan Priebe - Profihost AG wrote: >>>>>> BTW, how much memory do

Re: [Ocfs2-users] bug report: ocfs2 sparc64 panic

2011-08-30 Thread Sunil Mushran
We appear to be underestimating block credits for quota synching. OCFS2_QSYNC_CREDITS. Please file a bugzilla at oss.oracle.com/bugzilla so that we don't forget this. Possible temporary workarounds include: 1. Incrementing the above #define by a few. 2. Disabling quotas until we have a fix. Sun

Re: [Ocfs2-users] Slow OCFS2 file creation / how to make ocfs2 generally faster?

2011-08-22 Thread Sunil Mushran
On 08/22/2011 11:11 AM, Stefan Priebe - Profihost AG wrote: BTW, how much memory do the nodes have? Lock Resources: 1863 (497079) This means that that node has created 497K lock resources but has under 2000 cached. That could be due to deletes or could be due to lack
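
To make the reading of that counter concrete, here is a small sketch that pulls the two numbers out of a line such as `Lock Resources: 1863 (497079)`. Following the explanation in this message, the first number is treated as the count currently cached and the parenthesised one as the total created. The function name `lock_res_summary` and the exact line layout are assumptions based only on the single line quoted in the thread.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin and print
# "cached=<n> created=<m>" for the first "Lock Resources:" line found.
lock_res_summary() {
    sed -n 's/.*Lock Resources:[[:space:]]*\([0-9][0-9]*\)[[:space:]]*(\([0-9][0-9]*\)).*/cached=\1 created=\2/p' | head -n 1
}
```

Feeding it the saved stats output (for example `lock_res_summary < stats.txt`, where `stats.txt` is a hypothetical capture) gives the two figures side by side for comparison before and after a test run.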

Re: [Ocfs2-users] Slow OCFS2 file creation / how to make ocfs2 generally faster?

2011-08-22 Thread Sunil Mushran
On 08/22/2011 09:57 AM, Stefan Priebe - Profihost AG wrote: > >> Well the values during and after the test will give better info. > I will create a dedicated test setup and provide new values and stats. > >> BTW, how much memory do the nodes have? >> Lock Resources: 1863 (497079) >> >> This means t

Re: [Ocfs2-users] Lost ocfs mount point on one node between two shared servers

2011-08-22 Thread Sunil Mushran
The user's guide explains all that. On 08/22/2011 09:34 AM, Kalra, Pratima wrote: It could be possible that it wasn't auto-mounted on reboot. Is there a separate setting for that? *From:*Sunil Mushran [mailto:sunil.mush...@oracle.com] *Sent:* Monday, August 22, 2011 9:27 AM *

Re: [Ocfs2-users] Lost ocfs mount point on one node between two shared servers

2011-08-22 Thread Sunil Mushran
On 08/22/2011 09:22 AM, Kalra, Pratima wrote: Hello All, Is it normal to lose a shared node in OCFS shared servers? We have either lost the whole ocfs mount point or lost mount point on one of the shared nodes couple of times. Is this due to some misconfiguration? Define lose? Could it b

Re: [Ocfs2-users] Please help me to find some answers

2011-08-22 Thread Sunil Mushran
On 08/22/2011 09:11 AM, Medapuram, Gopala wrote: > To all OCFS gurus, > > We have discussion scheduled on pros and cons of OCFS2 over NFS. > > Can you please guide me to some good notes and documentation to prepare for > proper discussion? > > Appreciate the help. > > Thank you, > Gopal NFS and O

Re: [Ocfs2-users] Slow OCFS2 file creation / how to make ocfs2 generally faster?

2011-08-22 Thread Sunil Mushran
On 08/22/2011 09:08 AM, Stefan Priebe - Profihost AG wrote: > HI, > > here are all values. Just a side note all machines had a fresh reboot. > So these values are not right "after" the test. > Network latency: cat /sys/kernel/debug/o2net/stats > 1,3,992696,1284798776,6583557287,129962217

Re: [Ocfs2-users] Slow OCFS2 file creation / how to make ocfs2 generally faster?

2011-08-22 Thread Sunil Mushran
On 08/22/2011 09:01 AM, Stefan Priebe - Profihost AG wrote: > >> What features do you have enabled on disk? >> debugfs.ocfs2 -R "stats" /dev/sdX > > The disks were formated using max-features flag. The output looks like > this: > Revision: 0.90 > Mount Count: 0 Max Mount Count: 20

Re: [Ocfs2-users] Slow OCFS2 file creation / how to make ocfs2 generally faster?

2011-08-22 Thread Sunil Mushran
On 08/21/2011 11:36 PM, Stefan Priebe - Profihost AG wrote: > Hi Guys, > > all in all ocfs2 is a nice piece of software and works very well. Last > week i made some benchmarks and was thinking if there is a way to make > it faster. > > Here are some results perhaps someone can comment them: > > XFS

Re: [Ocfs2-users] IO performance appears slow

2011-08-19 Thread Sunil Mushran
ile in the physical world? Is there a sweet spot for network latency that I should strive for? The user guide only makes mention of 'low latency' but lacks figures save for heartbeat and timeouts. -nick *From:*Sunil Mushran [mailto:sunil.mush...@oracle.com] *Sent:* Friday, August 19, 201

Re: [Ocfs2-users] IO performance appears slow

2011-08-19 Thread Sunil Mushran
from o2net not equivalent to a simple ping between the hosts? Is my reported latency too great for OCFS2 to function well? Thanks for your assistance. -Nick *From:*Sunil Mushran [mailto:sunil.mush...@oracle.com] *Sent:* Thursday, August 18, 2011 10:26 PM *To:* Nick Geron *Cc:* ocfs2-

Re: [Ocfs2-users] IO performance appears slow

2011-08-18 Thread Sunil Mushran
The network interconnect between the vms is slow. What would have helped is the sys and user times. But my guess is that that is low. Most of it is spent in wall time. In mainline, o2net dumps stats showing the ping time between nodes. Unfortunately this kernel is too old. On 08/18/2011 04:24 PM

Re: [Ocfs2-users] OCFS2 unmount problems after online resize

2011-07-25 Thread Sunil Mushran
The umount and the hb stop threads are deadlocking on the s_umount lock. This problem is due to the local heartbeat scheme employed in which the hb device is the same as the mounted one. umount triggers hb stop which calls open() => ... => rescan_partitions() => ... => get_super() => down_read().

Re: [Ocfs2-users] sudden crash, possibly OCFS was the cause?

2011-07-22 Thread Sunil Mushran
The log is not complete. It is best to configure netconsole/kdump/etc to capture the full oops trace. Having said that, the following patch fits the issue best. Available in releases after 1.4.7. http://oss.oracle.com/git/?p=ocfs2-1.4.git;a=commitdiff;h=adbd097b5bdc15c999bc04b16c6fba379cd5d3f2 Y

Re: [Ocfs2-users] Slow umounts on SLES10 patchlevel 3 ocfs2

2011-07-14 Thread Sunil Mushran
587471 resources > 585462C2FA5A428D913A3CBDBC77E116: 20 resources > > > Let me know if you need more information. > > Thanks > Marc. > - "Sunil Mushran" wrote: > >> It was designed to run in prod envs. >> >> On 07/07/2011 12:21 AM, Marc G

Re: [Ocfs2-users] reset all ocfs2 data

2011-07-11 Thread Sunil Mushran
If you've rebooted, then there is not much more to do. # /sbin/lsmod | grep ocfs2 # egrep "ocfs2|dlm" /proc/slabinfo After shutting down o2cb, run the above commands. The first one lists the modules. The second lists the slabs. Both should show no entries. Did you file a bugzilla for this? If no
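
The two checks quoted in this reply can be wrapped in a short script. This is a sketch only: the function names `count_matches` and `ocfs2_leftovers_clean` are illustrative, the inputs are passed as files so the logic can be tried on saved output, and it assumes, as the message says, that a clean state means no matching lines in either the module list or /proc/slabinfo.

```shell
#!/bin/sh
# count_matches PATTERN FILE: print how many lines of FILE match the
# extended regex PATTERN (prints 0 when nothing matches).
count_matches() {
    grep -E -c "$1" "$2" || true
}

# ocfs2_leftovers_clean MODULES_FILE SLAB_FILE: print both counts and
# succeed only if no ocfs2 modules and no ocfs2/dlm slabs are listed.
ocfs2_leftovers_clean() {
    mods=$(count_matches 'ocfs2' "$1")
    slabs=$(count_matches 'ocfs2|dlm' "$2")
    echo "modules=$mods slabs=$slabs"
    [ "$mods" -eq 0 ] && [ "$slabs" -eq 0 ]
}
```

After shutting down o2cb, one possible use is `/sbin/lsmod > /tmp/lsmod.out` followed by `ocfs2_leftovers_clean /tmp/lsmod.out /proc/slabinfo`; reading /proc/slabinfo may require root.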

Re: [Ocfs2-users] Slow umounts on SLES10 patchlevel 3 ocfs2

2011-07-07 Thread Sunil Mushran
It was designed to run in prod envs. On 07/07/2011 12:21 AM, Marc Grimme wrote: > Sunil, > can I query those figures during runtime of a productive cluster? > Or might it influence the availability performance what ever? > > Thanks for your help. > Marc. > - &

Re: [Ocfs2-users] Slow umounts on SLES10 patchlevel 3 ocfs2

2011-07-06 Thread Sunil Mushran
umount is a two step process. First the fs frees the inodes. Then the o2dlm takes stock of all active resources and migrates ones that are still in use. This typically takes some time. But I have never heard of it taking 45 mins. But I guess it could be if one has a lot of resources. Let's start by

Re: [Ocfs2-users] inotify

2011-06-30 Thread Sunil Mushran
This is because we have not hooked up inotify to the cluster stack. On 06/30/2011 07:26 AM, Jeroen Koekkoek wrote: > Hi, > > I'm running a 2 node OCFS2 + DRBD cluster to host maildirs. The IMAP server > (Dovecot) uses inotify to track changes to the maildir, and informs the > client when changes

Re: [Ocfs2-users] GPF when mounting second device in same cluster

2011-06-29 Thread Sunil Mushran
Strange. Both udevd and mount thread encountered issue in memory allocation routine. I would suggest you ping the kernel vendor. This looks more than just the fs. On 06/28/2011 04:36 PM, Richard Pickett wrote: Gents, OK, back to the single cluster, 2-node, w/ 3 devices. Here's my cluster.conf c

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread Sunil Mushran
c7 40 > 04 00 00 00 00 c3 56 53 8b 70 04 eb 2c 8b 5e 04 83 eb 1c<8b> 43 18 8d 53 04 > e8 6d 3d fc ff 8b 03 e8 a8 12 ff ff 8d 46 08 > > - Original Message ----- > From: "B Leggett" > To: ocfs2-users@oss.oracle.com > Sent: Wednesday, June 29, 2011 3:42:42 P

Re: [Ocfs2-users] OCFS2 Crash

2011-06-29 Thread Sunil Mushran
1.2.1? That's 5 years old. We've had a few fixes since then. ;) You have to catch the oops trace to figure out the reason. And one way to get it is by using netconsole. Check the sles10 docs to see how to configure netconsole. Or, whatever is recommended for capturing the oops log in that release. O

Re: [Ocfs2-users] Heartbeat stays active & stops o2cb shutdown

2011-06-28 Thread Sunil Mushran
C 3000, Australia > +61 3 9623 5488 | Mobile +61 0402 885 057 | _chris.shave@mercer.com_ > <mailto:chris.sh...@mercer.com> > _www.mmc.com_ <http://www.mmc.com/> > Working Hours: > Mon-Fri: 8:00am-4:00pm AEST > > > -

Re: [Ocfs2-users] how to do rolling upgrade the ocfs2 2 node cluster along with new kernel without application outage.

2011-06-28 Thread Sunil Mushran
You have to be more specific than that. Maybe best if you ping support. On 06/28/2011 09:26 AM, veeraa bose wrote: Hi ALL, I have to do rolling upgrade on two node ocfs2 cluster, patch the server one by one, with out application outage. I tested in pre-prod 2 node cluster, once the DB is stop

Re: [Ocfs2-users] multiple cluster doesn't work

2011-06-28 Thread Sunil Mushran
nk? > > > Thanks and God Bless, > > Richard W. Pickett, Jr. > www.MyHaitianAdoption.org <http://www.MyHaitianAdoption.org> > > P.S. Have you downloaded the journal from my trip to Haiti: > > http://www.myhaitianadopti

Re: [Ocfs2-users] multiple cluster doesn't work

2011-06-28 Thread Sunil Mushran
On 06/28/2011 08:07 AM, Richard Pickett wrote: > 1 Terabyte. We have 3 1Terabyte drives. They are already being replicated by > the lower-layer architecture, so we don't need to raid them. We'd like to be > able to use all three devices at the same time for archive purposes. > > I'm surprised to

Re: [Ocfs2-users] multiple cluster doesn't work

2011-06-27 Thread Sunil Mushran
om my trip to Haiti: http://www.myhaitianadoption.org/trips/journal-earthquake-rescue-jan-23-feb-2/ On Mon, Jun 27, 2011 at 9:05 PM, Sunil Mushran mailto:sunil.mush...@oracle.com>> wrote: Whereas the cluster.conf allows users to define multiple clusters, only one cluster can be active at any ti

Re: [Ocfs2-users] multiple cluster doesn't work

2011-06-27 Thread Sunil Mushran
Whereas the cluster.conf allows users to define multiple clusters, only one cluster can be active at any time. The bug you ran into has probably been fixed. The link has been posted in the bz. Why do you need multiple clusters active concurrently? On 06/27/2011 04:44 PM, Richard Pickett wrote: W

Re: [Ocfs2-users] Heartbeat stays active & stops o2cb shutdown

2011-06-27 Thread Sunil Mushran
So by default, the hb is supposed to stop on umount. Do: # find /sys/kernel/config/cluster//heartbeat/* -type d | xargs basename 77D95EF51C0149D2823674FCC162CF8B This will list the active heartbeats. For each hb, do: # ocfs2_hb_ctl -I -u 77D95EF51C0149D2823674FCC162CF8B 77D95EF51C0149D2823674F

Re: [Ocfs2-users] Kernel oops: ocfs2_read_blocks

2011-06-24 Thread Sunil Mushran
How many nodes? Does it happen on all the nodes or one in particular? Are you running the same kernel version on all nodes? Did this issue start reproducing after some update? How often does it happen? Maybe best if you file a bugzilla on oss.oracle.com/bugzilla and answer the qs there. This could

Re: [Ocfs2-users] ocfs2 with cman luster stack

2011-06-23 Thread Sunil Mushran
So this is ubuntu 11.04. The qs is: has anyone using that distro/version gotten this to work? If not, then one possibility is a build issue. Maybe file a bug with ubuntu to see if they have tested it with their binaries. On 06/23/2011 01:55 PM, charles wrote: hello, i opened a bug on the ocfs2 bu

Re: [Ocfs2-users] number of nodes is LUN dependent or cluster group dependent ?

2011-06-21 Thread Sunil Mushran
On 6/21/2011 9:41 PM, Thomas Lau wrote: > As title, because LUN A on cluster group 1 is using N=8, if I have new > LUN B, can I set number of node to something bigger and join same > cluster group 1? Yes. ___ Ocfs2-users mailing list Ocfs2-users@oss.ora

Re: [Ocfs2-users] Unable to umount a filesystem - OCFS still thinks it has it mounted?

2011-06-16 Thread Sunil Mushran
Check /proc/mounts. That's the kernel's view of the mounts. mount looks at /etc/mtab. And ocfs2 1.2 adds and removes entries in /proc/fs/ocfs2 during mount/umount. Also, see if there are relevant errors in dmesg. On 06/16/2011 07:16 PM, Neil Campbell wrote: Hi all, Not sure what has happened but

Re: [Ocfs2-users] Any suggestions how to copy between two OCFS2 volumes faster?

2011-06-15 Thread Sunil Mushran
Try "dd bs=1M iflag=direct" on a few files. See if that helps. On 06/15/2011 01:00 PM, Ulf Zimmermann wrote: > I need to copy a number of volumes from one SAN to another SAN. Most of our > volumes are snapclone based, so moving those has been easy. But we got > several 700GB volumes, which I can'

Re: [Ocfs2-users] ocfs2 slow write performance on Linux 2.6.38

2011-06-12 Thread Sunil Mushran
What type of writes are these... sequential or random? On 6/12/2011 5:37 PM, fibrer...@gmail.com wrote: > Hello all, > > I am benchmarking OCFS2 in a single node environment to see how its > performance stacks up against other Linux file systems. My hardware is > dual CPU, 6-cores per CPU, 2.4GHz

Re: [Ocfs2-users] Errors about a hole in an inode, not fixed by fsck.ocfs2

2011-06-08 Thread Sunil Mushran
On 06/08/2011 03:36 PM, Herman wrote: > Hi all, > > Using: RHEL 6 / DRBD 8.3.10-2 kmod from ElRepo / OCFS2 compiled from > Redhat's kernel source 2.6.32-71.18.2.el6.x86_64 > > I have a system running DRBD with OCFS2. The OCFS2 filesystem is not > being used for databases. I had a split-brain due

Re: [Ocfs2-users] ocfs2 writing files bigger than 4MB

2011-06-08 Thread Sunil Mushran
Upgrade to a more recent kernel (2.6.35+). Upgrade ocfs2-tools to 1.6.x. Run tunefs.ocfs2 and enable feature discontig-bg. This will address this issue. On 06/08/2011 02:16 PM, Osvaldo Alvarez Pozo wrote: > Hi > I can not copy files bigger than 4Mbytes! > I have an ocfs2 cluster with 4 nodes usin

Re: [Ocfs2-users] cannot write to filesystem, permission denied?

2011-06-07 Thread Sunil Mushran
On 06/07/2011 05:01 AM, Sven Karlsson wrote: > Hello, > > We have installed Fedora 15 to get the latest ocfs2 release nicely > packaged in a 2.6.38 kernel and ocfs2-tools 1.6.3. > Setup went fine, mkfs.ocfs2 went fine, a cluster was created and the > local node added: > > # o2cb_ctl -C -i -n myclus

Re: [Ocfs2-users] Problems with descriptions.

2011-06-02 Thread Sunil Mushran
That's the number of files open on the system. So this looks like an app problem. Some app has many files open. On 06/01/2011 10:37 PM, Vasyl S. Kostroma wrote: Hi guys! I can’t find an answer in google, so my last hope i

Re: [Ocfs2-users] mkfs.ocfs2 optimal options for web server

2011-06-01 Thread Sunil Mushran
On 06/01/2011 02:03 AM, Alex Sobrino wrote: > We're planning a three web server cluster based on OCFS2. Basically, it > will handle a huge CMS, with lots of PHP code, and some file uploads > (but mainly file reads). > > Initially, I was thinking in: > > - Block size 4K > - Cluster size 4K > - Node

Re: [Ocfs2-users] Large Files Hang Server

2011-05-25 Thread Sunil Mushran
If your apps do not care about atime, then noatime is helpful. data=writeback should perform better than data=ordered. But there is a small chance of files having trailing nulls if a node were to reboot after a journal commit but before a data flush. This is documented in the manpages and the us

Re: [Ocfs2-users] Large Files Hang Server

2011-05-24 Thread Sunil Mushran
fer will hand ls -l when not on the node doing > the transfer. > > I am starting to think this is expected behaviour. Am I correct? > > +---+ > + Keith + > +-------+ > > On Tue, 24 May 2011, Sunil Mu

Re: [Ocfs2-users] Ocfs and ASM

2011-05-24 Thread Sunil Mushran
There should be no conflict. On 05/24/2011 11:32 AM, Keith W wrote: > I have a lab system that is currently running Oracle RAC 11g > with ASM volumes and grid infrastructure > > Is it possible to have an ocfs2 cluster running and accessing > a different disk as well as the oracle clustering for RA

Re: [Ocfs2-users] Large Files Hang Server

2011-05-24 Thread Sunil Mushran
/2011 11:45 AM, Keith W wrote: > No change in behavior. > My mount options > /dev/sdj1 /u03ocfs2 _netdev,noatime,data=writeback,nointr 0 0 > > +---+ > + Keith + > +---+ > > On Tue,

Re: [Ocfs2-users] Large Files Hang Server

2011-05-24 Thread Sunil Mushran
Repeat the same test but with volumes mounted with data=writeback mount option. mount -o data=writeback /dev/sdX /path On 05/24/2011 07:11 AM, Keith W wrote: > Hello list. > Apologies in advance, this may be a bit long. Just trying to give > as much info as I can at the outset. > > I have a two n

Re: [Ocfs2-users] OCFS2 1.6 for RHEL?

2011-05-17 Thread Sunil Mushran
"unbreakable linux kernel" tree: > > http://oss.oracle.com/git/?p=linux-2.6-unbreakable.git;a=blob;f=fs/ocfs2/ver.c;h=8da71cb480f9cdd4cb0ff67f70a9b863f62b9a8b;hb=HEAD > > It was bumped in september last year... > > [~/linux-2.6-unbreakable]$ git log fs/ocfs2/ver.c > co

Re: [Ocfs2-users] fsck.ocfs2

2011-05-16 Thread Sunil Mushran
Set up a netconsole server to catch oops log. On 5/16/2011 3:22 AM, Xavier Diumé wrote: I don't know if is it possible, but kernel panic error is not in /var/log/kern.log. 2011/5/13 Sunil Mushran <mailto:sunil.mush...@oracle.com>> Please do not remove the cc-s.

Re: [Ocfs2-users] OCFS2 1.6 for RHEL?

2011-05-13 Thread Sunil Mushran
to make successful products IMHO. > > /Kristian > > > Sunil Mushran wrote 2011-05-13 18:46: >> Support is a whole different ballgame. I am only talking >> about availability. And I interpreted that qs to be asking >> whether ocfs2 1.6 be available for the standard rh

Re: [Ocfs2-users] fsck.ocfs2

2011-05-13 Thread Sunil Mushran
know if it is the better way, but is the only that I've used succesfully. 2011/5/13 Sunil Mushran mailto:sunil.mush...@oracle.com>> On 05/13/2011 11:44 AM, Xavier Diumé wrote: Hello, Is it possible to fsck a mounted filesystem. When one of the cluster nodes r

Re: [Ocfs2-users] fsck.ocfs2

2011-05-13 Thread Sunil Mushran
On 05/13/2011 11:44 AM, Xavier Diumé wrote: > Hello, > Is it possible to fsck a mounted filesystem. When one of the cluster nodes > reboots because a kernel panic, the device requires fsck.ocfs2 because in > mounted.ocfs2 -f rebooted node is shown. If mounted.ocfs2 -f shows the rebooted node, th

Re: [Ocfs2-users] OCFS2 1.6 for RHEL?

2011-05-13 Thread Sunil Mushran
t - HP-Oracle Competency Center > Americas Shared Solutions Architecture (SSA) > Hewlett-Packard Company > 281 475 8632 / Tel > kei...@hp.com / Email > Reach the team at s...@hp.com > > -Original Message- > From: ocfs2-users-boun...@oss.oracle.com > [mailto:ocf

Re: [Ocfs2-users] OCFS2 1.6 for RHEL?

2011-05-13 Thread Sunil Mushran
On 05/13/2011 03:13 AM, Kristian Jörg wrote: > Hello! > > When is it planned ocfs2 1.6 will be available for RHEL? > > /Kristian No plans. Only OL/UEK. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-

Re: [Ocfs2-users] Re: Re: hi,if i can post ocfs2-dmesg to you?i have trouble on ocfs2

2011-05-13 Thread Sunil Mushran
00912f71f > 40f165f416bde747d85cdf71bc9dde700912f71f tags/v2.6.35-rc6~34^2~13] > > Would you give me a patch or URL? > > I only know kernel.org package > > thanks > > *From:* Sunil Mushran [mailto:sunil.mush...@oracle.com] > *Sent:* May 13, 2011 10:09 > *To:* Longguang Yue > *Cc:* ocfs2-users@oss.ora

Re: [Ocfs2-users] Re: hi,if i can post ocfs2-dmesg to you?i have trouble on ocfs2

2011-05-12 Thread Sunil Mushran
On 05/12/2011 06:50 PM, Longguang Yue wrote: > > Thank you first of all > > totally, there are 5 kinds of error occur. > > Spinlock leads to cpu lockup, o2net modules panic, kernel BUG at > mm/slub.c:2969, BUG unable to handle kernel paging request at addr > > My environment: kernel-2.6.32.23 + xe

Re: [Ocfs2-users] How to change node name ?

2011-05-12 Thread Sunil Mushran
It is a manual process until 1.6. The upcoming release of tools (1.8) will allow online modification and removal. On 05/12/2011 05:02 AM, Thomas Lau wrote: Guys, how could I change a node name and delete nodes after adding them to the cluster?

Re: [Ocfs2-users] Server hang after error

2011-05-09 Thread Sunil Mushran
Your config is sufficient. Hard to say why it did not reboot. Ping the debian mailing list to see if there are reports of the same on whatever kernel you are on. As far as the reason for it goes, there should have been a message just prior to the Kernel Panic message. Most likely reason is that i

Re: [Ocfs2-users] read/write performance across cluster

2011-05-04 Thread Sunil Mushran
On 05/04/2011 09:56 AM, Florin Andrei wrote: > On 05/04/2011 09:44 AM, Srinivas Eeda wrote: >> Yes, there is locking involved. Extending a file needs an exclusive >> lock. Grepping a file needs a read lock. If the same node (let's call it >> the writer node) does extending and grepping, then grep already h

Re: [Ocfs2-users] Kernel Feature List?

2011-05-03 Thread Sunil Mushran
man mkfs.ocfs2 is better. On May 3, 2011, at 6:24 PM, Tiger Yang wrote: > On 05/02/2011 03:57 PM, Stefan Priebe - Profihost AG wrote: >> Hi, >> >> is there a list available which ocfs2 feature is available at which >> vanilla kernel version? >> >> Stefan > Hi, > > There is one list for mainl

Re: [Ocfs2-users] How long for an fsck?

2011-04-23 Thread Sunil Mushran
On 04/23/2011 12:24 AM, Josep Guerrero wrote: > Hello, > How long did the debugfs output take? >>> I think about 30 minutes. No more than 50 for sure (just by looking at >>> the times of the mails). >>> Did fsck eventually finish? >>> No. I had to cancel it after it stayed 24 hours in the

Re: [Ocfs2-users] can't mount device

2011-04-22 Thread Sunil Mushran
Is this during boot or is the mount manual? Does it succeed on the second attempt? On 04/22/2011 06:33 AM, Christophe BOUDER wrote: > Hello, > I'm running ocfs2 on 27 nodes > with 2 devices (2 fiber channel disk array storage) > on a debian system > vanilla kernel 2.6.38.2 > ocfs2-tools 1.6.3-1 > >

Re: [Ocfs2-users] How long for an fsck?

2011-04-22 Thread Sunil Mushran
On 04/22/2011 02:33 PM, Sunil Mushran wrote: > On 04/21/2011 10:46 AM, Josep Guerrero wrote: >> Hello again, >> >> It just finished. The output file is almost 9 MB long, but compressed is less >> than 1 MB. I attach it to the message. >> >>> Do: >>

Re: [Ocfs2-users] How long for an fsck?

2011-04-22 Thread Sunil Mushran
On 04/21/2011 10:46 AM, Josep Guerrero wrote: > Hello again, > > It just finished. The output file is almost 9 MB long, but compressed is less > than 1 MB. I attach it to the message. > >> Do: >> # debugfs.ocfs2 -R "stat //global_bitmap" /dev/hidrahome/lvol0 >> >> Does this hang too? Redirect the o

Re: [Ocfs2-users] How long for an fsck?

2011-04-21 Thread Sunil Mushran
On 04/21/2011 06:43 AM, Josep Guerrero wrote: > I have a cluster with 8 nodes, all of them running Debian Lenny (plus some > additions so multipath and Infiniband work), which share an array of 48 1TB > disks. Those disks form 22 pairs of hardware RAID1 (plus 4 spares). The first > 21 pairs are or
