Re: [Ocfs2-users] ocfs2 on VMware workstation

2008-05-20 Thread Sunil Mushran
What you are missing is a shared (virtual) disk. If you believe that a disk is so shared, then format it on one guest and then issue the following command on both. $ debugfs.ocfs2 -R "stats" /dev/sdX | grep UUID You should get the same UUID on both guests. Once this works, then attempt to mount
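The UUID check above can be sketched as a small comparison, assuming the `debugfs.ocfs2 ... | grep UUID` output from each guest has been saved to a file and both files copied to one machine. The function name `same_uuid` and the file names are hypothetical, not part of ocfs2-tools.

```shell
#!/bin/sh
# Hypothetical helper: compare the UUID lines captured on two guests.
# Each file is assumed to hold the output of:
#   debugfs.ocfs2 -R "stats" /dev/sdX | grep UUID
same_uuid() {
    # Collapse runs of spaces/tabs so formatting differences do not matter.
    a=$(tr -s ' \t' ' ' < "$1")
    b=$(tr -s ' \t' ' ' < "$2")
    # Succeed only if both are non-empty and identical.
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

Usage might look like `same_uuid guest1.txt guest2.txt && echo "same disk"`. A mismatch would suggest the guests are not seeing the same virtual disk.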

Re: [Ocfs2-users] OCFS2 + NFS setup deadlocking

2008-05-21 Thread Sunil Mushran
If the hang you see is after a node (with a mounted ocfs2 volume) dies, then it is a known one. This specific recovery bug was introduced in 1.2.7 and fixed in 1.2.8-2. 1.2.8-SLES-r3074 maps to 1.2.8-1. The fixed one should be version r3080 or more. If so, upgrade to the latest SLES10 SP1 kernel.

Re: [Ocfs2-users] huge "something" problem urgent

2008-05-23 Thread Sunil Mushran
Any errors in the /var/log/messages? Any busy locks: http://oss.oracle.com/~smushran/.debug/scripts/scanlocks $ scanlocks If so, dump them on all nodes using: $ echo R List of domains can be gotten from this: http://oss.oracle.com/~smushran/.debug/scripts/listdomains Alexandre Racine wrote: >

Re: [Ocfs2-users] huge "something" problem

2008-05-27 Thread Sunil Mushran
Alexandre Racine wrote: > Excellent, that works great! Now that I have the locks and the domain > name what should I do to unlock them? (Or fix the problem). > The locking and unlocking is handled by the dlm. I'm working on updating the wikis with more information on debugging such issues. For t

Re: [Ocfs2-users] CRS/CSS and OCFS2

2008-05-27 Thread Sunil Mushran
AFAIK: a. There is no force umount in Linux. b. There is no way to know whether a local fs is mounted on another node. Luis Freitas wrote: > Alexandra, > >You could use only CRS and ext3 instead of ocfs2 for this kind of > use. You would need to register a script to force umount the > filesy

Re: [Ocfs2-users] CRS/CSS and OCFS2

2008-05-27 Thread Sunil Mushran
Lazy umount is not the same as forced umount. The processes that have active descriptors will keep reading and writing to the fs. The processes are not killed. Luis Freitas wrote: > Hmm, > > There is a "lazy" umount: > > >-l Lazy unmount. Detach the filesystem from the filesystem >

Re: [Ocfs2-users] OCFS2 & Sparse image files?

2008-05-28 Thread Sunil Mushran
http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2-new-features.html Added in 2.6.22. It will be available for enterprise kernels with ocfs2 1.4. 1.4 has just started shipping with sles10 sp2. It will be available for (rh)el5 u2 in a month's time. Matthew Barr wrote: > Does ocfs2 support

Re: [Ocfs2-users] huge "something" problem

2008-06-02 Thread Sunil Mushran
That means o2dlm is not the cause for the process hang. Next, run ps: $ ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN Run it 6 times at 10 sec intervals. This should tell us the processes in D state and the location in kernel. Hopefully. The next step is to get the stack trace. alt-sysrq-t. T
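A minimal sketch of the sampling step described above, assuming a POSIX shell. The function names `d_state_only` and `sample_ps` are hypothetical; only the `ps` invocation comes from the message.

```shell
#!/bin/sh
# Keep the header line plus processes whose STAT column (field 2)
# starts with D, i.e. uninterruptible sleep. Reads ps output on stdin.
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Hypothetical wrapper: take COUNT samples, INTERVAL seconds apart
# (defaults: 6 samples, 10 seconds, as suggested in the message).
sample_ps() {
    count=${1:-6}
    interval=${2:-10}
    i=1
    while [ "$i" -le "$count" ]; do
        date
        ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN | d_state_only
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

Calling `sample_ps > /tmp/ps-samples.txt` would collect all six samples in one file. A process that shows the same wait channel in every sample is the likely candidate to look at in the stack trace.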

Re: [Ocfs2-users] huge "something" problem

2008-06-02 Thread Sunil Mushran
; jfsCommit jfs_lazycommit > 332 S< jfsSync jfs_sync > 333 S< xfslogd/0 worker_thread > 334 S< xfslogd/1 worker_thread > 335 S< xfslogd/2 worker_thread > 336 S< xfslogd/3 worker_thread > 337 S< xfsda

Re: [Ocfs2-users] CRS/CSS and OCFS2

2008-06-05 Thread Sunil Mushran
Stop o2cb and switch the node numbers in /etc/ocfs2/cluster.conf. After changing on both, restart o2cb on both. [EMAIL PROTECTED] wrote: > > Hi Sunil, > > my lotus notes choked on the table from excel... So the two nodes have > the following nodenumbers: > Node ocfs2 crs/css > byaz05 0 2 > byaz10 1 1 >

Re: [Ocfs2-users] OCFS2 Threads using 100% CPU, filesystem operations were frozen

2008-06-07 Thread Sunil Mushran
What version/kernel are you running? We are about to release 1.2.9 that addresses one known case of o2net consuming 100% cpu. Due out next week. Michael S. Moody wrote: > > I had an instance today on several servers where the load average > soared, and all of my apache processes were in uninterr

Re: [Ocfs2-users] OCFS2 Threads using 100% CPU, filesystem operations were frozen

2008-06-09 Thread Sunil Mushran
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=c824c3c723f2e37a00b3b739a55b28de595fd72e Michael Moody wrote: > Ocfs2-tools version 1.3.9 > Kernel 2.6.24-gentoo-r8 > > Michael > > Sunil Mushran wrote: > >> What version/kernel are you

Re: [Ocfs2-users] OCFS2 Threads using 100% CPU, filesystem operations were frozen

2008-06-09 Thread Sunil Mushran
The one I am aware of is o2net spinning at 100% before the node is fenced due to cluster timeout. Michael Moody wrote: > What ramifications could this bug have? > > Random filesystem lockups, file locks not being released? > > Michael > > Sunil Mushran wrote: > >

[Ocfs2-users] OCFS2 1.2.9-1 for RHEL4 and RHEL5 released

2008-06-09 Thread Sunil Mushran
All, We are pleased to announce the release of OCFS2 1.2.9-1 for RHEL4 and RHEL5 on x86, x86_64, ppc64 and ia64 architectures. This release includes bug fixes, most of which have been backported from the mainline kernel. Some of the more interesting ones have been described in detail. For the ful

Re: [Ocfs2-users] mystery reboot

2008-06-10 Thread Sunil Mushran
http://oss.oracle.com/bugzilla/show_bug.cgi?id=919 Fixed in 1.2.9-1. SUSE has the fix checked into their tree. It should be out soon with sles10 sp1. Charlie Sharkey wrote: > > On a two node cluster I got a reboot (core dump) on the first node. > The /var/log/messages doesn't show anything wron

Re: [Ocfs2-users] errors in logs...

2008-06-17 Thread Sunil Mushran
Yes. I am assuming this was from the time you had to force shutdown the servers. If so, ignore. Alexandre Racine wrote: > Hi all, > > I had a drive freeze and after shutting down all servers and starting > them one by one I had this in the logs. What does this tells you? > Network problems? Thanks

Re: [Ocfs2-users] errors in logs...

2008-06-17 Thread Sunil Mushran
The only relevant message is the first one that indicates that srv2 has not heard from srv1 for 30 secs. The rest of the messages are because the link broke and are more informative than errors. Alexandre Racine wrote: > Well this was today, but yes before I had to force shutdown the > machines. >

Re: [Ocfs2-users] OCFS2 inside Virtuozzo nodes

2008-06-18 Thread Sunil Mushran
The ocfs2 1.x packages are only meant for sles and (rh)el and not other distributions. 2.6.18 and 2.6.16 shipped with (rh)el and sles respectively are not the same as the original mainline tree. So building it will require some development work. One option for you is to ping the kernel vendor and

Re: [Ocfs2-users] mkfs.ocfs2: double free or corruption

2008-06-18 Thread Sunil Mushran
File a bugzilla and attach the coredump. One thing to try would be reducing the number of slots. [EMAIL PROTECTED] wrote: > Dear Srs, > > I get this error when running "mkfs.ocfs2": > > = > # mkfs.ocfs2 -b 4K -C 32K -

Re: [Ocfs2-users] ocfs2 1.2.8 issues

2008-06-18 Thread Sunil Mushran
http://oss.oracle.com/projects/ocfs2/news/article_18.html This is oss bugzilla#919 that has been fixed in 1.2.9-1. Saranya Sivakumar wrote: > Hi, > > We recently upgraded ocfs2 to 1.2.8 from 1.2.3 on our 4 node RAC > production systems. > > On one of the nodes, we notice the following in the log

Re: [Ocfs2-users] ocfs2 1.2.8 issues

2008-06-18 Thread Sunil Mushran
mar > > > ----- Original Message > From: Sunil Mushran <[EMAIL PROTECTED]> > To: Saranya Sivakumar <[EMAIL PROTECTED]> > Cc: ocfs2-users@oss.oracle.com > Sent: Wednesday, June 18, 2008 3:54:31 PM > Subject: Re: [Ocfs2-users] ocfs2 1.2.8 issues > > http://

Re: [Ocfs2-users] OCFS2 available for Solaris?

2008-06-19 Thread Sunil Mushran
OCFS2 is only available for the Linux kernel. Diane Petersen wrote: > Hello, > > I'm building an 11g RAC 2-node cluster on Solaris 10 (64-bit SPARC). > Is there an OCFS2 version available and where would I find it? > I've looked on the main oss.oracle.com/projects/ocfs2 and it's not > obvious whe

[Ocfs2-users] Heads up regarding using nfs with ocfs2

2008-06-20 Thread Sunil Mushran
All, This is a heads up only for users exporting OCFS2 volumes as NFS mounts. If not, please disregard this email. Recently there was a bugzilla filed that mentioned observing file system lockups when accessing OCFS2 exported volumes with FreeBSD NFS clients. The lockups were not observed by him

Re: [Ocfs2-users] crash during big file transfers

2008-06-23 Thread Sunil Mushran
The fs in 2.6.21 still uses the old very short cluster timeouts. In mainline, the defaults were updated in 2.6.25-ish. The faq has the details on setting them manually. http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT Carlos Xavier wrote: > Dear Srs. > > I have be

Re: [Ocfs2-users] crash during big file transfers

2008-06-23 Thread Sunil Mushran
g for help on compiling the kernel module, so as we > can have a updated one for the kernel 2.6.21.5 and 2.6.24.5 distributed with > Slackware 12.0 and 12.1. > > Tanks, > Carlos Xavier. > > - Original Message - > From: "Sunil Mushran" <[EMAIL PROT

Re: [Ocfs2-users] Invalid argument while mounting

2008-06-24 Thread SUNIL . MUSHRAN
Run fsck to repair that inode. fsck.ocfs2 -f /dev/sdd1 Also, better if you upgrade the fs to 1.2.9-1. --- Begin Message --- I get the following error when trying to mount: Nothing changed (in any case not that I know of). I get the same error from both nodes. Please assist. # mount /u02 moun

Re: [Ocfs2-users] mount readonly without lockmanager

2008-06-24 Thread Sunil Mushran
Sure. You can mark the volume as local (man tunefs.ocfs2 or mkfs.ocfs2) and mount it without the cluster stack (like any local file system). You can use the ro mount option to mount it readonly. Combine the two and you get what you want. BTW, if the fs image is on physical ro media, the fs autom

Re: [Ocfs2-users] mount readonly without lockmanager

2008-06-25 Thread SUNIL . MUSHRAN
Kruyt [EMAIL PROTECTED] Sent: Wednesday, June 25, 2008 8:41 AM To: Sunil Mushran Cc: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] mount readonly without lockmanager Thanks, But the filesystem is on a SAN, and shared arcoss 3 nodes. One of that node I dont wat a lockmanager and the system

Re: [Ocfs2-users] Invalid argument while mounting

2008-06-25 Thread Sunil Mushran
$ dd if=/dev/sdd1 of=/tmp/out count=100 bs=4K Can you file a bugzilla and attach the /tmp/out? Looks like it is unable to read the inode because its signature is off. The above command will dump the first 400K of the device. I want to see what the extent of the corruption is. If it is localized

Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9

2008-06-27 Thread Sunil Mushran
Fedora ships ocfs2 fs modules natively. You don't have to do all this. What is missing is the tools rpm. But the good news is that that should be available any day now literally speaking. Tina Soles wrote: > > Hello, > > I’m brand new to RAC and ocfs2. I need to install ocfs2, but there is >

Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9

2008-06-27 Thread Sunil Mushran
ply. Can you be more specific and give me the exact > name of the native Fedora 9 rpm(s) that I need for ocfs2 and ocfs2-tools? > Thanks. > ---- > *From:* Sunil Mushran [mailto:[EMAIL PROTECTED] > *Sent:* Fri 6/27/

Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9

2008-06-29 Thread SUNIL . MUSHRAN
ROTECTED] Sent: Sunday, June 29, 2008 10:16 PM To: Tina Soles Cc: Sunil Mushran; ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9 Hi Tina, Sorry, I am a not RAC expert. So you may have to wait for Sunil's suggestion for it. If you

Re: [Ocfs2-users] Fence abnormal and with not apparent reason

2008-06-30 Thread Sunil Mushran
Could be due to bugzilla#919 as explained in the list of fixes in 1.2.9-1. http://oss.oracle.com/projects/ocfs2/news/article_18.html Gabriele Di Giambelardini wrote: > Hi, this is my output on all the 5 servers > > Module "configfs": Loaded > Filesystem "configfs": Mounted > Module "ocfs2_nodemana

Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9

2008-06-30 Thread Sunil Mushran
mount(2) man page lists the following reasons for it to return an EBUSY: EBUSY source is already mounted. Or, it cannot be remounted read-only, because it still holds files open for writing. Or, it cannot be mounted on target because target is still busy (it is the working directory of some

Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9

2008-07-01 Thread Sunil Mushran
My recommendation is for you to use an enterprise kernel... (rh)el or sles. For shared storage, use iscsi. sles10 ships with a good iscsi target. Firewire as a shared disk was useful when there was no inexpensive shared disk available. That is no longer the case. Tina Soles wrote: > Sunil, > > I a

Re: [Ocfs2-users] ocfs2 fencing problem

2008-07-01 Thread Sunil Mushran
Upgrade to OCFS2 1.2.9-1 shipping with the latest SLES9 SP4 kernel (2.6.5-7.312). http://download.novell.com/Download?buildid=27kCZ1qWwWo~ You are most likely hitting bug#6680001 as mentioned here. http://oss.oracle.com/projects/ocfs2/news/article_17.html Also, you might want to tone down the he

Re: [Ocfs2-users] Slow backups, slow rsync

2008-07-01 Thread Sunil Mushran
Which kernel? uname -a? block/cluster sizes? debugfs.ocfs2 -R "stats" /dev/sdX How many nodes in your cluster? memory? cat /proc/meminfo Michael Moody wrote: > > I use rsync to take backups of my ocfs2 filesystems (since nothing > else really supports it out of the box). Unfortunately, it’s ve

Re: [Ocfs2-users] Slow backups, slow rsync

2008-07-01 Thread Sunil Mushran
34359738367 kB > VmallocUsed:293184 kB > VmallocChunk: 34359444779 kB > HugePages_Total: 0 > HugePages_Free: 0 > HugePages_Rsvd: 0 > HugePages_Surp: 0 > Hugepagesize: 2048 kB > > 5 nodes in the cluster. > > Michael > > -Ori

Re: [Ocfs2-users] "Propagate Configuration" missing

2008-07-02 Thread Sunil Mushran
What distro, kernel, package versions, etc? http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#DOWNLOAD Check the requirements for the console. You could be missing a package that enables propagate config. [EMAIL PROTECTED] wrote: > I've been attempting to follow the instruct

Re: [Ocfs2-users] ocfs2 fencing problem

2008-07-03 Thread Sunil Mushran
Gabriele Di Giambelardini wrote: > Hi to all, some time ago, I read that the ocfs have a limit for the > subfolder. Is it possible this whren this limit gone exceeded the > ocfs2 have those problem??? > > > Or some boby know the limit number? http://oss.oracle.com/projects/ocfs2/dist/documentat

Re: [Ocfs2-users] ocfs2 limits

2008-07-03 Thread Sunil Mushran
y have hundreds of thousands files > inside the concurrent output and log directories. > > Regards, > Luis > > --- On *Thu, 7/3/08, Sunil Mushran /<[EMAIL PROTECTED]>/* wrote: > > From: Sunil Mushran <[EMAIL PROTECTED]> > Subject: Re: [Ocfs2-users] o

Re: [Ocfs2-users] Howto compile ocfs2-1.3.9-0.1 for CentOS5.2 2.6.18-92.1.6.el5xen

2008-07-04 Thread SUNIL . MUSHRAN
Post the exact command and the error message. --- Begin Message --- Hello, I would like to install OCFS2 on CentOS5.2 with the 2 2.6.18-92.1.6.el5xen kernel. When I try to compile the OCFS2 source for this kernel I see a message nothing todo for rhel5 in the vendor section. What I did was the fol

Re: [Ocfs2-users] Ooops in OCFS2

2008-07-04 Thread SUNIL . MUSHRAN
1.2.5 is a year+ old. Suggest you upgrade to 1.2.9. The oops is bizarre to say the least. I notice you are using xenU kernel. 4 nodes are VMs? Just trying to understand the layout. Is it reproducible? Definitely upgrade to 1.2.9. If the issue reproduces, file a bugzilla with all the details. This

Re: [Ocfs2-users] Howto compile ocfs2-1.3.9-0.1 for CentOS5.2 2.6.18-92.1.6.el5xen +compile messages

2008-07-04 Thread SUNIL . MUSHRAN
./configure --with-src=/usr/src/kernels/2.6.18-92.1.6.el5-x86_64 make --- Begin Message --- > Post the exact command and the error message. This is a selection of the output, I have attached the complete output. [EMAIL PROTECTED] ocfs2-1.3.9]# ./configure checking build system type...

Re: [Ocfs2-users] Question regarding old memory leak

2008-07-07 Thread SUNIL . MUSHRAN
In mainline, that issue was resolved in 2.6.21. We have patches for 2.6.20 but not older than that. Sunil --- Begin Message --- Hi all, I just started with OCFS2 and set up a 2-node cluster where one node is writing and both read from the clustered volume. Currently I'm moving data to the volum

[Ocfs2-users] OCFS2 1.2.9-1 for Novell's SLES9 SP4 and SLES10 SP1 released

2008-07-07 Thread Sunil Mushran
All, This is to inform all SLES users that the latest SLES9 SP4 and SLES10 SP1 kernel errata include OCFS2 1.2.9-1. Users running the older kernel with 1.2.8-1 are urged to upgrade to the current release. For more information on the changes in OCFS2, please refer to the email announcing 1.2.9-1

Re: [Ocfs2-users] ocfs2 datavolume option and oracle

2008-07-08 Thread SUNIL . MUSHRAN
In mainline, the issue was addressed in 2.6.21. In enterprise kernels, the issue was addressed in 1.2.4-2. If you are on (RH)EL4 or (RH)EL5, install 1.2.9-1. If you are on SLES9 SP4 or SLES10 SP1, upgrade to the latest kernel. Sunil --- Begin Message --- From the User's Guide: > Oracle databas

Re: [Ocfs2-users] ocfs2 datavolume option and oracle

2008-07-08 Thread SUNIL . MUSHRAN
RAC is only supported on (RH)EL and SLES. It may work with other distros, but support is a different beast. The datavolume mount option is not in the mainline kernel. From Oracle 10g onwards, the database itself does not require it... as one can set filesystemio_options to directio (init.ora param). That way

Re: [Ocfs2-users] ocfs2 datavolume option and oracle

2008-07-08 Thread Sunil Mushran
[EMAIL PROTECTED] wrote: > By raw are you meaning raw device access without a filesystem like ocfs2 > on the volume for the voting disk? Or am I not following? > Raw means specifying the block device directly. So make two partitions, say, sdd1 and sdd2, and feed that (/dev/sdd1, etc) to the to

Re: [Ocfs2-users] Different size with du and ls

2008-07-10 Thread Sunil Mushran
Email me the following info: $ debugfs.ocfs2 -R "stats" /dev/sdX <== replace with ocfs2 device $ stat /mnt/user/small/11/11wa1.jpg $ stat /data/user/small/11/11wa1.jpg Markus Meyer wrote: > Hi all, > > I stumbled over a curious thing. The Linux tools "df" and "du" aren't > working cor

Re: [Ocfs2-users] Different size with du and ls

2008-07-10 Thread Sunil Mushran
Markus Meyer wrote: > Block Size Bits: 12 Cluster Size Bits: 16 > Links: 0 Clusters: 6707596 So you have a 4TB volume. Correct? Appears mkfs chose 64K as the cluster size. This means the smallest data allocation would be 64K. >File: `/mnt/user/small/11/11wa1.jpg' >Size
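The point about the smallest data allocation can be illustrated with a little arithmetic. This is a sketch only, assuming file data is allocated in whole clusters of 2^(Cluster Size Bits) bytes; the function name `allocated_bytes` is hypothetical.

```shell
#!/bin/sh
# Bytes allocated for a file of SIZE bytes when data is allocated in
# whole clusters of 2^BITS bytes. BITS defaults to 16 (64K clusters),
# matching "Cluster Size Bits: 16" in the stats above.
allocated_bytes() {
    size=$1
    bits=${2:-16}
    cluster=$((1 << bits))
    # Round SIZE up to the next multiple of the cluster size.
    echo $(( (size + cluster - 1) / cluster * cluster ))
}
```

With 64K clusters, `allocated_bytes 1` and `allocated_bytes 65536` both give 65536, while `allocated_bytes 65537` gives 131072. This is why du can report much more than ls for a tree of small files.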

Re: [Ocfs2-users] Fence abnormal and with not apparent reason

2008-07-11 Thread Sunil Mushran
If you are still on 1.2.8-2, then it is a known issue fixed in 1.2.9-1. Gabriele Di Giambelardini wrote: > Hi to all, watching the log by more attention and in the moment when a > node go down, I have this imformation by the kernel about o2net : > > Jul 10 16:52:02 be1 kernel: BUG: soft lockup -

Re: [Ocfs2-users] Node fence on RHEL4 machine running 1.2.8-2

2008-07-14 Thread SUNIL . MUSHRAN
File a bugzilla with the logs of all the machines. /var/log/messages. Meanwhile do schedule an upgrade to 1.2.9-1. We have one fix relating to o2net fencing that could have been in play here. But I'll need to read the full logs to be sure. Sunil --- Begin Message --- Hello, We have a four-node R

Re: [Ocfs2-users] Much higher disk usage in OCFS2 then in XFS

2008-07-15 Thread Sunil Mushran
That's 175 million files. I hope they are spread out across many directories. Our inodes are blocksized. 4k blocksize means 700G of metadata. 2K means 350G. 1K means 175G. AFAIK, XFS has 256 byte inodes. Maybe try 1K blocksize and 8K clustersize. You would be an ideal candidate for the inlined
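The metadata figures above follow from one block per inode. A rough sketch of that arithmetic, using the same loose units as the message (millions of KB reported as "G"); the function name `inode_metadata_g` is hypothetical.

```shell
#!/bin/sh
# Rough size of inode metadata when every inode occupies one block.
# FILES is the file count, BLOCK_KB the block size in KB.
# The result is in the same loose "G" units as the message above
# (millions of KB), so 175 million files at 4K gives 700.
inode_metadata_g() {
    files=$1
    block_kb=$2
    echo $(( files * block_kb / 1000000 ))
}
```

`inode_metadata_g 175000000 4`, `inode_metadata_g 175000000 2` and `inode_metadata_g 175000000 1` give 700, 350 and 175, the three cases quoted in the message.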

Re: [Ocfs2-users] ocfs2 performance and scaling

2008-07-17 Thread Sunil Mushran
Sabuj Pattanayek wrote: > Hi, > > I'm using OCFS2 from 2.6.26 with some patches I made that allow for > the creation of a volume greater than 16TB: > > http://oss.oracle.com/pipermail/ocfs2-devel/2008-July/002568.html > http://oss.oracle.com/pipermail/ocfs2-tools-devel/2008-July/000857.html > > The

Re: [Ocfs2-users] OCFS processes active after a umount [SEC=UNOFFICIAL]

2008-07-21 Thread Sunil Mushran
That is strange. Next time double check the mounts with: $ cat /proc/mounts The mount command prints the entries in /etc/mtab while the /proc/mounts dumps the information from the kernel. If those threads are there, it means the volume is still mounted. Two in this case. The entries in mtab are
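A small sketch for checking the kernel's view of mounted ocfs2 volumes, assuming the usual /proc/mounts layout (device, mount point, file system type, ...). The function name `ocfs2_mounts` is hypothetical.

```shell
#!/bin/sh
# List ocfs2 entries (device and mount point) from a mounts table.
# Defaults to /proc/mounts, which reflects the kernel's view rather
# than the contents of /etc/mtab.
ocfs2_mounts() {
    awk '$3 == "ocfs2" { print $1, $2 }' "${1:-/proc/mounts}"
}
```

Running `ocfs2_mounts` after a umount and getting no output would suggest the kernel no longer has the volume mounted. Passing `/etc/mtab` as the argument allows a side-by-side comparison of the two views.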

Re: [Ocfs2-users] OCFS processes active after a umount [SEC=UNOFFICIAL]

2008-07-22 Thread Sunil Mushran
Did you monitor /proc/mounts as I had suggested? -Original Message- From Mark Schloss <[EMAIL PROTECTED]> Sent Mon 7/21/2008 9:22 PM To Sunil Mushran <[EMAIL PROTECTED]> Cc ocfs2-users@oss.oracle.com Subject Re: [Ocfs2-users] OCFS processes active after a umount [SEC=UNO

Re: [Ocfs2-users] ORA-19870 and ORA-19502 During RMAN restore to OCFS2 filesystem

2008-07-23 Thread Sunil Mushran
Please file an SR with Oracle support. Database issues are best resolved in that forum. From Ed Gulakowski <[EMAIL PROTECTED]> Sent Wed 7/23/2008 7:48 PM To ocfs2-users@oss.oracle.com Subject [Ocfs2-users] ORA-19870 and ORA-19502 During RMAN

Re: [Ocfs2-users] Recommended block size for a mail environment

2008-07-24 Thread Sunil Mushran
To start off, you are using a very old version of the fs/tools. 2+ years old. Upgrading will take care of the -R option. However, the basic problem you are experiencing will remain. As in, ocfs2 uses blocksized inodes. You can reduce the blocksize, but that will result in a loss of throughput as the i

Re: [Ocfs2-users] OCFS processes active after a umount [SEC=UNOFFICIAL]

2008-07-24 Thread Sunil Mushran
rious stages in the test > outlined below. Also, the -n option is not used on the mount. > > Regards > > Mark Schloss > > > Mark Schloss | Oracle DBA | Information Technology | x0013 > > -Original Message- > From: Sunil Mushran [mailto:[EMAIL PROTECTED]

Re: [Ocfs2-users] ocfs2 fencing issue on 1.2.9.1

2008-07-24 Thread Sunil Mushran
Hard for me to diagnose the issue with no logs. Maybe best if you logged a bugzilla with novell and provided them with all the logs. Kuang, Howard [WHQKT] wrote: > > Hi, Sunil, > > > > I upgrade ocfs2 to 1.2.9.1 with the new kernel from Novell. The > fencing problem is still existing. When one

Re: [Ocfs2-users] why does mkfs.ocfs2 take so long?

2008-07-28 Thread Sunil Mushran
Two inits take time. 1. Cluster group init. 2. Journal init. Considering this is a 16TB volume being formatted with 4K/4K block/cluster sizes, it has 127074 cluster groups to initialize. So 127074 4K blocks to initialize. But this bit should be somewhat similar to ext3. Journal initializati

Re: [Ocfs2-users] OCFS2 and VMware ESX

2008-07-28 Thread Sunil Mushran
SLES10 SP2 is shipping OCFS2 1.4. We will be releasing the same for (RH)EL in the coming weeks. -Original Message- From Haydn Cahir <[EMAIL PROTECTED]> Sent Mon 7/28/2008 8:07 PM To ocfs2-users@oss.oracle.com Subject Re: [Ocfs2-users] OCFS2 and VMware ESX Hi Mark, Thanks for your reply. Ho

Re: [Ocfs2-users] Problems building ocfs2 rpm on Fedora 9

2008-07-29 Thread Sunil Mushran
Use an enterprise kernel. Tina Soles wrote: > OK, another snag. Fedora 9 does not support RAW devices, so I can't > configure the voting disk or OCR disk to be as such. Any suggestions? I > think I'm "up a creek" here... > > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PRO

Re: [Ocfs2-users] Filesystem usage after mkfs.ocfs2

2008-07-30 Thread Sunil Mushran
The default mkfs params make 4 slots each with a 256M journal. That's 1G. If you want them smaller, mkfs provides parameters to override the same. Secondly, we compute based on the full device. Most other filesystems deduct the blocks consumed by the fs on creation in their calculation. Arnold Ma
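The 1G figure above is simply one journal per slot. A tiny sketch of that arithmetic; the function name `journal_space_mb` is hypothetical, and the defaults are the ones quoted in the message.

```shell
#!/bin/sh
# Space reserved for journals at format time, in MB:
# one journal per node slot.
# Defaults follow the message above: 4 slots, 256M journal each.
journal_space_mb() {
    slots=${1:-4}
    journal_mb=${2:-256}
    echo $(( slots * journal_mb ))
}
```

`journal_space_mb` gives 1024 (the 1G mentioned), while `journal_space_mb 2 64` gives 128, which shows how much smaller slot counts and journals reduce the space consumed right after formatting.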

Re: [Ocfs2-users] ocfs2 kernel BUG

2008-08-01 Thread Sunil Mushran
No, the kernel is old. A year+ old. Refer to this announcement below. http://oss.oracle.com/pipermail/ocfs2-announce/2008-July/26.html From the stack, it looks like you are encountering the rename/extend race that was fixed a long time ago. http://oss.oracle.com/projects/ocfs2/news/article_14.htm

Re: [Ocfs2-users] ocfs2 node reboot method

2008-08-05 Thread Sunil Mushran
I believe 1.2.5-SLES-r2997 is the version of the fs and not the tools. Meaning, an upgrade is required to the latest kernel that is shipping 1.2.9. As far as failure to mount goes, one reason could be that the default timeout (10 secs) could be low. See if increasing to the new default of 30 secs

Re: [Ocfs2-users] ocfs2 node reboot method

2008-08-06 Thread Sunil Mushran
http://oss.oracle.com/bugzilla/show_bug.cgi?id=838 Check out this bugzilla. Tao Ma wrote: > Hi, > > Masanari Iida wrote: >> Hello Tao and Sunil, >> My case, the symptom (ocfs2 failed to mount a volume using >> /etc/fstab) happened when I reboot the system. >> Even if it failed to mount (by /etc/f

Re: [Ocfs2-users] OCFS2 troubleshooting tools

2008-08-11 Thread Sunil Mushran
ocfs2 home page on oss.oracle.com has support guides. Also, we have a wiki. http://oss.oracle.com/osswiki/OCFS2/Debugging The dlm debugging has been improved in sles10 sp2. We will be soon releasing the details in ocfs2 1.4's user's guide. Make sure you are running the latest sles10 sp1 kernel. h

Re: [Ocfs2-users] dlm domain problem

2008-08-11 Thread Sunil Mushran
Do: $ debugfs.ocfs2 -l DLM ENTRY EXIT allow $ mkdir /dlm/test $ debugfs.ocfs2 -l DLM off ENTRY EXIT deny File a bugzilla and attach the /var/log/messages. Charlie Sharkey wrote: > > > > I'm having a problem creating a dlm domain. The libo2dlm library > returns a 'could not create domain' err

Re: [Ocfs2-users] Version compatibity

2008-08-12 Thread Sunil Mushran
Yes. It is fully ondisk compatible. Read the bit about file system compatibility here. http://oss.oracle.com/pipermail/ocfs2-announce/2008-March/23.html We will be releasing a more formal user's guide soon. Paulo Rodrigues wrote: > Hello, > > is it safe to unmount FS under 1.3.3 and mount un

Re: [Ocfs2-users] Bug in OCFS2 1.3.3

2008-08-13 Thread Sunil Mushran
This could suggest an on disk problem. Have you run fsck.ocfs2 recently? fsck.ocfs2 -f /dev/sdX1 Paulo Rodrigues wrote: > Hello, > > I'm on 2.6.24 with OCFS2 1.3.3 and every couple days this comes up in > dmesg. I have to reboot the cluster machines, there's nothing else I > can do. Stopping t

Re: [Ocfs2-users] Suggestion about Heartbeat

2008-08-13 Thread Sunil Mushran
I am hoping we would not have this problem with the new cluster stacks that are in development, cman and pacemaker. But always good to hear about the issues being encountered by the users. Well, not good... but you know what I mean. Michael Moody wrote: > > I have a suggestion about the heartbeat

Re: [Ocfs2-users] Bug in OCFS2 1.3.3

2008-08-13 Thread Sunil Mushran
It does not look like you used the force option. Or, you ran with the file system mounted. Umount the fs on all nodes and do: $ fsck.ocfs2 -f /dev/dm-1 Paulo Rodrigues wrote: > Hello Sunil, > > fsck says its clean: > > Checking OCFS2 filesystem in /dev/dm-1: > label: /var/lib/dovecot/spool

Re: [Ocfs2-users] Linux-x86_64 Error: 4: Interrupted system call

2008-08-14 Thread Sunil Mushran
This may not be related to ocfs2. Check the oracle doc for the version of the database you are running. It will have all the appropriate kernel settings. Daniel Keisling wrote: > Greetings, > > When attempting to start up a database on an OCFS2 filesystem, Oracle > complains with the following

Re: [Ocfs2-users] Bug in OCFS2 1.3.3

2008-08-15 Thread Sunil Mushran
Please can you file a bugzilla and attach this stack trace. Also attach the output of the following: $ objdump -DSl /lib/modules/`uname -r`/kernel/fs/ocfs2/ocfs2.ko >/tmp/ocfs2.out Paulo Rodrigues wrote: > Got the same error again today. > > BUG: unable to handle kernel NULL pointer dereference

Re: [Ocfs2-users] ocfs2 issue? : unexplained reboots of RHEL 4 server (kernel:2.6.9-42.0.2.ELs)

2008-08-18 Thread Sunil Mushran
Configure a netdump or netconsole server. It will catch the relevant messages. Derek Hazell wrote: > > Dear OCFS2 forum > > We run ocfs2 version 1.2.9-1 as an ocfs2 cluster on four Linux servers > running RHEL 4 (kernel: 2.6.9-42.0.2.ELs) > > We are getting unexpected reboots of one of the L

Re: [Ocfs2-users] OCFS2 random kernel error on Fedora 8

2008-08-18 Thread Sunil Mushran
Thanks. This looks like a new issue. Please log it in the bugzilla. http://oss.oracle.com/bugzilla Wessel wrote: > Hello All, > > I’ve been running a 4-node OCFS2 cluster for about a month now, and recently > I’ve had a total of 3 kernel errors on random nodes. This causes the machine > to lock up

[Ocfs2-users] OCFS2 1.4 is released

2008-08-19 Thread Sunil Mushran
All, We are pleased to announce the release of OCFS2 1.4. This release has been available with Novell's SUSE Linux Enterprise Server (SLES10 SP2) for some time now. Today we are announcing the release of the same for Red Hat's and Oracle's Enterprise Linux (EL5 U2) distributions. Before upgrading

Re: [Ocfs2-users] new server and version ocfs2

2008-08-19 Thread Sunil Mushran
Actually, the mount will fail. The clusterstack detects mismatches during the handshake. The 1.3.9 tools corresponds with the file system that comes with the kernel. It was the best release at that time. However, software development is a constant process. We are adding new features and fixing bug

Re: [Ocfs2-users] Weird messages at kernel.log

2008-08-20 Thread Sunil Mushran
2.6.18 is very very old. We've fixed many bugs since then. Upgrade to a more recent kernel. Earliest I would say 2.6.21. Or anything after that. Dante Garro wrote: > > Hi all! > > I've just configured a new 2 node cluster and I found messages at > kernel.log like the following: > > Aug 19 19:

Re: [Ocfs2-users] formatting and mounting ocfs2 on 2 rac nodes

2008-08-21 Thread Sunil Mushran
What does "mounted.ocfs2 -d" return on the two nodes? Corne Lombard wrote: > > I ran into a problem in section 16 (Install & Configure Oracle Cluster > File System (OCFS2)) of the following article – > > http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi_2.html#16. > > I have 2 R

Re: [Ocfs2-users] ocfs2 issue? : unexplained reboots of RHEL 4 server (kernel:2.6.9-42.0.2.ELs)

2008-08-23 Thread Sunil Mushran
ms to do submit_bio for read > Index 8: took 120303 ms to do waiting for read completion > *** ocfs2 is very sorry to be fencing this system by restarting *** > Bootdata ok (command line is ro root=/dev/VolGroup_ID_12182/LogVol1 > console=ttyS0,9600n8) > > > ##

Re: [Ocfs2-users] VM node won't talk to host

2008-08-25 Thread Sunil Mushran
No, the device names have nothing to do with it. When you mount, mount.ocfs2 kicks off the heartbeat. When other nodes see a new node heartbeating, o2net attempts to connect to the node. That connect is necessary for the mount to succeed. My investigation would start with disk heartbeat. # watch -d -n2

Re: [Ocfs2-users] migration methods (ocfs <-> ocfs2)

2008-08-27 Thread Sunil Mushran
No, the fscat tools can only read certain unmounted file systems. They cannot write. You can use it to copy data from ocfs to ocfs2 on a box running the 2.6 kernel (sles9/10, el4/5). Mehmet Can ÖNAL wrote: > > Hi everyone; > > > > we have a production system with 6 nodes of RAC upon ocfs file sy

Re: [Ocfs2-users] Using OCFS2 with More than Two Nodes

2008-08-29 Thread Sunil Mushran
Define export? Do you want the nodes to be part of a cluster? As in, want local fs semantics across nodes. If so, use a shared device that can accommodate more than 2 nodes. drbd8 is great for what it does but also limits users to 2 nodes. Zack Gilburd wrote: > Hi all, > > I have ocfs2 atop drbd8

Re: [Ocfs2-users] (no subject)

2008-09-01 Thread Sunil Mushran
So in 1.4, we have a much improved debugging infrastructure for such issues. Check out the write-up on dlm debugging in the 1.4 user's guide, in the chapter titled Notes. In short, you have correctly identified the lock resource. But we need to go a step further and get the info from the dlm and see a

Re: [Ocfs2-users] (no subject)

2008-09-02 Thread Sunil Mushran
ively, if you've rebooted the system that holds the lock > would the others reclaim locks held and carry on as normal? > >Andy > > > > -Original Message- > From: Sunil Mushran [mailto:[EMAIL PROTECTED] > Sent: Tue 02/09/2008 05:21 > To: Andrew Phillips >

Re: [Ocfs2-users] 2 node cluster reboot

2008-09-02 Thread Sunil Mushran
Check out questions 80/81 in the FAQ. http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html#QUORUM Which version, distro, etc.? Dante Garro wrote: > Hi Sirs, I have a 2 nodes cluster. > > Seems (but I'm not sure)when I've rebooted one of nodes the other reboots > itself too. > > Is

Re: [Ocfs2-users] VM node won't talk to host

2008-09-04 Thread Sunil Mushran
That will be so if KVM is buffering the ios. Which it must be doing for performance reasons. Bret Baptist wrote: > On Friday 29 August 2008 18:38:08 Bret Baptist wrote: > >> On Thursday 28 August 2008 18:59:07 Sunil Mushran wrote: >> >>> If the VM is not see

Re: [Ocfs2-users] [Ocfs-users] Hard system restart when DRBD connection fails while in use

2008-09-07 Thread Sunil Mushran
Repeat the test. This time run the following on Node A after you have killed Node B. $ ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN If we are lucky we'll get to see where that process is waiting. Henri Cook wrote: > Hi all, > > I have two nodes (A+B) running a DRBD file system (using OCFS2) on

Re: [Ocfs2-users] 2 node cluster reboot

2008-09-07 Thread Sunil Mushran
rify it's still connected with a ping node and continue about > its duties (it can even maintain read/write in a DRBD context) > > H > > Sunil Mushran wrote: >> Check out qs 80/81 in the faq. >> http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.h

Re: [Ocfs2-users] cant see parameters values

2008-09-08 Thread Sunil Mushran
What version of ocfs2-tools are you running? Doron Tamir wrote: > > Hi all , > > > > When I type : > > cat /etc/sysconfig/o2cb > > > > I see only > > 1. # O2CB_ENABELED: 'true' means to load the driver on boot. > 2. O2CB_ENABLED=true > 3.4.

Re: [Ocfs2-users] Version of ocfs2 in vanilla?

2008-09-08 Thread Sunil Mushran
That fix went into 2.6.26. http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=ffda89a3bf3b968bdc268584c6bc1da5c173cf12 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a4a4891164d4f6f383cc17e7c90828a7ca6a1146 http://git.kernel.org/?p=linu

Re: [Ocfs2-users] OCFS2

2008-09-11 Thread Sunil Mushran
What version? $ modinfo ocfs2 $ rpm -qa | grep ocfs2 $ uname -a Sunil Marshall, Richard wrote: > > Hello: > > We have a 3 node cluster, on one of the nodes the LAN cable was > accidentally disconnected, and the node hard booted (i.e. power > off/on). There were no indications from ocfs2 and o2

Re: [Ocfs2-users] OCFS2 and Xen - aio error -14

2008-09-16 Thread Sunil Mushran
Which kernel? The error message itself needs to be silenced. The OR should be changed to an AND.

    if (ret != -EFAULT || ret != -ENOSPC)
            mlog_errno(ret);

But that just means we are treating this as a user error. However, as the same
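
To illustrate why the OR in the condition quoted above never suppresses the message: no value of ret can equal both -EFAULT and -ENOSPC, so at least one of the two inequalities always holds. This also covers the -14 in the subject line, which is -EFAULT on Linux. The sketch below is a standalone userspace illustration, not the kernel code; the helper names logs_with_or and logs_with_and are made up for this example and only stand in for the decision to call mlog_errno().

```c
#include <errno.h>
#include <stdbool.h>

/* Condition as quoted in the thread: true for every ret, because ret
 * cannot be equal to -EFAULT and -ENOSPC at the same time. */
static bool logs_with_or(int ret)
{
    return ret != -EFAULT || ret != -ENOSPC;
}

/* Condition with the OR changed to an AND: false only for -EFAULT and
 * -ENOSPC, so those two errors are no longer logged. */
static bool logs_with_and(int ret)
{
    return ret != -EFAULT && ret != -ENOSPC;
}
```

With the AND form, any other error (for example -EIO) would still be logged, while -EFAULT and -ENOSPC are treated as user errors and stay quiet.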

Re: [Ocfs2-users] OCFS2 and Xen - aio error -14

2008-09-16 Thread Sunil Mushran
Enable some tracing: $ debugfs.ocfs2 -l ENTRY EXIT INODE DISK_ALLOC SUPER FILE_IO NAMEI AIO allow $ xm save $ debugfs.ocfs2 -l ENTRY EXIT deny INODE DISK_ALLOC SUPER FILE_IO NAMEI AIO off File a bugzilla and attach the syslog. Brett Worth wrote: > Sunil Mushran wrote: > &g

Re: [Ocfs2-users] OCFS2 and Xen - aio error -14

2008-09-17 Thread Sunil Mushran
One more thing. Do: $ strace -ff -o /tmp/save xm save Zip up the traces and attach to bugzilla. Lastly, is the guest hvm or pvm, 32-bit or 64-bit. From the packages the host machine appears to be 64-bit. Sunil Mushran wrote: > Enable some tracing: > > $ debugfs.ocfs2 -l ENTRY E

Re: [Ocfs2-users] Ocfs2 cluster and sw level...

2008-09-19 Thread Sunil Mushran
The network protocol differs between 1.2 and 1.4. It won't work. Marco Mililotti wrote: > Hi all, > > is it possible/acceptable to mount and use an Ocfs2 filesystem on two > machines running *different* software level? I.e.: > - M1 that runs ocfs2 1.2.3-0.7, kernel drv ver: 1.2.5-SLES-r2997 > - M

Re: [Ocfs2-users] o2hb_do_disk_heartbeat:982:ERROR

2008-09-19 Thread Sunil Mushran
Ensure the cluster.conf is the same across the cluster. If it is not, edit it and restart the cluster. The "transport endpoint" error means that the TCP/IP connect failed. It could be because of an incorrect IP address, a firewall, or a bad cluster.conf. The dmesg errors indicate that the cluster.conf could be mi
