Re: [Ocfs2-users] Unstable Cluster
Sérgio Surkamp ser...@gruposinternet.com.br 2011-12-09 11:17:

Hi. Why are you using OCFS2 version 1.5.0 in production? As far as I know, the 1.5 series is for developers only.

I think that's just the version tag they give it on the mainline kernel. It's not just for developers; it just may not be as well supported by some commercial Linux vendors if/when something goes wrong.

Brian

On Fri, 9 Dec 2011 00:42:25 -0800, Tony Rios t...@tonyrios.com wrote:

I managed to get ahold of the kernel panic message because it's happening on any new machines I try to introduce to the cluster:

[ 66.276054] OCFS2 1.5.0
snip/

signature.asc Description: Digital signature
___
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Diagnosing some OCFS2 error messages
Patrick J. LoPresti lopre...@gmail.com 2010-06-13 19:14:

Hello. I am experimenting with OCFS2 on SUSE Linux Enterprise Server 11 Service Pack 1. I am performing various stress tests. My current exercise involves writing to files using a shared-writable mmap() from two nodes. (Each node mmaps and writes to different files; I am not trying to access the same file from multiple nodes.) Both nodes are logging messages like these:

[94355.116255] (ocfs2_wq,5995,6):ocfs2_block_check_validate:443 ERROR: CRC32 failed: stored: 2715161149, computed 575704001. Applying ECC.
[94355.116344] (ocfs2_wq,5995,6):ocfs2_block_check_validate:457 ERROR: Fixed CRC32 failed: stored: 2715161149, computed 2102707465
[94355.116348] (ocfs2_wq,5995,6):ocfs2_validate_extent_block:903 ERROR: Checksum failed for extent block 2321665
[94355.116352] (ocfs2_wq,5995,6):__ocfs2_find_path:1861 ERROR: status = -5
[94355.116355] (ocfs2_wq,5995,6):ocfs2_find_leaf:1958 ERROR: status = -5
[94355.116358] (ocfs2_wq,5995,6):ocfs2_find_new_last_ext_blk:6655 ERROR: status = -5
[94355.116361] (ocfs2_wq,5995,6):ocfs2_do_truncate:6900 ERROR: status = -5
[94355.116364] (ocfs2_wq,5995,6):ocfs2_commit_truncate:7559 ERROR: status = -5
[94355.116370] (ocfs2_wq,5995,6):ocfs2_truncate_for_delete:597 ERROR: status = -5
[94355.116373] (ocfs2_wq,5995,6):ocfs2_wipe_inode:770 ERROR: status = -5
[94355.116376] (ocfs2_wq,5995,6):ocfs2_delete_inode:1062 ERROR: status = -5

...although the particular extent block number varies somewhat. In addition, when I run fsck.ocfs2 -y -f /dev/md0, I get an I/O error:

dp-1:~ # fsck.ocfs2 -y -f /dev/md0
fsck.ocfs2 1.4.3
Checking OCFS2 filesystem in /dev/md0:
  Label: NONE
  UUID: 29BB12B5AA4C449E9DDE906405F5BDE4
  Number of blocks: 3221225472
  Block size: 4096
  Number of clusters: 12582912
  Cluster size: 1048576
  Number of slots: 4
/dev/md0 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
Pass 0b: Checking inode allocation chains
Pass 0c: Checking extent block allocation chains
Pass 1: Checking inodes and blocks.
extent.c: I/O error on channel reading extent block at 2321665 in owner 9704867 for verification
pass1: I/O error on channel while iterating over the blocks for inode 9704867
fsck.ocfs2: I/O error on channel while performing pass 1

This looks like a straightforward I/O error, right? The only problem is that there is nothing in any log (dmesg, /var/log/messages, the event log on the hardware RAID) to indicate any hardware problem. That is, when fsck.ocfs2 reports this I/O error, no other errors are logged anywhere as far as I can tell. Shouldn't the kernel log a message if a block device gets an I/O error? I am using a pair of hardware RAID chassis accessed via iSCSI, and then using Linux md (RAID-0) to stripe between them.

Questions:

1) I would like to confirm this I/O error for myself using dd. How do I map the numbers above (extent block at 2321665 in owner 9704867) to an actual offset on the block device so I can try to read the blocks by hand?

2) Is there any plausible explanation for these errors other than bad hardware?

Thanks! - Pat

I don't believe OCFS2 can currently support any logical volume manager setup other than a simple concatenation (and even then only with extreme caution). Striping done by a lower software layer would somehow need to be coordinated among all the nodes in the cluster, else all the fs consistency guarantees provided by the SCSI layer are lost.

Brian
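On Pat's question 1 above: a byte offset can be derived by multiplying the extent block number by the filesystem block size (4096 here, per the fsck.ocfs2 output). This is a hedged sketch only, assuming the reported extent block number is an absolute block number from the start of /dev/md0:

```shell
# Map "extent block at 2321665" to a byte offset (block size 4096 per fsck).
BLOCK=2321665
BLKSZ=4096
OFFSET=$((BLOCK * BLKSZ))
echo "byte offset: $OFFSET"

# Then try reading just that one block by hand (commented out here;
# run it against the real device to see whether it returns EIO):
# dd if=/dev/md0 bs=$BLKSZ skip=$BLOCK count=1 | od -A x -t x1 | head
```

If the dd read succeeds without error, that would point away from a plain media error and toward something else (e.g. the md striping concern raised in the reply).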
Re: [Ocfs2-users] Support and Stability
Michael Austin onedbg...@gmail.com 2010-05-24 13:32:

I would like to get some feedback on the overall perception of the support and stability of OCFS2 (latest). This tool looks like a perfect fit for a production system I am planning, but, due to its open source roots, there are some concerns about support and stability. The app will be deemed mission critical with very little tolerance for any downtime (24x365). Thanks. M. Austin, Consultant

It pains me to say it, but I can't recommend it for something like a mail setup that has heavy writes of tiny files. There's a fragmentation issue that burned us badly recently, and before that a locking issue (search the archives). Even then, I have to say that the Oracle devs were responsive to us even without a service contract, for which I'm very grateful. You might have better luck with a supported distro; I've always used mainline kernels with Debian.

That said, I had been using an earlier version for a web server backend (a couple of TB, mostly read) and a video streaming library (_many_ TB and _lots_ of read traffic) for a long time without any reports of problems. I don't work there anymore, but from what I hear everything's still humming along without interruption (read: overall cluster interruption) for almost 3 years now. That's even with crummy server rooms that try to bake their inhabitants from time to time :)

I will also say, just offhand, that OCFS2 is still the best OSS shared disk cluster fs I've tried. I've tested GFS2 off and on for a couple of years and it still has a rather trivial deadlock case:

# cssh node1 node2 node3
# mkdir /cluster/$HOSTNAME
# touch /cluster/$HOSTNAME/test
# rm -rf /cluster/*

Cheers, Brian
Re: [Ocfs2-users] slowdown - fragmentation?
Brian Kroth bpkr...@gmail.com 2010-04-26 09:17:

Hello all, I've got a moderately active mail system running on OCFS2. It's been on a fresh volume for about 9 months now, though only ever with one node active at a time. For the most part it's been very happy; recently, however, we started experiencing very noticeable, periodic, but short-lived slowdowns. From all of the graphs and measurements I've been doing, the IO system seems bored (not much more than ~200 IOPS, and typically less, on a 14-disk RAID 50 with 15K disks). We haven't changed anything recently and I'm having trouble nailing down the cause. Given all the talk about ENOSPC and fragmentation of late, it's climbing my list of worries. Can someone please take a look at my stat_sysdir output and give me a quick opinion on whether or not they think that might be an issue? Thanks for your help, Brian

We're running Debian Lenny with a 2.6.30 kernel in VMware ESX 4.

# dpkg -l | grep -i ocfs2
ii ocfs2-tools 1.4.2-1

First, sorry for spamming everyone's mailbox by attaching that dump last time. I should have posted it somewhere. Unfortunately, we've gotten confirmation that we hit the infamous ENOSPC bug [1], in error messages stating the inability to store new mail messages. We performed the recommended reduction in slots from 8 to 4 and things are temporarily up and running again. However, I'm curious if you have any thoughts on how much time or how many files we've bought ourselves by doing this. Ideally we'd like to be able to hold out for another month, when there's a mandatory holiday so that no one can use the system anyway, before we do our total overhaul fix.

Here are the results of two stat_sysdir.sh dumps: http://cae.wisc.edu/~bpkroth/public/ocfs2/

Thanks very much, Brian

[1] http://oss.oracle.com/bugzilla/show_bug.cgi?id=1189
Re: [Ocfs2-users] Recommended settings for mkfs.ocfs2
lenny-backports has a 2.6.32-based kernel that might already have the free space fix in it. I haven't checked yet. Also, you don't really explain what you're trying to use the data store for (e.g. lots of small files, video files, heavy writes, heavy reads, random, sequential, etc.). It may affect the options you want to give to mkfs.

Brian

Andrew Robert Nicols andrew.nic...@luns.net.uk 2010-04-19 10:06:

I'm planning to deploy a new data store using a pair of servers running Debian Lenny and using DRBD to replicate data between them. My intention is to use ocfs2 as the file system on these so that we can operate in a dual-primary mode. The RAID device I'm using gives 15TB of usable space and, having had a brief look through the ocfs2-users archive, I see that in January an issue with "space left on device" was fixed, but this isn't available in the stock Lenny kernel yet (2.6.26). From the notes on the bug, I see that altering the number of slots on the file system can help to alleviate the issue, but what are the recommendations on which mkfs.ocfs2 options work best? I've already created the file system with 2 node slots, but I see from comment #13 that Sunil recommends against adding slots. I am free to re-create the file system if need be.

To summarise:
* 2 node cluster
* 15TB storage
* ocfs2-tools version 1.4.1
* 2.6.26 kernel

Any advice would be gratefully received,

Andrew Nicols
--
Systems Developer
e: andrew.nic...@luns.net.uk
im: a.nic...@jabber.lancs.ac.uk
t: +44 (0)1524 5 10147
Lancaster University Network Services is a limited company registered in England and Wales. Registered number: 04311892. Registered office: University House, Lancaster University, Lancaster, LA1 4YW
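As a concrete (and hedged) sketch of the kind of mkfs.ocfs2 invocation being discussed — the label and DRBD device path are placeholders, and the flags should be checked against the mkfs.ocfs2(8) man page for the ocfs2-tools version actually in use:

```text
# -N 2 : two node slots, matching the two-node DRBD pair; extra slots
#        reserve extra per-slot system files and contiguous space.
# -T   : fs-type heuristic ("mail" = many small files / heavy metadata;
#        "datafiles" = a few large, mostly preallocated files).
# -L   : volume label.
mkfs.ocfs2 -L datastore -N 2 -T mail /dev/drbd0
```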
Re: [Ocfs2-users] Using multiple clusters on the same host
I believe that, due to the way the fencing works, you only need a single cluster to host multiple volumes. Just make sure that all of the hosts involved are specified in the same cluster.conf file. For example, nodes a, b, c could mount volume1, while b, c, d mount volume2, and e, f, g mount volume3, so long as the cluster.conf file holds nodes a-g and is consistent across all nodes.

Brian

Daniel Bidwell bidw...@andrews.edu 2010-03-18 08:46:

Is it possible to use multiple ocfs2 clusters on the same host? I would like a given host to have access to several different clustered file systems. The older documentation says that you can only have one cluster per host. This restriction appears to have been removed from the latest documentation, but I don't see any mention of how to configure multiple clusters.

--
Daniel R. Bidwell | bidw...@andrews.edu
Andrews University | Information Technology Services
If two always agree, one of them is unnecessary
Friends don't let friends do DOS
In theory, theory and practice are the same. In practice, however, they are not.
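The single-cluster layout described above might look something like this in /etc/ocfs2/cluster.conf — a hedged sketch only; the cluster name, node names, and IPs are made up, and the exact stanza format should be checked against the O2CB documentation for your version:

```text
cluster:
        node_count = 7
        name = prodcluster

node:
        ip_port = 7777
        ip_address = 192.168.0.101
        number = 0
        name = nodea
        cluster = prodcluster

node:
        ip_port = 7777
        ip_address = 192.168.0.102
        number = 1
        name = nodeb
        cluster = prodcluster

(one "node:" stanza each for nodec through nodeg, numbers 2-6)
```

The same file is distributed to every node; which volumes a given node actually mounts is independent of the node list.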
Re: [Ocfs2-users] No Space left on the device.
I also have a mail volume hosted on OCFS2 and I'm somewhat concerned about /when/ we will run into this problem and what we can do to avoid too much hurt when it happens. Are there any tips on reading the output of stat_sysdir.sh? The man page wasn't especially helpful, but I'm guessing I'm looking at the Contig column for enough clusters > 511. I can post the output if you'd prefer.

As mentioned in the bug (I didn't think it was a proper place for discussion), I'm also curious more generally about backporting these fixes to the 2.6.32 kernel, since it's been designated long term stable. Is that responsibility just on the individual distro's kernel maintainer, or are the OCFS2 devs planning on submitting fixes to the mainline 2.6.32 tree?

Thanks, Brian

Brad Plant bpl...@iinet.net.au 2010-03-04 16:17:

Hi Aravind, Sounds like you might have hit the free space fragmentation issue: http://oss.oracle.com/bugzilla/show_bug.cgi?id=1189 I'm sure that if you post the output of stat_sysdir.sh (http://oss.oracle.com/~seeda/misc/stat_sysdir.sh) one of the ocfs2 devs will be able to confirm this. *If* it is this problem, removing some node slots will help. That is, of course, if you have more node slots than you need. I think 8 are created by default.

Cheers, Brad

On Thu, 4 Mar 2010 10:28:49 +0530 (IST) Aravind Divakaran aravind.divaka...@yukthi.com wrote:

Hi All, For my mail server I am using the ocfs2 filesystem configured on a SAN. Now my mail delivery application is sometimes complaining "No space left on the device", even though there are enough space and inodes. Can anyone help me solve this issue?

Rgds, Aravind M D
Re: [Ocfs2-users] No Space left on the device.
Joel Becker joel.bec...@oracle.com 2010-03-05 13:48:

On Fri, Mar 05, 2010 at 08:33:34AM -0600, Brian Kroth wrote:
As mentioned in the bug (didn't think it was a proper place for discussion) I'm also curious more generally about backporting these fixes to the 2.6.32 kernel since it's been designated long term stable. Is that responsibility just on the individual distro's kernel maintainer or are the OCFS2 devs planning on submitting fixes to the mainline 2.6.32 tree?

Who 'designated' it long term stable? I'm just wondering who we should send our patches to ;-)

Joel

Fair enough. Here's the most authoritative source [1] [2] I can find, though a quick google on "long term stable kernel" produces a number of other results [3].

[1] http://lwn.net/Articles/370236/
[2] http://www.kroah.com/log/linux/stable-status-01-2010.html
[3] http://www.fabian-fingerle.de/2010-02-23.233

Thanks, Brian
Re: [Ocfs2-users] fencing question
That seems unwise. Presumably the connection to the disk or to the other network nodes was lost due to some failure, in which case you don't want nodes operating on the disk unless they can agree on what's safe. If there was a planned outage of the disk or network connection, then the related volumes should probably have been unmounted first.

Brian

Charlie Sharkey charlie.shar...@bustech.com 2010-02-25 14:09:

Hi, I have a question on fencing. Besides setting the O2CB_HEARTBEAT_THRESHOLD parameter to some large value, is there any way to set up ocfs2 to only fence when it loses connections to ALL mounted disk volumes rather than to ANY one volume?

thanks, charlie
[Ocfs2-users] invalid opcode bug in dlmglue?
We've gotten a couple of dumps like this in the last couple of days while migrating some new users to our mail store, which involves untarring/moving large quantities of files. We've gracefully rebooted the node after every instance and it seems to do fine with normal mail operations. I'm wondering if you have any thoughts on the messages? Running in ESX 3.5. The kernel is Debian 2.6.30 based. Storage backend is iSCSI EqualLogic. Only one node currently has the FS mounted.

Thanks, Brian

Feb 4 09:34:41 iris kernel: [528465.147544] [ cut here ]
Feb 4 09:34:41 iris kernel: [528465.148706] kernel BUG at fs/ocfs2/dlmglue.c:2470!
Feb 4 09:34:41 iris kernel: [528465.148818] invalid opcode: [#1] SMP
Feb 4 09:34:41 iris kernel: [528465.148983] last sysfs file: /sys/devices/system/clocksource/clocksource0/available_clocksource
Feb 4 09:34:41 iris kernel: [528465.149113] Modules linked in: ocfs2 jbd2 quota_tree ocfs2_stack_o2cb ocfs2_stackglue netconsole vmsync vmmemctl ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager configfs usbhid hid uhci_hcd ohci_hcd ehci_hcd usbcore psmouse evdev serio_raw parport_pc parport snd_pcsp snd_pcm snd_timer snd soundcore snd_page_alloc container button ac i2c_piix4 processor i2c_core intel_agp shpchp agpgart pci_hotplug ext3 jbd mbcache dm_mirror dm_region_hash dm_log dm_snapshot dm_mod sd_mod crc_t10dif ide_cd_mod cdrom ata_generic libata ide_pci_generic floppy mptspi mptscsih mptbase scsi_transport_spi scsi_mod vmxnet piix ide_core thermal fan thermal_sys [last unloaded: scsi_wait_scan]
Feb 4 09:34:41 iris kernel: [528465.150763]
Feb 4 09:34:41 iris kernel: [528465.150945] Pid: 32114, comm: rm Not tainted (2.6.30-vmwareguest-smp-64g.20090711 #1) VMware Virtual Platform
Feb 4 09:34:41 iris kernel: [528465.151104] EIP: 0060:[f887783d] EFLAGS: 00010246 CPU: 2
Feb 4 09:34:41 iris kernel: [528465.151446] EIP is at ocfs2_dentry_lock+0x26/0xf7 [ocfs2]
Feb 4 09:34:41 iris kernel: [528465.151520] EAX: f6548800 EBX: c8c53c6c ECX: EDX:
Feb 4 09:34:41 iris kernel: [528465.151586] ESI: 17395beb EDI: f6538000 EBP: 0005 ESP: dfad5e88
Feb 4 09:34:41 iris kernel: [528465.151651] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Feb 4 09:34:41 iris kernel: [528465.151722] Process rm (pid: 32114, ti=dfad4000 task=e17b9610 task.ti=dfad4000)
Feb 4 09:34:41 iris kernel: [528465.151814] Stack:
Feb 4 09:34:41 iris kernel: [528465.151886] 0001 c75fe8dc c8c53c6c 17395beb c8c53c6c f888fdcc
Feb 4 09:34:41 iris kernel: [528465.152084] 17395beb f88925c9 f6538000 f0987900 c75fe940 c75fe5c0
Feb 4 09:34:41 iris kernel: [528465.152303] e6472e10
Feb 4 09:34:41 iris kernel: [528465.152566] Call Trace:
Feb 4 09:34:41 iris kernel: [528465.152580] [f888fdcc] ? ocfs2_remote_dentry_delete+0xe/0x95 [ocfs2]
Feb 4 09:34:41 iris kernel: [528465.152872] [f88925c9] ? ocfs2_unlink+0x3fe/0xa26 [ocfs2]
Feb 4 09:34:41 iris kernel: [528465.152960] [c019ab62] ? vfs_unlink+0x5c/0x95
Feb 4 09:34:41 iris kernel: [528465.153165] [c019be31] ? do_unlinkat+0x93/0xfc
Feb 4 09:34:41 iris kernel: [528465.153240] [c0114001] ? smp_reschedule_interrupt+0x13/0x1c
Feb 4 09:34:41 iris kernel: [528465.153336] [c0107eda] ? reschedule_interrupt+0x2a/0x30
Feb 4 09:34:41 iris kernel: [528465.153413] [c01077d4] ? sysenter_do_call+0x12/0x28
Feb 4 09:34:41 iris kernel: [528465.153538] Code: e9 19 fe ff ff 55 57 56 53 83 ec 0c 83 fa 01 8b 50 58 19 ed 83 e5 fe 83 c5 05 89 54 24 04 8b 40 54 85 d2 8b b8 98 01 00 00 75 04 0f 0b eb fe 8d 9f 9c 00 00 00 89 d8 e8 2a 1e ab c7 8b 87 a4 00
Feb 4 09:34:41 iris kernel: [528465.154876] EIP: [f887783d] ocfs2_dentry_lock+0x26/0xf7 [ocfs2] SS:ESP 0068:dfad5e88
Feb 4 09:34:41 iris kernel: [528465.155307] ---[ end trace 62c828cac153c25f ]---
Re: [Ocfs2-users] invalid opcode bug in dlmglue?
Excellent. Thanks for the quick response. Also, any idea when the tools might support indexed dirs? I suspect we'll have some downtime coming up in a couple of months and I'm wondering if we can use the opportunity to turn on that feature for quicker lookup times.

Thanks, Brian

Sunil Mushran sunil.mush...@oracle.com 2010-02-04 09:16:

Fixed. http://oss.oracle.com/bugzilla/show_bug.cgi?id=1137

You probably already have this patch. If not, add it.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a5a0a630922a2f6a774b6dac19f70cb5abd86bb0

You are definitely missing this patch.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a1b08e75dff3dc18a88444803753e667bb1d126e

Brian Kroth wrote:
We've gotten a couple of dumps like this in the last couple of days while migrating some new users to our mail store, which involves untarring/moving large quantities of files. We've gracefully rebooted the node after every instance and it seems to do fine with normal mail operations. I'm wondering if you have any thoughts on the messages? Running in ESX 3.5. The kernel is Debian 2.6.30 based. Storage backend is iSCSI EqualLogic. Only one node currently has the FS mounted.

Thanks, Brian

[quoted oops log snipped; it is identical to the one in the original message above]
[Ocfs2-users] esx elevator=noop
http://lonesysadmin.net/2008/02/21/elevatornoop/

I ran across this recently. It describes how, when operating in a virtual environment with shared storage, to let the storage and hypervisor arrange disk write operations in a more globally optimal way, rather than having all the guests try to do it and muck it up. However, this is contrary to the ocfs2 recommendation of using the deadline elevator. I'm just wondering if you have any comments one way or the other? My concern would be that while noop might make things globally optimal, it would still allow starvation in a single guest, which might lead to ocfs2 fencing.

Thanks, Brian
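For reference, the two elevators under discussion are usually selected in one of two ways — a sketch only; the device name sdb and the grub config path are placeholders, and guest distros vary:

```text
# At boot, for all block devices, via the kernel command line
# (e.g. in grub's menu.lst), as the linked article suggests for guests:
kernel /vmlinuz-2.6.30 root=/dev/sda1 ro elevator=noop

# Or at runtime, per device, via sysfs (current choice shown in brackets):
cat /sys/block/sdb/queue/scheduler
echo noop > /sys/block/sdb/queue/scheduler
```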
Re: [Ocfs2-users] esx elevator=noop
At least in ESX you can set up various reservations and priority weightings for various resources to ensure that some machines are considered more important than others. As to what actually happens in practice, who knows.

Thanks for the info, Brian

Herbert van den Bergh herbert.van.den.be...@oracle.com 2010-01-15 11:20:

I would assume that all bets are off if you're running a cluster inside a VM. There are no guarantees for either I/O or CPU scheduling.

Thanks, Herbert.

On 01/15/2010 10:50 AM, Sunil Mushran wrote:
The deadline recommendation was for early EL4 kernels that had a bug in cfq. That bug was fixed years ago. I am unsure how using noop in the guest would trigger starvation. Not that I am recommending it; I have not thought about this much.

On Jan 15, 2010, at 9:55 AM, Brian Kroth bpkr...@gmail.com wrote:
http://lonesysadmin.net/2008/02/21/elevatornoop/ I ran across this recently which describes, when operating in a virtual environment with shared storage, how to try and let the storage and hypervisor deal with arranging disk write operations in a more globally optimal way rather than having all the guests try to do it and muck it up. However, this is contrary to ocfs2 recommendation of using the deadline elevator. I'm just wondering if you have any comments one way or the other? My concern would be that while noop might make things globally optimal it would still allow starvation in a single guest which might lead to ocfs2 fencing. Thanks, Brian
Re: [Ocfs2-users] Combining OCFS2 with Linux software RAID-0?
Luis Freitas lfreita...@yahoo.com 2009-12-11 05:40:

Patrick, Depending on what you are using, you could use the volume manager to do the striping, but you need to use CLVM. So if you can, go for Heartbeat2+CLVM+OCFS2, all integrated. I'm not sure, but I think Heartbeat2+OCFS2 is only available on the vanilla kernels, not on the enterprise ones. Maybe SUSE has support; I don't know, you will have to check.

Best Regards, Luis Freitas

Just to elaborate on these comments: last time I checked, CLVM required the openais/cman cluster stack, which neither heartbeat nor ocfs2 uses (by default). The userspace stack option for ocfs2 in recent mainline kernels added support for the openais stack, and pacemaker is required to make heartbeat work with that rather than use its own cluster stack.

Now, you can do a basic LVM linear span, concatenation, or whatever you want to call it without any cluster stack, so long as it's not striped and so long as you heed Sunil's warning about fat-fingering changes to the thing while more than one host is using it. That means that if you want to add another LUN to the span you can't do it on the fly. You have to do something like this:

# On all nodes:
umount /ocfs2
# On all nodes but one:
vgchange -an ocfs2span
# Or, to be extra safe:
halt -p
# On the remaining node:
vgextend ocfs2span /dev/newlun
lvextend -l+100%FREE /dev/mapper/ocfs2span-lv
tunefs.ocfs2 -S /dev/mapper/ocfs2span-lv
# You might actually need the fs mounted for that last bit, I forget.
# Probably a fsck somewhere in there would be wise as well.
# Bring the other nodes back up.

Brian

--- On Wed, 12/9/09, Patrick J. LoPresti lopre...@gmail.com wrote:

From: Patrick J. LoPresti lopre...@gmail.com
Subject: [Ocfs2-users] Combining OCFS2 with Linux software RAID-0?
To: ocfs2-users@oss.oracle.com, linux-r...@vger.kernel.org
Date: Wednesday, December 9, 2009, 9:03 PM

Is it possible to run an OCFS2 file system on top of Linux software RAID? Here is my situation. I have four identical disk chassis that perform hardware RAID internally. Each chassis has a pair of fiber channel ports, and I can assign the same LUN to both ports. I want to connect all of these chassis to two Linux systems. I want the two Linux systems to share a file system that is striped across all four chassis for performance.

I know I can use software RAID (mdadm) to do RAID-0 striping across the four chassis on a single machine; I have tried this, it works fine, and the performance is tremendous. I also know I can use OCFS2 to create a single filesystem on a single chassis that is shared between my two Linux systems. What I want is to combine these two things.

Suse's documentation (http://www.novell.com/documentation/sles11/stor_admin/?page=/documentation/sles11/stor_admin/data/raidyast.html) says:

IMPORTANT: Software RAID is not supported underneath clustered file systems such as OCFS2, because RAID does not support concurrent activation. If you want RAID for OCFS2, you need the RAID to be handled by the storage subsystem.

Because my disk chassis already perform hardware RAID-5, I only need Linux to do the striping (RAID-0) in software. So for me, there is no issue about which node should rebuild the RAID, etc. I understand that Linux md stores metadata on the partitions and is not cluster aware, but will this create problems for OCFS2 even if it is just RAID 0? Has anybody tried something like this? Are there alternative RAID-0 solutions for Linux that would be expected to work?

Thank you. - Pat
Re: [Ocfs2-users] OCFS2 and VMware ESX
Late to the party... Here's what I did to get OCFS2 going with RDMs _and_ VMotion (with some exceptions). Almost surely not supported, but it works: First, create the RDM as a physical passthru using either the gui or the cli. It need not be on a separate virtual controller. However, to have VMotion working it must _not_ use bus sharing. For the other nodes setup another passthru RDM that uses the same path as the previous one. You must do this on the cli since the GUI will hide the LUN path once it's been configured any where else. # Query the path: cd /vmfs/volumes/whereever/node1 vmkfstools -q node1_1.vmdk # Make a passthru VMDK for that RDM cd /vmfs/volumes/wherever/node2 vmkfstools -z /vmfs/devices/disks/vmhba37\:25\:0\:0 node2.vmdk Now you can add that disk to the VM either by editing the vmx like David said, or with the GUI. Here's the catch - ESX locks that /vmfs/devices/disk/whatever path when a node starts using it, so you can't 1) Run more than one of these vm nodes on the same esx node 2) Migrate one of these vm nodes to an esx node that is already running another of the vm nodes. So, if your esx cluster is greater than your ocfs2 cluster, you're ok. Else, you need to stick to either the bus sharing method which means no vmotion, or the cluster in a box method which means all on one esx node (which kinda defeats the point in my opinion). I have noticed significant performance benefits from moving to RDMs vs sw iscsi virtualized in the guest os. It's the only reason I'd risk all this. As to snapshots I've been told by the vmware techs not to use them for production level things (or at least not for very long) as they can really kill performance. We do snapshots on the raid device serving the lun and restrict its visibility to a particular machine (actually another vm) in order to do all of the backups from there. Actually it had to be on a separate VM outside the normal cluster else the machine would refuse to mount it. 
Works out better this way anyways, since then we're not burdening the production VM with the backup work (though it still hits the same storage device). The script that does this also deals with all of the necessary fs fixups to have multiple snapshots mounted at once, though I think the 1.4.2 version of ocfs2-tools provides a cloned fs option for doing all that now. Brian David Murphy da...@icewatermedia.com 2009-10-22 09:10: With RDM, versus the method Kent described, it's a bit more complicated and will prevent snapshots and vmotion. Basically follow what he said, but instead of making a vmdk disk choose RDM and select a LUN. Then make sure that machine is NOT powered on, log into the ESX host, and move the RDM file to, say, /vmfs/volumes/volume_name/RawDeviceMaps (you need to make that folder). Next manually edit the VMX for that host and change its path to the RDM to wherever you moved it. Now you can create new clones of your base template and add the RDM drive to it (as Kent mentioned, it's VERY important), pointing to the RawDeviceMaps folder and the correct RDM file for that LUN. This approach has many issues, so I'm planning on moving away from it. 1) You can't clone 2) You can't snapshot 3) You can't vmotion 4) If you delete a host that has that drive attached you completely destroy the RDM file. (BAD JOJO) If you do need to have a cluster in such an environment I would suggest a combination of the 2 approaches. 1) Build a new LUN, make it VMFS, and let the ESX hosts discover it. 2) Create the VMDKs on that LUN, not in your main VMFS for VMs 3) Make sure you set any OCFS drive to a separate controller and physical, persistent (so it won't snapshot it) You should retain snap/vmotion. But be aware: I am not sure if cloning will make a new vmdk on the VMFS volume you make for the ocfs drives. So I would have a base template I clone, then add that drive to the clone (to guarantee the drive's location). 
It's a bit more work than just saving the VMDK to the VM's folder on your main VMFS, but it separates the OCFS drives onto another LUN. So you could easily stop your cluster, take a snapshot of the LUN for backups, and bring them back up, limiting your downtime window. Might be overkill depending on the company's backup stance. Hope it helps David From: ocfs2-users-boun...@oss.oracle.com [mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of Rankin, Kent Sent: Monday, July 28, 2008 9:13 PM To: Haydn Cahir; ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] OCFS2 and VMware ESX What I did a few days ago was to create a vmware disk for each OCFS2 filesystem, and store it with one of the VM nodes. Then, add that disk to each
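For reference, Brian's vmkfstools steps above can be condensed into a script shape like this. This is a dry-run sketch only, not something vetted on a live ESX host: the datastore paths and the vmhba LUN ID are the hypothetical ones from the example, and DRY_RUN just prints each command for review.

```shell
#!/bin/sh
# Dry-run sketch of sharing one physical passthru RDM between two VM
# nodes. Paths and the vmhba LUN ID are hypothetical examples.
DRY_RUN=yes
run() { if [ "$DRY_RUN" = yes ]; then echo "$@"; else "$@"; fi; }

LUN=/vmfs/devices/disks/vmhba37:25:0:0

# Node 1 already owns the physical passthru RDM; query it to confirm
# which LUN path backs it.
run vmkfstools -q /vmfs/volumes/wherever/node1/node1_1.vmdk

# Node 2: make a second passthru VMDK against the SAME LUN path.
# This has to happen on the CLI because the GUI hides a claimed LUN.
run vmkfstools -z "$LUN" /vmfs/volumes/wherever/node2/node2.vmdk
```

Set DRY_RUN to anything but "yes" to actually execute the commands on the host.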
Re: [Ocfs2-users] more ocfs2_delete_inode dmesg questions
So, we've found that this has actually been causing some dropped mail and backlogs. Here's the situation: MX servers filter into the main mail server, all running sendmail. The main mail server has an OCFS2 spool volume which will periodically throw those error messages in dmesg that I listed earlier. Sendmail returns one of these two sequences: Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 0: fl=0x0, mode=20666: CHR: dev=0/13, ino=679, nlink=1, u/gid=0/0, size=0 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 1: fl=0x802, mode=140777: SOCK smtp/25-mx2/54625 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 2: fl=0x1, mode=20666: CHR: dev=0/13, ino=679, nlink=1, u/gid=0/0, size=0 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 3: fl=0x2, mode=140777: SOCK localhost-[[UNIX: /dev/log]] Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 4: fl=0x802, mode=140777: SOCK smtp/25-mx2/54625 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 5: fl=0x0, mode=100640: dev=8/33, ino=580655, nlink=1, u/gid=0/25, size=164461 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 6: fl=0x8000, mode=100644: dev=8/1, ino=175982, nlink=1, u/gid=107/25, size=12288 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 7: fl=0x8000, mode=100644: dev=8/1, ino=175982, nlink=1, u/gid=107/25, size=12288 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 8: fl=0x8000, mode=100644: dev=8/1, ino=175663, nlink=1, u/gid=107/25, size=49152 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 9: fl=0x802, mode=140777: SOCK smtp/25-mx2/54625 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 10: fl=0x8000, mode=100644: dev=8/1, ino=175663, nlink=1, u/gid=107/25, size=49152 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 11: fl=0x8000, mode=100644: dev=8/1, ino=175662, nlink=1, u/gid=107/25, size=2621440 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 12: fl=0x8000, mode=100644: dev=8/1, ino=175662, nlink=1, u/gid=107/25, size=2621440 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 13: 
fl=0x8000, mode=100644: dev=8/1, ino=175622, nlink=1, u/gid=107/25, size=2543616 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: 14: fl=0x8000, mode=100644: dev=8/1, ino=175622, nlink=1, u/gid=107/25, size=2543616 Oct 6 16:46:02 iris sm-mta[14407]: n96Lk23h014407: SYSERR(root): queueup: cannot create queue file ./qfn96Lk23h014407, euid=0, fd=-1, fp=0x0: File exists This results in dropped mail. Oct 6 16:26:09 iris sm-mta[29393]: n96LQ4uK029393: SYSERR(root): collect: bfcommit(./dfn96LQ4uK029393): already on disk, size=0: File exists This results in a failure being propagated to the MX, which then backlogs mail for a while before it retries. The debugfs.ocfs2 command allows me to identify the file, which turns out to be a 0-byte file on the spool volume. If it's a data file I can usually examine the corresponding control file, or vice versa, but before too long both are removed, or rather replaced by new files. I'm not sure if that's just sendmail shoving lots of files around and therefore needing to reuse inodes, or sendmail cleaning up after itself, or ocfs2 cleaning up after itself. For now I've moved the spool volume back to a local disk, but may try to test this out some more on our test setup. Any thoughts? Thanks, Brian Brian Kroth bpkr...@gmail.com 2009-08-25 08:52: Sunil Mushran sunil.mush...@oracle.com 2009-08-24 18:12: So a delete was called for some inodes that had not been orphaned. The pre-checks detected the same and correctly aborted the deletes. No harm done. Very good to hear. No, the messages do not pinpoint the device. It's something we discussed adding, but have not done it as yet. Next time this happens and you can identify the volume, do: # debugfs.ocfs2 -R findpath 613069 /dev/sdX This will tell you the pathname for the inode#. Then see if you can remember performing any op on that file. Anything. It may help us narrow down the issue. Sunil Will do. 
Thanks again, Brian Brian Kroth wrote: I recently brought up a mail server with two ocfs2 volumes on it, one large one for the user maildirs, and one small one for queue/spool directories. More information on the specifics below. When flushing the queues from the MXs I saw the messages listed below fly by, but since then nothing. A couple of questions: - Should I be worried about these? They seemed similar yet different to a number of other out of space and failure to delete reports of late. - How can I tell which volume has the problem inodes? - Is there anything to be done about them? Here's the snip from the tail of dmesg: [ 34.578787] netconsole: network logging started [ 36.695679] ocfs2: Registered cluster interface o2cb [ 43.354897] OCFS2 1.5.0 [ 43.373100] ocfs2_dlm: Nodes in domain (94468EF57C9F4CA18C8D218C63E99A9C): 1 [ 43.386623] kjournald2 starting: pid 2328
Re: [Ocfs2-users] OCFS2 1.4 Problem on SuSE
Angelo McComis ang...@mccomis.com 2009-09-29 11:19: I'm sorry -- it's lvm2, and yes. :-) On Tue, Sep 29, 2009 at 10:41 AM, Charlie Sharkey charlie.shar...@bustech.com wrote: It was mentioned: - Checked our lvm configuration - seems to be good as well. Is lvm supported by ocfs2 ? I didn't think this part was true. The issue being that all nodes need to be aware of possible metadata changes to the volume group and logical volumes. clvm (which I believe is supported by Novell) can handle that locking between nodes so that they have a consistent view of the metadata, but last I checked it used a different cluster stack that wasn't quite supported by ocfs2 yet, and running both side by side would run into some fencing issues. Alternatively, I think you can (read: unsupported, but does work) do simple LVM configurations like linear spans, since they don't have any striping metadata that needs to be updated. The trick is that you need to take everything offline when you want to make any changes to the volume group or logical volume. Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
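To make the "take everything offline" caveat concrete, a grow of a linear-span volume would look roughly like this. Untested sketch: the VG/LV names, mountpoint, and size are made up, DRY_RUN only prints the commands, and you should check tunefs.ocfs2(8) on your version for the exact resize flag.

```shell
#!/bin/sh
# Untested sketch of growing an OCFS2 volume on a linear LVM span.
# Hypothetical names/sizes; DRY_RUN=yes prints instead of executing.
DRY_RUN=yes
run() { if [ "$DRY_RUN" = yes ]; then echo "$@"; else "$@"; fi; }

LV=/dev/vg0/ocfs2vol

# 1. Unmount on EVERY node -- with plain (non-clustered) LVM, no other
#    node may touch the device while its metadata changes.
run umount /cluster

# 2. Append space to the logical volume (linear only, no striping).
run lvextend -L +50G "$LV"

# 3. Grow the filesystem into the new space, then sanity-check it.
run tunefs.ocfs2 -S "$LV"
run fsck.ocfs2 -f "$LV"

# 4. On the other nodes, re-read the LVM metadata (vgscan; vgchange -ay)
#    before mounting again anywhere.
```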
Re: [Ocfs2-users] Clear Node
You can do what Sunil mentioned using heartbeat [1]. However, MySQL also has replication built into it, and you can also use heartbeat to automatically turn a slave into a master very quickly without any need for shared storage. This way you could also use the slave to do load balancing of reads and provide backups without interrupting access on the master when all the tables and databases get locked. Brian [1] http://linux-ha.org Sunil Mushran sunil.mush...@oracle.com 2009-08-25 17:14: Can you describe the mount lock? You don't have to limit the mount to just one node. Have both nodes mount the volume but run mysql on one node only. Sunil James Devine wrote: I am trying to make a mysql standby setup with 2 machines, one primary and one hot standby, which both share disk for the data directory. I used tunefs.ocfs2 to change the number of open slots to 1 since only one machine should be accessing it at a time. This way it is fairly safe to assume one shouldn't clobber the other's data. Only problem is, if one node dies, the mount lock still persists. Is there a way to clear that lock so the other node can mount the share? ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] more ocfs2_delete_inode dmesg questions
Sunil Mushran sunil.mush...@oracle.com 2009-08-24 18:12: So a delete was called for some inodes that had not been orphaned. The pre-checks detected the same and correctly aborted the deletes. No harm done. Very good to hear. No, the messages do not pinpoint the device. It's something we discussed adding, but have not done it as yet. Next time this happens and you can identify the volume, do: # debugfs.ocfs2 -R findpath 613069 /dev/sdX This will tell you the pathname for the inode#. Then see if you can remember performing any op on that file. Anything. It may help us narrow down the issue. Sunil Will do. Thanks again, Brian Brian Kroth wrote: I recently brought up a mail server with two ocfs2 volumes on it, one large one for the user maildirs, and one small one for queue/spool directories. More information on the specifics below. When flushing the queues from the MXs I saw the messages listed below fly by, but since then nothing. A couple of questions: - Should I be worried about these? They seemed similar yet different to a number of other out of space and failure to delete reports of late. - How can I tell which volume has the problem inodes? - Is there anything to be done about them? Here's the snip from the tail of dmesg: [ 34.578787] netconsole: network logging started [ 36.695679] ocfs2: Registered cluster interface o2cb [ 43.354897] OCFS2 1.5.0 [ 43.373100] ocfs2_dlm: Nodes in domain (94468EF57C9F4CA18C8D218C63E99A9C): 1 [ 43.386623] kjournald2 starting: pid 2328, dev sdb1:36, commit interval 5 seconds [ 43.395413] ocfs2: Mounting device (8,17) on (node 1, slot 0) with ordered data mode. [ 44.984201] eth1: no IPv6 routers present [ 54.362580] warning: `ntpd' uses 32-bit capabilities (legacy support in use) [ 1601.560932] ocfs2_dlm: Nodes in domain (10BBA4EB7687450496F7FCF0475F9372): 1 [ 1601.581106] kjournald2 starting: pid 7803, dev sdc1:36, commit interval 5 seconds [ 1601.593065] ocfs2: Mounting device (8,33) on (node 1, slot 0) with ordered data mode. 
[ 3858.778792] (26441,0):ocfs2_query_inode_wipe:882 ERROR: Inode 613069 (on-disk 613069) not orphaned! Disk flags 0x1, inode flags 0x80 [ 3858.779005] (26441,0):ocfs2_delete_inode:1010 ERROR: status = -17 [ 4451.007580] (5053,0):ocfs2_query_inode_wipe:882 ERROR: Inode 613118 (on-disk 613118) not orphaned! Disk flags 0x1, inode flags 0x80 [ 4451.007711] (5053,0):ocfs2_delete_inode:1010 ERROR: status = -17 [ 4807.908463] (11859,0):ocfs2_query_inode_wipe:882 ERROR: Inode 612899 (on-disk 612899) not orphaned! Disk flags 0x1, inode flags 0x80 [ 4807.908611] (11859,0):ocfs2_delete_inode:1010 ERROR: status = -17 [ 5854.377155] (31074,1):ocfs2_query_inode_wipe:882 ERROR: Inode 612867 (on-disk 612867) not orphaned! Disk flags 0x1, inode flags 0x80 [ 5854.377302] (31074,1):ocfs2_delete_inode:1010 ERROR: status = -17 [ 6136.297464] (3463,0):ocfs2_query_inode_wipe:882 ERROR: Inode 612959 (on-disk 612959) not orphaned! Disk flags 0x1, inode flags 0x80 [ 6136.297555] (3463,0):ocfs2_delete_inode:1010 ERROR: status = -17 [19179.000100] NOHZ: local_softirq_pending 80 There's actually three nodes, all VMs, that are setup for the ocfs2 cluster volumes, but only one has it mounted. The others are available as cold standbys that may eventually be managed by heartbeat, so there shouldn't be any locking contention going on. All nodes are running 2.6.30 with ocfs2-tools 1.4.2. 
Here are the commands used to make the volumes: mkfs.ocfs2 -v -L ocfs2mailcluster2 -N 8 -T mail /dev/sdb1 mkfs.ocfs2 -v -L ocfs2mailcluster2spool -N 8 -T mail /dev/sdc1 The features they were set up with: tunefs.ocfs2 -Q "Label: %V\nFeatures: %H %O\n" /dev/sdb1 Label: ocfs2mailcluster2 Features: sparse inline-data unwritten tunefs.ocfs2 -Q "Label: %V\nFeatures: %H %O\n" /dev/sdc1 Label: ocfs2mailcluster2spool Features: sparse inline-data unwritten And their mount options: mount | grep cluster /dev/sdb1 on /cluster type ocfs2 (rw,noexec,nodev,_netdev,relatime,localflocks,heartbeat=local) /dev/sdc1 on /cluster-spool type ocfs2 (rw,noexec,nodev,_netdev,relatime,localflocks,heartbeat=local) localflocks because I ran into a problem with them previously, and since it's a single-active-node model currently there's no reason for them anyways. Let me know if you need any other information. Thanks, Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
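Since the errors in the thread above log only inode numbers, Sunil's debugfs.ocfs2 findpath suggestion can be scripted against the dmesg output. A sketch, untested against a live volume: the device name is a stand-in, and it only prints the debugfs.ocfs2 commands rather than running them.

```shell
#!/bin/sh
# Sketch: map the inode numbers from "not orphaned" dmesg errors back to
# pathnames with debugfs.ocfs2. DEVICE is a stand-in -- substitute the
# volume you identified.
DEVICE=/dev/sdc1

# Pull the inode numbers out of lines like:
#   (26441,0):ocfs2_query_inode_wipe:882 ERROR: Inode 613069 (on-disk 613069) not orphaned!
extract_inodes() {
    grep 'not orphaned' | sed -n 's/.*ERROR: Inode \([0-9]*\) .*/\1/p' | sort -u
}

# Demo on two lines from the dmesg snippet in this thread; feed it
# "dmesg" output instead in real use. Prints one debugfs.ocfs2 command
# per unique inode, for review before running anything.
sample='[ 3858.778792] (26441,0):ocfs2_query_inode_wipe:882 ERROR: Inode 613069 (on-disk 613069) not orphaned! Disk flags 0x1, inode flags 0x80
[ 4451.007580] (5053,0):ocfs2_query_inode_wipe:882 ERROR: Inode 613118 (on-disk 613118) not orphaned! Disk flags 0x1, inode flags 0x80'

printf '%s\n' "$sample" | extract_inodes | while read ino; do
    echo debugfs.ocfs2 -R "findpath $ino" "$DEVICE"
done
```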
Re: [Ocfs2-users] Ghost files in OCFS2 filesystem
I didn't see this in the bug list. Which mainline release is this fixed in? Thanks, Brian Sunil Mushran sunil.mush...@oracle.com 2009-08-20 17:46: Yes, this is a known issue in OCFS2 1.4.1 and 1.4.2. That is assuming no process in the cluster has that file open. We have the fix. It will be available with 1.4.3 which is in testing. This was discussed in the email announcing the 1.4.2 release. http://oss.oracle.com/pipermail/ocfs2-announce/2009-June/28.html 1. Oracle# 7643059 - Orphan files not getting deleted When one unlinks a file, its inode is moved to the orphan directory and deleted when it is no longer in-use across the cluster. As part of the scheme, the node that unlinks the file, informs interested nodes of the same, asking the last node to stop using that inode to recover the space allocated to it. However, this scheme fails if memory pressure forces a node to forget to delete the inode on close. This issue was introduced in OCFS2 1.4.1. While we have fixed this issue, the fix did not make it into this release. Users running into this issue can call Oracle Support and ask for an interim release with this fix. Workaround: If on 1.4.2, mount the fs on another node. Chances are it will delete the orphans. If not or if you are on 1.4.1, umount vol on all nodes and run fsck.ocfs2 -f. You could ping support to get an interim fix. But we are close to releasing 1.4.3. So maybe better if you wait for that. Sunil Shave, Chris wrote: Hi, I have encountered an issue on an Oracle RAC cluster using ocfs2, OS is RH Linux 5.3. One of the ocfs2 filesystems appears to be 97% full, yet when I look at the files in there they only equal about 13gig (filesystems is 40gig in size). 
I have seen this sort of thing in HP-UX, but that involved a process whose output file was deleted while the process hadn't been stopped properly; once we killed the offending process the space was released. But I can't seem to find any process on this Linux server that is using or writing files to that filesystem. File system: Filesystem Size Used Avail Use% Mounted on /dev/emcpoweri1 40G 39G 1.9G 96% /oraexport File listing: Other than directories, these are the only files in that filesystem, nothing in lost+found either. [r...@aumel21db01cn01]# ll total 11926784 -rw-rw 1 oracle oinstall 12210978816 Aug 21 05:23 0.Full.090821.dmp -rw-rw-r-- 1 oracle oinstall 1920327 Aug 21 05:23 0.Full.090821.log Any assistance with what is going on would be greatly appreciated. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 on email storage
I just did this: mkfs.ocfs2 -v -L ocfs2mail -N 8 -T mail /dev/sdb1 The tools happen to choose -b 4096 -C 4096 for you at that point. Brian Sérgio Surkamp ser...@gruposinternet.com.br 2009-08-12 12:03: On Wed, 12 Aug 2009 15:05:44 +0800, Thomas G. Lau thomas@ntt.com.hk wrote: Dear all, Anyone using OCFS2 on an email storage system? (postfix/qmail). Wondering if any of you suffer any problem?! Also, what parameters did you tune for the email system only? Thanks. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users Hello Thomas, We are using it as a general purpose shared filesystem (email, shared files, some web related files, etc...) and the only drawback versus a local filesystem is that it's a bit slower due to the cluster stack. As for tuning, we have reduced the block size and pre-allocated size to reduce the on-disk usage, since email messages usually are tiny files (let's say about 8K per message). Our setup: mkfs.ocfs2 -b 4096 -C 4096 -N 4 /dev/sdX If you plan to use it for a large number of accounts, keep your eye on the 32000 sub-directories limitation. Regards, -- Sérgio Surkamp | Gerente de Rede | ser...@gruposinternet.com.br Grupos Internet S.A. | R. Lauro Linhares, 2123 Torre B - Sala 201, Trindade - Florianópolis - SC | +55 48 3234-4109 | http://www.gruposinternet.com.br ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] [Ocfs2-announce] OCFS2 1.4.2-1 and OCFS2 Tools 1.4.2-1 released
Sunil Mushran sunil.mush...@oracle.com 2009-06-16 16:38: LOOKING AHEAD We are aiming to release OCFS2 1.6 later this year. This release will include the features that we have worked on over the past year. These are: 1. Extended Attributes (unlimited number of attributes) 2. POSIX ACLs 3. Security Attributes 4. Metadata Checksums with ECC (inodes and directories are checksummed) 5. JBD2 Support 6. Indexed Directories (subdirs number increased from 32000 to 2147483647) 7. REFLINK (inode snapshotting) Just looking over these as well: http://kernelnewbies.org/Linux_2_6_29#head-2febaacb9f9bef03ee54da9a2b026fdea824a996 http://kernelnewbies.org/Linux_2_6_30#head-1a54a63244fb0d85375f8ecbe651cf94dac38c6c For those of us wanting to play with the 2.6.30 kernel, does the new ocfs2-tools release support any of these features? In particular are ACLs, extended attributes, indexed directories, or optimized inode allocations supported by the tools? Last I had heard they weren't quite ready for some of those yet. Thanks, Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Filesystem corruption and OCFS2 errors
Sunil Mushran sunil.mush...@oracle.com 2009-05-20 15:36: Well, as long as the LVM mappings remain consistent on all nodes, it will work. The problem is that if someone changes the setup on a node, you will encounter the problem you just did. The only safe way is to have the lvm clustered too. Whereas clvm is clustered, we would prefer supporting it if we can run the fs and it, using one clusterstack. SLES11 HAE will have support for this. We hope to have the same by (RH)EL6. That's what's always held me back from doing this as well. Will the common stack be the openais stack (ie: the so called user stack), the o2cb stack, or something completely different? Thanks, Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] [Fwd: Re: Unable to fix corrupt directories with fsck.ocfs2]
Luis Freitas lfreita...@yahoo.com 2009-05-20 10:46: I am not aware of any filesystem that can withstand an online fsck. Sun ZFS can do online correction, but it doesn't have a fsck tool. I hear btrfs will support this. It may be a feature that's easier to accomplish with copy-on-write. Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 FS with BACKUP Tools/Vendors
We've used Veritas NetBackup before without problems but are currently toying with rsyncing to ZFS (running on OpenSolaris) with fs compression and daily (ZFS) snapshots, and then possibly dumping to tape. It's working really well so far. All of this actually happens from a SAN-based snapshot of the OCFS2 volume so that we can have a point-in-time backup and not hassle the production machines with backups (even though the underlying storage is still being taxed). Brian Bumpass, Brian brian.bump...@wachovia.com 2009-04-02 14:30: My apology up front if this has been discussed already. I've reviewed the archives back to Nov. 2005 and found little of anything. I need some information concerning support for OCFS2 by backup products. Currently we use IBM/Tivoli's TSM tool. They don't support OCFS2 filesystems, and it looks like they have no intent to support the FS in upcoming releases. Note: they do support their own NAS FS, GPFS, but this costs extra. Additionally, in the small testing I have done, a file under an OCFS2 FS backs up and recovers quite nicely. I have not tested using ACL lists, but don't really care about those. This issue comes down to support. So... I guess what I am looking for is some indication of what the user community's experience with OCFS2 and backups has been along similar issues. Sorry... The environment being supported is SLES 10 SP2 64-bit on DELL and HP hardware. Thanks in advance, -B ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2 backup strategies
Uwe Schuerkamp uwe.schuerk...@nionex.net 2009-03-13 10:42: Hi folks, I was wondering what a good backup strategy for ocfs2-based clusters looks like. Background: We're running a cluster of 8 SLES 10 sp2 machines sharing a common SAN-based FS (/shared) which is about 350g in size at the moment. We've already taken care of the usual optimizations concerning mount options on the cluster nodes (noatime and so on), but our backup software (bacula 2.2.8) slows to a crawl when encountering directories in this filesystem that contain quite a few small files. Data rates usually average in the tens of MB/sec doing normal backups of local filesystems on remote machines in the same LAN, but with the ocfs2 fs bacula is hard pressed not to fall below 1 MB/sec sustained throughput, which obviously isn't enough to back up 350g of data in a sensible timeframe. I've already tried disabling compression, rsync'ing to another server and so on, but so far nothing has helped with improving data rates. How would reducing the number of cluster nodes help with backups? Is there a dirty read option in ocfs2 that would allow reading the files without locking them first, or something similar? I don't think bacula is the culprit as it easily manages larger backups in the same environment; even reading off smb shares is orders of magnitude faster in this case, so my guess is I'm missing out on some non-obvious optimization that would improve ocfs2 cluster performance. Thanks in advance for any pointers all the best, Uwe This clearly may not work for all cases and I'm sure is totally unsupported, but our SAN (Equallogic) has the ability to take RW snapshots, which is where we do our backups from. There was a thread a while back about the proper way to do this. Basically, after taking the snapshot you need to fix up the filesystem in a couple of different ways (fsck, relabel, reuuid, etc.) so that the machine can mount several of these at once. If anyone's interested I can post these scripts. 
Since there's only one machine handling the snapshots and it's outside of the real ocfs2 cluster, while we're doing the fixups we also convert the snapshot to a local fs and finally remount it ro. This prevents all network locking from happening (since it's unnecessary) while the backups run. We're doing this with a 2TB mail volume (~700G of _many_ small files) and haven't noticed any problems with it. I think you could probably achieve something similar by taking the number of active nodes in the cluster down to 1 during your backup window, but that has its own problems to be concerned with. I think a simple umount /shared on all but that one node would do it. Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
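For anyone curious before the actual scripts get posted, the fixup sequence described above has roughly this shape. It's an untested sketch under assumptions: the device path, label, and mountpoint are made up, the --cloned-volume shortcut is only what newer ocfs2-tools reportedly provide, and DRY_RUN just prints each command.

```shell
#!/bin/sh
# Sketch of fixing up a SAN snapshot of an OCFS2 LUN so it can be
# mounted on the backup host alongside other snapshots. Hypothetical
# device/mountpoint; DRY_RUN=yes prints commands instead of running them.
DRY_RUN=yes
run() { if [ "$DRY_RUN" = yes ]; then echo "$@"; else "$@"; fi; }

SNAP=/dev/mapper/snap-mail
MNT=/backup/mail

# 1. Check/replay the snapshot before touching it.
run fsck.ocfs2 -fy "$SNAP"

# 2. New UUID and label so it can't be confused with the live volume
#    (newer tools reportedly wrap this as "tunefs.ocfs2 --cloned-volume").
run tunefs.ocfs2 -U "$SNAP"
run tunefs.ocfs2 -L mail-backup "$SNAP"

# 3. Flip it to a local (non-clustered) fs so no network locking runs
#    during the backup, then mount read-only.
run tunefs.ocfs2 -M local "$SNAP"
run mount -o ro "$SNAP" "$MNT"
```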
Re: [Ocfs2-users] quota/acl support
Attempting this again since I got a DSN earlier. Brian Kroth bpkr...@gmail.com 2009-02-25 10:25: I'm doing some research on the possibility of using OCFS2 to serve users' home directories and other shared space. I noticed that quota and posix acl support was added in 2.6.29 but the tools are not there yet. When can we expect that? Also, are the quotas implemented on a directory or volume level? Thanks, Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
[Ocfs2-users] quota/acl support
I'm doing some research on the possibility of using OCFS2 to serve users' home directories and other shared space. I noticed that quota and posix acl support was added in 2.6.29 but the tools are not there yet. When can we expect that? Also, are the quotas implemented on a directory or volume level? Thanks, Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2 hangs during webserver usage
I've run a web cluster with OCFS2 for almost two years now and found the local log files option to work just fine. You can use various tools to merge them, though the things that come with awstats have suited my tastes. As for monitoring multiple nodes logs during troubleshooting, I've used swatch and cssh. This may depend upon your setup, but in general the individual node that a client is connected to doesn't change frequently. Here are some other possible solutions to the problems you mentioned. David Johle djo...@industrialinfo.com 2009-01-28 13:53: 1) Lots of VirtualHosts, and I believe the correlation of a log with a particular host is lost when using syslog as there aren't enough facilities to allocate one per VH. Include the vhost name in the logging output. You already should be doing this if you're using mod_vhost_alias. If you really want separate logs per vhost have your syslog server split them back out. I know syslog-ng can do this. 2) A single syslog server is a single point of failure (for logging at least). I guess I could set up multiple syslog destinations and have each server send duplicate syslogs out. One method might be to use heartbeat (linux-ha.org) and rotate the service ip that syslog clients send to based on the health of a couple of syslog servers. 3) More network overhead, especially in the case of multiple log servers. You already have that to some extent if you're logging to OCFS2. 4) Doesn't address combined logging of non-Apache processes (e.g. Tomcat), which may eventually have the same issue. I did see mod_log_spread, which sounds like a promising alternative to apache syslogging as it addresses #1-3 above. http://www.backhand.org/mod_log_spread At 12:44 PM 1/28/2009, Sean Gray wrote: Why not just setup a syslog server and send all your apache logs to a central repository. 
Here is a quick tutorial http://www.oreillynet.com/pub/a/sysadmin/2006/10/12/httpd-syslog.html ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
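On the "split them back out" idea with syslog-ng mentioned above, a minimal config fragment along these lines is the general shape. A sketch only, with made-up paths; to split per-vhost rather than per-host you'd still need the vhost name in the message (or in $PROGRAM).

```
source s_net { udp(ip(0.0.0.0) port(514)); };

destination d_apache {
    # One file per sending web node and program; $HOST and $PROGRAM are
    # syslog-ng macros expanded per message.
    file("/var/log/web/$HOST/$PROGRAM.log" create_dirs(yes));
};

log { source(s_net); destination(d_apache); };
```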
Re: [Ocfs2-users] OCFS2 replication
Our iSCSI SAN (Equallogic) does block-level replication, so we were thinking of trying to set something up soon so that we could have some nodes in another building connected via fiber to provide site-level failover. I'll report back our experiences when we do that, but I imagine it would be similar to drbd with a nice interconnect. Brian On Jan 22, 2009, at 2:08 PM, CA Lists li...@creativeanvil.com wrote: Can't say I've replicated it between two sites, but definitely between two physical servers. I used drbd in my particular case. Here's a small blog entry I put together a while back about what I did. Hopefully it's helpful: http://www.creativeanvil.com/blog/2008/how-to-create-an-iscsi-san-using-heartbeat-drbd-and-ocfs2/ Joe Koenig Creative Anvil, Inc. Phone: 314.692.0338 1346 Baur Blvd. Olivette, MO 63132 j...@creativeanvil.com http://www.creativeanvil.com David Schüler wrote: What about drbd? From: ocfs2-users-boun...@oss.oracle.com [mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of Garcia, Raymundo Sent: Thursday, 22 January 2009 20:46 To: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] OCFS2 replication RSYNC is not real time... any other suggestion? I tried RSYNC already... From: ocfs2-users-boun...@oss.oracle.com [mailto:ocfs2-users-boun...@oss.oracle.com] On Behalf Of Sérgio Surkamp Sent: Thursday, January 22, 2009 12:32 PM Cc: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] OCFS2 replication Try rsync. Garcia, Raymundo wrote: Hello... I am trying to replicate an OCFS2 filesystem at site A to another OCFS2-based partition at site B. I have tried several products (InMage, SteelEye, etc.) without any luck. Those programs help me replicate the filesystem but not the OCFS2 mount. I assume this is because most software-based replication systems work on the block level instead of the file level. I wonder if anyone has tried to replicate OCFS2 between 2 sites. 
Thanks Raymundo Garcia The information contained in this message may be confidential and legally protected under applicable law. The message is intended solely for the addressee(s). If you are not the intended recipient, you are hereby notified that any use, forwarding, dissemination, or reproduction of this message is strictly prohibited and may be unlawful. If you are not the intended recipient, please contact the sender by return e-mail and destroy all copies of the original message. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users Regards, -- Sérgio Surkamp | Gerente de Rede | ser...@gruposinternet.com.br Grupos Internet S.A. | R. Lauro Linhares, 2123 Torre B - Sala 201, Trindade - Florianópolis - SC | +55 48 3234-4109 | http://www.gruposinternet.com.br ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] flock errors in dmesg
Thanks for the info, you guys. I ended up having to take everything down, fsck, and remount with localflocks. Hopefully that will prevent it from happening again while we wait for the kernel fix to go stable. Nothing should be doing flock anymore anyway. Luckily we can take SAN-based snapshots of pre- and post-fsck to see what's changed. That script is still running; I'll report back if I have questions about that. So far it looks to be just the dovecot.index.cache files that got nuked, which is to be expected since that's what dovecot was flocking. Thanks again, Brian

On Jan 15, 2009, at 11:58 AM, Coly Li coly...@suse.de wrote:

Brian Kroth wrote: I've been working on creating a mail cluster using ocfs2. Dovecot was configured to use flock since the kernel we're running is a Debian-based 2.6.26, which supports cluster-aware flock. User space is 1.4.1. During testing everything seemed fine, but when we got a real load on things we got a whole bunch of these messages in dmesg on the node that was hosting imap. Note that it's maildir and only one node is hosting imap, so we don't actually need flock. I think we're going to switch back to dotlocking, but I was hoping someone could interpret these error messages for me? Are they dangerous?

This is a known issue; the patch was merged in 2.6.29-rc1. Here is the patch for your reference.

Author: Sunil Mushran sunil.mush...@oracle.com

ocfs2/dlm: Fix race during lockres mastery

dlm_get_lock_resource() is supposed to return a lock resource with a proper master. If multiple concurrent threads attempt to look up the lockres for the same lockid while the lock mastery is underway, one or more threads are likely to return a lockres without a proper master. This patch makes the threads wait in dlm_get_lock_resource() while the mastery is underway, ensuring all threads return the lockres with a proper master. This issue is known to be limited to users using the flock() syscall.
For all other fs operations, the ocfs2 dlmglue layer serializes the dlm op for each lockid. Users encountering this bug will see flock() return EINVAL and dmesg have the following error: ERROR: Dlm error DLM_BADARGS while calling dlmlock on resource LOCKID: bad api args

Reported-by: Coly Li co...@suse.de
Signed-off-by: Sunil Mushran sunil.mush...@oracle.com
Signed-off-by: Mark Fasheh mfas...@suse.com
---
7b791d68562e4ce5ab57cbacb10a1ad4ee33956e

 fs/ocfs2/dlm/dlmmaster.c | 9 +++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index cbf3abe..54e182a 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -732,14 +732,21 @@ lookup:
 	if (tmpres) {
 		int dropping_ref = 0;
 
+		spin_unlock(&dlm->spinlock);
+
 		spin_lock(&tmpres->spinlock);
+		/* We wait for the other thread that is mastering the resource */
+		if (tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN) {
+			__dlm_wait_on_lockres(tmpres);
+			BUG_ON(tmpres->owner == DLM_LOCK_RES_OWNER_UNKNOWN);
+		}
+
 		if (tmpres->owner == dlm->node_num) {
 			BUG_ON(tmpres->state & DLM_LOCK_RES_DROPPING_REF);
 			dlm_lockres_grab_inflight_ref(dlm, tmpres);
 		} else if (tmpres->state & DLM_LOCK_RES_DROPPING_REF)
 			dropping_ref = 1;
 		spin_unlock(&tmpres->spinlock);
-		spin_unlock(&dlm->spinlock);
 
 		/* wait until done messaging the master, drop our ref to allow
 		 * the lockres to be purged, start over. */

Thanks, Brian

[snip]

-- Coly Li SuSE Labs

___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....
Add one for --srcport as well and I think you'll be ok. Actually, since my cluster traffic all goes over a separate switch, I usually just allow all traffic in/out of eth1. Brian

Bret Palsson b...@getjive.com 2009-01-15 08:12: So it looks like iptables is what is stopping it from working. After disabling iptables completely for 1 minute and then trying to mount on node 1, it worked fine. So my new question is: why did `iptables -A INPUT -ptcp --dport -j ACCEPT ; service iptables save` not allow ocfs2 to talk? What do people add to their iptables? -Bret

On Jan 14, 2009, at 4:50 PM, Sunil Mushran wrote: It's part and parcel of the fs. If you want mainline linux, go to http://kernel.org.

Bret Palsson wrote: Can I get the source for DLM 1.5.0 and build it on my other machines? If so, where do I grab it? Thanks, Bret

On Jan 14, 2009, at 4:28 PM, Sunil Mushran wrote: I hate cut-pastes because I have no idea whether I can trust them or not. A misspelled 0 or 1 makes a whole world of difference. But the following seems to indicate that the configuration is bad.

(3130,1):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
(4670,1):dlm_request_join:1033 ERROR: status = -107
(4670,1):dlm_try_to_join_domain:1207 ERROR: status = -107
(4670,1):dlm_join_domain:1485 ERROR: status = -107
(4670,1):dlm_register_domain:1732 ERROR: status = -107
(4670,1):o2cb_cluster_connect:302 ERROR: status = -107
(4670,1):ocfs2_dlm_init:2753 ERROR: status = -107
(4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
ocfs2: Unmounting device (253,2) on (node 0)

Why is the mount failing on node 0? I thought it was mounted on node 0? Maybe best if you file a bugzilla and attach the /var/log/messages of both nodes. Indicate the time you did the mount.
Sunil

Bret Palsson wrote:

Output of Node 0 {
OCFS2 Node Manager 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 0f78045c75c0174e50e4cf0934bf9eae)
OCFS2 DLM 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 DLMFS 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 4ce8fae327880c466761f40fb7619490)
OCFS2 User DLM kernel interface loaded
SELinux: initialized (dev ocfs2_dlmfs, type ocfs2_dlmfs), not configured for labeling
eth3: no IPv6 routers present
OCFS2 1.4.1 Tue Dec 16 19:18:02 PST 2008 (build 3fc82af4b5669945497b322b6aabd031)
ocfs2_dlm: Nodes in domain (8B2CCF82F1BA4A70B587580B23D9D7F7): 0
kjournald starting. Commit interval 5 seconds
ocfs2: Mounting device (253,3) on (node 0, slot 0) with ordered data mode.
SELinux: initialized (dev dm-3, type ocfs2), not configured for labeling
ocfs2_dlm: Nodes in domain (222B65A090D6477481AD30DE9FCE7961): 0
kjournald starting. Commit interval 5 seconds
ocfs2: Mounting device (253,2) on (node 0, slot 0) with ordered data mode.
SELinux: initialized (dev dm-2, type ocfs2), not configured for labeling
ocfs2_dlm: Nodes in domain (0425C0367AF547E989864A46F3DBD6E6): 0
kjournald starting. Commit interval 5 seconds
ocfs2: Mounting device (253,4) on (node 0, slot 0) with ordered data mode.
SELinux: initialized (dev dm-4, type ocfs2), not configured for labeling
}

Output of Node 1 {
OCFS2 Node Manager 1.5.0
OCFS2 DLM 1.5.0
ocfs2: Registered cluster interface o2cb
OCFS2 DLMFS 1.5.0
OCFS2 User DLM kernel interface loaded
device eth0 entered promiscuous mode
OCFS2 1.5.0
}

On Jan 14, 2009, at 3:58 PM, Sunil Mushran wrote: What about the dmesg on node 1? Now ideally we want the fs versions to be the same on all nodes. However, as we have not changed the protocol since 1.4.1, this should still work.

Bret Palsson wrote:

node 0 (and FS): OCFS2 1.4.1, kernel 2.6.18-92.1.22.el5xen
node 1: OCFS2 1.5, kernel 2.6.28-vs2.3.0.36.4

Output of Node 1 {
OCFS2 Node Manager 1.5.0
OCFS2 DLM 1.5.0
ocfs2: Registered cluster interface o2cb
OCFS2 DLMFS 1.5.0
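Coming back to the iptables question earlier in the thread: a minimal rule set might look like the sketch below. It assumes the default o2cb port 7777 (the port is whatever is configured per node in /etc/ocfs2/cluster.conf) and a dedicated interconnect on eth1; adjust both to your setup. Every node's configured port must be reachable from every other node.

```
# Allow OCFS2/o2cb cluster traffic on the interconnect interface.
iptables -A INPUT -i eth1 -p tcp --dport 7777 -j ACCEPT
# Per Brian's --srcport note, connections also originate from the
# configured port, so accept those too:
iptables -A INPUT -i eth1 -p tcp --sport 7777 -j ACCEPT
service iptables save
# Or, on a dedicated cluster switch, simply trust the interface:
# iptables -A INPUT -i eth1 -j ACCEPT
```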
[Ocfs2-users] flock errors in dmesg
I've been working on creating a mail cluster using ocfs2. Dovecot was configured to use flock since the kernel we're running is a Debian-based 2.6.26, which supports cluster-aware flock. User space is 1.4.1. During testing everything seemed fine, but when we got a real load on things we got a whole bunch of these messages in dmesg on the node that was hosting imap. Note that it's maildir and only one node is hosting imap, so we don't actually need flock. I think we're going to switch back to dotlocking, but I was hoping someone could interpret these error messages for me? Are they dangerous? Thanks, Brian

[257387.675734] (21573,0):ocfs2_file_lock:1587 ERROR: status = -22
[257387.675734] (21573,0):ocfs2_do_flock:79 ERROR: status = -22
[257392.121692] (21360,0):dlm_send_remote_lock_request:333 ERROR: status = -40
[257392.121938] (21360,0):dlmlock_remote:269 ERROR: dlm status = DLM_BADARGS
[257392.122023] (21360,0):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257392.122079] (21360,0):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F0008fca8f0e7e038e3
[257392.10] (21360,0):ocfs2_file_lock:1587 ERROR: status = -22
[257392.122326] (21360,0):ocfs2_do_flock:79 ERROR: status = -22
[257479.277941] (21950,0):dlm_send_remote_lock_request:333 ERROR: status = -40
[257479.277941] (21950,0):dlmlock_remote:269 ERROR: dlm status = DLM_BADARGS
[257479.277941] (21950,0):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257479.277941] (21950,0):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F00085d6ff2e7e0a106
[257479.277941] (21950,0):ocfs2_file_lock:1587 ERROR: status = -22
[257479.277941] (21950,0):ocfs2_do_flock:79 ERROR: status = -22
[257480.407024] (21947,0):dlm_send_remote_lock_request:333 ERROR: status = -40
[257480.407024] (21947,0):dlmlock_remote:269 ERROR: dlm status = DLM_BADARGS
[257480.407024] (21947,0):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257480.407024] (21947,0):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F000955ae83e7e0a13d
[257480.407024] (21947,0):ocfs2_file_lock:1587 ERROR: status = -22
[257480.407024] (21947,0):ocfs2_do_flock:79 ERROR: status = -22
[257483.221066] (21972,1):dlm_send_remote_lock_request:333 ERROR: status = -40
[257483.221066] (21972,1):dlmlock_remote:269 ERROR: dlm status = DLM_BADARGS
[257483.221066] (21972,1):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257483.221066] (21972,1):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F000955ae84e7e0a23c
[257483.221066] (21972,1):ocfs2_file_lock:1587 ERROR: status = -22
[257483.221066] (21972,1):ocfs2_do_flock:79 ERROR: status = -22
[257725.200695] (12536,0):dlm_send_remote_lock_request:333 ERROR: status = -40
[257725.200695] (12536,0):dlmlock_remote:269 ERROR: dlm status = DLM_BADARGS
[257725.200695] (12536,0):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257725.200695] (12536,0):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F000758938de7e0de1f
[257725.200695] (12536,0):ocfs2_file_lock:1587 ERROR: status = -22
[257725.200695] (12536,0):ocfs2_do_flock:79 ERROR: status = -22
[257959.288124] (18619,1):dlm_send_remote_lock_request:333 ERROR: status = -40
[257959.288124] (18619,1):dlmlock_remote:269 ERROR: dlm status = DLM_BADARGS
[257959.288124] (18619,1):dlmlock:747 ERROR: dlm status = DLM_BADARGS
[257959.288124] (18619,1):ocfs2_lock_create:998 ERROR: DLM error -22 while calling ocfs2_dlm_lock on resource F000585c3e9e7e0e40d
[257959.288124] (18619,1):ocfs2_file_lock:1587 ERROR: status = -22
[257959.288124] (18619,1):ocfs2_do_flock:79 ERROR: status = -22

___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] how to format
Brett Worth br...@worth.id.au 2009-01-14 20:00: Christophe BOUDER wrote: but i can't format my new big device to use more than 16TB for it. You should consider increasing the block size to perhaps 16k. That should increase the size to 64TB.

I think he meant cluster size. http://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html Look at questions 64 and 32. I know the documentation is for the previous version, but I believe the principles still apply. Note that since the cluster size is the smallest allocatable unit for a single file, if your volume is meant for many small files you'll end up wasting a lot of space. Perhaps the data-in-inode feature helps with that, though. I've used an increased cluster size on media volumes before and had no troubles. Brian

___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
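To make the arithmetic behind those FAQ answers concrete: the volume-size ceiling comes from OCFS2's 32-bit cluster addressing, so it scales with the cluster size, not the block size. This sketch assumes the 2**32 cluster limit, which matches the 16TB-at-4KB figure in the thread:

```python
# Max OCFS2 volume size is bounded by the 32-bit cluster count:
# at most 2**32 addressable clusters, so the ceiling scales with
# cluster size rather than block size.
def max_volume_bytes(cluster_size_bytes):
    """Upper bound on volume size, assuming 2**32 addressable clusters."""
    return cluster_size_bytes * 2**32

for kb in (4, 16, 64, 1024):
    tb = max_volume_bytes(kb * 1024) // 2**40
    print(f"{kb:>5} KB clusters -> {tb} TB max volume")
```

So formatting with a larger cluster size (for example something like `mkfs.ocfs2 -b 4K -C 1M ...`) lifts the ceiling well past 16TB, at the cost of the per-file allocation granularity Brian mentions.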
[Ocfs2-users] mem usage
I've got a question about tuning mem usage which may or may not be ocfs2 related. I have some VMs that share an iSCSI device formatted with ocfs2. They're all running a Debian-based 2.6.26 kernel. We basically just dialed the kernel down to 100HZ rather than the default 1000HZ. Everything else is the same. All the machines have 2 CPUs and 2GB of RAM. Over time I would expect that the amount of free mem decreases towards 0 and the amount of (fs) cached mem increases. I think one can simulate this by doing the following:

echo 1 > /proc/sys/vm/drop_caches
ls -lR /ocfs2/ > /dev/null

When I do this on a physical machine with a large ext3 volume, the cached field steadily increases as I expected. However, on the ocfs2 volume what I actually see is that the free mem and cache mem remain fairly constant.

# free
             total       used       free     shared    buffers     cached
Mem:       2076376     989256    1087120          0     525356      33892
-/+ buffers/cache:     430008    1646368
Swap:      1052216          0    1052216

# top -n1 | head -n5
top - 11:11:19 up 2 days, 20:26, 2 users, load average: 1.45, 1.39, 1.26
Tasks: 140 total, 1 running, 139 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.8%us, 3.7%sy, 0.0%ni, 79.5%id, 15.2%wa, 0.1%hi, 0.6%si, 0.0%st
Mem: 2076376k total, 989772k used, 1086604k free, 525556k buffers
Swap: 1052216k total, 0k used, 1052216k free, 33920k cached

I've tried to trick the machines into allowing for more cached inodes by decreasing vfs_cache_pressure, but it doesn't seem to have had much effect. I also get the same results if only one machine has the ocfs2 fs mounted. I have also tried mounting it with the localalloc=16 option that I found in a previous mailing list post. The ocfs2 filesystem is 2TB and has about 600GB of maildirs on it (many small files). The ext3 machine is about 200GB and has a couple of workstation images on it (a mix of file sizes). I haven't yet been able to narrow down whether this is VM vs. physical, ocfs2 vs. ext3, iSCSI vs. local, or something else.
Has anyone else seen similar results or have some advice as to how to improve the situation? Thanks, Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
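One way to chase this further is to sample /proc/meminfo (and /proc/slabinfo) before and after the ls -lR, rather than eyeballing free. A small hypothetical helper, shown here against the numbers from the free output above:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style 'Key: value kB' lines into a dict of kB values."""
    out = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            out[key.strip()] = int(fields[0])
    return out

# On a live system: info = parse_meminfo(open("/proc/meminfo").read())
sample = "MemTotal: 2076376 kB\nMemFree: 1087120 kB\nBuffers: 525356 kB\nCached: 33892 kB"
info = parse_meminfo(sample)
print(info["Buffers"], info["Cached"])  # -> 525356 33892
```

Diffing Buffers/Cached (and the dentry and inode-cache rows of /proc/slabinfo) across the scan would show whether the entries are being cached and then reclaimed, or never cached at all.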
Re: [Ocfs2-users] mem usage
I had read in the past that lots of RAM would be helpful for caching inodes, locks, and whatnot, so it concerned me that the machines didn't appear to be using it. As for my goals, the machines will be hosting maildirs, so I think that caching directories would be of most use. Although I agree the ls -lR is perhaps not the best test, I saw similar results with other VMs that didn't have ocfs2, so I'm leaning towards an iSCSI/VM problem and not ocfs2. However, getting it to cache more directories would still be of interest to me. I'll have to report back later about the number of disk reads. Thanks, Brian

Herbert van den Bergh herbert.van.den.be...@oracle.com 2009-01-05 10:09: The ls -lR command will only access directory entries and inodes. These are cached in slabs (see /proc/slabinfo). Not sure what happens to the disk pages that they live in, but those disk pages can be discarded immediately after the slab caches have been populated. So it's probably not a good test of filesystem data caching. You may want to explain what you hope to achieve with this: more cache hits on directory and inode entries, or on file data? Are you seeing more disk reads in this configuration than in the one you're comparing with? Thanks, Herbert.

Brian Kroth wrote: [snip]

___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
[Ocfs2-users] mailcluster advice?
I'm working on setting up a mail cluster (imap, pop, mx) using ocfs2. Does anyone have any advice or experiences they'd like to share? I've done web and video clusters before with great success; however, those were structured in such a way that there was generally only ever one write node active at a time and many read nodes. Mail won't have that feature. It's all maildir, so there's no file locking, but one does need to be concerned about ocfs2 lock contention. Currently our thought is to have one node each for mx, pop, and imap. Ignoring pop for the moment, that means there's potentially a source of contention on $mail_folder/new when sendmail is pushing mail and dovecot is trying to move mail to $mail_folder/cur. After that, imap connections should be on their own as far as other nodes are concerned. There may be multiple connections from different clients to a single folder, but that's still confined to the node serving imap, so in theory obtaining locks on the directory should be quick. I'm ignoring pop because the bulk of the users use imap and those that use pop almost exclusively use pop, so the analysis remains the same. Thoughts? Also, does ocfs2 support dnotify or inotify? Thanks, Brian ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users