Re: [Ocfs2-users] [ocfs2-users] RedHat 4 Update 2
We will be releasing one by tomorrow. Christophe JOBARD (GHH) wrote: Hi, Where can i get the RPM's of the OCFS2 software for the new Red Hat Enterprise 2.6.9-22.0.2 kernel (RH4 Update 2) ? Many Thanks, Christophe JOBARD ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Problem with configuration file.
oops... it'll be fixed today. Mathieu Avila wrote: Norbert Tretkowski wrote: * Mathieu Avila wrote: I must have missed something obvious, but i can't see what. Any ideas? You forgot indention in the configuration file. Norbert Thank you very much. I have taken the example file from the user's guide (http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_users_guide.pdf), in which there is a bad indention. -- Mathieu Avila ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Add a new node to ocfs cluster
o2cb_ctl -C -n hostname -t node -a cluster=clustername -a ip_address=ip -a ip_port=port -a number=nodenum -i For e.g. # o2cb_ctl -C -n node99 -t node -a cluster=clus5 -a ip_address=192.168.0.99 -a ip_port= -a number=99 -i Refer man o2cb_ctl for more details. Vanaclocha Llorens, Jose Lorenzo wrote: Many thanks for your help Sunil. One more question my final intention is to do it in a silent mode. Can the ocfs2console step be replaced with a sequence of commands? I mean, my final goal is add a node to an existing Oracle RAC automatically. Do you think that it is possible from the ocfs perspective? Best regards, Llorenç Vanaclocha -Mensaje original- De: Sunil Mushran [mailto:[EMAIL PROTECTED] Enviado el: sábado, 11 de marzo de 2006 0:29 Para: Vanaclocha Llorens, Jose Lorenzo CC: ocfs2-users@oss.oracle.com Asunto: Re: [Ocfs2-users] Add a new node to ocfs cluster One can add nodes dynamically. Run ocfs2console on all existing nodes to add the new node. Note adding on one and propagating to others will not work. One needs to add on each active node via ocfs2console. Once added, the new node should show up in /config/cluster/clustername/nodes. Ensure you see the new node on all active nodes. Then copy the updated cluster.conf to the new node and start the cluster and mount. Vanaclocha Llorens, Jose Lorenzo wrote: Hi everybody, My problem is that I want to add a new node to an existing RAC with ocfs2, without stop the database. If I add a new node to an existing ocfs cluster, do I need to stop the ocfs in the others nodes of the cluster? I've tried to do it without stop the ocfs in the others nodes but I get the following error: -- [EMAIL PROTECTED] ~]# mount.ocfs2 /dev/mapper/eva_d1 /u01/app/oracle mount.ocfs2: Transport endpoint is not connected while mounting /dev/mapper/eva_d1 on /u01/app/oracle -- I've three nodes: raclab3, raclab4 and raclab5. I'm trying to add raclab3 to the existing cluster formed by raclab4 and raclab5. In the raclab3 I have the following folder: -- [EMAIL PROTECTED] ~]# ls -l /config/cluster/ocfs2/node/ total 0 drwxr-xr-x 2 root root 0 Mar 10 11:53 raclab3 drwxr-xr-x 2 root root 0 Mar 10 11:47 raclab4 drwxr-xr-x 2 root root 0 Mar 10 11:47 raclab5 -- But in the raclab4 and raclab5 I have: -- [EMAIL PROTECTED] ~]# ls -l /config/cluster/ocfs2/node/ total 0 drwxr-xr-x 2 root root 0 Mar 10 12:47 raclab4 drwxr-xr-x 2 root root 0 Mar 10 12:47 raclab5 -- If I execute in both nodes: -- [EMAIL PROTECTED] ~]# /etc/init.d/o2cb disable Writing O2CB configuration: OK [EMAIL PROTECTED] ~]# /etc/init.d/o2cb enable Writing O2CB configuration: OK Loading module configfs: OK Mounting configfs filesystem at /config: OK Loading module ocfs2_nodemanager: OK Loading module ocfs2_dlm: OK Loading module ocfs2_dlmfs: OK Mounting ocfs2_dlmfs filesystem at /dlm: OK Starting cluster ocfs2: OK -- I get information of raclab3: -- [EMAIL PROTECTED] ~]# ls -l /config/cluster/ocfs2/node/ total 0 drwxr-xr-x 2 root root 0 Mar 10 12:51 raclab3 drwxr-xr-x 2 root root 0 Mar 10 12:51 raclab4 drwxr-xr-x 2 root root 0 Mar 10 12:51 raclab5 -- And finally I can mount my file system in the raclab3 node. Summarizing, is it possible to add a new node without stop the ocfs in all the cluster nodes
Re: [Ocfs2-users] nodes dont see eachother pls help!
Is this a shared disk? Do: # echo stats | debugfs.ocfs2 -n /dev/sdX | grep UUID on all nodes Is the UUID the same? Oneill wrote: Hi! I working on an oracle cluster but I cannot get fahrer because ocfs2 nodes dont synchronize. I can create ocfs2 filesystem both mashine if i want but they totally dont see eachother and it's not a network error (unsecure fedora core4 boxes without firewall or security patch etc.) , all settings perfect, i generated cluster.conf many times with ocfs2console and manually too, but cant help. I read all writeings on the ocfs2 page and really dont know why they dont work. As I said there is 2 fedora core 4 box, config the same, I compiled the kernel with your ocfs2 patch (version:2.6.14), all startup scripts, and ocfs2console works perfectly, there is no error in the logs, 2 mashine can ping eachother but i checked traffic when tried to setup the cluster and there isnt a single packet going to port . Disk partitions same both sides, I tried to format and mount volumes node1, node2 ; node2, node1, nothing, happens the other node... And 1 more think /proc/fs/ocfs2 dont exits! I dont know why, i can format and mount ocfs2 particions locally. Help me pls! Thanks: Oneill ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Getting eI am using RHLError when mounting shar ed OCFS2 device.
/etc/hosts is not the problem. Do: /sbin/ifconfig Do you see the vip bound on the same interface as the one used in cluster.conf? Also, what does the dmesg indicate on both nodes. The lower node number will list the ip which is trying to connect to it. Vaidya, Sachin wrote: Removed VIPs from hosts and restarted the cluster. But nothing changed. Still cannot mount /dev/md0 on both nodes. Do I need to reboot servers after changing the /etc/hosts ? Any other suggestions ? Thanks, Sachin Vaidya Infrastructure Management Senior Analyst Affiliated Computer Services -Original Message- From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: Thursday, March 30, 2006 5:34 PM To: Vaidya, Sachin Cc: ''ocfs2-users@oss.oracle.com' ' Subject:Re: [Ocfs2-users] Getting eI am using RHLError when mountingshar ed OCFS2 device. Remove vip and mount on both. See if that helps. Vaidya, Sachin wrote: Hi, Tried both public and private ip addreses but still not able to mount device on both nodes. Here are my configuration details. hosts file : same on both nodes. 127.0.0.1 localhost.localdomain localhost 172.18.11.12acspittdw001acspittdw001.servicemetrics.net 172.18.22.1 priv-acspittdw001 172.18.11.24vip-acspittdw001 172.18.11.13acspittdw002acspittdw002.servicemetrics.net 172.18.22.2 priv-acspittdw002 172.18.11.25vip-acspittdw002 The cluster.conf on both nodes looks same as node: ip_port = ip_address = 172.18.11.12 number = 0 name = acspittdw001 cluster = ocfs2 node: ip_port = ip_address = 172.18.11.13 number = 1 name = acspittdw002 cluster = ocfs2 cluster: node_count = 2 name = ocfs2 Both nodes can ping each other on public and private ips. The mount command produces following error on node 2 when device is already mounted on node 1. [EMAIL PROTECTED] ~]# mount -t ocfs2 /dev/md0 /crs1 mount.ocfs2: Transport endpoint is not connected while mounting /dev/md0 on /crs1 [EMAIL PROTECTED] ~]# dmesg show following messages SELinux: initialized (dev debugfs, type debugfs), uses genfs_contexts (5027,2):ocfs2_initialize_super:1354 max_slots for this device: 8 (5027,2):ocfs2_fill_local_node_info:1031 I am node 1 (4986,2):o2net_connect_expired:1446 ERROR: no connection established with node 0 after 10 seconds, giving up and returning errors. (5027,2):dlm_request_join:771 ERROR: status = -107 (5027,2):dlm_try_to_join_domain:919 ERROR: status = -107 (5027,2):dlm_join_domain:1164 ERROR: status = -107 (5027,2):dlm_register_domain:1354 ERROR: status = -107 (5027,2):ocfs2_dlm_init:1996 ERROR: status = -107 (5027,2):ocfs2_mount_volume:1063 ERROR: status = -107 ocfs2: Unmounting device (9,0) on (node 1) [EMAIL PROTECTED] ~]# Any idea why this is happening ? I can provide more details if needed. Any help will be greatly appreciated. Thanks in advance. - Sachin Vaidya. -Original Message- From: Sunil Mushran To: Vaidya, Sachin Cc: 'ocfs2-users@oss.oracle.com' Sent: 3/29/2006 7:16 PM Subject: Re: [Ocfs2-users] Getting eI am using RHLError when mounting shared OCFS2 device. Connection failiure. Check dmesg. Mount triggers the heartbeat thread which triggers the o2net to make a connection to all heartbeating nodes. If this connection fails, the mount fails. (The larger node number initiates the connection to the lower node number.) Obvious error would be incorrect ipaddr specified in cluster.conf. Error messages in /var/log/messsages on both nodes will provide more clues. Vaidya, Sachin wrote: Hi, I am using RHLE4 2.6.9-34.Elsmp with OCFS2 1.2. The h/w for this 2 node cluster is connected correctly. After loading ocfs2 on both nodes, the shared device could only be mounted on one node. When I try to mount same shared device on second node then I get following error. Mount.ocfs2: Transport endpoint is not connected while mounting /dev/md0 on /crs1 Any idea, why this is happening ? Any help will be highly appreciated. Thanks, Sachin Vaidya ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] understanding self fencing with ocfs2
In a 2 node setup, if node 0 or 1 crashes, the other node should survive. The one issue encountered by many users was while shutting down node 0, node 1 would fence it self. The latter was because of the sequencing of service shutdowns. We added ocfs2-init script to handle shutdown sequencing. However, 1.0.2 is fairly old. We've made numerous fixes. Ideally one should be on SP3. Infact, look for SuSE to make a new drop in the coming weeks which will include the certified ocfs2 bits. [EMAIL PROTECTED] wrote: hi list, heaving read the FAQ, I still have a problem understanding the self fencing thing. the FAQ sais: Q02 How does OCFS2's cluster services define a quorum? A02 ... A node has quorum when: * it sees an odd number of heartbeating nodes and has network connectivity to more than half of them. or * it sees an even number of heartbeating nodes and has network connectivity to at least half of them *and* has connectivity to the heartbeating node with the lowest node number. and Q03 What is fencing? A03 Fencing is the act of forecefully removing a node from a cluster. A node with OCFS2 mounted will fence itself when it realizes that it doesn't have quorum in a degraded cluster. ... with a two-node-cluster with node numbers 0 and 1, I see the following problem. if the node with node number 0 crashes and neither does heartbeat nor is it reachable via LAN, we have: - an odd number of heartbeating nodes (1, the node with number 1) but - no network connectivity to more than half of them (the only other node is'nt reachable anymore) so, as I see it, no qorum = self fencing. as a result, we end up with no node at all. is this right (and is it meant that way) or is there any special algorithm in a two node environment? our config is: two HP DL380 G4, SLES9 SP2 (no SP3, because it's not supported by EMC powerpath) Linux bmiam112 2.6.5-7.201-bigsmp #1 SMP Thu Aug 25 06:20:45 UTC 2005 i686 i686 i386 GNU/Linux all OCFS modules version 1.0.2-SLES, ocfs2console-0.99.14-0.3 ocfs2-tools-0.99.14-0.3 each with two NICs in active-standby (bond0) thanks in advance and sorry, if this is kind of a newby-question greetings thomas zimolong Bundesministerium des Inneren Referat Z 6 - Funktionsbereich Anwendungsentwicklung Alt-Moabit 101 D D-10559 Berlin Fon 01888 681 2383 Fax 01888 681 5 2383 mailto:[EMAIL PROTECTED] http://bmi.bund.de ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Node panic
You may want to upgrade to 1.2.1. We have done fixes in this area. Jim Erb wrote: Can anyone tell me what might be happening here. I have a 3 node cluster running under RH AS 4 (2.6.9-22.0.1.ELsmp) with ocfs2 v. 1.2.0-1. I've recently implemented elevator=deadline in grub.conf to fix some previous panics, but now it seems this box goes down every few days with this panic: May 10 13:46:10 linux97 kernel: (29579,0):ocfs2_extend_file:784 ERROR: bug expression: i_size_read(inode) != (le64_to_cpu(fe-i_size) - *bytes_extended) May 10 13:46:10 linux97 kernel: (29579,0):ocfs2_extend_file:784 ERROR: Inode 3891726 i_size = 77801, dinode i_size = 79865, bytes_extended = 0, new_i_size = 77874 May 10 13:46:10 linux97 kernel: [ cut here ] May 10 13:46:10 linux97 kernel: kernel BUG at /rpmbuild/jlbec/BUILD/ocfs2-1.2.0/fs/ocfs2/file.c:784! May 10 13:46:10 linux97 kernel: invalid operand: [#1] May 10 13:46:10 linux97 kernel: SMP May 10 13:46:10 linux97 kernel: Modules linked in: nfs lockd hangcheck_timer md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core ocfs2(U) debugfs(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) sunrpc dm_mirror dm_mod emcphr(U) emcpmpap(U) emcpmpaa(U) emcpmpc(U) emcpmp(U) emcp(U) emcplib(U) button battery ac uhci_hcd ehci_hcd hw_random shpchp e1000 bonding(U) floppy sg ext3 jbd lpfc(U) scsi_transport_fc megaraid_mbox megaraid_mm sd_mod scsi_mod May 10 13:46:10 linux97 kernel: CPU:0 May 10 13:46:10 linux97 kernel: EIP:0060:[f8f8c635]Tainted: P VLI May 10 13:46:10 linux97 kernel: EFLAGS: 00210292 (2.6.9-22.0.1.ELsmp) May 10 13:46:10 linux97 kernel: EIP is at ocfs2_extend_file+0x380/0xf25 [ocfs2] May 10 13:46:10 linux97 kernel: eax: 0086 ebx: ecx: ea42fe6c edx: f8fb52b5 May 10 13:46:10 linux97 kernel: esi: f4144e24 edi: ea42ff18 ebp: cc54 esp: ea42fea4 May 10 13:46:10 linux97 kernel: ds: 007b es: 007b ss: 0068 May 10 13:46:10 linux97 kernel: Process oracle (pid: 29579, threadinfo=ea42f000 task=e8bd11b0) May 10 13:46:10 linux97 kernel: Stack: f70d0380 f4144e24 f6ef4f00 ea42ff58 May 10 13:46:10 linux97 kernel: e53ff48c f7fbc200 ea42ff68 ea42ff68 ea42ff68 May 10 13:46:10 linux97 kernel: f4144e24 f8f9a4ee 00013032 ea42ff18 00012fe9 May 10 13:46:10 linux97 kernel: Call Trace: May 10 13:46:10 linux97 kernel: [f8f9a4ee] ocfs2_write_lock_maybe_extend+0x731/0xad5 [ocfs2] May 10 13:46:10 linux97 kernel: [f8f8a684] ocfs2_file_write +0x11f/0x254 [ocfs2] May 10 13:46:10 linux97 kernel: [c0159d24] vfs_write+0xb6/0xe2 May 10 13:46:10 linux97 kernel: [c0159dee] sys_write+0x3c/0x62 May 10 13:46:10 linux97 kernel: [c02d0fb7] syscall_call+0x7/0xb May 10 13:46:10 linux97 kernel: Code: b1 e0 fd ff ff ff b1 dc fd ff ff 68 10 03 00 00 68 01 fb fa f8 ff 70 10 ff b2 94 00 00 00 68 b5 52 fb f8 e8 f8 5b 19 c7 83 c4 3c 0f 0b 10 03 1c 50 fb f8 8b 5c 24 10 8b 83 54 01 00 00 0f ae e8 May 10 13:46:10 linux97 kernel: 0Fatal exception: panic in 5 seconds ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 hangs system on reboot
ocfs2-tools includes two init scripts, o2cb and ocfs2. Ensure the scripts are active and running in the correct sequence. As in, the startup seq should network, o2cb and then ocfs2. The shutdown is the reverse of that. [EMAIL PROTECTED] wrote: Anyone experience OCFS2 hanging the system on reboot, I'm running OCFS2 1.2.1-1 on RHEL 4 Update 3 64bit. OCFS2 is up and running on 3 nodes, with mounts. When I issue a shutdown -ry now command with OCFS2 mounts still mounted the system begins to shutdown then starts freaking out about not being able to communicate with other nodes in the cluster and starts panicing and fences itself. It hangs here and I have to cycle the server by hand. This is not a problem if I manually unmount the OCFS2 filesystems prior to rebooting, I've tried putting an unmount script in /etc/rc6.d but to no avail, whatever is happening is happening before it get to my unmount script. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Disk-based DLM
OCFS2 does not have a disk-based dlm. Net connectivity is a must. Leonardo de Assis wrote: Hi, I have two machines that does not have network connection. If my disk can be shared between them, there is an way to use disk-based dlm or any other manner that does not relay on network access? -- Leonardo de Assis Computação - UFCG [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] RHEL 4 U2 / OCFS 1.2.1 weekly crash?
The hb failure is just the effect of the ios not completing within 12 secs. The full oops trace gives the last 24 ops and their timings. One solution is to double up the hb timeout. Set, O2CB_HEARTBEAT_THRESHOLD = 14 Brian Long wrote: Hello, I have two nodes running the 2.6.9-22.0.2.ELsmp kernel and the OCFS2 1.2.1 RPMs. About once a week, one of the nodes crashes itself (self- fencing) and I get a full vmcore on my netdump server. The netdump log file shows the shared filesystem LUN (/dev/dm-6) did not respond within 12000ms. I have not changed the default heartbeat values in /etc/sysconfig/o2cb. There was no other IO ongoing when this happens, but they are HP Proliant servers running the Insight Manager agents. Why would the heartbeat fail roughly once a week? Should I open a bugzilla and upload my netdump log file? Thanks. /Brian/ ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] bug in /etc/init.d/o2cb?
Yes, we are missing that bit. File a bug on http://oss.oracle.com/bugzilla component ocfs2-tools. [EMAIL PROTECTED] wrote: hi, maybe this is not the place to file a bug, but I think there is one in /etc/init.d/o2cb. the script should be used to create the config file /etc/sysconfig/o2cb by calling it with o2cb configure, and the generated config file contains info not to edit the file, but to use the script. alas, the script only checks for to parameters, the cluster name and wether to enable the cluster or not. the sometimes necessary modification of the heartbeat threshold cannot be made via the script, and, even worse, it is overwritten by write_config(), when someone calls the script after the modification was made using some editor. so, maybe configure_ask() should contain a third loop to ask for that parameter to (or any parameter which additionaly can be set). greets thomas zimolong Bundesministerium des Inneren Referat Z 6 - Funktionsbereich Anwendungsentwicklung Alt-Moabit 101 D D-10559 Berlin Fon 01888 681 2383 Fax 01888 681 5 2383 mailto:[EMAIL PROTECTED] http://bmi.bund.de ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Different versions of ocfs2 and Kernel
I would not recommend that for 1.1.7 to 1.2.1. While neither the on-disk format nor the messaging has changed since 1.0, there have been other internal changes which could cause problems. The recommendation is documented in the faq... under Upgrading to 1.2.1. Marco Friebe wrote: Thanks for your answers. It was only meant to be a temporary solution. While updating one node the others have to be available. No downtime for all nodes in the same time frame. --- Marco Friebe __ Systemberater Robotron Datenbank-Software GmbH Stuttgarter Straße 29 01189 Dresden Telefon: +49 (0) 351/4021 655 Telefax: +49 (0) 351/4021 696 Mailto: [EMAIL PROTECTED] Web: www.robotron.de -Ursprüngliche Nachricht- Von: Sunil Mushran [mailto:[EMAIL PROTECTED] Gesendet: Dienstag, 13. Juni 2006 18:14 An: Marco Friebe Cc: ocfs2-users@oss.oracle.com Betreff: Re: [Ocfs2-users] Different versions of ocfs2 and Kernel As we never test such mixed setups, we never recommended it. Also, it is much easier to manage clusters when one has the same software on all the nodes. Marco Friebe wrote: Hello, is it possible to have different kernel and ocfs version on different nodes? Here: ocfs2 v1.1.7 Kernel 2.6.5-7.244 (SuSE) and Ocfs2 v1.2.1 Kernel 2.6.5-7.257 (SuSE) Marco Friebe __ Systemberater Robotron Datenbank-Software GmbH Stuttgarter Straße 29 01189 Dresden Telefon: +49 (0) 351/4021 655 Telefax: +49 (0) 351/4021 696 Mailto: [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] Web: www.robotron.de http://www.robotron.de ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] change heartbeat threshold online?
It's not a sysctl entry. It won't work that way. Set the required value in /etc/sysconfig/o2cb and restart the cluster. Do it on all nodes. [EMAIL PROTECTED] wrote: hi, I'm just thinking about changing the heartbeat threshold of our cluster online by issuing # echo 31 /proc/fs/ocfs2_nodemanager/hb_dead_threshold I thought I read that somewhere ut cannot recall where and I don't find it in the FAQ(?). So is this the way to do it, or is it stop and restart ocfs2/o2cb? greetz thomas zimolong Bundesministerium des Inneren Referat Z 6 - Funktionsbereich Anwendungsentwicklung Alt-Moabit 101 D D-10559 Berlin Fon 01888 681 2383 Fax 01888 681 5 2383 mailto:[EMAIL PROTECTED] http://bmi.bund.de ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] kernel BUG at /rpmbuild/smushran/BUILD/ocfs2-1.2.1/fs/ocfs2/file.c:787!
Check out http://oss.oracle.com/bugzilla/show_bug.cgi?id=723 Peter McMahon wrote: All still working on the use of OCFS2 Yesterday, when we were running autoconfig for an Apps DB node in a RAC cluster the other node crashed extract from /var/log/messages...is below... If anyone can advise of what action we should take please do Thanks in advance Peter other info which me be relavent - o2cb_ctl version 1.2.1 - Linux 2.6.9-34.ELsmp #1 SMP Fri Feb 24 16:54:53 EST 2006 i686 i686 i386 GNU/Linux - o2cb status Module configfs: Loaded Filesystem configfs: Mounted Module ocfs2_nodemanager: Loaded Module ocfs2_dlm: Loaded Module ocfs2_dlmfs: Loaded Filesystem ocfs2_dlmfs: Mounted Checking cluster ocfs2: Online Checking heartbeat: Active - ocfs2 status Configured OCFS2 mountpoints: /home /d01 /d02 /d03 /d04 /CRS Active OCFS2 mountpoints: /home /d01 /d02 /d03 /d04 /CRS - O2CB_HEARTBEAT_THRESHOLD=60 === Jun 20 13:15:21 tudbsou01 kernel: (1659,1):ocfs2_extend_file:787 ERROR: bug expression: i_size_read(inode) != (le64_to_cpu(fe-i_size) - *bytes_extended) Jun 20 13:15:21 tudbsou01 kernel: (1659,1):ocfs2_extend_file:787 ERROR: Inode 16615168 i_size = 225, dinode i_size = 3774, bytes_extended = 0, new_i_size = 345 Jun 20 13:15:21 tudbsou01 kernel: [ cut here ] Jun 20 13:15:21 tudbsou01 kernel: kernel BUG at /rpmbuild/smushran/BUILD/ocfs2-1.2.1/fs/ocfs2/file.c:787! Jun 20 13:15:21 tudbsou01 kernel: invalid operand: [#1] Jun 20 13:15:21 tudbsou01 kernel: SMP Jun 20 13:15:21 tudbsou01 kernel: Modules linked in: md5 ipv6 autofs4 i2c_dev i2c_core ocfs2(U) debugfs(U) ocfs2_dlmfs(U) ocfs2_dlm(U) ocfs2_nodemanager(U) configfs(U) sunrpc emcphr(U) emcpmpap(U) emcpmpaa(U) emcpmpc(U) emcpmp(U) emcp(U) emcplib(U) button battery ac joydev uhci_hcd shpchp tg3 sg st dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300(U) qla2xxx(U) qla2xxx_conf(U) mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod Jun 20 13:15:21 tudbsou01 kernel: CPU:1 Jun 20 13:15:21 tudbsou01 kernel: EIP: 0060:[f93ce081]Tainted: P VLI Jun 20 13:15:21 tudbsou01 kernel: EFLAGS: 00010292 (2.6.9-34.ELsmp) Jun 20 13:15:21 tudbsou01 kernel: EIP is at ocfs2_extend_file+0x380/0xf25 [ocfs2] Jun 20 13:15:21 tudbsou01 kernel: eax: 0081 ebx: ecx: e71c8e6c edx: f93f726f Jun 20 13:15:21 tudbsou01 kernel: esi: f3b0d624 edi: e71c8f18 ebp: f3a6b000 esp: e71c8ea4 Jun 20 13:15:21 tudbsou01 kernel: ds: 007b es: 007b ss: 0068 Jun 20 13:15:21 tudbsou01 kernel: Process racgmain (pid: 1659, threadinfo=e71c8000 task=f6eea930) Jun 20 13:15:21 tudbsou01 kernel: Stack: f6ffef40 f3b0d624 f600ad80 e71c8f58 Jun 20 13:15:21 tudbsou01 kernel: f3921458 f7e53a00 e71c8f68 e71c8f68 e71c8f68 Jun 20 13:15:21 tudbsou01 kernel: f3b0d624 f93dc213 0159 e71c8f18 00e1 Jun 20 13:15:21 tudbsou01 kernel: Call Trace: Jun 20 13:15:21 tudbsou01 kernel: [f93dc213] ocfs2_write_lock_maybe_extend+0x731/0xad5 [ocfs2] Jun 20 13:15:21 tudbsou01 kernel: [f93cc0d0] ocfs2_file_write+0x11f/0x254 [ocfs2] Jun 20 13:15:21 tudbsou01 kernel: [c015a5e8] vfs_write+0xb6/0xe2 Jun 20 13:15:21 tudbsou01 kernel: [c015a6b2] sys_write+0x3c/0x62 Jun 20 13:15:21 tudbsou01 kernel: [c02d2657] syscall_call+0x7/0xb Jun 20 13:15:21 tudbsou01 kernel: [c02d007b] schedule+0x32f/0x8d3 Jun 20 13:15:21 tudbsou01 kernel: Code: b1 e0 fd ff ff ff b1 dc fd ff ff 68 13 03 00 00 68 b5 18 3f f9 ff 70 10 ff b2 94 00 00 00 68 6f 72 3f f9 e8 bc 45 d5 c6 83 c4 3c 0f 0b 13 03 d3 6f 3f f9 8b 5c 24 10 8b 83 54 01 00 00 0f ae e8 Jun 20 13:15:21 tudbsou01 kernel: 0Fatal exception: panic in 5 seconds Jun 20 13:25:25 tudbsou01 syslogd 1.4.1: restart. The LOST Ninja blog: Exclusive clues, clips and gossip. http://au.blogs.yahoo.com/lostninja ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Error while Mounting
Is it always the mount using node slot 1 that fails? If so, the jbd superblock may be corrupted for that slot. Grow the journal by, say, 1MB. It will reinitialize the JBD superblock for all the slots. Either that or just reformat the device. To see the size of the existing journal, do: # echo ls -l // | debugfs.ocfs2 -n /dev/sda1 | grep journal 36 -rw-r--r-- 1 0 067108864 21-Jun-2006 11:58 journal: 37 -rw-r--r-- 1 0 067108864 21-Jun-2006 11:58 journal:0001 38 -rw-r--r-- 1 0 067108864 21-Jun-2006 11:58 journal:0002 39 -rw-r--r-- 1 0 067108864 21-Jun-2006 11:58 journal:0003 The grow the journal, do: # tunefs.ocfs2 -Jsize=65M /dev/sdX Zachary Williams wrote: I am attempting to setup a 2 node ocfs2 cluster. At this point, I have the latest 1.2.1 version of the tools on both nodes. They are not running identical kernels (one is 2.6.16.18 http://2.6.16.18, the other is 2.6.17.1 http://2.6.17.1) both are using the kernels built in OCFS2 modules, not using from source. I can mount my iscsi volume on either node individually, but when I attempt to mount two nodes, I get the following error. (To confirm, I have 2 nodes setup in the config file, and the filesystem set to a maximum of 4 nodes) The error is JDB: no valid journal superblock found I have searched high and low for this, but wasn't able to come up with anything as to why I get this. This error will occur on either node. (3509,0):o2net_set_nn_state:415 accepted connection from node bsp (num 1) at 10.1.1.11: http://10.1.1.11: (3575,0):ocfs2_initialize_super:1326 max_slots for this device: 4 (3575,0):ocfs2_fill_local_node_info:1019 I am node 0 (3575,0):__dlm_print_nodes:377 Nodes in my domain (E09A0D90C8454749B81E9754438611B8): (3575,0):__dlm_print_nodes:381 node 0 (3575,0):__dlm_print_nodes:381 node 1 (3575,0):ocfs2_find_slot:267 taking node slot 1 JBD: no valid journal superblock found (3575,0):ocfs2_journal_wipe:814 ERROR: status = -22 (3575,0):ocfs2_check_volume:1581 ERROR: status = -22 (3575,0):ocfs2_mount_volume:1087 ERROR: status = -22 ocfs2: Unmounting device (8,16) on (node 0) (3577,0):o2net_set_nn_state:400 no longer connected to node bsp (num 1) at 10.1.1.11: http://10.1.1.11: ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] out of memory?
I would like the entire /proc/meminfo and /proc/slabinfo. Dump it to a file every 1 min or so. What version of the kernel/ocfs2? Paul Jimenez wrote: On Jun 29, 2006, at 8:22 AM, Brian Long wrote: On Wed, 2006-06-28 at 17:03 -0500, Paul Jimenez wrote: I'm getting out of memory errors trying to do 'rsync -av /foo /bar' where /foo is a local dir and /bar is an ocfs2 filesystem running on an ~ 6T ATA-over-Ethernet box. Paul, Can you also include some information about your /foo partition? It is millions of little files or hundreds of large files? What is the RSS of rsync when you run out of memory? http://samba.anu.edu.au/rsync/FAQ.html#5 http://lists.samba.org/archive/rsync/2002-July/003160.html /foo is ~ 4600 files each about 60GB for a total of ~259GB. Some output after or slightly-before it crashed: Every 2s: cat /proc/slabinfo | sort -rnk 2 | head Thu Jun 29 11:58:01 2006 buffer_head 754620 754632 52 721 : tunables 120 608 : slabdata 10481 10481 0 bio 225600 225600128 301 : tunables 120 608 : slabdata 7520 7520 0 biovec-1 225593 225736 16 2031 : tunables 120 608 : slabdata 1112 1112 0 journal_head 175548 182448 52 721 : tunables 120 608 : slabdata 2530 2534 0 aoe_bufs 112536 112554 48 781 : tunables 120 608 : slabdata 1443 1443 0 radix_tree_node41510 41510276 141 : tunables 54 278 : slabdata 2965 2965 0 sysfs_dir_cache 3644 3772 40 921 : tunables 120 608 : slabdata 41 41 0 size-32 2938 4407 32 1131 : tunables 120 608 : slabdata 39 39 0 size-64 2354 2596 64 591 : tunables 120 608 : slabdata 44 44 0 dentry_cache2086 3090128 301 : tunables 120 608 : slabdata103103 0 Free swap: 16779608kB 4718592 pages of RAM 4489216 pages of HIGHMEM 562809 reserved pages 530215 pages shared 0 pages swap cached 136994 pages dirty 61878 pages writeback 142502 pages mapped 29403 pages slab 480 pages pagetables 4718592 pages of RAM 4489216 pages of HIGHMEM 562809 reserved pages 530215 pages shared 0 pages swap cached 136994 pages dirty 61876 pages writeback 142502 pages mapped 29425 pages slab 480 pages pagetables I don't think it's rsync running things oom; its memory consumption is filecount based and 4600 files just isn't that many. The tunables that I had in place from the AoE faq (http:// www.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html#toc5.18) this time were: vm.overcommit_memory=2 vm.dirty_ratio=3 vm.dirty_background_ratio=3 vm.min_free_kbytes=5120 Any help appreciated. --pj ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] out of memory?
HighFree: 11877028 kB LowFree:391020 kB HighFree: 11761892 kB LowFree:342380 kB HighFree: 11654316 kB LowFree:315860 kB HighFree: 11578756 kB LowFree:291928 kB HighFree: 11490936 kB LowFree:264788 kB That's at the end. I fail to see the enomem. Plenty of lowfree and highfree. Some of the slabs do have high counts, but this is a big box. What is crashing? Is the server oopsing? oom-kill? Or, is the user-space process erroring out? Paul Jimenez wrote: I have that complete file - from before rsync to the crash (~ 4MB) at http://www.rgmadvisors.com/~pj/memslabinfo. Kernel is 2.6.16.7 vanilla, and the version of ocfs2 it came with. --pj On Jun 29, 2006, at 2:10 PM, Sunil Mushran wrote: I would like the entire /proc/meminfo and /proc/slabinfo. Dump it to a file every 1 min or so. What version of the kernel/ocfs2? Paul Jimenez wrote: On Jun 29, 2006, at 8:22 AM, Brian Long wrote: On Wed, 2006-06-28 at 17:03 -0500, Paul Jimenez wrote: I'm getting out of memory errors trying to do 'rsync -av /foo /bar' where /foo is a local dir and /bar is an ocfs2 filesystem running on an ~ 6T ATA-over-Ethernet box. Paul, Can you also include some information about your /foo partition? It is millions of little files or hundreds of large files? What is the RSS of rsync when you run out of memory? http://samba.anu.edu.au/rsync/FAQ.html#5 http://lists.samba.org/archive/rsync/2002-July/003160.html /foo is ~ 4600 files each about 60GB for a total of ~259GB. Some output after or slightly-before it crashed: Every 2s: cat /proc/slabinfo | sort -rnk 2 | head Thu Jun 29 11:58:01 2006 buffer_head 754620 754632 52 721 : tunables 120608 : slabdata 10481 10481 0 bio 225600 225600128 301 : tunables 120608 : slabdata 7520 7520 0 biovec-1 225593 225736 16 2031 : tunables 120608 : slabdata 1112 1112 0 journal_head 175548 182448 52 721 : tunables 120608 : slabdata 2530 2534 0 aoe_bufs 112536 112554 48 781 : tunables 120608 : slabdata 1443 1443 0 radix_tree_node41510 41510276 141 : tunables 54278 : slabdata 2965 2965 0 sysfs_dir_cache 3644 3772 40 921 : tunables 120608 : slabdata 41 41 0 size-32 2938 4407 32 1131 : tunables 120608 : slabdata 39 39 0 size-64 2354 2596 64 591 : tunables 120608 : slabdata 44 44 0 dentry_cache2086 3090128 301 : tunables 120608 : slabdata103103 0 Free swap: 16779608kB 4718592 pages of RAM 4489216 pages of HIGHMEM 562809 reserved pages 530215 pages shared 0 pages swap cached 136994 pages dirty 61878 pages writeback 142502 pages mapped 29403 pages slab 480 pages pagetables 4718592 pages of RAM 4489216 pages of HIGHMEM 562809 reserved pages 530215 pages shared 0 pages swap cached 136994 pages dirty 61876 pages writeback 142502 pages mapped 29425 pages slab 480 pages pagetables I don't think it's rsync running things oom; its memory consumption is filecount based and 4600 files just isn't that many. The tunables that I had in place from the AoE faq (http:// www.coraid.com/support/linux/EtherDrive-2.6-HOWTO.html#toc5.18) this time were: vm.overcommit_memory=2 vm.dirty_ratio=3 vm.dirty_background_ratio=3 vm.min_free_kbytes=5120 Any help appreciated. --pj ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] out of memory?
Strange. The meminfo/slabinfo data does not match this. The deal is if none of the components are leaking memory, not much one can do other than limiting the lowmem consumption. So, yes, try HIGHPTE. If 4G/4G was in mainline, I would have suggested that too. Else, maybe just limit the box to 8G (from 16G). Or, just upgrade to a 64-bit box. :) Paul Jimenez wrote: [4296647.18] oom-killer: gfp_mask=0xd0, order=0 [4296647.181000] [c014148b] out_of_memory+0xb4/0xd1 [4296647.181000] [c0142627] __alloc_pages+0x267/0x2fa [4296647.181000] [c01426e4] __get_free_pages+0x2a/0x4e [4296647.181000] [c016fcb7] __pollwait+0x86/0xc7 [4296647.181000] [c03de7d4] datagram_poll+0x2b/0xcf [4296647.181000] [c04173f1] udp_poll+0x23/0xf7 [4296647.181000] [c03d7867] sock_poll+0x23/0x2b [4296647.181000] [c0170075] do_select+0x29b/0x2f5 [4296647.181000] [c016fc31] __pollwait+0x0/0xc7 [4296647.183000] [c01702e1] core_sys_select+0x1ed/0x316 [4296647.183000] [c01704c7] sys_select+0xbd/0x18d [4296647.183000] [c010221b] sys_sigreturn+0xcf/0xde [4296647.183000] [c0102ccd] syscall_call+0x7/0xb [4296647.183000] Mem-info: [4296647.183000] DMA per-cpu: [4296647.183000] cpu 0 hot: high 0, batch 1 used:0[4296647.183000] cpu 0 cold: high 0, batch 1 used:0 [4296647.184000] cpu 1 hot: high 0, batch 1 used:0[4296647.184000] cpu 1 cold: high 0, batch 1 used:0 [4296647.184000] cpu 2 hot: high 0, batch 1 used:0 [4296647.184000] cpu 2 cold: high 0, batch 1 used:0[4296647.184000] cpu 3 hot: high 0, batch 1 used:0 [4296647.184000] cpu 3 cold: high 0, batch 1 used:0 [4296647.184000] DMA32 per-cpu: empty[4296647.184000] Normal per-cpu: [4296647.184000] cpu 0 hot: high 186, batch 31 used:96[4296647.184000] cpu 0 cold: high 62, batch 15 used:54[4296647.184000] cpu 1 hot: high 186, batch 31 used:31 [4296647.184000] cpu 1 cold: high 62, batch 15 used:52 [4296647.184000] cpu 2 hot: high 186, batch 31 used:155 [4296647.184000] cpu 2 cold: high 62, batch 15 used:47 [4296647.184000] cpu 3 hot: high 186, batch 31 used:32 [4296647.184000] cpu 3 cold: high 62, batch 15 used:7 [4296647.184000] HighMem per-cpu: [4296647.184000] cpu 0 hot: high 186, batch 31 used:145 [4296647.185000] cpu 0 cold: high 62, batch 15 used:12 [4296647.185000] cpu 1 hot: high 186, batch 31 used:14 [4296647.185000] cpu 1 cold: high 62, batch 15 used:1 [4296647.185000] cpu 2 hot: high 186, batch 31 used:185 [4296647.185000] cpu 2 cold: high 62, batch 15 used:5 [4296647.185000] cpu 3 hot: high 186, batch 31 used:14 [4296647.185000] cpu 3 cold: high 62, batch 15 used:4 [4296647.185000] Free pages:14219236kB (14211892kB HighMem) [4296647.185000] Active:2840 inactive:406695 dirty:78930 writeback:147046 unstable:0 free:3554809 slab:26149 mapped:2601 pagetables:102 [4296647.185000] DMA free:3588kB min:88kB low:108kB high:132kB active:0kB inactive:0kB present:16384kB pages_scanned:6 all_unreclaimable? no [4296647.185000] lowmem_reserve[]: 0 0 880 18416 [4296647.185000] DMA32 free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no [4296647.185000] lowmem_reserve[]: 0 0 880 18416 [4296647.185000] Normal free:3756kB min:5028kB low:6284kB high:7540kB active:604kB inactive:324kB present:901120kB pages_scanned:414 all_unreclaimable? no [4296647.186000] lowmem_reserve[]: 0 0 0 140288[4296647.186000] HighMem free:14211892kB min:512kB low:6836kB high:13164kB active:10756kB inactive:1626456kB present:17956864kB pages_scanned:0 all_unreclaimable? no [4296647.186000] lowmem_reserve[]: 0 0 0 0 [4296647.186000] DMA: 1*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3588kB [4296647.186000] DMA32: empty [4296647.186000] Normal: 1*4kB 1*8kB 0*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3756kB [4296647.186000] HighMem: 2015*4kB 3457*8kB 3245*16kB 3099*32kB 5194*64kB 5422*128kB 2960*256kB 1088*512kB 474*1024kB 116*2048kB 2676*4096kB = 14211892kB [4296647.186000] Swap cache: add 0, delete 0, find 0/0, race 0+0 [4296647.186000] Free swap = 16779884kB [4296647.186000] Total swap = 16779884kB [4296647.187000] Free swap: 16779884kB [4296647.288000] 4718592 pages of RAM [4296647.288000] 4489216 pages of HIGHMEM [4296647.289000] 562809 reserved pages[4296647.289000] 347365 pages shared [4296647.289000] 0 pages swap cached[4296647.289000] 78668 pages dirty [4296647.289000] 147126 pages writeback [4296647.289000] 2601 pages mapped[4296647.289000] 26149 pages slab [4296647.289000] 102 pages pagetables [4296647.289000] Out of Memory: Kill process 1304 (portmap) score 422 and children.[4296647.289000] Out of memory: Killed process 1304 (portmap). suggestions? So I'm running out of lowmem? will turning on HIGHPTE be enough to fix this? --pj On Jun 29, 2006, at 5:02 PM, Sunil Mushran wrote: HighFree: 11877028 kB LowFree:391020 kB HighFree: 11761892 kB
Re: [Ocfs2-users] What is wrong
Before you can mount, you have to ensure all the nodes in the cluster access the same device. #echo stats | debugfs.ocfs2 -n /dev/sdX | grep UUID should return the same uuid from all nodes. Once all nodes can see the same device, the you can mount it on all nodes. There are no passive node(s). The dlm ensures only one node updates a particular metadata block at a time. boka wrote: Hello, i have configuration made with slackware 10.2 2.6.17.2 ocfs2tools 1.2.1 on two dell poweredge 852 machines with eonstore array with two scsi controllers. Array is divided in two logical volumes. First logical drive is connected to first node as sdb1, etc. I will use linuxha software for standby cluster. Cluster software, ocfs2tools, tells that it is working. First question is ocfs2 partition should be mounted on all nodes if yes what determine the active node Second question i have mounted ocfs2 partition on node1 and node2 can not see that it is mounted. node1:~# echo slotmap | debugfs.ocfs2 -n /dev/sdb1 Slot# Node# 0 1 node2:~# echo slotmap | debugfs.ocfs2 -n /dev/sdb1 Slot# Node# Any idea Third question I can not see traffic on interconnect devices. ps. sorry for my poor english ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Resizing OCFS2 Filesystems
ocfs2-tools 1.2.2 will have the offline-extend feature. Still in testing. Karen Penman wrote: Hi All, Can anyone tell me if OCFS2 filesystems can be dynamically extended? If not, is this something that is likely to be available in the future? Thanks, Karen ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Unable to mount node2 mount.ocfs2: Transport endpoint is not connected while mounting /dev/sdb1 on /u02/oradata/orcl
Check dmesg on both nodes. The error indicates that the connect failed. Ensure the ip addresses of all nodes in /etc/ocfs2/cluster.conf are correct. Also, that the conf file is the same on all nodes. Try pinging the other node on the configured interface: # ping -I ethX node1 Akin Seigmund Walter-Johnson III wrote: I currenlty have the setup below, both nodes can see the shared drive ( confirmed with fdisk -l ) However I am unable to mount the shared device from node (2) after I mounted from node(1) I get the follwoign error mount.ocfs2: Transport endpoint is not connected while mounting /dev/sdb1 on /u02/oradata/orcl OS Red Hat uname -r - 2.6.9-22.ELsmp OCFS version OCFS2 1.2.1 Fri Apr 21 12:21:12 PDT 2006 (build bd2f25ba0af9677db3572e3ccd92f739) /sbin/lsmod |grep ocfs ocfs2 350660 1 debugfs14216 2 ocfs2 ocfs2_dlmfs27272 1 ocfs2_dlm 183816 2 ocfs2,ocfs2_dlmfs ocfs2_nodemanager 154464 7 ocfs2,ocfs2_dlmfs,ocfs2_dlm configfs 28044 2 ocfs2_nodemanager jbd59481 2 ocfs2,ext3 rpm -qa | grep ocfs ocfs2console-1.2.1-1 ocfs2-tools-1.2.1-1 ocfs2-2.6.9-22.ELsmp-1.2.1-1 ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] [Ocfs2-announce] OCFS2 1.2.2 released
ocfs2-tools 1.2.2 :) Brian Long wrote: On Fri, 2006-06-30 at 16:10 -0700, Sunil Mushran wrote: All, We are pleased to announce the release of OCFS2 1.2.2. This release includes some recent fixes, including bugzilla#723 http://oss.oracle.com/bugzilla/show_bug.cgi?id=723. (Users running 1.2.1-3 are encouraged to upgrade to 1.2.2.) With this release, OCFS2 now detects nodes having different heartbeat timeout values (O2CB_HEARTBEAT_THRESHOLD). Check dmesg after mount(s) to look for errors suggesting the same. Multipath users are encouraged to refer to the FAQ for more on this parameter. Also, new with this release are the largesmp packages for the x86-64, IA64 and PPC64 architectures. In an earlier thread, you mentioned 1.2.2 was going to support offline resize / extend. I do not see this mentioned in the FAQ or the Users Guide. Is there any documentation on this new feature? Thanks. /Brian/ ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 and Snapshots
OCFS2 relies on the uniqueness of the uuid for it to distinguish between different volumes. One cannot mount two volumes having the same uuid on the same node. Infact, one should not do that across the cluster too, i.e., mount two different physical volumes having the same identical uuid. If you have to mirror and mount, mount it on another node in a different cluster. It could be a 1 node cluster too. Andre Brinkmann wrote: Hello, I am trying to couple OCFS2 with a storage virtualization environment to use features like mirroring and snapshots. Unfortunately it seems to be impossible for ocfs2console (and for mount.ocfs2) to distinguish between the original volume and its snapshot and ocffs2 stops the mount-process with the following messages: Jul 20 17:23:54 sinalco kernel: (5028,1):ocfs2_initialize_super:1395 max_slots for this device: 4 Jul 20 17:23:54 sinalco kernel: (5028,0):ocfs2_fill_super:642 ERROR: Unable to create per-mount debugfs root. Is it possible too change the uuid and other relevant parameters? Best Regards André ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 and Snapshots
Cool. Andre Brinkmann wrote: I hope this patch is in a better diff -u -p-format :-) Patch for the Makefile === --- tunefs.ocfs2/Makefile 2006-04-21 23:40:29.0 +0200 +++ tunefs.ocfs2_new/Makefile 2006-07-21 14:29:48.0 +0200 @@ -36,6 +36,6 @@ OBJS = $(subst .c,.o,$(CFILES)) DIST_FILES = $(CFILES) tunefs.ocfs2.8.in tunefs.ocfs2: $(OBJS) $(LIBOCFS2_DEPS) $(LIBO2DLM_DEPS) $(LIBO2CB_DEPS) - $(LINK) $(LIBOCFS2_LIBS) $(LIBO2DLM_LIBS) $(LIBO2CB_LIBS) $(COM_ERR_LIBS) + $(LINK) $(LIBOCFS2_LIBS) $(UUID_LIBS) $(LIBO2DLM_LIBS) $(LIBO2CB_LIBS) $(COM_ERR_LIBS) include $(TOPDIR)/Postamble.make Patch for tunefs-c --- tunefs.ocfs2/tunefs.c 2006-04-21 23:40:29.0 +0200 +++ tunefs.ocfs2_new/tunefs.c 2006-07-21 14:25:19.0 +0200 @@ -44,6 +44,7 @@ #include inttypes.h #include ctype.h #include signal.h +#include uuid/uuid.h #include ocfs2.h #include ocfs2_fs.h @@ -70,6 +71,7 @@ typedef struct _ocfs2_tune_opts { char *progname; char *device; int verbose; +int uuid; int quiet; int prompt; time_t tune_time; @@ -84,7 +86,7 @@ static void usage(const char *progname) { fprintf(stderr, usage: %s [-N number-of-node-slots] [-L volume-label]\n - \t[-J journal-options] [-S volume-size] [-qvV] + \t[-J journal-options] [-S volume-size] [-qvuV] device\n, progname); exit(0); @@ -242,6 +244,7 @@ static void get_options(int argc, char * { quiet, 0, 0, 'q' }, { version, 0, 0, 'V' }, { journal-options, 0, 0, 'J'}, +{ uuid-reset, 0, 0, 'u'}, { volume-size, 0, 0, 'S'}, { 0, 0, 0, 0} }; @@ -254,7 +257,7 @@ static void get_options(int argc, char * opts.prompt = 1; while (1) { - c = getopt_long(argc, argv, L:N:J:S:vqVx, long_options, + c = getopt_long(argc, argv, L:N:J:S:vquVx, long_options, NULL); if (c == -1) @@ -303,6 +306,10 @@ static void get_options(int argc, char * opts.vol_size = val; break; +case 'u': +opts.uuid = 1; +break; + case 'v': opts.verbose = 1; break; @@ -471,6 +478,38 @@ static void update_volume_label(ocfs2_fi return ; } + +static void update_uuid (ocfs2_filesys *fs, int *changed) +{ +unsigned char *uuid = OCFS2_RAW_SB(fs-fs_super)-s_uuid; + size_t i, max = sizeof(OCFS2_RAW_SB(fs-fs_super)-s_uuid); +uuid_t uuid_new; + +/* print out old uuid of device */ +printf (Try to change uuid: \n); + for(i = 0; i max; i++) + printf(%02x , uuid[i]); + + printf(\n); + +/* generate new uuid */ +uuid_generate(uuid_new); + + memset (OCFS2_RAW_SB(fs-fs_super)-s_uuid, 0, OCFS2_VOL_UUID_LEN); + memcpy (OCFS2_RAW_SB(fs-fs_super)-s_uuid, uuid_new, OCFS2_VOL_UUID_LEN); + +/* print out new uuid */ +printf (New uuid: \n); + for(i = 0; i max; i++) + printf(%02x , uuid[i]); + +printf(\n); + +*changed = 1; + + return ; +} + static errcode_t update_slots(ocfs2_filesys *fs, int *changed) { errcode_t ret = 0; @@ -553,6 +592,7 @@ int main(int argc, char **argv) errcode_t ret = 0; ocfs2_filesys *fs = NULL; int upd_label = 0; +int upd_uuid = 0; int upd_slots = 0; int upd_jrnls = 0; int upd_vsize = 0; @@ -674,6 +714,10 @@ int main(int argc, char **argv) vol_size, opts.vol_size); } +/* update unique serial number of device has been selected */ +if (opts.uuid) +printf ( Change unique serial number of device \n ); + /* Abort? */ if (opts.prompt) { printf(Proceed (y/N): ); @@ -690,6 +734,13 @@ int main(int argc, char **argv) printf(Changed volume label\n); } +/* update the unique serial number */ +if (opts.uuid) { +update_uuid (fs, upd_uuid); +if (upd_uuid) +printf (Changed volume uuid \n); +} + /* update number of slots */ if (opts.num_slots) { ret = update_slots(fs, upd_slots); @@ -726,7 +777,7 @@ int main(int argc, char **argv) } /* write superblock */ - if (upd_label || upd_slots || upd_vsize) { + if (upd_label || upd_slots || upd_vsize || upd_uuid) { block_signals(SIG_BLOCK); ret = ocfs2_write_super(fs); if (ret) { Sunil Mushran wrote: Please could you send it to me again in the diff -u -p format. Andre Brinkmann wrote: Sorry, here the patch as text: For the Makefile: 39c39 $(LINK) $(LIBOCFS2_LIBS) $(LIBO2DLM_LIBS) $(LIBO2CB_LIBS) $(COM_ERR_LIBS) --- $(LINK) $(LIBOCFS2_LIBS) $(UUID_LIBS) $(LIBO2DLM_LIBS) $(LIBO2CB_LIBS) $(COM_ERR_LIBS) For tunefs.ocfs2.c: 46a47
Re: [Ocfs2-users] OCFS2: Could not start cluster stack
Check the support guide on cluster start/stop in the doc section on http://oss.oracle.com/projects/ocfs2. Vicki Luo wrote: I installed OCFS2 on RHEL4 with ocfs2-2.6.9-22.ELsmp-1.2.2-1.i686.rpm. When I start ocfs2console and click on Cluster, and then Configure Nodes, it returns a dialog with the following message Could not start cluster stack. This must be resolved before any OCFS2 filesystem can be mounted Here is some information about my system: 1. Uname -a Linux SDCHS40I030 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:32:14 EDT 2005 i686 i686 i386 GNU/Linux 2. rpm -qa|grep ocfs2 ocfs2-tools-1.2.1-1 ocfs2console-1.2.1-1 ocfs2-2.6.9-22.ELsmp-1.2.2-1 3. rpm -qa | grep kernel kernel-smp-2.6.9-5.EL kernel-utils-2.4-13.1.48 kernel-2.6.9-5.EL kernel-smp-2.6.9-22.EL I saw a few solution posted, for example: depmod -a , then try again. Or SELINUX=disabled I did both of them , but I still got the error message. Can anybody help me? What else could be wrong? Thanks, Vicki ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Private Interconnect and self fencing
Do you have a netdump server configured? If so, it'll have the details of the hb timeout. Jeffery P. Humes wrote: I have set it to 30 seconds, and the same thing still happens. (15,1):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device etherd/e0.1p1 after 3 milli seconds panic+0x3e/0x174(15,1):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions. Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing [c01233de] [f8cc826a] o2quo_disk_timeout+0x0/0x2 [ocfs2_nodemanager] [c01313f8] run_workqueue+0x7f/0xba [f8cc6b15] o2hb_write_timeout+0x0/0x65 [ocfs2_nodemanager] [c0131be5] worker_thread+0x0/0x117 [c0131ccb] worker_thread+0xe6/0x117 [c011daa9] default_wake_function+0x0/0xc [c01344fd] kthread+0x9d/0xc9 [c0134460] kthread+0x0/0xc9 [c0102005] kernel_thread_helper+0x5/0xb -JPH Sunil Mushran wrote: The 12 sec default is low. Bump it up to 30 secs or even higher. FAQ has the details. The higher you set it to, the longer the brown-out time. Jeffery P. Humes wrote: I have an OCFS2 filesystem on a coraid AOE device. It mounts fine, but with heavy I/O the server self fences claiming a write timeout: (16,2):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device etherd/e0.1p1 after 12000 milliseconds (16,2):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions. Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing It is my understanding that OCFS is expecting that the only heartbeat available to be on disk the same disk that I am writing to? Is there any way like with other clustering setups to setup a different or even multiple heartbeats? On a crossover between servers, or on a private interface? Seems like putting it only on the disk, that may have heavy IO is going to cause problems. Any advice on setting up the heartbeats would be greatly appreciated. Thanks, -JPH ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2_search_chain: Group Descriptor has bad signature
What version of ocfs2 is on the nodes? Do modinfo ocfs2 on all nodes. The version of OCFS2 shipped with SLES9 SP3 varies with kernel. Are you using the modules shipped by suse or building them yourself? Vladan Gunjic wrote: I've got a strange issue with the following configuration: Using Oracle 10gR2, having EMC CX500 with FC drives and 2 LUNs configured (one RAID5, one RAID1/0). We have 5 node ocfs2 cluster (4 nodes are SLES9 SP3 64-bit, kernel 2.6.5-7.252-smp, one node is SLES9 SP3 32-bit, 2.6.5-7.257-bigsmp). On all machines latest available OCFS2 is installed (RPMs: ocfs2console-1.2.1-4.2, ocfs2-tools-1.2.1-4.2). As we have at the moment Oracle 10gR2 on other 32-bit machines, we wanted to migrate two such machines into Oracle RAC plus using our new SAN as a storage behind. Therefore I made ocfs2 filesystems on two LUNs (from 64-bit machines) and Connect all five machines in OCFS2 cluster). - 32 bit machine is mounting both LUNs (and acting as a standby for our other existing productive Oracles unrelated to 5 machines described here). - 2 64-bit machines are mounting one of the LUNs (RAID5) and they are one of the two Oracle RACs. - 2 more 64-bit machines are mounting one of the LUNs (RAID1/0) and they are one of the two Oracle RACs. As we want to avoid big downtime for the switch, the idea is to use 32-bit standbies, convert them to 64-bit and use them under 64-bit Oracle RACs. We tested this scenario and it worked well. Now we made final layout of the SAN (more disks in LUNs, etc.) and during the standby building one of the LUNs was suddenly mounted read only and I got following in dmesg: OCFS2: ERROR (device emcpowere1): ocfs2_search_chain: Group Descriptor # 0 has bad signature File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted. (9727,3):ocfs2_claim_suballoc_bits:1157 ERROR: status = -5 (9727,3):ocfs2_claim_clusters:1392 ERROR: status = -5 (9727,3):ocfs2_local_alloc_new_window:852 ERROR: status = -5 (9727,3):ocfs2_local_alloc_slide_window:959 ERROR: status = -5 (9727,3):ocfs2_reserve_local_alloc_bits:515 ERROR: status = -5 (9727,3):ocfs2_reserve_clusters:592 ERROR: status = -5 (9727,3):ocfs2_extend_file:836 ERROR: status = -5 (9727,3):ocfs2_write_lock_maybe_extend:689 ERROR: status = -5 (9727,3):ocfs2_write_lock_maybe_extend:693 ERROR: Failed to extend inode 262690 from 0 to 512 After umounting and fsck I found a lot of errors: Checking OCFS2 filesystem in /dev/emcpowere1: label: NONE uuid: 19 a2 94 f5 91 5d 4c ca be 2f c2 51 21 65 6e 2c number of blocks: 175172744 bytes per block:4096 number of clusters: 21896593 bytes per cluster: 32768 max slots: 4 Pass 0a: Checking cluster allocation chains [CHAIN_LINK_MAGIC] Chain 85 in allocator at inode 23 contains a reference at depth 1 to block 84639744 which doesn't have a valid checksum. Truncate this chain? y [CHAIN_BITS] Chain 85 in allocator inode 23 has 64716 bits marked free out of 96768 total bits but the block groups in the chain have 206 free out of 32256 total. Fix this by updating the chain record? y [CHAIN_LINK_MAGIC] Chain 113 in allocator at inode 23 contains a reference at depth 2 to block 154570752 which doesn't have a valid checksum. Truncate this chain? y [CHAIN_BITS] Chain 113 in allocator inode 23 has 64509 bits marked free out of 96768 total bits but the block groups in the chain have 32254 free out of 64512 total. Fix this by updating the chain record? y [CHAIN_LINK_MAGIC] Chain 241 in allocator at inode 23 contains a reference at depth 0 to block 62189568 which doesn't have a valid checksum. Truncate this chain? y [CHAIN_BITS] Chain 241 in allocator inode 23 has 64510 bits marked free out of 64512 total bits but the block groups in the chain have 0 free out of 0 total. Fix this by updating the chain record? y [CHAIN_GROUP_BITS] Allocator inode 23 has 6215157 bits marked used out of 21896593 total bits but the chains have 6215152 used out of 21735313 total. Fix this by updating the inode counts? y [CHAIN_I_CLUSTERS] Allocator inode 23 has 21735313 clusters represented in its allocator chains but has an i_clusters value of 21896593. Fix this by updating i_clusters? y [CHAIN_I_SIZE] Allocator inode 23 has 21735313 clusters represented in its allocator chain which accounts for 71736384 total bytes, but its i_size is 717507559424. Fix this by updating i_size? y [GROUP_EXPECTED_DESC] Block 62189568 should be a group descriptor for the bitmap chain allocator but it wasn't found in any chains. Reinitialize it as a group desc and link it into the bitmap allocator? y [GROUP_EXPECTED_DESC] Block 84639744 should be a group descriptor for the bitmap chain allocator but it wasn't found in any chains. Reinitialize it as a group desc and link it into the bitmap allocator? y [GROUP_EXPECTED_DESC] Block 124895232 should be a group descriptor for the bitmap chain allocator but it wasn't found in
Re: [Ocfs2-users] Question
Just create a one node cluster. However, if you were to mount two mirrored volumes on the same node, you will have problems as detailed in this thread: http://oss.oracle.com/pipermail/ocfs2-users/2006-July/000630.html Thanks to Andre, the next drop of ocfs2-tools will have a fix for this (ability to change the uuid). J Angel Villegas wrote: Hi everybody, I am new in the ocfs2 technology, I have a cluster installed, and I have a EMC clones for backup propose, Now I want to mount the disk cloned by EMC in another machine ( not in the cluster ) I need to make another cluster for this porpose? What are the best practices to mount the disks (ocfs2 FileSystems) in another machine? It is possible? I think that the disk have the same information of the original disk with the ocfs2 FS created.. Regards, ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] re: question on adding a node to RAC cluster and o2cb
When you added the new node using ocfs2console, did it show up in: # ls /config/cluster/clustername/node/ I am assuming that it was added in /etc/ocfs2/cluster.conf. Yes, the docs does not cover this as of now. I will update the FAQ/user's guide with the info. Peter Santos wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Folks, I'm trying to find information about how to dynamically add a 2nd node to a 1 node RAC cluster. I'm posting this only after not getting the details from my oracle tar via metalink. My installation is Suse Enterprise 9 x86_64 (kernel 267). Installing the single node was not a problem, what is not clear is how to prepare the cluster.conf file and the ocr stuff to add a 2nd or additional node. Obviously the 2nd node has to have all the ip configurations in place and ssh has to be working, but at some point, the /etc/ocfs2/cluster.conf file has to be modified and propagated and the ocfs2 mount point has to be mounted on the additional nodes ..this is where we had problems. Here is what we did. 1. setup the 2nd node with all the proper network configuration, and ssh equivalence. 2. we added a 2nd node to cluster.conf via ocfs2console and propagated that to the new node. 3. We tried to mount the ocfs2 mount point, but could not .. it said something like transpoint end point not found 4. We then restarted the cluster on node1 and were able to mount the ocfs2 mount point and go on to add the 2nd node. We are trying to identify the sequence of actions/procedures to add a 2nd node at the o2cb/ocfs2 level. Oracle support didn't have this level of detail, so I'm hoping someone knows how to do this without shutting down the cluster on node1 thanks - -peter -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFEz2gCoyy5QBCjoT0RAu6cAJ9C2oRLQUD437fuRF9DSuI9zZb3VgCePP9Y mBoOxNLILnKGo5z0qQtvU3o= =t1Zv -END PGP SIGNATURE- ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] re: question on adding a node to RAC cluster and o2cb
The real error was the one you got when you were not able to add the new node in node1. It is an ocfs2console problem. That it did not work when you added the node in node2 and propagated, is explainable. When you get the third node, do the following: 1. On the existing two nodes, add the new node by hand by executing this (on both). # o2cb_ctl -C -i -n NODENAME -t node -a number=NODENUM -a ip_address=IPADDR -a ip_port= -a cluster=CLUSTERNAME 2. By doing so, you are not only adding the node in /etc/ocfs2/cluster.conf but also activating it (/config/cluster/CLUSTERNAME/node). 3. Either Propagate or hand copy the cluster.conf to the new node. 4. Start the cluster on the new node and then mount. Peter Santos wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 I don't know what the entries looked like in /config/cluster/clustername/node/ when we tried this. Now it does show both nodes... but we have since restarted the entire cluster in order to get this to work. We are waiting to get another new machine to try it again. What I do remember is that initially we started up the ocfs2console from node1 and clicked add to add a 2nd node and the tool complained ( I can't remember the exact error message now). Then we tried to run ocfs2console from the new/2nd node and added both node1 and node2 to the configuration Then we clicked propagate .. this worked without any error messages, but we were not able to mount the ocfs2 filesystem on node2 until we restarted the cluster on node1. (transport endpoint errors..) We will definitely try again on a 3rd node, I'm just not clear on what the sequence of events should be. thanks peter Sunil Mushran wrote: When you added the new node using ocfs2console, did it show up in: # ls /config/cluster/clustername/node/ I am assuming that it was added in /etc/ocfs2/cluster.conf. Yes, the docs does not cover this as of now. I will update the FAQ/user's guide with the info. Peter Santos wrote: Folks, I'm trying to find information about how to dynamically add a 2nd node to a 1 node RAC cluster. I'm posting this only after not getting the details from my oracle tar via metalink. My installation is Suse Enterprise 9 x86_64 (kernel 267). Installing the single node was not a problem, what is not clear is how to prepare the cluster.conf file and the ocr stuff to add a 2nd or additional node. Obviously the 2nd node has to have all the ip configurations in place and ssh has to be working, but at some point, the /etc/ocfs2/cluster.conf file has to be modified and propagated and the ocfs2 mount point has to be mounted on the additional nodes ..this is where we had problems. Here is what we did. 1. setup the 2nd node with all the proper network configuration, and ssh equivalence. 2. we added a 2nd node to cluster.conf via ocfs2console and propagated that to the new node. 3. We tried to mount the ocfs2 mount point, but could not .. it said something like transpoint end point not found 4. We then restarted the cluster on node1 and were able to mount the ocfs2 mount point and go on to add the 2nd node. We are trying to identify the sequence of actions/procedures to add a 2nd node at the o2cb/ocfs2 level. Oracle support didn't have this level of detail, so I'm hoping someone knows how to do this without shutting down the cluster on node1 thanks -peter ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFEz6EBoyy5QBCjoT0RAo89AJ9QoGYnyEcjJtjDTmOgdnPdiJqS+ACgkZEV p58c7/3nlVoJ2Gk2FnzOTyc= =KCxu -END PGP SIGNATURE- ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: AW: [Ocfs2-users] ocfs2_search_chain: Group Descriptor has bad signature
The ocfs2 version should be the same on all the nodes. Mixing nodes with 1.1.8 and 1.2.1 will cause problems. We had fixed a lot of issues in 1.2.1. I'll write more when I reread your prev email. Vladan Gunjic wrote: I'm using ocfs2 and all modules from Suse (SLES9), no self compilations. Here are the details: * 32-bit machine (writing to ocfs2 partition/LUN and where the corruption was reported): Kernel: 2.6.5-7.257-bigsmp #1 SMP i686 i386 GNU/Linux OCFS2 rpms: ocfs2console-1.2.1-4.2 ocfs2-tools-1.2.1-4.2 o2cb_ctl -V:o2cb_ctl version 1.2.1 /etc/init.d/o2cb status: Module configfs: Loaded Filesystem configfs: Mounted Module ocfs2_nodemanager: Loaded Module ocfs2_dlm: Loaded Module ocfs2_dlmfs: Loaded Filesystem ocfs2_dlmfs: Mounted Checking cluster dbrac: Online Checking heartbeat: Active /etc/init.d/ocfs2 status: Configured OCFS2 mountpoints: /mnt/emcpowera1 mnt/emcpowere1 Active OCFS2 mountpoints: /mnt/emcpowera1 /mnt/emcpowere1 * 2 identical 64-bit machines (that are supposed to use the data after 32-64 bit conversion): Kernel: 2.6.5-7.257-smp #1 SMP x86_64 GNU/Linux OCFS2 rpms: ocfs2console-1.2.1-4.2 ocfs2-tools-1.2.1-4.2 o2cb_ctl -V:o2cb_ctl version 1.2.1 /etc/init.d/o2cb status: Module configfs: Loaded Filesystem configfs: Mounted Module ocfs2_nodemanager: Loaded Module ocfs2_dlm: Loaded Module ocfs2_dlmfs: Loaded Filesystem ocfs2_dlmfs: Mounted Checking cluster dbrac: Online Checking heartbeat: Active /etc/init.d/ocfs2 status: Configured OCFS2 mountpoints: /mnt/emcpowerd1 Active OCFS2 mountpoints: /mnt/emcpowerd1 (other 2 64-bit machines have other LUN from 32-bit machine mounted) modinfo on all 5 machines: 1. (32-bit) license:GPL author: Oracle version:1.2.1-SLES AC2C92855997647E2A862F0 description:OCFS2 1.2.1-SLES Thu Apr 20 18:03:18 PDT 2006 (build sles) depends:ocfs2_nodemanager,ocfs2_dlm,jbd supported: yes vermagic: 2.6.5-7.257-bigsmp SMP PENTIUMII REGPARM gcc-3.3 == next 2 machines are mounting the LUN that was corrupted (will be one Oracle RAC): 2. (64-bit) license:GPL author: Oracle version:1.2.1-SLES AC2C92855997647E2A862F0 description:OCFS2 1.2.1-SLES Thu Apr 20 18:03:18 PDT 2006 (build sles) depends:ocfs2_nodemanager,ocfs2_dlm,jbd supported: yes vermagic: 2.6.5-7.257-smp SMP gcc-3.3 3. (64-bit) license:GPL author: Oracle version:1.2.1-SLES AC2C92855997647E2A862F0 description:OCFS2 1.2.1-SLES Thu Apr 20 18:03:18 PDT 2006 (build sles) depends:ocfs2_nodemanager,ocfs2_dlm,jbd supported: yes vermagic: 2.6.5-7.257-smp SMP gcc-3.3 == next 2 machines are mounting the LUN that was NOT corrupted (will be another Oracle RAC): 4. (64-bit) license:GPL author: Oracle version:1.1.8-SLES E9BF6AA66857FAE88EF441B description:OCFS2 1.1.8-SLES Tue Dec 13 18:20:37 PST 2005 (build sles) depends:ocfs2_nodemanager,ocfs2_dlm,jbd supported: yes vermagic: 2.6.5-7.252-smp SMP gcc-3.3 5. (64-bit) license:GPL author: Oracle version:1.1.8-SLES E9BF6AA66857FAE88EF441B description:OCFS2 1.1.8-SLES Tue Dec 13 18:20:37 PST 2005 (build sles) depends:ocfs2_nodemanager,ocfs2_dlm,jbd supported: yes vermagic: 2.6.5-7.252-smp SMP gcc-3.3 Additionally I noticed last night, when I was shortly disabling the complete network of all of those machines that after restoring the network, the last two machines (older ocfs2 version) were confused and didn't rejoin the cluster before the system reboot. So, I guess first step is to update last two on ocfs2 version 1.2.1 ? Although they were not directly involved in corruption, maybe indirect ? Thanks, Vladan -Ursprüngliche Nachricht- Von: Sunil Mushran [mailto:[EMAIL PROTECTED] Gesendet: Dienstag, 1. August 2006 04:29 An: Vladan Gunjic Cc: ocfs2-users@oss.oracle.com Betreff: Re: [Ocfs2-users] ocfs2_search_chain: Group Descriptor has bad signature What version of ocfs2 is on the nodes? Do modinfo ocfs2 on all nodes. The version of OCFS2 shipped with SLES9 SP3 varies with kernel. Are you using the modules shipped by suse or building them yourself? ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] o2net: connect to node has been idle for 10 secs
1. o2net talks tcp. It should be able to handle this. 2. If the cluster is active and the nodes are communicating, the keepalive packet is rarely sent. It only sends the packet if it does not hear from the other node for 5 secs. 3. Try the same with 1.2.3. (We made 2 important 1 line fixes.) 4. If this does happen again, and you are interested, we could always give you a drop that dumps the stack of all the procs, to get a better feel for the situation. Andy Phillips wrote: Hello, Apologies for following up on myself. in ocfs2/cluster/tcp_internal.h #define O2NET_KEEPALIVE_DELAY_SECS 5 #define O2NET_IDLE_TIMEOUT_SECS 10 Is this really sensible? Potentially, given small variance in system clocks losing one keepalive packet (assuming that o2net_sc_send_keep_req is the only thing keeping the connection alive) the loss of one packet could cause a node to self fence and reboot. Would #define O2NET_KEEPALIVE_DELAY_SECS 5 #define O2NET_IDLE_TIMEOUT_SECS 20 Cause any problems? Andy On Thu, 2006-08-03 at 12:41 +0100, Andy Phillips wrote: Hello, I've a two node 10gR2 rac cluster on a pair of sun opteron boxes. Redhat AS 4.3 2.6.9-34.0.1.ELsmp x86_64. ocfs 1.2.2. RAC is using ASM to talk to the data files, but we have 3 ocfs2 filesystems up to share dba files, and the usual bits and bobs. Things were fine until, on mostly idle system, this happened out of the blue; Aug 2 19:06:27 fred kernel: o2net: connection to node barney (num 0) at 172.16.6.10: has been idle for 10 seconds, shutting it down. Aug 2 19:06:27 fred kernel: (0,7):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1154545576.798263 now 1154545586.796978 dr 1154545576.798238 adv 1154545576.798291:1154545576.798293 func (06aac8a1:1) 1154545566.800782:1154545566.800787) Aug 2 19:06:27 fred kernel: o2net: no longer connected to node barney (num 0) at 172.16.6.10: Aug 2 19:08:33 fred kernel: (25,7):o2quo_make_decision:143 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0 Aug 2 19:08:33 fred kernel: (25,7):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all active regions. And the node then halted. Barney is node 0. The systems were idle. We've hammered the ocfs2 file systems, and set o2cb_heartbeat_threshold to 61. All is good and stable under heavy i/o. The interconnect is a bonded interface, with two gig cards, each connected (with flow control on) to two separate FESX424 switches. The switches dont register any problems at this time, nor does linux register any interface issues. I'm looking at the source code at the moment, but nothing is leaping out at me. Any ideas - Do the timer debug lines above mean anything to anyone. Thanks Andy ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Re: Problems with OCFS2 and Oracle 10g
ocfs2 requires a shared disk. As in, all nodes must be able to concurrently read/write to the device. sorapak Last wrote: Yes. my disk is an IDE. Would it cause the problems? Thanks Sorapak ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] o2net: connect to node has been idle for 10 secs
Alexei_Roudnev wrote: In my case, after spending few days, I find that my HugeTLB setting (in Oracle) caused long kernel loop and it forced OCFSv2 to reboot because of losing connection. I am keen to hear more about this. Please could you elaborate. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Installing ocfs2-tools from source?
Do, make rpm instead. Change Copyright to License in the spec file and do make rpm. I built the following for fc5/x86. http://oss.oracle.com/~smushran/.fc5-rpms/ Eric Adair wrote: building on fedora core 5, kernel 2.6.16.-1.2133.FC5smp Everything builds fine, but I can't find a means to make install. Obviously, I'm noob-ing up the place here. What am I missing? ( ocfs2-tools-1.2.1 tarball is being used). -Eric / / /***This message has been dgitally signed by Thawte Certificate Authority*/ ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS- EMC Issue
# cd /tmp # wget http://oss.oracle.com/~smushran/.debug/stat_sysdir.sh # ./stat_sysdir -d sdX sys.out Email me the output. amit pansare wrote: I’ve an issue related to Oracle 10g RAC. I’ve 2 node cluster each being Dell 2850 Server with RHEL 4.0 I’ve EMC CX300 SAN storage with following partitions /orasoft 10 Gb OCFS2 File system /oracrs 2 Gb OCFS2 File system /orabackup 100 Gb OCFS2 File system The datafiles are on ASM which is not directly visible in OS. I’ve common Oracle Home installed in /orasoft/db_1 which is shared by both nodes in cluster. I’ve faced an issue recently related to EMC storage. The /orasoft partition displays 1.4 Gb space available using df command. when ever I try to create a file on this partition I get an error as No Space left on device. I’m unable to start any service with the same reason. while i am able to use this partition from another node in cluster. Can anyone help me with this storage issue ?? Regards , Amit Pansare DBA Net Magic Solutions ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] cfq scheduler?
U4 has the fix. We've tested U2 (and U3) + fix internally already. So we don't feel the need to rerun the test for the same again. Brian Long wrote: Has anyone at Oracle tested the RHEL 4.4 beta or GA kernel to verify the cfq scheduler is fixed wrt. OCFS2? Or will that testing only begin now that U4 is GA? http://oss.oracle.com/bugzilla/show_bug.cgi?id=671 Thanks for any info. /Brian/ ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] re: Process to change cluster.conf IPS ?
# o2cb_ctl -H -t node -n node_name -a ip_address=NEW_IP_ADD o2cb_ctl: Node changes not yet supported The man page is missing -t node but that will still not help you. Currently o2cb_ctl only allows dynamic adding of new nodes not updating existing nodes. So, edit /etc/ocfs2/cluster.conf, and change the ip address. Copy it to all nodes before restarting the cluster on all nodes. # cat /config/cluster/clus/node/nodename/ip_address should show the updated value after the cluster is up. Peter Santos wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Sunil, The link you pointed me to, just says to stop the cluster, modify the cluster.conf and re-start the cluster. Is that all that needs to be done? The reason I ask is that I found this url http://manpage.willempen.org/8/o2cb_ctl with the man pages for o2cb_ctl and it says that you can do this o2cb_ctl -H -n node_name -a ip_address=NEW_IP_ADDRESS. However, everytime I tried it, I kept getting the error invalid attribute. Can you please just confirm the proper way? I just want to be sure that editing the cluster.conf file is enough to update all the proper locations where that IP may exist. - -peter Sunil Mushran wrote: http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#CONFIGURE Peter Santos wrote: Folks, I have a simple 2 node 10gR2 RAC cluster. Each node has a public/private and virtual IP. We moved the network to a different subnet and now I need to figure out how to make the changes visible to ocfs2 and it's services including making the changes to cluster.conf. I suspect that simply changing the IP addresses in cluster.conf is not enough? My cluster.conf files have the nodes physical ip address. I'm not even sure if they should have the private interconnect IP or not, but that is also a different issue. Can someone point me to the correct procedure? TIA -peter ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFE4yR1oyy5QBCjoT0RAqxSAJ4jQyDkWzHSoTDCuLVxd9Kn8mU+ewCggk2O 1XtMV3qfhvanqGHwvVFuUck= =jteH -END PGP SIGNATURE- ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 over DRBDv8
As far as ocfs2 is concerned, bio_add_page() is failing. The one thing that springs to mind is that o2hb sets bio-bi_sector to 512 bytes and not the block size. Kilian CAVALOTTI wrote: Hi all, I'm new to OCFS2, but not so new to DRBD. I'd like to use the new primary/primary feature of DRBDv8 to create a shared storage space and concurrently access it from multiple clients, using OCFS2. I configured two hosts with DRBD, allowed two primaries, and successfully made each partition primary. # cat /proc/drbd version: 8.0pre4 (api:84/proto:82) SVN Revision: 2375M build by [EMAIL PROTECTED], 2006-08-17 15:54:17 0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate r--- ns:0 nr:1398278 dw:1398278 dr:98 al:0 bm:1895 lo:0 pe:0 ua:0 ap:0 resync: used:0/7 hits:86007 misses:1381 starving:0 dirty:0 changed:1381 act_log: used:0/257 hits:0 misses:0 starving:0 dirty:0 changed:0 1: cs:Unconfigured I tried to format the volume with a traditionnal filesystem, and successfully mounted it on both nodes. I then tried with ocfs2. On the first node, mkfs and mount went without a hitch, but on the second one, I systematically get an error when I try to do anything on the volume (fsck'ing, starting ocfs2-heartbeat, mounting, etc.). dmesg shows the following, drbd0: role( Secondary - Primary ) drbd0: Writing meta data super block now. (6672,0):o2hb_setup_one_bio:290 ERROR: Error adding page to bio i = 1, vec_len = 4096, len = 0 , start = 0 (6672,0):o2hb_read_slots:385 ERROR: status = -5 (6672,0):o2hb_populate_slot_data:1279 ERROR: status = -5 (6672,0):o2hb_region_dev_write:1379 ERROR: status = -5 It seems that the heartbeat process can't write to the device, for an unknown reason: open(/sys/kernel/config/cluster, O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 4 fstat(4, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 getdents64(4, /* 3 entries */, 4096)= 88 getdents64(4, /* 0 entries */, 4096)= 0 close(4)= 0 mkdir(/sys/kernel/config/cluster/ocfs2_cluster/heartbeat/D6F76726AFE4472CBF0650A1FEF09135, 0755) = 0 open(/sys/kernel/config/cluster/ocfs2_cluster/heartbeat/D6F76726AFE4472CBF0650A1FEF09135/block_bytes, O_WRONLY) = 4 write(4, 512, 3) = 3 close(4)= 0 open(/sys/kernel/config/cluster/ocfs2_cluster/heartbeat/D6F76726AFE4472CBF0650A1FEF09135/start_block, O_WRONLY) = 4 write(4, 2176, 4) = 4 close(4)= 0 open(/sys/kernel/config/cluster/ocfs2_cluster/heartbeat/D6F76726AFE4472CBF0650A1FEF09135/blocks, O_WRONLY) = 4 write(4, 255, 3) = 3 close(4)= 0 open(/dev/drbd0, O_RDWR) = 4 open(/sys/kernel/config/cluster/ocfs2_cluster/heartbeat/D6F76726AFE4472CBF0650A1FEF09135/dev, O_WRONLY) = 5 write(5, 4, 1)= -1 EIO (Input/output error) close(5)= 0 close(4)= 0 rmdir(/sys/kernel/config/cluster/ocfs2_cluster/heartbeat/D6F76726AFE4472CBF0650A1FEF09135) = 0 semop(0, 0x7fff930bfe30, 1) = 0 close(3)= 0 write(2, mkfs.ocfs2, 10mkfs.ocfs2) = 10 write(2, : , 2: ) = 2 write(2, I/O error on channel, 20I/O error on channel)= 20 write(2, , 1 )= 1 write(2, while initializing the dlm, 26while initializing the dlm) = 26 write(2, \r\n, 2 I can't figure if it's a DRBD- or a OCFS2-related issue, and I'd take any enlightenment with gratitude. BTW, I use amd64, debian-provided 2.6.17 kernel, drbd8-module-source 8.0pre4-1 (I tried SVN trunk too), and ocfs2-tools 1.2.1-1. Thanks in advance, ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Wrong dm device used
Well, mounted.ocfs2 is dumb... as in, it just scans /proc/partitions. We have to teach it new tricks. :) Fabio Corazza wrote: Hi there, I've just setup an EVMS cluster with Heartbeat 2.0.7 and OCFS2. Everything seems to be working fine except this: [EMAIL PROTECTED] photos]# mounted.ocfs2 -d DeviceFS UUID Label /dev/dm-6 ocfs2 c1a56afe-3d4b-4b88-919c-b9454b1ec708 cache /dev/dm-7 ocfs2 c1a56afe-3d4b-4b88-919c-b9454b1ec708 cache /dev/dm-8 ocfs2 0663bfeb-60ad-400a-8c1a-61156772eebc photos /dev/dm-14ocfs2 e2533760-1c3f-4f7a-886f-8769e73f1088 photos /dev/dm-15ocfs2 e2533760-1c3f-4f7a-886f-8769e73f1088 photos [EMAIL PROTECTED] photos]# mounted.ocfs2 -f DeviceFS Nodes /dev/dm-6 ocfs2 mybbook-as01, mybbook-as02 /dev/dm-7 ocfs2 mybbook-as01, mybbook-as02 /dev/dm-8 ocfs2 Unknown: OCFS2 directory corrupted /dev/dm-14ocfs2 mybbook-as01, mybbook-as02 /dev/dm-15ocfs2 mybbook-as01, mybbook-as02 [EMAIL PROTECTED] photos]# The same in the other node. I tried to reboot, to run dmsetup delete_all restart evms... nothing happens. That dm-8 still hangs there. Everything else is fine... what could it be? The filesystems seem to work correctly. Regards, ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Wrong dm device used
If you are dealing with creating/removing lots of small files, a large journal will help. Currently there is no way other than trial and error. We'll look into making this easier but right now there is no other way. Pick the largest value of all the subcomponents for the hb timeout. Fabio Corazza wrote: Sorry for my laziness, I just had a read at the mkfs.ocfs2 man page and had answered to some questions by myself. If you can still give me some hints about the block-size and cluster-size values about the filesystem that I'm going to create, I'd appreciated it. Also, I'm a little bit curious about the journal size, how and why it should be tuned. I'd also have another question... reading the faq it's stated that I should set the O2CB_HEARTBEAT_THRESHOLD to a value calculated through a specific formula over the I/O layer timeout. Where can I look to obtain such value? I'm using the iSCSI Linux initiator with the parameter ConnFailTimeout=180, don't know if this has something to do with the I/O layer timeout. Also, I'm using multipath-tools. Thanks, Fabio Fabio Corazza wrote: OK, so basically the filesystem keeps on relying on EVMS devices even if ocfs2console or ocfs2tools will be detecting other devices. Please confirm me that this is correct. Also, relating to the options I'm given during the creation of an ocfs2 volume, which options do you suggest for a volume that _only_ stores a LOT of small files (images, maximum size for each will be 3MB) and a lot of directories. Actually, I will have 2 nodes on r/w and a third node that will just read (is the backup server). [-b block-size] [-C cluster-size] [-N number-of-node-slots] [-T filesystem-type] [-L volume-label] [-J journal-options] [-HFqvV] device [blocks-count] Basically: block-size, cluster-size. Also, what number-of-node-slots mean? The maximum number of nodes the filesystem can be accessed from? I've seen that this defaults to 4, can this be expanded after the filesystem creation or has to be prevented on time? Also, what about journal-options? Thanks for your attention, highly appreciated. Fabio Sunil Mushran wrote: Well, mounted.ocfs2 is dumb... as in, it just scans /proc/partitions. We have to teach it new tricks. :) Fabio Corazza wrote: Hi there, I've just setup an EVMS cluster with Heartbeat 2.0.7 and OCFS2. Everything seems to be working fine except this: [EMAIL PROTECTED] photos]# mounted.ocfs2 -d DeviceFS UUID Label /dev/dm-6 ocfs2 c1a56afe-3d4b-4b88-919c-b9454b1ec708 cache /dev/dm-7 ocfs2 c1a56afe-3d4b-4b88-919c-b9454b1ec708 cache /dev/dm-8 ocfs2 0663bfeb-60ad-400a-8c1a-61156772eebc photos /dev/dm-14ocfs2 e2533760-1c3f-4f7a-886f-8769e73f1088 photos /dev/dm-15ocfs2 e2533760-1c3f-4f7a-886f-8769e73f1088 photos [EMAIL PROTECTED] photos]# mounted.ocfs2 -f DeviceFS Nodes /dev/dm-6 ocfs2 mybbook-as01, mybbook-as02 /dev/dm-7 ocfs2 mybbook-as01, mybbook-as02 /dev/dm-8 ocfs2 Unknown: OCFS2 directory corrupted /dev/dm-14ocfs2 mybbook-as01, mybbook-as02 /dev/dm-15ocfs2 mybbook-as01, mybbook-as02 [EMAIL PROTECTED] photos]# The same in the other node. I tried to reboot, to run dmsetup delete_all restart evms... nothing happens. That dm-8 still hangs there. Everything else is fine... what could it be? The filesystems seem to work correctly. Regards, ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] self fencing and system panicproblem afterforced reboot
Yes, we are working on it. :) Alexei_Roudnev wrote: It's all about the same - need 'single node' mounting mode on OCFSv2, so that sysadmin be able to mount it with any media errors and without working cluster. (Of course, such mount should show many warnings before going thru). - Original Message - From: Holger Brueckner [EMAIL PROTECTED] To: Sunil Mushran [EMAIL PROTECTED] Cc: ocfs2-users@oss.oracle.com Sent: Friday, September 15, 2006 1:20 AM Subject: Re: [Ocfs2-users] self fencing and system panicproblem afterforced reboot i guess i found the solution. while dumping some files with debugfs, it suddenly stopped working and could not be killed. and guess what, media error on the drive :-/. funny that a filesystem check succeeds. anyway thx a lot to those who responded. holger On Thu, 2006-09-14 at 11:03 -0700, Sunil Mushran wrote: Not sure why a power outage should cause this. Do you have the full stack of the oops? It will show the times taken in the last 24 operations in the hb thread. That should tell us as to what is up. Holger Brueckner wrote: i just discovered the ls, cd, dump and rdump commands in debugfs.ocfs2. they work fine :-). neverless i would really like to know why mounting and accessing the volume is not possible anymore. but thanks for the hint pieter holger brueckner On Thu, 2006-09-14 at 14:30 +0200, Pieter Viljoen - MWEB wrote: Hi Holger Maybe you should try the fscat tools (http://oss.oracle.com/projects/fscat/) - which has a fsls (to list) and fscp (to copy) directly from the device. I have not tried it yet, so good luck! Pieter Viljoen -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Holger Brueckner Sent: Thursday, September 14, 2006 14:17 To: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] self fencing and system panic problem afterforced reboot side note: setting HEARBEAT_THRESHOLD to 30 did not help either. could it be that the syncronization between the daemons does not work? (e.g daemons think fs is mounted on some nodes and try to synchonize but actually the fs isn't mounted on any node?) i'm rather clueless now. finding a way to access the data and copy it to the non shared partitions would help me a lot. thx holger brueckner On Thu, 2006-09-14 at 13:47 +0200, Holger Brueckner wrote: X-CS-3-Report: plain hello, i'm running ocfs2 to provide a shared disk thoughout a xen cluster. this setup was working fine until today where there was an power outage and all xen nodes where forcefully shut down. whenever i try to mount/access the ocfs2 partition the system panics and reboots: darks:~# fsck.ocfs2 -y -f /dev/sda4 (617,0):__dlm_print_nodes:377 Nodes in my domain (5BA3969FC2714FFEAD66033486242B58): (617,0):__dlm_print_nodes:381 node 0 Checking OCFS2 filesystem in /dev/sda4: label: NONE uuid: 5b a3 96 9f c2 71 4f fe ad 66 03 34 86 24 2b 58 number of blocks: 35983584 bytes per block:4096 number of clusters: 4497948 bytes per cluster: 32768 max slots: 4 /dev/sda4 was run with -f, check forced. Pass 0a: Checking cluster allocation chains Pass 0b: Checking inode allocation chains Pass 0c: Checking extent block allocation chains Pass 1: Checking inodes and blocks. [CLUSTER_ALLOC_BIT] Cluster 295771 is marked in the global cluster bitmap but it isn't in use. Clear its bit in the bitmap? y [CLUSTER_ALLOC_BIT] Cluster 2456870 is marked in the global cluster bitmap but it isn't in use. Clear its bit in the bitmap? y [CLUSTER_ALLOC_BIT] Cluster 2683096 is marked in the global cluster bitmap but it isn't in use. Clear its bit in the bitmap? y Pass 2: Checking directory entries. Pass 3: Checking directory connectivity. Pass 4a: checking for orphaned inodes Pass 4b: Checking inodes link counts. All passes succeeded. darks:~# mount /data (622,0):ocfs2_initialize_super:1326 max_slots for this device: 4 (622,0):ocfs2_fill_local_node_info:1019 I am node 0 (622,0):__dlm_print_nodes:377 Nodes in my domain (5BA3969FC2714FFEAD66033486242B58): (622,0):__dlm_print_nodes:381 node 0 (622,0):ocfs2_find_slot:261 slot 2 is already allocated to this node! (622,0):ocfs2_find_slot:267 taking node slot 2 (622,0):ocfs2_check_volume:1586 File system was not unmounted cleanly, recovering volume. kjournald starting. Commit interval 5 seconds ocfs2: Mounting device (8,4) on (node 0, slot 2) with ordered data mode. (630,0):ocfs2_replay_journal:1181 Recovering node 2 from slot 0 on device (8,4) darks:~# (4,0):o2hb_write_timeout:164 ERROR: Heartbeat write timeout to device sda4 after 12000 milliseconds (4,0):o2hb_stop_all_regions:1789 ERROR: stopping heartbeat on all active regions
Re: [Ocfs2-users] ocfs2 - disk usage inconsistencies
Another node or that node itself. As far as the filesize goes, ls -l does not give the ondisk size. Do stat inodenum on the unlinked files and see the Clusters. Matthew Flusche wrote: There has been a lot of file system activity recently. I have files in orphan_dir: and orphan_dir:0002. But that doesn't seem to account for the 17 GB missing. The truncate logs seem clean. So having files in orphan_dir: is telling me that the node in slot 0 deleted files and another node(s) still has the file open, correct? Matt debugfs: ls -l //orphan_dir: 16 drwxr-xr-x 13 0 0 774144 10-Sep-2006 00:08 . 10 drwxr-xr-x 6 0 04096 2-May-2006 16:11 .. 3052182 drwxrwxrwx 0 501 5004096 19-Jul-2006 17:50 002e9296 8234094 drwxrwxrwx 0 501 5004096 19-Jul-2006 17:50 007da46e 13063783drwxrwxrwx 0 501 5004096 19-Jul-2006 17:50 00c75667 7869995 drwxrwxrwx 0 501 5004096 22-Aug-2006 13:27 0078162b 3741473 drwxrwxrwx 0 501 5004096 22-Aug-2006 13:29 00391721 3351057 drwxrwxrwx 0 501 5004096 19-Jul-2006 17:50 00332211 7842503 drwxrwxrwx 0 501 5004096 19-Jul-2006 17:50 0077aac7 2056493 drwxrwxrwx 0 501 5004096 5-Sep-2006 08:53 001f612d 7861894 drwxrwxrwx 0 501 5004096 5-Sep-2006 08:53 0077f686 1487817 drwxrwxrwx 0 501 5004096 5-Sep-2006 08:53 0016b3c9 1702439 drwxrwxrwx 0 501 5004096 5-Sep-2006 08:53 0019fa27 debugfs: ls -l //orphan_dir:0002 18 drwxr-xr-x 2 0 0 94208 5-Jul-2006 17:13 . 10 drwxr-xr-x 6 0 04096 2-May-2006 16:11 .. 4301446 -rw-r--r-- 0 503 500 0 12-Aug-2006 10:40 0041a286 -Original Message- From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: Wednesday, September 20, 2006 12:32 PM To: Matthew Flusche Cc: Ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] ocfs2 - disk usage inconsistencies Did you remove some large files recently? If so, check the orphan_dir and truncate_log for all the slots. 1. Start debugfs: # debugfs.ocfs2 /dev/sdX 2. List system directory: ls -l // 3. List files in all orphan_dir(s): ls -l //orphan_dir: If there are files, means some process in the cluster is still using that file. 4. stat all trancate_log(s): stat //truncate_log: I will be surprised if you see any bits here. If there are, do sync;sync;sync; on the appropriate node. 5. You can find the appropriate node by dumping the slotmap: slotmap Find the slot-to-nodenum mapping. Do the sync on that node. For this and more, refer to the on-disk format support guide. http://oss.oracle.com/projects/ocfs2/dist/documentation/03-disk_format.p df Matthew Flusche wrote: Hi all. I have a 50 GB OCFS2 file system. I'm currently using ~26GB of space but df is reporting 43 GB used. Any ideas how to find out where the missing 17GB is at? The file system was formatted with a 16K cluster 4K block size. Thanks, Matt ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Use of OCFS2 file systems.
Yes. Bill Wells wrote: All, Can someone comment on whether it is recommended to use the OCFS2 file system for the admin directories of a RAC database. Specifically, for bdump, udump, cdump, etc. This is being considered on RHEL4-U4 with 10gR2 on a 3 node cluster. Thanks much, Bill Wells -- Oracle logo *Bill Wells*, OCP 7,8i,9i,10g /Principal Service Delivery Engineer Advanced Customer Services - GEH/ Cell# (919) 624-6300 Office# (919) 846-8426 email: [EMAIL PROTECTED] The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Re: FW: Use of OCFS2 file systems.
File a bug on bugzilla (oss.oracle.com/bugzilla) with the full oops trace and any other information that seems relevant. Galan Merchan, Martin wrote: Hello, I’m working with OCFS2 on Radhat Advanced Server 4 Patch 3 and I had kernel panics too. I use OCFS2 only for RAC archive logs and RMAN backups. Well, I’m testing one solution and seems to be fine: In /etc/ocfs2/cluster.conf I have replaced the public IPs by the heartbeat IPs (parameter ip_address), but keeping the names. Is there anyone that knows this solution and have tested it with fails? Regards from Spain, *_MARTÍN_* ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Resizing mountpoint in ocfs2
Yes, the last patch to add this feature is in review. We will release this as part of ocfs2-tools 1.2.2. Kerr-Sheppard, Stephen wrote: Has anyone had to resize a mountpoint in ocfs2. In ocfs version 1 it was a case of unmounting and using the resizeocfs command. Is this still the same for ocfs2?? Thanks Stephen *Stephen Kerr-Sheppard* T +44(0)1908 257469 F +44(0)1908 692791 E [EMAIL PROTECTED] W_ __http://www.imserv.invensys.com_ IMServ Europe Ltd Scorpio Rockingham Drive Linford Wood Milton Keynes MK14 6LY Registered in England and Wales No.2749624 Registered Address: Invensys Portland House Stag Place London SW1E 5BF Disclaimer Notice This message/attachment(s) are CONFIDENTIAL and may contain LEGALLY PRIVILEGED information. If this message/attachment(s) were not intended for you please contact the sender IMMEDIATELY and delete this message/attachment(s) from your computer. You must not copy, forward or disclose the contents of this message/attachment(s) to any other person. The views/opinions in this message are solely of the author and do not necessarily represent those of the company. Please check this message/attachment(s) for the presence of viruses. No liability for any damage caused by any virus transmitted by this message/attachment(s) is accepted by the company. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] 2 Node cluster, and nodes OS hang
tcpdump -i eth1 -C 10 -W 15 -s 1 -Sw /tmp/`hostname -s`_tcpdump.log -ttt 'port ' Do this on both nodes before mounting on the second node. Ping me with the path to the logs. [EMAIL PROTECTED] wrote: Hello All, I have a NAS that I would like to use ocfs2 on. Currently there are three partitions made on it, I have included a fdisk listing below. I have created a 2 node cluster. I perform a basic mount, mount -t ocfs2 /dev/ndas-00500435:0p3 /media/nas3. When I do this on the 2nd node.. after the 5 sec it takes to mount, the 2nd node will completely hang after 5 secs. I did see some local iptables blocks from the 2nd node. so I disabled the firewall on both nodes, and that did not help. I am using opensuse 10.1, kernel 2.6.16.21-0.25-default. I am fine with troubleshooting the cluster not working, but completely hanging my system ? [ocfs2console] Version: 0.90 Label: Nas3 UUID: Long #. Maximum Nodes: 4 Cluster Size: 16K Block Size: 4k [/etc/ocfs2/cluster.conf] node: ip_port = ip_address = 192.168.123.198 number = 0 name = desk1 cluster = ocfs2 node: ip_port = ip_address = 192.168.123.199 number = 1 name = desk2 cluster = ocfs2 cluster: node_count = 2 name = ocfs2 [fdisk] Disk /dev/ndas-00500435:0: 164.6 GB, 164694458368 bytes 255 heads, 63 sectors/track, 20022 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/ndas-00500425:0p1 1130510482381c W95 FAT32 (LBA) /dev/ndas-00500425:0p21306 1139781063990 83 Linux /dev/ndas-00500425:0p3 11398 2002269280312+ 83 Linux ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Getting Started with ocfs2
Martin J. Evans wrote: fine but on selecting cluster/configure nodes I still get dialogue saying Could not query the state of the cluster stack. This must be resolved before any OCFS2 filesystemcan be mounted. Could be because the script is installed as o2cb and not o2cb.init. Fedora Core release 5 (Bordeaux) with ocfs2 tools installed by downloading the source and the usual configure and make (version ocfs2-tools-1.2.1.tar.gz) because the rpms I saw seemed ages (years) out of date. ocfs2-tools 1.2.1 was the last one released. We are working on releasing tools 1.2.2. I'm new to this so I may be missing a lot but following the instructions in the user guide did not get me to this point: 1. the ocfs2console does not run without setting my PYTHONPATH first - I don't know why. 2. the ocfs2console does not seem to create the /etc/ocfs2/cluster.conf (for me anyway). 3. if you install the ocfs2 tools from the rpm it create a minor cluster.conf which ocfs2console does not seem to like since you can't add nodes. If you install from the source code you don't even get a cluster.conf. I am willing to accept I've missed things but what were they? Does everyone go through this? I would hope not. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] out of memory... doing heavy IO on ocfs2 is wasting (low) memory?!
Still in testing. It is a larger patch than normal and thus requires more time/effort. Once we are comfortable with it, we will look into releasing the patch for others to test before releasing 1.2.4. Jonah H. Harris wrote: What's the status on this? I've researched Bugzilla, SVN, and the lists and haven't seen any mention of it yet being fixed as of yet. Kurt or Sunil, do you have a patch available that I could try? Otherwise, what's the Bugzilla ID so I can follow it's progress. Any help you can give would be appreciated. Thanks! -Jonah ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
[Ocfs2-users] disk heartbeat timeout poll
Thanks for all the replies in the previous usage poll. One of the chief concerns expressed was the (very) low default disk heartbeat timeout setting. Well, we want to bump it up but to what? Here are some qs the answers to which will help us determine that value. 1. What is the your disk heartbeat timeout? If you are unsure, cat /etc/sysconfig/o2cb. 2. What is your shared disk setup like? Fiber Channel, iscsi, AoE, etc. Provide as much detail as you can. 3. Are you using some sort of multipathing? If so, provide details. 4. What is the cluster used for? Oracle database, mailserver, etc. 5. How many nodes in your cluster? 6. Any other relevant information? Again, feel free to mail me directly. Thanks Sunil ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] SUSE Patches
Ping Novell. They issue interim PTF SLES kernels with the required fix(es) to help users tide over until the formal release. Needless to add, you need to have Novell Support. Andy Kipp wrote: Hello all, I am running SLES9 with the latest kernel patches (2.6.5-7.282-bigsmp) and ocfs2 version (1.2.1-SLES), I was wondering if anyone had any info on if the latest ocfs2 will be included in SLES anytime soon. Or how to get the patches from Novell. Thanks in advance! - Andy Andy Kipp Network Administrator Velcro USA Inc. 406 Brown Ave. Manchester, NH 03103 Phone: (603) 222-4844 Email: [EMAIL PROTECTED] CONFIDENTIALITY NOTICE: This email is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. If you are the intended recipient but do not wish to receive communications through this medium, please so advise immediately. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] RHEL 4 hotfix RPMs?
# ./configure --with-kernel=/usr/src/kernels/2.6.9-42.X.EL-smp-i686/ # make rhel4_2.6.9-42.X.EL_rpm The rpms will be in the rpmdir as specified in ~/.rpmmacros. ~$ cat .rpmmacros %_topdir/rpmbuild/user %_tmppath /rpmbuild/user/tmp %_sourcedir /rpmbuild/user/SOURCES %_specdir /rpmbuild/user/SPECS %_srcrpmdir /rpmbuild/user/SRPMS %_rpmdir/rpmbuild/user/RPMS %_builddir /rpmbuild/user/BUILD Brian Long wrote: Would it be possible to post the src.rpm used to build the RHEL 4 binary RPMs for OCFS2 kernel modules? Or could you explain how to easily build the ocfs2 kernel modules for a Red Hat hotfix kernel? I downloaded and extracted the ocfs2-1.2.3 tarball and ran ./configure with the defaults. It found the hotfix -devel RPM installed. When I run make, it compiles ocfs2 properly, but it does not create RPMs for me to install. I found the ocfs2.spec-generic in the vendor/rhel4 directory, but how can I easily use it to build RPMs for my hotfix kernel? Thanks. /Brian/ ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] 1.2.2 dump issue
As the ocfs2 home page suggests, when building 1.2.x against mainline 2.6.14 and above, specify GENERIC_DELETE_INODE_NOT_TRUNCATES=1. Peter Larsen wrote: I'm running 1.2.2 here - compiled from source, and while I can read files, trying to delete a file on my OCFS2 volume produces the following: [EMAIL PROTECTED] orcl]# rm testing rm: remove regular file `testing'? yes Segmentation fault [EMAIL PROTECTED] orcl]# Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [ cut here ] Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] kernel BUG at fs/inode.c:253! Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] invalid opcode: [#1] Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] CPU:0 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] EIP is at clear_inode+0x27/0x142 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] eax: 09c4 ebx: f5b56bdc ecx: edx: f5b56d28 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] esi: fc6d115b edi: f5b56bdc ebp: f5ac2f58 esp: f5ac2ebc Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] ds: 007b es: 007b ss: 0068 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] Process rm (pid: 5583, threadinfo=f5ac2000 task=f77eaab0) Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] Stack: 0fc6d115b f5b56bdc fc6d115b fc6d11a5 0001 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000]f1eb9240 00200246 f5b56bdc 0001 f58f8c9c Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] f5b56bdc fc6d115b c016ea67 f5b56bdc f5b56980 fc6d269d f228a6b8 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] Call Trace: Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [fc6d115b] ocfs2_delete_inode+0x0/0x56b [ocfs2] Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [fc6d115b] ocfs2_delete_inode+0x0/0x56b [ocfs2] Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [fc6d11a5] ocfs2_delete_inode+0x4a/0x56b [ocfs2] Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [fc6d115b] ocfs2_delete_inode+0x0/0x56b [ocfs2] Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [c016ea67] generic_delete_inode+0x9e/0x13d Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [fc6d269d] ocfs2_drop_inode+0x5d/0x195 [ocfs2] Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [c0306f0b] __mutex_unlock_slowpath+0x93/0x200 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [c01c87b1] _atomic_dec_and_lock+0xd/0x3c Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [c016ecae] iput+0x53/0x67 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [c0165dca] do_unlinkat+0xba/0xf6 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [c01041de] do_IRQ+0x53/0x85 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [c0156434] filp_close+0x33/0x60 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] [c0102c03] sysenter_past_esp+0x54/0x75 Message from [EMAIL PROTECTED] at Tue Oct 24 18:45:17 2006 ... ora02 kernel: [17180140.004000] Code: c0 01 5b c3 56 ba f9 00 00 00 53 89 c3 b8 cf 41 32 c0 83 ec 04 e8 21 99 fa ff 89 d8 e8 50 ad fe ff 8b 83 28 01 00 00 85 c0 74 08 0f 0b fd 00 cf 41 32 c0 8b 83 ac 01 00 00 a8 10 75 08 0f 0b ff This is the module information: [EMAIL PROTECTED] orcl]# modinfo ocfs2 filename: /lib/modules/2.6.16.9/extra/ocfs2/ocfs2.ko author: Oracle license:GPL description:OCFS2 1.2.2 Tue Jul 4 22:18:34 EDT 2006 (build 89ef9a0a0785d11d426e7842446d505b) version:1.2.2 vermagic: 2.6.16.9 PENTIUM4 REGPARM 4KSTACKS gcc-3.4 depends:ocfs2_nodemanager,ocfs2_dlm,jbd srcversion: E4F740AE9E8176169DAB864 I see a 1.2.3 version has been released, and I'll try to see if that makes any difference. But in the mean time, is this a known issue? Btw. I have not recieved any messages on this list for a long time.
Re: [Ocfs2-users] lvm2 not cluster aware - okay, so how should Istripe my LUNs?
Fabio Corazza wrote: Last but not least.. a question for Sunil if he's gonna read this.. when OCFS2 will support data-on-inode would we need to reformat the file systems or will the new module be compatible with the 1.4 on-disk data? I am envisioning a compat flag to be added on existing volumes using tunefs.ocfs2. But it is hard to state anything with surety about a feature that is yet not implemented. :) Thanks team for the new OCFS2 tools by the way, now we can grow our file systems. Yet a step forward. One step at a time. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 Fencing and Locking MSA500 Array: Help
Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000250 Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000250 That's where the problem begins. The cciss driver is unable to to complete the ios due to a bus reset maybe. Ping HP or whoever your contact is for the MSA500. You may get more information if you setup a netconsole server to catch the stack dumps. Deaderick, David (EDS) wrote: I have a RedHat Enterprise Linux 4.0 two node cluster on HP ProLiant ML350 Servers connected to an HP MSA500 with HP 532 SCSI adapters (cciss driver). The following list includes critical component versions: ocfs2console-1.2.1-1 Mon 28 Aug 2006 05:39:20 PM EDT ocfs2-2.6.9-42.0.2.ELsmp-1.2.3-1 Mon 28 Aug 2006 05:39:19 PM EDT ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1 Mon 28 Aug 2006 05:39:18 PM EDT ocfs2-2.6.9-42.0.2.EL-1.2.3-1 Mon 28 Aug 2006 05:39:17 PM EDT ocfs2-tools-1.2.1-1 Mon 28 Aug 2006 05:39:15 PM EDT oracleasmlib-2.0.2-1 Mon 28 Aug 2006 05:37:51 PM EDT oracleasm-2.6.9-42.0.2.ELhugemem-2.0.3-1 Mon 28 Aug 2006 05:37:49 PM EDT oracleasm-2.6.9-42.0.2.EL-2.0.3-1 Mon 28 Aug 2006 05:37:47 PM EDT oracleasm-2.6.9-42.0.2.ELsmp-2.0.3-1 Mon 28 Aug 2006 05:37:45 PM EDT oracleasm-support-2.0.3-1 Mon 28 Aug 2006 05:37:44 PM EDT kernel-hugemem-2.6.9-42.0.2.ELMon 28 Aug 2006 05:25:32 PM EDT kernel-doc-2.6.9-42.0.2.ELMon 28 Aug 2006 05:25:29 PM EDT kernel-hugemem-devel-2.6.9-42.0.2.EL Mon 28 Aug 2006 05:25:07 PM EDT kernel-smp-devel-2.6.9-42.0.2.EL Mon 28 Aug 2006 05:21:45 PM EDT kernel-smp-2.6.9-42.0.2.ELMon 28 Aug 2006 05:20:51 PM EDT kernel-utils-2.4-13.1.83 Mon 28 Aug 2006 05:20:48 PM EDT kernel-devel-2.6.9-42.0.2.EL Mon 28 Aug 2006 04:42:48 PM EDT kernel-2.6.9-42.0.2.ELMon 28 Aug 2006 04:42:37 PM EDT When ever a heavy load is on the I/O system (i.e. database full backups using RMAN), the servers fence, reboot and cannot reconnect with the MSA500. We must power the servers and the MSA500 off and restart. Where can I start troubleshooting this? /var/log/messages: (Node 2) Oct 11 05:16:56 vhaispora02 kernel: o2net: connection to node vhaispora01 (num 0) at 192.168.1.1: has been idle for 10 seconds, shutting it down. Oct 11 05:16:56 vhaispora02 kernel: (0,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1160558206.560358 now 1160558216.558300 dr 1160558206.560323 adv 1160558206.560375:1160558206.560379 func (0d6da305:504) 1160552001.561116:1160552001.561125) Oct 11 05:16:56 vhaispora02 kernel: o2net: no longer connected to node vhaispora01 (num 0) at 192.168.1.1: Oct 11 05:16:59 vhaispora02 kernel: cciss0: unsolicited abort f7010e90 Oct 11 05:16:59 vhaispora02 kernel: cciss0: retrying f7010e90 . . . Oct 11 05:17:18 vhaispora02 kernel: cciss0: f7010550 retried too many times Oct 11 05:17:18 vhaispora02 kernel: cciss0: unsolicited abort f70107a0 Oct 11 05:17:18 vhaispora02 kernel: cciss0: f70107a0 retried too many times Oct 11 05:17:18 vhaispora02 kernel: cciss0: unsolicited abort f70109f0 Oct 11 10:35:57 vhaispora02 syslogd 1.4.1: restart. Oct 11 10:35:57 vhaispora02 syslog: syslogd startup succeeded Oct 11 10:35:57 vhaispora02 kernel: klogd 1.4.1, log source = /proc/kmsg started. Oct 11 10:35:57 vhaispora02 kernel: Linux version 2.6.9-42.0.2.ELsmp ([EMAIL PROTECTED]) (gcc version 3.4.6 20060404 (Red Hat 3.4.6-3)) #1 SMP Thu Aug 17 18:00:32 EDT 2006 /var/log/messages (Node 1) Oct 11 05:10:01 vhaispora01 crond(pam_unix)[14577]: session closed for user root Oct 11 05:14:25 vhaispora01 ntpd[3243]: synchronized to 10.4.31.254, stratum 2 Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000250 Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000250 Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f70004a0 Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f70004a0 Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f70006f0 Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f70006f0 Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000940 Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000940 Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000b90 Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000b90 Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7000de0 Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7000de0 Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7001030 Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7001030 Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f7001280 Oct 11 05:15:28 vhaispora01 kernel: cciss0: retrying f7001280 Oct 11 05:15:28 vhaispora01 kernel: cciss0: unsolicited abort f70014d0 Oct 11 05:15:28
Re: [Ocfs2-users] BUG: unable to handle kernel NULL pointer dereference
Please file a bugzilla with the details provided. It is easier to manage bugs that a way. Thanks Christian Schlittchen wrote: Thanks to syncronous writes on the log-files I finally managed to get a log of the regular panics we experience. The setup is as follows: Three blades (IBM HS20) accessing a shared storage on a fibre channel connected storage server (IBM DS4300). The storage is used as a central mailstorage for about 35000 users, so it is pretty heavy duty storage wise. blade01 crashes every few days with a kernel panic. Unfortunatly all watchdogs we tried fail to reboot the machine, and setting /proc/sys/kernel/panic and /proc/sys/kernel/panic_on_oops to non-zero values doesn't help either. The machine still responds to pings, but to nothing else. Even more unfortunatly the file system on the other blades starts to hang sometime after blade01 crashes. Logging /proc/slabinfo showed a steady increase of the size-256 and size-32 number of objects and we thought the crashes might have something to do with it. We then did a nightly umount/mount which reduced the values a bit and which does seem to reduce the frequency of crashes slightly. Nevertheless today we had a crash with rather low values of size-256 and size-32: From /proc/slabinfo, timestamped, a few seconds before the crash: 2006-10-27-06:20:01 size-256 92187 169605256 151 : tunables 120 608 : slabdata 11307 113 07 0 2006-10-27-06:20:01 size-3294037 534942 32 1131 : tunables 120 608 : slabdata 4734 47 34 0 The kern.log shows: Oct 27 06:20:11 blade01 kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 0004 Oct 27 06:20:11 blade01 kernel: printing eip: Oct 27 06:20:11 blade01 kernel: f92b9431 Oct 27 06:20:11 blade01 kernel: *pde = Oct 27 06:20:11 blade01 kernel: Oops: 0002 [#1] Oct 27 06:20:11 blade01 kernel: SMP Oct 27 06:20:11 blade01 kernel: Modules linked in: i6300esb ocfs2 xt_state ip_conntrack xt_limit ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager md_mod dm_snapshot dm_mirror dm_mod mptctl qla2xxx i2c_i801 firmware_class i2c_core scsi_transport_fc rtc Oct 27 06:20:11 blade01 kernel: CPU:1 Oct 27 06:20:11 blade01 kernel: EIP:0060:[f92b9431]Not tainted VLI Oct 27 06:20:11 blade01 kernel: EFLAGS: 00010286 (2.6.18 #1) Oct 27 06:20:11 blade01 kernel: EIP is at dlm_add_migration_mle+0x1f6/0x30a [ocfs2_dlm] Oct 27 06:20:11 blade01 kernel: eax: ebx: d61e4c00 ecx: c4ce5988 edx: Oct 27 06:20:11 blade01 kernel: esi: f7531de4 edi: c4ce5980 ebp: e1873080 esp: f7531d6c Oct 27 06:20:11 blade01 kernel: ds: 007b es: 007b ss: 0068 Oct 27 06:20:11 blade01 kernel: Process o2net (pid: 1698, ti=f753 task=c215b560 task.ti=f753) Oct 27 06:20:11 blade01 kernel: Stack: c0327a2c f7531d88 e6805a80 f7531e6c 0048 0040 d61e4c00 Oct 27 06:20:11 blade01 kernel:d899a020 0001 0102 d899a021 004d Oct 27 06:20:11 blade01 kernel:c4ce5980 d61e4c00 fff4 f92bb927 f7531de4 d899a020 001f Oct 27 06:20:11 blade01 kernel: Call Trace: Oct 27 06:20:11 blade01 kernel: [c0327a2c] sock_recvmsg+0xe9/0x10b Oct 27 06:20:11 blade01 kernel: [f92bb927] dlm_migrate_request_handler+0x17b/0x231 [ocfs2_dlm] Oct 27 06:20:11 blade01 kernel: [f9256762] o2net_process_message+0x46e/0x626 [ocfs2_nodemanager] Oct 27 06:20:11 blade01 kernel: [c0120312] __do_softirq+0x73/0xdf Oct 27 06:20:11 blade01 kernel: [f9256057] o2net_recv_tcp_msg+0x6b/0x7e [ocfs2_nodemanager] Oct 27 06:20:11 blade01 kernel: [c0114142] find_busiest_group+0x129/0x4f9 Oct 27 06:20:11 blade01 kernel: [f925819e] o2net_rx_until_empty+0x1e6/0x6b9 [ocfs2_nodemanager] Oct 27 06:20:11 blade01 kernel: [c011619f] __wake_up+0x32/0x43 Oct 27 06:20:11 blade01 kernel: [c012af5b] run_workqueue+0x73/0xe1 Oct 27 06:20:11 blade01 kernel: [f9257fb8] o2net_rx_until_empty+0x0/0x6b9 [ocfs2_nodemanager] Oct 27 06:20:11 blade01 kernel: [c012b710] worker_thread+0x143/0x15f Oct 27 06:20:11 blade01 kernel: [c011563d] default_wake_function+0x0/0x15 Oct 27 06:20:11 blade01 kernel: [c012b5cd] worker_thread+0x0/0x15f Oct 27 06:20:11 blade01 kernel: [c012e151] kthread+0xfc/0x100 Oct 27 06:20:11 blade01 kernel: [c012e055] kthread+0x0/0x100 Oct 27 06:20:11 blade01 kernel: [c0100d95] kernel_thread_helper+0x5/0xb Oct 27 06:20:11 blade01 kernel: Code: 98 0a 00 00 c7 44 24 0c 62 81 2c f9 89 54 24 08 89 44 24 04 c7 04 24 80 06 2d f9 e8 85 29 e6 c6 e9 57 fe ff ff 8b 57 08 8b 41 04 89 42 04 89 10 89 4f 08 89 49 04 eb 9c f7 05 a0 2b 26 f9 00 09 Oct 27 06:20:11 blade01 kernel: EIP: [f92b9431] dlm_add_migration_mle+0x1f6/0x30a [ocfs2_dlm] SS:ESP 0068:f7531d6c This is with a vanilla 2.6.18 kernel from kernel.org. There were no suspicious messages in the logs before the crash. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com
Re: [Ocfs2-users] Unexpected reboot / crash
The first issue could be because you don't have ocfs2-tools 1.2.2. The earlier version was missing a line in the ocfs2 init script. Rafal Maliszewski wrote: Hi guys I installed ocfs2 on 4 node (redhat 4u3) on shared FC devices ( EMC storage ). So I've noticed several problems: 1. When I restart first node, it shutdown so slow and I have to power off machine. What is the problem, heartbeat timeout, dlm ? 2. What will happen when I plug off network cabel ( for interconnect ). What shoud I do, unmount ocfs2 volume on each node, stop ocfs2 service? regards ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Interesting Error
Which version of OCFS2? Did you run fsck.ocfs2 -f on that device? Do: # echo stat 6518860 | debugfs.ocfs2 -n /dev/sdX /tmp/ext.out Email ext.out. Andy Kipp wrote: Anybody have any idea what this error involves? Or how to resolve it? Oct 30 05:11:24 groupwise-1-mht kernel: (8494,0):ocfs2_extent_map_find_leaf:287 ERROR: status = -53 Oct 30 05:11:24 groupwise-1-mht kernel: OCFS2: ERROR (device dm-0): ocfs2_extent_map_find_leaf: Extent 29 at e_blkno 1973744 of inode 6518860 goes past ip_clusters of 441 Oct 30 05:11:24 groupwise-1-mht kernel: Oct 30 05:11:24 groupwise-1-mht kernel: File system is now read-only due to the potential of on-disk corruption. Please run fsck.ocfs2 once the file system is unmounted. Oct 30 05:11:24 groupwise-1-mht kernel: (8494,0):ocfs2_extent_map_lookup_read:383 ERROR: status = -53 Oct 30 05:11:24 groupwise-1-mht kernel: (8494,0):ocfs2_extent_map_get_blocks:858 ERROR: status = -53 Oct 30 05:11:24 groupwise-1-mht kernel: (8494,0):ocfs2_get_block:171 ERROR: Error -53 from get_blocks(0xf3c2d608, 0, 1, 0, NULL) - Andy Andy Kipp Network Administrator Velcro USA Inc. 406 Brown Ave. Manchester, NH 03103 Phone: (603) 222-4844 Email: [EMAIL PROTECTED] CONFIDENTIALITY NOTICE: This email is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. If you are the intended recipient but do not wish to receive communications through this medium, please so advise immediately. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2 error messages
Are you using NFS by any chance? I am looking into bug#790 that also encounters the same error (ESTALE). Matthew Flusche wrote: I received the following error messages in the system logs. Is this anything to be concerned with? kernel: (4074,0):ocfs2_populate_inode:234 ERROR: Invalid dinode: i_ino=1293597, i_blkno=1293597, signature = INODE01, flags = 0x0 kernel: (4074,0):ocfs2_read_locked_inode:389 ERROR: populate inode failed! i_blkno=1293597, i_ino=1293597 kernel: (4074,0):ocfs2_iget:131 ERROR: status = -116 kernel: (4074,0):ocfs2_iget:141 ERROR: status = -116 kernel: (4074,0):ocfs2_get_dentry:63 ERROR: status = -116 This is a three node cluster, no other error messages on any of the other nodes. System Information RHEL 4U4 2.6.9-42.0.2 kernel ocfs2console-1.2.1-1 ocfs2-tools-debuginfo-1.2.1-1 ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1 ocfs2-tools-1.2.1-1 Thanks, Matt ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Interesting Error
Replace sdX with the device on which the ocfs2 fs exists. You can use mount | grep ocfs2 to find that volume. If the inode on disk is good, one explanation for the issue could be the lvb bug which was fixed in 1.2.2. Ping Novell to get a PTF kernel with ocfs2 1.2.3. Andy Kipp wrote: Which version of OCFS2? ocfs2 1.2.1 (sles) ocfs2-tools 1.2.1 (sles) Did you run fsck.ocfs2 - f on that device? Not yet. Wanted to see what the error was about. Before I take down a production machine to do the fsck. Do: # echo stat 6518860 | debugfs.ocfs2 - n /dev/sdX /tmp/ext.out Email ext.out. This keeps returning saying that the device can not be found. I have tried running it with the following options with consideration for multipathing: /dev/sdb /dev/sdd /dev/dm-0 /dev/disk/by-name/vol_groupwise_data Am I missing the obvious? Thank for your help. Andy Kipp Network Administrator Velcro USA Inc. 406 Brown Ave. Manchester, NH 03103 Phone: (603) 222-4844 Email: [EMAIL PROTECTED] CONFIDENTIALITY NOTICE: This email is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. If you are the intended recipient but do not wish to receive communications through this medium, please so advise immediately. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2 error messages
So it is bug#790. It just may be a case of unnecessary error messages for you. I am still investigating it. Matthew Flusche wrote: Yes, one of the clustered file systems is shared with nfs. -Original Message- From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: Tuesday, October 31, 2006 12:25 PM To: Matthew Flusche Cc: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] ocfs2 error messages Are you using NFS by any chance? I am looking into bug#790 that also encounters the same error (ESTALE). Matthew Flusche wrote: I received the following error messages in the system logs. Is this anything to be concerned with? kernel: (4074,0):ocfs2_populate_inode:234 ERROR: Invalid dinode: i_ino=1293597, i_blkno=1293597, signature = INODE01, flags = 0x0 kernel: (4074,0):ocfs2_read_locked_inode:389 ERROR: populate inode failed! i_blkno=1293597, i_ino=1293597 kernel: (4074,0):ocfs2_iget:131 ERROR: status = -116 kernel: (4074,0):ocfs2_iget:141 ERROR: status = -116 kernel: (4074,0):ocfs2_get_dentry:63 ERROR: status = -116 This is a three node cluster, no other error messages on any of the other nodes. System Information RHEL 4U4 2.6.9-42.0.2 kernel ocfs2console-1.2.1-1 ocfs2-tools-debuginfo-1.2.1-1 ocfs2-2.6.9-42.0.2.ELhugemem-1.2.3-1 ocfs2-tools-1.2.1-1 Thanks, Matt ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Ocfs2 and low memory
To monitor ocfs2 memory usage, do: # cat /proc/slabinfo | egrep 'ocfs|dlm|size-256 |size-32 ' ocfs2_lock16226 16 2261 : tunables 120 60 0 : slabdata 1 1 0 ocfs2_inode_cache 22 24 115231 : tunables 24 12 0 : slabdata 8 8 0 ocfs2_uptodate28119 32 1191 : tunables 120 60 0 : slabdata 1 1 0 ocfs2_em_ent 10183 64 611 : tunables 120 60 0 : slabdata 3 3 0 dlmfs_inode_cache 1 576851 : tunables 54 27 0 : slabdata 1 1 0 dlm_mle_cache 0 0384 101 : tunables 54 27 0 : slabdata 0 0 0 size-256 40245 40245256 151 : tunables 120 60 0 : slabdata 2683 2683 0 size-3241650 41650 32 1191 : tunables 120 60 0 : slabdata350350 0 # cat /proc/fs/ocfs2_dlm/*/stat local=26, remote=0, unknown=0 local=39963, remote=6, unknown=0 size-256/32 are generic slab caches but are also used by ocfs2dlm. The ocfs2dlm impact on it can be detected by the second cat which lists the number of locally mastered (local) locks which are currently not freed until the volume is umounted. (The patch-fix is being tested.) Rafal Maliszewski wrote: Hi guys I have 4 node cluster with OCFS2. From time to time redhat fired : oom-killer to kill process with high amout of memory ( for example tomcat) My friends suppose that ocfs2 consume memory. So I have question: How check how many memory is occupied by ocfs processes? Is it a low memory? regards ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Newbie questions -- is OCFS2 what I even want?
You are probably looking for a distributed file system. Check out afs and/or v9fs. Thad Beier wrote: Dear Sirs and Madams, I run a small visual effects production company, Hammerhead Productions. We'd like to have an easily extensible inexpensive relatively high-performance storage network using open-source components. I was hoping that OCFS2 would be that system. I have a half-dozen 2 TB fileservers I'd like the rest of the network to see as a single 12 TB disk, with the aggregate throughput of the six servers serving some 50 compute nodes on the network. Is this what OCFS2 is for, or not? My guess is that it isn't, but I'm having a hard time parsing the documentation. If it isn't what OCFS2 is for, what am I looking for? Thanks, Thad Beier -- Thad Beier Hammerhead Productions [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] about 2 nodes enviroment and metalink note 394827.1
I would imagine you are using RHEL4. If so, upgrade the ocfs2-tools to 1.2.2. The previous version of the ocfs2 init script did not always umount ocfs2 volumes on clean shutdowns leading to this problem. [EMAIL PROTECTED] wrote: Hi to all: In 2 nodes environment I've 'suffered' the 'reboot 1st node hangs 2nd one', has described in metalink note 394827.1 Exactly this note says that this occurs when interconnect fails. Then i understand that if interconnect fails the idea is that node 1 stay up and running and node 2 'kills' itself to avoid split-brain. When 1st node reboots ( planned reboot ) ocfs2 thinks that interconnect has failed? If this is true, the cluster is condemned to die, cause node 1 is rebooting and node 2 kills itself, isn't it? Under a well know 2 nodes environment, does not exist some type of message like '2nd node,I'm rebooting, don't panic and stay tuned ' ? :) Any tip to avoid this behaviour ? I think that one way ( not optimal in any way ) could be adding another node, only for ocfs2, to help second node to think that it is in the max nodes group, when the 1st node reboots... Regards and TIA D. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 Block / Clustersize with Oracle 10gR2
http://www.dominicgiles.com/swingbench.html ?? Brian Long wrote: The DBA wrote a patented Java-based application which stress tests the Oracle IO subsystem. We use this to benchmark our IO subsystems (compare SAN to NAS, etc). This same benchmark is showing a maximum sustained throughput of 3,400 IO/sec while the same benchmark with the same data will max out at 7K+ IO/sec on RAW. I'll grab the iostat data which we've kept over time and try to make some sense of it before posting anything additional. Thanks. /Brian/ On Thu, 2006-11-09 at 10:20 -0800, Sunil Mushran wrote: Why are you looking at iops and not the io thruput? What is the actual io thruput? Please could you share some iostat numbers with us. In all our tests, we've seen very little difference in the actual io thruput between raw and ocfs2. Clustersize will mainly affect the alloc/dealloc performance. It has very little role to play in io performance. If anything, it could help coalesce requests to reduce number of ios (read cdbs) required to do the task. Brian Long wrote: Hello, I followed the user's guide recommendation of 4K block size and 128K cluster size. I have 8 32GB OCFS2 filesystems mounted on two nodes. The DBA has created a large tablespace with 4GB data files on each filesystem. The performance is only getting 3,400 IO/sec read/write combined. If I re-use the LUNs and give the DBA 4GB raw partitions, he can get over 7,000 IO/sec read/write combined (single-node) and over 11,000 IO/sec on two nodes. What's my next step to improve performance of OCFS2? Since the DBA is using 4GB datafiles, should I increase the cluster size to the max 1MB? Thanks for any hints. /Brian/ ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] frozen ocfs2 filesystem under heavy webserver load
None of these locks are busy. So they should not be the cause of the problem. Start with the version of ocfs2. Also, which kernel? What does top say? Is some process spinning? Also, what does this stresstest entail? Stephan Hendl wrote: Hi, I use a cluster of 4 nodes with ocfs2 as a webserver cluster. During a stresstest it occurs after a couple of minutes that the webserverprocesses are ideling but the system load is extremely high (abou 150...200) where the waits are very low. After that I cannot interrupt the webserver processes anymore and in some directories a ls -ls comes not back - so it seems that the file system has a problem. Only a reset of the server solves the problem ;-(( In the debug mode I can find the following lines as an example: Lockres: D0ccda0fcc1c07e Mode: Exclusive Flags: Initialized Attached RO Holders: 0 EX Holders: 0 Pending Action: None Pending Unlock Action: None Requested Mode: Exclusive Blocking Mode: Invalid Lockres: M0ccda0fcc1c07e Mode: Exclusive Flags: Initialized Attached RO Holders: 0 EX Holders: 0 Pending Action: None Pending Unlock Action: None Requested Mode: Exclusive Blocking Mode: Invalid Lockres: D0ccd9ffcc1c07d Mode: Exclusive Flags: Initialized Attached RO Holders: 0 EX Holders: 0 Pending Action: None Pending Unlock Action: None Requested Mode: Exclusive Blocking Mode: Invalid Lockres: M0ccd9ffcc1c07d Mode: Exclusive Flags: Initialized Attached RO Holders: 0 EX Holders: 0 Pending Action: None Pending Unlock Action: None Requested Mode: Exclusive Blocking Mode: Invalid Could it be that two servers like to write to the same file and under heavy load the clusterd processes cannot handle this? Regards und thanks, Stephan ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Ocfs2 errors on 3 node cluster
It will be easier if you file a bug on oss.oracle.com/bugzilla with all the details. Like messages files from all nodes, etc. Why are you using 1.2.1? 1.2.3 has been out for few months now. Randy Ramsdell wrote: Hi, Maybe someone could elaborate on these re-occuring ocfs2 errors that always results in a reboot of 1 or more systems. Our setup: 3 node cluster Ocfs2 v. 1.2.1 OpenSuse 10.1 SAN storage uses Iscsi for disk access. Cluster settings: # O2CB_ENABELED: 'true' means to load the driver on boot. O2CB_ENABLED=true # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start. O2CB_BOOTCLUSTER=ocfs2 # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead. O2CB_HEARTBEAT_THRESHOLD=60 Kernel line parameters: elevator=deadline panic=5 I have used the deadline or not testing to see if this will help. The messages we receive are simply this: Node 0 Nov 4 10:54:10 atl02010305 kernel: o2net: connection to node atl02010310 (num 0) at 192.168.3.110: has been idle for 10 seconds, shutting it down. Nov 4 10:54:10 atl02010305 kernel: (0,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1162655640.698739 now 1162655650.695937 dr 1162655640.698734 adv 1162655640.698739:1162655640.698739 func (ca3835ec:504) 1162654980.779007:1162654980.779011) Nov 4 10:54:10 atl02010305 kernel: o2net: no longer connected to node atl02010310 (num 0) at 192.168.3.110: And the complimentary Node 1 Nov 4 10:54:11 atl02010310 kernel: o2net: connection to node atl02010305 (num 1) at 192.168.3.105: has been idle for 10 seconds, shutting it down. Nov 4 10:54:11 atl02010310 kernel: (32479,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1162655640.698521 now 1162655650.701661 dr 1 162655650.695829 adv 1162655640.698525:1162655640.698525 func (ca3835ec:505) 1162654980.778881:1162654980.778886) Nov 4 10:54:11 atl02010310 kernel: o2net: no longer connected to node atl02010305 (num 1) at 192.168.3.105: This showed up shortly after and repeated for hours: Nov 4 11:00:00 atl02010310 kernel: (32540,1):dlm_send_remote_convert_request:398 ERROR: status = -107 Nov 4 11:00:00 atl02010310 kernel: (32540,1):dlm_wait_for_node_death:371 32E007178FA24E87B45ECDDE6F7D5D52: waiting 5000ms for notification of death of node 1 Nov 4 11:00:04 atl02010310 sshd[5242]: Accepted publickey for nagios from 192.168.3.102 port 44292 ssh2 Node 3 saw nothing. So I wonder why neither node rebooted from a kernel panic? Or what happened, in general. Weren't they supposed to fence etc..? Randy Ramsdell ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Bad magic number in inode
The quick detect just looks for the superblock which is in the third block of the device. The full detect looks up the superblock and then the system directory. In your case it fails to locate the latter. This is one of the quirks when using an unpartitioned disk and later partitioning it. The partitioning does not clear all the header blocks that includes the fs superblock. Long story short... use any binary editor (bvi), open the volume and search for the OCFSV2 string. Change that signature. Hint: It will be the first string in the third block. If you had formatted with 4k blocksize, it will be on block starting at 8K, for 2k it will be block starting at 4K you get the drift. Marcel Savelkoul wrote: Hi, Pretty new here with SAN's and Oracle RAC. I had set up everything but because I wanted to enlarge one of the disks used I removed everything and started over and stumble upon the following: There is one disk defined on the SAN. This is /dev/sda. I haven't done fdisk yet so I also don't have /dev/sda1 yet. But if I now do the mounted.ocfs2 command I see the following: # mounted.ocfs2 -d DeviceFS UUID Label /dev/sda ocfs2 6fe56000-97e6-4ea1-a302-29a8213c6e04 oradb # mounted.ocfs2 -f DeviceFS Nodes /dev/sda ocfs2 Unknown: Bad magic number in inode This device is also listed when I check it with the ocfs2console. I tried to open it with debug but then I get this: # mount -t debugfs debugfs /debug # echo fs_locks | debugfs.ocfs2 /dev/sda /tmp/fslocks debugfs.ocfs2 1.2.2 Could not open debug state for 6FE5600097E64EA1A30229A8213C6E04. Perhaps that OCFS2 file system is not mounted? If I now use fdisk to create a partition on the device /dev/sda I see the following with the mounted.ocfs2 command: # mounted.ocfs2 -f DeviceFS Nodes /dev/sdc ocfs2 Unknown: Bad magic number in inode /dev/sdc1 ocfs2 Not mounted Now the /dev/sda isn't listed anymore in the ocfs2console but the /dev/sda1 is and after actually mounting the /dev/sdc1 I see: # mounted.ocfs2 -f DeviceFS Nodes /dev/sda ocfs2 Unknown: Bad magic number in inode /dev/sda1 ocfs2 pub.host.com Why do I keep seeing the /dev/sda device? Best Regards, Marcel ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem
Again, create a bug on oss.oracle.com/bugzilla and upload the messages files from both nodes. It is hard to state anything with incomplete information. [EMAIL PROTECTED] wrote: I decided to rebuild this from scratch today and got the same result. two cluster node, both boxes remain connected to the shared storage throughout tests. I unplug network connection from node0 and get e1000 driver Tx Unit Hang messages on node0 console node1 console displays o2net_idle_timer:1309 here are some times to help debug the situation followed by additional output node1 sits for a while and eventually displays o2quo_make_decision:143 error: fencing this node because it is connected to a half-quorum of one of two nodes which doesn't include the lowest active node 0 node 0 replays node 1's journal, too bad it still isn't on the network this is in node 1 /var/log/messages after reboot Nov 14 23:55:56 FTP02 kernel: o2net: connection to node FTP01.mydomain.net (num 0) at 10.xxx.0.45: has been idle for 10 seconds, shutting it down. Nov 14 23:55:56 FTP02 kernel: (0,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1163570146.656474 now 1163570156.65 5334 dr 1163570146.656446 adv 1163570146.656476:1163570146.656478 func (3a33f0f8:505) 1163570057.403947:1163570057.403950) Nov 14 23:55:56 FTP02 kernel: o2net: no longer connected to node FTP01.mydomain.net (num 0) at 10.xxx.0.45: I'm confused by this. Shouldn't node 0 have eventually rebooted since it lost network connectivity and node 1 replayed node 0's journal and kept going? As it is right now we are left with no IP reachable box. If I do this same test but unplug node 1 instead of node 0, it works as it should. node 1 will fence and node 0 will reply the journal and stay online. Any input is greatly appreciated. Thanks, Colin Farley Network Administrator E-Care Contact Center Services Phone:(204) 940-6244 Fax:(204) 940-7394 Sunil Mushran [EMAIL PROTECTED] acle.com To [EMAIL PROTECTED] 11/13/2006 08:23 cc PMocfs2-users@oss.oracle.com Subject Re: [Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem Considering o2net only cares whether it is connected to the other node or not, it should not make a difference whether one unplugs node 0 or node 1. The result should be the same. Node 1 should fence in both cases. Do you see messages indicating that the node(s) have lost connectivity? If so, could you share them. It would be easiest if you could file a bug on oss.oracle.com/bugzilla with the messages file and listing the course of events... as in, unplugged cable on node 0 at time x, etc. [EMAIL PROTECTED] wrote: I'm testing a 2 node cluster in a VMWare ESX environment for use as a high availability FTP server to support a CRM application. Both nodes run Unbreakable 2.0 x86_64. They access a 300GB OCFS2 volume on an RDM LUN on an HP EVA. All disk connectivity is fine and haven't seen any problems there. The problem comes when doing some IP failover testing. The IP failover is done using UCARP so to test failover I tried unplugging one nodes virtual network cable to see what happens. If I unplug node 1 everything is fine, node 1 eventually panics and reboots while node 0 chugs along fine. The problem comes when unplugging node 0. When node 0 loses network connectivity it does not panic and eventually node 1 panics and reboots. Is there a reason why the lower node does not panic if it loses network connectivity? Heartbeat thresholds are the same on each node at 31 and both nodes are set to reboot on panic, node0 just never panics. All software installed are versions that come with Unbreakable 2.0. I didn't do the config on these boxes so the first thing I'm going to do on Tuesday when I work on this is rebuild both nodes from scratch but I figured I would ask first to see if it was an easy question for someone on the list to answer
Re: [Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem
You are missing his point. He is not saying that fencing is the problem. He is asking as to why the behavior differs between unplugging node 0 and node 1. Alexei_Roudnev wrote: It is not a bug; it is all by design. Problem is that OCFSv2: - can't support more than 1 interconnection link, so you always risk to lost intercionnection; In additional, to make things worst, it dont support serial interconenction; - can't find a quorum in 2 node configuration (it's not ocfsv2 problem but general concern with any 2 nodes cluster) - so all nodes lost quorum if network connection is lost; - don't analyze FS activity and reboot all nodes without quorum, except node0, in case of losing network connection. It can't be improved without supporting multiple interconnections + better decisions about fencing (there is not any use to fence a node, if it have not outstanding IO on cluster file system). Well known problem with OCFSv2. One solution is to add 3-d node and use interface bonding (be sure that interface convergeency time is less that o2cb timeout). - Original Message - From: [EMAIL PROTECTED] To: Sunil Mushran [EMAIL PROTECTED] Cc: ocfs2-users@oss.oracle.com Sent: Tuesday, November 14, 2006 10:35 PM Subject: Re: [Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem I decided to rebuild this from scratch today and got the same result. two cluster node, both boxes remain connected to the shared storage throughout tests. I unplug network connection from node0 and get e1000 driver Tx Unit Hang messages on node0 console node1 console displays o2net_idle_timer:1309 here are some times to help debug the situation followed by additional output node1 sits for a while and eventually displays o2quo_make_decision:143 error: fencing this node because it is connected to a half-quorum of one of two nodes which doesn't include the lowest active node 0 node 0 replays node 1's journal, too bad it still isn't on the network this is in node 1 /var/log/messages after reboot Nov 14 23:55:56 FTP02 kernel: o2net: connection to node FTP01.mydomain.net (num 0) at 10.xxx.0.45: has been idle for 10 seconds, shutting it down. Nov 14 23:55:56 FTP02 kernel: (0,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1163570146.656474 now 1163570156.65 5334 dr 1163570146.656446 adv 1163570146.656476:1163570146.656478 func (3a33f0f8:505) 1163570057.403947:1163570057.403950) Nov 14 23:55:56 FTP02 kernel: o2net: no longer connected to node FTP01.mydomain.net (num 0) at 10.xxx.0.45: I'm confused by this. Shouldn't node 0 have eventually rebooted since it lost network connectivity and node 1 replayed node 0's journal and kept going? As it is right now we are left with no IP reachable box. If I do this same test but unplug node 1 instead of node 0, it works as it should. node 1 will fence and node 0 will reply the journal and stay online. Any input is greatly appreciated. Thanks, Colin Farley Network Administrator E-Care Contact Center Services Phone:(204) 940-6244 Fax:(204) 940-7394 Sunil Mushran [EMAIL PROTECTED] acle.com To [EMAIL PROTECTED] 11/13/2006 08:23 cc PMocfs2-users@oss.oracle.com Subject Re: [Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem Considering o2net only cares whether it is connected to the other node or not, it should not make a difference whether one unplugs node 0 or node 1. The result should be the same. Node 1 should fence in both cases. Do you see messages indicating that the node(s) have lost connectivity? If so, could you share them. It would be easiest if you could file a bug on oss.oracle.com/bugzilla with the messages file and listing the course of events... as in, unplugged cable on node 0 at time x, etc. [EMAIL PROTECTED] wrote: I'm testing a 2 node cluster in a VMWare ESX environment for use as a high availability FTP server to support a CRM application. Both nodes run Unbreakable 2.0 x86_64. They access a 300GB OCFS2 volume on an RDM LUN on an HP EVA. All disk connectivity is fine and haven't seen any problems there. The problem comes when doing some IP failover testing. The IP failover is done using UCARP so to test failover I tried unplugging one nodes virtual network cable to see what happens. If I unplug node 1 everything is fine, node 1 eventually panics and reboots while node 0 chugs along fine. The problem comes when unplugging node 0. When node 0 loses network connectivity it does not panic and eventually node 1 panics and reboots. Is there a reason why
Re: [Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem
Again, read his email. Alexei_Roudnev wrote: Behavior is not difference - if you broke node1-node0 connection, node1 will self-reboot in the current design. It dont matter what exactly you unplug - socket on nod1, socket on node2 or inter-switch connection if it is used. Add node-3 and everything will change. - Original Message - From: Sunil Mushran [EMAIL PROTECTED] To: Alexei_Roudnev [EMAIL PROTECTED] Cc: [EMAIL PROTECTED]; ocfs2-users@oss.oracle.com Sent: Wednesday, November 15, 2006 11:03 AM Subject: Re: [Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem You are missing his point. He is not saying that fencing is the problem. He is asking as to why the behavior differs between unplugging node 0 and node 1. Alexei_Roudnev wrote: It is not a bug; it is all by design. Problem is that OCFSv2: - can't support more than 1 interconnection link, so you always risk to lost intercionnection; In additional, to make things worst, it dont support serial interconenction; - can't find a quorum in 2 node configuration (it's not ocfsv2 problem but general concern with any 2 nodes cluster) - so all nodes lost quorum if network connection is lost; - don't analyze FS activity and reboot all nodes without quorum, except node0, in case of losing network connection. It can't be improved without supporting multiple interconnections + better decisions about fencing (there is not any use to fence a node, if it have not outstanding IO on cluster file system). Well known problem with OCFSv2. One solution is to add 3-d node and use interface bonding (be sure that interface convergeency time is less that o2cb timeout). - Original Message - From: [EMAIL PROTECTED] To: Sunil Mushran [EMAIL PROTECTED] Cc: ocfs2-users@oss.oracle.com Sent: Tuesday, November 14, 2006 10:35 PM Subject: Re: [Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem I decided to rebuild this from scratch today and got the same result. two cluster node, both boxes remain connected to the shared storage throughout tests. I unplug network connection from node0 and get e1000 driver Tx Unit Hang messages on node0 console node1 console displays o2net_idle_timer:1309 here are some times to help debug the situation followed by additional output node1 sits for a while and eventually displays o2quo_make_decision:143 error: fencing this node because it is connected to a half-quorum of one of two nodes which doesn't include the lowest active node 0 node 0 replays node 1's journal, too bad it still isn't on the network this is in node 1 /var/log/messages after reboot Nov 14 23:55:56 FTP02 kernel: o2net: connection to node FTP01.mydomain.net (num 0) at 10.xxx.0.45: has been idle for 10 seconds, shutting it down. Nov 14 23:55:56 FTP02 kernel: (0,0):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1163570146.656474 now 1163570156.65 5334 dr 1163570146.656446 adv 1163570146.656476:1163570146.656478 func (3a33f0f8:505) 1163570057.403947:1163570057.403950) Nov 14 23:55:56 FTP02 kernel: o2net: no longer connected to node FTP01.mydomain.net (num 0) at 10.xxx.0.45: I'm confused by this. Shouldn't node 0 have eventually rebooted since it lost network connectivity and node 1 replayed node 0's journal and kept going? As it is right now we are left with no IP reachable box. If I do this same test but unplug node 1 instead of node 0, it works as it should. node 1 will fence and node 0 will reply the journal and stay online. Any input is greatly appreciated. Thanks, Colin Farley Network Administrator E-Care Contact Center Services Phone:(204) 940-6244 Fax:(204) 940-7394 Sunil Mushran [EMAIL PROTECTED] acle.com To [EMAIL PROTECTED] 11/13/2006 08:23 cc PMocfs2-users@oss.oracle.com Subject Re: [Ocfs2-users] ESX and Unbreakable 2.0 OCFS2 problem Considering o2net only cares whether it is connected to the other node or not, it should not make a difference whether one unplugs node 0 or node 1. The result should be the same. Node 1 should fence in both cases. Do you see messages indicating that the node(s) have lost connectivity? If so, could you share them. It would be easiest if you could file a bug on oss.oracle.com/bugzilla with the messages file and listing the course of events... as in, unplugged cable on node 0 at time x, etc. [EMAIL PROTECTED] wrote: I'm testing a 2 node cluster in a VMWare ESX environment for use as a high availability FTP server to support a CRM application. Both
Re: [Ocfs2-users] re: o2hb_write_timeout:270 ERROR: Heartbeat write timeout
As ocfs2 heartbeats on the same device, unplugging a different device on the storage should not affect ocfs2 as long as the ios are completing. But the logs indicate otherwise. HB ios are erroring out. The o2net message is the tcp connect message. We will be providing a way to configure that too. Peter Santos wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Suni, after trying to chase this down, I think one of our sa's might have restarted the storage without notifying anyone. Similarly, today a disk that was not in use was re-initialized and caused everything to come down. I don't know if this is an issue with ocfs2 or ( old_storage + our sa doing this incorrectly). The idea was to re-initialize a disk that was not being used (sdc) and not have it affect the ocfs2 storage (sdb). After the re-initialization completed, I noticed that all 3 nodes weren't working and this was what I found on dbo3 === Nov 21 11:40:36 dbo3 kernel: o2net: connection to node dbo2 (num 1) at 192.168.134.141: has been idle for 10 seconds, shutting it down. Nov 21 11:40:36 dbo3 kernel: (0,1):o2net_idle_timer:1310 here are some times that might help debug the situation: (tmr 1164127226.293816 now 1164127236.291931 dr 1164127226.293797 adv 1164127226.293818:1164127226.293819 func (a77953f3:2) 1164124426.747626:1164124426.747628) Nov 21 11:40:36 dbo3 kernel: o2net: no longer connected to node dbo2 (num 1) at 192.168.134.141: Nov 21 11:41:11 dbo3 kernel: SCSI error : 1 0 0 0 return code = 0x1 Nov 21 11:41:11 dbo3 kernel: end_request: I/O error, dev sdb, sector 591502543 Nov 21 11:41:11 dbo3 kernel: SCSI error : 1 0 0 0 return code = 0x1 ... Nov 21 11:41:11 dbo3 kernel: SCSI error : 1 0 0 0 return code = 0x1 Nov 21 11:41:11 dbo3 kernel: end_request: I/O error, dev sdb, sector 591502568 Nov 21 11:41:11 dbo3 kernel: (3711,0):o2hb_do_disk_heartbeat:954 ERROR: status = -5 Nov 21 11:41:11 dbo3 kernel: (3789,0):o2hb_do_disk_heartbeat:954 ERROR: status = -5 Nov 21 11:41:11 dbo3 kernel: SCSI error : 1 0 0 0 return code = 0x1 Nov 21 11:41:11 dbo3 kernel: end_request: I/O error, dev sdb, sector 1983 Nov 21 11:41:11 dbo3 kernel: (6614,0):o2hb_bio_end_io:332 ERROR: IO Error -5 Nov 21 11:41:11 dbo3 kernel: SCSI error : 1 0 0 0 return code = 0x1 Nov 21 11:41:11 dbo3 kernel: end_request: I/O error, dev sdb, sector 3921780 Nov 21 11:41:11 dbo3 kernel: (6614,0):o2hb_bio_end_io:332 ERROR: IO Error -5 Nov 21 11:41:11 dbo3 kernel: (3711,0):o2hb_do_disk_heartbeat:954 ERROR: status = -5 Nov 21 11:41:11 dbo3 kernel: (3789,0):o2hb_do_disk_heartbeat:954 ERROR: status = -5 ... Nov 21 11:41:11 dbo3 kernel: (3711,0):o2hb_do_disk_heartbeat:954 ERROR: status = -5 Nov 21 11:41:11 dbo3 kernel: (3789,0):o2hb_do_disk_heartbeat:954 ERROR: status = -5 Nov 21 11:41:11 dbo3 su: pam_unix2: session finished for user oracle, service su Nov 21 11:41:11 dbo3 logger: Oracle CSSD failure 134. Nov 21 11:45:07 dbo3 syslogd 1.4.1: restart. I'm curious about the message o2net: connection to node dbo2 (num 1) at 192.168.134.141: has been idle for 10 seconds, shutting it down. I have increased my O2CB_HEARTBEAT_THRESHOLD to 61, but where is this message getting 10 seconds from? Also this message is displayed because dbo2 was not able to check into the hearbeat filesystem right ? - -peter Sunil Mushran wrote: On nodes db01 and db03 hb timed-out at 17:12:49. However, the nodes did not fully panic. As in, the network was shutdown but the hb thread was still going strong for some reason. Within 10 secs of that, by 17:12:59, db02 detected loss of network connectivity with both nodes db01 and db03. However, it was still seeing the nodes hb on disk and assumed that they were alive. As per quorum rules, it paniced. So the qs is: what was happening on nodes db01 and db03 after 17:12:49? Peter Santos wrote: Folks, I'm trying to piece together what happened during a recent event where our 3 node RAC cluster had problems. It appears that all 3 nodes restarted .. which is likely to occur if all 3 nodes cannot communicate with the shared ocfs2 storage. I did find out from our SA, that this happened during the time he was replacing a failed drive on the storage and the storage was in a degraded mode. I'm trying to understand if the 3 nodes had a difficult time accessing the shared ocfs2 volume or was it a tcp connectivity issue. There is nobody currently using the cluster ..so it should have been idle from a user perspective. prompt# cat /etc/fstab | grep ocfs2 /dev/sdb1 /ocfs2 ocfs2 _netdev,datavolume,nointr 0 0 /dev/sdb2 /backups ocfs2 _netdev,datavolume,nointr 0 0 we have 2 ocfs2 volumes.. once if for the voting and ocr files, while the other is to be used as a shared storage for backups of archivelog files etc. /var/log/messages NODE1 (dbo1
Re: [Ocfs2-users] Oracle 9i RAC on OCFS2
Refer to CDSL (Conext Dependent Symbolic Links) in the OCFS2 user's guide. Marcel Savelkoul wrote: Hi, I'm setting up a 2-node Oracle 9i RAC on OCFS2. But I have some problems with understanding how the shared Oracle_Home is being used. For instance there is the *$ORACLE_HOME/oracm/admin/cmcfg.ora* file. The $ORACLE_HOME is on the SAN so the 2 nodes have access to this file. How will the settings then be done per node? Is there a guide/tutorial/howto for installing Oracle 9i RAC on OCFS2 with shared Oracle_Home's? Or is it better to just install it without shared Oracle_Home's? Best Regards, Marcel ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 and berkeley database files
You are on a very old release of OCFS2. The OCFS2 homepage and FAQ both list a SLES9 kernel version newer than the one you are using. But that may not be the reason for the error. My bet is that bdb is attempting to create a shared writeable mmap that ocfs2 1.2 does not support. [EMAIL PROTECTED] wrote: Hello Forum, when trying to install a mailserver on a shared SAN Device with ocfs2, I realized some strange problems with Berkeley .db files: Trying to install OpenLDAP: Dec 1 16:16:56 server_1 slapd[18497]: bdb_db_init: Initializing BDB database Dec 1 16:16:56 server_1 slapd[18498]: bdb(ou=users,dc=mycomp,dc=de): mmap: Invalid argument Dec 1 16:16:56 server_1 slapd[18498]: bdb_db_open: dbenv_open failed: Invalid argument (22) Dec 1 16:16:56 server_1 slapd[18498]: backend_startup: bi_db_open failed! (22) Dec 1 16:16:56 server_1 slapd[18498]: bdb(ou=users,dc=mycomp,dc=de): DB_ENV-lock_id_free interface requires an environment configured for th e locking subsystem Dec 1 16:16:56 server_1 slapd[18498]: bdb(ou=users,dc=mycomp,dc=de): txn_checkpoint interface requires an environment configured for the tran saction subsystem Dec 1 16:16:56 server_1 slapd[18498]: bdb_db_destroy: txn_checkpoint failed: Invalid argument (22) Dec 1 16:17:50 server_1 slapd[18525]: bdb_db_init: Initializing BDB database Dec 1 16:17:50 server_1 slapd[18526]: bdb(ou=users,dc=mycomp,dc=de): mmap: Invalid argument Dec 1 16:17:50 server_1 slapd[18526]: bdb_db_open: dbenv_open failed: Invalid argument (22) Dec 1 16:17:50 server_1 slapd[18526]: backend_startup: bi_db_open failed! (22) Dec 1 16:17:50 server_1 slapd[18526]: bdb(ou=users,dc=mycomp,dc=de): DB_ENV-lock_id_free interface requires an environment configured for th e locking subsystem Dec 1 16:17:50 server_1 slapd[18526]: bdb(ou=users,dc=mycomp,dc=de): txn_checkpoint interface requires an environment configured for the tran saction subsystem . When installing on a standard ext3 - no problems. When installing cyrus (initial start after creation of cyrus admin account): Dec 5 04:15:01 server_1 ctl_mboxlist[29435]: IOERROR: mapping /var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 ctl_cyrusdb[32679]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 idled[32680]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 idled[32680]: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3[32684]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3[32684]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3s[32685]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3s[32685]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3s[32691]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3s[32691]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3[32690]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3[32690]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3[32693]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3[32693]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3s[32694]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3s[32694]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3s[32695]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3s[32695]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3[32697]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3[32697]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3s[32698]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3s[32698]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 pop3s[32701]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 pop3s[32701]: Fatal error: failed to mmap /communication/var/lib/imap/mailboxes.db file Dec 5 14:14:43 server_1 sieve[32686]: IOERROR: mapping /communication/var/lib/imap/mailboxes.db file: Invalid argument Dec 5 14:14:43 server_1 sieve[32692]:
Re: [Ocfs2-users] Oracle Application Server 10.1.2.0.2 Install on OCFS2
strace apache. That may provide us with some clues. [EMAIL PROTECTED] wrote: Hello all, Has anyone installed Oracle Application Server 10.1.2.0.2 Infrastructure tier including the preseeded 10.1.0.4 database (High Availability option otherwise known as a cold failover cluster) on OCFS2 where the ocfs2 device is only mounted on one node at a time? I am trying to emulate Red Hat Cluster Suite on OCFS2 in an active/passive mode. The software installs ok and the database runs. However, it appears that Apache fails to start and no logs or error output is generated. A local disk installation is successful. We are running Oracle's Enterprise Linux x86 with 1.2.3 ocfs2 module (included with release). For Sunil and team, the Oracle Support SR is 6019530.994 if you care for further background. Any help is appreciated. Best regards, Tony Orlando MFG Systems [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] (work) [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] (home) ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 and berkeley database files
ocfs2 supports private mmap r/w and shared mmap readonly. Shared mmap writeable is the only piece missing. We should have that by 1.4. Alexei_Roudnev wrote: There was a clear answer, WHY it did not worked on OCFSv2: - BerkleyDB and LDAP uses mmap to the files; - OCFSv2 don't implement it (because it is not possible to do such mapping in the cluster FS) - So they dont work on OCFSv2 Am I correct? - Original Message - From: [EMAIL PROTECTED] To: Michael Wood [EMAIL PROTECTED] Cc: ocfs2-users@oss.oracle.com Sent: Wednesday, December 06, 2006 2:33 AM Subject: Re: [Ocfs2-users] OCFS2 and berkeley database files Am Mi, 6.12.2006, 09:35, schrieb Michael Wood: Berkeley DB does have problems with certain filesystems (e.g. NFS) so maybe this is a similar issue. (Just a wild guess.) Yes I found the Orcale FAQ. So it is s structural problem with a simple DB and then trying to open it more than once OpenLDAP supports various different backends. Maybe one of the others will work better? Perhaps, but my guess is, that the funktionality of an cluster filesystem always uses an access to every file, even if it is not connected to a service. So I think it will not work if theses files are not able to handle more than one access at one time. Cyrus also supports various different database types for its databases. Maybe try one of the others. I have never used either OpenLDAP or Cyrus on ocfs2, so I can't guarantee anything :) Same as above Server is a DELL Poweredge with EMC CLARiiON SAN System SuSE SLES9 SP3+ - 2.6.5-7.244-smp Kernel with (... modinfo ocfs2): license:GPL author: Oracle version:1.1.7-SLES 5AF01E6455FC04917FE52FB description:OCFS2 1.1.7-SLES Tue Nov 1 14:45:27 PST 2005 (build sles) depends:ocfs2_nodemanager,ocfs2_dlm,jbd supported: yes vermagic: 2.6.5-7.244-smp SMP gcc-3.3 [snip] 2.6.5-7.244 was released with SLES9 SP3. You might want to patch your box to 2.6.5-7.282 (which I think is the latest.) The 7.282 version of the SLES9 SP3 kernel comes with ocfs2 version 1.2.1-SLES. Yes, but this EMC multipath software installs some kernelmodules (binaries, no sources) and they only match with the kernels in the list. So I have to wait for the next release :((( On the ocfs2 homepage it says: SuSE Linux Exterprise Server 9: OCFS2 1.2.1 is bundled with the SLES9 SP3 (2.6.5-7.257 and later) release. SLES9 users must upgrade to the latest SP3 kernel (2.6.5-7.257 or later) and install ocfs2-tools and ocfs2console packages. For more on OCFS2 bundling with SLES9, refer to the Novell SLES9 section in the FAQ See above -- Michael Wood [EMAIL PROTECTED] ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] re: is it possible for the o2cb stack to monitor multiple clusternames on the same box
To expand on it, cluster is just a grouping for nodes. They nodes only actively work together when two or more mount the same volume and thus joining a common domain. Say you add two test nodes in that cluster but ensure that they do not mount the volumes being used by the rac cluster, they will never be part of that domain. Sunil Mushran wrote: Currently it supports only one cluster. Peter Santos wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Folks, When I installed ocfs2 the first time and setup oracle to work with it, the clustername defaulted to ocfs2. We are testing adding new nodes, but we would like to add new nodes to the o2cb cluster in a different clustername. Do I need to do anything to keep that separate on the filesystem. I just want to make sure that when a user is testing adding/deleting nodes from a cluster, it's not the ocfs2 because that's the one I'm using for RAC. - -peter -BEGIN PGP SIGNATURE- Version: GnuPG v1.4.1 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFFiZ2/oyy5QBCjoT0RAl6UAJ95jPfwFkJEkUTH2f1/+mGqZu1XhQCeNgOs 5zKPGX32Q8B4e0UbruFPk0Y= =pUMY -END PGP SIGNATURE- ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Problem installing OCFS 1.2.3
depmod -a ? Lin Shen (lshen) wrote: Switched the kernel to 2.6.9-42.Elsmp, still got the same error. [EMAIL PROTECTED] Desktop]# uname -a Linux cfs2 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 i686 i386 GNU/Linux -Original Message- From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: Thursday, January 04, 2007 12:08 PM To: Lin Shen (lshen) Cc: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] Problem installing OCFS 1.2.3 Refer to the FAQ. The module's kernel version should match the kernel version. Lin Shen (lshen) wrote: Hi, The kernel I'm using is: [EMAIL PROTECTED] Desktop]# uname -a Linux cfs2 2.6.9-42.7.ELsmp #1 SMP Tue Sep 5 18:29:39 EDT 2006 i686 i686 i386 GNU/Linux So I installed ocfs2-2.6.9-42.ELsmp-1.2.3-1.i686.rpm. While trying to load the modules, I got the following error: [EMAIL PROTECTED] Desktop]# /etc/init.d/o2cb load Loading module configfs: Unable to load module configfs Failed Any ideas? Thanks lin ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Problem installing OCFS 1.2.3
Lin Shen (lshen) wrote: That worked, thanks. I'm a newbie to OCFS2, so bear with me if some of my questions sound too trivial. I couldn't find the answers either in FAQ or User's Guide. 1. Does OCFS2 need gnbd (or similar thing) to work with off-the-shelf PC boxes like GFS does? If I have two nodes A and B, each has a partition. How do I include both partitions into a single instance of OCFS2 file system? you need a shared disk. you can use iscsi. 2. Does OCFS2 support cross-node RAID? no 3. How easy is it to port OCFS2 onto a different transport protocol such as TIPC? o2net code is pretty well contained and isolated. while we have discussed tipc, not sure if we ever gave it a serious look. lin -Original Message- From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: Thursday, January 04, 2007 1:21 PM To: Lin Shen (lshen) Cc: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] Problem installing OCFS 1.2.3 depmod -a ? Lin Shen (lshen) wrote: Switched the kernel to 2.6.9-42.Elsmp, still got the same error. [EMAIL PROTECTED] Desktop]# uname -a Linux cfs2 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 i686 i386 GNU/Linux -Original Message- From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: Thursday, January 04, 2007 12:08 PM To: Lin Shen (lshen) Cc: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] Problem installing OCFS 1.2.3 Refer to the FAQ. The module's kernel version should match the kernel version. Lin Shen (lshen) wrote: Hi, The kernel I'm using is: [EMAIL PROTECTED] Desktop]# uname -a Linux cfs2 2.6.9-42.7.ELsmp #1 SMP Tue Sep 5 18:29:39 EDT 2006 i686 i686 i386 GNU/Linux So I installed ocfs2-2.6.9-42.ELsmp-1.2.3-1.i686.rpm. While trying to load the modules, I got the following error: [EMAIL PROTECTED] Desktop]# /etc/init.d/o2cb load Loading module configfs: Unable to load module configfs Failed Any ideas? Thanks lin ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] Problem installing OCFS 1.2.3
theoretically yes... but for practical usage go with atleast iscsi Lin Shen (lshen) wrote: So w/o shared disk, is it possible to make OCFS2 to work by utilizing GNBD or etc? lin -Original Message- From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: Thursday, January 04, 2007 2:48 PM To: Lin Shen (lshen) Cc: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] Problem installing OCFS 1.2.3 Lin Shen (lshen) wrote: That worked, thanks. I'm a newbie to OCFS2, so bear with me if some of my questions sound too trivial. I couldn't find the answers either in FAQ or User's Guide. 1. Does OCFS2 need gnbd (or similar thing) to work with off-the-shelf PC boxes like GFS does? If I have two nodes A and B, each has a partition. How do I include both partitions into a single instance of OCFS2 file system? you need a shared disk. you can use iscsi. 2. Does OCFS2 support cross-node RAID? no 3. How easy is it to port OCFS2 onto a different transport protocol such as TIPC? o2net code is pretty well contained and isolated. while we have discussed tipc, not sure if we ever gave it a serious look. lin -Original Message- From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: Thursday, January 04, 2007 1:21 PM To: Lin Shen (lshen) Cc: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] Problem installing OCFS 1.2.3 depmod -a ? Lin Shen (lshen) wrote: Switched the kernel to 2.6.9-42.Elsmp, still got the same error. [EMAIL PROTECTED] Desktop]# uname -a Linux cfs2 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:27:17 EDT 2006 i686 i686 i386 GNU/Linux -Original Message- From: Sunil Mushran [mailto:[EMAIL PROTECTED] Sent: Thursday, January 04, 2007 12:08 PM To: Lin Shen (lshen) Cc: ocfs2-users@oss.oracle.com Subject: Re: [Ocfs2-users] Problem installing OCFS 1.2.3 Refer to the FAQ. The module's kernel version should match the kernel version. Lin Shen (lshen) wrote: Hi, The kernel I'm using is: [EMAIL PROTECTED] Desktop]# uname -a Linux cfs2 2.6.9-42.7.ELsmp #1 SMP Tue Sep 5 18:29:39 EDT 2006 i686 i686 i386 GNU/Linux So I installed ocfs2-2.6.9-42.ELsmp-1.2.3-1.i686.rpm. While trying to load the modules, I got the following error: [EMAIL PROTECTED] Desktop]# /etc/init.d/o2cb load Loading module configfs: Unable to load module configfs Failed Any ideas? Thanks lin ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] update on o2net_idle_timer
That and also we've seen similar issues with Broadcom TG3 drivers. We use Intel E1000 mostly and thus did not experience the same issue. As far as the configurable net timeouts goes, the patch was added into mainline on Dec 4th. So it will be available with ocfs2 1.4. We are still seeing if we have the bandwidth to backport it to 1.2. http://kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=history;f=fs/ocfs2/cluster/tcp.c;h=ae4ff4a6636b23759522994898a95c148a4401f1;hb=HEAD commit 828ae6afbef03bfe107a4a8cc38798419d6a2765 Author: Andrew Beekhof [EMAIL PROTECTED] Date: Mon Dec 4 14:04:55 2006 +0100 [patch 3/3] OCFS2 Configurable timeouts - Protocol changes Modify the OCFS2 handshake to ensure essential timeouts are configured identically on all nodes. Only allow changes when there are no connected peers Improves the logic in o2net_advance_rx() which broke now that sizeof(struct o2net_handshake) is greater than sizeof(struct o2net_msg) Included is the field for userspace-heartbeat timeout to avoid the need for further protocol changes. Uses a global spinlock to ensure the decisions to update configfs entries are made on the correct value. The region covered by the spinlock when incrementing the counter is much larger as this is the more critical case. Small cleanup contributed by Adrian Bunk [EMAIL PROTECTED] Signed-off-by: Andrew Beekhof [EMAIL PROTECTED] Signed-off-by: Mark Fasheh [EMAIL PROTECTED] commit b5dd80304da482d77b2320e1a01a189e656b9770 Author: Jeff Mahoney [EMAIL PROTECTED] Date: Mon Dec 4 14:04:54 2006 +0100 [patch 2/3] OCFS2 Configurable timeouts Allow configuration of OCFS2 timeouts from userspace via configfs Signed-off-by: Andrew Beekhof [EMAIL PROTECTED] Signed-off-by: Mark Fasheh [EMAIL PROTECTED] Andy Phillips wrote: Hello, I've made some progress with the o2net_idle_timer issue. Various people seem to occasionally report instability and faults where the following message is generated; (From Andrew Brunton) Sep 17 22:06:04 argon2 kernel: (0,0):o2net_idle_timer:1310 connection to node argon1.crewe.ukfuels.co.uk (num 0) at 10.1.1.110: has been idle for 10 seconds, shutting it down. (From Peter Santos) Nov 21 11:40:36 dbo3 kernel: o2net: connection to node dbo2 (num 1) at 192.168.134.141: has been idle for 10 seconds, shutting it down. And from me; Aug 2 19:06:27 fred kernel: o2net: connection to node barney (num 0) at 172.16.6.10: has been idle for 10 seconds, shutting it down. I've tried unsuccessfully to replicate the issue on my testbed environment. The problem stems from the o2net layer function 'o2net_idle_timer' firing, after not receiving a valid packet after O2NET_IDLE_TIMEOUT_SECS, which is defined to be 10 seconds in ocfs2-1.2.3/fs/ocfs2/cluster/tcp_internal.h. This then causes the rest of the code to fall over in a heap, once the underlying socket goes. It turns out that its very likely not a bug in ocfs2. This code is doing what its supposed to do. Others will (and have) argued that the network timeout is too low - see any and all posts by Alexei to this list. Leaving that aside, or indeed the idea that the network layer should make an attempt at reconnecting before killing the entire machine, I'll focus on the causes we've found here of this problem which are not spanning tree related. One common thread is that people finding this are on EM64T or Opteron based systems. There are various bugs reported against RedHat Linux (and probably SuSE as well) for the kernels before RHAS 4.4. e.g. page 16 of this document - lost ticks Message Under Stress With Non Uniform Memory Access Enabled on AMD Processor-Based Systems http://support.dell.com/support/edocs/software/osrhel4/en/INT/HJ834A00.pdf Or oracle bug 4593892 referenced in; http://www.oracle.com/technology/tech/linux/validated-configurations/html/vc_dell6850-rhel4-cx500-1_1.html We were also seeing messages of the form; Dec 18 10:35:44 gs2dwdb02 kernel: warning: many lost ticks. Dec 18 10:35:44 gs2dwdb02 kernel: Your time source seems to be instable or some driver is hogging interupts (sic) Our problem seems to have been at least partially down to dodgy AMI megaraid firmware for the system disks. We were getting messages from the megaraid driver module on the console, which correlated with dropped packets as logged by Oracle RAC's cssd.log. So given the above numa and driver/hardware errors its likely that ocfs2 was going for periods as long as 10 seconds without receiving a packet, and failing accordingly. Ocfs2 was hit the worst, as it has the finest trigger on lost packets. The heartbeat failure times for rac are over 60 seconds. The o2cb heartbeat is set to 61 for us, which is about 120 seconds IIRC, which is fine for interruptions to the SAN/multipathing failover failures. We're planning an upgrade to 4.4 which apparently has fixed several of these bugs, and would recommend others with
Re: [Ocfs2-users] Kernel panic - not syncing: ocfs2 is very sorry
Lot of ink has been spilled on this subject. ;) Check out the heartbeat section in the FAQ. One easy solution is to increase the hb timeout to 60 secs... O2CB_HEARTBEAT_THRESHOLD = 31 We will leaning towards making that number the default in the 1.4 release. George Liu wrote: Both systems crash with the following message on their consoles, Index 19: took 0 ms to do submit_bio for read Index 20: took ms to do waiting for read completion (6,0):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all active regions. Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing kernel: Linux 2.6.9-42.0.3.ELsmp #1 SMP Mon Sep 25 17:28:02 EDT 2006 i686 i686 i386 GNU/Linux other : ocfs2-2.6.9-42.0.3.ELsmp-1.2.3-1.i686.rpm ocfs2-tools-1.2.2-1.i386.rpm ocfs2console-1.2.2-1.i386.rpm ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] mount error
You are using two different versions of ocfs2 on the two nodes. Different enough that they are not network compatible. It is working as designed. Consulente3 wrote: Hi, I'm new to ocfs2, and in my test's environment, i have: 2 node, becks and vaix becks can mount ocfs2 fs, but vaix can't. When vaix try to mount fs, it raise this error: vaix:/# mount -t ocfs2 /dev/etherd/e2.0 /ocfs2/ mount.ocfs2: Transport endpoint is not connected while mounting /dev/etherd/e2.0 on /ocfs2/ in becks syslog: Jan 10 00:57:38 localhost kernel: (23628,0):o2net_check_handshake:1107 node vaix (num 1) at 10.1.7.151: advertised net protocol version 4 but 1 is required, disconnecting Jan 10 00:57:44 localhost kernel: (23628,0):o2net_connect_expired:1444 ERROR: no connection established with node 1 after 10 seconds, giving up and returning errors. becks and vaix are two different linux versions (debian and redhat), whith different kernel and ocfs2tools thanks ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 crash
Looks to be running out of lowmem. # date # cat /proc/meminfo # cat /proc/slabinfo Run a script that dumps the above every 1 to 5 mins. That should help explain the cause. Brian Sieler wrote: Using 2-node clustered file system on DELL/EMC SAN/RHEL 2.6.9-34.0.2.ELsmp x86_64. Config: O2CB_HEARTBEAT_THRESHOLD=30 Kernel param: elavator=deadline (per FAQ) These log items appear and the server crashes. Has happened twice now at three week intervals, each time during a heavy IO operation: Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_setup_one_bio:371 ERROR: Could not alloc slots BIO! Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_read_slots:507 ERROR: status = -12 Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_do_disk_heartbeat:973 ERROR: status = -12 Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_setup_one_bio:371 ERROR: Could not alloc slots BIO! Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_read_slots:507 ERROR: status = -12 Jan 15 16:08:29 db100 kernel: (3898,6):o2hb_do_disk_heartbeat:973 ERROR: status Can't find much on any of these errors…what is 507 ERROR status = -12? Any help appreciated ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
[Ocfs2-users] ocfs2-1.2.4 RC2 released
All, http://oss.oracle.com/~smushran/.ocfs2-1.2.4-0.2/ The final 1.2.4 should look very close to this drop. We still have one slippery issue open that we are working on. But, other than that, this drop is looking good. The list of patches added post 1.2.4-0.1 is as follows: r2948: fs - Allow direct I/O read past end of file r2950: fs - Don't print errors when following symlinks r2951: dlm - Fixes race between migrate and dirty r2952: dlm - dlmunlock waits for migration to complete before unlocking r2953: dlm - migrate lockres handler looks for its lock on all queues r2954: fs - Directory c/mtime update fixes r2955: fs - Cleanup ocfs2_iget() errors r2956: dlm - Flush dlm workqueue before starting to migrate r2957: dlm - Drop inflight refmap even if no locks found on the lockres r2958: dlm - dlm dispatch was stopping too early r2959: dlm - wake up sleepers on the lockres waitqueue r2960: dlm - Silence a failed lock convert r2961: dlm - Dump dlm work queue entries for debugging r2962: dlm - Cookies in locks not being printed correctly in error messages r2963: o2net - Added post handler callable function in o2net message handler r2964: dlm - Calling post handler function in assert master handler Thanks OCFS2 Team ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2 keeps fencing all my nodes
1. In SLES10, the /config has been moved to /sys/kernel/config. That's how it is on mainline. 2. To monitor heartbeat do: # watch -d -n2 debugfs.ocfs2 -R hb /dev/sdX This comand will work if you have ocfs2-tools 1.2.2. (Not sure whether sles10 ships with 1.2.2 or 1.2.1.) If 1.2.1, do: # watch -d -n2 echo \hb\ | debugfs.ocfs2 -n /dev/sdX | grep -v \ \ 3. Configure netconsole to catch any oops stack trace. 4. From the looks of it the issue is related to the disk hb timeout. Check the FAQ on increasing it to 60 secs from a default of 14 secs. John Lange wrote: I have a 4 node SLES 10 cluster with all nodes attached to a SAN via fiber. The SAN has a EVMS volume formatted with ocfs2. Below is my ocfs2.conf. I can mount the volume on any single node but as soon as I mount it on the second node, it fences one of the nodes. There is never more than one node active at a time. When I check the status of the nodes (quickly before they get fenced) the satus shows they are heartbeating. # /etc/init.d/o2cb status Module configfs: Loaded Filesystem configfs: Mounted Module ocfs2_nodemanager: Loaded Module ocfs2_dlm: Loaded Module ocfs2_dlmfs: Loaded Filesystem ocfs2_dlmfs: Mounted Checking O2CB cluster ocfs2: Online Checking O2CB heartbeat: Active Here are the logs from 2 machines (NOTE that this is the logs from 2 machines at the same time as they were captured via remote syslog on a 3rd machine machine) of what happens when the node vs2 is already running, and node vs3 joins the cluster (mounts the ocfs2 file system). In this instance vs3 gets fenced. Jan 18 14:52:41 vs2 kernel: o2net: accepted connection from node vs3 (num 2) at 10.1.1.13: Jan 18 14:52:41 vs3 kernel: o2net: connected to node vs2 (num 1) at 10.1.1.12: Jan 18 14:52:45 vs3 kernel: OCFS2 1.2.3-SLES Thu Aug 17 11:38:33 PDT 2006 (build sles) Jan 18 14:52:45 vs2 kernel: ocfs2_dlm: Node 2 joins domain 89FC5CB6C98B43B998AB8492874EA6CA Jan 18 14:52:45 vs2 kernel: ocfs2_dlm: Nodes in domain (89FC5CB6C98B43B998AB8492874EA6CA): 1 2 Jan 18 14:52:45 vs3 kernel: ocfs2_dlm: Nodes in domain (89FC5CB6C98B43B998AB8492874EA6CA): 1 2 Jan 18 14:52:45 vs3 kernel: kjournald starting. Commit interval 5 seconds Jan 18 14:52:45 vs3 kernel: ocfs2: Mounting device (253,13) on (node 2, slot 0) Jan 18 14:52:45 vs3 udevd-event[5542]: run_program: ressize 256 too short Jan 18 14:52:51 vs2 kernel: o2net: connection to node vs3 (num 2) at 10.1.1.13: has been idle for 10 seconds, shutting it down. Jan 18 14:52:51 vs2 kernel: (0,0):o2net_idle_timer:1314 here are some times that might help debug the situation: (tmr 1169153561.99906 now 1169153571.93951 dr 1169153566.98 030 adv 1169153566.98039:1169153566.98040 func (09ab0f3c:504) 1169153565.211482:1169153565.211485) Jan 18 14:52:51 vs3 kernel: o2net: no longer connected to node vs2 (num 1) at 10.1.1.12: Jan 18 14:52:51 vs2 kernel: o2net: no longer connected to node vs3 (num 2) at 10.1.1.13: == I previously had configured ocfs2 for userspace heartbeating but couldn't get that running so I reconfigured for disk based. Could that now be the cause of this problem? Where do the nodes write the heartbeats? I see nothing on the ocfs2 system. Also, I have no /config directory that is mentioned in the docs. Is that normal? Here is /etc/ocfs2/cluster.conf node: ip_port = ip_address = 10.1.1.11 number = 0 name = vs1 cluster = ocfs2 node: ip_port = ip_address = 10.1.1.12 number = 1 name = vs2 cluster = ocfs2 node: ip_port = ip_address = 10.1.1.13 number = 2 name = vs3 cluster = ocfs2 node: ip_port = ip_address = 10.1.1.14 number = 3 name = vs4 cluster = ocfs2 cluster: node_count = 4 name = ocfs2 Regards, Any tips on how I can go about diagnosing this problem? Thanks, John Lange ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2_cdsl_follow_link errors
#define EACCES 13 /* Permission denied */ The messages are harmless. Patch to silence them has already been checked into the 1.2 repo and mainline git. Matthew Flusche wrote: I’m seeing the following errors in my two node cluster. Is this anything to be concerned with? Host information: RedHat AS 4U4 2.6.9-42.0.2.ELsmp (x86_64) ocfs2-2.6.9-42.0.2.ELsmp-1.2.3-1 ocfs2console-1.2.2-1 ocfs2-tools-debuginfo-1.2.2-1 ocfs2-tools-1.2.2-1 Jan 22 02:51:44 host1 kernel: (3666,1):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 02:51:44 host1 kernel: (3666,1):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 02:51:44 host1 kernel: (3666,1):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 02:51:44 host1 kernel: (3666,1):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 02:51:44 host1 kernel: (3666,1):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 02:51:44 host1 kernel: (3666,1):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 03:22:31 host1 kernel: (3666,3):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 03:22:31 host1 kernel: (3666,3):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 03:22:31 host1 kernel: (3666,3):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 03:22:31 host1 kernel: (3666,3):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 03:22:31 host1 kernel: (3666,3):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 03:22:31 host1 kernel: (3666,3):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 05:06:04 host1 kernel: (18268,0):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 05:06:04 host1 kernel: (18268,0):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 05:06:04 host1 kernel: (18268,0):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 05:06:04 host1 kernel: (18268,0):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 05:06:04 host1 kernel: (18268,0):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 05:06:04 host1 kernel: (18268,0):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 06:04:27 host1 kernel: (24727,1):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 06:04:27 host1 kernel: (24727,1):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 06:04:27 host1 kernel: (24727,1):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 06:04:27 host1 kernel: (24727,1):ocfs2_follow_link:410 ERROR: status = -13 Jan 22 06:04:27 host1 kernel: (24727,1):ocfs2_cdsl_follow_link:372 ERROR: status = -13 Jan 22 06:04:27 host1 kernel: (24727,1):ocfs2_follow_link:410 ERROR: status = -13 Regards, Matt ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] kernel panic - not syncing
o2net timeout cannot cause the o2hb panic. The two are totally different. From the outputs, I would guess o2hb is timing out but I cannot say for sure till I don't see the full logs. Andy Phillips wrote: Its worth pointing out that the o2net idle timer is triggering on the network heartbeat, which is 10 seconds, in the current 1.2.x series. O2CB_HEARTBEAT_THRESHOLD has no effect on this, because its another part of the code which causes the problem. see ocfs2-1.2.3/fs/ocfs2/cluster/tcp_internal.h #define O2NET_IDLE_TIMEOUT_SECS 10 Andy On Mon, 2007-01-22 at 09:29 -0800, Srinivas Eeda wrote: problem appears to be that IO is taking more time than effective O2CB_HEARTBEAT_THRESHOLD. Your configured value 31 doesn't seem to be effective? Index 6: took 1995 ms to do msleepIndex Index 17: took 1996 ms to do msleep Index 22: took 10001 ms to do waiting for read completion. Can you please cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold and verify. Thanks, --Srini. Consulente3 wrote: Hi all, my test environment, is composed by 2 server with centos 4.4 nodes is exporting with aoe6-43 + vblade-14 kernel-2.6.9-42.0.3.EL ocfs2-tools-1.2.2-1 ocfs2console-1.2.2-1 ocfs2-2.6.9-42.0.3.EL-1.2.3-1 /dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local) /dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local) DeviceFS Nodes /dev/etherd/e2.0 ocfs2 ocfs2, becks /dev/etherd/e3.0 ocfs2 ocfs2, becks DeviceFS UUID Label /dev/etherd/e2.0 ocfs2 b24cc18d-af89-4980-a75e-a87530b1b878 test1 /dev/etherd/e3.0 ocfs2 101a92fd-b83b-4294-8bfc-fbaa069c3239 nfs4 O2CB_HEARTBEAT_THRESHOLD=31 when i try to make stress test: Index 4: took 0 ms to do checking slots Index 5: took 2 ms to do waiting for write completion Index 6: took 1995 ms to do msleep Index 7: took 0 ms to do allocating bios for read Index 8: took 0 ms to do bio alloc read Index 9: took 0 ms to do bio add page read Index 10: took 0 ms to do submit_bio for read Index 11: took 2 ms to do waiting for read completion Index 12: took 0 ms to do bio alloc write Index 13: took 0 ms to do bio add page write Index 14: took 0 ms to do submit_bio for write Index 15: took 0 ms to do checking slots Index 16: took 1 ms to do waiting for write completion Index 17: took 1996 ms to do msleep Index 18: took 0 ms to do allocating bios for read Index 19: took 0 ms to do bio allo read Index 20: took 0 ms to do bio add page read Index 21: took 0 ms to do submit_bio for read Index 22: took 10001 ms to do waiting for read completion (3,0):o2hb_stop_all_regions:1908 ERROR: stopping heartbeat on all active regions. Kernel panic - not syncing: ocfs2 is very sorry to be fencing this system by panicing 6o2net: connection to node ocfs2 (num 2) at 10.1.7.107:777 has been idle for 10 seconds, shutting it down (3,0): o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr: 1169487957.71650 now 1169487967.69569 dr 1169487962.3 adv 1169487957.71671:1159487957.71674 func 83bce37b2:505) 1169487901.984644:1169487901.984676) the kernel panic occurs always on the same node, and the other node still responding thanks! ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] kernel panic - not syncing
I understand that. But that's not what the user experienced in this case. One node ran into the o2hb timeout (and panic) that caused the o2net message on the other node. These are two separate issues. FWIW, I am trying to get the o2net config backported to the 1.2 tree. Andy Phillips wrote: With respect sunil, the observed problems I see normally go like this; - o2net timeout - socket closes. Aug 2 19:06:27 fred kernel: o2net: connection to node barney (num 0) at 172.16.6.10: has been idle for 10 seconds, shutting it down. Aug 2 19:06:27 fred kernel: (0,7):o2net_idle_timer:1309 here are some times that might help debug the situation: (tmr 1154545576.798263 now - Upper layers realise they have no connection, and panic the box. Aug 2 19:06:27 fred kernel: o2net: no longer connected to node barney (num 0) at 172.16.6.10: Aug 2 19:08:33 fred kernel: (25,7):o2quo_make_decision:143 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0 Irrespective of that. The o2net message observed comes about due to the value of O2NET_HEARTBEAT_TIMEOUT not the o2cb heartbeat. The code that is probably giving you that error message is; The function o2net_idle_timer, which is referenced in your error message, is in ocfs2-1.2.3/fs/ocfs2/cluster/tcp.c printk(KERN_INFO o2net: connection to SC_NODEF_FMT has been idle for 10 seconds, shutting it down.\n, SC_NODEF_ARGS(sc)); mlog(ML_NOTICE, here are some times that might help debug the situation: (tmr %ld.%ld now %ld.%ld dr %ld.%ld adv %ld.%ld:%ld.%ld func (%08x:%u) %ld.%ld:%ld.%ld)\n, sc-sc_tv_timer.tv_sec, sc-sc_tv_timer.tv_usec, now.tv_sec, now.tv_usec, sc-sc_tv_data_ready.tv_sec, sc-sc_tv_data_ready.tv_usec, sc-sc_tv_advance_start.tv_sec, sc-sc_tv_advance_start.tv_usec, sc-sc_tv_advance_stop.tv_sec, sc-sc_tv_advance_stop.tv_usec, sc-sc_msg_key, sc-sc_msg_type, sc-sc_tv_func_start.tv_sec, sc-sc_tv_func_start.tv_usec, sc-sc_tv_func_stop.tv_sec, sc-sc_tv_func_stop.tv_usec); The original post only posted that error message, but the other error messages usually follow. If I'm wrong, please email me directly and help sort out my understanding. Andy On Mon, 2007-01-22 at 10:38 -0800, Sunil Mushran wrote: o2net timeout cannot cause the o2hb panic. The two are totally different. From the outputs, I would guess o2hb is timing out but I cannot say for sure till I don't see the full logs. Andy Phillips wrote: Its worth pointing out that the o2net idle timer is triggering on the network heartbeat, which is 10 seconds, in the current 1.2.x series. O2CB_HEARTBEAT_THRESHOLD has no effect on this, because its another part of the code which causes the problem. see ocfs2-1.2.3/fs/ocfs2/cluster/tcp_internal.h #define O2NET_IDLE_TIMEOUT_SECS 10 Andy On Mon, 2007-01-22 at 09:29 -0800, Srinivas Eeda wrote: problem appears to be that IO is taking more time than effective O2CB_HEARTBEAT_THRESHOLD. Your configured value 31 doesn't seem to be effective? Index 6: took 1995 ms to do msleepIndex Index 17: took 1996 ms to do msleep Index 22: took 10001 ms to do waiting for read completion. Can you please cat /proc/fs/ocfs2_nodemanager/hb_dead_threshold and verify. Thanks, --Srini. Consulente3 wrote: Hi all, my test environment, is composed by 2 server with centos 4.4 nodes is exporting with aoe6-43 + vblade-14 kernel-2.6.9-42.0.3.EL ocfs2-tools-1.2.2-1 ocfs2console-1.2.2-1 ocfs2-2.6.9-42.0.3.EL-1.2.3-1 /dev/etherd/e2.0 on /ocfs2 type ocfs2 (rw,_netdev,heartbeat=local) /dev/etherd/e3.0 on /ocfs2_nfs type ocfs2 (rw,_netdev,heartbeat=local) DeviceFS Nodes /dev/etherd/e2.0 ocfs2 ocfs2, becks /dev/etherd/e3.0 ocfs2 ocfs2, becks DeviceFS UUID Label /dev/etherd/e2.0 ocfs2 b24cc18d-af89-4980-a75e-a87530b1b878 test1 /dev/etherd/e3.0 ocfs2 101a92fd-b83b-4294-8bfc-fbaa069c3239 nfs4 O2CB_HEARTBEAT_THRESHOLD=31 when i try to make stress test: Index 4: took 0 ms to do checking slots Index 5: took 2 ms to do waiting for write completion Index 6: took 1995 ms to do msleep Index 7: took 0 ms to do allocating bios for read Index 8: took 0 ms to do bio alloc read Index 9: took 0 ms to do bio add page read Index 10: took 0 ms to do submit_bio for read Index 11: took 2 ms to do waiting for read completion Index 12: took 0 ms to do bio alloc write Index 13: took 0 ms to do bio add page write Index 14: took 0 ms to do submit_bio for write Index 15: took 0 ms to do checking slots Index 16: took 1 ms to do waiting for write completion Index 17: took 1996 ms to do msleep Index 18: took 0 ms to do allocating bios for read Index 19: took 0 ms to do bio allo read Index 20: took 0 ms to do bio add page read
Re: [Ocfs2-users] ocfs2 kernel bug in Fedora Core 4 update kernel
This was the lvb issue that was fixed long ago. In the 1.2 tree, it was fixed in 1.2.2. 2.6.18 should definitely have the fix for this. davide rossetti wrote: OS: Fedora Core release 4 (Stentz) KERNEL: Linux rack1.ape 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006 i686 i686 i386 GNU/Linux CLUSTER: 11 Linux kernels, mixed environment FC4,FC5,FC6 SAN: FC Infortrend storage, QLogic16 port FC switch, FC adapter LSI FC929X (21224,1):ocfs2_truncate_file:242 ERROR: bug expression: le64_to_cpu(fe-i_size) != i_size_read(inode) (21224,1):ocfs2_truncate_file:242 ERROR: Inode 1029752381, inode i_size = 582 != di i_size = 690, i_flags = 0x 1 [ cut here ] kernel BUG at fs/ocfs2/file.c:242! invalid opcode: [#1] SMP last sysfs file: /class/vc/vcs12/dev Modules linked in: nfs nfsd exportfs lockd nfs_acl ipv6 autofs4 ocfs2 rfcomm l2cap bluetooth ocfs2_dlmfs ocfs2 _dlm ocfs2_nodemanager configfs sunrpc video button battery ac uhci_hcd e7xxx_edac edac_mc i2c_i801 i2c_core t g3 e100 mii floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod mptfc scsi_transport_fc mptscsih mptbase sd_m od scsi_mod CPU:1 EIP:0060:[f8bc5ebe]Not tainted VLI EFLAGS: 00010286 (2.6.17-1.2142_FC4smp #1) EIP is at ocfs2_setattr+0x6a8/0x1000 [ocfs2] eax: 0073 ebx: ecx: 0246 edx: 0246 esi: 02b2 edi: ebp: c7fe esp: efc31e30 ds: 007b es: 007b ss: 0068 Process cp (pid: 21224, threadinfo=efc31000 task=f7e37730) Stack: efc31ea0 0001 f0cbf9c0 f5bf2000 f0cbfba0 0001 c9be8a60 f5bf2000 c9be8a60 f0cbfa78 efc31ea0 f0cbf9c0 c047e75e e6875774 0122 0068 45b55d27 e6875774 Call Trace: c047e75e notify_change+0x164/0x300 c0465388 do_truncate+0x54/0x6c c047372f may_open+0x1a8/0x1fc c04755ad open_namei+0x24b/0x5c3 c04666a6 do_filp_open+0x1c/0x31 c04667ad do_sys_open+0x3c/0xa9 c0466847 sys_open+0x16/0x18 c0403d2f syscall_call+0x7/0xb Code: fc ff ff ff b1 c0 fc ff ff 68 f2 00 00 00 68 8b 54 be f8 ff 70 10 8b 00 ff b0 b4 00 00 00 68 52 9e be f8 e8 1f e4 85 c7 83 c4 30 0f 0b f2 00 21 99 be f8 8b 4d 24 39 4c 24 28 8b 55 20 0f 82 c6 EIP: [f8bc5ebe] ocfs2_setattr+0x6a8/0x1000 [ocfs2] SS:ESP 0068:efc31e30 BUG: cp/21224, lock held at task exit time! [f0cbfa44] {inode_init_once} .. held by:cp:21224 [f7e37730, 119] ... acquired at: do_truncate+0x4b/0x6c (2535,0):o2net_set_nn_state:415 accepted connection from node rack10 (num 11) at 10.0.2.30: http://10.0.2.30: (2535,0):__dlm_print_nodes:377 Nodes in my domain (41AE1AA4C5534E50A93784D2AD94A94D): (2535,0):__dlm_print_nodes:381 node 1 (2535,0):__dlm_print_nodes:381 node 2 (2535,0):__dlm_print_nodes:381 node 3 (2535,0):__dlm_print_nodes:381 node 4 (2535,0):__dlm_print_nodes:381 node 5 (2535,0):__dlm_print_nodes:381 node 6 (2535,0):__dlm_print_nodes:381 node 7 (2535,0):__dlm_print_nodes:381 node 8 (2535,0):__dlm_print_nodes:381 node 9 (2535,0):__dlm_print_nodes:381 node 10 (2535,0):__dlm_print_nodes:381 node 11 (795,0):o2net_idle_timer:1284 connection to node rack6 (num 8) at 10.0.2.26: http://10.0.2.26: has been idle for 10 seconds, shutting it down. -- [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] ICQ:290677265 SKYPE:d.rossetti ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2 kernel bug in Fedora Core 4 update kernel
Not really. The mainline tree is labeled 1.3.x because it is the tree we add new features too. But bug fixes are applied to both 1.2 and 1.3 separately so it is hard to tell by the version# alone. This is the fix in the git tree: commit 4b1af774451bbc8440719e3fe441934a337c3b63 Author: Kurt Hackel [EMAIL PROTECTED] Date: Mon Jun 26 15:17:47 2006 -0700 ocfs2: Fix lvb corruption Properly ignore LVB flags during a PR downconvert. This avoids an illegal lvb update. Signed-off-by: Kurt Hackel [EMAIL PROTECTED] Signed-off-by: Mark Fasheh [EMAIL PROTECTED] davide rossetti wrote: On 1/23/07, *Sunil Mushran* [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: This was the lvb issue that was fixed long ago. In the 1.2 tree, it was fixed in 1.2.2. 2.6.18 should definitely have the fix for this. it seems it's even more recent: /var/log/messages.4:Dec 27 19:40:40 rack1 kernel: OCFS2 Node Manager 1.3.3 /var/log/messages.4:Dec 27 19:40:40 rack1 kernel: OCFS2 DLM 1.3.3 /var/log/messages.4:Dec 27 19:40:40 rack1 kernel: OCFS2 DLMFS 1.3.3 /var/log/messages.4:Dec 27 19:40:40 rack1 kernel: OCFS2 User DLM kernel interface loaded /var/log/messages.4:Dec 27 19:40:40 rack1 kernel: SELinux: initialized (dev ocfs2_dlmfs, type ocfs2_dlmfs), not configured for labeling /var/log/messages.4:Dec 27 19:40:44 rack1 kernel: OCFS2 1.3.3 -- [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] ICQ:290677265 SKYPE:d.rossetti ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] unable to configure O2CB_HEARTBEAT_THRESHOLD
The o2cb script fix is in ocfs2-tools 1.2.2 released Oct 2006. Ping SUSE for the update. [EMAIL PROTECTED] wrote: Using SuSE SP2 Linux running V1.0.8 of OCFS2 and the tools/console that comes with SP2 distribution. I am unable to set the* O2CB_HEARTBEAT_THRESHOLD* parameter in the /etc/sysconfig/o2cb file. In doing so, running o2cb configure overwrites it. Also noted that o2cb doesn't reference the parameter, even in its template file for sysconfig. How can I get this set? It has left our cluster in a hopeless situation and I'm reluctant to install SuSE SP3 as this will mean a full install of asmlibs and other software rebuilds. Phil Broughton ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2 kernel bug in Fedora Core 4 update kernel
This is not a fs issue. As in the file must be alright. This is a dlm issue. The fs is asking the dlm to free the lock and the dlm is stuck. How many nodes do you have? We've fixed a bunch of dlm bugs since what you appear to be running. davide rossetti wrote: I rebooted the two faulty nodes. now, I can't access anymore the file which was involved in the crash: /mi11/simma/ghmc/m24/JOB.log using the faq document, I'm trying to check the situation: Lockres: M003d60c63da894d788 Mode: No Lock Flags: Initialized Attached Busy RO Holders: 0 EX Holders: 0 Pending Action: Convert Pending Unlock Action: None sh-3.00# echo locate D003d60c63da894d788 | /sbin/debugfs.ocfs2 /dev/sdc1 debugfs.ocfs2 1.2.2 debugfs: 1029752381 /mi11/simma/ghmc/m24/JOB.log On another shell, I have a stuck un-killable process: theboss 14:25 (5) ~ls /storage/disk1/mi11/simma/ghmc/m24/ rossetti 7930 0.0 0.0 62324 872 pts/13 D+ 14:26 0:00 ls --color=tty -F /storage/disk1/mi11/simma/ghmc/m24/ How should I proceed to unlock the file and/or remove it ??? -- [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] ICQ:290677265 SKYPE:d.rossetti ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
[Ocfs2-users] OCFS2 1.2.4-2 released
All, We are pleased to announce the release of OCFS2 1.2.4-2. This release addresses the lowmem consumption issue that has plagued many users. It also addresses few races in the dlm relating to the lockres migration. The complete list of changes post 1.2.3 is available here: http://oss.oracle.com/projects/ocfs2/news/article_10.html Please note that we did have to update the network protocol in the 1.2.4 release and thus cannot support rolling upgrade from 1.2.3 or earlier to 1.2.4. We are well aware that this causes problems for many users and we would not have made the change if we did not think it necessary. We apologize for the inconvenience. The packages for Oracle's EL4 will be available on the ULN site sometime early next week. Novell has already incorporated all the patches bundled in this release in their SLES10 SP1 code tree. Please contact Novell for the release schedule. Packages for Red Hat's RHEL4 are available in the OCFS2 download area for all ten kernels (starting 2.6.9-22.EL) and six architectures. Follow the links to download the appropriate package. (Refer to the FAQ if you are unclear as to which package you need to install.) As always, we look forward to hearing from you on the ocfs2-users@oss.oracle.com mailing list. The OCFS2 Team OCFS2: http://oss.oracle.com/projects/ocfs2 TOOLS: http://oss.oracle.com/projects/ocfs2-tools/ FAQ: http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html GUIDE: http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_users_guide.pdf ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 mount problem
It could be that the device name is not the same across the two nodes. Do: # mounted.ocfs2 -d on both nodes. Match the device using the uuid. As in, you should see a device with the same uuid on both nodes. If not, then the device is not shared. If you do see the device on both nodes but with differing names, you could look into mounting by label. # mount -t ocfs2 -L label /c1 aibolit 66 wrote: Hello everybody! I'm using RHEL4 U4, kernel-2.6.9-42.0.8.EL and ocfs2-tools-1.2.2-1. Trying to set up OCFS2 on 2 nodes following this guide http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_users_guide.pdf: The problem is that after creation of cluster.conf via ocfs2console, and propagation it to second node, I can format /dev/sda5 as ocfs2FS, but I can mount it only on node1. Node2 mount fails with following error: [EMAIL PROTECTED] ~]# mount.ocfs2 /dev/sda5 /cl ocfs2_hb_ctl: Bad magic number in superblock while reading uuid mount.ocfs2: Error when attempting to run /sbin/ocfs2_hb_ctl: Operation not permitted I think, that I'm doing something wrong, because when I'm configuring nodes via ocfs2console, and when I'm formatting /dev/sda5, there is absolutely nothing happens between node1 and node2. No network traffic at all... Here is my cluster.conf that exists on both nodes: node: ip_port = ip_address = 89.XXX.134.24 number = 0 name = node1.Y.ru cluster = ocfs2 node: ip_port = ip_address = 89.XXX.134.25 number = 1 name = node2.Y.ru cluster = ocfs2 cluster: node_count = 2 name = ocfs2 Any help will be very appreciated. ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 mount problem
The device needs to be shared. As in, both nodes need to be able to see the same device concurrently. Refer to iscsi, fiber channel, aoe, etc. aibolit 66 wrote: -Original Message- From: Sunil Mushran [EMAIL PROTECTED] To: aibolit 66 [EMAIL PROTECTED] Date: Mon, 05 Feb 2007 12:46:26 -0800 Subject: Re: [Ocfs2-users] OCFS2 mount problem It could be that the device name is not the same across the two nodes. Do: # mounted.ocfs2 -d on both nodes. Match the device using the uuid. As in, you should see a device with the same uuid on both nodes. If not, then the device is not shared. If you do see the device on both nodes but with differing names, you could look into mounting by label. # mount -t ocfs2 -L label /c1 [EMAIL PROTECTED] ~]# mounted.ocfs2 -d DeviceFS UUID Label /dev/sda5 ocfs2 6a0fbcc8-675f-42c6-9bda-5513feb98a05 oracle [EMAIL PROTECTED] ~]# mounted.ocfs2 -d DeviceFS UUID Label I didn't format /dev/sda5 on second node, as guide.pdf says that. P.S. Sorry for previous dup, thought that your server didn't accept message from mail.ru :( ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] OCFS2 1.2.4-2 released
That's the source. Randy Ramsdell wrote: Mark Fasheh wrote: On Tue, Feb 06, 2007 at 10:18:51AM -0500, Randy Ramsdell wrote: Is source available? http://oss.oracle.com/projects/ocfs2/dist/files/source/v1.2/ocfs2-1.2.4.tar.gz --Mark -- Mark Fasheh Senior Software Developer, Oracle [EMAIL PROTECTED] I have this but thought it wasn't patched. It is patched with the -2 fixes? Randy ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] ocfs2-tools-1.2.2 compile.
The following patch will address this issue. The fix will be provided with the next tools release. Index: libocfs2/include/ocfs2.h === --- libocfs2/include/ocfs2.h(revision 1269) +++ libocfs2/include/ocfs2.h(revision 1270) @@ -48,6 +48,9 @@ #include byteorder.h +#if !defined(offsetof) +# define offsetof(type,memb) ((size_t)((type*)0)-memb) +#endif #if OCFS2_FLAT_INCLUDES #include o2dlm.h Randy Ramsdell wrote: Hi, The ocfs2 package compiled perfectly, but tools did not. The test setup is using opensuse10.1 - updates applied For ocfs2-tools-1.2.2: In file included from include/ocfs2.h:60, from alloc.c:32: include/ocfs2_fs.h: In function ‘ocfs2_fast_symlink_chars’: include/ocfs2_fs.h:566: warning: implicit declaration of function ‘offsetof’ include/ocfs2_fs.h:566: error: expected expression before ‘struct’ include/ocfs2_fs.h: In function ‘ocfs2_extent_recs_per_inode’: include/ocfs2_fs.h:574: error: expected expression before ‘struct’ include/ocfs2_fs.h: In function ‘ocfs2_chain_recs_per_inode’: include/ocfs2_fs.h:584: error: expected expression before ‘struct’ include/ocfs2_fs.h: In function ‘ocfs2_extent_recs_per_eb’: include/ocfs2_fs.h:594: error: expected expression before ‘struct’ include/ocfs2_fs.h: In function ‘ocfs2_local_alloc_size’: include/ocfs2_fs.h:604: error: expected expression before ‘struct’ include/ocfs2_fs.h: In function ‘ocfs2_group_bitmap_size’: include/ocfs2_fs.h:614: error: expected expression before ‘struct’ include/ocfs2_fs.h: In function ‘ocfs2_truncate_recs_per_inode’: include/ocfs2_fs.h:624: error: expected expression before ‘struct’ alloc.c: In function ‘ocfs2_init_inode’: alloc.c:143: warning: pointer targets in passing argument 1 of ‘strcpy’ differ in signedness alloc.c: In function ‘ocfs2_init_eb’: alloc.c:184: warning: pointer targets in passing argument 1 of ‘strcpy’ differ in signedness make[1]: *** [alloc.o] Error 1 make[1]: Leaving directory `/root/src/ocfs2-tools-1.2.2/libocfs2' make: *** [libocfs2] Error 2 Anyone know how to resolve this? Randy Ramsdell Foreclosure.com ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] 1.3.3 mount problem
The datavolume code is not in mainline. But you should be able to get Oracle RDBMS to work with it. Ensure the init.ora paramater filesystemio_options is set to direct_io. Ivo Maya wrote: Hi, I need to mount ocfs2 with datavolume option on open SuSE 10.2 Machines. ocfs2 is 1.3.3 version and does not support the datavolume option ??? I know it's not a supported distro (RHEL, Enterprise Linux, etc) but I want to test this specific distro. I tried to compile the 1.2.4 version but have some problems. Does that mean that ocfs2 is not supported on all 2.6 kernels? Why is it part of the kernel if you can't use all options normally available on the official distros? Tx Ivo Get your own web address. Have a HUGE year through Yahoo! Small Business. http://smallbusiness.yahoo.com/domains/?p=BESTDEAL ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users
Re: [Ocfs2-users] 1.2.4 symbols
What does dmesg say? Randy Ramsdell wrote: Hi, Everything compiled correctly for the ocfs2 package, but so far the modules will not load with the well known module symbol error. FATAL: Error inserting ocfs2 (/lib/modules/2.6.16.27-0.6-smp/kernel/fs/ocfs2/ocfs2.ko): Unknown symbol in module, or unknown parameter (see dmesg) Okay not sure what is up here, any suggestions? BTW, this is the correct module location and I manually ran depmod. thanks, randy ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users ___ Ocfs2-users mailing list Ocfs2-users@oss.oracle.com http://oss.oracle.com/mailman/listinfo/ocfs2-users