Re: [DRBD-user] Having Trouble with LVM on DRBD
Hello Eric,

On Thu, Feb 25, 2016 at 11:51 PM, Eric Robinson wrote:
> > I'm confused. I don't see the VG(s) and LV(s) under cluster control; have you done that bit?
>
> (blank stare)
>
> This is where I admit that I have no idea what you mean. I've been building clusters with drbd for a decade, and I've always had drbd on top of LVM and all has been well. This is the first time I have LVM on top of drbd. What am I missing?

Pacemaker needs to activate the VG once DRBD is primary ... this is described here:
https://drbd.linbit.com/users-guide/s-nested-lvm.html
... and ...
https://drbd.linbit.com/users-guide/s-lvm-pacemaker.html

Regards,
Andreas

> --Eric
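Putting the VG "under cluster control" boils down to a handful of crm resources and constraints. A minimal sketch only, with invented names (r0, vg_data and the p_/ms_ resources are placeholders, not Eric's actual setup):

    primitive p_drbd_r0 ocf:linbit:drbd \
            params drbd_resource="r0" \
            op monitor interval="29" role="Master" \
            op monitor interval="31" role="Slave"
    ms ms_drbd_r0 p_drbd_r0 \
            meta master-max="1" clone-max="2" notify="true"
    primitive p_lvm_vg_data ocf:heartbeat:LVM \
            params volgrpname="vg_data" \
            op monitor interval="30"
    colocation c_lvm_on_drbd_master inf: p_lvm_vg_data ms_drbd_r0:Master
    order o_drbd_promote_then_lvm inf: ms_drbd_r0:promote p_lvm_vg_data:start

The LVM agent activates and deactivates the VG that lives on the DRBD device, so it is only ever active on the node where DRBD is Primary; Filesystem and service resources then stack on top of p_lvm_vg_data in the same way.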
Re: [DRBD-user] Severe disk IO problems
On 2013-08-30 19:31, Stephen Marsh wrote: Hi all, I've recently upgraded to DRBD 8.4.3 (protocol C) on CentOS 6.4 (kernel 3.10.10) with Xen 4.3.0 on hardware RAID10 with an Infiniband 20Gbit/sec replication link. For a few days now, we've been experiencing a very strange issue whereby (seemingly randomly) the system will become almost unresponsive, with iowait going to 100% on some (but not all) domUs and dom0, but even the domUs whose load remains stable will still be incredibly sluggish. The problem occurs even when the resources are in standalone mode. One thing that I would check: if you are running the credit scheduler, dom0 may have run out of credits. Check/increase credit scheduler domain weights and make sure dom0 gets enough CPU time to serve i/o requests ... explained here in the Xen wiki http://goo.gl/fqtS6Y Regards, Andreas -- Need help with Linux-HA? http://www.hastexo.com/now Sometimes it self-corrects, but it's becoming more severe and is now less likely to go away without a reboot. Earlier today, the system running as primary was at 0.02 load, and the slave (which was doing nothing other than receiving updates from the master, no domUs running) went to 13 load and was pretty much dead. I've tried a variety of tuning options, including enabling disable_sendpage, but nothing is making it any better. Nothing is printed to the logs. My next thought is to try downgrading to DRBD 8.3, but considering support ends in December, I'd much prefer to continue using 8.4. I'm very much hoping that someone more experienced than myself will be able to offer some words of wisdom. :) Thanks Regards, Stephen Marsh ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
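A rough sketch of that check, assuming the default credit scheduler and the xl toolstack (the weight of 512 is only an example; 256 is the default):

    # show the current credit-scheduler weights/caps of all domains
    xl sched-credit
    # give dom0 more CPU time relative to the domUs so it can keep
    # servicing I/O requests under load
    xl sched-credit -d Domain-0 -w 512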
Re: [DRBD-user] Device is held open by someone
On 2013-02-26 13:04, Felipe Gutierrez wrote:
> Hi everyone, I am trying to do a failover system with drbd only. When my primary node drops off the network, the secondary node becomes primary and I mount the filesystem.
> secondary# drbdadm primary r7
> secondary# mount /dev/drbd7 /mnt/drbd7/
> Up to that point everything is ok. At this point my old primary node has to become the secondary and I have to discard my changes.
> primary# umount -l /mnt/drbd7
> primary# drbdadm secondary r7
> 7: State change failed: (-12) Device is held open by someone
> Command 'drbdsetup 7 secondary' terminated with exit code 11
> primary# drbdadm -- --discard-my-data connect r7
> Does anyone have a hint?

It's always worth checking device-mapper:

dmsetup ls --tree -o inverted

Regards,
Andreas
-- Need help with DRBD? http://www.hastexo.com/now

> Thanks in advance!
> Felipe
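In practice the hunt for the opener usually looks something like the following; the volume group name and the kpartx step are hypothetical examples covering the usual suspects (LVM having scanned and activated something on the DRBD device, or a stale partition mapping):

    # what is stacked on top of the device, and who has it open?
    dmsetup ls --tree -o inverted
    ls /sys/block/drbd7/holders
    lsof /dev/drbd7
    # if LVM activated a VG on it, deactivate that VG first
    vgchange -a n vg_on_drbd7
    # or remove a stale partition mapping
    kpartx -d /dev/drbd7
    # then demoting should succeed
    drbdadm secondary r7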
Re: [DRBD-user] “The peer's disk size is too small!” messages on attempts to add rebuilt peer
Please don't bypass the mailing-list ... On 12/21/2012 06:04 PM, Anthony G. wrote: Thank you for your input. That was my first thought, but I caught hell trying to get the partition sizes to match. I'm not sure which size reading I need to take on -nfs2 and then which specific lvcreate command I need to execute on -nfs1 to get the size on the latter set properly. well, you could try the one I put in my previous answer ... and it does not need to be of the exact size on nfs1 ... equal or more I've recreated the lv, though (just to try and make some progress), and am now getting the following, when I try to 'service drbd start' on -nfs1: DRBD's startup script waits for the peer node(s) to appear. - In case this node was already a degraded cluster before the reboot the timeout is 0 seconds. [degr-wfc-timeout] - If the peer was available before the reboot the timeout will expire after 0 seconds. [wfc-timeout] (These values are for resource 'nfs'; 0 sec - wait forever) To abort waiting enter 'yes' [ 123]:yes 'netstat -a' doesn't show -nfs2 listening on port 7789, but I do see drbd-related processes running on that box. so the resource on nfs2 is in disconnected state do a drbdadm adjust nfs on nfs2 Regards, Andreas -Anthony Date: Fri, 21 Dec 2012 17:25:01 +0100 From: andr...@hastexo.com To: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] “The peer's disk size is too small!” messages on attempts to add rebuilt pee On 12/21/2012 12:13 AM, Anthony G. wrote: Hi, There's so much information relating to my current configuration, that I'm not sure what I should post here. Let me start by saying that I had two Ubuntu 10.04 hosts configured in a DRBD relationship: sf02-nfs1 (primary) and sf0-nfs2 (secondary). -nfs1 suffered a major filesystem fault. I had to make -nfs2 primary and rebuild -nfs1. I want to eventually have all of my machines on 12.04, so I took this as an opportunity to set -nfs1 on that OS. Here is a copy of my main configuration file (/etc/drbd.d/nfs.res): resource nfs { on sf02-nfs2 { device/dev/drbd0; disk /dev/ubuntu/drbd-nfs; address 10.0.6.2:7789; meta-disk internal; } on sf02-nfs1 { device/dev/drbd0; disk /dev/ubuntuvg/drbd-nfs; address 10.0.6.1:7789; meta-disk internal; } } I'm trying to re-introduce -nfs1 into the DRBD relationship and am having trouble. I have: 1.) created the resource nfs on -nfs1 ('drbdadm create-md nfs') 2.) run 'drbdadm primary nfs' on -nfs2 and 'drbdadm secondary nfs' on -nfs1. 3.) run drbdadm -- --overwrite-data-of-peer primary all' from -nfs2. But /var/log/kern.log shows: = Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.843938] block drbd0: Handshake successful: Agreed network protocol version 91 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.843949] block drbd0: conn( WFConnection - WFReportParams ) Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844171] block drbd0: Starting asender thread (from drbd0_receiver [2452]) Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844539] block drbd0: data-integrity-alg: not-used Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844610] block drbd0: *The peer's disk size is too small!* Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844617] block drbd0: conn( WFReportParams - Disconnecting ) Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844626] block drbd0: error receiving ReportSizes, l: 32! 
Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844680] block drbd0: asender terminated Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844691] block drbd0: Terminating asender thread Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844746] block drbd0: Connection closed Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844755] block drbd0: conn( Disconnecting - StandAlone ) Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844791] block drbd0: receiver terminated Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844794] block drbd0: Terminating receiver thread = So, it seems that a difference in the size of drbd0 on the respective machines is the source of my trouble. 'cat /proc/partitions' (output pasted at the end of this message) on each machine tells me that -nfs2's partition is around 348148 blocks larger than -nfs1's. -nfs2 contains my company's Production data, so I do not, of course, want to do anything destructive there. I can, however, certainly recreate the resource on -nfs1. Does anyone out there know what steps I need to take to make the partition sizes match? Of course, I'm working under the belief that the peer's disk size is too small message points up the source of my trouble. Let me know, of course, if I need to post more information on my setup. You are using LVM, so simply resize the lv below DRBD on nfs1 to be at least of the same size or bigger ala: lvresize -L+200M ubuntuvg/drbd-nfs ... then
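Spelled out, that sizing step might look like this (the 500G figure is only an example; take the real value from the surviving node):

    # on sf02-nfs2: exact size of the backing LV, in bytes
    blockdev --getsize64 /dev/ubuntu/drbd-nfs
    # on sf02-nfs1: create an LV of the same size or larger; rounding up
    # is fine, DRBD simply will not use the surplus
    lvcreate -n drbd-nfs -L 500G ubuntuvg
    drbdadm create-md nfs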
Re: [DRBD-user] “The peer's disk size is too small!” messages on attempts to add rebuilt peer
On 12/21/2012 06:39 PM, Anthony G. wrote: well, you could try the one I put in my previous answer ... and it does not need to be of the exact size on nfs1 ... equal or more I will try that. It's probably apparent, but I'm new to LVM and DRBD. Is the drbdadm adjust nfs on nfs2 something that I can do while that system is up-and-running and servicing Production requests? Yes that can be done online ... use -d switch for dry-run and you should only see a connect command as output Regards, Andreas Thanks, again, -Anthony Date: Fri, 21 Dec 2012 18:12:23 +0100 From: andr...@hastexo.com To: drbd-user@lists.linbit.com CC: agenere...@hotmail.com Subject: Re: [DRBD-user] “The peer's disk size is too small!” messages on attempts to add rebuilt pee Please don't bypass the mailing-list ... On 12/21/2012 06:04 PM, Anthony G. wrote: Thank you for your input. That was my first thought, but I caught hell trying to get the partition sizes to match. I'm not sure which size reading I need to take on -nfs2 and then which specific lvcreate command I need to execute on -nfs1 to get the size on the latter set properly. well, you could try the one I put in my previous answer ... and it does not need to be of the exact size on nfs1 ... equal or more I've recreated the lv, though (just to try and make some progress), and am now getting the following, when I try to 'service drbd start' on -nfs1: DRBD's startup script waits for the peer node(s) to appear. - In case this node was already a degraded cluster before the reboot the timeout is 0 seconds. [degr-wfc-timeout] - If the peer was available before the reboot the timeout will expire after 0 seconds. [wfc-timeout] (These values are for resource 'nfs'; 0 sec - wait forever) To abort waiting enter 'yes' [ 123]:yes 'netstat -a' doesn't show -nfs2 listening on port 7789, but I do see drbd-related processes running on that box. so the resource on nfs2 is in disconnected state do a drbdadm adjust nfs on nfs2 Regards, Andreas -Anthony Date: Fri, 21 Dec 2012 17:25:01 +0100 From: andr...@hastexo.com To: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] “The peer's disk size is too small!” messages on attempts to add rebuilt pee On 12/21/2012 12:13 AM, Anthony G. wrote: Hi, There's so much information relating to my current configuration, that I'm not sure what I should post here. Let me start by saying that I had two Ubuntu 10.04 hosts configured in a DRBD relationship: sf02-nfs1 (primary) and sf0-nfs2 (secondary). -nfs1 suffered a major filesystem fault. I had to make -nfs2 primary and rebuild -nfs1. I want to eventually have all of my machines on 12.04, so I took this as an opportunity to set -nfs1 on that OS. Here is a copy of my main configuration file (/etc/drbd.d/nfs.res): resource nfs { on sf02-nfs2 { device /dev/drbd0; disk /dev/ubuntu/drbd-nfs; address 10.0.6.2:7789; meta-disk internal; } on sf02-nfs1 { device /dev/drbd0; disk /dev/ubuntuvg/drbd-nfs; address 10.0.6.1:7789; meta-disk internal; } } I'm trying to re-introduce -nfs1 into the DRBD relationship and am having trouble. I have: 1.) created the resource nfs on -nfs1 ('drbdadm create-md nfs') 2.) run 'drbdadm primary nfs' on -nfs2 and 'drbdadm secondary nfs' on -nfs1. 3.) run drbdadm -- --overwrite-data-of-peer primary all' from -nfs2. 
But /var/log/kern.log shows: = Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.843938] block drbd0: Handshake successful: Agreed network protocol version 91 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.843949] block drbd0: conn( WFConnection - WFReportParams ) Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844171] block drbd0: Starting asender thread (from drbd0_receiver [2452]) Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844539] block drbd0: data-integrity-alg: not-used Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844610] block drbd0: *The peer's disk size is too small!* Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844617] block drbd0: conn( WFReportParams - Disconnecting ) Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844626] block drbd0: error receiving ReportSizes, l: 32! Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844680] block drbd0: asender terminated Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844691] block drbd0: Terminating asender thread Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844746] block drbd0: Connection closed Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844755] block drbd0: conn( Disconnecting - StandAlone ) Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844791] block drbd0: receiver terminated Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844794] block drbd0: Terminating receiver thread = So, it seems that a difference in the size of drbd0 on the respective machines is the source of my trouble. 'cat
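The dry run Andreas mentions looks like this (resource name taken from the thread):

    # -d only prints the drbdsetup/drbdmeta commands it would run
    drbdadm -d adjust nfs
    # if that shows nothing but a connect/net command, apply it for real
    drbdadm adjust nfs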
Re: [DRBD-user] Still experiencing resource spikes
On 12/17/2012 08:27 PM, Prater, James K. wrote: I have a real sticky problem and would appreciate if anyone has insight. We currently have the following physical configuration 2) Dell PowerEdge R710 (dual 6-core with hyperthreading enabled) 120Gbytes of memory each That's quite a lot ... you also tuned vm.dirty_background_bytes and vm.dirty_bytes to a reasonable low value? to avoid regular heavy data write-out Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now 4x 10GE Nics (2-bonded for NFS and 2-bonded for Replication) Broadcom 57711. 4x 8Gbit FC HBAs which are tied to a dual controller NEXSAN E60 array (controllers in non-redundant mode), each controller has 4Gbytes of memory. Raw throughput is around 340Mbytes/sec per volume. Each system is running RHEL 6.3 with heartbeat and now with DRBD-8.4.2, problem was there with 8.4.0 (could not get 8.4.1 to work). System configured as Active/Passive pair with EXT4 and the filesystem, barriers off. Filesystems exported via NFS to vSphere 4.1 clients. Main problem is that everything works for most of the time but every now and then a resource stall (high load average and no I/O) occurs which Is not good for running VMs. Has anyone seen this?No errors recorded just no I/O and high load for a few minutes (3-4).This has been driving me crazy. One more thing, these events do not occur with “replication disabled” i.e. drbdadm down all (on the peer member). I have adjusted many sysctl parameters (up memory buffers , etc) , changed I/O schedulers and turned on and off hyperthreading and still have the issue. Thanks in advance. James ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
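As an illustration of that tuning, values along these lines are a common starting point (the numbers are examples and need to be sized against the array and workload, not copied blindly):

    # limit dirty page cache so write-out happens early and steadily
    # instead of in huge bursts that stall I/O
    sysctl -w vm.dirty_background_bytes=268435456    # 256 MB
    sysctl -w vm.dirty_bytes=536870912               # 512 MB
    # persist the values in /etc/sysctl.conf once they have proven themselves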
Re: [DRBD-user] FileSystem Resource Won't Start
Hi Eric,

On 12/11/2012 08:42 PM, Robinson, Eric wrote:
> > Add something like this:
> > order o_drbd_then_group_clust08 inf: ms_drbd0:promote g_clust08:start
> > order o_drbd_then_group_clust09 inf: ms_drbd1:promote g_clust09:start
> > colocation c_group_clust08_on_drbd_master inf: g_clust08 ms_drbd0:Master
> > colocation c_group_clust09_on_drbd_master inf: g_clust09 ms_drbd1:Master
>
> Do I really need a colocation and an order?

Yes ... one of each for every drbd m/s resource in such a setup.

> Doesn't a colo imply the order?

No.

Regards,
Andreas
-- Need help with DRBD? http://www.hastexo.com/now

> --
> Eric Robinson
Re: [DRBD-user] Slow Reads on VM - Xenserver and DRBD
On 07/24/2012 11:46 AM, Phil Stricker wrote: Hi! I think, I am seing two different Problems of which one is isolated: - Slow performance in a VM The issue seams to be related to different OSs: In a Debian-VM, I am getting nearly full speed of the drbd-block-device , in a CentOS 5.8 VM, I can only see 10% of that speed. You are using paravirtualized drivers in the VM? - Slow overall performance of the raid-array: I cannot reach more than 450 MB/s on the array (LSI 9260-4i, 8x Intel 320 160 GB SSD). I see 450 MB/s in a Raid10 setting and I see 450MB/s in a Raid0 test setting. That is crazy! You already shared your drbd configuration? ... and the controller has a non-volatile cache? Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Did you ever see that behaviour? Best wishes, Phil -Original Message- From: f...@mpexnet.de [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Felix Frank Sent: Tuesday, July 24, 2012 11:17 AM To: Christian Balzer Cc: drbd-user@lists.linbit.com Subject: Re: [DRBD-user] Slow Reads on VM - Xenserver and DRBD On 07/24/2012 11:11 AM, Christian Balzer wrote: Yeah, I've read the same thing and am leaning towards KVM for a fully virtualized system, though Vserver and (in the future) LXC work for 90% of the requirements I have. Seconded. Linux vServer + DRBD make for a robust HA setup at very good performance. A vserver RA for pacemaker is floating through the web. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Strange alerts on zabbix showing pacemaker down
On 07/09/2012 01:17 AM, Richard Goetz wrote: Hi DRBD Folks I have a strange issue occurring where zabbix checks for dbbdadmin/pacemaker and alerting at random intervals. This all started after doing a test fail over of master node using drbd. Some of the checks that fail call are executed by zabbix COMMAND=/sbin/drbdadm dstate harddisk COMMAND=/sbin/drbdadm cstate ssddisk COMMAND= /usr/sbin/crm_mon -s COMMAND= /usr/sbin/crm_mon -1 At first i thought that this was a zabbix only problem but then I began to suspect something was going awry.After a few dozen alerts in the middle of the night with no load on system I began to suspect that this was something else. During an event where the timeout of checks for pacemaker drbdadm fails. I was unable to log into systems in timely manner. I have attempted to login to log into mysql server to see what may because this blocking during a alerting event but I noticed that it is taking 2-5 mins to log into server which seemed off for server with LoavAvg in 0.0[1-9] range and iostat -dx was not over capacity. (i checked as soon as I was able to login) I turned sar on server to get better data and found 2 other things occurring at exactly the same time. A spike in totsck and one of the cores having high cpu utilization. Normally totsck was in 500 range but during event it was in 1500 range. So this is a mysql database and applications connect to it ... have you checked if all those tcp connections, and a lot of them are in TIME_WAIT state, are mysql connections? Have you been able to do a remote mysql connection and executing a SHOW PROCESSLIST? Have you tried to do a ssh connection with debug output? ... ssh -vvv to see more information DNS resolution is working fine? sshd and Mysql do reverse DNS lookups per default ... Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now totscktcpsckudpsckrawsck ip-fragtcp-tw 10:45:01 AM 1561 29318 0 0 838 10:35:01 AM CPU %usr %nice %sys %iowait%steal %irq %soft%guest %idle 10:45:01 AM 5 11.64 0.00 43.87 0.02 0.00 0.00 0.04 0.00 44.42 \ 06:45:01 AMtotscktcpsckudpsckrawsck ip-fragtcp-tw 03:05:06 AM 1562 28617 0 0 859 03:15:01 AM 1548 28617 0 0 869 10:35:01 AM CPU %usr %nice %sys %iowait%steal %irq %soft%guest %idle 03:15:01 AM 6 20.88 0.00 79.09 0.00 0.00 0.00 0.03 0.00 0.00 It is clear that something is occurring on server when this occurs and also always occurring in syslog at same time are following events(although the same events occur when zabbix checks/inability to login do no appear to occur also) Jul 8 03:05:44 mysql-1 lrmd: [2834]: info: operation monitor[191] on ip1 for client 2837: pid 7573 exited with return code 0 Jul 8 03:08:00 mysql-1 crmd: [2837]: info: crm_timer_popped: PEngine Recheck Timer (I_PE_CALC) just popped (90ms) Jul 8 03:08:00 mysql-1 crmd: [2837]: info: do_state_transition: State transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_tim er_popped ] Jul 8 03:08:00 mysql-1 crmd: [2837]: info: do_state_transition: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED Jul 8 03:08:00 mysql-1 crmd: [2837]: info: do_state_transition: All 2 cluster nodes are eligible to run resources. 
Jul 8 03:08:00 mysql-1 crmd: [2837]: info: do_pe_invoke: Query 867: Requesting the current CIB: S_POLICY_ENGINE Jul 8 03:08:00 mysql-1 crmd: [2837]: info: do_pe_invoke_callback: Invoking the PE: query=867, ref=pe_calc-dc-1341731280-1029, seq=32, quorate=1 Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: unpack_config: On loss of CCM Quorum: Ignore Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: unpack_rsc_op: Operation ip1arp_last_failure_0 found resource ip1arp active on mysql-2 Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: unpack_rsc_op: Operation ip1arp_last_failure_0 found resource ip1arp active on mysql-1 Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave fs_mysql#011(Started mysql-1) Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave fs_binlog#011(Started mysql-1) Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave ip1#011(Started mysql-1) Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave mysql#011(Started mysql-1) Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave ip1arp#011(Started mysql-1) Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave drbd_binlog:0#011(Slave mysql-2) Jul 8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions:
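To narrow down what Andreas is asking about during such an event, checks along these lines would help (the IP address and MySQL credentials are placeholders):

    # socket summary; a TIME_WAIT spike stands out immediately
    ss -s
    netstat -ant | awk '{print $6}' | sort | uniq -c | sort -rn
    # are the extra sockets MySQL connections?
    mysql -h 127.0.0.1 -u root -p -e 'SHOW PROCESSLIST'
    # is reverse DNS slow? sshd and mysqld both do lookups by default
    dig -x 10.0.0.50 +short
    # if DNS is the culprit: skip-name-resolve in my.cnf,
    # UseDNS no in /etc/ssh/sshd_config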
Re: [DRBD-user] Parse error: 'an option keyword' expected, but got 'fence-peer'
On 06/23/2012 01:47 AM, Keith Christian wrote: I've searched for a solution to this error, lots of hits for Parse error but couldn't find anything specific for fence-peer. I have checked the drbd.conf file for obvious errors like unbalanced braces, and missing semicolons at the end of line. Nothing found. Using these RPM's: drbd82-8.2.6-1.el5.centos kmod-drbd82-8.2.6-2 Really, really, really consider an update to DRBD 8.3.x ... fence-peer was was named outdate-peer in earlier days. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now This is on a 64 bit system, so I fixed line 31 which needed lib64 to find the file: ls -l /usr/lib64/heartbeat/drbd-peer-outdater -rwxr-xr-x 1 root root 15984 Feb 6 2008 /usr/lib64/heartbeat/drbd-peer-outdater When running any DRBD command I see this error: drbdadm create-md drbd-resource-0 /etc/drbd.conf:31: Parse error: 'an option keyword' expected, but got 'fence-peer' I commented out line 31, tried to start DRBD again, and saw the error on line 56, removed the comment from line 31, and the error returns to line 31. service drbd start /etc/drbd.conf:56: Parse error: 'an option keyword' expected, but got 'outdated-wfc-timeout' Starting DRBD resources:/etc/drbd.conf:56: Parse error: 'an option keyword' expected, but got 'outdated-wfc-timeout' 53 # Wait for connection timeout if the peer node is already outdated. 54 # (Do not set this to 0, since that means unlimited) 55 # *** 56 outdated-wfc-timeout 2; # 2 seconds. 57# In case there was a split brain situation the devices will 58 # drop their network configuration instead of connecting. Since Below are the first 35 lines of the file, which enclose the line throwing the error: 1 global { usage-count no; } 2 3 resource drbd-resource-0 { 4 protocol C; 5 6 handlers { 7 # what should be done in case the node is primary, degraded 8 # (=no connection) and has inconsistent data. 9 pri-on-incon-degr /usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b /proc/sysrq-trigger ; reboot -f; 10 11 # The node is currently primary, but lost the after split brain 12 # auto recovery procedure. As as consequence it should go away. 13 pri-lost-after-sb /usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b /proc/sysrq-trigger ; reboot -f; 14 15 # In case you have set the on-io-error option to call-local-io-error, 16 # this script will get executed in case of a local IO error. It is 17 # expected that this script will case a immediate failover in the 18 # cluster. 19 local-io-error /usr/lib/drbd/notify-local-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o /proc/sysrq-trigger ; halt -f; 20 21 22 # Commands to run in case we need to downgrade the peer's disk 23 # state to Outdated. Should be implemented by the superior 24 # communication possibilities of our cluster manager. 25 # The provided script uses ssh, and is for demonstration/development 26 # purposis. 27 # fence-peer /usr/lib/drbd/outdate-peer.sh on amd 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12; 28 # 29 # Update: Now there is a solution that relies on heartbeat's 30 # communication layers. You should really use this. *** 31 fence-peer /usr/lib64/heartbeat/drbd-peer-outdater -t 5; 32 # For Pacemaker you might use: 33 # fence-peer /usr/lib/drbd/crm-fence-peer.sh; 34 35 } I'd appreciate any insight or help. 
== Keith
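If upgrading is not immediately possible, then per Andreas' note about the rename, the 8.2-era spelling of that handler should be roughly:

    handlers {
        # DRBD 8.2.x still calls this handler outdate-peer;
        # "fence-peer" only exists from 8.3 onwards, which is why the
        # 8.2 parser refuses it
        outdate-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
    }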
Re: [DRBD-user] Protocol A Pending
On 06/22/2012 09:38 PM, J.R. Lillard wrote: Witnessed another bandwidth spike that slowed my stacked layer down. 10: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate A r- ns:192538032 nr:0 dw:599650316 dr:1701817080 al:4481613 bm:43214 lo:1 pe:2050 ua:0 ap:2049 ep:1 wo:f oos:0 resync: used:0/61 hits:3274 misses:165 starving:0 dirty:0 changed:165 act_log: used:135/3833 hits:788561 misses:25007 starving:0 dirty:28 changed:24979 You also increased al-extents for the lower-level DRBD device of your stacked resource? Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now On Fri, Jun 22, 2012 at 8:24 AM, J.R. Lillard jlill...@ghfllc.com mailto:jlill...@ghfllc.com wrote: My starving count was pretty high. I maxed out my al-extents and will see if that helps. Thanks. On Fri, Jun 22, 2012 at 4:22 AM, Andreas Kurz andr...@hastexo.com mailto:andr...@hastexo.com wrote: On 06/20/2012 11:51 PM, J.R. Lillard wrote: I have a lower-level setup on Protocol C and a stacked layer of Protocol A going through Proxy over a WAN. There are times when my disk activity spikes causing the Proxy buffers to fill up a bit. While this is happening the Pending count on my stacked resources goes up and causes my access to those resources to slow down. Is this normal? I thought with Protocol A as soon as my local write was finished things would continue. You checked that you are not running out of activity log extents for the stacked resource? Do an: echo 1 /sys/module/drbd/parameters/proc_details ... and have a look at starving counter in /proc/drbd ... should ideally be 0 and definitely not increasing regularly. If it does, increase the al-extents value for the stacked resource (max is 3833 in drbd 8.3 ... IIRC) Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now -- J.R. Lillard System / Network Admin Web Programmer Golden Heritage Foods 120 Santa Fe St. Hillsboro, KS 67063 ___ drbd-user mailing list drbd-user@lists.linbit.com mailto:drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com mailto:drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user -- J.R. Lillard System / Network Admin Web Programmer Golden Heritage Foods 120 Santa Fe St. Hillsboro, KS 67063 -- J.R. Lillard System / Network Admin Web Programmer Golden Heritage Foods 120 Santa Fe St. Hillsboro, KS 67063 ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Protocol A Pending
On 06/20/2012 11:51 PM, J.R. Lillard wrote: I have a lower-level setup on Protocol C and a stacked layer of Protocol A going through Proxy over a WAN. There are times when my disk activity spikes causing the Proxy buffers to fill up a bit. While this is happening the Pending count on my stacked resources goes up and causes my access to those resources to slow down. Is this normal? I thought with Protocol A as soon as my local write was finished things would continue. You checked that you are not running out of activity log extents for the stacked resource? Do an: echo 1 /sys/module/drbd/parameters/proc_details ... and have a look at starving counter in /proc/drbd ... should ideally be 0 and definitely not increasing regularly. If it does, increase the al-extents value for the stacked resource (max is 3833 in drbd 8.3 ... IIRC) Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now -- J.R. Lillard System / Network Admin Web Programmer Golden Heritage Foods 120 Santa Fe St. Hillsboro, KS 67063 ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
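Put together, the check-and-tune cycle reads roughly like this; the resource name is a placeholder and the al-extents value is the 8.3 maximum Andreas quotes:

    echo 1 > /sys/module/drbd/parameters/proc_details
    watch -n1 cat /proc/drbd          # watch the "starving" counter
    # in the resource definition (8.3 syntax):
    #   syncer { al-extents 3833; }
    # then apply it online on both nodes
    drbdadm adjust <resource>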
Re: [DRBD-user] Partition being synced must be on its own LVM volume?
On 06/21/2012 11:10 PM, Keith Christian wrote: In previous DRBD installations, a separate physical (e.g. non-LVM) partition was kept in sync with DRBD. This partition was, say, /dev/sda3 and was mounted on a /data directory in /etc/ha.d/ haresources. While planning this for a new LVM based install, it occurs to me that the partition that is mounted on /data must be a separate Logical Volume. Currently, /data resides in the same logical volume that hosts /. Am I correct in thinking that when DRBD starts, it would sync not only /data, but everything else in / too, including /root, /var, /home, etc.? I don't see how it wouldn't. So it appears I'll have to create a separate logical volume and mount /data on it as in the past. Sound reasonable, DRBD people? yes ... DRBD does block-level replication. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Thanks. =Keith ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
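A minimal sketch of that layout, with invented names, addresses and sizes:

    # dedicated LV for the replicated data only
    lvcreate -n lv_data -L 100G vg00

    resource r0 {
        device    /dev/drbd0;
        disk      /dev/vg00/lv_data;
        meta-disk internal;
        on nodeA { address 10.0.0.1:7788; }
        on nodeB { address 10.0.0.2:7788; }
    }

    # after the initial sync, /data is mounted from the DRBD device,
    # not from the LV directly:  mount /dev/drbd0 /data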
Re: [DRBD-user] DRBD Filesystem Pacemaker Resources Stopped
On 03/24/2012 12:09 AM, Robert Langley wrote: Maybe I need to post this with Pacemaker? Not sure. I am a bit new to this scene and trying my best to learn all of this (Linux/DRBD/Pacemaker/Heartbeat). I am in the middle of following this document, Highly available NFS storage with DRBD and Pacemaker located at: http://www.linbit.com/en/education/tech-guides/highly-available-nfs-with-drbd-and-pacemaker/ OS: Ubuntu 11.10 DRBD version: 8.3.11 Pacemaker version: 1.1.5 I have two servers with 2.4 TB of internal hard drive space each, plus mirrored hard drives for the OS. They both have 10 NICs (2 onboard in a bond and 8 between 2, 4 port intel NICs). Issue: I got to the end of part 4.3 (commit) and that is when things went bad. I actually ended up with a split-brain and I seem to have recovered from that, but now my resources are as follows (running crm_mon -1): My slave node is actually showing as the Master under the Master/Slave Set: ms_drbd_nfs [p_drbd_nfs] Clone set Started Resource Group: Only p_lvm_nfs is Started on my slave node. All of the Filesystem resources are Stopped. Then, I have this at the bottom: Failed actions: p_fs_vol01_start_0 (node=ds01, call=46, rc=5, status=complete): not installed p_fs_vol01_start_0 (node=ds02, call=430, rc=5, status=complete): not installed Mountpoint created on both nodes, defined correct device and valid file system? What happens after a cleanup? ... crm resource cleanup p_fs_vol01 ... grep for Filesystem in your logs to get the error output from the resource agent. For more ... please share current drbd state/configuration and your cluster configuration. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Looking in the syslog on ds01 (primary node) does not reveal anything worth mentioning; but, looking at the syslog on ds02 (secondary node) shows the following messages: pengine: [11725]: notice: unpack_rsc_op: Hard error - p_fs_vol01_start_0 failed with rc=5: Preventing p_fs_vol01 from re-starting on ds01 pengine: [11725]: WARN: unpack_rsc_op: Processing failed op p_fs_vol01_start_0 on ds01: not installed (5) pengine: [11725]: notice: unpack_rsc_op: Operation p_lsb_nfsserver:1_monitor_0 found resource p_lsb_nfsserver:1 active on ds02 pengine: [11725]: notice: unpack_rsc_op: Hard error - p_fs_vol01_start_0 failed with rc=5: Preventing p_fs_vol01 from re-starting on ds02 pengine: [11725]: WARN: unpack_rsc_op: Processing failed op p_fs_vol01_start_0 on ds02: not installed (5) pengine: [11725]: notice: native_print: failover-ip#011(ocf::heartbeat:IPaddr):#011Stopped pengine: [11725]: notice: clone_print: Master/Slave Set: ms_drbd_nfs [p_drbd_nfs] ... pengine: [11725]: WARN: common_apply_stickiness: Forcing p_fs_vol01 away from ds01 after 100 failures (max=100) pengine: [11725]: notice: common_apply_stickiness: p_lvm_nfs can fail 99 more times on ds02 before being forced off pengine: [11725]: WARN: common_apply_stickiness: Forcing p_fs_vol01 away from ds02 after 100 failures (max=100) pengine: [11725]: notice: LogActions: Leave failover-ip#011(Stopped) pengine: [11725]: notice: LogActions: Leave p_drbd_nfs:0#011(Slave ds01) pengine: [11725]: notice: LogActions: Leave p_drbd_nfs:1#011(Master ds02) Thank you in advance for any assistance, Robert ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] drbd read-only mode
On 03/15/2012 03:31 PM, зоррыч wrote: Read-only flag stands on two nodes. I will use a cluster file system (ocfs2 of gfs2) for the correct operation of the two disks read-only flag??? Do you mean the ro:Primary/Primary which indicates the role (ro..le) as being Primary on both sides? Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now *From:*Marcelo Pereira [mailto:marcel...@gmail.com] *Sent:* Thursday, March 15, 2012 6:26 PM *To:* ?? *Subject:* Re: [DRBD-user] drbd read-only mode Why do you want it to be primary on both nodes? --Marcelo On Mar 15, 2012, at 10:21 AM, ?? zo...@megatrone.ru mailto:zo...@megatrone.ru wrote: Hi I installed 8.4.1 drdb. The cluster operates on a primary/primary mode. However, the drives are mounted in the mode of read-only [root@noc-1-m77 /]# cat /proc/drbd version: 8.4.1 (api:1/proto:86-100) GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by r...@noc-1-synt.rutube.ru mailto:r...@noc-1-synt.rutube.ru, 2012-03-14 10:05:49 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r- ^^ ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 Why not activate the read-write mode? Config: [root@noc-1-synt /]# cat /etc/drbd.d/r0.res # create new resource r0 { startup { wfc-timeout 20; degr-wfc-timeout 10; # we will keep this commented until tested successfully: become-primary-on both; } net { protocol C; allow-two-primaries; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; } # DRBD device device /dev/drbd0; # phisical device disk /dev/vg_noc1synt/lv02; meta-disk internal; on noc-1-synt.rutube.ru http://noc-1-synt.rutube.ru { # IP address:port address 10.1.20.10:7788; } on noc-1-m77.rutube.ru http://noc-1-m77.rutube.ru { address 10.2.20.9:7788; } } [root@noc-1-synt /]# ___ drbd-user mailing list drbd-user@lists.linbit.com mailto:drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] drbdadm verify all question
On 03/13/2012 03:44 PM, Maurits van de Lande wrote:
> I have got a drbd setup with a primary and a secondary node. I would like to verify if the secondary node is in sync with the primary node. On which node do I have to execute
> # drbdadm verify all

It doesn't matter on which node you start it. If differences are found they are _not_ automatically resynced ... out-of-sync blocks are resynced after doing a disconnect/reconnect, and data is synced from Primary to Secondary ... no matter where you started the verify.

Regards,
Andreas
-- Need help with DRBD? http://www.hastexo.com/now

> Thanks,
> Maurits van de Lande
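Spelled out, with the resource name as a placeholder:

    drbdadm verify r0            # run on either node
    # check /proc/drbd afterwards: "oos:" shows blocks found out of sync
    # to actually resync them, bounce the connection once
    drbdadm disconnect r0
    drbdadm connect r0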
Re: [DRBD-user] Pacemaker + Dual Primary, handlers and fail-back issues
Hello, On 02/29/2012 08:08 PM, Daniel Grunblatt wrote: Hi, I have a 2 node cluster with sles11sp1, with the latest patches. Configured Pacemaker, dual primary drbd and xen. see below for some comments ... Here's the configuration: - drbd.conf global { usage-count yes; } common { protocol C; disk { on-io-errordetach; fencing resource-only; for dual-primary setup use resource-and-stonith } syncer { rate 1G; wow ... only set that high rate if your I/O system can handle that. al-extents 3389; } net { allow-two-primaries; # Enable this *after* initial testing cram-hmac-alg sha1; shared-secret a6a0680c40bca2439dbe48343cf4; after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; } startup { become-primary-on both; this is only done by the drbd init script which is hopefully deactivated ... anyway ... remove that directive } handlers { fence-peer /usr/lib/drbd/crm-fence-peer.sh; after-resync-target /usr/lib/drbd/crm-unfence-peer.sh; for dual-primary setup with a cluster file system and resource-and-stonith in combination with Pacemaker you can use a fencing script like stonith_admin-fence-peer.sh that recently found its way into the drbd repository: http://goo.gl/XfSfo ... thanks to my colleague Florian Haas for contributing this nice script ;-) You don't need after-resync-target for that kind of setup. } } resource vmsvn { device/dev/drbd0; disk /dev/sdb; meta-disk internal; on xm01 { address 100.0.0.1:7788; } on xm02 { address 100.0.0.2:7788; } } resource srvsvn1 { protocol C; device/dev/drbd1; disk /dev/sdc; meta-disk internal; on xm01 { address 100.0.0.1:7789; } on xm02 { address 100.0.0.2:7789; } } resource srvsvn2 { protocol C; device/dev/drbd2; disk /dev/sdd; meta-disk internal; on xm01 { address 100.0.0.1:7790; } on xm02 { address 100.0.0.2:7790; } } resource vmconfig { protocol C; device/dev/drbd3; meta-disk internal; on xm01 { address 100.0.0.1:7791; disk /dev/vg_xm01/lv_xm01_vmconfig; } on xm02 { address 100.0.0.2:7791; disk /dev/vg_xm02/lv_xm02_vmconfig; } } - crm configuration: node xm01 node xm02 primitive VMSVN ocf:heartbeat:Xen \ meta target-role=Started allow-migrate=true is-managed=true resource-stickiness=0 \ operations $id=VMSVN-operations \ op monitor interval=30 timeout=30 \ op start interval=0 timeout=60 \ op stop interval=0 timeout=60 \ op migrate_to interval=0 timeout=180 \ params xmfile=/etc/xen/vm/vmsvn primitive clvm ocf:lvm2:clvmd \ operations $id=clvm-operations \ op monitor interval=10 timeout=20 primitive dlm ocf:pacemaker:controld \ operations $id=dlm-operations \ op monitor interval=10 timeout=20 start-delay=0 primitive ipmi-stonith-xm01 stonith:external/ipmi \ meta target-role=Started is-managed=true priority=10 \ operations $id=ipmi-stonith-xm01-operations \ op monitor interval=15 timeout=15 start-delay=15 \ params hostname=xm01 ipaddr=125.1.254.107 userid=administrator passwd=17xm45 interface=lan primitive ipmi-stonith-xm02 stonith:external/ipmi \ meta target-role=Started is-managed=true priority=9 \ operations $id=ipmi-stonith-xm02-operations \ op monitor interval=15 timeout=15 start-delay=15 \ params hostname=xm02 ipaddr=125.1.254.248 userid=administrator passwd=17xm45 interface=lan primitive o2cb ocf:ocfs2:o2cb \ operations $id=o2cb-operations \ op monitor interval=10 timeout=20 primitive srvsvn1-drbd ocf:linbit:drbd \ params drbd_resource=srvsvn1 \ operations $id=srvsvn1-drbd-operations \ op monitor interval=20 role=Master timeout=20 \ op monitor interval=30 role=Slave timeout=20 \ op start interval=0 timeout=240 \ op 
promote interval=0 timeout=90 \ op demote interval=0 timeout=90 \ op stop interval=0 timeout=100 \ meta migration-threshold=10 failure-timeout=600 primitive srvsvn2-drbd ocf:linbit:drbd \ params drbd_resource=srvsvn2 \ operations $id=srvsvn2-drbd-operations \ op monitor interval=20 role=Master timeout=20 \ op monitor interval=30 role=Slave timeout=20 \ op start interval=0 timeout=240 \ op promote interval=0 timeout=90 \ op demote interval=0 timeout=90 \ op stop interval=0 timeout=100 \ meta migration-threshold=10 failure-timeout=600
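The fencing part of that advice would look roughly like this in the resource definition, assuming the stonith_admin-fence-peer.sh script from the DRBD source tree has been installed to /usr/lib/drbd/:

    disk {
        fencing resource-and-stonith;
    }
    handlers {
        fence-peer "/usr/lib/drbd/stonith_admin-fence-peer.sh";
    }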
Re: [DRBD-user] Urgent!!!: degr-wfc-timeout is not working.
Hello,

On 02/21/2012 05:17 PM, venkatesh prabhu wrote:
> Hi, I am facing an issue with the degraded timeout. My two-node cluster with DRBD is up and running, but degr-wfc-timeout is not working as expected. I have a primary and a secondary node. I shut down the secondary node, make some changes to the mirror from the primary node and reboot it. When it comes up it waits for 180 seconds instead of 3 seconds. See the config section provided below. Please let me know what could be the problem.

It is working as expected: degr-wfc-timeout would be triggered if the Primary crashes while running in a degraded cluster ... on a regular shutdown and the reboot afterwards, wfc-timeout is used. Reset your single Primary and you should see degr-wfc-timeout being triggered.

Regards,
Andreas
-- Need help with DRBD? http://www.hastexo.com/now

> My config section for the timeouts:
> startup {
>     degr-wfc-timeout 3;     # 3 sec.
>     wfc-timeout 180;        # 3 min.
>     # become-primary-on both;
> } # end of startup
>
> Thanks in advance,
> Vengatesh Prabhu
Re: [DRBD-user] Backing up VMware VM's running DRBD with VMware Data Recovery (VDR)
Hello, On 02/17/2012 10:28 AM, Mark Watts wrote: I have a pair of CentOS 5.7 VM's running an LVM/DRBD/EXT3 Pri/Sec cluster. Since we use VDR to take snapshots of our VM's I naturally added these two VM's to the backup rota. So you installed latest VMware tools to the VMs and you are sure they are running? Pretty much every night I get hundreds of errors in the VDR logs relating to the Primary, giving the message: Failed to create snapshot for VDR01, error -3960 ( cannot quiesce virtual machine) Any logs from vmware-tools in the VM? Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now I'm taking a wild guess at this perhaps being related to DRBD; can anyone suggest whether this is the issue. Mark. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Problem with drbd 8.4.1 on linux kernel 3.0.0
Hello, On 02/03/2012 03:05 PM, Owen Le Blanc wrote: Kaloyan Kovachev wrote: what does 'cat /proc/drbd' say at this moment on both nodes? on the primary: cat /proc/drbd version: 8.4.1 (api:1/proto:86-100) GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@brahe, 2012-01-06 13:30:08 0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r- ns:0 nr:0 dw:116 dr:472 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:232 1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown r- ns:0 nr:0 dw:116 dr:472 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:232 oos: 232 on the secondary: cat /proc/drbd version: 8.4.1 (api:1/proto:86-100) GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by root@brahe, 2012-01-06 13:30:08 0: cs:WFConnection ro:Secondary/Unknown ds:Outdated/DUnknown C r- ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1516 1: cs:WFConnection ro:Secondary/Unknown ds:Outdated/DUnknown C r- ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1516 also oos on Secondary ... as Felix said, this looks like a split brain ... Out Of Sync blocks on both sides. at least, shortly after giving the connect command. Soon it reverts to StandAlone. and this is the default after-split-brain behaviour: disconnect Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now -- Owen ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
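The usual way out of that situation, once it is clear which node's changes are expendable (the resource name is a placeholder; the thread's resources are only numbered):

    # on the node whose changes are to be discarded
    drbdadm disconnect r0
    drbdadm secondary r0
    drbdadm connect --discard-my-data r0
    # on the surviving node, if it is also StandAlone
    drbdadm connect r0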
Re: [DRBD-user] Problem with drbd 8.4.1 on linux kernel 3.0.0
Hello, On 02/03/2012 04:35 PM, Felix Frank wrote: Hi, On 02/03/2012 04:05 PM, Owen Le Blanc wrote: This still leaves open the question of why split brain keeps occuring? The cluster is managed by pacemaker, version 1.1.6, with corosync, version 1.4.2. There isn't actually any real data on the two drbd devices, since this is only a test. But it concerns me that it seems to go into split brain about once a week. Be sure to have STONITH enabled and configured, use resource-level fencing in DRBD and redundant ring setup for corosync ... so try to minimize the chance for split-brain and establish protection in case your nodes get separated. this is neigh impossible to answer without looking at your setup and workflows. Split brain can only happen when the nodes get disconnected (this includes downtimes of either node). ... and additionally they need to get promoted to Primary on both sides while not connected. Regards, Andreas -- Need help with Pacemaker/DRBD/Corosync? http://www.hastexo.com/now How often are your nodes getting separated thus? ...this is not dual-primary, is it? Prune your logs, it should be easy to determine when DRBD changes states. As to why, well - analyzing pacemaker activity from hindsight can be challenging, I guess. Your best bet is probably to make sure the nodes don't get separated, and that maintenance is done very carefully. HTH, Felix ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Urgent: Stuck with DRBD bring-up.
Hello, ... don't forget to post to the group. On 01/30/2012 01:06 PM, venkatesh prabhu wrote: Hi Andreas, Thanks for your quick reply. The solution you provided me solved the problem number 1. But still i am facing the error with up command. Please help in get rid of that error. When i run the drbdadm up r0 it exits with error code 20. Device '0' is configured! Command 'drbdmeta 0 v08 /dev/vg0/drbdmeta flex-external apply-al' terminatedwith exit code 20 You don't need to to bring the DRBD device up after running the init script, that's the responsibility of the script. You only need to make it Primary manually if you don't want a cluster manager to do the job. If you want pacemaker to do this job, deactivate the init script completely and follow the DRBD users guide on how to integrate DRBD into your cluster configuration. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now But still cat /proc/drdb shows that disk is created and it is in inconsistent state. Then i can promote it to primary and everything works fine. How can i get rid of the Device '0' is configured! error? Please help me. Thank You Vengatesh Prabhu On Sat, Jan 28, 2012 at 2:19 PM, Andreas Kurz andr...@hastexo.com wrote: Hello, On 01/28/2012 01:26 PM, venkatesh prabhu wrote: Hi, Please help me solve my issues. I am trying to bring up DRBD for first time but i am facing following problems. 1. when i start the drbd service for first time it says adjust disk failed: Starting DRBD resources: [ create res: r0 prepare disk: r0 adjust disk: r0:failed(apply-al:20) adjust net: r0 ] 2. Then creation of metadat drbdadm create-md r0 is success. Do the metadata creation on both nodes before you start the drbd service and the rest should be fine ... and be sure to use 8.4.1 and nod 8.4.0 release. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now 3. Then drbdadm up r0 fails with exit code 1. drbdadm up r0r0: Failure: (102) Local address(port) already in use. Command 'drbdsetup connect r0 ipv4:10.203.230.136:7788 ipv4:10.203.230.135:7788 --shared-secret=DRBD --ping-timeout=5 --ping-int=10 --connect-int=10 --timeout=60 --protocol=C' terminated with exit code 1 4. If the run the same command drbdadm up r0 it fails with exit code 10. Device '0' is configured! Command 'drbdmeta 0 v08 /dev/vg0/drbdmeta flex-external apply-al' terminatedwith exit code 20 but still i can promote the resource to primary and the sync happens properly between two nodes. but how can i avoid those errors? please help me. my drbd.conf file is provided below. global { usage-count no; } common { protocol C; startup { degr-wfc-timeout 3;#3 = 3 sec.. wfc-timeout 180;# 3 min. 
} # end of startup handlers { } # end of handlers disk { on-io-error detach; } # end of disk net { timeout 60;# 6 seconds (unit = 0.1 seconds) connect-int 10;# 10 seconds (unit = 1 second) ping-int 10;# 10 seconds (unit = 1 second) ping-timeout 5;# 500 ms (unit = 0.1 seconds) shared-secret DRBD; } # end of net } # end of common resource r0{ syncer { rate 100M; } on lab1601 { device /dev/drbd0; disk /dev/vg0/mirror; address10.203.230.135:7788; meta-disk /dev/vg0/drbdmeta; } on lab1602 { device/dev/drbd0; disk /dev/vg0/mirror; address 10.203.230.136:7788; meta-disk /dev/vg0/drbdmeta; } } #end Thank You Vengatesh Prabhu ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
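Handing the job to Pacemaker then means disabling the init script and defining the resource there; a sketch using the resource name from this thread:

    chkconfig drbd off            # or: update-rc.d drbd disable

    crm configure primitive p_drbd_r0 ocf:linbit:drbd \
            params drbd_resource="r0" \
            op monitor interval="29" role="Master" \
            op monitor interval="31" role="Slave"
    crm configure ms ms_drbd_r0 p_drbd_r0 \
            meta master-max="1" clone-max="2" notify="true"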
Re: [DRBD-user] Slower disk throughput on DRBD partition
Hello, On 02/01/2012 01:04 PM, Frederic DeMarcy wrote: Hi Note 1: Scientific Linux 6.1 with kernel 2.6.32-220.4.1.el6.x86_64 DRBD 8.4.1 compiled from source Note 2: server1 and server2 are 2 VMware VMs on top of ESXi 5. However they reside on different physical 2U servers. The specs for the 2U servers are identical: - HP DL380 G7 (2U) - 2 x Six Core Intel Xeon X5680 (3.33GHz) - 24GB RAM - 8 x 146 GB SAS HD's (7xRAID5 + 1s) - Smart Array P410i with 512MB BBWC Have you tried to change the I/O scheduler to deadline or noop in the VMs? ... see below .. Note 3: I've tested the network throughput with iperf which yields close to 1Gb/s [root@server1 ~]# iperf -c 192.168.111.11 -f g Client connecting to 192.168.111.11, TCP port 5001 TCP window size: 0.00 GByte (default) [ 3] local 192.168.111.10 port 54330 connected with 192.168.111.11 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 1.10 GBytes 0.94 Gbits/sec [root@server2 ~]# iperf -s -f g Server listening on TCP port 5001 TCP window size: 0.00 GByte (default) [ 4] local 192.168.111.11 port 5001 connected with 192.168.111.10 port 54330 [ ID] Interval Transfer Bandwidth [ 4] 0.0-10.0 sec 1.10 GBytes 0.94 Gbits/sec Scp'ing a large file from server1 to server2 yields ~ 57MB/s but I guess it's due to the encryption overhead. Note 4: MySQL was not running. Base DRBD config: resource mysql { startup { wfc-timeout 3; degr-wfc-timeout 2; outdated-wfc-timeout 1; } net { protocol C; verify-alg sha1; csums-alg sha1; using csums based resync is only interesting for WAN setups where you need to sync via a rather thin connection data-integrity-alg sha1; using data-integrity-alg is definitely not recommended (slow) for live setups, only if you have to assume there is buggy hardware on the way between your nodes ... like nics pretending csums are ok while they are not and out of curiosity ... did you gave DRBD 8.3.12 already a try? Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now cram-hmac-alg sha1; shared-secret MySecret123; } on server1 { device/dev/drbd0; disk /dev/sdb; address 192.168.111.10:7789; meta-disk internal; } on server2 { device/dev/drbd0; disk /dev/sdb; address 192.168.111.11:7789; meta-disk internal; } } After any change in the /etc/drbd.d/mysql.res file I issued a drbdadm adjust mysql on both nodes. Test #1 DRBD partition on primary (secondary node disabled) Using Base DRBD config # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 oflag=direct Throughput ~ 420MB/s Test #2 DRBD partition on primary (secondary node enabled) Using Base DRBD config # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 oflag=direct Throughput ~ 61MB/s Test #3 DRBD partition on primary (secondary node enabled) Using Base DRBD config with: Protocol B; # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 oflag=direct Throughput ~ 68MB/s Test #4 DRBD partition on primary (secondary node enabled) Using Base DRBD config with: Protocol A; # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 oflag=direct Throughput ~ 94MB/s Test #5 DRBD partition on primary (secondary node enabled) Using Base DRBD config with: disk { disk-barrier no; disk-flushes no; md-flushes no; } # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 oflag=direct Disk throughput ~ 62MB/s No difference from Test #2 really. Also cat /proc/drbd still shows wo:b in both cases so I'm not even sure these disk {..} parameters have been taken into account... 
Test #6 DRBD partition on primary (secondary node enabled) Using Base DRBD config with: Protocol B; disk { disk-barrier no; disk-flushes no; md-flushes no; } # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 oflag=direct Disk throughput ~ 68MB/s No difference from Test #3 really. Also cat /proc/drbd still shows wo:b in both cases so I'm not even sure these disk {..} parameters have been taken into account... What else can I try? Is it worth trying DRBD 8.3.x? Thx. Fred On 1 Feb 2012, at 08:35, James Harper wrote: Hi I've configured DRBD with a view to use it with MySQL (and later on Pacemaker + Corosync) in a 2 nodes primary/secondary (master/slave) setup. ... No replication over the 1Gb/s crossover cable is taking place since the secondary node is down yet there's x2 lower disk performance.
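With those two options dropped, the net section of the base config shrinks to something like this (same names as in the thread), followed by a drbdadm adjust mysql on both nodes:

    net {
        protocol C;
        verify-alg sha1;          # only used by "drbdadm verify"
        cram-hmac-alg sha1;
        shared-secret MySecret123;
        # csums-alg and data-integrity-alg intentionally omitted: both add
        # per-request hashing on the hot write path
    }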
Re: [DRBD-user] drbd 8.3.12 cannot get syncer speed beyond 110MB/s
Hello, On 02/01/2012 01:36 PM, Maurits van de Lande wrote: Hello, I have asked this question before for drbd 8.4.1 but I couldn't get proper write performance. Currently I downgraded drbd to 8.3.12 but I cannot get the syncer speed above 1Gb/s or 110MB/s. When I test the disk throughput with #dd if=/dev/zero of=/VM/test bs=512M count=1 oflag=direct I get a speed around 500MB/s (It's an all SSD raid 5 array) I installed a 10G network adapter and tested the connection with iperf and I get on average 7Gbit/s This should all be sufficient for a drbd syncer speed 110MB/s. I have set the rate=200M option for a 2Gbit/s rate cat /proc/drbd shows version: 8.3.12 (api:88/proto:86-96) GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 2011-11-20 10:57:03 0: cs:SyncSource ro:Primary/Primary ds:UpToDate/Inconsistent C r- ns:176 nr:0 dw:0 dr:79839200 al:0 bm:4872 lo:0 pe:0 ua:20 ap:0 ep:1 wo:d oos:1470943696 [...] sync'ed: 5.2% (1436468/1514432)M finish: 3:30:54 speed: 116,228 (109,212) K/sec Only at the start the value of (200.000) K/sec is shown. What can I do to get the 200MB/s syncer speed? You followed all the guidelines in the performance tuning section of the DRBD Users Guide? ... like using jumbo frames on the direct connected 10Gb link and the deadline I/O scheduler, to just name two important ones ... Regards, Andreas -- Need help with DRBD performance tuning? https://www.hastexo.com/services/remote Best regards, Maurits van de Lande | Van de Lande BV. | Lissenveld 1 | 4941VK | Raamsdonksveer | the Netherlands |T +31 (0) 162 516000 | F +31 (0) 162 521417 | www.vdl-fittings.com | ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
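To make the two suggestions concrete, a minimal sketch follows; eth1 and sdb stand in for the actual 10GbE replication interface and the backing device, the MTU has to be raised on both nodes, and any switch in between must support jumbo frames:

# jumbo frames on the dedicated replication link (both nodes)
ip link set dev eth1 mtu 9000
ip link show dev eth1 | grep mtu

# deadline I/O scheduler for the backing device (both nodes)
echo deadline > /sys/block/sdb/queue/scheduler
cat /sys/block/sdb/queue/scheduler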
Re: [DRBD-user] Slower disk throughput on DRBD partition
On 02/01/2012 05:15 PM, Frederic DeMarcy wrote: Hi Andrea Commenting out csum-alg doesn't seem to make any noticeable difference... However commenting out data-integrity-alg and running Test #2 again increases the throughput from ~ 61MB/s to ~ 97MB/s ! Note that I may well run into the 1Gb/s crossover link limit here since my network tests showed ~ 0.94 Gb/s Also Test #1 was wrong in my email... It should have been split in 2: Test #1 On non-DRBD device (/dev/sda) # dd if=/dev/zero of=/home/userxxx/disk-test.xxx bs=1M count=4096 oflag=direct Throughput ~ 420MB/s DRBD partition (/dev/sdb) on primary (secondary node disabled) Using Base DRBD config # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 oflag=direct Throughput ~ 205MB/s Is the result the same if you execute a drbdadm invalidate-remote mysql on the primary before doing the single node test? that would disable activity log updates ... Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/services/remote With the above -alg commented out, disabling the secondary node and running Test #1 again (correctly split this time) shows the same throughputs of ~ 420MB/s and ~ 205MB/s Fred On Wed, Feb 1, 2012 at 1:48 PM, Andreas Kurz andr...@hastexo.com mailto:andr...@hastexo.com wrote: Hello, On 02/01/2012 01:04 PM, Frederic DeMarcy wrote: Hi Note 1: Scientific Linux 6.1 with kernel 2.6.32-220.4.1.el6.x86_64 DRBD 8.4.1 compiled from source Note 2: server1 and server2 are 2 VMware VMs on top of ESXi 5. However they reside on different physical 2U servers. The specs for the 2U servers are identical: - HP DL380 G7 (2U) - 2 x Six Core Intel Xeon X5680 (3.33GHz) - 24GB RAM - 8 x 146 GB SAS HD's (7xRAID5 + 1s) - Smart Array P410i with 512MB BBWC Have you tried to change the I/O scheduler to deadline or noop in the VMs? ... see below .. Note 3: I've tested the network throughput with iperf which yields close to 1Gb/s [root@server1 ~]# iperf -c 192.168.111.11 -f g Client connecting to 192.168.111.11, TCP port 5001 TCP window size: 0.00 GByte (default) [ 3] local 192.168.111.10 port 54330 connected with 192.168.111.11 port 5001 [ ID] Interval Transfer Bandwidth [ 3] 0.0-10.0 sec 1.10 GBytes 0.94 Gbits/sec [root@server2 ~]# iperf -s -f g Server listening on TCP port 5001 TCP window size: 0.00 GByte (default) [ 4] local 192.168.111.11 port 5001 connected with 192.168.111.10 port 54330 [ ID] Interval Transfer Bandwidth [ 4] 0.0-10.0 sec 1.10 GBytes 0.94 Gbits/sec Scp'ing a large file from server1 to server2 yields ~ 57MB/s but I guess it's due to the encryption overhead. Note 4: MySQL was not running. Base DRBD config: resource mysql { startup { wfc-timeout 3; degr-wfc-timeout 2; outdated-wfc-timeout 1; } net { protocol C; verify-alg sha1; csums-alg sha1; using csums based resync is only interesting for WAN setups where you need to sync via a rather thin connection data-integrity-alg sha1; using data-integrity-alg is definitely not recommended (slow) for live setups, only if you have to assume there is buggy hardware on the way between your nodes ... like nics pretending csums are ok while they are not and out of curiosity ... did you gave DRBD 8.3.12 already a try? Regards, Andreas -- Need help with DRBD? 
http://www.hastexo.com/now cram-hmac-alg sha1; shared-secret MySecret123; } on server1 { device/dev/drbd0; disk /dev/sdb; address 192.168.111.10:7789 http://192.168.111.10:7789; meta-disk internal; } on server2 { device/dev/drbd0; disk /dev/sdb; address 192.168.111.11:7789 http://192.168.111.11:7789; meta-disk internal; } } After any change in the /etc/drbd.d/mysql.res file I issued a drbdadm adjust mysql on both nodes. Test #1 DRBD partition on primary (secondary node disabled) Using Base DRBD config # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 oflag=direct Throughput ~ 420MB/s Test #2 DRBD partition on primary (secondary node enabled) Using Base DRBD
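For later readers, the change that made the difference in this thread amounts to dropping data-integrity-alg (and, unless the replication link is thin, csums-alg) from the net section and re-adjusting the resource; the names below are the ones from the quoted configuration:

net {
    protocol C;
    verify-alg sha1;
    # csums-alg sha1;          # only pays off for resync over thin/WAN links
    # data-integrity-alg sha1; # debugging aid, costs a checksum on every write
    cram-hmac-alg sha1;
    shared-secret MySecret123;
}

# after editing /etc/drbd.d/mysql.res, on both nodes:
drbdadm adjust mysql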
Re: [DRBD-user] Urgent: Stuck with DRBD bring up.
Hello, On 01/28/2012 01:26 PM, venkatesh prabhu wrote: Hi, Please help me solve my issues. I am trying to bring up DRBD for first time but i am facing following problems. 1. when i start the drbd service for first time it says adjust disk failed: Starting DRBD resources: [ create res: r0 prepare disk: r0 adjust disk: r0:failed(apply-al:20) adjust net: r0 ] 2. Then creation of metadat drbdadm create-md r0 is success. Do the metadata creation on both nodes before you start the drbd service and the rest should be fine ... and be sure to use 8.4.1 and nod 8.4.0 release. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now 3. Then drbdadm up r0 fails with exit code 1. drbdadm up r0r0: Failure: (102) Local address(port) already in use. Command 'drbdsetup connect r0 ipv4:10.203.230.136:7788 ipv4:10.203.230.135:7788 --shared-secret=DRBD --ping-timeout=5 --ping-int=10 --connect-int=10 --timeout=60 --protocol=C' terminated with exit code 1 4. If the run the same command drbdadm up r0 it fails with exit code 10. Device '0' is configured! Command 'drbdmeta 0 v08 /dev/vg0/drbdmeta flex-external apply-al' terminatedwith exit code 20 but still i can promote the resource to primary and the sync happens properly between two nodes. but how can i avoid those errors? please help me. my drbd.conf file is provided below. global { usage-count no; } common { protocol C; startup { degr-wfc-timeout 3;#3 = 3 sec.. wfc-timeout 180;# 3 min. } # end of startup handlers { } # end of handlers disk { on-io-error detach; } # end of disk net { timeout 60;# 6 seconds (unit = 0.1 seconds) connect-int 10;# 10 seconds (unit = 1 second) ping-int 10;# 10 seconds (unit = 1 second) ping-timeout 5;# 500 ms (unit = 0.1 seconds) shared-secret DRBD; } # end of net } # end of common resource r0{ syncer { rate 100M; } on lab1601 { device /dev/drbd0; disk /dev/vg0/mirror; address10.203.230.135:7788; meta-disk /dev/vg0/drbdmeta; } on lab1602 { device/dev/drbd0; disk /dev/vg0/mirror; address 10.203.230.136:7788; meta-disk /dev/vg0/drbdmeta; } } #end Thank You Vengatesh Prabhu signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
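For a first-time bring-up, the order suggested above looks roughly like this; the resource name r0 is the one from the quoted config, the promotion syntax shown is the 8.4 one, and the forced promotion happens on exactly one node to start the initial sync:

# on both nodes, before starting the drbd service
drbdadm create-md r0

# on both nodes
drbdadm up r0        # or: service drbd start

# on one node only
drbdadm primary --force r0
cat /proc/drbd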
Re: [DRBD-user] drbd and 10Gb/s network: how to increase syncer rate beyond 100MB/s?
Hello, On 01/27/2012 02:35 PM, Maurits van de Lande wrote: Hello, I have a couple of servers with SSD based raid arrays (capable of writing 550+MB/s to the disks). And a 10Gb/s network (HP NC552SFP network adapter HP E8206 network switches) Before installing the 10Gb network adapter the syncrate was limited to around 102MB/s (1Gb/s?) After Installing the 10Gb network adapter it was still impossible to get more than 102MB/s sync rate between the servers. I would like to have a 300MB/s sync rate. When I set this as a fixed sync rate in my drbd84 resource file I noticed that this value is ignored and 102MB/s is used. Just to be sure you are using the new 10Gbit interconnect in your DRBD configuration ... and you already tested with iperf or the tool of your choice that you can use the full bandwidth? Regards, Andreas -- Need help with DRBD performance tuning? http://www.hastexo.com/services/remote How can I increase the syncer rate in drbd84? Best regards, Maurits van de Lande |Van de Lande BV.|Lissenveld 1|4941VK |Raamsdonksveer|the Netherlands |T +31 (0) 162 516000 |F +31 (0) 162 521417 |www.vdl-fittings.com http://www.vdl-fittings.com | ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
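Two quick checks along those lines, with 192.168.10.x as a placeholder for the addresses on the 10GbE interface:

# raw bandwidth across the 10GbE link
iperf -s                      # on node B
iperf -c 192.168.10.2 -f g    # on node A, pointing at node B's 10GbE address

# confirm the resource is really defined against those addresses
drbdadm dump <resource> | grep address

With drbd84 the resync rate is normally governed by the dynamic resync controller rather than a fixed rate; if that is the case here, the relevant disk-section options are c-plan-ahead, c-fill-target, c-min-rate and c-max-rate. The values below are illustrative only, not a recommendation:

disk {
    c-plan-ahead 20;
    c-min-rate 80M;
    c-max-rate 300M;
}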
Re: [DRBD-user] Slow buffered read performance
On 01/24/2012 10:59 AM, samp...@neutraali.net wrote: Hello all, I'm running DRBD on two following machines: Scientific Linux 6.1 Intel Xeon E5620 Quad Core 12GB Ram LSI 9280-4i4e + CacheCade 1.0 (CacheCade is currently disabled) 12x 1TB Seagate Constellation SAS 7200 RPM drives DRBD has been configured in master/slave -fashion and there is 10 GbE dedicated link between machines for DRBD traffic. Drives have been configured in RAID-10 mode with 64 kB Stripe Size. DRBD version is 8.4.0 (api:1/proto:86-100). Have you tried to reproduce that with drbd 8.4.1 ? Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Couple of weeks ago I had to do some maintenance work on node1 (normally the master node) so I moved all services to run on node2. Everything went well and since machines are identical I didn't bother to move services back to node1. Few days later I noticed that performance was somewhat degraded but difference was so minimal that I didn't focus on it at all. A little later I was asked to do some simple read performance tests. Everything looked ok when doing direct reads on DRBD-device: dd if=/dev/drbd0 of=/dev/null bs=1M iflag=direct ^C17317+0 records in 17316+0 records out 18157142016 bytes (18 GB) copied, 21.3465 s, 851 MB/s But with buffered reads things get slow: dd if=/dev/drbd0 of=/dev/null bs=1M ^C11131+0 records in 11130+0 records out 11670650880 bytes (12 GB) copied, 105.299 s, 111 MB/s However, the underlying disk seems to be fine: dd if=/dev/sdb of=/dev/null bs=1M ^C14087+0 records in 14086+0 records out 14770241536 bytes (15 GB) copied, 19.8579 s, 744 MB/s I moved services back to node1 and the problem was gone (dd if=/dev/drbd0 of=/dev/null bs=1M, 37312528384 bytes (37 GB) copied, 54.8519 s, 680 MB/s). Now I started to investigate what caused the issues and moved services to node2 and performance problems hit again so I thought that there has to be something wrong on node2. I compared settings between the two machines to make sure they are really identical and found nothing strange between them. Raid sets are fine and there are no error messages in log files. At this point I decided to reboot node1 and after that moved services back to it. After the reboot performance dropped also on node1 and I haven't been able to find out anything that could really help getting performance up again. So it seems like DRBD has huge effect when it comes to buffered reads. It may very well be that I have forgotten to do some sysctl or such tuning after reboot but I can't figure out what it could be. Any ideas how to work this out or is this expected behaviour? 
I'm currently using following settings on my disks: echo deadline > /sys/block/sdb/queue/scheduler echo 0 > /sys/block/sdb/queue/iosched/front_merges echo 150 > /sys/block/sdb/queue/iosched/read_expire echo 1500 > /sys/block/sdb/queue/iosched/write_expire echo 3200 > /proc/sys/vm/dirty_background_bytes echo 38400 > /proc/sys/vm/dirty_bytes echo 1024 > /sys/block/sdb/queue/nr_requests and current DRBD resource configuration: resource drbd0 { device /dev/drbd0; disk /dev/sdb1; meta-disk internal; options { cpu-mask 15; } net { protocol C; max-buffers 8000; max-epoch-size 8000; unplug-watermark 16; sndbuf-size 0; } disk { al-extents 3389; disk-barrier no; disk-flushes no; } on node1 { address 10.10.10.1:7789; } on node2 { address 10.10.10.2:7789; } } Best regards, Samuli Heinonen ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
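One setting that is easy to lose across a reboot and that produces exactly this pattern (direct reads fast, buffered reads slow) is the read-ahead of the DRBD device itself, which does not necessarily follow the backing device. A hedged sketch, the value being an example only and needing to be persisted e.g. via rc.local or a udev rule:

# read-ahead in 512-byte sectors
blockdev --getra /dev/sdb
blockdev --getra /dev/drbd0

# raise it on the DRBD device
blockdev --setra 4096 /dev/drbd0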
Re: [DRBD-user] Concurrent local write detected! [DISCARD L]
Hello, On 01/18/2012 12:39 PM, Alessandro Bono wrote: Hi installing a kvm virtual machine on a drbd disk cause these logs on host machine [2571736.830557] block drbd0: kvm[7083] Concurrent local write detected! [DISCARD L] new: 48981951s +32768; pending: 48981951s +32768 [2571736.857671] block drbd0: kvm[7083] Concurrent local write detected! [DISCARD L] new: 48982015s +512; pending: 48982015s +512 [2571736.884479] block drbd0: kvm[7083] Concurrent local write detected! [DISCARD L] new: 48982016s +3584; pending: 48982016s +3584 [2571736.911285] block drbd0: kvm[7083] Concurrent local write detected! [DISCARD L] new: 48982023s +28672; pending: 48982023s +28672 [2571798.062440] block drbd0: kvm[7080] Concurrent local write detected! [DISCARD L] new: 66639863s +4096; pending: 66639855s +8192 [2571798.089232] block drbd0: kvm[7080] Concurrent local write detected! [DISCARD L] new: 66639871s +512; pending: 66639871s +512 [2571798.116014] block drbd0: kvm[7080] Concurrent local write detected! [DISCARD L] new: 66639872s +3584; pending: 66639872s +3584 [2571798.143110] block drbd0: kvm[7080] Concurrent local write detected! [DISCARD L] new: 66639879s +57344; pending: 66639879s +53248 [2571932.144089] block drbd0: kvm[7080] Concurrent local write detected! [DISCARD L] new: 64914183s +24576; pending: 64914215s +65536 [2572014.253295] block drbd0: kvm[7083] Concurrent local write detected! [DISCARD L] new: 78267975s +65536; pending: 78267975s +65536 [2572317.294655] block drbd0: kvm[7083] Concurrent local write detected! [DISCARD L] new: 45901543s +12288; pending: 45901543s +12288 [2572346.510458] block drbd0: kvm[7080] Concurrent local write detected! [DISCARD L] new: 64619831s +65536; pending: 64619831s +65536 [2572402.322369] block drbd0: kvm[7082] Concurrent local write detected! [DISCARD L] new: 64818567s +61440; pending: 64818567s +61440 [2572402.349160] block drbd0: kvm[7082] Concurrent local write detected! [DISCARD L] new: 64818687s +512; pending: 64818687s +512 [2572402.376182] block drbd0: kvm[7082] Concurrent local write detected! [DISCARD L] new: 64818688s +3584; pending: 64818688s +3584 [2572403.429157] block drbd0: kvm[7082] Concurrent local write detected! [DISCARD L] new: 64896055s +65536; pending: 64896055s +65536 [2572422.493968] block drbd0: kvm[7080] Concurrent local write detected! [DISCARD L] new: 57422703s +65536; pending: 57422703s +65536 [2572889.505644] block drbd0: kvm[7080] Concurrent local write detected! [DISCARD L] new: 75415135s +65536; pending: 75415135s +65536 on mailing list I found some reference to iscsi not on a vm directly installed on drbd vm guest is a windows 2003 server r2 with virtio disk drivers Are you using latest version of the virtio drivers and do the errors also occur if you run the machine without them? vm host is an ubuntu server with kernel 2.6.38-13 and drbd 8.3.12 from git qemu-kvm 0.12.3+noroms-0ubuntu9.16 can you show us the options you use to start the VM ... or a virsh dumpxml output cat /proc/drbd version: 8.3.12 (api:88/proto:86-96) GIT-hash: 465da64362f0aece357e9015c50ed849e2458abd debian/changelog debian/control build by root@nebbiolo-dev, 2011-12-29 16:00:53 ... is this the complete config? Please show us a drbdadm dump all and the full cat /proc/drbd Regards, Andreas -- Need help with DRBD? 
http://www.hastexo.com/now cat /etc/drbd.d/r0.res resource r0 { syncer { rate 25M; csums-alg sha1; verify-alg sha1; } net { cram-hmac-alg sha1; shared-secret xx; } disk { no-disk-flushes; no-md-flushes; } on ga1 { device /dev/drbd0; disk /dev/ga1/winsrv; address10.12.24.242:7788; meta-disk internal; } on ga2 { device/dev/drbd0; disk /dev/ga2/winsrv; address 10.12.24.243:7788; meta-disk internal; } } Is there a workaround/solution? thanks signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
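The requested information can be gathered with something like the following; the libvirt domain name is a placeholder:

drbdadm dump all
cat /proc/drbd
virsh dumpxml win2003srv    # shows the disk driver/cache options the guest is started with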
Re: [DRBD-user] 4-way replication
Hello, On 01/19/2012 09:51 AM, Benjamin Knoth wrote: Hi Lars, i'm not interestedin Pacemaker 1.05. But in the DRBD User Guide you can only read the following. Note Due to limitations in the Pacemaker cluster manager as of Pacemaker version 1.0.5, it is not possible to create this setup in a single four-node cluster without disabling CIB validation, which is an advanced process not recommended for general-purpose use. It is anticipated that this is being addressed in future Pacemaker releases. I use Pacemaker 1.1.5 at the moment. If you wan't to configure a split-site cluster you could test two two-node Pacemaker 1.1.6 cluster and the new booth service: http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.geo.html If you have reliable interconnect between the two pairs including fencing you could try to use one four node cluster with some fancy constraints ... I think that should work ... though the correct constraints might be challenging. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Best regards Benjamin Am 18.01.2012 17:46, schrieb Lars Ellenberg: On Wed, Jan 18, 2012 at 02:12:13PM +0100, Benjamin Knoth wrote: Hi all, the 4-node replication is working fine and i can mange the resources in Pacemaker. In the documentation of the user guide i read that's not possible in Pacemaker 1.05 to create a 4 node cluster only 2. Why would you be interested in notes for Pacemaker 1.0.5, if we have 1.0.12 and 1.1.6-almost-7 out there? What's the status of this feature? Is it integrated or in which version is it planned to integrate this feature? best regards Benjamin ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] % of utilisation on my drbd device
sorry if that might get double-posted ... I used wrong sender address on initial reply ... -- Hello, On 01/19/2012 07:56 AM, Matthieu Lejeune wrote: Hello, I have a primary/secondary node with 2 ressources. When I watch the i/O performance i look some think like this : When I watch the physical drive utilisation : root@relax:~# iostat -x /dev/sda 1 Linux 2.6.32-5-amd64 (relax) 01/19/2012 _x86_64_(16 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0.020.001.090.020.00 98.87 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 220.78 700.91 555.52 374.88 20794.66 21227.05 45.16 1.541.66 0.42 38.87 avg-cpu: %user %nice %system %iowait %steal %idle 0.040.000.170.000.00 99.79 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 186.0048.00 211.00 162.00 15392.00 2624.00 48.30 0.030.09 0.06 2.40 avg-cpu: %user %nice %system %iowait %steal %idle 0.000.000.290.000.00 99.71 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 197.00 9.00 379.00 117.00 19472.00 1172.00 41.62 0.150.31 0.19 9.60 When I watch the drbd device utilisation : root@relax:~# iostat -x /dev/drbd0 1 Linux 2.6.32-5-amd64 (relax) 01/19/2012 _x86_64_(16 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0.020.001.090.020.00 98.87 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util drbd0 0.00 0.00 775.16 1016.47 20729.96 21174.70 23.39 0.945.51 0.56 99.85 avg-cpu: %user %nice %system %iowait %steal %idle 0.000.000.560.000.00 99.44 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util drbd0 0.00 0.00 426.00 104.00 15648.00 1262.00 31.91 5.190.15 1.87 99.20 avg-cpu: %user %nice %system %iowait %steal %idle 0.000.000.560.000.00 99.44 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util drbd0 0.00 0.00 404.00 148.00 15888.00 1872.00 32.17 5.780.33 1.80 99.60 avg-cpu: %user %nice %system %iowait %steal %idle 0.000.000.310.000.00 99.69 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util drbd0 0.00 0.00 398.00 117.00 15632.00 1736.00 33.72 5.280.10 1.94 100.00 avg-cpu: %user %nice %system %iowait %steal %idle 0.000.000.240.000.00 99.76 Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util drbd0 0.00 0.00 406.00 106.00 15376.00 1382.00 32.73 5.070.17 1.92 98.40 I don't understand why my drbd device is at 100 %util. This is my ressource configuration : There are a raid 0 hardware with 12 sas 15k drives. So think you should get higher write?? performance from your setup ... what are your current benchmarking results? Looking at your config I suggest you read the performance tuning chapter in the DRBD user guide ... assuming you have a raid controller with non volatile cache and 10GBit interconnect you should really get quite near to native speed with DRBD Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now resource data { protocol C; startup { wfc-timeout 0; } disk { on-io-error detach; } syncer { rate 20M; verify-alg md5; } on surtax { device/dev/drbd0; disk /dev/sda; address 10.1.42.11:7788; meta-disk internal; } on relax { device/dev/drbd0; disk /dev/sda; address 10.1.42.12:7788; meta-disk internal; } } What's wrong on my configuration ? 
Thank's Matthieu ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] 4-way replication
Hello, On 01/16/2012 11:05 AM, Benjamin Knoth wrote: Hello, i think it's clear and the pacemaker config is also clear but i can't get a positiv result. I started with 4 maschines vm01 vm02 vm03 vm04 On vm01 and vm02 i created a DRBD resource with this config. resource test { device /dev/drbd3; meta-disk internal; disk /dev/vg01/test; protocolC; syncer{ rate 800M; } on vm01 { address 10.10.255.12:7003; } on vm02 { address 10.10.255.13:7003; } } On vm03 and vm04 i created this DRBD Resource resource test2 { device /dev/drbd3; meta-disk internal; disk /dev/vg01/test; protocolC; syncer{ rate 800M; } on vm03 { address 10.10.255.14:7003; } on vm04 { address 10.10.255.15:7003; } } This two unstacked resources are running. If i look in the documentation i think that i need to create the following DRBD Resource on vm01-04. resource stest { protocolA; stacked-on-top-of test2 { device /dev/drbd13; address 10.10.255.16:7009; } stacked-on-top-of test { device /dev/drbd13; address 10.10.255.17:7009; } } But if i save this and copy them to all vms i get on vm03-04 if i run drbdadm --stacked create-md stest Use the same, complete config on all nodes. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now drbd.d/stest.res:1: in resource stest, referenced resource 'test' not defined. and vm01-02 on drbd.d/stest.res:1: in resource stest, referenced resource 'test2' not defined. What do i need that vm01-02 know about test2 on vm03-04 and vm03-04 know about test on vm01-02? Both ip addresses are virtual adresses on vm01 and vm03 where test and test2 are primary That is what i understood after i look on the picture and the pacemaker configuration. Best regards Benjamin Am 13.01.2012 15:27, schrieb Andreas Kurz: Hello, On 01/13/2012 12:56 PM, Benjamin Knoth wrote: Hi, i will create a 4 node replication with DRBD. I read also the documentation. I understand also the configuration of a 3 way replication, but how do i need to config the 4 way replication? I configured 2 2way resources successfully and now i need to config the stacked resource. Have a look at: http://www.drbd.org/users-guide-8.3/s-pacemaker-stacked-resources.html#s-pacemaker-stacked-dr ... a picture says more than 1000 words ;-) resource r0-U { { protocol A; } stacked-on-top-of r0 { device /dev/drbd10; address 192.168.42.1:7788; } on charlie { device /dev/drbd10; disk /dev/hda6; address 192.168.42.2:7788; # Public IP of the backup meta-disk internal; } } Is the solution to define on server alice and bob and charlie and daisy a lower level resource with protoc C and than one stacked resource where directly the stacked resource from alice and bob communicate with the stacked resource of charlie and daisy like this configuration? Yes, configure the replication between two stacked resources. resource stacked { protocolA; stacked-on-top-of r0 { device /dev/drbd10; address 192.168.:7788; } stacked-on-top-of r0 { device /dev/drbd10; address 134.76.28.188:7788; } } Best regards Regards, Andreas ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
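Once the identical, complete configuration is in place on all four nodes, the stacked devices are brought up roughly as follows; this is only a sketch with the resource names from this thread, assuming vm01 and vm03 currently hold the lower-level Primary role, and the forced initial sync uses the 8.3 syntax:

# on vm01 (lower resource test is Primary here)
drbdadm --stacked create-md stest
drbdadm --stacked up stest

# on vm03 (lower resource test2 is Primary here)
drbdadm --stacked create-md stest
drbdadm --stacked up stest

# on the site holding the good data, e.g. vm01
drbdadm --stacked -- --overwrite-data-of-peer primary stest
cat /proc/drbd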
Re: [DRBD-user] 4-way replication
On 01/17/2012 03:02 PM, Benjamin Knoth wrote: Hello Andreas, Am 17.01.2012 14:51, schrieb Andreas Kurz: Hello, On 01/16/2012 11:05 AM, Benjamin Knoth wrote: Hello, i think it's clear and the pacemaker config is also clear but i can't get a positiv result. I started with 4 maschines vm01 vm02 vm03 vm04 On vm01 and vm02 i created a DRBD resource with this config. resource test { device /dev/drbd3; meta-disk internal; disk /dev/vg01/test; protocolC; syncer{ rate 800M; } on vm01 { address 10.10.255.12:7003; } on vm02 { address 10.10.255.13:7003; } } On vm03 and vm04 i created this DRBD Resource resource test2 { device /dev/drbd3; meta-disk internal; disk /dev/vg01/test; protocolC; syncer{ rate 800M; } on vm03 { address 10.10.255.14:7003; } on vm04 { address 10.10.255.15:7003; } } This two unstacked resources are running. If i look in the documentation i think that i need to create the following DRBD Resource on vm01-04. resource stest { protocolA; stacked-on-top-of test2 { device /dev/drbd13; address 10.10.255.16:7009; } stacked-on-top-of test { device /dev/drbd13; address 10.10.255.17:7009; } } But if i save this and copy them to all vms i get on vm03-04 if i run drbdadm --stacked create-md stest Use the same, complete config on all nodes. I copied this config on all nodes. And still not working? Can you provide or pastebin drbdadm dump all and cat /proc/drbd from a node that gives you that error? Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Best regards Benjamin Regards, Andreas ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] 4-way replication
Hello, On 01/17/2012 04:22 PM, Benjamin Knoth wrote: Hello, Am 17.01.2012 15:36, schrieb Andreas Kurz: On 01/17/2012 03:02 PM, Benjamin Knoth wrote: Hello Andreas, Am 17.01.2012 14:51, schrieb Andreas Kurz: Hello, On 01/16/2012 11:05 AM, Benjamin Knoth wrote: Hello, i think it's clear and the pacemaker config is also clear but i can't get a positiv result. I started with 4 maschines vm01 vm02 vm03 vm04 On vm01 and vm02 i created a DRBD resource with this config. resource test { device /dev/drbd3; meta-disk internal; disk /dev/vg01/test; protocolC; syncer{ rate 800M; } on vm01 { address 10.10.255.12:7003; } on vm02 { address 10.10.255.13:7003; } } On vm03 and vm04 i created this DRBD Resource resource test2 { device /dev/drbd3; meta-disk internal; disk /dev/vg01/test; protocolC; syncer{ rate 800M; } on vm03 { address 10.10.255.14:7003; } on vm04 { address 10.10.255.15:7003; } } This two unstacked resources are running. If i look in the documentation i think that i need to create the following DRBD Resource on vm01-04. resource stest { protocolA; stacked-on-top-of test2 { device /dev/drbd13; address 10.10.255.16:7009; } stacked-on-top-of test { device /dev/drbd13; address 10.10.255.17:7009; } } But if i save this and copy them to all vms i get on vm03-04 if i run drbdadm --stacked create-md stest Use the same, complete config on all nodes. I copied this config on all nodes. And still not working? Can you provide or pastebin drbdadm dump all and cat /proc/drbd from a node that gives you that error? on vm01 and vm02 i get for resource test on cat /proc/drbd. The not stacked resource works 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r- ns:1077644 nr:0 dw:33232 dr:1044968 al:13 bm:63 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 After i copied the config with resource stest to all 4 nodes i get the following on vm01 and vm02. drbdadm dump all drbd.d/stest.res:1: in resource stest, referenced resource 'test2' not defined. And cat /proc/drbd display only the unstacked test resource 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r- ns:0 nr:0 dw:0 dr:528 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 On vm03 and vm04 i can't also find a stacked resource in /proc/drbd 3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r- ns:0 nr:0 dw:0 dr:536 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 drbdadm dump all drbd.d/stest.res:1: in resource stest, referenced resource 'test' not defined. You see that on the referenced resource are different between vm01-02 and vm03-04. On the example the unstacked resources had also different names. In this part DRBD need to know that the referenced resource test is also available on vm01-02 and test2 is only available on vm03-04. That is the problem what i need to solve or not? Yes ... put _all_ resource configs on _all_ nodes (and include them in your config of course): the same config on all four nodes Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Best regards Benjamin Regards, Andreas ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
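In practice "the same config on all four nodes" usually means keeping one identical set of files everywhere and letting /etc/drbd.conf pull them all in; a sketch with the file names used in this thread:

# /etc/drbd.conf on every node
include "drbd.d/global_common.conf";
include "drbd.d/*.res";

# /etc/drbd.d/ on every node carries the identical test.res, test2.res and stest.res

# sanity check on each node; it must parse without
# "referenced resource ... not defined"
drbdadm dump all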
Re: [DRBD-user] Dual-primary to single node
Hello, On 01/13/2012 10:59 AM, Luis M. Carril wrote: Hello, I´m new to DRBD and I think that I have a mess with some concepts and policies. I have setup a two node cluster (of virtual machines) with a shared volume in dual primary mode with ocfs2 as a basic infrastructure for some testings. I need that when one of the two nodes goes down the other continues working normally (we can assume that the other node never will recover again), but when one node fails the other enter in WFConnection state and the volume is disconnected, I have setup the standar set of policies for split brain: after-sb-0pri discard-zero-changes; after-sb-1pri discard-secondary; after-sb-2pri disconnect; Which policy should I use to achieve the desired behaivour (if one node fails, the other continue working alone)? these policies only take affect if the two nodes see each other again after a split-brain and if you loose one node it is correct behaviour that the remaining node has it's DRBD resources in WFConnection state. What do you mean with: volume is disconnected? How do you manage your cluster? Pacemaker? rgmanager? Without any further information on the rest of you setup and what you think is not working correct it's unable to comment further ... Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] 4-way replication
Hello, On 01/13/2012 12:56 PM, Benjamin Knoth wrote: Hi, i will create a 4 node replication with DRBD. I read also the documentation. I understand also the configuration of a 3 way replication, but how do i need to config the 4 way replication? I configured 2 2way resources successfully and now i need to config the stacked resource. Have a look at: http://www.drbd.org/users-guide-8.3/s-pacemaker-stacked-resources.html#s-pacemaker-stacked-dr ... a picture says more than 1000 words ;-) resource r0-U { { protocol A; } stacked-on-top-of r0 { device /dev/drbd10; address 192.168.42.1:7788; } on charlie { device /dev/drbd10; disk /dev/hda6; address 192.168.42.2:7788; # Public IP of the backup meta-disk internal; } } Is the solution to define on server alice and bob and charlie and daisy a lower level resource with protoc C and than one stacked resource where directly the stacked resource from alice and bob communicate with the stacked resource of charlie and daisy like this configuration? Yes, configure the replication between two stacked resources. resource stacked { protocolA; stacked-on-top-of r0 { device /dev/drbd10; address 192.168.:7788; } stacked-on-top-of r0 { device /dev/drbd10; address 134.76.28.188:7788; } } Best regards Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] Metadata question
Hello, On 01/09/2012 01:41 PM, Peter Beck wrote: Hi guys, I have a question: If I use a partition with 1 TB, I should create a 33 MB metadata partition. What if I resize the 1TB partition ? Now the (internal) metadata partition should be resized too ? Or will it automatically attached at the end ? How does that exactly work ? use internal meta-data and all is done automagically for you ... and yes, on an online resize of the DRBD device the meta-data is also resized and moved to the (new) end of the underlying device. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Best Regards Peter signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
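As a rough cross-check of the 33 MB figure in the question: internal meta-data is dominated by the bitmap, which needs one bit per 4 KiB of data, i.e. about 32 MiB per TiB, plus a few MiB for the activity log and superblock; the exact formula is in the meta-data chapter of the users guide. A small sketch, the device name being an example only:

SIZE_BYTES=$(blockdev --getsize64 /dev/sdb1)
echo "$(( SIZE_BYTES / 32768 / 1024 / 1024 + 4 )) MiB internal meta-data (approx.)"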
Re: [DRBD-user] blocking I/O with drbd
On 12/29/2011 11:50 AM, Volker wrote: Hi all, ok ... still drbd 8.3.8 ... what does iostat -dx say during your test? sadly, yes. elrepo is not an option here because its unofficial. After hunting the problem in the perc6-caches, page-caches, lvm-alignment, etc. and not finding anything, i managed to convince some people here. We are going to try the packages from the elrepo-repository. any advice besides use the latest!? I was going to use drbd83-utils-8.3.12-1.el5.elrepo.x86_64.rpm kmod-drbd83-8.3.12-1.el5.elrepo.x86_64.rpm fine since 8.4.x is sort of bleeding edge and still might contain some bugs. Even though the update on a test-host went flawlessly: Is there anything particular i need to look after before/after the update of the packages? no Any notes on compatibility between 8.3.8-1 and 8.3.12-1 i should be aware of? none that I am aware of Once the host is live again, i will report if that did the trick :-) I'm curious too ;-) Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now regards volker ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
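The iostat -dx numbers asked for above can be captured side by side for the backing device and the DRBD device while the dd test runs, e.g.:

iostat -dx /dev/sdb /dev/drbd0 1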
Re: [DRBD-user] Best replication link - 10Gbit ethernet, Infiniband, Dolphin
Hello, On 12/27/2011 01:54 PM, Phil Stricker wrote: Hi! At the moment, I am thinking about a new system using DRBD (Xenserver with VMs using databases). To get higher throughput and lower latencies, I wanted to stop using 1 Gbit Ethernet as replication link and started to read posts about DRBD with alternative connections like: - 10 Gbit/s Ethernet - Infiniband (IPoIB) - Dolphin 10 Gbit Ethernet would be the easiest an cheapest solution, but is it a good idea to use it? I successfully integrated several 10Gb/s setups and they work fine. Easy to setup and tune and well supported if you use one of the usual suspects regarding vendor. With infiniband/dolphin you can get lower latency and with sdp/supersockets also higher bandwith but they are typically more complicated to setup/tune. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now What are your expiriences? Best wishes, Phil signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
Re: [DRBD-user] blocking I/O with drbd
Hello Volker, On 12/15/2011 01:19 PM, Volker wrote: Hi all, we've been using drbd for about six month now, and so far everything is working pretty well. Our setup is like this with two identical machines (besides the actual HDDs). Dell 2950 III - 16GB Ram, Dual-Quadcore Xeons 2.0GHz - Redhat Enterprise Linux 5.7, 2.6.18-238.19.1.el5 - PERC6/i Raid-Controller - 2 Disk OS, Raid 1 (sda) - 4 Disk Content, Raid 10 (sdb) - 500GB /dev/sdb5 extended partition - LVM-Group 'content' on /dev/sdb5 - 400GB LVM-Volume 'data' created in LVM-Group 'content' - DRBD with /dev/drbd0 on /dev/content/data (content being the LVM-Group, data being the LVM-Volume) - /dev/drbd0 is mounted with noatime,ext3-ordered-journaling and then exported with nfs3 and and mounted by 8 machines (rhel5 entirely) - replication is done using a dedicated nic with gbit The DRBD-Version is drbd 8.3.8-1 kmod 8.3.8.1 here is the information from /proc/drbd: version: 8.3.8 (api:88/proto:86-94) GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbu...@builder10.centos.org, 2010-06-04 08:04:09 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate B r ns:91617716 nr:15706584 dw:107784232 dr:53529112 al:892898 bm:37118 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 This is the latest official Version available for CentOS/Redhat from the CentOS-extras repo (as far as i know): try elrepo.org http://mirror.centos.org/centos/5/extras/x86_64/RPMS/ The configuration is identical on both nodes, looking like this: # /etc/drbd.d/global_common.conf # ## global { usage-count no; } common { protocol B; handlers {} startup {} disk { no-disk-barrier; no-disk-flushes; no-disk-drain; try replacing no-disk-drain by no-md-flushes Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now } net { max-buffers 8000; max-epoch-size 8000; unplug-watermark 1024; sndbuf-size 512k; } syncer { rate 25M; al-extents 3833; } } ## # /etc/drbd.d/production.conf # ## resource eshop { device/dev/drbd0; disk /dev/content/data; meta-disk internal; on nfs01.data.domain.de { address 10.110.127.129:7789; } on fallback.dta.domain.de { address 10.110.127.130:7789; } } ## The problem we have with this setup is quite complicated to explain. The read/write-performance in daily production use is sufficient to not effect the entire platform. The usual system-load viewed using top is pretty low, usually between 0.5 and 3. As soon as i produce some artifical i/o on /dev/drb0 on the master, the load pretty much explodes (up to 15) because of blocking i/o. The i/o is done with dd and pretty small files of bout 40MB: dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 Two successive runs like this, make the load go up as far as 10-12 rendering the whole system useless. In this state, a running dd can not be interruped, the nfs-exports are totally inaccessible and the whole production-system is at a stand still. Using blktrace/blkparse one can see, that absolutely no i/o is possible. 
'top' shows one or two cores at 100% wait: ### Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu5 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu6 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Cpu7 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st ### This lasts for about 3-4 minutes with the load slowly degrading. This behaviour can also be reproduced by using the megacli and querying the raid-controller for information. A couple of successive runs result in the above described behaviour. And here comes the catch: This only happens, if the drbd-layer is in use. If i produce heavy i/o (100 runs of writing a 400MB) on - the same block-device /dev/sdb - in the same volume-group 'content' - but on a newly created _different_ LVM-Volume - without usind the drbd-layer the system-load marginally rises, but i/o is never blocking. Things i have tried: - switching from protocol C to B - disk: no-disk-barrier / no-disk-flushes / no-disk-drain; - net: max-buffers / max-epoch-size / unplug-watermark / sndbuf-size - syncer: rate, al-extents - various caching settings on the
Re: [DRBD-user] DRBD failover between datacenters if one's network fails
Hello, On 12/15/2011 06:55 PM, Trey Dockendorf wrote: On Dec 15, 2011 10:22 AM, Felix Frank f...@mpexnet.de mailto:f...@mpexnet.de wrote: Hi, On 12/15/2011 05:09 PM, Trey Dockendorf wrote: Thanks for the input. Your right in that 2 days is too little time to do this, so I'm going to manual route of shutting one server down at a time, migrating the virtual disks then bringing it back up on the remote site. I had thought the QCow images were on one disk. If there are indeed several disks you can sync, yes, you can take that route. To avoid more downtime of manual migration once this is all over with, I think I will first attempt just getting a DRBD resource up and running to sync my servers back to the primary datacenter. Can a DRBD resource on an existing LVM be done without effecting the data ? Also since I Yes, provided you can a) enlarge the LV a bit to use internal meta data or b) have some extra space on both machines to create an external meta data disk. don't plain to have automatic failover, any precautions I should take if the network connection is lost between the two datacenters ? Ideally this would allow me to have minimal downtime while the nodes re-sync. Resyncing does not require downtime. Migrating the VMs to the other DRBD peer needs downtime, and it's always brief. I cannot think of any required precautions. So the actual plan is to migrate the VMs before the connection is lost? Great, this way you get away with an (arbitrarily long) quicksync once the link returns and once that's finished, you can migrate back at your leisure. Cheers, Felix All the qcow images are in pools located on the same logical volume. Your correct. The plan is to migrate before the fiber repair and network outage then sync them back with DRBD. The meta space, does it have to be stored separate from the replicated LVM? I have a few 100 GBs left on that device. meta-data is located at the end of a device (internal) or on an extra device. About 32MB per 1TB is needed ... exact calculation: http://www.drbd.org/users-guide/ch-internals.html#s-meta-data-size Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Thanks - Trey ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
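A sketch of variant (b), external meta-data, so the existing LV and its data stay untouched; vg0, data and the sizes are placeholders, the meta LV follows the roughly 32 MiB per TiB rule, and with indexed external meta-data each index reserves 128 MiB:

# small meta-data LV next to the existing data LV, on both machines
lvcreate -n data_md -L 256M vg0

# in the resource definition:
#   disk      /dev/vg0/data;
#   meta-disk /dev/vg0/data_md[0];

# create-md only writes to the meta LV, then bring the resource up
drbdadm create-md <resource>
drbdadm up <resource>

The data itself still has to be replicated once over the link, so the initial sync is best scheduled for a time when the reduced bandwidth is acceptable.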
Re: [DRBD-user] blocking I/O with drbd
Hello Volker, On 12/15/2011 05:33 PM, Volker wrote: Hi Andreas, no-disk-drain; try replacing no-disk-drain by no-md-flushes Thanks for your suggestion. Unfortunately setting that made it worse. Shortly after $ drbdadm adjust content The load on the master went up to 4 and did not decrease afterwards. After removing 'no-md-flushes' the load went down to around 1-1.5 again. hmmm ... that is unexpected. This behaviour would maybe make sense if your controller has no cache at all ... or it is configured to only cache reads. But: There were two resyncs directly after activating and deactivating it. It looked like below: [root@nfs01 nfs]# cat /proc/drbd version: 8.3.8 (api:88/proto:86-94) Really do an upgrade! ... elrepo seems to have latest DRBD 8.3.12 packages GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbu...@builder10.centos.org, 2010-06-04 08:04:09 0: cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate B r ns:902760 nr:13965696 dw:14865000 dr:75668 al:1163261 bm:45844 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:1737728 [...] sync'ed: 89.0% (1696/15332)M queue_delay: 0.0 ms finish: 0:00:52 speed: 32,928 (25,156) want: 25,600 K/sec ### As you can see, the rate is at around 25MB, which is fine and fast enough. The system-load on the master is not affected by this resync. Why these resyncs happen and so much data is being resynced, is another case. The nodes were disconnected for 3-4 Minutes which does not justify so much data. Anyways... If you adjust your resource after changing an disk option the disk is detached/attached ... this means syncing the complete AL when done on a primary ... 3833*4MB=15332MB One further note regarding the blocking-io: After issueing the mentioned dd command $ dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 10240+0 records in 10240+0 records out 41943040 bytes (42 MB) copied, 0.11743 seconds, 357 MB/s you benchmark your page cache here ... add oflag=direct to dd to bypass it dd finishes within a couple of seconds (1-2) and the system-load does not increase right away. It takes about 4-5 seconds for the load to increase up to around 5-6. If i would issue a second dd-command right after the first one finishes, the load would increase even higher than 5-6 with the second dd command being uninterruptible. looks like I/O system or network is fully saturated Interestingly dd _always_ reports speeds of 200-350MB which is obviously not the case. Any more ideas? try another RAID controller if DRBD upgrade is not enough. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now greetings volker ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
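That is, something along these lines, so the measured figure reflects what actually reaches DRBD and the disks rather than the page cache; block size and count are examples:

dd if=/dev/zero of=./test-data.dd bs=1M count=1024 oflag=direct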
Re: [DRBD-user] Userspace- vs. Kernelspaceversions
On 11/30/2011 08:57 PM, Arnold Krille wrote: Hi again, public thanks to Andreas Kurz, who answered and helped in private (altough only writing one email:). yeah ... sorry, wrong reply button According to his recommendation I updated both nodes to 8.3.12. That alone didn't fix the performance problem. But setting al-extents 3389; (and getting the parameter correctly) seems to fix the problem. good to hear. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Gotta see if it proves itself tomorrow when my colleagues want to do their daily share of work. Have a good night, Arnold On Wednesday 30 November 2011 17:03:21 Arnold Krille wrote: I am a bit stumped currently. Our drbd setup seems to include extra low performance as a feature and I don't exactly know where the reason is. On thing the bothers bothers us is that the versions of the userspace tools and the kernel module differ (drbd8-utils v8.3.7 vs module v8.3.9). The newer kernel-module is a side-effect of the kernel from debian squeeze backports needed for the new network-cards. But the versions installed on both nodes are the same. Using latency- and throughput tests from the drbd documentation, the latency rises by a factor of ten and the throughput sinks by a factor of ~5 from the baccking devices to the drbd device. So here are my questions: - Could the mixed version be the reason for the performance penalty? - Would it be save to downgrade the kernel (and compile the network-driver by hand) or is the meta-data on the disk incompatible? - Or would you rather update the userspace tools to match the modules version? ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
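For anyone landing on this thread later, the change that helped boils down to the following (8.3 syntax, resource name is a placeholder):

syncer {
    al-extents 3389;
}

# applied live, on both nodes
drbdadm adjust <resource>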
Re: [DRBD-user] problem with diskless state
Hello, On 11/28/2011 11:05 AM, Michael Schumacher wrote: Hi, I am running a CENTOS6 server that is temporarily stand alone. I succeeded installing drbd on this stand alone machine and I am planning to add a secondary machine soon to run drbd in a useful primary/secondary configuration. However, it was necessary to get the first machine up and running. This weekend, I had to reboot the machine and are facing now problems to get it up and running again. This is what /prod/drbd is saying: ---8--- version: 8.4.0 (api:1/proto:86-100) GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by dag@Build64R6, 2011-08-12 09:40:17 0: cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown C r- ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 1: cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown C r- ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 ---8--- Please use 8.4.0 only for testing, it has som stability issues ... wait for 8.4.1 if you plan to use it for producive systems. this is what /var/log/messages is saying: ---8--- Nov 28 09:53:52 virthost1 kernel: drbd: initialized. Version: 8.4.0 (api:1/proto:86-100) Nov 28 09:53:52 virthost1 kernel: drbd: GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by dag@Build64R6, 2011-08-12 09:40:17 Nov 28 09:53:52 virthost1 kernel: drbd: registered as block device major 147 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: Starting worker thread (from drbdsetup [2303]) Nov 28 09:53:52 virthost1 kernel: block drbd1: open(/dev/sda6) failed with -16 Nov 28 09:53:52 virthost1 kernel: block drbd1: drbd_bm_resize called with capacity == 0 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: Terminating worker thread Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: Starting worker thread (from drbdsetup [2309]) Nov 28 09:53:52 virthost1 kernel: block drbd0: open(/dev/sda4) failed with -16 Nov 28 09:53:52 virthost1 kernel: block drbd0: drbd_bm_resize called with capacity == 0 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: Terminating worker thread Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: Starting worker thread (from drbdsetup [2312]) Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: conn( StandAlone - Unconnected ) Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: Starting receiver thread (from drbd_w_fileserv [2313]) Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: receiver (re)started Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: conn( Unconnected - WFConnection ) Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: Starting worker thread (from drbdsetup [2315]) Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: conn( StandAlone - Unconnected ) Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: Starting receiver thread (from drbd_w_fileserv [2316]) Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: receiver (re)started Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: conn( Unconnected - WFConnection ) Nov 28 09:54:02 virthost1 kernel: block drbd1: State change failed: Need access to UpToDate data Nov 28 09:54:02 virthost1 kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown r- } Nov 28 09:54:02 virthost1 kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Diskless/DUnknown r- } Nov 28 09:54:03 virthost1 kernel: block drbd1: State change failed: Need access to UpToDate 
data Nov 28 09:54:03 virthost1 kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown r- } Nov 28 09:54:03 virthost1 kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Diskless/DUnknown r- } Nov 28 09:54:04 virthost1 kernel: block drbd1: State change failed: Need access to UpToDate data Nov 28 09:54:04 virthost1 kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown r- } Nov 28 09:54:04 virthost1 kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Diskless/DUnknown r- } Nov 28 09:54:05 virthost1 kernel: block drbd1: State change failed: Need access to UpToDate data Nov 28 09:54:05 virthost1 kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown r- } Nov 28 09:54:05 virthost1 kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown ds:Diskless/DUnknown r- } Nov 28 09:54:06 virthost1 kernel: block drbd1: State change failed: Need access to UpToDate data Nov 28 09:54:06 virthost1 kernel: block drbd1: state = { cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown r- } Nov 28 09:54:06 virthost1 kernel: block drbd1: wanted = { cs:WFConnection ro:Primary/Unknown
Re: [DRBD-user] problem with diskless state
Hello Michael, don't forget to post to the mailing list ;-) On 11/28/2011 03:21 PM, Michael Schumacher wrote: Dear Andreas, On Monday, November 28, 2011 you wrote: Any chance PV signatures on these disks were detected and therefore the VGs were activated automatically? If yes, please adjust your lvm.conf to ignore it. Don't forget to recreate your initrd/initramfs to also update it in there. This could have happened. Hm, I am still wondering. If I adjust my lvm.conf accordingly to avoid them being activated, this still will not repair the damage on my drbd disks. Why do you think there is something damaged? Right? My drbd disks may still be inaccessible? I'd expect the disks to be attachable after you have stopped all VGs. Deactivating/removing the LVM cache in addition to the extended filtering is also a good idea. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
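A sketch of the lvm.conf change being discussed, assuming /dev/sda4 and /dev/sda6 are the DRBD backing devices as in the log above; the exact filter depends on which PVs the host itself still needs, and the initramfs has to be rebuilt so the early-boot LVM scan follows the same rules:

# /etc/lvm/lvm.conf
devices {
    # never scan the raw DRBD backing devices for PV signatures
    filter = [ "r|^/dev/sda4$|", "r|^/dev/sda6$|", "a|.*|" ]
    write_cache_state = 0
}

# drop a possibly stale device cache and rebuild the initramfs (CentOS 6)
rm -f /etc/lvm/cache/.cache
dracut -f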
Re: [DRBD-user] Any experiences with this kind of setup
Hello, On 11/23/2011 10:00 AM, Thomas Reinhold wrote: Hi list, I'm operating several DRBD clusters and am currently planning a new one. I just would like to know if you have any experience with or suggestions for this kind of stack: - RAID-Controller: LSI MegaRAID SAS 9260-4i with BBU and SAS-HDDs -- DRBD (protocol C, 2-node) --- LVM2 dm-crypt (aes:xts-plain:sha1:512) - VM (KVM) -- DBMS (MySQL) Everything is based on Ubuntu LTS (10.04). The DRBD version shipped with the OS is 8.3.7 with Ubuntu patches (2:8.3.7-1ubuntu2.2). We have such a setup running. On Debian Squeeze, backports kernel and latest DRBD 8.3 ... works like a charm ;-) Any No-Nos in here? Any comments greatly appreciated! Not really, though you want to make sure you have aesni_intel module available when running on current Intel Xeon hardware to get support for AES-NI. In combination with a kernel that supports multiple encryption pipes you get nearly native write performance ... without you are stuck to what one cpu can encrypt. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Thanks, Thomas ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
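A quick way to check the AES-NI point; the module and CPU flag names are the standard ones, and whether the distribution kernel loads aesni_intel automatically varies:

grep -m1 -o aes /proc/cpuinfo       # CPU advertises AES-NI?
modprobe aesni_intel
lsmod | grep aesni
grep -A2 '^name.*aes' /proc/crypto  # accelerated implementations registered?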
Re: [DRBD-user] How to recover data from node3
On 11/16/2011 01:58 PM, fosiul alam wrote: Hi Andreas Thanks for your response. I read that link so many times. and tryed what its say.. but not luck bellow what i have done so far .. Denmkar link : root@drbd-drs:~# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res csro ds p mounted fstype 10:home-U WFConnection Primary/Unknown UpToDate/DUnknown A 11:data-U WFConnection Primary/Unknown UpToDate/DUnknown A Uk link : root@drbd1:~# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res cs ro ds p mounted fstype 0:home Connected Secondary/Secondary UpToDate/UpToDate C 1:data Connected Secondary/Secondary UpToDate/UpToDate C You have to bring up the stacked resources in secondary mode on either drbd1 or drbd2 ... without this step, there is no device drbd3 can connect to. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now now I need to make syncronization bewtween Drbd-drs to Drbd1 Because its Split Brain according to document. I need to tell drbd1 to use as secondary and syncronized from drbd-drs so in drbd1 root@drbd1:~# drbdadm secondary --stacked data-U 11: Failure: (127) Device minor not allocated Command 'drbdsetup 11 secondary' terminated with exit code 10 so its does not take the secondary command .. in drbd-drs ( im teling it to be primary ) root@drbd-drs:~# drbdadm primary data-U root@drbd-drs:~# drbdadm primary home-U root@drbd-drs:~# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res csro ds p mounted fstype 10:home-U WFConnection Primary/Unknown UpToDate/DUnknown A 11:data-U WFConnection Primary/Unknown UpToDate/DUnknown A root@drbd-drs:~# drbdadm connect data-U 11: Failure: (125) Device has a net-config (use disconnect first) Command 'drbdsetup 11 net 172.31.3.4:7789 http://172.31.3.4:7789 172.31.2.4:7789 http://172.31.2.4:7789 A --set-defaults --create-device --shared-secret=secret' terminated with exit code 10 so it does take those spilit brain command .. I am missign something but dont understand what .. what steps are those .. thanks for your help ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
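A sketch of the missing step, with the resource names from this thread; it assumes the lower resources home and data are Primary on the node where the stacked devices are brought up:

# on drbd1 (Primary for the lower resources home and data):
drbdadm up --stacked home-U
drbdadm up --stacked data-U
/etc/init.d/drbd status     # home-U and data-U should now appear as Secondary,
                            # so drbd-drs finally has a peer device to connect to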
Re: [DRBD-user] How to recover data from node3
On 11/16/2011 02:41 PM, fosiul alam wrote: Hi I was trying to simulate the process again .. Now when i tryed to make drbd1 as Secondary root@drbd1:~# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res cs ro ds p mounted fstype 0:homeConnected Primary/Secondary UpToDate/UpToDate C 1:dataConnected Primary/Secondary UpToDate/UpToDate C 10:home-U^^0 StandAlone Secondary/Unknown UpToDate/DUnknown r 11:data-U^^1 StandAlone Secondary/Unknown UpToDate/DUnknown r fine ... so nearly ready for resync Now as described in documentation.. in DRBD-DRS root@drbd-drs:/# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res cs ro ds p mounted fstype 10:home-U StandAlone Primary/Unknown UpToDate/DUnknown r 11:data-U StandAlone Primary/Unknown UpToDate/DUnknown r root@drbd-drs:/# drbdadm connect home-U root@drbd-drs:/# drbdadm connect data-U root@drbd-drs:/# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res csro ds p mounted fstype 10:home-U WFConnection Primary/Unknown UpToDate/DUnknown A 11:data-U WFConnection Primary/Unknown UpToDate/DUnknown A so its gone back to WFconnection .. yes, because drbd2 is in Standalone mode ... so no drbd network configuration on drbd2 ... now you can execute the drbdadm -S -- --discard-my-data connect _stacked_resource_ cmd you read in DRBD users guide on drbd2 if you want it to be Synctarget for drbd-drs. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Shall do you want me to do ? (a) I make primary on Low lever device on DRBD1 (B) I turn on Stacked device on DRBD1 (c) I set secondary on Stacked devices. (DRBD2) so Again I am missing something .. Please help me bit more .. Thanks for your help On 16 November 2011 13:20, fosiul alam expertal...@gmail.com mailto:expertal...@gmail.com wrote: Hi Felix and /Andreas/ thanks, its working now .. but only thing i have done different this time is invalidate the data in drbd1 .. i will simulate the same process couple of time more.. and will come back toyou ... Thanks again on both On 16 November 2011 13:00, Felix Frank f...@mpexnet.de mailto:f...@mpexnet.de wrote: Hi, On 11/16/2011 10:47 AM, fosiul alam wrote: root@drbd1:~# /etc/init.d/drbd status0:home Connected Secondary/Secondary UpToDate/UpToDate C 1:data Connected Secondary/Secondary UpToDate/UpToDate so if drbd1 is connected... Croot@drbd2:~# /etc/init.d/drbd status0:home WFConnection Primary/UnknownUpToDate/Outdated C 1:data WFConnection Secondary/Unknown UpToDate/Outdated C ...what is it connected to? I wonder. Getting both your local nodes in Secondary/Secondary is fine. The you must make one Primary and bring up the stacked resources. If you do get split brain between your local stacked resources and the remote resources, you do have to resolve it as hinted by Andreas. Bear in mind that your victim has stacked resources, you will have to discard its data using drbdadm --stacked. HTH, Felix ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
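Spelled out for this setup, the command Andreas mentions would look roughly like this — it assumes drbd-drs holds the data that should survive and drbd2 is the node that becomes sync target:

# on drbd2 (stacked split brain victim):
drbdadm -S -- --discard-my-data connect data-U
drbdadm -S -- --discard-my-data connect home-U
# on drbd-drs (split brain survivor), reconnect if it fell back to StandAlone:
drbdadm connect data-U
drbdadm connect home-U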
Re: [DRBD-user] How to recover data from node3
On 11/16/2011 04:53 PM, fosiul alam wrote: Hi Flex : bellow is my another simulation i have put step by step , I execute command on both drbd1 and drbd-drs. i have posted the output after each effect from both server. DRBD1 and DRBD2 is OFF (uk is off) DRBD-DRS is on and primary root@drbd-drs:/# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res csro ds p mounted fstype 10:home-U WFConnection Primary/Unknown UpToDate/DUnknown A 11:data-U WFConnection Primary/Unknown UpToDate/DUnknown A DRBd1 : root@drbd1:~# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res cs ro ds p mounted fstype 0:home Connected Primary/Secondary UpToDate/UpToDate C 1:data Connected Primary/Secondary UpToDate/UpToDate C root@drbd1:~# root@drbd1:~# drbdadm up --stacked data-U root@drbd1:~# drbdadm up --stacked home-U root@drbd1:~# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res cs ro ds p mounted fstype 0:homeConnected Primary/Secondary UpToDate/UpToDate C 1:dataConnected Primary/Secondary UpToDate/UpToDate C 10:home-U^^0 StandAlone Secondary/Unknown UpToDate/DUnknown r 11:data-U^^1 StandAlone Secondary/Unknown UpToDate/DUnknown r Afater I execute previous command, when i check DRBD-DRS : root@drbd-drs:/# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res cs ro ds p mounted fstype 10:home-U StandAlone Primary/Unknown UpToDate/DUnknown r 11:data-U StandAlone Primary/Unknown UpToDate/DUnknown r -- DRBD1 :: root@drbd1:~# drbdadm connect --stacked data-U root@drbd1:~# drbdadm connect --stacked home-U root@drbd1:~# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res csro ds p mounted fstype 0:homeConnected Primary/Secondary UpToDate/UpToDate C 1:dataConnected Primary/Secondary UpToDate/UpToDate C 10:home-U^^0 WFConnection Secondary/Unknown UpToDate/DUnknown A 11:data-U^^1 WFConnection Secondary/Unknown UpToDate/DUnknown A DRBD-DRS: root@drbd-drs:/# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res cs ro ds p mounted fstype 10:home-U StandAlone Primary/Unknown UpToDate/DUnknown r 11:data-U StandAlone Primary/Unknown UpToDate/DUnknown r -- Now if i execute drbdadm connect on DRBD-DRS : root@drbd-drs:/# drbdadm connect data-U root@drbd-drs:/# drbdadm connect home-U root@drbd-drs:/# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res cs ro ds p mounted fstype 10:home-U StandAlone Primary/Unknown UpToDate/DUnknown r 11:data-U StandAlone Primary/Unknown UpToDate/DUnknown r now output from DRBD1 root@drbd1:~# /etc/init.d/drbd status drbd driver loaded OK; device status: version: 8.3.7 (api:88/proto:86-91) srcversion: EE47D8BF18AC166BE219757 m:res cs ro ds p mounted fstype 0:homeConnected Primary/Secondary UpToDate/UpToDate C 1:dataConnected Primary/Secondary UpToDate/UpToDate C 10:home-U^^0 StandAlone Secondary/Unknown UpToDate/DUnknown r 11:data-U^^1 StandAlone Secondary/Unknown UpToDate/DUnknown r --- So connect does not do anything ... .. Did you _really_ read this? 
http://www.drbd.org/users-guide-legacy/s-resolve-split-brain.html I strongly doubt! ... there is really no need to invalidate and do a full sync if this is only a split brain situation. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now now if i invalidate .. in DRBD1 root@drbd1:~# drbdadm invalidate --stacked data-U root@drbd1:~# drbdadm invalidate --stacked home-U root@drbd1:~# /etc/init.d/drbd status drbd driver loaded OK; device status: version:
Re: [DRBD-user] Can i add Third Node with existing DRBD setup ?
On 11/16/2011 06:13 PM, fosiul alam wrote: Hi, I just completed my test of DRBD with 3 nodes. Our situation is like this: we will have two nodes in the UK and one node in Denmark. Suppose I create the two UK nodes with internal meta-disk options, with the drbd.conf below http://www.drbd.org/users-guide/re-drbdconf.html and it runs for a couple of months ... then if I want to add a 3rd node, will I be able to do this without losing the data of node1 and node2? If you already know for sure you will add a third node you can set up your system with stacked devices from the beginning: http://www.drbd.org/users-guide-8.3/s-three-nodes.html ... and run without the third node. This has the advantage that there will be no service downtime when adding the third node. If you want to start with a two-node setup, this is also possible, but you need to prepare your setup to keep some free space at the end of the device for the internal meta-data that is obligatory for a stacked resource. No problem if you use LVM, then you can resize the device later ... or you create a file system smaller than the device ... or change to external meta-data for the lower resource later ... there are several possibilities ... Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Please advise .. Thanks Fosiul. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
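A rough sketch of the LVM variant of "keep some free space", with made-up names and sizes; internal meta data is small (later in this digest 128 MB covers roughly 4 TB of data), so a little headroom goes a long way:

# two-node phase: create the filesystem somewhat smaller than /dev/drbd0,
# e.g. by passing an explicit block count to mkfs, so the stacked resource's
# internal meta data fits at the end of the device later:
mkfs.ext3 /dev/drbd0 <blocks-somewhat-below-device-size>
# or, if no space was reserved, grow the LVM-backed lower device when the
# third node arrives (on both lower nodes), then resize the lower resource:
lvextend -L +1G <vg>/<lower_lv>
drbdadm resize r0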
Re: [DRBD-user] mkfs on /dev/drdb0 crashes host
On 11/10/2011 08:49 AM, glennv wrote: Dear drbd experts, please, some help for a newbie. I checked the whole internet and found only one similar post, but without any solution or hints. Two 32-bit Linux server nodes, 11.10 (in VMware Fusion). A 30GB device is set up on both nodes and the initial sync is fine. I can switch from primary to secondary etc. But the moment I want to create a filesystem on the primary (mkfs.ext3 /dev/drdb0) the node crashes, every time halfway through the mkfs. Any ideas or hints? drbd version+config? kernel version? kernel logs? ... any information? Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
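For reference, a sketch of how to collect exactly the information being asked for (the resource name is just an example):

cat /proc/drbd              # DRBD version and current resource state
drbdadm dump r0             # effective configuration of the resource
uname -a                    # kernel version
dmesg | tail -n 200         # kernel messages from around the crash during mkfs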
Re: [DRBD-user] Increasing a DRBD array
On 10/22/2011 05:47 PM, Gerald Brandt wrote: Okay, this is the plan for changing an primary/secondary drbd from 'meta-disk /dev/sda6[1] to 'flexible-meta-disk /dev/sda6' sda6 is already 512 MB in size, which in theory will let me go to 16 TB storage (128 MB is 4 TB storage). yes, 512MB is fine on secondary: 1. drbdadm down iscsi.target.0 2. edit drbd.conf to reflect meta-disk change (also on primary?) no, do it on the secondary only for now 3. drbdadm create-md iscsi.target.0 4. drbdadm up iscsi.target.0 5. wait for full sync to take place on primary you mean on the previous primary ... as you might want to do a switchover to the already prepared secondary 6-10 same as 1 - 5 above At with point I should be able to do drbdadm resize iscsi.target.0, and I should see 6TB of storage. after point 4 where the previous primary connects to the already updated previous secondary the resize starts automatically as the two nodes detect they both have more diskspace and metadata that can handle this ... so you will see a resync of the missing 2TB That that sound right to everyone? It should give me no downtime and the ability to have more than 4TB storage. No downtime? ... except the time you need to switch Roles and therefore migrate services ... but I assume you let your cluster manager do this job. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Gerald - Original Message - From: Gerald Brandt g...@majentis.com To: Andreas Kurz andr...@hastexo.com Cc: drbd-user drbd-user@lists.linbit.com Sent: Saturday, October 22, 2011 8:33:15 AM Subject: Re: [DRBD-user] Increasing a DRBD array Hi, - Original Message - From: Andreas Kurz andr...@hastexo.com To: drbd-user drbd-user@lists.linbit.com Sent: Friday, October 21, 2011 6:12:11 PM Subject: Re: [DRBD-user] Increasing a DRBD array On 10/21/2011 11:39 PM, Gerald Brandt wrote: Hi, I just saw that (google is my friend). Can I change that on a running drbd system? hmm ... never tried changing meta-date that way ... shutdown, dump-md, reconfigure, create-md, restore-md might work ... maybe Lars has a hint ... I would bring DRBD down on both nodes, stop it when all is in sync and recreate the meta data after changing the config and then skip the initial sync when bringing them up. I really can't bring the nodes down. I can bring down one at a time, but the systems have to stay running. ie: original: on iscsi-filer-1 {simply use meta-disk internal; device /dev/drbd1; disk/dev/md0; address 192.168.95.1:7789; meta-disk /dev/sda6[1]; } on iscsi-filer-2 { device /dev/drbd1; disk/dev/md0; address 192.168.95.2:7789; meta-disk /dev/sda6[1]; } new: on iscsi-filer-1 { device /dev/drbd1; disk/dev/md0; address 192.168.95.1:7789; flexible-meta-disk /dev/sda6[1]; no ... that index thing only works for static meta-disk ... remove the [1] and resize /dev/sda6 if its not bigger than 196MB. I'm not sure I understand. /dev/sda6 is already 512 MB (I think). Should I change to: on iscsi-filer-1 { device /dev/drbd1; disk/dev/md0; address 192.168.95.1:7789; flexible-meta-disk /dev/sda6; } on iscsi-filer-2 { device /dev/drbd1; disk/dev/md0; address 192.168.95.2:7789; flexible-meta-disk /dev/sda6; } or would this be better: on iscsi-filer-1 { device /dev/drbd1; disk/dev/md0; address 192.168.95.1:7789; meta-disk internal; } on iscsi-filer-2 { device /dev/drbd1; disk/dev/md0; address 192.168.95.2:7789; meta-disk internal; } I'll go back to my lists to see if I'm doing things right. 1. bring down the secondary 2. 
change the secondary to 'flexible-meta-data /dev/sda6' in drbd.conf on primary and secondary. 3. bring secondary back up (may re-sync entire disk, not a serious issue, just time) 4. repeat process for primary after re-sync (may cause another complete resync). Gerald Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now } on iscsi-filer-2 { device /dev/drbd1; disk/dev/md0; address 192.168.95.2:7789; flexible-meta-disk /dev/sda6[1]; } Then reboot primary, followed by reboot secondary (after sync), and all will be well? Sorry if these seem to be noob questions. I just want to be 100% sure, as the file servers have live data on them. Gerald - Original Message - From: Andreas Kurz andr...@hastexo.com To: drbd-user@lists.linbit.com Sent: Friday, October 21, 2011 4:26:13 PM Subject: Re: [DRBD-user] Increasing a DRBD array
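As a command sequence the plan reads roughly like this, using the names from the thread; the drbd.conf edit is done by hand on the node in question right before its create-md:

# on the current secondary (iscsi-filer-2):
drbdadm down iscsi.target.0
#   edit drbd.conf on this node: flexible-meta-disk /dev/sda6;   (no [1] index)
drbdadm create-md iscsi.target.0
drbdadm up iscsi.target.0           # a full resync from the primary follows
# after the resync: let the cluster manager switch roles, then repeat the
# same steps on the former primary
# once both sides carry flexible meta data and md0 has been grown on both nodes:
drbdadm resize iscsi.target.0       # or simply reconnect -- DRBD also grows on attach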
Re: [DRBD-user] Questions Regarding Configuration
On 10/23/2011 09:39 PM, Nick Khamis wrote: The following works as expected:
node mydrbd1 \
    attributes standby=off
node mydrbd2 \
    attributes standby=off
primitive myIP ocf:heartbeat:IPaddr2 \
    op monitor interval=60 timeout=20 \
    params ip=192.168.2.5 cidr_netmask=24 \
    nic=eth1 broadcast=192.168.2.255 \
    lvs_support=true
primitive myDRBD ocf:linbit:drbd \
    params drbd_resource=r0.res \
    op monitor role=Master interval=10 \
    op monitor role=Slave interval=30
ms msMyDRBD myDRBD \
    meta master-max=1 master-node-max=1 \
    clone-max=2 clone-node-max=1 \
    notify=true globally-unique=false
group MyServices myIP
order drbdAfterIP \
    inf: myIP msMyDRBD
location prefer-mysql1 MyServices inf: mydrbd1
location prefer-mysql2 MyServices inf: mydrbd2 ??
property $id=cib-bootstrap-options \
    no-quorum-policy=ignore \
    stonith-enabled=false \
    expected-quorum-votes=5 \
    dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
    cluster-recheck-interval=0 \
    cluster-infrastructure=openais
rsc_defaults $id=rsc-options \
    resource-stickiness=100
However, when modifying the order entry to:
order drbdAfterIP \
    inf: myIP:promote msMyDRBD:start
DRBD no longer works. And when adding the following colocation: yes, the promote of the IP will never happen as it is a) only configured as a primitive and b) IPaddr2 does not support a promote action ... no IP promote, no DRBD start ...
colocation drbdOnIP \
    inf: MyServices msMyDRBD:Master
none of the resources work. Have you tried removing those two obscure location constraints? Regards, Andreas -- Need help with Pacemaker? http://www.hastexo.com/now signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
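The usual pattern for DRBD with Pacemaker is the other way around — colocate the group with the DRBD master and start it only after the promote; a sketch in crm syntax, with made-up constraint names:

colocation services_on_drbd_master inf: MyServices msMyDRBD:Master
order services_after_drbd inf: msMyDRBD:promote MyServices:start
# ...and drop (or give sane finite scores to) the two inf: location constraints.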
Re: [DRBD-user] [Linux-HA] violate uniqueness for parameter drbd_resource
Hello, On 10/19/2011 04:49 PM, Nick Khamis wrote: Hello Everyone, What we have is a 4 node cluster: 2 Running mysql on a active/passive, and 2 running our application on an active/active: MyDRBD1 and MyDRBD2: Mysql, DRBD (active/passive) ASTDRBD1 and ASTDRBD2: In-house application, DRBD dual primary A snippet of our config looks like this: node mydrbd1 \ attributes standby=off node mydrbd2 \ attributes standby=off node astdrbd1 \ attributes standby=off node astdrbd2 \ attributes standby=off primitive drbd_mysql ocf:linbit:drbd \ params drbd_resource=r0.res \ op monitor role=Master interval=10 \ op monitor role=Slave interval=30 . primitive drbd_asterisk ocf:linbit:drbd \ params drbd_resource=r0.res \ op monitor interval=20 timeout=20 role=Master \ op monitor interval=30 timeout=20 role=Slave ms ms_drbd_asterisk drbd_asterisk \ meta master-max=2 notify=true \ interleave=true group MyServices myIP fs_mysql mysql \ meta target-role=Started group ASTServices astIP asteriskDLM asteriskO2CB fs_asterisk \ meta target-role=Started . I am recieving the following warning: WARNING: Resources drbd_asterisk,drbd_mysql violate uniqueness for parameter drbd_resource: r0.res Now the obvious thing to do is to change the resource name at the DRBD level however, I assumed that the parameter uniqueness was bound to the primitive? Only one resource per cluster should use this value for this attribute if it is marked globally-unique in the RA meta-information. Do yourself a favour and give the DRBD resources a meaningful name, how about asterisk and mysql ;-) My second quick question is, I like to use group + location to single out services on specific nodes however, when creating clones: clone cloneDLM asteriskDLM meta globally-unique=false interleave=true I am recieving ERROR: asteriskDLM already in use at ASTServices error? My question is, what are the benefits of using group + location vs. clone + location? Once a resource is in a group it cannot be used for clones/MS any more ... though you can clone a group or make it MS. With the latter I assue we will have a long list of location (one for each primitive + node)? And with the former we do not have he meta information (globally-unique, and interleave)? I assume you want to manage a cluster filesystem ... so put all the dlm/o2cb/cluster-fs resources in a group and clone it (and use interleave for this clone) Regards, Andreas -- Need help with Pacemaker or DRBD? http://www.hastexo.com/now signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
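A sketch in crm syntax of the cloned group Andreas suggests — it assumes asteriskDLM, asteriskO2CB and fs_asterisk are first taken out of the ASTServices group; the constraint names are made up, and the colocation/order against ms_drbd_asterisk are one common way to tie the stack to the dual-primary DRBD:

group g_ocfs2 asteriskDLM asteriskO2CB fs_asterisk
clone cl_ocfs2 g_ocfs2 \
    meta interleave="true"
colocation ocfs2_on_drbd_master inf: cl_ocfs2 ms_drbd_asterisk:Master
order ocfs2_after_drbd inf: ms_drbd_asterisk:promote cl_ocfs2:start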
Re: [DRBD-user] Increasing a DRBD array
On 10/21/2011 10:48 PM, Gerald Brandt wrote: Hi, DRBD is running directly on md0. /dev/drbd1 is then exported via iSCSI. The logs show: Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332010] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91) Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332012] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@filer-1, 2011-03-05 08:29:38 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332014] drbd: registered as block device major 147 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332015] drbd: minor_table @ 0x88021dbaf300 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334144] block drbd1: Starting worker thread (from cqueue [1489]) Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334177] block drbd1: == truncating very big lower level device to currently maximum possible 8587575296 sectors == Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334179] block drbd1: == using internal or flexible meta data may help == Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334188] block drbd1: disk( Diskless - Attaching ) Oct 21 15:34:53 iscsi-filer-1 kernel: [7.353306] Loading iSCSI transport class v2.0-870. Oct 21 15:34:53 iscsi-filer-1 kernel: [7.360582] skge eth0: disabling interface Oct 21 15:34:53 iscsi-filer-1 iscsid: iSCSI logger with pid=1515 started! Oct 21 15:34:53 iscsi-filer-1 kernel: [7.381956] iscsi: registered transport (iser) Oct 21 15:34:53 iscsi-filer-1 init: ssh main process (1162) terminated with status 255 Oct 21 15:34:53 iscsi-filer-1 postfix/master[1411]: reload -- version 2.7.0, configuration /etc/postfix Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584678] block drbd1: Found 57 transactions (3507 active extents) in activity log. Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584684] block drbd1: Method to ensure write ordering: barrier Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584691] block drbd1: Backing device's merge_bvec_fn() = a00c0100 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584694] block drbd1: max_segment_size ( = BIO size ) = 4096 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584698] block drbd1: Adjusting my ra_pages to backing device's (32 - 96) Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584704] block drbd1: drbd_bm_resize called with capacity == 8587575296 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.603470] block drbd1: resync bitmap: bits=1073446912 words=16772608 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.603474] block drbd1: size = 4095 GB (4293787648 KB) The lines in yellow bug me. I don't recall see them before. I had a 4 disk RAID-6 md0 (4x2TB = 4 TB RAID-6). I added a single drive (5x2TB = 6 TB array). Not using meta-disk internal or flexible-meta-disk limits the device size to 4TB (=128MB meta data size) ... change your metadata config ... as the logs suggest ... if you want to use all 6TB Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Any ideas? Gerald From: Andreas Kurz andr...@hastexo.com To: drbd-user@lists.linbit.com Sent: Friday, October 21, 2011 2:55:06 PM Subject: Re: [DRBD-user] Increasing a DRBD array On 10/21/2011 09:30 PM, Gerald Brandt wrote: Hi, I've successfully resize the lower level RAID-6 array, and grown it. I'm now attempting to resize drbd, and nothing seems to happen. /dev/md0 is definitely bigger. What should I see during a drbd resize? You should see a DRBD resync of the newly added space. What is the lower level device of your DRBD resource? The whole md0, a partition on md0, a lv on a vg on a pv on md0? ... so if the lower level device has been resized on both nodes, DRBD should definitely grow on a drbdadm resize. 
Did I mention that on starting DRBD it is resized automatically if a bigger lower level device is detected? ... have a look at the kernel logs... Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Gerald - Original Message - From: Gerald Brandt g...@majentis.com To: drbd-user@lists.linbit.com Sent: Tuesday, October 18, 2011 7:08:55 AM Subject: Re: [DRBD-user] Increasing a DRBD array Hi, Okay, this is my list of what to do, and in what order: 1. remove the primary from DRBD 2. add the physical disk to the primary 3. add the primary back to DRBD and allow resync. 4. remove the secondary from DRBD 5. add the physical disk to the secondary 6. add the secondary back to DRBD and allow resync. 7. fdisk and add the disk to the RAID array on primary and secondary 8. grow the RAID array
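A small sketch of how to check whether the resize actually happened, using the resource name from this thread:

drbdadm resize iscsi.target.0
dmesg | grep drbd1 | tail       # look for 'drbd_bm_resize called with capacity == ...' and the reported size
cat /proc/drbd                  # a resync of the newly added space should be running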
Re: [DRBD-user] Increasing a DRBD array
On 10/21/2011 11:39 PM, Gerald Brandt wrote: Hi, I just saw that (google is my friend). Can I change that on a running drbd system? hmm ... never tried changing meta-date that way ... shutdown, dump-md, reconfigure, create-md, restore-md might work ... maybe Lars has a hint ... I would bring DRBD down on both nodes, stop it when all is in sync and recreate the meta data after changing the config and then skip the initial sync when bringing them up. ie: original: on iscsi-filer-1 {simply use meta-disk internal; device /dev/drbd1; disk/dev/md0; address 192.168.95.1:7789; meta-disk /dev/sda6[1]; } on iscsi-filer-2 { device /dev/drbd1; disk/dev/md0; address 192.168.95.2:7789; meta-disk /dev/sda6[1]; } new: on iscsi-filer-1 { device /dev/drbd1; disk/dev/md0; address 192.168.95.1:7789; flexible-meta-disk /dev/sda6[1]; no ... that index thing only works for static meta-disk ... remove the [1] and resize /dev/sda6 if its not bigger than 196MB. Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now } on iscsi-filer-2 { device /dev/drbd1; disk/dev/md0; address 192.168.95.2:7789; flexible-meta-disk /dev/sda6[1]; } Then reboot primary, followed by reboot secondary (after sync), and all will be well? Sorry if these seem to be noob questions. I just want to be 100% sure, as the file servers have live data on them. Gerald - Original Message - From: Andreas Kurz andr...@hastexo.com To: drbd-user@lists.linbit.com Sent: Friday, October 21, 2011 4:26:13 PM Subject: Re: [DRBD-user] Increasing a DRBD array On 10/21/2011 10:48 PM, Gerald Brandt wrote: Hi, DRBD is running directly on md0. /dev/drbd1 is then exported via iSCSI. The logs show: Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332010] drbd: initialized. Version: 8.3.7 (api:88/proto:86-91) Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332012] drbd: GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@filer-1, 2011-03-05 08:29:38 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332014] drbd: registered as block device major 147 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332015] drbd: minor_table @ 0x88021dbaf300 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334144] block drbd1: Starting worker thread (from cqueue [1489]) Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334177] block drbd1: == truncating very big lower level device to currently maximum possible 8587575296 sectors == Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334179] block drbd1: == using internal or flexible meta data may help == Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334188] block drbd1: disk( Diskless - Attaching ) Oct 21 15:34:53 iscsi-filer-1 kernel: [7.353306] Loading iSCSI transport class v2.0-870. Oct 21 15:34:53 iscsi-filer-1 kernel: [7.360582] skge eth0: disabling interface Oct 21 15:34:53 iscsi-filer-1 iscsid: iSCSI logger with pid=1515 started! Oct 21 15:34:53 iscsi-filer-1 kernel: [7.381956] iscsi: registered transport (iser) Oct 21 15:34:53 iscsi-filer-1 init: ssh main process (1162) terminated with status 255 Oct 21 15:34:53 iscsi-filer-1 postfix/master[1411]: reload -- version 2.7.0, configuration /etc/postfix Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584678] block drbd1: Found 57 transactions (3507 active extents) in activity log. 
Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584684] block drbd1: Method to ensure write ordering: barrier Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584691] block drbd1: Backing device's merge_bvec_fn() = a00c0100 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584694] block drbd1: max_segment_size ( = BIO size ) = 4096 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584698] block drbd1: Adjusting my ra_pages to backing device's (32 - 96) Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584704] block drbd1: drbd_bm_resize called with capacity == 8587575296 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.603470] block drbd1: resync bitmap: bits=1073446912 words=16772608 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.603474] block drbd1: size = 4095 GB (4293787648 KB) The lines in yellow bug me. I don't recall see them before. I had a 4 disk RAID-6 md0 (4x2TB = 4 TB RAID-6). I added a single drive (5x2TB = 6 TB array). Not using meta-disk internal or flexible-meta-disk limits the device size to 4TB (=128MB meta data size) ... change your metadata config ... as the logs suggest ... if you want to use all 6TB Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now Any ideas? Gerald From: Andreas Kurz andr...@hastexo.com To: drbd-user
Re: [DRBD-user] DRBD on Encrypted FS
hello, On 10/06/2011 12:24 AM, Bill Asher wrote: Today I did a little test to see if I could configure DRBD on encrypted LVs and what I found is it didn't work for me... Because the servers are located in a colo, security for the servers is the main reasoning. All seems to go well until I tell DRBD to mirror filerA's logical volume (/dev/vg/data) to filerB's LV (/dev/vg/data). I then received errors on the console like this, over and over: Block drbd0: open(/dev/vg/data) failed with -16 I then rebooted to the Ubuntu CD to look at the LVs and .. they were all gone. The only thing the partitioner sees is the two partitions I created, one for /boot, the other for logical volumes, but all my LVM tables were gone. I was able to repeat this issue on both my filers. So my questions are: a) can this even be done, encrypting the filesystem and then configuring DRBD? b) if encryption can be done, is my approach wrong? Thank you in advance for your time. ... if you want to encrypt a _blockdevice_, one possible solution is:
* encrypt a complete partition/disk with dm-crypt/LUKS/cryptsetup
* use this encrypted dm device as PV for your VG(s)
* create an LV per DRBD device
After every reboot you need to activate the encrypted partition using cryptsetup and e.g. your passphrase, and you have to do a vgscan/vgchange prior to the activation of DRBD. And if you own a recent Intel CPU supporting AES-NI, in combination with a recent kernel like 2.6.39 (which supports multiple encryption pipes) and the aesni_intel driver, you get damn fast and secure replicated storage ;-) Regards, Andreas -- Need help with DRBD? http://www.hastexo.com/now signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
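Spelled out, the layering described above could look like this; partition, mapping and VG/LV names are placeholders:

cryptsetup luksFormat /dev/sda3
cryptsetup luksOpen /dev/sda3 cryptpv
pvcreate /dev/mapper/cryptpv
vgcreate vg_secure /dev/mapper/cryptpv
lvcreate -L 100G -n r0 vg_secure          # one LV per DRBD resource, used as its backing 'disk'
# after every reboot, before DRBD is started:
cryptsetup luksOpen /dev/sda3 cryptpv     # asks for the passphrase
vgscan && vgchange -ay vg_secure
/etc/init.d/drbd start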
Re: [DRBD-user] Recover from Split-Brain
On 09/07/2011 09:19 AM, Christian Völker wrote: Hi, as I just sent out, I had a short outage which ended in a split-brain scenario. I'm trying to recover from this now and have all drbd devices back again. Unfortunately I can't recover from the split brain. Could someone help me, please? This is the current state on the primary:
[root@backuppc ~]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88) please consider an update!
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-i386-build, 2008-10-03 11:42:32
0: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown A r--- ns:0 nr:0 dw:68288 dr:820201 al:2641 bm:2636 lo:0 pe:0 ua:0 ap:0 oos:582912
This is the state on the secondary:
[root@drbd ~]# cat /proc/drbd
version: 8.2.6 (api:88/proto:86-88)
GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-i386-build, 2008-10-03 11:42:32
0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown r--- ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 oos:0
Now I tried to recover manually as shown in the drbd documentation (http://www.drbd.org/users-guide/s-resolve-split-brain.html), but it have a look at http://www.drbd.org/users-guide-legacy/s-resolve-split-brain.html instead -- the guide under /users-guide/ now documents the DRBD 8.4 syntax; the legacy guide gives you the old documentation with the old (still supported) cmdline syntax. Regards, Andreas doesn't know this special parameter:
[root@drbd ~]# drbdadm secondary drbd0
[root@drbd ~]# drbdadm connect --discard-my-data drbd0
drbdadm: unrecognized option `--discard-my-data' try 'drbdadm help'
So how can I recover now? Greetings Christian ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
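With the pre-8.4 syntax from the legacy guide, the recovery on this pair would be roughly:

# on the split brain victim whose changes are discarded (drbd, the secondary):
drbdadm secondary drbd0
drbdadm -- --discard-my-data connect drbd0
# on the survivor (backuppc), only needed if it is StandAlone as well:
drbdadm connect drbd0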
Re: [DRBD-user] Directly connected GigE ports bonded together no switch
On 2011-08-09 16:46, Herman wrote: Sorry if this is covered elsewhere. I know the Linux Bonding FAQ is supposed to talk about this, but I didn't see anything specific in it on what parameters to use. Basically, I want to bond two GigE ports between two servers which are connected with straight cables with no switch and use them for DRBD. I tried the various bonding modes with miimon=100, but none of them worked. Say the eth1 ports on both servers were cabled together, and the same for eth5. Then, I could create the bond with eth1 and eth5. However, if I downed one of the ports on one server, say eth1, it would failover on that server to eth5, but the other server would not failover to eth5. Eventually, I decided to use arp_interval=100 and arp_ip_target=ip of other bonded pair instead of miimon=100. This seems to work as I expected, with the bond properly failing over. Is this the right way to do this kind of bonding? Also, right now I'm using mode=active-backup. Would one of the other modes allow higher throughput and still allow automatic failover and transparency to DRBD? use balance-rr and e.g. miimon=100, that should do fine Regards, Andreas Thanks, Herman ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
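For reference, a sketch of both variants as bonding module options — where they go (modprobe.conf, BONDING_OPTS in ifcfg-bond0, or /etc/network/interfaces) depends on the distribution, and the ARP target address is only an example:

# round-robin with MII link monitoring, as suggested above:
options bond0 mode=balance-rr miimon=100
# the ARP-monitored active-backup variant the original poster ended up with:
options bond0 mode=active-backup arp_interval=100 arp_ip_target=192.168.1.2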
Re: [DRBD-user] Email Notfications
On 03/06/2011 09:55 PM, Matt Graham wrote: From: Gerald Brandt g...@majentis.com Is there a way to get email notifications when the servers are syncing, similar to the way mdadm does? you can use the before-resync-target, after-resync-target handlers to send a notification on start/end of a sync Regards, Andreas Within DRBD? No. That's not DRBD's job. That job is best handled by something like Nagios. Nagios is a bit heavyweight for *just* monitoring DRBD, but if you have ~70 machines all running various services, Nagios can make your life a hell of a lot easier. If you just want to monitor DRBD sync status, put together a shell or Perl script that runs on both nodes every 10 min via cron and mails a list of people when /proc/drbd matches /SyncSource|SyncTarget/ . You can search for check-drbd to find a Nagios plugin that does that; modify it for your needs. On a side note, I'd also like email notification when HA switches servers. Which HA system are you talking about? Pacemaker, Corosync, heartbeat? The answer to this will almost certainly be found in the Fine Manual for the HA system you're using. signature.asc Description: OpenPGP digital signature ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
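A sketch of those handlers in drbd.conf; the script path, its contents and the mail recipient are made up:

handlers {
    before-resync-target "/usr/local/sbin/drbd-resync-mail.sh started";
    after-resync-target  "/usr/local/sbin/drbd-resync-mail.sh finished";
}
# /usr/local/sbin/drbd-resync-mail.sh:
#   #!/bin/sh
#   echo "$DRBD_RESOURCE on $(hostname): resync $1" | mail -s "DRBD resync $1" admin@example.com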
Re: [DRBD-user] Sgisdela! Help! Horrible DRBD performance on CentOS 5. Where do I start looking?
Hello again, On Thursday 10 June 2010 02:32:18 Michael Joyner wrote: Two node setup for serving out NFS to vSphere. timing test is : sync; time iozone -a -e -g 4096 /data/nfs is on LVM striped across 3 DRBD devices. EXT4. data=journal. /var/tmp is local filesystem. ... don't do this! There is no guarantee all 3 DRBD devices are up-to-date at the same time Regards, Andreas both are on same set of disk platters and controller. Raid 6. 64K blocksize. FYI, the initial sync (5 TB) used a full gigabit of bandwidth without hesitation. === DRBD TIMES (both nodes up) == real0m10.028s user0m0.116s sys0m1.538s real0m10.045s user0m0.085s sys0m1.541s real0m9.990s user0m0.091s sys0m1.578s real0m9.970s user0m0.099s sys0m1.557s real0m9.960s user0m0.091s sys0m1.499s === DRBD TIMES (2nd node down) == real0m3.754s user0m0.093s sys0m1.070s real0m3.855s user0m0.079s sys0m1.064s real0m3.938s user0m0.094s sys0m1.044s real0m3.809s user0m0.066s sys0m1.067s real0m3.863s user0m0.069s sys0m1.072s === LOCAL TIMES == real0m1.770s user0m0.070s sys0m0.977s real0m1.809s user0m0.085s sys0m0.974s real0m1.737s user0m0.067s sys0m0.942s real0m2.007s user0m0.058s sys0m0.955s real0m1.808s user0m0.072s sys0m0.956s === After re-enabling 2nd node == r...@san-node-2 ~]# cat /proc/drbd version: 8.3.2 (api:88/proto:86-90) GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by mockbu...@v20z-x86-64.home.local, 2009-08-29 14:07:55 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r ns:0 nr:5180 dw:5180 dr:0 al:0 bm:24 lo:1 pe:1389 ua:0 ap:0 ep:1 wo:b oos:43964 [==.] sync'ed: 16.7% (43964/49144)K finish: 0:00:08 speed: 5,180 (5,180) K/sec 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r ns:0 nr:49044 dw:49044 dr:0 al:0 bm:44 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0 2: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r ns:0 nr:6896 dw:6896 dr:0 al:0 bm:28 lo:1 pe:1346 ua:0 ap:0 ep:1 wo:b oos:42436 [===] sync'ed: 23.1% (42436/49332)K finish: 0:00:05 speed: 6,896 (6,896) K/sec === ifconfig == bond0:0 Link encap:Ethernet HWaddr 00:1B:21:26:B8:18 inet addr:192.168.XXX.XX0 Bcast:192.168.XXX.255 Mask:255.255.255.0 UP BROADCAST RUNNING MASTER MULTICAST MTU:9000 Metric:1 === rpm -qa|grep drbd == kmod-drbd83-8.3.2-6.el5_3 drbd83-8.3.2-6.el5_3 === uname -a == Linux san-node-1.ewc.edu 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux === cat /etc/issue == CentOS release 5.5 (Final) Kernel \r on an \m === lspci == 01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078 (rev 04) === box to box copy test == dd if=/dev/sda of=testfile bs=1M count=1024 1024+0 records in 1024+0 records out 1073741824 bytes (1.1 GB) copied, 1.94192 seconds, 553 MB/s scp testfile r...@san-node-2:/var/tmp/testfile testfile100% 1024MB 46.6MB/s 00:22 === /etc/drbd.conf == global { usage-count yes; } common { protocol C; syncer { rate 70K; } net { sndbuf-size 0; after-sb-0pri discard-least-changes; after-sb-1pri discard-secondary; } startup { degr-wfc-timeout 3; } disk { no-disk-flushes; no-md-flushes; } } resource sdb { on san-node-1.ewc.edu { device/dev/drbd0; disk /dev/sdb; address 192.168.XXX.XX1:7789; meta-disk internal; } on san-node-2.ewc.edu { device/dev/drbd0; disk /dev/sdb; address 192.168.XXX.XX2:7789; meta-disk internal; } } resource sdc { on san-node-1.ewc.edu { device/dev/drbd1; disk /dev/sdc; address 192.168.XXX.XX1:7790; meta-disk internal; } on san-node-2.ewc.edu { device/dev/drbd1; disk /dev/sdc; address 192.168.XXX.XX2:7790; meta-disk internal; } } resource sdd { on 
san-node-1.ewc.edu { device/dev/drbd2; disk /dev/sdd; address 192.168.XXX.XX1:7791; meta-disk internal; } on san-node-2.ewc.edu { device/dev/drbd2; disk /dev/sdd; address 192.168.XXX.XX2:7791; meta-disk internal; } } -- : Andreas Kurz : LINBIT
Re: [DRBD-user] Sgisdela! Help! Horrible DRBD performance on CentOS 5. Where do I start looking?
On Thursday 10 June 2010 17:19:53 Michael Joyner wrote: better, but still way below box2box bandwidth is fs type or fact using lvm a factor? Here are my test results, will try w/o LVM next. I can really recommend Part V. Optimizing DRBD performance in the DRBD Users Guide ... or invest in a DRBD Healthcheck offered by Linbit ;-) Regards, Andreas On 06/10/2010 04:05 AM, Andreas Kurz wrote: LVM on your system supports barriers -- drbd supports barriers -- barriers are used to enforce write-after-write dependencies per default ... use drbd config-param 'no-disk-barrier' === new layout == using /dev/sdb as /dev/drbd0, non-spanned vg. === new drbd.conf global { usage-count yes; } common { protocol C; syncer { rate 70K; } net { sndbuf-size 0; after-sb-0pri discard-least-changes; after-sb-1pri discard-secondary; } startup { degr-wfc-timeout 3; } disk { no-disk-flushes; no-md-flushes; no-disk-barrier; } } resource sdb { on san-node-1.ewc.edu { device/dev/drbd0; disk /dev/sdb; address 192.168.75.201:7789; meta-disk internal; } on san-node-2.ewc.edu { device/dev/drbd0; disk /dev/sdb; address 192.168.75.202:7789; meta-disk internal; } } * === *** mount /dev/vgdrbd0/nfs1 /data *** === === iozone -a -e -g 4096 with peer down: real0m1.757s, real0m1.738s, real0m1.753s, real 0m1.762s, real0m1.745s === with peer up: real0m6.739s, real0m6.758s, real0m6.619s, real0m6.653s, real0m6.636s === big files test = === DRBD === rm -rfv /data/ISOS; rsync -a --verbose --human-readable --progress ~mjoyner/ISOS /data/ISOS building file list ... 8 files to consider created directory /data/ISOS ISOS/ ISOS/ISOS/ ISOS/ISOS/.lck-5a00b810 84 100%0.00kB/s0:00:00 (xfer#1, to-check=5/8) ISOS/ISOS/.lck-6500b810 84 100% 82.03kB/s0:00:00 (xfer#2, to-check=4/8) ISOS/ISOS/.lck-7700b810 84 100% 82.03kB/s0:00:00 (xfer#3, to-check=3/8) ISOS/ISOS/SW_DVD5_NTRL_SQL_Svr_2008_SP1_English_X15-51857.ISO 943.07M 100% 189.42MB/s0:00:04 (xfer#4, to-check=2/8) ISOS/ISOS/SW_DVD5_SQL_Svr_Enterprise_Edtn_2008_English_MLF_X14-89207.ISO 3.26G 100% 189.81MB/s0:00:16 (xfer#5, to-check=1/8) ISOS/ISOS/SW_DVD5_Windows_Svr_2008w_SP2_English__x64_DC_EE_SE_X15-41371.ISO 2.76G 100% 162.03MB/s0:00:16 (xfer#6, to-check=0/8) #1) sent 6.96G bytes received 164 bytes 190.60M bytes/sec #2) sent 6.96G bytes received 164 bytes 244.10M bytes/sec #3) sent 6.96G bytes received 164 bytes 244.10M bytes/sec === LOCAL FS === rm -rfv /var/tmp/ISOS; rsync -a --verbose --human-readable --progress ~mjoyner/ISOS /var/tmp/ISOS #1) sent 6.96G bytes received 164 bytes 167.63M bytes/sec #2) sent 6.96G bytes received 164 bytes 185.51M bytes/sec #3) sent 6.96G bytes received 164 bytes 220.85M bytes/sec === time umount /data real2m11.090s user0m0.000s sys1m49.566s * === *** mount -o sync,data=journal /dev/vgdrbd0/nfs1 /data *** === === iozone test, no peer real0m9.483s real0m9.583s real0m9.583s real0m9.559s real0m9.491s === izone test, w/ peer up. real0m41.558s real0m41.386s real0m41.456s real0m41.448s === BIG FILES TEST no peer) sent 6.96G bytes received 164 bytes 74.40M bytes/sec w/ peer) sent 6.96G bytes received 164 bytes 29.79M bytes/sec rsync/var/tmp2/var/tmp) sent 6.96G bytes received 164 bytes 47.49M bytes/sec === time umount /data real0m1.078s user0m0.000s sys0m1.075s ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user -- : Andreas Kurz : LINBIT | Your Way to High Availability : Tel +43-1-8178292-64, Fax +43-1-8178292-82 : : http://www.linbit.com LINBIT - We're the HA experts that other experts ask for help! 
http://www.linbit.com/en/training/ DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. This e-mail is solely for use by the intended recipient(s). Information contained in this e-mail and its attachments may be confidential, privileged or copyrighted. If you are not the intended recipient you are hereby formally notified that any use, copying, disclosure or distribution of the contents of this e-mail, in whole or in part, is prohibited. Also please notify immediately the sender by return e-mail and delete this e-mail
Re: [DRBD-user] backing_dev length value limited to 128 bytes
On Thursday 25 February 2010 12:10:40 Christian Iversen wrote: On 2010-02-25 10:21, Andreas Kurz wrote: On Wednesday 24 February 2010 14:28:38 Alexander Winkler wrote: Hello, I am curious if it would be possible (perhaps in a future version) to increase the max length for the backing_dev name to more than 128 bytes (maybe 255 bytes?)? In my current setup there are iscsi-targets whose names are generated by udev in the following manner, therefore exceeding the length-limitation: /dev/disk/by-path/ip-x.x.x.x:3260-iscsi-iqn.2001-05.com.equallogic:0-8a0 906 -9f4e88005-91300484b842-xxx-sda-lun-0 If this is not possible, any thoughts on how to circumvent this problem? use /dev/disk/by-id/ paths in your config But by-id paths look non-sensical with iSCSI. For instance, they could be /dev/disk/by-id/scsi-14945540003005f410d00 where the by-path name is very descriptive: /dev/disk/by-path/ip-10.0.0.120:3260-iscsi-iqn.2009-09.org.sikkerhed:sikker hedorg-swap-lun-10 In fact, how is it even possible to determine what disk you need when using by-id paths? I don't know one. e.g. # scsi_id -g -s /block/sdx ... where sdx is the destination of the disk/by-path link. Regards, Andreas ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
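One way to map a descriptive by-path link to its stable by-id sibling — the by-path name is a placeholder, scsi_id is used exactly as quoted above:

readlink -f /dev/disk/by-path/<your-iscsi-by-path-link>     # resolves to e.g. /dev/sdx
scsi_id -g -s /block/sdx                                     # the id used in the by-id name (old udev)
ls -l /dev/disk/by-id/ | grep sdx                            # lists the by-id symlink(s) pointing at that disk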
Re: [DRBD-user] First post regarding drbd
Bernie Wu wrote: Hi Listers, I am trying to test Linux-HA with DRBD. The active/passive test environment consists of: -zVM 5.4 -SLES10-SP2 -2 nodes/guests ( lnodbbt lnodbct ) -Oracle 10.2.0.4 I have managed to setup drbd and to mount the database filesystem on lnodbbt ( active/master ). Now I want to bring up Oracle. I have created a resource ORADB and am using the ocf scripts to start oracle which doesn't work. However, I can manually run the ocf startup script and oracle comes up. My questions follow: 1. Should I be using the LSB startup scripts instead of the ocf script ? from the log: oracle[28687][28819]: 2009/09/09_09:03:58 ERROR: Oracle dssd can not mount. ... looks like startup mount is not successful. The ocf script should work fine ... did you explicitly define the user for the oracle resource? If not give it a try. 2. How do I configure drbd so that lnodbbt is the master and lnodbct is the slave ? Add a location constraint to your cluster config eg: rsc_location id=prefer-lnodbbt rsc=RG_A node=lnodbbt score=100/ 3. Do I have to set up any constraints for the ORADB resource so that it only starts up on the guest that has the /dbms mounted and what would the constraints be ? No ... you defined a resource group which sets the needed constraints implicitly. Regards, Andreas Attached is my /etc/drbd.conf and etc/ha.d/ha.cf and ha-logs from lnodbbt ( the active/master ) guest. Any help or pointers would be much appreciated. TIA Bernie The information contained in this e-mail message is intended only for the personal and confidential use of the recipient(s) named above. This message may be an attorney-client communication and/or work product and as such is privileged and confidential. If the reader of this message is not the intended recipient or an agent responsible for delivering it to the intended recipient, you are hereby notified that you have received this document in error and that any review, dissemination, distribution, or copying of this message is strictly prohibited. If you have received this communication in error, please notify us immediately by e-mail, and delete the original message. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user -- : Andreas Kurz : LINBIT | Your Way to High Availability : Tel +43-1-8178292-64, Fax +43-1-8178292-82 : : http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. This e-mail is solely for use by the intended recipient(s). Information contained in this e-mail and its attachments may be confidential, privileged or copyrighted. If you are not the intended recipient you are hereby formally notified that any use, copying, disclosure or distribution of the contents of this e-mail, in whole or in part, is prohibited. Also please notify immediately the sender by return e-mail and delete this e-mail from your system. Thank you for your co-operation. ___ drbd-user mailing list drbd-user@lists.linbit.com http://lists.linbit.com/mailman/listinfo/drbd-user
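A sketch of the ORADB primitive with the user parameter set, in the heartbeat-2 CIB XML style used above; the SID is taken from the quoted log line ('dssd'), the Oracle home path is an assumption:

<primitive id="ORADB" class="ocf" provider="heartbeat" type="oracle">
  <instance_attributes id="ORADB_ia">
    <attributes>
      <nvpair id="ORADB_sid" name="sid" value="dssd"/>
      <nvpair id="ORADB_home" name="home" value="/u01/app/oracle/product/10.2.0/db_1"/>
      <nvpair id="ORADB_user" name="user" value="oracle"/>
    </attributes>
  </instance_attributes>
</primitive>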