Re: [DRBD-user] Having Trouble with LVM on DRBD

2016-02-25 Thread Andreas Kurz
Hello Eric,

On Thu, Feb 25, 2016 at 11:51 PM, Eric Robinson  wrote:
> I'm confused. I don't see the VG(s) and LV(s) under cluster control; have
> you done that bit?
>
> (blank stare)
>
> This is where I admit that I have no idea what you mean. I’ve been building
> clusters with drbd for a decade, and I’ve always had drbd on top of LVM and
> all has been well. This is the first time I have LVM on top of drbd. What am
> I missing?

Pacemaker needs to activate the VG once DRBD is primary ... this is
described here:

https://drbd.linbit.com/users-guide/s-nested-lvm.html ... and ...
https://drbd.linbit.com/users-guide/s-lvm-pacemaker.html
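
For illustration, a minimal crm shell sketch of that kind of setup (resource,
VG and mountpoint names are examples only; the guides linked above are the
authoritative reference):

  primitive p_drbd_r0 ocf:linbit:drbd params drbd_resource="r0" \
          op monitor interval="29s" role="Master" \
          op monitor interval="31s" role="Slave"
  ms ms_drbd_r0 p_drbd_r0 meta master-max="1" clone-max="2" notify="true"
  primitive p_lvm_vg0 ocf:heartbeat:LVM params volgrpname="vg0"
  primitive p_fs_data ocf:heartbeat:Filesystem \
          params device="/dev/vg0/lv_data" directory="/data" fstype="ext4"
  group g_storage p_lvm_vg0 p_fs_data
  colocation c_storage_on_drbd inf: g_storage ms_drbd_r0:Master
  order o_drbd_before_storage inf: ms_drbd_r0:promote g_storage:start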


Regards,
Andreas

>
> --Eric
>
>
>
>
>
>
>
>


Re: [DRBD-user] Severe disk IO problems

2013-09-06 Thread Andreas Kurz
On 2013-08-30 19:31, Stephen Marsh wrote:
 Hi all,
 
 I've recently upgraded to DRBD 8.4.3 (protocol C) on CentOS 6.4 (kernel
 3.10.10) with Xen 4.3.0 on hardware RAID10 with an Infiniband 20Gbit/sec
 replication link.
 
 For a few days now, we've been experiencing a very strange issue whereby
 (seemingly randomly) the system will become almost unresponsive, with
 iowait going to 100% on some (but not all) domUs and dom0, but even the
 domUs whose load remains stable will still be incredibly sluggish. The
 problem occurs even when the resources are in standalone mode.

One thing I would check: if you are running the credit scheduler, dom0 may
have run out of credits. Check/increase the credit scheduler domain weights
and make sure dom0 gets enough CPU time to serve I/O requests ... this is
explained in the Xen wiki: http://goo.gl/fqtS6Y
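
For example, with the xl toolstack that ships with Xen 4.3 (the weight value
is only an example):

  xl sched-credit -d Domain-0            # show dom0's current weight/cap
  xl sched-credit -d Domain-0 -w 512     # raise dom0's weight (default 256)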

Regards,
Andreas

-- 
Need help with Linux-HA?
http://www.hastexo.com/now

 
 Sometimes it self-corrects, but it's becoming more severe and is now
 less likely to go away without a reboot. Earlier today, the system
 running as primary was at 0.02 load, and the slave (which was doing
 nothing other than receiving updates from the master, no domUs running)
 went to 13 load and was pretty much dead.
 
 I've tried a variety of tuning options, including enabling
 disable_sendpage, but nothing is making it any better. Nothing is
 printed to the logs.
 
 My next thought is to try downgrading to DRBD 8.3, but considering
 support ends in December, I'd much prefer to continue using 8.4.
 
 I'm very much hoping that someone more experienced than myself will be
 able to offer some words of wisdom. :)
 
 Thanks
 Regards,
 Stephen Marsh


Re: [DRBD-user] Device is held open by someone

2013-02-28 Thread Andreas Kurz
On 2013-02-26 13:04, Felipe Gutierrez wrote:
 Hi everyone,
 
 I am trying to do a failover system with drbd only. When my primary node
 drops off the network, the secondary node becomes primary and I mount
 the filesystem.
 secondary# drbdadm primary r7
 secondary# mount /dev/drbd7 /mnt/drbd7/
 
 Up to that point everything is OK.
 At this point, my old primary node has to become the secondary and I have
 to discard my changes.
 primary# umount -l /mnt/drbd7
 primary# drbdadm secondary r7
 7: State change failed: (-12) Device is held open by someone
 Command 'drbdsetup 7 secondary' terminated with exit code 11
 primary# drbdadm -- --discard-my-data connect r7
 
 Does anyone have a hint?

It's always worth checking device-mapper:

dmsetup ls --tree -o inverted
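
If the inverted tree shows something holding drbd7 open, for example an LVM
volume stacked on top of it (names below are hypothetical), deactivate that
first and then retry:

  vgchange -a n vg_on_drbd7      # or: lvchange -a n vg_on_drbd7/some_lv
  drbdadm secondary r7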

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 Thanks in advance!
 Felipe
 
 -- 
 *--
 -- Felipe Oliveira Gutierrez
 -- felipe.o.gutier...@gmail.com mailto:felipe.o.gutier...@gmail.com
 -- https://sites.google.com/site/lipe82/Home/diaadia*
 
 


Re: [DRBD-user] “The peer's disk size is too small!” messages on attempts to add rebuilt pee

2012-12-21 Thread Andreas Kurz

Please don't bypass the mailing-list ...

On 12/21/2012 06:04 PM, Anthony G. wrote:
 Thank you for your input.  That was my first thought, but I caught hell
 trying
 to get the partition sizes to match.  I'm not sure which size reading I
 need to 
 take on -nfs2 and then which specific lvcreate command I need to execute on 
 -nfs1 to get the size on the latter set properly.

Well, you could try the one I put in my previous answer ... and it does not
need to be exactly the same size on nfs1 ... equal or larger is fine.

 
 I've recreated the lv, though (just to try and make some progress), and
 am now 
 getting the following, when I try to 'service drbd start' on -nfs1:
 
 DRBD's startup script waits for the peer node(s) to appear.
  - In case this node was already a degraded cluster before the
reboot the timeout is 0 seconds. [degr-wfc-timeout]
  - If the peer was available before the reboot the timeout will
expire after 0 seconds. [wfc-timeout]
(These values are for resource 'nfs'; 0 sec - wait forever)
  To abort waiting enter 'yes' [ 123]:yes
 
 'netstat -a' doesn't show -nfs2 listening on port 7789, but I do see
 drbd-related
 processes running on that box.  

So the resource on nfs2 is in a disconnected state ... do a drbdadm
adjust nfs on nfs2.

Regards,
Andreas

 
 -Anthony
 
 Date: Fri, 21 Dec 2012 17:25:01 +0100
 From: andr...@hastexo.com
 To: drbd-user@lists.linbit.com
 Subject: Re: [DRBD-user] “The peer's disk size is too small!” messages
 on attempts to add rebuilt pee
 
 On 12/21/2012 12:13 AM, Anthony G. wrote:
 Hi,
 
 There's so much information relating to my current configuration, that
 I'm not sure what I should post here.  Let me start by saying that I had
 two Ubuntu 10.04 hosts configured in a DRBD relationship:  sf02-nfs1
 (primary) and sf0-nfs2 (secondary).  -nfs1 suffered a major filesystem
 fault.  I had to make -nfs2 primary and rebuild -nfs1.  I want to
 eventually have all of my machines on 12.04, so I took this as an
 opportunity to set -nfs1 on that OS.
 
 Here is a copy of my main configuration file (/etc/drbd.d/nfs.res):
 
 resource nfs {
   on sf02-nfs2 {
 device/dev/drbd0;
 disk  /dev/ubuntu/drbd-nfs;
 address   10.0.6.2:7789;
 meta-disk internal;
   }
   on sf02-nfs1 {
 device/dev/drbd0;
 disk  /dev/ubuntuvg/drbd-nfs;
 address   10.0.6.1:7789;
 meta-disk internal;
   }
 }
 
 
 I'm trying to re-introduce -nfs1 into the DRBD relationship and am
 having trouble.  I have:
 
 
 1.) created the resource nfs on -nfs1 ('drbdadm create-md nfs')
 
 2.) run 'drbdadm primary nfs' on -nfs2 and 'drbdadm secondary nfs' on -nfs1.
 
 3.) run drbdadm -- --overwrite-data-of-peer primary all' from -nfs2.
 
 
 But /var/log/kern.log shows:
 
 =
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.843938] block drbd0:
 Handshake successful: Agreed network protocol version 91
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.843949] block drbd0: conn(
 WFConnection - WFReportParams )
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844171] block drbd0: Starting
 asender thread (from drbd0_receiver [2452])
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844539] block drbd0:
 data-integrity-alg: not-used
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844610] block drbd0: *The
 peer's disk size is too small!*
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844617] block drbd0: conn(
 WFReportParams - Disconnecting )
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844626] block drbd0: error
 receiving ReportSizes, l: 32!
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844680] block drbd0: asender
 terminated
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844691] block drbd0:
 Terminating asender thread
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844746] block drbd0:
 Connection closed
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844755] block drbd0: conn(
 Disconnecting - StandAlone )
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844791] block drbd0: receiver
 terminated
 
 Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844794] block drbd0:
 Terminating receiver thread
 
 =
 
 
 So, it seems that a difference in the size of drbd0 on the respective
 machines is the source of my trouble.  'cat /proc/partitions' (output
 pasted at the end of this message) on each machine tells me that -nfs2's
 partition is around 348148 blocks larger than -nfs1's.  -nfs2 contains
 my company's Production data, so I do not, of course, want to do
 anything destructive there.  I can, however, certainly recreate the
 resource on -nfs1.  
 
 
 Does anyone out there know what steps I need to take to make the
 partition sizes match?  Of course, I'm working under the belief that the
 peer's disk size is too small message points up the source of my
 trouble.  Let me know, of course, if I need to post more information on
 my setup.
  
 You are using LVM, so simply resize the LV below DRBD on nfs1 to be at
 least the same size or bigger, e.g.:
  
 lvresize -L+200M ubuntuvg/drbd-nfs
  
 ... then 

Re: [DRBD-user] “The peer's disk size is too small!” messages on attempts to add rebuilt pee

2012-12-21 Thread Andreas Kurz
On 12/21/2012 06:39 PM, Anthony G. wrote:
 well, you could try the one I put in my previous answer ... and it does
 not need to be of the exact size on nfs1 ... equal or more
 
 
 I will try that.  It's probably apparent, but I'm new to LVM and DRBD.
  Is the
 drbdadm adjust nfs on nfs2 something that I can do while that system is
 up-and-running and servicing Production requests?

Yes, that can be done online ... use the -d switch for a dry-run and you
should only see a connect command in the output.
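
For example:

  drbdadm -d adjust nfs     # dry-run: prints the commands it would execute
  drbdadm adjust nfs        # apply the adjustment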

Regards,
Andreas

 
 Thanks, again,
 
 -Anthony
 
 Date: Fri, 21 Dec 2012 18:12:23 +0100
 From: andr...@hastexo.com
 To: drbd-user@lists.linbit.com
 CC: agenere...@hotmail.com
 Subject: Re: [DRBD-user] “The peer's disk size is too small!” messages
 on attempts to add rebuilt pee


 Please don't bypass the mailing-list ...

 On 12/21/2012 06:04 PM, Anthony G. wrote:
  Thank you for your input. That was my first thought, but I caught hell
  trying
  to get the partition sizes to match. I'm not sure which size reading I
  need to
  take on -nfs2 and then which specific lvcreate command I need to
 execute on
  -nfs1 to get the size on the latter set properly.

 well, you could try the one I put in my previous answer ... and it does
 not need to be of the exact size on nfs1 ... equal or more

 
  I've recreated the lv, though (just to try and make some progress), and
  am now
  getting the following, when I try to 'service drbd start' on -nfs1:
 
  DRBD's startup script waits for the peer node(s) to appear.
  - In case this node was already a degraded cluster before the
  reboot the timeout is 0 seconds. [degr-wfc-timeout]
  - If the peer was available before the reboot the timeout will
  expire after 0 seconds. [wfc-timeout]
  (These values are for resource 'nfs'; 0 sec - wait forever)
  To abort waiting enter 'yes' [ 123]:yes
 
  'netstat -a' doesn't show -nfs2 listening on port 7789, but I do see
  drbd-related
  processes running on that box.

 so the resource on nfs2 is in disconnected state  do a drbdadm
 adjust nfs on nfs2

 Regards,
 Andreas

 
  -Anthony
 
  Date: Fri, 21 Dec 2012 17:25:01 +0100
  From: andr...@hastexo.com
  To: drbd-user@lists.linbit.com
  Subject: Re: [DRBD-user] “The peer's disk size is too small!” messages
  on attempts to add rebuilt pee
 
  On 12/21/2012 12:13 AM, Anthony G. wrote:
  Hi,
 
  There's so much information relating to my current configuration, that
  I'm not sure what I should post here. Let me start by saying that I had
  two Ubuntu 10.04 hosts configured in a DRBD relationship: sf02-nfs1
  (primary) and sf0-nfs2 (secondary). -nfs1 suffered a major filesystem
  fault. I had to make -nfs2 primary and rebuild -nfs1. I want to
  eventually have all of my machines on 12.04, so I took this as an
  opportunity to set -nfs1 on that OS.
 
  Here is a copy of my main configuration file (/etc/drbd.d/nfs.res):
 
  resource nfs {
  on sf02-nfs2 {
  device /dev/drbd0;
  disk /dev/ubuntu/drbd-nfs;
  address 10.0.6.2:7789;
  meta-disk internal;
  }
  on sf02-nfs1 {
  device /dev/drbd0;
  disk /dev/ubuntuvg/drbd-nfs;
  address 10.0.6.1:7789;
  meta-disk internal;
  }
  }
 
 
  I'm trying to re-introduce -nfs1 into the DRBD relationship and am
  having trouble. I have:
 
 
  1.) created the resource nfs on -nfs1 ('drbdadm create-md nfs')
 
  2.) run 'drbdadm primary nfs' on -nfs2 and 'drbdadm secondary nfs'
 on -nfs1.
 
  3.) run drbdadm -- --overwrite-data-of-peer primary all' from -nfs2.
 
 
  But /var/log/kern.log shows:
 
  =
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.843938] block drbd0:
  Handshake successful: Agreed network protocol version 91
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.843949] block drbd0: conn(
  WFConnection - WFReportParams )
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844171] block drbd0:
 Starting
  asender thread (from drbd0_receiver [2452])
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844539] block drbd0:
  data-integrity-alg: not-used
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844610] block drbd0: *The
  peer's disk size is too small!*
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844617] block drbd0: conn(
  WFReportParams - Disconnecting )
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844626] block drbd0: error
  receiving ReportSizes, l: 32!
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844680] block drbd0: asender
  terminated
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844691] block drbd0:
  Terminating asender thread
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844746] block drbd0:
  Connection closed
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844755] block drbd0: conn(
  Disconnecting - StandAlone )
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844791] block drbd0:
 receiver
  terminated
 
  Dec 19 19:55:47 sf02-nfs2 kernel: [9284165.844794] block drbd0:
  Terminating receiver thread
 
  =
 
 
  So, it seems that a difference in the size of drbd0 on the respective
  machines is the source of my trouble. 'cat 

Re: [DRBD-user] Still experiencing resource spikes

2012-12-18 Thread Andreas Kurz
On 12/17/2012 08:27 PM, Prater, James K. wrote:
  
 
  
 
  
 
 I have a real sticky problem and would appreciate it if anyone has insight.
 
 We currently have the following physical configuration:
 
 2x Dell PowerEdge R710 (dual 6-core with hyperthreading enabled),
 120 GBytes of memory each

That's quite a lot ... have you also tuned vm.dirty_background_bytes and
vm.dirty_bytes to reasonably low values? ... to avoid regular heavy
data write-out bursts.
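
Something along these lines (the values are examples only and need tuning
for the actual I/O subsystem):

  sysctl -w vm.dirty_background_bytes=268435456   # start background write-out at 256 MB
  sysctl -w vm.dirty_bytes=1073741824             # block writers at 1 GB of dirty pages
  # add the same settings to /etc/sysctl.conf to make them persistent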

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
  4x 10GE Nics  (2-bonded for NFS and 2-bonded for Replication)
Broadcom 57711.
 
  4x 8Gbit FC HBAs  which are tied to a dual controller NEXSAN E60
 array (controllers in non-redundant mode), each controller has 4Gbytes
 of memory.
 
   Raw throughput is around 340Mbytes/sec per volume.
 
 
 
  Each system is running RHEL 6.3 with heartbeat and now with
 DRBD-8.4.2,  problem was there with 8.4.0 (could not get 8.4.1 to work).
 
  The systems are configured as an Active/Passive pair with EXT4 as the
  filesystem, barriers off. Filesystems are exported via NFS to vSphere 4.1 clients.
 
  
 
  The main problem is that everything works most of the time, but every
  now and then a resource stall (high load average and no I/O) occurs, which
  is not good for running VMs. Has anyone seen this? No errors are
  recorded, just no I/O and high load for a few minutes (3-4). This has
  been driving me crazy. One more thing: these events do not occur with
  “replication disabled”, i.e. drbdadm down all (on the peer member). I have
  adjusted many sysctl parameters (up memory buffers, etc.), changed I/O
  schedulers, and turned hyperthreading on and off, and still have the issue.
 
  
 
  
 
 Thanks in advance.
 
  
 
  
 
 James
 
  
 
  
 
  
 
  
 
  
 
  
 
 
 


Re: [DRBD-user] FileSystem Resource Won't Start

2012-12-11 Thread Andreas Kurz
Hi Eric,

On 12/11/2012 08:42 PM, Robinson, Eric wrote:
 Add something like this:
 
 order o_drbd_then_group_clust08 inf: ms_drbd0:promote g_clust08:start
 order o_drbd_then_group_clust09 inf: ms_drbd1:promote g_clust09:start
 colocation c_group_clust08_on_drbd_master inf: g_clust08 ms_drbd0:Master
 colocation c_group_clust09_on_drbd_master inf: g_clust09 ms_drbd1:Master
 
 Do I really need a colocation and an order? 

Yes ... for each drbd m/s resource in such a setup.

 Doesn't a colo imply the order?

No

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now
  
 --
 Eric Robinson
  
 
 
 
 


Re: [DRBD-user] Slow Reads on VM - Xenserver and DRBD

2012-07-26 Thread Andreas Kurz
On 07/24/2012 11:46 AM, Phil Stricker wrote:
 Hi!
 
 I think, I am seing two different Problems of which one is isolated:
 
 - Slow performance in a VM
 The issue seems to be related to different OSs: in a Debian VM, I am getting
 nearly the full speed of the drbd block device; in a CentOS 5.8 VM, I can only
 see 10% of that speed.
 

Are you using paravirtualized drivers in the VM?

 - Slow overall performance of the raid-array:
 I cannot reach more than 450 MB/s on the array (LSI 9260-4i, 8x Intel 320 160 
 GB SSD). I see 450 MB/s in a Raid10 setting and I see 450MB/s in a Raid0 test 
 setting. That is crazy!
 

Have you already shared your DRBD configuration? ... and does the controller
have a non-volatile cache?

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 Did you ever see that behaviour?
 
 Best wishes,
 Phil
 
 
 
 -Original Message-
 From: f...@mpexnet.de [mailto:drbd-user-boun...@lists.linbit.com] On
 Behalf Of Felix Frank
 Sent: Tuesday, July 24, 2012 11:17 AM
 To: Christian Balzer
 Cc: drbd-user@lists.linbit.com
 Subject: Re: [DRBD-user] Slow Reads on VM - Xenserver and DRBD

 On 07/24/2012 11:11 AM, Christian Balzer wrote:
 Yeah, I've read the same thing and am leaning towards KVM for a fully
 virtualized system, though Vserver and (in the future) LXC work for
 90% of the requirements I have.

 Seconded. Linux vServer + DRBD make for a robust HA setup at very good
 performance. A vserver RA for pacemaker is floating through the web.


Re: [DRBD-user] Strange alerts on zabbix showing pacemaker down

2012-07-09 Thread Andreas Kurz
On 07/09/2012 01:17 AM, Richard Goetz wrote:
 Hi DRBD Folks
 I have a strange issue occurring where zabbix checks for
 drbdadm/pacemaker fail and alert at random intervals. This all started
 after doing a test failover of the master node using drbd.
 
 Some of the checks that fail are executed by zabbix:
 
 COMMAND=/sbin/drbdadm dstate harddisk 
 COMMAND=/sbin/drbdadm cstate ssddisk
 COMMAND= /usr/sbin/crm_mon -s
 COMMAND= /usr/sbin/crm_mon -1
 
 At first I thought that this was a zabbix-only problem, but then I began
 to suspect something was going awry. After a few dozen alerts in the
 middle of the night with no load on the system I began to suspect that this
 was something else.
 During an event where the checks for pacemaker and drbdadm time out,
 I am unable to log into the systems in a timely manner.
 I have attempted to log into the mysql server to see what may be causing
 this blocking during an alerting event, but I noticed that it takes
 2-5 minutes to log into the server, which seemed off for a server with
 a LoadAvg in the 0.0[1-9] range and iostat -dx not showing the disks over
 capacity (I checked as soon as I was able to log in).
 
 I turned on sar on the server to get better data and found two other things
 occurring at exactly the same time: a spike in totsck and one of the
 cores showing high CPU utilization. Normally totsck was in the 500 range, but
 during an event it was in the 1500 range.

So this is a mysql database and applications connect to it ... have you
checked whether all those tcp connections, a lot of which are in TIME_WAIT
state, are mysql connections? Have you been able to make a remote mysql
connection and execute a SHOW PROCESSLIST?

Have you tried to make an ssh connection with debug output? ... ssh -vvv to
see more information.

Is DNS resolution working fine? sshd and MySQL do reverse DNS lookups
by default ...
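
A few quick checks along those lines (the host name is an example):

  ssh -vvv root@mysql-1                    # watch where the login stalls
  mysql -h mysql-1 -e 'SHOW PROCESSLIST'   # does a remote connection work during an event?
  netstat -tan | grep -c TIME_WAIT         # how many sockets sit in TIME_WAIT?
  # "UseDNS no" in sshd_config and "skip-name-resolve" in my.cnf avoid the
  # reverse lookups if DNS turns out to be the culprit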

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
   totscktcpsckudpsckrawsck  
 ip-fragtcp-tw
 
 
 10:45:01 AM  1561   29318 0 0   838
 
 10:35:01 AM CPU  %usr %nice  %sys   %iowait%steal   
   %irq %soft%guest %idle
 
 10:45:01 AM   5 11.64  0.00 43.87  0.02  0.00   
   0.00  0.04  0.00 44.42
 
 \
 
 06:45:01 AMtotscktcpsckudpsckrawsck   ip-fragtcp-tw
 
 03:05:06 AM  1562   28617 0 0   859
 
 03:15:01 AM  1548   28617 0 0   869
 
 10:35:01 AM CPU  %usr %nice  %sys   %iowait%steal   
   %irq %soft%guest %idle
 
 03:15:01 AM   6 20.88  0.00 79.09  0.00  0.00   
   0.00  0.03  0.00  0.00
 
 
 It is clear that something is happening on the server when this occurs; the
 following events always appear in syslog at the same time (although the same
 events also show up at times when the failing zabbix checks and the inability
 to log in do not occur).
 
 
 
 Jul  8 03:05:44 mysql-1 lrmd: [2834]: info: operation monitor[191] on
 ip1 for client 2837: pid 7573 exited with return code 0
 
 Jul  8 03:08:00 mysql-1 crmd: [2837]: info: crm_timer_popped: PEngine
 Recheck Timer (I_PE_CALC) just popped (90ms)
 
 Jul  8 03:08:00 mysql-1 crmd: [2837]: info: do_state_transition: State
 transition S_IDLE - S_POLICY_ENGINE [ input=I_PE_CALC
 cause=C_TIMER_POPPED origin=crm_tim
 
 er_popped ]
 
 Jul  8 03:08:00 mysql-1 crmd: [2837]: info: do_state_transition:
 Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
 
 Jul  8 03:08:00 mysql-1 crmd: [2837]: info: do_state_transition: All 2
 cluster nodes are eligible to run resources.
 
 Jul  8 03:08:00 mysql-1 crmd: [2837]: info: do_pe_invoke: Query 867:
 Requesting the current CIB: S_POLICY_ENGINE
 
 Jul  8 03:08:00 mysql-1 crmd: [2837]: info: do_pe_invoke_callback:
 Invoking the PE: query=867, ref=pe_calc-dc-1341731280-1029, seq=32,
 quorate=1
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: unpack_config: On loss
 of CCM Quorum: Ignore
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: unpack_rsc_op:
 Operation ip1arp_last_failure_0 found resource ip1arp active on mysql-2
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: unpack_rsc_op:
 Operation ip1arp_last_failure_0 found resource ip1arp active on mysql-1
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave  
 fs_mysql#011(Started mysql-1)
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave  
 fs_binlog#011(Started mysql-1)
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave  
 ip1#011(Started mysql-1)
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave  
 mysql#011(Started mysql-1)
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave  
 ip1arp#011(Started mysql-1)
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: Leave  
 drbd_binlog:0#011(Slave mysql-2)
 
 Jul  8 03:08:00 mysql-1 pengine: [1339]: notice: LogActions: 

Re: [DRBD-user] Parse error an option keyword expected but got fence peer

2012-06-25 Thread Andreas Kurz
On 06/23/2012 01:47 AM, Keith Christian wrote:
 I've searched for a solution to this error, lots of hits for Parse
 error but couldn't find anything specific for fence-peer.
 
 I have checked the drbd.conf file for obvious errors like unbalanced
 braces, and missing semicolons at the end of line.  Nothing found.
 
 Using these RPM's:
 
 drbd82-8.2.6-1.el5.centos
 kmod-drbd82-8.2.6-2

Really, really, really consider an update to DRBD 8.3.x ... fence-peer
was named outdate-peer in earlier days.
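
If an upgrade is not possible right away, the 8.2-era spelling of the same
handler would look roughly like this (a sketch based on the rename mentioned
above; the script path is taken from your config):

  handlers {
      # DRBD 8.2.x keyword:
      outdate-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
      # DRBD 8.3.x and later renamed it to:
      # fence-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
  }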

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 
 
 This is on a 64 bit system, so I fixed line 31 which needed lib64 to
 find the file:
 
 ls -l /usr/lib64/heartbeat/drbd-peer-outdater
 -rwxr-xr-x 1 root root 15984 Feb  6  2008
 /usr/lib64/heartbeat/drbd-peer-outdater
 
 
 
 When running any DRBD command I see this error:
 
 drbdadm create-md drbd-resource-0
 /etc/drbd.conf:31: Parse error: 'an option keyword' expected,
 but got 'fence-peer'
 
 
 
 
 I commented out line 31, tried to start DRBD again, and saw the error
 on line 56, removed the comment from line 31, and the error returns to
 line 31.
 
 service drbd start
 /etc/drbd.conf:56: Parse error: 'an option keyword' expected,
 but got 'outdated-wfc-timeout'
 Starting DRBD resources:/etc/drbd.conf:56: Parse error:
 'an option keyword' expected,
 but got 'outdated-wfc-timeout'
 
 53 # Wait for connection timeout if the peer node is
 already outdated.
 54 # (Do not set this to 0, since that means unlimited)
 55 #
 *** 56 outdated-wfc-timeout 2;  # 2 seconds.
 57# In case there was a split brain situation the
 devices will
 58 # drop their network configuration instead of connecting. Since
 
 
 
 Below are the first 35 lines of the file, which enclose the line
 throwing the error:
 
1 global { usage-count no; }
2
3 resource drbd-resource-0 {
4   protocol C;
5
6 handlers {
7   # what should be done in case the node is primary, degraded
8 # (=no connection) and has inconsistent data.
9 pri-on-incon-degr /usr/lib/drbd/notify-pri-on-incon-degr.sh;
 /usr/lib/drbd/notify-emergency-reboot.sh; echo b  /proc/sysrq-trigger
 ; reboot -f;
   10
   11 # The node is currently primary, but lost the after split brain
   12 # auto recovery procedure. As as consequence it should go away.
   13 pri-lost-after-sb /usr/lib/drbd/notify-pri-lost-after-sb.sh;
 /usr/lib/drbd/notify-emergency-reboot.sh; echo b  /proc/sysrq-trigger
 ; reboot -f;
   14
   15 # In case you have set the on-io-error option to 
 call-local-io-error,
   16 # this script will get executed in case of a local IO error. It 
 is
   17 # expected that this script will case a immediate failover in the
   18 # cluster.
   19 local-io-error /usr/lib/drbd/notify-local-io-error.sh;
 /usr/lib/drbd/notify-emergency-shutdown.sh; echo o 
 /proc/sysrq-trigger ; halt -f;
   20
   21
   22 # Commands to run in case we need to downgrade the peer's disk
   23 # state to Outdated. Should be implemented by the superior
   24 # communication possibilities of our cluster manager.
   25 # The provided script uses ssh, and is for 
 demonstration/development
   26 # purposis.
   27 # fence-peer /usr/lib/drbd/outdate-peer.sh on amd
 192.168.22.11 192.168.23.11 on alf 192.168.22.12 192.168.23.12;
   28 #
   29 # Update: Now there is a solution that relies on heartbeat's
   30 # communication layers. You should really use this.
   *** 31 fence-peer /usr/lib64/heartbeat/drbd-peer-outdater -t 5;
   32 # For Pacemaker you might use:
   33 # fence-peer /usr/lib/drbd/crm-fence-peer.sh;
   34
   35 }
 
 
 
 I'd appreciate any insight or help.
 
 
 == Keith


Re: [DRBD-user] Protocol A Pending

2012-06-25 Thread Andreas Kurz
On 06/22/2012 09:38 PM, J.R. Lillard wrote:
 Witnessed another bandwidth spike that slowed my stacked layer down.
 
 10: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate A r-
 ns:192538032 nr:0 dw:599650316 dr:1701817080 al:4481613 bm:43214
 lo:1 pe:2050 ua:0 ap:2049 ep:1 wo:f oos:0
 resync: used:0/61 hits:3274 misses:165 starving:0 dirty:0
 changed:165
 act_log: used:135/3833 hits:788561 misses:25007 starving:0
 dirty:28 changed:24979

You also increased al-extents for the lower-level DRBD device of your
stacked resource?

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 On Fri, Jun 22, 2012 at 8:24 AM, J.R. Lillard jlill...@ghfllc.com
 mailto:jlill...@ghfllc.com wrote:
 
 My starving count was pretty high.  I maxed out my al-extents and
 will see if that helps.  Thanks.
 
 
 On Fri, Jun 22, 2012 at 4:22 AM, Andreas Kurz andr...@hastexo.com
 mailto:andr...@hastexo.com wrote:
 
 On 06/20/2012 11:51 PM, J.R. Lillard wrote:
  I have a lower-level setup on Protocol C and a stacked layer
 of Protocol
  A going through Proxy over a WAN.  There are times when my
 disk activity
  spikes causing the Proxy buffers to fill up a bit.  While this is
  happening the Pending count on my stacked resources goes up
 and causes
  my access to those resources to slow down.  Is this normal?  I
 thought
  with Protocol A as soon as my local write was finished things
 would
  continue.
 
 You checked that you are not running out of activity log extents
 for the
 stacked resource?
 
  Do an: echo 1 > /sys/module/drbd/parameters/proc_details
 
 ... and have a look at starving counter in /proc/drbd ... should
 ideally be 0 and definitely not increasing regularly. If it does,
 increase the al-extents value for the stacked resource (max is
 3833 in
 drbd 8.3 ... IIRC)
 
 Regards,
 Andreas
 
 --
 Need help with DRBD?
 http://www.hastexo.com/now
 
 
  --
  J.R. Lillard
  System / Network Admin
  Web Programmer
  Golden Heritage Foods
  120 Santa Fe St.
  Hillsboro, KS  67063
 
 
 
  ___
  drbd-user mailing list
  drbd-user@lists.linbit.com mailto:drbd-user@lists.linbit.com
  http://lists.linbit.com/mailman/listinfo/drbd-user
 
 
 
 
 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com mailto:drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
 
 
 
 
 -- 
 J.R. Lillard
 System / Network Admin
 Web Programmer
 Golden Heritage Foods
 120 Santa Fe St.
 Hillsboro, KS  67063
 
 
 
 
 -- 
 J.R. Lillard
 System / Network Admin
 Web Programmer
 Golden Heritage Foods
 120 Santa Fe St.
 Hillsboro, KS  67063
 
 
 


Re: [DRBD-user] Protocol A Pending

2012-06-22 Thread Andreas Kurz
On 06/20/2012 11:51 PM, J.R. Lillard wrote:
 I have a lower-level setup on Protocol C and a stacked layer of Protocol
 A going through Proxy over a WAN.  There are times when my disk activity
 spikes causing the Proxy buffers to fill up a bit.  While this is
 happening the Pending count on my stacked resources goes up and causes
 my access to those resources to slow down.  Is this normal?  I thought
 with Protocol A as soon as my local write was finished things would
 continue.

You checked that you are not running out of activity log extents for the
stacked resource?

Do an: echo 1 > /sys/module/drbd/parameters/proc_details

... and have a look at the starving counter in /proc/drbd ... it should
ideally be 0 and definitely not be increasing regularly. If it does,
increase the al-extents value for the stacked resource (the max is 3833 in
drbd 8.3 ... IIRC)
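
For illustration, a sketch of where al-extents goes in drbd 8.3 (the resource
name is an example):

  resource r0-U {           # the stacked resource
      syncer {
          al-extents 3833;  # the 8.3 maximum mentioned above
      }
  }
  # apply online with: drbdadm adjust r0-U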

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 -- 
 J.R. Lillard
 System / Network Admin
 Web Programmer
 Golden Heritage Foods
 120 Santa Fe St.
 Hillsboro, KS  67063
 
 
 


Re: [DRBD-user] Partition being synced must be on its own LVM volume?

2012-06-21 Thread Andreas Kurz
On 06/21/2012 11:10 PM, Keith Christian wrote:
 In previous DRBD installations, a separate physical (e.g. non-LVM)
 partition was kept in sync with DRBD.  This partition was, say,
 /dev/sda3 and was mounted on a /data directory in /etc/ha.d/
 haresources.
 
 While planning this for a new LVM based install, it occurs to me that
 the partition that is mounted on /data must be a separate Logical
 Volume.
 
 Currently, /data resides in the same logical volume that hosts /.
 
 Am I correct in thinking that when DRBD starts, it would sync not only
 /data, but everything else in / too, including /root, /var, /home,
 etc.?  I don't see how it wouldn't.
 
 So it appears I'll have to create a separate logical volume and mount
 /data on it as in the past.
 
 Sound reasonable, DRBD people?

yes ... DRBD does block-level replication.
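
Right: a dedicated LV keeps the replicated block device limited to /data.
A minimal sketch (VG/LV names and size are examples only):

  lvcreate -L 100G -n lv_data vg0     # carve out a separate LV for /data
  # then point the DRBD resource's "disk" option at /dev/vg0/lv_data and
  # mount the resulting /dev/drbdX on /data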

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 
 Thanks.
 
 
 =Keith


Re: [DRBD-user] DRBD Filesystem Pacemaker Resources Stopped

2012-03-26 Thread Andreas Kurz
On 03/24/2012 12:09 AM, Robert Langley wrote:
 Maybe I need to post this with Pacemaker? Not sure.
 I am a bit new to this scene and trying my best to learn all of this 
 (Linux/DRBD/Pacemaker/Heartbeat).
 
 I am in the middle of following this document, Highly available NFS storage 
 with DRBD and Pacemaker located at:
 http://www.linbit.com/en/education/tech-guides/highly-available-nfs-with-drbd-and-pacemaker/
 
 OS: Ubuntu 11.10
 DRBD version: 8.3.11
 Pacemaker version: 1.1.5
 I have two servers with 2.4 TB of internal hard drive space each, plus 
 mirrored hard drives for the OS. They both have 10 NICs (2 onboard in a bond 
 and 8 between 2, 4 port intel NICs).
 
 Issue: I got to the end of part 4.3 (commit) and that is when things went 
 bad. I actually ended up with a split-brain and I seem to have recovered from 
 that, but now my resources are as follows (running crm_mon -1):
 My slave node is actually showing as the Master under the Master/Slave Set: 
 ms_drbd_nfs [p_drbd_nfs]
 Clone set Started
 Resource Group: Only p_lvm_nfs is Started on my slave node. All of the 
 Filesystem resources are Stopped.
 
 Then, I have this at the bottom:
 Failed actions:
 p_fs_vol01_start_0 (node=ds01, call=46, rc=5, status=complete): 
 not installed
 p_fs_vol01_start_0 (node=ds02, call=430, rc=5, status=complete): 
 not installed

Is the mountpoint created on both nodes, and are the correct device and a
valid file system defined? What happens after a cleanup? ... crm resource
cleanup p_fs_vol01 ... grep for Filesystem in your logs to get the error
output from the resource agent.
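
For reference, the two checks suggested above (the log path is an assumption
for this Ubuntu 11.10 setup):

  crm resource cleanup p_fs_vol01
  grep Filesystem /var/log/syslog    # error output from the Filesystem RA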

For more ... please share current drbd state/configuration and your
cluster configuration.

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Looking in the syslog on ds01 (primary node) does not reveal anything worth 
 mentioning; but, looking at the syslog on ds02 (secondary node) shows the 
 following messages:
 
 pengine: [11725]: notice: unpack_rsc_op: Hard error - p_fs_vol01_start_0 
 failed with rc=5: Preventing p_fs_vol01 from re-starting on ds01
 pengine: [11725]: WARN: unpack_rsc_op: Processing failed op 
 p_fs_vol01_start_0 on ds01: not installed (5)
 pengine: [11725]: notice: unpack_rsc_op: Operation 
 p_lsb_nfsserver:1_monitor_0 found resource p_lsb_nfsserver:1 active on ds02
 pengine: [11725]: notice: unpack_rsc_op: Hard error - p_fs_vol01_start_0 
 failed with rc=5: Preventing p_fs_vol01 from re-starting on ds02
 pengine: [11725]: WARN: unpack_rsc_op: Processing failed op 
 p_fs_vol01_start_0 on ds02: not installed (5)
 pengine: [11725]: notice: native_print: 
 failover-ip#011(ocf::heartbeat:IPaddr):#011Stopped
 pengine: [11725]: notice: clone_print:  Master/Slave Set: ms_drbd_nfs 
 [p_drbd_nfs]
 ...
 pengine: [11725]: WARN: common_apply_stickiness: Forcing p_fs_vol01 away from 
 ds01 after 100 failures (max=100)
 pengine: [11725]: notice: common_apply_stickiness: p_lvm_nfs can fail 99 
 more times on ds02 before being forced off
 pengine: [11725]: WARN: common_apply_stickiness: Forcing p_fs_vol01 away from 
 ds02 after 100 failures (max=100)
 pengine: [11725]: notice: LogActions: Leave   failover-ip#011(Stopped)
 pengine: [11725]: notice: LogActions: Leave   p_drbd_nfs:0#011(Slave ds01)
 pengine: [11725]: notice: LogActions: Leave   p_drbd_nfs:1#011(Master ds02)
 
 
 Thank you in advance for any assistance,
 Robert
 
 


Re: [DRBD-user] drbd read-only mode

2012-03-15 Thread Andreas Kurz
On 03/15/2012 03:31 PM, зоррыч wrote:
 The read-only flag appears on both nodes.
 
 I will use a cluster file system (OCFS2 or GFS2) for correct
 operation of the two disks.

Read-only flag??? Do you mean the ro:Primary/Primary field, which indicates
the role (ro = role) as being Primary on both sides?

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
  
 
  
 
  
 
  
 
  
 
 *From:*Marcelo Pereira [mailto:marcel...@gmail.com]
 *Sent:* Thursday, March 15, 2012 6:26 PM
 *To:* ??
 *Subject:* Re: [DRBD-user] drbd read-only mode
 
  
 
 Why do you want it to be primary on both nodes?
 
 --Marcelo
 
 
 On Mar 15, 2012, at 10:21 AM, ?? zo...@megatrone.ru
 mailto:zo...@megatrone.ru wrote:
 
 Hi
 
 I installed 8.4.1 drdb.
 
 The cluster operates on a primary/primary mode.
 
 However, the drives are mounted in the mode of read-only
 
 [root@noc-1-m77 /]# cat /proc/drbd
 
 version: 8.4.1 (api:1/proto:86-100)
 
 GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by
 r...@noc-1-synt.rutube.ru mailto:r...@noc-1-synt.rutube.ru,
 2012-03-14 10:05:49
 
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-
 
   ^^
 
 ns:0 nr:0 dw:0 dr:664 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 
  
 
 Why not activate the read-write mode?
 
  
 
 Config:
 
 [root@noc-1-synt /]# cat /etc/drbd.d/r0.res
 
 # create new
 
 resource r0 {
 
   startup {
 
 wfc-timeout 20;
 
 degr-wfc-timeout 10;
 
 # we will keep this commented until tested successfully:
 
 become-primary-on both;
 
   }
 
  net {
 
 protocol C;
 
 allow-two-primaries;
 
 after-sb-0pri discard-zero-changes;
 
 after-sb-1pri discard-secondary;
 
 after-sb-2pri disconnect;
 
  
 
 }
 
 # DRBD device
 
 device /dev/drbd0;
 
 # phisical device
 
 disk /dev/vg_noc1synt/lv02;
 
 meta-disk internal;
 
 on noc-1-synt.rutube.ru http://noc-1-synt.rutube.ru {
 
# IP address:port
 
address 10.1.20.10:7788;
 
 }
 
 on noc-1-m77.rutube.ru http://noc-1-m77.rutube.ru {
 
address 10.2.20.9:7788;
 
 }
 
 }
 
 [root@noc-1-synt /]#
 
  
 
  
 
  
 


Re: [DRBD-user] drbdadm verify all question

2012-03-13 Thread Andreas Kurz
On 03/13/2012 03:44 PM, Maurits van de Lande wrote:
 I have got a drbd setup with a primary and a secondary node. I would
 like to verify if the secondary node is in sync with the primary node.
 
  On which node do I have to execute
 
 # drbdadm verify all

It doesn't matter on which node you start it. If differences are found,
they are _not_ automatically resynced ... out-of-sync blocks are
resynced after doing a disconnect/reconnect, and data is synced from the
Primary to the Secondary ... no matter where you started the verify.
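
In commands (the resource name r0 is an example; "all" works just as well):

  drbdadm verify all          # run on either node; progress shows in /proc/drbd
  # if oos: is non-zero afterwards, bounce the connection to resync:
  drbdadm disconnect r0
  drbdadm connect r0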

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
  Thanks,
 
  Maurits van de Lande
 
  
 
 
 


Re: [DRBD-user] Pacemaker + Dual Primary, handlers and fail-back issues

2012-03-01 Thread Andreas Kurz
Hello,

On 02/29/2012 08:08 PM, Daniel Grunblatt wrote:
 Hi,
 
 I have a 2 node cluster with sles11sp1, with the latest patches.
 Configured Pacemaker, dual primary drbd and xen.

see below for some comments ...

 
 Here's the configuration:
 
 - drbd.conf
 global {
usage-count  yes;
 }
 common {
protocol C;
disk {
 on-io-errordetach;
 fencing resource-only;

for a dual-primary setup use "fencing resource-and-stonith;"

}
syncer {
   rate   1G;

wow ... only set that high rate if your I/O system can handle that.

   al-extents   3389;
}
net {
   allow-two-primaries; # Enable this *after* initial testing
   cram-hmac-alg sha1;
   shared-secret a6a0680c40bca2439dbe48343cf4;
   after-sb-0pri discard-zero-changes;
   after-sb-1pri discard-secondary;
   after-sb-2pri disconnect;
}
   startup {
   become-primary-on both;

this is only acted on by the drbd init script, which is hopefully
deactivated ... anyway ... remove that directive

 }
   handlers {
   fence-peer /usr/lib/drbd/crm-fence-peer.sh;
   after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;

for dual-primary setup with a cluster file system and
resource-and-stonith in combination with Pacemaker you can use a fencing
script like stonith_admin-fence-peer.sh that recently found its way
into the drbd repository: http://goo.gl/XfSfo ... thanks to my colleague
Florian Haas for contributing this nice script ;-)

You don't need after-resync-target for that kind of setup.
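
Put together, the suggested changes would look roughly like this (a sketch;
the script path matches where the drbd packages usually install it):

  disk {
      on-io-error detach;
      fencing     resource-and-stonith;
  }
  handlers {
      fence-peer "/usr/lib/drbd/stonith_admin-fence-peer.sh";
  }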

 
 }
 }
 resource vmsvn {
   device    /dev/drbd0;
   disk  /dev/sdb;
   meta-disk internal;
on xm01 {
   address   100.0.0.1:7788;
}
on xm02 {
   address   100.0.0.2:7788;
}
 }
 
 resource srvsvn1 {
   protocol C;
   device    /dev/drbd1;
   disk  /dev/sdc;
   meta-disk internal;
on xm01 {
   address   100.0.0.1:7789;
}
on xm02 {
   address   100.0.0.2:7789;
}
 }
 
 resource srvsvn2 {
   protocol C;
   device    /dev/drbd2;
   disk  /dev/sdd;
   meta-disk internal;
on xm01 {
   address   100.0.0.1:7790;
}
on xm02 {
   address   100.0.0.2:7790;
}
 }
 
 resource vmconfig {
 protocol C;
   device    /dev/drbd3;
   meta-disk internal;
on xm01 {
   address   100.0.0.1:7791;
   disk /dev/vg_xm01/lv_xm01_vmconfig;
}
on xm02 {
   address   100.0.0.2:7791;
   disk /dev/vg_xm02/lv_xm02_vmconfig;
}
 }
 
 
 
 - crm configuration:
 node xm01
 node xm02
 primitive VMSVN ocf:heartbeat:Xen \
 meta target-role=Started allow-migrate=true
 is-managed=true resource-stickiness=0 \
 operations $id=VMSVN-operations \
 op monitor interval=30 timeout=30 \
 op start interval=0 timeout=60 \
 op stop interval=0 timeout=60 \
 op migrate_to interval=0 timeout=180 \
 params xmfile=/etc/xen/vm/vmsvn
 primitive clvm ocf:lvm2:clvmd \
 operations $id=clvm-operations \
 op monitor interval=10 timeout=20
 primitive dlm ocf:pacemaker:controld \
 operations $id=dlm-operations \
 op monitor interval=10 timeout=20 start-delay=0
 primitive ipmi-stonith-xm01 stonith:external/ipmi \
 meta target-role=Started is-managed=true priority=10 \
 operations $id=ipmi-stonith-xm01-operations \
 op monitor interval=15 timeout=15 start-delay=15 \
 params hostname=xm01 ipaddr=125.1.254.107
 userid=administrator passwd=17xm45 interface=lan
 primitive ipmi-stonith-xm02 stonith:external/ipmi \
 meta target-role=Started is-managed=true priority=9 \
 operations $id=ipmi-stonith-xm02-operations \
 op monitor interval=15 timeout=15 start-delay=15 \
 params hostname=xm02 ipaddr=125.1.254.248
 userid=administrator passwd=17xm45 interface=lan
 primitive o2cb ocf:ocfs2:o2cb \
 operations $id=o2cb-operations \
 op monitor interval=10 timeout=20
 primitive srvsvn1-drbd ocf:linbit:drbd \
 params drbd_resource=srvsvn1 \
 operations $id=srvsvn1-drbd-operations \
 op monitor interval=20 role=Master timeout=20 \
 op monitor interval=30 role=Slave timeout=20 \
 op start interval=0 timeout=240 \
 op promote interval=0 timeout=90 \
 op demote interval=0 timeout=90 \
 op stop interval=0 timeout=100 \
 meta migration-threshold=10 failure-timeout=600
 primitive srvsvn2-drbd ocf:linbit:drbd \
 params drbd_resource=srvsvn2 \
 operations $id=srvsvn2-drbd-operations \
 op monitor interval=20 role=Master timeout=20 \
 op monitor interval=30 role=Slave timeout=20 \
 op start interval=0 timeout=240 \
 op promote interval=0 timeout=90 \
 op demote interval=0 timeout=90 \
 op stop interval=0 timeout=100 \
 meta migration-threshold=10 failure-timeout=600
 

Re: [DRBD-user] Urgent!!!: degr-wfc-timeout is not working.

2012-02-21 Thread Andreas Kurz
Hello,

On 02/21/2012 05:17 PM, venkatesh prabhu wrote:
 Hi,
 I am facing issue in degraded timeout.
 My two node cluster with DRBD is up and running.
 But the degr-wfc-timeout is not working as expected.
 
 I have primary and secondary node.
 Then i shutdown the secondary node, then making some changes in mirror
 from the primary node and rebooting.
 When it comes up it is waiting for 180 seconds instead of 3 seconds.
 See the config section provided below
 Please let me know what could be the problem.

It is working as expected: degr-wfc-timeout would be triggered if the
Primary crashes while running in a degraded cluster ... on a regular
shutdown and the reboot afterwards, wfc-timeout is used.

Reset your single Primary and you should see degr-wfc-timeout being
triggered.
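
A quick way to simulate such a crash on a test node (this reboots the machine
immediately, so never do it on a production Primary):

  echo b > /proc/sysrq-trigger    # hard reset, as used in the example handlers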

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 My confgi section for timeouts;
 startup {
 degr-wfc-timeout 3;#3 sec..
 
 wfc-timeout 180;# 3 min.
 
 #   become-primary-on both;
 } # end of startup
 
 --
 Thanks in Advance.
 
 Vengatesh Prabhu
 Life is Beautiful:


Re: [DRBD-user] Backing up VMware VM's running DRBD with VMware Data Recovery (VDR)

2012-02-20 Thread Andreas Kurz
Hello,

On 02/17/2012 10:28 AM, Mark Watts wrote:
 
 I have a pair of CentOS 5.7 VM's running an LVM/DRBD/EXT3 Pri/Sec cluster.
 
 Since we use VDR to take snapshots of our VM's I naturally added these
 two VM's to the backup rota.

So you installed the latest VMware Tools in the VMs and you are sure they
are running?

 
 Pretty much every night I get hundreds of errors in the VDR logs
 relating to the Primary, giving the message:
 
 Failed to create snapshot for VDR01, error -3960 ( cannot quiesce
 virtual machine)

Any logs from vmware-tools in the VM?

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 
 I'm taking a wild guess at this perhaps being related to DRBD; can
 anyone suggest whether this is the issue.
 
 
 Mark.
 


Re: [DRBD-user] Problem with drbd 8.4.1 on linux kernel 3.0.0

2012-02-03 Thread Andreas Kurz
Hello,

On 02/03/2012 03:05 PM, Owen Le Blanc wrote:
 Kaloyan Kovachev wrote:
 
 what does 'cat /proc/drbd' say at this moment on both nodes?
 
 on the primary:
 
 cat /proc/drbd
 version: 8.4.1 (api:1/proto:86-100)
 GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by
 root@brahe, 2012-01-06 13:30:08
  0: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-
 ns:0 nr:0 dw:116 dr:472 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:232
  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r-
 ns:0 nr:0 dw:116 dr:472 al:2 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:232

oos: 232

 
 on the secondary: cat /proc/drbd
 version: 8.4.1 (api:1/proto:86-100)
 GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by
 root@brahe, 2012-01-06 13:30:08
  0: cs:WFConnection ro:Secondary/Unknown ds:Outdated/DUnknown C r-
 ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1516
  1: cs:WFConnection ro:Secondary/Unknown ds:Outdated/DUnknown C r-
 ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:1516

also oos on Secondary ... as Felix said, this looks like a split brain
... Out Of Sync blocks on both sides.

 
 at least, shortly after giving the connect command.  Soon it reverts to
 StandAlone.

and this is the default after-split-brain behaviour: disconnect

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
  -- Owen


Re: [DRBD-user] Problem with drbd 8.4.1 on linux kernel 3.0.0

2012-02-03 Thread Andreas Kurz
Hello,

On 02/03/2012 04:35 PM, Felix Frank wrote:
 Hi,
 
 On 02/03/2012 04:05 PM, Owen Le Blanc wrote:
 This still
 leaves open the question of why split brain keeps occuring?  The
 cluster is managed by pacemaker, version 1.1.6, with corosync, version
 1.4.2.  There isn't actually any real data on the two drbd devices,
 since this is only a test.  But it concerns me that it seems to go
 into split brain about once a week.

Be sure to have STONITH enabled and configured, use resource-level
fencing in DRBD, and use a redundant-ring setup for corosync ... that way you
minimize the chance of a split-brain and establish protection in case
your nodes get separated.
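
A sketch of the DRBD side of that (the crm-fence-peer scripts ship with
drbd 8.3+; the STONITH and corosync pieces live in their own configuration):

  disk {
      fencing resource-only;
  }
  handlers {
      fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
      after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # plus: stonith-enabled=true and a stonith resource in Pacemaker, and a
  # second ring (rrp_mode: passive) over an independent network in corosync.conf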

 
 this is neigh impossible to answer without looking at your setup and
 workflows.
 
 Split brain can only happen when the nodes get disconnected (this
 includes downtimes of either node).

... and additionally they need to get promoted to Primary on both sides
while not connected.

Regards,
Andreas

-- 
Need help with Pacemaker/DRBD/Corosync?
http://www.hastexo.com/now

 How often are your nodes getting separated thus?
 
 ...this is not dual-primary, is it?
 
 Prune your logs, it should be easy to determine when DRBD changes
 states. As to why, well - analyzing pacemaker activity from hindsight
 can be challenging, I guess. Your best bet is probably to make sure the
 nodes don't get separated, and that maintenance is done very carefully.
 
 HTH,
 Felix


Re: [DRBD-user] Urgent: Stuch with DRBD bring up.

2012-02-02 Thread Andreas Kurz
Hello,

... don't forget to post to the group.

On 01/30/2012 01:06 PM, venkatesh prabhu wrote:
 Hi Andreas,
 Thanks for your quick reply.
 
 The solution you provided me solved the problem number 1.
 But still i am facing the error with up command.
 Please help in get rid of that error.
 
 When I run drbdadm up r0 it exits with error code 20:
 Device '0' is configured!
 Command 'drbdmeta 0 v08 /dev/vg0/drbdmeta flex-external apply-al'
 terminated with exit code 20

You don't need to bring the DRBD device up after running the init
script; that's the responsibility of the script.

You only need to make it Primary manually if you don't want a cluster
manager to do the job.

If you want Pacemaker to do this job, deactivate the init script
completely and follow the DRBD Users Guide on how to integrate DRBD
into your cluster configuration.
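
Roughly, that means (the chkconfig call assumes a RHEL-style distribution;
resource names are examples):

  chkconfig drbd off        # stop the init script from managing DRBD

  # minimal Pacemaker master/slave wrapper in crm shell:
  primitive p_drbd_r0 ocf:linbit:drbd params drbd_resource="r0" \
          op monitor interval="29s" role="Master" \
          op monitor interval="31s" role="Slave"
  ms ms_drbd_r0 p_drbd_r0 meta master-max="1" clone-max="2" notify="true"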

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 But still, cat /proc/drbd shows that the disk is created and it is in an
 inconsistent state.
 Then I can promote it to primary and everything works fine.
 
 How can i get rid of the Device '0' is configured! error?
 
 Please help me.
 
 Thank You
 Vengatesh Prabhu
 
 
 On Sat, Jan 28, 2012 at 2:19 PM, Andreas Kurz andr...@hastexo.com wrote:
 Hello,

 On 01/28/2012 01:26 PM, venkatesh prabhu wrote:
 Hi,
 Please help me solve my issues.
 I am trying to bring up DRBD for first time but i am facing following 
 problems.

 1. when i start the drbd service for first time it says adjust disk failed:
 Starting DRBD resources: [
  create res: r0
  prepare disk: r0
  adjust disk: r0:failed(apply-al:20)
  adjust net: r0
 ]

 2. Then creation of metadat drbdadm create-md r0 is success.

 Do the metadata creation on both nodes before you start the drbd service
 and the rest should be fine ... and be sure to use 8.4.1 and nod 8.4.0
 release.

 Regards,
 Andreas

 --
 Need help with DRBD?
 http://www.hastexo.com/now


 3. Then drbdadm up r0 fails with exit code 1.
 drbdadm up r0
 r0: Failure: (102) Local address(port) already in use.
 Command 'drbdsetup connect r0 ipv4:10.203.230.136:7788
 ipv4:10.203.230.135:7788 --shared-secret=DRBD --ping-timeout=5
 --ping-int=10 --connect-int=10 --timeout=60 --protocol=C' terminated
 with exit code 1

 4. If I run the same command drbdadm up r0 again, it fails with exit code 10.
 Device '0' is configured!
 Command 'drbdmeta 0 v08 /dev/vg0/drbdmeta flex-external apply-al'
 terminated with exit code 20

 but still i can promote the resource to primary and the sync happens
 properly between two nodes.

 but how can i avoid those errors? please help me.

 my drbd.conf file is provided below.

 global {
  usage-count no;
   }

   common {

 protocol C;

 startup {
 degr-wfc-timeout 3;#3 = 3 sec..

 wfc-timeout 180;# 3 min.
 } # end of startup

 handlers {
 } # end of handlers

   disk {
 on-io-error  detach;
} # end of disk

net {
  timeout   60;#  6 seconds  (unit = 0.1 seconds)
  connect-int   10;# 10 seconds  (unit = 1 second)
  ping-int  10;# 10 seconds  (unit = 1 second)
 ping-timeout   5;# 500 ms (unit = 0.1 seconds)
  shared-secret DRBD;
} # end of net
  } # end of common

 resource r0{

syncer {
 rate 100M;
 }

  on lab1601 {
 device /dev/drbd0;
 disk   /dev/vg0/mirror;
 address10.203.230.135:7788;
 meta-disk  /dev/vg0/drbdmeta;
 }

 on lab1602 {
 device/dev/drbd0;
 disk  /dev/vg0/mirror;
 address   10.203.230.136:7788;
 meta-disk /dev/vg0/drbdmeta;
}
  } #end


Thank You
Vengatesh Prabhu









Re: [DRBD-user] Slower disk throughput on DRBD partition

2012-02-01 Thread Andreas Kurz
Hello,

On 02/01/2012 01:04 PM, Frederic DeMarcy wrote:
 Hi
 
 Note 1:
 Scientific Linux 6.1 with kernel 2.6.32-220.4.1.el6.x86_64
 DRBD 8.4.1 compiled from source
 
 Note 2:
 server1 and server2 are 2 VMware VMs on top of ESXi 5. However they reside on 
 different physical 2U servers.
 The specs for the 2U servers are identical:
   - HP DL380 G7 (2U)
   - 2 x Six Core Intel Xeon X5680 (3.33GHz)
   - 24GB RAM
   - 8 x 146 GB SAS HD's (7xRAID5 + 1s)
   - Smart Array P410i with 512MB BBWC

Have you tried to change the I/O scheduler to deadline or noop in the VMs?

... see below ..

   
 Note 3:
 I've tested the network throughput with iperf which yields close to 1Gb/s
 [root@server1 ~]# iperf -c 192.168.111.11 -f g
 
 Client connecting to 192.168.111.11, TCP port 5001
 TCP window size: 0.00 GByte (default)
 
 [  3] local 192.168.111.10 port 54330 connected with 192.168.111.11 port 5001
 [ ID] Interval   Transfer Bandwidth
 [  3]  0.0-10.0 sec  1.10 GBytes  0.94 Gbits/sec
 
 [root@server2 ~]# iperf -s -f g
 
 Server listening on TCP port 5001
 TCP window size: 0.00 GByte (default)
 
 [  4] local 192.168.111.11 port 5001 connected with 192.168.111.10 port 54330
 [ ID] Interval   Transfer Bandwidth
 [  4]  0.0-10.0 sec  1.10 GBytes  0.94 Gbits/sec
 
 Scp'ing a large file from server1 to server2 yields ~ 57MB/s but I guess it's 
 due to the encryption overhead.
 
 Note 4:
 MySQL was not running.
 
 
 
 Base DRBD config:
 resource mysql {
   startup {
 wfc-timeout 3;
 degr-wfc-timeout 2;
 outdated-wfc-timeout 1;
   }
   net {
 protocol C;
 verify-alg sha1;
 csums-alg sha1;

using csums-based resync is only interesting for WAN setups where you
need to resync over a rather thin connection

 data-integrity-alg sha1;

using data-integrity-alg is definitely not recommended (it is slow) for live
setups; use it only if you have to assume there is buggy hardware on the path
between your nodes ... like NICs pretending checksums are OK while they are not

and out of curiosity ... have you already given DRBD 8.3.12 a try?
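
In other words, for a LAN setup the net section from your config can be
trimmed down to something like this sketch (keeping verify-alg for online
verify is fine):

  net {
    protocol C;
    verify-alg sha1;
    cram-hmac-alg sha1;
    shared-secret MySecret123;
  }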

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now


 cram-hmac-alg sha1;
 shared-secret MySecret123;
   }
   on server1 {
 device/dev/drbd0;
 disk  /dev/sdb;
 address   192.168.111.10:7789;
 meta-disk internal;
   }
   on server2 {
 device/dev/drbd0;
 disk  /dev/sdb;
 address   192.168.111.11:7789;
 meta-disk internal;
   }
 }
 
 
 After any change in the /etc/drbd.d/mysql.res file I issued a drbdadm adjust 
 mysql on both nodes.
 
 Test #1
 DRBD partition on primary (secondary node disabled)
 Using Base DRBD config
 # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 
 oflag=direct
 Throughput ~ 420MB/s
 
 Test #2
 DRBD partition on primary (secondary node enabled)
 Using Base DRBD config
 # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 
 oflag=direct
 Throughput ~ 61MB/s
 
 Test #3
 DRBD partition on primary (secondary node enabled)
 Using Base DRBD config with:
   Protocol B;
 # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 
 oflag=direct
 Throughput ~ 68MB/s
 
 Test #4
 DRBD partition on primary (secondary node enabled)
 Using Base DRBD config with:
   Protocol A;
 # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 
 oflag=direct
 Throughput ~ 94MB/s
 
 Test #5
 DRBD partition on primary (secondary node enabled)
 Using Base DRBD config with:
   disk {
 disk-barrier no;
 disk-flushes no;
 md-flushes no;
   }
 # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 
 oflag=direct
 Disk throughput ~ 62MB/s
 
 No difference from Test #2 really. Also cat /proc/drbd still shows wo:b in 
 both cases so I'm not even sure
 these disk {..} parameters have been taken into account...
 
 Test #6
 DRBD partition on primary (secondary node enabled)
 Using Base DRBD config with:
   Protocol B;
   disk {
 disk-barrier no;
 disk-flushes no;
 md-flushes no;
   }
 # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 
 oflag=direct
 Disk throughput ~ 68MB/s
 
 No difference from Test #3 really. Also cat /proc/drbd still shows wo:b in 
 both cases so I'm not even sure
 these disk {..} parameters have been taken into account...
 
 
 What else can I try?
 Is it worth trying DRBD 8.3.x?
 
 Thx.
 
 Fred
 
 
 
 
 
 
 On 1 Feb 2012, at 08:35, James Harper wrote:
 
 Hi

 I've configured DRBD with a view to use it with MySQL (and later on
 Pacemaker + Corosync) in a 2 nodes primary/secondary
 (master/slave) setup.

 ...

 No replication over the 1Gb/s crossover cable is taking place since the
 secondary node is down yet there's x2 lower disk performance.

 

Re: [DRBD-user] drbd 8.3.12 cannot get syncer speed beyond 110MB/s

2012-02-01 Thread Andreas Kurz
Hello,

On 02/01/2012 01:36 PM, Maurits van de Lande wrote:
 Hello,
 
 I have asked this question before for drbd 8.4.1 but I couldn't get proper 
 write performance.
 Currently I downgraded drbd to 8.3.12 but I cannot get the syncer speed above 
 1Gb/s or 110MB/s.
 
 When I test the disk throughput with
 #dd if=/dev/zero of=/VM/test bs=512M count=1  oflag=direct 
 I get a speed around 500MB/s
 (It's an all SSD raid 5 array)
 I installed a 10G network adapter and tested the connection with iperf and I 
 get on average 7Gbit/s
 
 This should all be sufficient for a DRBD syncer speed above 110MB/s.
 I have set the rate=200M option for a 2Gbit/s rate
 
 cat /proc/drbd shows
 version: 8.3.12 (api:88/proto:86-96)
 GIT-hash: e2a8ef4656be026bbae540305fcb998a5991090f build by dag@Build64R6, 
 2011-11-20 10:57:03
  0: cs:SyncSource ro:Primary/Primary ds:UpToDate/Inconsistent C r-
 ns:176 nr:0 dw:0 dr:79839200 al:0 bm:4872 lo:0 pe:0 ua:20 ap:0 ep:1 wo:d 
 oos:1470943696
 [...] sync'ed:  5.2% (1436468/1514432)M
 finish: 3:30:54 speed: 116,228 (109,212) K/sec
 
 Only at the start the value of (200.000) K/sec is shown.
 
 What can I do to get the 200MB/s syncer speed?

You followed all the guidelines in the performance tuning section of the
DRBD Users Guide?

... like using jumbo frames on the directly connected 10Gb link and the
deadline I/O scheduler, to name just two important ones ...
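
For example, a quick sketch (eth1 and sdb are placeholders for your dedicated
10GbE replication interface and the backing device):

  # on both nodes: jumbo frames on the replication link
  ip link set dev eth1 mtu 9000
  # deadline elevator for the backing device
  echo deadline > /sys/block/sdb/queue/scheduler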

Regards,
Andreas

-- 
Need help with DRBD performance tuning?
https://www.hastexo.com/services/remote

 
 Best regards,
 
 Maurits van de Lande
 
 
 | Van de Lande BV. | Lissenveld 1 | 4941VK | Raamsdonksveer | the Netherlands 
 |T +31 (0) 162 516000 | F +31 (0) 162 521417 | www.vdl-fittings.com |
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user





signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Slower disk throughput on DRBD partition

2012-02-01 Thread Andreas Kurz
On 02/01/2012 05:15 PM, Frederic DeMarcy wrote:
 Hi Andrea
 
 Commenting out csum-alg doesn't seem to make any noticeable difference...
 However commenting out data-integrity-alg and running Test #2 again
 increases the throughput from ~ 61MB/s to ~ 97MB/s !
 Note that I may well run into the 1Gb/s crossover link limit here since
 my network tests showed ~ 0.94 Gb/s
 
 Also Test #1 was wrong in my email... It should have been split in 2:
 Test #1
 On non-DRBD device (/dev/sda)
 # dd if=/dev/zero of=/home/userxxx/disk-test.xxx bs=1M count=4096
 oflag=direct
 Throughput ~ 420MB/s
 
 DRBD partition (/dev/sdb) on primary (secondary node disabled)
 Using Base DRBD config
 # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096
 oflag=direct
 Throughput ~ 205MB/s

Is the result the same if you execute a drbdadm invalidate-remote
mysql on the primary before doing the single-node test? That
would disable activity log updates ...
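
As a sketch, using the resource name and test from your mail:

  # on the primary, with the secondary node disabled/disconnected
  drbdadm invalidate-remote mysql
  dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M count=4096 oflag=direct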

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/services/remote

 
 With the above -alg commented out, disabling the secondary node and
 running Test #1 again (correctly split this time) shows the same
 throughputs of ~ 420MB/s and ~ 205MB/s
 
 Fred
 
 On Wed, Feb 1, 2012 at 1:48 PM, Andreas Kurz andr...@hastexo.com
 mailto:andr...@hastexo.com wrote:
 
 Hello,
 
 On 02/01/2012 01:04 PM, Frederic DeMarcy wrote:
  Hi
 
  Note 1:
  Scientific Linux 6.1 with kernel 2.6.32-220.4.1.el6.x86_64
  DRBD 8.4.1 compiled from source
 
  Note 2:
  server1 and server2 are 2 VMware VMs on top of ESXi 5. However
 they reside on different physical 2U servers.
  The specs for the 2U servers are identical:
- HP DL380 G7 (2U)
- 2 x Six Core Intel Xeon X5680 (3.33GHz)
- 24GB RAM
- 8 x 146 GB SAS HD's (7xRAID5 + 1s)
- Smart Array P410i with 512MB BBWC
 
 Have you tried to change the I/O scheduler to deadline or noop in
 the VMs?
 
 ... see below ..
 
 
  Note 3:
  I've tested the network throughput with iperf which yields close
 to 1Gb/s
  [root@server1 ~]# iperf -c 192.168.111.11 -f g
  
  Client connecting to 192.168.111.11, TCP port 5001
  TCP window size: 0.00 GByte (default)
  
  [  3] local 192.168.111.10 port 54330 connected with
 192.168.111.11 port 5001
  [ ID] Interval   Transfer Bandwidth
  [  3]  0.0-10.0 sec  1.10 GBytes  0.94 Gbits/sec
 
  [root@server2 ~]# iperf -s -f g
  
  Server listening on TCP port 5001
  TCP window size: 0.00 GByte (default)
  
  [  4] local 192.168.111.11 port 5001 connected with 192.168.111.10
 port 54330
  [ ID] Interval   Transfer Bandwidth
  [  4]  0.0-10.0 sec  1.10 GBytes  0.94 Gbits/sec
 
  Scp'ing a large file from server1 to server2 yields ~ 57MB/s but I
 guess it's due to the encryption overhead.
 
  Note 4:
  MySQL was not running.
 
 
 
  Base DRBD config:
  resource mysql {
startup {
  wfc-timeout 3;
  degr-wfc-timeout 2;
  outdated-wfc-timeout 1;
}
net {
  protocol C;
  verify-alg sha1;
  csums-alg sha1;
 
 using csums based resync is only interesting for WAN setups where you
 need to sync via a rather thin connection
 
  data-integrity-alg sha1;
 
 using data-integrity-alg is definitely not recommended (slow) for live
 setups, only if you have to assume there is buggy hardware on the way
 between your nodes ... like nics pretending csums are ok while they
 are not
 
 and out of curiosity ... did you gave DRBD 8.3.12 already a try?
 
 Regards,
 Andreas
 
 --
 Need help with DRBD?
 http://www.hastexo.com/now
 
 
  cram-hmac-alg sha1;
  shared-secret MySecret123;
}
on server1 {
  device/dev/drbd0;
  disk  /dev/sdb;
  address   192.168.111.10:7789 http://192.168.111.10:7789;
  meta-disk internal;
}
on server2 {
  device/dev/drbd0;
  disk  /dev/sdb;
  address   192.168.111.11:7789 http://192.168.111.11:7789;
  meta-disk internal;
}
  }
 
 
  After any change in the /etc/drbd.d/mysql.res file I issued a
 drbdadm adjust mysql on both nodes.
 
  Test #1
  DRBD partition on primary (secondary node disabled)
  Using Base DRBD config
  # dd if=/dev/zero of=/var/lib/mysql/TMP/disk-test.xxx bs=1M
 count=4096 oflag=direct
  Throughput ~ 420MB/s
 
  Test #2
  DRBD partition on primary (secondary node enabled)
  Using Base DRBD

Re: [DRBD-user] Urgent: Stuch with DRBD bring up.

2012-01-28 Thread Andreas Kurz
Hello,

On 01/28/2012 01:26 PM, venkatesh prabhu wrote:
 Hi,
 Please help me solve my issues.
 I am trying to bring up DRBD for first time but i am facing following 
 problems.
 
 1. when i start the drbd service for first time it says adjust disk failed:
 Starting DRBD resources: [
  create res: r0
  prepare disk: r0
  adjust disk: r0:failed(apply-al:20)
  adjust net: r0
 ]
 
 2. Then creation of metadat drbdadm create-md r0 is success.

Do the metadata creation on both nodes before you start the drbd service
and the rest should be fine ... and be sure to use the 8.4.1 release, not
8.4.0.
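
As a sketch of the order of operations (r0 being your resource):

  # on BOTH nodes, before the first service start
  drbdadm create-md r0
  # then
  service drbd start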

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 3. Then drbdadm up r0 fails with exit code 1.
 drbdadm up r0r0: Failure: (102) Local address(port) already in use.
 Command 'drbdsetup connect r0 ipv4:10.203.230.136:7788
 ipv4:10.203.230.135:7788 --shared-secret=DRBD --ping-timeout=5
 --ping-int=10 --connect-int=10 --timeout=60 --protocol=C' terminated
 with exit code 1
 
 4. If the run the same command drbdadm up r0  it fails with exit code 10.
 Device '0' is configured!
 Command 'drbdmeta 0 v08 /dev/vg0/drbdmeta flex-external apply-al'
 terminatedwith exit code 20
 
 but still i can promote the resource to primary and the sync happens
 properly between two nodes.
 
 but how can i avoid those errors? please help me.
 
 my drbd.conf file is provided below.
 
 global {
  usage-count no;
   }
 
   common {
 
 protocol C;
 
 startup {
 degr-wfc-timeout 3;#3 = 3 sec..
 
 wfc-timeout 180;# 3 min.
 } # end of startup
 
 handlers {
 } # end of handlers
 
   disk {
 on-io-error  detach;
} # end of disk
 
net {
  timeout   60;#  6 seconds  (unit = 0.1 seconds)
  connect-int   10;# 10 seconds  (unit = 1 second)
  ping-int  10;# 10 seconds  (unit = 1 second)
 ping-timeout   5;# 500 ms (unit = 0.1 seconds)
  shared-secret DRBD;
} # end of net
  } # end of common
 
 resource r0{
 
syncer {
 rate 100M;
 }
 
  on lab1601 {
 device /dev/drbd0;
 disk   /dev/vg0/mirror;
 address10.203.230.135:7788;
 meta-disk  /dev/vg0/drbdmeta;
 }
 
 on lab1602 {
 device/dev/drbd0;
 disk  /dev/vg0/mirror;
 address   10.203.230.136:7788;
 meta-disk /dev/vg0/drbdmeta;
}
  } #end
 
 
Thank You
Vengatesh Prabhu
 
 
 
 




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd and 10Gb/s network how to increase syncer rate beyond 100MB/s?

2012-01-28 Thread Andreas Kurz
Hello,

On 01/27/2012 02:35 PM, Maurits van de Lande wrote:
 Hello,
 
  
 
 I have a couple of servers with SSD based raid arrays (capable of
 writing 550+MB/s to the disks). And a 10Gb/s network (HP NC552SFP
 network adapter HP E8206 network switches)
 
  
 
 Before installing the 10Gb network adapter the syncrate was limited to
 around 102MB/s (1Gb/s?)
 
 After Installing the 10Gb network adapter it was still impossible to get
 more than 102MB/s sync rate between the servers.
 
 I would like to have a 300MB/s sync rate. When I set this as a fixed
 sync rate in my drbd84 resource file I noticed that this value is
 ignored and 102MB/s is used.

Just to be sure: you are using the new 10Gbit interconnect in your
DRBD configuration ... and you have already tested with iperf, or the tool of
your choice, that you can use the full bandwidth?
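
A quick sanity check could look like this sketch (the 10Gbit peer address is a
placeholder; the address lines of your resource must point at the 10Gbit
addresses, not at the old 1Gbit ones):

  # on one node
  iperf -s
  # on the other node, against the peer's 10Gbit address
  iperf -c 10.0.10.2
  # verify which addresses DRBD actually replicates over
  drbdadm dump all | grep address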

Regards,
Andreas

-- 
Need help with DRBD performance tuning?
http://www.hastexo.com/services/remote

 
  
 
 How can I increase the syncer rate in drbd84?
 
  
 
 Best regards,
 
  
 
 Maurits van de Lande
 
  
 
 |Van de Lande BV.|Lissenveld 1|4941VK |Raamsdonksveer|the Netherlands |T
 +31 (0) 162 516000 |F +31 (0) 162 521417 |www.vdl-fittings.com
 http://www.vdl-fittings.com |
 
  
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user





signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Slow buffered read performance

2012-01-24 Thread Andreas Kurz
On 01/24/2012 10:59 AM, samp...@neutraali.net wrote:
 Hello all,
 
 I'm running DRBD on two following machines:
 Scientific Linux 6.1
 Intel Xeon E5620 Quad Core
 12GB Ram
 LSI 9280-4i4e + CacheCade 1.0 (CacheCade is currently disabled)
 12x 1TB Seagate Constellation SAS 7200 RPM drives
 
 DRBD has been configured in master/slave -fashion and there is 10 GbE
 dedicated link between machines for DRBD traffic.
 Drives have been configured in RAID-10 mode with 64 kB Stripe Size.
 
 DRBD version is 8.4.0 (api:1/proto:86-100).

Have you tried to reproduce that with DRBD 8.4.1?

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Couple of weeks ago I had to do some maintenance work on node1 (normally
 the master node) so I moved all services to run on node2. Everything
 went well and since machines are identical I didn't bother to move
 services back to node1. Few days later I noticed that performance was
 somewhat degraded but difference was so minimal that I didn't focus on
 it at all.
 
 A little later I was asked to do some simple read performance tests.
 Everything looked ok when doing direct reads on DRBD-device:
 
 dd if=/dev/drbd0 of=/dev/null bs=1M iflag=direct
 ^C17317+0 records in
 17316+0 records out
 18157142016 bytes (18 GB) copied, 21.3465 s, 851 MB/s
 
 But with buffered reads things get slow:
 
 dd if=/dev/drbd0 of=/dev/null bs=1M
 ^C11131+0 records in
 11130+0 records out
 11670650880 bytes (12 GB) copied, 105.299 s, 111 MB/s
 
 However, the underlying disk seems to be fine:
 
 dd if=/dev/sdb of=/dev/null bs=1M
 ^C14087+0 records in
 14086+0 records out
 14770241536 bytes (15 GB) copied, 19.8579 s, 744 MB/s
 
 I moved services back to node1 and the problem was gone (dd
 if=/dev/drbd0 of=/dev/null bs=1M, 37312528384 bytes (37 GB) copied,
 54.8519 s, 680 MB/s). Now I started to investigate what caused the
 issues and moved services to node2 and performance problems hit again so
 I thought that there has to be something wrong on node2. I compared
 settings between the two machines to make sure they are really identical
 and found nothing strange between them. Raid sets are fine and there are
 no error messages in log files. At this point I decided to reboot node1
 and after that moved services back to it. After the reboot performance
 dropped also on node1 and I haven't been able to find out anything that
 could really help getting performance up again.
 
 So it seems like DRBD has huge effect when it comes to buffered reads.
 It may very well be that I have forgotten to do some sysctl or such
 tuning after reboot but I can't figure out what it could be. Any ideas
 how to work this out or is this expected behaviour?
 
 I'm currently using following settings on my disks:
 
 echo deadline > /sys/block/sdb/queue/scheduler
 echo 0 > /sys/block/sdb/queue/iosched/front_merges
 echo 150 > /sys/block/sdb/queue/iosched/read_expire
 echo 1500 > /sys/block/sdb/queue/iosched/write_expire
 echo 3200 > /proc/sys/vm/dirty_background_bytes
 echo 38400 > /proc/sys/vm/dirty_bytes
 echo 1024 > /sys/block/sdb/queue/nr_requests
 
 and current DRBD resource configuration:
 
 resource drbd0 {
 device /dev/drbd0;
 disk /dev/sdb1;
 meta-disk internal;
 
 options {
 cpu-mask 15;
 }
 
 net {
 protocol C;
 max-buffers 8000;
 max-epoch-size 8000;
 unplug-watermark 16;
 sndbuf-size 0;
 }
 
 disk {
 al-extents 3389;
 disk-barrier no;
 disk-flushes no;
 }
 
 on node1 {
 address 10.10.10.1:7789;
 }
 on node2 {
 address 10.10.10.2:7789;
 }
 }
 
 Best regards,
 Samuli Heinonen
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user





signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Concurrent local write detected! [DISCARD L]

2012-01-19 Thread Andreas Kurz
Hello,

On 01/18/2012 12:39 PM, Alessandro Bono wrote:
 Hi
 
 installing a kvm virtual machine on a drbd disk cause these logs on host
 machine
 
 [2571736.830557] block drbd0: kvm[7083] Concurrent local write detected! 
 [DISCARD L] new: 48981951s +32768; pending: 48981951s +32768
 [2571736.857671] block drbd0: kvm[7083] Concurrent local write detected! 
 [DISCARD L] new: 48982015s +512; pending: 48982015s +512
 [2571736.884479] block drbd0: kvm[7083] Concurrent local write detected! 
 [DISCARD L] new: 48982016s +3584; pending: 48982016s +3584
 [2571736.911285] block drbd0: kvm[7083] Concurrent local write detected! 
 [DISCARD L] new: 48982023s +28672; pending: 48982023s +28672
 [2571798.062440] block drbd0: kvm[7080] Concurrent local write detected! 
 [DISCARD L] new: 66639863s +4096; pending: 66639855s +8192
 [2571798.089232] block drbd0: kvm[7080] Concurrent local write detected! 
 [DISCARD L] new: 66639871s +512; pending: 66639871s +512
 [2571798.116014] block drbd0: kvm[7080] Concurrent local write detected! 
 [DISCARD L] new: 66639872s +3584; pending: 66639872s +3584
 [2571798.143110] block drbd0: kvm[7080] Concurrent local write detected! 
 [DISCARD L] new: 66639879s +57344; pending: 66639879s +53248
 [2571932.144089] block drbd0: kvm[7080] Concurrent local write detected! 
 [DISCARD L] new: 64914183s +24576; pending: 64914215s +65536
 [2572014.253295] block drbd0: kvm[7083] Concurrent local write detected! 
 [DISCARD L] new: 78267975s +65536; pending: 78267975s +65536
 [2572317.294655] block drbd0: kvm[7083] Concurrent local write detected! 
 [DISCARD L] new: 45901543s +12288; pending: 45901543s +12288
 [2572346.510458] block drbd0: kvm[7080] Concurrent local write detected! 
 [DISCARD L] new: 64619831s +65536; pending: 64619831s +65536
 [2572402.322369] block drbd0: kvm[7082] Concurrent local write detected! 
 [DISCARD L] new: 64818567s +61440; pending: 64818567s +61440
 [2572402.349160] block drbd0: kvm[7082] Concurrent local write detected! 
 [DISCARD L] new: 64818687s +512; pending: 64818687s +512
 [2572402.376182] block drbd0: kvm[7082] Concurrent local write detected! 
 [DISCARD L] new: 64818688s +3584; pending: 64818688s +3584
 [2572403.429157] block drbd0: kvm[7082] Concurrent local write detected! 
 [DISCARD L] new: 64896055s +65536; pending: 64896055s +65536
 [2572422.493968] block drbd0: kvm[7080] Concurrent local write detected! 
 [DISCARD L] new: 57422703s +65536; pending: 57422703s +65536
 [2572889.505644] block drbd0: kvm[7080] Concurrent local write detected! 
 [DISCARD L] new: 75415135s +65536; pending: 75415135s +65536
 
 on mailing list I found some reference to iscsi not on a vm directly
 installed on drbd
 
 vm guest is a windows 2003 server r2 with virtio disk drivers

Are you using the latest version of the virtio drivers, and do the errors
also occur if you run the machine without them?

 vm host is an ubuntu server with kernel 2.6.38-13 and drbd 8.3.12 from git
 qemu-kvm 0.12.3+noroms-0ubuntu9.16

Can you show us the options you use to start the VM ... or a virsh
dumpxml output?

 
 cat /proc/drbd
 version: 8.3.12 (api:88/proto:86-96)
 GIT-hash: 465da64362f0aece357e9015c50ed849e2458abd debian/changelog
 debian/control build by root@nebbiolo-dev, 2011-12-29 16:00:53

... is this the complete config? Please show us a drbdadm dump all and
the full cat /proc/drbd output.

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 
 cat /etc/drbd.d/r0.res
 resource r0 {
  syncer {
 rate 25M;
 csums-alg sha1;
 verify-alg sha1;
   }
 
   net {
 cram-hmac-alg sha1;
 shared-secret xx;
   }
 
   disk {
 no-disk-flushes;
 no-md-flushes;
   }
 
   on ga1 {
 device /dev/drbd0;
 disk   /dev/ga1/winsrv;
 address10.12.24.242:7788;
 meta-disk  internal;
   }
 
   on ga2 {
 device/dev/drbd0;
 disk  /dev/ga2/winsrv;
 address   10.12.24.243:7788;
 meta-disk internal;
   }
 
 }
 
 
 Is there a workaround/solution?
 
 thanks
 





signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] 4-way replication

2012-01-19 Thread Andreas Kurz
Hello,

On 01/19/2012 09:51 AM, Benjamin Knoth wrote:
 Hi Lars,
 i'm not interestedin Pacemaker 1.05. But in the DRBD User Guide you can
 only read the following.
 
 Note
 
 Due to limitations in the Pacemaker cluster manager as of Pacemaker
 version 1.0.5, it is not possible to create this setup in a single
 four-node cluster without disabling CIB validation, which is an advanced
 process not recommended for general-purpose use. It is anticipated that
 this is being addressed in future Pacemaker releases.
 
 I use Pacemaker 1.1.5 at the moment.

If you want to configure a split-site cluster you could test two
two-node Pacemaker 1.1.6 clusters and the new booth service:

http://doc.opensuse.org/products/draft/SLE-HA/SLE-ha-guide_sd_draft/cha.ha.geo.html

If you have a reliable interconnect between the two pairs, including
fencing, you could try to use one four-node cluster with some fancy
constraints ... I think that should work ... though getting the
constraints right might be challenging.

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Best regards
 
 Benjamin
 
 
 Am 18.01.2012 17:46, schrieb Lars Ellenberg:
 On Wed, Jan 18, 2012 at 02:12:13PM +0100, Benjamin Knoth wrote:
 Hi all,
 the 4-node replication is working fine and i can mange the resources in
 Pacemaker. In the documentation of the user guide i read that's not
 possible in Pacemaker 1.05 to create a 4 node cluster only 2.

 Why would you be interested in notes for Pacemaker 1.0.5,
 if we have 1.0.12 and 1.1.6-almost-7 out there?

 What's the
 status of this feature? Is it integrated or in which version is it
 planned to integrate this feature?

 best regards

 Benjamin


 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] % of utilisation on my drbd device

2012-01-19 Thread Andreas Kurz
sorry if that might get double-posted ... I used wrong sender address on
initial reply ...

--

Hello,

On 01/19/2012 07:56 AM, Matthieu Lejeune wrote:
 Hello,
 
 I have a primary/secondary node with 2 ressources.
 When I watch the i/O performance i look some think like this :
 
 When I watch the physical drive utilisation :
 
 root@relax:~# iostat -x /dev/sda 1
 Linux 2.6.32-5-amd64 (relax) 01/19/2012 _x86_64_(16 CPU)
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.020.001.090.020.00   98.87
 
 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda 220.78   700.91  555.52  374.88 20794.66 21227.05   
 45.16 1.541.66   0.42  38.87
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.040.000.170.000.00   99.79
 
 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda 186.0048.00  211.00  162.00 15392.00  2624.00   
 48.30 0.030.09   0.06   2.40
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.000.000.290.000.00   99.71
 
 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 sda 197.00 9.00  379.00  117.00 19472.00  1172.00   
 41.62 0.150.31   0.19   9.60
 
 When I watch the drbd device utilisation :
 
 root@relax:~# iostat -x /dev/drbd0 1
 Linux 2.6.32-5-amd64 (relax) 01/19/2012 _x86_64_(16 CPU)
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.020.001.090.020.00   98.87
 
 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 drbd0 0.00 0.00  775.16 1016.47 20729.96 21174.70   
 23.39 0.945.51   0.56  99.85
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.000.000.560.000.00   99.44
 
 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 drbd0 0.00 0.00  426.00  104.00 15648.00  1262.00   
 31.91 5.190.15   1.87  99.20
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.000.000.560.000.00   99.44
 
 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 drbd0 0.00 0.00  404.00  148.00 15888.00  1872.00   
 32.17 5.780.33   1.80  99.60
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.000.000.310.000.00   99.69
 
 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 drbd0 0.00 0.00  398.00  117.00 15632.00  1736.00   
 33.72 5.280.10   1.94 100.00
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
0.000.000.240.000.00   99.76
 
 Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s
 avgrq-sz avgqu-sz   await  svctm  %util
 drbd0 0.00 0.00  406.00  106.00 15376.00  1382.00   
 32.73 5.070.17   1.92  98.40
 
 I don't understand why my drbd device is at 100 %util.
 
 This is my ressource configuration : There are a raid 0 hardware with 12
 sas 15k drives.

So you think you should get higher write performance from your setup ...
what are your current benchmarking results?

Looking at your config I suggest you read the performance tuning chapter
in the DRBD Users Guide ... assuming you have a RAID controller with a
non-volatile cache and a 10GBit interconnect, you should really get quite
near to native speed with DRBD ...
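
Typical knobs from that chapter, as a sketch only; the values are examples
taken from comparable setups on this list, and disabling flushes is only
safe with a battery or flash backed controller cache:

  disk {
    no-disk-flushes;
    no-md-flushes;
  }
  net {
    max-buffers 8000;
    max-epoch-size 8000;
  }
  syncer {
    al-extents 3389;
  }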

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now


 resource data {
   protocol C;

   startup {
 wfc-timeout 0;
   }

   disk {
 on-io-error detach;
   }

   syncer {
 rate 20M;
 verify-alg md5;
   }

   on surtax {
 device/dev/drbd0;
 disk  /dev/sda;
 address   10.1.42.11:7788;
 meta-disk internal;
   }

   on relax {
 device/dev/drbd0;
 disk  /dev/sda;
 address   10.1.42.12:7788;
 meta-disk internal;
   }
 }

 What's wrong on my configuration ?

 Thank's

 Matthieu

 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user






signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] 4-way replication

2012-01-17 Thread Andreas Kurz
Hello,

On 01/16/2012 11:05 AM, Benjamin Knoth wrote:
 Hello,
 i think it's clear and the pacemaker config is also clear but i can't
 get a positiv result.
 
 I started with 4 maschines
 
 vm01 vm02 vm03 vm04
 
 On vm01 and vm02 i created a DRBD resource with this config.
 
 resource test {
 device /dev/drbd3;
 meta-disk internal;
 disk /dev/vg01/test;
 protocolC;
 
 
 syncer{
 rate 800M;
 }
 
 on vm01 {
 address 10.10.255.12:7003;
 }
 
 on vm02 {
 address 10.10.255.13:7003;
 }
 }
 
 On vm03 and vm04 i created this DRBD Resource
 
 resource test2 {
 device /dev/drbd3;
 meta-disk internal;
 disk /dev/vg01/test;
 protocolC;
 
 
 syncer{
 rate 800M;
 }
 
 on vm03 {
 address 10.10.255.14:7003;
 }
 
 on vm04 {
 address 10.10.255.15:7003;
 }
 }
 
 This two unstacked resources are running.
 
 If i look in the documentation i think that i need to create the
 following DRBD Resource on vm01-04.
 
 resource stest {
 protocolA;
 stacked-on-top-of test2 {
 device /dev/drbd13;
 address 10.10.255.16:7009;
 }
 stacked-on-top-of test {
 device /dev/drbd13;
 address 10.10.255.17:7009;
 }
 }
 
 But if i save this and copy them to all vms i get on vm03-04 if i run
 drbdadm --stacked create-md stest

Use the same, complete config on all nodes.

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 drbd.d/stest.res:1: in resource stest, referenced resource 'test' not
 defined.
 
 and vm01-02 on
 
 drbd.d/stest.res:1: in resource stest, referenced resource 'test2' not
 defined.
 
 What do i need that vm01-02 know about test2 on vm03-04 and vm03-04 know
 about test on vm01-02?
 
 Both ip addresses are virtual adresses on vm01 and vm03 where test and
 test2 are primary
 
 That is what i understood after i look on the picture and the pacemaker
 configuration.
 
 Best regards
 
 Benjamin
 
 Am 13.01.2012 15:27, schrieb Andreas Kurz:
 Hello,

 On 01/13/2012 12:56 PM, Benjamin Knoth wrote:
 Hi,

 i will create a 4 node replication with DRBD.
 I read also the documentation.
 I understand also the configuration of a 3 way replication, but how do i
 need to config the 4 way replication?

 I configured 2 2way resources successfully and now i need to config the
 stacked resource.

 Have a look at:

 http://www.drbd.org/users-guide-8.3/s-pacemaker-stacked-resources.html#s-pacemaker-stacked-dr

 ... a picture says more than 1000 words ;-)


 resource r0-U {
 {
 protocol A;
 }
 stacked-on-top-of r0 {
 device
 /dev/drbd10;
 address
 192.168.42.1:7788;
 }
 on charlie {
 device /dev/drbd10;
 disk /dev/hda6;
 address 192.168.42.2:7788; # Public IP of the backup
 meta-disk internal;
 }

 }


 Is the solution to define on server alice and bob and charlie and daisy
 a lower level resource with protoc C and than one stacked resource where
 directly the stacked resource from alice and bob communicate with the
 stacked resource of charlie and daisy like this configuration?

 Yes, configure the replication between two stacked resources.


 resource stacked {
 protocolA;
 stacked-on-top-of r0 {
 device /dev/drbd10;
 address 192.168.:7788;
 }
 stacked-on-top-of r0 {
 device /dev/drbd10;
 address 134.76.28.188:7788;
 }
 }

 Best regards

 Regards,
 Andreas




 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
 
 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user





signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] 4-way replication

2012-01-17 Thread Andreas Kurz
On 01/17/2012 03:02 PM, Benjamin Knoth wrote:
 Hello Andreas,
 
 Am 17.01.2012 14:51, schrieb Andreas Kurz:
 Hello,

 On 01/16/2012 11:05 AM, Benjamin Knoth wrote:
 Hello,
 i think it's clear and the pacemaker config is also clear but i can't
 get a positiv result.

 I started with 4 maschines

 vm01 vm02 vm03 vm04

 On vm01 and vm02 i created a DRBD resource with this config.

 resource test {
 device /dev/drbd3;
 meta-disk internal;
 disk /dev/vg01/test;
 protocolC;


 syncer{
 rate 800M;
 }

 on vm01 {
 address 10.10.255.12:7003;
 }

 on vm02 {
 address 10.10.255.13:7003;
 }
 }

 On vm03 and vm04 i created this DRBD Resource

 resource test2 {
 device /dev/drbd3;
 meta-disk internal;
 disk /dev/vg01/test;
 protocolC;


 syncer{
 rate 800M;
 }

 on vm03 {
 address 10.10.255.14:7003;
 }

 on vm04 {
 address 10.10.255.15:7003;
 }
 }

 This two unstacked resources are running.

 If i look in the documentation i think that i need to create the
 following DRBD Resource on vm01-04.

 resource stest {
 protocolA;
 stacked-on-top-of test2 {
 device /dev/drbd13;
 address 10.10.255.16:7009;
 }
 stacked-on-top-of test {
 device /dev/drbd13;
 address 10.10.255.17:7009;
 }
 }

 But if i save this and copy them to all vms i get on vm03-04 if i run
 drbdadm --stacked create-md stest

 Use the same, complete config on all nodes.
 I copied this config on all nodes.

And still not working? Can you provide or pastebin drbdadm dump all
and cat /proc/drbd from a node that gives you that error?

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Best regards
 
 Benjamin
 

 Regards,
 Andreas




 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] 4-way replication

2012-01-17 Thread Andreas Kurz
Hello,

On 01/17/2012 04:22 PM, Benjamin Knoth wrote:
 Hello,
 
 Am 17.01.2012 15:36, schrieb Andreas Kurz:
 On 01/17/2012 03:02 PM, Benjamin Knoth wrote:
 Hello Andreas,

 Am 17.01.2012 14:51, schrieb Andreas Kurz:
 Hello,

 On 01/16/2012 11:05 AM, Benjamin Knoth wrote:
 Hello,
 i think it's clear and the pacemaker config is also clear but i can't
 get a positiv result.

 I started with 4 maschines

 vm01 vm02 vm03 vm04

 On vm01 and vm02 i created a DRBD resource with this config.

 resource test {
 device /dev/drbd3;
 meta-disk internal;
 disk /dev/vg01/test;
 protocolC;


 syncer{
 rate 800M;
 }

 on vm01 {
 address 10.10.255.12:7003;
 }

 on vm02 {
 address 10.10.255.13:7003;
 }
 }

 On vm03 and vm04 i created this DRBD Resource

 resource test2 {
 device /dev/drbd3;
 meta-disk internal;
 disk /dev/vg01/test;
 protocolC;


 syncer{
 rate 800M;
 }

 on vm03 {
 address 10.10.255.14:7003;
 }

 on vm04 {
 address 10.10.255.15:7003;
 }
 }

 This two unstacked resources are running.

 If i look in the documentation i think that i need to create the
 following DRBD Resource on vm01-04.

 resource stest {
 protocolA;
 stacked-on-top-of test2 {
 device /dev/drbd13;
 address 10.10.255.16:7009;
 }
 stacked-on-top-of test {
 device /dev/drbd13;
 address 10.10.255.17:7009;
 }
 }

 But if i save this and copy them to all vms i get on vm03-04 if i run
 drbdadm --stacked create-md stest

 Use the same, complete config on all nodes.
 I copied this config on all nodes.

 And still not working? Can you provide or pastebin drbdadm dump all
 and cat /proc/drbd from a node that gives you that error?
 
 on vm01 and vm02 i get for resource test on cat /proc/drbd. The not
 stacked resource works
 
  3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
 ns:1077644 nr:0 dw:33232 dr:1044968 al:13 bm:63 lo:0 pe:0 ua:0 ap:0
 ep:1 wo:b oos:0
 
 After i copied the config with resource stest to all 4 nodes i get the
 following on vm01 and vm02.
 
 drbdadm dump all
 
 drbd.d/stest.res:1: in resource stest, referenced resource 'test2' not
 defined.
 
 And cat /proc/drbd display only the unstacked test resource
 
   3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
 ns:0 nr:0 dw:0 dr:528 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 
 On vm03 and vm04 i can't also find a stacked resource in /proc/drbd
 
  3: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-
 ns:0 nr:0 dw:0 dr:536 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 
 drbdadm dump all
 drbd.d/stest.res:1: in resource stest, referenced resource 'test' not
 defined.
 
 You see that on the referenced resource are different between vm01-02
 and vm03-04. On the example the unstacked resources had also different
 names. In this part DRBD need to know that the referenced resource test
 is also available on vm01-02 and test2 is only available on vm03-04.
 That is the problem what i need to solve or not?

Yes ... put _all_ resource configs on _all_ nodes (and include them in
your config of course): the same config on all four nodes
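
Concretely, a sketch: assuming the stock /etc/drbd.conf that includes
/etc/drbd.d/*.res, every node from vm01 to vm04 carries the identical set
of files:

  /etc/drbd.d/test.res    # lower-level resource (vm01/vm02)
  /etc/drbd.d/test2.res   # lower-level resource (vm03/vm04)
  /etc/drbd.d/stest.res   # stacked resource referencing both

Then drbdadm --stacked create-md stest no longer complains about an
undefined resource, because both referenced resources are known everywhere.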

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Best regards
 Benjamin
 

 Regards,
 Andreas




 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Dual-primary to single node

2012-01-17 Thread Andreas Kurz
Hello,

On 01/13/2012 10:59 AM, Luis M. Carril wrote:
 Hello,
 
I´m new to DRBD and I think that I have a mess with some concepts and
 policies.
 
I have setup a two node cluster (of virtual machines) with a shared
 volume in dual primary mode with ocfs2 as a basic infrastructure for
 some testings.
I need that when one of the two nodes goes down the other continues
 working normally (we can assume that the other node never will recover
 again), but when one node fails
the other enter in WFConnection state and the volume is disconnected,
 I have setup the standar set of policies for split brain:
 
 after-sb-0pri discard-zero-changes;
 after-sb-1pri discard-secondary;
 after-sb-2pri disconnect;
 
 
   Which policy should I use to achieve the desired behaivour (if one
 node fails, the other continue working alone)?

these policies only take effect if the two nodes see each other again
after a split-brain, and if you lose one node it is correct behaviour
that the remaining node has its DRBD resources in WFConnection state.

What do you mean by "volume is disconnected"? How do you manage your
cluster? Pacemaker? rgmanager?

Without any further information on the rest of your setup and what you
think is not working correctly, it's hard to comment further ...

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] 4-way replication

2012-01-13 Thread Andreas Kurz
Hello,

On 01/13/2012 12:56 PM, Benjamin Knoth wrote:
 Hi,
 
 i will create a 4 node replication with DRBD.
 I read also the documentation.
 I understand also the configuration of a 3 way replication, but how do i
 need to config the 4 way replication?
 
 I configured 2 2way resources successfully and now i need to config the
 stacked resource.

Have a look at:

http://www.drbd.org/users-guide-8.3/s-pacemaker-stacked-resources.html#s-pacemaker-stacked-dr

... a picture says more than 1000 words ;-)

 
 resource r0-U {
 {
 protocol A;
 }
 stacked-on-top-of r0 {
 device
 /dev/drbd10;
 address
 192.168.42.1:7788;
 }
 on charlie {
   device /dev/drbd10;
   disk /dev/hda6;
   address 192.168.42.2:7788; # Public IP of the backup
   meta-disk internal;
 }
 
 }
 
 
 Is the solution to define on server alice and bob and charlie and daisy
 a lower level resource with protoc C and than one stacked resource where
 directly the stacked resource from alice and bob communicate with the
 stacked resource of charlie and daisy like this configuration?

Yes, configure the replication between two stacked resources.

 
 resource stacked {
 protocolA;
 stacked-on-top-of r0 {
 device /dev/drbd10;
 address 192.168.:7788;
 }
 stacked-on-top-of r0 {
 device /dev/drbd10;
 address 134.76.28.188:7788;
 }
 }
 
 Best regards

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Metadata question

2012-01-10 Thread Andreas Kurz
Hello,

On 01/09/2012 01:41 PM, Peter Beck wrote:
 Hi guys,
 
 I have a question:
 
 If I use a partition with 1 TB, I should create a 33 MB metadata partition.
 What if I resize the 1TB partition ? Now the (internal) metadata partition
 should be resized too ? Or will it automatically attached at the end ?
 How does that exactly work ?

use internal meta-data and all is done automagically for you ... and
yes, on an online resize of the DRBD device the meta-data is also
resized and moved to the (new) end of the underlying device.
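
As a sketch (LV, resource and filesystem are placeholders), an online grow
usually looks like:

  # on both nodes: grow the backing device
  lvextend -L +100G /dev/vg0/r0-disk
  # on one node, with both nodes connected and UpToDate
  drbdadm resize r0
  # finally grow the filesystem on the primary, e.g.
  resize2fs /dev/drbd0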

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Best Regards
 Peter
 
 
 




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] blocking I/O with drbd

2011-12-29 Thread Andreas Kurz
On 12/29/2011 11:50 AM, Volker wrote:
 Hi all,
 
 ok ... still drbd 8.3.8 ... what does iostat -dx say during your test?
 sadly, yes. elrepo is not an option here because its unofficial.
 
 After hunting the problem in the perc6-caches, page-caches,
 lvm-alignment, etc. and not finding anything, i managed to convince some
 people here. We are going to try the packages from the
 elrepo-repository. any advice besides use the latest!?
 
 I was going to use
 
 drbd83-utils-8.3.12-1.el5.elrepo.x86_64.rpm
 kmod-drbd83-8.3.12-1.el5.elrepo.x86_64.rpm

fine

 
 since 8.4.x is sort of bleeding edge and still might contain some bugs.
 
 Even though the update on a test-host went flawlessly:
 
 Is there anything particular i need to look after before/after the
 update of the packages?
 

no

 Any notes on compatibility between 8.3.8-1 and 8.3.12-1 i should be
 aware of?

none that I am aware of

 
 Once the host is live again, i will report if that did the trick :-)

I'm curious too ;-)

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 regards
 volker
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user






signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Best replication link - 10Gbit ethernet, Infiniband, Dolphin

2011-12-27 Thread Andreas Kurz
Hello,

On 12/27/2011 01:54 PM, Phil Stricker wrote:
 Hi!
 
 At the moment, I am thinking about a new system using DRBD (Xenserver with 
 VMs using databases).
 
 To get higher throughput and lower latencies, I wanted to stop using 1 Gbit 
 Ethernet as replication link and started to read posts about DRBD with 
 alternative connections like:
 
 - 10 Gbit/s Ethernet
 - Infiniband (IPoIB)
 - Dolphin
 
 10 Gbit Ethernet would be the easiest an cheapest solution, but is it a 
 good idea to use it?

I have successfully integrated several 10Gb/s setups and they work fine. Easy
to set up and tune, and well supported if you use one of the usual
suspects as vendor.

With Infiniband/Dolphin you can get lower latency, and with
SDP/SuperSockets also higher bandwidth, but they are typically more
complicated to set up and tune.

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 What are your expiriences?
 
 Best wishes,
 Phil






signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] blocking I/O with drbd

2011-12-15 Thread Andreas Kurz
Hello Volker,

On 12/15/2011 01:19 PM, Volker wrote:
 Hi all,
 
 we've been using drbd for about six month now, and so far everything is
 working pretty well. Our setup is like this with two identical machines
 (besides the actual HDDs).
 
 Dell 2950 III
 - 16GB Ram, Dual-Quadcore Xeons 2.0GHz
 - Redhat Enterprise Linux 5.7, 2.6.18-238.19.1.el5
 - PERC6/i Raid-Controller
 - 2 Disk OS, Raid 1 (sda)
 - 4 Disk Content, Raid 10 (sdb)
 - 500GB /dev/sdb5 extended partition
 - LVM-Group 'content' on /dev/sdb5
 - 400GB LVM-Volume 'data' created in LVM-Group 'content'
 - DRBD with /dev/drbd0 on /dev/content/data (content being the
 LVM-Group, data being the LVM-Volume)
 - /dev/drbd0 is mounted with noatime,ext3-ordered-journaling and then
 exported with nfs3 and and mounted by 8 machines (rhel5 entirely)
 - replication is done using a dedicated nic with gbit
 
 The DRBD-Version is
 drbd 8.3.8-1
 kmod 8.3.8.1
 
 here is the information from /proc/drbd:
 
 
 version: 8.3.8 (api:88/proto:86-94)
 GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by
 mockbu...@builder10.centos.org, 2010-06-04 08:04:09
  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate B r
 ns:91617716 nr:15706584 dw:107784232 dr:53529112 al:892898 bm:37118
 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 
 
 This is the latest official Version available for CentOS/Redhat from
 the CentOS-extras repo (as far as i know):

try elrepo.org

 
 http://mirror.centos.org/centos/5/extras/x86_64/RPMS/
 
 
 The configuration is identical on both nodes, looking like this:
 
 # /etc/drbd.d/global_common.conf #
 ##
 global {
 usage-count no;
 }
 
 common {
 protocol B;
 handlers {}
 startup {}
 disk {
 no-disk-barrier;
 no-disk-flushes;
 no-disk-drain;

try replacing no-disk-drain with no-md-flushes
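
... i.e. a disk section like this sketch, based on your config:

  disk {
    no-disk-barrier;
    no-disk-flushes;
    no-md-flushes;
  }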

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 }
 net {
 max-buffers 8000;
 max-epoch-size 8000;
 unplug-watermark 1024;
 sndbuf-size 512k;
 }
 syncer {
 rate 25M;
 al-extents 3833;
 }
 }
 ##
 
 
 
 # /etc/drbd.d/production.conf #
 ##
 resource eshop
 {
   device/dev/drbd0;
   disk  /dev/content/data;
   meta-disk internal;
 
   on nfs01.data.domain.de  {
 address   10.110.127.129:7789;
   }
 
   on fallback.dta.domain.de  {
 address   10.110.127.130:7789;
   }
 }
 ##
 
 
 The problem we have with this setup is quite complicated to explain. The
 read/write-performance in daily production use is sufficient to not
 effect the entire platform. The usual system-load viewed using top is
 pretty low, usually between 0.5 and 3.
 
 As soon as i produce some artifical i/o on /dev/drb0 on the master, the
 load pretty much explodes (up to 15) because of blocking i/o. The i/o is
 done with dd and pretty small files of bout 40MB:
 
 dd if=/dev/zero of=./test-data.dd bs=4096 count=10240
 
 Two successive runs like this, make the load go up as far as 10-12
 rendering the whole system useless. In this state, a running dd can not
 be interruped, the nfs-exports are totally inaccessible and the whole
 production-system is at a stand still.
 
 Using blktrace/blkparse one can see, that absolutely no i/o is possible.
 
 'top' shows one or two cores at 100% wait:
 
 ###
 Cpu0 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Cpu3 : 0.0%us, 0.0%sy, 0.0%ni, 0.0%id,100.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Cpu5 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Cpu6 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
 Cpu7 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
 ###
 
 This lasts for about 3-4 minutes with the load slowly degrading.
 
 This behaviour can also be reproduced by using the megacli and querying
 the raid-controller for information. A couple of successive runs result
 in the above described behaviour.
 
 And here comes the catch:
 This only happens, if the drbd-layer is in use.
 
 
 If i produce heavy i/o (100 runs of writing a 400MB) on
 - the same block-device /dev/sdb
 - in the same volume-group 'content'
 - but on a newly created _different_ LVM-Volume
 - without usind the drbd-layer
 
 the system-load marginally rises, but i/o is never blocking.
 
 Things i have tried:
 - switching from protocol C to B
 - disk: no-disk-barrier / no-disk-flushes / no-disk-drain;
 - net: max-buffers / max-epoch-size / unplug-watermark / sndbuf-size
 - syncer: rate, al-extents
 - various caching settings on the 

Re: [DRBD-user] DRBD failover between datacenters if one's network fails

2011-12-15 Thread Andreas Kurz
Hello,

On 12/15/2011 06:55 PM, Trey Dockendorf wrote:
 
 On Dec 15, 2011 10:22 AM, Felix Frank f...@mpexnet.de
 mailto:f...@mpexnet.de wrote:

 Hi,

 On 12/15/2011 05:09 PM, Trey Dockendorf wrote:
  Thanks for the input.  Your right in that 2 days is too little time to
  do this, so I'm going to manual route of shutting one server down at a
  time, migrating the virtual disks then bringing it back up on the remote
  site.

 I had thought the QCow images were on one disk. If there are indeed
 several disks you can sync, yes, you can take that route.

  To avoid more downtime of manual migration once this is all over with, I
  think I will first attempt just getting a DRBD resource up and running
  to sync my servers back to the primary datacenter.  Can a DRBD resource
  on an existing LVM be done without effecting the data ?  Also since I

 Yes, provided you can a) enlarge the LV a bit to use internal meta data
 or b) have some extra space on both machines to create an external meta
 data disk.

  don't plain to have automatic failover, any precautions I should take if
  the network connection is lost between the two datacenters ?  Ideally
  this would allow me to have minimal downtime while the nodes re-sync.

 Resyncing does not require downtime. Migrating the VMs to the other DRBD
 peer needs downtime, and it's always brief.

 I cannot think of any required precautions.

 So the actual plan is to migrate the VMs before the connection is lost?
 Great, this way you get away with an (arbitrarily long) quicksync once
 the link returns and once that's finished, you can migrate back at your
 leisure.

 Cheers,
 Felix
 
 All the qcow images are in pools located on the same logical volume.
 
 Your correct.  The plan is to migrate before the fiber repair and
 network outage then sync them back with DRBD.
 
 The meta space, does it have to be stored separate from the replicated
 LVM?  I have a few 100 GBs left on that device.

meta-data is located at the end of a device (internal) or on an extra
device. About 32MB per 1TB is needed ... exact calculation:

http://www.drbd.org/users-guide/ch-internals.html#s-meta-data-size
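
As a rough sketch in shell arithmetic (the linked section has the exact
formula; this assumes roughly 8 sectors of bitmap per 2^18 data sectors plus
a small fixed overhead):

  # internal meta-data for a 1 TiB (2^31 sectors) backing device, in MiB
  echo $(( ( (2**31 / 2**18) * 8 + 72 ) * 512 / 1024 / 1024 ))   # prints 32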

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Thanks
 - Trey
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user






signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] blocking I/O with drbd

2011-12-15 Thread Andreas Kurz
Hello Volker,

On 12/15/2011 05:33 PM, Volker wrote:
 Hi Andreas,
 
 no-disk-drain;

 try replacing no-disk-drain by no-md-flushes
 
 Thanks for your suggestion. Unfortunately setting that made it worse.
 Shortly after
 
 $ drbdadm adjust content
 
 The load on the master went up to 4 and did not decrease afterwards.
 After removing 'no-md-flushes' the load went down to around 1-1.5 again.

hmmm ... that is unexpected.

This behaviour would maybe make sense if your controller has no cache at
all ... or it is configured to only cache reads.

 
 But:
 
 There were two resyncs directly after activating and deactivating it. It
 looked like below:
 
 
 [root@nfs01 nfs]# cat /proc/drbd
 version: 8.3.8 (api:88/proto:86-94)

Really do an upgrade! ... elrepo seems to have latest DRBD 8.3.12 packages
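
On CentOS/RHEL 5 that would be along these lines (a sketch; it assumes the
elrepo repository is already set up, package names as mentioned elsewhere in
this thread):

  yum install drbd83-utils kmod-drbd83
  # or, with the downloaded packages:
  rpm -Uvh drbd83-utils-8.3.12-1.el5.elrepo.x86_64.rpm \
           kmod-drbd83-8.3.12-1.el5.elrepo.x86_64.rpm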

 GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by
 mockbu...@builder10.centos.org, 2010-06-04 08:04:09
  0: cs:SyncTarget ro:Primary/Secondary ds:Inconsistent/UpToDate B r
 ns:902760 nr:13965696 dw:14865000 dr:75668 al:1163261 bm:45844 lo:0
 pe:0 ua:0 ap:0 ep:1 wo:d oos:1737728
   [...] sync'ed: 89.0% (1696/15332)M queue_delay: 0.0 ms
   finish: 0:00:52 speed: 32,928 (25,156) want: 25,600 K/sec
 ###
 
 As you can see, the rate is at around 25MB, which is fine and fast
 enough. The system-load on the master is not affected by this resync.
 
 Why these resyncs happen and so much data is being resynced, is another
 case. The nodes were disconnected for 3-4 Minutes which does not justify
 so much data. Anyways...

If you adjust your resource after changing a disk option the disk is
detached/attached ... this means resyncing the complete AL when done on a
primary ... 3833 * 4MB = 15332MB

 
 One further note regarding the blocking-io:
 
 After issueing the mentioned dd command
 
 $ dd if=/dev/zero of=./test-data.dd bs=4096 count=10240
 10240+0 records in
 10240+0 records out
 41943040 bytes (42 MB) copied, 0.11743 seconds, 357 MB/s

you benchmark your page cache here ... add oflag=direct to dd to bypass it
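
For example, a sketch of the same test with the page cache bypassed:

  dd if=/dev/zero of=./test-data.dd bs=4096 count=10240 oflag=direct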

 
 dd finishes within a couple of seconds (1-2) and the system-load does
 not increase right away. It takes about 4-5 seconds for the load to
 increase up to around 5-6. If i would issue a second dd-command right
 after the first one finishes, the load would increase even higher than
 5-6 with the second dd command being uninterruptible.

looks like I/O system or network is fully saturated

 
 Interestingly dd _always_ reports speeds of 200-350MB which is obviously
 not the case.
 
 Any more ideas?

try another RAID controller if DRBD upgrade is not enough.

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 greetings
 volker
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user





signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Userspace- vs. Kernelspaceversions

2011-11-30 Thread Andreas Kurz
On 11/30/2011 08:57 PM, Arnold Krille wrote:
 Hi again,
 
 public thanks to Andreas Kurz, who answered and helped in private (altough 
 only writing one email:).

yeah ... sorry, wrong reply button

 
 According to his recommendation I updated both nodes to 8.3.12. That alone 
 didn't fix the performance problem. But setting al-extents 3389; (and 
 getting 
 the parameter correctly) seems to fix the problem.

good to hear.

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Gotta see if it proves itself tomorrow when my colleagues want to do their 
 daily share of work.
 
 Have a good night,
 
 Arnold
 
 On Wednesday 30 November 2011 17:03:21 Arnold Krille wrote:
 I am a bit stumped currently. Our drbd setup seems to include extra low
 performance as a feature and I don't exactly know where the reason is.
 On thing the bothers bothers us is that the versions of the userspace tools
 and the kernel module differ (drbd8-utils v8.3.7 vs module v8.3.9).
 The newer kernel-module is a side-effect of the kernel from debian squeeze
 backports needed for the new network-cards. But the versions installed on
 both nodes are the same.
 Using latency- and throughput tests from the drbd documentation, the
 latency rises by a factor of ten and the throughput sinks by a factor of
 ~5 from the baccking devices to the drbd device.
 So here are my questions:
  - Could the mixed version be the reason for the performance penalty?
  - Would it be save to downgrade the kernel (and compile the network-driver
 by hand) or is the meta-data on the disk incompatible?
  - Or would you rather update the userspace tools to match the modules
 version?


 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user





signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] problem with diskless state

2011-11-28 Thread Andreas Kurz
Hello,

On 11/28/2011 11:05 AM, Michael Schumacher wrote:
 Hi,
 
 I am running a CENTOS6 server that is temporarily stand alone. I
 succeeded installing drbd on this stand alone machine and I am
 planning to add a secondary machine soon to run drbd in a useful
 primary/secondary configuration.
 However, it was necessary to get the first machine up and running.
 This weekend, I had to reboot the machine and are facing now problems
 to get it up and running again.
 
 This is what /prod/drbd is saying:
 
 ---8---
 version: 8.4.0 (api:1/proto:86-100)
 GIT-hash: 28753f559ab51b549d16bcf487fe625d5919c49c build by dag@Build64R6, 
 2011-08-12 09:40:17
  0: cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown C r-
 ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
  1: cs:WFConnection ro:Secondary/Unknown ds:Diskless/DUnknown C r-
 ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
 ---8---

Please use 8.4.0 only for testing, it has some stability issues ... wait
for 8.4.1 if you plan to use it for productive systems.

 
 this is what /var/log/messages is saying:
 
 ---8---
 Nov 28 09:53:52 virthost1 kernel: drbd: initialized. Version: 8.4.0 
 (api:1/proto:86-100)
 Nov 28 09:53:52 virthost1 kernel: drbd: GIT-hash: 
 28753f559ab51b549d16bcf487fe625d5919c49c build by dag@Build64R6, 2011-08-12 
 09:40:17
 Nov 28 09:53:52 virthost1 kernel: drbd: registered as block device major 147
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: Starting 
 worker thread (from drbdsetup [2303])
 Nov 28 09:53:52 virthost1 kernel: block drbd1: open(/dev/sda6) failed with 
 -16
 Nov 28 09:53:52 virthost1 kernel: block drbd1: drbd_bm_resize called with 
 capacity == 0
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: Terminating 
 worker thread
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: Starting 
 worker thread (from drbdsetup [2309])
 Nov 28 09:53:52 virthost1 kernel: block drbd0: open(/dev/sda4) failed with 
 -16
 Nov 28 09:53:52 virthost1 kernel: block drbd0: drbd_bm_resize called with 
 capacity == 0
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: Terminating 
 worker thread
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: Starting 
 worker thread (from drbdsetup [2312])
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: conn( 
 StandAlone - Unconnected )
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: Starting 
 receiver thread (from drbd_w_fileserv [2313])
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: receiver 
 (re)started
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_data_drbd: conn( 
 Unconnected - WFConnection )
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: Starting 
 worker thread (from drbdsetup [2315])
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: conn( 
 StandAlone - Unconnected )
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: Starting 
 receiver thread (from drbd_w_fileserv [2316])
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: receiver 
 (re)started
 Nov 28 09:53:52 virthost1 kernel: d-con fileserver1_root_drbd: conn( 
 Unconnected - WFConnection )
 Nov 28 09:54:02 virthost1 kernel: block drbd1: State change failed: Need 
 access to UpToDate data
 Nov 28 09:54:02 virthost1 kernel: block drbd1:   state = { cs:WFConnection 
 ro:Secondary/Unknown ds:Diskless/DUnknown r- }
 Nov 28 09:54:02 virthost1 kernel: block drbd1:  wanted = { cs:WFConnection 
 ro:Primary/Unknown ds:Diskless/DUnknown r- }
 Nov 28 09:54:03 virthost1 kernel: block drbd1: State change failed: Need 
 access to UpToDate data
 Nov 28 09:54:03 virthost1 kernel: block drbd1:   state = { cs:WFConnection 
 ro:Secondary/Unknown ds:Diskless/DUnknown r- }
 Nov 28 09:54:03 virthost1 kernel: block drbd1:  wanted = { cs:WFConnection 
 ro:Primary/Unknown ds:Diskless/DUnknown r- }
 Nov 28 09:54:04 virthost1 kernel: block drbd1: State change failed: Need 
 access to UpToDate data
 Nov 28 09:54:04 virthost1 kernel: block drbd1:   state = { cs:WFConnection 
 ro:Secondary/Unknown ds:Diskless/DUnknown r- }
 Nov 28 09:54:04 virthost1 kernel: block drbd1:  wanted = { cs:WFConnection 
 ro:Primary/Unknown ds:Diskless/DUnknown r- }
 Nov 28 09:54:05 virthost1 kernel: block drbd1: State change failed: Need 
 access to UpToDate data
 Nov 28 09:54:05 virthost1 kernel: block drbd1:   state = { cs:WFConnection 
 ro:Secondary/Unknown ds:Diskless/DUnknown r- }
 Nov 28 09:54:05 virthost1 kernel: block drbd1:  wanted = { cs:WFConnection 
 ro:Primary/Unknown ds:Diskless/DUnknown r- }
 Nov 28 09:54:06 virthost1 kernel: block drbd1: State change failed: Need 
 access to UpToDate data
 Nov 28 09:54:06 virthost1 kernel: block drbd1:   state = { cs:WFConnection 
 ro:Secondary/Unknown ds:Diskless/DUnknown r- }
 Nov 28 09:54:06 virthost1 kernel: block drbd1:  wanted = { cs:WFConnection 
 ro:Primary/Unknown 

Re: [DRBD-user] problem with diskless state

2011-11-28 Thread Andreas Kurz
Hello Michael,

don't forget to post to mailing-list ;-)

On 11/28/2011 03:21 PM, Michael Schumacher wrote:
 Dear Andreas,
 
 On Monday, November 28, 2011 you wrote:
 
 Any chance PV signatures on these disks were detected and therefore the
 VGs were activated automatically? If yes, please adjust your lvm.conf
 to ignore them. Don't forget to recreate your initrd/initramfs so it is
 updated in there as well.
 
 This could have happened.
 
 Hm, I am still wondering. If I will adjust my lvm.conf accordingly to
 avoid being activated, this still will not repair the damage on my
 drbd disks.

Why do you think there is something damaged?

 
 Right?
 
 My drbd disks may still be unaccessible?

I'd expect the disks to be attachable once you have stopped all VGs.
Deactivating/removing the LVM cache in addition to the extended filtering is
also a good idea.
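
A rough sketch for /etc/lvm/lvm.conf, rejecting the DRBD backing devices from
your logs (adjust the device names to your setup):

  filter = [ "r|/dev/sda4|", "r|/dev/sda6|", "a|.*|" ]
  write_cache_state = 0

then remove /etc/lvm/cache/.cache, run vgscan and rebuild the initramfs.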

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Any experiences with this kind of setup

2011-11-23 Thread Andreas Kurz
Hello,

On 11/23/2011 10:00 AM, Thomas Reinhold wrote:
 Hi list, 
 
 I'm operating several DRBD clusters and am currently planning a new one. 
 
 I just would like to know if you have any experience with or suggestions for 
 this kind of stack: 
 
 - RAID-Controller: LSI MegaRAID SAS 9260-4i with BBU and SAS-HDDs
 -- DRBD (protocol C, 2-node)
 --- LVM2
  dm-crypt (aes:xts-plain:sha1:512)
 - VM (KVM)
 -- DBMS (MySQL)
 
 Everything is based on Ubuntu LTS (10.04). The DRBD version shipped with the 
 OS is 8.3.7 with Ubuntu patches (2:8.3.7-1ubuntu2.2). 

We have such a setup running. On Debian Squeeze, backports kernel and
latest DRBD 8.3 ... works like a charm ;-)

 
 Any No-Nos in here? Any comments greatly appreciated!

Not really, though you want to make sure you have aesni_intel module
available when running on current Intel Xeon hardware to get support for
AES-NI. In combination with a kernel that supports multiple encryption
pipes you get nearly native write performance ... without you are stuck
to what one cpu can encrypt.
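
A quick check could look like this (module name as in current mainline kernels):

  grep -m1 aes /proc/cpuinfo     # CPU advertises AES-NI?
  modprobe aesni_intel
  lsmod | grep aesni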

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
   Thanks, 
 
  Thomas
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user






signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] How to recover data from node3

2011-11-16 Thread Andreas Kurz
On 11/16/2011 01:58 PM, fosiul alam wrote:
 Hi Andreas
 
 Thanks for your response.
 I read that link so many times. and tryed what its say.. but not luck
 bellow what i have done so far ..
 
 Denmkar link :
 root@drbd-drs:~# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res  csro   ds p  mounted 
 fstype
 10:home-U  WFConnection  Primary/Unknown  UpToDate/DUnknown  A
 11:data-U  WFConnection  Primary/Unknown  UpToDate/DUnknown  A
 
 Uk link :
 
 root@drbd1:~# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res   cs ro   ds p  mounted 
 fstype
 0:home  Connected  Secondary/Secondary  UpToDate/UpToDate  C
 1:data  Connected  Secondary/Secondary  UpToDate/UpToDate  C

You have to bring up the stacked resources in secondary mode on either
drbd1 or drbd2 ... without this step, there is no device drbd3 can
connect to.
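
e.g. on drbd1, with the lower resources already Primary there (resource names
taken from your config):

  drbdadm primary home data
  drbdadm up --stacked home-U
  drbdadm up --stacked data-U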

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 
 
 now I need to make syncronization bewtween Drbd-drs to Drbd1
 
 Because its Split Brain  according to document. I need to tell drbd1 to
 use as secondary and syncronized from drbd-drs
 
 so
 in drbd1
 
 
 root@drbd1:~# drbdadm secondary --stacked data-U
 11: Failure: (127) Device minor not allocated
 Command 'drbdsetup 11 secondary' terminated with exit code 10
 
 so it does not take the secondary command ..
 
 in drbd-drs ( im teling it to be primary )
 
 root@drbd-drs:~# drbdadm primary data-U
 root@drbd-drs:~# drbdadm primary home-U
 
 root@drbd-drs:~# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res  csro   ds p  mounted 
 fstype
 10:home-U  WFConnection  Primary/Unknown  UpToDate/DUnknown  A
 11:data-U  WFConnection  Primary/Unknown  UpToDate/DUnknown  A
 
 root@drbd-drs:~# drbdadm connect data-U
 11: Failure: (125) Device has a net-config (use disconnect first)
 Command 'drbdsetup 11 net 172.31.3.4:7789 http://172.31.3.4:7789
 172.31.2.4:7789 http://172.31.2.4:7789 A --set-defaults
 --create-device --shared-secret=secret' terminated with exit code 10
 
 
 
 so it does not take those split brain commands ..
 I am missing something but don't understand what .. what steps are those ..
 
 thanks for your help
 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] How to recover data from node3

2011-11-16 Thread Andreas Kurz
On 11/16/2011 02:41 PM, fosiul alam wrote:
 Hi
 I was trying to simulate the process again ..
 
 Now when i tryed to make drbd1 as Secondary
 
 root@drbd1:~# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res cs  ro ds p 
 mounted  fstype
 0:homeConnected   Primary/Secondary  UpToDate/UpToDate  C
 1:dataConnected   Primary/Secondary  UpToDate/UpToDate  C
 10:home-U^^0  StandAlone  Secondary/Unknown  UpToDate/DUnknown  r
 11:data-U^^1  StandAlone  Secondary/Unknown  UpToDate/DUnknown  r

fine ... so nearly ready for resync

 
 Now as described in documentation..
 in DRBD-DRS
 
 root@drbd-drs:/# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res  cs  ro   ds p 
 mounted  fstype
 10:home-U  StandAlone  Primary/Unknown  UpToDate/DUnknown  r
 11:data-U  StandAlone  Primary/Unknown  UpToDate/DUnknown  r
 root@drbd-drs:/# drbdadm connect home-U
 root@drbd-drs:/# drbdadm connect data-U
 
 root@drbd-drs:/# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res  csro   ds p  mounted 
 fstype
 10:home-U  WFConnection  Primary/Unknown  UpToDate/DUnknown  A
 11:data-U  WFConnection  Primary/Unknown  UpToDate/DUnknown  A
 
 so its gone back to WFconnection ..

yes, because drbd2 is in StandAlone mode ... so no drbd network
configuration is active on drbd2 ...

now you can execute the drbdadm -S -- --discard-my-data connect
_stacked_resource_ command you read in the DRBD users guide on drbd2 if you
want it to be the SyncTarget for drbd-drs.
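
i.e., roughly (stacked resource names from your output):

  drbdadm -S -- --discard-my-data connect home-U
  drbdadm -S -- --discard-my-data connect data-U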

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 So what do you want me to do ?
 
 (a) I make  primary on Low lever device on DRBD1
 (B) I turn on Stacked device  on DRBD1
 (c) I set secondary on Stacked devices. (DRBD2)
 
 so Again I am missing something ..
 
 Please help me bit more ..
 Thanks for your help
 
 
 
 
 On 16 November 2011 13:20, fosiul alam expertal...@gmail.com
 mailto:expertal...@gmail.com wrote:
 
 Hi Felix and /Andreas/
 
 thanks, its working now .. but only thing i have done different this
 time is invalidate the data in drbd1 ..
 
 i will simulate the same process couple of time more.. and will come
 back toyou ...
 
 Thanks again on both
 
 
 
 On 16 November 2011 13:00, Felix Frank f...@mpexnet.de
 mailto:f...@mpexnet.de wrote:
 
 Hi,
 
 On 11/16/2011 10:47 AM, fosiul alam wrote:
  root@drbd1:~# /etc/init.d/drbd status0:home  Connected
  Secondary/Secondary  UpToDate/UpToDate  C
  1:data  Connected  Secondary/Secondary  UpToDate/UpToDate
 
 so if drbd1 is connected...
 
  Croot@drbd2:~# /etc/init.d/drbd status0:home  WFConnection
  Primary/UnknownUpToDate/Outdated  C
  1:data  WFConnection  Secondary/Unknown  UpToDate/Outdated  C
 
 ...what is it connected to? I wonder.
 
 Getting both your local nodes in Secondary/Secondary is fine.
 The you
 must make one Primary and bring up the stacked resources.
 
 If you do get split brain between your local stacked resources
 and the
 remote resources, you do have to resolve it as hinted by Andreas.
 
 Bear in mind that your victim has stacked resources, you will
 have to
 discard its data using drbdadm --stacked.
 
 HTH,
 Felix
 
 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user





signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] How to recover data from node3

2011-11-16 Thread Andreas Kurz
On 11/16/2011 04:53 PM, fosiul alam wrote:
 Hi Flex :
 bellow is my another simulation
 
 i have put step by step , I execute command on both drbd1 and drbd-drs.
 i have posted the output after each effect from both server.
 
 
 
 DRBD1 and DRBD2 is OFF (uk is off)
 
 DRBD-DRS is on and primary
 
 root@drbd-drs:/# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res  csro   ds p  mounted 
 fstype
 10:home-U  WFConnection  Primary/Unknown  UpToDate/DUnknown  A
 11:data-U  WFConnection  Primary/Unknown  UpToDate/DUnknown  A
 
 
 DRBd1 :
 
 root@drbd1:~# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res   cs ro ds p  mounted  fstype
 0:home  Connected  Primary/Secondary  UpToDate/UpToDate  C
 1:data  Connected  Primary/Secondary  UpToDate/UpToDate  C
 root@drbd1:~#
 
 
 root@drbd1:~# drbdadm up --stacked data-U
 root@drbd1:~# drbdadm up --stacked home-U
 
 root@drbd1:~# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res cs  ro ds p 
 mounted  fstype
 0:homeConnected   Primary/Secondary  UpToDate/UpToDate  C
 1:dataConnected   Primary/Secondary  UpToDate/UpToDate  C
 10:home-U^^0  StandAlone  Secondary/Unknown  UpToDate/DUnknown  r
 11:data-U^^1  StandAlone  Secondary/Unknown  UpToDate/DUnknown  r
 
 
 Afater I execute previous command, when i check DRBD-DRS :
 
 
 root@drbd-drs:/# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res  cs  ro   ds p 
 mounted  fstype
 10:home-U  StandAlone  Primary/Unknown  UpToDate/DUnknown  r
 11:data-U  StandAlone  Primary/Unknown  UpToDate/DUnknown  r
 
 --
 
 DRBD1 ::
 
 
 root@drbd1:~# drbdadm connect --stacked data-U
 root@drbd1:~# drbdadm connect --stacked home-U
 root@drbd1:~# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res csro ds p 
 mounted  fstype
 0:homeConnected Primary/Secondary  UpToDate/UpToDate  C
 1:dataConnected Primary/Secondary  UpToDate/UpToDate  C
 10:home-U^^0  WFConnection  Secondary/Unknown  UpToDate/DUnknown  A
 11:data-U^^1  WFConnection  Secondary/Unknown  UpToDate/DUnknown  A
 
 
 DRBD-DRS:
 
 root@drbd-drs:/# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res  cs  ro   ds p 
 mounted  fstype
 10:home-U  StandAlone  Primary/Unknown  UpToDate/DUnknown  r
 11:data-U  StandAlone  Primary/Unknown  UpToDate/DUnknown  r
 
 --
 
 Now if i execute drbdadm connect on DRBD-DRS :
 
 root@drbd-drs:/# drbdadm connect data-U
 root@drbd-drs:/# drbdadm connect home-U
 root@drbd-drs:/# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res  cs  ro   ds p 
 mounted  fstype
 10:home-U  StandAlone  Primary/Unknown  UpToDate/DUnknown  r
 11:data-U  StandAlone  Primary/Unknown  UpToDate/DUnknown  r
 
 now output from DRBD1
 
 root@drbd1:~# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 8.3.7 (api:88/proto:86-91)
 srcversion: EE47D8BF18AC166BE219757
 m:res cs  ro ds p 
 mounted  fstype
 0:homeConnected   Primary/Secondary  UpToDate/UpToDate  C
 1:dataConnected   Primary/Secondary  UpToDate/UpToDate  C
 10:home-U^^0  StandAlone  Secondary/Unknown  UpToDate/DUnknown  r
 11:data-U^^1  StandAlone  Secondary/Unknown  UpToDate/DUnknown  r
 
 
 ---
 
 So connect does not do anything ... ..


Did you _really_ read this?

http://www.drbd.org/users-guide-legacy/s-resolve-split-brain.html

I strongly doubt it! ... there is really no need to invalidate and do a
full sync if this is only a split brain situation.
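
For a plain split brain the usual sequence looks roughly like this (-S for the
stacked resources, shown here for data-U only):

  # on the split brain victim:
  drbdadm -S secondary data-U
  drbdadm -S -- --discard-my-data connect data-U

  # on the survivor, if it dropped to StandAlone:
  drbdadm connect data-U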

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now



 
 now if i invalidate ..  in DRBD1
 
 
 root@drbd1:~# drbdadm invalidate --stacked data-U
 root@drbd1:~# drbdadm invalidate --stacked home-U
 
 root@drbd1:~# /etc/init.d/drbd status
 drbd driver loaded OK; device status:
 version: 

Re: [DRBD-user] Can i add Third Node with existing DRBD setup ?

2011-11-16 Thread Andreas Kurz
On 11/16/2011 06:13 PM, fosiul alam wrote:
 Hi
 I just completed my test of DRBD with 3node
 
 our Situation is like ethis :
 
 We will have 2 node in Uk and One node in Denmark.
 
 suppose If i create 2 node in Uk  with internal meta-disk options, with
 bellow drbd.conf
 
 http://www.drbd.org/users-guide/re-drbdconf.html
 
 and if its runs for couple of month ..
 
 
 then if i want to add a 3rd node ..
 will i be able to do this without lossing data of node1 and node2 ??

If you already know for sure you will add a third node you can set up
your system with stacked devices from the beginning:

http://www.drbd.org/users-guide-8.3/s-three-nodes.html

... and run without third node. This has the advantage that there will
be no service downtime when adding the third node.

If you want to start with a two-node setup, this is also possible but
you need to prepare your setup to keep some free space at the end of the
device for the internal meta-data that is obligatory for a stacked
resource.

No problem if you use LVM, then you can resize the device later ... or
you create a file system smaller than the device ... or change to
external meta-date for the lower resource later ... there are several
possibilities ...
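
A rough sketch of the LVM variant (names and sizes are only examples): keep the
file system as it is and grow only the backing device when the stacked
resource is added, so its internal meta data fits at the end:

  lvcreate -L 500G -n r0_lower vg0   # backing LV of the lower resource
  # ... later, before adding the third node:
  lvextend -L +256M /dev/vg0/r0_lower
  drbdadm resize r0                  # lower DRBD device grows, fs stays as is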

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Please advise ..
 
 Thanks
 Fosiul.
 
 
 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user





signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] mkfs on /dev/drdb0 crashes host

2011-11-10 Thread Andreas Kurz
On 11/10/2011 08:49 AM, glennv wrote:
 
 Dear drbd experts,
 
 Please some help for a newbie. Checked the complete internet and found only
 one similar post but without any solution or hints.
 
 2 Linux 32bits server nodes 11.10 (in VMware Fusion)
 A 30GB device setup on both nodes and initial sync is fine. Can switch from
 primary to secondary etc. 
 But the moment i want to create a filesystem on the primary  (mkfs.ext3
 /dev/drdb0) the node crashes every time half way in the mkfs.
 
 Any ideas/ hints ?

drbd version+config? kernel version? kernel logs? ... any information?

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Increasing a DRBD array

2011-10-24 Thread Andreas Kurz
On 10/22/2011 05:47 PM, Gerald Brandt wrote:
 Okay, this is the plan for changing an primary/secondary drbd from 'meta-disk 
 /dev/sda6[1] to 'flexible-meta-disk /dev/sda6'
 
 sda6 is already 512 MB in size, which in theory will let me go to 16 TB 
 storage (128 MB is 4 TB storage).

yes, 512MB is fine

 
 on secondary:
 1. drbdadm down iscsi.target.0
 2. edit drbd.conf to reflect meta-disk change (also on primary?)

no, do it on the secondary only for now

 3. drbdadm create-md iscsi.target.0
 4. drbdadm up iscsi.target.0
 5. wait for full sync to take place
 
 on primary

you mean on the previous primary ... as you might want to do a
switchover to the already prepared secondary 

 6-10 same as 1 - 5 above
 
 At with point I should be able to do drbdadm resize iscsi.target.0, and I 
 should see 6TB of storage.

after point 4 where the previous primary connects to the already updated
previous secondary the resize starts automatically as the two nodes
detect they both have more diskspace and metadata that can handle this
... so you will see a resync of the missing 2TB

 
 That that sound right to everyone?  It should give me no downtime and the 
 ability to have more than 4TB storage.

No downtime? ... except the time you need to switch Roles and therefore
migrate services ... but I assume you let your cluster manager do this job.

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 
 Gerald
 
 - Original Message -
 From: Gerald Brandt g...@majentis.com
 To: Andreas Kurz andr...@hastexo.com
 Cc: drbd-user drbd-user@lists.linbit.com
 Sent: Saturday, October 22, 2011 8:33:15 AM
 Subject: Re: [DRBD-user] Increasing a DRBD array

 Hi,



 - Original Message -
 From: Andreas Kurz andr...@hastexo.com
 To: drbd-user drbd-user@lists.linbit.com
 Sent: Friday, October 21, 2011 6:12:11 PM
 Subject: Re: [DRBD-user] Increasing a DRBD array

 On 10/21/2011 11:39 PM, Gerald Brandt wrote:
 Hi,

 I just saw that (google is my friend).  Can I change that on a
 running drbd system?

 hmm ... never tried changing meta-date that way ... shutdown,
 dump-md,
 reconfigure, create-md, restore-md might work ... maybe Lars has a
 hint ...

 I would bring DRBD down on both nodes, stop it when all is in sync
 and
 recreate the meta data after changing the config and then skip the
 initial sync when bringing them up.


 I really can't bring the nodes down.  I can bring down one at a time,
 but the systems have to stay running.
  

 ie:

 original:

 on iscsi-filer-1 {simply use meta-disk internal;
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.1:7789;
 meta-disk /dev/sda6[1];
 }

 on iscsi-filer-2 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.2:7789;
 meta-disk /dev/sda6[1];
 }

 new:

 on iscsi-filer-1 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.1:7789;
 flexible-meta-disk /dev/sda6[1];

 no ... that index thing only works for static meta-disk ...
 remove
 the
  [1] and resize /dev/sda6 if it's not bigger than 196MB.

 I'm not sure I understand.  /dev/sda6 is already 512 MB (I think).
  Should I change to:

 on iscsi-filer-1 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.1:7789;
 flexible-meta-disk /dev/sda6;
 }

 on iscsi-filer-2 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.2:7789;
 flexible-meta-disk /dev/sda6;
 }

 or would this be better:

 on iscsi-filer-1 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.1:7789;
 meta-disk internal;
 }

 on iscsi-filer-2 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.2:7789;
 meta-disk internal;
 }


 I'll go back to my lists to see if I'm doing things right.

 1. bring down the secondary
 2. change the secondary to 'flexible-meta-data /dev/sda6' in
 drbd.conf on primary and secondary.
 3. bring secondary back up (may re-sync entire disk, not a serious
 issue, just time)
 4. repeat process for primary after re-sync (may cause another
 complete resync).

 Gerald



 Regards,
 Andreas

 --
 Need help with DRBD?
 http://www.hastexo.com/now

 }

 on iscsi-filer-2 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.2:7789;
 flexible-meta-disk /dev/sda6[1];
 }

 Then reboot primary, followed by reboot secondary (after sync),
 and
 all will be well?

 Sorry if these seem to be noob questions.  I just want to be 100%
 sure, as the file servers have live data on them.

 Gerald


 - Original Message -
 From: Andreas Kurz andr...@hastexo.com
 To: drbd-user@lists.linbit.com
 Sent: Friday, October 21, 2011 4:26:13 PM
 Subject: Re: [DRBD-user] Increasing a DRBD array

Re: [DRBD-user] Questions Regarding Configuration

2011-10-23 Thread Andreas Kurz
On 10/23/2011 09:39 PM, Nick Khamis wrote:
 The following works as expected:
 
 node mydrbd1 \
attributes standby=off
 node mydrbd2 \
attributes standby=off
 primitive myIP ocf:heartbeat:IPaddr2 \
   op monitor interval=60 timeout=20 \
 params ip=192.168.2.5 cidr_netmask=24 \
 nic=eth1 broadcast=192.168.2.255 \
   lvs_support=true
 primitive myDRBD ocf:linbit:drbd \
   params drbd_resource=r0.res \
   op monitor role=Master interval=10 \
   op monitor role=Slave interval=30
 ms msMyDRBD myDRBD \
   meta master-max=1 master-node-max=1 \
   clone-max=2 clone-node-max=1 \
   notify=true globally-unique=false
 group MyServices myIP
 order drbdAfterIP \
   inf: myIP msMyDRBD
 location prefer-mysql1 MyServices inf: mydrbd1
 location prefer-mysql2 MyServices inf: mydrbd2

??

 property $id=cib-bootstrap-options \
 no-quorum-policy=ignore \
 stonith-enabled=false \
 expected-quorum-votes=5 \
 dc-version=1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c \
 cluster-recheck-interval=0 \
 cluster-infrastructure=openais
   rsc_defaults $id=rsc-options \
   resource-stickiness=100
 
 However, when modifying the order entry to:
 
 order drbdAfterIP \
   inf: myIP:promote msMyDRBD:start
 
 DRBD no longer works. And when adding the following colocation:

yes, the promote of the IP will never happen as it is a) only configured
as a primitive and b) IPaddr2 does not support a promote action ... no IP
promote, no DRBD start ...
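
A sketch of what was probably intended (crm shell syntax, same resource names):
start the IP first, then promote DRBD on that node:

  order drbdAfterIP inf: myIP:start msMyDRBD:promote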

 
 colocation drbdOnIP \
   inf: MyServices msMyDRBD:Master
 
 none of the resources work.

tried removing those obscure two location constraints?

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] [Linux-HA] violate uniqueness for parameter drbd_resource

2011-10-21 Thread Andreas Kurz
Hello,

On 10/19/2011 04:49 PM, Nick Khamis wrote:
 Hello Everyone,
 
 What we have is a 4 node cluster: 2 Running mysql on a active/passive,
 and 2 running our application on an active/active:
 
 MyDRBD1 and MyDRBD2: Mysql, DRBD (active/passive)
 ASTDRBD1 and ASTDRBD2: In-house application, DRBD dual primary
 A snippet of our config looks like this:
 
 node mydrbd1 \
attributes standby=off
 node mydrbd2 \
attributes standby=off
 node astdrbd1 \
attributes standby=off
 node astdrbd2 \
attributes standby=off
 primitive drbd_mysql ocf:linbit:drbd \
   params drbd_resource=r0.res \
   op monitor role=Master interval=10 \
   op monitor role=Slave interval=30
 .
 primitive drbd_asterisk ocf:linbit:drbd \
   params drbd_resource=r0.res \
   op monitor interval=20 timeout=20 role=Master \
   op monitor interval=30 timeout=20 role=Slave
 ms ms_drbd_asterisk drbd_asterisk \
   meta master-max=2 notify=true \
   interleave=true
 group MyServices myIP fs_mysql mysql \
   meta target-role=Started
 group ASTServices astIP asteriskDLM asteriskO2CB fs_asterisk \
   meta target-role=Started
 .
 
 I am recieving the following warning: WARNING: Resources
 drbd_asterisk,drbd_mysql violate uniqueness for parameter
 drbd_resource: r0.res
 Now the obvious thing to do is to change the resource name at the DRBD
 level however, I assumed that the parameter uniqueness was bound to
 the primitive?

Only one resource per cluster should use this value for this attribute
if it is marked globally-unique in the RA meta-information.

Do yourself a favour and give the DRBD resources a meaningful name, how
about asterisk and mysql ;-)
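
e.g. (purely illustrative names):

  # drbd.conf
  resource mysql    { ... }
  resource asterisk { ... }

  # CIB
  primitive drbd_mysql ocf:linbit:drbd \
    params drbd_resource=mysql ...
  primitive drbd_asterisk ocf:linbit:drbd \
    params drbd_resource=asterisk ...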

 
 My second quick question is, I like to use group + location to
 single out services on specific nodes however, when creating clones:
 
 clone cloneDLM asteriskDLM meta globally-unique=false interleave=true
 
 I am recieving ERROR: asteriskDLM already in use at ASTServices
 error? My question is, what are the benefits of using group + location
 vs. clone + location?

Once a resource is in a group it cannot be used for clones/MS any more
... though you can clone a group or make it MS.

 With the latter I assue we will have a long list of location (one for
 each primitive + node)? And with the former we do not have he meta
 information
 (globally-unique, and interleave)?

I assume you want to manage a cluster filesystem ... so put all the
dlm/o2cb/cluster-fs resources in a group and clone it (and use
interleave for this clone)
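
e.g. (a sketch reusing your resource names):

  group g_ocfs2 asteriskDLM asteriskO2CB fs_asterisk
  clone cl_ocfs2 g_ocfs2 \
    meta interleave=true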

Regards,
Andreas

-- 
Need help with Pacemaker or DRBD?
http://www.hastexo.com/now




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Increasing a DRBD array

2011-10-21 Thread Andreas Kurz
On 10/21/2011 10:48 PM, Gerald Brandt wrote:
 Hi,
 
 DRBD is running directly on md0.  /dev/drbd1 is then exported via iSCSI.
 
 The logs show:
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332010] drbd: initialized.
 Version: 8.3.7 (api:88/proto:86-91)
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332012] drbd: GIT-hash:
 ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@filer-1,
 2011-03-05 08:29:38
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332014] drbd: registered as
 block device major 147
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332015] drbd: minor_table @
 0x88021dbaf300
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334144] block drbd1:
 Starting worker thread (from cqueue [1489])
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334177] block drbd1: ==
 truncating very big lower level device to currently maximum possible
 8587575296 sectors ==
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334179] block drbd1: ==
 using internal or flexible meta data may help ==
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334188] block drbd1: disk(
 Diskless - Attaching )
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.353306] Loading iSCSI
 transport class v2.0-870.
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.360582] skge eth0:
 disabling interface
 Oct 21 15:34:53 iscsi-filer-1 iscsid: iSCSI logger with pid=1515 started!
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.381956] iscsi: registered
 transport (iser)
 Oct 21 15:34:53 iscsi-filer-1 init: ssh main process (1162) terminated
 with status 255
 Oct 21 15:34:53 iscsi-filer-1 postfix/master[1411]: reload -- version
 2.7.0, configuration /etc/postfix
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584678] block drbd1: Found
 57 transactions (3507 active extents) in activity log.  
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584684] block drbd1: Method
 to ensure write ordering: barrier
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584691] block drbd1:
 Backing device's merge_bvec_fn() = a00c0100
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584694] block drbd1:
 max_segment_size ( = BIO size ) = 4096
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584698] block drbd1:
 Adjusting my ra_pages to backing device's (32 - 96)
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584704] block drbd1:
 drbd_bm_resize called with capacity == 8587575296
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.603470] block drbd1: resync
 bitmap: bits=1073446912 words=16772608
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.603474] block drbd1: size =
 4095 GB (4293787648 KB)
 
 The lines in yellow bug me.  I don't recall see them before.
 
 I had a 4 disk RAID-6 md0 (4x2TB = 4 TB RAID-6).  I added a single drive
 (5x2TB = 6 TB array).

Not using meta-disk internal or flexible-meta-disk limits the device
size to 4TB (=128MB meta data size) ... change your metadata config ...
as the logs suggest ... if you want to use all 6TB

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 
 Any ideas?
 
 Gerald
 
 
 
 
 From: Andreas Kurz andr...@hastexo.com
  To: drbd-user@lists.linbit.com
  Sent: Friday, October 21, 2011 2:55:06 PM
  Subject: Re: [DRBD-user] Increasing a DRBD array
  
  On 10/21/2011 09:30 PM, Gerald Brandt wrote:
   Hi,
  
   I've successfully resize the lower level RAID-6 array, and grown
   it.  I'm now attempting to resize drbd, and nothing seems to
   happen.
  
   /dev/md0 is definitely bigger.
  
   What should I see during a drbd resize?
  
  You should see a DRBD resync of the newly added space.
  
  What is the lower level device of your DRBD resource? The whole md0,
  a
  partition on md0, a lv on a vg on a pv on md0?
  
  ... so if the lower level device has been resized on both nodes, DRBD
  should definitely grow on a drbdadm resize.
  
  Did I mention that on starting DRBD it is resized automatically if a
  bigger lower level device is detected? ... have a look at the kernel
  logs...
  
  Regards,
  Andreas
  
  --
  Need help with DRBD?
  http://www.hastexo.com/now
  
  
   Gerald
  
  
   - Original Message -
   From: Gerald Brandt g...@majentis.com
   To: drbd-user@lists.linbit.com
   Sent: Tuesday, October 18, 2011 7:08:55 AM
   Subject: Re: [DRBD-user] Increasing a DRBD array
  
   Hi,
  
   Okay, this is my list of what to do, and in what order:
  
   1. remove the primary from DRBD
   2. add the physical disk to the primary
   3. add the primary back to DRBD and allow resync.
   4. remove the secondary from DRBD
   5. add the physical disk to the secondary
   6. add the secondary back to DRBD and allow resync.
   7. fdisk and add the disk to the RAID array on primary and
   secondary
   8. grow the RAID array

Re: [DRBD-user] Increasing a DRBD array

2011-10-21 Thread Andreas Kurz
On 10/21/2011 11:39 PM, Gerald Brandt wrote:
 Hi,
 
 I just saw that (google is my friend).  Can I change that on a running drbd 
 system?

hmm ... never tried changing meta-data that way ... shutdown, dump-md,
reconfigure, create-md, restore-md might work ... maybe Lars has a hint ...

I would bring DRBD down on both nodes, stop it when all is in sync and
recreate the meta data after changing the config and then skip the
initial sync when bringing them up.
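
A rough sketch of that sequence (resource name from this thread, double-check
against your setup before running it):

  # on both nodes, with the resource in sync and stopped:
  drbdadm down iscsi.target.0
  # change meta-disk to flexible-meta-disk in drbd.conf on both nodes
  drbdadm create-md iscsi.target.0
  drbdadm up iscsi.target.0
  # on one node only, to skip the full initial resync:
  drbdadm -- --clear-bitmap new-current-uuid iscsi.target.0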

 
 ie:
 
 original:
 
 on iscsi-filer-1 {simply use meta-disk internal;
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.1:7789;
 meta-disk /dev/sda6[1];
 }
 
 on iscsi-filer-2 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.2:7789;
 meta-disk /dev/sda6[1];
 }
 
 new:
 
 on iscsi-filer-1 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.1:7789;
 flexible-meta-disk /dev/sda6[1];

no ... that index thing only works for static meta-disk ... remove the
[1] and resize /dev/sda6 if it's not bigger than 196MB.

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now

 }
 
 on iscsi-filer-2 {
 device  /dev/drbd1;
 disk/dev/md0;
 address 192.168.95.2:7789;
 flexible-meta-disk /dev/sda6[1];
 }
 
 Then reboot primary, followed by reboot secondary (after sync), and all will 
 be well?
 
 Sorry if these seem to be noob questions.  I just want to be 100% sure, as 
 the file servers have live data on them.
 
 Gerald
 
 
 - Original Message -
 From: Andreas Kurz andr...@hastexo.com
 To: drbd-user@lists.linbit.com
 Sent: Friday, October 21, 2011 4:26:13 PM
 Subject: Re: [DRBD-user] Increasing a DRBD array

 On 10/21/2011 10:48 PM, Gerald Brandt wrote:
 Hi,

 DRBD is running directly on md0.  /dev/drbd1 is then exported via
 iSCSI.

 The logs show:
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332010] drbd:
 initialized.
 Version: 8.3.7 (api:88/proto:86-91)
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332012] drbd:
 GIT-hash:
 ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by root@filer-1,
 2011-03-05 08:29:38
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332014] drbd:
 registered as
 block device major 147
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.332015] drbd:
 minor_table @
 0x88021dbaf300
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334144] block drbd1:
 Starting worker thread (from cqueue [1489])
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334177] block drbd1:
 ==
 truncating very big lower level device to currently maximum
 possible
 8587575296 sectors ==
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334179] block drbd1:
 ==
 using internal or flexible meta data may help ==
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.334188] block drbd1:
 disk(
 Diskless - Attaching )
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.353306] Loading iSCSI
 transport class v2.0-870.
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.360582] skge eth0:
 disabling interface
 Oct 21 15:34:53 iscsi-filer-1 iscsid: iSCSI logger with pid=1515
 started!
 Oct 21 15:34:53 iscsi-filer-1 kernel: [7.381956] iscsi:
 registered
 transport (iser)
 Oct 21 15:34:53 iscsi-filer-1 init: ssh main process (1162)
 terminated
 with status 255
 Oct 21 15:34:53 iscsi-filer-1 postfix/master[1411]: reload --
 version
 2.7.0, configuration /etc/postfix
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584678] block drbd1:
 Found
 57 transactions (3507 active extents) in activity log.
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584684] block drbd1:
 Method
 to ensure write ordering: barrier
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584691] block drbd1:
 Backing device's merge_bvec_fn() = a00c0100
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584694] block drbd1:
 max_segment_size ( = BIO size ) = 4096
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584698] block drbd1:
 Adjusting my ra_pages to backing device's (32 - 96)
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.584704] block drbd1:
 drbd_bm_resize called with capacity == 8587575296
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.603470] block drbd1:
 resync
 bitmap: bits=1073446912 words=16772608
 Oct 21 15:34:54 iscsi-filer-1 kernel: [7.603474] block drbd1:
 size =
 4095 GB (4293787648 KB)

 The lines in yellow bug me.  I don't recall see them before.

 I had a 4 disk RAID-6 md0 (4x2TB = 4 TB RAID-6).  I added a single
 drive
 (5x2TB = 6 TB array).

 Not using meta-disk internal or flexible-meta-disk limits the
 device
 size to 4TB (=128MB meta data size) ... change your metadata config
 ...
 as the logs suggest ... if you want to use all 6TB

 Regards,
 Andreas

 --
 Need help with DRBD?
 http://www.hastexo.com/now


 Any ideas?

 Gerald


 

 From: Andreas Kurz andr...@hastexo.com
  To: drbd-user

Re: [DRBD-user] DRBD on Encrypted FS

2011-10-07 Thread Andreas Kurz
hello,

On 10/06/2011 12:24 AM, Bill Asher wrote:
 Today I did a little test to see if I could configure DRBD on encrypted LVs 
 and what I found is it didn't work for me... Because the servers are located 
 in a colo, security for the servers is the main reasoning.
 All seems to go good until I tell DRBD to mirror filerA logical 
 volume(/dev/vg/data) to filerB LV (/dev/vg/data).  I then received errors on 
 the console like this, over and over:
 
 Block drbd0: open(/dev/vg/data) failed with -16
 
 I then rebooted to Ubuntu CD to look at the LVs and.. they were all gone. The 
 only thing the partitioner sees is the two partitions I created, one for 
 /boot the other for logical volumes, but all my lvm tables were gone.  I was 
 able to repeat this issue on both my filers.
 
 So my question is..
 
 a) can this even be done, encrypting the filesystem then configuring DRBD
 b) if encryption can be done, is my approach wrong?
 
 Thank you in advance for your time.
...

if you want to encrypt a _blockdevice_ and one possible solution is:

* encrypt a complete partition/disk with dm-crypt/LUKS/cryptsetup
* use this encrypted dm device as pv for your vg(s)
* create a lv per DRBD device

after every reboot you need to activate the encrypted partition using
cryptsetup and e.g. your passphrase and you have to do a vgscan/vgchange
prior to the activation of DRBD.
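
A minimal sketch of that stack (device, VG and LV names are only examples):

  cryptsetup luksFormat /dev/sdb1
  cryptsetup luksOpen /dev/sdb1 crypt_pv
  pvcreate /dev/mapper/crypt_pv
  vgcreate vg_crypt /dev/mapper/crypt_pv
  lvcreate -L 100G -n drbd_r0 vg_crypt   # one LV per DRBD device

  # after every reboot, before starting DRBD:
  cryptsetup luksOpen /dev/sdb1 crypt_pv
  vgscan && vgchange -ay vg_crypt
  drbdadm up r0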

and if you own a recent Intel cpu supporting AES-NI in combination with
a recent kernel like 2.6.39 which supports multiple encryption pipes and
the aesni_intel driver, then you get a damn fast and secure replicated
storage ;-)

Regards,
Andreas

-- 
Need help with DRBD?
http://www.hastexo.com/now




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Recover from Split-Brain

2011-09-07 Thread Andreas Kurz
On 09/07/2011 09:19 AM, Christian Völker wrote:
 Hi,
 
 as I just send out I had a short outage which ended in a Split-Brain
 scenario.
 I'm trying to recover now from this and have all drbd devices back again.
 
 Unfortunately I can't recover from split-brain. Could someone help me,
 please?
 This is the current state on the primary:
 [root@backuppc ~]# cat /proc/drbd
 version: 8.2.6 (api:88/proto:86-88)

please consider an update!

 GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
 buildsvn@c5-i386-build, 2008-10-03 11:42:32
  0: cs:WFConnection st:Primary/Unknown ds:UpToDate/DUnknown A r---
 ns:0 nr:0 dw:68288 dr:820201 al:2641 bm:2636 lo:0 pe:0 ua:0 ap:0
 oos:582912
 
 This is the state on the secondary:
 [root@drbd ~]# cat /proc/drbd
 version: 8.2.6 (api:88/proto:86-88)
 GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by
 buildsvn@c5-i386-build, 2008-10-03 11:42:32
  0: cs:StandAlone st:Secondary/Unknown ds:UpToDate/DUnknown   r---
 ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 oos:0
 
 Now I tried to recover manually as shown on drbd documentation
 (http://www.drbd.org/users-guide/s-resolve-split-brain.html), but it

have a look at:

http://www.drbd.org/users-guide-legacy/s-resolve-split-brain.html

... for DRBD < 8.4 to get the old documentation with the old (still
supported) cmdline syntax.
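
With that older syntax, resolving the split brain on the victim looks roughly
like this (resource name from your output):

  drbdadm secondary drbd0
  drbdadm -- --discard-my-data connect drbd0

plus a plain "drbdadm connect drbd0" on the other node if it is StandAlone.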

Regards,
Andreas

 doesn't know this special parameter:
 [root@drbd ~]# drbdadm secondary drbd0
 [root@drbd ~]# drbdadm connect --discard-my-data drbd0
 drbdadm: unrecognized option `--discard-my-data'
 try 'drbdadm help'
 
 So how can I recover now?
 
 Greetings
 
 Christian
 
 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Directly connected GigE ports bonded together no switch

2011-08-09 Thread Andreas Kurz
On 2011-08-09 16:46, Herman wrote:
 Sorry if this is covered elsewhere.
 
 I know the Linux Bonding FAQ is supposed to talk about this, but I
 didn't see anything specific in it on what parameters to use.
 
 Basically, I want to bond two GigE ports between two servers which are
 connected with straight cables with no switch and use them for DRBD.
 
 I tried the various bonding modes with miimon=100, but none of them
 worked. Say the eth1 ports on both servers were cabled together, and the
 same for eth5.  Then,  I could create the bond with eth1 and eth5. 
 However, if I downed one of the ports on one server, say eth1, it would
 failover on that server to eth5, but the other server would not
 failover  to eth5.
 
  Eventually, I decided to use arp_interval=100 and arp_ip_target=<ip
  of other bonded pair> instead of miimon=100.  This seems to work as
 I expected, with the bond properly failing over.
 
 Is this the right way to do this kind of bonding?
 
 Also, right now I'm using mode=active-backup.  Would one of the other
 modes allow higher throughput and still allow automatic failover and
 transparency to DRBD?

use balance-rr and e.g. miimon=100, that should do fine
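
e.g. (a sketch for a RHEL/CentOS style setup, interface names are examples):

  # /etc/modprobe.conf (or a file below /etc/modprobe.d/)
  alias bond0 bonding
  options bond0 mode=balance-rr miimon=100

  # /etc/sysconfig/network-scripts/ifcfg-eth1 and ifcfg-eth5
  MASTER=bond0
  SLAVE=yes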

Regards,
Andreas

 
 Thanks,
 Herman
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Email Notfications

2011-03-06 Thread Andreas Kurz
On 03/06/2011 09:55 PM, Matt Graham wrote:
 From: Gerald Brandt g...@majentis.com
 Is there a way to get email notifications when the servers are
 syncing, similar to the way mdadm does?

you can use the before-resync-target, after-resync-target handlers to
send a notification on start/end of a sync
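
e.g. in drbd.conf (the script path is only an example):

  handlers {
    before-resync-target "/usr/local/sbin/drbd-notify.sh start";
    after-resync-target  "/usr/local/sbin/drbd-notify.sh finished";
  }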

Regards,
Andreas

 
 Within DRBD?  No.  That's not DRBD's job.  That job is best handled by
 something like Nagios.  Nagios is a bit heavyweight for *just* monitoring
 DRBD, but if you have ~70 machines all running various services, Nagios can
 make your life a hell of a lot easier.  If you just want to monitor DRBD sync
 status, put together a shell or Perl script that runs on both nodes every 10
 min via cron and mails a list of people when /proc/drbd matches
 /SyncSource|SyncTarget/ .  You can search for check-drbd to find a Nagios
 plugin that does that; modify it for your needs.
  
 On a side note, I'd also like email notification when HA switches
 servers.
 
 Which HA system are you talking about?  Pacemaker, Corosync, heartbeat?  The
 answer to this will almost certainly be found in the Fine Manual for the HA
 system you're using.
 




signature.asc
Description: OpenPGP digital signature
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Sgisdela! Help! Horrible DRBD performance on CentOS 5. Where do I start looking?

2010-06-10 Thread Andreas Kurz
Hello again,

On Thursday 10 June 2010 02:32:18 Michael Joyner wrote:
 Two node setup for serving out NFS to vSphere.
 
 timing test is : sync; time iozone -a -e -g 4096
 
 /data/nfs is on LVM striped across 3 DRBD devices. EXT4. data=journal.
 /var/tmp is local filesystem.

... don't do this! There is no guarantee all 3 DRBD devices are up-to-date at 
the same time 

Regards,
Andreas

 
 both are on same set of disk platters and controller. Raid 6. 64K
  blocksize.
 
 FYI, the initial sync (5 TB) used a full gigabit of bandwidth without
 hesitation.
 
 === DRBD TIMES (both nodes up) ==
 real0m10.028s
 user0m0.116s
 sys0m1.538s
 
 real0m10.045s
 user0m0.085s
 sys0m1.541s
 
 real0m9.990s
 user0m0.091s
 sys0m1.578s
 
 real0m9.970s
 user0m0.099s
 sys0m1.557s
 
 real0m9.960s
 user0m0.091s
 sys0m1.499s
 
 === DRBD TIMES (2nd node down) ==
 real0m3.754s
 user0m0.093s
 sys0m1.070s
 
 real0m3.855s
 user0m0.079s
 sys0m1.064s
 
 real0m3.938s
 user0m0.094s
 sys0m1.044s
 
 real0m3.809s
 user0m0.066s
 sys0m1.067s
 
 real0m3.863s
 user0m0.069s
 sys0m1.072s
 
 === LOCAL TIMES ==
 real0m1.770s
 user0m0.070s
 sys0m0.977s
 
 real0m1.809s
 user0m0.085s
 sys0m0.974s
 
 real0m1.737s
 user0m0.067s
 sys0m0.942s
 
 real0m2.007s
 user0m0.058s
 sys0m0.955s
 
 real0m1.808s
 user0m0.072s
 sys0m0.956s
 
 === After re-enabling 2nd node ==
 
 r...@san-node-2 ~]# cat /proc/drbd
 version: 8.3.2 (api:88/proto:86-90)
 GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by
 mockbu...@v20z-x86-64.home.local, 2009-08-29 14:07:55
  0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r
 ns:0 nr:5180 dw:5180 dr:0 al:0 bm:24 lo:1 pe:1389 ua:0 ap:0 ep:1 wo:b
 oos:43964
 [==.] sync'ed: 16.7% (43964/49144)K
 finish: 0:00:08 speed: 5,180 (5,180) K/sec
  1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r
 ns:0 nr:49044 dw:49044 dr:0 al:0 bm:44 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b
 oos:0
  2: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r
 ns:0 nr:6896 dw:6896 dr:0 al:0 bm:28 lo:1 pe:1346 ua:0 ap:0 ep:1 wo:b
 oos:42436
 [===] sync'ed: 23.1% (42436/49332)K
 finish: 0:00:05 speed: 6,896 (6,896) K/sec
 
 === ifconfig ==
 
 bond0:0   Link encap:Ethernet  HWaddr 00:1B:21:26:B8:18
   inet addr:192.168.XXX.XX0  Bcast:192.168.XXX.255
 Mask:255.255.255.0
   UP BROADCAST RUNNING MASTER MULTICAST  MTU:9000  Metric:1
 
 
 === rpm -qa|grep drbd ==
 kmod-drbd83-8.3.2-6.el5_3
 drbd83-8.3.2-6.el5_3
 
 === uname -a ==
 Linux san-node-1.ewc.edu 2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT
 2010 x86_64 x86_64 x86_64 GNU/Linux
 
 === cat /etc/issue ==
 CentOS release 5.5 (Final)
 Kernel \r on an \m
 
 === lspci ==
 01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 1078
 (rev 04)
 
 === box to box copy test ==
 dd if=/dev/sda of=testfile bs=1M count=1024
 1024+0 records in
 1024+0 records out
 1073741824 bytes (1.1 GB) copied, 1.94192 seconds, 553 MB/s
 
 scp testfile r...@san-node-2:/var/tmp/testfile
 testfile100% 1024MB
 46.6MB/s   00:22
 
 === /etc/drbd.conf ==
 
 global {
   usage-count yes;
 }
 common {
 protocol C;
 syncer { rate 70K; }
 net {
 sndbuf-size 0;
 after-sb-0pri discard-least-changes;
 after-sb-1pri discard-secondary;
 }
 startup {
 degr-wfc-timeout 3;
 }
 disk {
 no-disk-flushes;
 no-md-flushes;
 }
 
 }
 resource sdb {
   on san-node-1.ewc.edu {
 device/dev/drbd0;
 disk  /dev/sdb;
 address   192.168.XXX.XX1:7789;
 meta-disk internal;
   }
   on san-node-2.ewc.edu {
 device/dev/drbd0;
 disk  /dev/sdb;
 address   192.168.XXX.XX2:7789;
 meta-disk internal;
   }
 }
 resource sdc {
   on san-node-1.ewc.edu {
 device/dev/drbd1;
 disk  /dev/sdc;
 address   192.168.XXX.XX1:7790;
 meta-disk internal;
   }
   on san-node-2.ewc.edu {
 device/dev/drbd1;
 disk  /dev/sdc;
 address   192.168.XXX.XX2:7790;
 meta-disk internal;
   }
 }
 resource sdd {
   on san-node-1.ewc.edu {
 device/dev/drbd2;
 disk  /dev/sdd;
 address   192.168.XXX.XX1:7791;
 meta-disk internal;
   }
   on san-node-2.ewc.edu {
 device/dev/drbd2;
 disk  /dev/sdd;
 address   192.168.XXX.XX2:7791;
 meta-disk internal;
   }
 }
 

-- 
: Andreas Kurz   
: LINBIT

Re: [DRBD-user] Sgisdela! Help! Horrible DRBD performance on CentOS 5. Where do I start looking?

2010-06-10 Thread Andreas Kurz
On Thursday 10 June 2010 17:19:53 Michael Joyner wrote:
 better, but still way below box2box bandwidth. Is the fs type or the fact
 I'm using lvm a factor? Here are my test results, will try w/o LVM next.

I can really recommend "Part V. Optimizing DRBD performance" in the DRBD
Users Guide ... or invest in a DRBD Healthcheck offered by Linbit ;-)

Regards,
Andreas 

 
 On 06/10/2010 04:05 AM, Andreas Kurz wrote:
   LVM on your system supports barriers -- drbd supports barriers --
  barriers are used to enforce write-after-write dependencies per default
  ... use drbd config-param 'no-disk-barrier'
 
 === new layout ==
 
 using /dev/sdb as /dev/drbd0, non-spanned vg.
 
 === new drbd.conf 
 
 global {
usage-count yes;
 }
 common {
  protocol C;
  syncer { rate 70K; }
  net {
  sndbuf-size 0;
  after-sb-0pri discard-least-changes;
  after-sb-1pri discard-secondary;
  }
  startup {
  degr-wfc-timeout 3;
  }
  disk {
  no-disk-flushes;
  no-md-flushes;
  no-disk-barrier;
  }
 
 }
 resource sdb {
on san-node-1.ewc.edu {
  device/dev/drbd0;
  disk  /dev/sdb;
  address   192.168.75.201:7789;
  meta-disk internal;
}
on san-node-2.ewc.edu {
  device/dev/drbd0;
  disk  /dev/sdb;
  address   192.168.75.202:7789;
  meta-disk internal;
}
 }
 
 *
 === *** mount /dev/vgdrbd0/nfs1 /data *** ===
 
 === iozone -a -e -g 4096 with peer down:
 real0m1.757s,  real0m1.738s, real0m1.753s, real
 0m1.762s,  real0m1.745s
 
 === with peer up:
 real0m6.739s, real0m6.758s, real0m6.619s, real0m6.653s,
 real0m6.636s
 
 === big files test =
 
 ===  DRBD === rm -rfv /data/ISOS; rsync -a --verbose --human-readable
 --progress ~mjoyner/ISOS /data/ISOS
 
 building file list ...
 8 files to consider
 created directory /data/ISOS
 ISOS/
 ISOS/ISOS/
 ISOS/ISOS/.lck-5a00b810
84 100%0.00kB/s0:00:00 (xfer#1, to-check=5/8)
 ISOS/ISOS/.lck-6500b810
84 100%   82.03kB/s0:00:00 (xfer#2, to-check=4/8)
 ISOS/ISOS/.lck-7700b810
84 100%   82.03kB/s0:00:00 (xfer#3, to-check=3/8)
 ISOS/ISOS/SW_DVD5_NTRL_SQL_Svr_2008_SP1_English_X15-51857.ISO
   943.07M 100%  189.42MB/s0:00:04 (xfer#4, to-check=2/8)
 ISOS/ISOS/SW_DVD5_SQL_Svr_Enterprise_Edtn_2008_English_MLF_X14-89207.ISO
 3.26G 100%  189.81MB/s0:00:16 (xfer#5, to-check=1/8)
 ISOS/ISOS/SW_DVD5_Windows_Svr_2008w_SP2_English__x64_DC_EE_SE_X15-41371.ISO
 2.76G 100%  162.03MB/s0:00:16 (xfer#6, to-check=0/8)
 
 #1) sent 6.96G bytes  received 164 bytes  190.60M bytes/sec
 #2) sent 6.96G bytes  received 164 bytes  244.10M bytes/sec
 #3) sent 6.96G bytes  received 164 bytes  244.10M bytes/sec
 
 === LOCAL FS ===  rm -rfv /var/tmp/ISOS; rsync -a --verbose
 --human-readable --progress ~mjoyner/ISOS /var/tmp/ISOS
 
 #1) sent 6.96G bytes  received 164 bytes  167.63M bytes/sec
 #2) sent 6.96G bytes  received 164 bytes  185.51M bytes/sec
 #3) sent 6.96G bytes  received 164 bytes  220.85M bytes/sec
 
 === time umount /data
 real2m11.090s
 user0m0.000s
 sys1m49.566s
 
 
 *
 === *** mount -o sync,data=journal /dev/vgdrbd0/nfs1 /data *** ===
 
 === iozone test, no peer
 
 real0m9.483s real0m9.583s real0m9.583s real0m9.559s
 real0m9.491s
 
 === izone test, w/ peer up.
 
 real0m41.558s real0m41.386s real0m41.456s real0m41.448s
 
 
 === BIG FILES TEST
 no peer) sent 6.96G bytes  received 164 bytes  74.40M bytes/sec
 w/ peer) sent 6.96G bytes  received 164 bytes  29.79M bytes/sec
 rsync/var/tmp2/var/tmp) sent 6.96G bytes  received 164 bytes  47.49M
 bytes/sec
 
 === time umount /data
 
 real0m1.078s
 user0m0.000s
 sys0m1.075s
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
 

-- 
: Andreas Kurz   
: LINBIT | Your Way to High Availability
: Tel +43-1-8178292-64, Fax +43-1-8178292-82
:
: http://www.linbit.com


LINBIT - We're the HA experts that other experts ask for help!
http://www.linbit.com/en/training/


DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

This e-mail is solely for use by the intended recipient(s). Information
contained in this e-mail and its attachments may be confidential,
privileged or copyrighted. If you are not the intended recipient you are
hereby formally notified that any use, copying, disclosure or
distribution of the contents of this e-mail, in whole or in part, is
prohibited. Also please notify immediately the sender by return e-mail
and delete this e-mail

Re: [DRBD-user] backing_dev length value limited to 128 bytes

2010-02-25 Thread Andreas Kurz
On Thursday 25 February 2010 12:10:40 Christian Iversen wrote:
 On 2010-02-25 10:21, Andreas Kurz wrote:
  On Wednesday 24 February 2010 14:28:38 Alexander Winkler wrote:
  Hello,
 
  I am curious if it would be possible (perhaps in a future version) to
  increase the max length for the backing_dev name to more than 128 bytes
  (maybe 255 bytes?)? In my current setup there are iscsi-targets whose
  names are generated by udev in the following manner, therefore
  exceeding the length-limitation:
 
  /dev/disk/by-path/ip-x.x.x.x:3260-iscsi-iqn.2001-05.com.equallogic:0-8a0
 906 -9f4e88005-91300484b842-xxx-sda-lun-0
 
  If this is not possible, any thoughts on how to circumvent this
  problem?
 
  use /dev/disk/by-id/ paths in your config
 
 But by-id paths look non-sensical with iSCSI. For instance, they could be
 
 /dev/disk/by-id/scsi-14945540003005f410d00
 
 where the by-path name is very descriptive:
 
 /dev/disk/by-path/ip-10.0.0.120:3260-iscsi-iqn.2009-09.org.sikkerhed:sikker
 hedorg-swap-lun-10
 
 In fact, how is it even possible to determine what disk
 you need when using by-id paths? I don't know one.
 

e.g. # scsi_id -g -s /block/sdx

... where sdx is the destination of the disk/by-path link. 
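
e.g., using the by-path link from above:

  DEV=$(readlink -f /dev/disk/by-path/ip-10.0.0.120:3260-iscsi-iqn.2009-09.org.sikkerhed:sikkerhedorg-swap-lun-10)
  scsi_id -g -s /block/${DEV##*/}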

Regards,
Andreas  
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] First post regarding drbd

2009-09-10 Thread Andreas Kurz
Bernie Wu wrote:
 Hi Listers,
 I am trying to test Linux-HA with DRBD.
 The active/passive test environment consists of:
 
 -zVM 5.4
 
 -SLES10-SP2
 
 -2 nodes/guests ( lnodbbt  lnodbct )
 
 -Oracle 10.2.0.4
 I have managed to setup drbd and to mount the database filesystem on lnodbbt 
 ( active/master ).
 Now I want to bring up Oracle.  I have created a resource ORADB and am using 
 the ocf scripts to start oracle which doesn't work.  However, I can manually 
 run the ocf startup script and oracle comes up.
 My questions follow:
 
 1.  Should I be using the LSB startup scripts instead of the ocf script ?

from the log:

oracle[28687][28819]: 2009/09/09_09:03:58 ERROR: Oracle dssd can not mount.

... looks like startup mount is not successful.

The ocf script should work fine ... did you explicitly define the user
for the oracle resource? If not give it a try.

 
 2.  How do I configure drbd so that lnodbbt is the master and lnodbct is 
 the slave ?

Add a location constraint to your cluster config eg:

<rsc_location id="prefer-lnodbbt" rsc="RG_A" node="lnodbbt" score="100"/>


 
 3.  Do I have to set up any constraints for the ORADB resource so that it 
 only starts up on the guest that has the /dbms mounted and what would the 
 constraints be ?

No ... you defined a resource group which sets the needed constraints
implicitly.

Regards,
Andreas

 
 Attached is my /etc/drbd.conf and etc/ha.d/ha.cf and ha-logs from lnodbbt ( 
 the active/master ) guest.
 Any help or pointers would be much appreciated.
 
 TIA
 Bernie
 
 
 The information contained in this e-mail message is intended only for the 
 personal and confidential use of the recipient(s) named above. This message 
 may be an attorney-client communication and/or work product and as such is 
 privileged and confidential. If the reader of this message is not the 
 intended recipient or an agent responsible for delivering it to the intended 
 recipient, you are hereby notified that you have received this document in 
 error and that any review, dissemination, distribution, or copying of this 
 message is strictly prohibited. If you have received this communication in 
 error, please notify us immediately by e-mail, and delete the original 
 message.
 
 
 
 
 
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user


-- 
: Andreas Kurz  
: LINBIT | Your Way to High Availability
: Tel +43-1-8178292-64, Fax +43-1-8178292-82
:
: http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

This e-mail is solely for use by the intended recipient(s). Information
contained in this e-mail and its attachments may be confidential,
privileged or copyrighted. If you are not the intended recipient you are
hereby formally notified that any use, copying, disclosure or
distribution of the contents of this e-mail, in whole or in part, is
prohibited. Also please notify immediately the sender by return e-mail
and delete this e-mail from your system. Thank you for your co-operation.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user