Re: [DRBD-user] drbd-9.1.17 and drbd-9.2.6
From: drbd-user-boun...@lists.linbit.com on behalf of Philipp Reisner
Sent: 31 October 2023 16:07
To: drbd-annou...@lists.linbit.com
Cc: drbd-user@lists.linbit.com
Subject: [DRBD-user] drbd-9.1.17 and drbd-9.2.6

Hi,

The tags for these releases don't seem to have made it to GitHub yet. Is it possible to get them pushed? We'd like to see if the reported fixes address a crash/hang with DRBD we've been seeing.

Thanks, James

<6>[21236.721355] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg node004a-00-00: Preparing remote state change 2392587428
<6>[21236.723573] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg node004a-00-00: Committing remote state change 2392587428 (primary_nodes=0)
<6>[21236.726339] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg node004a-00-00: peer( Primary -> Secondary )
<6>[21236.730954] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg node004a-00-00: Preparing remote state change 1429741561
<6>[21236.733200] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg node004a-00-00: Committing remote state change 1429741561 (primary_nodes=0)
<6>[21236.733209] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg node004a-00-00: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
<6>[21236.733211] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg/0 drbd1011 node004a-00-00: pdsk( UpToDate -> DUnknown ) repl( SyncTarget -> Off )
<6>[21236.733274] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg node004a-00-00: conn( TearDown -> Disconnecting )
<6>[21236.733509] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg node004a-00-00: Terminating sender thread
<6>[21236.733520] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg node004a-00-00: Starting sender thread (from drbd_r_d091f05c [1347328])
<3>[21236.765194] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg/0 drbd1011 node004a-00-00: ASSERTION __dec_rs_pending(peer_req->peer_device) >= 0 FAILED in free_waiting_resync_requests
<3>[21236.766022] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg/0 drbd1011: ASSERTION !drbd_interval_empty(i) FAILED in drbd_remove_peer_req_interval
<4>[21236.766904] [ cut here ]
<2>[21236.766905] kernel BUG at mm/slub.c:384!
<4>[21236.767438] invalid opcode: [#1] SMP NOPTI
<4>[21236.767928] CPU: 5 PID: 1347328 Comm: drbd_r_d091f05c Tainted: P OE 5.15.0-85-generic #95~20.04.2
<4>[21236.768522] Hardware name: Supermicro SYS-5019D-FN8TP-5-NC041/X11SDV-4C-TP8F, BIOS 1.2 11/14/2019
<4>[21236.769341] RIP: e030:kfree+0x21f/0x250
<4>[21236.770436] Code: ff ff 49 89 da e9 d2 fe ff ff 48 8b 55 d0 4d 89 e9 41 b8 01 00 00 00 4c 89 d1 4c 89 e6 4c 89 f7 e8 76 fa ff ff e9 0b ff ff ff <0f> 0b 41 bd 00 f0 ff ff 45 31 f6 eb 84 e8 df 20 cd ff 66 90 eb a1
<4>[21236.772713] RSP: e02b:c900491abc78 EFLAGS: 00010246
<4>[21236.773659] RAX: 888f24c9f000 RBX: 888f24c9f000 RCX: 888f24c9f010
<4>[21236.774353] RDX: 01aef99a RSI: c900491abc88 RDI: 888100040400
<4>[21236.775065] RBP: c900491abcb8 R08: 0003 R09: 0001
<4>[21236.775746] R10: 888f24c9f000 R11: R12: ea003c9327c0
<4>[21236.776456] R13: c0ce188e R14: 888100040400 R15: c900491abd68
<4>[21236.777231] FS: () GS:889046d4() knlGS:
<4>[21236.777956] CS: e030 DS: ES: CR0: 80050033
<4>[21236.778701] CR2: 7f4adb7f0ff0 CR3: 00010d9be000 CR4: 00050660
<4>[21236.778708] Call Trace:
<4>[21236.781965]
<4>[21236.782769] ? show_trace_log_lvl+0x1d6/0x2ea
<4>[21236.783574] ? show_trace_log_lvl+0x1d6/0x2ea
<4>[21236.784423] ? drbd_free_peer_req+0x10e/0x220 [drbd]
<4>[21236.785235] ? show_regs.part.0+0x23/0x29
<4>[21236.786076] ? __die_body.cold+0x8/0xd
<4>[21236.786931] ? __die+0x2b/0x37
<4>[21236.787734] ? die+0x30/0x60
<4>[21236.788575] ? do_trap+0xbe/0x100
<4>[21236.789451] ? do_error_trap+0x70/0xb0
<4>[21236.790299] ? kfree+0x21f/0x250
<4>[21236.791247] ? exc_invalid_op+0x53/0x70
<4>[21236.792091] ? kfree+0x21f/0x250
<4>[21236.792992] ? asm_exc_invalid_op+0x1b/0x20
<4>[21236.793899] ? drbd_free_peer_req+0x10e/0x220 [drbd]
<4>[21236.794826] ? kfree+0x21f/0x250
<4>[21236.795747] ? kfree+0x1f7/0x250
<4>[21236.796689] drbd_free_peer_req+0x10e/0x220 [drbd]
<4>[21236.797585] drain_resync_activity+0x6dc/0xc10 [drbd]
<4>[21236.798502] ? wake_up_q+0x50/0x90
<4>[21236.799421] ? mutex_unlock+0x25/0x30
<4>[21236.800318] conn_disconnect+0x199/0xa10 [drbd]
<4>[21236.801263] ? receive_twopc+0xa6/0x120 [drbd]
<4>[21236.802199] ? process_twopc+0x17e0/0x17e0 [drbd]
<4>[21236.803108] drbd_receiver+0x373/0x880 [drbd]
<4>[21236.804009] drbd_thread_setup+0x84/0x1e0 [drbd]
<4>[21236.804984] ? __drbd_next_peer_device_ref+0x1a0/0x1a0 [drbd]
<4>[21236.809238] kthread+0x127/0x150
<4>[21236.809246] ? set_kthread_struct+0x50/0x50
<4>[21236.809250] ret_from_fork+0x1f/0x30
<4>[21236.809258]
<4>[21236.813193] Modules linked in: nls_iso8859_1 tcp_diag udp_diag
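A quick way to check from the command line whether the tags have appeared upstream, without cloning; the tag names are assumed from LINBIT's usual drbd-X.Y.Z naming scheme:

    # list remote tags and look for the two releases in question
    git ls-remote --tags https://github.com/LINBIT/drbd | grep -E 'drbd-9\.(1\.17|2\.6)'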
Re: [DRBD-user] drbd-9.0.29-0rc1 & drbd-9.1.2-rc.1
Hi,

I have some problems with creating RPMs in this version (the previous one was okay):

./configure --with-distro=suse
...
make rpm
test -e .version
test -e .filelist
Makefile:186: *** environment variable VERSION is not set. Stop.

There seem to be some breaking changes in the Makefile, etc. Any ideas how to solve this?

Regards, Rob

On 4/28/21 17:42, Philipp Reisner wrote:

Hi,

here is the next release candidate for both of our branches. I promise to write a bit more text for the final release, which will happen in one week if no show stoppers are found. This is a release candidate, please help testing it.

9.0.29-0rc1 (api:genl2/proto:86-120/transport:14)
 * fix data corruption when DRBD's backing disk is a degraded Linux software raid (MD)
 * add correct thawing of IO requests after IO was frozen due to loss of quorum
 * fix timeout detection after idle periods and for configs with ko-count when a disk on a secondary stops delivering IO-completion events
 * fix an issue where UUIDs were not shifted in the history slots; that caused false "unrelated data" events
 * fix a temporary deadlock you could trigger when you exercise promotion races and mix some read-only openers into the test case
 * fix for bitmap-copy operation in a very specific and unlikely case where two nodes do a bitmap-based resync due to disk-states
 * fix size negotiation when combining nodes of different CPU architectures that have different page sizes
 * fix a very rare race where DRBD reported wrong magic in a header packet right after reconnecting
 * fix a case where DRBD ends up reporting unrelated data; it affected thinly allocated resources with a diskless node in a recreate from day0 event
 * speed up open() of drbd devices if promote has no chance to go through
 * new option "--reset-bitmap=no" for the invalidate and invalidate-remote commands; this allows doing a resync after online verify found differences
 * changes to socket buffer sizes get applied to established connections immediately; before, they were applied only after a re-connect
 * add exists events for path objects
 * forbid keyed hash algorithms for online verify, csums and HMAC base alg
 * following upstream changes to DRBD up to Linux 5.12 and updated compat rules to support up to Linux 5.12

https://linbit.com/downloads/drbd/9/drbd-9.1.2-rc.1.tar.gz
https://github.com/LINBIT/drbd/commit/8bf23d4e30fdbc907395fb9ec84cb585d82d97c6
https://linbit.com/downloads/drbd/9.0/drbd-9.0.29-0rc1.tar.gz
https://github.com/LINBIT/drbd/commit/be52fd979504061bfa9a899e266e314f0aee4cac
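On the build failure above: `test -e .version` succeeded just before make stopped, so the version file exists and the Makefile apparently expects the value in the environment instead. A hedged workaround (an untested guess, not a confirmed fix):

    # feed the contents of .version to make explicitly
    VERSION="$(cat .version)" make rpm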
Re: [DRBD-user] DRBD online resize drbd dual-primary on Pacemaker
Never mind; I decided not to be lazy (Saturday and all that) and did it properly. Done now.

On Sat, 15 Dec 2018 12:40 pm, Igor Cicimov wrote:
> Hi,
>
> According to https://docs.linbit.com/docs/users-guide-8.4/#s-resizing, when resizing a DRBD 8.4 device online, one side of the mirror needs to be Secondary. I have a dual-primary setup with Pacemaker and GFS2 as the file system, and wonder if I need to demote one side to Secondary before I run:
>
> drbdadm -- --assume-clean resize
>
> or will it still work while both sides are Primary? The resource has internal metadata.
>
> Thanks
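For reference, a sketch of the demote/resize/promote sequence the 8.4 guide implies; the resource name r0 is assumed, and GFS2 must not be mounted on the demoted side at the time:

    drbdadm secondary r0                   # demote one side of the dual-primary pair
    drbdadm -- --assume-clean resize r0    # grow the device without resyncing the new area
    drbdadm primary r0                     # promote it again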
Re: [DRBD-user] DRBD 9 or DRBD 8.4 for Dual Primary
On 08/23/2018 10:15 PM, Daniel Ragle wrote:
> Greetings,
>
> I'm setting up my first DRBD pair for testing and curious as to which version I should use.
>
> I definitely need a dual primary system, as I need the load balancing between the two nodes. I may need to move to a multiple-node (3+) infrastructure in the future.

I'll leave commenting on the stability of dual-primary setups with DRBD 9.0.x to the core DRBD developers, as I am unsure of its current status. But anyway, a dual/multiple-primary setup for load balancing typically only makes sense if the reason for the load balancing is not I/O but e.g. CPU or memory resources. Regarding I/O, a dual- or multiple-primary setup, especially with cluster file systems, will make I/O slower, not faster, due to distributed locking. Even if the reason for load balancing is something else, like CPU load, I'd probably still just use an NFS or CIFS server on a single primary rather than a dual primary.

br, Robert
Re: [DRBD-user] drbd-9.0.6 and drbd-utils-8.9.10
On Fri, Dec 23, 2016 at 02:55:18PM +0100, Philipp Reisner wrote:
> http://www.drbd.org/download/drbd/utils/drbd-utils-8.9.10.tar.gz

There was a minor flaw in the packaged "./configure" which made building drbdmon impossible without regenerating the script. No additional code changes.

186a59a714084026c074ce7d8f2a9d11  drbd-utils-8.9.10.tar.gz

Regards, rck
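To check a downloaded tarball against the checksum posted above:

    md5sum drbd-utils-8.9.10.tar.gz
    # expected: 186a59a714084026c074ce7d8f2a9d11  drbd-utils-8.9.10.tar.gz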
Re: [DRBD-user] DRBD 8.4.5 + userspace drbd tools 8.4.3
I've been curious about this too, as we are in the same boat with Gentoo.

On Thu, Apr 23, 2015 at 5:32 AM, Jean-Francois Maeyhieux b...@free.fr wrote:

We are using a ganeti cluster in a production environment with a classical KVM/LVM/DRBD stack, using dedicated 10 Gb/s NICs for DRBD synchronization. Everything works fine and the current host system specs are:
- Gentoo Linux
- Kernel 3.14.x
- DRBD: version: 8.4.3 (api:1/proto:86-101)
- Userspace drbd tools: sys-cluster/drbd-8.4.3

Since newer drbd userspace tools are not yet available in Gentoo portage, we wonder if it would be possible to use:
- a more recent kernel (3.18.x) that brings an in-kernel DRBD 8.4.5
- with the same userspace drbd tools 8.4.3

On a test host with a recent 3.19.x kernel, /proc/drbd exposed the same API: version: 8.4.5 (api:1/proto:86-101). So I think the kernel upgrade is possible. Is the DRBD API the only important version to check for drbd kernel/userspace compatibility?

We plan to do the upgrade along this path:
- evacuate VMs from the node to upgrade
- remove the node to upgrade from the cluster (so stop DRBD sync on secondary devices)
- update the kernel on the node to upgrade, with the new DRBD 8.4.5
- re-add the upgraded node to the cluster and let drbd synchronize: from primary devices on the 8.4.3-kernel host to secondary devices on the 8.4.5-kernel host

Is this path correct? Any advice about such an upgrade?

Thanks
-- Jean-Francois Maeyhieux [Zentoo]

-- Adam Randall http://www.xaren.net AIM: blitz574 Twitter: @randalla0622 "To err is human... to really foul up requires the root password."
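To sanity-check the mix before committing a node, you can compare the API the running module exposes with what the installed tools were built for (the -V flag is assumed from the 8.4-era drbd-utils; verify on your version):

    cat /proc/drbd    # module side, e.g. "version: 8.4.5 (api:1/proto:86-101)"
    drbdadm -V        # userspace side; prints the API version the tools expect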
Re: [DRBD-user] drbd+device mapper drbd didn't start.
Dear Lars

Thank you for your answer. I did chmod -x /sbin/kpartx and DRBD worked fine. Thank you

--- On Sat, 2014/5/17, Lars Ellenberg lars.ellenb...@linbit.com wrote:

On Thu, May 15, 2014 at 12:03:55PM +0900, kad76oglz...@yahoo.co.jp wrote:
> Dear Lars
> Thank you for your answer. I will build the next configuration:
>
> Primary    Storage1 -FC- DRBD -tgt (iSCSI target driver)- iSCSI - Windows 2008 R2
>              | LAN |
> Secondary  Storage2 -FC- DRBD

That's all nice and shiny. Then you simply do not want to see those partitions (relevant only to your initiator box) on the target. Try to tell multipath/udev/kpartx to *not* automagically create those device-mapper partition mappings. How to do that may be distribution specific. You can manually remove those mappings using

kpartx -d /dev/mapper/mpatha

There should be some option in the multipath conf to disable the kpartx invocation, but I don't remember it off the top of my head, and it may not be supported on all platforms (yet). If nothing else helps, chmod -x kpartx ;-)

> I created partitions from Windows 2008 R2 first. Windows created an MS data partition (mpathap2) along with an MS reserved partition (mpathap1). I hope to replicate the MS data (mpathap2) with DRBD.

-- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. __ please don't Cc me, but send to list -- I'm subscribed
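The multipath.conf option Lars could not recall may be skip_kpartx, present in newer multipath-tools; treat the name and its availability as an assumption and check your distro's man multipath.conf:

    # /etc/multipath.conf excerpt: stop kpartx from creating partition mappings
    defaults {
            skip_kpartx yes
    }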
Re: [DRBD-user] drbd+device mapper drbd didn't start.
On Wed, May 14, 2014 at 02:07:21PM +0900, kad76oglz...@yahoo.co.jp wrote:
> Dear Lars
> Thank you for your answer.
>
> > Something already claimed mapatha. Maybe you need to exclude kpartx from mapping internal partitions, or adjust the lvm filter to exclude that device. What's supposed to be on that device? File system? VM image? LVM PV?
>
> I didn't use LVM. The devices are /dev/sda and /dev/sdb.
>
> # df
> Filesystem      1K-blocks     Used Available Use% Mounted on
> /dev/sdc3        10079084  6286048   3281036  66% /
> tmpfs             6158340       88   6158252   1% /dev/shm
> /dev/sdc1          198337    51717    136380  28% /boot
> /dev/sdc5       100125432 12501716  82537548  14% /home
>
> # chkconfig --list multipathd
> multipathd 0:off 1:off 2:off 3:off 4:off 5:off 6:off
>
> But I rebooted the system and checked under /dev/mapper:
>
> # ls /dev/mapper
> control  mpatha  mpathap1  mpathap2

That's your problem right there.

mpathap1 and p2 are device-mapper targets created by kpartx on top of the multipath target mpatha. These claim mpatha (correctly), thereby preventing DRBD from claiming it. If that was not the case, you would have both the partition mappings and DRBD accessing the lower-level device concurrently, but only DRBD would replicate; the partition mappings would bypass DRBD, and you'd soon become very, very disappointed (and likely would blame DRBD...).

Why did you think you want partitions there? Did you mean to have DRBD use one of those partitions?

I suggest you either use one DRBD per partition, or you get rid of the partitions completely and put DRBD on the whole device. If you need partitions inside of one DRBD, I recommend using DRBD as a PV (physical volume) for an LVM VG (volume group).

Hth, Lars

[ remainder of the quoted message snipped; the original report with the full multipath and DRBD configuration appears later in this thread ]
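A minimal sketch of the whole-device layout recommended above, with DRBD serving as the LVM physical volume; the VG and LV names are made up for illustration:

    pvcreate /dev/drbd0                        # the DRBD device, not the multipath device, becomes the PV
    vgcreate vg_replicated /dev/drbd0
    lvcreate -L 100G -n lv_data vg_replicated

Writes to the LVs then pass through DRBD and get replicated, and the partitions-on-mpatha conflict never arises.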
Re: [DRBD-user] drbd+device mapper drbd didn't start.
Dear Lars

Thank you for your answer.

> What's supposed to be on that device? File system? VM image? LVM PV?

A file system. No VMs and no LVM. I created the partition with parted.

--- On Tue, 2014/5/13, Lars Ellenberg lars.ellenb...@linbit.com wrote:

[ quoted message snipped; it appears in full, with the configuration and logs, in the next message of this thread ]
Re: [DRBD-user] drbd+device mapper drbd didn't start.
Dear Lars

Thank you for your answer.

> Something already claimed mapatha. Maybe you need to exclude kpartx from mapping internal partitions, or adjust the lvm filter to exclude that device. What's supposed to be on that device? File system? VM image? LVM PV?

I didn't use LVM. The devices are /dev/sda and /dev/sdb.

# df
Filesystem      1K-blocks     Used Available Use% Mounted on
/dev/sdc3        10079084  6286048   3281036  66% /
tmpfs             6158340       88   6158252   1% /dev/shm
/dev/sdc1          198337    51717    136380  28% /boot
/dev/sdc5       100125432 12501716  82537548  14% /home

# chkconfig --list multipathd
multipathd 0:off 1:off 2:off 3:off 4:off 5:off 6:off

But I rebooted the system and checked under /dev/mapper:

# ls /dev/mapper
control  mpatha  mpathap1  mpathap2

I started drbd. The same error occurred.

--- On Tue, 2014/5/13, Lars Ellenberg lars.ellenb...@linbit.com wrote:

On Thu, May 08, 2014 at 06:43:58PM +0900, kad76oglz...@yahoo.co.jp wrote:
> Hi everybody
> I set up device-mapper under CentOS 6.3. I started drbd but drbd didn't start.
>
> # /etc/rc.d/init.d/drbd start
> Starting DRBD resources: [ create res: r0 prepare disk: r0 adjust disk: r0:failed(attach:10) adjust net: r0 ]
>
> # multipath -ll
> mpatha dm-0 IFT,DS S16F-R1440
> size=7.6T features='0' hwhandler='0' wp=rw
> |-+- policy='round-robin 0' prio=1 status=active
> | `- 5:0:0:0 sdb 8:16 active ready running
> `-+- policy='round-robin 0' prio=1 status=enabled
>   `- 1:0:0:0 sda 8:0  active ready running
>
> /etc/multipath.conf:
> defaults {
>         user_friendly_names yes
> }
> devices {
>         device {
>                 vendor "IFT"
>                 product "DS S16F-R1440"
>                 path_grouping_policy multibus
>                 #path_grouping_policy failover
>                 getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
>                 path_checker readsector0
>                 path_selector "round-robin 0"
>                 hardware_handler "0"
>                 failback 15
>                 rr_weight priorities
>                 no_path_retry 15
>                 #no_path_retry queue
>         }
> }
> blacklist {
>         devnode "^drbd*"
>         # devnode "*"
>         device { vendor "SEAGATE" product "*" }
>         device { vendor "Dell" product "*" }
>         device { vendor "iDRAC" product "*" }
> }
>
> drbd Ver 8.4.4
> /etc/drbd.conf:
> # please have a look at the example configuration file in
> # /usr/share/doc/drbd83/drbd.conf
> common {
>         disk {
>                 max-bio-bvecs 1;
>                 #on-io-error call-local-io-error;
>         }
> }
> resource r0 {
>         protocol C;
>         net {
>                 sndbuf-size 512K;
>                 ping-int 10;
>                 ping-timeout 10;
>                 connect-int 10;
>                 timeout 80;
>                 ko-count 0;
>                 max-buffers 8000;
>                 max-epoch-size 8000;
>         }
>         syncer {
>                 rate 80M;
>                 verify-alg md5;
>                 al-extents 3833;
>         }
>         on centos1 {
>                 device /dev/drbd0;
>                 disk /dev/mapper/mpatha;
>                 address 172.26.24.153:7790;
>                 flexible-meta-disk /dev/sdc6;
>         }
>         on centos2 {
>                 device /dev/drbd0;
>                 disk /dev/mapper/mpatha;
>                 address 172.26.24.155:7790;
>                 flexible-meta-disk /dev/sdc6;
>         }
> }
>
> /var/log/messages:
> May 8 16:16:56 centos1 kernel: drbd: initialized. Version: 8.4.4 (api:1/proto:86-101)
> May 8 16:16:56 centos1 kernel: drbd: GIT-hash: 74402fecf24da8e5438171ee8c19e28627e1c98a build by root@centos63, 2014-04-25 21:53:13
> May 8 16:16:56 centos1 kernel: drbd: registered as block device major 147
> May 8 16:16:56 centos1 kernel: drbd r0: Starting worker thread (from drbdsetup [5231])
> May 8 16:16:56 centos1 kernel: block drbd0: open(/dev/mapper/mapatha) failed with -16

Something already claimed mapatha. Maybe you need to exclude kpartx from mapping internal partitions, or adjust the lvm filter to exclude that device. What's supposed to be on that device? File system? VM image? LVM PV?

> May 8 16:16:56 centos631 kernel: block drbd0: drbd_bm_resize called with capacity == 0
> May 8 16:16:56 centos631 kernel: drbd r0: Terminating drbd_w_r0
>
> Please help me.
> Regards Masahiko Kawase

-- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. __ please don't Cc me, but send to list -- I'm subscribed
Re: [DRBD-user] drbd-9.0.0pre6.tar.gz drbd-8.3.16rc1.tar.gz
Hi Phil,

Any further progress on DRBD 9 you can share with us? (Or perhaps even a glimpse of an expected v9.0.0-final release date? :) ...)

Yours, Chris

On 8/14/2013 11:13 PM, Philipp Reisner wrote:

Hi,

this is a *double release* day. On the drbd-9 side great progress was made since the pre5 release. I ask everybody interested in drbd9 to check it out and report any issues found back to us.

9.0.0pre6 (api:genl1/proto:86-110)
 * Fixed the wait-[connect|sync] drbdadm commands. Now they actually work on all three object types (resources, connections, volumes)
 * Fixed the resync vs application IO deadlock on a resync from Primary/SyncSource to Secondary/SyncTarget [was introduced in drbd9 development]
 * Correctly deal with writes from a Primary on two Secondaries that do a resync
 * New command called forget-peer. It is used to free a peer-device slot. Online (via drbdsetup) or offline (via drbdmeta)
 * Lots of minor fixes

http://oss.linbit.com/drbd/9.0/drbd-9.0.0pre6.tar.gz
http://git.drbd.org/gitweb.cgi?p=drbd-9.0.git;a=tag;h=refs/tags/drbd-9.0.0pre6

On the drbd-8.3 side, the fix regarding devices larger than 64 TByte shows that our user base is moving to larger device sizes. Also note that the crm_fence_peer script infrastructure received a number of improvements.

8.3.16rc1 (api:88/proto:86-97)
 * fix decoding of bitmap vli rle for device sizes > 64 TB
 * fix for deadlock when using automatic split-brain-recovery
 * only fail empty flushes if no good data is reachable
 * avoid to shrink max_bio_size due to peer re-configuration
 * fix resume-io after reconnect with broken fence-peer handler
 * fencing script improvements

http://oss.linbit.com/drbd/8.3/drbd-8.3.16rc1.tar.gz
http://git.drbd.org/gitweb.cgi?p=drbd-8.3.git;a=tag;h=refs/tags/drbd-8.3.16rc1

PS: drbd-8.4.4rc1 will arrive in the next days.

Best, Phil
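For the new forget-peer command, the online invocation is presumably along these lines (argument order taken from later drbd-utils manpages, so treat it as an assumption for this pre-release):

    drbdsetup forget-peer <resource> <peer-node-id>   # free the peer-device slot online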
Re: [DRBD-user] drbd-9.0.0pre6.tar.gz drbd-8.3.16rc1.tar.gz
On 14/08/13 17:13, Philipp Reisner wrote:
> 8.3.16rc1 (api:88/proto:86-97)
>  * fix decoding of bitmap vli rle for device sizes > 64 TB
>  * fix for deadlock when using automatic split-brain-recovery
>  * only fail empty flushes if no good data is reachable
>  * avoid to shrink max_bio_size due to peer re-configuration
>  * fix resume-io after reconnect with broken fence-peer handler
>  * fencing script improvements
>
> http://oss.linbit.com/drbd/8.3/drbd-8.3.16rc1.tar.gz
> http://git.drbd.org/gitweb.cgi?p=drbd-8.3.git;a=tag;h=refs/tags/drbd-8.3.16rc1

woohoo! I'll try to test this tomorrow if I can at all free up the time. Thanks all!

-- Digimer Papers and Projects: https://alteeve.ca/w/ "What if the cure for cancer is trapped in the mind of a person without access to education?"
Re: [DRBD-user] [Drbd-dev] [PATCH] drbd: use list_move_tail instead of list_del/list_add_tail
Thanks, applied.

Best regards, Phil

From: Wei Yongjun yongjun_...@trendmicro.com.cn

Using list_move_tail() instead of list_del() + list_add_tail().

spatch with a semantic match was used to find this problem.
(http://coccinelle.lip6.fr/)

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 drivers/block/drbd/drbd_worker.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 6bce2cc..a196281 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -141,8 +141,7 @@ static void drbd_endio_write_sec_final(struct drbd_epoch_entry *e) __releases(local)
 	spin_lock_irqsave(&mdev->req_lock, flags);
 	mdev->writ_cnt += e->size >> 9;
-	list_del(&e->w.list); /* has been on active_ee or sync_ee */
-	list_add_tail(&e->w.list, &mdev->done_ee);
+	list_move_tail(&e->w.list, &mdev->done_ee);

 	/* No hlist_del_init(&e->collision) here, we did not send the Ack yet,
 	 * neither did we wake possibly waiting conflicting requests. */
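A hedged reconstruction of the kind of Coccinelle semantic patch that finds this pattern; the rule below is inferred from the commit message, not taken from the original submission:

    cat > list_move_tail.cocci <<'EOF'
    @@
    expression E1, E2;
    @@
    - list_del(E1);
    - list_add_tail(E1, E2);
    + list_move_tail(E1, E2);
    EOF
    # run it over the drbd driver; --in-place rewrites matching files
    spatch --sp-file list_move_tail.cocci --dir drivers/block/drbd/ --in-place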
Re: [DRBD-user] DRBD or not DRBD ?
On 04/26/2011 01:05 PM, Whit Blauvelt wrote:
> > Matching LVs are not the same LVs. The LV with your VM is a single item, and having it treated as such, which you get with clvmd, will ensure that it's not startable on either node at the same time.
>
> Okay, that could be valuable. Thanks. How does that layer with DRBD? Is there a path from a currently-configured and running lvm system to clvmizing it?

DRBD is just a block device, so far as LVM is concerned. You will need to set the filter in lvm.conf to only look for drbd devices though, otherwise it sees the LVM twice (once on the backing device and again on the DRBD). I've got the details on how to do this here:

http://wiki.alteeve.com/index.php/Red_Hat_Cluster_Service_2_Tutorial#Setting_Up_Clustered_LVM

> Red Hat's documentation here:
> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/LVM_Cluster_Overview.html
> is only a rough sketch at best. For all the effort going into various cluster and cloud scenarios, it amazes me how little is getting written up well.

Hrm, I'm not sure about migrating an existing LVM, but I do know that you need to set the locking type to 3 and mark the LVM as clustered. When creating the VG, this is done with 'vgcreate -c y' (which is the default when the clvmd daemon is running). Converting the VG/LVs though... that will require research.

> > RHCS's rgmanager is much simpler than Pacemaker, and is well tested and already exists. Writing your own scripts is, I'd argue, a fool's errand. :)
>
> Ah, but running rgmanager on Ubuntu would be even more foolish. My preference for Ubuntu as host is off topic, but it's a strong one. The configuration files alone needed for all the elements of a good Pacemaker install might require more lines of code than a custom script for a well-defined, limited situation like mine. And the script, I'd understand. The configuration file approach gets into trusting voodoo. Not that much of sysadmin work doesn't consist in trusting voodoo.

RHCS on .deb distros is probably a non-starter. Are you married to Ubuntu, or would you consider an RPM-based install? As for Pacemaker, my main argument for it is that it is a well-tested, thought-out and supported solution. Your custom scripts would only be known by you, which would be a problem for an IT manager, I should expect. ;)

I must admit, you lost me somewhat in your reference to emailing people. :)

> Any notable systems events around here result in notices, whether through Nagios or independently.

Ah

> Best, and thanks again, Whit

Best of luck. :)

-- Digimer E-Mail: digi...@alteeve.com AN!Whitepapers: http://alteeve.com Node Assassin: http://nodeassassin.org
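A hedged sketch of the two pieces described above; the filter regex and VG name are illustrative:

    # /etc/lvm/lvm.conf, devices section: accept only DRBD devices and reject
    # everything else, so LVM never sees the signature on the backing device
    filter = [ "a|^/dev/drbd.*|", "r|.*|" ]

    # create the VG as clustered ('-c y', the default while clvmd is running)
    vgcreate -c y vg_drbd /dev/drbd0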
Re: [DRBD-user] DRBD or not DRBD ?
On Sun, Apr 24, 2011 at 10:21:44PM -0400, Digimer wrote:
> Matching LVs are not the same LVs. The LV with your VM is a single item, and having it treated as such, which you get with clvmd, will ensure that it's not startable on either node at the same time.

Okay, that could be valuable. Thanks. How does that layer with DRBD? Is there a path from a currently-configured and running lvm system to clvmizing it? Red Hat's documentation here:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/LVM_Cluster_Overview.html

is only a rough sketch at best. For all the effort going into various cluster and cloud scenarios, it amazes me how little is getting written up well.

> RHCS's rgmanager is much simpler than Pacemaker, and is well tested and already exists. Writing your own scripts is, I'd argue, a fool's errand. :)

Ah, but running rgmanager on Ubuntu would be even more foolish. My preference for Ubuntu as host is off topic, but it's a strong one. The configuration files alone needed for all the elements of a good Pacemaker install might require more lines of code than a custom script for a well-defined, limited situation like mine. And the script, I'd understand. The configuration file approach gets into trusting voodoo. Not that much of sysadmin work doesn't consist in trusting voodoo.

> I must admit, you lost me somewhat in your reference to emailing people. :)

Any notable systems events around here result in notices, whether through Nagios or independently.

Best, and thanks again, Whit
Re: [DRBD-user] DRBD or not DRBD ?
On 04/24/2011 10:05 PM, Digimer wrote:
> Comments in-line.
>
> On 04/24/2011 11:34 AM, Whit Blauvelt wrote:
> > Digimer, All useful stuff. Thanks. I hadn't considered three rather than two networks. That's a good case for it. Here's what I'm trying to scope out, and from your comments it looks to be territory you're well familiar with. I've got two systems set up with KVM VMs, where each VM is on its own LV, currently each with primary-secondary DRBD, where the primary roles are balanced across the two machines. As far as I can tell, and from past comments here, it's necessary to go primary-primary to enable KVM live migration, which is a very nice feature to have. None of the VMs in this case face critical issues with disk performance, so primary-primary slowing that, if it does in this context, isn't a problem.
>
> You do need Primary/Primary for live migration.

Why do you need a P/P configuration for live migration? From what I understand, the VM state will be migrated from the source host to the target host, and then the storage will be unmounted on the source and mounted again on the target. That should make the specific configuration of the storage irrelevant as long as it is remotely mountable from both hosts.

Regards, Dennis
Re: [DRBD-user] DRBD or not DRBD ?
Dennis Jacobfeuerborn wrote:
> Why do you need a P/P configuration for live migration? From what I understand, the VM state will be migrated from the source host to the target host and then the storage will be unmounted on the source and mounted again on the target. That should make the specific configuration of the storage irrelevant as long as it is remotely mountable from both hosts.

There's a step in the Xen handoff process where both VMs (or maybe it's both hypervisors) need access to the disk. It's documented here: http://www.drbd.org/users-guide-emb/ch-xen.html - no details as to why, though.

-- "In theory, there is no difference between theory and practice. In practice, there is." - Yogi Berra
Re: [DRBD-user] DRBD or not DRBD ?
On 4/23/11 9:59 PM, Patrick Egloff wrote:
> I got several pm urging me NOT to use active/active and OCFS2.

Hi, what were the stated reasons to avoid active/active OCFS2? Did they prefer GFS2 or just not like active/active?

We have a few active/active clusters running OCFS2 and have not encountered many problems outside issues with folders with tens of thousands of small files. For larger files such as VMs, we see few issues. Please note that the active/active clusters in question tend to have most of the writes occurring on the A side and we do reads and maintenance on the B side, so perhaps our setup is more of an Active/Passive (R/W mode). That being said, Active/Active with OCFS2 works very well and we are quite comfortable with it. It should also be said that Active/Passive (not mounted) has much better performance, regardless of the file system used.

> One more question. I have 2 ethernet ports. eth1 is used to link both boxes together. Should I use for DRBD + Heartbeat a different IP address and class than on eth0, which is on the LAN?

We do that so that it's easier to know what network you are looking at when ssh'ed on the box.

-bill
Re: [DRBD-user] DRBD or not DRBD ?
Sorry, guys, but is it really necessary to use a clustered file system for a primary-primary drbd with mysql? Does it really need a locking system?

About using or not using an active-active cluster: it depends on the needs (high availability, load balancing), but both solutions work perfectly, as MySQL recommends DRBD and explains how to use it with MySQL clusters. And it would be nice to have separate NICs for the LAN and DRBD:

http://dev.mysql.com/doc/refman/5.0/en/ha-drbd-performance.html#ha-drbd-performance-sepinterface

You can use /etc/hosts to point at the other node's secondary NIC via hostname:

10.10.10.11 node1
10.10.10.12 node2
10.10.20.1 drbd1
10.10.20.2 drbd2

On Sun, Apr 24, 2011 at 08:06, William Kern wk...@pixelgate.net wrote:
[ quoted message snipped; see the previous post in this thread ]
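A hedged sketch of how those dedicated replication addresses end up in drbd.conf, following the config style quoted elsewhere on this list; the resource name, backing disks, and port are illustrative:

    resource r0 {
            on node1 {
                    device    /dev/drbd0;
                    disk      /dev/sda6;
                    address   10.10.20.1:7788;   # drbd1, the dedicated link
                    meta-disk internal;
            }
            on node2 {
                    device    /dev/drbd0;
                    disk      /dev/sda6;
                    address   10.10.20.2:7788;   # drbd2
                    meta-disk internal;
            }
    }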
Re: [DRBD-user] DRBD or not DRBD ?
On 04/24/2011 04:30 AM, Eduardo Gonzalez wrote:
> Sorry, guys, but is it really necessary to use a clustered file system for a primary-primary drbd with mysql? Does it really need a locking system?

Yes, you need cluster locking, so that all nodes know when any one node requests to lock part of the file system. Non-clustered file systems expect that they are the only one with access to the storage, and will quickly corrupt if anything else changes the data.

-- Digimer E-Mail: digi...@alteeve.com AN!Whitepapers: http://alteeve.com Node Assassin: http://nodeassassin.org
Re: [DRBD-user] DRBD or not DRBD ?
On Sun, Apr 24, 2011 at 10:39:01AM -0400, Digimer wrote:
> OCFS2 and GFS2 require cluster locking, which comes with a fair amount of overhead. Primary/Secondary DRBD with a normal filesystem like ext3 will certainly be faster, but in Secondary, you can not access the Secondary resource at all.
> ...
> Given the relative trivial expense of network cards, I always recommend three separate networks; Internet Facing, Storage and Back-Channel (cluster comms + live migrations when clustering VMs).

Digimer,

All useful stuff. Thanks. I hadn't considered three rather than two networks. That's a good case for it.

Here's what I'm trying to scope out, and from your comments it looks to be territory you're well familiar with. I've got two systems set up with KVM VMs, where each VM is on its own LV, currently each with primary-secondary DRBD, where the primary roles are balanced across the two machines. As far as I can tell, and from past comments here, it's necessary to go primary-primary to enable KVM live migration, which is a very nice feature to have. None of the VMs in this case face critical issues with disk performance, so primary-primary slowing that, if it does in this context, isn't a problem.

Since each VM is in raw format, directly on top of DRBD, on top of its dedicated LV, there is no normal running condition where locking should be an issue. That is, there's no time, when the systems are both running well, when both copies of a VM will be live - aside from during migration, which libvirt handles well. It's the abnormal conditions that require planning. In basic primary-primary it's possible to end up with the same VM on each host running based on the same storage at the same time. When that happens, even cluster locking won't necessarily prevent corruption, since the two instances can be doing inconsistent stuff in different areas of the storage, in ways that locks at the file system level can't prevent.

There are two basic contexts where both copies of a VM could be actively running at once like that. One is in a state of failover. In a way, failover initiation should be simpler here than between non-VM systems. No applications per se need to be started when one system goes down. It's just that the VMs that were primary on it need to be started on the survivor. At the same time, some variation of stonith needs to be aimed at the down system to be sure it doesn't recover and create dueling VMs. Any hints at the most effective way of accomplishing that (probably using IPMI in my case) will be welcomed.

The other way to get things in a bad state, if it's a primary-primary setup for each VM, is operator error. I can't see any obvious way to block this, other than running primary-secondary instead, and sacrificing the live migration capacity. It doesn't look like libvirt, virsh and virt-manager have any way to test whether a VM is already running on the other half of a two-system mirror, such that they might decline to start it when that's the case. Maybe I'm missing something obvious? Is there, for instance, a way to run primary-secondary just up to when a live migration's desired, and go primary-primary in DRBD for just long enough to migrate?

Thanks, Whit
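On that closing question, a hedged sketch of toggling dual-primary only for the duration of a migration. The commands use DRBD 8.4 net-options syntax; on the 8.3 series current at the time, you would instead set allow-two-primaries in the config and run drbdadm adjust. The resource name r0 is assumed:

    drbdadm net-options --allow-two-primaries r0      # temporarily permit two Primaries
    # ... perform the KVM live migration with libvirt ...
    drbdadm net-options --allow-two-primaries=no r0   # return to single-primary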
Re: [DRBD-user] DRBD or not DRBD ?
On 04/24/2011 09:57 PM, Whit Blauvelt wrote:
> Digimer, I really thank you for your long-form discussion. So much of the writing on this stuff is terse, making for a steep learning curve.
>
> > You should be using Clustered LVM (clvmd). This way the LVM PV/VG/LVs are in sync across both nodes at all times.
>
> I'm not yet convinced why I should use clvmd. I'm not afraid of creating matching PV/VG/LVs by hand. It's easy to get those to match, and nothing that's run post-setup is altering their scheme. On the KISS principle, since I'm capable enough of being stupid, I stick with the tools I know - in this case plain LVM - unless the argument for introducing something new is decisive. I've read some past discussion here about clvmd being required or not, and it seemed to lean against the requirement. With each VM being on a DRBD pair of two dedicated LVs (just for the one VM), I just don't see what can get confused on this level. Am I missing something?

Matching LVs are not the same LVs. The LV with your VM is a single item, and having it treated as such, which you get with clvmd, will ensure that it's not startable on either node at the same time.

Running the same VM on either host is suicidal, just don't, ever. To help prevent this, use 'resource-and-stonith' and a script that fires a fence device when a split-brain occurs, then recover the lost VMs on the surviving node. Further, your cluster resource manager (rgmanager or pacemaker) should itself require a successful fence before beginning resource recovery.

> Yeah, I definitely have to either get a better hold on the logic of pacemaker, or write my own scripts for this stuff. These servers have IPMI. It would be simple in a bad state to be sure the replication link is dropped. Since the IPMI is on the LAN side, if one server loses sight of the other on both replication and LAN links, then it should be safe to send the other a shutdown message over IPMI, given that the other, no longer being on the LAN, shouldn't be able to send the same message back at it at the same time. I think.

RHCS's rgmanager is much simpler than Pacemaker, and is well tested and already exists. Writing your own scripts is, I'd argue, a fool's errand. :)

As for fencing, it's always ideal to have two fence devices on separate interfaces and switches, otherwise you're back to a single point of failure again. If you lose a switch though and all network traffic is stopped, you're not going to make much use of your VMs anyway.

> Then the only other logic needed, aside from firing appropriate notices to staff, is to start the list of VMs normally run on the down host. Am I making a beginner's mistake to think this can be kept so simple: if both links test dead for the other system, shut it down by IPMI, start up the VMs it was responsible for running, send notices, and we're done. Now, it would be good on restarting the other machine to have it recognize it shouldn't fire up all its usual VMs, so there's more logic needed to be ready for that event. But the initial failover looks simple. Pacemaker looks overly complex and opaque - or more likely I don't understand yet how simple it would be to set it up for this, as I'm getting lost among all its other options. It's not much to script from scratch though, if it's as simple as it looks in my sketch.

I must admit, you lost me somewhat in your reference to emailing people. :)

The VMs that are lost when a node dies can be started manually on the survivor, if that is what you wish. You still need the cluster for DLM and fencing, but forgo the resource manager. However, I think you'd be missing out on the major benefit of clustering in that case. Just the same though, having the VM data replicated would still reduce your MTTR.

Fencing (stonith) generally defaults to restart. This way, with a proper setup, the lost node will hopefully reboot in a healthy state, connect to the DRBD resources and resync, rejoin the cluster and, if you configure it to do so, relocate the VMs back to their original host. Personally though, I disable automatic fail-back so that I can determine the fault before putting the VMs back.

> Hmm, restart rather than shut down. I take it there's a standard way to have that come back up without doing its normal start of its VMs, but instead to initialize a live migration of them back, just if the system comes up well?

If the node successfully rejoins the cluster and resyncs the DRBD resources, then you can have it live-migrate the VMs back automatically if you wish. However, as I mentioned, I recommend leaving the VMs on the surviving node and manually live-migrating them back once you've sorted out what went wrong in the first place. This behaviour is configurable in your resource manager of choice. Regardless, a properly configured cluster resource manager should prevent the same VM running twice. ... That said, a properly configured resource manager can be told
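A hedged sketch of the fence-then-recover step discussed above, driven over IPMI; the address, credentials, and VM name are placeholders:

    # power the unreachable peer off via its IPMI BMC
    ipmitool -I lanplus -H 10.0.0.12 -U admin -P secret chassis power off
    # once the fence succeeds, start the peer's VMs locally
    virsh start vm01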
Re: [DRBD-user] DRBD or not DRBD ?
Hi and thanks for the answer!

I got several PMs urging me NOT to use active/active and OCFS2. A simpler active/passive setup with no OCFS2 would be the best choice; too many things could go wrong with OCFS2 and active/active + MySQL. But you fully understood my configuration, and thanks for your help. My drbd.conf is almost like the one you sent me, but in my case I must have another problem; it's not working.

One more question. I have 2 ethernet ports. eth1 is used to link both boxes together. Should I use for DRBD + Heartbeat a different IP address and class than on eth0, which is on the LAN?

Patrick

2011/4/22 Digimer li...@alteeve.com
[ quoted message snipped; it appears in full as the next post in this thread ]
Re: [DRBD-user] DRBD or not DRBD ?
On 04/22/2011 01:36 PM, Patrick Egloff wrote:
> Hi all,
>
> First of all, let me say that I'm a newbie with DRBD and not a high level linux specialist...

Few are. Fewer still who claim to be. :)

> I want to have a HA setup for my Intranet, which is using PHP + MySQL (Joomla 1.6). For that, I have 2 DELL servers with a 5-disk RAID, on which I installed CentOS 5.5. I tried to install OCFS2, DRBD and Heartbeat as active/active. I'm at the point where I can access my drbd partition /sda6, but I can't make both boxes talk together. I do have some errors while loading:
> - mount.ocfs2 (device name specified was not found while opening device /dev/drbd0)
> - drbd is waiting for peer, and I have to enter yes to stop the process
>
> After reading a lot, I'm not even sure anymore if my first project is the right choice... Is the configuration I planned the best one for my usage, or should I change my plans for another setup with the same result, that is, high availability?
>
> If it makes sense to continue with drbd, I will be back with some questions about my problems...
>
> Thanks,

I can't speak to heartbeat or OCFS2, as I use RHCS and GFS2, but the concept should be similar. Aside from that, those are questions above DRBD anyway.

First, your RAID 5 is done in hardware, so CentOS only sees /dev/sda, right? Second, partition 6 is what you want to use as a backing device on either node for /dev/drbd0? If you want to run Active/Active, then you will also want Primary/Primary, right?

Given those assumptions, you will need a drbd.conf similar to the one below. Note that the 'on foo {}' sections must use the same hostname returned by `uname -n` on either node. Also, change the 'address' to match the IP address of the interface you want DRBD to communicate on. Lastly, make sure any firewall you have allows port 7789 on those interfaces. Finally, replace '/sbin/obliterate' with the path to a script that will kill (or mark Inconsistent) the other node in a split-brain situation. This is generally done using a fence device (aka: stonith). Line wrapping will likely make this ugly, sorry.

# please have a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf

# The 'global' directive covers values that apply to DRBD in general.
global {
        # This tells Linbit that it's okay to count us as a DRBD user. If you
        # have privacy concerns, set this to 'no'.
        usage-count     yes;
}

# The 'common' directive sets default values for all resources.
common {
        # Protocol 'C' tells DRBD to not report a disk write as complete until
        # it has been confirmed written to both nodes. This is required for
        # Primary/Primary use.
        protocol        C;

        # This sets the default sync rate to 15 MiB/sec. Be careful about
        # setting this too high! High speed sync'ing can flog your drives and
        # push disk I/O times very high.
        syncer {
                rate    15M;
        }

        # This tells DRBD what policy to use when a fence is required.
        disk {
                # This tells DRBD to block I/O (resource) and then try to fence
                # the other node (stonith). The 'stonith' option requires that
                # we set a fence handler below. The name 'stonith' comes from
                # "Shoot The Other Node In The Head" and is a term used in
                # other clustering environments. It is synonymous with
                # 'fence'.
                fencing         resource-and-stonith;
        }

        # We set 'stonith' above, so here we tell DRBD how to actually fence
        # the other node.
        handlers {
                # The term 'outdate-peer' comes from other scripts that flag
                # the other node's resource backing device as 'Inconsistent'.
                # In our case though, we're flat-out fencing the other node,
                # which has the same effective result.
                outdate-peer    /sbin/obliterate;
        }

        # Here we tell DRBD that we want to use Primary/Primary mode. It is
        # also where we define split-brain (sb) recovery policies. As we'll be
        # running all of our resources in Primary/Primary, only the
        # 'after-sb-2pri' really means anything to us.
        net {
                # Tell DRBD to allow dual-primary.
                allow-two-primaries;

                # Set the recovery policy for split-brain recovery when no
                # device in the resource was primary.
                after-sb-0pri   discard-zero-changes;

                # Now if one device was primary.
                after-sb-1pri   discard-secondary;

                # Finally, set the policy when both nodes were Primary. The
                # only viable option is 'disconnect', which tells DRBD to
                # simply tear-down the DRBD resource right away and wait for
                # the administrator to manually
Re: [DRBD-user] DRBD Kernel Panic DRBD-LVM-Xen
Thanks for the reply. I actually came across that thread yesterday, after I originally sent the first e-mail. I've disabled checksumming using ethtool -K as described, and it's been working great. It was also documented in this thread:

https://bugzilla.redhat.com/show_bug.cgi?id=443621

Thanks again, hany

On Thu, Apr 8, 2010 at 4:09 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote:
[ quoted message snipped; it appears in full later in this thread ]
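A hedged example of the ethtool -K step, assuming eth1 is the replication interface and that TX checksum offload is the culprit, as in the linked reports:

    ethtool -K eth1 tx off                 # disable TX checksum offload
    ethtool -k eth1 | grep -i checksum     # verify the resulting offload settings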
Re: [DRBD-user] DRBD Kernel Panic DRBD-LVM-Xen
On 04/08/2010 08:14 PM, Hany Fahim wrote:
> Hi,
>
> I'm currently running a Xen setup using the CentOS distribution DRBD 8.3.2 from the Extras repo. I'm running into consistent kernel panics when I'm benchmarking the individual DomUs. When the primary node crashes, the secondary also panics shortly after, and the two servers reboot in tandem. Doing a search in Google, I found someone else who has the exact same issue as I do:
>
> http://lists.centos.org/pipermail/centos-virt/2008-December/000775.html

And it was answered here:

http://lists.linbit.com/pipermail/drbd-user/2008-December/011092.html

Maybe Maros can share additional findings.

Florian
Re: [DRBD-user] DRBD Kernel Panic DRBD-LVM-Xen
On Thu, Apr 08, 2010 at 08:48:54PM +0200, Florian Haas wrote:
> On 04/08/2010 08:14 PM, Hany Fahim wrote:
> > Hi,
> > I'm currently running a Xen setup using the CentOS distribution DRBD 8.3.2 from the Extras repo. I'm running into consistent kernel panics when I'm benchmarking the individual DomUs. When the primary node crashes, the secondary also panics shortly after, and the two servers reboot in tandem. Doing a search in Google, I found someone else who has the exact same issue as I do:
> > http://lists.centos.org/pipermail/centos-virt/2008-December/000775.html
>
> And it was answered here:
> http://lists.linbit.com/pipermail/drbd-user/2008-December/011092.html
> Maybe Maros can share additional findings.

Actually, it later turned out to be likely related to this:
http://www.gossamer-threads.com/lists/drbd/users/17207
(read the whole thread)

There are a few more threads on that thing, but basically: you can use 8.3.7 and see if that helps, upgrade your xen kernel as well, do some of the ethtool things mentioned in the above thread, and if none of that helps, disable DRBD's use of sendpage, which is a module parameter, but can also be toggled at runtime.

-- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. __ please don't Cc me, but send to list -- I'm subscribed
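For that last suggestion, DRBD exposes this as the disable_sendpage module parameter; the sysfs path below is the standard location for runtime-writable module parameters:

    modprobe drbd disable_sendpage=1                        # at module load time
    echo 1 > /sys/module/drbd/parameters/disable_sendpage   # or toggled at runtime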