Re: [DRBD-user] drbd-9.1.17 and drbd-9.2.6

2023-11-03 Thread Dingwall, James
From: drbd-user-boun...@lists.linbit.com  
on behalf of Philipp Reisner 
Sent: 31 October 2023 16:07
To: drbd-annou...@lists.linbit.com
Cc: drbd-user@lists.linbit.com
Subject: [DRBD-user] drbd-9.1.17 and drbd-9.2.6

Hi,

The tags for these releases don't seem to have made it to GitHub yet.  Is it 
possible to get them pushed?  We'd like to see whether the reported fixes address a 
crash/hang with drbd we've been seeing.
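
For anyone else watching for them, a quick way to check whether the tags have
appeared (assuming the usual drbd-X.Y.Z tag naming on the public GitHub mirror):

  # list matching release tags without cloning
  git ls-remote --tags https://github.com/LINBIT/drbd.git | grep -E 'drbd-9\.(1\.17|2\.6)'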

Thanks,
James



<6>[21236.721355] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg 
node004a-00-00: Preparing remote state change 2392587428
<6>[21236.723573] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg 
node004a-00-00: Committing remote state change 2392587428 (primary_nodes=0)
<6>[21236.726339] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg 
node004a-00-00: peer( Primary -> Secondary )
<6>[21236.730954] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg 
node004a-00-00: Preparing remote state change 1429741561
<6>[21236.733200] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg 
node004a-00-00: Committing remote state change 1429741561 (primary_nodes=0)
<6>[21236.733209] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg 
node004a-00-00: conn( Connected -> TearDown ) peer( Secondary -> Unknown )
<6>[21236.733211] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg/0 drbd1011 
node004a-00-00: pdsk( UpToDate -> DUnknown ) repl( SyncTarget -> Off )
<6>[21236.733274] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg 
node004a-00-00: conn( TearDown -> Disconnecting )
<6>[21236.733509] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg 
node004a-00-00: Terminating sender thread
<6>[21236.733520] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg 
node004a-00-00: Starting sender thread (from drbd_r_d091f05c [1347328])
<3>[21236.765194] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg/0 drbd1011 
node004a-00-00: ASSERTION __dec_rs_pending(peer_req->peer_device) >= 0 FAILED 
in free_waiting_resync_requests
<3>[21236.766022] drbd d091f05c-2423-4e61-a042-bff1d6065068-1-cfg/0 drbd1011: 
ASSERTION !drbd_interval_empty(i) FAILED in drbd_remove_peer_req_interval
<4>[21236.766904] [ cut here ]
<2>[21236.766905] kernel BUG at mm/slub.c:384!
<4>[21236.767438] invalid opcode:  [#1] SMP NOPTI
<4>[21236.767928] CPU: 5 PID: 1347328 Comm: drbd_r_d091f05c Tainted: P  
 OE 5.15.0-85-generic #95~20.04.2
<4>[21236.768522] Hardware name: Supermicro 
SYS-5019D-FN8TP-5-NC041/X11SDV-4C-TP8F, BIOS 1.2 11/14/2019
<4>[21236.769341] RIP: e030:kfree+0x21f/0x250
<4>[21236.770436] Code: ff ff 49 89 da e9 d2 fe ff ff 48 8b 55 d0 4d 89 e9 41 
b8 01 00 00 00 4c 89 d1 4c 89 e6 4c 89 f7 e8 76 fa ff ff e9 0b ff ff ff <0f> 0b 
41 bd 00 f0 ff ff 45 31 f6 eb 84 e8 df 20 cd ff 66 90 eb a1
<4>[21236.772713] RSP: e02b:c900491abc78 EFLAGS: 00010246
<4>[21236.773659] RAX: 888f24c9f000 RBX: 888f24c9f000 RCX: 
888f24c9f010
<4>[21236.774353] RDX: 01aef99a RSI: c900491abc88 RDI: 
888100040400
<4>[21236.775065] RBP: c900491abcb8 R08: 0003 R09: 
0001
<4>[21236.775746] R10: 888f24c9f000 R11:  R12: 
ea003c9327c0
<4>[21236.776456] R13: c0ce188e R14: 888100040400 R15: 
c900491abd68
<4>[21236.777231] FS:  () GS:889046d4() 
knlGS:
<4>[21236.777956] CS:  e030 DS:  ES:  CR0: 80050033
<4>[21236.778701] CR2: 7f4adb7f0ff0 CR3: 00010d9be000 CR4: 
00050660
<4>[21236.778708] Call Trace:
<4>[21236.781965]  
<4>[21236.782769]  ? show_trace_log_lvl+0x1d6/0x2ea
<4>[21236.783574]  ? show_trace_log_lvl+0x1d6/0x2ea
<4>[21236.784423]  ? drbd_free_peer_req+0x10e/0x220 [drbd]
<4>[21236.785235]  ? show_regs.part.0+0x23/0x29
<4>[21236.786076]  ? __die_body.cold+0x8/0xd
<4>[21236.786931]  ? __die+0x2b/0x37
<4>[21236.787734]  ? die+0x30/0x60
<4>[21236.788575]  ? do_trap+0xbe/0x100
<4>[21236.789451]  ? do_error_trap+0x70/0xb0
<4>[21236.790299]  ? kfree+0x21f/0x250
<4>[21236.791247]  ? exc_invalid_op+0x53/0x70
<4>[21236.792091]  ? kfree+0x21f/0x250
<4>[21236.792992]  ? asm_exc_invalid_op+0x1b/0x20
<4>[21236.793899]  ? drbd_free_peer_req+0x10e/0x220 [drbd]
<4>[21236.794826]  ? kfree+0x21f/0x250
<4>[21236.795747]  ? kfree+0x1f7/0x250
<4>[21236.796689]  drbd_free_peer_req+0x10e/0x220 [drbd]
<4>[21236.797585]  drain_resync_activity+0x6dc/0xc10 [drbd]
<4>[21236.798502]  ? wake_up_q+0x50/0x90
<4>[21236.799421]  ? mutex_unlock+0x25/0x30
<4>[21236.800318]  conn_disconnect+0x199/0xa10 [drbd]
<4>[21236.801263]  ? receive_twopc+0xa6/0x120 [drbd]
<4>[21236.802199]  ? process_twopc+0x17e0/0x17e0 [drbd]
<4>[21236.803108]  drbd_receiver+0x373/0x880 [drbd]
<4>[21236.804009]  drbd_thread_setup+0x84/0x1e0 [drbd]
<4>[21236.804984]  ? __drbd_next_peer_device_ref+0x1a0/0x1a0 [drbd]
<4>[21236.809238]  kthread+0x127/0x150
<4>[21236.809246]  ? set_kthread_struct+0x50/0x50
<4>[21236.809250]  ret_from_fork+0x1f/0x30
<4>[21236.809258]  
<4>[21236.813193] Modules linked in: nls_iso8859_1 tcp_diag udp_diag 

Re: [DRBD-user] drbd-9.0.29-0rc1 & drbd-9.1.2-rc.1

2021-05-02 Thread Rob van der Wal

Hi,

I have some problems with creating RPMs with this version (the previous one was 
okay):

./configure --with-distro=suse
...
make rpm
test -e .version
test -e .filelist
Makefile:186: *** environment variable VERSION is not set.  Stop.

There seem to be some breaking changes in the Makefile, etc. Any 
ideas how to solve this?
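
One possible workaround I may try, assuming the Makefile only expects VERSION
to be exported (the value below is just this release as an example):

  # export the release version by hand before building the RPMs
  export VERSION=9.1.2-rc.1
  ./configure --with-distro=suse
  make rpm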


Regards,
Rob


On 4/28/21 17:42, Philipp Reisner wrote:

Hi,

here is the next release candidate for both of our branches. I promise to write
a bit more text for the final release, which will happen in one week if no
show-stoppers are found.

This is a release candidate, please help testing it.

9.0.29-0rc1 (api:genl2/proto:86-120/transport:14)

  * fix data corruption when DRBD's backing disk is a degraded Linux software
raid (MD)
  * add correct thawing of IO requests after IO was frozen due to loss of quorum
  * fix timeout detection after idle periods and for configs with ko-count
when a disk on a secondary stops delivering IO-completion events
  * fixed an issue where UUIDs were not shifted in the history slots; that
caused false "unrelated data" events
  * fix a temporary deadlock you could trigger when you exercise promotion races
and mix some read-only openers into the test case
  * fix for bitmap-copy operation in a very specific and unlikely case where
two nodes do a bitmap-based resync due to disk-states
  * fix size negotiation when combining nodes of different CPU architectures
that have different page sizes
  * fix a very rare race where DRBD reported wrong magic in a header
packet right after reconnecting
  * fix a case where DRBD ends up reporting unrelated data; it affected
thinly allocated resources with a diskless node in a recreate from day0
event
  * speed up open() of drbd devices if promotion has no chance to go through
  * new option "--reset-bitmap=no" for the invalidate and invalidate-remote
commands; this allows doing a resync after online verify found differences
(see the sketch after this list)
  * changes to socket buffer sizes get applied to established connections
immediately; previously they were only applied after a re-connect
  * add exists events for path objects
  * forbid keyed hash algorithms for online verify, csums and HMAC base alg
  * following upstream changes to DRBD up to Linux 5.12 and updated compat
rules to support up to Linux 5.12
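
A usage sketch for the new "--reset-bitmap=no" option mentioned above. The
resource name r0 is a placeholder, and the exact placement of the option is an
assumption; please check the drbdadm/drbdsetup man pages of this release:

  drbdadm verify r0                               # let online verify mark out-of-sync blocks
  drbdadm invalidate-remote r0 --reset-bitmap=no  # resync only the blocks verify found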

https://linbit.com/downloads/drbd/9/drbd-9.1.2-rc.1.tar.gz
https://github.com/LINBIT/drbd/commit/8bf23d4e30fdbc907395fb9ec84cb585d82d97c6

https://linbit.com/downloads/drbd/9.0/drbd-9.0.29-0rc1.tar.gz
https://github.com/LINBIT/drbd/commit/be52fd979504061bfa9a899e266e314f0aee4cac
___
Star us on GITHUB:https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
https://lists.linbit.com/mailman/listinfo/drbd-user




Re: [DRBD-user] DRBD online resize drbd dual-primary on Pacemaker

2018-12-14 Thread Igor Cicimov
Never mind, I decided not to be lazy (Saturday and all that) and did it
properly. Done now.

On Sat, 15 Dec 2018 12:40 pm Igor Cicimov wrote:
> Hi,
>
> According to https://docs.linbit.com/docs/users-guide-8.4/#s-resizing,
> when resizing DRBD 8.4 device online one side of the mirror needs to
> be Secondary. I have dual primary setup with Pacemaker and GFS2 as
> file system and wonder if I need to demote one side to Secondary
> before I run:
>
> drbdadm -- --assume-clean resize 
>
> or will it still work while both sides are Primary? The resource has
> internal metadata.
>
> Thanks
>
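
For the archives, a sketch of the documented procedure, assuming (as the cited
8.4 guide states) that one side must be Secondary during the online grow; the
resource name r0 is a placeholder:

  # on the node to demote: stop GFS2/Pacemaker usage of the device first,
  # since a Primary with a mounted filesystem cannot be demoted
  drbdadm secondary r0
  # on the remaining Primary, after the backing devices on both nodes have grown
  drbdadm resize r0
  # re-promote the demoted node once the additional space has synced
  drbdadm primary r0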
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD 9 or DRBD 8.4 for Dual Primary

2018-08-24 Thread Robert Altnoeder
On 08/23/2018 10:15 PM, Daniel Ragle wrote:
> Greetings,
>
> I'm setting up my first DRBD pair for testing and am curious as to which
> version I should use.
>
> I definitely need a dual primary system, as I need the load balancing
> between the two nodes. I may need to move to a multiple-node (3+)
> infrastructure in the future.

I'll leave commenting on the stability of dual primary setups with DRBD
9.0.x to the core DRBD developers, as I am unsure of its current status.

But anyway, a dual/multiple primary setup for load balancing typically
only makes sense if the reason for the load-balancing is not I/O, but
e.g. CPU or memory resources. Regarding I/O, a dual or multiple primary
setup, especially with cluster file systems, will make I/O slower, not
faster, due to distributed locking.

Even if the reason for load balancing is something else, like CPU load,
then I'd probably still just use an NFS or CIFS server on a single
primary rather than a dual primary.

br,
Robert

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd-9.0.6 and drbd-utils-8.9.10

2016-12-27 Thread Roland Kammerer
On Fri, Dec 23, 2016 at 02:55:18PM +0100, Philipp Reisner wrote:
> http://www.drbd.org/download/drbd/utils/drbd-utils-8.9.10.tar.gz

There was a minor flaw in the packaged "./configure" which made building
drbdmon impossible without regenerating the script. No additional code
changes.

186a59a714084026c074ce7d8f2a9d11  drbd-utils-8.9.10.tar.gz

Regards, rck


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD 8.4.5 + userspace drbd tools 8.4.3

2015-04-23 Thread Adam Randall
I've been curious about this too as we are in the same boat with Gentoo.

On Thu, Apr 23, 2015 at 5:32 AM, Jean-Francois Maeyhieux b...@free.fr
wrote:


 We are using a Ganeti cluster in a production environment with a classical
 KVM/LVM/DRBD stack, using dedicated 10Gb/s NICs for DRBD
 synchronization.


 Everything works fine; the current host system specs are:
 - Gentoo Linux
 - Kernel 3.14.x
 - DRBD: version: 8.4.3 (api:1/proto:86-101)
 - Userspace drbd tools: sys-cluster/drbd-8.4.3


 Since there are no drbd userspace tools newer than 8.4.3 available in Gentoo
 portage yet, we wonder if it would be possible to use:
 - a more recent kernel (3.18.x) that will bring an in-kernel DRBD 8.4.5
 - with the same userspace drbd tools 8.4.3 version

 On a test host with a recent 3.19.x kernel, /proc/drbd exposed the same
 API:
 version: 8.4.5 (api:1/proto:86-101)

 So I think a kernel upgrade is possible. Is the DRBD API the only important
 version to check for DRBD kernel/userspace compatibility?
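
 A quick way to compare the two sides on a node (both commands are standard
 parts of DRBD 8.4):

  cat /proc/drbd    # kernel module version plus its api/proto range
  drbdadm -V        # userspace version and the API it was built against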

 We plan to do an upgrade using this path:
 - evacuate VM from node to upgrade
 - remove the node to upgrade from the cluster (so stop DRBD sync on secondary
  devices)
 - update kernel on the node to upgrade with the new DRBD 8.4.5
 - re-add the upgraded node to the cluster and so let drbd synchronize:
   - from primary devices on 8.4.3 kernel host
   - to secondary devices on 8.4.5 kernel host

 Is this path correct?
 Any advice about such an upgrade?


 Thanks
 --
 Jean-Francois Maeyhieux [Zentoo]


 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user




-- 
Adam Randall
http://www.xaren.net
AIM: blitz574
Twitter: @randalla0622

To err is human... to really foul up requires the root password.
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd+device mapper drbd didn't start.

2014-05-21 Thread kad76oglz0qh
Dear Lars

Thank you for your answer.

I did chmod -x /sbin/kpartx

DRBD worked fine.

Thank you

--- On Sat, 2014/5/17, Lars Ellenberg lars.ellenb...@linbit.com wrote:

On Thu, May 15, 2014 at 12:03:55PM +0900, kad76oglz...@yahoo.co.jp wrote:
 
 Dear Lars
 
 Thank you for your answer.
 
 I will build next configuration.
 
 Primary   Storage1 -FC- DRBD-tgt(iSCSI target driver)- iSCSI -Windows2008R2
 
 |
 LAN
  |
 Secondary Storage2 -FC- DRBD

That's all nice and shiny.
Then you simply do not want to see those partitions
(relevant only to your initiator box)
on the target.

Try to tell multipath/udev/kpartx
to *not* automagically create those
device mapper partition mappings.

How to do that may be distribution specific.

You can manually remove those mappings using
kpartx -d /dev/mapper/mpatha

There should be some option in multipath conf
to disable kpartx invocation,
but I don't remember it off the top of my head,
and it may not be supported on all platforms (yet).

If nothing else helps, chmod -x kpartx ;-)
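
For what it's worth, on some multipath-tools versions there is a per-device
skip_kpartx option for exactly this; whether your version has it is an
assumption to verify against multipath.conf(5):

  devices {
      device {
          vendor      "IFT"
          product     "DS S16F-R1440"
          skip_kpartx yes
      }
  }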


 
 I created the partitions from Windows 2008 R2 first.
 
 Windows 2008 R2 created the MS data partition (mpathap2) along with the MS
 reserved partition (mpathap1).
 
 I hope to replicate the MS data partition (mpathap2) with DRBD.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd+device mapper drbd didn't start.

2014-05-14 Thread Lars Ellenberg
On Wed, May 14, 2014 at 02:07:21PM +0900, kad76oglz...@yahoo.co.jp wrote:
 Dear Lars
 
 Thank you for your answer.
 
 Something already claimed mapatha.
 Maybe you need to exclude kpartx from mapping internal partitions,
 or adjust the lvm filter to exclude that device.
 
 What's supposed to be on that device?
 File system? VM image? LVM PV?
 
 I didn't use LVM.
 
 device are /dev/sda and /dev/sdb.
 # df
 Filesystem   1K-blocks   Used   Available Use% Mounted on
 /dev/sdc3 10079084   6286048   3281036  66% /
 tmpfs  615834088   6158252   1% /dev/shm
 /dev/sdc1   198337 51717136380  28% /boot
 /dev/sdc5100125432  12501716  82537548  14% /home
 
 # chkconfig --list multipathd
 multipathd  0:off   1:off   2:off   3:off   4:off   5:off   6:off
 
 But I reboot system and I checked under /dev/mapper.
 #ls /dev/mapper
 controll mpatha mpathap1 mpathap2

That's your problem right there.

mpathap1 and p2 are device mapper targets created by kpartx
on top of the multipath target mpatha.
These claim mpatha (correctly),
thereby preventing DRBD from claiming it.

If that were not the case, you would have both
the partition mappings and DRBD accessing the lower-level device
concurrently, but only DRBD would replicate;
the partition mappings would bypass DRBD,
and you'd soon become very, very disappointed
(and would likely blame DRBD...).


Why did you think you want partitions there?

Did you mean to have DRBD use one of those partitions?

I suggest you either use one DRBD per partition,
or you get rid of the partitions completely
and put DRBD on the whole device.

If you need partitions inside of one DRBD,
I recommend using DRBD as a PV (physical volume)
for an LVM VG (volume group).
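
A minimal sketch of that last option (device and VG names are examples):

  # drop the unwanted partition mappings kpartx created on top of mpatha
  kpartx -d /dev/mapper/mpatha
  # with r0 configured on the whole /dev/mapper/mpatha and promoted to Primary,
  # put LVM on top of the replicated device
  pvcreate /dev/drbd0
  vgcreate vg_replicated /dev/drbd0
  lvcreate -L 100G -n lv_data vg_replicated
  # remember to filter the backing device out of lvm.conf so LVM scans only /dev/drbd0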

Hth,
Lars

 I started drbd. The same error occurred.
 
 
 
 
 --- On Tue, 2014/5/13, Lars Ellenberg lars.ellenb...@linbit.com wrote:
 
 On Thu, May 08, 2014 at 06:43:58PM +0900, kad76oglz...@yahoo.co.jp wrote:
  Hi everybody
  
  I set up device-mapper under centOS6.3.
  I started drbd but drbd didn't start.
  
  #/etc/rc.d/init.d/drbd start
  Starting DRBD resources: [
   create res: r0
 prepare disk: r0
  adjust disk: r0:failed(attach:10)
   adjust net: r0
  ]
  
  multipath
  # multipath -ll
  mpatha dm-0 IFT,DS S16F-R1440
  size=7.6T features='0' hwhandler='0' wp=rw
  |-+- policy='round-robin 0' prio=1 status=active
  | `- 5:0:0:0  sdb 8:16 active ready  running
  `-+- policy='round-robin 0' prio=1 status=enabled
`- 1:0:0:0  sda 8:0  active ready  running
  
  /etc/multipath.conf
  defaults {
  user_friendly_names yes
 }
 devices {
device {
 vendor  IFT
 product DS S16F-R1440
 path_grouping_policymultibus
 #path_grouping_policyfailover
getuid_callout  /lib/udev/scsi_id --whitelisted --device=/dev/%n
 path_checkerreadsector0
 path_selector   round-robin 0
 hardware_handler0
 failback15
 rr_weight   priorities
 no_path_retry   15
 #no_path_retry   queue
 }
  }
  blacklist{
   devnode ^drbd*
  # devnode *
   device {
  vendor SEAGATE
  product *
  }
   device {
  vendor Dell
  product *
  }
   device {
  vendor iDRAC
  product *
  }
   }
  
  drbd Ver8.4.4
  
  /etc/drbd.conf
  
  # more /etc/drbd.conf
  #
  # please have a a look at the example configuration file in
  # /usr/share/doc/drbd83/drbd.conf
  #
  common {
 disk {
  max-bio-bvecs 1;
  #on-io-error call-local-io-erro;
  }
  }
  
  resource r0 {
protocol C;
  
  
   net {
 sndbuf-size 512K;
 ping-int 10;
 ping-timeout 10;
 connect-int 10;
 timeout 80;
 ko-count 0;
 max-buffers 8000;
 max-epoch-size 8000;
}
  
  syncer {
rate 80M;
verify-alg md5;
al-extents 3833;
  }
  
on centos1 {
 device /dev/drbd0;
 disk   /dev/mapper/mpatha;
 address172.26.24.153:7790;
 flexible-meta-disk /dev/sdc6;
}
on centos2 {
 device /dev/drbd0;
 disk   /dev/mapper/mpatha;
 address172.26.24.155:7790;
 flexible-meta-disk /dev/sdc6;
}
  }
  
  /var/log/messages
  May  8 16:16:56 centos1 kernel: drbd: initialized. Version: 8.4.4 
  (api:1/proto:86-101)
  May  8 16:16:56 centos1 kernel: drbd: GIT-hash: 
  74402fecf24da8e5438171ee8c19e28627e1c98a build by root@centos63, 2014-04-25 
  21:53:13
  May  8 16:16:56 centos1 kernel: drbd: registered as block device major 147
  May  8 16:16:56 centos1 kernel: drbd r0: Starting 

Re: [DRBD-user] drbd+device mapper drbd didn't start.

2014-05-13 Thread kad76oglz0qh
Dear Lars

Thank you for your answer.

What's supposed to be on that device?
File system? VM image? LVM PV?

 File system. Not using a VM or LVM.
 I created the partition with parted.



--- On Tue, 2014/5/13, Lars Ellenberg lars.ellenb...@linbit.com wrote:

On Thu, May 08, 2014 at 06:43:58PM +0900, kad76oglz...@yahoo.co.jp wrote:
 Hi everybody
 
 I set up device-mapper under centOS6.3.
 I started drbd but drbd didn't start.
 
 #/etc/rc.d/init.d/drbd start

 Starting DRBD resources: [
  create res: r0
prepare disk: r0
 adjust disk: r0:failed(attach:10)
  adjust net: r0
 ]
 
 multipath
 # multipath -ll
 mpatha dm-0 IFT,DS S16F-R1440
 size=7.6T features='0' hwhandler='0' wp=rw
 |-+- policy='round-robin 0' prio=1 status=active
 | `- 5:0:0:0  sdb 8:16 active ready  running
 `-+- policy='round-robin 0' prio=1 status=enabled
   `- 1:0:0:0  sda 8:0  active ready  running
 
 /etc/multipath.conf
 defaults {
 user_friendly_names yes
 }
 devices {
device {
vendor  IFT
product DS S16F-R1440
path_grouping_policymultibus
#path_grouping_policyfailover
getuid_callout  /lib/udev/scsi_id --whitelisted --device=/dev/%n
path_checkerreadsector0
path_selector   round-robin 0
hardware_handler0
failback15
rr_weight   priorities
no_path_retry   15
#no_path_retry   queue
}
 }
 blacklist{
  devnode ^drbd*
 # devnode *
  device {
 vendor SEAGATE
 product *
 
}
  device {
 vendor Dell
 product *
 }
  device {
 vendor iDRAC
 product *
 }
  }
 
 drbd Ver8.4.4
 
 /etc/drbd.conf
 
 # more /etc/drbd.conf
 #
 # please have a a look at the example configuration file in
 # /usr/share/doc/drbd83/drbd.conf
 #
 common {
disk {
 max-bio-bvecs 1;
 #on-io-error call-local-io-erro;
 }
 }
 
 resource r0 {
   protocol C;
 
 
  net {
sndbuf-size 512K;
ping-int 10;
ping-timeout 10;
connect-int 10;
timeout 80;
ko-count 0;
max-buffers 8000;
max-epoch-size 8000;
   }
 
 syncer {
   rate 80M;
   verify-alg md5;
   al-extents 3833;
 }
 
   on centos1 {
device /dev/drbd0;
 disk   /dev/mapper/mpatha;
address172.26.24.153:7790;
flexible-meta-disk /dev/sdc6;
   }
   on centos2 {
device /dev/drbd0;
disk   /dev/mapper/mpatha;
address172.26.24.155:7790;
flexible-meta-disk /dev/sdc6;
   }
 }
 
 /var/log/messages
 May  8 16:16:56 centos1 kernel: drbd: initialized. Version: 8.4.4 
 (api:1/proto:86-101)
 May  8 16:16:56 centos1 kernel: drbd: GIT-hash: 
 74402fecf24da8e5438171ee8c19e28627e1c98a build by root@centos63, 2014-04-25 
 21:53:13
 May  8 16:16:56 centos1 kernel: drbd: registered as block device major 147
 May  8 16:16:56 centos1 kernel: drbd r0: Starting worker
 thread (from drbdsetup [5231])
 May  8 16:16:56 centos1 kernel: block drbd0: open(/dev/mapper/mapatha) 
 failed with -16

Something already claimed mapatha.
Maybe you need to exclude kpartx from mapping internal partitions,
or adjust the lvm filter to exclude that device.

What's supposed to be on that device?
File system? VM image? LVM PV?

 May  8 16:16:56 centos631 kernel: block drbd0: drbd_bm_resize called with 
 capacity == 0
 May  8 16:16:56 centos631 kernel: drbd r0: Terminating drbd_w_r0
 
 Please help me.
 
 Regards
 Masahiko Kawase


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to
 list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd+device mapper drbd didn't start.

2014-05-13 Thread kad76oglz0qh
Dear Lars

Thank you for your answer.

Something already claimed mapatha.
Maybe you need to exclude kpartx from mapping internal partitions,
or adjust the lvm filter to exclude that device.

What's supposed to be on that device?
File system? VM image? LVM PV?

I didn't use LVM.

device are /dev/sda and /dev/sdb.
# df
 Filesystem   1K-blocks   Used   Available Use% Mounted on
/dev/sdc3 10079084   6286048   3281036  66% /
tmpfs  615834088   6158252   1% /dev/shm
/dev/sdc1   198337 51717136380  28% /boot
/dev/sdc5100125432  12501716  82537548  14% /home

# chkconfig --list multipathd
multipathd  0:off   1:off   2:off   3:off   4:off   5:off   6:off

But I reboot system and I checked under /dev/mapper.
#ls /dev/mapper
controll mpatha mpathap1 mpathap2

 I started drbd. The same error occurred.




--- On Tue, 2014/5/13, Lars Ellenberg lars.ellenb...@linbit.com wrote:

On Thu, May 08, 2014 at 06:43:58PM +0900, kad76oglz...@yahoo.co.jp wrote:
 Hi everybody
 
 I set up device-mapper under centOS6.3.
 I started drbd but drbd didn't start.
 
 #/etc/rc.d/init.d/drbd start
 Starting DRBD resources: [
  create res: r0
prepare disk: r0
 adjust disk: r0:failed(attach:10)
  adjust net: r0
 ]
 
 multipath
 # multipath -ll
 mpatha dm-0 IFT,DS S16F-R1440
 size=7.6T features='0' hwhandler='0' wp=rw
 |-+- policy='round-robin 0' prio=1 status=active
 | `- 5:0:0:0  sdb 8:16 active ready  running
 `-+- policy='round-robin 0' prio=1 status=enabled
   `- 1:0:0:0  sda 8:0  active ready  running
 
 /etc/multipath.conf
 defaults {
 user_friendly_names yes
 }
 devices {
device {
vendor  IFT
product DS S16F-R1440
path_grouping_policymultibus
#path_grouping_policyfailover
getuid_callout  /lib/udev/scsi_id --whitelisted --device=/dev/%n
path_checkerreadsector0
path_selector   round-robin 0
hardware_handler0
failback15
rr_weight   priorities
no_path_retry   15
#no_path_retry   queue
}
 }
 blacklist{
  devnode ^drbd*
 # devnode *
  device {
 vendor SEAGATE
 product *
 }
  device {
 vendor Dell
 product *
 }
  device {
 vendor iDRAC
 product *
 }
  }
 
 drbd Ver8.4.4
 
 /etc/drbd.conf
 
 # more /etc/drbd.conf
 #
 # please have a a look at the example configuration file in
 # /usr/share/doc/drbd83/drbd.conf
 #
 common {
disk {
 max-bio-bvecs 1;
 #on-io-error call-local-io-erro;
 }
 }
 
 resource r0 {
   protocol C;
 
 
  net {
sndbuf-size 512K;
ping-int 10;
ping-timeout 10;
connect-int 10;
timeout 80;
ko-count 0;
max-buffers 8000;
max-epoch-size 8000;
   }
 
 syncer {
   rate 80M;
   verify-alg md5;
   al-extents 3833;
 }
 
   on centos1 {
device /dev/drbd0;
disk   /dev/mapper/mpatha;
address172.26.24.153:7790;
flexible-meta-disk /dev/sdc6;
   }
   on centos2 {
device /dev/drbd0;
disk   /dev/mapper/mpatha;
address172.26.24.155:7790;
flexible-meta-disk /dev/sdc6;
   }
 }
 
 /var/log/messages
 May  8 16:16:56 centos1 kernel: drbd: initialized. Version: 8.4.4 
 (api:1/proto:86-101)
 May  8 16:16:56 centos1 kernel: drbd: GIT-hash: 
 74402fecf24da8e5438171ee8c19e28627e1c98a build by root@centos63, 2014-04-25 
 21:53:13
 May  8 16:16:56 centos1 kernel: drbd: registered as block device major 147
 May  8 16:16:56 centos1 kernel: drbd r0: Starting worker thread (from 
 drbdsetup [5231])
 May  8 16:16:56 centos1 kernel: block drbd0: open(/dev/mapper/mapatha) 
 failed with -16

Something already claimed mapatha.
Maybe you need to exclude kpartx from mapping internal partitions,
or adjust the lvm filter to exclude that device.

What's supposed to be on that device?
File system? VM image? LVM PV?

 May  8 16:16:56 centos631 kernel: block drbd0: drbd_bm_resize called with 
 capacity == 0
 May  8 16:16:56 centos631 kernel: drbd r0: Terminating drbd_w_r0
 
 Please help me.
 
 Regards
 Masahiko Kawase


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

Re: [DRBD-user] drbd-9.0.0pre6.tar.gz drbd-8.3.16rc1.tar.gz

2013-11-27 Thread Christiaan den Besten
Hi Phil,

Any further progress on DRBD 9 you can share with us?

(or perhaps even a glimpse of an expected v9.0.0-final release date? :) ... )

Yours,
Chris

On 8/14/2013 11:13 PM, Philipp Reisner wrote:
 Hi,
 
 this is a *double release* day. On the drbd-9 side great 
 progress was made since the pre5 release. I ask everybody
 interested in drbd9 to check it out and report any
 issues found back to us. 
 
 9.0.0pre6 (api:genl1/proto:86-110)
 
  * Fixed the wait-[connect|sync] drbdadm commands. Now they actually
work on all three object types (resources, connections, volumes)
  * Fixed the resync vs application IO deadlock on a resync from
Primary/SyncSource to Secondary/SyncTarget [Was introduced in
drbd9 development]
  * Correctly deal with writes from a Primary on two Secondaries that
do a resync
  * New command called forget-peer. It is used to free a peer-device
slot. Online (via drbdsetup) or offline (via drbdmeta)
  * Lots of minor fixes
 
 http://oss.linbit.com/drbd/9.0/drbd-9.0.0pre6.tar.gz
 http://git.drbd.org/gitweb.cgi?p=drbd-9.0.git;a=tag;h=refs/tags/drbd-9.0.0pre6
 
 
 On the drbd-8.3 side the fix regarding devices larger than
 64TByte shows that our user base is moving to larger
 device sizes. Also note that the crm_fence_peer script
 infrastructure received a number of improvements.
 
 8.3.16rc1 (api:88/proto:86-97)
 
 * fix decoding of bitmap vli rle for device sizes > 64 TB
  * fix for deadlock when using automatic split-brain-recovery
  * only fail empty flushes if no good data is reachable
  * avoid to shrink max_bio_size due to peer re-configuration
  * fix resume-io after reconnect with broken fence-peer handler
  * fencing script improvements
 
 http://oss.linbit.com/drbd/8.3/drbd-8.3.16rc1.tar.gz
 http://git.drbd.org/gitweb.cgi?p=drbd-8.3.git;a=tag;h=refs/tags/drbd-8.3.16rc1
 
 
 PS: drbd-8.4.4rc1 will arrive in the next days.
 
 Best,
   Phil
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user
 
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd-9.0.0pre6.tar.gz drbd-8.3.16rc1.tar.gz

2013-08-14 Thread Digimer

On 14/08/13 17:13, Philipp Reisner wrote:

8.3.16rc1 (api:88/proto:86-97)

  * fix decoding of bitmap vli rle for device sizes > 64 TB
  * fix for deadlock when using automatic split-brain-recovery
  * only fail empty flushes if no good data is reachable
  * avoid to shrink max_bio_size due to peer re-configuration
  * fix resume-io after reconnect with broken fence-peer handler
  * fencing script improvements

http://oss.linbit.com/drbd/8.3/drbd-8.3.16rc1.tar.gz
http://git.drbd.org/gitweb.cgi?p=drbd-8.3.git;a=tag;h=refs/tags/drbd-8.3.16rc1


woohoo!

I'll try to test this tomorrow if I can at all free up the time. Thanks all!

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] [Drbd-dev] [PATCH] drbd: use list_move_tail instead of list_del/list_add_tail

2012-09-05 Thread Philipp Reisner

Thanks, applied.

Best regards,
 Phil
 From: Wei Yongjun yongjun_...@trendmicro.com.cn
 
 Using list_move_tail() instead of list_del() + list_add_tail().
 
 spatch with a semantic match is used to find this problem.
 (http://coccinelle.lip6.fr/)
 
 Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
 ---
  drivers/block/drbd/drbd_worker.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)
 
 diff --git a/drivers/block/drbd/drbd_worker.c
 b/drivers/block/drbd/drbd_worker.c index 6bce2cc..a196281 100644
 --- a/drivers/block/drbd/drbd_worker.c
 +++ b/drivers/block/drbd/drbd_worker.c
 @@ -141,8 +141,7 @@ static void drbd_endio_write_sec_final(struct
 drbd_epoch_entry *e) __releases(local)
 
   spin_lock_irqsave(&mdev->req_lock, flags);
   mdev->writ_cnt += e->size >> 9;
 - list_del(&e->w.list); /* has been on active_ee or sync_ee */
 - list_add_tail(&e->w.list, &mdev->done_ee);
 + list_move_tail(&e->w.list, &mdev->done_ee);
 
   /* No hlist_del_init(&e->collision) here, we did not send the Ack yet,
    * neither did we wake possibly waiting conflicting requests.
 
 
 ___
 drbd-dev mailing list
 drbd-...@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-dev
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD or not DRBD ?

2011-04-26 Thread Digimer
On 04/26/2011 01:05 PM, Whit Blauvelt wrote:
 Matching LVs are not the same LVs. The LV with your VM is a single item,
 and having it treated as such, which you get with clvmd, will ensure
 that it's not startable on either node at the same time.
 
 Okay, that could be valuable. Thanks. How does that layer with DRBD? Is
 there a path from a currently-configured and running lvm system to clvmizing
 it? Red Hat's documentation here:

DRBD is just a block device, so far as LVM is concerned. You will need
to set the filter in lvm.conf to only look for drbd devices though,
otherwise it sees the LVM twice (once on the backing device and again on
the DRBD).

I've got the details on how to do this here:

http://wiki.alteeve.com/index.php/Red_Hat_Cluster_Service_2_Tutorial#Setting_Up_Clustered_LVM
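
A minimal example of such a filter (a sketch; the exact patterns depend on
your device names):

  # /etc/lvm/lvm.conf -- accept only DRBD devices, reject everything else
  filter = [ "a|^/dev/drbd.*|", "r|.*|" ]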

 http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/LVM_Cluster_Overview.html
 
 is only a rough sketch at best. For all the effort going into various
 cluster and cloud scenarios, it amazes me how little is getting written up
 well.

Hrm, I'm not sure about migrating an existing LVM, but I do know that
you need to set the locking type to 3 and mark the LVM as clustered.
When creating the VG, this is done with the 'vgcreate -c y' (which is
the default when the clvmd daemon is running). Converting the VG/LVs
though... That will require research.
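
For the record, the conversion itself is probably just the following, but
treat it as an assumption and verify against the LVM documentation before
touching live data (vg0 is a placeholder):

  # with locking_type = 3 in lvm.conf and clvmd running on all nodes,
  # mark an existing volume group as clustered
  vgchange -c y vg0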

 RHCS's rgmanager is much simpler than Pacemaker, and is well tested and
 already exists. Writing your own scripts is, I'd argue, a fools errand. :)
 
 Ah, but running rgmanager on Ubuntu would be even more foolish. My
 preference for Ubuntu as host is off topic, but it's a strong one. The
 configuration files alone needed for all the elements of a good Pacemaker
 install might require more lines of code than a custom script for a
 well-defined, limited situation like mine. And the script, I'd understand.
 The configuration file approach gets into trusting voodoo. Not that much of
 sysadmin work doesn't consist in trusting voodoo

RHCS on .deb distros is probably a non-starter. Are you married to
Ubuntu, or would you consider an RPM-based install?

As for Pacemaker; My main argument for it is that it is a well tested,
thought out and supported solution. Your custom scripts would only be
known by you, which would be a problem for an IT manager, I should
expect. ;)

 I must admit, you lost me somewhat in your reference to emailing people. :)
 
 Any notable systems events around here result in notices, whether through
 Nagios or independently. 

Ah

 Best, and thanks again,
 Whit

Best of luck. :)

-- 
Digimer
E-Mail: digi...@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD or not DRBD ?

2011-04-26 Thread Whit Blauvelt
On Sun, Apr 24, 2011 at 10:21:44PM -0400, Digimer wrote:

 Matching LVs are not the same LVs. The LV with your VM is a single item,
 and having it treated as such, which you get with clvmd, will ensure
 that it's not startable on either node at the same time.

Okay, that could be valuable. Thanks. How does that layer with DRBD? Is
there a path from a currently-configured and running lvm system to clvmizing
it? Red Hat's documentation here:

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Logical_Volume_Manager_Administration/LVM_Cluster_Overview.html

is only a rough sketch at best. For all the effort going into various
cluster and cloud scenarios, it amazes me how little is getting written up
well.

 RHCS's rgmanager is much simpler than Pacemaker, and is well tested and
 already exists. Writing your own scripts is, I'd argue, a fools errand. :)

Ah, but running rgmanager on Ubuntu would be even more foolish. My
preference for Ubuntu as host is off topic, but it's a strong one. The
configuration files alone needed for all the elements of a good Pacemaker
install might require more lines of code than a custom script for a
well-defined, limited situation like mine. And the script, I'd understand.
The configuration file approach gets into trusting voodoo. Not that much of
sysadmin work doesn't consist in trusting voodoo

 I must admit, you lost me somewhat in your reference to emailing people. :)

Any notable systems events around here result in notices, whether through
Nagios or independently. 

Best, and thanks again,
Whit
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD or not DRBD ?

2011-04-25 Thread Dennis Jacobfeuerborn

On 04/24/2011 10:05 PM, Digimer wrote:

Comments in-line.

On 04/24/2011 11:34 AM, Whit Blauvelt wrote:

Digimer,

All useful stuff. Thanks. I hadn't considered three rather than two
networks. That's a good case for it.

Here's what I'm trying to scope out, and from your comments it looks to be
territory you're well familiar with. I've got two systems set up with KVM
VMs, where each VM is on its own LVM, currently each with primary-secondary
DRBD, where the primary roles are balanced across the two machines. As far
as I can tell, and from past comments here, It's necessary to go
primary-primary to enable KVM live migration, which is a very nice feature
to have. None of the VMs in this case face critical issues with disk
performance, so primary-primary slowing that, if it does in this context,
isn't a problem.


You do need Primary/Primary for live migration.


Why do you need a P/P configuration for live migration? From what I 
understand the VM state will be migrated from the source host to the target 
host and then the storage will be unmounted on the source and mounted again 
on the target. That should make the specific configuration of the storage 
irrelevant as long as it is remotely mountable from both hosts.


Regards,
  Dennis
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD or not DRBD ?

2011-04-25 Thread Miles Fidelman

Dennis Jacobfeuerborn wrote:
Why do you need a P/P configuration for live migration? From what I 
understand the VM state will be migrated from the source host to the 
target host and then the storage will be unmounted on the source and 
mounted again on the target. That should make the specific 
configuration of the storage irrelevant as long as it is remotely 
mountable from both hosts.


There's a step in the Xen handoff process where both VMs (or maybe it's 
both hypervisors) need access to the disk.  It's documented in here: 
http://www.drbd.org/users-guide-emb/ch-xen.html - no details as to why, 
though.



--
In theory, there is no difference between theory and practice.
Infnord  practice, there is.    Yogi Berra


___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD or not DRBD ?

2011-04-24 Thread William Kern

On 4/23/11 9:59 PM, Patrick Egloff wrote:


I got several pm urging me NOT to use active/active and OCFS2.


Hi, what were the stated reasons to avoid active/active OCFS2?
Did they prefer GFS2 or just not like active/active?

We have a few active/active clusters running OCFS2 and have not 
encountered many problems outside issues with folders with tens of 
thousands of small files.


 For larger files such as VMs, we see few issues. Please note that the
 active/active clusters in question tend to have most of the writes
 occurring on the A side and we do reads and maintenance on the B side, so
 perhaps our setup is more of an Active/Passive (R/W mode).


That being said. Active/Active with OCFS2 works very well and we are 
quite comfortable with it.


 It should also be said that Active/Passive (not mounted) has much
 better performance, regardless of the file system used.




One more question. I have 2 ethernet ports. eth1 is used to link both 
boxes together.
Should i use for DRBD + Heartbeat a different IP address and class 
than on eth0 which is on the LAN ?




 We do that so that it's easier to know what network you are looking at
 when ssh'ed on the box.


-bill
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD or not DRBD ?

2011-04-24 Thread Eduardo Gonzalez
Sorry, guys, but is it really necessary to use a clustered file system
for a primary-primary drbd with mysql? Does it really need a locking
system?

About using or not an active-active cluster, it depends on the needs
(high availability, load balancing) but both solutions work perfectly,
as mysql recommends drbd and explains how to use it with mysql
clusters.

And it would be nice to have separate NICs for LAN and DRBD:
http://dev.mysql.com/doc/refman/5.0/en/ha-drbd-performance.html#ha-drbd-performance-sepinterface
You can use /etc/hosts to point to the other node's DRBD NIC via hostname:

10.10.10.11 node1
10.10.10.12 node2
10.10.20.1 drbd1
10.10.20.2 drbd2


On Sun, Apr 24, 2011 at 08:06, William Kern wk...@pixelgate.net wrote:
 On 4/23/11 9:59 PM, Patrick Egloff wrote:

 I got several pm urging me NOT to use active/active and OCFS2.

 Hi, what were the stated reasons to avoid active/active OCFS2?
 Did they prefer GFS2 or just not like active/active?

 We have a few active/active clusters running OCFS2 and have not encountered
 many problems outside issues with folders with tens of thousands of small
 files.

 For larger files such as VMs, we see few issues. Please note that the
 active/active clusers in question tend to have most of the writes occuring
 on the A side and we do reads and maintenance on the B side, so perhaps our
 setup is more of an Active/Passsive (R/W mode).

 That being said. Active/Active with OCFS2 works very well and we are quite
 comfortable with it.

 It should also be said that Active/Passsive (not mounted) has much better
 performance, irregardless of the file system used.


 One more question. I have 2 ethernet ports. eth1 is used to link both
 boxes together.
 Should i use for DRBD + Heartbeat a different IP address and class than on
 eth0 which is on the LAN ?


 We do that so that its easier to know what network you are looking at when
 ssh'ed on the box.

 -bill
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD or not DRBD ?

2011-04-24 Thread Digimer
On 04/24/2011 04:30 AM, Eduardo Gonzalez wrote:
 Sotty, guys, but is it really necesary to use a clustered file system
 for a primary-primary drbd with mysql? Does it really need a locking
 system?

Yes, you need cluster locking, so that all nodes know when any one node
requests to lock part of the file system. Non-clustered file systems
expect that they are the only one with access to the storage, and will
quickly corrupt if anything else changes the data.

-- 
Digimer
E-Mail: digi...@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD or not DRBD ?

2011-04-24 Thread Whit Blauvelt
On Sun, Apr 24, 2011 at 10:39:01AM -0400, Digimer wrote:

   OCFS2 and GFS2 require cluster locking, which comes with a fair amount
 of overhead. Primary/Secondary DRBD with a normal filesystem like ext3
 will certainly be faster, but in Secondary, you can not access the
 Secondary resource at all.

...

   Given the relative trivial expense of network cards, I always
 recommend three separate networks; Internet Facing, Storage and
 Back-Channel (cluster comms + live migrations when clustering VMs).

Digimer,

All useful stuff. Thanks. I hadn't considered three rather than two
networks. That's a good case for it.

Here's what I'm trying to scope out, and from your comments it looks to be
territory you're well familiar with. I've got two systems set up with KVM
VMs, where each VM is on its own LVM, currently each with primary-secondary
DRBD, where the primary roles are balanced across the two machines. As far
as I can tell, and from past comments here, It's necessary to go
primary-primary to enable KVM live migration, which is a very nice feature
to have. None of the VMs in this case face critical issues with disk
performance, so primary-primary slowing that, if it does in this context,
isn't a problem.

Since each VM is in raw format, directly on top of DRBD, on top of its
dedicated LVM, there is no normal running condition where locking should be
an issue. That is, there's no time, when the systems are both running well,
when both copies of a VM will be live - aside from during migration, where
libvirt handles that well.

It's the abnormal conditions that require planning. In basic primary-primary
it's possible to end up with the same VM on each host running based on the
same storage at the same time. When that happens, even cluster locking won't
necessarily prevent corruption, since the two instances can be doing
inconsistent stuff in different areas of the storage, in ways that locks at
the file system level can't prevent. 

There are two basic contexts where both copies of a VM could be actively
running at once like that. One is in a state of failover. In a way failover
initiation should be simpler here than that between non-VM systems. No
applications per se need to be started when one system goes down. It's just
that the VMs that were primary on it need to be started on the survivor. At
the same time, some variation of stonith needs to be aimed at the down
system to be sure it doesn't recover and create dueling VMs. Any hints as to
the most effective way of accomplishing that (probably using IPMI in my
case) will be welcome.
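
A hedged sketch of what such an IPMI-based action could look like (address and
credentials are placeholders; a production setup would go through the cluster's
fence agents rather than a raw call):

  # power the peer off via its BMC before taking over its VMs
  ipmitool -I lanplus -H 192.168.1.12 -U admin -P secret chassis power off
  # confirm the peer is really down before starting its VMs locally
  ipmitool -I lanplus -H 192.168.1.12 -U admin -P secret chassis power status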

The other way to get things in a bad state, if it's a primary-primary setup
for each VM, is operator error. I can't see any obvious way to block this,
other than running primary-secondary instead, and sacrificing the live
migration capacity. It doesn't look like libvirt, virsh and virt-manager
have any way to test whether a VM is already running on the other half of a
two-system mirror, so they might decline to start it when that's the case.

Maybe I'm missing something obvious? Is there, for instance, a way to run
primary-secondary just up to when a live migration's desired, and go
primary-primary in DRBD for just long enough to migrate? 

Thanks,
Whit
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD or not DRBD ?

2011-04-24 Thread Digimer
On 04/24/2011 09:57 PM, Whit Blauvelt wrote:
 Digimer,
 
  I really thank you for your long-form discussion. So much of the writing on
 this stuff is terse, making for a steep learning curve.
 
 You should be using Clustered LVM (clvmd). This way the LVM PV/VG/LVs
 are in sync across both nodes at all times.
 
 I'm not yet convinced why I should use clvmd. I'm not afraid of creating
 matching PV/VG/LVs by hand. It's easy to get those to match, and nothing
 that's run post setup is altering their scheme. On the KISS principle, since
 I'm capable enough of being stupid, I stick with the tools I know - in this
 case plain LVM - unless the argument for introducing something new is
 decisive. I've read some past discussion here about clvmd being required or
 not, and it seemed to lean against the requirement. With each VM being on a
 DRBD pair of two dedicated LV's (just for the one VM), I just don't see what
 can get confused on this level. Am I missing something?

Matching LVs are not the same LVs. The LV with your VM is a single item,
and having it treated as such, which you get with clvmd, will ensure
that it's not startable on either node at the same time.

 Running the same VM on either host is suicidal, just don't, ever. To
 help prevent this, using 'resource-and-stonith' and use a script that
 fires a fence device when a split-brain occurs, then recover the lost
 VMs on the surviving node. Further, your cluster resource manager
 (rgmanager or pacemaker) should themselves require a successful fence
 before beginning resource recovery.
 
 Yeah, I definitely have to either get a better hold on the logic of
 pacemaker, or write my own scripts for this stuff. These servers have IPMI.
 It would be simple in a bad state to be sure the replication link is
 dropped. Since the IPMI is on the LAN side, if one server loses sight of the
 other on both replication and LAN links, then it should be safe to send the
 other a shutdown message over IPMI given that the other, no longer being on
 the LAN, shouldn't be able to send the same message back at it at the same
 time. I think.

RHCS's rgmanager is much simpler than Pacemaker, and is well tested and
already exists. Writing your own scripts is, I'd argue, a fool's errand. :)

As for fencing: it's always ideal to have two fence devices on separate
interfaces and switches, otherwise you're back to a single point of
failure again. If you lose a switch, though, and all network traffic is
stopped, you're not going to make much use of your VMs anyway.

 Then the only other logic needed, aside from firing appropriate notices to
 staff, is to start the list of VMs normally run on the down host. Am I
 making a beginner's mistake to think this can be kept so simple: If both
 links test dead for the other system, shut it down by IPMI, start up the VMs
 it was responsible for running, send notices, and we're done. Now, it would
 be good on restarting the other machine to have it recognize it shouldn't
 fire up all its usual VMs, so there's more logic needed to be ready for that
 event. But the initial failover looks simple. Pacemaker looks overly complex
 and opaque - or more likely I don't understand yet how simple it would be to
 set it up for this, as I'm getting lost among all it's other options. It's
 not much to script from scratch though, if it's as simple as it looks in my
 sketch.

I must admit, you lost me somewhat in your reference to emailing people. :)

The VMs that are lost when a node dies can be started manually on the
survivor, if that is what you wish. You still need the cluster for DLM
and fencing, but forgo the resource manager. However, I think you'd be
missing on the major benefit of clustering in that case. Just the same
though, having the VM data replicated would still reduce your MTTR.

 Fencing (stonith) generally defaults to restart. This way, with a
 proper setup, the lost node will hopefully reboot in a healthy state,
 connect to the DRBD resources and resync, rejoin the cluster and, if you
 configure it to do so, relocate the VMs back to their original host.
 Personally though, I disable automatic fail-back so that I can determine
 the fault before putting the VMs back.
 
 Hmm, restart rather than shut down. I take it there's a standard way to have
 that come back up without doing its normal start of its VMs, but instead to
 initialize a live migration of them back, just if the system comes up well?

If the node successfully rejoins the cluster and resync's the DRBD
resources, then you can have it live-migrate the VMs back automatically
if you wish. However, as I mentioned, I recommend leaving the VMs on the
surviving node and manually live-migrate them back once you've sorted
out what went wrong in the first place. This behaviour is configurable
in your resource manager of choice.

 Regardless, properly configured cluster resource manager should prevent
 the same VM running twice.
 
 ...
 
 That said, a properly configured resource manager can be told 

Re: [DRBD-user] DRBD or not DRBD ?

2011-04-23 Thread Patrick Egloff
Hi and thanks for the answer !

I got several pm urging me NOT to use active/active and OCFS2.

A simpler active/passive setup with no OCFS2 would be the best choice. Too
many things could go wrong with OCFS2 and active/active + MySQL.

But you fully understood my configuration and thanks for your help.
My drbd.conf is almost like the one you sent me.

But in my case, I must have another problem; it's not working.

One more question. I have 2 ethernet ports. eth1 is used to link both boxes
together.
Should I use a different IP address and class for DRBD + Heartbeat than on
eth0, which is on the LAN?

Patrick

2011/4/22 Digimer li...@alteeve.com

 On 04/22/2011 01:36 PM, Patrick Egloff wrote:
  Hi all,
 
  First of all, let me say that i'm a newbie with DRBD and not a high
  level linux specialist...

 Few are. Fewer still who claim to be. :)

  I want to have a HA setup for my Intranet which is using PHP + MySQL.
  (Joomla 1.6)
 
  For that, i have 2 DELL servers with 5 HD RAID on which i installed a
  CentOS 5.5 with
 
  I tried to install OCFS2, DRBD and Heartbeat as active/active. I'm at
  the point where i can access to my drbd partition  /sda6, but i can't
  make both boxes talk together.
  I do have some errors will loading :
  - mount.ocfs2 (device name specified was not found while opening device
  /dev/drbd0)
  - drbd is waiting for peer and i have to enter yes to stop the
 process
 
  After reading a lot, i'm not even sure anymore if my first project is
  the right choice...
 
  Is the configuration i planned the best one for my usage or should i
  change my plans for another setup with same result, that is high
  availibility ?
 
  If it makes sense to continue with drbd , i will be back with some
  questions about my problems...
 
 
  Thanks,

 I can't speak to heartbeat or OCFS2, as I use RHCS and GFS2, but the
 concept should be similar. Aside from that, those are questions above
 DRBD anyway.

 First, your RAID 5 is done in hardware, so CentOS only sees /dev/sda,
 right? Second, Partition 6 is what you want to use as a backing device
 on either node for /dev/drbd0? If you want to run Active/Active, then
 you will also want Primary/Primary, right?

 Given those assumptions, you will need to have a drbd.conf similar to
 below. Note that the 'on foo {}' section must have the same hostname
 returned by `uname -n` from either node. Also, change the 'address' to
 match the IP address of the interface you want DRBD to communicate on.
 Lastly, make sure any firewall you have allows port 7789 on those
 interfaces.

 Finally, replace '/sbin/obliterate' with the path to a script that will
 kill (or mark Inconsistent) the other node in a split-brain situation.
 This is generally done using a fence device (aka: stonith).

 Line wrapping will likely make this ugly, sorry.

 
 #
 # please have a a look at the example configuration file in
 # /usr/share/doc/drbd83/drbd.conf
 #

 # The 'global' directive covers values that apply to RBD in general.
 global {
# This tells Linbit that it's okay to count us as a DRBD user. If
 you
# have privacy concerns, set this to 'no'.
usage-count yes;
 }

 # The 'common' directive sets defaults values for all resources.
 common {
# Protocol 'C' tells DRBD to not report a disk write as complete
 until
# it has been confirmed written to both nodes. This is required for
# Primary/Primary use.
protocolC;

# This sets the default sync rate to 15 MiB/sec. Be careful about
# setting this too high! High speed sync'ing can flog your drives
 and
# push disk I/O times very high.
syncer {
rate15M;
}

# This tells DRBD what policy to use when a fence is required.
disk {
# This tells DRBD to block I/O (resource) and then try to
 fence
# the other node (stonith). The 'stonith' option requires
 that
# we set a fence handler below. The name 'stonith' comes
 from
# Shoot The Other Nide In The Head and is a term used in
# other clustering environments. It is synonomous with with
# 'fence'.
fencing resource-and-stonith;
}

# We set 'stonith' above, so here we tell DRBD how to actually fence
# the other node.
handlers {
# The term 'outdate-peer' comes from other scripts that flag
# the other node's resource backing device as
 'Inconsistent'.
# In our case though, we're flat-out fencing the other node,
# which has the same effective result.
outdate-peer/sbin/obliterate;
}

# Here we tell DRBD that we want to use Primary/Primary mode. It is
# also where we define split-brain (sb) recovery policies. As we'll
 be
# running all of our resources in Primary/Primary, only the
# 

Re: [DRBD-user] DRBD or not DRBD ?

2011-04-22 Thread Digimer
On 04/22/2011 01:36 PM, Patrick Egloff wrote:
 Hi all,
 
 First of all, let me say that i'm a newbie with DRBD and not a high
 level linux specialist... 

Few are. Fewer still who claim to be. :)

 I want to have a HA setup for my Intranet which is using PHP + MySQL.
 (Joomla 1.6)
 
 For that, i have 2 DELL servers with 5 HD RAID on which i installed a
 CentOS 5.5 with 
 
 I tried to install OCFS2, DRBD and Heartbeat as active/active. I'm at
 the point where i can access to my drbd partition  /sda6, but i can't
 make both boxes talk together.
 I do have some errors while loading:
 - mount.ocfs2 (device name specified was not found while opening device
 /dev/drbd0)
 - drbd is waiting for peer and i have to enter yes to stop the process
 
 After reading a lot, i'm not even sure anymore if my first project is
 the right choice...
 
 Is the configuration i planned the best one for my usage or should i
 change my plans for another setup with same result, that is high
 availibility ? 
 
 If it makes sense to continue with drbd , i will be back with some
 questions about my problems...
 
 
 Thanks,

I can't speak to heartbeat or OCFS2, as I use RHCS and GFS2, but the
concept should be similar. Aside from that, those are questions above
DRBD anyway.

First, your RAID 5 is done in hardware, so CentOS only sees /dev/sda,
right? Second, Partition 6 is what you want to use as a backing device
on either node for /dev/drbd0? If you want to run Active/Active, then
you will also want Primary/Primary, right?

Given those assumptions, you will need a drbd.conf similar to the one
below. Note that each 'on foo {}' section must use the hostname returned
by `uname -n` on that node. Also, change the 'address' to
match the IP address of the interface you want DRBD to communicate on.
Lastly, make sure any firewall you have allows port 7789 on those
interfaces.

Finally, replace '/sbin/obliterate' with the path to a script that will
kill (or mark Inconsistent) the other node in a split-brain situation.
This is generally done using a fence device (aka: stonith).

Line wrapping will likely make this ugly, sorry.


#
# please have a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#

# The 'global' directive covers values that apply to DRBD in general.
global {
    # This tells Linbit that it's okay to count us as a DRBD user. If you
    # have privacy concerns, set this to 'no'.
    usage-count yes;
}

# The 'common' directive sets default values for all resources.
common {
    # Protocol 'C' tells DRBD to not report a disk write as complete until
    # it has been confirmed written to both nodes. This is required for
    # Primary/Primary use.
    protocol C;

    # This sets the default sync rate to 15 MiB/sec. Be careful about
    # setting this too high! High speed sync'ing can flog your drives and
    # push disk I/O times very high.
    syncer {
        rate 15M;
    }

    # This tells DRBD what policy to use when a fence is required.
    disk {
        # This tells DRBD to block I/O (resource) and then try to fence
        # the other node (stonith). The 'stonith' option requires that
        # we set a fence handler below. The name 'stonith' comes from
        # Shoot The Other Node In The Head and is a term used in
        # other clustering environments. It is synonymous with
        # 'fence'.
        fencing resource-and-stonith;
    }

    # We set 'stonith' above, so here we tell DRBD how to actually fence
    # the other node.
    handlers {
        # The term 'outdate-peer' comes from other scripts that flag
        # the other node's resource backing device as 'Inconsistent'.
        # In our case though, we're flat-out fencing the other node,
        # which has the same effective result.
        outdate-peer /sbin/obliterate;
    }

    # Here we tell DRBD that we want to use Primary/Primary mode. It is
    # also where we define split-brain (sb) recovery policies. As we'll be
    # running all of our resources in Primary/Primary, only the
    # 'after-sb-2pri' really means anything to us.
    net {
        # Tell DRBD to allow dual-primary.
        allow-two-primaries;

        # Set the recovery policy for split-brain recovery when no device
        # in the resource was primary.
        after-sb-0pri   discard-zero-changes;

        # Now if one device was primary.
        after-sb-1pri   discard-secondary;

        # Finally, set the policy when both nodes were Primary. The
        # only viable option is 'disconnect', which tells DRBD to
        # simply tear down the DRBD resource right away and wait for
        # the administrator to manually 
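
The archived message is cut off above, before the per-resource section.
A minimal sketch of the 'on <hostname> {}' resource definition the reply
describes might look like the following; the resource name, hostnames, IP
addresses and backing partition are placeholders, not taken from the
original post:

resource r0 {
    # Each 'on' name must match `uname -n` on that node.
    on node1.example.com {
        device    /dev/drbd0;
        disk      /dev/sda6;
        address   192.168.1.10:7789;
        meta-disk internal;
    }
    on node2.example.com {
        device    /dev/drbd0;
        disk      /dev/sda6;
        address   192.168.1.11:7789;
        meta-disk internal;
    }
}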

Re: [DRBD-user] DRBD Kernel Panic DRBD-LVM-Xen

2010-04-09 Thread Hany Fahim
Thanks for the reply. I actually came across that thread yesterday after I
originally sent the first e-mail. I've disabled checksumming using ethtool
-K as described and it's been working great. It was also documented in this
thread:

https://bugzilla.redhat.com/show_bug.cgi?id=443621
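
For reference, disabling the checksum offloads with ethtool generally looks
like the following sketch; the interface name is an assumption, and which
offloads actually need to be turned off depends on the NIC and driver:

    # Show the current offload settings, then turn off TX/RX checksumming.
    ethtool -k eth1
    ethtool -K eth1 tx off rx off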

Thanks again,

hany

On Thu, Apr 8, 2010 at 4:09 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote:

 On Thu, Apr 08, 2010 at 08:48:54PM +0200, Florian Haas wrote:
  On 04/08/2010 08:14 PM, Hany Fahim wrote:
   Hi,
  
   I'm currently running a Xen setup using the CentOS distribution DRBD
   8.3.2 from Extras Repo. I'm running into consistent kernel panics when
   I'm benchmarking the individual DomUs. When the primary node crashes,
   the secondary also panics shortly after and the two servers reboot in
   tandem. Doing a search in Google, I found someone else who has the
 exact
   same issue as I do:
  
  
 http://lists.centos.org/pipermail/centos-virt/2008-December/000775.html
 
  And it was answered here:
 
  http://lists.linbit.com/pipermail/drbd-user/2008-December/011092.html
 
  Maybe Maros can share additional findings.


 Actually, it later turned out to be likely related to this:
 http://www.gossamer-threads.com/lists/drbd/users/17207
 (read the whole thread)

 there are a few more threads on that thing,
 but basically you can use 8.3.7, and see if that helps,
 upgrade your xen kernel as well,
 do some of the ethtool things mentioned in the above thread,
 and if none of that helps, disable DRBD's use of sendpage,
 which is a module parameter, but can also be toggled at runtime.

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 __
 please don't Cc me, but send to list   --   I'm subscribed
 ___
 drbd-user mailing list
 drbd-user@lists.linbit.com
 http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Kernel Panic DRBD-LVM-Xen

2010-04-08 Thread Florian Haas
On 04/08/2010 08:14 PM, Hany Fahim wrote:
 Hi,
 
 I'm currently running a Xen setup using the CentOS distribution DRBD
 8.3.2 from Extras Repo. I'm running into consistent kernel panics when
 I'm benchmarking the individual DomUs. When the primary node crashes,
 the secondary also panics shortly after and the two servers reboot in
 tandem. Doing a search in Google, I found someone else who has the exact
 same issue as I do:
 
 http://lists.centos.org/pipermail/centos-virt/2008-December/000775.html

And it was answered here:

http://lists.linbit.com/pipermail/drbd-user/2008-December/011092.html

Maybe Maros can share additional findings.

Florian



___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Kernel Panic DRBD-LVM-Xen

2010-04-08 Thread Lars Ellenberg
On Thu, Apr 08, 2010 at 08:48:54PM +0200, Florian Haas wrote:
 On 04/08/2010 08:14 PM, Hany Fahim wrote:
  Hi,
  
  I'm currently running a Xen setup using the CentOS distribution DRBD
  8.3.2 from Extras Repo. I'm running into consistent kernel panics when
  I'm benchmarking the individual DomUs. When the primary node crashes,
  the secondary also panics shortly after and the two servers reboot in
  tandem. Doing a search in Google, I found someone else who has the exact
  same issue as I do:
  
  http://lists.centos.org/pipermail/centos-virt/2008-December/000775.html
 
 And it was answered here:
 
 http://lists.linbit.com/pipermail/drbd-user/2008-December/011092.html
 
 Maybe Maros can share additional findings.


Actually, it later turned out to be likely related to this:
http://www.gossamer-threads.com/lists/drbd/users/17207
(read the whole thread)

there are a few more threads on that thing,
but basically you can use 8.3.7, and see if that helps,
upgrade your xen kernel as well,
do some of the ethtool things mentioned in the above thread,
and if none of that helps, disable DRBD's use of sendpage,
which is a module parameter, but can also be toggled at runtime.
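
A sketch of both ways to set it, assuming the parameter is named
disable_sendpage as in the drbd 8.x module (check `modinfo drbd` to confirm
on your build):

    # Verify the parameter exists for the loaded module.
    modinfo drbd | grep -i sendpage

    # Set it at module load time, e.g. in /etc/modprobe.d/drbd.conf:
    #   options drbd disable_sendpage=1

    # Or toggle it at runtime via sysfs:
    echo 1 > /sys/module/drbd/parameters/disable_sendpage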

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user