Re: [DRBD-user] Help with drbddisk modification to block takeover when the local resource is not in a safe state

2010-09-06 Thread Lars Ellenberg
On Mon, Sep 06, 2010 at 08:34:40PM +0800, jan gestre wrote:
> Hi Everyone,
> 
> I've found this drbddisk modification that will block takeover when
> the local resource is not in a safe state, however it only works if
> you only have one resource, but since I have two resources namely r0
> and r1, it would not work.
> 
> case "$CMD" in
>start)
>  # forbid to become primary if ressource is not clean
>  DRBDSTATEOK=`cat /proc/drbd | grep ' cs:Connected ' | grep '
> ds:UpToDate/' | wc -l`
>  if [ $DRBDSTATEOK -ne 1 ]; then
>echo >&2 "drbd is not in Connected/UpToDate state. refusing to
> start resource"
>exit 1
>  fi
> 
> I would be truly grateful if anyone could care to show how to effect
> said modification.
> 
> I'm trying to prevent a Split Brain scenario here, and I'm still
> testing my setup; I was in a predicament earlier wherein one of the
> resources, r1, was in a healthy state while r0 was in standalone
> Primary/Unknown state; I had to issue drbdadm -- --discard-my-data r0
> to resolve the split brain.

No Sir.

What if the Primary dies? Hard?
You now want your Secondary to take over, no?
Well, you cannot anymore. Because it is not Connected.
How could it, you just lost the peer ;-)

Don't focus only on one specific scenario.
Because, if you just "fix" that specific scenario,
you break a truckload of others.

Maybe it helps a bit to read
http://www.mail-archive.com/pacema...@oss.clusterlabs.org/msg04312.html

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Help with drbddisk modification to block takeover when the local resource is not in a safe state

2010-09-06 Thread Lars Ellenberg
On Mon, Sep 06, 2010 at 10:02:51PM +0800, jan gestre wrote:
> On Mon, Sep 6, 2010 at 8:48 PM, Lars Ellenberg wrote:
> > On Mon, Sep 06, 2010 at 08:34:40PM +0800, jan gestre wrote:
> >> Hi Everyone,
> >>
> >> I've found this drbddisk modification that will block takeover when
> >> the local resource is not in a safe state, however it only works if
> >> you only have one resource, but since I have two resources namely r0
> >> and r1, it would not work.
> >>
> >> case "$CMD" in
> >>    start)
> >      # forbid to become primary if resource is not clean
> >>      DRBDSTATEOK=`cat /proc/drbd | grep ' cs:Connected ' | grep '
> >> ds:UpToDate/' | wc -l`
> >>      if [ $DRBDSTATEOK -ne 1 ]; then
> >>        echo >&2 "drbd is not in Connected/UpToDate state. refusing to
> >> start resource"
> >>        exit 1
> >>      fi
> >>
> >> I would be truly grateful if anyone could care to show how to effect
> >> said modification.
> >>
> >> I'm trying to prevent a Split Brain scenario here, and I'm still
> >> testing my setup; I was in a predicament earlier wherein one of the
> >> resources, r1, was in a healthy state while r0 was in standalone
> >> Primary/Unknown state; I had to issue drbdadm -- --discard-my-data r0
> >> to resolve the split brain.
> >
> > No Sir.
> >
> > What if the Primary dies? Hard?
> > You now want your Secondary to take over, no?
> > Well, you cannot anymore. Because it is not Connected.
> > How could it, you just lost the peer ;-)
> >
> > Don't focus only on one specific scenario.
> > Because, if you just "fix" that specific scenario,
> > you break a truckload of others.
> >
> > Maybe it helps a bit to read
> > http://www.mail-archive.com/pacema...@oss.clusterlabs.org/msg04312.html
> >
> 
> Thanks Lars, but now I am confused, maybe you can enlighten me, you're
> saying that I would be better off without modifying, what then would
> you recommend to prevent Split Brain? Add a stonith device, e.g. IBM
> RSA? Add handlers like dopd?
> 
> BTW, I got the modification from this url -->
> http://lemonnier.se/erwan/blog/item/53/

Which is misguided.
And it is not an attempt to avoid split brain,
but to avoid diverging data sets,
one of the ill consequences a split brain can lead to. 

What the presented patch does is disable takeover in case the Primary node dies.
So why have heartbeat in the first place?



I'll partially quote that blog:

| Let's take an example: two nodes, N0 and N1. N0 is primary, N1 is secondary.
| Both have redundant heartbeat links and at least one dedicated drbd
| replication link. Let's consider the (highly) hypothetical case when the drbd
| link goes down, soon followed by a power outage for N0. What will happen in a
| standard heartbeat/drbd setup is that when the drbd link goes down, the drbd
| daemon will set the local resources on both nodes in state 'cs:WFConnection'
| (Waiting For Connection) and mark the peer data as outdated.

So far that is correct. Where "the drbd daemon" would be dopd.  Or, in a
pacemaker cluster, you could also use the crm-fence-peer script to achieve
a similar effect.
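
For reference, a drbd.conf sketch of the two variants (handler paths as
shipped with drbd 8.3; adjust to your installation):

    # heartbeat + dopd variant
    disk     { fencing resource-only; }
    handlers { fence-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5"; }

    # pacemaker variant
    disk     { fencing resource-only; }
    handlers {
        fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }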

| Then when N0
| disappears due to the power outage, heartbeat on N1 will take over resources
| and become the primary node.

Which is wrong.

First, drbd will refuse to be promoted if it is outdated.
So this outdating seems to have not worked in the above setup.
Fix it.

| What we may want is to forbid a node to become primary in case its drbd
| resources are not in a connected and up-to-date state.

Which you already have: if it is Outdated, it cannot be promoted.


Second, in a properly configured Pacemaker setup,
Pacemaker (resp. the drbd OCF resource agent) would already know,
and not even try to promote it on the outdated node.



Besides, it should be a very unlikely event that a just rebooted, isolated node
decides to take over resources.

Maybe you should increase your initdead time.

Or wait for connection before even starting heartbeat/pacemaker.
In haresources-mode heartbeat clusters using drbddisk, the drbd
wfc-timeout parameter is used for this; its default is "unlimited",
so by default the drbd init script would in most cases wait forever for drbd
to establish a connection to its peer, thereby blocking the bootup process on
purpose. Heartbeat would only start once DRBD was able to establish its
connection.
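
In drbd.conf terms, that is the startup section; a sketch (the values
here are just examples):

    startup {
        wfc-timeout      0;    # 0 = wait for connection forever (the default)
        degr-wfc-timeout 120;  # if the cluster was degraded before reboot,
                               # give up waiting after two minutes
    }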



Additionally maybe add a third node, so you have real quorum?

But it depends on you, and what you want to achieve, of course.
There is no one single best way.

Re: [DRBD-user] drbd module not correctly loading

2010-09-06 Thread Lars Ellenberg
On Mon, Sep 06, 2010 at 01:46:46PM -0400, Jean-Francois Malouin wrote:
> Hi,
> 
> I sent something similar twice to the list this weekend but to no
> avail.

You need to subscribe here before you are allowed to post.

> Here a shorter version:
> 
> On a Debian/squeeze system running 2.6.32-5-xen-amd64 I downloaded
> drbd-8.3.8 from the git repositary, compiled and installed the module
> with module-assistant. When I start drbd I get 
> 
> Starting DRBD resources:DRBD module version: 8.3.7
>userland version: 8.3.8
> 
> Looks like the system still loads the module from the squeeze kernel
> image package linux-image-2.6.32-5-xen-amd64 rather than from
> drbd8-module-2.6.32-5-xen-amd64. 
> 
> When I try to load it manually with insmod I get:
> 
> :~# insmod /lib/modules/2.6.32-5-xen-amd64/kernel/drivers/block/drbd.ko
> insmod: error inserting
> '/lib/modules/2.6.32-5-xen-amd64/kernel/drivers/block/drbd.ko': 
> -1 Invalid module format
> 
> On an identical system (to be used in a pacemaker cluster) I don't
> experience this issue.
> 
> Any ideas?

Build error. Maybe module-assistant got something wrong,
or you got something wrong.
Sorry, I know that was not very helpful.

What does "dmesg" say?
It is likely complaining about version magic or something like that.
Double check that you are using the right gcc version,
matching kernel headers, and the right make arguments for your setup.
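
One quick check (standard tools; the module path is the one from your
insmod attempt):

    # version magic the module was built against
    modinfo /lib/modules/2.6.32-5-xen-amd64/kernel/drivers/block/drbd.ko \
        | grep vermagic
    # version of the kernel actually running
    uname -r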

Or, well, remember what you did differently
on the "identical system that works" (TM).

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] drbd module not correctly loading

2010-09-06 Thread Lars Ellenberg
On Mon, Sep 06, 2010 at 04:37:07PM -0400, Jean-Francois Malouin wrote:
> * Lars Ellenberg  [20100906 16:05]:
> > On Mon, Sep 06, 2010 at 01:46:46PM -0400, Jean-Francois Malouin wrote:
> > > Hi,
> > > 
> > > On a Debian/squeeze system running 2.6.32-5-xen-amd64 I downloaded
> > > drbd-8.3.8 from the git repositary, compiled and installed the module
> > > with module-assistant. When I start drbd I get 
> > > 
> > > Starting DRBD resources:DRBD module version: 8.3.7
> > >userland version: 8.3.8
> > > 
> > > Looks like the system still loads the module from the squeeze kernel
> > > image package linux-image-2.6.32-5-xen-amd64 rather than from
> > > drbd8-module-2.6.32-5-xen-amd64. 
> > > 
> > > When I try to load it manually with insmod I get:
> > > 
> > > :~# insmod /lib/modules/2.6.32-5-xen-amd64/kernel/drivers/block/drbd.ko
> > > insmod: error inserting
> > > '/lib/modules/2.6.32-5-xen-amd64/kernel/drivers/block/drbd.ko': 
> > > -1 Invalid module format
> > > 
> > > On an identical system (to be used in a pacemaker cluster) I don't
> > > experience this issue.
> > > 
> > > Any ideas?
> > 
> > Build error. Maybe module-assistant got something wrong,
> > or you got something wrong.
> 
> or both...

Still not subscribed ;-)

> > Sorry, I know that was not very helpful.
> > 
> > What does "dmesg" say?
> 
> when I insmode drbd I get
> drbd: exports duplicate symbol lc_seq_dump_details (owned by lru_cache)

The "in mainline linux" drbd split out one part of it as extra module,
while the out-of-tree module still contains this part.

rmmod lru_cache # in tree module
insmod drbd.ko  # out of tree module

Should fix it for you.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] File corruption in drbd partition

2010-09-07 Thread Lars Ellenberg
On Tue, Sep 07, 2010 at 09:35:48AM +, putcha narayana wrote:
> 
> Hi,
> 
> We are running continuous failovers on a redundant setup (Active / Standby).
> After few failovers we observe content of file x appears inside file y.

How much is "few"?
What is the IO load?
How do you trigger the failover?
DRBD version, kernel version, file system type?
Volatile caches involved?
How often/when do you fsck?

> In one particular case we observed inode corruption, when fsck command is run 
> on /repl partition.
>  Multiply-claimed block(s) in inode 28: 1233 1249 1251 1252
>  Multiply-claimed block(s) in inode 1183: 1251 1252
>  Multiply-claimed block(s) in inode 1184: 1233
>  Multiply-claimed block(s) in inode 1185: 1249
> 
> When fsck -fy is run on /repl partition then the end result is content of 
> file x is seen in file y.



-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] File corruption in drbd partition

2010-09-07 Thread Lars Ellenberg
On Tue, Sep 07, 2010 at 12:12:08PM +, putcha narayana wrote:
> 
> Thanks for responding,
> 
>  
> 
> FYI: I have run the stat command to get details of the files whose data is
> seen criss-crossing, i.e. the content of one file is seen in another.
> Snapshot enclosed at the end, from when the corruption occurred.
> 
> Files which have an issue belong to the same block,  IO Block: 4096

No, that is the file size in occupied blocks.

> Every corruption seen, content of /repl/firewall/sysconfig/iptables content 
> is seen in /repl/snmpagent/data/snmpd.conf
> 
>  
> 
>  How much is "few"?
> 
>  Today After 12 failovers. Last run after 80 failovers similar 
> corruption is seen.
> 
> 
>  What is the IO load?
> 
> Not exactly sure. When SIGTERM is received there are 2 processes which
> write config data to the DRBD partition.
> 
> 
>  How do you trigger the failover?
> 
>using reboot command
> 
> 
> DRBD version, kernel version, file system type?
> 
>DRBD-8.0.16, 2.6.14.7, EXT3-FS
> 
> 
>  Volatile caches involved?
> 
>NO
> How often/when do you fsck?
> 
>   Every time DRBD-GO-Primary script is called. Before mounting DRBD partition 
> we invoke fsck -fy

That is, you do
  primary; fsck /dev/drbd0; mount;
in that order?

The observed corruption may be caused by a lot of things.
DRBD (in that version) may have an issue.
ext3 (in your kernel version) may have an issue.
the generic write-out path (in your kernel version) may have an issue.
fsck (resp. your version of fsck) may have an issue.
probably many other things I cannot think of right now ;-)

I suggest to repeat your tests with
 * no drbd involved, simply reboot a single box the same way you do now,
   force fsck before the mount.
 * more recent kernel (and distribution?)
 * more recent DRBD version (8.3.8.1) in your current setup
 * more recent DRBD version with newer kernel (and distribution)

To get additional data points.



-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] File corruption in drbd partition

2010-09-07 Thread Lars Ellenberg
On Tue, Sep 07, 2010 at 03:15:07PM +0200, Lars Ellenberg wrote:
> On Tue, Sep 07, 2010 at 12:12:08PM +, putcha narayana wrote:
> > 
> > Thanks for responding,
> > 
> >  
> > 
> > FYI: I have ran stat command to get details of the files whose data is
> > seen criss-crossing. I mean content of one file is seen in another.
> > Snapshot enclosed at the end, when corruption occured.
> > 
> > Files which have an issue belong to same block,  IO Block: 4096   
> 
> No, that is the file size in occupied blocks.

Scratch that.
It's not even that, it's the "Optimal block size for IO" ;-)
So not at all interesting in this context.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] File corruption in drbd partition

2010-09-07 Thread Lars Ellenberg
On Tue, Sep 07, 2010 at 01:28:43PM +, putcha narayana wrote:
> > * more recent DRBD version with newer kernel (and distribution)
> [[LAK]]: I have seen people report on the mailing list, on DRBD-8.3.x: "I
> shall become SyncTarget, but I am primary".
> To this someone replied that the DRBD network is being shut down too
> fast.
> Maybe a similar thing is happening in our case, which eventually
> resulted in corruption (maybe!).

I don't see what this would have to do with things, and I'm not sure
what exactly this refers to, anyways.

> Does DRBD provide any options to protect against such corruption, say 
> partial writes, lock the disk for writes

First you have to convince me that DRBD actually causes corruption here.
It may well be that it simply mirrors corruption happening above the
DRBD layer.

There are no partial writes on the DRBD layer.  And I'm not sure what
you mean with "lock the disk" in this context, either.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] drbd stops sync on LVM

2010-09-08 Thread Lars Ellenberg
On Wed, Sep 08, 2010 at 01:31:02PM +0200, Eric Deschamps wrote:
> Hi list!
> 
> I am trying to sync KVM virtual machines located on LVM spaces.
> 
> The first host is a ubuntu 8.04 amd64 server (host 1) and the second is
> a ubuntu 10.04 amd64 server, both with original kernels. The DRBD APIs are
> not the same on both hosts (8.3.0 on host1 and 8.3.7 on host2), but I'm
> not sure it can cause any problem.
> 
> Another point is that I've had to use external metadata, as I'm using
> contiguous LV that I could not extend for some VMs.
> 
> I'd like to first migrate all the VM from the first server to the second
> one, then upgrade the first one to 10.04 and put the VMs back on host1.
> 
> The initial sync works like a charm, but if I stop a VM on host 1 and
> start it on host2, it looks like the system is a snapshot from the
> initial sync.
> 
> Anyway, /proc/drbd on both nodes prints UpToDate/UpToDate as well as
> drbdadm.
> 
> What could cause this problem ? What am i doing wrong ?

Could you post the output of

for vm in $(virsh list --all | awk 'NR > 2 {  print $2  }') ; do
printf "\n%s\n" $vm;
virsh dumpxml $vm | sed -e '/<disk/,/<\/disk>/!d'
done

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] drbdsetup 0 disk /dev/sdb1 /dev/sdb1 internal --set-defaults --create-device failed - continuing

2010-09-08 Thread Lars Ellenberg
On Tue, Sep 07, 2010 at 12:20:17AM +0500, Muhammad Sharfuddin wrote:
> OS: SUSE Linux Enterprise 11 SP1
> HAE SP1
> drbd version: 8.7.3

8.3.7 ... we don't have 8.7.x yet ;-)

> on one node I got the following messages while starting the drbd daemon
> 
> node1:~ # /etc/init.d/drbd start
> 
> Starting DRBD resources: [ d(r0) 0: Failure: (114) Lower device is
> already claimed. This usually means it is mounted.
> 
> 
> [r0] cmd /sbin/drbdsetup 0 disk /dev/sdb1 /dev/sdb1 internal
> --set-defaults --create-device  failed - continuing!
> 
> s(r0) n(r0) ].
> 
> stop/start the drbd daemon two  times in a row fix the issue, i.e no more
> such messages.

There is no drbd daemon. DRBD is an in-kernel stacking device driver,
configured through userland tools.
The second configuration attempt was successful,
the disk was no longer claimed at that point,
so whatever was accessing sdb1 had a short life.

> and the above message never came before today.
> likewise, the message didn't come on the other node (node2)

Likely something triggered through udev (or otherwise) had sdb1 open at that
time. It is impossible to determine, after the fact, what exactly had it open.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] Can't seem to get DRBD + Heartbeat to work properly

2010-09-08 Thread Lars Ellenberg
On Thu, Sep 02, 2010 at 05:48:53PM -0700, Michael Shadle wrote:
> running latest DRBD/Heartbeat/Pacemaker available in Ubuntu Lucid (10.04)
> 
> I have /dev/drbd1, formatted as xfs, I can mount it manually and it
> has data on it, but I can't seem to get it to get triggered properly
> using heartbeat/pacemaker. At first I followed the DRBD user manual
> which had me using cibadmin, which kept telling me an invalid
> schema/DTD, then on #drbd someone said to use crm so I tried using
> that. It seems like things are configured properly, but a "crm_verify
> -L" told me that it would not start without stonith being configured.
> So I configured that with some dummy thing, and it seems to want to
> start everything up but doesn't work. Looks like it still doesn't
> understand what I am going for - which is simply to have an
> active/passive /home XFS partition mounted on two machines. That's it.
> No DRBD+MySQL/etc.
> 
> mirror1 is active/primary, mirror2 does not exist yet (as soon as
> mirror1 is functional I will be reformatting a machine to -make- it
> mirror2)
> 
> Most of the writeup is MySQL specific so I tried to tweak it but I'm
> still at a loss here. Any help?
> 
> drbd 8.3.7 (api 88) - from ubuntu repo
> heartbeat version: 1:3.0.3-1ubuntu1
> pacemaker version: 1.0.8+hg15494-2ubuntu2 (same as cibadmin, crmadmin)
> 
> Here's a bunch of daemon.log extract from start and while it's
> running: http://pastebin.com/XFpKxeqp
>
> Here's my /etc/ha.d/ha.cf:
> 
> autojoin none
> ucast eth0 10.9.185.4 10.36.148.112
> crm yes
> use_logd on
> bcast eth1
> warntime 5
> deadtime 15
> initdead 15
> keepalive 2
> node mirror1 mirror2
> 
> Here's the output from crm...
> 
> # crm
> crm(live)# configure
> crm(live)configure# show
> node $id="fd4053b1-a50b-4c01-9e54-56bc24fdebc1" mirror1
> primitive drbd_r0 ocf:linbit:drbd \
> params drbd_resource="r0" \
> op monitor interval="15s"
> primitive fs_r0 ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/r0" directory="/home" fstype="xfs"
> primitive st-null stonith:null \
> params hostlist="mirror1 mirror2"
> group r0 fs_r0
> clone fencing st-null
> property $id="cib-bootstrap-options" \
> dc-version="1.0.8-042548a451fce8400660f6031f4da6f0223dd5dd" \
> cluster-infrastructure="Heartbeat"

You are missing colocation and order dependencies between
your drbd, Filesystem, and whatever else needs to be started.

And make sure you have the udev rules for drbd (package drbd-udev),
if you want to access it via /dev/drbd/by-res/*.
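
A sketch in crm shell syntax, reusing the resource names from above
(the master/slave wrapper name "ms_drbd_r0" is assumed here; the drbd
primitive needs to run as a master/slave resource in the first place):

    ms ms_drbd_r0 drbd_r0 \
        meta master-max="1" master-node-max="1" \
             clone-max="2" clone-node-max="1" notify="true"
    colocation fs_on_drbd_master inf: fs_r0 ms_drbd_r0:Master
    order fs_after_drbd_promote inf: ms_drbd_r0:promote fs_r0:start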

> drbd config:
> 
> global {
> usage-count no;
> }
> 
> common {
> protocol C;
> 
> handlers {
> pri-on-incon-degr
> "/usr/lib/drbd/notify-pri-on-incon-degr.sh;
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger
> ; reboot -f";
> pri-lost-after-sb
> "/usr/lib/drbd/notify-pri-lost-after-sb.sh;
> /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger
> ; reboot -f";
> local-io-error "/usr/lib/drbd/notify-io-error.sh;
> /usr/lib/drbd/notify-emergency-shutdown.sh; echo o >
> /proc/sysrq-trigger ; halt -f";
> fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> }
> 
> startup {
> degr-wfc-timeout 15;
> wfc-timeout 15;
> }
> 
>     disk {
> fencing resource-only;
> }
> }
> 
> resource r0 {
>   device    /dev/drbd1;
>   meta-disk internal;
>   on mirror1 {
> disk  /dev/sda7;
> address   10.9.185.4:7789;
>   }
>   on mirror2 {
> disk  /dev/sda7;
> address   10.36.148.112:7789;
>   }
> }

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] DRBD + LVM

2010-09-08 Thread Lars Ellenberg
On Thu, Sep 02, 2010 at 09:45:58AM +0930, Mike Hall wrote:
> I'm currently testing a replicated network storage device using CentOS
> 5.5 and DRBD 8.3, as a storage back end for KVM virtualisation. We may
> use either iSCSI (~SAN) or NFS (~NAS) to make this storage available
> to the VM host.

Typical usage scenario.

> I was having trouble changing the status of a DRBD node or bringing a
> node off-line, getting messages about something holding the device
> open. This turned out to be LVM, and it appears that it is necessary
> to bring LVM volumes offline (vgchange -an) before changing DRBD's
> status.

For LVM on top of DRBD (drbdX being a PV), this is correct.

> So, my question is this:
> 
> If I bring LVM volumes offline on node 1 and demote node 1 to
> secondary, then promote node 2 to primary and bring the LVM volumes
> online again on node 2, is it necessary to also bring those LVM
> volumes online again back on node 1?
> 
> I'm guessing no (because it's the 'same' device on either node), but am 
> checking to be sure.

In a "single master" setup (always recommended, unless you have a really
good reason to use dual-master, and know what you are doing),
you cannot bring them online on a DRBD Secondary.
At least you should not be able to. If you can activate your VG despite
it being on top of DRBD, and  DRBD being Secondary, you screwed up your
filters in lvm.conf.
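
A filter along these lines in /etc/lvm/lvm.conf (the patterns are
examples; adjust to your device names) makes LVM scan only the DRBD
devices and ignore their backing disks:

    filter = [ "a|^/dev/drbd.*|", "r|.*|" ]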

> Also, it it better to:

There is no better.
There is only better for a specific purpose, to a specific measure ;-)

> - format the LVMs on the storage machine (NAS/SAN) and then make those 
> volumes available to the VM host, or
> 
> - in the case of iSCSI, just make unformatted devices available and format 
> them from the VM host?

In this case, I think it does not make much difference.

> Anything else I need to be aware of ?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] Real live risk of data loss w/o flush

2010-09-08 Thread Lars Ellenberg
On Thu, Sep 02, 2010 at 03:22:25PM +0200, Robert Verspuy wrote:
>  On 08/09/2010 11:08 AM, Sebastian Hetze wrote:
> >Hi *,
> >
> >What is your opinion and possibly your experience with using
> >no-disk-barrier and no-disk-flushes without BBU RAID?  The reason for
> >me asking is the huge latency I suffer using flushes in my setup
> >where I run several virtual KVM instances in DRBD containers without
> >BBU RAID. These virtual systems frequently flush disks and these
> >operations occasionally queue up to a substantial epoch of 100 or even
> >higher.
> >
> Sebastian,
> 
> See also my other 2 messages to the list, mailed yesterday and today.
> After some testing on our new database cluster,
> I'm seeing a huge latency in writing small packets to disk with flushes.
> Now I'm going to use protocol C, no-disk-barrier, no-disk-flushes,
> and no BBU on primary and secondary.
> 
> Your message helped me thinking about the risks.
> 
> Both our servers have 2 power supplies, connected to 2 power feeds.
> So in case of a power failure of one feed, both servers will still
> be running.
> 
> Just like you mention, only in a complete power failure in the datacentre,
> drbd will lose data, but at that moment all other servers using the
> database server are also offline.
> 
> On the database server we're using PostgreSQL.
> PostgreSQL is ACID-compliant, so the data on disk should not be corrupt.
> It could be possible that we lost some database insert/updates,
> but that's a risk I'm willing to accept, looking at the small chance
> that all power is lost.

Excuse me, but WHAT?

PostgreSQL is ACID compliant, IF AND ONLY IF the fsync/fdatasync and
similar it issues are behaving as expected, i.e. data is on stable
storage when PostgreSQL thinks it is.

If data only reaches stable storage at some point after PostgreSQL
thinks it already was there, and most likely even in some random order,
then no, ACID compliance is not met.

So no, if you run PostgreSQL on disks with volatile caches,
and you unplug the power hard, you can expect data loss
and possibly data corruption.

Which is completely independent of DRBD.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] Real live risk of data loss w/o flush

2010-09-08 Thread Lars Ellenberg
On Mon, Aug 09, 2010 at 11:08:22AM +0200, Sebastian Hetze wrote:
> Hi *,
> 
> I would like to get more info about the real live risk of data loss
> with DRBD using no-disk-barrier and no-disk-flushes on RAID
> controllers without BBU.
> 
> If I understand things correctly, DRBD adds barriers into the data
> stream from primary to secondary (at least) on each flush of the
> underlying primary device.

No, after any IO completion visible to upper layers.

> Without barrier support it flushes the
> secondary on each flush of the primary.  This happens to make shure,
> subsequent operations that rely on the data to be commited on disk
> find the same state on the secondary in case of a failover.
> 
> If I use a RAID controller with BBU, that takes care for all data that
> has reached the controller cache to survive (some) crashes or power
> failures.
> 
> But what are the scenarios where I really suffer data loss without
> BBU?  And is my risk of data loss higher with DRBD than it would be
> without?
> 
> The primary use case for DRBD as I see it is failure of one node in
> the cluster that leads to a failover to the secondary. In this case we
> have one survivor and this survivor has plenty of time to flush all
> data from the cache buffer to its disk before the failover proceeds.
> And reads would give me the cached data meanwhile.
> The benefit I get from the BBU in this situation is this flush time.
> After that time, the data on disk is exactly the same, so there is no
> additional protection against data corruption that might arise from
> faulty data sent by the primary during the crash. As soon as this data
> is in secondary cache it will be written to disk sooner or later.
> 
> If this is correct so far,

I think so.

> the remaining risk is simultaneous
> (power-)failure of both nodes.

Not quite.
There is also the single node crash.

If the secondary crashes, then later comes back,
we need to resync everything that _may_ be different.

That includes anything that is changed since we lost the Secondary.
It also includes everything that has been in flight to the Secondary.
AND it includes everything that may have been lost due to volatile
caches.

Similar for primary crash.  The amount of data that may have been lost
due to volatile caches, but is no longer covered by the activity log,
may fail to be resynced, as DRBD has no further way to track it.

> If this happens, there are several
> causes of trouble.  I suffer real service downtime although I have
> spent so much money for high availability. I might get asked why I did
> not spend the little extra money on independent UPS for both nodes.
> Data on the secondary might have been written out of order leading to
> an inconsistent state. On the primary, without BBU a queued flush
> might have succeeded or not, but the write order is correct.

With volatile caches involved, and without cache flushes at appropriate
times, you can forget about write ordering. It can no longer be
controlled by the OS.

> I will likely suffer data loss in this scenario, but there is no
> additional risk by using DRBD.
>
> On boot after (power-)recovery the
> primary needs a file system check to cleanup possible damage but this is
> exactly the same risk as in the standalone case.  Even with BBU (on the
> primary) in this scenario I would rely on the primary data more than on
> the secondary. So the only case where I would really get extra
> reliablity from barriers and in order flushes on the secondary would be
> if only my secondary has a BBU and the primary does not.

Only that with volatile caches without flushes, now there is potential
for DRBD to "forget" to resync parts of the disk that would need to be
resynced, as the corresponding data has been lost in volatile caches.
So you get potential for data divergence without DRBD having any chance
to know about it.

> What is your opinion and possibly your experience with using
> no-disk-barrier and no-disk-flushes without BBU RAID?  The reason for
> me asking is the huge latency I suffer using flushes in my setup
> where I run several virtual KVM instances in DRBD containers without
> BBU RAID. These virtual systems frequently flush disks and these
> operations occasionally queue up to a substantial epoch of 100 or even
> higher.

Try disabling all barriers and flushes,
but put the disks into write-through mode instead
(and hope that they don't forget that setting).
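
A sketch of that combination (hdparm works for plain SATA/IDE disks;
RAID controllers have their own management tools):

    # switch off the volatile write cache of the backing disk
    hdparm -W 0 /dev/sdb

    # drbd.conf excerpt, 8.3 style
    disk {
        no-disk-barrier;
        no-disk-flushes;
        no-md-flushes;
    }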

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] Real live risk of data loss w/o flush

2010-09-08 Thread Lars Ellenberg
On Wed, Sep 08, 2010 at 04:05:02PM +0200, Hans Gregersen Jensen wrote:
> Hi all.
> 
> I've been having similar concerns about disk-flushes.
> The primary node has a BBU RAID, while the secondary does not.
> 
> Is it possible to configure drbd to only use disk-flushes on the secondary 
> node?
> I should mention that I'm using protocol B.
> 
> Any pointers would be appreciated.. I haven't been able to find anything in 
> the documentation about a per-node disk-flush configuration. 

drbd.conf does not yet allow disk {} settings on a per-node basis.
But you can have drbd.conf differ in that setting on both nodes.
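
That is (a sketch; the resource name is made up): on the node with the
BBU, the resource's disk section could say

    disk { no-disk-flushes; no-md-flushes; }

while the node without the BBU simply leaves those options out,
so flushes stay enabled there.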

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] DRBD and iSCSI

2010-09-08 Thread Lars Ellenberg
On Wed, Sep 08, 2010 at 01:59:13PM +0100, Mark Watts wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> 
> I know you can export DRBD block devices as iSCSI LUNs, but what happens
> to active iSCSI connections when a fail-over occurs?
> 
> Do they get reconnected automatically or is it something more involved?
> (Last time I looked at iSCSI I'm sure you had to completely re-export
> all your LUNs any time you wanted to add/remove one)

Just the same as if you had a single iSCSI server without DRBD
crashing and rebooting.

Only faster.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] Fwd: Re: drbd stops sync on LVM

2010-09-08 Thread Lars Ellenberg
On Wed, Sep 08, 2010 at 04:25:03PM +0200, Eric Deschamps wrote:
> sorry...

;)

> >> The initial sync works like a charm, but if I stop a VM on host 1 and
> >> start it on host2, it looks like the system is a snapshot from the
> >> initial sync.
> >>
> >> Anyway, /proc/drbd on both nodes prints UpToDate/UpToDate as well as
> >> drbdadm.
> >>
> >> What could cause this problem ? What am i doing wrong ?
> > 
> > Could you post the output of
> > 
> > for vm in $(virsh list --all | awk 'NR > 2 {  print $2  }') ; do
> > printf "\n%s\n" $vm;
> > virsh dumpxml $vm | sed -e '/<disk/,/<\/disk>/!d'
> > done
> > 
> 
> Of course :
> 
> On the first host :

> actes2
> ltsp
> samba
> patients
> intranet
> 
> [the <disk> elements of the dumpxml output were stripped by the archive;
>  their source devices all pointed directly at the backing LVs,
>  not at a /dev/drbdX device]

See how not one of them references DRBD,
but all of them use the lower-level LVs directly?

If you bypass DRBD, you cannot expect it to magically know
about, and sync, your changes.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] Fwd: Re: drbd stops sync on LVM

2010-09-08 Thread Lars Ellenberg
On Wed, Sep 08, 2010 at 04:53:12PM +0200, Eric Deschamps wrote:
> Le 08/09/2010 16:30, Lars Ellenberg a écrit :
> 
> > 
> > See how not one of them references DRBD,
> > but all directly use the lower level LVs directly?
> > 
> > If you bybass DRBD, you cannot expect it to magically know
> > about, and sync, your changes.
> > 
> 
> The VMs are directly on LVs and LVs are the backing devices for DRBD
> nodes, as in the user guide:
> http://www.drbd.org/users-guide-emb/s-lvm-lv-as-drbd-backing-dev.html
> 
> The sync works (actually, makes a snapshot) if I use drbdadm
> invalidate-remote.


Ok, I'll try again.
You have:

 [VM]    [DRBD] <===> [DRBD] (remote node)
    \      /
     \    /
 [logical volume]

 (DRBD does not see or know about any changes done by VM)

You need:
 [VM]
   |
 [DRBD] <===> [DRBD] (remote node)
   |
 [logical volume]

 (DRBD sees every change done by the VM, and thus
  has a chance to mirror the changes over).

Ok?

If you trigger a full sync in [what you have],
of course everything that is in the LV is synced over.
Though, if it is changed during that time directly from the VM,
it will be inconsistent.



-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] Recurring Log Messages

2010-09-12 Thread Lars Ellenberg
On Fri, Sep 10, 2010 at 03:52:26PM -0700, Robinson, Eric wrote:
> I get messages like these in /var/log/messages every few seconds...
>  
> Sep 10 15:50:05 ha05 crm_attribute: [30028]: info: Invoked:
> crm_attribute -N ha05.mydomain.com d -n master-p_DRBD:1 -l reboot -v 100
>  
> Is that normal?

Yes.
A newer version of that resource agent uses a "quiet" flag in
the invocation of crm_master (which is the wrapper around crm_attribute,
which is what logs this).
In a newer version of pacemaker, crm_attribute even keeps quiet
when someone uses that flag ;-)


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] ds:Inconsistent/Diskless

2010-09-13 Thread Lars Ellenberg
On Mon, Sep 13, 2010 at 03:22:48PM +0200, Sam Przyswa wrote:
> Le 13/09/2010 06:40, Digimer a écrit :
> >On 10-09-12 08:28 PM, Sam Przyswa wrote:
> >>>Your second node is messed up, by the looks of it. Work from your first
> >>>node.
> >>>
> >>>Stop both nodes' drbd (/etc/init.d/drbd stop)
> >>>
> >>>Now on 1st, run:
> >>>
> >>>drbdadm -- --overwrite-data-of-peer connect r0
> >>>
> >>I got:
> >>
> >>drbdsetup net: unrecognized option '--overwrite-data-of-peer'
> >>Command 'drbdsetup /dev/drbd0 net 88.190.11.187:7789 88.190.11.244:7789
> >>B --set-defaults --create-device --overwrite-data-of-peer' terminated
> >>with exit code 20
> >>
> >>:-(
> >>
> >>Sam.
> >Before you ran that, did you verify that both disks were attached? If
> >one of them still showed Diskless, it won't work.
> 
> Ok but I run DRBD v8.3.7 (both sides) and a lot of commands from
> http://www.drbd.org/docs/ have changed (it seems), as "drbdadm --
> --clear-bitmap new-current-uuid ressource" returns an error "Unknown
> command 'new-current-uuid' "

If that is an unknown command to drbdadm, you are NOT using drbd 8.3.7.

Double check kernel _and_ userland versions of DRBD.
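
For example:

    cat /proc/drbd   # first line shows the version of the loaded kernel module
    drbdadm -V       # shows the userland version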

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] "become-primary-on" when drbd network connection is interrupted

2010-09-13 Thread Lars Ellenberg
On Sat, Sep 11, 2010 at 11:12:57AM +0200, Robert @ GMail wrote:
> Dear list's friends, I've noticed one strange action on two nodes with the
> following listed drbd.conf setup.
> 
> The two nodes are crossover connected with two bonded nics, set for high
> availability, not for performances.
> 
> For some reason, at startup the bonded drbd network didn't come up
> correctly (could it be the crossover connection? Forget it, it is not the
> issue I'll talk about), so it was not possible to ping 10.1.1.x each other,
> no connection for drbd.
> 
> The supposed scenario I expected to be WFConnection on both nodes, and it
> was, but with Primary/Unknown on the server-1 and Secondary/Unknown on the
> server-2.
> 
> My issue is that they both were Secondary

You just said server-1 was Primary?

Anyways, if DRBD is told to become Primary by someone (like the init
script), but cannot become Primary for some reason, it says so in the
kernel logs.  Find the kernel logs from the time frame where you think
DRBD is misbehaving, and see if it complains about anything.

Or maybe the init script did not even get that far as to try and promote
whatever is configured?

> and I had to manually issue the
> "drbdadm primary" command on server-1.
> It did work, not a big deal, but it would be better if this can work as
> expected.
> 
> Could be due to some settings in the drbd.conf?
> 
> What follow is the drbd.con content and I thank you in advance for any kind
> of tip.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] connect error -22 with SDP/InfiniBand

2010-09-17 Thread Lars Ellenberg
On Thu, Sep 16, 2010 at 06:29:03PM -0500, J. Ryan Earl wrote:
> Hello,
> 
> I've recently setup an InfiniBand 40Gbit interconnect between two nodes to
> run DRBD on top of some pretty fast storage.  I am able to get DRBD to work
> over Ethernet and IPoIB, however, when I try to enable SDP for the lower
> latency, lower overhead communication I'm getting connection errors:
> 
> block drbd0: conn( Unconnected -> WFConnection )
> block drbd0: connect failed, err = -22
> block drbd0: connect failed, err = -22
> block drbd0: connect failed, err = -22

-EINVAL

iirc, it is a bug inside the in-kernel SDP connect() peer lookup,
which EINVALs if the target address is not given as AF_INET (!),
even if the socket itself is AF_INET_SDP.
Or the other way around.

If you do "drbdadm -d connect $resource", you get the drbdsetup
command that would have been issued.
replace the second (remmote) sdp with ipv4,
and do them manually, on both nodes.
If that does not work, replace only the first (local) sdp with ipv4,
but keep the second (remote) sdp.
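
A hypothetical example of that swap (device, addresses, and ports are
made up; the real commands are whatever "drbdadm -d connect" prints for
you):

    # as printed by drbdadm -d connect r0:
    drbdsetup /dev/drbd0 net sdp:10.0.0.1:7789 sdp:10.0.0.2:7789 C
    # first attempt: keep the local af as sdp, force the remote lookup to ipv4
    drbdsetup /dev/drbd0 net sdp:10.0.0.1:7789 ipv4:10.0.0.2:7789 C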

If that gets you connected, then it's that bug.
I think I even patched it in the kernel once,
but I can't find that right now,
and don't remember the SDP version either.
I think it was
drivers/infiniband/ulp/sdp/sdp_main.c:addr_resolve_remote()
missing an (... || ... == AF_INET_SDP)

> I have the MLNX_OFED installed on CentOS5.5 with SDP active:
> 
> # rpm -qa|grep sdp
> libsdp-devel-1.1.100-0.1.g920ea31
> sdpnetstat-1.60-0.2.g8844f04
> libsdp-1.1.100-0.1.g920ea31
> libsdp-1.1.100-0.1.g920ea31
> libsdp-devel-1.1.100-0.1.g920ea31
> libsdp-debuginfo-1.1.100-0.1.g920ea31

That's all userland, and does not affect DRBD, as DRBD does all
networking from within the kernel.

> [r...@node02 log]# netperf -f g -H 192.168.20.1 -c -C
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.20.1
> (192.168.20.1) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^9bits/s  % S      % S      us/KB   us/KB
> 
>  87380  65536  65536    10.00    16.15       1.74     4.61     0.211   0.562
> 
> [r...@node02 log]# LD_PRELOAD="libsdp.so" netperf -f g -H 192.168.20.1 -c -C
> TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.20.1
> (192.168.20.1) port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^9bits/s  % S      % S      us/KB   us/KB
> 
>  87380  65536  65536    10.01    24.67       3.18     3.28     0.253   0.262
> 
> There is a significant (50-100%) increase in bandwidth and decrease in
> latency using SDP instead of IPoIB, so even though IPoIB works I'd like to
> use the SDP method.

Share your findings on DRBD performance IPoIB vs. SDP,
once you get the thing to work on your platform.

HTH,


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] data-integrity-alg in dual-Primary setup

2010-09-17 Thread Lars Ellenberg
On Fri, Sep 17, 2010 at 11:22:53AM +0200, Fabrice Charlier wrote:
> Hi,
> 
> Three months ago we deployed a "web cluster" for LAMP hosting. We
> based this solution on drdb in active/active mode combined with
> ocfs2. This solution matches correctly our needs but several times (
> ~ once a month) a problem appears: we have enable the
> "data-integrity-alg" option to (try to) avoid silent corruption of
> data and several times, this feature detected that data have been
> altered during transit between the two nodes. As we have
> active/active nodes and as we use automatic split brain recovery
> policies proposed in official documentation, the two members of the
> mirror are disconnected and we have to resync it manually to
> continue normal operation. We have already disabled all TCP
> offloading capabilities of all NICs without success.
> 
> Is it possible to ask drbd to retry sending the block until success
> in this kind of situation?

No.
It is likely one of those cases where in-flight buffers are changed.
Problem is known since a long time, but recently has drawn some
more attention again,
http://lwn.net/Articles/399148/
http://www.spinics.net/lists/linux-scsi/msg44074.html

Maybe disable the drbd level checksum, and trust the TCP checksum.
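
In drbd.conf terms, that means removing (or commenting out) the option
from the net section:

    net {
        # data-integrity-alg sha1;   # disabled: rely on the TCP checksum
    }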

> If not, are you planning to implement this feature?

Maybe.
But it does not have particular priority.
Maybe we rather wait for the VM and VFS layer to fix it for us.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] data-integrity-alg in dual-Primary setup

2010-09-17 Thread Lars Ellenberg
On Fri, Sep 17, 2010 at 05:28:44PM +0200, Fabrice Charlier wrote:
> On 09/17/2010 02:40 PM, Lars Ellenberg wrote:
> 
> >Maybe disable the drbd level checksum, and trust the TCP checksum.
> 
> And take the risk of corrupting one mirror? ;-)
> 
> >>If not, are you planning to implement this feature?
> >
> >Maybe.
> >But it does not have particular priority.
> >Maybe we rather wait for the VM and VFS layer to fix it for us.
> 
> If somebody else implements this feature,
> are you ready to merge it into the main branch of drbd?

If it is "correct", why not.

It is not possible in protocol A.
B and C should be doable.

One of the interesting aspects is that
write ordering may be violated,
without anyone knowing it.

BTW, feature sponsoring may be an option, too ;-)


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] connect error -22 with SDP/InfiniBand

2010-09-17 Thread Lars Ellenberg
On Fri, Sep 17, 2010 at 02:12:35PM -0500, J. Ryan Earl wrote:
> >
> > If that gets you connected, then its that bug.
> > I think I even patched it in kernel once,
> > but don't find that right now,
> > and don't remember the SDP version either.
> > I think it was
> > drivers/infiniband/ulp/sdp/sdp_main.c:addr_resolve_remote()
> > missing an (... || ... = AF_INET_SDP)
> 
> 
> Is this the fix to which you refer?
> http://www.mail-archive.com/gene...@lists.openfabrics.org/msg10615.html

That's certainly relevant as well,
but it would have returned EAFNOSUPPORT, which is 97.


I doubt that this has anything to do with performance, btw,
it is just the address lookup during connect.

My guess is if you strace a netcat in userland, using your sdp preload
thingy, you'll likely see that it only creates the socket as
AF_INET_SDP, but all the rest of the network functions keep using
AF_INET, so no-one ever noticed. If that is intentional, we'd have to
adjust that in DRBD. If not, it needs to be fixed in the sdp stack.

DRBD over SDP performance tuning is a bit tricky,
and no, I don't remember the details, it's been a while.

I think CPU usage dropped considerably, that's a plus.  But neither
single write latency nor sequential throughput of a single connection
improved much, or they even degraded, relative to IPoIB.  If you have
multiple DRBD resources, thus multiple connections, the cumulative
throughput scaled better, though.

But please go ahead and tune your stack, on your hardware, which may be
more capable than the test lab hardware we used.
Depending on your hardware, and the quality of the recommendations of
your SDP tuning expert, your findings may be different.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] how to measure the time of one file's synchronization to secondary node

2010-09-21 Thread Lars Ellenberg
On Tue, Sep 21, 2010 at 09:30:06AM -0600, Mike Lovell wrote:
> Tomki wrote:
> >resource r0 {
> > protocol C;
> > startup { wfc-timeout 0; degr-wfc-timeout 120; }
> > disk { on-io-error detach; }
> >}
> >
> >If I copy a 1GB file to /mnt on primary node , how to measure the time
> >of this file's synchronization to secondary node
> >done completely.
> according to the config you posted, you are using protocol C on the
> drbd resource. the drbd.conf man pages says that with protocol C the
> "write IO is reported as completed, if it has reached both local and
> remote disk." in other words, when you write a file on the primary,
> drbd will not return the write operation as completed until drbd has
> written the blocks to the local disk as well as written it to the
> disk on the remote node. this makes it really easy to measure the
> time that it takes. just do `time cp /path/to/large/file
> /mnt/new/path/to/large/file`. the write operations in the cp wont
> complete until the data is on both disks.

Uhm... page cache...

rather do a "time cp $source $sink ; time sync",
otherwise you only benchmark your memory bandwidth
(assuming that the 1GiB file fits into your RAM).
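
For example (paths are placeholders), either include the flush in the
timed command, or bypass the page cache entirely:

    time sh -c 'cp /path/to/1GB.file /mnt/copy && sync'

    time dd if=/path/to/1GB.file of=/mnt/copy bs=1M oflag=direct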

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [DRBD-user] testing crm-fence-peer.sh

2010-09-23 Thread Lars Ellenberg
On Mon, Sep 20, 2010 at 10:18:49PM +0200, Pavlos Parissis wrote:
> Hi,
> I was testing the testing crm-fence-peer.sh on heartbeat/pacemaker
> cluster and I was wondering if the message " Remote node did not
> respond" which I got 3 times, is normal.
> For the simulation I used iptables on the slave to break the
> communication link between the master and slave. The drbd noticed
> immediately the broken link and invoked the crm-fence-peer.sh.
> 
> here is the log on the master and I broke the communication link only
> for one of my drbd resources (drbd_pbx_service_1)
> Sep 20 22:07:22 node-01 kernel: block drbd1: PingAck did not arrive in time.
> Sep 20 22:07:22 node-01 kernel: block drbd1: peer( Secondary -> Unknown ) 
> conn( Connected -> NetworkFailure ) pdsk( UpToDate -> DUnknown )
> Sep 20 22:07:22 node-01 kernel: block drbd1: asender terminated
> Sep 20 22:07:22 node-01 kernel: block drbd1: Terminating asender thread
> Sep 20 22:07:22 node-01 kernel: block drbd1: short read expecting header on 
> sock: r=-512
> Sep 20 22:07:22 node-01 kernel: block drbd1: Creating new current UUID
> Sep 20 22:07:22 node-01 kernel: block drbd1: Connection closed
> Sep 20 22:07:22 node-01 kernel: block drbd1: helper command: /sbin/drbdadm 
> fence-peer minor-1
> Sep 20 22:07:22 node-01 crm-fence-peer.sh[14877]: invoked for 
> drbd_pbx_service_1
> Sep 20 22:07:22 node-01 cibadmin: [14881]: info: Invoked: cibadmin -Ql
> Sep 20 22:07:22 node-01 cibadmin: [14890]: info: Invoked: cibadmin -Q -t 1
> Sep 20 22:07:24 node-01 crm-fence-peer.sh[14877]: Call cib_query failed 
> (-41): Remote node did not respond
> Sep 20 22:07:24 node-01 cibadmin: [14905]: info: Invoked: cibadmin -Q -t 1
> Sep 20 22:07:25 node-01 crm-fence-peer.sh[14877]: Call cib_query failed 
> (-41): Remote node did not respond
> Sep 20 22:07:25 node-01 cibadmin: [14913]: info: Invoked: cibadmin -Q -t 1
> Sep 20 22:07:27 node-01 crm-fence-peer.sh[14877]: Call cib_query failed 
> (-41): Remote node did not respond
> Sep 20 22:07:27 node-01 cibadmin: [14958]: info: Invoked: cibadmin -Q -t 1
> Sep 20 22:07:29 node-01 crm-fence-peer.sh[14877]: Call cib_query failed 
> (-41): Remote node did not respond
> Sep 20 22:07:29 node-01 cibadmin: [14966]: info: Invoked: cibadmin -Q -t 2
> Sep 20 22:07:31 node-01 cibadmin: [14992]: info: Invoked: cibadmin -C
>   -o constraints -Xid="drbd-fence-by-handler-ms-drbd_01">  score="-INFINITY" id="drbd-fence-by-handler-rule-ms-drbd_01">
>  id="drbd-fence-by-handler-expr-ms-drbd_01"/>
> Sep 20 22:07:33 node-01 crm-fence-peer.sh[14877]: INFO peer is reachable, my 
> disk is UpToDate: placed constraint 'drbd-fence-by-handler-ms-drbd_01'


Yes.
http://git.linbit.com/?p=drbd-8.3.git;a=blob;f=scripts/crm-fence-peer.sh;h=ea461f884963e7fe9c1d21ca97d74cdc4fb27285;hb=68ee998421a014e931b398ed21fd738c9e9a5d12#l322

(That url is ugly, I meant to say look at lines around 322 in that script.)

We start with a timeout of 1 second; apparently that is not enough in
your setup to get an answer.

If the message disturbs you, feel free to increase the initial timeout there.
- local cibtimeout=10
+ local cibtimeout=29

(or something like that).

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] recover from a situation

2010-09-24 Thread Lars Ellenberg
On Fri, Sep 24, 2010 at 02:35:04PM +0200, Pavlos Parissis wrote:
> Hi,
> 
> Here is a situation from which I want either automatic (by the cluster) or
> manually (by the admin) to recover from.
> 
> DRBD resource runs on node 1
> shutdown all nodes in a such order which will not cause a failover of the
> resources
> start the node 2 which was secondary prior the shutdown.
> 
> As we know DRBD wont let the cluster to start up the drbd resource because
> is marked outdated.
> what would be the correct way to recover from this situation?

If I understand correctly, your scenario is that the only data left is
an outdated secondary, you have catastrophically lost the good data at
the Primary site.

The Outdated one will refuse to be promoted.

That is easily changed by drbdadm -- --force primary $resourcename.

(older drbd versions may require low-level modification of the metadata
with drbdmeta show-gi/set-gi).
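
For example (assuming the resource is named r0):

  drbdadm -- --force primary r0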

You should definitely not automate this, as it would render all the
effort we put into "outdating" disconnected secondaries useless.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Effectiveness of barriers on disks with volatile write cache

2010-09-24 Thread Lars Ellenberg
On Fri, Sep 24, 2010 at 11:12:35AM +0200, Nicolae Mihalache wrote:
> Hello,
> 
> I've been reading about the barriers (no-disk-barrier option) in drbd.
> I understand that when the primary gets a IO completion notification,
> it will issue a barrier request (actually start a new epoch) to the
> secondary.
> However, if the disk of the primary has a write cache, it will
> immediately issue an IO completion notification, without actually
> writing the data to the disk. So what happens is that the secondary
> will use lots of barriers to guarantee the write order of the primary
> while in fact the primary itself has no guarantee about the order.
> 
> My conclusion is in contradiction with what is written in the user
> guide http://www.drbd.org/users-guide/re-drbdconf.html:
> 
> "When selecting the method you should not only base your decision on
> the measurable performance. In case your backing storage device has a
> volatile write cache (plain disks, RAID of plain disks) you should use
> one of the first two (i.e. barrier or flush)."
> 
> Can someone point the fault in my reasoning?

Look for this message, or read the whole thread.
 Date: Thu, 5 Aug 2010 14:40:56 +0200
 From: Lars Ellenberg
 Subject: Re: [DRBD-user] barrier mode on LVM containers

DRBD's usage of barriers/cache flushes is to make sure
we won't "forget" to resync parts that need to be synced
after a node crash.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] dual primary DRBD, iSCSI and multipath possible?

2010-10-05 Thread Lars Ellenberg
On Mon, Oct 04, 2010 at 09:56:12PM +0200, Markus Hochholdinger wrote:
> Hello,
> 
> Am 15.03.2010 um 15:47 Uhr schrieb Olivier LAMBERT 
> :
> > :D
> > That's I'm using right now, but it's for Xen on the top.
> > I think this is the only good reason to do that.
> 
> I'm also in the process of evaluating this (for a xen setup).
> 
> My setup would be:
> On two nodes drbd with active/active (so xen live migration would work). On 
> each node export the drbd device with iscsi.

If your xen attaches to iSCSI,
why would you need anything else for live migration?

> On each other node import the iscsi devices of both drbd nodes and put 
> multipath over it.

Don't.

> The tricky part now is how to handle failures. With this setup it is possible 
> that multipath switches between both drbd nodes. If we do this more than once 
> while we have a split brain, this would destroy our data!

With dual-primary DRBD, you currently still have to set it up so that at
least one node reboots if the replication link breaks, and the rebooted
node must not go primary again until connection is re-established.

> So the goal would be to develope a good multipath strategy.
> 
> How do you do this?

You don't.

> My idea is to say multipath to stick to one path and only switch on an error. 

Unfortunately you cannot control for which types of error the initiator
will switch paths.

> Also you have to say multipath to NOT recover faulty paths autmatically to 
> prevent data loss in a split brain situation.

That's not your only problem there.

Please go for a failover setup.

You can have two targets, one active on box A, one active on box B,
both replicating to the respective other.
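
A sketch of that layout (host names, backing volumes and addresses are
made up; syntax as in drbd 8.3):

  resource iscsi-a {   # normally Primary on box-a; its target exports /dev/drbd0
     protocol C;
     on box-a { device /dev/drbd0; disk /dev/vg0/iscsi-a; address 10.0.0.1:7788; meta-disk internal; }
     on box-b { device /dev/drbd0; disk /dev/vg0/iscsi-a; address 10.0.0.2:7788; meta-disk internal; }
  }
  resource iscsi-b {   # normally Primary on box-b; its target exports /dev/drbd1
     protocol C;
     on box-a { device /dev/drbd1; disk /dev/vg0/iscsi-b; address 10.0.0.1:7789; meta-disk internal; }
     on box-b { device /dev/drbd1; disk /dev/vg0/iscsi-b; address 10.0.0.2:7789; meta-disk internal; }
  }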

As the iSCSI target is not cluster aware, if you try to do multipath to
two _independent_ targets on a dual-primary DRBD, in general you will
break things.

DRBD only replicates the data changes.  To make that actually work, the
targets would have to be cluster aware, and replicate iSCSI state to
each other. All the non-data commands, unit attention, lun resets, cmd
aborts, not to speak of ordering or reservations.

It may appear to work as long as it is completely unloaded: you don't do
reservations, target and initiator are mostly idle, there are not many
SCSI commands involved, and all you are doing is unplugging an iSCSI
cable.  But in general, if anything happens at all (high load on the
network, the initiators, the targets, or the replication link, or
anything more interesting breaking), I certainly don't want to be the
one cleaning up the mess.

I strongly recommend against it.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD 8.3.7 user/Makefile.in patch

2010-10-05 Thread Lars Ellenberg
On Sat, Sep 25, 2010 at 01:01:07PM +0200, Deftunix wrote:
> Hi all,
> 
> i've found a little problem when installing drbd 8.3.7 from
> source. It installs binary only related to $(DESTDIR) disregarding
> $(sbindir).
> 
> Kind regards,
> 
>   -- deftunix

I think that has been intentional: as drbdsetup may be necessary early
during boot, we really want it in /sbin, not in /usr/sbin or
/usr/local/sbin or wherever.

If there is an "autofoo configure" variable for the /sbin equivalent
(as opposed to the /usr/sbin equivalent), please let me know.


> --- drbd-8.3.7/user/Makefile.in.orig2010-01-13 17:04:50.0 +0100
> +++ drbd-8.3.7/user/Makefile.in 2010-09-25 12:26:17.963038793 +0200
> @@ -98,23 +98,24 @@ distclean: clean
>  
>  install:
>  ifeq ($(WITH_UTILS),yes)
> -   install -d $(DESTDIR)/sbin/
> +   install -d $(DESTDIR)$(sbindir)

...


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problems using image files with drbd, maybe unsupported?

2010-10-06 Thread Lars Ellenberg
On Wed, Oct 06, 2010 at 10:51:54AM +0200, Martin Fandel wrote:
> Hi,
> 
> heartbeat does the same as I do in the following steps:
> 
> drbd disconnect all
> drbd primary all
> mount /dev/drbd0 /var/lib/xen/images
> 
> However, in a heartbeat environment, the xen images are also damaged.
> If the primary node fails, all xen images located in
> /var/lib/xen/images are inconsistent and not bootable. 
> 
> I can avoid this by using fsck on the images. But I don't know how to
> use fsck on image files. Does anybody know this?

That should not happen.
Your setup is broken somewhere.

Maybe show the xml definition of your xen resource,
of the pacemaker config,
and the drbd config,
and someone can point out what you did wrong.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problems using image files with drbd, maybe unsupported?

2010-10-06 Thread Lars Ellenberg
On Wed, Oct 06, 2010 at 02:23:49PM +0200, Martin Fandel wrote:
> Now it works :). My fault was that I've used yast2 instead of xm on the 
> console. If I start the vm via xm, it works fine... yast ckzskz...
> 
> Here are the steps I've done:
> 
> virtual-n1 is secondary node!
> 
> Configuration / State details:
> virtual-n1:~ # cat /proc/drbd 
> version: 8.3.7 (api:88/proto:86-91)
>  1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r
> ns:0 nr:794500 dw:968596 dr:24677 al:14 bm:1563 lo:0 pe:0 ua:0 ap:0 ep:1 
> wo:b oos:0
> virtual-n1:~ # cat /etc/drbd.d/xen.res
> resource xen {
>protocol   C;
>disk {
>   on-io-error pass_on;
>}
>syncer {
>   rate100M;
>}
>net {
>}
>startup {
>}
>on virtual-n1 {
>   device  /dev/drbd1 ;
>   address 192.168.100.1:5001;
>   meta-disk   internal;
>   disk/dev/sdc1;
>}
>on virtual-n2 {
>   device  /dev/drbd1 ;
>   address 192.168.100.2:5001;
>   meta-disk   internal;
>   disk/dev/sdc1;
>}
> }
> virtual-n2:~ # cat /etc/xen/vm/trnagios
> name="trnagios"
> uuid="dce10d36-3498-50f5-505c-e61615da090e"
> memory=512
> maxmem=512
> vcpus=1
> on_poweroff="destroy"
> on_reboot="restart"
> on_crash="destroy"
> localtime=0
> keymap="de"
> builder="linux"
> bootloader="/usr/lib/xen/boot/domUloader.py"
> bootargs="--entry=xvda2:/boot/vmlinuz-xen,/boot/initrd-xen"
> extra="xencons=tty "
> disk=[ 'file:/var/lib/xen/images/trnagios/disk0,xvda,w', ]
> vif=[ 'mac=00:16:3e:00:97:38,bridge=br0', ]
> nographic=1
> 
> Here we go, manual "unplanned failover" test:
> 
> virtual-n1:~ # drbdadm disconnect all
> virtual-n1:~ # cat /proc/drbd 
> version: 8.3.7 (api:88/proto:86-91)
>  1: cs:StandAlone ro:Secondary/Unknown ds:UpToDate/DUnknown   r
> ns:0 nr:822316 dw:996412 dr:24677 al:14 bm:1563 lo:0 pe:0 ua:0 ap:0 ep:1 
> wo:b oos:0
> virtual-n1:~ # drbdadm primary all
> virtual-n1:~ # cat /proc/drbd 
> version: 8.3.7 (api:88/proto:86-91)
>  1: cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown   r
> ns:0 nr:822316 dw:996412 dr:25277 al:14 bm:1563 lo:0 pe:0 ua:0 ap:0 ep:1 
> wo:b oos:0
> virtual-n1:~ # mount /dev/drbd1 /var/lib/xen/images/
> virtual-n1:~ # ls -lha /var/lib/xen/images/trnagios/disk0 
> -rw--- 1 root root 20G  6. Okt 14:15 /var/lib/xen/images/trnagios/disk0
> virtual-n1:~ # xm create trnagios
> virtual-n1:~ # ssh trnagios
> Password:
> 
> :D :D :D I'm sooo happy :). I thought I could trust yast to start the vm
> after an unclean cut. But this doesn't work... I'm fine with this solution. DRBD
> rocks!

And you are sure you ssh'ed into the "newly started" trnagios now running
on this host, not into the one still running on the other?

 ;-)
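
(A quick check would be `xm list | grep trnagios` on both dom0s;
the domain should show up on exactly one of them.)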

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] problem with syncing new changes on disks

2010-10-11 Thread Lars Ellenberg
On Mon, Oct 11, 2010 at 09:31:51AM -0400, Ravi Kanth wrote:
> Hello,
> 
> In my 3 machine config A, B and C. I want A - B cluster to be primary and
> get all the real time data changes. Then I would disconnect B and connect it
> with C to just copy the changes that we made. And then Connect B back to A,
> while A was still getting changes made to it. So now B gets the new changes
> from A. This way I could get stable changes (only changes I would like)
> rather than realtime changes (like errors) onto C.
> 
> Problem I am facing is when I connect A-B and make changes there is no
> change in UUID, but some activity in Activity log and bitmap. I don't
> understand how it is checking the activity log but it is not seeing the
> changes.
> 
> Now when I disconnect and connect it to C, it is either raising
> Split-Brain,  or showing 0KB marked out of sync when B clearly has changes
> to it.
> 
> Is there any way I could change UUID on C so that it always starts to sync
> from B (not full disk syncs) or stop split brain.

DRBD is not (yet) designed to work the way you seem to want it to.

Depending on the exact sequence of events, DRBD should either detect
split brain, or detect related, but not closely-enough related data.

Either way, it would require a full sync.

As long as at least one of the nodes is Primary,
DRBD should not silently connect without a resync.

So for this sequence of events:

t1)
A (Connected [to B], Primary, UpToDate)
B (Connected [to A], Secondary, UpToDate)
C just sitting there.
t2)
A (WFConnection, Primary, UpToDate)
B (StandAlone, Secondary, just sitting there)
C just sitting there, now connecting to A

connecting A and C now, waiting for sync...

t3)
A (Connected [to C], Primary, UpToDate)
B (StandAlone, Secondary, just sitting there)
C (Connected [to A], Secondary, UpToDate)


t4)
A (WFConnection, Primary, UpToDate)
B (StandAlone, Secondary, just sitting there)
C (StandAlone, Secondary, just sitting there)

Now connecting A and B again,

If B has NOT been Primary in between,
you should get an automatic full sync from A to B.

If B has been Primary in between,
you should get a "split-brain detected", or even "unrelated data", and
to resolve it would also require a full resync.


You will NOT get any incremental resync here.
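
One way to see which case you hit is the kernel log after the reconnect
attempt (a sketch; the exact message wording varies between versions):

  dmesg | grep -Ei 'split-brain|unrelated data'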

We have that feature on the roadmap, though,
we call it "More than two nodes in one level".

Feature sponsoring accepted ;-)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Online Verify and Kernel Panic

2010-10-11 Thread Lars Ellenberg
On Mon, Oct 11, 2010 at 03:16:19PM +0200, Fabrice Charlier wrote:
> Hi all,
> 
> We are running a web cluster based on dual primary drbd
> configuration and ocfs2. During each week-end we run a online verify
> on the drbd volume by executing "/sbin/drbdadm verify all" on one
> node. Last w-e, one node (not the one executing the verify command)
> completely crash and we found it this morning with a nice kernel
> panic message on the console.

Would have been useful to post that nice kernel panic message here.

> Anybody else already observed this behavior?
> 
> OS:  Linux server1.ucl.ac.be 2.6.18-194.3.1.el5 #1 SMP Thu May 13
> 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
> 
> DRBD: # modinfo drbd
> filename:   /lib/modules/2.6.18-194.3.1.el5/weak-updates/drbd83/drbd.ko
> alias:  block-major-147-*
> license:GPL
> version:8.3.2

Please use 8.3.8.1;
IIRC we fixed some bugs in the online-verify code paths since 8.3.2.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] {Spam?} Re: Online Verify and Kernel Panic

2010-10-11 Thread Lars Ellenberg
On Mon, Oct 11, 2010 at 05:02:40PM +0200, Fabrice Charlier wrote:
> On 10/11/2010 04:02 PM, Lars Ellenberg wrote:
> 
> >Would have been useful to post that nice kernel panic message here.
> 
> http://img176.imageshack.us/img176/5827/11102010.jpg

Ok, the interesting part is
:drbd:w_e_end_ov_req+0x2d/0x155

> >>Anybody else already observed this behavior?
> >>
> >>OS:  Linux server1.ucl.ac.be 2.6.18-194.3.1.el5 #1 SMP Thu May 13
> >>13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
> >>
> >>DRBD: # modinfo drbd
> >>filename:   /lib/modules/2.6.18-194.3.1.el5/weak-updates/drbd83/drbd.ko

Please do
cd /lib/modules/2.6.18-194.3.1.el5/weak-updates/drbd83/
gdb -q drbd.ko -ex 'l *(w_e_end_ov_req+0x2d)' -ex q

and post the output here.

If it says "no symbols found", grab the debug .ko first.

If that should not be available, grab a LINBIT support contract,
and have us sort that out for you ;-)

> >>alias:  block-major-147-*
> >>license:GPL
> >>version:8.3.2
> >
> >Please use 8.3.8.1,
> >iirc, we fixed some bugs in some code paths with online verify since 8.3.2.
> 
> I'll look at this.

BTW, in case you are unlucky enough to encounter some IO error during
online-verify, and DRBD detaches the disk, and you get unlucky again in
timing, you may then run into something else that will also Oops; that
one was fixed only after 8.3.8.1. The fix for it is in 8.3.9.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Online Verify and Kernel Panic

2010-10-12 Thread Lars Ellenberg
On Tue, Oct 12, 2010 at 03:19:25PM +0200, Roland Friedwagner wrote:
> Hello,
> 
> Am Montag 11 Oktober 2010 schrieb Fabrice Charlier:
> > Hi all,
> >
> > We are running a web cluster based on dual primary drbd configuration
> > and ocfs2. During each week-end we run a online verify on the drbd
> > volume by executing "/sbin/drbdadm verify all" on one node. Last w-e,
> > one node (not the one executing the verify command) completely crash
> > and we found it this morning with a nice kernel panic message on the
> > console.
> >
> > Anybody else already observed this behavior?
> >
> 
> Yes, we (and Michael) did at Sep  2 00:18:01.
> 
> The DRBD-User thread concerning this is 
> "8.3.8 Online Verify Oops on kernel 2.6.34"
> 
> 
> DRBD Version: 8.3.8.1
> HW: HP DL380G6 (1 x Xeon X5570)
> OS: RHEL 5.5 x86_64
> Kernel: 2.6.18-194.11.3.el5 #1 SMP Mon Aug 23 15:51:38 EDT 2010 x86_64 x86_64 
> x86_64 GNU/Linux
> 
> It was nearly the same address (:drbd:w_e_end_ov_req+0x29/0x136) here
> and michael had w_e_end_ov_req+0x36/0x154.
> 
>  $ gdb drbd.ko -ex 'l *(w_e_end_ov_req+0x29)' -ex q

> 0x5fbf is in w_e_end_ov_req (include/linux/crypto.h:286).
> 281 return module_name(tfm->__crt_alg->cra_module);
> 282 }
> 283
> 284 static inline u32 crypto_tfm_alg_type(struct crypto_tfm *tfm)
> 285 {
> 286 return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;

Which would mean that some of those pointers are invalid. And that's
hard to believe, given that they are used and dereferenced all the time.

> 287 }
> 288
> 289 static inline unsigned int crypto_tfm_alg_min_keysize(struct 
> crypto_tfm *tfm)
> 290 {
> 
> We do an online verify each night.
> Does not reproduce since.

As long as it does not reproduce, we cannot really fix it.
Give us a reproducer, and we'll fix it.

> Slightly changed config now.
> Switched csums-alg and verify-alg from md5 to sha1.
> (But the reason was the lower hash collision probability at nearly the
> same speed.)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd-8.3.9rc2.tar.gz

2010-10-20 Thread Lars Ellenberg
On Mon, Oct 18, 2010 at 12:47:14PM +0200, r...@q-leap.de wrote:
> Also there were two issues when compiling the kernel modules for a 2.6.32
> kernel (both are present in 8.3.8.1 as well). I've appended a diff to
> the patch file I generated using make kernel-patch.

Works for me, though.
The second chunk should be generated by the build automagic anyways.
Could you reason why the first chunk would be necessary?

> --- patch-linux-2.6.32.23.ql.2.6.32-14.nodrbd-drbd-8.3.9rc2   2010-10-15 
> 16:34:04.854020482 +0200
> +++ patch-linux-2.6.32.23.ql.2.6.32-14.nodrbd-drbd-8.3.9rc2.orig  
> 2010-10-15 16:03:11.345584416 +0200
> @@ -74,8 +74,8 @@
>  --- linux-2.6.32.23.ql.2.6.32-14.nodrbd/drivers/block/drbd/Makefile  
> 1970-01-01 01:00:00.0 +0100
>  +++ linux-2.6.32.23.ql.2.6.32-14.nodrbd-drbd/drivers/block/drbd/Makefile 
> 2010-10-15 15:24:49.0 +0200
>  @@ -0,0 +1,11 @@
> -+ccflags-y := -include linux/drbd.h
> -+ccflags-y += -include linux/drbd_limits.h
> ++ccflags-y := -include $(DRBDSRC)/linux/drbd.h
> ++ccflags-y += -include $(DRBDSRC)/linux/drbd_limits.h
>  +
>  +drbd-y := drbd_buildtag.o drbd_bitmap.o drbd_proc.o
>  +drbd-y += drbd_worker.o drbd_receiver.o drbd_req.o drbd_actlog.o


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Proposing a patch for drbd ocf ra to suppress "mismatch" warnings

2010-10-20 Thread Lars Ellenberg
On Fri, Oct 15, 2010 at 03:14:13PM +0200, Raoul Bhatia [IPAX] wrote:
> On 09/18/2010 02:16 PM, Raoul Bhatia [IPAX] wrote:
> >> Sep 18 13:58:21 node2 lrmd: [21135]: info: RA output: 
> >> (drbd_db:0:monitor:stderr) DRBD module version: 8.3.7#012   userland 
> >> version: 8.3.8#012preferably kernel and userland versions should match.
> > 
> > actually, this is expected but at the same time irritating.
> > 
> > i found commit c82bc3f73e6a7f14b856471bd6e69d817e6145cc [1] where
> > one can set the environment variable DRBD_DONT_WARN_ON_VERSION_MISMATCH
> > to suppress such warnings.
> > 
> > i therefore propose the attach patch to add a new parameter
> > warn_on_version_mismatch which defaults to "true"
> > 
> > when it is set to false, it exports DRBD_DONT_WARN_ON_VERSION_MISMATCH=
> > to suppress the warnings.
> 
> hi,
> 
> what about my suggestion? what objections do you have to not apply
> it for drbd 8.3.9?

I just committed something similar to that.
Should be pushed to public "soon".
Will be in 8.3.9.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbdadm expects /usr/local/etc/drbd.conf but linbit:ocf tries /etc/drbd.conf

2010-10-20 Thread Lars Ellenberg
On Fri, Oct 15, 2010 at 03:12:23PM +0200, Raoul Bhatia [IPAX] wrote:
> On 09/30/2010 01:26 PM, Koch, Sebastian wrote:
> > and saw that there were erros concerning the drbd.conf. The
> > ocf:linbit:drbd uses /etc/drbd.conf as the OCF_RESKEY_drbdconf and my
> > drbdadm tool always wanted to use /usr/local/etc/drbd.conf (maybe this
> > is compiled into the drb-utils, I wasn’t able to figure that out)
> > therefor the pacemaker always refused to let the secondary node connect
> > to the drbd device.
> 
> hi,
> 
> what about doing something like this in drbd.ocf:
> 
> -OCF_RESKEY_drbdconf_default="/etc/drbd.conf"
> +OCF_RESKEY_drbdconf_default="$(DESTDIR)$(sysconfdir)/drbd.conf"
> 
> and update these values via configure?

I don't think I like that. I dislike our usage of configure,
but we had to do that move to comply with some packaging guidelines.

Just "configure" with something sane, not the insanity that autofoo and
packaging guidelines forced upon us.

Which would be, as suggested by our "autogen.sh":
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc

Or better yet, _package_ drbd as .rpm or .deb,
which will do the configure properly for you anyways.

I'm very much unsure if DRBD will function properly
if you deviate from that. It may.
We certainly do not test any other locations.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd-8.3.9rc2.tar.gz

2010-10-20 Thread Lars Ellenberg
On Wed, Oct 20, 2010 at 01:42:37PM +0200, r...@q-leap.de wrote:
> >>>>> "Lars" == Lars Ellenberg  writes:
> 
> Lars> On Mon, Oct 18, 2010 at 12:47:14PM +0200, r...@q-leap.de wrote:
> >> Also there were two issues when compiling the kernel modules for a 
> 2.6.32
> >> kernel (both are present in 8.3.8.1 as well). I've appended a diff to
> >> the patch file I generated using make kernel-patch.
> 
> Lars> Works for me, though.
> 
> Could you let me know what your exact cmdline to generate the patch is?
> 
> Lars> The second chunk should be generated by the build automagic
> Lars> anyways.
> 
> Build automagic of who? The kernel? When I apply the patch generated by
> make kernel-patch and compile I get a "BLK_MAX_SEGMENTS undefined" error.

Ah.  Well, the build magic of DRBD.
You are supposed to build the out-of-tree DRBD module as external module.
You are not supposed to use the "make kernel-patch" thing.
It may or may not work. I don't really care if it does.
We should probably drop it from our makefiles now.

But if you go that route, you first need to run
 ./scripts/adjust_drbd_config_h.sh
or even KDIR=$KDIR O=$O ../scripts/adjust_drbd_config_h.sh

> Lars> Could you reason why the first chunk would be necessary?
> 
> The kernel doesn't know about the variable  $(DRBDSRC)

Of course it does not.  And you should not patch it in there.
The in-kernel tree drbd must only use the in kernel tree .h files, obviously.
So the in-kernel tree Makefile should be left untouched.
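
The out-of-tree build looks roughly like this (a sketch; the exact
targets may differ between releases):

  ./configure --with-km
  make module KDIR=/lib/modules/$(uname -r)/build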

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd-8.3.9rc2.tar.gz

2010-10-20 Thread Lars Ellenberg
On Wed, Oct 20, 2010 at 04:33:30PM +0200, r...@q-leap.de wrote:
> >>>>> "Lars" == Lars Ellenberg  writes:
> 
> >> Build automagic of who? The kernel? When I apply the patch generated by
> >> make kernel-patch and compile I get a "BLK_MAX_SEGMENTS undefined" 
> error.
> 
> Lars> Ah.  Well, the build magic of DRBD.  You are supposed to build
> Lars> the out-of-tree DRBD module as external module.  You are not
> Lars> supposed to use the "make kernel-patch" thing.  It may or may
> Lars> not work. I don't really care if it does.  We should probably
> Lars> drop if from our makefiles now.
> 
> Lars> But if you go that route, you first need to
> Lars>  ./scripts/adjust_drbd_config_h.sh
> Lars> Or even KDIR=$KDIR O=$O ../scripts/adjust_drbd_config_h.sh
> 
> OK, that fixes the missing #define. Thanks for the hint.
> 
> Lars> Could you reason why the first chunk would be necessary?
> 
> >> The kernel doesn't know about the variable  $(DRBDSRC)
> 
> Lars> Of course it does not.  And you should not patch it in there.
> Lars> The in-kernel tree drbd must only use the in kernel tree .h
> Lars> files, obviously.  So the in-kernel tree Makefile should be
> Lars> left untouched.
> 
> I prefer in-tree, but OK, I build it out of tree from now on.

Well, if it does work for you, fine ;-)
But it may get cumbersome now that there is already an in-tree drbd,
with a somewhat different layout of files, and the split of an additional
helper module.

> Now to the more serious problem.
> Do you have any hint on how to start
> debugging the SDP connect problem?

Sorry.
The workaround mentioned before, or alternatively patching the OFED
kernel[*], did work last time I tried.
Performance tuning is a different thing altogether.

[*] I think it was something like this

diff --git a/drivers/infiniband/core/addr.c b/drivers/infiniband/core/addr.c
index ce511d8..26ef4c4 100644
--- a/drivers/infiniband/core/addr.c
+++ b/drivers/infiniband/core/addr.c
@@ -306,7 +306,7 @@ static int addr_resolve_remote(struct sockaddr *src_in,
                                struct sockaddr *dst_in,
                                struct rdma_dev_addr *addr)
 {
-       if (src_in->sa_family == AF_INET) {
+       if (src_in->sa_family == AF_INET || src_in->sa_family == AF_INET_SDP) {
                return addr4_resolve_remote((struct sockaddr_in *) src_in,
                                            (struct sockaddr_in *) dst_in, addr);
        } else


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] [Pacemaker] drbd on heartbeat links

2010-11-02 Thread Lars Ellenberg
On Tue, Nov 02, 2010 at 10:07:17PM +0100, Pavlos Parissis wrote:
> On 2 November 2010 16:15, Dan Frincu  wrote:
> > Hi,
> >
> > Pavlos Parissis wrote:
> >>
> >> Hi,
> >>
> >> I am trying to figure out how I can resolve the following scenario
> >>
> >> Facts
> >> 3 nodes
> >> 2 DRBD ms resource
> >> 2 group resource
> >> by default drbd1/group1 runs on node-01 and drbd2/group2 runs on node2
> >> drbd1/group1  can only run on node-01 and node-03
> >> drbd2/group2  can only run on node-02 and node-03
> >> DRBD fencing_policy is resource-only [1]
> >> 2 heartbeat links and one of them used by DRBD communication
> >>
> >> Scenario
> >> 1) node-01 loses both heartbeat links
> >> 2) DRBD monitor detects first the absence of the drbd communication
> >> and does resource fencing by add location constraint which prevent
> >> drbd1 to run on node3
> >> 3) pacemaker fencing kicks in and kills node-01
> >>
> >> due to location constraint created at step 2, drbd1/group1 can run in
> >> the cluster
> >>
> >>
> >
> > I don't understand exactly what you mean by this. Resource-only fencing
> > would create a -inf score on node1 when the node loses the drbd
> > communication channel (the only one drbd uses),
> Because node-01 is the primary at the moment of the failure,
> resource-fencing will create an -inf score for the node-03.
> 
> > however you could still have
> > heartbeat communication available via the secondary link, then you shouldn't
> As I wrote none of the heartbeat links is available.
> After I sent the mail, I realized that node-03 will not see the
> location constraint created by node-01 because there is no heartbeat
> communication!
> Thus I think my scenario has a flaw, since none of the heartbeat links
> are available on node-01.
> Resource-fencing from DRBD will be triggered but without any effect
> and node-03 or node-02 will fence node-01, and node-03 will become
> the primary for drbd1
> 
> > fence the entire node, the resource-only fencing does that for you, the only
> > thing you need to do is to add the drbd fence handlers in /etc/drbd.conf.
> >       handlers {
> >               fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
> >               after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
> >       }
> >
> > Is this what you meant?
> 
> No.
> Dan thanks for your mail.
> 
> 
> Since there is a flaw on the scenario let's define a similar scenario.
> 
> status
> node-01 primary for drbd1 and group1 runs on it
> node-02 primary for drbd2 and group2 runs on it
> node-3 secondary for drbd1 and drbd2
> 
> 2 heartbeat links, and one of them being used for DRBD communication
> 
> here is the scenario
> 1) on node-01 heartbeat link which carries also DRBD communication is lost
> 2) node-01 does resource-fencing and places score -inf for drbd1 on node-03
> 3) on node-01 second heartbeat link is lost
> 4) node-01 will be fenced by one other cluster members
> 5) drbd1 can't run on node-03 due to location constraint created at step 2
> 
> The problem here is that location constraint will be active even
> node-01 is fenced.

Which is good, and intended behaviour, as it protects you from
going online with stale data (changes between 1) and 4) would be lost).

> Any ideas?

The drbd setting "resource-and-stonith" simply tells DRBD
that you have stonith configured in your cluster.
It does not by itself trigger any stonith action.

So if you have stonith enabled, and you want to protect against being
shot while modifying data, you should say "resource-and-stonith".
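
In drbd.conf that looks something like this (a sketch for 8.3; the
handler scripts are the ones shipped with DRBD):

  disk {
     fencing resource-and-stonith;
  }
  handlers {
     fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
     after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }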


What exactly do you want to solve?

Either you want to avoid going online with stale data,
so you place that contraint, or use dopd, or some similar mechanism.

Or you don't care, so you don't use those fencing scripts.

Or you usually are in a situation where you not want to use stale data,
but suddenly your primary data copy is catastrophically lost, and the
(slightly?) stale other copy is the best you have.

Then you remove the constraint or force drbd primary, or both.
This should not be automated, as it involves knowledge the cluster
cannot have, and thus cannot base decisions on.

So again,

What is it you are trying to solve?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd version 8.3.9 or other with kernel > 2.6.35 ?

2010-11-07 Thread Lars Ellenberg
On Sat, Nov 06, 2010 at 10:37:47AM -0700, fibrer...@gmail.com wrote:
> On Friday, November 5, 2010, Michael  wrote:
> > On http://www.drbd.org/download/mainline/
> > 2.6.35   -> 8.3.8
> >
> > 2.6.36   -> 8.3.8.1
> >
> >
> > Which kernel version should work with 8.3.9?
> >
> > Tried build with kernel 2.6.35.7 but got an error:
> > ( with ./configure -with-km )

...

> Also I can confirm that compiling drbd 8.3.9 on Ubuntu 10.10
> (maverick) also fails.

Should be fixed with
http://git.drbd.org/?p=drbd-8.3.git;a=commitdiff;h=d23e7fa9dd7c51160761ebcf9fa06a926b042001

For the tracing stuff, just don't compile it for now.
--- a/drbd/Makefile
+++ b/drbd/Makefile
@@ -44,7 +44,7 @@ ifneq ($(PATCHLEVEL),)
   endif

   CONFIG_BLK_DEV_DRBD := m
-  CONFIG_DRBD_TRACE := $(shell test $${SUBLEVEL} -ge 30 && echo m || echo n)
+  CONFIG_DRBD_TRACE := $(shell test $${SUBLEVEL} -ge 30 && test $${SUBLEVEL} -lt 35 && echo m || echo n)

   include $(DRBDSRC)/Makefile-2.6


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] building drbd 8.3.9 against upstream kernel

2010-11-07 Thread Lars Ellenberg
On Sun, Nov 07, 2010 at 12:42:15PM +0200, Or Gerlitz wrote:
> Attempting to build drbd 8.3.9 against 2.6.36 I have run into few build 
> errors, I have managed to get
> drbd_receiver.c and drbd_main.c to build fine with the below patch,
> 
> > make -C /lib/modules/2.6.36/source  O=/lib/modules/2.6.36/build 
> > SUBDIRS=/home/ogerlitz/linux/drbd/drbd-8.3.9/drbd  modules
> >   CC [M]  /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_buildtag.o
> >   CC [M]  /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_bitmap.o
> >   CC [M]  /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_proc.o
> >   CC [M]  /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_worker.o
> >   CC [M]  /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_receiver.o
> > /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_receiver.c: In function 
> > ‘write_flags_to_bio’:
> > /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_receiver.c:1818: error: 
> > ‘REQ_BCOMP_SYNC’ undeclared (first use in this function)
> [...]
> >  CC [M]  /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_main.o
> > /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_main.c: In function 
> > ‘bio_flags_to_wire’:
> > /home/ogerlitz/linux/drbd/drbd-8.3.9/drbd/drbd_main.c:2647: error: 
> > ‘DP_BCOMP_UNPLUG’ undeclared (first use in this function)

Should be fixed with
http://git.drbd.org/?p=drbd-8.3.git;a=commitdiff;h=d23e7fa9dd7c51160761ebcf9fa06a926b042001

> 
> but then drbd_tracing put some more serious challenges, so what would be the 
> best way to proceed. 

For the tracing stuff, just don't compile it for now.
--- a/drbd/Makefile
+++ b/drbd/Makefile
@@ -44,7 +44,7 @@ ifneq ($(PATCHLEVEL),)
   endif

   CONFIG_BLK_DEV_DRBD := m
-  CONFIG_DRBD_TRACE := $(shell test $${SUBLEVEL} -ge 30 && echo m || echo n)
+  CONFIG_DRBD_TRACE := $(shell test $${SUBLEVEL} -ge 30 && test $${SUBLEVEL} -lt 35 && echo m || echo n)

   include $(DRBDSRC)/Makefile-2.6


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd on virtio: WARNING: at block/blk-core.c

2010-11-09 Thread Lars Ellenberg
On Mon, Nov 08, 2010 at 01:56:43PM +0100, Thomas Vögtle wrote:
> Hello,
> 
> 
> For testing purposes only I test our software and drbd stuff on two
> Virtual Machines (kvm, virtio-net, virtio-blk)
> I'm using Kernel 2.6.32.25.
> 
> Since using drbd-8.3.9 I get following messages (or similar), again and
> again, when DRBD is starting to sync:
> 
> 
> [ 3830.713476] block drbd0: Began resync as SyncSource (will sync
> 7814892 KB [1953723 bits set]).
> [ 3829.057557] block drbd0: helper command: /sbin/drbdadm
> before-resync-target minor-0
> [ 3830.739016] [ cut here ]
> [ 3830.739143] WARNING: at block/blk-core.c:337 blk_start_queue+0x29/0x42()

void blk_start_queue(struct request_queue *q)
{
WARN_ON(!irqs_disabled());  <=== there

queue_flag_clear(QUEUE_FLAG_STOPPED, q);
__blk_run_queue(q);
}

> [ 3830.739145] Hardware name: Bochs
> [ 3830.739147] Modules linked in: ocfs2 jbd2 ocfs2_nodemanager
> ocfs2_stack_user ocfs2_stackglue dlm bonding dummy drbd cn 8021q garp
> bridge stp llc rpcsec_gss_krb5 nfsd exportfs nfs lockd fscache nfs_acl
> auth_rpcgss sunrpc xt_NOTRACK xt_TCPMSS xt_connmark xt_conntrack
> xt_CONNMARK xt_state xt_policy iptable_nat nf_nat_tftp nf_conntrack_tftp
> nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre
> nf_nat_irc nf_conntrack_irc nf_nat_sip nf_conntrack_sip nf_nat_ftp
> nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack
> autofs4 xfrm_user ipmi_devintf ipmi_msghandler 8139too lcd_module ppdev
> parport_pc parport st tpm_tis virtio_net tpm tpm_bios virtio_balloon
> i2c_piix4 rtc_cmos i2c_core rtc_core rtc_lib evdev button sg [last
> unloaded: ocfs2_stackglue]
> [ 3830.739351] Pid: 22400, comm: path_id Not tainted 2.6.32.25 #1
> [ 3830.739353] Call Trace:
> [ 3830.739355][] ? blk_start_queue+0x29/0x42
> [ 3830.739416]  [] warn_slowpath_common+0x77/0x8f
> [ 3830.739420]  [] warn_slowpath_null+0xf/0x11

> [ 3830.739422]  [] blk_start_queue+0x29/0x42
> [ 3830.739475]  [] blk_done+0xe0/0xfa

static void blk_done(struct virtqueue *vq)
{
struct virtio_blk *vblk = vq->vdev->priv;
struct virtblk_req *vbr;
unsigned int len;
unsigned long flags;

spin_lock_irqsave(&vblk->lock, flags);
while ((vbr = vblk->vq->vq_ops->get_buf(vblk->vq, &len)) != NULL) {
int error;

switch (vbr->status) {
case VIRTIO_BLK_S_OK:
error = 0;
break;
case VIRTIO_BLK_S_UNSUPP:
error = -ENOTTY;
break;
default:
error = -EIO;
break;
}

if (blk_pc_request(vbr->req)) {
vbr->req->resid_len = vbr->in_hdr.residual;
vbr->req->sense_len = vbr->in_hdr.sense_len;
vbr->req->errors = vbr->in_hdr.errors;
}

__blk_end_request_all(vbr->req, error);
list_del(&vbr->list);
mempool_free(vbr, vblk->pool);
}
/* In case queue is stopped waiting for more buffers. */
blk_start_queue(vblk->disk->queue); <<< THERE
spin_unlock_irqrestore(&vblk->lock, flags);
}
 

If your kernel source looks like mine, then this would indicate that
something in between the spin_lock_irqsave and spin_unlock_irqrestore
above re-enables interrupts where it must not.

If that something is some part of DRBD, then that would be a serious bug.

If you run with spin lock debug enabled, that may provide some more insight.
We'll try to reproduce here anyways.
You say you simply start drbd 8 in a VM with virtio-blk,
and that warning triggers?

> [ 3830.739514]  [] ? __rcu_process_callbacks+0xf2/0x2a6
> [ 3830.739557]  [] vring_interrupt+0x27/0x30
> [ 3830.739572]  [] handle_IRQ_event+0x2d/0xb7
> [ 3830.739575]  [] handle_edge_irq+0xc1/0x102
> [ 3830.739607]  [] handle_irq+0x89/0x94
> [ 3830.739610]  [] do_IRQ+0x5a/0xab
> [ 3830.739613]  [] ret_from_intr+0x0/0x11
> [ 3830.739624]  
> [ 3830.739627] ---[ end trace a9e0f5d8de037953 ]---
> [ 3830.739628] [ cut here ]
> 
> 
> I don't get any message like this on real hardware.
> 
> This is absolutely reproducible and still exists in git head
> (drbd-8.3.9-5-g7fed7c2).
> 
> It didn't exist in 8.3.8.1.
> 
> Except for the warning DRBD is syncing fine.
> 
> Any clues?
> 
> 
>Thomas

Thanks,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Drbd resource return 20 (unspecified)

2010-11-10 Thread Lars Ellenberg
On Wed, Nov 10, 2010 at 01:53:58PM +0800, Chen, Yanfei (NSN - CN/Cheng Du) 
wrote:
> Hi
> 
> We use redhat cluster + drbd architecture. Oracle use resource
> res_drbd_oracle
> The drbd version is 8.3.2   cluster version 2.0.46
> 
> We get the below error:
> 
> Nov  4 11:43:49 clnode1 xinetd[4432]: EXIT: http status=0 pid=12414
> duration=0(sec)
> Nov  6 15:25:30 clnode1 clurgmgrd[4691]:  status on drbd
> "res_drbd_oracle" returned 20 (unspecified) 
> Nov  6 15:25:30 clnode1 clurgmgrd[4691]:  Stopping service
> service:Oracle 
> Nov  6 15:25:47 clnode1 kernel: block drbd0: role( Primary -> Secondary
> ) 
> 
> 
> The redhat cluster call the function drbd_status  in drbd.sh to moniter
> status, which is from drbd
> 
> drbd_status() {
> role=$(drbdadm role $OCF_RESKEY_resource)
> case $role in
> Primary/*)
> return $OCF_RUNNING
> ;;
> Secondary/*)
> return $OCF_NOT_RUNNING
> ;;
> 
> esac
> return $OCF_ERR_GENERIC
> }

If that is indeed the script that is used,
exit code 20 is "impossible",
exit code will either be $OCF_ERR_GENERIC (which is 1),
$OCF_NOT_RUNNING (which is 7), or
$OCF_RUNNING (which is ... wait... WTF!)

OCF_RUNNING is non-existent. And as it is empty, it expands to nothing,
the statement becomes a plain "return", and return without an argument
is equivalent to "return $?", so it returns the exit status of the last
command, which was "drbdadm role".
Usually, if drbdadm role is able to determine the role, it exits 0; and
if the role came back as Primary/..., drbdadm clearly was able to
determine it. So this usually just worked "by accident".
Still, it should have been $OCF_SUCCESS there.
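
A minimal shell demonstration of that expansion (not from the actual
script):

  probe() {
      /bin/true            # last command, exit status 0
      return $UNSET_VAR    # expands to plain "return", i.e. "return $?"
  }
  probe; echo $?           # prints 0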

Why drbdadm role would have an exit code of 20,
while still returning Primary to stdout is beyond me for now.

But that is the only way I can see that the above shell code would return 20.

Unless, of course, the other $OCF_* variables are undefined as well, in
which case "return $OCF_ERR_GENERIC" would also have been empty, and
thus equivalent to "return $?". If that were the case, though, it would
be a better fit: if drbdadm could not determine the role for whatever
reason, $role would be empty, and drbdadm probably exits with 20.
But OCF_ERR_GENERIC being empty would mean that ocf-shellfuncs could not
be sourced, which I find a bit unlikely.


Please try this, and try to reproduce.
Once you have a reproducer, it will be easy to fix.
--- a/drbd.sh
+++ b/drbd.sh
@@ -68,7 +68,7 @@ drbd_status() {
 role=$(drbdadm role $OCF_RESKEY_resource)
 case $role in
Primary/*)
-   return $OCF_RUNNING
+   return $OCF_SUCCESS
;;
Secondary/*)
return $OCF_NOT_RUNNING


> This problem happened two times and lead oracle service restarted.
> Appricated you help us to understand what's the error 20 meaning? How
> could it happen?

Do you have any further logs, kernel or other, from the time period in question?
Or sysstat-like info about the general workload at that time?
Was the system particularly busy at the times when this happened?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Need help, we want invalidate if role=Secondary and uuid_compare rule==4

2010-11-11 Thread Lars Ellenberg
On Tue, Nov 09, 2010 at 10:08:55AM +, putcha narayana wrote:
> 
> 
> Hi
>  
> Can someone please respond to my request.

Upgrade drbd.
If it is still reproducible, post again.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] building drbd 8.3.9 against upstream kernel

2010-11-11 Thread Lars Ellenberg
On Mon, Nov 08, 2010 at 08:10:31AM -0800, fibrer...@gmail.com wrote:
> Hi Lars,
> 
> Despite this patch, compiling still fails on Ubuntu Maverick (10.10).
> Can you advise?

Works for me, though.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Low performance over NFS after upgrade from 0.7.25 to 8.2.7

2010-11-11 Thread Lars Ellenberg
On Thu, Nov 04, 2010 at 01:55:56PM +0100, Håkan Engblom wrote:
> 
> Hi drbd-users,
>  
> We use DRBD on a system with two fileservers. We have three replicated
> DRBD-partitions, and on top of that we use ext3 filesystem, and these
> are exported over NFS,
> pretty basic setup. NFS-server is in kernel. Linux kernel is
> 2.6.27.39, running on a x86_64 system. C-protocol for drbd.

> The problem we have is that after the upgrade to 8.2.7,

There is absolutely no reason to upgrade to 8.2; 8.2 is obsolete.
Upgrade to 8.3.

> the
> performance, when creating many small files from an NFS-client, has
> decreased drastically. 

Then read about the disk settings no-disk-barrier, no-disk-flushes and
no-md-flushes, which did not exist in 0.7 and have a performance impact.
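
Something like this in the disk section (a sketch; only safe if your
controller has a battery-backed write cache, so read the user's guide
before copying it):

  disk {
     no-disk-barrier;
     no-disk-flushes;
     no-md-flushes;
  }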


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problem mounting lvm snapshot

2010-11-11 Thread Lars Ellenberg
On Tue, Oct 26, 2010 at 04:20:40PM -0500, Lewis Donzis wrote:
> We'd like to be able to make backups from our DRBD secondary by mounting
> the underlying filesystems.  After some searching, this appears to be a
> relatively common discussion: running DRBD on top of LVM, making a
> snapshow of the backing LV on the secondary, and mounting the snapshot LV.
> As has been discussed, it's not perfect, but it should produce reasonable
> results most of the time, at least no worse than if there had been a power
> failure.
> 
> So, I'm sorry if this is a silly question, but I can't seem to get this to
> work.  It appears that DRBD permanently prevents mounting the backing
> device (or its snapshot) because the "drbdadm create-md" function
> modifies the filesystem

Nope.
It modifies the block device.

> so that it can no longer be mounted except via the /dev/drbdX
> device.
>
> In particular, attempting to mount a snapshot, or detaching DRBD
> from the backing device and attempting to mount the file system, or even
> just attempting to mount the backing device after create-md has run,
> results in:
> 
>mount: unknown filesystem type 'drbd'

how about
 mount -t ext4 -o your,favorite,mount,options /dev/$vg/$snap_lv /mnt/point
 (or ext3 or xfs or whatever it is you are actually using?)
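
The explicit -t matters: with internal meta-data at the end of the
device, blkid detects the drbd signature, which is where your
"mount: unknown filesystem type 'drbd'" comes from.

A rough sketch of the whole backup path (VG, LV names and size made up):

  lvcreate -s -L 4G -n data-snap /dev/vg0/data    # snapshot the backing LV
  mount -t ext3 -o ro /dev/vg0/data-snap /mnt/backup
  # ... run the backup ...
  umount /mnt/backup
  lvremove -f /dev/vg0/data-snap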

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd on virtio: WARNING: at block/blk-core.c

2010-11-13 Thread Lars Ellenberg
Sorry this post spent so much time in some moderation queue,
apparently you don't post from your subscription address,
or you are not subscribed. Anyways, see below.

On Tue, Nov 09, 2010 at 05:40:02PM +0100, Thomas Vögtle wrote:
> Lars Ellenberg wrote:
> > 
> > If your kernel source looks like mine, then this would indicate that
> > something in between the spin_lock_irqsave and spin_unlock_irqrestore
> > above re-enables interrupts where it must not.
> 
> 
> My kernel source is 2.6.32.25 (vanilla).
> 
> 
> > If that something is some part of DRBD, then that would be a serious bug.
> > 
> > If you run with spin lock debug enabled, that may provide some more insight.
> > We'll try to reproduce here anyways.
> 
> Switched on:
> CONFIG_DEBUG_SPINLOCK=y
> CONFIG_DEBUG_SPINLOCK_SLEEP=y
> 
> Hope this helps:

No, it does not...
no new information beyond your previous post.


Anyways,
we do it wrong...

commit 7f9c6c210158d212cc2c7be6d6b4d289078ab735
Author: Lars Ellenberg 
Date:   Wed Nov 10 10:33:21 2010 +0100

drbd: use irqsave in bio endio callback

We used spin_lock_irq, spin_unlock_irq.  The later may re-enable irq too
early if we have been called with irq disabled, opening up a window
for all sorts of problems.

diff --git a/drbd/drbd_req.h b/drbd/drbd_req.h
index 2260e4f..f759b05 100644
--- a/drbd/drbd_req.h
+++ b/drbd/drbd_req.h
@@ -338,18 +338,21 @@ static inline int _req_mod(struct drbd_request *req, enum 
drbd_req_event what)
return rv;
 }
 
-/* completion of master bio is outside of spinlock.
- * If you need it irqsave, do it your self! */
+/* completion of master bio is outside of our spinlock.
+ * We still may or may not be inside some irqs disabled section
+ * of the lower level driver completion callback, so we need to
+ * spin_lock_irqsave here. */
 static inline int req_mod(struct drbd_request *req,
enum drbd_req_event what)
 {
+   unsigned long flags;
struct drbd_conf *mdev = req->mdev;
struct bio_and_error m;
int rv;
 
-   spin_lock_irq(&mdev->req_lock);
+   spin_lock_irqsave(&mdev->req_lock, flags);
rv = __req_mod(req, what, &m);
-   spin_unlock_irq(&mdev->req_lock);
+   spin_unlock_irqrestore(&mdev->req_lock, flags);
 
if (m.bio)
complete_master_bio(mdev, &m);


Only, we have been doing it wrong for a long time already.
So I don't really see why it would only show up in 8.3.9...

Hm. Wait.
No, we used to do it correct. My bad.
in
commit 9b7f76dc37919ea36caa9680a3f765e5b19b25fb
Author: Lars Ellenberg 
Date:   Wed Aug 11 23:40:24 2010 +0200

drbd: new configuration parameter c-min-rate

We now track the data rate of locally submitted resync related requests,
and can thus detect non-resync activity on the lower level device.

If the current sync rate is above c-min-rate, and the lower level device
appears to be busy, we throttle the resyncer.


a bad chunk slipped through, replacing the correct
spin_lock_irqsave; __req_mod(); etc. with a plain req_mod(),
which only does spin_lock_irq.
Sorry.

I'll revert req_mod to plain spin_lock_irq,
and revert the endio callback to use the spin_lock_irqsave.

I think that should do it.

Thanks.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD and CoroSync fencing and unfencing not working

2010-11-13 Thread Lars Ellenberg
On Fri, Nov 12, 2010 at 01:25:22PM -0500, Georges-Etienne Legendre wrote:
> Hi,
> 
> I'm testing my DRBD + CoroSync cluster. I've come across a situation that I'm 
> not sure is supported.
> 
> First, my setup:
> - I have a 2 nodes setup, with dual ring for CoroSync.
> - Stonith is configured in CoroSync.
> - DRBD is using a cross-over link between the 2 nodes.
> - DRBD is configured to fence/unfence peer (resource-only) with the scripts 
> (crm-fence/unfence-peer.sh).
> 
> Test case:
> - The cross-over link becomes unavailable (simulated with "ifdown ethX")
> - DRBD fences the peer
> - Then, 2nd failure: the secondary node (DRBD is secondary) node is crashed 
> (e.g. hardware issue on this server, simulated by resetting the server with 
> the ILO).
> 
> The problem:
> When secondary node comes back, CoroSync doesn't see the node coming
> back. The node appears as "Offline" even though CoroSync is started
> and network interface is up. To recover from that situation, I had to
> remove CoroSync constraints, and then reboot the primary node.

That has nothing to do with DRBD or its fencing scripts.

> Is this supposed to work by automatically unfencing the peer?
> Is there something I'm doing wrong here?

Do you need to reset the fault status of your rings on the remaining
node using corosync-cfgtool?  Corosync apparently does not heal itself,
but needs administrative help every now and then.
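
For example (a sketch):

  corosync-cfgtool -s   # show the fault status of the rings
  corosync-cfgtool -r   # reset rings marked FAULTY so they are used again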

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd on virtio: WARNING: at block/blk-core.c

2010-11-13 Thread Lars Ellenberg
On Sat, Nov 13, 2010 at 02:24:05PM +, Stefan Hajnoczi wrote:
> Does this fix the issue?  Compiled but not tested.
> 
> Subject: [PATCH] drbd: Use irqsave/irqrestore for req_mod() in 
> drbd_endio_pri()
> 
> It is not safe to call req_mod() from drbd_endio_pri() since it uses
> spin_lock_irq()/spin_unlock_irq().  Instead use irqsave/irqrestore and
> call __req_mod() so that local irq mode is preserved.

Right, thanks for your work.

See also my other post.

I previously only "fixed" req_mod(),
wondering why that would show up only now.
But then I found the other commit that broke it, back in August, which
contained a chunk that basically looks like your patch below, reversed
  :(

So it's fixed in our internal git already,
the fix should show up in public git early next week.

> Signed-off-by: Stefan Hajnoczi 
> ---
>  drivers/block/drbd/drbd_worker.c |9 -
>  1 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/block/drbd/drbd_worker.c 
> b/drivers/block/drbd/drbd_worker.c
> index b0551ba..b136fb8 100644
> --- a/drivers/block/drbd/drbd_worker.c
> +++ b/drivers/block/drbd/drbd_worker.c
> @@ -197,6 +197,8 @@ void drbd_endio_pri(struct bio *bio, int error)
>   struct drbd_request *req = bio->bi_private;
>   struct drbd_conf *mdev = req->mdev;
>   enum drbd_req_event what;
> + struct bio_and_error m;
> + unsigned long flags;
>   int uptodate = bio_flagged(bio, BIO_UPTODATE);
>  
>   if (!error && !uptodate) {
> @@ -221,7 +223,12 @@ void drbd_endio_pri(struct bio *bio, int error)
>   bio_put(req->private_bio);
>   req->private_bio = ERR_PTR(error);
>  
> - req_mod(req, what);
> + spin_lock_irqsave(&mdev->req_lock, flags);
> + __req_mod(req, what, &m);
> + spin_unlock_irqrestore(&mdev->req_lock, flags);
> +
> + if (m.bio)
> +     complete_master_bio(mdev, &m);
>  }
>  
>  int w_read_retry_remote(struct drbd_conf *mdev, struct drbd_work *w, int 
> cancel)
> -- 
> 1.7.2.3
> 

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Removed resources not disappearing: 17:??not-found??

2010-11-17 Thread Lars Ellenberg
On Wed, Nov 17, 2010 at 12:36:19PM -, Robert Dunkley wrote:
> Hi everyone,
> 
> 
> 
> I removed quite a few DRBD resources from the config file on both
> servers involved and some of them have appeared as shown below:
> 17:??not-found??  WFConnection  Secondary/UnknownDiskless/DUnknown
> C
> 18:??not-found??  Connected Secondary/Secondary  Diskless/Diskless
> C
> 
> The results line up on the other system (DUnknown status resources are
> not showing at all on the other system). Is there any way to clear these
> resources out without a reboot?

You should
drbdsetup 17 down
drbdsetup 18 down

(the network part is still configured...)

Usually, you _first_ unconfigure things,
and only then remove them from the config.
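
I.e., the order should be something like this (sketch, resource name
made up):

  drbdadm down r17    # unconfigure while the section is still in drbd.conf
  # only now delete the resource section from drbd.conf, on both nodes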

There currently is no way to completely remove "Unconfigured" devices
from /proc/drbd short of a module unload,
but they should no longer show up in drbd-overview.

> DRBD seems to be holding the associated devices in use so I can't remove
> them from devmapper.

That is nonsense, I dare say.
If it's "Diskless", it does not hold anything in use.
So whatever keeps you from removing things from devmapper,
it is not DRBD.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problem with DRBD sync stalling with vmware vmxnet3 10Gb adapter

2010-11-20 Thread Lars Ellenberg
On Fri, Nov 19, 2010 at 10:54:20AM -0800, Sean McCreadie wrote:
> Hello,
> 
> I am having issues using the vmware 10Gb virtual adapter with vmxnet
> driver and drbd.  When I go to sync the resources, the sync will
> always stall at some point.  I have confirmed that the vmxnet driver
> is the culprit, as I have no issues using the e1000 NIC and driver on
> the same VMs.

seems that at least you are not alone :-/
http://communities.vmware.com/thread/269297
(just one of the first hits...)

I suggest you stress-test the vmxnet3 thingy without DRBD for a while.
Find a simple test case, and take that to your support contact.
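
E.g. run something like iperf between the two VMs for a few hours
(untested sketch, numbers made up):

  # on one node:
  iperf -s
  # on the other: a long run, both directions, over the vmxnet3 link
  iperf -c <peer-ip> -d -t 7200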

> My setup is this:
> 
> -  ESXi 4.1 on new HP DL 380 G7 servers
> -  Onboard Broadcom NC382i gigbit adapter
> -  Add on Emulex OneConnect 10Gb SFP adapter
> -  Two identical fedora 13 VMs setup with drbd version 8.3.7 installed
> -  One SFP+ cable running direct between the two 10Gb NICs on each 
> server, no switch.
> -  One 1Gb CAT6 cable running direct between two 1Gb NICs on each 
> server, no switch.
> 
> When I add a vmxnet3 virtual adapter to the VMs and start drbd, it
> begins syncing and then quickly will stall.  The behavior is identical
> when I map the virtual NIC to either the Emulex 10Gb adapter on the
> ESXi host, or the Broadcom 1Gb adapter. So I know its not the physical
> link that's the issue.
> 
> Conversely when I add the e1000 NIC to the VMs and start drbd,
> everything syncs fine with no issues, and again this is the same no
> matter what physical link I connect the VMs to.
> 
> I installed vmware tools and verified it loaded the vmxnet3 module
> successfully.
> 
> I have looked in the messages log but haven't found anything, is there
> another log I should be reviewing?
> 
> Thanks in advance for any insight into this.
> 
> Sean


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Compile Error Against RackSpace Cloud Linux Kernel

2010-11-23 Thread Lars Ellenberg
On Tue, Nov 23, 2010 at 01:44:13AM -0500, Sean Carey wrote:
> Folks,
> I am trying to compile the drbd kernel module against the rackspace cloud
> linux kernel and am getting the following error. I followed the online doc
> and I can get the build to work against generic headers, just not RS Cloud.
> Any help would be greatly appreciated.
> 
> 
> The Error:
> 
> test -f ../scripts/adjust_drbd_config_h.sh && \
>  KDIR=/lib/modules/2.6.35.4-rscloud/build O= /bin/bash
> ../scripts/adjust_drbd_config_h.sh
> /lib/modules/2.6.35.4-rscloud/build /usr/src/drbd-8.3.7/drbd
> /usr/src/drbd-8.3.7/drbd


You do realise that 2.6.35 already contains DRBD?

You could try with latest DRBD git (if you want to have a more recent
DRBD than the one shipped with 2.6.35), it should be aware of a few
specialties when building external drbd modules against kernel sources
already containing drbd.

Or just use the one shipped with 2.6.35.

You should use a DRBD userland with the same version as your kernel module.
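
To compare the two (sketch; exact output format varies by version):

  cat /proc/drbd    # first lines show the kernel module version
  drbdadm -V        # shows the userland (drbdadm) version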


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] uuid_compare()=-1000 by rule 100

2010-11-25 Thread Lars Ellenberg
On Thu, Nov 25, 2010 at 11:32:02AM +0100, Pavlos Parissis wrote:

> I guess the result of -1000 on uuid_compare is quite cryptic and
> doesn't give you much information on the root cause.

Then don't focus on that line, but on the next:
Nov 24 15:00:27 pbxsrv3 kernel: block drbd2: Unrelated data, aborting!

Still cryptic?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Build rpm package for redhat don't work well

2010-11-29 Thread Lars Ellenberg
On Wed, Nov 24, 2010 at 09:32:08AM +0800, Zhu, Jummy (NSN - CN/Cheng Du) wrote:
> Hi,
> 
> I try to build the 8.3.9 rpm package on RHEL5.3 with kernel 2.6.18-194.17.4.el5,
> but I found that none of the --with-package or --without-package options work. For
> example, if I enable --with-rgmanager, after make rpm there's no rgmanager
> rpm package generated.
> And I used --without-pacemaker --without-heartbeat, but those two rpms were
> generated. Please see my output below:
> 
> # uname -r
> 2.6.18-194.17.4.el5
> 
> # ./configure --with-utils --with-km --without-udev --without-xen 
> --without-pacemaker --without-heartbeat --with-rgmanager 
> --without-bashcompletion --without-distro

> But when I run make rpm I got:
> [r...@clnode1(Emma) /home/oracle/drbd-8.3.9]
> # make rpm

> You have now:
> /usr/src/redhat/RPMS/x86_64/drbd-udev-8.3.9-1.x86_64.rpm
> /usr/src/redhat/RPMS/x86_64/drbd-bash-completion-8.3.9-1.x86_64.rpm
> /usr/src/redhat/RPMS/x86_64/drbd-pacemaker-8.3.9-1.x86_64.rpm
> /usr/src/redhat/RPMS/x86_64/drbd-heartbeat-8.3.9-1.x86_64.rpm
> /usr/src/redhat/RPMS/x86_64/drbd-8.3.9-1.x86_64.rpm
> /usr/src/redhat/RPMS/x86_64/drbd-xen-8.3.9-1.x86_64.rpm
> /usr/src/redhat/RPMS/x86_64/drbd-utils-8.3.9-1.x86_64.rpm
> 
> The rgmanager one is not there, but all the other ones that I don't want are. Why?

does
make rpm "RPMOPT=--with rgmanager"
give you the desired result?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] N-to-N network setup

2010-11-29 Thread Lars Ellenberg
On Fri, Nov 26, 2010 at 12:48:28PM -0500, Seyed Amir Hejazi wrote:
> Hi all,
> 
> I need to setup a three node  N-to-N network. I was wondering how should I
> set up the configuration file for drbd in order to make all of my nodes
> primary. I know that in the two node setup you only need to user

DRBD still does not allow more than two primaries.

If you step back a bit, what is it you are _actually_ trying to achieve?

> startup {
> become-primary-on both;
> 
> }
> net {
> allow-two-primaries;
> 
> }
> 
> Can anyone kindly help me setting up my network. I am using ubuntu 10.4
> machines as my nodes and I want to integrate drbd with pacemaker.
> 
> Regards,
> -- 
> Amir


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Disk errors at smartctl -a

2010-11-29 Thread Lars Ellenberg
On Tue, Nov 23, 2010 at 11:55:17AM -0800, chambal wrote:
> When I do "smartctl -a /dev/sda" on this system, it usually
> triggers errors in the system log.

Wrong list?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problem using DRBD 8.3.8.1/Intel 10Gb/x64/2.6.36

2010-11-29 Thread Lars Ellenberg
On Sun, Nov 21, 2010 at 02:30:02PM +0100, Laurent Caron wrote:
> Hi,
> 
> I did migrate an active passive mail server from Debian x32
> (2.6.22/Intel Gb/Drbd 8.???) to Debian x64 (Lenny) (2.6.36/Intel
> 10Gb/Drbd 8.3.8.1).
> 
> More often than not, the directories located on the DRBD device (on the
> master) become unaccessible.

> Is this a known issue ?

Sorry. What exactly is the issue?
What do you mean by "become unaccessible"?

> My 10Gb cards seems to work fine apart from this.
> 
> Should I try syncing over Gb links ?


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] 8.3.7: UpToDate/UpToDate <-> UpToDate/Inconsistent

2010-11-29 Thread Lars Ellenberg
On Thu, Nov 25, 2010 at 11:10:20PM +0100, Ekkard Gerlach wrote:
> Hi, 
> 
> with 8.3.7 on two installations with 2.6.32-bpo.5-vserver-amd64 I get quite 
> often
> a asymmetric information about consistency. At the moment I have such a 
> state: 
> 
> The primary site claims the other side to be inconsistent:
> =
> prax1:/usr/local/bin# cat /proc/drbd
> version: 8.3.7 (api:88/proto:86-91)
> srcversion: EE47D8BF18AC166BE219757

>  2: cs:Connected ro:Primary/Secondary ds:UpToDate/Inconsistent C r   
> <<<<<<< !! 

> The secondary thinks it is UpToDate:
> =
> prax2:/usr/local/bin# cat /proc/drbd
> version: 8.3.7 (api:88/proto:86-91)
> srcversion: EE47D8BF18AC166BE219757

>  2: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r   <<<<<<< 
> !!!

> Who tells the truth? 
> 
> The older system running for 3 months (on the same hardware) had the same
> problem at the beginning, say the first week. But never after. In this actual
> installation I'm in the first week. Perhaps DRBD becomes calm and wise after a
> week, like a teenager in their 20's or 30's?

It's a mostly cosmetic issue, and has since been fixed.
Upgrade your kernel module.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] pacemaker/corosync fence drbd on digest integrity error - do not wait for reconnect

2010-11-29 Thread Lars Ellenberg
On Wed, Nov 17, 2010 at 12:36:32PM -0800, Dmitry Golubev wrote:
> 
> Hi,
> 
> I have a nasty problem with my cluster. For some reason it sometimes fails
> DRBD with "Digest integrity check FAILED". If I understand this correctly,
> that is OK and DRBD will reconnect at once. However before it does that, the
> cluster fences the secondary node and thus disables any possibility of
> cluster ever working again - until I manually clear the fencing rules out of
> crm config. The log looks like this:
> 
> 
> Nov 17 18:30:52 srv1 kernel: [2299058.247328] block drbd1: Digest integrity
> check FAILED.

> What can I do to fight this? I have no idea why the communication fails
> sometimes, although the NICs and cabling are perfect. However I read in
> mailing lists that it might happen with some NIC/kernel combinations. Can we
> force the cluster software to wait for the reconnect a little bit?

Disable digest-integrity.
Buffers seem to be modified while being written out.
Which I consider bad behaviour. But it is still "legal",
and all filesystems seem to do it under certain circumstances.
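
I.e. drop the option from your net section and adjust on both nodes;
a made-up sketch:

  net {
    # data-integrity-alg sha1;   # delete/comment this out to disable
    ...
  }

  drbdadm adjust <resource>      # on both nodes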

Digest-integrity with dual-primary is not a very good idea.
Also, dual-primary mode of drbd does not necessarily add to your
availability, so think twice if you really need it.

We may add some workarounds in later versions of DRBD (copying
all data to private pages first, before we further process it),
which will obviously have additional performance impact.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] N-to-N network setup

2010-11-29 Thread Lars Ellenberg
On Mon, Nov 29, 2010 at 10:15:45AM -0500, Seyed Amir Hejazi wrote:
> I am trying to design a database in which all three of my nodes are primary
> and have read/write permission. Moreover, they are consistent with each
> other; in other words, all three nodes are in sync and every modification of
> data is disseminated amongst all of them. I need the N-to-N setup so that I
> can use all three nodes to load balance the write actions.

Good luck.
You cannot (yet?) have more than two nodes
accessing the same DRBD concurrently.

You may be able to do it on a cluster file system on top of iSCSI,
and of course you can have iSCSI on top of a failover DRBD cluster.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd on ramdisks

2010-11-29 Thread Lars Ellenberg
On Mon, Nov 29, 2010 at 04:18:25PM -0200, Andre Nathan wrote:
> On Fri, 2010-11-19 at 11:37 -0200, Andre Nathan wrote:
> > Hello
> > 
> > I'm trying to setup drbd on a ramdisk. The idea is to use it for session
> > storage in a webserver cluster. However, I'm getting the following
> > error:
> > 
> > # drbdadm  create-md r2
> > Writing meta data...
> > initializing activity log
> > pwrite(13,...,32768,536834048) in md_initialize_common:AL failed:
> > Input/output error
> > Command 'drbdmeta 2 v08 /dev/ram0 internal create-md' terminated with
> > exit code 10
> > 
> > My resource configuration and an strace on the above drbdmeta command
> > can be found at
> > 
> >   http://pastebin.com/feNbVwmz
> > 
> > Is this kind of configuration supported?
> 
> 
> So, is anyone using this successfuly? Is this a bug or ramdisks are
> really not supported?

Worked for me last time I tried.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Understanding degr-wfc-timeout

2010-12-03 Thread Lars Ellenberg
On Fri, Dec 03, 2010 at 02:13:09AM +, Andrew Gideon wrote:
> On Thu, 02 Dec 2010 13:36:25 -0600, J. Ryan Earl wrote:
> 
> > If you "gracefully" stop DRBD on one
> > node, it's not "degraded."  Degraded is from like a non-graceful
> > separation due to a crash, power-outage, network issue, etc where one
> > end detects the other is gone instead of being told to gracefully close
> > connection between nodes.
> 
> I issued a "stop" (a graceful shutdown) only after I broke the DRBD 
> connection by blocking the relevant packets.  So before the stop, the 
> cluster was in a degraded state:
> 
>  1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> 
> Using "stop" still causes a clean shutdown which then avoids degr-wfc-
> timeout?
> 
> Is there any way that a network issue, or anything else short of a crash 
> of the system, can invoke degr-wfc-timeout?  I've even tried 'kill -9' of 
> the drbd processes, but they seem immune to this.
>
> I can force a system crash if I have to, but that's something of pain in 
> the neck so I'd prefer another option if one is available.
> 
> Or have I misunderstood?  I've been assuming that degr-wfc-timeout 
> applies only to the WFC at startup (because the timeout value is in the 
> startup block of the configuration file).  Is this controlling some other 
> WFC period?
> 
> When I break the connection (and with no extra fencing logic specified), 
> I see that both nodes go into a WFC state.  But this is lasting well 
> longer than the 60 seconds I have defined in degr-wfc-timeout.

See if my post
[DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary
dated Tue Jan 8 11:56:00 CET 2008 helps.
http://lists.linbit.com/pipermail/drbd-user/2008-January/008223.html
and other archives

BTW, that setting only affects drbdadm/drbdsetup wait-connect, as used
for example by the init script, if used without an explicit timeout.
It does not affect anything else.
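
I.e., roughly (flags from memory, untested -- check the man pages):

  drbdadm wait-connect all                         # what the init script does
  drbdsetup /dev/drbd1 wait-connect -t 120 -d 60   # explicit wfc/degr-wfc timeouts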

What is it you are trying to prove/trying to achieve?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] error code 17

2010-12-03 Thread Lars Ellenberg
On Fri, Dec 03, 2010 at 09:48:33PM +0530, Sachin Gupta wrote:
> state change failed (-2): need access to up to data
> terminated with exit code 17

At least copy'n'paste the error message correctly.
It is "Need access to UpToDate data".

> why am i getting this error ?

Because you do not have access to up-to-date data.


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Understanding degr-wfc-timeout

2010-12-06 Thread Lars Ellenberg
On Fri, Dec 03, 2010 at 05:12:23PM +, Andrew Gideon wrote:
> On Fri, 03 Dec 2010 09:43:11 +0100, Lars Ellenberg wrote:
> 
> > See if my post
> > [DRBD-user] DRBD Failover Not Working after Cold Shutdown of Primary
> > dated Tue Jan 8 11:56:00 CET 2008 helps.
> > http://lists.linbit.com/pipermail/drbd-user/2008-January/008223.html and
> > other archives
> 
> Perhaps I'm still not grasping this, but - based on that URL - I thought 
> the situation below would make use of degr-wfc-timeout:
> 
> I'd two nodes, both primary.
> 
> Using iptables, I "broke" the connection.  Both nodes were still up, but 
> reporting:
> 
>  1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> 
> and
> 
>  1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C r
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> 
> I then crashed ("xm destroy") one node and then booted it.  As I 
> understand the above-cited post, this should have make use of the degr-
> wfc-timeout value but - apparently - it did not:
> 
> Starting drbd:  Starting DRBD resources: [ 
> drbd1
> Found valid meta data in the expected location, 16105058304 bytes into /
> dev/xvdb1.
> d(drbd1) drbd: bd_claim(cfe1ad00,cc00c800); failed [d108e4d0;c0478e79;1]
> 1: Failure: (114) Lower device is already claimed. This usually means it 
> is mounted.

There.  It cannot even attach.
Because it cannot attach, it cannot read its meta data.
Thus it does not know anything about itself.

> [drbd1] cmd /sbin/drbdsetup 1 disk /dev/xvdb1 /dev/xvdb1 internal --set-
> defaults --create-device  failed - continuing!

You better make sure xvdb1 is not used by someone else
at the time your drbd tries to attach it.

You may need to fix your fstab, or your lvm.conf,
or your initrd, or whatever other "magic" is going on there.
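
To find the claimer, something along these lines:

  fuser -v /dev/xvdb1                 # userspace holding it open?
  grep xvdb1 /proc/mounts             # mounted directly somewhere?
  ls /sys/block/xvdb/xvdb1/holders/   # kernel-level claim (dm, md, ...)?

and check the filter setting in your lvm.conf.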

> s(drbd1) n(drbd1) ]..
> ***
>  DRBD's startup script waits for the peer node(s) to appear.
>  - In case this node was already a degraded cluster before the
>reboot the timeout is 60 seconds. [degr-wfc-timeout]
>  - If the peer was available before the reboot the timeout will
>expire after 0 seconds. [wfc-timeout]
>(These values are for resource 'drbd1'; 0 sec -> wait forever)
>  To abort waiting enter 'yes' [ 208]:
> 
> 
> What am I doing/understanding wrong?

The disk you ask DRBD to attach to is used by something else,
file system, device mapper, whatever.  Fix that.

> 
> > BTW, that setting only affects drbdadm/drbdsetup wait-connect, as used
> > for example by the init script, if used without an explicit timeout. It
> > does not affect anything else.
> > 
> > What is it you are trying to prove/trying to achieve?
> 
> At this point, I'm trying to understand DRBD.  Specifically in this case, 
> I'm trying to understand the startup process and it deals with various 
> partition/split-brain cases.  I come from a Cluster Suite world, where 
> "majority voting" is the answer to these issues, so I'm working to come 
> up to speed on how these issues are addressed by DRBD.
> 
> The idea of waiting forever seems like a problem if only one node is 
> available to go back into production.  I know that the wait can be 
> overridden manually, but is there a way to not wait forever?
> 
> This is the context in which I started looking at degr-wfc-timeout.  
> 
> FWIW, I've also posted in the thread "RedHat Clustering Services does not 
> fence when DRBD breaks" trying to understand the fencing process.  I 
> think I managed to suspend all I/O in the case of a fence failure (the 
> handler returning a value of 6), but I'm not sure.  Does:
> 
>  1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/DUnknown C s
> ns:0 nr:0 dw:4096 dr:28 al:1 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
> 
> indicate suspension?  Is that what "s" means?

at that position, yes, that means application io is
s: suspended, or r: running/resumed.
you can manually resume with "drbdadm resume-io"

> I've failed to find documentation for that bit of string in /proc/drbd.

Is that so.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problem using DRBD 8.3.8.1/Intel 10Gb/x64/2.6.36

2010-12-06 Thread Lars Ellenberg
On Wed, Dec 01, 2010 at 08:53:58PM +0100, Florian Haas wrote:
> On 12/01/2010 08:27 PM, Laurent CARON wrote:
> > On 29/11/2010 10:57, Lars Ellenberg wrote:
> >> Sorry. What exactly is the issue?
> >> What do you mean by "become unaccessible"?
> > 
> > 
> > Hi,
> > 
> > The load climbs to 100 or 200.
> 
> Try using the deadline I/O scheduler.

And, you said you are coming from
Debian something (2.6.22/Intel Gb/Drbd 8.???)
to Debian x64 (Lenny) (2.6.36/Intel 10Gb/Drbd 8.3.8.1).

Depending on the value of 8.???,
you may want to read about the drbd config options
 no-disk-barrier, no-disk-flushes, no-md-flushes ...
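
E.g. (a sketch, not a recommendation: only safe with battery backed
write cache):

  disk {
    no-disk-barrier;
    no-disk-flushes;
    no-md-flushes;
  }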

Other than that, there is still
http://www.linbit.com/en/produkte-services/drbd-consulting/

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] strange split-brain problem

2010-12-07 Thread Lars Ellenberg
On Mon, Dec 06, 2010 at 06:08:19PM +0100, Klaus Darilion wrote:
> Hi all!
> 
> Today I had a strange experience.
> 
> node A: 192.168.100.100, cc1-vie
>   /dev/drbd1: primary
>   /dev/drbd5: primary
> 
> node B: 192.168.100.101, cc1-sbg
>   /dev/drbd1: secondary
>   /dev/drbd5: secondary
> 
> The /dev/drbdX devices are used by a xen domU.
> 
> resource manager-ha {
>   startup {
> become-primary-on cc1-vie;
>   }
>   on cc1-vie {
> device/dev/drbd1;
> disk  /dev/mapper/cc1--vienna-manager--disk--drbd;
> address   192.168.100.100:7789;
> meta-disk internal;
>   }
>   on cc1-sbg {
> device/dev/drbd1;
> disk  /dev/mapper/cc1--sbg-manager--disk--drbd;
> address   192.168.100.101:7789;
> meta-disk internal;
>   }
> }
> 
> resource cc-manager-templates-ha {
>   startup {
> become-primary-on cc1-vie;
>   }
>   on cc1-vie {
> device/dev/drbd5;
> disk  /dev/mapper/cc1--vienna-cc--manager--templates--drbd
> address   192.168.100.100:7793;
> meta-disk internal;
>   }
>   on cc1-sbg {
> device/dev/drbd5;
> disk  /dev/mapper/cc1--sbg-cc--manager--templates--drbd
> address   192.168.100.101:7793;
> meta-disk internal;
>   }
> }
> 
> Everything was running fine. Then I rebooted both servers. Then I spotted:
> 
> block drbd5: Starting worker thread (from cqueue [1573])
> block drbd5: disk( Diskless -> Attaching )
> block drbd5: Found 4 transactions (192 active extents) in activity log.
> block drbd5: Method to ensure write ordering: barrier
> block drbd5: Backing device's merge_bvec_fn() = 81431b10
> block drbd5: max_segment_size ( = BIO size ) = 4096
> block drbd5: drbd_bm_resize called with capacity == 41941688
> block drbd5: resync bitmap: bits=5242711 words=81918
> block drbd5: size = 20 GB (20970844 KB)
> block drbd5: recounting of set bits took additional 0 jiffies
> block drbd5: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> block drbd5: Marked additional 508 MB as out-of-sync based on AL.
> block drbd5: disk( Attaching -> UpToDate )
> 
> 
> This is the first thing which makes me nervous: There were 500MB to
> synchronize although the server was idle and everything was
> synchronized before rebooting.

As was pointed out already,
read up on what we call the activity log.

> Then some more reboots on node A and suddenly:
> 
> block drbd5: State change failed: Refusing to be Primary without at
> least one UpToDate disk
> block drbd5:   state = { cs:WFConnection ro:Secondary/Unknown
> ds:Diskless/DUnknown r--- }
 

You failed to attach, you have not yet connected,
so DRBD refuses to become Primary: which data should it be Primary with?

> Then the status on node A was:
> 
> cc-manager-templates-ha  Connected Primary/Secondary
> Diskless/UpToDate A r

It was able to establish the connection,
and was going Primary with the data of the peer.

> When I tried to manually attach the device I got error messages:
> "Split-Brain detected, dropping connection".

Hm.  Ugly.
It should refuse the attach instead.
Did it just get the error message wrong,
or did it actually disconnect there?
What DRBD version would that be?

> After some googling without finding any hint suddenly the status changed:
> 
> cc-manager-templates-ha  StandAlone Primary/Unknown
> UpToDate/DUnknown r xen-vbd: _cc-manager
> 
> 
> So, suddenly this one device is not connected anymore. All the other
> drbd devices are still connected and working fine - only this single
> device is making problems, although it has identical configuration.
> 
> 
> What could cause such an issue? Everything was working fine, I just
> rebooted the servers.
> 
> Any hints what to do now to solve this issue?

Your setup is broken.
Apparently something in your boot process, at least "sometimes",
claims the lower level devices so DRBD fails to attach.
 Fix that.

Your shutdown process is apparently broken enough to
not really shut down everything and demote/down DRBD,
so it stays Primary. That makes an "orderly" shutdown/reboot
look like a Primary crash to DRBD.
 Fix that.

Are you sure that you have been the only one tampering with DRBD at the
time, or would heartbeat/pacemaker/whatever try to do something at the
same time?

And, BTW, no.
Your /etc/hosts file has zero to do with how DRBD behaves.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] strange split-brain problem

2010-12-07 Thread Lars Ellenberg
On Tue, Dec 07, 2010 at 08:36:01PM +0100, Klaus Darilion wrote:
> Hi Lars!

> >>Then some more reboots on node A and suddenly:
> >>
> >>block drbd5: State change failed: Refusing to be Primary without at
> >>least one UpToDate disk
> >>block drbd5:   state = { cs:WFConnection ro:Secondary/Unknown
> >>ds:Diskless/DUnknown r--- }
> >  
> >
> >You failed to attach, you have not yet connected,
> >so DRBD refuses to become Primary: which data should it be Primary with?
> 
> but how can it be Secondary without a disk?

Oh the wonders of DRBD ;-)
Well, you told it to.
It's completely legal to tell a DRBD to connect to its peer
without having a local disk attached.  It's unusual, though.

> 
> >>Then the status on node A was:
> >>
> >>cc-manager-templates-ha  Connected Primary/Secondary
> >>Diskless/UpToDate A r
> >
> >It was able to establish the connection,
> >and was going Primary with the data of the peer.
> 
> Is this a feature? How can it know that the peer's data is up2date
> when it cannot attach to the local disk?

You told it to.  DRBD typically does what it is told,
unless it happens to know better for sure
(and even then you can force it, usually).

If you tell it to connect without first attaching a local disk,
and you don't have resource level fencing mechanisms in place
so the remote end assumes itself to be uptodate,
that's your problem.
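
With pacemaker, resource level fencing would look roughly like this
(handler paths may differ on your install):

  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }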

> >>When I tried to manually attach the device I got error messages:
> >>"Split-Brain detected, dropping connection".
> >
> >Hm.  Ugly.
> >It should refuse the attach instead.
> >Did it just get the error message wrong,
> >or did it actually disconnect there?
> >What DRBD version would that be?
> 
> Ubuntu 10.04:
> # /etc/init.d/drbd status
> drbd driver loaded OK; device status:
> version: 8.3.7 (api:88/proto:86-91)
> GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
> r...@cc1-sbg, 2010-10-14 15:13:20

> >And, BTW, no.
> >Your /etc/hosts file has zero to do with how DRBD behaves.
> 
> At least I can reproduce the bad behavior when adding the bug to
> /etc/hosts. I think it has something todo how I address the disk.
> The one volume which is working fine is configured with:
>   disk /dev/mapper/cc1--vienna-manager--disk--drbd
> 
> The other volume which causes the problems is configured with
>   disk /dev/cc1-vienna/cc-manager-templates-drbd
> which is a symlink to
>   /dev/mapper/cc1--vienna-cc--manager--templates--drbd
> 
> So, I have no idea why, but it seems that if /etc/hosts is broken
> then the symlinks are not available when DRBD starts. When I
> stop/start the DRBD service after booting up, DRBD attaches to the
> disks fine. Strange.

At best, changing stuff in /etc/hosts changes some timing during
your boot process. Which means it is still broken since racy.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] "To resolve this both sides have to support at least protocol" after upgrade

2010-12-09 Thread Lars Ellenberg
On Wed, Dec 08, 2010 at 12:37:07PM +1100, Tim Sharpe wrote:
> Hi all,
> 
> I was upgrading one of our older DRBD pairs from 8.3.2 to 8.3.7 today
> and ran into a bit of a problem.  I took the resources on the
> secondary down, upgraded the kernel & DRBD, rebooted and brought the
> resources back up again.  One of the resources on the secondary
> refused to reconnect to the primary server however the other 15
> connected and resynced fine.
> 
> Here's the log for the resource
> block drbd12: drbd_sync_handshake:
> block drbd12: self 
> 113BE4C4974DBB16::FDD46B6FD6F2C788:80EFD64D4EC7ECC7 bits:0 
> flags:0
> block drbd12: peer 
> 113BE4C4974DBB17:FDD46B6FD6F2C789:80EFD64D4EC7ECC7:F88BD898114DD15D bits:4842 
> flags:0
> block drbd12: uuid_compare()=-1001 by rule 30
> block drbd12: To resolve this both sides have to support at least protocol

That should read "at least protocol 91",
which according to changelog is DRBD version >= 8.3.3.

> It looks like it's handshaking and agreeing on a common protocol fine,
> but then disconnects with a message to the contrary.  Has anyone run
> into a similar situation?

So as soon as you have upgraded on both sides,
drbd should be able to resolve that.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd on ramdisks

2010-12-09 Thread Lars Ellenberg
On Wed, Dec 08, 2010 at 11:29:42AM -0200, Andre Nathan wrote:
> On Tue, 2010-12-07 at 14:47 -0200, Andre Nathan wrote:
> > It seems the problem only occurs when you have resources on physical
> > volumes and on ram disks simultaneously. I can create these resources as
> > usual, but after a reboot the resource on the ramdisk stops working (as
> > expected, I guess, because the metadata was lost on the reboot) but then
> > drbdadm create-md results in the error I mentioned.
> 
> I tried working around this by using tmpfs. I created a large file in a
> tmpfs volume, and used losetup to map the file to a device. Then I
> created a DRBD volume on /dev/loop0.
> 
> I can bring up the resource fine, but it breaks during synchronization:
> 

> Dec  8 10:26:58 wcluster1 kernel: [ 1699.000927] block drbd2: Handshake successful: Agreed network protocol version 94
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.000944] block drbd2: conn( WFConnection -> WFReportParams )
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.000976] block drbd2: Starting asender thread (from drbd2_receiver [2161])
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.001098] block drbd2: data-integrity-alg: 
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.001191] block drbd2: drbd_sync_handshake:
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.001198] block drbd2: self D608531CD3061EE3:0004:: bits:262127 flags:0
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.001204] block drbd2: peer 0004::: bits:262127 flags:4
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.001209] block drbd2: uuid_compare()=2 by rule 30
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.001212] block drbd2: Becoming sync source due to disk states.
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.001216] block drbd2: Writing the whole bitmap, full sync required after drbd_sync_handshake.
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.001380] block drbd2: 1024 MB (262127 bits) marked out-of-sync by on disk bit-map.
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.001427] block drbd2: peer( Unknown -> Secondary ) conn( WFReportParams -> WFBitMapS ) pdsk( Outdated -> Inconsistent )
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.241078] block drbd2: conn( WFBitMapS -> SyncSource )
> Dec  8 10:26:58 wcluster1 kernel: [ 1699.241098] block drbd2: Began resync as SyncSource (will sync 1048508 KB [262127 bits set]).
> Dec  8 10:27:19 wcluster1 kernel: [ 1719.729138] block drbd2: read: error=-28 s=523520s
> Dec  8 10:27:19 wcluster1 kernel: [ 1719.729148] block drbd2: Resync aborted.
> Dec  8 10:27:19 wcluster1 kernel: [ 1719.729156] block drbd2: conn( SyncSource -> Connected ) disk( UpToDate -> Failed )
> Dec  8 10:27:19 wcluster1 kernel: [ 1719.729165] block drbd2: Local IO failed in drbd_endio_read_sec_final. Detaching...
> Dec  8 10:27:19 wcluster1 kernel: [ 1719.729223] block drbd2: read: error=-28 s=523584s
> Dec  8 10:27:19 wcluster1 kernel: [ 1719.729235] block drbd2: read: error=-28 s=523648s

ENOSPC on read is a tad unusual.
I think your experiment is just broken,
and that's certainly not DRBD's fault.
Maybe you overcommitted too much?

> Dec  8 10:27:21 wcluster1 kernel: [ 1721.887452] block drbd2: in
> got_BlockAck:4348: rs_pending_cnt = -1 < 0 !
> 
> This last message is repeated many times with varying rs_pending_cnt.

Retry with latest DRBD;
that one should be fixed.

And always state the version you are using when asking for advice
on "strange behaviour" ;-)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] strange split-brain problem

2010-12-09 Thread Lars Ellenberg
On Tue, Dec 07, 2010 at 09:22:34PM +0100, Klaus Darilion wrote:
> Things are getting worse:
> 
> Am 07.12.2010 20:36, schrieb Klaus Darilion:
> >>And, BTW, no.
> >>Your /etc/hosts file has zero to do with how DRBD behaves.
> >
> >At least I can reproduce the bad behavior when adding the bug to
> >/etc/hosts. I think it has something todo how I address the disk. The
> >one volume which is working fine is configured with:
> >disk /dev/mapper/cc1--vienna-manager--disk--drbd
> >
> >The other volume which causes the problems is configured with
> >disk /dev/cc1-vienna/cc-manager-templates-drbd
> >which is a symlink to
> >/dev/mapper/cc1--vienna-cc--manager--templates--drbd
> >
>So, I have no idea why, but it seems that if /etc/hosts is broken then
>the symlinks are not available when DRBD starts. When I stop/start the
>DRBD service after booting up, DRBD attaches to the disks fine. Strange.
> 
> I changed drbd.conf to use the "mapper" devices instead of the symlinks.
> 
> Now after startup all but one of the volumes were attached to the
> local disk. The one failed with:
> 
> block drbd6: refusing attach: md-device too small, at least 2048
> sectors needed for this meta-disk type
> block drbd6: drbd_bm_resize called with capacity == 0
> block drbd6: worker terminated
> block drbd6: Terminating worker thread
> 
> After that I manually attached the device without problems.
> 
> Is this really a problem with a too small device or may it be that
> the device wasn't existing at all?

You have to investigate your boot process more closely then.
It is likely still racy somewhere,
possibly with asynchronous udev involvement.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] "To resolve this both sides have to support at least protocol" after upgrade

2010-12-09 Thread Lars Ellenberg
On Fri, Dec 10, 2010 at 06:45:15AM +1100, Tim Sharpe wrote:
> Hi Lars,
> 
> Thanks for getting back to me.
> 
> A couple of questions though.  Why would it need protocol 91 if they
> both supported protocol 90 and why would this issue only show up on
> one of the resources on this box while the other 15 resources ran
> fine?

Because it's a bug?

Because it's a racy bug,
both sides have to agree on the result of the sync handshake,
one "aware" side is enough to detect it,
but you need both sides "aware" of the issue to resolve it.

Prior versions would not even have detected it, but more or less
silently skipped the resync (they would have logged some warning about
no resync, but bits in bitmap), because of the identical (ignore the
right most bit) "current" UUIDs, even though there have been "dirty"
bits in the bitmap.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Problem DRBD With Xen

2010-12-21 Thread Lars Ellenberg
On Wed, Dec 22, 2010 at 01:14:25PM -0200, gilmarli...@agrovale.com.br wrote:

Please avoid html mail on mailing lists,
even more so if their plain text "alternative"
looks like the incomprehensible garbage below.
Anyways, I see no obvious indication of any problem here.

Maybe you have different expectations about the result of whatever it is
you are doing?

> Hello! I'm using Xen with DRBD on LVM, where the domU sits on top of DRBD.
> Maybe someone knows if the log below indicates a problem, or whether it is correct.

Don't refer to logs without describing what you did,
what you expected to happen, and what actually did happen.


> Dec 20 22:52:39 pitanga kernel: block drbd0: Resync done (total 196 sec; paused 0 sec; 102044 K/sec)
> Dec 20 22:52:39 pitanga kernel: block drbd0: conn( SyncSource -> Connected ) pdsk( Inconsistent -> UpToDate )
> Dec 20 22:53:09 pitanga kernel: block drbd0: peer( Primary -> Secondary )
> Dec 20 22:53:09 pitanga kernel: block drbd0: peer( Secondary -> Unknown ) conn( Connected -> TearDown ) pdsk( UpToDate -> DUnknown )
> Dec 20 22:53:09 pitanga kernel: block drbd0: Creating new current UUID
> Dec 20 22:53:09 pitanga kernel: block drbd0: meta connection shut down by peer.

Without the knowledge about what you did and what you thought _should_
happen, the logs just show some fraction of what _did_ happen,
and there is hardly anything wrong with drbd shutting down per se, if
you (or your cluster manager) told it to.  It usually won't shut down
all by itself.

> drbd.conf:
> 
> global { usage-count no; }
> common { syncer { rate 100M; } }
> resource xen {
>   protocol C;
>   handlers { }
>   startup {
>     wfc-timeout 15;
>     degr-wfc-timeout 20;  # 2 minutes.
>     become-primary-on both;
>   }
>   disk { }
>   net {
>     sndbuf-size 512k;
>     timeout 60;
>     connect-init 10;
>     ping-int 10;
>     ping-timeout 5;
>     max-buffers 2048;
>     max-epoch-size 2048;
>     allow-two-primaries;
>     after-sb-0pri discard-zero-changes;
>     after-sb-0pri discard-least-changes;
>     after-sb-1pri discard-secondary;
>   }
>   syncer { rate 100M; al-extents 257; }
>   on pitanga {
>     device /dev/drbd0;
>     disk /dev/sda7;
>     address 10.1.1.50:7788;
>     meta-disk internal;
>   }
>   on inga {
>     device /dev/drbd0;
>     disk /dev/sda7;
>     address 10.1.1.50:7788;
>     meta-disk internal;
>   }
> }
> 
> The heartbeat is configured in ha.cf with auto_failback on.
> Thanks


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] filesystem corrupt based on drbd

2010-12-25 Thread Lars Ellenberg
On Fri, Dec 24, 2010 at 10:50:43AM +0800, Sharp.Zeng wrote:
> There are nodes 0 and 1. Node 0's drbd encountered filesystem
> corruption and turned read-only; node 1's filesystem was also
> corrupted at the same time.
> Any suggestions to avoid this situation? Or can drbd not be
> configured to resolve this?

http://blogs.linbit.com/florian/2010/10/28/drbd-fsck-dix/


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD BUG (kernel oops)

2011-01-09 Thread Lars Ellenberg
On Thu, Jan 06, 2011 at 02:44:19PM +0100, Christian Iversen wrote:
> Hello.
> 
> I was trying to set up a DRBD test resource, using DRBD version:
> 
> "GIT-hash: ea9e28dbff98e331a62bcbcc63a6135808fe2917 build by
> r...@buildd, 2010-01-23 08:21:00" (8.3.7)

Nope.  I may be wrong, but I suspect you are not using the DRBD kernel
module you think you do.  You most likely still use the "drbd 8.0.14"
module "shipped" with lenny.  Same for your other oops.

Please "tripple" check.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Sunshine screen with XenServer 5.6 FP1

2011-01-10 Thread Lars Ellenberg
On Tue, Dec 21, 2010 at 01:59:32PM -0500, Rom Zhe wrote:
> Hi guys. Have some updates and more info for you.
> While trying to tackle the problem from dif. angles, I installed plain HDD
> in one of the servers and put XS 5.6 FP1 on it, also
> gave DRBD /dev/sda3 on the same drive. Before server booted from flash card
> (perhaps not fast enough to capture crash info) and data was on RAID
> volumes.
> This time around I was able to see what happened behind "sunshine screen"
> (also got crash and other logs generated).
> <1>BUG: unable to handle kernel NULL pointer dereference at 0004
> <1>IP: [] bio_free+0x2c/0x50
> <4>*pdpt = 0004fe3ed027 *pde = 
> <0>Oops:  [#1] SMP
> <0>last sysfs file: /sys/class/net/lo/carrier
> 
> I'm attaching snapshot of "OOPS" and a few log files hoping this will shed
> some light on what's really causing this
> and how to fix it.. As I understand this problem was confirmed by Jodok (or
> maybe it's dif. issue).. Any thoughts?

Is that reproducible also with 8.3.8.1,
or only with 8.3.9?

Are you aware of the 8.3.9-y branch?
http://git.drbd.org/?p=drbd-8.3.git;a=shortlog;h=refs/heads/drbd-8.3.9-y

If that does not help, I'd suggest you managed to get a broken build
of drbd (not exactly matching your xen dom0 kernel).

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] "Concurrent local write detected!"

2011-01-10 Thread Lars Ellenberg
On Wed, Dec 29, 2010 at 02:49:00PM -0700, Chris Worley wrote:
> I really think drbd is being brain-dead here.

That's very much possible.
But see
http://old.nabble.com/IET-1.4.20.2-hosting-vmfs-on-drbd-complains-about-concurrent-write-td29756710.html#a29767518

> Concurrent writes to
> the same LBA aren't an issue... just do it!

But then, why not "just don't do it",
on your part?

> Note the below is using a
> primary/secondary setup on two raw drbd devices; no GFS anywhere.
> 
> Let me use an example of two fio invocations as an example, sorry if
> you don't know fio.
> 
> The first is an example of what I'd normally use, when telling it I
> want to run two threads per drive.
> 
> fio  --rw=write --bs=1m --rwmixread=0 --iodepth=64
> --output=/tmp/fio.out.txt --group_reporting --sync=0 --direct=1
> --randrepeat=0 --softrandommap=1 --ioengine=libaio --loops=1
>--name=test0 --filename=/dev/drbd0 --numjobs=2 --size=16093304832
>--name=tet1 --filename=/dev/drbd1 --numjobs=2 --size=16093304832
> 
> In the above case, nearly immediately, the systems starts spewing
> "Concurrent local write detected", and as block sizes decrease the
> machine-check monitor will eventually do a soft lockup, and the
> thumb/boot drive will all of sudden think it's disconnected then
> reconnect as a different SD device (leaving the system dead).

I suspect that the "system dead" may be a result of
the logged message being "alert" level,
also ending up on some serial console, which then disables interrupts
too often for too long so some other part of the system "breaks".

> If I change the above to assure no two threads write to the same offsets, as 
> in:
> 
> fio --rw=write --rwmixread=0 --bs=1m --runtime=600 --iodepth=64
> --output=/tmp/fio.out.txt --group_reporting --sync=0 --direct=1
> --randrepeat=0 --softrandommap=1 --ioengine=libaio --loops=1 \
>   --name=test0-0 --filename=/dev/drbd0 --offset=0 --numjobs=1
> --size=8046652416 \
>   --name=test0-1 --filename=/dev/drbd0 --offset=8046652416 --numjobs=1
> --size=8046652416 \
>   --name=test1-0 --filename=/dev/drbd1 --offset=0 --numjobs=1
> --size=8046652416 \
>   --name=test1-1 --filename=/dev/drbd1 --offset=8046652416 --numjobs=1
> --size=8046652416
> 
> ... then I see no problems.
> 
> Unix semantics has you covered.  If your told to write the same LBA
> twice, just write the thing, and don't kill the system.
> 
> Thanks,
> 
> Chris
> On Tue, Dec 28, 2010 at 1:06 PM, Chris Worley  wrote:
> > On Thu, Dec 23, 2010 at 10:48 AM, J. Ryan Earl  wrote:
> >> On Mon, Dec 20, 2010 at 2:06 PM, Chris Worley  wrote:
> >>>
> >>> I'm using RHEL5.5/2.6.18-194.3.1.el5 and IB/SDP.
> >>
> >> What version of DRBD are you using and what versions have you tried?
> >
> > 8.3.8-1, using the precompiled binary RPMs.  I've not tried other
> > revs.  I have tried other configurations, all of which seem to
> > lock-up; in one configuration, the drive devices not associated with
> > DRBD get locked-up and the devices go offline.
> >
> > Thanks,
> >
> > Chris
> >
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Xenserver 5.6 FP 1 DRBD crash (Alex Kuehne)

2011-01-10 Thread Lars Ellenberg
On Mon, Jan 10, 2011 at 11:38:27AM -0500, Roman wrote:
> This issue was reported to DRBD list and Citrix forums 2 weeks ago.
> There was no official follow up from neither of the vendors.
> Seems it's a Xen new kernel issue which was resolved internally at Citrix,
> but fix is not available at present time. See thread below:
> http://forums.citrix.com/thread.jspa?threadID=279359&tstart=45
> 
> FP1 in general has plenty of issues, but that's beyond this topic.
> http://forums.citrix.com/forum.jspa?forumID=503&start=0
> Hope this saves you some time.

Thanks for the update.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] configuring 2 services on 2 hosts

2011-01-10 Thread Lars Ellenberg
On Mon, Jan 10, 2011 at 12:54:17PM -0600, J. Ryan Earl wrote:
> On Mon, Jan 10, 2011 at 12:51 PM, J. Ryan Earl  wrote:
> 
> > On Thu, Jan 6, 2011 at 1:23 PM, J  wrote:
> >
> >> In a related question. Is it possible to take a snapshot of the secondary
> >> volume while the drbd is active? I can't find the exact link, but the 
> >> user's
> >> guide mentioned not accessing the secondary AT ALL while the primary was
> >> active. I assumed that would include taking snapshots of the secondary?
> >
> >
> > A DRBD device in secondary mode will not be accessible in any way.   IIRC,
> > the /dev/drbdX device won't even exist at this point.  If you're using
> > protocol C and have the DRBD resource on top of LV, you can still snapshot
> > it and the snapshot should be consistent for a point-in-time.
> >
> > -JR
> >
> 
> I should say, you can still snapshot it on [the secondary node at the
> time]...
> 
> Otherwise, if you're running active-passive with LVM on-top of DRBD, then
> the snapshot can only be instigated by the primary and the snapshot CoW
> volume will be replicated to the secondary.  Keep in mind, DRBD +
> Snapshotting will impose triple overhead on writes.

If your DRBD lives on top of an LV, you can of course snapshot
the LV _below_ drbd (on any node, it does not make much difference).

Compared to snapshots with the file system directly on the LV,
there is an important difference: in that case the file system is
implicitly notified to "freeze", immediately before the snapshot is
taken, and then "thawed" again after the snapshot stuff has been setup.

With DRBD between LV and Filesystem, the filesystem is no longer
notified implicitly, and even though you don't have to, you may want to
do that explicitly (e.g. using xfs_freeze -f, -u), which of course
needs to be done on the Primary (where it is mounted).
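A minimal sketch of that explicit variant (LV and mount point names are
made up; adjust to your setup):

  # on the Primary, where /dev/drbd0 is mounted on /mnt/data
  # and its backing device is the LV vg0/drbd-disk:
  xfs_freeze -f /mnt/data   # freeze; works on non-xfs file systems too
  lvcreate -s -L 5G -n drbd-snap vg0/drbd-disk   # snapshot the LV below DRBD
  xfs_freeze -u /mnt/data   # thaw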

If we are not talking about file systems, but anything else (VM images,
whatever), you just take your snapshot (of the lower-level LV) as usual.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Xenserver 5.6 FP 1 DRBD crash

2011-01-10 Thread Lars Ellenberg
scsi_decide_disposition+0x15e/0x170 [scsi_mod]
> ><4> [] ? scsi_softirq_done+0xfd/0x130 [scsi_mod]
> ><4> [] ? trigger_softirq+0x8a/0xa0
> ><4> [] ? blk_done_softirq+0x68/0x80
> ><4> [] ? __do_softirq+0xba/0x180
> ><4> [] ? handle_IRQ_event+0x37/0x100
> ><4> [] ? move_native_irq+0x14/0x50
> ><4> [] ? do_softirq+0x75/0x80
> ><4> [] ? irq_exit+0x2b/0x40
> ><4> [] ? evtchn_do_upcall+0x1e7/0x330
> ><4> [] ? set_next_entity+0x1f/0x50
> ><4> [] ? hypervisor_callback+0x43/0x4b
> ><4> [] ? xen_safe_halt+0xb5/0x150
> ><4> [] ? xen_idle+0x1e/0x50
> ><4> [] ? cpu_idle+0x3b/0x60
> ><4> [] ? cpu_bringup_and_idle+0xd/0x10
> ><0>Code: 89 e5 83 ec 08 89 1c 24 89 c3 89 74 24 04 89 d6
> >8b 50 38 85 d2 74 14 8d 40 4c 39 c2 74 0d 8b 4b 10 89 f0 c1 e9 1c
> >e8 a4 ff ff ff <2b> 5e 04 8b 56 08 89 d8 e8 97 99 fa ff 8b 1c 24
> >8b 74 24 04 89
> ><0>EIP: [] bio_free+0x2c/0x50 SS:ESP 0069:ee863ce8
> ><0>CR2: 0004
> >
> >I hope this is only a minor bug as it used to work with 5.6, I
> >really intend to use DRBD with FP1. So any response is
> >appreciated, if you need further info just give me a note.
> >
> >Best regards
> >Alex Kuehne
> 
> Follow up: I now tried to use version 8.3.10rc1. While doing
> "service drbd start" on the Xenserver, the drbdadm command crashes
> with that error message:
> 
> Jan 10 15:57:50 xs1 kernel: drbd: initialized. Version: 8.3.10rc1
> (api:88/proto:86-96)
> Jan 10 15:57:50 xs1 kernel: drbd: GIT-hash:
> 1a1dfa9f736c091cf4a4b8f8042601f3bcd00c5e build by
> r...@std11526-vm01, 2011-01-10 09:38:20
> Jan 10 15:57:50 xs1 kernel: drbd: registered as block device major 147
> Jan 10 15:57:50 xs1 kernel: drbd: minor_table @ 0xea4e30c0
> Jan 10 15:57:50 xs1 kernel: drbdadm[11720]: segfault at 0 ip
> 08052690 sp bffb4f60 error 4 in drbdadm[8048000+23000]
> 
> The drbd resource is not getting initialized at all. I'm building
> everything on Xenserver 5.6 FP1 DDK as RPM package.
> 
> BR,
> Alex Kuehne
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Xenserver 5.6 FP 1 DRBD crash

2011-01-11 Thread Lars Ellenberg
On Tue, Jan 11, 2011 at 04:58:46PM +0100, Alex Kuehne wrote:
> Quoting Lars Ellenberg :
> 
> >On Mon, Jan 10, 2011 at 04:07:12PM +0100, Alex Kuehne wrote:
> >>Quoting Alex Kuehne :
> >>
> >>>Hi guys,
> >>>
> >>>This is another report of DRBD not working with Xenserver 5.6 FP1.
> >>>I tried version 8.3.9 and 8.3.8.1. With Xenserver 5.6 (without
> >>>FP1) at least version 8.3.8.1 is working.
> >
> >For your drbdadm segfault with 8.3.10rc1,
> >please send me the config file it segfaults with (PM or to the list)
> 
> Here it is: http://p.0wnz.at/205018
> The global_common.conf is used as shipped. This is the same config I
> use with lower versions.

Ok, thanks. I guess that's some bogon in drbdadm that has been fixed
already and will be pushed public tomorrow together with a bunch of
other updates, probably labeled "rc2".

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] [Linux-ha-dev] ocf:linbit:drbd incorrectly handles split brain

2011-01-13 Thread Lars Ellenberg
On Thu, Jan 13, 2011 at 06:53:54PM +0100, Lars Marowsky-Bree wrote:
> On 2011-01-05T15:46:54, Florian Haas  wrote:
> 
> > Run Pacemaker on Heartbeat, and use dopd, and this won't happen.
> 
> Hi Florian,
> 
> is there any missing functionality in the pacemaker integration?

The main difference is that dopd would, once invoked, actually write the
outdated information into the peer's meta data (provided that peer is
reachable and its meta data is still writable).

The "crm fence peer" thingy does not, but creates a constraint only.

I'm unsure how the OP got into the split brain situation, so I cannot
comment about what needs to be changed to not get there, or if/how it
was avoidable.

I'm also not sure what exactly was expected:
>>> however, i would expect pacemaker/the ra to do something about it.
>>> e.g. create a location constraint to not run drbd on the
>>> secondary/consistent node.

But the location constraint is not generated by the ocf ra,
and must not be.  That's what the fence-peer hook of drbd is for.
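
For reference, a rough sketch of how that hook is typically wired up in
drbd.conf with the scripts shipped with 8.3 (exact script paths may
differ on your distribution):

  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }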

I wrote a lengthy email in the past about "does drbd really need stonith",
http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg04312.html
which deals a bit with resource level fencing.

I'm not aware of missing functionality in the pacemaker integration
compared to the dopd solution.

But resource level fencing (whether using dopd or crm-fence-peer)
is not trivial to get right.  That is partially because what is right
is not clearly defined, and may well change with different requirements.

As I tried to explain in above linked post,
stonith alone does not help, either.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Moving a live DRBD setup to a new version?

2011-01-18 Thread Lars Ellenberg
On Tue, Jan 18, 2011 at 07:58:49AM +0100, Felix Frank wrote:
> On 01/18/2011 12:45 AM, Cameron Smith wrote:
> >Well looking further online it seems the two versions are compatible!
> >Can anybody here confirm that please?
> >
> >Now my main two questions are:
> >
> >1) Will having different size partitions between nodes cause an issue?
> >(my meta-data is on a separate partition and not set as internal)
> 
> Yes, that's a problem. Your initial SyncTarget must be the same size

"At least as big" is supposed to be enough.

> as your original DRBD. You can then resize that partition by
> whatever means necessary, then resize the DRBD itself. Note that the
> last step should not be performed before you have resized both your
> nodes' partitions.
> 
> >2) After bringing up my resource on the new node do I then run:
> >
> >drbdadm -- --overwrite-data-of-peer primary <resource>
> >
> >ON THE OLD PRIMARY NODE? This is where the data is live and the /data 
> >partition is mounted.
> >
> >I want to make sure I don't kill my live site and it's data!!! :)
> 
> Seeing as your old node is already primary, you will more likely
> want to connect --discard-my-data on the other one. But your general
> idea is sound.

If that new node was "just created" (... create-md ...),
then there is no need for any of that.

But just to feel better, it may help to know that DRBD
will refuse to become a SyncTarget while it is currently Primary,
so if you want something to not become a SyncTarget,
it is usually good enough to just keep it Primary.

Besides, for all things cluster upgrade,
you could (probably: should) just do a "dry run" in some VMs at least,
if you don't have an actual test cluster.
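
As for the resize question above: once the partitions on both nodes
have been enlarged, the DRBD step itself is a one-liner ("r0" being a
hypothetical resource name):

  drbdadm resize r0   # only after both nodes' backing devices have grown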

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] minor problem with drbd script for redhat cluster suite

2011-01-20 Thread Lars Ellenberg
On Thu, Jan 20, 2011 at 12:28:04PM -0500, Chris Hunter wrote:
> I am testing drbd 8.3.9 with RHEL 6 "High Availabity" Cluster Suite package.
> 
> Last few versions of drbd provide the necessary scripts to interface
> drbd with the Redhat cluster resource manager (rgmanager) software.
> This is described in Chapter 10 "Integrating DRBD with Red Hat
> Cluster Suite" of the drbd users's guide.
> 
> The drbd-supplied script (drbd.sh.rhcs) makes use of the opencf exit
> status codes (eg. $OCF_ERR_GENERIC), defined in the opencf
> "resource-agent-api" draft standard 
> (http://www.opencf.org/cgi-bin/viewcvs.cgi/*checkout*/specs/ra/resource-agent-api.txt?rev=1.10).
> 
> One minor issue I ran into. The drbd script returns variable
> $OCF_RUNNING for the "status" api. This variable is not defined in
> the opencf draft standard. I believe the correct return variable is
> $OCF_SUCCESS.
> 
> The redhat implementation of the opencf standard is defined in the
> redhat-provided script /usr/share/cluster/ocf-shellfuncs.

Thanks.
Known and fixed:
http://git.drbd.org/?p=drbd-8.3.git;a=commitdiff;h=491a13156fa503907b6e7aec6344ddad7468ca64

But supposed to be harmless, anyways.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] failing slave node - WFConnection PRIMARY/unkown

2011-01-20 Thread Lars Ellenberg
On Thu, Jan 20, 2011 at 05:17:59AM -0800, TrustRanger wrote:
> 
> Maybe it will help to use the cluster resource manager (crm) (integrated in
> heartbeat 2.x.x) to avoid an unexpected takeover. There you have much more
> options to handle the takeover behavior (e.g. when a node loses connection).

If you go "crm", please go pacemaker,
not its >= four-years-old predecessor.

Thanks,

> > Yes I use heartbeat-2.1.3-3.el5.centos, so I would tell that I'm using
> > heartbeat 2 but with ha.cf config.

Note that ha.cf is always needed.
The difference is "haresources" or "crm + cib".

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Local Asynchronous Replication

2011-01-25 Thread Lars Ellenberg
On Mon, Jan 24, 2011 at 08:08:09AM -0700, Nick Couchman wrote:
> 
> > Hi,
> > 
> > what level of asynchrony do you need? And (out of curiosity) why?
> > 
> > Cheers,
> > Felix
> 
> I'll tell you what I'm trying to do, and maybe that will answer both
> questions.  I'm trying to roll my own disk-based backup solution.
> Basically, I'll have a Solaris-based system using ZFS to serve volumes
> out to the systems that I want to back up over iSCSI.  This way I can
> use the ZFS snapshot management capabilities to snapshot the volumes at
> various points in time, and use the clone capability to represent that
> volume somewhere in the event that I need a restore done.  Also, because
> ZFS supports remote send/receive out of the box, it gives me a way to
> send those backups off-site very easily for DR purposes.
> 
> On the servers I'm trying to back up, I need some form of asynchronous
> replication.  These systems will connect over iSCSI to the ZFS system in
> order to replicate the volumes.  The reason for the asynchronous
> requirements is because I need to make sure that the replication of the
> data to the secondary (iSCSI ZFS) disk does not block I/O for the
> primary volume - I don't want the fact that I'm replicating the data to
> interfere with performance of the volume.  A second concern is that I
> need to make sure that all I/O operations are actually being done to the
> primary storage on each of those systems and not to the secondary iSCSI
> volume, again, mostly for performance reasons.  Finally, I want to be
> able to shut down the ZFS backup system and the iSCSI links without
> worrying about the system going into any kind of degraded state - it
> needs to be able to pick the synchronization right back up when it comes
> back up.
> 
> The built-in Linux RAID1 driver offers the ability to mark a volume in a
> RAID1 set as "write mostly", which takes care of most of the concern for
> having I/O operations occur on the primary device, but does not
> necessarily ensure that write operations will not be blocked by Linux
> waiting on them to occur on both volumes.

Not that I'm advocating against DRBD usually, but just make sure you do
not overlook the parts about write-intent bitmap and write-behind mode
on the mdadm man page.
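
For illustration, a rough sketch of that mdadm variant (device names
/dev/sda1 for the local disk and /dev/sdb1 for the iSCSI disk are made
up):

  # reads are served from the local disk; the iSCSI member is marked
  # write-mostly, and --write-behind lets up to 256 writes to it lag
  # behind (write-behind requires the write-intent bitmap):
  mdadm --create /dev/md0 --level=1 --raid-devices=2 \
        --bitmap=internal --write-behind=256 \
        /dev/sda1 --write-mostly /dev/sdb1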

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Local Asynchronous Replication

2011-01-25 Thread Lars Ellenberg
On Mon, Jan 24, 2011 at 10:34:54AM -0700, Chris Worley wrote:
> On Mon, Jan 24, 2011 at 10:15 AM, Nick Couchman  
> wrote:
> >
> >> Ah, I see. So you want to retain Read-Only access even when iSCSI is
> >> disconnected? That's problematic, as DRBD will probably detect possible
> >> split-brains and refuse to resume synchronization. You can of course
> >> discard your local backing device upon each reconnect, but that will
> >> trigger (I think) a full sync from the iSCSI device.
> >
> > No, I want full r/w access even when iSCSI is disconnected, then a
> > resynchronization (full resync is fine) when the iSCSI volume is back.
> > I'm not going to be using this in a cluster scenario at all, so DRBD
> > need not worry about split-brain situations, as nothing will be writing
> > to the volume on the iSCSI side besides DRBD.  Essentially, the iSCSI
> > (backup) side will be a read-only copy used for restoring files and DR
> > recovery scenarios.
> 
> I can think of another reason why this might be useful: fabric
> performance.  DRBD is limited to 10GbE performance.

Is that so?
What makes you think so?

> If the primary
> and secondary each served the local drives as targets over SRP (and
> the primary writes as an initiator directly to the secondary's drive),
> for example, the mirror peak write performance could be much higher.

I cannot follow you here.

If that's a thought-through idea, can you please explain?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Local Asynchronous Replication

2011-01-25 Thread Lars Ellenberg
On Tue, Jan 25, 2011 at 12:27:51PM -0700, Nick Couchman wrote:
> > These steps may or may not work, I hope DRBD isn't smarter than is good
> > for itself ;-)
> > 
> > ip addr add 10.213.0.1/30 dev eth1
> > ip addr add 10.213.0.2/30 dev eth1
> > echo -e '10.213.0.1\tselfA' >> /etc/hosts
> > echo -e '10.213.0.2\tselfB' >> /etc/hosts
> > 
> > Then use selfA and selfB as peers in your DRBD configuration.
> > 
> > All of this is untested, but I hope you get the general idea. Yes, the
> > addresses are arbitrary.
> > 
> > HTH,
> > Felix
> 
> I will give this a shot.  Some initial tests I did late last week,
> however, showed that DRBD expects one of the peers in the config file
> for a resource to be the actual hostname of the machine that it's
> running on.  But maybe I can trick it into working...

No. To actually pull this off, you'd need to do it with two resources,
simply use two resource sections, both using the local hostname (what
uname -n reports), and some "non-existent-peer" hostname, then connect
those to each other.

I really doubt that this would be useful, though, and I'm not sure about
deadlock potential due to "unexpected sharing" of e.g. memory pools.
But feel free to try.

resource to-be-used-as-primary {
  on self { # (what uname -n reports)
    address 127.0.0.1:8000;
    disk ... ; # your local device here
  }
  on other { # arbitrary, non-existent peer name
    address 127.0.0.1:8001; # only value actually used on "self"
    ... # nonsense values
  }
}

resource to-be-used-as-secondary { # never to be made Primary
  on self {
    address 127.0.0.1:8001; # what is above in "on other"
    disk ... ; # your iscsi device here
  }
  on other {
    address 127.0.0.1:8000; # what is above in "on self"
    ... # nonsense values
  }
}
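
If that sketch works at all, bring-up would be something like:

  drbdadm up to-be-used-as-primary
  drbdadm up to-be-used-as-secondary
  drbdadm primary to-be-used-as-primary   # and never promote the other one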

Regardless of what replication technology you use, you'd better make
absolutely sure that it is the only entity ever modifying your disk images
on the iSCSI target host. Do not even "read-only loop mount" that stuff;
typically, even read-only mounts first replay the journal...
much less start a VM directly from those images.  Otherwise the replication
will lose track of blocks in need of resynchronisation.
In other words: if you scramble your data, don't blame the technology.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Local Asynchronous Replication

2011-01-26 Thread Lars Ellenberg
"Nick Couchman"  wrote:

>The naming of the drbd devices seems to be a little picky.  I'd like to
>be able to call one /dev/drbd-do-no-use or something like that, but drbd
>seems to choke on this and is fairly strict about the drbd naming scheme.

 device name has to be either drbdN (N being the minor number),
 or start with drbd_ (mind the underscore)
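
 For example (hypothetical names; the explicit-minor syntax is from
 memory of the 8.3 drbd.conf man page, so double check):

   device /dev/drbd0;                # ok: "drbd" plus minor number
   device /dev/drbd_mirror minor 0;  # ok: "drbd_" prefix, explicit minor
   # device /dev/drbd-do-no-use;     # rejected: matches neither pattern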

 Lars

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Default Split Brain Behaviour

2011-01-27 Thread Lars Ellenberg
58 emlsurit-v4 kernel: [26892.356300] block drbd9: conn( 
> StandAlone -> Unconnected ) 
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356326] block drbd9: Starting 
> receiver thread (from drbd9_worker [2126])
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356519] block drbd9: receiver 
> (re)started
> Jan 23 22:35:58 emlsurit-v4 kernel: [26892.356527] block drbd9: conn( 
> Unconnected -> WFConnection ) 


So... my guess is that you still have two versions of your data.

From this log, there was no sync, because DRBD's default behaviour in that
case is to disconnect. Therefore no rollback, and no data loss.
But you certainly have diverging data sets, and my guess is they keep
diverging still.

You have to figure out when they started to diverge, and why.
And you have to sort it out, decide which to keep,
and tell DRBD (see the User's Guide for details on this).

Consider booking DRBD Training

;-)

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Default Split Brain Behaviour

2011-01-28 Thread Lars Ellenberg
On Fri, Jan 28, 2011 at 11:01:51AM +1100, Lewis Shobbrook wrote:
> Thanks for the reply Lars,
> 
> > > Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905963] block drbd9:
> > > drbd_sync_handshake:
> > > Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905967] block drbd9: self
> > > 49615ABF1622FC55:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807
> > > bits:143432 flags:0
> > > Jan 23 22:19:35 emlsurit-v4 kernel: [25910.905971] block drbd9: peer
> > > 6116B0558277E470:643454BA1CA67140:5625CFAB3DDD24A2:EA5079D16F8C7807
> > > bits:336381 flags:0
> > 
> > There. Both nodes have changes the other node did not see (yet).
> > That's where DRBD can detect that there previously has been data
> > divergence, usually caused by cluster split brain.
> 
> I'm struggling to see how the secondary node could have write changes as it 
> has never been primary.

Well, I cannot tell you that.

> The resource had originally been in sync, then was manually switched to a 
> detached state for roughly 8 days prior to the data rollback.
> The primary node as mentioned was a KVM instance, this instance does not 
> exist (never has) on the secondary node.
> 
> > So... My guess is, that you still have two versions of your data.
> > 
> > From this log, there was no sync, because DRBD's default behaviour
> > in that case is to disconnect. Therefore no rollback, and no data loss.
> > But you certainly have diverging data sets, and my guess is they keep
> > diverging still.
> 
> That's what I'd be happy for it to do, but the complete rollback of 8 days of
> work on a web site is pretty obvious, and contradicts that.

Maybe the logs you posted do not match the incident described.

Or you attached to stale data, thinking a rollback had taken place,
but actually it is just stale data and the more recent data is still
on the other node.

But the logs you posted do not show any sync taking place; they even clearly
show that DRBD refuses to do a sync because it detected data divergence.
There cannot have been a rollback, because there has been no sync,
again according to the logs you posted.

> > You have to figure out when they started to diverge, and why.
> > And you have to sort it out, decide which to keep,
> > and tell DRBD (see the User's Guide for details on this).
> I'd kept the two separate and taken the KVM instance offline in the vain hope 
> that I may have been able to rollback the rollback.
> I made dd images of each nodes LVM associated with the resource just in case, 
> but have now accepted my losses so to speak and begun the reconstruction.
> 
> I've been using DRBD since 2005, and although clearly having much to learn, 
> I'd like to think I have a reasonable handle on the fundamentals.
> What I've experienced with the data roll back is both unexpected and 
> unintended.
> I'm still unclear as to how this node came to discard 8 days worth of data, 
> but am very keen to do so.
> If you good people are prepared to guide me further, I'm prepared to do what 
> is necessary at my end to try determine the cause of this.

Go back to your logs, and find the logs that match the incident
described.

What is the status of that pair of DRBD now?
Is it actually "cs:Connected, UpToDate/UpToDate" ?

Find out when it became so, and how.  Because, again, the logs you
showed previously, state, that DRBD refused to connect.
If it finally synced up and connected anyway, likely someone told it to
"--discard-my-data" on one of the nodes (or "invalidate", or something to
that effect).
And if that has been the side with the data you lost,
well, then that someone told DRBD to throw it away.
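
For the record, the usual manual split-brain resolution looks roughly
like this ("r0" being a hypothetical resource name; the first two
commands go on the node whose changes you decide to discard):

  drbdadm secondary r0
  drbdadm -- --discard-my-data connect r0
  # and on the surviving node, if it went StandAlone:
  drbdadm connect r0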

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] CPU Affinity (Documentation Bugs?)

2011-01-31 Thread Lars Ellenberg
On Mon, Jan 31, 2011 at 11:12:49AM +0100, Roland Friedwagner wrote:
> Hello,
> 
> after reading the available documentation in DRBD User-Guide
> (http://www.drbd.org/users-guide/s-latency-tuning.html#s-latency-tuning-cpu-mask)
> 
>   ... 
>   A mask of 12 (1100) implies DRBD may use the third and fourth CPU.
>   ...
> 
> and the man page drbd.conf:
> 
>   ...
>   The default value of cpu-mask is 0, which means that
>   DRBD's kernel threads should be spread over all CPUs of the machine.
>   This value must be given in hexadecimal notation.
>   ...
> 
> 
> I set the config parameter cpu-mask in drbd.conf to 255 
> (to enable usage of all 8 available cores) but got this:
> 
>   # ps u 4387
>   USER   PID %CPU %MEMVSZ   RSS TTY  STAT START   TIME 
> COMMAND
>   root  4387  3.5  0.0  0 0 ?S 2010 1539:29 
> [drbd0_worker]
>   # taskset -c -p 4387
>   pid 4387's current affinity list: 0,2,4,6
> 
> But expected this list: 0,1,2,3,4,5,6,7
> 
> => Conclusion:
> 
>   1. The example in DRBD User-Guide is simply wrong 
>  (drbd.conf: "cpu-mask 12;"   =>   affinity list: 1,4)
> 
>   2. The cpu-mask parameter has to be specified, as stated in the man 
> page,
>  as Hexstring ("cpu-mask ff;" to get the first 8 cpus) in drbd.conf
> 
>   3. But if the parameter cpu-mask is explicit set to zero in drbd.conf
>  (to get it run on _all_ cpus) I get only the second cput (affinity 
> list: 1).
>  So in this aspect the man page is wrong about the default.

It's not exactly wrong, but possibly lacks an important detail:
if the cpu_mask is not specified, the drbd kernel threads of a specific
minor will be pinned on one particular cpu, but accross all minors, drbd
threads will be spread over all cpus.
At least that was the intention, iirc.
Actual results of an unspecified cpu-mask (or explicitly specified as 0)
may even vary with kernel version.

> My DRBD Version is 8.3.9.
> 
> @linbit: Could this be fixed in User-Guide and man page

Thanks, noted, will be fixed.

> And I'm not sure, if it can safely fixed by setting the mask on running
> [drbd1_worker], [drbd1_receiver] and [drbd1_asender] tasks like
> this: 
> 
>   taskset -c -p 0-7 

Unless kernel threads ignore attempts to set their cpu mask
from userland, and I don't think they do, this should just work.

> (Because I won't like to shutdown drbd resources on primary)
> Or may this triggers some race condition and drbd hangs or show other
> erratic behaviour?

It won't cause any harm.

But you should just set cpu-mask ff.

Note that it may take a new write request or other "full round trip"
through all threads to become visible: to avoid locking issues they all
set their own cpumask in their respective "main loop" equivalent.
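
A minimal sketch (in 8.3, cpu-mask goes into the syncer section;
"r0" is a hypothetical resource name):

  syncer {
    cpu-mask ff;   # hex ff = binary 11111111 = CPUs 0-7
  }

Then apply, and verify once some I/O has passed through:

  drbdadm adjust r0
  taskset -c -p $(pgrep drbd0_worker)   # expect affinity list: 0-7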

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD + LVM backup

2011-02-03 Thread Lars Ellenberg
On Thu, Feb 03, 2011 at 02:16:08PM -0600, J wrote:
> On 2/3/2011 1:09 PM, Digimer wrote:
> >On 02/03/2011 01:50 PM, J wrote:
> >
> >   If you use clustered LVM, snapshotting is not an option.
> >
> >   How you snapshot depends, to an extent, on where LVM is in relation to
> >your DRBD resource. That is, if it's raw partition ->  DRBD ->  LVM, raw
> >partition ->  LVM ->  DRBD or stacked raw ->  LVM ->  DRBD ->  LVM.
> >
> >   If you've got LVM below DRBD, then snapshotting it would, I suspect,
> >take a point-in-time snapshot of one side of the resource. So long as
> >the DRBD itself is UpToDate, this should provide you with a "drive
> >image" capable of restoring your DRBD resource. Alternatively, if you
> >use LVM on DRBD, then you can snapshot individual LVM LVs as you
> >normally would.
> >
> >   In either case, you should not impact or affect your DRBD resource,
> >beyond allocating enough space for the snapshot partition (be it
> >node-side or space in the DRBD resource).
> >
> 
> So, I believe the answer is no: I will not be able to mount that
> locally and easily. Suppose remounting the file system read only for
> a fast rsync is a viable fall back.

Don't give up so quickly because of information overload ;-)

Your question was:
  if I take a snapshot of the logical volume used by
  drbd, will I be able to mount that locally (and easily?)

Though J's answer is correct, it does not clearly say it:

The answer is: Yes.

It's that simple.


Bonus: you can even do it on the Secondary,
so the additional rsync load will not affect the Primary too much.

Caveat 1: (nothing to do with DRBD) of course, if there are files
currently in use (like database stuff etc.) you should first tell your
application to "quiescen" (equivalent of "flush tables with lock" or
whatever the incantation was).
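
For the MySQL case, a sketch that holds the lock across the snapshot
(the incantation is FLUSH TABLES WITH READ LOCK, which is only held
while the client session stays open; volume names are made up):

mysql <<'EOF'
FLUSH TABLES WITH READ LOCK;
system lvcreate --snapshot --size 7g --name snap-srvr mvg/srvr
UNLOCK TABLES;
EOF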

Caveat 2: (has to do with additional layer between lv and fs)
If a file system lives directly on an LV, "lvcreate -s" will do an
implicit "freeze" of the file system, and then "thaw" it again once the
snapshot is taken. This is not strictly necessary, but reduces
potentially large journal replays necessary when mounting the snapshot,
in case the snapshot is taken during a particularly busy period.

If there is an additional layer between LV and file system,
in this case DRBD, this implicit freeze/thaw is no longer done.

If you really want it, you'd have to do it explicitly.
To do that, you could use xfs_freeze -f, xfs_freeze -u,
even against a non-xfs file system.

It is much more useful to "quiesce" the application(s)
running on top of that filesystem than the filesystem itself:
if you don't do the former, the latter won't help much,
if you do the former, the latter does not have much to do, anyways.

So if you did not know about that file system freezing,
you can probably safely forget about it again.


The additional information about all the various stacking possibilities
with DRBD and LVM are still correct.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD + LVM backup

2011-02-03 Thread Lars Ellenberg
On Thu, Feb 03, 2011 at 09:55:48PM +0100, Lars Ellenberg wrote:
> On Thu, Feb 03, 2011 at 02:16:08PM -0600, J wrote:
> > On 2/3/2011 1:09 PM, Digimer wrote:
...
> Your question was:
>   if I take a snapshot of the logical volume used by
>   drbd, will I be able to mount that locally (and easily?)
> 
> Though J's answer is correct, it does not clearly say it:

Digimer's answer to J's question, that is.
But anyway:

> The answer is: Yes.
> 
> It's that simple.

Lars
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD + LVM backup

2011-02-03 Thread Lars Ellenberg
On Thu, Feb 03, 2011 at 03:20:50PM -0600, J wrote:
> On 2/3/2011 2:55 PM, Lars Ellenberg wrote:
> >Don't give up so quickly because of information overload ;-)
> >Your question was:
> >   if I take a snapshot of the logical volume used by
> >   drbd, will I be able to mount that locally (and easily?)
> >
> >Though J's answer is correct, it does not clearly say it:
> >
> >The answer is: Yes.
> >
> >It's that simple.
> >
> 
> Thanks for answering my question Lars. I do appreciate it :) I
> decided I would go ahead and try it before I got too much into a new
> shell script for the alternative. I got to the point I thought I
> would get to.
> 
> # lvcreate --snapshot --size 7g --name snap-srvr mvg/srvr
>   Logical volume "snap-srvr" created
> # ls /dev/mvg
> home  root  snap-srvr  srvr
> # mount /dev/mvg/snap-srvr /mnt
> mount: unknown filesystem type 'drbd'

libblkid, or whatever tries to play clever here,
seems to try to be smarter than is good for it.

Just explicitly specify the file system, and be done with it.

mount -t ext3 /dev/mvg/snap-srvr /mnt

(or xfs, or ext4, or whatever you are using).


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD write after write (cache)

2011-02-09 Thread Lars Ellenberg
On Wed, Feb 09, 2011 at 12:51:24PM -0600, J. Ryan Earl wrote:
> On Sat, Feb 5, 2011 at 4:45 AM, ionral  wrote:
> 
> > if I check the status of drbd the following response is:
> >
> >  0: cs: Connected ro: Primary / Primary ds: UpToDate / UpToDate C r 
> >ns: 640864680 nr: 135234784 dw: 776099464 dr: 1520599328 al: 836478 bm:
> > 2185 lo: 0 pe: 0 ua: 0 ap: 0 ep: 1 wo: b OOS: 0
> >
> > I do not know if it is positive that the method is to write after is
> > Barrier
> >  reading the manuals of drbd I noticed that in these cases (BBU)
> > performance would be better for a wo:n
> >
> > What do you think?
> >
> 
> Modify your drbd.conf to look more like:
> 
>   disk {
> on-io-error   detach;
> # It is good to have backed up write caches
> no-disk-barrier;
> no-disk-flushes;
> no-disk-drain;

Do not use "no-disk-drain", unless you are absolutely sure that nothing
in the IO stack below DRBD could possibly reorder writes.

> no-md-flushes;
> use-bmbv;

With recent DRBD (>= 8.3.8), use-bmbv is a no-op.

With older DRBD, it can cause spurious detaches
on the Secondary, so don't use that either,
unless you absolutely know what you are doing.


>   }
> 
> This will completely turn off write ordering.  I have measured
> performance gains doing this with backed-up write caches.
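
To spell out what that leaves you with, a sketch of the corresponding
disk section for DRBD >= 8.3.8 on a controller with a battery-backed
write cache:

  disk {
    on-io-error   detach;
    no-disk-barrier;   # ok with a battery-backed cache
    no-disk-flushes;   # ok with a battery-backed cache
    no-md-flushes;     # ok with a battery-backed cache
    # no-disk-drain deliberately NOT set, so DRBD can still
    # prevent write reordering below it;
    # use-bmbv dropped: a no-op since 8.3.8
  }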


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list   --   I'm subscribed
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


  1   2   3   4   5   6   7   8   9   10   >