[DRBD-user] Determining whether a resource is in dual-primary mode

2015-10-29 Thread Veit Wahlich
Hi,

is there a (preferred) method to determine whether a resource is
currently in dual-primary mode, e.g. show the active net-options?

I use "drbdadm net-options --protocol=C --allow-two-primaries " to
active dual-primary mode temporarily for a resource and promote it on
the former secondary, so I can migrate a virtual machine backed by this
resource to another node. After migration has completed or failed, I
demote the resource on the machine that does not run the vm and
deactivate dual-primary mode with "drbdadm net-options
--allow-two-primaries=no ".

For safety reasons, I would like to verify that the resource is not
already, or no longer, in dual-primary mode, but I have found no method to
obtain this information:

 - /proc/drbd does not seem to contain this information
 - neither the documented nor the hidden commands of drbdadm seem
   to be able to return this information

Any idea appreciated.

Regards,
// Veit



Re: [DRBD-user] Determining whether a resource is in dual-primary mode

2015-10-29 Thread Veit Wahlich
Hi Ivan,

thank you for your suggestions.

On Thursday, 29.10.2015 at 18:28 +0200, Ivan wrote:
> you may be interested by this:
> 
> https://github.com/taradiddles/cluster/blob/master/libvirt_hooks/qemu
> 
> I wrote it some time ago as a qemu hook before eventually setting up a
> full-fledged cluster (pacemaker) with proper fencing, which rendered the 
> script useless. It worked pretty well though (it doesn't change the 
> "--allow-two-primaries" but you could easily add it if you end up using 
> the script).

This looks nice. I have written a similar Perl-based hook for this case,
but at the moment it only takes care of promoting the resources used by a
domain before the domain starts, and of demoting them when the domain stops
or is migrated away.

I was unsure whether to add migration support to this hook due to safety
concerns about what happens when a migration fails. I have already experienced
split-brain situations due to a configuration mismatch after one side
executed the "drbdadm --allow-two-primaries" command but the other one
did not.

So I decided to put this functionality into a control program with
direct user interaction. It also checks for a lot of other potential
problems before migration starts, such as 'file' type disks (e.g. ISOs
for domains' cdroms) not residing on cluster filesystems, 'dir' type disks,
any DRBD-backed disk's cache setting not being 'writethrough', the domain
already being defined on nodes other than the source, the domain having
active snapshots, the domain not being persistently defined, ...

> are you sure ? on my machine, cat /proc/drbd
> 
> version: 8.4.6 (api:1/proto:86-101)
> GIT-hash: 9304f3b0995d97316d6bd4455ecbcf2963147ab4 build by 
> bu...@vm-build-el7.c3i.priv, 2015-10-03 16:07:33
> 
> 10: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-
> ns:9850696 nr:7822820 dw:17672764 dr:10737789 al:151 bm:0 lo:0 pe:0 ua:0 
> ap:0 ep:1 wo:f oos:0
> 11: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-
> ns:977368 nr:16 dw:977384 dr:1143886 al:212 bm:0 lo:0 pe:0 ua:0 ap:0 
> ep:1 wo:f oos:0

> $ drbdadm role vm-file
> Primary/Primary
> Primary/Primary

I think I explained my problem in a way that is easy to misunderstand:

My problem is not that I do not know whether two primaries are currently
active, but whether two primaries are still allowed (especially whether
"drbdadm --allow-two-primaries=no" has been executed successfully).

I was unable to find a flag in /proc/drbd that indicates whether
allow-two-primaries is active or not.

Best regards,
// Veit



[DRBD-user] Bug? GI UUIDs differ by 1

2015-11-02 Thread Veit Wahlich
Hi,

I am observing quite strange behaviour:

The generation identifier UUIDs on my nodes differ although the volumes
are connected and reported in-sync (in fact they are really in-sync, I
ran a verify).

According to https://drbd.linbit.com/users-guide/s-gi.html this should
not happen.

I find it quite odd that so far the difference in UUIDs has always been
only 1:

Here are the GIs from node A and node B:

[root@nodea:~]# drbdadm get-gi all
749E0550B4C8120C::B36328481DC19155:B36228481DC19155:1:1:0:1:0:0:0
CA23BF5B6198E2A2::37EC37F7420D2057:37EB37F7420D2057:1:1:0:1:0:0:0
AD2A6DB42BD2E1E6::8FF18D07624D26DA:8FF08D07624D26DA:1:1:0:1:0:0:0

[root@nodeb:~]# drbdadm get-gi all
749E0550B4C8120C::B36328481DC19154:B36228481DC19155:1:1:0:1:0:0:0
CA23BF5B6198E2A2::37EC37F7420D2056:37EB37F7420D2057:1:1:0:1:0:0:0
AD2A6DB42BD2E1E7::8FF18D07624D26DA:8FF08D07624D26DA:1:1:1:1:0:0:0

Here, for the 3rd volume, the current GI UUID differs by 1.
For the 1st and 2nd volumes, the younger history GI UUID differs by 1.

Is this in any way expected behaviour?
I think I have hit a bug -- might it cause further problems?

I see no warnings about UUIDs not matching in the kernel log. The only
problem I see is that the replication link seems to have disconnected
for two seconds right after boot. This seems to happen due to
reconfiguration of bonding.

After that, DRBD started to sync and completed successfully, but at this point
the UUID was updated to the non-matching number:

Node A:
...
[   32.358379] block drbd10100: drbd_sync_handshake:
[   32.358383] block drbd10100: self 
8FF08D07624D26DA::34233B59A66A8C62:34223B59A66A8C63 bits:0 
flags:0
[   32.358385] block drbd10100: peer 
AD2A6DB42BD2E1E7:8FF08D07624D26DA:34233B59A66A8C63:34223B59A66A8C63 bits:0 
flags:0
see here ---^
[   32.358387] block drbd10100: uuid_compare()=-1 by rule 50
...
[   32.372810] block drbd10100: Began resync as SyncTarget (will sync 0 KB [0 
bits set]).
[   32.373783] block drbd10100: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
[   32.373787] block drbd10100: updated UUIDs 
AD2A6DB42BD2E1E6::8FF18D07624D26DA:8FF08D07624D26DA
see here ^
[   32.373790] block drbd10100: conn( SyncTarget -> Connected ) disk( 
Inconsistent -> UpToDate ) 
...

Node B:
...
[   32.502916] block drbd10100: drbd_sync_handshake:
[   32.502924] block drbd10100: self 
AD2A6DB42BD2E1E7:8FF08D07624D26DA:34233B59A66A8C63:34223B59A66A8C63 bits:0 
flags:0
see here ---^
[   32.502929] block drbd10100: peer 
8FF08D07624D26DA::34233B59A66A8C62:34223B59A66A8C63 bits:0 
flags:0
[   32.502933] block drbd10100: uuid_compare()=1 by rule 70
...
[   32.506619] block drbd10100: Began resync as SyncSource (will sync 0 KB [0 
bits set]).
[   32.506645] block drbd10100: updated sync UUID 
AD2A6DB42BD2E1E7:8FF18D07624D26DA:8FF08D07624D26DA:34233B59A66A8C63
[   32.512254] block drbd10100: Resync done (total 1 sec; paused 0 sec; 0 K/sec)
[   32.512262] block drbd10100: updated UUIDs 
AD2A6DB42BD2E1E7::8FF18D07624D26DA:8FF08D07624D26DA
see here ^
[   32.512272] block drbd10100: conn( SyncSource -> Connected ) pdsk( 
Inconsistent -> UpToDate ) 
...

System is CentOS 7.1, fully updated.
drbd kernel module is version 8.4.6 from official tarball.
drbd-utils is version 8.9.3 from official tarball.

Both downloaded from http://oss.linbit.com/drbd/.

Regards,
// Veit



Re: [DRBD-user] Bug? GI UUIDs differ by 1

2015-11-02 Thread Veit Wahlich
Hi Roland,

thank you for your reply.

May I suggest updating the documentation accordingly?

Best regards,
// Veit

> Hi,
> 
> it is expected behavior, don't worry, everything is fine. The lowest bit
> is used to encode the role (1 == primary, 0 == secondary IIRC).
> 
> Regards, rck



Re: [DRBD-user] Determining whether a resource is in dual-primary mode

2015-11-02 Thread Veit Wahlich
On Monday, 02.11.2015 at 15:53 +0100, Lars Ellenberg wrote:
> What's wrong with
> # drbdsetup XYZ show --show-defaults 

That is exactly what I was looking for, thank you!

Just a little more complicated to parse than I hoped for, but feasible
without problems.

Regards,
// Veit



Re: [DRBD-user] Determining whether a resource is in dual-primary mode

2015-11-02 Thread Veit Wahlich
On Monday, 02.11.2015 at 17:14 +0100, Lars Ellenberg wrote:
> res=XYZ
> if drbdsetup show $res | grep -q allow-two-primaries; then
>   echo "Two primaries are allowed."
> else
>   echo "Two primaries are not allowed."
> fi
> 
> complicated to parse?
> where?

Well, for safety reasons I thought about something more sophisticated: a
parser that understands the structure of drbdsetup's configuration
output and converts sections and attributes into a neat data structure
that can be addressed directly.
I will use the captured data for other things too, in this case (live
migration) to determine the protocol before setting it to C, so that I can
re-set it to its previous value after migration. E.g. in case some VMs
use protocol B in the future, after live migration I can set
it back to B without having to call drbdadm adjust.
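For illustration, a minimal shell sketch of the simple variant (using the
"drbdsetup XYZ show --show-defaults" form quoted above; the resource name XYZ
is a placeholder, and this is of course no substitute for a real parser):

res=XYZ

# allow-two-primaries only shows up in the output when it is enabled
if drbdsetup "$res" show | grep -q allow-two-primaries; then
    echo "dual-primary is currently allowed"
fi

# remember the active protocol (A, B or C) so it can be restored after migration
proto=$(drbdsetup "$res" show --show-defaults | awk '$1 == "protocol" { sub(";", "", $2); print $2 }')
echo "active protocol: $proto"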

Best regards,
// Veit



Re: [DRBD-user] drbdadm verify always report oos

2016-05-18 Thread Veit Wahlich
Hi,

how did you configure the VMs' disk caches? In the case of qemu/KVM/Xen it
is essential for consistency to configure the cache as "writethrough"; any
other setting is prone to problems due to double writes, unless the OS
inside the VM uses write barriers.

Although write barriers are the default for many Linux distributions, they are
often disabled within VMs for performance reasons, e.g. by the
virt-guest profile in tuned.
Also, Linux swap does not support write barriers at all, meaning that
migrating a VM might cause not file system inconsistencies but memory
corruption inside the VM, leading to unpredictable results.

Windows OS is also very prone to double-write problems.

If you use any VM cache configuration other than writethrough, please
consider switching to writethrough. The VMs will need to be restarted.
After all VMs have been restarted, use verify, disconnect and connect to
get rid of the oos sectors, then check with another verify whether they occur again.
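As a minimal sketch of that cycle (the resource name r0 is just a placeholder):

drbdadm verify r0        # runs online in the background; watch /proc/drbd until it finishes
drbdadm disconnect r0
drbdadm connect r0       # reconnecting resynchronizes the blocks that verify marked out of sync
drbdadm verify r0        # verify once more to see whether new oos blocks appear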

When migrating VMs between hosts, you may ignore warnings stating that
writethrough caching is unsafe for migration.

Google for this issue for further information.

Regards,
// Veit

On Wednesday, 18.05.2016 at 18:21 +0800, d tbsky wrote:
> hi:
> I am using drbd 8.4.7 which comes from epel under scientific linux
> 7.2.  when I try "drbdadm verify res", it reports there is oos.
> 
> so I disconnect/connect the resource, and the oos now becomes 0. but
> when I verify it again, it reports oos again.  the oos amount is
> different than previously, but the oos sector numbers are similar.
> 
> I also try "invalidate-remote" to resync all and then verify, but
> it still reports oos.
> 
> I don't know what happened, it seems my cluster has big problems
> with data consistency. but all the vms running above the drbd resources
> seem fine, and I migrate them between hosts many times.
> 
> is the behavior normal or should I replace the hardware now?
> 
>thanks for help!!
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user




Re: [DRBD-user] drbdadm verify always report oos

2016-05-18 Thread Veit Wahlich
Are you utilising SSDs?

Is the kernel log (dmesg) free of errors on the backing devices (including the
mdraid members/backing devices)? 

Did you verify the mdraid array consistency and are the array's members in sync?


-------- Original message --------
From: d tbsky 
Sent: 18 May 2016 18:27:00 CEST
To: Veit Wahlich 
CC: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] drbdadm verify always report oos

Hi:
I shut down the vm when I found the strange behavior, so the drbd
resync happens in an idle situation. I have tried to play with config options and
resync about 15 times, but still cannot get verify to report 0 oos.

   I have about 10 resources which have the verify problem. but it's strange
that some resources are ok. the largest resource is about 1T and it is
ok. the resource I am testing now is only 32G.

  the host structure is: two sata disks (mdadm raid 1) -> lvm -> drbd

  the host has ecc ram so the memory is ok. the most confusing part
is that the resync data is very small, it is only a few kilobytes and the sync is
done in 1 second. I don't know why it can not be synced correctly. I
also tried to run the command "md5sum /dev/drbdX" on both nodes to check;
they are indeed different.




Re: [DRBD-user] drbdadm verify always report oos

2016-05-19 Thread Veit Wahlich
Hi,

well, if it still occurs while the resources are not accessed and thus
no data is transferred at all except for the resync and verify, I
suspect a surface- or mapping-related storage hardware/firmware issue to
be the culprit, as this would also explain why this issue occurs on the
same resources again and again but never on others.

Another thought: I would suspect that a rogue process on the host system
alters the backing devices directly, e.g. LVM accessing some DRBD
resources' backing devices. You might want to use the "pvs" command to
verify that LVM does not incorporate the backing devices at all.
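A quick check could look like this (the device names are hypothetical; the
point is that no DRBD backing LV and no /dev/drbdX minor should ever show up
as a PV):

# list everything LVM currently recognises as a physical volume
pvs -o pv_name,vg_name

# probe a specific suspect backing device directly
pvs /dev/vg0/vm-disk-backing && echo "WARNING: LVM sees this backing device as a PV"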

Regards,
// Veit

On Thursday, 19.05.2016 at 10:09 +0800, d tbsky wrote:
> Hi:
> 
> it is not ssd. it is just two 2TB sata hard disks. mdadm is
> checked every week and I don't see any error report under dmesg.
> there are 20 VMs running above it and they seems normal. but  I wonder
> it will be normal again after so many verify/resync. so I just pick a
> test-vm to try. still trying the config options to see if I can get it
> resync. I can also use dd to resync it but then the problem may
> disappear and I won't know what happened.
> 
>any suggestions to find out what happened?
> 
> 2016-05-19 4:50 GMT+08:00 Veit Wahlich :
> > Are you utilising SSDs?
> >
> > Is the kernel log (dmesg) clean from errors on the backing devices (also 
> > mdraid members/backing devices)?
> >
> > Did you verify the mdraid array consistency and are the array's members in 
> > sync?
> >
> >
> > -------- Original message --------
> > From: d tbsky 
> > Sent: 18 May 2016 18:27:00 CEST
> > To: Veit Wahlich 
> > CC: drbd-user@lists.linbit.com
> > Subject: Re: [DRBD-user] drbdadm verify always report oos
> >
> > Hi:
> > I shutdown the vm when I found the strange behavior. so the drbd
> > is resync under idle situation. I try to play with config options and
> > resync about 15 times, still can not get verify report 0 oos.
> >
> >I have about 10 resource which has verify problem. but it's strange
> > that some resources are ok. the largest resource is about 1T and it is
> > ok. the resource I am testing now is only 32G.
> >
> >   the host structure is: two sata disk (mdadm raid 1) -> lvm -> drbd
> >
> >   the host has ecc ram so the memory is ok. the most confusing  part
> > is the resync data is very small, it is only a few kilo bytes and sync
> > done in 1 seconds. I don't know why it can not be synced correctly. I
> > also try to run command "md5sum /dev/drbdX" at both node to check.
> > they are indeed different.
> >
> >
> > ___
> > drbd-user mailing list
> > drbd-user@lists.linbit.com
> > http://lists.linbit.com/mailman/listinfo/drbd-user
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user




Re: [DRBD-user] DRBD9: full-mesh and managed resources

2016-08-18 Thread Veit Wahlich
On Thursday, 18.08.2016 at 12:33 +0200, Roberto Resoli wrote:
> On 18/08/2016 10:09, Adam Goryachev wrote:
> > I can't comment on the DRBD related portions, but can't you add both
> > interfaces on each machine to a single bridge, and then configure the IP
> > address on the bridge. Hence each machine will only have one IP address,
> > and the other machines will use their dedicated network to connect to
> > it. I would assume the overhead of the bridge inside the kernel would be
> > minimal, but possibly not, so it might be a good idea to test it out.
> 
> Very clever suggestion!
> 
> Many thanks, will try and report.

If you try this, take care to enable STP on the bridges, or this will
create loops.
STP will also give you redundancy in case a link breaks and will try to
determine the shortest path between nodes.

But the shortest path is not guaranteed, especially after recovery from
a network link failure.
You might want to monitor each node for the shortest path.
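For example, STP can be switched on per bridge like this (a sketch assuming a
hypothetical bridge name br0):

# with iproute2
ip link set dev br0 type bridge stp_state 1

# or with the older bridge-utils
brctl stp br0 on

# inspect the resulting spanning-tree state and port roles
brctl showstp br0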



Re: [DRBD-user] DRBD9: full-mesh and managed resources

2016-08-19 Thread Veit Wahlich
Hi Dan,

On Thursday, 18.08.2016 at 10:33 -0600, dan wrote:
> Simplest solution here is to overbuild.  If you are going to do a
> 3-node 'full-mesh' then you should consider 10G ethernet (a Mellanox w/
> cables on ebay is about US$20!).  Then you just enable STP
> on all the bridges and let it be.  If you are taking 2 hops, that
> should still be well over the transfer rates you need for such a small
> cluster and STP will eventually work itself out.

I agree that it would work fine even when not in the optimal state.
But my least-hop consideration was more about latency and unexpected CPU
overhead for bridging, and about not simply trusting the system to return to
the optimal state automatically.

At least for my own applications, I need to know, not to trust. Thus I
suggest monitoring the state.

Best regards,
// Veit



Re: [DRBD-user] Out-of-sync woes

2017-08-04 Thread Veit Wahlich
Hi Luke,

I assume you are experiencing the results of data inconsistency caused by
in-flight writes. This means that a process (here your VM's qemu) can
change a block that is already waiting to be written to disk.
Whether this happens (undetected) or not depends on how the data is
accessed for writing and synced to disk.

For qemu, you have to consider two factors: the guest OS's file system
configuration and qemu's disk caching configuration.
On Linux guests, this usually only happens for guests with file systems
that are NOT mounted either sync or with barriers, and for block-backed
swap.
On Windows guests it always happens.
For qemu it depends on how the disk caching strategy is configured and
thus whether it allows in-flight writes or not.

The common approach is to configure qemu for writethrough caching for
all disks and leave your guests' OS unchanged. You will also have to
ignore/override libvirt's warning about unsafe migration with this cache
setting, as the warning only applies to file-backed VM disks, not
blockdev-backed ones.
I use this for hundreds of both Linux and Windows VMs backed by DRBD
block devices and have had no inconsistency problems at all since this
change.
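With libvirt, overriding that warning boils down to something like the
following sketch (domain name and destination host are hypothetical; the
--unsafe option of virsh migrate skips exactly this cache-related check):

virsh migrate --live --unsafe vm123 qemu+ssh://other-host/system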

Changing qemu's caching strategy might affect performance.
For performance reasons you are advised to use a hardware RAID
controller with battery-backed write-back cache.

For consistency reasons you are advised to use real hardware RAID, too,
as the in-flight block changing problem described above might also
affect mdraid, dmraid/FakeRAID, LVM mirroring, etc. (depending on
configuration).

Best regards,
// Veit


On Friday, 04.08.2017 at 11:11 +1200, Luke Pascoe wrote:
> Hello everyone.
> 
> I have a fairly simple 2-node CentOS 7 setup running KVM virtual
> machines, with DRBD 8.4.9 between them.
> 
> There is one DRBD resource per VM, with at least 1 volume each,
> totalling 47 volumes.
> 
> There's no clustering or heartbeat or other complexity. DRBD has its
> own Gig-E interface to sync over.
> 
> I recently migrated a host between nodes and it crashed. During
> diagnostics I did a verification on the drbd volume for the host and
> found that it had _a lot_ of out of sync blocks.
> 
> This led me to run a verification on all volumes, and while I didn't
> find any other volumes with large numbers of out of sync blocks, there
> were several with a few. I have disconnected and reconnected all these
> volumes, to force them to resync.
> 
> I have now set up a nightly cron which will verify as many volumes as
> it can in a 2 hour window, this means I get through the whole lot in
> about a week.
> 
> Almost every night, it reports at least 1 volume which is out-of-sync,
> and I'm trying to understand why that would be.
> 
> I did some research and the only likely candidate I could find was
> related to TCP checksum offloading on the NICs, which I have now
> disabled, but it has made no difference.
> 
> Any suggestions what might be going on here?
> 
> Thanks.
> 
> Luke Pascoe
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user




Re: [DRBD-user] Imported vm to kvm does not show interface

2017-08-04 Thread Veit Wahlich
Hi Dirk,

this is most likely caused by a NIC model not supported by the guest OS's
drivers, but this is also totally off-topic, as this is the DRBD
ML. I suggest you consult the KVM/qemu/libvirt MLs on this topic
instead.

Best regards,
// Veit

On Friday, 04.08.2017 at 13:59, Dirk Lehmann wrote:
> Hello,
> 
> 
> I exported one of my virtual machines from Oracle VirtualBox to OVA
> 2.0 and converted this file to KVM qcow2 as described, for example, in
> this tutorial:
> 
> 
> https://utappia.org/2016/04/20/how-to-migrate-virtual-box-machines-to-the-kvm-virtmanager/
> 
> 
> Unfortunately this vm does not bring up eth0 when started by KVM.
> 
> 
> Any hint how to fix this so I get it migrated to KVM with DRBD and pacemaker HA?
> 
> 
> Best regards,
> 
> 
> Dirk
> 
> 
> 
> ---
> 
> 
> Dirk Lehmann
> 
> Informatikkaufmann (IHK)
> 
> Groppstraße 11
> 
> 97688 Bad Kissingen
> 
> Telefon (0971) 121 922 56
> 
> Telefax (0971) 121 922 58
> 
> Webseite www.so-geht-es.org
> 
> 
> 
> 
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user




Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-08-18 Thread Veit Wahlich
Hi,

On Friday, 18.08.2017 at 14:16 +0200, Gionatan Danti wrote:
> Hi, I plan to use a primary/secondary setup, with manual failover.
> In other words, split brain should not be possible at all.
> 
> Thanks.

having one DRBD resource per VM also allows you to run VMs on both
hosts simultaneously, enables VM live migration, and lets your hosts even
go into (planned or unplanned) DRBD disconnected situations without
interruption of service and with automatic recovery on reconnect.

This might be worth considering.

Best regards,
// Veit




Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-08-18 Thread Veit Wahlich
To clarify:

On Friday, 18.08.2017 at 14:34 +0200, Veit Wahlich wrote:
> hosts simultaneously, enables VM live migration, and lets your hosts even

VM live migration requires a primary/primary configuration of the DRBD
resource accessed by the VM, but only during migration. The resource
can be reconfigured for allow-two-primaries and the setting reverted
afterwards on the fly.



Re: [DRBD-user] DRBD over ZFS - or the other way around?

2017-08-18 Thread Veit Wahlich
On Friday, 18.08.2017 at 15:46 +0200, Gionatan Danti wrote:
> On 18-08-2017 14:40, Veit Wahlich wrote:
> > VM live migration requires a primary/primary configuration of the DRBD
> > resource accessed by the VM, but only during migration. The resource
> > can be reconfigured for allow-two-primaries and the setting reverted
> > afterwards on the fly.
> 
> Hi Veit, this is interesting.
> So you suggest to use DRBD on top of a ZVOLs?

Yes, I regard qemu -> DRBD -> volume management [-> RAID] -> disk as the
most recommendable solution for this scenario.

I personally go with LVM thinp for volume management, but ZVOLs should
do the trick, too. 

With named resources (named after the VMs) and multiple volumes per
resource (for multiple VM disks), this works very well for us for
hundreds of VMs.

Having a cluster-wide unified system for numbering VMs is very
advisable, as it allows you to calculate the ports and minor numbers for
both the DRBD and KVM/qemu configuration.

Example:
* numbering VMs from 0 to 999 as NNN, padded with leading zeros
* numbering volumes from 0 to 99 as DD, padded with leading zeros
* DRBD resource port: 10NNN
* VNC/SPICE unencrypted port: 11NNN
* SPICE TLS port: 12NNN
* DRBD minor: NNNDD

Let's say your VM gets number 123, it has 3 virtual disks and uses VNC:
* DRBD resource port: 10123
* VNC port: 11123
* DRBD minor of volume/VM disk 0: 12300
* DRBD minor of volume/VM disk 1: 12301
* DRBD minor of volume/VM disk 2: 12302
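As a small illustration, a shell sketch of that numbering scheme (a hypothetical
helper, not part of any DRBD tooling; pass plain decimal numbers):

# usage: vm_numbers <vm-number> <volume-number>
vm_numbers() {
    local vm vol
    vm=$(printf '%03d' "$1")      # VM number, zero-padded to 3 digits (NNN)
    vol=$(printf '%02d' "$2")     # volume number, zero-padded to 2 digits (DD)
    echo "DRBD resource port: 10${vm}"
    echo "VNC/SPICE port:     11${vm}"
    echo "SPICE TLS port:     12${vm}"
    echo "DRBD minor:         ${vm}${vol}"
}

# example: VM 123, volume 2
vm_numbers 123 2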

Best regards,
// Veit



Re: [DRBD-user] cronjob for verify

2017-10-09 Thread Veit Wahlich
Hi Jan,

On Sunday, 08.10.2017 at 13:07 +1300, Jan Bakuwel wrote:
> I'd like to include an automatic disconnect/connect on the secondary if 
> out-of-sync blocks were found but so far I haven't found out how I can 
> query drbd to find out (apart from parsing the log somehow). I hope 
> there's a simpler/cleaner way to find out.

besides the fact that it is wise to monitor for OOS blocks using methods like
those suggested by Robert, IMO it is advisable NOT to sync OOS blocks
automatically, as their occurrence usually indicates a problem with your
storage stack or its configuration.

Best regards,
// Veit




Re: [DRBD-user] cronjob for verify

2017-10-09 Thread Veit Wahlich
Hi Jan,

On Tuesday, 10.10.2017 at 06:56 +1300, Jan Bakuwel wrote:
> I've seen OOS blocks in the past where storage stack appeared to be fine 
> (hardware wise). What possible causes could there be? Hardware issues, bugs 
> in storage stack including DRBD itself, network issues. In most (all?) cases 
> it seems prudent to me to keep the resources in sync as much as possible and 
> of course investigate once alerted by the monitoring system.

a common configuration issue is using O_DIRECT for opening files or
block devices. O_DIRECT is used by userspace processes to bypass parts
of the kernel's I/O stack with the goal of reducing the CPU cycles required
for I/O operations and of eliminating/minimizing caching effects.
Unfortunately this also allows the content of buffers to be changed
while they are still "in flight", simply speaking, e.g. while being
read/mirrored by DRBD, software RAID, ...
O_DIRECT is generally used by applications that either want to
bypass caching, such as benchmarks, or that implement caching
themselves, which is the case for e.g. some DBMSs. But qemu (as used
by KVM and Xen) also implements several kinds of caching and uses O_DIRECT
for VM disks depending on the configured caching mode.

HtH,
// Veit



Re: [DRBD-user] cronjob for verify

2017-10-10 Thread Veit Wahlich
Hi Jan,

On Tuesday, 10.10.2017 at 13:11 +1300, Jan Bakuwel wrote:
> Thanks for that. Must say that possibility has escaped my attention so 
> far. I'm using DRBD in combination with Xen and LVM for VMs so I assume 
> O_DIRECT is in play here. Any suggestions where to go from here? A 
> search for DRBD, LVM, Xen and O_DIRECT doesn't seem to bring up any 
> results discussing this issue.

qemu uses O_DIRECT for all caching modes except "writethrough" and
"writeback", so both of these modes are safe to use with DRBD to prevent OOS
blocks.

However, it is controversial whether "writeback" is actually
migration-safe in shared or synchronized storage environments, as
some people assume that qemu does not properly sync data to disk before
handing over to the new host. (Some versions of) libvirt will also
report "writethrough" as unsafe for migration, although it has been
proven otherwise.
I have not analyzed possible issues with qemu's "writeback" cache
myself, as I regard it as unsafe anyway (it means trusting the guest OS
to properly do syncs). Do hardware write-back caching with a BBU on the
host system instead.

The caching mode can be set using the "cache=" attribute within qemu's
"-drive" parameters, so you can check which cache mode is currently used
by examining the command line parameters of running qemu processes.
Where to configure the cache mode depends on what invokes qemu in your
environment. Some environments use cache mode "none" as the default if the
VM's backing storage is a block device (such as an LVM LV) in order to avoid
caching.
If you are using libvirt, you can add/change the "cache" attribute of
the "driver" subnode within the "disk" node in the domain's XML
configuration.
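A quick way to check this on a running host might look like the following
sketch (the domain name vm123 is a placeholder; it assumes libvirt's virsh is
available):

# show the cache setting of every -drive parameter of running qemu processes
for pid in $(pgrep -f qemu); do
    tr '\0' '\n' < /proc/$pid/cmdline | grep -o 'cache=[a-z]*'
done

# or query the domain definition via libvirt
virsh dumpxml vm123 | grep -o "cache='[a-z]*'"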

Best regards,
// Veit



Re: [DRBD-user] Semantics of oos value, verification abortion

2017-12-27 Thread Veit Wahlich
Hi Christoph, 

I believe that, at least for synchronous replication with protocol C, the oos 
count should always be 0 in a healthy, fully synchronized configuration, and 
that any occurrence of a value >0 (except during currently running manual 
administrative tasks) indicates a problem that needs to be investigated. 
Therefore I regard an automated disconnect-connect, for the sole purpose of 
clearing the oos counter without determining the cause, as both a very bad idea 
and bad practice.
We have run hundreds of synchronously replicated DRBD8 volumes for years now 
that we verify weekly, but we have never seen oos that was not caused by a 
runtime, configuration or hardware issue.

Our verification runs utilise a script similar to yours, but it actively 
parallelises the task to optimise for minimum duration while maintaining a 
constant load that won't harm performance. It does so by sorting all volumes by 
size and then running a given number of verify tasks at once, beginning with the 
largest volumes and starting the next verify once one finishes. Especially on 
machines that have a few very big volumes and lots of small ones, this allows 
the verification of all volumes to complete in the time the big volumes take 
alone, i.e. minimal duration at a constant I/O load without peaks. The script 
prints a report to stdout and any occurrence of oos to stderr, making it easy 
to filter for problems -- even before monitoring notices. 
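A very condensed sketch of the scheduling idea (the sorting helper is
hypothetical, and the real script additionally handles reporting and oos
evaluation):

verify_one() {
    # start an online verify, then wait for it to finish by polling the connection state
    drbdadm verify "$1"
    while [ "$(drbdadm cstate "$1")" != "Connected" ]; do
        sleep 30
    done
}
export -f verify_one

# sort_resources_by_size_desc is assumed to print one resource name per line,
# largest backing volume first; xargs keeps at most 3 verifies running at a time
sort_resources_by_size_desc | xargs -n1 -P3 -I{} bash -c 'verify_one "$1"' _ {}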

Best regards, 
// Veit 


-------- Original message --------
From: Christoph Lechleitner 
Sent: 28 December 2017 01:05:30 CET
To: drbd-user 
CC: Wolfgang Glas 
Subject: [DRBD-user] Semantics of oos value, verification abortion

Hello everybody!


I have a question regarding the exact semantics of the oos value in
/proc/drbd.


The Users Guide
  https://docs.linbit.com/doc/users-guide-84/ch-admin/
says:
  "oos (out of sync). Amount of storage currently out of sync; in
Kibibytes. Since 8.2.6."

After several unsettling events over the years we have now started to
do regular verify runs.

We will announce our script as open source right here at some point in
the future, but we want to clarify some details first.

Our script basically calls
  drbdadm verify
on one resource at a time, because
  drbdadm verify all
would kill the system for sure.

After the verification run has completed, the script
- analyses the oos: value,
- eventually disconnects & connects the resource
- starts verification of the next resource

The script does not run as daemon, it's simply called regularily via
cron, on the node with the more important resources.


My main question is:

Should the oos value always be 0?

Does a non-0 value of oos mean that there have been sync errors?

Or does oos include blocks that are currently being synced or waiting
to be synced, too?

In the latter case, what would be a valid condition to disconnect &
connect a resource after a verification run?


Also: Are there events that can cause a verification run to be aborted?

One verification run on a huge resource (1.3 TB, HW RAID 5, dedicated
GBit line) was finished way too fast, so I think something must have
aborted it, like, say,
- a buffer runs full
-> automatic disconnect/reconnect
-> verification aborted

If something along this line is possible, is there a way to avoid or
detect that?
Maybe a kernel message we could grep for?


Thanks,

Regards,

Christoph


-- 

Christoph Lechleitner

Geschäftsführung


ITEG IT-Engineers GmbH | Conradstr. 5, A-6020 Innsbruck
Mail: christoph.lechleit...@iteg.at | Web: http://www.iteg.at/




Re: [DRBD-user] Semantics of oos value, verification abortion

2017-12-28 Thread Veit Wahlich
Hi again,

well, O_DIRECT is Linux specific and only used in very special cases, thus 
mostly not used by default or at least deactivatable. To be fair, this is not a 
DRBD problem but a Linux one, as Linux allows user space processes to bypass 
parts of the kernel for tuning -- you may call it a fast path, but actually it 
is a bypass including side effects. In the past there have even been several 
discussions in the Linux kernel community on dropping O_DIRECT support (or 
replacing it with something more sane), but it was kept for what I believe are 
don't-break-the-userland compatibility reasons.

I am quite confident this will only apply to a few of your containers. So the 
actual task is to identify them, which might already have been done due to the 
issues you encountered.

Also I suppose that LXC allows running a process in some kind of super context, 
just like Linux VServers/Linux Secure Contexts and OpenVZ do, so a process on 
the host can see all processes from all contexts at once. Running lsof 
for diagnostics in this super context should therefore give you a list of all 
files currently open. Use the +fG option to add a column to the listing that 
shows all the flags used to open the files -- O_DIRECT (0x4000 on x86_64) is 
ORed bitwise into the other flags. You might want to use +fg instead, which 
decodes the flags, but the decoded flags are abbreviated and I do not know from 
memory what abbreviation is used for O_DIRECT. 
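As an alternative to lsof, a sketch that reads the open-file flags directly
from /proc (per proc(5) the flags value in fdinfo is octal; O_DIRECT is 040000
octal / 0x4000 on x86_64, other architectures may differ):

# scan all processes' open file descriptors for the O_DIRECT flag
for fdinfo in /proc/[0-9]*/fdinfo/*; do
    flags=$(awk '/^flags:/ { print $2 }' "$fdinfo" 2>/dev/null)
    [ -n "$flags" ] || continue
    # bitwise test for the O_DIRECT bit; the leading 0 makes bash treat the value as octal
    if [ $(( 0$flags & 0040000 )) -ne 0 ]; then
        pid=${fdinfo#/proc/}; pid=${pid%%/*}
        echo "PID $pid ($(cat /proc/$pid/comm 2>/dev/null)): $fdinfo opened with O_DIRECT"
    fi
done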

O_DIRECT might also be safe in combination with O_SYNC, as I suppose O_SYNC 
prevents in-flight changes of buffers by blocking writes until the data is 
processed, but that question should be asked of and answered by someone with 
more Linux kernel (source code) expertise.

Best regards, 
// Veit 


-------- Original message --------
From: Christoph Lechleitner 
Sent: 28 December 2017 21:14:37 CET
To: drbd-user 
CC: Veit Wahlich , Wolfgang Glas 
Subject: Re: [DRBD-user] Semantics of oos value, verification abortion

On 2017-12-28 13:32, Veit Wahlich wrote:
> Hi Christoph, 
> 
> I do not have experience with the precise functioning of LXC disk storage, 
> but I assume that every operation that could cause oos applies to every 
> application running inside the LXC containers, too.
> 
> A common cause, that I suspect here, is opening a file (or block device) 
> using O_DIRECT. This flag is used to reduce I/O latency and especially bypass 
> the page cache, but it also allows buffers to be modified in-flight while 
> they are processed by e.g. DRBD. So not only DRBD is affected by this, but 
> also software RAID such as mdraid, dmraid or lvmraid, and I bet even block 
> caching such as bcache.

Are you serious?

Can someone from linbit please comment on this?

This would basically mean that DRBD is useless whenever an application
opens files with O_DIRECT!?

How could a fast path to user space render the replication of the
underlying block device useless?


> In most cases O_DIRECT is used by applications such as some DBMS to avoid 
> caching by the kernel, as they implement their own cache or do not want the 
> kernel to sacrifice memory on page caching as the data written will not be 
> used again.
> 
> So my recommendation is to check your logs/monitoring if the oos has only 
> occurred repeatedly on certain containers, and then inspect the applications' 
> configuration running inside for the use of O_DIRECT (which can usually be 
> disabled).
> If it has been occurring on all your containers, I would suspect your LXC 
> configuration itself as the cause, such as an overlay filesystem or container 
> image. 

Checking 1000s of applications in 100s of containers is NOT an option.


Regards, Christoph


Re: [DRBD-user] KAISER healing? Re: Semantics of oos value, verification abortion

2018-01-19 Thread Veit Wahlich
Hi Christoph, 

this might also be caused by other patches backported to make the KPTI patches 
work with your running kernel. 

But indeed I think it is possible that the KPTI patches sail around your 
problem, as e.g. syscalls now block longer and prevent your application from 
manipulating buffers. Your kernel (or tunables profile) might also come with 
changed default values for process scheduling, e.g. serving longer time slices 
to (kernel) processes or changing the CPU migration rate to minimise the number 
of context switches in order to lower the impact of the Meltdown/Spectre 
patches, thus simply giving a higher probability of finishing to process the 
data before another process is scheduled that might change it in-flight. I 
could think of even a few more scenarios. 

Best regards, 
// Veit 


-------- Original message --------
From: Christoph Lechleitner 
Sent: 20 January 2018 01:29:38 CET
To: drbd-user@lists.linbit.com
CC: Wolfgang Glas 
Subject: [DRBD-user] KAISER healing? Re: Semantics of oos value, verification 
abortion

On 31.12.17 at 17:21, Christoph Lechleitner wrote:
> On 2017-12-29 23:07, Christoph Lechleitner wrote:
>> On 2017-12-29 00:49, Christoph Lechleitner wrote:
>>
>> What's worse and slightly ALARMING:
>> When this occurs, an eventual verification is aborted!
> 
> Maybe it's not that alarming.
> 
> After I failed to find anything in the logfiles of an LXC guest that is
> almost unused but regularly affected, I found out those
>   "buffer modified by upper layers during write"
> might be false positives, in swap or append situations.
> 
> While we don't use DRBD for swap, appending to logfiles happens all the
> time.
> 
> I'll follow the recommendation there:
> - switch off data-integrity-alg
> - keep up the regular verify runs

This "works" in the sense that the verify runs now seem to finish (as
opposed to abort).

Although I wouldn't call it "working" as long as certain LXC guest
partitions continue to produce oos-Blocks on a daily basis.

We already have set up an appointment with a kernel expert to find
O_DIRECT "sinners" (by means of the kernel's features for function call
tracing).


BUT, the KAISER/KPTI patches seem to prevent further oos-Blocks.

Or maybe the inconsistencies go undetected now, maybe even hitting the
primary node ;-)

Next update in a few weeks ...


Regards, Christoph



Re: [DRBD-user] Access to the slave node

2018-03-15 Thread Veit Wahlich
Hi Ondrej, 

yes, this is perfectly normal in single-primary environments. DRBD simply does 
not permit access to a resource's block devices until the resource is promoted 
to primary. What you describe would only work in dual-primary environments, but 
running such an environment also requires a lot more precautions than 
single-primary in order not to endanger your data. Also remember that for many 
(most?) filesystems, even mounting read-only does not mean that no data is 
altered; at least metadata such as "last mounted" attributes is still written, 
and a journal replay might occur. As the fs on the primary keeps being updated 
while your read-only side does not expect this, your ro mount will most likely 
read garbage at some point and might even freeze the system.

There are only a few scenarios that prevent such situations, and I regard the 
two following as the most useful ones:

a) Implement a dual-primary environment running a cluster filesystem such as 
GFS or OCFS2 on top -- this is hard work to learn and build, offers lots of 
pitfalls that put your data in danger and is currently limited to 2 nodes, but 
it even allows writing to the fs from both sides.

b) Build a single-primary environment like your existing one, but use a block 
layer that allows snapshots (e.g. classic LVM, LVM thinp or ZFS) to place your 
DRBD backing devices upon -- when you need to access the primary's data from a 
secondary, take a snapshot of its backing device on the secondary and mount the 
snapshot instead of the DRBD volume.

Addendum to b): This reflects the state of the fs only at the point in time the 
snapshot was created. You will even be able to mount the snapshot rw without 
affecting the DRBD volume. If you use a backing device with internal metadata, 
this metadata will also be present in the snapshot, but most (if not all) Linux 
filesystems will ignore any data at the end of the block device that is beyond 
the fs' actual size. The snapshot will grow as data is written to the DRBD 
volume and, depending on the snapshot implementation and block size/pointer 
granularity, will slow down writes to both the DRBD volume and the snapshot for 
as long as the snapshot exists (due to copy-on-write and/or pointer tracking). 
So only choose this scenario if you need to read data from the secondary for a 
limited time (such as for backups), or you are willing to renew the snapshot 
on a regular basis, or you can afford to possibly sacrifice a lot of storage 
and write performance on this. 
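A minimal sketch of scenario b) with classic LVM (VG/LV names and snapshot size 
are hypothetical; run this on the secondary):

# snapshot the DRBD backing LV and mount the snapshot -- not the DRBD device itself
lvcreate --snapshot --name r0_snap --size 10G /dev/vg0/r0_backing
mount -o ro /dev/vg0/r0_snap /mnt/r0-snapshot

# when done, clean up to stop the copy-on-write overhead
umount /mnt/r0-snapshot
lvremove /dev/vg0/r0_snap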

Best regards,
// Veit 


-------- Original message --------
From: Ondrej Valousek 
Sent: 15 March 2018 11:21:49 CET
To: "drbd-user@lists.linbit.com" 
Subject: [DRBD-user] Access to the slave node

Hi list,

When trying to mount the filesystem on the slave node (read-only, I do not want 
to crash the filesystem), I am receiving:

mount: mount /dev/drbd0 on /brick1 failed: Wrong medium type

Is it normal? AFAIK it should be OK to mount the filesystem read-only on the 
slave node.
Thanks,

Ondrej




Re: [DRBD-user] primary reverts to secondary after reboot

2018-05-18 Thread Veit Wahlich
Hi Nico,

if you want DRBD to promote one node automatically at boot-up, your
resource file(s) will need a startup{} section featuring a
become-primary-on statement.
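
For example, based on the r0 resource from your configuration below (a sketch;
become-primary-on takes a host name or, for dual-primary setups, "both"):

startup {
    wfc-timeout  15;
    degr-wfc-timeout 60;
    become-primary-on core1-spc;   # promote this node automatically after boot
}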

If your setup is not yet in production, you might consider using a single
DRBD resource per VM instead of a replicated filesystem, so you could
live-migrate VMs between the hosts and let libvirt promote/demote the
resource when the VM is powered on/off using a handler script.
You do not need a permanent dual-primary configuration for this, but
only during live migration and only for the resource of the VM being
currently migrated.

Best regards,
// Veit


On Wednesday, 16.05.2018 at 10:20 +0200, Nico De Ranter wrote:
> 
> 
> Hi all,
> 
> 
> I'm trying to create a simple setup containing 2 servers.  One server
> has a filesystem on /dev/drbd0 mounted as /var/lib/libvirt/. The drbd
> disk is synchronised to a second 'passive' server.  If something goes
> wrong with the primary a script should be run manually on the
> secondary to promote it to primary and restart all VM's.  I do not
> intend this to be automatic.
> 
> 
> The initial configuration seems to run fine until I reboot the primary
> server.  After the primary reboots the drbd0 resource is set to
> Secondary/Secondary. I need to manually promote it back to Primary and
> restart all my services.  
> 
> cat /proc/drbd 
> version: 8.4.5 (api:1/proto:86-101)
> srcversion: 4B3E2E2CD48CAE5280B5205 
>  0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-
> ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
> 
> 
> How can I prevent this from happening?  I want my primary to remain
> primary until I manually promote the secondary in which case the
> original primary is considered dead and will be wiped.
> 
> content of /etc/drbd.d/test.res
> resource r0 {
> protocol C;
> startup {
> wfc-timeout  15;
> degr-wfc-timeout 60;
> }
> syncer {
> rate 200M;
> al-extents 1801;
> }
> net {
> cram-hmac-alg sha1;
> shared-secret "somesillypassword";
> max-buffers8000;
> max-epoch-size8000;
> }
> on core1-spc {
> address 10.0.0.1:7788;
> device /dev/drbd0;
> disk /dev/md3;
> meta-disk internal;
> }
> on core2-spc {
> address 10.0.0.2:7788;
> device /dev/drbd0;
> disk /dev/md3;
> meta-disk internal;
> }
> } 
> 
> 
> content of /etc/fstab
> ...
> /dev/drbd0/var/lib/libvirtext4 defaults,_netdev
> 0   0
> 
> 
> 
> 
> 
> -- 
> 
> Nico De Ranter
> Operations Engineer
> 
> T. +32 16 38 72 10
> 
> 
> 
> 
> 
> 
> eSATURNUS
> Romeinse straat 12
> 3001 Leuven – Belgium
> 
> 
> T. +32 16 40 12 82
> F. +32 16 40 84 77
> www.esaturnus.com
> 
> 
> 
> For Service & Support 
> 
> Support Line: +32 16 387210 or via email : supp...@esaturnus.com
> 
>  
> 
> 
> 
> 
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user




Re: [DRBD-user] How long will the primary DRBD continue to run without a secondary?

2018-06-18 Thread Veit Wahlich
Hi GC,

keeping track of changed blocks is implemented using bitmaps, which are
part of the metadata on both sides. These bitmaps are always of full
size, so they will not grow just because blocks change.

So unless you have added something that consumes resources, e.g. event
handlers that create LVM snapshots when the peers disconnect, a
single peer may run indefinitely.

Best regards,
// Veit

On Thursday, 14.06.2018 at 19:07 -0500, G C wrote:
> I know the secondary is the DR and the primary keeps track of what is
> changing but how long can it continue to run in this state before it will
> cause it to crash or not be writable any longer?  I would gather that at
> some point it would cause a lack of resources to write the changes to but
> I'm not sure and can't find any information related to it.
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user




Re: [DRBD-user] Updating Kernel w/out Updating DRBD

2018-06-22 Thread Veit Wahlich
Hi Eric,

if your distro is el (e.g. RHEL/CentOS/Scientific), the kernel ABI
*should* not change during kernel updates, and copying modules from
older kernel versions as "weak updates" is not uncommon, following the
slogan "old module is better than no module". This is for example the
case for CentOS 7 and worked quite well in the past, unfortunately with
upgrade to 7.5 the ABI changed nevertheless and caused many systems even
to crash when using some old modules, including drbd.

If you build the module on the system that runs it, you might consider
installing/building a dkms or akmod package of drbd instead, along with
dkms/akmod itself. When booting a new kernel, dkms/akmod will check
whether the packaged modules already exist for the running kernel, and
if not, they will be built and installed. This works as long as the
module source builds well against the kernel source/headers provided and
all dependencies and build tools are present.

Regards,
// Veit

On Friday, 22.06.2018 at 04:38, Eric Robinson wrote:
> Greetings -
> 
> We always build drbd as a KLM, and it seems that every time we update the 
> kernel (with yum update) we have to rebuild drbd. This is probably the 
> worlds's dumbest question, but is there a way to update the kernel without 
> having to rebuild drbd every time?
> 
> --Eric
> 
> 
> 
> 
> 
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user




Re: [DRBD-user] Updating Kernel w/out Updating DRBD

2018-06-22 Thread Veit Wahlich
Well, I assume you are on el7 here. Adapt to other distros if required.

1. Install dkms, for el7 it is available in EPEL:

# yum install dkms

2. Untar the drbd tarball in /usr/src/, for drbd 8.4.11-1, you should
now have a directory /usr/src/drbd-8.4.11-1/.

3. Create a file /usr/src/drbd-8.4.11-1/dkms.conf with this content:

PACKAGE_NAME="drbd"
PACKAGE_VERSION="8.4.11-1"
MAKE="make -C drbd KDIR=/lib/modules/${kernelver}/build"
BUILT_MODULE_NAME[0]=drbd
DEST_MODULE_LOCATION[0]=/kernel/drivers/block
BUILT_MODULE_LOCATION[0]=drbd
CLEAN="make -C drbd clean"
AUTOINSTALL=yes

4. Register drbd with dkms, so dkms knows about it:

# dkms add -m drbd -v 8.4.11-1

5. Build the module of the desired version for the current kernel:

# dkms build -m drbd -v 8.4.11-1

6. Install the module of the desired version to the kernel's module
tree:

# dkms install -m drbd -v 8.4.11-1

You should now be able to use drbd.

dkms installs a hook that will automatically rebuild the module once you
install a new kernel{,-devel} package.
On rpm-based distros (maybe also others, I have not tested) and
depending on configuration, dkms also builds rpms for the new kmods, so
all files dkms writes are registered with the package management.

If you want to remove a dkms installed module, you may simply use:

# dkms remove -m drbd -v 8.4.11-1 --all

--all removes the module from all kernel module trees.

Starting with drbd 9.0, the source tarball also includes an almost
ready-to-use dkms.conf file in the debian/ subdir. It is not specific to
Debian. You may want to copy it to .. and edit the module version
number.
Please note that drbd 9.0 has 2 kernel module files (drbd.ko and
drbd_transport_tcp.ko) and the module source moved to src/drbd/, so
with drbd 9.0 use the dkms.conf file provided with the tarball
instead of my example dkms.conf above.

Best regards,
// Veit


On Friday, 22.06.2018 at 08:43, Eric Robinson wrote:
> I'm familiar with the --with-km switch when building drbd, but I don't see 
> anything in the documentation that allows building an akmod or dkms version 
> instead. How would I do that?
> 
> Also, I find it odd that the option to build from source is only in the DRBD 
> 8.3 User Guide and was left out of the 8.4 and 9.X User Guides. (I'm sure the 
> reason is obvious to everyone else I just missed something.) 
> 
> --Eric





Re: [DRBD-user] Cluster split after short network outage

2018-07-12 Thread Veit Wahlich
Hi Roman,

what you experienced is the expected behaviour of a primary-primary
setup when the nodes are disconnected from each other. It is
called a split-brain situation and it ensures that data stays
available/accessible on both sides without further corruption.

Usually you want to set up a STONITH configuration that performs a hard
shut-down or at least a network isolation of one of the hosts if such a
situation occurs, so the surviving side is free to restart the services
that resided on the other side before the split-brain occurred.

You also might want to set up redundant networking, especially when
running a primary-primary configuration.

To resolve the split-brain, you need to dismiss the data of one side by
forcing a resync with the other side as the source. If you have data changes
on both sides, you might first want to copy the changes from the side to be
discarded over to the future source, usually at file level, or in the case
of a shared LVM PV, at LV level.
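A rough sketch of the manual resolution steps, using the resource name "dhcp"
from your log (based on the split-brain recovery procedure in the user's guide;
please double-check the exact syntax against your DRBD 9 utilities):

# on the node whose changes will be discarded (the split-brain "victim")
drbdadm disconnect dhcp
drbdadm secondary dhcp
drbdadm connect --discard-my-data dhcp

# on the surviving node, reconnect if it is standalone
drbdadm connect dhcp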

You might also want to reconsider whether a primary-primary
configuration really suits your needs best.

Best regards,
// Veit

On Thursday, 12.07.2018 at 12:52 +0300, Roman Makhov wrote:
> Hello,
> 
> I discovered the "Cluster is now split" message in log and moving to
> StandAlone then after short (about 8 seconds) network failure between
> cluster nodes.
> 
> Would you please to suggest something?
> 
> Thank you in advance!
> 
> The drbd log is:
> =
> [Sat Jul  7 21:02:22 2018] drbd dhcp dhcp-master.dhcp: PingAck did not
> arrive in time.
> [Sat Jul  7 21:02:22 2018] drbd dhcp dhcp-master.dhcp: conn( Connected
> -> NetworkFailure ) peer( Primary -> Unknown )
> [Sat Jul  7 21:02:22 2018] drbd dhcp/0 drbd0: disk( UpToDate -> Consistent )
> [Sat Jul  7 21:02:22 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: pdsk(
> UpToDate -> DUnknown ) repl( Established -> Off )
> [Sat Jul  7 21:02:22 2018] drbd dhcp dhcp-master.dhcp: ack_receiver terminated
> [Sat Jul  7 21:02:22 2018] drbd dhcp dhcp-master.dhcp: Terminating
> ack_recv thread
> [Sat Jul  7 21:02:22 2018] drbd dhcp: Preparing cluster-wide state
> change 1083152536 (1->-1 0/0)
> [Sat Jul  7 21:02:22 2018] drbd dhcp: Committing cluster-wide state
> change 1083152536 (2ms)
> [Sat Jul  7 21:02:22 2018] drbd dhcp/0 drbd0: disk( Consistent -> UpToDate )
> [Sat Jul  7 21:02:22 2018] drbd dhcp dhcp-master.dhcp: Connection closed
> [Sat Jul  7 21:02:22 2018] drbd dhcp dhcp-master.dhcp: conn(
> NetworkFailure -> Unconnected )
> [Sat Jul  7 21:02:22 2018] drbd dhcp dhcp-master.dhcp: Restarting
> receiver thread
> [Sat Jul  7 21:02:22 2018] drbd dhcp dhcp-master.dhcp: conn(
> Unconnected -> Connecting )
> [Sat Jul  7 21:02:30 2018] drbd dhcp dhcp-master.dhcp: Handshake to
> peer 0 successful: Agreed network protocol version 112
> [Sat Jul  7 21:02:30 2018] drbd dhcp dhcp-master.dhcp: Feature flags
> enabled on protocol level: 0x7 TRIM THIN_RESYNC WRITE_SAME.
> [Sat Jul  7 21:02:30 2018] drbd dhcp dhcp-master.dhcp: Starting
> ack_recv thread (from drbd_r_dhcp [28952])
> [Sat Jul  7 21:02:30 2018] drbd dhcp dhcp-master.dhcp: Preparing
> remote state change 1152846943 (primary_nodes=0, weak_nodes=0)
> [Sat Jul  7 21:02:30 2018] drbd dhcp dhcp-master.dhcp: Committing
> remote state change 1152846943
> [Sat Jul  7 21:02:30 2018] drbd dhcp dhcp-master.dhcp: conn(
> Connecting -> Connected ) peer( Unknown -> Primary )
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0: disk( UpToDate -> Outdated )
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp:
> drbd_sync_handshake:
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: self
> 0967822D6718C8AC::323BE7D71FABECCC:44CD99B02FF92950
> bits:0 flags:120
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp:
> uuid_compare()=-2 by rule 50
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: pdsk(
> DUnknown -> UpToDate ) repl( Off -> WFBitMapT )
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: receive
> bitmap stats [Bytes(packets)]: plain 0(0), RLE 29(1), total 29;
> compression: 100.0%
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: send
> bitmap stats [Bytes(packets)]: plain 0(0), RLE 29(1), total 29;
> compression: 100.0%
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: helper
> command: /sbin/drbdadm before-resync-target
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: helper
> command: /sbin/drbdadm before-resync-target exit code 0 (0x0)
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0: disk( Outdated -> Inconsistent )
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: repl(
> WFBitMapT -> SyncTarget )
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: Began
> resync as SyncTarget (will sync 12 KB [3 bits set]).
> [Sat Jul  7 21:02:30 2018] drbd dhcp/0 drbd0 dhcp-master.dhcp: Resync
> done (total 1 sec; paused 0 sec; 12 K/sec)
> [Sat Jul  7 21:0

Re: [DRBD-user] drbd+lvm no bueno

2018-07-26 Thread Veit Wahlich
Hi Eric,

On Thursday, 26.07.2018 at 13:56 +, Eric Robinson wrote:
> Would there really be a PV signature on the backing device? I didn't turn md4 
> into a PV (did not run pvcreate /dev/md4), but I did turn the drbd disk into 
> one (pvcreate /dev/drbd1).

Both DRBD and mdraid put their metadata at the end of the block device,
so, depending on the LVM configuration, both mdraid backing devices and
DRBD minors backing VM disks with direct-on-disk PVs might be detected
as PVs.

It is very advisable to set lvm.conf's global_filter to allow only the
desired devices as PVs by matching a strict regexp, and to ignore all
other devices, e.g.:

 global_filter = [ "a|^/dev/md.*$|", "r/.*/" ]

or even more strict: 

 global_filter = [ "a|^/dev/md4$|", "r/.*/" ]

After editing the configuration, you might want to regenerate your
distro's initrd/initramfs to reflect the changes directly at startup.
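
For example (distribution-specific, just as a hint):

 dracut -f              # RHEL/CentOS/Fedora
 update-initramfs -u    # Debian/Ubuntu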

Best regards,
// Veit

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] drbd+lvm no bueno

2018-07-26 Thread Veit Wahlich
On Thursday, 26.07.2018 at 17:31 +0200, Lars Ellenberg wrote:
> >  global_filter = [ "a|^/dev/md.*$|", "r/.*/" ]
> > 
> > or even more strict: 
> > 
> >  global_filter = [ "a|^/dev/md4$|", "r/.*/" ]
> 
> Uhm, no.
> Not if he want DRBD to be his PV...
> then he needs to exclude (reject) the backend,
> and only include (accept) the DRBD.

Ah yes, sorry. In my mind Eric used LVM below DRBD, just like you
recommended:

> But really, most of the time, you really want LVM *below* DRBD,
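
For the layout Lars describes, with the DRBD device itself used as PV,
the filter would instead accept the DRBD minors and reject everything
else, e.g. something along the lines of:

 global_filter = [ "a|^/dev/drbd.*$|", "r/.*/" ]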

Regards,
// Veit

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] Stop using DRBD

2018-09-28 Thread Veit Wahlich
Hi Michael,

my hint is to have a look at the wipe-md command of drbdadm and
drbdmeta, respectively.

Alternatively, it might suffice to simply resize the FS on the former
backing device to the backing device's full size. As internal metadata
is stored at the end of the block device, resizing the FS would grow it
into the metadata. But you cannot be sure that exactly those bits that
identify the backing device as DRBD get overwritten, so using wipe-md
instead of (or before) resizing is advisable.
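
A sketch of the wipe-md variant (the resource name "r0" is only an
assumption, and the resource has to be down first):

 drbdadm down r0
 drbdadm wipe-md r0

Afterwards the former backing device should no longer carry a DRBD
signature and can be used directly.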

Best regards,
// Veit



On Friday, 28.09.2018 at 15:40 +0300, Michael Dukelsky wrote:
> Hi,
> 
> On 28.09.2018 14:15, Simon Ironside wrote:
> 
> > > I have two nodes with DRBD 8.3. Now I want to stop using DRBD and
> > > use 
> > > one of the nodes as a standalone box. How do I do it? As far as
> > > I 
> > > understand it would be enough to wipe out the meta data. But I
> > > see no 
> > > command for that. How can it be done?
> > 
> > You don't really need to delete it at all, it's at the end of the
> > block 
> > device so it can be ignored. You can just use (mount or whatever)
> > the 
> > underlying block device directly.
> 
> Although gdisk says that the partition with the user data is "8300 
> Linux filesystem", when I try to mount it after stopping drbd I get
> 
> mount: unknown filesystem type 'drbd'
> 
> So it looks like I have to wipe out the meta data first. But I need
> to 
> know its offset and size. Unfortunately I cannot understand how to
> get 
> it from the textual dump made by 'drbdmeta dump-md'. The offset 
> mentioned there does not seem to point to the meta data.
> 
> Michael
> 
> ___
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] [E] Re: Out of sync blocks in DRBD 8.4.6

2018-12-11 Thread Veit Wahlich
Hi, 

just to revive some older discussions: might your setup be using files opened
with O_DIRECT? This allows modification of buffers in flight (read: changes
while they are being sent to the peer) and is commonly used by some
applications to bypass Linux's caching mechanisms.
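
One way to check this (only a hint; replace the PID with that of the
suspected application) is to trace its open calls and look for the
O_DIRECT flag:

 strace -f -e trace=open,openat -p <PID> 2>&1 | grep O_DIRECT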

If you are uncertain, tell us some details about the setup/application stack. 

Regards, 
// Veit 


 Original message 
From: Athanasios Pantinakis 
Sent: 11 December 2018 16:19:17 CET
To: G C 
CC: "drbd-user@lists.linbit.com" 
Subject: Re: [DRBD-user] [E] Re:  Out of sync blocks in DRBD 8.4.6

This definitely works, but then again few days later more OOS blocks are coming 
up when verify runs.


From: G C 
Sent: Tuesday, December 11, 2018 17:17
To: Athanasios Pantinakis 
Cc: drbd-user@lists.linbit.com
Subject: [E] Re: [DRBD-user] Out of sync blocks in DRBD 8.4.6

Have you tried?
drbdadm disconnect 
drbdadm connect 

On Tue, Dec 11, 2018 at 9:12 AM Athanasios Pantinakis
<athanasios.pantina...@mavenir.com> wrote:
Dear DRBD community,

I was wondering if anyone has faced a significant amount of out-of-sync blocks
in DRBD 8.4.6 and how they resolved such issues. I have tried to disable the
LVM cache but the situation remains the same. Would adding an LVM filter make
any difference?

ns:1394142416 nr:413724 dw:1363095788 dr:284583597 al:688 bm:0 lo:4 pe:163 ua:0 ap:163 ep:1 wo:d oos:44
ns:416532 nr:1371198592 dw:1371616044 dr:251713761 al:62 bm:0 lo:66 pe:0 ua:65 ap:0 ep:1 wo:d oos:280

ns:0 nr:1394140816 dw:1394140804 dr:251698144 al:0 bm:0 lo:3 pe:0 ua:4 ap:0 ep:1 wo:d oos:44
ns:1370787580 nr:415668 dw:1339742876 dr:286452449 al:680 bm:0 lo:0 pe:256 ua:0 ap:256 ep:1 wo:d oos:280

   129600 lrwxrwxrwx   1 root root   11 Nov 27 12:24 /dev/drbd/by-res/r2 -> ../../drbd2
   129490 lrwxrwxrwx   1 root root   11 Nov 27 10:46 /dev/drbd/by-res/r1 -> ../../drbd1

Thank you,
Thanos


This e-mail message may contain confidential or proprietary information of 
Mavenir Systems, Inc. or its affiliates and is intended solely for the use of 
the intended recipient(s). If you are not the intended recipient of this 
message, you are hereby notified that any review, use or distribution of this 
information is absolutely prohibited and we request that you delete all copies 
in your control and contact us by e-mailing to 
secur...@mavenir.com. This message contains the 
views of its author and may not necessarily reflect the views of Mavenir 
Systems, Inc. or its affiliates, who employ systems to monitor email messages, 
but make no representation that such messages are authorized, secure, 
uncompromised, or free from computer viruses, malware, or other defects. Thank 
You
___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] discrepancy in space usage accounting

2019-04-16 Thread Veit Wahlich
On Monday, 15.04.2019 at 15:52 -0400, Boris Epstein wrote:
> I have a DRBD filesystem used by ProxMox which has about 1-2 TB worth
> of files yet shows the usage level at close to 30 TB. How is that
> possible?

Hi Boris,

DRBD is not a filesystem, it only provides replicated block devices
that filesystems can reside on.

You may grep the status line of your mountpoint from the output of
"mount" to find the filesystem type, or, if you are using "df" anyway,
simply add the "-T" parameter to include the filesystem type in its
output.
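
For example (the mountpoint path is just a placeholder):

 df -hT /path/to/mountpoint
 mount | grep /path/to/mountpoint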

Best regards,
// Veit

___
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user


Re: [DRBD-user] DRBD Delay

2019-07-23 Thread Veit Wahlich
Hi David,

have a look at the documentation of drbdadm; the commands up, down and
adjust might be what you are looking for.
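
A minimal sketch of automating this with a oneshot systemd unit that
runs the adjust once ZFS is up (unit name and paths are assumptions,
adjust to your setup):

 # /etc/systemd/system/drbd-adjust.service
 [Unit]
 Description=Re-adjust DRBD resources once ZFS volumes are available
 After=zfs-mount.service
 Requires=zfs-mount.service

 [Service]
 Type=oneshot
 ExecStart=/sbin/drbdadm adjust all

 [Install]
 WantedBy=multi-user.target

Enable it with "systemctl enable drbd-adjust.service".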

Best regards,
// Veit


On Monday, 22.07.2019 at 16:34 +, Banks, David (db2d) wrote:
> Hello,
> 
> Is there a way to delay the loading of DRBD resources until after the
> underlying block system has made the devices available?
> 
> I’ve looked in systemd but didn’t see any drbd services and wanted to
> ask before monkeying with that.
> 
> System: Ubuntu 18.04
> DRBD: 8.9.10
> 
> After reboot DRBD starts before the zfs volumes that it uses are
> available, so I have to do a 'drbdadm adjust all' each time. I’d like
> it to just wait until the zfs-mount.service is done.
> 
> Thanks!
> 
> ___
> Star us on GITHUB: https://github.com/LINBIT
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user

___
Star us on GITHUB: https://github.com/LINBIT
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user