Re: [DRBD-user] Access to the slave node

2018-03-16 Thread Ondrej Valousek
Hi Veit,

Thanks for the detailed reply.
1. Yes, I have tried GFS - the problem is that the whole Pacemaker/Corosync
setup seems a bit difficult and fragile to me. Also, distributed filesystems
like GlusterFS/GFS/OCFS will never offer the same performance as a normal
filesystem plus asynchronous NFS. Hence I am here on the DRBD mailing list, as I
need redundancy :)

2. Thanks for the snapshotting hint - it had not occurred to me. I think the
metadata won't be a problem, for the reasons you described. I am also aware of the
performance penalty.

I will run some more tests on DRBD (possibly with a large send buffer), as I need
to be 100% sure it will not corrupt the filesystem if protocol A is used. So far,
so good.

Thanks,
Ondrej

-----Original Message-----
From: Veit Wahlich [mailto:cru.li...@zodia.de] 
Sent: Thursday, March 15, 2018 7:38 PM
To: drbd-user@lists.linbit.com; Ondrej Valousek <ondrej.valou...@s3group.com>; 
drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Access to the slave node

Hi Ondrej, 

yes, this is perfectly normal in single-primary environments. DRBD simply does
not permit access to a resource's block device until the resource is promoted to
primary. What you describe would only work in a dual-primary environment, but
running such an environment also requires a lot more precautions than
single-primary in order not to endanger your data. Also remember that, for many
(most?) filesystems, even mounting read-only does not mean that no data is
altered; at least metadata such as "last mounted" attributes is still written,
and a journal replay might occur. As the filesystem on the primary is still being
updated while your read-only side does not expect this, your ro-mount will most
likely read garbage at some point and might even freeze the system.

There are only a few scenarios that prevent such situations, and I regard the
following two as the most useful:

a) Implement a dual-primary environment running a cluster filesystem such as
GFS or OCFS2 on top -- this is hard to learn and build, offers lots of pitfalls
that put your data in danger, and is currently limited to 2 nodes, but it even
allows writing to the filesystem from both sides.

b) Build a single-primary environment like your existing one, but place your DRBD
backing devices on a block layer that allows snapshots (e.g. classic LVM, LVM
thinp or ZFS) -- when you need to access the primary's data from a secondary,
take a snapshot of the backing device on the secondary and mount the snapshot
instead of the DRBD volume.
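
As a rough sketch of what that looks like with classic LVM -- the volume group
"vg0", the backing LV "r0_backing", the snapshot size and the mount point are all
placeholders, so adapt them to your setup:

    # on the secondary, while DRBD keeps replicating to /dev/vg0/r0_backing
    lvcreate --snapshot --name r0_snap --size 20G /dev/vg0/r0_backing
    mkdir -p /mnt/r0_snap
    mount -o ro /dev/vg0/r0_snap /mnt/r0_snap

The snapshot itself is writable, so a filesystem that insists on replaying its
journal even for a read-only mount can do so on the snapshot without ever
touching the DRBD volume.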

Addendum to b): This reflects the state of the filesystem only at the point in
time the snapshot was created. You will even be able to mount the snapshot rw
without affecting the DRBD volume. If using a backing device with internal
metadata, this metadata will also be present in the snapshot, but most (if not
all) Linux filesystems ignore any data at the end of the block device that lies
beyond the filesystem's actual size. The snapshot will grow as data is written to
the DRBD volume and, depending on the snapshot implementation and block
size/pointer granularity, will slow down writes to both the DRBD volume and the
snapshot for as long as the snapshot exists (due to copy-on-write and/or pointer
tracking). So only choose this scenario if you need to read data from the
secondary for a limited time (e.g. for backups), or you are willing to renew the
snapshot on a regular basis, or you can afford to sacrifice possibly a lot of
storage and write performance.
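
Renewing the snapshot is then just a matter of tearing it down and recreating it,
e.g. (same placeholder names as above, untested):

    umount /mnt/r0_snap
    lvremove --yes /dev/vg0/r0_snap
    lvcreate --snapshot --name r0_snap --size 20G /dev/vg0/r0_backing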

Best regards,
// Veit 


---- Original Message ----
From: Ondrej Valousek <ondrej.valou...@s3group.com>
Sent: 15 March 2018 11:21:49 CET
To: "drbd-user@lists.linbit.com" <drbd-user@lists.linbit.com>
Subject: [DRBD-user] Access to the slave node

Hi list,

When trying to mount the filesystem on the slave node (read-only - I do not want
to corrupt the filesystem), I am receiving:

mount: mount /dev/drbd0 on /brick1 failed: Wrong medium type

Is this normal? AFAIK it should be OK to mount the filesystem read-only on the
slave node.
Thanks,

Ondrej



Re: [DRBD-user] Access to the slave node

2018-03-15 Thread Ondrej Valousek

> That is perfectly expected behavior. Imagine a resync from the primary that,
> for some time, makes the secondary inconsistent (see what Lars already told
> you). It would not make sense to mount that one... Error codes are limited,
> and "Wrong medium type" is the one that makes the most sense.

> For DRBD9 it is possible to mount a secondary RO when there is no primary.

I am using DRBD 9 and I experience the same error even when I unmount the
filesystem on the primary node first (i.e. to make sure the filesystem is 100%
consistent).
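
In case it matters, the sequence I am trying is roughly the following ("r0"
stands in for my resource name) -- or do I additionally need to demote the node
with drbdadm rather than just unmounting?

    # on the current primary
    umount /brick1
    drbdadm status r0        # does this node still show as Primary?
    # drbdadm secondary r0   # explicit demotion -- is this required here?

    # on the other node
    mount -o ro /dev/drbd0 /brick1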


[DRBD-user] Access to the slave node

2018-03-15 Thread Ondrej Valousek
Hi list,

When trying to mount the filesystem on the slave node (read-only - I do not want
to corrupt the filesystem), I am receiving:

mount: mount /dev/drbd0 on /brick1 failed: Wrong medium type

Is this normal? AFAIK it should be OK to mount the filesystem read-only on the
slave node.
Thanks,

Ondrej




Re: [DRBD-user] Data consistency question

2018-03-15 Thread Ondrej Valousek
Hi,

Thanks for the explanation.
So for example, let's have 2 nodes in different geo locations (say, for disaster
recovery), and let's use protocol A so that things stay fast on the 1st node (the
primary). But we have a large amount of data to resync, say 10 TB, and the link
is slow, so it might take a few days for these 2 nodes to finish the initial
resync.

But it will finish at some stage. Now say I start some heavy file I/O on the 1st
node and then suddenly the primary node fatally crashes:

1. You say that no matter which filesystem I choose (xfs, ext4, btrfs), I will
always be able to recover data on the 2nd node (the slave); at most I would have
to run fsck to replay journals, etc. Right?
2. The snapshotting via the "before-resync-target" handler will not help in this
case, as the crash happened after both nodes had synced up. Right?
3. As we chose the async protocol A, a decent RAM buffer is required for those
writes which have not made it to the slave node yet. Is there some limit on this
buffer so that, when the limit is hit, synchronous operation is enforced (much
like the "vm.dirty_background_ratio" kernel parameter)?
4. Is DRBD able to merge writes? For example, I write to a file on node 1 and
then immediately overwrite it - could the async protocol potentially drop the
1st write, as it was superseded by the later write to the same block?

5. Is it wise to run DRBD in the scenario above (slow link, a big chunk of data,
asynchronous protocol, aiming for disaster recovery)? Yes, I know I could use
something like rsync instead, but we have lots of small files in the filesystem,
so it seems more practical to me to operate at the block level like DRBD does.

Thanks,

Ondrej


-----Original Message-----
From: drbd-user-boun...@lists.linbit.com 
[mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Wednesday, March 14, 2018 2:03 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Data consistency question

On Tue, Mar 13, 2018 at 10:23:25AM +, Ondrej Valousek wrote:
> Hi list,
> 
> I have a question regarding filesystem consistency - say I choose the
> async protocol (A) and the master peer node crashes fatally in the
> middle of a write operation.
>
> The slave peer node will then be outdated, but what happens to the
> filesystem on top of the replicated block device - will I be able
> to restore data from the outdated peer?
> 
> My understanding is that DRBD completely ignores the filesystem, so 
> unless I choose synchronous replication protocol C, filesystem 
> corruption can occur on the peer node.
> 
> Am I right?

No.

If you "fatally" lose your primary while the secondary is "sync target, 
inconsistent", then yes, you have lost your data.
That's why we have the "before-resync-target" handler, where you could snapshot 
the last consistent version of your data, before becoming sync target.
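
With LVM backing devices that could look roughly like the following -- this
assumes the helper scripts shipped with drbd-utils are present on your system,
and "r0" is a placeholder resource name:

    resource r0 {
      handlers {
        # snapshot the backing LV before this node becomes a resync target ...
        before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
        # ... and drop that snapshot again once the resync has finished successfully
        after-resync-target "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
      }
    }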

If you "fatally" lose your primary during normal operation (which is: live 
replication, no resync), depending on protocol in use, the disk on the 
secondary will possibly not have seen those writes that where still in flight.

In "synchronous" mode (protocol C), those will only be requests that have not yet
been completed to upper layers (the file system, the database, the VM), so it
would look just like a "single system crash".

In "asynchronous" mode (protocol A), that will be a few more requests, some of 
which may have already been completed to upper layers.

Clients that have committed a transaction, and already got an acknowledgement 
for that, may be confused by the fact that the most recent few such 
transactions may have been lost.

That's the nature of "asynchronous" replication here.

Going online with the Secondary now will look just like a "single system
crash", but as if that crash had happened a few requests earlier.

It may miss the latest few updates.
But it will still be consistent.

--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD(r) and LINBIT(r) are registered trademarks of LINBIT

Please don't Cc me, but send to the list -- I'm subscribed.

[DRBD-user] Data consistency question

2018-03-14 Thread Ondrej Valousek
Hi list,

I have a question regarding filesystem consistency - say I choose the async
protocol (A) and the master peer node crashes fatally in the middle of a write
operation.
The slave peer node will then be outdated, but what happens to the filesystem
on top of the replicated block device - will I be able to restore data from
the outdated peer?

My understanding is that DRBD completely ignores the filesystem, so unless I 
choose synchronous replication protocol C, filesystem corruption can occur on 
the peer node.

Am I right?
Thanks,

Ondrej
