Re: [DRBD-user] Access to the slave node
Hi Veit,

Thanks for the detailed reply.

1. Yes, I have tried GFS - the problem is that the whole Pacemaker/Corosync setup seems a bit difficult and fragile to me. Also, distributed filesystems like GlusterFS/GFS/OCFS2 will never offer the performance of a normal filesystem plus asynchronous NFS. Hence I am here on the DRBD mailing list, as I need redundancy :)

2. Thanks for the snapshotting hint - it had not occurred to me. I think the metadata won't be a problem, for the reasons you described. I am also aware of the performance penalty. I will give DRBD some more tests (possibly with a large send buffer), as I need to be 100% sure it will not corrupt the filesystem if protocol A is used. So far, so good.

Thanks,
Ondrej

-----Original Message-----
From: Veit Wahlich [mailto:cru.li...@zodia.de]
Sent: Thursday, March 15, 2018 7:38 PM
To: drbd-user@lists.linbit.com; Ondrej Valousek <ondrej.valou...@s3group.com>; drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Access to the slave node

Hi Ondrej,

yes, this is perfectly normal in single-primary environments. DRBD simply does not permit access to the resource's block device until the resource is promoted to primary. What you describe would only work in a dual-primary environment, but running such an environment also requires far more precautions than single-primary so as not to endanger your data.

Also remember that for many (most?) filesystems, even mounting read-only does not mean that no data is altered: at least metadata such as "last-mounted" attributes is still written, and a journal replay might occur. As the fs on the primary is still being updated while your read-only side does not expect this, your ro-mount will most likely read garbage at some point and might even freeze the system.
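The "large send buffer" idea for protocol A maps onto DRBD's net options. A minimal sketch of such a resource follows; the resource name, hostnames, device paths, addresses and sizes are all placeholders, and the exact syntax should be checked against the drbd.conf man page for your DRBD version (DRBD 9 expects `protocol` inside the `net` section):

```shell
# /etc/drbd.d/r0.res -- illustrative fragment only; all names are placeholders
resource r0 {
    net {
        protocol A;              # asynchronous: local write completes before peer ack
        sndbuf-size 10M;         # larger TCP send buffer absorbs write bursts
        on-congestion pull-ahead;  # go ahead-of-peer instead of blocking when full
        congestion-fill 8M;
    }
    on alpha {
        device    /dev/drbd0;
        disk      /dev/vg0/r0;
        address   192.168.1.1:7789;
        meta-disk internal;
    }
    on beta {
        device    /dev/drbd0;
        disk      /dev/vg0/r0;
        address   192.168.1.2:7789;
        meta-disk internal;
    }
}
```

The `on-congestion pull-ahead` policy is the closest thing DRBD has to a "fall back when the buffer fills" behavior for protocol A.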
There are only a few scenarios to prevent such situations, and I regard the following two as the most useful:

a) Implement a dual-primary environment running a cluster filesystem such as GFS or OCFS2 on top. This is hard to learn and build, offers lots of pitfalls that put your data in danger, and is currently limited to 2 nodes, but it even allows writing to the fs from both sides.

b) Build a single-primary environment like your existing one, but place your DRBD backing devices on a block layer that allows snapshots (e.g. classic LVM, LVM thinp or ZFS). When you need to access the primary's data from a secondary, take a snapshot of its backing device on the secondary and mount the snapshot instead of the DRBD volume.

Addendum to b): This reflects the state of the fs only at the point in time the snapshot was created. You will even be able to mount the snapshot rw without affecting the DRBD volume. If using a backing device with internal metadata, this metadata will also be present in the snapshot, but most (if not all) Linux filesystems ignore any data at the end of the block device that lies beyond the fs' actual size. The snapshot will grow as data is written to the DRBD volume and, depending on the snapshot implementation and block size/pointer granularity, will slow down writes to both the DRBD volume and the snapshot as long as the snapshot exists (due to copy-on-write and/or pointer tracking). So only choose this scenario if you need to read data from the secondary for a limited time (such as for backups), or you are willing to renew the snapshot on a regular basis, or you can afford to sacrifice possibly a lot of storage and write performance.

Best regards,
// Veit

---- Original Message ----
From: Ondrej Valousek <ondrej.valou...@s3group.com>
Sent: 15 March 2018 11:21:49 CET
To: "drbd-user@lists.linbit.com" <drbd-user@lists.linbit.com>
Subject: [DRBD-user] Access to the slave node

Hi list,

When trying to mount the filesystem on the slave node (read-only, I do not want to crash the filesystem), I am receiving:

mount: mount /dev/drbd0 on /brick1 failed: Wrong medium type

Is this normal? AFAIK it should be OK to mount the filesystem read-only on the slave node.

Thanks,
Ondrej

-----
The information contained in this e-mail and in any attachments is confidential and is designated solely for the attention of the intended recipient(s). If you are not an intended recipient, you must not use, disclose, copy, distribute or retain this e-mail or any part thereof. If you have received this e-mail in error, please notify the sender by return e-mail and delete all copies of this e-mail from your computer system(s). Please direct any additional queries to: communicati...@s3group.com. Thank You. Silicon and Software Systems Limited (S3 Group). Registered in Ireland no. 378073. Registered Office: South County Business Park, Leopardstown, Dublin 18.
-----
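Scenario (b) can be sketched with classic LVM on the secondary. The VG/LV names, snapshot size and mountpoint below are placeholders, and this assumes the DRBD backing device is the LV /dev/vg0/r0 (an untested sketch, run as root on the secondary):

```shell
# Snapshot the DRBD backing LV on the secondary (names are placeholders).
lvcreate --snapshot --name r0_snap --size 10G /dev/vg0/r0

# Mount the snapshot, NOT /dev/drbd0. 'norecovery' skips journal replay
# on XFS; for ext4 use '-o ro,noload' instead.
mount -o ro,norecovery /dev/vg0/r0_snap /mnt/snap

# ... read data / run the backup ...

umount /mnt/snap
lvremove -y /dev/vg0/r0_snap   # drop the snapshot to stop the CoW overhead
```

As Veit notes, internal DRBD metadata at the end of the snapshot is simply ignored by the filesystem, and the snapshot should be removed as soon as it is no longer needed to avoid the copy-on-write penalty.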
Re: [DRBD-user] Access to the slave node
> That is perfectly expected behavior. Imagine a resync from the primary that, for some time, makes the secondary inconsistent (see what Lars already told you). It would not make sense to mount that one... Error codes are limited; "Wrong medium type" is the one that makes the most sense.
> For DRBD9 it is possible to mount a secondary RO when there is no primary.

I am using DRBD9, and I am experiencing the same error even when I unmount the filesystem on the primary node first (i.e. to make sure the filesystem is 100% consistent).

_______________________________________________
drbd-user mailing list
drbd-user@lists.linbit.com
http://lists.linbit.com/mailman/listinfo/drbd-user
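One thing worth checking here (an assumption on my part, not something stated in the thread): "no primary" refers to the DRBD resource *role*, and unmounting the filesystem does not by itself demote the node unless `auto-promote` is in effect. A sketch, with `r0` as a placeholder resource name:

```shell
# On the current primary: unmount, then explicitly drop the Primary role.
umount /brick1
drbdadm secondary r0     # may be a no-op if auto-promote already released it

# Verify that no node is Primary any more:
drbdadm status r0

# On the other node, a read-only mount should now be possible with DRBD 9:
mount -o ro /dev/drbd0 /brick1
```

If `drbdadm status` still shows a Primary anywhere, DRBD 9 will keep refusing the read-only open on the secondary.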
[DRBD-user] Access to the slave node
Hi list,

When trying to mount the filesystem on the slave node (read-only, I do not want to crash the filesystem), I am receiving:

mount: mount /dev/drbd0 on /brick1 failed: Wrong medium type

Is this normal? AFAIK it should be OK to mount the filesystem read-only on the slave node.

Thanks,
Ondrej
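For completeness, in a single-primary setup the usual way to get at the data is to promote the node first; the DRBD device only becomes openable once the resource is in the Primary role (`r0` is a placeholder resource name):

```shell
# Promote this node; this fails while the peer still holds the Primary role.
drbdadm primary r0

# Only now can the filesystem on the DRBD device be mounted.
mount /dev/drbd0 /brick1
```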
Re: [DRBD-user] Data consistency question
Hi,

Thanks for the explanation. So, for example, let's have 2 nodes in different geo locations (say, for disaster recovery), and let's use protocol A so things go fast on the 1st node (the primary). But we have a large amount of data to resync, say 10 TB, and the link is slow, so it might take a few days for these 2 nodes to finish the initial resync. But it will finish at some stage. Now say I start some heavy file I/O operation on the 1st node and then the primary node suddenly crashes fatally:

1. You say that no matter what I do, whichever filesystem I choose (xfs, ext4, btrfs), I will always be able to recover data on the 2nd node (the slave). At most I would have to run fsck to fix journals, etc. Right?

2. The snapshotting via the "before-resync-target" handler will not be effective in this case, as the crash happened after both nodes had synced up. Right?

3. As we chose the async protocol A, a decent RAM buffer is required for those writes which have not yet made it to the slave node. Is there some limit to this buffer so that when the limit is hit, synchronous operation is enforced (much like the "vm.dirty_background_ratio" kernel parameter)?

4. Is DRBD able to merge writes? For example, I write to a file on node 1 and then immediately overwrite it. The async protocol could potentially drop the 1st write, as it was superseded by the later write to the same block.

5. Is it wise to run DRBD in the scenario above (slow link, big chunk of data, asynchronous protocol, aiming for disaster recovery)? Yes, I know I could use something like rsync instead, but we have lots of small files in the filesystem, so it seems more practical to me to operate at the block level like DRBD does.
Thanks,
Ondrej

-----Original Message-----
From: drbd-user-boun...@lists.linbit.com [mailto:drbd-user-boun...@lists.linbit.com] On Behalf Of Lars Ellenberg
Sent: Wednesday, March 14, 2018 2:03 PM
To: drbd-user@lists.linbit.com
Subject: Re: [DRBD-user] Data consistency question

On Tue, Mar 13, 2018 at 10:23:25AM +, Ondrej Valousek wrote:
> Hi list,
>
> I have a question regarding filesystem consistency - say I choose the
> async protocol (A) and the master peer node crashes fatally in the
> middle of a write operation.
>
> The slave peer node will then be outdated, but what happens to the
> filesystem on top of the replicated block device - will I be able
> to restore data from the outdated peer?
>
> My understanding is that DRBD completely ignores the filesystem, so
> unless I choose the synchronous replication protocol C, filesystem
> corruption can occur on the peer node.
>
> Am I right?

No. If you "fatally" lose your primary while the secondary is "sync target, inconsistent", then yes, you have lost your data. That's why we have the "before-resync-target" handler, where you could snapshot the last consistent version of your data before becoming sync target.

If you "fatally" lose your primary during normal operation (which is: live replication, no resync), then depending on the protocol in use, the disk on the secondary will possibly not have seen those writes that were still in flight. In "synchronous" mode (protocol C), these will only be requests that have not yet been completed to upper layers (the file system, the database, the VM), so it will look just like a "single system crash". In "asynchronous" mode (protocol A), that will be a few more requests, some of which may already have been completed to upper layers. Clients that have committed a transaction, and already got an acknowledgement for it, may be confused by the fact that the most recent few such transactions have been lost. That's the nature of "asynchronous" replication here.
Going online with the Secondary now will look just like a "single system crash", but as if that crash had happened a few requests earlier. It may miss the latest few updates. But it will still be consistent.

--
: Lars Ellenberg
: LINBIT | Keeping the Digital World Running
: DRBD -- Heartbeat -- Corosync -- Pacemaker

DRBD(r) and LINBIT(r) are registered trademarks of LINBIT

__
please don't Cc me, but send to list -- I'm subscribed
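The "before-resync-target" handler Lars mentions is wired up in the resource configuration. A sketch using the LVM snapshot helper scripts shipped with drbd-utils (the resource name is a placeholder and the script paths may differ per distribution; check the drbd.conf man page for your version):

```shell
# /etc/drbd.d/r0.res (fragment) -- snapshot the backing LV just before this
# node becomes an inconsistent sync target, so a consistent copy survives
# if the sync source dies mid-resync; drop the snapshot once resync is done.
resource r0 {
    handlers {
        before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh";
        after-resync-target  "/usr/lib/drbd/unsnapshot-resync-target-lvm.sh";
    }
}
```

This only helps with crashes *during* a resync; as question 2 above suggests, it does not protect against a primary crash during normal replication, where protocol A can lose the last few in-flight writes.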
[DRBD-user] Data consistency question
Hi list,

I have a question regarding filesystem consistency - say I choose the async protocol (A) and the master peer node crashes fatally in the middle of a write operation.

The slave peer node will then be outdated, but what happens to the filesystem on top of the replicated block device - will I be able to restore data from the outdated peer?

My understanding is that DRBD completely ignores the filesystem, so unless I choose the synchronous replication protocol C, filesystem corruption can occur on the peer node.

Am I right?

Thanks,
Ondrej