Hi,

I'd like to share a semi-useful setup I put together a week ago for you
to enjoy. Maybe someone has some comments on it ...

The plan is to create an "indestructible" block device which does
real-time backup to a remote iSCSI LUN. This can be done with Linux
Software RAID alone already, but using nilfs, one can add point-in-time
recovery to said block device.

The intention is to run virtual machines of any kind on the indestructible
block device, and have backup of any past state, up to certain limits of
course.

This is the setup:

 * The remote iSCSI LUN is formatted with nilfs2. It is mounted without
cleanerd, to avoid performance loss on the host.

 * A loop file is created on the nilfs2. The size of this file will limit
the size of the indestructible block device.

 * Another loop file identical in size is created on a local filesystem
(eventually this may also be a real partition or disk).

 * Using mdadm, both loop devices are welded together as a RAID-1
(mirrored) device. The remote loop device is marked "write-mostly" so
reads from the iSCSI LUN are kept to a minimum. "write-behind" is
activated to allow the remote device to lag behind. Also, an internal
write-intent bitmap is activated, so after a connection loss only
changed blocks are written to the remote loop file.

 * The resulting md-device can be formatted with any filesystem or
connected to a virtual machine.
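For illustration, the steps above could be sketched roughly like this.
Device names (/dev/sdb for the iSCSI LUN), mount points, sizes, and the
write-behind value are all assumptions, not the exact commands I ran:

```shell
# Format the remote iSCSI LUN with nilfs2 (device name is an assumption).
mkfs.nilfs2 /dev/sdb

# Mount it bypassing the mount.nilfs2 helper (-i), so nilfs_cleanerd
# is never started on the host.
mount -t nilfs2 -i /dev/sdb /mnt/remote

# Create two sparse loop files of identical size (50 GB here),
# one on the nilfs2 and one on a local filesystem.
dd if=/dev/zero of=/mnt/remote/backing.img bs=1M seek=51200 count=0
dd if=/dev/zero of=/srv/backing.img        bs=1M seek=51200 count=0
losetup /dev/loop0 /srv/backing.img
losetup /dev/loop1 /mnt/remote/backing.img

# Mirror them with an internal write-intent bitmap; the remote leg is
# write-mostly and may lag via write-behind (the 256 is a guess).
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      --bitmap=internal --write-behind=256 \
      /dev/loop0 --write-mostly /dev/loop1
```

Note that --write-behind only takes effect together with a bitmap,
which is another reason the internal bitmap is there.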

So how does it perform?

Quite well I'd say. I am running two slave databases on two indestructible
block devices (formatted with XFS actually), backed by a single iSCSI LUN
with nilfs2. One 100 GB, another 50 GB. The nilfs partition (1000 GB) is
filling at a rate of about 30 GB a day. They keep up with their masters.
Of course nothing is optimized for anything here: Small changes in the
database are likely to produce rather big block copies in the nilfs.

I plan to swap the nilfs device and resync on a blank loop file once it is
full. Far more efficient than using cleanerd, sorry :-)
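Hypothetically, the swap could look like this (device and path names
assumed as before; a full resync follows because the new file is blank):

```shell
# Drop the remote leg from the mirror and detach its loop device.
mdadm /dev/md0 --fail /dev/loop1
mdadm /dev/md0 --remove /dev/loop1
losetup -d /dev/loop1

# Replace the full nilfs2 volume with a fresh one and a blank loop file.
umount /mnt/remote
mkfs.nilfs2 /dev/sdb
mount -t nilfs2 -i /dev/sdb /mnt/remote
dd if=/dev/zero of=/mnt/remote/backing.img bs=1M seek=51200 count=0
losetup /dev/loop1 /mnt/remote/backing.img

# Re-attach the blank leg; mdadm rebuilds it from the local copy.
mdadm /dev/md0 --add /dev/loop1
```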

Recovery is "okay".

I can use any checkpoint and load the loop file from that point in time.
There are some things to note, though:

 * The file is read-only. So is the loop device. mdadm doesn't really like
assembling with a read-only loop device (I couldn't stop it without a
kernel "oops" message). The safe workaround is to mount the loop device
directly, without any RAID. That works for XFS, but not necessarily for
other filesystems, as the end of the device contains the mdadm
superblock.

 * The filesystem is in an unclean state. It will try to replay its log
and fail because the device is read-only. In case of XFS, that can be
turned off with a mount option.

 * Also, the filesystem UUID is the same as that of the active copy.
Again, a mount option forces XFS to ignore the UUID.
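A recovery attempt along those lines might look like this. The
checkpoint number 1234 and the mount points are made up; note that a
plain checkpoint has to be turned into a snapshot before it can be
mounted:

```shell
# List checkpoints and pick one to restore from.
lscp /dev/sdb

# Turn the chosen checkpoint into a mountable snapshot, then mount it
# read-only (cp= selects the snapshot).
chcp ss /dev/sdb 1234
mount -t nilfs2 -o ro,cp=1234 /dev/sdb /mnt/snapshot

# Attach the old backing file read-only and mount the XFS inside it
# directly (no RAID), skipping log recovery and the UUID check.
losetup -r /dev/loop2 /mnt/snapshot/backing.img
mount -t xfs -o ro,norecovery,nouuid /dev/loop2 /mnt/restore
```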

The backup can be restored, which was the point.

The remote loop device can be removed with mdadm. Write performance
should increase a lot then, though it will never reach native speed
because the internal bitmap still has to be updated. Upon re-add, it
resynced quickly (the /proc/mdstat progress indicator isn't accurate
here; the sync simply skips unwritten portions).
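For example, removing and later re-adding the remote leg (names assumed
as in the setup above):

```shell
# Detach the remote mirror half; writes continue locally and the
# write-intent bitmap records which regions were dirtied meanwhile.
mdadm /dev/md0 --fail /dev/loop1
mdadm /dev/md0 --remove /dev/loop1

# Later, re-add the same device; only the blocks marked dirty in the
# bitmap are resynced, not the whole mirror.
mdadm /dev/md0 --re-add /dev/loop1
```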

I will now watch the nilfs2 device fill up to see what happens and will
report back soon. My guess is something will explode: mdadm, loop and / or
nilfs. This can be avoided, of course, by swapping the loop file before
that happens. But I'm curious.

All done with an unmodified Debian Lenny, nilfs is version 2.04 (tools
2.06) and kernel is 2.6.26.

Comments welcome!

Regards,

Pierre Beck

_______________________________________________
users mailing list
[email protected]
https://www.nilfs.org/mailman/listinfo/users
