Re: Restart from....? (DRP)
Hi,

Quoting Albert Shih:

> Hi everyone,
>
> I have a question about DRP (Disaster Recovery Plan): what is the easiest
> (= fastest) way to rebuild a server (with the data) after the server
> "disappears" (fire, flood, etc.)? I see three ways to "back up" the data:
> replication, the backup service (inside cyrus-imapd 3), and filesystem
> backup (whatever the technique).
>
> For replication my concern is the speed of the replication. The main server
> (I have only one server) has lots of RAM, SSDs, and SAS disks; the replica
> has SATA disks (lots of RAM too). When I check, everything does seem to be
> replicated to the "slave", but with some delay (1-2 days).

We have distributed our users across 6 (virtual) servers in a Cyrus 2.4 murder setup. The servers are grouped in pairs, so that one runs on hardware in one building and the other in the other. On each server there are 3 Cyrus instances running: one frontend, one backend, and one replica.

In case of disaster, or planned maintenance, we start the replica as a normal backend. (We use service IP addresses for each backend and move this IP to the other server, so we don't have to update the mupdate master mailboxes.db.)

The rolling replication is able to keep up, so normally there is only a small delay (2-5 seconds). If there is a traffic peak (many newsletters) it may take up to 1-2 hours. I have only seen longer delays in the case of a corrupt mailbox, where the replication bailed out. We monitor the size of the replication log.

We have ~41,000 accounts and ~13.5 TB of mail. The VMs run in an RHEV system. Each server has 20 GB RAM and 8 CPU cores; the mail is stored on EUROstor iSCSI systems with SATA disks. Recently we migrated the metadata onto a new EUROstor iSCSI system with SSDs.
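Monitoring the size of the replication log, as described above, can be as simple as a cron'd size check. A minimal sketch, assuming the rolling-replication log lives under the configdirectory at `/var/lib/imap/sync/log` (the path and threshold are assumptions; adjust to your installation):

```shell
# Warn when the rolling-replication log grows beyond a threshold,
# which suggests the replica is falling behind the master.
check_sync_backlog() {
    log="$1"
    limit="$2"
    if [ ! -f "$log" ]; then
        # sync_client has consumed everything; nothing is queued
        echo "OK: no sync log (replica caught up)"
        return 0
    fi
    size=$(wc -c < "$log" | tr -d ' ')
    if [ "$size" -gt "$limit" ]; then
        echo "WARN: backlog ${size} bytes exceeds ${limit}"
        return 1
    fi
    echo "OK: backlog ${size} bytes"
}

# example invocation: warn above 1 MiB of unprocessed sync events
check_sync_backlog /var/lib/imap/sync/log 1048576
```

The byte size is only a proxy for replication delay, but in practice (as in the setup above) a steadily growing log is the first visible symptom of a replica that cannot keep up or has bailed out on a corrupt mailbox.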
At the moment we plan to migrate to Cyrus 3.0 to use archive partitions, so that recent mail will be stored on an iSCSI system with SAS disks and older mail will be moved to the old iSCSI system with SATA disks.

In addition to the disaster recovery plan, we use "expunge_mode: delayed" and "delete_mode: delayed" plus a normal file-based backup for the "I deleted my very important mail by accident" use case.

Regards,

Michael Menge

M.Menge                           Tel.: (49) 7071/29-70316
Universität Tübingen              Fax.: (49) 7071/29-5912
Zentrum für Datenverarbeitung     mail: michael.me...@zdv.uni-tuebingen.de
Wächterstraße 76
72074 Tübingen

Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe: https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus
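The delayed modes and archive partitions mentioned above are imapd.conf settings. A minimal sketch of the relevant fragment (option names as in Cyrus 3.0; the partition paths and the 90-day cutoff are made-up examples, not values from this setup):

```
# imapd.conf (fragment)
expunge_mode: delayed          # expunged messages stay on disk until cyr_expire runs
delete_mode: delayed           # deleted mailboxes are renamed, not removed immediately

# archive partitions (Cyrus 3.0): recent mail on fast disks, old mail on slow ones
partition-default: /var/spool/imap
archivepartition-default: /var/spool/imap-archive
archive_enabled: 1
archive_days: 90               # messages older than this move to the archive partition
```

With delayed expunge, an accidentally deleted message can usually be recovered with the `unexpunge` tool (e.g. `unexpunge -l user/foo` to list restorable messages) until `cyr_expire` prunes it.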
Re: Restart from....? (DRP)
On Monday, 18 June 2018, 10:48:16 CEST, Albert Shih wrote:

> Everything seems to work fine until I try to send the dataset to another
> server. I just cannot send a zfs snapshot from this server to another. If
> the datasets are small that's OK, but with the mailbox (~4 TB) the zfs
> command just hangs after 10-40 minutes for 1-10 minutes, comes back and
> works for 1 or 2 hours, then hangs again, etc.

Ahh, yes. We have local snapshots and a second ZFS machine for ZFS replication (incl. snapshots) which runs in the background - so the snapshots are taken locally and sent "in the background" over the network to another location.

If just the machine but not the disks breaks, we can use the local disk set within a new machine to start over. If the whole site burns down, the disks (or, temporarily, iSCSI, NFS or Samba) can be used to start over in/with new hardware.

In smaller on-site setups we use e.g. FreeNAS as the "FreeBSD distribution" for easier management (even by less skilled IT staff). This allows us to run jails with e.g. Cyrus (encapsulated and backed up too), which can be handled "by click". Btw, this means the Cyrus instances (jails) are running on ZFS too.

> Yes, we're using puppet; reinstalling the system and configuration is
> easy. The hard part is the data.

This depends on the storage (on the network, like NAS or SAN, or "locally"). In principle:

- mount or copy the pool (usually the largest part)
- reimport the databases, i.e. similar to:
  https://forum.open-xchange.com/showthread.php?3512-Simple-Cyrus-mailbox-migration
  or http://www.monoplan.de/cyrus_imap_migration.html (German)
- reconstruct -f (I use "just" reconstruct -f, as this runs over the whole
  pool too)

Then your Cyrus should be fine again. The forced reconstruct (-f) reads the pool data (folder by folder, mail by mail) and "fixes" any inconsistencies with the "database" (due to the "hot" state of the backup - the server was not shut down for the backup).
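The three recovery steps above could be sketched roughly as follows. This is an operational sketch only: the pool and dataset names (`tank`, `tank/spool`) and the Cyrus binary path are assumptions for illustration, and it presumes a freshly installed Cyrus already carrying the old configuration:

```
# 1. re-attach the surviving disk set and mount the mail spool
zpool import tank
zfs mount tank/spool              # e.g. mounted at /var/spool/imap

# 2. put the copied/dumped databases back under the configdirectory
#    (or reimport mailboxes as described in the links above)

# 3. forced reconstruct over the whole spool, run as the cyrus user,
#    to reconcile the "hot" backup state with the databases
su cyrus -c "/usr/cyrus/bin/reconstruct -f"
```

On a multi-terabyte spool the reconstruct pass is the slow part, since it walks every folder and message, so it is worth factoring into the recovery-time estimate of the DRP.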
The Cyrus databases seem quite robust against that (compared to most other database systems).

> I'm a bit new with Cyrus so... All I can say is that the replication seems
> to work well. I got

Thanks for this info. I will try Cyrus replication soon for testing purposes.

> I'll try today to see if it's easy or not to restart with a slave by
> cloning it.

I'm new to replication, but afaik it should be easy to turn a slave into a new master just by reconfiguration (cyrus.conf, some imapd.conf flags), I assume.

hth a bit, good luck,

niels.

--
---
Niels Dettenbach
Syndicat IT & Internet
http://www.syndicat.com
PGP: https://syndicat.com/pub_key.asc
---
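The reconfiguration in question is the standard Cyrus replication channel. A sketch of the usual pieces (host name, account, and password are placeholders; the binary names are those shipped with Cyrus 2.4/3.0):

```
# imapd.conf on the master - where sync_client sends replicated events
sync_host: replica.example.com
sync_authname: repluser
sync_password: secret
sync_log: 1                        # write events to the rolling sync log

# cyrus.conf on the master - run the rolling sync client
#   syncclient   cmd="sync_client -r"

# cyrus.conf on the replica - accept incoming sync connections
#   syncserver   cmd="sync_server" listen="csync"
```

Promoting the replica then mostly amounts to stopping `sync_server`, enabling the normal imapd/lmtpd services in its cyrus.conf, and pointing clients (or a floating service IP, as in the murder setup described earlier in the thread) at it.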
Re: Restart from....? (DRP)
On 18/06/2018 at 10:22:03+0200, Niels Dettenbach via Info-cyrus wrote:

> On Monday, 18 June 2018, 09:46:02 CEST, Albert Shih wrote:
> > What do you think? What's your DRP?
>
> I shoot snapshots of the underlying FS of the spool partition(s) and the
> main DB files (skiplist) - incl. (incremental) filesystem dumps of them.

How do you do that? Because at the beginning my plan was to do both (replication and snapshots). The problem is that currently I'm running into a big issue with the snapshots. I don't know if this is the right place, because I don't know if it's related to Cyrus; that's why I didn't mention it at first.

I have a server (Dell PowerEdge, 192 GB RAM, 28 mechanical disks, 2 SSDs, 2 SAS disks for the OS):

- the system is FreeBSD 11 running on the 2 SAS disks on UFS
- the Cyrus IMAP runs inside a jail on the 2 SSDs (on a ZFS pool)
- the mailboxes and the xapian index are on two ZFS datasets on a zpool made
  of the 28 mechanical disks

Everything seems to work fine until I try to send the dataset to another server. I just cannot send a ZFS snapshot from this server to another. If the datasets are small that's OK, but with the mailbox (~4 TB) the zfs command just hangs after 10-40 minutes for 1-10 minutes, comes back and works for 1 or 2 hours, then hangs again, etc.

> In a disaster scenario it usually works well to reinstantiate the last
> snapshot and start the server(s) with a forced full reconstruct run. But
> this only offers "low resolution" recovery (mails / mods since the last
> snapshot are gone then).
>
> Beside this we run daily FS backups (incl. Cyrus DB dumps) which allow us
> to

How do you do that? Because Cyrus has a lot of databases.

> reinstall from zero (i.e. automated by ansible or similar) on system and
> FS

Yes, we're using puppet; reinstalling the system and configuration is easy. The hard part is the data.

> level.
>
> I'm a bit new to the new included backup mechanisms and repo features in
> Cyrus 3 and interested in experiences with setups allowing an efficient
> "lossless" recovery too.
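One way to produce the "Cyrus DB dumps" Niels mentions is `cvt_cyrusdb`, which converts a database to a portable flat file before the filesystem backup. A sketch, with paths and backends assumed for a typical install (check your imapd.conf for the actual backend of each database):

```
# dump the central databases to flat files ahead of the nightly FS backup
cvt_cyrusdb /var/lib/imap/mailboxes.db skiplist /backup/mailboxes.flat flat
cvt_cyrusdb /var/lib/imap/annotations.db skiplist /backup/annotations.flat flat

# per-user seen/subscription state lives under /var/lib/imap/user/ and
# can simply be copied file by file
```

A flat dump survives backend changes between Cyrus versions, which a raw copy of a binary database file may not.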
I'm a bit new with Cyrus so... ;-) All I can say is that the replication seems to work well. I have master --> first slave (same room) --> second slave (distant datacenter).

I'll try today to see if it's easy or not to restart with a slave by cloning it.

Best regards.

--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: j...@obspm.fr
Heure local/Local time: Mon Jun 18 10:36:19 CEST 2018
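For the hanging ~4 TB zfs send described earlier in the thread, resumable send/receive (available in recent OpenZFS, including FreeBSD 11) may help: an interrupted stream can be continued from a saved token instead of restarting from zero. A sketch with made-up pool, dataset, and host names:

```
# initial, resumable transfer of the mailbox dataset
zfs snapshot tank/mail@drp1
zfs send tank/mail@drp1 | ssh replica zfs receive -s backup/mail

# after an interruption, fetch the resume token and continue the stream
token=$(ssh replica zfs get -H -o value receive_resume_token backup/mail)
zfs send -t "$token" | ssh replica zfs receive -s backup/mail

# later runs only send the increment between two snapshots
zfs snapshot tank/mail@drp2
zfs send -i tank/mail@drp1 tank/mail@drp2 | ssh replica zfs receive backup/mail
```

Once the initial full stream has landed, the incremental sends are small enough that the periodic hangs described above become far less painful, whatever their underlying cause.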