Re: Restart from....? (DRP)

2018-06-20 Thread Michael Menge

Hi,

Quoting Albert Shih :


Hi everyone

I have a question about DRP (Disaster Recovery Plan): what is the easiest
(= fastest) way to rebuild a server (with its data) after the server
« disappears » (fire, flood, etc.)?

I see three ways to « back up » the data:

  Replication,

  Backup service (inside cyrus-imapd 3),

  Filesystem backup (whatever the technique).

For replication my concern is the speed of the replication. The main server
(I have only one) has lots of RAM, SSDs, and SAS disks; the replica has SATA
disks (and lots of RAM too). When I check, everything does seem to be
replicated on the « slave », but with some delay (1-2 days).



We have distributed our users across 6 (virtual) servers in a Cyrus 2.4
murder setup. The servers are grouped in pairs, so that one runs on hardware
in one building and the other in a second building. On each server three
Cyrus instances are running: a frontend, a backend, and a replica.


In case of disaster, or planned maintenance, we start the replica as a
normal backend. We use a service IP address for each backend and move this
IP to the other server, so we don't have to update the mupdate master's
mailbox.db.
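Such a failover might be sketched roughly like this (the IP address, the
interface, and the instance name are placeholders, not the actual setup; the
real steps depend on the local configuration):

```shell
# Hedged sketch of promoting the replica to a backend after a failure.

# 1. Move the failed backend's service IP to the surviving server
#    (192.0.2.10 and em0 are placeholders).
ip addr add 192.0.2.10/24 dev em0

# 2. Start the replica instance with a backend-style cyrus.conf
#    (imapd/lmtpd/pop3d in SERVICES instead of only syncserver);
#    the unit name below is a placeholder.
systemctl start cyrus-imapd-backend
```

Because the mupdate master only knows the service IP, no mailboxes.db change
is needed on the murder side.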


The rolling replication is able to keep up, so normally there is only a
small delay (2-5 seconds). During a traffic peak (many newsletters) it may
take up to 1-2 hours. I have only seen longer delays in the case of a
corrupt mailbox, where replication bailed out. We monitor the size of the
replication log.
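A minimal sketch of such monitoring, watching the size of the sync log file
(the example path is an assumption; check the sync_log settings in
imapd.conf for the real location):

```shell
# Warn when the rolling-replication sync log grows beyond a threshold,
# which indicates the replica is falling behind.
check_sync_log() {
    log="$1"     # path to the sync log file
    limit="$2"   # threshold in bytes
    size=$(wc -c < "$log" 2>/dev/null || echo 0)
    size=$((size + 0))   # normalize wc output (strips leading whitespace)
    if [ "$size" -gt "$limit" ]; then
        echo "WARNING: sync log is $size bytes"
    else
        echo "OK: sync log is $size bytes"
    fi
}

# e.g.: check_sync_log /var/lib/imap/sync/log 1048576
```

A check like this plugs easily into Nagios/Icinga-style monitoring, since it
prints a one-line status.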


We have ~41,000 accounts and ~13.5 TB of mail. The VMs run in a RHEV system.
Each server has 20 GB RAM and 8 CPU cores; the mail is stored on EUROstor
iSCSI systems with SATA disks. Recently we migrated the metadata onto a new
EUROstor iSCSI system with SSDs. At the moment we plan to migrate to Cyrus
3.0 in order to use archive partitions, so that recent mail will be stored
on an iSCSI system with SAS disks and older mail will be moved to the old
iSCSI system with SATA disks.
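With Cyrus 3.0, that split could be expressed in imapd.conf roughly as
follows (the paths and the age threshold are illustrative assumptions, not
the actual configuration; see the imapd.conf man page for details):

```
# fast (SAS) partition for current mail
partition-default: /var/spool/cyrus/mail
# slow (SATA) partition for archived mail
archivepartition-default: /var/spool/cyrus/archive

# enable archiving; messages older than archive_days become candidates
# for being moved to the archive partition by cyr_expire
archive_enabled: 1
archive_days: 90
```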

In addition to the disaster recovery plan we use "expunge_mode: delayed" and
"delete_mode: delayed", plus a normal file-based backup, for the "I deleted
my very important mail by accident" use case.
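A minimal sketch of that configuration (the retention periods and the
mailbox name are illustrative, not the actual values):

```
# imapd.conf: keep expunged mail and deleted folders on disk for a while
expunge_mode: delayed
delete_mode: delayed

# cleanup from cron (illustrative 28-day retention):
#   cyr_expire -X 28d -D 28d
# an accidentally expunged mail can then be brought back with unexpunge:
#   unexpunge -l user/example      (list restorable messages)
#   unexpunge -a user/example      (restore everything)
```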



Regards

   Michael Menge



M. Menge                       Tel.: (49) 7071/29-70316
Universität Tübingen           Fax: (49) 7071/29-5912
Zentrum für Datenverarbeitung  mail: michael.me...@zdv.uni-tuebingen.de

Wächterstraße 76
72074 Tübingen


Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus

Re: Restart from....? (DRP)

2018-06-18 Thread Niels Dettenbach via Info-cyrus
On Monday, 18 June 2018 at 10:48:16 CEST, Albert Shih wrote:
> Everything seems to work fine until I try to send the dataset to another
> server. I just cannot send a ZFS snapshot from this server to another. If
> the datasets are small that's OK, but with the mailbox (~4 TB) the zfs
> command just hangs after 10-40 minutes for 1-10 minutes, comes back to
> work for 1 or 2 hours, then hangs again, etc.

Ahh, yes,

we have local snapshots and a second ZFS machine for ZFS replication (incl.
snapshots) which runs in the background; the snapshots are taken locally and
sent in the background over the network to another location. If just the
machine breaks but not the disks, we can use the local disk set in a new
machine to start over. If the whole site burns down, the replicated disks
(or, temporarily, access via iSCSI, NFS or Samba) can be used to start over
on/with new hardware.
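The background replication described above might look roughly like this
(the pool, dataset and host names are placeholders, not the actual setup):

```shell
# Hedged sketch of snapshot-based ZFS replication to a second machine.
# "tank/mail" and "backuphost" are placeholder names.

# take a local snapshot
zfs snapshot tank/mail@2018-06-18

# first transfer: full send of the snapshot
zfs send tank/mail@2018-06-18 | ssh backuphost zfs receive backup/mail

# later transfers: incremental send between the previous and new snapshot,
# which only moves the blocks changed in between
zfs snapshot tank/mail@2018-06-19
zfs send -i tank/mail@2018-06-18 tank/mail@2018-06-19 | \
    ssh backuphost zfs receive backup/mail
```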

In smaller on-site setups we use e.g. FreeNAS as the "FreeBSD distribution"
for easier management (even by less skilled IT staff). This allows us to run
jails with e.g. Cyrus (encapsulated, and backed up too) which can be handled
"by click", by the way. This means the Cyrus jails run on ZFS too.

> Yes, we use puppet; reinstalling the system and configuration is easy.
> The hard part is the data.
This depends on the storage (networked, like NAS or SAN, or local). In
principle:

 - mount or copy the pool (usually the largest part)
 - reimport the databases

i.e. similar to:

https://forum.open-xchange.com/showthread.php?3512-Simple-Cyrus-mailbox-migration

or

http://www.monoplan.de/cyrus_imap_migration.html (German)

 - reconstruct -f  (I use "just" reconstruct -f, as this runs over the whole
pool too)

Then your Cyrus should be fine again. The forced reconstruct (-f) reads the
pool data (folder by folder, mail by mail) and "fixes" any inconsistencies
with the database (needed because of the "hot" state of the backup, which
was not shut down for the backup). The Cyrus databases seem quite robust to
this (compared to most other database systems).
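Under those assumptions, the recovery steps might be sketched as follows
(the device, paths and the dump file name are placeholders, not a tested
procedure):

```shell
# Hedged sketch: start over from a copied/mounted mail spool.

# 1. make the old pool available again (mount, zpool import, NFS, ...)
mount /dev/da1p1 /var/spool/cyrus

# 2. reimport the mailbox list, if a dump of it exists
#    (created with "ctl_mboxlist -d" on the old system)
ctl_mboxlist -u < mailboxes.dump

# 3. forced, recursive reconstruct to reconcile spool and databases
su cyrus -c "reconstruct -f -r"
```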

> I'm a bit new with cyrus so...  All I can say is the replication seem to
> works well. I got
Thanks for this info. I will try Cyrus replication soon for testing
purposes. ;-)

> I'll will try today to see if it's easy or not to restart with a slave by
> cloning it.
I'm new to replication, but AFAIK it should be easy to turn a slave into a
new master just by reconfiguration (cyrus.conf, some imapd.conf flags), I
assume.



hth a bit,
good luck,


niels.




-- 
 ---
 Niels Dettenbach
 Syndicat IT & Internet
 http://www.syndicat.com
 PGP: https://syndicat.com/pub_key.asc
 ---
 






Re: Restart from....? (DRP)

2018-06-18 Thread Albert Shih
On 18/06/2018 at 10:22:03+0200, Niels Dettenbach via Info-cyrus wrote:
> On Monday, 18 June 2018 at 09:46:02 CEST, Albert Shih wrote:
> > What do you think ? What's your DRP ?
> I shoot snapshots of the underlying FS of the spool partition(s) and the
> main DB files (skiplist), incl. (incremental) filesystem dumps of them.

How do you do that?

Because at the beginning my plan was to do both (replication and snapshots).

The problem is that currently I'm encountering a big issue with the
snapshots. I don't know if this is the right place, because I don't know if
it's related to Cyrus; that's why I didn't mention it at first. But I have a
server (Dell PowerEdge, 192 GB RAM, 28 mechanical disks, 2 SSDs, 2 SAS
disks (for the OS)).

The system is FreeBSD 11, running on the 2 SAS disks with UFS.

Cyrus IMAP runs inside a jail on the 2 SSDs (on a ZFS pool).

The mailboxes and the Xapian indexes are on two ZFS datasets on a zpool with
the 28 mechanical disks.

Everything seems to work fine until I try to send the dataset to another
server. I just cannot send a ZFS snapshot from this server to another. If
the datasets are small that's OK, but with the mailbox (~4 TB) the zfs
command just hangs after 10-40 minutes for 1-10 minutes, comes back to work
for 1 or 2 hours, then hangs again, etc.

> In a disaster scenario it usually works well to reinstantiate the last
> snapshot and start the server(s) with a forced full reconstruct run. But
> this only offers "low resolution" recovery (mails / mods since the last
> snapshot are gone).
>
> Beside this we run daily FS backups (incl. Cyrus DB dumps) which allow us to

How do you do that? Because Cyrus has a lot of DBs.

> reinstall from zero (i.e. automated by ansible or similar) at the system and FS

Yes, we use puppet; reinstalling the system and configuration is easy.
The hard part is the data.

> level.
>
> I'm a bit new to the backup mechanisms and repo features included in
> Cyrus 3, and interested in experiences with setups allowing an efficient
> "lossless" recovery too.

I'm a bit new with Cyrus so... ;-) All I can say is that replication seems
to work well. I got:

  master --> first slave (same room) --> second slave (distant datacenter).
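A chain like that is usually built with rolling replication on each hop,
where the first slave also runs its own sync_client. A hedged imapd.conf
sketch (hostnames and credentials are placeholders, not the actual setup):

```
# on the master: replicate to the first slave
sync_log: 1
sync_host: slave1.example.org
sync_authname: repluser
sync_password: secret

# on the first slave: replicate onward to the second, distant slave
sync_log: 1
sync_host: slave2.example.org
sync_authname: repluser
sync_password: secret
```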

I'll try today to see if it's easy or not to restart with a slave by
cloning it.

Best regards.

--
Albert SHIH
DIO bâtiment 15
Observatoire de Paris
xmpp: j...@obspm.fr
Heure local/Local time:
Mon Jun 18 10:36:19 CEST 2018
