Re: [Dovecot] Compressing existing maildirs

2012-01-03 Thread Timo Sirainen
On 31.12.2011, at 9.54, Stan Hoeppner wrote:

 Timo, is there any technical or sanity based upper bound on mdbox size?
 Anything wrong with using 64MB, 128MB, or even larger for
 mdbox_rotate_size?

Should be fine. The only issue is the extra disk I/O required to recreate the 
files during doveadm purge.



Re: [Dovecot] Compressing existing maildirs

2011-12-30 Thread Jan-Frode Myklebust
On Thu, Dec 29, 2011 at 07:00:03AM -0600, Stan Hoeppner wrote:
  We just got rid of the legacy app that worked directly against the
  maildirs, which is the reason we now can turn on compression. I
  intend to switch to mdbox, but first I need to free up some disks by
  compressing the existing maildirs (12 TB maildirs, should probably
  compress down to less than half).
 
 How much additional space do you expect the conversion process to
 compressed mdbox to consume?

Somewhere around 1/3 of the current usage, I expect..

 It shouldn't need much.  Using dsync, the
 conversion will be done one mailbox at a time and the existing emails
 will be compressed when written into the new mdbox mailbox.

Yes, I know, but I intend to do more than just convert to mdbox. I want
to fix the whole folder structure*, in a new filesystem with different
settings (turn on metadata-replication, and possibly also data
replication). So I need to free up some disks before this can start.

[*] move away from @Mails /atmail/a/b/abuser@domain folder structure to
mdbox:/srv/mailbackup/%256Hu/%d/%n, stop having home=inbox, 
possibly use many smaller fs's instead of one huge, move the indexes
inside home...


  -jf


Re: [Dovecot] Compressing existing maildirs

2011-12-30 Thread Stan Hoeppner
On 12/30/2011 8:41 AM, Jan-Frode Myklebust wrote:
 On Thu, Dec 29, 2011 at 07:00:03AM -0600, Stan Hoeppner wrote:
 We just got rid of the legacy app that worked directly against the
 maildirs, which is the reason we now can turn on compression. I
 intend to switch to mdbox, but first I need to free up some disks by
 compressing the existing maildirs (12 TB maildirs, should probably
 compress down to less than half).

 How much additional space do you expect the conversion process to
 compressed mdbox to consume?
 
 Somewhere around 1/3 of the current usage, I expect..
 
 It shouldn't need much.  Using dsync, the
 conversion will be done one mailbox at a time and the existing emails
 will be compressed when written into the new mdbox mailbox.
 
 Yes, I know, but I intend to do more than just convert to mdbox. I want
 to fix the whole folder structure*, in a new filesystem with different
 settings (turn on metadata-replication, and possibly also data
 replication). So I need to free up some disks before this can start.
 
 [*] move away from @Mails /atmail/a/b/abuser@domain folder structure to
 mdbox:/srv/mailbackup/%256Hu/%d/%n, stop having home=inbox, 
 possibly use many smaller fs's instead of one huge, move the indexes
 inside home...

Roger that.  Good strategy.  You using SAN storage or local RAID?  What
filesystem do you plan to use for the new mailbox location?  What OS is
the Dovecot host?  Lastly, how many users you have?  Sorry for prying,
I'm always really curious about system details when someone states they
have 12TB of mailbox data. ;)

-- 
Stan



Re: [Dovecot] Compressing existing maildirs

2011-12-30 Thread Jan-Frode Myklebust
On Fri, Dec 30, 2011 at 06:38:28PM -0600, Stan Hoeppner wrote:
 
 Roger that.  Good strategy.  You using SAN storage or local RAID?  What
 filesystem do you plan to use for the new mailbox location?  What OS is
 the Dovecot host?

IBM DS4800 SAN-storage. Filesystem is IBM GPFS, which stripes all I/O
over all the RAID5 LUNs it has assigned. Kind of like RAID5+0. To guard
against disaster if one RAID5 array should fail, we plan on replicating
the filesystem metadata on different sets of LUNs.

OS is RHEL (currently RHEL4 and RHEL5, but new servers are implemented
on RHEL6).

 Lastly, how many users you have?  Sorry for prying,

I'd rather not say.. but we're an ISP, with about 250.000 residential
customers and multiple mailboxes per customer.

 I'm always really curious about system details when someone states they
 have 12TB of mailbox data. ;)

$ df -h /usr/local/atmail/users
Filesystem         Size  Used Avail Use% Mounted on
/dev/atmailusers    14T   12T  2.1T  85% /usr/local/atmail/users
$ df -hi /usr/local/atmail/users
Filesystem         Inodes IUsed IFree IUse% Mounted on
/dev/atmailusers     145M  109M   37M   75% /usr/local/atmail/users

Looking forward to reducing the number of inodes when we finally move to
mdbox.. Should do wonders to the backup process.


  -jf


Re: [Dovecot] Compressing existing maildirs

2011-12-30 Thread Stan Hoeppner
On 12/31/2011 12:56 AM, Jan-Frode Myklebust wrote:
 On Fri, Dec 30, 2011 at 06:38:28PM -0600, Stan Hoeppner wrote:

 Roger that.  Good strategy.  You using SAN storage or local RAID?  What
 filesystem do you plan to use for the new mailbox location?  What OS is
 the Dovecot host?
 
 IBM DS4800 SAN-storage. Filesystem is IBM GPFS, which stripes all I/O
 over all the RAID5 LUNs it has assigned. Kind of like RAID5+0. To guard
 against disaster if one RAID5 array should fail, we plan on replicating
 the filesystem metadata on different sets of LUNs.

Nice setup.  I've mentioned GPFS for cluster use on this list before,
but I think you're the only operator to confirm using it.  I'm sure
others would be interested in hearing of your first hand experience:
pros, cons, performance, etc.  And a ball park figure on the licensing
costs, whether one can only use GPFS on IBM storage or if storage from
other vendors is allowed in the GPFS pool.

To this point IIRC everyone here doing clusters is using NFS, GFS, or
OCFS.  Each has its downsides, mostly because everyone is using maildir.
NFS has locking issues with shared dovecot index files.  GFS and OCFS
have filesystem metadata performance issues.  How does GPFS perform with
your maildir workload?

 OS is RHEL (currently RHEL4 and RHEL5, but new servers are implemented
 on RHEL6).
 
 Lastly, how many users you have?  Sorry for prying,
 
 I'd rather not say.. but we're an ISP, with about 250.000 residential
 customers and multiple mailboxes per customer.
 
 I'm always really curious about system details when someone states they
 have 12TB of mailbox data. ;)
 
   $ df -h /usr/local/atmail/users
   Filesystem         Size  Used Avail Use% Mounted on
   /dev/atmailusers    14T   12T  2.1T  85% /usr/local/atmail/users
   $ df -hi /usr/local/atmail/users
   Filesystem         Inodes IUsed IFree IUse% Mounted on
   /dev/atmailusers     145M  109M   37M   75% /usr/local/atmail/users
 
 Looking forward to reducing the number of inodes when we finally move to
 mdbox.. Should do wonders to the backup process.

That will depend to a large degree on your mdbox_rotate_size value.  The
default is 2MB, which means you'll get multiple ~2MB mdbox files.  If we
assume the average email size including headers and attachments is 32KB,
Dovecot will place ~64 such emails in a single mdbox file with the
default 2MB setting.  32KB may be a high or low average depending on
your particular users.
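
The arithmetic above can be sketched in shell. This is only a rough
illustration: the function names are made up for this example, the 32KB
average is the assumption stated above, and the 100 million mail count
is a hypothetical round number, not a figure from this system.

```shell
#!/bin/sh
# Mails that fit in one mdbox file.
# $1 = mdbox_rotate_size in KB, $2 = assumed average mail size in KB
mails_per_file() {
    echo $(( $1 / $2 ))
}

# Approximate number of mdbox files needed, rounding up.
# $1 = total number of mails, $2 = rotate size in KB, $3 = average mail size in KB
files_needed() {
    per_file=$(mails_per_file "$2" "$3")
    echo $(( ($1 + per_file - 1) / per_file ))
}

# Example calls (hypothetical mail count):
#   mails_per_file 2048 32             # prints 64
#   files_needed 100000000 2048 32     # prints 1562500
#   files_needed 100000000 16384 32    # prints 195313
```

With maildir every mail is its own file, so the same hypothetical 100
million mails would need 100 million inodes for the mail files alone.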

Considering there is no inherent performance downside to going larger
than the default, and significant gains to be made, consider a setting
of 8MB to 16MB.  This will dramatically reduce both inode consumption
and filesystem metadata IOPS vs maildir.  Reducing IOPS on a shared SAN
is always a plus, especially if you're going to be adding some extra
GPFS replication traffic.

Timo, is there any technical or sanity based upper bound on mdbox size?
Anything wrong with using 64MB, 128MB, or even larger for
mdbox_rotate_size?

-- 
Stan


Re: [Dovecot] Compressing existing maildirs

2011-12-29 Thread Jan-Frode Myklebust
On Wed, Dec 28, 2011 at 03:56:33PM -0800, Dovecot-GDH wrote:
 The cleanest (though not necessarily simplest) way to go about this would be 
 to use dsync to create a new maildir and incrementally direct traffic to a 
 separate Dovecot instance.
 
 Unless you have a legacy application that relies on maildir, switching to 
 mdbox would be a good idea too.

We just got rid of the legacy app that worked directly against the
maildirs, which is the reason we now can turn on compression. I
intend to switch to mdbox, but first I need to free up some disks by
compressing the existing maildirs (12 TB maildirs, should probably
compress down to less than half).

 
 I expect that with Dovecot compression is something that can just be turned 
 on, but for fear of any possible issue, I chose to migrate mailboxes in 
 batches with the way mentioned above.
 

Migrating to mdbox is much scarier to me than an easily reversible
compression of existing maildir files.

Could you please give a bit more details about how you did this migration?
Did you change user home directory in the process? Seeing the scripts you
used to run the migration would be very interesting..


  -jf


Re: [Dovecot] Compressing existing maildirs

2011-12-29 Thread Timo Sirainen
On 29.12.2011, at 15.36, Jan-Frode Myklebust wrote:

 On Thu, Dec 29, 2011 at 02:55:40PM +0200, Timo Sirainen wrote:
 
 I.e. find all maildir-files:
 
 - with size in the name (*,S=*)
 - modified before I enabled zlib plugin
 
 As long as it doesn't find any already compressed mails..
 
 Can't I trust that no mails with timestamp before I enabled compression
 are uncompressed? Or will dovecot compress old messages keeping old
 timestamp when copying messages between folders, or something like that?

It's possible that a user saves a mail with an old IMAP INTERNALDATE (=file's 
mtime), which is already compressed. You could use ctime, but that could skip 
mails whose flags have been changed since compression.

 I want to avoid reading every file to check if it's compressed
 already, as that will add ages to an already slow process..

You could use mtime, and just before compressing the mail check if it's already 
compressed. That won't add much overhead.
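
One inexpensive way to do such a check is to read only the first two
bytes of each candidate: gzip data starts with the bytes 0x1f 0x8b. The
following is a minimal sketch, not a tested migration script. The
function names are invented for this example, and it takes no
maildirlock, so it is just as racy as the plain find command.

```shell
#!/bin/sh
# Succeed (exit status 0) if the file starts with the gzip magic bytes 1f 8b.
is_gzipped() {
    magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Compress one maildir file unless it already contains gzip data.
# "gzip -S Z" appends the literal suffix "Z" to the file name.
compress_if_needed() {
    if is_gzipped "$1"; then
        return 0
    fi
    gzip -S Z -6 -- "$1"
}
```

Only two bytes are read from files that are already compressed, so the
extra cost per file is one open and one short read.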

 - compress them 
 - add the Z suffix
 
 Make sure there's also :2, suffix already. If someone hasn't logged in for a 
 while there are such files in new/ directory.
 
 So, 
   find /var/vmail -type f -name '*,S=*:2*' -mtime +6 -exec gzip -S Z -6 '{}' +
 
 
 Right ? I don't care too much if I miss on a few percent of the files..

Yes.

Re: [Dovecot] Compressing existing maildirs

2011-12-29 Thread Timo Sirainen
On 24.12.2011, at 17.20, Jan-Frode Myklebust wrote:

 I've just enabled zlib for our users, and am looking at how to compress
 the existing files. The routine for doing this at
 http://wiki2.dovecot.org/Plugins/Zlib seems a bit complicated. What do
 you think about simply doing:
 
   find /var/vmail -type f -name '*,S=*' -mtime +1 -exec gzip -S Z -6 '{}' +
 
 
 I.e. find all maildir-files:
 
   - with size in the name (*,S=*)
   - modified before I enabled zlib plugin

As long as it doesn't find any already compressed mails..

   - compress them 
   - add the Z suffix

Make sure there's also :2, suffix already. If someone hasn't logged in for a 
while there are such files in new/ directory.

 It's of course racy without the maildirlock, but are there any other
 problems with this approach ?

Other than being racy, I guess it should work.

Re: [Dovecot] Compressing existing maildirs

2011-12-29 Thread Jan-Frode Myklebust
On Thu, Dec 29, 2011 at 02:55:40PM +0200, Timo Sirainen wrote:
  
  I.e. find all maildir-files:
  
  - with size in the name (*,S=*)
  - modified before I enabled zlib plugin
 
 As long as it doesn't find any already compressed mails..

Can't I trust that no mails with timestamp before I enabled compression
are uncompressed? Or will dovecot compress old messages keeping old
timestamp when copying messages between folders, or something like that?

I want to avoid reading every file to check if it's compressed
already, as that will add ages to an already slow process..

 
  - compress them 
  - add the Z suffix
 
 Make sure there's also :2, suffix already. If someone hasn't logged in for a 
 while there are such files in new/ directory.

So, 
find /var/vmail -type f -name '*,S=*:2*' -mtime +6 -exec gzip -S Z -6 '{}' +


Right ? I don't care too much if I miss on a few percent of the files..


(I'll probably have to use -newer /somefile instead of -mtime since it
will run for some days)
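
A sketch of the -newer variant, under some assumptions: the reference
file path and its timestamp are placeholders, the function name is made
up, and the extra "! -name '*Z'" test assumes that files already
handled by an earlier run of this script end in the Z suffix.

```shell
#!/bin/sh
# List maildir files that have a size field and a ":2," flag section,
# were not modified after the reference file, and do not already end in Z.
# $1 = maildir root, $2 = reference file whose mtime marks the cutoff
list_candidates() {
    find "$1" -type f -name '*,S=*:2,*' ! -name '*Z' ! -newer "$2" -print
}

# Usage sketch (placeholder path and date):
#   touch -t 201112240000 /tmp/zlib-enabled
#   list_candidates /var/vmail /tmp/zlib-enabled
```

Because the cutoff is the fixed mtime of the reference file, the
selection does not drift while the job runs for several days.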


  -jf


Re: [Dovecot] Compressing existing maildirs

2011-12-29 Thread Stan Hoeppner
On 12/29/2011 2:49 AM, Jan-Frode Myklebust wrote:
 On Wed, Dec 28, 2011 at 03:56:33PM -0800, Dovecot-GDH wrote:
 The cleanest (though not necessarily simplest) way to go about this would be 
 to use dsync to create a new maildir and incrementally direct traffic to a 
 separate Dovecot instance.

 Unless you have a legacy application that relies on maildir, switching to 
 mdbox would be a good idea too.
 
 We just got rid of the legacy app that worked directly against the
 maildirs, which is the reason we now can turn on compression. I
 intend to switch to mdbox, but first I need to free up some disks by
 compressing the existing maildirs (12 TB maildirs, should probably
 compress down to less than half).

How much additional space do you expect the conversion process to
compressed mdbox to consume?  It shouldn't need much.  Using dsync, the
conversion will be done one mailbox at a time and the existing emails
will be compressed when written into the new mdbox mailbox.

After you've converted a few mailboxes by hand and have confirmed you're
happy with the results, simply add commands to your bulk conversion
script to delete each user maildir and contents after the new mdbox
mailbox has been created and populated.  Using this method shouldn't
require much more additional filesystem space than that equal to your
largest single user maildir.

Given your 12TB of mailstore, I'd convert users in small batches over a
period of weeks or a month, depending on your total mailbox count.
Firing up a conversion script and having it run non-stop until all 12TB
are converted is probably asking for trouble due to many factors I
shouldn't need to put down here.  Time your first few manual
conversions.  Divide that average time into your daily off-peak hours so
you know approximately how many mailboxes you can convert during
off-peak hours.  Run your script daily against these small sets of
mailboxes until the entire process is complete.
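
The batch sizing described above is simple arithmetic. A minimal
sketch, with made-up function names and purely hypothetical example
numbers (a 6 hour off-peak window, 30 seconds per mailbox, 50000
mailboxes):

```shell
#!/bin/sh
# Mailboxes that fit into one off-peak window when converted one at a time.
# $1 = window length in hours, $2 = average seconds per mailbox conversion
mailboxes_per_window() {
    echo $(( $1 * 3600 / $2 ))
}

# Number of daily runs needed, rounding up.
# $1 = total mailboxes, $2 = mailboxes per window
runs_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Example calls (hypothetical values):
#   mailboxes_per_window 6 30    # prints 720
#   runs_needed 50000 720        # prints 70
```

Measure the average conversion time on your own hardware before
trusting any such estimate.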

-- 
Stan


Re: [Dovecot] Compressing existing maildirs

2011-12-28 Thread Dovecot-GDH
The cleanest (though not necessarily simplest) way to go about this would be to 
use dsync to create a new maildir and incrementally direct traffic to a 
separate Dovecot instance.

Unless you have a legacy application that relies on maildir, switching to mdbox 
would be a good idea too.

I expect that with Dovecot compression is something that can just be turned 
on, but for fear of any possible issue, I chose to migrate mailboxes in 
batches with the way mentioned above.

On Dec 24, 2011, at 7:20 AM, Jan-Frode Myklebust wrote:

 I've just enabled zlib for our users, and am looking at how to compress
 the existing files. The routine for doing this at
 http://wiki2.dovecot.org/Plugins/Zlib seems a bit complicated. What do
 you think about simply doing:
 
   find /var/vmail -type f -name '*,S=*' -mtime +1 -exec gzip -S Z -6 '{}' +
 
 
 I.e. find all maildir-files:
 
   - with size in the name (*,S=*)
   - modified before I enabled zlib plugin
   - compress them 
   - add the Z suffix
   - keep timestamps (gzip does that by default)
   
 
 It's of course racy without the maildirlock, but are there any other
 problems with this approach ?
 
 
 -jf



[Dovecot] Compressing existing maildirs

2011-12-24 Thread Jan-Frode Myklebust
I've just enabled zlib for our users, and am looking at how to compress
the existing files. The routine for doing this at
http://wiki2.dovecot.org/Plugins/Zlib seems a bit complicated. What do
you think about simply doing:

find /var/vmail -type f -name '*,S=*' -mtime +1 -exec gzip -S Z -6 '{}' +


I.e. find all maildir-files:

- with size in the name (*,S=*)
- modified before I enabled zlib plugin
- compress them 
- add the Z suffix
- keep timestamps (gzip does that by default)


It's of course racy without the maildirlock, but are there any other
problems with this approach ?


 -jf