Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-10-04 Thread Webert de Souza Lima
Hi, bringing this up again to ask one more question:

what would be the recommended locking strategy for dovecot on top of
cephfs?
This is a load-balanced setup using independent director instances, but the
dovecot instances on all nodes share the same storage system (cephfs).
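
For reference, here is the direction I'm leaning; just a sketch based on my
reading of dovecot's shared-storage documentation, untested on cephfs (and
since cephfs is POSIX-coherent across mounts, some of the NFS-era
workarounds may be unnecessary):

  # conf.d/10-mail.conf (sketch, untested assumptions)
  lock_method = fcntl      # dotlock, fcntl or flock; fcntl should work on cephfs
  mail_fsync = always      # safest with multiple writers on shared storage
  mmap_disable = yes       # commonly advised for non-local filesystems
  dotlock_use_excl = yes   # O_EXCL should be reliable on cephfs, unlike old NFS

Does that look sane, or is there a better-known combination for cephfs?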

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*


On Wed, May 16, 2018 at 5:15 PM Webert de Souza Lima wrote:

> Thanks Jack.
>
> That's good to know. It is definitely something to consider.
> In a distributed storage scenario we might build a dedicated pool for that
> and tune the pool as more capacity or performance is needed.
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> *Belo Horizonte - Brasil*
> *IRC NICK - WebertRLZ*
>
>
> On Wed, May 16, 2018 at 4:45 PM Jack  wrote:
>
>> On 05/16/2018 09:35 PM, Webert de Souza Lima wrote:
>> > We'll soon do benchmarks of sdbox vs mdbox over cephfs with bluestore
>> > backend.
>> > We'll have to do some work on how to simulate user traffic, for both
>> > writes and reads. That seems troublesome.
>> I would appreciate seeing these results!
>>
>> > Thanks for the plugin recommendations. I'll take the chance and ask you:
>> > how is the SIS status? We have used it in the past and we've had some
>> > problems with it.
>>
>> I have been using it since Dec 2016 with mdbox, with no issues at all (I am
>> currently using Dovecot 2.2.27-3 from Debian Stretch).
>> The only setting I changed is mail_attachment_dir; the rest is left at the
>> defaults (mail_attachment_min_size = 128k, mail_attachment_fs = sis posix,
>> mail_attachment_hash = %{sha1}).
>> The backend storage is a local filesystem, and there is only one Dovecot
>> instance
>>
>> >
>> > Regards,
>> >
>> > Webert Lima
>> > DevOps Engineer at MAV Tecnologia
>> > *Belo Horizonte - Brasil*
>> > *IRC NICK - WebertRLZ*
>> >
>> >
>> > On Wed, May 16, 2018 at 4:19 PM Jack  wrote:
>> >
>> >> Hi,
>> >>
>> >> Many (most?) filesystems do not store multiple files in the same
>> >> block.
>> >>
>> >> Thus, with sdbox, every single mail (you know, the kind of mail with
>> >> 10 lines in it) will eat an inode and a block (4k here);
>> >> mdbox is more compact in this regard.
>> >>
>> >> Another difference: on expunge, sdbox removes the message file, mdbox
>> >> does not: a single metadata update is performed, which may be batched
>> >> with others if many files are deleted at once.
>> >>
>> >> That said, I have no experience with dovecot + cephfs, nor have I run
>> >> tests of sdbox vs mdbox.
>> >>
>> >> However, and this is a bit off topic, I recommend you look at the
>> >> following dovecot features (if not already done), as they are awesome
>> >> and will help you a lot:
>> >> - Compression (classic, https://wiki.dovecot.org/Plugins/Zlib)
>> >> - Single-Instance-Storage (aka sis, aka "attachment deduplication" :
>> >> https://www.dovecot.org/list/dovecot/2013-December/094276.html)
>> >>
>> >> Regards,
>> >> On 05/16/2018 08:37 PM, Webert de Souza Lima wrote:
>> >>> I'm sending this message to both the dovecot and ceph-users MLs, so
>> >>> please don't mind if something seems too obvious to you.
>> >>>
>> >>> Hi,
>> >>>
>> >>> I have a question for both the dovecot and ceph lists; below I'll
>> >>> explain what's going on.
>> >>>
>> >>> Regarding dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox),
>> >>> when using sdbox, a new file is stored for each email message.
>> >>> When using mdbox, multiple messages are appended to a single file
>> >>> until it reaches/passes the rotate limit.
>> >>>
>> >>> I would like to understand better how the mdbox format impacts IO
>> >>> performance.
>> >>> I think it's generally expected that fewer, larger files translate to
>> >>> less IO and more throughput compared to many small files, but how
>> >>> does dovecot handle that with mdbox?
>> >>> If dovecot flushes data to storage every time a new email arrives and
>> >>> is appended to the corresponding file, would that mean it generates
>> >>> the same amount of IO as it would with one file per message?
>> >>> Also, when using mdbox, many messages will be appended to a given
>> >>> file before a new file is created. That should mean a file descriptor
>> >>> is kept open for some time by the dovecot process.
>> >>> Using cephfs as backend, how would this impact cluster performance
>> >>> regarding MDS caps and cached inodes when files from thousands of
>> >>> users are opened and appended all over?
>> >>>
>> >>> I would like to understand this better.
>> >>>
>> >>> Why?
>> >>> We are a small Business Email Hosting provider with bare-metal,
>> >>> self-hosted systems, using dovecot for servicing mailboxes and cephfs
>> >>> for email storage.
>> >>>
>> >>> We are currently working on dovecot and storage redesign to be in
>> >>> production ASAP. The main objective is to serve more users with
>> >>> better performance, high availability and scalability.

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
Thanks Jack.

That's good to know. It is definitely something to consider.
In a distributed storage scenario we might build a dedicated pool for that
and tune the pool as more capacity or performance is needed.

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*


On Wed, May 16, 2018 at 4:45 PM Jack  wrote:

> On 05/16/2018 09:35 PM, Webert de Souza Lima wrote:
> > We'll soon do benchmarks of sdbox vs mdbox over cephfs with bluestore
> > backend.
> > We'll have to do some work on how to simulate user traffic, for both
> > writes and reads. That seems troublesome.
> I would appreciate seeing these results!
>
> > Thanks for the plugin recommendations. I'll take the chance and ask you:
> > how is the SIS status? We have used it in the past and we've had some
> > problems with it.
>
> I have been using it since Dec 2016 with mdbox, with no issues at all (I am
> currently using Dovecot 2.2.27-3 from Debian Stretch).
> The only setting I changed is mail_attachment_dir; the rest is left at the
> defaults (mail_attachment_min_size = 128k, mail_attachment_fs = sis posix,
> mail_attachment_hash = %{sha1}).
> The backend storage is a local filesystem, and there is only one Dovecot
> instance
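> Spelled out as a config block, that is (the attachment directory path is
> just an example, use whatever fits your layout):
>
>   # conf.d/10-mail.conf (sketch)
>   mail_attachment_dir = /srv/mail/attachments
>   mail_attachment_min_size = 128k
>   mail_attachment_fs = sis posix
>   mail_attachment_hash = %{sha1}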
>
> >
> > Regards,
> >
> > Webert Lima
> > DevOps Engineer at MAV Tecnologia
> > *Belo Horizonte - Brasil*
> > *IRC NICK - WebertRLZ*
> >
> >
> > On Wed, May 16, 2018 at 4:19 PM Jack  wrote:
> >
> >> Hi,
> >>
> >> Many (most?) filesystems do not store multiple files in the same
> >> block.
> >>
> >> Thus, with sdbox, every single mail (you know, the kind of mail with 10
> >> lines in it) will eat an inode and a block (4k here);
> >> mdbox is more compact in this regard.
> >>
> >> Another difference: on expunge, sdbox removes the message file, mdbox does
> >> not: a single metadata update is performed, which may be batched with
> >> others if many files are deleted at once.
> >>
> >> That said, I have no experience with dovecot + cephfs, nor have I run
> >> tests of sdbox vs mdbox.
> >>
> >> However, and this is a bit off topic, I recommend you look at the
> >> following dovecot features (if not already done), as they are awesome
> >> and will help you a lot:
> >> - Compression (classic, https://wiki.dovecot.org/Plugins/Zlib)
> >> - Single-Instance-Storage (aka sis, aka "attachment deduplication" :
> >> https://www.dovecot.org/list/dovecot/2013-December/094276.html)
> >>
> >> Regards,
> >> On 05/16/2018 08:37 PM, Webert de Souza Lima wrote:
> >>> I'm sending this message to both the dovecot and ceph-users MLs, so
> >>> please don't mind if something seems too obvious to you.
> >>>
> >>> Hi,
> >>>
> >>> I have a question for both the dovecot and ceph lists; below I'll
> >>> explain what's going on.
> >>>
> >>> Regarding dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox),
> >>> when using sdbox, a new file is stored for each email message.
> >>> When using mdbox, multiple messages are appended to a single file until
> >>> it reaches/passes the rotate limit.
> >>>
> >>> I would like to understand better how the mdbox format impacts IO
> >>> performance.
> >>> I think it's generally expected that fewer, larger files translate to
> >>> less IO and more throughput compared to many small files, but how does
> >>> dovecot handle that with mdbox?
> >>> If dovecot flushes data to storage every time a new email arrives and is
> >>> appended to the corresponding file, would that mean it generates the
> >>> same amount of IO as it would with one file per message?
> >>> Also, when using mdbox, many messages will be appended to a given file
> >>> before a new file is created. That should mean a file descriptor is
> >>> kept open for some time by the dovecot process.
> >>> Using cephfs as backend, how would this impact cluster performance
> >>> regarding MDS caps and cached inodes when files from thousands of users
> >>> are opened and appended all over?
> >>>
> >>> I would like to understand this better.
> >>>
> >>> Why?
> >>> We are a small Business Email Hosting provider with bare-metal,
> >>> self-hosted systems, using dovecot for servicing mailboxes and cephfs
> >>> for email storage.
> >>>
> >>> We are currently working on dovecot and storage redesign to be in
> >>> production ASAP. The main objective is to serve more users with better
> >>> performance, high availability and scalability.
> >>> * high availability and load balancing are extremely important to us *
> >>>
> >>> In our current model, we're using the mdbox format with dovecot, having
> >>> dovecot's INDEXes stored in a replicated pool of SSDs, and messages
> >>> stored in a replicated pool of HDDs (under a Cache Tier with a pool of
> >>> SSDs). All using cephfs / filestore backend.
> >>>
> >>> Currently there are 3 clusters running dovecot 2.2.34 and ceph Jewel
> >>> (10.2.9-4).
> >>>  - ~25K users from a few thousand domains per cluster
> 

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
Hello Danny,

I actually saw that thread and was very excited about it. Thank you all
for that idea and all the effort being put into it.
I haven't yet played around with your plugin, but I intend to, and to
contribute back. I think when it's ready for production it will be
unbeatable.

I have watched your talk at Cephalocon (on YouTube). I'll go through your
slides; maybe they'll give me more insights into our infrastructure
architecture.

As you can see, our business is still taking baby steps compared to Deutsche
Telekom's, but we have faced infrastructure challenges every day since the
beginning.
For now, I think cephfs could still fit us, but we definitely need some
improvements.

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*


On Wed, May 16, 2018 at 4:42 PM Danny Al-Gaaf wrote:

> Hi,
>
> some time back we had similar discussions when we, as an email provider,
> discussed moving away from traditional NAS/NFS storage to Ceph.
>
> The problem with POSIX file systems and dovecot is that e.g. with mdbox
> only around ~20% of the IO operations are READ/WRITE, the rest are
> metadata IOs. You will not change this by using CephFS, since it will
> basically behave the same way as e.g. NFS.
>
> We decided to develop librmb to store emails as objects directly in
> RADOS instead of CephFS. The project is still under development, so you
> should not use it in production, but you can try it to run a POC.
>
> For more information check out my slides from Ceph Day London 2018:
> https://dalgaaf.github.io/cephday-london2018-emailstorage/#/cover-page
>
> The project can be found on github:
> https://github.com/ceph-dovecot/
>
> -Danny
>
> Am 16.05.2018 um 20:37 schrieb Webert de Souza Lima:
> > I'm sending this message to both the dovecot and ceph-users MLs, so
> > please don't mind if something seems too obvious to you.
> >
> > Hi,
> >
> > I have a question for both the dovecot and ceph lists; below I'll explain
> > what's going on.
> >
> > Regarding dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox),
> > when using sdbox, a new file is stored for each email message.
> > When using mdbox, multiple messages are appended to a single file until
> > it reaches/passes the rotate limit.
> >
> > I would like to understand better how the mdbox format impacts IO
> > performance.
> > I think it's generally expected that fewer, larger files translate to less
> > IO and more throughput compared to many small files, but how does dovecot
> > handle that with mdbox?
> > If dovecot flushes data to storage every time a new email arrives and is
> > appended to the corresponding file, would that mean it generates the same
> > amount of IO as it would with one file per message?
> > Also, when using mdbox, many messages will be appended to a given file
> > before a new file is created. That should mean a file descriptor is kept
> > open for some time by the dovecot process.
> > Using cephfs as backend, how would this impact cluster performance
> > regarding MDS caps and cached inodes when files from thousands of users
> > are opened and appended all over?
> >
> > I would like to understand this better.
> >
> > Why?
> > We are a small Business Email Hosting provider with bare-metal,
> > self-hosted systems, using dovecot for servicing mailboxes and cephfs for
> > email storage.
> >
> > We are currently working on dovecot and storage redesign to be in
> > production ASAP. The main objective is to serve more users with better
> > performance, high availability and scalability.
> > * high availability and load balancing are extremely important to us *
> >
> > In our current model, we're using the mdbox format with dovecot, having
> > dovecot's INDEXes stored in a replicated pool of SSDs, and messages
> > stored in a replicated pool of HDDs (under a Cache Tier with a pool of
> > SSDs). All using cephfs / filestore backend.
> >
> > Currently there are 3 clusters running dovecot 2.2.34 and ceph Jewel
> > (10.2.9-4).
> >  - ~25K users from a few thousand domains per cluster
> >  - ~25TB of email data per cluster
> >  - ~70GB of dovecot INDEX [meta]data per cluster
> >  - ~100MB of cephfs metadata per cluster
> >
> > Our goal is to build a single ceph cluster for storage that could expand
> > in capacity, be highly available and perform well enough. I know, that's
> > what everyone wants.
> >
> > Cephfs is an important choice because:
> >  - there can be multiple mountpoints, thus multiple dovecot instances on
> > different hosts
> >  - the same storage backend is used for all dovecot instances
> >  - no need for domain sharding
> >  - dovecot is easily load balanced (with director sticking users to the
> > same dovecot backend)
> >
> > In the upcoming upgrade we intend to:
> >  - upgrade ceph to 12.X (Luminous)
> >  - drop the SSD Cache Tier (because it's deprecated)
> >  - use bluestore engine
> >
> > I was told on freenode/#dovecot that there are many cases where SDBOX
> > would perform better with NFS sharing.

Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Danny Al-Gaaf
Hi,

some time back we had similar discussions when we, as an email provider,
discussed moving away from traditional NAS/NFS storage to Ceph.

The problem with POSIX file systems and dovecot is that e.g. with mdbox
only around ~20% of the IO operations are READ/WRITE, the rest are
metadata IOs. You will not change this by using CephFS, since it will
basically behave the same way as e.g. NFS.

We decided to develop librmb to store emails as objects directly in
RADOS instead of CephFS. The project is still under development, so you
should not use it in production, but you can try it to run a POC.

For more information check out my slides from Ceph Day London 2018:
https://dalgaaf.github.io/cephday-london2018-emailstorage/#/cover-page

The project can be found on github:
https://github.com/ceph-dovecot/

-Danny

Am 16.05.2018 um 20:37 schrieb Webert de Souza Lima:
> I'm sending this message to both the dovecot and ceph-users MLs, so please
> don't mind if something seems too obvious to you.
> 
> Hi,
> 
> I have a question for both the dovecot and ceph lists; below I'll explain
> what's going on.
> 
> Regarding dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox), when
> using sdbox, a new file is stored for each email message.
> When using mdbox, multiple messages are appended to a single file until it
> reaches/passes the rotate limit.
> 
> I would like to understand better how the mdbox format impacts IO
> performance.
> I think it's generally expected that fewer, larger files translate to less
> IO and more throughput compared to many small files, but how does dovecot
> handle that with mdbox?
> If dovecot flushes data to storage every time a new email arrives and is
> appended to the corresponding file, would that mean it generates the same
> amount of IO as it would with one file per message?
> Also, when using mdbox, many messages will be appended to a given file
> before a new file is created. That should mean a file descriptor is kept
> open for some time by the dovecot process.
> Using cephfs as backend, how would this impact cluster performance
> regarding MDS caps and cached inodes when files from thousands of users are
> opened and appended all over?
> 
> I would like to understand this better.
> 
> Why?
> We are a small Business Email Hosting provider with bare-metal, self-hosted
> systems, using dovecot for servicing mailboxes and cephfs for email storage.
> 
> We are currently working on dovecot and storage redesign to be in
> production ASAP. The main objective is to serve more users with better
> performance, high availability and scalability.
> * high availability and load balancing are extremely important to us *
> 
> In our current model, we're using the mdbox format with dovecot, having
> dovecot's INDEXes stored in a replicated pool of SSDs, and messages stored
> in a replicated pool of HDDs (under a Cache Tier with a pool of SSDs).
> All using cephfs / filestore backend.
> 
> Currently there are 3 clusters running dovecot 2.2.34 and ceph Jewel
> (10.2.9-4).
>  - ~25K users from a few thousand domains per cluster
>  - ~25TB of email data per cluster
>  - ~70GB of dovecot INDEX [meta]data per cluster
>  - ~100MB of cephfs metadata per cluster
> 
> Our goal is to build a single ceph cluster for storage that could expand in
> capacity, be highly available and perform well enough. I know, that's what
> everyone wants.
> 
> Cephfs is an important choice because:
>  - there can be multiple mountpoints, thus multiple dovecot instances on
> different hosts
>  - the same storage backend is used for all dovecot instances
>  - no need for domain sharding
>  - dovecot is easily load balanced (with director sticking users to the
> same dovecot backend)
> 
> In the upcoming upgrade we intend to:
>  - upgrade ceph to 12.X (Luminous)
>  - drop the SSD Cache Tier (because it's deprecated)
>  - use bluestore engine
> 
> I was told on freenode/#dovecot that there are many cases where SDBOX would
> perform better with NFS sharing.
> In the case of cephfs, at first I wouldn't think that would be true, because
> more files == more generated IO; but considering what I said at the
> beginning regarding sdbox vs mdbox, that could be wrong.
> 
> Any thoughts will be highly appreciated.
> 
> Regards,
> 
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> *Belo Horizonte - Brasil*
> *IRC NICK - WebertRLZ*
> 
> 
> 
> 


Re: [ceph-users] dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
Hello Jack,

yes, I imagine I'll have to do some work on tuning the block size on
cephfs. Thanks for the advice.
I knew that with mdbox messages are not removed, but I thought that was
true for sdbox too. Thanks again.

We'll soon do benchmarks of sdbox vs mdbox over cephfs with bluestore
backend.
We'll have to do some work on how to simulate user traffic, for both writes
and reads. That seems troublesome.

Thanks for the plugin recommendations. I'll take the chance and ask you:
how is the SIS status? We have used it in the past and we've had some
problems with it.

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*


On Wed, May 16, 2018 at 4:19 PM Jack  wrote:

> Hi,
>
> Many (most?) filesystems do not store multiple files in the same block.
>
> Thus, with sdbox, every single mail (you know, the kind of mail with 10
> lines in it) will eat an inode and a block (4k here);
> mdbox is more compact in this regard.
>
> Another difference: on expunge, sdbox removes the message file, mdbox does
> not: a single metadata update is performed, which may be batched with
> others if many files are deleted at once.
>
> That said, I have no experience with dovecot + cephfs, nor have I run
> tests of sdbox vs mdbox.
>
> However, and this is a bit off topic, I recommend you look at the
> following dovecot features (if not already done), as they are awesome
> and will help you a lot:
> - Compression (classic, https://wiki.dovecot.org/Plugins/Zlib)
> - Single-Instance-Storage (aka sis, aka "attachment deduplication" :
> https://www.dovecot.org/list/dovecot/2013-December/094276.html)
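> For the compression side, a minimal sketch (the gz level is just an
> example; see the wiki page above for the full option list):
>
>   # conf.d/10-mail.conf (sketch)
>   mail_plugins = $mail_plugins zlib
>   plugin {
>     zlib_save = gz        # compress newly saved mails with gzip
>     zlib_save_level = 6   # 1..9
>   }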
>
> Regards,
> On 05/16/2018 08:37 PM, Webert de Souza Lima wrote:
> > I'm sending this message to both the dovecot and ceph-users MLs, so
> > please don't mind if something seems too obvious to you.
> >
> > Hi,
> >
> > I have a question for both the dovecot and ceph lists; below I'll explain
> > what's going on.
> >
> > Regarding dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox),
> > when using sdbox, a new file is stored for each email message.
> > When using mdbox, multiple messages are appended to a single file until
> > it reaches/passes the rotate limit.
> >
> > I would like to understand better how the mdbox format impacts IO
> > performance.
> > I think it's generally expected that fewer, larger files translate to less
> > IO and more throughput compared to many small files, but how does dovecot
> > handle that with mdbox?
> > If dovecot flushes data to storage every time a new email arrives and is
> > appended to the corresponding file, would that mean it generates the same
> > amount of IO as it would with one file per message?
> > Also, when using mdbox, many messages will be appended to a given file
> > before a new file is created. That should mean a file descriptor is kept
> > open for some time by the dovecot process.
> > Using cephfs as backend, how would this impact cluster performance
> > regarding MDS caps and cached inodes when files from thousands of users
> > are opened and appended all over?
> >
> > I would like to understand this better.
> >
> > Why?
> > We are a small Business Email Hosting provider with bare-metal,
> > self-hosted systems, using dovecot for servicing mailboxes and cephfs
> > for email storage.
> >
> > We are currently working on dovecot and storage redesign to be in
> > production ASAP. The main objective is to serve more users with better
> > performance, high availability and scalability.
> > * high availability and load balancing are extremely important to us *
> >
> > In our current model, we're using the mdbox format with dovecot, having
> > dovecot's INDEXes stored in a replicated pool of SSDs, and messages
> > stored in a replicated pool of HDDs (under a Cache Tier with a pool of
> > SSDs). All using cephfs / filestore backend.
> >
> > Currently there are 3 clusters running dovecot 2.2.34 and ceph Jewel
> > (10.2.9-4).
> >  - ~25K users from a few thousand domains per cluster
> >  - ~25TB of email data per cluster
> >  - ~70GB of dovecot INDEX [meta]data per cluster
> >  - ~100MB of cephfs metadata per cluster
> >
> > Our goal is to build a single ceph cluster for storage that could expand
> > in capacity, be highly available and perform well enough. I know, that's
> > what everyone wants.
> >
> > Cephfs is an important choice because:
> >  - there can be multiple mountpoints, thus multiple dovecot instances on
> > different hosts
> >  - the same storage backend is used for all dovecot instances
> >  - no need for domain sharding
> >  - dovecot is easily load balanced (with director sticking users to the
> > same dovecot backend)
> >
> > In the upcoming upgrade we intend to:
> >  - upgrade ceph to 12.X (Luminous)
> >  - drop the SSD Cache Tier (because it's deprecated)
> >  - use bluestore engine
> >
> > I was told on freenode/#dovecot that there are many cases where SDBOX
> > would perform better with NFS sharing.

dovecot + cephfs - sdbox vs mdbox

2018-05-16 Thread Webert de Souza Lima
I'm sending this message to both the dovecot and ceph-users MLs, so please
don't mind if something seems too obvious to you.

Hi,

I have a question for both the dovecot and ceph lists; below I'll explain
what's going on.

Regarding dbox format (https://wiki2.dovecot.org/MailboxFormat/dbox), when
using sdbox, a new file is stored for each email message.
When using mdbox, multiple messages are appended to a single file until it
reaches/passes the rotate limit.
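
For reference, that rotation is driven by a couple of settings; a sketch
with what I believe are the 2.2 defaults:

  # conf.d/10-mail.conf (sketch; values are the documented defaults)
  mail_location = mdbox:~/mdbox
  mdbox_rotate_size = 2M       # append to the current m.* file until it passes this
  mdbox_rotate_interval = 0    # 0 = no time-based rotation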

I would like to understand better how the mdbox format impacts IO
performance.
I think it's generally expected that fewer, larger files translate to less
IO and more throughput compared to many small files, but how does dovecot
handle that with mdbox?
If dovecot flushes data to storage every time a new email arrives and is
appended to the corresponding file, would that mean it generates the same
amount of IO as it would with one file per message?
Also, when using mdbox, many messages will be appended to a given file
before a new file is created. That should mean a file descriptor is kept
open for some time by the dovecot process.
Using cephfs as backend, how would this impact cluster performance
regarding MDS caps and cached inodes when files from thousands of users are
opened and appended all over?

I would like to understand this better.

Why?
We are a small Business Email Hosting provider with bare-metal, self-hosted
systems, using dovecot for servicing mailboxes and cephfs for email storage.

We are currently working on dovecot and storage redesign to be in
production ASAP. The main objective is to serve more users with better
performance, high availability and scalability.
* high availability and load balancing are extremely important to us *

In our current model, we're using the mdbox format with dovecot, having
dovecot's INDEXes stored in a replicated pool of SSDs, and messages stored
in a replicated pool of HDDs (under a Cache Tier with a pool of SSDs).
All using cephfs / filestore backend.
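
In dovecot terms, that split looks roughly like this (paths are
illustrative; the INDEX path lives on the SSD-backed cephfs tree, the mail
path on the HDD-backed one):

  # conf.d/10-mail.conf (sketch; paths are examples)
  mail_location = mdbox:/ceph/mail/%d/%n:INDEX=/ceph/index/%d/%n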

Currently there are 3 clusters running dovecot 2.2.34 and ceph Jewel
(10.2.9-4).
 - ~25K users from a few thousand domains per cluster
 - ~25TB of email data per cluster
 - ~70GB of dovecot INDEX [meta]data per cluster
 - ~100MB of cephfs metadata per cluster

Our goal is to build a single ceph cluster for storage that could expand in
capacity, be highly available and perform well enough. I know, that's what
everyone wants.

Cephfs is an important choice because:
 - there can be multiple mountpoints, thus multiple dovecot instances on
different hosts
 - the same storage backend is used for all dovecot instances
 - no need for domain sharding
 - dovecot is easily load balanced (with director sticking users to the
same dovecot backend)
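
For illustration, a stripped-down sketch of that director setup, close to
the stock 2.2 example config (IPs are placeholders):

  # conf.d/10-director.conf (sketch)
  director_servers = 192.0.2.1 192.0.2.2         # the director ring
  director_mail_servers = 192.0.2.11 192.0.2.12  # the dovecot backends
  service director {
    unix_listener login/director {
      mode = 0666
    }
    fifo_listener login/proxy-notify {
      mode = 0666
    }
    inet_listener {
      port = 9090
    }
  }
  service imap-login {
    executable = imap-login director
  }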

In the upcoming upgrade we intend to:
 - upgrade ceph to 12.X (Luminous)
 - drop the SSD Cache Tier (because it's deprecated)
 - use bluestore engine

I was told on freenode/#dovecot that there are many cases where SDBOX would
perform better with NFS sharing.
In the case of cephfs, at first I wouldn't think that would be true, because
more files == more generated IO; but considering what I said at the
beginning regarding sdbox vs mdbox, that could be wrong.

Any thoughts will be highly appreciated.

Regards,

Webert Lima
DevOps Engineer at MAV Tecnologia
*Belo Horizonte - Brasil*
*IRC NICK - WebertRLZ*