Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-16 Thread Jan Ingvoldstad via Exim-users
On Thu, Aug 15, 2019 at 6:34 PM Phil Pennock via Exim-users <
exim-users@exim.org> wrote:

> On 2019-08-14 at 12:24 -0400, Phil Pennock via Exim-users wrote:
> > On 2019-08-14 at 12:54 +0100, Jeremy Harris via Exim-users wrote:
> > > Do we need a fast/poor quota method for cases where the size-file
> > > cannot be used?
> >
> > Just to raise the possibility to see if others can spot approaches which
> > make this feasible rather than a giant can of worms: direct support for
> > filesystem quotas in Exim, using soft limits set to match Dovecot's.
>
> This was a stupid suggestion on my part.
>
> In any sort of spool delivery system like this, one OS runtime user is
> used and owns all the files.  OS quotas won't help at all.  This is
> _why_ Exim supports other quota systems.
>
>
Yes, but OS quotas _do_ help in the case where mail is owned by a specific
Unix user ID. Additionally, there may be group quotas.

-- 
Jan
-- 
## List details at https://lists.exim.org/mailman/listinfo/exim-users
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/


Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-15 Thread Phil Pennock via Exim-users
On 2019-08-14 at 12:24 -0400, Phil Pennock via Exim-users wrote:
> On 2019-08-14 at 12:54 +0100, Jeremy Harris via Exim-users wrote:
> > Do we need a fast/poor quota method for cases where the size-file
> > cannot be used?
> 
> Just to raise the possibility to see if others can spot approaches which
> make this feasible rather than a giant can of worms: direct support for
> filesystem quotas in Exim, using soft limits set to match Dovecot's.

This was a stupid suggestion on my part.

In any sort of spool delivery system like this, one OS runtime user is
used and owns all the files.  OS quotas won't help at all.  This is
_why_ Exim supports other quota systems.

Sorry,
-Phil



Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-15 Thread Jasen Betts via Exim-users
On 2019-08-15, Jeremy Harris via Exim-users wrote:
> On 14/08/2019 20:09, Cyborg via Exim-users wrote:
>> I really believe, an option to match quota/du behaviour and use stat()
>> on each file to check
>> the inode, is fine. A) it's relatively simple to do, B) does not break
>> existing installations and C) works on NFS as well, i think.
>
> We already use stat on each file, unless quota_size_regex is set.
>
> If we take Nigel's excellent point that dividing the filesize by the
> link count, both from a stat call, does the job - plus document the
> point that using quota_size_regex gets quotas wrong in environments
> using linking - this would seem to be the minimum possible change.
>
> Does it satisfy enough cases?

I think that if OS filesystem quotas are in use (rather than software
quotas) exim will be getting errors in response to open(2) write(2)
or close(2) calls when it attempts to exceed them, and it probably
already handles that scenario fairly well.
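
A minimal sketch of the scenario described above, in Python rather than
Exim's C, and not Exim's actual code: with OS filesystem quotas the kernel
refuses the write, and the delivering process only has to recognise the
error. The names is_quota_error and deliver are hypothetical, and EDQUOT is
assumed to be the errno the platform reports for an exceeded quota.

```python
import errno


def is_quota_error(exc: OSError) -> bool:
    """True if the OS refused the operation because a disk quota was hit."""
    return exc.errno == errno.EDQUOT


def deliver(path: str, data: bytes) -> str:
    """Append data to a mailbox file and report the outcome.

    Leaving the with-block flushes and closes the file, so a quota error
    raised by the write or by the close lands in the same handler.
    """
    try:
        with open(path, "ab") as mailbox:
            mailbox.write(data)
    except OSError as exc:
        if is_quota_error(exc):
            return "over quota"
        raise  # anything else is not a quota problem
    return "delivered"
```

The point of the sketch is that no size accounting happens in userland at
all in this mode; the filesystem's own view of hard links is the only one
that matters.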

-- 
  When I tried casting out nines I made a hash of it.



Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-15 Thread Jeremy Harris via Exim-users
On 14/08/2019 20:09, Cyborg via Exim-users wrote:
> I really believe, an option to match quota/du behaviour and use stat()
> on each file to check
> the inode, is fine. A) it's relatively simple to do, B) does not break
> existing installations and C) works on NFS as well, i think.

We already use stat on each file, unless quota_size_regex is set.

If we take Nigel's excellent point that dividing the filesize by the
link count, both from a stat call, does the job - plus document the
point that using quota_size_regex gets quotas wrong in environments
using linking - this would seem to be the minimum possible change.

Does it satisfy enough cases?
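
For illustration, a minimal Python sketch of the size / link count idea
(Exim itself is written in C, and this is not its code). The function names
are hypothetical. Integer division is an assumption made here; it can lose
a few bytes per file to rounding.

```python
import os


def apportioned_size(st: os.stat_result) -> int:
    """Charge each directory entry its share of the inode's size."""
    return st.st_size // max(st.st_nlink, 1)


def apportioned_usage(root: str) -> int:
    """Sum size / link count over every file below root (one lstat each)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += apportioned_size(os.lstat(os.path.join(dirpath, name)))
    return total
```

No per-inode bookkeeping is needed, which is what makes this the smallest
change. The cost is the one Nigel names: when some of the links live
outside the scanned tree, the tree is undercounted.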

-- 
Cheers,
  Jeremy



Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-14 Thread Cyborg via Exim-users
On 14.08.19 at 18:24, Phil Pennock via Exim-users wrote:
> On 2019-08-14 at 12:54 +0100, Jeremy Harris via Exim-users wrote:
>> Do we need a fast/poor quota method for cases where the size-file
>> cannot be used?
> Just to raise the possibility to see if others can spot approaches which
> make this feasible rather than a giant can of worms: direct support for
> filesystem quotas in Exim, using soft limits set to match Dovecot's.
>
> Caveat in all of this: I've never played with quotas from the C APIs, so
> don't know enough to speak authoritatively here; this is based on some
> quick man-page and source checks, to write this email.
>
> That should be a "proper" solution, which is lightweight in use, if
> folks already have the FS quotas turned on.  Which my uninformed guess
> is "yes, because that's the sanest way for Dovecot to implement quotas
> with hard link support".
>

I really believe that an option to match quota/du behaviour, using stat()
on each file to check the inode, is fine: A) it's relatively simple to do,
B) it does not break existing installations, and C) it works on NFS as
well, I think.

From a mail hoster's real-life experience, I can tell you that we had to
switch a customer from local disk storage to NFS storage overnight,
because he ran out of disk space and needed far more than we had available
on the local storage. So a solution that works in both situations
is handy. Both the local storage and the NFS share used SSDs, so the
stat() call would be no big deal, even on very big mailboxes, IMHO.
And since NFS means a network hop, you get latency in the process anyway,
and everyone is happy with it, because available space is the main issue
for mailboxes these days. No one cares if it takes a few ms longer to
access it. They only ask how many GB they can get per Euro.

best regards,
Marius



Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-14 Thread Merlin Hartley via Exim-users
More generally, isn't it easier to let the IMAP system manage the quotas and
have the MTA just query that? The IMAP server would need to enforce them
anyway and allow the MUA to query them...


--
Merlin Hartley


> On 14 Aug 2019, at 17:20, Nigel Metheringham via Exim-users wrote:
> 
> Would not taking the quota size of a message as being the file size / link 
> count be good enough?
> 
> If there are links between users then this will underestimate the size of a 
> maildir structure, but in general it would be right.
> 
> But it does force a stat() call which the put everything in the name as tags 
> was designed to avoid - we used to really want that way back in the days of 
> huge numbers of users with NFS mounted mailboxes in late 90's ISPs.
> 
> Nigel.
> 
> Andrew C Aitchison via Exim-users wrote on 14/08/2019 16:48:
>>> On Wed, 14 Aug 2019, Jeremy Harris via Exim-users wrote:
>>> 
>>>> On 14/08/2019 12:37, Andrew C Aitchison via Exim-users wrote:
>>>> I suspect an option to have a fast but inaccurate quota would be useful
>>>> in some circumstances.
>>> 
>>> We already have maildir_use_size_file; rebuilding isn't needed often.
>>> 
>>> Do we need a fast/poor quota method for cases where the size-file
>>> cannot be used?
>> 
>> Ah. I had missed that Cyborg is using maildir format (I'm used to mbox).
>> 
>>> Other possible ways of balancing: we currently glance at the filename,
>>> trying to pull a size encoded in it.  That saves an additional per-file
>>> stat call to get the size.  But without the stat we don't have an
>>> inode number... we can hash the filename, but that only works if
>>> a hardlink is to a different dir but with the same name.  We could
>>> glance at the number of links, and only bother remembering >1 link
>>> nodes - but, again, we then need to do the stat call.
>> 
>> From the introduction to "Chapter 26 - The appendfile transport":
>>   Exim recognizes system quota errors, and generates an appropriate
>>   message. Exim also supports its own quota control within the transport,
>>   for use when the system facility is unavailable or cannot be used for
>>   some reason.
>> 
>> Now I think about it I've not (knowingly) used both on the same filesystem.
>> I also now realize that the INBOX and the files that a .forward file
>> redirects mail to may be on different disks and have different quotas ...
>> 
> 
> -- 
> 
> [ Nigel Metheringham --- nigel@dotdot.cloud ]
> [  Ellipsis Intangible Cloudy Technologies  ]
> 
> 
> 




Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-14 Thread Nigel Metheringham via Exim-users
Would not taking the quota size of a message as being the file size /
link count be good enough?

If there are links between users then this will underestimate the size
of a maildir structure, but in general it would be right.

But it does force a stat() call, which the "put everything in the name as
tags" approach was designed to avoid - we used to really want that way
back in the days of huge numbers of users with NFS-mounted mailboxes in
late-90s ISPs.


    Nigel.

Andrew C Aitchison via Exim-users wrote on 14/08/2019 16:48:
>> On Wed, 14 Aug 2019, Jeremy Harris via Exim-users wrote:
>>
>>> On 14/08/2019 12:37, Andrew C Aitchison via Exim-users wrote:
>>> I suspect an option to have a fast but inaccurate quota would be useful
>>> in some circumstances.
>>
>> We already have maildir_use_size_file; rebuilding isn't needed often.
>>
>> Do we need a fast/poor quota method for cases where the size-file
>> cannot be used?
>
> Ah. I had missed that Cyborg is using maildir format (I'm used to mbox).
>
>> Other possible ways of balancing: we currently glance at the filename,
>> trying to pull a size encoded in it.  That saves an additional per-file
>> stat call to get the size.  But without the stat we don't have an
>> inode number... we can hash the filename, but that only works if
>> a hardlink is to a different dir but with the same name.  We could
>> glance at the number of links, and only bother remembering >1 link
>> nodes - but, again, we then need to do the stat call.
>
> From the introduction to "Chapter 26 - The appendfile transport":
>   Exim recognizes system quota errors, and generates an appropriate
>   message. Exim also supports its own quota control within the transport,
>   for use when the system facility is unavailable or cannot be used for
>   some reason.
>
> Now I think about it I've not (knowingly) used both on the same filesystem.
> I also now realize that the INBOX and the files that a .forward file
> redirects mail to may be on different disks and have different quotas ...



--

[ Nigel Metheringham --- nigel@dotdot.cloud ]
[  Ellipsis Intangible Cloudy Technologies  ]
 






Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-14 Thread Phil Pennock via Exim-users
On 2019-08-14 at 12:54 +0100, Jeremy Harris via Exim-users wrote:
> Do we need a fast/poor quota method for cases where the size-file
> cannot be used?

Just to raise the possibility to see if others can spot approaches which
make this feasible rather than a giant can of worms: direct support for
filesystem quotas in Exim, using soft limits set to match Dovecot's.

Caveat in all of this: I've never played with quotas from the C APIs, so
don't know enough to speak authoritatively here; this is based on some
quick man-page and source checks, to write this email.

The portability and maintenance is going to be hell, unless I've missed
some standardization here (very possible).  The only approach likely to
reliably expose usage/limits to userland is the NFS RPC server approach,
which could probably be run even without NFS.  Otherwise, the issue is
that a lot of things assume that all enforcement is in the kernel and
userland only needs access to manipulate limits, not to query how close
a user is.

On FreeBSD there's quota_open/quota_read/quota_close, which are used by
repquota(8), but that's UFS only.  On FreeBSD, the cases to consider
would be NFS, ZFS, UFS.  Each different, AFAIK, with no standardized
access across them all.  However, the oldest API is quotactl(), which
was in 4.3BSD-Reno, so is widespread.  The manpage documents UFS-only.

Underneath, the UFS case uses quotactl(), which is also present on Linux
and appears to be used for other filesystem types on Linux too, albeit
perhaps requiring extra C header imports to work with those.

  rc = quotactl(path, Q_GETQUOTA, uid, );

Since quotactl() is on, at least, Linux and *BSD, and supports at least
some FS types, it might be worth seeing what sorts of OSes/FSes people
use for delivered email with Exim and design around using this general
API, initially supporting just quotactl() unless/until volunteers
contribute code for other APIs.

That should be a "proper" solution, which is lightweight in use, if
folks already have the FS quotas turned on.  Which my uninformed guess
is "yes, because that's the sanest way for Dovecot to implement quotas
with hard link support".

Exim's current approach has the benefit of working everywhere, on all
OSes which Exim supports.

I think that quotactl(), ZFS and NFS are _probably_ the only scenarios
we'd likely end up having code for.

People who actually work with quotas, please shoot holes in everything
I've just said.  :)

-Phil



Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-14 Thread Andrew C Aitchison via Exim-users

On Wed, 14 Aug 2019, Jeremy Harris via Exim-users wrote:
> On 14/08/2019 12:37, Andrew C Aitchison via Exim-users wrote:
>> I suspect an option to have a fast but inaccurate quota would be useful
>> in some circumstances.
>
> We already have maildir_use_size_file; rebuilding isn't needed often.
>
> Do we need a fast/poor quota method for cases where the size-file
> cannot be used?

Ah. I had missed that Cyborg is using maildir format (I'm used to mbox).

> Other possible ways of balancing: we currently glance at the filename,
> trying to pull a size encoded in it.  That saves an additional per-file
> stat call to get the size.  But without the stat we don't have an
> inode number... we can hash the filename, but that only works if
> a hardlink is to a different dir but with the same name.  We could
> glance at the number of links, and only bother remembering >1 link
> nodes - but, again, we then need to do the stat call.

From the introduction to "Chapter 26 - The appendfile transport":

  Exim recognizes system quota errors, and generates an appropriate
  message. Exim also supports its own quota control within the transport,
  for use when the system facility is unavailable or cannot be used for
  some reason.

Now I think about it I've not (knowingly) used both on the same filesystem.
I also now realize that the INBOX and the files that a .forward file
redirects mail to may be on different disks and have different quotas ...

--
Andrew C. Aitchison Kendal, UK
and...@aitchison.me.uk



Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-14 Thread Jeremy Harris via Exim-users
On 14/08/2019 12:37, Andrew C Aitchison via Exim-users wrote:
> I suspect an option to have a fast but inaccurate quota would be useful
> in some circumstances.

We already have maildir_use_size_file; rebuilding isn't needed often.

Do we need a fast/poor quota method for cases where the size-file
cannot be used?

Other possible ways of balancing: we currently glance at the filename,
trying to pull a size encoded in it.  That saves an additional per-file
stat call to get the size.  But without the stat we don't have an
inode number... we can hash the filename, but that only works if
a hardlink is to a different dir but with the same name.  We could
glance at the number of links, and only bother remembering >1 link
nodes - but, again, we then need to do the stat call.
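
To make the trade-off concrete, here is a hedged Python sketch (not Exim's
C code) of both halves. The ",S=<size>" filename tag and the names SIZE_TAG,
size_from_name and scan_remembering_multilinks are assumptions for
illustration; in Exim the pattern is whatever quota_size_regex is set to.

```python
import os
import re

# Assumed tag format; a real setup uses its own quota_size_regex.
SIZE_TAG = re.compile(r",S=(\d+)")


def size_from_name(name: str):
    """Size encoded in the filename, or None. Needs no stat, gives no inode."""
    match = SIZE_TAG.search(name)
    return int(match.group(1)) if match else None


def scan_remembering_multilinks(root: str):
    """Count each inode once, remembering only inodes with more than 1 link.

    Returns (total_bytes, number_of_remembered_inodes). Every file still
    costs one lstat, which is the price the message above points out.
    """
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue  # another name for an inode already counted
                seen.add(key)
            total += st.st_size
    return total, len(seen)
```

Files with a single link never enter the set, so memory use grows only with
the number of hard-linked messages, not with the size of the mailbox.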

-- 
Cheers,
  Jeremy



Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-14 Thread Andrew C Aitchison via Exim-users

On Wed, 14 Aug 2019, Jeremy Harris via Exim-users wrote:
> On 14/08/2019 10:48, Cyborg via Exim-users wrote:
>> Shall Exim be changed to reflect the quota-du pov  ( i.e. with an option
>> how to handle hardlinks (default=oldstyle) ) or not?
>
> I'd say it's an outright deficiency in Exim, it needs changing
> and there need be no option.

When there are many files in the mailbox this calculation is likely to be
slow (order n squared, at least when implemented without a good hash
algorithm for 64bit inode values).

I suspect an option to have a fast but inaccurate quota would be useful
in some circumstances.

--
Andrew C. Aitchison Kendal, UK
and...@aitchison.me.uk


Re: [exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-14 Thread Jeremy Harris via Exim-users
On 14/08/2019 10:48, Cyborg via Exim-users wrote:
> Shall Exim be changed to reflect the quota-du pov  ( i.e. with an option
> how to handle hardlinks (default=oldstyle) ) or not?

I'd say it's an outright deficiency in Exim, it needs changing
and there need be no option.

-- 
Cheers,
  Jeremy



[exim] Exim, Dovecot, mdir and hardlinks - a true story

2019-08-14 Thread Cyborg via Exim-users
Hi all,

To make you aware of an Exim/Dovecot quota problem you may run
into, we need:

https://wiki.dovecot.org/MailLocation/Maildir

which states:


Optimizations

  * maildir_copy_with_hardlinks=yes (default): When copying a message,
    do it with hard links whenever possible. This makes the performance
    much better, and it's unlikely to have any side effects. Only reason
    to disable this is if you're using a filesystem where hard links are
    slow (e.g. HFS+).


This means that if an IMAP client copies mail from one part of the folder
tree to another, Dovecot uses hard links instead of a real filesystem copy.

If you run Exim with a quota for IMAP mailboxes, you can now get into
trouble, because the filesystem tools ("repquota" and "du") compute a
different size for that mailbox than Exim does. The filesystem reports a
lower total size for the directory, because it counts a hard-linked file
only once, *but Exim counts every link*.

Example:

[root@sxxx Maildir]# repquota -a | grep  50544
#50544-- 1341720   0   0   1341 0 0 

and Exim counted every link as if it were a separate file:

check_dir_size: dir=/mailacct/{usernameinquestion}/Maildir/ sum=1573152669 count=1420

In this case that pushed the mailbox over its quota, and emails got
rejected with "mailbox full".

The issue was caused by a client copying a folder into the trash without
then deleting the original folder. Put more simply: it copied where it
should have used "move".

If you, as the admin, now check the quota or disk usage for that
specific mailbox, you will see a (usually much) lower du value than the
quota for the given mailbox suggests.

You can detect it with ls:

-rw---  2 51703 exim 4688 28. Jul 14:27 1406550448.H508202P31298.testmailsystem.de:2,STac
-rw---  1 51703 exim 6228 28. Jul 14:29 1406550593.H760899P1075.testmailsystem.de:2,STac

Notice the "2" in the second column of the first line? That is the link
count, and a value above 1 indicates that hard links are in use.


The problem is that you have no idea where the other link is located. You
have to use "ls -lia" to get the inode number of that file:

5913370 -rw--- 1 51703 exim 5174 10. Sep 16:00 1410357617.H134775P24247.testmailsystem.de:2,STac
5904472 -rw--- 2 51703 exim 13042 11. Sep 13:11 1410433911.H351692P31331.testmailsystem.de:2,STac

and then search for that inode number with find:

find /mailacct/mailbenutzer/ -inum 5904472 -print
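
Running find once per inode gets tedious on a large mailbox. As a rough
sketch, assuming Python is available on the mail host, the following lists
every group of hard-linked files in one pass. The function name
hardlink_groups is hypothetical, and unlike "find -inum" it looks at files
only, not directories.

```python
import os
from collections import defaultdict


def hardlink_groups(root: str) -> dict:
    """Map (device, inode) to the sorted paths below root that share it.

    Only files whose link count is above 1 are reported.
    """
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in groups.items()}
```

A group containing a single path means the other link lives outside the
directory that was scanned, for example in another user's mailbox.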

Now you need to decide which one of those files gets deleted. If you
have an entire folder full of hard-linked files, it's easy, as you can
just delete the folder.

The second method to detect it is "du -s":

"du -sh .[A-Z]* * "

You get a list of the mailbox's folders with a size for each. Now
run "du -s" on each folder separately and compare it to the first output.

You will notice that in the first du output, which covers all folders in
one run, the size of one (or more) folders is just a few kB, but when you
"du -s" them one by one, you get a much bigger size for that folder. That
happens because du remembers the inodes of the files it processes, and if
it hits a file with the same inode again, which is a hard link, it skips
it. It cannot skip anything if it only ever sees one of the hard-linked
files :) As a result, the du -s of the single folder reveals the true
"Exim counted" size of the folder.
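
The two ways of counting can be put side by side in a short Python sketch.
This is an illustration under assumptions, not the code of du or Exim: the
function names are made up, and st_size (apparent size) is used where du
would count allocated blocks.

```python
import os


def _stat_files(root: str):
    """Yield an lstat result for every file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.lstat(os.path.join(dirpath, name))


def link_counted_size(root: str) -> int:
    """Every directory entry counts in full, hard link or not."""
    return sum(st.st_size for st in _stat_files(root))


def inode_counted_size(root: str) -> int:
    """du-like: an inode seen earlier in the same run is skipped."""
    seen = set()
    total = 0
    for st in _stat_files(root):
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            total += st.st_size
    return total
```

Calling inode_counted_size on the whole Maildir counts a linked message
once, while calling it on a single folder counts that folder's links in
full, because the run never meets the other name. That is the same effect
as the two du invocations above.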

The problem resides in the nature of hardlinks and how different tools
react to it.

The question is:

Shall Exim be changed to reflect the quota/du point of view (i.e. with an
option for how to handle hard links, default=oldstyle) or not?

Before you say "Dovecot caused it, just change the default option
there", remember: the problem has already manifested itself on your maildir
storage, and changing Dovecot's config now does not undo that. Sooner or
later you may run into this again.

I vote for a quota option in the appendfile transport to skip hard links. A
simple (inode, boolean) hash check would do while scanning the maildir
content.

What do you think?

best regards,
Marius


For German readers, there is an example case here:

https://marius.bloggt-in-braunschweig.de/2014/11/30/exim-hoehere-quota-durch-hardlinks/
