Re: postfix benchmark performance

2009-02-12 Thread lst_hoe02

Quoting Silas Boyd-Wickizer:


> > Why do you believe that this should use 100% of ALL Cpus?
> >
> > If you look at your synthetic test then you will likely find that
> > there are at any point in time only a few mail receiving processes
> > and mail delivering processes, and that these processes will all
> > be waiting for kernel system calls to complete.
> >
> > With this synthetic test you really have only a low-concurrency load.
>
> Yes, there are only a few mail delivering processes (virtual).
> Why is this a function of my load?  There are many messages
> waiting for delivery, so why doesn't postfix run more virtuals
> to increase concurrency?
>
> I'm not sure what you mean by "waiting for kernel system calls to
> complete".  Do you mean "executing kernel system calls" (reading
> from a pipe), or "blocked on kernel system calls" (i.e. waiting
> on a pipe)?


As far as I understand, all mail must pass through the qmgr, which
decides what to do with each message. The qmgr is a single process and
is therefore limited to one CPU. As you have shown, it can manage around
3000 mails/sec (roughly 10 million an hour, by the way) on a low-cost
CPU core.
In practice you will never be able to push mail that fast to any
permanent storage available today.
If you are able to do so in the far future, a single CPU core will be
even faster by then, so qmgr will still not be the bottleneck in any
real mail system.
This is why your "benchmark" is only useful for watching qmgr work hard:
in any real-world scenario it is nearly idle, waiting for disk I/O.
Be aware that this is a "naive" explanation and the internal details
are more complex than this.


Regards

Andreas
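
The hourly figure follows directly from the per-second rate reported in
this thread. A minimal sketch of the conversion; the function name is
illustrative and not part of Postfix:

```python
def per_hour(msgs_per_sec: float) -> float:
    """Convert a per-second message rate to a per-hour rate."""
    return msgs_per_sec * 3600


# ~3000 msg/sec is the rate reported for the synthetic benchmark.
print(f"{per_hour(3000):,.0f} messages/hour")  # 10,800,000 messages/hour
```

So "around 10 million an hour" is a slight underestimate of what the
benchmark rate implies.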




Re: postfix benchmark performance

2009-02-11 Thread Wietse Venema
Silas Boyd-Wickizer:
> > Why do you believe that this should use 100% of ALL Cpus?
> > 
> > If you look at your synthetic test then you will likely find that
> > there are at any point in time only a few mail receiving processes
> > and mail delivering processes, and that these processes will all
> > be waiting for kernel system calls to complete.
> > 
> > With this synthetic test you really have only a low-concurrency load.
> 
> Yes, there are only a few mail delivering processes (virtual).  
> Why is this a function of my load?  There are many messages 
> waiting for delivery, so why doesn't postfix run more virtuals 
> to increase concurrency?

One Postfix process uses one CPU at any point in time. The Postfix
scheduler is one such process. You have clocked this process at
300 microseconds per message. Congratulations. You will never have
a real network or real file system that can sustain this. So now
you can focus on real problems instead.

> I'm not sure what you mean by "waiting for kernel system calls to 
> complete".  Do you mean "executing kernel system calls" (reading 
> from a pipe), or "blocked on kernel system calls" (i.e. waiting 
> on a pipe)?

Kernels execute system calls. Processes can only ask and wait
while the kernel is doing kernel thingies.

Wietse
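
The 300 microsecond figure and the ~3000 msg/sec rate are two views of
the same number: a single process that spends a fixed amount of time per
message cannot exceed the reciprocal of that time, however many other
cores sit idle. A rough sketch, assuming the per-message cost is constant
(the helper names are made up for illustration):

```python
def max_rate(seconds_per_msg: float) -> float:
    """Upper bound on messages/sec for one process doing serial work."""
    return 1.0 / seconds_per_msg


def cost_per_msg_us(msgs_per_sec: float) -> float:
    """Average time per message, in microseconds, at a given rate."""
    return 1e6 / msgs_per_sec


print(round(max_rate(300e-6)))        # ceiling at 300 us per message
print(round(cost_per_msg_us(3000)))   # time per message at 3000 msg/sec
```

The first value comes out a little above 3000 msg/sec and the second a
little above 300 microseconds, which is consistent with the rounded
numbers quoted in the thread.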


Re: postfix benchmark performance

2009-02-11 Thread Victor Duchovni
On Wed, Feb 11, 2009 at 03:57:45PM -0600, Noel Jones wrote:

> Silas Boyd-Wickizer wrote:
>> Yes, there are only a few mail delivering processes (virtual).  Why is 
>> this a function of my load?  There are many messages waiting for delivery, 
>> so why doesn't postfix run more virtuals to increase concurrency?
>
> This might have something to do with concurrency...
>> postconf -n
>> default_destination_concurrency_limit = 1

For maildirs this is not necessary. A recipient limit of 1 is more
reasonable.

> But really, this whole exercise seems fairly meaningless.

Indeed, benchmarks of peak queue manager throughput are not that useful.

-- 
Viktor.

Disclaimer: off-list followups get on-list replies or get ignored.
Please do not ignore the "Reply-To" header.

To unsubscribe from the postfix-users list, visit
http://www.postfix.org/lists.html or click the link below:


If my response solves your problem, the best way to thank me is to not
send an "it worked, thanks" follow-up. If you must respond, please put
"It worked, thanks" in the "Subject" so I can delete these quickly.


Re: postfix benchmark performance

2009-02-11 Thread Victor Duchovni
On Wed, Feb 11, 2009 at 04:47:40PM -0500, Silas Boyd-Wickizer wrote:

> There are many messages 
> waiting for delivery, so why doesn't postfix run more virtuals 
> to increase concurrency?

Because it can't decide where to send the mail any faster. This thread
is not very productive: the benchmark is measuring a part of the system
that is never the bottleneck in real configurations.

If you test a real configuration and you don't over-saturate the input
rate, you'll find that the incoming queue stays small, and throughput
is disk I/O limited. If you then push harder (more input concurrency),
throughput will drop off slowly as input I/O starves output I/O and
the queue manager.

-- 
Viktor.
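
The first half of that description (the incoming queue stays small while
input stays below what the system can drain) can be illustrated with a
toy fluid model. This is only a sketch under strong assumptions: a fixed
drain rate, a fixed arrival rate, one-second steps, and made-up arrival
figures. It does not model the slow throughput drop-off caused by input
I/O starving output I/O.

```python
def simulate_backlog(arrival_rate, service_rate, seconds):
    """Return the queue backlog after each one-second step."""
    backlog = 0.0
    history = []
    for _ in range(seconds):
        # The queue grows by arrivals, shrinks by service, never below zero.
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history


# 3000 msg/sec is the drain rate from the benchmark; arrivals are hypothetical.
print(simulate_backlog(2500, 3000, 5)[-1])  # 0.0: queue stays empty
print(simulate_backlog(4000, 3000, 5)[-1])  # 5000.0: queue grows steadily
```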



Re: postfix benchmark performance

2009-02-11 Thread Noel Jones

Silas Boyd-Wickizer wrote:
> Yes, there are only a few mail delivering processes (virtual).
> Why is this a function of my load?  There are many messages
> waiting for delivery, so why doesn't postfix run more virtuals
> to increase concurrency?

This might have something to do with concurrency...

> postconf -n
> default_destination_concurrency_limit = 1


But really, this whole exercise seems fairly meaningless.

  -- Noel Jones
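
Spotting a setting like this in a long `postconf -n` listing is easier
with a small filter. The sketch below is illustrative only: it parses
text pasted from `postconf -n` output (here, four lines taken from the
configuration posted in this thread) and prints the parameters whose
names mention concurrency. The function name and the filtering rule are
assumptions, not a Postfix tool.

```python
def parse_postconf(text: str) -> dict:
    """Parse 'name = value' lines as printed by `postconf -n`."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines without an '=' sign
            settings[name.strip()] = value.strip()
    return settings


SAMPLE = """\
default_destination_concurrency_limit = 1
default_destination_recipient_limit = 1000
default_process_limit = 200
initial_destination_concurrency = 500
"""

settings = parse_postconf(SAMPLE)
for name, value in settings.items():
    if "concurrency" in name:
        print(f"{name} = {value}")
```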


Re: postfix benchmark performance

2009-02-11 Thread Silas Boyd-Wickizer
> Why do you believe that this should use 100% of ALL Cpus?
> 
> If you look at your synthetic test then you will likely find that
> there are at any point in time only a few mail receiving processes
> and mail delivering processes, and that these processes will all
> be waiting for kernel system calls to complete.
> 
> With this synthetic test you really have only a low-concurrency load.

Yes, there are only a few mail delivering processes (virtual).  
Why is this a function of my load?  There are many messages 
waiting for delivery, so why doesn't postfix run more virtuals 
to increase concurrency?

I'm not sure what you mean by "waiting for kernel system calls to 
complete".  Do you mean "executing kernel system calls" (reading 
from a pipe), or "blocked on kernel system calls" (i.e. waiting 
on a pipe)?

Thanks.

Silas


Re: postfix benchmark performance

2009-02-11 Thread Victor Duchovni
On Wed, Feb 11, 2009 at 02:28:40PM -0500, Silas Boyd-Wickizer wrote:

> > With 16 logical CPUs, in this configuration you'll find your CPU load
> > to be 1/16th of the theoretical maximum + overhead. Your report of 10%
> > is about right.
> 
> The system has 16 physical execution units: four quad core AMD 
> Opterons.  In the configuration I described, 90% of total cycles 
> are unused.

Yes, but in this configuration one CPU is pegged and the others are
mostly idle. Actually, the others combined are working about as hard as
the pegged one, so that's where you get the ~10%.

> > What exactly are you trying to measure with this "benchmark"?
> 
> I'm measuring how many emails Postfix can deliver per-sec to some 
> number of virtual aliases.  I'm not interested so much in the 
> absolute throughput performance, but in the reasons for the 
> performance.

Why is this an interesting measurement? In practice, your performance will
be at least a factor of 10 (more likely 30-100) lower once you add
real disk latency and other real loads.

> > No realistic configuration has the same critical resource, and you'll
> > run out of disk I/O throughput or CPU first depending on how CPU hungry
> > your content-filters are.
> 
> I understand this.
> 
> > If you really are planning to host all spools in RAM disk, and need more
> > than 3000 msgs/sec, I am most curious what use-case motivates this design
> > and performance requirement.
> 
> I don't have a real use-case in mind.

This benchmark is essentially meaningless. It proves that Postfix
switching won't be a problem until you reach 3000 msgs/sec. Since
your real loads will be much lower, you don't have to worry about it.

-- 
Viktor.
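
The utilization and slowdown figures in this message are simple ratios.
A minimal sketch of the arithmetic, assuming 16 cores and treating "the
others working about as hard combined" as one additional core's worth of
work (that reading, and the function name, are assumptions):

```python
def utilization(busy_cores: float, total_cores: int) -> float:
    """Fraction of total CPU capacity in use."""
    return busy_cores / total_cores


TOTAL_CORES = 16
print(f"{utilization(1, TOTAL_CORES):.2%}")  # one pegged core: 6.25%
print(f"{utilization(2, TOTAL_CORES):.2%}")  # plus one core's worth: 12.50%

# Expected real-world rates if throughput is 10x, 30x or 100x lower
# than the 3000 msg/sec measured on the RAM filesystem.
for factor in (10, 30, 100):
    print(f"{factor}x slower: {3000 / factor:.0f} msg/sec")
```

The reported ~10% load falls between the two utilization values, and the
slowdown factors correspond to 300, 100 and 30 msg/sec.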



Re: postfix benchmark performance

2009-02-11 Thread Wietse Venema
Silas Boyd-Wickizer:
> Hello, I'm doing some experiments with a synthetic benchmark and 
> postfix.  My current postfix configuration can deliver ~3000 
> msg/sec to 1000 virtual mailboxes; however, the system (16 
> core/4x4 AMD opteron) is ~90% idle.  All logs and queues reside 

Why do you believe that this should use 100% of ALL Cpus?

If you look at your synthetic test then you will likely find that
there are at any point in time only a few mail receiving processes
and mail delivering processes, and that these processes will all
be waiting for kernel system calls to complete.

With this synthetic test you really have only a low-concurrency load.
 
Wietse


Re: postfix benchmark performance

2009-02-11 Thread Silas Boyd-Wickizer
> With 16 logical CPUs, in this configuration you'll find your CPU load
> to be 1/16th of the theoretical maximum + overhead. Your report of 10%
> is about right.

The system has 16 physical execution units: four quad core AMD 
Opterons.  In the configuration I described, 90% of total cycles 
are unused.

> What exactly are you trying to measure with this "benchmark"?

I'm measuring how many emails Postfix can deliver per-sec to some 
number of virtual aliases.  I'm not interested so much in the 
absolute throughput performance, but in the reasons for the 
performance.

> No realistic configuration has the same critical resource, and you'll
> run out of disk I/O throughput or CPU first depending on how CPU hungry
> your content-filters are.

I understand this.

> If you really are planning to host all spools in RAM disk, and need more
> than 3000 msgs/sec, I am most curious what use-case motivates this design
> and performance requirement.

I don't have a real use-case in mind.  For curiosity's sake I
would like to know what the second-order bottlenecks are after
the disk and network.  I suspect that I misconfigured something
because postfix only utilizes 10% of available cycles.  I realize
this is a synthetic/contrived/silly "benchmark" and a little outside
the scope of what is normally discussed on this list, but I would
still like to know why postfix uses 10% of available cycles.

Silas


Re: postfix benchmark performance

2009-02-11 Thread Victor Duchovni
On Wed, Feb 11, 2009 at 01:41:19PM -0500, Silas Boyd-Wickizer wrote:

> Hello, I'm doing some experiments with a synthetic benchmark and 
> postfix.  My current postfix configuration can deliver ~3000 
> msg/sec to 1000 virtual mailboxes; however, the system (16 
> core/4x4 AMD opteron) is ~90% idle.  All logs and queues reside 
> in a RAM filesystem, so disk IO is not a bottleneck.  I am 
> generating the incoming load locally using (a slightly modified) 
> smtp-source, so the network is not a bottleneck.  smtp-source is 
> generating 10k emails and smtpd/cleanup can put the incoming 
> emails on the incoming queue much faster than the qmgr can pull 
> them off.  Besides the incoming and active queues, all queues are 
> empty during the benchmark.  Ideally I want the system to be 0% 
> idle.  Any suggestions on how to achieve this?

With 16 logical CPUs, in this configuration you'll find your CPU load
to be 1/16th of the theoretical maximum + overhead. Your report of 10%
is about right.

What exactly are you trying to measure with this "benchmark"?

No realistic configuration has the same critical resource, and you'll
run out of disk I/O throughput or CPU first depending on how CPU hungry
your content-filters are.

If you really are planning to host all spools in RAM disk, and need more
than 3000 msgs/sec, I am most curious what use-case motivates this design
and performance requirement.

-- 
Viktor.



postfix benchmark performance

2009-02-11 Thread Silas Boyd-Wickizer
Hello, I'm doing some experiments with a synthetic benchmark and 
postfix.  My current postfix configuration can deliver ~3000 
msg/sec to 1000 virtual mailboxes; however, the system (16 
core/4x4 AMD opteron) is ~90% idle.  All logs and queues reside 
in a RAM filesystem, so disk IO is not a bottleneck.  I am 
generating the incoming load locally using (a slightly modified) 
smtp-source, so the network is not a bottleneck.  smtp-source is 
generating 10k emails and smtpd/cleanup can put the incoming 
emails on the incoming queue much faster than the qmgr can pull 
them off.  Besides the incoming and active queues, all queues are 
empty during the benchmark.  Ideally I want the system to be 0% 
idle.  Any suggestions on how to achieve this?

postconf -n

alias_database = hash:/etc/aliases
alias_maps = hash:/etc/aliases
alternate_config_directories = /etc/postfix1, /etc/postfix2
append_dot_mydomain = no
biff = no
command_directory = /usr/sbin
config_directory = /etc/postfix
daemon_directory = /usr/libexec/postfix
data_directory = /tmp/mail/0/lib/postfix
default_destination_concurrency_limit = 1
default_destination_recipient_limit = 1000
default_process_limit = 200
default_recipient_refill_limit = 10
disable_dns_lookups = yes
html_directory = no
in_flow_delay = 0
inet_interfaces = all
initial_destination_concurrency = 500
mail_owner = postfix
mailbox_command = procmail -a "$EXTENSION"
mailbox_size_limit = 0
mailq_path = /usr/bin/mailq
manpage_directory = /usr/local/man
mydestination = localhost.csail.mit.edu, , localhost
myhostname = localhost.csail.mit.edu
mynetworks = 127.0.0.0/8
myorigin = /etc/mailname
newaliases_path = /usr/bin/newaliases
qmgr_message_active_limit = 8
qmgr_message_recipient_limit = 8
queue_directory = /tmp/mail/0/postfix
readme_directory = no
recipient_delimiter = +
relayhost = 
sample_directory = /etc/postfix
sendmail_path = /usr/sbin/sendmail
setgid_group = postdrop
smtpd_banner = $myhostname ESMTP $mail_name (Debian/GNU)
smtpd_client_connection_count_limit = 0
smtpd_peername_lookup = no
syslog_facility = local0
virtual_gid_maps = static:1000
virtual_mailbox_base = /tmp/mail/vhosts
virtual_mailbox_domains = goo.com
virtual_mailbox_maps = hash:/etc/postfix/vmailbox
virtual_minimum_uid = 100
virtual_uid_maps = static:1000

Here is a sequence from strace -p  -T -tt:

12:34:33.138590 lstat("incoming/2303823913A8", {st_mode=S_IFREG|0700, st_size=10797, ...}) = 0 <0.09>
12:34:33.138648 rename("incoming/2303823913A8", "active/2303823913A8") = 0 <0.12>
12:34:33.138697 open("active/2303823913A8", O_RDWR) = 10 <0.08>
12:34:33.138738 flock(10, LOCK_EX|LOCK_NB) = 0 <0.06>
12:34:33.138773 lseek(10, 0, SEEK_CUR)  = 0 <0.06>
12:34:33.138808 read(10, "CO  10291 50"..., 4096) = 4096 <0.09>
12:34:33.138863 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.07>
12:34:33.138925 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.07>
12:34:33.138981 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.07>
12:34:33.139044 sendto(7, "<134>Feb 11 12:34:33 postfix/qmg"..., 108, MSG_NOSIGNAL, NULL, 0) = 108 <0.13>
12:34:33.139114 lseek(10, 6697, SEEK_CUR) = 10793 <0.06>
12:34:33.139149 read(10, "X\0E\0", 4096) = 4 <0.06>
12:34:33.139187 lseek(10, 0, SEEK_END)  = 10797 <0.06>
12:34:33.139221 unlink("defer/2/2303823913A8") = -1 ENOENT (No such file or directory) <0.08>
12:34:33.139266 poll([{fd=11, events=POLLIN}], 1, 0) = 0 <0.06>
12:34:33.139305 poll([{fd=11, events=POLLOUT, revents=POLLOUT}], 1, 360) = 1 <0.06>
12:34:33.139345 write(11, "request\0resolve\0sender\0...@josmp"..., 57) = 57 <0.68>
12:34:33.139478 poll([{fd=11, events=POLLIN, revents=POLLIN}], 1, 360) = 1 <0.07>
12:34:33.139524 read(11, "flags\\0transport\0virtual\0nextho"..., 4096) = 79 <0.09>
12:34:33.139579 close(10)   = 0 <0.07>
12:34:33.139617 epoll_wait(8, {}, 100, 0) = 0 <0.06>
12:34:33.139651 alarm(333)  = 333 <0.06>
12:34:33.139699 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.08>
12:34:33.139763 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.08>
12:34:33.139824 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.07>
12:34:33.139888 sendto(7, "<134>Feb 11 12:34:33 postfix/qmg"..., 82, MSG_NOSIGNAL, NULL, 0) = 82 <0.11>
12:34:33.139947 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.07>
12:34:33.140012 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.08>
12:34:33.140070 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.08>
12:34:33.140134 sendto(7, "<134>Feb 11 12:34:33 postfix/qmg"..., 131, MSG_NOSIGNAL, NULL, 0) = 131 <0.11>
12:34:33.140190 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1267, ...}) = 0 <0.08>
12:34:33.
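
The wall-clock timestamps at the start of each trace line show how the
process's time is spread across calls. A minimal sketch, assuming the
`strace -tt` layout shown above (an `HH:MM:SS.microseconds` timestamp
followed by the call name); the function name is illustrative. It reports
the gap between consecutive timestamped lines, which covers both the
system call and any user-space work before the next call. Lines that do
not start with a timestamp, such as wrapped continuations, are ignored.

```python
import re
from datetime import datetime, timedelta

# Matches e.g. '12:34:33.138590 lstat(' at the start of a line.
LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}\.\d{6}) (\w+)\(")


def call_gaps(trace: str):
    """Return (syscall, microseconds until the next timestamped call)."""
    calls = []
    for line in trace.splitlines():
        m = LINE.match(line)
        if m:
            stamp = datetime.strptime(m.group(1), "%H:%M:%S.%f")
            calls.append((stamp, m.group(2)))
    one_us = timedelta(microseconds=1)
    return [(name, (t1 - t0) // one_us)
            for (t0, name), (t1, _) in zip(calls, calls[1:])]


# First three calls copied from the trace in this message.
SAMPLE = r'''
12:34:33.138590 lstat("incoming/2303823913A8", {st_mode=S_IFREG|0700,
st_size=10797, ...}) = 0 <0.09>
12:34:33.138648 rename("incoming/2303823913A8", "active/2303823913A8") = 0
<0.12>
12:34:33.138697 open("active/2303823913A8", O_RDWR) = 10 <0.08>
'''

for name, gap in call_gaps(SAMPLE):
    print(f"{name}: {gap} us")
```

For the sample this prints 58 us for `lstat` and 49 us for `rename`. The
last timestamped call has no successor and is therefore not reported.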