ZFS doing insane I/O reads
I just deployed ZFS on my newer Cyrus servers. These servers receive fewer than
2000 mails per hour and have around 400 concurrent POP/IMAP connections. I have
seen that even when there are no incoming POP or IMAP connections, there is
still a large amount of READ activity on the ZFS partitions. Is this normal
behaviour for an IMAP server? iostat sometimes shows up to 2000 TPS, and the
reads are in fact more than 10x the writes. I am afraid I will be thrashing the
hard disks. Do I need to tune ZFS specially for Cyrus?

This is the typical zpool iostat output:

zpool iostat 1
             capacity     operations    bandwidth
pool      alloc   free   read  write   read  write
------    -----  -----  -----  -----  -----  -----
imap       145G   655G    418     58  18.0M  1.78M
imap       146G   654G    258    118  8.28M   960K
imap       145G   655G    447    146  19.4M  4.37M
imap       145G   655G    413     32  19.4M  1.46M
imap       145G   655G    339      4  14.8M  20.0K
imap       145G   655G    341     40  15.7M   755K
imap       145G   655G    305     10  15.0M  55.9K
imap       145G   655G    328     12  14.8M   136K
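[A few settings are worth checking before assuming the disks are being
thrashed. The sketch below is illustrative only: the dataset name "imap" is
taken from the iostat output above, everything else is a generic suggestion.
atime updates alone can turn every mailbox read into extra I/O, and ZFS
prefetch can inflate the read column well beyond what clients actually
request.]

    # show the current settings on the imap dataset
    zfs get atime,recordsize,primarycache imap
    # Cyrus does not need access-time updates on the mail spool
    zfs set atime=off imap
    # per-vdev breakdown, to see whether one disk is doing all the reading
    zpool iostat -v imap 5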
Re: ZFS doing insane I/O reads
On Mon, February 27, 2012 11:10 am, Ram wrote:
> I just deployed ZFS on my newer Cyrus servers. These servers receive fewer
> than 2000 mails per hour and have around 400 concurrent POP/IMAP
> connections. I have seen that even when there are no incoming POP or IMAP
> connections, there is still a large amount of READ activity on the ZFS
> partitions. [...] Do I need to tune ZFS specially for Cyrus?

Ram,

We have a single Cyrus server about ten times as busy as yours, with four ZFS
pools (EMC Celerra iSCSI SAN) for message stores; all the databases, quota and
seen information are on an internal SSD-based (mirror) pool. We also have a
few GB of SSD-based ZIL (synchronous write cache) per pool.

Here is our 'zpool iostat 1' output:

             capacity     operations    bandwidth
pool      alloc   free   read  write   read  write
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T     22     32   422K   286K
cpool2    1.18T  2.66T     29     45   578K   459K
cpool3    1.00T  2.84T     24     34   456K   314K
cpool4     993G  2.87T     25     35   455K   328K
ssd       7.49G  22.3G      4     35  17.2K   708K
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T     45     16   670K   759K
cpool2    1.18T  2.66T     47     25   565K   603K
cpool3    1.00T  2.84T     33     13   410K   483K
cpool4     993G  2.87T     12      8   525K   244K
ssd       7.49G  22.3G     13    210  49.4K  10.8M
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T     20     22  77.9K  2.15M
cpool2    1.18T  2.66T     25      4   937K   128K
cpool3    1.00T  2.84T     20     91   324K  11.0M
cpool4     993G  2.87T     17     13   844K  83.9K
ssd       7.49G  22.3G      6    237  20.0K  20.9M
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T      0      0   1023      0
cpool2    1.18T  2.66T     12     21   146K  1.26M
cpool3    1.00T  2.84T      8     26  46.5K  2.28M
cpool4     993G  2.87T     11      4   353K  24.0K
ssd       7.49G  22.3G     17    135  99.4K  8.12M
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T      4      0  80.9K  4.00K
cpool2    1.18T  2.66T      7      6   133K  28.0K
cpool3    1.00T  2.84T      6      0  16.5K  4.00K
cpool4     993G  2.87T      4      4   149K  20.0K
ssd       7.49G  22.3G      9     76  51.0K  4.24M
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T     12      0   269K  4.00K
cpool2    1.18T  2.66T     19      0   327K  4.00K
cpool3    1.00T  2.84T      7      3  11.0K  16.0K
cpool4     993G  2.87T      5     95   167K  11.4M
ssd       7.49G  22.3G      4    226  17.5K  25.2M
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T     14     20   311K  1.22M
cpool2    1.18T  2.66T     19     15  85.4K  1.39M
cpool3    1.00T  2.84T      6      6  5.49K  40.0K
cpool4     993G  2.87T      4     15  17.0K  1.70M
ssd       7.49G  22.3G      6    151  21.5K  13.1M
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T     56     15  2.11M   559K
cpool2    1.18T  2.66T     13      7  18.5K  32.0K
cpool3    1.00T  2.84T      5      4  54.4K   392K
cpool4     993G  2.87T     17      2  66.4K   136K
ssd       7.49G  22.3G      6    109  45.9K  8.29M
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T     38     19   228K  1.89M
cpool2    1.18T  2.66T     29     11   160K   300K
cpool3    1.00T  2.84T      4      4  11.5K  24.0K
cpool4     993G  2.87T      9      8  31.5K  56.0K
ssd       7.49G  22.3G     12    150  46.0K  12.1M
------    -----  -----  -----  -----  -----  -----
cpool1     901G  2.96T     32      1   106K   256K
cpool2    1.18T  2.66T     46      5   692K  95.9K
cpool3    1.00T  2.84T      7     13   189K   324K
cpool4     993G  2.87T      4      0  29.0K  4.00K
ssd       7.49G  22.3G     25     96   149K  8.08M
------    -----  -----  -----  -----  -----  -----

Q1: How much RAM does your server have? Solaris 10 uses all remaining free RAM
as ZFS read cache. We have 72 GB of RAM in our
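[A layout like Eric's, with a separate ZIL per pool and a dedicated SSD mirror
pool for the databases, would be built along these lines. This is a sketch,
not Eric's actual configuration; the device names are hypothetical.]

    # attach a mirrored SSD log device (ZIL) to an existing pool;
    # synchronous writes (Cyrus fsync()s its databases and mail files)
    # land here before being flushed to the main vdevs
    zpool add cpool1 log mirror c2t0d0 c2t1d0
    # a small mirrored SSD pool for mailboxes.db, quota and seen state
    zpool create ssd mirror c3t0d0 c3t1d0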
Re: how to authenticate on localhost without password?
On 02/26/12 12:36 -0500, Brian J. Murrell wrote:
> Subject might be a bit misleading, but here is the problem... I have a
> Cyrus IMAP server serving a userbase. Of course, with any mail system comes
> the issue of handling spam. My users each have two folders in their
> account, Junk and Not Junk, where they put their spam and mis-identified
> spam. On the IMAP server each user has a system (i.e. Linux) account
> complete with a SpamAssassin configuration, including a Bayesian
> classification database, etc., so that each user has their own database of
> what's spam and what isn't. That means that for each user to classify their
> spam/ham, the sa-learn process has to run as their own uid. To achieve that
> goal, as well as timely processing of the spam and ham folders, each user
> has a process on the mail server running as their uid which monitors those
> mailboxes and processes them (and/or each user has jobs run from their cron
> to periodically do the same).
>
> The question now is: how can I have a master process, which spawns all of
> these per-user threads/processes, give them some sort of credential that
> allows them to access their IMAP account, without storing a list of
> accounts/passwords in a file that would need to be kept synchronized with
> their system passwords (not to mention the security nightmare it would be
> to store account passwords in plaintext)? FWIW, this configuration is
> Kerberos authenticated/authorized.

Heimdal KCM should be able to handle renewing Kerberos credentials for your
users.

Another option would be to utilize SASL EXTERNAL authentication to
authenticate your users, locally, based on peercred. Cyrus IMAP does not
currently have support for external auth, but I'm attaching a Linux-specific
patch, against Cyrus 2.3.12, which works for me.

I'm not sure how your spam processing fits into the picture, but your spawned
processes will need to function as IMAP clients, and will need to be able to
select the GSSAPI or EXTERNAL SASL mechanisms to use either of the above
scenarios.

> Or is there some alternative interface to the Cyrus IMAP folder mechanism
> (i.e. not through the IMAP protocol) that I am completely missing, that
> would be better suited to this problem? One possible solution I can think
> of that would use the IMAP protocol for all of this is to create a single
> IMAP account that will be given access (i.e. using Cyrus' ACLs) to every
> user's Junk, Not Junk and INBOX folders in order to read the messages,
> learn them and, in the case of ham, move them back to their INBOX. But
> before I go down this road I just want to make sure it's really the right
> road, or if there is some alternative that I am just not recognizing yet.
--
Dan White

diff -ruN cyrus-imapd-2.3.12.pristine/imap/imapd.c cyrus-imapd-2.3.12/imap/imapd.c
--- cyrus-imapd-2.3.12.pristine/imap/imapd.c	2008-04-13 10:40:29.0 -0500
+++ cyrus-imapd-2.3.12/imap/imapd.c	2008-04-22 23:14:20.0 -0500
@@ -106,6 +106,7 @@
 #include "xmalloc.h"
 #include "xstrlcat.h"
 #include "xstrlcpy.h"
+#include <pwd.h>
 
 #include "pushstats.h"		/* SNMP interface */
@@ -715,6 +716,8 @@
     char hbuf[NI_MAXHOST];
     int niflags;
     int imapd_haveaddr = 0;
+    struct ucred pc;
+    socklen_t pclen = sizeof(pc);
 
     signals_poll();
@@ -780,8 +783,25 @@
 	saslprops.ipremoteport = xstrdup(remoteip);
 	sasl_setprop(imapd_saslconn, SASL_IPLOCALPORT, localip);
 	saslprops.iplocalport = xstrdup(localip);
+    } else {
+	if (getsockopt(0, SOL_SOCKET, SO_PEERCRED, (void *)&pc, &pclen) == 0) {
+	    struct passwd *pw = getpwuid(pc.uid);
+	    int result;
+	    result = sasl_setprop(imapd_saslconn, SASL_AUTH_EXTERNAL, pw->pw_name);
+	    if (result != SASL_OK) {
+		return -1;
+	    }
+	    if (saslprops.authid) {
+		free(saslprops.authid);
+		saslprops.authid = NULL;
+	    }
+	    if (pw->pw_name) {
+		saslprops.authid = xstrdup(pw->pw_name);
+	    }
+	}
     }
+
     proc_register("imapd", imapd_clienthost, NULL, NULL);
 
     /* Set inactivity timer */
Re: how to authenticate on localhost without password?
On 02/27/12 10:32 -0600, Dan White wrote:
> Another option would be to utilize SASL EXTERNAL authentication to
> authenticate your users, locally, based on peercred. Cyrus IMAP does not
> currently have support for external auth, but I'm attaching a
> Linux-specific patch, against Cyrus 2.3.12, which works for me.
>
> I'm not sure how your spam processing fits into the picture, but your
> spawned processes will need to function as IMAP clients, and will need to
> be able to select the GSSAPI or EXTERNAL SASL mechanisms to use either of
> the above scenarios.

I forgot to mention that to use the EXTERNAL mechanism in this way, you'll
need to spawn an imapd process on a unix socket. E.g., in /etc/cyrus.conf:

    imapunix	cmd="imapd -U 30" listen="/var/run/cyrus/socket/imap"

And your IMAP client will need the capability to speak to an IMAP server over
that unix socket, like:

    socat -d READLINE /var/run/cyrus/socket/imap

(then issue 'c01 AUTHENTICATE EXTERNAL')

--
Dan White
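[For the spawned per-user processes, something along these lines could drive
the same exchange non-interactively. This is only a sketch under assumptions:
the socket path follows the cyrus.conf example above, the LIST command stands
in for whatever the spam processor actually does, and the sleeps are a crude
substitute for real response parsing.]

    #!/bin/sh
    # Minimal, non-robust IMAP session over the imapd unix socket.
    # The patched imapd derives the EXTERNAL identity from SO_PEERCRED,
    # i.e. from the uid running this script, so no password is sent.
    SOCK=/var/run/cyrus/socket/imap
    {
      sleep 1                              # wait for the server greeting
      printf 'c01 AUTHENTICATE EXTERNAL\r\n'
      sleep 1
      printf '\r\n'                        # empty SASL response to the "+" challenge
      sleep 1
      printf 'c02 LIST "" "*"\r\n'
      sleep 1
      printf 'c03 LOGOUT\r\n'
      sleep 1
    } | socat - UNIX-CONNECT:"$SOCK"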
Re: ZFS doing insane I/O reads
On 02/27/2012 04:16 PM, Eric Luyten wrote:
> We have a single Cyrus server about ten times as busy as yours, with four
> ZFS pools (EMC Celerra iSCSI SAN) for message stores; all the databases,
> quota and seen information are on an internal SSD-based (mirror) pool. We
> also have a few GB of SSD-based ZIL (synchronous write cache) per pool.
> [...]
> Q1: How much RAM does your server have? Solaris 10 uses all remaining free
> RAM as ZFS read cache.

This is a 16 GB RAM server running Linux CentOS 5.5, 64-bit. There seems to be
something definitely wrong, because all the memory on the machine is free. (I
don't seem to have fsstat on my server; I will have to get it compiled.)
Re: ZFS doing insane I/O reads
On 28/02/2012 07:13, Ram wrote:
> This is a 16 GB RAM server running Linux CentOS 5.5, 64-bit. There seems to
> be something definitely wrong, because all the memory on the machine is
> free. (I don't seem to have fsstat on my server; I will have to get it
> compiled.)

ZFS as FUSE? We have Solaris 10 on x86 (amd64), and we noticed that ZFS needs
_RAM_; the more, the better. On Solaris, using mdb you can look at the memory
consumption (in pages of physical memory):

bash-3.2# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp
cpu.generic zfs sockfs ip hook neti sctp arp usba fcp fctl qlc lofs sata
fcip random crypto logindmux ptm ufs mpt mpt_sas ]
> ::memstat
Page Summary            Pages        MB   %Tot
----------------   ----------  --------  -----
Kernel                6052188     23641    36%
ZFS File Data         4607758     17999    27%
Anon                  2115097      8262    13%
Exec and libs            6915        27     0%
Page cache              82665       322     0%
Free (cachelist)       433268      1692     3%
Free (freelist)       3477076     13582    21%

Total                16774967     65527
Physical             16327307     63778

As this is early in the morning, there are plenty of free pages in RAM (4
million), and the memory-mapped executables of Cyrus IMAPd and shared
libraries only consume 6915 pages, 27 MB. There are 1779 connections at this
moment.

We had to go from 32 GB to 64 GB per node due to extreme lags in IMAP spool
processing. And even with 64 GB, when there is memory pressure from the Kernel
and Anon pages (mapped pages without an underlying file: classical malloc(),
or mmap on /dev/zero after COW), there are slight degradations in access times
during high-volume hours.

Another idea we had was to use a fast SSD as Layer 2 ARC (L2ARC), named
"cache" on the zpool command line; based on the LRU algorithm, the blocks
containing the cyrus.* files should end up there. The problem lies in the fact
that a pool with a local cache device and remote SAN (Fibre Channel) storage
cannot be imported automatically on another machine without replacing the
faulty device. And for the price of an FC-enabled SSD you can buy MUCH RAM.

Does your CentOS system have some kind of trace tool to look for the block
numbers which are read constantly? On Solaris I use dtrace to look for that,
and also for file-based I/O, to see WHICH files get read and written when
there is starvation.

--
Pascal Gienger
Jabber/XMPP/Mail: pascal.gien...@uni-konstanz.de
University of Konstanz, IT Services Department (Rechenzentrum)
Building V, Room V404, Phone +49 7531 88 5048, Fax +49 7531 88 3739
G+: https://plus.google.com/114525323843315818983/
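[On Solaris, the kind of dtrace one-liner meant here might look like the
following sketch; the process names are assumed to be the standard Cyrus
service names. It counts reads per file, and interrupting it with Ctrl-C
prints a per-file read count, showing which files are re-read constantly.]

    dtrace -n 'syscall::read:entry
      /execname == "imapd" || execname == "pop3d"/
      { @[fds[arg0].fi_pathname] = count(); }'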