Re: [Dovecot] Fwd: Re: Dotlock dovecot-uidlist errors / NFS / High Load
Stan,

On 1/20/11 7:45 PM, "Stan Hoeppner" wrote:
> What you're supposed to do, and what VMWare recommends, is to run ntpd
> _only in the ESX host_ and _not_ in each guest. According to:
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427

Did you read the document you linked? As was mentioned on this list fairly
recently, that has not been the recommendation for quite some time. To the
contrary:

===
NTP Recommendations

Note: In all cases use NTP instead of VMware Tools periodic time
synchronization. (...) When using NTP in the guest, disable VMware Tools
periodic time synchronization.
===

We run the guests with divider=10, periodic timesync disabled, and NTP on
both the host and the guest. We have not had any time problems in several
years of operation.

-Brad
Re: [Dovecot] SSD drives are really fast running Dovecot
On 1/14/11 8:59 PM, "Brandon Davidson" wrote:
> I work for central IS, so this is the first stage of a consolidated
> service offering that we anticipate may encompass all of our staff and
> faculty. We bought what we could with what we had, anticipating that
> usage will grow over time as individual units migrate off their existing
> infrastructure.
>
> 1/3 of the available capacity is passive 3rd-site disaster-recovery. The
> remaining 2 sites each host both an active and a passive copy of each
> mail store; we design to be able to sustain a site outage without loss
> of service. Each site has extra space for several years of growth,
> database restores, and archival / records retention reserves.

Oh, and you probably don't even want to think about what we did for our
Dovecot infrastructure. Clustered NFS servers with seamless failover,
snapshotting, and real-time block-level replication aren't cheap.

The students and faculty/staff not supported by an existing Exchange
environment aren't getting any less support, I'll say that much. Folks
trust us with their education, their livelihoods, and their personal
lives. I'd like to think that 'my fellow taxpayers' understand the
importance of what we do and appreciate the measures we take to ensure
the integrity and availability of their data.

-Brad
Re: [Dovecot] SSD drives are really fast running Dovecot
Stan,

On 1/14/11 7:09 PM, "Stan Hoeppner" wrote:
> The average size of an email worldwide today is less than 4KB, less than
> one typical filesystem block.
>
> 28TB / 4KB = 28,000,000,000,000 bytes / 4096 bytes = 6,835,937,500 =
> 6.8 billion emails / 5,000 users =
> 1,367,188 emails per user
>
> 6.8 billion emails is "not much anymore" for a 5,000 seat org?

You obviously don't live in the same world I do. Have you ever been part
of a grant approval process and seen what kinds of files are exchanged,
and with what frequency? Complied with retention and archival policies?
Dealt with folks who won't (or can't) delete a message once they've
received it?

Blithely applying some inexplicable figure you've pulled out of
who-knows-where and extrapolating from that hardly constitutes prudent
planning. We based our requirement on real numbers observed in our
environment, expected growth, and our budget cycle. How do you plan? More
blind averaging?

> How much did that 252TB NetApp cost the university? $300k? $700k? Just a
> drop in the bucket right? Do you think that was a smart purchasing
> decision, given your state's $3.8 Billion deficit?

You're close, if a bit high with one of your guesses. Netapp is good to
Education. Not that it matters - you know very little about the financial
state of my institution or how capital expenditures work within my
department's funding model. I suppose I shouldn't be surprised though;
you seem to be very skilled at taking a little bit of information and
making a convincing-sounding argument about it... regardless of how much
you actually know.

> For comparison, as of Feb 2009, the entire digital online content of the
> Library of Congress was only 74TB. And you just purchased 252TB just for
> email for a 5,000 head count subsection of a small state university's
> population?

I work for central IS, so this is the first stage of a consolidated
service offering that we anticipate may encompass all of our staff and
faculty.
We bought what we could with what we had, anticipating that usage will
grow over time as individual units migrate off their existing
infrastructure. Again, you're guessing and casting aspersions.

This is enterprise storage; I'm not sure that you know what this actually
means either. With Netapp you generally lose on the order of 35-45% due
to right-sizing, RAID, spares, and aggregate/volume/snapshot reserves.
What's left will be carved up into LUNs and presented to the hosts.

1/3 of the available capacity is passive 3rd-site disaster-recovery. The
remaining 2 sites each host both an active and a passive copy of each
mail store; we design to be able to sustain a site outage without loss of
service. Each site has extra space for several years of growth, database
restores, and archival / records retention reserves. That's how 16TB of
active mail can end up requiring 252TB of raw disk.

Doing things right can be expensive, but it's usually cheaper in the long
run than doing it wrong. It's like looking into a whole other world for
you, isn't it? No Newegg parts here...

-Brad
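[Editor's note: the capacity breakdown described above can be sketched numerically. The 40% loss figure (midpoint of the quoted 35-45%) and the even split between copies are assumptions for illustration, not figures from the post.]

```python
# Illustrative reconstruction: how ~16 TB of active mail can consume
# ~252 TB of raw disk. Percentages are assumptions for this sketch.
raw_tb = 252.0
usable_tb = raw_tb * (1 - 0.40)        # ~40% lost to right-sizing, RAID, spares, reserves
dr_tb = usable_tb / 3                  # 1/3 reserved for passive 3rd-site DR
per_site_tb = (usable_tb - dr_tb) / 2  # two active sites share the rest
per_copy_tb = per_site_tb / 2          # each site holds an active AND a passive copy
active_mail_tb = 16.0
headroom_tb = per_copy_tb - active_mail_tb  # growth / restores / retention reserve

print(round(usable_tb, 1), round(per_copy_tb, 1), round(headroom_tb, 1))
```

Under these assumed numbers, each of the five mail-store copies gets roughly 25 TB, leaving single-digit terabytes of headroom per copy over the 16 TB of active mail.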
Re: [Dovecot] Ongoing performance issues with 2.0.x
Stan,

On 11/8/10 10:39 AM, "Stan Hoeppner" wrote:
> However, if CONFIG_HZ=1000 you're generating WAY too many interrupts/sec
> to the timer, ESPECIALLY on an 8 core machine. This will exacerbate the
> high context switching problem. On an 8 vCPU (and physical CPU) machine
> you should have CONFIG_HZ=100 or a tickless kernel. You may get by
> using 250, but anything higher than that is trouble.

On modern kernels you can boot with "divider=10" to take the HZ from 1000
down to 100 at boot time - no rebuilding necessary.

-Brad
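[Editor's note: as a sketch of where the "divider=10" parameter goes on a RHEL 5-era guest, the kernel line in grub.conf gets the extra argument. Paths, kernel version, and volume names below are illustrative, not from the post.]

```
# /boot/grub/grub.conf (kernel version and root device illustrative)
title CentOS (2.6.18-194.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.el5 ro root=/dev/VolGroup00/LogVol00 divider=10
        initrd /initrd-2.6.18-194.el5.img
```

With divider=10, a kernel built with CONFIG_HZ=1000 ticks at an effective 100 Hz, so the change takes effect at the next reboot with no rebuild.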
Re: [Dovecot] remote hot site, IMAP replication or cluster over WAN
Stan,

On 11/1/10 7:30 PM, "Stan Hoeppner" wrote:
> 1. How many of you have a remote site hot backup Dovecot IMAP server?

+1

> 2. How are you replicating mailbox data to the hot backup system?
> C. Other

Netapp Fabric MetroCluster, active IMAP/POP3 nodes at both sites mounting
storage over NFS, and active/standby hardware load balancers in front.
Probably more than most folks can afford, but it's pretty bulletproof.

-Brad
Re: [Dovecot] Problem with namespace (maybe bug?)
Timo,

On 10/28/10 5:13 AM, "Timo Sirainen" wrote:
>> . list (subscribed) "" "*"
>> * LIST (\Subscribed \NonExistent) "/"
>> "Shared/tester2/sdfgsg/gsdfgf/vtyjyfgj/rtdhrthxs/zhfhg"
>> . OK List completed.
>
> Looks like a bug, yeah. Should be fixed in v2.0. I don't know if it's
> worth the trouble to fix it in v1.2 anymore though..

I think it's differently broken in 2.0 in certain configurations. See
http://www.dovecot.org/list/dovecot/2010-October/054310.html

It's possible that the same change to SUBSCRIPTIONS would fix it on 1.2?

-Brad
Re: [Dovecot] Retrieving unread message count
Timo,

On 10/17/10 4:20 PM, "Timo Sirainen" wrote:
> On 18.10.2010, at 0.19, Brandon Davidson wrote:
>> Other than actually calling THREAD and counting the resulting groups,
>> is there a good way to get a count of threads?
>
> Nope, that's the only way.

It looks like draft-ietf-morg-inthread-01 dropped THREADROOT/THREADLEAF,
which is too bad. It would be really nice to be able to do something
like:

s01 SEARCH RETURN (COUNT) THREADROOT INTHREAD

It looks like it's still pretty much a work in progress, but are you
planning on implementing any more of the INTHREAD stuff?

-Brad
Re: [Dovecot] Retrieving unread message count
Timo,

On 10/17/10 3:56 PM, "Timo Sirainen" wrote:
> The reason why STATUS is mentioned to be possibly slow is to discourage
> clients from doing a STATUS to all mailboxes.
>
> STATUS is definitely faster than SELECT+SEARCH with all IMAP servers.

That's what I figured, thanks! Other than actually calling THREAD and
counting the resulting groups, is there a good way to get a count of
threads?

-Brad
[Dovecot] Retrieving unread message count
Timo,

I'm working with a webmail client that periodically polls unread message
counts for a list of folders. It currently does this by doing a LIST or
LSUB and then iterating across all of the folders, running a SEARCH ALL
UNSEEN, and counting the resulting UID list.

Eventually I'd like to see it using RFC 5819 LIST-EXTENDED, but that
requires a fair bit of work. In the meantime I'm trying to speed up the
existing iteration. I've got it working using 'STATUS "mailbox"
(UNSEEN)', but the language in RFC 3501 suggests that this may be slow.
There is a counterproposal to use RFC 4731 ESEARCH and do 'SELECT
"MAILBOX"'; 'SEARCH RETURN (COUNT) UNSEEN'.

From an IMAP server perspective, which do you anticipate would be faster?
From a client perspective it seems like STATUS would be better since it
involves fewer round-trips to the server and less output parsing, but
given the warnings in the RFCs there is concern that it may in fact be
more expensive.

-Brad
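[Editor's note: the round-trip difference between the two approaches can be sketched by generating the command sequences. Folder names and tag prefixes are illustrative; responses are omitted.]

```python
# Each list element represents one client->server round trip.
def status_poll(mailboxes):
    # One STATUS per mailbox, no SELECT needed.
    return [f'a{i} STATUS "{m}" (UNSEEN)' for i, m in enumerate(mailboxes)]

def esearch_poll(mailboxes):
    # SELECT plus ESEARCH per mailbox: twice the round trips.
    cmds = []
    for i, m in enumerate(mailboxes):
        cmds.append(f'b{i} SELECT "{m}"')
        cmds.append(f'c{i} SEARCH RETURN (COUNT) UNSEEN')
    return cmds

folders = ["INBOX", "Sent", "Lists/dovecot"]
print(len(status_poll(folders)), len(esearch_poll(folders)))  # 3 vs 6
```

This only counts protocol round trips from the client side; it says nothing about server-side cost, which is the question the post asks.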
Re: [Dovecot] Significant performance problems
Chris,

On 10/6/10 9:42 PM, "Chris Hobbs" wrote:
> 3) Modified my NFS mount with noatime to reduce i/o hits there. Need to
> figure out what Brad's suggestions about readahead on the server mean.

It's been a while since I mucked with Linux as an NFS server; I've been
on Netapp for a while. There may be fewer knobs than I recall.

> I do have one more idea I'll throw out there. Everything I've got here
> is virtual. I only have the one Dovecot/Postfix server running now, and
> the impression I get from you all is that that should be adequate for my
> load. What would the collective opinion be of simply removing the NFS
> server altogether and mounting the virtual disk holding my messages
> directly to the dovecot server?

If you're not planning on doing some sort of HA failover or load
balancing, and have the option to make your storage direct-attached
instead of NAS, it might be worth trying. There's not much to be gained
from NFS in a single-node configuration.

-Brad
Re: [Dovecot] Broken SELECT ""/EXAMINE ""
Michael,

On 9/1/10 12:18 AM, "Michael M. Slusarz" wrote:
> imapproxy *should* really be using UNSELECT, but that looks like a
> different (imapproxy) bug.

I run imapproxy too. If you're using Dovecot 2.0, set:

imap_capability = +UNSELECT IDLE

Imapproxy is naive and only reads capabilities from the initial banner -
it doesn't refresh them after login. If you make sure they're in the
initial capability list it will behave properly.

-Brad
Re: [Dovecot] nfs director
Noel,

On 8/26/10 11:28 PM, "Noel Butler" wrote:
> I just fail to see why adding more complexity, and essentially making
> $9K load balancers redundant, is the way of the future.

To each their own. If your setup works without it, then fine, don't use
it... but I don't see why you feel the need to disparage it either. It's
hardly bloat; those of us with larger installations do find it useful.
IIRC it was sponsored development, and was running in production for a
large ISP from the very moment it was released.

-Brad
Re: [Dovecot] nfs director
Noel,

On 8/26/10 9:59 PM, "Noel Butler" wrote:
>> I fail to see advantage if anything it add in more point of failure,
>> with
>
> i agree with this and it is why we dont use it
>
> we use dovecots deliver with postfix and have noticed no problems, not
> to say there was none, but if so, we dont notice it.

We might be a slightly larger install than you (60k users, mail on FAS
3170 Metrocluster), but we have noticed corruption issues and the
director is definitely going to see use in our shop.

We still use Sendmail+procmail for delivery, so no issue there... but
we've got hordes of IMAP users that will leave a client running at home,
at their desk, on their phone, and then will use Webmail on their laptop.
Without the director, all of these sessions end up on different backend
mailservers, and it's basically a crapshoot which Dovecot instance
notices a new message first. NFS locking being what it is, odds are an
index will get corrupted sooner or later, and when this happens the
user's mail 'disappears' until Dovecot can reindex it. The users
inevitably freak out and call the helpdesk, who tells them to close and
reopen their mail client.

Maybe you're small enough to not run into problems, or maybe your users
just have lower expectations or a higher pain threshold than ours. Either
way, it's unpleasant for everyone involved, and quite easy to solve with
the director proxy. Timo has been saying for YEARS that you need
user-node affinity if you're doing NFS, and now he's done something about
it. If you've already got a load balancer, then just point the balancer
at a pool of directors, and then point the directors at your existing
mailserver pool.

For health monitoring on the directors, check out:
http://github.com/brandond/poolmon

-Brad
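[Editor's note: the user-node affinity idea described above can be sketched in a few lines. Dovecot's director uses its own vhost/ring algorithm; the simple hash-mod scheme and host names below are assumptions for illustration only.]

```python
import zlib

# Illustrative backend pool (not from the original post).
BACKENDS = ["mail1.example.edu", "mail2.example.edu", "mail3.example.edu"]

def backend_for(user):
    # A stable hash of the username means every session for that user
    # (home, desk, phone, webmail) lands on the same backend, so only
    # one Dovecot instance ever touches that user's index files.
    return BACKENDS[zlib.crc32(user.encode()) % len(BACKENDS)]

# The mapping is deterministic: repeated logins always hit the same node.
print(backend_for("jdoe") == backend_for("jdoe"))  # True
```

The point of the affinity is exactly what the post describes: with all of a user's sessions pinned to one node, NFS lock contention between Dovecot instances on that user's indexes goes away.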
[Dovecot] Login process connection routing
Timo,

Just out of curiosity, how are incoming connections routed to login
processes when run with:

service imap-login {
  service_count = 0
}

I've been playing with this on our test director, and the process
connection counts look somewhat unbalanced. I'm wondering if there are
any performance issues with having a single process handle so many
connections. It seems fine (system load is actually lower than with
service_count = 1), but I thought I'd ask.

/usr/sbin/dovecot
 \_ dovecot/imap-login
 \_ dovecot/imap-login [1 connections (1 TLS)]
 \_ dovecot/imap-login
 \_ dovecot/imap-login [5 connections (5 TLS)]
 \_ dovecot/imap-login [1 connections (1 TLS)]
 \_ dovecot/imap-login [4 connections (4 TLS)]
 \_ dovecot/imap-login [1 connections (1 TLS)]
 \_ dovecot/imap-login [1 connections (1 TLS)]
 \_ dovecot/imap-login [315 connections (315 TLS)]
 \_ dovecot/imap-login [63 connections (63 TLS)]
 \_ dovecot/imap-login [12 connections (12 TLS)]
 \_ dovecot/imap-login
 \_ dovecot/imap-login [10 connections (10 TLS)]
 \_ dovecot/imap-login [2 connections (2 TLS)]
 \_ dovecot/imap-login [370 connections (370 TLS)]
 \_ dovecot/imap-login [24 connections (24 TLS)]

-Brad
Re: [Dovecot] Doveadm director flush/remove
Timo,

On 7/19/10 9:38 AM, "Timo Sirainen" wrote:
> http://hg.dovecot.org/dovecot-2.0/rev/f178792fb820 fixes it?

It makes it further before crashing. Trace attached. I still wonder why
it's timing out in the first place. Didn't you change it to reset the
timeout as long as it's still getting data from the userdb?

-Brad

[Attachment: auth-worker-gdb_2.txt]
Re: [Dovecot] Doveadm director flush/remove
Timo,

>>> Maybe this fixes it: http://hg.dovecot.org/dovecot-2.0/rev/cfd15170dff7
>>
>> Nope, still crashes with the same stack. I'll rebuild with -g and
>> report back.

Here we go. Attached; hopefully Entourage won't mangle the line wrap.

-Brad

[Attachment: auth-worker-gdb.txt]
Re: [Dovecot] Doveadm director flush/remove
Timo,

On 7/17/10 11:06 AM, "Timo Sirainen" wrote:
>> Here's a stack trace. Standard null function pointer. No locals; I
>> think I'd have to recompile to get additional information.
>>
>> #0  0x0000000000000000 in ?? ()
>> #1  0x00415a71 in auth_worker_destroy ()
>> #2  0x00415416 in auth_worker_call_timeout ()
>
> Maybe this fixes it: http://hg.dovecot.org/dovecot-2.0/rev/cfd15170dff7

Nope, still crashes with the same stack. I'll rebuild with -g and report
back.

-Brad
Re: [Dovecot] Doveadm director flush/remove
Timo,

On 7/16/10 4:23 AM, "Timo Sirainen" wrote:
>> Jul 16 01:50:44 cc-popmap7 dovecot: auth: Error: auth worker: Aborted
>> request: Lookup timed out
>> Jul 16 01:50:44 cc-popmap7 dovecot: master: Error: service(auth): child
>> 1607 killed with signal 11 (core dumps disabled)
>
> I don't think that above change should have caused any crashes, so
> backtrace would be nice.

Here's a stack trace. Standard null function pointer. No locals; I think
I'd have to recompile to get additional information.

#0  0x0000000000000000 in ?? ()
#1  0x00415a71 in auth_worker_destroy ()
#2  0x00415416 in auth_worker_call_timeout ()
#3  0x0038b3e5273d in io_loop_handle_timeouts_real () from /usr/lib64/dovecot/libdovecot.so.0
#4  0x0038b3e52797 in io_loop_handle_timeouts () from /usr/lib64/dovecot/libdovecot.so.0
#5  0x0038b3e53958 in io_loop_handler_run () from /usr/lib64/dovecot/libdovecot.so.0
#6  0x0038b3e527dd in io_loop_run () from /usr/lib64/dovecot/libdovecot.so.0
#7  0x0038b3e3b926 in master_service_run () from /usr/lib64/dovecot/libdovecot.so.0
#8  0x004184b1 in main ()

-Brad
Re: [Dovecot] Doveadm director flush/remove
Timo,

On 7/15/10 4:18 PM, "Timo Sirainen" wrote:
>>> Jul 15 13:46:24 cc-popmap7 dovecot: auth: Error: auth worker: Aborted
>>> request: Lookup timed out
>>> Jul 15 13:53:25 cc-popmap7 dovecot: auth: Error: getpwent() failed: No
>>> such file or directory
>
> Also see if http://hg.dovecot.org/dovecot-2.0/rev/d13c1043096e fixes
> this or if there are other timeouts?

Now I get:

Jul 16 01:50:44 cc-popmap7 dovecot: auth: Error: auth worker: Aborted request: Lookup timed out
Jul 16 01:50:44 cc-popmap7 dovecot: master: Error: service(auth): child 1607 killed with signal 11 (core dumps disabled)

Should I try to grab a core, or do you have a good idea where this is
coming from? Seems suspiciously similar to the crash with '-f userlist'.

-Brad
Re: [Dovecot] Doveadm director flush/remove
Timo,

On 7/15/10 4:12 PM, "Timo Sirainen" wrote:
>> Maybe there could be a parameter to get the user list from a file (one
>> username per line) instead of userdb.
>
> Added -f parameter for this.

Awesome! I dumped a userlist (one username per line) which it seems to
read through quite quickly; unfortunately I get...

[r...@cc-popmap7 ~]# doveadm director map -f userlist.txt
Segmentation fault

(lots of pread/mmap snipped)
pread(9, "user0\nuser1\nuser2\nuser3\nuser4"..., 8189, 393042) = 8189
mmap(NULL, 2101248, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2acbc3aae000
pread(9, "user5\nuser6\nuser7\nuser8\nuser9"..., 8188, 401231) = 36
pread(9, "", 8152, 401267) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---

#1  0x2b55977ec1d0 in auth_connection_close () from /usr/lib64/dovecot/libdovecot.so.0
#2  0x2b55977ec258 in auth_master_deinit () from /usr/lib64/dovecot/libdovecot.so.0
#3  0x0040a059 in user_file_get_user_list ()
#4  0x0040a22f in cmd_director_map ()
#5  0x0040897d in doveadm_try_run_multi_word ()
#6  0x00408aab in doveadm_try_run ()
#7  0x00408e0f in main ()

-Brad
Re: [Dovecot] Director proxy timeout
On 7/13/10 4:53 AM, "Timo Sirainen" wrote:
> Hmm. "Between"? Is it doing CAPABILITY before or after login or both?
> That anyway sounds different from the idle timeout problem..

I added some additional logging to imapproxy and it looks like it's
actually getting stuck in a few different commands. It just depends on
what it's trying to do when the connection gets wedged.

What I'm seeing is that from time to time an imapproxy -> imap-login
proxy connection will get stuck and cease responding to commands. After a
while the PHP client will time out and give up, after which the stuck
connection goes back to the pool, and continues to get reused and cause
hangs until I either restart imapproxy or kill off the imap-login proxy
that the stuck socket is connected to.

If I attach to the stuck imap-login process, it's waiting in:

#0  0x00385c0c6070 in __write_nocancel () from /lib64/libc.so.6
#1  0x003c5620c9a1 in login_proxy_state_notify () from /usr/lib64/dovecot/libdovecot-login.so.0
#2  0x003c5620c026 in login_proxy_notify () from /usr/lib64/dovecot/libdovecot-login.so.0
#3  0x003c55e52521 in io_loop_handle_timeouts_real () from /usr/lib64/dovecot/libdovecot.so.0
#4  0x003c55e5257b in io_loop_handle_timeouts () from /usr/lib64/dovecot/libdovecot.so.0
#5  0x003c55e5373c in io_loop_handler_run () from /usr/lib64/dovecot/libdovecot.so.0
#6  0x003c55e525c1 in io_loop_run () from /usr/lib64/dovecot/libdovecot.so.0
#7  0x003c55e3b896 in master_service_run () from /usr/lib64/dovecot/libdovecot.so.0
#8  0x003c5620dc4b in main () from /usr/lib64/dovecot/libdovecot-login.so.0
#9  0x00385c01d994 in __libc_start_main () from /lib64/libc.so.6
#10 0x0000000000402019 in _start ()

If I tcpdump the stuck connection, I can see that imapproxy sends
something to the imap-login proxy when new clients are connected, but I'm
not sure what since it's SSL encrypted. The response is an empty ack
packet. I'm going to try disabling SSL between imapproxy and the director
to see if I can figure out what it's sending.
All in all I'm having a hard time debugging it since it only seems to
happen when there are a decent number of users active. I'm not at all
convinced that it's Dovecot's fault, but if you have any suggestions or
things that I could do to see what the imap-login proxy or backend think
is going on, I'd be much in your debt.

-Brad
[Dovecot] Doveadm director flush/remove
I've got a couple more issues with the doveadm director interface:

1) If I use "doveadm director remove" to disable a host with active
users, the director seems to lose track of users mapped to that host. I
guess I would expect it to tear down any active sessions by killing the
login proxies, as if I'd done 'doveadm director add HOSTNAME 0 && doveadm
director flush HOSTNAME' before removing it? Here's what I see with an
active open connection:

[r...@cc-popmap7 ~]# doveadm director status brandond
Current: 10.142.0.179 (expires 2010-07-14 01:26:14)
Hashed: 10.142.0.179
Initial config: 10.142.0.161
[r...@cc-popmap7 ~]# doveadm director remove 10.142.0.179
[r...@cc-popmap7 ~]# doveadm director status brandond
Current: not assigned
Hashed: 10.142.0.174
Initial config: 10.142.0.161

2) "doveadm director flush" returns the wrong usage:

[r...@cc-popmap7 ~]# doveadm director flush
doveadm director remove [-a ]

3) "doveadm director flush all" breaks the ring:

[r...@cc-popmap7 ~]# doveadm director flush all
Jul 14 01:26:33 cc-popmap7 dovecot: director: Error: Director 10.142.0.180:1234/right disconnected
Jul 14 01:26:33 cc-popmap7 dovecot: director: Error: Director 10.142.0.180:1234/left disconnected
Jul 14 01:26:33 oh-popmap7 dovecot: director: Error: director(10.142.0.162:1234/left): Invalid HOST-FLUSH args
Jul 14 01:26:33 oh-popmap7 dovecot: director: Error: director(10.142.0.162:1234/right): Invalid HOST-FLUSH args

For some reason, flushing a host address only disconnects one side:

[r...@cc-popmap7 ~]# doveadm director flush 10.142.0.160
Jul 14 01:28:23 cc-popmap7 dovecot: director: Error: Director 10.142.0.180:1234/right disconnected
Jul 14 01:28:23 oh-popmap7 dovecot: director: Error: director(10.142.0.162:1234/left): Invalid HOST-FLUSH args

-Brad
Re: [Dovecot] v2.0.rc2 released
Timo,

On 7/11/10 10:58 AM, "Timo Sirainen" wrote:
>> dsync in hg tip is failing tests:
>
> Fixed now, as well as another dsync bug.

Looks good! The new doveadm director status output is a little odd,
though. The 'mail server ip' column is way wide (I guess it adjusts to
term size though?) and the users column got renamed to vhosts, so now
there are two vhosts columns.

-Brad
Re: [Dovecot] dovecot director service
Timo,

On 7/11/10 12:06 PM, "Timo Sirainen" wrote:
>> Pretty much anything built into Dovecot would be an improvement over an
>> external script from my point of view.
>
> Yeah, some day I guess..

Well, I would definitely make use of it if you ever get around to coding
it.

>> With a script I have to deal with all kinds of questions like, which
>> director do I have my script log in to?
>
> Any one of them.

Sure, but that adds additional complexity to the script (selecting a
host, retrying, etc)... or sticking the director/doveadm interface behind
a load balancer if possible.

>> What happens if it goes down? What happens if the monitoring host is
>> down?
>
> Add redundancy :)

Sure, but it's easier for me if I don't have to worry about it ;)

> Well, you could make the doveadm interface available via TCP port, but
> that really should be firewalled well. Hmm. It wouldn't be difficult to
> patch doveadm director to also support connecting to host:port rather
> than unix socket.

That would be pretty awesome. Right now I could try to talk Director
protocol from the management node to inject commands, but the Directors
only accept connections from other hosts listed in the ring, right? So
having doveadm function over TCP would be a big plus.

-Brad
Re: [Dovecot] TLS Issue
Leander,

On 7/10/10 2:14 PM, "Leander S." wrote:
> "You have attempted to establish a connection with "server". However,
> the security certificate presented belongs to "*.server". It is
> possible, though unlikely, that someone may be trying to intercept your
> communication with this web site."

IIRC, wildcard certificates are only valid for subdomains. *.domain.com
would be valid for a.domain.com or b.domain.com, but not domain.com. It
also relies upon the client supporting wildcard certs.

-Brad
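[Editor's note: the matching rule described above - a wildcard covers exactly one subdomain label, not the bare domain - can be sketched as follows. This is a simplified illustration, not a full RFC-compliant hostname verifier.]

```python
def wildcard_matches(cert_name, host):
    # Exact names must match case-insensitively.
    if not cert_name.startswith("*."):
        return cert_name.lower() == host.lower()
    # "*.domain.com" covers exactly one extra label: a.domain.com matches,
    # but domain.com and a.b.domain.com do not.
    suffix = cert_name[2:].lower()
    head, sep, tail = host.lower().partition(".")
    return bool(sep) and head != "" and tail == suffix

assert wildcard_matches("*.domain.com", "a.domain.com")
assert not wildcard_matches("*.domain.com", "domain.com")
assert not wildcard_matches("*.domain.com", "a.b.domain.com")
```

This is why a cert for "*.server" triggers the warning when the client connects to plain "server": the bare name has no extra label for the wildcard to consume.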
Re: [Dovecot] v2.0.rc2 released
dsync in hg tip is failing tests:

test-dsync-brain.c:176: Assert failed: test_dsync_mailbox_create_equals(&box_event.box, &src_boxes[6])
test-dsync-brain.c:180: Assert failed: test_dsync_mailbox_create_equals(&box_event.box, &dest_boxes[6])
Segmentation fault

I'm currently using rev 77f244924009; I'm not sure when it started.

-Brad

On 7/9/10 3:14 PM, "Timo Sirainen" wrote:
> http://dovecot.org/releases/2.0/rc/dovecot-2.0.rc2.tar.gz
> http://dovecot.org/releases/2.0/rc/dovecot-2.0.rc2.tar.gz.sig
Re: [Dovecot] dovecot director service
On 7/9/10 12:01 AM, "Xavier Pons" wrote:
> I think this new funcionalities would be perfect (necessary ;-) ) for a
> complete load balanced/high availability mail system.

Timo, what you described sounds great. Pretty much anything built into
Dovecot would be an improvement over an external script from my point of
view. With a script I have to deal with all kinds of questions like,
which director do I have my script log in to? What happens if it goes
down? What happens if the monitoring host is down? I'd probably end up
trying to put the director port behind the load-balancer and figuring out
some way to get my script to talk Director protocol to add/remove mail
servers, and that would just be ugly ;)

>> Yeah. Any good naming ideas for the doveadm director command? :)
>
> something like doveadm director servers ?!?!

I'm not sure either. Maybe 'doveadm director ring'? I thought of
suggesting that 'status' report the ring status, and the current output
go to something like 'mailhosts'. After a moment of consideration, I
realized that all the current director commands (add, remove, status) act
on the mailhost list, not the director list, and so in that sense
'doveadm director' is really more like 'doveadm director-mailhosts' to
begin with.

-Brad
Re: [Dovecot] dovecot director service
Xavier,

On 7/8/10 1:29 AM, "Xavier Pons" wrote:
> Yes, we will have two hardware balancers in front of proxies. Thus, the
> director service will detect failures of backend servers and not forward
> sessions to them? how does it detect if a backend server is alive or
> not?

IIRC, it does not detect failures of backend servers. It's up to you to
detect outages and react appropriately. The folks that sponsored Director
development apparently have a monitoring script that removes downed nodes
by running something like 'ssh directorhost doveadm director remove
backendhost', and then re-adds them when they come back up.

I'm not sure how I'm going to handle this myself, as our monitoring
system only checks every 5 minutes, and our existing load balancers check
and add/remove nodes every 20 seconds or so. 5 minutes would be a long
time to have Dovecot trying to send users to a non-functional server.

> The command 'doveadm director status', gives information about status of
> backend mailservers or of director servers?

Just the backend servers - it shows backend server addresses and how many
users they're each assigned, or details on a specific user mapping. I am
not aware of a way to get Dovecot to output the director ring status.
That would be nice though, to be able to list the directors and how many
connections they're each proxying.

You might read through this thread, which starts here:
http://www.dovecot.org/list/dovecot/2010-May/049189.html
And continues later here:
http://www.dovecot.org/list/dovecot/2010-June/049425.html

-Brad
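[Editor's note: the reconciliation step such a monitoring script would perform can be sketched as a pure function. The host addresses below are illustrative, and the actual sponsored script was not posted; only the 'doveadm director add/remove' commands it drives are from the thread.]

```python
def director_commands(previously_up, currently_up):
    # Compare the last known-good backend set against the current health
    # check results, and emit the doveadm commands to reconcile them.
    cmds = [f"doveadm director remove {h}"
            for h in sorted(previously_up - currently_up)]
    cmds += [f"doveadm director add {h}"
             for h in sorted(currently_up - previously_up)]
    return cmds

# One backend went down, another came back up:
print(director_commands({"10.0.0.1", "10.0.0.2"}, {"10.0.0.2", "10.0.0.3"}))
# -> ['doveadm director remove 10.0.0.1', 'doveadm director add 10.0.0.3']
```

A real script would run these via ssh (or locally on a director node) on every polling cycle; as the post notes, the polling interval bounds how long users keep getting sent to a dead backend.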
Re: [Dovecot] dovecot evaluation on a 30 gb mailbox
Timo,

On 6/24/10 4:23 AM, "Timo Sirainen" wrote:
>> I'd recommend also installing and configuring imapproxy - it can be
>> beneficial with squirrelmail.
>
> Do you have any real world numbers about installations with and without
> imapproxy?

We run imapproxy behind our Roundcube instance, and our old in-house Perl
mail system has a custom equivalent written in C that also does some
caching of folder metadata and message headers. We run a proxy instance
on each of the webmail hosts, with the communication between the web
application and the proxy being done in cleartext, but with the proxy ->
Dovecot communication secured over SSL.

Besides preventing a lot of extra SSL handshakes and login/logout
actions, it also helps tie a user session to a single backend node in our
pool of IMAP servers. It seems like there might also be other benefits to
having Dovecot not tear down all of the user session state between page
loads.

A lot of this stuff might be nice to see in the Director some day. If
there was an option to not immediately close the Director proxy's backend
connections when the user logs out (i.e. leave the connection active and
logged in for X seconds, and reuse it if the user logs in to the Director
again), and if the auth caching works as well as you say, then I could
definitely see a day where we replace imapproxy with a director instance
on the webmail host.

-Brad
Re: [Dovecot] 'doveadm who' enhancement request
On 6/2/10 7:33 PM, "Timo Sirainen" wrote:
>> I wonder if they can stand up to 10k+ concurrent proxied connections
>> though?
>
> I'd think so.

I could probably give that a try, but I'll have a hard time convincing
folks to do that until after 2.0 has been out of beta for a bit. Maybe
after summer term then... October or so? I can certainly start testing it
in advance of that though.

> Also another thought :) I guess you have now two login services for
> imap and pop3, one for proxy and one for backend?

No, just the one for each right now. I haven't figured out how to do that
yet ;) Just multiple imap-login/pop3-login blocks with different
inet_listener addresses?

> You could do the same for auth processes and override the other one's
> settings. Something like:
>
> # disable default auth process for proxy lookups
> service auth {
>   executable = auth -o passdb/proxy/args=ignore.conf
>   unix_listener auth-login {
>     mode = 0
>   }
> }
>
> service auth-proxy {
>   unix_listener auth-login {
>     user = dovecot
>     mode = 0600
>   }
> }

How do I tell different login services to use different auth backends? Is
it the first argument to the login process executable? So like:

service imap-login {
  executable = imap-login auth-proxy
  inet_listener imaps {
    address = 1.2.3.4
  }
}

I'm still trying to grok what all the different config bits mean and
imply.

-Brad
Re: [Dovecot] RTFM: Manual pages for Dovecot v2.0
Pascal,

On 5/31/10 11:40 PM, "Pascal Volk" wrote:
> I've spent some time for the fine manual. What's new?
>
> Location: http://hg.localdomain.org/dovecot-2.0-man
> So I don't have to flood the wiki with attachments.
> As soon as the manual pages are complete, they will be included in the
> Dovecot source tree.

This is fantastic. When I get a moment, I'll definitely read them over. I
spent a good bit of time getting a Dovecot 2.0 test system set up this
weekend and found myself flipping back and forth between Timo's release
announcements, sample configurations in the tarball, and the raw source
code. Doveconf in particular doesn't even seem to provide help text
listing available command line flags, so a man page is very welcome.

Documentation of which configuration options work where would also be
particularly nice to see. The new syntax is incredibly powerful but also
very complex. It appears that there are some things that will pass
doveconf checks but will either cause errors or be ignored by the actual
code at runtime.

-Brad
Re: [Dovecot] A new director service in v2.0 for NFS installations
Timo, On 5/31/10 6:56 PM, "Timo Sirainen" wrote: > > Oh, you're right. For auth settings currently only protocol blocks work. It > was a bit too much trouble to make local/remote blocks to work. :) That's too bad! Any hope of getting support for this and director+proxy_maybe anytime soon? -Brad
Re: [Dovecot] A new director service in v2.0 for NFS installations
Timo, On 5/31/10 5:34 PM, "Brandon Davidson" wrote: > > Still not sure why it's not proxying though. The config looks good but it's > still using PAM even for the external IP. I played with subnet masks instead of IPs and using remote instead of local, as well as setting auth_cache_size = 0, but no dice. It still seems to ignore the block and only use the global definition, even if doveconf -f lip= shows that it's expanding it properly. -Brad
Re: [Dovecot] A new director service in v2.0 for NFS installations
Timo,

On 5/31/10 5:09 PM, "Timo Sirainen" wrote:
>
> Right .. it doesn't work exactly like that I guess. Or I don't remember :)
> Easiest to test with:
>
> doveconf -f lip=128.223.142.138 -n

That looks better:

[r...@cc-popmap7 ~]# doveconf -f lip=128.223.142.138 -h |grep -B1 -A7 passdb
}
passdb {
  args = /etc/dovecot/proxy-sqlite.conf
  deny = no
  driver = sql
  master = no
  pass = no
}
passdb {
  args =
  deny = no
  driver = pam
  master = no
  pass = no
}
plugin {
--
local 128.223.142.138 {
  passdb {
    args = /etc/dovecot/proxy-sqlite.conf
    driver = sql
  }
}

Still not sure why it's not proxying though. The config looks good but it's still using PAM even for the external IP.

-Brad
Re: [Dovecot] A new director service in v2.0 for NFS installations
Timo,

On 5/31/10 4:36 PM, "Timo Sirainen" wrote:
>
> The passdbs and userdbs are checked in the order they're defined. You could
> add them at the bottom. Or probably more easily:
>
> local 128.223.143.138 {
>   passdb {
>     driver = sql
>     args = ..
>   }
>
>   passdb {
>     driver = pam
>   }
>   userdb {
>     driver = passwd
>   }

Ahh, OK. For some reason I was assuming that the best match was used. Unfortunately that doesn't seem to work either. I've got it set up just as you recommended:

[r...@cc-popmap7 ~]# cat /etc/dovecot/dovecot.conf | nl | grep -B1 -A4 passdb
    35  local 128.223.142.138 {
    36    passdb {
    37      driver = sql
    38      args = /etc/dovecot/proxy-sqlite.conf
    39    }
    40  }
    41  passdb {
    42    driver = pam
    43  }
    44  userdb {
    45    driver = passwd

It still doesn't respect the driver for that local block, and uses PAM for everything:

May 31 16:48:16 cc-popmap7 dovecot: auth: Debug: client in: AUTH 1 PLAIN service=imap secured lip=128.223.142.138 rip=128.223.162.22 lport=993 rport=57067 resp=
May 31 16:48:16 cc-popmap7 dovecot: auth: Debug: pam(brandond,128.223.162.22): lookup service=dovecot
May 31 16:48:16 cc-popmap7 dovecot: auth: Debug: pam(brandond,128.223.162.22): #1/1 style=1 msg=Password:
May 31 16:48:16 cc-popmap7 dovecot: auth: Debug: pam(brandond,128.223.162.22): #1/1 style=1 msg=LDAP Password:
May 31 16:48:16 cc-popmap7 dovecot: auth: Debug: client out: OK 1 user=brandond
May 31 16:48:16 cc-popmap7 dovecot: auth: Debug: master in: REQUEST 1 56521d19a5592fd2206241cfc0ca658020b0b
May 31 16:48:16 cc-popmap7 dovecot: auth: Debug: passwd(brandond,128.223.162.22): lookup
May 31 16:48:16 cc-popmap7 dovecot: auth: Debug: master out: USER 1 brandond system_groups_user=brandond uid=41027 gid=91 home=/home10/brandond
May 31 16:48:16 cc-popmap7 dovecot: imap-login: Login: user=, method=PLAIN, rip=128.223.162.22, lip=128.223.142.138, TLS, mailpid=5667

Interestingly enough, if I run 'doveconf -n' it doesn't seem to be retaining the order I specified. The local section is dropped down to the very end:

[r...@cc-popmap7 ~]# doveconf -n | nl | grep -B1 -A4 passdb
    31  }
    32  passdb {
    33    driver = pam
    34  }
    35  plugin {
    36    quota = fs:user:inode_per_mail
--
    82  local 128.223.142.138 {
    83    passdb {
    84      args = /etc/dovecot/proxy-sqlite.conf
    85      driver = sql
    86    }
    87  }

Ideas?

-Brad
Re: [Dovecot] A new director service in v2.0 for NFS installations
Timo,

On 5/31/10 4:13 PM, "Timo Sirainen" wrote:
> You need to put the other passdb/userdb to the external IP:
>
> local 1.2.3.4 {
>>   userdb {
>>     driver = passwd
>>   }
>>   passdb {
>>     driver = sql
>>     args = /etc/dovecot/proxy-sqlite.conf
>>   }
>
> }

It still doesn't seem to work. I tried this, with no userdb/passdb outside a local block:

local 128.223.142.138 {
  userdb {
    driver = passwd
  }
  passdb {
    driver = sql
    args = /etc/dovecot/proxy-sqlite.conf
  }
}

local 10.142.0.162 {
  userdb {
    driver = passwd
  }
  passdb {
    driver = pam
  }
}

But I got this error in the log file upon connecting to the external IP:

May 31 16:20:42 cc-popmap7 dovecot: auth: Fatal: No passdbs specified in configuration file. PLAIN mechanism needs one
May 31 16:20:42 cc-popmap7 dovecot: master: Error: service(auth): command startup failed, throttling
May 31 16:20:42 cc-popmap7 dovecot: master: Error: service(director): child 5339 killed with signal 11 (core dumps disabled)
May 31 16:20:42 cc-popmap7 dovecot: master: Error: service(director): command startup failed, throttling

So I added a global passdb/userdb:

userdb {
  driver = passwd
}
passdb {
  driver = pam
}

local 128.223.142.138 {
  userdb {
    driver = passwd
  }
  passdb {
    driver = sql
    args = /etc/dovecot/proxy-sqlite.conf
  }
}

local 10.142.0.162 {
  userdb {
    driver = passwd
  }
  passdb {
    driver = pam
  }
}

And again it uses the global passdb for all requests, ignoring the contents of the local blocks.

-Brad
Re: [Dovecot] A new director service in v2.0 for NFS installations
Timo,

On 5/31/10 6:04 AM, "Timo Sirainen" wrote:
> Well .. maybe you could use separate services. Have the proxy listen on
> public IP and the backend listen on localhost. Then you can do:
>
> local_ip 127.0.0.1 {
>   passdb {
>     ..
>   }
> }
>
> and things like that. I think it would work, but I haven't actually
> tried.

It doesn't seem to be honoring the passdb setting within the local block. I've got a single host set up with director, and itself listed as a mail server:

director_servers = 128.223.142.138
director_mail_servers = 128.223.142.138

userdb {
  driver = passwd
}
passdb {
  driver = sql
  args = /etc/dovecot/proxy-sqlite.conf
}

local 127.0.0.1 {
  passdb {
    driver = pam
  }
}

If I telnet to localhost and attempt to log in, the logs show:

May 31 14:39:34 cc-popmap7 dovecot: auth: Debug: client in: AUTH 1 PLAIN service=imap secured lip=127.0.0.1 rip=127.0.0.1 lport=143 rport=60417 resp=
May 31 14:39:34 cc-popmap7 dovecot: auth: Debug: sql(brandond,127.0.0.1): query: SELECT null AS password, 'Y' AS nopassword, 'Y' AS proxy
May 31 14:39:34 cc-popmap7 dovecot: auth: Debug: client out: OK 1 user=brandond proxy pass=
May 31 14:39:34 cc-popmap7 dovecot: auth: Debug: client in: AUTH 1 PLAIN service=imap secured lip=128.223.142.138 rip=128.223.142.138 lport=143 rport=44453 resp=
May 31 14:39:34 cc-popmap7 dovecot: auth: Debug: sql(brandond,128.223.142.138): query: SELECT null AS password, 'Y' AS nopassword, 'Y' AS proxy
May 31 14:39:34 cc-popmap7 dovecot: auth: Debug: client out: OK 1 user=brandond proxy pass=
May 31 14:39:34 cc-popmap7 dovecot: imap-login: Error: Proxying loops to itself: user=, method=PLAIN, rip=128.223.142.138, lip=128.223.142.138, secured, mailpid=0
May 31 14:39:34 cc-popmap7 dovecot: auth: Debug: new auth connection: pid=4700
May 31 14:39:34 cc-popmap7 dovecot: imap-login: Disconnected (auth failed, 1 attempts): user=, method=PLAIN, rip=128.223.142.138, lip=128.223.142.138, secured, mailpid=0

Even if the alternate passdb worked, how would I get it to connect to the backend on localhost? It looks like the proxy connection comes in over the external IP even if it's to itself, as the external address is what's specified as the proxy destination by the director.

I do have a private network that I run NFS over; I suppose I could run the proxy on the external, backend on the internal, and use only the internal IPs in the mailserver list. I've also tried that, but it doesn't seem to work either, due to the passdb setting not being honored within local|remote blocks. Even if it did, wouldn't it still complain about the proxy looping back to itself, since both lip and rip would be local addresses? Unless the loopback check just compares to see if they're the same...

Either way, it seems like having proxy_maybe work with the director service would make the whole setup a lot simpler.

> There's not yet a static passdb .. perhaps there should be. But you
> could use e.g. sqlite backend for the proxy and use:
>
> password_query = select null as password, 'Y' as nopassword, 'Y' as
> proxy

That seems to work well enough, with the major caveat noted above.
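For reference, the sqlite-backed static proxy passdb that produced the queries in the log above could look something like this. The file name comes from the thread; the database path is a made-up placeholder, and since the query returns only constants the database contents never matter. A sketch, not a verified config:

```
# /etc/dovecot/proxy-sqlite.conf (sketch; /etc/dovecot/proxy.db is a
# placeholder path -- the query never reads any table)
driver = sqlite
connect = /etc/dovecot/proxy.db

# Every lookup "succeeds" with proxy=y and no password check; the
# director service then fills in the destination host.
password_query = SELECT null AS password, 'Y' AS nopassword, 'Y' AS proxy
```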
Re: [Dovecot] A new director service in v2.0 for NFS installations
Timo, After straightening out some issues with Axel's spec file, I'm back to poking at this. On 5/25/10 3:14 PM, "Timo Sirainen" wrote: > So instead of having separate proxies and mail servers, have only hybrids > everywhere? I guess it would almost work, except proxy_maybe isn't yet > compatible with director. That's actually a bit annoying to implement.. You > could of course run two separate Dovecot instances, but that also can be a bit > annoying. Would I have to run two separate instances, or could I just set up multiple login services on different ports; one set to proxy (forwarding the password to the remote server) and one set to not? I suppose each login service would have to use a different authdb, which I don't know how to do. > No. The director service simply adds "host" field to auth lookup replies if > the original reply had proxy=y but didn't have host field. Interesting. It sounds like proxying requires a database query that will return 'proxy=y' as part of the auth lookup. It would be nice to have a static password authdb for proxying that didn't require a real database backend. I'm using PAM now, and don't see a good way to enable proxying. The wiki also says that there's a way to let the proxy backend handle authentication, but I don't see an example of that anywhere. > Yes. So the connections between the proxies should be pretty fast. I think the > maximum bytes transferred per user is 38. Cool. > The proxies always try to keep connecting to next available server (i.e. if > the next server won't connect, it tries one further away until it finally > connects to something or reaches itself). So the segmentation could happen > only if there was no network connection between the two segments. Ahh, OK - good to know. That sounds like a good way to do it. Can I confirm my understanding of a few other things? 
It looks like the mailserver list is initially populated from director_mail_servers, but can be changed by discovering hosts from other directors or by adding/removing hosts with doveadm. Since the initial host list is not written back into the config file, changes made with doveadm are not persistent across service restarts. Does 'doveadm director ' need to be run against each director individually, or will the changes be sent around the ring? If a new host comes up with a mailserver in its list that has been removed by doveadm, will the handshake remove it from the list?

The list of director servers used to build the ring is read from director_servers, and cannot be changed at runtime. A host finds its position within the ring based on its order within the list, and connects to the hosts to its left and right until it has a connection on either side and can successfully send a test message around the ring. Is that all correct?

What happens if some hosts have only a subset, or different subsets, of a group of hosts in their mail server or director server list?

Thanks!

-Brad
Re: [Dovecot] beta5 builds under RHEL
On 5/30/10 2:49 PM, "Axel Thimm" wrote:
>
> How are your %optflags (which is the same as $RPM_OPT_FLAGS) merged
> into the build if it is not passed to make? And it would yield the
> same CFLAGS as above (merged default optflags with what configure adds
> to it).

They're exported by the %configure macro, and configure writes the combined CFLAGS, CXXFLAGS, and FFLAGS into the Makefile... so it's not necessary (and possibly detrimental) to both export them before configuring and pass them explicitly to make, as the command-line CFLAGS option overrides the Makefile CFLAGS declaration that includes -std=gnu99.

My point is, if I don't include CFLAGS="..." in my call to make in the spec file, it builds fine, and *does* include all the necessary optflags. Give it a try.

-Brad
Re: [Dovecot] beta5 builds under RHEL
Axel,

On 5/30/10 10:22 AM, "Axel Thimm" wrote:
>>
>> Oh, the spec file overrides CFLAGS and doesn't contain -std=gnu99?
>>
>
> The config.log for RHEL5/x86_64 says:
>
> CFLAGS='-std=gnu99 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2
> -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64
> -mtune=generic -Wall -W -Wmissing-prototypes -Wmissing-declarations
> -Wpointer-arith -Wchar-subscripts -Wformat=2 -Wbad-function-cast
> -Wstrict-aliasing=2 -I/usr/kerberos/include '

It may be a specfile issue after all. %configure exports CFLAGS before calling ./configure, which should be sufficient to get any needed options into the Makefile, merged with whatever configure auto-detects (including -std=gnu99). Your spec also calls make with CFLAGS="$RPM_OPT_FLAGS", which overrides everything and omits -std=gnu99 unless it is specifically included by the packager. If I remove that and just call 'make', it works fine - my %optflags are merged in with the CFLAGS from configure and the build completes without error.

-Brad
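A sketch of the %build section as described above (this is not Axel's actual spec file, just the shape of the fix being proposed):

```
%build
# %configure exports CFLAGS=$RPM_OPT_FLAGS before running ./configure,
# and configure merges in its own additions such as -std=gnu99,
# writing the combined flags into the generated Makefile.
%configure

# A plain make picks up the merged CFLAGS from the Makefile. Passing
# CFLAGS="$RPM_OPT_FLAGS" on the make command line would override the
# Makefile's CFLAGS and silently drop -std=gnu99.
make %{?_smp_mflags}
```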
Re: [Dovecot] beta5 builds under RHEL
Axel, On 5/30/10 3:39 AM, "Axel Thimm" wrote: > > Now it is more consistent and looks like a change between 4.1.2 and > 4.4.1. > > Maybe in the older gcc -std=gnu99 didn't set __USE_ISOC99 and thus the > missing constants were not defined? If I '%define optflags -std=gnu99' in the spec it builds just fine, so I don't think it's a compiler problem. Maybe a libtool issue? -Brad
Re: [Dovecot] beta5 builds under RHEL
Axel,

On 5/30/10 12:05 AM, "Axel Thimm" wrote:
> beta4 built under RHEL4, RHEL5 and RHEL6 (the latter being the public
> beta). beta5 now builds only for RHEL5, the other two fail with:
>
> strnum.c: In function `str_to_llong':
> strnum.c:139: error: `LLONG_MIN' undeclared (first use in this function)
> strnum.c:139: error: (Each undeclared identifier is reported only once
> strnum.c:139: error: for each function it appears in.)

FWIW, the build fails with the same error within my CentOS 5 Mock build environment. I'm not sure what I've got set up differently than you, but I'm using a slightly tweaked version of your spec file and a pretty vanilla Mock 0.6 setup.

-Brad
Re: [Dovecot] quick question
Hi David,

> -----Original Message-----
> From: David Halik
>
> It looks like we're still working towards a layer 7 solution anyway.
> Right now we have one of our student programmers hacking Perdition with
> a new plugin for dynamic username caching, storage, and automatic failover.
> If we get it working I can send you the basics if you're interested.

I'd definitely be glad to take a look at what you come up with! I'm still leaning towards MySQL with quick local fallback, but I'm nowhere near committed to anything.

On a side note, we've been running with the two latest maildir patches in production for a few days now. The last few days we've been seeing a lot of lock failures:

Feb 10 04:06:02 cc-popmap6p dovecot: imap-login: Login: user=, method=PLAIN, rip=67.223.67.45, lip=128.223.142.39, TLS, mailpid=12881
Feb 10 04:08:03 oh-popmap3p dovecot: imap-login: Login: user=, method=PLAIN, rip=67.223.67.45, lip=128.223.142.39, TLS, mailpid=9569
Feb 10 04:09:02 cc-popmap6p dovecot: imap: user=, rip=67.223.67.45, pid=12881: Timeout while waiting for lock for transaction log file /home6/pellerin/.imapidx/.INBOX/dovecot.index.log
Feb 10 04:09:02 cc-popmap6p dovecot: imap: user=, rip=67.223.67.45, pid=12881: Our dotlock file /home6/pellerin/Maildir/dovecot-uidlist.lock was modified (1265803562 vs 1265803684), assuming it wa
Feb 10 04:09:02 cc-popmap6p dovecot: imap: user=, rip=67.223.67.45, pid=12881: Connection closed bytes=31/772
Feb 10 04:11:04 oh-popmap3p dovecot: imap: user=, rip=67.223.67.45, pid=9569: Timeout while waiting for lock for transaction log file /home6/pellerin/.imapidx/.INBOX/dovecot.index.log
Feb 10 04:11:04 oh-popmap3p dovecot: imap: user=, rip=67.223.67.45, pid=9569: Our dotlock file /home6/pellerin/Maildir/dovecot-uidlist.lock was deleted (locked 180 secs ago, touched 180 secs ago)
Feb 10 04:11:04 oh-popmap3p dovecot: imap: user=, rip=67.223.67.45, pid=9569: Connection closed bytes=18/465

I'm not sure if this is just because it's trying more diligently to make sure it's got the latest info, and is therefore hitting locks where it didn't previously... but it's been hanging our clients and requiring manual intervention to clear. We've been removing the lock file and killing any active dovecot sessions, which seems to resolve things for a while. Just thought I'd see if this was happening to anyone else.

-Brad
Re: [Dovecot] quick question
Hi David, > -Original Message- > From: David Halik > > I've been running both patches and so far they're stable with no new > crashes, but I haven't really seen any "better" behavior, so I don't > know if it's accomplishing anything. =) > > Still seeing entire uidlist list dupes after the list goes stale. I > think that was what we were originally discussing. I wasn't able to roll the patched packages into production until this morning, but so far I'm seeing the same thing as you - no real change in behavior. I guess that brings us back to Timo's possibility number two? -Brad
Re: [Dovecot] proxy_maybe regex
David, > -Original Message- > From: dovecot-bounces+brandond=uoregon@dovecot.org [mailto:dovecot- > > There are ways of doing this in mysql, with heartbeats etc (which we've > discussed before), but then I'm back to mysql again. Maybe mysql just > has to be the way to go in this case. > > Brad, any more investigation into this? I've been mulling it over in my head, but haven't had a chance to actually build up a test environment and start playing with it yet. I got some other things (Blackboard, for those that can sympathize) dropped in my lap, and that's been consuming the majority of my time. I do like the possibility of falling back to a local connection if the database goes away. I am curious to see how it behaves if the database is corrupt, database server is down, host is offline, and so on. All that plus figuring out the best schema, queries, cleanup, etc of course ;) -Brad
Re: [Dovecot] quick question
Timo, On 1/25/10 12:31 PM, "Timo Sirainen" wrote: > > I don't think it's immediate.. But it's probably something like: > > - notice it's not working -> reconnect > - requests are queued > - reconnect fails, hopefully soon, but MySQL connect at least fails in max. > 10 seconds > - reconnect timeout is added, which doubles after each failure > - requests are failed while it's not trying to connect Hmm, that's not great. Is that tunable at all? Cursory examination shows that it's hardcoded in src/lib-sql/driver-mysql.c, so I guess not. I suppose I could also get around to playing with multi-master replication so I at least have a SQL server available at each of the sites that I have Dovecot servers... -Brad
Re: [Dovecot] quick question
Timo,

> -----Original Message-----
> From: Timo Sirainen [mailto:t...@iki.fi]
>
> On 25.1.2010, at 21.30, Brandon Davidson wrote:
> > If it could be set up to just fall back to
> > using a local connection in the event of a SQL server outage, that might
> > help things a bit. Anyone know how that might work?
>
> Well, you can always fall back to LDAP if SQL isn't working.. Just something
> like:
>
> passdb sql {
>   ..
> }
> passdb ldap {
>   ..
> }

Or just 'passdb pam { ... }' for the second one in our case, since we're using system auth with pam_ldap/nss_ldap. Is the SQL connection/query timeout configurable? It would be nice to make a very cursory attempt at proxying, and immediately give up and use a local connection if anything isn't working.

-Brad
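Spelled out for this setup, the fallback Brad describes would look something like the v1.x-style fragment below. The SQL args file name is a placeholder, and whether auth falls through cleanly on a SQL *server outage* (as opposed to a user-not-found result) is exactly what the thread is questioning:

```
# Sketch: try the SQL proxy lookup first; if it yields nothing, fall
# back to PAM (system auth via pam_ldap/nss_ldap).
# /etc/dovecot/dovecot-sql.conf is a placeholder file name.
passdb sql {
  args = /etc/dovecot/dovecot-sql.conf
}
passdb pam {
}
```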
Re: [Dovecot] quick question
David,

> Though we aren't using NFS we do have a BigIP directing IMAP and POP3
> traffic to multiple dovecot stores. We use mysql authentication and the
> "proxy_maybe" option to keep users on the correct box. My tests using an
> external proxy box didn't significantly reduce the load on the stores
> compared to proxy_maybe. And you don't have to manage another
> box/config. Since you only need to keep users on the _same_ box and not
> the _correct_ box, if you're using mysql authentication you could hash
> the username or domain to a particular IP address:
>
> SELECT CONCAT('192.168.1.', ORD(UPPER(SUBSTRING('%d', 1, 1)))) AS host,
> 'Y' AS proxy_maybe, ...
>
> Just assign IP addresses 192.168.1.48-90 to your dovecot servers. Shift
> the range by adding or subtracting to the ORD. A mysql function would
> likely work just as well. If a server goes down, move its IP. You could
> probably make pairs with heartbeat or some monitoring software to do it
> automatically.

Timo posted a similar suggestion recently, and I might try to find some time to proof this out over the next few weeks. I liked his idea of storing the user's current server in the database and proxying to that, with fallback to a local connection if they're new or their current server is unavailable. The table cleanup and pool monitoring would probably be what I'd worry most about testing.

Unfortunately we're currently using LDAP auth via PAM... so even if I could get the SQL and monitoring issues resolved, I think I'd have a hard time convincing my peers that adding a SQL server as a single point of failure was a good idea. If it could be set up to just fall back to using a local connection in the event of a SQL server outage, that might help things a bit. Anyone know how that might work?

-Brad
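The affinity scheme Brad describes (remember each user's current server, proxy_maybe to it) could be sketched as a dovecot-sql.conf fragment. The user_host table, its column names, and the connect line are all assumptions, not anything from the thread, and the password handling is elided just as in David's example:

```
# dovecot-sql.conf sketch (hypothetical: user_host table, db host, and
# its upkeep are assumptions; some external job would have to record
# and expire each user's current server).
driver = mysql
connect = host=db.example.com dbname=mail user=dovecot

# If the user has a recorded host, proxy_maybe sends them there (or
# handles the login locally when the host is this server). If no row
# comes back, auth can fall through to a later passdb for a local login.
password_query = \
  SELECT host, 'Y' AS proxy_maybe, ... \
  FROM user_host WHERE username = '%u'
```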
Re: [Dovecot] quick question
David,

> -----Original Message-----
> From: David Halik [mailto:dha...@jla.rutgers.edu]
>
> *sigh*, it looks like there still might be the occasional user-visible
> issue. I was hoping that once the assert stopped happening, and the
> process stayed alive, that the users wouldn't see their inbox disappear
> and reappear. Apparently, this is still happening occasionally.
>
> I just had a user experience this with TB 2, and after looking at the logs
> I found the good ole' stale nfs message:

Hmm, that's disappointing to hear. I haven't received any new reports from our helpdesk, so maybe it's at least less visible?

> For now they'll just have to live with it until I either get proxy_maybe
> setup, or some other solution.

Let me know if you come up with anything. I'm not sure we want to add MySQL as a dependency for our mail service... but I'm at least curious to see how things perform with session affinity. I'll add it to my long list of things to play with when I have time for such things...

-Brad
Re: [Dovecot] quick question
David,

On 1/22/10 12:34 PM, "David Halik" wrote:
>
> We currently have IP session 'sticky' on our L4's and it didn't help all
> that much. Yes, it reduces thrashing on the backend, but ultimately it
> won't help the corruption. Like you said, multiple logins will still go
> to different servers when the IPs are different.
>
> How is your webmail architecture set up? We're using imapproxy to spread
> them out across the same load balancer, so essentially all traffic
> from outside and inside gets balanced. The trick is we have an internal
> load-balanced virtual IP that spreads the load out for webmail on
> private IP space. If they were to go outside they would get NAT'd as one
> outbound IP, so we just go inside and get the benefit of balancing.

We have two webmail interfaces - one is an old in-house open-source project called Alphamail, the new one is Roundcube. Both of them point at the same VIP that we point users at, with no special rules. We're running straight round-robin L4 connection distribution, with no least-connections or sticky-client rules. We've been running this way for about 3 years, I think... I've only been here a year. We made a number of changes in sequence starting about three and a half years ago - Linux NFS to Netapp, Courier to Dovecot, mbox to Maildir+, LVS to F5 BigIP; not necessarily in that order. At no point have we ever had any sort of session affinity.

> That's where we are, and as long as the corruptions stay user-invisible,
> I'm fine with it. Crashes seem to be the only user-visible issue so far,
> with "noac" being out of the question unless they buy a ridiculously
> expensive filer.

Yeah, as long as the users don't see it, I'm happy to live with the messages in the log file.

-Brad
Re: [Dovecot] quick question
Cor, On 1/22/10 1:05 PM, "Cor Bosman" wrote: > > Pretty much the same as us as well. 35 imap servers. 10 pop servers. > clustered pair of 6080s, with about 250 15K disks. We're seeing some > corruption as well. I myself am using imap extensively and regularly have > problems with my inbox disappearing. Im not running the patch yet though. Is > 1.2.10 imminent or should i just patch 1.2.9? You guys must serve a pretty heavy load. What's your peak connection count across all those machines? How's the load? We recently went through a hardware replacement cycle, and were targeting < 25% utilization at peak load so we can lose one of our sites (half of our machines are in each site) without running into any capacity problems. We're actually at closer to 10% at peak, if that... Probably less now that we've disabled noac. Dovecot is fantastic :) -Brad
Re: [Dovecot] quick question
David,

> -----Original Message-----
> From: dovecot-bounces+brandond=uoregon@dovecot.org [mailto:dovecot-
>
> Our physical setup is 10 Centos 5.4 x86_64 IMAP/POP servers, all with
> the same NFS backend where the index, control, and Maildirs for the
> users reside. Accessing this are direct connections from clients, plus
> multiple squirrelmail webservers, and pine users, all at the same time,
> with layer 4 switch connection load balancing.
>
> Each server has an average of about 400 connections, for a total of
> around 4000 concurrent during a normal business day. This is out of a
> possible user population of about 15,000.
>
> All our dovecot servers syslog to one machine, and on average I see
> about 50-75 instances of file corruption per day. I'm not counting each
> line, since some instances of corruption generate a log message for each
> uid that's wrong. This is just me counting "user A was corrupted once at
> 10:00, user B was corrupted at 10:25", for example.

We have a very similar setup - 8 POP/IMAP servers running RHEL 5.4, Dovecot 1.2.9 (+ patches), an F5 BigIP load balancer cluster (active/standby) in an L4 profile distributing connections round-robin, maildirs on two Netapp filers (clustered 3070s with 54k RPM SATA disks), 10k peak concurrent connections for 45k total accounts.

We used to run with the noac mount option, but performance was abysmal, and we were approaching 80% CPU utilization on the filers at peak load. After removing noac, our CPU is down around 30%, and our NFS ops/sec rate is maybe 1/10th of what it used to be. The downside to this is that we've started seeing significantly more crashing and mailbox corruption. Timo's latest patch seems to have fixed the crashing, but the corruption just seems to be the cost of distributing users at random across our backend servers.
We've thought about enabling IP-based session affinity on the load balancer, but this would concentrate the load of our webmail clients, as well as not really solve the problem for users that leave clients open on multiple systems. I've done a small bit of looking at nginx's IMAP proxy support, but it's not really set up to do what we want, and it would require moving the IMAP virtual server off our load balancers and on to something significantly less supportable. Having the dovecot processes 'talk amongst themselves' to synchronize things, or go into proxy mode automatically, would be fantastic.

Anyway, that's where we're at with the issue. As a data point for your discussion with your boss:

* With 'noac', we would see maybe one or two 'Corrupt' errors a day. Most of these were related to users going over quota.
* After removing 'noac', we saw 5-10 'Corrupt' errors and 20-30 crashes a day. The crashes were highly visible to the users, as their mailbox would appear to be empty until the rebuild completed.
* Since applying the latest patch, we've seen no crashes, and 60-70 'Corrupt' errors a day. We have not had any new user complaints.

Hope that helps,

-Brad
Re: [Dovecot] 1.2.9 imap crash with backtrace
Hi David,

On 1/14/10 3:13 PM, "David Halik" wrote:
>
> FYI, we backed out of the "noac" change today. When our 20K accounts
> started coming to work, the NetApp NFS server was pushing 70% CPU usage
> and 25K NFS ops/s, which resulted in all kinds of other havoc as normal
> services started becoming slow. This server usually runs around 25% and
> 5K, so such a large increase in load was too much to handle.
>
> During the 12-hour window I didn't see a single uid error, as expected,
> but the fix was worse than the problem.

We're pretty loath to go back to noac as well. We will probably disable process log throttling (mail_log_max_lines_per_sec = 0) to increase the reindex speed until Timo comes up with a fix for the crash. This should at least help the users "get their mail back" in a more reasonable timeframe.

-Brad
Re: [Dovecot] 1.2.9 imap crash with backtrace
Timo, > -Original Message- > From: Timo Sirainen > > 1721 is not in the recs[] list, since it's sorted and the first one is 1962. > > So there's something weird going on why it's in the filename hash table, but > not in the array. I'll try to figure it out later.. I hope your move is going well, and you get settled in and your internet hooked up soon. It's got to be a rough process! Just for the record, we continue to see this crash fairly frequently with a small subset of our users, enough so that they have started to complain to the helpdesk staff about their mail 'disappearing and then reappearing.' One user in particular has a mail client left open from three hosts and has hit it 23 times in the last week, and 10 times today. If there's any more information I can collect or anything I can do to help get this resolved, please let me know! -Brad
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
Timo,

On 12/23/09 8:37 AM, "David Halik" wrote:
> I switched all of our servers to dotlock_use_excl=no last night, but
> we're still seeing the errors:

We too have set dotlock_use_excl = no. I'm not seeing the "Stale NFS file handle" message any more, but I am still seeing a crash. The crashes seem to be leaving the indexes in a bad state:

Dec 23 09:07:44 oh-popmap3p dovecot: imap: user=, rip=x.x.x.x, pid=30101: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
Dec 23 09:07:44 oh-popmap3p dovecot: imap: user=, rip=x.x.x.x, pid=30101: Raw backtrace: imap [0x4d8986] -> imap [0x4d97b0] -> imap(i_fatal+0) [0x4d8c7a] -> imap [0x44f2cc] -> imap [0x44f814] -> imap [0x4500a2] -> imap(maildir_uidlist_refresh+0x9d) [0x450686] -> imap [0x44bff1] -> imap [0x44c0a8] -> imap [0x44c178] -> imap(maildir_storage_sync_init+0x7c) [0x44c6e6] -> imap(mailbox_sync_init+0x44) [0x489922] -> imap(imap_sync_init+0xab) [0x42e02b] -> imap [0x41ccc4] -> imap [0x41cd26] -> imap [0x4733be] -> imap [0x4e4171] -> imap(io_loop_handle_timeouts+0x1d) [0x4e41ce] -> imap(io_loop_handler_run+0x86) [0x4e4f29] -> imap(io_loop_run+0x3b) [0x4e4214] -> imap(main+0xa6) [0x4300af] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x3217e1d994] -> imap [0x419aa9]
Dec 23 09:07:45 oh-popmap3p dovecot: dovecot: child 30101 (imap) killed with signal 6 (core dumped)
Dec 23 09:09:16 cc-popmap3p dovecot: imap: user=, rip=x.x.x.x, pid=5975: Corrupted index cache file /home16/cnisser/.imapidx/.INBOX/dovecot.index.cache: invalid record size
Dec 23 09:09:17 oh-popmap2p dovecot: imap: user=, rip=y.y.y.y, pid=3279: read() failed with index cache file /home16/cnisser/.imapidx/.INBOX/dovecot.index.cache: Input/output error
Dec 23 09:09:38 cc-popmap3p dovecot: imap: user=, rip=x.x.x.x, pid=5975: Corrupted index cache file /home16/cnisser/.imapidx/.INBOX/dovecot.index.cache: invalid record size
Dec 23 09:18:12 cc-popmap3p dovecot: imap: user=, rip=x.x.x.x, pid=5975: Corrupted index cache file /home16/cnisser/.imapidx/.INBOX/dovecot.index.cache: invalid record size

We're also seeing another odd error that seems to be unrelated to the crashes, but seemed like it bears reporting. Reading of uidlists and cache files seems to intermittently fail with EIO. It doesn't seem to tie in with anything else, and I don't see any corresponding NFS errors in the system log.

Dec 23 09:31:06 oh-popmap4p dovecot: imap: user=, rip=a.a.a.a, pid=7641: read(/home6/joet/Maildir/dovecot-uidlist) failed: Input/output error
Dec 23 09:53:17 cc-popmap2p dovecot: imap: user=, rip=b.b.b.b, pid=12840: read(/home3/catm/Maildir/dovecot-uidlist) failed: Input/output error
Dec 23 09:59:38 cc-popmap5p dovecot: imap: user=, rip=c.c.c.c, pid=13539: read() failed with index cache file /home15/kforrist/.imapidx/.INBOX/dovecot.index.cache: Input/output error

-Brad
Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
We've started seeing the maildir_uidlist_records_array_delete assert crash as well. It always seems to be preceded by a 'stale NFS file handle' error from the same user on a different connection.

Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=, rip=a.a.a.a, pid=2439: fdatasync(/home11/apbao/Maildir/dovecot-uidlist) failed: Stale NFS file handle
Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=, rip=a.a.a.a, pid=2439: /home11/apbao/Maildir/dovecot-uidlist: next_uid was lowered (2642 -> 2641, hdr=2641)
Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=, rip=b.b.b.b, pid=28088: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=, rip=b.b.b.b, pid=28088: Raw backtrace: imap [0x4d8986] -> imap [0x4d97b0] -> imap(i_fatal+0) [0x4d8c7a] -> imap [0x44f2cc] -> imap [0x44f814] -> imap [0x4500a2] -> imap(maildir_uidlist_refresh+0x9d) [0x450686] -> imap [0x44bff1] -> imap [0x44c0a8] -> imap [0x44c178] -> imap(maildir_storage_sync_init+0x7c) [0x44c6e6] -> imap(mailbox_sync_init+0x44) [0x489922] -> imap(imap_sync_init+0xab) [0x42e02b] -> imap [0x42f107] -> imap(cmd_sync_delayed+0x1c6) [0x42f663] -> imap(client_handle_input+0x119) [0x4244d4] -> imap(client_input+0xb4) [0x424594] -> imap(io_loop_handler_run+0x17d) [0x4e5020] -> imap(io_loop_run+0x3b) [0x4e4214] -> imap(main+0xa6) [0x4300af] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c4ea1d994] -> imap [0x419aa9]
Dec 22 11:17:26 cc-popmap2p dovecot: dovecot: child 28088 (imap) killed with signal 6 (core dumped)
Dec 22 13:16:49 cc-popmap3p dovecot: imap: user=, rip=x.x.x.x, pid=3908: fdatasync(/home2/ndunn/Maildir/dovecot-uidlist) failed: Stale NFS file handle
Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=, rip=y.y.y.y, pid=3228: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=, rip=y.y.y.y, pid=3228: Raw backtrace: imap [0x4d8986] -> imap [0x4d97b0] -> imap(i_fatal+0) [0x4d8c7a] -> imap [0x44f2cc] -> imap [0x44f814] -> imap [0x4500a2] -> imap(maildir_uidlist_refresh+0x9d) [0x450686] -> imap [0x44bff1] -> imap [0x44c0a8] -> imap [0x44c178] -> imap(maildir_storage_sync_init+0x7c) [0x44c6e6] -> imap(mailbox_sync_init+0x44) [0x489922] -> imap(imap_sync_init+0xab) [0x42e02b] -> imap [0x42f107] -> imap(cmd_sync_delayed+0x1c6) [0x42f663] -> imap(client_handle_input+0x119) [0x4244d4] -> imap(client_input+0xb4) [0x424594] -> imap(io_loop_handler_run+0x17d) [0x4e5020] -> imap(io_loop_run+0x3b) [0x4e4214] -> imap(main+0xa6) [0x4300af] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e5021d994] -> imap [0x419aa9]
Dec 22 13:25:16 cc-popmap3p dovecot: dovecot: child 3228 (imap) killed with signal 6 (core dumped)

I will note that we did not start seeing this crash until we took 'noac' out of our NFS mount options, as discussed on this list late last week. On the other hand, load on our NFS server (as measured in IOPS) has dropped by a factor of 10.

-Brad

> -----Original Message-----
> From: dovecot-bounces+brandond=uoregon@dovecot.org [mailto:dovecot-
> bounces+brandond=uoregon@dovecot.org] On Behalf Of David Halik
> Sent: Tuesday, December 22, 2009 7:48 AM
> To: dovecot@dovecot.org
> Subject: Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
>
> I'm seeing both of these dumps on multiple users now with 1.2.9, so I
> went ahead and did backtraces for them both.
>
> maildir_uidlist_records_array_delete panic: http://pastebin.com/f20614d8
> ns_get_listed_prefix panic: http://pastebin.com/f1420194c
>
> On 12/21/2009 12:43 PM, David Halik wrote:
> >
> > Just wanted to update you that I just upgraded all of our servers to
> > 1.2.9 and I'm still seeing the array_delete panic:
> >
> > Dec 21 12:10:16 gehenna11.rutgers.edu dovecot: IMAP(user1): Panic:
> > file maildir-uidlist.c: line 403
> > (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
> > Dec 21 12:15:12 gehenna19.rutgers.edu dovecot: IMAP(user2): Panic:
> > file maildir-uidlist.c: line 403
> > (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
> >
> > I also started receiving a good deal of these, but only from one user
> > so far:
> >
> > Dec 21 12:16:42 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic:
> > file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed:
> > (match == IMAP_MATCH_YES)
> > Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic:
> > file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed:
> > (match == IMAP_MATCH_YES)
> > Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic:
> > file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed:
> > (match == IMAP_MATCH_YES)
> > Dec 21 12:19:57 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic:
> > file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed:
> > (match == IMAP_MATCH_YES)
[Dovecot] Maildir on NFS - attribute caching question
Hi Timo,

We've been running Dovecot with Maildir on NFS for quite a while - since back in the 1.0 days, I believe; I'm somewhat new here. Anyway... The wiki article on NFS states that 1.1 and newer will flush attribute caches if necessary with mail_nfs_storage=yes. We're running 1.2.8 with that set, as well as mail_nfs_index=yes, mmap_disable=yes and fsync_disable=no. We have a pool of POP/IMAP and SMTP machines that are accessing the maildirs, and can't guarantee any sort of user session affinity to a particular host.

We also mount our NFS shares with 'noac', which is what I'm writing to ask about. I'd like to stop doing that for performance reasons. Do you see any issues with taking that out of the mount options, given our environment?

Thanks,
-Brad
Re: [Dovecot] 1.2.7: recs[i]->uid < rec-> uid
Timo,

> -----Original Message-----
> > I'm not really sure why these are happening. I anyway changed them from
> > being assert-crashes to just logged errors. I'm interested to find out
> > what it logs now and if there are any user-visible errors.
> > http://hg.dovecot.org/dovecot-1.2/rev/e47eb506eebd
>
> FWIW, I'm seeing this on 1.2.8 as well - just for one user so far. I'll try
> applying this patch, and report if I see anything else logged.

The user who encountered an assert crash prior to this patch now seems to be working properly. I am not aware of any errors presented to the client, but the logs show the following on the first login after application of the patch:

Nov 25 07:51:28 oh-popmap1p dovecot: imap: user=, rip=x.x.x.x, pid=13702: /home6/youm/Maildir/.Deleted Messages/dovecot-uidlist: uid=24464 exists in index, but not in uidlist
Nov 25 07:51:28 oh-popmap1p dovecot: imap: user=, rip=x.x.x.x, pid=13702: /home6/youm/Maildir/.Deleted Messages/dovecot-uidlist: uid=24520 exists in index, but not in uidlist
Nov 25 07:51:28 oh-popmap1p dovecot: imap: user=, rip=x.x.x.x, pid=13702: /home6/youm/Maildir/.Deleted Messages/dovecot-uidlist: uid=24532 exists in index, but not in uidlist

I have not seen it repeated since.

-Brad
Re: [Dovecot] 1.2.7: recs[i]->uid < rec-> uid
> -----Original Message-----
> On Sun, 2009-11-22 at 23:54 +0100, Edgar Fuß wrote:
> > I'm getting this Panic with some users on dovecot-1.2.7:
> >
> > Panic: file maildir-uidlist.c: line 1242
> > (maildir_uidlist_records_drop_expunges): assertion failed:
> > (recs[i]->uid < rec->uid)
>
> I'm not really sure why these are happening. I anyway changed them from
> being assert-crashes to just logged errors. I'm interested to find out
> what it logs now and if there are any user-visible errors.
> http://hg.dovecot.org/dovecot-1.2/rev/e47eb506eebd

FWIW, I'm seeing this on 1.2.8 as well - just for one user so far. I'll try applying this patch, and report if I see anything else logged.

For the record, the old epoll_ctl issue was resolved by the patch that reordered the fd closes. We ran 1.2.6 with that patch for quite a while and it never recurred.

-Brad
Re: [Dovecot] pop3-login: Fatal: io_loop_handle_add: epoll_ctl(1, 5):
Hi Timo,

> -----Original Message-----
> From: Timo Sirainen [mailto:t...@iki.fi]
>
> On Thu, 2009-10-29 at 12:08 -0700, Brandon Davidson wrote:
> > I haven't applied the fd leak detection patch, but I do have lsof output
> > and a core file available here:
> > http://uoregon.edu/~brandond/dovecot-1.2.6/
>
> There's no 0,12 in the lsof list.. Annoying, I can't seem to find what
> it is. 0,10 is inotify, 0,11 is epoll, but 0,12 just doesn't show up
> anywhere.

It looks like eventpoll uses dynamic minor allocation (.minor = MISC_DYNAMIC_MINOR), so it could well be that this is just what it got on his system because something else loaded and requested a dynamic minor before eventpoll did. A better check (if one is necessary) might be to see whether the minor of the leaked device differs from the minor of the epoll device right after creation.

> The core file is also pretty useless without the exact same binaries and
> libraries that produced it.

RPMs are now in that directory.

> You could also set login_process_per_connection=no and this should go
> away, because then it only creates login processes at startup and can't
> fail randomly later.

Are there any downsides to doing this?

-Brad
Re: [Dovecot] pop3-login: Fatal: io_loop_handle_add: epoll_ctl(1, 5):
We've had this reoccur twice this week. In both cases, it seems to hit a swath of machines all within a few minutes. For some reason it's been limited to the master serving POP3 only. In all cases, the logging socket at fd 5 had gone missing.

I haven't applied the fd leak detection patch, but I do have lsof output and a core file available here: http://uoregon.edu/~brandond/dovecot-1.2.6/

Timo, is there anything else I can collect to assist in debugging this? I'd rather not go back to 1.2.4, but my coworkers are becoming annoyed at having to restart the master processes every few days.

-Brad
Re: [Dovecot] pop3-login: Fatal: io_loop_handle_add: epoll_ctl(1, 5):
Hi Marco,

On 10/22/09 1:50 AM, "Marco Nenciarini" wrote:
> This morning it happened another time, another time during the daily
> cron execution.
>
> Oct 22 06:26:57 server dovecot: pop3-login: Panic: Leaked file fd 5: dev
> 0.12 inode 1005
> Oct 22 06:26:57 server dovecot: dovecot: Temporary failure in creating
> login processes, slowing down for now
> Oct 22 06:26:57 server dovecot: dovecot: child 21311 (login) killed with
> signal 6 (core dumps disabled)
>
> I have dovecot 1.2.6 with Timo's patch to check leaked descriptors.

I rebuilt the binaries on our hosts with optimization disabled, and I'm still waiting for it to reoccur so I can gather file descriptor information and a core. I don't have the leak-detect patch applied.

Let's see what Timo has to say about that log file bit. Since it seems to happen to you fairly frequently, it might be worth enabling core dumps as well?

-Brad
Re: [Dovecot] NFS random redirects
Thomas,

On 10/22/09 1:29 AM, "Thomas Hummel" wrote:
> On Wed, Oct 21, 2009 at 09:39:22AM -0700, Brandon Davidson wrote:
>> As a contrasting data point, we run NFS + random redirects with almost no
>> problems.
>
> Thanks for your answer as well.
>
> What mailbox format are you using ?

We switched to Maildir a while back due to performance issues with mbox, primarily centered around locking and the cost of rewriting the entire file when one message changes. We haven't looked back since.

Our config is pretty vanilla - users in LDAP (via pam_ldap), standard UNIX home directory layout, Sendmail on the MTA hosts.

-Brad
Re: [Dovecot] NFS random redirects
On 10/21/09 8:59 AM, "Guy" wrote:
> Our current setup uses two NFS mounts accessed simultaneously by two
> servers. Our load balancing tries to keep a user on the same server whenever
> possible. Initially we just had roundrobin load balancing which led to index
> corruption.
> The problems we've had with that corruption have simply been that some
> messages are displayed twice or not displayed at all in mail clients.
> Deletion of the corrupted index allowed Dovecot to recreate it correctly, so
> the client can't do anything about it. You'd probably have to do it manually
> or have some sort of web interface for users to do it themselves.
>
> I certainly wouldn't use NFS with multiple servers accessing it again for
> Dovecot. Looking at a clustered FS on SAN solution at the moment.

As a contrasting data point, we run NFS + random redirects with almost no problems. We host ~7TB of mail for ~45k users with a peak of 10k concurrent IMAP connections, and maybe a handful of POP3. We make absolutely no effort to ensure that connections from the same user or IP are routed to the same server.

We do occasionally see index corruption, but it is almost always related to the user going over quota and Dovecot being unable to write to the logs. If we wanted to solve this problem, we could move the indexes off to a second tier of storage; it is a very minor issue, though. Locking has not been a problem at all.

I will say that this may be a situation where you get what you pay for. We've invested a fair amount of money in our storage system (NetApp), server pool (RHEL5), and networking technology (F5 BigIP LTM). Our mail is spread across 16 volumes on two filers, and we are careful to stress-test the servers and storage backend before rolling out major upgrades.

That is not, of course, to neglect the value of things that are free - like Dovecot! Many thanks to Timo for maintaining such a wonderful piece of software!

-Brad
Re: [Dovecot] master:Error @2.0 TLS login, "Too large auth data_size sent"
On Red Hat based distros, do:

echo 'DAEMON_COREFILE_LIMIT="unlimited"' >> /etc/sysconfig/dovecot && service dovecot restart

Might be worth putting in the wiki if it's not there already?

-Brad

> -----Original Message-----
> ==> /var/log/dovecot/dovecot.log <==
> Oct 15 09:07:33 master: Info: Dovecot v2.0.alpha1 starting up (core
> dumps disabled)
>
> how do i enable coredumps?
Re: [Dovecot] Dovecot 1.2.6 segfault in imap_fetch_begin
Hi Timo,

> -----Original Message-----
> From: Timo Sirainen [mailto:t...@iki.fi]
>
> This just shouldn't be happening. Are you using NFS? Anyway this should
> replace the crash with a nicer error message:
> http://hg.dovecot.org/dovecot-1.2/rev/6c6460531514

Yes, we've got a pool of servers with Maildir on NFS and quotas enabled. Occasionally users run out of space and the indexes get corrupted or out of sync. Our Helpdesk staff will increase their quota or help them delete things, and Dovecot logs a stream of "Expunged message reappeared" and "Duplicate file entry" messages as it straightens things out. This is a fairly common occurrence given the size of our user base, so I'm assuming this is the root cause... but this is the first time I've seen Dovecot crash as a result.

-Brad
Re: [Dovecot] pop3-login: Fatal: io_loop_handle_add: epoll_ctl(1, 5):
I seem to have run into the same issue on two of our 12 Dovecot servers this morning:

Oct 15 03:41:51 oh-popmap5p dovecot: dovecot: child 7529 (login) returned error 89 (Fatal failure)
Oct 15 03:41:51 oh-popmap5p dovecot: dovecot: child 7532 (login) returned error 89 (Fatal failure)
Oct 15 03:41:51 oh-popmap5p dovecot: dovecot: Temporary failure in creating login processes, slowing down for now
Oct 15 03:41:51 oh-popmap5p dovecot: dovecot: Temporary failure in creating login processes, slowing down for now
Oct 15 03:41:51 oh-popmap5p dovecot: imap-login: Fatal: io_loop_handle_add: epoll_ctl(1, 5): Operation not permitted
Oct 15 03:41:51 oh-popmap5p dovecot: dovecot: Created login processes successfully, unstalling
Oct 15 03:41:51 oh-popmap5p dovecot: pop3-login: Fatal: io_loop_handle_add: epoll_ctl(1, 5): Operation not permitted
Oct 15 03:41:51 oh-popmap5p dovecot: dovecot: Created login processes successfully, unstalling
Oct 15 03:41:52 oh-popmap5p dovecot: dovecot: child 7576 (login) returned error 89 (Fatal failure)
Oct 15 03:41:52 oh-popmap5p dovecot: dovecot: Temporary failure in creating login processes, slowing down for now

All 12 of our servers are running Dovecot 1.2.6; all of them were upgraded from 1.2.4 and restarted at Oct 13 04:00 by a cron job that updates packages from our internal Yum repo. Only two of the servers encountered this issue.

We run two separate master processes on each host - one for IMAP, one for POP3. The IMAP service runs with a significantly increased login_max_processes_count, and continued to serve user requests. The POP3 service hit the max login process limit and stopped accepting new connections, which triggered our alerting system.

For what it's worth, I was able to kill -HUP the master processes on both machines and things seemed to return to normal. I also took the precaution of killing off the pop3 login processes to get new connections accepted.

Timo, is there any more information I could gather about this issue? We've got a fairly large pool of machines, and odds are it will crop up again if we wait long enough.

Thanks,
-Brad
Re: [Dovecot] Dovecot 1.2.6 segfault in imap_fetch_begin
Timo,

> -----Original Message-----
> -O2 compiling has dropped one stage from the backtrace, but I think this
> will fix the crash:
>
> I guess it would be time for 1.2.7 somewhat soon..

Thanks! As always, you're one step ahead of us with the bug fixes! I've got one more for you that just popped up. I'm guessing that it's also due to expunging causing sequence numbers to get mixed up, and one of the existing patches will fix it? The error from the logs is:

Panic: file mail-transaction-log-view.c: line 108 (mail_transaction_log_view_set): assertion failed: (min_file_seq <= max_file_seq)
Raw backtrace: imap [0x49e4a0] -> imap [0x49e503] -> imap [0x49db66] -> imap(mail_transaction_log_view_set+0x4ac) [0x48651c] -> imap(mail_index_view_sync_begin+0xe5) [0x480055] -> imap(index_mailbox_sync_init+0x7f) [0x45e84f] -> imap(maildir_storage_sync_init+0x100) [0x43cd30] -> imap(imap_sync_init+0x67) [0x428257] -> imap(cmd_sync_delayed+0x174) [0x4284a4] -> imap(client_handle_input+0x19e) [0x420aee] -> imap(client_input+0x5f) [0x4214df] -> imap(io_loop_handler_run+0xf8) [0x4a61f8] -> imap(io_loop_run+0x1d) [0x4a530d] -> imap(main+0x620) [0x428da0] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x31d5a1d994] -> imap [0x419a89]
dovecot: child 11758 (imap) killed with signal 6 (core dumped)

Backtrace and such here: http://uoregon.edu/~brandond/dovecot-1.2.6/bt2.txt

Thanks again,
-Brad
[Dovecot] Dovecot 1.2.6 segfault in imap_fetch_begin
We recently upgraded from Dovecot 1.2.4 to 1.2.6 (with the sieve patches, of course). Everything has been running quite well since the upgrade, and the occasional assert-crash when expunging has gone away.

However, one of our users seems to have triggered a new issue. She's been the only one to see it, but whenever she logs in, her imap process segfaults immediately. It appears that the crash is a null pointer dereference in the array library, but I'm not sure what code is at fault for calling in without checking array validity... or even if I'm on the right track.

Backtraces and some further information are available here; cores available on request. http://uoregon.edu/~brandond/dovecot-1.2.6/bt.txt

Thanks,
-Brad
[Dovecot] Dovecot 1.2.4 - assertion crash in view_lookup_seq_range
Hi all,

We have a number of machines running Dovecot 1.2.4 that have been assert-crashing occasionally. It looks like it occurs when users expunge their mailboxes, but I'm not sure, as I can't reproduce it myself. The error in the logs is:

Oct 6 07:33:09 oh-popmap3p dovecot: imap: user=, rip=, pid=11931: Panic: file mail-index-view.c: line 264 (view_lookup_seq_range): assertion failed: (first_uid > 0)
Oct 6 07:33:09 oh-popmap3p dovecot: imap: user=, rip=, pid=11931: Raw backtrace: imap [0x49e130] -> imap [0x49e193] -> imap [0x49d816] -> imap [0x47e462] -> imap(mail_index_lookup_seq+0x12) [0x47e022] -> imap(mail_index_view_sync_begin+0x36a) [0x47ffba] -> imap(index_mailbox_sync_init+0x7f) [0x45e56f] -> imap(maildir_storage_sync_init+0x100) [0x43cb70] -> imap(imap_sync_init+0x67) [0x428177] -> imap(cmd_sync_delayed+0x174) [0x4283c4] -> imap(client_handle_input+0x19e) [0x420a0e] -> imap(client_input+0x5f) [0x4213ff] -> imap(io_loop_handler_run+0xf8) [0x4a5e98] -> imap(io_loop_run+0x1d) [0x4a4fad] -> imap(main+0x620) [0x428cc0] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x323dc1d994] -> imap [0x4199f9]
Oct 6 07:33:09 oh-popmap3p dovecot: dovecot: child 11931 (imap) killed with signal 6 (core dumped)

GDB stack information and some additional details are available here: http://uoregon.edu/~brandond/dovecot-1.2.4/stack.txt

We are planning to go to 1.2.6 sometime in the next week or two, but I thought I'd try to track this particular error down just in case it's still an issue after the upgrade.

-Brad
Re: [Dovecot] Dovecot 1.2.5 segfaults.
Tom,

Tom Diehl wrote:
> I just updated to dovecot 1.2.5 on centos5. 1.2.4 did not show this problem.
> I am going to roll back for the time being but I am willing to do whatever I
> need to to fix this. This is an x86_64 system. filesystem is ext3.
> I am now seeing the following in the logs:
>
> Sep 22 17:31:06 vfoggy kernel: imap[18644]: segfault at rip rsp 7fff83e31c88 error 14

I think I just posted a patch for your issue. It's possible there is another null function call in 1.2.5, but I'd bet against it. I can provide updated RPMs for testing if you are interested.

-Brandon
[Dovecot] Segfault in quota-fs plugin
Hi all,

We recently attempted to update our Dovecot installation to version 1.2.5. After doing so, we noticed a constant stream of crash messages in our log file:

Sep 22 15:58:41 hostname dovecot: imap-login: Login: user=, method=PLAIN, rip=X.X.X.X, lip=X.X.X.X, TLS
Sep 22 15:58:41 hostname dovecot: dovecot: child 6339 (imap) killed with signal 11 (core dumps disabled)

We rolled back to version 1.2.4, and installed 1.2.5 on a test system - something we'll have to make sure to do *before* rolling new versions into production. After examining a few core files from the test system, it looks like the recent changes to the quota plugin (specifically the maildir backend's late-initialization fix) have broken the other backends. Stack trace and further debugging are available here: http://uoregon.edu/~brandond/dovecot-1.2.5/bt.txt

The relevant code seems to have been added in changeset 9380: http://hg.dovecot.org/dovecot-1.2/rev/fe063e0d7109

Specifically, quota.c line 447 does not check whether the backend implements init_limits before calling it, resulting in a null function call for all backends that do not. Since this crash would appear to affect all quota backends other than maildir, it should be pretty easy to reproduce.

I've attached a patch which seems to fix the obvious code issue. I can't guarantee it's the correct fix, since this is my first poke at the Dovecot source, but it seems to have stopped the crashing on our test host.

Regards,
-Brandon

[Attachment: dovecot-1.2.5-check-init_limits.patch]