Re: [IMAP] Over-designed /Some thoughts ?
On Wed, Apr 28, 2010 at 5:55 AM, Norman Maurer norman.mau...@googlemail.com wrote:

snip

I think it would be a good thing to simplify the API a bit to make it easier to understand. So, some points which came to my mind: 1) UidChangeTracking: Is this really necessary? It does some kind of caching, but I don't see anything else it's useful for. Why not just fire the events directly with a shared MailboxEventDispatcher which is the same for all Mailboxes?

i'm not convinced it's needed but beware... this is one of the few areas retained from the design before i started reworking. i had hoped to replace it but never really worked out how to do that without crippling performance or breaking IMAP.

I'm currently testing IMAP without the UidChangeTracker and so far it seems like it's not really slower than before..

it's only slower than the alternatives that are required to make IMAP work properly ;-)

IIRC UIDChangeTracker tracks UID changes made by concurrent sessions accessing the same mailbox. the local caching should work for a user's own changes. it's possible that some of the changes i made might have made it redundant by now but i don't trust the functional concurrency tests.

2) Global Mailbox caching
At the moment the Mailbox is cached in a HashMap. The problem with this is that it will never get recycled by the GC. This can generate an OOM over a long time.

i run IMAP with approx 1.5G spread over around a hundred mailboxes. i've never had an OOM, so i never bothered changing this.

I think you use Torque, right? Maybe it behaves a bit differently there.

i inherited torque and this is one area i left alone ;-)

I'm using JPA and it's reproducible by feeding a mailbox with ca. 1 million emails. You will see the memory usage just grow and grow.. When I took a heap dump it seemed like the OpenJPA objects were never released, because they were held in the HashMap.

for torque the session needs to be held to manage concurrency (mailbox access needs to be synchronized). for OpenJPA, sounds like the mailbox structure needs to be there to manage synchronization and caching but a new OpenJPA object needs to be created each time.

The other problem with this is that the Mailbox should be tied to the MailboxSession. Let me explain why. For example, in JCR we could use the User/Pass which is bound to the MailboxSession to access different parts of the JCR Repository, etc.

i thought this too originally but i couldn't work out how to do so without crippling performance or breaking IMAP.

Sure, good performance is a must, but I would prefer to have a good API first ;)

this wasn't a good-performance issue but a usable-at-all one. when two sessions are accessing the same mailbox, there are a handful of operations which require caching and concurrency control to maintain correctness.

there are a number of ways that this design could work. mailbox et al is inherited, and probably not my first choice. i would prefer to revise the API by pushing the Mailbox functions into MailboxManager, and so making it an internal feature which could be varied by implementations. the namespace handling is problematic, so i would then model namespace by a Mailbox object which could be passed in to each method in the API.

IIRC these are related issues. the essential function is caching and synchronization. in performance terms, i think much higher performance could be achieved by replacing it with something asynchronous and event-driven using a blocking queue. this would be a substantial change.

I agree with you here. But as you outlined already, it's not an easy thing to do without rewriting a lot of stuff.

very little rewriting but hard, and risky for the poorly tested concurrent use cases. then again, maybe these don't work ATM. the best place to start would be by creating some more concurrency tests. there's an application that creates tests in the package org.apache.james.imap.functional.builder in seda.

I even tend to believe we should do something similar to what we have in SMTP/POP3: just have some kind of LineHandler which pushes data into the processor when a CRLF is detected, and so not use blocking streams as input at all.

the IMAP protocol makes this approach tricky, but in general yes. the protocol handling foo is intended to address this, and should be quite close now.

- robert

To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
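One way to picture the trade-off discussed above (a shared mailbox structure is needed for synchronization, but a plain HashMap never lets it go) is a cache that counts interested sessions and drops the entry when the last one leaves. This is only a sketch under assumptions: `RefCountedCacheSketch`, `acquire` and `release` are hypothetical names, not part of the James API, and the loader stands in for whatever a store-specific implementation would do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the James MailboxManager.
final class RefCountedCacheSketch<K, V> {

    /** Cached value plus the number of sessions currently using it. */
    private static final class Entry<T> {
        final T value;
        int users;

        Entry(T value) {
            this.value = value;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final Function<K, V> loader;

    RefCountedCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the shared instance for the key, loading it on first use. */
    synchronized V acquire(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            entry = new Entry<>(loader.apply(key));
            entries.put(key, entry);
        }
        entry.users++;
        return entry.value;
    }

    /** Drops the entry once no session is interested in it any more. */
    synchronized void release(K key) {
        Entry<V> entry = entries.get(key);
        if (entry != null && --entry.users == 0) {
            entries.remove(key);
        }
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Concurrent sessions on the same mailbox still share one object to synchronize on, while a mailbox nobody has selected holds no memory. Whether a session "acquires" on SELECT and "releases" on close or logout is an assumption of the sketch.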
Re: [IMAP] Over-designed /Some thoughts ?
On Wed, Apr 28, 2010 at 5:55 AM, Norman Maurer norman.mau...@googlemail.com wrote:

Hi Robert, nice to see you are back ;)... Comments inline..

i'm really busy ATM (exams, projects, reports) :-/ this will probably have to be my last post for a while...

2010/4/27 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Tue, Apr 27, 2010 at 10:42 AM, Norman Maurer nor...@apache.org wrote:

Hi all, after spending some time over the weekend fixing some issues with IMAP, I started to feel it's a bit over-designed...

i started out with that impression. after digging around i came to the conclusion that IMAP has some annoying requirements..

Oh well... it's some kind of a pita.

+1

sessions that select a mailbox have to be updated by operations done by other sessions on that mailbox

1) UidChangeTracking:

the API uses listeners and events to manage updates to UIDs etc. this design may need to be revised.

2) Global Mailbox caching
At the moment the Mailbox is cached in a HashMap. The problem with this is that it will never get recycled by the GC. This can generate an OOM over a long time.

i run IMAP with approx 1.5G spread over around a hundred mailboxes. i've never had an OOM, so i never bothered changing this.

I think you use Torque, right? Maybe it behaves a bit differently there. I'm using JPA and it's reproducible by feeding a mailbox with ca. 1 million emails. You will see the memory usage just grow and grow.. When I took a heap dump it seemed like the OpenJPA objects were never released, because they were held in the HashMap.

AFAICT the OpenJPA stuff shouldn't be caching the mailboxes in the manager (caching should be managed internally by OpenJPA). sounds like a session management problem is much more likely. ATM sessions cache entity managers for the duration. if you're running a lot of concurrent connections, they need to be pooled.

FWIW in IMAP, a session may:
* have a long term interest in a mailbox spanning multiple requests
* need to perform multiple operations on one or more mailboxes to execute a single protocol request

IMHO the failure to cleanly and clearly separate these is a major flaw in the current API. the Mailbox caching issues are just a consequence. fixing these would require a major rewrite with knowledge of the current foibles...

- robert
Re: [IMAP] Over-designed /Some thoughts ?
2010/4/28 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Wed, Apr 28, 2010 at 5:55 AM, Norman Maurer norman.mau...@googlemail.com wrote:

Hi Robert, nice to see you are back ;)... Comments inline..

i'm really busy ATM (exams, projects, reports) :-/ this will probably have to be my last post for a while...

Ok, hope to see you back in some time..

2010/4/27 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Tue, Apr 27, 2010 at 10:42 AM, Norman Maurer nor...@apache.org wrote:

Hi all, after spending some time over the weekend fixing some issues with IMAP, I started to feel it's a bit over-designed...

i started out with that impression. after digging around i came to the conclusion that IMAP has some annoying requirements..

Oh well... it's some kind of a pita.

+1

sessions that select a mailbox have to be updated by operations done by other sessions on that mailbox

Yes, I'm aware of this.. I'm still trying to get my head around a good solution without breaking too much ;)

1) UidChangeTracking:

the API uses listeners and events to manage updates to UIDs etc. this design may need to be revised.

I think the event stuff is good. It's just that the Tracker is not needed and adds a useless layer of complexity to the API..

2) Global Mailbox caching
At the moment the Mailbox is cached in a HashMap. The problem with this is that it will never get recycled by the GC. This can generate an OOM over a long time.

i run IMAP with approx 1.5G spread over around a hundred mailboxes. i've never had an OOM, so i never bothered changing this.

I think you use Torque, right? Maybe it behaves a bit differently there. I'm using JPA and it's reproducible by feeding a mailbox with ca. 1 million emails. You will see the memory usage just grow and grow.. When I took a heap dump it seemed like the OpenJPA objects were never released, because they were held in the HashMap.

AFAICT the OpenJPA stuff shouldn't be caching the mailboxes in the manager (caching should be managed internally by OpenJPA). sounds like a session management problem is much more likely. ATM sessions cache entity managers for the duration. if you're running a lot of concurrent connections, they need to be pooled.

Yeah, we should remove the caching of mailboxes from the MailboxManager altogether, because (as you already stated) it depends on the implementation whether a cache is needed and how it is implemented. At the moment I have refactored the OpenJPA stuff to use one EntityManager per request. This seems to work out well so far. Even better would be to use one EntityManager per MailboxSession.

FWIW in IMAP, a session may:
* have a long term interest in a mailbox spanning multiple requests
* need to perform multiple operations on one or more mailboxes to execute a single protocol request

IMHO the failure to cleanly and clearly separate these is a major flaw in the current API. the Mailbox caching issues are just a consequence. fixing these would require a major rewrite with knowledge of the current foibles...

- robert

Bye,
Norman

Ps: And good luck with your exams, mate..
Re: [IMAP] Over-designed /Some thoughts ?
On Wed, Apr 28, 2010 at 8:44 AM, Norman Maurer norman.mau...@googlemail.com wrote:
2010/4/28 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Wed, Apr 28, 2010 at 5:55 AM, Norman Maurer norman.mau...@googlemail.com wrote:

Hi Robert, nice to see you are back ;)... Comments inline..

i'm really busy ATM (exams, projects, reports) :-/ this will probably have to be my last post for a while...

Ok, hope to see you back in some time..

thunderbird 3 hates our IMAP so i'm having mail issues :-

2010/4/27 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Tue, Apr 27, 2010 at 10:42 AM, Norman Maurer nor...@apache.org wrote:

Hi all, after spending some time over the weekend fixing some issues with IMAP, I started to feel it's a bit over-designed...

i started out with that impression. after digging around i came to the conclusion that IMAP has some annoying requirements..

Oh well... it's some kind of a pita.

+1

sessions that select a mailbox have to be updated by operations done by other sessions on that mailbox

Yes, I'm aware of this.. I'm still trying to get my head around a good solution without breaking too much ;)

1) UidChangeTracking:

the API uses listeners and events to manage updates to UIDs etc. this design may need to be revised.

I think the event stuff is good. It's just that the Tracker is not needed and adds a useless layer of complexity to the API..

AFAICT in https://svn.apache.org/repos/asf/james/imap/trunk UIDChangeTracker is only used by Torque. why not just move it into that package then leave it alone?

2) Global Mailbox caching
At the moment the Mailbox is cached in a HashMap. The problem with this is that it will never get recycled by the GC. This can generate an OOM over a long time.

i run IMAP with approx 1.5G spread over around a hundred mailboxes. i've never had an OOM, so i never bothered changing this.

I think you use Torque, right? Maybe it behaves a bit differently there. I'm using JPA and it's reproducible by feeding a mailbox with ca. 1 million emails. You will see the memory usage just grow and grow.. When I took a heap dump it seemed like the OpenJPA objects were never released, because they were held in the HashMap.

AFAICT the OpenJPA stuff shouldn't be caching the mailboxes in the manager (caching should be managed internally by OpenJPA). sounds like a session management problem is much more likely. ATM sessions cache entity managers for the duration. if you're running a lot of concurrent connections, they need to be pooled.

Yeah, we should remove the caching of mailboxes from the MailboxManager altogether, because (as you already stated) it depends on the implementation whether a cache is needed and how it is implemented. At the moment I have refactored the OpenJPA stuff to use one EntityManager per request. This seems to work out well so far. Even better would be to use one EntityManager per MailboxSession.

for JPA, they should really be pooled, then obtained per request

- robert
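The "pooled, then obtained per request" idea can be sketched with a small bounded pool. This is an illustration under assumptions, not James or OpenJPA code: `PerRequestPoolSketch` and `withResource` are hypothetical names, and the generic resource `T` stands in for a JPA EntityManager so that the sketch needs nothing outside the JDK.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch: T stands in for an expensive, non-thread-safe
// resource such as an EntityManager.
final class PerRequestPoolSketch<T> {

    private final BlockingQueue<T> idle;

    PerRequestPoolSketch(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get());
        }
    }

    /** Borrows a resource for one request and always hands it back. */
    <R> R withResource(Function<T, R> request) {
        T resource;
        try {
            // Blocks while every pooled resource is in use by another request.
            resource = idle.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a pooled resource", e);
        }
        try {
            return request.apply(resource);
        } finally {
            idle.add(resource);
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

With this shape the number of live resources is bounded by the pool size rather than by the number of connected sessions, which is the point made above about many concurrent connections. A real pool would also need to reset or validate a resource between requests; that is left out here.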
Re: [IMAP] Over-designed /Some thoughts ?
2010/4/28 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Wed, Apr 28, 2010 at 8:44 AM, Norman Maurer norman.mau...@googlemail.com wrote:
2010/4/28 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Wed, Apr 28, 2010 at 5:55 AM, Norman Maurer norman.mau...@googlemail.com wrote:

Hi Robert, nice to see you are back ;)... Comments inline..

i'm really busy ATM (exams, projects, reports) :-/ this will probably have to be my last post for a while...

Ok, hope to see you back in some time..

thunderbird 3 hates our IMAP so i'm having mail issues :-

Just till upgrade?

2010/4/27 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Tue, Apr 27, 2010 at 10:42 AM, Norman Maurer nor...@apache.org wrote:

Hi all, after spending some time over the weekend fixing some issues with IMAP, I started to feel it's a bit over-designed...

i started out with that impression. after digging around i came to the conclusion that IMAP has some annoying requirements..

Oh well... it's some kind of a pita.

+1

sessions that select a mailbox have to be updated by operations done by other sessions on that mailbox

Yes, I'm aware of this.. I'm still trying to get my head around a good solution without breaking too much ;)

1) UidChangeTracking:

the API uses listeners and events to manage updates to UIDs etc. this design may need to be revised.

I think the event stuff is good. It's just that the Tracker is not needed and adds a useless layer of complexity to the API..

AFAICT in https://svn.apache.org/repos/asf/james/imap/trunk UIDChangeTracker is only used by Torque. why not just move it into that package then leave it alone?

Yeah, it's still work in progress..

2) Global Mailbox caching
At the moment the Mailbox is cached in a HashMap. The problem with this is that it will never get recycled by the GC. This can generate an OOM over a long time.

i run IMAP with approx 1.5G spread over around a hundred mailboxes. i've never had an OOM, so i never bothered changing this.

I think you use Torque, right? Maybe it behaves a bit differently there. I'm using JPA and it's reproducible by feeding a mailbox with ca. 1 million emails. You will see the memory usage just grow and grow.. When I took a heap dump it seemed like the OpenJPA objects were never released, because they were held in the HashMap.

AFAICT the OpenJPA stuff shouldn't be caching the mailboxes in the manager (caching should be managed internally by OpenJPA). sounds like a session management problem is much more likely. ATM sessions cache entity managers for the duration. if you're running a lot of concurrent connections, they need to be pooled.

Yeah, we should remove the caching of mailboxes from the MailboxManager altogether, because (as you already stated) it depends on the implementation whether a cache is needed and how it is implemented. At the moment I have refactored the OpenJPA stuff to use one EntityManager per request. This seems to work out well so far. Even better would be to use one EntityManager per MailboxSession.

for JPA, they should really be pooled, then obtained per request

- robert

Bye,
Norman
Re: [IMAP] Over-designed /Some thoughts ?
On Wed, Apr 28, 2010 at 10:19 AM, Norman Maurer norman.mau...@googlemail.com wrote:
2010/4/28 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Wed, Apr 28, 2010 at 8:44 AM, Norman Maurer norman.mau...@googlemail.com wrote:
2010/4/28 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Wed, Apr 28, 2010 at 5:55 AM, Norman Maurer norman.mau...@googlemail.com wrote:

Hi Robert, nice to see you are back ;)... Comments inline..

i'm really busy ATM (exams, projects, reports) :-/ this will probably have to be my last post for a while...

Ok, hope to see you back in some time..

thunderbird 3 hates our IMAP so i'm having mail issues :-

Just till upgrade?

thunderbird 3 automatically retries operations in the background. so, when it crashes it will retry the next time you open it. it crashes a lot anyway but if you try any long-running operations that are likely to time out, it quickly becomes unusable. starting to write something on long operations would probably help by stopping some of the timeouts but basically the rewrite of the IMAP client has been poorly executed.

- robert
[IMAP] Over-designed /Some thoughts ?
Hi all,

after spending some time over the weekend fixing some issues with IMAP, I started to feel it's a bit over-designed...

I think it would be a good thing to simplify the API a bit to make it easier to understand. So, some points which came to my mind:

1) UidChangeTracking:
Is this really necessary? It does some kind of caching, but I don't see anything else it's useful for. Why not just fire the events directly with a shared MailboxEventDispatcher which is the same for all Mailboxes?

2) Global Mailbox caching
At the moment the Mailbox is cached in a HashMap. The problem with this is that it will never get recycled by the GC. This can generate an OOM over a long time.

The other problem with this is that the Mailbox should be tied to the MailboxSession. Let me explain why. For example, in JCR we could use the User/Pass which is bound to the MailboxSession to access different parts of the JCR Repository, etc.

Thoughts?
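To make point 1 concrete, here is a minimal sketch of what "fire the events directly with a shared dispatcher" could look like. Everything in it is an assumption for illustration: the types only borrow names from this thread and are not the James API, and the rule that a session ignores its own changes is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: these types only borrow names from the thread,
// they are not the James IMAP API.
final class SharedDispatcherSketch {

    /** A change to one mailbox, tagged with the session that caused it. */
    static final class MailboxEvent {
        final String mailboxName;
        final long sessionId;
        final String description;

        MailboxEvent(String mailboxName, long sessionId, String description) {
            this.mailboxName = mailboxName;
            this.sessionId = sessionId;
            this.description = description;
        }
    }

    interface MailboxListener {
        void event(MailboxEvent event);
    }

    /** One dispatcher for all mailboxes; listeners do their own filtering. */
    static final class SharedMailboxEventDispatcher {
        private final List<MailboxListener> listeners = new CopyOnWriteArrayList<>();

        void addListener(MailboxListener listener) {
            listeners.add(listener);
        }

        void removeListener(MailboxListener listener) {
            listeners.remove(listener);
        }

        void fire(MailboxEvent event) {
            for (MailboxListener listener : listeners) {
                listener.event(event);
            }
        }
    }

    /** Collects changes made by other sessions to the selected mailbox. */
    static final class SelectedMailboxListener implements MailboxListener {
        private final String selectedMailbox;
        private final long ownSessionId;
        private final List<String> pending = new ArrayList<>();

        SelectedMailboxListener(String selectedMailbox, long ownSessionId) {
            this.selectedMailbox = selectedMailbox;
            this.ownSessionId = ownSessionId;
        }

        @Override
        public synchronized void event(MailboxEvent event) {
            if (event.mailboxName.equals(selectedMailbox) && event.sessionId != ownSessionId) {
                pending.add(event.description);
            }
        }

        /** Returns and clears the changes to report on the next response. */
        synchronized List<String> drain() {
            List<String> out = new ArrayList<>(pending);
            pending.clear();
            return out;
        }
    }
}
```

The sketch keeps the event mechanism and drops any per-mailbox tracker object. It does not show how UIDs or message sequence numbers would be kept consistent, which is the part the existing tracker appears to be concerned with.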
Re: [IMAP] Over-designed /Some thoughts ?
On Tue, Apr 27, 2010 at 10:42 AM, Norman Maurer nor...@apache.org wrote:

Hi all, after spending some time over the weekend fixing some issues with IMAP, I started to feel it's a bit over-designed...

i started out with that impression. after digging around i came to the conclusion that IMAP has some annoying requirements...

I think it would be a good thing to simplify the API a bit to make it easier to understand. So, some points which came to my mind: 1) UidChangeTracking: Is this really necessary? It does some kind of caching, but I don't see anything else it's useful for. Why not just fire the events directly with a shared MailboxEventDispatcher which is the same for all Mailboxes?

i'm not convinced it's needed but beware... this is one of the few areas retained from the design before i started reworking. i had hoped to replace it but never really worked out how to do that without crippling performance or breaking IMAP.

2) Global Mailbox caching
At the moment the Mailbox is cached in a HashMap. The problem with this is that it will never get recycled by the GC. This can generate an OOM over a long time.

i run IMAP with approx 1.5G spread over around a hundred mailboxes. i've never had an OOM, so i never bothered changing this.

The other problem with this is that the Mailbox should be tied to the MailboxSession. Let me explain why. For example, in JCR we could use the User/Pass which is bound to the MailboxSession to access different parts of the JCR Repository, etc.

i thought this too originally but i couldn't work out how to do so without crippling performance or breaking IMAP.

IIRC these are related issues. the essential function is caching and synchronization. in performance terms, i think much higher performance could be achieved by replacing it with something asynchronous and event-driven using a blocking queue. this would be a substantial change.

- robert
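As a rough illustration of the "asynchronous and event-driven using a blocking queue" idea: session threads enqueue operations, and a single worker thread per mailbox applies them, so the mailbox state needs no locking. This is a sketch under assumptions, not a proposal for the actual design; `MailboxWorkerSketch`, `append` and `demo` are hypothetical names, and the only "operation" modelled is assigning the next UID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: one worker thread owns the state of one mailbox.
final class MailboxWorkerSketch {

    /** Marker operation that tells the worker to stop. */
    private static final Runnable STOP = () -> { };

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final List<Long> uids = new ArrayList<>(); // touched only by the worker
    private long nextUid = 1;                          // touched only by the worker
    private final Thread worker;

    MailboxWorkerSketch() {
        worker = new Thread(this::drain, "mailbox-worker");
        worker.setDaemon(true);
        worker.start();
    }

    private void drain() {
        try {
            while (true) {
                Runnable operation = queue.take();
                if (operation == STOP) {
                    return;
                }
                operation.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Callable from any session thread; the append itself runs on the worker. */
    void append() {
        queue.add(() -> uids.add(nextUid++));
    }

    /** Stops the worker after the queued operations and returns the UIDs assigned. */
    List<Long> shutdownAndGet() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new ArrayList<>(uids);
    }

    /** Runs several concurrent "sessions" that each append to the same mailbox. */
    static List<Long> demo(int sessions, int appendsEach) {
        MailboxWorkerSketch mailbox = new MailboxWorkerSketch();
        List<Thread> threads = new ArrayList<>();
        for (int s = 0; s < sessions; s++) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < appendsEach; i++) {
                    mailbox.append();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return mailbox.shutdownAndGet();
    }
}
```

Because every mutation runs on the worker, UIDs come out strictly increasing without any `synchronized` block on the mailbox. Returning results to the calling session (for example through a future) is omitted, and that is likely where most of the "substantial change" would sit.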
Re: [IMAP] Over-designed /Some thoughts ?
Hi Robert,

nice to see you are back ;)... Comments inline..

2010/4/27 Robert Burrell Donkin robertburrelldon...@gmail.com:
On Tue, Apr 27, 2010 at 10:42 AM, Norman Maurer nor...@apache.org wrote:

Hi all, after spending some time over the weekend fixing some issues with IMAP, I started to feel it's a bit over-designed...

i started out with that impression. after digging around i came to the conclusion that IMAP has some annoying requirements..

Oh well... it's some kind of a pita.

I think it would be a good thing to simplify the API a bit to make it easier to understand. So, some points which came to my mind: 1) UidChangeTracking: Is this really necessary? It does some kind of caching, but I don't see anything else it's useful for. Why not just fire the events directly with a shared MailboxEventDispatcher which is the same for all Mailboxes?

i'm not convinced it's needed but beware... this is one of the few areas retained from the design before i started reworking. i had hoped to replace it but never really worked out how to do that without crippling performance or breaking IMAP.

I'm currently testing IMAP without the UidChangeTracker and so far it seems like it's not really slower than before..

2) Global Mailbox caching
At the moment the Mailbox is cached in a HashMap. The problem with this is that it will never get recycled by the GC. This can generate an OOM over a long time.

i run IMAP with approx 1.5G spread over around a hundred mailboxes. i've never had an OOM, so i never bothered changing this.

I think you use Torque, right? Maybe it behaves a bit differently there. I'm using JPA and it's reproducible by feeding a mailbox with ca. 1 million emails. You will see the memory usage just grow and grow.. When I took a heap dump it seemed like the OpenJPA objects were never released, because they were held in the HashMap.

The other problem with this is that the Mailbox should be tied to the MailboxSession. Let me explain why. For example, in JCR we could use the User/Pass which is bound to the MailboxSession to access different parts of the JCR Repository, etc.

i thought this too originally but i couldn't work out how to do so without crippling performance or breaking IMAP.

Sure, good performance is a must, but I would prefer to have a good API first ;)

IIRC these are related issues. the essential function is caching and synchronization. in performance terms, i think much higher performance could be achieved by replacing it with something asynchronous and event-driven using a blocking queue. this would be a substantial change.

I agree with you here. But as you outlined already, it's not an easy thing to do without rewriting a lot of stuff. I even tend to believe we should do something similar to what we have in SMTP/POP3: just have some kind of LineHandler which pushes data into the processor when a CRLF is detected, and so not use blocking streams as input at all.

- robert

Thx,
Norman
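The LineHandler idea can be sketched as a small framer that is fed whatever bytes arrive and calls a handler once per complete CRLF-terminated line, so nothing ever blocks on an input stream. This is an assumption-laden illustration: `CrlfLineFramerSketch` is a hypothetical name, not the SMTP/POP3 LineHandler from James, and IMAP literals (a `{n}` count followed by n raw bytes) are deliberately not handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical sketch: push-style CRLF framing for a non-blocking transport.
final class CrlfLineFramerSketch {

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Consumer<String> lineHandler;
    private boolean lastWasCr;

    CrlfLineFramerSketch(Consumer<String> lineHandler) {
        this.lineHandler = lineHandler;
    }

    /** Accepts the next chunk from the network; chunks may split a line anywhere. */
    void feed(byte[] chunk) {
        for (byte b : chunk) {
            if (lastWasCr && b == '\n') {
                // The buffer holds the line content followed by the CR; drop the CR.
                byte[] raw = buffer.toByteArray();
                lineHandler.accept(new String(raw, 0, raw.length - 1, StandardCharsets.US_ASCII));
                buffer.reset();
                lastWasCr = false;
            } else {
                buffer.write(b);
                lastWasCr = (b == '\r');
            }
        }
    }
}
```

A partial line simply stays in the buffer until the rest arrives, including the case where the CR and the LF land in different chunks. A production version would also need a maximum line length to avoid unbounded buffering.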