Re: Next Steps for James
Hi.

On 07/05/2018 at 06:47, Simon Levesque wrote:
>> 1. DOCS and TUTORIALS
>
> To help get new documentation fast, a wiki would be better than: creating
> a Jira, changing text files in the repository and doing a pull request.
>
> Also, one big headache is that when we download the release, we cannot
> use it right away and then slowly go through all the changes we can make
> to the config one part at a time. There should be a default config that
> just works, like "receive and send emails, no SSL". Then, we could create
> some sample/bundle configurations for:
> - receive and send emails with SSL (maildir/jpa)
> - receive emails; send via a gateway (maildir/jpa)
>
> That would give a ready-to-use full config and also some examples of how
> to tweak things.

In my opinion, we should guard against spreading knowledge around. We should tend to have the technical information about the software in a single location. That makes both reading and writing docs/tutorials easier.

What I do believe is that we need "feature-oriented scenarios" allowing you to quickly start a James server that does what you want, with specific links pointing to what you want/can configure. Such use cases might be:
- Starting up a JPA IMAP server
- Setting up a distributed server
- Doing inbound
- Spawning a James server for your tests
- ... (any other idea welcome)

Things like the SMTP gateway are, in my opinion, "low-level features" that might not deserve such a guide, but could be linked at the end of that guide.

Concerning SMTP mail sending, that is interesting. The doc exists at the mailet level but is rather hard to find. Maybe we should consider adding a TLD page.

Finally, maybe the "configure" section of the website [1] is too messy... We could subdivide the configure section between "backend configuration" and "other configuration"... We could improve the base page to better explain what the configuration pages will bring.

[1] http://james.apache.org/server/config.html

>> 2. NOT ENOUGH HANDS? DROP NOT ENOUGH USED COMPONENTS
>
> I am not sure on this one. If you are doing a lot of API changes that
> break all the components, yes, it can be hard to maintain a lot of
> components; but if you are mostly adding new mailet functionalities and
> new backends, that shouldn't take much time, and it actually gives more
> choices to users. Even more so given that there are Docker tests done
> with them, so you can be confident you have not broken anything.

The fact is, some implementations conform to the APIs but do not pass the generic tests that come with them. Moreover, the code might be obscure, outdated and relying on hacks.

We have a clear definition of what a trusted backend is:
- It should pass the generic tests
- Have a Docker image for easy use
- Pass load testing generated by Gatling
- Have and pass integration tests

Many implementations fail to gain traction and do not comply with such criteria, and the cost to do so would be high. Of course, the idea is to reach a consensus here, and not drop an implementation that people are willing to support. I imagine that a call for contribution would be made, followed by deprecation and then removal.

>> 3. FULLY DISTRIBUTED
>> It sounds like a fully distributed solution (potentially running on
>> Kubernetes) could be a better differentiator. There is still work to
>> achieve this (especially on the queuing level).
>
> Not sure if having something for Kubernetes out of the box would really
> be a differentiator. There are more people with Linux machines than with
> Kubernetes clusters installed.
> If I just think about using a provided Kubernetes cluster, I don't think
> emails are good there. E.g.:
> - On Google Compute Engine, we cannot send emails directly, we need to
>   use an email gateway
> - On Amazon, same thing
> - On DigitalOcean, you can receive and send emails, but without using a
>   gateway, some of them could get lost (looking at Microsoft, which sends
>   everything not whitelisted from a user to /dev/null instead of their
>   SPAM folder, rather than giving them a chance to know they are actually
>   missing emails)
> So using their Kubernetes cluster would still not be that simple to
> configure.

The big deal here would of course be to attract other companies, who can dedicate developers to Apache James ;-) More than a choice of technology, what is important is the "distributed stuff" feature, which is completely missing in the open-source landscape...

> From Pablo's response:
>> I do not think current capabilities of the server are well promoted, so
>> better communicating the current features would be good to get more
>> users to try the server. Maybe a sort of marketing campaign releasing
>> some smart things people could quickly do with the server would be nice.
>
> +1 for that. While configuring James for the first time by looking at the
> config files and at the Mailet/Matcher code directly, I found so many
> nice features that I thought about maybe doing in the future, but that
> are actually already there out of the box.
Re: Next Steps for James
> slow release schedule

I totally agree. As a reminder, the last one is from last October, which would be fine if nothing had happened since, but there are a lot of commits, so there should have been way more releases. I had to manually create a new release from "master" to have my fixes, and I would prefer to use an official release.

> 1. DOCS and TUTORIALS

To help get new documentation fast, a wiki would be better than: creating a Jira, changing text files in the repository and doing a pull request.

Also, one big headache is that when we download the release, we cannot use it right away and then slowly go through all the changes we can make to the config one part at a time. There should be a default config that just works, like "receive and send emails, no SSL". Then, we could create some sample/bundle configurations for:
- receive and send emails with SSL (maildir/jpa)
- receive emails; send via a gateway (maildir/jpa)

That would give a ready-to-use full config and also some examples of how to tweak things.

> 2. NOT ENOUGH HANDS? DROP NOT ENOUGH USED COMPONENTS

I am not sure on this one. If you are doing a lot of API changes that break all the components, yes, it can be hard to maintain a lot of components; but if you are mostly adding new mailet functionalities and new backends, that shouldn't take much time, and it actually gives more choices to users. Even more so given that there are Docker tests done with them, so you can be confident you have not broken anything.

> 3. FULLY DISTRIBUTED
> It sounds like a fully distributed solution (potentially running on
> Kubernetes) could be a better differentiator. There is still work to
> achieve this (especially on the queuing level).

Not sure if having something for Kubernetes out of the box would really be a differentiator. There are more people with Linux machines than with Kubernetes clusters installed. If I just think about using a provided Kubernetes cluster, I don't think emails are good there. E.g.:
- On Google Compute Engine, we cannot send emails directly, we need to use an email gateway
- On Amazon, same thing
- On DigitalOcean, you can receive and send emails, but without using a gateway, some of them could get lost (looking at Microsoft, which sends everything not whitelisted from a user to /dev/null instead of their SPAM folder, rather than giving them a chance to know they are actually missing emails)

So using their Kubernetes cluster would still not be that simple to configure.

From Pablo's response:
> I do not think current capabilities of the server are well promoted, so
> better communicating the current features would be good to get more users
> to try the server. Maybe a sort of marketing campaign releasing some
> smart things people could quickly do with the server would be nice.

+1 for that. While configuring James for the first time by looking at the config files and at the Mailet/Matcher code directly, I found so many nice features that I thought about maybe doing in the future, but that are actually already there out of the box. That needs to be easier than looking at the code to be able to find all these goodies.

Cheers

On Sun, 6 May 2018 at 17:14, pablo pita leira wrote:
> Well, I am no mail expert and I am not confronted with the distributed
> case. As a user, my modest use case is that I want to have control of my
> private email, and as I know Java, I like to be able to work with the
> server if I want to implement something.
>
> Regarding the first point, I need some solution to keep a few gigabytes
> of email which I can deploy on a Linux server easily. For me, ideally I
> would want a James mail package that I could upgrade to new releases
> easily.
>
> And regarding the second one, the code base makes the product work, and
> gives the chance to adapt it for whatever case is needed among many of
> them. The code base is huge because of the great amount of choice, and
> that makes understanding the parts more complex. Therefore, I am for
> simplifying the code base by removing less used options, or having them
> separate. And of course, documentation is helpful for a new user to
> start with the server. As a developer, quickly setting up an environment
> to start hacking is welcome. In my case, I have no Docker experience,
> and I am used to running applications the old way.
>
> I think the simple mail server use case is important for single
> developers to try and test new features. The distributed use case with
> Kubernetes makes it a bit harder for me (I do not have experience with
> that technology). Requirements for companies are at another level,
> indeed. But many marketing features for James would sell both groups
> fine, single users and companies.
>
> I do not think current capabilities of the server are well promoted, so
> better communicating the current features would be good to get more
> users to try the server. Maybe a sort of marketing campaign releasing
> some smart things people could quickly do with the server would be nice.
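The "receive emails; send via a gateway" bundle suggested above could be illustrated with a short mailetcontainer fragment. This is only a sketch, not an official sample: it assumes the gateway parameters of the RemoteDelivery mailet, and the host name and port shown are placeholders to be replaced with the actual relay; parameter availability should be checked against the RemoteDelivery documentation for the James version in use.

```xml
<!-- Illustrative sketch only: route all outgoing mail through an SMTP
     gateway instead of delivering directly via MX lookup.
     smtp.example.com and port 587 are placeholder values. -->
<mailet match="All" class="RemoteDelivery">
  <gateway>smtp.example.com</gateway>
  <gatewayPort>587</gatewayPort>
</mailet>
```

Shipping a handful of such ready-made fragments next to the default config would cover the two bundles proposed above without touching the wiki workflow.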
Re: Next Steps for James
Well, I am no mail expert and I am not confronted with the distributed case. As a user, my modest use case is that I want to have control of my private email, and as I know Java, I like to be able to work with the server if I want to implement something.

Regarding the first point, I need some solution to keep a few gigabytes of email which I can deploy on a Linux server easily. For me, ideally I would want a James mail package that I could upgrade to new releases easily.

And regarding the second one, the code base makes the product work, and gives the chance to adapt it for whatever case is needed among many of them. The code base is huge because of the great amount of choice, and that makes understanding the parts more complex. Therefore, I am for simplifying the code base by removing less used options, or having them separate. And of course, documentation is helpful for a new user to start with the server. As a developer, quickly setting up an environment to start hacking is welcome. In my case, I have no Docker experience, and I am used to running applications the old way.

I think the simple mail server use case is important for single developers to try and test new features. The distributed use case with Kubernetes makes it a bit harder for me (I do not have experience with that technology). Requirements for companies are at another level, indeed. But many marketing features for James would sell both groups fine, single users and companies.

I do not think current capabilities of the server are well promoted, so better communicating the current features would be good to get more users to try the server. Maybe a sort of marketing campaign releasing some smart things people could quickly do with the server would be nice.

That was my 2 cents.

On 06/05/18 at 08:31, Eric Charles wrote:
> Hi James Community,
>
> We have just discussed on the private list actions to further gain users
> and developers on the Apache James mail server. The discussion started as
> we are slow to convert new contributors to committers and we have a slow
> release schedule. I will summarize the key points we have discussed. This
> is just a base to start the discussions and we really would love and need
> to hear your voice on this.
>
> 1. DOCS and TUTORIALS
> - We have a new website but no easy tutorials.
> - Which platform to use (readthedocs...?)
> - Migrate/Close Wiki.
>
> 2. NOT ENOUGH HANDS? DROP NOT ENOUGH USED COMPONENTS
> - We may have to make some choices: drop some Mailbox implementations
>   (JCR, HBase), some data backends (JCR, HBase, JDBC)
>
> 3. FULLY DISTRIBUTED
> - Today's James features (multiple mailbox implementations, configurable
>   mailets, JMAP access...) may not be enough to make the difference.
> - It sounds like a fully distributed solution (potentially running on
>   Kubernetes) could be a better differentiator. There is still work to
>   achieve this (especially on the queuing level).
>
> 4. GSOC
> - GSOC is a great way to get new contributors.
> - Any other options to attract newbies?
>
> 5. COMMUNICATION
> - We don't use the available communication channels enough: Twitter,
>   Apache Blog...
> - We also don't communicate among ourselves about the plans, pipeline...
>   This is an action to fix this. Do we need to put a kanboard in place?

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org
[jira] [Updated] (JAMES-2390) JMAP attachment performance issues
[ https://issues.apache.org/jira/browse/JAMES-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tellier Benoit updated JAMES-2390:
----------------------------------
    Attachment: Capture d’écran de 2018-05-06 19-35-02.png
                Capture d’écran de 2018-05-06 19-32-31.png

> JMAP attachment performance issues
> ----------------------------------
>
>                 Key: JAMES-2390
>                 URL: https://issues.apache.org/jira/browse/JAMES-2390
>             Project: James Server
>          Issue Type: New Feature
>          Components: cassandra, JMAP
>    Affects Versions: master
>            Reporter: Tellier Benoit
>            Assignee: Antoine Duprat
>            Priority: Major
>              Labels: performance
>         Attachments: Capture d’écran de 2018-05-06 19-32-31.png,
>                      Capture d’écran de 2018-05-06 19-35-02.png
>
> Most of the Cassandra failures are related to attachment downloads, and
> more precisely to attachment right checking.
> Having a look at the attached screenshots:
> - We can notice a lot of warnings are generated by JMAP attachment
>   downloads.
> - The failure happens when reading metadata, in order to retrieve the
>   list of referencing messages to resolve rights.
> - Furthermore, we can notice the failure is systematic for some
>   attachments.
> I spent a bit of time this weekend analysing these (unexpected!)
> performance issues. I've mostly found 2 intuitive performance
> improvements as well as one more complex.
> 1. Upon checking whether a set of messages is accessible, the containing
> mailbox rights were checked on a per-message basis. This is sub-optimal,
> as some messages might be in the same mailbox, whose rights will be
> needlessly checked several times. This change inserts smoothly into the
> codebase: the tool for checking rights once per mailbox is already
> implemented, just not used in that case.
> 2. Paging and asynchronous code don't combine well, as already proven by
> previous code. The mantra is *join then collect*. If the operation is
> done in reverse and entries exceed the paging size (~5000), an exception
> will be thrown by the Cassandra driver. This explains the systematic
> failures for some specific attachments... The fix is trivial, and I
> added a test demonstrating this.
> 3. The given logs suggest that we have high-cardinality rows in our
> database (i.e. an attachment referenced by several messages), as the
> number of referencing messages exceeds 5000 (enough to trigger paging
> issues). Such high cardinality has a massive read cost:
> - Reading such a row is a complex operation
> - Caching cannot help, as the cache size per primary key is exceeded
> - Rights would be resolved for each referencing message, generating an
>   expensive read cascade.
> Note that deduplication is done at the Attachment level. By looking at
> the attachment names (cf. screenshots) we can notice these "high
> cardinality" attachments look like inlined images in signatures...
> The stand here is that deduplicating is not a concern for attachments,
> but for blobs. We should push this lower-level constraint further down
> the stack. That way, each blob would be deduplicated (storage cost
> reduction, higher FS cache efficiency, etc.) while avoiding *wide rows*.
> We should ensure each newly generated AttachmentId is unique, then
> generate the BlobId from the blob's content, to avoid wide rows while
> keeping deduplication in place.
> Note that since this is done only for newly received messages, it can be
> done transparently, without the need for a migration.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
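Point 1 above (resolving mailbox rights once per mailbox rather than once per message) can be sketched as follows. This is an illustration only, not James code: the `canRead` check, the message-to-mailbox map and the lookup counter are hypothetical stand-ins for the real ACL resolution.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RightsCheckSketch {
    // Counts how many times the (hypothetically expensive) rights lookup runs.
    static int rightsLookups = 0;

    // Hypothetical per-mailbox rights check; in James this would read the
    // mailbox ACL. Here, every mailbox except "forbidden" is readable.
    static boolean canRead(String user, String mailboxId) {
        rightsLookups++;
        return !mailboxId.equals("forbidden");
    }

    // Naive version: one rights lookup per message, even when several
    // messages share the same mailbox.
    static List<String> accessibleNaive(String user, Map<String, String> messageToMailbox) {
        return messageToMailbox.entrySet().stream()
                .filter(e -> canRead(user, e.getValue()))
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    // Improved version: group messages by mailbox first, then check
    // rights exactly once per distinct mailbox.
    static List<String> accessibleGrouped(String user, Map<String, String> messageToMailbox) {
        Map<String, List<String>> byMailbox = messageToMailbox.entrySet().stream()
                .collect(Collectors.groupingBy(Map.Entry::getValue,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
        return byMailbox.entrySet().stream()
                .filter(e -> canRead(user, e.getKey())) // one lookup per mailbox
                .flatMap(e -> e.getValue().stream())
                .sorted()
                .collect(Collectors.toList());
    }
}
```

With two messages in "inbox" and one in "forbidden", both versions return the same accessible set, but the grouped version performs two rights lookups instead of three; the saving grows with the number of messages per mailbox.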
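The *join then collect* mantra from point 2 can be shown in shape with plain `CompletableFuture`s standing in for the driver's paged result sets. This is a generic sketch under assumed names (`fetchPage`, `PAGE_SIZE`), not the Cassandra driver API: each page future is completed (joined) before its rows are flattened into the final collection, instead of collecting lazily over results whose later pages have not been fetched yet.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class JoinThenCollectSketch {
    // Matches the ~5000-entry page size mentioned in the issue.
    static final int PAGE_SIZE = 5000;

    // Hypothetical async page fetch: returns one page of row ids.
    static CompletableFuture<List<Integer>> fetchPage(int pageIndex, int totalRows) {
        return CompletableFuture.supplyAsync(() -> {
            int from = pageIndex * PAGE_SIZE;
            int to = Math.min(from + PAGE_SIZE, totalRows);
            return IntStream.range(from, to).boxed().collect(Collectors.toList());
        });
    }

    // "Join then collect": complete every page future first, then flatten
    // the rows, so no page is iterated before it has actually arrived.
    static List<Integer> fetchAll(int totalRows) {
        int pages = (totalRows + PAGE_SIZE - 1) / PAGE_SIZE;
        return IntStream.range(0, pages)
                .mapToObj(p -> fetchPage(p, totalRows))
                .collect(Collectors.toList())   // materialize all futures
                .stream()
                .map(CompletableFuture::join)   // join...
                .flatMap(List::stream)          // ...then collect
                .collect(Collectors.toList());
    }
}
```

A request spanning 12000 rows crosses three pages and still comes back complete, which is exactly the case (entries exceeding one page) that blew up when the order was reversed.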
[jira] [Created] (JAMES-2390) JMAP attachment performance issues
Tellier Benoit created JAMES-2390:
-------------------------------------

             Summary: JMAP attachment performance issues
                 Key: JAMES-2390
                 URL: https://issues.apache.org/jira/browse/JAMES-2390
             Project: James Server
          Issue Type: New Feature
          Components: cassandra, JMAP
    Affects Versions: master
            Reporter: Tellier Benoit
            Assignee: Antoine Duprat

Most of the Cassandra failures are related to attachment downloads, and more precisely to attachment right checking.

Having a look at the attached screenshots:
- We can notice a lot of warnings are generated by JMAP attachment downloads.
- The failure happens when reading metadata, in order to retrieve the list of referencing messages to resolve rights.
- Furthermore, we can notice the failure is systematic for some attachments.

I spent a bit of time this weekend analysing these (unexpected!) performance issues. I've mostly found 2 intuitive performance improvements as well as one more complex.

1. Upon checking whether a set of messages is accessible, the containing mailbox rights were checked on a per-message basis. This is sub-optimal, as some messages might be in the same mailbox, whose rights will be needlessly checked several times. This change inserts smoothly into the codebase: the tool for checking rights once per mailbox is already implemented, just not used in that case.

2. Paging and asynchronous code don't combine well, as already proven by previous code. The mantra is *join then collect*. If the operation is done in reverse and entries exceed the paging size (~5000), an exception will be thrown by the Cassandra driver. This explains the systematic failures for some specific attachments... The fix is trivial, and I added a test demonstrating this.

3. The given logs suggest that we have high-cardinality rows in our database (i.e. an attachment referenced by several messages), as the number of referencing messages exceeds 5000 (enough to trigger paging issues). Such high cardinality has a massive read cost:
- Reading such a row is a complex operation
- Caching cannot help, as the cache size per primary key is exceeded
- Rights would be resolved for each referencing message, generating an expensive read cascade.

Note that deduplication is done at the Attachment level. By looking at the attachment names (cf. screenshots) we can notice these "high cardinality" attachments look like inlined images in signatures...

The stand here is that deduplicating is not a concern for attachments, but for blobs. We should push this lower-level constraint further down the stack. That way, each blob would be deduplicated (storage cost reduction, higher FS cache efficiency, etc.) while avoiding *wide rows*. We should ensure each newly generated AttachmentId is unique, then generate the BlobId from the blob's content, to avoid wide rows while keeping deduplication in place.

Note that since this is done only for newly received messages, it can be done transparently, without the need for a migration.
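The AttachmentId/BlobId split proposed in point 3 can be sketched like this: each stored attachment gets a fresh, unique AttachmentId (so no row accumulates thousands of references), while the BlobId is derived from the content, so identical payloads still deduplicate at the blob level. Class and method names are illustrative, not the actual James types.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

public class BlobIdSketch {
    // Unique per stored attachment: avoids high-cardinality "wide" rows,
    // because no two attachments ever share an AttachmentId.
    static String newAttachmentId() {
        return UUID.randomUUID().toString();
    }

    // Content-addressed: identical payloads map to the same BlobId,
    // so deduplication happens at the blob level instead.
    static String blobIdOf(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required by the JDK", e);
        }
    }
}
```

An inlined signature image attached to thousands of messages would then be stored once as a blob (same BlobId everywhere) while each message references its own AttachmentId, which is what allows the change to apply only to newly received messages without a migration.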
Next Steps for James
Hi James Community,

We have just discussed on the private list actions to further gain users and developers on the Apache James mail server. The discussion started as we are slow to convert new contributors to committers and we have a slow release schedule. I will summarize the key points we have discussed. This is just a base to start the discussions and we really would love and need to hear your voice on this.

1. DOCS and TUTORIALS
- We have a new website but no easy tutorials.
- Which platform to use (readthedocs...?)
- Migrate/Close Wiki.

2. NOT ENOUGH HANDS? DROP NOT ENOUGH USED COMPONENTS
- We may have to make some choices: drop some Mailbox implementations (JCR, HBase), some data backends (JCR, HBase, JDBC)

3. FULLY DISTRIBUTED
- Today's James features (multiple mailbox implementations, configurable mailets, JMAP access...) may not be enough to make the difference.
- It sounds like a fully distributed solution (potentially running on Kubernetes) could be a better differentiator. There is still work to achieve this (especially on the queuing level).

4. GSOC
- GSOC is a great way to get new contributors.
- Any other options to attract newbies?

5. COMMUNICATION
- We don't use the available communication channels enough: Twitter, Apache Blog...
- We also don't communicate among ourselves about the plans, pipeline... This is an action to fix this. Do we need to put a kanboard in place?