Re: Next Steps for James

2018-05-06 Thread Benoit Tellier
Hi.

On 07/05/2018 at 06:47, Simon Levesque wrote:
>> 1. DOCS and TUTORIALS
> 
> To help get new documentation fast, a wiki would be better than: creating a
> Jira, changing text files in the repository and doing a pull request.
> 
> Also, one big headache is that when we download the release, we cannot use
> it right away and then slowly go through all the changes we can make to
> the config one part at a time. There should be a default config that just
> works like "receive and send emails, no-ssl". Then, we could create some
> sample/bundle configuration for:
> - receive and send emails with ssl (maildir/jpa)
> - receive emails ; send via a gateway (maildir/jpa)
> 
> That would give ready to use full config and also some examples of how to
> tweak things.
>
In my opinion, we should guard against knowledge being scattered. We should
aim to keep technical information about the software in a single location.

That makes both reading and writing docs/tutorials easier.

What I do believe is that we need "feature-oriented scenarios" that let
you quickly start a James server that does what you want, with specific
links pointing to what you want/can configure.

Such use cases might be:

 - Starting up a JPA IMAP server
 - Setting up a distributed server
 - Doing inbound
 - Spawning a James server for your tests
 - ... (any other idea welcome) ...

Things like SMTP gateway are in my opinion "low level features" that
might not deserve such a guide, but could be linked at the end of that
guide.

Concerning SMTP mail sending, that is interesting. The doc exists at the
mailet level but is rather hard to find. Maybe we should consider
adding a TLD page.

Finally, maybe the "configure" section of the website [1] is too
messy... We could subdivide the configure section between "backend
configuration" and "other configuration"... We could also improve the base
page to better explain what the configuration pages will bring.

[1] http://james.apache.org/server/config.html

> 
>> 2. NOT ENOUGH HANDS? DROP NOT ENOUGH USED COMPONENTS
> 
> I am not sure about this one. If you are doing a lot of API changes that
> break all the components, yes, it can be hard to maintain a lot of
> components, but if you are mostly adding new mailet functionalities and
> new backends, that shouldn't take much time and it actually gives
> more choices to users. Even more so since there are Docker tests run with
> them, so you can be confident you have not broken anything.
>

The fact is that some implementations conform to the APIs but do not pass
the generic tests that come with them. Moreover, the code might be obscure,
outdated and relying on hacks.

We have a clear definition of what a trusted backend is:
 - It should match generic tests
 - Have a docker for easy use
 - Pass load testing generated by Gatling
 - Have and pass integration tests

Many implementations fail to gain traction and do not comply with these
criteria, and the cost of making them comply would be high.

Of course, the idea is to reach a consensus here, and not to drop an
implementation that people are willing to support. I imagine that a call
for contributions would be made, followed by deprecation and then removal.

> 
>> 3. FULLY DISTRIBUTED
>> It sounds like a fully distributed solution (potentially running on
> Kubernetes) could be a better differentiator. There is still work to
> achieve this (especially on the queuing level).
> 
> Not sure if having something for Kubernetes out of the box would really be
> a differentiator. There are more people with Linux machines than with
> Kubernetes clusters installed.
> If I just think about using a provided Kubernetes cluster, I don't think
> emails are good there. Eg:
> - On Google Compute Engine, we cannot send emails directly, we need to use
> an email gateway
> - On Amazon, same thing
> - On DigitalOcean, you can receive and send emails, but without using a
> gateway, some of them could get lost (looking at Microsoft that sends
> everything not whitelisted from a user to /dev/null instead of their SPAM
> folder to give them a chance to know they are actually missing emails)
> So using their Kubernetes cluster would still not be that simple to
> configure.

The big deal here would be of course to attract other companies, who can
dedicate developers to Apache James ;-)

More than a choice of technology, what is important is the "distributed
stuff" feature, which is completely missing in the open source landscape.

> 
> 
> 
> From Pablo's response
>> I do not think current capabilities of the server are well promoted so
> better communicating the current features would be good to get more users
> to try the server. Maybe a sort of marketing campaign releasing some smart
> things people could quickly do with the server would be nice.
> 
> +1 for that. While configuring for the first time James by looking at the
> config files and at the Mailet/Matcher code directly, I found so many nice
> features that I thought about maybe do in the future, but 

Re: Next Steps for James

2018-05-06 Thread Simon Levesque
> slow release schedule

I totally agree. As a reminder, the last one is from last October, which
would be fine if nothing had happened since, but there have been a lot of
commits, so there should have been way more releases.
I had to manually create a new release from "master" to have my fixes and I
would prefer to use an official release.


> 1. DOCS and TUTORIALS

To help get new documentation fast, a wiki would be better than: creating a
Jira, changing text files in the repository and doing a pull request.

Also, one big headache is that when we download the release, we cannot use
it right away and then slowly go through all the changes we can make to
the config one part at a time. There should be a default config that just
works like "receive and send emails, no-ssl". Then, we could create some
sample/bundle configuration for:
- receive and send emails with ssl (maildir/jpa)
- receive emails ; send via a gateway (maildir/jpa)

That would give ready to use full config and also some examples of how to
tweak things.


> 2. NOT ENOUGH HANDS? DROP NOT ENOUGH USED COMPONENTS

I am not sure about this one. If you are doing a lot of API changes that
break all the components, yes, it can be hard to maintain a lot of
components, but if you are mostly adding new mailet functionalities and
new backends, that shouldn't take much time and it actually gives
more choices to users. Even more so since there are Docker tests run with
them, so you can be confident you have not broken anything.


> 3. FULLY DISTRIBUTED
> It sounds like a fully distributed solution (potentially running on
Kubernetes) could be a better differentiator. There is still work to
achieve this (especially on the queuing level).

Not sure if having something for Kubernetes out of the box would really be
a differentiator. There are more people with Linux machines than with
Kubernetes clusters installed.
If I just think about using a provided Kubernetes cluster, I don't think
emails are good there. Eg:
- On Google Compute Engine, we cannot send emails directly, we need to use
an email gateway
- On Amazon, same thing
- On DigitalOcean, you can receive and send emails, but without using a
gateway, some of them could get lost (looking at Microsoft that sends
everything not whitelisted from a user to /dev/null instead of their SPAM
folder to give them a chance to know they are actually missing emails)
So using their Kubernetes cluster would still not be that simple to
configure.



From Pablo's response
> I do not think current capabilities of the server are well promoted so
better communicating the current features would be good to get more users
to try the server. Maybe a sort of marketing campaign releasing some smart
things people could quickly do with the server would be nice.

+1 for that. While configuring James for the first time by looking at the
config files and at the Mailet/Matcher code directly, I found so many nice
features that I thought I might add in the future, but that are actually
already there out of the box. Finding all these goodies needs to be easier
than reading the code.

Cheers

On Sun, 6 May 2018 at 17:14 pablo pita leira 
wrote:

> Well, I am no mail expert and I am not confronted with the distributed
> case. As a user, my modest use case is that I want to have control of my
> private email, and as I know Java, I like to be able to work with the
> server if I like to implement something.
>
> Regarding the first point, I need a solution to keep a few gigabytes of
> email which I can deploy on a Linux server easily. Ideally, I
> would want a James mail package that I could upgrade to new releases
> easily.
>
> Regarding the second one, the code base makes the product work, and
> gives the chance to adapt it to whatever case is needed among many.
> The code base is huge because of the great amount of choice, which
> makes understanding the parts more complex. Therefore, I am for
> simplifying the code base by removing less used options, or keeping them
> separate. And of course, documentation helps a new user get started
> with the server. As a developer, being able to quickly set up an
> environment to start hacking is welcome. In my case, I have no Docker
> experience, and I am used to running applications the old way.
>
> I think the simple mail server use case is important for single
> developers to try and test new features. The distributed use case with
> Kubernetes makes it a bit harder for me (I do not have experience with
> that technology). Requirements for companies are at another level,
> indeed. But many of the features worth marketing for James would appeal
> to both groups, single users and companies.
>
> I do not think current capabilities of the server are well promoted so
> better communicating the current features would be good to get more
> users to try the server. Maybe a sort of marketing campaign releasing
> some smart things people could quickly do with the server would be 

Re: Next Steps for James

2018-05-06 Thread pablo pita leira
Well, I am no mail expert and I am not confronted with the distributed 
case. As a user, my modest use case is that I want to have control of my 
private email, and as I know Java, I like to be able to work with the 
server if I like to implement something.


Regarding the first point, I need a solution to keep a few gigabytes of
email which I can deploy on a Linux server easily. Ideally, I
would want a James mail package that I could upgrade to new releases
easily.


Regarding the second one, the code base makes the product work, and
gives the chance to adapt it to whatever case is needed among many.
The code base is huge because of the great amount of choice, which
makes understanding the parts more complex. Therefore, I am for
simplifying the code base by removing less used options, or keeping them
separate. And of course, documentation helps a new user get started
with the server. As a developer, being able to quickly set up an
environment to start hacking is welcome. In my case, I have no Docker
experience, and I am used to running applications the old way.


I think the simple mail server use case is important for single
developers to try and test new features. The distributed use case with
Kubernetes makes it a bit harder for me (I do not have experience with
that technology). Requirements for companies are at another level,
indeed. But many of the features worth marketing for James would appeal
to both groups, single users and companies.


I do not think current capabilities of the server are well promoted so 
better communicating the current features would be good to get more 
users to try the server. Maybe a sort of marketing campaign releasing 
some smart things people could quickly do with the server would be nice.


That was my 2 cents.


On 06/05/18 at 08:31, Eric Charles wrote:

Hi James Community,

We have just discussed on the private list actions to further gain 
users and developers on the Apache James mail server.


The discussion started as we are slow to convert new contributors to 
committers and we have a slow release schedule.


I will summarize key points we have discussed. This is just a base to 
start the discussions and we really would love and need to hear your 
voice on this.


1. DOCS and TUTORIALS

- We have a new website but no easy tutorials.
- Which platform to use (readthedocs...?)
- Migrate/Close Wiki.

2. NOT ENOUGH HANDS? DROP NOT ENOUGH USED COMPONENTS

- We may have to make some choices: drop some Mailbox implementations
(JCR, HBase), some data backends (JCR, HBase, JDBC)


3. FULLY DISTRIBUTED

- Today, James's features (multiple mailbox implementations, configurable
mailets, JMAP access...) may not be enough to make the difference.
- It sounds like a fully distributed solution (potentially running on 
Kubernetes) could be a better differentiator. There is still work to 
achieve this (especially on the queuing level).


4. GSOC

- GSOC is a great way to bring in new contributors,
- Any other options to attract newbies?

5. COMMUNICATION

- We don't make enough use of the available communication channels: Twitter,
Apache Blog...
- We also don't communicate among ourselves about the plans, pipeline...
This thread is an action to fix that. Do we need to put a kanboard in place?


-
To unsubscribe, e-mail: server-dev-unsubscr...@james.apache.org
For additional commands, e-mail: server-dev-h...@james.apache.org







[jira] [Updated] (JAMES-2390) JMAP attachment performance issues

2018-05-06 Thread Tellier Benoit (JIRA)

 [ 
https://issues.apache.org/jira/browse/JAMES-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tellier Benoit updated JAMES-2390:
--
Attachment: Capture d’écran de 2018-05-06 19-35-02.png
Capture d’écran de 2018-05-06 19-32-31.png

> JMAP attachment performance issues
> --
>
> Key: JAMES-2390
> URL: https://issues.apache.org/jira/browse/JAMES-2390
> Project: James Server
>  Issue Type: New Feature
>  Components: cassandra, JMAP
>Affects Versions: master
>Reporter: Tellier Benoit
>Assignee: Antoine Duprat
>Priority: Major
>  Labels: perfomance
> Attachments: Capture d’écran de 2018-05-06 19-32-31.png, Capture 
> d’écran de 2018-05-06 19-35-02.png
>
>
> Most of the Cassandra failures are related to attachment downloads, and more
> precisely to attachment rights checking.
> Having a look at the attached screenshots:
>  - We can notice that a lot of warnings are generated by JMAP attachment
> downloads.
>  - The failure happens when reading metadata, in order to retrieve the list
> of referencing messages to resolve rights.
>  - Furthermore, we can notice that the failure is systematic for some
> attachments.
> I spent a bit of time this weekend analysing these (unexpected!) performance
> issues. I've found 2 intuitive performance improvements as well as one
> more complex one.
>  - 1. Upon checking whether a set of messages is accessible, the containing
> mailbox rights were checked on a per-message basis. This is sub-optimal as
> some messages might be in the same mailbox, whose rights will be needlessly
> checked several times.
> This change fits smoothly into the codebase: the tooling for checking rights
> once per mailbox is already implemented, just not used in that case.
>  - 2. Paging and asynchronous code don't combine well, as already proven by
> previous code. The mantra is *join then collect*. If the operations are done
> in the reverse order and entries exceed the paging size (~5000), an exception
> will be thrown by the Cassandra driver.
> This explains the systematic failures for some specific attachments... The
> fix is trivial, and I added a test demonstrating this.
>  - 3. The given logs suggest that we have high cardinality rows in our
> database (i.e. an attachment referenced by many messages), as the number of
> referencing messages exceeds 5000 (enough to trigger paging issues).
> Such a high cardinality has a massive read cost:
>  - Reading such a row is a complex operation
>  - Caching cannot help as the cache size per primary key is exceeded
>  - Rights would be resolved for each referencing message, generating an
> expensive read cascade.
> Note that deduplication is done at the Attachment level. By looking at the
> attachment names (cf. screenshots) we can notice these "high cardinality"
> attachments look like inlined images in signatures...
> The stance here is that deduplication is not a concern for attachments, but
> for blobs. We should push this constraint further down the stack.
> That way, each blob would be deduplicated (storage cost reduction, higher FS
> cache efficiency, etc.) while avoiding *wide rows*.
> We should ensure each newly generated AttachmentId is unique, then generate
> the BlobId from the blob's content, to avoid wide rows while keeping
> deduplication in place.
> Note that since this is done only for newly received messages, it can be done
> transparently, without the need for a migration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Created] (JAMES-2390) JMAP attachment performance issues

2018-05-06 Thread Tellier Benoit (JIRA)
Tellier Benoit created JAMES-2390:
-

 Summary: JMAP attachment performance issues
 Key: JAMES-2390
 URL: https://issues.apache.org/jira/browse/JAMES-2390
 Project: James Server
  Issue Type: New Feature
  Components: cassandra, JMAP
Affects Versions: master
Reporter: Tellier Benoit
Assignee: Antoine Duprat


Most of the Cassandra failures are related to attachment downloads, and more
precisely to attachment rights checking.

Having a look at the attached screenshots:
 - We can notice that a lot of warnings are generated by JMAP attachment
downloads.
 - The failure happens when reading metadata, in order to retrieve the list
of referencing messages to resolve rights.
 - Furthermore, we can notice that the failure is systematic for some
attachments.

I spent a bit of time this weekend analysing these (unexpected!) performance
issues. I've found 2 intuitive performance improvements as well as one
more complex one.

 - 1. Upon checking whether a set of messages is accessible, the containing
mailbox rights were checked on a per-message basis. This is sub-optimal as some
messages might be in the same mailbox, whose rights will be needlessly checked
several times.

This change fits smoothly into the codebase: the tooling for checking rights
once per mailbox is already implemented, just not used in that case.
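
A minimal Java sketch of the "check rights once per mailbox" idea follows. It is not the James implementation: `PerMailboxRightsCheck`, `MessageRef` and the `canReadMailbox` predicate are hypothetical names introduced for illustration, and the rights lookup is assumed to be an expensive call that only depends on the mailbox id.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class PerMailboxRightsCheck {

    // Hypothetical value type: a message id together with the id of its containing mailbox.
    static final class MessageRef {
        final String messageId;
        final String mailboxId;

        MessageRef(String messageId, String mailboxId) {
            this.messageId = messageId;
            this.mailboxId = mailboxId;
        }
    }

    // Returns messageId -> accessible, evaluating canReadMailbox at most once
    // per distinct mailbox instead of once per message.
    static Map<String, Boolean> accessibleMessages(List<MessageRef> messages,
                                                   Predicate<String> canReadMailbox) {
        Map<String, Boolean> rightsByMailbox = new HashMap<>();
        Map<String, Boolean> accessible = new LinkedHashMap<>();
        for (MessageRef message : messages) {
            // The lookup only runs the first time a given mailbox id is seen.
            boolean readable = rightsByMailbox.computeIfAbsent(message.mailboxId, canReadMailbox::test);
            accessible.put(message.messageId, readable);
        }
        return accessible;
    }

    public static void main(String[] args) {
        List<MessageRef> messages = List.of(
            new MessageRef("m1", "inbox"),
            new MessageRef("m2", "inbox"),
            new MessageRef("m3", "shared"));
        // Stand-in rights check: only "inbox" is readable in this example.
        System.out.println(accessibleMessages(messages, "inbox"::equals));
    }
}
```

With three messages spread over two mailboxes, the rights predicate is evaluated twice rather than three times; the saving grows with the number of messages sharing a mailbox.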

 - 2. Paging and asynchronous code don't combine well, as already proven by
previous code. The mantra is *join then collect*. If the operations are done
in the reverse order and entries exceed the paging size (~5000), an exception
will be thrown by the Cassandra driver.

This explains the systematic failures for some specific attachments... The fix
is trivial, and I added a test demonstrating this.
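
As a rough illustration only, here is one reading of *join then collect* using plain `CompletableFuture`: wait for each asynchronous result to complete, and only then gather the values. This does not use the Cassandra driver and does not reproduce its paging behaviour; `JoinThenCollect` and `fetchAsync` are hypothetical stand-ins for a per-entry asynchronous read.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

final class JoinThenCollect {

    // Hypothetical asynchronous lookup standing in for a per-entry database read.
    static CompletableFuture<String> fetchAsync(int id) {
        return CompletableFuture.supplyAsync(() -> "message-" + id);
    }

    static List<String> fetchAll(List<Integer> ids) {
        // Start every lookup first so that they can run concurrently.
        List<CompletableFuture<String>> pending = ids.stream()
            .map(JoinThenCollect::fetchAsync)
            .collect(Collectors.toList());
        // Join: wait for each future to complete. Collect: gather the values afterwards.
        return pending.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(List.of(1, 2, 3)));
    }
}
```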

 - 3. The given logs suggest that we have high cardinality rows in our database
(i.e. an attachment referenced by many messages), as the number of referencing
messages exceeds 5000 (enough to trigger paging issues).

Such a high cardinality has a massive read cost:
 - Reading such a row is a complex operation
 - Caching cannot help as the cache size per primary key is exceeded
 - Rights would be resolved for each referencing message, generating an
expensive read cascade.

Note that deduplication is done at the Attachment level. By looking at the
attachment names (cf. screenshots) we can notice these "high cardinality"
attachments look like inlined images in signatures...

The stance here is that deduplication is not a concern for attachments, but for
blobs. We should push this constraint further down the stack. That
way, each blob would be deduplicated (storage cost reduction, higher FS cache
efficiency, etc.) while avoiding *wide rows*.

We should ensure each newly generated AttachmentId is unique, then generate
the BlobId from the blob's content, to avoid wide rows while keeping
deduplication in place.
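
The split between a unique AttachmentId and a content-derived BlobId could be sketched as below. This is an assumption-laden illustration, not the James code: the class `AttachmentIds`, the choice of SHA-256 as the content hash and of a random UUID for the attachment id are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

final class AttachmentIds {

    // Content-addressed id: identical bytes always map to the same blob id,
    // so identical content can be stored once (deduplication at the blob level).
    static String blobIdFor(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(content);
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 should be available on every Java platform", e);
        }
    }

    // Unique per attachment: two messages carrying the same image get distinct
    // attachment ids, so no single attachment row accumulates thousands of references.
    static String newAttachmentId() {
        return UUID.randomUUID().toString();
    }

    public static void main(String[] args) {
        byte[] signatureImage = "same inlined image".getBytes(StandardCharsets.UTF_8);
        System.out.println("attachment A: " + newAttachmentId() + " -> blob " + blobIdFor(signatureImage));
        System.out.println("attachment B: " + newAttachmentId() + " -> blob " + blobIdFor(signatureImage));
    }
}
```

Both attachments in the usage example point at the same blob id while keeping their own attachment ids, which is the property the paragraph above asks for.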

Note that since this is done only for newly received messages, it can be done
transparently, without the need for a migration.






Next Steps for James

2018-05-06 Thread Eric Charles

Hi James Community,

We have just discussed on the private list actions to further gain users 
and developers on the Apache James mail server.


The discussion started as we are slow to convert new contributors to 
committers and we have a slow release schedule.


I will summarize key points we have discussed. This is just a base to 
start the discussions and we really would love and need to hear your 
voice on this.


1. DOCS and TUTORIALS

- We have a new website but no easy tutorials.
- Which platform to use (readthedocs...?)
- Migrate/Close Wiki.

2. NOT ENOUGH HANDS? DROP NOT ENOUGH USED COMPONENTS

- We may have to make some choices: drop some Mailbox implementations (JCR,
HBase), some data backends (JCR, HBase, JDBC)


3. FULLY DISTRIBUTED

- Today, James's features (multiple mailbox implementations, configurable
mailets, JMAP access...) may not be enough to make the difference.
- It sounds like a fully distributed solution (potentially running on 
Kubernetes) could be a better differentiator. There is still work to 
achieve this (especially on the queuing level).


4. GSOC

- GSOC is a great way to bring in new contributors,
- Any other options to attract newbies?

5. COMMUNICATION

- We don't make enough use of the available communication channels: Twitter,
Apache Blog...
- We also don't communicate among ourselves about the plans, pipeline... This
thread is an action to fix that. Do we need to put a kanboard in place?

