Hello all.
The streaming approach turned out to be a terrible idea!
ChunkedStream was doing massive blocking reads on the event loop:
InputStream is just not the right abstraction for non-blocking reads.
Rewriting everything to support something like Flux<ByteBuffer> looks
doable but requires a major refactoring. Moreover, the AES
transformations would need to be adapted to that new format...
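To give an idea of what that adaptation would entail, here is a
minimal sketch (an assumption of mine, not code from any of the PRs):
chunk-wise AES decryption over a Flux<ByteBuffer> with a stateful
Cipher, using AES/CTR so that chunk boundaries need no padding
alignment.

    import java.nio.ByteBuffer;
    import java.security.GeneralSecurityException;
    import javax.crypto.Cipher;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.IvParameterSpec;
    import reactor.core.publisher.Flux;

    class AesFluxSketch {
        // Chunk-wise AES decryption: the Cipher is stateful, so chunks
        // must be processed strictly in order. The actual S3 AES scheme
        // used by James may differ.
        static Flux<ByteBuffer> decrypt(Flux<ByteBuffer> encrypted,
                                        SecretKey key, byte[] iv)
                throws GeneralSecurityException {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
            return encrypted.map(chunk -> {
                byte[] in = new byte[chunk.remaining()];
                chunk.get(in);
                byte[] out = cipher.update(in);
                return ByteBuffer.wrap(out == null ? new byte[0] : out);
            });
        }
    }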
https://github.com/apache/james-project/pull/2149 is taking the exact
opposite approach:
- accept that we represent things internally as byte[]
- and carry that information to the Netty stack so that it can adapt
accordingly
This means that a big IMAP FETCH would have a memory overhead of a few
messages, which is likely acceptable.
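For illustration, a rough sketch of what carrying byte[] to the Netty
stack can look like (the names and flush policy are illustrative, not
taken from PR 2149):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.Unpooled;
    import io.netty.channel.Channel;

    class ByteArrayWriteSketch {
        static void writeMessage(Channel channel, byte[] messageBody) {
            // wrappedBuffer shares the byte[] storage: staying on byte[]
            // does not force an extra copy at the Netty boundary.
            ByteBuf buf = Unpooled.wrappedBuffer(messageBody);
            channel.write(buf);
            if (!channel.isWritable()) {
                // Outbound buffer crossed the high watermark: flush
                // before queueing further messages.
                channel.flush();
            }
        }
    }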
If everyone agrees with this, I would carry on with this approach.
Best regards,
Benoit TELLIER
On 20/03/2024 16:40, Benoit TELLIER wrote:
Hello all,
Today I put together a POC where the following IMAP command
a0 FETCH 1:* (BODY[])
would directly stream content from the S3 storage without storing the
full input in a byte array.
I tested it a bit manually on top of the S3 AES implementation.
Link: https://github.com/apache/james-project/pull/2137
While working on this I stumbled across ReactorUtils::toInputStream,
which does not implement available() (it always returns 0) and always
blocks when trying to access the next chunk of data.
This defeats most of the benefits of Netty's ChunkedStream
abstraction: a reliable available() method allows polling the stream
on the event loop and sending data as it becomes ready.
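To illustrate, here is a simplified rendition of what ChunkedStream
does per chunk (my paraphrase, not a copy of the Netty source): when
available() returns 0, a full chunk is read anyway, and that read
blocks the event loop until the upstream (here, S3) delivers the
bytes.

    import java.io.IOException;
    import java.io.InputStream;
    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.ByteBufAllocator;

    class ChunkReadSketch {
        static ByteBuf readChunk(ByteBufAllocator allocator,
                                 InputStream in, int chunkSize)
                throws IOException {
            int toRead = in.available() <= 0
                ? chunkSize
                : Math.min(chunkSize, in.available());
            ByteBuf buffer = allocator.buffer(toRead);
            boolean release = true;
            try {
                // Blocks whenever fewer than toRead bytes are ready.
                buffer.writeBytes(in, toRead);
                release = false;
                return buffer;
            } finally {
                if (release) {
                    buffer.release();
                }
            }
        }
    }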
Feeling brave, I decided to experiment with a subscriber bridging the
gap between the NIO world and the Reactor world.
This work is incomplete, as usage in real-life situations causes
crashes.
Link: https://github.com/apache/james-project/pull/2138
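The idea, roughly sketched (my reconstruction, not the code of PR
2138): a subscriber pulls chunks one at a time and queues them, so
that an InputStream facade can answer available() without blocking.

    import java.nio.ByteBuffer;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import org.reactivestreams.Subscription;
    import reactor.core.publisher.BaseSubscriber;

    class QueueBackedSubscriber extends BaseSubscriber<ByteBuffer> {
        private final Queue<ByteBuffer> chunks = new ConcurrentLinkedQueue<>();
        private volatile boolean completed;

        @Override
        protected void hookOnSubscribe(Subscription subscription) {
            request(1); // pull-based: ask for one chunk at a time
        }

        @Override
        protected void hookOnNext(ByteBuffer chunk) {
            chunks.add(chunk);
            // The consumer calls request(1) again once this one drains.
        }

        @Override
        protected void hookOnComplete() {
            completed = true;
        }

        // What an InputStream facade's available() can delegate to: a
        // non-blocking count of the bytes already buffered.
        int availableNow() {
            return chunks.stream().mapToInt(ByteBuffer::remaining).sum();
        }

        boolean isTerminated() {
            return completed && chunks.isEmpty();
        }
    }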
Another consideration is the need to increase the S3 connection
count, as connections are going to stay open longer...
These are advanced topics and I believe they are crucial to making
Apache James a better IMAP server...
Best regards,
Benoit TELLIER
On 19/03/2024 16:45, Benoit TELLIER wrote:
Hello all,
As I already wrote here, I encountered significant issues during a
recent deployment [1]
[1]
https://www.mail-archive.com/server-dev@james.apache.org/msg73848.html
This led to [2], implementing backpressure for IMAP FETCH, which
mitigated the issue.
[2] https://issues.apache.org/jira/projects/JAMES/issues/JAMES-3997
But not well enough: as the count of users/mails increased, I ended
up this weekend with new OutOfMemory exceptions related to IMAP
usage.
I thus took the time to write a test regarding backpressure [3] (not
reading the socket, and instrumenting the mailbox layer to see what
is actually pulled) and started playing with some related Netty
settings [4].
[3] https://github.com/apache/james-project/pull/2128
[4] https://github.com/apache/james-project/pull/2129
However, the high/low write buffer watermarks seem ineffective: it
takes dozens of multi-MB messages to be written before the
back-pressure kicks in. And the default values (32KB/64KB) are very
low compared to a problematic message size. Netty expertise is more
than welcome here!
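For reference, the kind of tuning experimented with in [4] looks like
this (the values are illustrative):

    import io.netty.bootstrap.ServerBootstrap;
    import io.netty.channel.ChannelOption;
    import io.netty.channel.WriteBufferWaterMark;

    class WatermarkSketch {
        // Raise the write-buffer watermarks so that isWritable() flips
        // earlier relative to multi-MB messages.
        static ServerBootstrap configure(ServerBootstrap bootstrap) {
            return bootstrap.childOption(
                ChannelOption.WRITE_BUFFER_WATER_MARK,
                // Defaults are 32KB low / 64KB high, far below one
                // large message.
                new WriteBufferWaterMark(1024 * 1024, 4 * 1024 * 1024));
        }
    }

Note that the watermarks only drive Channel::isWritable; they do not
block writes by themselves, so the writing code still has to consult
writability (or react to channelWritabilityChanged) before pushing
more data.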
Another problem is that, as of today, message content is loaded as a
byte array by the mailbox layer. For a request like IMAP FETCH
(BODY[]) this is inefficient: we could instead stream it straight
from the object store (even applying backpressure from within a
single message write). Yet this would require a major refactoring of
the mailbox / IMAP code, as well as bullet-proof lifecycle management
for connections / temporary files.
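As a sketch of what backpressure from within a single message write
could look like (the Flux<ByteBuf> of chunks is hypothetical, e.g.
what an object store read could expose): the next chunk is only
requested from upstream once the previous write has been flushed, so
the socket paces the object store.

    import io.netty.buffer.ByteBuf;
    import io.netty.channel.Channel;
    import reactor.core.publisher.Flux;
    import reactor.core.publisher.Mono;

    class StreamingWriteSketch {
        static Mono<Void> streamBody(Channel channel, Flux<ByteBuf> chunks) {
            return chunks
                // prefetch of 1: a single chunk in flight at a time
                .concatMap(chunk -> writeAndFlush(channel, chunk), 1)
                .then();
        }

        private static Mono<Void> writeAndFlush(Channel channel, ByteBuf chunk) {
            return Mono.create(sink ->
                channel.writeAndFlush(chunk).addListener(future -> {
                    if (future.isSuccess()) {
                        sink.success();
                    } else {
                        sink.error(future.cause());
                    }
                }));
        }
    }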
Thoughts?
Benoit