Memory, Large messages and SharedInputStreams (Was: Apache James / JAMES-134)

Stefano Bagnara Sun, 22 Jan 2006 07:19:38 -0800

Ofir Gross wrote:

Hi,
I share my thoughts:

I appreciate this. Sorry it has taken so long to reply, but I've been busy with the spool synchronization problems and with my day job.

I'd like to keep the conversation on the dev list because it is archived for future reference and somebody else might eventually participate in the brainstorming.

 There is a problem with the constructor:

   public MimeMessage(MimeMessage source) throws MessagingException {
        [...]
        source.writeTo(bos);       <<----------Problem is here
[...]
Because it __always__ saves the message into an in-memory byte array (bos). The only way to avoid it is either __not__ to call that constructor, or to subclass it like I did, in order to override this constructor. When I overrode this constructor, I saved to a temporary file instead of a byte array.

Ok, that's clear: we must remove calls to that constructor from our code and use new MimeMessage(Session s, InputStream is) instead.
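
As a rough illustration of the temporary-file idea (a sketch only, not code from James or JavaMail): the names MessageWriter and TempFileSpool are made up for this example, and MessageWriter merely stands in for anything that has a writeTo(OutputStream) method.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for anything with a writeTo(OutputStream) method, such as a message. */
interface MessageWriter {
    void writeTo(OutputStream out) throws IOException;
}

/** Hypothetical helper: spools a message to a temporary file instead of a byte array. */
final class TempFileSpool {
    private TempFileSpool() {}

    /** Streams the source into a new temp file; heap use is bounded by the write buffer. */
    static Path spool(MessageWriter source) throws IOException {
        Path tmp = Files.createTempFile("mimecopy", ".tmp");
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(tmp))) {
            source.writeTo(out);
        }
        return tmp; // the caller is responsible for deleting the file when done
    }
}
```

The file still has to be cleaned up by whoever owns the copy, which is one reason reusing a stream that already exists on disk looks more attractive than making a second copy.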

Furthermore we should pass a SharedInputStream to that constructor.

You suggested using the existing repository instead, which is a good idea. But two questions:
1. How do I locate the message in the repository from the MimeMessage "source"?

I think this is not possible. You can get the source starting from the Mail object in the repository, but not vice versa.

Why do you need that?

2. Does every MimeMessage exist in the repository?

Not every MimeMessage: when we create a new MimeMessage from scratch (as the bounce mailet does), we create the MimeMessage first and then send it to James, which stores it in the spool. BTW, most of the time when we deal with big messages we have probably received them from the SMTPServer and already have them in stream repositories or mail repositories.

When the SMTPServer receives a new message it currently creates a new MailImpl using the "public MailImpl(String name, MailAddress sender, Collection recipients, InputStream messageIn)" constructor. That constructor creates a new MimeMessageSource using the InputStream from the socket (new MimeMessageInputStreamSource(name, messageIn);). MimeMessageInputStreamSource currently stores the message in a .m64 file and later provides the stream to subsequent users:


public synchronized InputStream getInputStream() throws IOException {
   return new BufferedInputStream(new FileInputStream(file));
}

This is the first step.

Maybe we should start by changing MimeMessageInputStreamSource to provide a shared input stream?

Then, when a message is stored in a dbfile repository, the header is stored in the body field of the repository db table while the body of the message is stored in a stream repository.

MimeMessageJDBCSource is the object that handles this behaviour, so every time we read a message from the db repository we use this object.
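
To make the idea concrete, here is a minimal sketch of a file-backed stream that can hand out independent views of the same file without copying it. It is only an illustration under assumptions: SharedStream is a local stand-in that mirrors the two methods of javax.mail.internet.SharedInputStream so the example compiles without JavaMail, and FileSharedStream is a made-up name, not an existing James class.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

/** Local stand-in mirroring javax.mail.internet.SharedInputStream (assumption for this sketch). */
interface SharedStream {
    long getPosition();
    InputStream newStream(long start, long end);
}

/** Hypothetical file-backed stream whose sub-streams read the same file without copying it. */
final class FileSharedStream extends InputStream implements SharedStream {
    private final File file;
    private final long start;      // absolute offset of the first byte of this view
    private final long end;        // absolute offset just past the last byte of this view
    private long pos;              // absolute offset of the next byte to read
    private RandomAccessFile raf;  // opened lazily, one handle per view

    FileSharedStream(File file) {
        this(file, 0, file.length());
    }

    private FileSharedStream(File file, long start, long end) {
        this.file = file;
        this.start = start;
        this.end = end;
        this.pos = start;
    }

    /** Position relative to the start of this view. */
    @Override
    public long getPosition() {
        return pos - start;
    }

    /** Offsets are relative to this view; an end of -1 means "up to the end of this view". */
    @Override
    public InputStream newStream(long s, long e) {
        long absStart = start + s;
        long absEnd = (e == -1) ? end : start + e;
        if (s < 0 || absStart > absEnd || absEnd > end) {
            throw new IllegalArgumentException("range outside of this stream");
        }
        return new FileSharedStream(file, absStart, absEnd);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xff);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (pos >= end) return -1;
        if (raf == null) {
            raf = new RandomAccessFile(file, "r");
            raf.seek(pos);
        }
        int n = raf.read(b, off, (int) Math.min(len, end - pos));
        if (n > 0) pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        if (raf != null) {
            raf.close();
            raf = null;
        }
    }
}
```

This sketch opens one file handle per view and does no buffering of its own, which keeps it short; a real implementation would probably want to share the handle and buffer reads.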

We call this method of that object to get the InputStream:
---
public synchronized InputStream getInputStream() throws IOException
---

and it creates a new SequenceInputStream from a ByteArrayInputStream of the header and the input stream provided by streamrepository.get().

---
InputStream in = new ByteArrayInputStream(headers);
if (sr != null) {
   in = new SequenceInputStream(in, sr.get(key));
}
---

The implementation of streamrepository.get() used here is in File_Persistent_Stream_Repository, and you can see it uses a

---

final ResettableFileInputStream stream = new ResettableFileInputStream(getFile( key ) );

---

Maybe we should work on the above files so that they can always provide shared input streams.

The other constructor, "public MimeMessage(Session session, InputStream is)", will be OK if the InputStream provided to it is a SharedInputStream. That can be done by tracking down calls to this constructor (by grep "new MimeMessage") and wrapping the provided input stream with a wrapper that implements SharedInputStream. I did that grep and looked at the results, and this looks possible.


The main one is in MimeMessageWrapper.loadMessage():
----
in = source.getInputStream();
headers = loadHeaders(in);

ByteArrayInputStream headersIn = new ByteArrayInputStream(headers.toByteArray());

in = new SequenceInputStream(headersIn, in);

message = new MimeMessage(session, in);
----

source is the MimeMessageJDBCSource we have already seen, and we could change it to provide a SharedInputStream (to avoid a copy to a new file). You see a similar pattern as before: we create a new SequenceInputStream from a ByteArrayInputStream and the stream provided by the source. We should find a way to have a SharedInputStream at the end of all these steps, without needing to copy the stream to a new file.

We could introduce your proposed wrapper in only a few mailets, but we should avoid using it in our core because we already have 2 wrappers over the MimeMessage and we can get a better/cleaner result by following the path I describe in this mail.
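
One possible shape for "a SharedInputStream at the end of all these steps" is a composite that keeps the headers in memory and delegates any range that falls entirely inside the body to the body's own shared stream. This is just a sketch under assumptions: SharedStream is a local stand-in mirroring javax.mail.internet.SharedInputStream, BytesShared is an in-memory placeholder for what would really be a file-backed body, and HeaderBodyShared is a made-up name. It also assumes the consumer asks for newStream(getPosition(), -1) once it has read the headers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;

/** Local stand-in mirroring javax.mail.internet.SharedInputStream (assumption for this sketch). */
interface SharedStream {
    long getPosition();
    InputStream newStream(long start, long end);
}

/** In-memory shared stream, used here only as a placeholder for a file-backed body. */
final class BytesShared extends ByteArrayInputStream implements SharedStream {
    private final int base; // offset in buf where this view starts

    BytesShared(byte[] data) { this(data, 0, data.length); }

    private BytesShared(byte[] data, int off, int len) {
        super(data, off, len);
        this.base = off;
    }

    @Override public long getPosition() { return pos - base; }

    @Override public InputStream newStream(long start, long end) {
        int s = base + (int) start;
        int e = (end == -1) ? count : base + (int) end;
        return new BytesShared(buf, s, e - s); // same backing array, no copy
    }
}

/** Hypothetical composite: in-memory headers followed by a shared body stream. */
final class HeaderBodyShared extends InputStream implements SharedStream {
    private final byte[] headers;
    private final SharedStream body;
    private final InputStream in; // sequential view: headers, then body
    private long pos;

    HeaderBodyShared(byte[] headers, SharedStream body) {
        this.headers = headers;
        this.body = body;
        this.in = new SequenceInputStream(new ByteArrayInputStream(headers), body.newStream(0, -1));
    }

    @Override public int read() throws IOException {
        int b = in.read();
        if (b >= 0) pos++;
        return b;
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        int n = in.read(b, off, len);
        if (n > 0) pos += n;
        return n;
    }

    @Override public long getPosition() { return pos; }

    @Override public InputStream newStream(long start, long end) {
        long h = headers.length;
        if (start >= h) {
            // Entirely inside the body: delegate, so the result stays shared and nothing is copied.
            return body.newStream(start - h, end == -1 ? -1 : end - h);
        }
        // The range starts in the headers: slice the header bytes, then append body bytes if needed.
        int hEnd = (end == -1 || end > h) ? (int) h : (int) end;
        InputStream head = new ByteArrayInputStream(headers, (int) start, hEnd - (int) start);
        if (end != -1 && end <= h) return head;
        return new SequenceInputStream(head, body.newStream(0, end == -1 ? -1 : end - h));
    }
}
```

Only ranges that start inside the headers fall back to a plain SequenceInputStream here; those are small, so that seems an acceptable trade-off for a sketch.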

Maybe if all InputStreams provided to that constructor are SharedInputStreams, then every MimeMessage will have a contentStream that implements SharedInputStream as well? If that is true, then the overriding constructor for "MimeMessage(MimeMessage source)" could use that stream, and it would not need to locate the message in the repository. But I am not sure whether it is true, because the constructor "MimeMessage(Session session)" doesn't touch content or contentStream, so a message constructed that way will have neither, and there will be no InputStream for it if it is passed to the MimeMessage(MimeMessage source) constructor.

When the message is constructed by someone else using new MimeMessage(session), we have no control over how it is handled. IMHO we should start by optimizing our own operations; then we could write a few docs on how to write optimized mailets.

A further step would be to use streaming operations for "db"-only repositories as well. We currently use blob operations that write and read full byte arrays, but most newer databases and JDBC drivers correctly support streams for writing and reading large contents. There would be one more issue here, because they (obviously) don't provide SharedInputStreams but their own InputStreams, and I don't know how we could implement it. We should probably use your proposed wrapper if we need that, or otherwise create a wrapper over the original InputStream that implements SharedInputStream and simply retrieves a new InputStream on every call to newStream (returning, in turn, a new SharedInputStream).

The worst part of this work is that it is very difficult to unit-test memory issues: any idea on how to test it?
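
The "retrieve a new InputStream at every call to newStream" variant could look roughly like this. Again a sketch under assumptions: SharedStream is a local stand-in mirroring javax.mail.internet.SharedInputStream, StreamSource is a hypothetical callback that reopens the data (for a db repository it might re-run the query), and ReopeningSharedStream is a made-up name.

```java
import java.io.IOException;
import java.io.InputStream;

/** Local stand-in mirroring javax.mail.internet.SharedInputStream (assumption for this sketch). */
interface SharedStream {
    long getPosition();
    InputStream newStream(long start, long end);
}

/** Hypothetical callback that can open the same data again from the beginning. */
interface StreamSource {
    InputStream open() throws IOException;
}

/** Hypothetical wrapper: every view reopens the source and skips to its own start offset. */
final class ReopeningSharedStream extends InputStream implements SharedStream {
    private final StreamSource source;
    private final long start; // absolute offset where this view begins
    private final long end;   // absolute offset where it stops, or -1 for "until the source ends"
    private InputStream in;   // opened lazily on first read
    private long pos;         // bytes consumed from this view

    ReopeningSharedStream(StreamSource source) {
        this(source, 0, -1);
    }

    private ReopeningSharedStream(StreamSource source, long start, long end) {
        this.source = source;
        this.start = start;
        this.end = end;
    }

    @Override public long getPosition() { return pos; }

    /** Offsets are relative to this view; the new view is clamped to this view's end. */
    @Override public InputStream newStream(long s, long e) {
        long absEnd = (e == -1) ? end : start + e;
        if (end != -1 && absEnd > end) absEnd = end;
        return new ReopeningSharedStream(source, start + s, absEnd);
    }

    private InputStream stream() throws IOException {
        if (in == null) {
            in = source.open();
            long toSkip = start;
            while (toSkip > 0) {
                long n = in.skip(toSkip);
                if (n <= 0) {
                    if (in.read() < 0) break; // source is shorter than 'start'
                    n = 1;
                }
                toSkip -= n;
            }
        }
        return in;
    }

    @Override public int read() throws IOException {
        if (end != -1 && start + pos >= end) return -1;
        int b = stream().read();
        if (b >= 0) pos++;
        return b;
    }

    @Override public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (end != -1) {
            long left = end - start - pos;
            if (left <= 0) return -1;
            len = (int) Math.min(len, left);
        }
        int n = stream().read(b, off, len);
        if (n > 0) pos += n;
        return n;
    }

    @Override public void close() throws IOException {
        if (in != null) in.close();
    }
}
```

The price of this approach is that every view has to skip from the beginning of the source to reach its start offset, so it trades repeated reads for bounded memory.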


Stefano
