Possible bug in MimeBodyPartInputStream - should I report this?

Pieper, Aaron Thu, 14 Oct 2010 15:54:49 -0700

I'm using CXF 2.2.10 and I'm having a problem with some MTOM attachments. The 
problem first appeared when I upgraded from CXF 2.2.2 to CXF 2.2.3. After I call 
a service that returns an MTOM attachment, I sometimes get this error when I try 
to parse the attachment:


java.io.IOException: Underlying input stream returned zero bytes
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:268)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
        at java.io.InputStreamReader.read(InputStreamReader.java:167)
        at java.io.Reader.read(Reader.java:123)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1128)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1050)
        at org.apache.commons.io.IOUtils.toString(IOUtils.java:359)
        at com.pragmatics.AsyncUtils.messageToString(AsyncUtils.java:18)
 
The error only happens for some attachments, about 25% of them. It's a 
seemingly arbitrary 25%: it isn't the biggest 25%, or the ones that contain 
special characters. I was able to track this down to MimeBodyPartInputStream, 
which has some logic in processBuffer for reading the boundary. It goes like 
this:

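while ((boundaryIndex < boundary.length) && (value == boundary[boundaryIndex])) {
    // Ran out of data mid-boundary: return only the bytes before the
    // boundary start (which is 0 if the boundary began at off)
    if (!hasData(buffer, initialI, i + 1, off, len)) {
        return initialI - off;
    }
    // Advance and compare the next byte with the next boundary byte
    value = buffer[++i];
    boundaryIndex++;
}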

So, when MimeBodyPartInputStream finds the start of a boundary, it keeps 
reading until either there are no more characters to read or it has read the 
entire boundary. The problem with this logic is that it assumes the entire 
boundary will arrive in the same call to the underlying InputStream, and that 
assumption isn't always true. In my application, this MimeBodyPartInputStream 
is backed by an HttpURLConnection.HttpInputStream, which sometimes returns as 
few as 24 characters from a single read; I assume that's just how 
HttpInputStream works. If those 24 characters happen to end partway through one 
of the MIME boundaries, it can cause problems.
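To make the assumption concrete, here is a minimal sketch (not CXF's actual code) of boundary detection that does not depend on how many bytes each underlying read returns. The class name BoundaryAwareInputStream is hypothetical, and the sketch assumes the boundary is a plain byte sequence such as "\r\n--" followed by the boundary string. When it sees the first boundary byte, it keeps reading until it has a full boundary's worth of bytes or hits EOF, then uses a PushbackInputStream to put back any look-ahead that turned out not to be a boundary.

```java
import java.io.*;
import java.util.Arrays;

/**
 * Hypothetical sketch: yields the bytes of one part, then reports end of
 * stream when the boundary is reached. A boundary split across several
 * short reads of the underlying stream is still recognized.
 */
final class BoundaryAwareInputStream extends InputStream {
    private final PushbackInputStream in;
    private final byte[] boundary;
    private boolean done;

    BoundaryAwareInputStream(InputStream source, byte[] boundary) {
        if (boundary.length == 0) throw new IllegalArgumentException("empty boundary");
        // Pushback capacity of boundary.length is enough for one look-ahead.
        this.in = new PushbackInputStream(source, boundary.length);
        this.boundary = boundary.clone();
    }

    @Override
    public int read() throws IOException {
        if (done) return -1;
        int b = in.read();
        if (b < 0) { done = true; return -1; }
        if ((byte) b != boundary[0]) return b;

        // Possible boundary start: collect the remaining bytes, looping
        // because one read may return only part of what was requested.
        byte[] candidate = new byte[boundary.length];
        candidate[0] = (byte) b;
        int filled = 1;
        while (filled < boundary.length) {
            int n = in.read(candidate, filled, boundary.length - filled);
            if (n < 0) break; // EOF before a full boundary arrived
            filled += n;
        }
        if (filled == boundary.length && Arrays.equals(candidate, boundary)) {
            done = true; // boundary found: this part is finished
            return -1;
        }
        // Not a boundary: push the look-ahead back so it is scanned again.
        in.unread(candidate, 1, filled - 1);
        return b;
    }
}
```

Reading one byte at a time keeps the sketch short; a real implementation would scan whole buffers. Because only read() is overridden, the inherited read(byte[], int, int) returns -1 at the end of the part and otherwise at least one byte, never 0.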

One problem, which I'm running into here, is that MimeBodyPartInputStream's 
read(byte[], int, int) method returns 0 when the only bytes it read were part 
of the MIME boundary. Returning 0 breaks InputStream's contract, which says 
that read(byte[], int, int) either returns the number of bytes read (blocking 
until at least one byte is available) or returns -1 at end of stream; it may 
return 0 only when len is 0. InputStreamReader relies on this: its 
StreamDecoder throws the "Underlying input stream returned zero bytes" 
IOException shown above when the underlying read returns 0. There are probably 
other possible problems too; it seems possible that MimeBodyPartInputStream 
could misjudge whether or not it has hit a boundary in some cases. I haven't 
run into that problem, though.
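The zero-byte failure can be reproduced without CXF or HTTP. The sketch below uses a hypothetical ZeroOnceInputStream whose first read(byte[], int, int) call returns 0, mimicking a read that consumed only boundary bytes, and a hypothetical ZeroReadDemo helper that reads it through an InputStreamReader. Reading should fail with an IOException like the one in the stack trace, while the same data read from a well-behaved stream succeeds.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical stream that breaks the InputStream contract: its first
 * read(byte[], int, int) call with len > 0 returns 0 instead of
 * blocking, returning at least one byte, or returning -1.
 */
final class ZeroOnceInputStream extends InputStream {
    private final InputStream delegate;
    private boolean returnedZero;

    ZeroOnceInputStream(InputStream delegate) { this.delegate = delegate; }

    @Override
    public int read() throws IOException { return delegate.read(); }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (!returnedZero && len > 0) {
            returnedZero = true;
            return 0; // contract violation
        }
        return delegate.read(b, off, len);
    }
}

final class ZeroReadDemo {
    /** Reads the whole stream as UTF-8 text; returns the IOException thrown, or null on success. */
    static IOException readAllAsText(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[64];
            while (reader.read(buf) != -1) {
                // discard the characters; only the failure matters here
            }
            return null;
        } catch (IOException e) {
            return e;
        }
    }
}
```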

I was hesitant to file a tracker issue for this, since I don't 100% understand 
all of the pieces involved. Because the bug depends on HttpInputStream's 
behavior, I haven't been able to create a test case for it, short of doing 
something odd like writing my own InputStream class that behaves in unusual 
ways (for example, returning only a few bytes per read). Should I submit it 
anyway? Fortunately it only causes problems in my test code, but it seems like 
an important issue.
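For a test case, one option is exactly that kind of custom InputStream: a hypothetical ShortReadInputStream that caps every read at a fixed number of bytes, imitating HttpInputStream returning 24 bytes at a time, plus a hypothetical drainStrictly helper that fails if any read returns 0. This is a sketch that assumes the stream feeding the attachment parser can be substituted in a test; how to plug it into CXF's attachment handling is not shown here and would depend on the CXF version.

```java
import java.io.*;

/** Hypothetical test helper: caps every read(byte[], int, int) at maxChunk bytes. */
final class ShortReadInputStream extends FilterInputStream {
    private final int maxChunk;

    ShortReadInputStream(InputStream in, int maxChunk) {
        super(in);
        if (maxChunk < 1) throw new IllegalArgumentException("maxChunk must be >= 1");
        this.maxChunk = maxChunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Ask the wrapped stream for at most maxChunk bytes per call.
        return super.read(b, off, Math.min(len, maxChunk));
    }
}

final class ContractCheck {
    /**
     * Reads the stream to EOF and throws if any read(byte[], int, int)
     * with len > 0 returns 0. Returns the total number of bytes read.
     */
    static long drainStrictly(InputStream in, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            if (n == 0) throw new IOException("read returned 0 for len=" + buf.length);
            total += n;
        }
        return total;
    }
}
```

Passing the attachment bytes through ShortReadInputStream with a small maxChunk (for example 24) before they reach the MIME parser, then draining the resulting part with drainStrictly, should make the failure reproducible instead of depending on network timing.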
 
- Aaron