Your analysis seems plausible to me. Testing this for sure would require
finding a way to inject a stream that delivers exactly the right
wrong stuff. That would be an entertaining programming project. Dan,
what do you think?
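A minimal sketch of such an injected stream, using plain Java and no CXF classes. The class `ChunkedInputStream`, its `readChunks` helper, and the sample body and boundary string are all hypothetical, made up for illustration. The wrapper caps every `read(byte[], int, int)` call at a fixed size, imitating the short reads (as few as 24 characters) that Aaron reports from `HttpURLConnection`'s stream. With a 24-byte cap, the boundary in the sample body is split across two reads, which is the situation his analysis says `MimeBodyPartInputStream` doesn't handle.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical test helper: hands back at most maxChunk bytes per read() call. */
public class ChunkedInputStream extends FilterInputStream {
    private final int maxChunk;

    public ChunkedInputStream(InputStream in, int maxChunk) {
        super(in);
        if (maxChunk <= 0) throw new IllegalArgumentException("maxChunk must be positive");
        this.maxChunk = maxChunk;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        // Cap the request; the wrapped stream may still return fewer bytes.
        return super.read(b, off, Math.min(len, maxChunk));
    }

    /** Drains an ASCII string through a ChunkedInputStream, recording what each read returned. */
    public static List<String> readChunks(String data, int maxChunk) {
        List<String> chunks = new ArrayList<>();
        try (InputStream stream = new ChunkedInputStream(
                new ByteArrayInputStream(data.getBytes(StandardCharsets.US_ASCII)), maxChunk)) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = stream.read(buf, 0, buf.length)) != -1) {
                chunks.add(new String(buf, 0, n, StandardCharsets.US_ASCII));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Hypothetical body: 20 payload bytes, CRLF, then a made-up MIME boundary.
        String body = "attachment-data-0123\r\n--uuid:boundary-abc\r\n";
        // With 24-byte reads, "--" ends the first chunk and the rest of the boundary starts the second.
        for (String chunk : readChunks(body, 24)) {
            System.out.println(chunk.replace("\r\n", "\\r\\n"));
        }
    }
}
```

Actually driving `MimeBodyPartInputStream` with this wrapper would need CXF on the classpath; that wiring is left out of the sketch.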

On Thu, Oct 14, 2010 at 6:56 PM, Pieper, Aaron <[email protected]> wrote:
> I'm using CXF 2.2.10. I'm having a problem with some MTOM attachments. It 
> started when I upgraded from CXF 2.2.2 to CXF 2.2.3. The bug is that after 
> calling a service which returned an MTOM attachment, when I try to parse the 
> attachment, I sometimes get an error:
>
> java.io.IOException: Underlying input stream returned zero bytes
>        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:268)
>        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:306)
>        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:158)
>        at java.io.InputStreamReader.read(InputStreamReader.java:167)
>        at java.io.Reader.read(Reader.java:123)
>        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1128)
>        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1104)
>        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1050)
>        at org.apache.commons.io.IOUtils.toString(IOUtils.java:359)
>        at com.pragmatics.AsyncUtils.messageToString(AsyncUtils.java:18)
>
> The error only happens for some attachments, about 25% of them. It's a
> seemingly arbitrary 25%: it's not the biggest 25%, or the ones that
> have special characters. I was able to track this down to
> MimeBodyPartInputStream. MimeBodyPartInputStream has some logic in
> processBuffer for reading the boundary. It goes like this:
>
> while ((boundaryIndex < boundary.length) && (value == boundary[boundaryIndex])) {
>     if (!hasData(buffer, initialI, i + 1, off, len)) {
>         return initialI - off;
>     }
>     value = buffer[++i];
>     boundaryIndex++;
> }
>
> So, basically, when MimeBodyPartInputStream finds the start of a boundary, it
> reads from the stream until either there are no more characters to read or
> it has read the entire boundary. The problem with this logic is that it
> assumes the entire boundary will be read in the same call to the underlying
> InputStream. This assumption isn't always true. Specifically, when I'm
> fetching an attachment in my application, this MimeBodyPartInputStream is
> backed by an HttpURLConnection.HttpInputStream. This HttpInputStream
> sometimes fetches as few as 24 characters; I guess that's just how
> HttpInputStream works. But if one of these 24-character reads happens to
> end partway through a MIME boundary, it can cause problems.
>
> One problem, which I'm running into here, is that
> MimeBodyPartInputStream's read(byte[], int, int) method returns 0, since the
> only bytes that were read were parts of the MIME boundary. In returning 0, it
> breaks InputStream's contract, which states that when len is positive the
> method returns the number of bytes read (at least 1), or -1 if the end of
> the stream has been reached. There are probably other problems too: it seems
> possible that MimeBodyPartInputStream might misjudge whether or not it has
> hit a boundary in some cases. I haven't run into that problem, though.
>
> I was hesitant to file a tracker issue for this, since I don't 100%
> understand all of the pieces involved. Since the bug depends on
> HttpInputStream, I haven't really been able to create a test case for it,
> short of writing my own InputStream class that deliberately misbehaves.
> Should I submit it anyway? Fortunately it only causes problems in my test
> code, but it seems like an important issue.
>
> - Aaron
>
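The last link in the stack trace can be checked with plain JDK classes. A minimal sketch, assuming nothing from CXF: the hypothetical `ZeroReturningStream` always returns 0 from `read(byte[], int, int)`, standing in for `MimeBodyPartInputStream` on a call where only boundary bytes were consumed. Commons IO's `IOUtils.toString(InputStream)` copies through an `InputStreamReader`, so the sketch wraps the stream the same way; the JDK's decoder rejects a 0 return with the same `IOException` Aaron reports.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class ZeroByteReadDemo {

    /** Hypothetical misbehaving stream: returns 0 for a positive len, which InputStream forbids. */
    static final class ZeroReturningStream extends InputStream {
        @Override
        public int read() {
            return -1; // single-byte reads just report end of stream
        }

        @Override
        public int read(byte[] b, int off, int len) {
            // Contract violation: for len > 0 this must return at least 1, or -1 at end of stream.
            return 0;
        }
    }

    /** Returns the message of the IOException raised while reading, or null if none was raised. */
    public static String readAndCaptureError() {
        try (Reader reader = new InputStreamReader(new ZeroReturningStream(), StandardCharsets.UTF_8)) {
            char[] buf = new char[64];
            reader.read(buf, 0, buf.length);
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Should print the message from the stack trace: "Underlying input stream returned zero bytes".
        System.out.println(readAndCaptureError());
    }
}
```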
