Hi guys,

I know that protocol-httpclient is not recommended because of its known
problems, but I don't have much choice: I need authentication support, as I
am sure a few other people do as well.

I recently reported a problem with overly aggressive de-duplication. In the
example I had, I traced the problem to an empty content field. Digging
further, I found this in httpclient/HttpResponse.java (lines 126-130):

        while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
            && totalRead + bufferFilled < contentLength) {
          totalRead += bufferFilled;
          out.write(buffer, 0, bufferFilled);
        }

This should be changed to:

        while ( ( bufferFilled = in.read( buffer, 0, buffer.length ) ) != -1 )
        {
          // Write the whole chunk, or only the part that still fits in contentLength
          int toWrite = totalRead + bufferFilled < contentLength ?
                        bufferFilled : contentLength - totalRead ;
          totalRead += bufferFilled;
          out.write( buffer, 0, toWrite ) ;
          if ( totalRead >= contentLength ) break ;
        }

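Otherwise, the last portion read is quite often not stored. The original loop
checks totalRead + bufferFilled < contentLength only after in.read() has
already filled the buffer, so the chunk that brings the total up to (or past)
contentLength is discarded instead of written. Obviously, this causes
problems, especially for small documents, where the last portion read is the
only one, and for PDF documents and other formats that are sensitive to
truncation.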
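To see the effect, here is a minimal, self-contained sketch that runs the
original loop and a corrected one over an in-memory stream. The class and
method names (TruncationDemo, copyOriginal, copyFixed) are hypothetical. The
4-byte buffer is chosen only to force several reads. contentLength is assumed
to equal the real body size, as it would when it comes from a correct
Content-Length header. The corrected loop uses Math.min, which is equivalent
to the ternary form.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class TruncationDemo {

  // Same shape as the loop in HttpResponse.java: the length check runs after
  // the read, so the chunk that reaches contentLength is never written.
  static byte[] copyOriginal(InputStream in, int contentLength) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4];
    int bufferFilled;
    int totalRead = 0;
    while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1
        && totalRead + bufferFilled < contentLength) {
      totalRead += bufferFilled;
      out.write(buffer, 0, bufferFilled);
    }
    return out.toByteArray();
  }

  // Corrected loop: always write what was read, clipped to contentLength.
  static byte[] copyFixed(InputStream in, int contentLength) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4];
    int bufferFilled;
    int totalRead = 0;
    while ((bufferFilled = in.read(buffer, 0, buffer.length)) != -1) {
      int toWrite = Math.min(bufferFilled, contentLength - totalRead);
      out.write(buffer, 0, toWrite);
      totalRead += toWrite;
      if (totalRead >= contentLength) {
        break;
      }
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] page = "Hello, world".getBytes(StandardCharsets.US_ASCII); // 12 bytes: three 4-byte reads
    byte[] tiny = "Hi".getBytes(StandardCharsets.US_ASCII);           // 2 bytes: a single read

    System.out.println(copyOriginal(new ByteArrayInputStream(page), page.length).length); // 8
    System.out.println(copyFixed(new ByteArrayInputStream(page), page.length).length);    // 12
    System.out.println(copyOriginal(new ByteArrayInputStream(tiny), tiny.length).length); // 0
    System.out.println(copyFixed(new ByteArrayInputStream(tiny), tiny.length).length);    // 2
  }
}
```

The tiny document shows the worst case. The only chunk is also the last one,
so the original loop returns an empty body.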

This problem explains a large share of the false de-duplication cases, as well
as parsing errors with symptoms of truncated content, but it does not seem to
explain all of them.
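A short sketch can illustrate why an empty body leads to false duplicates. It
assumes, for illustration only, that duplicates are detected by comparing an
MD5 digest of the raw fetched bytes. The class and method names are
hypothetical. Two unrelated pages that both end up with zero bytes produce
the same digest, so one is treated as a duplicate of the other.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class EmptyContentSignature {

  // Hypothetical content signature: MD5 over the raw fetched bytes.
  static byte[] signature(byte[] content) {
    try {
      return MessageDigest.getInstance("MD5").digest(content);
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to support MD5, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Two different documents whose bodies were both lost by the copy loop.
    byte[] pageA = new byte[0];
    byte[] pageB = new byte[0];
    System.out.println(Arrays.equals(signature(pageA), signature(pageB))); // true

    // The same two documents with their content intact.
    byte[] fullA = "Hi".getBytes(StandardCharsets.US_ASCII);
    byte[] fullB = "Yo".getBytes(StandardCharsets.US_ASCII);
    System.out.println(Arrays.equals(signature(fullA), signature(fullB))); // false
  }
}
```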

Regards,

Arkadi
