Sebastian Nagel created NUTCH-2699:
--------------------------------------

             Summary: Protocol-okhttp: needless loops to increment requested 
bytes counter when more content is already buffered
                 Key: NUTCH-2699
                 URL: https://issues.apache.org/jira/browse/NUTCH-2699
             Project: Nutch
          Issue Type: Bug
          Components: protocol
    Affects Versions: 1.15
            Reporter: Sebastian Nagel
             Fix For: 1.16


The okhttp library used by the plugin protocol-okhttp buffers content 
internally and has often already buffered more content than was requested. The 
plugin should immediately set the requested-bytes counter to the size of the 
buffered content to avoid needless loop iterations when the buffered size comes 
close to the content limit (the increment steps are too small; in the log below 
they shrink from 8192 to 294 bytes):
{noformat}
2019-03-11 14:56:36,642 DEBUG okhttp.OkHttpResponse - 
http://localhost/large.pdf - http/1.1 200 OK
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
8192, buffered = 16088
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
16384, buffered = 24280
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
24576, buffered = 32472
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
32768, buffered = 40664
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
40960, buffered = 48856
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
49152, buffered = 57048
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
57344, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
57638, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
57932, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
58226, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
58520, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
58814, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
59108, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
59402, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
59696, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 
59990, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
60284, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
60578, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
60872, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
61166, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
61460, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
61754, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
62048, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
62342, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
62636, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
62930, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
63224, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
63518, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
63812, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
64106, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
64400, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
64694, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
64988, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 
65282, buffered = 73432
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - content limit reached
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - copied 65534 bytes out of 
73432 buffered, remaining buffer contains 7898 bytes
2019-03-11 14:56:36,645 DEBUG okhttp.OkHttpResponse - HTTP content truncated to 
65534 bytes (reason: LENGTH)
2019-03-11 14:56:36,661 INFO parse.ParseSegment - http://localhost/large.pdf 
skipped. Content of size 366578 was truncated to 65534
2019-03-11 14:56:36,661 WARN parse.ParserChecker - Content is truncated, parse 
may fail!
{noformat}
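
The effect seen in the log can be reproduced with a small, self-contained 
simulation. This is only a sketch and not the plugin's actual code: the class 
FakeSource, the method countIterations and the step rule 
min(8192, limit - buffered) are hypothetical and were inferred from the numbers 
in the log above, as was the assumed 7896 bytes buffered before the first 
request (16088 - 8192). It compares the step-wise counter with a counter that 
jumps to the buffered size after every request.

```java
class RequestCounterSketch {

  /** Assumed size in bytes by which the simulated buffer grows. */
  static final int SEGMENT = 8192;

  /** Hypothetical stand-in for a buffering source; not the okhttp/okio API. */
  static final class FakeSource {
    final long total;   // total size of the remote content
    long buffered;      // bytes currently held in the buffer

    FakeSource(long total, long initiallyBuffered) {
      this.total = total;
      this.buffered = initiallyBuffered;
    }

    /** Buffer whole segments until at least byteCount bytes are available. */
    void request(long byteCount) {
      while (buffered < byteCount && buffered < total) {
        buffered = Math.min(total, buffered + SEGMENT);
      }
    }
  }

  /**
   * Count loop iterations until the buffer holds at least `limit` bytes.
   * With jumpToBuffered the request counter is set to the buffered size
   * after every request, as proposed in this issue.
   */
  static int countIterations(long limit, boolean jumpToBuffered) {
    // Content size taken from the log; initial buffer fill is an assumption.
    FakeSource source = new FakeSource(366578, 7896);
    long requested = 0;
    int iterations = 0;
    while (source.buffered < limit && source.buffered < source.total) {
      // Assumed step rule: never request more than is missing up to the limit.
      long step = Math.min(SEGMENT, limit - source.buffered);
      requested += step;
      source.request(requested);
      if (jumpToBuffered) {
        requested = Math.max(requested, source.buffered);
      }
      iterations++;
    }
    return iterations;
  }

  public static void main(String[] args) {
    System.out.println("step-wise counter: " + countIterations(65534, false));
    System.out.println("jump to buffered:  " + countIterations(65534, true));
  }
}
```

Under these assumptions the step-wise variant needs 34 iterations for a limit 
of 65534 bytes, which matches the 34 "total bytes requested" lines in the log, 
while the variant that jumps to the buffered size needs 8.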



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
