Sebastian Nagel created NUTCH-2699:
--------------------------------------

             Summary: Protocol-okhttp: needless loops to increment requested bytes counter when more content is already buffered
                 Key: NUTCH-2699
                 URL: https://issues.apache.org/jira/browse/NUTCH-2699
             Project: Nutch
          Issue Type: Bug
          Components: protocol
    Affects Versions: 1.15
            Reporter: Sebastian Nagel
             Fix For: 1.16

The okhttp library used by the plugin protocol-okhttp buffers content internally and has often already buffered more content than was requested. The plugin should immediately set the request count to the size of the buffered content. This avoids needless loops when the buffered size comes close to the content limit, where the increment steps become too small (in the log below the step drops from 8192 to 294 bytes):

{noformat}
2019-03-11 14:56:36,642 DEBUG okhttp.OkHttpResponse - http://localhost/large.pdf - http/1.1 200 OK
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 8192, buffered = 16088
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 16384, buffered = 24280
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 24576, buffered = 32472
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 32768, buffered = 40664
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 40960, buffered = 48856
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 49152, buffered = 57048
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57344, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57638, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 57932, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58226, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58520, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 58814, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59108, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59402, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59696, buffered = 65240
2019-03-11 14:56:36,643 DEBUG okhttp.OkHttpResponse - total bytes requested = 59990, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60284, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60578, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 60872, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61166, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61460, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 61754, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62048, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62342, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62636, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 62930, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63224, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63518, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 63812, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64106, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64400, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64694, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 64988, buffered = 65240
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - total bytes requested = 65282, buffered = 73432
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - content limit reached
2019-03-11 14:56:36,644 DEBUG okhttp.OkHttpResponse - copied 65534 bytes out of 73432 buffered, remaining buffer contains 7898 bytes
2019-03-11 14:56:36,645 DEBUG okhttp.OkHttpResponse - HTTP content truncated to 65534 bytes (reason: LENGTH)
2019-03-11 14:56:36,661 INFO parse.ParseSegment - http://localhost/large.pdf skipped. Content of size 366578 was truncated to 65534
2019-03-11 14:56:36,661 WARN parse.ParserChecker - Content is truncated, parse may fail!
{noformat}
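
The proposed change can be illustrated with a small simulation. This is a minimal sketch only: it is neither the plugin's code nor the okhttp/okio API. The class `RequestCounterSketch`, the helpers `segmentedSource` and `iterations`, and the buffering model (a first fill of 16088 bytes, then 8192-byte segments) are all hypothetical, chosen so that the model reproduces the `buffered` values in the log above. The limit of 65534 bytes and the 294-byte step are likewise read off the log, not taken from the plugin's configuration.

```java
import java.util.function.LongUnaryOperator;

class RequestCounterSketch {

    // Hypothetical model of a buffering source (NOT the okhttp/okio API):
    // maps "bytes requested" to "bytes buffered afterwards". The buffer is
    // assumed to fill in whole segments, so it usually overshoots the request.
    static LongUnaryOperator segmentedSource(long total, long first, long segment) {
        return requested -> {
            long buffered = first;
            while (buffered < requested) {
                buffered += segment;
            }
            return Math.min(buffered, total); // cannot buffer more than exists
        };
    }

    // Counts loop iterations until at least `limit` bytes are buffered.
    // With jumpToBuffered == true, the request counter is advanced to the
    // buffered size whenever the source has buffered more than was requested.
    static int iterations(LongUnaryOperator source, long limit, long step,
                          boolean jumpToBuffered) {
        long requested = 0;
        int loops = 0;
        while (true) {
            requested += step;
            long buffered = source.applyAsLong(requested);
            loops++;
            if (buffered >= limit) {
                break; // content limit reached
            }
            if (buffered < requested) {
                break; // source exhausted before the limit
            }
            if (jumpToBuffered && buffered > requested) {
                requested = buffered; // skip over already-buffered bytes
            }
        }
        return loops;
    }

    public static void main(String[] args) {
        // Assumed numbers, all taken from the log above.
        LongUnaryOperator source = segmentedSource(366578L, 16088L, 8192L);
        System.out.println("fixed 294-byte step: "
            + iterations(source, 65534L, 294L, false) + " iterations");
        System.out.println("jump to buffered:    "
            + iterations(source, 65534L, 294L, true) + " iterations");
    }
}
```

Under these assumed numbers, stepping by 294 bytes from zero takes 222 iterations to reach the limit, while jumping to the buffered size takes 8, one per segment boundary. The real plugin starts with 8192-byte steps, so the saving there would be smaller, but the effect is the same: iterations that cannot grow the buffer are skipped.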