good afternoon; is it possible that the client had no readable connection given the 3072/55397664 entry in the http proxy log? it should indicate that some sort of connection existed long enough for nginx to send over 55 million bytes.
> On 2021-07-05, at 15:10:03, Rob Vesse <rve...@dotnetrdf.org> wrote:
>
> That's a really good suggestion. In the normal code flow do you ever call
> stream.close()? And is createHttpClient() re-using an existing HttpClient
> object? And is the hang only happening after some requests have succeeded?
>
> It is possible that what is happening is that you aren't closing the
> stream (and I don't believe Jena's parsers ever close the stream for you),
> so after so many requests (10 I think by default) you are exhausting the
> max connections per route for the HTTP client. If that is the case,
> wrapping the use of the stream in a try-with-resources block may be the
> solution. [a sketch follows below the quoted thread]
>
> Rob
>
> On 05/07/2021, 14:03, "Martynas Jusevičius" <marty...@atomgraph.com> wrote:
>
> Is HttpClient running out of connections? It is known to hang in such
> cases.
>
> On Mon, Jul 5, 2021 at 2:58 PM james anderson <ja...@dydra.com> wrote:
>>
>> good afternoon;
>>
>>> On 2021-07-05, at 12:36:20, Andy Seaborne <a...@apache.org> wrote:
>>>
>>> On 05/07/2021 10:01, Ivan Lagunov wrote:
>>>> Hello,
>>>>
>>>> We’re facing an issue with Jena reading an n-triples stream over HTTP.
>>>> In fact, our application hangs entirely while executing this piece of
>>>> code:
>>>>
>>>> Model sub = ModelFactory.createDefaultModel();
>>>> TypedInputStream stream = HttpOp.execHttpGet(requestURL,
>>>>         WebContent.contentTypeNTriples, createHttpClient(auth), null);
>>>> // The following part sometimes hangs:
>>>> RDFParser.create()
>>>>         .source(stream)
>>>>         .lang(Lang.NTRIPLES)
>>>>         .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
>>>>         .parse(sub.getGraph());
>>>> // This point is not reached
>>>>
>>>> The issue is not persistent; moreover, it happens infrequently.
>>>
>>> Then it looks like the data has stopped arriving but the connection is
>>> still open (or the system has gone into GC overload due to heap
>>> pressure).
>>>
>>> Is it intermittent on the same data? Or is the data changing? Maybe the
>>> data can't be written properly and the sender stops sending, though I'd
>>> expect the sender to close the connection (it's now in an unknown state
>>> and can't be reused).
>>>
>>>> When it occurs, the RDF store server (we use Dydra for that) logs a
>>>> successful HTTP 200 response for our call (truncated for readability):
>>>>
>>>> HTTP/1.1" 200 3072/55397664 10.676/10.828 "application/n-triples"
>>>> "-" "-" "Apache-Jena-ARQ/3.17.0" "127.0.0.1:8104"
>>
>> the situation involves an nginx proxy and an upstream sparql processor.
>>
>>> What do the fields mean?
>>
>> the line is an excerpt from an entry in the nginx request log. it
>> contains:
>>
>> protocol code requestLength/responseLength
>> upstreamElapsedTime/clientElapsedTime acceptType - - clientAgent
>> upstreamPort
>>
>>> Is that 3072 bytes sent (so far) of 55397664?
>>>
>>> If so, is Content-Length set (and then chunked encoding isn't needed)?
>>
>> likely not, as the response is (i believe) that of a sparql request,
>> which is emitted as it is generated.
>>
>>> Unfortunately, in HTTP, 200 really means "I started to send stuff", not
>>> "I completed sending stuff". There is no way in HTTP/1.1 to signal an
>>> error after starting the response.
>>
>> that is true, but there are indications in other logs which imply that
>> the sparql processor believes the response to have been completely sent
>> to nginx. there are several reasons to believe this.
>> the times and the 200 response code in the nginx log indicate completion.
>> otherwise, it would either indicate that the request timed out, or would
>> include a 499 code, to the effect that the client closed the connection
>> before the response was sent.
>> neither is the case.
>> in addition, the elapsed time is well below that for which nginx would
>> time out an upstream connection.
>>
>>> The HttpClient - how is it configured?
>>>
>>>> So it looks like the RDF store successfully executes the SPARQL query,
>>>> responds with HTTP 200 and starts transferring the data with the
>>>> chunked encoding. Then something goes wrong when Jena processes the
>>>> input stream. I expect there might be some timeout behind the scenes
>>>> while Jena reads the stream
>>>
>>> Does any data reach the graph?
>>>
>>> There is no timeout at the client end - otherwise you would get an
>>> exception. The parser is reading the input stream from Apache
>>> HttpClient. If it hangs, it's because the data has stopped arriving but
>>> the connection is still open.
>>>
>>> You could try replacing .parse(graph) with .parse(StreamRDF) and plug
>>> in a logging StreamRDF so you can see the progress, either sending the
>>> data on to the graph or, for investigation, merely logging. [a
>>> logging-wrapper sketch follows below the quoted thread]
>>>
>>> In HTTP/1.1, a streamed response requires chunked encoding only when
>>> the Content-Length isn't given.
>>
>> i believe the content length is not given.
>>
>>>> , and it causes it to wait indefinitely. At the same time,
>>>> ErrorHandlerFactory.errorHandlerStrict does not help at all – no
>>>> errors are logged.
>>>> Is there a way to configure the timeout behavior for the underlying
>>>> Jena logic of processing the HTTP stream? Ideally we want to abort the
>>>> request if it times out and then retry it a few times until it
>>>> succeeds.
>>>
>>> The HttpClient determines the transfer. [a timeout-configuration sketch
>>> follows below the quoted thread]
>>>
>>>     Andy
>>>
>>> FYI: RDFConnectionRemote is an abstraction to make this a little
>>> easier. No need to go to the low-level HttpOp. [a usage sketch follows
>>> below the quoted thread]
>>>
>>> FYI: Jena 4.mumble.0 is likely to change to using java.net.http as the
>>> HTTP code. There has to be some change anyway to get HTTP/2 (Apache
>>> HttpClient v5+, not v4, has HTTP/2 support).
>>>
>>> This will include a new Graph Store Protocol client.
>>>
>>>> Met vriendelijke groet, with kind regards,
>>>>
>>>> Ivan Lagunov
>>>> Technical Lead / Software Architect
>>>> Skype: lagivan
>>>>
>>>> Semaku B.V.
>>>> Torenallee 20 (SFJ3D) • 5617 BC Eindhoven • www.semaku.com
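
for reference, minimal sketches of the suggestions made above. these are
untested assumptions against jena 3.17 and the apache httpclient 4.x api
it uses; requestURL, createHttpClient and auth are the placeholders from
ivan's snippet.

rob's try-with-resources suggestion: TypedInputStream is Closeable, and
closing it is what hands the pooled connection back to the HttpClient:

    import org.apache.jena.atlas.web.TypedInputStream;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFParser;
    import org.apache.jena.riot.WebContent;
    import org.apache.jena.riot.system.ErrorHandlerFactory;
    import org.apache.jena.riot.web.HttpOp;

    Model sub = ModelFactory.createDefaultModel();
    // closing the stream releases the pooled HTTP connection, so repeated
    // requests do not exhaust the per-route connection limit
    try (TypedInputStream stream = HttpOp.execHttpGet(requestURL,
            WebContent.contentTypeNTriples, createHttpClient(auth), null)) {
        RDFParser.create()
                .source(stream)
                .lang(Lang.NTRIPLES)
                .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
                .parse(sub.getGraph());
    }
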
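andy's logging StreamRDF, reusing sub and stream from the sketch above:
StreamRDFWrapper forwards every event to the wrapped stream, so overriding
triple() is enough to watch progress, and StreamRDFLib.graph(...) is, i
believe, effectively what .parse(graph) uses internally:

    import org.apache.jena.graph.Triple;
    import org.apache.jena.riot.system.StreamRDF;
    import org.apache.jena.riot.system.StreamRDFLib;
    import org.apache.jena.riot.system.StreamRDFWrapper;

    // forward triples to the graph, logging every 100000 triples so a
    // stall is visible and the last count reached is known
    StreamRDF progress = new StreamRDFWrapper(StreamRDFLib.graph(sub.getGraph())) {
        private long count = 0;
        @Override public void triple(Triple triple) {
            super.triple(triple);
            if (++count % 100_000 == 0)
                System.err.println("parsed " + count + " triples");
        }
    };
    RDFParser.create()
            .source(stream)
            .lang(Lang.NTRIPLES)
            .parse(progress);
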
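the timeout ivan asked about would come from the HttpClient itself, per
andy's "The HttpClient determines the transfer": a socket timeout turns a
silent stall into a SocketTimeoutException, which the parse call should
surface and which can then be caught and retried. this is the stock
httpclient 4.x builder api; the numbers are arbitrary:

    import org.apache.http.client.HttpClient;
    import org.apache.http.client.config.RequestConfig;
    import org.apache.http.impl.client.HttpClients;

    RequestConfig config = RequestConfig.custom()
            .setConnectTimeout(10_000)  // ms to establish the connection
            .setSocketTimeout(60_000)   // ms of read inactivity before the read fails
            .build();
    HttpClient client = HttpClients.custom()
            .setDefaultRequestConfig(config)
            .setMaxConnPerRoute(10)     // the pooling default is 2 per route
            .setMaxConnTotal(20)
            .build();

a client built like this would be what createHttpClient(auth) returns
(with the credentials added) and would be passed to HttpOp.execHttpGet as
before.
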
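and andy's RDFConnectionRemote pointer: a sketch of what the same fetch
might look like without the low-level HttpOp (the destination url is
hypothetical; the client is the one configured above):

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdfconnection.RDFConnection;
    import org.apache.jena.rdfconnection.RDFConnectionRemote;

    try (RDFConnection conn = RDFConnectionRemote.create()
            .destination("https://example.org/sparql")  // hypothetical endpoint
            .httpClient(client)                         // configured as above
            .build()) {
        // the connection manages the response stream itself
        Model sub = conn.queryConstruct("CONSTRUCT WHERE { ?s ?p ?o }");
    }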