good afternoon;

is it possible that the client had no readable connection, given the 
3072/55397664 entry in the http proxy log?
that entry should indicate that some sort of connection existed long enough 
for nginx to send over 55 million bytes.

> On 2021-07-05, at 15:10:03, Rob Vesse <rve...@dotnetrdf.org> wrote:
> 
> That's a really good suggestion.  In the normal code flow do you ever call 
> stream.close()? And is createHttpClient() re-using an existing HttpClient 
> object? And is the hang only happening after some requests have succeeded?
> 
> It is possible that what is happening is that you aren't closing the stream 
> (and I don't believe Jena's parsers ever close the stream for you), so after 
> so many requests (10 I think by default) you are exhausting the max 
> connections per route for the HTTP Client.  If that is the case, wrapping the 
> use of the stream in a try-with-resources block may be the solution.
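> 
> A minimal sketch of that (assuming requestURL, createHttpClient(auth) and the 
> rest are as in your snippet; TypedInputStream is Closeable, so the 
> try-with-resources releases the underlying HTTP connection even if the 
> parser throws):
> 
>     Model sub = ModelFactory.createDefaultModel();
>     try (TypedInputStream stream = HttpOp.execHttpGet(requestURL,
>             WebContent.contentTypeNTriples, createHttpClient(auth), null)) {
>         RDFParser.create()
>                 .source(stream)
>                 .lang(Lang.NTRIPLES)
>                 .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
>                 .parse(sub.getGraph());
>     } // stream.close() happens here automatically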
> 
> Rob
> 
> On 05/07/2021, 14:03, "Martynas Jusevičius" <marty...@atomgraph.com> wrote:
> 
>    HTTPClient is not running out of connections? It is known to hang in such 
> cases.
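> 
>    If the pool limits are the issue, they can be raised when building the 
>    client (a sketch for Apache HttpClient 4.x, whose stock defaults are 2 
>    connections per route / 20 total; the values here are illustrative):
> 
>        CloseableHttpClient client = HttpClients.custom()
>                .setMaxConnPerRoute(20)
>                .setMaxConnTotal(100)
>                .build();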
> 
>    On Mon, Jul 5, 2021 at 2:58 PM james anderson <ja...@dydra.com> wrote:
>> 
>> good afternoon;
>> 
>>> On 2021-07-05, at 12:36:20, Andy Seaborne <a...@apache.org> wrote:
>>> 
>>> 
>>> 
>>> On 05/07/2021 10:01, Ivan Lagunov wrote:
>>>> Hello,
>>>> We’re facing an issue with Jena reading an n-triples stream over HTTP. In 
>>>> fact, our application hangs entirely while executing this piece of code:
>>>> Model sub = ModelFactory.createDefaultModel();
>>>> TypedInputStream stream = HttpOp.execHttpGet(requestURL, 
>>>> WebContent.contentTypeNTriples, createHttpClient(auth), null);
>>>> // The following part sometimes hangs:
>>>> RDFParser.create()
>>>>        .source(stream)
>>>>        .lang(Lang.NTRIPLES)
>>>>        .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
>>>>        .parse(sub.getGraph());
>>>> // This point is not reached
>>>> 
>>>> The issue is not persistent; moreover, it happens only infrequently.
>>> 
>>> Then it looks like the data has stopped arriving but the connection is 
>>> still open (or the system has gone into GC overload due to heap pressure).
>>> 
>>> Is it intermittent on the same data? Or is the data changing? Maybe the 
>>> data can't be written properly and the sender stops sending, though I'd 
>>> expect the sender to close the connection (it's now in an unknown state and 
>>> can't be reused).
>>> 
>>>> When it occurs, the RDF store server (we use Dydra for that) logs a 
>>>> successful HTTP 200 response for our call (truncated for readability):
>>>> HTTP/1.1" 200 3072/55397664 10.676/10.828 "application/n-triples" "-" "-" 
>>>> "Apache-Jena-ARQ/3.17.0" "127.0.0.1:8104"
>> 
>> the situation involves an nginx proxy and an upstream sparql processor.
>> 
>>> 
>>> What do the fields mean?
>> 
>> the line is an excerpt from an entry in the nginx request log. that line 
>> contains:
>> 
>>  protocol  code  requestLength/responseLength  
>> upstreamElapsedTime/clientElapsedTime  acceptType  -  -  clientAgent  
>> upstreamPort
>> 
>>> 
>>> Is that 3072 bytes sent (so far) of 55397664?
>>> 
>>> If so, is Content-Length set (and then chunk encoding isn't needed).
>> 
>> likely not, as the response is (i believe) that of a sparql request, which 
>> is emitted as it is generated.
>> 
>>> 
>>> Unfortunately, in HTTP, 200 really means "I started to send stuff", not "I 
>>> completed sending stuff". There is no way in HTTP 1/1 to signal an error 
>>> after starting the response.
>> 
>> that is true, but there are indications in other logs which imply that the 
>> sparql processor believes the response to have been completely sent to nginx.
>> there are several reasons to believe this.
>> the times and the 200 response code in the nginx log indicate completion.
>> otherwise, it would either indicate that it timed out, or would include a 
>> 499 code, to the effect that the client closed the connection before the 
>> response was sent.
>> neither is the case.
>> in addition, the elapsed time is well below that for which nginx would time 
>> out an upstream connection.
>> 
>>> 
>>> The HttpClient - how is it configured?
>>> 
>>>> So it looks like the RDF store successfully executes the SPARQL query, 
>>>> responds with HTTP 200 and starts transferring the data with chunked 
>>>> encoding. Then something goes wrong when Jena processes the input stream. 
>>>> I expect there might be some timeout behind the scenes while Jena reads 
>>>> the stream
>>> 
>>> Does any data reach the graph?
>>> 
>>> There is no timeout at the client end - otherwise you would get an 
>>> exception. The parser is reading the input stream from Apache HttpClient. 
>>> If it hangs, it's because the data has stopped arriving but the connection 
>>> is still open.
>>> 
>>> You could try replacing .parse(graph) with .parse(StreamRDF) and plugging 
>>> in a logging StreamRDF so you can see the progress, either sending the data 
>>> on to the graph or, for investigation, merely logging it.
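>>> 
>>> A sketch of the logging variant (StreamRDFWrapper and StreamRDFLib are in 
>>> org.apache.jena.riot.system; the reporting interval is arbitrary):
>>> 
>>>    StreamRDF toGraph = StreamRDFLib.graph(sub.getGraph());
>>>    StreamRDF progress = new StreamRDFWrapper(toGraph) {
>>>        private long count = 0;
>>>        @Override
>>>        public void triple(Triple triple) {
>>>            super.triple(triple);   // forward the triple to the graph
>>>            if (++count % 100_000 == 0)
>>>                System.err.println("parsed " + count + " triples");
>>>        }
>>>    };
>>>    RDFParser.create()
>>>        .source(stream)
>>>        .lang(Lang.NTRIPLES)
>>>        .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
>>>        .parse(progress);
>>> 
>>> If the count stops advancing without an exception, the read has stalled 
>>> rather than the parse having failed.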
>>> 
>>> In HTTP 1.1, a streamed response requires chunk encoding only when the 
>>> Content-Length isn't given.
>> 
>> i believe, the content length is not given.
>> 
>>> 
>>>> , and it causes it to wait indefinitely. At the same time 
>>>> ErrorHandlerFactory.errorHandlerStrict does not help at all – no errors are 
>>>> logged.
>>>> Is there a way to configure the timeout behavior for the underlying Jena 
>>>> logic that processes the HTTP stream? Ideally we want to abort the request 
>>>> if it times out and then retry it a few times until it succeeds.
>>> 
>>> The HttpClient determines the transfer.
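>>> 
>>> So the place to set a read timeout is the HttpClient you pass in, not Jena 
>>> itself. A sketch for Apache HttpClient 4.x (timeout values illustrative): a 
>>> stalled read then throws SocketTimeoutException instead of blocking 
>>> forever, which you can catch and retry.
>>> 
>>>    RequestConfig config = RequestConfig.custom()
>>>            .setConnectTimeout(10_000)   // ms to establish the connection
>>>            .setSocketTimeout(60_000)    // ms of inactivity on the socket
>>>            .build();
>>>    CloseableHttpClient client = HttpClients.custom()
>>>            .setDefaultRequestConfig(config)
>>>            .build();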
>>> 
>>>   Andy
>>> 
>>> FYI: RDFConnectionRemote is an abstraction to make this a little easier. No 
>>> need to go to the low-level HttpOp.
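>>> 
>>> For example (a sketch; the endpoint URL is illustrative and 
>>> createHttpClient(auth) is your existing helper):
>>> 
>>>    try (RDFConnection conn = RDFConnectionRemote.create()
>>>            .destination("https://example.org/sparql")
>>>            .httpClient(createHttpClient(auth))
>>>            .build()) {
>>>        Model sub = conn.queryConstruct("CONSTRUCT WHERE { ?s ?p ?o }");
>>>    }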
>>> 
>>> 
>>> FYI: Jena 4.mumble.0 is likely to change to using java.net.http as the HTTP 
>>> code. There has to be some change anyway to get HTTP/2 (Apache HttpClient 
>>> v5+, not v4, has HTTP/2 support).
>>> 
>>> This will include a new Graph Store Protocol client.
>>> 
>>>> Met vriendelijke groet, with kind regards,
>>>> Ivan Lagunov
>>>> Technical Lead / Software Architect
>>>> Skype: lagivan
>>>> Semaku B.V.
>>>> Torenallee 20 (SFJ3D) • 5617 BC Eindhoven • www.semaku.com
>> 
> 
