On 05/07/2021 10:01, Ivan Lagunov wrote:
Hello,

> We’re facing an issue with Jena reading an N-Triples stream over HTTP. In fact,
> our application hangs entirely while executing this piece of code:


> Model sub = ModelFactory.createDefaultModel();
>
> TypedInputStream stream = HttpOp.execHttpGet(requestURL,
>         WebContent.contentTypeNTriples, createHttpClient(auth), null);
>
> // The following part sometimes hangs:
> RDFParser.create()
>          .source(stream)
>          .lang(Lang.NTRIPLES)
>          .errorHandler(ErrorHandlerFactory.errorHandlerStrict)
>          .parse(sub.getGraph());
> // This point is not reached
> The issue is not persistent; moreover, it happens infrequently.

Then it looks like the data has stopped arriving but the connection is still open (or the system has gone into GC overload due to heap pressure).

Is it intermittent on the same data, or is the data changing? Maybe the data can't be written properly and the sender stops sending, though I'd expect the sender to close the connection (it's now in an unknown state and can't be reused).

> When it occurs, the RDF store server (we use Dydra for that) logs a successful
> HTTP 200 response for our call (truncated for readability):
>
> HTTP/1.1" 200 3072/55397664 10.676/10.828 "application/n-triples" "-" "-"
> "Apache-Jena-ARQ/3.17.0" "127.0.0.1:8104"

What do the fields mean?

Is that 3072 bytes sent (so far) of 55397664?

If so, is Content-Length set? (If it is, chunked encoding isn't needed.)

Unfortunately, in HTTP, 200 really means "I started to send stuff", not "I completed sending stuff". There is no way in HTTP/1.1 to signal an error after the response has started.

The HttpClient - how is it configured?
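If it's a default client, note that there is no socket timeout by default, so a stalled stream blocks forever. A sketch with Apache HttpClient 4.x (the timeout values are examples only, and createHttpClientWithTimeouts is a made-up name); with a socket timeout set, a stalled read throws SocketTimeoutException instead of hanging:

```java
import org.apache.http.client.config.RequestConfig;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class ClientConfig {
    // Sketch: a client whose socket timeout bounds the wait for the next
    // packet while reading the response body.
    public static CloseableHttpClient createHttpClientWithTimeouts() {
        RequestConfig config = RequestConfig.custom()
                .setConnectTimeout(10_000)   // establishing the TCP connection
                .setSocketTimeout(30_000)    // max silence between packets
                .build();
        return HttpClients.custom()
                .setDefaultRequestConfig(config)
                .build();
    }
}
```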

> So it looks like the RDF store successfully executes the SPARQL query, responds
> with HTTP 200 and starts transferring the data with chunked encoding. Then
> something goes wrong when Jena processes the input stream. I expect there might
> be some timeout behind the scenes while Jena reads the stream

Does any data reach the graph?

There is no timeout at the client end - otherwise you would get an exception. The parser is reading the input stream from Apache HttpClient. If it hangs, it's because the data has stopped arriving but the connection is still open.

You could try replacing .parse(graph) with .parse(StreamRDF) and plugging in a logging StreamRDF so you can see the progress: either passing the data on to the graph, or, for investigation, merely logging.
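Something like this (an untested sketch against the 3.17 API; the in-memory input stream stands in for the HTTP TypedInputStream from your snippet):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

import org.apache.jena.graph.Triple;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFParser;
import org.apache.jena.riot.system.StreamRDF;
import org.apache.jena.riot.system.StreamRDFLib;
import org.apache.jena.riot.system.StreamRDFWrapper;

public class ProgressParse {
    public static long run() {
        Model sub = ModelFactory.createDefaultModel();
        // Stand-in for the HTTP stream in the original code.
        String nt = "<http://example/s> <http://example/p> <http://example/o> .\n";
        ByteArrayInputStream stream =
                new ByteArrayInputStream(nt.getBytes(StandardCharsets.UTF_8));

        // Forward triples to the graph, logging a heartbeat every N triples
        // so a stalled stream becomes visible in the log.
        StreamRDF toGraph = StreamRDFLib.graph(sub.getGraph());
        StreamRDF progress = new StreamRDFWrapper(toGraph) {
            private long count = 0;
            @Override public void triple(Triple triple) {
                super.triple(triple);
                if (++count % 100_000 == 0)   // tune the interval to taste
                    System.err.println("Parsed " + count + " triples");
            }
        };

        RDFParser.create()
                 .source(stream)
                 .lang(Lang.NTRIPLES)
                 .parse(progress);

        return sub.size();   // triples that reached the graph
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```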

In HTTP/1.1, a streamed response requires chunked encoding only when the Content-Length isn't given.

> , and it causes it to wait indefinitely. At the same time, ErrorHandlerFactory.errorHandlerStrict does not help at all: no errors are logged.

> Is there a way to configure the timeout behavior for the underlying Jena logic
> of processing the HTTP stream? Ideally we want to abort the request if it times
> out and then retry it a few times until it succeeds.

The HttpClient determines the transfer.
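Once a socket timeout is set on the client, the hang becomes a SocketTimeoutException on the read, and the retry can be a plain loop around the fetch-and-parse call. A generic sketch, not Jena-specific (the helper name is made up):

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class Retry {
    // Run the task, retrying up to maxAttempts times on SocketTimeoutException.
    // Any other exception propagates immediately.
    public static <T> T withRetries(Callable<T> task, int maxAttempts) throws Exception {
        SocketTimeoutException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (SocketTimeoutException e) {
                last = e;   // stalled read: abort this attempt and retry
            }
        }
        throw last;   // all attempts timed out
    }

    public static void main(String[] args) throws Exception {
        // Demo: fail twice with a timeout, then succeed on the third attempt.
        int[] calls = {0};
        String result = withRetries(() -> {
            if (++calls[0] < 3)
                throw new SocketTimeoutException("Read timed out");
            return "ok";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```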

    Andy

FYI: RDFConnectionRemote is an abstraction to make this a little easier. No need to go to the low-level HttpOp.
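A sketch against the 3.17 API (the destination URL is a placeholder; an HttpClient with your auth and timeouts can be supplied via the builder's httpClient(...)):

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionRemote;

public class FetchExample {
    // Fetch the default graph over the SPARQL Graph Store Protocol:
    // the HTTP GET, content negotiation, and parsing are handled for you.
    public static Model fetchDefaultGraph(String destination) {
        try (RDFConnection conn = RDFConnectionRemote.create()
                .destination(destination)   // placeholder; your dataset URL
                .build()) {
            return conn.fetch();
        }
    }
}
```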


FYI: Jena 4.mumble.0 is likely to change to using java.net.http as the HTTP code. There has to be some change anyway to get HTTP/2 (Apache HttpClient v5+, not v4, has HTTP/2 support).

This will include a new Graph Store Protocol client.

> Met vriendelijke groet, with kind regards,
>
> Ivan Lagunov
> Technical Lead / Software Architect
> Skype: lagivan
>
> Semaku B.V.
> Torenallee 20 (SFJ3D) • 5617 BC Eindhoven • www.semaku.com
