On 02/07/2020 21:55, Chris Tomlinson wrote:
> grep -v "ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >" catalina.out
Is there any signature as to when they occur? Two PUTs overlapping?
Certain usage by your clients (which probably isn't visible in the
logs)? Earlier connections broken? High load on the server? Time of
day? Anything else that looks like a characteristic?
Andy
On 03/07/2020 00:13, Chris Tomlinson wrote:
On Jul 2, 2020, at 17:44, Andy Seaborne <[email protected]> wrote:
On 02/07/2020 21:55, Chris Tomlinson wrote:
From what I can see, it (the WARN) isn't database-related.
No, it seems to me to be related to getting the payload off the wire.
I think you said the same payload had been sent before.
??
Yes, a copy/clone of the same payload, i.e., the serialization of the given
graph, has been sent many times w/o issue.
...
Even the concurrency looks OK, because it writes to a local buffer first, so the HTTP
length is available. (This matters in case corruption, rather than repetition, is
happening.)
So it seems to me that there may be an opportunity for some sort of
robustification in RDFConnection. There is evidently a loop somewhere that
doesn't terminate, retrying the parsing repeatedly or something like that. The
payload is finite, so there would appear to be a test that repeatedly fails but
doesn't make progress in consuming the payload.
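The kind of robustification suggested here (a loop that must make progress through a finite payload) can be sketched generically. This is not Jena or Fuseki code: `ProgressGuard`, `RowParser` and `readAll` are hypothetical names, and the "parser" is an assumed stand-in. The guard counts bytes consumed per iteration and fails fast if a row is produced without consuming any input, instead of spinning. Note that such a guard only catches a spinning loop; it does nothing for a read that blocks waiting for input.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ProgressGuard {

    /** Wraps a stream and counts the bytes consumed through it. */
    static final class CountingInputStream extends FilterInputStream {
        private long count = 0;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
        @Override public long skip(long n) throws IOException {
            long skipped = super.skip(n);
            count += skipped;
            return skipped;
        }
        long count() { return count; }
    }

    /** Hypothetical row parser: returns the next row, or null at end of input. */
    interface RowParser<T> {
        T next(InputStream in) throws IOException;
    }

    /** Reads all rows, failing instead of looping if an iteration consumes nothing. */
    static <T> List<T> readAll(InputStream raw, RowParser<T> parser) throws IOException {
        CountingInputStream in = new CountingInputStream(raw);
        List<T> rows = new ArrayList<>();
        while (true) {
            long before = in.count();
            T row = parser.next(in);
            if (row == null) return rows;            // clean end of input
            if (in.count() == before) {              // a "row" that consumed no bytes
                throw new IOException("No progress at byte " + before);
            }
            rows.add(row);
        }
    }

    public static void main(String[] args) throws IOException {
        // Well-behaved parser: one byte per row.
        RowParser<Integer> oneByte = in -> { int b = in.read(); return b < 0 ? null : b; };
        System.out.println(readAll(new ByteArrayInputStream(new byte[] {1, 2, 3}), oneByte));

        // Faulty parser: reports a row without consuming anything.
        RowParser<Integer> stuck = in -> 0;
        try {
            readAll(new ByteArrayInputStream(new byte[] {1, 2, 3}), stuck);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Run as-is, this prints `[1, 2, 3]` and then `No progress at byte 0`: the faulty parser is stopped on its first iteration rather than looping forever.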
RDFConnection (client-side) is sending, not parsing.
I'm referring to the Fuseki receiving end of the connection, where the WARNing
is being logged.
The WARN says that an empty <RDF_StreamRow > was seen.
There is no information about the stalled transactions, although not finishing
the write would explain this:
30-Jun-2020 16:21:30.778
java.io.BufferedInputStream.read(BufferedInputStream.java:345)
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
So it's waiting for input. What's the proxy/reverse-proxy setup?
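The trace shows the thread inside a blocking `InputStream.read`. With plain `java.net` sockets, a read that never receives the rest of a body blocks indefinitely unless a read timeout is set. The sketch below is a generic illustration, not Fuseki or Tomcat code; `ReadTimeoutDemo` and `readBody` are hypothetical names, and it assumes a sender that declares 10 bytes, sends only 3, and keeps the connection open. With `setSoTimeout`, the stalled read surfaces as a `SocketTimeoutException` instead of hanging.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {

    /** Reads exactly {@code length} bytes, or reports how far it got. */
    static String readBody(Socket s, int length, int timeoutMillis) throws IOException {
        s.setSoTimeout(timeoutMillis);       // without this, read() can block forever
        InputStream in = s.getInputStream();
        byte[] buf = new byte[length];
        int got = 0;
        try {
            while (got < length) {
                int n = in.read(buf, got, length - got);
                if (n < 0) return "EOF after " + got + " of " + length + " bytes";
                got += n;
            }
            return "complete: " + length + " bytes";
        } catch (SocketTimeoutException e) {
            return "timed out after " + got + " of " + length + " bytes";
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            OutputStream out = client.getOutputStream();
            out.write(new byte[] {1, 2, 3});  // hypothetical: 10 bytes declared, 3 sent
            out.flush();                       // connection stays open, so no EOF
            System.out.println(readBody(accepted, 10, 200));
        }
    }
}
```

On loopback this should report a timeout after the 3 bytes that did arrive. Whether, and where, a comparable read timeout applies in the actual Fuseki deployment is an assumption to check against its container configuration.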
None. For the client on the same EC2 instance, obviously none; and for the
client on a second EC2 instance, there is nothing between our two internal EC2 instances.
In the current situation, the two precipitating PUTs are from a client on the
same EC2 instance.
The code writes the payload to a ByteArrayOutputStream and sends those bytes.
That's how it gets the length for the HTTP header.
https://github.com/apache/jena/blob/master/jena-rdfconnection/src/main/java/org/apache/jena/rdfconnection/RDFConnectionRemote.java#L615
(run Fuseki with "verbose" to see the headers ... but it is quite verbose)
It sent something, so the RDF->Thrift->bytes step had finished before it sent any bytes.
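The buffer-then-send approach described above can be sketched with the JDK's `java.net.http` client (Java 11+). This is not the linked `RDFConnectionRemote` code: `BufferedPut` and `writePayload` are hypothetical names, `writePayload` is a stand-in for the real RDF->Thrift serialization, and the target URI is made up. Because the body is a complete byte array, `BodyPublishers.ofByteArray` knows its length up front, so the request carries a fixed length rather than a streamed body of unknown size. The request is only built here, not sent.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class BufferedPut {

    /** Hypothetical stand-in for the RDF -> Thrift serialization step. */
    static void writePayload(OutputStream out) throws IOException {
        out.write("example payload".getBytes(StandardCharsets.UTF_8));
    }

    /** Serializes fully into memory first, so the body length is known before sending. */
    static HttpRequest buildPut(URI target) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writePayload(buffer);
        byte[] body = buffer.toByteArray();
        return HttpRequest.newBuilder(target)
                .header("Content-Type", "application/rdf+thrift")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(body))  // fixed-length body
                .build();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical endpoint; no request is actually sent.
        HttpRequest req = buildPut(URI.create("http://localhost:3030/ds/data"));
        long length = req.bodyPublisher().orElseThrow().contentLength();
        System.out.println(req.method() + " " + length);
    }
}
```

This prints `PUT 15`, the byte length of the stand-in payload. The trade-off of buffering is holding the whole serialized graph in memory in exchange for a known length.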
As I tried to clarify, my remarks were w.r.t. the Fuseki/receiving end where
the issue is getting logged. Not the sending/client end.
Chris
Anyway - you have the source code ... :-)
Andy