Re: repeated ThriftConvert WARN visit: Unrecognized:

Andy Seaborne Fri, 03 Jul 2020 06:43:00 -0700



On 02/07/2020 21:55, Chris Tomlinson wrote:

> grep -v "ThriftConvert WARN visit: Unrecognized: <RDF_StreamRow >" catalina.out

Is there any signature as to when they occur? Two PUTs overlapping, certain usage by your clients (which probably isn't visible in the logs)? Earlier connections broken? High load on the server? Time of day? Anything else that looks like a characteristic?


    Andy


On 03/07/2020 00:13, Chris Tomlinson wrote:

On Jul 2, 2020, at 17:44, Andy Seaborne <[email protected]> wrote:



On 02/07/2020 21:55, Chris Tomlinson wrote:

 From what I can see, it (WARN) isn't database related.

No it seems to me to be related to getting the payload off the wire.


I think you said the same payload had been sent before.
??


Yes a copy/clone of the same payload, i.e., the serialization of the given 
graph, has been sent many times w/o issue.

...

Even the concurrency looks OK because it locally writes a buffer so the HTTP 
length is available.


(in case corruption, rather than a repeat, is what is happening)

So it seems to me that there may be an opportunity for some sort of 
robustification in RDFConnection. There is evidently a loop somewhere that 
doesn't terminate, retrying the parsing repeatedly or something like that. The 
payload is finite so there would appear to be a test that repeatedly fails but 
doesn't make progress in consuming the payload.
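
The failure mode hypothesized above (a loop whose test keeps failing while no 
input is consumed) can be guarded against in general terms. What follows is a 
minimal Java sketch of such a guard, not Fuseki's or Thrift's actual code: 
ProgressGuard, RowDecoder and decodeAll are hypothetical names introduced only 
for illustration, and the thread does not establish that such a loop exists.

```java
import java.nio.ByteBuffer;

public class ProgressGuard {
    /** Hypothetical decoder: reads one row from buf, advancing its position. */
    interface RowDecoder {
        void decodeOne(ByteBuffer buf);
    }

    /**
     * Decodes rows until the buffer is exhausted. If a decode step leaves the
     * buffer position unchanged, the loop stops with an error instead of
     * spinning forever on the same bytes.
     */
    static int decodeAll(ByteBuffer buf, RowDecoder decoder) {
        int rows = 0;
        while (buf.hasRemaining()) {
            int before = buf.position();
            decoder.decodeOne(buf);
            if (buf.position() == before) {
                throw new IllegalStateException(
                        "decoder made no progress at offset " + before);
            }
            rows++;
        }
        return rows;
    }
}
```

The point of the design is only that a finite payload plus a "must advance" 
check bounds the number of iterations; it says nothing about a reader that is 
blocked waiting for more bytes.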


RDFConnection (client-side) is sending, not parsing.


I'm referring to the Fuseki receiving end of the connection, where the WARNing 
is being logged.

The WARN says that an empty <RDF_StreamRow > was seen.

There is no information about the stalled transactions although not finishing 
the write would explain this:

30-Jun-2020 16:21:30.778

java.io.BufferedInputStream.read(BufferedInputStream.java:345)
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)

so it's waiting for input. What's the proxy/reverse-proxy setup?


None. For the client on the same ec2 instance, obviously none; and for the 
client on a second ec2 instance, we have nothing between our two internal ec2's.

In the current situation, the two precipitating PUTs are from a client on the 
same ec2 instance.

The code writes the payload to a ByteArrayOutputStream and sends those bytes. 
That's how it gets the length for the HTTP header.
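
The buffering step described above can be sketched in plain Java. This is an 
illustration under assumptions, not the RDFConnectionRemote code itself: 
BufferedPayload, PayloadWriter, buffer and contentLengthHeader are hypothetical 
names, and the lambda in main stands in for the real RDF-to-Thrift 
serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class BufferedPayload {
    /** Hypothetical stand-in for whatever serializes the graph. */
    interface PayloadWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Serializes completely into memory before anything goes on the wire. */
    static byte[] buffer(PayloadWriter writer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            writer.writeTo(bytes);
        } catch (IOException e) {
            // An in-memory stream does not fail itself; the writer might.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** The length is known only because the whole body is already in memory. */
    static String contentLengthHeader(byte[] body) {
        return "Content-Length: " + body.length;
    }

    public static void main(String[] args) {
        byte[] body = buffer(out ->
                out.write("example payload".getBytes(StandardCharsets.UTF_8)));
        System.out.println(contentLengthHeader(body));
    }
}
```

Because serialization finishes before the first byte is sent, a failure while 
serializing would surface on the client before any request body goes out.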

https://github.com/apache/jena/blob/master/jena-rdfconnection/src/main/java/org/apache/jena/rdfconnection/RDFConnectionRemote.java#L615

(run Fuseki with "verbose" to see the headers ... but it is quite verbose)

It sent something so the RDF->Thrift->bytes had finished and then it sent bytes.


As I tried to clarify, my remarks were w.r.t. the Fuseki/receiving end where 
the issue is getting logged. Not the sending/client end.

Chris

Anyway - you have the source code ... :-)

    Andy
