Hi,
Thank you for the reply. Unfortunately, the test is relatively complex to
reproduce because it requires JDK 10 with the jdk.incubator.httpclient
module. The client also connects to the app over HTTP/2. I suspect,
however, that any asynchronous client could produce a similar result.
Here are some details:
1. TrellisLDP has configuration options to write to an in-memory dataset or
a TDB filesystem dataset. Both of these options work with asynchronous
writes, and TrellisLDP depends on jena-osgi and jena-tdb2 version 3.6.0.
2. Fuseki is deployed in a Tomcat 9 container and trellis-app connects to
it over HTTPS using
RDFConnection.connect("https://fuseki:8443/fuseki/trellis"). (I do not
suspect that Tomcat or the protocol is relevant to the problem.)
3. The tests always start from a clean dataset.
4. The client test does repeated PUTs of a 5k N-Triples dataset (400
iterations).
5. The client test is run in JUnit5 in a JDK 10 JVM on localhost.
6. The trellis-app and Fuseki deployment is dockerized. The Fuseki dataset
directory is bound to the localhost filesystem. The container environments
are initialized clean for each run, which facilitates reproducible test
output.
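For anyone without the incubator module, the request pattern in item 4 can
be approximated with the standard java.net.http client. This is only a
sketch, not the ldp-client code: it assumes JDK 11 or later, and the class
name, base URL, resource naming, and payload are all hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch: fire N PUTs without waiting for each response,
// then count the status codes once all responses have arrived.
class AsyncPutSketch {

    // Build one PUT of an N-Triples body; the "/resource-<i>" naming is an assumption.
    static HttpRequest buildPut(String baseUrl, int i, String body) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/resource-" + i))
                .header("Content-Type", "application/n-triples")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Wait for every future and count how often each status code occurred.
    static Map<Integer, Long> tally(List<CompletableFuture<Integer>> futures) {
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
    }

    // Submit all requests up front so that writes overlap on the server side.
    static Map<Integer, Long> run(HttpClient client, String baseUrl, String body, int iterations) {
        List<CompletableFuture<Integer>> futures = IntStream.range(0, iterations)
                .mapToObj(i -> client
                        .sendAsync(buildPut(baseUrl, i, body), HttpResponse.BodyHandlers.discarding())
                        .thenApply(HttpResponse::statusCode))
                .collect(Collectors.toList());
        return tally(futures);
    }
}
```

A call such as run(HttpClient.newBuilder().version(HttpClient.Version.HTTP_2).build(),
baseUrl, body, 400) would correspond to the 400 iterations in the test, with
baseUrl pointing at the trellis-app container.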
I have committed server and client access logs from a 3.4.0 deployment and
a 3.6.0 deployment here:
https://github.com/pan-dora/ldp-client/tree/async/fuseki-issue
The 3.4.0 test starts at 12:19:05.807 (Central European Summer Time). The
access log records 400 PUT requests, all with a 201 response, in 25 seconds.
The 3.6.0 test starts at 12:27:18.161 (Central European Summer Time). The
access log records 266 PUT requests with a 201 response and 134 with a 500
response in 34 seconds.
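The counts above can be re-derived from an access log with a few lines of
Java. This is a sketch that assumes each line contains the request in
quotes followed directly by the status code, as in Common Log Format; the
class name is hypothetical and the pattern may need adjusting to the actual
log layout.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: count PUT responses per HTTP status code.
class AccessLogTally {

    // Assumed line shape: ... "PUT /some/path HTTP/2.0" 201 ...
    private static final Pattern PUT_STATUS = Pattern.compile("\"PUT [^\"]*\" (\\d{3})");

    static Map<Integer, Integer> tallyPutStatuses(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = PUT_STATUS.matcher(line);
            if (m.find()) {
                // Increment the counter for this status code.
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Usage: java AccessLogTally <path-to-access-log>
    public static void main(String[] args) throws IOException {
        System.out.println(tallyPutStatuses(Files.readAllLines(Paths.get(args[0]))));
    }
}
```

Lines that are not PUT requests, or that do not match the assumed layout,
are ignored rather than counted.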
The first exception occurs 23 seconds into the test (line 12369 in
fuseki-3.6.0-async.txt):
> [2018-03-31 10:27:41] BindingTDB ERROR get1(?object)
> org.apache.jena.tdb.base.file.FileException: In the middle of an
> alloc-write
Thank you again for your assistance, and I am glad to do further
diagnostics. I have considered applying incremental commit patches to a
3.4.0 build to isolate the changeset(s) as the next course of action.
Christopher Johnson
Scientific Associate
Universitätsbibliothek Leipzig
On 31 March 2018 at 09:26, Andy Seaborne <[email protected]> wrote:
> Hi there,
>
> The log files would be useful to know what the server was doing at the
> time and to see the stacktraces. The ideal is a complete, minimal example
> which we can use to investigate. Without a runnable setup, it's a bit
> harder to investigate.
>
> Some questions to help me understand what is happening:
>
> What does the code in pan-dora/ldp-client do? If it is concurrent
> operations, what else is the client doing at the time?
>
> Does it work if you go back to 3.4.0 after using 3.5.0 where it went wrong?
>
> What's the Fuseki server configuration?
>
> What's the history of the TDB database? How long ago was it created and
> has it been used outside of Fuseki or in different Fuseki configurations?
>
> JENA-1482 (which isn't in a released version of Jena) is a few diagnostics
> to catch error situations earlier.
>
> Andy
>
> On 30/03/18 07:06, Christopher Johnson wrote:
>
>> Hi list,
>>
>> I apologize, but on further evaluation, I have determined that it
>> definitely works in 3.4.0 but not in 3.5.0. So, the scope of changes
>> (since 2017 July) is a bit broader.
>> Also, for reference, here is a link to the client method
>> <https://github.com/pan-dora/ldp-client/blob/master/src/main/java/cool/pandora/ldpclient/LdpClientImpl.java#L875-L879>.
>> If you need detailed server logs, please let me know.
>>
>> Christopher Johnson
>> Scientific Associate
>> Universitätsbibliothek Leipzig
>>
>> On 30 March 2018 at 07:08, Christopher Johnson <[email protected]>
>> wrote:
>>
>>> Hi list,
>>>
>>> I am developing a client that writes asynchronously to Fuseki via
>>> TrellisLDP <https://github.com/trellis-ldp/trellis>. There seems to be
>>> an issue since 3.6.0 that throws a BindingTDB error.
>>>
>>> The exception is this:
>>>
>>>> fuseki | [2018-03-27 17:02:44] BindingTDB ERROR get1(?parent)
>>>> fuseki | java.nio.BufferOverflowException
>>>> fuseki | at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:214)
>>>> fuseki | at sun.nio.ch.IOUtil.read(IOUtil.java:200)
>>>>
>>>
>>>
>>> I looked at the code in BindingTDB and reverted the change in "JENA-1482:
>>> tests for null NodeIds at the TDB level" to see if a different result
>>> would occur.
>>>
>>> Now it throws this exception:
>>>
>>>> fuseki | [2018-03-30 03:39:34] BindingTDB ERROR get1(?object)
>>>> fuseki | org.apache.jena.tdb.base.file.FileException: In the middle of an alloc-write
>>>> fuseki | at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:311)
>>>>
>>>
>>>
>>> This looks like a concurrency issue. I do not know why it works in
>>> 3.5.0. Any ideas?
>>>
>>> Thank you for your consideration of this.
>>>
>>> Christopher Johnson
>>> Scientific Associate
>>> Universitätsbibliothek Leipzig
>>>
>>>
>>