On 29/03/2023 10:52, Osma Suominen wrote:
Hi Andy,

thanks for your quick response!

Andy Seaborne wrote on 29.3.2023 at 12.20:

Previous reports about this have involved hitting disk limits, other OS processes touching the files (including via a shared file system), and I/O errors. These are external environmental factors that happen silently, a significant time before the problem emerges.

Unfortunately, reports don't always get completed: there's a report, they try some things out, and we don't hear anything more. We don't get a picture of what actually happened or what worked.

I understand that these kinds of intermittent problems can be hard to debug and that the cause can be an external factor. It's possible that this happened in our case as well. The machines are virtual servers running under VMware and they have their own XFS file systems based on LVM on (virtual) block devices. As far as I understand, nothing other than Fuseki itself could be performing write operations on the Fuseki database files. The disks have never been full. This happened on two separate (though very similar) machines, a few days apart.

The one Jena-related issue was compaction in the presence of updates.

Compact got significant robustness improvements in 4.6.x.

https://github.com/apache/jena/issues/1252
https://github.com/apache/jena/pull/1456

It should be safe to compact an online database. Note that a compact is a "write" operation, so concurrent writers are held up while the compact is running. Outstanding concurrent readers can continue, and new concurrent readers can start during compaction.

Good to know! We do not currently use the compact functionality in Fuseki, so I don't think it can be a factor in this.

Anything is possible, but Jena's use of Thrift is Java-only, and Thrift enforces the union-defined assumption.

The "type 0" means it is reading something corrupted at a lower level.
Union is used for all RDF terms. Unless you have node extensions (which need Java code), this is code that is executed a lot.

https://github.com/apache/jena/blob/16c9a8295d78a19787bdaa05b359af97ba00dcab/jena-arq/Grammar/RDF-Thrift/BinaryRDF.thrift#L68

We are using stock Apache Jena Fuseki builds. Nothing very customized except for some moderately complex jena-text configuration.

In my understanding Thrift is an RPC framework. I'm not sure I understand very well how it is used within Jena, when handling regular SPARQL queries coming in via HTTP to Fuseki. Are Thrift objects stored in TDB2? (The problem seemed to persist across Fuseki restarts.)

Basically I'm wondering how it's possible that "Thrift enforces the union-defined assumption" but still there was a Thrift object that apparently didn't follow it. How was it created? Or was it created, serialized to disk, somehow corrupted on-disk and then read back?

I don't think it is union-related. Anything broken is going to look like a union. All RDF terms are unions. I think it's looking into the middle of a messed up set of bytes for a term.

But if it is a broken union, it would be possible to write a short Java program that takes a node, serializes it, and then can't deserialize it. (I think all cases are in the test suite.) That's all deterministic.

The reports are for occasional, random errors that can't be reproduced after a rebuild.
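
A minimal sketch of the kind of round-trip check described above, using only the Java standard library. It does not use Jena or Thrift: the class name TermRoundTrip, the tag values, and the byte layout (one tag byte, a four-byte length, then UTF-8 bytes) are hypothetical stand-ins for a tagged union in which type 0 is never written. It shows that writing and reading a term at the correct offset is deterministic, while starting a read in the middle of the bytes can surface as an unknown "type 0".

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class TermRoundTrip {
    // Hypothetical tags; 0 is deliberately unused, so it never marks a valid term.
    static final int TAG_IRI = 1;
    static final int TAG_BNODE = 2;
    static final int TAG_LITERAL = 3;

    // Encode a term as: tag byte, 4-byte big-endian length, UTF-8 payload.
    static byte[] write(int tag, String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(1 + 4 + payload.length)
                .put((byte) tag)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Decode a term starting at the given offset; reject anything that is not a known tag.
    static String read(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data, offset, data.length - offset);
        int tag = buf.get() & 0xFF;
        if (tag < TAG_IRI || tag > TAG_LITERAL) {
            throw new IllegalStateException("Unknown term type " + tag + " at offset " + offset);
        }
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalStateException("Bad payload length " + length + " at offset " + offset);
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        return tag + ":" + new String(payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = write(TAG_IRI, "http://example/s");

        // Reading from the start of the term round-trips every time.
        System.out.println(read(data, 0));

        // Reading from one byte in lands on the high byte of the length field.
        try {
            System.out.println(read(data, 1));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second read fails because the reader starts at the wrong place, not because the writer produced a bad union. That is the distinction drawn above between a deterministic encoding bug and bytes that were damaged or misaddressed after being written.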

    Andy


-Osma
