Hi Andy,
thanks for your quick response!
Andy Seaborne wrote on 29.3.2023 at 12.20:
Previous reports about this have involved hitting disk limits, other OS
processes touching the files (including on a shared file system) and I/O
errors: external environment factors that happen silently a significant
time before the problem emerges.
Unfortunately, reports don't always get completed - there's a report,
they try some things out, we don't hear anything more. We don't get a
picture of what actually happened or what worked.
I understand that these kinds of intermittent problems can be hard to
debug and the cause can be an external factor. It's possible that this
happened in our case as well. The machines are virtual servers running
under VMware and they have their own XFS file systems based on LVM on
(virtual) block devices. As far as I understand, nothing other than
Fuseki itself could be performing write operations on the Fuseki
database files. The disks have never been full. This happened on two
separate (though very similar) machines, a few days apart.
The one Jena-related issue was compaction in the presence of updates.
Compaction got significant robustness improvements in 4.6.x.
https://github.com/apache/jena/issues/1252
https://github.com/apache/jena/pull/1456
It should work safely to compact an online database. Note that a compact
is a "write" operation, so while the compact is running, concurrent
writers are held up. Outstanding concurrent readers can continue, and new
concurrent readers can start during compaction.
Good to know! We do not currently use the compact functionality in
Fuseki, so I don't think it can be a factor in this.
Anything is possible, but Jena's use of Thrift is Java-only and Thrift
enforces the union-defined assumption.
The "type 0" means it is reading something corrupted at a lower level.
Union is used for all RDF terms. Unless you have node extensions (needs
Java code), this is code that is executed a lot.
https://github.com/apache/jena/blob/16c9a8295d78a19787bdaa05b359af97ba00dcab/jena-arq/Grammar/RDF-Thrift/BinaryRDF.thrift#L68
We are using stock Apache Jena Fuseki builds. Nothing very customized
except for some moderately complex jena-text configuration.
In my understanding Thrift is an RPC framework. I'm not sure I
understand very well how it is used within Jena, when handling regular
SPARQL queries coming in via HTTP to Fuseki. Are Thrift objects stored
in TDB2? (The problem seemed to persist across Fuseki restarts.)
Basically I'm wondering how it's possible that "Thrift enforces the
union-defined assumption" but still there was a Thrift object that
apparently didn't follow it. How was it created? Or was it created,
serialized to disk, somehow corrupted on-disk and then read back?
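
To make that last scenario concrete, here is a minimal Python sketch of
what I have in mind. It is a toy tagged-union format, not Thrift's actual
wire format and not Jena's code; the tag values, the 5-byte header and the
names encode_term/decode_term are all made up for illustration. The writer
refuses to emit anything without a valid tag, yet the reader can still see
"tag 0" if the stored bytes are zeroed afterwards.

```python
import struct
from typing import Tuple

# Hypothetical tags for a toy RDF-term union.
# These are NOT Thrift's or Jena's real field ids.
TAG_IRI, TAG_BNODE, TAG_LITERAL = 1, 2, 3
VALID_TAGS = {TAG_IRI, TAG_BNODE, TAG_LITERAL}


def encode_term(tag: int, value: str) -> bytes:
    """Writer side: refuses to serialize a union without a valid tag."""
    if tag not in VALID_TAGS:
        raise ValueError(f"refusing to write union with tag {tag}")
    payload = value.encode("utf-8")
    # Layout: 1-byte tag, 4-byte big-endian length, then the payload.
    return struct.pack(">BI", tag, len(payload)) + payload


def decode_term(data: bytes) -> Tuple[int, str]:
    """Reader side: rejects any tag the writer could never have produced."""
    tag, length = struct.unpack_from(">BI", data, 0)
    if tag not in VALID_TAGS:
        raise ValueError(f"unexpected union tag {tag}")
    return tag, data[5:5 + length].decode("utf-8")


# A well-formed record round-trips without problems.
good = encode_term(TAG_IRI, "http://example.org/s")
assert decode_term(good) == (TAG_IRI, "http://example.org/s")

# Simulate storage-level damage: the 5 header bytes are zeroed after writing.
damaged = bytes(5) + good[5:]
try:
    decode_term(damaged)
    outcome = "decoded"
except ValueError as err:
    outcome = str(err)
print(outcome)
```

If something like this is what happens in TDB2, the invalid object was
never created by Jena at all; it only appears when damaged bytes are read
back, which would also explain why the problem survives a restart.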
-Osma
--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
[email protected]
http://www.nationallibrary.fi