Re: Thrift problem / corruption on large TDB2 Fuseki dataset

Osma Suominen Wed, 29 Mar 2023 02:52:49 -0700

Hi Andy,

thanks for your quick response!


Andy Seaborne wrote on 29.3.2023 at 12.20:

> Previous reports about this have involved hitting disk limits, other OS processes touching the files (including via a shared file system), and I/O errors: external environmental factors that happen silently a significant time before the problem emerges.
> Unfortunately, reports don't always get completed: there's a report, they try some things out, and we don't hear anything more. We don't get a picture of what actually happened nor of what worked.

I understand that these kinds of intermittent problems can be hard to debug and that the cause can be an external factor. It's possible that this happened in our case as well. The machines are virtual servers running under VMware, and they have their own XFS file systems based on LVM on (virtual) block devices. As far as I understand, nothing other than Fuseki itself could be performing write operations on the Fuseki database files. The disks have never been full. This happened on two separate (though very similar) machines, a few days apart.

> The one Jena-related issue was compaction in the presence of updates.
>
> Compaction got significant robustness improvements in 4.6.x.
>
> https://github.com/apache/jena/issues/1252
> https://github.com/apache/jena/pull/1456
> It should work safely to compact an online database. Note that a compact is a "write" operation, so while the compaction is running, concurrent writers are held up. Outstanding concurrent readers can continue, and new concurrent readers can start during compaction.

Good to know! We do not currently use the compact functionality in Fuseki, so I don't think it can be a factor in this.
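To make the concurrency rule quoted above concrete, here is a minimal Java sketch of a "multiple readers, single writer" arrangement in which compaction simply holds the single writer lock for its whole duration. The class `CompactSketch` and its methods are hypothetical and are not the TDB2 or Fuseki API; the sketch only models the described behaviour: writers are held up while compaction runs, readers are not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, NOT the TDB2/Fuseki API: "multiple readers + single
// writer", where compaction is just another holder of the writer lock.
public class CompactSketch {
    private final ReentrantLock writerLock = new ReentrantLock(true);
    private volatile String data = "v1";

    // Readers never take the writer lock, so they can start during compaction.
    String read() {
        return data;
    }

    // A writer waits up to timeoutMs for the single writer lock.
    boolean tryWrite(Runnable change, long timeoutMs) {
        try {
            if (!writerLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false; // held up by another writer, e.g. a compaction
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            change.run();
            return true;
        } finally {
            writerLock.unlock();
        }
    }

    // Compaction is a "write" operation: it holds the writer lock throughout.
    void compact(Runnable work) {
        writerLock.lock();
        try {
            work.run();
        } finally {
            writerLock.unlock();
        }
    }

    // Returns {reader ran during compaction, writer was held up, writer ran after}.
    static boolean[] demo() {
        CompactSketch db = new CompactSketch();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread compactor = new Thread(() -> db.compact(() -> {
            started.countDown();
            try {
                release.await(); // simulate a long-running compaction
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        compactor.start();
        try {
            started.await();
            boolean readerRan = "v1".equals(db.read());
            boolean writerHeldUp = !db.tryWrite(() -> db.data = "v2", 100);
            release.countDown();
            compactor.join();
            boolean writerRanAfter = db.tryWrite(() -> db.data = "v2", 100);
            return new boolean[] {readerRan, writerHeldUp, writerRanAfter};
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

Running `main` prints `[true, true, true]`: a new reader completes while the compaction is in progress, a writer times out waiting for the lock, and the same writer succeeds once the compaction has released it.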

> Anything is possible, but Jena's use of Thrift is Java-only and Thrift enforces the union-defined assumption.
> The "type 0" means it is reading something that was corrupted at a lower level.
> Union is used for all RDF terms. Unless you have node extensions (which need Java code), this is code that is executed a lot.
> https://github.com/apache/jena/blob/16c9a8295d78a19787bdaa05b359af97ba00dcab/jena-arq/Grammar/RDF-Thrift/BinaryRDF.thrift#L68

We are using stock Apache Jena Fuseki builds. Nothing very customized except for some moderately complex jena-text configuration.

In my understanding, Thrift is an RPC framework. I'm not sure I understand very well how it is used within Jena when handling regular SPARQL queries coming in via HTTP to Fuseki. Are Thrift objects stored in TDB2? (The problem seemed to persist across Fuseki restarts.)

Basically, I'm wondering how it's possible that "Thrift enforces the union-defined assumption" but there was still a Thrift object that apparently didn't follow it. How was it created? Or was it created, serialized to disk, somehow corrupted on disk and then read back?
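One way to picture how a "type 0" can appear even though the writer enforces the union rule: the bytes were valid when written and changed afterwards. The Java sketch below is a hand-rolled, hypothetical model of Thrift's binary protocol (a 1-byte field type, a 2-byte field id, and type 0 meaning STOP, "no more fields") for a union with a single string-valued field. It is not Jena's `RDF_Term` layout or the Thrift library's code, the field id is illustrative, and it assumes the damaged bytes came back as zeros.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, NOT Jena or Apache Thrift code: a hand-rolled model of
// Thrift's binary protocol for a union with one string-valued field.
public class UnionDecodeSketch {
    static final byte T_STOP = 0;    // TType.STOP: "no more fields"
    static final byte T_STRING = 11; // TType.STRING

    // Encode a union with exactly one field set: header, value, then STOP.
    static byte[] encode(short fieldId, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + 4 + utf8.length + 1);
        buf.put(T_STRING);       // field type
        buf.putShort(fieldId);   // field id (illustrative)
        buf.putInt(utf8.length); // string length
        buf.put(utf8);           // string bytes
        buf.put(T_STOP);         // end of struct
        return buf.array();
    }

    // Decode, enforcing the union rule: exactly one field must be set.
    static String decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte type = buf.get();
        if (type == T_STOP) {
            // A writer that enforces the union rule never emits this, so the
            // bytes must have been altered after they were written.
            throw new IllegalStateException("union has no field set: type 0");
        }
        if (type != T_STRING) {
            throw new IllegalStateException("unexpected field type " + type);
        }
        short fieldId = buf.getShort();
        byte[] utf8 = new byte[buf.getInt()];
        buf.get(utf8);
        if (buf.get() != T_STOP) {
            throw new IllegalStateException("more than one field set");
        }
        return fieldId + ":" + new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode((short) 1, "http://example/s")));
        byte[] zeroed = new byte[16]; // e.g. a block that was read back as zeros
        try {
            decode(zeroed);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Decoding freshly encoded bytes returns `1:http://example/s`, while the zero-filled buffer fails on its very first byte, because type 0 is read as STOP before any field has been set.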


-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 15 (Unioninkatu 36)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi
