I'm sorry I cannot offer any further information, as I do not really understand Java in any meaningful way.

Just in case this provides some useful information:

The data corruption was limited to one named graph within the corrupted dataset. The triples within that graph were present and could be accessed when using the default union graph, but any query asking for the graph name of these triples resulted in the same error, with this added to the front:

BindingTDB ERROR get1(?add)

where the variable name was the one used in the SPARQL query for the data within the graph.

If graph information was not asked for, the triples were returned OK.
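
To make the two cases concrete, here is a minimal sketch using the Jena/TDB2 API (the dataset path is hypothetical, and this is a reconstruction of the query shapes, not my exact queries):

----------------------------------------------------------------------
// Two query shapes against the same TDB2 dataset: reading the triples
// without asking for the graph name worked; asking for it did not.
import org.apache.jena.query.Dataset;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;

public class GraphNameCheck {
    public static void main(String[] args) {
        Dataset ds = TDB2Factory.connectDataset("/path/to/dataset"); // hypothetical

        // Worked: triples read via the union graph, no graph name requested.
        String unionQuery =
            "SELECT * WHERE { GRAPH <urn:x-arq:UnionGraph> { ?s ?p ?o } }";

        // Failed with "BindingTDB ERROR get1(?g)" (the variable name in the
        // error matches the one used in the query).
        String graphQuery = "SELECT * WHERE { GRAPH ?g { ?s ?p ?o } }";

        Txn.executeRead(ds, () -> {
            try (QueryExecution qe = QueryExecutionFactory.create(unionQuery, ds)) {
                ResultSetFormatter.out(qe.execSelect()); // returned the triples OK
            }
            try (QueryExecution qe = QueryExecutionFactory.create(graphQuery, ds)) {
                ResultSetFormatter.out(qe.execSelect()); // errored on the corrupted graph
            }
        });
    }
}
----------------------------------------------------------------------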

And to repeat, this was not the dataset that was updated: that dataset seems ok. I'm quite positive the corrupted dataset did not have any activity going on at that moment.

One option that comes to mind: could it be that the corruption took place earlier? Is there something else that could explain the errors? The database did start OK after I had freed some space on the disk; the errors only manifested when I tried to compact the data, and the data in the corrupted graphs had not been used for a while.

Just FYI:

Since the actual graph entity was somehow corrupted, the data within it could not be deleted or edited. The remedy was to export the data from all named graphs, delete the dataset files, and import the data back; luckily, the data in the corrupted graph could easily be recreated.
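
For reference, the export step looked roughly like this (a sketch only: the dataset path and the output file naming are made up):

----------------------------------------------------------------------
// Export every named graph to a Turtle file so the dataset directory
// can be deleted and the data loaded into a fresh dataset.
import java.io.FileOutputStream;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import org.apache.jena.query.Dataset;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;

public class ExportGraphs {
    public static void main(String[] args) {
        Dataset ds = TDB2Factory.connectDataset("/path/to/dataset"); // hypothetical
        Txn.executeRead(ds, () -> {
            Iterator<String> names = ds.listNames();
            while (names.hasNext()) {
                String name = names.next();
                String file = URLEncoder.encode(name, StandardCharsets.UTF_8) + ".ttl";
                try (FileOutputStream out = new FileOutputStream(file)) {
                    RDFDataMgr.write(out, ds.getNamedModel(name), Lang.TURTLE);
                } catch (Exception e) {
                    // The corrupted graph fails here; skip it and recreate
                    // its data from the original sources.
                    System.err.println("Skipping " + name + ": " + e);
                }
            }
        });
    }
}
----------------------------------------------------------------------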


Best,

Harri


On 21.5.2021 14.14, Andy Seaborne wrote:
Hi,

The JVM crash with SIGBUS looks like:

https://bugs.openjdk.java.net/browse/JDK-8168628
and see the comment 22-05-2018
"This change has been backed out of JDK 11 as it break sparse files."

which refers to:
https://bugs.openjdk.java.net/browse/JDK-8191278

That issue was fixed in JDK 14. There looks to be a backport to Java 8 as well, but that might be OpenJDK 8 only. Mikael is running java-8-oracle - it's not clear whether that has the backport.
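
As an illustration (not Jena code; the path and size are made up): TDB2 accesses its files via memory mapping, and a store into a mapped page whose disk block can't be allocated faults at the machine level instead of throwing an IOException, which the JVM reports as SIGBUS:

----------------------------------------------------------------------
// Illustration of the failure mode: a write through a memory-mapped
// file turns "disk full" into a machine-level fault (SIGBUS) rather
// than an IOException. Path and size are hypothetical.
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedWrite {
    public static void main(String[] args) throws Exception {
        try (FileChannel ch = FileChannel.open(Path.of("/tmp/demo.dat"),
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping may extend the file sparsely; disk blocks are only
            // allocated when a page is first written.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_WRITE, 0, 8L * 1024 * 1024);
            buf.putInt(0, 42); // on a full filesystem this store can raise SIGBUS
        }
    }
}
----------------------------------------------------------------------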

I can't connect that to why the Fuseki node table becomes broken, because the transaction shouldn't happen. Even a partial commit should be recovered (the journal is replayed on start-up if it has entries).
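
For context, a sketch of the write pattern in question (hypothetical dataset path and data):

----------------------------------------------------------------------
// A TDB2 write transaction (hypothetical path and triple). Txn.executeWrite
// commits on normal return; the journal makes that commit recoverable, so
// a crash mid-commit should be repaired when the dataset is next opened.
import org.apache.jena.query.Dataset;
import org.apache.jena.system.Txn;
import org.apache.jena.tdb2.TDB2Factory;
import org.apache.jena.vocabulary.OWL;
import org.apache.jena.vocabulary.RDF;

public class JournaledWrite {
    public static void main(String[] args) {
        Dataset ds = TDB2Factory.connectDataset("/path/to/dataset"); // hypothetical
        Txn.executeWrite(ds, () ->
            ds.getDefaultModel()
              .createResource("http://example/s") // hypothetical data
              .addProperty(RDF.type, OWL.Thing));
    }
}
----------------------------------------------------------------------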

     Andy

On 21/05/2021 09:14, Harri Kiiskinen wrote:
I seem to be having problems similar to the ones M. Pesonen described in the other thread.

Summary:
The Fuseki server encountered a "disk full" situation (see log and error report below) during an update, leading to a crash. After restart, some parts of the database are corrupted: dump and compact fail with NodeTableTRDF/Read exceptions (see the third error log below), as do some queries, though not all.

The corruption has taken place in a different dataset from the one that was being updated when the disk-full occurred.

Logs below,

best

Harri Kiiskinen

Fuseki log for "disk full" crash:
--------------------------------------------------------------------
fuseki-server[216149]: [2021-05-20 21:37:07] Fuseki     INFO  [182050] Update
fuseki-server[216149]: #
fuseki-server[216149]: # A fatal error has been detected by the Java Runtime Environment:
fuseki-server[216149]: #
fuseki-server[216149]: #  SIGBUS (0x7) at pc=0x00007f2b608b7e15, pid=216149, tid=768713
...
--------------------------------------------------------------------

The error report /tmp/hs_err_pid216149.log:
----------------------------------------------------------------------
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGBUS (0x7) at pc=0x00007f2b608b7e15, pid=216149, tid=768713
#
# JRE version: OpenJDK Runtime Environment (11.0.11+9) (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
# Java VM: OpenJDK 64-Bit Server VM (11.0.11+9-Ubuntu-0ubuntu2.20.04, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# v  ~StubRoutines::jint_disjoint_arraycopy
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to //core.216149)
#
# If you would like to submit a bug report, please visit:
#   https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
#

---------------  S U M M A R Y ------------

Command Line: -Xmx2G org.apache.jena.fuseki.cmd.FusekiCmd --jetty-config=/etc/fuseki/fuseki-jetty-https.xml

Host: Intel(R) Xeon(R) Gold 6254 CPU @ 3.10GHz, 2 cores, 15G, Ubuntu 20.04.2 LTS
Time: Thu May 20 21:37:07 2021 EEST elapsed time: 1517007.667502 seconds (17d 13h 23m 27s)

...

Error message when running tdb2.tdbcompact
-----------------------------------------------------------------
org.apache.jena.tdb2.TDBException: NodeTableTRDF/Read
     at org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:87)
     at org.apache.jena.tdb2.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:103)
     at org.apache.jena.tdb2.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:52)
     at org.apache.jena.tdb2.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:206)
     at org.apache.jena.tdb2.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:131)

...

     at tdb2.tdbcompact.main(tdbcompact.java:28)
Caused by: org.apache.thrift.protocol.TProtocolException: Unrecognized type 0
     at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:144)
     at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:60)
     at org.apache.jena.riot.thrift.wire.RDF_Term.standardSchemeReadValue(RDF_Term.java:433)
     at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:224)
     at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:213)
     at org.apache.thrift.TUnion.read(TUnion.java:138)
     at org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:82)
     ... 27 more
-----------------------------------------------------------------------


--
Tutkijatohtori / post-doctoral researcher
Viral Culture in the Early Nineteenth-Century Europe (ViCE)
Movie Making Finland: Finnish fiction films as audiovisual big data, 1907–2017 (MoMaF)
Turun yliopisto / University of Turku
