Dan,

The exceptions don't correspond to problems with the indexing of triples; they come from the node table. The query


SELECT (count(*) AS ?C) { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }

counts only the triple and quad indexes, not the node table, so

> "C": { "type": "literal" , "datatype":
> "http://www.w3.org/2001/XMLSchema#integer" , "value": "0" }

suggests the indexes are empty (specifically indexes SPO and GSPO).
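
For anyone who wants to repeat this check from a script, here is a minimal sketch that sends the count query above to a SPARQL endpoint and reads ?C from the JSON results. The endpoint URL in the usage comment is a hypothetical placeholder, and the function names are made up for illustration; only the query text and the shape of the result come from this thread.

```python
import json
import urllib.parse
import urllib.request

# The count query from this thread: default graph plus all named graphs.
COUNT_QUERY = (
    "SELECT (count(*) AS ?C) "
    "{ { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }"
)


def extract_count(results):
    """Return the integer bound to ?C in a SPARQL JSON results document."""
    binding = results["results"]["bindings"][0]
    return int(binding["C"]["value"])


def run_count(endpoint):
    """POST the count query to a SPARQL endpoint and return the total."""
    data = urllib.parse.urlencode({"query": COUNT_QUERY}).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_count(json.load(response))


# Usage (hypothetical endpoint, not taken from this thread):
#   run_count("http://localhost:3030/ds/query")
```

A result of 0 only says the triple and quad indexes look empty; it says nothing about the state of the node table.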

Could moving the databases around have got files mixed up?

In the backup log, there wasn't a NodeTable exception.


    Andy


On 15/05/18 22:36, Dan Pritts wrote:
...turns out that the server I was working on is known to have a corrupted index; we'd left it running for no good reason. Meanwhile, development work, and the associated CNAME, had moved on to another server. So I was querying the good server's web interface, but logging in to the bad server to attempt backups.



I don't know what caused the index to be corrupted on the bad server; the person involved is one of those who is unavailable today. If it seems like it might have been errors in Fuseki (as opposed to operator error), I'll let you know.

For the record, on the bad machine,  the count(*) query returns:

"C": { "type": "literal" , "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "value": "0" }

The tdbdump command returns with no output. So the indices are so borked that the code believes there is nothing there, which fits with a size 0 backup.


........

Meanwhile, the "working" server, where we have the object we had trouble deleting, throws the following error when I try to back it up.  It only backs up a couple hundred (out of 20 million) entries before it croaks.

[2018-05-15 16:35:56] Admin      INFO  [156546] 200 OK (12 ms)
[2018-05-15 16:35:56] TDB        ERROR ObjectFileStorage.read[nodes](595777248)[filesize=613223078][file.size()=613223078]: Impossibly large object : 1013478516 bytes > filesize-(loc+SizeOfInt)=17445826
[2018-05-15 16:35:56] Log4jLoggerAdapter WARN  Exception in backup
org.apache.jena.tdb.base.file.FileException: ObjectFileStorage.read[nodes](595777248)[filesize=613223078][file.size()=613223078]: Impossibly large object : 1013478516 bytes > filesize-(loc+SizeOfInt)=17445826
        at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:348)
        at org.apache.jena.tdb.base.objectfile.ObjectFileWrapper.read(ObjectFileWrapper.java:57)
        at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
        at org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186)
        at org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111)
        at org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70)
        at org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)
        at org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82)
        at org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50)
        at org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
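
The numbers in that error message can be checked by hand. The sketch below redoes the bounds arithmetic using only values from the log line, and then shows what the "impossible" length looks like as raw bytes. It assumes the length prefix is a 4-byte big-endian integer (SizeOfInt = 4, Java's default byte order); that is my assumption, not something the log states.

```python
import struct

# Values copied from the TDB error line above.
FILESIZE = 613223078      # size of the nodes file
LOC = 595777248           # offset the read was attempted at
CLAIMED_LEN = 1013478516  # object length found at that offset

# Assumption: the length prefix is a 4-byte big-endian int.
SIZE_OF_INT = 4

# Bytes left in the file after the length prefix.
remaining = FILESIZE - (LOC + SIZE_OF_INT)

# The same 4 bytes viewed as raw data instead of as a length.
raw = struct.pack(">i", CLAIMED_LEN)

print(remaining)                # 17445826, matching the log
print(CLAIMED_LEN > remaining)  # True, hence "Impossibly large object"
print(raw)                      # b'<htt'
```

If the byte-order assumption holds, the four bytes read as a length are the characters "<htt", which looks like the start of a URI. That would be consistent with the read landing inside node data rather than on a length prefix, though it does not prove how the offset went wrong.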


The configuration is identical; it is a clone made from an AWS snapshot.

In case it's useful:

count query: "C": { "type": "literal" , "datatype": "http://www.w3.org/2001/XMLSchema#integer" , "value": "20371945" }

tdbdump dumps what looks like the same few hundred entries, then throws a similar stack trace:

17:31:28 ERROR TDB                  :: ObjectFileStorage.read[nodes](595777248)[filesize=613223078][file.size()=613223078]: Impossibly large object : 1013478516 bytes > filesize-(loc+SizeOfInt)=17445826
org.apache.jena.tdb.base.file.FileException: ObjectFileStorage.read[nodes](595777248)[filesize=613223078][file.size()=613223078]: Impossibly large object : 1013478516 bytes > filesize-(loc+SizeOfInt)=17445826
        at org.apache.jena.tdb.base.objectfile.ObjectFileStorage.read(ObjectFileStorage.java:348)
        at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:78)
        at org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186)
        at org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111)
        at org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70)
        at org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)



I don't know what we did to muck up the indices; again, the appropriate guy is unavailable. I doubt we used anything other than the HTTP SPARQL interface, but I could definitely be wrong. I know he was making very significant changes, but I think that was on yet another development server.

Andy Seaborne <mailto:a...@apache.org>
May 15, 2018 at 4:42 PM
Dan,

Could you try this query:

SELECT (count(*) AS ?C) { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } }

For the list archives: if anyone ever wants to try this, it's missing a trailing }


thanks
danno
--
Dan Pritts
ICPSR Computing & Network Services
University of Michigan
