We are using Fuseki 3.4.0 with TDB, and we have some trouble with one
dataset. For some SPARQL queries we get an
org.apache.jena.atlas.lib.InternalErrorException: Invalid id node for
subject (null node).
For example, counting the triples in the default graph gives:
[2017-12-21 12:10:59] Fuseki INFO [39] Query = select (count(*) as ?count) {?s ?p ?o}
[2017-12-21 12:10:59] Fuseki WARN [39] RC = 500 : Invalid id node for subject (null node): ([0000000000094FAF], [00000000000001B4], [000000000000038B])
org.apache.jena.atlas.lib.InternalErrorException: Invalid id node for subject (null node): ([0000000000094FAF], [00000000000001B4], [000000000000038B])
at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:98)
at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:84)
at org.apache.jena.tdb.lib.TupleLib.lambda$convertToTriples$2(TupleLib.java:54)
We get the same exception when we try to back up the dataset:
[2017-12-21 12:18:14] Admin INFO [47] POST http://localhost:3030/$/backup/MICA
[2017-12-21 12:18:14] Admin INFO [47] Backup dataset /MICA
[2017-12-21 12:18:14] Server INFO Task : 1 : backup
[2017-12-21 12:18:14] Server INFO [Task 1] starts : backup
[2017-12-21 12:18:14] Backup INFO [47] >>>> Start backup /MICA -> P:\DevTools\SemWeb\apache-jena-fuseki-3.4.0\run\backups\MICA_2017-12-21_12-18-14
[2017-12-21 12:18:14] Admin INFO [47] 200 OK (6 ms)
[2017-12-21 12:18:14] Server INFO [48] GET http://localhost:3030/$/tasks/1
[2017-12-21 12:18:14] Server INFO [48] Task 1
[2017-12-21 12:18:14] Server INFO [48] 200 OK (4 ms)
[2017-12-21 12:18:15] Backup INFO [47] **** Exception in backup
org.apache.jena.atlas.lib.InternalErrorException: Invalid id node for subject (null node): ([0000000000094FAF], [00000000000001B4], [000000000000038B])
at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:98)
at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:84)
at org.apache.jena.tdb.lib.TupleLib.lambda$convertToTriples$2(TupleLib.java:54)
at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
at org.apache.jena.atlas.iterator.Iter.next(Iter.java:875)
Any idea about the origin of this problem?
Is there any way to fix it?
Thanks for your help,
Philippe
On 21/12/2017 at 11:52, Piotr Nowara wrote:
Here are gist links to the test classes I mentioned in my previous message:
https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60
Thanks,
Piotr
2017-12-21 10:38 GMT+01:00 Andy Seaborne <[email protected]>:
Attachments don't come through on the list. Please use a paste or gist.
I hope these examples are short and concise. Complete, Minimal Examples
please.
HTML messes up structured text but:
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>apache-jena</artifactId>
<version>3.5.0</version>
<type>zip</type>
</dependency>
should be:
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>apache-jena-libs</artifactId>
<version>3.5.0</version>
<type>pom</type>
</dependency>
(you picked most of it up via the TDB dependency).
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>jena-csv</artifactId>
<version>3.5.0</version>
<type>jar</type>
</dependency>
Is this necessary for your example?
Andy
On 21/12/17 09:08, Piotr Nowara wrote:
Hi,
I'm attaching two simple Java classes which I'm using for testing (there
are some comments there describing what results I got). The Java app and
Fuseki are on the same server. FusekiTest3 invokes
DatasetAccessorFactory.createHTTP() (so I think this is what you mean by
a "remote" implementation) and FusekiTest2 uses
RDFConnection.fetchDataset (which is the slowest operation).
The GRAPH clause gives me the expected results (it returns the newly added
triple), but why should FROM be wrong?
GRAPH accesses a named graph.
FROM describes a dataset to be queried.
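To make the distinction concrete, here is a minimal sketch of the dataset semantics in plain Java. It is a toy model, not Jena code: triples are plain strings, and the names `ToyDataset`, `from`, `matchDefault` and `matchGraph` are hypothetical helpers invented for this illustration. It assumes the standard SPARQL reading that FROM <iri> describes a new dataset whose default graph is the named graph, with no named graphs unless FROM NAMED is also given.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: triples are plain strings and no Jena classes are involved.
class FromVsGraphSketch {

    /** One default graph plus any number of named graphs. */
    static final class ToyDataset {
        final Set<String> defaultGraph = new LinkedHashSet<>();
        final Map<String, Set<String>> namedGraphs = new LinkedHashMap<>();
    }

    /** FROM <iri> ...: describe a new dataset whose default graph merges the listed graphs. */
    static ToyDataset from(ToyDataset store, String... graphIris) {
        ToyDataset described = new ToyDataset();
        for (String iri : graphIris) {
            described.defaultGraph.addAll(
                store.namedGraphs.getOrDefault(iri, Collections.emptySet()));
        }
        return described; // no FROM NAMED was given, so there are no named graphs
    }

    /** WHERE { ?s ?p ?o }: matches the default graph of the dataset being queried. */
    static Set<String> matchDefault(ToyDataset ds) {
        return ds.defaultGraph;
    }

    /** WHERE { GRAPH <iri> { ?s ?p ?o } }: matches one named graph of that dataset. */
    static Set<String> matchGraph(ToyDataset ds, String iri) {
        return ds.namedGraphs.getOrDefault(iri, Collections.emptySet());
    }

    /** Example store: one triple in the default graph, one in the named graph. */
    static ToyDataset exampleStore() {
        ToyDataset store = new ToyDataset();
        store.defaultGraph.add(":a :p :b");
        Set<String> portal = new LinkedHashSet<>();
        portal.add(":x :p :y");
        store.namedGraphs.put("http://www.myGraph.com/portal", portal);
        return store;
    }

    public static void main(String[] args) {
        String g = "http://www.myGraph.com/portal";
        ToyDataset store = exampleStore();
        System.out.println("GRAPH <g> on the store:      " + matchGraph(store, g));
        System.out.println("plain pattern on the store:  " + matchDefault(store));
        System.out.println("plain pattern after FROM <g>: " + matchDefault(from(store, g)));
    }
}
```

In this model the two query forms agree only when the store actually holds the graph under that name. If everything has been merged into a single default graph beforehand, neither GRAPH <g> nor FROM <g> has a named graph to find.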
We use the FROM clause in many of our queries and we didn't notice anything
wrong or unexpected when using a TDB dataset. With Fuseki, FROM seems to
return the content of the default graph and not the graph indicated by
FROM <named-graph-IRI>.
Both tests fail to preserve the newly added triple.
Here are the Maven artifacts I'm using for the client app (maybe I should
download some Fuseki-specific JAR?):
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>jena-tdb</artifactId>
<version>3.5.0</version>
<type>jar</type>
</dependency>
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>apache-jena</artifactId>
<version>3.5.0</version>
<type>zip</type>
</dependency>
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>jena-csv</artifactId>
<version>3.5.0</version>
<type>jar</type>
</dependency>
Thanks,
Piotr
2017-12-20 16:52 GMT-05:00 Andy Seaborne <[email protected]>:
On 20/12/17 18:28, Piotr Nowara wrote:
Hi,
thanks for answering so quickly.
I tried two different solutions:
1) Merging models obtained using DatasetAccessor
Which implementation of DatasetAccessor? (local or remote?)
Model portal = accessor.getModel("http://www.myGraph.com/portal");
Model defaultM = accessor.getModel();
Model external = accessor.getModel("http://www.myGraph.com/external");
dataset = DatasetFactory.create(external.add(portal).add(defaultM));
2) RDFConnection - works much slower than the method above (which is no
surprise, since you said it can affect performance negatively)
And this is a remote RDFConnection? (Otherwise it should perform the same,
with the default Isolation.)
I noticed two confusing issues when working with those datasets:
Issue 1: SPARQL SELECT would produce different results
In what way different?
depending on where the named graph IRI was defined in the query (FROM
clause vs. WHERE clause):
SELECT * FROM <http://www.myGraph.com/portal> WHERE {?s ?p ?o}
behaves differently than:
SELECT * WHERE {GRAPH <http://www.myGraph.com/portal> {?s ?p ?o}}
GRAPH is correct, FROM is wrong.
Issue 2: After adding a triple using an INSERT DATA statement, the triple
was present in the graph but disappeared after closing the connection,
despite the fact that I did dataset.commit().
Complete example?
We didn't experience those issues when working with a "local" Jena TDB.
For now we will probably stick to the TDB version, but someday we will
need the multi-user functionality Fuseki offers anyway. It seems that we
will have to revise all our SPARQL queries to make them Fuseki-ready,
which means migrating from TDB to Fuseki will be more difficult for us
than migrating to Jena TDB from the triple store we used in the past,
which went very smoothly. I'm still wondering whether I'm missing
something regarding Fuseki.
Thanks,
Piotr
2017-12-20 5:40 GMT-05:00 Andy Seaborne <[email protected]>:
On 19/12/17 21:41, Piotr Nowara wrote:
Hi,
I've got a TDB-powered Java app which issues a lot of SPARQL UPDATEs and
SELECTs (most of them accessing multiple named graphs at once). My app
obtains a Jena connection using this simple API call:
this.dataset = TDBFactory.createDataset(this.storagePath);
Then this dataset object is used to run SPARQL UPDATEs and SELECTs.
I would like to replicate this solution using Jena Fuseki, but I wonder if
that's possible since the DatasetAccessor class provides only methods to
access separate named graphs. What I need is database/dataset-level
access. The Fuseki database should be persistent.
I'd be grateful for any clue or code example.
Query and update work on datasets.
RDFConnection
http://jena.apache.org/documentation/rdfconnection/
is the combined interface to both local and remote datasets, and includes
operations for GET/POST/PUT of whole datasets:
RDFConnection.connect("http://localhost:3030/myDataset")
For migration from local, note that data is copied across the network when
doing dataset operations. RDFConnection has whole-dataset operations in the
style of the SPARQL Graph Store Protocol (= DatasetAccessor) operations.
If your graphs and dataset are large, this is maybe not what you want.
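For readers unfamiliar with the Graph Store Protocol mentioned above, the following sketch shows how its request URLs address graphs: a named graph through a percent-encoded `graph` parameter, the default graph through `default`. It uses only the Java standard library and sends nothing over the network. The class and method names are hypothetical, and the endpoint `http://localhost:3030/myDataset/data` is an assumption (a dataset called myDataset exposing a service called `data`); adjust it to the actual server configuration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of Graph Store Protocol addressing; builds URLs only, sends no request.
class GraphStoreUrlSketch {

    /** URL for one named graph: <endpoint>?graph=<percent-encoded graph IRI>. */
    static URI namedGraph(String gspEndpoint, String graphIri) {
        String encoded = URLEncoder.encode(graphIri, StandardCharsets.UTF_8);
        return URI.create(gspEndpoint + "?graph=" + encoded);
    }

    /** URL for the default graph: <endpoint>?default. */
    static URI defaultGraph(String gspEndpoint) {
        return URI.create(gspEndpoint + "?default");
    }

    public static void main(String[] args) {
        // Assumed endpoint; the dataset and service names depend on the server setup.
        String endpoint = "http://localhost:3030/myDataset/data";
        System.out.println(namedGraph(endpoint, "http://www.myGraph.com/portal"));
        System.out.println(defaultGraph(endpoint));
    }
}
```

Each GET, PUT or POST against such a URL transfers one whole graph, which is why these operations become expensive for large graphs.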
Because this is across the network, the semantics of local and remote are
not identical unless you ask the local mode to do copying:
RDFConnection.connect(datasets, Isolation.COPY)
which is a good simulation of local/remote (and slower for local than
with no COPY).
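The difference between copying and non-copying access can be pictured with a small toy model in plain Java. This is not the Jena implementation: `ToyConnection`, `Mode` and `storedSize` are hypothetical names, and graphs are reduced to sets of strings. The sketch only assumes what is said above, namely that without copying a local connection hands out the stored data itself, while with copying (as with a remote server) the caller gets a detached copy.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of copy vs. no-copy isolation; graphs are sets of strings.
class IsolationSketch {

    enum Mode { NONE, COPY }

    static final class ToyConnection {
        private final Map<String, Set<String>> store = new LinkedHashMap<>();
        private final Mode mode;

        ToyConnection(Mode mode) {
            this.mode = mode;
        }

        /** Returns the stored graph itself (NONE) or a detached copy of it (COPY). */
        Set<String> fetch(String graphName) {
            Set<String> stored = store.computeIfAbsent(graphName, k -> new LinkedHashSet<>());
            return mode == Mode.COPY ? new LinkedHashSet<>(stored) : stored;
        }

        /** Number of triples actually held in the store for this graph. */
        int storedSize(String graphName) {
            Set<String> stored = store.get(graphName);
            return stored == null ? 0 : stored.size();
        }
    }

    public static void main(String[] args) {
        ToyConnection shared = new ToyConnection(Mode.NONE);
        shared.fetch("g").add(":x :p :y");   // edits the stored graph directly

        ToyConnection copying = new ToyConnection(Mode.COPY);
        copying.fetch("g").add(":x :p :y");  // edits only the caller's copy

        System.out.println("NONE stored size: " + shared.storedSize("g"));
        System.out.println("COPY stored size: " + copying.storedSize("g"));
    }
}
```

Under the copying mode, a change made to a fetched graph stays in the caller's memory until it is explicitly written back, which matches what a client of a remote server should expect.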
Andy
Thanks,
Piotr
--
Philippe Genoud
Universite Grenoble Alpes
-------------------------------------------------------
STEAMER group
Laboratoire d'Informatique de Grenoble (LIG)
Bâtiment IMAG
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
-------------------------------------------------------
postal address
LIG - Bâtiment IMAG - CS 40700 - 38058 GRENOBLE CEDEX 9
-------------------------------------------------------
Tel: (+33) (0)4 57 42 15 01