On 22/12/17 12:19, Piotr Nowara wrote:
...
(semantics of HTTP operations being "copy")
...
1. Andy says the FROM clause is wrong and the WHERE clause is right. Could
anyone comment on that? Why does this behave differently than in TDB, where
my FROM clause works as expected?
"FROM" on this kind of dataset (general purpose, in-memory) tries to read
from the web, not take a graph from the local dataset.
http://www.example.com/portal does really exist and you can HTTP GET
from it (it has no triples).
2. No useful documentation found on the topic of migrating from TDB to
Fuseki (at least I couldn't find it).
3. Maybe my use case (migrating a single user app with local triple
store to an enterprise-ready multi-user app) is not good for Fuseki
at all?
But how do I know that? Your docs say Fuseki is the fit for a multi-user
environment.
We haven't seen the details of the application, but a big difference
between the local and remote setups is that locally you can use the Jena
API, while remote data does not behave that way. Interaction with it is by
SPARQL (Query, Update, Graph Store Protocol), a bit like JDBC and also like
3-tier web applications: client, app server, database.
4. So many ways to establish a Fuseki connection (different APIs:
RDFConnection, local or HTTP DatasetAccessor methods, the embedded
FusekiServer mentioned in the last email) and so little info on how and
when to use them.
RDFConnection provides a uniform way to interact with local and remote
data.
Embedded FusekiServer was mentioned for writing portable tests.
Andy
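As a rough illustration of the "uniform" point, the sketch below puts an update and a query in one method that only sees an RDFConnection. It assumes Jena 3.x with the jena-rdfconnection module on the classpath. The class name, the graph IRI http://example/portal and the triple are made up for illustration, and an in-memory dataset stands in for TDB.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

public class UniformConnectionSketch {

    // Inserts one triple into a named graph, then lists the subjects found there.
    // Only the RDFConnection interface is used, so the caller decides local vs remote.
    static List<String> insertAndList(RDFConnection conn) {
        conn.update("INSERT DATA { GRAPH <http://example/portal> "
                  + "{ <http://example/s> <http://example/p> <http://example/o> } }");
        List<String> subjects = new ArrayList<>();
        conn.querySelect(
            "SELECT ?s WHERE { GRAPH <http://example/portal> { ?s ?p ?o } }",
            row -> subjects.add(row.getResource("s").getURI()));
        return subjects;
    }

    // Local case: an in-memory transactional dataset (assumption: stands in for TDB).
    static List<String> runLocal() {
        Dataset dataset = DatasetFactory.createTxnMem();
        try (RDFConnection conn = RDFConnectionFactory.connect(dataset)) {
            return insertAndList(conn);
        }
    }

    public static void main(String[] args) {
        System.out.println(runLocal());
    }
}
```

Under the same assumptions, the remote case would pass insertAndList a connection obtained from RDFConnectionFactory.connect(url), where url is the Fuseki dataset URL, leaving the method itself unchanged.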
Of course I'd like to keep using Jena if possible. I was able to migrate a
complex analytical application from a commercial triple store to Jena TDB
just by reading TDB-related docs. I say a "complex" app because it is using
sophisticated OWL reasoning (with several chains of SWRL rules), an
external reasoner (Openllet) and lots of SPARQL queries (some of them are
really twisted). It takes about 300ms to complete a basic analytical
process (which includes a couple of SWRL reasoning iterations), which is a
very impressive result and the main reason we'd like to keep using Jena in
a more enterprise-friendly scenario. But we don't know how to take the next
step, because inserting a simple triple and running a trivial SELECT on a
single named graph seems like a big challenge now.
So let me ask a simple question: what is the recommended Jena setup when
migrating from a local, single-user app to an environment suitable for a
small team of users (let's say 8-12) sharing a Jena database? We assume
local-network-only access and the things that already work for us in plain
TDB mode: accessing the database at both the Dataset and Model level,
support for multi-graph SPARQL Updates and Queries, and use of an external
reasoner (via OntModel or InfModel).
Thanks,
Piotr
2017-12-22 5:34 GMT-05:00 Andy Seaborne <[email protected]>:
Piotr,
As ajs6f says, it is not possible to recreate your examples. We don't
know what's in Fuseki nor how it's configured. The fact that some code is
commented out is also puzzling.
Fuseki can be run in the same process as the examples - this is very
useful for testing.
See org.apache.jena.fuseki.embedded.FusekiServer
e.g.
    FusekiServer server = FusekiServer.create()
        .setPort(port)
        .setLoopback(true)
        .add("/ds", dataset)
        .build();
    server.start();
https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60
dataset = DatasetFactory.create(
and also
dataset = conn.fetchDataset()
This is a local, in-memory dataset.
In the conn.fetchDataset case it is copied out of the server.
...
dataset.begin(ReadWrite.WRITE);
executeSPARQLUpdate
This is only updating the local copy of the dataset.
The changes do not go back to Fuseki.
Use RDFConnection.update or UpdateExecutionFactory.createRemote.
Andy
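The difference between updating a fetched copy and updating through the connection can be reproduced without a server. The sketch below is a minimal illustration, assuming Jena 3.x with jena-rdfconnection on the classpath. It uses a local connection with Isolation.COPY as a stand-in for the remote case. The class name, graph IRI and triple are hypothetical.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdfconnection.Isolation;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;
import org.apache.jena.update.UpdateAction;

public class CopySemanticsSketch {

    static final String INSERT =
        "INSERT DATA { GRAPH <http://example/g> { <http://example/s> <http://example/p> 'v' } }";
    static final String ASK =
        "ASK { GRAPH <http://example/g> { <http://example/s> <http://example/p> 'v' } }";

    // Applies the update to the dataset returned by fetchDataset(), i.e. to a copy,
    // then asks the connection whether the shared dataset contains the triple.
    static boolean updateFetchedCopy() {
        Dataset shared = DatasetFactory.createTxnMem();   // plays the role of the server
        try (RDFConnection conn = RDFConnectionFactory.connect(shared, Isolation.COPY)) {
            Dataset copy = conn.fetchDataset();
            UpdateAction.parseExecute(INSERT, copy);      // changes the copy only
            return conn.queryAsk(ASK);
        }
    }

    // Sends the update through the connection, so it reaches the shared dataset.
    static boolean updateThroughConnection() {
        Dataset shared = DatasetFactory.createTxnMem();
        try (RDFConnection conn = RDFConnectionFactory.connect(shared, Isolation.COPY)) {
            conn.update(INSERT);
            return conn.queryAsk(ASK);
        }
    }

    public static void main(String[] args) {
        System.out.println("after updating the fetched copy: " + updateFetchedCopy());
        System.out.println("after conn.update:               " + updateThroughConnection());
    }
}
```

The first method should report that the triple is absent and the second that it is present, which mirrors the lost INSERT DATA described in this thread.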
On 21/12/17 14:50, ajs6f wrote:
In the first code example, you have commented out the line that actually
runs an update. That may be a typo, but now we don't know what you are
actually running.
In the second, you don't actually show the query you are running after a
commit, or how you run it.
In both cases, you include a great deal of commented-out queries and
OntModel machinery.
Please, a complete and minimal example.
ajs6f
On Dec 21, 2017, at 5:52 AM, Piotr Nowara <[email protected]>
wrote:
Here are gist links to the test classes I mentioned in my previous
message:
https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60
Thanks,
Piotr
2017-12-21 10:38 GMT+01:00 Andy Seaborne <[email protected]>:
Attachments don't come through on the list. Please use a paste or gist.
I hope these examples are short and concise. Complete, Minimal Examples
please.
HTML messes up structured text but:
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>apache-jena</artifactId>
<version>3.5.0</version>
<type>zip</type>
</dependency>
should be:
<groupId>org.apache.jena</groupId>
<artifactId>apache-jena-libs</artifactId>
<type>pom</type>
(you picked most of it up via the TDB dependency).
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>jena-csv</artifactId>
<version>3.5.0</version>
<type>jar</type>
</dependency>
Is this necessary for your example?
Andy
On 21/12/17 09:08, Piotr Nowara wrote:
Hi,
I'm attaching two simple Java classes which I'm using for testing (there
are some comments there describing what results I got). The Java app and
Fuseki are on the same server. FusekiTest3 invokes
DatasetAccessorFactory.createHTTP() (so I think this is what you mean by
the "remote" implementation) and FusekiTest2 uses
RDFConnection.fetchDataset (which is the slowest operation).
The GRAPH clause gives me the expected results (it returns the newly added
triple), but why should FROM be wrong?
GRAPH accesses a named graph.
FROM describes a dataset to be queried.
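A small sketch of the named-graph side of this, assuming Jena 3.x with jena-rdfconnection and an in-memory dataset: a triple inserted into a named graph is visible through a GRAPH pattern but not through a plain pattern over the default graph. It does not exercise FROM at all. The class name, graph IRI and triple are hypothetical.

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.rdfconnection.RDFConnectionFactory;

public class GraphVsDefaultSketch {

    // Counts the rows returned by a SELECT query.
    static int count(RDFConnection conn, String query) {
        int[] n = {0};
        conn.querySelect(query, row -> n[0]++);
        return n[0];
    }

    // Returns { rows matched in the default graph, rows matched in the named graph }.
    static int[] counts() {
        Dataset dataset = DatasetFactory.createTxnMem();
        try (RDFConnection conn = RDFConnectionFactory.connect(dataset)) {
            conn.update("INSERT DATA { GRAPH <http://example/portal> "
                      + "{ <http://example/s> <http://example/p> 'v' } }");
            int inDefault = count(conn, "SELECT * WHERE { ?s ?p ?o }");
            int inNamed = count(conn,
                "SELECT * WHERE { GRAPH <http://example/portal> { ?s ?p ?o } }");
            return new int[] { inDefault, inNamed };
        }
    }

    public static void main(String[] args) {
        int[] c = counts();
        System.out.println("default graph rows: " + c[0] + ", named graph rows: " + c[1]);
    }
}
```

With this in-memory dataset the default graph is separate from the named graphs, so the first count should be 0 and the second 1.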
We use the FROM clause in many of our queries and we didn't notice anything
wrong or unexpected when using a TDB dataset. With Fuseki, FROM seems to
return the content of the default graph and not the graph indicated by
FROM <named-graph-IRI>.
Both tests fail to preserve the newly added triple.
Here are the maven artifacts I'm using for the client app (maybe I should
download some Fuseki specific JAR?):
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>jena-tdb</artifactId>
<version>3.5.0</version>
<type>jar</type>
</dependency>
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>apache-jena</artifactId>
<version>3.5.0</version>
<type>zip</type>
</dependency>
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>jena-csv</artifactId>
<version>3.5.0</version>
<type>jar</type>
</dependency>
Thanks,
Piotr
2017-12-20 16:52 GMT-05:00 Andy Seaborne <[email protected]>:
On 20/12/17 18:28, Piotr Nowara wrote:
Hi,
thanks for answering so quickly.
I tried two different solutions:
1) Merging models obtained using DatasetAccessor
Which implementation of DatasetAccessor? (local or remote?)
Model portal = accessor.getModel("http://www.myGraph.com/portal");
Model defaultM = accessor.getModel();
Model external = accessor.getModel("http://www.myGraph.com/external");
dataset = DatasetFactory.create(external.add(portal).add(defaultM));
2) RDFConnection - works much slower than the method above (which is no
surprise since you said it can affect the performance negatively)
and this is a remote RDFConnection? (otherwise it should perform the same,
with default Isolation)
I noticed two confusing issues when working with those datasets:
Issue 1: SPARQL SELECT would produce different results
in what way different?
depending on where the named graph IRI was defined in the query (FROM
clause vs. WHERE clause):
SELECT * FROM <http://www.myGraph.com/portal> WHERE {?s ?p ?o}
behaves differently than:
SELECT * WHERE {GRAPH <http://www.myGraph.com/portal> {?s ?p ?o}}
GRAPH is correct, FROM is wrong.
Issue 2: After adding a triple using an INSERT DATA statement, the triple
was present in the graph but disappeared after closing the connection,
despite the fact I did dataset.commit()
Complete example?
We didn't experience those issues when working with a "local" Jena TDB.
For now we will probably stick to the TDB version, but someday we will
need the multi-user functionality Fuseki offers anyway. It seems that we
will have to revise all our SPARQL queries to make them Fuseki-ready, which
means migrating from TDB to Fuseki will be more difficult for us than
migrating from the other triple store we used in the past to Jena TDB,
which went very smoothly. I'm still wondering whether or not I'm missing
something regarding Fuseki.
Thanks,
Piotr
2017-12-20 5:40 GMT-05:00 Andy Seaborne <[email protected]>:
On 19/12/17 21:41, Piotr Nowara wrote:
Hi,
I have a TDB-powered Java app which issues a lot of SPARQL UPDATES and
SELECTS (most of them accessing multiple named graphs at once). My app
obtains a Jena connection using this simple API call:
this.dataset = TDBFactory.createDataset(this.storagePath);
Then this dataset object is used to run SPARQL UPDATES and SELECTS.
I would like to replicate this solution using Jena Fuseki but I wonder if
that's possible, since the DatasetAccessor class provides only methods to
access separate named graphs. What I need is database/dataset-level
access. The Fuseki database should be persistent.
I'd be grateful for any clue or code example.
Query and update work on datasets.
RDFConnection
http://jena.apache.org/documentation/rdfconnection/
is the combined interface to both local and remote datasets and includes
some operations that involve whole GET/POST/PUT of datasets.
RDFConnection.connect("http://localhost:3030/myDataset")
For migration from local, note that data is copied across the network when
doing dataset operations. RDFConnection has whole-dataset operations in the
style of SPARQL Graph Store Protocol (=DatasetAccessor) operations.
If your graphs and dataset are large, this is maybe not what you want.
Because this is across the network, the semantics of local and remote are
not identical unless you ask the local mode to do copying:
RDFConnection.connect(datasets, Isolation.COPY)
which is a good simulation of remote behaviour when running locally (and
slower for local than no COPY)
Andy
Thanks,
Piotr