Re: Jena TDB to Fuseki migration problem: how to obtain entire dataset?

Piotr Nowara Tue, 02 Jan 2018 01:55:28 -0800

Andy,

thank you for all those details. Now I understand why I was not getting
what I had expected.


I have three new questions:

   1. How do I update the "original" remote Fuseki database with the local
   changes made after copying the remote content?
   2. Is the FROM clause behavior somehow configurable so that it always stays
   local and never accesses a remote DB?
   3. How do I access a Jena Fuseki database without downloading its content
   to a local copy at all? I'd prefer all my SPARQL queries to operate on the
   source database, not on some local copy.
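
A minimal sketch for questions 1 and 3, assuming Fuseki's usual service
layout in which a dataset named /myDataset exposes an update endpoint at
/myDataset/update. The host, port, dataset name and example triple are
assumptions, not details of the real setup. Instead of the Jena client
classes it uses plain Java 11 HTTP to show the idea behind a remote update:
the SPARQL Update text is POSTed to the server and executed there, so no
local copy is involved. The request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class RemoteUpdateSketch {

    // Wraps the given triples in an INSERT DATA that targets one named graph.
    public static String insertIntoGraph(String graphIri, String triples) {
        return "INSERT DATA { GRAPH <" + graphIri + "> { " + triples + " } }";
    }

    // Builds (does not send) a SPARQL 1.1 Protocol "update via POST directly"
    // request: the update text is the request body.
    public static HttpRequest updateRequest(String updateEndpoint, String sparqlUpdate) {
        return HttpRequest.newBuilder(URI.create(updateEndpoint))
                .header("Content-Type", "application/sparql-update; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(sparqlUpdate, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint and triple, for illustration only.
        String update = insertIntoGraph("http://www.myGraph.com/portal",
                "<http://example.org/s> <http://example.org/p> \"o\" .");
        HttpRequest request = updateRequest("http://localhost:3030/myDataset/update", update);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(update);
    }
}
```

To actually send it, pass the request to
HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding()).
RDFConnection.update, which Andy mentions further down, is the Jena-level
way to have an update run on the server rather than on a fetched copy.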

And let me ask again my previous question:
What is the recommended Jena setup and connection API for an app deployed
for a small team of users (8-12) using a shared Jena database? Let's assume:

   - local network only access
   - SPARQL queries and other database processes should be executed on the
   Fuseki server and not on a local dataset.
   - accessing the database both on Dataset and Model level.
   - support for multi-graph SPARQL Updates and Queries and
   - use of an external reasoner via OntModel or InfModel (ideally without
   the need to copy data to a local dataset)
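
To illustrate the "executed on the Fuseki server" and "multi-graph" points,
here is a small sketch of a query that never touches a local dataset. It
addresses the named graphs with GRAPH patterns (the form that works against
the server in this thread) and builds a SPARQL Protocol GET request for the
query endpoint. The endpoint URL http://localhost:3030/myDataset/query is an
assumption based on Fuseki's usual service naming; the graph IRIs are the
ones used in my tests. Plain Java 11 HTTP is used here instead of the Jena
client API, and the request is only built, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.stream.Collectors;

class RemoteQuerySketch {

    // Builds a SELECT that reads all triples from each named graph,
    // one GRAPH pattern per graph, combined with UNION.
    public static String selectAllFromGraphs(String... graphIris) {
        String branches = Arrays.stream(graphIris)
                .map(iri -> "{ GRAPH <" + iri + "> { ?s ?p ?o } }")
                .collect(Collectors.joining(" UNION "));
        return "SELECT * WHERE { " + branches + " }";
    }

    // Builds (does not send) a SPARQL Protocol "query via GET" request:
    // the query text travels URL-encoded in the "query" parameter.
    public static HttpRequest queryRequest(String queryEndpoint, String sparqlQuery) {
        String encoded = URLEncoder.encode(sparqlQuery, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(queryEndpoint + "?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String query = selectAllFromGraphs(
                "http://www.myGraph.com/portal",
                "http://www.myGraph.com/external");
        // Hypothetical endpoint, for illustration only.
        HttpRequest request = queryRequest("http://localhost:3030/myDataset/query", query);
        System.out.println(query);
        System.out.println(request.uri());
    }
}
```

Sending the request with java.net.http.HttpClient would return the result
set as JSON, computed entirely on the server.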


Wishing you and the Jena community all the best in 2018!

Piotr


2017-12-23 9:44 GMT-05:00 Andy Seaborne <[email protected]>:

>
>
> On 22/12/17 12:19, Piotr Nowara wrote:
> ...
> (semantics of HTTP operations being "copy")
> ...
>
>>     1. Andy saying FROM clause is wrong and WHERE clause is right. Could
>>     anyone comment on that? Why does this behave differently than in TDB,
>>     where my FROM clause works as expected?
>>
>
> "FROM" on this kind of dataset (general purpose, in-memory) tries to read
> from the web, not take a graph from the local dataset.
> http://www.example.com/portal does really exist and you can HTTP GET from
> it (it has no triples).
>
>>     2. No useful documentation found on the topic of migrating from TDB to
>>     Fuseki (at least I couldn't find it).
>>     3. Maybe my use case (migrating a single user app with local triple
>>     store to an enterprise-ready multi-user app) is not good for Fuseki
>>     at all? But how do I know that? Your docs say Fuseki is a fit for a
>>     multi-user environment.
>>
>>
>
> We haven't seen the details of the application but a big difference between
> the local and remote setups is that while locally, you can use the Jena
> API, remote data does not behave that way. Interacting with it is by SPARQL
> (Query, Update, Graph Store Protocol) a bit like JDBC and also like 3-tier
> architecture web applications - client, app server, database.
>
>>     4. So many ways to establish a Fuseki connection (different APIs:
>>     RDFConnection, local or HTTP DatasetAccessor methods, embedded
>>     FusekiServer mentioned in the last email) and so little info on how
>>     and when to use them
>>
>
> RDFConnection provides a uniform way to interact with local and remote
> data.
>
> Embedded FusekiServer was mentioned for writing portable tests.
>
>     Andy
>
>
>> Of course I'd like to keep using Jena if possible. I was able to migrate a
>> complex analytical application from a commercial triple store to Jena TDB
>> just by reading TDB-related docs. I say a "complex" app because it is
>> using sophisticated OWL reasoning (with several chains of SWRL rules), an
>> external reasoner (Openllet) and lots of SPARQL queries (some of them are
>> really twisted). It takes about 300ms to complete a basic analytical
>> process (which includes a couple of SWRL reasoning iterations), which is
>> a very impressive result and the main reason we'd like to keep using Jena
>> in a more enterprise-friendly scenario. But we don't know how to make the
>> next step because inserting a simple triple and running a trivial SELECT
>> on a single named graph seems like a big challenge now.
>>
>> So let me ask a simple question: what is the recommended Jena setup when
>> migrating from a local, single-user app to an environment suitable for a
>> small team of users (let's say 8-12) to use a shared Jena database? We
>> assume local network only access and things that already work for us in
>> the plain TDB mode, which are: accessing the database both on Dataset and
>> Model level, support for multi-graph SPARQL Updates and Queries and use of
>> an external reasoner (via OntModel or InfModel).
>>
>> Thanks,
>> Piotr
>>
>>
>> 2017-12-22 5:34 GMT-05:00 Andy Seaborne <[email protected]>:
>>
>> Piotr,
>>>
>>> As ajs6f says, it is not possible to recreate your examples.  We don't
>>> know what's in Fuseki nor how it's configured. The fact some code is
>>> commented out is also puzzling.
>>>
>>> Fuseki can be run in the same process as the examples - this is very
>>> useful for testing.
>>>
>>> See org.apache.jena.fuseki.embedded.FusekiServer
>>>
>>> eg.
>>>
>>> FusekiServer server= FusekiServer.create()
>>>              .setPort(port)
>>>              .setLoopback(true)
>>>              .add("/ds", dataset)
>>>              .build();
>>> server.start();
>>>
>>>
>>>
>>> https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
>>>>> https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60
>>>>>
>>>>
>>>
>>> dataset = DatasetFactory.create(
>>>>
>>> and also
>>>
>>>> dataset = conn.fetchDataset()
>>>>
>>>
>>> This is a local, in-memory dataset.
>>> In the conn.fetchDataset case it is copied out of the server.
>>>
>>> ...
>>> dataset.begin(ReadWrite.WRITE);
>>> executeSPARQLUpdate
>>>
>>> This is only updating the local copy of the dataset.
>>>
>>> The changes do not go back to Fuseki.
>>> Use RDFConnection.update or UpdateExecutionFactory.createRemote.
>>>
>>>      Andy
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 21/12/17 14:50, ajs6f wrote:
>>>
>>> In the first code example, you have commented out the line that actually
>>>> runs an update. That may be a typo, but now we don't know what you are
>>>> actually running.
>>>>
>>>> In the second, you don't actually show the query you are running after a
>>>> commit, or how you run it.
>>>>
>>>> In both cases, you include a deal of commented-out queries and OntModel
>>>> machinery.
>>>>
>>>> Please, a complete and minimal example.
>>>>
>>>> ajs6f
>>>>
>>>> On Dec 21, 2017, at 5:52 AM, Piotr Nowara <[email protected]>
>>>> wrote:
>>>>
>>>>>
>>>>> Here are gist links to the test classes I mentioned in my previous
>>>>> message:
>>>>> https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
>>>>> https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60
>>>>>
>>>>> Thanks,
>>>>> Piotr
>>>>>
>>>>> 2017-12-21 10:38 GMT+01:00 Andy Seaborne <[email protected]>:
>>>>>
>>>>> Attachments don't come through on the list.  Please use a paste or
>>>>> gist.
>>>>>
>>>>>> I hope these examples are short and concise.  Complete, Minimal
>>>>>> Examples
>>>>>> please.
>>>>>>
>>>>>> HTML messes up structured text but:
>>>>>>
>>>>>>          <dependency>
>>>>>>              <groupId>org.apache.jena</groupId>
>>>>>>              <artifactId>apache-jena</artifactId>
>>>>>>              <version>3.5.0</version>
>>>>>>              <type>zip</type>
>>>>>>          </dependency>
>>>>>>
>>>>>> should be:
>>>>>>
>>>>>>        <groupId>org.apache.jena</groupId>
>>>>>>        <artifactId>apache-jena-libs</artifactId>
>>>>>>        <type>pom</type>
>>>>>>
>>>>>> (you picked most of it up via the TDB dependency).
>>>>>>
>>>>>>          <dependency>
>>>>>>              <groupId>org.apache.jena</groupId>
>>>>>>              <artifactId>jena-csv</artifactId>
>>>>>>              <version>3.5.0</version>
>>>>>>              <type>jar</type>
>>>>>>          </dependency>
>>>>>>
>>>>>> Is this necessary for your example?
>>>>>>
>>>>>>      Andy
>>>>>>
>>>>>> On 21/12/17 09:08, Piotr Nowara wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>>
>>>>>>> I'm attaching two simple JAVA classes which I'm using for testing
>>>>>>> (there
>>>>>>> are some comments there describing what results I got). The JAVA app
>>>>>>> and
>>>>>>> Fuseki are on the same server. The FusekiTest3 is invoking
>>>>>>> DatasetAccessorFactory.createHTTP() (so I think this is what you
>>>>>>> mean
>>>>>>> by
>>>>>>> "remote" implementation) and FusekiTest2 is using
>>>>>>> RDFConnection.fetchDataset (which is the slowest operation).
>>>>>>>
>>>>>>> The GRAPH clause gives me expected results (returns the newly added
>>>>>>> triple), but why FROM should be wrong?
>>>>>>>
>>>>>>>
>>>>>>> GRAPH accesses a named graph.
>>>>>>
>>>>>> FROM describes a dataset to be queried.
>>>>>>
>>>>>> We use FROM clause in many of our queries and we didn't notice
>>>>>> anything
>>>>>>
>>>>>> wrong/unexpected when using TDB dataset. With Fuseki FROM seems to
>>>>>>> return
>>>>>>> the content of the default graph and not the graph indicated by the
>>>>>>> FROM
>>>>>>> <named-graph-IRI>.
>>>>>>>
>>>>>>> Both tests fail to preserve the newly added triple.
>>>>>>>
>>>>>>> Here are the maven artifacts I'm using for the client app (maybe I
>>>>>>> should
>>>>>>> download some Fuseki specific JAR?):
>>>>>>>
>>>>>>>       <dependency>
>>>>>>>           <groupId>org.apache.jena</groupId>
>>>>>>>           <artifactId>jena-tdb</artifactId>
>>>>>>>           <version>3.5.0</version>
>>>>>>>           <type>jar</type>
>>>>>>>       </dependency>
>>>>>>>       <dependency>
>>>>>>>           <groupId>org.apache.jena</groupId>
>>>>>>>           <artifactId>apache-jena</artifactId>
>>>>>>>           <version>3.5.0</version>
>>>>>>>           <type>zip</type>
>>>>>>>       </dependency>
>>>>>>>       <dependency>
>>>>>>>           <groupId>org.apache.jena</groupId>
>>>>>>>           <artifactId>jena-csv</artifactId>
>>>>>>>           <version>3.5.0</version>
>>>>>>>           <type>jar</type>
>>>>>>>       </dependency>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Piotr
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2017-12-20 16:52 GMT-05:00 Andy Seaborne <[email protected]>:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>      On 20/12/17 18:28, Piotr Nowara wrote:
>>>>>>>
>>>>>>>          Hi,
>>>>>>>
>>>>>>>          thanks for answering so quickly.
>>>>>>>
>>>>>>>          I tried two different solutions:
>>>>>>>
>>>>>>>          1) Merging models obtained using DatasetAccessor
>>>>>>>
>>>>>>>
>>>>>>>      Which implementation of DatasetAccessor? (local or remote?)
>>>>>>>
>>>>>>>          Model portal = accessor.getModel("http://www.myGraph.com/portal");
>>>>>>>          Model defaultM = accessor.getModel();
>>>>>>>          Model external = accessor.getModel("http://www.myGraph.com/external");
>>>>>>>          dataset = DatasetFactory.create(external.add(portal).add(defaultM));
>>>>>>>
>>>>>>>          2) RDFConnection - works much slower than the method above
>>>>>>>          (which is not a surprise since you said it can affect the
>>>>>>>          performance negatively)
>>>>>>>
>>>>>>>
>>>>>>>      and this is a remote RDFConnection? (otherwise it should
>>>>>>> perform,
>>>>>>>      with default Isolation, the same)
>>>>>>>
>>>>>>>
>>>>>>>          I noticed two confusing issues when working with those
>>>>>>>          datasets:
>>>>>>>          Issue 1: SPARQL SELECT would produce different results
>>>>>>>
>>>>>>>
>>>>>>>      in what way different?
>>>>>>>
>>>>>>>          depending on where
>>>>>>>          the named graph IRI was defined in the query (FROM clause vs.
>>>>>>>          WHERE clause):
>>>>>>>          SELECT * FROM <http://www.myGraph.com/portal> WHERE {?s ?p ?o}
>>>>>>>          behaves differently than:
>>>>>>>          SELECT * WHERE {GRAPH <http://www.myGraph.com/portal> {?s ?p ?o}}
>>>>>>>
>>>>>>>
>>>>>>>      GRAPH is correct, FROM is wrong.
>>>>>>>
>>>>>>>
>>>>>>>          Issue 2: After adding a triple using an INSERT DATA statement
>>>>>>>          the triple was present in the graph but disappeared after
>>>>>>>          closing the connection, despite the fact that I did
>>>>>>>          dataset.commit()
>>>>>>>
>>>>>>>
>>>>>>>      Complete example?
>>>>>>>
>>>>>>>
>>>>>>>          We didn't experience those issues when working with a "local"
>>>>>>>          Jena TDB. For now we will probably stick to the TDB version,
>>>>>>>          but someday we would need the multi-user functionality Fuseki
>>>>>>>          offers anyway. It seems that we will have to revise all our
>>>>>>>          SPARQL queries to make them Fuseki-ready, which means
>>>>>>>          migrating from TDB to Fuseki will be more difficult for us
>>>>>>>          than migrating from another triple-store we were using in the
>>>>>>>          past to Jena TDB, which went very smoothly.  I'm still
>>>>>>>          wondering whether or not I'm missing something regarding
>>>>>>>          Fuseki.
>>>>>>>
>>>>>>>          Thanks,
>>>>>>>          Piotr
>>>>>>>
>>>>>>>
>>>>>>>          2017-12-20 5:40 GMT-05:00 Andy Seaborne <[email protected]>:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>              On 19/12/17 21:41, Piotr Nowara wrote:
>>>>>>>
>>>>>>>                  Hi,
>>>>>>>
>>>>>>>                  I have a TDB-powered Java app which is issuing a lot
>>>>>>>                  of SPARQL UPDATES and SELECTS (most of them accessing
>>>>>>>                  multiple named graphs at once). My app obtains a Jena
>>>>>>>                  connection using this simple API call:
>>>>>>>
>>>>>>>                  this.dataset = TDBFactory.createDataset(this.storagePath);
>>>>>>>
>>>>>>>                  Then this dataset object is used to run SPARQL UPDATES
>>>>>>>                  and SELECTS.
>>>>>>>
>>>>>>>                  I would like to replicate this solution using Jena
>>>>>>>                  Fuseki but I wonder if that’s possible since the
>>>>>>>                  DatasetAccessor class provides only methods to access
>>>>>>>                  separate named graphs. What I need is database/dataset
>>>>>>>                  level access. The Fuseki database should be persistent.
>>>>>>>
>>>>>>>                  I'd be grateful for any clue or code example.
>>>>>>>
>>>>>>>              Query and update work on datasets.
>>>>>>>
>>>>>>>              RDFConnection
>>>>>>>              http://jena.apache.org/documentation/rdfconnection/
>>>>>>>              is the combined interface to both local and remote datasets
>>>>>>>              and includes some operations that do whole GET/POST/PUT of
>>>>>>>              datasets
>>>>>>>
>>>>>>>              RDFConnection.connect("http://localhost:3030/myDataset")
>>>>>>>
>>>>>>>              For migration from local, note that data is copied across
>>>>>>>              the network when doing dataset operations. RDFConnection
>>>>>>>              has whole dataset operations in the style of SPARQL Graph
>>>>>>>              Store Protocol (=DatasetAccessor) operations.
>>>>>>>              If your graphs and dataset are large, this is maybe not
>>>>>>>              what you want.
>>>>>>>
>>>>>>>              Because this is across the network, the semantics of local
>>>>>>>              and remote are not identical unless you ask the local mode
>>>>>>>              to do copying:
>>>>>>>
>>>>>>>                  RDFConnection.connect(datasets, Isolation.COPY)
>>>>>>>
>>>>>>>              which is a good simulation of local/remote (and slower
>>>>>>>              for local than no COPY)
>>>>>>>
>>>>>>>                     Andy
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>                  Thanks,
>>>>>>>
>>>>>>>                  Piotr
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>
>>
