Re: Jena TDB to Fuseki migration problem: how to obtain entire dataset?

Piotr Nowara Fri, 12 Jan 2018 11:12:52 -0800

Could anyone comment on the issues described in my previous email?

Thanks,
Piotr


2018-01-02 4:54 GMT-05:00 Piotr Nowara <[email protected]>:

> Andy,
>
> thank you for all those details. Now I understand why I was not getting
> what I had expected.
>
> Got three new questions:
>
>    1. How to update the "original" remote Fuseki database with the local
>    changes made after copying the remote content?
>    2. Is the FROM clause behavior somehow configurable to always stay
>    local and not to access a remote DB?
>    3. How to access Jena Fuseki database without downloading its content
>    to a local copy at all? I'd prefer all my SPARQL queries operate on the
>    source database not on some local copy.
>
> And let me ask again my previous question:
> What is the recommended Jena setup and connection API for an app deployed
> for a small team of users (8-12) using a shared Jena database? Let's assume:
>
>    - local network only access
>    - SPARQL queries and other database processes should be executed on
>    the Fuseki server and not on a local dataset.
>    - accessing the database both on Dataset and Model level.
>    - support for multi-graph SPARQL Updates and Queries and
>    - use of an external reasoner via OntModel or InfModel (ideally
>    without the need of copying data to a local dataset)
>
>
> Wish you and Jena community all the best in 2018!
>
> Piotr
>
>
> 2017-12-23 9:44 GMT-05:00 Andy Seaborne <[email protected]>:
>
>>
>>
>> On 22/12/17 12:19, Piotr Nowara wrote:
>> ...
>> (semantics of HTTP operations being "copy")
>> ...
>>
>>>     1. Andy saying FROM clause is wrong and WHERE clause is right. Could
>>>     anyone comment on that? Why does this behave differently than in TDB,
>>>     where my FROM clause works as expected?
>>>
>>
>> "FROM" on this kind of dataset (general purpose, in-memory) tries to read
>> from the web, not take a graph from the local dataset.
>> http://www.example.com/portal does really exist and you can HTTP GET
>> from it (it has no triples).
>>
>>>     2. No useful documentation found on the topic of migrating from TDB to
>>>     Fuseki (at least I couldn't find it).
>>>     3. Maybe my use case (migrating a single user app with local triple
>>>     store to an enterprise-ready multi-user app) is not good for Fuseki
>>>     at all? But how do I know that? Your docs say Fuseki is the fit for a
>>>     multi-user environment.
>>>
>>
>> We haven't seen the details of the application, but a big difference
>> between the local and remote setups is that while locally you can use the
>> Jena API, remote data does not behave that way. Interacting with it is by
>> SPARQL (Query, Update, Graph Store Protocol), a bit like JDBC and also like
>> 3-tier architecture web applications - client, app server, database.
>>
>>>     4. So many ways to establish a Fuseki connection (different APIs:
>>>     RDFConnection, local or HTTP DatasetAccessor methods, embedded
>>>     FusekiServer mentioned in the last email) and so little info on how
>>>     and when to use them
>>>
>>
>> RDFConnection provides a uniform way to interact with local and remote
>> data.
>>
>> Embedded FusekiServer was mentioned for writing portable tests.
>>
>>     Andy
>>
>>
>>> Of course I'd like to keep using Jena if possible. I was able to migrate a
>>> complex analytical application from a commercial triple store to Jena TDB
>>> just by reading TDB-related docs. I say a "complex" app because it is
>>> using sophisticated OWL reasoning (with several chains of SWRL rules), an
>>> external reasoner (Openllet) and lots of SPARQL queries (some of them are
>>> really twisted). It takes about 300ms to complete a basic analytical
>>> process (which includes a couple of SWRL reasoning iterations), which is
>>> a very impressive result and the main reason we'd like to keep using Jena
>>> in a more enterprise-friendly scenario. But we don't know how to make the
>>> next step because inserting a simple triple and running a trivial SELECT
>>> on a single named graph seems like a big challenge now.
>>>
>>> So let me ask a simple question: what is the recommended Jena setup when
>>> migrating from a local, single-user app to an environment suitable for a
>>> small team of users (let's say 8-12) to use a shared Jena database? We
>>> assume local network only access and things that already work for us in
>>> the plain TDB mode, which are: accessing the database both on Dataset and
>>> Model level, support for multi-graph SPARQL Updates and Queries, and use
>>> of an external reasoner (via OntModel or InfModel).
>>>
>>> Thanks,
>>> Piotr
>>>
>>>
>>> 2017-12-22 5:34 GMT-05:00 Andy Seaborne <[email protected]>:
>>>
>>> Piotr,
>>>>
>>>> As ajs6f says, it is not possible to recreate your examples.  We don't
>>>> know what's in Fuseki nor how it's configured. The fact some code is
>>>> commented out is also puzzling.
>>>>
>>>> Fuseki can be run in the same process as the examples - this is very
>>>> useful for testing.
>>>>
>>>> See org.apache.jena.fuseki.embedded.FusekiServer
>>>>
>>>> eg.
>>>>
>>>> FusekiServer server = FusekiServer.create()
>>>>              .setPort(port)
>>>>              .setLoopback(true)
>>>>              .add("/ds", dataset)
>>>>              .build();
>>>> server.start();
>>>>
>>>>
>>>>
>>>> https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
>>>>>> https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60
>>>>>>
>>>>>
>>>>
>>>> dataset = DatasetFactory.create(
>>>>>
>>>> and also
>>>>
>>>>> dataset = conn.fetchDataset()
>>>>>
>>>>
>>>> This is a local, in-memory dataset.
>>>> In the conn.fetchDataset case it is copied out of the server.
>>>>
>>>> ...
>>>> dataset.begin(ReadWrite.WRITE);
>>>> executeSPARQLUpdate
>>>>
>>>> This is only updating the local copy of the dataset.
>>>>
>>>> The changes do not go back to Fuseki.
>>>> Use RDFConnection.update or UpdateExecutionFactory.createRemote.
>>>>
>>>>      Andy
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 21/12/17 14:50, ajs6f wrote:
>>>>
>>>>> In the first code example, you have commented out the line that actually
>>>>> runs an update. That may be a typo, but now we don't know what you are
>>>>> actually running.
>>>>>
>>>>> In the second, you don't actually show the query you are running after
>>>>> a commit, or how you run it.
>>>>>
>>>>> In both cases, you include a good deal of commented-out queries and
>>>>> OntModel machinery.
>>>>>
>>>>> Please, a complete and minimal example.
>>>>>
>>>>> ajs6f
>>>>>
>>>>> On Dec 21, 2017, at 5:52 AM, Piotr Nowara <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> Here are gist links to the test classes I mentioned in my previous
>>>>>> message:
>>>>>> https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
>>>>>> https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60
>>>>>>
>>>>>> Thanks,
>>>>>> Piotr
>>>>>>
>>>>>> 2017-12-21 10:38 GMT+01:00 Andy Seaborne <[email protected]>:
>>>>>>
>>>>>>> Attachments don't come through on the list.  Please use a paste or
>>>>>>> gist.
>>>>>>>
>>>>>>> I hope these examples are short and concise.  Complete, Minimal
>>>>>>> Examples please.
>>>>>>>
>>>>>>> HTML messes up structured text but:
>>>>>>>
>>>>>>>          <dependency>
>>>>>>>              <groupId>org.apache.jena</groupId>
>>>>>>>              <artifactId>apache-jena</artifactId>
>>>>>>>              <version>3.5.0</version>
>>>>>>>              <type>zip</type>
>>>>>>>          </dependency>
>>>>>>>
>>>>>>> should be:
>>>>>>>
>>>>>>>        <groupId>org.apache.jena</groupId>
>>>>>>>        <artifactId>apache-jena-libs</artifactId>
>>>>>>>        <type>pom</type>
>>>>>>>
>>>>>>> (you picked most of it up via the TDB dependency).
>>>>>>>
>>>>>>>          <dependency>
>>>>>>>              <groupId>org.apache.jena</groupId>
>>>>>>>              <artifactId>jena-csv</artifactId>
>>>>>>>              <version>3.5.0</version>
>>>>>>>              <type>jar</type>
>>>>>>>          </dependency>
>>>>>>>
>>>>>>> Is this necessary for your example?
>>>>>>>
>>>>>>>      Andy
>>>>>>>
>>>>>>> On 21/12/17 09:08, Piotr Nowara wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>>>
>>>>>>>> I'm attaching two simple Java classes which I'm using for testing
>>>>>>>> (there are some comments there describing what results I got). The
>>>>>>>> Java app and Fuseki are on the same server. The FusekiTest3 is
>>>>>>>> invoking DatasetAccessorFactory.createHTTP() (so I think this is what
>>>>>>>> you mean by "remote" implementation) and FusekiTest2 is using
>>>>>>>> RDFConnection.fetchDataset (which is the slowest operation).
>>>>>>>>
>>>>>>>> The GRAPH clause gives me expected results (returns the newly added
>>>>>>>> triple), but why should FROM be wrong?
>>>>>>>>
>>>>>>>>
>>>>>>> GRAPH accesses a named graph.
>>>>>>>
>>>>>>> FROM describes a dataset to be queried.
>>>>>>>
>>>>>>>> We use FROM clause in many of our queries and we didn't notice
>>>>>>>> anything wrong/unexpected when using TDB dataset. With Fuseki FROM
>>>>>>>> seems to return the content of the default graph and not the graph
>>>>>>>> indicated by the FROM <named-graph-IRI>.
>>>>>>>>
>>>>>>>> Both tests fail to preserve the newly added triple.
>>>>>>>>
>>>>>>>> Here are the Maven artifacts I'm using for the client app (maybe I
>>>>>>>> should download some Fuseki-specific JAR?):
>>>>>>>>
>>>>>>>>          <dependency>
>>>>>>>>              <groupId>org.apache.jena</groupId>
>>>>>>>>              <artifactId>jena-tdb</artifactId>
>>>>>>>>              <version>3.5.0</version>
>>>>>>>>              <type>jar</type>
>>>>>>>>          </dependency>
>>>>>>>>          <dependency>
>>>>>>>>              <groupId>org.apache.jena</groupId>
>>>>>>>>              <artifactId>apache-jena</artifactId>
>>>>>>>>              <version>3.5.0</version>
>>>>>>>>              <type>zip</type>
>>>>>>>>          </dependency>
>>>>>>>>          <dependency>
>>>>>>>>              <groupId>org.apache.jena</groupId>
>>>>>>>>              <artifactId>jena-csv</artifactId>
>>>>>>>>              <version>3.5.0</version>
>>>>>>>>              <type>jar</type>
>>>>>>>>          </dependency>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Piotr
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> 2017-12-20 16:52 GMT-05:00 Andy Seaborne <[email protected]>:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>      On 20/12/17 18:28, Piotr Nowara wrote:
>>>>>>>>
>>>>>>>>          Hi,
>>>>>>>>
>>>>>>>>          thanks for answering so quickly.
>>>>>>>>
>>>>>>>>          I tried two different solutions:
>>>>>>>>
>>>>>>>>          1) Merging models obtained using DatasetAccessor
>>>>>>>>
>>>>>>>>
>>>>>>>>      Which implementation of DatasetAccessor? (local or remote?)
>>>>>>>>
>>>>>>>>          Model portal = accessor.getModel("http://www.myGraph.com/portal");
>>>>>>>>          Model defaultM = accessor.getModel();
>>>>>>>>          Model external = accessor.getModel("http://www.myGraph.com/external");
>>>>>>>>          dataset = DatasetFactory.create(external.add(portal).add(defaultM));
>>>>>>>>
>>>>>>>>          2) RDFConnection - works much slower than the method above
>>>>>>>>          (which is no surprise, since you said it can affect
>>>>>>>>          performance negatively)
>>>>>>>>
>>>>>>>>
>>>>>>>>      and this is a remote RDFConnection? (otherwise it should perform
>>>>>>>>      the same, with default Isolation)
>>>>>>>>
>>>>>>>>
>>>>>>>>          I noticed two confusing issues when working with those
>>>>>>>>          datasets:
>>>>>>>>          Issue 1: SPARQL SELECT would produce different results
>>>>>>>>
>>>>>>>>
>>>>>>>>      in what way different?
>>>>>>>>
>>>>>>>>          depending on where
>>>>>>>>          the named graph IRI was defined in the query (FROM clause vs.
>>>>>>>>          WHERE clause):
>>>>>>>>          SELECT * FROM <http://www.myGraph.com/portal> WHERE {?s ?p ?o}
>>>>>>>>          behaves differently than:
>>>>>>>>          SELECT * WHERE {GRAPH <http://www.myGraph.com/portal> {?s ?p ?o}}
>>>>>>>>
>>>>>>>>
>>>>>>>>      GRAPH is correct, FROM is wrong.
>>>>>>>>
>>>>>>>>
>>>>>>>>          Issue 2: After adding a triple using an INSERT DATA statement,
>>>>>>>>          the triple was present in the graph but disappeared after
>>>>>>>>          closing the connection, despite the fact I did dataset.commit()
>>>>>>>>
>>>>>>>>
>>>>>>>>      Complete example?
>>>>>>>>
>>>>>>>>
>>>>>>>>          We didn't experience those issues when working with a "local"
>>>>>>>>          Jena TDB. For now we will probably stick to the TDB version,
>>>>>>>>          but someday we would need the multi-user functionality Fuseki
>>>>>>>>          offers anyway. It seems that we will have to revise all our
>>>>>>>>          SPARQL queries to make them Fuseki-ready, which means
>>>>>>>>          migrating from TDB to Fuseki will be more difficult for us
>>>>>>>>          than migrating from another triple-store we were using in the
>>>>>>>>          past to Jena TDB, which went very smoothly.  I'm still
>>>>>>>>          wondering whether or not I'm missing something regarding
>>>>>>>>          Fuseki.
>>>>>>>>
>>>>>>>>          Thanks,
>>>>>>>>          Piotr
>>>>>>>>
>>>>>>>>
>>>>>>>>          2017-12-20 5:40 GMT-05:00 Andy Seaborne <[email protected]>:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>              On 19/12/17 21:41, Piotr Nowara wrote:
>>>>>>>>
>>>>>>>>                  Hi,
>>>>>>>>
>>>>>>>>                  I have a TDB-powered Java app which is issuing a lot
>>>>>>>>                  of SPARQL UPDATES and SELECTS (most of them accessing
>>>>>>>>                  multiple named graphs at once). My app obtains a Jena
>>>>>>>>                  connection using this simple API call:
>>>>>>>>
>>>>>>>>                  this.dataset = TDBFactory.createDataset(this.storagePath);
>>>>>>>>
>>>>>>>>                  Then this dataset object is used to run SPARQL UPDATES
>>>>>>>>                  and SELECTS.
>>>>>>>>
>>>>>>>>                  I would like to replicate this solution using Jena
>>>>>>>>                  Fuseki but I wonder if that’s possible since the
>>>>>>>>                  DatasetAccessor class provides only methods to access
>>>>>>>>                  separate named graphs. What I need is a
>>>>>>>>                  database/dataset level access. The Fuseki database
>>>>>>>>                  should be persistent.
>>>>>>>>
>>>>>>>>                  I'd be grateful for any clue or code example.
>>>>>>>>
>>>>>>>>              Query and update work on datasets.
>>>>>>>>
>>>>>>>>              RDFConnection
>>>>>>>>              http://jena.apache.org/documentation/rdfconnection/
>>>>>>>>              is the combined interface to both local and remote
>>>>>>>>              datasets and includes some operations that do whole
>>>>>>>>              GET/POST/PUT of datasets
>>>>>>>>
>>>>>>>>              RDFConnection.connect("http://localhost:3030/myDataset")
>>>>>>>>
>>>>>>>>              For migration from local, note that data is copied across
>>>>>>>>              the network when doing dataset operations. RDFConnection
>>>>>>>>              has whole dataset operations in the style of SPARQL Graph
>>>>>>>>              Store Protocol (=DatasetAccessor) operations.
>>>>>>>>              If your graphs and dataset are large, this is maybe not
>>>>>>>>              what you want.
>>>>>>>>
>>>>>>>>              Because this is across the network, the semantics of local
>>>>>>>>              and remote are not identical unless you ask the local mode
>>>>>>>>              to do copying:
>>>>>>>>
>>>>>>>>                  RDFConnection.connect(datasets, Isolation.COPY)
>>>>>>>>
>>>>>>>>              which is a good simulation of local/remote (and slower
>>>>>>>>              for local than no COPY)
>>>>>>>>
>>>>>>>>                     Andy
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                  Thanks,
>>>>>>>>
>>>>>>>>                  Piotr
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>
>>>
>
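
The thread's central point is that a remote Fuseki dataset is reached
through the SPARQL protocol (Query, Update, Graph Store Protocol) rather
than through a local Dataset object, so changes only persist when the
update itself is sent to the server. As a rough illustration, here is a
minimal sketch that builds the two HTTP requests involved: a SPARQL Update
and a SELECT that reads a named graph with GRAPH. It uses only the JDK's
java.net.http (Java 11+) instead of Jena's RDFConnection, and it only
constructs the requests without sending them. The base URL
http://localhost:3030/ds, the /update and /query service names (commonly
Fuseki's defaults), the class name and the example triple are assumptions
for illustration, not details taken from the thread.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class RemoteSparqlSketch {
    // Assumed dataset URL; adjust to the actual Fuseki host, port and dataset name.
    static final String BASE = "http://localhost:3030/ds";

    // SPARQL 1.1 Protocol, "update via POST directly": the update text is the
    // request body. Executing this request changes the data on the server,
    // not a local copy.
    static HttpRequest updateRequest(String sparqlUpdate) {
        return HttpRequest.newBuilder(URI.create(BASE + "/update"))
                .header("Content-Type", "application/sparql-update")
                .POST(HttpRequest.BodyPublishers.ofString(sparqlUpdate, StandardCharsets.UTF_8))
                .build();
    }

    // SPARQL 1.1 Protocol, "query via GET": the query text is URL-encoded
    // into the "query" parameter.
    static HttpRequest queryRequest(String sparqlQuery) {
        String encoded = URLEncoder.encode(sparqlQuery, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE + "/query?query=" + encoded))
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical triple, inserted into the named graph used in the thread.
        HttpRequest update = updateRequest(
                "INSERT DATA { GRAPH <http://www.myGraph.com/portal> "
                + "{ <urn:example:s> <urn:example:p> <urn:example:o> } }");
        // GRAPH selects the named graph inside the server's own dataset.
        HttpRequest query = queryRequest(
                "SELECT * WHERE { GRAPH <http://www.myGraph.com/portal> { ?s ?p ?o } }");
        System.out.println(update.method() + " " + update.uri());
        System.out.println(query.method() + " " + query.uri());
    }
}
```

Sending either request would then be a matter of passing it to
HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).
With Jena on the classpath, RDFConnection's update and query operations
mentioned in the thread should cover the same ground without hand-built
HTTP.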
