Could anyone comment on issues described in my previous email?

Thanks,
Piotr

2018-01-02 4:54 GMT-05:00 Piotr Nowara <[email protected]>:

> Andy,
>
> thank you for all those details. Now I understand why I was not getting
> what I had expected.
>
> I have three new questions:
>
> 1. How do I update the "original" remote Fuseki database with the local
>    changes made after copying the remote content?
> 2. Is the FROM clause behavior somehow configurable, so that it always
>    stays local and does not access a remote DB?
> 3. How do I access a Jena Fuseki database without downloading its content
>    to a local copy at all? I'd prefer all my SPARQL queries to operate on
>    the source database, not on some local copy.
>
> And let me ask my previous question again:
> What is the recommended Jena setup and connection API for an app deployed
> for a small team of users (8-12) using a shared Jena database? Let's assume:
>
> - local network only access
> - SPARQL queries and other database processes should be executed on
>   the Fuseki server and not on a local dataset
> - accessing the database at both the Dataset and Model level
> - support for multi-graph SPARQL Updates and Queries
> - use of an external reasoner via OntModel or InfModel (ideally
>   without the need to copy data to a local dataset)
>
> Wish you and the Jena community all the best in 2018!
>
> Piotr
>
> 2017-12-23 9:44 GMT-05:00 Andy Seaborne <[email protected]>:
>
>> On 22/12/17 12:19, Piotr Nowara wrote:
>> ...
>> (semantics of HTTP operations being "copy")
>> ...
>>
>>> 1. Andy saying FROM clause is wrong and WHERE clause is right. Could
>>>    anyone comment on that? Why does this behave differently than in TDB,
>>>    where my FROM clause works as expected?
>>
>> "FROM" on this kind of dataset (general purpose, in-memory) tries to read
>> from the web, not take a graph from the local dataset.
>> http://www.example.com/portal really does exist and you can HTTP GET
>> from it (it has no triples).
>>
>>> 2.
>>>    No useful documentation found on the topic of migrating from TDB to
>>>    Fuseki (at least I couldn't find it).
>>> 3. Maybe my use case (migrating a single-user app with a local triple
>>>    store to an enterprise-ready multi-user app) is not a good fit for
>>>    Fuseki at all? But how do I know that? Your docs say Fuseki is the
>>>    fit for a multi-user environment.
>>
>> We haven't seen the details of the application, but a big difference
>> between the local and remote setups is that while locally you can use the
>> Jena API, remote data does not behave that way. Interacting with it is by
>> SPARQL (Query, Update, Graph Store Protocol), a bit like JDBC and also like
>> 3-tier architecture web applications - client, app server, database.
>>
>>> 4. So many ways to establish a Fuseki connection (different APIs:
>>>    RDFConnection, local or HTTP DatasetAccessor methods, the embedded
>>>    FusekiServer mentioned in the last email) and so little info on how
>>>    and when to use them.
>>
>> RDFConnection provides a uniform way to interact with local and remote
>> data.
>>
>> Embedded FusekiServer was mentioned for writing portable tests.
>>
>> Andy
>>
>>> Of course I'd like to keep using Jena if possible. I was able to migrate a
>>> complex analytical application from a commercial triple store to Jena TDB
>>> just by reading the TDB-related docs. I say a "complex" app because it
>>> uses sophisticated OWL reasoning (with several chains of SWRL rules), an
>>> external reasoner (Openllet) and lots of SPARQL queries (some of them are
>>> really twisted). It takes about 300 ms to complete a basic analytical
>>> process (which includes a couple of SWRL reasoning iterations), which is
>>> a very impressive result and the main reason we'd like to keep using Jena
>>> in a more enterprise-friendly scenario. But we don't know how to take the
>>> next step, because inserting a simple triple and running a trivial SELECT
>>> on a single named graph seems like a big challenge now.
>>>
>>> So let me ask a simple question: what is the recommended Jena setup when
>>> migrating from a local, single-user app to an environment suitable for a
>>> small team of users (let's say 8-12) to use a shared Jena database? We
>>> assume local-network-only access and the things that already work for us
>>> in plain TDB mode: accessing the database at both the Dataset and Model
>>> level, support for multi-graph SPARQL Updates and Queries, and use of an
>>> external reasoner (via OntModel or InfModel).
>>>
>>> Thanks,
>>> Piotr
>>>
>>> 2017-12-22 5:34 GMT-05:00 Andy Seaborne <[email protected]>:
>>>
>>>> Piotr,
>>>>
>>>> As ajs6f says, it is not possible to recreate your examples. We don't
>>>> know what's in Fuseki nor how it's configured. The fact that some code is
>>>> commented out is also puzzling.
>>>>
>>>> Fuseki can be run in the same process as the examples - this is very
>>>> useful for testing.
>>>>
>>>> See org.apache.jena.fuseki.embedded.FusekiServer
>>>>
>>>> e.g.
>>>>
>>>>     FusekiServer server = FusekiServer.create()
>>>>         .setPort(port)
>>>>         .setLoopback(true)
>>>>         .add("/ds", dataset)
>>>>         .build();
>>>>     server.start();
>>>>
>>>>> https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
>>>>> https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60
>>>>
>>>>> dataset = DatasetFactory.create(
>>>>
>>>> and also
>>>>
>>>>> dataset = conn.fetchDataset()
>>>>
>>>> This is a local, in-memory dataset.
>>>> In the conn.fetchDataset case it is copied out of the server.
>>>>
>>>>     ...
>>>>     dataset.begin(ReadWrite.WRITE);
>>>>     executeSPARQLUpdate
>>>>
>>>> This is only updating the local copy of the dataset.
>>>>
>>>> The changes do not go back to Fuseki.
>>>> Use RDFConnection.update or UpdateExecutionFactory.createRemote.
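
The point made just above (a fetched dataset is a local copy, so an update has to be sent to the server to take effect there) can be illustrated at the protocol level. What follows is only a minimal sketch using the JDK's HTTP client rather than Jena, and it rests on assumptions that are not in the thread: the endpoint `http://localhost:3030/ds/update` (a dataset named "ds" with Fuseki's conventional "update" service), the `urn:example:` triple, and the helper class `RemoteUpdateSketch` with its method names are all hypothetical. With Jena on the classpath, the `RDFConnection.update` call named above should be the more convenient route to the same result.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; it is not part of Jena.
class RemoteUpdateSketch {

    // Assumption: Fuseki on its default port, a dataset named "ds",
    // and the conventional "update" service name.
    static final String UPDATE_ENDPOINT = "http://localhost:3030/ds/update";

    // The SPARQL 1.1 Protocol allows an update to be POSTed as a
    // URL-encoded form field named "update".
    static String formBody(String sparqlUpdate) {
        return "update=" + URLEncoder.encode(sparqlUpdate, StandardCharsets.UTF_8);
    }

    // Builds the POST request that carries the update; nothing is sent here.
    static HttpRequest buildUpdateRequest(String endpoint, String sparqlUpdate) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(sparqlUpdate)))
                .build();
    }

    public static void main(String[] args) {
        // Example triple (made up) inserted into the named graph from the thread.
        String update = "INSERT DATA { GRAPH <http://www.myGraph.com/portal> "
                + "{ <urn:example:s> <urn:example:p> \"o\" } }";
        HttpRequest request = buildUpdateRequest(UPDATE_ENDPOINT, update);
        System.out.println(request.method() + " " + request.uri());
        // To execute it against a running server, one could call:
        // java.net.http.HttpClient.newHttpClient()
        //     .send(request, java.net.http.HttpResponse.BodyHandlers.ofString());
    }
}
```

Because the update is executed by the server, the inserted triple ends up in the server's dataset and no local copy is involved; a later query sent to the server with GRAPH <http://www.myGraph.com/portal> would be expected to see it.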
>>>>
>>>> Andy
>>>>
>>>> On 21/12/17 14:50, ajs6f wrote:
>>>>
>>>>> In the first code example, you have commented out the line that actually
>>>>> runs an update. That may be a typo, but now we don't know what you are
>>>>> actually running.
>>>>>
>>>>> In the second, you don't actually show the query you are running after
>>>>> a commit, or how you run it.
>>>>>
>>>>> In both cases, you include a deal of commented-out queries and OntModel
>>>>> machinery.
>>>>>
>>>>> Please, a complete and minimal example.
>>>>>
>>>>> ajs6f
>>>>>
>>>>> On Dec 21, 2017, at 5:52 AM, Piotr Nowara <[email protected]> wrote:
>>>>>
>>>>>> Here are gist links to the test classes I mentioned in my previous
>>>>>> message:
>>>>>> https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
>>>>>> https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60
>>>>>>
>>>>>> Thanks,
>>>>>> Piotr
>>>>>>
>>>>>> 2017-12-21 10:38 GMT+01:00 Andy Seaborne <[email protected]>:
>>>>>>
>>>>>>> Attachments don't come through on the list. Please use a paste or
>>>>>>> gist.
>>>>>>>
>>>>>>> I hope these examples are short and concise. Complete, Minimal
>>>>>>> Examples please.
>>>>>>>
>>>>>>> HTML messes up structured text but:
>>>>>>>
>>>>>>>     <dependency>
>>>>>>>       <groupId>org.apache.jena</groupId>
>>>>>>>       <artifactId>apache-jena</artifactId>
>>>>>>>       <version>3.5.0</version>
>>>>>>>       <type>zip</type>
>>>>>>>     </dependency>
>>>>>>>
>>>>>>> should be:
>>>>>>>
>>>>>>>     <groupId>org.apache.jena</groupId>
>>>>>>>     <artifactId>apache-jena-libs</artifactId>
>>>>>>>     <type>pom</type>
>>>>>>>
>>>>>>> (you picked most of it up via the TDB dependency).
>>>>>>>
>>>>>>>     <dependency>
>>>>>>>       <groupId>org.apache.jena</groupId>
>>>>>>>       <artifactId>jena-csv</artifactId>
>>>>>>>       <version>3.5.0</version>
>>>>>>>       <type>jar</type>
>>>>>>>     </dependency>
>>>>>>>
>>>>>>> Is this necessary for your example?
>>>>>>>
>>>>>>> Andy
>>>>>>>
>>>>>>> On 21/12/17 09:08, Piotr Nowara wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm attaching two simple Java classes which I'm using for testing
>>>>>>>> (there are some comments there describing what results I got). The
>>>>>>>> Java app and Fuseki are on the same server. FusekiTest3 invokes
>>>>>>>> DatasetAccessorFactory.createHTTP() (so I think this is what you mean
>>>>>>>> by "remote" implementation) and FusekiTest2 uses
>>>>>>>> RDFConnection.fetchDataset (which is the slowest operation).
>>>>>>>>
>>>>>>>> The GRAPH clause gives me the expected results (returns the newly
>>>>>>>> added triple), but why should FROM be wrong?
>>>>>>>
>>>>>>> GRAPH accesses a named graph.
>>>>>>>
>>>>>>> FROM describes a dataset to be queried.
>>>>>>>
>>>>>>>> We use the FROM clause in many of our queries and we didn't notice
>>>>>>>> anything wrong/unexpected when using a TDB dataset. With Fuseki, FROM
>>>>>>>> seems to return the content of the default graph and not the graph
>>>>>>>> indicated by FROM <named-graph-IRI>.
>>>>>>>>
>>>>>>>> Both tests fail to preserve the newly added triple.
>>>>>>>>
>>>>>>>> Here are the Maven artifacts I'm using for the client app (maybe I
>>>>>>>> should download some Fuseki-specific JAR?):
>>>>>>>>
>>>>>>>>     <dependency>
>>>>>>>>       <groupId>org.apache.jena</groupId>
>>>>>>>>       <artifactId>jena-tdb</artifactId>
>>>>>>>>       <version>3.5.0</version>
>>>>>>>>       <type>jar</type>
>>>>>>>>     </dependency>
>>>>>>>>     <dependency>
>>>>>>>>       <groupId>org.apache.jena</groupId>
>>>>>>>>       <artifactId>apache-jena</artifactId>
>>>>>>>>       <version>3.5.0</version>
>>>>>>>>       <type>zip</type>
>>>>>>>>     </dependency>
>>>>>>>>     <dependency>
>>>>>>>>       <groupId>org.apache.jena</groupId>
>>>>>>>>       <artifactId>jena-csv</artifactId>
>>>>>>>>       <version>3.5.0</version>
>>>>>>>>       <type>jar</type>
>>>>>>>>     </dependency>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Piotr
>>>>>>>>
>>>>>>>> 2017-12-20 16:52 GMT-05:00 Andy Seaborne <[email protected]>:
>>>>>>>>
>>>>>>>>   On 20/12/17 18:28, Piotr Nowara wrote:
>>>>>>>>
>>>>>>>>     Hi,
>>>>>>>>
>>>>>>>>     thanks for answering so quickly.
>>>>>>>>
>>>>>>>>     I tried two different solutions:
>>>>>>>>
>>>>>>>>     1) Merging models obtained using DatasetAccessor
>>>>>>>>
>>>>>>>>   Which implementation of DatasetAccessor? (local or remote?)
>>>>>>>>
>>>>>>>>     Model portal = accessor.getModel("http://www.myGraph.com/portal");
>>>>>>>>     Model defaultM = accessor.getModel();
>>>>>>>>     Model external = accessor.getModel("http://www.myGraph.com/external");
>>>>>>>>     dataset = DatasetFactory.create(external.add(portal).add(defaultM));
>>>>>>>>
>>>>>>>>     2) RDFConnection - works much slower than the method above
>>>>>>>>     (which is no surprise, since you said it can affect the
>>>>>>>>     performance negatively)
>>>>>>>>
>>>>>>>>   and this is a remote RDFConnection? (otherwise it should perform,
>>>>>>>>   with default Isolation, the same)
>>>>>>>>
>>>>>>>>     I noticed two confusing issues when working with those datasets:
>>>>>>>>     Issue 1: SPARQL SELECT would produce different results
>>>>>>>>
>>>>>>>>   in what way different?
>>>>>>>>
>>>>>>>>     depending on where the named graph IRI was defined in the query
>>>>>>>>     (FROM clause vs. WHERE clause):
>>>>>>>>     SELECT * FROM <http://www.myGraph.com/portal> WHERE {?s ?p ?o}
>>>>>>>>     behaves differently than:
>>>>>>>>     SELECT * WHERE {GRAPH <http://www.myGraph.com/portal> {?s ?p ?o}}
>>>>>>>>
>>>>>>>>   GRAPH is correct, FROM is wrong.
>>>>>>>>
>>>>>>>>     Issue 2: After adding a triple using an INSERT DATA statement, the
>>>>>>>>     triple was present in the graph but disappeared after closing the
>>>>>>>>     connection, despite the fact I did dataset.commit()
>>>>>>>>
>>>>>>>>   Complete example?
>>>>>>>>
>>>>>>>>     We didn't experience those issues when working with a "local"
>>>>>>>>     Jena TDB. For now we will probably stick to the TDB version, but
>>>>>>>>     someday we would need the multi-user functionality Fuseki offers
>>>>>>>>     anyway.
>>>>>>>>     It seems that we will have to revise all our SPARQL queries to
>>>>>>>>     make them Fuseki-ready, which means migrating from TDB to Fuseki
>>>>>>>>     will be more difficult for us than migrating from another triple
>>>>>>>>     store we were using in the past to Jena TDB, which went very
>>>>>>>>     smoothly. I'm still wondering whether or not I'm missing
>>>>>>>>     something regarding Fuseki.
>>>>>>>>
>>>>>>>>     Thanks,
>>>>>>>>     Piotr
>>>>>>>>
>>>>>>>>     2017-12-20 5:40 GMT-05:00 Andy Seaborne <[email protected]>:
>>>>>>>>
>>>>>>>>       On 19/12/17 21:41, Piotr Nowara wrote:
>>>>>>>>
>>>>>>>>         Hi,
>>>>>>>>
>>>>>>>>         I have a TDB-powered Java app which issues a lot of SPARQL
>>>>>>>>         UPDATES and SELECTS (most of them accessing multiple named
>>>>>>>>         graphs at once). My app obtains a Jena connection using this
>>>>>>>>         simple API call:
>>>>>>>>
>>>>>>>>         this.dataset = TDBFactory.createDataset(this.storagePath);
>>>>>>>>
>>>>>>>>         Then this dataset object is used to run SPARQL UPDATES and
>>>>>>>>         SELECTS.
>>>>>>>>
>>>>>>>>         I would like to replicate this solution using Jena Fuseki, but
>>>>>>>>         I wonder if that’s possible, since the DatasetAccessor class
>>>>>>>>         provides only methods to access separate named graphs. What I
>>>>>>>>         need is database/dataset-level access. The Fuseki database
>>>>>>>>         should be persistent.
>>>>>>>>
>>>>>>>>         I'd be grateful for any clue or code example.
>>>>>>>>
>>>>>>>>       Query and update work on datasets.
>>>>>>>>
>>>>>>>>       RDFConnection
>>>>>>>>       http://jena.apache.org/documentation/rdfconnection/
>>>>>>>>       is the combined interface to both local and remote datasets and
>>>>>>>>       includes some operations that do a whole GET/POST/PUT of
>>>>>>>>       datasets.
>>>>>>>>
>>>>>>>>       RDFConnection.connect("http://localhost:3030/myDataset")
>>>>>>>>
>>>>>>>>       For migration from local, note that data is copied across the
>>>>>>>>       network when doing dataset operations. RDFConnection has whole
>>>>>>>>       dataset operations in the style of SPARQL Graph Store Protocol
>>>>>>>>       (=DatasetAccessor) operations.
>>>>>>>>       If your graphs and dataset are large, this is maybe not what
>>>>>>>>       you want.
>>>>>>>>
>>>>>>>>       Because this is across the network, the semantics of local and
>>>>>>>>       remote are not identical unless you ask the local mode to do
>>>>>>>>       copying:
>>>>>>>>
>>>>>>>>       RDFConnection.connect(datasets, Isolation.COPY)
>>>>>>>>
>>>>>>>>       which is a good simulation of local/remote (and slower for
>>>>>>>>       local than no COPY).
>>>>>>>>
>>>>>>>>       Andy
>>>>>>>>
>>>>>>>>         Thanks,
>>>>>>>>
>>>>>>>>         Piotr
