Re: Jena TDB to Fuseki migration problem: how to obtain entire dataset?

Piotr Nowara Thu, 21 Dec 2017 02:53:34 -0800

Here are gist links to the test classes I mentioned in my previous message:
https://gist.github.com/PiotrNowara/586ebb3539bfbd0244bf7b7f606a64b8
https://gist.github.com/PiotrNowara/b3a84262ff0311d748efe03c7cc19d60


Thanks,
Piotr

2017-12-21 10:38 GMT+01:00 Andy Seaborne <[email protected]>:

> Attachments don't come through on the list.  Please use a paste or gist.
> I hope these examples are short and concise.  Complete, Minimal Examples
> please.
>
> HTML messes up structured text but:
>
>         <dependency>
>             <groupId>org.apache.jena</groupId>
>             <artifactId>apache-jena</artifactId>
>             <version>3.5.0</version>
>             <type>zip</type>
>         </dependency>
>
> should be:
>
>       <groupId>org.apache.jena</groupId>
>       <artifactId>apache-jena-libs</artifactId>
>       <type>pom</type>
>
> (you picked most of it up via the TDB dependency).
>
>         <dependency>
>             <groupId>org.apache.jena</groupId>
>             <artifactId>jena-csv</artifactId>
>             <version>3.5.0</version>
>             <type>jar</type>
>         </dependency>
>
> Is this necessary for your example?
>
>     Andy
>
> On 21/12/17 09:08, Piotr Nowara wrote:
>
>> Hi,
>>
>> I'm attaching two simple Java classes which I'm using for testing (they
>> contain comments describing the results I got). The Java app and
>> Fuseki are on the same server. FusekiTest3 invokes
>> DatasetAccessorFactory.createHTTP() (so I think this is what you mean by
>> the "remote" implementation) and FusekiTest2 uses
>> RDFConnection.fetchDataset (which is the slowest operation).
>>
>> The GRAPH clause gives me the expected results (it returns the newly added
>> triple), but why should FROM be wrong?
>>
>
> GRAPH accesses a named graph.
>
> FROM describes a dataset to be queried.
>
>> We use the FROM clause in many of our queries and we didn't notice anything
>> wrong/unexpected when using a TDB dataset. With Fuseki, FROM seems to return
>> the content of the default graph and not the graph indicated by the FROM
>> <named-graph-IRI>.
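
The GRAPH/FROM distinction above can be sketched with a toy model in plain Java. This is not the Jena API: `ToyDataset`, `from`, `selectAll` and `selectAllInGraph` are hypothetical names, and the sketch assumes the store resolves FROM IRIs against its own named graphs. It only illustrates the SPARQL rule that FROM builds the dataset the query runs over (its default graph is the merge of the listed graphs and, without FROM NAMED, it has no named graphs), while GRAPH selects a named graph inside whatever dataset is being queried.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FromVersusGraph {

    /** Toy RDF dataset: one default graph plus named graphs keyed by IRI (not a Jena type). */
    static final class ToyDataset {
        final Set<List<String>> defaultGraph;
        final Map<String, Set<List<String>>> namedGraphs;

        ToyDataset(Set<List<String>> defaultGraph, Map<String, Set<List<String>>> namedGraphs) {
            this.defaultGraph = defaultGraph;
            this.namedGraphs = namedGraphs;
        }
    }

    /** FROM <iri>...: the query sees a new dataset whose default graph is the
     *  merge of the listed graphs and, with no FROM NAMED, no named graphs. */
    static ToyDataset from(ToyDataset store, String... graphIris) {
        Set<List<String>> merged = new LinkedHashSet<>();
        for (String iri : graphIris) {
            merged.addAll(store.namedGraphs.getOrDefault(iri, Set.of()));
        }
        return new ToyDataset(merged, Map.of());
    }

    /** WHERE { ?s ?p ?o }: matches the default graph of the dataset being queried. */
    static Set<List<String>> selectAll(ToyDataset queried) {
        return queried.defaultGraph;
    }

    /** WHERE { GRAPH <iri> { ?s ?p ?o } }: matches the named graph with that IRI. */
    static Set<List<String>> selectAllInGraph(ToyDataset queried, String iri) {
        return queried.namedGraphs.getOrDefault(iri, Set.of());
    }

    /** Sample store: one triple in the default graph, one in the portal graph. */
    static ToyDataset sampleStore() {
        String portal = "http://www.myGraph.com/portal";
        return new ToyDataset(
                Set.of(List.of("ex:a", "ex:p", "ex:inDefault")),
                Map.of(portal, Set.of(List.of("ex:b", "ex:p", "ex:inPortal"))));
    }

    public static void main(String[] args) {
        String portal = "http://www.myGraph.com/portal";
        ToyDataset store = sampleStore();
        System.out.println("no FROM, no GRAPH: " + selectAll(store));
        System.out.println("FROM <portal>:     " + selectAll(from(store, portal)));
        System.out.println("GRAPH <portal>:    " + selectAllInGraph(store, portal));
        System.out.println("FROM + GRAPH:      " + selectAllInGraph(from(store, portal), portal));
    }
}
```

In this model the last combination comes back empty, because a dataset described only by FROM has no named graphs for GRAPH to select.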
>>
>> Both tests fail to preserve the newly added triple.
>>
>> Here are the maven artifacts I'm using for the client app (maybe I should
>> download some Fuseki specific JAR?):
>>
>>      <dependency>
>>          <groupId>org.apache.jena</groupId>
>>          <artifactId>jena-tdb</artifactId>
>>          <version>3.5.0</version>
>>          <type>jar</type>
>>      </dependency>
>>      <dependency>
>>          <groupId>org.apache.jena</groupId>
>>          <artifactId>apache-jena</artifactId>
>>          <version>3.5.0</version>
>>          <type>zip</type>
>>      </dependency>
>>      <dependency>
>>          <groupId>org.apache.jena</groupId>
>>          <artifactId>jena-csv</artifactId>
>>          <version>3.5.0</version>
>>          <type>jar</type>
>>      </dependency>
>>
>>
>> Thanks,
>>
>> Piotr
>>
>>
>>
>>
>> 2017-12-20 16:52 GMT-05:00 Andy Seaborne <[email protected]>:
>>
>>
>>
>>
>>     On 20/12/17 18:28, Piotr Nowara wrote:
>>
>>         Hi,
>>
>>         thanks for answering so quickly.
>>
>>         I tried two different solutions:
>>
>>         1) Merging models obtained using DatasetAccessor
>>
>>
>>     Which implementation of DatasetAccessor? (local or remote?)
>>
>>         Model portal = accessor.getModel("http://www.myGraph.com/portal");
>>         Model defaultM = accessor.getModel();
>>         Model external = accessor.getModel("http://www.myGraph.com/external");
>>         dataset = DatasetFactory.create(external.add(portal).add(defaultM));
>>
>>         2) RDFConnection - works much slower than the method above
>>         (which is no surprise, since you said it can affect performance
>>         negatively)
>>
>>
>>     and this is a remote RDFConnection? (otherwise it should perform,
>>     with default Isolation, the same)
>>
>>
>>         I noticed two confusing issues when working with those datasets:
>>         Issue 1: SPARQL SELECT would produce different results
>>
>>
>>     in what way different?
>>
>>         depending on where
>>         the named graph IRI was defined in the query (FROM clause vs.
>>         WHERE clause):
>>         SELECT * FROM <http://www.myGraph.com/portal> WHERE {?s ?p ?o}
>>         behaves differently than:
>>         SELECT * WHERE {GRAPH <http://www.myGraph.com/portal> {?s ?p ?o}}
>>
>>
>>     GRAPH is correct, FROM is wrong.
>>
>>
>>         Issue 2: After adding a triple using an INSERT DATA statement, the
>>         triple was present in the graph but disappeared after closing the
>>         connection, despite the fact that I called dataset.commit().
>>
>>
>>     Complete example?
>>
>>
>>         We didn't experience those issues when working with a "local"
>>         Jena TDB. For now we will probably stick to the TDB version, but
>>         someday we will need the multi-user functionality Fuseki offers
>>         anyway. It seems that we will have to revise all our SPARQL
>>         queries to make them Fuseki-ready, which means migrating from TDB
>>         to Fuseki will be more difficult for us than migrating to Jena TDB
>>         from the triple store we used in the past, which went very
>>         smoothly. I'm still wondering whether or not I'm missing something
>>         regarding Fuseki.
>>
>>         Thanks,
>>         Piotr
>>
>>
>>         2017-12-20 5:40 GMT-05:00 Andy Seaborne <[email protected]>:
>>
>>
>>
>>
>>             On 19/12/17 21:41, Piotr Nowara wrote:
>>
>>                 Hi,
>>
>>                 I have a TDB-powered Java app which issues a lot of
>>                 SPARQL UPDATES and SELECTS (most of them accessing
>>                 multiple named graphs at once). My app obtains a Jena
>>                 connection using this simple API call:
>>
>>                 this.dataset = TDBFactory.createDataset(this.storagePath);
>>
>>                 Then this dataset object is used to run SPARQL UPDATES
>>                 and SELECTS.
>>
>>                 I would like to replicate this solution using Jena
>>                 Fuseki but I wonder if
>>                 that’s possible since the DatasetAccessor class provides
>>                 only methods to
>>                 access separate named graphs. What I need is a
>>                 database/dataset level
>>                 access. The Fuseki database should be persistent.
>>
>>                 I'd be grateful for any clue or code example.
>>
>>             Query and update work on datasets.
>>
>>             RDFConnection
>>             http://jena.apache.org/documentation/rdfconnection/
>>             is the combined interface to both local and remote datasets
>>             and includes operations for whole-dataset GET/POST/PUT:
>>
>>             RDFConnection.connect("http://localhost:3030/myDataset")
>>
>>             For migration from local, note that data is copied across
>>             the network when doing dataset operations. RDFConnection has
>>             whole-dataset operations in the style of the SPARQL Graph
>>             Store Protocol (=DatasetAccessor) operations. If your graphs
>>             and dataset are large, this is maybe not what you want.
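
As a rough illustration of the Graph Store Protocol style mentioned above, here is a minimal sketch of how a graph is addressed over HTTP: `?graph=<percent-encoded IRI>` for a named graph and `?default` for the default graph, as the SPARQL 1.1 Graph Store HTTP Protocol defines. It uses only the Java standard library and sends nothing. The class and method names are hypothetical, and the endpoint URL (a dataset called `myDataset` with a service named `data` on a local Fuseki) is an assumption about the server configuration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GraphStoreUrls {

    /** Target URL for a named graph: <endpoint>?graph=<percent-encoded IRI>. */
    static URI namedGraph(String gspEndpoint, String graphIri) {
        String encoded = URLEncoder.encode(graphIri, StandardCharsets.UTF_8);
        return URI.create(gspEndpoint + "?graph=" + encoded);
    }

    /** Target URL for the default graph: <endpoint>?default. */
    static URI defaultGraph(String gspEndpoint) {
        return URI.create(gspEndpoint + "?default");
    }

    public static void main(String[] args) {
        // Assumed endpoint: dataset "myDataset" on a local Fuseki, service named "data".
        String endpoint = "http://localhost:3030/myDataset/data";
        System.out.println(namedGraph(endpoint, "http://www.myGraph.com/portal"));
        System.out.println(defaultGraph(endpoint));
    }
}
```

Each GET on such a URL transfers one whole graph, which is why fetching several graphs and merging them on the client copies all of that data across the network.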
>>
>>             Because this is across the network, the semantics of local
>>             and remote are not identical unless you ask the local mode to
>>             do copying:
>>
>>                 RDFConnection.connect(datasets, Isolation.COPY)
>>
>>             which is a good simulation of remote behaviour on a local
>>             dataset (and slower for local than no COPY)
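
The local/remote difference described above can be sketched with a toy store in plain Java. This is not the Jena API: `Store`, `Mode` and `fetch` are hypothetical names, and triples are plain strings. The sketch only shows why copying changes the semantics: an edit made through a shared view reaches the store, while an edit made to a fetched copy (which is what a transfer across the network amounts to) does not.

```java
import java.util.HashSet;
import java.util.Set;

public class IsolationSketch {

    /** Hypothetical stand-in for the isolation setting named in the thread. */
    enum Mode { NONE, COPY }

    /** Toy "store" holding triples as strings. */
    static final class Store {
        final Set<String> triples = new HashSet<>();

        /** Hand the data to the caller, either shared (NONE) or as a detached copy (COPY). */
        Set<String> fetch(Mode mode) {
            return mode == Mode.COPY ? new HashSet<>(triples) : triples;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        store.triples.add("ex:a ex:p ex:b");

        // Shared view: a change made by the caller is a change to the store.
        store.fetch(Mode.NONE).add("ex:c ex:p ex:d");
        System.out.println("after NONE edit: " + store.triples.size());

        // Detached copy: the store never sees the change.
        store.fetch(Mode.COPY).add("ex:e ex:p ex:f");
        System.out.println("after COPY edit: " + store.triples.size());
    }
}
```

In this toy model, a change meant to persist has to be sent back to the store explicitly; editing the fetched copy alone leaves the store untouched.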
>>
>>                    Andy
>>
>>
>>
>>                 Thanks,
>>
>>                 Piotr
>>
>>
>>
>>
>>
