Hey Andy,

Sorry, I don’t mean to be agonisingly thick, but I’m not sure I follow the 
conclusion, and I don’t see how to modify the config file that I had attached 
for a TDB setup.

I didn’t modify the on-disc data model. I added a SPARQL Update method to the 
inference model and removed the explicit link to the union graph. I loaded 
data into both the data and inference models. I lost the ability to query 
without named graphs (so I have to use GRAPH), and the inference model wasn’t 
loaded with the disc-saved data when restarting.

To the question:

>> What is the prescribed way of keeping disc data and inference datasets in 
>> synch?

Your answer is two parts:

> Update via the inference model.

This means I keep two separate models, right? One in memory, where I do the 
inferencing; one on disk, where I just store data. But then they remain 
disconnected, and I can’t initialise the inference model with the disk data in 
any case. Sorry, I am a bit confused.

> Don't wire it to the union graph.

What does this mean? Will the default graph give me access to the union of 
graphs?

In the config file, do I entirely get rid of this or just the last clause?

>> # Intermediate graph referencing the default union graph
>> :g rdf:type tdb:GraphTDB ;
>> tdb:dataset :tdbDataset ;
>> tdb:graphName <urn:x-arq:UnionGraph> ;
>> .
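
If it helps make the question concrete, here is my reading of the alternative as a sketch (untested; the prefixes and the :tdbDataset name are taken from the full config further down the thread):

```turtle
@prefix :    <http://base/#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .

# Sketch (untested): keep :g but drop only the tdb:graphName clause, so
# :g refers to the dataset's real default graph rather than the union
# graph <urn:x-arq:UnionGraph>.
:g rdf:type tdb:GraphTDB ;
    tdb:dataset :tdbDataset ;
    .
```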

Thank you,
Pierre



From: Andy Seaborne [mailto:[email protected]]
Sent: 09 February 2019 17:53
To: [email protected]
Subject: Re: Fuskei2 configuration, TDB2 data, Inferencing with ontologies, 
Persisting named graphs upon server restart



On 04/02/2019 12:31, Pierre Grenon wrote:
> Hi,
>
> following up after going through my attempts more systematically again. I'm 
> trying to be as specific and clear as I can. Any feedback most appreciated.
>
> Many thanks,
> Pierre
>
> 1. It is possible to have a configuration file in which data is loaded into a 
> TDB and inferences are run over this data. In this case:
>
> 1.a Data in named graphs created using SPARQL Update into a TDB dataset 
> persists upon restart.

Data must be loaded through the inference graph for the inferencer to
notice the change.

So the SPARQL updates can't create a new graph. Assemblers have a fixed
configuration.

(You could have one graph per database and upload new assemblers while
Fuseki is running.)

>
> 1.b Assertional data in these named graphs is immediately available to the 
> reasoning endpoint without server restart.
>
> 1.c Inference on data loaded using SPARQL Update requires restart of the 
> server after upload.
>
> 1.d CLEAR ALL in the TDB dataset endpoint requires server restart to have the 
> inference dataset emptied. (Queries to the reasoning endpoint for either 
> assertional or inferred data both return the same results as prior to 
> clearing the TDB dataset.)

Same general point - if you manipulate the database directly, the
inference code doesn't know a change has happened or what has changed.

> 2. TDB2 does not allow this --- or is that only the case at the moment? As 
> per OP in this thread, the configuration adapted to TDB2 breaks. Based on 
> Andy's response, this may be caused by Bug Jena-1633. Would fixing the bug be 
> enough to allow for the configuration using TDB2?

JENA-1663.

> 3. Inference datasets do not synch with the underlying TDB(2) datasets (1.b 
> and 1.c, in virtue of the in-memory nature of inference models and the way 
> configuration files are handled, as per Andy's and ajs6f's responses).
>
> In view of this, however, 1.b is really weird.
>
> 4. Adding a service update method to the reasoning service does not seem to 
> allow updating the inference dataset. Sending SPARQL Update to the inference 
> endpoint does not result in either additional assertional or inferred data. 
> (Although, per 1.b, asserted data is returned when the SPARQL Update is sent 
> to the TDB endpoint.)

The base graph of the inference model is updated.

But you have that set to <urn:x-arq:UnionGraph>.

That applies to SPARQL query - the updates will have gone to the real
default graph but that is hidden by your setup.
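
To make that concrete, a hypothetical update (made-up URIs) illustrates the mismatch:

```sparql
# Hypothetical SPARQL Update sent to the inference endpoint. The triple
# is written to the dataset's real default graph, but because the base
# model is wired to <urn:x-arq:UnionGraph> (the union of the *named*
# graphs only), queries through that setup never see it.
INSERT DATA { <urn:ex:s> <urn:ex:p> <urn:ex:o> }
```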

>
>
> Question:
>
> What is the prescribed way of keeping disc data and inference datasets in 
> synch?

Update via the inference model.
Don't wire it to the union graph.
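
Read together, the two points suggest a service definition along these lines (a sketch only, untested; the names are reused from the config quoted later in this thread):

```turtle
@prefix :       <http://base/#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

# Sketch (untested): give the reasoning service an update endpoint so
# SPARQL Update goes through the inference model, which lets the
# reasoner see each change and keeps disc data and inferences in step.
:reasoningService a fuseki:Service ;
    fuseki:name "reasoningEndpointTDBB" ;
    fuseki:serviceQuery "query", "sparql" ;
    fuseki:serviceUpdate "update" ;
    fuseki:serviceReadGraphStore "get" ;
    fuseki:dataset :infDataset ;
    .
```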

THIS E-MAIL MAY CONTAIN CONFIDENTIAL AND/OR PRIVILEGED INFORMATION. 
IF YOU ARE NOT THE INTENDED RECIPIENT (OR HAVE RECEIVED THIS E-MAIL 
IN ERROR) PLEASE NOTIFY THE SENDER IMMEDIATELY AND DESTROY THIS 
E-MAIL. ANY UNAUTHORISED COPYING, DISCLOSURE OR DISTRIBUTION OF THE 
MATERIAL IN THIS E-MAIL IS STRICTLY FORBIDDEN. 

IN ACCORDANCE WITH MIFID II RULES ON INDUCEMENTS, THE FIRM'S EMPLOYEES 
MAY ATTEND CORPORATE ACCESS EVENTS (DEFINED IN THE FCA HANDBOOK AS 
"THE SERVICE OF ARRANGING OR BRINGING ABOUT CONTACT BETWEEN AN INVESTMENT 
MANAGER AND AN ISSUER OR POTENTIAL ISSUER"). DURING SUCH MEETINGS, THE 
FIRM'S EMPLOYEES MAY ON NO ACCOUNT BE IN RECEIPT OF INSIDE INFORMATION 
(AS DESCRIBED IN ARTICLE 7 OF THE MARKET ABUSE REGULATION (EU) NO 596/2014). 
(https://www.handbook.fca.org.uk/handbook/glossary/G3532m.html)
COMPANIES WHO DISCLOSE INSIDE INFORMATION ARE IN BREACH OF REGULATION 
AND MUST IMMEDIATELY AND CLEARLY NOTIFY ALL ATTENDEES. FOR INFORMATION 
ON THE FIRM'S POLICY IN RELATION TO ITS PARTICIPATION IN MARKET SOUNDINGS, 
PLEASE SEE https://www.horizon-asset.co.uk/market-soundings/. 

HORIZON ASSET LLP IS AUTHORISED AND REGULATED 
BY THE FINANCIAL CONDUCT AUTHORITY.


>
> Is it:
>
> P1 - upon SPARQL Update to disc data, restart server (and reinitialise 
> inference dataset)?
> This makes it difficult to manage successive updates, especially when there 
> may be dependencies between states, e.g., if in order to make update 2 I 
> need to have done update 1, I need to restart after update 1.
>
> Given that only TDB works at the moment, what is the 'transactional' meaning 
> of having to do this?
>
> P2 - upon SPARQL Update to disc data, SPARQL Update inference dataset. Is it 
> possible to update the inference dataset? In that case, is it possible to 
> guarantee that the two datasets are in synch? Does TDB versus TDB2 matter?
>
> 5. Note to self: property chains are not supported by the OWLFBRuleReasoner.
>
>
> ##### TDB Configuration
> ##### From:
>
> https://stackoverflow.com/questions/47568703/named-graphs-v-default-graph-behaviour-in-apache-jena-fuseki
> @prefix : <http://base/#> .
> @prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
> @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
> @prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .
> @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>
> # TDB
> tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
> tdb:GraphTDB rdfs:subClassOf ja:Model .
>
>
> # Service 1: Dataset endpoint (no reasoning)
> :dataService a fuseki:Service ;
> fuseki:name "tdbEnpointTDBB" ;
> fuseki:serviceQuery "sparql", "query" ;
> fuseki:serviceUpdate "update" ;
> fuseki:dataset :tdbDataset ;
> .
>
> # Service 2: Reasoning endpoint
> :reasoningService a fuseki:Service ;
> fuseki:dataset :infDataset ;
> fuseki:name "reasoningEndpointTDBB" ;
> fuseki:serviceQuery "query", "sparql" ;
> fuseki:serviceReadGraphStore "get" ;
> .
>
> # Inference dataset
> :infDataset rdf:type ja:RDFDataset ;
> ja:defaultGraph :infModel ;
> .
>
> # Inference model
> :infModel a ja:InfModel ;
> ja:baseModel :g ;
>
> ja:reasoner [
> ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ;
> ] ;
> .
>
> # Intermediate graph referencing the default union graph
> :g rdf:type tdb:GraphTDB ;
> tdb:dataset :tdbDataset ;
> tdb:graphName <urn:x-arq:UnionGraph> ;
> .
>
> # The location of the TDB dataset
> :tdbDataset rdf:type tdb:DatasetTDB ;
> tdb:location "C:\\dev\\apache-jena-fuseki-3.8.0\\run/databases/tdbB" ;
> tdb:unionDefaultGraph true ;
> .
>
> From: Pierre Grenon
> Sent: 01 February 2019 15:07
> To: '[email protected]'
> Subject: RE: Fuskei2 configuration, TDB2 data, Inferencing with ontologies, 
> Persisting named graphs upon server restart
>
>
> I'll address you two, fine gentlemen, at once if that's OK.
>
>> On 31/01/2019 17:57, ajs6f wrote:
>>>> 2/ It is not possible in an assembler/Fuseki configuration file, to create 
>>>> a new named graph and have a another inference graph put around that new 
>>>> graph at runtime.
>>>
>>> Just to pull on one of these threads, my understanding is that this is 
>>> essentially because the assembler system works only by names. IOW, there's 
>>> no such thing as a "variable", and a blank node doesn't function as a slot 
>>> (as it might in a SPARQL query), just as a nameless node. So you have to 
>>> know the specific name of any specific named graph to which you want to 
>>> refer. A named graph that doesn't yet exist, and may have any name at all 
>>> when it does, obviously doesn't fit into that.
>>>
>
> I find this difficult to follow. By name, do you mean a value for 
> ja:graphName, something like <urn:my:beautiful:graph>?
>
> I have tried a configuration in which I was defining graphs.
>
> <#graph_umb> rdf:type tdb2:GraphTDB ;
> tdb2:dataset :datasetTDB2 ;
> ja:graphName <urn:mad:bro> .
>
> Then I'd load into that graph.
>
> Again, I haven't found a configuration that allowed me to also define an 
> inference engine and keep the content of these graphs.
>
> I will retry and try to post files for comments, unless you can come up with 
> a minimal example that would both save time and help preserve sanity.
>
>>> Andy and other more knowledgeable people: is that correct?
>>
>> The issue is that the assembler runs once at the start, builds some Java
>> structures based on that and does not get invoked when the new graph is
>> created later.
>
> To some extent, it would be possible to live with predefined graphs in the 
> config file. This would work for ontologies and reference data that doesn't 
> change.
>
> For data, in particular the type of data with lots of numbers that 
> corresponds to daily operational data, it might be infeasible to predefine 
> graph names unless you can declare some sort of template graph names (e.g., 
> <urn:data:icecream:[FLAVOUR]:[YYYYMMDD]>), which sounds like a stretch. 
> Alternatively, we could use a rolling predefined graph and save it with a 
> specific name as an archive, then clear and load new data on a daily basis. 
> I think this is a different issue though.
>
>> The issue is also that the union graph is a partition - if a single
>> concrete graph were used, it might well work.
>
> I'm not sure I follow this. Can you show an example of a config file that 
> makes that partitioning?
>
>> I haven't worked out the other details like why persistence isn't
>> happening. Might be related to a union graph. Might be update
>> happening going around the inference graph.
>
> Hope the previous message helped clarify the issue.
>
> As a follow-up too, I'm asked whether it is possible to save to disc any 
> named graph created in memory before shutting down the server, and whether 
> that would be a workaround.
>
> with many thanks and kind regards,
> Pierre
>
>
>
