Re: SHACL-based data extraction from a knowledge graph

Thomas Francart Wed, 09 Mar 2022 04:22:29 -0800

Thanks Florian! I am following up on the conversation on the Jena mailing list.

On Wed, 9 Mar 2022 at 00:56, Florian Kleedorfer <
[email protected]> wrote:


> I think you could do it with Jena. Load the data into a Graph, then get
> the focus nodes for all shapes you want using VLib.focusNodes. Evaluate
> each shape on its focus nodes and compile the intersection of all focus
> nodes that are valid, along with the shapes. Now evaluate the shapes again
> on these valid focus nodes and record all the triples/quads that are pulled
> from the data graph during evaluation.
>

But does this guarantee that all triples pulled from the data graph are
valid triples?
For example I may have

ex:myConcept skos:prefLabel "english label"@en, "german label"@de .

And my SHACL would specify a shape that mandates English:

ex:MyShape a sh:NodeShape ;
 sh:property [
    sh:path skos:prefLabel ;
    sh:languageIn ("en") ;
 ] .

In that case, would only the skos:prefLabel with the English language tag
be pulled from the graph?

My assumption is that the triples pulled from the graph are those whose
predicate is named in a property shape (sh:property), but this does not
guarantee that each such triple is valid.
Wouldn't the extraction need to know whether each individual triple has
matched all the constraints of the shape before deciding to output it?
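
To make the concern concrete, here is a minimal sketch in plain Python. It is a toy model, not the Jena or SHACL API: triples are tuples, a literal is a (text, language tag) pair, and all function names are hypothetical. It also shows a point worth keeping in mind: under standard SHACL validation, sh:languageIn applies to every value node, so the German label makes ex:myConcept fail the shape rather than being silently skipped.

```python
# Toy model only: triples are (subject, predicate, object) tuples and a
# literal is a (lexical form, language tag) pair. Not the Jena/SHACL API.
PREF_LABEL = "skos:prefLabel"

data = {
    ("ex:myConcept", PREF_LABEL, ("english label", "en")),
    ("ex:myConcept", PREF_LABEL, ("german label", "de")),
}

def language_in(literal, allowed):
    """Approximate sh:languageIn: tag equals an allowed range or extends it."""
    _, tag = literal
    tag = tag.lower()
    return any(tag == a.lower() or tag.startswith(a.lower() + "-") for a in allowed)

def conforms(triples, node, predicate, allowed):
    """The node conforms only if every value on the path passes the constraint."""
    return all(
        language_in(t[2], allowed)
        for t in triples
        if t[0] == node and t[1] == predicate
    )

def by_predicate(triples, predicate):
    """What recording reads by predicate would capture: every triple on the path."""
    return {t for t in triples if t[1] == predicate}

def by_predicate_and_constraint(triples, predicate, allowed):
    """Keep only triples whose value individually passes the constraint."""
    return {t for t in by_predicate(triples, predicate) if language_in(t[2], allowed)}

node_is_valid = conforms(data, "ex:myConcept", PREF_LABEL, ["en"])
recorded = by_predicate(data, PREF_LABEL)
filtered = by_predicate_and_constraint(data, PREF_LABEL, ["en"])
```

In this model the node does not conform, recording by predicate captures both labels, and only the per-value filter yields the English label alone. Treating shapes as filters is therefore a different operation from validating focus nodes.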

Thanks again!
Thomas


> That last bit requires you to wrap the original data graph object in a
> custom class extending the Graph class in such a way that you intercept all
> reading calls and store the result triples in an internal set before
> handing them back to the client.
>
> After the second evaluation of only the valid focus nodes you should have
> your desired extraction result in the wrapper graph.
>
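
The wrapper idea described above can be sketched as follows. This is plain Python over a set of tuples, not Jena's Graph class; the class name RecordingGraph and its find method are hypothetical stand-ins for whatever read methods the real wrapper would intercept.

```python
class RecordingGraph:
    """Wraps a set of triples and remembers every triple handed to a caller."""

    def __init__(self, triples):
        self._triples = set(triples)
        self.recorded = set()  # triples that were read at least once

    def find(self, s=None, p=None, o=None):
        """Yield matching triples (None is a wildcard), recording each one."""
        for triple in self._triples:
            if all(q is None or q == v for q, v in zip((s, p, o), triple)):
                self.recorded.add(triple)
                yield triple


graph = RecordingGraph({
    ("ex:a", "skos:prefLabel", "x"),
    ("ex:a", "rdf:type", "skos:Concept"),
    ("ex:b", "skos:prefLabel", "y"),
})

# Simulate an evaluator reading the labels of one focus node.
hits = list(graph.find(s="ex:a", p="skos:prefLabel"))
```

After the read, graph.recorded holds only the triple that was actually returned. Note that the wrapper records everything that is read, including values that a constraint later rejects.
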
> I may be wrong about this approach, but it might just work. If you try
> this and succeed, please consider contributing the code to Jena. It's not
> the first time this question has come up.
>
> All the best!
> Florian
>
>
> On 8 March 2022 at 18:25:13 CET, Thomas Francart <
> [email protected]> wrote:
>>
>> Hello!
>>
>> I am facing the following situation:
>>
>>    - A large knowledge graph with lots of triples
>>    - A need to export multiple RDF datasets from this large Knowledge
>>    Graph, each containing a subset of the triples from the graph
>>    - Datasets are not limited to a flat list of entities with their
>>    properties, but will each contain a small piece of graph
>>    - The exact content of each Dataset is specified in SHACL, using
>>    standard constraints of cardinalities, sh:node, datatype, languageIn,
>>    sh:hasValue, etc. This SHACL will be used as the source for
>>    documenting the exact content of each Dataset using [1]
>>
>> And now the question: can we automate the extraction of data from the
>> large knowledge graph based on the SHACL definition of our datasets?
>> What we are looking for is a guarantee that the extraction process will
>> produce a dataset that conforms to the SHACL definition.
>>
>> Has anyone done something similar? A naive approach would be to generate
>> a SPARQL query from the SHACL definition of the dataset, but I
>> suspect the query will quickly become too complicated.
>>
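
As a rough illustration of the naive query-generation route, the sketch below builds a SPARQL CONSTRUCT query for a single property shape that has only a path and an sh:languageIn list. The function name and its parameters are hypothetical, and nothing here reads real SHACL; it only shows the shape of the generated query.

```python
def construct_query(path, languages):
    """Build a CONSTRUCT query for one property shape with sh:languageIn."""
    if not languages:
        raise ValueError("at least one language range is required")
    # One langMatches test per allowed language range, OR-ed together.
    lang_filter = " || ".join(
        f'langMatches(lang(?o), "{lang}")' for lang in languages
    )
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        f"CONSTRUCT {{ ?s {path} ?o }}\n"
        f"WHERE {{ ?s {path} ?o . FILTER({lang_filter}) }}"
    )


query = construct_query("skos:prefLabel", ["en"])
print(query)
```

Even this one-constraint case needs a dedicated FILTER. Cardinalities, nested sh:node shapes, and sh:hasValue would each need their own patterns or subqueries, which is where the complexity mentioned above would likely come from.
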
>> Thanks!
>> Thomas
>>
>> [1] SHACL Play documentation generator:
>> https://shacl-play.sparna.fr/play/doc
>>


-- 

*Thomas Francart* -* SPARNA*
Web of *data* | *Information* architecture | Access to
*knowledge*
blog : blog.sparna.fr, site : sparna.fr, linkedin :
fr.linkedin.com/in/thomasfrancart
tel :  +33 (0)6.71.11.25.97, skype : francartthomas
