Re: Question about Jena capabilities/requirements
On 09/03/2022 09:33, Goławski, Paweł wrote:
> Most of the data being stored can be treated as dictionaries, so updates
> will be rather big but rare. The number of simultaneous reads could be a
> few hundred. Queries will also need (at least) the Micro OWL reasoner,
> and acceptable response time is crucial.

Which features of the Micro OWL reasoner?

     Andy

Hi Paweł,

The amount of CPU is determined more by the number of concurrent users or
other services. Even if a system is supporting thousands of users, the
number actually active at any given moment is much lower.

The kinds of application using the system influence the complexity of the
queries as well.

Another factor is data growth - is it a continuous stream of small updates
or a few big updates?

Fuseki can scale for performance with RDF Delta:
https://afs.github.io/rdf-delta/

     Andy

On 07/03/2022 19:52, Rinor Sefa wrote:
> Hi Pawel,
>
> What would be the other requirements for such a system? You mentioned
> memory scaling and response time. I ask because Fuseki might meet some of
> these requirements but not others, or vice versa - as with any database.
> Knowing exactly what your requirements are will help you determine
> whether Fuseki can be used.
Re: SHACL-based data extraction from a knowledge graph
What does VLib.validateShape actually return - the focusNode + path + value
nodes that conform to each shape? Or does it emit them through a listener?
(https://github.com/apache/jena/blob/5ce8c141d425655bcaa9d7567117659e502a7ff1/jena-shacl/src/main/java/org/apache/jena/shacl/validation/VLib.java#L89)

The idea would be to use the validator as a "filter" that emits the triples
that are valid according to the shapes, so that they can be aggregated into
an output graph.

On Wed, 9 Mar 2022 at 13:45, Florian Kleedorfer
<florian.kleedor...@austria.fm> wrote:

> On 2022-03-09 13:22, Thomas Francart wrote:
>
>>> I think you could do it with Jena. Load the data into a Graph, then get
>>> the focus nodes for all shapes you want using VLib.focusNodes. Evaluate
>>> each shape on its focus nodes and compile the intersection of all focus
>>> nodes that are valid, along with the shapes. Now evaluate the shapes
>>> again on these valid focus nodes and record all the triples/quads that
>>> are pulled from the data graph during evaluation.
>>
>> But does this guarantee that all triples pulled from the data graph are
>> valid triples?
>>
>> For example I may have
>>
>>     ex:myConcept skos:prefLabel "english label"@en, "german label"@de .
>>
>> and my SHACL would specify a shape that mandates English:
>>
>>     ex:MyShape a sh:NodeShape ;
>>         sh:property [
>>             sh:path skos:prefLabel ;
>>             sh:languageIn ("en") ;
>>         ] .
>>
>> In that case, would only the skos:prefLabel with an English language tag
>> be pulled from the graph?
>>
>> My hypothesis is that the triples pulled from the graph are the ones
>> whose predicate is indicated by sh:path, but that does not guarantee
>> that each triple is valid. Wouldn't this require knowing whether each
>> individual triple has matched all the constraints of the shape, in order
>> to decide whether to output it?
>
> I think you are right. You'd get a bigger set than the triples you
> actually want. You can probably use the validation result to filter out
> the triples that cause violations - although I am not positive it will
> work in every instance. I'd try, though.

--
*Thomas Francart* - *SPARNA*
Web de *données* | Architecture de l'*information* | Accès aux *connaissances*
blog : blog.sparna.fr, site : sparna.fr, linkedin : fr.linkedin.com/in/thomasfrancart
tel : +33 (0)6.71.11.25.97, skype : francartthomas
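Florian's "use the validation result to filter" idea could be sketched with Jena's SHACL API along the following lines. This is only a rough sketch, not a complete solution: it collects the focus nodes that have at least one violation in the validation report, so it filters at the focus-node level, not the triple level (it would not, by itself, drop only the non-English prefLabel in the example above). The class name `ShaclFilter` is illustrative.

```java
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.shacl.ShaclValidator;
import org.apache.jena.shacl.Shapes;
import org.apache.jena.shacl.ValidationReport;
import org.apache.jena.shacl.validation.ReportEntry;

import java.util.HashSet;
import java.util.Set;

public class ShaclFilter {
    /**
     * Validate the data graph against the shapes and return the set of
     * focus nodes that have at least one violation. Nodes NOT in this set
     * could then be treated as "valid" and their descriptions extracted.
     */
    public static Set<Node> violatingFocusNodes(Shapes shapes, Graph data) {
        ValidationReport report = ShaclValidator.get().validate(shapes, data);
        Set<Node> bad = new HashSet<>();
        for (ReportEntry e : report.getEntries()) {
            bad.add(e.focusNode());
        }
        return bad;
    }
}
```

A caller would parse the shapes graph with `Shapes.parse(shapesGraph)`, call this method, and exclude the returned nodes from the extraction pass.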
Re: SHACL-based data extraction from a knowledge graph
On 2022-03-09 13:22, Thomas Francart wrote:

>> I think you could do it with Jena. Load the data into a Graph, then get
>> the focus nodes for all shapes you want using VLib.focusNodes. Evaluate
>> each shape on its focus nodes and compile the intersection of all focus
>> nodes that are valid, along with the shapes. Now evaluate the shapes
>> again on these valid focus nodes and record all the triples/quads that
>> are pulled from the data graph during evaluation.
>
> But does this guarantee that all triples pulled from the data graph are
> valid triples?
>
> For example I may have
>
>     ex:myConcept skos:prefLabel "english label"@en, "german label"@de .
>
> and my SHACL would specify a shape that mandates English:
>
>     ex:MyShape a sh:NodeShape ;
>         sh:property [
>             sh:path skos:prefLabel ;
>             sh:languageIn ("en") ;
>         ] .
>
> In that case, would only the skos:prefLabel with an English language tag
> be pulled from the graph?
>
> My hypothesis is that the triples pulled from the graph are the ones
> whose predicate is indicated by sh:path, but that does not guarantee that
> each triple is valid. Wouldn't this require knowing whether each
> individual triple has matched all the constraints of the shape, in order
> to decide whether to output it?

I think you are right. You'd get a bigger set than the triples you
actually want. You can probably use the validation result to filter out
the triples that cause violations - although I am not positive it will
work in every instance. I'd try, though.
Re: SHACL-based data extraction from a knowledge graph
Thanks Florian! I am following up the conversation on the Jena mailing
list.

On Wed, 9 Mar 2022 at 00:56, Florian Kleedorfer
<florian.kleedor...@austria.fm> wrote:

> I think you could do it with Jena. Load the data into a Graph, then get
> the focus nodes for all shapes you want using VLib.focusNodes. Evaluate
> each shape on its focus nodes and compile the intersection of all focus
> nodes that are valid, along with the shapes. Now evaluate the shapes
> again on these valid focus nodes and record all the triples/quads that
> are pulled from the data graph during evaluation.

But does this guarantee that all triples pulled from the data graph are
valid triples?

For example I may have

    ex:myConcept skos:prefLabel "english label"@en, "german label"@de .

and my SHACL would specify a shape that mandates English:

    ex:MyShape a sh:NodeShape ;
        sh:property [
            sh:path skos:prefLabel ;
            sh:languageIn ("en") ;
        ] .

In that case, would only the skos:prefLabel with an English language tag
be pulled from the graph?

My hypothesis is that the triples pulled from the graph are the ones whose
predicate is indicated by sh:path, but that does not guarantee that each
triple is valid. Wouldn't this require knowing whether each individual
triple has matched all the constraints of the shape, in order to decide
whether to output it?

Thanks again!
Thomas

> That last bit requires you to wrap the original data graph object in a
> custom class extending the Graph class in such a way that you intercept
> all reading calls and store the result triples in an internal set before
> handing them back to the client.
>
> After the second evaluation of only the valid focus nodes you should
> have your desired extraction result in the wrapper graph.
>
> I may be wrong about this approach, but it might just work. If you try
> this and succeed, please consider contributing the code to Jena. It's
> not the first time this question has come up.
>
> All the best!
> Florian
>
> On 8 March 2022 18:25:13 CET, Thomas Francart <thomas.franc...@sparna.fr>
> wrote:
>>
>> Hello!
>>
>> I am facing the following situation:
>>
>>    - A large knowledge graph with lots of triples
>>    - A need to export multiple RDF datasets from this large knowledge
>>      graph, each containing a subset of the triples from the graph
>>    - Datasets are not limited to a flat list of entities with their
>>      properties, but will each contain a small piece of graph
>>    - The exact content of each dataset is specified in SHACL, using
>>      standard constraints on cardinalities, sh:node, datatype,
>>      languageIn, sh:hasValue, etc. This SHACL will be used as the
>>      source for documenting the exact content of each dataset using [1]
>>
>> And now the question: can we automate the extraction of data from the
>> large knowledge graph based on the SHACL definition of our datasets?
>> What we are looking for is a guarantee that the extraction process will
>> produce a dataset that is conformant with the SHACL definition.
>>
>> Has anyone done something similar? A naïve approach would be SPARQL
>> query generation based on the SHACL definition of the dataset, but I
>> suspect the query will quickly become too complicated.
>>
>> Thanks!
>> Thomas
>>
>> [1] SHACL Play documentation generator:
>> https://shacl-play.sparna.fr/play/doc
>
> --
> This message was sent from my Android device with K-9 Mail.

--
*Thomas Francart* - *SPARNA*
Web de *données* | Architecture de l'*information* | Accès aux *connaissances*
blog : blog.sparna.fr, site : sparna.fr, linkedin : fr.linkedin.com/in/thomasfrancart
tel : +33 (0)6.71.11.25.97, skype : francartthomas
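Florian's suggestion of intercepting all reading calls could be sketched as follows, using Jena's GraphWrapper as a base. This is a sketch under the assumption that SHACL evaluation reads the data graph through find(); the class name `RecordingGraph` is illustrative, and a production version would likely need to intercept the stream() methods as well.

```java
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.sparql.graph.GraphWrapper;
import org.apache.jena.util.iterator.ExtendedIterator;

import java.util.HashSet;
import java.util.Set;

/**
 * Wraps a data graph and records every triple handed out through find(),
 * so that after a SHACL evaluation pass the 'touched' set contains exactly
 * the triples the validator pulled from the graph.
 */
public class RecordingGraph extends GraphWrapper {
    public final Set<Triple> touched = new HashSet<>();

    public RecordingGraph(Graph base) { super(base); }

    @Override
    public ExtendedIterator<Triple> find(Node s, Node p, Node o) {
        // Record each triple as the caller consumes the iterator.
        return super.find(s, p, o).mapWith(t -> { touched.add(t); return t; });
    }

    @Override
    public ExtendedIterator<Triple> find(Triple m) {
        return find(m.getSubject(), m.getPredicate(), m.getObject());
    }
}
```

After validating only the valid focus nodes against a `RecordingGraph`, the `touched` set would hold the candidate extraction result.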
Re: [4.3.2] Cannot invoke "org.apache.jena.rdf.model.Property.asNode()" because "org.apache.jena.vocabulary.RDF.type" is null
On 09/03/2022 11:16, Martynas Jusevičius wrote:
> Hi,
>
> This appeared after a Java upgrade from 11 to 17:
>
> WARN LocationMapper:188 - Error in configuration file: Cannot invoke
> "org.apache.jena.rdf.model.Property.asNode()" because
> "org.apache.jena.vocabulary.RDF.type" is null

May be init-related ... it depends on when it happened in the app. It is
always good to call JenaSystem.init before any Jena code is touched, if
you can. It makes the whole thing deterministic.

> I was looking at the LocationMapper code, but line 188 does not contain
> anything like that:
> https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/util/LocationMapper.java#L188

Wrong LocationMapper? Look at any stack traces. Run 4.4.0.

> What is the cause and does this need to be addressed?
>
> Martynas
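The defensive initialization Andy suggests looks like the following sketch (the class name `App` is a placeholder for the application's entry point):

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.sys.JenaSystem;

public class App {
    public static void main(String[] args) {
        // Initialize Jena explicitly before any other Jena class is
        // referenced. This makes the static-initialization order
        // deterministic and avoids NPEs such as vocabulary constants
        // (e.g. RDF.type) being observed as null during partial init.
        JenaSystem.init();

        // Safe to use Jena from here on.
        Model m = ModelFactory.createDefaultModel();
        System.out.println("Jena initialized, model size = " + m.size());
    }
}
```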
[4.3.2] Cannot invoke "org.apache.jena.rdf.model.Property.asNode()" because "org.apache.jena.vocabulary.RDF.type" is null
Hi,

This appeared after a Java upgrade from 11 to 17:

WARN LocationMapper:188 - Error in configuration file: Cannot invoke
"org.apache.jena.rdf.model.Property.asNode()" because
"org.apache.jena.vocabulary.RDF.type" is null

I was looking at the LocationMapper code, but line 188 does not contain
anything like that:
https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/util/LocationMapper.java#L188

What is the cause and does this need to be addressed?

Martynas
Streaming JSON RowSets (JENA-2302)
Dear all,

I want to inform you of an active PR for making RowSets over
application/sparql-results+json streaming.

JIRA: https://issues.apache.org/jira/projects/JENA/issues/JENA-2302
PR: https://github.com/apache/jena/pull/1218

As JSON is nowadays the default content type used in Jena for SPARQL
results, this PR aims to ease working with large SPARQL result sets by
having streaming work out of the box. The implementation used by Jena so
far loaded JSON SPARQL result sets into memory first.

The JSON format itself allows for repeated keys (where the last one takes
precedence), and keys may appear in any order. These things introduce a
certain variety in how SPARQL result sets can be represented, and that
needs to be handled correctly by the implementation.

While the new implementation already succeeds on all existing Jena tests,
there is still a risk of breaking existing implementations that rely on
certain behavior of the non-streaming approach. Therefore, if you think
this change might (negatively) affect you, please provide feedback on the
proposed PR.

Best regards,
Claus Stadler

--
Dipl. Inf. Claus Stadler
Institute of Applied Informatics (InfAI) / University of Leipzig
Workpage & WebID: http://aksw.org/ClausStadler
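To illustrate why streaming this format is not entirely trivial: a typical application/sparql-results+json SELECT result puts "head" before "results", but JSON object members are unordered, so a serializer may legally emit them the other way around, and a streaming parser must then cope with discovering the variable names only after it has started reading bindings. An example result document:

```json
{
  "head": { "vars": [ "s" ] },
  "results": {
    "bindings": [
      { "s": { "type": "uri", "value": "http://example/x" } }
    ]
  }
}
```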
RE: Question about Jena capabilities/requirements
Most of the data being stored can be treated as dictionaries, so updates
will be rather big but rare. The number of simultaneous reads could be a
few hundred. Queries will also need (at least) the Micro OWL reasoner,
and acceptable response time is crucial.

> Hi Paweł,
>
> The amount of CPU is determined more by the number of concurrent users
> or other services. Even if a system is supporting thousands of users,
> the number actually active at any given moment is much lower.
>
> The kinds of application using the system influence the complexity of
> the queries as well.
>
> Another factor is data growth - is it a continuous stream of small
> updates or a few big updates?
>
> Fuseki can scale for performance with RDF Delta:
> https://afs.github.io/rdf-delta/
>
>     Andy
>
> On 07/03/2022 19:52, Rinor Sefa wrote:
>> Hi Pawel,
>>
>> What would be the other requirements for such a system? You mentioned
>> memory scaling and response time. I ask because Fuseki might meet some
>> of these requirements but not others, or vice versa - as with any
>> database. Knowing exactly what your requirements are will help you
>> determine whether Fuseki can be used.