Re: Question about Jena capabilities/requirements

2022-03-09 Thread Andy Seaborne




On 09/03/2022 09:33, Goławski, Paweł wrote:

Most of the data being stored could be treated as dictionaries, so updates will
be rather big but rare.
The number of simultaneous reads could be a few hundred, maybe.
Queries will also need the Micro OWL Reasoner (at least), and acceptable
response time is crucial.


What features of the Micro OWL Reasoner?

Andy




Hi Paweł,

The amount of CPU is determined more by the number of concurrent users or other
services. Even if a system is supporting thousands of users, the number actually
active at any given moment is much lower.

The kinds of applications using the system influence the complexity of the
queries as well.

Another factor is the data growth - is it a continuous stream of small updates 
or a few big updates?

Fuseki can scale for performance with RDF-delta:
https://afs.github.io/rdf-delta/

 Andy



On 07/03/2022 19:52, Rinor Sefa wrote:

Hi Pawel,

What would the other requirements for such a system be? You mentioned memory
scaling and response time. I ask because Fuseki, like any database, might meet
some of these requirements but not others. Knowing exactly what your
requirements are will help you determine whether Fuseki can be used.




Re: SHACL-based data extraction from a knowledge graph

2022-03-09 Thread Thomas Francart
What if VLib.validateShape actually returned the focusNode + path +
valueNodes that conform to each shape, or emitted them through a listener?
(
https://github.com/apache/jena/blob/5ce8c141d425655bcaa9d7567117659e502a7ff1/jena-shacl/src/main/java/org/apache/jena/shacl/validation/VLib.java#L89
)
The idea would be to use the Validator as a "filter" that emits the triples
that are valid according to the shapes, so that they can be aggregated in an
output graph.
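
Roughly, the filtering loop I have in mind (a sketch only, untested; the
exact signature of VLib.focusNodes is an assumption based on the source
linked above):

import java.util.ArrayList;
import java.util.List;
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.shacl.ShaclValidator;
import org.apache.jena.shacl.Shapes;
import org.apache.jena.shacl.parser.Shape;
import org.apache.jena.shacl.validation.VLib;

static List<Node> conformingFocusNodes(Graph dataGraph, Graph shapesGraph) {
    Shapes shapes = Shapes.parse(shapesGraph);
    List<Node> valid = new ArrayList<>();
    // Shapes iterates over its root shapes.
    for (Shape shape : shapes) {
        for (Node focusNode : VLib.focusNodes(dataGraph, shape)) {
            // Validate this focus node only; keep it if it conforms.
            if (ShaclValidator.get().validate(shapes, dataGraph, focusNode).conforms()) {
                valid.add(focusNode);
            }
        }
    }
    // A second pass over these nodes would then collect the triples to
    // aggregate into the output graph - the step that a listener inside
    // VLib.validateShape would make unnecessary.
    return valid;
}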

On Wed, 9 Mar 2022 at 13:45, Florian Kleedorfer <
florian.kleedor...@austria.fm> wrote:

> Am 2022-03-09 13:22, schrieb Thomas Francart:
>
> >> I think you could do it with Jena. Load the data into a Graph, then get
> >> the focus nodes for all shapes you want using VLib.focusNodes. Evaluate
> >> each shape on its focus nodes and compile the intersection of all focus
> >> nodes that are valid, along with the shapes. Now evaluate the shapes again
> >> on these valid focus nodes and record all the triples/quads that are pulled
> >> from the data graph during evaluation.
> >>
> >
> > But does this guarantee that all triples pulled from the data graph are
> > valid triples?
> > For example I may have
> >
> > ex:myConcept skos:prefLabel "english label"@en, "german label"@de .
> >
> > And my SHACL would specify a shape that mandates English:
> >
> > ex:MyShape a sh:NodeShape ;
> >   sh:property [
> >     sh:path skos:prefLabel ;
> >     sh:languageIn ("en") ;
> >   ] .
> >
> > In that case, would only the skos:prefLabel with an English language tag
> > be pulled from the graph?
> >
> > My hypothesis is that the triples pulled from the graph are the ones
> > whose predicate is given by the shape's sh:path, but this does not
> > guarantee that the triple is valid.
> > Wouldn't this require knowing whether each individual triple has matched
> > all the constraints of the shape, in order to decide whether to output it?
>
> I think you are right. You'd get a bigger set than the triples you
> actually want. You can probably use the validation result to filter out
> the triples that cause violations - although I am not positive it will
> work in every instance. I'd try, though.
>


-- 

*Thomas Francart* - *SPARNA*
Web of *data* | *Information* architecture | Access to *knowledge*
blog: blog.sparna.fr, site: sparna.fr, linkedin:
fr.linkedin.com/in/thomasfrancart
tel: +33 (0)6.71.11.25.97, skype: francartthomas


Re: SHACL-based data extraction from a knowledge graph

2022-03-09 Thread Florian Kleedorfer

On 2022-03-09 13:22, Thomas Francart wrote:

I think you could do it with Jena. Load the data into a Graph, then get
the focus nodes for all shapes you want using VLib.focusNodes. Evaluate
each shape on its focus nodes and compile the intersection of all focus
nodes that are valid, along with the shapes. Now evaluate the shapes
again on these valid focus nodes and record all the triples/quads that
are pulled from the data graph during evaluation.



But does this guarantee that all triples pulled from the data graph are
valid triples?
For example I may have

ex:myConcept skos:prefLabel "english label"@en, "german label"@de .

And my SHACL would specify a shape that mandates English:

ex:MyShape a sh:NodeShape ;
  sh:property [
    sh:path skos:prefLabel ;
    sh:languageIn ("en") ;
  ] .

In that case, would only the skos:prefLabel with an English language tag
be pulled from the graph?

My hypothesis is that the triples pulled from the graph are the ones
whose predicate is given by the shape's sh:path, but this does not
guarantee that the triple is valid.
Wouldn't this require knowing whether each individual triple has matched
all the constraints of the shape, in order to decide whether to output it?


I think you are right. You'd get a bigger set than the triples you 
actually want. You can probably use the validation result to filter out 
the triples that cause violations - although I am not positive it will 
work in every instance. I'd try, though.
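
For example, something along these lines (a sketch only; "recorded" stands
for the hypothetical set of triples captured while evaluating the shapes):

import java.util.Set;
import java.util.stream.Collectors;
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.shacl.ShaclValidator;
import org.apache.jena.shacl.Shapes;
import org.apache.jena.shacl.ValidationReport;

static Set<Triple> dropViolating(Graph dataGraph, Graph shapesGraph, Set<Triple> recorded) {
    ValidationReport report = ShaclValidator.get().validate(Shapes.parse(shapesGraph), dataGraph);
    // Focus nodes that produced at least one violation.
    Set<Node> offending = report.getEntries().stream()
            .map(entry -> entry.focusNode())
            .collect(Collectors.toSet());
    // Coarse filter: removes every triple of an offending subject, not just
    // the individual value triple that violated - hence "not positive it
    // will work in every instance".
    recorded.removeIf(triple -> offending.contains(triple.getSubject()));
    return recorded;
}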


Re: SHACL-based data extraction from a knowledge graph

2022-03-09 Thread Thomas Francart
Thanks Florian! I am following up the conversation on the Jena mailing list.

On Wed, 9 Mar 2022 at 00:56, Florian Kleedorfer <
florian.kleedor...@austria.fm> wrote:

> I think you could do it with Jena. Load the data into a Graph, then get
> the focus nodes for all shapes you want using VLib.focusNodes. Evaluate
> each shape on its focus nodes and compile the intersection of all focus
> nodes that are valid, along with the shapes. Now evaluate the shapes again
> on these valid focus nodes and record all the triples/quads that are pulled
> from the data graph during evaluation.
>

But does this guarantee that all triples pulled from the data graph are
valid triples?
For example I may have

ex:myConcept skos:prefLabel "english label"@en, "german label"@de .

And my SHACL would specify a shape that mandates English:

ex:MyShape a sh:NodeShape ;
  sh:property [
    sh:path skos:prefLabel ;
    sh:languageIn ("en") ;
  ] .

In that case, would only the skos:prefLabel with an English language tag
be pulled from the graph?

My hypothesis is that the triples pulled from the graph are the ones
whose predicate is given by the shape's sh:path, but this does not
guarantee that the triple is valid.
Wouldn't this require knowing whether each individual triple has matched
all the constraints of the shape, in order to decide whether to output it?

Thanks again!
Thomas


> That last bit requires you to wrap the original data graph object in a
> custom class extending the Graph class in such a way that you intercept all
> reading calls and store the result triples in an internal set before
> handing them back to the client.
>
> After the second evaluation of only the valid focus nodes you should have
> your desired extraction result in the wrapper graph.
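
Something like this, perhaps (a sketch assuming Jena's
org.apache.jena.sparql.graph.GraphWrapper as the base class; the class
name is illustrative and this is untested):

import java.util.HashSet;
import java.util.Set;
import org.apache.jena.graph.Graph;
import org.apache.jena.graph.Node;
import org.apache.jena.graph.Triple;
import org.apache.jena.sparql.graph.GraphWrapper;
import org.apache.jena.util.iterator.ExtendedIterator;

/** Records every triple the validator pulls out of the wrapped graph. */
class RecordingGraph extends GraphWrapper {
    private final Set<Triple> recorded = new HashSet<>();

    RecordingGraph(Graph base) { super(base); }

    @Override
    public ExtendedIterator<Triple> find(Node s, Node p, Node o) {
        // Remember each triple as it is handed back to the caller.
        return super.find(s, p, o).mapWith(t -> { recorded.add(t); return t; });
    }

    @Override
    public ExtendedIterator<Triple> find(Triple match) {
        return find(match.getSubject(), match.getPredicate(), match.getObject());
    }

    public Set<Triple> getRecorded() { return recorded; }
}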
>
> I may be wrong about this approach, but it might just work. If you try
> this and succeed, please consider contributing the code to Jena. It's not
> the first time this question has come up.
>
> All the best!
> Florian
>
>
> On 8 March 2022 at 18:25:13 CET, Thomas Francart <
> thomas.franc...@sparna.fr> wrote:
>>
>> Hello!
>>
>> I am facing the following situation:
>>
>>- A large knowledge graph with lots of triples
>>- A need to export multiple RDF datasets from this large knowledge
>>graph, each containing a subset of the triples from the graph
>>- Datasets are not limited to a flat list of entities with their
>>properties, but will each contain a small piece of graph
>>- The exact content of each dataset is specified in SHACL, using
>>standard constraints of cardinalities, sh:node, datatype, languageIn,
>>sh:hasValue, etc. This SHACL will be used as the source for documenting
>>the exact content of each dataset using [1]
>>
>> And now the question: can we automate the extraction of data from the
>> large knowledge graph based on the SHACL definition of our datasets?
>> What we are looking for is a guarantee that the extraction process will
>> produce a dataset that is conformant with the SHACL definition.
>>
>> Has anyone done something similar? A naïve approach would be SPARQL
>> query generation based on the SHACL definition of the dataset, but I
>> suspect the query would quickly become too complicated.
>>
>> Thanks!
>> Thomas
>>
>> [1] SHACL Play documentation generator:
>> https://shacl-play.sparna.fr/play/doc
>>
>> --
> This message was sent from my Android device with K-9 Mail.
>


-- 

*Thomas Francart* - *SPARNA*
Web of *data* | *Information* architecture | Access to *knowledge*
blog: blog.sparna.fr, site: sparna.fr, linkedin:
fr.linkedin.com/in/thomasfrancart
tel: +33 (0)6.71.11.25.97, skype: francartthomas


Re: [4.3.2] Cannot invoke "org.apache.jena.rdf.model.Property.asNode()" because "org.apache.jena.vocabulary.RDF.type" is null

2022-03-09 Thread Andy Seaborne




On 09/03/2022 11:16, Martynas Jusevičius wrote:

Hi,

This appeared after upgrading Java from 11 to 17:

WARN LocationMapper:188 - Error in configuration file: Cannot invoke
"org.apache.jena.rdf.model.Property.asNode()" because
"org.apache.jena.vocabulary.RDF.type" is null


Maybe init-related ... it depends on when it happened in the app.

Always good to call JenaSystem.init() before any Jena code is touched, if
you can. It makes the whole thing deterministic.
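
For example (the class name is illustrative):

import org.apache.jena.sys.JenaSystem;

public class App {
    static {
        // Initialize Jena explicitly before any constants such as
        // RDF.type are touched by application code.
        JenaSystem.init();
    }

    public static void main(String[] args) {
        // ... code that uses Jena ...
    }
}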



I was looking at the LocationMapper code, but line 188 does not
contain anything like that:
https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/util/LocationMapper.java#L188


Wrong LocationMapper?

Look at any stacktraces.

Run 4.4.0.



What is the cause and does this need to be addressed?

Martynas


[4.3.2] Cannot invoke "org.apache.jena.rdf.model.Property.asNode()" because "org.apache.jena.vocabulary.RDF.type" is null

2022-03-09 Thread Martynas Jusevičius
Hi,

This appeared after upgrading Java from 11 to 17:

WARN LocationMapper:188 - Error in configuration file: Cannot invoke
"org.apache.jena.rdf.model.Property.asNode()" because
"org.apache.jena.vocabulary.RDF.type" is null

I was looking at the LocationMapper code, but line 188 does not
contain anything like that:
https://github.com/apache/jena/blob/main/jena-core/src/main/java/org/apache/jena/util/LocationMapper.java#L188

What is the cause and does this need to be addressed?

Martynas


Streaming JSON RowSets (JENA-2302)

2022-03-09 Thread Claus Stadler

Dear all,


I want to inform you of an active PR that makes RowSets over
application/sparql-results+json streaming.



JIRA: https://issues.apache.org/jira/projects/JENA/issues/JENA-2302

PR: https://github.com/apache/jena/pull/1218


As JSON is nowadays the default content type used in Jena for SPARQL
results, this PR aims to ease working with large SPARQL result sets by
making streaming work out of the box.


The implementation used by Jena so far loaded JSON SPARQL result sets
into memory first.



The JSON format itself allows for repeated keys (where the last one takes
precedence), and keys may appear in any order. These things introduce a
certain variety in how SPARQL result sets can be represented, and that
needs to be handled correctly by the implementation.
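
For example, both documents below encode the same result set; a streaming
parser has to cope with "results" arriving before "head":

{
  "head": { "vars": [ "x" ] },
  "results": { "bindings": [
    { "x": { "type": "uri", "value": "http://example/a" } }
  ] }
}

{
  "results": { "bindings": [
    { "x": { "type": "uri", "value": "http://example/a" } }
  ] },
  "head": { "vars": [ "x" ] }
}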



While the new implementation already passes all existing Jena tests, there
is still a risk of breaking existing code that relies on certain behavior
of the non-streaming approach.



Therefore, if you think this change might (negatively) affect you, please
provide feedback on the proposed PR.



Best regards,

Claus Stadler

--
Dipl. Inf. Claus Stadler
Institute of Applied Informatics (InfAI) / University of Leipzig
Workpage & WebID: http://aksw.org/ClausStadler



RE: Question about Jena capabilities/requirements

2022-03-09 Thread Goławski, Paweł
Most of the data being stored could be treated as dictionaries, so updates will
be rather big but rare.
The number of simultaneous reads could be a few hundred, maybe.
Queries will also need the Micro OWL Reasoner (at least), and acceptable
response time is crucial.
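
In Jena terms, presumably the OWL Micro rule reasoner, e.g. (a minimal
sketch, assuming in-memory data):

import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Wrap the base data in Jena's OWL Micro rule reasoner.
Model base = ModelFactory.createDefaultModel();
OntModel inf = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, base);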

> Hi Paweł,
>
> The amount of CPU is determined more by the number of concurrent users or other
> services. Even if a system is supporting thousands of users, the number actually
> active at any given moment is much lower.
>
> The kinds of applications using the system influence the complexity of the
> queries as well.
>
> Another factor is the data growth - is it a continuous stream of small 
> updates or a few big updates?
>
> Fuseki can scale for performance with RDF-delta:
> https://afs.github.io/rdf-delta/
>
> Andy

> On 07/03/2022 19:52, Rinor Sefa wrote:
>> Hi Pawel,
>>
>> What would the other requirements for such a system be? You mentioned memory
>> scaling and response time. I ask because Fuseki, like any database, might meet
>> some of these requirements but not others. Knowing exactly what your
>> requirements are will help you determine whether Fuseki can be used.
>>
>>

