How to optimize TDB disk storage?

2022-01-26 Thread Vinay Mahamuni
Hello,

I am using Apache Jena v4.3.2 with Fuseki and TDB2 persistent disk storage,
and I connect to the Fuseki server through Jena's RDFConnection. I send 50k
triples in one update. This is mostly new data (only a few triples will
match existing data). The data are instances based on an ontology.
Please have a look at the attached file, which shows how much disk usage
increases with each update. For 1.5 million triples, it took around 1.2GB.
We want to store a few billion triples, so this bytes/triple
ratio won't be good for our use case.

When I used the tdb2.tdbcompact tool, the data volume shrank to 400MB.
But this extra step has to be performed manually to optimise the storage.
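
One way the manual step could be automated, offered as a sketch rather than a tested recipe: Fuseki's HTTP administration protocol lists a compaction operation, POST /$/compact/{name}. The base URL http://localhost:3030 and the dataset name "ds" below are assumptions, the helper name is hypothetical, and the admin endpoints must be enabled and reachable for this to work. The code only builds the request; it does not send it.

```python
from urllib.request import Request


def build_compact_request(base_url: str, dataset: str) -> Request:
    """Build (but do not send) a POST to Fuseki's admin compaction endpoint.

    base_url and dataset are placeholders chosen for illustration.
    """
    url = f"{base_url.rstrip('/')}/$/compact/{dataset.strip('/')}"
    # An empty body with method="POST" is enough to express the request.
    return Request(url, data=b"", method="POST")


# Assumed values: a local Fuseki on the default port and a dataset named "ds".
req = build_compact_request("http://localhost:3030", "ds")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen from a scheduled job would trigger the compaction on a schedule instead of by hand.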

My questions are as follows:

   1. Why do 30 update queries of 50k triples each take 3 times more disk
   space than a single update query of 1500k triples? The data stored is the
   same, but more space is consumed in the first case.
   2. Is there any other way to solve this disk space problem?
   3. What existing strategies can be used to optimise storage
   while writing data?
   4. Is there any new development under way to use less disk space for
   write/update queries?


Thanks,
Vinay Mahamuni
triples (thousands),disk increase (MB),total disk used (MB)
0,0,201.3
57.04,1.24,202.54
106.95,0.58,203.12
156.86,0.59,203.71
206.77,17.36,221.07
256.68,25.75,246.82
306.59,8.97,255.79
356.50,17.37,273.16
406.41,25.75,298.91
456.32,8.97,307.88
506.23,34.14,342.02
556.14,25.75,367.77
606.05,25.75,393.52
655.96,25.75,419.27
705.87,42.53,461.8
755.78,34.14,495.94
805.69,34.14,530.08
855.60,34.14,564.22
905.51,34.14,598.36
955.42,50.91,649.27
1005.33,42.53,691.8
1055.24,50.92,742.72
1105.15,42.53,785.25
1155.06,50.91,836.16
1204.97,50.92,887.08
1254.88,50.92,938
1304.79,59.3,997.3
1354.70,50.92,1048.22
1404.61,50.91,1099.13
1454.52,67.7,1166.83
1504.43,59.3,1226.13
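
The bytes/triple ratio mentioned in the message can be worked out from the figures above. This is a rough sketch: it assumes the size columns are in megabytes with 1 MB = 10**6 bytes, which is inferred from the "1.2GB for 1.5 million triples" remark and not stated in the attachment.

```python
# Figures copied from the message and the last row of the attached table.
# Assumption: sizes are in megabytes, 1 MB = 10**6 bytes.
BASELINE_MB = 201.3        # total disk used at 0 triples
FINAL_MB = 1226.13         # total disk used after the last update
TRIPLES = 1504.43 * 1000   # triples loaded by the last row
COMPACTED_MB = 400.0       # size reported after tdb2.tdbcompact


def bytes_per_triple(size_mb: float, triples: float) -> float:
    """Convert a size in MB into an average number of bytes per triple."""
    return size_mb * 1_000_000 / triples


growth = bytes_per_triple(FINAL_MB - BASELINE_MB, TRIPLES)  # growth only
total = bytes_per_triple(FINAL_MB, TRIPLES)                 # including baseline
compacted = bytes_per_triple(COMPACTED_MB, TRIPLES)         # after compaction

print(f"growth:    {growth:.0f} bytes/triple")
print(f"total:     {total:.0f} bytes/triple")
print(f"compacted: {compacted:.0f} bytes/triple")
```

Under that assumption the uncompacted growth comes to roughly 680 bytes per triple, against roughly 265 bytes per triple after compaction.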

RE: FW: Validation issue

2022-01-26 Thread Alani, Yasir
*The node to be validated is "yas:Car_1"

Thanks again

-Original Message-
From: Alani, Yasir  
Sent: 26 January 2022 22:38
To: users@jena.apache.org
Subject: RE: FW: Validation issue

Hi Andy,

Thank you for responding.

What I meant by "work" is producing the correct/expected result, which is the 2 
violations you received. The issue only seems to happen when I try to validate 
a node using the method "validate(Shapes shapes, Graph data, Node node)". The 
node in this case is "aso:Car_1". This method doesn't seem to 'like' qualified 
value shapes, as it returns 0 violations. 

However, when I validate the entire data graph using method "validate(Shapes 
shapes, Graph data)" I do get 2 violations. Are these methods designed to work 
this way?

I was using Jena 4.1.0 and even when I updated to Jena 4.3.2, I still get the 
same results.

I used the old (TQ) shacl playground, which returned 2 violations as well.

Many thanks

Yasir

-Original Message-
From: Andy Seaborne 
Sent: 26 January 2022 20:41
To: users@jena.apache.org
Subject: Re: FW: Validation issue

Yasir,

What is "work" here?

I get 2 violations with Jena 4.3.2 - the same as (current latest) TQ shacl. (1 
violation on a quite old, pre 4.x.x version of Jena - which version are you 
running?)

Which SHACL playground did you use (there are two).

The old (TopQuadrant) SHACL playground returns 2 violations.
The Zazuko one returns 2 violations.

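@prefix ex:   <http://www.example.org/#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix yas:  <http://www.semanticweb.org/yas#> .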

[ rdf:type sh:ValidationReport ;
   sh:conforms  false ;
   sh:result[
   rdf:type  sh:ValidationResult ;
   sh:focusNode  yas:Wheel_1 ;
   sh:resultMessage
  "Data value \"600\"^^xsd:float is not less than or equal to 200" ;
   sh:resultPath ( yas:hasDiameter yas:hasValue ) ;
   sh:resultSeverity sh:Violation ;
   sh:sourceConstraintComponent  sh:MaxInclusiveConstraintComponent ;
   sh:sourceShape[]  ;
   sh:value  "600"^^xsd:float
   ] ;
   sh:result[
   rdf:type  sh:ValidationResult ;
   

RE: FW: Validation issue

2022-01-26 Thread Alani, Yasir
Hi Andy,

Thank you for responding.

What I meant by "work" is producing the correct/expected result, which is the 2 
violations you received. The issue only seems to happen when I try to validate 
a node using the method "validate(Shapes shapes, Graph data, Node node)". The 
node in this case is "aso:Car_1". This method doesn't seem to 'like' qualified 
value shapes, as it returns 0 violations. 

However, when I validate the entire data graph using method "validate(Shapes 
shapes, Graph data)" I do get 2 violations. Are these methods designed to work 
this way?

I was using Jena 4.1.0 and even when I updated to Jena 4.3.2, I still get the 
same results.

I used the old (TQ) shacl playground, which returned 2 violations as well.

Many thanks

Yasir

-Original Message-
From: Andy Seaborne  
Sent: 26 January 2022 20:41
To: users@jena.apache.org
Subject: Re: FW: Validation issue

Yasir,

What is "work" here?

I get 2 violations with Jena 4.3.2 - the same as (current latest) TQ shacl. (1 
violation on a quite old, pre 4.x.x version of Jena - which version are you 
running?)

Which SHACL playground did you use (there are two).

The old (TopQuadrant) SHACL playground returns 2 violations.
The Zazuko one returns 2 violations.

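@prefix ex:   <http://www.example.org/#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix yas:  <http://www.semanticweb.org/yas#> .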

[ rdf:type sh:ValidationReport ;
   sh:conforms  false ;
   sh:result[
   rdf:type  sh:ValidationResult ;
   sh:focusNode  yas:Wheel_1 ;
   sh:resultMessage
  "Data value \"600\"^^xsd:float is not less than or equal to 200" ;
   sh:resultPath ( yas:hasDiameter yas:hasValue ) ;
   sh:resultSeverity sh:Violation ;
   sh:sourceConstraintComponent  sh:MaxInclusiveConstraintComponent ;
   sh:sourceShape[]  ;
   sh:value  "600"^^xsd:float
   ] ;
   sh:result[
   rdf:type  sh:ValidationResult ;
   sh:focusNode  yas:Car_1 ;
   sh:resultMessage
"QualifiedValueShape[1,_,false]: Min = 1 but got 0 validations" ;
   sh:resultPath yas:hasComponent ;

Re: FW: Validation issue

2022-01-26 Thread Andy Seaborne

Yasir,

What is "work" here?

I get 2 violations with Jena 4.3.2 - the same as (current latest) TQ 
shacl. (1 violation on a quite old, pre 4.x.x version of Jena - which 
version are you running?)


Which SHACL playground did you use (there are two).

The old (TopQuadrant) SHACL playground returns 2 violations.
The Zazuko one returns 2 violations.

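@prefix ex:   <http://www.example.org/#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix yas:  <http://www.semanticweb.org/yas#> .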

[ rdf:type sh:ValidationReport ;
  sh:conforms  false ;
  sh:result[
  rdf:type  sh:ValidationResult ;
  sh:focusNode  yas:Wheel_1 ;
  sh:resultMessage
 "Data value \"600\"^^xsd:float is not less than or equal to 200" ;
  sh:resultPath ( yas:hasDiameter yas:hasValue ) ;
  sh:resultSeverity sh:Violation ;
  sh:sourceConstraintComponent  sh:MaxInclusiveConstraintComponent ;
  sh:sourceShape[]  ;
  sh:value  "600"^^xsd:float
  ] ;
  sh:result[
  rdf:type  sh:ValidationResult ;
  sh:focusNode  yas:Car_1 ;
  sh:resultMessage
"QualifiedValueShape[1,_,false]: Min = 1 but got 0 validations" ;
  sh:resultPath yas:hasComponent ;
  sh:resultSeverity sh:Violation ;
  sh:sourceConstraintComponent
 sh:QualifiedMinCountConstraintComponent ;
  sh:sourceShape[]
  ]
] .

Andy

On 26/01/2022 16:38, Alani, Yasir wrote:



From: Alani, Yasir
Sent: 26 January 2022 14:07
To: Andy Seaborne 
Subject: Validation issue

Hi, I have a shape that doesn’t seem to work with Jena (always conforms) but 
works when I try it on SHACL Playground.

The shape:
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:    <http://www.w3.org/ns/shacl#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:    <http://www.example.org/#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix yas:   <http://www.semanticweb.org/yas#> .

yas:CarShape a sh:NodeShape ;
    sh:targetClass yas:Car ;
    sh:property [
        sh:path yas:hasComponent ;
        sh:qualifiedValueShape [
            sh:targetClass yas:Wheel ;
            sh:property [
                sh:path ( yas:hasDiameter yas:hasValue ) ;
                sh:minInclusive 0 ;
                sh:maxInclusive 200 ;
                sh:datatype xsd:float ;
            ] ;
        ] ;
        sh:qualifiedMinCount 1
    ] .

The data:
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:    <http://www.w3.org/ns/shacl#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:    <http://www.example.org/#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix yas:   <http://www.semanticweb.org/yas#> .

yas:Car_1 rdf:type owl:NamedIndividual ,
yas:Car ;
   yas:hasComponent yas:Wheel_1 .

yas:Wheel_1 rdf:type owl:NamedIndividual ,
 yas:Wheel ;
yas:hasDiameter yas:Diameter2 .

yas:Diameter2 rdf:type owl:NamedIndividual ,
  yas:Diameter ;
 yas:hasValue "600"^^xsd:float .


I am using ShaclValidator method ==> validate(Shapes shapes, Graph data, Node node).

Please advise if you can.
Regards
Yasir
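
To make the expected result for yas:Car_1 concrete, here is a minimal sketch of the counting that sh:qualifiedValueShape with sh:qualifiedMinCount asks for, applied to the data in this message. It is plain Python over a hand-copied dictionary, not Jena's implementation; the function names are hypothetical, and the sh:datatype check is approximated by a Python float test.

```python
# Hand-copied subset of the data graph from the message:
# (subject, predicate) -> list of objects.
DATA = {
    ("yas:Car_1", "yas:hasComponent"): ["yas:Wheel_1"],
    ("yas:Wheel_1", "yas:hasDiameter"): ["yas:Diameter2"],
    ("yas:Diameter2", "yas:hasValue"): [600.0],
}


def values(node, path):
    """Follow a sequence path and return every value node reached."""
    frontier = [node]
    for predicate in path:
        frontier = [o for n in frontier for o in DATA.get((n, predicate), [])]
    return frontier


def conforms_to_wheel_shape(node):
    """Inner shape: every hasDiameter/hasValue value is a float in [0, 200]."""
    return all(
        isinstance(v, float) and 0 <= v <= 200
        for v in values(node, ["yas:hasDiameter", "yas:hasValue"])
    )


def qualified_min_count(focus, path, min_count):
    """Count value nodes that conform to the inner shape; compare with the minimum."""
    conforming = [v for v in values(focus, path) if conforms_to_wheel_shape(v)]
    return len(conforming) >= min_count, len(conforming)


ok, conforming_count = qualified_min_count("yas:Car_1", ["yas:hasComponent"], 1)
print(ok, conforming_count)
```

yas:Wheel_1 has a diameter value of 600, so no component conforms to the inner shape and the minimum of 1 is not met. That matches the "Min = 1 but got 0" message in the report quoted in this thread.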




Re: Trying to count the properties used for each class

2022-01-26 Thread Bob DuCharme

Thank you Martyna and Nicola. Both queries worked perfectly!

Bob


On 1/24/22 6:21 PM, Nicola Vitucci wrote:

Hey Bob,

does this one do what you're after?

SELECT DISTINCT ?cl (COUNT(DISTINCT ?p) AS ?c)
WHERE {
   ?s a ?cl .
   ?s ?p ?o .
}
GROUP BY ?cl

Nicola

On Mon, 24 Jan 2022 at 23:05, Bob DuCharme wrote:


Using arq and the data at
http://www.snee.com/bobdc.blog/files/BeatlesMusicians.ttl, I’m trying to
write a query that will list the classes used in the data and the number
of distinct properties used by instances of that class. I’m having a
hard time and can’t even write a query that lists the number of
properties used for just one of the classes; the following just shows me
a series of ones.

 SELECT (COUNT(DISTINCT ?p) AS ?pcount)
 WHERE {
?s a  .
?s ?p ?o .
 }
 GROUP BY ?p

Any suggestions?

Thanks,

Bob
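
The difference between the two queries in this thread can be imitated in a few lines of Python. This is only an illustration: the triples are made up (they are not taken from BeatlesMusicians.ttl) and the function names are hypothetical. Grouping by ?p puts each property in its own group, so COUNT(DISTINCT ?p) is 1 for every group; grouping by the class collects all of that class's properties into one group.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object) for illustration only.
TRIPLES = [
    ("ex:john", "rdf:type", "ex:Musician"),
    ("ex:john", "ex:name", "John"),
    ("ex:john", "ex:plays", "ex:guitar"),
    ("ex:paul", "rdf:type", "ex:Musician"),
    ("ex:paul", "ex:name", "Paul"),
    ("ex:guitar", "rdf:type", "ex:Instrument"),
    ("ex:guitar", "ex:name", "guitar"),
]


def classes_of(triples):
    """Map each subject to the set of classes it is declared an instance of."""
    result = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            result[s].add(o)
    return result


def count_grouped_by_property(triples, cls):
    """Imitates GROUP BY ?p: one group per property, each holding one distinct ?p."""
    types = classes_of(triples)
    groups = defaultdict(set)
    for s, p, o in triples:
        if cls in types.get(s, ()):
            groups[p].add(p)
    return {p: len(members) for p, members in groups.items()}


def count_grouped_by_class(triples):
    """Imitates GROUP BY ?cl: one group per class, counting its distinct properties."""
    types = classes_of(triples)
    groups = defaultdict(set)
    for s, p, o in triples:
        for cls in types.get(s, ()):
            groups[cls].add(p)
    return {cls: len(props) for cls, props in groups.items()}


print(count_grouped_by_property(TRIPLES, "ex:Musician"))
print(count_grouped_by_class(TRIPLES))
```

Note that the per-class count includes rdf:type itself, because the pattern ?s ?p ?o also matches the type triples.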




FW: Validation issue

2022-01-26 Thread Alani, Yasir


From: Alani, Yasir
Sent: 26 January 2022 14:07
To: Andy Seaborne 
Subject: Validation issue

Hi, I have a shape that doesn’t seem to work with Jena (always conforms) but 
works when I try it on SHACL Playground.

The shape:
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:    <http://www.w3.org/ns/shacl#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:    <http://www.example.org/#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix yas:   <http://www.semanticweb.org/yas#> .

yas:CarShape a sh:NodeShape ;
    sh:targetClass yas:Car ;
    sh:property [
        sh:path yas:hasComponent ;
        sh:qualifiedValueShape [
            sh:targetClass yas:Wheel ;
            sh:property [
                sh:path ( yas:hasDiameter yas:hasValue ) ;
                sh:minInclusive 0 ;
                sh:maxInclusive 200 ;
                sh:datatype xsd:float ;
            ] ;
        ] ;
        sh:qualifiedMinCount 1
    ] .

The data:
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:    <http://www.w3.org/ns/shacl#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:    <http://www.example.org/#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix yas:   <http://www.semanticweb.org/yas#> .

yas:Car_1 rdf:type owl:NamedIndividual ,
   yas:Car ;
  yas:hasComponent yas:Wheel_1 .

yas:Wheel_1 rdf:type owl:NamedIndividual ,
yas:Wheel ;
   yas:hasDiameter yas:Diameter2 .

yas:Diameter2 rdf:type owl:NamedIndividual ,
 yas:Diameter ;
yas:hasValue "600"^^xsd:float .


I am using ShaclValidator method ==> validate(Shapes shapes, Graph data, Node node).

Please advise if you can.
Regards
Yasir




Re: Trying to count the properties used for each class

2022-01-26 Thread Nicola Vitucci
Thanks Alasdair,

This looks really useful. I haven't seen this query though - have I missed
it or were you not specifically referring to this?

Nicola


On Tue, 25 Jan 2022 at 10:21, Gray, Alasdair wrote:

> We defined a lot of useful statistics queries for datasets in §6.6 of the
> W3C HCLS Dataset Description Guidelines
> https://www.w3.org/TR/hcls-dataset/#s6_6
>
> I’ve made these available in a GitHub repo
> https://github.com/AlasdairGray/HCLS-Stats-Queries
>
> Hopefully you find these helpful
>
> Alasdair
> --
> Alasdair J G Gray
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
>
> Email: a.j.g.g...@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
>
> From: Nicola Vitucci 
> Date: Monday, 24 January 2022 at 23:21
> To: users@jena.apache.org 
> Subject: Re: Trying to count the properties used for each class
>
>
> Hey Bob,
>
> does this one do what you're after?
>
> SELECT DISTINCT ?cl (COUNT(DISTINCT ?p) AS ?c)
> WHERE {
>   ?s a ?cl .
>   ?s ?p ?o .
> }
> GROUP BY ?cl
>
> Nicola
>
> On Mon, 24 Jan 2022 at 23:05, Bob DuCharme wrote:
>
> > Using arq and the data at
> > http://www.snee.com/bobdc.blog/files/BeatlesMusicians.ttl, I’m trying to
> > write a query that will list the classes used in the data and the number
> > of distinct properties used by instances of that class. I’m having a
> > hard time and can’t even write a query that lists the number of
> > properties used for just one of the classes; the following just shows me
> > a series of ones.
> >
> > SELECT (COUNT(DISTINCT ?p) AS ?pcount)
> > WHERE {
> >?s a  .
> >?s ?p ?o .
> > }
> > GROUP BY ?p
> >
> > Any suggestions?
> >
> > Thanks,
> >
> > Bob
> >
> >