Re: Turtle-star formatted output with annotation syntax?

2022-02-09 Thread Andy Seaborne




On 09/02/2022 16:14, Shaw, Ryan wrote:

Jena riot handles parsing Turtle-star annotation syntax, but AFAICT Turtle-star 
output always uses quoted triples. Are there any plans to support the 
annotation syntax for output?


More an aspiration.

A plan is an aspiration and a timescale :-)

Andy



Thanks,
Ryan


Re: Configure fuseki-server with geosparql assembler

2022-02-09 Thread Andy Seaborne

Hi Erik,

The jena-geosparql code isn't there by default.

Unfortunately, just adding it to the standard build is messy.

All users would see several start-up warnings, even if the 
GeoSPARQL code isn't being used.


At the moment, add jena-geosparql and also the dependencies it needs. 
Setting up Maven/Gradle is the robust way to do it, because then you only 
need to declare jena-geosparql; otherwise you also need to add:


io.github.galbiston:expiring-map
javax.xml.bind:jaxb-api
org.apache.sis.core:sis-referencing
org.slf4j:jul-to-slf4j
org.locationtech.jts:jts-core
org.jdom:jdom2
org.apache.commons:commons-collections4


We hope to have a version of the Fuseki UI for query and data upload, 
without the admin controls, which can be added to any Fuseki Main based 
server (jena-fuseki-geosparql is Fuseki Main based). That is work to be 
done.


Andy

On 09/02/2022 13:33, Erik Bijsterbosch wrote:

Hi,

When embedding geosparql as an assembler in my *fuseki-server* configuration
I get the following error at startup in my docker log:

  ⠿ Container fuseki-1  Removed0.9s
[+] Running 1/1
  ⠿ Container labs-services-fuseki-1  Created  0.1s
Attaching to fuseki-1
fuseki-1  | /opt/java-minimal/bin/java -Xmx1048m -Xms1048m -jar
/fuseki/jena-fuseki-server-4.4.0.jar --conf=config-fuseki.ttl
fuseki-1  | the root file:///fuseki/config-fuseki.ttl#geo_ds has no most
specific type that is a subclass of ja:Object
fuseki-1 exited with code 1

I stripped config-fuseki.ttl to the bare example as follows:

PREFIX :  <#>
PREFIX fuseki:
PREFIX rdf:   
PREFIX rdfs:  
PREFIX ja:
PREFIX tdb2:  
PREFIX geosparql: 

<#service> rdf:type fuseki:Service;
fuseki:name "dst";
fuseki:endpoint [ fuseki:operation fuseki:query; ] ;
fuseki:dataset <#geo_dst> .

<#geo_dst> rdf:type geosparql:geosparqlDataset ;
   geosparql:spatialIndexFile "databases/DB2/spatial.index";
   geosparql:dataset <#baseDataset> ;
.

<#baseDataset> rdf:type tdb2:DatasetTDB2 ;
tdb2:location "databases/DB2"

What could be wrong here and what else needs to be done for a proper
*fuseki-geosparql-server* setup?

Regards.
Erik



Re: Disabling BNode UID generation

2022-02-09 Thread Andrii Berezovskyi
Ryan,

Here is an example of how we use it in JUnit: 
https://github.com/eclipse/lyo/blob/aa3b18e4f28f3960d3a86a0b54151dccec2f139f/core/oslc4j-jena-provider/src/test/java/org/eclipse/lyo/oslc4j/provider/jena/JenaModelHelperTest.java#L64

And here is an AssertJ helper we wrote: 
https://github.com/eclipse/lyo/blob/aa3b18e4f28f3960d3a86a0b54151dccec2f139f/core/oslc4j-jena-provider/src/test/java/org/eclipse/lyo/oslc4j/provider/jena/helpers/JenaAssert.java

/Andrew

On 2022-02-09, 17:10, "Shaw, Ryan"  wrote:

Thank you, Andy. 

I agree that working on the triple level is the correct way to approach 
this. I was looking for something quick and dirty that would work with textual 
diffing by a VCS, hence my focus on the blank node labels.

Are there any examples of how to use the isomorphism utilities in Jena?

> On Feb 5, 2022, at 12:48 PM, Andy Seaborne  wrote:
> 
> 
> 
> On 04/02/2022 19:09, Shaw, Ryan wrote:
>> Hello,
>> I am trying to experiment with generating diffable N-Triples or flat
>> Turtle files.
> ...
>> Thanks,
>> Ryan
> 
> 
> Info: There is work on a charter for
> 
> "RDF Dataset Canonicalization and Hash Working Group"
> 
> https://w3c.github.io/rch-wg-charter/
> 
> The end of section 1 has some links to related work.
> 
> Given RDF is inherently unordered, canonicalization and "diff of triples"
> are related.
> 
> 
> For diff-able files, what counts as "different" between two files?
> 
> Instead of changing the bnode algorithm, have you considered making use
> of bnode-isomorphism? That is, during a diff, maintain a growing mapping from
> bnodes in one list of triples to bnodes in the other list?
> Iso.isomorphicTriples
> 
> (The list being the triples in encounter order during parsing). It is
> working not so much on the syntax as the abstraction of triples. E.g. a Turtle
> file and an NT file produced by parsing the TTL file can be defined to be "the
> same".
> 
> It's fairly portable across files generated by other systems as well
> except for Turtle lists - Jena has a fixed order for triple generation for a
> list but it isn't necessarily the same for all systems.
> 
> Jena's Turtle algorithm, which is in LangTurtleBase, generates in list
> order, with rdf:first, then rdf:rest; the triple referencing the list
> appears after the list. It happens to be the way the spec explains it:
>   https://www.w3.org/TR/turtle/#sec-parsing-triples
> but that is defining the outcome and isn't a requirement.
> 
>Andy




Re: Disabling BNode UID generation

2022-02-09 Thread Andy Seaborne




On 09/02/2022 16:09, Shaw, Ryan wrote:

Thank you, Andy.

I agree that working on the triple level is the correct way to approach this. I 
was looking for something quick and dirty that would work with textual diffing 
by a VCS, hence my focus on the blank node labels.

Are there any examples of how to use the isomorphism utilities in Jena?


See the code - the isomorphism code takes two groups of triples in 
various grouping forms and returns true or false.  You'll probably want 
to look at how it does it and build similar for your use case to get a 
diff of triples.
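
As an illustration only, here is a minimal sketch of the "growing mapping" idea from the quoted message: walk two triple lists in encounter order and pair up blank nodes as they are met. It is not Jena's isomorphism code and does not use any Jena API. The class name, the String[] triple representation and the "_:" blank node convention are all assumptions made for the example, and the comparison is positional and greedy rather than a full graph isomorphism test.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: positional diff of two triple lists, tolerant of bnode relabelling. */
final class BnodeListDiff {

    /** Assumption: blank nodes are written with a "_:" prefix, as in N-Triples. */
    static boolean isBlank(String term) {
        return term.startsWith("_:");
    }

    /**
     * Compares one triple from each list (subject, predicate, object as strings).
     * New bnode pairings are recorded in fwd/rev; if the triple as a whole does
     * not match, pairings added for this triple are rolled back.
     */
    static boolean matchTriple(String[] t1, String[] t2,
                               Map<String, String> fwd, Map<String, String> rev) {
        List<String> added = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            String a = t1[i], b = t2[i];
            boolean ok;
            if (isBlank(a) && isBlank(b)) {
                String mappedA = fwd.get(a);
                String mappedB = rev.get(b);
                if (mappedA == null && mappedB == null) {
                    // First time either bnode is seen: grow the mapping.
                    fwd.put(a, b);
                    rev.put(b, a);
                    added.add(a);
                    ok = true;
                } else {
                    // At least one is already paired: the pairing must agree both ways.
                    ok = b.equals(mappedA) && a.equals(mappedB);
                }
            } else {
                // IRIs and literals (or a bnode against a non-bnode) compare as text.
                ok = a.equals(b);
            }
            if (!ok) {
                for (String k : added) rev.remove(fwd.remove(k));
                return false;
            }
        }
        return true;
    }

    /** Returns the index of the first differing triple, or -1 if the lists match. */
    static int firstDifference(List<String[]> left, List<String[]> right) {
        Map<String, String> fwd = new HashMap<>();
        Map<String, String> rev = new HashMap<>();
        int n = Math.min(left.size(), right.size());
        for (int i = 0; i < n; i++) {
            if (!matchTriple(left.get(i), right.get(i), fwd, rev)) return i;
        }
        return left.size() == right.size() ? -1 : n;
    }
}
```

Because the comparison is positional, it only helps when both files list their triples in the same order; a real diff would also need to report insertions and deletions.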


Andy



On Feb 5, 2022, at 12:48 PM, Andy Seaborne  wrote:



On 04/02/2022 19:09, Shaw, Ryan wrote:

Hello,
I am trying to experiment with generating diffable N-Triples or flat Turtle 
files.

...

Thanks,
Ryan



Info: There is work on a charter for

"RDF Dataset Canonicalization and Hash Working Group"

https://w3c.github.io/rch-wg-charter/

The end of section 1 has some links to related work.

Given RDF is inherently unordered, canonicalization and "diff of triples" are 
related.


For diff-able files, what counts as "different" between two files?

Instead of changing the bnode algorithm, have you considered making use of 
bnode-isomorphism? That is, during a diff, maintain a growing mapping from 
bnodes in one list of triples to bnodes in the other list?
Iso.isomorphicTriples

(The list being the triples in encounter order during parsing). It is working not so much 
on the syntax as the abstraction of triples. E.g. a Turtle file and an NT file produced by 
parsing the TTL file can be defined to be "the same".

It's fairly portable across files generated by other systems as well except for 
Turtle lists - Jena has a fixed order for triple generation for a list but it 
isn't necessarily the same for all systems.

Jena's Turtle algorithm, which is in LangTurtleBase, generates in list order, 
with rdf:first, then rdf:rest; the triple referencing the list appears 
after the list. It happens to be the way the spec explains it:
   https://www.w3.org/TR/turtle/#sec-parsing-triples
but that is defining the outcome and isn't a requirement.
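
To make the ordering concrete, the sketch below emits N-Triples-style lines for a Turtle list in the order described above: rdf:first then rdf:rest for each cell, with the triple that references the list last. It is a standalone illustration and not the LangTurtleBase code. The class name and the "_:c0", "_:c1" cell labels are made up for the example, and a real parser allocates its own blank node labels.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of list-order triple generation for a Turtle list. */
final class ListTripleOrder {
    static final String RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** Emits triples for the Turtle statement: subject predicate ( items... ) . */
    static List<String> triplesForList(String subject, String predicate, List<String> items) {
        List<String> out = new ArrayList<>();
        if (items.isEmpty()) {
            // The empty list is just rdf:nil; there are no cells to generate.
            out.add(subject + " " + predicate + " <" + RDF + "nil> .");
            return out;
        }
        for (int i = 0; i < items.size(); i++) {
            String cell = "_:c" + i;
            String rest = (i + 1 < items.size()) ? "_:c" + (i + 1) : "<" + RDF + "nil>";
            // For each cell: rdf:first, then rdf:rest.
            out.add(cell + " <" + RDF + "first> " + items.get(i) + " .");
            out.add(cell + " <" + RDF + "rest> " + rest + " .");
        }
        // The triple referencing the list comes after the list itself.
        out.add(subject + " " + predicate + " _:c0 .");
        return out;
    }
}
```

A system that emitted the referencing triple first, or all rdf:rest triples before the rdf:first triples, would produce the same graph but a different encounter order, which is the portability caveat mentioned above.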

Andy




Re: Disabling BNode UID generation

2022-02-09 Thread Beaudet, David
I ran across an API call the other day that checks isomorphism.  See the 
TopBraid SHACL library JUnit test runner. I think it's called by the DASH test 
case class to make sure the resulting graph matches the expected response.


On Feb 9, 2022 11:10, "Shaw, Ryan"  wrote:
Thank you, Andy.

I agree that working on the triple level is the correct way to approach this. I 
was looking for something quick and dirty that would work with textual diffing 
by a VCS, hence my focus on the blank node labels.

Are there any examples of how to use the isomorphism utilities in Jena?

> On Feb 5, 2022, at 12:48 PM, Andy Seaborne  wrote:
>
>
>
> On 04/02/2022 19:09, Shaw, Ryan wrote:
>> Hello,
>> I am trying to experiment with generating diffable N-Triples or flat Turtle 
>> files.
> ...
>> Thanks,
>> Ryan
>
>
> Info: There is work on a charter for
>
> "RDF Dataset Canonicalization and Hash Working Group"
>
> https://w3c.github.io/rch-wg-charter/
>
> The end of section 1 has some links to related work.
>
> Given RDF is inherently unordered, canonicalization and "diff of triples" are 
> related.
>
>
> For diff-able files, what counts as "different" between two files?
>
> Instead of changing the bnode algorithm, have you considered making use of 
> bnode-isomorphism? That is, during a diff, maintain a growing mapping from 
> bnodes in one list of triples to bnodes in the other list?
> Iso.isomorphicTriples
>
> (The list being the triples in encounter order during parsing). It is working 
> not so much on the syntax as the abstraction of triples. E.g. a Turtle file 
> and an NT file produced by parsing the TTL file can be defined to be "the 
> same".
>
> It's fairly portable across files generated by other systems as well except 
> for Turtle lists - Jena has a fixed order for triple generation for a list but 
> it isn't necessarily the same for all systems.
>
> Jena's Turtle algorithm, which is in LangTurtleBase, generates in list order, 
> with rdf:first, then rdf:rest; the triple referencing the list appears 
> after the list. It happens to be the way the spec explains it:
>   https://www.w3.org/TR/turtle/#sec-parsing-triples
> but that is defining the outcome and isn't a requirement.
>
>Andy



Query performance with an ORDER BY

2022-02-09 Thread Andrii Berezovskyi
Hello,

We need to do paging in our app and, as we don’t have a single property on which 
we can do a WHERE cutoff, we use OFFSET/LIMIT. The OFFSET works fairly fast 
for small values; however, when we add an ORDER BY clause (needed because, per 
the SPARQL 1.1 spec, OFFSET is not guaranteed to make sense otherwise), queries 
become quite slow (5s vs <<1s).

The query we run:

PREFIX rdf: 
DESCRIBE ?s
WHERE
{
GRAPH 
{
SELECT DISTINCT ?s
WHERE
{ ?s ?p ?o ;
  rdf:type 
}
ORDER BY ASC(?s)
OFFSET 100
LIMIT 21
}
}

The “optimization” below works (<<1s eval time), but it obviously does not 
guarantee that the inner LIMIT is always applied in the same manner, and paging 
beyond ### items won’t work:

PREFIX rdf: 
DESCRIBE ?s
WHERE
{
GRAPH 
{
SELECT DISTINCT ?s
WHERE
{ ?s ?p ?o ;
  rdf:type 
}
LIMIT 1000
}
}
ORDER BY ASC(?s)
OFFSET 100
LIMIT 21

We’ve read that the order is mostly stable even without the ORDER BY [1]. Is 
there a better way to do this? We were assuming that sorting on the subject URI 
would result in some optimization, but that does not seem to be the case. Is it 
possible to add some way to sort on NodeId?

Also, there is a problem with the OFFSET/LIMIT approach even without an order as 
the OFFSET grows [2]. What is the recommended approach to stable paging that 
scales well? In SQL, the seek method [3] is considered appropriate for most 
DBs. I tried replicating that, adding a random int ID to every resource and 
using FILTER(?ord > ###) to do the paging, using the max value from a page as 
the argument for the next. However, this again works fast only if the ORDER BY 
clause is missing, and that clause seems to be essential to get a stable sort 
order (but at least this allows adding a custom sort order without requiring an 
ORDER BY clause, if one wishes to live a dangerous life). What is the best way 
to do stable paging in Jena/SPARQL?
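
For reference, the seek-style paging attempt described above can be sketched as a small query builder. This is an illustration and not our production code. The class name and the predicate <http://example.org/ord> holding the integer ID are hypothetical, and the sketch assumes the IDs are unique (with duplicates, a page boundary could skip resources). It keeps the ORDER BY clause, so it shows the shape of the query rather than a fix for the slowdown.

```java
/** Hypothetical builder for seek-style (keyset) paging queries. */
final class SeekPaging {
    /** Assumed predicate that holds the per-resource integer sort key. */
    static final String ORD = "<http://example.org/ord>";

    /**
     * Builds the query for one page.
     *
     * @param afterOrd largest ?ord value seen on the previous page, or null for the first page
     * @param pageSize number of rows to fetch
     */
    static String pageQuery(Long afterOrd, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        StringBuilder q = new StringBuilder();
        q.append("SELECT ?s ?ord\n");
        q.append("WHERE {\n");
        q.append("  ?s ").append(ORD).append(" ?ord .\n");
        if (afterOrd != null) {
            // Seek: restart strictly after the last key instead of using OFFSET.
            q.append("  FILTER(?ord > ").append(afterOrd).append(")\n");
        }
        q.append("}\n");
        q.append("ORDER BY ?ord\n");
        q.append("LIMIT ").append(pageSize).append("\n");
        return q.toString();
    }
}
```

The caller would pass null for the first page and then the maximum ?ord from each page to fetch the next one, so the cost of a page does not depend on how many pages came before it, provided the store can evaluate the FILTER and ORDER BY efficiently.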

Thanks in advance,
Andrew

[1]: 
https://markmail.org/search/?q=offset+order+list%3Aorg.apache.incubator.jena-users#query:offset%20order%20list%3Aorg.apache.incubator.jena-users+page:1+mid:fik5kllpnd4sm3tl+state:results
[2]: 
https://markmail.org/search/?q=offset+list%3Aorg.apache.incubator.jena-users#query:offset%20list%3Aorg.apache.incubator.jena-users+page:1+mid:uhzxkaxstbzfetns+state:results
[3]: https://blog.jooq.org/faster-sql-paging-with-jooq-using-the-seek-method/


Turtle-star formatted output with annotation syntax?

2022-02-09 Thread Shaw, Ryan
Jena riot handles parsing Turtle-star annotation syntax, but AFAICT Turtle-star 
output always uses quoted triples. Are there any plans to support the 
annotation syntax for output?

Thanks,
Ryan

Re: Disabling BNode UID generation

2022-02-09 Thread Shaw, Ryan
Thank you, Andy. 

I agree that working on the triple level is the correct way to approach this. I 
was looking for something quick and dirty that would work with textual diffing 
by a VCS, hence my focus on the blank node labels.

Are there any examples of how to use the isomorphism utilities in Jena?

> On Feb 5, 2022, at 12:48 PM, Andy Seaborne  wrote:
> 
> 
> 
> On 04/02/2022 19:09, Shaw, Ryan wrote:
>> Hello,
>> I am trying to experiment with generating diffable N-Triples or flat Turtle 
>> files.
> ...
>> Thanks,
>> Ryan
> 
> 
> Info: There is work on a charter for
> 
> "RDF Dataset Canonicalization and Hash Working Group"
> 
> https://w3c.github.io/rch-wg-charter/
> 
> The end of section 1 has some links to related work.
> 
> Given RDF is inherently unordered, canonicalization and "diff of triples" are 
> related.
> 
> 
> For diff-able files, what counts as "different" between two files?
> 
> Instead of changing the bnode algorithm, have you considered making use of 
> bnode-isomorphism? That is, during a diff, maintain a growing mapping from 
> bnodes in one list of triples to bnodes in the other list?
> Iso.isomorphicTriples
> 
> (The list being the triples in encounter order during parsing). It is working 
> not so much on the syntax as the abstraction of triples. E.g. a Turtle file 
> and an NT file produced by parsing the TTL file can be defined to be "the 
> same".
> 
> It's fairly portable across files generated by other systems as well except 
> for Turtle lists - Jena has a fixed order for triple generation for a list but 
> it isn't necessarily the same for all systems.
> 
> Jena's Turtle algorithm, which is in LangTurtleBase, generates in list order, 
> with rdf:first, then rdf:rest; the triple referencing the list appears 
> after the list. It happens to be the way the spec explains it:
>   https://www.w3.org/TR/turtle/#sec-parsing-triples
> but that is defining the outcome and isn't a requirement.
> 
>Andy



Configure fuseki-server with geosparql assembler

2022-02-09 Thread Erik Bijsterbosch
Hi,

When embedding geosparql as an assembler in my *fuseki-server* configuration
I get the following error at startup in my docker log:

 ⠿ Container fuseki-1  Removed0.9s
[+] Running 1/1
 ⠿ Container labs-services-fuseki-1  Created  0.1s
Attaching to fuseki-1
fuseki-1  | /opt/java-minimal/bin/java -Xmx1048m -Xms1048m -jar
/fuseki/jena-fuseki-server-4.4.0.jar --conf=config-fuseki.ttl
fuseki-1  | the root file:///fuseki/config-fuseki.ttl#geo_ds has no most
specific type that is a subclass of ja:Object
fuseki-1 exited with code 1

I stripped config-fuseki.ttl to the bare example as follows:

PREFIX :  <#>
PREFIX fuseki:
PREFIX rdf:   
PREFIX rdfs:  
PREFIX ja:
PREFIX tdb2:  
PREFIX geosparql: 

<#service> rdf:type fuseki:Service;
   fuseki:name "dst";
   fuseki:endpoint [ fuseki:operation fuseki:query; ] ;
   fuseki:dataset <#geo_dst> .

<#geo_dst> rdf:type geosparql:geosparqlDataset ;
  geosparql:spatialIndexFile "databases/DB2/spatial.index";
  geosparql:dataset <#baseDataset> ;
.

<#baseDataset> rdf:type tdb2:DatasetTDB2 ;
   tdb2:location "databases/DB2"

What could be wrong here and what else needs to be done for a proper
*fuseki-geosparql-server* setup?

Regards.
Erik