Re: Identifying the currently-running queries

2024-09-23 Thread Andy Seaborne

Hi Dan,



On 20/09/2024 15:18, Dan Pritts wrote:

Sorry for the late reply

We identified a problem with a similar old fuseki version. Using the 
internal backup mechanism caused fuseki to slow down drastically and 
take down our website, usually ending in us restarting.


Which version?
And is it a TDB1 database?

    Andy



We disabled the fuseki internal backups and instead make Linux lvm 
snapshots of the database and back it up via tdbdump.


We restart fuseki as part of the process to ensure a consistent 
database. That’s fast enough that we just live with the outage.


Sent from Phone, apologies for typos and/or brevity


On Thu, Sep 5, 2024 at 8:08 AM Hugo Mills wrote:


Hi, all,

We’ve got a heavily-used webapp backed by Fuseki, and we’re having
issues with the load on the Fuseki server. It frequently heads
into a storm of high load average, with the CPU usage pegged at
600% (on 6 cores), and then the app grinds to a halt and we have
to restart the database. We’re trying to understand why this is
happening. Is there the ability in Fuseki to get a list of the
currently-running queries at any given point in time, including
the query text itself, and preferably also the amount of time each
one has been running for?

We’re running Fuseki from Jena 3.4.0, if that makes a difference
to the answer.

Thanks,

Hugo.

Hugo Mills

Development Team Leader

agrimetrics.co.uk 

Reading Enterprise Centre, Whiteknights Road, Reading, UK, RG6 6BU








Re: Starting Fuseki server from Jena

2024-09-23 Thread Andy Seaborne
If you find there is a Fuseki already on 3030, it may be because Fuseki 
has background threads so just returning from application "main" does 
not exit the JVM.


Either
   System.exit(0)

or

  server.start();
  try {
      // ... application work ...
  } finally { server.stop(); }
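
A minimal, self-contained sketch along those lines (the "/ds" dataset name and 
the in-memory dataset are placeholders, not from this thread):

  import org.apache.jena.fuseki.main.FusekiServer;
  import org.apache.jena.query.Dataset;
  import org.apache.jena.query.DatasetFactory;

  public class EmbeddedFusekiExample {
      public static void main(String[] args) {
          Dataset dataset = DatasetFactory.createTxnMem();   // placeholder in-memory dataset
          FusekiServer server = FusekiServer.create()
                  .port(3030)
                  .add("/ds", dataset)
                  .build();
          server.start();
          try {
              // ... do work against http://localhost:3030/ds ...
          } finally {
              // Without stop() (or System.exit), Fuseki's background threads
              // keep the JVM alive after main() returns.
              server.stop();
          }
      }
  }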


For general information
 - which OS are you running on?
 - which version of Fuseki is this?

Andy


On 23/09/2024 10:00, Simon Bin wrote:

Have you already tried accessing the server in your web browser on port
3030 to see if it responds?

On Sun, 2024-09-22 at 15:47 +, Zlatareva, Neli (Computer Science)
wrote:

Hi there, I try to start Fuseki with the following instructions

    FusekiServer server = FusekiServer.create()
        .port(3030)
        .add("/ds", dataset)
        .build();
    System.out.println("Fuseki is starting on port: " + server.getHttpPort());
    server.start();

The server starts on port 3030 but hangs on. The dataset is fine, and
the port is available. The debug logs include

11:44:11.343 [main] DEBUG org.eclipse.jetty.server.AbstractConnector - Could not configure SO_REUSEPORT to false on sun.nio.ch.ServerSocketChannelImpl[unbound]
java.lang.UnsupportedOperationException: 'SO_REUSEPORT' not supported
at java.base/sun.nio.ch.ServerSocketChannelImpl.setOption(ServerSocketChannelImpl.java:219)
at org.eclipse.jetty.server.ServerConnector.setSocketOption(ServerConnector.java:355)
at org.eclipse.jetty.server.ServerConnector.openAcceptChannel(ServerConnector.java:336)
at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:304)
at org.eclipse.jetty.server.Server.lambda$doStart$0(Server.java:402)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:212)
at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:194)
at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:1024)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:556)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:546)
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:265)
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:611)
at org.eclipse.jetty.server.Server.doStart(Server.java:398)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
at org.apache.jena.fuseki.main.FusekiServer.start(FusekiServer.java:298)
at FusekiWithReasoner.main(FusekiWithReasoner.java:74)

Any suggestions what might be wrong? The help will be greatly
appreciated.
Thank you so much!
Regards, Neli.


Neli P. Zlatareva, PhD
Professor of Computer Science
Department of Computer Science
Central Connecticut State University
New Britain, CT 06050
Phone: (860) 832-2723
Fax: (860) 832-2712
Web site: cs.ccsu.edu/~neli/






Re: Using Fuseki to query the union of datasets - with inference

2024-09-16 Thread Andy Seaborne

Scott,

In looking at improving this area, could you give some background:

1/ What level of inference is the data going to use?

2/ Does the schema/ontology need to be in the dataset and visible to 
runtime query? Or in a separate file?


3/ Does the schema/ontology change incrementally or is it changing to a 
"new released version", infrequently and with significant changes?


Andy

On 12/09/2024 20:51, Scott Henninger wrote:

I had a typo.  The results I get are:
   :T1 :graph1

-- Scott

From: Scott Henninger
Sent: Thursday, September 12, 2024 2:29 PM
To: users@jena.apache.org
Subject: Re: Using Fuseki to query the union of datasets - with inference

Thank you for the response, Andy.  Unfortunately I can’t reproduce what you 
describe here.  I started Fuseki 5.1 with a config file using the config you 
gave me here.  I then used the Fuseki “add data” to upload a TriG file with the 
data in your example.  When I go to query the default graph, I don’t see any 
matches for :x.  If I do a
   SELECT * WHERE  { GRAPH ?g  { :x rdf:type ?T }}
I only get
   :x :graph1

Did you run your test differently somehow?

A second question.  I thought maybe the inference would appear after 
re-starting Fuseki, but the data was erased.  I would have thought that { 
:tdbDataset a tdb:DatasetTDB } meant the data was being saved in TDB, but it doesn’t 
seem to be persistent.  Why is that?

Thanks again for all your efforts
-- Scott

From: Andy Seaborne <a...@apache.org>
Sent: Thursday, September 12, 2024 6:10 AM
To: users@jena.apache.org
Subject: [EXTERNAL] Re: Using Fuseki to query the union of datasets - with inference


Hi Scott,

On 09/09/2024 21:28, Scott Henninger wrote:

Thank you for the config for unionDefaultGraph, Andy. I am able to see the data 
both in the named graph and the default graph with your config. Now I want to 
add inference to the default graph. In that case I do need to go through a 
model, correct?

Below is a config that I’m trying that doesn’t do what I want. I cannot see the 
triples in the default graph and no inferences are created. What am I doing 
wrong here? Specifically what would a config look like that performs inferences 
that appear in the default graph?


tdb:unionDefaultGraph only makes the SPARQL default graph appear as the
union of named graphs.

To get the inference code to see a union graph, try a named graph of


The tdb:unionDefaultGraph can be removed unless you want to have
(non-inference) access to the dataset.

-
:defaultDataset a ja:RDFDataset ;
ja:defaultGraph :infModel ;
.
:infModel a ja:InfModel ;
ja:baseModel :tdbGraph ;
ja:reasoner [
ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ;
] ;
.

:tdbGraph a tdb:GraphTDB ;
tdb:dataset :tdbDataset ;
tdb:namedGraph  ## ***
.

:tdbDataset a tdb:DatasetTDB ;
tdb:location "DB" ;
.
-

I tried (TriG):

---
GRAPH :graph1 {
:x rdf:type :T1 .
}

GRAPH :graph2 {
:T1 rdfs:subClassOf :T2 .
}
---

query:
SELECT * { :x rdf:type ?T }

and got T1 and T2.
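
For reference, a sketch of the graph block marked "## ***" above; the graph 
name was lost in the archive and is filled in below as an assumption, using the 
URI Jena documents as the internal name of the TDB union of all named graphs:

:tdbGraph a tdb:GraphTDB ;
    tdb:dataset :tdbDataset ;
    ## Assumed graph name: the TDB-internal union of all named graphs.
    tdb:namedGraph <urn:x-arq:UnionGraph> ;
    .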



:service1 rdf:type fuseki:Service ;
fuseki:name "union-graph-inference" ;
fuseki:serviceQuery "sparql" ;
fuseki:dataset :defaultDataset ;
.
:defaultDataset a ja:RDFDataset ;
ja:defaultGraph :infModel ;
.
:infModel a ja:InfModel ;
ja:baseModel :tdbGraph ;
ja:reasoner [
ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ;
] ;
.
:tdbGraph a tdb:GraphTDB ;
tdb:dataset :tdbDataset ;
.
:tdbDataset a tdb:DatasetTDB ;



NB This is TDB1.
It should not affect this query but TDB2 is a better choice long term.
Both work in the illustration above.
TDB2 is more robust and works better in Fuseki.


tdb:location "DB" ;
tdb:unionDefaultGraph true ;
.


Thanks for all your help
-- Scott

From: Andy Seaborne <a...@apache.org>
Sent: Saturday, August 24, 2024 9:52 AM
To: users@jena.apache.org
Subject: Re: [EXTERNAL] Re: Using Fuseki to query the union of datasets




On 24/07/2024 20:08, Scott Henninger wrote:

Thank you for the response, Pedro. However it seems to me that I am already 
applying the approach you reference in the page. See lines 35-37 in the config 
file I included. Perhaps I am applying it wrong? If so, where am I going wrong?


You don't need to go through a model. Connect the fuseki:Service to the
dataset with tdb:unionDefaultGraph

Re: Using Fuseki to query the union of datasets - with inference

2024-09-15 Thread Andy Seaborne




On 12/09/2024 20:29, Scott Henninger wrote:

Thank you for the response, Andy.  Unfortunately I can’t reproduce what you 
describe here.  I started Fuseki 5.1 with a config file using the config you 
gave me here.  I then used the Fuseki “add data” to upload a TriG file with the 
data in your example.  When I go to query the default graph, I don’t see any 
matches for :x.  If I do a
   SELECT * WHERE  { GRAPH ?g  { :x rdf:type ?T }}
I only get
   :x :graph1

Did you run your test differently somehow?


In the config.ttl I gave, only the :tdbGraph, which connects to 
, the internal name for the union default graph 
marked at "## ***"


Does SELECT * WHERE  { :x rdf:type ?T } give two rows?


A second question.  I thought maybe the inference would appear after 
re-starting Fuseki, but the data was erased.  I would have thought that { 
:tdbDataset a tdb:DatasetTDB } meant the data was being saved in TDB, but it doesn’t 
seem to be persistent.  Why is that?


How are you updating the data? Which named graph?

---

For RDFS, range+domain, subClass, subProperty:

This has two features that are different to what you describe:

1. The schema is in a separate graph, often a file.
2. It does not work for unionDefaultGraph which is a feature omission.

(2) ought to be fixed. It happens because DatasetRDFS works using the 
Java API not via SPARQL; unionDefaultGraph is a SPARQL effect.


 
fuseki:dataset :rdfsDataset ;
.

## RDFS
:rdfsDataset rdf:type ja:DatasetRDFS ;
ja:rdfsSchema ;
ja:dataset :baseDataset;
.

:tdbDataset a  tdb2:DatasetTDB ;
tdb2:location  "DB2" ;
tdb2:unionDefaultGraph true ;
.
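
A filled-out sketch of the fragment above, under assumptions: the schema is 
read from a local file schema.ttl, the base dataset is the :tdbDataset defined 
below, and tdb2: is the TDB2 assembler namespace:

@prefix ja:   <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix tdb2: <http://jena.apache.org/2016/tdb#> .

:rdfsDataset rdf:type ja:DatasetRDFS ;
    ja:rdfsSchema <schema.ttl> ;        # assumed schema file
    ja:dataset :tdbDataset ;
    .

:tdbDataset a tdb2:DatasetTDB2 ;
    tdb2:location "DB2" ;
    tdb2:unionDefaultGraph true ;
    .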
 

Andy



Thanks again for all your efforts
-- Scott

From: Andy Seaborne 
Sent: Thursday, September 12, 2024 6:10 AM
To: users@jena.apache.org
Subject: [EXTERNAL] Re: Using Fuseki to query the union of datasets - with 
inference


Hi Scott,

On 09/09/2024 21:28, Scott Henninger wrote:

Thank you for the config for unionDefaultGraph, Andy. I am able to see the data 
both in the named graph and the default graph with your config. Now I want to 
add inference to the default graph. In that case I do need to go through a 
model, correct?

Below is a config that I’m trying that doesn’t do what I want. I cannot see the 
triples in the default graph and no inferences are created. What am I doing 
wrong here? Specifically what would a config look like that performs inferences 
that appear in the default graph?


tdb:unionDefaultGraph only makes the SPARQL default graph appear as the
union of named graphs.

To get the inference code to see a union graph, try a named graph of


The tdb:unionDefaultGraph can be removed unless you want to have
(non-inference) access to the dataset.

-
:defaultDataset a ja:RDFDataset ;
ja:defaultGraph :infModel ;
.
:infModel a ja:InfModel ;
ja:baseModel :tdbGraph ;
ja:reasoner [
ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ;
] ;
.

:tdbGraph a tdb:GraphTDB ;
tdb:dataset :tdbDataset ;
tdb:namedGraph  ## ***
.

:tdbDataset a tdb:DatasetTDB ;
tdb:location "DB" ;
.
-

I tried (TriG):

---
GRAPH :graph1 {
:x rdf:type :T1 .
}

GRAPH :graph2 {
:T1 rdfs:subClassOf :T2 .
}
---

query:
SELECT * { :x rdf:type ?T }

and got T1 and T2.



:service1 rdf:type fuseki:Service ;
fuseki:name "union-graph-inference" ;
fuseki:serviceQuery "sparql" ;
fuseki:dataset :defaultDataset ;
.
:defaultDataset a ja:RDFDataset ;
ja:defaultGraph :infModel ;
.
:infModel a ja:InfModel ;
ja:baseModel :tdbGraph ;
ja:reasoner [
ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ;
] ;
.
:tdbGraph a tdb:GraphTDB ;
tdb:dataset :tdbDataset ;
.
:tdbDataset a tdb:DatasetTDB ;



NB This is TDB1.
It should not affect this query but TDB2 is a better choice long term.
Both work in the illustration above.
TDB2 is more robust and works better in Fuseki.


tdb:location "DB" ;
tdb:unionDefaultGraph true ;
.


Thanks for all your help
-- Scott

From: Andy Seaborne <a...@apache.org>
Sent: Saturday, August 24, 2024 9:52 AM
To: users@jena.apache.org
Subject: Re: [EXTERNAL] Re: Using Fuseki to query the union of datasets




On 24/07/2024 20:08, Scott Henninger wrote:

Thank you for the response, Pedro. However it seems to me that I am already 
applying the approach you reference in the pa

Re: Using Fuseki to query the union of datasets - with inference

2024-09-12 Thread Andy Seaborne

Hi Scott,

On 09/09/2024 21:28, Scott Henninger wrote:

Thank you for the config for unionDefaultGraph, Andy.  I am able to see the 
data both in the named graph and the default graph with your config.  Now I 
want to add inference to the default graph.  In that case I do need to go 
through a model, correct?

Below is a config that I’m trying that doesn’t do what I want.  I cannot see 
the triples in the default graph and no inferences are created.  What am I 
doing wrong here?  Specifically what would a config look like that performs 
inferences that appear in the default graph?


tdb:unionDefaultGraph only makes the SPARQL default graph appear as the 
union of named graphs.


To get the inference code to see a union graph, try a named graph of 



The tdb:unionDefaultGraph can be removed unless you want to have 
(non-inference) access to the dataset.


-
:defaultDataset a ja:RDFDataset ;
ja:defaultGraph :infModel ;
.
:infModel a ja:InfModel ;
ja:baseModel :tdbGraph ;
ja:reasoner [
ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ;
] ;
.

:tdbGraph a tdb:GraphTDB ;
tdb:dataset :tdbDataset ;
tdb:namedGraph  ## ***
.

:tdbDataset a  tdb:DatasetTDB ;
tdb:location  "DB" ;
.
-

I tried (TriG):

---
GRAPH :graph1 {
  :x rdf:type :T1 .
}

GRAPH :graph2 {
 :T1 rdfs:subClassOf :T2 .
}
---

query:
SELECT * { :x rdf:type ?T }

and got T1 and T2.



:service1 rdf:type fuseki:Service ;
fuseki:name "union-graph-inference" ;
fuseki:serviceQuery "sparql" ;
fuseki:dataset :defaultDataset ;
.
:defaultDataset a ja:RDFDataset ;
 ja:defaultGraph :infModel ;
.
:infModel a ja:InfModel ;
 ja:baseModel :tdbGraph ;
 ja:reasoner [
 ja:reasonerURL <http://jena.hpl.hp.com/2003/OWLFBRuleReasoner> ;
 ] ;
.
:tdbGraph a tdb:GraphTDB ;
 tdb:dataset :tdbDataset ;
.
:tdbDataset a  tdb:DatasetTDB ;



NB This is TDB1.
It should not affect this query but TDB2 is a better choice long term. 
Both work in the illustration above.

TDB2 is more robust and works better in Fuseki.


 tdb:location  "DB" ;
 tdb:unionDefaultGraph true ;
.


Thanks for all your help
-- Scott

From: Andy Seaborne 
Sent: Saturday, August 24, 2024 9:52 AM
To: users@jena.apache.org
Subject: Re: [EXTERNAL] Re: Using Fuseki to query the union of datasets




On 24/07/2024 20:08, Scott Henninger wrote:

Thank you for the response, Pedro. However it seems to me that I am already 
applying the approach you reference in the page. See lines 35-37 in the config 
file I included. Perhaps I am applying it wrong? If so, where am I going wrong?


You don't need to go through a model. Connect the fuseki:Service to the
dataset with tdb:unionDefaultGraph


:service1 rdf:type fuseki:Service ;
fuseki:name "union-graph" ;
fuseki:serviceQuery "sparql" ;
fuseki:dataset :tdbDataset ;

:tdbDataset rdf:type tdb:DatasetTDB ;
tdb:location "DB" ;
tdb:unionDefaultGraph true ;
.

Andy



-- Scott

From: Pedro <pedro.win.s...@googlemail.com.INVALID>
Sent: Tuesday, July 23, 2024 4:35 PM
To: users@jena.apache.org
Subject: [EXTERNAL] Re: Using Fuseki to query the union of datasets


Hi Scott

Look at the TDB and TDB2 examples in
https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html

Uncomment the line

tdb:unionDefaultGraph true;

Cheers!

On Tue, 23 Jul 2024, 22:29 Scott Henninger <scott.hennin...@ecstech.com> wrote:


I am attempting to use Fuseki to query the union of a set of datasets. I
am using the following configuration to define a TDB dataset named
"union-graph" and setting tdb:unionDefaultGraph to true to query the union
of the datasets. However when I query "union-graph" I do not see the
triples from the datasets I have populated. How can I configure Fuseki to
query the union of datasets?

Thank you
-- Scott

@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

Re: rdfdiff shows graphs are unequal, but does not list the differences

2024-09-09 Thread Andy Seaborne

John,

So they differ only in the object and the terms are "same value" but not 
"same term"?


Seems to work with the current Jena development codebase,
but not in Jena 5.1.0.

"Fixed in the next release".

Andy

On 09/09/2024 10:40, John Walker wrote:

Hi Andy,

Thanks for the quick reply!



-----Original Message-
From: Andy Seaborne 
Sent: Friday, 6 September 2024 16:40
To: users@jena.apache.org
Subject: Re: rdfdiff shows graphs are unequal, but does not list the differences



On 06/09/2024 13:32, John Walker wrote:

Hi Andy,


-----Original Message-
From: Andy Seaborne 
Sent: Friday, 6 September 2024 10:54
To: users@jena.apache.org
Subject: Re: rdfdiff shows graphs are unequal, but does not list the
differences



On 05/09/2024 19:12, John Walker wrote:

Hi,

I am working on a project where we cleanse/normalize some RDF data,
and I

am using the rdfdiff utility to compare the input and output.

The utility tells me the models are unequal, but it does not list
any

statements.


Through a process of trial and error, I could isolate a couple of
literals that are

changed, but it’s unclear why the utility does not detect them.

* "1756"^^xsd:int --> "1756"^^xsd:integer
* "2024-03-13T12:52:06.227Z"^^xsd:dateTime -->
"2024-03-13T12:52:06.227000+00:00"^^xsd:dateTime

See attached minimal examples.

$ rdfdiff original.ttl modified.ttl TTL TTL models are unequal

Does Jena normalize literals when parsing the input files?


Not unless you configure the parser to do that or it goes into TDB.


Are the literal values different, or not?


xsd:dateTime: They are different RDF terms, they represent the same value.


Am I correct to say the variant with "Z" time zone is the canonical lexical

representation?

Yes.
https://www.w3.org/TR/xmlschema11-2/#f-tzCanFragMap


xsd:int/xsd:integer:
TDB1 blurs the difference, TDB2 retains the datatype.


Reading SPARQL 1.1 recommendation, is it correct to say:

"1756"^^xsd:int = "1756"^^xsd:integer produces a type error


That is not a type error. "=" compares values and they have the same value
space (numbers) so they can be compared.

https://www.w3.org/TR/sparql11-query/#OperatorMapping


sameTerm("1756"^^xsd:int, "1756"^^xsd:integer) = false


Correct.

There can be multiple terms for the same value.

also false --

sameTerm("+1756"^^xsd:integer, "1756"^^xsd:integer)
sameTerm("01756"^^xsd:integer, "1756"^^xsd:integer)



This is because RDF 1.1 Concepts and Abstract Syntax literal term equality

requires the datatype IRIs to compare equal, character by character.

and also RDF is not dependent on XSD datatypes. They are suggested but
there is no requirement to handle XSD. There is in SPARQL for a limited set of
datatypes and pragmatically, many triple stores support a lot more than the
minimum.



I'm a bit puzzled by the example for "2004-12-31T19:00:00-

05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> and
xsd:dateTime("2005-01-01T00:00:00Z").

Those are not character by character equal, but the RDFterm-equal returns true.


They are not term equals (sameTerm).
They are value equals (the same point on the time line)

RDFTerm-Equal is a fallback. Two terms are value-equal if the terms are
the sameTerm regardless of understanding the datatype (datatypes are a
function).


OK, clear.



Two dateTimes will be dispatched further up the table by:

A = B    xsd:dateTime    xsd:dateTime    op:dateTime-equal(A, B)









Is this a bug?


Which Jena version are you running?


I'm running 4.10.0 locally.



Jena5 changed to "term equality" everywhere for in-memory, with TDB still
storing values.


I'll try with the latest release.


Using 5.1.0 the rdfdiff does output the statements with different terms when I 
try it with my initial files.
However, when I try with the smaller examples from my earlier mail, then no 
diff is shown.

$ rdfdiff original.ttl modified.ttl TTL TTL
models are unequal

It seems strange, but if I add more statements to both files, then it does 
output the diff when both graphs contain at least 10 statements:

$ rdfdiff original.ttl modified.ttl TTL TTL
models are unequal

< [http://example.com/this, http://purl.org/dc/terms/modified, 
"2024-03-13T12:52:06.227Z"^^xsd:dateTime]
< [http://example.com/this, http://open-services.net/ns/core#shortId, 
"1756"^^xsd:int]

> [http://example.com/this, http://purl.org/dc/terms/modified, 
"2024-03-13T12:52:06.227000+00:00"^^xsd:dateTime]
> [http://example.com/this, http://open-services.net/ns/core#shortId, 
"1756"^^xsd:integer]


Seems like odd behaviour.

John



John



   Andy



Regards,

John Walker
Principal Consultant & co-founder

Semaku B.V. | Torenallee 20 (SFJ 3D) | 5617 BC Eindhoven | T +31 6
42590072 | https://semaku.com/
KvK: 58031405 | BTW: NL852842156B01 | IBAN: NL94 INGB 0008 3219

95








Re: Identifying the currently-running queries

2024-09-08 Thread Andy Seaborne




On 05/09/2024 16:59, Hugo Mills wrote:
Thanks, both. We're already logging queries, but I was after something 
like an API that could give me an instantaneous snapshot of what's 
running *right now*, rather than something that I'll have to do post-hoc 
analysis on messy log files to get that information. We've used that 
capability (through an API endpoint) to good effect in GraphDB, and I 
was hoping that there would be something similar in Fuseki.


Thanks,
Hugo.


Hugo,

Sounds like a useful addition - maybe done by keeping details of the 
last few tens of queries.


Could you raise a Jena issue for this please?

https://github.com/apache/jena/issues

Andy



Hugo Mills

Development Team Leader

agrimetrics.co.uk 

Reading Enterprise Centre, Whiteknights Road, Reading, UK, RG6 6BU






-Original Message-
From: Simon Bin 
Sent: 05 September 2024 15:31
To: users@jena.apache.org
Subject: Re: Re: Identifying the currently-running queries



In case it is of use, here is our script to parse the Jena 5 logging 
output and print the log file in org format, so you can easily browse 
the running queries in org-mode:


https://gitlab.com/coypu-project/tools/skynet_loader/-/blob/master/parse_log49.pl


Note that the logging format changes frequently so it may need small 
adjustments in the $date_pattern and the $line regular expression.


Cheers,

On Thu, 2024-09-05 at 16:15 +0200, Lorenz Buehmann wrote:
 >
 > Hi,
 >
 > we're running Fuseki 5.1.0 and are using the logging of Jena. That
 > works fine for us.
 >
 > There is no logging of SPARQL Update statements though because those
 > can get too large in terms of text.
 >
 >
 >
 >
 > By the way, I would not run an ancient Fuseki 3.4.0 in production
 > anymore - is there a reason for this? just because it works?
 >
 >
 >
 >
 >
 > Cheers,
 >
 > Lorenz
 >
 >
 > On 05.09.24 14:04, Hugo Mills wrote:
 >
 >
 > >
 > >
 > > Hi, all,
 > >
 > >
 > >
 > > We’ve got a heavily-used webapp backed by Fuseki, and we’re having
 > > issues with the load on the Fuseki server. It frequently heads into
 > > a storm of high load average, with the CPU usage pegged at 600% (on
 > > 6 cores), and then the app grinds to a halt and we have to restart
 > > the database. We’re trying to understand why this is happening. Is
 > > there the ability in Fuseki to get a list of the currently-running
 > > queries at any given point in time, including the query text itself,
 > > and preferably also the amount of time each one has been running
 > > for?
 > >
 > >
 > >
 > > We’re running Fuseki from Jena 3.4.0, if that makes a difference to
 > > the answer.
 > >
 > >
 > >
 > > Thanks,
 > >
 > > Hugo.
 > >
 > >
 > >
 > >
 > >
 > > Hugo Mills
 > >
 > > Development Team Leader
 > >
 > > agrimetrics.co.uk
 > >
 > > Reading Enterprise Centre, Whiteknights Road, Reading, UK, RG6 6BU
 > >
 > >
 > >
 > >
 > >
 > >
 > >
 > >
 > >
 > >
 >





Re: rdfdiff shows graphs are unequal, but does not list the differences

2024-09-06 Thread Andy Seaborne




On 06/09/2024 13:32, John Walker wrote:

Hi Andy,


-Original Message-
From: Andy Seaborne 
Sent: Friday, 6 September 2024 10:54
To: users@jena.apache.org
Subject: Re: rdfdiff shows graphs are unequal, but does not list the differences



On 05/09/2024 19:12, John Walker wrote:

Hi,

I am working on a project where we cleanse/normalize some RDF data, and I

am using the rdfdiff utility to compare the input and output.

The utility tells me the models are unequal, but it does not list any

statements.


Through a process of trial and error, I could isolate a couple of literals that 
are

changed, but it’s unclear why the utility does not detect them.

* "1756"^^xsd:int --> "1756"^^xsd:integer
* "2024-03-13T12:52:06.227Z"^^xsd:dateTime -->
"2024-03-13T12:52:06.227000+00:00"^^xsd:dateTime

See attached minimal examples.

$ rdfdiff original.ttl modified.ttl TTL TTL models are unequal

Does Jena normalize literals when parsing the input files?


Not unless you configure the parser to do that or it goes into TDB.


Are the literal values different, or not?


xsd:dateTime: They are different RDF terms, they represent the same value.


Am I correct to say the variant with "Z" time zone is the canonical lexical 
representation?


Yes.
https://www.w3.org/TR/xmlschema11-2/#f-tzCanFragMap


xsd:int/xsd:integer:
TDB1 blurs the difference, TDB2 retains the datatype.


Reading SPARQL 1.1 recommendation, is it correct to say:

"1756"^^xsd:int = "1756"^^xsd:integer produces a type error


That is not a type error. "=" compares values and they have the same 
value space (numbers) so they can be compared.


https://www.w3.org/TR/sparql11-query/#OperatorMapping


sameTerm("1756"^^xsd:int, "1756"^^xsd:integer) = false


Correct.

There can be multiple terms for the same value.

also false --

sameTerm("+1756"^^xsd:integer, "1756"^^xsd:integer)
sameTerm("01756"^^xsd:integer, "1756"^^xsd:integer)



This is because RDF 1.1 Concepts and Abstract Syntax literal term equality 
requires the datatype IRIs to compare equal, character by character.


and also RDF is not dependent on XSD datatypes. They are suggested but 
there is no requirement to handle XSD. There is in SPARQL for a limited 
set of datatypes and pragmatically, many triple stores support a lot more 
than the minimum.




I'm a bit puzzled by the example for 
"2004-12-31T19:00:00-05:00"^^<http://www.w3.org/2001/XMLSchema#dateTime> and 
xsd:dateTime("2005-01-01T00:00:00Z").
Those are not character by character equal, but the RDFterm-equal returns true.


They are not term equals (sameTerm).
They are value equals (the same point on the time line)

RDFTerm-Equal is a fallback. Two terms are value-equal if the terms are 
the sameTerm regardless of understanding the datatype (datatypes are a 
function).


Two dateTimes will be dispatched further up the table by:

A = B    xsd:dateTime    xsd:dateTime    op:dateTime-equal(A, B)
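
A quick illustration (a sketch, not from the original mail) that can be pasted 
into any SPARQL query form; in Jena the first expression evaluates to true 
(value equality via op:dateTime-equal) and the second to false (different terms):

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT
  ("2004-12-31T19:00:00-05:00"^^xsd:dateTime = "2005-01-01T00:00:00Z"^^xsd:dateTime AS ?valueEqual)
  (sameTerm("2004-12-31T19:00:00-05:00"^^xsd:dateTime,
            "2005-01-01T00:00:00Z"^^xsd:dateTime) AS ?termEqual)
WHERE { }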









Is this a bug?


Which Jena version are you running?


I'm running 4.10.0 locally.



Jena5 changed to "term equality" everywhere for in-memory, with TDB still
storing values.


I'll try with the latest release.

John



  Andy



Regards,

John Walker
Principal Consultant & co-founder

Semaku B.V. | Torenallee 20 (SFJ 3D) | 5617 BC Eindhoven | T +31 6
42590072 | https://semaku.com/
KvK: 58031405 | BTW: NL852842156B01 | IBAN: NL94 INGB 0008 3219 95






Re: RDFConnection remote fetch behaviour on missing graph

2024-09-01 Thread Andy Seaborne



On 01/09/2024 11:57, Bart van Leeuwen wrote:

Hi,

When I do a fetch(graph) on a RDFConnection created by RDFConnectionFuseki
on a fuseki endpoint (4.10.0) and the graph does not exist a 404 exception
is thrown.
I would expect an empty Model, and not an exception?

Doing a fetch on dataset from e.g. stardog to a none existing graph
returns an empty model.

Met Vriendelijke Groet / With Kind Regards
Bart van Leeuwen


(GSP = SPARQL Graph Store Protocol)

"Empty graph" vs "graph does not exist" is murky area because triple 
stores using quad indexing don't differentiate between the two cases 
leading to inconsistent behaviour. Jena behaviour is supposed to be 
consistent for all stores (quad based or not).
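
(A possible client-side workaround in the meantime - a sketch only, assuming 
the 404 surfaces as org.apache.jena.atlas.web.HttpException; the class and 
method names are placeholders:)

  import org.apache.jena.atlas.web.HttpException;
  import org.apache.jena.rdf.model.Model;
  import org.apache.jena.rdf.model.ModelFactory;
  import org.apache.jena.rdfconnection.RDFConnection;
  import org.apache.jena.rdfconnection.RDFConnectionFuseki;

  public class FetchOrEmpty {
      // Fetch a named graph; return an empty Model if the server answers 404.
      static Model fetchOrEmpty(String serviceURL, String graphURI) {
          try (RDFConnection conn =
                   RDFConnectionFuseki.create().destination(serviceURL).build()) {
              return conn.fetch(graphURI);
          } catch (HttpException ex) {
              if (ex.getStatusCode() == 404)
                  return ModelFactory.createDefaultModel();
              throw ex;
          }
      }
  }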


As to the effect on GSP GET - what does the community want?

Fuseki is making a special test so the GSP GET behaviour can be changed 
without disturbing anything else.



There are corner cases:

Does "GRAPH ?G {}" include ?
  Presumably "no" for all stores - else there are infinite results.

Does "GRAPH  {}" return a row?
  Jena - yes. Empty pattern on empty graph is one row, no columns.

What does
  CREATE GRAPH  ;
  CREATE GRAPH 
do?

From a plain HTTP point of view, the URL

   http://server/database?graphname=uri

(there is nothing special about query strings when identifying a web 
resource) does not exist so 404 is natural when client accesses are not 
SPARQL aware.



Checking the spec:

GSP says some at-odds things:

"""
If the RDF graph content identified in the request does not exist in the 
server, and the operation requires that it does, a 404 Not Found 
response code MUST be provided in the response.

"""

GSP GET says:
"""
It is equivalent to
  CONSTRUCT { ?s ?p ?o } WHERE { GRAPH  { ?s ?p ?o } }
"""
which would be empty. Whether the equivalence observation

But GSP POST says
"""
If the graph IRI does not identify either a Graph Store or RDF graph 
content, the origin server should respond with a 404 Not Found.

"""

GSP DELETE does say it's 404.



Re: Issues with WAR when upgrading from 5.0.0 to 5.1.0

2024-09-01 Thread Andy Seaborne




On 01/09/2024 09:30, Bart van Leeuwen wrote:

Hi Andy,

Adding JenaSystem.init() to my class does the trick.

Bart


Hi Bart - thank you for confirming that.

There'll be Jena fix in 5.2.0:
  https://github.com/apache/jena/issues/2675

Adding JenaSystem.init() is better to give the best control in complex 
situations, and it's cheap to call during application class loading.


Andy




From:   Bart van Leeuwen/netage
To: users@jena.apache.org
Date:   31-08-2024 22:46
Subject:RE: Re: Issues with WAR when upgrading from 5.0.0 to 5.1.0


Hi Andy,

Will test tomorrow, so my initial idea was correct.

Did anything change inbetween 5.0.0 and 5.1.0 ?

Goodnight from a quiet fire station ;)

Bart

On 2024/08/31 20:03:35 Andy Seaborne wrote:

Hi Bart,

In nl.netage.storetest.Reader

could you add this as a class static initializer at the beginning of the



class:

  static { JenaSystem.init(); }

if that works, adding it to info.resc.pontypandy.store.StoreFactory
(assuming that's your code).


In general, especially if there might be multiple threads very early on
(not the case here), it is a good idea to call JenaSystem.init().

My regards to Sam,

  Andy

On 31/08/2024 19:04, Bart van Leeuwen wrote:

Hi,

I've started upgrading my components from 5.0.0 to 5.1.0 and I run
into the following stacktrace when creating a remotefuseki connection

I suspect an initialization issue, but I cannot find any reference in
the release notes.

=

Caused by: java.lang.ExceptionInInitializerError
 at


org.apache.jena.sparql.engine.optimizer.reorder.ReorderFixed.type(ReorderFixed.java:61)

 at


org.apache.jena.sparql.engine.optimizer.reorder.ReorderFixed.(ReorderFixed.java:63)

 at


org.apache.jena.sparql.engine.optimizer.reorder.ReorderLib.fixed(ReorderLib.java:76)

 at org.apache.jena.tdb1.sys.SystemTDB.(SystemTDB.java:193)
 at org.apache.jena.tdb1.TDB1.(TDB1.java:99)
 at org.apache.jena.tdb1.sys.InitTDB.start(InitTDB.java:29)
 at


org.apache.jena.base.module.Subsystem.lambda$initialize$1(Subsystem.java:117)

 at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
 at

org.apache.jena.base.module.Subsystem.forEach(Subsystem.java:193)

 at

org.apache.jena.base.module.Subsystem.forEach(Subsystem.java:169)

 at
org.apache.jena.base.module.Subsystem.initialize(Subsystem.java:115)
 at org.apache.jena.sys.JenaSystem.init(JenaSystem.java:89)
 at org.apache.jena.graph.NodeFactory.(NodeFactory.java:39)
 at


org.apache.jena.rdf.model.impl.ResourceImpl.fresh(ResourceImpl.java:150)

 at


org.apache.jena.rdf.model.impl.ResourceImpl.(ResourceImpl.java:86)

 at


org.apache.jena.rdf.model.ResourceFactory$Impl.createResource(ResourceFactory.java:308)

 at


org.apache.jena.rdf.model.ResourceFactory.createResource(ResourceFactory.java:94)

 at org.apache.jena.vocabulary.RDF.resource(RDF.java:54)
 at org.apache.jena.vocabulary.RDF.(RDF.java:65)
 at

org.apache.jena.riot.lang.ReaderTriX.(ReaderTriX.java:116)

 at


org.apache.jena.riot.RDFParserRegistry.initStandard(RDFParserRegistry.java:74)

 at
org.apache.jena.riot.RDFParserRegistry.init(RDFParserRegistry.java:57)
 at


org.apache.jena.riot.RDFParserRegistry.(RDFParserRegistry.java:52)

 at

org.apache.jena.riot.RDFLanguages.isQuads(RDFLanguages.java:370)

 at


org.apache.jena.rdflink.RDFLinkHTTPBuilder.quadsFormat(RDFLinkHTTPBuilder.java:172)

 at


org.apache.jena.rdflink.RDFLinkFuseki.setupForFuseki(RDFLinkFuseki.java:65)

 at


org.apache.jena.rdflink.RDFLinkFuseki.newBuilder(RDFLinkFuseki.java:45)

 at


org.apache.jena.rdfconnection.RDFConnectionFuseki$RDFConnectionFusekiBuilder.(RDFConnectionFuseki.java:75)

 at


org.apache.jena.rdfconnection.RDFConnectionFuseki.create(RDFConnectionFuseki.java:61)

 at


info.resc.pontypandy.store.jena.fuseki.JenaFusekiStore.(JenaFusekiStore.java:52)

 at


info.resc.pontypandy.store.jena.fuseki.JenaFusekiStoreFactory.getStore(JenaFusekiStoreFactory.java:75)

 at


info.resc.pontypandy.store.jena.fuseki.JenaFusekiStoreFactory.getStore(JenaFusekiStoreFactory.java:16)

 at


info.resc.pontypandy.store.StoreFactory.getStoreFromJNDI(StoreFactory.java:292)

 at nl.netage.storetest.Reader.test(Reader.java:41)
 at


java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)

 at java.base/java.lang.reflect.Method.invoke(Method.java:580)
 at


org.apache.openejb.server.cxf.rs.PojoInvoker.performInvocation(PojoInvoker.java:43)

 at


org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)

 ... 37 more
Caused by: java.lang.NullPointerException: Cannot invoke
"org.apache.jena.rdf.model.Resource.asNode()" because
"org.apache.jena.vocabulary.RDF.Alt" is null
 at org.apache.jena.vocabulary.RDF$Nodes.(RDF.java:193)
 ... 75 more










Re: Issues with WAR when upgrading from 5.0.0 to 5.1.0

2024-08-31 Thread Andy Seaborne




On 31/08/2024 21:46, Bart van Leeuwen wrote:

Hi Andy,

Will test tomorrow, so my initial idea was correct.


yes, you were correct. I can recreate a similar situation here.


Did anything change inbetween 5.0.0 and 5.1.0 ?


Not obviously. The only nearby things look safe. But initialization 
problems can be due to code far away.


If it all works with added JenaSystem.init() then a fix in Jena can be 
done.


All first point of contact classes that do actual work should have a 
JenaSystem.init() and RDFLinkHTTPBuilder doesn't and never has. "It just 
worked" by natural class initialization.


It isn't even always that some change X is wrong, it can be the overall 
combination of X/Y/Z/... being the cause.


Andy


Goodnight from a quiet fire station ;)

Bart

On 2024/08/31 20:03:35 Andy Seaborne wrote:

Hi Bart,

In nl.netage.storetest.Reader

could you add this as a class static initializer at the beginning of the



class:

  static { JenaSystem.init(); }

if that works, adding it to info.resc.pontypandy.store.StoreFactory
(assuming that's your code).


In general, especially if there might be multiple threads very early on
(not the case here), it is a good idea to call JenaSystem.init().

My regards to Sam,

  Andy

On 31/08/2024 19:04, Bart van Leeuwen wrote:

Hi,

I've started upgrading my components from 5.0.0 to 5.1.0 and I run
into the following stacktrace when creating a remotefuseki connection

I suspect an initialization issue, but I cannot find any reference in
the release notes.

=

Caused by: java.lang.ExceptionInInitializerError
 at


org.apache.jena.sparql.engine.optimizer.reorder.ReorderFixed.type(ReorderFixed.java:61)

 at


org.apache.jena.sparql.engine.optimizer.reorder.ReorderFixed.(ReorderFixed.java:63)

 at


org.apache.jena.sparql.engine.optimizer.reorder.ReorderLib.fixed(ReorderLib.java:76)

 at org.apache.jena.tdb1.sys.SystemTDB.(SystemTDB.java:193)
 at org.apache.jena.tdb1.TDB1.(TDB1.java:99)
 at org.apache.jena.tdb1.sys.InitTDB.start(InitTDB.java:29)
 at


org.apache.jena.base.module.Subsystem.lambda$initialize$1(Subsystem.java:117)

 at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
 at

org.apache.jena.base.module.Subsystem.forEach(Subsystem.java:193)

 at

org.apache.jena.base.module.Subsystem.forEach(Subsystem.java:169)

 at
org.apache.jena.base.module.Subsystem.initialize(Subsystem.java:115)
 at org.apache.jena.sys.JenaSystem.init(JenaSystem.java:89)
 at org.apache.jena.graph.NodeFactory.(NodeFactory.java:39)
 at


org.apache.jena.rdf.model.impl.ResourceImpl.fresh(ResourceImpl.java:150)

 at


org.apache.jena.rdf.model.impl.ResourceImpl.(ResourceImpl.java:86)

 at


org.apache.jena.rdf.model.ResourceFactory$Impl.createResource(ResourceFactory.java:308)

 at


org.apache.jena.rdf.model.ResourceFactory.createResource(ResourceFactory.java:94)

 at org.apache.jena.vocabulary.RDF.resource(RDF.java:54)
 at org.apache.jena.vocabulary.RDF.(RDF.java:65)
 at

org.apache.jena.riot.lang.ReaderTriX.(ReaderTriX.java:116)

 at


org.apache.jena.riot.RDFParserRegistry.initStandard(RDFParserRegistry.java:74)

 at
org.apache.jena.riot.RDFParserRegistry.init(RDFParserRegistry.java:57)
 at


org.apache.jena.riot.RDFParserRegistry.(RDFParserRegistry.java:52)

 at

org.apache.jena.riot.RDFLanguages.isQuads(RDFLanguages.java:370)

 at


org.apache.jena.rdflink.RDFLinkHTTPBuilder.quadsFormat(RDFLinkHTTPBuilder.java:172)

 at


org.apache.jena.rdflink.RDFLinkFuseki.setupForFuseki(RDFLinkFuseki.java:65)

 at


org.apache.jena.rdflink.RDFLinkFuseki.newBuilder(RDFLinkFuseki.java:45)

 at


org.apache.jena.rdfconnection.RDFConnectionFuseki$RDFConnectionFusekiBuilder.(RDFConnectionFuseki.java:75)

 at


org.apache.jena.rdfconnection.RDFConnectionFuseki.create(RDFConnectionFuseki.java:61)

 at


info.resc.pontypandy.store.jena.fuseki.JenaFusekiStore.(JenaFusekiStore.java:52)

 at


info.resc.pontypandy.store.jena.fuseki.JenaFusekiStoreFactory.getStore(JenaFusekiStoreFactory.java:75)

 at


info.resc.pontypandy.store.jena.fuseki.JenaFusekiStoreFactory.getStore(JenaFusekiStoreFactory.java:16)

 at


info.resc.pontypandy.store.StoreFactory.getStoreFromJNDI(StoreFactory.java:292)

 at nl.netage.storetest.Reader.test(Reader.java:41)
 at


java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)

 at java.base/java.lang.reflect.Method.invoke(Method.java:580)
 at


org.apache.openejb.server.cxf.rs.PojoInvoker.performInvocation(PojoInvoker.java:43)

 at


org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)

 ... 37 more
Caused by: java.lang.NullPointerException: Cannot invoke
"org.apache.jena.rdf.model.Resource.asNode()" because
"org.apache.jena.vocabulary.RDF.Alt" is null
 at org.apache.jena.vocabulary.RDF$Nodes.(RDF.java:193)
 ... 75 more








Re: Issues with WAR when upgrading from 5.0.0 to 5.1.0

2024-08-31 Thread Andy Seaborne

Hi Bart,

In nl.netage.storetest.Reader

could you add this as a class static initializer at the beginning of the 
class:


    static { JenaSystem.init(); }

if that works, adding it to info.resc.pontypandy.store.StoreFactory 
(assuming that's your code).



In general, especially if there might be multiple threads very early on 
(not the case here), it is a good idea to call JenaSystem.init().
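
For illustration, a minimal sketch of what that looks like in a class (the 
names are placeholders, not the actual application code):

  import org.apache.jena.sys.JenaSystem;

  public class Reader {
      // Force Jena initialization once, when this class is loaded,
      // before any other Jena classes are touched.
      static { JenaSystem.init(); }

      public void test() {
          // ... create the RDFConnection / RDFLink here ...
      }
  }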


My regards to Sam,

    Andy

On 31/08/2024 19:04, Bart van Leeuwen wrote:

Hi,

I've started upgrading my components from 5.0.0 to 5.1.0 and I run 
into the following stacktrace when creating a remotefuseki connection


I suspect an initialization issue, but I cannot find any reference in 
the release notes.


=

Caused by: java.lang.ExceptionInInitializerError
    at 
org.apache.jena.sparql.engine.optimizer.reorder.ReorderFixed.type(ReorderFixed.java:61)
    at 
org.apache.jena.sparql.engine.optimizer.reorder.ReorderFixed.(ReorderFixed.java:63)
    at 
org.apache.jena.sparql.engine.optimizer.reorder.ReorderLib.fixed(ReorderLib.java:76)

    at org.apache.jena.tdb1.sys.SystemTDB.(SystemTDB.java:193)
    at org.apache.jena.tdb1.TDB1.(TDB1.java:99)
    at org.apache.jena.tdb1.sys.InitTDB.start(InitTDB.java:29)
    at 
org.apache.jena.base.module.Subsystem.lambda$initialize$1(Subsystem.java:117)

    at java.base/java.util.ArrayList.forEach(ArrayList.java:1596)
    at org.apache.jena.base.module.Subsystem.forEach(Subsystem.java:193)
    at org.apache.jena.base.module.Subsystem.forEach(Subsystem.java:169)
    at 
org.apache.jena.base.module.Subsystem.initialize(Subsystem.java:115)

    at org.apache.jena.sys.JenaSystem.init(JenaSystem.java:89)
    at org.apache.jena.graph.NodeFactory.(NodeFactory.java:39)
    at 
org.apache.jena.rdf.model.impl.ResourceImpl.fresh(ResourceImpl.java:150)
    at 
org.apache.jena.rdf.model.impl.ResourceImpl.(ResourceImpl.java:86)
    at 
org.apache.jena.rdf.model.ResourceFactory$Impl.createResource(ResourceFactory.java:308)
    at 
org.apache.jena.rdf.model.ResourceFactory.createResource(ResourceFactory.java:94)

    at org.apache.jena.vocabulary.RDF.resource(RDF.java:54)
    at org.apache.jena.vocabulary.RDF.(RDF.java:65)
    at org.apache.jena.riot.lang.ReaderTriX.(ReaderTriX.java:116)
    at 
org.apache.jena.riot.RDFParserRegistry.initStandard(RDFParserRegistry.java:74)
    at 
org.apache.jena.riot.RDFParserRegistry.init(RDFParserRegistry.java:57)
    at 
org.apache.jena.riot.RDFParserRegistry.(RDFParserRegistry.java:52)

    at org.apache.jena.riot.RDFLanguages.isQuads(RDFLanguages.java:370)
    at 
org.apache.jena.rdflink.RDFLinkHTTPBuilder.quadsFormat(RDFLinkHTTPBuilder.java:172)
    at 
org.apache.jena.rdflink.RDFLinkFuseki.setupForFuseki(RDFLinkFuseki.java:65)
    at 
org.apache.jena.rdflink.RDFLinkFuseki.newBuilder(RDFLinkFuseki.java:45)
    at 
org.apache.jena.rdfconnection.RDFConnectionFuseki$RDFConnectionFusekiBuilder.(RDFConnectionFuseki.java:75)
    at 
org.apache.jena.rdfconnection.RDFConnectionFuseki.create(RDFConnectionFuseki.java:61)
    at 
info.resc.pontypandy.store.jena.fuseki.JenaFusekiStore.(JenaFusekiStore.java:52)
    at 
info.resc.pontypandy.store.jena.fuseki.JenaFusekiStoreFactory.getStore(JenaFusekiStoreFactory.java:75)
    at 
info.resc.pontypandy.store.jena.fuseki.JenaFusekiStoreFactory.getStore(JenaFusekiStoreFactory.java:16)
    at 
info.resc.pontypandy.store.StoreFactory.getStoreFromJNDI(StoreFactory.java:292)

    at nl.netage.storetest.Reader.test(Reader.java:41)
    at 
java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)

    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at 
org.apache.openejb.server.cxf.rs.PojoInvoker.performInvocation(PojoInvoker.java:43)
    at 
org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)

    ... 37 more
Caused by: java.lang.NullPointerException: Cannot invoke 
"org.apache.jena.rdf.model.Resource.asNode()" because 
"org.apache.jena.vocabulary.RDF.Alt" is null

    at org.apache.jena.vocabulary.RDF$Nodes.(RDF.java:193)
    ... 75 more


Re: [EXTERNAL] Re: Configuring logging in Fuseki

2024-08-29 Thread Andy Seaborne

Hi Scott,

They are not logged normally - they are in verbose mode.

There are some issues with logging them due to scaling issues.

* Update can be very large
* Update request are streamed

Because of streaming, the whole text is not available at the point of 
logging (at the operation start).


An update request can be many operations and each operation is executed 
(but not committed) before the next one is read and processed.


Something like:

INSERT DATA {
  
} ;
INSERT DATA {
  
} ;
DELETE DATA {
  
} ;
INSERT DATA {
  
} ;

Adding large amounts of data is often better done by POSTing RDF.
RDF patch can be better, but not a standard, for a mix of adds and deletes.
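
For example, assuming a dataset named /dataset with a read-write Graph Store 
Protocol endpoint "data", a plain HTTP upload might look like:

  # POST Turtle into the default graph; add ?graph=http://example/g for a named graph
  curl -X POST --data-binary @data.ttl \
       -H 'Content-Type: text/turtle' \
       http://localhost:3030/dataset/data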

Maybe the use of "small updates" dominates usage and could be logged.

Do you want the whole update logged or would a summary be sufficient?

Please do raise a issue at
https://github.com/apache/jena/issues

and we can gather input.

Andy

On 28/08/2024 21:41, Scott Henninger wrote:

That works nicely, thank you for the worked out example, Øyvind.

A follow-up question.  I noticed that queries are logged, but not updates (i.e. 
SPARQL update).  Is there a logger configuration specifically for updates?  
Logging queries is fine but I’d also like to know when and how my data is 
updated.

Thanks in advance,
-- Scott


Re: fuseki service description and sparql service description

2024-08-28 Thread Andy Seaborne




On 27/08/2024 02:52, Paul Tyson wrote:
I have a few questions to help me figure out the relationship between 
fuseki service descriptions and the SPARQL service-description schema 
[1]. My goal is to translate between fuseki:* and sd:* graphs, to use 
fuseki:* internally in the app, and expose service information as sd:* 
graphs.


Is there any code or documentation that would establish a mapping 
between the fuseki and sd schemas? E.g. between fuseki:Service and 
sd:Service, and fuseki:endpoint and sd:endpoint. I can infer some 
relationships from the examples, but would like to put it on a more 
solid footing.

That would be interesting to hear about.

The Fuseki configuration was not intended to be SPARQL 
service-descriptions; it is purely about configuring a Fuseki server.


> Is the http://jena.apache.org/fuseki# schema declared anywhere? Or, is
> there code or documentation to help understand it?
Code:
org.apache.jena.fuseki.build.FusekiConfig and nearby.

It is a mix of Jena assemblers for the datasets and Fuseki-specific for 
the services/endpoints


Examples:
https://github.com/apache/jena/tree/main/jena-fuseki2/examples

Description:
https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html

An additional question, at the operational level. Using fuseki 5.1.0. 
The SPARQL service description spec Section 2 says conforming SPARQL 
protocol servers [2] should provide a service description in response to 
a GET request with no query params at the root endpoint. But my services 
at localhost:3030/{dataset-name}/ just return the text:


   Service Description: /{dataset-name}

if I have all the services mapped to "/" as well as "/gsp", "/query", "/ 
update".


When I remove the "/" mappings, I get "No endpoint for request".

Is there a way to configure fuseki so it returns an SD graph from the 
root of the endpoint when no query is present?


Not currently.

The problem is that "GET /dataset" is normally mapped to an HTTP GET of 
the dataset data. That's natural for HTTP.


It competes with providing service descriptions. They overlap in their 
content types as well.


There could be a fuseki:operation fuseki:ServiceDescription - and it 
can be Content-type sensitive - that would add the functionality, and the 
system would provide either "GET dataset" (GSP-R supports quads) xor the 
service description.


   Andy



Thanks,
--Paul

[1] https://www.w3.org/TR/sparql11-service-description/
[2] https://www.w3.org/TR/sparql11-protocol/





Re: jena vs jena-fuseki: different behaviour for namespace replacement by URI in query answer

2024-08-27 Thread Andy Seaborne




On 26/08/2024 11:43, Simon Bin wrote:

You can also get the current prefix mapping from Fuseki if you do this
query:

CONSTRUCT WHERE { }

You can also update the prefix mapping with

LOAD 


True - both Jena behaviours and not required by the standards.

RDF-Patch can add/replace and delete prefixes.

But not always convenient.

The original reason for prefixes was to support UI code. Parsing RDF 
just to get prefixes, or making a file for LOAD is more work for the UI 
developer.


fuseki2/prefixes-service is a simple JSON return for lookup.

Andy



Cheers,


On Mon, 2024-08-26 at 10:49 +0200, christophe heligon wrote:

Dear Andy,
thank you for this explanation.
Christophe

- Mail original -
De: "Andy Seaborne" 
À: "users" 
Envoyé: Vendredi 23 Août 2024 12:46:56
Objet: Re: jena vs jena-fuseki: different behaviour for namespace
replacement by URI in query answer

On 22/08/2024 13:24, christophe heligon wrote:

Dear community,

Jena and Jena-fuseki 4.9/4.10/5.1 do not display the results
(namespace/
prefix vs uri) similarly in my hands and I wonder what I did wrong.


Nothing!

SPARQL SELECT results (any format) do not include prefixes and the
3rd
party component (the YASR part of YASGUI) using CodeMirror doesn't
have
a way of getting them.

Prefixes are only surface syntax and not part of RDF conceptual data
model. Each dataset records prefixes.

We recently introduced a "prefixes API" so applications can get and
update the prefixes for a dataset.

https://jena.apache.org/documentation/fuseki2/prefixes-service

but of course that's a Jena feature, not a standard.

It would be good if YASR could be prefixes sensitive. It may be
something the Jena UI javascript can do as YASR is supposed to be
modular and flexible.

  Andy


I use the applications via downloading from jena website or through
the
stain/jena docker images.
I load the same data set in both context(tdb2.tdbloader --loc DB
$files
or upload manually in fuseki GUI).
I send the same query to both either through ./bin/tdb2.tdbquery --
loc
DB --file request.sq or through fuseki GUI.
The replies are different as fuseki will replace all prefixes by
URIs
and jena will replace some only and I did not see why some and not
others. See below.

Can someone explain this behaviour to me?

Best regards,
Christophe








Re: Named graphs and inference

2024-08-27 Thread Andy Seaborne




On 26/08/2024 22:57, Scott Henninger wrote:

Is it possible to configure Fuseki to run inferences across named graphs and 
have the triples, including the inferred triples, appear in the default graph?


I don't think it's possible if you want to do anything more complicated in 
the inference than your experiment.


There is something specific to RDFS (subClassOf, subPropertyOf, domain 
and range, not the axiomatic inferences of full RDFS).


https://jena.apache.org/documentation/rdfs/

It's backward chaining at runtime. The separate schema is processed and 
subX expanded but the data isn't updated.



In my attempt to do this, included below (along with the triples I'm testing 
with), I have a configuration that will run the inferences as desired and they 
are materialized in the default graph.  However those inferences persist after 
performing a DROP GRAPH on the named graphs.


In the configuration, there aren't any named graphs. See below.


 In fact, the inferences persist even after restarting Fuseki.  I have to 
manually do a DELETE on the default graph to get them to go away.

So is it possible to run inference, see the inferred triples on the default 
graph, and be able to manage the triples with the named graphs?

Thanks in advance
-- Scott

@prefix :    .
@prefix fuseki:  .
@prefix ja:  .
@prefix rdf: .
@prefix rdfs:    .
@prefix tdb:    .

# --- the basic fuseki service (endpoints) with pointer to dataset --
:service_tdb_all  a   fuseki:Service ;
 fuseki:dataset:dataset ;
 fuseki:name   "test-inf" ;
 fuseki:serviceQuery   "query" , "" , "sparql" ;
 fuseki:serviceReadGraphStore  "get" ;
 fuseki:serviceReadQuads   "" ;
 fuseki:serviceReadWriteGraphStore "data" ;
 fuseki:serviceReadWriteQuads  "" ;
 fuseki:serviceUpdate  "" , "update" ;
 fuseki:serviceUpload  "upload" .

# --- define the default graph (with inference)  ---
:dataset a ja:RDFDataset ;
 ja:defaultGraph :model_inf ;


Default graph.


.

# --- add inference to the graph ---
:model_inf a ja:InfModel ;
  ja:baseModel :graph ;
  ja:reasoner [
  ja:reasonerURL 
  ] ;
.

:graph rdf:type tdb:GraphTDB ;


Default graph.


   tdb:dataset :tdb_dataset_readwrite ;
   tdb:unionDefaultGraph true ;


I think that is going to be ignored - it has meaning on the dataset, not 
a graph.




.

:tdb_dataset_readwrite
 a   tdb:DatasetTDB2 ;
 tdb:location  "../Fuseki/run/databases/dev" ;
.

##
# Loaded in http://example.org/test-ont
@prefix ex:  .
@prefix owl:  .
@prefix rdfs:  .

ex:Class1 a owl:Class .

ex:SubClass1 a owl:Class ;
 rdfs:subClassOf ex:Class1 .

##
# Loaded in http://example.org/test-data (inference in default graph is as 
expected, that ex:Instance1 is a member of ex:Class1)
@prefix ex:  .

ex:Instance1 a ex:SubClass1







Re: Should {?s ?p ?o} match against both named graph triples and default graph triples?

2024-08-25 Thread Andy Seaborne



On 25/08/2024 15:56, Bob DuCharme wrote:
(Sorry about using the older Fuseki that I just happened to have on my 
hard disk before; I think the problem may have been that I was using the 
wrong version of the data with it.)


This query gets 6 rows of results from Fuseki 5.1.0, 10 from GraphDB 
10.7.1,


To get 10 there must be duplicates with different ?g in GraphDB (or 
DISTINCT is mishandled). There are two ways to get the same s/p/o 
triples. There are only 6 triples! 2 default graph and 2 x 2 named graph.


and 12 from Blazegraph. (In Blazegraph, the "one" and "two" 
triples show up once with no value for ?g and once with bd:nullGraph as 
the ?g value.)


    SELECT DISTINCT ?g ?s ?p ?o {
   { ?s ?p ?o }
   UNION
     { GRAPH ?g {?s ?p ?o }}
    }



What about without the ?g

   SELECT DISTINCT ?s ?p ?o {
{ ?s ?p ?o }
UNION
 { GRAPH ?g {?s ?p ?o }}
   }


In your first message we have "SELECT * WHERE {?s ?p ?o}"
so three result columns. Without the ?g is the nearest equivalent.

Or CONSTRUCT { ?s ?p ?o } WHERE  {
{ ?s ?p ?o }
UNION
 { GRAPH ?g {?s ?p ?o }}
   }

because a graph is a set of triples.
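A rough local sketch of that, assuming an in-memory Dataset already holding the six 
example triples; the CONSTRUCT result is a Model, i.e. a set of triples, so the 
default-graph/named-graph duplicates collapse:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.rdf.model.Model;

    public class AllTriples {
        public static void main(String[] args) {
            Dataset ds = DatasetFactory.createTxnMem();   // load the example data here
            String qs = """
                CONSTRUCT { ?s ?p ?o }
                WHERE { { ?s ?p ?o } UNION { GRAPH ?g { ?s ?p ?o } } }
                """;
            try (QueryExecution qExec = QueryExecutionFactory.create(qs, ds)) {
                Model all = qExec.execConstruct();
                // A Model is a set of triples: no row-per-route duplicates.
                System.out.println(all.size() + " distinct triples");
            }
        }
    }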

Andy

This next query gets 6 rows with all three triplestores, with no value 
for ?g for the "one" and "two" triples in Fuseki and GraphDB, and 
bd:nullGraph as the ?g value in Blazegraph:


SELECT ?g ?s ?p ?o
WHERE
{
   { ?s ?p ?o
     MINUS { GRAPH ?g {?s ?p ?o} }
   }
   UNION
   { GRAPH ?g { ?s ?p ?o } }
}

It's less efficient than the first, but any "give me absolutely all the 
triples" query is asking a lot of the processor anyway. I tend to use 
them for pedagogical demos with small datasets, e.g. after a named 
graph-related update query to show the effect of that query.


Thanks,

Bob


On 8/24/24 3:27 PM, Andy Seaborne wrote:



On 24/08/2024 18:33, Bob DuCharme wrote:
Thanks Andy! I understand a bit, but have some followup questions. 
For one thing, I couldn't figure out which part of 
https://jena.apache.org/about_jena/security-advisories.html was relevant to 
this.


"Jena (tested with Fuseki 4.6.1)" -- always good to keep up-to-date. 
Hint, hint.



Is it oversimplifying if I said:

- many triplestores treat the default graph as the union of all 
named graphs together with the triples in any unnamed graphs


- Jena does not do this by default but can be configured to


and that other triplestores do not support the full range that the 
specs allow especially update.


This query shows the six triples from the original example in Fuseki 
but shows ten rows of results in GraphDB (and, I will assume in the 
other triplestores I tried before sending my original email):


    SELECT DISTINCT ?s ?p ?o {
       { ?s ?p ?o }
       UNION
       { GRAPH ?g { ?s ?p ?o } }
    }


That query is "all triples" in default and named graphs.

Given there are only 6 "?s ?p ?o" anywhere in your example data I 
don't see how "DISTINCT ?s ?p ?o" can be 10.


If it's 10, then there must be duplicates after DISTINCT.

GraphDB finds triples via two routes - default graph and named graph - 
hence 10 in quads ?g/?s/?p/?o


Please try in other stores.

This is more verbose but got consistent results with the two 
triplestores. Any comments or suggestions?


SELECT ?g ?s ?p ?o
WHERE
{
 { ?s ?p ?o
   MINUS { GRAPH ?g {?s ?p ?o} }
 }
   UNION
   { GRAPH ?g { ?s ?p ?o } }
}


More expensive.  BTW it has ?g in the projection. Do you get rows with
?g undefined?



Thanks,

Bob


On 8/24/24 10:40 AM, Andy Seaborne wrote:



On 24/08/2024 15:20, Bob DuCharme wrote:
Imagine running the short INSERT query at 
https://www.learningsparql.com/2ndeditionexamples/ex338.ru on an empty 
dataset. It inserts six triples: two in the default graph and two 
each into two named graphs, for a total of six triples.


Next, we run the query SELECT * WHERE {?s ?p ?o}. In some 
triplestores this returns a row for each of the six triples in the 
dataset, but Jena (tested with Fuseki 4.6.1)


Don't forget:
https://jena.apache.org/about_jena/security-advisories.html

> only returns rows for the two triples in the default graph.

It depends on the setup.

The default graph is a separate graph unless the setup configures it 
to be viewed as the union of all named graphs.


In your example, you'll get 4 triples.

e.g. for TDB2:

:dataset_tdb2 rdf:type  tdb2:DatasetTDB2 ;
    tdb2:location "DB2" ;
    ## Optional - with union default
    ## for query and update WHERE matching.
    tdb2:unionDefaultGraph true ;
    .

Note - there still is a storage-level default graph. 
unionDefaultGraph is a view.


I can't find anything in the Recommendation about what should 
happen here.


The specs don't say much about how the dataset to be queried comes 
into being without any FROMs.


This is intentional because treating the default graph as the union of 
named graphs has been a pattern since before SPARQL 1.0.

Re: Should {?s ?p ?o} match against both named graph triples and default graph triples?

2024-08-24 Thread Andy Seaborne




On 24/08/2024 18:33, Bob DuCharme wrote:
Thanks Andy! I understand a bit, but have some followup questions. For 
one thing, I couldn't figure out which part of 
https://jena.apache.org/about_jena/security-advisories.html was relevant to this.


"Jena (tested with Fuseki 4.6.1)" -- always good to keep up-to-date. 
Hint, hint.



Is it oversimplifying if I said:

- many triplestores treat the default graph as the union of all 
named graphs together with the triples in any unnamed graphs


- Jena does not do this by default but can be configured to


and that other triplestores do not support the full range that the specs 
allow especially update.


This query shows the six triples from the original example in Fuseki but 
shows ten rows of results in GraphDB (and, I will assume in the other 
triplestores I tried before sending my original email):


    SELECT DISTINCT ?s ?p ?o {
       { ?s ?p ?o }
       UNION
       { GRAPH ?g { ?s ?p ?o } }
    }


That query is "all triples" in default and named graphs.

Given there are only 6 "?s ?p ?o" anywhere in your example data I don't 
see how "DISTINCT ?s ?p ?o" can be 10.


If it's 10, then there must be duplicates after DISTINCT.

GraphDB finds triples via two routes - default graph and named graph - 
hence 10 in quads ?g/?s/?p/?o


Please try in other stores.

This is more verbose but got consistent results with the two 
triplestores. Any comments or suggestions?


SELECT ?g ?s ?p ?o
WHERE
{
     { ?s ?p ?o
   MINUS { GRAPH ?g {?s ?p ?o} }
     }
   UNION
   { GRAPH ?g { ?s ?p ?o } }
}


More expensive.  BTW it has ?g in the projection. Do you get rows with
?g undefined?



Thanks,

Bob


On 8/24/24 10:40 AM, Andy Seaborne wrote:



On 24/08/2024 15:20, Bob DuCharme wrote:
Imagine running the short INSERT query at 
https://www.learningsparql.com/2ndeditionexamples/ex338.ru on an empty 
dataset. It inserts six triples: two in the default graph and two 
each into two named graphs, for a total of six triples.


Next, we run the query SELECT * WHERE {?s ?p ?o}. In some 
triplestores this returns a row for each of the six triples in the 
dataset, but Jena (tested with Fuseki 4.6.1)


Don't forget:
https://jena.apache.org/about_jena/security-advisories.html

> only returns rows for the two triples in the default graph.

It depends on the setup.

The default graph is a separate graph unless the setup configures it 
to be viewed as the union of all named graphs.


In your example, you'll get 4 triples.

e.g. for TDB2:

:dataset_tdb2 rdf:type  tdb2:DatasetTDB2 ;
    tdb2:location "DB2" ;
    ## Optional - with union default
    ## for query and update WHERE matching.
    tdb2:unionDefaultGraph true ;
    .

Note - there still is a storage-level default graph. unionDefaultGraph 
is a view.


I can't find anything in the Recommendation about what should happen 
here.


The specs don't say much about how the dataset to be queried comes into 
being without any FROMs.


This is intentional because treating the default graph as the union of 
named graphs has been a pattern since before SPARQL 1.0.


In some systems, the default is really a named graph

It looks like https://www.w3.org/TR/sparql11-query/#namedAndDefaultGraph 
would have been the place to clarify this. Is 
the issue of proper behavior in this case just up to the implementer, 
like what DESCRIBE is supposed to return? Or is it something that the 
sparql12 group should clarify and document? (Or maybe they have and I 
missed it...)


https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0004/sep-0004.md
https://github.com/w3c/sparql-dev/issues/43

There should be three names for the three cases:

DEFAULT
UNION (of named graphs)
ALL - a set view of all triples:
   SELECT DISTINCT ?s ?p ?o {
      { ?s ?p ?o }
      UNION
      { GRAPH ?g { ?s ?p ?o } }
   }

    Andy



Thanks,

Bob








Re: [EXTERNAL] Re: Using Fuseki to query the union of datasets

2024-08-24 Thread Andy Seaborne




On 24/07/2024 20:08, Scott Henninger wrote:

Thank you for the response, Pedro.  However it seems to me that I am already 
applying the approach you reference in the page.  See lines 35-37 int the 
config file I included.  Perhaps I am applying it wrong?  If so, where am I 
going wrong?


You don't need to go through a model. Connect the fuseki:Service to the 
dataset with tdb:unionDefaultGraph



:service1 rdf:type fuseki:Service ;
fuseki:name "union-graph" ;
fuseki:serviceQuery "sparql" ;
fuseki:dataset :tdbDataset .

:tdbDataset rdf:type tdb:DatasetTDB ;
tdb:location "DB" ;
tdb:unionDefaultGraph true ;
.

Andy



-- Scott

From: Pedro 
Sent: Tuesday, July 23, 2024 4:35 PM
To: users@jena.apache.org
Subject: [EXTERNAL] Re: Using Fuseki to query the union of datasets


Hi Scott

Look at the TDB and TDB2 examples in
https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html

Uncomment the line

tdb:unionDefaultGraph true;

Cheers!

On Tue, 23 Jul 2024, 22:29 Scott Henninger, 
mailto:scott.hennin...@ecstech.com>>
wrote:


I am attempting to use Fuseki to query the union of a set of datasets. I
am using the following configuration to define a TDB dataset named
"union-graph" and setting tdb:unionDefaultGraph to true to query the union
of the datasets. However when I query "union-graph" I do not see the
triples from the datasets I have populated. How can I configure Fuseki to
query the union of datasets?

Thank you
-- Scott

@prefix : <#> .
@prefix fuseki: 
> .
@prefix rdf: 
>
 .

@prefix rdfs: 
> .
@prefix tdb: 
> .
@prefix ja: 
>
 .

[] rdf:type fuseki:Server ;
fuseki:services (
:service1
) .

# Custom code.
[] ja:loadClass "com.hp.hpl.jena.tdb.TDB" .

# TDB
tdb:DatasetTDB rdfs:subClassOf ja:RDFDataset .
tdb:GraphTDB rdfs:subClassOf ja:Model .

:service1 rdf:type fuseki:Service ;
fuseki:name "union-graph" ;
fuseki:serviceQuery "sparql" ;
fuseki:dataset :dataset ;
.

:dataset rdf:type ja:RDFDataset ;
ja:defaultGraph :model_default ;
.

:model_default a ja:InfModel ;
ja:baseModel :tdbGraph ;
.

:tdbDataset rdf:type tdb:DatasetTDB ;
tdb:location "DB" ;
tdb:unionDefaultGraph true ;
.

:tdbGraph rdf:type tdb:GraphTDB ;
tdb:dataset :tdbDataset ;
.






Re: Configuring logging in Fuseki

2024-08-24 Thread Andy Seaborne




On 23/08/2024 18:39, Øyvind Gjesdal wrote:

Hi Scott,

I think you are right. There is more info on logging for fuseki in
https://jena.apache.org/documentation/fuseki2/fuseki-logging.html which
also contains a link to a default file you can fit to your needs.

Best regards,
Øyvind


Yes, and in addition, if you want to take complete control of 
logging you can configure log4j directly by setting the environment 
variable LOGGING (see the fuseki-server script).


e.g.

export LOGGING="-Dlog4j.configurationFile=file:"

and use XML or JSON formats.



On Fri, Aug 23, 2024 at 7:21 PM Scott Henninger 
wrote:


Is it possible to configure logging from the command line for Fuseki?  I
found the following link that indicates it looks for the file
log4j2.properties, but I don't see anything about what goes in that
properties file: Apache Jena - Running Fuseki with UI
<https://jena.apache.org/documentation/fuseki2/fuseki-webapp.html#configuring-logging>




First of all is it true that configuring logging is a matter of including
the log4j2.properties file and if so what should appear in the file to set
logging levels?

Thank you
-- Scott







Re: Should {?s ?p ?o} match against both named graph triples and default graph triples?

2024-08-24 Thread Andy Seaborne




On 24/08/2024 15:20, Bob DuCharme wrote:
Imagine running the short INSERT query at 
https://www.learningsparql.com/2ndeditionexamples/ex338.ru on an empty dataset. 
It inserts six triples: two in the default graph and two each into two 
named graphs, for a total of six triples.


Next, we run the query SELECT * WHERE {?s ?p ?o}. In some triplestores 
this returns a row for each of the six triples in the dataset, but Jena 
(tested with Fuseki 4.6.1)


Don't forget:
https://jena.apache.org/about_jena/security-advisories.html

> only returns rows for the two triples in the default graph.

It depends on the setup.

The default graph is a separate graph unless the setup configures it to 
be viewed as the union of all named graphs.


In your example, you'll get 4 triples.

e.g. for TDB2:

:dataset_tdb2 rdf:type  tdb2:DatasetTDB2 ;
tdb2:location "DB2" ;
## Optional - with union default
## for query and update WHERE matching.
tdb2:unionDefaultGraph true ;
.

Note - there still is a storage-level default graph. unionDefaultGraph 
is a view.
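As an aside, even without tdb2:unionDefaultGraph the union view is reachable by 
naming Jena's special graph <urn:x-arq:UnionGraph> in the query; a minimal Java 
sketch against a local TDB2 database (the "DB2" location is taken from the 
example above):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSetFormatter;
    import org.apache.jena.system.Txn;
    import org.apache.jena.tdb2.TDB2Factory;

    public class UnionGraphQuery {
        public static void main(String[] args) {
            Dataset ds = TDB2Factory.connectDataset("DB2");
            String qs = "SELECT * { GRAPH <urn:x-arq:UnionGraph> { ?s ?p ?o } }";
            Txn.executeRead(ds, () -> {
                try (QueryExecution qExec = QueryExecutionFactory.create(qs, ds)) {
                    // Union of the named graphs only; the storage default graph
                    // is not included in this view.
                    ResultSetFormatter.out(qExec.execSelect());
                }
            });
        }
    }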


I can't find anything in the Recommendation about what should happen 
here.


The specs don't say much about how the dataset to be queried comes into 
being without any FROMs.


This is intentional because treating the default graph as the union of 
named graphs has been a pattern since before SPARQL 1.0.


In some systems, the default is really a named graph

It looks like https://www.w3.org/TR/sparql11-query/#namedAndDefaultGraph 
would have been the place to clarify this. Is the 
issue of proper behavior in this case just up to the implementer, like 
what DESCRIBE is supposed to return? Or is it something that the 
sparql12 group should clarify and document? (Or maybe they have and I 
missed it...)


https://github.com/w3c/sparql-dev/blob/main/SEP/SEP-0004/sep-0004.md
https://github.com/w3c/sparql-dev/issues/43

There should be three names for the three cases:

DEFAULT
UNION (of named graphs)
ALL - a set view of all triples:
   SELECT DISTINCT ?s ?p ?o {
      { ?s ?p ?o }
      UNION
      { GRAPH ?g { ?s ?p ?o } }
   }

Andy



Thanks,

Bob






Re: jena vs jena-fuseki: different behaviour for namespace replacement by URI in query answer

2024-08-23 Thread Andy Seaborne




On 22/08/2024 13:24, christophe heligon wrote:

Dear community,

Jena and Jena-fuseki 4.9/4.10/5.1 do not display the results (namespace/ 
prefix vs uri) similarly in my hands and I wonder what I did wrong.


Nothing!

SPARQL SELECT results (any format) do not include prefixes and the 3rd 
party component (the YASR part of YASGUI) using CodeMirror doesn't have 
a way of getting them.


Prefixes are only surface syntax and not part of the RDF conceptual data 
model. Each dataset records its prefixes.
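Client-side, those stored prefixes can still be re-applied when printing; a rough 
Java sketch, assuming a local TDB2 database whose default model carries the 
dataset's prefix map (the "DB" location is a placeholder):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.ResultSet;
    import org.apache.jena.shared.PrefixMapping;
    import org.apache.jena.system.Txn;
    import org.apache.jena.tdb2.TDB2Factory;

    public class ShortNames {
        public static void main(String[] args) {
            Dataset ds = TDB2Factory.connectDataset("DB");
            Txn.executeRead(ds, () -> {
                PrefixMapping pm = ds.getDefaultModel();   // prefixes recorded with the dataset
                String qs = "SELECT ?s { ?s ?p ?o } LIMIT 5";
                try (QueryExecution qExec = QueryExecutionFactory.create(qs, ds)) {
                    ResultSet rs = qExec.execSelect();
                    while (rs.hasNext()) {
                        String uri = rs.next().getResource("s").getURI();
                        System.out.println(pm.shortForm(uri)); // re-apply prefixes client-side
                    }
                }
            });
        }
    }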


We recently introduced a "prefixes API" so applications can get and 
update the prefixes for a dataset.


https://jena.apache.org/documentation/fuseki2/prefixes-service

but of course that's a Jena feature, not a standard.

It would be good if YASR could be prefix-aware. It may be 
something the Jena UI javascript can do as YASR is supposed to be 
modular and flexible.


Andy

I use the applications via downloading from jena website or through the 
stain/jena docker images.
I load the same dataset in both contexts (tdb2.tdbloader --loc DB $files, 
or upload manually in the Fuseki GUI).
I send the same query to both, either through ./bin/tdb2.tdbquery --loc 
DB --file request.sq or through the Fuseki GUI.
The replies are different: Fuseki replaces all prefixes with URIs, while 
Jena replaces only some, and I could not see why some and not others. 
See below.


Can someone explain this behaviour to me?

Best regards,
Christophe


Re: Help building a clean Fuseki instance with GeoSPARQL, admin protocol and UI

2024-08-13 Thread Andy Seaborne

Hi Ludovic,



On 25/07/2024 15:06, Ludovic Muller wrote:

Hi!

Thank you for all your great work on Apache Jena and Fuseki!

I need a Fuseki instance that has the following:

- the Fuseki core features
- the [HTTP Administration 
Protocol](https://jena.apache.org/documentation/fuseki2/fuseki-server-protocol.html)
- the UI
- GeoSPARQL support
- shiro, that can read its configuration from a `shiro.ini` file
- get the Fuseki configuration from a `config.ttl` file (where we can define 
all the datasets we need, it can be GeoSPARQL or non-GeoSPARQL ones) ; I don't 
need any help for writing the file, I just need the instance to be able to read 
them.

I managed, with a trick, to get it working for the 4.x versions of Apache Jena, but 
I'm not able to upgrade to 5.x, as it always fails to start.

The way I did it was to 
[patch](https://github.com/zazuko/fuseki-geosparql/blob/226c94e845005d306438ef4415602fd3c40cdc27/patches/enable-geosparql.diff)
 the `jena-fuseki2/jena-fuseki-core/pom.xml` file to include the 
`jena-geosparql` artifact, and then run `mvn package -Dmaven.javadoc.skip=true 
-DskipTests` in `jena-fuseki2`.

This is how everything was built: 
https://github.com/zazuko/fuseki-geosparql/blob/226c94e845005d306438ef4415602fd3c40cdc27/Dockerfile

Do you have a cleaner and more stable way to create such a Fuseki instance?
By this, I think of creating a new dedicated Maven project (or similar) that 
includes all the necessary dependencies and configurations.
But I will need some help with that, as I never managed to get all parts 
working together.

I know that the Fuseki HTTP Administration Protocol does not support GeoSPARQL 
features (for example create a GeoSPARQL dataset from here) ; it could be great 
to have this in the future (if it was not already done), but for now the focus 
is on getting the Fuseki instance with all the mentioned features again, so 
that we can upgrade in an easier way in the future.
The administration protocol is great for the ping, for the compaction, and many 
other great features, that's why I still need it.
Creating the datasets is done manually in the ttl configuration file, so if 
it’s not done using the administration protocol, that is not a huge issue for now.

Right now, when I try to build it using the same way I've done it until now, I 
get the following error:

```
[fuseki 2024-07-25 07:52:45:534 +] [main] WARN 
org.eclipse.jetty.ee10.webapp.WebAppContext {} - Failed startup of context 
oeje10w.WebAppContext@13ebccd{org.apache.jena.fuseki.Servlet,/,b=file:///opt/fuseki/webapp/,a=STOPPED,h=oeje10s.SessionHandler@4e80960a{STOPPED}}
 

java.lang.IllegalArgumentException: Unable to mount FileSystem from unsupported 
URI: jar:file:/opt/fuseki/fuseki-server.jar!/
  at 
org.eclipse.jetty.util.resource.FileSystemPool.mount(FileSystemPool.java:135) 
~[fuseki-server.jar:5.1.0]
  at 
org.eclipse.jetty.util.resource.ResourceFactoryInternals$CompositeResourceFactory.mountIfNeeded(ResourceFactoryInternals.java:268)
 ~[fuseki-server.jar:5.1.0]
```

Can someone help me with this?


What should work is to take the jena-fuseki-fulljar POM, which is what builds 
the fuseki-server.jar in the apache-jena-fuseki download, and add jena-geosparql.

That's not much different to adding it to jena-fuseki-core, just better 
to put it in later.


But I don't see why something causes "jar:file:" -- why the "file:" is 
appearing.


Andy


Best,
Ludovic Muller.




Re: Package containing GraphRepository?

2024-08-08 Thread Andy Seaborne




On 07/08/2024 20:41, Andy Seaborne wrote:



On 07/08/2024 12:15, Steve Vestal wrote:
I have some questions about doing an OWL import closure.  My 
understanding is that an Import for an owl:Ontology can use any of a 
URL, an ontology IRI, or a version IRI to access an OWL document. 
Different documents that have the same ontology IRI may be accessed 
using different URLs, either identical copies or with different 
version IRIs.


The link given on the page 
https://jena.apache.org/documentation/ontology/#graphrepository 
(https://jena.apache.org/documentation/javadoc/jena/org.apache.jena.ontapi/org/apache/jena/ontapi/GraphRepository.html) 
gets a "URL was not found" error.  A search 
didn't turn it up. What package contains this class?


Artifact jena-ontapi
package org.apache.jena.ontapi

It's only the website page that has the wrong links.


Fixed!



The OntAPI javadoc is at:

https://jena.apache.org/documentation/javadoc/ontapi/org.apache.jena.ontapi/module-summary.html


That page says "By default, when an ontology model reads an ontology 
document, it will /not/ locate and load the document’s imports."  A 
search of this page did not find OntDocumentManager, which I have been 
using. FileManager seems to be deprecated in favor of RDFDataMgr.  
Will OntDocumentManager be deprecated or modified?


What is the recommended way to accumulate a set of URLs, ontology 
IRIs, and version IRIs, and use that set to do an import closure?


https://jena.apache.org/documentation/ontology/#compound-ontology-documents-and-imports-processing


Three argument call to (the new)
    OntModelFactory.createModel(,, GraphRepository)

     Andy




Re: Package containing GraphRepository?

2024-08-07 Thread Andy Seaborne




On 07/08/2024 12:15, Steve Vestal wrote:
I have some questions about doing an OWL import closure.  My 
understanding is that an Import for an owl:Ontology can use any of a 
URL, an ontology IRI, or a version IRI to access an OWL document. 
Different documents that have the same ontology IRI may be accessed 
using different URLs, either identical copies or with different version 
IRIs.


The link given on the page 
https://jena.apache.org/documentation/ontology/#graphrepository 
(https://jena.apache.org/documentation/javadoc/jena/org.apache.jena.ontapi/org/apache/jena/ontapi/GraphRepository.html) 
gets a "URL was not found" error.  A search didn't 
turn it up. What package contains this class?


Artifact jena-ontapi
package org.apache.jena.ontapi

It's only the website page that has the wrong links.

The OntAPI javadoc is at:

https://jena.apache.org/documentation/javadoc/ontapi/org.apache.jena.ontapi/module-summary.html

That page says "By default, when an ontology model reads an ontology 
document, it will /not/ locate and load the document’s imports."  A search 
of this page did not find OntDocumentManager, which I have been using. 
FileManager seems to be deprecated in favor of RDFDataMgr.  Will 
OntDocumentManager be deprecated or modified?


What is the recommended way to accumulate a set of URLs, ontology IRIs, 
and version IRIs, and use that set to do an import closure?


https://jena.apache.org/documentation/ontology/#compound-ontology-documents-and-imports-processing

Three argument call to (the new)
   OntModelFactory.createModel(,, GraphRepository)

Andy


[ANN] Apache Jena 5.1.0

2024-07-18 Thread Andy Seaborne

The Apache Jena development community is pleased to
announce the release of Apache Jena 5.1.0.

The major item for the 5.1.0 release is the new artifact jena-ontapi:

It has API support for working with OWL2 as well as other ontologies. It 
is the long-term replacement for org.apache.jena.ontology.


  https://github.com/apache/jena/issues/2160

This is a contribution from @sszuev

== Contributions

@karolina-telicent
  Prefixes Service
New endpoint for Fuseki to give read and read-write access to the
prefixes of a dataset enabling lookup and modification over HTTP.
  https://github.com/apache/jena/issues/2543

Micrometer - Prometheus upgrade
  See 
https://github.com/micrometer-metrics/micrometer/wiki/1.13-Migration-Guide

  https://github.com/apache/jena/pull/2480

Value space of rdf:XMLLiteral changed to be RDF 1.1/1.2 value semantics.
  Issue https://github.com/apache/jena/issues/2430
  The value space in RDF 1.0 was different.

@TelicentPaul - Paul Gallagher
Migrating Base 64 operations from Apache Commons Codec to Util package.
  https://github.com/apache/jena/pull/2409

@thomasjtaylor Thomas J. Taylor
Fix for NodeValueFloat
https://github.com/apache/jena/pull/2374

@Aklakan Claus Stadler
"Incorrect JoinClassifier results with unbound values."
  https://github.com/apache/jena/issues/2412

@Aklakan Claus Stadler
  "QueryExec: abort before exec is ignored."
  https://github.com/apache/jena/issues/2394

@osi peter royal
  Track rule engine instances
  https://github.com/apache/jena/issues/2382
  https://github.com/apache/jena/pull/2432

Normalization/Canonicalization of values
  Including RDFParserBuilder.canonicalValues
This has been reworked to provide a consistent framework
and also guarantee the same behavior between parsing
and TDB2 handling of values.
  https://github.com/apache/jena/issues/2557

== Issues in this release

  https://s.apache.org/jena-5.1.0-issues

== Obtaining Apache Jena 5.1.0

* Via central.maven.org

The main jars and their dependencies can used with:

  
org.apache.jena
apache-jena-libs
pom
5.1.0
  

Full details of all maven artifacts are described at:

http://jena.apache.org/download/maven.html

* As binary downloads

Apache Jena libraries are available as a binary distribution of
libraries. For details of a global mirror copy of Jena binaries please see:

http://jena.apache.org/download/

* Source code for the release

The signed source code of this release is available at:

http://www.apache.org/dist/jena/source/

and the signed master source for all Apache Jena releases is available
at: http://archive.apache.org/dist/jena/

== Contributing

If you would like to help out, a good place to look is the list of
open issues at:

https://github.com/apache/jena/issues

or review pull requests at

https://github.com/apache/jena/pulls

or drop into the dev@ list.

We use github pull requests and other ways for accepting code:
 https://github.com/apache/jena/blob/master/CONTRIBUTING.md


Re: Tarql

2024-07-15 Thread Andy Seaborne




On 12/07/2024 16:03, Shaw, Ryan wrote:

Tarql  is a very useful tool that has not been 
updated in 4+ years:

https://github.com/tarql/tarql/commits/master/

It still requires Jena 3.11.

Are there replacement tools out there for CONSTRUCTing RDF from CSV?

If not, should Tarql perhaps become an extension of ARQ, given how widely used 
it seems to be?


Possible - added as an "extra" - as is forking the repo independently and 
bringing it up to date.
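In the meantime, constructing RDF from CSV with nothing but Jena is short to write 
by hand; a minimal Java sketch, assuming a hypothetical people.csv with id,name 
columns (plain Java, not Tarql's SPARQL-mapping approach):

    import java.io.BufferedReader;
    import java.nio.file.Files;
    import java.nio.file.Path;

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.vocabulary.RDFS;

    public class CsvToRdf {
        public static void main(String[] args) throws Exception {
            Model model = ModelFactory.createDefaultModel();
            try (BufferedReader in = Files.newBufferedReader(Path.of("people.csv"))) {
                in.readLine();                              // skip the header row
                String line;
                while ((line = in.readLine()) != null) {
                    String[] cols = line.split(",", -1);    // naive CSV: no quoted fields
                    Resource r = model.createResource("http://example.org/person/" + cols[0]);
                    r.addProperty(RDFS.label, cols[1]);
                }
            }
            RDFDataMgr.write(System.out, model, Lang.TURTLE);
        }
    }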


Given the other responses here, is there a role for Tarql?

There would be a couple of things that need doing:

* Talk to cygri - ASF does not do hostile forks.
  It hasn't been inactive that long (Dec 2020)

* Change the dependency net.sourceforge.jchardet:jchardet
  because there isn't an updated tar for v1.1, only source code
  and it is quite old. Apache Tika has code for character set
  detection - lift that source code.

* Make it work with Jena5.

Andy




Ryan


https://www.w3.org/TR/tabular-data-primer/
but which implementations are currently maintained?


Re: JSON-LD writer and the Titanium RdfDataset

2024-07-11 Thread Andy Seaborne

One million triples in memory isn't very many these days.

Just make sure there are not a lot of "1 millions" at the same time.

As you describe it, it does sound like the example below. For each 
"entity" i.e. closure, produce a JSON-LD document and stream/assemble 
wrapped in


{ "@context" : URL , "@graph" : [  ] }

to avoid repeated "@context"

Andy

On 11/07/2024 06:03, Holger Knublauch wrote:

Hi Andy,

thanks for your response. To clarify, it would be a scenario such as a TDB with 1 million 
triples and the request is to produce a JSON-LD document from the "closure" 
around a given resource (in TopBraid's Source Code panel when the user navigates to a 
resource or through API calls). In other words: input is a Jena Graph, a start node and a 
JSON-LD frame document, and the output should be a JSON-LD describing the node and all 
reachable triples described by the frame.

So it sounds like Titanium cannot really be used for this as its algorithms can 
only operate on their own in-memory copy of a graph, and we cannot copy all 1 
million triples into memory each time.

Holger



On 10 Jul 2024, at 5:53 PM, Andy Seaborne  wrote:

Hi Holger,

How big is the database?
What sort of framing are you aiming to do?
Using framing to select some from a large database doesn't feel like the way to 
extract triples as you've discovered. Framing can touch anywhere in the JSON 
document.

This recent thread is relevant --
https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl

That JSON-LD file is 280 million triples.

Its structure is

[{"@context":  , ... }
,{"@context":  , ... }
,{"@context":  , ... }
...
,{"@context":  , ... }
]

9 million array entries.

It looks to me like it has been produced by text manipulation, taking each 
entity, writing a separate, self-contained JSON-LD object then, by text, making 
a big array. That, or a tool that is designed specially to write large JSON-LD. 
e.g. the outer array.

That's the same context URL and would be a denial of service attack except 
Titanium reads the whole file as JSON and runs out of space.

The JSON-LD algorithms do assume the whole document is available. Titanium is a 
faithful implementation of the spec.

It is hard to work with.

In JSON the whole object needs to be seen - repeated member names (de 
facto - last duplicate wins) and "@context" being at the end are 
occur in XML. Streaming JSON or JSON-LD is going to have to relax the strictness somehow.

JSON-LD is designed around the assumption of small/medium sized data.

And this affects writing. That large file looks like it was specially written 
or at least with a tool that is designed specially to write large JSON-LD. e.g. 
the outer array.


Jena could do with some RDFFormats + writers for JSONLD at scale. One 
one extending WriterStreamRDFBatched where a batch is the subject and its immediate 
triples, then write similar to the case above except in a way that is one context then 
the array is with "@graph".

https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects

That doesn't solve the reading side - a companion reader would be needed that 
stream-reads JSON.

Contributions welcome!

Andy

On 10/07/2024 12:36, Holger Knublauch wrote:

I am working on serializing partial RDF graphs to JSON-LD using the 
Jena-Titanium bridge.
Problem: For Titanium to "see" the triples it needs to have a complete copy. 
See JenaTitanion.convert which copies all Jena triples into a corresponding RdfDatset. 
This cannot scale if the graph is backed by a database, and we only want to export 
certain triples (esp for Framing). Titanium's RdfGraph does not provide an incremental 
function similar to Graph.find() but only returns a complete Java List of all triples.
Has anyone here run into the same problem and what would be a solution?
I guess one solution would be an incremental algorithm that "walks" a @context 
and JSON-LD frame document to collect all required Jena triples, producing a sub-graph 
that can then be sent to Titanium. But the complexity of such an algorithm is similar to 
having to implement my own JSON-LD engine, which feels like an overkill.
Holger




Re: Multi-Fuseki instance on same dataset

2024-07-11 Thread Andy Seaborne




On 11/07/2024 11:47, Steven Blanchard wrote:

Dear,

We want to improve our Fuseki instance to reduce the request time when
we launch multiple requests at the same time on the same dataset.

We launch our requests via the Fuseki REST API.

We have the impression that the requests are processed one after the
other.


Fuseki is multiple reader and single writer

There can be as many readers as you like (limited by hardware) and they 
all execute in parallel. One thread each.


In addition, there can be a single writer and further writers are queued.


We tried to create multiple Fuseki instances that use a common
dataset, but we get a read access error when we start the second
instance because the first instance creates a lock file in the
dataset.

Is it possible to do this, or is it a Fuseki limitation?


If the first part is sufficient,
see the non-Apache project RDF Delta https://afs.github.io/rdf-delta/


Do you have some other tips, advice or possibilities to improve the
multiprocessing capacity of a Fuseki server, to reduce the time of
requests?


What do the requests look like?

The shape of requests greatly affects the answers here.

Andy



Our final goal is to upgrade our "development" Fuseki server with
default parameters to a production server with high scalability and
optimised response times.

Thanks for your help,

Steven

[Logo Microbiome Studio]



STEVEN BLANCHARD
Bioinformatics_
  steven.blanchard@microbiome.studio_


MICROBIOME STUDIO BY ABOLIS BIOTECHNOLOGIES
Station F - 5 parvis Alan Turing
75013 Paris, France
  _ https://microbiomestudio.com/_






Re: JSON-LD writer and the Titanium RdfDataset

2024-07-10 Thread Andy Seaborne

Hi Holger,

How big is the database?
What sort of framing are you aiming to do?
Using framing to select some from a large database doesn't feel like the 
way to extract triples as you've discovered. Framing can touch anywhere 
in the JSON document.


This recent thread is relevant --
https://lists.apache.org/thread/3mrcyf1ccry78rkxxb6vqsm4okfffzfl

That JSON-LD file is 280 million triples.

Its structure is

[{"@context":  , ... }
,{"@context":  , ... }
,{"@context":  , ... }
...
,{"@context":  , ... }
]

9 million array entries.

It looks to me like it has been produced by text manipulation, taking 
each entity, writing a separate, self-contained JSON-LD object then, by 
text, making a big array. That, or a tool that is designed specially to 
write large JSON-LD. e.g. the outer array.


That's the same context URL and would be a denial of service attack 
except Titanium reads the whole file as JSON and runs out of space.


The JSON-LD algorithms do assume the whole document is available. 
Titanium is a faithful implementation of the spec.


It is hard to work with.

In JSON the whole object needs to be seen - repeated member names (de 
facto - last duplicate wins) and "@context" being at the end are 
possible.  Cases that don't occur in XML. Streaming JSON or JSON-LD is 
going to have to relax the strictness somehow.


JSON-LD is designed around the assumption of small/medium sized data.

And this affects writing. That large file looks like it was specially 
written or at least with a tool that is designed specially to write 
large JSON-LD. e.g. the outer array.



Jena could do with some RDFFormats + writers for JSONLD at scale. One 
obvious one is the one extending WriterStreamRDFBatched where a batch is 
the subject and its immediate triples, then write similar to the case 
above except in a way that is one context then the array is with "@graph".


https://www.w3.org/TR/json-ld11/#example-163-same-description-in-json-ld-context-shared-among-node-objects

That doesn't solve the reading side - a companion reader would be needed 
that stream-reads JSON.


Contributions welcome!

Andy

On 10/07/2024 12:36, Holger Knublauch wrote:

I am working on serializing partial RDF graphs to JSON-LD using the 
Jena-Titanium bridge.

Problem: For Titanium to "see" the triples it needs to have a complete copy. 
See JenaTitanion.convert which copies all Jena triples into a corresponding RdfDatset. 
This cannot scale if the graph is backed by a database, and we only want to export 
certain triples (esp for Framing). Titanium's RdfGraph does not provide an incremental 
function similar to Graph.find() but only returns a complete Java List of all triples.

Has anyone here run into the same problem and what would be a solution?

I guess one solution would be an incremental algorithm that "walks" a @context 
and JSON-LD frame document to collect all required Jena triples, producing a sub-graph 
that can then be sent to Titanium. But the complexity of such an algorithm is similar to 
having to implement my own JSON-LD engine, which feels like an overkill.

Holger



Re: riot file conversion

2024-07-06 Thread Andy Seaborne




On 05/07/2024 16:16, Andy Seaborne wrote:

The file authorities-gnd_entityfacts.jsonld.gz has a structure you can 
exploit.


Each line is a entry in a JSON array and it is complete - it has the 
@context on each line


There are 9 million lines.

A possibility is to split the file on newline,
Clean up each line
* Drop the last line (the "]")
* remove the first character of each line (which is ",", after "[" on 
the first line).


Parse each line.


Each element of the array starts

{"@context":"https://hub.culturegraph.org/entityfacts/context/v1/entityfacts.jsonld"; 



so there are 9 million @context URLs and the parser is going to do a 
network call for each. On the whole file, that's days and that's if the 
far end does not decide it is a denial of service attack!


(confirmed: watching the network on an extract - Titanium isn't caching, 
at least in the way Jena sets it up - it may be possible to get Titanium 
to cache but the file size limit is still a problem).


Split the file into chunks - say 100k lines per file, 90 odd files, then
fixup so each file is legal JSON-LD with one overall @context at the 
start of each file.


run riot on all the files.

riot will parse one file, writing N-Triples, and then do the next, so 
maximum RAM use is one 100k-line chunk.

default JVM size (16G on my desktop machine)

I got a few transient network errors fetching the context. So better 
would be parse each file to its own N-Triples, so it is easy to redo a 
chunk file.


A few bad URIs in which cause Titanium to skip bits.

About 280 million triples.

Andy

## In a directory Files:
split -l 10  ... the downloaded, uncompressed file.

for X in x??
do
echo "== $X"
(
# One object, one context, @graph array
cat header
# Convert start array to ",", remove @context and trailing array
sed -e 's/^\[/,/' \
    -e 's!"@context":"https://hub.culturegraph.org/entityfacts/context/v1/entityfacts.jsonld",!!' \
    -e '/^\]$/d' \
    "$X"
cat footer
) > ${X}-ld
done

to give files "xaa-ld" etc.

(I avoided the .jsonld extension because it triggers editor help, but the 
files are so big that that is really slow.)


riot --syntax jsonld x??-ld

The parser step took 20 minutes


"header" is
---
{ 
"@context":"https://hub.culturegraph.org/entityfacts/context/v1/entityfacts.jsonld",

  "@graph" : [
  {}
---
the {} is for the first line starts ","

"footer" is
---
]}
---



Re: riot file conversion

2024-07-05 Thread Andy Seaborne
at java.base/java.lang.String.(String.java:300)
  at 
org.glassfish.json.JsonTokenizer.getValue(JsonTokenizer.java:510)
  at 
org.glassfish.json.JsonParserImpl.getString(JsonParserImpl.java:101)
  at 
org.glassfish.json.JsonParserImpl.getObject(JsonParserImpl.java:332)
  at 
org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:175)
  at 
org.glassfish.json.JsonParserImpl.getArray(JsonParserImpl.java:321)
  at 
org.glassfish.json.JsonParserImpl.getValue(JsonParserImpl.java:173)
  at 
com.apicatalog.jsonld.document.JsonDocument.doParse(JsonDocument.java:163)
  at 
com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:112)
  at 
com.apicatalog.jsonld.document.JsonDocument.of(JsonDocument.java:90)
  at 
org.apache.jena.riot.lang.LangJSONLD11.read(LangJSONLD11.java:73)

  at org.apache.jena.riot.RDFParser.read(RDFParser.java:444)
  at org.apache.jena.riot.RDFParser.parseURI(RDFParser.java:413)
  at org.apache.jena.riot.RDFParser.parse(RDFParser.java:375)
  at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:391)
  at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:337)
  at riotcmd.CmdLangParse.exec$(CmdLangParse.java:234)
  at riotcmd.CmdLangParse.exec(CmdLangParse.java:174)
  at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:87)
  at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:56)
  at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:43)
  at riotcmd.riot.main(riot.java:29)

Regards
Sorin

Am 05.07.2024 um 00:35 schrieb Andy Seaborne:



On 03/07/2024 10:22, Sorin Gheorghiu wrote:

Greetings,

here my attempt to convert a large file from json-ld to rdf format, 
does riot tool support archived files?


Yes.



$ riot --out=RDF/XML filein.jsonld.gz > fileout.rdf


That should work (Jena 5.0.0)

What happened?



Best regards
Sorin




Re: riot file conversion

2024-07-04 Thread Andy Seaborne




On 03/07/2024 10:22, Sorin Gheorghiu wrote:

Greetings,

here my attempt to convert a large file from json-ld to rdf format, does 
riot tool support archived files?


Yes.



$ riot --out=RDF/XML filein.jsonld.gz > fileout.rdf


That should work (Jena 5.0.0)

What happened?



Best regards
Sorin


Re: Docker & compacting a dataset

2024-07-04 Thread Andy Seaborne
on.java:227) 
~[fuseki-server.jar:5.0.0]
at org.apache.thrift.TUnion.read(TUnion.java:145) ~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:82)
 ~[fuseki-server.jar:5.0.0]
... 32 more
08:18:30 ERROR Server :: Exception in task 9 execution
org.apache.jena.tdb2.TDBException: NodeTableTRDF/Read
at 
org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:87)
 ~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.tdb2.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:102)
 ~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.tdb2.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:52)
 ~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.tdb2.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:208)
 ~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.tdb2.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:133)
 ~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.tdb2.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:52)
 ~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.tdb2.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:65)
 ~[fuseki-server.jar:5.0.0]
at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:107) 
~[fuseki-server.jar:5.0.0]
at org.apache.jena.tdb2.lib.TupleLib.quad(TupleLib.java:103) 
~[fuseki-server.jar:5.0.0]
at org.apache.jena.tdb2.lib.TupleLib.lambda$convertToQuads$3(TupleLib.java:52) 
~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.atlas.iterator.Iter$IterMap.lambda$forEachRemaining$0(Iter.java:432)
 ~[fuseki-server.jar:5.0.0]
at java.base/java.util.Iterator.forEachRemaining(Unknown Source) ~[?:?]
at org.apache.jena.atlas.iterator.Iter$IterMap.forEachRemaining(Iter.java:432) 
~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.atlas.iterator.IteratorWrapper.forEachRemaining(IteratorWrapper.java:52)
 ~[fuseki-server.jar:5.0.0]
at org.apache.jena.tdb2.sys.CopyDSG.lambda$copy$0(CopyDSG.java:38) 
~[fuseki-server.jar:5.0.0]
at org.apache.jena.system.Txn.exec(Txn.java:77) ~[fuseki-server.jar:5.0.0]
at org.apache.jena.system.Txn.executeWrite(Txn.java:125) 
~[fuseki-server.jar:5.0.0]
at org.apache.jena.tdb2.sys.CopyDSG.lambda$copy$1(CopyDSG.java:36) 
~[fuseki-server.jar:5.0.0]
at org.apache.jena.system.Txn.exec(Txn.java:77) ~[fuseki-server.jar:5.0.0]
at org.apache.jena.system.Txn.executeRead(Txn.java:115) 
~[fuseki-server.jar:5.0.0]
at org.apache.jena.tdb2.sys.CopyDSG.copy(CopyDSG.java:35) 
~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.tdb2.sys.DatabaseOps.lambda$compaction$3(DatabaseOps.java:431) 
~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.dboe.transaction.txn.TransactionalSystemControl.execReadOnlyDatabase(TransactionalSystemControl.java:45)
 ~[fuseki-server.jar:5.0.0]
at org.apache.jena.tdb2.sys.DatabaseOps.compaction(DatabaseOps.java:419) 
~[fuseki-server.jar:5.0.0]
at org.apache.jena.tdb2.sys.DatabaseOps.compact(DatabaseOps.java:359) 
~[fuseki-server.jar:5.0.0]
at org.apache.jena.tdb2.DatabaseMgr.compact(DatabaseMgr.java:81) 
~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.fuseki.ctl.ActionCompact$CompactTask.run(ActionCompact.java:109)
 ~[fuseki-server.jar:5.0.0]
at org.apache.jena.fuseki.async.AsyncPool.lambda$submit$0(AsyncPool.java:66) 
~[fuseki-server.jar:5.0.0]
at org.apache.jena.fuseki.async.AsyncTask.call(AsyncTask.java:100) 
~[fuseki-server.jar:5.0.0]
at java.base/java.util.concurrent.FutureTask.run(Unknown Source) ~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) 
~[?:?]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) 
~[?:?]
at java.base/java.lang.Thread.run(Unknown Source) [?:?]
Caused by: org.apache.thrift.protocol.TProtocolException: Unrecognized type 0
at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:140) 
~[fuseki-server.jar:5.0.0]
at org.apache.thrift.protocol.TProtocolUtil.skip(TProtocolUtil.java:53) 
~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.riot.thrift.wire.RDF_Term.standardSchemeReadValue(RDF_Term.java:432)
 ~[fuseki-server.jar:5.0.0]
at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:238) 
~[fuseki-server.jar:5.0.0]
at org.apache.thrift.TUnion$TUnionStandardScheme.read(TUnion.java:227) 
~[fuseki-server.jar:5.0.0]
at org.apache.thrift.TUnion.read(TUnion.java:145) ~[fuseki-server.jar:5.0.0]
at 
org.apache.jena.tdb2.store.nodetable.NodeTableTRDF.readNodeFromTable(NodeTableTRDF.java:82)
 ~[fuseki-server.jar:5.0.0]



---

Thomas Bottini

IReMus — Institut de Recherche en Musicologie UMR 8223

____
De : Andy Seaborne 
Envoyé : mardi 2 juillet 2024 22:34:32
À : users@jena.apache.org
Objet : Re: Docker & compacting a dataset

Hi Thomas,

There should be a log file somewhere and that usually has more information.

  Andy


On 02/07/2024 12:47, BOTTINI Thomas wrote:

Hello ;

I run fuseki with a Docker compose and a Docker image (inspired by 
https://hub.docker

Re: Docker & compacting a dataset

2024-07-02 Thread Andy Seaborne

Hi Thomas,

There should be a log file somewhere and that usually has more information.
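If the log doesn't show it, one way to take the HTTP endpoint out of the picture is 
to compact the database offline with the TDB2 API while Fuseki is stopped; a rough 
Java sketch (the path into the Docker volume and the two-argument compact call are 
assumptions):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.tdb2.DatabaseMgr;
    import org.apache.jena.tdb2.TDB2Factory;

    public class CompactOffline {
        public static void main(String[] args) {
            // Only run this while the Fuseki container is stopped.
            Dataset ds = TDB2Factory.connectDataset("./fuseki-data/iremus");
            // Assumption: compact(dsg, shouldDeleteOld); true removes the old Data-NNNN directory.
            DatabaseMgr.compact(ds.asDatasetGraph(), true);
        }
    }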

Andy


On 02/07/2024 12:47, BOTTINI Thomas wrote:

Hello ;

I run fuseki with a Docker compose and a Docker image (inspired by 
https://hub.docker.com/r/secoresearch/fuseki/).
Here is a fragment of the docker-compose.yml :

   fuseki:
 build: ./fuseki-docker
 ports:
   - 3030:3030
 volumes:
   - ./fuseki-data:/fuseki-base/databases
   - ./fuseki-configuration:/fuseki-base/configuration


I want/need/try to compact a dataset :

$ http POST localhost:3030/$/compact/iremus?deleteOld=true -a admin:XYZ

HTTP/1.1 200 OK
Content-Encoding: gzip
Content-Length: 56
Content-Type: application/json;charset=utf-8
Date: Tue, 02 Jul 2024 11:43:11 GMT
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Fuseki-Request-Id: 30
Location: /$/compact/iremus/4
Set-Cookie: JSESSIONID=node01py6j2rduedd7jwqpre5qkctr29.node0; Path=/
Set-Cookie: rememberMe=deleteMe; Path=/; Max-Age=0; Expires=Mon, 01-Jul-2024 
11:43:11 GMT; SameSite=lax
Vary: Accept-Encoding, Origin

{
 "requestId": 30,
 "taskId": "4"
}

The task fails immediately :

$ http localhost:3030/$/tasks -a admin:XYZ

 {
 "finished": "2024-07-02T11:43:11.487+00:00",
 "started": "2024-07-02T11:43:11.069+00:00",
 "success": false,
 "task": "Compact",
 "taskId": "4"
 }

How can I find the reasons of this failure? Could it be something related to 
the Docker volume?

Thank you very much, in advance,

Best regards

--
💾 Thomas Bottini
Institut de Recherche en Musicologie — IReMus UMR822



Re: skolemize a graph when loading

2024-06-30 Thread Andy Seaborne

Hi Paul,

You want this to make bnodes in the Fuseki server addressable?

I agree it would be nice for something in riot to do this.

What skolemization naming schemes are popular?


It's possible to convert using text processing on N-Triples. Regex for 
(^| )_: and replace with a URI in whatever scheme you want.
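As a Java alternative to the regex, a minimal sketch that copies a graph and maps 
each blank node to a URI (the .well-known/genid scheme used here is just one choice, 
and the input file name is a placeholder):

    import org.apache.jena.graph.Graph;
    import org.apache.jena.graph.Node;
    import org.apache.jena.graph.NodeFactory;
    import org.apache.jena.graph.Triple;
    import org.apache.jena.riot.Lang;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.sparql.graph.GraphFactory;

    public class Skolemize {
        // Map a blank node to a URI; leave IRIs and literals unchanged.
        static Node skolem(Node n) {
            return n.isBlank()
                ? NodeFactory.createURI("http://example.org/.well-known/genid/" + n.getBlankNodeLabel())
                : n;
        }

        public static void main(String[] args) {
            Graph in  = RDFDataMgr.loadGraph("data.ttl");
            Graph out = GraphFactory.createDefaultGraph();
            in.find(Node.ANY, Node.ANY, Node.ANY).forEachRemaining(t ->
                out.add(Triple.create(skolem(t.getSubject()), t.getPredicate(), skolem(t.getObject()))));
            RDFDataMgr.write(System.out, out, Lang.NTRIPLES);
        }
    }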


Maybe 


Jena also has a non-standard extension:

URI(bnode()) => <_:44f86b24-8556-44e0-89eb-27e3bf21c66e>

"_:" is illegal as a URI scheme and also as a relative URI, so that's 
illegal as a URI strictly, but Jena allows it.


It can be in SPARQL results and queries.

When a Jena parser or result set reader sees "_:" it uses the rest as 
the bnode id.


Unlike skolemization, the data and results really do have a bnode in 
them, so the data itself is unchanged.


Andy

Related but different: : https://github.com/apache/jena/issues/2549
"More granular control over Blank node serialization"

On 6/27/24 22:57, Martynas Jusevičius wrote:

LinkedDataHub has a class that does that:
https://github.com/AtomGraph/LinkedDataHub/blob/develop/src/main/java/com/atomgraph/linkeddatahub/server/util/Skolemizer.java

On Thu, Jun 27, 2024 at 11:55 PM Paul Tyson  wrote:


I searched the source and docs but didn't turn up any easy way to
skolemize an input graph before loading to fuseki. The skolemization
capabilities appear to be related to the reasoners, which I don't need.

I expect there's a good reason for this missing feature, but wanted to
ask if I've missed something. I probably don't need it badly enough to
build something for it. It would be better if the dataset came to me
with the necessary node ids, but am considering if it's worth putting in
a fixup routine.

Thanks
--Paul



Re: war deployment issue

2024-06-30 Thread Andy Seaborne

Hi Boris,

I'm glad the standalone JAR is working for you.
I completely understand not wanting to experiment on the war setup - the 
point of war files is that you should be able to just drop them in.


The NoClassDefFoundError has changed so adding shiro-cdi made a difference.

I experimented with adding shiro-cdi - it does not add any more dependencies 
itself but does have an assumption that the environment provides
jakarta.enterprise:jakarta.enterprise.cdi-api. So, maybe, it's the setup 
of the webapp server.


Andy

On 6/29/24 14:13, Boris Mauricette wrote:

  Hi Andy,
I'm attempting to run the war file untouched. Nevertheless, as glassfish seems 
to be a pain in the neck, I'll use the standalone fuseki server, it'll be 
fine.thanks a lot for the support.Best regards
Boris
ps: i'm not skilled enough to try many modif. I tried to add the shiro-cdi jar 
into the war WEB-INF/lib dir. A deploy attempt then raises a 
java.lang.NoClassDefFoundError: javax/faces/application/ApplicationFactory. too 
tricky for me…

 Le vendredi 28 juin 2024 à 20:28:58 UTC+2, Andy Seaborne  
a écrit :
  
  Hi Boris,


I don't use glassfish so I can't reproduce this. Maybe someone else on
this list uses fuseki.war with glassfish+jboss.

Are you just running the Jena Fuseki WAR file untouched or combining it
with anything?

Fuseki does not need Jakarta EE CDI but maybe your setup detects Shiro
and then does expect it. Adding the artifact
org.apache.shiro:shiro-cdi:jakarta:2.0.0 might help.

     Andy

On 28/06/2024 08:32, Boris Mauricette wrote:

Hi Andy,
it doesn't fix the problem. I try to join a log report (compressed).
Best regards.

Le jeudi 27 juin 2024 à 17:35:06 UTC+2, Andy Seaborne 
a écrit :


Hi Boris,

This could be related to

https://github.com/apache/jena/issues/2448
<https://github.com/apache/jena/issues/2448>

which has a workaround.

Add to web.xml:

     
       org.apache.shiro.ee.disabled
       true
     

if not, could you please show the stack trace?

       Andy

On 27/06/2024 14:05, Boris Mauricette wrote:
   > Dear all,I come here for the following:I try to deploy the
jena-fuseki war webapp on a fresh installed glassfish7 (7.0.14 EE
platform), but it fails with this error:
   > Error occurred during deployment: Exception while loading the app :
CDI definition failure:null
   > more precisely, in the last lines of the error stack, there is:
   > java.lang.NoClassDefFoundError: org/apache/shiro/cdi/AnnotatedTypeWrapper
   >   …at
org.apache.shiro.ee.cdi.ShiroSessionScopeExtension.addFacesViewScoped
   >
   > this package org/apache/shiro/cdi doesn't seem to exist, I only see
some shiro/ee/cdi/ things. As recommended, I've renamed the .war file
fuseki.war.
   >
   > Does anyone have an idea to fix the problem?Best regards,
   >
   


Re: war deployment issue

2024-06-28 Thread Andy Seaborne

Hi Boris,

I don't use glassfish so I can't reproduce this. Maybe someone else on 
this list uses fuseki.war with glassfish+jboss.


Are you just running the Jena Fuseki WAR file untouched or combining it 
with anything?


Fuseki does not need Jakarta EE CDI but maybe your setup detects Shiro 
and then does expect it. Adding the artifact 
org.apache.shiro:shiro-cdi:jakarta:2.0.0 might help.


Andy

On 28/06/2024 08:32, Boris Mauricette wrote:

Hi Andy,
it doesn't fix the problem. I try to join a log report (compressed).
Best regards.

Le jeudi 27 juin 2024 à 17:35:06 UTC+2, Andy Seaborne  
a écrit :



Hi Boris,

This could be related to

https://github.com/apache/jena/issues/2448 
<https://github.com/apache/jena/issues/2448>


which has a workaround.

Add to web.xml:

   
     org.apache.shiro.ee.disabled
     true
   

if not, could you please show the stack trace?

     Andy

On 27/06/2024 14:05, Boris Mauricette wrote:
 > Dear all,I come here for the following:I try to deploy the 
jena-fuseki war webapp on a fresh installed glassfish7 (7.0.14 EE 
platform), but it fails with this error:
 > Error occurred during deployment: Exception while loading the app : 
CDI definition failure:null

 > more precisely, in the last lines of the error stack, there is:
 > java.lang.NoClassDefFoundError: org/apache/shiro/cdi/AnnotatedTypeWrapper
 >   …at 
org.apache.shiro.ee.cdi.ShiroSessionScopeExtension.addFacesViewScoped

 >
 > this package org/apache/shiro/cdi doesn't seem to exist, I only see 
some shiro/ee/cdi/ things. As recommended, I've renamed the .war file 
fuseki.war.

 >
 > Does anyone have an idea to fix the problem?Best regards,
 >


Re: war deployment issue

2024-06-27 Thread Andy Seaborne

Hi Boris,

This could be related to

https://github.com/apache/jena/issues/2448

which has a workaround.

Add to web.xml:

  
org.apache.shiro.ee.disabled
true
  

if not, could you please show the stack trace?

Andy

On 27/06/2024 14:05, Boris Mauricette wrote:

Dear all,I come here for the following:I try to deploy the jena-fuseki war 
webapp on a fresh installed glassfish7 (7.0.14 EE platform), but it fails with 
this error:
Error occurred during deployment: Exception while loading the app : CDI 
definition failure:null
more precisely, in the last lines of the error stack, there is:
java.lang.NoClassDefFoundError: org/apache/shiro/cdi/AnnotatedTypeWrapper
  …at org.apache.shiro.ee.cdi.ShiroSessionScopeExtension.addFacesViewScoped

this package org/apache/shiro/cdi doesn't seem to exist, I only see some 
shiro/ee/cdi/ things. As recommended, I've renamed the .war file fuseki.war.

Does anyone have an idea to fix the problem?Best regards,



Re: issue when Deleting a dataset on Windows

2024-06-20 Thread Andy Seaborne

Hi Philippe,

The issue is Windows.

There is a bug in the JDK on an NTFS filesystem whereby memory-mapped 
files are not deleted until the JVM exits. The JDK issue is "won't fix".


https://github.com/apache/jena/issues/2092#issuecomment-1810362743

WSL has linux filesystem semantics - no issue.

For your example below, you could PUT the new triples into the database, 
which will clear out the old data.
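For example, a rough Java sketch with RDFConnection (the destination URL and file 
name are placeholders); PUT replaces the default graph's contents rather than 
adding to them:

    import org.apache.jena.rdfconnection.RDFConnection;

    public class ReplaceData {
        public static void main(String[] args) {
            try (RDFConnection conn = RDFConnection.connect("http://localhost:3030/test1")) {
                conn.put("new-data.ttl");   // HTTP PUT: replaces the default graph
            }
        }
    }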


Andy

On 20/06/2024 13:51, Philippe Genoud wrote:
I have an issue with Fuseki 5.0.0 running on Windows (10 or 11) when I 
try to remove an existing dataset using the Web UI.


Here are the operations I'm performing

1) Create en new DataSet named Test1

2) Remove the DataSet

3) In the interface everything seems going fine, no message

But in fact the file (folder Test1 containing the dataset data) is not 
destroyed on my hard disk and I get a Exception message on the console 
saying that The process cannot access the file because this file is in 
use by another process


Here the trace on the console

 >fuseki-server.bat
14:24:31 INFO  Server  :: Apache Jena Fuseki 5.0.0
14:24:31 INFO  Config  :: 
FUSEKI_HOME=P:\DevTools\SemWeb\Fuseki\apache-jena-fuseki-5.0.0\.
14:24:31 INFO  Config  :: 
FUSEKI_BASE=P:\DevTools\SemWeb\Fuseki\apache-jena-fuseki-5.0.0\run
14:24:31 INFO  Config  :: Shiro file: 
file://P:\DevTools\SemWeb\Fuseki\apache-jena-fuseki-5.0.0\run\shiro.ini
14:24:32 INFO  Server  :: Configuration file: 
P:\DevTools\SemWeb\Fuseki\apache-jena-fuseki-5.0.0\run\config.ttl

14:24:32 INFO  Server  ::   Memory: 1,2 GiB
14:24:32 INFO  Server  ::   Java:   17.0.5
14:24:32 INFO  Server  ::   OS: Windows 10 10.0 amd64
14:24:32 INFO  Server  ::   PID:    22624
14:24:32 INFO  Server  :: Started 2024/06/20 14:24:32 CEST on 
port 3030

14:25:14 INFO  Admin   :: [3] Create database : name = /test1
14:25:43 INFO  Admin   :: [5] DELETE dataset=/test1
14:25:43 ERROR Admin   :: [5] Error while deleting database 
files 
P:\DevTools\SemWeb\Fuseki\apache-jena-fuseki-5.0.0\run\databases\test1: 
java.nio.file.FileSystemException: 
P:\DevTools\SemWeb\Fuseki\apache-jena-fuseki-5.0.0\run\databases\test1\Data-0001\GOSP.bpt: Le processus ne peut pas accéder au fichier car ce fichier est utilisé par un autre processus
org.apache.jena.atlas.RuntimeIOException: 
java.nio.file.FileSystemException: 
P:\DevTools\SemWeb\Fuseki\apache-jena-fuseki-5.0.0\run\databases\test1\Data-0001\GOSP.bpt: Le processus ne peut pas accéder au fichier car ce fichier est utilisé par un autre processus
     at org.apache.jena.atlas.io.IOX.exception(IOX.java:55) 
~[fuseki-server.jar:5.0.0]
     at org.apache.jena.atlas.io.IO.deleteAll(IO.java:601) 
~[fuseki-server.jar:5.0.0]
     at 
org.apache.jena.fuseki.mgt.ActionDatasets.execDeleteItem(ActionDatasets.java:422) ~[fuseki-server.jar:5.0.0]
     at 
org.apache.jena.fuseki.ctl.ActionContainerItem.performDelete(ActionContainerItem.java:99) ~[fuseki-server.jar:5.0.0]
     at 
org.apache.jena.fuseki.ctl.ActionContainerItem.execute(ActionContainerItem.java:45) ~[fuseki-server.jar:5.0.0]
     at 
org.apache.jena.fuseki.ctl.ActionCtl.executeLifecycle(ActionCtl.java:50) 
~[fuseki-server.jar:5.0.0]
     at 
org.apache.jena.fuseki.ctl.ActionCtl.process(ActionCtl.java:40) 
~[fuseki-server.jar:5.0.0]
     at 
org.apache.jena.fuseki.servlets.ActionExecLib.execActionSub(ActionExecLib.java:127) ~[fuseki-server.jar:5.0.0]


I've tried also with Java 21.0.3 (Oracle JDK) and I got the same 
message. But when launching fuseki from wsl, everything works fine.


The problem is that if I create a Dataset, put some triples in it, then 
remove it and create a new dataset with the same name, all the 
triples stored in the previously deleted one are in the new one.


Is it a known bug ? Is there a workaround ?

Thanks for your help

Philippe


--
Philippe Genoud
Universite Grenoble Alpes
---
STEAMER group
Laboratoire d'Informatique de Grenoble (LIG)
Bâtiment IMAG
700 avenue Centrale
Domaine Universitaire - 38401 St Martin d'Hères
---
adresse postale
LIG - Bâtiment IMAG - CS 40700 - 38058 GRENOBLE CEDEX 9
---
Tel: tel: (+33) (0)4 57 42 15 01



Re: Aborting UPDATEs

2024-05-19 Thread Andy Seaborne




On 17/05/2024 17:22, Holger Knublauch wrote:

Hi all,

am I missing something obvious or is it not yet possible to programmatically 
abort SPARQL UPDATEs, like QueryExecutions can?


No, you aren't missing anything. Updates don't have a timeout.

Probably they could have nowadays.

There is a requirement on the dataset target of the update - it must 
support a proper abort.


An Update can be several operations, separated by ";"  and each 
operation is completed before the next is attempted.


A timeout ought to abort the whole update; otherwise, at best, the data 
is partially updated, and at worst (non-transactional usage), Java data 
structures may be corrupted.


Most datasets support a good enough abort but not a general dataset with 
links to arbitrary graphs. A buffering dataset can be used for a good 
enough abort.
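
For contrast, a query execution can already be given a timeout or be aborted; 
a minimal sketch (the empty in-memory dataset and the query are illustrative only):

import java.util.concurrent.TimeUnit;

import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSetFormatter;

public class QueryTimeoutSketch {
    public static void main(String[] args) {
        Dataset dataset = DatasetFactory.create();
        try (QueryExecution qexec = QueryExecution.create()
                .query(QueryFactory.create("SELECT * { ?s ?p ?o }"))
                .dataset(dataset)
                .timeout(30, TimeUnit.SECONDS)   // abort evaluation after 30 seconds
                .build()) {
            // qexec.abort() may also be called from another thread.
            ResultSetFormatter.out(qexec.execSelect());
        }
    }
}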




Related, I also didn't see a way to set a timeout.

I guess for our use cases it would be sufficient if the abort would happen 
during the WHERE clause iteration...

>

Thanks
Holger



Andy


Re: Fuseki: multiple instances vs. multiple datasets

2024-05-13 Thread Andy Seaborne




On 13/05/2024 11:10, Martynas Jusevičius wrote:

Hi,

I'm using multiple Fuseki instances in a Docker setup but considering
to use a single instance with multiple datasets instead.

So I was wondering what the differences of those setups are (besides
the lower memory consumption etc.) 


which are not so great because the dominant memory cost is the database.


in terms of:
- security - I suppose there would be no difference since the datasets
are isolated and have separate endpoints?


If the docker setup is multi-machine, there is isolation of 
denial-of-service issues.



- federation - would SPARQL federation perform better on a single
instance? E.g. if a query federates between datasets on the same
instance, maybe Fuseki would recognize that and avoid HTTP calls? Just
thinking out loud here.
- any other aspects?


Administration convenience - which could go either way.

Load balancing.



Martynas



Andy


Re: Cannot get Fuseki 5 to run...

2024-05-02 Thread Andy Seaborne

Hi Phil,

It's a bug.

Fuseki uses the CORS filter from Eclipse Jetty by code-copy so as not to 
depend on Jetty. But at the last update, some Jetty code usage didn't 
get replaced and there are leftover references to Jetty classes.


Issue created:
https://github.com/apache/jena/issues/2443

Andy

On 02/05/2024 04:02, Phillip Rhodes wrote:

Gang:

I'm having NO luck at all getting Fuseki 5 to run. I'm using Java 17
and the latest Tomcat 10 release that I see (apache-tomcat-10.1.23)
and Fuseki "jena-fuseki-war-5.0.0.war". From what I could find of docs
I thought this combination was sufficient, but apparently not. When I
try to launch the server I get this:

02-May-2024 02:56:46.903 SEVERE [main]
org.apache.catalina.startup.HostConfig.deployWAR Error deploying web
application archive
[/extradata/downloads/tomcat/apache-tomcat-10.1.23/webapps/fuseki.war]
java.lang.IllegalStateException: Error starting child
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:690)
at
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:659)
at
org.apache.catalina.core.StandardHost.addChild(StandardHost.java:712)
at
org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:969)
at
org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1911)
at
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
org.apache.tomcat.util.threads.InlineExecutorService.execute(InlineExecutorService.java:75)
at
java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:123)
at
org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:771)
at
org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:423)
at
org.apache.catalina.startup.HostConfig.start(HostConfig.java:1629)
at
org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:303)
at
org.apache.catalina.util.LifecycleBase.fireLifecycleEvent(LifecycleBase.java:114)
at
org.apache.catalina.util.LifecycleBase.setStateInternal(LifecycleBase.java:402)
at
org.apache.catalina.util.LifecycleBase.setState(LifecycleBase.java:345)
at
org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:903)
at
org.apache.catalina.core.StandardHost.startInternal(StandardHost.java:845)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:171)
at
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1345)
at
org.apache.catalina.core.ContainerBase$StartChild.call(ContainerBase.java:1335)
at
java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at
org.apache.tomcat.util.threads.InlineExecutorService.execute(InlineExecutorService.java:75)
at
java.base/java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:145)
at
org.apache.catalina.core.ContainerBase.startInternal(ContainerBase.java:876)
at
org.apache.catalina.core.StandardEngine.startInternal(StandardEngine.java:240)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:171)
at
org.apache.catalina.core.StandardService.startInternal(StandardService.java:470)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:171)
at
org.apache.catalina.core.StandardServer.startInternal(StandardServer.java:947)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:171)
at org.apache.catalina.startup.Catalina.start(Catalina.java:757)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at
org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:345)
at 
org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:473)
Caused by: org.apache.catalina.LifecycleException: Failed to
start component
[StandardEngine[Catalina].StandardHost[localhost].StandardContext[/fuseki]]
at
org.apache.catalina.util.LifecycleBase.handleSubClassException(LifecycleBase.java:419)
at
org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:186)
at
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:687)

TDB3

2024-04-25 Thread Andy Seaborne




On 24/04/2024 21:42, Martynas Jusevičius wrote:

Andy,

Not directly related, but would different storage backend address
issues like this?

It might sound a bit like the legacy SDB, but AFAIK oxigraph, Stardog
and another commercial triplestore use RocksDB for storage.
https://github.com/oxigraph/oxigraph
https://docs.stardog.com/operating-stardog/database-administration/storage-optimize

There is even a RocksDB backend for Jena:
https://github.com/zourzouvillys/triplerocks
And just now I found your own TDB3 repo: https://github.com/afs/TDB3

Can you shed some light on TDB3 and this approach in general?


TDB3 uses RocksDB as the storage layer, replacing the custom B+trees and 
also the node table. It's a naive use of RocksDB. It seems to work 
(functional). It's untested both in code and deployment.


It loads slower than tdb2 bulk loaders (IIRC maybe 70Ktriples/s) but 
little work has been done to exploit Rocks capabilities.


The advantage of Rocks is that it is likely to be around for a long time 
(= it's a safe investment), it's transactional, has compression [1], has 
compaction [2], and it has a Java wrapper (separate, but closely related to 
and in contact with the Rocks team).


While there are many storage engines that claim to be faster than 
RocksDB, often, such claims have assumptions.


There are other storage layers to explore as well.

Andy

[1] Better, or also, would probably be compression in the encoding of 
stored tuples.


[2] Compaction has two parts - finding the RDF terms that are currently 
in use in the database, and recovering space in the indexes. RocksDB 
compaction is about the second case.





Martynas

On Wed, Apr 24, 2024 at 10:30 PM Andy Seaborne  wrote:


Hi Balduin,

Thanks for the detailed report. It's useful to hear of the use cases that
occur and also the behaviour of specific deployments.

On 22/04/2024 16:22, Balduin Landolt wrote:

Hello,

we're running Fuseki 5.0.0 (but previously the last 4.x versions behaved
essentially the same) with roughly 40 Mio triples (tendency: growing).
Not sure what configuration is relevant, but we have the default graph as
the union graph.


Sort of relevant.

There are more indexes on named graphs so there is more compaction work
to be done.

"union default graph" is a view at query time, not in the storage itself.


Also, we use Fuseki as our main database, not just as a "view on our data"
so we do quite a bit of updating on the data all the time.

Lately, we've been having more and more issues with servers running out of
disk space because Fuseki's database grew pretty rapidly.
This can be solved by compacting the DB, but with our data and hardware
this takes ca. 15 minutes, during which Fuseki does not accept any update
queries, so for the production system we can't really do this outside of
nighttime hours when (hopefully) no one uses the system anyways.


Is the database disk area on an SSD, on a hard disk, or a remote
filesystem (and then, is it SSD or hard disk)?


Some things we've noticed:
- A subset of our data (I think ~20 Mio triples) taking up 6GB in compacted
state, when dumped to a .trig file is ca. 5GB. But when uploading the same
.trig file to an empty DB, this grows to ca. 25GB
- Dropping graphs does not free up disk space


That's at the point the graph is dropped? It should reclaim space at
compaction.


- A sequence of e.g. 10k queries updating only a small number of triples
(maybe 1-10 or so) on the full dataset seems to grow the DB size a lot,
like 10s to 100s of GB (I don't have numbers on this one, but it was
substantial).


This might be a factor. There is a space overhead per transaction, not
solely due to the size of the update. It sounds like 10k updates makes
that appreciable.

Are the updates all additions? Or a mix of additions and deletions?


My question is:



Would that kind of growth in disk usage be expected?


Given 10K updates, then what you describe sounds possible.

> Are other people having similar issues?
> Are there strategies to mitigate this?
Batching the updates, although this does mean the updates don't
immediately appear in the database.

This can work reasonably when the updates are additions. If there are
deletes, it's harder.


Maybe some configuration that may be tweaked or so?


Sorry - there aren't any controls.



Best & thanks in advance,
Balduin



  Andy


Re: rdf:parseType="literal" vs. rdf:datatype="...XMLLiteral"

2024-04-25 Thread Andy Seaborne




On 25/04/2024 07:58, Thomas Francart wrote:

Hello Andy

Le lun. 22 avr. 2024 à 21:03, Andy Seaborne  a écrit :



On 22/04/2024 08:02, Thomas Francart wrote:

Hello

This is 3.17.0. Pretty old, due to other dependency with TopQuadrant

SHACL

API.


It's not perfect in 5.0.0 either.

TopQuadrant SHACL is now 4.10.0. It would be good to upgrade because of
XML security issue fixes around Jena 4.3.2.


It is being rejected because it is not legal RDF 1.0. At 1.0, the
lexical space had restrictions (XML exclusive canonicalization) where
an empty-element tag such as <e/> is not allowed. It has to be written
as <e></e> -- there are various other rules as well.



Thank you, I wasn't aware of this.


Nor was I until I checked!

The RDF 1.0 rules are quite confusing and this comes up from time to 
time. At the time there was no DOM standard, so there was no way to have 
a defined value space other than strings, with the lexical form 
restricted by exclusive canonicalization.


(The DOM wasn't standardized at the time of RDF 1.0 IIRC)

The definition of rdf:XMLLiteral changed at RDF 1.1 to one where any XML
document fragment string is valid.

Seems not all places got updated. Partially, that is because it was
depending on the specific implementation of the Jena RDF/XML parser.

https://github.com/apache/jena/issues/2430


Do you happen to have the SPARQL queries? That part of your report is
related to the value space of RDF XML Literals.



Yes, the query is using the "=" operator :


OK - that will get fixed with

https://github.com/apache/jena/issues/2430



ask {
   ?uri a <http://exemple.com/MyClass> .
   ?uri <http://exemple.com/MyProperty> ?x, ?y.
   filter (?x != ?y)
}


This is false because use in the filter requires the value, and the 
value is undefined.




But then using the sameTerm function we don't get the error:

ask {
   ?uri a <http://exemple.com/MyClass> .
   ?uri <http://exemple.com/MyProperty> ?x, ?y.
   FILTER ( !sameTerm(?x, ?y) )
}




A proper update to RDF 1.1 may change the value object class (it is
"string" for RDF 1.0, it is, by the spec, DocumentFragment for RDF 1.1;
it could be kept at document fragment toString() in jena. I'd like to
understand the usage to see which change is best).

  Andy

BTW It's rdf:parseType="Literal" -- Jena 5.0.0 is not tolerant of lower
case "literal"


And that can be put back to be tolerant on input.





Thanks !

Thomas


Andy


Re: Java 21 support for Jena Fuseki 5.0.0

2024-04-24 Thread Andy Seaborne

The wording has been changed to
"Jena5 requires Java 17, or a later version of Java."

Thanks
Andy

On 24/04/2024 09:45, Balduin Landolt wrote:

Hi list,

me again... Does Jena Fuseki 5.0.0 support Java 21?
On https://jena.apache.org/download/ all I can see is "Jena5 requires Java
17".

Best,
Balduin



Re: Fuseki growing in size and need for compaction

2024-04-24 Thread Andy Seaborne

Hi Balduin,

Thanks for the detailed report. It's useful to hear of the use cases that 
occur and also the behaviour of specific deployments.


On 22/04/2024 16:22, Balduin Landolt wrote:

Hello,

we're running Fuseki 5.0.0 (but previously the last 4.x versions behaved
essentially the same) with roughly 40 Mio triples (tendency: growing).
Not sure what configuration is relevant, but we have the default graph as
the union graph.


Sort of relevant.

There are more indexes on named graphs so there is more compaction work 
to be done.


"union default graph" is a view at query time, not in the storage itself.


Also, we use Fuseki as our main database, not just as a "view on our data"
so we do quite a bit of updating on the data all the time.

Lately, we've been having more and more issues with servers running out of
disk space because Fuseki's database grew pretty rapidly.
This can be solved by compacting the DB, but with our data and hardware
this takes ca. 15 minutes, during which Fuseki does not accept any update
queries, so for the production system we can't really do this outside of
nighttime hours when (hopefully) no one uses the system anyways.


Is the database disk area on an SSD, on a hard disk, or a remote 
filesystem (and then, is it SSD or hard disk)?



Some things we've noticed:
- A subset of our data (I think ~20 Mio triples) taking up 6GB in compacted
state, when dumped to a .trig file is ca. 5GB. But when uploading the same
.trig file to an empty DB, this grows to ca. 25GB
- Dropping graphs does not free up disk space


That's at the point the graph is dropped? It should reclaim space at 
compaction.



- A sequence of e.g. 10k queries updating only a small number of triples
(maybe 1-10 or so) on the full dataset seems to grow the DB size a lot,
like 10s to 100s of GB (I don't have numbers on this one, but it was
substantial).


This might be a factor. There is a space overhead per transaction, not 
solely due to the size of the update. It sounds like 10k updates makes 
that appreciable.


Are the updates all additions? Or a mix of additions and deletions?


My question is:


Would that kind of growth in disk usage be expected? 


Given 10K updates, then what you describe sounds possible.

> Are other people having similar issues?
> Are there strategies to mitigate this?
Batching the updates, although this does mean the updates don't 
immediately appear in the database.

This can work reasonably when the updates are additions. If there are 
deletes, it's harder.
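
A minimal sketch of the batching idea (the endpoint URL and the INSERT DATA 
operations are illustrative): collect the small operations and send them as 
one update request, so the per-transaction overhead is paid once per batch.

import java.util.List;

import org.apache.jena.rdfconnection.RDFConnection;
import org.apache.jena.update.UpdateFactory;
import org.apache.jena.update.UpdateRequest;

public class BatchedUpdates {
    public static void main(String[] args) {
        List<String> pending = List.of(
                "INSERT DATA { <urn:ex:s1> <urn:ex:p> 1 }",
                "INSERT DATA { <urn:ex:s2> <urn:ex:p> 2 }");   // ... many more

        // Joining the operations with ";" gives one update request, run in one transaction.
        UpdateRequest batch = UpdateFactory.create(String.join(" ;\n", pending));

        try (RDFConnection conn = RDFConnection.connect("http://localhost:3030/ds")) {
            conn.update(batch);
        }
    }
}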



Maybe some configuration that may be tweaked or so?


Sorry - there aren't any controls.



Best & thanks in advance,
Balduin



Andy


Re: Java 21 support for Jena Fuseki 5.0.0

2024-04-24 Thread Andy Seaborne




On 24/04/2024 10:41, Rob @ DNR wrote:

Java versions are generally forwards compatible, so Fuseki should run fine on 
Java 21, unless any of our dependencies have some previously unreported issues 
with Java 21

If you do find any bugs then please file bugs as appropriate

Thanks,

Rob


The project has CI with Java21 (targeting java17 byte code) and Java latest.

https://ci-builds.apache.org/job/Jena/

Currently, the Java23-latest build breaks because of:

1. Removal of javascript name "js"
2. org.awaitility not liking "23-ea" as the JDK version number
3. (jena-permissions) mockito -> Byte Buddy - "Java 23 not supported".

(2) and (3) will "just happen".

Andy


From: Balduin Landolt 
Date: Wednesday, 24 April 2024 at 09:46
To: users@jena.apache.org 
Subject: Java 21 support for Jena Fuseki 5.0.0
Hi list,

me again... Does Jena Fuseki 5.0.0 support Java 21?
On https://jena.apache.org/download/ all I can see is "Jena5 requires Java
17".

Best,
Balduin



Re: rdf:parseType="literal" vs. rdf:datatype="...XMLLiteral"

2024-04-22 Thread Andy Seaborne



On 22/04/2024 08:02, Thomas Francart wrote:

Hello

This is 3.17.0. Pretty old, due to other dependency with TopQuadrant SHACL
API.


It's not perfect in 5.0.0 either.

TopQuadrant SHACL is now 4.10.0. It would be good to upgrade because of 
XML security issue fixes around Jena 4.3.2.



It is being rejected because it is not legal RDF 1.0. At 1.0, the 
lexical space had restrictions (XML exclusive canonicalization) where 
an empty-element tag such as <e/> is not allowed. It has to be written 
as <e></e> -- there are various other rules as well.


(The DOM wasn't standardized at the time of RDF 1.0 IIRC)

The definition of rdf:XMLLiteral changed at RDF 1.1 to one where any XML 
document fragment string is valid.


Seems not all places got updated. Partially, that is because it was 
depending on the specific implementation of the Jena RDF/XML parser.


https://github.com/apache/jena/issues/2430


Do you happen to have the SPARQL queries? That part of your report is 
related to the value space of RDF XML Literals.


A proper update to RDF 1.1 may change the value object class (it is 
"string" for RDF 1.0, it is, by the spec, DocumentFragment for RDF 1.1; 
it could be kept at document fragment toString() in jena. I'd like to 
understand the usage to see which change is best).


Andy

BTW It's rdf:parseType="Literal" -- Jena 5.0.0 is not tolerant of lower 
case "literal"




Thomas

Le sam. 20 avr. 2024 à 18:06, Andy Seaborne  a écrit :


Hi Thomas,

Which version of Jena is this?

  Andy

On 19/04/2024 17:18, Thomas Francart wrote:

Hello

The RDF/XML parsing of the following succeeds:




href="

https://xx.xx.xx/PC"/>



while the RDF/XML parsing of this gives an error: in that case the XML has
simply been encoded with &lt; and &gt;, and the rdf:datatype has been
explicitly set to XMLLiteral:



http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral

"><am><reference><symbol

href="https://xx.xx.xx/PC
"/></reference></am>




The error is

13:08:04.742 WARN  org.apache.jena.riot - Lexical form
'https://xx.xx.xx/PC"/>'

not

valid for datatype XSD XMLLiteral

and then further down in SPARQL queries:

13:08:04.775 WARN  o.apache.jena.sparql.expr.NodeValue - Datatype format
exception: "https://xx.xx.xx/PC\
"/>"^^rdf:XMLLiteral

The encoded XML is however valid.

Is it possible to explicitly create literals with the XMLLiteral datatype in
RDF/XML by setting this datatype explicitly?

Thanks
Thomas









Re: rdf:parseType="literal" vs. rdf:datatype="...XMLLiteral"

2024-04-20 Thread Andy Seaborne

Hi Thomas,

Which version of Jena is this?

Andy

On 19/04/2024 17:18, Thomas Francart wrote:

Hello

The RDF/XML parsing of the following succeeds:



https://xx.xx.xx/PC"/>



while the RDF/XML parsing of this gives an error: in that case the XML has
simply been encoded with &lt; and &gt;, and the rdf:datatype has been
explicitly set to XMLLiteral:



http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral";>




The error is

13:08:04.742 WARN  org.apache.jena.riot - Lexical form
'https://xx.xx.xx/PC"/>' not
valid for datatype XSD XMLLiteral

and then further down in SPARQL queries:

13:08:04.775 WARN  o.apache.jena.sparql.expr.NodeValue - Datatype format
exception: "https://xx.xx.xx/PC\
"/>"^^rdf:XMLLiteral

The encoded XML is however valid.

Is it possible to explicitly create literals with the XMLLiteral datatype in
RDF/XML by setting this datatype explicitly?

Thanks
Thomas




Re: ModelExtract

2024-04-11 Thread Andy Seaborne

Hi Arne, hi Simon,

It got removed because there wasn't evidence of use. 5.x.x was a chance to remove it.

It is opinionated and it doesn't feel like a good fit being in the 
central code for graph. It's more like a utility library feature.


It can come back, maybe in a better form or better location.

So to both of you - what are your use cases? What are the 
TripleBoundary/StatementBoundary in use?
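
In the meantime, a rough sketch of one way to get a similar result with the 
stable Model API (the boundary here is a caller-supplied predicate, purely 
illustrative):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

import org.apache.jena.rdf.model.*;

public class SimpleExtract {
    // Collect the statements reachable from root, following object resources,
    // and stopping wherever the boundary predicate says not to follow.
    public static Model extract(Resource root, Model source, Predicate<Statement> follow) {
        Model result = ModelFactory.createDefaultModel();
        Set<Resource> visited = new HashSet<>();
        Deque<Resource> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Resource current = queue.pop();
            if (!visited.add(current))
                continue;
            StmtIterator it = source.listStatements(current, null, (RDFNode) null);
            while (it.hasNext()) {
                Statement stmt = it.next();
                if (!follow.test(stmt))
                    continue;   // boundary: do not include or traverse this statement
                result.add(stmt);
                if (stmt.getObject().isResource())
                    queue.add(stmt.getObject().asResource());
            }
        }
        return result;
    }
}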


Andy

On 09/04/2024 19:56, Arne Bernhardt wrote:

Hi Simon,

my colleagues had the same problem with GraphExtract.
The code has been removed in the context of Jena5 Model API changes
 in the commit
https://github.com/afs/jena/commit/6697a516724745532616bb0db3ce67a8778e2b6c.
So anyone may fetch the latest code from there.
Unfortunately, I am not sure why exactly it has been removed. Since it does
not implement any standard and is not documented under
https://jena.apache.org/documentation/,  I doubt it will find its way back
into Jena.

Greetings
Arne

Am Di., 9. Apr. 2024 um 19:54 Uhr schrieb Dutkowski, Simon <
simon.dutkow...@fokus.fraunhofer.de>:


Hi All

I realized that in version 5.0.0, the classes ModelExtract and co are
removed. Are there any replacements or other ways to achieve the same (or
similar) results?

I can easily fetch the classes from earlier versions and integrate them
into my project directly, but I am not sure if it is necessary, and if
possible I would prefer to avoid it.

Thanks in advance
 Simon

--
Dipl.-Inf. Simon Dutkowski
Fraunhofer FOKUS (DPS)
Kaiserin-Augusta-Allee 31, 10589 Berlin
+49 160 90112644






Re: Performance question with joins

2024-04-01 Thread Andy Seaborne

Hi John,

Yes, the join of two large subqueries is the issue.

Optimization involves making pragmatic determination. Sometimes it isn't 
optimal for some data.


Something to consider is detecting the independence of the (?X_i, 
?X_j) and (?Y_i, ?Y_j) blocks, because a hash join is likely a better 
choice. That, or caching partial evaluations where there are 
cross-product-like effects.


Thank you for the details.

Andy

See also:

"A Worst-Case Optimal Join Algorithm for SPARQL"
https://aidanhogan.com/docs/SPARQL_worst_case_optimal.pdf

"Leapfrog Triejoin: A Simple, Worst-Case Optimal Join"
Algorithm
https://www.openproceedings.org/2014/conf/icdt/Veldhuizen14.pdf

On 29/03/2024 09:25, John Walker wrote:

I did some more experimentation and checked the query algebra using the 
--explain option.

For sake of simplicity I use a simpler query:

```
select (count(*) as ?C)
where {
   {
 select ?X ?Y (struuid() as ?UUID)
 where {
   values ?X { 0 1 }
   values ?Y { 0 1 }
 }
   }
   {
 select ?X ?Y
 where {
   {
 select ?X ?Y (rand() as ?RAND)
 where {
   values ?X { 0 1 }
   values ?Y { 0 1 }
 }
   }
   filter (?RAND < 0.95)
 }
   }
}
```

For this the algebra is:

```
   (project (?C)
 (extend ((?C ?.0))
   (group () ((?.0 (count)))
 (sequence
   (project (?X ?Y ?UUID)
 (extend ((?UUID (struuid)))
   (sequence
 (table (vars ?Y)
   (row [?Y 0])
   (row [?Y 1])
 )
 (table (vars ?X)
   (row [?X 0])
   (row [?X 1])
 
   (project (?X ?Y)
 (project (?X ?Y ?/RAND)
   (filter (< ?/RAND 0.95)
 (extend ((?/RAND (rand)))
   (sequence
 (table (vars ?Y)
   (row [?Y 0])
   (row [?Y 1])
 )
 (table (vars ?X)
   (row [?X 0])
   (row [?X 1])
 ))
```

Whilst if I make a small change to also project some other variable from the 
second subquery

```
select (count(*) as ?C)
where {
   {
 select ?X ?Y (struuid() as ?UUID)
 where {
   values ?X { 0 1 }
   values ?Y { 0 1 }
 }
   }
   {
 select ?X ?Y (0 as ?_)
 where {
   {
 select ?X ?Y (rand() as ?RAND)
 where {
   values ?X { 0 1 }
   values ?Y { 0 1 }
 }
   }
   filter (?RAND < 0.95)
 }
   }
}
```

Then the algebra is:

```
   (project (?C)
 (extend ((?C ?.0))
   (group () ((?.0 (count)))
 (join
   (project (?X ?Y ?UUID)
 (extend ((?UUID (struuid)))
   (sequence
 (table (vars ?Y)
   (row [?Y 0])
   (row [?Y 1])
 )
 (table (vars ?X)
   (row [?X 0])
   (row [?X 1])
 
   (project (?X ?Y ?_)
 (extend ((?_ 0))
   (project (?X ?Y ?/RAND)
 (filter (< ?/RAND 0.95)
   (extend ((?/RAND (rand)))
 (sequence
   (table (vars ?Y)
 (row [?Y 0])
 (row [?Y 1])
   )
   (table (vars ?X)
 (row [?X 0])
 (row [?X 1])
   )))
```

Note the outermost sequence operator has changed to a join operator.
I don’t understand the logic behind that.

Note that projecting the ?RAND variable from the second query does not force 
the join.

John


-Original Message-
From: John Walker 
Sent: Friday, 29 March 2024 08:55
To: users@jena.apache.org
Subject: RE: Performance question with joins

I did a bit more experimentation by putting the second subquery inside some
other clauses:

* FILTER EXISTS - no effect
* OPTIONAL - runtime around 0.5s
* MINUS - runtime around 0.5s

So, I assume that the engine is doing some form of nested loop join to iterate
through each solution from the first subquery and test against the second.
Same as what is happening with FILTER EXISTS.

A "hack" to get around this seems to be to add a redundant MINUS {}
between the subqueries.

John


-Original Message-
From: John Walker 
Sent: Friday, 29 March 2024 07:58
To: jena-users-ml 
Subject: Performance question with joins

Hi,

I am working with some data representing a 2D Cartesian coordinate
system representing simple grid array “maps”
The X and Y coordinates are represented as integers.

I want to join data from different “layers” in the data.
One layer contains a unique identifier for each position.
The other layer only contains a subset of coordinates.

I have written the following queries to simulate some data to
demo

Re: Requesting advice on Fuseki memory settings

2024-03-25 Thread Andy Seaborne




On 25/03/2024 07:05, Gaspar Bartalus wrote:

Dear Andy and co.,

Thanks for the support, I think we can close this thread for now.
We will continue to monitor this behaviour and if we can retrieve any
additional useful information then we might reopen it.


Please do pass on any information and techniques for operating 
Fuseki/TDB. There is so much variety "out there" that all reports are 
helpful.


Andy



Best regards,
Gaspar

On Sun, Mar 24, 2024 at 5:00 PM Andy Seaborne  wrote:




On 21/03/2024 09:52, Rob @ DNR wrote:

Gaspar

This probably relates to https://access.redhat.com/solutions/2316

Deleting a file removes it from the file table but doesn’t immediately

free the space if a process is still accessing those files.  That could be
something else inside the container, or in a containerised environment
where the disk space is mounted that could potentially be host processes on
the K8S node that are monitoring the storage.
  >

There’s some suggested debugging steps in the RedHat article about ways

to figure out what processes might still be holding onto the old database
files


Rob


Fuseki does close the database connections after compact but only after
all read transactions on the old database have completed. that can hold
the database open for a while.

Another delay is the ext4 file system. Deletes will be in the journal
and only when the journal operations are performed will the space be
released. Usually this happens quickly, but I've seen it take an
appreciable length of time occasionally.

Gaspar wrote:
  > then we start fresh where du -sh and df -h return the same numbers.

This indicates the file space has been released. Restarting clears any
outstanding read-transactions and likely gives the ext4 journal time to run
through.

Just about any layer (K8s, VMs) adds delays to real release of the space
but it should happen eventually.

  Andy


From: Gaspar Bartalus 
Date: Wednesday, 20 March 2024 at 11:41
To: users@jena.apache.org 
Subject: Re: Requesting advice on Fuseki memory settings
Hi Andy

On Sat, Mar 16, 2024 at 8:58 PM Andy Seaborne  wrote:




On 12/03/2024 13:17, Gaspar Bartalus wrote:

On Mon, Mar 11, 2024 at 6:28 PM Andy Seaborne  wrote:


On 11/03/2024 14:35, Gaspar Bartalus wrote:

Hi Andy,

On Fri, Mar 8, 2024 at 4:41 PM Andy Seaborne

wrote:




On 08/03/2024 10:40, Gaspar Bartalus wrote:

Hi,

Thanks for the responses.

We were actually curious if you'd have some explanation for the
linear increase in the storage, and why we are seeing differences

between

the actual size of our dataset and the size it uses on disk.

(Changes

between `df -h` and `du -lh`)?

Linear increase between compactions or across compactions? The

latter

sounds like the previous version hasn't been deleted.


Across compactions, increasing linearly over several days, with

compactions

running every day. The compaction is used with the "deleteOld"

parameter,

and there is only one Data- folder in the volume, so I assume

compaction

itself works as expected.



Strange - I can't explain that. Could you check that there is only one
Data- directory inside the database directory?


Yes, there is surely just one Data- folder in the database

directory.



What's the disk storage setup? e.g filesystem type.


We have an Azure disk of type Standard SSD LRS with a filesystem of

type

Ext4.


Hi Gaspar,

I still can't explain what you're seeing, I'm afraid.

Can we get some more details?

When the server has Data-N -- how big (as reported by 'du -sh') is that
directory and how big is the whole directory for the database. They
should be nearly equal.




When a compaction is done, and the server is at Data-(N+1), what are the
sizes of Data-(N+1) and the database directory?



What we see with respect to compaction is usually the following:
- We start with the Data-N folder of ~210MB
- After compaction we have a Data-(N+1) folder of size ~185MB, the old
Data-N being deleted.
- The sizes of the database directory and the Data-* directory are equal.

However when we check with df -h we sometimes see that volume usage is

not

dropping, but on the contrary, it goes up ~140MB after each compaction.



Does stop/starting the server change those numbers?



Yes, then we start fresh where du -sh and df -h return the same numbers.



   Andy







Re: query performance on named graph vs. default graph

2024-03-24 Thread Andy Seaborne




On 21/03/2024 00:21, Jim Balhoff wrote:

Hi Lorenz,

These both do speed things up quite a bit, but it prevents matching patterns 
that cross graphs in the case where I include multiple graphs.

Thanks,
Jim


It is the combination of choosing certain graphs and wanting cross-graph 
patterns that pushes the code into working in a general way. It works in 
Nodes, and that means string comparisons. That loses the TDB ability 
to do faster joins using NodeIds, which both avoids string comparisons 
and avoids retrieving the strings until they are known to be needed for 
the results.


Is there a reason for not having a union default graph over all the named 
graphs instead of selecting certain ones? If it is all named graphs, the 
union is at the TDB2 level.


You can have a Fuseki setup with two endpoints - one that does union 
default graph, one that does not, for the same dataset.


Andy





On Mar 20, 2024, at 4:28 AM, Lorenz Buehmann 
 wrote:

Hi,

what about

SELECT *
FROM NAMED 
FROM NAMED 
FROM NAMED  ...
FROM NAMED 
{
   GRAPH ?g {
   ...
   }
}

or

SELECT *
{
  VALUES ?g {  ... }
   GRAPH ?g {
 ...
   }
}


does that work better?

On 19.03.24 15:21, Jim Balhoff wrote:

Hi Andy,


On Mar 19, 2024, at 5:02 AM, Andy Seaborne  wrote:
Hi Jim,

What happens if you use GRAPH rather than FROM?

WHERE {
   GRAPH <http://example.org/ubergraph> {
 ?cell rdfs:subClassOf cell: .
 ?cell part_of: ?organ .
 ?organ rdfs:subClassOf organ: .
 ?organ part_of: abdomen: .
 ?cell rdfs:label ?cell_label .
 ?organ rdfs:label ?organ_label .
   }
}


This does help. With TDB this is actually faster than using the default graph. 
With the HDT setup it’s about the same (fast). But it doesn’t work that well 
for what I’m trying to do (below).


FROM builds a "view dataset" which is general purpose (e.g. multiple FROM are 
possible) but which is less efficient for basic graph pattern matching. It does not use 
the TDB2 basic graph pattern matcher.

GRAPH restricts to a single graph and the query goes direct to TDB2 basic graph 
pattern matcher.



If there is only one named graph, is there a reason to have it as a named graph? 
Using the default graph and no unionDefaultGraph may be

What I am really trying to do is have suite of large graphs that I can choose 
to include or not in a particular query, depending on what data sources I want 
to use in the query. I have several HDT files, one for each data source. I set 
this up as a dataset with a named graph for each data file, and was at first 
very happy with how it performed while turning on and off graphs using FROM 
lines. For example I have Wikidata in one HDT file, and it looks like having it 
available doesn’t slow down queries on other graphs when it’s not included. 
However I did see that performance issue in the query I asked about, and found 
it wasn’t related to having multiple graphs loaded; it happens even with just 
that one graph configured.

If I wrote my own server that accepted a list of data source names in a query 
parameter, and then for each request constructed a union model for executing 
the query over the required HDT graphs, would that work any better? Or is that 
basically the same as what FROM is doing?

Thank you,
Jim



--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109 
Leipzig | Germany





Re: Requesting advice on Fuseki memory settings

2024-03-24 Thread Andy Seaborne




On 21/03/2024 09:52, Rob @ DNR wrote:

Gaspar

This probably relates to https://access.redhat.com/solutions/2316

Deleting a file removes it from the file table but doesn’t immediately free the 
space if a process is still accessing those files.  That could be something 
else inside the container, or in a containerised environment where the disk 
space is mounted that could potentially be host processes on the K8S node that 
are monitoring the storage.

>

There’s some suggested debugging steps in the RedHat article about ways to 
figure out what processes might still be holding onto the old database files

Rob


Fuseki does close the database connections after compact but only after 
all read transactions on the old database have completed. that can hold 
the database open for a while.


Another delay is the ext4 file system. Deletes will be in the journal 
and only when the journal operations are performed will the space be 
released. Usually this happens quickly, but I've seen it take an 
appreciable length of time occasionally.


Gaspar wrote:
> then we start fresh where du -sh and df -h return the same numbers.

This indicates the file space has been released. Restarting clears any 
outstanding read-transactions and likely gives the ext4 journal time to run 
through.


Just about any layer (K8s, VMs) adds delays to real release of the space 
but it should happen eventually.


Andy


From: Gaspar Bartalus 
Date: Wednesday, 20 March 2024 at 11:41
To: users@jena.apache.org 
Subject: Re: Requesting advice on Fuseki memory settings
Hi Andy

On Sat, Mar 16, 2024 at 8:58 PM Andy Seaborne  wrote:




On 12/03/2024 13:17, Gaspar Bartalus wrote:

On Mon, Mar 11, 2024 at 6:28 PM Andy Seaborne  wrote:


On 11/03/2024 14:35, Gaspar Bartalus wrote:

Hi Andy,

On Fri, Mar 8, 2024 at 4:41 PM Andy Seaborne  wrote:



On 08/03/2024 10:40, Gaspar Bartalus wrote:

Hi,

Thanks for the responses.

We were actually curious if you'd have some explanation for the
linear increase in the storage, and why we are seeing differences

between

the actual size of our dataset and the size it uses on disk. (Changes
between `df -h` and `du -lh`)?

Linear increase between compactions or across compactions? The latter
sounds like the previous version hasn't been deleted.


Across compactions, increasing linearly over several days, with

compactions

running every day. The compaction is used with the "deleteOld"

parameter,

and there is only one Data- folder in the volume, so I assume

compaction

itself works as expected.



Strange - I can't explain that. Could you check that there is only one
Data- directory inside the database directory?


Yes, there is surely just one Data- folder in the database directory.


What's the disk storage setup? e.g filesystem type.


We have an Azure disk of type Standard SSD LRS with a filesystem of type
Ext4.


Hi Gaspar,

I still can't explain what you're seeing, I'm afraid.

Can we get some more details?

When the server has Data-N -- how big (as reported by 'du -sh') is that
directory and how big is the whole directory for the database. They
should be nearly equal.




When a compaction is done, and the server is at Data-(N+1), what are the
sizes of Data-(N+1) and the database directory?



What we see with respect to compaction is usually the following:
- We start with the Data-N folder of ~210MB
- After compaction we have a Data-(N+1) folder of size ~185MB, the old
Data-N being deleted.
- The sizes of the database directory and the Data-* directory are equal.

However when we check with df -h we sometimes see that volume usage is not
dropping, but on the contrary, it goes up ~140MB after each compaction.



Does stop/starting the server change those numbers?



Yes, then we start fresh where du -sh and df -h return the same numbers.



  Andy



Re: [ANN] Apache Jena 5.0.0

2024-03-21 Thread Andy Seaborne




On 20/03/2024 17:18, Arne Bernhardt wrote:

Hi Ryan,

there is no "term graph" to be found via Google. From Jena 5.0 on, the
default in-memory Graph in Jena will treat typed literals everywhere as
described under "literals term equality" in
https://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal.

Before Jena 5, the default in-memory graph "indexed" object nodes based on
their values for typed literals, and methods like Graph#find and
Graph#contains found matches based on the values.

As far as I know, Fuseki always evaluated SPARQL with the
standard-compliant literal term equality.
But if one executed a query via the query API on the Jena 4 in-memory
graphs, the query execution would use object value equality.

I hope my explanation was roughly correct and helpful.

Arne


Hi Ryan,

In RDF, a literal looks like

"1"^^xsd:int

It is one of the kinds of RDF term

https://www.w3.org/TR/rdf11-concepts/#section-rdf-graph

"1" is lexical form.
xsd:int is the datatype.

The datatype xsd:int determines how these are mapped to values.
"+1", "0001" and "1" all map to the value one.

Two literal terms are the same term if and only if they have the same 
lexical form and same datatype (and language tag).


"+1"^^xsd:int has a different lexical form to "1"^^xsd:int so it is a 
different RDF term, yet they represent the same value.


In SPARQL,
   SAMETERM("1"^^xsd:int, "+1"^^xsd:int) is false.
   "1"^^xsd:int = "+1"^^xsd:int  is true.

Some Jena models stored literals by value.
RDF and SPARQL are defined to work with a graph made out of RDF terms, 
not values.


A "term graph" is one where Graph.find(,,1) or Model.listStatements() 
only considers RDF terms.


A "value graph" is one where looking for the literal "1"^^xsd:int might 
find "+1"^^xsd:int.



The change shouldn't have a widespread impact but it could be visible.
XSD datatypes define a canonical form - the preferred way to write a 
value. "1"^^xsd:int is canonical; "+1"^^xsd:int is not canonical.

Most published data uses canonical forms.

Andy


Shaw, Ryan  schrieb am Mi., 20. März 2024, 13:32:




On Mar 20, 2024, at 5:05 AM, Andy Seaborne  wrote:

** Term graphs

Graphs are now term graphs in the API or SPARQL. That is, they do not

match "same value" for some of the Java mapped datatypes. The model API
already normalizes values written.


TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from GraphMemFactory.


Can someone point me to an explanation of what this means? I am not
familiar with the terminology of "term graph" and "value graph" and a quick
web search turns up nothing that looks relevant.







[ANN] Apache Jena 5.0.0

2024-03-20 Thread Andy Seaborne

The Apache Jena development community is pleased to
announce the release of Apache Jena 5.0.0.

In Jena5:

* Minimum Java requirement: Java 17

* Language tags are case-insensitive unique.

* Term graphs for in-memory models

* RRX - New RDF/XML parser

* Remove support for JSON-LD 1.0

* Turtle/Trig Output : default output PREFIX and BASE

* New artifacts : jena-bom and OWASP CycloneDX SBOM

* API deprecation removal

* Dependency updates :
Note: slf4j update : v1 to v2 (needs log4j change)

More details below.

 Contributions:

Configurable CORS headers for Fuseki
  From Paul Gallagher

Balduin Landolt @BalduinLandolt - javadoc fix for Literal.getString.

@OyvindLGjesdal - https://github.com/apache/jena/pull/2121 -- text index fix

Tong Wang @wang3820 Fix tests due to hashmap order

Explicit Accept headers on RDFConnectionRemote fix
  from @Aklakan



All issues in this release:
https://s.apache.org/jena-5.0.0-issues

which includes the ones specifically related to Jena5:

  https://github.com/apache/jena/issues?q=label%3Ajena5

** Java Requirement

Java 17 or later is required.
Java 17 language constructs now are used in the codebase.

Jakarta JavaEE required for deploying the WAR file (Apache Tomcat10)

** Language tags

Language tags are now case-insensitively unique.

"abc"@EN and "abc"@en are the same RDF term.

Internally, language tags are formatted using the algorithm of RFC 5646.

Examples "@en", "@en-GB", "@en-Latn-GB".

SPARQL LANG(?literal) will return a formatted language tag.

Data stored in TDB using language tags must be reloaded.

** Term graphs

Graphs are now term graphs in the API or SPARQL. That is, they do not 
match "same value" for some of the Java mapped datatypes. The model API 
already normalizes values written.


TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from GraphMemFactory.

** RRX - New RDF/XML parser

RRX is the default RDF/XML parser. It is a replacement for ARP.
RIOT uses RRX.

The ARP parser is still temporarily available for transition assistance.

** Remove support for JSON-LD 1.0

JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of JSON-LD.

https://github.com/filip26/titanium-json-ld

** Turtle/Trig Output

"PREFIX" and "BASE" are output by default for Turtle and TriG output.

** Artifacts

There is now a release BOM for Jena artifacts - artifact 
org.apache.jena:jena-bom


There are now OWASP CycloneDX SBOM for Jena artifacts.
https://github.com/CycloneDX

jena-tdb is renamed jena-tdb1.

jena-jdbc is no longer released

** Dependencies

The update to slf4j 2.x means the log4j artifact changes to
"log4j-slf4j2-impl" (was "log4j-slf4j-impl").


 API Users

** Deprecation removal

There has been a clearing out of deprecated functions, methods and 
classes. This includes the deprecations in Jena 4.10.0 added to show 
code that is being removed in Jena5.


** QueryExecutionFactory

QueryExecutionFactory is simplified to cover common cases only; it 
becomes a way to call the general QueryExecution builders, which are 
preferred and provide the full query execution setup controls.


Local execution builder:
QueryExecution.create()...

Remote execution builder:
QueryExecution.service(URL)...

** QueryExecution variable substitution

Using "substitution", where the query is modified by replacing one or 
more variables by RDF terms, is now preferred to using "initial 
bindings", where query solutions include (var,value) pairs.


"substitution" is available for all queries, local and remote, not just 
local executions.
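
A minimal sketch of the substitution style (the dataset, query and URI are 
illustrative):

import org.apache.jena.graph.NodeFactory;
import org.apache.jena.query.Dataset;
import org.apache.jena.query.DatasetFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.ResultSetFormatter;

public class SubstitutionSketch {
    public static void main(String[] args) {
        Dataset dataset = DatasetFactory.create();
        try (QueryExecution qexec = QueryExecution.create()
                .query(QueryFactory.create("SELECT * { ?s ?p ?o }"))
                .dataset(dataset)
                // Replace ?s with a concrete RDF term before execution.
                .substitution("s", NodeFactory.createURI("http://example.org/thing"))
                .build()) {
            ResultSetFormatter.out(qexec.execSelect());
        }
    }
}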


Rename TDB1 packages org.apache.jena.tdb -> org.apache.jena.tdb1

 Fuseki Users

Fuseki: Uses the Jakarta namespace for servlets and Fuseki has been 
upgraded to use Eclipse Jetty12.


Apache Tomcat10 or later, is required for running the WAR file.
Tomcat 9 or earlier will not work.


== Obtaining Apache Jena 5.0.0

* Via central.maven.org

The main jars and their dependencies can be used with:

  <dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>apache-jena-libs</artifactId>
    <type>pom</type>
    <version>5.0.0</version>
  </dependency>

Full details of all maven artifacts are described at:

http://jena.apache.org/download/maven.html

* As binary downloads

Apache Jena libraries are available as a binary distribution of
libraries. For details of a global mirror copy of Jena binaries please see:

http://jena.apache.org/download/

* Source code for the release

The signed source code of this release is available at:

http://www.apache.org/dist/jena/source/

and the signed master source for all Apache Jena releases is available
at: http://archive.apache.org/dist/jena/

== Contributing

If you would like to help out, a good place to look is the list of
unresolved JIRA at:

https://github.com/apache/jena/issues

or review pull requests at

https://github.com/apache/jena/pulls

or drop into the dev@ list.

We use github pull requests and other ways for accepting code:
  

Re: [EXTERNAL] Re: Query Performance Degrade With Sorting In Subquery

2024-03-19 Thread Andy Seaborne

Hi there,

Could you give some background as to what the sub-select / ORDER / LIMT 
blocks are trying to achieve? Maybe there is another way.


Andy

On 19/03/2024 10:50, Rob @ DNR wrote:

You haven’t specified how your data is stored but assuming you are using Jena’s 
TDB/TDB2 then the triples/quads themselves are already indexed for efficient 
access.  It also inlines some value types that speeds up some comparisons and 
filters, including those used in simple ORDER BY expression as in your example.

This assumes that your objects for relations:hasUserCount triples are properly 
typed as xsd:integer or another well-known XSD numeric type, if not Jena is 
forced to fallback to more simplistic lexical string sorting which can be more 
expensive.

However, there is no indexing available for sorting because SPARQL allows for 
arbitrarily complex sort expressions, and the inputs to those expressions may 
themselves be dynamically computed values that don’t exist in the underlying 
dataset directly.

Rob

From: Chirag Ratra 
Date: Tuesday, 19 March 2024 at 10:39
To: users@jena.apache.org , Andy Seaborne , 
dcchabg...@gmail.com 
Subject: Re: [EXTERNAL] Re: Query Performance Degrade With Sorting In Subquery
Is there any way to create an index or something?

On Tue, Mar 19, 2024 at 3:46 PM Rob @ DNR  wrote:


This is due to Jena’s lazy evaluation in its query engine.

When you include a LIMIT clause on its own Jena only needs find the first
N results (10 in your example) at which point it can abort any further
processing and return results.  In this case evaluation is lazy.

When you include LIMIT and ORDER BY clauses Jena has to find all possible
results, sort them, and then return only the first N results.  In this case
full evaluation is required.

One possible approach might be to split into multiple queries i.e. do one
query to get your main set of results, and then separately issue the
related item sub-queries with concrete values substituted into for your
?concept and ?titleSkosXl values as while Jena will still need to do full
evaluation injecting a concrete value will constrain the query evaluation
further

Hope this helps,

Rob

From: Chirag Ratra 
Date: Tuesday, 19 March 2024 at 07:46
To: users@jena.apache.org 
Subject: Query Performance Degrade With Sorting In Subquery
Hi,

Facing a big performance degradation  while using sort query in subquery
If I run query without sorting the response of my query is around 200 ms
but when I use the order by query,  performance comes to be around 4-5
seconds.

Here is my query :

PREFIX text: <http://jena.apache.org/text#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX skosxl: <http://www.w3.org/2008/05/skos-xl#>
PREFIX relations: <https://cxdata.bold.com/ontologies/myDomain#>

SELECT ?concept ?titleSkosxl ?title ?languageCode (GROUP_CONCAT(DISTINCT
?relatedTitle; separator=", ") AS ?relatedTitles) (GROUP_CONCAT(DISTINCT
?alternate; separator=", ") AS ?alternates)
WHERE
{
   (?titleSkosxl ?score) text:query ('cashier').

?concept skosxl:prefLabel ?titleSkosxl.
   ?titleSkosxl skosxl:literalForm ?title.
   ?titleSkosxl relations:usedInLocale ?controlledList.
   ?controlledList relations:languageMarketCode ?languageCode
FILTER(?languageCode = 'en-US').


#  get alternate title
OPTIONAL
   {
 Select ?alternate  {
 ?concept skosxl:altLabel ?alternateSkosxl.
 ?alternateSkosxl skosxl:literalForm ?alternate;
   relations:hasUserCount ?alternateUserCount.
 }
ORDER BY DESC (?alternateUserCount) LIMIT 10
}

#  get related titles
   OPTIONAL
   {
   Select ?relatedTitle
   {
 ?titleSkosxl relations:isRelatedTo ?relatedSkosxl.
 ?relatedSkosxl skosxl:literalForm ?relatedTitle;
 relations:hasUserCount ?relatedUserCount.
   }
ORDER BY DESC (?relatedUserCount) LIMIT 10
}
}
GROUP BY ?concept ?titleSkosxl ?title ?languageCode ?alternateJobTitle
?notation
ORDER BY DESC(?jobtitleWeight) DESC(?score)
LIMIT 10

The sorting queries given causes huge performance degradation :
ORDER BY DESC (?alternateUserCount) AND ORDER BY DESC (?relatedUserCount)

How can this be improved, this sorting will be used in each and every query
in my application.

--









Re: query performance on named graph vs. default graph

2024-03-19 Thread Andy Seaborne




On 18/03/2024 17:46, Jim Balhoff wrote:

Hi,

I’m running a particular query in a Fuseki server which performs very 
differently if the data is in a named graph vs. the default graph. I’m 
wondering if it’s expected to have a large performance hit if a named graph is 
specified. The dataset consists of ~462 million triples; it’s this dataset with 
all graphs merged together: 
https://github.com/INCATools/ubergraph?tab=readme-ov-file#downloads

I have loaded all the triples into a named graph in TDB2 using this command:

tdb2.tdbloader --loc tdb --graph 'http://example.org/ubergraph’ ubergraph.nt.gz

My fuseki config is like this:

[] rdf:type fuseki:Server ;
 ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "12" ] ;
 fuseki:services ( <#my-service> ) .

<#my-service> rdf:type fuseki:Service ;
 fuseki:name  "union" ;
 fuseki:serviceQuery  "sparql" ;
 fuseki:serviceReadGraphStore "get" ;
 fuseki:dataset   <#dataset> .

<#dataset> rdf:type  tdb2:DatasetTDB2 ;
 tdb2:location "tdb" ;
 tdb2:unionDefaultGraph true .

This is my query:

PREFIX rdfs: 
PREFIX cell: 
PREFIX organ: 
PREFIX abdomen: 
PREFIX part_of: 
SELECT DISTINCT ?cell ?organ
FROM 
WHERE {
   ?cell rdfs:subClassOf cell: .
   ?cell part_of: ?organ .
   ?organ rdfs:subClassOf organ: .
   ?organ part_of: abdomen: .
   ?cell rdfs:label ?cell_label .
   ?organ rdfs:label ?organ_label .
}

Using the FROM line causes the query to complete in about 40 seconds. Deleting 
the FROM line allows the query to complete in about 5 seconds.

The reason I was testing this in TDB2 is that I first noticed this behavior 
with an HDT backend, and wanted to make sure it wasn’t only an HDT issue. If I 
create a dataset using an HDT graph as the default graph, the query completes 
in a fraction of a second, but if I use the graph as a named graph the time 
jumps to about 20 seconds. For both of these scenarios (TDB2 and HDT) there is 
only a single named graph in the dataset.

Is there any way to improve performance when using FROM in the query?


Hi Jim,

What happens if you use GRAPH rather than FROM?

WHERE {
   GRAPH  {
 ?cell rdfs:subClassOf cell: .
 ?cell part_of: ?organ .
 ?organ rdfs:subClassOf organ: .
 ?organ part_of: abdomen: .
 ?cell rdfs:label ?cell_label .
 ?organ rdfs:label ?organ_label .
   }
}

FROM builds a "view dataset" which is general purpose (e.g. multiple 
FROM are possible) but which is less efficient for basic graph pattern 
matching. It does not use the TDB2 basic graph pattern matcher.


GRAPH restricts to a single graph and the query goes direct to TDB2 
basic graph pattern matcher.




If there is only one named graph, is there a reason to have it as a named 
graph? Using the default graph and no unionDefaultGraph may be


Andy



Thank you,
Jim



Re: Requesting advice on Fuseki memory settings

2024-03-16 Thread Andy Seaborne




On 12/03/2024 13:17, Gaspar Bartalus wrote:

On Mon, Mar 11, 2024 at 6:28 PM Andy Seaborne  wrote:


On 11/03/2024 14:35, Gaspar Bartalus wrote:

Hi Andy,

On Fri, Mar 8, 2024 at 4:41 PM Andy Seaborne  wrote:



On 08/03/2024 10:40, Gaspar Bartalus wrote:

Hi,

Thanks for the responses.

We were actually curious if you'd have some explanation for the
linear increase in the storage, and why we are seeing differences

between

the actual size of our dataset and the size it uses on disk. (Changes
between `df -h` and `du -lh`)?

Linear increase between compactions or across compactions? The latter
sounds like the previous version hasn't been deleted.


Across compactions, increasing linearly over several days, with

compactions

running every day. The compaction is used with the "deleteOld" parameter,
and there is only one Data- folder in the volume, so I assume compaction
itself works as expected.



Strange - I can't explain that. Could you check that there is only one
Data- directory inside the database directory?


Yes, there is surely just one Data- folder in the database directory.


What's the disk storage setup? e.g filesystem type.


We have an Azure disk of type Standard SSD LRS with a filesystem of type
Ext4.


Hi Gaspar,

I still can't explain what you're seeing, I'm afraid.

Can we get some more details?

When the server has Data-N -- how big (as reported by 'du -sh') is that
directory, and how big is the whole directory for the database? They
should be nearly equal.


When a compaction is done, and the server is at Data-(N+1), what are the 
sizes of Data-(N+1) and the database directory?


Does stop/starting the server change those numbers?
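
For example, with a hypothetical database directory:

  du -sh /fuseki/databases/ds          # the whole database directory
  du -sh /fuseki/databases/ds/Data-*   # each Data-N generation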

Andy


Re: Problems when querying the SPARQL with Jena

2024-03-12 Thread Andy Seaborne




On 12/03/2024 13:02, Anna P wrote:

Hi Lorenz,
Thank you for your reply. Yes, I used maven to build the project. Here are
dependencies details:

<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
</properties>

<dependencies>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.11</version>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>apache-jena-libs</artifactId>
    <version>5.0.0-rc1</version>
    <type>pom</type>
  </dependency>
  <dependency>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>


Is that creating a jar file to run?

The assembly plugin does not manage Java's service loader files. They need
to be merged in some way and put into the assembled jar.


There is the shade plugin that manages combined service loader files 
more easily:


https://jena.apache.org/documentation/notes/jena-repack.html

This is how the combined jar jena-fuseki-server is built:

https://github.com/apache/jena/blob/main/jena-fuseki2/jena-fuseki-server/pom.xml#L87-L138
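
As an illustration only (not the build file from this thread), a minimal
shade plugin configuration that merges the service loader files looks
roughly like this:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <executions>
      <execution>
        <phase>package</phase>
        <goals><goal>shade</goal></goals>
        <configuration>
          <transformers>
            <!-- Concatenates META-INF/services files from all jars -->
            <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
          </transformers>
        </configuration>
      </execution>
    </executions>
  </plugin>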

Andy


    <version>3.6.0</version>
    <type>maven-plugin</type>
  </dependency>
</dependencies>



Best regards,
Pan

On Tue, Mar 12, 2024 at 7:13 AM Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:


Hi,

how did you setup your project? Which Jena version? Do you use Maven?
Which dependencies? It looks like ARQ.init() hasn't been called which
should happen automatically if the setup of the project is correct.


Cheers,
Lorenz

On 11.03.24 14:44, Anna P wrote:

Dear Jena support team,

Currently I just started to work on a SPARQL project using Jena and I

could

not get a solution when I query a model.
I imported a turtle file and ran a simple query, and the snippet code is
shown below. However, I got the error.

public class App {
  public static void main(String[] args) {
  try {
          Model model = RDFDataMgr.loadModel("data.ttl", Lang.TURTLE);
  RDFDataMgr.write(System.out, model, Lang.TURTLE);
  String queryString = "SELECT * { ?s ?p ?o }";
  Query query = QueryFactory.create(queryString);
  QueryExecution qe = QueryExecutionFactory.create(query,

model);

  ResultSet results = qe.execSelect();
  ResultSetFormatter.out(System.out, results, query);
  qe.close();
  } catch (Exception e) {
  e.printStackTrace();
  }
  }
}

Here is the error message:

org.apache.jena.riot.RiotException: Not registered as a SPARQL result set
output syntax: Lang:SPARQL-Results-JSON
  at


org.apache.jena.sparql.resultset.ResultsWriter.write(ResultsWriter.java:179)

  at


org.apache.jena.sparql.resultset.ResultsWriter.write(ResultsWriter.java:156)

  at


org.apache.jena.sparql.resultset.ResultsWriter.write(ResultsWriter.java:149)

  at


org.apache.jena.sparql.resultset.ResultsWriter$Builder.write(ResultsWriter.java:96)

  at


org.apache.jena.query.ResultSetFormatter.output(ResultSetFormatter.java:308)

  at


org.apache.jena.query.ResultSetFormatter.outputAsJSON(ResultSetFormatter.java:516)

  at de.unistuttgart.ki.esparql.App.main(App.java:46)


Thank you for your time and help!

Best regards,

Pan


--
Lorenz Bühmann
Research Associate/Scientific Developer

Email buehm...@infai.org

Institute for Applied Informatics e.V. (InfAI) | Goerdelerring 9 | 04109
Leipzig | Germany






Re: Requesting advice on Fuseki memory settings

2024-03-11 Thread Andy Seaborne




On 11/03/2024 14:35, Gaspar Bartalus wrote:

Hi Andy,

On Fri, Mar 8, 2024 at 4:41 PM Andy Seaborne  wrote:




On 08/03/2024 10:40, Gaspar Bartalus wrote:

Hi,

Thanks for the responses.

We were actually curious if you'd have some explanation for the
linear increase in the storage, and why we are seeing differences between
the actual size of our dataset and the size it uses on disk. (Changes
between `df -h` and `du -lh`)?


Linear increase between compactions or across compactions? The latter
sounds like the previous version hasn't been deleted.



Across compactions, increasing linearly over several days, with compactions
running every day. The compaction is used with the "deleteOld" parameter,
and there is only one Data- folder in the volume, so I assume compaction
itself works as expected.


Strange - I can't explain that. Could you check that there is only one 
Data- directory inside the database directory?


What's the disk storage setup? e.g filesystem type.

Andy


TDB uses sparse files. It allocates 8M chunks per index but that isn't
used immediately. Sparse files are reported differently by different
tools and also differently by different operating systems. I don't know
how k3s is managing the storage.

Sometimes it's the size of the file, sometimes it's the amount of space
in use. For small databases, there is quite a difference.

An empty database is around 220kbytes but you'll see many 8Mbyte files
with "ls -l".

If you zip the database up, and unpack it then it's 193Mbytes.

After a compaction, the previous version of storage can be deleted. The
directory "Data-..." - only the highest numbered directory is used. A
previous one can be zipped up for backup.


The heap memory has some very minimal peaks, saw-tooth, but otherwise

it's

flat.


At what amount of memory?



At ~7GB.





Regards,
Gaspar

On Thu, Mar 7, 2024 at 11:55 PM Andy Seaborne  wrote:




On 07/03/2024 13:24, Gaspar Bartalus wrote:

Dear Jena support team,

We would like to ask you to help us in configuring the memory for our
jena-fuseki instance running in kubernetes.

*We have the following setup:*

* Jena-fuseki deployed as StatefulSet to a k8s cluster with the
resource config:

Limits:
cpu: 2
memory:  16Gi
Requests:
cpu: 100m
memory:  11Gi

* The JVM_ARGS has the following value: -Xmx10G

* Our main dataset of type TDB2 contains ~1 million triples.

A million triples doesn't take up much RAM even in a memory dataset.

In Java, the JVM will grow until it is close to the -Xmx figure. A major
GC will then free up a lot of memory. But the JVM does not give the
memory back to the kernel.

TDB2 does not only use heap space. A heap of 2-4G is usually enough per
dataset, sometimes less (data-shape dependent - e.g. many large
literals use more space).

Use a profiler to examine the heap in-use, you'll probably see a
saw-tooth shape.
Force a GC and see the level of in-use memory afterwards.
Add some safety margin and work space for requests and try that as the
heap size.


*  We execute the following type of UPDATE operations:
 - There are triggers in the system (e.g. users of the application
changing the data) which start ~50 other update operations containing
up to ~30K triples. Most of them run in parallel, some are delayed
with seconds or minutes.
 - There are scheduled UPDATE operations (executed on hourly basis)
containing 30K-500K triples.
 - These UPDATE operations usually delete and insert the same amount
of triples in the dataset. We use the compact API as a nightly job.

*We are noticing the following behaviour:*

* Fuseki consumes 5-10G of heap memory continuously, as configured in
the JVM_ARGS.

* There are points in time when the volume usage of the k8s container
starts to increase suddenly. This does not drop even though compaction
is successfully executed and the dataset size (triple count) does not
increase. See attachment below.

*Our suspicions:*

* garbage collection in Java is often delayed; memory is not freed as
quickly as we would expect it, and the heap limit is reached quickly
if multiple parallel queries are run
* long running database queries can send regular memory to Gen2, that
is not actively cleaned by the garbage collector
* memory-mapped files are also garbage-collected (and perhaps they
could go to Gen2 as well, using more and more storage space).

Could you please explain the possible reasons behind such a behaviour?
And finally could you please suggest a more appropriate configuration
for our use case?

Thanks in advance and best wishes,
Gaspar Bartalus











Re: Requesting advice on Fuseki memory settings

2024-03-08 Thread Andy Seaborne

Hi Jan,

On 08/03/2024 12:31, Jan Eerdekens wrote:

In our data mesh use case we currently also have serious disk issues
because frequently removing/adding and updating data in a dataset seems to
increase the disk usage a lot. We're currently running frequent compact
calls, but especially on the larger datasets these have the tendency to
stall/not finish which eventually causes the system to run out of storage
(even though the actual amount of data is relatively small).


Is there anything in the log files to indicate what is causing the 
compactions to fail?


Jena 5.0.0 will have a more robust compaction step for Linux and macOS
(and native Windows eventually - but that is currently unreliable; Windows
deleting memory-mapped files is a well-known, long-standing JDK issue).



In the beginning we also had some memory/GC issues, but after assigning
some more memory (we're at 12Gb now), tuning some GC parameters, switching
to SSD and adding some CPU capacity the GC issues seem to be under control.
We're currently also looking into configuring the disk to have more IOPS to
see if that can help with the compacting issues we're seeing now.


What size is your data?

What sort of storage class are you using for the database?

Andy



On Fri, 8 Mar 2024 at 11:40, Gaspar Bartalus  wrote:


Hi,

Thanks for the responses.

We were actually curious if you'd have some explanation for the
linear increase in the storage, and why we are seeing differences between
the actual size of our dataset and the size it uses on disk. (Changes
between `df -h` and `du -lh`)?

The heap memory has some very minimal peaks, saw-tooth, but otherwise it's
flat.

Regards,
Gaspar

On Thu, Mar 7, 2024 at 11:55 PM Andy Seaborne  wrote:




On 07/03/2024 13:24, Gaspar Bartalus wrote:

Dear Jena support team,

We would like to ask you to help us in configuring the memory for our
jena-fuseki instance running in kubernetes.

*We have the following setup:*

* Jena-fuseki deployed as StatefulSet to a k8s cluster with the
resource config:

Limits:
   cpu: 2
   memory:  16Gi
Requests:
   cpu: 100m
   memory:  11Gi

* The JVM_ARGS has the following value: -Xmx10G

* Our main dataset of type TDB2 contains ~1 million triples.

A million triples doesn't take up much RAM even in a memory dataset.

In Java, the JVM will grow until it is close to the -Xmx figure. A major
GC will then free up a lot of memory. But the JVM does not give the
memory back to the kernel.

TDB2 does not only use heap space. A heap of 2-4G is usually enough per
dataset, sometimes less (data-shape dependent - e.g. many large
literals use more space).

Use a profiler to examine the heap in-use, you'll probably see a
saw-tooth shape.
Force a GC and see the level of in-use memory afterwards.
Add some safety margin and work space for requests and try that as the
heap size.


*  We execute the following type of UPDATE operations:
- There are triggers in the system (e.g. users of the application
changing the data) which start ~50 other update operations containing
up to ~30K triples. Most of them run in parallel, some are delayed
with seconds or minutes.
- There are scheduled UPDATE operations (executed on hourly basis)
containing 30K-500K triples.
- These UPDATE operations usually delete and insert the same amount
of triples in the dataset. We use the compact API as a nightly job.

*We are noticing the following behaviour:*

* Fuseki consumes 5-10G of heap memory continuously, as configured in
the JVM_ARGS.

* There are points in time when the volume usage of the k8s container
starts to increase suddenly. This does not drop even though compaction
is successfully executed and the dataset size (triple count) does not
increase. See attachment below.

*Our suspicions:*

* garbage collection in Java is often delayed; memory is not freed as
quickly as we would expect it, and the heap limit is reached quickly
if multiple parallel queries are run
* long running database queries can send regular memory to Gen2, that
is not actively cleaned by the garbage collector
* memory-mapped files are also garbage-collected (and perhaps they
could go to Gen2 as well, using more and more storage space).

Could you please explain the possible reasons behind such a behaviour?
And finally could you please suggest a more appropriate configuration
for our use case?

Thanks in advance and best wishes,
Gaspar Bartalus









Re: Requesting advice on Fuseki memory settings

2024-03-08 Thread Andy Seaborne




On 08/03/2024 10:40, Gaspar Bartalus wrote:

Hi,

Thanks for the responses.

We were actually curious if you'd have some explanation for the
linear increase in the storage, and why we are seeing differences between
the actual size of our dataset and the size it uses on disk. (Changes
between `df -h` and `du -lh`)?


Linear increase between compactions or across compactions? The latter 
sounds like the previous version hasn't been deleted.


TDB uses sparse files. It allocates 8M chunks per index but that isn't 
used immediately. Sparse files are reported differently by different 
tools and also differently by different operating systems. I don't know 
how k3s is managing the storage.


Sometimes it's the size of the file, sometimes it's the amount of space 
in use. For small databases, there is quite a difference.


An empty database is around 220kbytes but you'll see many 8Mbyte files 
with "ls -l".


If you zip the database up, and unpack it then it's 193Mbytes.

After a compaction, the previous version of storage can be deleted. The 
directory "Data-..." - only the highest numbered directory is used. A 
previous one can be zipped up for backup.



The heap memory has some very minimal peaks, saw-tooth, but otherwise it's
flat.


At what amount of memory?



Regards,
Gaspar

On Thu, Mar 7, 2024 at 11:55 PM Andy Seaborne  wrote:




On 07/03/2024 13:24, Gaspar Bartalus wrote:

Dear Jena support team,

We would like to ask you to help us in configuring the memory for our
jena-fuseki instance running in kubernetes.

*We have the following setup:*

* Jena-fuseki deployed as StatefulSet to a k8s cluster with the
resource config:

Limits:
   cpu: 2
   memory:  16Gi
Requests:
   cpu: 100m
   memory:  11Gi

* The JVM_ARGS has the following value: -Xmx10G

* Our main dataset of type TDB2 contains ~1 million triples.

A million triples doesn't take up much RAM even in a memory dataset.

In Java, the JVM will grow until it is close to the -Xmx figure. A major
GC will then free up a lot of memory. But the JVM does not give the
memory back to the kernel.

TDB2 does not only use heap space. A heap of 2-4G is usually enough per
dataset, sometimes less (data-shape dependent - e.g. many large
literals use more space).

Use a profiler to examine the heap in-use, you'll probably see a
saw-tooth shape.
Force a GC and see the level of in-use memory afterwards.
Add some safety margin and work space for requests and try that as the
heap size.


*  We execute the following type of UPDATE operations:
- There are triggers in the system (e.g. users of the application
changing the data) which start ~50 other update operations containing
up to ~30K triples. Most of them run in parallel, some are delayed
with seconds or minutes.
- There are scheduled UPDATE operations (executed on hourly basis)
containing 30K-500K triples.
- These UPDATE operations usually delete and insert the same amount
of triples in the dataset. We use the compact API as a nightly job.

*We are noticing the following behaviour:*

* Fuseki consumes 5-10G of heap memory continuously, as configured in
the JVM_ARGS.

* There are points in time when the volume usage of the k8s container
starts to increase suddenly. This does not drop even though compaction
is successfully executed and the dataset size (triple count) does not
increase. See attachment below.

*Our suspicions:*

* garbage collection in Java is often delayed; memory is not freed as
quickly as we would expect it, and the heap limit is reached quickly
if multiple parallel queries are run
* long running database queries can send regular memory to Gen2, that
is not actively cleaned by the garbage collector
* memory-mapped files are also garbage-collected (and perhaps they
could go to Gen2 as well, using more and more storage space).

Could you please explain the possible reasons behind such a behaviour?
And finally could you please suggest a more appropriate configuration
for our use case?

Thanks in advance and best wishes,
Gaspar Bartalus







Re: Requesting advice on Fuseki memory settings

2024-03-07 Thread Andy Seaborne



On 07/03/2024 13:24, Gaspar Bartalus wrote:

Dear Jena support team,

We would like to ask you to help us in configuring the memory for our 
jena-fuseki instance running in kubernetes.


*We have the following setup:*

* Jena-fuseki deployed as StatefulSet to a k8s cluster with the 
resource config:


Limits:
  cpu:     2
  memory:  16Gi
Requests:
  cpu:     100m
  memory:  11Gi

* The JVM_ARGS has the following value: -Xmx10G

* Our main dataset of type TDB2 contains ~1 million triples.

A million triples doesn't take up much RAM even in a memory dataset.

In Java, the JVM will grow until it is close to the -Xmx figure. A major 
GC will then free up a lot of memory. But the JVM does not give the 
memory back to the kernel.


TDB2 does not only use heap space. A heap of 2-4G is usually enough per 
dataset, sometimes less (data-shape dependent - e.g. many large
literals use more space).


Use a profiler to examine the heap in-use, you'll probably see a 
saw-tooth shape.

Force a GC and see the level of in-use memory afterwards.
Add some safety margin and work space for requests and try that as the 
heap size.
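
One way to force that GC and inspect the heap from a shell (a sketch; the
PID is whatever the Fuseki JVM is running as):

  jcmd <fuseki-pid> GC.run        # request a full GC
  jcmd <fuseki-pid> GC.heap_info  # report heap usage afterwards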



*  We execute the following type of UPDATE operations:
   - There are triggers in the system (e.g. users of the application 
changing the data) which start ~50 other update operations containing 
up to ~30K triples. Most of them run in parallel, some are delayed 
with seconds or minutes.
   - There are scheduled UPDATE operations (executed on hourly basis) 
containing 30K-500K triples.
   - These UPDATE operations usually delete and insert the same amount 
of triples in the dataset. We use the compact API as a nightly job.


*We are noticing the following behaviour:*

* Fuseki consumes 5-10G of heap memory continuously, as configured in 
the JVM_ARGS.


* There are points in time when the volume usage of the k8s container 
starts to increase suddenly. This does not drop even though compaction 
is successfully executed and the dataset size (triple count) does not 
increase. See attachment below.


*Our suspicions:*

* garbage collection in Java is often delayed; memory is not freed as 
quickly as we would expect it, and the heap limit is reached quickly 
if multiple parallel queries are run
* long running database queries can send regular memory to Gen2, that 
is not actively cleaned by the garbage collector
* memory-mapped files are also garbage-collected (and perhaps they 
could go to Gen2 as well, using more and more storage space).


Could you please explain the possible reasons behind such a behaviour?
And finally could you please suggest a more appropriate configuration 
for our use case?


Thanks in advance and best wishes,
Gaspar Bartalus



Re: RDFRemoteConnection to Shiro protected Fuseki

2024-02-17 Thread Andy Seaborne



On 17/02/2024 11:16, Bart van Leeuwen wrote:

Hi,

Forget the whole report, I messed up my shiro config, it works as 
expected with the snippet below.

Good to hear that!

    Andy




Bart

On 2024/02/16 13:50:57 Bart van Leeuwen wrote:
> Hi Andy,
>
> Stand alone example I'll try to work on that.
> This is an app that runs in Apache Tomee
>
> the snippet how I setup the authentication:
>
> AuthEnv.get().registerBasicAuthModifier(provider.getEndpoint(), provider
> .getUser(), provider.getPassword());
>
>     builder = RDFConnectionFuseki.create()
>         .destination(provider.getEndpoint());
>     conn = builder.build();
>
> provider is an internal class that gives me the information I need (all
> double checked to be correct)
>
> the shiro line I use:
>
> /ristore/** = authcBasic,user[admin]
>
> this works from the web UI without issues
>
> Met Vriendelijke Groet / With Kind Regards
> Bart van Leeuwen
>
> On 2024/02/16 10:49:38 Andy Seaborne wrote:
> > Hi Bart,
> >
> > Do you have a complete, ideally runnable, example of how you are 
using

> > RDFConnection and also the client side auth setup.
> >
> >      Andy
> >
> > On 15/02/2024 19:27, Bart van Leeuwen wrote:
> > > Hi,
> > >
> > > I'm runn
Met Vriendelijke Groet / With Kind Regards
Bart van Leeuwen


mastodon: @semanticfire@mastodon.social
tel. +31(0)6-53182997
Netage B.V.
http://netage.nl <http://netage.nl/>
Esdoornstraat 3
3461ER Linschoten
The Netherlands


Re: RDFRemoteConnection to Shiro protected Fuseki

2024-02-16 Thread Andy Seaborne

Hi Bart,

Do you have a complete, ideally runnable, example of how you are using 
RDFConnection and also the client side auth setup.


Andy

On 15/02/2024 19:27, Bart van Leeuwen wrote:

Hi,

I'm running Fuseki 4.9.0. on linux with OpenJDK 17
I've protected it with the shiro configuration and that works without
issues for the web UI.

When I try to connect to the server with RDFConnectionRemoteBuilder or
RDFConnectionFuseki
I get:
Caused by: java.io.IOException: WWW-Authenticate header missing for
response code 401

I've tried all the variations described in:
https://jena.apache.org/documentation/sparql-apis/http-auth.html

but to no avail.

Met Vriendelijke Groet / With Kind Regards
Bart van Leeuwen



[ANN] Apache Jena 5.0.0-rc1

2024-02-14 Thread Andy Seaborne

The Apache Jena development community is pleased to
announce the release of Apache Jena 5.0.0-rc1

In Jena5:

* Minimum: Java 17
* Language tags are case-insensitive unique.
* Term graphs for in-memory models
* RRX - New RDF/XML parser
* Remove support for JSON-LD 1.0
* Turtle/Trig Output : default output PREFIX and BASE
* Artifacts : jena-bom and OWASP CycloneDX SBOM
* API deprecation removal
* Dependency updates : slf4j update : v1 to v2 (needs log4j change)

More details below.

There is no further feature work planned for Jena 5.0.0. This RC release 
is for wider review. The review period will be about a month.


 Contributions:

Balduin Landolt @BalduinLandolt - javadoc fix for Literal.getString.

@OyvindLGjesdal - https://github.com/apache/jena/pull/2121 -- text index fix

Paul Gallagher @TelicentPaul - Code cleanup

Tong Wang @wang3820 Fix tests due to hashmap order



All issues in this release:
https://s.apache.org/jena-5.0.0-rc1-issues

which includes the ones specifically related to Jena5:

  https://github.com/apache/jena/issues?q=label%3Ajena5

** Java Requirement

Java 17 or later is required.
Java 17 language constructs now are used in the codebase.

Jakarta JavaEE required for deploying the WAR file (Apache Tomcat10)

** Language tags

Language tags are now case-insensitively unique.

"abc"@EN and "abc"@en are the same RDF term.

Internally, language tags are formatted using the algorithm of RFC 5646.

Examples "@en", "@en-GB", "@en-Latn-GB".

SPARQL LANG(?literal) will return a formatted language tag.

Data stored in TDB using language tags must be reloaded.

** Term graphs

Graphs are now term graphs in the API or SPARQL. That is, they do not 
match "same value" for some of the java mapped datatypes. The model API 
already normalizes values written.


TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from GraphMemFactory.

** RRX - New RDF/XML parser

RRX is the default RDF/XML parser. It is a replacement for ARP.
RIOT uses RRX.

The ARP parser is still temporarily available for transition assistance.

** Remove support for JSON-LD 1.0

JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of JSON-LD.

https://github.com/filip26/titanium-json-ld

** Turtle/Trig Output

"PREFIX" and "BASE" are output by default for Turtle and TriG output.

** Artifacts

There is now a release BOM for Jena artifacts - artifact 
org.apache.jena:jena-bom


There are now OWASP CycloneDX SBOM for Jena artifacts.
https://github.com/CycloneDX

jena-tdb is renamed jena-tdb1.

jena-jdbc is no longer released

** Dependencies

The update to slf4j 2.x means the log4j artifact changes to
"log4j-slf4j2-impl" (was "log4j-slf4j-impl").


 API Users

** Deprecation removal

There has been a clearing out of deprecated functions, methods and 
classes. This includes the deprecations in Jena 4.10.0 added to show 
code that is being removed in Jena5.


** QueryExecutionFactory

QueryExecutionFactory is simplified to cover common cases only; it
becomes a way to call the general QueryExecution builders, which are
preferred and provide full control of query execution setup.


Local execution builder:
QueryExecution.create()...

Remote execution builder:
QueryExecution.service(URL)...
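
For example (a sketch only - the dataset, query and endpoint URL below are
placeholders, not part of this announcement):

  import org.apache.jena.query.*;

  // Local: execute over an in-memory dataset.
  Dataset dataset = DatasetFactory.createTxnMem();
  try (QueryExecution qExec = QueryExecution.create()
          .query("SELECT * { ?s ?p ?o }")
          .dataset(dataset)
          .build()) {
      ResultSetFormatter.out(qExec.execSelect());
  }

  // Remote: execute against a SPARQL endpoint.
  try (QueryExecution qExec = QueryExecution.service("http://localhost:3030/ds/query")
          .query("SELECT * { ?s ?p ?o }")
          .build()) {
      ResultSetFormatter.out(qExec.execSelect());
  }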

** QueryExecution variable substitution

Using "substitution", where the query is modified by replacing one or 
more variables by RDF terms, is now preferred to using "initial 
bindings", where query solutions include (var,value) pairs.


"substitution" is available for all queries, local and remote, not just 
local executions.
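
For example, reusing the dataset from the sketch above (the substituted URI
is a placeholder):

  Query query = QueryFactory.create("SELECT ?p ?o WHERE { ?s ?p ?o }");
  try (QueryExecution qExec = QueryExecution.create()
          .query(query)
          .dataset(dataset)
          .substitution(Var.alloc("s"), NodeFactory.createURI("http://example.org/subject"))
          .build()) {
      ResultSetFormatter.out(qExec.execSelect());
  }

(Var is org.apache.jena.sparql.core.Var; NodeFactory is org.apache.jena.graph.NodeFactory.)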


Rename TDB1 packages org.apache.jena.tdb -> org.apache.jena.tdb1


 Fuseki Users

Fuseki: Uses the Jakarta namespace for servlets and Fuseki has been 
upgraded to use Eclipse Jetty12.


Apache Tomcat10 or later, is required for running the WAR file.
Tomcat 9 or earlier will not work.


== Obtaining Apache Jena 5.0.0-rc1

* Via central.maven.org

The main jars and their dependencies can be used with:

  
  <dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>apache-jena-libs</artifactId>
    <type>pom</type>
    <version>5.0.0-rc1</version>
  </dependency>

Full details of all maven artifacts are described at:

http://jena.apache.org/download/maven.html

* As binary downloads

Apache Jena libraries are available as a binary distribution of
libraries. For details of a global mirror copy of Jena binaries please see:

http://jena.apache.org/download/

* Source code for the release

The signed source code of this release is available at:

http://www.apache.org/dist/jena/source/

and the signed master source for all Apache Jena releases is available
at: http://archive.apache.org/dist/jena/

== Contributing

If you would like to help out, a good place to look is the list of
unresolved JIRA at:

https://github.com/apache/jena/issues

or review pull requests at

https://github.com/apache/jena/pulls

or drop into the dev@ list.

We use github pull requests and o

Re: Database Migrations in Fuseki

2024-02-09 Thread Andy Seaborne

Hi Balduin,

On 07/02/2024 11:05, Balduin Landolt wrote:

Hi everyone,

we're storing data in Fuseki as a persistence for our application backend,
the data is structured according to the application logic. Whenever
something changes in our application logic, we have to do a database
migration, so that the data conforms to the updated model.
Our current solution to that is very home-spun, not exactly stable and
comes with a lot of downtime, so we try to avoid it whenever possible.


If I understand correctly, this is a schema change requiring the data to 
change.


The transformation of the data to the updated data model could be done 
offline, that would reduce downtime. If the data is being continuously 
updated, that's harder because the offline copy will get out of step 
with the live data.
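
For illustration only (a hypothetical property rename, not the actual model
in question), such an offline migration can often be written as a single
SPARQL Update and run against the offline copy, e.g. with the tdb2.tdbupdate
command line tool, before swapping it in:

  PREFIX ex: <http://example.org/>
  DELETE { ?s ex:oldProp ?o }
  INSERT { ?s ex:newProp ?o }
  WHERE  { ?s ex:oldProp ?o }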


How often does the data change (not due to application logic changes)?


I'm now looking into how this could be improved in the future. My double
question is:
1) is there any tooling I missed, to help with this process? (In SQL world
for example, there are out of the box solutions for that.)
2) and if not, more broadly, does anyone have any hints on how I could best
go about this?


Do you have a concrete example of such a change? Maybe change-in-place
is possible, but that depends on how updates happen and how the data feeds
change with the application logic change.


Andy



Thanks in advance!
Balduin






Re: jena-fuseki UI in podman execution (2nd effort without attachments)

2024-02-09 Thread Andy Seaborne

Hi Jaana,

Glad you got it sorted out.

The Fuseki UI does not do anything special about browser caches. There 
was a major UI update with implementing it in Vue and all the HTML 
assets that go with that.


Andy

On 09/02/2024 05:37, jaa...@kolumbus.fi wrote:

Hi, I just noticed that it's not  question about podman or docker but about 
browser cache. After deleting everything in browser cache I managed to get the 
correct user interface when running stain/jena-fuseki:3.14.0 and 
stain/jena-fuseki:4.0.0 by both podman and docker, but when I tried the latest 
stain/jena-fuseki (4.8.0) I got the incorrect interface (shown here 
https://github.com/jamietti/jena/blob/main/fuseki-podman.png).

Jaana M



08.02.2024 13.23 EET jaa...@kolumbus.fi kirjoitti:

  
Hi, I've been running jena-fuseki with docker:
  
docker run -p 3030:3030 -e ADMIN_PASSWORD=pw123 stain/jena-fuseki
  
and rootless podman:
  
podman run -p 3030:3030 -e ADMIN_PASSWORD=pw123 docker.io/stain/jena-fuseki
  
when executing the same version 4.8.0 of jena-fuseki with podman, the UI looks totally different from the UI of the instance executed with docker.
  
see file fuseki-podman.png https://github.com/jamietti/jena/blob/main/fuseki-podman.png in https://github.com/jamietti/jena/

What can cause this problem ?
  
Br, Jaana M


Re: Restart during Fuseki compaction

2024-02-07 Thread Andy Seaborne

Recorded as https://github.com/apache/jena/issues/2254

On 06/02/2024 23:06, Andy Seaborne wrote:

Hi Samuel,

This is when the server exits for some reason?

(If it's an internal exception, there should be a stack trace in the log 
file.)


What operating system are you running on?

What's in the new Data-0002 directory?

It does look like some defensive measures are needed to not choose to 
use the incomplete storage directory.


     Andy


On 06/02/2024 09:26, Samuel Börlin wrote:

Hi everybody,

I recently noticed that when Fuseki (4.10.0) is stopped during a 
compaction task (started via the HTTP endpoint 
`/$/compact/{name}?deleteOld=true`)
then it uses the new and still incomplete database (e.g. Data-0002 
instead of the original non-compacted Data-0001) when it is started 
again.
Is there a way to do compaction in an atomic manner so that this 
doesn't happen?


As a workaround I'm currently thinking about simply deleting (or 
perhaps renaming/moving) all Data- directories but the one with 
the lowest index when the database is started.
I always use `?deleteOld=true`, so I only ever expect there to be one 
Data- directory when it starts. If there are multiple directories 
then that means that there must have been an incomplete compaction.

Does this seem like a reasonable approach?

Thanks and best regards,
Samuel


Re: Restart during Fuseki compaction

2024-02-06 Thread Andy Seaborne

Hi Samuel,

This is when the server exits for some reason?

(If it's an internal exception, there should be a stack trace in the log 
file.)


What operating system are you running on?

What's in the new Data-0002 directory?

It does look like some defensive measures are needed to not choose to 
use the incomplete storage directory.


Andy


On 06/02/2024 09:26, Samuel Börlin wrote:

Hi everybody,

I recently noticed that when Fuseki (4.10.0) is stopped during a compaction 
task (started via the HTTP endpoint `/$/compact/{name}?deleteOld=true`)
then it uses the new and still incomplete database (e.g. Data-0002 instead of 
the original non-compacted Data-0001) when it is started again.
Is there a way to do compaction in an atomic manner so that this doesn't happen?

As a workaround I'm currently thinking about simply deleting (or perhaps 
renaming/moving) all Data- directories but the one with the lowest index 
when the database is started.
I always use `?deleteOld=true`, so I only ever expect there to be one Data- 
directory when it starts. If there are multiple directories then that means 
that there must have been an incomplete compaction.
Does this seem like a reasonable approach?
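
For illustration, a minimal shell sketch of that clean-up, assuming the
standard Data-NNNN layout (the database path is hypothetical):

  cd /fuseki/databases/ds
  # keep only the lowest-numbered Data- directory, remove the rest
  ls -d Data-* | sort | tail -n +2 | xargs -r rm -rf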

Thanks and best regards,
Samuel


Re: question about FROM keyword

2024-02-05 Thread Andy Seaborne

This is a combination of things happening.

In the one case of no data (grph or dataset) provided, Jena does read 
the URL. If there is supplied data, FROM refers the dataset.


The URL is coming back from www.learningsparql.com
as explicitly "Content-Type: text/plain", not "text/turtle".

Jena pretty much ignores "text/plain" because it is usually wrong, so it 
tries to guess the syntax.


The URL in the message

  (URI=file:///D:/neli/cs575Spring24/ex070mod2.rq : stream=text/plain)

is misleading - that "URI" is the base URI, not the URI being read.

> (This specifically may be a bug in the arq tool)

Yes, it is.

Recorded as https://github.com/apache/jena/issues/2250

Corrected, the results are:

-------------------------------------------------------------
| last       | first     | courseName                       |
=============================================================
| "Mutt"     | "Richard" | "Updating Data with SPARQL"      |
| "Mutt"     | "Richard" | "Using SPARQL with non-RDF Data" |
| "Marshall" | "Cindy"   | "Modeling Data with OWL"         |
| "Marshall" | "Cindy"   | "Using SPARQL with non-RDF Data" |
| "Ellis"    | "Craig"   | "Using SPARQL with non-RDF Data" |
-------------------------------------------------------------

Workarounds:
1/ Download the file using curl or wget as suggested
2/ Set the base on the command line with
   --base http://www.learningsparql.com/2ndeditionexamples/ex069.ttl
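
For example, using the query file named in the error message and the data
URL from this thread:

  # 1: download the data, then query the local copy
  wget http://www.learningsparql.com/2ndeditionexamples/ex069.ttl
  arq --query ex070mod2.rq --data ex069.ttl

  # 2: set the base on the command line
  arq --query ex070mod2.rq \
      --base http://www.learningsparql.com/2ndeditionexamples/ex069.ttl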


The message

ERROR StatusLogger Reconfiguration failed: No configuration found for 
'73d16e93' at 'null' in 'null'


is unrelated.

It is the command not finding the logging set up - I don't know why that 
is happening.


Try copying the log4j2.properties from the distribution directory into 
the current directory.


Andy

On 05/02/2024 13:06, Zlatareva, Neli (Computer Science) wrote:

Hi Rob, thank you so much for the quick response. What made me wonder was that 
this same FROM from arq on command line worked perfectly fine in the past (was 
able to access remote files). However, I assume that for different reasons 
(security?) this is not the case anymore.
Truly appreciate the help.
Thanks.
Regards, Neli.

Neli P. Zlatareva, PhD
Professor of Computer Science
Department of Computer Science
Central Connecticut State University
New Britain, CT 06050
Phone: (860) 832-2723
Fax: (860) 832-2712
Web site: cs.ccsu.edu/~neli/

From: Rob @ DNR 
Sent: Monday, February 5, 2024 6:32 AM
To: users@jena.apache.org 
Subject: Re: question about FROM keyword

EXTERNAL EMAIL: This email originated from outside of the organization. Do not 
click any links or open any attachments unless you trust the sender and know 
the content is safe.

So, there’s a couple of things happening here.

Firstly, Jena’s SPARQL engine always treats FROM (and FROM NAMED) as referring 
to graphs in the local dataset.  So, it doesn’t matter that the URL in your 
FROM is a valid RDF resource on the web, Jena won’t try and load that by 
default, it just looks for a graph with that URI in the local dataset.

Nothing in the SPARQL specifications requires that these URLs be treated 
otherwise.  Some implementations choose to resolve these URIs from the web but 
that isn’t required by the standard, and from a security standpoint isn’t a 
good idea.

Secondly, the ARQ command line tool the local dataset is usually an implicit 
empty dataset if you don’t supply one.  Except as it turns out when you supply 
a FROM/FROM NAMED, in which case it tries to build one given the inputs it has. 
 In this case that’s only your query file which isn’t valid when treated as an 
RDF dataset, thus you get the big nasty stack trace you reported.  (This 
specifically may be a bug in the arq tool)

You can avoid this second problem by supplying an empty data file e.g.

  arq --query query.rq --data empty.ttl

But that will only serve to highlight the first issue, that Jena only treats 
FROM/FROM NAMED as references to graphs in the local dataset, and you’ll get an 
empty result from your query.

You are better off downloading the RDF data you want to query locally and then 
running arq and supplying both a query file and a data file.

Hope this helps,

Rob

From: Zlatareva, Neli (Computer Science) 
Date: Monday, 5 February 2024 at 01:40
To: users@jena.apache.org 
Subject: question about FROM keyword
Hi there, I am trying the following arq query from command window
(works fine if I am getting the file locally)

PREFIX ab: 


Re: ARQInternalErrorException during query execution in Jena 4.10.0

2024-01-04 Thread Andy Seaborne

https://github.com/apache/jena/discussions/2150

The query shows is not the one generated by the update builder.



Re: ARQInternalErrorException during query execution in Jena 4.10.0

2024-01-04 Thread Andy Seaborne




On 03/01/2024 20:58, Dhamotharan, Kishan wrote:

Hi all,

We are attempting to upgrade from Jena 3.5 to Jena 4.10.0.
We are using  “RDFConnection.connect(TDBFactory.createDataset());” for unit 
tests.
The below query works totally fine in Jena 3.5 but fails with the following 
exception in Jena 4.10.0.
I have confirmed that the query is correct and works totally fine in Neptune 
RDF as well. Can you please help us on how to go about this? Or please suggest 
if the query needs to be updated to something else for Jena 4.10.0.


If the update request works, try that exact string locally.

If that works, try converting the output of the UpdateBuilder to a 
string, and parsing it back:


updateRequest = UpdateFactory.create(updateRequest.toString());
update(conn, updateRequest);

If that works, then there is a problem in the UpdateBuilder.
Whether that is in the way it is being used or a bug in the 
UpdateBuilder itself isn't clear.


Reduce the test case to a simpler update.

> from Jena 3.5 to Jena 4.10.0.

It would helpful if you could bisect on the versions to identify which 
version introduced the problem.



I have also attached the code sample to reproduce the issue.


The code does not compile. Is it an extract of Groovy?

There is missing code and multiple syntax errors. It is very helpful to 
have code that runs exactly without needing to be fixed up because in
fixing it up, some assumption may be made that relates to the problem at 
hand.


One example:

> private final graph1 = getNamedGraph()

Bad Java syntax 1: no ";". Is this because it's Groovy?
Has other text also been lost? Groovy may be returning a bad choice of type.

Bad Java syntax 2.  No type declaration - inserting bad data into a 
builder can make it fail.


What's "getNamedGraph()"?

Ditto createURI()



Query :

INSERT {
   GRAPH 
 {
   "o1" .
   }
 }
 WHERE
   { GRAPH 

   { }
 GRAPH 

   { FILTER NOT EXISTS { "o3" }}
   }

Error :

 org.apache.jena.sparql.ARQInternalErrorException: compile(Element)/Not a 
structural element: ElementFilter
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.broken(AlgebraGenerator.java:577)
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.compileUnknownElement(AlgebraGenerator.java:170)
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.compileElement(AlgebraGenerator.java:156)
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.compileElementGraph(AlgebraGenerator.java:426)
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.compileElement(AlgebraGenerator.java:133)
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.compileOneInGroup(AlgebraGenerator.java:319)
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.compileElementGroup(AlgebraGenerator.java:202)
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.compileElement(AlgebraGenerator.java:127)
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.compile(AlgebraGenerator.java:113)
 at 
app//org.apache.jena.sparql.algebra.AlgebraGenerator.compile(AlgebraGenerator.java:100)
 at app//org.apache.jena.sparql.algebra.Algebra.compile(Algebra.java:73)
 at 
app//org.apache.jena.sparql.engine.QueryEngineBase.createOp(QueryEngineBase.java:140)
 at 
app//org.apache.jena.sparql.engine.QueryEngineBase.(QueryEngineBase.java:57)
 at 
app//org.apache.jena.sparql.engine.main.QueryEngineMain.(QueryEngineMain.java:45)
 at 
app//org.apache.jena.tdb.solver.QueryEngineTDB.(QueryEngineTDB.java:63)
 at 
app//org.apache.jena.tdb.solver.QueryEngineTDB$QueryEngineFactoryTDB.create(QueryEngineTDB.java:135)
 at 
app//org.apache.jena.query.QueryExecutionFactory.makePlan(QueryExecutionFactory.java:442)
 at 
app//org.apache.jena.query.QueryExecutionFactory.createPlan(QueryExecutionFactory.java:418)
 at 
app//org.apache.jena.sparql.modify.UpdateEngineWorker.evalBindings(UpdateEngineWorker.java:532)
 at 
app//org.apache.jena.sparql.modify.UpdateEngineWorker.visit(UpdateEngineWorker.java:371)
 at 
app//org.apache.jena.sparql.modify.request.UpdateModify.visit(UpdateModify.java:100)
 at 
app//org.apache.jena.sparql.modify.UpdateVisitorSink.send(UpdateVisitorSink.java:45)
 at 
app//org.apache.jena.sparql.modify.UpdateVisitorSink.send(UpdateVisitorSink.java:31)
 at 
java.base@17.0.9/java.util.ArrayList$Itr.forEachRemaining(ArrayList.java:1003)
 at 
java.base@17.0.9/java.util.Collections$UnmodifiableCollection$1.forEachRemaining(Collections.java:1061)
 at app//org.apache.jena.atlas.iterator.Iter.sendToSink(Iter.java:776)

Jena5: what to expect

2023-12-30 Thread Andy Seaborne

Jena5 is the next planned release for Apache Jena.

** All issues for Jena5:

https://github.com/apache/jena/issues?q=is%3Aissue+label%3AJena5

** Java Requirement

Java 17 or later is required.
Java 17 language constructs now are used in the codebase.

** Language tags

Language tags are now case-insensitively unique.

"abc"@EN and "abc"@en are the same RDF term.

Internally, language tags are formatted using the algorithm of RFC 5646.

Examples "@en", "@en-GB", "@en-Latn-GB".

SPARQL LANG(?literal) will return a formatted language tag.

Data stored in TDB using language tags must be reloaded.

** Term graphs

The default in-memory graphs become term graphs for consistency across 
all Jena storage options; they do not match "same value" for some of the 
java mapped datatypes e.g int 1 does not match "001"^^xsd:int. The model 
API has always normalized values written, e.g. "1"^^xsd:int


TDB1, TDB2 keep their value canonicalization during data loading.

A legacy value-graph implementation can be obtained from GraphMemFactory.

** RRX - New RDF/XML parser

RRX is a new RDF/XML parser. It is a replacement for ARP and will be the 
default.


Differences to ARP:
  * daml:collection is not supported.
  * Strict rdf:parseType
  * Relative namespaces supported.

The ARP parser will, temporarily, still be available for any
transition assistance.

** Remove support for JSON-LD 1.0

JSON-LD 1.1, using Titanium-JSON-LD, is the supported version of JSON-LD.

https://github.com/filip26/titanium-json-ld

** Turtle/Trig Output

The "PREFIX" and "BASE" forms are output by default for Turtle and TriG 
output. See RIOT.symTurtleDirectiveStyle.



 API Users

** Deprecation removal

There has been a general clearing out of deprecated functions, methods 
and classes. This includes deprecations in Jena 4.10.0 added to show 
code that is being removed in Jena5.


** QueryExecutionFactory

QueryExecutionFactory is simplified to cover common cases only; it
becomes a way to call the more general QueryExecution builders which 
support custom query execution setup.


Local execution builder:
  QueryExecution.create()...

Remote execution builder:
  QueryExecution.service(URL)...

** QueryExecution variable substitution

Using "substitution", where the query is modified by replacing one or 
more variables by RDF terms, is now preferred to using "initial 
bindings", where query solutions include (var,value) pairs.


"substitution" is available for all queries, local and remote, not just 
local executions.



 Fuseki Users

Fuseki: Uses the jakarta namespace for servlets and Fuseki has been 
upgraded to use Eclipse Jetty12.


Apache Tomcat10 or later, is required for running the WAR file.
Tomcat 9 or earlier will not work.


Re: Parallel requests on multiple fuseki

2023-12-14 Thread Andy Seaborne

Jorge,

Have you looked at

https://jena.apache.org/documentation/query/service_enhancer.html

It might have features of use to you.

Andy

On 14/12/2023 08:25, George News wrote:

Hi,

I have deployed several Fuseki instances. This email scenario is just 
for 2.


I was testing the SERVICE option in order to launch the same request to
both instances of Fuseki and merge the result under one response.

The SPARQL request I launched is the following:

prefix rdf: 
SELECT * WHERE {
   { SERVICE 
     {
   SELECT ?anything WHERE{?anything rdf:type ?bb}
     } BIND ( AS ?serviceLabel)
   }
   UNION
   { SERVICE 
  {
   SELECT ?anything WHERE{?anything rdf:type ?bb}
  } BIND ( AS ?serviceLabel)
   }
}

The result was as expected. However, when using Wireshark and analysing
the logs, I noticed that the requests are not sent in parallel, but just
one and then the other. This is somewhat a waste of time ;)

Is there any way to parallelize sending the same request to many Fuseki
instances and merge the responses? I guess I can make my own solution
using Jena, but I wanted to know if it would be possible using SPARQL.

Thanks.
Jorge


Re: Checking that SPARQL Update will not validate SHACL constraints

2023-12-13 Thread Andy Seaborne




On 13/12/2023 15:49, Arne Bernhardt wrote:

Hello Martynas,

I have no experience with implementing a validation layer for Fuseki.

But I might have an idea for your suggested approach:
Instead of loading a copy of the graph and modifying it, you could create
an org.apache.jena.graph.compose.Delta based on the unmodified graph.
Then apply the update to the delta graph and validate the SHACL on the
delta graph. If the validation is successful, you can safely apply the
update to the original graph and discard the delta graph.

You still have to deal with concurrency. For example, the original graph
could be changed by a second, faster update while you are still validating
the first update. It would not be safe to apply the validated changes to a
graph that has been changed in the meantime.

Arne


It'll depend on the SHACL. Many constraints don't need all the data
available. Some need just the subject and all properties (e.g. 
sh:maxCount). Some need all the data (SPARQL ones - they are opaque to 
analysis so the general way is they need all the data).


If the proxy layer is in the same JVM, BufferingDatasetGraph may help.
It can be used to capture the adds and deletes. It can then be validated 
(all data or only the data changing). Flush the changes to the database 
just before the end of the request in the proxy level commit.


If the proxy is in a different JVM, then only certain constraints can be 
supported but they do tend to be the most common checks.


Andy






Am Mi., 13. Dez. 2023 um 14:29 Uhr schrieb Martynas Jusevičius <
marty...@atomgraph.com>:


Hi,

I have an objective to only persist constraint-validated data in Fuseki.

I have a proxy layer that validates all incoming GSP PUT and POST
request graphs in memory and rejects the invalid ones. So far so good.

What about SPARQL Update requests though? For simplicity's sake, let's
say they are restricted to a single graph as in GSP PATCH [1].
What I can think of is first loading the graph into memory and
executing the update, and then validating the resulting graph against
SHACL. But maybe there's a smarter way?

Also interested in the more general case without the graph restriction.

Martynas

[1] https://www.w3.org/TR/sparql11-http-rdf-update/#http-patch





Re: Unable to build the below query using jena query builder

2023-12-08 Thread Andy Seaborne




On 08/12/2023 02:08, Dhamotharan, Kishan wrote:

Hello Lorenz,

Thanks for your response.


...



Since query builder 3.5 does not have addWhereValueVar is there any other way 
to build the query ?

It’s a very painful process to pull in three party / open source libraries, 
requires multiple approvals and adding a new version would involve a very 
tedious task of manually upgrading and pulling in the dependences and get them 
to work with the in house build system. Would be great if we have a workaround 
for this.


If you're unwilling to upgrade (and the 6-year-old 3.5.0 has CVE issues
raised against it, so upgrading would be a very good idea) then you could
consider taking the query builder source code. It is a self-contained
feature of Apache Jena and should back-port quite easily.


Andy


Re: Problem running AtomGraph/fuseki-docker

2023-12-07 Thread Andy Seaborne

[2023-12-06 22:19:53] INFO  Server  :: Path = /'/ds'

Not good. Shell quoting didn't happen. That's a URL path component 
called '/ds' in the server root.


Andy

On 06/12/2023 23:55, Steve Vestal wrote:

I was using bash.  When I run it in command prompt, it works. Thanks!

Interestingly, when the command prompt is closed, the container is 
removed from Docker Desktop.  Each new start creates a new container 
with a new amusing name :-)


C:\Users\svestal>docker run --rm -p 3030:3030 atomgraph/fuseki --mem '/ds'
[2023-12-06 22:19:53] INFO  Server  :: Apache Jena Fuseki 4.6.1
[2023-12-06 22:19:53] INFO  Server  :: Database: in-memory
[2023-12-06 22:19:53] INFO  Server  :: Path = /'/ds'
[2023-12-06 22:19:53] INFO  Server  :: System
[2023-12-06 22:19:53] INFO  Server  ::   Memory: 2.0 GiB
[2023-12-06 22:19:53] INFO  Server  ::   Java:   17-ea
[2023-12-06 22:19:53] INFO  Server  ::   OS: Linux 
5.15.133.1-microsoft-standard-WSL2 amd64

[2023-12-06 22:19:53] INFO  Server  ::   PID:    1
[2023-12-06 22:19:53] INFO  Server  :: Start Fuseki (http=3030)

On 12/6/2023 2:12 PM, Martynas Jusevičius wrote:

Hi Steve,

This looks like Windows shell issue.

For some reason /ds is resolved as a filepath where it shouldn’t.

Can you try —mem '/ds' with quotes?

I’m running Docker on WSL2 and never had this problem.

Martynas

On Wed, 6 Dec 2023 at 21.05, Steve Vestal  
wrote:



I am running a VM with Microsoft Windows Server 2019 (64-bit). When I
try to stand up the docker server, I get

$ docker run --rm -p 3030:3030 atomgraph/fuseki --mem /ds
String '/C:/Program Files/Git/ds' not valid as 'service'

Suggestions?




Re: Text indexing stopped working

2023-11-30 Thread Andy Seaborne

There isn't much information to go on.

On 29/11/2023 09:50, Mikael Pesonen wrote:

No idea?

On 16/11/2023 13.11, Mikael Pesonen wrote:
What could be the reason why new data is suddenly not added to text 
index and not found with Jena text queries?


The newest files in Jena text index folder are zero sized

_b_Lucene85FieldsIndexfile_pointers_n.tmp
_b_Lucene85FieldsIndex-doc_ids_m.tmp

dated 2023-11-13 although I have added lots of data since then using 
same methods as before. Text queries find all the data before this date.


BR




Re: Querying URL with square brackets

2023-11-25 Thread Andy Seaborne




On 25/11/2023 13:47, Marco Neumann wrote:

I was looking for an IRI validator and this one didn't come up in the
search engines. This service might need a bit more visibility and some
incoming links.


It gets lost in all the code library "validators"



Marco

On Sat, Nov 25, 2023 at 1:34 PM Andy Seaborne  wrote:




On 24/11/2023 10:05, Marco Neumann wrote:

(side note) preferably the local name of a URI should not start with a
number but a letter or underscore.


It's a hangover from XML QNames.

Turtle doesn't care.

Style-wise, yes, avoid an initial number.


What do you mean by human-readable here? For large technical systems it's
simply not feasible to encode meaning into the URI and I might even
consider it an anti-pattern.

There are some community efforts that have introduced single letters and
number sequences for vocabulary development like CIDOC CRM which was

later

also adopted by community projects like wikidata. But instance data
typically doesn't have that requirement and can be random but has to be
syntax compliant of course.

I am sure Andy can elaborate on the details of the encoding here.


There's an online IRI validator.

https://sparql.org/iri-validator.html

using the jena-iri package.






Re: Querying URL with square brackets

2023-11-25 Thread Andy Seaborne




On 24/11/2023 10:05, Marco Neumann wrote:

(side note) preferably the local name of a URI should not start with a
number but a letter or underscore.


It's a hangover from XML QNames.

Turtle doesn't care.

Style-wise, yes, avoid an initial number.


What do you mean by human-readable here? For large technical systems it's
simply not feasible to encode meaning into the URI and I might even
consider it an anti-pattern.

There are some community efforts that have introduced single letters and
number sequences for vocabulary development like CIDOC CRM which was later
also adopted by community projects like wikidata. But instance data
typically doesn't have that requirement and can be random but has to be
syntax compliant of course.

I am sure Andy can elaborate on the details of the encoding here.


There's an online IRI validator.

https://sparql.org/iri-validator.html

using the jena-iri package.


Re: Querying URL with square brackets

2023-11-25 Thread Andy Seaborne




On 24/11/2023 08:55, Marco Neumann wrote:

Laura, see jena issue #2102
https://github.com/apache/jena/issues/2102


It's specific to [].

Because data formats accept these bad URIs (with a warning), the fact 
SPARQL generates errors is a bug to be fixed.


Andy



Marco

On Fri, Nov 24, 2023 at 7:12 AM Laura Morales  wrote:


I have a few URLs containing square brackets like
http://example.org/foo[1]bar
I can create a TDB2 dataset without much problems, with warnings


Warnings exist for a reason!

>> but no errors.



I tried escaping, "foo\[1\]bar" but it doesn't work.


URIs don't accept \ escapes.

And U+ doesn't help because the check isn't just in the parser.



Re: Querying URL with square brackets

2023-11-25 Thread Andy Seaborne




On 24/11/2023 10:40, Marco Neumann wrote:

The URI syntax is defined by the Internet Engineering Task Force (IETF) in
RFC 3986.

W3C RDF is just a rule-taker here ;)

https://datatracker.ietf.org/doc/html/rfc3986


We've drafted a non-normative section:

https://www.w3.org/TR/rdf12-concepts/#iri-abnf

which brings together all the RFCs we could find, adopting the current
state of terminology.


Nowadays, URI and IRI are interchangeable. Only use in HTTP requests 
worries about ASCII vs UTF-8 and then only in old software. Use a 
toolkit and it'll sort it out.


Only the URI scheme name is restricted to A-Z.

   Andy



Marco

On Fri, Nov 24, 2023 at 10:36 AM Laura Morales  wrote:


What do you mean by human-readable here? For large technical systems it's
simply not feasible to encode meaning into the URI and I might even
consider it an anti-pattern.


This is my problem. I do NOT want to encode any meaning into URLs, but I
do want them to be human readable simply because I) properties are URLs
too, 2) they can be used online, and 3) they are simpler to work with, for
example editing in a Turtle file or writing a query.

:alice :knows :bob    vs    :dsa7hdsahdsa782j :d93ifg75jgueeywu
:s93oeirugj290sjf

I can avoid [ entirely, but it raises the question of what other characters
I MUST avoid.


{} {}

You can use () but hierarchical names are better.

Be careful about ':' because it can't be in the first segment of a path 
of a relative URI (it looks like a scheme name).


Andy








RDF URI references [Was: Querying URL with square brackets]

2023-11-25 Thread Andy Seaborne
Another option is the HTTP query string - think of it as asking a
question of the resource "http://example.org/book".


Andy

On 24/11/2023 11:03, Martynas Jusevičius wrote:

On Fri, Nov 24, 2023 at 11:46 AM Laura Morales  wrote:



in the case that I want to use these URLs with a web browser.


I don't understand what the trouble with the above example is?


The problem with # is that browsers treat them as the start of a local 
reference. When you open http://example.org/book#1 the server only receives 
http://example.org/book. In other words it would be an error to create nodes 
for n different books (#1 #2 #3 #n) if my goal is also to use these URLs with a 
browser (for example if I want to show one page for every book). It's not a 
problem with Jena, it's a problem with the way browsers treat the fragment.


If you want a page for every book, don't use fragment URIs. Use
http://example.org/book/1 or http://example.org/book/1#this instead of
  http://example.org/book#1.


Re: Implicit default-graph-uri

2023-11-19 Thread Andy Seaborne




On 18/11/2023 08:21, Laura Morales wrote:

I've tried this option too using the following configuration


fuseki:dataset [
    a ja:RDFDataset ;

    ja:defaultGraph [
        a ja:UnionModel ;

        ja:subModel [
            a tdb2:GraphTDB2 ;
            tdb2:dataset [
                a tdb2:DatasetTDB2 ;
                tdb2:location "location1"
            ]
        ] ;

        ja:subModel [
            a tdb2:GraphTDB2 ;
            tdb2:dataset [
                a tdb2:DatasetTDB2 ;
                tdb2:location "location2"
            ]
        ] ;
    ]
]


but it always gives me "transaction error" with any query. I've tried TDB 1 
instead, but it gives me a different error:

ERROR Server  :: Exception in initialization: the (group) Assembler 
org.apache.jena.assembler.assemblers.AssemblerGroup$PlainAssemblerGroup@b73433 
cannot construct the object [...] [ja:subModel of [...] [ja:defaultGraph of 
[...] ]] because it does not have an implementation for the objects's most 
specific type ja:Model

I've found a couple of old threads online with people reporting "MultiUnion" as 
working, but I don't know how to use this configuration. I couldn't find it in the Fuseki 
documentation, and simply replacing ja:UnionModel with ja:MultiUnionModel doesn't make any 
difference for me.
Do you know anything about this MultiUnion and if it could work?


Only with use of default-graph-uri or SELECT FROM.

Having a dataset description in the request itself causes the processor 
to have a per-request dataset with the Java class GraphUnionRead as the 
default graph. GraphUnionRead copes with the transaction setup across 
the two locations.


As things stand at the moment, other ways of constructing a suitable 
dataset don't use GraphUnionRead.


Using a service name of

   "/service/query/?default-graph-uri=urn:x-arq:UnionGraph".

Tools generally cope with a query string in the URL and correctly assemble 
the URL:


Java:

QueryExecution qExec =
    QueryExecutionHTTP
        .service("http://localhost:3030/ds/?default-graph-uri=urn:x-arq:UnionGraph")
        .query("SELECT * { ?s ?p ?o }")
        .build();

or
 curl -d 'query=SELECT * {?s ?p ?o}'
   'http://localhost:3030/ds/?default-graph-uri=urn:x-arq:UnionGraph'

Andy




Re: Implicit default-graph-uri

2023-11-17 Thread Andy Seaborne




On 16/11/2023 11:35, Laura Morales wrote:

I would like to configure Fuseki such that I can use 2 datasets from 2 
different locations, as if they were a single dataset.
This is my config.ttl:


<#> a fuseki:Service ;

 fuseki:endpoint [
 fuseki:operation fuseki:query
 ] ;

 fuseki:dataset [
 a ja:RDFDataset ;

 ja:namedGraph [
 ja:graphName :graph1 ;
 ja:graph [
 a tdb2:GraphTDB ;
 tdb2:location "location-1" ;
 ]
 ] ;

 ja:namedGraph [
 ja:graphName :graph2 ;
 ja:graph [
 a tdb2:GraphTDB ;
 tdb2:location "location-2" ;
 ]
 ] ;
 ] .


There is no particular reason why I used this configuration; I mostly copied it 
from the Fuseki documentation. If it can be simplified, please suggest how.

I query Fuseki with "/service/query/?default-graph-uri=urn:x-arq:UnionGraph". I also know that I can use 
"SELECT FROM ". But I would like to know if I can configure this behavior as 
the default in the main configuration file, such that I can avoid using "x-arq:UnionGraph" entirely.
Both datasets are TDB2 and contain triples only in the default unnamed graph 
(in other words do not contain any named graph inside).


I can't find a way to do that.

tdb2:unionDefaultGraph applies to a single dataset and you have two 
datasets.


Using
    ja:defaultGraph [
        a ja:Model ;
        ja:subModel ...
        ja:subModel ...
    ] ;

falls foul of transaction coordination across two different models (even 
if they are views of the same database).


I thought that would work - there is some attempt to extend transactions 
into graphs but this seems to be pushing things too far.
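
For completeness: with a single TDB2 dataset the union view can at least be
reached per query, without any assembler configuration. A rough sketch,
assuming a store at "location1":

    import org.apache.jena.query.*;
    import org.apache.jena.tdb2.TDB2Factory;

    Dataset ds = TDB2Factory.connectDataset("location1");
    ds.begin(ReadWrite.READ);
    try (QueryExecution qExec = QueryExecutionFactory.create(
            "SELECT * { GRAPH <urn:x-arq:UnionGraph> { ?s ?p ?o } }", ds)) {
        ResultSetFormatter.out(qExec.execSelect());
    } finally {
        ds.end();
    }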


Andy


Re: Delete Dataset with fuseki

2023-11-16 Thread Andy Seaborne




On 15/11/2023 09:19, Steven Blanchard wrote:

Dear Jena Users,

When I delete a dataset with Fuseki, only the configuration file is 
removed, not the TDB2 folder.
According to the documentation this is expected behaviour:

But I have a problem with this: when I want to recreate a dataset with 
the same name, the old data are still there.


How can I delete the tdb2 dataset with fuseki by interface or API?


Currently, that isn't possible.

You can delete the folder via the OS.

Feel free to raise an issue for a new feature. We can add a query string 
item and do the same as compact, which has a ?deleteOld flag.
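
For reference, the existing compact operation with that flag can be driven over
plain HTTP; a sketch using java.net.http (check the exact admin endpoint and
flag spelling against the Fuseki administration documentation for your version):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Ask Fuseki to compact the TDB2 database behind /ds and remove the old
    // generation afterwards (the ?deleteOld flag mentioned above).
    // send() throws IOException/InterruptedException - declare or handle as needed.
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:3030/$/compact/ds?deleteOld=true"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode());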


However, there is also the issue in the general case that other 
operations may be using the database concurrently. Maybe renaming the 
folder aside is better - already-started requests will finish cleanly.


Note that on MS Windows, it isn't possible to free the space. It is a 
JVM feature on MS Windows that memory-mapped files do not go away until 
the JVM exits. This is a long-standing issue in Java.


Andy



Thanks,

Steven




Re: Issues importing Jena to Eclipse after clean install

2023-11-16 Thread Andy Seaborne

https://stackoverflow.com/questions/77490993/importing-jena-to-eclipse-compile-problems

On 15/11/2023 21:35, Paul Jarski wrote:


I believe I've followed the instructions from 
https://jena.apache.org/tutorials/using_jena_with_eclipse.html: I ran 
mvn clean install with apparently no issues, but then when I tried to 
import the maven project to Eclipse, I had 271 errors, mostly 
pertaining to the Graph type, which didn't compile somehow. I can't 
seem to find advice on how to resolve this issue anywhere. Any 
suggestions? Thanks in advance!





Re: Semantics of SPARQL Update Delete

2023-11-10 Thread Andy Seaborne




On 10/11/2023 20:35, Marco Neumann wrote:

On Fri, Nov 10, 2023 at 5:51 PM Andy Seaborne  wrote:




On 10/11/2023 12:33, Marco Neumann wrote:

Should DELETE {URI URI * } not update all matching graph patterns?


No.
(and that's bad syntax)


I had a case where only DELETE {URI URI NODE } did execute the update in
the dataset/graph/query fuseki UI.

To be precise it is a DELETE INSERT combination with an empty WHERE clause.


DELETE {pattern} INSERT {pattern} WHERE { }


the "pattern" is used as a template.
DELETE {template} INSERT {template} WHERE {pattern}

If the template has variables, these variables must be set by the WHERE
clause. Otherwise triple patterns with unbound variables are skipped.



OK, yes I think this is my case: an unbound variable was used in the
template, and the "Update Success" tricked me into believing that the data was
actually removed.


"Update Success" means "executed as per spec" :-)

It's the same rule as CONSTRUCT which skips triples with any unbound 
variables.


Andy



There is no pattern matching  in a template.

There is a short form DELETE WHERE { pattern } which is
DELETE { pattern } WHERE {pattern}, using the pattern as the template.

  Andy



Marco








Re: Semantics of SPARQL Update Delete

2023-11-10 Thread Andy Seaborne




On 10/11/2023 18:19, Marco Neumann wrote:

On Fri, Nov 10, 2023 at 5:51 PM Andy Seaborne  wrote:




On 10/11/2023 12:33, Marco Neumann wrote:

Should DELETE {URI URI * } not update all matching graph patterns?


No.
(and that's bad syntax)



DELETE {  ?x } is bad syntax?


"*" is bad syntax.

DELETE {  ?x } is bad syntax for another reason - there must 
be a WHERE.






I had a case where only DELETE {URI URI NODE } did execute the update in
the dataset/graph/query fuseki UI.

To be precise it is a DELETE INSERT combination with an empty WHERE clause.


DELETE {pattern} INSERT {pattern} WHERE { }


the "pattern" is used as a template.
DELETE {template} INSERT {template} WHERE {pattern}

If the template has variables, these variables must be set by the WHERE
clause. Otherwise triple patterns with unbound variables are skipped.

There is no pattern matching  in a template.

There is a short form DELETE WHERE { pattern } which is
DELETE { pattern } WHERE {pattern}, using the pattern as the template.

  Andy



Marco








Re: Semantics of SPARQL Update Delete

2023-11-10 Thread Andy Seaborne




On 10/11/2023 12:33, Marco Neumann wrote:

Should DELETE {URI URI * } not update all matching graph patterns?


No.
(and that's bad syntax)


I had a case where only DELETE {URI URI NODE } did execute the update in
the dataset/graph/query fuseki UI.

To be precise it is a DELETE INSERT combination with an empty WHERE clause.

DELETE {pattern} INSERT {pattern} WHERE { }


the "pattern" is used as a template.
DELETE {template} INSERT {template} WHERE {pattern}

If the template has variables, these variables must be set by the WHERE 
clause. Otherwise triple patterns with unbound variables are skipped.


There is no pattern matching  in a template.

There is a short form DELETE WHERE { pattern } which is
DELETE { pattern } WHERE {pattern}, using the pattern as the template.
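
A small illustration of the template/pattern split, against an in-memory
dataset (not from the thread, just a sketch): here ?o is bound by the WHERE
clause, so the templates produce concrete triples; with WHERE { } the same
templates would be silently skipped.

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.update.UpdateExecutionFactory;
    import org.apache.jena.update.UpdateFactory;

    Dataset ds = DatasetFactory.createTxnMem();
    String update =
        "PREFIX : <http://example/>\n" +
        "DELETE { :s :p ?o }\n" +      // template: instantiated once per WHERE match
        "INSERT { :s :q ?o }\n" +      // template: instantiated once per WHERE match
        "WHERE  { :s :p ?o }";         // pattern: binds ?o
    UpdateExecutionFactory.create(UpdateFactory.create(update), ds).execute();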

Andy



Marco



Re: Ever-increasing memory usage in Fuseki

2023-11-02 Thread Andy Seaborne

Hi Hugo,

On 01/11/2023 19:43, Hugo Mills wrote:

Hi,

We’ve got an application we’ve inherited recently which uses a Fuseki 
database. It was originally Fuseki 3.4.0, and has been upgraded to 4.9.0 
recently. The 3.4.0 server needed regular restarts (once a day) in order 
to keep working; the 4.9.0 server is even more unreliable, and has been 
running out of memory and being OOM-killed multiple times a day. This 
afternoon, it crashed enough times, fast enough, to make Kubernetes go 
into a back-off loop, and brought the app down for some time.


We’re using OpenJDK 19. The JVM options are: “-Xmx30g -Xms18g”, and the 
container we’re running it in has a memory limit of 31 GiB.


Setting Xmx close to the container limit can cause problems.

The JVM itself takes space and the operating system needs space.
The JVM itself has a ~1G extra space for direct memory which networking 
uses.


The Java heap will almost certainly grow to reach Xmx at some point 
because Java delays running full garbage collections. The occasional 
drops you see are likely incremental garbage collections happening.


If Xmx is very close to the container limit, the heap will naturally grow 
(it does not know about the container limit), then the total in-use 
memory for the machine is reached and the container is killed.


A 30G heap looks like a very tight setting. Is there anything customized 
running in Fuseki? Is the server dedicated to Fuseki?


As Conal mentioned, TDB uses memory-mapped files - these are not part of 
the heap. They are part of the OS virtual memory.


Is this a single database?
One TDB database needs about 4G RAM of heap space. Try a setting of -Xmx4G.

Only if you have a high proportion of very large literals will that 
setting not work.


More is not better from TDB's point of view. Space for memory-mapped 
files is handled elsewhere, and that space will expand and contract 
as needed. If that space is squeezed out, the system will slow down.


We tried the 
“-XX:+UserSerialGC” option this evening, but it didn’t seem to help 
much. We see the RAM usage of the java process rising steadily as 
queries are made, with occasional small, but insufficient, drops.



The store is somewhere around 20M triples in size.


Is this a TDB database or in-memory? (I'm guessing TDB but could you 
confirm that.)


Query processing can lead to a lot of memory use if the queries are 
inefficient and there is a high, overlapping query load.


What is the query load on the server? Are there many overlapping requests?

Could anyone suggest any tweaks or options we could do to make this more 
stable, and not leak memory? We’ve downgraded to 3.4.0 again, and it’s 
not running out of space every few minutes at least, but it still has an 
ever-growing memory usage.


Thanks,

Hugo.

*Dr. Hugo Mills*

Senior Data Scientist

hugo.mi...@agrimetrics.co.uk 


[ANN] Apache Jena 4.10.0

2023-11-01 Thread Andy Seaborne



The Apache Jena development community is pleased to
announce the release of Apache Jena 4.10.0

In this release:

* Prepare for Jena5

  Check use of deprecated API calls
These are largely being removed in Jena5.

  Jena5 will require Java17

  jena5 Fuseki will switch from javax.servlet to jakarta.servlet
This will require use of Apache Tomcat 10 to run the WAR file.

  jena-jdbc is planned for retirement in Jena 5.0.0

See the Jena5 label in the github issues area:

https://github.com/apache/jena/issues?q=is%3Aissue+label%3Ajena5

* Development will switch to Jena5.
  The 'main' branch is now for Jena5 development.
  There is a branch 'jena4' marking the 4.10.0 release

== Notes

All issues: https://s.apache.org/jena-4.10.0-issues

There is a CHANGES.txt in the root of the repository
with the history of announcement messages.

 Contributions:

Shawn Smith
"Race condition with QueryEngineRegistry and
UpdateEngineRegistry init()"
  https://issues.apache.org/jira/browse/JENA-2356

Ali Ariff
"Labeling for Blank Nodes Across Writers"
  https://github.com/apache/jena/issues/1997

sszuev
"jena-core: add more javadocs about Graph-mem thread-safety and 
ConcurrentModificationException"

  https://github.com/apache/jena/pull/1994

sszuev
GH-1419: fix DatasetGraphMap#clear
  https://github.com/apache/jena/issues/1419

sszuev
GH-1374: add copyWithRegisties Context helper method
  https://github.com/apache/jena/issues/1374


All issues in this release:
https://s.apache.org/jena-4.10.0-issues

 Key upgrades

org.apache.lucene : 9.5.0 -> 9.7.0
org.apache.commons:commons-lang3: 3.12.0 -> 3.13.0
org.apache.sis.core:sis-referencing : 1.1 -> 1.4

== Obtaining Apache Jena 4.10.0

* Via central.maven.org

The main jars and their dependencies can be used with:

  
  <dependency>
    <groupId>org.apache.jena</groupId>
    <artifactId>apache-jena-libs</artifactId>
    <type>pom</type>
    <version>4.10.0</version>
  </dependency>

Full details of all maven artifacts are described at:

http://jena.apache.org/download/maven.html

* As binary downloads

Apache Jena libraries are available as a binary distribution of
libraries. For details of a global mirror copy of Jena binaries please see:

http://jena.apache.org/download/

* Source code for the release

The signed source code of this release is available at:

http://www.apache.org/dist/jena/source/

and the signed master source for all Apache Jena releases is available
at: http://archive.apache.org/dist/jena/

== Contributing

If you would like to help out, a good place to look is the list of
unresolved issues at:

https://github.com/apache/jena/issues

or review pull requests at

https://github.com/apache/jena/pulls

or drop into the dev@ list.

We use github pull requests and other ways for accepting code:
 https://github.com/apache/jena/blob/master/CONTRIBUTING.md


Re: HTTP QueryExecution has been closed

2023-10-29 Thread Andy Seaborne

It's not clear from the information so far.

Complete, minimal, verifiable example please.

Also - what's the stacktrace you are seeing and which Jena version are 
you running?
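
Something self-contained along these lines - a sketch that reuses the same
calls against a plain in-memory model, no HTTP involved - is the kind of
example that helps:

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryFactory;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;

    Model rowModel = ModelFactory.createDefaultModel();
    String queryString = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }";
    try (QueryExecution qex = QueryExecution.create(QueryFactory.create(queryString), rowModel)) {
        Dataset result = qex.execConstructDataset();
        // For a plain CONSTRUCT the triples land in the default model of the result dataset.
        System.out.println(result.getDefaultModel().size());
    }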


Andy

On 27/10/2023 22:18, Martynas Jusevičius wrote:

Hi,

I'm trying to understand in which circumstances can the following code

    try (QueryExecution qex = QueryExecution.create(getQuery(), rowModel)) {
        return qex.execConstructDataset();
    }

throw the "HTTP QueryExecution has been closed" exception?
Full code here:
https://github.com/AtomGraph/LinkedDataHub/blob/rf-direct-graph-ids-only/src/main/java/com/atomgraph/linkeddatahub/imports/stream/csv/CSVGraphStoreRowProcessor.java#L141

The execution is not even happening over HTTP? Is it somehow closed prematurely?

I can see the exception being thrown in QueryExecDataset::constructQuads:
https://github.com/apache/jena/blob/main/jena-arq/src/main/java/org/apache/jena/sparql/exec/QueryExecDataset.java#L211

Martynas


Re: qparse but preserve comments

2023-10-29 Thread Andy Seaborne




On 27/10/2023 17:58, Justin wrote:

Hello,

Is it possible to allow qparse to preserve comments?


No - the parser skips them.



e.g. it currently does not:
```
justin@parens:/tmp$ cat a.rq
select * where {
?s ?p ?o
# comment here
}
justin@parens:/tmp$ ~/Downloads/apache-jena-4.7.0/bin/qparse --query a.rq
SELECT  *
WHERE
   { ?s  ?p  ?o }
```

If comment lines (starting with #) are too tricky to work with, what about
something like Clojure's `comment`
e.g.

```
(comment (range 5))
```
That way the "comment" is still part of the AST tree.
Maybe there could be a magic predicate that gets ignored?
```
[] ex:comment "comment here" .
```



If the comment is between syntax elements, then, yes, an 
"ElementComment" could be added.


Some positions are within syntax elements:

 { ?s
   # Some comment
   ?p  ?o }

and don't have a natural place to go in the AST.

Andy

c.f. Pragmas.


Re: How to reconstruct a Literal from a SPARQL SELECT row element?

2023-10-26 Thread Andy Seaborne




On 26/10/2023 10:17, Steve Vestal wrote:
What is the best way to reconstruct a typed Literal from a SPARQL SELECT 
result?


I have a SPARQL SELECT query issued against an OntModel in this way:

  QueryExecution structureRowsExec =
      QueryExecutionFactory.create(structureRowsQuery, owlOntModel);


Here are some example triples in the query:

   ?a2 
 ?dataVar1.
   ?a2 
 ?dataVar2.




Query results come back as the right RDF term kind.


The OntModel being queried was created using typed literals, e.g.,


     DataPropertyAssertion( struct:floatProperty struct:indivA2 "123.456"^^xsd:float )
     DataPropertyAssertion( struct:dateTimeProperty struct:indivA2 "2023-10-06T12:05:10Z"^^xsd:dateTime )


When I look at the ?dataVar1 and ?dataVar2 results in a row, I get 
things like:


  1
  stringB
  123.456
  2023-10-06T12:05:10Z


Are those just the toString() presentation?
Or is your query returning strings?



What is a good way to reconstruct a typed Literal from the query 
results? 


RDFNode is the class for all RDF term types.

QuerySolution.get

and if you know they are literals:

QuerySolution.getLiteral
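
A rough sketch, reusing the names from your snippet (structureRowsQuery,
owlOntModel and a ?dataVar1 variable), showing the datatype carried through
to the result:

    import org.apache.jena.query.*;
    import org.apache.jena.rdf.model.Literal;

    try (QueryExecution qExec = QueryExecutionFactory.create(structureRowsQuery, owlOntModel)) {
        ResultSet results = qExec.execSelect();
        while (results.hasNext()) {
            QuerySolution row = results.next();
            Literal lit = row.getLiteral("dataVar1");
            if (lit != null) {
                // e.g. "123.456" and http://www.w3.org/2001/XMLSchema#float
                System.out.println(lit.getLexicalForm() + " ^^ " + lit.getDatatypeURI());
                Object value = lit.getValue();   // a Java object, e.g. a Float for xsd:float
            }
        }
    }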

Is there a SPARQL option to show full typed literal strings? 
Something that can be added to the query?  A utility method that can 
identify the XSD schema simple data type when given a result value string?





Re: pellet version

2023-10-19 Thread Andy Seaborne




On 19/10/2023 14:46, Taras Petrenko wrote:

Hi,
I would like to know which Pellet implementation is the most consistent with 
Jena, or which one is currently used in Protege?
Now I am using openllet-jena, version 2.6.3:


To find the version of Jena to go with a release of openllet-jena, 
either check through the dependencies of your project or look in the POM 
for openllet-parent.


There are later versions of openllet-jena

https://repo1.maven.org/maven2/com/github/galigator/openllet/openllet-jena/

The version in the git repo is

2.6.6-SNAPSHOT

uses Jena 4.2.0

https://github.com/Galigator/openllet/blob/3abccbfc0eec54233590cd4149055b78351e374d/pom.xml#L88

so you could try building from source.

Andy





<dependency>
    <groupId>com.github.galigator.openllet</groupId>
    <artifactId>openllet-jena</artifactId>
    <version>2.6.3</version>
</dependency>


But I noticed some datatype conversion problems in there.

Thank you for your time and all the best

Taras



Dr. Taras Petrenko
High Performance Computing Center Stuttgart (HLRS)
Nobelstr. 19, 70569 Stuttgart, Germany
Email: taras.petre...@hlrs.de



Re: In using RIOT I encounter the "64000" entity expansions error.

2023-10-13 Thread Andy Seaborne


On 12/10/2023 20:20, Steve Vestal wrote:
I couldn't resist trying https://purl.obolibrary.org/obo/foodon.owl as 
a stress test for what we are doing.  We're on Jena 4.5.0 and I'm getting


Not in RDF/XML format due to exception 
org.apache.jena.riot.RiotException [line: 110334, col: 72] Invalid 
byte 2 of 2-byte UTF-8 sequence.

("Not in RDF/XML format due to..." does not appear to be a Jena message)

At that location:

"...(/ˈærɪkə/ or /əˈriːkə/)..."
        ^
(This email is UTF-8)

Line/column for encoding problems aren't always right but it looks like 
it is here.


Works for me in 3.17.0, 4.5.0, 5.0.0-dev

JVM_ARGS="-DentityExpansionLimit=200" riot --validate --count foodon.owl


Could this be due to my Jena version or Eclipse or Windows or UTF-8?


Windows most likely.
It can happen if the data has been piped at the command line.

    Andy



On 10/12/2023 1:42 PM, Andy Seaborne wrote:

Thanks. It parses OK.

On Thu, 12 Oct 2023, 19:36 Jim Balhoff,  wrote:


On Oct 6, 2023, at 3:46 AM, Andy Seaborne  wrote:


On 28/06/2023 09:26, Damion Dooley wrote:

I’m using RIOT to parse a large food ontology in owl rdf/xml format.

Damion,

Is that data publicly available?

There's a new RDF/XML parser for Jena in the pipeline and I'd like to

try it out on real data.

Andy,

Damion is active in FOODON, so that may be the ontology to try:
http://obofoundry.org/ontology/foodon.html

The ontology is at https://purl.obolibrary.org/obo/foodon.owl

- Jim




