Re: Implementing RDF reader

2015-05-11 Thread Andy Seaborne

On 10/05/15 21:48, Martynas Jusevičius wrote:

Hey all,

I want to refactor my RDF/POST parser into a Jena-compatible reader.
An example of the format can be found here:
http://www.lsrn.org/semweb/rdfpost.html#sec-examples

The documentation suggests implementing ReaderRIOT interface:
https://github.com/apache/jena/blob/master/jena-arq/src-examples/arq/examples/riot/ExRIOT_5.java

However, if I look at (what I think is) existing readers such as
Turtle for example, they do not seem to implement ReaderRIOT:
https://github.com/apache/jena/blob/master/jena-arq/src/main/java/org/apache/jena/riot/lang/LangTurtleBase.java

What is the explanation for that?


Hi Martynas,

It is historical - the Turtle derived parsers emerged with the 
RiotReader interface and some code is/was around that used that interface.


ReaderRIOTLang is the cross-over code from the proper interface 
ReaderRIOT to RiotReader. RiotReader is a fixed set of parsers.


This can be sorted out in Jena3.



Do I need to tokenize the InputStream myself or is there some
machinery I can reuse?


The Turtle-world tokenizer is TokenizerText.  It is specific to Turtle terms.

Any tokenizing for a new language is often, in my experience, very 
sensitive to the language details.


If you are used to javacc, and performance isn't critical at scale, 
that's a good tool.


RIOT uses custom I/O for speed; Jena used to have a javacc parser for 
Turtle but Turtle is sufficiently simple that a hand-written parser is 
doable.  A hand-written tokenizer is for speed at scale (on a big file it is 
about 2x faster than basic javacc tokenizing) but you need large input to make 
it worthwhile.  N-Triples dumps of databases make it worthwhile.


If you convert rdfpost -> Turtle (string manipulation), then you can parse 
the Turtle as normal.  Downside: error messages may be confusing as they 
refer to the Turtle, not the input string.
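As a rough illustration of the string-manipulation route, the sketch below turns already-decoded form pairs into Turtle text. It assumes the RDF/POST field names su (subject URI), pu (predicate URI), ou (object URI) and ol (object literal); the class RdfPostToTurtle is hypothetical and not part of Jena, and prefixes, blank nodes, language tags and datatypes are ignored here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jena. Field names su/pu/ou/ol are assumed
// from the RDF/POST draft; everything else is skipped in this sketch.
final class RdfPostToTurtle {
    /** Converts decoded [key, value] pairs, in form order, into Turtle triples. */
    static String toTurtle(List<String[]> pairs) {
        StringBuilder out = new StringBuilder();
        String subject = null;
        String predicate = null;
        for (String[] kv : pairs) {
            switch (kv[0]) {
                case "su": subject = iri(kv[1]); break;
                case "pu": predicate = iri(kv[1]); break;
                case "ou": emit(out, subject, predicate, iri(kv[1])); break;
                case "ol": emit(out, subject, predicate, literal(kv[1])); break;
                default: break; // other fields are ignored in this sketch
            }
        }
        return out.toString();
    }

    private static void emit(StringBuilder out, String s, String p, String o) {
        if (s == null || p == null) {
            throw new IllegalArgumentException("object seen before subject/predicate");
        }
        out.append(s).append(' ').append(p).append(' ').append(o).append(" .\n");
    }

    private static String iri(String v) {
        // Reject characters that may not appear raw inside <...> in Turtle.
        for (int i = 0; i < v.length(); i++) {
            char c = v.charAt(i);
            if (c <= 0x20 || "<>\"{}|^`\\".indexOf(c) >= 0) {
                throw new IllegalArgumentException("bad IRI: " + v);
            }
        }
        return "<" + v + ">";
    }

    private static String literal(String v) {
        // Escape the characters that would break a "..." Turtle string.
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < v.length(); i++) {
            char c = v.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"su", "http://example.org/s"});
        pairs.add(new String[] {"pu", "http://example.org/p"});
        pairs.add(new String[] {"ol", "say \"hi\""});
        System.out.print(toTurtle(pairs));
    }
}
```

The resulting string can then be handed to the normal Turtle parser, with the caveat already mentioned: any error positions refer to this generated text, not to the original form body.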


Splitting up the query string, with all the HTTP escaping rules, can be 
done with library code (see FusekiLib.parseQueryString [no longer used, 
but it works without consuming the body, unlike the servlet operations 
which combine form and query string processing]) and there are probably 
lots of better code examples on the web.
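For reference, a minimal sketch of that splitting step using only the Java standard library. This is not FusekiLib.parseQueryString; the class name FormPairs is made up for illustration. It keeps the pairs in order and allows repeated keys, which an order-sensitive format such as RDF/POST presumably needs.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jena or Fuseki.
final class FormPairs {
    /** Splits "k1=v1&k2=v2" into decoded [key, value] pairs, preserving order. */
    static List<String[]> parse(String body) {
        List<String[]> pairs = new ArrayList<>();
        if (body == null || body.isEmpty()) {
            return pairs;
        }
        for (String part : body.split("&")) {
            if (part.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = part.indexOf('=');
            String rawKey = eq < 0 ? part : part.substring(0, eq);
            String rawVal = eq < 0 ? "" : part.substring(eq + 1);
            pairs.add(new String[] { decode(rawKey), decode(rawVal) });
        }
        return pairs;
    }

    private static String decode(String s) {
        // URLDecoder handles application/x-www-form-urlencoded: '+' becomes a
        // space and %XX sequences become bytes, interpreted here as UTF-8.
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        for (String[] kv : parse("su=http%3A%2F%2Fexample.org%2Fs&ol=hello+world")) {
            System.out.println(kv[0] + " = " + kv[1]);
        }
    }
}
```

Splitting on the raw '&' and '=' before decoding matters: an encoded %26 or %3D inside a value must not be treated as a separator.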


Andy


Martynas
graphityhq.com





Re: Fuseki as Web Application read configuration file

2015-05-11 Thread Andy Seaborne

Hi Christian,

I've tried to recreate your setup but, as far as I can tell, it's working 
for me.  The part I'm not completely sure about is how your TDB database 
is configured in Tomcat.  The full config file you showed (6/May) has the 
Fuseki server setup:


-
[] rdf:type fuseki:Server ;

  ja:context [ ja:cxtName arq:queryTimeout ;  ja:cxtValue 4000 ] ;

   .
-
so you are not putting any service definitions in that file.

There is the problem I mentioned about a preset timeout of 3000ms when a 
Fuseki service is created via the UI.  That can be fixed by editing the 
template/config-tdb file.  You are not seeing that timeout.


So for the Fuseki setup, how are you configuring your TDB database?

My experiment:
1/ Stop tomcat, delete /etc/fuseki/*, start tomcat, stop tomcat.

That gets a clean area setup.


2/ Fix template/config-tdb (remove the timeout line).
3/ Start tomcat, execute the query below via the UI.  No timeout.

4/ Stop tomcat
5/ Edit /etc/fuseki/config.ttl, add a timeout of 1000.

6/ Start tomcat.
7/ Execute the query below via the UI.

I get a timeout.

I also tested with a complete configuration in /etc/fuseki/config.ttl 
(server and data service).


I was deleting the files in /etc/fuseki/system/ each time to get a known 
clean server.


By default, the fuseki log ends up in catalina.out - what does it say 
when the tomcat server is starting?


How have you been putting the data service configuration into Fuseki?

Andy

PS A test query that does not require data: it waits 1.5s then 1s then 
1s.  Timeout tests happen after each BIND.


PREFIX apf: <http://jena.hpl.hp.com/ARQ/property#>
PREFIX afn: <http://jena.hpl.hp.com/ARQ/function#>

SELECT *
WHERE {
  BIND (afn:wait(1500) AS ?X1)
  BIND (afn:wait(1000) AS ?X2)
  BIND (afn:wait(1000) AS ?X3)
}
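To make the "timeout tests happen after each BIND" behaviour concrete, here is a toy model in Java. It is not ARQ's implementation: it only assumes that a step, once started, runs to completion and that the deadline is checked between steps. It uses a virtual clock rather than real sleeping, and the class name is made up.

```java
import java.util.List;

// Toy model only; a sketch of cooperative timeout checking, not ARQ code.
final class CooperativeTimeoutDemo {
    /**
     * Runs the steps in order and checks the deadline only between steps.
     * Returns the virtual time in milliseconds at which execution stopped,
     * either because the timeout was noticed or because all steps finished.
     */
    static long run(List<Long> stepMillis, long timeoutMillis) {
        long clock = 0; // virtual clock, no real sleeping
        for (long step : stepMillis) {
            clock += step; // the step itself cannot be interrupted
            if (clock >= timeoutMillis) {
                return clock; // timeout noticed only after the step
            }
        }
        return clock; // finished normally
    }

    public static void main(String[] args) {
        List<Long> binds = List.of(1500L, 1000L, 1000L);
        System.out.println(run(binds, 1000)); // 1500
        System.out.println(run(binds, 4000)); // 3500
    }
}
```

With a 1000 ms timeout the model stops at 1500 ms, because the first wait has to finish before the check runs; with a 4000 ms timeout all three steps complete at 3500 ms.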



On 09/05/15 12:04, Andy Seaborne wrote:

Hi Christian,

Thank you for the details.  It's not immediately obvious what is
happening.  I'll try to recreate your setup on my machine.

 Andy

On 07/05/15 17:56, Christian Schwaderer wrote:

Hi Andy,

thanks again!

I'll try to give some details about my setup:

I run a server with openSUSE 13.1/64 bit on which Apache Tomcat 7.0.42
runs. I have deployed Fuseki 2.0.0 simply by putting the file
fuseki.war into /srv/tomcat/webapps/ (and started it from within the
Tomcat Web Application Manager).

My application is written in PHP, for SPARQL queries I use the ARC2
library.

On my local test machine, I've successfully set a time limit in Fuseki.
The database there was created with TDB (which is far less convenient
than the Fuseki UI), and I started Fuseki with the parameter
--config=config.ttl. I then tested a very time-consuming SPARQL
query and, after 4 seconds, Fuseki returned the desired timeout
error.

Now, on the actual server, with Fuseki wrapped inside Tomcat, the
time limit is not working.

For instance, this sample query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prop-de: <http://de.dbpedia.org/property/>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?s

WHERE
{
  ?s prop-de:name ?all_names .
  ?s dcterms:subject ?category .
  FILTER (regex(?all_names, ,i))
}

takes about 50 seconds in PHP.

I hope those details help you understand the setup.

Thanks again and in advance!

Best regards,
Christian




Date: Thu, 7 May 2015 11:35:45 +0100
From: a...@apache.org
To: users@jena.apache.org
Subject: Re: Fuseki as Web Application read configuration file

Hi Christian,

Something is causing the 4s to be ignored - I don't know for sure but if
you are controlling the timeouts then it does need to be addressed
because even if it's not a factor at the moment, it may well become one.
I don't have a complete picture of your setup yet.

Which queries are taking a long time? (There are places where the
timeout can be delayed.)

Andy

On 07/05/15 08:08, Christian Schwaderer wrote:

Thanks!

But before I try it, let me ask (just to make sure): Are you really
sure that this is the problem here?
As I mentioned earlier, my queries take up to 20 seconds. So, if
there is a 3 seconds time limit set somewhere for my dataset - this
setting is definitely not working.

Best regards!



Date: Wed, 6 May 2015 22:25:42 +0100
From: a...@apache.org
To: users@jena.apache.org
Subject: Re: Fuseki as Web Application read configuration file

On 06/05/15 19:03, Christian Schwaderer wrote:

Hi,

thanks for the answer!

The full config file looks like so:

@prefix : <#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ja: <http://jena.hpl.hp.com/2005/11/Assembler#> .

[] rdf:type fuseki:Server ;

ja:context [ ja:cxtName arq:queryTimeout ; ja:cxtValue 4000 ] ;

.

(Comments removed)

Yes, I created the dataset with the UI. How can I 

Re: Trace back RDF containers in SPARQL

2015-05-11 Thread Laurent Rucquoy
Yes, the bad query is the good query with the last 3 triple patterns added.

When I run the good query (without the bad query's last 3 triple patterns), I
get about ten calculationResult nodes.
When I run the bad query to try to retrieve the containing
calculationResultCollection, the system freezes.

What I want to do is to find the CalculationResultCollection nodes
containing CalculationResult nodes referring to CalculationDataCollection
nodes containing in their turn CalculationData nodes having the value
"0"^^xsd:string.

Here is what an instance diagram could look like:

CalculationResultCollection ---listCalculationResult--- blank_node_CR
---rdf:_1--- CalculationResult_1 ---calculationDataCollection---
CalculationDataCollection ---listCalculationData--- blank_node_CD
---rdf:_1--- CalculationData_1_1
CalculationResultCollection ---listCalculationResult--- blank_node_CR
---rdf:_1--- CalculationResult_1 ---calculationDataCollection---
CalculationDataCollection ---listCalculationData--- blank_node_CD
---rdf:_2--- CalculationData_1_2
CalculationResultCollection ---listCalculationResult--- blank_node_CR
---rdf:_1--- CalculationResult_1 ---calculationDataCollection---
CalculationDataCollection ---listCalculationData--- blank_node_CD
---rdf:_3--- CalculationData_1_3
CalculationResultCollection ---listCalculationResult--- blank_node_CR
---rdf:_2--- CalculationResult_2 ---calculationDataCollection---
CalculationDataCollection ---listCalculationData--- blank_node_CD
---rdf:_1--- CalculationData_2_1
CalculationResultCollection ---listCalculationResult--- blank_node_CR
---rdf:_2--- CalculationResult_2 ---calculationDataCollection---
CalculationDataCollection ---listCalculationData--- blank_node_CD
---rdf:_2--- CalculationData_2_2
CalculationResultCollection ---listCalculationResult--- blank_node_CR
---rdf:_2--- CalculationResult_2 ---calculationDataCollection---
CalculationDataCollection ---listCalculationData--- blank_node_CD
---rdf:_3--- CalculationData_2_3
...


Thank you for your help.

Laurent.


On 8 May 2015 at 12:42, Andy Seaborne a...@apache.org wrote:

 On 08/05/15 09:43, Laurent Rucquoy wrote:

 Hi Andy,

 Thank you for your response.

 1) Which version of Jena are you running?
 The used version of Jena is 2.10.1 (I will upgrade soon...)


 Try with 2.13.0 because the area of BGP optimizations has been improved.



 2) How are you storing the data and how big is it?
 TDBFactory.createDataset(directory)
 COUNT(*) -> 1 224 103
 350MB on disk
 Do you need other details?


 3) You say the query returns good results - what sort of query causes the
 system to freeze?
 This is the query returning good results, with 3 more statements appended
 in the WHERE clause:


 So the bad query is the good query with the last 3 triple patterns added?

 It's hard to read but

 ?seqCalculationResultCollection
   ?seqCalculationResultCollectionIndex ?calculationResult .
 ?calculationResultCollection
:listCalculationResult  ?seqCalculationResultCollection .
 ?calculationResultCollection rdf:type :CalculationResultCollection

 is connected to the good part by ?calculationResult; all the other
 variables are just fanning out from that point without anything like the
 :value "0"^^xsd:string in the good part.  From what I understand of your
 data, that can be a huge number of results.

 Do you get no results, or do some results appear but the query then does
 not finish?

 Andy




 PREFIX : <http://www.telemis.com/>
 PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

 SELECT ?calculationResultCollection
 WHERE {
   ?calculationData a :CalculationData ;
     :value "0"^^xsd:string .
   ?seqCalculationDataCollection ?seqCalculationDataCollectionIndex ?calculationData .
   ?calculationDataCollection a :CalculationDataCollection ;
     :listCalculationData ?seqCalculationDataCollection .
   ?calculationResult a :CalculationResult ;
     :calculationDataCollection ?calculationDataCollection .
   ?seqCalculationResultCollection ?seqCalculationResultCollectionIndex ?calculationResult .
   ?calculationResultCollection :listCalculationResult ?seqCalculationResultCollection ;
     a :CalculationResultCollection .
 }





 On 7 May 2015 at 15:59, Andy Seaborne a...@apache.org wrote:

  Hi Laurent,

 Which version of Jena are you running?  How are you storing the data and
 how big is it?

 You say the query returns good results - what sort of query causes the
 system to freeze?

  Andy


 On 06/05/15 15:17, Laurent Rucquoy wrote:

  Hello,

 I have container resources which I want to retrieve from one of their
 elements.
 Here is an RDF/XML part of my data:


 <rdf:Description rdf:about="http://www.telemis.com/CalculationResult/9e892a88-7257-4af3-881d-2eb304437546">
   <rdf:type rdf:resource="http://www.telemis.com/CalculationResult"/>
   <TM:uid rdf:datatype="http://www.w3.org/2001/XMLSchema#string">9e892a88-7257-4af3-881d-2eb304437546</TM:uid>
 ...
 

RE: Fuseki as Web Application read configuration file

2015-05-11 Thread Christian Schwaderer
Hi Andy,

thank you so much for your detailed answer!

In the Fuseki UI with your test query, I now get the desired timeout error. 
(Though I don't understand why Fuseki waits for the whole period of time 
stated within wait() instead of aborting the execution once its timeout expires.)

Nevertheless, my own sample query executed via PHP still takes 50 seconds and 
doesn't end in a timeout error. From the Fuseki UI, it is much faster. So, my 
real problem might be ARC2.

Thanks again for your help!

Christian



Re: Fuseki as Web Application read configuration file

2015-05-11 Thread Andy Seaborne

On 11/05/15 14:41, Christian Schwaderer wrote:

Hi Andy,

thank you so much for your detailed answer!

In the Fuseki UI with your test query, I now get the desired timeout error. 
(Though I don't understand why Fuseki waits for the whole period of time 
stated within wait() instead of aborting the execution once its timeout expires.)

Nevertheless, my own sample query executed via PHP still takes 50 seconds and 
doesn't end in a timeout error. From the Fuseki UI, it is much faster. So, my 
real problem might be ARC2.

Thanks again for your help!

Christian


Hi Christian,

If a timeout goes off while results are being sent, the output is 
intentionally made invalid, e.g.

##  Query cancelled due to timeout during execution   ##
##    Incomplete results      ##

With HTTP, the status code has to go first, before the server knows 
whether it can deliver all the results, so a timeout that fires later 
cannot be reported as an error status.  Buffering the whole response to 
avoid this breaks scale and leads to poor latency.


Maybe this is what is confusing ARC2, which then holds the connection 
open, waiting.


You might try asking for a different format (sorry - don't know ARC2 
well enough).
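I can't show this for ARC2, but as an illustration of "asking for a different format" in general, here is a Java sketch that builds a SPARQL protocol request with an explicit Accept header and a client-side timeout. The endpoint URL and the class name SparqlRequest are placeholders; only the request is built, nothing is sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper; the endpoint used in main() is a placeholder.
final class SparqlRequest {
    /** Builds a form-encoded SPARQL query POST asking for the given result format. */
    static HttpRequest build(String endpoint, String query, String accept, Duration clientTimeout) {
        String form = "query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(clientTimeout) // client gives up if no response arrives in time
                .header("Content-Type", "application/x-www-form-urlencoded")
                .header("Accept", accept) // e.g. JSON instead of XML results
                .POST(HttpRequest.BodyPublishers.ofString(form))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = build(
                "http://localhost:8080/fuseki/ds/query",
                "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
                "application/sparql-results+json",
                Duration.ofSeconds(10));
        System.out.println(req.method() + " " + req.uri());
        System.out.println(req.headers().firstValue("Accept").orElse(""));
    }
}
```

A client-side timeout like this is independent of the server's arq:queryTimeout; it only bounds how long the client itself is prepared to wait.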




Andy







Re: Implementing RDF reader

2015-05-11 Thread Martynas Jusevičius
Thanks Andy.

I have a parser that works on String, but this time I want to do it
right and make it streaming and plug it into Jena at the low level.
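To make the goal concrete, this is roughly the shape of streaming tokenizer I have in mind, sketched with the Java standard library only. It is not Jena's TokenizerText, and the class name PairTokenizer is made up. It reads one key/value pair at a time from a Reader, so only the current token is held in memory.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; wrap the Reader in a BufferedReader for real input.
final class PairTokenizer {
    private final Reader in;
    private boolean eof = false;

    PairTokenizer(Reader in) {
        this.in = in;
    }

    /** Returns the next decoded [key, value] pair, or null at end of input. */
    String[] next() {
        try {
            while (!eof) {
                StringBuilder key = new StringBuilder();
                StringBuilder val = new StringBuilder();
                StringBuilder cur = key;
                boolean sawAny = false;
                int ch;
                while ((ch = in.read()) != -1 && ch != '&') {
                    sawAny = true;
                    if (ch == '=' && cur == key) {
                        cur = val; // the first '=' separates key from value
                    } else {
                        cur.append((char) ch);
                    }
                }
                if (ch == -1) {
                    eof = true;
                }
                if (sawAny) {
                    return new String[] { decode(key), decode(val) };
                }
                // empty segment ("&&" or a trailing '&'): keep reading
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String decode(StringBuilder raw) {
        // Decode only after splitting, so %26 and %3D stay inside the token.
        return URLDecoder.decode(raw.toString(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        PairTokenizer t = new PairTokenizer(
                new StringReader("su=http%3A%2F%2Fexample.org%2Fs&ol=hello+world"));
        for (String[] kv = t.next(); kv != null; kv = t.next()) {
            System.out.println(kv[0] + " = " + kv[1]);
        }
    }
}
```

Each pair could then be turned into a triple and pushed to the sink as soon as it is complete, instead of building the whole String first.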

It seems that I should be able to reuse some code from TokenizerText.

I understand StreamRDF is used to sink the triples, but what about
ParserProfile? I see LangTurtleBase uses it:

    org.apache.jena.iri.IRI iri = profile.makeIRI(iriStr, currLine, currCol) ;

How do I construct an instance of ParserProfile? Or is there an
alternative way to construct IRIs etc.?

Martynas





Re: Trace back RDF containers in SPARQL

2015-05-11 Thread Andy Seaborne

On 11/05/15 14:55, Laurent Rucquoy wrote:

Yes, the bad query is the good query with the last 3 triple patterns added.


The optimizer in 2.10 would probably do a bad job on your query.  Adding 
the patterns makes it worse as it introduces an unconstrained cross product 
(due to the ??? a :SomeClass2 parts).
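A toy illustration of why pattern order matters, in plain Java rather than anything from TDB: it just counts how many triples each pattern matches on its own. The data and the class name are invented; the point is only that a pattern made entirely of variables, or a bare class test, matches far more than the :value "0" pattern, so starting from it produces a large intermediate result.

```java
import java.util.List;

// Toy model with invented data; not how TDB evaluates or reorders patterns.
final class PatternSelectivity {
    /** Counts triples matching a pattern; null in a position acts as a variable. */
    static long count(List<String[]> triples, String s, String p, String o) {
        return triples.stream()
                .filter(t -> (s == null || s.equals(t[0]))
                        && (p == null || p.equals(t[1]))
                        && (o == null || o.equals(t[2])))
                .count();
    }

    public static void main(String[] args) {
        List<String[]> data = List.of(
                new String[] {"_:cd", "rdf:_1", ":data1"},
                new String[] {"_:cd", "rdf:_2", ":data2"},
                new String[] {":data1", ":value", "\"0\""},
                new String[] {":data2", ":value", "\"7\""},
                new String[] {":data1", "rdf:type", ":CalculationData"},
                new String[] {":data2", "rdf:type", ":CalculationData"});
        System.out.println(count(data, null, ":value", "\"0\""));              // 1
        System.out.println(count(data, null, "rdf:type", ":CalculationData")); // 2
        System.out.println(count(data, null, null, null));                     // 6
    }
}
```

On a store with over a million triples the same gap is what separates a query that answers quickly from one that appears to freeze.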


2.13 is better, using fixed.opt.

It probably makes no difference whether you have a stats.opt file; 
if you have one, then with 2.13 it's worth trying both with and without it.


It could well explain what you are seeing and until that possibility is 
removed, it's hard to see any further.


Andy

