On 27/12/13 01:45, Rose Beck wrote:
I am afraid I had data in this format and, by mistake, I have now loaded it,
and it took TDB 10 days to load on a server with 64GB RAM. Is there some way
I can still query it using relative IRIs?
(I am asking because I have to report the results today and my boss will
get hyper angry with me.)
If there is some way out, please let me know.
I managed to force ARQ to parse without a base by bypassing the
QueryFactory and calling the parser directly --
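// Imports assume the Jena 2.x (com.hp.hpl.jena) package names; the class name is illustrative.
import com.hp.hpl.jena.query.Query ;
import com.hp.hpl.jena.query.Syntax ;
import com.hp.hpl.jena.sparql.algebra.Algebra ;
import com.hp.hpl.jena.sparql.algebra.Op ;
import com.hp.hpl.jena.sparql.lang.SPARQLParser ;
import com.hp.hpl.jena.util.FileUtils ;

public class ParseNoBase {
    public static void main(String... argv) throws Exception {
        String x = FileUtils.readWholeFileAsUTF8("/home/afs/tmp/Q.rq") ;
        // Create an empty query: no base IRI is set on it
        Query q = new Query() ;
        // Create a SPARQL 1.1 parser
        SPARQLParser parser = SPARQLParser.createParser(Syntax.syntaxSPARQL_11) ;
        // Call the parser directly, bypassing QueryFactory (which resolves IRIs against a base)
        parser.parse(q, x) ;
        // Compile to algebra and print it, to show the URIs are still relative
        Op op = Algebra.compile(q) ;
        System.out.println(op) ;
    }
}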
Andy
On Fri, Dec 27, 2013 at 6:35 AM, Damian Steer wrote:
On 26 Dec 2013, at 20:51, Rose Beck wrote:
I created my data file (try.nq) containing the following data:
<http://dbpedia.org/data/Plasmodium_hegneri.xml> <http://code.google.com/p/ldspider/ns#headerInfo> _:header16125770191335188966549 <a> .
Ah, here is the issue.
N-Quads _doesn't_ permit relative URIs / IRIs. [1] TDB is being kind /
unhelpful and loading the data as requested, full of relative IRIs.
This is, strictly, broken RDF. As a consequence, the behaviour when you
work with it is undefined.
(If you run the data through validation this issue is apparent:
$ riot --validate try.nq
ERROR [line: 1, col: 133] Relative IRI: a
...)
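For illustration only, here is a rough Java sketch of the kind of check `riot --validate` performs on that line, using nothing but `java.net.URI`. The class and method names are hypothetical, and `java.net.URI` follows RFC 2396 rather than the full IRI grammar RIOT uses, so treat it as an approximation. It flags every `<...>` term that is not an absolute IRI:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NQuadsIriCheck {
    // Matches <...> terms. Sketch only: it does not skip '<' inside literals,
    // and java.net.URI checks RFC 2396 syntax, not the full IRI grammar.
    private static final Pattern IRI_TERM = Pattern.compile("<([^<>\\s]*)>");

    // Hypothetical helper: returns "col N: iri" for every IRI term that is
    // not absolute (N is the 1-based column of the IRI's first character).
    static List<String> findRelativeIris(String line) {
        List<String> problems = new ArrayList<>();
        Matcher m = IRI_TERM.matcher(line);
        while (m.find()) {
            String iri = m.group(1);
            boolean absolute;
            try {
                absolute = new URI(iri).isAbsolute();
            } catch (URISyntaxException e) {
                absolute = false; // not even a valid URI reference: report it
            }
            if (!absolute) {
                problems.add("col " + (m.start(1) + 1) + ": " + iri);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        String quad = "<http://dbpedia.org/data/Plasmodium_hegneri.xml> "
                + "<http://code.google.com/p/ldspider/ns#headerInfo> "
                + "_:header16125770191335188966549 <a> .";
        for (String problem : findRelativeIris(quad)) {
            System.out.println("Relative IRI, " + problem);
        }
    }
}
```

On the quad above it reports only the graph name `a`, at column 133, the same column as in the RIOT error; the blank node is ignored because it is not an IRI term.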
Then I ran the following SPARQL command:
root@server:/home/apache-jena-2.10.0/bin# ./tdbquery --time
...
I got the following output:
-----------------
| a | b | c |
=================
| <a> | <b> |
The answer is a) correct but b) very unhelpful. These are relative IRIs
but no base is given.
In this case, that is because the relative URIs are in the data.
ARQ will also print relative URIs in text output when making URIs
relative to the base of the query.
If you want to check the details, it might be easier to look at the JSON output.
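To see why the text output can mislead: an IRI that sits under the query's base and an IRI that is genuinely relative can both print as <b>. A minimal sketch of that abbreviation step with `java.net.URI.relativize` (the names and the `file:///private/tmp/` base are assumptions for illustration; this is not ARQ's own formatting code):

```java
import java.net.URI;

public class RelativeIriDisplay {
    // Hypothetical helper: show an IRI relative to a base when it lies under
    // that base, otherwise unchanged (a sketch, not ARQ's output code).
    static String display(URI base, URI iri) {
        return "<" + base.relativize(iri) + ">";
    }

    public static void main(String[] args) {
        URI base = URI.create("file:///private/tmp/");
        // An absolute IRI under the base is shortened to <b> ...
        System.out.println(display(base, URI.create("file:///private/tmp/b")));
        // ... and a genuinely relative IRI also prints as <b>.
        System.out.println(display(base, URI.create("b")));
        // An IRI outside the base is printed unchanged.
        System.out.println(display(base, URI.create("http://dbpedia.org/data/Plasmodium_hegneri.xml")));
    }
}
```

Because the first two lines print identically, a format that always writes full IRIs (such as JSON results) is the safer way to tell them apart.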
Andy
After this I tried another SPARQL query (given below), for which I obtained
incorrect output:
SPARQL query:
root@server:/home/apache-jena-2.10.0/bin# ./tdbquery --time --loc=/home/Jena/try "select ?a?b?c where{ graph ?j1{?a <b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>} }"
Output:
-------------
| a | b | c |
=============
-------------
Time: 0.095 sec
SPARQL, unlike N-Quads, allows relative IRIs [2]. In the absence of a BASE
directive, they are resolved against the query's base IRI, which in this
case is the current directory, so <b> is understood as <CURRENT_DIR/b>.
You can see this if you add --explain:
$ tdbquery --explain --loc=try "select ?a?b?c where{ graph ?j1{?a <b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>} }"
00:58:24 INFO exec :: QUERY
  SELECT ?a ?b ?c
  WHERE
    { GRAPH ?j1
        { ?a <b> <http://dbpedia.org/data/Plasmodium_hegneri.xml> }
    }
00:58:24 INFO exec :: ALGEBRA
  (project (?a ?b ?c)
    (quadpattern (quad ?j1 ?a <file:///private/tmp/b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>)))
00:58:24 INFO exec :: Execute :: (?j1 ?a <file:///private/tmp/b> <http://dbpedia.org/data/Plasmodium_hegneri.xml>)
-------------
| a | b | c |
=============
-------------
<b> is resolved to <file:///private/tmp/b> (I ran this in the temp dir),
which is not the same term as the relative IRI stored in the data.
Thus there are no results.
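The resolution step itself can be reproduced with plain `java.net.URI` (a sketch; the class and method names are hypothetical, and `java.net.URI` is not ARQ's IRI resolver). It resolves `b` against the assumed base `file:///private/tmp/` and shows that the result differs from the bare relative `b` that TDB loaded:

```java
import java.net.URI;

public class RelativeIriResolution {
    // Resolve a relative reference against a base IRI using java.net.URI.
    static URI resolve(String base, String reference) {
        return URI.create(base).resolve(reference);
    }

    public static void main(String[] args) {
        // Assumed base: the directory tdbquery was run in, as in the example above.
        URI resolved = resolve("file:///private/tmp/", "b");
        // Print scheme and path: java.net.URI's string form for file: URIs
        // with an empty authority can differ from the input string.
        System.out.println(resolved.getScheme() + " " + resolved.getPath());

        // The term from the broken load is the bare relative reference "b",
        // which is a different term from the resolved IRI, so nothing matches.
        System.out.println(resolved.equals(URI.create("b")));
    }
}
```

Comparing scheme and path, rather than whole strings, keeps the check independent of how the empty authority in `file:///` is printed.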
So the short answer is that the input data is broken.
Damian
[1] <http://www.w3.org/TR/n-quads/#sec-iri>
[2] <http://www.w3.org/TR/2013/REC-sparql11-query-20130321/#relIRIs>