Re: Speed issue while processing query resultSets on various ontology models

2018-03-14 Thread Marc Agate
Hi,

owlURL is the url of the bdrc.owl file that is used to create the
Ontology model. 

public static final String owlURL =
    "https://raw.githubusercontent.com/BuddhistDigitalResourceCenter/owl-schema/master/bdrc.owl";
 
Marc
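
For anyone wanting to isolate the inference cost on that ontology, a minimal
sketch would be something along these lines (the choice of
ReasonerRegistry.getOWLReasoner() is illustrative and may not match what the
gist does exactly):

import org.apache.jena.rdf.model.InfModel;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.reasoner.ReasonerRegistry;
import org.apache.jena.riot.RDFDataMgr;

public class PrepareTiming {
    public static void main(String[] args) {
        String owlURL = "https://raw.githubusercontent.com/BuddhistDigitalResourceCenter/owl-schema/master/bdrc.owl";
        Model base = RDFDataMgr.loadModel(owlURL);   // the ~4600-statement ontology
        InfModel inf = ModelFactory.createInfModel(ReasonerRegistry.getOWLReasoner(), base);

        long t0 = System.currentTimeMillis();
        inf.prepare();   // pay the forward-inference cost up front
        System.out.println("prepare(): " + (System.currentTimeMillis() - t0) + " ms");
    }
}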

On Wednesday 14 March 2018 at 20:25 +, Andy Seaborne wrote:
> 
> On 14/03/18 19:12, Élie Roux wrote:
> > > In the case of inference then yes there is also an upfront cost
> > > of 
> > > computing the inferences.  Once computed these are typically
> > > cached 
> > > (though this depends on the rule set) and any changes to the
> > > data 
> > > might invalidate that cache.  You can call prepare() on the
> > > InfModel 
> > > to incur the initial computation cost separately, otherwise the 
> > > initial computation cost is incurred by whatever operation first 
> > > accesses the InfModel.  And as your email shows subsequent calls
> > > don't 
> > > incur that cost and are much faster.
> > 
> > I don't disagree, but I think there's a problem of scale here: even
> > with
> > a cold JVM and a not-too-efficient reasoner, it seems totally
> > unreasonable that a reasoner would take 60 full seconds (that's
> > what
> > Marc's test is taking on my machine) to run inference on a very
> > small
> > dataset already loaded in memory... 60s for such a small operation
> > really seems to indicate a bug to me. But maybe it doesn't...
> 
> 
> What's owlURL?
> 
> The second time, the cost does not include running the forward inference
> rules: 9ms.
> 
> (Actually, if you see 60s and Marc sees 18s, something else is going
> on 
> as well)
> 
> > 
> > Thank you,

Re: Speed issue while processing query resultSets on various ontology models

2018-03-14 Thread Andy Seaborne



On 14/03/18 19:12, Élie Roux wrote:
In the case of inference then yes there is also an upfront cost of 
computing the inferences.  Once computed these are typically cached 
(though this depends on the rule set) and any changes to the data 
might invalidate that cache.  You can call prepare() on the InfModel 
to incur the initial computation cost separately, otherwise the 
initial computation cost is incurred by whatever operation first 
accesses the InfModel.  And as your email shows subsequent calls don't 
incur that cost and are much faster.


I don't disagree, but I think there's a problem of scale here: even with
a cold JVM and a not-too-efficient reasoner, it seems totally
unreasonable that a reasoner would take 60 full seconds (that's what
Marc's test is taking on my machine) to run inference on a very small
dataset already loaded in memory... 60s for such a small operation
really seems to indicate a bug to me. But maybe it doesn't...



What's owlURL?

The second time, the cost does not include running the forward inference rules: 9ms.

(Actually, if you see 60s and Marc sees 18s, something else is going on 
as well)




Thank you,


Re: Speed issue while processing query resultSets on various ontology models

2018-03-14 Thread Élie Roux
In the case of inference then yes there is also an upfront cost of 
computing the inferences.  Once computed these are typically cached 
(though this depends on the rule set) and any changes to the data 
might invalidate that cache.  You can call prepare() on the InfModel 
to incur the initial computation cost separately, otherwise the 
initial computation cost is incurred by whatever operation first 
accesses the InfModel.  And as your email shows subsequent calls 
don't incur that cost and are much faster.


I don't disagree, but I think there's a problem of scale here: even with
a cold JVM and a not-too-efficient reasoner, it seems totally
unreasonable that a reasoner would take 60 full seconds (that's what
Marc's test is taking on my machine) to run inference on a very small
dataset already loaded in memory... 60s for such a small operation
really seems to indicate a bug to me. But maybe it doesn't...

Thank you,
--
Elie


Re: Speed issue while processing query resultSets on various ontology models

2018-03-14 Thread Rob Vesse
Marc

Yes, displaying the result set is the first operation that actually consumes 
the results and thus actually executes the query, hence my point.  When you call 
execSelect() you are just getting a placeholder object that knows how to 
execute the query at some future point in time.  Only when you actually start 
consuming the object does the query execution happen.

The difference in numbers you see on the first call is simply a JVM 
ClassLoader effect on a cold VM: the first time you call execSelect() a whole 
bunch of classes have to be loaded in order to construct the underlying query 
iterator; subsequent calls will already have all the relevant classes loaded 
and thus be much faster.

The timing that matters is the consumption of the result set, however you might 
choose to consume it, whether by printing it out or otherwise.

In the case of inference then yes there is also an upfront cost of computing 
the inferences.  Once computed these are typically cached (though this depends 
on the rule set) and any changes to the data might invalidate that cache.  You 
can call prepare() on the InfModel to incur the initial computation cost 
separately, otherwise the initial computation cost is incurred by whatever 
operation first accesses the InfModel.  And as your email shows subsequent 
calls don't incur that cost and are much faster.

You are currently using OntModelSpec.OWL_MEM, which is likely not the most 
performant rule set; there are variants that trade off OWL features/coverage 
for improved performance.  That isn't my area of expertise, but other people on 
the list can probably suggest a relevant rule set if you can give some details 
of what kinds of inferences you require.
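
To make that concrete, here is a rough sketch combining the two suggestions,
a lighter rule set plus paying the inference cost up front; OWL_MEM_MICRO_RULE_INF
is only one example of a cheaper variant, and whether it covers the inferences
you need depends on your ontology:

import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.RDFDataMgr;

static OntModel loadWithLightInference(String owlURL) {
    Model base = RDFDataMgr.loadModel(owlURL);
    // Cheaper alternatives to the full OWL rule reasoner include
    // OWL_MEM_MICRO_RULE_INF and OWL_MEM_MINI_RULE_INF.
    OntModel ont = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM_MICRO_RULE_INF, base);
    ont.prepare();   // OntModel is an InfModel, so the inference cost can be paid here
    return ont;
}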

Rob

On 14/03/2018, 18:41, "Marc Agate"  wrote:

Hi Rob,
Did you notice that in each case I display the full resultSet (and
therefore consume it) in the console? In each case, I gave two numbers:
one that measures the time taken by execSelect (this number is relevant
since it can change a lot depending upon the query) and the second
number that measures (precisely) the time to consume the ResultSet
using ResultSetFormatter.asText(ResultSet res).
I am not benchmarking Jena and I have therefore no interest in the
timing values per se. My issue is just that I cannot use the API
because it's taking too long to return 86 results out of a model
comprising approximately 4,600 statements.
You are also telling me that "InfModel infMod =
ModelFactory.createInfModel(reasoner, m);" doesn't actually provide a
usable Model and that inference rules are applied when the ResultSet is
consumed. Well, it looks like that's really the case: execSelect()
takes 1ms in the case of an infModel, and
"System.out.println(ResultSetFormatter.asText(rs));" takes almost 19
seconds to complete.
Moreover, I actually ran the same test twice on the same infModel
object and yes, the second time it took 2ms for execSelect and 7ms to
consume the resultSet.
My conclusion is that one cannot usefully query an InfModel created at
runtime (or rather: an InfModel must be used once before it becomes
really usable).

Marc

On Wednesday 14 March 2018 at 16:41 +, Rob Vesse wrote:
> You've made a common error that people trying to benchmark Jena
> make.  execSelect() simply prepares a result set backed by an
> iterator that is capable of answering the query; until you
> consume that result set no execution actually takes place.  All query
> execution in Jena is lazy, so if you want to time the full execution
> use a method that consumes/copies the returned iterator, such as
> ResultSetFactory.copyResults(), thus forcing full execution to happen.
> 
> So what you are timing as the results processing is actually results
> processing + query execution.  Over an inference model the act of
> executing a query will cause inference rules to be applied which
> depending on the ontology and rules may take a long time.
> 
> Rob
> 
> On 14/03/2018, 16:26, "agate.m...@gmail.com" 
> wrote:
> 
> Hi,
> 
> I have included here (https://gist.github.com/MarcAgate/8bbe334fd
> 852817977c909af107a9c6b) some code that illustrates the issue.
> It runs the same query against three different models (Model,
> InfModel and OntModel) of the same ontology.
> There's obviously a problem with InfModel.
> 
> Any idea ?
> 
> Thanks
> 
> Marc
> 
> 
> 
> 
> 
> 






Re: Speed issue while processing query resultSets on various ontology models

2018-03-14 Thread Marc Agate
Hi Rob,
Did you notice that in each case I display the full resultSet (and
therefore consume it) in the console? In each case, I gave two numbers:
one that measures the time taken by execSelect (this number is relevant
since it can change a lot depending upon the query) and the second
number that measures (precisely) the time to consume the ResultSet
using ResultSetFormatter.asText(ResultSet res).
I am not benchmarking Jena and I have therefore no interest in the
timing values per se. My issue is just that I cannot use the API
because it's taking too long to return 86 results out of a model
comprising approximately 4,600 statements.
You are also telling me that "InfModel infMod =
ModelFactory.createInfModel(reasoner, m);" doesn't actually provide a
usable Model and that inference rules are applied when the ResultSet is
consumed. Well, it looks like that's really the case: execSelect()
takes 1ms in the case of an infModel, and
"System.out.println(ResultSetFormatter.asText(rs));" takes almost 19
seconds to complete.
Moreover, I actually ran the same test twice on the same infModel
object and yes, the second time it took 2ms for execSelect and 7ms to
consume the resultSet.
My conclusion is that one cannot usefully query an InfModel created at
runtime (or rather: an InfModel must be used once before it becomes
really usable).

Marc

On Wednesday 14 March 2018 at 16:41 +, Rob Vesse wrote:
> You've made a common error that people trying to benchmark Jena
> make.  execSelect() simply prepares a result set backed by an
> iterator that is capable of answering the query; until you
> consume that result set no execution actually takes place.  All query
> execution in Jena is lazy, so if you want to time the full execution
> use a method that consumes/copies the returned iterator, such as
> ResultSetFactory.copyResults(), thus forcing full execution to happen.
> 
> So what you are timing as the results processing is actually results
> processing + query execution.  Over an inference model the act of
> executing a query will cause inference rules to be applied which
> depending on the ontology and rules may take a long time.
> 
> Rob
> 
> On 14/03/2018, 16:26, "agate.m...@gmail.com" 
> wrote:
> 
> Hi,
> 
> I have included here (https://gist.github.com/MarcAgate/8bbe334fd
> 852817977c909af107a9c6b) some code that illustrates the issue.
> It runs the same query against three different models (Model,
> InfModel and OntModel) of the same ontology.
> There's obviously a problem with InfModel.
> 
> Any idea ?
> 
> Thanks
> 
> Marc
> 
> 
> 
> 
> 
> 

Re: Speed issue while processing query resultSets on various ontology models

2018-03-14 Thread Rob Vesse
You've made a common error that people trying to benchmark Jena make.  
execSelect() simply prepares a result set backed by an iterator that is capable 
of answering the query; until you consume that result set no execution 
actually takes place.  All query execution in Jena is lazy, so if you want to time 
the full execution use a method that consumes/copies the 
returned iterator, such as ResultSetFactory.copyResults(), thus forcing full 
execution to happen.
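
A minimal sketch of that timing pattern (the model and query string are whatever
your test builds):

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSetFactory;
import org.apache.jena.query.ResultSetRewindable;
import org.apache.jena.rdf.model.Model;

static void timeQuery(Model model, String queryString) {
    long t0 = System.currentTimeMillis();
    try (QueryExecution qexec = QueryExecutionFactory.create(queryString, model)) {
        // execSelect() alone is cheap; copying the results forces full execution.
        ResultSetRewindable all = ResultSetFactory.copyResults(qexec.execSelect());
        System.out.println("execute + consume: " + (System.currentTimeMillis() - t0)
                + " ms, " + all.size() + " rows");
    }
}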

So what you are timing as the results processing is actually results processing 
+ query execution.  Over an inference model the act of executing a query will 
cause inference rules to be applied which depending on the ontology and rules 
may take a long time.

Rob

On 14/03/2018, 16:26, "agate.m...@gmail.com"  wrote:

Hi,

I have included here 
(https://gist.github.com/MarcAgate/8bbe334fd852817977c909af107a9c6b) some code 
that illustrates the issue.
It runs the same query against three different models (Model, InfModel and 
OntModel) of the same ontology.
There's obviously a problem with InfModel.

Any idea ?

Thanks

Marc








Speed issue while processing query resultSets on various ontology models

2018-03-14 Thread agate . marc
Hi,

I have included here 
(https://gist.github.com/MarcAgate/8bbe334fd852817977c909af107a9c6b) some code 
that illustrates the issue.
It runs the same query against three different models (Model, InfModel and 
OntModel) of the same ontology.
There's obviously a problem with InfModel.

Any idea ?

Thanks

Marc



Re: [3.0.1] ResultSetFactory.fromJSON() won't parse ASK JSON result

2018-03-14 Thread Andy Seaborne



On 14/03/18 12:48, Martynas Jusevičius wrote:

Andy,

I don't think that helps much. In fact, I think treating ASK result
differently from SELECT breaks some abstractions.

What I mean is that the result data structure normally maps to a media type
and not its query form. That way we can have generic parsers/serializers
that are orthogonal to application logic, for example:

MessageBodyReader: application/rdf+xml, text/turtle,
application/n-triples...
MessageBodyReader: application/n-quads...
MessageBodyReader: application/sparql-results+xml,
application/sparql-results+json...

Jena's treatment of ASK result breaks this pattern, because it maps to the
same media types as ResultSet does, but there is no way to parse it as
such. Do you see what I mean?

SPARQLResult does not help, because MessageBodyReader makes
little sense.


Why not use it for the SELECT/ASK case?
(Why not introduce your own results container type?)

Take a look at ResultsReader (not in 3.0.1).

Andy
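
For the SELECT/ASK case that could look roughly like the sketch below, using the
JSONInput.make call mentioned above (exact signatures may differ between
versions, so treat this as an outline rather than tested code):

import java.io.InputStream;
import org.apache.jena.query.ResultSet;
import org.apache.jena.sparql.resultset.JSONInput;
import org.apache.jena.sparql.resultset.SPARQLResult;

static void readJsonResults(InputStream in) {
    SPARQLResult result = JSONInput.make(in);    // holder for either result form
    if (result.isBoolean()) {
        System.out.println("ASK => " + result.getBooleanResult());
    } else if (result.isResultSet()) {
        ResultSet rs = result.getResultSet();
        // ... process the bindings as usual ...
    }
}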



Why not have ResultSet.getBoolean() or something?

On Mon, Mar 12, 2018 at 12:17 PM, Andy Seaborne  wrote:


JSONInput.make(InputStream) -> SPARQLResult

 Andy


On 12/03/18 10:13, Martynas Jusevičius wrote:


Hi Andy,

I'm not using QueryExecution here, I'm trying to parse JSON read from HTTP
InputStream using ResultSetFactory.fromJSON().

Then I want to carry the result set, maybe do some logic based on it, and
possibly serialize it back using ResultSetFormatter.

Is that not possible with ASK result?

On Mon, Mar 12, 2018 at 9:46 AM, Andy Seaborne  wrote:




On 11/03/18 23:03, Martynas Jusevičius wrote:

Hi,


I'm getting the following JSON result from an ASK query:

 { "head": {}, "boolean": true }

However, the method that usually works fine will not parse it from
InputStream (Jena 3.0.1):

   org.apache.jena.sparql.resultset.ResultSetException: Not a
ResultSet
result
org.apache.jena.sparql.resultset.SPARQLResult.getResultSet(
SPARQLResult.java:94)
org.apache.jena.sparql.resultset.JSONInput.fromJSON(JSONInput.java:64)
org.apache.jena.query.ResultSetFactory.fromJSON(ResultSetFac
tory.java:331)

I stepped inside the code and I see that JSONObject is parsed fine, but
afterwards SPARQLResult.resultSet field is not being set for some
reason.

Any ideas?



The outcome of an ASK query is a boolean, not a ResultSet.

See execAsk.

SPARQLResult is the class for a holder of any SPARQL result type.

  Andy




Martynas









RE: Getting Symmetric Concise Bounded Description with Fuseki

2018-03-14 Thread Reto Gmür


> -Original Message-
> From: Martynas Jusevičius 
> Sent: Monday, March 12, 2018 5:11 PM
> To: jena-users-ml 
> Subject: Re: Getting Symmetric Concise Bounded Description with Fuseki
> 
> I disagree about SCBD as the default. In a Linked Data context, DESCRIBE is
> usually used to return description of a resource, meaning the resource is in
> the subject position. And then bnode closure is added, because otherwise there
> would be no way to reach those bnodes. It's not about exploring the graph in
> all directions.

That a property describes the subject more than the object is not something 
intrinsic to RDF semantics but a design choice in ontologies. 

In https://www.w3.org/DesignIssues/LinkedData.html Tim Berners-Lee emphasizes 
that the description returned when dereferencing a resource must be a full 
Minimum Self Contained Graph (aka SCBD) for a graph to be browsable. In my 
opinion it should be possible to build linked data applications for browsable 
graphs that access the backend only via SPARQL, and ideally with only a single 
query per returned representation. With Fuseki this is possible with a custom 
handler, GraphDB returns SCBD by default, with Virtuoso and other stores it 
is a matter of configuration, and with Stardog it is not possible. None of these 
implementations violates the SPARQL specification; in my opinion it's a shame 
that SPARQL doesn't provide a standard method for returning SCBD, but that's 
another story.
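
For what it's worth, an SCBD-like description can be approximated today with a
single CONSTRUCT; a sketch (the endpoint URL and resource URI are placeholders,
and the query ignores the blank-node closure part a full SCBD would include):

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.rdf.model.Model;

static Model describeSymmetric(String endpoint, String resourceUri) {
    // Outgoing triples plus incoming triples of the resource.
    String query =
        "CONSTRUCT { <" + resourceUri + "> ?p ?o . ?s ?q <" + resourceUri + "> } " +
        "WHERE { { <" + resourceUri + "> ?p ?o } UNION { ?s ?q <" + resourceUri + "> } }";
    try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query)) {
        return qe.execConstruct();
    }
}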

Cheers,
Reto

> 
> If you want more specific description, then you can always use CONSTRUCT.
> 
> Some triplestores, for example Dydra, allow specification of the description
> algorithm using a special PREFIX scheme, such as
> 
> PREFIX describeForm: 
> 
> On Mon, Mar 12, 2018 at 4:40 PM, Reto Gmür 
> wrote:
> 
> > Hi Andy
> >
> > > -Original Message-
> > > From: Andy Seaborne 
> > > Sent: Saturday, March 10, 2018 3:47 PM
> > > To: users@jena.apache.org
> > > Subject: Re: Getting Symmetric Concise Bounded Description with
> > > Fuseki
> > >
> > > Hi Reto,
> > >
> > > The whole DescribeHandler system is very(, very) old and hasn't
> > > changed in ages, other than maintenance.
> > >
> > > On 10/03/18 11:44, Reto Gmür wrote:
> > > > Hi Andy,
> > > >
> > > > It first didn't quite work as I wanted it to: the model of the
> > > > resource passed to the describe model is the default graph so I got
> > > > only the triples in that graph.  Setting "tdb:unionDefaultGraph true"
> > > > didn't change the graph the DescribeHandler gets.
> > >
> > > tdb:unionDefaultGraph only affects SPARQL execution.
> > >
> > > > Looking at the default implementation I saw that the Dataset can be
> > > > accessed from the context passed to the start method with
> > > > cxt.get(ARQConstants.sysCurrentDataset). I am now using the Model
> > > > returned by dataset.getUnionModel.
> > >
> > > That should work.  Generally available getUnionModel post-dates the
> > > describe handler code.
> > >
> > > > I'm wondering why the DescribeBNodeClosure doesn't do the same but
> > > > instead queries for all graphs that contain the resource and then
> > > > works on each of the NamedModel individually. Is the UnionModel
> > > > returned by the dataset inefficient that you've chosen this approach?
> > >
> > > I don't think so - much the same work is done, just in different places.
> > >
> > > getUnionModel will work with blank node named graphs.
> > >
> > > getUnionModel will do describes spanning graphs, iterating over
> > > named graphs will not.
> > >
> > > > Also the code seems to assume that the name of the graph is a URI,
> > > > does Jena not support Blank Nodes as names for graphs (having an
> > > > "anonymous node" as name might be surprising but foreseen in RDF
> > > > datasets)?
> > >
> > > Again, old code (pre RDF 1.1, which is where bNode graph names came in).
> > >
> > > Properly, nowadays, it should all work on DatasetGraph whose API
> > > does work with bNode graphs.  Again, history.
> > >
> > > If you want to clean up, please do so.
> > >
> > > > It seems that even when a DescribeHandler is provided, the default
> > > > handler is executed as well. Is there a way to disable this?
> > >
> > > IIRC all the handlers are executed - the idea being to apply all
> > > policies, and handlers may only be able to describe certain classes.
> > > Remove any not required, or set your own registry in the query (a bit
> > > tricky in Fuseki).
> > >
> > > > Another question is about the concept of "BNode closure", what's
> > > > the rationale for expanding only forward properties? Shouldn't a
> > > > closure be everything that defines the node?
> > >
> > > It is a simple, basic policy - the idea being that more appropriate
> > > ones which are data-sensitive would be used. This basic one can go
> > > wrong (FOAF graphs when people 

Re: [3.0.1] ResultSetFactory.fromJSON() won't parse ASK JSON result

2018-03-14 Thread Martynas Jusevičius
Andy,

I don't think that helps much. In fact, I think treating ASK result
differently from SELECT breaks some abstractions.

What I mean is that the result data structure normally maps to a media type
and not its query form. That way we can have generic parsers/serializers
that are orthogonal to application logic, for example:

MessageBodyReader: application/rdf+xml, text/turtle,
application/n-triples...
MessageBodyReader: application/n-quads...
MessageBodyReader: application/sparql-results+xml,
application/sparql-results+json...

Jena's treatment of ASK result breaks this pattern, because it maps to the
same media types as ResultSet does, but there is no way to parse it as
such. Do you see what I mean?

SPARQLResult does not help, because MessageBodyReader makes
little sense.

Why not have ResultSet.getBoolean() or something?

On Mon, Mar 12, 2018 at 12:17 PM, Andy Seaborne  wrote:

> JSONInput.make(InputStream) -> SPARQLResult
>
> Andy
>
>
> On 12/03/18 10:13, Martynas Jusevičius wrote:
>
>> Hi Andy,
>>
>> I'm not using QueryExecution here, I'm trying to parse JSON read from HTTP
>> InputStream using ResultSetFactory.fromJSON().
>>
>> Then I want to carry the result set, maybe do some logic based on it, and
>> possibly serialize it back using ResultSetFormatter.
>>
>> Is that not possible with ASK result?
>>
>> On Mon, Mar 12, 2018 at 9:46 AM, Andy Seaborne  wrote:
>>
>>
>>>
>>> On 11/03/18 23:03, Martynas Jusevičius wrote:
>>>
>>> Hi,

 I'm getting the following JSON result from an ASK query:

 { "head": {}, "boolean": true }

 However, the method that usually works fine will not parse it from
 InputStream (Jena 3.0.1):

   org.apache.jena.sparql.resultset.ResultSetException: Not a
 ResultSet
 result
 org.apache.jena.sparql.resultset.SPARQLResult.getResultSet(
 SPARQLResult.java:94)
 org.apache.jena.sparql.resultset.JSONInput.fromJSON(JSONInput.java:64)
 org.apache.jena.query.ResultSetFactory.fromJSON(ResultSetFac
 tory.java:331)

 I stepped inside the code and I see that JSONObject is parsed fine, but
 afterwards SPARQLResult.resultSet field is not being set for some
 reason.

 Any ideas?


>>> The outcome of an ASK query is a boolean, not a ResultSet.
>>>
>>> See execAsk.
>>>
>>> SPARQLResult is the class for a holder of any SPARQL result type.
>>>
>>>  Andy
>>>
>>>
>>>
 Martynas



>>


Re: OntModel.read parsing non-suffixed TTL as RDF/XML

2018-03-14 Thread Andy Seaborne

Lewis,

I tried:

OntModel m = ModelFactory.createOntologyModel();
m.read("http://sweetontology.net/sweetAll;, null, "TTL");
System.out.println("DONE");
System.exit(0);

and it worked for me (3.6.0, 3.7.0-dev; apache-jena-libs)
Have you got any local copies remapped?
Could you produce a standalone example that does not work for you?

Andy

On 14/03/18 08:52, Dave Reynolds wrote:

Hi Lewis,

On 14/03/18 06:13, Lewis John McGibbney wrote:

Hi Dave,
Coming back to this to address something I missed before

On 2018/03/07 08:57:47, Dave Reynolds  wrote:


OntModels use a somewhat older stack of tools (FileManager) which
guesses the language based on the suffix, with a default of RDF/XML, and
then relies on content negotiation to deliver the guessed format. Since
your resources don't support conneg


The resources do. For example, SWEET is served with conneg as per the 
following

http://sweetontology.net/sweetAll (returns default TTL)
http://sweetontology.net/sweetAll.ttl (explicitly returns TTL)
http://sweetontology.net/sweetAll.rdf (returns non-default RDFXML)


and don't support RDF/XML (the
official default) that's not going to work.


The issue I see is that even if I return the following ontology 
(http://sweetontology.net/sweetAll.rdf) resource as RDFXML, the 
imports contained within this resource e.g. 
http://sweetontology.net/relaHuman are parsed as RDFXML when they 
should be parsed as TTL e.g. the default manifestation.




One option might be to create a subclass of FileManager which overrides
readModelWorker to either load the data via the newer RDFDataMgr which
has more sophisticated conneg support, or to change the default syntax
to Turtle. Then install that FileManager in the OntDocumentManager you
use for your loading.



Yes, I can see from the stack trace that FileManager is being called 
as follows


at 
org.apache.jena.util.FileManager.readModelWorker(FileManager.java:375)

at org.apache.jena.util.FileManager.readModel(FileManager.java:342)
at org.apache.jena.util.FileManager.readModel(FileManager.java:326)

This in turn invokes readers expecting XML input, which is not the 
case...

I'll go ahead and implement the suggested fix as above.


Andy pointed out that in modern Jena versions RIOT rewires itself into 
model readers, so it *may* be that just updating to a newer version of 
Jena will solve this anyway.


Dave


Re: Towards Jena 3.7.0

2018-03-14 Thread Laura Morales
OK, I can open it with Midori. Firefox, on the other hand, always shows me a login 
prompt with the message "ASF Committers" (even after I've cleared the cache). I 
don't know; I don't have an account on apache.org.
 
 

Sent: Wednesday, March 14, 2018 at 12:42 PM
From: ajs6f 
To: users@jena.apache.org
Subject: Re: Towards Jena 3.7.0
Loads fine for me on several different browsers. Have you logged into an 
.apache.org website for some other purpose from that browser? Perhaps try 
another browser, or clearing your cookies.

Adam Soroka ; aj...@apache.org


Re: Towards Jena 3.7.0

2018-03-14 Thread Laura Morales
> http://jena.staging.apache.org/documentation/query/javascript-functions.html

401

I'd actually like to read about "ARQ custom functions to be written in 
JavaScript". Is there a public link for this document?


Re: Towards Jena 3.7.0

2018-03-14 Thread Andy Seaborne



On 14/03/18 01:34, Laura Morales wrote:

JENA-1461: Allow ARQ custom functions to be written in JavaScript
Draft documentation:
http://jena.apache.org/documentation/query/javascript-functions.html


404?



http://jena.staging.apache.org/documentation/query/javascript-functions.html

Andy


Re: OntModel.read parsing non-suffixed TTL as RDF/XML

2018-03-14 Thread Dave Reynolds

Hi Lewis,

On 14/03/18 06:13, Lewis John McGibbney wrote:

Hi Dave,
Coming back to this to address something I missed before

On 2018/03/07 08:57:47, Dave Reynolds  wrote:


OntModels use a somewhat older stack of tools (FileManager) which
guesses the language based on the suffix, with a default of RDF/XML, and
then relies on content negotiation to deliver the guessed format. Since
your resources don't support conneg


The resources do. For example, SWEET is served with conneg as per the following
http://sweetontology.net/sweetAll (returns default TTL)
http://sweetontology.net/sweetAll.ttl (explicitly returns TTL)
http://sweetontology.net/sweetAll.rdf (returns non-default RDFXML)


and don't support RDF/XML (the
official default) that's not going to work.


The issue I see is that even if I return the following ontology 
(http://sweetontology.net/sweetAll.rdf) resource as RDFXML, the imports 
contained within this resource e.g. http://sweetontology.net/relaHuman are 
parsed as RDFXML when they should be parsed as TTL e.g. the default 
manifestation.



One option might be to create a subclass of FileManager which overrides
readModelWorker to either load the data via the newer RDFDataMgr which
has more sophisticated conneg support, or to change the default syntax
to Turtle. Then install that FileManager in the OntDocumentManager you
use for your loading.



Yes, I can see from the stack trace that FileManager is being called as follows

at 
org.apache.jena.util.FileManager.readModelWorker(FileManager.java:375)
at org.apache.jena.util.FileManager.readModel(FileManager.java:342)
at org.apache.jena.util.FileManager.readModel(FileManager.java:326)

This in turn invokes readers expecting XML input, which is not the case...
I'll go ahead and implement the suggested fix as above.


Andy pointed out that in modern Jena versions RIOT rewires itself into 
model readers, so it *may* be that just updating to a newer version of 
Jena will solve this anyway.


Dave


Re: OntModel.read parsing non-suffixed TTL as RDF/XML

2018-03-14 Thread Lewis John McGibbney
Hi Dave,
Coming back to this to address something I missed before

On 2018/03/07 08:57:47, Dave Reynolds  wrote: 

> OntModels use a somewhat older stack of tools (FileManager) which 
> guesses the language based on the suffix, with a default of RDF/XML, and 
> then relies on content negotiation to deliver the guessed format. Since 
> your resources don't support conneg 

The resources do. For example, SWEET is served with conneg as per the following
http://sweetontology.net/sweetAll (returns default TTL)
http://sweetontology.net/sweetAll.ttl (explicitly returns TTL)
http://sweetontology.net/sweetAll.rdf (returns non-default RDFXML)

> and don't support RDF/XML (the 
> official default) that's not going to work.

The issue I see is that even if I return the following ontology 
(http://sweetontology.net/sweetAll.rdf) resource as RDFXML, the imports 
contained within this resource e.g. http://sweetontology.net/relaHuman are 
parsed as RDFXML when they should be parsed as TTL e.g. the default 
manifestation.

> 
> One option might be to create a subclass of FileManager which overrides 
> readModelWorker to either load the data via the newer RDFDataMgr which 
> has more sophisticated conneg support, or to change the default syntax 
> to Turtle. Then install that FileManager in the OntDocumentManager you 
> use for your loading.
> 

Yes, I can see from the stack trace that FileManager is being called as follows

at 
org.apache.jena.util.FileManager.readModelWorker(FileManager.java:375)
at org.apache.jena.util.FileManager.readModel(FileManager.java:342)
at org.apache.jena.util.FileManager.readModel(FileManager.java:326)

This in turn invokes readers expecting XML input, which is not the case... 
I'll go ahead and implement the suggested fix as above.
Thanks again,
Lewis
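
In the meantime, one way to sidestep the RDF/XML default for the top-level
document is to parse it explicitly with RIOT and wrap the result; a sketch
(owl:imports are still resolved through the OntDocumentManager/FileManager
path, so the FileManager override discussed above may still be needed for the
imported modules):

import org.apache.jena.ontology.OntModel;
import org.apache.jena.ontology.OntModelSpec;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.riot.Lang;
import org.apache.jena.riot.RDFDataMgr;

// Parse the top-level document as Turtle regardless of suffix or conneg defaults.
Model base = RDFDataMgr.loadModel("http://sweetontology.net/sweetAll", Lang.TURTLE);

// Wrap it as an OntModel; the document manager then handles owl:imports.
OntModel ont = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM, base);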