Re: Concurrent access to a shared (Ont)Model

2016-08-26 Thread Martynas Jusevičius
Sorry, false alarm: this turned out to be a simpler case of Model
modification during iteration, obscured by a recursive call.
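The failure mode described here - modifying a Model while one of its iterators is still open - is the same fail-fast behaviour plain Java collections have. A minimal, Jena-free sketch of the bug and the usual fix (materialise the iterator into a list first, much as ExtendedIterator.toList() does); the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

public class IterateThenModify {

    // Removing from the underlying collection while an iterator is open
    // fails fast, analogous to modifying a Model under a live iterator.
    static boolean modifyDuringIteration(List<String> items) {
        try {
            for (String s : items) {
                if (s.startsWith("x")) {
                    items.remove(s);   // structural change under a live iterator
                }
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;              // the fail-fast check fired
        }
    }

    // Fix: snapshot the contents first, then modify freely.
    static boolean materialiseThenModify(List<String> items) {
        for (String s : new ArrayList<>(items)) {   // iterate over a copy
            if (s.startsWith("x")) {
                items.remove(s);                    // safe: loop uses the snapshot
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> data = new ArrayList<>(List.of("x1", "y1", "x2"));
        System.out.println(modifyDuringIteration(new ArrayList<>(data))); // false
        System.out.println(materialiseThenModify(new ArrayList<>(data))); // true
    }
}
```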

On Fri, Aug 26, 2016 at 8:04 PM, Martynas Jusevičius
 wrote:
> Hey Dave,
>
> another case of this came up. When calling imports.hasNext() on
>
> ExtendedIterator imports = ontology.listImports();
>
> I consistently get ConcurrentModificationException:
>
> at org.apache.jena.reasoner.rulesys.impl.LPTopGoalIterator.checkCME(LPTopGoalIterator.java:248)
> at org.apache.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:222)
> at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at org.apache.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55)
> at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at org.apache.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55)
> at org.apache.jena.util.iterator.Map1Iterator.hasNext(Map1Iterator.java:49)
> at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at org.apache.jena.util.iterator.Map1Iterator.hasNext(Map1Iterator.java:49)
>
> Is there something special about listImports()? Is that considered a
> WRITE operation?
>
> I am creating a new OntModel with each request to avoid concurrent
> access, so I am quite sure (not 100% certain though) that nothing else
> is modifying this OntModel at the same time. As I had understood, in
> that case it should not be necessary to lock the model explicitly?
>
> I am attempting to implement polymorphism support, not sure if it
> could be related.
>
> On Fri, Jul 22, 2016 at 9:24 AM, Dave Reynolds
>  wrote:
>> On 21/07/16 22:26, Martynas Jusevičius wrote:
>>>
>>> Thanks Dave. Does the following code look reasonable?
>>>
>>>
>>>  OntModel ontModel =
>>> OntDocumentManager.getInstance().getOntology(ontologyURI,
>>> ontModelSpec);
>>>  ontModel.enterCriticalSection(Lock.READ);
>>>  try
>>>  {
>>>  OntModel clonedModel =
>>> ModelFactory.createOntologyModel(ontModelSpec);
>>>  clonedModel.add(ontModel);
>>>  return clonedModel;
>>>  }
>>>  finally
>>>  {
>>>  ontModel.leaveCriticalSection();
>>>  }
>>
>>
>> Seems reasonable so long as all the code that uses ontModel is similarly
>> wrapped in critical sections.
>>
>> Dave
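[Dave's point - that the lock only helps if every reader and writer goes through it - can be sketched without Jena, with a JDK ReentrantReadWriteLock standing in for the model's Lock; the class and method names here are illustrative:]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CriticalSection {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> statements = new ArrayList<>();

    // cf. ontModel.enterCriticalSection(Lock.READ) ... leaveCriticalSection()
    public List<String> snapshot() {
        lock.readLock().lock();
        try {
            return new ArrayList<>(statements);  // copy while the lock is held
        } finally {
            lock.readLock().unlock();            // always released, even on error
        }
    }

    // cf. ontModel.enterCriticalSection(Lock.WRITE)
    public void add(String statement) {
        lock.writeLock().lock();
        try {
            statements.add(statement);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        CriticalSection cs = new CriticalSection();
        cs.add("s p o .");
        System.out.println(cs.snapshot());
    }
}
```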
>>
>>
>>> On Wed, Jul 20, 2016 at 10:46 AM, Dave Reynolds
>>>  wrote:

 So that's the reasoner in which case you need to lock the OntModel.

 Dave


 On 19/07/16 17:04, Martynas Jusevičius wrote:
>
>
> Hey Andy,
>
> I am not sure yet what it is that I need to lock - is it the OntModel,
> or the OntDocumentManager instance, or maybe both.
>
> But here are 2 actual exceptions:
>
> java.util.ConcurrentModificationException
> at com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.checkCME(LPTopGoalIterator.java:247)
> at com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:221)
> at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
> at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
> at com.hp.hpl.jena.util.iterator.Map1Iterator.hasNext(Map1Iterator.java:48)
> at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at com.hp.hpl.jena.util.iterator.Map1Iterator.hasNext(Map1Iterator.java:48)
> at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
> at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
> at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
> at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
> at org.graphity.client.filter.response.ConstructorBase.construct(ConstructorBase.java:130)
>
>
>
>
> java.util.ConcurrentModificationException: Due to closed iterator

Re: Concurrent access to a shared (Ont)Model

2016-08-26 Thread Martynas Jusevičius
Hey Dave,

another case of this came up. When calling imports.hasNext() on

ExtendedIterator imports = ontology.listImports();

I consistently get ConcurrentModificationException:

at org.apache.jena.reasoner.rulesys.impl.LPTopGoalIterator.checkCME(LPTopGoalIterator.java:248)
at org.apache.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:222)
at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
at org.apache.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55)
at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
at org.apache.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:55)
at org.apache.jena.util.iterator.Map1Iterator.hasNext(Map1Iterator.java:49)
at org.apache.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
at org.apache.jena.util.iterator.Map1Iterator.hasNext(Map1Iterator.java:49)

Is there something special about listImports()? Is that considered a
WRITE operation?

I am creating a new OntModel with each request to avoid concurrent
access, so I am quite sure (not 100% certain though) that nothing else
is modifying this OntModel at the same time. As I had understood, in
that case it should not be necessary to lock the model explicitly?

I am attempting to implement polymorphism support, not sure if it
could be related.

On Fri, Jul 22, 2016 at 9:24 AM, Dave Reynolds
 wrote:
> On 21/07/16 22:26, Martynas Jusevičius wrote:
>>
>> Thanks Dave. Does the following code look reasonable?
>>
>>
>>  OntModel ontModel =
>> OntDocumentManager.getInstance().getOntology(ontologyURI,
>> ontModelSpec);
>>  ontModel.enterCriticalSection(Lock.READ);
>>  try
>>  {
>>  OntModel clonedModel =
>> ModelFactory.createOntologyModel(ontModelSpec);
>>  clonedModel.add(ontModel);
>>  return clonedModel;
>>  }
>>  finally
>>  {
>>  ontModel.leaveCriticalSection();
>>  }
>
>
> Seems reasonable so long as all the code that uses ontModel is similarly
> wrapped in critical sections.
>
> Dave
>
>
>> On Wed, Jul 20, 2016 at 10:46 AM, Dave Reynolds
>>  wrote:
>>>
>>> So that's the reasoner in which case you need to lock the OntModel.
>>>
>>> Dave
>>>
>>>
>>> On 19/07/16 17:04, Martynas Jusevičius wrote:


 Hey Andy,

 I am not sure yet what it is that I need to lock - is it the OntModel,
 or the OntDocumentManager instance, or maybe both.

 But here are 2 actual exceptions:

 java.util.ConcurrentModificationException
 at com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.checkCME(LPTopGoalIterator.java:247)
 at com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:221)
 at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
 at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
 at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
 at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
 at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
 at com.hp.hpl.jena.util.iterator.Map1Iterator.hasNext(Map1Iterator.java:48)
 at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
 at com.hp.hpl.jena.util.iterator.Map1Iterator.hasNext(Map1Iterator.java:48)
 at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
 at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
 at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
 at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
 at com.hp.hpl.jena.util.iterator.WrappedIterator.hasNext(WrappedIterator.java:90)
 at com.hp.hpl.jena.util.iterator.FilterIterator.hasNext(FilterIterator.java:54)
 at org.graphity.client.filter.response.ConstructorBase.construct(ConstructorBase.java:130)




 java.util.ConcurrentModificationException: Due to closed iterator
 com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.checkClosed(LPTopGoalIterator.java:256)
 com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.moveForward(LPTopGoalIterator.java:95)
 com.hp.hpl.jena.reasoner.rulesys.impl.LPTopGoalIterator.hasNext(LPTopGoalIterator.java:222)


 

Re: TDB store parameters

2016-08-26 Thread Laurent Rucquoy
We use Microsoft Windows servers.



On 26 August 2016 at 13:01, Andy Seaborne  wrote:

> On 26/08/16 08:59, Laurent Rucquoy wrote:
>
>> Hello Andy,
>>
>> Thank you for your help.
>>
>> The params I'm mainly interested in changing are those of the profile
>> returned by StoreParams.getSmallStoreParams() to be able to reduce the
>> dataset size.
>>
>
> That is best done when creating the dataset in the first place.
>
> It reduces the in-memory cache footprint; it uses direct mode, which uses
> an in-JVM file cache but does not swamp the machine with memory-mapped
> files.
>
> For small datasets, it makes the file size seem smaller. The memory-mapped
> files on Linux are sparse files - space allocated but not used. The empty
> dataset on disk is 150K on Linux even though many file sizes are 8M. (Some
> other OSs may allocate the whole space, or they may misreport sparse files.)
>
>> Except for testing the change of fileMode from mapped to direct, I've not
>> done finer tuning on the other parameters; this is why
>> StoreParams.getSmallStoreParams()
>> seems to be convenient for our needs.
>>
>> I've another question about this case:
>>
>> What will be the size result of changing from default store params to
>> small
>> store params on an existing TDB dataset ?
>>
>
> Not much.  The files reporting 8M will report 8k but the actual size is
> the same because all databases are compatible unless you change the block
> size or indexing.
>
>> I think this will have an effect on future writing (i.e. the existing size
>> on disk will not be compacted -> is there a direct way or an existing tool
>> able to compact the size of an existing dataset ?)
>>
>
> Correct.
>
>
>> Regards,
>> Laurent
>>
>
> What OS are you using?
>
> Andy
>
>
>
>>
>> On 26 August 2016 at 00:22, Andy Seaborne  wrote:
>>
>> On 25/08/16 16:16, Laurent Rucquoy wrote:
>>>
>>> Hello,

>>> I'm implementing a TDB-backed dataset (Jena 3.1) and I wish to provide
>>> a method to change the StoreParams of this dataset.
>>>
>>> Because changing the StoreParams implies releasing the corresponding
>>> dataset location, I'd like to identify the current StoreParams in use
>>> to be able to avoid releasing the location if the StoreParams we want
>>> to apply now are the same as those currently used.


>>> Release is not so bad unless you are doing it frequently.
>>>
>>>
>>> What is the right way to do this (if possible) ?


>>> This may work:
>>>
>>> DatasetGraphTDB x = TDBInternal.getBaseDatasetGraphTDB(myDatasetGraph) ;
>>> StoreParams sp = x.getConfig().params ;
>>> System.out.println(sp);
>>>
>>> (the "may" is because I only think it works on a live dataset - not
>>> tested it)
>>>
>>> Obviously the name "TDBInternal" is a warning!
>>>
>>> Which params are you interested in changing?
>>>
>>> Andy
>>>
>>> Defaults:
>>>
>>> fileMode   dft:mapped
>>> blockSize  dft:8192
>>> readCacheSize  dft:1
>>> writeCacheSize dft:2000
>>> Node2NodeIdCacheSize   dft:10
>>> NodeId2NodeCacheSize   dft:50
>>> NodeMissCacheSize  dft:100
>>> indexNode2Id   dft:node2id
>>> indexId2Node   dft:nodes
>>> primaryIndexTriplesdft:SPO
>>> tripleIndexes  dft:[SPO, POS, OSP]
>>> primaryIndexQuads  dft:GSPO
>>> quadIndexesdft:[GSPO, GPOS, GOSP, POSG, OSPG, SPOG]
>>> primaryIndexPrefix dft:GPU
>>> prefixIndexes  dft:[GPU]
>>> indexPrefixdft:prefixIdx
>>> prefixNode2Id  dft:prefix2id
>>> prefixId2Node  dft:prefixes
>>>
>>>
>>>
>>> Thank you in advance for your help.

 Sincerely,
 Laurent



>>>
>>
>


Re: EnhGraph polymorphism

2016-08-26 Thread Chris Dollin

On 26/08/16 12:03, Andy Seaborne wrote:

On 26/08/16 00:28, Martynas Jusevičius wrote:

Hey,

EnhGraph JavaDoc contains:

"WARNING. The polymorphic aspects of EnhGraph are not supported and
are not expected to be supported in this way for the indefinite
future."

https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/enhanced/EnhGraph.html


Is that supposed to mean Jena 3 does not support polymorphism, or
what?


No - the comment is much older than that.


Please clarify :)


I don't know what the javadoc comment means :-)


I do.

Once upon a time, when Jena2 was being invented, Jeremy invented Jena's
polymorphic node machinery, where a Resource in a Model may have
many different facets and you can use the .as() method to switch
between different facets of the same underlying node. All the
data about a facet is defined by the RDF Graph (= model for these
purposes), though a facet might of course cache information. Changing facets
doesn't lose the information associated with a different facet.

We wondered if that same technique could be used to have
different kinds of Model as facets of an EnhGraph, so the same
entire graph could be presented in multiple different
ways.

However this turned out to be (a) a non-trivial extension of
the polymorphic resource machinery and (b) uncalled-for. But
we never got around to taking the hooks out. We just put a
comment in saying "don't try this". You're not missing
anything.

Chris

"Our future looks secure, but it's all out of our hands"
- Magenta, /Man and Machine/






Re: Slow SPARQL query

2016-08-26 Thread Mikael Pesonen


I'm happy to try out the snapshot. It's just a matter of running the server 
- no modifications of data or config needed?


Do you know when the new version will be released (weeks, months)?

Mikael


On 26.8.2016 13:53, Andy Seaborne wrote:

On 26/08/16 11:35, Rob Vesse wrote:

To try to answer the question about your specific query it’s
difficult without knowing more about the nature of the data, in your
case how many named graphs are in the database?

One thing that jumps out at me is that you use the GRAPH operator in
your query. That operator essentially requires that a query engine
applies your inner query to every single graph in your database. In
practice ARQ will try to do something a bit more efficient than that
but this is not guaranteed.


TDB does all the graphs at the same time if it can. Property paths can 
stop this but basic graph patterns + GRAPH ?var is done as quad table 
accesses.


Ditto for TIM.

(It's only the general purpose in-memory dataset where you can put any 
graph in from any source that has to loop.)



Your inner query uses a lot of property paths and so is potentially
quite expensive. As a first step I would suggest changing * to + if
you can, as that will avoid having to match the zero-length path, which,
while quite simple for your case, can be very expensive for more
generic property paths.


And that has been sped up recently (after the last release I'm afraid, 
post Jena 3.1.0 (Fuseki 2.4.0)) - JENA-1195


How long are the skos:narrower* chains?

Mikael - are you able to try out a SNAPSHOT build?


Secondly, if you are able to limit the number of graphs that are under
consideration you may be able to substantially improve performance.
One way to do this would be to place a pattern prior to the GRAPH
operator that restricts the values of the ?graph variable.  ARQ
should then be able to use that information to restrict which graphs
in the database it scans.

Rob



Andy




--
www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's 
Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: mikael.peso...@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Linnankatu 10 A
FI-20100 Turku
FINLAND



Re: Slow SPARQL query

2016-08-26 Thread Mikael Pesonen


Haven't done any changes there :-)

-Mikael


On 26.8.2016 13:55, A. Soroka wrote:

Yes, unless you have made some changes in configuration that you would not have 
made without noticing. {grin} TDB is the default storage for Fuseki.

---
A. Soroka
The University of Virginia Library


On Aug 26, 2016, at 6:40 AM, Mikael Pesonen  wrote:


Hi Rob,

I'm using this command to start the db:

/usr/bin/java -Xmx3600M -jar apache-jena-fuseki-2.3.1/fuseki-server.jar 
--update --port 3030 --loc=../apache-jena-3.0.1/DB /ds

and the s- command line tools to make queries. In the documentation there is 
tdbquery, which I'm not using. But TDB is still in use?

Thanks,
Mikael



On 26.8.2016 13:27, Rob Vesse wrote:

Mikael

  If you’re using Fuseki then you are using TDB already. TDB is a native RDF 
database that uses memory mapped files to make data access as fast as possible.

SDB is a legacy system built on top of relational databases, so queries have to 
be compiled into SQL, submitted to the underlying relational database, and 
their results translated back into RDF appropriately. More complex queries 
cannot be translated directly into a single SQL query due to the differing 
semantics between the two query languages and may require many SQL queries to 
answer.  SDB is no longer actively developed and receives only minor bug fixes.

  As for when you would not use TDB, there are probably three main criteria:

1 -  when the amount of data you will store runs into the billions of triples. 
TDB will scale pretty well into the millions of triples although this will 
depend on the complexity of the queries.
2 -  when you need clustering for load balancing, failover etc. TDB is a single 
node system; while there are ways to do load balancing, these typically rely on 
layering additional services on top of it.
3 -  when you need reasoning support.  TDB does not natively support reasoning; 
you can use other Jena APIs to add this but they will substantially degrade 
performance because they require all the data to be in heap memory. If your 
data is static then you can compute the inference closure once and persist that 
into the database, but if you need dynamic inference or extremely large-scale 
inference then TDB will not be suitable.

There are plenty of commercial options that do address the above three criteria 
and people can probably provide recommendations if you think you need a 
commercial option.

  It is also worth noting that some queries are simply hard for any query 
engine to answer

Rob

On 26/08/2016 10:46, "Mikael Pesonen"  wrote:

  Hi, still wondering what I should do to make the performance better.
  I read that TDB is faster. What is the reason not to use TDB? Can't find
  any comparison of SDB and TDB in that regard.
  Br,
 Mikael
   On 16.8.2016 13:13, Andy Seaborne wrote:
 > On 15/08/16 09:47, Mikael Pesonen wrote:
 >>
 >> Hi,
 >>
 >> what do you mean by masking? It should remove duplicates and it makes
 >> the query run in half time compared to without DISTINCT. Result count at
 >> least is the same.
 >>
 >> Mikael
 >
 > If DISTINCT causes a lot of results to be turned into a few, it is
 > hiding a lot of work by the query engine.
 >
 > If it's the inner DISTINCT that halves the execution time, then the
 > improvements (in dev builds) to property* may help you.
 >
 > If it's the outer one, it's a serialization issue (which I doubt at
 > this scale).
 >
 > Andy
 >
 >>
 >>
 >> On 12.8.2016 13:53, Andy Seaborne wrote:
 >>> On 08/08/16 11:56, Mikael Pesonen wrote:
 
  Hi Andy,
 
  storage is started like this:
 
  /usr/bin/java -Xmx3600M -jar
  /home/text/tools/apache-jena-fuseki-2.3.1/fuseki-server.jar --update
  --port 3030 --loc=../apache-jena-3.0.1/DB /ds
 
  Ontology data is simple SKOS, and document data is also simple DC
  metadata triplets. Query returns ~15k triplets.
 
  I tested the SKOS part, and this executed in less than one second,
  returning ~50 items:
 >>>
 >>> How many without the two DISTINCT?
 >>>
 >>> I am wondering if the DISTINCT (the inner one) is masking a lot of
 >>> results.
 >>>
 
  SELECT DISTINCT *
  WHERE {
  GRAPH ?graph {
  SELECT DISTINCT ?child WHERE {
  
{
 
 
  skos:narrower* ?child}
  UNION
  
{
 
 
  skos:narrower* ?child}
  UNION
  
{
 
 
   

Re: ARQ - handling of n-quad file

2016-08-26 Thread Andy Seaborne
The union graph is about the access to the dataset, not how the dataset 
is built.


Try:

SELECT *
{
   GRAPH  {
  ?s ?p ?o
   }
}

which is general.

--graph, --namedGraph read a graph and N-Quads isn't a graph - it's many 
graphs.  The graph parser will return just the default graph which is 
why it warns you.


-
Normally, the way to work with the union graph is to load TDB as a persistent 
database and set the union graph context on the TDB description.


See the Fuseki2 examples.
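As a sketch of the kind of assembler fragment those examples show (the node name <#dataset> and the location are illustrative, not taken from this thread):

```turtle
@prefix tdb: <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# tdb:unionDefaultGraph makes queries against the default graph
# see the union of all named graphs in the store.
<#dataset> rdf:type tdb:DatasetTDB ;
    tdb:location "DB" ;
    tdb:unionDefaultGraph true .
```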

Andy

On 25/08/16 16:56, Welch, Bill wrote:



On 24/8/16, 12:50, "Andy Seaborne"  wrote:


On 24/08/16 17:13, Welch, Bill wrote:



On 24/8/16, 03:17, "Andy Seaborne"  wrote:


On 24/08/16 00:53, Martynas Jusevičius wrote:






You can run with the default being the union if you want to.


I've tried various combinations of --data, --graph and --namedGraph with
the .nq file along with --base with no joy. I generally get:
WARN  riot :: Only triples or default graph data expected
: named graph data ignored
And no result as arq skips all the triples in the .nq.

How, exactly, do I tell arq to run with the default being the union?








Re: EnhGraph polymorphism

2016-08-26 Thread Andy Seaborne

On 26/08/16 00:28, Martynas Jusevičius wrote:

Hey,

EnhGraph JavaDoc contains:

"WARNING. The polymorphic aspects of EnhGraph are not supported and
are not expected to be supported in this way for the indefinite
future."

https://jena.apache.org/documentation/javadoc/jena/org/apache/jena/enhanced/EnhGraph.html

Is that supposed to mean Jena 3 does not support polymorphism, or
what?


No - the comment is much older than that.


Please clarify :)


I don't know what the javadoc comment means :-)

Andy




Martynas
atomgraph.com






Re: Slow SPARQL query

2016-08-26 Thread Andy Seaborne

On 26/08/16 11:35, Rob Vesse wrote:

To try to answer the question about your specific query it’s
difficult without knowing more about the nature of the data, in your
case how many named graphs are in the database?

One thing that jumps out at me is that you use the GRAPH operator in
your query. That operator essentially requires that a query engine
applies your inner query to every single graph in your database. In
practice ARQ will try to do something a bit more efficient than that
but this is not guaranteed.


TDB does all the graphs at the same time if it can. Property paths can 
stop this but basic graph patterns + GRAPH ?var is done as quad table 
accesses.


Ditto for TIM.

(It's only the general purpose in-memory dataset where you can put any 
graph in from any source that has to loop.)



Your inner query uses a lot of property paths and so is potentially
quite expensive. As a first step I would suggest changing * to + if
you can, as that will avoid having to match the zero-length path, which,
while quite simple for your case, can be very expensive for more
generic property paths.


And that has been sped up recently (after the last release I'm afraid, post 
Jena 3.1.0 (Fuseki 2.4.0)) - JENA-1195


How long are the skos:narrower* chains?

Mikael - are you able to try out a SNAPSHOT build?


Secondly, if you are able to limit the number of graphs that are under
consideration you may be able to substantially improve performance.
One way to do this would be to place a pattern prior to the GRAPH
operator that restricts the values of the ?graph variable.  ARQ
should then be able to use that information to restrict which graphs
in the database it scans.

Rob



Andy




Re: Slow SPARQL query

2016-08-26 Thread Mikael Pesonen


Hi Rob,

I'm using this command to start the db:

 /usr/bin/java -Xmx3600M -jar 
apache-jena-fuseki-2.3.1/fuseki-server.jar --update --port 3030 
--loc=../apache-jena-3.0.1/DB /ds


and the s- command line tools to make queries. In the documentation there is 
tdbquery, which I'm not using. But TDB is still in use?


Thanks,
Mikael



On 26.8.2016 13:27, Rob Vesse wrote:

Mikael

  If you’re using Fuseki then you are using TDB already. TDB is a native RDF 
database that uses memory mapped files to make data access as fast as possible.

SDB is a legacy system built on top of relational databases, so queries have to 
be compiled into SQL, submitted to the underlying relational database, and 
their results translated back into RDF appropriately. More complex queries 
cannot be translated directly into a single SQL query due to the differing 
semantics between the two query languages and may require many SQL queries to 
answer.  SDB is no longer actively developed and receives only minor bug fixes.

  As for when you would not use TDB, there are probably three main criteria:

1 -  when the amount of data you will store runs into the billions of triples. 
TDB will scale pretty well into the millions of triples although this will 
depend on the complexity of the queries.
2 -  when you need clustering for load balancing, failover etc. TDB is a single 
node system; while there are ways to do load balancing, these typically rely on 
layering additional services on top of it.
3 -  when you need reasoning support.  TDB does not natively support reasoning; 
you can use other Jena APIs to add this but they will substantially degrade 
performance because they require all the data to be in heap memory. If your 
data is static then you can compute the inference closure once and persist that 
into the database, but if you need dynamic inference or extremely large-scale 
inference then TDB will not be suitable.

There are plenty of commercial options that do address the above three criteria 
and people can probably provide recommendations if you think you need a 
commercial option.

  It is also worth noting that some queries are simply hard for any query 
engine to answer

Rob

On 26/08/2016 10:46, "Mikael Pesonen"  wrote:

 
 Hi, still wondering what I should do to make the performance better.
 I read that TDB is faster. What is the reason not to use TDB? Can't find
 any comparison of SDB and TDB in that regard.
 
 Br,

 Mikael
 
 
 On 16.8.2016 13:13, Andy Seaborne wrote:

 > On 15/08/16 09:47, Mikael Pesonen wrote:
 >>
 >> Hi,
 >>
 >> what do you mean by masking? It should remove duplicates and it makes
 >> the query run in half time compared to without DISTINCT. Result count at
 >> least is the same.
 >>
 >> Mikael
 >
 > If DISTINCT causes a lot of results to be turned into a few, it is
 > hiding a lot of work by the query engine.
 >
 > If it's the inner DISTINCT that halves the execution time, then the
 > improvements (in dev builds) to property* may help you.
 >
 > If it's the outer one, it's a serialization issue (which I doubt at
 > this scale).
 >
 > Andy
 >
 >>
 >>
 >> On 12.8.2016 13:53, Andy Seaborne wrote:
 >>> On 08/08/16 11:56, Mikael Pesonen wrote:
 
  Hi Andy,
 
  storage is started like this:
 
  /usr/bin/java -Xmx3600M -jar
  /home/text/tools/apache-jena-fuseki-2.3.1/fuseki-server.jar --update
  --port 3030 --loc=../apache-jena-3.0.1/DB /ds
 
  Ontology data is simple SKOS, and document data is also simple DC
  metadata triplets. Query returns ~15k triplets.
 
  I tested the SKOS part, and this executed in less than one second,
  returning ~50 items:
 >>>
 >>> How many without the two DISTINCT?
 >>>
 >>> I am wondering if the DISTINCT (the inner one) is masking a lot of
 >>> results.
 >>>
 
  SELECT DISTINCT *
  WHERE {
  GRAPH ?graph {
  SELECT DISTINCT ?child WHERE {
  
{
 
 
  skos:narrower* ?child}
  UNION
  
{
 
 
  skos:narrower* ?child}
  UNION
  
{
 
 
  skos:narrower* ?child}
  UNION
  
{
 
 
  skos:narrower* ?child}
  UNION
  
{
 
 
  

Re: Slow SPARQL query

2016-08-26 Thread Rob Vesse
To try to answer the question about your specific query it’s difficult without 
knowing more about the nature of the data, in your case how many named graphs 
are in the database?

One thing that jumps out at me is that you use the GRAPH operator in your 
query. That operator essentially requires that a query engine applies your 
inner query to every single graph in your database. In practice ARQ will try to 
do something a bit more efficient than that but this is not guaranteed.

Your inner query uses a lot of property paths and so is potentially quite 
expensive. As a first step I would suggest changing * to + if you can, as that 
will avoid having to match the zero-length path, which, while quite simple for 
your case, can be very expensive for more generic property paths.

Secondly, if you are able to limit the number of graphs that are under 
consideration you may be able to substantially improve performance. One way to 
do this would be to place a pattern prior to the GRAPH operator that restricts 
the values of the ?graph variable.  ARQ should then be able to use that 
information to restrict which graphs in the database it scans.
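Both suggestions can be sketched in one query (the IRIs below are hypothetical placeholders, since the concept and graph IRIs were stripped from the archived query):

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex:   <http://example.org/>        # hypothetical namespace

SELECT DISTINCT ?child
WHERE {
  # Restrict ?graph up front so the engine need not scan every named graph.
  VALUES ?graph { ex:vocabularyGraph }
  GRAPH ?graph {
    # '+' (one or more steps) avoids the zero-length-path matching '*' requires.
    ex:rootConcept skos:narrower+ ?child
  }
}
```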

Rob

On 26/08/2016 10:46, "Mikael Pesonen"  wrote:


Hi, still wondering what I should do to make the performance better.

I read that TDB is faster. What is the reason not to use TDB? Can't find 
any comparison of SDB and TDB in that regard.

Br,
Mikael


On 16.8.2016 13:13, Andy Seaborne wrote:
> On 15/08/16 09:47, Mikael Pesonen wrote:
>>
>> Hi,
>>
>> what do you mean by masking? It should remove duplicates and it makes
>> the query run in half time compared to without DISTINCT. Result count at
>> least is the same.
>>
>> Mikael
>
> If DISTINCT causes a lot of results to be turned into a few, it is 
> hiding a lot of work by the query engine.
>
> If it's the inner DISTINCT that halves the execution time, then the 
> improvements (in dev builds) to property* may help you.
>
> If it's the outer one, it's a serialization issue (which I doubt at 
> this scale).
>
> Andy
>
>>
>>
>> On 12.8.2016 13:53, Andy Seaborne wrote:
>>> On 08/08/16 11:56, Mikael Pesonen wrote:

 Hi Andy,

 storage is started like this:

 /usr/bin/java -Xmx3600M -jar
 /home/text/tools/apache-jena-fuseki-2.3.1/fuseki-server.jar --update
 --port 3030 --loc=../apache-jena-3.0.1/DB /ds

 Ontology data is simple SKOS, and document data is also simple DC
 metadata triplets. Query returns ~15k triplets.

 I tested the SKOS part, and this executed in less than one second,
 returning ~50 items:
>>>
>>> How many without the two DISTINCT?
>>>
>>> I am wondering if the DISTINCT (the inner one) is masking a lot of
>>> results.
>>>

 SELECT DISTINCT *
 WHERE {
 GRAPH ?graph {
 SELECT DISTINCT ?child WHERE {
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 

Re: Slow SPARQL query

2016-08-26 Thread Rob Vesse
Mikael

 If you’re using Fuseki then you are using TDB already. TDB is a native RDF 
database that uses memory mapped files to make data access as fast as possible.

SDB is a legacy system built on top of relational databases, so queries have to 
be compiled into SQL, submitted to the underlying relational database, and 
their results translated back into RDF appropriately. More complex queries 
cannot be translated directly into a single SQL query due to the differing 
semantics between the two query languages, and may require many SQL queries to 
answer.  SDB is no longer actively developed and receives only minor bug fixes.

As for when you would not use TDB, there are probably three main criteria:

1 - the amount of data you will store gets into the billions of triples. 
TDB will scale pretty well into the millions of triples, although this will 
depend on the complexity of the queries.
2 - when you need clustering for load balancing, failover etc. TDB is a single-
node system; while there are ways to do load balancing, these typically rely on 
layering additional services on top of it.
3 - when you need reasoning support. TDB does not natively support reasoning; 
you can use other Jena APIs to add this, but they will substantially degrade 
performance because they require all the data to be in heap memory. If your 
data is static then you can compute the inference closure once and persist that 
into the database, but if you need dynamic inference or extremely large-scale 
inference then TDB will not be suitable.

There are plenty of commercial options that do address the above three criteria 
and people can probably provide recommendations if you think you need a 
commercial option.

It is also worth noting that some queries are simply hard for any query engine 
to answer.

Rob

On 26/08/2016 10:46, "Mikael Pesonen"  wrote:


Hi, still wondering what I should do to make the performance better.

I read that TDB is faster. What is the reason not to use TDB? Can't find 
any comparison of SDB and TDB in that regard.

Br,
Mikael


On 16.8.2016 13:13, Andy Seaborne wrote:
> On 15/08/16 09:47, Mikael Pesonen wrote:
>>
>> Hi,
>>
>> what do you mean by masking? It should remove duplicates, and it makes
>> the query run in half the time compared to without DISTINCT. The result
>> count at least is the same.
>>
>> Mikael
>
> If DISTINCT causes a lot of results to be turned into a few, it is 
> hiding a lot of work by the query engine.
>
> If it's the inner DISTINCT that halves the execution time, then the 
> improvements (in dev builds) to property* may help you.
>
> If it's the outer one, it's a serialization issue (which I doubt at 
> this scale).
>
> Andy
>
>>
>>
>> On 12.8.2016 13:53, Andy Seaborne wrote:
>>> On 08/08/16 11:56, Mikael Pesonen wrote:

 Hi Andy,

 storage is started like this:

 /usr/bin/java -Xmx3600M -jar
 /home/text/tools/apache-jena-fuseki-2.3.1/fuseki-server.jar --update
 --port 3030 --loc=../apache-jena-3.0.1/DB /ds

 Ontology data is simple SKOS, and document data is also simple DC
 metadata triples. The query returns ~15k triples.

 I tested the SKOS part, and this executed in less than one second,
 returning ~50 items:
>>>
>>> How many without the two DISTINCT?
>>>
>>> I am wondering if the DISTINCT (the inner one) is masking a lot of
>>> results.
>>>

 SELECT DISTINCT *
 WHERE {
 GRAPH ?graph {
 SELECT DISTINCT ?child WHERE {
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 
{
 


 skos:narrower* ?child}
 UNION
 

Re: Arithmetic in Jena rules

2016-08-26 Thread javed khan
Thank you Niels for your time.

regards

On Thu, Aug 25, 2016 at 12:36 PM, Niels Andersen  wrote:

> See lessThan and greaterThan here:
> https://jena.apache.org/documentation/inference/#builtin-primitives
>
> Niels
>
> -Original Message-
> From: javed khan [mailto:javedbtk...@gmail.com]
> Sent: Thursday, August 25, 2016 11:34
> To: users@jena.apache.org
> Subject: Arithmetic in Jena rules
>
> Can we do some mathematical/arithmetic operations in Jena rules? For
> instance, something like this: if some Employee has a salary of more
> than ten thousand dollars, then the Employee is in the Manager post.
>
> Can we do this and others like (>=, <=, ==) ?
>
> Any web links to literature related to this would be highly appreciated.
>
> Thank you.
>
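For reference, the salary example from the question could be expressed with the greaterThan builtin that Niels points to roughly as follows. This is a sketch only: the eg: namespace and the salary/position property names are hypothetical, not from the thread.

```
# Jena rules syntax: if ?e has a salary greater than 10000,
# infer that ?e holds the Manager post.
@prefix eg: <http://example.org/#>.

[managerRule:
  (?e eg:salary ?s)
  greaterThan(?s 10000)
  ->
  (?e eg:position eg:Manager)
]
```

Other comparison builtins (lessThan, ge, le, equal, notEqual) follow the same call pattern inside the rule body.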