Object properties as inverse and at the same time disjoint

2017-12-01 Thread Manuel Enrique Puebla Martinez


Hello:

  Is it an error to declare two object properties as inverses of each other
and, at the same time, as disjoint?

The reasoner HermiT 1.3.8.3 generates the following error, and I suspect
this combination is the cause:

However, other reasoners and evaluation tools, such as OOPS, Pellet and TrOWL,
do not report this error.
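A short sketch of the logic, under standard OWL 2 semantics: if Q is declared the inverse of P, then Q(x, y) holds exactly when P(y, x) does. Declaring P and Q disjoint therefore forbids any pair of individuals for which both P(a, b) and P(b, a) hold (including P(a, a)), which amounts to saying P is asymmetric. That is not contradictory in itself. The Python below checks this over hypothetical assertion sets (the individual names are made up); it models the semantics and is not a reasoner.

```python
def inverse(pairs):
    """Extension of Q = inverse of P: (x, y) is in Q exactly when (y, x) is in P."""
    return {(y, x) for (x, y) in pairs}


def disjointness_violations(p_pairs):
    """Pairs asserted for both P and its inverse Q, i.e. clashes with DisjointObjectProperties(P, Q)."""
    return p_pairs & inverse(p_pairs)


# Hypothetical assertions for an object property P (individual names are made up).
asymmetric = {("alice", "bob"), ("bob", "carol")}
mutual = {("alice", "bob"), ("bob", "alice")}
self_loop = {("alice", "alice")}

for name, p in [("asymmetric", asymmetric), ("mutual", mutual), ("self_loop", self_loop)]:
    clash = disjointness_violations(p)
    print(name, "consistent" if not clash else f"inconsistent: {sorted(clash)}")
```

So the two axioms together should only make an ontology inconsistent if the data asserts such a pair. If HermiT still reports an error, the cause may lie elsewhere; for instance, OWL 2 DL only allows simple properties (roughly, properties that are not transitive and not defined via property chains) in disjointness axioms, and a reasoner that enforces that restriction could reject an ontology that other tools accept.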

Best regards, Manuel Puebla.


Re: tdb2.tdbloader performance

2017-12-01 Thread Dick Murray
Hi.

Sorry for the delay :-)

Short story: I used the following "reasonable" device

Dell M3800
Fedora 27
16GB SODIMM DDR3 Synchronous 1600 MHz
CPU cache L1/256KB,L2/1MB,L3/6MB
Intel(R) Core(TM) i7-4702HQ CPU @ 2.20GHz 4 cores 8 threads

to load part of latest-truthy.nt from a 1TB USB 3.0 drive into a 6GB RAM
disk, and got (cpulimit setting: triples/sec, where 100% is one logical CPU):

@800%: 60K/sec
@100%: 40K/sec
@50%:  20K/sec

The full source file contains 2.2G triples in a 10GB bz2 file, which
decompresses to about 278GB of N-Triples. I split that into 10M-triple chunks
and used the first one for testing.
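A back-of-envelope extrapolation of those rates to the full file, assuming (optimistically) that a rate measured on one 10M-triple chunk holds for all 2,199,382,887 triples counted with `wc -l` below. Real loads usually slow down as the indexes grow, so treat these figures (roughly 10, 15 and 31 hours) as rough lower bounds.

```python
TOTAL_TRIPLES = 2_199_382_887  # exact count from `wc -l latest-truthy.nt` in this post

# Observed load rates in triples/sec, keyed by cpulimit setting (from the figures above).
RATES = {"800%": 60_000, "100%": 40_000, "50%": 20_000}


def hours_to_load(total_triples: int, rate_per_sec: float) -> float:
    """Hours to load total_triples at a constant rate.

    Assumes the rate measured on a 10M-triple chunk holds for the whole file,
    which is optimistic: index inserts usually slow down as the database grows.
    """
    return total_triples / rate_per_sec / 3600


estimates = {limit: hours_to_load(TOTAL_TRIPLES, rate) for limit, rate in RATES.items()}
for limit, hours in estimates.items():
    print(f"{limit:>5}: ~{hours:.1f} h")
```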

Check with Andy, but I think it's CPU-bound, which is why my 24-core (4
x 6-core Xeon @ 2.5GHz) 128GB server can run concurrent loads with no
performance hit.

I might have access to an AMD Threadripper (12 cores, 24 threads, 5GHz) in
the next few days, and I will try to test on it.

I haven't run the full import because (a) I'm guessing the resulting TDB2
will be "large", and (b) my servers are currently importing other "large"
TDB2s!!!

Long story follows...

Decompress the file:

pbzip2 -dv -p4 -m1024 latest-truthy.nt.bz2
Parallel BZIP2 v1.1.12 [Dec 21, 2014]
By: Jeff Gilchrist [http://compression.ca]
Major contributions: Yavor Nikolov [http://javornikolov.wordpress.com]
Uses libbzip2 by Julian Seward

 # CPUs: 4
 Maximum Memory: 1024 MB
 Ignore Trailing Garbage: off
---
 File #: 1 of 1
 Input Name: latest-truthy.nt.bz2
Output Name: latest-truthy.nt

 BWT Block Size: 900k
 Input Size: 9965955258 bytes
Decompressing data...
Output Size: 277563574685 bytes
---

 Wall Clock: 5871.550948 seconds
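The pbzip2 figures above are enough to derive the compression ratio (about 28:1), the uncompressed size in binary units (about 258.5GiB, i.e. 277.6GB decimal) and the decompression rate (roughly 47MB/s of output over about 1.6 hours, with four CPUs). A small arithmetic sketch using only the reported numbers:

```python
INPUT_BYTES = 9_965_955_258      # pbzip2 "Input Size"
OUTPUT_BYTES = 277_563_574_685   # pbzip2 "Output Size"
WALL_SECONDS = 5871.550948       # pbzip2 "Wall Clock"

ratio = OUTPUT_BYTES / INPUT_BYTES
output_gib = OUTPUT_BYTES / 1024**3
throughput_mb_s = OUTPUT_BYTES / WALL_SECONDS / 1e6  # decimal MB written per second

print(f"ratio ~{ratio:.1f}:1, output ~{output_gib:.1f} GiB")
print(f"~{throughput_mb_s:.1f} MB/s over {WALL_SECONDS / 3600:.2f} h")
```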

Count the lines:

wc -l latest-truthy.nt
2199382887 latest-truthy.nt

Just short of 2200M...

Split the file into 10M-line chunks (10,485,760 lines each):

split -d -l 10485760 -a 3 --verbose latest-truthy.nt latest-truthy.nt.
creating file 'latest-truthy.nt.000'
creating file 'latest-truthy.nt.001'
creating file 'latest-truthy.nt.002'
creating file 'latest-truthy.nt.003'
creating file 'latest-truthy.nt.004'
creating file 'latest-truthy.nt.005'
...
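A quick check of how many chunks this produces: 10,485,760 is 10 × 2^20 lines, so the 2,199,382,887-line file splits into 210 files, the last one partial. With `-d -a 3` the numeric suffixes run from 000 to 209, which fits in three digits.

```python
TOTAL_LINES = 2_199_382_887  # from `wc -l latest-truthy.nt`
CHUNK_LINES = 10_485_760     # the `split -l` value, i.e. 10 * 2**20 lines

chunks = (TOTAL_LINES + CHUNK_LINES - 1) // CHUNK_LINES      # ceiling division
last_chunk_lines = TOTAL_LINES - (chunks - 1) * CHUNK_LINES  # the final, partial chunk
last_suffix = f"{chunks - 1:03d}"                            # `split -d -a 3` counts from 000

print(chunks, last_chunk_lines, last_suffix)  # 210 7859047 209
```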

Restart!

sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt

ps aux | grep tdb2
root  3358  0.0  0.0 222844  5756 pts/0    S+   19:22   0:00 sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root  3359  0.0  0.0   4500   776 pts/0    S+   19:22   0:00 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root  3360  0.0  0.0 120304  3288 pts/0    S+   19:22   0:00 sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root  3361  4.9  0.0   4500    92 pts/0    S<+  19:22   0:05 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root  3366 95.7 14.8 7866116 2418768 pts/0 Sl+  19:22   1:42 java -Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties -cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
dick  3477  0.0  0.0 119728   972 pts/1    S+   19:24   0:00 grep --color=auto tdb2

Notice that PID 3366 (java) is running with the default -Xmx2G.
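A note on units, since `ps aux` reports RSS (and VSZ) in KiB: the java process's RSS of 2,418,768 KiB is about 2.31 GiB, a little above the -Xmx2G heap limit mentioned above. That is expected, because -Xmx caps only the Java heap; the JVM also uses memory outside it (for example metaspace, thread stacks and the code cache). A minimal conversion sketch:

```python
def kib_to_gib(kib: int) -> float:
    """ps aux reports VSZ and RSS in KiB (1024 bytes); convert to GiB."""
    return kib * 1024 / 1024**3


rss_gib = kib_to_gib(2_418_768)  # RSS of PID 3366 in the listing above
vsz_gib = kib_to_gib(7_866_116)  # VSZ of the same process
print(f"RSS ~{rss_gib:.2f} GiB, VSZ ~{vsz_gib:.2f} GiB")
```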

19:26:49 INFO  TDB2 :: Finished: 10,485,760 latest-truthy.000.nt 247.28s (Avg: 42,404)

After the first pass there are no reads from the 1TB drive, as the OS has
cached the 1.2G source chunk.
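The 1.2G figure is consistent with the file-wide average line length: 277,563,574,685 bytes over 2,199,382,887 lines is about 126 bytes per triple, so a 10,485,760-line chunk should be around 1.23GiB. This sketch assumes the first chunk has roughly average-length lines, which need not hold exactly.

```python
TOTAL_BYTES = 277_563_574_685  # pbzip2 "Output Size" of latest-truthy.nt
TOTAL_LINES = 2_199_382_887    # `wc -l latest-truthy.nt`
CHUNK_LINES = 10_485_760       # lines per chunk from `split -l`

avg_bytes_per_line = TOTAL_BYTES / TOTAL_LINES
# Assumes the first chunk has roughly file-average line lengths.
est_chunk_gib = avg_bytes_per_line * CHUNK_LINES / 1024**3

print(f"~{avg_bytes_per_line:.0f} bytes per triple, chunk ~{est_chunk_gib:.2f} GiB")
```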

19:33:50 INFO  TDB2 :: Finished: 10,485,760 latest-truthy.000.nt 245.70s (Avg: 42,677)

export JVM_ARGS="-Xmx4G", i.e. increase the maximum heap to help the GC.

sudo ps aux | grep tdb2
root  4317  0.0  0.0 222848  6236 pts/0    S+   19:35   0:00 sudo cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root  4321  0.0  0.0   4500   924 pts/0    S+   19:35   0:00 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root  4322  0.0  0.0 120304  3356 pts/0    S+   19:35   0:00 sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root  4323  4.8  0.0   4500    88 pts/0    S<+  19:35   0:09 cpulimit -v -l 100 -i sh ./apache-jena-3.5.0/bin/tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
root  4328 94.8 18.5 8406788 3036188 pts/0 Sl+  19:35   3:01 java -Dlog4j.configuration=file:/run/media/dick/KVM/jena/apache-jena-3.5.0/jena-log4j.properties -cp /run/media/dick/KVM/jena/apache-jena-3.5.0/lib/* tdb2.tdbloader -v --loc /media/ramdisk/ latest-truthy.000.nt
dick  4594  0.0  0.0 119728  1024 pts/1    S+   19:38   0:00 grep --color=auto tdb2

At 800K, the java process (PID 4328) was at 3GB, and it peaked at 3.4GB just before completion.

19:39:23 INFO  TDB2 :: Finished: 10,485,760 latest-truthy.000.nt 247.65s (Avg: 42,340)
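The Avg figures in the log are triples divided by elapsed seconds. Recomputing them for the three runs (default heap, a repeat, and after exporting JVM_ARGS="-Xmx4G") shows the elapsed times are within about 0.8% of each other, so in this test the larger heap setting made no measurable difference to throughput. A sketch of that check:

```python
TRIPLES = 10_485_760

# (elapsed seconds, reported Avg) from the three "Finished" log lines above.
RUNS = [(247.28, 42_404), (245.70, 42_677), (247.65, 42_340)]

for seconds, reported in RUNS:
    computed = TRIPLES / seconds
    # The log prints elapsed time rounded to 0.01s, so expect small differences.
    print(f"{seconds:7.2f}s  computed {computed:,.0f}/s  reported {reported:,}/s")

times = [seconds for seconds, _ in RUNS]
spread = (max(times) - min(times)) / min(times)
print(f"spread between slowest and fastest run: {spread:.1%}")
```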

Throw all CPU resources at it, i.e. 800%:

sudo 

Re: Slow Lucene query

2017-12-01 Thread Andy Seaborne



On 01/12/17 11:49, Jean-Marc Vanel wrote:

Hi

The time for a SPARQL + Lucene query is more than 1 min.
It used to be around one second, and then the database grew.
Here is a typical query run for a lookup service similar to DBpedia Lookup:

PREFIX text: <http://jena.apache.org/text#>
PREFIX form: <http://raw.githubusercontent.com/jmvanel/semantic_forms/master/vocabulary/forms.owl.ttl#>
SELECT DISTINCT ?thing ?COUNT WHERE {
   graph ?g {
 ?thing text:query ( 'Jean-Marc' ) .
   }


This is going to loop on each graph and make a text:query for each one.

Is that what you intended?

Remove the "graph ?g {".

(and then remove the DISTINCT)
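To make the cost of that loop concrete, here is a toy model (not Jena's query engine) that only counts text-index lookups, assuming, as described above, that a text:query inside GRAPH ?g { } is evaluated once per named graph. With the 268 graphs reported in this thread that is 268 lookups instead of one. The class and function names are hypothetical.

```python
from typing import List, Set

NUM_GRAPHS = 268  # named-graph count reported in this thread


class CountingTextIndex:
    """Hypothetical stand-in for the Lucene index that only counts lookups."""

    def __init__(self) -> None:
        self.lookups = 0

    def query(self, term: str) -> Set[str]:
        self.lookups += 1
        return {"thing1"}  # made-up hit; the result itself does not matter here


def with_graph_wrapper(index: CountingTextIndex, graphs: List[str], term: str) -> None:
    # Models GRAPH ?g { ?thing text:query (...) } as one evaluation per named graph.
    for _graph in graphs:
        index.query(term)


def without_graph_wrapper(index: CountingTextIndex, term: str) -> None:
    # Models the same pattern with the GRAPH ?g wrapper removed: a single lookup.
    index.query(term)


graphs = [f"g{i}" for i in range(NUM_GRAPHS)]
looped, flat = CountingTextIndex(), CountingTextIndex()
with_graph_wrapper(looped, graphs, "Jean-Marc")
without_graph_wrapper(flat, "Jean-Marc")
print(looped.lookups, flat.lookups)  # 268 1
```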


   graph ?g1 {
 ?thing a ?CLASS .


Unnecessary?


   }
   OPTIONAL {
graph ?grCount {
 ?thing form:linksCount ?COUNT.
   } }
} ORDER BY DESC(?COUNT)
LIMIT 10

Here is a simpler query that is also slow:

PREFIX text: <http://jena.apache.org/text#>
SELECT DISTINCT ?thing ?COUNT WHERE {
   graph ?g {
 ?thing text:query ( 'Jean-Marc' ) .
   }
} ORDER BY DESC(?COUNT)
LIMIT 10

Run it with:
time wget -O semantic-forms.cc_select-ui.txt
http://semantic-forms.cc:9112/select-ui?query=PREFIX+text%3A+%3Chttp%3A%2F%2Fjena.apache.org%2Ftext%23%3E+%0D%0ASELECT+DISTINCT+%3Fthing+%3FCOUNT+WHERE+%7B%0D%0A++graph+%3Fg+%7B%0D%0A%3Fthing+text%3Aquery+%28+%27Jean-Marc%27+%29+.%0D%0A++%7D%0D%0A%7D%0D%0AORDER+BY+DESC%28%3FCOUNT%29%0D%0ALIMIT+10

Or, if you want to use YasGUI, the endpoint is
http://semantic-forms.cc:9112/sparql

*Statistics on the database*
268 graphs and 588,864 triples.

# Count graphs and triples
SELECT (COUNT(?s) AS ?trc) (COUNT(?GR) AS ?grc)
 WHERE {
   { GRAPH ?GR { } }
   UNION
   { GRAPH ?GR1 { ?s ?p ?o . } }
}

Result: 2 rows
"grc" "trc"
"268"^^<http://www.w3.org/2001/XMLSchema#integer> "588864"^^<http://www.w3.org/2001/XMLSchema#integer>

(I'm not sure this is the right way to count, but it gives figures :) )

You can reproduce the query with this UI:

http://semantic-forms.cc:9112/select-ui?query=%23+Count+graphs%0D%0ASELECT+%28COUNT%28%3Fs%29+AS+%3Ftrc%29+%28COUNT%28%3FGR%29+AS+%3Fgrc%29%0D%0AWHERE+%7B%0D%0A+%7B+GRAPH+%3FGR+%7B+%7D+%7D%0D%0A++UNION%0D%0A%7B+GRAPH+%3FGR1+%7B+%3Fs+%3Fp+%3Fo+.+%7D+%7D%0D%0A%7D

This is using Jena 3.5.0 with TDB1.
Here is a stack trace taken with kill -3 while the app was working hard;
I put a suspect line in bold.

"application-akka.actor.default-dispatcher-1351" #2683 prio=5 os_prio=0
tid=0x7f07a801c000 nid=0x9b7 runnable [0x7f06f25ab000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:737)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
at org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:183)
at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
at org.apache.lucene.util.fst.FST.<init>(FST.java:327)
at org.apache.lucene.util.fst.FST.<init>(FST.java:313)
at org.apache.lucene.codecs.blocktree.FieldReader.<init>(FieldReader.java:91)
at org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:234)
at org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat.fieldsProducer(Lucene50PostingsFormat.java:445)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:292)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:372)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:112)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:74)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:54)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:692)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:77)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
at org.apache.jena.query.text.TextIndexLucene.query(TextIndexLucene.java:370)
at org.apache.jena.query.text.TextQueryPF.performQuery(TextQueryPF.java:290)
at org.apache.jena.query.text.TextQueryPF.lambda$query$1(TextQueryPF.java:267)
at org.apache.jena.query.text.TextQueryPF$$Lambda$66/2108167189.call(Unknown Source)
at org.apache.jena.ext.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:5065)
at org.apache.jena.ext.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3716)
at

TDB2 binary format

2017-12-01 Thread Laura Morales
Where can I find a definition of the TDB2 binary format?


Slow Lucene query

2017-12-01 Thread Jean-Marc Vanel
Hi

The time for a SPARQL + Lucene query is more than 1 min.
It used to be around one second, and then the database grew.
Here is a typical query run for a lookup service similar to DBpedia Lookup:

PREFIX text: <http://jena.apache.org/text#>
PREFIX form: <http://raw.githubusercontent.com/jmvanel/semantic_forms/master/vocabulary/forms.owl.ttl#>
SELECT DISTINCT ?thing ?COUNT WHERE {
  graph ?g {
?thing text:query ( 'Jean-Marc' ) .
  }
  graph ?g1 {
?thing a ?CLASS .
  }
  OPTIONAL {
   graph ?grCount {
?thing form:linksCount ?COUNT.
  } }
} ORDER BY DESC(?COUNT)
LIMIT 10

Here is a simpler query that is also slow:

PREFIX text: <http://jena.apache.org/text#>
SELECT DISTINCT ?thing ?COUNT WHERE {
  graph ?g {
?thing text:query ( 'Jean-Marc' ) .
  }
} ORDER BY DESC(?COUNT)
LIMIT 10

Run it with:
time wget -O semantic-forms.cc_select-ui.txt
http://semantic-forms.cc:9112/select-ui?query=PREFIX+text%3A+%3Chttp%3A%2F%2Fjena.apache.org%2Ftext%23%3E+%0D%0ASELECT+DISTINCT+%3Fthing+%3FCOUNT+WHERE+%7B%0D%0A++graph+%3Fg+%7B%0D%0A%3Fthing+text%3Aquery+%28+%27Jean-Marc%27+%29+.%0D%0A++%7D%0D%0A%7D%0D%0AORDER+BY+DESC%28%3FCOUNT%29%0D%0ALIMIT+10

Or, if you want to use YasGUI, the endpoint is
http://semantic-forms.cc:9112/sparql

*Statistics on the database*
268 graphs and 588,864 triples.

# Count graphs and triples
SELECT (COUNT(?s) AS ?trc) (COUNT(?GR) AS ?grc)
WHERE {
  { GRAPH ?GR { } }
  UNION
  { GRAPH ?GR1 { ?s ?p ?o . } }
}

Result: 2 rows
"grc" "trc"
"268"^^<http://www.w3.org/2001/XMLSchema#integer> "588864"^^<http://www.w3.org/2001/XMLSchema#integer>

(I'm not sure this is the right way to count, but it gives figures :) )

You can reproduce the query with this UI:

http://semantic-forms.cc:9112/select-ui?query=%23+Count+graphs%0D%0ASELECT+%28COUNT%28%3Fs%29+AS+%3Ftrc%29+%28COUNT%28%3FGR%29+AS+%3Fgrc%29%0D%0AWHERE+%7B%0D%0A+%7B+GRAPH+%3FGR+%7B+%7D+%7D%0D%0A++UNION%0D%0A%7B+GRAPH+%3FGR1+%7B+%3Fs+%3Fp+%3Fo+.+%7D+%7D%0D%0A%7D
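On the counting question: in SPARQL, COUNT(?x) counts only the solutions in which ?x is bound, and the two UNION branches bind different variables. The first branch gives one solution per named graph (only ?GR bound) and the second one solution per triple in a named graph (?GR1, ?s, ?p, ?o bound), so ?grc should count named graphs and ?trc triples in named graphs; triples in the default graph, if any, are not counted. The Python below is a toy model of those semantics on a made-up dataset, not Jena code.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def union_solutions(named_graphs: Dict[str, List[Triple]]) -> List[dict]:
    """Solutions of { GRAPH ?GR { } } UNION { GRAPH ?GR1 { ?s ?p ?o } }.

    Each solution is a dict of bound variables; unbound variables are absent.
    """
    rows = [{"GR": g} for g in named_graphs]  # one per named graph, only ?GR bound
    rows += [
        {"GR1": g, "s": s, "p": p, "o": o}    # one per triple, ?GR stays unbound
        for g, triples in named_graphs.items()
        for (s, p, o) in triples
    ]
    return rows


def count_bound(rows: List[dict], var: str) -> int:
    """COUNT(?var) counts the solutions in which ?var is bound."""
    return sum(1 for row in rows if var in row)


# Hypothetical dataset: two named graphs holding three triples between them.
dataset = {
    "g1": [("a", "p", "b"), ("a", "p", "c")],
    "g2": [("x", "q", "y")],
}
rows = union_solutions(dataset)
print(count_bound(rows, "GR"), count_bound(rows, "s"))  # 2 3
```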

This is using Jena 3.5.0 with TDB1.
Here is a stack trace taken with kill -3 while the app was working hard;
I put a suspect line in bold.

"application-akka.actor.default-dispatcher-1351" #2683 prio=5 os_prio=0
tid=0x7f07a801c000 nid=0x9b7 runnable [0x7f06f25ab000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.NativeThreadSet.add(NativeThreadSet.java:46)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:737)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:179)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
at org.apache.lucene.store.BufferedIndexInput.readInt(BufferedIndexInput.java:183)
at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:194)
at org.apache.lucene.util.fst.FST.<init>(FST.java:327)
at org.apache.lucene.util.fst.FST.<init>(FST.java:313)
at org.apache.lucene.codecs.blocktree.FieldReader.<init>(FieldReader.java:91)
at org.apache.lucene.codecs.blocktree.BlockTreeTermsReader.<init>(BlockTreeTermsReader.java:234)
at org.apache.lucene.codecs.lucene50.Lucene50PostingsFormat.fieldsProducer(Lucene50PostingsFormat.java:445)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat$FieldsReader.<init>(PerFieldPostingsFormat.java:292)
at org.apache.lucene.codecs.perfield.PerFieldPostingsFormat.fieldsProducer(PerFieldPostingsFormat.java:372)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:112)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:74)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:54)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:692)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:77)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:63)
at org.apache.jena.query.text.TextIndexLucene.query(TextIndexLucene.java:370)
at org.apache.jena.query.text.TextQueryPF.performQuery(TextQueryPF.java:290)
at org.apache.jena.query.text.TextQueryPF.lambda$query$1(TextQueryPF.java:267)
at org.apache.jena.query.text.TextQueryPF$$Lambda$66/2108167189.call(Unknown Source)
at org.apache.jena.ext.com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:5065)
at org.apache.jena.ext.com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3716)
at org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2424)
at org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2298)
* - locked <0xef9d34f8> (a