Re: Very very slow query when using a high OFFSET

Andy Seaborne Sun, 17 Dec 2017 09:14:05 -0800


On 17/12/17 16:47, Lorenz Buehmann wrote:


On 17.12.2017 16:18, Andy Seaborne wrote:

If you still have the setup to hand, could you try:

SELECT (COUNT(*) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}

COUNT(?s) materializes ?s, which is strictly unnecessary in this case
but is necessary in other cases.

My expectation is that COUNT(*) and a slice of (10,2000000) should be
about the same.  (It indicates something about how hdt-java works.)

Sure, just for comparison I ran both queries again. As you expected,
COUNT(*) is much faster:

COUNT(*):

time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(*) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}"
cnt
37871468
bin/hdtsparql.sh ~/wikidata.hdt   107,15s user 4,31s system 392% cpu 28,426 total

COUNT(?s):

bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}"
cnt
37871468
bin/hdtsparql.sh ~/wikidata.hdt   282,99s user 5,72s system 185% cpu 2:35,41 total


Thanks.

The COUNT(*) is performed by ARQ by seeing each row but not touching the
contents of the row.

The same happens with TDB - fetching the values of bindings in rows is
delayed until they are actually needed.
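
To make the difference concrete, here is a toy sketch in Python. It is not
the actual ARQ, TDB or hdt-java code; the class and function names
(LazyRow, count_star, count_var) are made up for illustration, and it
assumes rows hold integer dictionary IDs that are only decoded on request.

```python
# Toy model only: NOT the real ARQ, TDB, or hdt-java implementation.
# All names here are hypothetical.

class LazyRow:
    """A result row that stores dictionary IDs and decodes terms on demand."""

    def __init__(self, ids, dictionary, stats):
        self._ids = ids                # variable name -> integer ID
        self._dictionary = dictionary  # integer ID -> RDF term string
        self._stats = stats            # shared counter of dictionary lookups

    def get(self, var):
        # The expensive step: turn an ID back into an RDF term.
        self._stats["lookups"] += 1
        return self._dictionary[self._ids[var]]


def count_star(rows):
    # COUNT(*): one increment per row; the row contents are never touched.
    return sum(1 for _ in rows)


def count_var(rows, var):
    # COUNT(?var): each row's value is fetched to check that it is bound.
    return sum(1 for row in rows if row.get(var) is not None)


def demo():
    dictionary = {1: "http://example.org/a", 2: "http://example.org/b"}
    stats = {"lookups": 0}
    rows = [LazyRow({"s": i}, dictionary, stats) for i in (1, 2, 1)]
    n_star = count_star(rows)
    lookups_after_star = stats["lookups"]
    n_var = count_var(rows, "s")
    return n_star, lookups_after_star, n_var, stats["lookups"]
```

In this model both counts agree, but only count_var pays for a dictionary
lookup per row, which is the kind of cost the timings above suggest.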


Regarding performance, from what I understood while reading the HDT
paper, the worst pattern is VAR URI URI which makes sense as triples are
ordered by subject in BitmapTriples.

That makes sense. Their website documentation also talks a little about
multiple indexes, which IIUC means recording the triples more than once
against the same dictionary.
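
A rough sketch of why VAR URI URI is the awkward case, and what a second
ordering buys. This is only an illustration with sorted Python tuples of
integer IDs; the real BitmapTriples layout uses bitmaps and ID sequences,
and the function names below are hypothetical.

```python
import bisect

# Toy model only: triples are (s, p, o) tuples of integer dictionary IDs.
# This is NOT how hdt-java stores BitmapTriples.

def by_subject(spo, s):
    # Subject-ordered data: a binary search finds the subject's block.
    lo = bisect.bisect_left(spo, (s,))
    hi = bisect.bisect_left(spo, (s + 1,))
    return spo[lo:hi]


def subjects_by_scan(spo, p, o):
    # ?s P O on subject-ordered data: every triple has to be inspected.
    return [s for (s, pp, oo) in spo if pp == p and oo == o]


def build_ops_index(spo):
    # A second ordering (object, predicate, subject) over the same IDs,
    # i.e. the same triples recorded again against the same dictionary.
    return sorted((o, p, s) for (s, p, o) in spo)


def subjects_by_index(ops, p, o):
    # ?s P O on the object-ordered copy: binary search, then a short slice.
    lo = bisect.bisect_left(ops, (o, p))
    hi = bisect.bisect_left(ops, (o, p + 1))
    return [s for (_, _, s) in ops[lo:hi]]


spo = sorted([(1, 10, 100), (2, 10, 100), (2, 11, 101), (3, 10, 102)])
ops = build_ops_index(spo)
```

Both subjects_by_scan(spo, 10, 100) and subjects_by_index(ops, 10, 100)
return the same subjects; the difference is that the first walks the whole
list while the second only touches the matching range.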


    Andy
