On 17/12/17 16:47, Lorenz Buehmann wrote:
On 17.12.2017 16:18, Andy Seaborne wrote:
If you still have the setup to hand, could you try:
SELECT (COUNT(*) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}
COUNT(?s) materializes ?s, which is strictly unnecessary in this case,
although in other cases it is necessary.
My expectation is that COUNT(*) and a slice of (10,2000000) should be
about the same. (It indicates something about how hdt-java works.)
Sure, just for comparison I ran both queries again. As you expected,
COUNT(*) is much faster:
COUNT(*):

    time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(*) AS ?cnt) {?s a
    <http://wikiba.se/ontology-beta#Item>}"
    cnt
    37871468
    bin/hdtsparql.sh ~/wikidata.hdt  107,15s user 4,31s system 392% cpu
    28,426 total

COUNT(?s):

    bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a
    <http://wikiba.se/ontology-beta#Item>}"
    cnt
    37871468
    bin/hdtsparql.sh ~/wikidata.hdt  282,99s user 5,72s system 185% cpu
    2:35,41 total

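
For anyone who wants the comparison as ratios, here is a small Python sketch that only redoes the arithmetic on the figures reported above (decimal commas converted to points, 2:35,41 read as 2 minutes 35.41 seconds). The variable and function names are made up for illustration.

```python
# Timings copied from the two runs above; "wall" is the "total" column.
runs = {
    "COUNT(*)":  {"user": 107.15, "system": 4.31, "wall": 28.426},
    "COUNT(?s)": {"user": 282.99, "system": 5.72, "wall": 2 * 60 + 35.41},
}


def cpu_seconds(run):
    """Total CPU time consumed by a run (user + system)."""
    return run["user"] + run["system"]


# How much longer COUNT(?s) took than COUNT(*), by wall clock and by CPU time.
wall_ratio = runs["COUNT(?s)"]["wall"] / runs["COUNT(*)"]["wall"]
cpu_ratio = cpu_seconds(runs["COUNT(?s)"]) / cpu_seconds(runs["COUNT(*)"])

print(f"wall-clock ratio: {wall_ratio:.2f}x")
print(f"CPU-time ratio:   {cpu_ratio:.2f}x")
```

On these numbers COUNT(?s) takes roughly 5.5 times as long by wall clock but only about 2.6 times the CPU time, which fits the lower CPU utilisation reported for that run (185% against 392%).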
Thanks.
ARQ performs the COUNT(*) by seeing each row but not touching the
contents of the row.
The same happens with TDB: the values of bindings in rows are not
materialized until they are fetched.
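
To illustrate the idea of delayed materialization, here is a minimal Python sketch. It is not ARQ, TDB or hdt-java code; `Dictionary`, `LazyRow`, `count_star` and `count_var` are hypothetical names, and the assumption is simply that a row stores integer ids and that turning an id into a term costs a dictionary lookup.

```python
class Dictionary:
    """Maps integer ids to RDF term strings and counts how often it is asked."""

    def __init__(self, terms):
        self._terms = list(terms)
        self.lookups = 0

    def decode(self, term_id):
        self.lookups += 1
        return self._terms[term_id]


class LazyRow:
    """A result row that holds ids and decodes a variable only on request."""

    def __init__(self, ids, dictionary):
        self._ids = ids
        self._dictionary = dictionary

    def get(self, var):
        return self._dictionary.decode(self._ids[var])


def count_star(rows):
    """COUNT(*): every row counts, and its contents are never inspected."""
    return sum(1 for _ in rows)


def count_var(rows, var):
    """COUNT(?var): the value is fetched to check that it is bound."""
    return sum(1 for row in rows if row.get(var) is not None)


dictionary = Dictionary(f"http://example.org/item{i}" for i in range(1000))
rows = [LazyRow({"s": i}, dictionary) for i in range(1000)]

n_star = count_star(rows)
lookups_after_star = dictionary.lookups   # no decoding has happened yet
n_var = count_var(rows, "s")
lookups_after_var = dictionary.lookups    # one decode per row
```

Both counts agree, but only the second one pays for a dictionary lookup per row, which is the kind of cost the timings above suggest.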
Regarding performance, from what I understood while reading the HDT
paper, the worst pattern is VAR URI URI which makes sense as triples are
ordered by subject in BitmapTriples.
That makes sense. Their website documentation also talks a little about
multiple indexes, which IIUC means multiple recording of triples against
the same dictionary.
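
A toy Python sketch of why a VAR URI URI pattern is awkward on subject-ordered triples, and how a second ordering over the same dictionary ids helps. This is an assumption-laden illustration, not the actual BitmapTriples layout or the hdt-java index format; the terms, `spo`, `ops` and both functions are hypothetical.

```python
from bisect import bisect_left, bisect_right

# One shared dictionary: term -> integer id.
terms = ["ex:a", "ex:b", "ex:c", "rdf:type", "ex:Item", "ex:Other"]
ids = {term: i for i, term in enumerate(terms)}

triples = [
    (ids["ex:a"], ids["rdf:type"], ids["ex:Item"]),
    (ids["ex:b"], ids["rdf:type"], ids["ex:Other"]),
    (ids["ex:c"], ids["rdf:type"], ids["ex:Item"]),
]

spo = sorted(triples)                             # ordered by subject
ops = sorted((o, p, s) for s, p, o in triples)    # second recording, same ids


def subjects_by_scan(p, o):
    """?s p o on the subject-ordered list: every triple has to be inspected."""
    return [s for s, p2, o2 in spo if (p2, o2) == (p, o)]


def subjects_by_index(p, o):
    """?s p o on the object-ordered list: binary search for the (o, p) prefix."""
    lo = bisect_left(ops, (o, p, -1))             # ids are >= 0
    hi = bisect_right(ops, (o, p, len(terms)))    # ids are < len(terms)
    return [s for _, _, s in ops[lo:hi]]


by_scan = subjects_by_scan(ids["rdf:type"], ids["ex:Item"])
by_index = subjects_by_index(ids["rdf:type"], ids["ex:Item"])
```

Both functions return the same subject ids; the second ordering stores each triple again but needs no extra dictionary entries, since it reuses the same ids.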
Andy