I think that aggregation is one of the access patterns that HDT is not really designed for:

  time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}"

  cnt
  37871468

  bin/hdtsparql.sh ~/wikidata.hdt  282,55s user 5,20s system 185% cpu 2:35,04 total

It's not a problem with the OFFSET, it's just slow:

  time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>} LIMIT 20 OFFSET 20000000"

  cnt

  bin/hdtsparql.sh ~/wikidata.hdt  350,53s user 30,95s system 207% cpu 3:03,83 total

On 16.12.2017 19:17, Laura Morales wrote:
>> What I'm trying to understand is why you would have such a large offset and
>> what real world application there is?
> I don't have any particular use case in mind. I just tried to break it and it
> broke.
>
>> It's because the query is simple with no order that it seems
>> synthetic/contrived to me.
> I think the default order is how triples are physically stored, which is
> probably SPO. But anyway this wasn't important for me. I just wanted to test
> a high offset.
>
>> I'm not near my hardware but I wonder if similar symptoms are obtained with
>> a count (s) and a limit 20000000. As this should be similar in that it reads
>> a large number of triples but returns a small result set?
> Curiously, this query seems to hang in both cases, that is if I use
> defaultGraph or namedGraph
>
> SELECT (COUNT(?s) AS ?cnt)
> FROM <...>     <-- only used with namedGraph. No FROM with defaultGraph
> WHERE {
>   ?s a <http://wikiba.se/ontology-beta#Item>
> }
> LIMIT 10
> OFFSET 20000000
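
A side note on the COUNT query with LIMIT/OFFSET: in SPARQL, LIMIT and OFFSET are solution modifiers that are applied after aggregation. The engine therefore still has to go through every matching triple to compute the count, and only then slices the result, which here is a single row. With an OFFSET of 20000000 that one row is skipped and the cnt column comes back empty. The snippet below is only a toy model of that evaluation order in plain Python, not how hdtsparql or any real engine is implemented; the function name count_then_slice and the "itemN" values are made up for illustration.

```python
from itertools import islice


def count_then_slice(matches, limit=None, offset=0):
    """Toy model of: SELECT (COUNT(?s) AS ?cnt) {...} LIMIT limit OFFSET offset."""
    # Aggregation has to consume every matching solution.
    cnt = sum(1 for _ in matches)
    # An aggregate without GROUP BY yields exactly one result row.
    rows = [{"cnt": cnt}]
    # LIMIT/OFFSET slice the result rows *after* aggregation.
    stop = None if limit is None else offset + limit
    return list(islice(rows, offset, stop))


def fake_matches(n):
    # Hypothetical stand-in for the subjects matching the triple pattern.
    return (f"item{i}" for i in range(n))


print(count_then_slice(fake_matches(1000)))                       # [{'cnt': 1000}]
print(count_then_slice(fake_matches(1000), limit=20, offset=20))  # []
```

In both calls the full set of matches is iterated, so the cost is the same; the offset only decides whether the single result row survives.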