I think that aggregation is one of the access patterns that HDT is not
really designed for:

    time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}"

    cnt
    37871468
    bin/hdtsparql.sh ~/wikidata.hdt   282,55s user 5,20s system 185% cpu 2:35,04 total

The OFFSET itself is not the problem; the aggregation is just slow, with or without it:

    time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>} LIMIT 20 OFFSET 20000000"

    cnt
    bin/hdtsparql.sh ~/wikidata.hdt   350,53s user 30,95s system 207% cpu 3:03,83 total
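
A possible explanation for the empty cnt in the second run, assuming standard SPARQL 1.1 semantics: LIMIT and OFFSET are applied after aggregation, so the single COUNT row is skipped by the OFFSET while the full scan is still paid for. To count only the rows inside the window, the slice would have to move into a subquery. A sketch (not run against this HDT file, and whether it is any faster here is not measured):

```sparql
# Outer query aggregates over whatever the inner query returns.
SELECT (COUNT(?s) AS ?cnt)
WHERE {
  {
    # Inner query: LIMIT/OFFSET slice the matching rows
    # before any aggregation happens.
    SELECT ?s
    WHERE { ?s a <http://wikiba.se/ontology-beta#Item> }
    LIMIT 20
    OFFSET 20000000
  }
}
```

With this shape the count can be at most 20, because the outer COUNT only sees the rows that the inner LIMIT lets through.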


On 16.12.2017 19:17, Laura Morales wrote:
>> What I'm trying to understand is why you would have such a large offset and 
>> what real world application there is?
> I don't have any particular use case in mind. I just tried to break it and it 
> broke.
>
>> It's because the query is simple with no order that it seems 
>> synthetic/contrived to me.
> I think the default order is how triples are physically stored, which is 
> probably SPO. But anyway this wasn't important for me. I just wanted to test 
> a high offset.
>
>> I'm not near my hardware but I wonder if similar symptoms are obtained with 
>> a count (s) and a
>> limit 20000000. As this should be similar in that it reads a large number of 
>> triples but
>> returns a small result set?
> Curiously, this query seems to hang in both cases, that is if I use 
> defaultGraph or namedGraph
>
> SELECT (COUNT(?s) AS ?cnt)
> FROM <...>  <-- only used with namedGraph. No FROM with defaultGraph
> WHERE {
>   ?s a <http://wikiba.se/ontology-beta#Item>
> }
> LIMIT 10
> OFFSET 20000000
