On 17/12/17 09:51, Lorenz Buehmann wrote:
I think that aggregation is one of the access patterns that HDT is not
really designed for:

time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}"

cnt
37871468
bin/hdtsparql.sh ~/wikidata.hdt   282,55s user 5,20s system 185% cpu 2:35,04 total

It's not a problem with the OFFSET, it's just slow:

time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>} LIMIT 20 OFFSET 20000000"

cnt
bin/hdtsparql.sh ~/wikidata.hdt   350,53s user 30,95s system 207% cpu 3:03,83 total

If you still have the setup to hand, could you try:

SELECT (COUNT(*) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}

COUNT(?s) materializes ?s, which is strictly unnecessary in this case, although in other cases it is necessary.

My expectation is that COUNT(*) and a slice of (10,2000000) should be about the same. (It indicates something about how hdt-java works.)
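
To illustrate why the two should cost about the same, here is a toy Python sketch. It does not describe hdt-java or Jena internals; the `Scan` class and the helper names are hypothetical, and the only assumption is that the pattern is evaluated as a plain iterator with no way to jump to row N. Under that assumption, both a count and a deep slice pay for every row they pull from the scan:

```python
from itertools import islice


class Scan:
    """Hypothetical stand-in for a triple-pattern scan.

    It yields one row per match and records how many rows were pulled,
    which is the cost we care about here.
    """

    def __init__(self, n):
        self.n = n
        self.pulled = 0

    def __iter__(self):
        for i in range(self.n):
            self.pulled += 1
            yield ("item%d" % i,)


def count_star(scan):
    # COUNT(*): every matching row is visited, none is kept.
    return sum(1 for _ in scan)


def slice_rows(scan, offset, limit):
    # OFFSET/LIMIT over an iterator: the first `offset` rows are
    # pulled and discarded before `limit` rows are returned.
    return list(islice(scan, offset, offset + limit))


full = Scan(1000)
total = count_star(full)          # pulls all 1000 rows

deep = Scan(1000)
page = slice_rows(deep, 900, 10)  # pulls 910 rows to return 10
```

Separately, in SPARQL the LIMIT and OFFSET modifiers are applied after aggregation, so a COUNT query with OFFSET 20000000 skips its single result row. That would be consistent with the empty cnt column in the second run quoted above.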

    Andy



On 16.12.2017 19:17, Laura Morales wrote:
What I'm trying to understand is why you would have such a large offset and 
what real world application there is?
I don't have any particular use case in mind. I just tried to break it and it 
broke.

It's because the query is simple with no order that it seems 
synthetic/contrived to me.
I think the default order is how triples are physically stored, which is 
probably SPO. But anyway this wasn't important for me. I just wanted to test a 
high offset.

I'm not near my hardware but I wonder if similar symptoms are obtained with a 
count (s) and a limit 20000000. As this should be similar in that it reads a 
large number of triples but returns a small result set?
Curiously, this query seems to hang in both cases, that is, whether I use 
defaultGraph or namedGraph:

SELECT (COUNT(?s) AS ?cnt)
FROM <...>  # only used with namedGraph; no FROM with defaultGraph
WHERE {
   ?s a <http://wikiba.se/ontology-beta#Item>
}
LIMIT 10
OFFSET 20000000

