On 17/12/17 09:51, Lorenz Buehmann wrote:
I think that aggregation is one of the access patterns that HDT is not
really designed for:
time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}"
cnt
37871468
bin/hdtsparql.sh ~/wikidata.hdt 282,55s user 5,20s system 185% cpu 2:35,04 total
It's not a problem with the OFFSET, it's just slow:
time bin/hdtsparql.sh ~/wikidata.hdt "SELECT (COUNT(?s) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>} LIMIT 20 OFFSET 20000000"
cnt
bin/hdtsparql.sh ~/wikidata.hdt 350,53s user 30,95s system 207% cpu 3:03,83 total
If you still have the setup to hand, could you try:
SELECT (COUNT(*) AS ?cnt) {?s a <http://wikiba.se/ontology-beta#Item>}
COUNT(?s) materializes ?s, which is strictly unnecessary in this case,
although in other cases it is necessary.
My expectation is that COUNT(*) and a slice of (10,2000000) should be
about the same. (It indicates something about how hdt-java works.)
Andy
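To illustrate the COUNT(?s) versus COUNT(*) point above, here is a minimal sketch of a dictionary-encoded triple store. It is a toy model written for this note, not hdt-java's or Jena's actual code: the class ToyStore and its methods are hypothetical names, and the assumption is simply that an engine evaluating COUNT(?s) decodes each bound ID into a term, while COUNT(*) only needs the number of matches.

```python
# Toy model of a dictionary-encoded triple store (an assumption for
# illustration; this is not how hdt-java is actually implemented).

class ToyStore:
    def __init__(self, triples):
        self.terms = []    # id -> term string
        self.ids = {}      # term string -> id
        self.lookups = 0   # number of id -> term decodes performed
        self.triples = [
            tuple(self._encode(term) for term in triple) for triple in triples
        ]

    def _encode(self, term):
        # Assign a fresh integer id the first time a term is seen.
        if term not in self.ids:
            self.ids[term] = len(self.terms)
            self.terms.append(term)
        return self.ids[term]

    def _decode(self, term_id):
        # "Materializing" a binding: turn the id back into a term.
        self.lookups += 1
        return self.terms[term_id]

    def match_ids(self, p, o):
        """Yield subject ids of triples matching (?s, p, o)."""
        p_id, o_id = self.ids.get(p), self.ids.get(o)
        for s_id, tp, to in self.triples:
            if tp == p_id and to == o_id:
                yield s_id

    def count_star(self, p, o):
        # COUNT(*): only the number of solutions matters, no decoding.
        return sum(1 for _ in self.match_ids(p, o))

    def count_var(self, p, o):
        # COUNT(?s): model an engine that decodes each binding before
        # checking that it is bound.
        return sum(
            1 for s_id in self.match_ids(p, o)
            if self._decode(s_id) is not None
        )


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ITEM = "http://wikiba.se/ontology-beta#Item"

store = ToyStore([
    ("ex:Q1", RDF_TYPE, ITEM),
    ("ex:Q2", RDF_TYPE, ITEM),
    ("ex:Q1", "ex:label", "one"),
])

print(store.count_star(RDF_TYPE, ITEM), store.lookups)  # 2 0
print(store.count_var(RDF_TYPE, ITEM), store.lookups)   # 2 2
```

Both counts agree here because ?s is bound in every solution of this pattern. The only difference in the model is the number of dictionary lookups, which grows with the number of matches for COUNT(?s).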
On 16.12.2017 19:17, Laura Morales wrote:
What I'm trying to understand is why you would have such a large offset and
what real world application there is?
I don't have any particular use case in mind. I just tried to break it and it
broke.
It's because the query is simple with no order that it seems
synthetic/contrived to me.
I think the default order is how triples are physically stored, which is
probably SPO. But anyway this wasn't important for me. I just wanted to test a
high offset.
I'm not near my hardware, but I wonder whether similar symptoms are obtained
with a count(?s) and a limit 20000000, as this should be similar in that it
reads a large number of triples but returns a small result set.
Curiously, this query seems to hang in both cases, that is, whether I use
defaultGraph or namedGraph:
SELECT (COUNT(?s) AS ?cnt)
FROM <...> <-- only used with namedGraph. No FROM with defaultGraph
WHERE {
?s a <http://wikiba.se/ontology-beta#Item>
}
LIMIT 10
OFFSET 20000000
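A possible reading of the query above, assuming the engine follows the SPARQL 1.1 evaluation order in which LIMIT and OFFSET are applied after aggregation: the COUNT has to consume every matching triple and produces exactly one row, and the OFFSET then skips that single row. The sketch below models this with plain Python; the function names match and count_then_slice are hypothetical and not part of Jena or hdt-java.

```python
from itertools import islice


def match(triples, p, o):
    """Yield one solution per triple matching (?s, p, o)."""
    for s, tp, to in triples:
        if tp == p and to == o:
            yield {"s": s}


def count_then_slice(triples, p, o, limit, offset):
    # Aggregation first: every match is read, and one row comes out.
    rows = [{"cnt": sum(1 for _ in match(triples, p, o))}]
    # Slice second: OFFSET/LIMIT act on the aggregated rows only.
    return list(islice(rows, offset, offset + limit))


A = "rdf:type"
ITEM = "http://wikiba.se/ontology-beta#Item"
data = [("ex:Q1", A, ITEM), ("ex:Q2", A, ITEM), ("ex:Q1", "ex:label", "one")]

print(count_then_slice(data, A, ITEM, limit=10, offset=0))         # [{'cnt': 2}]
print(count_then_slice(data, A, ITEM, limit=10, offset=20000000))  # []
```

Under this reading the large OFFSET does not reduce the work at all: the full scan still happens, and the result is simply empty.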