Hello,

We need to do paging in our app, and since we don’t have a single property on 
which we can do a WHERE cutoff, we use OFFSET/LIMIT. OFFSET is fairly fast for 
small values; however, when we add an ORDER BY clause (needed because, per the 
SPARQL 1.1 spec, OFFSET is not guaranteed to make sense without one), queries 
become quite slow (5s vs <<1s).

The query we run:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DESCRIBE ?s
WHERE
{
    GRAPH <urn:x-arq:DefaultGraph>
    {
        SELECT DISTINCT ?s
        WHERE
        { ?s ?p ?o ;
             rdf:type <someType>
        }
        ORDER BY ASC(?s)
        OFFSET 100
        LIMIT 21
    }
}

The “optimization” below is fast (<<1s eval time), but obviously it does not 
guarantee that the inner LIMIT always selects the same rows, and paging beyond 
### items won’t work:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DESCRIBE ?s
WHERE
{
    GRAPH <urn:x-arq:DefaultGraph>
    {
        SELECT DISTINCT ?s
        WHERE
        { ?s ?p ?o ;
             rdf:type <someType>
        }
        LIMIT 1000
    }
}
ORDER BY ASC(?s)
OFFSET 100
LIMIT 21

We’ve read that the order is mostly stable even without ORDER BY [1]. Is there 
a better way to do this? We assumed that sorting on the subject URI would 
allow some optimization, but that does not seem to be the case. Would it be 
possible to add some way to sort on NodeId?

Also, there is a problem with the OFFSET/LIMIT approach even without an ORDER 
BY as the OFFSET grows [2]. What is the recommended approach to stable paging 
that scales well? In SQL, the seek method [3] is considered appropriate for 
most DBs. I tried replicating that by adding a random int ID to every resource 
and using FILTER(?ord > ###) to do the paging, passing the max value from one 
page as the argument for the next. However, this again is fast only if the 
ORDER BY clause is missing, and ORDER BY seems to be essential to get a stable 
sort (though the approach at least allows adding a custom sort order without 
requiring an ORDER BY clause, if one wishes to live a dangerous life). What is 
the best way to do stable paging in Jena/SPARQL?
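
For illustration, the seek-style query has roughly the following shape. This is 
only a sketch: the property <urn:example:ord> holding the random int ID and the 
value 12345 (standing in for the max ?ord of the previous page) are 
placeholders, not our real data.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
DESCRIBE ?s
WHERE
{
    GRAPH <urn:x-arq:DefaultGraph>
    {
        SELECT ?s
        WHERE
        {
            # <urn:example:ord> is a hypothetical property for the int ID
            ?s rdf:type <someType> ;
               <urn:example:ord> ?ord .
            # 12345 is a placeholder for the max ?ord of the previous page
            FILTER(?ord > 12345)
        }
        # needed for a stable order, but this is the part that is slow for us
        ORDER BY ASC(?ord)
        LIMIT 21
    }
}
```

Dropping the ORDER BY line gives the fast but unordered variant mentioned above.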

Thanks in advance,
Andrew

[1]: https://markmail.org/search/?q=offset+order+list%3Aorg.apache.incubator.jena-users#query:offset%20order%20list%3Aorg.apache.incubator.jena-users+page:1+mid:fik5kllpnd4sm3tl+state:results
[2]: https://markmail.org/search/?q=offset+list%3Aorg.apache.incubator.jena-users#query:offset%20list%3Aorg.apache.incubator.jena-users+page:1+mid:uhzxkaxstbzfetns+state:results
[3]: https://blog.jooq.org/faster-sql-paging-with-jooq-using-the-seek-method/
