Are you using TDB? Because it does have indices: https://jena.apache.org/documentation/tdb/architecture.html#triple-and-quad-indexes
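
For the paging approach discussed in the thread below (a separate count query, plus ORDER BY with LIMIT/OFFSET), here is a minimal sketch in Python of how the two query strings and the page bound could be built. The graph pattern, the example.org IRIs, and the helper names `count_query`, `page_query` and `page_count` are all hypothetical and not part of Jena; sending the queries to an endpoint is left out.

```python
import math

# Hypothetical graph pattern; replace with the body of the real WHERE clause.
PATTERN = "?item a <http://example.org/Item> ; <http://example.org/name> ?name ."


def count_query(pattern: str) -> str:
    """Build a SPARQL query returning the total number of solutions."""
    return f"SELECT (COUNT(*) AS ?total) WHERE {{ {pattern} }}"


def page_query(pattern: str, order_by: str, page: int, page_size: int) -> str:
    """Build the SPARQL query for one page (pages are numbered from 1).

    ORDER BY is included because LIMIT/OFFSET paging is only guaranteed
    to be stable over sorted results.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    return (
        f"SELECT * WHERE {{ {pattern} }} "
        f"ORDER BY {order_by} LIMIT {page_size} OFFSET {offset}"
    )


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to show `total` results."""
    return math.ceil(total / page_size)


if __name__ == "__main__":
    print(count_query(PATTERN))
    print(page_query(PATTERN, "?name", page=3, page_size=20))
```

The count would be run once and cached by the application, then reused to bound the page number while the user pages back and forth; only the page query is re-run per request.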

On Tue, 9 Mar 2021 at 17.58, Donald McIntosh <dmcint...@opentechnology.net> wrote:

> Thanks all for replies.
>
> Jean-Claude - yes I assume the same. It just scrolls through until it
> gets to the offset, which means the query runtime will scale exactly
> with size of the offset with no optimizations or re-use between
> executions to make it faster. Without indexes, I cannot see a way
> around this and think it is just a feature of a basic triple store which
> is optimized for RDF flexibility over strict structure and indexes etc.
>
> Lorenz & Martynas - yes if I want to know the total data set size then I
> can just do a query for the count(*) which will be slow first time but
> can be re-used as they page back and forth. And then I can use
> limit/offset and also have the count to set a bounding limit to total
> number of pages. It's not a huge set of data so I think this is fine.
>
> Ultimately however, it's definitely not a scalable solution. If two
> users run the same paging screen at the same time on my system, it will
> double the work, which is already sub-optimal due to lack of indexes.
> I guess - and I do not mean this as a criticism - this is just the
> nature of Jena and perhaps RDF datastores more broadly. Their "sweet
> spot" is exploring linked data in a natural item-to-item way, rather
> than sub-selecting for use with say a UI where there are multiple users
> and responsiveness is important. I should mitigate by running slow
> reports when quiet and save the output... or is there a Jena/RDF feature
> that I am not leveraging that could help here?
>
> On 09/03/2021 09:34, Jean-Claude Moissinac wrote:
> > I think we have a performance issue there.
> > When the complete result is big, and you take a slice with
> > LIMIT/OFFSET, I suspect the Jena implementation goes through the
> > dataset until the offset, then it gets the result. It doesn't take
> > advantage of a previous offset/limit with the same query; so, going
> > through the complete result page by page using LIMIT/OFFSET can
> > become infeasible because the response time increases with the
> > OFFSET value.
> > Is there a best practice for such a use case?
> > --
> > Jean-Claude Moissinac
> >
> > On Tue, 9 Mar 2021 at 10:16, Lorenz Buehmann
> > <buehm...@informatik.uni-leipzig.de> wrote:
> >
> >> just some comments as you already got the answer:
> >>
> >> pagination in SPARQL can only be done via limit + offset, as you
> >> already have figured out - but, formally, it is only guaranteed to
> >> be correct when sorting the data.
> >>
> >> depending on the size of the data this can be expensive - especially
> >> what people always find strange is that the offset operator in
> >> SPARQL is not as simple as it might be in SQL because of the
> >> semantics of SPARQL. There isn't a "simple" cursor like in an SQL
> >> database, so it might be rather slow for large offsets.
> >>
> >> Jena usually doesn't know the result size during query execution as
> >> it's (afaik) using a pipelined execution (aka lazy or Volcano) -
> >> only for operations where it has to have the whole intermediate
> >> result computed to proceed to the next stage (e.g. aggregates) this
> >> assumption holds.
> >>
> >> long story short: if you really think that a user needs to see all
> >> pages, as already suggested, a count in a separate query before
> >> would do it.
> >>
> >> On 09.03.21 09:52, Donald McIntosh wrote:
> >>> Hi..
> >>>
> >>> I have an implementation where I would like to page through data
> >>> retrieved via a SPARQL query on Apache Jena on a UI. offset and
> >>> limit features take me some of the way there but do not tell me
> >>> the full size of the overall result set so that users can skip to
> >>> the end or to page x knowing that it will exist. I am guessing
> >>> that internally Jena will know the result set size from a query
> >>> but perhaps this is not available to the caller, as the full set
> >>> will have been retrieved and sorted.
> >>>
> >>> Is there a correct and efficient way to implement this type of
> >>> use case in Apache Jena?
> >>>
> >>> Thanks,
> >>> Donald