Re: Paging data with Jena

Donald McIntosh Tue, 09 Mar 2021 08:58:07 -0800

Thanks all for replies.

Jean-Claude - yes, I assume the same. It just scrolls through until it gets to the offset, which means the query runtime will scale exactly with the size of the offset, with no optimizations or re-use between executions to make it faster. Without indexes, I cannot see a way around this and think it is just a feature of a basic triple store which is optimized for RDF flexibility over strict structure and indexes etc.

Lorenz & Martynas - yes, if I want to know the total data set size then I can just do a query for the count(*), which will be slow the first time but can be re-used as users page back and forth. And then I can use limit/offset and also have the count to set a bounding limit on the total number of pages. It's not a huge set of data so I think this is fine.
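A rough sketch of that count-then-page plan, for illustration only. Python is used here just to build the query strings and do the page arithmetic; the function names, the graph pattern and the sort key are assumptions made up for this example, not part of Jena. An ORDER BY is included because, as noted further down the thread, limit/offset is only guaranteed to give consistent pages over sorted results.

```python
import math


def count_query(pattern: str) -> str:
    """SPARQL query returning the total number of solutions for a graph pattern."""
    return f"SELECT (COUNT(*) AS ?total) WHERE {{ {pattern} }}"


def page_query(pattern: str, order_by: str, page: int, page_size: int) -> str:
    """SPARQL query for one page (pages are numbered from 1).

    order_by should identify rows as uniquely as possible, otherwise rows
    that tie on the sort key may move between pages.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size
    return (
        f"SELECT * WHERE {{ {pattern} }} "
        f"ORDER BY {order_by} LIMIT {page_size} OFFSET {offset}"
    )


def page_count(total: int, page_size: int) -> int:
    """Number of pages needed to show `total` rows."""
    return math.ceil(total / page_size)
```

The count query would be run once and its result kept, so that "skip to the last page" becomes page_query(pattern, order_by, page_count(total, size), size).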

Ultimately, however, it's definitely not a scalable solution. If two users run the same paging screen at the same time on my system, it will double the work, which is already sub-optimal due to the lack of indexes. I guess - and I do not mean this as a criticism - this is just the nature of Jena and perhaps RDF datastores more broadly. Their "sweet spot" is exploring linked data in a natural item-to-item way, rather than sub-selecting for use with, say, a UI where there are multiple users and responsiveness is important. I should mitigate by running slow reports when quiet and saving the output... or is there a Jena/RDF feature that I am not leveraging that could help here?
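For the "save the output" idea, a minimal sketch of an application-side cache, so that two users asking for the same count or the same page only trigger one execution. Everything here is hypothetical: QueryCache and run_query are names invented for the example, and run_query stands in for whatever function actually sends the SPARQL query to the store. It is not thread-safe and never evicts old entries; a real UI would need both.

```python
import time
from typing import Callable, Dict, Tuple


class QueryCache:
    """Tiny time-based cache keyed by query string (hypothetical helper)."""

    def __init__(
        self,
        run_query: Callable[[str], object],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query  # assumed: executes the query against the store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, object]] = {}

    def get(self, query: str) -> object:
        """Return a cached result if it is fresh enough, else run and store it."""
        now = self._clock()
        hit = self._entries.get(query)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        result = self._run_query(query)
        self._entries[query] = (now, result)
        return result
```

The time-to-live is the trade-off: a longer value saves more work but shows staler data if the dataset is being updated.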


On 09/03/2021 09:34, Jean-Claude Moissinac wrote:

I think we have a performance issue there.
When the complete result is big, and you take a slice with LIMIT/OFFSET, I
suspect the Jena implementation goes through the dataset until the
offset, then gets the result. It doesn't take advantage of a previous
offset/limit with the same query; so, going through the complete result
page by page using LIMIT/OFFSET can become infeasible because the response
time increases with the OFFSET value.
Is there a best practice for such a use case?
--
Jean-Claude Moissinac
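To put a number on the concern above, here is a small simulation. It assumes an engine that reaches an offset by producing rows and throwing them away, which is only what this thread suspects of Jena, not something confirmed here. The function names are made up for the example.

```python
from itertools import islice
from typing import Iterator, List, Tuple


def counting_source(n: int, counter: List[int]) -> Iterator[int]:
    """Yield n rows lazily, counting how many the consumer actually pulled."""
    for row in range(n):
        counter[0] += 1
        yield row


def fetch_page(n: int, offset: int, limit: int) -> Tuple[List[int], int]:
    """Return one page plus the number of rows produced to get it."""
    counter = [0]
    rows = list(islice(counting_source(n, counter), offset, offset + limit))
    return rows, counter[0]


def total_rows_pulled(n: int, page_size: int) -> int:
    """Rows produced in total when each page is fetched by an independent query."""
    return sum(fetch_page(n, off, page_size)[1] for off in range(0, n, page_size))
```

Under that assumption, fetching the last page of 1000 rows at 100 per page produces all 1000 rows, and walking through all ten pages produces 100 + 200 + ... + 1000 = 5500 rows in total, so the cost of a full walk grows roughly with the square of the result size.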



On Tue, 9 Mar 2021 at 10:16, Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> wrote:

just some comments as you already got the answer:

pagination in SPARQL can only be done via limit + offset, as you already
have figured out - but, formally, it is only guaranteed to be correct
when sorting the data.

depending on the size of the data this can be expensive - especially
what people always find strange is that the offset operator in SPARQL is not
as simple as it might be in SQL because of the semantics of SPARQL.
There isn't a "simple" cursor like in an SQL database, so it might be
rather slow for large offsets.

Jena usually doesn't know the result size during query execution as it's
(afaik) using a pipelined execution (aka lazy or Volcano) - only for
operations where it has to have the whole intermediate result computed to
proceed to the next stage (e.g. aggregates) does this assumption hold.
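A toy illustration of the pipelined (pull-based) style described above, using plain Python generators. This is not Jena's code; scan, select, limit and count are stand-in operators invented for the example. It shows why a limited query never learns the full result size, while a count has to drain its whole input.

```python
from typing import Callable, Iterable, Iterator, List


def scan(rows: Iterable[int], log: List[int]) -> Iterator[int]:
    """Leaf operator: hands out rows one at a time and logs each row it reads."""
    for row in rows:
        log.append(row)
        yield row


def select(source: Iterator[int], predicate: Callable[[int], bool]) -> Iterator[int]:
    """Filter operator: passes on matching rows as soon as they arrive."""
    return (row for row in source if predicate(row))


def limit(source: Iterator[int], n: int) -> Iterator[int]:
    """Stops pulling from its input once n rows have been passed on."""
    if n <= 0:
        return
    for i, row in enumerate(source, start=1):
        yield row
        if i >= n:
            return


def count(source: Iterator[int]) -> int:
    """Aggregate: must consume the whole input before it can answer."""
    return sum(1 for _ in source)
```

With 100 input rows and a filter for even numbers, limit(..., 3) returns 0, 2 and 4 after the scan has read only rows 0 to 4, whereas count reads all 100 rows before reporting 50.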


long story short: if you really think that a user needs to see all
pages, then, as already suggested, a count in a separate query beforehand
would do it.

On 09.03.21 09:52, Donald McIntosh wrote:

Hi..

I have an implementation where I would like to page through data
retrieved via a SPARQL query on Apache Jena on a UI. The offset and limit
features take me some of the way there but do not tell me the full size of
the overall result set so that users can skip to the end or to page x
knowing that it will exist. I am guessing that internally Jena will know
the result set size from a query but perhaps this is not available to the
caller, as the full set will have been retrieved and sorted.

Is there a correct and efficient way to implement this type of use case
in Apache Jena?

Thanks,
Donald
