Yeah... I am conflating issues a bit, but I don't think this is a client caching issue.  What I am saying is that I expect the implementation of OFFSET to be slow and directly related to the amount of data - and therefore unsuitable for a number of use cases where response time is important.  In a paging use case, it _must_ 'tablescan' the full dataset after re-running the query, then re-sort it to reach the next required offset.  The next page then has to do the same all over again, from scratch.  It cannot jump quickly to an arbitrary index/offset in the data.

Unless there is a better way to use OFFSET, or a way to structure my data so that the offset is quick.  TDB has been mentioned, so I will look at that.
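One alternative worth mentioning (not from this thread, just a common workaround): "keyset" paging avoids OFFSET entirely by filtering on the sort key of the last row of the previous page.  A minimal sketch, assuming hypothetical data sorted by rdfs:label:

```sparql
# Hypothetical keyset-paging sketch: instead of OFFSET, remember the last
# sort-key value from the previous page and filter past it.  The engine
# only needs to produce the next page's rows, not skip over earlier ones.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
  FILTER (?label > "last label seen on previous page")
}
ORDER BY ?label
LIMIT 50
```

This only works when there is a suitable (ideally unique) sort key, and it does not support jumping to an arbitrary page number.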

On 10/03/2021 09:56, Lorenz Buehmann wrote:

On 09.03.21 17:57, Donald McIntosh wrote:


Ultimately, however, it's definitely not a scalable solution.  If two users run the same paging screen at the same time on my system, it will double the work... which is already sub-optimal due to the lack of indexes.  I guess - and I do not mean this as a criticism - this is just the nature of Jena, and perhaps of RDF datastores more broadly.  Their "sweet spot" is exploring linked data in a natural item-to-item way, rather than sub-selecting for use with, say, a UI where there are multiple users and responsiveness is important.  I should mitigate by running slow reports when the system is quiet and saving the output... or is there a Jena/RDF feature that I am not leveraging that could help here?

From my point of view, this is a task for some caching mechanism.  Several solutions exist for caching at different steps in your workflow.  While this could indeed be done at the database level, in my view it is more of a client-side task.

Or did you mean some different issue here?


On 09/03/2021 09:34, Jean-Claude Moissinac wrote:
I think we have a performance issue here.
When the complete result is big and you take a slice with LIMIT/OFFSET, I
suspect the Jena implementation walks through the dataset up to the
offset and only then collects the results. It does not take advantage of a
previous OFFSET/LIMIT with the same query; so, going through the complete
result page by page using LIMIT/OFFSET can become infeasible, because the
response time increases with the OFFSET value.
Is there a best practice for such a use case?
--
Jean-Claude Moissinac



Le mar. 9 mars 2021 à 10:16, Lorenz Buehmann <
buehm...@informatik.uni-leipzig.de> a écrit :

just some comments as you already got the answer:

pagination in SPARQL can only be done via LIMIT + OFFSET, as you have
already figured out - but, formally, it is only guaranteed to be correct
when the data is sorted.
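For illustration, a minimal sorted paging query over hypothetical data might look like this:

```sparql
# Plain LIMIT/OFFSET paging: only deterministic with an ORDER BY clause.
# Note the engine still has to compute (and sort) all rows up to the
# offset before it can return the requested page.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?item ?label WHERE {
  ?item rdfs:label ?label .
}
ORDER BY ?label
LIMIT 50      # page size
OFFSET 100    # start of page 3 with 50 rows per page
```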

depending on the size of the data this can be expensive - in particular,
what people often find strange is that the OFFSET operator in SPARQL is not
as cheap as it might be in SQL, because of the semantics of SPARQL.
There is no "simple" cursor like in an SQL database, so it can be
rather slow for large offsets.

Jena usually doesn't know the result size during query execution, as it
(afaik) uses pipelined execution (aka lazy, or Volcano-style) - the size is
only known for operations that must compute the whole intermediate result
before proceeding to the next stage (e.g. aggregates).


long story short: if you really think that a user needs to see all
pages then, as already suggested, a separate count query run beforehand would do it.
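The separate count query mentioned above could look like this (assuming the same hypothetical pattern as the paging query):

```sparql
# Run once before paging: the total lets the UI compute the number of
# pages, so users can jump to the last page knowing it exists.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT (COUNT(*) AS ?total) WHERE {
  ?item rdfs:label ?label .
}
```

The WHERE clause must match the paging query exactly, or the count and the pages will disagree.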

On 09.03.21 09:52, Donald McIntosh wrote:
Hi..

I have an implementation where I would like to page through data
retrieved via a SPARQL query on Apache Jena in a UI.  The OFFSET and LIMIT
features take me some of the way there, but they do not tell me the full size of
the overall result set, so users cannot skip to the end or to page x
knowing that it will exist.  I am guessing that internally Jena will know
the result set size from a query, but perhaps this is not available to the
caller, as the full set will have been retrieved and sorted.
Is there a correct and efficient way to implement this type of use case
in Apache Jena?
Thanks,
Donald
