Re: Paging data with Jena

Martynas Jusevičius Tue, 09 Mar 2021 09:34:15 -0800

Are you using TDB? It does have indices:
https://jena.apache.org/documentation/tdb/architecture.html#triple-and-quad-indexes



On Tue, 9 Mar 2021 at 17.58, Donald McIntosh <dmcint...@opentechnology.net> wrote:

> Thanks all for replies.
>
> Jean-Claude - yes, I assume the same. It just scrolls through until it
> gets to the offset, which means the query runtime grows with the size
> of the offset, with no optimization or reuse between executions to make
> it faster. Without indexes, I cannot see a way around this, and I think
> it is just a feature of a basic triple store, which is optimized for RDF
> flexibility over strict structure, indexes, etc.
>
> Lorenz & Martynas - yes, if I want to know the total data set size, I
> can just run a COUNT(*) query, which will be slow the first time but
> can be reused as users page back and forth. Then I can use
> LIMIT/OFFSET and also use the count to bound the total number of
> pages. It's not a huge set of data, so I think this is fine.
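A minimal sketch of the count-once-then-page idea above. It assumes one COUNT query per distinct query text, cached and shared across users, so two people on the same paging screen do not both pay for the count. `PagedResult` and the `run_count` callback are hypothetical names, not Jena APIs. `run_count` stands for whatever actually executes the COUNT, whether that is Jena in-process or a Fuseki endpoint.

```python
from typing import Callable, Dict


class PagedResult:
    """Caches one COUNT per query text and derives page bounds from it.

    `run_count` is a hypothetical callback that executes the COUNT query
    and returns the integer total; it is only called on a cache miss.
    """

    def __init__(self, run_count: Callable[[str], int], page_size: int):
        if page_size < 1:
            raise ValueError("page_size must be >= 1")
        self._run_count = run_count
        self.page_size = page_size
        self._totals: Dict[str, int] = {}

    def total(self, query: str) -> int:
        # Slow the first time for a given query, then served from the cache.
        if query not in self._totals:
            self._totals[query] = self._run_count(query)
        return self._totals[query]

    def page_count(self, query: str) -> int:
        # Integer ceiling division: 95 rows at 10 per page -> 10 pages.
        return -(-self.total(query) // self.page_size)

    def offset_for(self, query: str, page: int) -> int:
        # UI pages are 1-based; SPARQL OFFSET is 0-based.
        last = self.page_count(query)
        if not 1 <= page <= last:
            raise ValueError(f"page {page} outside 1..{last}")
        return (page - 1) * self.page_size
```

With a total of 95 and `page_size=10`, `page_count` gives 10 and `offset_for(q, 10)` gives 90. The cache is deliberately naive: it has no locking and no invalidation, so a cached total goes stale if the data changes.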
>
> Ultimately, however, it's definitely not a scalable solution. If two
> users open the same paging screen at the same time on my system, it will
> double the work, which is already sub-optimal due to the lack of indexes.
> I guess (and I do not mean this as a criticism) this is just the
> nature of Jena and perhaps of RDF datastores more broadly. Their "sweet
> spot" is exploring linked data in a natural item-to-item way, rather
> than sub-selecting for use in, say, a UI where there are multiple users
> and responsiveness is important. I should mitigate by running slow
> reports when it is quiet and saving the output... or is there a Jena/RDF
> feature that I am not leveraging that could help here?
>
> On 09/03/2021 09:34, Jean-Claude Moissinac wrote:
> > I think we have a performance issue here.
> > When the complete result is big and you take a slice with LIMIT/OFFSET, I
> > suspect the Jena implementation goes through the dataset up to the
> > offset and only then returns the results. It doesn't take advantage of a
> > previous OFFSET/LIMIT with the same query, so going through the complete
> > result page by page using LIMIT/OFFSET can become infeasible, because the
> > response time increases with the OFFSET value.
> > Is there a best practice for such a use case?
> > --
> > Jean-Claude Moissinac
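Here is a toy model, in plain Python rather than Jena, of why cost grows with OFFSET under lazy evaluation. An OFFSET/LIMIT slice over an iterator has to pull and discard the first `offset` rows before it can return anything, and nothing is remembered between two runs. `CountingSource` and `fetch_page` are hypothetical instrumentation for illustration only. This is not a claim about Jena's internals.

```python
from itertools import islice


class CountingSource:
    """Stands in for the underlying scan; counts how many rows it produces."""

    def __init__(self, n_rows: int):
        self.n_rows = n_rows
        self.pulled = 0

    def __iter__(self):
        for i in range(self.n_rows):
            self.pulled += 1  # one unit of "work" per row produced
            yield i


def fetch_page(n_rows: int, offset: int, limit: int):
    """Run one 'query' from scratch; return (rows, rows pulled from source)."""
    source = CountingSource(n_rows)
    # islice skips `offset` rows, then yields at most `limit` rows.
    rows = list(islice(source, offset, offset + limit))
    return rows, source.pulled
```

Page 1 of 10 rows costs 10 pulls, while the page at offset 900 costs 910 pulls, even though both return 10 rows.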
> >
> >
> >
> > On Tue, 9 Mar 2021 at 10:16, Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:
> >
> >> Just some comments, as you have already got the answer:
> >>
> >> Pagination in SPARQL can only be done via LIMIT + OFFSET, as you have
> >> already figured out, but formally it is only guaranteed to be correct
> >> when the data is sorted (ORDER BY).
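This is a minimal sketch of the sorted-pagination rule above. It assumes the base query is a complete `SELECT ... WHERE { ... }` with no ORDER BY/LIMIT/OFFSET of its own. `paged_query` is a hypothetical helper, and the output is ordinary SPARQL text for whatever client is already in use.

```python
def paged_query(select_body: str, order_by: str, page: int, page_size: int) -> str:
    """Build one page of a SPARQL SELECT using ORDER BY + LIMIT + OFFSET.

    `order_by` should give a total order (end it with a unique key such
    as ?s); SPARQL leaves the order of ties unspecified, so without one
    rows may move between pages across executions.
    """
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    offset = (page - 1) * page_size  # 1-based UI page -> 0-based OFFSET
    return f"{select_body}\nORDER BY {order_by}\nLIMIT {page_size}\nOFFSET {offset}"
```

For example, `paged_query("SELECT ?s ?label WHERE { ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label }", "?label ?s", 3, 20)` ends with `ORDER BY ?label ?s`, `LIMIT 20`, `OFFSET 40`. Note that ORDER BY makes the engine see every solution before returning the first page, so sorting adds its own cost on large results.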
> >>
> >> Depending on the size of the data, this can be expensive. What people
> >> always find strange is that the OFFSET operator in SPARQL is not as
> >> simple as it might be in SQL, because of the semantics of SPARQL.
> >> There isn't a "simple" cursor like in an SQL database, so it might be
> >> rather slow for large offsets.
> >>
> >> Jena usually doesn't know the result size during query execution, as it
> >> (AFAIK) uses pipelined execution (a.k.a. lazy or Volcano-style). Only for
> >> operations where it has to compute the whole intermediate result before
> >> proceeding to the next stage (e.g. aggregates) is the size known.
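The difference between streaming and blocking operators described above can be shown with plain Python generators. This is an analogy only, with hypothetical stage names, and it is not Jena's operator model. A streaming stage hands over its first row immediately but cannot report its size. A COUNT-like stage must drain its entire input before it can answer.

```python
from typing import Iterable, Iterator


def scan(n: int) -> Iterator[int]:
    # Stand-in for reading rows from storage one at a time.
    yield from range(n)


def filter_even(rows: Iterable[int]) -> Iterator[int]:
    # Streaming (pipelined) operator: emits each match as soon as it is found.
    return (r for r in rows if r % 2 == 0)


def count(rows: Iterable[int]) -> int:
    # Blocking operator, like COUNT: must consume its whole input first.
    return sum(1 for _ in rows)


pipeline = filter_even(scan(10_000))
first = next(pipeline)  # available after a single pull: 0
# len(pipeline) would raise TypeError: a generator has no size until drained.
total = count(filter_even(scan(10_000)))  # drains every row: 5000
```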
> >>
> >>
> >> Long story short: if you really think that a user needs to see all
> >> pages, then, as already suggested, a separate count query beforehand
> >> would do it.
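A sketch of that separate count query. It assumes the page query's SELECT is reused verbatim as a SPARQL 1.1 subquery, so the count and the pages describe the same solutions. `count_query` and `total_from_json` are hypothetical helpers. The parser reads the standard SPARQL 1.1 Query Results JSON layout (`results` → `bindings` → variable → `value`), which is what Fuseki returns for `application/sparql-results+json`.

```python
import json


def count_query(select_body: str) -> str:
    """Wrap an existing SELECT as a subquery and count its solutions."""
    # Doubled braces are literal { } in an f-string.
    return f"SELECT (COUNT(*) AS ?total) WHERE {{\n{select_body}\n}}"


def total_from_json(results_json: str) -> int:
    """Read ?total from a SPARQL 1.1 Query Results JSON document."""
    doc = json.loads(results_json)
    binding = doc["results"]["bindings"][0]["total"]
    return int(binding["value"])  # the literal value arrives as a string
```

Because the inner SELECT is reused as-is, any DISTINCT in it also applies to what gets counted.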
> >>
> >> On 09.03.21 09:52, Donald McIntosh wrote:
> >>> Hi..
> >>>
> >>> I have an implementation where I would like to page through data
> >>> retrieved via a SPARQL query on Apache Jena in a UI. The OFFSET and LIMIT
> >>> features take me some of the way there, but they do not tell me the full
> >>> size of the overall result set, so users cannot skip to the end or to
> >>> page x knowing that it will exist. I am guessing that internally Jena
> >>> will know the result set size from a query, but perhaps this is not
> >>> available to the caller, as the full set will have been retrieved and
> >>> sorted.
> >>> Is there a correct and efficient way to implement this type of use case
> >>> in Apache Jena?
> >>> Thanks,
> >>> Donald
>
