Re: Paging data with Jena

Jean-Claude Moissinac Tue, 09 Mar 2021 09:54:03 -0800

When possible, I use a partitioning strategy and split a big set of
entities across several graphs, so the response time for each graph is
acceptable, but it pushes some complexity onto the application and the
queries. For example, with a lot of entities that have a date property,
I split the set by month.
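
A minimal sketch of this kind of month-based partitioning, in Python, might
look like the following. The graph IRI scheme, the date property IRI and the
function names are assumptions made for illustration; none of them come from
Jena itself. It only builds query strings, which the application would then
send to the store.

```python
from datetime import date
from typing import List

# Hypothetical naming scheme: one named graph per calendar month.
GRAPH_BASE = "http://example.org/graph/"    # assumed base IRI
DATE_PROPERTY = "http://example.org/date"   # assumed date predicate


def graph_iri_for(day: date) -> str:
    """Return the IRI of the monthly graph that holds entities dated `day`."""
    return f"{GRAPH_BASE}{day.year:04d}-{day.month:02d}"


def graphs_between(start: date, end: date) -> List[str]:
    """List the monthly graph IRIs covering start..end (inclusive)."""
    iris = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        iris.append(graph_iri_for(date(year, month, 1)))
        # Step to the next calendar month, rolling over the year in December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return iris


def page_query(graph_iri: str, limit: int, offset: int) -> str:
    """Build a paged SPARQL query restricted to a single monthly graph."""
    return (
        "SELECT ?entity ?date WHERE {\n"
        f"  GRAPH <{graph_iri}> {{ ?entity <{DATE_PROPERTY}> ?date }}\n"
        "}\n"
        "ORDER BY ?date ?entity\n"
        f"LIMIT {limit} OFFSET {offset}"
    )
```

The application picks the graph (or graphs) for the requested date range and
pages within each one, so OFFSET values stay small. The cost is the extra
bookkeeping shown here, which is the complexity mentioned above.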


--
Jean-Claude Moissinac



On Tue, 9 Mar 2021 at 18:34, Martynas Jusevičius <marty...@atomgraph.com>
wrote:

> Are you using TDB? Because it does have indices:
>
> https://jena.apache.org/documentation/tdb/architecture.html#triple-and-quad-indexes
>
>
> On Tue, 9 Mar 2021 at 17.58, Donald McIntosh <dmcint...@opentechnology.net>
> wrote:
>
> > Thanks all for replies.
> >
> > Jean-Claude - yes I assume the same.  It just scrolls through until it
> > gets to the offset, which means the query runtime will scale exactly
> > with size of the offset with no optimizations or re-use between
> > executions to make it faster.  Without indexes, I cannot see a way
> > around this and think it is just a feature of a basic triple store which
> > is optimized for RDF flexibility over strict structure and indexes etc.
> >
> > Lorenz & Martynas - yes if I want to know the total data set size then I
> > can just do a query for the count(*) which will be slow first time but
> > can be re-used as they page back and forth. And then I can use
> > limit/offset and also have the count to set a bounding limit to total
> > number of pages.  It's not a huge set of data so I think this is fine.
> >
> > Ultimately however, it's definitely not a scalable solution.  If two
> > users run the same paging screen at the same time on my system, it will
> > double the work.. which is already sub-optimal due to lack of indexes.
> > I guess - and I do not mean this as a criticism - this is just the
> > nature of Jena and perhaps RDF datastores more broadly.  Their "sweet
> > spot" is exploring linked data in a natural item-to-item way, rather
> > than sub-selecting for use with say a UI where there are multiple users
> > and responsiveness is important.  I should mitigate by running slow
> > reports when quiet and save the output... or is there a Jena/RDF feature
> > that I am not leveraging that could help here?
> >
> > On 09/03/2021 09:34, Jean-Claude Moissinac wrote:
> > > I think we have a performance issue there.
> > > When the complete result is big and you take a slice with LIMIT/OFFSET,
> > > I suspect the Jena implementation goes through the dataset until the
> > > offset, then it gets the result. It doesn't take advantage of a previous
> > > offset/limit with the same query; so, going through the complete result
> > > page by page using LIMIT/OFFSET can become infeasible because the
> > > response time increases with the OFFSET value.
> > > Is there a best practice for such a use case?
> > > --
> > > Jean-Claude Moissinac
> > >
> > >
> > >
> > > On Tue, 9 Mar 2021 at 10:16, Lorenz Buehmann <
> > > buehm...@informatik.uni-leipzig.de> wrote:
> > >
> > >> just some comments as you already got the answer:
> > >>
> > >> pagination in SPARQL can only be done via limit + offset, as you
> > >> already have figured out - but, formally, it is only guaranteed to be
> > >> correct when sorting the data.
> > >>
> > >> depending on the size of the data this can be expensive - especially
> > >> what people always find strange is that the offset operator in SPARQL
> > >> is not as simple as it might be in SQL because of the semantics of
> > >> SPARQL. There isn't a "simple" cursor like in an SQL database, so it
> > >> might be rather slow for large offsets.
> > >>
> > >> Jena usually doesn't know the result size during query execution as
> > >> it's (afaik) using a pipelined execution (aka lazy or Volcano) - only
> > >> for operations where it has to have the whole intermediate result
> > >> computed to proceed to the next stage (e.g. aggregates) does this
> > >> assumption hold.
> > >>
> > >>
> > >> long story short: if you really think that a user needs to see all
> > >> pages, as already suggested, a count in a separate query beforehand
> > >> would do it.
> > >>
> > >> On 09.03.21 09:52, Donald McIntosh wrote:
> > >>> Hi..
> > >>>
> > >>> I have an implementation where I would like to page through data
> > >>> retrieved via a SPARQL query on Apache Jena on a UI.  offset and limit
> > >>> features take me some of the way there but do not tell me the full
> > >>> size of the overall result set so that users can skip to the end or
> > >>> to page x knowing that it will exist.  I am guessing that internally
> > >>> Jena will know the result set size from a query, as the full set will
> > >>> have been retrieved and sorted, but perhaps this is not available to
> > >>> the caller.
> > >>> Is there a correct and efficient way to implement this type of use
> > >>> case in Apache Jena?
> > >>> Thanks,
> > >>> Donald
> >
>
