When possible, I use a partitioning strategy and split a big set of entities across several graphs; that way the response time for each graph is acceptable, but it pushes some complexity into the application and the queries. For example, with a lot of entities that have a date property, I split the set by month.
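A minimal sketch of that month-based partitioning, assuming each entity is stored in a named graph whose IRI encodes its year and month. The `http://example.org/graph/YYYY-MM` scheme, the `ex:date` property and the helper names are hypothetical; the code only builds SPARQL strings. A query over a date range has to enumerate every monthly graph it touches, which is where the extra complexity in the application and queries comes from.

```python
from datetime import date

# Hypothetical naming scheme: one named graph per calendar month.
GRAPH_BASE = "http://example.org/graph/"


def month_graph(d: date) -> str:
    """IRI of the (hypothetical) named graph holding entities dated in d's month."""
    return f"<{GRAPH_BASE}{d.year:04d}-{d.month:02d}>"


def months_between(start: date, end: date) -> list[str]:
    """Graph IRIs for every month from start to end, inclusive."""
    graphs = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        graphs.append(month_graph(date(y, m, 1)))
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return graphs


def range_query(start: date, end: date, limit: int = 100) -> str:
    """SPARQL query that only visits the monthly graphs covering [start, end]."""
    graphs = " ".join(months_between(start, end))
    return (
        "PREFIX ex: <http://example.org/ns#>\n"
        "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
        "SELECT ?s ?date WHERE {\n"
        f"  VALUES ?g {{ {graphs} }}\n"
        "  GRAPH ?g { ?s ex:date ?date }\n"
        f'  FILTER (?date >= "{start.isoformat()}"^^xsd:date'
        f' && ?date <= "{end.isoformat()}"^^xsd:date)\n'
        "}\n"
        "ORDER BY ?date ?s\n"
        f"LIMIT {limit}"
    )
```

The same `month_graph` mapping must be applied when writing data, so every insert lands in the graph for its month.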
--
Jean-Claude Moissinac


On Tue, 9 Mar 2021 at 18:34, Martynas Jusevičius <marty...@atomgraph.com> wrote:

> Are you using TDB? Because it does have indices:
>
> https://jena.apache.org/documentation/tdb/architecture.html#triple-and-quad-indexes
>
> On Tue, 9 Mar 2021 at 17.58, Donald McIntosh <dmcint...@opentechnology.net> wrote:
>
> > Thanks all for the replies.
> >
> > Jean-Claude - yes, I assume the same. It just scrolls through until it
> > gets to the offset, which means the query runtime will scale directly
> > with the size of the offset, with no optimizations or reuse between
> > executions to make it faster. Without indexes, I cannot see a way
> > around this, and I think it is just a feature of a basic triple store,
> > which is optimized for RDF flexibility over strict structure, indexes, etc.
> >
> > Lorenz & Martynas - yes, if I want to know the total data set size, I
> > can just run a count(*) query, which will be slow the first time but
> > can be reused as users page back and forth. Then I can use
> > limit/offset and also use the count to bound the total number of
> > pages. It's not a huge set of data, so I think this is fine.
> >
> > Ultimately, however, it's definitely not a scalable solution. If two
> > users open the same paging screen at the same time on my system, it will
> > double the work, which is already sub-optimal due to the lack of indexes.
> > I guess - and I do not mean this as a criticism - this is just the
> > nature of Jena and perhaps RDF datastores more broadly. Their "sweet
> > spot" is exploring linked data in a natural item-to-item way, rather
> > than sub-selecting for use with, say, a UI where there are multiple users
> > and responsiveness is important. I should mitigate by running slow
> > reports at quiet times and saving the output... or is there a Jena/RDF
> > feature that I am not leveraging that could help here?
> >
> > On 09/03/2021 09:34, Jean-Claude Moissinac wrote:
> > > I think we have a performance issue there.
> > > When the complete result is big and you take a slice with LIMIT/OFFSET,
> > > I suspect the Jena implementation goes through the dataset until it
> > > reaches the offset, then it gets the result. It doesn't take advantage
> > > of a previous OFFSET/LIMIT with the same query, so going through the
> > > complete result page by page using LIMIT/OFFSET can become infeasible,
> > > because the response time increases with the OFFSET value.
> > > Is there a best practice for such a use case?
> > > --
> > > Jean-Claude Moissinac
> > >
> > > On Tue, 9 Mar 2021 at 10:16, Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:
> > >
> > > > just some comments, as you already got the answer:
> > > >
> > > > pagination in SPARQL can only be done via limit + offset, as you have
> > > > already figured out - but, formally, it is only guaranteed to be
> > > > correct when sorting the data.
> > > >
> > > > depending on the size of the data this can be expensive - what people
> > > > always find strange is that the offset operator in SPARQL is not as
> > > > simple as it might be in SQL, because of the semantics of SPARQL.
> > > > There isn't a "simple" cursor like in an SQL database, so it might be
> > > > rather slow for large offsets.
> > > >
> > > > Jena usually doesn't know the result size during query execution, as
> > > > it's (afaik) using pipelined execution (aka lazy or Volcano). Only for
> > > > operations where it has to compute the whole intermediate result
> > > > before proceeding to the next stage (e.g. aggregates) does it know
> > > > the size.
> > > >
> > > > long story short: if you really think that a user needs to see all
> > > > pages, then, as already suggested, a count in a separate query
> > > > beforehand would do it.
> > > >
> > > > On 09.03.21 09:52, Donald McIntosh wrote:
> > > > > Hi..
> > > > >
> > > > > I have an implementation where I would like to page through data
> > > > > retrieved via a SPARQL query on Apache Jena in a UI. The offset and
> > > > > limit features take me some of the way there, but they do not tell
> > > > > me the full size of the overall result set, which I need so that
> > > > > users can skip to the end or to page x knowing that it will exist.
> > > > > I am guessing that internally Jena will know the result set size
> > > > > from a query, but perhaps this is not available to the caller, as
> > > > > the full set will have been retrieved and sorted.
> > > > > Is there a correct and efficient way to implement this type of use
> > > > > case in Apache Jena?
> > > > > Thanks,
> > > > > Donald
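A minimal sketch of the approach settled on in this thread: run a separate COUNT query once, cache the total, derive the number of pages from it, and fetch each page with ORDER BY plus LIMIT/OFFSET so the slices are deterministic. Python is used only to build the query strings; the `ex:` vocabulary, the `WHERE` pattern, `PAGE_SIZE`, and the `run_count` callback (whatever actually executes the COUNT against the store) are hypothetical placeholders.

```python
import math
from typing import Callable

PAGE_SIZE = 50  # hypothetical UI page size
PREFIXES = "PREFIX ex: <http://example.org/ns#>\n"
WHERE = "?item a ex:Item ; ex:label ?label ."  # hypothetical graph pattern

# Shared across sessions, so concurrent users reuse one COUNT result.
_count_cache: dict[str, int] = {}


def count_query() -> str:
    """Separate COUNT query; slow the first time, then served from the cache."""
    return PREFIXES + f"SELECT (COUNT(*) AS ?total) WHERE {{ {WHERE} }}"


def cached_total(run_count: Callable[[str], int], query: str) -> int:
    """Execute the COUNT only if this exact query has not been counted yet."""
    if query not in _count_cache:
        _count_cache[query] = run_count(query)
    return _count_cache[query]


def page_count(total: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed for `total` rows (0 rows -> 0 pages)."""
    return math.ceil(total / page_size)


def page_query(page: int, total: int, page_size: int = PAGE_SIZE) -> str:
    """Query for a zero-based page, clamped to the last page that exists.

    ORDER BY (with ?item as a tie-breaker) makes consecutive LIMIT/OFFSET
    windows well defined; without it the slices are not guaranteed to be
    disjoint or complete.
    """
    last = max(page_count(total, page_size) - 1, 0)
    page = min(max(page, 0), last)
    return (
        PREFIXES
        + f"SELECT ?item ?label WHERE {{ {WHERE} }}\n"
        + "ORDER BY ?label ?item\n"
        + f"LIMIT {page_size} OFFSET {page * page_size}"
    )
```

With a cached total of 101 rows, `page_count` reports 3 pages, so a "last page" button maps to page 2 (`OFFSET 100`), and a request for page 10 is clamped to that same page. Note that the cache only removes the repeated COUNT; each page request still pays the OFFSET cost discussed above.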
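A back-of-the-envelope sketch of why paging through the whole result with LIMIT/OFFSET becomes infeasible. It assumes, as suspected in the thread, that a request with OFFSET o and LIMIT k has to produce o + k solutions and that nothing is reused between requests. Under that assumption, reading all N results in pages of k costs about N(N/k + 1)/2 solutions instead of N. The function name is hypothetical.

```python
def rows_produced(total: int, page_size: int) -> int:
    """Solutions produced when every page request re-scans from the start.

    Assumes a request with OFFSET o and LIMIT k produces o + k solutions
    (capped at the result size) and that no work is shared between requests.
    """
    produced = 0
    for offset in range(0, total, page_size):
        produced += min(offset + page_size, total)
    return produced
```

For 1,000,000 results in pages of 100, that is 5,000,500,000 solutions produced, against 1,000,000 for a single pass, which is why the cost is dominated by the deep pages.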