The difference lies, as Andy implied, in the protocol definition.  SPARQL as a 
protocol is tied to HTTP and does not expose any notion of cursors the way a 
traditional RDBMS would.  Once ARQ hands a query to a remote SPARQL service, 
it has no control over how the remote service executes that query.

But yes, you could probably build a full-blown client-side cursor 
implementation in Jena, layered over the basic implementation.

Internally, ARQ itself does streaming execution.  So when Andy refers to top-k, 
he means that ARQ allocates an internal buffer of k results (k = LIMIT + OFFSET) 
when k is below a certain threshold and, as it executes the query, keeps only 
the top k results needed to answer it.  However, this still requires the 
entire query to be evaluated, i.e. you still have to find and order all possible 
results, even if only relative to the top k currently stored.
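
To make the top-k idea concrete, here is a minimal sketch in plain Java.  It is 
not ARQ's actual code; the class and method names (TopK, topK, page) are made up 
for illustration.  It keeps a bounded heap of k = OFFSET + LIMIT items, but note 
that the loop still visits every result the query produces.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration of a bounded top-k buffer; not ARQ's implementation.
final class TopK {

    /** Returns the k best items under `order`, sorted. Visits every input item. */
    static <T> List<T> topK(Iterator<T> results, int k, Comparator<T> order) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Max-heap under `order`: the head is the worst result currently kept.
        PriorityQueue<T> heap = new PriorityQueue<>(k, order.reversed());
        while (results.hasNext()) {
            T item = results.next();
            if (heap.size() < k) {
                heap.add(item);
            } else if (order.compare(item, heap.peek()) < 0) {
                // The new item beats the worst kept item, so it replaces it.
                heap.poll();
                heap.add(item);
            }
        }
        List<T> out = new ArrayList<>(heap);
        out.sort(order);
        return out;
    }

    /** Answers ORDER BY + OFFSET + LIMIT using a buffer of offset + limit items. */
    static <T> List<T> page(Iterator<T> results, int offset, int limit, Comparator<T> order) {
        List<T> top = topK(results, offset + limit, order);
        int from = Math.min(offset, top.size());
        return new ArrayList<>(top.subList(from, top.size()));
    }
}
```

Memory stays bounded by k, but the cost of producing and comparing every result 
remains, which is why deep pages are no cheaper to evaluate than the first one.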

At the Jena client layer you can already execute the query without LIMIT/OFFSET 
(by removing those clauses from the parsed query), copy the results into a 
ResultSetRewindable [1][2], and page through that in your own code.  But 
depending on the query (and dataset), you may be trading memory for time.
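
As a rough sketch of the "page in your own code" part, the class below caches 
rows pulled from any Iterator and serves pages from that cache.  It uses only 
plain Java; PagedResults and its methods are hypothetical names, not Jena API.  
A Jena ResultSet is an Iterator<QuerySolution>, so it could be passed in as the 
source, assuming the QueryExecution is kept open for as long as rows are still 
being pulled.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical client-side pager: pulls rows lazily and caches what it has seen.
final class PagedResults<T> {
    private final Iterator<T> source;
    private final int pageSize;
    private final List<T> cache = new ArrayList<>();

    PagedResults(Iterator<T> source, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.source = source;
        this.pageSize = pageSize;
    }

    /** Returns page `pageIndex` (0-based); empty if the results end before it. */
    List<T> page(int pageIndex) {
        if (pageIndex < 0) {
            throw new IllegalArgumentException("pageIndex must not be negative");
        }
        long end = (long) (pageIndex + 1) * pageSize;
        // Paging forward streams from the source only as far as needed.
        while (cache.size() < end && source.hasNext()) {
            cache.add(source.next());
        }
        // Paging backward is served entirely from the cache.
        int from = (int) Math.min((long) pageIndex * pageSize, cache.size());
        int to = (int) Math.min(end, cache.size());
        return new ArrayList<>(cache.subList(from, to));
    }

    /** Number of rows held in memory so far. */
    int cachedRows() {
        return cache.size();
    }
}
```

The cache grows with every page visited, which is the memory side of the 
trade-off mentioned above.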

Rob

[1] https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/query/ResultSetFactory.html#makeRewindable(org.apache.jena.rdf.model.Model)
[2] https://jena.apache.org/documentation/javadoc/arq/org/apache/jena/query/ResultSetRewindable.html

On 13/05/2021, 23:00, "graham" <gra...@orangedogsoftware.com> wrote:

    Hi

    I am a little confused by this discussion.

    I understand the original poster's question -- what they are talking 
    about is an incredibly common use case. Run query, look at the first 
    1-200 results, add filters, re-run new query, repeat and rinse.

    If this were JDBC the client would run one query with no limits/offsets, 
    so in this example the query with the results ordered by date. The 
    client UI would then implement the paging itself. The client would cache 
    past pages, so that users can page backward, and forward paging is just 
    standard JDBC result streaming. Admittedly if you were clever you could 
    also do a dance with a scrolling result set, although personally I don't 
    find them all that useful.

    What isn't clear to me is why you can't do the same thing with SPARQL? 
    In other words I am unclear where the implementation issues with 
    SPARQL/Jena are occurring.

    thanks

    graham

    I don't know what Jena's policy regarding diverging from the SPARQL 
    standards are, but

    On 14/05/21 7:32 am, Andy Seaborne wrote:
    > On 11/05/2021 16:54, Kimball, Adam wrote:
    >> I know that I’ve asked this question before, but I am still 
    >> struggling to understand how I might handle this case:
    >>
    >> I have a Jena DB of event entries.  One common way to view the events 
    >> is to page through them.  Normally this is done by seeing the most 
    >> recent 50 events and then paging to the next 50 most recent and so on.
    >>
    >> In pure SPARQL, I don’t really see an efficient way to accomplish 
    >> this.  With limit and offset, I don’t really save anything other than 
    >> i/o since the whole result set will need to be ordered before this 
    >> limit/offset has an effect.  And that is killing us now.
    >>
    >> My guess is we will need to implement some caching or possibly index 
    >> the graph with Lucene or something.  It is doable but definitely not 
    >> ideal.  Maybe I can use the quad position to facilitate this?  I am 
    >> assuming this cannot be optimized within Jena itself?
    >>
    >> Best,
    >> Adam
    >>
    >>
    >
    > Hi Adam,
    >
    > No - there isn't a better way in std SPARQL. If you think the app is 
    > going to process all the results, reading the whole thing into some 
    > local cache is a way to go.
    >
    > The proper solution is an overhaul of the SPARQL protocol.
    >
    > Also, HTTP/2 may offer some interesting possibilities.
    >
    > Specific to ARQ: query execution is often predictable and stable 
    > order. There aren't many places where - absent concurrent updates - 
    > the order will be different from call to call.
    >
    > FWIW Jena does optimize "top k" sorts (SELECT-sort-LIMIT/OFFSET) up to 
    > (from memory) k=1000 items.
    >
    > > Maybe I can use the quad position to facilitate this?
    >
    > Not sure what the idea is here.
    >
    >     Andy
    >
    >
    -- 
         Doubt is a pain too lonely to know that faith is his twin brother. - 
Kahlil Gibran




