Re: [Fuseki] Configuring Fuseki2 to impose a maximum limit on the number of rows returned.

Andy Seaborne Mon, 30 Oct 2017 14:58:45 -0700


On 30/10/17 15:28, Phil Gooch wrote:

Thanks Andy

In the Tomcat logs I see

Query Cancelled - results truncated (but 200 already sent)

but the Javascript web app becomes unresponsive and locks up the browser
(Chrome and Safari tested).


That seems likely.

The results stream back - really don't want to buffer the whole result set to know the results are complete as that would be a big impact on the server. If a query cancel happens, the result stream is broken (bad JSON). The JS will need to be wary of this. Short of a non-standard solution, I don't know anything better.

The JS has to be careful anyway - reading anything large over HTTP has the potential to break.
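
A minimal sketch of that defensive handling, written in Python for illustration (a browser client would put the same guard around `JSON.parse`). It assumes the client has read the whole response body as text; the function name `parse_select_results` is hypothetical. A body cut off by a query cancel fails to parse, so the caller gets an explicit "incomplete" signal instead of trusting the 200 status.

```python
import json


def parse_select_results(body):
    """Return (bindings, complete) for a SPARQL JSON results body.

    complete is False when the body is not a valid JSON object, which is
    what a stream truncated by a query cancel looks like to the client.
    """
    try:
        doc = json.loads(body)
    except json.JSONDecodeError:
        return [], False
    if not isinstance(doc, dict):
        return [], False
    return doc.get("results", {}).get("bindings", []), True


# A complete body, and the same body cut off part-way through.
full = (
    '{"head":{"vars":["s"]},"results":{"bindings":'
    '[{"s":{"type":"uri","value":"http://example/a"}}]}}'
)
truncated = full[:60]

print(parse_select_results(full)[1])
print(parse_select_results(truncated)[1])
```

The point of returning a flag rather than raising is that the UI can show "results incomplete" and stay responsive, instead of failing inside the parser.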


    Andy

The context to this is that I have modified shiro.ini to prevent non-admin
remote users from modifying or uploading data - they can just use the UI to
create and run SPARQL queries. So I wanted a way of preventing the user
doing things they shouldn't, such as running queries that might attempt to
return all the data.

I was thinking the easiest way to do this would be to sniff the query and
append a LIMIT onto the end of it if there isn't one already present.

Cheers

Phil
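
A rough sketch of the "sniff and append" idea, in Python for illustration. It is a naive textual check, not a SPARQL parser: it only looks for a `LIMIT n` at the very end of the query text, so a LIMIT inside a subquery, string literal or comment is ignored, and a user-supplied LIMIT is left as-is however large it is. The function name `ensure_limit` and the cap of 1000 are hypothetical.

```python
import re

# Matches a trailing "LIMIT n", optionally followed by "OFFSET m",
# at the end of the query text (case-insensitive).
_TRAILING_LIMIT = re.compile(
    r"\bLIMIT\s+\d+(\s+OFFSET\s+\d+)?\s*$", re.IGNORECASE
)


def ensure_limit(query, max_rows):
    """Append "LIMIT max_rows" unless the query already ends with a LIMIT."""
    if _TRAILING_LIMIT.search(query):
        return query
    return f"{query.rstrip()}\nLIMIT {max_rows}"


print(ensure_limit("SELECT * WHERE {?s ?v ?o}", 1000))
```

A more robust version would parse the query and inspect the outermost solution modifiers rather than pattern-matching the text.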

On Mon, Oct 30, 2017 at 2:47 PM, Andy Seaborne <[email protected]> wrote:

When you are getting server overload, what's happening to the server? When
I've tried it, yes, the request is doing a lot of work and the network is
working but the server itself was just a bit sluggish.


There isn't a way ATM and it is messy in HTTP (like timeouts).
But it would be a good thing to have.

HTTP requires the status code be sent first so if it is "200 OK" the
contract is that the response will complete properly.


If you or someone wants to put in a contribution, that would be great.

Recorded as JENA-1412.

What is needed is handling in the same way as query timeouts. (The old
JENA-228 tried inserting a LIMIT, but then you don't know whether the query
overran or not. What is needed is a QueryIterator to wrap the execution
and throw QueryCancelledException, then check the handling of results.
It really does need to be done the same way as query timeout.)

     Andy
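
The difference between an injected LIMIT and a wrapping iterator can be sketched as follows. This is a Python analogue of the idea only, not Jena code: the names `QueryCancelled` and `limit_rows` are hypothetical stand-ins. Unlike a LIMIT, the wrapper raises when the query produces more rows than allowed, so the caller knows the query overran rather than just happening to return exactly that many rows.

```python
class QueryCancelled(Exception):
    """Raised when a result stream exceeds the configured row limit."""


def limit_rows(rows, max_rows):
    """Yield rows from the underlying iterator, raising on overrun.

    Up to max_rows rows pass through unchanged. If the underlying
    iterator produces one more, QueryCancelled is raised instead of
    silently stopping.
    """
    for count, row in enumerate(rows, start=1):
        if count > max_rows:
            raise QueryCancelled(f"more than {max_rows} rows")
        yield row


# Exactly at the limit: all rows are delivered, no exception.
print(list(limit_rows(iter(range(3)), 3)))

# Over the limit: the first 3 rows are delivered, then the overrun is signalled.
delivered = []
try:
    for row in limit_rows(iter(range(10)), 3):
        delivered.append(row)
except QueryCancelled as exc:
    print("cancelled after", len(delivered), "rows:", exc)
```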


On 30/10/17 13:43, Phil Gooch wrote:

Hi Andy

Thanks, the timeout works fine, it’s just the number of rows returned that
I’d like to impose a hard limit on via a configuration file, if possible.

Cheers

Phil


On Mon, 30 Oct 2017 at 13:11, Andy Seaborne <[email protected]> wrote:

Phil,


Another thing to try:

Adding "?timeout=1" to a query HTTP URL sets a 1 second timeout on the
query.

Be careful - it is different from context settings - it is in seconds,
not milliseconds, and does not support the "X,Y" form.
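
As an illustration, a request URL with a per-query timeout could be built like this. The endpoint `http://localhost:3030/demo/query` is an assumption (the "demo" dataset and "query" service names come from the configuration in this thread; the host and port are made up, and a Tomcat deployment will have a different base URL). The helper name `query_url` is hypothetical.

```python
from urllib.parse import urlencode


def query_url(endpoint, sparql, timeout_seconds):
    """Build a GET URL carrying the query and a timeout in seconds."""
    params = {"query": sparql, "timeout": timeout_seconds}
    return f"{endpoint}?{urlencode(params)}"


url = query_url(
    "http://localhost:3030/demo/query",  # assumed endpoint
    "SELECT * WHERE {?s ?v ?o}",
    1,  # seconds, not milliseconds
)
print(url)
```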

       Andy

On 30/10/17 12:33, Andy Seaborne wrote:

Hi Phil,

That all looks OK to me.

I tried your configuration with timeout of "1000,1000" and a query of:

PREFIX afn:     <http://jena.apache.org/ARQ/function#>

ASK{
       FILTER(afn:wait(1000))
       FILTER(afn:wait(1000))
       FILTER(afn:wait(1000))
       FILTER(afn:wait(1000))
       FILTER(afn:wait(1000))
}

and I got back a query timeout (using the latest code - I don't see any
changes in the codebase).

I tried the standalone server and as a war file.

Could you try the same please?

       Andy

On 27/10/17 15:43, Phil Gooch wrote:

@Dave - thanks for the info about the two value timeout, I'll try that.

@Andy - according to the META-INF in the fuseki.war file I'm running 2.6.0

#Generated by Maven
#Tue May 02 13:43:43 EDT 2017
version=2.6.0
groupId=org.apache.jena
artifactId=jena-fuseki-war

The config file for demo.ttl in the configuration directory looks like this

@prefix :      <http://base/#> .
@prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

:service_tdb_all  a                   fuseki:Service ;
           rdfs:label                    "TDB demo" ;
           fuseki:dataset                :tdb_dataset_readwrite ;
           fuseki:name                   "demo" ;
           fuseki:serviceQuery           "query" , "sparql" ;
           fuseki:serviceReadGraphStore  "get", "post" ;
           fuseki:serviceReadWriteGraphStore
                   "data" ;
           fuseki:serviceUpdate          "update" ;
           fuseki:serviceUpload          "upload" .

:tdb_dataset_readwrite
           a             tdb:DatasetTDB ;
           ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "30000,60000" ] ;
           tdb:location  "/etc/fuseki/databases/demo" .


Cheers

Phil



On Fri, Oct 27, 2017 at 3:27 PM, Andy Seaborne <[email protected]> wrote:

Phil -


Which version are you running?

Can you show the configuration file?

        Andy


On 27/10/17 08:30, Dave Reynolds wrote:

On 26/10/17 12:27, Phil Gooch wrote:


Hi there


I am running Fuseki2 within Tomcat and I'm looking for a way to configure
Fuseki to limit the number of rows returned by a query. For example, to
prevent a rogue query such as

SELECT * WHERE {?s ?v ?o}

from being executed to completion.

I've imposed a maximum timeout via

ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "60000" ] ;

in config.ttl and also in the individual <dataset>.ttl files, but this does
not seem to prevent the above query from locking up the server.

Timeouts do generally work. There used to be problems with sort queries
but those have been resolved and that's not a sort query.


Might be worth trying the two value version (time to first result and
time for whole query):

ja:context [ja:cxtName "arq:queryTimeout";  ja:cxtValue "30000,60000" ];


I've looked through the documentation at

https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html
https://jena.apache.org/documentation/serving_data/#fuseki-configuration-file
https://github.com/apache/jena/tree/master/jena-fuseki2/examples

but I've not found the right config option.

Is this possible, or will I need to modify the source code to add a
LIMIT n if this is not specified in the original query?

There's no built-in machinery to limit the number of rows so far as I
know. So if timeouts really don't work for you then indeed you would need
to inject a LIMIT clause into the queries yourself.

Timeouts are generally better because some queries are really really hard
but return few results whereas queries like the above stream perfectly well
and should impose low load, they just go on for a long time.

In our case the endpoints we expose are typically APIs where we can
inject API-specific hard/soft row limits as part of the query generation
phase. For full sparql endpoints then we rely on timeouts.

Dave
