Re: [Fuseki] Configuring Fuseki2 to impose a maximum limit on the number of rows returned.

Phil Gooch Mon, 30 Oct 2017 08:30:07 -0700

Thanks Andy

In the Tomcat logs I see


Query Cancelled - results truncated (but 200 already sent)

but the JavaScript web app becomes unresponsive and locks up the browser
(Chrome and Safari tested).

The context to this is that I have modified shiro.ini to prevent non-admin
remote users from modifying or uploading data - they can just use the UI to
create and run SPARQL queries. So I wanted a way of preventing the user
doing things they shouldn't, such as running queries that might attempt to
return all the data.

I was thinking the easiest way to do this would be to sniff the query and
append a LIMIT onto the end of it if there isn't one already present.
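
A minimal sketch of that idea, assuming a proxy or front-end step that sees the
query text before it reaches Fuseki. It is plain Python rather than Jena code,
and `cap_query` and `MAX_ROWS` are hypothetical names, not Fuseki settings. It
only inspects a LIMIT at the very end of the query, so trailing comments or
unusual formatting would defeat it; a real SPARQL parser would be more robust.

```python
import re

# Hypothetical cap chosen for illustration; not a Fuseki configuration value.
MAX_ROWS = 1000

# A trailing "LIMIT n", optionally followed by "OFFSET m", at the end of the query.
_TRAILING_LIMIT = re.compile(r"\bLIMIT\s+(\d+)(\s+OFFSET\s+\d+)?\s*$", re.IGNORECASE)


def cap_query(query: str, max_rows: int = MAX_ROWS) -> str:
    """Append LIMIT max_rows if the query has no trailing LIMIT, else clamp it."""
    text = query.rstrip()
    match = _TRAILING_LIMIT.search(text)
    if match is None:
        # No trailing LIMIT: add one on its own line.
        return f"{text}\nLIMIT {max_rows}"
    if int(match.group(1)) <= max_rows:
        # The user's own LIMIT is already within the cap.
        return text
    # The user's LIMIT is too large: replace only the number.
    start, end = match.span(1)
    return text[:start] + str(max_rows) + text[end:]


if __name__ == "__main__":
    print(cap_query("SELECT * WHERE {?s ?v ?o}"))
```

As noted further down the thread, a plain LIMIT cannot tell the client whether
the result was cut short, so this only bounds the response size.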

Cheers

Phil

On Mon, Oct 30, 2017 at 2:47 PM, Andy Seaborne <[email protected]> wrote:

> When are you getting server overload, what's happening to the server? When
> I've tried it, yes, the request is doing a lot of work and the network is
> working but the server itself was just a bit sluggish.
>
>
> There isn't a way ATM and it is messy in HTTP (like timeouts).
> But it would be a good thing to have.
>
> HTTP requires the status code be sent first so if it is "200 OK" the
> contract is that the response will complete properly.
>
>
> If you or someone wants to put in a contribution, that would be great.
>
> Recorded as JENA-1412.
>
> What is needed is handling in the same way as query timeouts. (The old
> JENA-228 tried inserting a LIMIT, but then you don't know whether the query
> overran or not. What is needed is a QueryIterator to wrap the execution
> and throw QueryCancelledException, then check the handling of results.
> It really does need to be done the same way as query timeout.)
>
>     Andy
>
>
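
The wrapping approach described above can be illustrated outside Jena. This is
a plain Python stand-in under stated assumptions, not ARQ code: `capped` and
`RowLimitExceeded` are hypothetical names, with the exception playing the role
that QueryCancelledException would play in a real QueryIterator wrapper.

```python
from typing import Iterable, Iterator, TypeVar

Row = TypeVar("Row")


class RowLimitExceeded(Exception):
    """Hypothetical stand-in for Jena's QueryCancelledException."""


def capped(rows: Iterable[Row], max_rows: int) -> Iterator[Row]:
    """Yield at most max_rows rows; raise if the source has more than that."""
    for count, row in enumerate(rows):
        if count >= max_rows:
            # A further row exists, so the query overran the cap.
            raise RowLimitExceeded(f"more than {max_rows} rows")
        yield row


if __name__ == "__main__":
    try:
        for row in capped(range(10), 3):
            print(row)
    except RowLimitExceeded as exc:
        print("cancelled:", exc)
```

Unlike an injected LIMIT, the caller can tell a result that ended exactly at
the cap from one that was cut off, because only the latter raises.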
> On 30/10/17 13:43, Phil Gooch wrote:
>
>> Hi Andy
>>
>> Thanks, the timeout works fine, it’s just the number of rows returned that
>> I’d like to impose a hard limit on via a configuration file, if possible.
>>
>> Cheers
>>
>> Phil
>>
>>
>> On Mon, 30 Oct 2017 at 13:11, Andy Seaborne <[email protected]> wrote:
>>
>> Phil,
>>>
>>> Another thing to try:
>>>
>>> Adding "?timeout=1" to a query HTTP URL sets a 1 second timeout on the
>>> query.
>>>
>>> Be careful - it is different from context settings - it is in seconds,
>>> not milliseconds, and does not support the two-value "X,Y" form.
>>>
>>>       Andy
>>>
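
For illustration, a small sketch of building such a request URL. The endpoint
in the usage line is an assumption (host, port and path depend on the
deployment; under Tomcat there is typically a webapp prefix), and `query_url`
is a hypothetical helper name.

```python
from urllib.parse import urlencode


def query_url(endpoint: str, sparql: str, timeout_seconds: int) -> str:
    """Build a SPARQL GET URL carrying a per-request timeout, in seconds."""
    params = {"query": sparql, "timeout": timeout_seconds}
    return f"{endpoint}?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed endpoint for a dataset named "demo" with a "query" service.
    print(query_url("http://localhost:3030/demo/query",
                    "SELECT * WHERE {?s ?v ?o}", 1))
```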
>>> On 30/10/17 12:33, Andy Seaborne wrote:
>>>
>>>> Hi Phil,
>>>>
>>>> That all looks OK to me.
>>>>
>>>> I tried your configuration with timeout of "1000,1000" and a query of:
>>>>
>>>> PREFIX afn:     <http://jena.apache.org/ARQ/function#>
>>>>
>>>> ASK{
>>>>       FILTER(afn:wait(1000))
>>>>       FILTER(afn:wait(1000))
>>>>       FILTER(afn:wait(1000))
>>>>       FILTER(afn:wait(1000))
>>>>       FILTER(afn:wait(1000))
>>>> }
>>>>
>>>> and I got back a query timeout (using the latest code - I don't see any
>>>> changes in the codebase).
>>>>
>>>> I tried the standalone server and as a war file.
>>>>
>>>> Could you try the same please?
>>>>
>>>>       Andy
>>>>
>>>> On 27/10/17 15:43, Phil Gooch wrote:
>>>>
>>>>> @Dave - thanks for the info about the two value timeout, I'll try that.
>>>>>
>>>>> @Andy - according to the META-INF in the fuseki.war file I'm running
>>>>> 2.6.0
>>>>>
>>>>> #Generated by Maven
>>>>> #Tue May 02 13:43:43 EDT 2017
>>>>> version=2.6.0
>>>>> groupId=org.apache.jena
>>>>> artifactId=jena-fuseki-war
>>>>>
>>>>> The config file for demo.ttl in the configuration directory looks like
>>>>> this
>>>>>
>>>>> @prefix :      <http://base/#> .
>>>>> @prefix tdb:   <http://jena.hpl.hp.com/2008/tdb#> .
>>>>> @prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
>>>>> @prefix ja:    <http://jena.hpl.hp.com/2005/11/Assembler#> .
>>>>> @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
>>>>> @prefix fuseki: <http://jena.apache.org/fuseki#> .
>>>>>
>>>>> :service_tdb_all  a                   fuseki:Service ;
>>>>>           rdfs:label                    "TDB demo" ;
>>>>>           fuseki:dataset                :tdb_dataset_readwrite ;
>>>>>           fuseki:name                   "demo" ;
>>>>>           fuseki:serviceQuery           "query" , "sparql" ;
>>>>>           fuseki:serviceReadGraphStore  "get", "post" ;
>>>>>           fuseki:serviceReadWriteGraphStore
>>>>>                   "data" ;
>>>>>           fuseki:serviceUpdate          "update" ;
>>>>>           fuseki:serviceUpload          "upload" .
>>>>>
>>>>> :tdb_dataset_readwrite
>>>>>           a             tdb:DatasetTDB ;
>>>>>           ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "30000,60000" ] ;
>>>>>           tdb:location  "/etc/fuseki/databases/demo" .
>>>>>
>>>>>
>>>>> Cheers
>>>>>
>>>>> Phil
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Oct 27, 2017 at 3:27 PM, Andy Seaborne <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Phil -
>>>>>>
>>>>>> Which version are you running?
>>>>>>
>>>>>> Can you show the configuration file?
>>>>>>
>>>>>>        Andy
>>>>>>
>>>>>>
>>>>>> On 27/10/17 08:30, Dave Reynolds wrote:
>>>>>>
>>>>>> On 26/10/17 12:27, Phil Gooch wrote:
>>>>>>>
>>>>>>> Hi there
>>>>>>>>
>>>>>>>> I am running Fuseki2 within Tomcat and I'm looking for a way to configure
>>>>>>>> Fuseki to limit the number of rows returned by a query. For example, to
>>>>>>>> prevent a rogue query such as
>>>>>>>>
>>>>>>>> SELECT * WHERE {?s ?v ?o}
>>>>>>>>
>>>>>>>> from being executed to completion.
>>>>>>>>
>>>>>>>> I've imposed a maximum timeout via
>>>>>>>>
>>>>>>>> ja:context [ ja:cxtName "arq:queryTimeout" ;  ja:cxtValue "60000" ] ;
>>>>>>>>
>>>>>>>> in config.ttl and also in the individual <dataset>.ttl files, but this does
>>>>>>>> not seem to prevent the above query from locking up the server.
>>>>>>>>
>>>>>>>>
>>>>>>> Timeouts do generally work. There used to be problems with sort queries
>>>>>>> but those have been resolved and that's not a sort query.
>>>>>>>
>>>>>>> Might be worth trying the two value version (time to first result and
>>>>>>> time for whole query):
>>>>>>>
>>>>>>> ja:context [ja:cxtName "arq:queryTimeout";  ja:cxtValue "30000,60000" ];
>>>>>>>
>>>>>>>
>>>>>>> I've looked through the documentation at
>>>>>>>>
>>>>>>>>
>>>>>>>> https://jena.apache.org/documentation/fuseki2/fuseki-configuration.html
>>>>>>>> https://jena.apache.org/documentation/serving_data/#fuseki-configuration-file
>>>>>>>> https://github.com/apache/jena/tree/master/jena-fuseki2/examples
>>>>>>>>
>>>>>>>> but I've not found the right config option.
>>>>>>>>
>>>>>>>> Is this possible, or will I need to modify the source code to add a
>>>>>>>> LIMIT n if this is not specified in the original query?
>>>>>>>>
>>>>>>>>
>>>>>>> There's no built-in machinery to limit the number of rows so far as I
>>>>>>> know. So if timeouts really don't work for you then indeed you would need
>>>>>>> to inject a LIMIT clause into the queries yourself.
>>>>>>>
>>>>>>> Timeouts are generally better because some queries are really really hard
>>>>>>> but return few results whereas queries like the above stream perfectly well
>>>>>>> and should impose low load, they just go on for a long time.
>>>>>>>
>>>>>>> In our case the endpoints we expose are typically APIs where we can
>>>>>>> inject API-specific hard/soft row limits as part of the query generation
>>>>>>> phase. For full SPARQL endpoints then we rely on timeouts.
>>>>>>>
>>>>>>> Dave
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>
>>
