Re: Dealing with expensive queries

Andy Seaborne Fri, 30 Nov 2012 03:14:25 -0800

On 29/11/12 21:41, Sarven Capadisli wrote:

Hi all,


I would like to have better control over my public SPARQL Endpoints (using
Fuseki) due to some harmless looking, but expensive queries coming in.
My initial thoughts were, and not necessarily the ones I want to take:

* Block the IP or agent

There are some unfriendly crawlers out there that ignore robots.txt or check it only once a month, then send 10+ bots at your site. We had to block them explicitly.


RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
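
A RewriteCond on its own does nothing until it is paired with a RewriteRule. A minimal sketch of a complete block, assuming mod_rewrite is loaded and that a 403 response is the desired outcome (the condition is from the line above; the rule is an illustration, not necessarily the exact one used):

```apache
# Sketch only: assumes mod_rewrite is loaded in this server or vhost.
RewriteEngine On
# Match any User-Agent containing "Baiduspider", case-insensitively.
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
# "-" means no substitution; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]
```

Further agents can be added as extra RewriteCond lines joined with the [OR] flag.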

* Use default query timeout values
* Start using API keys or other authentication
* Catch the exact queries from httpd and block it off
* Handle certain query types from Fuseki or TDB

But, more importantly, I'd love to hear some of the actions you all take
on your endpoints. If you can point me to any documentation or some of
the common practices out there, that'd be awesome as well.

We rely on the query timeout, which with TDB is a drop-dead timeout: it really does kill the query (there is currently one hole in it - JENA-289).

We do not put the Fuseki server on the public internet - we front it with httpd (usually to mix in other stuff using Tomcat - having separate servers for Tomcat and Fuseki means you can kill/restart one and not the other). And we use a load balancer in front of that for multiple machines.
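
Fronting Fuseki with httpd could look roughly like the sketch below. It assumes mod_proxy and mod_proxy_http are loaded; the port (3030), the dataset name (/ds) and the public path (/sparql) are illustrative assumptions, not details of the setup described above.

```apache
# Sketch only: host, port, dataset name and public path are hypothetical.
# Act as a reverse proxy only, never as an open forward proxy.
ProxyRequests Off
# Expose just the query service; nothing else on the Fuseki server is reachable.
ProxyPass        /sparql http://localhost:3030/ds/query
ProxyPassReverse /sparql http://localhost:3030/ds/query
```

With Fuseki bound to a non-public interface, only the mapped path is reachable from outside.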

httpd is also a good place to do login security. We have also used Apache Shiro + Fuseki embedded (use the jar) in a single webapp.
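
As one illustration of login security at the httpd layer, here is a minimal HTTP Basic authentication sketch. It assumes mod_auth_basic and mod_authn_file are loaded; the protected path and the password file location are hypothetical.

```apache
# Sketch only: /sparql and the password file path are hypothetical.
<Location /sparql>
    AuthType Basic
    AuthName "SPARQL endpoint"
    # File of user:hash entries, created with the htpasswd tool.
    AuthUserFile /etc/httpd/sparql.htpasswd
    # Any user listed in the file may connect.
    Require valid-user
</Location>
```

Basic authentication sends credentials with every request, so it would normally be combined with HTTPS.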


        Andy

-Sarven


https://issues.apache.org/jira/browse/JENA-289
