On 29/11/12 21:41, Sarven Capadisli wrote:
Hi all,

I would like to have better control over my public SPARQL endpoints (using
Fuseki) due to some harmless-looking but expensive queries coming in.
My initial thoughts, not necessarily the options I want to take, were:

* Block the IP or agent

There are some unfriendly crawlers out there that ignore robots.txt, or check it only once a month, and then send 10+ bots at your site. We had to block them explicitly.

RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
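
In an httpd configuration, a RewriteCond only takes effect together with the RewriteRule that follows it. For anyone doing the same check in application code instead, here is a minimal Python sketch of the equivalent case-insensitive match. The function name and the blocklist variable are hypothetical; only the "Baiduspider" pattern comes from the rule above.

```python
import re

# Hypothetical blocklist; only "Baiduspider" is taken from the rule above.
BLOCKED_AGENT_PATTERNS = ["Baiduspider"]

# re.IGNORECASE plays the role of the [NC] (no-case) flag.
_BLOCKED = [re.compile(p, re.IGNORECASE) for p in BLOCKED_AGENT_PATTERNS]


def is_blocked_agent(user_agent):
    """Return True if the User-Agent header matches any blocked pattern."""
    if not user_agent:
        return False
    # search() is unanchored, like a RewriteCond pattern without ^ or $.
    return any(rx.search(user_agent) for rx in _BLOCKED)
```

A request handler could call is_blocked_agent() on the incoming header and answer with 403 when it returns True.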

* Use default query timeout values
* Start using API keys or other authentication
* Catch the exact queries in httpd and block them
* Handle certain query types from Fuseki or TDB

But, more importantly, I'd love to hear some of the actions you all take
on your endpoints. If you can point me to any documentation or some of
the common practices out there, that'd be awesome as well.

We rely on the query timeout, which with TDB is a drop-dead timeout: it really does kill the query (there is currently one hole in it, JENA-289).
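
As a generic illustration of what a hard timeout means (this is not how Jena or TDB implements it), here is a small Python sketch that aborts result iteration once a time budget is spent. The names with_deadline and QueryTimeout are made up for the example, and the check is cooperative: it only runs between rows, so a single slow row can still overrun the budget.

```python
import time


class QueryTimeout(Exception):
    """Raised when a query exceeds its time budget."""


def with_deadline(rows, timeout_seconds, clock=time.monotonic):
    """Yield items from rows, aborting once timeout_seconds have elapsed."""
    deadline = clock() + timeout_seconds
    for row in rows:
        # Check the clock before handing out each row.
        if clock() > deadline:
            raise QueryTimeout("query exceeded %.1f s" % timeout_seconds)
        yield row
```

Wrapping a result iterator as with_deadline(results, 10) means the caller gets either the rows or a QueryTimeout, and never waits indefinitely between rows.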

We do not put the Fuseki server on the public internet - we front it with httpd (usually to mix in other stuff using Tomcat - having separate servers for Tomcat and Fuseki means you can kill/restart one and not the other). We also use a load balancer in front of that for multiple machines.

httpd is also a good place to do login security. We have also used Apache Shiro + Fuseki embedded (use the jar) in a single webapp.

        Andy

-Sarven

https://issues.apache.org/jira/browse/JENA-289
