On 29/11/12 21:41, Sarven Capadisli wrote:
> Hi all,
>
> I would like better control over my public SPARQL endpoints (using
> Fuseki) because some harmless-looking but expensive queries are coming in.
> My initial thoughts, which are not necessarily the route I want to take, were:
>
> * Block the IP or agent

There are some unfriendly crawlers out there that ignore robots.txt, or
check it only once a month and then send 10+ bots at your site. We had to
block them explicitly.
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]
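A RewriteCond line only sets a condition; httpd refuses nothing until a RewriteRule follows it. Below is a minimal sketch of a complete block. It assumes Apache httpd 2.4 with mod_rewrite and mod_authz_core loaded, and that the snippet sits in the virtual host that fronts Fuseki. The second user agent and the address range are hypothetical placeholders, not a vetted blocklist.

```apache
# Sketch only: assumes httpd 2.4 with mod_rewrite and mod_authz_core loaded.
# "SomeOtherBot" and 192.0.2.0/24 are hypothetical placeholders.
RewriteEngine On
# Match either user agent, case-insensitively; [OR] chains the two conditions.
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SomeOtherBot [NC]
# "-" leaves the URL unchanged; [F] answers 403 Forbidden, [L] stops rewriting.
RewriteRule .* - [F,L]

# Blocking by address instead: allow everyone except the listed range.
<Location "/">
    <RequireAll>
        Require all granted
        Require not ip 192.0.2.0/24
    </RequireAll>
</Location>
```

User-agent strings are supplied by the client and are trivially changed. The rewrite rule therefore only stops crawlers that identify themselves honestly, and the address rule is the fallback for the ones that don't.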

> * Use default query timeout values
> * Start using API keys or other authentication
> * Catch the exact queries in httpd and block them
> * Handle certain query types in Fuseki or TDB
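Blocking specific queries in httpd, as the list above suggests, can be approximated with mod_rewrite. The following is a rough sketch under stated assumptions. The query endpoint is taken to be at the hypothetical path /ds/query, the snippet sits in the virtual host fronting Fuseki, and the pattern refuses any query mentioning regex. That pattern is an arbitrary example, not a recommendation.

```apache
# Sketch only: /ds/query is a hypothetical endpoint path, and "regex" is an
# arbitrary example pattern. Assumes mod_rewrite is loaded.
RewriteEngine On
# %{QUERY_STRING} is the raw, still URL-encoded query string; letters such as
# "regex" survive encoding, so a plain substring match works. [NC] = ignore case.
RewriteCond %{QUERY_STRING} (^|&)query=[^&]*regex [NC]
# Refuse the matching requests with 403 Forbidden before they reach Fuseki.
RewriteRule ^/ds/query$ - [F,L]
```

mod_rewrite only sees the URL, so this rule catches SPARQL queries sent with GET. The SPARQL protocol also allows a query to be sent in a POST body, which this rule never inspects. That gap is one reason to lean on a server-side timeout instead.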

> But, more importantly, I'd love to hear some of the actions you all take
> on your endpoints. If you can point me to any documentation or some of
> the common practices out there, that'd be awesome as well.

We rely on the query timeout, which with TDB is a drop-dead timeout: it
really does kill the query (there is currently one hole in it, JENA-289).

We do not put the Fuseki server on the public internet; we front it with
httpd, usually so we can mix in other things running on Tomcat. Having
separate servers for Tomcat and Fuseki means you can kill or restart one
without affecting the other. With multiple machines, we also put a load
balancer in front of that.
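Fronting Fuseki and Tomcat with httpd, as described above, could look roughly like the reverse-proxy sketch below. It assumes httpd 2.4 with mod_proxy and mod_proxy_http loaded, and Fuseki and Tomcat running on the same machine at their default ports (3030 and 8080). The /ds and /app paths are hypothetical. Fuseki itself should not be reachable from outside that machine.

```apache
# Sketch only: /ds (Fuseki dataset) and /app (Tomcat webapp) are hypothetical
# paths; 3030 and 8080 are the usual default ports for Fuseki and Tomcat.
# Act only as a reverse proxy, never as an open forward proxy.
ProxyRequests Off

# Requests under /ds go to Fuseki; Location headers in replies are rewritten back.
ProxyPass        "/ds" "http://localhost:3030/ds"
ProxyPassReverse "/ds" "http://localhost:3030/ds"

# Other content is served by a separate Tomcat, so either can be restarted alone.
ProxyPass        "/app" "http://localhost:8080/app"
ProxyPassReverse "/app" "http://localhost:8080/app"

# Seconds httpd waits for a backend reply. This bounds the client-facing wait
# only; it does not by itself stop a query already running inside Fuseki.
ProxyTimeout 60
```

Each machine would run its own copy of this setup, with the load balancer in front of the machines. The balancer's configuration depends on which product is used and is not shown here.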

httpd is also a good place to handle login security. We have also used
Apache Shiro with Fuseki embedded (using the jar) in a single webapp.
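As one way of doing login security in httpd, HTTP Basic authentication can be applied to the proxied path. Below is a minimal sketch. It assumes mod_auth_basic, mod_authn_file and mod_authz_user are loaded, that Fuseki is proxied under the hypothetical path /ds, and that a password file has been created with the htpasswd tool at a hypothetical location.

```apache
# Sketch only: /ds, the password-file path and the user "alice" are hypothetical.
# Create the file first with:  htpasswd -c /etc/httpd/fuseki.htpasswd alice
<Location "/ds">
    AuthType Basic
    AuthName "SPARQL endpoint"
    AuthBasicProvider file
    AuthUserFile "/etc/httpd/fuseki.htpasswd"
    # Any user listed in the password file may query; everyone else gets 401.
    Require valid-user
</Location>
```

Basic authentication sends the password base64-encoded rather than encrypted, so it should only be used over HTTPS.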
Andy

[JENA-289] https://issues.apache.org/jira/browse/JENA-289