Re: Public SPARQL endpoints:managing (mis)-use and communicating limits to users.

Andrea Splendiani Thu, 18 Apr 2013 06:26:54 -0700

Hi,

I think that some caching with a minimum of query rewriting would get rid of 
90% of the SELECT ?s ?p ?o WHERE {?s ?p ?o} queries.
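
To make the caching idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the names cache_key and CachingEndpoint are hypothetical, run_query stands in for whatever actually evaluates the query, and the "rewriting" is limited to collapsing whitespace. Queries containing quotes or "#" are left untouched, because whitespace inside literals and line breaks after comments are significant.

```python
import re
from collections import OrderedDict
from typing import Callable


def cache_key(query: str) -> str:
    """Return a cache key so that trivially different spellings coincide."""
    # Be conservative: whitespace inside string literals matters, and a
    # newline terminates a '#' comment, so such queries are not rewritten.
    if '"' in query or "'" in query or "#" in query:
        return query.strip()
    return re.sub(r"\s+", " ", query).strip()


class CachingEndpoint:
    """Hypothetical wrapper: a small LRU cache in front of a query function."""

    def __init__(self, run_query: Callable[[str], str], max_entries: int = 1024):
        self._run_query = run_query      # assumed: evaluates SPARQL, returns results
        self._max_entries = max_entries
        self._cache = OrderedDict()      # cache key -> serialized result
        self.hits = 0

    def query(self, sparql: str) -> str:
        key = cache_key(sparql)
        if key in self._cache:
            self._cache.move_to_end(key)     # mark as most recently used
            self.hits += 1
            return self._cache[key]
        result = self._run_query(sparql)
        self._cache[key] = result
        if len(self._cache) > self._max_entries:
            self._cache.popitem(last=False)  # evict the least recently used entry
        return result
```

A real endpoint would also have to invalidate entries when the data changes, which this sketch ignores.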


From a user perspective, I would rather have a clear result code upfront 
telling me that my query is too heavy, that there are not enough resources, 
and so on, than partial results plus extra codes. I can't do much with 
partial results anyway, so it's time wasted on both sides.

One empirical solution could be to assign a quota per requesting IP (or other 
form of identification). Then one could restrict the total amount of resources 
used per time frame, possibly with smart policies. It would also discourage 
people from breaking big queries into many small ones...
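
A rough sketch of such a quota, in Python. Everything here is an assumption for illustration: the class name ResourceQuota, the notion of a "cost" per query (it could be CPU milliseconds or IO operations, measured however the store allows), and the sliding-window policy. Because the budget counts total cost rather than number of requests, splitting one big query into many small ones does not buy extra resources.

```python
import time
from collections import defaultdict, deque


class ResourceQuota:
    """Hypothetical sliding-window budget of resource units per client."""

    def __init__(self, budget: float, window_seconds: float, clock=time.monotonic):
        self.budget = budget              # units allowed per client per window
        self.window = window_seconds
        self._clock = clock               # injectable so the policy can be tested
        self._usage = defaultdict(deque)  # client id -> deque of (timestamp, cost)

    def _spent(self, client: str) -> float:
        now = self._clock()
        entries = self._usage[client]
        # Forget charges that have fallen out of the time window.
        while entries and now - entries[0][0] >= self.window:
            entries.popleft()
        return sum(cost for _, cost in entries)

    def remaining(self, client: str) -> float:
        return max(0.0, self.budget - self._spent(client))

    def charge(self, client: str, cost: float) -> bool:
        """Record the cost and return True, or return False if over budget."""
        if self._spent(client) + cost > self.budget:
            return False
        self._usage[client].append((self._clock(), cost))
        return True
```

The client id would typically be the requesting IP, e.g. quota.charge("192.0.2.1", 6.0); a False result is the point where the server could refuse upfront instead of returning partial results.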

But I was wondering: why is resource consumption a problem for SPARQL endpoint 
providers, and not for other "providers" on the web (say, YouTube, Google, 
...)?
Is it the unpredictability of the resources needed?

best,
Andrea

On 18 Apr 2013, at 12:53, Jerven Bolleman 
<jerven.bolle...@isb-sib.ch> wrote:

> Hi All,
> 
> Managing a public SPARQL endpoint has some difficulties in comparison to 
> managing a simpler REST API.
> Instead of counting API calls or external bandwidth use, we need to look at 
> internal IO and CPU usage as well.
> 
> Many of the current public SPARQL endpoints limit all their users to queries 
> of limited CPU time.
> But this is not enough to really manage (mis)use of an endpoint. Also, the 
> SPARQL API, being HTTP based,
> suffers from the problem that we first send the status code and may only find 
> out later that we can't
> answer the query after all, leading to a "200 not OK" problem :(
> 
> What approaches can we come up with as a community to embed "resource limit 
> exceeded" exceptions in the 
> SPARQL protocol? E.g. we could add an exception element to the SPARQL XML 
> result format.[1]
> 
> The current limits on CPU use are not enough to really avoid misuse, which is 
> why I submitted a patch to
> Sesame that allows limits on memory use as well. Limits on disk 
> seeks or other IO counts may be needed by some too.
> 
> But these are currently hard limits. What I really want is 
> "playground limits", i.e. you can use the swing as much as you want if you are 
> the only child in the park. 
> Once there are more children you have to share. 
> 
> And how do we communicate this to our users? E.g. "this result set is 
> incomplete because you exceeded your IO
> quota; please break up your queries into smaller blocks".
> 
> For my day job, where I manage a 7.4 billion triple store with public 
> access, some extra tools for managing users would be 
> great.
> 
> Last but not least, how can we avoid users needing to run SELECT 
> (COUNT(DISTINCT ?s) AS ?sc) WHERE {?s ?p ?o} and friends?
> For beta.sparql.uniprot.org I have been moving much of this information into 
> the SPARQL endpoint description, but it's not a place
> where people look for this information.
> 
> Regards,
> Jerven
> 
> [1] Yeah these ideas are not great timing just after 1.1 but we can always 
> start SPARQL 1.2 ;)
> 
> 
> 
> -------------------------------------------------------------------
> Jerven Bolleman                        jerven.bolle...@isb-sib.ch
> SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland     www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
> 
>
