Re: Public SPARQL endpoints:managing (mis)-use and communicating limits to users.

Kingsley Idehen Thu, 18 Apr 2013 05:48:23 -0700

On 4/18/13 7:53 AM, Jerven Bolleman wrote:

Hi All,


Managing a public SPARQL endpoint has some difficulties in comparison to 
managing a simpler REST API.
Instead of counting API calls or external bandwidth use, we need to look at 
internal IO and CPU usage as well.

Many of the current public SPARQL endpoints limit all their users to queries of 
limited CPU time.
But this is not enough to really manage (mis)use of an endpoint. Also, the 
SPARQL API, being HTTP based,
suffers from the problem that we first send the status code and may only find 
out later that we can't
answer the query after all, leading to a "200 not OK" problem :(

What approaches can we come up with as a community to embed "resource limit 
exceeded" exceptions in the
SPARQL protocol? E.g. we could add an exception element to the SPARQL XML 
result format.[1]


Good idea, for sure.
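
As a rough illustration of how a client might consume such an element, here is a minimal Python sketch. The <exception> element and its "code" attribute are hypothetical; they are not part of the SPARQL Query Results XML Format, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

SPARQL_NS = "http://www.w3.org/2005/sparql-results#"

# Made-up response. The <exception> element is a hypothetical extension,
# not something the current results format defines.
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="s"/></head>
  <results>
    <result><binding name="s"><uri>http://example.org/a</uri></binding></result>
  </results>
  <exception code="io-quota-exceeded">Result set is incomplete.</exception>
</sparql>"""


def read_results(xml_text):
    """Return (rows, exception); exception is None or a (code, message) pair."""
    root = ET.fromstring(xml_text)
    ns = {"sr": SPARQL_NS}
    rows = []
    for result in root.findall("sr:results/sr:result", ns):
        row = {}
        for binding in result.findall("sr:binding", ns):
            value = binding[0]  # first child: uri, literal or bnode
            row[binding.get("name")] = value.text
        rows.append(row)
    exc = root.find("sr:exception", ns)
    exception = None if exc is None else (exc.get("code"), exc.text)
    return rows, exception


rows, exception = read_results(SAMPLE)
if exception is not None:
    print("partial result (%d rows): %s" % (len(rows), exception[1]))
```

A client that sees a non-empty exception would know the rows it received are partial, even though the HTTP status was already 200.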


The current limits on CPU use are not enough to really avoid misuse, which is 
why I submitted a patch to
Sesame that allows limits on memory use as well, although limits on disk seeks 
or other IO counts may be needed by some as well.

But these are currently hard limits. What I really want is
"playground limits", i.e. you can use the swing as much as you want if you are 
the only child in the park.
Once there are more children, you have to share.
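
One way to read "playground limits" is as max-min fair sharing of a resource budget. The Python sketch below is only an illustration under that assumption; the function name, the budget units, and the user ids are made up, and this is not how Sesame or any particular store enforces limits.

```python
def fair_share(total_budget, demands):
    """Max-min fair split of total_budget among users.

    demands maps a user id to the amount that user asks for. Nobody gets
    more than they asked for; whatever a light user leaves unused is
    redistributed among the remaining users.
    """
    allocation = {}
    remaining = float(total_budget)
    pending = dict(demands)
    while pending:
        share = remaining / len(pending)
        satisfied = {u: d for u, d in pending.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than an equal share: split evenly.
            for user in pending:
                allocation[user] = share
            break
        for user, demand in satisfied.items():
            allocation[user] = float(demand)
            remaining -= demand
            del pending[user]
    return allocation


# Alone in the park: one user gets the whole budget.
print(fair_share(100, {"alice": 500}))
# More children arrive: the heavy users split what the light user leaves.
print(fair_share(100, {"alice": 500, "bob": 20, "carol": 500}))
```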

That level of granularity isn't really in scope, per se, for HTTP or HTTP+SPARQL (aka the SPARQL Protocol).


And how do we communicate this to our users? E.g. "this result set is incomplete 
because you exceeded your IO
quota; please break up your queries into smaller blocks".

A good number of these error conditions could fit into existing HTTP responses. Worst case, HTTP+SPARQL could be enhanced to provide additional granularity.
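
As a minimal sketch of what that mapping could look like, assuming the limit is detected before the status line is sent: the condition names below are hypothetical, while 429 (RFC 6585), 503 and the Retry-After header are existing HTTP features.

```python
from http import HTTPStatus

# Hypothetical limit conditions mapped onto existing HTTP status codes.
LIMIT_RESPONSES = {
    "rate_limited": HTTPStatus.TOO_MANY_REQUESTS,         # 429
    "server_overloaded": HTTPStatus.SERVICE_UNAVAILABLE,  # 503
}


def limit_response(condition, retry_after_seconds=None):
    """Return (status code, reason phrase, headers) for a limit condition."""
    status = LIMIT_RESPONSES[condition]
    headers = {}
    if retry_after_seconds is not None:
        # Retry-After tells the client how long to wait before retrying.
        headers["Retry-After"] = str(int(retry_after_seconds))
    return status.value, status.phrase, headers


print(limit_response("rate_limited", 30))
```

This only covers limits that are known up front; a quota that is exceeded halfway through streaming a result still needs some in-band signal.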


For my day job, where I manage a 7.4 billion triple store with public access, 
some extra tools for managing users would be
great.

Last but not least, how can we avoid users needing to run SELECT 
(COUNT(DISTINCT ?s) AS ?sc) WHERE {?s ?p ?o} and friends?
For beta.sparql.uniprot.org I have been moving much of this information into 
the SPARQL endpoint description, but it's not a place
where people look for this information.


We should encourage them to look there :-)
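
For instance, a client could read precomputed statistics out of the service description instead of running the COUNT query. The Python sketch below assumes the description is served as RDF/XML and publishes VoID statistics (void:triples, void:distinctSubjects) on the default graph; the sample document and its numbers are made up and are not taken from any real endpoint.

```python
import xml.etree.ElementTree as ET

VOID = "http://rdfs.org/ns/void#"

# Made-up service description; a real one would be fetched from the
# endpoint URL with an RDF media type in the Accept header.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:sd="http://www.w3.org/ns/sparql-service-description#"
    xmlns:void="http://rdfs.org/ns/void#">
  <sd:Service>
    <sd:defaultDataset>
      <sd:Dataset>
        <sd:defaultGraph>
          <sd:Graph>
            <void:triples>1000</void:triples>
            <void:distinctSubjects>250</void:distinctSubjects>
          </sd:Graph>
        </sd:defaultGraph>
      </sd:Dataset>
    </sd:defaultDataset>
  </sd:Service>
</rdf:RDF>"""


def graph_statistics(rdf_xml):
    """Collect the VoID counts found anywhere in the description."""
    root = ET.fromstring(rdf_xml)
    stats = {}
    for name in ("triples", "distinctSubjects"):
        element = root.find(".//{%s}%s" % (VOID, name))
        if element is not None:
            stats[name] = int(element.text)
    return stats


print(graph_statistics(SAMPLE))
```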

Kingsley


Regards,
Jerven

[1] Yeah, these ideas are not great timing just after 1.1, but we can always 
start SPARQL 1.2 ;)



-------------------------------------------------------------------
Jerven Bolleman                        [email protected]
SIB Swiss Institute of Bioinformatics      Tel: +41 (0)22 379 58 85
CMU, rue Michel Servet 1               Fax: +41 (0)22 379 58 58
1211 Geneve 4,
Switzerland     www.isb-sib.ch - www.uniprot.org
Follow us at https://twitter.com/#!/uniprot
-------------------------------------------------------------------



--

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
