Hi,
I think that some caching with a minimum of query rewriting would get rid of
90% of the SELECT ?s ?p ?o WHERE { ?s ?p ?o } queries.
From a user perspective, I would rather have a clear result code upfront
telling me "your query is too heavy", "not enough resources" and so on, than
partial results plus extra codes. I wouldn't do much with partial results
anyway, so it's time wasted on both sides.
One empirical solution could be to assign a quota per requesting IP (or other
form of identification). Then one could restrict the total amount of resources
per time frame, possibly with smart policies. It would also stop people from
getting around the limits by breaking big queries into many small ones...
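
A minimal sketch of what I mean, in Python. Everything here is hypothetical
(the ResourceQuota name, the cost unit, the numbers); the cost could be CPU
seconds, IO operations or whatever the endpoint can measure, and the client id
could be an IP or an API key:

```python
import time
from collections import defaultdict, deque


class ResourceQuota:
    """Per-client budget of resource units within a sliding time window.

    Hypothetical illustration: 'cost' is whatever the endpoint can measure
    (CPU seconds, IO operations, ...), 'client' is an IP or other identifier.
    """

    def __init__(self, budget, window_seconds, clock=time.monotonic):
        self.budget = budget
        self.window = window_seconds
        self.clock = clock
        self.usage = defaultdict(deque)  # client -> deque of (timestamp, cost)

    def _spent(self, client):
        # Forget charges that have fallen out of the window, then sum the rest.
        cutoff = self.clock() - self.window
        entries = self.usage[client]
        while entries and entries[0][0] <= cutoff:
            entries.popleft()
        return sum(cost for _, cost in entries)

    def remaining(self, client):
        return max(0.0, self.budget - self._spent(client))

    def charge(self, client, cost):
        """Record the cost if it fits the budget; return False to refuse."""
        if self._spent(client) + cost > self.budget:
            return False
        self.usage[client].append((self.clock(), cost))
        return True
```

Since the budget is on the total per window, ten small queries cost the same
as one big one, and a refused charge could be turned into a clear error code
before any results are sent.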
But I was wondering: why is resource consumption a problem for SPARQL endpoint
providers, and not for other "providers" on the web (say, YouTube, Google,
...)?
Is it the unpredictability of the resources needed?
best,
Andrea
On 18 Apr 2013, at 12:53, Jerven Bolleman
<[email protected]> wrote:
> Hi All,
>
> Managing a public SPARQL endpoint has some difficulties in comparison to
> managing a simpler REST API.
> Instead of counting API calls or external bandwidth use, we need to look at
> internal IO and CPU usage as well.
>
> Many of the current public SPARQL endpoints limit all their users to queries
> of limited CPU time.
> But this is not enough to really manage (mis)use of an endpoint. Also, the
> SPARQL API, being HTTP-based,
> suffers from the problem that we first send the status code and may only find
> out later that we can't
> answer the query after all, leading to a "200 not OK" problem :(
>
> What approaches can we come up with as a community to embed "resource limit
> exceeded" exceptions in the
> SPARQL protocols? E.g. we could add an exception element to the SPARQL XML
> result format.[1]
>
> The current limits on CPU use are not enough to really avoid misuse, which is
> why I submitted a patch to
> Sesame that allows limits on memory use as well, although limits on disk
> seeks or other IO counts may be needed by some as well.
>
> But these are currently hard limits; what I really want is
> "playground limits", i.e. you can use the swing as much as you want if you are
> the only child in the park.
> Once there are more children you have to share.
>
> And how do we communicate this to our users? E.g. "this result set is
> incomplete because you exceeded your IO
> quota; please break up your queries into smaller blocks."
>
> For my day job, where I manage a 7.4 billion triple store with public
> access, some extra tools for managing users would be
> great.
>
> Last but not least, how can we avoid users needing to run SELECT
> (COUNT(DISTINCT ?s) AS ?sc) WHERE {?s ?p ?o} and friends?
> For beta.sparql.uniprot.org I have been moving much of this information into
> the SPARQL endpoint description, but it's not a place
> where people look for this information.
>
> Regards,
> Jerven
>
> [1] Yeah, the timing of these ideas is not great, just after 1.1, but we can
> always start SPARQL 1.2 ;)
>
>
>
> -------------------------------------------------------------------
> Jerven Bolleman [email protected]
> SIB Swiss Institute of Bioinformatics Tel: +41 (0)22 379 58 85
> CMU, rue Michel Servet 1 Fax: +41 (0)22 379 58 58
> 1211 Geneve 4,
> Switzerland www.isb-sib.ch - www.uniprot.org
> Follow us at https://twitter.com/#!/uniprot
> -------------------------------------------------------------------
>
>