Smalyshev added a comment.

( ASK{ ?x ?y ?z };) does time out from time to time.

This is definitely something that should not be happening, but I wonder how we can build a metric around it beyond "this should never happen".

We could solve those issues by throwing loads of hardware at the problem.

I guess. But before that, we should define what the issues are :) I.e., do we want to get p95 or p50 into some range? Which range? Do we even care whether p95 is 0.5s or 10s or 30s? If p99 is 40s, is that good or bad? Right now I am not sure I know how to answer these.
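
To make the p50/p95/p99 question concrete, here is a minimal sketch of how those numbers could be derived from a list of query durations. It assumes the durations (in seconds) have already been collected somehow; the function names `percentile` and `latency_summary` are made up for illustration, and nearest-rank is just one of several percentile definitions.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so whole-number ranks stay exact.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def latency_summary(samples):
    """Return the three percentiles discussed above (hypothetical helper)."""
    return {f"p{p}": percentile(samples, p) for p in (50, 95, 99)}
```

With numbers like these in hand, the open question stays the same: which values we would consider acceptable for each percentile.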

WDQS public endpoint is not expected to have high availability / stability guarantees.

Well, this sounds a bit like giving up on availability (even if it's not the intention), so I think we want to have something. Let's think/brainstorm on what this something could be and how we could measure it.
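
As one possible starting point for the brainstorm, the trivial query mentioned above could be used as a canary: run it periodically, record whether it finished within a deadline, and track the ratio of successful probes. This is only a sketch under assumptions. `run_query` is a hypothetical callable standing in for whatever client actually talks to the endpoint, and the 1-second default deadline is an arbitrary placeholder, not a proposed target.

```python
import time

# The trivial query quoted above, used here as a canary.
CANARY_QUERY = "ASK { ?x ?y ?z }"


def probe(run_query, deadline_s=1.0, clock=time.monotonic):
    """Run the canary once; return (ok, elapsed_seconds).

    run_query is a hypothetical callable that takes the query string and
    raises on failure or timeout. deadline_s is an assumed threshold.
    """
    start = clock()
    try:
        run_query(CANARY_QUERY)
        succeeded = True
    except Exception:
        # Any error, including a client-side timeout, counts as a failed probe.
        succeeded = False
    elapsed = clock() - start
    return (succeeded and elapsed <= deadline_s, elapsed)


def success_ratio(results):
    """Fraction of probes that succeeded within the deadline."""
    if not results:
        raise ValueError("need at least one probe result")
    return sum(1 for ok, _ in results if ok) / len(results)
```

A ratio like this turns "this should never happen" into a number that can be graphed and alerted on, though what threshold would be acceptable is still open.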



