Hi!

> For me, it’s perfectly ok when a query runs for 20 minutes, when it
> spares me some hours of setting up a specific environment for one
> specific dataset (and doing it again when I need current data two months
> later). And it would be no issue if the query runs much longer, in
> situations where it competes with several others. But of course, that’s
> not what I want to experience when I use a wikidata service to drive,
> e.g., an autosuggest function for selecting entities.

I understand that, but this is a shared server that is supposed to
serve many users, and if we allowed 20-minute queries on this service,
it would soon become unusable. This is why we have a 30-second limit
on the server.

Now, we have considered offering a separate server or setup that would
allow longer queries, but currently we don't have one. It would
require some budget allocation and work to build, so it's not
something we can have right now. There are use cases for very long
queries and very large results, but the current public service endpoint
is just not good at serving them, because that's not what it was meant
for.

> And do you think the policies and limitations of different access
> strategies could be documented? These could include a high-reliability

I agree that the limitations should be documented; the problem is that
we don't know everything we may need to document, such as which queries
may be bad. When I see something like "I want to download a
million-row dataset", I know it's probably a bit too much. But I can't
set a hard rule that says 1M-1 rows is OK but 1M is too much.

> preferred option). And on the other end of the spectrum something what
> allows people to experiment freely. Finally, the latter kind of

I'm not sure how I could maintain an endpoint that would allow people to
do anything they want and still provide an adequate experience for
everybody. Maybe if we had infinite hardware resources... but we do not.

Otherwise, it is possible - and should not be extremely hard - to set
up one's own instance of the Query Service and use it for experimenting
with heavy lifting. Of course, that would require resources - but
there's no magic here; it would require resources from us too, both in
terms of hardware and of people to maintain it. So some things we can do
now, some things we will be able to do later, and some things we
probably will not be able to offer with adequate quality.
-- 
Stas Malyshev
smalys...@wikimedia.org

_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
