Also, per https://phabricator.wikimedia.org/T126730 and https://gerrit.wikimedia.org/r/#/c/274864/8, requests to the query service are now cached for 60 seconds. I expect this will include error results from timeouts, so retrying a request within 60 seconds of the first attempt won't even reach the WDQS servers now.
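A minimal sketch of what this could mean on the client side, assuming the 60-second cache also holds truncated or failed responses (that is only my expectation, not something confirmed). The helper names and the injectable `fetch` callable are hypothetical, chosen for illustration: the idea is to check that the body parses as complete JSON and, if not, to wait out the cache window before asking again.

```python
import json
import time

# Cache lifetime mentioned in T126730; treated here as an assumption.
CACHE_TTL_SECONDS = 60


def is_complete_json(text):
    """Return True if text parses as a whole JSON document (not truncated)."""
    try:
        json.loads(text)
        return True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def seconds_until_retry(first_request_time, now=None, ttl=CACHE_TTL_SECONDS):
    """Seconds left until the cached entry for the first request should expire."""
    if now is None:
        now = time.time()
    return max(0.0, ttl - (now - first_request_time))


def fetch_with_retry(fetch, max_attempts=3, ttl=CACHE_TTL_SECONDS,
                     clock=time.time, sleep=time.sleep):
    """Call fetch() until it returns complete JSON, waiting out the cache between tries.

    fetch is any zero-argument callable returning the response body as a string.
    """
    for attempt in range(max_attempts):
        started = clock()
        text = fetch()
        if is_complete_json(text):
            return text
        if attempt < max_attempts - 1:
            # Retrying sooner would probably just hit the cached bad response.
            sleep(seconds_until_retry(started, clock(), ttl))
    raise RuntimeError("no complete JSON response after %d attempts" % max_attempts)
```

With the requests library, `fetch` could be something like `lambda: requests.get(SPARQL_SERVICE_URL, params={'query': query, 'format': 'json'}).text`. For a 250M result it would be kinder to memory to stream the response to a file and validate it afterwards, but the waiting logic stays the same.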

On 19 April 2016 at 10:05, Addshore <[email protected]> wrote:

> In the case we are discussing here the truncated JSON is caused by blaze
> graph deciding it has been sending data for too long and then stopping (as
> I understand).
> Thus you will only see a spike on the graph for the amount of data
> actually sent from the server, not the size of the result blazegraph was
> trying to send back.
>
> I also ran into this with some simple queries that returned big sets of
> data.
> Although with my issue I did actually also see a Java exception somewhere.
>
> On 18 April 2016 at 21:51, Markus Kroetzsch <
> [email protected]> wrote:
>
>> On 18.04.2016 22:21, Markus Kroetzsch wrote:
>>
>>> On 18.04.2016 21:56, Markus Kroetzsch wrote:
>>>
>>>> Thanks, the dashboard is interesting.
>>>>
>>>> I am trying to run this query:
>>>>
>>>> SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }
>>>>
>>>> It is supposed to return a large result set. But I am only running it
>>>> once per week. It used to work fine, but today I could not get it to
>>>> succeed a single time.
>>>>
>>>
>>> Actually, the query seems to work as it should. I am investigating why I
>>> get an error in some cases on my machine.
>>>
>>
>> Ok, I found that this is not so easy to reproduce reliably. The symptom I
>> am seeing is a truncated JSON response, which just stops in the middle of
>> the data (at a random location, but usually early on), and which is *not*
>> followed by any error message. The stream just ends.
>>
>> So far, I could only get this in Java, not in Python, and it does not
>> always happen. If successful, the result is about 250M in size. The
>> following Python script can retrieve it:
>>
>> import requests
>> SPARQL_SERVICE_URL = 'https://query.wikidata.org/sparql'
>> query = """SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }"""
>> print requests.get(SPARQL_SERVICE_URL, params={'query': query, 'format':
>> 'json'}).text
>>
>> (output should be redirected to a file)
>>
>> I will keep an eye on the issue, but I don't know how to debug this any
>> further now, since it started to work without me changing any code.
>>
>> I also wonder how to read the dashboard after all. In spite of me
>> repeating an experiment that creates a 250M result file for five times in
>> the past few minutes, the "Bytes out" figure remains below a few MB for
>> most of the time.
>>
>>
>> Markus
>>
>>
>>
>>>> On 18.04.2016 21:40, Stas Malyshev wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I have the impression that some not-so-easy SPARQL queries that used to
>>>>>> run just below the timeout are now timing out regularly. Has there
>>>>>> been
>>>>>> a change in the setup that may have caused this, or are we maybe
>>>>>> seeing
>>>>>> increased query traffic [1]?
>>>>>>
>>>>>
>>>>> We've recently run on a single server for couple of days due to
>>>>> reloading of the second one, so this may have made it a bit slower. But
>>>>> that should be gone now, we're back to two. Other than that, not seeing
>>>>> anything abnormal in
>>>>> https://grafana.wikimedia.org/dashboard/db/wikidata-query-service
>>>>>
>>>>> [1] The deadline for the Int. Semantic Web Conf. is coming up, so it
>>>>>> might be that someone is running experiments on the system to get
>>>>>> their
>>>>>> paper finished. It has been observed for other endpoints that traffic
>>>>>> increases at such times. This community sometimes is the greatest
>>>>>> enemy
>>>>>> of its own technology ... (I recently had to IP-block an RDF crawler
>>>>>> from one of my sites after it had ignored robots.txt completely).
>>>>>>
>>>>>
>>>>> We don't have any blocks or throttle mechanisms right now. But if we
>>>>> see
>>>>> somebody making serious negative impact on the service, we may have to
>>>>> change that.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>> --
>> Markus Kroetzsch
>> Faculty of Computer Science
>> Technische Universität Dresden
>> +49 351 463 38486
>> http://korrekt.org/
>>
>> _______________________________________________
>> Wikidata mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
>
> --
> Addshore
>

--
Addshore
