In the case we are discussing here, the truncated JSON is caused by Blazegraph deciding it has been sending data for too long and then stopping (as I understand it). Thus you will only see a spike on the graph for the amount of data actually sent from the server, not the size of the result Blazegraph was trying to send back.
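Since the stream just ends with no error marker, one quick way to tell a complete response from a truncated one is to try parsing the saved body as JSON; a cut-off body will not parse. A minimal sketch (the helper name is my own, not from the thread):

```python
import json

def is_complete_json(text):
    # A response cut off mid-stream will not parse as valid JSON,
    # so a parse failure is a cheap truncation check.
    try:
        json.loads(text)
        return True
    except ValueError:
        return False
```

This only proves the body is well-formed JSON, of course, not that the result set is what the query should return.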
I also ran into this with some simple queries that returned big result sets, although in my case I did also see a Java exception somewhere.

On 18 April 2016 at 21:51, Markus Kroetzsch <[email protected]> wrote:

> On 18.04.2016 22:21, Markus Kroetzsch wrote:
>
>> On 18.04.2016 21:56, Markus Kroetzsch wrote:
>>
>>> Thanks, the dashboard is interesting.
>>>
>>> I am trying to run this query:
>>>
>>> SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }
>>>
>>> It is supposed to return a large result set. But I am only running it
>>> once per week. It used to work fine, but today I could not get it to
>>> succeed a single time.
>>
>> Actually, the query seems to work as it should. I am investigating why I
>> get an error in some cases on my machine.
>
> Ok, I found that this is not so easy to reproduce reliably. The symptom I
> am seeing is a truncated JSON response, which just stops in the middle of
> the data (at a random location, but usually early on), and which is *not*
> followed by any error message. The stream just ends.
>
> So far, I could only get this in Java, not in Python, and it does not
> always happen. If successful, the result is about 250M in size. The
> following Python script can retrieve it:
>
> import requests
> SPARQL_SERVICE_URL = 'https://query.wikidata.org/sparql'
> query = """SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }"""
> print requests.get(SPARQL_SERVICE_URL, params={'query': query,
>                    'format': 'json'}).text
>
> (output should be redirected to a file)
>
> I will keep an eye on the issue, but I don't know how to debug this any
> further now, since it started to work without me changing any code.
>
> I also wonder how to read the dashboard after all. In spite of me
> repeating an experiment that creates a 250M result file five times in
> the past few minutes, the "Bytes out" figure remains below a few MB for
> most of the time.
>
> Markus
>
>>> On 18.04.2016 21:40, Stas Malyshev wrote:
>>>
>>>> Hi!
>>>>
>>>>> I have the impression that some not-so-easy SPARQL queries that used to
>>>>> run just below the timeout are now timing out regularly. Has there been
>>>>> a change in the setup that may have caused this, or are we maybe seeing
>>>>> increased query traffic [1]?
>>>>
>>>> We've recently run on a single server for a couple of days due to
>>>> reloading of the second one, so this may have made it a bit slower. But
>>>> that should be gone now; we're back to two. Other than that, I am not
>>>> seeing anything abnormal in
>>>> https://grafana.wikimedia.org/dashboard/db/wikidata-query-service
>>>>
>>>>> [1] The deadline for the Int. Semantic Web Conf. is coming up, so it
>>>>> might be that someone is running experiments on the system to get their
>>>>> paper finished. It has been observed for other endpoints that traffic
>>>>> increases at such times. This community sometimes is the greatest enemy
>>>>> of its own technology ... (I recently had to IP-block an RDF crawler
>>>>> from one of my sites after it had ignored robots.txt completely.)
>>>>
>>>> We don't have any blocks or throttle mechanisms right now. But if we see
>>>> somebody making a serious negative impact on the service, we may have to
>>>> change that.
>
> --
> Markus Kroetzsch
> Faculty of Computer Science
> Technische Universität Dresden
> +49 351 463 38486
> http://korrekt.org/
>
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata

--
Addshore
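The quoted script reads the whole ~250M body into memory before printing it; for results that size, streaming straight to a file is gentler. A sketch using only the standard library (the function names and the chunk size are my own choices, not from the thread):

```python
import urllib.parse
import urllib.request

SPARQL_SERVICE_URL = 'https://query.wikidata.org/sparql'
query = 'SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }'

def build_url(endpoint, sparql_query):
    # Encode the query and requested result format as GET parameters,
    # matching how the quoted requests-based script calls the endpoint.
    params = urllib.parse.urlencode({'query': sparql_query, 'format': 'json'})
    return endpoint + '?' + params

def download(url, path, chunk_size=1 << 20):
    # Stream the response to disk in 1 MB chunks instead of
    # buffering the whole result in memory.
    with urllib.request.urlopen(url) as resp, open(path, 'wb') as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

If the server stops mid-stream as described above, `download` will still finish without an error; the truncation only shows up when you try to parse the saved file.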
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
