In the case we are discussing here, the truncated JSON is caused by Blazegraph deciding it has been sending data for too long and then stopping (as I understand it). Thus you will only see a spike on the graph for the amount of data actually sent from the server, not the size of the result Blazegraph was trying to send back.
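Since the stream just ends with no error marker, one quick way to tell a complete response from a truncated one is to try parsing the saved body as JSON; a cut-off body will not parse. A minimal sketch (the helper name is my own, not from the thread):

```python
import json

def is_complete_json(text):
    # A response cut off mid-stream will not parse as valid JSON,
    # so a parse failure is a cheap truncation check.
    try:
        json.loads(text)
        return True
    except ValueError:
        return False
```

This only proves the body is well-formed JSON, of course, not that the result set is what the query should return.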
I also ran into this with some simple queries that returned big result sets, although in my case I did also see a Java exception somewhere.

On 18 April 2016 at 21:51, Markus Kroetzsch <[email protected]> wrote:

> On 18.04.2016 22:21, Markus Kroetzsch wrote:
>
>> On 18.04.2016 21:56, Markus Kroetzsch wrote:
>>
>>> Thanks, the dashboard is interesting.
>>>
>>> I am trying to run this query:
>>>
>>> SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }
>>>
>>> It is supposed to return a large result set. But I am only running it
>>> once per week. It used to work fine, but today I could not get it to
>>> succeed a single time.
>>
>> Actually, the query seems to work as it should. I am investigating why I
>> get an error in some cases on my machine.
>
> Ok, I found that this is not so easy to reproduce reliably. The symptom I
> am seeing is a truncated JSON response, which just stops in the middle of
> the data (at a random location, but usually early on), and which is *not*
> followed by any error message. The stream just ends.
>
> So far, I could only get this in Java, not in Python, and it does not
> always happen. If successful, the result is about 250M in size. The
> following Python script can retrieve it:
>
> import requests
> SPARQL_SERVICE_URL = 'https://query.wikidata.org/sparql'
> query = """SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }"""
> print requests.get(SPARQL_SERVICE_URL, params={'query': query,
>                    'format': 'json'}).text
>
> (output should be redirected to a file)
>
> I will keep an eye on the issue, but I don't know how to debug this any
> further now, since it started to work without me changing any code.
>
> I also wonder how to read the dashboard after all. In spite of me
> repeating an experiment that creates a 250M result file five times in
> the past few minutes, the "Bytes out" figure remains below a few MB for
> most of the time.
>
> Markus
>
>>> On 18.04.2016 21:40, Stas Malyshev wrote:
>>>
>>>> Hi!
>>>>
>>>>> I have the impression that some not-so-easy SPARQL queries that used to
>>>>> run just below the timeout are now timing out regularly. Has there been
>>>>> a change in the setup that may have caused this, or are we maybe seeing
>>>>> increased query traffic [1]?
>>>>
>>>> We've recently run on a single server for a couple of days due to
>>>> reloading of the second one, so this may have made it a bit slower. But
>>>> that should be gone now; we're back to two. Other than that, I am not
>>>> seeing anything abnormal in
>>>> https://grafana.wikimedia.org/dashboard/db/wikidata-query-service
>>>>
>>>>> [1] The deadline for the Int. Semantic Web Conf. is coming up, so it
>>>>> might be that someone is running experiments on the system to get their
>>>>> paper finished. It has been observed for other endpoints that traffic
>>>>> increases at such times. This community sometimes is the greatest enemy
>>>>> of its own technology ... (I recently had to IP-block an RDF crawler
>>>>> from one of my sites after it had ignored robots.txt completely.)
>>>>
>>>> We don't have any blocks or throttle mechanisms right now. But if we see
>>>> somebody making a serious negative impact on the service, we may have to
>>>> change that.
>
> --
> Markus Kroetzsch
> Faculty of Computer Science
> Technische Universität Dresden
> +49 351 463 38486
> http://korrekt.org/
>
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata

--
Addshore
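The quoted script reads the whole ~250M body into memory before printing it; for results that size, streaming straight to a file is gentler. A sketch using only the standard library (the function names and the chunk size are my own choices, not from the thread):

```python
import urllib.parse
import urllib.request

SPARQL_SERVICE_URL = 'https://query.wikidata.org/sparql'
query = 'SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }'

def build_url(endpoint, sparql_query):
    # Encode the query and requested result format as GET parameters,
    # matching how the quoted requests-based script calls the endpoint.
    params = urllib.parse.urlencode({'query': sparql_query, 'format': 'json'})
    return endpoint + '?' + params

def download(url, path, chunk_size=1 << 20):
    # Stream the response to disk in 1 MB chunks instead of
    # buffering the whole result in memory.
    with urllib.request.urlopen(url) as resp, open(path, 'wb') as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
```

If the server stops mid-stream as described above, `download` will still finish without an error; the truncation only shows up when you try to parse the saved file.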
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
