Re: [Wikidata] SPARQL service timeouts

Addshore Tue, 19 Apr 2016 02:34:27 -0700

Also per https://phabricator.wikimedia.org/T126730 and
https://gerrit.wikimedia.org/r/#/c/274864/8 requests to the query service
are now cached for 60 seconds.
I expect this will include error results from timeouts, so retrying a
request within the same 60 seconds as the first won't even reach the WDQS
servers now.
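
For clients, that suggests waiting out the cache window before retrying. A
rough sketch of the idea, assuming the 60 second figure above holds: the
helper name fetch_json_with_retry, the 5 second safety margin and the
attempt count are made up for illustration and are not part of WDQS or of
any library. It also treats a body that does not parse as JSON (for example
a truncated one) as a failed attempt.

```python
import json
import time

# Cache lifetime mentioned above; the extra margin below is an arbitrary assumption.
CACHE_TTL_SECONDS = 60


def fetch_json_with_retry(fetch, attempts=3, wait=CACHE_TTL_SECONDS + 5,
                          sleep=time.sleep):
    """Call fetch() until its body parses as JSON.

    fetch is any zero-argument callable returning the response body as a
    string. Between attempts we sleep longer than the cache lifetime, so the
    retry is not answered with the same cached (possibly failed) response.
    Exceptions raised by fetch itself are not caught here.
    """
    last_error = None
    for attempt in range(attempts):
        if attempt > 0:
            sleep(wait)
        try:
            return json.loads(fetch())
        except ValueError as error:  # truncated or otherwise invalid JSON
            last_error = error
    raise RuntimeError("no complete JSON after %d attempts" % attempts) from last_error
```

With the requests library this could be called as
fetch_json_with_retry(lambda: requests.get(url, params=params).text).
Parsing the whole body in memory is only reasonable for moderately sized
results; a 250M response would need a streaming check instead.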


On 19 April 2016 at 10:05, Addshore <[email protected]> wrote:

> In the case we are discussing here the truncated JSON is caused by
> Blazegraph deciding it has been sending data for too long and then stopping
> (as I understand).
> Thus you will only see a spike on the graph for the amount of data
> actually sent from the server, not the size of the result Blazegraph was
> trying to send back.
>
> I also ran into this with some simple queries that returned big sets of
> data.
> Although with my issue I did actually also see a Java exception somewhere.
>
> On 18 April 2016 at 21:51, Markus Kroetzsch <
> [email protected]> wrote:
>
>> On 18.04.2016 22:21, Markus Kroetzsch wrote:
>>
>>> On 18.04.2016 21:56, Markus Kroetzsch wrote:
>>>
>>>> Thanks, the dashboard is interesting.
>>>>
>>>> I am trying to run this query:
>>>>
>>>> SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }
>>>>
>>>> It is supposed to return a large result set. But I am only running it
>>>> once per week. It used to work fine, but today I could not get it to
>>>> succeed a single time.
>>>>
>>>
>>> Actually, the query seems to work as it should. I am investigating why I
>>> get an error in some cases on my machine.
>>>
>>
>> Ok, I found that this is not so easy to reproduce reliably. The symptom I
>> am seeing is a truncated JSON response, which just stops in the middle of
>> the data (at a random location, but usually early on), and which is *not*
>> followed by any error message. The stream just ends.
>>
>> So far, I could only get this in Java, not in Python, and it does not
>> always happen. If successful, the result is about 250M in size. The
>> following Python script can retrieve it:
>>
>> import requests
>> SPARQL_SERVICE_URL = 'https://query.wikidata.org/sparql'
>> query = """SELECT ?subC ?supC WHERE { ?subC p:P279/ps:P279 ?supC }"""
>> # Send the query and print the raw JSON response body.
>> response = requests.get(SPARQL_SERVICE_URL,
>>                         params={'query': query, 'format': 'json'})
>> print(response.text)
>>
>> (output should be redirected to a file)
>>
>> I will keep an eye on the issue, but I don't know how to debug this any
>> further now, since it started to work without me changing any code.
>>
>> I also wonder how to read the dashboard after all. Although I repeated
>> an experiment that creates a 250M result file five times in the past few
>> minutes, the "Bytes out" figure remains below a few MB for most of the
>> time.
>>
>>
>> Markus
>>
>>
>>
>>>> On 18.04.2016 21:40, Stas Malyshev wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>>> I have the impression that some not-so-easy SPARQL queries that used to
>>>>>> run just below the timeout are now timing out regularly. Has there been
>>>>>> a change in the setup that may have caused this, or are we maybe seeing
>>>>>> increased query traffic [1]?
>>>>>>
>>>>>
>>>>> We've recently run on a single server for a couple of days due to
>>>>> reloading of the second one, so this may have made it a bit slower. But
>>>>> that should be gone now, we're back to two. Other than that, not seeing
>>>>> anything abnormal in
>>>>> https://grafana.wikimedia.org/dashboard/db/wikidata-query-service
>>>>>
>>>>>> [1] The deadline for the Int. Semantic Web Conf. is coming up, so it
>>>>>> might be that someone is running experiments on the system to get their
>>>>>> paper finished. It has been observed for other endpoints that traffic
>>>>>> increases at such times. This community sometimes is the greatest enemy
>>>>>> of its own technology ... (I recently had to IP-block an RDF crawler
>>>>>> from one of my sites after it had ignored robots.txt completely).
>>>>>>
>>>>>
>>>>> We don't have any blocks or throttle mechanisms right now. But if we see
>>>>> somebody having a serious negative impact on the service, we may have to
>>>>> change that.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>> --
>> Markus Kroetzsch
>> Faculty of Computer Science
>> Technische Universität Dresden
>> +49 351 463 38486
>> http://korrekt.org/
>>
>> _______________________________________________
>> Wikidata mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
>
> --
> Addshore
>



-- 
Addshore

_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
