On 30.09.2016 20:47, Denny Vrandečić wrote:
> Markus, do you have access to the corresponding HTTP request logs? The
> fields there might be helpful (although I might be overly optimistic
> about it).

Yes, we can access all logs. For bot-based queries, this should be very
helpful indeed. Still, I can think of several cases where this won't help
much:

* People writing a quick Python (or whatever) script to run thousands of
queries without setting a meaningful user agent (a minimal sketch of what
would already help follows below this list).
* Web applications like Reasonator or SQID that cause the client to run
SPARQL queries when viewing a page (in this case, the user agent that gets
logged is that of the user's browser).
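
For illustration, the first case could already be addressed by sending a
descriptive User-Agent header with each request; a minimal Python sketch
(tool name, contact address, and example query are placeholders, and the
endpoint is the one quoted below in this thread):

import requests

WDQS_ENDPOINT = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"

# A short signature telling operators who runs the batch and how to reach them.
HEADERS = {
    "User-Agent": "ExampleBatchTool/0.1 (https://example.org/tool; someone@example.org)",
    "Accept": "application/sparql-results+json",
}

def run_query(query):
    """Send one SPARQL query to WDQS and return the parsed JSON results."""
    response = requests.get(WDQS_ENDPOINT, params={"query": query}, headers=HEADERS)
    response.raise_for_status()
    return response.json()

results = run_query("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 5")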

But, yes, we will definitely look at all signals that we can get from the data.

Best,

Markus




On Fri, Sep 30, 2016 at 11:38 AM Yuri Astrakhan
<yastrak...@wikimedia.org> wrote:

    I guess I qualify for #2 several times:
    * The <mapframe> and <maplink> tags support access to the geoshapes
    service, which in turn can make requests to WDQS. For example, see
    https://en.wikipedia.org/wiki/User:Yurik/maplink (click on
    "governor's link").

    * The <graph> wiki tag supports the same geoshapes service, as well
    as direct queries to WDQS. This graph uses both (one query to get all
    countries, the other to get the list of disasters):
    https://www.mediawiki.org/wiki/Extension:Graph/Demo/Sparql/Largest_disasters

    * There has also been some discussion about allowing direct WDQS
    querying from maps, e.g. to draw points of interest based on Wikidata
    (very easy to implement, but we should be careful to cache it properly).

    Since all these queries are made from either Node.js or our
    JavaScript, we could attach extra headers, such as X-Analytics, which
    Varnish already handles. The Node.js queries could also set the
    user agent string.
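
    For illustration, whichever client makes the call, this boils down to
    adding a couple of request headers; a rough Python sketch of the idea
    (the real callers are Node.js/JavaScript, and the X-Analytics keys and
    values below are hypothetical, not an agreed convention):

    import requests

    # Identify the calling service and tag the request for the analytics
    # pipeline; X-Analytics carries ";"-separated key=value pairs.
    headers = {
        "User-Agent": "geoshapes-service/1.0 (https://example.org/contact)",
        "X-Analytics": "client=geoshapes;purpose=map-data",
        "Accept": "application/sparql-results+json",
    }
    response = requests.get(
        "https://query.wikidata.org/bigdata/namespace/wdq/sparql",
        params={"query": "SELECT ?item WHERE { ?item wdt:P31 wd:Q515 } LIMIT 5"},
        headers=headers,
    )
    response.raise_for_status()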


    On Fri, Sep 30, 2016 at 10:44 AM Markus Kroetzsch
    <markus.kroetz...@tu-dresden.de> wrote:

        On 30.09.2016 16:18, Andra Waagmeester wrote:
        > Would it help if I add the following header to every large
        > batch of queries?
        >
        > #######
        > # access: (http://query.wikidata.org or
        > #   https://query.wikidata.org/bigdata/namespace/wdq/sparql?query={SPARQL})
        > # contact: email, account name, twitter name, etc.
        > # bot: True/False
        > # .........
        > #######

        This is already more detailed than what I had in mind. Having a way
        to tell apart bots and tools from "organic" queries would already
        be great. We are mainly looking for something that will help us to
        understand sudden peaks of activity. For this, it might be enough
        to have a short signature (a URL could be given, but a tool name
        with a version would also be fine). This is somewhat like the
        "user agent" field in HTTP.

        But you are right that some formatting convention may help further
        here. How about this:

        #TOOL:<any user agent information that you like to share>

        Then one could look for comments of this form without knowing all
        the tools upfront. Of course, this is just a hint in any case,
        since one could always use the same comment in any manually
        written query.
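
        For example, a tool could simply prepend such a comment to every
        query it sends; a minimal Python sketch (the tool name and version
        are placeholders):

        # Prepend a "#TOOL:" signature comment so the query can be
        # recognised in the logs; "#" starts a comment in SPARQL.
        TOOL_SIGNATURE = "#TOOL:ExampleTool/1.2\n"

        def sign_query(query):
            """Return the SPARQL query with the identifying comment prepended."""
            return TOOL_SIGNATURE + query

        signed = sign_query("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 3")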

        Best regards,

        Markus

        >
        > On Fri, Sep 30, 2016 at 4:00 PM, Markus Kroetzsch
        > <markus.kroetz...@tu-dresden.de> wrote:
        >
        >     Dear SPARQL users,
        >
        >     We are starting a research project to investigate the use of
        >     the Wikidata SPARQL Query Service, with the goal of gaining
        >     insights that may help to improve Wikidata and the query
        >     service [1]. Currently, we are still waiting for all data to
        >     become available. Meanwhile, we would like to ask for your
        >     input.
        >
        >     Preliminary analyses show that the use of the SPARQL query
        >     service varies greatly over time, presumably because power
        >     users and software tools are running large numbers of
        >     queries. For a meaningful analysis, we would like to
        >     understand such high-impact biases in the data. We therefore
        >     need your help:
        >
        >     (1) Are you a SPARQL power user who sometimes runs large
        >     numbers of queries (over 10,000)? If so, please let us know
        >     what your queries typically look like, so we can identify
        >     them in the logs.
        >
        >     (2) Are you the developer of a tool that launches SPARQL
        >     queries? If so, please let us know if there is any way to
        >     identify your queries.
        >
        >     If (1) or (2) applies to you, then it would be good if you
        >     could include an identifying comment in your SPARQL queries
        >     in the future, to make them easier to recognise. In return,
        >     this would enable us to provide you with statistics on the
        >     usage of your tool [2].
        >
        >     Further feedback is welcome.
        >
        >     Cheers,
        >
        >     Markus
        >
        >
        >     [1] https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries
        >
        >     [2] Pending permission from the WMF. Like all Wikimedia usage
        >     data, the query logs are under strict privacy protection, so
        >     we will need to get clearance before sharing any findings
        >     with the public. We hope, however, that there won't be any
        >     reservations against publishing non-identifying information.
        >
        >     --
        >     Prof. Dr. Markus Kroetzsch
        >     Knowledge-Based Systems Group
        >     Faculty of Computer Science
        >     TU Dresden
        >     +49 351 463 38486
        >     https://iccl.inf.tu-dresden.de/web/KBS/en
        >





_______________________________________________
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


