[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-18 Thread GoranSMilovanovic
GoranSMilovanovic closed this task as "Resolved".

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, 
Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, 
WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, 
Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, 
Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-02 Thread Maintenance_bot
Maintenance_bot removed a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-02 Thread gerritbot
gerritbot added a project: Patch-For-Review.



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-02 Thread gerritbot
gerritbot added a comment.


  Change 617863 had a related patch set uploaded (by GoranSMilovanovic; owner: 
GoranSMilovanovic):
  [analytics/wmde/WD/WD_HumanEdits@master] T248308
  
  https://gerrit.wikimedia.org/r/617863



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-02 Thread gerritbot
gerritbot added a comment.


  Change 617863 **merged** by GoranSMilovanovic:
  [analytics/wmde/WD/WD_HumanEdits@master] T248308
  
  https://gerrit.wikimedia.org/r/617863



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-08-01 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Lydia_Pintscher We forgot to mention this task in our recent 1:1. In the 
meantime, I've tested a 10% sample of daily queries, and the statistics from 
the smaller, previously used 1% sample turn out to be quite representative. 
However, if tabulation (e.g. counts, average query response times, and similar 
statistics per user agent) is really all that we need here, then we do not 
need to sample anything at all: we can just let PySpark do it in the Analytics 
Cluster and track everything back to some point in the past.
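  A minimal sketch of the kind of check described above: compare summary 
statistics of a 1% and a 10% random sample against the full set. The data here 
are synthetic (log-normal draws standing in for query processing times), and 
`bernoulli_sample` is a hypothetical helper; independent per-query selection is 
an assumption, not necessarily how the original samples were drawn.

```python
import random
import statistics


def bernoulli_sample(values, fraction, rng):
    """Keep each value independently with probability `fraction`."""
    return [v for v in values if rng.random() < fraction]


rng = random.Random(42)

# Synthetic stand-in for one day of query processing times (not real WDQS data).
population = [rng.lognormvariate(0.0, 1.0) for _ in range(200_000)]

sample_1pct = bernoulli_sample(population, 0.01, rng)
sample_10pct = bernoulli_sample(population, 0.10, rng)

# Print size, mean and median for each sample and for the full set.
for name, values in [("1%", sample_1pct), ("10%", sample_10pct), ("all", population)]:
    print(name, len(values),
          round(statistics.mean(values), 3),
          round(statistics.median(values), 3))
```

  On synthetic data of this size, the means and medians of both samples would 
be expected to land close to those of the full set, which is the sense in 
which a smaller sample can be called representative.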



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-24 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Lydia_Pintscher You're welcome.
  
  > We should get this list once a quarter or so to find new uses of our data
  
  It is perfectly doable. Let's discuss this on Monday and see precisely which 
data and statistics we want to have reported regularly.



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-24 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.


  In T248308#6324161, @GoranSMilovanovic wrote:
  
  > @Lydia_Pintscher
  >
  > Let's see if there is anything interesting here:
  >
  > F31943519: ref_user_agent_sample.csv
  
  There is definitely something interesting there! (We should get this list 
once a quarter or so to find new uses of our data ;-)
  Thanks!



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @JAllemandou Superfine. Enjoy your holidays!



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread JAllemandou
JAllemandou added a comment.


  @GoranSMilovanovic I have indeed done some analysis using the Apache Jena 
parser to extract algebraic representations of queries. It is not yet at the 
level of completion I would like, though. I'll be on holidays until August 
15th, starting tonight; let's discuss when I come back?



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @JAllemandou Awesome! You did a nice EDA here, and you've analyzed both 
`event.wdqs_external_sparql_query` and `event.wdqs_internal_sparql_query`, 
while I've focused only on the `external` source in my previous analyses...
  
  So, we do need ML to be able to predict query processing time after all:
  
  - if you take a look at my Report in T248308#6087571, you will find that 
many features like query length, concurrency, etc. actually do contribute to 
query processing time **when** combined in XGBoost;
  - **however**, exactly as your analyses show, taken in isolation from other 
candidate features they do not show significant correlations with query 
processing times themselves.
  
  **Q**. I remember you've mentioned somewhere (in a doc shared with 
@Addshore, I guess) that you've used Apache Jena ARQ to parse the queries, 
probably to obtain algebraic representations of SPARQL and extract some 
features from them; do we have Jena installed somewhere on the stat100* 
servers?
  
  Maybe we should meet to discuss our analyses at some point, if you find 
some time.
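  The pattern described above, features that are uninformative in isolation 
but useful in combination, can be illustrated with a toy example. This is 
plain Python on made-up data, not the XGBoost model from the report; `x1`, 
`x2` and `y` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Two made-up binary features and a target that depends only on their product.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [a * b for a, b in zip(x1, x2)]

print(pearson(x1, y))  # 0.0: x1 alone is uncorrelated with y
print(pearson(x2, y))  # 0.0: so is x2

# The interaction of the two features determines y completely.
interaction = [a * b for a, b in zip(x1, x2)]
print(pearson(interaction, y))  # 1.0
```

  A tree ensemble can pick up interactions of this kind, which is one reason 
pairwise correlations with the target can understate a feature's usefulness.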



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-22 Thread JAllemandou
JAllemandou added a comment.


  @GoranSMilovanovic I finally published a wiki page with most of the results I 
found: https://wikitech.wikimedia.org/wiki/User:Joal/WDQS_Traffic_Analysis
  Sorry for the delay ...



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-21 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Lydia_Pintscher There is absolutely no correlation between
  
  (a) how often a particular `user_agent` value appears, and
  (b) the mean or median WDQS processing time for that `user_agent`'s SPARQL 
queries.
  
  We can search for particular `user_agents` with a high average query 
processing time, or study the variability of query processing times per 
`user_agent`; however, as a feature in a predictive model... probably not.
  
  Again, I will take a look at a larger sample of queries.
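  One way such a check could be run on a table like 
`ref_user_agent_sample.csv` is a rank correlation between `count` and 
`mean_time`. The sketch below is an assumption about method (the comment does 
not say which coefficient was used), and the numbers are made up for 
illustration only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-user-agent rows: (count, mean_time).
counts = [120_000, 45_000, 9_000, 700, 35, 2]
mean_times = [310.0, 95.0, 1200.0, 40.0, 880.0, 150.0]
print(spearman(counts, mean_times))
```

  A rank-based coefficient is used here because user-agent counts are usually 
heavily skewed, so a few very frequent agents would dominate a plain Pearson 
correlation.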



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-21 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @Lydia_Pintscher
  
  Let's see if there is anything interesting here:
  
  F31943519: ref_user_agent_sample.csv
  
  Data:
  
  - it is produced from a sample of SPARQL queries from 
`event.wdqs_external_sparql_query`,
  - time span: 20 June to 20 July,
  - sampling: 1% of the queries observed each day were randomly selected,
  - the final dataset has 2,097,070 queries;
  - the `user_agent` fields and query processing times were extracted to 
produce the `ref_user_agent_sample.csv` dataset;
  - only `user_agents` that appear at least //twice// in the sample were kept.
  
  Columns in `ref_user_agent_sample.csv`:
  
  - `count`: how many times this `user_agent` was observed in the sample
  - `mean_time`: mean query processing time for this `user_agent`
  - `median_time`: median query processing time for this `user_agent`
  - `min_time`: minimum query processing time for this `user_agent`
  - `max_time`: maximum query processing time for this `user_agent`
  - `range_time`: maximum minus minimum query processing time for this 
`user_agent`
  - `percent_count`: the percentage of all queries in the sample that this 
`user_agent` accounts for (note: computed after the infrequent `user_agents` 
were filtered out, as described above).
  
  The dataset is sorted in decreasing order of `count`.
  
  I will now test a larger sample (an order of magnitude larger, or more) to 
see how much it would cost us to process it.
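  The columns described above could be computed along these lines. This is a 
minimal sketch in plain Python, not the code that produced 
`ref_user_agent_sample.csv`; `tabulate` and its `min_count` parameter are 
hypothetical, and the input is assumed to be `(user_agent, time)` pairs.

```python
import statistics
from collections import defaultdict


def tabulate(records, min_count=2):
    """Summarise query processing times per user agent.

    `records` is an iterable of (user_agent, time) pairs. Agents seen
    fewer than `min_count` times are dropped before percentages are computed.
    """
    times_by_agent = defaultdict(list)
    for agent, time in records:
        times_by_agent[agent].append(time)

    kept = {a: ts for a, ts in times_by_agent.items() if len(ts) >= min_count}
    total = sum(len(ts) for ts in kept.values())

    rows = []
    for agent, ts in kept.items():
        rows.append({
            "user_agent": agent,
            "count": len(ts),
            "mean_time": statistics.mean(ts),
            "median_time": statistics.median(ts),
            "min_time": min(ts),
            "max_time": max(ts),
            "range_time": max(ts) - min(ts),
            "percent_count": 100.0 * len(ts) / total,
        })

    # Sort in decreasing order of count, as in the dataset description.
    rows.sort(key=lambda row: row["count"], reverse=True)
    return rows


# Made-up records for illustration.
example = [("a", 1.0), ("a", 3.0), ("b", 2.0), ("c", 5.0), ("c", 5.0), ("c", 8.0)]
for row in tabulate(example):
    print(row)
```

  In this example agent `b` appears only once and is dropped, so the 
percentages are taken over the five remaining queries.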



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @JAllemandou Got it, thanks.



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread JAllemandou
JAllemandou added a comment.


  SELECT
    http.request_headers['user-agent'],
    user_agent_map,
    count(1) AS c
  FROM event.wdqs_external_sparql_query
  WHERE year = 2020 AND month = 5 AND day = 1
  GROUP BY
    http.request_headers['user-agent'],
    user_agent_map
  ORDER BY c DESC
  LIMIT 100;



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @JAllemandou However...
  
0: jdbc:hive2://an-coord1001.eqiad.wmnet:1000> select user_agent_map from 
event.wdqs_external_sparql_query where year = 2020 and month = 5 and day = 1 
limit 10;
going to print operations logs
printed operations logs
Getting log thread is interrupted, since query is done!
INFO  : Compiling 
command(queryId=hive_20200715080808_74c55818-8e11-44d6-94b7-cfb516d24001): 
select user_agent_map from event.wdqs_external_sparql_query where year = 2020 
and month = 5 and day = 1 limit 10
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: 
Schema(fieldSchemas:[FieldSchema(name:user_agent_map, type:map, 
comment:null)], properties:null)
INFO  : Completed compiling 
command(queryId=hive_20200715080808_74c55818-8e11-44d6-94b7-cfb516d24001); Time 
taken: 0.082 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing 
command(queryId=hive_20200715080808_74c55818-8e11-44d6-94b7-cfb516d24001): 
select user_agent_map from event.wdqs_external_sparql_query where year = 2020 
and month = 5 and day = 1 limit 10
INFO  : Completed executing 
command(queryId=hive_20200715080808_74c55818-8e11-44d6-94b7-cfb516d24001); Time 
taken: 0.0 seconds
INFO  : OK
user_agent_map
{"os_family":"Other","os_major":"-","os_minor":"-","browser_major":"-","browser_family":"Other","device_family":"Other","wmf_app_version":"-"}
{"os_family":"Other","os_major":"-","os_minor":"-","browser_major":"-","browser_family":"Other","device_family":"Other","wmf_app_version":"-"}
{"os_family":"Other","os_major":"-","os_minor":"-","browser_major":"-","browser_family":"Other","device_family":"Other","wmf_app_version":"-"}
{"os_family":"Other","os_major":"-","os_minor":"-","browser_major":"-","browser_family":"Other","device_family":"Other","wmf_app_version":"-"}
{"os_family":"Other","os_major":"-","os_minor":"-","browser_major":"-","browser_family":"Other","device_family":"Other","wmf_app_version":"-"}
{"os_family":"Windows","os_major":"10","os_minor":"-","browser_major":"81","browser_family":"Chrome","device_family":"Other","wmf_app_version":"-"}
{"os_family":"Other","os_major":"-","os_minor":"-","browser_major":"-","browser_family":"Other","device_family":"Other","wmf_app_version":"-"}
{"os_family":"Other","os_major":"-","os_minor":"-","browser_major":"-","browser_family":"Other","device_family":"Other","wmf_app_version":"-"}
{"os_family":"Other","os_major":"-","os_minor":"-","browser_major":"-","browser_family":"Other","device_family":"Other","wmf_app_version":"-"}
{"os_family":"Other","os_major":"-","os_minor":"-","browser_major":"-","browser_family":"Other","device_family":"Other","wmf_app_version":"-"}
10 rows selected (0.159 seconds)
  
  Tested for `2020/06/01` too, with similar results.



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread GoranSMilovanovic
GoranSMilovanovic added a comment.


  @JAllemandou Please see T248308#6080150. I also see that 
`event.wdqs_external_sparql_query` includes the `user_agent_map`, so yes, I 
will go for it and not for `wmf.webrequest`.
  
  > I have done some work encompassing user-agent frequency analysis and I'm 
in the process of writing up the findings by the end of this week.
  
  Nice, thank you, please share your work with us.
  
  If we are both already analyzing the WDQS responses, why don't we meet (with 
@Lydia_Pintscher, @darthmon_wmde and other interested parties) to exchange and 
discuss our findings?



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-15 Thread JAllemandou
JAllemandou added a comment.


  > First step: analyze the frequency distribution of the user_agent field 
(string) from wmf.webrequest where queries are SPARQL.
  
  I suggest you use events instead of webrequest: 
`event.wdqs_internal_sparql_query` and `event.wdqs_external_sparql_query`.
  
  I have done some work encompassing user-agent frequency analysis and I'm in 
the process of writing up the findings by the end of this week.



[Wikidata-bugs] [Maniphest] T248308: Analyse a small sample of the most often used query patterns on WDQS

2020-07-14 Thread GoranSMilovanovic
GoranSMilovanovic reopened this task as "Open".
GoranSMilovanovic added a comment.


  - Re-opening the task to address the question of automated vs. non-automated 
SPARQL queries observed at the WDQS end-point.
  - Reference: WMDE in-house email and Google Meet discussions with 
@darthmon_wmde and @Lydia_Pintscher.
  - First step: analyze the frequency distribution of the `user_agent` field 
(string) from `wmf.webrequest` where queries are SPARQL.
  - Second step: proceed towards a working definition of a non-automated WDQS 
query.
