Smalyshev added a comment.

Here's how I see the process for releasing WDQS query data (see also T183020: Investigate the possibility to release Wikidata queries):

  1. WDQS logs are placed in a separate partition in Hive.
  2. We create a pipeline that parses these logs and produces sanitized logs containing successful queries, with the following fields:
    • Timestamp
    • Sanitized query, as described in T183020, probably with the additional provisions mentioned in the comments there
    • Sanitized user agent, as described in T183020
    • Bot flag
    • Session-hashed client IP (i.e., the same IP produces the same hash in the short term, but not necessarily across the whole data set)
    • Possibly geocoded data, i.e. country etc. (probably nothing more specific than that)
    • Time to first byte
    • Response size
    • Referer class (external/internal)
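
To make the "session-hashed client IP" idea concrete, here is a minimal sketch, not an agreed design. It assumes a keyed HMAC over the IP combined with a time-window index, so the same IP hashes identically within one window but differently in the next. The key, the 24-hour window, and the field names of the record type are all hypothetical. A stricter variant would generate a fresh random key per window and discard it afterwards, so that even the operators could not re-link hashes across windows.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Hypothetical secret key; it must never be published with the data set.
SECRET_KEY = b"replace-with-a-private-random-key"


def session_hash_ip(ip: str, ts: datetime, window_hours: int = 24) -> str:
    """Keyed hash of `ip` that is stable only within one time window.

    The window index is mixed into the HMAC message, so the same IP maps to
    the same hash inside a window and (with overwhelming probability) to a
    different hash in any other window.
    """
    window = int(ts.timestamp()) // (window_hours * 3600)
    msg = f"{window}|{ip}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()[:16]


@dataclass(frozen=True)
class SanitizedQueryRecord:
    """Hypothetical schema mirroring the fields listed in step 2."""
    timestamp: datetime
    query: str                    # sanitized per T183020
    user_agent: str               # sanitized per T183020
    is_bot: bool
    client_hash: str              # output of session_hash_ip
    country: Optional[str]        # coarse geocoding only, may be absent
    time_to_first_byte_ms: int    # unit is an assumption
    response_size_bytes: int
    referer_external: bool        # referer class: external vs internal


t = datetime(2018, 1, 1, 10, tzinfo=timezone.utc)
rec = SanitizedQueryRecord(t, "SELECT ...", "sanitized-UA", False,
                           session_hash_ip("192.0.2.1", t), "DE", 120, 2048, True)
```

With this scheme, two requests from 192.0.2.1 at 10:00 and 15:00 UTC on the same day share a hash, while a request two days later does not.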

This data can still be sensitive, but less so than the raw source data, so it can be shared with researchers after appropriate vetting and NDA procedures.

  3. From the data set above, we would create another data set that includes:
    • Timestamp
    • Sanitized query, from above
    • (?) External/internal flag
    • (?) Bot flag
    • Additional tagging by the items (Q-ids) and properties (P-ids) used in the query, so people can see usage of specific properties.
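
The Q-id/P-id tagging could start from something as simple as the following sketch, which assumes a regex heuristic rather than a full SPARQL parser. It picks up ids that follow a prefix colon (wd:Q42, wdt:P31) or an IRI path slash (<http://www.wikidata.org/entity/Q42>). Ids inside string literals or comments that happen to follow ':' or '/' would also be tagged, so a real pipeline might prefer to walk a parsed query instead. The record type and function names are hypothetical; the two "(?)" fields are optional.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import FrozenSet, Optional, Set, Tuple

# Heuristic: a Q or P id immediately preceded by ':' (prefixed name) or '/'
# (full IRI), ending at a word boundary.
_ENTITY_ID = re.compile(r"(?<=[:/])([QP][1-9][0-9]*)\b")


def extract_ids(query: str) -> Tuple[Set[str], Set[str]]:
    """Return (items, properties) referenced in a sanitized SPARQL query."""
    ids = set(_ENTITY_ID.findall(query))
    items = {i for i in ids if i.startswith("Q")}
    props = {i for i in ids if i.startswith("P")}
    return items, props


@dataclass(frozen=True)
class PublicQueryRecord:
    """Hypothetical schema for the openly published data set (step 3)."""
    timestamp: datetime
    query: str
    is_external: Optional[bool]   # "(?)" field, may be omitted
    is_bot: Optional[bool]        # "(?)" field, may be omitted
    items: FrozenSet[str]
    properties: FrozenSet[str]


def to_public(timestamp: datetime, sanitized_query: str,
              is_external: Optional[bool] = None,
              is_bot: Optional[bool] = None) -> PublicQueryRecord:
    """Project a sanitized record down to the public fields plus id tags."""
    items, props = extract_ids(sanitized_query)
    return PublicQueryRecord(timestamp, sanitized_query, is_external, is_bot,
                             frozenset(items), frozenset(props))
```

For example, a query containing `?item wdt:P31 wd:Q5` would be tagged with the item Q5 and the property P31.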

This data set could be periodically published openly.

I would like to hear comments on this idea.


TASK DETAIL
https://phabricator.wikimedia.org/T143819


To: Smalyshev
Cc: PokestarFan, Nuria, Lydia_Pintscher, mkroetzsch, leila, debt, thiemowmde, Jonas, Smalyshev, AndrewSu, Aklapper, I9606, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, Avner, Gehel, FloNight, Xmlizer, JAllemandou, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
