Bawolff added a comment. |
Question: Looking at https://github.com/Wikidata/QueryAnalysis/blob/master/tools/extractAnonymized.py, at first glance, it looks like the string handline code wouldn't handle edge cases properly e.g.
"foo\"bar" "foo'bar"
?
(I only skimmed the code, may have misunderstood)
We have a list of query types ordered by frequency. However, there are millions of query types, and the most frequent are those created by bots. I can dig up a pointer to the local file where we have it, if this is what you want. If you are interested in a broader analysis of the data, you could take a look at a recent workshop paper of ours: https://iccl.inf.tu-dresden.de/web/Inproceedings3196/en
It has detailed statistics of SPARQL feature distributions and discusses some findings.Ok, that's good enough that you did that I think. The main point of the question was to ensure that you had a good idea of the type of data in the data set (i.e. We aren't just releasing data we've never actually looked at)
Cc: Bawolff, APalmer_WMF, Aklapper, DarTar, mkroetzsch, EBjune, Smalyshev, leila, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, Avner, dpatrick, ZhouZ, Luke081515, Mpaulson, Wikidata-bugs, aude, Capt_Swing, jayvdb, JanZerebecki, Slaporte, csteipp, Mbch331, Jay8g, Krenair, Legoktm
_______________________________________________ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs