GoranSMilovanovic added a comment.
@JAllemandou Awesome! You did a nice EDA here, and you analyzed both `event.wdqs_external_sparql_query` and `event.wdqs_internal_sparql_query`, while I focused only on the `external` source in my previous analyses.

So, we do need ML to be able to predict query processing time after all:

- if you take a look at my Report in T248308#6087571 <https://phabricator.wikimedia.org/T248308#6087571>, you will find that many features like query length, concurrency, etc. actually do contribute to query processing time **when** combined in XGBoost;
- **however**, exactly as your analyses show, taken in isolation from the other candidate features they do not show significant correlations with query processing time.

**Q**. I remember you mentioned somewhere (in a doc <https://docs.google.com/document/d/1hTxGVKRpv52SlULnva1ry60h0doOz5GafBTbYLsrbJg/edit#heading=h.5nfw48isoprv> shared with @Addshore, I guess) that you used Apache Jena ARQ to parse the queries, probably to obtain algebraic representations of SPARQL and extract some features from them; do we have Jena installed somewhere on the stat100* servers?

Maybe we should meet to discuss our analyses at some point, if you find some time.

TASK DETAIL
  https://phabricator.wikimedia.org/T248308

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GoranSMilovanovic
Cc: Samantha_Alipio_WMDE, MGerlach, JAllemandou, Lucas_Werkmeister_WMDE, Simon_Villeneuve, dcausse, Jakob_WMDE, Gehel, Addshore, Lydia_Pintscher, WMDE-leszek, Aklapper, darthmon_wmde, CBogen, Akuckartz, Nandana, Namenlos314, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs