GoranSMilovanovic added a comment.
Update `Thu 09 Apr 2020 10:19:24 PM UTC`:
- cross-validation of XGBoost (`gbtree` booster) on a binary classification
problem ("typical" vs. "extreme outlier" server response times) started on
**stat1005**;
- using 9 data sets with a varying number of features (from fewer than 100 up to 2000);
- splitting test from train data for each data set;
- running `xgboost` internal cross-validation controls;
- cross-validating across: learning rate (`eta`, 4 levels), `subsample`
(proportion of rows used to build each tree, 4 levels), and `max_depth`
(maximum allowed tree depth, 4 levels);
- number of boosting iterations set to decrease monotonically as `eta` increases;
- keeping `colsample_bytree` (proportion of features used to build each tree)
fixed at .5;
- setting `max_delta_step` to 1 - documented to be useful for highly
unbalanced designs in binary classification (as ours is);
- model selection: ROC analysis, with AUC as the selection criterion.
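
A minimal Python sketch of how a grid like the one described above could be enumerated, and how AUC can be computed for model selection. The level values and the iteration counts are hypothetical: only the number of levels (4 each) and the rule that iterations decrease with `eta` are stated above. The `binary:logistic` objective and the helper names `build_grid` and `auc` are also assumptions made for illustration, not the code actually running on stat1005.

```python
from itertools import product

# Hypothetical levels: the update states only that there are 4 levels each.
ETA_LEVELS = [0.01, 0.05, 0.1, 0.3]
SUBSAMPLE_LEVELS = [0.25, 0.5, 0.75, 1.0]
MAX_DEPTH_LEVELS = [3, 6, 9, 12]

# Hypothetical iteration counts, chosen only so that they decrease as eta grows.
ROUNDS_BY_ETA = {0.01: 1000, 0.05: 500, 0.1: 250, 0.3: 100}

# Settings taken from the update, plus an assumed objective.
FIXED = {
    "booster": "gbtree",
    "objective": "binary:logistic",  # assumption: not stated in the update
    "eval_metric": "auc",
    "colsample_bytree": 0.5,
    "max_delta_step": 1,
}


def build_grid():
    """Return one entry per (eta, subsample, max_depth) combination."""
    grid = []
    for eta, subsample, max_depth in product(
        ETA_LEVELS, SUBSAMPLE_LEVELS, MAX_DEPTH_LEVELS
    ):
        params = dict(FIXED, eta=eta, subsample=subsample, max_depth=max_depth)
        grid.append({"params": params, "num_boost_round": ROUNDS_BY_ETA[eta]})
    return grid


def auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 0/1 values with at least one of each class; scores: predicted
    scores. Tied scores receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


if __name__ == "__main__":
    grid = build_grid()
    print(len(grid), "parameter combinations per data set")
    print("example AUC:", auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With the Python `xgboost` package, each grid entry could then be handed to `xgboost.cv(entry["params"], dtrain, num_boost_round=entry["num_boost_round"], nfold=...)`, where `dtrain` is a `DMatrix` built from the training split and the number of folds is left open here because the update does not state it.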
Resource consumption: 32 cores, approx. 15 GB RAM.
Estimated running time: 24 - 30 h.
TASK DETAIL
https://phabricator.wikimedia.org/T248308