Hi!

> repository. They can use javascript and connect to the github api
> however, so it might be possible to build a useful browser interface
> that filtered based on the '-' separated name components. You could at

Exactly! I've (ab)used the GitHub API for a bit and found the following
(please correct me if the figures look weird; I trusted API output passed
through several quick-n-dirty scripts, so mistakes are possible):
- we have 1780 repos
- 900 of them are extensions, so putting extensions in a separate group
cuts the list roughly in half
- 102 repos do not have "-" in their name, meaning those would be the
hardest to classify
- 1016 have "mediawiki" prefix
- 243 are "operations"
- 76 are "analytics"
- 48 are "labs"
- 20 are "pywikibot"
- 18 are "thumbor"
- 18 are "integration"
- 15 are "apps"
- 10 are "phabricator"

The rest of the groups are smaller or don't immediately look like a good
unifying principle.
Thus, even with a very naive approach we could categorize at least 1464,
or over 80%, of the repos, leaving just about 300 non-obvious ones, of
which probably about 100-200 will end up in "other". Not ideal, but at
least better than 1780 :)
The list of names is at
https://gist.github.com/smalyshev/dfa72c79d9f750058262b902f47c0130 if
anybody wants to play with it and save time on extracting it :)
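
For anyone who wants to reproduce the counts, here is a rough sketch of
the naive prefix grouping described above. It is not the script I
actually ran; the function names, the min_size cutoff and the sample
repo names are made up for illustration, and it assumes the name list
has already been loaded as plain strings.

```python
from collections import Counter


def prefix_of(name, sep="-"):
    """Return the first sep-separated component, or None if name has no sep."""
    head, found, _ = name.partition(sep)
    return head if found else None


def group_by_prefix(names, min_size=10):
    """Count repos per prefix; lump small groups and unprefixed names into 'other'.

    min_size is a hypothetical cutoff for what counts as a useful group.
    """
    counts = Counter(prefix_of(n) for n in names)
    unprefixed = counts.pop(None, 0)  # names without any "-" at all
    groups = {p: c for p, c in counts.items() if c >= min_size}
    other = unprefixed + sum(c for c in counts.values() if c < min_size)
    return groups, other


if __name__ == "__main__":
    # Illustrative names only; the real input would be the list in the gist.
    sample = [
        "mediawiki-core",
        "mediawiki-extensions-Foo",
        "operations-puppet",
        "wikidata",
    ]
    groups, other = group_by_prefix(sample, min_size=2)
    print(groups, other)
```

Splitting only on the first "-" keeps this as naive as the numbers
above; separating extensions out would need a second pass on the
second name component.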
-- 
Stas Malyshev
[email protected]

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
