After discussions on this list last year, I decided to try categorizing queries on the list with a view to training a Bayes classifier for an autoresponder. I now have about 5 weeks' worth of /queries/ (not replies) sent to the list, which I've been classifying somewhat crudely, and thought people might be interested in the stats.
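
For anyone wondering what "training a Bayes classifier" on hand-labelled queries involves, here is a minimal sketch in Python. It is not the classifier used for these stats; it is a generic multinomial naive Bayes with add-one smoothing, and the class name, tokenizer, category labels and sample messages are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase word tokens; a deliberately crude choice for this sketch.
    return re.findall(r"[a-z0-9$]+", text.lower())


class NaiveBayes:
    """Hypothetical minimal multinomial naive Bayes text classifier."""

    def __init__(self):
        self.doc_counts = Counter()               # messages seen per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocab = set()                        # every word seen in training

    def train(self, text, category):
        self.doc_counts[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        words = tokenize(text)
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            # Log prior: the share of training messages in this category.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            for word in words:
                # Add-one (Laplace) smoothing so unseen words never give log(0).
                count = self.word_counts[category][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best, best_score = category, score
        return best


# Invented example messages, not taken from the list archive.
nb = NaiveBayes()
nb.train("how do I download and install the dictionary", "installation")
nb.train("setup fails during install on my laptop", "installation")
nb.train("cannot open a word doc file saved from office", "compatibility")
nb.train("excel spreadsheet formatting is lost when I save as xls", "compatibility")

print(nb.classify("install stops halfway through setup"))
```

Because the score starts from the log prior, a category holding most of the training messages wins by default unless the words in a query point strongly elsewhere.
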
The categories are necessarily more than a bit subjective, and a few queries were ambiguous; the categories are the ones that seemed to me to be relevant to newcomers.

  158  installation (incl. download, dictionaries, removal)
   95  M$ compatibility (not Vista issues)
   24  foreign language submissions
   18  selling OOo
    9  Vista compatibility
    7  envelope printing
  424  uncategorized (may need to split)

giving a total of around 740 queries.

Oh, and the classifier doesn't work at all well. Probably needs much more data.

--
http://www.scottsonline.org.uk lists incoming sites blocked because of spam
[EMAIL PROTECTED]
Mike Scott, Harlow, Essex, England
