Hi Ted,

thank you for the explanation. As an example, imagine a term cloud in which terms are displayed at different sizes: some terms are bigger than others because they are more likely than the other terms. I need those results for analysis. We want to compare different ML algorithms and methods, and/or combinations of them. But first I have to gain some basic knowledge about Mahout.
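To make the term-cloud idea concrete, here is a minimal sketch in plain Python. It is not Mahout code: the function name top_terms is made up for illustration, and the weights are placeholders, not real results. It ranks terms by weight and keeps the n highest, which is the kind of list I am after:

```python
import heapq


def top_terms(weights, n):
    """Return the n highest-weighted (term, weight) pairs, largest first."""
    # heapq.nlargest scans all entries but only keeps the n best candidates,
    # so the less important terms are dropped along the way.
    return heapq.nlargest(n, weights.items(), key=lambda item: item[1])


# Placeholder weights, for illustration only (not computed by any model).
weights = {
    "social": 1.0,
    "social media": 0.8,
    "social networking": 0.65,
    "social news": 0.6,
    "facebook": 0.5,
}

for term, weight in top_terms(weights, 3):
    print(term, weight)
```

With n larger than the number of terms, the sketch simply returns all terms in descending order of weight.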
For example, when I take the word 'social' as input, I'd like to get a result like this:

social 1.0
social media 0.8
social networking 0.65
social news 0.6
facebook 0.5
...

(Ignore the values; they are not correct, but they show what I need.)

The 20Newsgroup example uses the summary(int n) method to show the most likely categorisation of a term (--> the most important feature). I would like to have a list that also contains the second, third, and so on most important features. I imagine that, while the features are computed, only the most important ones are added to the list and the less important ones are rejected.

Thanks and regards,
David

2011/11/3 Ted Dunning

> There are no confidence values per se in the models computed by Mahout at
> this time.
>
> There are several issues here,
>
> 1) Naive Bayes doesn't have such a concept. 'Nuff said there.
>
> 2) SGD logistic regression could compute confidence intervals, but I am
> not quite sure how to do that with stochastic gradient descent.
>
> 3) in most uses of Mahout's logistic regression, the issues are data size
> and feature set size. Confidence values are typically used for selecting
> features which is typically not a viable strategy for problems with very
> large feature sets. That is what the L1 regularization is all about.
>
> 4) with an extremely large number of features, the noise on confidence
> intervals makes them very hard to understand
>
> 5) with hashed features and feature collisions it is hard enough to
> understand which feature is doing what, much less what the confidence
> interval means.
>
> Can you say more about your problem? Is it small enough to use bayesglm in
> R?
>
> On Thu, Nov 3, 2011 at 7:25 AM, David Rahman wrote:
>
> > Me again,
> >
> > can someone point me to right direction? How can I access these features?
> > I looked into the summary(int n) -method located in
> > org.apache.mahout.classifier.sgd.ModelDissector.java, but somehow I don't
> > understand how it works.
> >
> > Could someone explain to me how it works? As I understand it, it returns
> > just the max-value of a feature.
> >
> > Thanks and regards,
> > David
> >
> > 2011/10/20 David Rahman
> >
> > > Hi,
> > >
> > > how can I access the confidence values of one (or more) feature(s) with
> > > its possibilities?
> > >
> > > In the 20Newsgroup-example, there is the dissect method, within there is
> > > used summary(int n), which returns the n most important features with their
> > > weights. I want also the features which are placed second or third (or
> > > more). How can I access those?
> > >
> > > Regards,
> > > David
