No, SGD (stochastic gradient descent) and factorization are two different things. More strictly, they address two different classes of problems: factorization and regression. SGD is one way to implement regression/classification. Factorization is finding virtual (latent) factors in a user/item space; ALS-WR is one of the methods for finding such factors.
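To make the distinction concrete, here is a minimal pure-Python sketch, not Mahout code. It first factorizes a tiny ratings matrix with a rank-1 simplification of ALS with weighted-lambda regularization, then trains a logistic regression with SGD on the resulting item factors. The ratings, the item labels, and all function names are hypothetical and exist only for illustration.

```python
import math
import random

# Hypothetical toy ratings[user][item]; None marks an unobserved rating.
RATINGS = [
    [1.0, 2.0, 4.0, 5.0],
    [1.0, None, 4.0, 4.0],
    [2.0, 2.0, None, 5.0],
]
# Hypothetical labels for the four items (e.g. 1 = "relevant page").
ITEM_LABELS = [0, 0, 1, 1]


def squared_error(ratings, user_f, item_f):
    """Sum of squared errors over the observed ratings only."""
    return sum(
        (r - user_f[u] * item_f[i]) ** 2
        for u, row in enumerate(ratings)
        for i, r in enumerate(row)
        if r is not None
    )


def als_rank1(ratings, lam=0.1, iterations=20):
    """Factorization step: rank-1 ALS, lambda weighted by rating counts."""
    n_users, n_items = len(ratings), len(ratings[0])
    user_f, item_f = [1.0] * n_users, [1.0] * n_items
    for _ in range(iterations):
        # Fix item factors, solve each user's one-dimensional least squares.
        for u in range(n_users):
            obs = [(i, r) for i, r in enumerate(ratings[u]) if r is not None]
            num = sum(r * item_f[i] for i, r in obs)
            den = sum(item_f[i] ** 2 for i, _ in obs) + lam * len(obs)
            user_f[u] = num / den
        # Fix user factors, solve each item's one-dimensional least squares.
        for i in range(n_items):
            obs = [(u, ratings[u][i]) for u in range(n_users)
                   if ratings[u][i] is not None]
            num = sum(r * user_f[u] for u, r in obs)
            den = sum(user_f[u] ** 2 for u, _ in obs) + lam * len(obs)
            item_f[i] = num / den
    return user_f, item_f


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sgd_logistic(features, labels, lr=0.5, epochs=500, seed=0):
    """Regression step: logistic regression on one feature, trained by SGD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in random order
        for k in order:
            grad = sigmoid(w * features[k] + b) - labels[k]
            w -= lr * grad * features[k]
            b -= lr * grad
    return w, b


# Step 1: factorization finds the latent factors.
user_factors, item_factors = als_rank1(RATINGS)

# Step 2: SGD fits a classifier that uses those factors as features.
mean_factor = sum(item_factors) / len(item_factors)
centered = [f - mean_factor for f in item_factors]
w, b = sgd_logistic(centered, ITEM_LABELS)
predictions = [sigmoid(w * x + b) for x in centered]

print("item factors:", item_factors)
print("predicted probabilities:", predictions)
```

The two functions solve different problems: `als_rank1` never sees the labels, and `sgd_logistic` never sees the ratings. A real setup would use more than one factor per item and a held-out set for evaluation.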
Yes, SGD is in the book, but not with your example specifically, since I meant for you to apply it after you find the latent variables (factors, whatever).

You will get more help on the ALS-WR method by staying on the list, and perhaps you will also create an archive entry for others to follow in a similar situation. The idea is that we all learn together and effectively :) (and I score more points for support :)

CVB (if I am not totally off) is the collapsed variational Bayes implementation of LDA (Latent Dirichlet Allocation), which may help you analyze the content of your web pages IF you manage to grab the text off of them. In Mahout, it is provided by this package:
https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/clustering/lda/cvb/package-summary.html

I don't know where exactly the wiki help on CVB is, but searching the Mahout archive and Stack Overflow may help. Again, by staying on the list you may get more help with that.

LSA (latent semantic analysis) is another way to analyze the content of your web pages. See the Wikipedia article for a refresher, but basically it is a run of SVD over the tf-idf of unigrams, bigrams, etc. Mahout has a general pipeline to prepare that content data with the seqdirectory and seq2sparse commands (again, you can find details in the book). Then you just run 'mahout ssvd <options>' on the output of seq2sparse and use the rows of the U*Sigma output as the topical allocation values.

Somebody will probably correct me on this, but I think you can use the topical allocation values to further build your classification with regressions (SGD).

-d

On Fri, Nov 9, 2012 at 1:11 PM, qiaoresearcher <[email protected]> wrote:

> Hi Dmitriy,
>
> Many thanks for your comments. I really appreciate them, although I think I
> may not have fully understood you.
>
> As I understand it, SGD means stochastic gradient descent, is that right?
> What I need now is some example code to read the files, construct the
> web page set, and then form the vectors. Such steps are called 'factorization'
> in Mahout, right?
>
> Do you mean Mahout in Action has examples similar to what I described?
> What are CVB, LSA, and SSVD (singular value decomposition?)
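As a footnote to the LSA description in the reply above, here is a minimal pure-Python sketch of the idea: build tf-idf vectors, take a truncated SVD, and read each document's topical values from the rows of U*Sigma. This is not Mahout code. The corpus and function names are hypothetical, only unigrams are used, and plain power iteration with deflation stands in for Mahout's stochastic SVD (ssvd), which is a different algorithm meant for large inputs.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for text scraped from web pages.
DOCS = [
    "mahout runs machine learning on hadoop",
    "hadoop stores data for machine learning",
    "the recipe needs butter and sugar",
    "bake the sugar and butter cake",
]


def tfidf_matrix(docs):
    """Return (vocabulary, rows), one tf-idf row per document (unigrams)."""
    tokenized = [d.split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    doc_freq = Counter(t for doc in tokenized for t in set(doc))
    rows = []
    for doc in tokenized:
        tf = Counter(doc)
        rows.append([tf[t] * math.log(len(docs) / doc_freq[t]) for t in vocab])
    return vocab, rows


def top_singular_triplet(a, iterations=200):
    """Largest singular value and vectors of a, via power iteration on A^T A."""
    n_rows, n_cols = len(a), len(a[0])
    v = [1.0] * n_cols
    for _ in range(iterations):
        av = [sum(x * y for x, y in zip(row, v)) for row in a]
        atav = [sum(a[r][c] * av[r] for r in range(n_rows)) for c in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in atav))
        v = [x / norm for x in atav]
    av = [sum(x * y for x, y in zip(row, v)) for row in a]
    sigma = math.sqrt(sum(x * x for x in av))
    u = [x / sigma for x in av]
    return sigma, u, v


def lsa_document_coordinates(a, k=2):
    """Rows of U*Sigma for the top-k singular triplets (one row per document)."""
    residual = [row[:] for row in a]
    coords = [[] for _ in a]
    for _ in range(k):
        sigma, u, v = top_singular_triplet(residual)
        for r in range(len(a)):
            coords[r].append(sigma * u[r])
        # Deflate: remove the component just found before the next pass.
        residual = [
            [residual[r][c] - sigma * u[r] * v[c] for c in range(len(v))]
            for r in range(len(a))
        ]
    return coords


def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))


vocab, tfidf = tfidf_matrix(DOCS)
doc_coords = lsa_document_coordinates(tfidf, k=2)
for text, coord in zip(DOCS, doc_coords):
    print(coord, text)
```

Documents that share vocabulary end up pointing in similar directions in the reduced space, so these rows could then serve as input features for a classifier such as the SGD regression mentioned in the reply.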
