Many thanks, I may need some time to digest the information you provided... :-)
Have a nice weekend,

On Fri, Nov 9, 2012 at 3:34 PM, Dmitriy Lyubimov <[email protected]> wrote:

> No, SGD (stochastic gradient descent) and factorization are two different
> things. More strictly, they belong to two different classes of problems --
> factorization and regression. SGD is one implementation for
> regression/classification. Factorization is finding virtual factors in a
> user/item space (ALS-WR is one of the methods to find such factors).
>
> Yes, SGD is in the book, but not with your example specifically, since I
> meant to apply it after you find latent variables (factors, whatever).
>
> You will get more help on the ALS-WR method by staying on the list, and
> perhaps also create an archive entry for others to follow in a similar
> situation. The idea is that we all learn together and effectively :) (and I
> score more points for support :)
>
> CVB (if I am not totally off) is the collapsed variational Bayes
> implementation of LDA (Latent Dirichlet Allocation), which may help
> you analyze the content of your web pages IF you manage to grab the text
> off of them. In Mahout, it is provided by the package here:
>
> https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/clustering/lda/cvb/package-summary.html
>
> I don't know where exactly the wiki help on CVB is, but searching the Mahout
> archive and Stack Overflow may help. Again, by staying on the list you may
> get more help with that.
>
> LSA (latent semantic analysis) is another way to analyze the content of
> your web pages. See the Wikipedia article for a refresher, but basically it
> is a run of SVD over tf-idf of unigrams, bigrams etc. Mahout has a general
> pipeline to prepare that content data with the seqdirectory and seq2sparse
> commands (again, you can find details in the book). Then you just run
> 'mahout ssvd <options>' on the output of seq2sparse and use the rows of the
> U*Sigma output as the topical allocation values. Somebody will probably
> correct me on this, but I think you can use the topical allocation values
> to further build your classification with regressions (SGD).
>
> -d
>
> On Fri, Nov 9, 2012 at 1:11 PM, qiaoresearcher <[email protected]> wrote:
>
> > Hi Dmitriy,
> >
> > Many thanks for your comments. I really appreciate them, although I
> > think I may not have fully understood you.
> >
> > As I understand it, SGD means stochastic gradient descent, is that right?
> > What I need now is some example code to: read the files, construct the
> > web page set, then form the vectors. Such steps are called
> > 'factorization' in Mahout, right?
> >
> > Do you mean Mahout in Action has examples similar to what I described?
> > What are CVB, LSA, and SSVD (singular value decomposition?)
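
For anyone finding this thread in the archive: to make the SGD side of the distinction above concrete, here is a minimal sketch in plain Python (not Mahout code) of logistic regression fitted with stochastic gradient descent. The feature vectors and labels are made up for illustration; in the setup discussed above they could be the latent factors or topical allocation values computed in an earlier step. The names train_sgd and predict, the learning rate, and the epoch count are all assumptions of this sketch, not anything from Mahout or the book.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_sgd(rows, labels, epochs=200, lr=0.5, seed=0):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the examples in a new random order each epoch
        for i in order:
            x, y = rows[i], labels[i]
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            # One update per example: this is the "stochastic" part.
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(weights, bias, x):
    score = bias + sum(w * v for w, v in zip(weights, x))
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical 2-dimensional feature vectors, e.g. two latent factors per page.
rows = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.1], [0.1, 0.9], [0.2, 0.8], [0.1, 0.7]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_sgd(rows, labels)
predictions = [predict(weights, bias, x) for x in rows]
```

Note that nothing here finds factors: the sketch only fits a classifier to vectors that already exist, which is why factorization (ALS-WR) or SVD would come first and SGD second.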
