2011/9/15 eric konsirald <eric.konsir...@gmail.com>

> ... When you say that co-occurrence processing will not work the way i might
> like, you mean because those might be feature hashed independently and will
> hardly be linked together right?
>
No. I meant because you will have odd artifacts. For instance, if you use
two probes, then the two probes for a single item will appear to be a very
strong cooccurrence. Likewise, when items collide on a single probe, that
collision will make the cooccurrence appear as if it were real.

You can repair this somewhat by normalizing the encoding and then doing the
direct matrix multiplication. You will get some spreading of the
cooccurrence energy, but it should be mostly focused on the points you
want.

> maybe what i can do is to do an independent co-occurrence analysis and then
> inject the meaningful co-occurrences as a single token (each word separated
> e.g. by a '_') in the feature vector.
>

That is an excellent approach.

> - with SGD for instance, i'm more familiar seeing it at work as a learning
> algorithm for e.g. classification (referring to the 20NewsGroup example i
> saw in Mahout), but i'm not sure how to use it as a standalone optimization
> algorithm for minimizing a given objective function.

The SGD algorithm we have in Mahout uses gradient descent to optimize an
error criterion. If you compute the gradient of your own objective, you can
use it for whatever optimization you like. The sparse update stuff will all
change, but it should still work.

> For instance, to use
> the terminology from the paper i mentioned (
> http://www2008.org/papers/pdf/p1041-debnath.pdf ), if i'd like to find the
> optimized values for the weights ω0, ..., ωN given a set of equations of the
> form
>   ω0 + ω1 f(A1i, A1j) + ω2 f(A2i, A2j) + ... + ωN f(Ani, Anj) = E(Oi, Oj)
> is there any easy way to do that in Mahout?
>

Not directly, and how you want to do this would likely vary with the
definition of f.

The rest of your questions are all good. I can't really spare the time just
now (plane about to leave) to think through a good answer, but I think,
based on your questions, that you have some good momentum in a good
direction.
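
P.S. To make the "compute the gradient and use it for whatever optimization
you like" remark a bit more concrete, here is a rough sketch of fitting the
weights ω0, ..., ωN from the equations quoted above by plain stochastic
gradient descent on squared error. This is standalone Python, not Mahout's
API. It assumes the values f(Aki, Akj) have already been computed for each
item pair and stored as one row of numbers per pair; the names sgd_fit and
make_demo_data, the learning rate, and the synthetic data are all made up
for illustration.

```python
import random


def sgd_fit(rows, targets, epochs=300, lr=0.1, seed=0):
    """Fit w[0] + sum_k w[k] * row[k-1] ~= target by SGD on squared error.

    rows    -- one list per item pair holding the precomputed f(...) values
    targets -- the matching E(Oi, Oj) value for each pair
    Returns the weight list [w0, w1, ..., wN].
    """
    rng = random.Random(seed)
    n = len(rows[0])
    w = [0.0] * (n + 1)  # w[0] plays the role of the intercept ω0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the equations in a fresh random order
        for r in order:
            x = rows[r]
            pred = w[0] + sum(wk * xk for wk, xk in zip(w[1:], x))
            err = pred - targets[r]
            # Gradient of 0.5 * err**2 is err for w[0] and err * x[k-1] for w[k].
            w[0] -= lr * err
            for k, xk in enumerate(x, start=1):
                w[k] -= lr * err * xk
    return w


def make_demo_data(true_w, n_pairs=50, seed=1):
    """Build synthetic (hypothetical) similarity rows and exact targets."""
    rng = random.Random(seed)
    rows = [[rng.random() for _ in true_w[1:]] for _ in range(n_pairs)]
    targets = [
        true_w[0] + sum(wk * xk for wk, xk in zip(true_w[1:], row))
        for row in rows
    ]
    return rows, targets


if __name__ == "__main__":
    true_w = [0.5, 2.0, -1.0]
    rows, targets = make_demo_data(true_w)
    print(sgd_fit(rows, targets))
```

Because the model is linear in the weights, the gradient is trivial here. If
the weights entered f itself, the update lines would have to change to match
the gradient of that f, which is why the answer depends on how f is defined.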