Re: Detecting rank-deficiency, or worse, via QR decomposition

2013-04-08 Thread Sean Owen
PS I think the issue is really more like this, after some more testing. When lambda (overfitting parameter) is high, the X and Y in the factorization A = X*Y' are forced to have a small (Frobenius) norm. They underfit A, potentially a lot, if lambda is high; the values of A are always small and

Re: Detecting rank-deficiency, or worse, via QR decomposition

2013-04-08 Thread Koobas
Okay, it sheds some light on the problem. Thanks for sharing. On Mon, Apr 8, 2013 at 4:33 AM, Sean Owen sro...@gmail.com wrote: PS I think the issue is really more like this, after some more testing. When lambda (overfitting parameter) is high, the X and Y in the factorization A = X*Y' are
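
The effect described in the quoted message (a high lambda forcing X and Y to a small norm, so that X*Y' underfits A) can be reproduced on a toy problem. The sketch below is not Mahout code and is not taken from the thread: it assumes a squared-error objective with an L2 penalty, lambda * (||x||^2 + ||y||^2), restricts the factorization to rank 1, and uses a hypothetical 3x2 matrix. The function names are made up for illustration.

```python
import math

def rank1_als(A, lam, iters=200):
    """Alternating least squares for A ~ x * y' with an L2 penalty lam (assumed objective).

    With lam == 0 this assumes the factors never become all-zero.
    """
    m, n = len(A), len(A[0])
    x = [1.0] * m
    y = [1.0] * n
    for _ in range(iters):
        # Fix y, solve the ridge-regularized least squares problem for each x_i.
        yy = sum(v * v for v in y) + lam
        x = [sum(A[i][j] * y[j] for j in range(n)) / yy for i in range(m)]
        # Fix x, solve for each y_j in the same way.
        xx = sum(v * v for v in x) + lam
        y = [sum(A[i][j] * x[i] for i in range(m)) / xx for j in range(n)]
    return x, y

def fit_report(A, lam):
    """Return (Frobenius norm of x*y', Frobenius norm of the residual A - x*y')."""
    x, y = rank1_als(A, lam)
    recon = math.sqrt(sum((xi * yj) ** 2 for xi in x for yj in y))
    resid = math.sqrt(sum((A[i][j] - x[i] * y[j]) ** 2
                          for i in range(len(A)) for j in range(len(A[0]))))
    return recon, resid

# Hypothetical rank-1 input, so lam = 0 can fit it exactly.
A = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
for lam in (0.0, 2.0, 10.0):
    recon, resid = fit_report(A, lam)
    print(f"lambda={lam}: ||x*y'||_F={recon:.4f}  ||A - x*y'||_F={resid:.4f}")
```

As lambda grows, the norm of the reconstruction shrinks and the residual grows, which is the underfitting described above. A real recommender would use a higher rank and sparse data, so this only illustrates the direction of the effect.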

Re: Integrating Mahout with existing nlp libraries

2013-04-08 Thread Ted Dunning
This sounds like the best suggestion so far. On Apr 3, 2013, at 8:45 AM, Julien Nioche wrote: This is typically what Behemoth can be used for https://github.com/DigitalPebble/behemoth. It has a Mahout module to generate vectors at the same format as SparseVectorsFromSequenceFiles.

Re: I believe the TanimotoSimilarity scorer actually uses the Jaccard similarity measure

2013-04-08 Thread Ted Dunning
I don't see the problem here. We only want to compare two items so Jaccard and Tanimoto are identical. Could you file a JIRA and suggest a javadoc patch? Why did this take you to an ancient journal instead of Wikipedia? On Apr 7, 2013, at 6:54 AM, James Endicott wrote: As far as I can

Re: I believe the TanimotoSimilarity scorer actually uses the Jaccard similarity measure

2013-04-08 Thread James Endicott
I didn't want to file a suggestion for a javadoc patch without hearing from someone who knows a bit more about the math history behind it because I didn't want to suggest something that may be in error. When I checked the Wikipedia article on it, the article noted that there was confusion an
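
For readers following this thread, the claim that the two measures coincide when items are compared by set membership can be checked numerically. This is a minimal sketch, not Mahout's TanimotoSimilarity implementation: it assumes the usual vector form of the Tanimoto coefficient, a.b / (|a|^2 + |b|^2 - a.b), applied to 0/1 vectors, and the set form of the Jaccard index, |A ∩ B| / |A ∪ B|. The sample item sets are made up.

```python
def jaccard(a, b):
    """Jaccard index of two sets: size of intersection over size of union."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)

def tanimoto(u, v):
    """Tanimoto coefficient of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    denom = sum(x * x for x in u) + sum(y * y for y in v) - dot
    if denom == 0:
        return 0.0  # both vectors are all-zero
    return dot / denom

def to_binary(items, universe):
    """Encode a set as a 0/1 vector over an ordered universe of ids."""
    return [1 if i in items else 0 for i in universe]

# Hypothetical data: ids of the users who interacted with two items.
users_of_item_a = {1, 2, 3}
users_of_item_b = {2, 3, 4, 5}
universe = sorted(users_of_item_a | users_of_item_b)

j = jaccard(users_of_item_a, users_of_item_b)
t = tanimoto(to_binary(users_of_item_a, universe),
             to_binary(users_of_item_b, universe))
print(j, t)
```

On 0/1 vectors the dot product counts the intersection and each squared norm counts a set's size, so the two formulas reduce to the same ratio. They differ only once the vectors carry non-binary weights.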

In-memory kmeans clustering

2013-04-08 Thread Ahmet Ylmaz
Hi, It seems that in-memory kmeans clustering was removed from Mahout 0.7. Does this mean that it is no longer possible to do in-memory kmeans clustering with Mahout? Or is Hadoop-based kmeans clustering the only option? Thanks Ahmet

Re: cross recommender

2013-04-08 Thread Ted Dunning
On Sat, Apr 6, 2013 at 3:26 PM, Pat Ferrel p...@occamsmachete.com wrote: I guess I don't understand this issue. In my case both the item ids and user ids of the separate DistributedRow Matrix will match and I know the size for the entire space from a previous step where I create id maps. I

Re: I believe the TanimotoSimilarity scorer actually uses the Jaccard similarity measure

2013-04-08 Thread Ted Dunning
To my mind, you as the reader have a major voice here. So if you were confused/not happy with the doc, then there is a problem. You will know best how to fix that when you get done. So let us know how! On Mon, Apr 8, 2013 at 2:16 PM, James Endicott endicott.ja...@gmail.com wrote: I didn't