Re: log-likelihood ratio value in item similarity calculation

2013-04-12 Thread Phoenix Bai
I got 168, because I used log base 2 instead of e. If memory serves, I read in the definition of entropy that people normally use base 2, so I just assumed it was 2 in the code (my bad). And now I have a better understanding, so thank you both for the explanation. On Fri, Apr 12, 2013 at

Re: log-likelihood ratio value in item similarity calculation

2013-04-12 Thread Sean Owen
Yes that's true, it is more usually bits. Here it's natural log / nats. Since it's unnormalized anyway, another constant factor doesn't hurt, and it means not having to change the base. On Fri, Apr 12, 2013 at 8:01 AM, Phoenix Bai baizh...@gmail.com wrote: I got 168, because I use log base 2
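Sean's point that the base is only a constant factor can be sketched directly. The code below mirrors the usual entropy-based LLR computation (the class name and the sample counts are illustrative, not taken from the thread); dividing the nats value by ln 2 gives bits, so the ratio between the two is always the same constant.

```java
// Sketch: an LLR score computed in nats differs from the same score in
// bits only by the constant factor 1/ln(2), whatever the counts are.
public class LlrBase {
    // k * ln(k), with the convention 0 * ln(0) = 0
    static double xLogX(double x) { return x == 0 ? 0.0 : x * Math.log(x); }

    // unnormalized entropy of a set of counts, in nats
    static double entropy(double... counts) {
        double sum = 0, parts = 0;
        for (double c : counts) { parts += xLogX(c); sum += c; }
        return xLogX(sum) - parts;
    }

    // raw log-likelihood ratio for a 2x2 contingency table, in nats
    static double llr(double k11, double k12, double k21, double k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matEntropy = entropy(k11, k12, k21, k22);
        return 2.0 * (rowEntropy + colEntropy - matEntropy);
    }

    public static void main(String[] args) {
        double nats = llr(13, 1000, 1000, 100000); // illustrative counts
        double bits = nats / Math.log(2);          // base change: one constant
        // bits / nats == 1 / ln(2) ~= 1.4427, independent of the counts
        System.out.printf("nats=%.2f bits=%.2f ratio=%.4f%n",
                nats, bits, bits / nats);
    }
}
```

A perfectly independent table scores zero in either base, which is why an unnormalized score tolerates the extra constant.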

Java Code for PCA

2013-04-12 Thread Chirag Lakhani
I am having trouble understanding whether the following code is sufficient for running PCA. I have a sequence file of dense vectors that I am calling, and then I am trying to run the following code: SSVDSolver pcaFactory = new SSVDSolver(conf, new Path(vectorsFolder), new
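As background for the question (and independent of the SSVDSolver arguments, which are truncated above): PCA is an SVD of the *mean-centered* data, so an SVD of the raw vectors is not by itself PCA. A minimal plain-Java sketch of the centering step, with illustrative names and no Mahout dependency:

```java
// Illustrative only: PCA requires subtracting the column mean from each
// row before any SVD/SSVD step is meaningful as PCA.
public class CenterForPca {
    static double[][] center(double[][] rows) {
        int n = rows.length, d = rows[0].length;
        double[] mean = new double[d];
        for (double[] row : rows)
            for (int j = 0; j < d; j++) mean[j] += row[j] / n;
        double[][] centered = new double[n][d];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < d; j++) centered[i][j] = rows[i][j] - mean[j];
        return centered; // the SVD of this matrix yields the principal components
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {3, 4}, {5, 12}};
        double[][] c = center(x);
        // each column of the centered matrix sums to zero
        System.out.println(c[0][0] + c[1][0] + c[2][0]);
    }
}
```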

Re: cross recommender

2013-04-12 Thread Pat Ferrel
That looks like the best shortcut. It is one of the few places where the rows of one and the columns of the other are seen together. Now I know why you transpose the first input :-) But I have begun to wonder whether it is the right thing to do for a cross recommender, because you are

Re: Java Code for PCA

2013-04-12 Thread Dmitriy Lyubimov
No, this is not right. I will explain later when I have a moment. On Apr 12, 2013 8:08 AM, Chirag Lakhani clakh...@zaloni.com wrote: I am having trouble understanding whether the following code is sufficient for running PCA. I have a sequence file of dense vectors that I am calling and then I

Re: Java Code for PCA

2013-04-12 Thread Dmitriy Lyubimov
On Fri, Apr 12, 2013 at 8:42 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: No, this is not right. I will explain later when I have a moment. On Apr 12, 2013 8:08 AM, Chirag Lakhani clakh...@zaloni.com wrote: I am having trouble understanding whether the following code is sufficient for

Re: log-likelihood ratio value in item similarity calculation

2013-04-12 Thread Ted Dunning
The only virtue of using the natural base is that you get a nice asymptotic distribution for random data. On Fri, Apr 12, 2013 at 1:10 AM, Sean Owen sro...@gmail.com wrote: Yes that's true, it is more usually bits. Here it's natural log / nats. Since it's unnormalized anyway another

Re: cross recommender

2013-04-12 Thread Ted Dunning
Log-likelihood similarity is a bit of a force-fit of the concept of the LLR. It is basically a binarizing and sparsifying filter applied to cooccurrence counts. As such, it is eminently suited to implementation using a matrix multiply. On Fri, Apr 12, 2013 at 8:35 AM, Pat Ferrel
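Ted's description — cooccurrence counts from a matrix multiply, then LLR as a binarizing and sparsifying filter — can be sketched as follows. Dense arrays are used for clarity; the class name and threshold are illustrative, not Mahout's implementation.

```java
// Sketch: item-item cooccurrence as A^T * A for a binary user-item matrix,
// then a 2x2 LLR test per pair used as a binarizing/sparsifying filter.
public class LlrCooccurrence {
    static double xLogX(double x) { return x == 0 ? 0 : x * Math.log(x); }

    static double entropy(double... k) {
        double s = 0, parts = 0;
        for (double c : k) { parts += xLogX(c); s += c; }
        return xLogX(s) - parts;
    }

    static double llr(double k11, double k12, double k21, double k22) {
        return 2 * (entropy(k11 + k12, k21 + k22)
                  + entropy(k11 + k21, k12 + k22)
                  - entropy(k11, k12, k21, k22));
    }

    // items x items cooccurrence counts from a binary users x items matrix
    static int[][] cooccur(int[][] a) {
        int items = a[0].length;
        int[][] c = new int[items][items];
        for (int[] row : a)
            for (int i = 0; i < items; i++)
                if (row[i] != 0)
                    for (int j = 0; j < items; j++)
                        c[i][j] += row[i] * row[j];
        return c;
    }

    // keep only pairs whose LLR clears the threshold: a binary indicator.
    // For binary input, c[i][i] is the number of users who touched item i.
    static boolean[][] indicator(int[][] c, int totalUsers, double threshold) {
        int n = c.length;
        boolean[][] keep = new boolean[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                if (i == j) continue;
                double k11 = c[i][j];                       // both items
                double k12 = c[i][i] - k11;                 // i without j
                double k21 = c[j][j] - k11;                 // j without i
                double k22 = totalUsers - k11 - k12 - k21;  // neither
                keep[i][j] = llr(k11, k12, k21, k22) > threshold;
            }
        return keep;
    }
}
```

For the cross-recommender case discussed in this thread, the same shape applies with A^T * B over two different interaction matrices instead of A^T * A.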

Feature reduction for LibLinear weights

2013-04-12 Thread Ken Krugler
Hi all, We're (ab)using LibLinear (linear SVM) as a multi-class classifier, with 200+ labels and 400K features. This results in a model that's 800MB, which is a bit unwieldy. Unfortunately, LibLinear uses a full array of weights (nothing sparse), being a port from the C version. I could do
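One option for the dense-weights problem described above, sketched here: prune near-zero weights into a per-label sparse map and score against the input's nonzeros. The class, threshold, and sizes are all illustrative; this is not part of LibLinear's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: shrink a dense labels x features weight matrix by dropping
// near-zero weights, keeping the survivors in a sparse per-label map.
public class PruneWeights {
    static List<Map<Integer, Double>> sparsify(double[][] w, double eps) {
        List<Map<Integer, Double>> sparse = new ArrayList<>();
        for (double[] labelWeights : w) {
            Map<Integer, Double> m = new HashMap<>();
            for (int f = 0; f < labelWeights.length; f++)
                if (Math.abs(labelWeights[f]) > eps)
                    m.put(f, labelWeights[f]);
            sparse.add(m);
        }
        return sparse;
    }

    // scoring with the sparse form: dot product over the input's nonzeros
    static double score(Map<Integer, Double> w, Map<Integer, Double> x) {
        double s = 0;
        for (Map.Entry<Integer, Double> e : x.entrySet())
            s += w.getOrDefault(e.getKey(), 0.0) * e.getValue();
        return s;
    }
}
```

Whether pruning is safe depends on the weight distribution; with L1 or heavily regularized training most weights are near zero and the memory saving can be large, at some cost in scoring accuracy near the decision boundary.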