Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2014-03-02 Thread Vishal Santoshi
Should we maintain a (num_categories * num_of_features) matrix for per-term learning rates in a num_categories-way classification? for (i = 0; i < num_categories; i++) { for (j = 0; j < num_of_features; j++) { sum_of_squares[i][j] = sum_of_squares[i][j]
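A minimal Java sketch of the matrix described above, assuming the standard AdaGrad rule: each (category, feature) cell keeps a running sum of squared gradients, and its learning rate is eta / (sqrt(sum) + epsilon). The class name `AdaGradMatrix`, the base rate `eta`, and the `epsilon` guard are hypothetical, not Mahout API.

```java
/**
 * Hypothetical sketch: per-(category, feature) AdaGrad state for a
 * num_categories-way linear classifier. Not part of Mahout.
 */
final class AdaGradMatrix {
  private final double[][] sumOfSquares; // running sum of squared gradients per cell
  private final double eta;              // base learning rate (assumed)
  private final double epsilon;          // guards against division by zero (assumed)

  AdaGradMatrix(int numCategories, int numFeatures, double eta, double epsilon) {
    this.sumOfSquares = new double[numCategories][numFeatures];
    this.eta = eta;
    this.epsilon = epsilon;
  }

  /** Adds the squared gradient of every cell to its running total. */
  void accumulate(double[][] gradient) {
    for (int i = 0; i < sumOfSquares.length; i++) {
      for (int j = 0; j < sumOfSquares[i].length; j++) {
        sumOfSquares[i][j] += gradient[i][j] * gradient[i][j];
      }
    }
  }

  /** AdaGrad per-term learning rate for category i, feature j. */
  double perTermLearningRate(int i, int j) {
    return eta / (Math.sqrt(sumOfSquares[i][j]) + epsilon);
  }
}
```

Note that this sketch indexes the rate by category as well as feature, so it stores num_categories * num_of_features doubles; for a binary model it collapses to a single vector.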

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2014-03-02 Thread Ted Dunning
Yes. I think that maintaining a learning rate for every parameter that is being learned is important. It might help to make that sparse, but I wouldn't think so. On Sun, Mar 2, 2014 at 1:33 PM, Vishal Santoshi vishal.santo...@gmail.com wrote: Should we maintain ( num_categories * num_of
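For comparison, a hypothetical sparse variant that stores only the (category, feature) cells that have received a non-zero gradient. It trades the dense array's direct indexing for a hash lookup, which pays off only when most cells never fire; as the reply suggests, a dense matrix is often fine. `SparseAdaGrad` and its key packing are illustrative, not Mahout API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: sparse AdaGrad accumulator keyed by (category, feature). */
final class SparseAdaGrad {
  private final Map<Long, Double> sumOfSquares = new HashMap<>();
  private final double eta;     // base learning rate (assumed)
  private final double epsilon; // guards against division by zero (assumed)

  SparseAdaGrad(double eta, double epsilon) {
    this.eta = eta;
    this.epsilon = epsilon;
  }

  // Packs two non-negative int indices into one long key.
  private static long key(int category, int feature) {
    return ((long) category << 32) | (feature & 0xffffffffL);
  }

  /** Adds g^2 for one cell; zero gradients are skipped so the map stays sparse. */
  void accumulate(int category, int feature, double g) {
    if (g != 0.0) {
      sumOfSquares.merge(key(category, feature), g * g, Double::sum);
    }
  }

  /** Cells never updated behave as if their running sum were zero. */
  double perTermLearningRate(int category, int feature) {
    double g2 = sumOfSquares.getOrDefault(key(category, feature), 0.0);
    return eta / (Math.sqrt(g2) + epsilon);
  }

  int storedCells() {
    return sumOfSquares.size();
  }
}
```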

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2014-02-28 Thread Ted Dunning
I have been swamped. Generally, AdaGrad is a great idea. The code looks fine at first glance. Certainly some sort of AdaGrad would be preferable to the hack that I put in. On Feb 26, 2014, at 18:30, Vishal Santoshi vishal.santo...@gmail.com wrote: Ted, Any

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2014-02-26 Thread Vishal Santoshi
Ted, any feedback? On Mon, Feb 24, 2014 at 2:58 PM, Vishal Santoshi vishal.santo...@gmail.com wrote: Hello Ted, This is regarding the AdaGrad update per feature. I have attached a file which reflects http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf (2). It does differ from

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2014-02-24 Thread Vishal Santoshi
Hello Ted, This is regarding the AdaGrad update per feature. I have attached a file which reflects http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf (2). It does differ from OnlineLogisticRegression in the way it implements public double perTermLearningRate(int j). This class
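A minimal Java sketch of what an AdaGrad-style `perTermLearningRate(int j)` could look like inside a binary logistic regression, assuming a dense d-dimensional weight vector and log loss. Only the method name `perTermLearningRate(int j)` comes from the thread; `AdaGradLogistic`, `eta`, and `EPSILON` are hypothetical, and this is neither the attached file nor Mahout's code.

```java
/**
 * Hypothetical sketch: binary logistic regression whose per-term
 * learning rate follows AdaGrad. Not Mahout's OnlineLogisticRegression.
 */
final class AdaGradLogistic {
  private static final double EPSILON = 1e-8; // avoids division by zero (assumed)
  private final double[] beta;         // model weights
  private final double[] gradSquares;  // running sum of squared gradients per term
  private final double eta;            // base learning rate (assumed)

  AdaGradLogistic(int d, double eta) {
    this.beta = new double[d];
    this.gradSquares = new double[d];
    this.eta = eta;
  }

  /** Depends on term j's own gradient history, not on a global step count. */
  double perTermLearningRate(int j) {
    return eta / (Math.sqrt(gradSquares[j]) + EPSILON);
  }

  /** Probability that x belongs to class 1. */
  double classify(double[] x) {
    double dot = 0.0;
    for (int j = 0; j < beta.length; j++) {
      dot += beta[j] * x[j];
    }
    return 1.0 / (1.0 + Math.exp(-dot));
  }

  /** One SGD step on log loss; actual is 0 or 1. */
  void train(int actual, double[] x) {
    double p = classify(x);
    for (int j = 0; j < beta.length; j++) {
      double g = (p - actual) * x[j]; // gradient of log loss w.r.t. beta[j]
      if (g == 0.0) {
        continue;                     // absent features leave their state untouched
      }
      gradSquares[j] += g * g;        // include the current gradient before stepping
      beta[j] -= perTermLearningRate(j) * g;
    }
  }
}
```

Because each rate comes from that term's own gradient history, a rarely seen feature keeps a comparatively large step size late in training, which a schedule driven only by the step count cannot provide.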

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2014-02-20 Thread Vishal Santoshi
Hey Ted, I presume that you would like an AdaGrad-like solution to replace the above? Things that I could glean: * Maintain a simple d-dimensional vector to store a running total of the squares of the gradients, where d is the number of terms. Call it *gradients*. *
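Written out, the per-term AdaGrad rule this list is heading toward is commonly stated as below, for logistic regression with label y_t in {0, 1} and prediction p_t = 1 / (1 + exp(-β_{t-1} · x_t)). Here G_{t,j} is the running total of squared gradients (the *gradients* vector above), η is a base learning rate and ε a small constant; η, ε, and the placement of ε are assumptions, and the linked note may differ in such details.

```latex
g_{t,j} = (p_t - y_t)\, x_{t,j}, \qquad
G_{t,j} = G_{t-1,j} + g_{t,j}^{2}, \qquad
\eta_{t,j} = \frac{\eta}{\sqrt{G_{t,j}} + \epsilon}, \qquad
\beta_{t,j} = \beta_{t-1,j} - \eta_{t,j}\, g_{t,j}
```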

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2014-02-20 Thread Vishal Santoshi
I do see that regularize has the prior (L1 and L2) depend on *perTermLearningRate(j))...* On Thu, Feb 20, 2014 at 11:49 AM, Vishal Santoshi vishal.santo...@gmail.com wrote: Hey Ted, I presume that you would like an AdaGrad-like solution to replace the above? Things that I could
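To see why swapping in AdaGrad also changes regularization, here is a hypothetical sketch of L2 and L1 shrinkage steps whose strength is scaled by a per-term rate. The class, method names, and the `lambda` parameter are illustrative; this is a generic proximal-style update, not a copy of Mahout's `regularize`.

```java
/** Hypothetical sketch: prior (regularization) steps scaled by a per-term rate. */
final class PerTermPrior {
  private PerTermPrior() {}

  /** L2 (Gaussian prior): multiplicative shrink toward zero, clamped so it never flips sign. */
  static double l2Step(double beta, double lambda, double perTermRate) {
    double shrink = Math.max(0.0, 1.0 - lambda * perTermRate);
    return beta * shrink;
  }

  /** L1 (Laplacian prior): soft-threshold, so small weights snap to exactly zero. */
  static double l1Step(double beta, double lambda, double perTermRate) {
    double threshold = lambda * perTermRate;
    return Math.signum(beta) * Math.max(0.0, Math.abs(beta) - threshold);
  }
}
```

Under this coupling, an AdaGrad rate that shrinks as a term's squared gradients accumulate also makes the per-step regularization of frequently updated terms gentler.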

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-12-29 Thread Ted Dunning
:-) Many leaks are *very* subtle. One leak that had me going for weeks was in a news wire corpus. I couldn't figure out why the cross validation was so good and running the classifier on new data was so much worse. The answer was that the training corpus had near-duplicate articles. This
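The news-wire leak suggests a general precaution: near-duplicates should land on the same side of any train/test split. A minimal Java sketch under that assumption follows; the word-shingle size, the Jaccard threshold, and the class name `DuplicateAwareFolds` are illustrative choices, not something from the thread or from Mahout. The pairwise comparison is O(n^2), fine for a demo but far too slow for a real corpus, where MinHash with locality-sensitive hashing is the usual substitute.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch: keep near-duplicate documents in the same fold. */
final class DuplicateAwareFolds {
  private DuplicateAwareFolds() {}

  /** Lower-cased word w-shingles; very short documents become one shingle. */
  static Set<String> shingles(String text, int w) {
    List<String> tokens = new ArrayList<>();
    for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
      if (!t.isEmpty()) {
        tokens.add(t);
      }
    }
    Set<String> out = new HashSet<>();
    for (int i = 0; i + w <= tokens.size(); i++) {
      out.add(String.join(" ", tokens.subList(i, i + w)));
    }
    if (out.isEmpty()) {
      out.add(String.join(" ", tokens));
    }
    return out;
  }

  static double jaccard(Set<String> a, Set<String> b) {
    Set<String> inter = new HashSet<>(a);
    inter.retainAll(b);
    int union = a.size() + b.size() - inter.size();
    return union == 0 ? 1.0 : (double) inter.size() / union;
  }

  // Union-find root lookup with path halving.
  private static int find(int[] parent, int i) {
    while (parent[i] != i) {
      parent[i] = parent[parent[i]];
      i = parent[i];
    }
    return i;
  }

  /** Fold index in [0, k) per document; near-duplicate groups share a fold. */
  static int[] assignFolds(List<String> docs, int k, double threshold) {
    int n = docs.size();
    List<Set<String>> sh = new ArrayList<>();
    for (String d : docs) {
      sh.add(shingles(d, 3));
    }
    int[] parent = new int[n];
    for (int i = 0; i < n; i++) {
      parent[i] = i;
    }
    for (int i = 0; i < n; i++) {
      for (int j = i + 1; j < n; j++) {
        if (jaccard(sh.get(i), sh.get(j)) >= threshold) {
          parent[find(parent, i)] = find(parent, j); // merge the two groups
        }
      }
    }
    int[] folds = new int[n];
    for (int i = 0; i < n; i++) {
      folds[i] = find(parent, i) % k; // every member of a group maps to its root's fold
    }
    return folds;
  }
}
```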

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-12-04 Thread optimusfan
We've been playing around with a number of different parameters, feature selection, etc. and are able to achieve pretty good results in cross-validation. When you say cross validation, do you mean the magic cross validation that the ALR uses? Or do you mean your 20%? I mean the 20%.

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-12-02 Thread Gokhan Capan
Gokhan On Thu, Nov 28, 2013 at 3:18 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Wed, Nov 27, 2013 at 7:07 AM, Vishal Santoshi vishal.santo...@gmail.com Are we to assume that SGD is still a work in progress and implementations (Cross Fold, Online, Adaptive) are too flawed to

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-12-02 Thread Ted Dunning
Inline On Mon, Dec 2, 2013 at 8:55 AM, optimusfan optimus...@yahoo.com wrote: ... To accomplish this, we used AdaptiveLogisticRegression and trained 46 binary classification models. Our approach has been to do an 80/20 split on the data, holding the 20% back for cross-validation of the
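With a 20% holdout per model, it helps if the split is reproducible, so that re-running training never moves a document from the holdout into the training set, and all 46 binary models can share one holdout. A minimal Java sketch, assuming each document has a stable string id: hash the id and hold out a fixed percentage of hash buckets. `HoldoutSplit` and `isHoldout` are hypothetical names; CRC32 is simply a convenient stable hash from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

/** Hypothetical sketch: stable train/holdout split keyed on a document id. */
final class HoldoutSplit {
  private HoldoutSplit() {}

  /** True if the id falls in the held-out percentage; the same id always gets the same answer. */
  static boolean isHoldout(String docId, int holdoutPercent) {
    CRC32 crc = new CRC32();
    crc.update(docId.getBytes(StandardCharsets.UTF_8));
    return crc.getValue() % 100 < holdoutPercent; // getValue() is non-negative
  }
}
```

For an 80/20 split one would call `isHoldout(id, 20)` with the same ids for every model.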

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-11-28 Thread Vishal Santoshi
Absolutely. I will read through. The idea is to first fix the learning rate update equation in OLR. I think this code in OnlineLogisticRegression is the current equation? @Override public double currentLearningRate() { return mu0 * Math.pow(decayFactor, getStep()) *
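Every factor visible in the quoted `currentLearningRate()` depends only on the global step count, not on which term is being updated. A tiny hypothetical sketch of just those visible factors makes the consequence concrete: with an assumed `decayFactor` of 0.999, the rate after 10,000 steps is roughly 4.5e-5 of `mu0` for every term, including one seen for the first time. The class name and parameter values are illustrative; the truncated remainder of the expression is left out rather than guessed.

```java
/**
 * Hypothetical sketch of only the visible factors of the quoted schedule:
 * mu0 * decayFactor^step. The archived snippet is cut off after the next
 * "*", so whatever follows is deliberately not modeled here.
 */
final class GlobalDecaySketch {
  private GlobalDecaySketch() {}

  /** Same value for every term j: only the global step count enters. */
  static double visibleFactors(double mu0, double decayFactor, long step) {
    return mu0 * Math.pow(decayFactor, step);
  }

  public static void main(String[] args) {
    // With an assumed decayFactor of 0.999, prints roughly 4.5e-5 after 10,000 steps.
    System.out.println(visibleFactors(1.0, 0.999, 10_000));
  }
}
```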

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-11-28 Thread Ted Dunning
Yes. Exactly. On Thu, Nov 28, 2013 at 6:32 AM, Vishal Santoshi vishal.santo...@gmail.com wrote: Absolutely. I will read through. The idea is to first fix the learning rate update equation in OLR. I think this code in OnlineLogisticRegression is the current equation? @Override

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-11-27 Thread Vishal Santoshi
Hell Ted, Are we to assume that SGD is still a work in progress and implementations (Cross Fold, Online, Adaptive) are too flawed to be realistically used? The evolutionary algorithm seems to be the core of OnlineLogisticRegression, which in turn builds up to Adaptive/Cross Fold. b) for truly

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-11-27 Thread Vishal Santoshi
Sorry to spam; I never meant the "Hello" to come out as "Hell". Given a little disappointment in the mail, I figured I would rather spam than be misunderstood. On Wed, Nov 27, 2013 at 10:07 AM, Vishal Santoshi vishal.santo...@gmail.com wrote: Hell Ted, Are we to assume that SGD is still a work in

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-11-27 Thread Ted Dunning
No problem at all. Kind of funny. On Wed, Nov 27, 2013 at 7:08 AM, Vishal Santoshi vishal.santo...@gmail.com wrote: Sorry to spam, I never meant the Hello to come out as Hell. Given a little disappointment in the mail, I figure I rather spam than be misunderstood, On Wed, Nov 27, 2013

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-11-27 Thread Ted Dunning
On Wed, Nov 27, 2013 at 7:07 AM, Vishal Santoshi vishal.santo...@gmail.com Are we to assume that SGD is still a work in progress and implementations (Cross Fold, Online, Adaptive) are too flawed to be realistically used? They are too raw to be accepted uncritically, for sure. They have

Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-11-26 Thread optimusfan
Hi, We're currently working on a binary classifier using Mahout's AdaptiveLogisticRegression class. We're trying to determine whether or not the models are suffering from high bias or variance, and were wondering how to do this using Mahout's APIs. I can easily calculate the cross validation
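The usual way to tell the two apart is to compare the error on the training data with the error on held-out data, however those two rates are computed (for example, by scoring the 80% training portion and the 20% holdout with the same trained model). A hypothetical Java sketch of that reading follows; `BiasVarianceCheck`, `targetError`, and `maxGap` are illustrative names, and both thresholds are judgment calls rather than standard constants.

```java
/** Hypothetical sketch: rough bias/variance reading from two error rates. */
final class BiasVarianceCheck {
  private BiasVarianceCheck() {}

  enum Diagnosis { HIGH_BIAS, HIGH_VARIANCE, HIGH_BIAS_AND_VARIANCE, OK }

  /**
   * trainError and holdoutError are error rates in [0, 1]; targetError is the
   * error you would accept; maxGap is the train/holdout gap you tolerate.
   */
  static Diagnosis diagnose(double trainError, double holdoutError,
                            double targetError, double maxGap) {
    boolean bias = trainError > targetError;               // cannot even fit the training data
    boolean variance = holdoutError - trainError > maxGap; // fits training data much better than new data
    if (bias && variance) {
      return Diagnosis.HIGH_BIAS_AND_VARIANCE;
    }
    if (bias) {
      return Diagnosis.HIGH_BIAS;
    }
    if (variance) {
      return Diagnosis.HIGH_VARIANCE;
    }
    return Diagnosis.OK;
  }
}
```

Repeating the check at several training-set sizes gives a learning curve: high bias shows both errors plateauing together above the target, high variance shows a gap that narrows only slowly as data is added.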

Re: Detecting high bias and variance in AdaptiveLogisticRegression classification

2013-11-26 Thread Ted Dunning
Well, first off, let me say that I am much less of a fan now of the magical cross validation approach and adaptation based on that than I was when I wrote the ALR code. There are definitely legs in the ideas, but my implementation has a number of flaws. For example: a) the way that I provide
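For readers unfamiliar with the idea, here is a generic, hypothetical sketch of online k-fold cross-validation of the kind referred to above: each example is first scored by the one fold model that never trains on it, then used to train the other k - 1 models. The fold is chosen from a tracking key; `OnlineCrossFold`, the fixed learning `rate`, and the log-likelihood bookkeeping are illustrative assumptions, not Mahout's CrossFoldLearner.

```java
/**
 * Hypothetical sketch of online k-fold cross-validation with k simple
 * logistic regression models. Generic illustration, not Mahout code.
 */
final class OnlineCrossFold {
  private final double[][] weights; // one weight vector per fold
  private final double rate;        // fixed learning rate (assumed)
  private double logLikelihoodSum = 0.0;
  private long scored = 0;

  OnlineCrossFold(int folds, int d, double rate) {
    this.weights = new double[folds][d];
    this.rate = rate;
  }

  private static double predict(double[] w, double[] x) {
    double dot = 0.0;
    for (int j = 0; j < w.length; j++) {
      dot += w[j] * x[j];
    }
    return 1.0 / (1.0 + Math.exp(-dot));
  }

  /** actual is 0 or 1; trackingKey decides which fold model holds this example out. */
  void train(long trackingKey, int actual, double[] x) {
    int heldOut = (int) Math.floorMod(trackingKey, (long) weights.length);
    for (int k = 0; k < weights.length; k++) {
      double p = predict(weights[k], x);
      if (k == heldOut) {
        // Score only: this model never trains on examples with this key.
        double pActual = actual == 1 ? p : 1.0 - p;
        logLikelihoodSum += Math.log(Math.max(pActual, 1e-12));
        scored++;
      } else {
        for (int j = 0; j < x.length; j++) {
          weights[k][j] -= rate * (p - actual) * x[j]; // plain SGD on log loss
        }
      }
    }
  }

  /** Average held-out log-likelihood per example (closer to 0 is better). */
  double meanLogLikelihood() {
    return scored == 0 ? 0.0 : logLikelihoodSum / scored;
  }
}
```

The attraction is that a held-out estimate comes for free during a single pass; the catch, as the news-wire story shows, is that it is only as honest as the assignment of examples to folds.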