On Sun, Nov 20, 2011 at 5:52 AM, Olivier Grisel wrote:
> Also do you have any hint whether this has an impact on the test error
> in practice on your data?
I've implemented the naive and lazy implementations of Langford's
truncated gradient (and made sure that they give the same results), so
I will try them on several datasets.
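For reference, here is a minimal NumPy sketch of what the naive and
lazy variants look like (illustrative code, not the implementation
being benchmarked; it assumes a constant learning rate, under which
applying the missed truncations in one go is equivalent to applying
them one step at a time):

    import numpy as np

    def truncate(w, amount):
        # Clip weights toward zero by `amount` (truncated-gradient L1 step).
        return np.sign(w) * np.maximum(np.abs(w) - amount, 0.0)

    def naive_step(w, x, grad_coef, eta, alpha):
        # One SGD step on a dense sample: loss-gradient update, then
        # truncate every weight.
        w = w - eta * grad_coef * x
        return truncate(w, eta * alpha)

    def lazy_step(w, last_seen, t, idx, val, grad_coef, eta, alpha):
        # Same step on a sparse sample (indices `idx`, values `val`):
        # weights of inactive features are left untouched; when a feature
        # reappears, the truncations it missed are applied in one go.
        for j, v in zip(idx, val):
            missed = t - last_seen[j] - 1           # steps where j was inactive
            w[j] = truncate(w[j], missed * eta * alpha)
            w[j] -= eta * grad_coef * v             # loss-gradient step for step t
            w[j] = truncate(w[j], eta * alpha)      # step t's truncation
            last_seen[j] = t
        return w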
2011/11/19 Mathieu Blondel:
> On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer wrote:
>
>> Unfortunately, I'm not that familiar with "SGD-L1 (Clipped +
>> Lazy-Update)" either - I just quickly skimmed over a technical report
>> of Bob [1]. I agree with your description: it seems to me that the
On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer wrote:
> Unfortunately, I'm not that familiar with "SGD-L1 (Clipped +
> Lazy-Update)" either - I just quickly skimmed over a technical report
> of Bob [1]. I agree with your description: it seems to me that the
> major difference is the fact that
On Thu, Nov 17, 2011 at 10:03 PM, Alexandre Passos wrote:
> What do you mean by regularize weight here? Do an L1 truncation?
Yes.
For L2 regularization, doing the regularization before or after the
prediction doesn't change the sign of the prediction (as L2
regularization just multiplies the weight vector by a positive scalar).
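Just to make that point concrete, a tiny check (with made-up numbers)
that scaling the weight vector by a positive constant cannot flip the
sign of the decision function:

    import numpy as np

    rng = np.random.RandomState(0)
    w = rng.randn(5)
    x = rng.randn(5)

    eta, alpha = 0.01, 0.1
    shrink = 1.0 - eta * alpha    # typical SGD L2 step: w <- (1 - eta * alpha) * w

    # Predicting before or after the shrinkage gives the same sign, since
    # shrink > 0 implies sign(shrink * w . x) == sign(w . x).
    assert np.sign(np.dot(w, x)) == np.sign(np.dot(shrink * w, x))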
On Thu, Nov 17, 2011 at 07:29, Mathieu Blondel wrote:
> In most SGD papers I know, people do:
>
> 1) Sample instance x_i
> 2) Predict label for x_i
> 3) Regularize weight
> 4) Update weight if non-zero loss suffered
>
> However, J. Langford and B. Carpenter do:
>
> 1) Sample instance x_i
> 2) Regularize weight
In most SGD papers I know, people do:
1) Sample instance x_i
2) Predict label for x_i
3) Regularize weight
4) Update weight if non-zero loss suffered
However, J. Langford and B. Carpenter do:
1) Sample instance x_i
2) Regularize weight
3) Predict label for x_i
4) Update weight if non-zero loss suffered
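Spelled out as code, the two orderings differ only in where the
regularization step sits relative to the prediction (a rough sketch;
`regularize` and `loss_gradient` are placeholders for whatever penalty
and loss are used, and the loss gradient is zero when no loss is
suffered):

    def sgd_epoch_usual(w, X, y, eta, regularize, loss_gradient):
        # Ordering used in most SGD papers: predict, regularize, update.
        for x_i, y_i in zip(X, y):          # 1) sample instance x_i
            pred = w @ x_i                  # 2) predict with current weights
            w = regularize(w, eta)          # 3) regularization step
            w = w - eta * loss_gradient(pred, y_i) * x_i   # 4) loss update
        return w

    def sgd_epoch_langford(w, X, y, eta, regularize, loss_gradient):
        # Ordering of Langford and Carpenter: regularize first, so the
        # prediction is made with the already-regularized weights.
        for x_i, y_i in zip(X, y):          # 1) sample instance x_i
            w = regularize(w, eta)          # 2) regularization step first
            pred = w @ x_i                  # 3) predict with regularized weights
            w = w - eta * loss_gradient(pred, y_i) * x_i   # 4) loss update
        return w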
On Thu, Nov 10, 2011 at 8:12 PM, Adrien wrote:
> For my own needs (projected gradient descent), I quickly implemented it
> here: https://gist.github.com/1272551 (I tested it against Duchi's own
> Matlab code).
I think you implemented the algorithm based on sorting, which has
complexity O(n_features log(n_features)).
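For completeness, the sorting-based projection is roughly the following
in NumPy (a sketch of the algorithm from Duchi et al., not the code in
the gist):

    import numpy as np

    def project_l1_ball(v, z=1.0):
        # Project v onto the L1 ball of radius z using the O(n log n)
        # sorting-based algorithm of Duchi et al. (2008).
        if np.abs(v).sum() <= z:
            return v.copy()                          # already inside the ball
        u = np.sort(np.abs(v))[::-1]                 # magnitudes, descending
        css = np.cumsum(u)
        # largest index rho (0-based) with u[rho] - (css[rho] - z) / (rho + 1) > 0
        rho = np.nonzero(u - (css - z) / np.arange(1, len(u) + 1) > 0)[0][-1]
        theta = (css[rho] - z) / (rho + 1.0)         # soft-threshold level
        return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)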
On 09/11/2011 14:32, Mathieu Blondel wrote:
> On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer wrote:
>
>> I'm aware of the issue - it seems to me that Bob is right but I can
>> hardly tell based on empirical evidence. Truncated gradient is quite a
>> crude procedure anyways - Olivier once suggested to use a projected
>> gradient approach instead.
On Wed, Nov 9, 2011 at 4:37 AM, Peter Prettenhofer wrote:
> I'm aware of the issue - it seems to me that Bob is right but I can
> hardly tell based on empirical evidence. Truncated gradient is quite a
> crude procedure anyways - Olivier once suggested to use a projected
> gradient approach instead.
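As a rough illustration of what that would look like, here is a single
projected (sub)gradient step for a linear model with an L1-norm
constraint ||w||_1 <= z; `project_l1_ball` stands in for an L1-ball
projection such as the sorting-based one above, and the squared loss is
only an example:

    import numpy as np

    def projected_sgd_step(w, x_i, y_i, eta, z, project_l1_ball):
        # Plain gradient step on the loss, then project back onto the
        # feasible set instead of truncating individual weights.
        grad = (np.dot(w, x_i) - y_i) * x_i          # squared-loss gradient
        w = w - eta * grad                           # unconstrained step
        return project_l1_ball(w, z)                 # re-enter the L1 ball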
2011/11/8 Mathieu Blondel:
> Hello,
>
> I was re-reading Tsuruoka's paper, based on which the SGDClassifier
> implements L1 regularization, and found this interesting post (as
> usual?) by Bob Carpenter:
>
> http://lingpipe-blog.com/2009/09/18/tsuruoka-tsujii-ananiadou-2009-stochastic-gradient-desc
Hello,
I was re-reading Tsuruoka's paper, based on which the SGDClassifier
implements L1 regularization, and found this interesting post (as
usual?) by Bob Carpenter:
http://lingpipe-blog.com/2009/09/18/tsuruoka-tsujii-ananiadou-2009-stochastic-gradient-descent-training-for-l1-regularized-log-line
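For readers who don't want to dig through the paper, the cumulative
penalty update it describes is roughly the following (a simplified
sketch; variable names and learning-rate handling are made up, and this
is not the SGDClassifier code itself):

    import numpy as np

    def cumulative_penalty_step(w, q, u, x_i, grad_coef, eta, alpha):
        # One SGD step with the cumulative L1 penalty of Tsuruoka et al.
        # (2009). `u` is the total absolute penalty each weight could have
        # received so far, `q[j]` the penalty actually applied to w[j].
        u += eta * alpha                              # accumulate the penalty
        for j in np.nonzero(x_i)[0]:
            w[j] -= eta * grad_coef * x_i[j]          # loss-gradient step
            z = w[j]
            if w[j] > 0:
                w[j] = max(0.0, w[j] - (u + q[j]))    # penalize, but never let
            elif w[j] < 0:                            # the weight cross zero
                w[j] = min(0.0, w[j] + (u - q[j]))
            q[j] += w[j] - z                          # record applied penalty
        return w, q, u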