On Tue, Dec 06, 2011 at 07:43:26PM -0500, David Warde-Farley wrote:
> I think that scaling by n_samples makes sense in the supervised learning
> context (we often do the equivalent thing where we take the mean, rather than
> the sum, over the unregularized training objective, making the regularizat
On Tue, Dec 06, 2011 at 10:26:04AM -0500, Ian Goodfellow wrote:
> I agree with David that it seems like the optimizer is broken, but I
> disagree that the problem is the termination criterion. There should
> not be any NaNs anywhere in the course of optimization.
I have also seen problems with NaN
On Tue, Dec 6, 2011 at 6:02 AM, Olivier Grisel wrote:
> Hi all,
>
> My tutorial on scikit-learn at PyCon has been accepted. Would anybody
> be interested in sprinting there? The sprint days are Mar. 12-15.
>
> http://us.pycon.org/2012/
>
> I think Wes has submitted a talk on Pandas too.
>
> I wou
On Tue, Dec 6, 2011 at 9:54 AM, Olivier Grisel wrote:
> 2011/12/6 Fernando Perez :
>> On Tue, Dec 6, 2011 at 3:02 AM, Olivier Grisel
>> wrote:
>>> My tutorial on scikit-learn at PyCon has been accepted. Would anybody
>>> be interested in sprinting there? The sprint days are Mar. 12-15.
>>>
>>>
On Tue, Dec 06, 2011 at 10:25:47PM +0100, Alexandre Gramfort wrote:
> Regarding the scaling by n_samples when using estimators, I am convinced it
> is the right thing to do; cf. my current PR to do this on the SVM models as well.
I think that scaling by n_samples makes sense in the supervised learning
context (we ofte
> I am not going to get involved in the discussion of whether to
> normalize the coefficient or not, but in all cases the objective
> function should be clearly documented.
+1
If it's not done before the NIPS sprint, it will be an easy first task.
Alex
--
On Tue, Dec 6, 2011 at 11:46 PM, Alexandre Gramfort
wrote:
> I do confirm that Lasso and LassoLars both minimize
>
> 1/(2n) ||y - Xw||_2^2 + alpha ||w||_1
>
> and that the n should not be present in the sparse coding context.
>
> it means:
>
> http://scikit-learn.org/stable/modules/linear_model.html#
On Tue, Dec 6, 2011 at 4:25 PM, Alexandre Gramfort
wrote:
> Regarding the scaling by n_samples when using estimators, I am convinced it
> is the right thing to do; cf. my current PR to do this on the SVM models as well.
I am not going to get involved in the discussion of whether to
normalize the coefficient or not, but in all cases the objective
function should be clearly documented.
I do confirm that Lasso and LassoLars both minimize
1/(2n) ||y - Xw||_2^2 + alpha ||w||_1
and that the n should not be present in the sparse coding context.
It means that
http://scikit-learn.org/stable/modules/linear_model.html#lasso
is not correct. I don't know if this also affects the doc of the SG
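A minimal sanity check of the 1/(2n) scaling stated above (a sketch added for
illustration, not code from the thread; data and alpha are arbitrary):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    X, y = rng.randn(50, 10), rng.randn(50)

    # With the 1/(2n) factor the data term is an average, so stacking the
    # same samples twice changes neither term of the objective and the
    # solution should not move.
    w1 = Lasso(alpha=0.1).fit(X, y).coef_
    w2 = Lasso(alpha=0.1).fit(np.vstack([X, X]), np.r_[y, y]).coef_
    print(np.abs(w1 - w2).max())  # ~0 up to solver tolerance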
2011/12/6 David Warde-Farley :
> On Tue, Dec 06, 2011 at 08:43:06PM +0100, Olivier Grisel wrote:
>> 2011/12/6 David Warde-Farley :
>> > On Tue, Dec 06, 2011 at 09:04:22AM +0100, Alexandre Gramfort wrote:
>> >> > This actually gets at something I've been meaning to fiddle with and
>> >> > report bu
On Tue, Dec 06, 2011 at 08:43:06PM +0100, Olivier Grisel wrote:
> 2011/12/6 David Warde-Farley :
> > On Tue, Dec 06, 2011 at 09:04:22AM +0100, Alexandre Gramfort wrote:
> >> > This actually gets at something I've been meaning to fiddle with and
> >> > report but haven't had time: I'm not sure I co
Regarding the scaling by n_samples when using estimators, I am convinced it
is the right thing to do; cf. my current PR to do this on the SVM models as well.
Regarding the convergence problem and potential error, can you put a gist on
GitHub to make the problem more easily reproducible?
Alex
On Tue, Dec 6, 2011 at 9:17 PM, Ia
ok, decreasing alpha by a factor of n_samples (5000 in my case) makes
sparse_encode behave much more reasonably.
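For reference, the algebra behind that factor-of-n_samples rescaling, shown
with the plain Lasso estimator (only a sketch of the conversion; where exactly
sparse_encode applies the 1/(2n) factor is what is being debated here):

    import numpy as np
    from sklearn.linear_model import Lasso

    # Un-normalized penalty:  1/2 ||y - Xw||^2      + lam   * ||w||_1
    # Estimator objective:    1/(2n) ||y - Xw||^2   + alpha * ||w||_1
    # Both have the same minimizer when alpha = lam / n_samples.
    rng = np.random.RandomState(0)
    X, y = rng.randn(5000, 100), rng.randn(5000)
    lam = 10.0
    w = Lasso(alpha=lam / X.shape[0]).fit(X, y).coef_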
However, I still have two bugs to report:
1. The default algorithm raises this error:
Traceback (most recent call last):
File "s3c_sparsity_scale_plot.py", line 86, in
HS = spa
2011/12/6 David Warde-Farley :
> On Tue, Dec 06, 2011 at 09:04:22AM +0100, Alexandre Gramfort wrote:
>> > This actually gets at something I've been meaning to fiddle with and
>> > report but haven't had time: I'm not sure I completely trust the
>> > coordinate descent implementation in scikit-lea
On Tue, Dec 06, 2011 at 09:04:22AM +0100, Alexandre Gramfort wrote:
> > This actually gets at something I've been meaning to fiddle with and report
> > but haven't had time: I'm not sure I completely trust the coordinate
> > descent implementation in scikit-learn, because it seems to give me bogu
On Tue, Dec 6, 2011 at 10:37 AM, Peter Prettenhofer
wrote:
> 2011/12/6 James Bergstra :
>> On Fri, Dec 2, 2011 at 12:54 PM, Peter Prettenhofer
>> wrote:
>>> [...]
>>>
>>
>> How does the current tree implementation support boosting? I don't see
>> anything in the code about weighted samples.
>>
>>
On Tue, Dec 6, 2011 at 10:31 AM, Vlad Niculae wrote:
> On Tue, Dec 6, 2011 at 5:26 PM, Ian Goodfellow
> wrote:
>> I was initially confused by the specification of the dictionary size
>> for sparse_encode. It makes sense if you think of it as solving
>> multiple lasso problems, but as Vlad said i
2011/12/6 James Bergstra :
> On Fri, Dec 2, 2011 at 12:54 PM, Peter Prettenhofer
> wrote:
>> [...]
>>
>
> How does the current tree implementation support boosting? I don't see
> anything in the code about weighted samples.
>
> - James
You're right - we don't support sample weights at the moment
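While native support is missing, one common workaround for boosting-style
reweighting is to resample the training set in proportion to the weights. A
hedged sketch (the helper name is made up, not part of scikit-learn):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fit_tree_with_weights(X, y, sample_weight, random_state=0):
        # Emulate sample weights by bootstrap-resampling the data with
        # probability proportional to each sample's weight.
        rng = np.random.RandomState(random_state)
        p = sample_weight / sample_weight.sum()
        idx = rng.choice(len(y), size=len(y), replace=True, p=p)
        return DecisionTreeClassifier().fit(X[idx], y[idx])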
On Mon, Dec 5, 2011 at 4:38 PM, Alexandre Passos wrote:
> On Mon, Dec 5, 2011 at 16:26, James Bergstra wrote:
>>
>> This is definitely a good idea. I think randomly sampling is still
>> useful though. It is not hard to get into settings where the grid is
>> in theory very large and the user has a
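A small sketch of that idea: draw random parameter combinations instead of
enumerating the whole grid (parameter names and values below are made up):

    import random

    grid = {
        'C': [0.01, 0.1, 1, 10, 100],
        'gamma': [1e-4, 1e-3, 1e-2, 1e-1],
        'kernel': ['rbf', 'linear'],
    }

    def sample_params(grid, n_iter, seed=0):
        # Random search: sample n_iter points from a grid that is too
        # large to evaluate exhaustively.
        rng = random.Random(seed)
        for _ in range(n_iter):
            yield dict((name, rng.choice(values)) for name, values in grid.items())

    for params in sample_params(grid, n_iter=10):
        print(params)  # plug these into the estimator / CV loop here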
On Tue, Dec 6, 2011 at 5:26 PM, Ian Goodfellow wrote:
> I was initially confused by the specification of the dictionary size
> for sparse_encode. It makes sense if you think of it as solving
> multiple lasso problems, but as Vlad said it is different from the
> dictionary learning setup. As Vlad s
On Tue, Dec 6, 2011 at 4:09 AM, Olivier Grisel wrote:
> 2011/12/6 Gael Varoquaux :
>> On Mon, Dec 05, 2011 at 01:41:53PM -0500, Alexandre Passos wrote:
>>> On Mon, Dec 5, 2011 at 13:31, James Bergstra
>>> wrote:
>>> > I should probably not have scared ppl off speaking of a 250-job
>>> > budget.
I was initially confused by the specification of the dictionary size
for sparse_encode. It makes sense if you think of it as solving
multiple lasso problems, but as Vlad said it is different from the
dictionary learning setup. As Vlad said there is no right or wrong,
but personally I think it is co
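A minimal sketch of the convention being described, assuming the present-day
argument order sparse_encode(X, dictionary, ...); the signature may have
differed at the time, and the alpha value is only illustrative given the
scaling discussion above:

    import numpy as np
    from sklearn.decomposition import sparse_encode

    rng = np.random.RandomState(0)
    n_samples, n_features, n_components = 100, 64, 128
    D = rng.randn(n_components, n_features)  # dictionary: one atom per row
    X = rng.randn(n_samples, n_features)     # signals to encode, one per row

    # Each row of X is encoded by an independent lasso problem against D,
    # so the "dictionary size" is n_components, not the number of samples.
    code = sparse_encode(X, D, algorithm='lasso_cd', alpha=1.0)
    print(code.shape)  # (n_samples, n_components)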
On Fri, Dec 2, 2011 at 12:54 PM, Peter Prettenhofer
wrote:
> 2011/12/2 James Bergstra :
>> I'm looking at the decision tree code and I'm not seeing any pruning
>> logic, or other logic to prevent over-fitting (other than requiring
>> that leaf nodes be sufficiently populated). Decision trees are
2011/12/6 Fernando Perez :
> On Tue, Dec 6, 2011 at 3:02 AM, Olivier Grisel
> wrote:
>> My tutorial on scikit-learn at PyCon has been accepted. Would anybody
>> be interested in sprinting there? The sprint days are Mar. 12-15.
>>
>> http://us.pycon.org/2012/
>>
>> I think Wes has submitted a tal
On Tue, Dec 6, 2011 at 3:02 AM, Olivier Grisel wrote:
> My tutorial on scikit-learn at PyCon has been accepted. Would anybody
> be interested in sprinting there? The sprint days are Mar. 12-15.
>
> http://us.pycon.org/2012/
>
> I think Wes has submitted a talk on Pandas too.
Min and I will be th
Hi all,
My tutorial on scikit-learn at PyCon has been accepted. Would anybody
be interested in sprinting there? The sprint days are Mar. 12-15.
http://us.pycon.org/2012/
I think Wes has submitted a talk on Pandas too.
I would be very interested in sprinting on machine learning & data
analytic
2011/12/6 Vlad Niculae :
> On Tue, Dec 6, 2011 at 12:07 PM, Olivier Grisel
> wrote:
>> 2011/12/6 Vlad Niculae :
>>>
>>> On Dec 6, 2011, at 11:04 , Gael Varoquaux wrote:
>>>
On Tue, Dec 06, 2011 at 09:41:56AM +0200, Vlad Niculae wrote:
> This is actually exactly how the module is designed.
On Tue, Dec 6, 2011 at 12:07 PM, Olivier Grisel
wrote:
> 2011/12/6 Vlad Niculae :
>>
>> On Dec 6, 2011, at 11:04 , Gael Varoquaux wrote:
>>
>>> On Tue, Dec 06, 2011 at 09:41:56AM +0200, Vlad Niculae wrote:
>>>> This is actually exactly how the module is designed.
>>>
>>> Great design! I should hav
2011/12/6 Vlad Niculae :
>
> On Dec 6, 2011, at 11:04 , Gael Varoquaux wrote:
>
>> On Tue, Dec 06, 2011 at 09:41:56AM +0200, Vlad Niculae wrote:
>>> This is actually exactly how the module is designed.
>>
>> Great design! I should have looked at it closer before writing my mail.
>>
>>> We have Base
> We don't want generators or lists of functions as parameters, though, as
> that would break cross-validation and picklability.
Agreed, but this does seem to fit in the general use case of on-line
learning, so hopefully we should be able to address this use case in
the long run.
G
On Dec 6, 2011, at 11:04 , Gael Varoquaux wrote:
> On Tue, Dec 06, 2011 at 09:41:56AM +0200, Vlad Niculae wrote:
>> This is actually exactly how the module is designed.
>
> Great design! I should have looked at it closer before writing my mail.
>
>> We have BaseDictionaryLearning which only imp
2011/12/6 Andreas Mueller :
> On 12/06/2011 04:55 AM, Gael Varoquaux wrote:
>> On Mon, Dec 05, 2011 at 10:54:42PM +0100, Olivier Grisel wrote:
>>> - libsvm uses SMO (a dual solver) and supports non-linear kernels and
>>> has complexity ~ n_samples^3 hence cannot scale to large n_samples
>>> (e.g. m
2011/12/6 Gael Varoquaux :
> On Mon, Dec 05, 2011 at 01:41:53PM -0500, Alexandre Passos wrote:
>> On Mon, Dec 5, 2011 at 13:31, James Bergstra
>> wrote:
>> > I should probably not have scared ppl off speaking of a 250-job
>> > budget. My intuition would be that with 2-8 hyper-parameters, and 1-3
On Tue, Dec 06, 2011 at 09:41:56AM +0200, Vlad Niculae wrote:
> This is actually exactly how the module is designed.
Great design! I should have looked at it closer before writing my mail.
> We have BaseDictionaryLearning which only implements transforms. I
> didn't try but you should be able to
On 12/06/2011 04:55 AM, Gael Varoquaux wrote:
> On Mon, Dec 05, 2011 at 10:54:42PM +0100, Olivier Grisel wrote:
>> - libsvm uses SMO (a dual solver) and supports non-linear kernels and
>> has complexity ~ n_samples^3 hence cannot scale to large n_samples
>> (e.g. more than 50k).
>> - liblinear uses
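In code, the rule of thumb above roughly translates to the sketch below; the
50k threshold is the figure quoted in the message, not a hard rule:

    from sklearn.svm import SVC, LinearSVC

    n_samples = 200000  # well beyond what the ~n_samples^3 SMO solver handles
    if n_samples > 50000:
        clf = LinearSVC(C=1.0)          # liblinear: linear kernel, large n_samples
    else:
        clf = SVC(kernel='rbf', C=1.0)  # libsvm/SMO: non-linear kernels, small n_samples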
> This actually gets at something I've been meaning to fiddle with and report
> but haven't had time: I'm not sure I completely trust the coordinate descent
> implementation in scikit-learn, because it seems to give me bogus answers a
> lot (i.e., the optimality conditions necessary for it to be
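To make "optimality conditions" concrete: for the 1/(2n)-scaled objective, a
solution returned by the coordinate descent solver should satisfy the lasso
KKT conditions below (a sketch added for illustration, not code from the
thread):

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.RandomState(0)
    X = rng.randn(200, 30)
    y = X[:, :5].dot(rng.randn(5)) + 0.1 * rng.randn(200)

    alpha = 0.05
    w = Lasso(alpha=alpha, fit_intercept=False,
              max_iter=10000, tol=1e-8).fit(X, y).coef_

    # KKT for 1/(2n) ||y - Xw||^2 + alpha ||w||_1:
    #   X_j^T (y - Xw) / n == alpha * sign(w_j)   if w_j != 0
    #   |X_j^T (y - Xw) / n| <= alpha             if w_j == 0
    g = X.T.dot(y - X.dot(w)) / X.shape[0]
    active = w != 0
    print(np.abs(g[active] - alpha * np.sign(w[active])).max())  # ~0
    print(np.abs(g[~active]).max() <= alpha + 1e-8)              # True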