>
> I get that if you have 10,000 samples and 150 features, then your 
> system is over-determined.
> Where I think you go wrong is in worrying about a large number of 
> unique solutions. Over-determined typically means 0 solutions! (Have 
> another look at that page you linked, it's the under-determined 
> systems that need explicit regularization to find a unique solution.)
Sorry for the confusion. Yes, it can mean either 0 solutions or 
infinitely many solutions; that's why we use the pseudo-inverse to 
approximate a solution :).

Sorry again, I meant that constraints help in underdetermined systems, 
like the example below. I don't know why I confused the two :S, thanks 
for pointing this out.

x1 + x2 + x3 = 5
x1 + x2 + x3 = 2
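
For what it's worth, here is a minimal sketch of what the pseudo-inverse
does with that system (just numpy; the numbers are only illustrative):

import numpy as np

# The system above: identical left-hand sides, conflicting right-hand
# sides, so no exact solution exists.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 1.0, 1.0]])
b = np.array([5.0, 2.0])

# The Moore-Penrose pseudo-inverse returns the minimum-norm least-squares
# solution instead of failing.
x = np.linalg.pinv(A) @ b
print(x)      # ~ [1.1667, 1.1667, 1.1667]
print(A @ x)  # ~ [3.5, 3.5], i.e. it splits the difference between 5 and 2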


>
> Anyway, ML in general deals with noisy data (both in classification 
> and numeric regression) so that's actually the dominant reason why 
> regularization is used, even when the system is technically 
> overdetermined.
>
> For your proposal, it would probably be more accurate to explain that 
> when training data is noisy, regularization during training can 
> lead to more accurate predictions on test data. That's why the 
> regularized ELM is worth implementing.

You are right.
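
For reference, a minimal sketch of that effect (synthetic data, with
sklearn's Ridge standing in for the regularized output layer; with few
samples per feature the regularized model usually scores better on
held-out data):

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# More features than we have comfortable sample coverage for, plus noisy targets.
X = rng.randn(80, 60)
w = rng.randn(60)
y = X @ w + rng.randn(80) * 3.0

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compare unregularized vs. regularized linear fits on held-out data.
for model in (LinearRegression(), Ridge(alpha=10.0)):
    model.fit(X_train, y_train)
    print(type(model).__name__, model.score(X_test, y_test))
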
>
> Also: new topic: Did you mention earlier in the thread that you need 
> derivatives to implement a regularized ELM? Why don't you just use 
> some of the existing linear (or even non-linear?) regression models in 
> sklearn to classify the features computed by the initial layers of the 
> ELM? This is a more detailed question that doesn't really affect your 
> proposal, but I'd like to hear your thoughts and maybe discuss it.
>
" regression models in sklearn to classify the features computed by the 
initial layers of the ELM" I didn't get this, do you mean we can use PCA 
or SVDs to get more meaningful hidden features?
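
Or do you mean something like the following? A minimal sketch of my guess
(random input weights plus a tanh hidden layer as the "ELM" feature step,
then an existing sklearn linear model as the output layer):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
digits = load_digits()
X = StandardScaler().fit_transform(digits.data)
y = digits.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fixed random hidden layer (the non-trained part of the ELM).
n_hidden = 500
W = rng.randn(X.shape[1], n_hidden) / np.sqrt(X.shape[1])
bias = rng.randn(n_hidden)
H_train = np.tanh(X_train @ W + bias)
H_test = np.tanh(X_test @ W + bias)

# Output layer: reuse an existing sklearn estimator instead of a custom solver.
clf = LogisticRegression(max_iter=1000)
clf.fit(H_train, y_train)
print(clf.score(H_test, y_test))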

The derivative is needed to solve the dual optimization problem via the 
KKT conditions, which lets us add constraints as in the SVM.
Please see page 6 of 
http://www.ntu.edu.sg/home/egbhuang/pdf/ELM-Unified-Learning.pdf
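
If I'm reading that page right, the constrained problem reduces via the
KKT conditions to a ridge-style closed form, so a sketch of the solve
(assuming H is the hidden-layer output matrix and T the targets) would be:

import numpy as np

def elm_output_weights(H, T, C=1.0):
    # H: (n_samples, n_hidden) hidden-layer outputs
    # T: (n_samples, n_outputs) targets
    # C: regularization parameter (larger C = weaker regularization)
    n_samples, n_hidden = H.shape
    if n_samples >= n_hidden:
        # solve (I/C + H^T H) beta = H^T T
        A = np.eye(n_hidden) / C + H.T @ H
        return np.linalg.solve(A, H.T @ T)
    else:
        # beta = H^T (I/C + H H^T)^{-1} T, cheaper when n_samples < n_hidden
        A = np.eye(n_samples) / C + H @ H.T
        return H.T @ np.linalg.solve(A, T)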

> Sounds good, but I wouldn't be so confident they always take seconds 
> to train. I think some deep vision system models are pretty much just 
> big convolutional ELMs (e.g. 
> http://jmlr.org/proceedings/papers/v28/bergstra13.pdf) and they can 
> take up to, say, an hour of GPU time to (a) compute all of the features 
> for a big data set and (b) train the linear output model. Depending on 
> your data set you might want to use more than 150 output neurons! When 
> I was doing those experiments, it seemed that models got better and 
> better the more outputs I used, they just take longer to train and 
> eventually don't fit in memory.
True, well, "seconds to train" was meant figuratively. Still, the time it 
takes is nothing like backpropagation ;). You could go as large as 1000 
hidden neurons and training time would still not be an issue; it wouldn't 
be slower than an SVM, for example. The real issue lies in memory :).
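
(Back-of-the-envelope on the memory point: with 10,000 samples and 1,000 
hidden neurons, H alone is 10,000 x 1,000 doubles, about 80 MB; the 
dual-form 10,000 x 10,000 matrix would be about 800 MB, while H^T H is 
only 1,000 x 1,000, about 8 MB, so which form you solve makes a big 
difference.)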

Thank you!



