Hey Frederic,

> I don't have a good understanding of scikit.learn, but I think that
> all the hyper-parameter selection is a hot research topic for now. How
> do you plan to include this in the current scikit.learn interface of
> the fit method?

It depends on what you mean by hyper-parameters. Things like the learning rate, the weight decay, and the size of the hidden layer can be cross-validated; see the sketch below.
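Just to make that concrete, here is a rough sketch of what cross-validating those three hyper-parameters could look like with a scikit-learn-style grid search, assuming an MLP estimator with the usual fit/predict interface (the MLPClassifier that recent scikit-learn versions ship is used purely for illustration, and the grid values are made up):

# Illustrative sketch only: grid-search the learning rate, the weight decay
# (alpha) and the hidden layer size of an MLP by cross-validation, assuming a
# scikit-learn recent enough to provide sklearn.neural_network.MLPClassifier.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)

param_grid = {
    "learning_rate_init": [1e-3, 1e-2],      # learning rate
    "alpha": [1e-4, 1e-2],                   # L2 weight decay
    "hidden_layer_sizes": [(32,), (128,)],   # size of the hidden layer
}

search = GridSearchCV(MLPClassifier(max_iter=300), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)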
Of course there are many other possibilities, like pretraining, deeper networks, different learning rate schedules, etc. You are right that this is somewhat of an active research field, though I have not seen conclusive evidence that any of these methods is consistently better than a vanilla MLP. Does that answer your question?

> About torch7 being faster than Theano: I have heard that a few times,
> but never seen the papers, numbers, code or anything substantial for
> this. I would love to have any numbers on that. Do you have some? But
> don't forget that in the Theano framework, we can just implement all
> the tricks that other people used to beat Theano. So if torch7 is
> faster in some cases, this will tell us where we can make Theano
> faster! Can you tell us more about the comparison you refer to?

There will be a paper at this year's NIPS large-scale learning workshop. They just give total times for running networks, so it is not clear why they are faster :( That makes it a bit unhelpful for improving existing implementations, I think. They claim to have pretty fast convolutions, I think.

> Just a side note. I don't mean to imply that the comparison you refer
> to is biased, but benchmarking is VERY HARD. So I like to have
> information on how the comparison was done. We tried to make the
> Theano comparison as fair as we could at that time. We spent days
> compiling each application with the same BLAS and other stuff like
> that. But since then torch has released a new version.

I just trusted the torch people there; I haven't seen the benchmark code or anything. I know that you put a lot of effort into this. The point I was trying to make is that if one codes a simple MLP, then I think there is a good chance of being as fast as Theano, since it is pretty clear what is going on and the computation is dominated by the matrix products.

> Thanks, and I hope to get more info on the comparison people use to
> claim that torch7 is faster than Theano, and on how you plan to work
> around the hyper-parameter selection problem. That would be very
> valuable to everybody, I think.

Side note from me: don't get me wrong, I really like your project and I think you are making a great effort as a community. In particular, the deep learning tutorials are great! I just think the goal is different from sklearn's, and I don't think it is a good idea to make sklearn dependent on Theano. By the way, I'm not a core sklearn developer, so consider that my opinion and not sklearn's opinion ;)

Cheers,
Andy
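P.S. For concreteness, this is the kind of plain-NumPy MLP I have in mind: essentially all of the time per update goes into a handful of matrix products. It is only a rough sketch with made-up layer sizes, not benchmark code.

# Rough sketch (illustrative only): a plain one-hidden-layer MLP where the
# work per update is just a few matrix products plus cheap elementwise ops.
import numpy as np

rng = np.random.default_rng(0)
n, d, h, k = 256, 784, 500, 10          # batch, input, hidden, output sizes
X = rng.standard_normal((n, d))
Y = np.eye(k)[rng.integers(0, k, n)]    # one-hot targets
W1 = rng.standard_normal((d, h)) * 0.01
W2 = rng.standard_normal((h, k)) * 0.01
lr = 0.1

for _ in range(10):
    # forward pass: two matrix products
    H = np.tanh(X @ W1)
    P = H @ W2
    # squared-error gradient and backward pass: again dominated by matmuls
    dP = (P - Y) / n
    dW2 = H.T @ dP
    dH = dP @ W2.T * (1.0 - H ** 2)     # tanh derivative
    dW1 = X.T @ dH
    W1 -= lr * dW1
    W2 -= lr * dW2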
