Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Gilles Louppe
Hi Youssef, Regarding memory usage, you should know that it'll basically blow up if you increase the number of jobs. With the current implementation, you'll need O(n_jobs * |X| * 2) in memory space (where |X| is the size of X, in bytes). That issue stems from the use of joblib which basically
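Gilles's rule of thumb, O(n_jobs * |X| * 2) bytes, can be turned into a quick back-of-the-envelope check before launching a fit. This is a minimal sketch. The helper name, the 100-feature example, and the float64 item size are assumptions for illustration. The dtype the forest actually holds X in may differ by version, so adjust `itemsize` to match.

```python
import os

def estimate_fit_memory_bytes(n_samples, n_features, n_jobs, itemsize=8):
    """Hypothetical helper: rough peak memory as n_jobs * |X| * 2 bytes.

    |X| is the training matrix size in bytes; itemsize=8 assumes float64.
    n_jobs=-1 ("use all CPUs") is resolved to the local CPU count.
    """
    if n_jobs == -1:
        n_jobs = os.cpu_count() or 1
    x_nbytes = n_samples * n_features * itemsize
    return n_jobs * x_nbytes * 2

# Hypothetical case: 1E6 samples, 100 float64 features, 8 jobs
needed = estimate_fit_memory_bytes(1_000_000, 100, n_jobs=8)
print(f"{needed / 1e9:.1f} GB")  # 12.8 GB
```

Because the estimate grows linearly with n_jobs, it is worth running it with the real core count before choosing n_jobs=-1.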

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Peter Prettenhofer
Hi Youssef, please make sure that you use the latest version of sklearn (= 0.13) - we did some enhancements to the sub-sampling procedure lately. Looking at the RandomForest code - it seems that n_jobs=-1 should not be the issue for the parallel training of the trees since ``n_jobs =

Re: [Scikit-learn-general] question about scikit / sklearn K folds cross validation

2013-04-25 Thread Fabian Pedregosa
On Wed, Apr 24, 2013 at 5:03 PM, John Richey ric...@vt.edu wrote: Hello, I am having difficulty with a cross validation problem, and any help would be much appreciated. I have a large number of research subjects from 15 different data collection sites. I want to assess whether site has any

Re: [Scikit-learn-general] question about scikit / sklearn K folds cross validation

2013-04-25 Thread bthirion
Dear John, Hello, I am having difficulty with a cross validation problem, and any help would be much appreciated. I have a large number of research subjects from 15 different data collection sites. I want to assess whether site has any influence on the data. The simplest way to do this
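One common way to probe a site effect, which is not necessarily what the replies go on to recommend, is to cross-validate with one site held out per fold, so the model is always scored on a site it never saw during training. Below is a minimal pure-Python sketch of that splitting. It assumes a `sites` list holding each subject's site label, and the function name is hypothetical. Current scikit-learn provides this as `LeaveOneGroupOut`; the 0.13-era equivalent was `LeaveOneLabelOut`.

```python
def leave_one_site_out(sites):
    """Hypothetical helper: yield (held_out_site, train_idx, test_idx), one fold per site.

    `sites` holds each subject's site label, e.g. one of the 15 collection sites.
    """
    # dict.fromkeys keeps the distinct labels in first-seen order (Python 3.7+)
    for held_out in dict.fromkeys(sites):
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        yield held_out, train_idx, test_idx

# Hypothetical labels for six subjects from three sites
sites = ["A", "A", "B", "C", "B", "C"]
for site, train_idx, test_idx in leave_one_site_out(sites):
    print(site, train_idx, test_idx)
# A [2, 3, 4, 5] [0, 1]
# B [0, 1, 3, 5] [2, 4]
# C [0, 1, 2, 4] [3, 5]
```

Fit on `train_idx` and score on `test_idx` in each fold. A clear drop compared with a split that mixes sites would hint that site matters.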

Re: [Scikit-learn-general] Effects of shifting and scaling on Gradient Descent

2013-04-25 Thread Jaques Grobler
I also think it would be great to have this example on the website. Do you mean like an interactive example that works similar to the SVM Gui example (http://scikit-learn.org/dev/auto_examples/applications/svm_gui.html#example-applications-svm-gui-py), but for understanding the effects shifting and

Re: [Scikit-learn-general] Effects of shifting and scaling on Gradient Descent

2013-04-25 Thread Gael Varoquaux
On Thu, Apr 25, 2013 at 02:09:13PM +0200, Jaques Grobler wrote: I also think it would be great to have this example on the website. Do you mean like an interactive example that works similar to the SVM Gui example, but for understanding the effects shifting and scaling of data have on the rate

Re: [Scikit-learn-general] Effects of shifting and scaling on Gradient Descent

2013-04-25 Thread Ronnie Ghose
I think he means: what improvements or benefits do you get from rescaling features, e.g. with min-max scaling or preprocessing.scale? On Thu, Apr 25, 2013 at 02:09:13PM +0200, Jaques Grobler wrote: I also think it would be great to have this example on the website. Do you mean like an interactive example that works
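Ronnie's two examples can be sketched in pure Python, applied to one feature column at a time. Min-max scaling maps the column onto [0, 1]. Standardization, which as far as we know is what `sklearn.preprocessing.scale` does with its defaults, subtracts the mean and divides by the standard deviation. The function names below are hypothetical helpers, not scikit-learn API.

```python
import statistics

def min_max_scale(column):
    """Hypothetical helper: linearly map one feature column onto [0, 1] (assumes max > min)."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

def standardize(column):
    """Hypothetical helper: shift one feature column to zero mean and unit variance.

    Uses the population standard deviation (ddof=0), like numpy's default np.std.
    """
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]

print(min_max_scale([10, 505, 1000]))  # [0.0, 0.5, 1.0]
print(standardize([1.0, 2.0, 3.0]))
```

Either transform should be computed on the training data and reused unchanged on test data, so both are scaled with the same constants.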

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Youssef Barhomi
Thank you very much Peter, you are right about n_jobs; something was going wrong with that. With n_jobs = -1 and a larger dataset (1E6 in this case), no CPU was being used and the process hung for a while. Setting n_jobs = 1 made everything work. Yes, I will look into the IPython

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Youssef Barhomi
ohh makes total sense now!! thank you Gilles!! Y On Thu, Apr 25, 2013 at 2:38 AM, Gilles Louppe g.lou...@gmail.com wrote: Hi Youssef, Regarding memory usage, you should know that it'll basically blow up if you increase the number of jobs. With the current implementation, you'll need

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Youssef Barhomi
Hi Brian, thanks for your feedback. Were you able to reproduce their results? How big are the datasets you have processed so far with an RF? The MS people used a distributed RF, so yes, I am guessing the features were being computed in parallel on all these cores. Though, I am still new

Re: [Scikit-learn-general] Distributed RandomForests

2013-04-25 Thread Ronnie Ghose
I've tried larger data sets. It wasn't pretty, though with far fewer features. On Apr 25, 2013 4:03 AM, Peter Prettenhofer peter.prettenho...@gmail.com wrote: Hi Youssef, please make sure that you use the latest version of sklearn (= 0.13) - we did some enhancements to the sub-sampling procedure

[Scikit-learn-general] [GSOC] Application for the idea Coordinated descent in linear models beyond squared loss (eg Logistic)

2013-04-25 Thread Mingxing Zhang
Hello everyone, I'm Mingxing Zhang, currently a Ph.D. student at Tsinghua University in China (majoring in Computer Science at the Institute for High Performance Computing). If possible, I would like to apply for the idea Coordinate descent in linear models beyond squared loss (eg Logistic) in
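For readers new to the idea, here is a minimal pure-Python sketch of cyclic coordinate descent on the L2-regularized logistic loss. It assumes labels in {-1, +1}, dense rows, and a single one-dimensional Newton step per coordinate and sweep. It is an illustration under those assumptions, not the GSoC project's design or a scikit-learn solver. Production solvers typically add a line search and exploit sparsity. The function names are hypothetical.

```python
import math

def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logistic_cd(X, y, lam=1.0, n_sweeps=50):
    """Minimize sum_i log(1 + exp(-y_i w.x_i)) + lam/2 ||w||^2 coordinate by coordinate.

    X: list of rows (lists of floats); y: labels in {-1, +1}.
    Each coordinate gets one Newton step per sweep (no line search).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    margins = [0.0] * n  # m_i = w . x_i, updated incrementally after each step
    for _ in range(n_sweeps):
        for j in range(d):
            grad, hess = lam * w[j], lam  # regularization part of f' and f''
            for i in range(n):
                p = sigmoid(y[i] * margins[i])  # probability of the correct label
                grad += -y[i] * X[i][j] * (1.0 - p)
                hess += X[i][j] ** 2 * p * (1.0 - p)
            delta = -grad / hess  # 1-d Newton step; hess >= lam > 0
            w[j] += delta
            for i in range(n):
                margins[i] += delta * X[i][j]
    return w

def objective(X, y, w, lam=1.0):
    loss = sum(math.log1p(math.exp(-yi * sum(a * b for a, b in zip(w, row))))
               for row, yi in zip(X, y))
    return loss + 0.5 * lam * sum(wj * wj for wj in w)

# Hypothetical toy data; the first column is a constant intercept feature
X = [[1.0, 2.0], [1.0, 1.5], [1.0, -1.0], [1.0, -2.0]]
y = [1, 1, -1, -1]
w = logistic_cd(X, y)
print(w, objective(X, y, w))
```

Keeping the margins up to date is what makes each coordinate update cost O(n) instead of O(n * d).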

Re: [Scikit-learn-general] Effects of shifting and scaling on Gradient Descent

2013-04-25 Thread Shishir Pandey
Thanks Ronnie for pointing out the exact method in the scikit-learn library. Yes, that is exactly what I was asking: how does rescaling the features affect the gradient descent algorithm? Stochastic gradient descent is an algorithm which is used in machine learning quite a lot. It

Re: [Scikit-learn-general] Effects of shifting and scaling on Gradient Descent

2013-04-25 Thread Matthieu Brucher
Hi, Do you mean scaling the parameters of the cost function? If so, scaling will change the surface of the cost function, of course. It's kind of complicated to say anything about how the surface will behave, as it depends completely on the cost function you are using. A cost function that is linear
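For one concrete cost function, the effect can be written out. The derivation below assumes a linear model with squared loss, n samples, d features, and a per-feature scaling vector s; none of these are in the message itself. Multiplying input feature j by s_j is the same as reparametrizing w_j = s_j w'_j, which links the "scale the parameters" reading to the "scale the inputs" question. The Hessian's diagonal is rescaled by s_j^2, and that changes the step-size limit and the condition number that govern gradient descent.

```latex
\begin{aligned}
J(w) &= \frac{1}{2n}\,\lVert Xw - y \rVert^2,
  \qquad H = \nabla^2 J = \frac{1}{n} X^\top X \\
X' &= XS,\quad S = \operatorname{diag}(s_1,\dots,s_d)
  \;\Longrightarrow\; J'(w') = J(Sw'),\quad H' = S H S,\quad H'_{jj} = s_j^2 H_{jj} \\
w'_{t+1} &= w'_t - \eta\,\nabla J'(w'_t)\ \text{is stable only for}\ 0 < \eta < 2/\lambda_{\max}(H') \\
\eta &= 1/\lambda_{\max}(H') \;\Longrightarrow\; \text{error shrinks at least by } (1 - 1/\kappa)\ \text{per step},
  \quad \kappa = \lambda_{\max}(H')/\lambda_{\min}(H')
\end{aligned}
```

The last line assumes H' is positive definite. Features on very different scales tend to make kappa large, so many small steps are needed along the poorly scaled directions.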

Re: [Scikit-learn-general] GSoC applications are open (based on Fwd: [Soc2013-general] Student Application Template (Applications start April 22!))

2013-04-25 Thread Kemal Eren
Okay, then I'll put together a biclustering proposal tomorrow after work. It will be a difficult task to come up with a good set of core algorithms, because the field is so varied. There are over a hundred published methods, each of which formulates the biclustering problem differently. Any

[Scikit-learn-general] [scikit-learn] plot SVM results and classification space

2013-04-25 Thread Gianni Iannelli
Hi everyone! I'm new to scikit and I'm having trouble with some visualization methods!!! What I want to do is visualize in a plot/graph something like this: http://scikit-learn.org/stable/auto_examples/exercises/plot_iris_exercise.html Essentially, I would like to see the background color based on my

Re: [Scikit-learn-general] [scikit-learn] plot SVM results and classification space

2013-04-25 Thread Andreas Mueller
Hi Gianni. There is a fundamental problem with what you want to do, independent of SVMs. In the plot, the 2d plane of the plot represents the input space. Your input space is 6d. You cannot represent 6d on a computer monitor (that I know of). So there is no way to plot your data. What you
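One workaround is to project the 6-d inputs onto two dimensions, for example their first two principal components, and plot there. This is our assumption and not necessarily what the reply goes on to suggest. Below is a minimal numpy sketch of the projection, using random stand-in data and a hypothetical helper name. Any decision-surface background drawn in such a projection only approximates the 6-d classifier.

```python
import numpy as np

def project_2d(X):
    """Hypothetical helper: project rows of X onto their first two principal components."""
    Xc = X - X.mean(axis=0)  # center each feature
    # rows of Vt are principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

rng = np.random.default_rng(0)
X6 = rng.normal(size=(100, 6))  # stand-in for real 6-d data
X2 = project_2d(X6)
print(X2.shape)  # (100, 2)
```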

Re: [Scikit-learn-general] Effects of shifting and scaling on Gradient Descent

2013-04-25 Thread Shishir Pandey
I did not mean parameters of the cost function. I only want to scale the input variables. Suppose one of the independent variables ranges from 10 to 1000 and another ranges from 0.1 to 1. Then Andrew Ng and others say in their machine learning lectures that one should rescale the
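The effect can be seen numerically with the two ranges from this message. This minimal pure-Python sketch runs the same number of batch gradient descent steps on a least-squares fit, once on the raw features and once on standardized ones. Several choices are assumptions for illustration: the five data points, the noise-free target, the 200-iteration budget, and the step size 1/trace(XᵀX/n). That step size is safe because the trace bounds the largest eigenvalue of this positive semidefinite matrix.

```python
import statistics

def gd_loss(X, y, n_iter=200):
    """Batch gradient descent on J(w) = (1/2n)||Xw - y||^2; return the final J."""
    n, d = len(X), len(X[0])
    eta = 1.0 / (sum(v * v for row in X for v in row) / n)  # 1 / trace(X^T X / n)
    w = [0.0] * d
    for _ in range(n_iter):
        resid = [sum(a * b for a, b in zip(w, row)) - yi for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(d)]
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    resid = [sum(a * b for a, b in zip(w, row)) - yi for row, yi in zip(X, y)]
    return sum(r * r for r in resid) / (2 * n)

def standardize_columns(cols):
    """Zero mean, unit (population) variance for each column."""
    return [[(v - statistics.mean(c)) / statistics.pstdev(c) for v in c] for c in cols]

x1 = [10, 250, 500, 750, 1000]   # feature on a 10 - 1000 scale
x2 = [0.9, 0.1, 0.5, 1.0, 0.3]   # feature on a 0.1 - 1 scale
y = [0.002 * a + 3 * b for a, b in zip(x1, x2)]  # hypothetical noise-free target

raw = [[1.0, a, b] for a, b in zip(x1, x2)]      # leading 1.0 = intercept
z1, z2 = standardize_columns([x1, x2])
scaled = [[1.0, a, b] for a, b in zip(z1, z2)]

# same budget, same step-size rule: the raw-feature loss stays far above the scaled one
print(gd_loss(raw, y), gd_loss(scaled, y))
```

With raw features, the step size has to be tiny to stay stable along the large-scale x1 direction. As a result, the intercept and the 0.1 - 1 feature barely move within the budget.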