Hi Youssef,
Regarding memory usage, you should know that it'll basically blow up if you
increase the number of jobs. With the current implementation, you'll need
O(n_jobs * |X| * 2) in memory space (where |X| is the size of X, in bytes).
That issue stems from the use of joblib which basically
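The memory behavior Gilles describes can be illustrated with a small, hypothetical sketch (this is not scikit-learn's actual forest code): with joblib's process-based parallelism, each worker receives its own pickled copy of X, so peak memory grows roughly with n_jobs.

```python
# Hypothetical illustration of the joblib behavior described above:
# each of the n_jobs worker processes gets its own copy of X, so peak
# memory scales roughly as n_jobs * sizeof(X). fit_tree is a stand-in
# for fitting one tree, not scikit-learn's real code.
import numpy as np
from joblib import Parallel, delayed

X = np.random.rand(1000, 10)

def fit_tree(X):
    # Placeholder for training one tree on a (copied) dataset.
    return X.sum()

# Four tasks dispatched to two worker processes; each task sees a copy of X.
results = Parallel(n_jobs=2)(delayed(fit_tree)(X) for _ in range(4))
print(len(results))  # 4
```

Sharing X via memory-mapping (joblib supports this for large arrays) is the usual way to avoid the per-worker copies.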
Hi Youssef,
please make sure that you use the latest version of sklearn (>= 0.13) - we
did some enhancements to the sub-sampling procedure lately.
Looking at the RandomForest code - it seems that n_jobs=-1 should not be
the issue for the parallel training of the trees since ``n_jobs =
On Wed, Apr 24, 2013 at 5:03 PM, John Richey ric...@vt.edu wrote:
Hello,
I am having difficulty with a cross validation problem, and any help would
be much appreciated.
I have a large number of research subjects from 15 different data
collection sites. I want to assess whether site has any
Dear John,
Hello,
I am having difficulty with a cross validation problem, and any help
would be much appreciated.
I have a large number of research subjects from 15 different data
collection sites. I want to assess whether site has any influence on
the data.
The simplest way to do this
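One standard approach to John's question is leave-one-site-out cross-validation. A hedged sketch on synthetic data (the variable names and data are assumptions; in current scikit-learn the splitter is `LeaveOneGroupOut`, the 2013-era equivalent was `LeaveOneLabelOut`):

```python
# Hedged sketch: leave-one-site-out cross-validation to check whether a
# model generalizes across collection sites. All data here are synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(150, 5)                 # 150 subjects, 5 features
y = rng.randint(0, 2, size=150)       # binary outcome
sites = rng.randint(0, 15, size=150)  # 15 collection sites (assumed labels)

# Each fold holds out every subject from one site and trains on the rest.
logo = LeaveOneGroupOut()
scores = cross_val_score(LogisticRegression(), X, y, cv=logo, groups=sites)
print(scores.shape)  # one score per held-out site
```

A large drop in held-out-site accuracy relative to ordinary shuffled CV would suggest the site does influence the data.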
I also think it will be great to have this example on the website.
Do you mean like an interactive example that works similar to the SVM GUI
example
http://scikit-learn.org/dev/auto_examples/applications/svm_gui.html#example-applications-svm-gui-py
but for understanding the effects of shifting and
On Thu, Apr 25, 2013 at 02:09:13PM +0200, Jaques Grobler wrote:
I also think it will be great to have this example on the website.
Do you mean like an interactive example that works similar to the SVM
Gui example, but for understanding the effects that shifting and scaling of
data have on the rate
I think he means: what improvements/benefits do you get from rescaling
features, e.g. with MinMax or preprocessing.scale
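The two rescalers mentioned can be shown side by side; a minimal sketch on the iris features (standardization gives zero mean / unit variance, MinMax maps each feature into [0, 1]):

```python
# Minimal sketch of the two preprocessing options named above, applied
# to the iris features.
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)

X_std = preprocessing.scale(X)                  # zero mean, unit variance
X_mm = preprocessing.MinMaxScaler().fit_transform(X)  # each column in [0, 1]

print(X_std.mean(axis=0).round(6))          # ~0 for each feature
print(X_mm.min(axis=0), X_mm.max(axis=0))   # 0s and 1s
```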
On Thu, Apr 25, 2013 at 02:09:13PM +0200, Jaques Grobler wrote:
I also think it will be great to have this example on the website.
Do you mean like an interactive example that works
thank you very much Peter,
you are right about the n_jobs, something was going wrong with that. When
n_jobs = -1, for a larger dataset (1e6 samples in this case), no CPU was
being used and the process was hanging for a while. Setting n_jobs = 1 made
everything work.
yes, I will look into the IPython
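The workaround Youssef describes, pinned to a small synthetic dataset so it runs quickly, would look roughly like this (the sizes are illustrative, not his actual 1e6-sample data):

```python
# Hedged sketch of the workaround above: train the forest in a single
# process (n_jobs=1) when multiprocessing hangs on a large dataset.
# Sizes are shrunk here so the snippet runs fast.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

clf = RandomForestClassifier(n_estimators=10, n_jobs=1, random_state=0)
clf.fit(X, y)
print(len(clf.estimators_))  # 10 trees, trained in a single process
```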
ohh makes total sense now!! thank you Gilles!!
Y
On Thu, Apr 25, 2013 at 2:38 AM, Gilles Louppe g.lou...@gmail.com wrote:
Hi Youssef,
Regarding memory usage, you should know that it'll basically blow up if
you increase the number of jobs. With the current implementation, you'll
need
Hi Brian,
Thanks for your feedback. Were you able to reproduce their results? How big
was the dataset that you have processed so far with an RF?
The MS people used a distributed RF, so yes, the features, I am guessing,
were being computed in parallel on all these cores. Though, I am
still new
I've tried larger data sets. It wasn't pretty, much fewer features though
On Apr 25, 2013 4:03 AM, Peter Prettenhofer peter.prettenho...@gmail.com
wrote:
Hi Youssef,
please make sure that you use the latest version of sklearn (>= 0.13) - we
did some enhancements to the sub-sampling procedure
Hello everyone,
I'm Mingxing Zhang, currently a Ph.D. student at Tsinghua University in
China (majoring in Computer Science at the Institute for High Performance
Computing).
If possible, I want to apply for the idea "Coordinate descent in
linear models beyond squared loss (e.g. logistic)" in
Thanks Ronnie for pointing out the exact method in the scikit-learn
library. Yes, that is exactly what I was asking: how does the rescaling
of features affect the gradient descent algorithm? Stochastic
gradient descent is an algorithm which is used in machine learning quite
a lot. It
Hi,
Do you mean scaling the parameters of the cost function? If so, scaling
will change the surface of the cost function, of course. It's kind of
complicated to say anything about how the surface will behave; it
completely depends on the cost function you are using. A cost function that
is linear
Okay, then I'll put together a biclustering proposal tomorrow after work.
It will be a difficult task to come up with a good set of core algorithms,
because the field is so varied. There are over a hundred published methods,
each of which formulates the biclustering problem differently. Any
Hi everyone!
I'm new to scikit and I'm getting into trouble with some visualization
method!
What I want to do is visualize, in a plot/graph, something like this:
http://scikit-learn.org/stable/auto_examples/exercises/plot_iris_exercise.html
Essentially I would like to see the background color based on my
Hi Gianni.
There is a fundamental problem with what you want to do, independent of
SVMs.
In the plot, the 2d plane represents the input space.
Your input space is 6d. You can not represent 6d on a computer monitor
(that I know of).
So there is no way to plot your data.
What you
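A common workaround consistent with this reply (though not stated in it) is to project the 6-d input space down to 2-d, e.g. with PCA, and plot the projection instead of the raw space. A minimal sketch on synthetic data:

```python
# Hedged sketch: a 6-d input space cannot be plotted directly, but a 2-d
# PCA projection of it can. The data here are synthetic stand-ins.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(100, 6)  # 6-d input space, as in Gianni's data

X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)  # (100, 2): now plottable as a 2-d scatter
```

Note that a decision boundary drawn in the projected plane is only an approximation of the classifier's true 6-d boundary.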
I did not mean parameters of the cost function. I only want to scale the
input variables. Suppose one of the independent variables has a range
from 10 - 1000 and some other has a range in 0.1 - 1. Then Andrew Ng and
others say in their machine learning lectures that one should rescale
the
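The lecture advice quoted here can be made concrete with a small numeric illustration: two independent variables with the ranges mentioned (10-1000 and 0.1-1) are brought onto a comparable scale by standardization before gradient descent is run.

```python
# Hedged numeric illustration of the rescaling advice above: two
# features whose ranges differ by orders of magnitude, standardized to
# unit variance per column. Data are synthetic.
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = np.column_stack([
    rng.uniform(10, 1000, 500),  # wide-range independent variable
    rng.uniform(0.1, 1, 500),    # narrow-range independent variable
])

X_scaled = StandardScaler().fit_transform(X)
print(X.std(axis=0).round(1))  # wildly different spreads before scaling
print(X_scaled.std(axis=0))    # ~1 for both columns after scaling
```

With comparable feature scales, the cost surface is less elongated and a single gradient-descent step size works reasonably for all coordinates.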