you can have a look at sklearn.cross_validation.train_test_split() and
some other methods
from here:
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.cross_validation
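The suggestion above can be sketched as follows. Note that in current scikit-learn the function has moved from sklearn.cross_validation to sklearn.model_selection; the data here is a made-up toy array:

```python
import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in old versions

X = np.arange(20).reshape(10, 2)  # 10 samples, 2 features
y = np.arange(10) % 2             # toy labels

# Hold out 30% of the samples for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```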
On Fri, Jun 21, 2013 at 3:59 AM, Joel Nothman
jnoth...@student.usyd.edu.au wrote:
Please see
On Fri, Jun 21, 2013 at 6:56 AM, Nicolas Trésegnie
nicolas.treseg...@gmail.com wrote:
- To impute only some of the missing values (rows, columns or a
combination)
I think this can be added later if you have time. For now, I would rather
not clutter the API.
For rows, one can just use
Hi all,
I would like to know whether we have bootstrap aggregating functionality
in the scikit-learn library. If so, how do I use it?
(If it doesn't exist, I would like to implement it explicitly so that it is
consistent with the learning algorithms we have in scikit-learn.)
Thank you
Hi,
Such ensembles are not implemented at the moment.
Gilles
On 21 June 2013 09:59, Maheshakya Wijewardena pmaheshak...@gmail.com wrote:
I'm doing a brownfield development for a university project and I'm very
interested in this field. If I start implementing that kind of ensemble
method, will it fit within the scope of the scikit-learn project? Will it be
useful for the users? (I've felt the need for it personally. It has
improved the
Can anyone give me a sample algorithm for the one-hot encoding used in
scikit-learn?
On Thu, Jun 20, 2013 at 8:37 PM, Peter Prettenhofer
peter.prettenho...@gmail.com wrote:
you can try an ordinal encoding instead - just map each categorical value
to an integer so that you end up with 8 numerical
? you already use one-hot encoding in your example (
preprocessing.OneHotEncoder)
2013/6/21 Maheshakya Wijewardena pmaheshak...@gmail.com
I'd like to analyse it a bit and encode using that method so that it works
with random forests in scikit-learn.
On Fri, Jun 21, 2013 at 2:08 PM, Peter Prettenhofer
peter.prettenho...@gmail.com wrote:
2013/6/21 Gilles Louppe g.lou...@gmail.com:
Ensembles of trees have a `bootstrap` parameter that does bagging,
although they also randomize the feature selection and, optionally,
the split locations.
--
Olivier
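Olivier's point about the `bootstrap` parameter can be illustrated as follows (a hedged sketch using the current scikit-learn module paths and a toy dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)

# With bootstrap=True, each tree is fit on a bootstrap sample of the
# training set (bagging), on top of the per-split feature subsampling
# that forests do anyway.
clf = RandomForestClassifier(n_estimators=25, bootstrap=True, random_state=0)
clf.fit(X, y)

print(clf.score(X, y))  # training accuracy
```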
So that means bagging can only be applied to trees. How about
implementing a general module so that it can be applied to more learning
algorithms?
On Fri, Jun 21, 2013 at 4:17 PM, Olivier Grisel olivier.gri...@ensta.org wrote:
What do you mean? It's pretty trivial to implement one-hot encoding; the
issue is that if you use a non-sparse format, you'll end up with a
matrix that is far too dense to be practical for anything but trivial
examples.
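The sparsity point above is exactly why OneHotEncoder returns a scipy sparse matrix by default. A hedged sketch against the current scikit-learn API (which, unlike the 2013 one, accepts string categories directly):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([['red'], ['green'], ['blue'], ['green']])

enc = OneHotEncoder()              # sparse output by default
X_onehot = enc.fit_transform(X)    # a scipy sparse matrix

# Categories are sorted: blue, green, red -> columns 0, 1, 2.
print(X_onehot.toarray())
```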
On Fri, Jun 21, 2013 at 10:46 AM, Maheshakya Wijewardena
On 06/21/2013 12:56 PM, Maheshakya Wijewardena wrote:
I think that would be great.
You should look at the forest implementation to get
Thank you. I'll have a look at the forest implementation, check what can
be done, and let you know.
I'd like to have a look at Gilles's code. If it's convenient, can you tell
me how you tried to implement it?
best
Maheshakya
On Fri, Jun 21, 2013 at 6:55 PM, Andreas Mueller
I think it was mostly removing
Ok, I got it.
I'll look at the code and see what can be done.
Thank you.
On Fri, Jun 21, 2013 at 7:17 PM, Andreas Mueller
amuel...@ais.uni-bonn.de wrote:
Thank you very much for the link! It does almost exactly what I want to do!
In my case I have two classes, for example 0 and 1. I want to keep the
distribution between them similar (in the training set and therefore also
in the test set). And I also need them to be chosen randomly; I don't care if in one
StratifiedKFold will keep the class distribution the same for you:
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html#sklearn.cross_validation.StratifiedKFold
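The StratifiedKFold suggestion above can be sketched like this (the class now lives in sklearn.model_selection and takes `n_splits` rather than the 2013 constructor arguments; the data is a made-up imbalanced toy set):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold  # sklearn.cross_validation in 2013

y = np.array([0] * 9 + [1] * 3)   # imbalanced labels, 3:1 ratio
X = np.arange(24).reshape(12, 2)

skf = StratifiedKFold(n_splits=3)
for train_idx, test_idx in skf.split(X, y):
    # Each test fold preserves the 3:1 class ratio of the full dataset.
    print(np.bincount(y[test_idx]))  # [3 1] in every fold
```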
There are lots of metrics (score functions, etc.) available:
I was looking at this; the documentation says:
"This cross-validation object is a variation of KFold,
Oh sorry, I was thinking of balanced sets for cross validation, rather than
a training and testing split. I don't know of a convenience routine
specifically for producing stratified training and testing sets. If both
your classes have decent support and the training and testing set sizes
aren't
Ah, OK! Yeah, I was thinking that a 50/50 (or even 40/60) split between the
two classes in my dataset would not be a problem, but since the ratio is
1/3 I would prefer to have the same distribution for both; hence my choice
to use the train_test_split method. I don't know if there
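For what it's worth, current versions of train_test_split accept a `stratify` argument that does exactly this (in 2013 one would have used StratifiedShuffleSplit instead). A hedged sketch with the modern API on a made-up 3:1 dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

y = np.array([0] * 9 + [1] * 3)   # 3:1 class ratio
X = np.arange(24).reshape(12, 2)

# stratify=y preserves the class ratio in both halves of the split.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=4, stratify=y, random_state=0)

print(np.bincount(y_tr), np.bincount(y_te))  # [6 2] [3 1]
```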
Found the error... I post it below. The problem is that metrics.confusion_matrix
accepts lists and not numpy.array, so I converted everything to lists:
# Compute the confusion matrix
y_testlist_tmp = y_test.transpose().tolist()
y_testlist = y_testlist_tmp[0]
resultlist = result.tolist()
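For what it's worth, metrics.confusion_matrix does accept NumPy arrays as long as they are 1-D; the conversion above works because it flattens a column vector, and ravel() achieves the same thing more directly. A hedged sketch with made-up labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_test = np.array([[0], [1], [1], [0]])   # column vector, shape (4, 1)
result = np.array([0, 1, 0, 0])           # 1-D predictions

# confusion_matrix expects 1-D label arrays; ravel() flattens the column.
cm = confusion_matrix(y_test.ravel(), result)
print(cm)
```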
Dear All,
I'm stuck with a problem and I don't know if it's a bug. I'm defining the
optimization parameters C and gamma for my SVM in this way:
C = 10.0 ** numpy.arange(-3, 9)
gamma = 10.0 ** numpy.arange(-6, 4)
param_grid = dict(gamma=gamma, C=C)
svr = svm.SVC(kernel='rbf')
clfopt =
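A complete grid search over a parameter grid like the one above would typically look like this (a hedged sketch with the current sklearn.model_selection API, using a smaller grid and the iris data to keep it fast):

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Smaller grid than the original, for speed.
C = 10.0 ** np.arange(-1, 3)
gamma = 10.0 ** np.arange(-3, 1)
param_grid = dict(gamma=gamma, C=C)

# Exhaustive search over the grid with 3-fold cross-validation.
clfopt = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=3)
clfopt.fit(X, y)

print(clfopt.best_params_)
```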