Re: [Scikit-learn-general] Canonical Correlation Forests

2015-12-10 Thread Artem
Hi Scott. The paper is quite new, and sklearn has a policy about introducing new algorithms. I'd say we need more time for others to test it and prove its usefulness. On Thu, Dec 10, 2015

Re: [Scikit-learn-general] Contribution to Scikit

2015-10-23 Thread Artem
Hi Rajlaxmi. There are *many issues* labeled easy with no assignee. On Fri, Oct 23, 2015 at 2:43 PM, Rajlaxmi Sahu wrote: > Hi, > > I would like to contribute to

Re: [Scikit-learn-general] passing optional parameters to fit() when using a pipeline

2015-09-20 Thread Artem
Hi. Don't pass any parameters to the fit method. The current API assumes that you set all the parameters in the estimator's constructor (the __init__ method). It's a bit nasty to set the validation set during the construction stage, but there's no better approach. On Sun, Sep 20, 2015 at 3:47 PM, okek padokek
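The convention described above can be sketched as follows (a minimal runnable example, not from the original thread; the step names "scale" and "clf" are illustrative): hyperparameters live in the constructor, pipeline steps are addressed via `set_params` with `step__param` names, and `fit` itself takes no hyperparameters.

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=100, random_state=0)
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", SGDClassifier(random_state=0))])

# A step's parameters are addressed as <step_name>__<param_name>.
pipe.set_params(clf__alpha=1e-3)
pipe.fit(X, y)  # fit() receives only the data, never hyperparameters
```

The same `step__param` naming is what a parameter grid for `GridSearchCV` uses, which is why everything must be reachable through the constructor.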

Re: [Scikit-learn-general] passing optional parameters to fit() when using a pipeline

2015-09-20 Thread Artem
el__n_epochs = [10, 20], *my_model__validation_set > = [???]*) > estimator = > GridSearchCV(pipe, my_params, verbose=5, cv=5) > estimator.fit(x_train, y_train) > > > ? > > On Sun, Sep 20, 2015 at 10:10 AM, Artem <barmaley@gmail.com> wrote: > >> Hi >> >> Don

Re: [Scikit-learn-general] About C50

2015-08-22 Thread Artem
Do you mean C5.0, which is a further development of the C4.5 tree algorithm? If so, then the answer is no, it's not implemented in sklearn. Furthermore, according to Wikipedia https://en.wikipedia.org/wiki/C4.5_algorithm#Improvements_in_C5.0.2FSee5_algorithm, C5.0 is a commercial product and (AFAIK)

Re: [Scikit-learn-general] AUC really low

2015-08-05 Thread Artem
there is the mistake? It seems that I should invert the expected vector y_test? On 4 August 2015 at 16:36, Artem barmaley@gmail.com wrote: Hi Herbert. The worst value for AUC is 0.5, actually. Having values close to 0 means that you can get a value just as close to 1 by flipping your predictions

Re: [Scikit-learn-general] AUC really low

2015-08-04 Thread Artem
Hi Herbert. The worst value for AUC is 0.5, actually. Having values close to 0 means that you can get a value just as close to 1 by flipping your predictions (predict class 1 when you think it's 0 and vice versa). Are you sure you didn't confuse the classes somewhere along the line? (You might have
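The point above can be demonstrated in a couple of lines (a small illustration with made-up scores, not the original poster's data): an AUC near 0 means the ranking is informative but inverted, and flipping the scores mirrors it around 0.5.

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1]
scores = [0.9, 0.8, 0.2, 0.1]   # positives get the *lowest* scores

auc = roc_auc_score(y_true, scores)                        # perfectly inverted ranking
flipped = roc_auc_score(y_true, [1 - s for s in scores])   # same ranking, reversed
```

In general AUC(1 - s) = 1 - AUC(s), so a score far below 0.5 usually points to swapped class labels rather than a genuinely bad model.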

Re: [Scikit-learn-general] Speed up transformation step with multiple 1 vs rest binary text classifiers.

2015-07-02 Thread Artem
Hi Nikhil. Do you somehow do topic-specific TF-IDF transformations? Could you provide a small (pseudo)code snippet of what you're doing? I may be wrong, but judging from what you wrote, it doesn't look like you use scikit-learn's OneVsRestClassifier

Re: [Scikit-learn-general] RandomForestClassifier with warm_start and n_jobs

2015-06-24 Thread Artem
Hi Dale. Thanks for the code sample! Indeed, warm_start does not disable parallelization; I can confirm this by both running your code and checking the source. Moreover, the example you mentioned was added on May 2nd, and it doesn't look
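The warm_start behaviour under discussion can be sketched as follows (a minimal example, not the poster's code): with warm_start=True, a second fit with a larger n_estimators keeps the existing trees and only grows the extra ones, while n_jobs parallelism remains in effect.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, random_state=0)
rf = RandomForestClassifier(n_estimators=10, warm_start=True,
                            n_jobs=2, random_state=0)
rf.fit(X, y)                    # builds the first 10 trees (in parallel)

rf.set_params(n_estimators=25)
rf.fit(X, y)                    # adds 15 more trees, keeping the first 10
```

After the second fit, `rf.estimators_` holds 25 trees in total, not 10 + 25.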

Re: [Scikit-learn-general] [GSoC2015 metric learning]

2015-05-31 Thread Artem
reference. Michael On Fri, May 29, 2015 at 6:24 PM, Artem barmaley@gmail.com wrote: So, I created a WIP PR dedicated to NCA: https://github.com/scikit-learn/scikit-learn/pull/4789 As suggested by Michael, I refactored the meat into a function. I also rewrote it as a first order

Re: [Scikit-learn-general] [GSoC2015 metric learning]

2015-05-31 Thread Artem
version, but its speedup isn't that significant. On Sun, May 31, 2015 at 9:29 PM, Michael Eickenberg michael.eickenb...@gmail.com wrote: On Sun, May 31, 2015 at 7:25 PM, Artem barmaley@gmail.com wrote: I added a simple benchmark https://github.com/Barmaley-exe/scikit-learn/blob/metric

Re: [Scikit-learn-general] [GSoC2015 metric learning]

2015-05-29 Thread Artem
So, I created a WIP PR dedicated to NCA: https://github.com/scikit-learn/scikit-learn/pull/4789 As suggested by Michael, I refactored the meat into a function. I also rewrote it as a first-order oracle, so I can (and do) use scipy's optimizers. I've seen scipy.optimize.minimize (apparently,

Re: [Scikit-learn-general] [GSoC2015 metric learning]

2015-05-28 Thread Artem
it's enough to use scipy's conjugate gradient optimizer? On Mon, May 4, 2015 at 2:02 PM, Michael Eickenberg michael.eickenb...@gmail.com wrote: Dear Artem, congratulations on the acceptance of your GSoC proposal! I am certain there will be a very interesting summer ahead of us. Kyle and I

Re: [Scikit-learn-general] GSoC Community Bonding

2015-05-27 Thread Artem
Hi Gael. My GSoC blog URL is http://barmaley-exe.blogspot.com As required, there's a relevant tag, gsoc15. On Mon, May 25, 2015 at 3:08 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: Hi GSOC students, And welcome. I hope that you will have a fun and productive summer. To communicate

Re: [Scikit-learn-general] use features from a sklearn branch

2015-05-08 Thread Artem
Looks like you have a circular import, and Python doesn't like them. Sorry, I don't have a quick hack solution to this; all I can propose is to look at the import chain, understand which import breaks it, and get rid of it. For example, you can move some imports into functions, so they're not

[Scikit-learn-general] [GSoC] Project Metric Learning

2015-05-02 Thread Artem
Hello Andreas, hello Michael. First, I'm happy to be selected as this year's scikit-learn student, and I hope to do great work. According to my timeline https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Metric-Learning-module#april-27th--may-24th, I'm going to use community

Re: [Scikit-learn-general] error with RFE and gridsearchCV

2015-04-28 Thread Artem
GridSearchCV is not an estimator, but a utility to find one. So you should `fit` the grid search first in order to find the classifier that performs well on CV splits, and then use it. Like this: gbr = GradientBoostingClassifier() parameters = {'learning_rate': [0.1, 0.01, 0.001],
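A complete version of the pattern described above might look like this (a sketch, not the poster's full grid; the import path is the modern sklearn.model_selection one, whereas the 2015 thread would have used sklearn.grid_search): fit the search first, then take the selected classifier from best_estimator_.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=100, random_state=0)
gbr = GradientBoostingClassifier(n_estimators=10, random_state=0)
parameters = {"learning_rate": [0.1, 0.01]}   # illustrative grid only

search = GridSearchCV(gbr, parameters, cv=3)
search.fit(X, y)                # run the search first...
clf = search.best_estimator_    # ...then use the classifier it found
```

Only after `search.fit` do attributes like `best_estimator_` and `best_score_` exist, which is why wrapping an unfitted GridSearchCV in RFE fails.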

Re: [Scikit-learn-general] Degree parameter in Nu-Support Vector Classification

2015-04-22 Thread Artem
Looks like a typo, indeed. Libsvm only uses `degree` for polynomial kernels. On Wed, Apr 22, 2015 at 11:39 PM, Sebastian Raschka se.rasc...@gmail.com wrote: Hi all, I am wondering a little bit about this documentation of the degree parameter on NuSVM and SVR: degree : int, optional

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-26 Thread Artem
`AgglomerativeClustering` works well with a custom metric, and Spectral Clustering and Affinity Propagation can work with a [n_samples, n_samples] affinity matrix. On Thu, Mar 26, 2015 at 12:08 PM, Mathieu Blondel math...@mblondel.org wrote: On Thu, Mar 26, 2015 at 5:49 PM, Artem barmaley@gmail.com

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-26 Thread Artem
class SimilarityTransformer(TransformerMixin):
    def fit(self, X, y):
        self.X_ = X
        return self

    def transform(self, X):
        return -euclidean_distances(X, self.X_)

On Thu, Mar 26, 2015 at 6:28 PM, Artem barmaley@gmail.com wrote: Yes, the only need for such similarity learners

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-26 Thread Artem
distances and needs affinity=precomputed (otherwise, it assumes that X is [n_samples, n_features]) - Instead of duplicating each class, you could create a generic transformer that outputs a similarity / distance matrix from X. M. On Thu, Mar 26, 2015 at 4:50 PM, Artem barmaley@gmail.com wrote

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-26 Thread Artem
how this would look like. M. On Thu, Mar 26, 2015 at 5:18 AM, Artem barmaley@gmail.com wrote: ​Ok, so I removed matrix y from the proposal https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Metric-Learning-module. Therefore I also shortened the first iteration by one

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-25 Thread Artem
this by either building upon it or testing against it, after having evaluated it. Michael On Wed, Mar 25, 2015 at 9:22 PM, Andreas Mueller t3k...@gmail.com wrote: You can always amend your melange proposal, so there is no reason not to submit an early version. On 03/25/2015 04:18 PM, Artem wrote

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
()
pipeline = Pipeline([('ml', ml), ('knn', knn)])
pipeline.fit(X_train, y_train)
pipeline.predict(X_test)  # ml.transform returns transformed data

On Tue, Mar 24, 2015 at 1:43 AM, Andreas Mueller t3k...@gmail.com wrote: Hi Artem. I thought that was you, but I wasn't sure. Great, I linked

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
. I'll elaborate on my proposal later today. On Tue, Mar 24, 2015 at 2:34 PM, Joel Nothman joel.noth...@gmail.com wrote: Hi Artem, I've taken a look at your proposal. I think this is an interesting contribution, but I suspect your proposal is far too ambitious: - The proposal doesn't well account

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
(ITML). By the end of the 10th week I might still not have the second review completed, but that's okay; there are 2+ more weeks to get it done. On Wed, Mar 25, 2015 at 4:04 AM, Vlad Niculae zephy...@gmail.com wrote: Hi Artem, hi everybody, There were two API issues and I think both need thought

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
You mean matrix-like y? Gael said: FWIW, it'll require some changes to cross-validation routines. I'd rather we try not to add new needs and use cases to these before we release 1.0. We are already having a hard time covering in a homogeneous way all the possible options. Then

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
In other words, I would like to get into an API-freeze state where we add/modify only essential stuff in the API. Ok, then I suppose the easiest way would be to create 2 kinds of transformers for each method: one that transforms the space so that Euclidean distance acts like Mahalanobis'

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-23 Thread Artem
, apparently, there's no justification to use kernel approximation with ITML, since even the regular KPCA trick doesn't apply to it. On Mon, Mar 23, 2015 at 5:07 PM, Andreas Mueller t3k...@gmail.com wrote: On 03/21/2015 08:54 PM, Artem wrote: Are there any objections on Joel's variant of y

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-23 Thread Artem
with any method. There are also methods that directly learn a nonlinear distance, for instance GB-LMNN (http://www-bcf.usc.edu/~feisha/pubs/chi2.pdf) or some approaches based on deep neural nets. Aurélien On 3/23/15 11:43 PM, Andreas Mueller wrote: Hi Artem. I thought that was you

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-23 Thread Artem
It's worth noting that there was a similar project https://github.com/scikit-learn/scikit-learn/pull/2387 2 years ago, but unfortunately it wasn't completed. I did some work on top of that, but I didn't get any feedback. On Tue, Mar 24, 2015 at 3:23 AM, Vlad Niculae zephy...@gmail.com wrote: Hi

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-19 Thread Artem
, I think this does look like a good basis for a proposal :) On 03/18/2015 05:14 PM, Artem wrote: Do you think this interface would be useful enough? One of the mentioned methods (LMNN) actually uses prior knowledge in exactly the same way, by comparing label equality. Though

[Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
Hello everyone. Recently I mentioned metric learning as one of the possible projects for this year's GSoC, and I would like to hear your comments. Metric learning, as follows from the name, is about learning distance functions. Usually the metric that is learned is a Mahalanobis metric, thus the
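The Mahalanobis metric mentioned above is parameterized by a PSD matrix M = LᵀL, so the learned distance is just the Euclidean distance after the linear map L. This identity is what lets a learned metric be exposed as a scikit-learn transformer. A small numeric check (L here is random, standing in for a learned map):

```python
import numpy as np

rng = np.random.RandomState(0)
L = rng.rand(2, 2)       # stand-in for a learned linear map
M = L.T @ L              # the corresponding Mahalanobis matrix
x, y = rng.rand(2), rng.rand(2)

# d_M(x, y) = sqrt((x - y)^T M (x - y)) = ||Lx - Ly||_2
d_mahalanobis = np.sqrt((x - y) @ M @ (x - y))
d_euclidean_after_map = np.linalg.norm(L @ x - L @ y)
```

The two quantities coincide, so transform(X) = X @ L.T followed by ordinary Euclidean distance recovers the learned metric.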

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
?) But what would the labels for the fit look like? Cheers, Andy On 03/18/2015 08:39 AM, Artem wrote: Hello everyone. Recently I mentioned metric learning as one of the possible projects for this year's GSoC, and I would like to hear your comments. Metric learning, as follows from the name

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
gael.varoqu...@normalesup.org wrote: On Wed, Mar 18, 2015 at 07:21:18PM +0300, Artem wrote: As to what y should look like, it depends on what we'd like the algorithm to do. We can go with usual y vector consisting of feature labels. Actually, LMNN is done this way, the optimization objective depends

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
the current API. Can you explain this statement a bit more: We can go with usual y vector consisting of feature labels? Thanks, Andy On 03/18/2015 12:55 PM, Artem wrote: Well, we could go with fit(X, y), but since algorithms use S and D, it'd be better to give the user a way to specify them

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-18 Thread Artem
, but overfitting could be a problem, indeed. Not sure how severe it can be. On Wed, Mar 18, 2015 at 10:07 PM, Andreas Mueller t3k...@gmail.com wrote: On 03/18/2015 02:53 PM, Artem wrote: I mean that if we were solving classification, we would have y that tells us which class each example belongs

Re: [Scikit-learn-general] GSoC2015 topics

2015-03-03 Thread Artem
There was a discussion http://www.mail-archive.com/scikit-learn-general@lists.sourceforge.net/msg06931.html on metric learning a while ago, and several people expressed interest in seeing (and contributing to) it in sklearn. But it looks like that attempt didn't get anywhere. What about a project to

Re: [Scikit-learn-general] random forests with n_jobs > 1

2015-02-27 Thread Artem
Do you have joblib installed? Does n_jobs > 1 work with other algorithms? On Sat, Feb 28, 2015 at 12:55 AM, Pagliari, Roberto rpagli...@appcomsci.com wrote: When using random forests with n_jobs > 1, I see only one python process. Does RF support using the multiprocessing module?

Re: [Scikit-learn-general] decision_function SVM returns one class score only

2015-02-26 Thread Artem
Hi Shalu. decision_function returns the (signed) distance to each of the separating hyperplanes. There's one hyperplane for each pair of classes, so in the case of 2 classes there'd be one hyperplane. The Iris dataset contains 3 classes, so there are 3 possible pairs, and thus 3 columns in the result of
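The pair-counting above can be checked directly (a sketch, not the poster's code; note that current scikit-learn versions default to decision_function_shape='ovr', so 'ovo' is requested explicitly here to match the one-vs-one behaviour described in the thread):

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # 3 classes
clf = SVC(decision_function_shape="ovo").fit(X, y)

# 3 classes -> 3 * (3 - 1) / 2 = 3 pairwise hyperplanes -> 3 columns
scores = clf.decision_function(X[:1])
```

With 2 classes the same call would return a single column, which is the behaviour the original question was about.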

[Scikit-learn-general] Circular import

2015-02-19 Thread Artem
Hello. While working on matrix factorization with missing values https://github.com/scikit-learn/scikit-learn/pull/4237 I faced a circular import issue ( https://travis-ci.org/scikit-learn/scikit-learn/jobs/50276638#L1362). The problem is that I want to add a new imputer, which should reside in
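The usual workaround for such a cycle is to defer one of the imports into a function so it runs at call time instead of at module load. A self-contained sketch (the module names pkg_a and pkg_b are made up for illustration; two throwaway modules are written to a temp directory):

```python
import os
import sys
import tempfile
import textwrap

d = tempfile.mkdtemp()
with open(os.path.join(d, "pkg_a.py"), "w") as f:
    f.write(textwrap.dedent("""
        import pkg_b          # top-level import is safe: pkg_b defers its import of us

        def value():
            return 1
    """))
with open(os.path.join(d, "pkg_b.py"), "w") as f:
    f.write(textwrap.dedent("""
        def helper():
            from pkg_a import value   # deferred: resolved at call time, not load time
            return value() + 1
    """))

sys.path.insert(0, d)
import pkg_a

result = pkg_a.pkg_b.helper()
```

Had pkg_b imported pkg_a at the top level as well, importing either module would fail; moving the import inside helper() breaks the cycle.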

Re: [Scikit-learn-general] Regarding classification with one variable only

2015-02-16 Thread Artem
X needs to be a matrix of shape (n_samples, n_features), not a vector. You need to reshape it into a matrix by doing X_train = X_train.reshape((len(X_train), 1)) On Mon, Feb 16, 2015 at 4:01 PM, shalu jhanwar shalu.jhanwa...@gmail.com wrote: Hi Scikit fans, I am facing the following error
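The shape requirement can be illustrated in a few lines (a standalone numpy sketch with made-up data): a 1-D slice must become a single-column 2-D array before it can be passed to fit, either via slicing or via reshape.

```python
import numpy as np

x_train = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 6.2]])

col_1d = x_train[:, 0]                   # shape (3,): rejected by fit
col_2d = x_train[:, :1]                  # shape (3, 1): single-column matrix
also_2d = x_train[:, 0].reshape(-1, 1)   # equivalent; -1 infers n_samples
```

`reshape(-1, 1)` is the idiomatic spelling, since -1 lets numpy infer the number of samples.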

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-12 Thread Artem
t3k...@gmail.com wrote: On 02/12/2015 04:47 AM, Artem wrote: There are several packages (spearmint, hyperopt, MOE) offering Bayesian Optimization to the problem of choosing hyperparameters. Wouldn't it be nice to add such *Search[CV] to sklearn? Yes. I haven't really looked much

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-12 Thread Artem
There are several packages (spearmint, hyperopt, MOE) offering Bayesian optimization for the problem of choosing hyperparameters. Wouldn't it be nice to add such a *Search[CV] to sklearn? On Thu, Feb 12, 2015 at 12:33 PM, Mathieu Blondel math...@mblondel.org wrote: A grid-search-related project

Re: [Scikit-learn-general] regression with one independent variable

2015-02-11 Thread Artem
fit expects 2-dimensional input, whereas X[:, 0] is one-dimensional. You can either reshape it manually: regr.fit(x_train[:, 0].reshape((x_train.shape[0], 1)), x_train[:, 1]) or use slices to select a continuous range of columns: regr.fit(x_train[:, :1], x_train[:, 1]) What exception tells

Re: [Scikit-learn-general] regression with one independent variable

2015-02-11 Thread Artem
and order. What does the line below do? Thank you, *From:* Artem [mailto:barmaley@gmail.com] *Sent:* Wednesday, February 11, 2015 1:39 PM *To:* scikit-learn-general@lists.sourceforge.net *Subject:* Re: [Scikit-learn-general] regression with one independent variable fit expects

Re: [Scikit-learn-general] GSoC2015 topics

2015-02-11 Thread Artem
There was an interview with Ilya Sutskever about deep learning ( http://yyue.blogspot.ru/2015/01/a-brief-overview-of-deep-learning.html), where he states that DL's success can be attributed to 3 main breakthroughs: 1. Computing resources. 2. Large datasets. 3. Tricks of the trade, discovered in