Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

2015-03-24 Thread Gael Varoquaux
On Tue, Mar 24, 2015 at 07:39:17PM -0400, Vlad Niculae wrote: > 1. The design of multiple metric support is important and would bring an > immense usability gain. But it will also require a framework of its own. I would say that this is to be considered in a second step. G -

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Gael Varoquaux
> I think the problem with matrix-like Y is that Y would be symmetric. Thus for > doing cross-validation one would need to select both rows and columns. Correct. Then indeed it's off limits. These are specifically the kind of problem I would like not to have to worry about. The combination of all t

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Gael Varoquaux
On Tue, Mar 24, 2015 at 09:04:28PM -0400, Vlad Niculae wrote: > There were two API issues and I think both need thought. The first is the > matrix-like Y which at the moment overlaps semantically with multilabel and > multioutput-multiclass (though I think it could be seen as a form of > multi-t

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Gael Varoquaux
On Wed, Mar 25, 2015 at 03:25:40AM +0300, Artem wrote: > You mean matrix-like y? matrix-like y (ie y 2D: n_sample, n_features) is already covered in our API, so I see no problem with it. -- Dive into the World of Parallel

Re: [Scikit-learn-general] [GSoC 2015] Cross-validation and Meta-Estimators for semi-supervised learning

2015-03-24 Thread Gael Varoquaux
On Wed, Mar 25, 2015 at 11:22:51AM +0900, Mathieu Blondel wrote: > The part I am most enthusiastic about is fixing the CV generators, though this > could be a merge nightmare since we are in the process of changing the API. We > need to figure out which modifications are most likely to get in fi

Re: [Scikit-learn-general] [GSoC 2015] Cross-validation and Meta-Estimators for semi-supervised learning

2015-03-24 Thread Mathieu Blondel
The part I am most enthusiastic about is fixing the CV generators, though this could be a merge nightmare since we are in the process of changing the API. We need to figure out which modifications are most likely to get in first. Lars did some work on semi-supervised naive bayes. Since this is

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Mathieu Blondel
I think the problem with matrix-like Y is that Y would be symmetric. Thus for doing cross-validation one would need to select both rows and columns. This is why I suggested to add a _pairwise_y property like the _pairwise property that we use in kernel methods, e.g., https://github.com/scikit-learn
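A minimal sketch of the row-and-column selection described above, assuming Y is stored as a square list of lists of pairwise labels. The helper name `select_pairwise` and the choice of a test-by-train block for scoring are illustrative assumptions (by analogy with precomputed kernels), not part of scikit-learn or of the proposal.

```python
def select_pairwise(Y, row_idx, col_idx):
    """Return the sub-matrix of Y restricted to the given rows and columns."""
    return [[Y[i][j] for j in col_idx] for i in row_idx]


# Toy symmetric matrix: Y[i][j] == 1 means samples i and j are "similar".
Y = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
]

# Hypothetical split: samples 0-2 for training, sample 3 for testing.
train, test = [0, 1, 2], [3]

# Unlike an ordinary y, a pairwise Y must be indexed on both axes.
Y_train = select_pairwise(Y, train, train)  # train x train block
Y_test = select_pairwise(Y, test, train)    # test x train block
```

Indexing only the rows, as the cross-validation helpers do for an ordinary y, would leave columns that refer to held-out samples in the training block, which is the difficulty raised here.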

Re: [Scikit-learn-general] My personal suggestion regarding topics for GSoC (and my official application :-) )

2015-03-24 Thread Mathieu Blondel
Hi Luca, Instead of creating a new thread every time, it would be nice if you could reply directly in the same thread. This would make the discussion easier to follow. (To do so you need to be fully subscribed to the ML. I'm guessing you may be subscribed to the digest version) Thanks, M. On W

Re: [Scikit-learn-general] Fwd: Trouble when compiling with MKL

2015-03-24 Thread Kyle Kastner
The undefined symbol seems like an issue with LD_LIBRARY_PATH or the like. You might try running 'ldd _traversal.so' to see if all the links are resolved correctly. I have also had to change which libm was being pointed at for similar errors related to Anaconda in the past, are you using the defaul

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-24 Thread Kyle Kastner
I like the fact that this can be broken into nice parts. I also think documentation should be farther up the list, and math part lumped in. GMM cleanup should probably start out of the gate, as fixing that will define what API/init changes have to stay consistent in the other two models. Is there any

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
Hi Vlad 1. Usually metric learning uses supervision in one of 2 forms: either two sets of similar (distance is less than some predefined value u) and dissimilar (distance is bigger than l) pairs, or a set of triplets (x, y, z) such that d(x, y) < d(x, z). Though, I think, it's possible to generali
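The two forms of supervision mentioned above can be sketched as simple constraint checks. This is only an illustration under stated assumptions: the function names are hypothetical, and plain Euclidean distance stands in for whatever metric `d` is being learned.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance, used here as a stand-in for a learned metric."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def pair_constraints_ok(similar, dissimilar, u, l, d=euclidean):
    """Similar pairs must be closer than u, dissimilar pairs farther than l."""
    return (all(d(a, b) < u for a, b in similar)
            and all(d(a, b) > l for a, b in dissimilar))


def triplet_constraints_ok(triplets, d=euclidean):
    """Each triplet (x, y, z) must satisfy d(x, y) < d(x, z)."""
    return all(d(x, y) < d(x, z) for x, y, z in triplets)


similar = [((0, 0), (0, 1))]      # distance 1
dissimilar = [((0, 0), (3, 4))]   # distance 5
triplets = [((0, 0), (0, 1), (3, 4))]

pairs_ok = pair_constraints_ok(similar, dissimilar, u=2, l=4)
triplets_ok = triplet_constraints_ok(triplets)
```

A metric learner would search for a `d` that makes these checks pass on as many constraints as possible; the sketch only evaluates them for a fixed `d`.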

Re: [Scikit-learn-general] Fwd: Trouble when compiling with MKL

2015-03-24 Thread João Felipe Santos
It was inside a virtualenv, so this kind of thing supposedly was taken care of automatically. I installed using "pip install scikit-learn". I'll try a manual install tomorrow to see what happens. -- João Felipe Santos On 24 March 2015 at 20:25, Kyle Kastner wrote: > How did you install it? > py

Re: [Scikit-learn-general] [GSOC] Global optimization based Hyper parameter optimization Hamzeh Alsalhi

2015-03-24 Thread hamzeh alsalhi
Hi Andy! I improved my proposal. My background is somewhat beginner so I am doing my best to make sure I understand what I am getting myself into. I have removed the redundant problem statement. I added details for what I think implementation will consist of, right now I am mostly referencing the

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Vlad Niculae
Hi Artem, hi everybody, There were two API issues and I think both need thought. The first is the matrix-like Y which at the moment overlaps semantically with multilabel and multioutput-multiclass (though I think it could be seen as a form of multi-target regression…) The second is the `estima

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Kyle Kastner
I would focus on the API of this functionality and how/what users will be allowed to specify. To me, this is a particularly tricky bit of the PR. As Vlad said, take a close look at GridSearchCV and RandomizedSearchCV and see how they interact with the codebase. Do you plan to find good defaults for

Re: [Scikit-learn-general] [GSoC 2015] Cross-validation and Meta-Estimators for semi-supervised learning

2015-03-24 Thread Boyuan Deng
Hi Vlad: Thank you for your comments! I think I should rename that part as something like "add new implementations and improve existing ones" and mention self-taught learning as an example. We can further discuss what semi-supervised algorithms (one or more) we want later on. Exact dates have

Re: [Scikit-learn-general] Fwd: Trouble when compiling with MKL

2015-03-24 Thread Kyle Kastner
How did you install it? python setup.py develop or install? Did you have to use --user? On Tue, Mar 24, 2015 at 7:41 PM, João Felipe Santos wrote: > Hi, > > I am using MKL with Numpy and Scipy on a cluster and just installed > scikit-learn. The setup process goes without any issue, but if I try t

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
You mean matrix-like y? Gael said > > FWIW It'll require some changes to cross-validation routines. > I'd rather we try not to add new needs and usecases to these before we > release 1.0. We are already having a hard time covering in a homogeneous > way all the possible options.

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Vlad Niculae
Hi Christof, Gael, hi everyone, > On 24 Mar 2015, at 18:09, Gael Varoquaux > wrote: > >> Don't you think that I could also benchmark models that are not >> implemented in sklearn? […] > > I am personally less interested in that. We have already a lot in > scikit-learn and more than enough to

[Scikit-learn-general] My personal suggestion regarding topics for GSoC (and my official application :-) )

2015-03-24 Thread Luca Puggini
Hi guys, thanks for the interest. Some comments below Message: 1 > Date: Tue, 24 Mar 2015 16:32:40 -0400 > From: Andy > Subject: Re: [Scikit-learn-general] My personal suggestion regarding > topics for GSoC (and my official application :-) ) > To: scikit-learn-general@lists.sourceforge.

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Olivier Grisel
Christof, don't forget to put your proposal on melange by Thursday (the earlier the better). Please put "scikit-learn" in the title to make it easy to find. -- Olivier

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-24 Thread Vlad Niculae
Hi Wei Xue, hi everyone, I think Andy’s comments about testing and documentation are very important. I have just a few things to add: 1. As confused as I am about the world around me, I still knew that the current year is 2015 :P I think that the form is asking “which year of your program you

Re: [Scikit-learn-general] [GSoC 2015] Cross-validation and Meta-Estimators for semi-supervised learning

2015-03-24 Thread Vlad Niculae
Hi Boyuan, hi everyone, On top of what Andy said, I would like to add that you don’t have to commit to certain algorithms in the proposal, as long as you make the plan very clear, and you leave time for discussing alternatives, pros and cons with the community. Since you say there is some ove

[Scikit-learn-general] Fwd: Trouble when compiling with MKL

2015-03-24 Thread João Felipe Santos
Hi, I am using MKL with Numpy and Scipy on a cluster and just installed scikit-learn. The setup process goes without any issue, but if I try to import sklearn.hmm I get the following error: ImportError: /sb/home/jfsantos/venv/lib/python2.7/site-packages/sklearn/utils/sparsetools/_traversal.so: un

Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

2015-03-24 Thread Vlad Niculae
Hi Raghav, hi everyone, If I may, I have a very high-level comment on your proposal. It clearly shows that you are very involved in the project and understand the internals well. However, I feel like it’s written from a way too technical perspective. Your proposal contains implementation detai

Re: [Scikit-learn-general] [GSoC 2015] Cross-validation and Meta-Estimators for semi-supervised learning

2015-03-24 Thread Boyuan Deng
Hi Andreas: when I think there is a closed form solution Yes, I remember that in some paper they first give the analytical solution to the optimization problem, and then prove that it's the same result that iterative version will converge to. I'll find that paper and read it again. I think

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Olivier Grisel
I also share Gael's concerns with respect to extending our API in yet another direction at a time where we are trying to focus on ironing out consistency issues... -- Olivier

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Gael Varoquaux
> Don't you think that I could also benchmark models that are not > implemented in sklearn? For instance, I could write a wrapper > DeepNet(...) with fit() and predict(), and which uses internally theano > to build a ANN? In this way, I could benchmark complex deep networks > beyond what will b

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Christof Angermueller
On 20150324 21:25, Andy wrote: > One thing that might also be interesting is "Bootstrapping" (in the > compiler sense, not the statistics sense) the optimizer. > In the latest Jasper Snoek paper http://arxiv.org/abs/1502.05700 they used > a hyper-parameter optimizer to optimiz

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Christof Angermueller
ra et al. 2011) to evaluate optimizers like spearmint. For classification, candidates are * MNIST * CIFAR-10 and for regression: * Boston housing prices @Andy, @Kyle, and @Matthias: thanks for your references! I will have a closer look at them tomorrow! Christof On 20150324 21:25, Andy

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Andy
One thing that might also be interesting is "Bootstrapping" (in the compiler sense, not the statistics sense) the optimizer. In the latest Jasper Snoek paper http://arxiv.org/abs/1502.05700 they used a hyper-parameter optimizer to optimize the parameter of a hyper-parameter optimizer on a set of opt

[Scikit-learn-general] [GSOC] Global optimization based Hyper parameter optimization Hamzeh Alsalhi

2015-03-24 Thread Andy
Hi Hamzeh. Somehow I didn't see you posting in this year's GSoC thread, maybe I was looking for the wrong email address. Here is some initial feedback on your GSoC proposal. Problem description and Project abstract seem a bit redundant. I don't think you mean "constitutional neural networks". It

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Kyle Kastner
This paper (http://arxiv.org/pdf/1306.3476v1.pdf) might also give you some ideas for things to try. Boosting an untrained "deep" model got a lot of benefit from bayesian optimization. Note that this model was built prior to the release of the dataset! Weird but very interesting. On Tue, Mar 24, 20

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Kyle Kastner
That said, I would think random forests would get a lot of the benefits that deep learning tasks might get, since they also have a lot of hyperparameters. Boosting tasks would be interesting as well, since swapping the estimator used could make a huge difference, though that may be trickier to impl

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Kyle Kastner
It might be nice to talk about optimizing runtime and/or training time like SMAC did in their paper. I don't see any reason we couldn't do this in sklearn, and it might be of value to users since we don't really do deep learning as Andy said. On Tue, Mar 24, 2015 at 4:52 PM, Andy wrote: > On 03/2

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Andy
On 03/24/2015 04:38 PM, Christof Angermueller wrote: > Thanks Andy! I replied to your comments: > https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing. > > In summary, > * I will not mention parallelization as an extended feature, > * suggest concrete d

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Michael Eickenberg
nd to which extent I can contribute. > https://github.com/scikit-learn/scikit-learn/pull/4270/ > > I will upload the final version to melange tomorrow. > > > Cheers, > Christof > > > Any further ideas on > > > On 20150324 00:07, Andreas Mueller wrote: >

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Christof Angermueller
expect an improvement. Any further ideas? Where can I find the PR for gaussian_processes? I would like to know about what will be implemented and to which extent I can contribute. I will upload the final version to melange tomorrow. Cheers, Christof Any further ideas on On 20150324 00:07

Re: [Scikit-learn-general] My personal suggestion regarding topics for GSoC (and my official application :-) )

2015-03-24 Thread Andy
Hi Luca. If you give write comment permissions, I could comment on the google doc in-place which might be helpful. As I think was commented earlier, the current PLS already implements NIPALS. What would the addition be? Use that in PCA? That is not super clear from the proposal. I think impleme

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-24 Thread Olivier Grisel
Please send a link to your proposal as a reply to this thread as soon as it's online. -- Olivier

Re: [Scikit-learn-general] Subject: Hyperparameters in scikit-learn

2015-03-24 Thread Andy
Hi Matthias. I think that is an interesting direction to go into, and I actually thought a bit about if and how we could add something like that to scikit-learn. Is there online documentation for paramsklearn? It is a bit hard to say what are good defaults, I think, and it often encodes intuit

[Scikit-learn-general] Subject: Hyperparameters in scikit-learn

2015-03-24 Thread Matthias Feurer
Dear scikit-learn team, After reading the proposal of Christoph Angermüller wanting to enhance scikit-learn with Bayesian optimization (http://sourceforge.net/p/scikit-learn/mailman/message/33630274/) as a GSoC project, you might also want to think again about the integration of a hyperparame

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-24 Thread Andy
Hi Wei Xue. I think the proposal looks good and the scope should work well. I feel like the explanation in Implementing VBGMM is a bit fuzzy, maybe you can rework it a bit. Also, for the timeline, the documentation shouldn't come as an afterthought. Ideally, each improvement is its own pu

Re: [Scikit-learn-general] [GSoC 2015] Cross-validation and Meta-Estimators for semi-supervised learning

2015-03-24 Thread Andy
Hi Boyuan. I looked over your application and it looks good so far. I think it could be a bit more ambitious. I know the idea page was not very elaborate. It might be interesting to improve the existing graph-based algorithms. There is some discussion in https://github.com/scikit-learn/scikit-l

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
> In other words, I would like to get in an "API freeze" state where we add/modify only essential stuff to the API. Ok, then I suppose the easiest way would be to create 2 kinds of transformers for each method: one that transforms the space so that Euclidean distance acts like Mahalanobis'
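A small sketch of why a linear transformer is enough for the first kind, assuming the learned Mahalanobis matrix factorizes as M = LᵀL. The matrix `L`, the helper names, and the toy points are assumptions for illustration, not taken from the proposal. Since (x - y)ᵀ M (x - y) = ||L(x - y)||², the Euclidean distance between transformed points equals the Mahalanobis distance between the originals.

```python
import math


def matvec(A, x):
    """Multiply a list-of-lists matrix A by a vector x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a positive semi-definite matrix M."""
    diff = [a - b for a, b in zip(x, y)]
    return math.sqrt(sum(d * m for d, m in zip(diff, matvec(M, diff))))


# Hypothetical learned linear map L; the metric matrix is M = L^T L.
L = [[2.0, 0.0],
     [1.0, 1.0]]
n = len(L)
M = [[sum(L[k][i] * L[k][j] for k in range(n)) for j in range(n)]
     for i in range(n)]

x, y = [1.0, 2.0], [0.0, 0.0]

d_mahalanobis = mahalanobis(x, y, M)                 # distance in the original space
d_euclidean = euclidean(matvec(L, x), matvec(L, y))  # distance after transform
```

With this factorization, a `transform` that applies `L` lets any downstream estimator that relies on Euclidean distance use the learned metric without further API changes.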

Re: [Scikit-learn-general] Subject Independent KFold

2015-03-24 Thread Jean K
Andreas Muller suggested GroupIndependentKFold. The problem with adding a parameter (such as stratified) to the existing LeaveKLabelOut is that it might be misleading in the sense that: (i) here we don't care about the number of labels left out (ii) The number of labels left out might var

Re: [Scikit-learn-general] Subject Independent KFold

2015-03-24 Thread Jean K
Hi, Yes Michael, that's exactly what I want. Basically I don't care about the number of Labels left out, I just want K (approximately) equilibrated folds, where the same label does not appear in both training and testing (therefore the number of labels left out might vary for each fold). Indeed
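One way to get K roughly balanced folds in which no label appears in both training and testing is a greedy assignment of whole label groups to folds. The sketch below is an assumption about how this could be done, not the implementation in the pull request; the function name `group_k_fold` is hypothetical.

```python
from collections import Counter


def group_k_fold(labels, n_folds):
    """Yield (train_indices, test_indices) with no label shared across the two.

    Greedy heuristic: take label groups from largest to smallest and put
    each one into the fold that currently holds the fewest samples.
    """
    counts = Counter(labels)
    fold_sizes = [0] * n_folds
    fold_of_label = {}
    # Sort by decreasing group size; break ties by repr for determinism.
    for label, count in sorted(counts.items(),
                               key=lambda kv: (-kv[1], repr(kv[0]))):
        smallest = fold_sizes.index(min(fold_sizes))
        fold_of_label[label] = smallest
        fold_sizes[smallest] += count

    for fold in range(n_folds):
        test = [i for i, lab in enumerate(labels) if fold_of_label[lab] == fold]
        train = [i for i, lab in enumerate(labels) if fold_of_label[lab] != fold]
        yield train, test


# Eight samples recorded from four subjects.
subjects = ["a", "a", "a", "b", "b", "c", "d", "d"]
folds = list(group_k_fold(subjects, n_folds=2))
```

Because whole groups are moved at once, the number of labels left out can differ from fold to fold, while the number of samples per fold stays approximately equal.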

Re: [Scikit-learn-general] Subject Independent KFold

2015-03-24 Thread Michael Eickenberg
looks like the difference is that it can group several labels into one fold. not everybody works with "subjects" - the proper name would contain the word Label or Group, or it should be incorporated in a LeaveLabelsOut which could have several modes, among which LeaveOneLabelOut and "balanced" mod

Re: [Scikit-learn-general] Subject Independent KFold

2015-03-24 Thread Gael Varoquaux
On Tue, Mar 24, 2015 at 04:58:20PM +0100, Alexandre Gramfort wrote: > how different is it from > http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LeaveOneLabelOut.html > ? Well, LeaveOneLabelOut should be modified to be able not to do an exhaustive set (which scales pretty

Re: [Scikit-learn-general] Subject Independent KFold

2015-03-24 Thread Jean K
Hi Alexandre, My problem was that each sample was obtained from a specific subject (the same subject possibly produced several samples) and I wanted to train and test on different subjects. As I understand it, LeaveOneLabelOut (and more generally LeavePLabelOut) can be used to leave one subject (o

Re: [Scikit-learn-general] Subject Independent KFold

2015-03-24 Thread Alexandre Gramfort
hi jean, how different is it from http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.LeaveOneLabelOut.html ? A On Tue, Mar 24, 2015 at 4:49 PM, Jean K wrote: > Hi all, > > I recently needed to perform some subject independent KFold > cross-validation. To my knowledge this

[Scikit-learn-general] Subject Independent KFold

2015-03-24 Thread Jean K
Hi all, I recently needed to perform some subject independent KFold cross-validation. To my knowledge this feature isn't in scikit-learn yet, so I created a pull-request with a simple implementation. It is similar to the original Fold exce

Re: [Scikit-learn-general] 8.23.6. sklearn.neighbors.BallTree

2015-03-24 Thread Andy
Hi. There was an issue in generating the docs afterwards. The docs for the 0.16b1 version work again: http://scikit-learn.org/dev/modules/generated/sklearn.neighbors.BallTree.html#sklearn.neighbors.BallTree I don't think there were any major changes. You can look at the docstring directly if you h

[Scikit-learn-general] 8.23.6. sklearn.neighbors.BallTree

2015-03-24 Thread nafise mehdipoor
Hi, Would you please let me know about some documentation for "Ball Tree" in version 0.14 or 0.15. When I search for this, I just find the link below which is for version 0.13.1 http://scikit-learn.org/0.13/modules/generated/sklearn.neighbors.BallTree.html#sklearn.neighbors.BallTree Actually, I need t

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Joel Nothman
On 25 March 2015 at 00:01, Gael Varoquaux wrote: > > > To make this more concrete, the MetricLearner().metric_ estimator would > > require specialised set_params or clone behaviour, I assume. I.e. it > > involves hacking API fundamentals. > > It's more a general principle of "freeze": to be able

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Gael Varoquaux
> To make this more concrete, the MetricLearner().metric_ estimator would > require specialised set_params or clone behaviour, I assume. I.e. it > involves hacking API fundamentals. It's more a general principle of "freeze": to be able to settle down on something that we _know_ works and is robus

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Joel Nothman
On 24 March 2015 at 23:56, Gael Varoquaux wrote: > > So I just thought: what if metric learners will have an attribute > `metric` > > Before adding features and API entries, I'd really like to focus on > having a 1.0 release, with a fixed API that really solves the problems > that we currently ar

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Gael Varoquaux
> So I just thought: what if metric learners will have an attribute `metric` Before adding features and API entries, I'd really like to focus on having a 1.0 release, with a fixed API that really solves the problems that we currently are trying to solve. In other words, I would like to get in an

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
> I'd still call it ``transform`` probably, though. It would be a bit > confusing because it uses the squared transform, but it would make it > possible to build pipelines with clustering algorithms. It's unfortunate that we already have a transform for "linear" metric learners. One could

Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

2015-03-24 Thread Joel Nothman
I agree with everything Andy says. I think the core developers are very enthusiastic to have a project along the lines of "Finish all the things that need finishing", but it's very impractical to do so much context switching both for students and mentors/reviewers. One of the advantages of GSoC is

Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

2015-03-24 Thread Raghav R V
Hi Andy, Thanks a lot for your feedback... I'll update my proposal wiki based on your guidelines and also submit the same to melange too by today! Thanks, R On Tue, Mar 24, 2015 at 3:10 AM, Andreas Mueller wrote: > Hi Raghav. > > I feel that your proposal lacks some focus. > I'd remove the

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Artem
Hi Joel. Thanks for your input! I understand that I put a lot into my proposal, but it's hard to estimate timeline exactly. Thus, I suggest thinking about it as being ordered by priority: most important things go first, and least important (like kernel ITML) may be abandoned in favor of documentat

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Joel Nothman
Hi Artem, I've taken a look at your proposal. I think this is an interesting contribution, but I suspect your proposal is far too ambitious: - The proposal doesn't account well for the need to receive reviews and alter the PR accordingly. This is especially so because you are developing

[Scikit-learn-general] gini impurity as stopping criterion?

2015-03-24 Thread Laurent Prévot
Hello everyone, I am using DecisionTreeClassifier() and RandomForestClassifier() and I would like to know whether there are some ways to use gini impurity as stopping criterion (rather than the proposed parameters: size of leaves or size of splitting nodes). If there is a good reason for not do
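For reference, the Gini impurity of a node with class proportions p_k is 1 - Σ p_k², and a stopping rule based on it could look like the sketch below. The function names and the `min_impurity` threshold are hypothetical and are not parameters of DecisionTreeClassifier or RandomForestClassifier.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of the class labels reaching a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def should_stop_splitting(labels, min_impurity):
    """Hypothetical rule: stop once the node is pure enough."""
    return gini_impurity(labels) <= min_impurity


balanced = gini_impurity([0, 0, 1, 1])   # two equally frequent classes
pure = gini_impurity([1, 1, 1])          # a single class
stop = should_stop_splitting([0, 0, 0, 1], min_impurity=0.4)
```

Such a rule would declare a node a leaf as soon as its impurity falls below the threshold, regardless of how many samples it contains.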

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Christof Angermueller
thanks Andy! I will revise my proposal and submit it to melange today! Christof On 20150324 00:07, Andreas Mueller wrote: > Hi Christof. > I gave some comments on the google doc. > > Andy > > On 03/19/2015 05:12 PM, Christof Angermueller wrote: >> Hi All, >>

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-24 Thread Aurélien Bellet
> Thanks for your comments! Can you say anything on kernelization as part > of a model, not KPCA? I'm especially interested in a kernelized version > of ITML. I think, kernel metric learning methods don't scale well, since > one has to work with a huge matrix of size n_samples x n_samples, which > quic