Re: [Scikit-learn-general] Kernel PCA .fit() Failing Silently

2015-03-25 Thread Gael Varoquaux
> It would be nice to do something else instead of crash and burn, but for the moment that's on the user. I think that in recent Python versions segfaults can be captured.

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Gael Varoquaux
> 1. For the part Implementing VBGMM, do you mean it would be better if I add specific functions to be implemented? @Andreas. My question is: why do you think that, by coding it from scratch rather than trying to understand the existing one and improving it, you'll do a better job? The guy who

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Wei Xue
Dear all, I just updated the proposal draft on github and melange. Thanks, Wei Xue On Wed, Mar 25, 2015 at 5:21 PM, Andreas Mueller wrote: > I don't have a strong opinion. > Maybe it is better than the

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-25 Thread Mathieu Blondel
> Each of them is a transformer that utilizes y during fit, where y is a usual vector of labels of training samples, just like in the case of classification. I am actually confused by this. How are you going to encode the similarities / dissimilarities between samples if y is a vector? > Another poss
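For illustration, the usual convention when only a plain label vector is available is to treat same-class pairs as similar and different-class pairs as dissimilar; a minimal sketch (hypothetical helper, not part of any proposal):

    import itertools
    import numpy as np

    def pairs_from_labels(y):
        """Derive similar/dissimilar index pairs from a label vector y."""
        similar, dissimilar = [], []
        for i, j in itertools.combinations(range(len(y)), 2):
            (similar if y[i] == y[j] else dissimilar).append((i, j))
        return np.array(similar), np.array(dissimilar)

    S, D = pairs_from_labels(np.array([0, 0, 1, 1]))
    # S contains (0, 1) and (2, 3); D contains the four cross-class pairs.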

Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

2015-03-25 Thread Raghav R V
Hi all, thanks a lot for the comments! I've just edited/formatted my prop. based on all of your comments... https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:-Multiple-metric-support-for-CV-and-grid_search-and-other-general-improvements Only thing to be done is to plan what I

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-25 Thread Michael Eickenberg
I do not know the exact state of the algorithm, but the author was working on sklearn compatibility at a sklearn sprint last summer. It seemed like the algorithmic side had been pretty much taken care of, but this needs to be checked. Michael On Wed, Mar 25, 2015 at 11:08 PM, Artem wrote: > Yes

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-25 Thread Artem
Yes, I saw the repo. Didn't know, though, that it's almost completed, thanks for checking! On Thu, Mar 26, 2015 at 1:05 AM, Michael Eickenberg < michael.eickenb...@gmail.com> wrote: > FWIW, although the NCA conversation on github ( > https://github.com/scikit-learn/scikit-learn/issues/3213) is on

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-25 Thread Michael Eickenberg
FWIW, although the NCA conversation on github ( https://github.com/scikit-learn/scikit-learn/issues/3213) is only an issue, Roland (https://github.com/RolT) actually has a full implementation of NCA, which is almost (up to a few details, such as the **kwargs, the class inheritance and some camel ca

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Andreas Mueller
I don't have a strong opinion. Maybe it is better than the current regularization, but then I was wondering why not go all the way to VBGMM. Though I found min_covar hard to set, and so MAP EM might be a good addition. On 03/25/2015 05:17 PM, Wei Xue wrote: @Andreas, on second thought

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Wei Xue
@Andreas, on second thought, MAP EM seems not so important. It just has more theoretical support. We might skip this. Wei On Wed, Mar 25, 2015 at 4:09 PM, Wei Xue wrote: > Sorry for the confusion. > > I am just saying that the min_covar that prevents singular covariances may not be flexible. I think t

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-25 Thread Andreas Mueller
Sorry for the confusion, but that was actually not the meta-estimator I was thinking of. I was thinking about the iterative self-learning method, which is a classical way to make a supervised algorithm semi-supervised. Either way, these would be quite simple meta-estimators, and wouldn't require
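A minimal sketch of such an iterative self-training meta-estimator (a hypothetical class, assuming unlabeled samples are marked with y == -1 and the base estimator has predict_proba; not an existing scikit-learn API):

    import numpy as np
    from sklearn.base import BaseEstimator, ClassifierMixin, clone

    class SelfTrainingClassifier(BaseEstimator, ClassifierMixin):
        def __init__(self, base_estimator, threshold=0.9, max_iter=10):
            self.base_estimator = base_estimator
            self.threshold = threshold
            self.max_iter = max_iter

        def fit(self, X, y):
            y = np.asarray(y).copy()
            labeled = y != -1
            self.estimator_ = clone(self.base_estimator)
            for _ in range(self.max_iter):
                self.estimator_.fit(X[labeled], y[labeled])
                if labeled.all():
                    break
                proba = self.estimator_.predict_proba(X[~labeled])
                confident = proba.max(axis=1) >= self.threshold
                if not confident.any():
                    break
                # Pseudo-label the confident unlabeled samples and refit.
                idx = np.where(~labeled)[0][confident]
                y[idx] = self.estimator_.classes_[proba.argmax(axis=1)[confident]]
                labeled[idx] = True
            return self

        def predict(self, X):
            return self.estimator_.predict(X)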

Re: [Scikit-learn-general] Proposal for GSoC: Dimensionality reduction and features selection

2015-03-25 Thread Michael Eickenberg
Hi Luca, thanks for your gsoc proposal. The proposed topics look interesting as such, but I am having a hard time following the planning: A more fine-grained timeline than 3-4 weeks per sub-project would be very helpful. As Andy says, code review and revisions take time which should be allocated p

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-25 Thread Andreas Mueller
clude? I will use some of the datasets described in the spearmint publication, including * MNIST, * CIFAR-10, and * Boston housing prices. Christof I decided to only benchmark scikit-learn models. On 20150325 19:42, Andreas Mueller wrote: Testing on the global optimization problems dir

Re: [Scikit-learn-general] Kernel PCA .fit() Failing Silently

2015-03-25 Thread Andreas Mueller
Implementing this directly in the estimators seems very messy. If we had decent logging, we could try that. Unfortunately we don't. Pretty printing could also be achieved via a logging mechanism, so that people could define it themselves. I don't think it is something we necessarily want to provide.

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-25 Thread Christof Angermueller
. On 20150325 19:42, Andreas Mueller wrote: Testing on the global optimization problems directly will actually be a time saver, as they can be evaluated directly, without needing to compute an estimator on MNIST for each point. On 03/25/2015 03:15 PM, Gael Varoquaux wrote: I am very afraid of

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-25 Thread Kyle Kastner
See figure 5 of this paper: http://www.cs.ubc.ca/~hutter/papers/ICML14-HyperparameterAssessment.pdf for an example. There is a better paper that exclusively tackles this but I cannot find it at the moment. I was referring to the optimizer preferring algorithms which are both fast and give good pe

Re: [Scikit-learn-general] Kernel PCA .fit() Failing Silently

2015-03-25 Thread Sebastian Raschka
Hi, I think some memory monitoring/warning stuff would be very helpful in general. As far as I know, memory usage via e.g., psutil is not supported by every OS or machine, but we could add an optional "monitor_memory" parameter to estimators/transformers like SomeEstimator(..., monitor_memory=
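A rough illustration of such a check, using only psutil ("monitor_memory" above is Sebastian's hypothetical parameter; the estimate here assumes a dense float64 n_samples x n_samples kernel matrix):

    import warnings
    import numpy as np
    import psutil

    def warn_if_kernel_too_big(X):
        """Warn when the dense kernel matrix for X is unlikely to fit in RAM."""
        n_samples = X.shape[0]
        needed = n_samples ** 2 * np.dtype(np.float64).itemsize
        available = psutil.virtual_memory().available
        if needed > available:
            warnings.warn("Kernel matrix needs ~%.1f GB, only ~%.1f GB free."
                          % (needed / 1e9, available / 1e9))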

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-25 Thread Christof Angermueller
Which SMAC paper are you referring to? What do you mean about optimizing runtime/training time? The optimizer should find good parameters within a short time. Do you mean comparing the best result in a predefined time frame? For this, the 'expected improvement per second' acquisition functio
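For reference, 'expected improvement per second' from the spearmint paper rescales the usual expected improvement by the predicted evaluation cost; roughly, for minimization with f_min the best value observed so far and c(x) a model of the wall-clock cost:

    \mathrm{EI}(x) = \mathbb{E}\left[\max(f_{\min} - f(x), 0)\right], \qquad
    \mathrm{EI/s}(x) = \mathrm{EI}(x) \,/\, \mathbb{E}[c(x)]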

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-25 Thread Andreas Mueller
You can always amend your melange proposal, so there is no reason not to submit an early version. On 03/25/2015 04:18 PM, Artem wrote: Ok, so I removed matrix y from the proposal. Therefore I also

Re: [Scikit-learn-general] [GSoC] Metric Learning

2015-03-25 Thread Artem
Ok, so I removed matrix y from the proposal. Therefore I also shortened the first iteration by one week, since no changes to the current code are needed. This allowed me to extend the last iteration by

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-25 Thread Vinayak Mehta
What do you think about the proposal though? Vinayak On Thu, Mar 26, 2015 at 1:39 AM, Andreas Mueller wrote: > Hi Vinayak. > I was specifically commenting about the self-taught clustering paper that > you mentioned in your email. > Sorry about not being specific. > > Best, > Andy > > > > On 03

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Wei Xue
Sorry for the confusion. I am just saying that the min_covar that prevents singular covariances may not be flexible. I think the value of min_covar is sometimes too large for the estimated covariances. For example, a user first tries a small subset of training data using GMM with default min_covar = 0.001, then
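To make the concern concrete, a sketch with the 0.15-era sklearn.mixture.GMM API (data and values purely illustrative):

    import numpy as np
    from sklearn.mixture import GMM

    # Features on a small scale, so the true covariances are around 1e-4.
    X = np.random.RandomState(0).randn(1000, 2) * 0.01

    gmm_default = GMM(n_components=3, covariance_type='diag')  # min_covar=1e-3 floor
    gmm_small = GMM(n_components=3, covariance_type='diag', min_covar=1e-7)
    gmm_default.fit(X)  # the default floor dominates the estimated covariances
    gmm_small.fit(X)    # the smaller floor lets the data speak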

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-25 Thread Andreas Mueller
Hi Vinayak. I was specifically commenting about the self-taught clustering paper that you mentioned in your email. Sorry about not being specific. Best, Andy On 03/25/2015 04:01 PM, Vinayak Mehta wrote: Hi Andy The idea wiki showed issue #1243 as a reference link which specifically mention

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Kyle Kastner
OK, the mention of sampling had me worried! That clears it up, thanks. And thanks for the paper reference! On Wed, Mar 25, 2015 at 3:53 PM, Andreas Mueller wrote: > Even higher up, it compares variation, collapsed and truncated. > So the variational does not need any sampling (which makes sense

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-25 Thread Vinayak Mehta
Hi Andy The idea wiki showed issue #1243 as a reference link which specifically mentions self-taught learning as a solution for turning an estimator into a semi-supervised one. So, I tried to base my proposal on that. Could you guide me on how to focus more on semi-supervised learning than transfe

Re: [Scikit-learn-general] Kernel PCA .fit() Failing Silently

2015-03-25 Thread Andreas Mueller
It would be nice to do something else instead of crash and burn, but for the moment that's on the user. Well, the kernel approximation should make it work. If you are after visualization I'd also recommend the t-SNE from this branch: https://github.com/scikit-learn/scikit-learn/pull/4025 On 0
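For the record, a sketch of the kernel-approximation route (Nystroem features followed by linear PCA, which roughly approximates RBF kernel PCA without ever building the full kernel matrix; gamma and the component counts are illustrative):

    from sklearn.kernel_approximation import Nystroem
    from sklearn.decomposition import PCA
    from sklearn.pipeline import Pipeline

    # A few hundred landmark points instead of an n_samples x n_samples kernel.
    approx_kpca = Pipeline([
        ('nystroem', Nystroem(kernel='rbf', gamma=0.1, n_components=500,
                              random_state=0)),
        ('pca', PCA(n_components=2)),
    ])
    # X_2d = approx_kpca.fit_transform(X)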

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Andreas Mueller
Even higher up, it compares variational, collapsed and truncated. So the variational does not need any sampling (which makes sense). Btw, this paper has a couple of references for more detailed equations: http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-449.pdf On 03/25/2015 03:20 PM, Kyle Kastner wr

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Andreas Mueller
Sorry, I'm not following. I'm not sure what you are arguing for. I know how VBGMM works, but I'm not sure how MAP EM would work, and why it would be preferable over VBGMM. On 03/25/2015 03:38 PM, Wei Xue wrote: VBGMM is a full Bayesian estimation in both 'E-step' and 'M-step' (although there

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Andreas Mueller
On 03/25/2015 03:20 PM, Kyle Kastner wrote: > (so fast at email!) Aka so slow at actually getting anything done.

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-25 Thread Andreas Mueller
Testing on the global optimization problems directly will actually be a time saver, as they can be evaluated directly, without needing to compute an estimator on MNIST for each point. On 03/25/2015 03:15 PM, Gael Varoquaux wrote: I am very afraid of the time sink that this will be. Sent fro

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Wei Xue
VBGMM is a full Bayesian estimation in both 'E-step' and 'M-step' (although there is no such concept in VB). The parameters in VB are random variables, and are described by a posterior distribution. The posterior distribution is proportional to the product of the likelihood and the prior distribution. On the other h
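In symbols, the distinction being discussed is roughly:

    p(\theta \mid X) \propto p(X \mid \theta)\, p(\theta)
        (VB: approximate this full posterior with a factorized q(\theta))

    \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} \left[ \log p(X \mid \theta) + \log p(\theta) \right]
        (MAP EM: keep a point estimate; the prior regularizes the M-step)

Plain maximum-likelihood EM maximizes \log p(X \mid \theta) alone, which is where the ad-hoc min_covar floor comes in.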

Re: [Scikit-learn-general] Kernel PCA .fit() Failing Silently

2015-03-25 Thread Stephen O'Neill
Hey Andy, Hmmm, that might be it. My machine only has 8GB of RAM - why didn't I think of that? Indeed the RAM usage seems to have pretty large fluctuations for the process, and when I re-run now, instead of just silently dying, it's choking up my whole computer - indicative of a RAM issue. Thank y

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-25 Thread Andreas Mueller
Hi Vinayak. That looks more like a transfer-learning task and I'm not sure how that would a) tie into the project or b) work with the sklearn API. So I'd be -1 on that. Cheers, Andy On 03/25/2015 04:16 AM, Vinayak Mehta wrote: Hi everyone! I've added my proposal to the wiki page. Please suggest impr

Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

2015-03-25 Thread Andreas Mueller
On 03/24/2015 07:39 PM, Vlad Niculae wrote: > Hi Raghav, hi everyone, > > If I may, I have a very high-level comment on your proposal. It clearly shows > that you are very involved in the project and understand the internals well. > However, I feel like it’s written from a way too technical per

Re: [Scikit-learn-general] MultinomialNB vs. svm.SVC(kernel='linear')

2015-03-25 Thread Andreas Mueller
Hi Ali. As far as I know, MultinomialNB just implements Multinomial Naive Bayes. The paper just gives context, the docs don't say we implement that method. I'm not sure how established their tricks actually are. Best, Andy On 03/25/2015 09:53 AM, ali hürriyetoglu wrote: Dear List members, I
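For what it's worth, the linear-classifier view in that note follows from the multinomial NB decision function being linear in the count features x:

    \log p(y = c \mid x) = \log p(y = c) + \sum_i x_i \log \theta_{ci} + \mathrm{const}

so intercept_ essentially mirrors the log class priors (class_log_prior_) and coef_ the log feature probabilities (feature_log_prob_), modulo the two-class special case. The Rennie et al. tricks themselves (complement weighting, TF-IDF, length normalization) are a separate matter, as Andy says.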

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Kyle Kastner
There was mention of TDP (blocked Gibbs higher up in the paper) vs collapsed Gibbs sampling - both mentioned burn-in and lag. I was under the impression you would have to be using one of these two to do the computation, see page 137 of the paper just below the pictures, second paragraph http://www.

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Andreas Mueller
Thanks for your feedback. On 03/25/2015 02:59 PM, Wei Xue wrote: Thanks Andreas, Kyle, Vlad and Olivier for the detailed review. 1. For the part *Implementing VBGMM*, do you mean it would be better if I add specific functions to be implemented? @Andreas. I just felt the paragraph was a bit un

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-25 Thread Gael Varoquaux
I am very afraid of the time sink that this will be. Sent from my phone. Please forgive brevity and misspelling. On Mar 25, 2015, at 19:47, Andreas Mueller wrote: > I think you could bench on other problems, but maybe focus on the ones in scikit-learn. > Deep learning people might be h

Re: [Scikit-learn-general] Fwd: Trouble when compiling with MKL

2015-03-25 Thread Andreas Mueller
There is some discussion of the issue on the issue tracker: https://github.com/scikit-learn/scikit-learn/issues/4083 but I'm not sure if there is much help there. On 03/24/2015 07:41 PM, João Felipe Santos wrote: Hi, I am using MKL with Numpy and Scipy on a cluster and just installed scikit-le

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Wei Xue
Ha, I just got confused about the sampling in DPGMM :). Wei Xue On Wed, Mar 25, 2015 at 2:57 PM, Andreas Mueller wrote: > On 03/24/2015 09:44 PM, Kyle Kastner wrote: > > Will users be allowed to set/tweak the burn-in and lag for the sampler in the DPGMM? > This is variational!

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Wei Xue
Thanks Andreas, Kyle, Vlad and Olivier for the detailed review. 1. For the part *Implementing VBGMM*, do you mean it would be better if I add specific functions to be implemented? @Andreas. 2. For the documentation, I will rework it and reschedule the API specification and math part to the ve

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-25 Thread Andreas Mueller
On 03/24/2015 09:44 PM, Kyle Kastner wrote: > Will users be allowed to set/tweak the burn-in and lag for the sampler in the DPGMM? This is variational!

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-25 Thread Andreas Mueller
I think you could bench on other problems, but maybe focus on the ones in scikit-learn. Deep learning people might be happy with using external tools for optimizing. I'd also recommend benchmarking just the global optimization part on global optimization datasets as they were used in Jasper's wor
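As a concrete example of such a global-optimization benchmark, the Branin-Hoo test function (used, e.g., in Snoek et al.'s spearmint paper) is cheap to evaluate directly:

    import numpy as np

    def branin(x1, x2):
        """Branin-Hoo test function; its global minimum value is about 0.398."""
        b = 5.1 / (4 * np.pi ** 2)
        c = 5.0 / np.pi
        t = 1.0 / (8 * np.pi)
        return (x2 - b * x1 ** 2 + c * x1 - 6) ** 2 + 10 * (1 - t) * np.cos(x1) + 10

    # Usual domain: x1 in [-5, 10], x2 in [0, 15]; one minimum is at (pi, 2.275).
    print(branin(np.pi, 2.275))  # ~0.398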

Re: [Scikit-learn-general] Kernel PCA .fit() Failing Silently

2015-03-25 Thread Andreas Mueller
Hi Steve. Can you monitor the RAM usage before it fails? Because of the complexity of the algorithm, and as we don't truncate the rbf kernel, this will take 16GB of RAM. If the process starts swapping, your OS might just kill it. There is nothing much we can do about that. A solution to runnin
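For scale: with the 46196-sample dataset mentioned elsewhere in this thread, the dense float64 kernel matrix alone is

    46196^2 entries * 8 bytes ≈ 1.7e10 bytes ≈ 17 GB (about 16 GiB),

not counting the extra working memory needed for centering and the eigendecomposition.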

Re: [Scikit-learn-general] [GSoC 2015] Cross-validation and Meta-Estimators for semi-supervised learning

2015-03-25 Thread Boyuan Deng
Hi everyone: I have updated my proposal according to your suggestions. You can find the updates on the wiki page. Text in the melange system has also been updated. https://github.com/scikit-learn/scikit-learn/wiki/GSoC-2015-Proposal:--Cross-validation-and-Meta-estimators-for-Semi-supervised-Lear

Re: [Scikit-learn-general] Kernel PCA .fit() Failing Silently

2015-03-25 Thread Stephen O'Neill
Hey Andy, Sorry, yes, by failing I mean it never finishes, and the python process dies without raising any exceptions. The shape of the data is (46196, 114). Also numpy.all(numpy.isfinite(my_data)) returns True before I call transformer.fit(). I'm running on Python 2.7.8, numpy 1.9.1, sklearn 0.15.2

Re: [Scikit-learn-general] Scikit-learn sprint in Paris, April 2nd

2015-03-25 Thread Nelle Varoquaux
Hello everyone, This is a friendly reminder that you absolutely need to register for the sprint if you plan on coming [0]_. Access to the building will be restricted to those on the list. If you cannot edit the page, please send me or Vincent Michel an email and we will add you to the list of atten

[Scikit-learn-general] Proposal for GSoC: Dimensionality reduction and features selection

2015-03-25 Thread Luca Puggini
Dear All, following some of the advice I have modified my proposal https://docs.google.com/document/d/1gCHUKsfvii1sUQW-4E4dpbpWkmTPAg6WpLUcWbu4vk0/edit?usp=sharing I am now subscribed to the full ML and so I will try to keep all the conversation in the same thread. Let me know what you think

[Scikit-learn-general] MultinomialNB vs. svm.SVC(kernel='linear')

2015-03-25 Thread ali hürriyetoglu
Dear List members, I saw a Note on [1] about MultinomialNB. The note is: "For the rationale behind the names coef_ and intercept_, i.e. naive Bayes as a linear classifier, see J. Rennie et al. (2003), Tackling the poor assumptions of naive Bayes text classifiers, ICML." Does it mean the implement

[Scikit-learn-general] Fwd: ANN: SciPy (Scientific Python) 2015 Call for Proposals & Registration Open - tutorial & talk submissions due April 1st

2015-03-25 Thread Nelle Varoquaux
Hello everyone, (I apologize for the cross-posting). This is a quick reminder that the call for submissions for SciPy 2015 is open but due April 1st! There are only 7 days left to submit a proposal. Thanks, Nelle -- Forwarded message -- From: Courtenay Godshall Date: 19 March 201

Re: [Scikit-learn-general] Subject: Hyperparameters in scikit-learn

2015-03-25 Thread Matthias Feurer
Hi Andy, On 24.03.2015 21:00, Andy wrote: Hi Matthias. I think that is an interesting direction to go into, and I actually thought a bit about if and how we could add something like that to scikit-learn. Is there online documentation for paramsklearn? I just compiled the current state of the

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-25 Thread Vinayak Mehta
Hi everyone! I've added my proposal to the wiki page. Please suggest improvements. Here is a link to the Google doc: https://docs.google.com/document/d/1JCbeakBtPTpfis2grw00I8Y1VVivssAdiHlm1ejS3E8/edit?usp=sharing Further, I want to discuss whether this -> http://www.machinelearning.org/archive/icm

Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

2015-03-25 Thread Raghav R V
Hi Vlad!! Thanks a tonne for the detailed review of my proposal. :) > Your proposal contains implementation details, but little or no discussion of why each change is important and how it impacts users. Yes, I'll add a section discussing the motivation of the various deliverables. (which actually