Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-22 Thread Paolo Losi
Hi Albert

On Fri, Mar 22, 2013 at 6:33 PM, Albert Kottke wrote:
> Consider the following profiles:
>
> #   A   B   C   D
> 1  10   9   3   2
> 2   4   5   4   5
> 3   6   5   6   7
>
> I have removed thickness and just used layer number for simplicity. The
> desired behavior is that profiles A and B are grouped

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread Ricardo Corral C.
Sorry, these are the refs:

[1] Djamel A. Zighed, Stéphane Lallich, and Fabrice Muhlenbach. Separability index in supervised learning. Principles of Data Mining and Knowledge Discovery, 2431:475–487, 2002.
[2] Supowit, K. J. (1983), "The relative neighborhood graph, with an application to minimu

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-22 Thread Albert Kottke
Consider the following profiles:

#   A   B   C   D
1  10   9   3   2
2   4   5   4   5
3   6   5   6   7

I have removed thickness and just used layer number for simplicity. The desired behavior is that profiles A and B are grouped together because they start high, decrease, and then slightly increase. Profiles C
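As a minimal sketch of the grouping described above (my own illustration, not Albert's actual pipeline): treating each profile as one sample and each layer's value as one feature is already enough for KMeans to separate {A, B} from {C, D} on this toy table.

```python
import numpy as np
from sklearn.cluster import KMeans

# Each profile (A..D) is one sample; the three layer values are the
# features, matching the (n_samples, n_features) shape sklearn expects.
X = np.array([
    [10.0, 4.0, 6.0],  # profile A: starts high, decreases, slightly increases
    [9.0, 5.0, 5.0],   # profile B: same shape as A
    [3.0, 4.0, 6.0],   # profile C: starts low, increases
    [2.0, 5.0, 7.0],   # profile D: same shape as C
])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
# Plain Euclidean distance on the raw layer values separates the two
# shapes here: A/B land in one cluster, C/D in the other.
```

For real profiles one would likely normalize or difference the layer values first, so that "shape" rather than absolute velocity drives the distance.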

Re: [Scikit-learn-general] using grid_seach

2013-03-22 Thread Jaques Grobler
Hey Andy, sorry, I've been busy all day. You mean something like this, to make it more clear?

>>> kernel_param = {'kernel': ('linear', 'rbf')}
>>> C_param = {'C': [1, 10]}
>>> parameters = (kernel_param, C_param)  # list of parameter dictionaries

Sorry, I'm a bit scatter-brained with my visa appo
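For reference, a runnable version of the "list of parameter dictionaries" idea, using the current `sklearn.model_selection` import path (at the time of this thread it lived in `sklearn.grid_search`); the dataset and grid values are just illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# A list of parameter dictionaries: each dict is expanded into its own
# grid, so 'gamma' is only searched in combination with the rbf kernel.
parameters = [
    {'kernel': ['linear'], 'C': [1, 10]},
    {'kernel': ['rbf'], 'C': [1, 10], 'gamma': [0.01, 0.1]},
]

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), parameters, cv=3)
search.fit(X, y)
# 2 linear settings + 4 rbf settings = 6 candidates in total
```

This is what makes a list of dicts clearer than one flat grid: incompatible or irrelevant parameter combinations never get enumerated.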

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-22 Thread Lars Buitinck
2013/3/22 Albert Kottke :
> My biggest question is forming the data into the X matrix (n_samples,
> n_features). The approach you describe would cluster based on thickness and
> velocity without consideration of the relationship between adjacent layers.
> Initially, I want to try to cluster based o

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-22 Thread Albert Kottke
The NaNs at the base of the profile imply that the velocity in that layer continues for some unspecified thickness, which I can handle using a couple of different approaches. I am not too concerned about that. My biggest question is forming the data into the X matrix (n_samples, n_features).

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-22 Thread Lars Buitinck
2013/3/22 Albert Kottke :
> Here is the data that I would be working with:
>
> No  Thickness   Depth      Vp       Vs
>     (m)         (m)        (m/s)    (m/s)
> 1,    2.00,      2.00,    480.00,   180.00
> 2,    8.00,     10.00,   2320.00,   700.00
> 3,    8.00,     18.00,   2980.00,  1150.00
> 4,   52.00,     70.00,    298

Re: [Scikit-learn-general] Machine learning on 2D problems.

2013-03-22 Thread Albert Kottke
Here is the data that I would be working with:

No  Thickness   Depth      Vp       Vs
    (m)         (m)        (m/s)    (m/s)
1,    2.00,      2.00,    480.00,   180.00
2,    8.00,     10.00,   2320.00,   700.00
3,    8.00,     18.00,   2980.00,  1150.00
4,   52.00,     70.00,   2980.00,  1720.00
5,       -,         -,   3120.0

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread nipun batra
Yes, I remember the discussion. I was just confirming whether there has been any change of plan. Thanks for clarifying, and for the suggestion; I will ask on Pandas. There is timeseries support in Pandas already, and HMMs might fall somewhat nearby. On Fri, Mar 22, 2013 at 7:12 PM, Andreas Mueller wrote: > Hi Nipu

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread Olivier Grisel
2013/3/22 Andreas Mueller :
> Hi Nipun.
> We discussed this and basically think structured learning is off-topic for
> sklearn at the moment.
> I am building a structured learning library, but it is still changing quite
> a bit.
>
> It is not so clear to me what happens with the HMMs.
> And I guess

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread Olivier Grisel
2013/3/21 Ricardo Corral C. :
> Ok, this is a brief description of what I'm interested in.
>
> Recently, I faced a problem of evaluating the quality of a method to
> obtain features from protein structures.
> I adopted the approach given in [1] to measure separability of my
> classes independently

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread Andreas Mueller
Hi Nipun. We discussed this and basically think structured learning is off-topic for sklearn at the moment. I am building a structured learning library, but it is still changing quite a bit. It is not so clear to me what happens with the HMMs. And I guess we should decide that soon. I think th

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread nipun batra
Might be a bit off-topic: is structured learning still not a priority for sklearn? I would ideally have liked to put my development code for HMMs into sklearn (since what I need goes beyond what is currently implemented in sklearn). I have started porting Murphy's HMM toolbox

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread Alexandre Gramfort
hi all, my feeling is that new SGD schemes (Averaged SGD and recent efforts in online learning) would be a nice addition. There is also an open PR on ranking with SGD using a pairwise hinge loss. Alex On Fri, Mar 22, 2013 at 1:23 PM, Andreas Mueller wrote: > Hi Anne. > Thanks for the offer. > I

Re: [Scikit-learn-general] OOB score in gradient boosting models

2013-03-22 Thread Peter Prettenhofer
2013/3/22 Yanir Seroussi :
> Thanks for the quick response. Good to see that I'm not imagining things :-)
>
> Before posting this question, I had a look at Friedman's paper and ESLII and
> the R gbm documentation, but I couldn't find a clear description of how OOB
> estimates are calculated. I thin

Re: [Scikit-learn-general] OOB score in gradient boosting models

2013-03-22 Thread Yanir Seroussi
Thanks for the quick response. Good to see that I'm not imagining things :-) Before posting this question, I had a look at Friedman's paper and ESLII and the R gbm documentation, but I couldn't find a clear description of how OOB estimates are calculated. I think it makes sense to have a separate

Re: [Scikit-learn-general] OOB score in gradient boosting models

2013-03-22 Thread Peter Prettenhofer
I've opened an issue for this:
https://github.com/scikit-learn/scikit-learn/issues/1802

2013/3/22 Andreas Mueller :
> We should open an issue in the issue tracker.

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread Andreas Mueller
Hi Anne. Thanks for the offer. I'm not sure we want a Newton's method implementation. There is one in liblinear, but that is one-vs-rest. If we start reimplementing parts of liblinear, we might open Pandora's box ;) In principle I could imagine a "MultinomialLogisticRegression" estimator. The spee

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread Anne Dwyer
Andy, I wrote Python code for Newton's method logistic regression and a plot of the hyperplane. Is this something the GSoC project would be interested in or is it too low level? Anne Dwyer On Fri, Mar 22, 2013 at 6:58 AM, Andreas Mueller wrote: > Hi Ricardo. > I think you forgot to mention wha
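Anne's code isn't shown in the thread, so here is a minimal sketch (my own illustration, not her implementation) of binary logistic regression fit by Newton's method, i.e. iteratively reweighted least squares:

```python
import numpy as np

def newton_logreg(X, y, n_iter=25, tol=1e-8):
    """Binary logistic regression via Newton's method (IRLS).

    X: (n_samples, n_features) design matrix (include a column of ones
    for an intercept); y: labels in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        grad = X.T @ (p - y)               # gradient of the neg. log-likelihood
        W = p * (1.0 - p)                  # IRLS weights (Hessian diagonal core)
        H = (X * W[:, None]).T @ X         # Hessian
        H += 1e-10 * np.eye(H.shape[0])    # tiny ridge for numerical safety
        step = np.linalg.solve(H, grad)
        w -= step
        if np.max(np.abs(step)) < tol:     # Newton step is tiny -> converged
            break
    return w
```

Note this is unregularized apart from the numerical ridge: on (near-)separable data the weights diverge, which is one reason liblinear-style solvers always include a penalty term.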

Re: [Scikit-learn-general] OOB score in gradient boosting models

2013-03-22 Thread Andreas Mueller
We should open an issue in the issue tracker.

Re: [Scikit-learn-general] OOB score in gradient boosting models

2013-03-22 Thread Peter Prettenhofer
Hi Yanir, thanks for raising this issue. I implemented this feature without much thought; furthermore, I haven't used OOB estimates in my own work yet. I need to think more deeply about the issue and will come back to you. You propose to update ``y_pred`` only for the in-bag samples, correct? best,

Re: [Scikit-learn-general] GSoC 2013

2013-03-22 Thread Andreas Mueller
Hi Ricardo. I think you forgot to mention what [1] and [2] are. What is the difference between a relative neighborhood graph and a neighborhood graph? To me that sounds a bit too special-purpose for the moment. We need Logistic Regression first (which might also be a good GSoC project)! Just my

Re: [Scikit-learn-general] OOB score in gradient boosting models

2013-03-22 Thread Andreas Mueller
Hi Yanir. I was not aware that GradientBoosting had oob scores. Is that even possible / sensible? It definitely does not do what it promises :-/ Peter, any thoughts? Cheers, Andy On 03/22/2013 11:39 AM, Yanir Seroussi wrote: Hi, I'm new to the mailing list, so I apologise if this has been a

[Scikit-learn-general] OOB score in gradient boosting models

2013-03-22 Thread Yanir Seroussi
Hi, I'm new to the mailing list, so I apologise if this has been asked before. I want to use the oob_score_ in GradientBoostingRegressor to determine the optimal number of iterations without relying on an external validation set, so I set the subsample parameter to 0.5 and trained the model. Howe
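For context, in current scikit-learn the per-stage OOB information on `GradientBoostingRegressor` is exposed as `oob_improvement_` (the `oob_score_` array discussed in this thread was later reworked), and it is only populated when `subsample < 1`. A sketch of using it to pick the number of iterations, on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)

# subsample < 1 enables stochastic gradient boosting, so each stage
# has out-of-bag samples on which its improvement can be measured.
est = GradientBoostingRegressor(n_estimators=200, subsample=0.5,
                                random_state=0)
est.fit(X, y)

# oob_improvement_[i] is the OOB loss improvement at stage i; the
# cumulative sum peaks where adding more trees stops helping OOB-wise.
cum_oob = np.cumsum(est.oob_improvement_)
best_n = int(np.argmax(cum_oob)) + 1  # heuristic choice of n_estimators
```

This mirrors the heuristic in R's gbm: stop where the cumulative OOB improvement is maximal, keeping in mind that OOB estimates of the optimal iteration tend to be pessimistic compared to cross-validation.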

Re: [Scikit-learn-general] Getting started

2013-03-22 Thread Andreas Mueller
+1 on doing these things. I'm not sure if the work that needs to be done is ideal for GSoC, though. I can't really judge that. Merging the RBM is very high on my priority list, and so is getting the MLP done. I can't mentor, though. On 03/22/2013 10:46 AM, Vlad Niculae wrote: > Hi Abinash, > > In

Re: [Scikit-learn-general] Getting started

2013-03-22 Thread Vlad Niculae
Hi Abinash, Indeed, like Andy said, you should first get familiar with the codebase by starting to contribute on easy issues, even if they are not related to what you want to work on. Second, regarding the multi-layer perceptron: there are two codebases under development implementing it. One of t

Re: [Scikit-learn-general] Getting started

2013-03-22 Thread Andreas Mueller
Hi Abinash. As you went through the idea page, hopefully you also read Gael's letter. As stated there, you are expected to contribute to the project prior to applying for a GSoC. Please go through the issue tracker for possible starting points. If you are undecided, we can try to pick something