Re: [Scikit-learn-general] No methods seem to predict well

2012-08-09 Thread Zach Bastick
Hi David, I did try Ridge Regression, as per my original message, but didn't get any results. Maybe I'm implementing it incorrectly. Generally, I think the data set should work fine. I've correlated each feature with the dependent variable and get high correlations for them individually. I'm not s
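A minimal sketch of one common way to use Ridge regression on a small data set like this: let `RidgeCV` choose the regularization strength from a grid instead of fixing it by hand. The data below is synthetic (35 samples, 20 features; an assumption, not the poster's data), the alpha grid is an arbitrary choice, and the code targets a current scikit-learn release rather than the 2012 API.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic stand-in (an assumption, not the poster's data): 35 samples,
# 20 features, only the first three of which actually drive the score.
rng = np.random.RandomState(0)
X = rng.normal(size=(35, 20))
true_coef = np.zeros(20)
true_coef[:3] = [3.0, -2.0, 1.5]
y = X @ true_coef + rng.normal(scale=1.0, size=35)

# Candidate regularization strengths on a log scale (arbitrary grid).
alphas = np.logspace(-3, 3, 13)

# With the default cv=None, RidgeCV picks alpha by efficient
# leave-one-out cross-validation on the training data.
model = RidgeCV(alphas=alphas).fit(X, y)
print("chosen alpha:", model.alpha_)
print("indices of largest coefficients:", np.argsort(np.abs(model.coef_))[::-1][:3])
```

If the chosen alpha sits at either end of the grid, widening the grid is usually the next step.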

Re: [Scikit-learn-general] No methods seem to predict well

2012-08-09 Thread David Warde-Farley
On Thu, Aug 9, 2012 at 2:08 PM, Zach Bastick wrote: > But as you can see, the predictions are absolutely terrible, no matter > what I do. > The training set predictions are quite accurate, though. From my reading, > this could be due to overfitting. However, I don’t see how simple > linear model
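To illustrate the symptom described here (accurate training predictions, poor test predictions), the sketch below fits ordinary least squares with nearly as many features as training samples. It uses synthetic data in which only one feature carries signal; the sizes are assumptions chosen to mirror a 35-case training set, and `train_test_split` comes from `sklearn.model_selection` in current scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): 50 samples, 30 features, only feature 0 matters.
rng = np.random.RandomState(42)
X = rng.normal(size=(50, 30))
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=50)

# 35 training samples, 15 held-out samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# 30 coefficients plus an intercept fitted on 35 points: the model can
# nearly interpolate the training set, including its noise.
model = LinearRegression().fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)
test_r2 = model.score(X_test, y_test)
print(f"train R^2 = {train_r2:.2f}, test R^2 = {test_r2:.2f}")
```

A large gap between the two scores is the usual signature of overfitting; fewer features or a regularized model (e.g. Ridge) typically narrows it.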

Re: [Scikit-learn-general] machine learning on text

2012-08-09 Thread Robert Layton
On 10 August 2012 01:53, mathieu lacage wrote: > hi, > > I have been using sklearn for a while now but I only recently started to > figure out how to make sure I am using it correctly and that the results I > get are meaningful so, the following questions are fairly general questions > about mach

Re: [Scikit-learn-general] No methods seem to predict well

2012-08-09 Thread Zach Bastick
Thanks. I've created a script here with the two datasets (one for training the model and one for testing the model), and a loader file to get the data into python variables. https://gist.github.com/3309139 I can't seem to get anything out of the data; I'd really appreciate help figuring ou

[Scikit-learn-general] GSOC: last straight line, and blogging

2012-08-09 Thread Gael Varoquaux
Hi GSOCers (Vlad and Immanuel), The GSOC is coming to an end, and we are in the final rush. It's really a pity, as the projects seem to be entering a super-productive phase in which pull requests with significant speed-ups are popping up. As we want to benefit as much as possible from the remaining

Re: [Scikit-learn-general] No methods seem to predict well

2012-08-09 Thread Peter Prettenhofer
Hi Zach, if you provide a gist with your evaluation setup (similar to this one [1]), I can look into it. Best, Peter [1] https://gist.github.com/3266657 2012/8/9 Zach Bastick : > I’m having some conceptual trouble with this supervised machine learning > project (regression) that hopefully someo
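A minimal evaluation setup of the kind requested here might cross-validate the model and compare it against a trivial baseline that always predicts the training mean; if the model cannot beat that baseline, the features are not helping. This is a sketch on synthetic data, written for a current scikit-learn release; the fold count and `alpha` value are arbitrary assumptions.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (assumption): 35 cases, 10 features, two of them informative.
rng = np.random.RandomState(0)
X = rng.normal(size=(35, 10))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=35)

# Shuffled 5-fold CV: every case is used for testing exactly once.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
models = {
    "baseline": DummyRegressor(strategy="mean"),  # always predicts the mean
    "ridge": Ridge(alpha=1.0),
}

results = {}
for name, model in models.items():
    # scikit-learn maximizes scores, so MAE is returned negated.
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_absolute_error")
    results[name] = -scores.mean()
    print(f"{name}: mean absolute error {results[name]:.2f}")
```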

[Scikit-learn-general] No methods seem to predict well

2012-08-09 Thread Zach Bastick
I’m having some conceptual trouble with this supervised machine learning project (regression) that hopefully someone can help me with. I am trying to do sentiment analysis on texts (scoring them from -10 to +10) based on a human-scored training set. Training set: Cases = 35 Score Mean = 0.77 Sc
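One possible way to set up sentiment scoring as regression is to turn each text into TF-IDF features and fit a regularized linear model, clipping predictions to the -10 to +10 range. The corpus and scores below are made up for illustration, and the choice of `TfidfVectorizer` plus `Ridge` is an assumption, not the method used in this thread.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Hypothetical human-scored training texts on the -10..+10 scale.
texts = [
    "great product, love it",
    "terrible, broke after a day",
    "okay, nothing special",
    "absolutely wonderful",
    "awful customer service",
    "not bad at all",
]
scores = np.array([8.0, -9.0, 0.0, 10.0, -7.0, 3.0])

# TF-IDF word features followed by L2-regularized linear regression.
model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
model.fit(texts, scores)

# Ridge predictions are unbounded, so clip them to the scoring range.
pred = np.clip(model.predict(["love it, wonderful"]), -10, 10)
print(pred)
```

With only a few dozen scored texts, most vocabulary words appear once, which is one reason such models struggle to generalize.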

[Scikit-learn-general] machine learning on text

2012-08-09 Thread mathieu lacage
Hi, I have been using sklearn for a while now, but I only recently started to figure out how to make sure I am using it correctly and that the results I get are meaningful. So the following questions are fairly general questions about machine learning applied to text content. Hopefully, someone who
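A sketch of one way to check that results on text are meaningful: keep the vectorizer inside a pipeline so it is refit on each training fold (no vocabulary leaks from the test fold), and compare cross-validated accuracy with a dummy baseline. The toy corpus and labels are hypothetical, and the classifier choice (`LinearSVC`) is an assumption.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical two-class corpus (6 documents per class).
texts = [
    "the match ended in a late goal",
    "the striker scored twice",
    "the team won the league",
    "a penalty decided the game",
    "fans cheered the winning goal",
    "the coach praised the defence",
    "the new phone has a faster chip",
    "the laptop battery lasts all day",
    "the update fixes a security bug",
    "the camera sensor was redesigned",
    "the software runs on every device",
    "the chip maker announced a new processor",
]
labels = ["sport"] * 6 + ["tech"] * 6

# Stratified folds keep the class balance in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# The vectorizer is part of each pipeline, so it only sees training folds.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
baseline = make_pipeline(TfidfVectorizer(), DummyClassifier(strategy="most_frequent"))

clf_scores = cross_val_score(clf, texts, labels, cv=cv)
base_scores = cross_val_score(baseline, texts, labels, cv=cv)
print("classifier:", clf_scores.mean(), "baseline:", base_scores.mean())
```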

Re: [Scikit-learn-general] Sprints at EuroScipy

2012-08-09 Thread Andreas Müller
As Alex is not wearing the cheerleader outfit, I think I won't be able to make it ;) Andy - Original Message - From: "Alexandre Gramfort" To: [email protected] Sent: Thursday, 9 August 2012 14:58:37 Subject: Re: [Scikit-learn-general] Sprints at EuroScip

Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-09 Thread Paolo Losi
On Thu, Aug 9, 2012 at 3:28 PM, Vlad Niculae wrote: > Andy, Mathieu: > The docs are lacking guidelines and examples on how to tune SVR > parameters. IIUC, C, gamma, etc. should be used just as in SVC. The tricky > part is epsilon: how should it be set? What are sensible defaults and a > sensible gr

Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-09 Thread Paolo Losi
On Thu, Aug 9, 2012 at 1:30 PM, Andreas Müller wrote: > Sorry for being unspecific. > Using the kernel should be more efficient with higher degree polynomials > and when having > many features. The dimensionality of the explicit features grows very fast > with the degree while the cost > of the ke

Re: [Scikit-learn-general] Sprints at EuroScipy

2012-08-09 Thread Alexandre Gramfort
I won't be able to attend the sprints either... Otherwise I would have for sure volunteered to wear this cheerleader outfit :) Alex On Wed, Aug 8, 2012 at 6:43 PM, Gael Varoquaux wrote: > Hey list, > > The euroscipy organizers are asking me to organize a bit the sprint. I > have been trying to i

Re: [Scikit-learn-general] Unexpected class prediction in BernoulliNB

2012-08-09 Thread Jaques Grobler
Dear JP, First, sorry for the delayed reply. You appear to be on to something. I played around with your code and the classifier, and after discussing it with @Gael, it seems clear that there's a pig in the truffle patch here. It seems that in the case of your sample code, the problem occurs when

Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-09 Thread Vlad Niculae
Andy, Mathieu: The docs are lacking guidelines and examples on how to tune SVR parameters. IIUC, C, gamma, etc. should be used just as in SVC. The tricky part is epsilon: how should it be set? What are sensible defaults and a sensible grid search range? Thanks, Vlad On Aug 9, 2012, at 13:30 , An
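Since `epsilon` is the half-width of the tube within which SVR errors are not penalized, it is measured in the units of the target. A hedged starting point, sketched below, is to express epsilon candidates as fractions of the target's standard deviation and grid-search them together with `C` and `gamma` on standardized features. The grid values and synthetic data are assumptions, not recommendations from the scikit-learn documentation.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic nonlinear regression problem (assumption).
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(80, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=80)

# Epsilon lives in target units, so scale the candidates by the target spread.
y_std = y.std()
param_grid = {
    "svr__C": [0.1, 1.0, 10.0, 100.0],
    "svr__gamma": [0.01, 0.1, 1.0],
    "svr__epsilon": [f * y_std for f in (0.01, 0.1, 0.5)],
}

# Scaling inside the pipeline keeps each CV fold free of test-set statistics.
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```

If the best value lands on the edge of a grid, extending that axis of the grid is a reasonable next step.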

Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-09 Thread Andreas Müller
Hey Paolo. Sorry for being unspecific. Using the kernel should be more efficient with higher degree polynomials and when having many features. The dimensionality of the explicit features grows very fast with the degree while the cost of the kernel computation stays the same. Also SVMs work quite
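The dimensionality argument can be checked directly. For the homogeneous degree-2 kernel, the explicit feature map is every pairwise product x_i x_j (n² features), while the kernel needs only one length-n dot product. The sketch verifies that both give the same number and prints how the count of monomials of degree at most d in n variables, C(n + d, d), grows with d. The vector length of 50 is an arbitrary assumption.

```python
import math
import numpy as np

def explicit_degree2(x):
    # Explicit feature map for the homogeneous kernel (x . z)**2:
    # every pairwise product x_i * x_j, i.e. n**2 features.
    return np.outer(x, x).ravel()

def poly_kernel(x, z, degree=2):
    # The same inner product computed through the kernel:
    # a single length-n dot product, raised to the degree.
    return np.dot(x, z) ** degree

rng = np.random.RandomState(0)
n = 50
x, z = rng.normal(size=n), rng.normal(size=n)

explicit = np.dot(explicit_degree2(x), explicit_degree2(z))  # 2500-dim dot product
implicit = poly_kernel(x, z)                                  # 50-dim dot product
print(explicit, implicit)

# Width of a full explicit polynomial expansion (with bias) of degree <= d
# in n variables: C(n + d, d) distinct monomials.
for d in (2, 3, 4):
    print(d, math.comb(n + d, d))
```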

Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-09 Thread Paolo Losi
Hi Andy, On Thu, Aug 9, 2012 at 11:53 AM, Andreas Müller wrote: > Also you might need to normalize the data and set the value of C. > Still this should work better than doing the explicit expansion. What exactly do you mean by "work better"? Paolo

Re: [Scikit-learn-general] Preparing text for GBClassifier

2012-08-09 Thread Andreas Müller
> Please review: > https://github.com/scikit-learn/scikit-learn/pull/1003 > I think that I made sure that pretty much every estimator was > well-behaved. A lot of small changes. This PR can benefit from many > eyes. > Woah that was fast. Thanks! > That common testing framework is a pleasure, And

Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-09 Thread Andreas Müller
Also, you might need to normalize the data and set the value of C. Still, this should work better than doing the explicit expansion. - Original Message - From: "Mathieu Blondel" To: [email protected] Sent: Thursday, 9 August 2012 09:53:18 Subject: Re: [Sciki
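A minimal sketch of the normalize-then-SVR advice, assuming features on very different scales: putting `StandardScaler` in a pipeline means test data is transformed with the statistics learned on the training data. The values of `C`, `epsilon`, and `coef0` are arbitrary assumptions (a nonzero `coef0` lets the polynomial kernel include lower-degree terms), and the data is synthetic.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Two features on very different scales (assumption, for illustration).
rng = np.random.RandomState(0)
X = np.column_stack([rng.uniform(0, 1, 100), rng.uniform(0, 1000, 100)])
y = X[:, 0] ** 2 + (X[:, 1] / 1000.0) ** 2 + rng.normal(scale=0.01, size=100)

# Standardize, then fit a degree-2 polynomial-kernel SVR with an explicit C.
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="poly", degree=2, coef0=1.0, C=10.0, epsilon=0.01),
)
model.fit(X[:80], y[:80])

# The scaler reuses the training-set mean and variance on the held-out rows.
r2 = model.score(X[80:], y[80:])
print(f"held-out R^2: {r2:.3f}")
```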

Re: [Scikit-learn-general] LinearSVC best match

2012-08-09 Thread Andreas Müller
Alternatively, you could look at the output of "decision_function" in LinearSVC. These values do not represent probabilities, though. Andy - Original Message - From: "Gael Varoquaux" To: [email protected] Sent: Thursday, 9 August 2012 05:50:14 Subject: Re: [Scik
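A sketch of using `decision_function` to rank classes for a multi-class `LinearSVC`: each column holds one class's score, the argmax reproduces `predict`, and sorting gives a ranked list of best matches. The data comes from `make_classification` and is an assumption for illustration; the scores remain uncalibrated margins, not probabilities.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic 4-class problem (assumption).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=4, random_state=0
)
clf = LinearSVC(max_iter=10000).fit(X, y)

# One row per sample, one column per class (one-vs-rest margins).
scores = clf.decision_function(X[:3])

# The best match is the class with the largest margin.
best = clf.classes_[np.argmax(scores, axis=1)]

# Top-2 matches: sort each row by descending score.
top2 = clf.classes_[np.argsort(scores, axis=1)[:, ::-1][:, :2]]
print(best, top2)
```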

Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-09 Thread Mathieu Blondel
On Thu, Aug 9, 2012 at 4:02 PM, Zach Bastick wrote: > I'm going to manually stop it now by closing the python window. Am I > doing something wrong? It probably means that epsilon is not well tuned. You can try SVR(kernel="linear") to see how it fares compared to least squares. Mathieu
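The comparison suggested here, a linear-kernel SVR against least squares, could be run with cross-validation as sketched below. The synthetic data, the fixed `C` and `epsilon`, and the scaling step are assumptions; with poorly chosen values the SVR may score worse than OLS, which is part of what such a comparison reveals.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic linear data (assumption).
rng = np.random.RandomState(1)
X = rng.normal(size=(60, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.3, size=60)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

# For regressors, cross_val_score defaults to R^2 (the estimator's score).
ols_scores = cross_val_score(LinearRegression(), X, y, cv=cv)
svr = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0, epsilon=0.1))
svr_scores = cross_val_score(svr, X, y, cv=cv)

print(f"OLS R^2: {ols_scores.mean():.3f}  linear SVR R^2: {svr_scores.mean():.3f}")
```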

Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-09 Thread Zach Bastick
I ran: >> model = SVR(kernel="poly", degree=2) but the % Error of the prediction is worse than with simple Ordinary Least Squares, using: >> linear_model.LinearRegression() It's also much slower. I changed the degree to 4 to see if the prediction results got any better, but it's taking
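For reference, the explicit polynomial expansion discussed in this thread can be combined with ordinary least squares in a pipeline. With 3 input features and degree 2, this yields 10 columns (bias, linear, and quadratic terms). The synthetic target below is an assumption chosen so that a quadratic model fits it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with an interaction and a squared term (assumption).
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = 1.0 + X[:, 0] * X[:, 1] - 2.0 * X[:, 2] ** 2 + rng.normal(scale=0.1, size=100)

# Expand to all monomials of degree <= 2, then fit plain least squares.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

n_out = model.named_steps["polynomialfeatures"].n_output_features_
print("expanded features:", n_out)
print("training R^2:", model.score(X, y))
```

Unlike a polynomial kernel, the expanded matrix grows combinatorially with the degree, which is the trade-off raised earlier in the thread.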