Re: [Scikit-learn-general] LinearSVC best match

2012-08-08 Thread Gael Varoquaux
On Thu, Aug 09, 2012 at 01:02:21AM +, Abhi wrote: > I am using sklearn.svm.LinearSVC for document classification and I get a > good accuracy[98%] on predict. Is there a way to find the confidence of match > (like predict_proba() in SGDClassifier)? Not simply using LinearSVC: liblinear d
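Gael's point above is that liblinear does not expose probabilities; what LinearSVC does expose is decision_function, whose signed margin can serve as a rough confidence score. A minimal sketch (the toy data here is made up for illustration):

```python
# Sketch: LinearSVC has no predict_proba, but decision_function returns a
# signed distance to the separating hyperplane; a larger magnitude means
# the classifier is more confident about that sample.
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([0, 1, 1, 0])

clf = LinearSVC().fit(X, y)
scores = clf.decision_function(X)  # one signed margin per sample
print(scores)
```

Documents whose margin falls close to zero are the low-confidence ones, so thresholding on abs(scores) is one way to flag the remaining 2% for manual handling.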

Re: [Scikit-learn-general] GradientBoostingRegression loss function and Structured svm

2012-08-08 Thread Mathieu Blondel
On Wed, Aug 8, 2012 at 8:50 PM, Andreas Müller wrote: > > 2) There are at the moment no plans to add structured SVMs to the library. > The reason is that structured > models usually are very problem specific. It is possible to build generic > frameworks like Joachims' SVMstruct, > which works by th


Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-08 Thread Mathieu Blondel
On Thu, Aug 9, 2012 at 9:11 AM, Zach Bastick wrote: > > So, how do you do multivariate regression with higher degree polynomials? > In the multivariate case, the principle is the same as np.vander. You just need to concatenate the higher degree features. Only this time since your data is multi-v
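Mathieu's suggestion of concatenating higher-degree features can be sketched as follows (the two-feature data is invented for the example, and cross terms like x1*x2 are omitted for brevity):

```python
# Sketch: for multivariate polynomial regression, build the design matrix
# by stacking each feature's higher powers next to the original columns,
# then fit an ordinary linear model on the expanded matrix.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 5.0], [4.0, 4.0]])  # 2 features
y = np.array([1.0, 4.0, 9.0, 16.0])

X_poly = np.hstack([X, X ** 2])  # append degree-2 columns per feature
model = LinearRegression().fit(X_poly, y)
print(model.predict(X_poly))
```

For a full polynomial basis including interaction terms, the same idea extends by also concatenating products of feature columns.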

[Scikit-learn-general] LinearSVC best match

2012-08-08 Thread Abhi
I am using sklearn.svm.LinearSVC for document classification and I get a good accuracy [98%] on predict. Is there a way to find the confidence of match (like predict_proba() in SGDClassifier)? This would help me in determining the way to handle the remaining 2%, i.e. the documents that do n

Re: [Scikit-learn-general] multivariate regression with higher degree polynomials

2012-08-08 Thread Zach Bastick
That works when there is only 1 feature / independent variable / x-value for each case, but not when there are many (i.e. for multivariate regression). Since there are many independent variables, my variables look like this: x = [[1,2,3,4,5], [2,2,4,4,5], [2,2,4,4,1]], y = [1,2,3,4,5]. For

Re: [Scikit-learn-general] Preparing text for GBClassifier

2012-08-08 Thread Gael Varoquaux
On Wed, Aug 08, 2012 at 11:01:47PM +0200, Gael Varoquaux wrote: > On Wed, Aug 08, 2012 at 11:00:53PM +0200, Peter Prettenhofer wrote: > > I apologize for the poor error message (I need to fix that). > I am on it. Please review: https://github.com/scikit-learn/scikit-learn/pull/1003 I think that

Re: [Scikit-learn-general] Preparing text for GBClassifier

2012-08-08 Thread Brian Wingenroth
Thanks, guys. This makes more sense to me now. Calling toarray() on the sparse array does in fact let the code run, but I understand now that that may not be my best approach. So, again, thanks. Brian On 8/8/12 5:00 PM, Peter Prettenhofer wrote: > 2012/8/8 Philipp Singer : >> Hey! >> >> The

Re: [Scikit-learn-general] Preparing text for GBClassifier

2012-08-08 Thread Gael Varoquaux
On Wed, Aug 08, 2012 at 11:00:53PM +0200, Peter Prettenhofer wrote: > I apologize for the poor error message (I need to fix that). I am on it. G

Re: [Scikit-learn-general] Preparing text for GBClassifier

2012-08-08 Thread Peter Prettenhofer
2012/8/8 Philipp Singer : > Hey! > > The problem seems to be the following: > > With the TfidfVectorizer you get back a sparse array representation. > > I think the GradientBoostingClassifier can't directly work with sparse > matrices, whereas the first three can. > > So you can try it again with:

Re: [Scikit-learn-general] Preparing text for GBClassifier

2012-08-08 Thread Philipp Singer
Hey! The problem seems to be the following: With the TfidfVectorizer you get back a sparse array representation. I think the GradientBoostingClassifier can't directly work with sparse matrices, whereas the first three can. So you can try it again with: training_set.toarray() HTH Philipp Am
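Philipp's toarray() fix can be sketched end to end (the tiny corpus and labels below are made up; note that densifying can blow up memory on large text collections, which is why the replies caution against it):

```python
# Sketch: TfidfVectorizer returns a scipy sparse matrix, which (at least
# in older releases) GradientBoostingClassifier could not consume, so the
# sparse matrix is densified with toarray() before fitting.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier

docs = ["good movie", "bad movie", "great film", "awful film"]
labels = [1, 0, 1, 0]

vec = TfidfVectorizer()
X_sparse = vec.fit_transform(docs)  # sparse CSR matrix
X_dense = X_sparse.toarray()        # dense ndarray the classifier accepts

clf = GradientBoostingClassifier(n_estimators=10).fit(X_dense, labels)
print(clf.predict(X_dense))
```

For high-dimensional tf-idf matrices, a classifier that handles sparse input natively (LinearSVC, MultinomialNB, SGDClassifier) is usually the more practical choice.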

[Scikit-learn-general] Preparing text for GBClassifier

2012-08-08 Thread Brian Wingenroth
Hi, I'm completely new to sklearn, so it's entirely likely that I'm just misunderstanding something very fundamental here. I thought that the inputs for the GradientBoostingClassifier would be the same as for other classifiers (LinearSVC, MultinomialNB, etc.), but when trying to run the code

Re: [Scikit-learn-general] Release 0.12 schedule

2012-08-08 Thread Andreas Mueller
Hi Michael. Actually that one is on my priority list. But my priority list is long ;) Any help is always welcome. Andy On 08/08/2012 07:41 PM, Michael Waskom wrote: Hi, Do you think multinomial logit via SGD (GH849 https://github.com/scikit-learn/scikit-learn/pull/849) will make it into 0.12

Re: [Scikit-learn-general] Release 0.12 schedule

2012-08-08 Thread Michael Waskom
Hi, Do you think multinomial logit via SGD (GH849 https://github.com/scikit-learn/scikit-learn/pull/849) will make it into 0.12? This pull request seems to have stalled, but would be very nice to have! Best, Michael On Wed, Aug 1, 2012 at 6:52 AM, Gael Varoquaux < [email protected]>

[Scikit-learn-general] Sprints at EuroScipy

2012-08-08 Thread Gael Varoquaux
Hey list, The euroscipy organizers are asking me to organize the sprint a bit. I have been trying to ignore their request in my effort to get things done, but they are becoming fairly insistent. I am not going to be at this sprint, because I'll be giving a talk at a conference at the same time :(

Re: [Scikit-learn-general] how to pickle CountVectorizer

2012-08-08 Thread Philipp Singer
On 08.08.2012 15:41, David Montgomery wrote: > oh..but I want to run the below. The reason why I want to pickle. I > do pickle the output of vec.fit though. So, I just want to load up a > saved vec pickle and create an array based on the fit so I can score a > svm model. > > vectorizer.transfo

Re: [Scikit-learn-general] GradientBoostingRegression loss function and Structured svm

2012-08-08 Thread Peter Prettenhofer
On 08.08.2012 15:48, "amir rahimi" wrote: > > Thanks for the fast response. > > to JP: It works for me using gcc and g++ on 32-bit Mac and Linux! :) > > J. Friedman in the paper "Greedy Function Approximation: A Gradient Boosting Machine" has mentioned the M-regression algorithm which is a gradie

Re: [Scikit-learn-general] GradientBoostingRegression loss function and Structured svm

2012-08-08 Thread amir rahimi
In fact I wanted to estimate plane parameters for small patches using structured output prediction. But my dataset is very noisy and I didn't have enough time to do that (choosing kernels, parameters, cross-validation, etc.). I decided to estimate the depth at each point and smooth it with a CRF. As

Re: [Scikit-learn-general] GradientBoostingRegression loss function and Structured svm

2012-08-08 Thread Andreas Müller
> > Thanks for the fast response. > > > to JP: It works for me using gcc and g++ on 32-bit Mac and Linux! :) > > > J. Friedman in the paper "Greedy Function Approximation: A Gradient > Boosting Machine" has mentioned the M-regression algorithm which is > a gradient boosting regression method

Re: [Scikit-learn-general] GradientBoostingRegression loss function and Structured svm

2012-08-08 Thread amir rahimi
Thanks for the fast response. to JP: It works for me using gcc and g++ on 32-bit Mac and Linux! :) J. Friedman in the paper "Greedy Function Approximation: A Gradient Boosting Machine" has mentioned the M-regression algorithm which is a gradient boosting regression method with huber loss function

Re: [Scikit-learn-general] how to pickle CountVectorizer

2012-08-08 Thread David Montgomery
Oh, but I want to run the below. That is the reason why I want to pickle. I do pickle the output of vec.fit though. So, I just want to load up a saved vec pickle and create an array based on the fit so I can score a svm model. vectorizer.transform(utterance).toarray() On Wed, Aug 8, 2012 at 9:25 PM, Ph

Re: [Scikit-learn-general] how to pickle CountVectorizer

2012-08-08 Thread David Montgomery
Ah.. yes, makes sense. On Wed, Aug 8, 2012 at 9:25 PM, Philipp Singer wrote: > On 08.08.2012 14:53, David Montgomery wrote: >> >> So... does it make sense to pickle CountVectorizer? I just did not >> want to fit CountVectorizer every time I wanted to score a svm model. >> >> > It ma

Re: [Scikit-learn-general] how to pickle CountVectorizer

2012-08-08 Thread Philipp Singer
On 08.08.2012 14:53, David Montgomery wrote: > > So... does it make sense to pickle CountVectorizer? I just did not > want to fit CountVectorizer every time I wanted to score a svm model. > > It makes sense to pickle the fitted Vectorizer. In this case you are just trying to pickle the plain obj
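Philipp's point, sketched (the toy corpus is invented; pickle.dumps/loads stands in for the file round-trip in David's code):

```python
# Sketch: pickle the *fitted* CountVectorizer so its learned vocabulary_
# travels with it; at scoring time, load it back and call transform only
# (never fit again), so train- and score-time feature columns line up.
import pickle
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(lowercase=True, binary=False)
vectorizer.fit(["spam spam eggs", "eggs and ham"])

blob = pickle.loads(pickle.dumps(vectorizer))  # round-trip the fitted object

X = blob.transform(["spam and eggs"]).toarray()
print(X)
```

One caveat relevant to this thread: a vectorizer built with a custom tokenizer function (like tokenizer=extract_features_sk) pickles a reference to that function, so the function must be importable under the same module path when the pickle is loaded in another app.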

Re: [Scikit-learn-general] 2.2.1.2. Estimators objects

2012-08-08 Thread Jaques Grobler
Ah, scrap that.. didn't see Lars' reply. Take care, J 2012/8/8 Jaques Grobler > Hi there, > > Thanks for the feedback. Yes the estimator is an object that must be > instantiated first, after which you can use it for your purposes. I'll have > a read through the part of the documentation that yo

Re: [Scikit-learn-general] 2.2.1.2. Estimators objects

2012-08-08 Thread Jaques Grobler
Hi there, Thanks for the feedback. Yes, the estimator is an object that must be instantiated first, after which you can use it for your purposes. I'll have a read through the part of the documentation that you refer to and see what's potting. Thanks for your time. Regards, J 2012/8/3 none other
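The instantiate-then-use pattern Jaques describes can be shown in three lines (the two-point training set is a made-up minimal example):

```python
# Sketch of the scikit-learn estimator-object pattern:
# 1. instantiate the estimator, choosing hyperparameters;
# 2. fit it on training data;
# 3. call predict on the *fitted* object.
from sklearn.svm import LinearSVC

est = LinearSVC(C=1.0)                    # 1. instantiate
est.fit([[0, 0], [1, 1]], [0, 1])         # 2. learn from data
print(est.predict([[2, 2]]))              # 3. use the fitted estimator
```

Calling predict before fit raises an error, which is the usual symptom of skipping the instantiation/fitting step the documentation section describes.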

[Scikit-learn-general] how to pickle CountVectorizer

2012-08-08 Thread David Montgomery
Hi, I am using the below to pickle CountVectorizer:

vectorizer = CountVectorizer(tokenizer=extract_features_sk, lowercase=self.lowercase, binary=self.is_binary)
output = open(self.fn_vec, 'wb')
pickle.dump(vectorizer, output)
output.close()

If I load the pickle in the same app, all works. If I l

Re: [Scikit-learn-general] GradientBoostingRegression loss function and Structured svm

2012-08-08 Thread Andreas Müller
Hi Amir. 1) As far as I know, the gradient boosting works only with trees using deviance or least squares regression. I don't think it should be hard to add other losses, though. 2) There are at the moment no plans to add structured SVMs to the library. The reason is that structured models usual

[Scikit-learn-general] GradientBoostingRegression loss function and Structured svm

2012-08-08 Thread amir rahimi
Hi all, I have two questions/requests. Is there any way to define an arbitrary loss function for gradient boosting regression? e.g. using the Huber penalty. My request is about adding structured output prediction for SVM to the library. Is there any plan for adding that?