Re: [Scikit-learn-general] Value error when using KNeighboursClassifier with GridSearch

2015-04-28 Thread Jitesh Khandelwal
@Joel: That's exactly the mistake I made. Actually, I already had the transforms implemented in another package; none of them requires any fitting on the data. While wrapping them in sklearn transformer classes, I called the transform functions in the fit() method rather than the transform() method.
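A minimal sketch of the correct wrapping for such a stateless transform (np.log1p is a hypothetical stand-in for the external package's function): fit() just returns self, and the work happens in transform().

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class StatelessWrapper(BaseEstimator, TransformerMixin):
    # Wraps a transform that needs no fitting on the data.
    def fit(self, X, y=None):
        # Nothing to learn here; just return self.
        return self

    def transform(self, X):
        # The actual work belongs here, not in fit().
        return np.log1p(X)  # stand-in for the external package's transform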

Re: [Scikit-learn-general] bias in svm.LinearSVC classification accuracy in very small data sample? (Andreas Mueller)

2015-04-28 Thread Fabrizio Fasano
Thanks a lot. Based on your suggestion I performed the following two tests (code below): 1) on the true labels, instead of defining train/test sets by StratifiedShuffleSplit, I performed repeated permutations of the train/test sets via cross_validation.train_test_split, and the accuracy came out as Accuracy: 9…
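A sketch of that first test; the repetition count and split size are assumptions (the digest cuts them off), and sklearn.cross_validation has since become sklearn.model_selection:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=60, n_features=10, random_state=0)
scores = []
for seed in range(100):  # repeated random train/test permutations
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=seed)
    scores.append(LinearSVC().fit(X_tr, y_tr).score(X_te, y_te))
print('Accuracy: %.2f' % np.mean(scores))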

Re: [Scikit-learn-general] Use of the 'learn' font in third party packages

2015-04-28 Thread Olivier Grisel
Note that Trevor already tries to automate checks for semantic compatibility by leveraging Andy's estimator checks utility when possible: https://github.com/trevorstephens/gplearn/blob/master/gplearn/tests/test_common.py This can probably be improved (on both sides) but it's a great start! As for …
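The utility in question is the common estimator check suite; a minimal sketch of running it on one's own estimator (LogisticRegression stands in here for a third-party class):

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# Runs scikit-learn's common API/semantics checks against the estimator;
# raises on the first failed check.
check_estimator(LogisticRegression())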

[Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Pagliari, Roberto
From the documentation: "Feature selection is usually used as a pre-processing step before doing the actual learning. The recommended way to do this in scikit-learn is to use a sklearn.pipeline.Pipeline …"
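A minimal sketch of such a pipeline, here with an L1-penalized LinearSVC doing the selection; SelectFromModel is the idiom in current scikit-learn, and the downstream LogisticRegression is an arbitrary choice:

from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

pipe = Pipeline([
    # The L1 penalty zeroes out coefficients; SelectFromModel keeps the rest.
    ('select', SelectFromModel(LinearSVC(penalty='l1', dual=False))),
    ('clf', LogisticRegression()),
])
# pipe.fit(X, y) then behaves like a single estimator.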

[Scikit-learn-general] Topic extraction

2015-04-28 Thread C K Kashyap
Hi everyone, I am new to scikit. I only feel sad for not knowing it earlier; it's awesome. I am trying to do the following: extract topics from a bunch of tweets. I tried NMF (from the sample here: http://scikit-learn.org/stable/auto_examples/applications/topics_extraction_with_nmf.html) but I w…

[Scikit-learn-general] error with RFE and gridsearchCV

2015-04-28 Thread Pagliari, Roberto
I'm trying to use recursive feature elimination with gradient boosting and grid search as shown below:

gbr = GradientBoostingClassifier()
parameters = {'learning_rate': [0.1, 0.01, 0.001],
              'max_depth': [1, 4, 6],
              'min_samples_leaf': [3, 5, 9, 17], …

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Sebastian Raschka
With L1 regularization you can't "control" the exact number of features that will be selected; it depends on the data (which features are irrelevant) and on the regularization strength. What it basically does is zero out coefficients. If you want to experiment with the number of features …
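A small sketch of that effect on synthetic data: smaller C means stronger regularization and fewer surviving coefficients (the C values are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
for C in (0.01, 0.1, 1.0):
    svm = LinearSVC(penalty='l1', dual=False, C=C).fit(X, y)
    # Count coefficients the L1 penalty did not zero out.
    print('C=%g -> %d nonzero coefficients'
          % (C, np.sum(np.abs(svm.coef_) > 1e-6)))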

Re: [Scikit-learn-general] Use of the 'learn' font in third party packages

2015-04-28 Thread Andreas Mueller
On 04/28/2015 09:49 AM, Olivier Grisel wrote: > Note that Trevor already tries to automate checks for semantic compatibility by leveraging Andy's estimator checks utility when possible […]

Re: [Scikit-learn-general] error with RFE and gridsearchCV

2015-04-28 Thread Artem
GridSearchCV is not an estimator but a utility to find one. So you should fit the grid search first, in order to find the classifier that performs well on the CV splits, and then use it. Like this:

gbr = GradientBoostingClassifier()
parameters = {'learning_rate': [0.1, 0.01, 0.001], …
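A sketch of how that presumably continues (the digest cuts the reply off); the grid values are the ones from Roberto's message, and sklearn.grid_search has since moved to sklearn.model_selection:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
gbr = GradientBoostingClassifier()
parameters = {'learning_rate': [0.1, 0.01, 0.001],
              'max_depth': [1, 4, 6],
              'min_samples_leaf': [3, 5, 9, 17]}
search = GridSearchCV(gbr, parameters, cv=3)
search.fit(X, y)                   # run the search first
best_clf = search.best_estimator_  # a fitted classifier, usable downstream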

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Pagliari, Roberto
Hi Sebastian, thanks for the hint. I think another way of doing it could be to use PCA in the pipeline and set the number of components in 'parameters'? Thanks,

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Sebastian Raschka
Yes, PCA would work too, but then you'll get feature extraction instead of feature selection :)

Re: [Scikit-learn-general] error with RFE and gridsearchCV

2015-04-28 Thread Pagliari, Roberto
Thank you! One more question: when it comes to pipelining with grid search, which estimators can I use for feature selection, apart from SVC and PCA? Thank you,

Re: [Scikit-learn-general] error with RFE and gridsearchCV

2015-04-28 Thread Andreas Mueller
GradientBoostingClassifier has feature_importances_, so at least the RFE in master will work. You can make grid-search work inside RFECV, but I wouldn't recommend it. Why don't you grid-search over the RFECV? Regarding your other question, have you looked at the feature selection documentation: …
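One reading of that suggestion, as a sketch: wrap the recursive eliminator in the grid search (not the other way around), so the booster's parameters and the feature count are tuned together. RFECV picks the feature count itself; plain RFE with an explicit grid over n_features_to_select is used here to keep the example small, and all grid values are illustrative:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
rfe = RFE(GradientBoostingClassifier(), step=5)
parameters = {'n_features_to_select': [5, 10, 15],
              'estimator__learning_rate': [0.1, 0.01]}
search = GridSearchCV(rfe, parameters, cv=3).fit(X, y)
print(search.best_params_)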

Re: [Scikit-learn-general] error with RFE and gridsearchCV

2015-04-28 Thread Sebastian Raschka
First, I think it's important to think about whether the combination makes sense. E.g., I think it wouldn't make much sense to combine PCA and kernel SVM, since PCA is a linear transformation technique (scikit-learn implements some non-linear dimensionality reduction techniques, too). Also, if the size of the dat…

Re: [Scikit-learn-general] Topic extraction

2015-04-28 Thread Andreas Mueller
Clusters are one per data point, while topics are not, so the model is slightly different. You can get the topic weights for each sample using NMF().fit_transform(X).
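A sketch of what that returns, on toy documents standing in for the tweets:

from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "dogs and cats", "stock market crash",
        "market prices fell"]
tfidf = TfidfVectorizer().fit_transform(docs)
# One row per document, one column per topic: continuous weights,
# not a single cluster label.
doc_topic = NMF(n_components=2, random_state=0).fit_transform(tfidf)
print(doc_topic.shape)  # (4, 2)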

Re: [Scikit-learn-general] bias in svm.LinearSVC classification accuracy in very small data sample? (Andreas Mueller)

2015-04-28 Thread Andreas Mueller
For 1), the two methods should give the same result, except that currently there is no stratification in train_test_split, so the StratifiedShuffleSplit should be better. For 2), 51.66% for 100 permutations seems more reasonable than 60%.

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Pagliari, Roberto
Hi Sebastian, correct. However, if you set the number of components, you should get feature selection as well. Thank you,

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Eraldo Pomponi
Dear Roberto, just in case you want to better understand what Sebastian suggested, let me point you to two short videos from Hastie and Tibshirani's ML course about shrinkage methods: https://www.youtube.com/watch?v=cSKzqb0EKS0 https://www.youtube.com/watch?v=A5I1G1MfUmA They help …

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Andreas Mueller
No, because each component will use all features (PCA coefficients are dense).

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Ndjido Ardo Bar
Hi folks, when it comes to performing feature selection, I often suggest using ElasticNet, which combines an L1 and an L2 penalty. When using penalty-based feature selection, one must make sure the features are standardized; otherwise the selection can end up being misleading. Cheers,
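A sketch of that advice; the alpha and l1_ratio values are illustrative, and note that ElasticNet is a regressor, so for classification one would reach for, e.g., SGDClassifier(penalty='elasticnet') instead:

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       random_state=0)
pipe = Pipeline([('scale', StandardScaler()),   # standardize first
                 ('enet', ElasticNet(alpha=1.0, l1_ratio=0.5))]).fit(X, y)
# Features with nonzero coefficients survive the selection.
print(np.flatnonzero(pipe.named_steps['enet'].coef_))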

Re: [Scikit-learn-general] SVM for feature selection

2015-04-28 Thread Pagliari, Roberto
Thanks for the info. I did not explain myself clearly. I just meant to say that once PCA is done, you could choose a smaller number of features, starting from the most relevant. To do that, I would still need to implement a custom transformer. Thank you,

Re: [Scikit-learn-general] Topic extraction

2015-04-28 Thread Joel Nothman
This shows the newsgroup name and highest-scoring topic for each doc:

zip(np.take(dataset.target_names, dataset.target),
    np.argmax(nmf.transform(tfidf), axis=1))

I think something based on this should be added to the example.
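In context, a sketch following the setup of the linked NMF example, where dataset is the 20-newsgroups bunch (the vectorizer settings here are simplified):

import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer

dataset = fetch_20newsgroups(shuffle=True, random_state=1)
tfidf = TfidfVectorizer(max_features=1000,
                        stop_words='english').fit_transform(dataset.data)
nmf = NMF(n_components=10, random_state=1).fit(tfidf)
# Pair each document's newsgroup name with its highest-scoring topic.
pairs = list(zip(np.take(dataset.target_names, dataset.target),
                 np.argmax(nmf.transform(tfidf), axis=1)))
print(pairs[:5])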

Re: [Scikit-learn-general] Use of the 'learn' font in third party packages

2015-04-28 Thread Trevor Stephens
Hi All, thanks a lot for your responses. Gaël: Certainly not looking to break any openness or trust, that's why I asked :-) After a bit more thought on the font issue, I think the danger of implying my package is reviewed/endorsed by scikit-learn is too great with the graphic similarities that you …

Re: [Scikit-learn-general] Topic extraction

2015-04-28 Thread C K Kashyap
Thanks Joel and Andreas. Joel, I think "highest ranking topic for each doc" is exactly what I am looking for. Could you elaborate on the code please? What would dataset.target_names and dataset.target be in my case (http://lpaste.net/131649)? Regards, Kashyap

[Scikit-learn-general] K-Fold-Cross-validation in Scikit-Learn

2015-04-28 Thread nmura...@masonlive.gmu.edu
Hello, I am very new to scikit-learn and am trying to run cross-validation on a data frame consisting of text features and a classification class. I am trying to perform text data classification. It is a 2-class classification problem where the distribution between positive and negative instances is …

Re: [Scikit-learn-general] K-Fold-Cross-validation in Scikit-Learn

2015-04-28 Thread Sebastian Raschka
Hi, Nikhil, you could use stratified k-fold cross validation, which preserves the "original" class proportions. An example can be found here: http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html
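A minimal sketch; the linked page documents the old sklearn.cross_validation module, while in current releases the class lives in sklearn.model_selection and splits via a .split() method:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(10, 3)
y = np.array([0] * 8 + [1] * 2)   # imbalanced toy labels
for train_idx, test_idx in StratifiedKFold(n_splits=2).split(X, y):
    # Each fold keeps roughly the original 8:2 class proportion.
    print(np.bincount(y[train_idx]), np.bincount(y[test_idx]))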

Re: [Scikit-learn-general] Topic extraction

2015-04-28 Thread Joel Nothman
Highest ranking topic for each doc is just np.argmax(nmf.transform(tfidf), axis=1). This is because nmf.transform(tfidf) returns a matrix of shape (num samples, num components / topics) …

Re: [Scikit-learn-general] Topic extraction

2015-04-28 Thread C K Kashyap
Thank you so much Joel, I understood. Just one more thing, please: how can I include a document against its highest ranking topic only if it crosses a threshold? Regards, Kashyap

Re: [Scikit-learn-general] Topic extraction

2015-04-28 Thread Joel Nothman
Mask with np.max(..., axis=1) > threshold.
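Spelled out as a sketch, continuing from the 20-newsgroups example above; the 0.1 threshold is illustrative and, since NMF weights are not probabilities, would need tuning per dataset:

import numpy as np

doc_topic = nmf.transform(tfidf)           # as in the earlier sketch
best = np.argmax(doc_topic, axis=1)        # top topic per document
strong = np.max(doc_topic, axis=1) > 0.1   # mask of confident assignments
assigned = best[strong]                    # topics for documents that pass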

Re: [Scikit-learn-general] Topic extraction

2015-04-28 Thread C K Kashyap
Works like a charm. Just noticed, though, that the max value is sometimes more than 1.0; is that okay? Regards, Kashyap

Re: [Scikit-learn-general] Topic extraction

2015-04-28 Thread Joel Nothman
Yes, this is not a probabilistic method.