Re: [Scikit-learn-general] optimization with constraints

2012-02-02 Thread Gael Varoquaux
On Thu, Feb 02, 2012 at 10:17:02PM -0500, Jieyun Fu wrote: >Is there a way to enforce the constraints on sklearn optimizers or >classifiers? For example, if I put some data into a logistic regression, I >want to make sure some coefficients are positive / negative.  No. The optimizers a

[Scikit-learn-general] Fw: Improving the accuracy of classifier

2012-02-02 Thread adnan rajper
Hi, Actually I followed this tutorial: http://scikit-learn.github.com/scikit-learn-tutorial/working_with_text_data.html. It uses TF-IDF normalization, so I applied the same after removing URLs, user names and stop words. Adnan From: Gael Varoquaux To: a

[Scikit-learn-general] Fw: Improving the accuracy of classifier

2012-02-02 Thread adnan rajper
Yes Peter, I am indeed doing sentiment classification. Your suggestions are highly appreciated. Sorry, but I am not able to understand your question: "how many features do you have?". Would you care to elaborate? Thanks a million again. Adnan From: Pete

[Scikit-learn-general] optimization with constraints

2012-02-02 Thread Jieyun Fu
Is there a way to enforce the constraints on sklearn optimizers or classifiers? For example, if I put some data into a logistic regression, I want to make sure some coefficients are positive / negative. Thanks!
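One generic way to get sign constraints, sketched here outside of scikit-learn, is projected gradient descent: take an ordinary gradient step on the logistic loss, then clip each constrained coefficient back onto its allowed half-line. The helper name `fit_sign_constrained_logreg`, its parameters, and the toy data are all hypothetical and written only for illustration; this is not an sklearn API and not necessarily what the thread goes on to recommend.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_sign_constrained_logreg(X, y, signs, lr=0.1, n_iter=500):
    """Projected gradient descent on the mean logistic loss.

    signs[j] is +1 (coefficient must be >= 0), -1 (must be <= 0),
    or 0 (unconstrained). Hypothetical helper, not part of sklearn.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        for j in range(d):
            w[j] -= lr * grad_w[j]
            # Projection step: clip back onto the feasible half-line.
            if signs[j] > 0:
                w[j] = max(w[j], 0.0)
            elif signs[j] < 0:
                w[j] = min(w[j], 0.0)
        b -= lr * grad_b  # the intercept is left unconstrained
    return w, b


# Toy data (made up): class 1 tends to have a large first feature.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 0.5], [0.5, 3.0]]
y = [0, 1, 1, 0]
w, b = fit_sign_constrained_logreg(X, y, signs=[+1, -1])
```

Because the projection runs after every update, the returned coefficients satisfy the requested signs even when the data would prefer otherwise; in that case the constrained coefficient simply ends up at zero.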

Re: [Scikit-learn-general] Bug in master

2012-02-02 Thread Gael Varoquaux
On Tue, Jan 31, 2012 at 05:09:54PM +0100, Gael Varoquaux wrote: > On Tue, Jan 31, 2012 at 05:05:53PM +0100, Lars Buitinck wrote: > > I don't have a NumPy 2 installation and I haven't followed its > > development closely. Could you open an issue for this? > https://github.com/scikit-learn/scikit-le

Re: [Scikit-learn-general] ImportError: cannot import sparse

2012-02-02 Thread Lars Buitinck
2012/2/2 Jacob VanderPlas : > File >  "/usr/local/lib/python2.6/dist-packages/scikit_learn-0.11_git-py2.6-linux-i686.egg/sklearn/svm/__init__.py", > line 15, in >    from . import sparse, libsvm, liblinear > ImportError: cannot import name sparse It looks suspiciously similar to this issue: https:

Re: [Scikit-learn-general] ImportError: cannot import sparse

2012-02-02 Thread Andreas
On 02/02/2012 04:44 PM, Jacob VanderPlas wrote: > When I build the documentation with the current master, I get a string > of errors related to svm: > > File > > "/usr/local/lib/python2.6/dist-packages/scikit_learn-0.11_git-py2.6-linux-i686.egg/sklearn/svm/__init__.py", > line 15, in > from

[Scikit-learn-general] ImportError: cannot import sparse

2012-02-02 Thread Jacob VanderPlas
When I build the documentation with the current master, I get a string of errors related to svm: File "/usr/local/lib/python2.6/dist-packages/scikit_learn-0.11_git-py2.6-linux-i686.egg/sklearn/svm/__init__.py", line 15, in from . import sparse, libsvm, liblinear ImportError: cannot impor

Re: [Scikit-learn-general] Improving the accuracy of classifier

2012-02-02 Thread Peter Prettenhofer
Ok, so I assume you are doing sentiment classification? For millions of examples I definitely recommend using either NaiveBayes or SGDClassifier. I'd start with a Bernoulli NB as a baseline. Personally, I hardly use IDF weighting for sentiment classification; words with low document frequency are usuall
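To make the "Bernoulli NB as a baseline" suggestion concrete, here is a minimal standalone sketch of a Bernoulli naive Bayes classifier over word presence, written in plain Python. It is an illustration only: the function names, the whitespace tokenization, and the four toy documents are assumptions, and it is not scikit-learn's implementation (there the corresponding estimator is `sklearn.naive_bayes.BernoulliNB`).

```python
import math
from collections import defaultdict


def train_bernoulli_nb(docs, labels, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on word presence/absence."""
    vocab = sorted({w for doc in docs for w in doc.split()})
    classes = sorted(set(labels))
    n_docs = {c: 0 for c in classes}
    doc_freq = {c: defaultdict(int) for c in classes}
    for doc, c in zip(docs, labels):
        n_docs[c] += 1
        for w in set(doc.split()):  # presence only, counts are ignored
            doc_freq[c][w] += 1
    model = {"vocab": vocab, "classes": classes, "log_prior": {}, "p": {}}
    for c in classes:
        model["log_prior"][c] = math.log(n_docs[c] / len(docs))
        # Laplace-smoothed probability that a word occurs in a class-c document.
        model["p"][c] = {
            w: (doc_freq[c][w] + alpha) / (n_docs[c] + 2.0 * alpha)
            for w in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior for one document."""
    present = set(doc.split())
    best, best_score = None, -math.inf
    for c in model["classes"]:
        score = model["log_prior"][c]
        for w in model["vocab"]:
            p = model["p"][c][w]
            # Absent words contribute too: that is the Bernoulli event model.
            score += math.log(p) if w in present else math.log(1.0 - p)
        if score > best_score:
            best, best_score = c, score
    return best


# Made-up toy corpus, purely for illustration.
docs = ["great movie love it", "love this", "terrible awful movie", "awful hate it"]
labels = ["pos", "pos", "neg", "neg"]
model = train_bernoulli_nb(docs, labels)
```

With this toy corpus, `predict(model, "love it")` returns `"pos"` and `predict(model, "awful hate")` returns `"neg"`. Note that the model uses binary word indicators and no IDF weighting.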

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Gael Varoquaux
On Thu, Feb 02, 2012 at 11:27:08AM +0100, Lars Buitinck wrote: > No objection to it being merged, but would you consider doing a rebase > -i? LP's history contains lots of micro-commits, which I think can be > largely squashed together. Sorry to disappoint everybody, but there were so many conflict

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Andreas
On 02/02/2012 12:34 PM, Olivier Grisel wrote: > 2012/2/2 Mathieu Blondel: > >> On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel >> wrote: >> >> >>> I wonder which representation is the nicest for the end user? It might >>> be the case that keeping the unlabeled data as a separate variable

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Olivier Grisel
2012/2/2 Mathieu Blondel : > On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel > wrote: > >> I wonder which representation is the nicest for the end user? It might >> be the case that keeping the unlabeled data as a separate variable >> might be more natural but that will probably impact pipeline-ab

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Mathieu Blondel
On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel wrote: > I wonder which representation is the nicest for the end user? It might > be the case that keeping the unlabeled data as a separate variable > might be more natural but that will probably impact pipeline-ability > and cross-validation since X

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Olivier Grisel
2012/2/2 Mathieu Blondel : > On Thu, Feb 2, 2012 at 7:17 PM, Gael Varoquaux > wrote: >> Just a heads up: I am going to merge in label propagation >> https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour >> unless somebody has concerns with the code. > > I personally don't like usi

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Gael Varoquaux
On Thu, Feb 02, 2012 at 08:04:03PM +0900, Mathieu Blondel wrote: > I personally don't like using -1 to encode unlabeled data. I would > prefer np.nan (which requires y to be np.float) or -2 (if you prefer y > to be np.int). I am against nan, but I might agree with you that -1 is not ideal. I sugge
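The trade-off between the two encodings discussed above can be sketched in plain Python. An integer sentinel keeps the labels as integers and can be found with an ordinary equality test, whereas NaN forces floating-point labels and needs a dedicated check because NaN never compares equal to itself. The helper names below are hypothetical, and the sketch takes no position on which encoding the thread finally settled on.

```python
import math

UNLABELED = -1  # integer sentinel for "no label", as debated in the thread


def split_by_sentinel(y, sentinel=UNLABELED):
    """Return (labeled_indices, unlabeled_indices) for integer labels."""
    labeled = [i for i, v in enumerate(y) if v != sentinel]
    unlabeled = [i for i, v in enumerate(y) if v == sentinel]
    return labeled, unlabeled


def split_by_nan(y):
    """Same split when missing labels are NaN; y must then hold floats."""
    # `v == float("nan")` is always False, so math.isnan is required here.
    labeled = [i for i, v in enumerate(y) if not math.isnan(v)]
    unlabeled = [i for i, v in enumerate(y) if math.isnan(v)]
    return labeled, unlabeled


y_int = [0, 1, -1, 1, -1]
y_float = [0.0, float("nan"), 1.0]
int_split = split_by_sentinel(y_int)
nan_split = split_by_nan(y_float)
```

Either way the estimator sees a single `y` aligned with `X`, which is what keeps this representation compatible with pipelines and cross-validation, at the cost of reserving one value that can never be used as a real class label.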

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Olivier Grisel
2012/2/2 Gael Varoquaux : > On Thu, Feb 02, 2012 at 11:27:08AM +0100, Lars Buitinck wrote: >> No objection to it being merged, but would you consider doing a rebase >> -i? LP's history contains lots of micro-commits, which I think can be >> largely squashed together. > > This is a bit further than

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Gael Varoquaux
On Thu, Feb 02, 2012 at 11:27:08AM +0100, Lars Buitinck wrote: > No objection to it being merged, but would you consider doing a rebase > -i? LP's history contains lots of micro-commits, which I think can be > largely squashed together. This is a bit further than I am usually willing to go in term

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Mathieu Blondel
On Thu, Feb 2, 2012 at 7:17 PM, Gael Varoquaux wrote: > Just a heads up: I am going to merge in label propagation > https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour > unless somebody has concerns with the code. I personally don't like using -1 to encode unlabeled data. I wou

Re: [Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Lars Buitinck
2012/2/2 Gael Varoquaux : > Just a heads up: I am going to merge in label propagation > https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour > unless somebody has concerns with the code. > > I think that it is a beautiful pull request and I am very happy to see it > landing in the

[Scikit-learn-general] Merging in label propagation

2012-02-02 Thread Gael Varoquaux
Just a heads up: I am going to merge in label propagation https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour unless somebody has concerns with the code. I think that it is a beautiful pull request and I am very happy to see it landing in the scikit. G

Re: [Scikit-learn-general] Improving the accuracy of classifier

2012-02-02 Thread adnan rajper
Hi Peter, Number of samples: 1 million tweets. Number of features: I use the bag-of-words model; in fact, I followed this example: http://scikit-learn.github.com/scikit-learn-tutorial/working_with_text_data.html, which uses TF-IDF normalization. Class distribution: equal number of positive and

Re: [Scikit-learn-general] Improving the accuracy of classifier

2012-02-02 Thread Peter Prettenhofer
Hi Adnan, can you give us some more specific information about your learning task / dataset, including: - number of samples - number of features - class distribution - features (normalization, preprocessing) best, Peter 2012/2/2 adnan rajper : > hi everybody, > > I am using multinomial

Re: [Scikit-learn-general] Improving the accuracy of classifier

2012-02-02 Thread Olivier Grisel
2012/2/2 Gael Varoquaux : > On Thu, Feb 02, 2012 at 12:45:04AM -0800, adnan rajper wrote: >>    I tried "parameter tuning using grid search",  but it gets too slow. Both >>    classifiers (multinomial and LinearSVC) give 75% accuracy. My problem is >>    that I want to improve the accuracy, for ins

Re: [Scikit-learn-general] Improving the accuracy of classifier

2012-02-02 Thread Gael Varoquaux
On Thu, Feb 02, 2012 at 12:45:04AM -0800, adnan rajper wrote: >I tried "parameter tuning using grid search",  but it gets too slow. Both >classifiers (multinomial and LinearSVC) give 75% accuracy. My problem is >that I want to improve the accuracy, for instance I want to make it more >

[Scikit-learn-general] Improving the accuracy of classifier

2012-02-02 Thread adnan rajper
Hi everybody, I am using multinomial and LinearSVC classifiers with default parameters to classify Twitter messages into two classes (positive or negative). I followed the tutorial on http://scikit-learn.github.com/scikit-learn-tutorial/working_with_text_data.html. I tried "parameter tuning usi

Re: [Scikit-learn-general] Causes for one class dominating?

2012-02-02 Thread Yaroslav Halchenko
Would it hold if you reduced it to two dimensions with PCA and visualized it, to check whether the same effects hold? Michael Waskom wrote: >Hi Alex, > >See my response to Yarick for some results from a binary >classification. I reran both the three-way and binary classification >with SVC, though, with similar results: > >

Re: [Scikit-learn-general] Joblib and IPython

2012-02-02 Thread Andreas Müller
On 02/01/2012 04:03 PM, Gael Varoquaux wrote: > On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote: > >> I started working with IPython.parallel for training the trees using joblib. >> It works in principle, but it is SLOW. >> The time between starting and the jobs arriving at the engines

Re: [Scikit-learn-general] Unit test fail when building the latest version of scikit-learn.

2012-02-02 Thread Lars Buitinck
2012/1/23, Alejandro Weinstein : == > FAIL: sklearn.tests.test_multiclass.test_ovr_fit_predict > -- > File "/home/ajw/local/scikit-learn/sklearn/tests/test_multi