On Thu, Feb 02, 2012 at 10:17:02PM -0500, Jieyun Fu wrote:
>Is there a way to enforce the constraints on sklearn optimizers or
>classifiers? For example, if I put some data into a logistic regression, I
>want to make sure some coefficients are positive / negative.
No. The optimizers a
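A minimal sketch of one possible workaround, not sklearn API: scipy's
L-BFGS-B solver accepts box bounds on the parameters, so a sign-constrained
logistic regression can be hand-rolled. The helper name and toy data below
are made up for illustration.

# Hand-rolled sign-constrained logistic regression via scipy.optimize;
# not sklearn API, just an illustration of the box-bound workaround.
import numpy as np
from scipy.optimize import minimize

def fit_constrained_logreg(X, y, signs):
    # signs[j] = +1 forces w[j] >= 0, -1 forces w[j] <= 0, 0 leaves it free
    def neg_log_likelihood(w):
        # logistic loss for y in {-1, +1}: sum log(1 + exp(-y * Xw))
        return np.logaddexp(0, -y * X.dot(w)).sum()

    bounds = [(0, None) if s > 0 else (None, 0) if s < 0 else (None, None)
              for s in signs]
    w0 = np.zeros(X.shape[1])
    return minimize(neg_log_likelihood, w0, method="L-BFGS-B", bounds=bounds).x

# toy usage: force the first coefficient to be non-negative
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = np.sign(X[:, 0] + 0.1 * rng.randn(100))
print(fit_constrained_logreg(X, y, signs=[+1, 0, 0]))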
Hi,
Actually I followed this
tutorial http://scikit-learn.github.com/scikit-learn-tutorial/working_with_text_data.html.
It uses TF-IDF normalization, which I have incorporated as well, after
removing URLs, user names, and stop words.
Adnan
Yes Peter, indeed I am doing sentiment classification.
Your suggestions are much appreciated. Sorry, but I am not able to understand
your question: "how many features do you have?" Would you care to elaborate?
Thanks a million again
Adnan
Is there a way to enforce the constraints on sklearn optimizers or
classifiers? For example, if I put some data into a logistic regression, I
want to make sure some coefficients are positive / negative.
Thanks!
On Tue, Jan 31, 2012 at 05:09:54PM +0100, Gael Varoquaux wrote:
> On Tue, Jan 31, 2012 at 05:05:53PM +0100, Lars Buitinck wrote:
> > I don't have a NumPy 2 installation and I haven't followed its
> > development closely. Could you open an issue for this?
> https://github.com/scikit-learn/scikit-le
2012/2/2 Jacob VanderPlas :
> File "/usr/local/lib/python2.6/dist-packages/scikit_learn-0.11_git-py2.6-linux-i686.egg/sklearn/svm/__init__.py",
> line 15, in <module>
> from . import sparse, libsvm, liblinear
> ImportError: cannot import name sparse
It looks suspiciously similar to this issue:
https:
On 02/02/2012 04:44 PM, Jacob VanderPlas wrote:
> When I build the documentation with the current master, I get a string
> of errors related to svm:
>
> File "/usr/local/lib/python2.6/dist-packages/scikit_learn-0.11_git-py2.6-linux-i686.egg/sklearn/svm/__init__.py",
> line 15, in <module>
> from
When I build the documentation with the current master, I get a string
of errors related to svm:
File "/usr/local/lib/python2.6/dist-packages/scikit_learn-0.11_git-py2.6-linux-i686.egg/sklearn/svm/__init__.py",
line 15, in <module>
from . import sparse, libsvm, liblinear
ImportError: cannot import name sparse
Ok, so I assume you are doing sentiment classification?
For millions of examples I definitely recommend using either
Naive Bayes or SGDClassifier. I'd start with a Bernoulli NB as a
baseline.
Personally, I hardly use IDF weighting for sentiment classification;
words with low document frequency are usuall
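A minimal sketch of that baseline, assuming `texts` and `labels` hold the
preprocessed tweets and their sentiment classes (placeholder data below):

# Bernoulli NB baseline for sentiment classification; BernoulliNB models
# binary term occurrence, so counts are binarized in the vectorizer.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

texts = ["great movie", "awful plot", "loved it", "boring and bad"]  # placeholders
labels = [1, 0, 1, 0]

vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(texts)
clf = BernoulliNB().fit(X, labels)
print(clf.predict(vectorizer.transform(["great plot"])))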
On Thu, Feb 02, 2012 at 11:27:08AM +0100, Lars Buitinck wrote:
> No objection to it being merged, but would you consider doing a rebase
> -i? LP's history contains lots of micro-commits, which I think can be
> largely squashed together.
Sorry to disappoint everybody, but there were so many conflict
On 02/02/2012 12:34 PM, Olivier Grisel wrote:
> 2012/2/2 Mathieu Blondel:
>> On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel wrote:
>>> I wonder which representation is the nicest for the end user? It might
>>> be the case that keeping the unlabeled data as a separate variable
2012/2/2 Mathieu Blondel :
> On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel wrote:
>
>> I wonder which representation is the nicest for the end user? It might
>> be the case that keeping the unlabeled data as a separate variable
>> might be more natural but that will probably impact pipeline-ab
On Thu, Feb 2, 2012 at 8:15 PM, Olivier Grisel wrote:
> I wonder which representation is the nicest for the end user? It might
> be the case that keeping the unlabeled data as a separate variable
> might be more natural but that will probably impact pipeline-ability
> and cross-validation since X
2012/2/2 Mathieu Blondel :
> On Thu, Feb 2, 2012 at 7:17 PM, Gael Varoquaux wrote:
>> Just a heads up: I am going to merge in label propagation
>> https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour
>> unless somebody has concerns with the code.
>
> I personally don't like usi
On Thu, Feb 02, 2012 at 08:04:03PM +0900, Mathieu Blondel wrote:
> I personally don't like using -1 to encode unlabeled data. I would
> prefer np.nan (which requires y to be np.float) or -2 (if you prefer y
> to be np.int).
I am against nan, but I might agree with you that -1 is not ideal.
I sugge
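For reference, a minimal sketch of the -1 convention under discussion, with
the module path and attribute names as in current scikit-learn releases and
toy data standing in for a real problem:

# -1 marks the unlabeled points; labeled points carry their class index.
import numpy as np
from sklearn.semi_supervised import LabelPropagation

X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, -1, -1, 1, -1, -1])  # one labeled point per cluster

model = LabelPropagation().fit(X, y)
print(model.transduction_)  # inferred labels for every point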
2012/2/2 Gael Varoquaux :
> On Thu, Feb 02, 2012 at 11:27:08AM +0100, Lars Buitinck wrote:
>> No objection to it being merged, but would you consider doing a rebase
>> -i? LP's history contains lots of micro-commits, which I think can be
>> largely squashed together.
>
> This is a bit further than
On Thu, Feb 02, 2012 at 11:27:08AM +0100, Lars Buitinck wrote:
> No objection to it being merged, but would you consider doing a rebase
> -i? LP's history contains lots of micro-commits, which I think can be
> largely squashed together.
This is a bit further than I am usually willing to go in term
On Thu, Feb 2, 2012 at 7:17 PM, Gael Varoquaux wrote:
> Just a heads up: I am going to merge in label propagation
> https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour
> unless somebody has concerns with the code.
I personally don't like using -1 to encode unlabeled data. I wou
2012/2/2 Gael Varoquaux :
> Just a heads up: I am going to merge in label propagation
> https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour
> unless somebody has concerns with the code.
>
> I think that it is a beautiful pull request and I am very happy to see it
> landing in the
Just a heads up: I am going to merge in label propagation
https://github.com/scikit-learn/scikit-learn/pull/547 in the next hour
unless somebody has concerns with the code.
I think that it is a beautiful pull request and I am very happy to see it
landing in the scikit.
G
Hi Peter,
number of samples: 1 million tweets
number of features: I use the bag-of-words model; in fact, I have followed this
example
http://scikit-learn.github.com/scikit-learn-tutorial/working_with_text_data.html.
It uses TF-IDF normalization.
class distribution: equal number of positive and
Hi Adnan,
can you give us some more specific information about your learning
task / dataset including:
- number of samples
- number of features
- class distribution
- features (normalization, preprocessing)
best,
Peter
2012/2/2 adnan rajper :
> hi everybody,
>
> I am using multinomial
2012/2/2 Gael Varoquaux :
> On Thu, Feb 02, 2012 at 12:45:04AM -0800, adnan rajper wrote:
>> I tried "parameter tuning using grid search", but it gets too slow. Both
>> classifiers (multinomial and LinearSVC) give 75% accuracy. My problem is
>> that I want to improve the accuracy, for ins
On Thu, Feb 02, 2012 at 12:45:04AM -0800, adnan rajper wrote:
>I tried "parameter tuning using grid search", but it gets too slow. Both
>classifiers (multinomial and LinearSVC) give 75% accuracy. My problem is
>that I want to improve the accuracy, for instance I want to make it more
>
hi everybody,
I am using multinomial and LinearSVC classifier with default parameters to
classify twitter messages into two classes (positive or negative). I followed
the tutorial
on http://scikit-learn.github.com/scikit-learn-tutorial/working_with_text_data.html.
I tried "parameter tuning usi
Would it hold if you project the data to two dimensions with PCA and
visualize whether the same effects hold?
Michael Waskom wrote:
>Hi Alex,
>
>See my response to Yarick for some results from a binary
>classification. I reran both the three-way and binary classification
>with SVC, though, with similar results:
>
>
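A minimal sketch of the suggested check, with toy data standing in for the
real features:

# Project to two dimensions with PCA and eyeball the class structure.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(50, 10) + 2, rng.randn(50, 10) - 2])  # toy classes
y = np.array([0] * 50 + [1] * 50)

X_2d = PCA(n_components=2).fit_transform(X)
for label, marker in [(0, "o"), (1, "x")]:
    mask = y == label
    plt.scatter(X_2d[mask, 0], X_2d[mask, 1], marker=marker, label=str(label))
plt.xlabel("PC 1")
plt.ylabel("PC 2")
plt.legend()
plt.show()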
On 02/01/2012 04:03 PM, Gael Varoquaux wrote:
> On Wed, Feb 01, 2012 at 03:05:49PM +0100, Andreas wrote:
>
>> I started working with IPython.parallel for training the trees using joblib.
>> It works in principle, but it is SLOW.
>> The time between starting and the jobs arriving at the engines
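For comparison, a minimal sketch of the same kind of tree training done with
joblib's local multiprocessing backend rather than IPython.parallel; the
helper name and toy data are made up for illustration:

# Fit bootstrap trees in parallel with joblib, as a forest would.
import numpy as np
from joblib import Parallel, delayed
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(1000, 5)
y = (X[:, 0] > 0).astype(int)

def fit_one_tree(X, y, seed):
    idx = np.random.RandomState(seed).randint(0, len(X), len(X))  # bootstrap
    return DecisionTreeClassifier(random_state=seed).fit(X[idx], y[idx])

trees = Parallel(n_jobs=2)(delayed(fit_one_tree)(X, y, s) for s in range(10))
print(len(trees), "trees fitted")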
2012/1/23, Alejandro Weinstein :
> ======================================================================
> FAIL: sklearn.tests.test_multiclass.test_ovr_fit_predict
> ----------------------------------------------------------------------
> File "/home/ajw/local/scikit-learn/sklearn/tests/test_multi