Re: [Scikit-learn-general] SVM - Scaling data or not?

2013-05-25 Thread amueller
You should not use your method in any common setting. The difference when using a scaler is that it remembers the mean and variance of the training set and reuses them for the test set. Gianni Iannelli wrote: >what I usually do is scale the training set and the dataset separately, >but I'm d
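A minimal sketch of the point about the scaler, assuming a recent scikit-learn where `StandardScaler` lives in `sklearn.preprocessing`. The toy arrays are made up for illustration; the scaler is fitted on the training set only and the stored statistics are reused for the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: three training rows and one test row.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
# fit_transform learns mean and variance from the training set only.
X_train_scaled = scaler.fit_transform(X_train)
# transform reuses the training statistics; the test set is never fitted.
X_test_scaled = scaler.transform(X_test)

print(scaler.mean_)
print(X_test_scaled)
```

Scaling the test set with its own mean and variance would map training and test points to different coordinates, which is why the separate scaling described in the quoted message is discouraged.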

Re: [Scikit-learn-general] using grid_seach

2013-03-30 Thread amueller
No, just replacing () with []. Jaques Grobler wrote: >Hey Andy, sorry been busy all day. You mean something like this to make >it >more clear ? > >>>> kernel_param = {'kernel':('linear', 'rbf')} >>>> C_param = {'C':[1,10]} parameters = (kernel_param, C_param) #List of parameter
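For illustration only, a sketch of a parameter grid written with lists rather than tuples. It assumes a recent scikit-learn, where `GridSearchCV` is imported from `sklearn.model_selection` (the 2013 release used a different module); the particular grid and the iris data are assumptions, not the example from the thread.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A list of dicts: each dict is one sub-grid, and every value is a list.
param_grid = [
    {"kernel": ["linear"], "C": [1, 10]},
    {"kernel": ["rbf"], "C": [1, 10]},
]

search = GridSearchCV(SVC(), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```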

Re: [Scikit-learn-general] Participation in GSoC 2013

2013-03-30 Thread amueller
How do we represent missing values here? Mathieu Blondel wrote: >On Tue, Mar 26, 2013 at 9:25 PM, Lee Zamparo wrote: >> AFAIK, you might not want all the missing values to be imputed at >once, >> especially if the dimensions of X are large. Maybe something like: >> >> >> X_transformed = es

Re: [Scikit-learn-general] Questions about converting categorical data into input data for an SVM

2013-03-30 Thread amueller
I thought OneHotEncoder solves that. Lars Buitinck wrote: >2013/3/27 Anne Dwyer : >> Just to clarify, you are saying that there is no procedure in scikit >that >> will transform categorical feature values into numerical values like >I was >> trying to do here. Correct? > >Not that I know of.
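A small sketch of what OneHotEncoder does with a categorical column. It assumes a recent scikit-learn, where the encoder accepts string categories directly (older releases expected integer codes); the colour values are invented for the example.

```python
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical feature with two distinct values.
colors = [["red"], ["green"], ["red"]]

encoder = OneHotEncoder()
# The result is sparse by default; toarray() makes it easy to inspect.
encoded = encoder.fit_transform(colors).toarray()

print(encoder.categories_[0])
print(encoded)
```

Each distinct category becomes its own 0/1 column, which is a numerical representation an SVM can consume.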

Re: [Scikit-learn-general] Vectorizing input

2013-03-14 Thread amueller
Did you see my earlier reply? Roman Sinayev wrote: >min_df=2 in the second and min_df=1 in the first. > >On Thu, Mar 14, 2013 at 7:19 PM, Ark <[email protected]> wrote: >> >>> >>> This is unexpected. Can you inspect the vocabulary_ on both >>> vectorizers? Try computing their set.intersectio

Re: [Scikit-learn-general] Flat is better than nested: Website edition

2013-03-08 Thread amueller
I want to have a non-empty menu for the user guide. The template just uses the built-in toc variable. There is also a toc_tree function, but that gives the whole toc tree, not just the part below the current page. I think I know how to get what I want in rst, but I have no idea how to tell Sphinx to render

Re: [Scikit-learn-general] setup script refering to .c

2013-03-05 Thread amueller
Exactly. Not only would you need Cython, it also needs to be a recent version. People with older versions would get cryptic error messages, leading to frustrated users and busy mailing lists. Matthieu Brucher wrote: >Hi, > >If I remember correctly, this is done to avoid an explicit Cython

Re: [Scikit-learn-general] get the label representing a row in predict_proba(X)

2013-03-01 Thread amueller
The classifiers have a 'classes_' attribute that contains the original class labels. ShNaYkHs ShNaYkHs wrote: >Let x an example to classify: >probas = model_svm.predict_proba([x])[0] >how can I know what is the label (a string) corresponding to each >predicted >probability ? That is, probas
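A sketch of pairing `classes_` with the columns of `predict_proba`. The question used an SVM; `LogisticRegression` is swapped in here only because it yields probabilities on a tiny made-up dataset without extra settings. The column order of `predict_proba` follows `classes_`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature data with string labels.
X = np.array([[0.0], [0.2], [0.9], [1.1], [2.0], [2.2]])
y = np.array(["cat", "cat", "dog", "dog", "bird", "bird"])

model = LogisticRegression().fit(X, y)

probas = model.predict_proba([[2.1]])[0]
# Column i of predict_proba corresponds to model.classes_[i].
label_to_proba = dict(zip(model.classes_, probas))
print(label_to_proba)
```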

Re: [Scikit-learn-general] Any incremental classifier for sklearn

2013-03-01 Thread amueller
SGDClassifier using partial_fit. I want to do naive Bayes soon. ShNaYkHs ShNaYkHs wrote: >Is there any incremental classifier in sklearn, that can be trained >incrementally considering one data-point at a time ? An existing one or >under >development one .. > > >-
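A rough sketch of incremental training with `SGDClassifier.partial_fit`. The synthetic data and the batch size of 20 are assumptions for illustration; batches of a single sample would work the same way.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical two-class data that is linearly separable.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)
# All class labels must be known on the first partial_fit call.
classes = np.array([0, 1])

for start in range(0, len(X), 20):
    batch = slice(start, start + 20)
    # Each call updates the existing weights instead of refitting.
    clf.partial_fit(X[batch], y[batch], classes=classes)

print(clf.coef_)
```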

Re: [Scikit-learn-general] Should sklearn.pipeline.Pipeline expose "classes_" property if the final estimator is a classifier?

2013-02-26 Thread amueller
Not all estimators, but those that are needed for the kind of estimator it represents, maybe? There are only four kinds of estimators, right? We should really write those API docs ;) Lars Buitinck wrote: >2013/2/26 : >> actually i think i share tadej's view on being able to exchange >Pipe

Re: [Scikit-learn-general] Should sklearn.pipeline.Pipeline expose "classes_" property if the final estimator is a classifier?

2013-02-26 Thread amueller
Actually I think I share Tadej's view on being able to exchange Pipelines and classifiers. Since 0.13, classes_ is basically part of the public classifier API, so a pipeline should also have it, I guess. "Tadej Janež" wrote: >On Tue, 2013-02-26 at 14:39 +0100, Lars Buitinck wrote: >> >>

Re: [Scikit-learn-general] Problem in text feature extraction (sklearn.feature_extraction.text)

2013-02-24 Thread amueller
The missing 2 in tokenizing 2.50 is indeed a bit weird, though. Tom Fawcett wrote: >First, thanks for all your great work on scikits.learn! It’s making my >life easier. > >Second, I found surprising behavior in sklearn.feature_extraction.text. >I’m using TfidfVectorizer and CountVectorizer

Re: [Scikit-learn-general] Problem in text feature extraction (sklearn.feature_extraction.text)

2013-02-24 Thread amueller
For the missing 'r' in the docs: it looks like a Sphinx glitch to me and I have not found a way to fix it. For the tokenization: the sklearn regexp seems like a sensible default to me. What would you change it to so that it is still robust? Tom Fawcett wrote: >First, thanks for all your great w

Re: [Scikit-learn-general] Packaging large objects

2013-02-21 Thread amueller
By the way, you could also use a different multiclass strategy such as error-correcting output codes (exists in sklearn) or a binary tree of classifiers (you would have to implement that yourself). Ark <[email protected]> wrote: >> >> The size is dominated by the n_features * n_classes coef_ matrix, >> which y

Re: [Scikit-learn-general] Packaging large objects

2013-02-21 Thread amueller
You could try some backward feature selection like recursive feature elimination, or just drop features with negligible coefficients. A group l1 penalty on the weights would probably be the way to go, but we don't have that ... Ark <[email protected]> wrote: >> >> The size is dominated by

Re: [Scikit-learn-general] Packaging large objects

2013-02-21 Thread amueller
You only need coef_ and intercept_ to make predictions, but not much else should be stored. If there is a gain from storing coef_ yourself, it is probably a bug. What is the number of features and classes? Ark <[email protected]> wrote: >I have been wondering about what makes the size of an S

Re: [Scikit-learn-general] Extreme Learning Machine implementation question

2013-02-14 Thread amueller
How about softmax? David Lambert wrote: >Given the method of determining the class predictions in the extreme >learning machine classifier: > > class_predictions = np.argmax(raw_predictions, axis=1) > >where raw_predictions are the (potentially negative) linear regression >outputs > (see
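A sketch of the softmax suggestion, using NumPy only. The `raw_predictions` values are invented, and the `softmax` helper is a hypothetical name, not part of any extreme learning machine implementation. Softmax turns possibly negative scores into probabilities without changing the argmax.

```python
import numpy as np


def softmax(scores):
    """Row-wise softmax; subtracting the row max keeps exp() stable."""
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Hypothetical raw regression outputs: two samples, three classes.
raw_predictions = np.array([[-1.0, 0.5, 2.0], [0.3, -0.2, -1.5]])

probabilities = softmax(raw_predictions)
class_predictions = np.argmax(probabilities, axis=1)
print(probabilities)
print(class_predictions)
```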

Re: [Scikit-learn-general] Matlab linear classify vs SKlearn LDA

2013-02-14 Thread amueller
The MATLAB documentation online says that the linear classifier is LDA by default. Andrew Winterman wrote: >Logistic regression can be used as a linear classifier. Maybe that's >matlab's linear classifier? > >On Thursday, February 14, 2013, David Reed wrote: > >> I was mistaken, R is providing the exact same resu

Re: [Scikit-learn-general] Hyperparameter optimization

2013-02-10 Thread amueller
I have a pull request for randomized search, but I need to update it as it is quite old... Ronnie Ghose wrote: >afaik yes. Please tell me if i'm wrong, more experienced scikitters :) > > >On Sun, Feb 10, 2013 at 9:23 PM, Yaser Martinez >wrote: > >> Any further development on this? Is a "brut

Re: [Scikit-learn-general] Scikit-learn scalability options ?

2013-02-07 Thread amueller
Please check out current master; there was a bug in mini-batch k-means in the release. "Vinay B," wrote: >So I tried your recommendations. The partial fit seems to operate to an >extent. Then BOOM! It looks very similar to the example in >http://scikit-learn.org/dev/auto_examples/document_cl

Re: [Scikit-learn-general] Python 2.x & 3.x under one code base

2013-01-14 Thread amueller
+1. In fact I think we should merge simple compatible fixes to master asap. Maybe you could do a PR with the non-six changes? Lars Buitinck wrote: >Regarding Python 3 compat, I just started rebasing Olivier's code. Is >it ok if I push the result a branch py3 in the master repo? I think >th

Re: [Scikit-learn-general] Cross validation turns my lists into numpy arrays

2013-01-13 Thread amueller
There is a fix for that in current master. check_arrays now has 'allow lists'. Andy Robert Layton wrote: >When using cross_validation.X, all arrays are checked in the normal way >-- >using check_arrays. >I am developing code that uses string documents as input, so I have a >list >of strings

Re: [Scikit-learn-general] Python 2.x & 3.x under one code base

2013-01-05 Thread amueller
In general +1, but actually I'd like to release pretty soon. Jake Vanderplas wrote: >Hi All, >Just a quick heads-up: thanks to some good work by Pauli Virtanen, >SciPy >is currently in the process of moving to a single code-base which >supports 2.x and 3.x, and it doesn't look extremely dif

Re: [Scikit-learn-general] Shape of classes_ varies?

2012-11-29 Thread amueller
+1 Doug Coleman wrote: >I guess transforming it would be more in line with other classifiers. >The >design decision could be "You should only have to know about >multi-output >if you want to use it." > > >On Thu, Nov 29, 2012 at 10:07 AM, Doug Coleman >wrote: > >> Going off of my unit tests

Re: [Scikit-learn-general] Shape of classes_ varies?

2012-11-29 Thread amueller
The classes_ attribute is not present in all classifiers and is not consistent, as you noticed. This is a known issue (see the issue tracker) and it would be great to address it. I am not sure about the decision trees in particular. Doug Coleman wrote: >Decision trees' classes are wrapped i

Re: [Scikit-learn-general] OvR, Logistic Regression and SGD

2012-11-06 Thread amueller
We should probably improve the docs on OvR. IIRC the user guide was already very explicit; maybe add something to the docstring? Abhi: did you read the user guide on the one-vs-rest classifier? How could we improve it to make things clearer? Mathieu Blondel wrote: >On Tue, Nov 6, 20

Re: [Scikit-learn-general] Atlas configuration error OSX

2012-10-17 Thread amueller
Can you try linking against libatlas manually? That should do it. Then I'll try to fix the setup.py. Andrew Godbehere wrote: >Hi Andy, > >I found _ATL_drotg defined in /opt/local/lib/libatlas.a. >_ATL_drotg is listed as an undefined reference in libcblas.a, >libf77blas.a, and libptcblas.a. > >T

Re: [Scikit-learn-general] Issue tags on github

2012-09-03 Thread amueller
I agree, but for this specific issue, I thought that the consensus was 'give a warning', so it should now be fairly clear what to do. I tried to avoid tagging issues as easy if high-level knowledge was needed; maybe I didn't succeed. Andy

Re: [Scikit-learn-general] Release

2012-08-31 Thread amueller
+1 but we should adhere to the rule of waiting at least two releases. And deprecation warnings on renamed parameters never produce spurious warnings. Gael Varoquaux wrote: On Fri, Aug 31, 2012 at 03:50:15PM +03

Re: [Scikit-learn-general] Release

2012-08-31 Thread amueller
We do? Which warnings do you mean? I am not aware of any warnings in the tests or examples. Vlad Niculae wrote: We are all annoyed by warnings; we have a ton of them at the moment. Some of them are scheduled f

Re: [Scikit-learn-general] Release

2012-08-31 Thread amueller
Sure, no problem. Peter Prettenhofer wrote: Hi all, unfortunately, I'm not available on Saturday and Sunday - if possible, it would be great if we could post-pone the release until Tuesday. thanks, Peter 2012

Re: [Scikit-learn-general] test windows build for PR 899

2012-08-04 Thread amueller
I might be able to give it a try later on. Alexandre Gramfort wrote: nobody working with current master on windows with mingw ? Any help would be greatly appreciated. Alex On Fri, Aug 3, 2012 at 9:39 PM, Alexandre Gramfort wrote: > hi, > > can anybody with a windows machine and no blas a

Re: [Scikit-learn-general] Nice blog post comparing various scikit-learn classifier runtimes

2012-06-24 Thread amueller
I just read the post and I was wondering: shouldn't extra trees be faster than random forests? In the blog post they are slower. Andy Olivier Grisel wrote: Here is the link: http://blog.explainmydata.com/2012/0

Re: [Scikit-learn-general] GSoC progress reports on the mailing list

2012-06-02 Thread amueller
Hi David. Very nice blog post. I'm out, so just a short comment for now: the differences in both timing and performance are probably due to the fact that my implementation does batch learning and yours does online learning. For benchmarking Cython I recommend you look into Fabian's yep tool. Cheer

Re: [Scikit-learn-general] read libsvm format data

2012-05-31 Thread amueller
Hi Sheila. I think Peter got the right answer: load_svmlight_file yields a sparse matrix that you need to convert to an array first. Cheers, Andy Sheila the angel wrote: Hi Andreas, there is no difference bet
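A short sketch of the conversion mentioned in the reply. The two-line libsvm-format data is invented and read from an in-memory buffer so the example needs no file on disk; `zero_based=False` is set explicitly because the made-up feature indices start at 1.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Hypothetical libsvm-format content: "<label> <index>:<value> ...".
data = BytesIO(b"1 1:0.5 3:1.2\n-1 2:0.7\n")

# X_sparse is a SciPy sparse matrix; y holds the labels.
X_sparse, y = load_svmlight_file(data, zero_based=False)

# Estimators that need dense input require an explicit conversion.
X_dense = X_sparse.toarray()
print(X_dense)
print(y)
```

For large datasets the dense array can be far bigger than the sparse matrix, so the conversion is only advisable when the estimator does not accept sparse input.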

Re: [Scikit-learn-general] Classificator for probability features

2012-05-14 Thread amueller
I would try using a chi-squared kernel. You can start by using the approximation provided in sklearn. Cheers, Andy Philipp Singer wrote: Hey there! I am currently trying to classify a dataset which has the fol
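One possible reading of "the approximation provided in sklearn" is `AdditiveChi2Sampler` from `sklearn.kernel_approximation`, followed by a linear model. This is an assumption, as the reply does not name the class. The Dirichlet-sampled rows stand in for non-negative probability features and are invented for the example.

```python
import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical probability features: each row is non-negative and sums to 1.
rng = np.random.RandomState(0)
X = rng.dirichlet(np.ones(4), size=40)
y = (X[:, 0] > X[:, 1]).astype(int)

# Map the features so that a linear SVM approximates a chi-squared kernel SVM.
sampler = AdditiveChi2Sampler(sample_steps=2)
X_mapped = sampler.fit_transform(X)

model = make_pipeline(AdditiveChi2Sampler(sample_steps=2), LinearSVC())
model.fit(X, y)
print(X_mapped.shape)
```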