Re: [Scikit-learn-general] CountVectorizer in feature extraction is still slow

2013-03-08 Thread Vlad Niculae
That doesn't mean you should try, though ;) I believe Andy meant that it doesn't mean you *shouldn't* try :)

[Scikit-learn-general] Vectorizer question

2013-03-08 Thread Dirk Nachbar
Hi, I want to run some classification, and some of my variables are strings. I do not need a bag-of-words vectorizer, just a simple transformation of 10 categories into 10 columns. How do I do that? BTW, thanks to Andreas and Oliver for the tutorial last night. Dirk -- http://twitter.com/dirknbr

[Scikit-learn-general] Why Gaussian Naive Bayes is not working as a base classifier?

2013-03-08 Thread Issam
Good evening, dear developers! I'm getting a peculiar error when using AdaBoostClassifier with GaussianNB() as the base estimator. These are my commands:
In [65]: gnb = GaussianNB()
In [66]: bdt = AdaBoostClassifier(gnb, n_estimators=100)
In [67]: bdt.fit(X, y)
I get the following

Re: [Scikit-learn-general] Vectorizer question

2013-03-08 Thread Lars Buitinck
2013/3/8 Dirk Nachbar dirk...@gmail.com: I want to run some classification and I have some variables which are string, I do not need a bag of words vectorizer, just a simple 10 categories into 10 columns transformation. How do I do that. Use a DictVectorizer:
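A minimal sketch of Lars's suggestion, assuming each sample's variables are stored in a dict. The key "city", its values and the numeric "temperature" feature are invented for illustration. DictVectorizer turns each distinct string value into its own 0/1 column named "key=value" and passes numeric values through unchanged:

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical samples: one string-valued (categorical) and one numeric variable.
samples = [
    {"city": "Paris", "temperature": 12.0},
    {"city": "Berlin", "temperature": 8.5},
    {"city": "Rome", "temperature": 17.0},
]

# sparse=False returns a dense NumPy array instead of a SciPy sparse matrix.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(samples)

# vocabulary_ maps each feature name ("city=Paris", ..., "temperature") to a column index;
# feature names are sorted by default.
print(vec.vocabulary_)
print(X)

# A category not seen during fit is ignored at transform time: its indicator columns stay 0.
X_new = vec.transform([{"city": "Madrid", "temperature": 20.0}])
```

With 10 distinct string values for one key, the same call yields 10 indicator columns for that key.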

Re: [Scikit-learn-general] Why Gaussian Naive Bayes is not working as a base classifier?

2013-03-08 Thread Peter Prettenhofer
Issam, GaussianNB currently does not support sample weights, so it cannot be used with AdaBoost. In Weka, if a classifier does not support sample weights, Weka falls back to resampling the data set. We could implement this strategy as well, but it would not be very efficient due to the data structures
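Whether an estimator accepts weights can be checked with sklearn.utils.validation.has_fit_parameter, which inspects the signature of its fit method. The sketch below pairs that check with a Weka-style fallback that draws a bootstrap sample with probabilities proportional to the weights. The helpers weighted_resample_indices and fit_with_weights are hypothetical, not scikit-learn API, and the data is synthetic. Later scikit-learn releases added sample_weight to GaussianNB.fit, so the check is True for it in current versions; KNeighborsClassifier serves as an estimator without it.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.validation import has_fit_parameter


def weighted_resample_indices(sample_weight, random_state=0):
    """Hypothetical helper: draw n indices with replacement, probability proportional to weight."""
    p = np.asarray(sample_weight, dtype=float)
    p = p / p.sum()
    rng = np.random.default_rng(random_state)
    return rng.choice(len(p), size=len(p), replace=True, p=p)


def fit_with_weights(estimator, X, y, sample_weight, random_state=0):
    """Hypothetical helper: pass sample_weight if fit accepts it, else fit on a weighted resample."""
    if has_fit_parameter(estimator, "sample_weight"):
        return estimator.fit(X, y, sample_weight=sample_weight)
    idx = weighted_resample_indices(sample_weight, random_state)
    return estimator.fit(X[idx], y[idx])


# Synthetic data: 40 samples, 2 features, labels from the sign of the first feature.
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 2))
y = (X[:, 0] > 0).astype(int)
w = np.ones(len(y))
w[:10] = 5.0  # up-weight the first ten samples

print(has_fit_parameter(GaussianNB(), "sample_weight"))
print(has_fit_parameter(KNeighborsClassifier(), "sample_weight"))

knn = fit_with_weights(KNeighborsClassifier(n_neighbors=3), X, y, w)  # resampling path
gnb = fit_with_weights(GaussianNB(), X, y, w)  # sample_weight path (current releases)
```

As Peter notes, resampling copies rows, which is why it is less efficient than native weight support.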

Re: [Scikit-learn-general] Vectorizer question

2013-03-08 Thread Dirk Nachbar
How do I make sure the dict is in the same order as my other data? On 8 March 2013 10:38, Lars Buitinck l.j.buiti...@uva.nl wrote: 2013/3/8 Dirk Nachbar dirk...@gmail.com: I want to run some classification and I have some variables which are string, I do not need a bag of words vectorizer,

Re: [Scikit-learn-general] Vectorizer question

2013-03-08 Thread Lars Buitinck
2013/3/8 Dirk Nachbar dirk...@gmail.com: How do I make sure the dict is in the same order as my other data? I'm not sure what you mean. The keys in a dict are unordered; DictVectorizer is really a mapping from keys to column indices. -- Lars Buitinck Scientific programmer, ILPS University of
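To illustrate Lars's answer: DictVectorizer looks each key up in its vocabulary_ mapping, so the order of keys inside a dict has no effect on the output columns. What matters for alignment is the order of the list, because row i of the output corresponds to the i-th dict. Building the list of dicts in the same order as the labels therefore keeps them aligned. The keys and values below are hypothetical.

```python
from sklearn.feature_extraction import DictVectorizer

vec = DictVectorizer(sparse=False)
vec.fit([{"age": 30, "city": "Paris"}, {"city": "Berlin", "age": 45}])

# Same content, with the keys written in different orders.
a = vec.transform([{"age": 30, "city": "Paris"}])
b = vec.transform([{"city": "Paris", "age": 30}])

print(vec.vocabulary_)   # feature name -> column index
print((a == b).all())    # key order inside a dict does not matter

# Rows follow the order of the list of dicts, so keep this list aligned with y.
X = vec.transform([{"city": "Berlin", "age": 45}, {"age": 30, "city": "Paris"}])
```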

Re: [Scikit-learn-general] Flat is better than nested: Website edition

2013-03-08 Thread Jaques Grobler
Hey Andy, I'm not entirely sure about modifying it. It gets put in at line 206 of layout.html, which basically just pulls in the 'toc' code from, I assume, the layout.html file in Sphinx's own sphinx/themes/basic (or at least somewhere around there). What do you want to change there? Removing it is

[Scikit-learn-general] recommended workflow

2013-03-08 Thread Kevin Kunzmann
Hey all, I am trying to make some changes to the random forest (mainly diagnostics, to try out some new ideas). However, Cython is giving me a really hard time here. I tried recompiling only the locally changed files (_tree.pyx -> _tree.c), and that works as long as I do not change the C signature of

Re: [Scikit-learn-general] recommended workflow

2013-03-08 Thread Lars Buitinck
2013/3/8 Kevin Kunzmann kevinkunzm...@gmx.net: However cython is giving me a really hard time here. I tried recompiling only the local changes (_tree.pyx -> _tree.c) and that is working as long as I do not change the C-signature of any classes defined in _tree.pxd (ValueError: ***, try

Re: [Scikit-learn-general] Flat is better than nested: Website edition

2013-03-08 Thread amueller
I want to have a non-empty menu for the user guide. The template just uses the built-in toc variable. There is also a toc_tree function, but that gives the whole toc tree, not just the part below the current page. I think I know how to get what I want in rst, but I have no idea how to tell Sphinx to

Re: [Scikit-learn-general] Flat is better than nested: Website edition

2013-03-08 Thread Jaques Grobler
My guess is that you'd have to tweak things in doc/themes/scikit-learn/layout.html. Basically, if you want something to appear in the sidebar or header or what-not, you have to use the HTML and Jinja within that file. You could, for example, say {%- if (pagename == 'user_guide') %}

Re: [Scikit-learn-general] recommended workflow

2013-03-08 Thread Kevin Kunzmann
Hey Lars, so nobody ever uses 'make cython'? You just fiddle with the .pyx/.pxd files you need for a specific change, fix any issues locally, and for a release only the .c files must compile as a whole? I assumed 'make cython' would work out of the box in the release versions; guess that

Re: [Scikit-learn-general] recommended workflow

2013-03-08 Thread Lars Buitinck
2013/3/8 Kevin Kunzmann kevinkunzm...@gmx.net: so nobody ever uses 'make cython'? You just fiddle with the .pyx/.pxd that you need for a specific change, fix any issues locally and for a release only the .c files must compile as a whole? I assumed 'make cython' to be working out of the box in

Re: [Scikit-learn-general] Text classifier with varying training data size for each labelled set

2013-03-08 Thread Abhiram Koneru
Ronnie, I apologize for not supplying the code or data. I have figured out a workaround for the problem: I broke the huge set (A) into smaller sets of equal size (A1, A2, A3), and it works like a charm. Thank you for taking an interest in my problem. Thanks and Regards Abhi On Thu, Mar 7, 2013
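Abhi doesn't say how the pieces were produced or used afterwards. Assuming the samples of set A are held in a NumPy array, one way to break it into roughly equal parts is to shuffle and then call numpy.array_split. The document names below are placeholders.

```python
import numpy as np

# Placeholder stand-in for the samples of the large set "A".
docs_A = np.array([f"doc{i}" for i in range(10)])

# Shuffle first so each chunk is a random subset rather than a contiguous block.
rng = np.random.default_rng(0)
shuffled = docs_A[rng.permutation(len(docs_A))]

# array_split tolerates lengths that don't divide evenly: the sizes here are 4, 3, 3.
A1, A2, A3 = np.array_split(shuffled, 3)
```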

[Scikit-learn-general] Data format

2013-03-08 Thread Mohamed Radhouane Aniba
Hello! I am wondering if someone has developed a snippet or a script that converts libsvm format into a format directly usable by scikit-learn, without needing load_svmlight_file. The reason is that I am trying to use the examples provided on the website, but all of them are written in a

[Scikit-learn-general] Regarding Gsoc-2013

2013-03-08 Thread chinmay naik
Hi everyone, I am Chinmay Naik, an undergraduate at Bangalore Institute of Technology, Bangalore. I sincerely hope that scikit-learn will be participating in GSoC 2013. I would love to contribute to scikit-learn through GSoC 2013. From the list of ideas proposed for GSoC 2012 and after looking up the

Re: [Scikit-learn-general] Data format

2013-03-08 Thread Philipp Singer
Why do you want to convert libsvm to another structure? I don't quite get it. If you want to use the examples: scikit-learn includes datasets that can be loaded directly. I think this section should help: http://scikit-learn.org/stable/datasets/index.html On 08.03.2013 18:44, Mohamed wrote:
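For instance, the bundled iris data set loads directly as NumPy arrays, with no file parsing at all. It is just one of the loaders described on the datasets page.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target   # X: (n_samples, n_features) float array, y: integer class labels
print(X.shape, y.shape)
```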

Re: [Scikit-learn-general] Data format

2013-03-08 Thread Mohamed Radhouane Aniba
Simply because I am new to both Python and scikit-learn (coming from the R world). The problem is that I tried using load_svmlight_file, in particular with the RBF parameters example http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html#example-svm-plot-rbf-parameters-py and I get a lot of

Re: [Scikit-learn-general] Data format

2013-03-08 Thread Ronnie Ghose
That uses NumPy's boolean indexing of arrays, IIRC. On Mar 8, 2013 1:28 PM, Mohamed Radhouane Aniba arad...@gmail.com wrote: Simply because I am new to both python and scikit (Coming from R world) The problem is that I tried using load_svmlight_file with in particular RBF parameters

Re: [Scikit-learn-general] Data format

2013-03-08 Thread Flavio Vinicius
Suppose you have x = np.array([1, 2, 3, 4]). Then x > 2 gives [False, False, True, True], and boolean indexing gives x[x > 2] = [3, 4]. -- Flavio On Fri, Mar 8, 2013 at 4:41 PM, Ronnie Ghose ronnie.gh...@gmail.com wrote: That uses the Boolean indexing function of numpy arrays iirc On Mar 8, 2013 1:28
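Flavio's example extends to 2-D arrays: a boolean mask built from the labels selects whole rows of the feature matrix. The arrays below are made-up values for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4])
mask = x > 2              # elementwise comparison -> boolean array [False, False, True, True]
selected = x[mask]        # keeps only the elements where mask is True -> [3, 4]

# The same idea on a feature matrix: keep the rows whose label is not 0.
X = np.arange(12).reshape(6, 2)      # 6 samples, 2 features
y = np.array([0, 1, 2, 0, 1, 2])
X_kept, y_kept = X[y != 0], y[y != 0]
```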

Re: [Scikit-learn-general] Data format

2013-03-08 Thread Mohamed Radhouane Aniba
Thank you guys, it makes more sense now. I slightly changed the code to fit my data (I have 6 features), and then I got an error message saying:
File "plot_rbf_parameters.py", line 109, in <module>
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
File

Re: [Scikit-learn-general] Data format

2013-03-08 Thread Ronnie Ghose
X_2d = X[:, :6], if the data is formatted correctly. It's rows by columns, and then slicing. The NumPy docs should help. On Mar 8, 2013 3:12 PM, Mohamed Radhouane Aniba arad...@gmail.com wrote: Thank you guys it makes more sense now. I slightly changed the code to fit my data ( I have 6 features) I
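In NumPy, X[rows, columns] indexes rows first and columns second, so X[:, :k] keeps every sample and the first k feature columns. A tiny made-up matrix shows the resulting shapes:

```python
import numpy as np

X = np.arange(18).reshape(3, 6)   # 3 samples (rows) x 6 features (columns)
X_first2 = X[:, :2]               # all rows, columns 0-1 -> shape (3, 2)
X_all6 = X[:, :6]                 # all rows, columns 0-5 -> shape (3, 6), i.e. every feature
print(X_first2.shape, X_all6.shape)
```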

Re: [Scikit-learn-general] Data format

2013-03-08 Thread Mohamed Radhouane Aniba
Ronnie, this is exactly what I did, and that's what shows up in the error message: X.shape[1] = 2 should be equal to 6, the number of features at training time. The training ran successfully and the best parameters were printed successfully, but then I think it is a bug when rendering the
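The error message suggests a shape mismatch rather than a rendering bug. A classifier fitted on 6 features is asked for decision_function values on a grid built from only two coordinates, np.c_[xx.ravel(), yy.ravel()]. One way to draw a 2-D decision surface is to pick two of the six features and fit a separate classifier on just those columns. The data below is synthetic, and the C and gamma values are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for a 6-feature data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

clf6 = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)   # trained on all 6 features

# 2-D mesh for plotting: the grid has only 2 columns.
xx, yy = np.meshgrid(np.linspace(-3, 3, 50), np.linspace(-3, 3, 50))
grid = np.c_[xx.ravel(), yy.ravel()]                   # shape (2500, 2)

mismatch_error = None
try:
    clf6.decision_function(grid)                       # 2 columns vs 6 at training time
except ValueError as exc:
    mismatch_error = exc
    print("6-feature model rejects the 2-column grid:", exc)

# Fit a second classifier on the two features chosen for the plot.
X_2d = X[:, :2]
clf2 = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X_2d, y)
Z = clf2.decision_function(grid).reshape(xx.shape)     # ready for contour plotting
```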

Re: [Scikit-learn-general] Data format

2013-03-08 Thread Ronnie Ghose
Could you by chance upload part of your data, if not all of it, or a representative sample? On Fri, Mar 8, 2013 at 3:21 PM, Mohamed Radhouane Aniba arad...@gmail.com wrote: Ronnie, This is exactly what I did and that's what shows in the error message saying X.shape[1] = 2 should be equal

Re: [Scikit-learn-general] Data format

2013-03-08 Thread Mohamed Radhouane Aniba
Sorry for the format, but this is what it looks like:
-1 1:0.0256992 2:0.89 3:16.2094 4:3.17376 5:1.03704 6:0.161745
-1 1:0.0382503 2:7.159 3:44.5586 4:65.4716 5:24.0289 6:0.168695
1 1:0.0908366 2:10.2772 3:8.25109 4:31.2472 5:47.3532 6:0.163662
-1 1:0.0158669 2:1.87153 3:8.5248 4:2.775
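Lines in this libsvm/svmlight format can be parsed with load_svmlight_file. It returns a SciPy sparse matrix and a label array; .toarray() gives a dense NumPy array if an example needs one. The sketch below uses the first three complete lines above, wrapped in io.BytesIO so it runs without a file on disk. With a real file you would pass its path instead.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# First three complete lines of the sample above (label, then index:value pairs).
raw = b"""-1 1:0.0256992 2:0.89 3:16.2094 4:3.17376 5:1.03704 6:0.161745
-1 1:0.0382503 2:7.159 3:44.5586 4:65.4716 5:24.0289 6:0.168695
1 1:0.0908366 2:10.2772 3:8.25109 4:31.2472 5:47.3532 6:0.163662
"""

# Indices 1..6 are one-based; the default zero_based="auto" maps them to columns 0..5.
X_sparse, y = load_svmlight_file(BytesIO(raw), n_features=6)
X = X_sparse.toarray()   # dense (3, 6) array
print(X.shape, y)
```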

Re: [Scikit-learn-general] Regarding Gsoc-2013

2013-03-08 Thread Robert Layton
On 9 March 2013 04:53, chinmay naik chin.nai...@gmail.com wrote: Hi everyone, I am Chinmay Naik, an undergraduate at Bangalore Institute of Technology, Bangalore. I sincerely hope that scikit-learn will be participating in Gsoc-2013. I would love to contribute to scikit-learn through

Re: [Scikit-learn-general] Regarding Gsoc-2013

2013-03-08 Thread Mathieu Blondel
Hello, to maximize your chances of being accepted, it is very important that you start contributing to the project as soon as possible. It is actually a requirement, but it will also allow us to get to know you. Try to send us a few pull requests, even if it's just for fixing small issues