Re: [Scikit-learn-general] Linking error with intel 13.1

2015-07-12 Thread Lars Buitinck
2015-07-11 22:43 GMT+02:00 Andreas Mueller t3k...@gmail.com: It's a bit odd because numpy is not linked against iomp5. What is that / It's the OpenMP runtime. And by the looks of it, MKL is linking with it.

Re: [Scikit-learn-general] Estimators of RAKEL and (Ensemble) Classifier Chain for multilabel proposal

2015-07-10 Thread Lars Buitinck
2015-07-10 15:20 GMT+02:00 Al alain.pen...@gmail.com: My name is Alain Pena, (now previously) student in computer engineering at University of Liège. For my master thesis, I had to implement some methods for multilabel classification, those methods being RAKEL [1] and (Ensemble) Classifier

Re: [Scikit-learn-general] Library of pre-trained models

2015-07-01 Thread Lars Buitinck
2015-07-01 16:27 GMT+02:00 Fred Mailhot fred.mail...@gmail.com: 2) The gensim implementation predates the patenting Does that matter?

Re: [Scikit-learn-general] inconsistencies between predict_proba and predict

2015-05-22 Thread Lars Buitinck
2015-05-22 9:01 GMT+02:00 zhenjiang zech xu zhenjiang...@gmail.com: I tested the following code and its outputs show predict_proba and predict give very different result, even for the samples with high probability (0.7) to be label 1 are predicted as label 1. I am very surprised. Is this

Re: [Scikit-learn-general] Problems reproducing the results from TfidfTransformer and Vectorizer

2015-05-22 Thread Lars Buitinck
2015-05-22 8:29 GMT+02:00 Sebastian Raschka se.rasc...@gmail.com: The default equation is: # idf = log ( number_of_docs / number_of_docs_where_term_appears ) And in the online documentation at
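
The quoted default equation can be tried on a toy corpus. This is a minimal sketch, assuming a natural logarithm and no smoothing; the `plain_idf` helper and the three-document corpus are made up for illustration. scikit-learn's TfidfTransformer may apply smoothing or an additive constant on top of this, so its numbers need not match.

```python
import math

def plain_idf(docs):
    """Return {term: log(n_docs / n_docs_containing_term)} for tokenized docs."""
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(doc):  # count each term at most once per document
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Hypothetical toy corpus, already tokenized.
docs = [
    ["the", "sun", "is", "shining"],
    ["the", "weather", "is", "sweet"],
    ["the", "sun", "is", "shining", "and", "the", "weather", "is", "sweet"],
]
idf = plain_idf(docs)
```

A term that occurs in every document ("the") gets an idf of zero under this unsmoothed form, which is one reason implementations often add a constant.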

Re: [Scikit-learn-general] load_svmlight_file fails for large svmlight files

2015-02-17 Thread Lars Buitinck
2015-02-17 11:25 GMT+01:00 abhishek abhish...@gmail.com: 4294901761:21 4294902016:18 4294967041:15 4294967296:54 I am unable to understand why should it fail when maxint for python is 9223372036854775807. Is there any workaround available for this? Or is it just not possible to load at

Re: [Scikit-learn-general] Dimension Requirements on train_test_split and GridSearchCV

2014-12-18 Thread Lars Buitinck
2014-12-18 20:14 GMT+01:00 David Brough david.brough.0...@gmail.com: Thank you for the quick response. I am currently using version 0.15.2. Our arrays have dimensions [n_samples, x, y, z]. Below are the two trace-backs I get for both train_test_split and GridSearchCV. They both end at the same

Re: [Scikit-learn-general] Dimension Requirements on train_test_split and GridSearchCV

2014-12-18 Thread Lars Buitinck
2014-12-18 21:45 GMT+01:00 Alexandre Gramfort alexandre.gramf...@telecom-paristech.fr: there is an allow_nd param I added to support this use case. Maybe we could add it to GridSearchCV... We decided against this in the case of k-NN, where the suggestion (I don't remember by whom) was to allow

Re: [Scikit-learn-general] Problems with Python's garbage collection in GridSearch

2014-12-16 Thread Lars Buitinck
2014-12-16 17:57 GMT+01:00 Sebastian Raschka se.rasc...@gmail.com: Maybe it's something in scipy according to Manoj's linked discussion ... in any case, maybe a workaround for this issue and future issues would be to have a force_clear_gc (default=False) parameter to force the garbage

Re: [Scikit-learn-general] Exclusivity of scikit-learn

2014-12-04 Thread Lars Buitinck
2014-12-04 0:55 GMT+01:00 Joel Nothman joel.noth...@gmail.com: For example, let's say someone has implemented an algorithm (Affinity Propagation is what triggered this discussion so you might consider that). Someone else wants to come and add features to it, or even just clean the code, but by

Re: [Scikit-learn-general] pairwise.cosine_similarity(...) takes sparse inputs but forces a dense output?

2014-11-27 Thread Lars Buitinck
2014-11-27 17:26 GMT+01:00 Ian Ozsvald i...@ianozsvald.com: If safe_sparse_dot is called with dense_output=False then I get a sparse result and everything looks sensible with low RAM usage. I'm using 0.15, the current github shows the line:

Re: [Scikit-learn-general] random forest prediction performance

2014-11-18 Thread Lars Buitinck
2014-11-18 11:07 GMT+01:00 Nicola Sambin sam...@spaziodati.eu: - when I computed: for vector in vectors: classifier.predict_proba(vector) it took: 2227,99s user 90,75s system 21:29,94 total - while classifier.predict_proba(vectors) took: 1,06s user 0,39s system 1,984 total What is

Re: [Scikit-learn-general] Manually set the coefficients of a model?

2014-11-18 Thread Lars Buitinck
2014-11-18 16:38 GMT+01:00 Chris Holdgraf choldg...@berkeley.edu: Sometimes this works, but sometimes it doesn't. For example, if I use LinearSVC as my model, then setting .coef_ works fine. However, if I use SVC(kernel='linear'), then I get an attribute error when I try to set any attribute.

Re: [Scikit-learn-general] feature selection

2014-11-02 Thread Lars Buitinck
2014-11-02 22:09 GMT+01:00 Andy t3k...@gmail.com: No. That would be backward stepwise selection. Neither that, nor its forward cousin (find most discriminative feature, then second-most, etc.) are implemented in scikit-learn. Isn't RFE the backward step selection using a maximum number of

Re: [Scikit-learn-general] Getting probabilities with LinearSVC

2014-10-23 Thread Lars Buitinck
2014-10-23 16:21 GMT+02:00 George Bezerra gbeze...@gmail.com: Is there a simple way to get the probabilities that a data point belongs to a class for this model? SVMs aren't probability models. You can use LogisticRegression, that's the same algorithm but with a different loss function.

Re: [Scikit-learn-general] feature selection

2014-10-21 Thread Lars Buitinck
2014-10-21 4:14 GMT+02:00 Joel Nothman joel.noth...@gmail.com: I assume Robert's query is about RFECV. Oh wait, RFE = backward subset selection. I'm an idiot, sorry.

Re: [Scikit-learn-general] feature selection

2014-10-20 Thread Lars Buitinck
2014-10-20 22:08 GMT+02:00 George Bezerra gbeze...@gmail.com: Not an expert, but I think the idea is that you remove (or add) features one by one, starting from the ones that have the least (or most) impact. E.g., try removing a feature, if performance improves, keep it that way and move on

Re: [Scikit-learn-general] Suggestion: break up the metrics module

2014-10-14 Thread Lars Buitinck
2014-10-14 21:53 GMT+02:00 Robert Layton robertlay...@gmail.com: Currently the word metrics is overloaded with at least two type of algorithms in that module. The first is evaluation metrics and the second is functions dealing with distance metrics. My suggestion is to: 1) Move the

Re: [Scikit-learn-general] Using TFxIDF with HashingVectorizer

2014-10-09 Thread Lars Buitinck
2014-09-09 3:36 GMT+02:00 Apu Mishra apumishra...@gmail.com: Lars Buitinck larsmans@... writes: The way to combine HV and Tfidf is hashing = HashingVectorizer(non_negative=True, norm=None) tfidf = TfidfTransformer() hashing_tfidf = Pipeline([("hashing", hashing), ("tfidf", tfidf)]) I notice

Re: [Scikit-learn-general] using svmlight trained file with nltk

2014-10-08 Thread Lars Buitinck
2014-10-08 11:32 GMT+02:00 Karimkhan Pathan karimkhan...@gmail.com: can I use this trained file with nltk to classify `plain input text`? This is the scikit-learn mailing list. You should be asking on the NLTK ML.

Re: [Scikit-learn-general] feature union

2014-10-07 Thread Lars Buitinck
2014-10-07 23:03 GMT+02:00 Pagliari, Roberto rpagli...@appcomsci.com: Do I just use the bug tracker? You can, but we'd much rather have a patch :) See https://github.com/scikit-learn/scikit-learn/blob/master/CONTRIBUTING.md on how to create one. (The change is tiny, so you don't need to read the

Re: [Scikit-learn-general] MemoryError() in 'sklearn.tree._tree.Tree._resize' ignored

2014-09-23 Thread Lars Buitinck
2014-09-23 6:38 GMT+02:00 c TAKES ctakesli...@gmail.com: Thanks! What should be the proper behavior when I run the script I wrote? A MemoryError. The bug is already fixed in master.

Re: [Scikit-learn-general] scikit-learn 0.15.2 is out!

2014-09-23 Thread Lars Buitinck
2014-09-08 17:53 GMT+02:00 Yaroslav Halchenko s...@onerussian.com: On Mon, 08 Sep 2014, Yaroslav Halchenko wrote: hm... actually not clear since it claims that it is because of missing bdepends scikit-learn build-depends on missing: - libsvm-dev (= 2.84.0) Late reply, but why would it

Re: [Scikit-learn-general] MemoryError() in 'sklearn.tree._tree.Tree._resize' ignored

2014-09-20 Thread Lars Buitinck
2014-09-20 21:29 GMT+02:00 c TAKES ctakesli...@gmail.com: Exception MemoryError: MemoryError() in 'sklearn.tree._tree.Tree._resize' ignored Anyone recognize this error? All too well, but I thought it was fixed for good last time we went through the tree growing code. Which version exactly are

Re: [Scikit-learn-general] Update website Comparison of LDA and PCA 2D projection of Iris dataset

2014-09-06 Thread Lars Buitinck
2014-09-06 10:12 GMT+02:00 Gael Varoquaux gael.varoqu...@normalesup.org: On Fri, Sep 05, 2014 at 10:12:14PM -0400, Sebastian Raschka wrote: just saw that scikit-learn 15.2.0 is out and the LDA was fixed. That's great :). I missed that reviewing the patches that made it into the 0.15.2. The

Re: [Scikit-learn-general] Dirichlet priors on multinomial Bayes?

2014-09-04 Thread Lars Buitinck
2014-08-28 5:47 GMT+02:00 Josh Wasserstein ribonucle...@gmail.com: What prior does scikit-learn use for MultinomialNB? The documentation says: class_prior : array-like, size (n_classes,) Prior probabilities of the classes. If specified the priors are not adjusted according to the data. For

Re: [Scikit-learn-general] Dirichlet priors on multinomial Bayes?

2014-09-04 Thread Lars Buitinck
2014-09-04 12:30 GMT+02:00 Lars Buitinck larsm...@gmail.com: This class prior is just the P(y) in P(y|x) = (P(x|y) × P(y)) / Z. It's a simple multinomial. s/multinomial/categorical/ (I always confuse those two
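
To make the role of the class prior concrete, here is a small worked example of P(y|x) = (P(x|y) × P(y)) / Z. The two classes and all the numbers are invented for illustration; Z is simply the sum of the numerators over the classes.

```python
# Hypothetical numbers, chosen only to illustrate the formula.
prior = {"spam": 0.25, "ham": 0.75}        # P(y): a categorical distribution
likelihood = {"spam": 0.08, "ham": 0.01}   # P(x|y) for one fixed document x

joint = {y: likelihood[y] * prior[y] for y in prior}   # P(x|y) * P(y)
Z = sum(joint.values())                                # normalizer, i.e. P(x)
posterior = {y: joint[y] / Z for y in joint}           # P(y|x)
```

With these made-up values the posterior favors "spam" even though its prior is the smaller one, because the likelihood ratio outweighs the prior ratio.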

Re: [Scikit-learn-general] scikit learn classification issue

2014-09-04 Thread Lars Buitinck
2014-09-04 15:45 GMT+02:00 Karimkhan Pathan karimkhan...@gmail.com: Oh okay, well I tried with predict_proba. But if query is out of domain then classifier uniformly divide probability to all learned domains. Like in case of 4 domains (0.333123570669, 0.333073654046, 0.166936800591,

Re: [Scikit-learn-general] Print coordinate descent coefficients at each iteration

2014-09-01 Thread Lars Buitinck
2014-09-01 17:23 GMT+02:00 Alberto Torres albert...@gmail.com: I would like to print the coordinate descent coefficients at each iteration. So far I've identified the code and variable I want to print. In particular I want to print the variable w in function enet_coordinate_descent from

Re: [Scikit-learn-general] Runtime warning scikit 0.15, warning for numpy

2014-08-28 Thread Lars Buitinck
2014-08-28 11:51 GMT+02:00 Giuseppe Marco Randazzo gmranda...@gmail.com: I write for you and other people that are interested this tutorial. Hope will be helpful and more explicative http://gmrand.blogspot.ch/2014/08/howto-install-scipy-scikit-learn-and.html Given the frequency with which

Re: [Scikit-learn-general] error in grid search for KNN

2014-08-25 Thread Lars Buitinck
2014-08-25 17:12 GMT+02:00 Sheila the angel from.d.pu...@gmail.com: iris = datasets.load_iris() gp = {'n_neighbors':[2,3], 'metric':['euclidean']} clf = GridSearchCV(KNeighborsClassifier(), gp, cv=4).fit(iris.data, iris.target) TypeError: __init__() got an unexpected keyword argument 'p' Why

Re: [Scikit-learn-general] error in grid search for KNN

2014-08-25 Thread Lars Buitinck
2014-08-25 17:46 GMT+02:00 Sheila the angel from.d.pu...@gmail.com: I think the problem is in KNeighborsClassifier not in the grid_search. You're right. https://github.com/scikit-learn/scikit-learn/pull/3093 fixes it.

Re: [Scikit-learn-general] error in grid search for KNN

2014-08-25 Thread Lars Buitinck
2014-08-25 17:50 GMT+02:00 Lars Buitinck larsm...@gmail.com: 2014-08-25 17:46 GMT+02:00 Sheila the angel from.d.pu...@gmail.com: I think the problem is in KNeighborsClassifier not in the grid_search. You're right. https://github.com/scikit-learn/scikit-learn/pull/3093 fixes it. It's just been

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Lars Buitinck
2014-08-23 15:44 GMT+02:00 Pavel Soriano sorianopa...@gmail.com: I don't know if this would be helpful to anybody or if this was already discussed. That is why I am asking if it is worthy to be pull requested. Gist URL : https://gist.github.com/psorianom/0b9d8a742fe0efe0fe82 Yes! BM25 is high

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Lars Buitinck
2014-08-23 20:41 GMT+02:00 Gael Varoquaux gael.varoqu...@normalesup.org: Interesting discussion. Of course, the danger here is that it might be borderline for the scope of scikit-learn. In case somebody is going to do a PR on these topics, I would advise to work on the docstring and

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Lars Buitinck
2014-08-21 12:32 GMT+02:00 Sheila the angel from.d.pu...@gmail.com: 1. What should be the n_jobs value, 8 or (8*4=) 32 ? n_jobs is the number of CPUs you want to use, not the amount of work. (It's a misnomer because the number of jobs/work items is variable; the parameter determines the number

Re: [Scikit-learn-general] optimal n_jobs in GridSearchCV

2014-08-21 Thread Lars Buitinck
2014-08-21 13:44 GMT+02:00 Joel Nothman joel.noth...@gmail.com: I think RandomForestClassifier, using multithreading in version 0.15, should work nested in multiprocessing. It would work, but the p * n threads from p processes using n threads each would still compete for the cores, right?

Re: [Scikit-learn-general] classification algorithms that return probabilities?

2014-08-19 Thread Lars Buitinck
2014-08-19 18:03 GMT+02:00 Adamantios Corais adamantios.cor...@gmail.com: I am looking for implementations \ configurations of machine learning algorithms that, instead of a boolean value (class), they return a probability along with the corresponding confidence error. Any hints? Any

Re: [Scikit-learn-general] classification algorithms that return probabilities?

2014-08-19 Thread Lars Buitinck
2014-08-19 20:07 GMT+02:00 Adamantios Corais adamantios.cor...@gmail.com: Great. And what about the confidence error? I mean, how should I select a subset of classified data points such that the probability they belong to any class is high whereas the confidence error is 95% or above? Sorry, I

Re: [Scikit-learn-general] MNIST benchmark

2014-08-18 Thread Lars Buitinck
2014-08-17 7:26 GMT+02:00 Amey a...@cs.utah.edu: Zero-one classification Loss as : 0.0426 I would like somebody to help me interpret this in terms of benchmarks : http://yann.lecun.com/exdb/mnist/ What is the test error % metric given on the link corresponding to the metrics I have

Re: [Scikit-learn-general] penalty l1, loss l2

2014-08-15 Thread Lars Buitinck
2014-08-15 2:42 GMT+02:00 Joel Nothman joel.noth...@gmail.com: This is knowledge you should be able to obtain from almost any machine learning course or textbook, and you should almost certainly be asking it on a wider forum than scikit-learn's mailing list, such as stats.stackexchange.com.

Re: [Scikit-learn-general] Regarding Multiclass SVMs

2014-08-12 Thread Lars Buitinck
2014-08-12 18:44 GMT+02:00 Saurabh Jha saurabh.j...@gmail.com: I am trying to fix this issue. https://github.com/scikit-learn/scikit-learn/issues/3451 Can anyone please recommend some resources for multiclass SVM. I tried to search for them but got confused. I know these things regarding two

Re: [Scikit-learn-general] Using LSH Forest approximate neibghbor search in DBSCAN[GSoC]

2014-08-06 Thread Lars Buitinck
2014-08-06 7:52 GMT+02:00 Joel Nothman joel.noth...@gmail.com: Instead, could we have an interface in which the `algorithm` parameter could take any object supporting `fit(X)`, `query(X)` and `query_radius(X)`, such as an LSHForest instance? Indeed you could also make 'lsh' an available

Re: [Scikit-learn-general] How to compile scikit-learn against python 3.4 using make

2014-08-05 Thread Lars Buitinck
2014-08-05 14:27 GMT+02:00 Frank Dai soulmach...@gmail.com: I want to compile scikit-learn under python 3.4, the following commands are what I'm doing: alias python=python3 Aliases only work in your current shell, not in make. Use make PYTHON=python3

Re: [Scikit-learn-general] Sparse Random Projection Issue

2014-08-04 Thread Lars Buitinck
2014-08-04 17:39 GMT+02:00 Philipp Singer kill...@gmail.com: Apart from that, does anyone know a solution of how I can efficiently calculate the resulting matrix Y = X * X.T? I am currently thinking about using PyTables with some sort of chunked calculation algorithm. Unfortunately, this is

Re: [Scikit-learn-general] calculate the posterior probability

2014-07-31 Thread Lars Buitinck
2014-07-31 14:04 GMT+02:00 Sheila the angel from.d.pu...@gmail.com: Also the NearestCentroid classifier do not have decision_function ! I think we should add one, but I've never bothered to figure out what the right decision function would be. Inverse of distance?

Re: [Scikit-learn-general] sparse datasets loading

2014-07-29 Thread Lars Buitinck
2014-07-29 10:22 GMT+02:00 Eustache DIEMERT eusta...@diemert.fr: So my question is : is there some utility or snippet to load a CSV into CSR that I overlooked ? No, but it's not that hard to write [1]. import array data = array.array("f") indices = array.array("i") indptr = array.array("i",
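
The quoted snippet is cut off, so the following is only a sketch of the general idea, not necessarily what was posted: three growable `array.array` buffers are filled row by row with the nonzero values, their column indices, and the row boundaries. The helper name `csv_to_csr_arrays` and the assumption of a dense, purely numeric CSV are mine.

```python
import array
import csv
import io

def csv_to_csr_arrays(lines):
    """Build CSR (data, indices, indptr) buffers from dense CSV rows, skipping zeros."""
    data = array.array("f")          # nonzero values
    indices = array.array("i")       # column index of each nonzero value
    indptr = array.array("i", [0])   # row i occupies data[indptr[i]:indptr[i + 1]]
    n_cols = 0
    for row in csv.reader(lines):
        n_cols = max(n_cols, len(row))
        for j, field in enumerate(row):
            value = float(field)
            if value != 0.0:
                data.append(value)
                indices.append(j)
        indptr.append(len(data))     # close the current row
    return data, indices, indptr, (len(indptr) - 1, n_cols)

# Hypothetical three-row CSV; the middle row is all zeros.
sample = io.StringIO("0,1.5,0\n0,0,0\n2,0,3\n")
data, indices, indptr, shape = csv_to_csr_arrays(sample)
```

The resulting buffers follow the layout that `scipy.sparse.csr_matrix((data, indices, indptr), shape=shape)` expects, possibly after wrapping them with `numpy.frombuffer`.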

Re: [Scikit-learn-general] calculate the posterior probability

2014-07-29 Thread Lars Buitinck
2014-07-28 23:46 GMT+02:00 Mario Michael Krell kr...@uni-bremen.de: I have to somehow contradict. In fact it would be possible to get a probability but it requires some work. So it is not easy. In my group, we are using a sigmoid fit introduced by Platt to map SVM scores to probability values.

Re: [Scikit-learn-general] sparse datasets loading

2014-07-29 Thread Lars Buitinck
2014-07-29 14:13 GMT+02:00 Eustache DIEMERT eusta...@diemert.fr: No, but it's not that hard to write [1]. Do people see any value in either including it in the docs or encapsulating it into sklearn.datasets ? If it fits somewhere in the docs, +1 from me. I might offer this to the SciPy folks

Re: [Scikit-learn-general] sparse datasets loading

2014-07-29 Thread Lars Buitinck
2014-07-29 14:40 GMT+02:00 Joel Nothman joel.noth...@gmail.com: I think the scipy folks intend that numpy-like setting operations should suffice for many cases (although be a bit slower than the technique you've illustrated). E.g. you can use: X[i, nonzero] = data[nonzero] to replace some

Re: [Scikit-learn-general] calculate the posterior probability

2014-07-28 Thread Lars Buitinck
2014-07-28 18:39 GMT+02:00 Sheila the angel from.d.pu...@gmail.com: For the classifier which do not provide probability estimate of the class (gives error 'object has no attribute predict_proba ), is there any easy way to calculate the posterior probability? No. If there were, we would have

Re: [Scikit-learn-general] Regarding content classification using HashingVectorizer

2014-07-24 Thread Lars Buitinck
2014-07-24 4:35 GMT+02:00 Kartik Kumar Perisetla kartik.p...@gmail.com: Also, Could someone please throw some light on how HashingVectorizer works? https://larsmans.github.io/ilps-hashing-trick/ https://en.wikipedia.org/wiki/Feature_hashing
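
As a rough illustration of the feature hashing idea behind those links: each token is hashed, and the hash modulo a fixed vector length picks the column to increment, so no vocabulary has to be stored. This is a minimal sketch, not scikit-learn's implementation; the choice of `zlib.crc32`, the helper name `hash_features`, and the tiny `n_features` are assumptions made for the example.

```python
import zlib

def hash_features(tokens, n_features=16):
    """Map tokens to a fixed-length count vector via a hash of each token."""
    vec = [0] * n_features
    for tok in tokens:
        h = zlib.crc32(tok.encode("utf-8"))  # stable 32-bit hash (illustrative choice)
        vec[h % n_features] += 1             # colliding tokens share a column
    return vec

v1 = hash_features("the cat sat on the mat".split())
v2 = hash_features("the cat sat on the mat".split())
```

Because the mapping depends only on the hash function, the same text always yields the same vector, and new documents can be vectorized without a fitting step.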

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Lars Buitinck
2014-07-23 18:07 GMT+02:00 Michael Eickenberg michael.eickenb...@gmail.com: To answer 1): yes, if you set cv=number, then it will do K-fold cross-validation with that number of folds. You can do this explicitly by using from sklearn.cross_validation import KFold cv = KFold(len(data), 6)

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Lars Buitinck
2014-07-23 18:21 GMT+02:00 Pagliari, Roberto rpagli...@appcomsci.com: Is there a way to make prediction, once grid search is done? Right now I’m getting the error 'GridSearchCV' object has no attribute 'best_estimator_' Works fine here. What does `python -c 'import sklearn;

Re: [Scikit-learn-general] GridSearchVC with SVM

2014-07-23 Thread Lars Buitinck
2014-07-23 21:31 GMT+02:00 Pagliari, Roberto rpagli...@appcomsci.com: It says 0.15.0 Right now I am finding the optimal values manually, using cross_validation (by picking the best average). That can't be right. This attribute was in place in at least 0.14.0. How did you install

Re: [Scikit-learn-general] discrepancy of results with sklearn grid_search

2014-07-22 Thread Lars Buitinck
2014-07-23 0:58 GMT+02:00 Pagliari, Roberto rpagli...@appcomsci.com: Also, notice that I had to use gs.best_estimator_, and not gs.best_estimator, and also that the module name for me is sklearn and not scikits.learn. Has there been a change in recent versions? No. Not in recent versions. We

Re: [Scikit-learn-general] Cython profiling question

2014-07-16 Thread Lars Buitinck
2014-07-16 16:43 GMT+02:00 Andy t3k...@gmail.com: I'm pretty sure I could use yep for profiling, as mentioned in the docs: http://scikit-learn.org/dev/developers/performance.html#profiling-compiled-extensions and get line-by-line counts. However I did not manage to do that recently. I usually

Re: [Scikit-learn-general] Cython profiling question

2014-07-16 Thread Lars Buitinck
2014-07-16 17:29 GMT+02:00 Andy t3k...@gmail.com: That is using google perftools. I thought you were referring to the bit about gprof. So you get line-by-line with google perftools without using debugging versions? How? I don't, I look at per-function cost.

Re: [Scikit-learn-general] Problem with the online documentation/website!!!

2014-07-09 Thread Lars Buitinck
2014-07-09 16:11 GMT+02:00 Olivier Grisel olivier.gri...@ensta.org: Prior to getting this message I notice that the load time of the home was slow. It might be caused by big thumbnails in the carousel. IIRC, @larsmans and @jaquesgrobler had worked to reduce the size of the thumbnails after the

Re: [Scikit-learn-general] Problem with the online documentation/website!!!

2014-07-09 Thread Lars Buitinck
2014-07-09 16:52 GMT+02:00 Lars Buitinck larsm...@gmail.com: I think that was done on the webserver rather than in version control. But I just cherry-picked the relevant commit into 0.14.X and pushed it. https://github.com/scikit-learn/scikit-learn/pull/3355 has a makefile target that runs

Re: [Scikit-learn-general] How the classifiers of sklearn are modified to handle sample weighting ?

2014-07-09 Thread Lars Buitinck
2014-07-09 19:56 GMT+02:00 Manoj Kumar manojkumarsivaraj...@gmail.com: Intuitively, a sample with higher weight, should be predicted more accurately and hence contribute more to the loss. Hence we just multiply, the sample weight of each term to its loss. For example, in the least square
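
The "multiply each term of the loss by its sample weight" idea can be sketched for a squared-error loss as follows. The function name and the numbers are invented for illustration, and this shows only the loss computation, not how any particular estimator optimizes it.

```python
def weighted_squared_loss(y_true, y_pred, sample_weight):
    """Sum of per-sample squared errors, each scaled by its sample weight."""
    return sum(w * (t - p) ** 2
               for t, p, w in zip(y_true, y_pred, sample_weight))

# Hypothetical targets and predictions.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]

unweighted = weighted_squared_loss(y_true, y_pred, [1, 1, 1])
upweighted = weighted_squared_loss(y_true, y_pred, [1, 1, 3])

# An integer weight of 3 behaves like repeating that sample three times.
repeated = weighted_squared_loss([1.0, 2.0, 3.0, 3.0, 3.0],
                                 [1.5, 2.0, 2.0, 2.0, 2.0],
                                 [1, 1, 1, 1, 1])
```

Raising the weight of the third sample makes its error dominate the total, so a fitting procedure that minimizes this loss is pushed to predict that sample more accurately.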

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Lars Buitinck
2014-07-08 16:00 GMT+02:00 Michael Eickenberg michael.eickenb...@gmail.com: That totally depends on your data. Here it looks like you are scaling down a feature that captures a lot of the variation you are looking for, thus making it less important with respect to the other features in the

Re: [Scikit-learn-general] higher accuracy with non scaled data

2014-07-08 Thread Lars Buitinck
2014-07-08 16:27 GMT+02:00 Sheila the angel from.d.pu...@gmail.com: First I scaled the complete data-set and then splitting it in test and train data. Not the cleanest option, but that should work.

Re: [Scikit-learn-general] Is there a reference for the Bootstrap iterator?

2014-07-08 Thread Lars Buitinck
2014-07-08 21:33 GMT+02:00 Andreas Mueller t3k...@gmail.com: On Jul 8, 2014 8:40 PM, Chris Holdgraf choldg...@berkeley.edu wrote: Hey all - I know that Bootstrap has a billion papers on it, but I was wondering if there's a specific paper one should reference if we've been using the Bootstrap

Re: [Scikit-learn-general] Handle sparse data on Instance Reduction

2014-07-04 Thread Lars Buitinck
2014-07-04 10:28 GMT+02:00 Olivier Grisel olivier.gri...@ensta.org: 2014-07-04 3:35 GMT+02:00 Dayvid Victor victor.d...@gmail.com: Should I do the classifier setup in the __init__ (passing all arguments of the KNN to in the InstanceReduction constructor)? You might want to pass a KNN instance

Re: [Scikit-learn-general] Regarding partial_fit in naive_bayes

2014-07-04 Thread Lars Buitinck
2014-07-04 16:37 GMT+02:00 Kyle Kastner kastnerk...@gmail.com: You should probably read the paper: Training Highly Multiclass Classifiers http://jmlr.org/papers/v15/gupta14a.html That said, I think you could gain a lot of value by looking into hierarchical approaches - training a bunch of

Re: [Scikit-learn-general] Regarding partial_fit in naive_bayes

2014-07-03 Thread Lars Buitinck
2014-07-03 12:23 GMT+02:00 Kartik Kumar Perisetla kartik.p...@gmail.com: I am trying to use naive_bayes algorithm for training the model using partial_fit in scikit-learn. I tried with 16011 (# of features), 100 training instances and 1018664 (total # of classes), I get an error when I invoke

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Lars Buitinck
2014-07-01 21:03 GMT+02:00 Geetu Ambwani geet...@gmail.com: I imagine this transformer would be useful to others who use lucene for text analysis and already have access to term vectors and have the partial pipeline but might still want access to the various weighting schemes available in

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Lars Buitinck
2014-07-01 23:44 GMT+02:00 Joel Nothman joel.noth...@gmail.com: Calculating TfIdf really isn't that hard. It's much easier for you to do so while transforming that into DictVectorizer input than for the library to be everything to everyone. Indeed. I just indexed 20news in ES, then did $

Re: [Scikit-learn-general] Extending TfIdf Vectorizer to use given idf set

2014-07-01 Thread Lars Buitinck
2014-07-01 23:58 GMT+02:00 Michael Eickenberg michael.eickenb...@gmail.com: (the 4th one is typically a kwarg it didn't care about) Ah: from elasticsearch import Elasticsearch es = Elasticsearch() hits = [es.termvector('20news', 'post', i, fields=['text']) for i in range(1, 4)] does the trick,

Re: [Scikit-learn-general] Difference between sklearn.feature_selection.chi2 and scipy.stats.chi2_contingency

2014-06-30 Thread Lars Buitinck
2014-06-30 0:28 GMT+02:00 Christian Jauvin cjau...@gmail.com: What explains the difference in terms of the Chi-Square value (0.5 vs 2) and the P-value (0.48 vs 0.157)? Here's the feature_selection.chi2 algorithm: A = numpy.vstack(([[0,0]] * 18, [[0,1]] * 7, [[1,0]] * 42, [[1,1]] * 33)) X =
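
The quoted code is cut off after "X =", so the following is only a sketch of one way the two statistics can arise from the same 100 rows. It assumes the first column of A is the feature and the second is the class label. The first statistic compares per-class feature sums with their expected values (two cells); the second is the Pearson statistic over the full 2x2 contingency table, without continuity correction (four cells).

```python
# Rows of A are (feature, label) pairs; this split is an assumption, since the
# quoted snippet is truncated right after "X =".
A = [[0, 0]] * 18 + [[0, 1]] * 7 + [[1, 0]] * 42 + [[1, 1]] * 33
x = [row[0] for row in A]
y = [row[1] for row in A]
n = len(A)

# Feature-sum statistic: observed = feature total within each class,
# expected = overall feature total * class frequency.
feature_total = sum(x)
chi2_feature = 0.0
for c in (0, 1):
    observed = sum(xi for xi, yi in zip(x, y) if yi == c)
    expected = feature_total * sum(1 for yi in y if yi == c) / n
    chi2_feature += (observed - expected) ** 2 / expected

# Pearson statistic on the full 2x2 table: expected = row total * column total / n.
chi2_table = 0.0
for a in (0, 1):
    for c in (0, 1):
        observed = sum(1 for xi, yi in zip(x, y) if xi == a and yi == c)
        row_total = sum(1 for xi in x if xi == a)
        col_total = sum(1 for yi in y if yi == c)
        expected = row_total * col_total / n
        chi2_table += (observed - expected) ** 2 / expected
```

Under these assumptions `chi2_feature` comes out at 0.5 and `chi2_table` at 2.0, matching the two values in the question: the feature-sum version ignores the cells where the feature is 0, which contribute the remaining 1.5.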

Re: [Scikit-learn-general] SVC.predict_proba result inconsistent with SVC.predict result

2014-06-26 Thread Lars Buitinck
2014-06-26 9:15 GMT+02:00 Andy t3k...@gmail.com: Maybe the calibration is not used for prediction? That would be a bit odd, though... That's exactly what's going on. Prediction is consistent with decision_function, but not predict_proba.

Re: [Scikit-learn-general] Explicitly mention loadings in PCA documentation/examples

2014-06-26 Thread Lars Buitinck
2014-06-26 11:36 GMT+02:00 federico vaggi vaggi.feder...@gmail.com: Would a pull request clarifying this be welcome, or do people think it's clear enough as is? Maybe mentioning that this is commonly known as 'loadings' would be enough. That sounds like a good idea.

Re: [Scikit-learn-general] Scikit learn's multiprocessing

2014-06-25 Thread Lars Buitinck
2014-06-25 4:50 GMT+02:00 Sturla Molden sturla.mol...@gmail.com: In general, only POSIX APIs are safe to use on both sides of a fork Actually, only a short list of async-signal-safe library routines [1, 2]. Practically all of POSIX is off-limits after fork in a multithreaded program. [1]

Re: [Scikit-learn-general] Using json files as dataset for clustering

2014-06-18 Thread Lars Buitinck
2014-06-18 19:05 GMT+02:00 Abijith Kp abijith@gmail.com: I would like to use a Json file from which I take the dataset for my clustering algorithm. The Json would be in the form of a nested dictionary. It would be great if someone could show me the correct direction regarding how to load

Re: [Scikit-learn-general] Exporting a scikit learn model

2014-06-16 Thread Lars Buitinck
2014-06-16 16:56 GMT+02:00 Joel Nothman joel.noth...@gmail.com: There is, at present, no standard way to do this (although PMML has been mooted). It depends entirely on which model class you want to export. Which? Apparently there's a third-party scikit-learn - PMML adapter package now:

Re: [Scikit-learn-general] Flexible Naive Bayes

2014-06-11 Thread Lars Buitinck
2014-06-11 15:54 GMT+02:00 Gavin Gray s0805...@sms.ed.ac.uk: I need to use Naive Bayes for mixed categorial and numerical data and was thinking of implementing a flexible Naive Bayes algorithm similar to Weka's instead of hacking my way around by converting the numerical to categorical or

Re: [Scikit-learn-general] Flexible Naive Bayes

2014-06-11 Thread Lars Buitinck
2014-06-11 18:16 GMT+02:00 Gavin Gray s0805...@sms.ed.ac.uk: Yeah, you'd have to hand in a vector listing which distribution to use for each element in the feature vector. Weka might have a way round this, but I'll have to try using it to see what the interface is like. They reference a paper

Re: [Scikit-learn-general] Question with csc_matrix/csr_matrix and concatenations

2014-06-05 Thread Lars Buitinck
2014-06-05 13:44 GMT+02:00 ZORAIDA HIDALGO SANCHEZ zora...@tid.es: Why SelectKBest returns a csr_matrix and is that efficient? Because it's efficient (compact storage, fast row-wise operations such as matrix multiplication with CSR on the left), because it's easy to generate, and because CSR-CSC

Re: [Scikit-learn-general] Question with csc_matrix/csr_matrix and concatenations

2014-06-05 Thread Lars Buitinck
2014-06-05 15:04 GMT+02:00 Joel Nothman joel.noth...@gmail.com: It might also be appropriate to use sklearn.pipeline.FeatureUnion which will perform an hstack on the output of a number of Transformers. This may be complicated, but ensures that the transformers are fit correctly, particularly

Re: [Scikit-learn-general] Releasing 0.15

2014-06-05 Thread Lars Buitinck
2014-06-05 15:12 GMT+02:00 Olivier Grisel olivier.gri...@ensta.org: Actually I am pretty sure that this will prove problematic as we might want to have checkins specific for the 0.15.0 release such as version change, website tweaks and such. I think it's better to explicitly backport the

Re: [Scikit-learn-general] Releasing 0.15

2014-06-05 Thread Lars Buitinck
2014-06-05 15:24 GMT+02:00 Olivier Grisel olivier.gri...@ensta.org: I don't think this is possible and this is not what we did for previous releases: It's certainly possible, git merge --strategy=ours does a null merge where the result is just the state of master. The question is whether we

Re: [Scikit-learn-general] Releasing 0.15

2014-06-05 Thread Lars Buitinck
2014-06-05 15:43 GMT+02:00 Olivier Grisel olivier.gri...@ensta.org: The question is more in the event we start having release specific changes to the website such as some done in: https://github.com/scikit-learn/scikit-learn/commit/9cf67f961e3d0b493f432df0201d91bd0a7dcedf Right. Ok.

Re: [Scikit-learn-general] Storing the values of internal nodes in DecisionTrees

2014-06-02 Thread Lars Buitinck
2014-06-02 5:43 GMT+02:00 John Prior john.w.pr...@gmail.com: Does the optimization really save that much time/space? Yes, it does. We can now use multithreading instead of separate worker processes with each a copy of X/y. There was a PR for a tree-nested dict conversion at some point. I think

Re: [Scikit-learn-general] Anyone experience hanging when parallelizing fits?

2014-05-30 Thread Lars Buitinck
2014-05-30 22:34 GMT+02:00 Anders Aagaard aagaa...@gmail.com: Which blas implementation are you using? openblas is known to cause this issue. Same thought here, but this time it's MKL.

Re: [Scikit-learn-general] Fwd: Pull request for multiple target SGDRegression

2014-05-28 Thread Lars Buitinck
2014-05-28 15:57 GMT+02:00 Andrew O'Harney ohar...@gmail.com: I'm new to the mailing list, so sorry if this is the wrong way to go about requesting information about work that is underway. I was wondering if there was any work planned on supporting multiple targets for the SGDRegression

Re: [Scikit-learn-general] Unexpected behavior using numpy.asarray with RandomForestClassifier

2014-05-26 Thread Lars Buitinck
2014-05-24 0:28 GMT+02:00 Steven Kearnes skear...@gmail.com: a is a list of the individual DecisionTreeClassifier objects belonging to the model, instead of a list containing the model itself. The same result occurs if I add dtype=object to np.asarray. Why is this happening? Is there a way to
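A possible explanation for the behaviour described above, offered as a sketch and not as the reply given in the thread: NumPy unpacks any object that behaves like a sequence (one that defines `__len__` and `__getitem__`), and scikit-learn's ensemble estimators expose their sub-estimators that way. `TinyEnsemble` below is a hypothetical stand-in, used so the example runs without fitting a forest. The last lines show one common workaround, which is to allocate an object array first and then assign into it.

```python
import numpy as np


class TinyEnsemble:
    """Hypothetical stand-in for an ensemble exposing its members as a sequence."""

    def __init__(self, members):
        self.members = list(members)

    def __len__(self):
        return len(self.members)

    def __getitem__(self, index):
        return self.members[index]


model = TinyEnsemble(["tree0", "tree1", "tree2"])

# NumPy sees a sequence inside the list and unpacks it into a second axis,
# even when dtype=object is requested.
unpacked = np.asarray([model], dtype=object)

# Workaround: create the object array first, then store the model in a slot.
boxed = np.empty(1, dtype=object)
boxed[0] = model
```

Here `unpacked` holds the three members, while `boxed` holds the model itself.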

Re: [Scikit-learn-general] My talk was approved for EuroScipy'14

2014-05-23 Thread Lars Buitinck
2014-05-22 8:13 GMT+02:00 Gilles Louppe g.lou...@gmail.com: Just to let you know, my talk "Accelerating Random Forests in Scikit-Learn" was approved for EuroScipy'14. Details can be found at https://www.euroscipy.org/2014/schedule/presentation/9/. My slides are far from being ready, but my

Re: [Scikit-learn-general] My talk was approved for EuroScipy'14

2014-05-23 Thread Lars Buitinck
2014-05-23 11:08 GMT+02:00 Gilles Louppe g.lou...@gmail.com: Thanks! Oh, I would be interested in seeing them. Could you send me the link if you still have them? Here's one with quicksort:

Re: [Scikit-learn-general] My talk was approved for EuroScipy'14

2014-05-23 Thread Lars Buitinck
2014-05-23 11:35 GMT+02:00 Gilles Louppe g.lou...@gmail.com: Thanks! This is really cool! I think I'll try to reproduce some of them and put one or two in my slides. I used Fabian's extension_profiler to produce these. https://github.com/fabianp/extension_profiler

Re: [Scikit-learn-general] Using CBLAS libraries externally, setup (Quick Question)

2014-05-21 Thread Lars Buitinck
2014-05-21 13:47 GMT+02:00 Olivier Grisel olivier.gri...@ensta.org: This is a great trick. We might want to get rid of our own partial copy of CBLAS at some point. I remember Radim (gensim maint) describing some trouble with BLAS ABIs on a mailing list some time ago, but I can't find the mail

Re: [Scikit-learn-general] Using CBLAS libraries externally, setup (Quick Question)

2014-05-21 Thread Lars Buitinck
2014-05-21 13:59 GMT+02:00 Sergio Pascual sergio.pa...@gmail.com: This is the patch we use in fedora to compile scikit-learn 0.14.1 with system cblas http://pkgs.fedoraproject.org/cgit/python-scikit-learn.git/tree/sklearn-unbundle-cblas.patch But the build system should skip our copy if CBLAS

Re: [Scikit-learn-general] sklearn 0.14.0 TypeError: 'NDArrayWrapper' object does not support indexing

2014-05-12 Thread Lars Buitinck
2014-05-10 19:58 GMT+02:00 Xiandi Zhang zxd_ci...@hotmail.com: I can run the same successfully in sklearn 0.13. But got this error when I upgraded to 0.14. This is my first post in this list. If there is anything I need to specifically provide, in case of reporting an issue, please let me

Re: [Scikit-learn-general] Scikit-learn options for mono-GPU / GPU-cluster Parallelised code-execution

2014-05-07 Thread Lars Buitinck
2014-05-07 9:41 GMT+02:00 Matthieu Brucher matthieu.bruc...@gmail.com: IMHO GPUs will be usable once CPU and GPU memories are integrated without a transfer cost. Until then, GPUs will be hype without mainstream usage. My thought exactly. Without mapping device memory into the CPU's virtual memory, no

Re: [Scikit-learn-general] training score in GridSearchCV?

2014-05-03 Thread Lars Buitinck
2014-05-03 11:34 GMT+02:00 Joel Nothman joel.noth...@gmail.com: ... as long as no one ever wrote code like: for parameters, mean_validation_score, cv_validation_scores in clf.grid_scores_: That is actually how tuples are supposed to be used... I suggest next time we have to return custom
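To make the concern about the quoted loop concrete, here is a minimal sketch using only the standard library. The `Score` record and the extra `mean_training_score` field are hypothetical; the field names simply mirror the loop variables quoted above. It shows that positional unpacking works until a field is appended, while access by attribute name keeps working.

```python
from collections import namedtuple

# Hypothetical record type; field names mirror the quoted loop variables.
Score = namedtuple(
    "Score", ["parameters", "mean_validation_score", "cv_validation_scores"]
)

grid_scores = [
    Score({"C": 1}, 0.80, [0.78, 0.82]),
    Score({"C": 10}, 0.85, [0.84, 0.86]),
]

# Positional unpacking, as in the quoted loop.
means_by_unpacking = [mean for parameters, mean, cv_scores in grid_scores]

# Attribute access does not depend on the number of fields.
means_by_name = [score.mean_validation_score for score in grid_scores]

# Appending a (hypothetical) training score breaks three-way unpacking.
ExtendedScore = namedtuple(
    "ExtendedScore", Score._fields + ("mean_training_score",)
)
extended = [ExtendedScore({"C": 1}, 0.80, [0.78, 0.82], 0.95)]
try:
    for parameters, mean, cv_scores in extended:
        pass
    unpacking_broke = False
except ValueError:
    unpacking_broke = True

# Access by name still works on the extended record.
extended_means = [score.mean_validation_score for score in extended]
```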

Re: [Scikit-learn-general] Why *sorted* feature_names_ in dict_vectorizer.fit?

2014-05-01 Thread Lars Buitinck
2014-05-01 15:59 GMT+02:00 Ian Ozsvald i...@ianozsvald.com: Hello. I'm looking at feature_extraction.dict_vectorizer and I'm wondering why fit() and restrict() use a sorted list of feature names rather than their naturally-encountered order? Is there an algorithmic requirement somewhere for
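The difference the question is about can be sketched in plain Python. This is not DictVectorizer's implementation and does not claim to be the reason it sorts; `build_vocabulary` is a hypothetical helper that only shows how the name-to-column mapping behaves under each policy. With encounter order, the columns depend on the order of the samples; with sorted names, they do not.

```python
def build_vocabulary(samples, sort_names):
    """Map feature names to column indices (hypothetical helper)."""
    names = []
    seen = set()
    for sample in samples:
        for name in sample:
            if name not in seen:
                seen.add(name)
                names.append(name)
    if sort_names:
        names.sort()
    return {name: index for index, name in enumerate(names)}


# The same two samples, presented in different orders.
samples = [{"b": 1, "a": 2}, {"c": 3}]
shuffled = [{"c": 3}, {"a": 2, "b": 1}]

encounter_a = build_vocabulary(samples, sort_names=False)
encounter_b = build_vocabulary(shuffled, sort_names=False)
sorted_a = build_vocabulary(samples, sort_names=True)
sorted_b = build_vocabulary(shuffled, sort_names=True)
```

With sorting, both orderings of the data yield the same columns, which makes fitted vectorizers comparable across runs.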

Re: [Scikit-learn-general] HMM Incorporation

2014-04-28 Thread Lars Buitinck
2014-04-28 18:56 GMT+02:00 Jacob Schreiber jmschreibe...@gmail.com: I understand that HMMs do not perform classification in the same manner as SVMs or Random Forest, but why is it not desirable to create a new section to handle HMMs and possibly other graphical models? They seem like an

Re: [Scikit-learn-general] Classifier for Neural Networks

2014-04-27 Thread Lars Buitinck
2014-04-27 18:35 GMT+02:00 Danny Sullivan dsulliv...@hotmail.com: Is there an intention to add a classifier, like a predict method, to the BernoulliRBM class? You mean a discriminative RBM?

Re: [Scikit-learn-general] Classifier for Neural Networks

2014-04-27 Thread Lars Buitinck
2014-04-27 19:06 GMT+02:00 Danny Sullivan dsulliv...@hotmail.com: I see that BernoulliRBM is used primarily as preprocessing to pass off to a classification algorithm. I initially started thinking about it after I saw an image processing problem using Neural Networks for classification. So my

Re: [Scikit-learn-general] Using CBLAS libraries externally, setup (Quick Question)

2014-04-27 Thread Lars Buitinck
2014-04-27 18:53 GMT+02:00 Sturla Molden sturla.mol...@gmail.com: Unlike the NumPy _dotblas module, SciPy uses an f2py wrapper that actually exports a function pointer. Using this scheme to code a fake cblas layer is not difficult either. I think this question comes about because Manoj is
