Re: [Scikit-learn-general] sklearn Hackathon during ICML ?

2016-04-12 Thread Vlad Niculae
I would definitely join the sprint, anything after June 17 works for me. I was thinking to come hang around during ICML, even if I might not be able to afford the conference. Cheers, Vlad On Tue, Apr 12, 2016 at 11:39 AM, Andreas Mueller wrote: > So should we pick another or

Re: [Scikit-learn-general] Latent Dirichlet Allocation

2016-02-09 Thread Vlad Niculae
I usually use an absolute threshold for min_df and a relative one for max_df. I find it very useful to look at the histogram of word dfs for choosing the latter, it varies a lot from dataset to dataset. For short texts, like tweets, words such as "the" can have a df of 0.1. It's very easy to look
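
A minimal sketch of the inspection described above, assuming a toy corpus and plain whitespace tokenization (both are illustrative choices, not taken from the message). It computes each word's document frequency so the values can be examined before settling on an absolute min_df and a relative max_df:

```python
from collections import Counter

# Toy corpus; whitespace tokenization is an assumption for illustration.
docs = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "a cat and a dog",
    "the weather is nice",
]

# Count, for every word, the number of documents it occurs in.
doc_counts = Counter()
for doc in docs:
    doc_counts.update(set(doc.lower().split()))

n_docs = len(docs)
relative_df = {word: count / n_docs for word, count in doc_counts.items()}

# Absolute threshold for min_df, relative threshold for max_df.
min_df = 2      # keep words that occur in at least 2 documents
max_df = 0.7    # drop words that occur in more than 70% of documents
vocabulary = sorted(
    word for word, count in doc_counts.items()
    if count >= min_df and count / n_docs <= max_df
)

# Show the most frequent words first, as one would read off a histogram.
for word, df in sorted(relative_df.items(), key=lambda item: -item[1])[:3]:
    print(f"{word}: {df:.2f}")
print(vocabulary)
```

On a real dataset the `relative_df` values would be plotted as a histogram; the thresholds here are placeholders.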

Re: [Scikit-learn-general] Parameter estimation by Customised Cross Validation

2016-02-05 Thread Vlad Niculae
Hi Mamun, If your cluster labels are known, you can use the LabelShuffleSplit or LeavePLabelOut cross-validation generators. HTH, Vlad On Fri, Feb 5, 2016 at 10:05 AM, Mamun Rashid wrote: > Hi Folks, > I have a two class classification problem where the positive
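
To illustrate what a label-aware split guarantees, here is a hand-rolled sketch in plain NumPy. It is a stand-in for the idea, not the library's generators: the function name `leave_one_group_out` and the cluster labels are hypothetical, and only the simplest case (one held-out label per split) is covered.

```python
import numpy as np

def leave_one_group_out(groups):
    """Yield (train_idx, test_idx) pairs, holding out one group at a time."""
    groups = np.asarray(groups)
    for held_out in np.unique(groups):
        test_mask = groups == held_out
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical cluster labels for six samples.
clusters = np.array([0, 0, 1, 1, 2, 2])
splits = list(leave_one_group_out(clusters))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

No cluster ever appears on both sides of a split, which is the property that matters when samples within a cluster are correlated.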

Re: [Scikit-learn-general] maximum and minimum regularization for NMF

2016-02-02 Thread Vlad Niculae
Hi James, I'm not sure how useful a minimum alpha would be. Even if no weights are shrunk quite to zero, the regularization can still impact performance metrics. I would be curious what application you have in mind for this. The max alpha question is interesting, I am curious as well. (Sorry my

Re: [Scikit-learn-general] Analyzer and tokenizer in (Count/TfIdf)Vectorizer

2015-12-07 Thread Vlad Niculae
In the case of "char_wb" it sounds indeed like a custom tokenizer should be called if given. That would require a different implementation than the current one, however. You might want to file an issue. Sebastian's suggestion works, but note that scikit-learn's default tokenization is not the

Re: [Scikit-learn-general] passing parameters to a transformer

2015-04-29 Thread Vlad Niculae
Is there a reason why you are (still) not respecting the API constraints for custom estimators given in the documentation? __init__ should only set parameters on self that have (exactly) the same name as the arguments passed to it. Your __init__ should be: self.k = k
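
The convention matters because parameter discovery works by reading the argument names of `__init__` and then looking up attributes with the same names. The sketch below imitates that mechanism in plain Python; `SelectTopK` and `params_from_init` are hypothetical names for illustration, and the helper is a simplified imitation rather than the library code.

```python
import inspect

class SelectTopK:
    """Hypothetical transformer that follows the convention."""

    def __init__(self, k=10):
        # Store the argument unchanged, under exactly the same name.
        self.k = k

def params_from_init(estimator):
    """Imitate parameter discovery: names from __init__, values from self."""
    signature = inspect.signature(type(estimator).__init__)
    names = [name for name in signature.parameters if name != "self"]
    return {name: getattr(estimator, name) for name in names}

original = SelectTopK(k=3)
params = params_from_init(original)
copy = SelectTopK(**params)  # rebuilding from the parameters works
print(params, copy.k)
```

If `__init__` stored the value under another name, or transformed it first, the `getattr` lookup would fail or return something that cannot be passed back to the constructor.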

Re: [Scikit-learn-general] Topic extraction

2015-04-29 Thread Vlad Niculae
Another thing I've seen people do is to threshold based on the difference between the scores of the best and second best topics. (Only take documents with a clear winning topic.) For estimating the number of topics, you can use cross-validation. Vlad On Wed, Apr 29, 2015 at 12:42 AM, Joel
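
A small sketch of the margin-based thresholding described above, assuming a made-up document-topic score matrix and an arbitrary threshold of 0.3:

```python
import numpy as np

# Hypothetical document-topic scores: one row per document.
doc_topic = np.array([
    [0.70, 0.20, 0.10],   # clear winner
    [0.40, 0.35, 0.25],   # ambiguous
    [0.05, 0.15, 0.80],   # clear winner
])

# Sort each row so the two largest scores end up in the last two columns.
ordered = np.sort(doc_topic, axis=1)
margin = ordered[:, -1] - ordered[:, -2]

threshold = 0.3  # illustrative value, to be tuned per dataset
confident = margin >= threshold

# Documents without a clear winning topic are marked with -1.
assigned = np.where(confident, doc_topic.argmax(axis=1), -1)
print(assigned)
```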

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
Hi Roberto what does None do for max_depth? Copy-pasted from http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html “If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.” In particular,

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
, Pagliari, Roberto wrote: Hi Vlad, when using randomized grid search, does sklearn look into intermediate values, or does it samples from the values provided in the parameter grid? Thank you, From: Vlad Niculae [zephy...@gmail.com] Sent: Monday, April 20

Re: [Scikit-learn-general] randomized grid search

2015-04-20 Thread Vlad Niculae
-optimization Vlad On 20 Apr 2015, at 15:34, Vlad Niculae zephy...@gmail.com wrote: The example you cite contains these lines: max_features: sp_randint(1, 11), min_samples_split: sp_randint(1, 11), min_samples_leaf: sp_randint(1, 11), Those
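
For reference, `sp_randint` in the quoted lines is conventionally the import alias for `scipy.stats.randint`, a discrete uniform distribution whose upper bound is exclusive. A minimal sketch of drawing from it directly, assuming SciPy is installed; the seed and sample size are arbitrary:

```python
from scipy.stats import randint as sp_randint

# Frozen discrete uniform distribution over the integers 1..10
# (the upper bound 11 is excluded).
dist = sp_randint(1, 11)

# Draw candidate values the way a randomized search would: by sampling,
# not by walking a fixed grid.
draws = dist.rvs(size=20, random_state=0)
print(draws.min(), draws.max())
```

A parameter given as a plain list is instead sampled from its listed values only, so intermediate values are never tried in that case.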

Re: [Scikit-learn-general] GSoC 2015: Global optimization based Hyper parameter optimization (SMAC)

2015-03-31 Thread Vlad Niculae
In order to support discrete parameters, our tree implementation would need to support categorical variables though. Ah, good point, I didn’t think about that. But we could use the usual hacks (integer or one-hot encoding). I wonder how that compares to using GPs and rounding when it

Re: [Scikit-learn-general] GSoC 2015: Global optimization based Hyper parameter optimization (SMAC)

2015-03-31 Thread Vlad Niculae
Hi Gael, On 31 Mar 2015, at 14:01, Gael Varoquaux gael.varoqu...@normalesup.org wrote: Why do you think the GP route is easier? Because we already have GPs. Well, we already have random forests too. Both cases would need quite a bit of machinery on top, and I don’t know the extent of

Re: [Scikit-learn-general] [GSoC 2015] Cross-validation and Meta-Estimators for semi-supervised learning

2015-03-24 Thread Vlad Niculae
Hi Boyuan, hi everyone, On top of what Andy said, I would like to add that you don’t have to commit to certain algorithms in the proposal, as long as you make the plan very clear, and you leave time for discussing alternatives, pros and cons with the community. Since you say there is some

Re: [Scikit-learn-general] GSoC 2015 Proposal: Multiple Metric Learning

2015-03-24 Thread Vlad Niculae
Hi Raghav, hi everyone, If I may, I have a very high-level comment on your proposal. It clearly shows that you are very involved in the project and understand the internals well. However, I feel like it’s written from a way too technical perspective. Your proposal contains implementation

Re: [Scikit-learn-general] GSoC2015 Improve GMM

2015-03-24 Thread Vlad Niculae
Hi Wei Xue, hi everyone, I think Andy’s comments about testing and documentation are very important. I have just a few things to add: 1. As confused as I am about the world around me, I still knew that the current year is 2015 :P I think that the form is asking “which year of your program you

Re: [Scikit-learn-general] GSoC2015 Hyperparameter Optimization topic

2015-03-24 Thread Vlad Niculae
Hi Cristoph, Gael, hi everyone, On 24 Mar 2015, at 18:09, Gael Varoquaux gael.varoqu...@normalesup.org wrote: Don't you think that I could also benchmark models that are not implemented in sklearn? […] I am personally less interested in that. We have already a lot in scikit-learn and

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-23 Thread Vlad Niculae
Hi Vinayak, The wiki page just lists a subset of possible topics for which candidates already showed concrete interest. I think an application for low-rank matrix completion would be more than welcome. It’s very important to work on a topic that you are interested in directly, versus just

Re: [Scikit-learn-general] Question regarding the list of topics for GSoC 2015

2015-03-23 Thread Vlad Niculae
some work upon that, but I didn't get any feedback. On Tue, Mar 24, 2015 at 3:23 AM, Vlad Niculae zephy...@gmail.com wrote: Hi Vinayak, The wiki page just lists a subset of possible topics for which candidates already showed concrete interest. I think an application for low-rank matrix

Re: [Scikit-learn-general] Regarding viewing the decision boundaries of classifiers

2015-02-21 Thread Vlad Niculae
Apologies in advance, but this fits so well, I couldn’t help myself. A Mathematician and an Engineer attend a lecture by a Physicist. The topic concerns Kaluza-Klein theories involving physical processes that occur in spaces with dimensions of 9, 12 and even higher. The Mathematician is sitting,

Re: [Scikit-learn-general] same cross validation score with different parameter configurations

2015-02-18 Thread Vlad Niculae
Hi Roberto, This is explained in the Python standard library documentation: https://docs.python.org/3/library/functions.html#sorted Cheers, Vlad On 18 Feb 2015, at 21:33, Pagliari, Roberto rpagli...@appcomsci.com wrote: what does sorted do if the best average cv score is the same? how
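
The relevant property is that `sorted` is stable: items that compare equal keep their original relative order, also with `reverse=True`. A short sketch with made-up parameter settings and scores (how the library itself calls `sorted` is not shown in this thread, so the key function here is an assumption):

```python
# Hypothetical (parameters, mean CV score) pairs; the first two tie.
results = [
    ({"C": 1}, 0.90),
    ({"C": 10}, 0.90),
    ({"C": 100}, 0.85),
]

# Sort by score only; tied entries stay in their original order.
ranked = sorted(results, key=lambda item: item[1], reverse=True)
best_params, best_score = ranked[0]
print(best_params)
```

With a tie on the score, the setting that came first in the input is the one reported as best.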

Re: [Scikit-learn-general] which methods do I need to implement for a regressor?

2015-02-16 Thread Vlad Niculae
Hi Roberto, This is all documented in more detail here: [1] The transform looks good (just that you might want to add a flag to avoid memory copies when you can afford to destroy the original data). It’s not clear what the intention of `my_param` is here. It’s not user specified, right?

Re: [Scikit-learn-general] custom regressor keeps failing

2015-02-16 Thread Vlad Niculae
Hi Roberto, Everything I say below is also explained in the developers documentation that I linked to in the other e-mail. [1] You are breaking some conventions that make the default `get_params` and `set_params` not work well. As I said in the other thread, fitted attributes are suffixed

Re: [Scikit-learn-general] Feature selection and cross validation; and identifying chosen features

2015-02-11 Thread Vlad Niculae
On 11 Feb 2015, at 16:31, Andy t3k...@gmail.com wrote: On 02/11/2015 04:22 PM, Timothy Vivian-Griffiths wrote: Hi Gilles, Thank you so much for clearing this up for me. So, am I right in thinking that the feature selection is carried for every CV-fold, and then once the best

Re: [Scikit-learn-general] Data reconstruction after SparsePCA

2014-10-17 Thread Vlad Niculae
Hi Luca x_3_dimensional = x.dot(spca.components_.T) # this is equivalent to spca.transform(x) This part is specific to PCA. In general, the transform part of such a decomposition is `X * components ^ -1`. In PCA, because `components` is orthogonal, `components ^ -1` is `components.T`. The

Re: [Scikit-learn-general] Data reconstruction after SparsePCA

2014-10-17 Thread Vlad Niculae
To clarify, it is *not* the case that `x.dot(spca.components_.T) ` is equivalent to `spca.transform(x)`. The latter performs a solve. Best, Vlad On Fri, Oct 17, 2014 at 12:03 PM, Vlad Niculae zephy...@gmail.com wrote: Hi Luca x_3_dimensional = x.dot(spca.components_.T) # this is equivalent
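
The difference can be checked numerically. The sketch below uses random matrices and an ordinary least-squares solve as a stand-in for the solve mentioned above (the library's transform may use a regularized variant); `project_by_solve` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5, 4)

# Components with orthonormal rows, shape (3, 4), built from a QR step.
Q, _ = np.linalg.qr(rng.randn(4, 3))
orthonormal = Q.T

# Generic, non-orthogonal components of the same shape.
generic = rng.randn(3, 4)

def project_by_solve(X, components):
    """Least-squares codes such that codes @ components approximates X."""
    codes_t, *_ = np.linalg.lstsq(components.T, X.T, rcond=None)
    return codes_t.T

# With orthonormal rows, the solve and the transpose shortcut agree.
same = np.allclose(project_by_solve(X, orthonormal), X @ orthonormal.T)
# With generic components they do not.
differ = not np.allclose(project_by_solve(X, generic), X @ generic.T)

# The approximation of X is the codes times the components.
reconstruction = project_by_solve(X, generic) @ generic
print(same, differ, reconstruction.shape)
```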

Re: [Scikit-learn-general] Data reconstruction after SparsePCA

2014-10-16 Thread Vlad Niculae
Hi Luca, The other part of the decomposition that you're missing is available in `spca.components_` and has shape `(n_components, n_features)`. The approximation of X is therefore `np.dot(x_3_dimensional, spca.components_)`. Best, Vlad On Thu, Oct 16, 2014 at 6:07 PM, Luca Puggini

Re: [Scikit-learn-general] Inputer, python list and strings

2014-09-25 Thread Vlad Niculae
Hi Zoraida, The Imputer assumes that your data is a numeric numpy array, or convertible to one. You should replace your string NA values with np.nan objects, then use the Imputer with the default, `missing_values='NaN'`. It's easier to debug if you explicitly convert your data to a float numpy
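
A sketch of the cleaning step, assuming the missing entries are marked with the string "NA". The column-mean fill at the end is done by hand with NumPy as a stand-in for the imputation step, so that the example does not depend on a particular library version.

```python
import numpy as np

# Hypothetical raw data with the string "NA" marking missing entries.
raw = [["1.0", "NA", "3.0"],
       ["4.0", "5.0", "NA"],
       ["7.0", "8.0", "9.0"]]

# Replace the markers with np.nan and convert everything else to float.
cleaned = [[np.nan if cell == "NA" else float(cell) for cell in row]
           for row in raw]
X = np.array(cleaned, dtype=float)

# Mean imputation by hand: fill each NaN with the mean of its column,
# computed over the non-missing entries.
col_means = np.nanmean(X, axis=0)
X_filled = np.where(np.isnan(X), col_means, X)
print(X_filled)
```

Doing the float conversion explicitly makes any remaining non-numeric cell fail early with a clear error.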

Re: [Scikit-learn-general] delta idf and bm25

2014-08-23 Thread Vlad Niculae
Hi Pavel, First of all, this is an interesting subject, thanks for bringing it up! I fear that it's too domain-specific to go very deep in this direction. That being said, and trying to interpret your benchmarks, it seems that Delta-idf might actually be interesting. Or, more generally, the idea

Re: [Scikit-learn-general] Custom Scoring Functions for Grid Search

2014-08-20 Thread Vlad Niculae
It has confused me as well, +1. It's counterintuitive and broken, in my opinion. Vlad On Wed, Aug 20, 2014 at 2:31 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: It's been around for so long, but it's also hard to believe that anyone exploited this behaviour intentionally. Shall we

Re: [Scikit-learn-general] VarianceThreshold

2014-08-17 Thread Vlad Niculae
Also, the class is well documented, but because of an omission, it wasn't linked from the API page at the time of the last stable release. This has been fixed in the development version, so you can read the docs in a friendlier way here [1]. Best, Vlad [1]

Re: [Scikit-learn-general] How to implement cross_val_score scoring function with a weights array?

2014-08-03 Thread Vlad Niculae
Hi, If you want to get `sample_weights` working with the current master, the easiest is to take PR 3524 and either pass it through `fit_params` or just undo the last commit in the branch. I needed to change a couple of things to get 1574 up to date with the current master, but nothing else is

Re: [Scikit-learn-general] Sparse NMF

2014-06-25 Thread Vlad Niculae
Hi, Allow me to clarify. We don't implement Hoyer's sparse update rule indeed (it shouldn't say this implements, I initially cited Hoyer for motivating sparseness constraints in NMF). Instead, we implement a version of sparse NMF with a clear (but not particularly elegant) objective function,

Re: [Scikit-learn-general] Sparse NMF

2014-06-25 Thread Vlad Niculae
wrote: i will never post a docstring again :) sorry for the noise michael On Wednesday, June 25, 2014, Vlad Niculae zephy...@gmail.com wrote: Hi, Allow me to clarify. We don't implement Hoyer's sparse update rule indeed (it shouldn't say this implements, I initially cited Hoyer

Re: [Scikit-learn-general] About weekly posts for GSoc 2014

2014-06-01 Thread Vlad Niculae
IIRC, weekly posts are not a GSoC requirement but they are a _PSF_ requirement, and since scikit-learn is participating in GSoC under the PSF umbrella, the requirement applies to us. I think it's great incentive to think of your work in terms of what you could show to others. No matter how

Re: [Scikit-learn-general] My talk was approved for EuroScipy'14

2014-05-22 Thread Vlad Niculae
This is great news, congratulations Gilles! Cheers, Vlad On May 22, 2014 8:15 AM, Gilles Louppe g.lou...@gmail.com wrote: Hi folks, Just for letting you know, my talk Accelerating Random Forests in Scikit-Learn was approved for EuroScipy'14. Details can be found at

Re: [Scikit-learn-general] Belief propagation and message-passing methods

2014-03-19 Thread Vlad Niculae
Hi John, I believe general inference methods are out of scope for scikit-learn. Even general structured learning algorithms are not in scope at the moment, as it's hard to fit problems in numpy arrays. For learning, you might want to check out pystruct [1]. If you just want inference,

Re: [Scikit-learn-general] Query in Sparse matrices: scipy.linalg.get_blas_funcs()

2014-03-18 Thread Vlad Niculae
Hi Manoj, For efficiency, the BLAS api defines different functions for different underlying datatypes (float32, float64, complex64, complex128). The scipy get_blas_funcs utility has the role of getting the Python wrapper for the given BLAS functions (in this case 'swap' and 'nrm2', that's
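
A minimal sketch of the dispatch described above, assuming SciPy is installed. The same two names are requested for a float32 and a float64 array, and the returned wrappers differ in their type code:

```python
import numpy as np
from scipy.linalg import get_blas_funcs

x32 = np.array([3.0, 4.0], dtype=np.float32)
x64 = np.array([3.0, 4.0], dtype=np.float64)

# The same names resolve to different typed routines depending on the
# dtype of the arrays passed in.
swap32, nrm2_32 = get_blas_funcs(("swap", "nrm2"), (x32,))
swap64, nrm2_64 = get_blas_funcs(("swap", "nrm2"), (x64,))

# The typecode attribute reveals which variant was selected.
print(nrm2_32.typecode, nrm2_64.typecode)

# nrm2 computes the Euclidean norm of a vector.
norm64 = nrm2_64(x64)
print(norm64)
```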

Re: [Scikit-learn-general] GSoC

2014-03-17 Thread Vlad Niculae
This program is granted free of charge for research and education purposes. However you must obtain a license from the author to use it for commercial purposes. Definitely FEST is not BSD compatible :( Vlad On 17/3/2014 14:19 , Arnaud Joly wrote: Hi, The support for sparse matrices

Re: [Scikit-learn-general] GSoC 2014 Proposal - Improving Linear Models (First draft)

2014-03-07 Thread Vlad Niculae
In some cases it might be preferable to fit an OvA model. In those cases, I think the user code would look nicer and more explicit if it'd use the sklearn.multiclass.OneVsRest encoder. The downside is that we'll need to go through an ugly deprecation cycle for a major class in the library.

Re: [Scikit-learn-general] GSoC - Completing my Neural Network PRs and more

2014-02-26 Thread Vlad Niculae
On Wed Feb 26 13:32:08 2014, Gael Varoquaux wrote: documentation and example This was exactly my thought. Many such (near-)equivalences are not obvious, especially for beginners. If Lars's hinge ELM and RBF network would work well (or provide interesting feature visualisations) on some

Re: [Scikit-learn-general] Parallel computing of Mahalanobis distances

2014-02-24 Thread Vlad Niculae
If you're affiliated with a university, Anaconda has free academic licenses that include MKL and their optimized builds. Vlad On Mon Feb 24 09:22:07 2014, Javier Martínez-López wrote: That is great, thanks! I do not have the mkl module (it isn't free, right?) but with your script the

Re: [Scikit-learn-general] Query with fit_intercept param

2014-02-15 Thread Vlad Niculae
Hi Manoj, In the first example, the intercept is not regularized, hence the difference. Vlad On Feb 15, 2014 8:54 AM, Manoj Kumar manojkumarsivaraj...@gmail.com wrote: Hello I have a query with fit_intercept parameter in most of the estimators. When we have a linear model like w0 + w1*x1 +

Re: [Scikit-learn-general] Contributing to Scikit

2014-02-02 Thread Vlad Niculae
I've heard stchee-kit once, along with stchee-pee and num-pee. Vlad On Sun Feb 2 18:39:58 2014, Hadayat Seddiqi wrote: i always said skikit On Sun, Feb 2, 2014 at 12:20 PM, Andy t3k...@gmail.com wrote: On 02/02/2014 12:06 PM, Olivier Grisel wrote: Note:

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-28 Thread Vlad Niculae
I like the locality-sensitive hashing idea! Vlad On Tue Jan 28 10:04:36 2014, Nick Pentreath wrote: This would be a great addition. Some ideas /code perhaps: http://nearpy.io/ On Tue, Jan 28, 2014 at 10:59 AM, Mathieu Blondel math...@mblondel.org wrote:

Re: [Scikit-learn-general] Scikit-Learn for android

2014-01-19 Thread Vlad Niculae
I don't think Weka (at least the interesting parts of it) could run on Android either. I don't really foresee the whole Scipy stack running on Android; maybe one day when all dependencies are rewritten in PyPy and are faster and still 100% compatible... One thing that would be possible (but I

Re: [Scikit-learn-general] A poster about scikit-learn at Giga-day

2014-01-17 Thread Vlad Niculae
Hi Arnaud, awesome poster! Here are a few things that popped out: Firstly, I doubt it matters, but some of the links are mangled. Then, I think it should say students' master's theses or something like this (plural). Also the chromosome 15 sounds strange to me compared to chromosome 15.

Re: [Scikit-learn-general] Suggestion to add author names/emails at the bottom of module documentations

2014-01-16 Thread Vlad Niculae
I would rather have this sorted out through the github issue tracker. I don't think it's a good idea to encourage users to e-mail individual developers. Someone else could have the expertise and do the change confidently. My 2c, Vlad On Thu Jan 16 18:12:05 2014, Issam wrote: Hi scikit-learn

Re: [Scikit-learn-general] Releasing joblib 0.8a

2013-12-20 Thread Vlad Niculae
Works exactly as you described on my machine (which doesn't mean much because it's relatively close to yours, but I am just too enthusiastic about this not to reply! \o/) Memory usage is as expected. I see a speedup in train time but a slight slowdown in test time (1.7 vs 1.0), is it expected or

Re: [Scikit-learn-general] Releasing joblib 0.8a

2013-12-20 Thread Vlad Niculae
propose to turn off multiprocessing at prediction time - this might backfire quite easily. 2013/12/20 Olivier Grisel olivier.gri...@ensta.org 2013/12/20 Vlad Niculae zephy...@gmail.com: Works exactly as you described on my machine (which doesn't mean much because it's relatively close to yours

Re: [Scikit-learn-general] Updated KMeansCoder now available as gist

2013-12-13 Thread Vlad Niculae
Great, thanks a lot! I'm also curious about what you're running it on and about how the performance is. Vlad On Fri, Dec 13, 2013 at 7:11 PM, Olivier Grisel olivier.gri...@ensta.org wrote: Nice. Have you used it with success for real image classification tasks? I see you have been involved

Re: [Scikit-learn-general] Updated KMeansCoder now available as gist

2013-12-13 Thread Vlad Niculae
On Fri, Dec 13, 2013 at 12:20 PM, Vlad Niculae zephy...@gmail.com wrote: Great, thanks a lot! I'm also curious about what you're running it on and about how the performance is. Vlad On Fri, Dec 13, 2013 at 7:11 PM, Olivier Grisel olivier.gri...@ensta.org wrote: Nice. Have you used

Re: [Scikit-learn-general] from sklearn.all import *

2013-12-02 Thread Vlad Niculae
Personally I'd rather be a bit frustrated but have tab completion and pyflakes warnings. I avoid using star imports even in hackish scripts. I assume the warning will create unnecessary confusion when people learn to use the star import first. These users will probably feel that the warning is a

Re: [Scikit-learn-general] release time

2013-11-30 Thread Vlad Niculae
I guess remove means deprecate, right? I am +1 but we should definitely find a place for the code. Worst case it will be a repo containing just the HMM. My thoughts exactly; my impression is that people do find the code useful and it's reasonably readable. It should definitely go into a

Re: [Scikit-learn-general] release time

2013-11-30 Thread Vlad Niculae
seqlearn uses a different API on purpose though (one big ndarray), whereas pystruct uses lists of arrays but is only focused on max-margin learning :) On Sat, Nov 30, 2013 at 12:38 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: +1 on the whole thread. I was hoping that Lars's seqlearn

[Scikit-learn-general] Fwd: Problem with scikit learn kernel PCA

2013-11-25 Thread Vlad Niculae
it will be the same one. But I'm not the best person to ask, I've never even used the Kernel PCA. Cheers, Vlad -- Forwarded message -- From: Vlad Niculae v...@vene.ro Date: Mon, Nov 25, 2013 at 10:41 PM Subject: Fwd: Problem with scikit learn kernel PCA To: Vlad Niculae zephy...@gmail.com

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-19 Thread Vlad Niculae
I finally found a desk and some focus. I addressed Mathieu's suggestions and added some timings on real data (with a lot of concessions so that it would run reasonably quick on my machine). Here's the results: http://nbviewer.ipython.org/7224672 It becomes clear that `tol` still means different

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-08 Thread Vlad Niculae
Re: the discussion we had at PyCon.fr, I noticed that the internal elastic net coordinate descent functions are parametrized with `l1_reg` and `l2_reg`, but the exposed classes and functions have `alpha` and `l1_ratio`. Only yesterday there was somebody on IRC who couldn't match Ridge with

Re: [Scikit-learn-general] Automated benchmarking

2013-11-08 Thread Vlad Niculae
We have an instance of vbench continuously running [1] that I did as a GSoC project last year. For some reason it seems that the links don't generate properly now, but it still works (though all data got lost in a jenkins setup incident this summer). Here are some linear model benchmarks for

Re: [Scikit-learn-general] Automated benchmarking

2013-11-08 Thread Vlad Niculae
Vlad, that's exactly what I've been looking for! Thanks, Karol 2013/11/8 Vlad Niculae zephy...@gmail.com We have an instance of vbench continuously running [1] that I did as a GSoC project last year. For some reason it seems that the links don't generate properly now, but it still works

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
, Olivier Grisel olivier.gri...@ensta.org wrote: 2013/11/7 Vlad Niculae zephy...@gmail.com: Hi everybody, I just updated the gist quite a lot, please take a look: http://nbviewer.ipython.org/7224672 I'll go to sleep and interpret it with a fresh eye tomorrow, but what's interesting at the moment

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
? On Thu, Nov 7, 2013 at 11:12 AM, Vlad Niculae zephy...@gmail.com wrote: The regularization is the same, I think the higher residuals come from the fact that the gradient is raveled, so compared to `n_targets` independent problems, it will take different steps. I don't think there are any

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
4 35.7 MiB 0.0 MiB def linalg(X): 5 42.7 MiB 7.0 MiB return np.linalg.norm(X, 'fro') On Thu, Nov 7, 2013 at 11:46 AM, Vlad Niculae zephy...@gmail.com wrote: Come to think of it, Olivier, what do you mean when you say L-BFGS-B has higher residuals

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
0.9300757 290 0.9297058 297 0.9262745 304 0.9274619 311 0.9275654 Name: residual, dtype: object It looks spot on. Note that tolerance is 1e-3. Any idea how to make it visible in the plot when two lines are so close? On Thu, Nov 7, 2013 at 12:26 PM, Vlad Niculae zephy...@gmail.com wrote

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
This is a known problem with np.linalg.norm, and so is the memory consumption. You should use sklearn.utils.extmath.norm for the Frobenius norm. Hmm. Indeed I missed that, but still, this is a bit odd. sklearn.utils.extmath.norm is slower than raveling on my anaconda with MKL accelerate setup:
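
For context, the raveling approach mentioned above treats the matrix as one long vector and takes its Euclidean norm, which equals the Frobenius norm. A small sketch with a random matrix; `frobenius_ravel` is a hypothetical name and no timing claims are made here:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 50)

def frobenius_ravel(X):
    """Frobenius norm as the Euclidean norm of the flattened matrix."""
    x = X.ravel()  # a view, not a copy, when X is contiguous
    return np.sqrt(np.dot(x, x))

reference = np.linalg.norm(X, "fro")
raveled = frobenius_ravel(X)
print(np.isclose(reference, raveled))
```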

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-07 Thread Vlad Niculae
I feel like this would go against explicit is better than implicit, but without it grid search would indeed be awkward. Maybe: if self.alpha_coef == 'same': alpha_coef = self.alpha_comp ? On Thu, Nov 7, 2013 at 4:19 PM, Mathieu Blondel math...@mblondel.org wrote: On Thu, Nov 7, 2013 at
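
The sentinel idea can be sketched as follows. The class is hypothetical and only the parameter names `alpha_comp` and `alpha_coef` come from the message above; the point is that a grid search over `alpha_comp` alone moves both penalties while `alpha_coef` keeps its default.

```python
class FactorizationParams:
    """Hypothetical parameter holder illustrating the 'same' sentinel."""

    def __init__(self, alpha_comp=1.0, alpha_coef="same"):
        # Arguments are stored untouched; the sentinel is resolved later.
        self.alpha_comp = alpha_comp
        self.alpha_coef = alpha_coef

    def resolved_alpha_coef(self):
        # 'same' means: tie this penalty to the one on the components.
        if self.alpha_coef == "same":
            return self.alpha_comp
        return self.alpha_coef

tied = FactorizationParams(alpha_comp=0.5)
untied = FactorizationParams(alpha_comp=0.5, alpha_coef=0.1)
print(tied.resolved_alpha_coef(), untied.resolved_alpha_coef())
```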

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-11-06 Thread Vlad Niculae
lasso (as well as the sparse variant). Is there any other reason for this or just that nobody needed it? Cheers, Vlad On Wed, Oct 30, 2013 at 10:40 AM, Vlad Niculae zephy...@gmail.com wrote: Thanks Mathieu, well part of it comes from your gist (I added an attribution now) ;) Non-negative

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Vlad Niculae
I guess it's just a bug in how the solvers return residuals, I'll add some unit tests with manually-computed residuals to check. On Wed, Oct 30, 2013 at 9:48 AM, Olivier Grisel olivier.gri...@ensta.org wrote: Does anyone have an explanation for the discrepancy in the residuals for the lbfgs-b

Re: [Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-30 Thread Vlad Niculae
Thanks Mathieu, well part of it comes from your gist (I added an attribution now) ;) Non-negative lasso is really interesting, I forgot about it but I think it would be very interesting to compare qualitatively. Vlad On Wed, Oct 30, 2013 at 10:15 AM, Olivier Grisel olivier.gri...@ensta.org

[Scikit-learn-general] Benchmarking non-negative least squares solvers, work in progress

2013-10-29 Thread Vlad Niculae
Hi all, During the PyCon sprint I kept digging into the NMF and specifically ways to solve each sub-iteration. It became clear that the alternating NLS approach finds good reconstructions and converges well, but the NLS solving step is critical and must be optimized. I have started looking into

Re: [Scikit-learn-general] Multi Class Classification

2013-10-20 Thread Vlad Niculae
Hi, We refer to such a setting as *multi-label*. Please take a look at http://scikit-learn.org/stable/modules/multiclass.html Yours, Vlad On Sun, Oct 20, 2013 at 1:19 PM, Mahendra Kariya geek3142-skle...@yahoo.co.in wrote: Hi, I am trying to do multi class classification using NB or linear

Re: [Scikit-learn-general] Multiclass Logistic Regression.

2013-09-25 Thread Vlad Niculae
There are still a few things that are not clear to me from the documentation. Can you customize the classifier to perform a different decision function? You can subclass it and override the decision_function method. While true, this can be misleading. You're just changing the final step used

Re: [Scikit-learn-general] Error when using an array for one feature linear regression

2013-09-24 Thread Vlad Niculae
Just to add, I don't think you need to reshape y. And reshaping x can be more briefly stated as x[:, np.newaxis]. In my opinion supporting such cases, while convenient for users, would lead to annoying branches and code that is harder to maintain and test. The important thing is being consistent.
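
A short sketch of the reshaping, using made-up data. The least-squares fit at the end is a plain NumPy stand-in for a linear regression estimator, included only to show that a two-dimensional X and a one-dimensional y work together:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])   # one feature, shape (4,)
y = 2.0 * x + 1.0                    # targets can stay one-dimensional

X = x[:, np.newaxis]                 # shape (4, 1): samples by features
print(X.shape, x.reshape(-1, 1).shape)

# Ordinary least squares with an explicit intercept column.
design = np.hstack([X, np.ones_like(X)])
(slope, intercept), *_ = np.linalg.lstsq(design, y, rcond=None)
print(round(slope, 6), round(intercept, 6))
```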

Re: [Scikit-learn-general] Does scikit RBM support continuous values?

2013-09-17 Thread Vlad Niculae
And under the current implementation, implementing them involves changing only the sampling and energy computation, I think. I discussed this with Gabriel Synnaeve during the sprint and I think he was working on the gaussian version, it might be on his repo. Lars, do you have any practical

Re: [Scikit-learn-general] Shining Panda emails

2013-09-10 Thread Vlad Niculae
Also, the builds fail quite rarely (with the exception of the last few weeks). And when they do, I think these e-mails make sure that it gets fixed faster than without them. It's better not to unsubscribe. Even if it's annoying if it's *definitely* not your fault (documentation PRs) sometimes you

Re: [Scikit-learn-general] Overflow when vectorizing large corpus

2013-08-29 Thread Vlad Niculae
PM, Olivier Grisel olivier.gri...@ensta.org wrote: 2013/8/28 Lars Buitinck l.j.buiti...@uva.nl: 2013/8/28 Vlad Niculae zephy...@gmail.com: Do the indices/indptr arrays need to be int32 or is this a limitation of the implementation? This is a limit in scipy.sparse, which uses signed int

Re: [Scikit-learn-general] Files at sourceforge

2013-08-29 Thread Vlad Niculae
It's about redirecting /dev and /stable to the appropriate fixed paths. Actually I remember that this has been looked into, I vaguely remember a thread a while back. I think the problem is that we couldn't move to github while keeping all the old links and looking the same in the eyes of the

Re: [Scikit-learn-general] Testing small code peices

2013-08-29 Thread Vlad Niculae
If you're writing an external script that just interfaces with scikit-learn and you intend to keep it separately distributable (3rd party), you can replace them with absolute imports: ``` from sklearn.base import ClassifierMixin, RegressorMixin from sklearn.externals.joblib import Parallel,

Re: [Scikit-learn-general] Nonn-ASCII in source files

2013-08-28 Thread Vlad Niculae
I'll have to side slightly against Lars on this one. I agree with Lars that any software that doesn't support these is broken, that Unicode looks better than other ad-hoc formatting. If the software works, often the fonts won't. Personally if I'd need to see the source and find characters

[Scikit-learn-general] Overflow when vectorizing large corpus

2013-08-28 Thread Vlad Niculae
Hi all, I got an unexpected error with current master, when trying to run TfidfVectorizer on a 2 billion token corpus. /home/vniculae/envs/sklearn/local/lib/python2.7/site-packages/sklearn/feature_extraction/text.pyc in _count_vocab(self, raw_documents, fixed_vocab) 728

Re: [Scikit-learn-general] Overflow when vectorizing large corpus

2013-08-28 Thread Vlad Niculae
After doing it again with pdb I figured out that it has nothing to do with vocabulary size, which is decent; the list of indices simply grows too big. Vlad On Wed, Aug 28, 2013 at 11:01 PM, Vlad Niculae zephy...@gmail.com wrote: Hi all, I got an unexpected error with current master, when

Re: [Scikit-learn-general] starting sklearn

2013-08-24 Thread Vlad Niculae
on nosetests. I have made some progress using scikits and learning python but I never got that to work. Thanks again, Don On Aug 24, 2013, at 6:16 PM, Vlad Niculae zephy...@gmail.com wrote: The `python` and `nosetests` executables that you are running are probably not the macports ones. Type `which

Re: [Scikit-learn-general] Segfault with large dataset

2013-08-24 Thread Vlad Niculae
Is it maybe related to the OS, as it seems that the problem is with opening the memmapped file? Vlad On Sat, Aug 24, 2013 at 1:52 PM, Olivier Grisel olivier.gri...@ensta.org wrote: Sounds like a serious bug, could you please open an issue on github? -- Olivier

Re: [Scikit-learn-general] PyStruct 0.1 released

2013-08-11 Thread Vlad Niculae
Congratulations Andy! Thanks for all your hard work on this. This is a good moment for pystruct to gain some momentum! Cheers, Vlad On Sun, Aug 11, 2013 at 8:55 PM, Andreas Mueller amuel...@ais.uni-bonn.de wrote: Hey everybody. I just wanted to spam the ML again and say I just released

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
Sorry, but I can't find the issue, you posted the same link twice. Those errors are very similar to what I was getting before figuring out that I need to use nosetests3 instead of nosetests. Vlad On Mon, Jul 29, 2013 at 10:35 AM, Olivier Grisel olivier.gri...@ensta.org wrote: I found problems

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
I can do it; the question is whether to build against anaconda or against binary numpy/scipy; and whether it matters. I'll see if I can check. On Mon, Jul 29, 2013 at 12:09 PM, Olivier Grisel olivier.gri...@ensta.org wrote: 2013/7/29 Olivier Grisel olivier.gri...@ensta.org: I found problems

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
Or simply hide the 0.14a1 release? It should still stay pip installable if you use the right magic words, right? On Mon, Jul 29, 2013 at 1:35 PM, Andreas Mueller amuel...@ais.uni-bonn.de wrote: On 07/29/2013 01:20 PM, Andreas Mueller wrote: On 07/29/2013 01:13 PM, Olivier Grisel wrote: Maybe

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
I uploaded the windows binaries manually through the web interface with no issue. Unrelated question: We could go for a python3.3 binary too, but I would need to build it using the (free) scipy installed with Anaconda, because official scipy doesn't provide binaries for python 3.3. From what I

Re: [Scikit-learn-general] Feature freeze

2013-07-29 Thread Vlad Niculae
/ On Mon, Jul 29, 2013 at 1:58 PM, Gael Varoquaux gael.varoqu...@normalesup.org wrote: On Mon, Jul 29, 2013 at 01:54:21PM +0200, Vlad Niculae wrote: I uploaded the windows binaries manually through the web interface with no issue. I might give up and upload it manually, but I tend to like

Re: [Scikit-learn-general] 20 newsgroups classification

2013-07-26 Thread Vlad Niculae
Hi Harold, Only the current development version (and the upcoming release) supports Python 3, and only as of recently. Even so, it won't be easy to support 3.2; we just aim for 3.3 at the moment. This being said, I have no idea what causes this specific error. That line seems unchanged in the

Re: [Scikit-learn-general] Error while building Scikit-learn in Windows (32-bit)

2013-07-22 Thread Vlad Niculae
The "unable to find vcvarsall.bat" error is because you don't have environment variables set appropriately. Click Start > Programs > Visual Studio C++ Express > Visual Studio Command Prompt and run the setup from there. Vlad On Mon, Jul 22, 2013 at 11:08 AM, Andreas Mueller amuel...@ais.uni-bonn.de

Re: [Scikit-learn-general] Error while building Scikit-learn in Windows (32-bit)

2013-07-22 Thread Vlad Niculae
. The problem might be the unavailability of a BLAS implementation in Windows as I figured out. Numpy doesn't have settings for BLAS. In Linux versions we need to get dependencies for BLAS (libatlas-dev) before building. But in Windows it's not there. On Mon, Jul 22, 2013 at 2:44 PM, Vlad

Re: [Scikit-learn-general] scikit-learn for Android?

2013-07-16 Thread Vlad Niculae
Also depending on the model you want to deploy, if you just need to predict using a pre-trained model you can extract the decision function and the data of the model and rewrite it in another language. In many cases applying a trained model is very easy. Vlad On Wed, Jul 17, 2013 at 12:31 AM,
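
To illustrate the point about re-implementing prediction outside the library, here is a minimal sketch in plain Python. It assumes a binary linear classifier; `WEIGHTS` and `INTERCEPT` are hypothetical numbers standing in for values copied out of a fitted model, and the function names are illustrative rather than part of any library.

```python
# Hypothetical parameters, standing in for values copied out of a fitted
# binary linear classifier (one weight per feature, plus a bias term).
WEIGHTS = [0.8, -1.5, 0.3]
INTERCEPT = -0.2


def decision_function(x, weights=WEIGHTS, intercept=INTERCEPT):
    """Return the signed score w . x + b for a single sample."""
    if len(x) != len(weights):
        raise ValueError(
            "expected %d features, got %d" % (len(weights), len(x))
        )
    return sum(w * v for w, v in zip(weights, x)) + intercept


def predict(x):
    """Return 1 if the score is positive, else 0 (binary case only)."""
    return 1 if decision_function(x) > 0 else 0
```

The same dot product plus threshold is short enough to port to whatever language runs on the target device, which is why applying a trained linear model tends to be the easy part.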

Re: [Scikit-learn-general] Pystruct website and mailing list

2013-07-12 Thread Vlad Niculae
The requirements are definitely the blocking thing here. Not just the dependency on cvxopt but also the inference packages and the fact they need to be built manually. The API is sklearn-ish enough even with lists-of-lists. On Fri, Jul 12, 2013 at 10:06 AM, Andreas Mueller

Re: [Scikit-learn-general] Error while building Scikit-learn in Windows (32-bit)

2013-07-11 Thread Vlad Niculae
If you have MSVC from C++ express 2008 available could you try with that? Are you trying to build the latest master, does the last release work well? Vlad On Thu, Jul 11, 2013 at 5:17 PM, Maheshakya Wijewardena pmaheshak...@gmail.com wrote: I do not have MKL. Can there be any other reason for

Re: [Scikit-learn-general] Paris Sprint location

2013-07-11 Thread Vlad Niculae
Hi Mathieu, Will you be joining online? People have been asking this on IRC ;) Personally I want to take care of unfinished business like the omp CV, the RBM pull request, GSOC PRs, and I was thinking of trying to tackle Averaged SGD; apart from this I'll be side-sprinting on pystruct. Cheers,

Re: [Scikit-learn-general] clf.fit freezes on small dataset in scikit-learn

2013-07-03 Thread Vlad Niculae
Also, it's not that GridSearch is sensitive in itself, but remember you're doing LeaveOneOut, so for every grid point you are actually doing `n_samples` calls to clf.fit. Maybe one of these calls is significantly slower than others due to scaling. On Wed, Jul 3, 2013 at 10:42 PM, Lars Buitinck
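
The cost described above can be made concrete with a little arithmetic. The sketch below is plain Python, not library code: `count_fits` is a hypothetical helper, and the grid and sample count are made-up values. It only counts the cross-validation fits and ignores any final refit on the full data.

```python
def count_fits(param_grid, n_samples):
    """Count model fits for an exhaustive grid search with leave-one-out CV.

    param_grid maps each parameter name to the list of values to try.
    Leave-one-out produces one train/test split per sample, so every
    grid point costs n_samples fits.
    """
    n_grid_points = 1
    for values in param_grid.values():
        n_grid_points *= len(values)
    return n_grid_points * n_samples


# Made-up example: 3 x 2 = 6 grid points on a 50-sample dataset.
example_grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1]}
total_fits = count_fits(example_grid, n_samples=50)  # 6 * 50 = 300
```

Even a small grid multiplies quickly under leave-one-out, so a single slow configuration (for instance one triggered by unscaled features) is repeated once per sample.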

Re: [Scikit-learn-general] Adding Sparse Autoencoder to Scikit

2013-06-27 Thread Vlad Niculae
Why would autoencoders be naturally batch? I think historically one of their early uses was for Online PCA, but I may be wrong. Vlad On Wed, Jun 26, 2013 at 11:51 PM, Issam issamo...@gmail.com wrote: Hi @Olivier, you are absolutely right, scipy.optimize.fmin_l_bfgs_b would not be suitable for

[Scikit-learn-general] Time for GSoC 2013!

2013-06-17 Thread Vlad Niculae
Hey everybody, Today is the official starting date for GSoC 2013, and I am very excited! As those of you following could definitely see, we had a lot of very good proposals, sadly, more than we could accept. We managed to get 2 slots from the PSF, and the projects that were accepted are: -

Re: [Scikit-learn-general] Creating a new image dataset

2013-06-17 Thread Vlad Niculae
But using OrderedDict or some other Bunch 2.0 is beside the point. Even if we find some awesome way of storing the datasets while allowing the cool oneliner, it will still mislead people into thinking they need to put their data into that format. What we want is to make it super-obvious that

Re: [Scikit-learn-general] Creating a new image dataset

2013-06-17 Thread Vlad Niculae
Now, how to do that? We don't. I am tired of completely dumbing down our code to make it usable by people who don't understand what they do. All it does is give us extra work in terms of support. In this case, we can't really do any better than the way it is, the Bunch is pretty clear.
