Re: [scikit-learn] What is the FeatureAgglomeration algorithm?

2018-07-25 Thread Raphael C
"linkage". These are well > documented in the literature, or on wikipedia. > > Gaël > > On Thu, Jul 26, 2018 at 06:05:21AM +0100, Raphael C wrote: > > Hi, > > > I am trying to work out what, in precise mathematical terms, > > [FeatureAgglomeration][1] does and w

[scikit-learn] What is the FeatureAgglomeration algorithm?

2018-07-25 Thread Raphael C
Hi, I am trying to work out what, in precise mathematical terms, [FeatureAgglomeration][1] does and would love some help. Here is some example code:

import numpy as np
from sklearn.cluster import FeatureAgglomeration

for S in ['ward', 'average', 'complete']:
    FA =
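A plausible completion of the snippet, for readers of the archive: FeatureAgglomeration clusters the columns of X and then pools the features in each cluster (the default pooling_func is np.mean). A minimal sketch with made-up data and n_clusters=2, verifying the mean-pooling:

import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.RandomState(0)
X = rng.rand(20, 6)  # 20 samples, 6 features

for S in ['ward', 'average', 'complete']:
    FA = FeatureAgglomeration(n_clusters=2, linkage=S)
    Xt = FA.fit_transform(X)  # shape (20, 2): one pooled column per feature cluster
    for k in range(2):
        cols = FA.labels_ == k
        # each output column is the mean of the features assigned to that cluster
        assert np.allclose(Xt[:, k], X[:, cols].mean(axis=1))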

Re: [scikit-learn] Finding a single cluster in 1d data

2018-04-14 Thread Raphael C
at > also. On this approach, personally, I think the jenkspy module is more > straightforward. > > I hope it helps. > > Pedro Pazzini > > 2018-04-12 16:22 GMT-03:00 Raphael C <drr...@gmail.com>: >> >> I have a set of points in 1d represented by a list X of float
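The preview cuts the thread off here; as one illustration of extracting a single dense cluster from 1-d points (my own sketch using SciPy's kernel density estimate, not necessarily the approach discussed in the thread; the 0.5 threshold is an arbitrary choice):

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.RandomState(0)
X = np.concatenate([rng.normal(5, 0.3, 200),   # one dense cluster
                    rng.uniform(0, 10, 50)])   # background noise

kde = gaussian_kde(X)
density = kde(X)
cluster = np.sort(X[density >= 0.5 * density.max()])  # points near the main mode
print(cluster.min(), cluster.max())  # approximate extent of the cluster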

Re: [scikit-learn] Parallel MLP version

2017-12-20 Thread Raphael C
I believe tensorflow will do what you want. Raphael On 20 Dec 2017 16:43, "Luigi Lomasto" wrote: > Hi all, > > I have a computational problem training my neural network, so can you > tell me whether there is a parallel version of the MLP library? > > >
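A minimal sketch of what that looks like in TensorFlow via the Keras API, which parallelises training across CPU cores (and GPUs) out of the box; the architecture and data here are placeholders:

import numpy as np
import tensorflow as tf

X = np.random.rand(1000, 20).astype('float32')
y = np.random.randint(0, 2, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit(X, y, epochs=5, batch_size=32)  # uses all available cores by default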

Re: [scikit-learn] Unclear help file about sklearn.decomposition.pca

2017-10-17 Thread Raphael C
How about including the scaling that people might want to use in the User Guide examples? Raphael On 17 October 2017 at 16:40, Andreas Mueller wrote: > In general scikit-learn avoids automatic preprocessing. > That's a convention to give the user more control and decrease
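Presumably an example along these lines is what is meant; a minimal sketch of doing the scaling explicitly before PCA, since scikit-learn deliberately leaves preprocessing to the user:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = load_iris().data
pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
X2 = pipe.fit_transform(X)  # PCA runs on zero-mean, unit-variance data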

Re: [scikit-learn] Truncated svd not working for complex matrices

2017-08-11 Thread Raphael C
Although the first priority should be correctness (in implementation and documentation), and it makes sense to explicitly test for inputs for which the code will give the wrong answer, it would be great if we could support complex data types, especially where doing so is very little extra work. Raphael On
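For reference, SciPy's truncated SVD already accepts complex input, so a workaround (not scikit-learn's TruncatedSVD) is a few lines; a minimal sketch:

import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.RandomState(0)
A = rng.rand(50, 30) + 1j * rng.rand(50, 30)  # a complex matrix

U, s, Vh = svds(A, k=5)       # the 5 largest singular triplets
A5 = U @ np.diag(s) @ Vh      # rank-5 approximation of A
print(np.linalg.norm(A - A5))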

Re: [scikit-learn] decision trees

2017-03-29 Thread Raphael C
There is https://github.com/scikit-learn/scikit-learn/pull/4899 . It looks like it is waiting for review? Raphael On 29 March 2017 at 11:50, federico vaggi wrote: > That's a really good point. Do you know of any systematic studies about the > two different encodings?
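For context, the "two different encodings" are presumably one-hot versus integer (ordinal) coding of categorical features for trees. A minimal sketch of both with made-up data, assuming a recent scikit-learn (OneHotEncoder accepting strings and OrdinalEncoder arrived in 0.20, after this email):

import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

X = np.array([['red'], ['green'], ['blue'], ['green']])
y = [0, 1, 1, 0]

X_onehot = OneHotEncoder().fit_transform(X).toarray()  # three binary columns
X_ordinal = OrdinalEncoder().fit_transform(X)          # one integer column

for Xe in (X_onehot, X_ordinal):
    tree = DecisionTreeClassifier(random_state=0).fit(Xe, y)
    print(tree.score(Xe, y))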

Re: [scikit-learn] Missing data and decision trees

2016-10-13 Thread Raphael C
You can simply make a new binary feature (per feature that might have a missing value) that is 1 if the value is missing and 0 otherwise. The RF can then work out what to do with this information. I don't know how this compares in practice to more sophisticated approaches. Raphael On Thursday,
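A minimal sketch of that indicator-feature idea with pandas (the column names and fill value are invented; the point is that the forest sees the missingness flag as an ordinary feature):

import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [25.0, np.nan, 40.0],
                   'income': [50.0, 60.0, np.nan]})

for col in ['age', 'income']:
    df[col + '_missing'] = df[col].isnull().astype(int)  # 1 if missing, else 0
    df[col] = df[col].fillna(0)  # any constant fill works; the flag carries the signal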

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
over this information but I am sure I must have misunderstood. At best it seems it could cover the number of positive values, but this is missing half the information. Raphael > > On Mon, Oct 10, 2016 at 1:15 PM, Raphael C <drr...@gmail.com> wrote: >> >> How do I use sampl

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
il.com> wrote: > should be the sample weight function in fit > > http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html > > On Mon, Oct 10, 2016 at 1:03 PM, Raphael C <drr...@gmail.com> wrote: >> >> I just

[scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Raphael C
I am trying to perform regression where my dependent variable is constrained to be between 0 and 1. This constraint comes from the fact that it represents a count proportion, that is, counts in some category divided by a total count. In the literature it seems that one common way to tackle this is
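A minimal sketch of the sample_weight approach suggested in the replies above: expand each group into a positive and a negative row, weighted by the success and failure counts respectively (the data here is invented):

import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.1], [0.5], [0.9]])  # one feature value per group
successes = np.array([2, 30, 90])    # counts in the category
totals = np.array([100, 100, 100])   # total counts per group

# Two rows per group: label 1 weighted by successes, label 0 by failures.
X2 = np.vstack([X, X])
y2 = np.concatenate([np.ones(len(X)), np.zeros(len(X))])
w2 = np.concatenate([successes, totals - successes])

clf = LogisticRegression().fit(X2, y2, sample_weight=w2)
print(clf.predict_proba(X)[:, 1])  # fitted proportion for each group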

Re: [scikit-learn] Github project management tools

2016-09-29 Thread Raphael C
My apologies, I see it is in the spreadsheet. It would be great to see this work finished for 0.19 if at all possible, IMHO. Raphael On 29 September 2016 at 20:12, Raphael C <drr...@gmail.com> wrote: > I hope this isn't out of place but I notice that > https://github.com/scikit-learn/

Re: [scikit-learn] Gradient Boosting: Feature Importances do not sum to 1

2016-08-31 Thread Raphael C
Can you provide a reproducible example? Raphael On Wednesday, August 31, 2016, Douglas Chan wrote: > Hello everyone, > > I notice conditions when Feature Importance values do not add up to 1 in > ensemble tree methods, like Gradient Boosting Trees or AdaBoost Trees. I >
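The poster's data was not included, but a reproducible check of the property would look something like this:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.feature_importances_.sum())  # expected to be 1.0 up to float error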

Re: [scikit-learn] Does NMF optimise over observed values

2016-08-29 Thread Raphael C
On Monday, August 29, 2016, Andreas Mueller <t3k...@gmail.com> wrote: > > > On 08/28/2016 01:16 PM, Raphael C wrote: > > > > On Sunday, August 28, 2016, Andy <t3k...@gmail.com> wrote: > >

Re: [scikit-learn] Does NMF optimise over observed values

2016-08-28 Thread Raphael C
On Sunday, August 28, 2016, Andy <t3k...@gmail.com> wrote: > > > On 08/28/2016 12:29 PM, Raphael C wrote: > > To give a little context from the web, see e.g. http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-i

Re: [scikit-learn] Does NMF optimise over observed values

2016-08-28 Thread Raphael C
actly. Instead, we will only try to minimise the errors of the observed user-item pairs. " Raphael On Sunday, August 28, 2016, Raphael C <drr...@gmail.com> wrote: > Thank you for the quick reply. Just to make sure I understand, if X is > sparse and n by n with X[0,0] = 1, X_[n-1,

[scikit-learn] Does NMF optimise over observed values

2016-08-28 Thread Raphael C
Reading the docs for http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html it says the objective function is: 0.5 * ||X - WH||_Fro^2 + alpha * l1_ratio * ||vec(W)||_1 + alpha * l1_ratio * ||vec(H)||_1 + 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2 + 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2
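As the replies above indicate, scikit-learn's NMF fits every entry of X (zeros in a sparse matrix are treated as observed zeros, not as missing values), unlike the recommender-style factorisation in the quoted tutorial, which sums errors only over observed pairs. A minimal sketch with made-up data, showing that the reconstruction error is measured over the whole matrix:

import numpy as np
from sklearn.decomposition import NMF

X = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 0.0],
              [4.0, 0.0, 5.0]])  # the zeros are fitted as zeros, not ignored

model = NMF(n_components=2, init='random', random_state=0)
W = model.fit_transform(X)
H = model.components_
print(np.linalg.norm(X - W @ H, 'fro'))  # error over all entries, zeros included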

Re: [scikit-learn] How to get the most important features from a RF efficiently

2016-07-21 Thread Raphael C
The problem was that I had a loop like

for i in xrange(len(clf.feature_importances_)):
    print clf.feature_importances_[i]

which recomputes the feature importance array in every step. Obvious in hindsight. Raphael On 21 July 2016 at 16:22, Raphael C <drr...@gmail.com> wrote: > I h
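Spelled out, the fix is to read the property once, since feature_importances_ is recomputed from all the trees on every attribute access:

importances = clf.feature_importances_  # computed once
for imp in importances:
    print(imp)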