Re: [scikit-learn] Fitting Lognormal Distribution

2016-05-25 Thread Michael Eickenberg
Hi Sanant, On Thursday, May 26, 2016, Startup Hire wrote: > Hi all, > > Hope you are doing good. > I would like to think so, but you never know where ML will lead us ... > > I am working on a project where I need to do the following things: > > 1. I need to fit a lognormal distribution to a s
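The message is truncated here. A minimal sketch of such a lognormal fit with scipy, assuming `data` is a 1-D array of positive samples:

    import numpy as np
    from scipy import stats

    # fix the location at 0 for a plain two-parameter lognormal fit
    shape, loc, scale = stats.lognorm.fit(data, floc=0)
    # parameters of the underlying normal distribution
    mu, sigma = np.log(scale), shape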

Re: [scikit-learn] Fitting Lognormal Distribution

2016-06-03 Thread Michael Eickenberg
probably, especially if they are normalised. you have the formulas for those, right? then you can say it for sure. just take the log on both sides. start by plotting the log of both of those distributions and you will probably see already On Friday, June 3, 2016, Startup Hire wrote: > Hi, > > Any
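A sketch of that first plotting step, assuming `data` holds positive samples from one of the distributions:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    log_data = np.log(data)  # lognormal samples become normal under log
    plt.hist(log_data, bins=50, density=True)
    grid = np.linspace(log_data.min(), log_data.max(), 200)
    plt.plot(grid, stats.norm.pdf(grid, log_data.mean(), log_data.std()))
    plt.show()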

Re: [scikit-learn] Fitting Lognormal Distribution

2016-06-03 Thread Michael Eickenberg
Regards, > Sanant > > On Fri, Jun 3, 2016 at 3:08 PM, Michael Eickenberg < > michael.eickenb...@gmail.com> wrote: > >> probably, especially if they are normalised. >> you have the formulas for those, right? then you can say it for sure. >> just take the log

Re: [scikit-learn] Spherical Kmeans #OT

2016-06-27 Thread Michael Eickenberg
hmm, not an answer, and off the top of my head: if you normalize your data points to l2 norm equal 1 and then use standard kmeans with euclidean distance (the squared distance then amounts to 2 - 2 cos(angle between points)), would this be enough for your purposes? (with a bit of luck there may even be some sort
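A quick numeric check of that identity for unit vectors:

    import numpy as np

    rng = np.random.RandomState(0)
    u, v = rng.randn(2, 5)
    u /= np.linalg.norm(u)
    v /= np.linalg.norm(v)
    # on the unit sphere, squared euclidean distance = 2 - 2 cos(angle)
    print(np.sum((u - v) ** 2), 2 - 2 * np.dot(u, v))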

Re: [scikit-learn] Spherical Kmeans #OT

2016-06-27 Thread Michael Eickenberg
text/document_clustering.html >> >> if your inputs are normalized, sklearn's kmeans behaves like spherical >> kmeans (unless I'm misunderstanding something, which is certainly possible, >> caveat lector, &c )... >> On Jun 27, 2016 12:13 PM, "Michael

Re: [scikit-learn] Spherical Kmeans #OT

2016-06-27 Thread Michael Eickenberg
lly provide any benefit over > sklearn.preprocessing.normalize) > > On 28 June 2016 at 09:20, Michael Eickenberg > wrote: > >> You could do >> >> from sklearn.pipeline import make_pipeline >> from sklearn.preprocessing import Normalizer >> from sk
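The snippet is cut off above; it was presumably along these lines (a sketch, not the original text; n_clusters is arbitrary and X stands for your data matrix):

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import Normalizer
    from sklearn.cluster import KMeans

    # l2-normalize, then ordinary kmeans: approximately spherical kmeans
    spherical_kmeans = make_pipeline(Normalizer(), KMeans(n_clusters=8))
    labels = spherical_kmeans.fit_predict(X)

Note that the centroids themselves are not re-normalized between iterations, which is where this differs from a strict spherical kmeans.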

Re: [scikit-learn] Using fit_intercept with sparse matrices

2016-07-04 Thread Michael Eickenberg
On Tuesday, July 5, 2016, Joel Nothman wrote: > Jaidev is suggesting that fit_intercept=False makes no sense if the data > is sparse. > +1 > But I think that depends on your target variable. > +1 > > > > On 4 July 2016 at 22:11, Alexandre Gramfort < > alexandre.gramf...@telecom-paristech.fr

Re: [scikit-learn] Install sklearn into a specific folder to make some changes

2016-08-01 Thread Michael Eickenberg
On Monday, August 1, 2016, Andreas Mueller wrote: > Hi. > The best is probably to use a virtual environment or conda environment > specific for this changed version of scikit-learn. > In that environment you could just run an "install" and it would not mess > with your other environments. +1!

Re: [scikit-learn] Install sklearn into a specific folder to make some changes

2016-08-01 Thread Michael Eickenberg
There are several ways of achieving this. One is to build scikit-learn in place by going into the sklearn clone and typing `make in`, or alternatively `python setup.py build_ext --inplace` (i think). Then you can use the environment variable PYTHONPATH, set to the github clone, and python will gi

Re: [scikit-learn] Question about Python's L2-Regularized Logistic Regression

2016-09-29 Thread Michael Eickenberg
That totally depends on your dataset. Maybe it is an "easy" dataset and not much regularization is needed. Maybe use PCA(n_components=2) or an LDA transform to take a look at your data in 2D. Maybe they are easily linearly separable? Sklearn does not do any feature selection if you don't ask
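A sketch of that 2D look, assuming X and y are your design matrix and labels:

    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    X_2d = PCA(n_components=2).fit_transform(X)
    plt.scatter(X_2d[:, 0], X_2d[:, 1], c=y)  # color points by class
    plt.show()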

Re: [scikit-learn] Using logistic regression with count proportions data

2016-10-10 Thread Michael Eickenberg
Here is a possibly useful comment of larsmans on stackoverflow about exactly this procedure http://stackoverflow.com/questions/26604175/how-to-predict-a-continuous-dependent-variable-that-expresses-target-class-proba/26614131#comment41846816_26614131 On Mon, Oct 10, 2016 at 4:04 PM, Sean Violant

Re: [scikit-learn] Silhouette example - performance issue

2016-10-14 Thread Michael Eickenberg
Dear Anaël, if you wish, you could add a line to the example verifying this correspondence, e.g. by moving the print call from between the two silhouette evaluations to after them, and also evaluating that average and printing it in parentheses. Probably not necessary though. A comment would do also

Re: [scikit-learn] GPR intervals and MCMC

2016-11-08 Thread Michael Eickenberg
Dear Alessio, if it helps, the implementation quite strictly follows what is described in GPML: http://www.gaussianprocess.org/gpml/chapters/ https://github.com/scikit-learn/scikit-learn/blob/412996f09b6756752dfd3736c306d46fca8f1aa1/sklearn/gaussian_process/gpr.py#L23 Hyperparameter optimization

Re: [scikit-learn] NuSVC and ValueError: specified nu is infeasible

2016-12-08 Thread Michael Eickenberg
You have to try smaller values of \nu: for NuSVC, nu must lie in (0, 1], and "specified nu is infeasible" means it is too large for your class balance. A sweep could look like

    import numpy as np
    from sklearn import svm

    nus = 2.0 ** (-np.arange(1, 11))  # 0.5 (the default) down to about 0.001
    for nu in nus:
        clf = svm.NuSVC(nu=nu)
        try:
            clf.fit(X, y)  # X, y: your training data
        except ValueError:
            print("nu {} not feasible".format(nu))

At some point it should start working. Hope this helps!

Re: [scikit-learn] NuSVC and ValueError: specified nu is infeasible

2016-12-08 Thread Michael Eickenberg
feasible due to your data. > Have you tried balancing the dataset as I mentioned in your other question > regarding the MLPClassifier? > > > Greets, > Piotr > > > > > > > On 08.12.2016 10:57, Michael Eickenberg wrote: > > You have to try smaller values of \nu

Re: [scikit-learn] Specify boosting percentage using Randomoversampling?

2017-01-10 Thread Michael Eickenberg
Maybe this contrib is what you are looking for? Take a close look to see whether it does what you expect. http://contrib.scikit-learn.org/imbalanced-learn/auto_examples/over-sampling/plot_smote.html On Tue, Jan 10, 2017 at 6:36 PM, Suranga Kasthurirathne < suranga...@gmail.com> wrote: > > Hi a
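If the goal is a specific oversampling percentage rather than SMOTE, imbalanced-learn's RandomOverSampler takes a ratio argument; a sketch with the current parameter name (it was called `ratio` in older versions), X and y being your data:

    from imblearn.over_sampling import RandomOverSampler

    # sampling_strategy=0.5: oversample the minority class up to
    # half the size of the majority class (binary case)
    ros = RandomOverSampler(sampling_strategy=0.5, random_state=0)
    X_resampled, y_resampled = ros.fit_resample(X, y)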

Re: [scikit-learn] Calculate p-value, the measure of statistical significance, in scikit-learn

2017-02-03 Thread Michael Eickenberg
Dear Afarin, scikit-learn is designed for predictive modelling, where evaluation is done out of sample (using train and test sets). You seem to be looking for a package with which you can do classical in-sample statistics and their corresponding evaluations, p-values among them. You are probably
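The message is truncated here; the package usually recommended for this is statsmodels (my assumption about the missing text). A minimal sketch of in-sample p-values for a logistic regression:

    import statsmodels.api as sm

    X_const = sm.add_constant(X)         # add an intercept column
    result = sm.Logit(y, X_const).fit()
    print(result.summary())             # includes per-coefficient p-values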

Re: [scikit-learn] Are sample weights normalized?

2017-07-28 Thread Michael Eickenberg
Hi Abhishek, think of your example as being equivalent to putting 1 of sample 1, 10 of sample 2 and 100 of sample 3 in a dataset and then running your SVM. This is exactly true for some estimators and approximately true for others, but always a good intuition. Hope this helps! Michael On Fri, Jul 2
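A small sketch of that intuition with made-up data; the weighted and the replicated fits should agree:

    import numpy as np
    from sklearn.svm import SVC

    X = np.array([[0.0], [1.0], [2.0]])
    y = np.array([0, 1, 1])
    weights = np.array([1, 10, 100])

    clf_weighted = SVC(kernel='linear').fit(X, y, sample_weight=weights)
    # the same thing with the samples literally repeated
    clf_repeated = SVC(kernel='linear').fit(np.repeat(X, weights, axis=0),
                                            np.repeat(y, weights))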

Re: [scikit-learn] Are sample weights normalized?

2017-07-28 Thread Michael Eickenberg
100 > of sample 3, sample 3 will be given a lot of focus during training because > it exists in majority, but if my dataset size was say 1 million, these > weights wouldn't really affect much? > > Thanks, > Abhishek > > On Jul 28, 2017 10:41 PM, "Michael Eickenbe

Re: [scikit-learn] 1. Re: unclear help file for sklearn.decomposition.pca

2017-10-16 Thread Michael Eickenberg
Your document says: > This data has already been pre-processed so that each of the features have about the same mean (zero) and variance. This means that you do this before doing the eigendecomposition. Check the wikipedia article https://en.wikipedia.org/wiki/Principal_component_analysis
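In sklearn terms, that preprocessing step corresponds to something like this (X is a placeholder for your data):

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    # standardize each feature first, then eigendecompose
    pca = make_pipeline(StandardScaler(), PCA(n_components=2))
    X_reduced = pca.fit_transform(X)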

Re: [scikit-learn] How does multiple target Ridge Regression work in scikit learn?

2018-05-02 Thread Michael Eickenberg
Due to the linear nature of the problem, the targets are always treated separately (even if there was a matrix-variate normal prior indicating covariance between target columns, you could do that adjustment before or after fitting). As for different alpha parameters, I think you can specify a different alpha per target by passing an array
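A sketch of the per-target alphas, assuming X and a multi-column Y; Ridge accepts an array with one alpha per target:

    import numpy as np
    from sklearn.linear_model import Ridge

    # Y has shape (n_samples, 3): one alpha per target column
    alphas = np.array([0.1, 1.0, 10.0])
    model = Ridge(alpha=alphas).fit(X, Y)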

Re: [scikit-learn] Jeff Levesque: neuroscience related datasets

2018-05-05 Thread Michael Eickenberg
Hi Jeffrey, check out these for neuron data and fmri: http://crcns.org/ And these for fmri: https://openfmri.org/ You can get started by installing one of the following packages and using their dataset downloaders: http://nilearn.github.io/modules/reference.html#module-nilearn.datasets
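For example, one of nilearn's downloaders (the dataset choice here is arbitrary):

    from nilearn import datasets

    haxby = datasets.fetch_haxby()  # downloads an example fMRI dataset
    print(haxby.func)               # paths to the 4D functional images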

Re: [scikit-learn] Should we standardize data before PCA?

2018-05-24 Thread Michael Eickenberg
Hi, that totally depends on the nature of your data and whether the standard deviations of individual feature axes/columns of your data carry some form of importance measure. Note that PCA will bias its loadings towards columns with large standard deviations, all else being held equal (meaning that
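A small demonstration of that bias on synthetic data:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.RandomState(0)
    X = rng.randn(1000, 2)
    X[:, 1] *= 100  # give the second column a 100x larger standard deviation
    pca = PCA(n_components=2).fit(X)
    print(pca.components_)                # first loading aligns with column 1
    print(pca.explained_variance_ratio_)  # almost all variance on that axis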

Re: [scikit-learn] Why doesn't sklearn have support for a Batch Gradient Descent Regressor

2018-05-29 Thread Michael Eickenberg
Hi Lekan, for which type of estimator are you looking for a batch gradient descent regressor? Michael On Tue, May 29, 2018 at 4:54 PM, Lekan Wahab wrote: > I have a feeling this question might have been asked before or there's > some sort of resource somewhere on it but so far I haven't found

Re: [scikit-learn] RidgeCV with multiple targets returns a single alpha. Is it possible to get one alpha per target?

2018-08-07 Thread Michael Eickenberg
You can get one alpha per target in the Ridge estimator (without CV). Then you would have to code the cv loop yourself. Depending on how many targets you have, this can be more efficient than looping over targets as Alex suggests. Either way there is some coding to do, unfortunately. Michael On
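A sketch of that hand-rolled loop, selecting one alpha per target by validation error (X and a multi-column Y are placeholders):

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import KFold

    alphas = np.logspace(-3, 3, 7)
    errors = np.zeros((len(alphas), Y.shape[1]))
    for train, val in KFold(n_splits=5).split(X):
        for i, alpha in enumerate(alphas):
            pred = Ridge(alpha=alpha).fit(X[train], Y[train]).predict(X[val])
            errors[i] += ((pred - Y[val]) ** 2).mean(axis=0)
    best_alphas = alphas[errors.argmin(axis=0)]  # one alpha per target
    model = Ridge(alpha=best_alphas).fit(X, Y)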

Re: [scikit-learn] Difference in normalization between Lasso and LogisticRegression + L1

2019-05-29 Thread Michael Eickenberg
Hi Jesse, I think there was an effort to compare normalization methods on the data attachment term between Lasso and Ridge regression back in 2012/13, but this might not have been finished or extended to Logistic Regression. If it is not documented well, it could definitely benefit from a documen

Re: [scikit-learn] Porting old MLPY KRR model to scikit-learn

2019-09-19 Thread Michael Eickenberg
What exactly do you mean by "port"? Put already fitted models into a sklearn estimator object? You can do this as follows: You should be able to create an `estimator = sklearn.kernel_ridge.KernelRidge(...)` object, call `fit` on some random data of the appropriate shape, and then set `estimator.dual_coef_`
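A sketch of that procedure; `X_train` and `old_dual_coef` stand for the original MLPY training inputs and fitted coefficients, and the kernel parameters must be matched to the old model:

    import numpy as np
    from sklearn.kernel_ridge import KernelRidge

    estimator = KernelRidge(kernel='rbf', gamma=0.1)  # match the MLPY kernel
    rng = np.random.RandomState(0)
    # fit on random data of the right shape to initialize the attributes
    estimator.fit(rng.randn(*X_train.shape), rng.randn(len(X_train)))
    # then overwrite the fitted state with the old model's values
    estimator.X_fit_ = X_train
    estimator.dual_coef_ = old_dual_coef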

Re: [scikit-learn] PolynomialFeatures

2019-11-23 Thread Michael Eickenberg
I think it might generate a basis that is capable of generating what you describe above, but the feature expansion concretely reads as 1, a, b, c, a**2, a*b, a*c, b**2, b*c, c**2, a**3, a**2*b, a**2*c, a*b**2, a*b*c, a*c**2, b**3, b**2*c, b*c**2, c**3. Hope this helps On Fri, Nov 22,
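You can check the exact order PolynomialFeatures produces (get_feature_names_out is the current method name; older versions call it get_feature_names):

    import numpy as np
    from sklearn.preprocessing import PolynomialFeatures

    poly = PolynomialFeatures(degree=3).fit(np.zeros((1, 3)))
    # prints the 20 monomials listed above, in that order
    print(poly.get_feature_names_out(['a', 'b', 'c']))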

Re: [scikit-learn] Why does sklearn require one-hot-encoding for categorical features? Can we have a "factor" data type?

2020-04-30 Thread Michael Eickenberg
Hi, I think there are many reasons that have led to the current situation. One is that scikit-learn is based on numpy arrays, which do not offer categorical data types (yet: ideas are being discussed, see https://numpy.org/neps/nep-0041-improved-dtype-support.html). Pandas already has a categorical data

Re: [scikit-learn] Tikhonov regularization

2020-08-11 Thread Michael Eickenberg
Hi David, I am assuming you mean that T acts on w. If T is invertible, you can absorb it into the design matrix by making a change of variable v = Tw, w = T^-1 v, and using standard ridge regression for v. If it is not (e.g. when T is a standard finite difference derivative operator) then this trick won't work directly.
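A sketch of that change of variable for invertible T, minimizing ||y - Xw||^2 + alpha * ||Tw||^2 (X, y, T and alpha are placeholders):

    import numpy as np
    from sklearn.linear_model import Ridge

    # absorb T into the design matrix: with v = T w, the penalty is ||v||^2
    X_tilde = X @ np.linalg.inv(T)
    ridge = Ridge(alpha=alpha, fit_intercept=False).fit(X_tilde, y)
    w = np.linalg.solve(T, ridge.coef_)  # map back: w = T^{-1} v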