[scikit-learn] Inquiry third-party package affiliation

2017-07-14 Thread Sebastian
four thousand times a month after launch. All the best, Sebastian Flennerhag ___ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn

Re: [scikit-learn] Welcome to the moved scikit-learn mailing list.

2016-05-17 Thread Sebastian Benthall
Would you be willing to share the notebook? It sounds interesting. On May 17, 2016 5:33 AM, "Andreas Mueller" wrote: > Hm we need to update the websites. Maybe the stable one, too. > I kind of forgot about that. > > Mathieu: via the mailman web interface. > Though I have no idea how I extracted t

Re: [scikit-learn] Fitting Lognormal Distribution

2016-05-26 Thread Sebastian Benthall
You may also be interested in the 'powerlaw' Python package, which detects the tail cutoff. On May 26, 2016 5:46 AM, "Warren Weckesser" wrote: > > > On Thu, May 26, 2016 at 2:08 AM, Startup Hire > wrote: > >> Hi all, >> >> Hope you are doing good. >> >> I am working on a project where I need to

Re: [scikit-learn] Fwd: ValueError

2016-06-01 Thread Sebastian Raschka
may want to run import numpy numpy.test('full') import scipy scipy.test('full') to narrow down the problem further. And how did you compile & install scikit-learn? Best, Sebastian > On Jun 1, 2016, at 1:24 PM, Ruchika Nayyar wrote: > > > Thanks, > Ruchika > -

Re: [scikit-learn] Fwd: ValueError

2016-06-01 Thread Sebastian Raschka
ssues/6706)! Like Maniteja suggested, it is likely due to “a mismatch between numpy installed and the one scikit-learn is compiled with” Best, Sebastian > On Jun 1, 2016, at 1:55 PM, Ruchika Nayyar wrote: > > Hello Sebastian > > Thanks for some insight.. So here ar

Re: [scikit-learn] Fwd: ValueError

2016-06-01 Thread Sebastian Raschka
python --version Python 3.5.1 :: Continuum Analytics, Inc. > On Jun 1, 2016, at 2:07 PM, Matthew Brett wrote: > > Hi, > > On Wed, Jun 1, 2016 at 11:00 AM, Sebastian Raschka > wrote: >> Sorry, >> >> $ python -c 'import numpy; print(scipy.__version__)'

Re: [scikit-learn] Fwd: ValueError

2016-06-01 Thread Sebastian Raschka
1, 2016, at 2:39 PM, Matthew Brett wrote: > > On Wed, Jun 1, 2016 at 11:17 AM, Sebastian Raschka > wrote: >>> I think you're using system Python on the Mac. I'd really strongly >>> recommend against that, because system Python >> >> Yeah, but I

Re: [scikit-learn] Fwd: ValueError

2016-06-01 Thread Sebastian Raschka
's own method of managing environments: > > On Wed, Jun 1, 2016 at 2:43 PM, Andrea Bravi wrote: > > Hi guys, > > > I recommend using https://virtualenv.pypa.io to solve those issues! > > > Best regards, > > Andrea > > > On Wednesday, 1 June 20

Re: [scikit-learn] The culture of commit squashing

2016-06-13 Thread Sebastian Raschka
to the reviewers, everyone gave their okay, the CI tests pass, I think there’s nothing against summarizing it to a single commit: - implement EstimatorX In my opinion, it helps tracking down code in the commit history in the long run, but that’s just my personal opinion. Best, Sebastian

Re: [scikit-learn] The culture of commit squashing

2016-06-14 Thread Sebastian Raschka
Oh wow, that looks like a neat feature, didn’t know about this, thanks for sharing! (And I would be in favor of this) > On Jun 14, 2016, at 5:34 AM, Tom DLT wrote: > > We could stop squashing during development, and use the new Squash-and-Merge > button on GitHub. > What do you think? > Tom >

Re: [scikit-learn] Estimator.predict() thread safety

2016-06-17 Thread Sebastian Raschka
ehaviour demonstrates: Best, Sebastian > On Jun 17, 2016, at 11:01 AM, Philip Tully wrote: > > Hi all, > > I notice when I train a model and expose the predict function through a web > API, predict takes longer to run in a multi-threaded environment than a > single-thr

Re: [scikit-learn] Estimator.predict() thread safety

2016-06-17 Thread Sebastian Raschka
am I too conservative? Best, Sebastian > On Jun 17, 2016, at 11:01 AM, Philip Tully wrote: > > Hi all, > > I notice when I train a model and expose the predict function through a web > API, predict takes longer to run in a multi-threaded environment than a > single-th

Re: [scikit-learn] Estimator.predict() thread safety

2016-06-17 Thread Sebastian Raschka
I think > FeatureUnion[n_jobs=1] + GridSearch[n_jobs <= cores] would be better regarding the nested parallelism limitation > On Jun 17, 2016, at 11:46 AM, Philip Tully wrote: > > Gotcha - so perhaps I should ensure FeatureUnion[n_jobs] + GridSearch[n_jobs] > < # cores? > > On Fri, Jun 17,

Re: [scikit-learn] Adding BM25 to sklearn.feature_extraction.text (Update)

2016-06-30 Thread Sebastian Raschka
typically memory capacity, especially if you are using multiprocessing via the cv param. PS: > regular numpy matrix I think you mean “numpy array”? (Since there’s a numpy matrix datastruct in numpy as well, however, almost no one uses it) Best, Sebastian > On Jun 30, 2016, at 6:2

Re: [scikit-learn] How to test on PYTHON_ARCH=32 with mac?

2016-07-19 Thread Sebastian Raschka
running Windows XP). E.g., via conda you could do # Create set CONDA_FORCE_32BIT=1 conda create -n 32bit_py27 python=2 # Activate set CONDA_FORCE_32BIT=1 activate 32bit_py27 Best, Sebastian > On Jul 20, 2016, at 1:05 AM, lin yenchen wrote: > > Hi all, > > currently the CI te

Re: [scikit-learn] scikit-learn Digest, Vol 4, Issue 31

2016-07-21 Thread Sebastian Raschka
/stable/ - A different browser - clearing the browser cache Hope one of these things work! Best, Sebastian > On Jul 21, 2016, at 12:27 PM, Rahul Ahuja wrote: > > Yes I can open github pages. > > > > > > Kind regards, > Rahul Ahuja > > > From: sci

Re: [scikit-learn] Sklearn website is down in my place

2016-07-21 Thread Sebastian Raschka
..." > > > Today's Topics: > >1. Re: scikit-learn Digest, Vol 4, Issue 31 (Rahul Ahuja) >2. Re: scikit-learn Digest, Vol 4, Issue 31 (Sebastian Raschka) > > > -- > >

Re: [scikit-learn] Sklearn website is down in my place

2016-07-21 Thread Sebastian Raschka
it-learn-ow...@python.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of scikit-learn digest..." > > > Today's Topics: > >1. Re: scikit-learn Digest, Vol 4, Issue 31 (Rahul Ahuja) >2. Re: scikit-lear

Re: [scikit-learn] Sklearn website is down in my place

2016-07-21 Thread Sebastian Raschka
, sounds tricky … Another thing you could try is visiting the site via a proxy. E.g., try to go to https://hide.me/en/proxy and type “scikit-learn.org” into the form field. Best, Sebastian > On Jul 21, 2016, at 2:18 PM, Rahul Ahuja wrote: > > > > yes it does via that link

Re: [scikit-learn] scikit-learn.org not opening

2016-07-21 Thread Sebastian Raschka
Glad to hear that it works at least. > but it may not be permanent solution? Yeah, that’s probably not ideal, and I am not sure if there’s a better solution if your country’s government prohibits the use of github :(. > On Jul 21, 2016, at 3:29 PM, Rahul Ahuja wrote: > >

Re: [scikit-learn] Is there any official position on PEP484/mypy?

2016-07-28 Thread Sebastian Raschka
ython 2.7, 3.4 etc? Or are you only thinking about the “comment” syntax? E.g., def hello(r, c=5): s = 'hello' # type: str return '(%d + %d) times %s' % (r, c, s) Which should work on all Py versions. Best, Sebastian > On Jul 28, 2016, at 12:49 PM, Andreas Muelle
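The comment syntax referred to in this message can be sketched as follows. The function-level signature comment is an addition for illustration (it is not in the original message) and follows the PEP 484 type-comment form; the annotations are ordinary comments, so the code should run unchanged on Python 2.7 and 3.x.

```python
def hello(r, c=5):
    # type: (int, int) -> str
    # The line above is a PEP 484 signature type comment; interpreters ignore it.
    s = 'hello'  # type: str
    return '(%d + %d) times %s' % (r, c, s)


print(hello(1))
```

A checker such as mypy reads these comments; the interpreter itself never sees them, which is why no Python-3-only syntax is needed.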

Re: [scikit-learn] Declaring numpy and scipy dependencies?

2016-07-28 Thread Sebastian Raschka
I think that should work fine for the `pip install scikit-learn`, however, I think the problem was with upgrading, right? E.g., if you run pip install scikit-learn --upgrade it would try to upgrade numpy and scipy as well, which may not be desired. I think the only workaround would be to run

Re: [scikit-learn] Is there any official position on PEP484/mypy?

2016-07-29 Thread Sebastian Raschka
xample, in Jupyter Notebooks/IPython regarding the shift-tab function help. However, I’d say that your suggestion is the best bet for now to maintain Py 2.x compatibility (until 2020 maybe :P). Cheers, Sebastian > On Jul 29, 2016, at 12:55 PM, Daniel Moisset wrote: > > @Andreas,

Re: [scikit-learn] Install sklearn into a specific folder to make some changes

2016-08-01 Thread Sebastian Raschka
virtualenv (http://docs.python-guide.org/en/latest/dev/virtualenvs/). Best, Sebastian > On Aug 1, 2016, at 3:55 PM, luizfgoncal...@dcc.ufmg.br wrote: > > I'm looking for the best way to install sklearn into a specific folder so > I can make changes for my work, without worrying

Re: [scikit-learn] StackOverflow Documentation

2016-08-03 Thread Sebastian Raschka
Hm, that’s an “interesting” approach by SO, I guess their idea is to build a collection of code-and-example based snippets for less well-documented libraries — especially, libraries that want to keep their documentation lean. > But I assume that copying without attribution is actually plagiaris

Re: [scikit-learn] Supervised anomaly detection in time series

2016-08-05 Thread Sebastian Raschka
ixture models (http://scikit-learn.org/stable/modules/mixture.html) Best, Sebastian > On Aug 5, 2016, at 2:55 PM, Jared Gabor wrote: > > Lots of great suggestions on how to model your problem. But this might be > the kind of problem where you seriously ask how hard it would be to g

[scikit-learn] update pydata schedule

2016-08-18 Thread Sebastian Raschka

Re: [scikit-learn] Tuning custom parameters using grid_search

2016-09-07 Thread Sebastian Raschka
earch1 = GridSearchCV(estimator=pipe, param_grid=grid) gsearch1.fit(X, y) Then, you can put in your desired preprocessing stuff into fit and transform. Best, Sebastian > On Sep 7, 2016, at 2:03 PM, Piotr Bialecki wrote: > > Hi all, > > I am currently tuning some parameters of

[scikit-learn] Mailing list "slow"?

2016-09-07 Thread Sebastian Raschka
with my particular mailing list account? (besides the mailing list, my email usually arrives within 1-2 seconds, so it’s not a problem with my email client or server in general). Best, Sebastian

Re: [scikit-learn] Mailing list "slow"?

2016-09-08 Thread Sebastian Raschka
Thanks! So it must be something on my side (or sth. weird with this email account in combination with the Python mailing list). Sorry for spamming, but let me try using my gmail account and send 2 mails simultaneously (I will later delete one of the two). 9:30:30 AM EDT (from gmail) > On Sep 8

Re: [scikit-learn] Mailing list "slow"?

2016-09-08 Thread Sebastian Raschka
Thanks! So it must be something on my side (or sth. weird with this email account in combination with the Python mailing list). Sorry for spamming, but let me try using my gmail account and send 2 mails simultaneously (I will later delete one of the two). 9:29:50 AM EDT > On Sep 8, 2016, at 9:

Re: [scikit-learn] Mailing list "slow"?

2016-09-08 Thread Sebastian Raschka
the bother :P > On Sep 8, 2016, at 9:29 AM, Sebastian Raschka > wrote: > > Thanks! So it must be something on my side (or sth. weird with this email > account in combination with the Python mailing list). Sorry for spamming, but > let me try using my gmail account and send 2 ma

Re: [scikit-learn] Use of Scaler with LassoCV, RidgeCV

2016-09-13 Thread Sebastian Raschka
StandardScaler attached to it. Best, Sebastian > On Sep 13, 2016, at 8:16 AM, Brenet, Yoann wrote: > > Hi all, > > I was trying to use scikit-learn LassoCV/RidgeCV while applying a > 'StandardScaler' on each fold set. I do not want to apply the scaler before >
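One common way to get a scaler that is refit on the training part of every fold is to put it into a pipeline and cross-validate the pipeline as a whole. The sketch below is an assumption about what that can look like, not the code from this thread: it swaps LassoCV for a plain Lasso tuned by GridSearchCV, and the synthetic data and alpha grid are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, for illustration only
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

# The scaler is part of the estimator, so it is fit on each training fold only
pipe = make_pipeline(StandardScaler(), Lasso(max_iter=10000))

# make_pipeline names the step after the lowercased class name: 'lasso'
param_grid = {'lasso__alpha': [0.01, 0.1, 1.0]}

gs = GridSearchCV(pipe, param_grid=param_grid, cv=5)
gs.fit(X, y)
print(gs.best_params_)
```

Because the scaler lives inside the estimator, no statistics from the held-out fold leak into the scaling step.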

[scikit-learn] Problems with plotting decision regions

2016-09-13 Thread Sebastian Raschka
really simple here. I created a gist of 2 simple examples with images attached: https://gist.github.com/rasbt/6fb65bba38b70e28e60a9842b988cc67 I think it is very likely that it is not a bug in scikit-learn but rather a matplotlib contourf bug? In case it is a bug at all … Best, Sebastian

Re: [scikit-learn] Problems with plotting decision regions

2016-09-13 Thread Sebastian Raschka
Thanks a lot, Jake, ‘viridis’ seems to work, indeed. I guess I should move this to the matplotlib bug tracker then. Best, Sebastian > On Sep 13, 2016, at 10:58 AM, Jacob Vanderplas > wrote: > > It seems to work correctly if you replace the colormap with a continuous one > li

Re: [scikit-learn] Scikit-learn 0.18-rc2 release candidate!

2016-09-14 Thread Sebastian Raschka
the book ;) I hope the release date in October is fixed! :). Cheers, Sebastian > On Sep 14, 2016, at 7:26 PM, Andreas Mueller wrote: > > Hi all. > We just published the 0.18-rc2 release candidate on PyPI and anaconda.org. > Please go ahead and test it, so we can iron out the

Re: [scikit-learn] Scikit-learn 0.18-rc2 release candidate!

2016-09-14 Thread Sebastian Raschka
ing the > book ;) I hope the release date in October is fixed! :). > > Except that now it requires substantial revisions > > On 15 September 2016 at 09:43, Sebastian Raschka wrote: > Thanks for all the effort putting it together! Looks like a nice set of > features a

Re: [scikit-learn] Github project management tools

2016-09-16 Thread Sebastian Raschka
Scikit-learn’s GitHub repo already makes use of these templates. I think the issue is more a technical one arising from their latest “style” changes. > On Sep 16, 2016, at 8:25 AM, Dale T Smith wrote: > > A form – with required, pre-defined fields – can help when people submit > bugs, issues,

Re: [scikit-learn] Github project management tools

2016-09-16 Thread Sebastian Raschka
ore important issues; I am sometimes a bit hesitant to submit/tackle pull requests or issues since I feel like they are somewhat distracting the core contributors from the more important stuff. Best, Sebastian > On Sep 16, 2016, at 9:11 AM, Sebastian Raschka wrote: > > Scikit-learn’s

Re: [scikit-learn] behaviour of OneHotEncoder somewhat confusing

2016-09-19 Thread Sebastian Raschka
0., 0., 0., 1.], [ 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1.]]) Best, Sebastian > On Sep 19, 2016, at 5:45 PM, Lee Zamparo wrote: > > Hi sklearners, > > A lab-mate came to me with a problem about encoding DNA seq

Re: [scikit-learn] Contribution project proposal

2016-09-20 Thread Sebastian Raschka
I remember that there was a discussion regarding stacking in general after we implemented the majority voting classifier, and I just found a PR with some stacking implementation that seems to be in progress https://github.com/scikit-learn/scikit-learn/pull/6674 > On Sep 20, 2016, at 8:02 PM, J

Re: [scikit-learn] ANN Scikit-learn 0.18 released

2016-09-28 Thread Sebastian Raschka
Have been playing around with the new functionality tonight. There are so many great additions, especially the new CV functionality in the model_selection module is super great. Nested CV is much more convenient now! Congratulations to everyone, and thanks for this great new version! :) > On S

Re: [scikit-learn] Question about Python's L2-Regularized Logistic Regression

2016-09-29 Thread Sebastian Raschka
) lr.coef_ > Should I be coding my predictors as +1/-1? 0 and 1 should be just fine and is the expected default. Best, Sebastian > On Sep 29, 2016, at 6:09 PM, Kristen M. Altenburger > wrote: > > Hi All, > > I am trying to understand Python’s code [function ‘_fit_liblinear'

Re: [scikit-learn] suggested machine learning algorithm

2016-10-01 Thread Sebastian Raschka
Maybe it’s worth switching to LOOCV since you may have a bit of a pessimistic bias here due to the small training set size (in the bootstrap, asymptotically only about 63.2% of the samples are unique and available for training). I would try both linear and nonlinear models; instead of adding more features maybe also try t

Re: [scikit-learn] Welcome Raghav to the core-dev team

2016-10-03 Thread Sebastian Raschka
Congrats Raghav! And thanks a lot for all the great work on the model_selection module! > On Oct 3, 2016, at 12:53 PM, Siddharth Gupta > wrote: > > Congrats Raghav! :D > > > On Oct 3, 2016 10:22 PM, "Aakash Agarwal" wrote: > Congrats Raghav! > > On Mon, Oct 3, 2016 at 9:54 PM, Manoj Kumar

Re: [scikit-learn] Random Forest with Bootstrapping

2016-10-03 Thread Sebastian Raschka
ally 0.632 * n unique samples in your bootstrap set. Or in other words 0.368 * n samples are not used for growing the respective tree (to compute the OOB). As far as I understand, the random forest OOB score is then computed as the average OOB of each tree (correct me if I am wrong!). Best, Sebast

Re: [scikit-learn] Random Forest with Bootstrapping

2016-10-03 Thread Sebastian Raschka
bootstrap sample. This is asymptotically “1/e approx. 0.368” (i.e., for very, very large n). Then, you can compute the probability of a sample being chosen as P(chosen) = 1 - (1 - 1/n)^n approx. 0.632 Best, Sebastian > On Oct 3, 2016, at 3:05 PM, Ibrahim Dalal via scikit-learn > wro
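The formula P(chosen) = 1 - (1 - 1/n)^n can be checked numerically with a few lines of plain Python. The function name below is made up for this sketch; the values approach 1 - 1/e from above as n grows.

```python
import math


def prob_chosen(n):
    """Probability that a given sample appears at least once
    in a bootstrap sample of size n drawn from n samples."""
    return 1.0 - (1.0 - 1.0 / n) ** n


for n in (10, 100, 1000, 100000):
    print(n, prob_chosen(n))

# Asymptotic limit for very large n
limit = 1.0 - 1.0 / math.e
print('limit', limit)
```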

Re: [scikit-learn] Random Forest with Bootstrapping

2016-10-03 Thread Sebastian Raschka
alpha=0.5,) plt.xlabel('n') plt.ylabel('1 - (1 - 1/n)^n') plt.xlim([0, 210]) plt.show() > On Oct 3, 2016, at 3:15 PM, Sebastian Raschka wrote: > > Say the probability that a given sample from a dataset of size n is *not* > drawn as a bootstrap sample is > >

Re: [scikit-learn] Random Forest with Bootstrapping

2016-10-03 Thread Sebastian Raschka
ples are left out > (theoretically at least), some of the samples in B must be repeated? > > On Tue, Oct 4, 2016 at 12:50 AM, Sebastian Raschka > wrote: > Or maybe more intuitively, you can visualize this asymptotic behavior e.g., > via > > import matplotlib.pyplot as

Re: [scikit-learn] Random Forest with Bootstrapping

2016-10-03 Thread Sebastian Raschka
Ibrahim Dalal via scikit-learn > wrote: > > So what is the point of having duplicate entries in your training set? This > seems just a pure overhead. Sorry but you will again have to help me here. > > On Tue, Oct 4, 2016 at 1:29 AM, Sebastian Raschka > wrote: > > H

Re: [scikit-learn] tree visualization with class names in leaves

2016-10-24 Thread Sebastian Raschka
'virginica'], where 0 -> ‘setosa’, 1 -> ‘versicolor’, 2 -> ‘virginica’. Best, Sebastian > On Oct 24, 2016, at 10:18 AM, greg g wrote:

Re: [scikit-learn] tree visualization with class names in leaves

2016-10-25 Thread Sebastian Raschka
oder >>> le = LabelEncoder() >>> y = le.fit_transform(labels) >>> le.classes_ array(['Setosa', 'Versicolor', 'Virginica'], dtype=...) >>> import numpy as np >>> np.bincount(y) array([50, 50, 50]) Best, Sebastian > On Oct 25, 2016, at 3:00 AM, greg g
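A self-contained version of this session might look like the sketch below. The `labels` list is a hypothetical stand-in for the Iris class labels (50 of each), since the original data loading is not part of the snippet.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical stand-in for the string class labels of the Iris dataset
labels = ['Setosa'] * 50 + ['Versicolor'] * 50 + ['Virginica'] * 50

le = LabelEncoder()
y = le.fit_transform(labels)   # integer codes 0, 1, 2

# classes_ is sorted, so index i is the class name for integer label i
print(le.classes_)

# Number of samples per integer label
print(np.bincount(y))
```

The order of `le.classes_` is what maps the integer labels back to class names, for example when passing class names to a tree visualization.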

Re: [scikit-learn] Problem using boxplots to compare significance of model performance

2016-10-30 Thread Sebastian Raschka
om/rasbt/mlxtend/blob/master/docs/sources/user_guide/evaluate/mcnemar.ipynb Best Sebastian > On Oct 30, 2016, at 3:24 PM, Suranga Kasthurirathne > wrote: > > > Hi folks! > > I'm using scikit-learn to build two neural networks using 10% holdout, and > compare the

Re: [scikit-learn] Problem using boxplots to compare significance of model performance

2016-10-30 Thread Sebastian Raschka
ork: model_1 = [0.85, # experiment 1 0.84] # experiment 2 model_2 = [0.84, # experiment 1 0.83] # experiment 2 plt.boxplot([model_1, model_2]) However, a boxplot based on 2 values only doesn’t make sense imho, you could just plot the range. Best, Sebastian > On Oct 30

Re: [scikit-learn] suggested classification algorithm

2016-11-16 Thread Sebastian Raschka
Yeah, there are many useful resources and implementations scattered around the web. However, a good, brief overview of the general ideas and concepts would be this one, for example: http://www.svds.com/learning-imbalanced-classes/ > On Nov 16, 2016, at 3:54 PM, Dale T Smith wrote: > > Unbala

Re: [scikit-learn] suggested classification algorithm

2016-11-17 Thread Sebastian Raschka
or under-sampling would be more > suitable? > > https://dl.dropboxusercontent.com/u/48168252/PCA_of_features.png > > thanks for your advices > Thomas > > > On 16 November 2016 at 22:20, Sebastian Raschka wrote: > Yeah, there are many useful resources and implement

Re: [scikit-learn] question about using sklearn.neural_network.MLPClassifier?

2016-11-23 Thread Sebastian Raschka
> If you keep everything at their default values, it seems to work - > > ```py > from sklearn.neural_network import MLPClassifier > X = [[0, 0], [0, 1], [1, 0], [1, 1]] > y = [0, 1, 1, 0] > clf = MLPClassifier(max_iter=1000) > clf.fit(X, y) > res = clf.predict([[0, 0], [0, 1], [1, 0], [1, 1]])

Re: [scikit-learn] Re: question about using sklearn.neural_network.MLPClassifier?

2016-11-24 Thread Sebastian Raschka
Cheers, Sebastian > On Nov 24, 2016, at 8:08 PM, lin...@ruijie.com.cn wrote: > > @ Sebastian Raschka > thanks for your analysis, > here is another question: when I use the neural network lib routine, can I save > the trained network for use next time? > Just like t

Re: [scikit-learn] Re: Re: question about using sklearn.neural_network.MLPClassifier?

2016-11-25 Thread Sebastian Raschka
y of them need > number of outlier and distance as input parameter in advance, is there > a more intelligent algorithm? > > > > > > -Original Message- > From: scikit-learn > [mailto:scikit-learn-bounces+linjia=ruijie.com...@python.org] On Behalf Of Sebastian > Raschka >

Re: [scikit-learn] Problem with nested cross-validation example?

2016-11-28 Thread Sebastian Raschka
At first glance, the image and the code example seem to do/show the same thing? Maybe it would be worth adding an explanatory figure like this to the docs to clarify? > On Nov 28, 2016, at 7:07 PM, Joel Nothman wrote: > > If that clarifies, please offer changes to the exampl

Re: [scikit-learn] Problem with nested cross-validation example?

2016-11-29 Thread Sebastian Raschka
I have an ipynb where I did the nested CV more “manually” in sklearn 0.17 vs sklearn 0.18 — I intended to add it as an appendix to a blog article (model eval part 4), which I had no chance to write, yet. Maybe the sklearn 0.17 part is a bit more obvious (although way less elegant) than the sklea

Re: [scikit-learn] question in using Scikit-learn MLPClassifier?

2016-12-06 Thread Sebastian Raschka
transform(X_test) Good luck! Sebastian > On Dec 6, 2016, at 6:12 AM, lin...@ruijie.com.cn wrote: > > Hi all: > I use a ‘Car Evaluation’ dataset from > http://archive.ics.uci.edu/ml/machine-learning-databases/car/car.data to test > the effect of MLP. (I transfer some

Re: [scikit-learn] no positive predictions by neural_network.MLPClassifier

2016-12-07 Thread Sebastian Raschka
surface). Best, Sebastian > The default is set 100 units in the hidden layer, but theoretically, it > should work with 2 hidden logistic units (I think that’s the typical > textbook/class example). I think what happens is that it gets stuck in local > minima depending on the r

Re: [scikit-learn] no positive predictions by neural_network.MLPClassifier

2016-12-08 Thread Sebastian Raschka
n is available via the loss_ attribute: mlp = MLPClassifier(…) # after training: mlp.loss_ > On Dec 8, 2016, at 9:55 AM, Thomas Evangelidis wrote: > > Hello Sebastian, > > I did normalization of my training set and used the same mean and stdev > values to normalize my test set, ins

Re: [scikit-learn] biased predictions in logistic regression

2016-12-15 Thread Sebastian Raschka
dard deviation to get “z” scores (e.g., this can be done by the StandardScaler()). Best, Sebastian > On Dec 15, 2016, at 4:02 PM, Rachel Melamed wrote: > > I just tried it and it did not appear to change the results at all? > I ran it as follows: > 1) Normalize dummy variables (by

[scikit-learn] n_jobs for LogisticRegression

2016-12-18 Thread Sebastian Raschka
d for the LogisticRegressionCV, and should the n_jobs docstring in LogisticRegression be described as “Number of CPU cores used for model fitting” instead of “during cross-validation,” or am I getting this wrong? Best, Sebastian

Re: [scikit-learn] combining arrays of features to train an MLP

2016-12-19 Thread Sebastian Raschka
Thanks, Thomas, that makes sense! Will submit a PR then to update the docstring. Best, Sebastian > On Dec 19, 2016, at 11:06 AM, Thomas Evangelidis wrote: > > Greetings, > > My dataset consists of objects which are characterised by their structural > features whi

Re: [scikit-learn] combining arrays of features to train an MLP

2016-12-19 Thread Sebastian Raschka
representations, e.g., learning from the graphs directly: http://papers.nips.cc/paper/5954-convolutional-networks-on-graphs-for-learning-molecular-fingerprints.pdf http://pubs.acs.org/doi/abs/10.1021/ci400187y Best, Sebastian > On Dec 19, 2016, at 4:56 PM, Thomas Evangelidis wrote: > > t

Re: [scikit-learn] n_jobs for LogisticRegression

2016-12-19 Thread Sebastian Raschka
Thanks, Tom, that makes sense. Submitted a PR to fix that. Best, Sebastian > On Dec 19, 2016, at 10:14 AM, Tom DLT wrote: > > Hi, > > In LogisticRegression, n_jobs is only used for one-vs-rest parallelization. > In LogisticRegressionCV, n_jobs is used for both one-vs

Re: [scikit-learn] combining arrays of features to train an MLP

2016-12-20 Thread Sebastian Raschka
small the sample/feature ratio), I think there are way too many (hyper/)parameters to fit in an MLP to get good results. I think you could be better off with a kernel SVM (if linear models don’t work well) or ensemble learning. Best, Sebastian > On Dec 19, 2016, at 6:51 PM, Thomas Evangeli

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Sebastian Raschka
the estimator that you can initialize with “refit=False” to avoid refitting if it helps. http://rasbt.github.io/mlxtend/user_guide/classifier/EnsembleVoteClassifier/#example-5-using-pre-fitted-classifiers Best, Sebastian > On Jan 7, 2017, at 11:15 AM, Thomas Evangelidis wrote: > >

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-07 Thread Sebastian Raschka
])) However, it may be better to use stacking, and use the output of r.predict(X) as meta features to train a model based on these? Best, Sebastian > On Jan 7, 2017, at 1:49 PM, Thomas Evangelidis wrote: > > Hi Sebastian, > > Thanks, I will try it in another classification

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-08 Thread Sebastian Raschka
between the mlps and the meta estimator. However I'd definitely also recommend simpler models as an alternative. Best, Sebastian > On Jan 7, 2017, at 4:36 PM, Thomas Evangelidis wrote: > > > >> On 7 January 2017 at 21:20, Sebastian Raschka wrote: >> Hi, Thomas, >

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-09 Thread Sebastian Raschka
s set a max constraint for the weights in combination with dropout, e.g. “ ||w||_2 < constant “, which worked even better than dropout alone (the constant becomes another hyperparameter to tune though). Best, Sebastian > On Jan 9, 2017, at 1:21 PM, Jacob Schreiber wrote: > > Thomas, it

Re: [scikit-learn] meta-estimator for multiple MLPRegressor

2017-01-11 Thread Sebastian Raschka
24, where they talk about alternative (the more classic) representations of protein-ligand complexes or interactions as inputs to either random forests or multi-layer perceptrons. Best, Sebastian > On Jan 10, 2017, at 7:46 AM, Thomas Evangelidis wrote: > > Jacob, > > The featur

[scikit-learn] Identify spectra with "marker"

2017-01-20 Thread Sebastian Illner
Hi guys, I'm new to NIR measurement as well as chemometrics. My current project involves the recognition of determined spectra (of a reference system) among others. The materials are currently not really set. So I try to give a predetermined mixture of substances into another matrix and group t

Re: [scikit-learn] numpy integration with random forrest implementation

2017-01-21 Thread Sebastian Raschka
://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html#sklearn.preprocessing.MultiLabelBinarizer). Also, the RandomForestClassifier should support multilabel classification. Best, Sebastian > On Jan 21, 2017, at 12:59 PM, Carlton Banks wrote: > > Mo

Re: [scikit-learn] numpy integration with random forrest implementation

2017-01-21 Thread Sebastian Raschka
Oh okay. But that shouldn’t be a problem, the RandomForestRegressor also supports multi-output regression; same expected target array shape: [n_samples, n_outputs] Best, Sebastian > On Jan 21, 2017, at 1:27 PM, Carlton Banks wrote: > > Not classification… but regression.. >

Re: [scikit-learn] numpy integration with random forrest implementation

2017-01-21 Thread Sebastian Raschka
PM, Carlton Banks wrote: > > Thanks for the Info!.. > How do you set it up.. > > There doesn’t seem to be an example available for regression purposes.. >> On 21 Jan 2017, at 19:32, Sebastian Raschka wrote: >> >> Oh okay. But that shouldn’t be a problem, the RandomFor

Re: [scikit-learn] Random StratifiedKFold Grid Search CV

2017-01-26 Thread Sebastian Raschka
= StratifiedKFold(n_splits=5, shuffle=True, random_state=i) gs = GridSearchCV(..., cv=k_fold) ... Best, Sebastian > On Jan 26, 2017, at 5:39 PM, Raga Markely wrote: > > Hello, > > I was trying to do repeated Grid Search CV (20 repeats). I thought that each > time I call GridSearchCV
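The idea in this message, a fresh shuffled StratifiedKFold per repetition with a different random_state each time, might be sketched as below. The dataset, estimator, parameter grid, and number of repeats are assumptions chosen to keep the example small; they are not taken from the thread.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
param_grid = {'n_neighbors': [1, 3, 5]}  # hypothetical grid

best_scores = []
for i in range(3):  # the thread mentions 20 repeats; 3 keeps this quick
    # A different seed per repetition gives a different fold assignment
    k_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
    gs = GridSearchCV(KNeighborsClassifier(), param_grid=param_grid, cv=k_fold)
    gs.fit(X, y)
    best_scores.append(gs.best_score_)

print(best_scores)
```

Without `shuffle=True`, the splitter produces the same folds on every call, so repeating the search would not add any new information.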

Re: [scikit-learn] Scores in Cross Validation

2017-01-26 Thread Sebastian Raschka
u haven’t touched before. I often use the “training, validation, and testing” approach as well, though, especially when working with large datasets and for early stopping on neural nets. Best, Sebastian > On Jan 26, 2017, at 1:19 PM, Raga Markely wrote: > > Thank you, Guillaume. >

Re: [scikit-learn] Random StratifiedKFold Grid Search CV

2017-01-26 Thread Sebastian Raschka
do a McNemar test. Best, Sebastian > On Jan 26, 2017, at 8:09 PM, Raga Markely wrote: > > Ahh.. nice.. I will use that.. thanks a lot, Sebastian! > > Best, > Raga > > On Thu, Jan 26, 2017 at 6:34 PM, Sebastian Raschka > wrote: > Hi, Raga, > > I th

Re: [scikit-learn] Random StratifiedKFold Grid Search CV

2017-01-27 Thread Sebastian Raschka
.) model selection based on best algo via k-fold on whole training set 3.) fit best algo w. best hyperparams (from 2.) to whole training set 4.) evaluate on test set 5.) fit classifier to whole dataset, done Best, Sebastian > On Jan 27, 2017, at 10:23 AM, Raga Markely wrote: > > Sounds good,

Re: [scikit-learn] Random StratifiedKFold Grid Search CV

2017-01-27 Thread Sebastian Raschka
.) model selection based on best algo via k-fold on whole training set 3.) fit best algo w. best hyperparams (from 2.) to whole training set 4.) evaluate on test set 5.) fit classifier to whole dataset, done Best, Sebastian > On Jan 27, 2017, at 12:49 PM, Sebastian Raschka > wrote: >

Re: [scikit-learn] Random StratifiedKFold Grid Search CV

2017-01-30 Thread Sebastian Raschka
Hm, which version of scikit-learn are you using? Are you running this on sklearn 0.18? Best, Sebastian > On Jan 30, 2017, at 2:48 PM, Raga Markely wrote: > > Hi Sebastian, > > Following up on the original question on repeated Grid Search CV, I tried to > do repeated nes

Re: [scikit-learn] Random StratifiedKFold Grid Search CV

2017-01-30 Thread Sebastian Raschka
Cool, glad to hear that it was such an easy fix :) > On Jan 30, 2017, at 3:49 PM, Raga Markely wrote: > > Nice catch!! The sklearn was 0.18, but I used sklearn.grid_search instead of > sklearn.model_selection. > > Error is gone now. > > Thank you, Sebastian! > Rag

Re: [scikit-learn] can we have a slack team for scikit-learn

2017-02-19 Thread Sebastian Raschka
In my opinion, Slack can be quite useful for discussing things “live.” However, one of the main problems I have with Slack — I am using it for some other projects — is that it is easy to lose track if important things are discussed and one is not constantly online and checking the timeline. In a

Re: [scikit-learn] Control over the inner loop in GridSearchCV

2017-02-27 Thread Sebastian Raschka
KFold(n_splits=5, shuffle=True, random_state=1) for name, gs_est in sorted(gridcvs.items()): nested_score = cross_val_score(gs_est, X=X_train, y=y_train, cv=outer_cv, n_jobs=1)
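
Since the snippet above starts mid-statement, here is a self-contained sketch of the same nested cross-validation pattern. The names `gridcvs`, `gs_est` and `outer_cv` follow the snippet; the dataset, the two estimators, their grids, the inner loop, and the use of `StratifiedKFold` for both loops are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Placeholder training data (assumption)
X_train, y_train = load_iris(return_X_y=True)

# Inner loop tunes hyperparameters, outer loop estimates generalization
inner_cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

gridcvs = {
    "KNN": GridSearchCV(
        KNeighborsClassifier(), param_grid={"n_neighbors": [1, 3, 5]}, cv=inner_cv
    ),
    "Tree": GridSearchCV(
        DecisionTreeClassifier(random_state=1),
        param_grid={"max_depth": [1, 2, 3]},
        cv=inner_cv,
    ),
}

nested_scores = {}
for name, gs_est in sorted(gridcvs.items()):
    # Each outer fold reruns the full grid search on its own training part
    nested_score = cross_val_score(
        gs_est, X=X_train, y=y_train, cv=outer_cv, n_jobs=1
    )
    nested_scores[name] = nested_score.mean()
```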

Re: [scikit-learn] Control over the inner loop in GridSearchCV

2017-02-27 Thread Sebastian Raschka
3rd round list(my_gen)[2][1] # stores an array of indices used as test fold in the 3rd round Hope that helps. Best, Sebastian > The following did not work. This is what we get --> ValueError: too many > values to unpack > On Feb 27, 2017, at 5:13 PM, Ludovico Coletta wrote:
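
A small runnable illustration of the indexing described above, assuming `my_gen` is the generator returned by a splitter's `split()` method. The toy array and the unshuffled `KFold` are assumptions for this sketch.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features

my_gen = KFold(n_splits=5).split(X)

# A generator can be consumed only once, so materialize it before indexing
folds = list(my_gen)

# Each element is a (train_indices, test_indices) tuple
train_idx_3rd, test_idx_3rd = folds[2]
```

Unpacking one fold into two names works because each element is a 2-tuple. Unpacking the whole list into two names (`train, test = folds`) is one way to run into "too many values to unpack", though the snippet does not show which statement raised the error in the thread.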

Re: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression

2017-03-01 Thread Sebastian Raschka
Hi, Raga, I have a short section on this here (https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html#the-bootstrap-method-and-empirical-confidence-intervals) if it helps. Best, Sebastian > On Mar 1, 2017, at 3:07 PM, Raga Markely wrote: > > Hi everyone, >
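
As a rough illustration of the general idea of an empirical (percentile) bootstrap confidence interval, here is a standard-library sketch. It is not the code from the linked article and it is not the .632+ variant; the helper name `bootstrap_percentile_ci` and the toy data are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(values, n_rounds=1000, alpha=0.05, seed=1):
    """Percentile bootstrap interval for the mean of `values` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the statistic of each resample
    stats = sorted(
        statistics.mean(rng.choices(values, k=n)) for _ in range(n_rounds)
    )
    # Approximate percentile positions in the sorted bootstrap statistics
    lower = stats[int((alpha / 2) * n_rounds)]
    upper = stats[int((1 - alpha / 2) * n_rounds) - 1]
    return lower, upper


# Toy per-sample correctness indicators: 80 hits and 20 misses (accuracy 0.8)
correct = [1] * 80 + [0] * 20
lower, upper = bootstrap_percentile_ci(correct)
```

For a regression model the same resampling scheme could be applied to a per-sample error measure instead of 0/1 correctness; that substitution is an assumption here, not something stated in the snippet.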

Re: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression

2017-03-01 Thread Sebastian Raschka
mation rate for regression would be ... > On Mar 1, 2017, at 5:39 PM, Raga Markely wrote: > > Thanks a lot, Sebastian! Very nicely written. > > I have a few follow-up questions: > 1. Just to make sure I understand correctly, using the .632+ bootstrap > method, the ACC_l

Re: [scikit-learn] Confidence and Prediction Intervals of Support Vector Regression

2017-03-01 Thread Sebastian Raschka
:07 PM, Raga Markely wrote: > > No worries, Sebastian :) .. thank you very much for your help.. I learned a > lot of new things from your site today.. it led me to some relevant chapters > in "The Elements of Statistical Learning", which then led me to chapter 8 > p

Re: [scikit-learn] Linear Discriminant Analysis with Cross Validation in Python

2017-03-07 Thread Sebastian Raschka
sklearn.model_selection import cross_val_score cross_val_score(estimator=lda, X=X, y=y, cv=loo) ``` Best, Sebastian > On Mar 7, 2017, at 10:01 AM, Serafeim Loukas wrote: > > Dear Mahesh, > > Thank you for your response. > > I read the documentation however I did not fin
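
Because the code in the snippet above is cut off at the start, here is a complete version of the same call. The names `lda` and `loo` follow the snippet; the iris data is a placeholder for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)  # placeholder data (assumption)

lda = LinearDiscriminantAnalysis()
loo = LeaveOneOut()

# One score per left-out sample: 1.0 if it was classified correctly, else 0.0
scores = cross_val_score(estimator=lda, X=X, y=y, cv=loo)
loo_accuracy = scores.mean()
```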

Re: [scikit-learn] Logistic regression with elastic net regularization

2017-03-13 Thread Sebastian Raschka
Hi, Stuart, I think the only way to do that right now would be through the SGD classifier, e.g., sklearn.linear_model.SGDClassifier(loss='log', penalty='elasticnet' …) Best, Sebastian > On Mar 13, 2017, at 12:57 PM, Stuart Reynolds > wrote: > > Is the

Re: [scikit-learn] Differences between scikit-learn and Spark.ml for regression toy problem

2017-03-15 Thread Sebastian Raschka
completely for now. And when you run the LogisticRegression, maybe run it multiple times with different random seeds to see if your solutions are generally stable. Best, Sebastian > On Mar 13, 2017, at 1:06 PM, Stuart Reynolds > wrote: > > Both libraries are heavily parameterized. You

Re: [scikit-learn] GridsearchCV

2017-03-15 Thread Sebastian Raschka
ive. Best, Sebastian > On Mar 16, 2017, at 12:00 AM, Carlton Banks wrote: > > Hi… > > I am currently trying to optimize my CNN model using gridsearchCV, but seem to > have some problems feeding my input data.. > > My training data is stored as a list of Np.ndarr

Re: [scikit-learn] GridsearchCV

2017-03-15 Thread Sebastian Raschka
gb ram.. > >> On 16 Mar 2017, at 05:30, Sebastian Raschka wrote: >> >> Sklearn estimators typically assume 2d inputs (as numpy arrays) with >> shape=[n_samples, n_features]. >> >>> list of Np.ndarrays of shape (6,3,3) >> >> I assume you
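
To make the shape convention concrete, here is a small sketch that turns a list of `(6, 3, 3)` arrays into a 2d array of shape `[n_samples, n_features]`. The random toy data and the choice to flatten each sample are assumptions; the rest of the advice in the thread is cut off, and a wrapped CNN may need the data reshaped back inside the model.

```python
import numpy as np

rng = np.random.RandomState(0)

# Toy stand-in for the training data: a list of 4 arrays of shape (6, 3, 3)
samples = [rng.rand(6, 3, 3) for _ in range(4)]

# Stack into one array of shape (4, 6, 3, 3), then flatten each sample
X = np.stack(samples).reshape(len(samples), -1)
```

Here `X.shape` is `(4, 54)`, since each sample contributes 6 * 3 * 3 = 54 features.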

Re: [scikit-learn] GridsearchCV

2017-03-15 Thread Sebastian Raschka
a super computer, and seem to >> have problems with memory.. already used 62 gb ram.. >> >> > On 16 Mar 2017, at 05:30, Sebastian Raschka wrote: >> > >> > Sklearn estimators typically assume 2d inputs (as numpy arrays) with >> > shape=[n_samples,

Re: [scikit-learn] GridsearchCV

2017-03-15 Thread Sebastian Raschka
> I changed it to -48?.. and it seems to be running.. >> On 16 Mar 2017, at 06:06, Sebastian Raschka wrote: >> >> the “-1” means that it will run on all processors that are available >> >>> On Mar 16, 2017, at 1:01 AM, Carlton Banks wrote: >>> >
