Re: [scikit-learn] check_estimator and score_samples method

2018-12-13 Thread Jason Rudy
Thanks, Joel. From your response I assume that the use of a y argument to score_samples is not a violation of the sklearn API, so I'll keep the method and find a workaround for the check_estimator test as it's currently written. I'll comment on the issue as well. On Mon, Dec 10, 2018 at 2:58 P

Re: [scikit-learn] Difference between linear model and tree-based regressor?

2018-12-13 Thread Brown J.B. via scikit-learn
"Elements of Statistical Learning" is on my bookshelf, but even so, that was a great summary! J.B.

Re: [scikit-learn] Difference between linear model and tree-based regressor?

2018-12-13 Thread Olivier Grisel
They are very different statistical models from a mathematical point of view. See the online scikit-learn documentation or reference textbooks such as "Elements of Statistical Learning" for more details. In practice, linear models tend to be faster to fit on large data, especially when the number
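
To make the "very different statistical models" point concrete, here is a small sketch that is not part of the original message. It uses pure-Python stand-ins rather than scikit-learn's estimators: a one-feature least-squares line and a depth-1 regression tree (a "stump"). The helper names fit_linear, fit_stump and mse, and the toy data, are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: hand-rolled stand-ins, not scikit-learn's implementations.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b


def fit_stump(xs, ys):
    """Depth-1 regression tree: one threshold, one constant prediction per side."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no valid threshold between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Sum of squared errors if each side predicts its own mean.
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (an assumption for illustration): one linear target, one step target.
xs = [float(i) for i in range(10)]
line_ys = [2.0 * x + 1.0 for x in xs]
step_ys = [0.0 if x < 5 else 10.0 for x in xs]

linear_on_line = mse(fit_linear(xs, line_ys), xs, line_ys)
stump_on_line = mse(fit_stump(xs, line_ys), xs, line_ys)
linear_on_step = mse(fit_linear(xs, step_ys), xs, step_ys)
stump_on_step = mse(fit_stump(xs, step_ys), xs, step_ys)
```

On this toy data the line fits the linear target essentially exactly but cannot represent the jump, while the stump does the reverse. That matches the general picture: a linear model assumes one global slope, whereas a tree predicts piecewise constants over learned splits. The sketch says nothing about fitting speed.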

Re: [scikit-learn] benchmarking TargetEncoder Was: ANN Dirty_cat: learning on dirty categories

2018-12-13 Thread Joris Van den Bossche
Hi all, I finally had some time to start looking at this over the last few days. Some preliminary work can be found here: https://github.com/jorisvandenbossche/target-encoder-benchmarks. So far, I have only done the groundwork to set up the benchmarks (based on Patricio Cerda's code, https://arxiv.org/p