Re: [scikit-learn] Continuous monitoring of benchmark performance

2019-07-23 Thread Jeremie du Boisberranger
> Isn't Jérémie's project at https://github.com/jeremiedbb/scikit-learn_benchmarks meant to be doing this? What's its status? How does it relate to Tom's work?

Yes, it's the same project. Tom kindly agreed to run it alongside other projects from the PyData ecosystem.

> I can't find it back

Re: [scikit-learn] Long term roadmap and moonshot goals

2019-07-23 Thread Adrin
It may be worth doing a user survey to get a feeling for what people care about; we may or may not take the results into account afterwards. Here's how Dask is doing it: https://github.com/dask/dask/issues/4748

On Sun, Jul 14, 2019 at 8:44 PM Andreas Mueller wrote:
> Hi all.
> At SciPy, Brian Granger r

Re: [scikit-learn] Long term roadmap and moonshot goals

2019-07-23 Thread Tom Augspurger
Pandas will be running one soon too: https://github.com/pandas-dev/pandas/issues/27477

It may be worth coordinating on questions so that we can compare communities (or combining surveys to reduce "survey fatigue" somehow? I haven't thought this through).

Tom

On Tue, Jul 23, 2019 at 6:54 AM Adrin

Re: [scikit-learn] Long term roadmap and moonshot goals

2019-07-23 Thread Piotr Szymański
If I could pitch in, it would be lovely, very lovely indeed, if scikit-learn models could:
- operate on sparse data, both input and output, by default
- implement some kind of sparse vector representation (as in https://github.com/scikit-learn/scikit-learn/issues/8908)
- perhaps have a unifying n

Re: [scikit-learn] Long term roadmap and moonshot goals

2019-07-23 Thread Andreas Mueller
We had one done in 2013 (wow!). I'll post the link to the internal mailing list since it could contain identifying information. Obviously the answers now would be quite different; I just thought it would be interesting to look at it again.

On 7/23/19 10:28 AM, Tom Augspurger wrote:
> Pandas will be r

Re: [scikit-learn] Long term roadmap and moonshot goals

2019-07-23 Thread Andreas Mueller
Can you give an example? I imagine that just supporting the data structure will not give you any speed benefit unless the algorithms are reimplemented to take advantage of the problem structure. Even if the output of logistic regression were a sparse binary vector, you'd still need to comp
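A small sketch of the point about compute, under stated assumptions (the toy random matrix and the arbitrary alternating labels are illustrative only, not from the thread): `LogisticRegression` happily accepts a `scipy.sparse` CSR matrix, but `decision_function` and `predict` still return dense NumPy arrays, and the score `X @ coef_ + intercept_` is evaluated for every sample regardless of how sparse the input or the 0/1 output is.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.linear_model import LogisticRegression

# Toy sparse input (assumption): 200 samples, 50 features, ~5% non-zeros.
X = sp.random(200, 50, density=0.05, format="csr", random_state=0)
y = np.arange(200) % 2  # arbitrary labels, only so the model can be fitted

clf = LogisticRegression().fit(X, y)

# Sparse input is accepted, but the scores come back as a dense array with
# one entry per sample: a dot product is computed for every row.
scores = clf.decision_function(X)
manual = X @ clf.coef_.ravel() + clf.intercept_[0]

# The predicted labels are a 0/1 vector, yet they are also stored densely.
labels = clf.predict(X)
print(type(scores), scores.shape, type(labels), labels.shape)
```

Storing `labels` sparsely afterwards would save memory, but not the per-row work that produced `scores`.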

[scikit-learn] Random Forest without target to measure feature importance

2019-07-23 Thread Gabor Toth
Hello, I would like to use a Random Forest classifier to assess the importance of features (bag-of-words), but I don't have any predefined class labels or any test data. I previously used ExtraTreesClassifier() with fit_transform, which is not available anymore (see below). I am wondering how I cou
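One possible approach, not proposed in this thread, is a variant of Breiman and Cutler's "unsupervised random forest" trick: treat the real rows as class 1, build a synthetic class 0 by shuffling each column independently (which keeps the per-feature marginals but destroys dependencies between features), train a forest to tell the two apart, and read `feature_importances_`. For the removed `fit_transform`, `SelectFromModel` is the replacement in current scikit-learn. The helper name `unsupervised_importances` and the toy data are hypothetical, and the sketch assumes a dense array; a sparse bag-of-words matrix would need `.toarray()`, which is only practical for modest sizes.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel


def unsupervised_importances(X, n_estimators=200, random_state=0):
    """Hypothetical helper: fit a forest that separates real rows from
    column-shuffled synthetic rows, so no class labels are needed."""
    rng = np.random.RandomState(random_state)
    X = np.asarray(X, dtype=float)  # assumes dense input
    # Shuffle every column independently: same marginals, no joint structure.
    X_synth = np.column_stack([rng.permutation(col) for col in X.T])
    X_all = np.vstack([X, X_synth])
    y_all = np.concatenate([np.ones(len(X)), np.zeros(len(X_synth))])
    clf = ExtraTreesClassifier(n_estimators=n_estimators,
                               random_state=random_state)
    return clf.fit(X_all, y_all)


# Toy data (assumption): features 0 and 1 are strongly dependent,
# feature 2 is independent noise.
rng = np.random.RandomState(42)
x0 = rng.rand(300)
X = np.column_stack([x0, x0 + 0.01 * rng.randn(300), rng.rand(300)])

clf = unsupervised_importances(X)
importances = clf.feature_importances_
print(importances)

# Stand-in for the old fit_transform: keep features whose importance is
# above the default threshold (the mean importance for tree ensembles).
X_reduced = SelectFromModel(clf, prefit=True).transform(X)
```

Because the synthetic class only differs from the real data in how features co-vary, the resulting importances reflect which features take part in that joint structure, which is a different notion from importance for predicting a real target.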