Dear scikit-learn mailing list,

Similarly to the existing *RFE and RFECV* (https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html), this is a request to openly discuss the *PROPOSAL* and requirements for *feature_selection.EFS and/or EFSCV*, which would stand for "Evolutionary Feature Selection", starting with the 8 algorithms/methods published in IEEE (https://arxiv.org/abs/2303.10182), to be used with scikit-learn estimators. The authors of the paper (in cc) have agreed to help integrate it.
*PROPOSAL* Implement/integrate the https://arxiv.org/abs/2303.10182 paper into scikit-learn:

*1) CODE* - implement *feature_selection.EFS and/or EFSCV* (a space for the evolutionary computing community interested in feature selection).

RFE currently provides:

feature_selection.RFE(estimator, *[, ...])
    Feature ranking with recursive feature elimination.
    https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html

feature_selection.RFECV(estimator, *[, ...])
    Recursive feature elimination with cross-validation to select features.
    https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html

The "EFS" could be:

feature_selection.EFS(estimator, *[, ...])
    Feature ranking and feature elimination with *8 different algorithms, SFE, SFE-PSO* etc. <- new algorithms from evolutionary computing, swarm, genetic etc. could be added and benchmarked.

feature_selection.EFSCV(estimator, *[, ...])
    Feature elimination with cross-validation to select features.

*2) DATASETS & CANCER BENCHMARK* - curate and integrate a fetcher for the *cancer_benchmark* of 40 datasets, either directly in scikit-learn or externally pullable somehow and maintained (a space for contributing and expanding high-dimensional datasets on cancer topics):

fetch_cancer_benchmark(*[, ...])
    Loads 40 individual cancer-related high-dimensional datasets for benchmarking feature selection methods (classification).
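To make the proposal concrete, here is a minimal sketch of what a feature_selection.EFS class could look like. Only the class name and the (estimator, ...) signature come from the proposal above; the parameter names (n_iter, cv, random_state) and the internal search loop - a toy (1+1)-style evolutionary mutation - are illustrative stand-ins, NOT the SFE algorithm from the paper:

```python
# Toy sketch of the proposed feature_selection.EFS API. The search loop
# is a simple (1+1)-style mutation scheme, only a placeholder for the
# paper's 8 algorithms (SFE, SFE-PSO, ...).
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.feature_selection import SelectorMixin
from sklearn.model_selection import cross_val_score


class EFS(SelectorMixin, BaseEstimator):
    """Evolutionary feature selection (illustrative sketch)."""

    def __init__(self, estimator, n_iter=30, cv=3, random_state=None):
        self.estimator = estimator
        self.n_iter = n_iter
        self.cv = cv
        self.random_state = random_state

    def _score(self, X, y, mask):
        # CV score of the wrapped estimator on the selected features only
        return cross_val_score(clone(self.estimator), X[:, mask], y,
                               cv=self.cv).mean()

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        n_features = X.shape[1]
        # start from a random feature subset (guaranteed non-empty)
        mask = rng.random(n_features) < 0.5
        if not mask.any():
            mask[rng.integers(n_features)] = True
        best = self._score(X, y, mask)
        for _ in range(self.n_iter):
            # mutate: flip one feature in/out of the subset
            cand = mask.copy()
            j = rng.integers(n_features)
            cand[j] = not cand[j]
            if not cand.any():
                continue
            score = self._score(X, y, cand)
            # accept if strictly better, or equal with fewer features
            # (prefer small predictive subsets, as the paper targets)
            if score > best or (score == best and cand.sum() < mask.sum()):
                mask, best = cand, score
        self.support_ = mask
        self.n_features_ = int(mask.sum())
        self.cv_score_ = best
        return self

    def _get_support_mask(self):
        # SelectorMixin uses this for get_support() and transform()
        return self.support_
```

Because it inherits SelectorMixin, such a class would get get_support() and transform() for free and slot into Pipelines exactly like RFE does; an EFSCV variant would additionally select the subset size by cross-validation, mirroring RFECV.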
*3) TUTORIAL / WEBSITE* - write a tutorial replicating the IEEE paper results with *feature_selection.EFS and/or EFSCV* on the *cancer_benchmark (40 datasets)*.

I have identified the IEEE work https://arxiv.org/abs/2303.10182 as a very interesting novelty for working with high-dimensional datasets, as it reports small subsets of predictive features selected with SVM and KNN across 40 datasets. Replicability under BSD-3 and the quality standards of scikit-learn could make benchmarking novel feature selection algorithms easier - in my first opinion. Since this is my very first contact with both the IEEE paper authors and the scikit-learn list, we would welcome some help/guidance on how the integration could work, and whether there is any interest along this line at all.

Kind regards
Dalibor Hrg
https://www.linkedin.com/in/daliborhrg/

On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort <alexandre.gramf...@inria.fr> wrote:

> Dear Dalibor
>
> you should discuss this on the main scikit-learn mailing list.
>
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> Alex
>
> On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor....@gmail.com> wrote:
>
>> Dear sklearn feature_selection.RFE Team and IEEE Authors (in cc),
>>
>> This is a request to openly discuss the idea of a potential
>> feature_selection.EFS, which would stand for "Evolutionary Feature
>> Selection" or shortly EFS, starting with the 8 algorithms published in
>> IEEE https://arxiv.org/abs/2303.10182 by the authors on
>> high-dimensional datasets. I have identified this work to be of very
>> interesting novelty for working with high-dimensional datasets,
>> especially in the health fields, and it could mean a lot to the ML
>> community and the scikit-learn project - in my first opinion.
>>
>> A Jupyter Notebook and scikit-learn tutorial replicating this IEEE
>> paper/work as feature_selection.EFS and its 8 algorithms could be a
>> near-term goal. Eventually, scikit-learn EFSCV and diverse
>> classification algorithms could be benchmarked for a "joint paper" in
>> JOSS, or a health journal.
>>
>> My initial idea (it doesn't need to be that way and is open to
>> discussion) is along these lines:
>>
>> RFE has:
>>
>> feature_selection.RFE(estimator, *[, ...])
>>     Feature ranking with recursive feature elimination.
>>
>> feature_selection.RFECV(estimator, *[, ...])
>>     Recursive feature elimination with cross-validation to select
>>     features.
>>
>> The "EFS" could have:
>>
>> feature_selection.EFS(estimator, *[, ...])
>>     Feature ranking and feature elimination with *8 different
>>     algorithms, SFE, SFE-PSO* etc. <- new algorithms from evolutionary
>>     computing, swarm, genetic etc. could be added and benchmarked.
>>
>> feature_selection.EFSCV(estimator, *[, ...])
>>     Feature elimination with cross-validation to select features.
>>
>> Looking forward to an open discussion on whether Evolutionary Feature
>> Selection (EFS) is something for the sklearn project, or maybe a
>> separate pip-installable package.
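For reference, the RFE/RFECV usage pattern that the proposed EFS/EFSCV classes would mirror is the standard one below; this snippet uses only the existing, released scikit-learn API, nothing hypothetical:

```python
# The existing RFE/RFECV pattern that the proposed EFS/EFSCV would
# mirror: wrap an estimator, fit, then inspect support_/ranking_.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE, RFECV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features

# Fixed-size selection: keep the 10 best-ranked features.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=10).fit(X, y)
print(rfe.support_.sum())   # -> 10 (selected features are marked True)

# Cross-validated selection: the subset size itself is chosen by CV.
rfecv = RFECV(SVC(kernel="linear"), step=5, cv=3).fit(X, y)
print(rfecv.n_features_)    # CV-chosen number of features (1..30)
```

An EFS/EFSCV pair with the same constructor shape and fitted attributes (support_, ranking_, n_features_) would be drop-in familiar to existing RFE users.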
>>
>> Kind regards
>> Dalibor Hrg
>> https://www.linkedin.com/in/daliborhrg/
>>
>> On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade <b.ahadz...@yahoo.com> wrote:
>>
>>> Dear Dalibor Hrg,
>>>
>>> Thank you very much for your attention to the SFE algorithm, and for
>>> the time you took to guide me and my colleagues. Following your
>>> guidance, we will add this algorithm to the scikit-learn library as
>>> soon as possible.
>>>
>>> Kind regards,
>>> Ahadzadeh.
>>>
>>> On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30, Dalibor Hrg <dalibor....@gmail.com> wrote:
>>>
>>> Dear Authors,
>>>
>>> You have done some amazing work on feature selection, published in
>>> IEEE: https://arxiv.org/abs/2303.10182 . I have noticed the Python
>>> code here, without a LICENSE file or any licensing info:
>>> https://github.com/Ahadzadeh2022/SFE , and in the paper some links
>>> are mentioned to download the data.
>>>
>>> I would be interested in working with you to:
>>>
>>> Step 1) make and release a pip package, publish the code in JOSS
>>> (https://joss.readthedocs.io , e.g.
>>> https://joss.theoj.org/papers/10.21105/joss.04611) under the BSD-3
>>> license, and replicate the IEEE paper's table results. All 8
>>> algorithms could potentially live in one class "EFS", meaning
>>> "Evolutionary Feature Selection", selectable as 8 options, among
>>> them SFE. Or something like that.
>>>
>>> Step 2) try to integrate this and work with the scikit-learn people.
>>> I would recommend integrating it under
>>> https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection ,
>>> similarly to sklearn.feature_selection.RFE. I believe this would be
>>> a great contribution to the best open library for ML, scikit-learn.
>>>
>>> I am unsure what the status of the datasets and their licenses is.
>>> But the datasets could be fetched externally from the OpenML.org
>>> repository, for example
>>> https://scikit-learn.org/stable/datasets/loading_other_datasets.html ,
>>> or from CERN Zenodo, where "benchmark datasets" could be expanded.
>>> It depends a bit on the dataset licenses.
>>>
>>> Overall, I hope this can hugely maximize the visibility of your
>>> published work, and also let others credit you in papers in a more
>>> citable and replicable way. I believe your IEEE paper and work
>>> definitely deserve a spot in scikit-learn. There is a need for
>>> replicable code on "Evolutionary Methods for Feature Selection" and
>>> for such a benchmark on life-science datasets, and you have done
>>> great work so far.
>>>
>>> Let me know what you think.
>>>
>>> Best regards,
>>> Dalibor Hrg
>>> https://www.linkedin.com/in/daliborhrg/
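A sketch of how the externally-pullable fetcher discussed in the thread might be organized: a small registry of pinned OpenML dataset IDs behind a fetch_cancer_benchmark helper. The helper name comes from the proposal; the registry is deliberately left empty because the 40 curated dataset IDs are not specified anywhere in this thread:

```python
# Hypothetical fetch_cancer_benchmark helper built on the real
# sklearn.datasets.fetch_openml. The ID registry is a placeholder.
from sklearn.datasets import fetch_openml

# Placeholder registry: name -> OpenML data_id for the 40 curated
# cancer-related high-dimensional datasets (entries to be filled in
# once the datasets and their licenses are settled).
CANCER_BENCHMARK_IDS: dict[str, int] = {}


def fetch_cancer_benchmark(name, **kwargs):
    """Fetch one benchmark dataset by name from OpenML."""
    if name not in CANCER_BENCHMARK_IDS:
        known = sorted(CANCER_BENCHMARK_IDS)
        raise KeyError(f"unknown benchmark dataset {name!r}; known: {known}")
    # fetch_openml downloads once and caches under ~/scikit_learn_data
    return fetch_openml(data_id=CANCER_BENCHMARK_IDS[name],
                        as_frame=False, **kwargs)
```

Pinning numeric data_id values (rather than names) keeps the benchmark reproducible even if OpenML dataset names or versions change; the same registry pattern would work for Zenodo DOIs.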
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn