Starting with Efroymson's stepwise regression, the selection of relevant regressors has a long history. Efroymson's case is, of course, an old and simple instance within a much wider set of problems in which the number of variables and the missingness pattern make selection very hard to tackle. I have read the paper, which seems to me to be grounded in a wide review of the literature and an in-depth treatment of the main extant algorithms. I do not consider myself an expert on the matter. However, the subject is so important that, given the thorough analysis the authors performed, I think this enterprise worthwhile. My best regards. Ulderico Santarelli.
On Sun, Sep 24, 2023 at 11:12, Dalibor Hrg <dalibor....@gmail.com> wrote:

> Dear scikit-learn mailing list,
>
> Similarly to the existing feature_selection.RFE and RFECV
> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>,
> this is a request to openly discuss the PROPOSAL and requirements of
> feature_selection.EFS and/or EFSCV, which would stand for "Evolutionary
> Feature Selection", starting with 8 algorithms/methods to be used with
> scikit-learn estimators, as published in IEEE
> https://arxiv.org/abs/2303.10182 by the authors of the paper. They have
> agreed to help integrate it (in cc).
>
> PROPOSAL
> Implement/integrate the https://arxiv.org/abs/2303.10182 paper into
> scikit-learn:
>
> 1) CODE
>
> - implement feature_selection.EFS and/or EFSCV (a space for the
> evolutionary computing community interested in feature selection)
>
> RFE is:
>
> feature_selection.RFE(estimator, *[, ...])
>     Feature ranking with recursive feature elimination.
>
> feature_selection.RFECV(estimator, *[, ...])
>     Recursive feature elimination with cross-validation to select
>     features.
>
> The "EFS" could be:
>
> feature_selection.EFS(estimator, *[, ...])
>     Feature ranking and feature elimination with 8 different algorithms
>     (SFE, SFE-PSO, etc.). New algorithms from evolutionary computing,
>     swarm, genetic, etc. could be added and benchmarked.
>
> feature_selection.EFSCV(estimator, *[, ...])
>     Feature elimination with cross-validation to select features.
>
> 2) DATASETS & CANCER BENCHMARK
>
> - curate and integrate a fetcher for the cancer_benchmark 40 datasets,
> either directly in scikit-learn or externally pullable and maintained (a
> space for contributing and expanding high-dimensional datasets on cancer
> topics).
>
> fetch_cancer_benchmark(*[, ...])
>     Loads 40 individual cancer-related high-dimensional datasets for
>     benchmarking feature selection methods (classification).
>
> 3) TUTORIAL / WEBSITE
>
> - write a tutorial replicating the IEEE paper results with
> feature_selection.EFS and/or EFSCV on cancer_benchmark (40 datasets).
>
> I have identified the IEEE work https://arxiv.org/abs/2303.10182 as a
> very interesting novelty for working with high-dimensional datasets, as
> it reports small subsets of predictive features selected with SVM and
> KNN across 40 datasets. Replicability under BSD-3 and scikit-learn's
> quality bar could make benchmarking novel feature selection algorithms
> easier - in my first opinion. Since this is the very first contact
> between myself, the IEEE paper authors, and the scikit-learn list
> altogether, we would welcome some help/guidance on how integration could
> work, and whether there is any interest along these lines at all.
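>
> To make the proposed API concrete, here is a minimal, hypothetical
> sketch. The class EFS below, its parameters, and the simple (1+1)
> evolutionary bit-mask search it uses are assumptions for discussion,
> standing in for the paper's 8 algorithms; none of this exists in
> scikit-learn today.
>
> import numpy as np
> from sklearn.base import BaseEstimator, clone
> from sklearn.feature_selection import SelectorMixin
> from sklearn.model_selection import cross_val_score
>
>
> class EFS(SelectorMixin, BaseEstimator):
>     """Hypothetical evolutionary feature selector (proposal sketch)."""
>
>     def __init__(self, estimator, n_iter=50, mutation_rate=0.05, cv=3,
>                  random_state=None):
>         self.estimator = estimator
>         self.n_iter = n_iter
>         self.mutation_rate = mutation_rate
>         self.cv = cv
>         self.random_state = random_state
>
>     def fit(self, X, y):
>         X = np.asarray(X)
>         rng = np.random.default_rng(self.random_state)
>         n_features = X.shape[1]
>         # Start from a random feature mask with at least one feature on.
>         mask = rng.random(n_features) < 0.5
>         if not mask.any():
>             mask[rng.integers(n_features)] = True
>         best = cross_val_score(clone(self.estimator), X[:, mask], y,
>                                cv=self.cv).mean()
>         for _ in range(self.n_iter):
>             # Mutate: flip each bit with probability mutation_rate.
>             flips = rng.random(n_features) < self.mutation_rate
>             cand = np.logical_xor(mask, flips)
>             if not cand.any():
>                 continue
>             score = cross_val_score(clone(self.estimator), X[:, cand], y,
>                                     cv=self.cv).mean()
>             # (1+1) selection: keep the mutant only if it is no worse.
>             if score >= best:
>                 mask, best = cand, score
>         self.support_ = mask
>         self.best_score_ = best
>         return self
>
>     def _get_support_mask(self):
>         return self.support_
>
> With this shape, selector = EFS(SVC(kernel="linear")),
> selector.fit(X, y), and selector.transform(X) would mirror the RFE
> interface, and a real implementation would dispatch among SFE, SFE-PSO,
> etc. via a parameter.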
>
> Kind regards,
> Dalibor Hrg
> https://www.linkedin.com/in/daliborhrg/
>
> On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort
> <alexandre.gramf...@inria.fr> wrote:
>
>> Dear Dalibor,
>>
>> you should discuss this on the main scikit-learn mailing list:
>>
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> Alex
>>
>> On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor....@gmail.com>
>> wrote:
>>
>>> Dear sklearn feature_selection.RFE Team and IEEE Authors (in cc),
>>>
>>> This is a request to openly discuss the idea of a potential
>>> feature_selection.EFS, which would stand for "Evolutionary Feature
>>> Selection" (EFS for short), starting with the 8 algorithms published
>>> in IEEE https://arxiv.org/abs/2303.10182 by the authors on
>>> high-dimensional datasets. I have identified this work as a very
>>> interesting novelty for working with high-dimensional datasets,
>>> especially in health fields, and it could mean a lot to the ML
>>> community and the scikit-learn project - in my first opinion.
>>>
>>> A Jupyter Notebook and scikit-learn tutorial replicating this IEEE
>>> paper/work as feature_selection.EFS with its 8 algorithms could be a
>>> near-term goal. Eventually, scikit-learn EFSCV and diverse
>>> classification algorithms could be benchmarked for a "joint paper" in
>>> JOSS or a health journal.
>>>
>>> My initial idea (open to discussion; it doesn't need to be this way)
>>> is roughly:
>>>
>>> RFE has:
>>>
>>> feature_selection.RFE(estimator, *[, ...])
>>>     Feature ranking with recursive feature elimination.
>>>
>>> feature_selection.RFECV(estimator, *[, ...])
>>>     Recursive feature elimination with cross-validation to select
>>>     features.
>>>
>>> The "EFS" could have:
>>>
>>> feature_selection.EFS(estimator, *[, ...])
>>>     Feature ranking and feature elimination with 8 different
>>>     algorithms (SFE, SFE-PSO, etc.). New algorithms from evolutionary
>>>     computing, swarm, genetic, etc. could be added and benchmarked.
>>>
>>> feature_selection.EFSCV(estimator, *[, ...])
>>>     Feature elimination with cross-validation to select features.
>>>
>>> Looking forward to an open discussion on whether Evolutionary Feature
>>> Selection (EFS) is something for the sklearn project, or perhaps a
>>> separate pip-installable package.
>>>
>>> Kind regards,
>>> Dalibor Hrg
>>> https://www.linkedin.com/in/daliborhrg/
>>>
>>> On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade
>>> <b.ahadz...@yahoo.com> wrote:
>>>
>>>> Dear Dalibor Hrg,
>>>>
>>>> Thank you very much for your attention to the SFE algorithm, and for
>>>> the time you took to guide me and my colleagues. Following your
>>>> guidance, we will add this algorithm to the scikit-learn library as
>>>> soon as possible.
>>>>
>>>> Kind regards,
>>>> Ahadzadeh.
>>>>
>>>> On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30, Dalibor Hrg
>>>> <dalibor....@gmail.com> wrote:
>>>>
>>>> Dear Authors,
>>>>
>>>> you have done some amazing work on feature selection, published in
>>>> IEEE: https://arxiv.org/abs/2303.10182 . I have noticed Python code
>>>> here, without a LICENSE file or any licensing info:
>>>> https://github.com/Ahadzadeh2022/SFE ; the paper also mentions some
>>>> links for downloading the data.
>>>>
>>>> I would be interested in working with you to:
>>>>
>>>> Step 1) Make and release a pip package, publish the code in JOSS
>>>> (https://joss.readthedocs.io, e.g.
>>>> https://joss.theoj.org/papers/10.21105/joss.04611) under the BSD-3
>>>> license, and replicate the IEEE paper's table results. All 8
>>>> algorithms could potentially live in one class "EFS", meaning
>>>> "Evolutionary Feature Selection", selectable as 8 options, among them
>>>> SFE. Or something like that.
>>>>
>>>> Step 2) Try to integrate and work with the scikit-learn people. I
>>>> would recommend integrating this under
>>>> https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection,
>>>> similarly to sklearn.feature_selection.RFE. I believe this would be a
>>>> great contribution to the best open library for ML, scikit-learn.
>>>>
>>>> I am unsure what the status of the datasets and their licenses is,
>>>> but the datasets could be fetched externally, for example from the
>>>> OpenML.org repository
>>>> (https://scikit-learn.org/stable/datasets/loading_other_datasets.html)
>>>> or from CERN Zenodo, where the "benchmark datasets" could be
>>>> expanded. It depends a bit on the dataset licenses.
>>>>
>>>> Overall, I hope this can greatly increase the visibility of your
>>>> published work, and also let others credit you in papers in a more
>>>> citable and replicable way. I believe your IEEE paper and work
>>>> definitely deserve a spot in scikit-learn. There is a need for
>>>> replicable code on "Evolutionary Methods for Feature Selection" and
>>>> such a benchmark on life-science datasets, and you have done great
>>>> work so far.
>>>>
>>>> Let me know what you think.
>>>>
>>>> Best regards,
>>>> Dalibor Hrg
>>>> https://www.linkedin.com/in/daliborhrg/
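>>>>
>>>> P.S. As a concrete illustration of the external-fetch route, a
>>>> dataset already hosted on OpenML can be pulled with scikit-learn's
>>>> existing fetch_openml loader. The dataset name below is only an
>>>> illustrative example of a high-dimensional classification set, not
>>>> one of the paper's 40 cancer benchmarks:
>>>>
>>>> from sklearn.datasets import fetch_openml
>>>>
>>>> # Fetch an OpenML dataset by name; "gina_agnostic" is used here only
>>>> # as an example of a high-dimensional classification task.
>>>> X, y = fetch_openml("gina_agnostic", version=1, return_X_y=True,
>>>>                     as_frame=False)
>>>> print(X.shape)  # many features relative to the number of samples
>>>>
>>>> A curated cancer_benchmark fetcher could follow the same pattern,
>>>> mapping the 40 dataset names to OpenML or Zenodo records.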