Starting with Efroymson's stepwise regression, the selection of relevant
regressors has a long history. Of course, Efroymson's case is an old and
simple one within a much wider set of problems where the number of
variables and the missingness pattern make things very hard to tackle.
I had a look at the paper, which seems to be based on a wide review of
the literature and an in-depth treatment of the main extant algorithms. I
am not an expert on the matter. However, the subject is so important
that, in view of the thorough analysis the authors performed, I think
this enterprise is worthwhile.
My best regards. Ulderico Santarelli.

On Sun, Sep 24, 2023 at 11:12 Dalibor Hrg <dalibor....@gmail.com> wrote:

> Dear scikit-learn mailing list
>
> Similarly to the existing *feature_selection.RFE and RFECV*
> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>,
> this is a request to openly discuss the *PROPOSAL* and requirements of
> *feature_selection.EFS and/or EFSCV*, which would stand for "Evolutionary
> Feature Selection", starting with the 8 algorithms/methods published in
> IEEE (https://arxiv.org/abs/2303.10182), to be used with scikit-learn
> estimators. The authors of the paper agreed to help integrate it (in cc).
>
> *PROPOSAL*
> Implement/integrate https://arxiv.org/abs/2303.10182 paper into
> scikit-learn:
>
> *1) CODE*
>
>    - implementing *feature_selection.EFS and/or EFSCV* (a space for the
>    evolutionary computing community interested in feature selection)
>
> RFE is:
>
> feature_selection.*RFE*(estimator, *[, ...])
> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>
>     Feature ranking with recursive feature elimination.
>
> feature_selection.*RFECV*(estimator, *[, ...])
> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html#sklearn.feature_selection.RFECV>
>     Recursive feature elimination with cross-validation to select features.
>
> The "EFS" could be:
>
> feature_selection.*EFS*(estimator, *[, ...])
>     Feature ranking and feature elimination with 8 different algorithms
>     (*SFE*, *SFE-PSO*, etc.) <- new algorithms from evolutionary
>     computing, swarm, genetic, etc. could be added and benchmarked here.
>
> feature_selection.*EFSCV*(estimator, *[, ...])
>     Feature elimination with cross-validation to select features.
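>
> To make this concrete, here is a minimal sketch of how an EFS selector
> might look if it follows scikit-learn's SelectorMixin/BaseEstimator
> conventions. This is an assumption of mine, not the paper's code: the
> class name, the "method" parameter, and the bit-flip hill climb inside
> fit() (standing in for the actual evolutionary operators) are all
> hypothetical:
>
>     # Hypothetical sketch; EFS does not exist in scikit-learn today.
>     import numpy as np
>     from sklearn.base import BaseEstimator
>     from sklearn.feature_selection import SelectorMixin
>     from sklearn.model_selection import cross_val_score
>     from sklearn.utils import check_X_y
>
>     class EFS(SelectorMixin, BaseEstimator):
>         """Evolutionary feature selection (sketch). `method` would pick
>         one of the paper's 8 algorithms, e.g. "SFE" or "SFE-PSO"."""
>
>         def __init__(self, estimator, method="SFE", n_iter=50,
>                      random_state=None):
>             self.estimator = estimator
>             self.method = method
>             self.n_iter = n_iter
>             self.random_state = random_state
>
>         def fit(self, X, y):
>             X, y = check_X_y(X, y)
>             rng = np.random.default_rng(self.random_state)
>             n_features = X.shape[1]
>             # Placeholder bit-flip hill climb standing in for the real
>             # evolutionary operators (SFE's selection/mutation, PSO, ...).
>             best = np.ones(n_features, dtype=bool)
>             best_score = cross_val_score(self.estimator, X, y, cv=3).mean()
>             for _ in range(self.n_iter):
>                 cand = best.copy()
>                 cand[rng.integers(n_features)] ^= True  # flip one feature
>                 if not cand.any():
>                     continue
>                 score = cross_val_score(self.estimator, X[:, cand], y,
>                                         cv=3).mean()
>                 if score >= best_score:
>                     best, best_score = cand, score
>             self.support_ = best
>             self.n_features_ = int(best.sum())
>             return self
>
>         def _get_support_mask(self):
>             return self.support_
>
> An EFSCV variant could then, mirroring RFECV, add scoring/cv parameters
> and expose cross-validation results.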
>
> *2) DATASETS & CANCER BENCHMARK*
>
>    - curating and integrating a fetcher for the *cancer_benchmark* of 40
>    datasets, hosted directly in scikit-learn or pullable from an external,
>    maintained source (a space for contributing and expanding
>    high-dimensional datasets on cancer topics).
>
> fetch_cancer_benchmark(*[, ...])
>     Loads 40 individual cancer-related high-dimensional datasets for
>     benchmarking feature selection methods (classification).
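>
> Until such a fetcher exists, something close to it could be sketched with
> the existing sklearn.datasets.fetch_openml loader. This is a hypothetical
> sketch: the function and the dataset names in the list are placeholders,
> and whether each benchmark dataset is actually on OpenML (and under which
> name/version) would need to be checked:
>
>     from sklearn.datasets import fetch_openml
>
>     # Placeholder names standing in for the 40 benchmark datasets.
>     CANCER_BENCHMARK = ["leukemia", "colon", "prostate"]
>
>     def fetch_cancer_benchmark(names=CANCER_BENCHMARK):
>         """Yield (name, X, y) for each benchmark dataset from OpenML."""
>         for name in names:
>             X, y = fetch_openml(name=name, version=1, return_X_y=True,
>                                 as_frame=False)
>             yield name, X, y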
>
> *3) TUTORIAL / WEBSITE*
>
>    - writing a tutorial to replicate the IEEE paper results with
>    *feature_selection.EFS and/or EFSCV* on the *cancer_benchmark* (40
>    datasets); a rough end-to-end sketch follows below.
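>
> As a rough idea, one tutorial cell per dataset could then read like the
> following (EFS and fetch_cancer_benchmark are the hypothetical APIs
> sketched above):
>
>     from sklearn.model_selection import cross_val_score
>     from sklearn.svm import SVC
>
>     for name, X, y in fetch_cancer_benchmark():
>         selector = EFS(estimator=SVC(), method="SFE").fit(X, y)
>         # Note: for an unbiased benchmark the selection should run inside
>         # the CV loop (e.g. in a Pipeline); this is just illustrative.
>         score = cross_val_score(SVC(), selector.transform(X), y, cv=5).mean()
>         print(f"{name}: {selector.n_features_} features, CV acc {score:.3f}")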
>
>
> I have identified the IEEE work https://arxiv.org/abs/2303.10182 as being
> of interesting novelty for working with high-dimensional datasets, as it
> reports small subsets of predictive features selected with SVM and KNN
> across 40 datasets. Replicability under BSD-3 and scikit-learn's quality
> standards could make benchmarking of novel feature selection algorithms
> easier - in my initial opinion. Since this is my very first contact with
> both the IEEE paper authors and the scikit-learn list, we would welcome
> some help/guidance on how integration could work out, and whether there
> is any interest along that line at all.
>
> Kind regards
> Dalibor Hrg
> https://www.linkedin.com/in/daliborhrg/
>
>
> On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort <
> alexandre.gramf...@inria.fr> wrote:
>
>> Dear Dalibor
>>
>> you should discuss this on the main scikit-learn mailing list.
>>
>> https://mail.python.org/mailman/listinfo/scikit-learn
>>
>> Alex
>>
>> On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor....@gmail.com>
>> wrote:
>>
>>> Dear sklearn feature_selection.RFE Team and IEEE Authors (in-cc),
>>>
>>> This is a request to openly discuss the idea of a potential
>>> *feature_selection.EFS*, which would stand for "Evolutionary Feature
>>> Selection" (EFS for short), starting with the 8 algorithms published in
>>> IEEE (https://arxiv.org/abs/2303.10182) by the authors for
>>> high-dimensional datasets. I find this work to be of interesting novelty
>>> for high-dimensional datasets, especially in health fields, and it could
>>> mean a lot to the ML community and the scikit-learn project - in my
>>> initial opinion.
>>>
>>> A Jupyter notebook and scikit-learn tutorial replicating this IEEE
>>> paper/work as *feature_selection.EFS* with its 8 algorithms could be a
>>> near-term goal. Eventually, scikit-learn EFSCV and diverse
>>> classification algorithms could be benchmarked for a "joint paper" in
>>> JOSS or a health journal.
>>>
>>> My initial idea (it doesn't have to be this way and is open to
>>> discussion) looks like this:
>>>
>>> RFE has:
>>>
>>> feature_selection.*RFE*(estimator, *[, ...])
>>> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>
>>>     Feature ranking with recursive feature elimination.
>>>
>>> feature_selection.*RFECV*(estimator, *[, ...])
>>> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html#sklearn.feature_selection.RFECV>
>>>     Recursive feature elimination with cross-validation to select features.
>>>
>>> The "EFS" could have:
>>>
>>> feature_selection.*EFS*(estimator, *[, ...])
>>>     Feature ranking and feature elimination with 8 different algorithms
>>>     (*SFE*, *SFE-PSO*, etc.) <- new algorithms from evolutionary
>>>     computing, swarm, genetic, etc. could be added and benchmarked here.
>>>
>>> feature_selection.*EFSCV*(estimator, *[, ...])
>>>     Feature elimination with cross-validation to select features.
>>> Looking forward to an open discussion, and to hearing whether
>>> Evolutionary Feature Selection (EFS) is something for the sklearn
>>> project, or perhaps better as a separate pip-installable package.
>>>
>>> Kind regards
>>> Dalibor Hrg
>>> https://www.linkedin.com/in/daliborhrg/
>>>
>>> On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade <b.ahadz...@yahoo.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> Dear Dalibor Hrg,
>>>>
>>>> Thank you very much for your attention to the SFE algorithm, and for
>>>> the time you took to guide me and my colleagues. Following your
>>>> guidance, we will add this algorithm to the scikit-learn library as
>>>> soon as possible.
>>>>
>>>> Kind regards,
>>>> Ahadzadeh.
>>>> On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30, Dalibor Hrg <
>>>> dalibor....@gmail.com> wrote:
>>>>
>>>>
>>>> Dear Authors,
>>>>
>>>> You have done some amazing work on feature selection, published in
>>>> IEEE: https://arxiv.org/abs/2303.10182. I have noticed that the Python
>>>> code at https://github.com/Ahadzadeh2022/SFE has no LICENSE file or any
>>>> licensing information, and that the paper mentions some links for
>>>> downloading the data.
>>>>
>>>> I would be interested in working with you so that we:
>>>>
>>>> Step 1) Make and release a pip package, publish this code in JOSS
>>>> (https://joss.readthedocs.io, e.g.
>>>> https://joss.theoj.org/papers/10.21105/joss.04611) under the BSD-3
>>>> license, and replicate the IEEE paper's table results. All 8 algorithms
>>>> could potentially live in one class "EFS", meaning "Evolutionary
>>>> Feature Selection", selectable as 8 options, among them SFE. Or
>>>> something like that; a rough sketch follows below.
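>>>>
>>>> Roughly, the single-class idea could look like this; the class name,
>>>> the `method` parameter, and the option strings are assumptions, not a
>>>> settled API:
>>>>
>>>>     from sklearn.svm import SVC
>>>>
>>>>     # One class exposing the paper's 8 algorithms as options.
>>>>     selector = EFS(estimator=SVC(), method="SFE")  # or "SFE-PSO", ...
>>>>     selector.fit(X, y)  # X, y: some high-dimensional dataset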
>>>>
>>>> Step 2) Try to integrate it and work with the scikit-learn people. I
>>>> would recommend integrating this under
>>>> https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection,
>>>> similarly to sklearn.feature_selection.RFE. I believe this would be a
>>>> great contribution to the best open library for ML, scikit-learn.
>>>>
>>>> I am unsure about the status of the datasets and the licenses therein.
>>>> But the datasets could be fetched externally from the OpenML.org
>>>> repository (see
>>>> https://scikit-learn.org/stable/datasets/loading_other_datasets.html)
>>>> or from CERN's Zenodo, where the "benchmark datasets" could be expanded
>>>> (a fetch example follows below). It depends a bit on the dataset
>>>> licenses.
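>>>>
>>>> For example, if a given dataset turns out to be available on OpenML
>>>> under a known name, it could be pulled with the existing fetch_openml
>>>> loader (the dataset name below is only an illustration, not a verified
>>>> OpenML entry):
>>>>
>>>>     from sklearn.datasets import fetch_openml
>>>>
>>>>     X, y = fetch_openml(name="leukemia", version=1, return_X_y=True,
>>>>                         as_frame=False)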
>>>>
>>>> Overall, I hope this can greatly increase the visibility of your
>>>> published work, and also let others credit you in papers in a more
>>>> citable and replicable way. I believe your IEEE paper and work
>>>> definitely deserve a spot in scikit-learn. There is a need for
>>>> replicable code on "Evolutionary Methods for Feature Selection" and for
>>>> such a benchmark on life-science datasets, and you have done great work
>>>> so far.
>>>>
>>>> Let me know what you think.
>>>>
>>>> Best regards,
>>>> Dalibor Hrg
>>>>
>>>> https://www.linkedin.com/in/daliborhrg/
>>>>
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
