Dear scikit-learn mailing list,

Similarly to the existing feature_selection.*RFE and RFECV*
<https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>,
this is a request to openly discuss the *PROPOSAL* and requirements for
*feature_selection.EFS and/or EFSCV*, which would stand for "Evolutionary
Feature Selection", starting with the 8 algorithms/methods published in the
IEEE paper https://arxiv.org/abs/2303.10182 and usable with scikit-learn
estimators. The paper's authors (in cc) have agreed to help integrate it.

*PROPOSAL*
Implement/integrate the paper https://arxiv.org/abs/2303.10182 into
scikit-learn:

*1) CODE*

   - implementing *feature_selection.EFS and/or EFSCV* (a space for the
   evolutionary computing community interested in feature selection)

For reference, RFE is:

feature_selection.*RFE*(estimator, *[, ...])
<https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>

Feature ranking with recursive feature elimination.

feature_selection.*RFECV*(estimator, *[, ...])
<https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html#sklearn.feature_selection.RFECV>

Recursive feature elimination with cross-validation to select features.

The "EFS" could be:

feature_selection.*EFS*(estimator, *[, ...])

Feature ranking and feature elimination with *8 different algorithms (SFE,
SFE-PSO, etc.)* *<- new algorithms (evolutionary computing, swarm, genetic,
etc.) could be added and benchmarked over time.*

feature_selection.*EFSCV*(estimator, *[, ...])

Feature elimination with cross-validation to select features.
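
To make the proposal a bit more concrete, here is a minimal, hedged sketch
of what such a selector could look like if it followed scikit-learn's
SelectorMixin conventions. Everything below is an assumption for
illustration only: the class name EFS, the method="sfe" option and the
crude bit-flip search are placeholders, not the paper's algorithm and not
an existing scikit-learn API.

import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.feature_selection import SelectorMixin
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_X_y


class EFS(SelectorMixin, BaseEstimator):
    """Illustrative evolutionary feature selector (not the paper's code)."""

    def __init__(self, estimator, method="sfe", n_iter=50, cv=3, random_state=None):
        self.estimator = estimator
        self.method = method  # assumed option names: "sfe", "sfe_pso", ...
        self.n_iter = n_iter
        self.cv = cv
        self.random_state = random_state

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        rng = np.random.default_rng(self.random_state)
        n_features = X.shape[1]

        # Start from a random feature mask and keep single-bit flips that do
        # not hurt the cross-validated score (a crude stand-in for the SFE
        # search operators described in the paper).
        mask = rng.integers(0, 2, size=n_features).astype(bool)
        if not mask.any():
            mask[rng.integers(n_features)] = True
        best = cross_val_score(clone(self.estimator), X[:, mask], y, cv=self.cv).mean()

        for _ in range(self.n_iter):
            candidate = mask.copy()
            candidate[rng.integers(n_features)] ^= True  # flip one feature bit
            if not candidate.any():
                continue
            score = cross_val_score(clone(self.estimator), X[:, candidate], y, cv=self.cv).mean()
            if score >= best:
                mask, best = candidate, score

        self.support_ = mask
        self.n_features_ = int(mask.sum())
        return self

    def _get_support_mask(self):
        return self.support_

An EFSCV variant could then mirror RFECV, e.g. choosing the number of
selected features (or the stopping criterion) via cross-validation.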

*2) DATASETS & CANCER BENCHMARK*

   - curating and integrating a fetcher for the *cancer_benchmark* (40
   datasets), either hosted directly in scikit-learn or pullable from an
   external, maintained source (a space for contributing and expanding
   high-dimensional datasets on cancer topics).

fetch_cancer_benchmark(*[, ...])

Loads 40 individual cancer-related high-dimensional datasets for
benchmarking feature selection methods (classification).
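
Until such a fetcher exists, one hedged option (also touched on later in
this thread) would be to pull the individual datasets from OpenML and cache
them locally. The sketch below is illustrative only: fetch_cancer_benchmark
is not an existing scikit-learn function, and the dataset names in the
example call are placeholders, not the paper's dataset list.

from sklearn.datasets import fetch_openml


def fetch_cancer_benchmark(names, data_home=None):
    """Hypothetical helper: download a list of OpenML datasets by name."""
    datasets = {}
    for name in names:
        bunch = fetch_openml(name=name, as_frame=True, data_home=data_home)
        datasets[name] = (bunch.data, bunch.target)
    return datasets


# Example call with placeholder dataset names:
# benchmark = fetch_cancer_benchmark(["leukemia", "colon"])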

*3) TUTORIAL / WEBSITE*

   - writing a tutorial that replicates the IEEE paper results with
   *feature_selection.EFS and/or EFSCV* on the *cancer_benchmark (40
   datasets)*; a rough sketch of such a benchmark loop follows below.
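
As a rough, hedged sketch only (EFS and fetch_cancer_benchmark are the
hypothetical pieces proposed above, not existing APIs), the core of such a
tutorial might be a loop like this:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def run_benchmark(datasets, make_selector, cv=5):
    """Return the mean cross-validated accuracy per dataset for a selector."""
    results = {}
    for name, (X, y) in datasets.items():
        # Keep selection and classification inside one pipeline, so feature
        # selection is refit on each training fold and test data never leaks.
        pipe = make_pipeline(StandardScaler(), make_selector(), KNeighborsClassifier())
        results[name] = cross_val_score(pipe, X, y, cv=cv).mean()
    return results


# results = run_benchmark(benchmark, lambda: EFS(KNeighborsClassifier(), random_state=0))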


I find the IEEE work https://arxiv.org/abs/2303.10182 to be a very
interesting novelty for working with high-dimensional datasets, as it
reports small subsets of predictive features selected with SVM and KNN
across 40 datasets. Replicating it under the BSD-3 license, at
scikit-learn's level of quality, could make benchmarking novel feature
selection algorithms easier - at least in my initial opinion. Since this is
my first contact with both the IEEE paper authors and the scikit-learn
list, we would welcome some help/guidance on how the integration could work
out, and whether there is any interest in this at all.

Kind regards
Dalibor Hrg
https://www.linkedin.com/in/daliborhrg/


On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort <
alexandre.gramf...@inria.fr> wrote:

> Dear Dalibor
>
> you should discuss this on the main scikit-learn mailing list.
>
> https://mail.python.org/mailman/listinfo/scikit-learn
>
> Alex
>
> On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor....@gmail.com>
> wrote:
>
>> Dear sklearn feature_selection.RFE Team and IEEE Authors (in-cc),
>>
>> This is a request to openly discuss the idea of a potential
>> feature_selection.*EFS*, which would stand for "Evolutionary Feature
>> Selection" (EFS for short), starting with the 8 algorithms published in
>> IEEE https://arxiv.org/abs/2303.10182 by the authors for high-dimensional
>> datasets. I find this work to be a very interesting novelty for working
>> with high-dimensional datasets, especially in health fields, and it could
>> mean a lot to the ML community and the scikit-learn project - in my
>> initial opinion.
>>
>> A Jupyter Notebook and scikit-learn tutorial replicating this IEEE
>> paper/work as feature_selection.*EFS* with its 8 algorithms could be a
>> near-term goal. Eventually, scikit-learn EFSCV and diverse classification
>> algorithms could be benchmarked for a "joint paper" in JOSS or a health
>> journal.
>>
>> My initial idea (it doesn't need to be this way and is open to
>> discussion) is roughly the following:
>>
>> RFE has:
>>
>> feature_selection.*RFE*(estimator, *[, ...])
>> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE>
>>
>> Feature ranking with recursive feature elimination.
>>
>> feature_selection.*RFECV*(estimator, *[, ...])
>> <https://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFECV.html#sklearn.feature_selection.RFECV>
>>
>> Recursive feature elimination with cross-validation to select features.
>>
>> The "EFS" could have:
>>
>> feature_selection.*EFS*(estimator, *[, ...])
>>
>> Feature ranking and feature elimination with *8 different algorithms
>> (SFE, SFE-PSO, etc.)* *<- new algorithms (evolutionary computing, swarm,
>> genetic, etc.) could be added and benchmarked over time.*
>>
>> feature_selection.*EFSCV*(estimator, *[, ...])
>>
>> Feature elimination with cross-validation to select features.
>>
>> Looking forward to an open discussion on whether Evolutionary Feature
>> Selection (EFS) is something for the sklearn project, or perhaps a
>> separate pip-installable package.
>>
>> Kind regards
>> Dalibor Hrg
>> https://www.linkedin.com/in/daliborhrg/
>>
>> On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade <b.ahadz...@yahoo.com>
>> wrote:
>>
>>>
>>>
>>> Dear Dalibor Hrg,
>>>
>>> Thank you very much for your attention to the SFE algorithm. Thank you
>>> very much for the time you took to guide me and my colleagues. According to
>>> your guidance, we will add this algorithm to the scikit-learn library as
>>> soon as possible.
>>>
>>> Kind regards,
>>> Ahadzadeh.
>>> On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30, Dalibor Hrg <
>>> dalibor....@gmail.com> wrote:
>>>
>>>
>>> Dear Authors,
>>>
>>> you have done some amazing work on feature selection, published in
>>> IEEE: https://arxiv.org/abs/2303.10182 . I have noticed the Python code
>>> here, without a LICENSE file or any licensing info:
>>> https://github.com/Ahadzadeh2022/SFE , and the paper mentions some links
>>> for downloading the data.
>>>
>>> I would be interested in working with you to:
>>>
>>> Step 1) make and release a pip package, publish the code in JOSS
>>> (https://joss.readthedocs.io , e.g.
>>> https://joss.theoj.org/papers/10.21105/joss.04611 ) under the BSD-3
>>> license, and replicate the IEEE paper's table results. All 8 algorithms
>>> could potentially live in one class "EFS", meaning "Evolutionary Feature
>>> Selection", selectable as 8 options, among them SFE. Or something like
>>> that.
>>>
>>> Step 2) try to integrate it and work with the scikit-learn people; I
>>> would recommend integrating this under
>>> https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
>>> similarly to sklearn.feature_selection.RFE. I believe this would be a
>>> great contribution to the best open ML library, scikit-learn.
>>>
>>> I am unsure what the status of the datasets and their licenses is. But
>>> the datasets could be fetched externally from the OpenML.org repository
>>> (for example via
>>> https://scikit-learn.org/stable/datasets/loading_other_datasets.html )
>>> or from CERN Zenodo, where the "benchmark datasets" could be expanded.
>>> It depends a bit on the dataset licenses.
>>>
>>> Overall, I hope this can greatly increase the visibility of your
>>> published work and also allow others to credit you in papers in a more
>>> citable and replicable way. I believe your IEEE paper and work
>>> definitely deserve a spot in scikit-learn. There is a need for
>>> replicable code on "Evolutionary Methods for Feature Selection" and such
>>> a benchmark on life-science datasets, and you have done great work so
>>> far.
>>>
>>> Let me know what you think.
>>>
>>> Best regards,
>>> Dalibor Hrg
>>>
>>> https://www.linkedin.com/in/daliborhrg/
>>>
>>
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn
