Dear Dalibor,

As detailed in the FAQ,
https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms

"""
We only consider well-established algorithms for inclusion. A rule of thumb
is at least 3 years since publication, 200+ citations, and wide use and
usefulness.
"""

These days, I would say that the bar is even higher, as we find that we
prioritize things such as high-quality documentation or better dataframe
support over new algorithms.

Best,
Gaël

On Sun, Sep 24, 2023 at 11:10:23AM +0200, Dalibor Hrg wrote:
> Dear scikit-learn mailing list,
>
> similarly to the existing feature_selection.RFE and RFECV, this is a request
> to openly discuss the PROPOSAL and requirements of feature_selection.EFS
> and/or EFSCV, which would stand for "Evolutionary Feature Selection",
> starting with 8 algorithms or methods to be used with scikit-learn
> estimators, just as published in IEEE https://arxiv.org/abs/2303.10182 by
> the authors of the paper. They agreed to help integrate it (in cc).
>
> PROPOSAL
> Implement/integrate the https://arxiv.org/abs/2303.10182 paper into
> scikit-learn:
>
> 1) CODE
> • implementing feature_selection.EFS and/or EFSCV (a space for the
>   evolutionary computing community interested in feature selection)
>
> RFE is:
> feature_selection.RFE(estimator, *[, ...])
>     Feature ranking with recursive feature elimination.
> feature_selection.RFECV(estimator, *[, ...])
>     Recursive feature elimination with cross-validation to select features.
>
> The "EFS" could be:
> feature_selection.EFS(estimator, *[, ...])
>     Feature ranking and feature elimination with 8 different algorithms,
>     SFE, SFE-PSO etc. <- new algorithms could be added and benchmarked with
>     evolutionary computing, swarm, genetic etc.
> feature_selection.EFSCV(estimator, *[, ...])
>     Feature elimination with cross-validation to select features
>
> 2) DATASETS & CANCER BENCHMARK
> • curating and integrating a fetch of the 40 cancer_benchmark datasets,
>   either directly in scikit-learn or externally pullable somehow and
>   maintained (a space for contributing and expanding high-dimensional
>   datasets on cancer topics).
>
> fetch_cancer-benchmark(*[,, ...])
>     Loads 40 individual cancer related high-dimensional datasets for
>     benchmarking feature selection methods (classification).
>
> 3) TUTORIAL / WEBSITE
> • writing a tutorial to replicate the IEEE paper results with
>   feature_selection.EFS and/or EFSCV on cancer_benchmark (40 datasets)
>
> I have identified the IEEE work https://arxiv.org/abs/2303.10182 to be of
> very interesting novelty in working with high-dimensional datasets, as it
> reports small subsets of predictive features selected with SVM, KNN across
> 40 datasets.
> Replicability under BSD-3 and high quality under scikit-learn could make
> benchmarking novel feature selection algorithms easier - in my initial
> opinion. Since this is my very first contact with the IEEE paper authors
> and the scikit-learn list altogether, we would welcome some help/guidance on
> how integration could work out, and on whether there is any interest along
> that line at all.
> Kind regards
> Dalibor Hrg
> https://www.linkedin.com/in/daliborhrg/
>
> On Sat, Sep 23, 2023 at 9:08 AM Alexandre Gramfort
> <alexandre.gramf...@inria.fr> wrote:
> Dear Dalibor
> you should discuss this on the main scikit-learn mailing list.
> https://mail.python.org/mailman/listinfo/scikit-learn
> Alex
>
> On Fri, Sep 22, 2023 at 12:19 PM Dalibor Hrg <dalibor....@gmail.com>
> wrote:
> Dear sklearn feature_selection.RFE Team and IEEE Authors (in-cc),
> This is a request to openly discuss the idea of a potential
> feature_selection.EFS, which would stand for "Evolutionary Feature
> Selection", or EFS for short, starting with 8 algorithms as published in
> IEEE https://arxiv.org/abs/2303.10182 by the authors on
> high-dimensional datasets. I have identified this work to be of very
> interesting novelty in working with high-dimensional datasets,
> especially for health fields, and it could mean a lot to the ML
> community and scikit-learn project - in my initial opinion.
> A Jupyter Notebook and scikit-learn tutorial replicating this IEEE
> paper/work as feature_selection.EFS and 8 algorithms in it could be a
> near term goal.
> And eventually, scikit-learn EFSCV and diverse
> classification algorithms could be benchmarked for a "joint paper" in
> JOSS, or a health journal.
>
> My initial idea (it doesn't need to be that way and is open to discussion)
> looks something like this:
>
> RFE has:
> feature_selection.RFE(estimator, *[, ...])
>     Feature ranking with recursive feature elimination.
> feature_selection.RFECV(estimator, *[, ...])
>     Recursive feature elimination with cross-validation to select features.
>
> The "EFS" could have:
> feature_selection.EFS(estimator, *[, ...])
>     Feature ranking and feature elimination with 8 different algorithms,
>     SFE, SFE-PSO etc. <- new algorithms could be added and benchmarked with
>     evolutionary computing, swarm, genetic etc.
> feature_selection.EFSCV(estimator, *[, ...])
>     Feature elimination with cross-validation to select features
>
> Looking forward to an open discussion on whether Evolutionary Feature
> Selection (EFS) is something for the sklearn project, or maybe a separate
> pip-installable package.
> Kind regards
> Dalibor Hrg
> https://www.linkedin.com/in/daliborhrg/
>
> On Fri, Sep 22, 2023 at 10:50 AM Behrooz Ahadzade
> <b.ahadz...@yahoo.com> wrote:
> Dear Dalibor Hrg,
> Thank you very much for your attention to the SFE algorithm. Thank
> you very much for the time you took to guide me and my colleagues.
> According to your guidance, we will add this algorithm to the
> scikit-learn library as soon as possible.
> Kind regards,
> Ahadzadeh.
>
> On Wednesday, September 13, 2023 at 12:22:04 AM GMT+3:30, Dalibor
> Hrg <dalibor....@gmail.com> wrote:
> Dear Authors,
> you have done some amazing work on feature selection here, published
> in IEEE: https://arxiv.org/abs/2303.10182 . I have noticed Python
> code here without a LICENSE file or any info on this:
> https://github.com/Ahadzadeh2022/SFE and in the paper some links are
> mentioned to download data.
> I would be interested in working with you so that we:
>
> Step 1) make and release a pip package, publish this code in JOSS
> https://joss.readthedocs.io (e.g.
> https://joss.theoj.org/papers/10.21105/joss.04611) under the BSD-3
> license and replicate the IEEE paper table results. All 8 algorithms
> could potentially be in one class "EFS", meaning "Evolutionary Feature
> Selection", selectable as 8 options, among them SFE. Or something like
> that.
>
> Step 2) try to integrate and work with scikit-learn people. I would
> recommend integrating this under
> https://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
> similarly to sklearn.feature_selection.RFE. I believe this would
> be a great contribution to the best open library for ML,
> scikit-learn.
>
> I am unsure what the status of the datasets and their licenses is.
> But the datasets could be fetched externally from the OpenML.org
> repository, for example
> https://scikit-learn.org/stable/datasets/loading_other_datasets.html
> or CERN Zenodo, where "benchmark datasets" could be expanded. It
> depends a bit on the dataset licenses.
>
> Overall, I hope this can greatly increase the visibility of your
> published work and also let others credit you in papers in a more
> citable and replicable way. I believe your IEEE paper and work
> definitely deserve a spot in scikit-learn. There is a need for some
> replicable code on "Evolutionary Methods for Feature Selection" and
> such a benchmark on life-science datasets, and you have done some
> great work so far.
> Let me know what you think.
> Best regards,
> Dalibor Hrg
> https://www.linkedin.com/in/daliborhrg/
> _______________________________________________
> scikit-learn mailing list
> scikit-learn@python.org
> https://mail.python.org/mailman/listinfo/scikit-learn

-- 
Gael Varoquaux
Research Director, INRIA
http://gael-varoquaux.info
http://twitter.com/GaelVaroquaux