With thanks to Alex, Adrin and Christian, we have a proposal to implement what we used to call "sample props" that should be expressive enough for us to resolve tens of issues and PRs, but will be largely unobtrusive for most current users.
Core developers, please cast your vote in this PR <https://github.com/scikit-learn/enhancement_proposals/pull/52> after considering the proposal here <https://scikit-learn-enhancement-proposals.readthedocs.io/en/latest/slep006/proposal.html>, which has a partial implementation in #16079 <https://github.com/scikit-learn/scikit-learn/pull/16079>.

In brief, the problem we are trying to solve:

Scikit-learn has limited support for information pertaining to each sample (henceforth “sample properties”) to be passed through an estimation pipeline. The user can, for instance, pass fit parameters to all members of a FeatureUnion, or to a specified member of a Pipeline using dunder (__) prefixing:

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.linear_model import LogisticRegression
>>> pipe = Pipeline([('clf', LogisticRegression())])
>>> pipe.fit([[1, 2], [3, 4]], [5, 6],
...          clf__sample_weight=[.5, .7])

Several other meta-estimators, such as GridSearchCV, support forwarding these fit parameters to their base estimator when fitting. Yet a number of important use cases are currently not supported.

Features we currently do not support and wish to include:

- passing sample properties (e.g. sample_weight <https://scikit-learn.org/stable/glossary.html#term-sample_weight>) to a scorer used in cross-validation
- passing sample properties (e.g. groups <https://scikit-learn.org/stable/glossary.html#term-groups>) to a CV splitter in nested cross validation
- passing sample properties (e.g. sample_weight <https://scikit-learn.org/stable/glossary.html#term-sample_weight>) to some scorers and not others in a multi-metric cross-validation setup

Solution: Each consumer requests

A meta-estimator passes along to its children only what they request. A meta-estimator needs to request, on behalf of its children, any metadata that descendant consumers request.
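To make the request-and-route idea concrete, here is a toy sketch in plain Python. It is not the scikit-learn implementation and not the code in #16079: the classes WeightedMean, PlainMean and ToyUnion are hypothetical, and representing a request as a dict mapping a method name to a set of metadata names is a simplifying assumption. Only the method name get_metadata_request is taken from the proposal.

```python
# Toy sketch only: every class here is hypothetical, not scikit-learn API.


class WeightedMean:
    """Consumer that requests sample_weight in fit."""

    def get_metadata_request(self):
        # Keys are method names; values are the metadata names consumed there.
        return {"fit": {"sample_weight"}}

    def fit(self, X, y, sample_weight=None):
        if sample_weight is None:
            sample_weight = [1.0] * len(y)
        total = sum(sample_weight)
        self.mean_ = sum(w * v for w, v in zip(sample_weight, y)) / total
        return self


class PlainMean:
    """Consumer that requests nothing, so it is never handed metadata."""

    def get_metadata_request(self):
        return {}

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self


class ToyUnion:
    """Meta-estimator that routes to each child only what it requests."""

    def __init__(self, children):
        self.children = children

    def get_metadata_request(self):
        # Request, on behalf of the children, everything they request.
        request = {}
        for child in self.children:
            for method, names in child.get_metadata_request().items():
                request.setdefault(method, set()).update(names)
        return request

    def fit(self, X, y, **metadata):
        for child in self.children:
            wanted = child.get_metadata_request().get("fit", set())
            routed = {k: v for k, v in metadata.items() if k in wanted}
            child.fit(X, y, **routed)
        return self


union = ToyUnion([WeightedMean(), PlainMean()])
union.fit([[1, 2], [3, 4]], [5, 6], sample_weight=[0.5, 1.5])
```

In this sketch PlainMean.fit would raise a TypeError if it were handed sample_weight, so the call only succeeds because the meta-estimator filters metadata per child before forwarding it.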
Each object that could receive metadata should have a method called get_metadata_request() which returns a dict that specifies which metadata is consumed by each of its methods (keys of this dictionary are therefore method names, e.g. fit <https://scikit-learn.org/stable/glossary.html#term-fit>, transform <https://scikit-learn.org/stable/glossary.html#term-transform> etc.). Estimators supporting weighted fitting may return {} by default, but have a method called request_sample_weight which allows the user to specify the requested sample_weight <https://scikit-learn.org/stable/glossary.html#term-sample_weight> in each of its methods. make_scorer accepts request_metadata as a keyword parameter through which the user can specify what metadata is requested.

Regards,
Joel
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn