Hi there,

I'd like to draw your attention to a proposal under discussion among pandas developers regarding copy-on-write semantics.
A very short summary of the proposal, according to the document <https://docs.google.com/document/d/1ZCQ9mx3LBMy-nhwRl33_jgcvWo9IWdEfxDNQ2thyTb0/edit#>, is:

  - The result of any indexing operation (subsetting a DataFrame or Series in any way, i.e. including accessing a DataFrame column as a Series) or any method returning a new DataFrame or Series, always behaves as if it were a copy in terms of user API.
  - We implement Copy-on-Write (as implementation detail). This way, we can actually use views as much as possible under the hood, while ensuring the user API behaves as a copy.
  - As a consequence, if you want to modify an object (DataFrame or Series), the only way to do this is to modify that object itself directly.

  This addresses multiple aspects: 1) a clear and consistent user API (a clear rule: any subset or returned series/dataframe always behaves as a copy of the original, and thus never modifies the original) and 2) improving performance by avoiding excessive copies (eg a chained method workflow would no longer return an actual data copy at each step).

  Because every single indexing step behaves as a copy, this also means that with this proposal, “chained assignment” (with multiple setitem steps) will never work.

You can also read the related discussion on the pandas mailing list here <https://mail.python.org/pipermail/pandas-dev/2021-July/001358.html>.

It would be nice for us to think about the implications of this proposal on our work related to supporting pandas dataframes.

Cheers,
Adrin
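P.S. For concreteness, here is a minimal sketch of what the quoted rules would mean for user code, as I read them. The DataFrame and the names `df`, `col` and `subset` are made up for illustration, and the explicit `.copy()` calls are my own stand-in to emulate the proposed "always behaves as a copy" rule on any pandas version; under the proposal itself they should not be needed.

```python
import pandas as pd

# Hypothetical toy data, only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Proposed rule: any subset (including a single column) behaves as a copy.
# The explicit .copy() calls emulate that rule here; they are an assumption
# of this sketch, not part of the proposal.
col = df["a"].copy()
col.iloc[0] = 100            # changes `col` only, not `df`

subset = df[df["a"] > 1].copy()
subset["b"] = 0              # changes `subset` only, not `df`

# To change `df`, modify `df` itself directly, in a single setitem step.
df.loc[df["a"] > 1, "b"] = 0

# By contrast, chained assignment such as
#     df[df["a"] > 1]["b"] = 0
# would never update `df` under the proposal, because the first indexing
# step already behaves as a copy.
```

The point that seems most relevant for us is the first one: code that mutates a column or a slice taken from a user's DataFrame would no longer have any chance of changing the user's original object.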
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn