Re: Bug#1043240: transition: pandas 1.5 -> 2.1

2023-12-11 Thread Matthias Klose

On 11.12.23 08:12, Matthias Klose wrote:

On 10.12.23 14:06, Rebecca N. Palmer wrote:
Is this an acceptable amount of breakage or should we continue to 
wait? Bear in mind that if we wait too long, we may be forced into it 
by some transition further up the stack (e.g. a future Python or 
numpy) that breaks pandas 1.x.


Up to the maintainers. But please wait at least until the current pandas 
and numpy have migrated to testing, i.e. until the autopkgtests of pandas 
and numpy triggered by python3-defaults pass.


I just NMUed pyrle and sorted-nearest, which had dependencies on 
cython3-legacy and were making the pyranges autopkgtests fail. Once 
those tests succeed, pandas should be able to migrate.




Re: Bug#1043240: transition: pandas 1.5 -> 2.1

2023-12-11 Thread Thomas Goirand

On 12/11/23 08:12, Matthias Klose wrote:

On 10.12.23 14:06, Rebecca N. Palmer wrote:
I'd like to move forward with the pandas 1.5 -> 2.1 transition 
reasonably soon.


Given that pandas 2.x is *not* required for Python 3.12 (but is 
required for Cython 3.0), should we wait for the Python 3.12 
transition to be done first?


These are broken by pandas 2.x and have a possible (but untested) fix 
in their bug - please test and apply it:
dask(?) dials influxdb-python* python-altair python-feather-format 
python-upsetplot seaborn tqdm*
(* = this package is currently also broken for a non-pandas reason, 
probably Python 3.12, that I don't have a fix for)


These are broken by pandas 2.x and have no known-to-me fix:
augur cnvkit dyda emperor esda mirtop pymatgen pyranges python-anndata 
python-biom-format python-cooler python-nanoget python-skbio 
python-ulmo q2-quality-control q2-demux q2-taxa q2-types q2templates 
sklearn-pandas
Some generic things to try are pandas.util.testing -> pandas.testing, 
.iteritems() -> .items(), and if one exists, a more recent upstream 
version.
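The two generic renames above can be sketched as a purely textual rewrite. This is a rough illustration, not a tested fix for any of the listed packages: the helper name apply_generic_pandas2_fixes is made up here, it only catches the spellings shown, and it assumes every .iteritems() call in the file is on a pandas object. Note also that pandas.testing only provides the public assert_* helpers, so code using other parts of pandas.util.testing will need more than a rename.

```python
import re

# Hypothetical helper (not part of pandas or any Debian tool): applies the two
# generic renames suggested above to a string of Python source code.
_RENAMES = [
    # pandas.util.testing is gone in pandas 2.x; pandas.testing is the public module.
    (re.compile(r"\bpandas\.util\.testing\b"), "pandas.testing"),
    # .iteritems() is gone in pandas 2.x; .items() is the replacement.
    (re.compile(r"\.iteritems\(\)"), ".items()"),
]


def apply_generic_pandas2_fixes(source: str) -> str:
    """Return `source` with both renames applied (textual only, no parsing)."""
    for pattern, replacement in _RENAMES:
        source = pattern.sub(replacement, source)
    return source


# Made-up input, for illustration only.
old_code = (
    "import pandas.util.testing as tm\n"
    "for name, column in df.iteritems():\n"
    "    tm.assert_series_equal(column, expected[name])\n"
)
print(apply_generic_pandas2_fixes(old_code))
```

Anything the rewrite touches should still be checked by running the package's own tests against pandas 2.x.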


Is this an acceptable amount of breakage or should we continue to 
wait? Bear in mind that if we wait too long, we may be forced into it 
by some transition further up the stack (e.g. a future Python or 
numpy) that breaks pandas 1.x.


Up to the maintainers. But please wait at least until the current pandas 
and numpy have migrated to testing, i.e. until the autopkgtests of pandas 
and numpy triggered by python3-defaults pass.


Is there a way to see which binNMUs are still stuck in unstable and 
have not migrated?


Matthias


As a reminder: it's best practice to first upload the new release to 
Experimental, so we can see what happens with autopkgtest before 
destroying everything at once...


Cheers,

Thomas Goirand (zigo)



Re: pandas 1.5 -> 2.1?

2023-12-11 Thread Julian Gilbey
Hi Kingsley,

On Sun, Dec 10, 2023 at 12:55:43PM -0800, Kingsley G. Morse Jr. wrote:
> Hi Rebecca, Julian and all science minded pythonistas of debian, great and 
> small!
> 
> I like your correspondence about upgrading from
> version 1.5 of pandas to 2.1.
> 
> It's open, scientific and explores the ideal of
> proceeding wisely in a matter of public interest.
> 
> My humble thoughts are:
> 
> 1.) Rebecca: *Why* did you write that you'd like
> to move forward with the pandas 1.5 -> 2.1
> transition? What's your reason?

A thought from me on this: pandas 2.1 has many improvements over
pandas 1.5.  And increasingly, other packages will be requiring these
new features.  So why would one not want to move forward with it?

> 2.) What may be the advantage of migrating to
> version 3.0 of Cython?

It is compatible with Python 3.12, whereas the current version of
Cython in Debian (0.29.x) is not really.  (For example, it has an
"import imp" in it, and this breaks with Python 3.12, which has
removed this deprecated module.)  As Cython 0.29.x is no longer
maintained upstream, having been superseded by Cython 3.x after many
years of development, our options are to either continue to patch
Cython 0.29.x within Debian to keep it working with Python 3.12 or to
upgrade to Cython 3.x.  As there is also software which now depends on
Cython 3.x to build, the former option seems unappealing.  (At best,
we might wish to keep the cython-legacy package around for building
packages which can't yet use Cython 3.x, but that should be a
short-term thing, not a long-term one.)
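
To illustrate the kind of breakage meant by the "import imp" example: code that used imp only to locate a module can usually switch to importlib, which exists on both older and newer Python 3 versions. The sketch below is a generic illustration and is not the actual change made in Cython or in any Debian patch; find_module_path is a made-up name.

```python
import importlib.util

# Old pattern, which fails with ModuleNotFoundError on Python 3.12:
#     import imp
#     imp.find_module("json")


def find_module_path(name):
    """Hypothetical stand-in for a lookup that older code did with imp.

    Only meant for top-level module names; returns None if nothing is found.
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print(find_module_path("json"))
```

Other uses of imp (loading a module from a path, reloading) have different importlib replacements, so each call site needs to be looked at individually.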

> 3.) The following one-liner suggests 44 debian
> packages might be affected by the breaks
> Rebecca said would be caused by pandas 2.x:
> 
> $ for s in augur cnvkit dyda emperor esda mirtop pymatgen pyranges 
> python-anndata python-biom-format python-cooler python-nanoget python-skbio 
> python-ulmo q2-quality-control q2-demux q2-taxa q2-types q2templates 
> sklearn-pandas ; do apt-cache search "$s" ; done | less

This does not seem like a particularly helpful one-liner; it picks up
packages such as python3-dyda-pipeline-config which are not in the
original list.  Instead, you perhaps want to count the number of
packages depending on these packages.  But what Rebecca is looking at
(I think) is how many packages would need fixing as a result of the
pandas upgrade.
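
A rough sketch of counting dependent packages, assuming the text printed by "apt-cache rdepends" (package name, a "Reverse Depends:" line, then one indented name per line). The function names and the sample text are made up for illustration. The names in the list above appear to be source packages, so the corresponding binary package names would have to be looked up first.

```python
import subprocess


def parse_rdepends(output: str) -> set:
    """Extract package names from the text printed by `apt-cache rdepends`."""
    names = set()
    for line in output.splitlines():
        # Dependent packages are on indented lines; the queried package name
        # and the "Reverse Depends:" header start in the first column.
        if line.startswith(" "):
            # A leading "|" marks an alternative dependency; drop the marker.
            names.add(line.strip().lstrip("|"))
    return names


def reverse_depends(binary_package: str) -> set:
    """Run apt-cache for one binary package (needs a Debian-like system)."""
    result = subprocess.run(
        ["apt-cache", "rdepends", binary_package],
        capture_output=True, text=True, check=True,
    )
    return parse_rdepends(result.stdout)


# Made-up sample output, for illustration only.
sample = (
    "python3-example\n"
    "Reverse Depends:\n"
    "  python3-foo\n"
    " |python3-bar\n"
    "  python3-foo\n"
)
print(len(parse_rdepends(sample)))
```

Using a set means a package that is listed more than once is only counted once.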

(But it is probably worse than this: I'm guessing these are only the
packages which fail to build with pandas 2.x or whose autopkgtest
fails with pandas 2.x.  But there may well be other breakage caused by
the upgrade which is not detectable in this way.  That is an issue
which will have to be handled by individual packages as they are
discovered, and the timing of the pandas upgrade is not related to
this problem.)

> 4.) The break that worries me the most is
> sklearn-pandas, because it seems to me that
> sklearn is popular and fundamental.

It seems that sklearn-pandas is abandoned upstream; there were just two
commits in 2022, and the last one before that was in May 2021.  There
has been no activity since.  If someone is willing to patch it for
pandas 2.x, great (perhaps you might help the maintainer to do this?),
otherwise it might have to drop out of Debian.

Best wishes,

   Julian