Re: Seeking a small group to package Apache Arrow (was: Bug#970021: RFP: apache-arrow -- cross-language development platform for in-memory analytics)
On 3/25/24 19:17, Julian Gilbey wrote:
> Hi all,
>
> [NB: sent to d-science, d-python, d-devel and the RFP bug; reply-to
> set to d-science and the RFP bug only]
>
> An update on Apache Arrow, and in particular the Python library
> PyArrow. For those who don't know:
>
>   Apache Arrow is a development platform for in-memory analytics. It
>   contains a set of technologies that enable big data systems to
>   process and move data fast. It specifies a standardized
>   language-independent columnar memory format for flat and
>   hierarchical data, organized for efficient analytic operations on
>   modern hardware.
>
>   The project is developing a multi-language collection of libraries
>   for solving systems problems related to in-memory analytical data
>   processing. This includes such topics as:
>
>   * Zero-copy shared memory and RPC-based data movement
>   * Reading and writing file formats (like CSV, Apache ORC, and
>     Apache Parquet)
>   * In-memory analytics and query processing
>
>   (from: https://arrow.apache.org/docs/index.html)
>
> Pandas has announced that Pandas 3.x will depend on PyArrow in a
> critical way (it will back the "string" datatype), and it is due to
> be released imminently.
>
> So this is a plea for anyone looking for something really helpful to
> do: it would be great to have a group of developers finally package
> this! There was some initial work done (see the RFP bug report for
> details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970021),
> but that is fairly old now. As Apache Arrow supports numerous
> languages, it may well benefit from having a group of developers with
> different areas of expertise to build it. (Or perhaps it would make
> more sense to split the upstream source into a collection of
> different Debian source packages for the different supported
> languages. I don't know.)
>
> Unfortunately I don't have the capacity to devote any time to it
> myself.
>
> Thanks in advance to anyone who can step forward for this!
>
> Best wishes,
>
> Julian

Hi,

I may not have much available time to help, though I'd love to have
Arrow in Debian, as Ceph uses it and currently relies on an embedded
copy.

Cheers,

Thomas Goirand (zigo)
Re: Bug#1043240: transition: pandas 1.5 -> 2.1
On 12/11/23 08:12, Matthias Klose wrote:
> On 10.12.23 14:06, Rebecca N. Palmer wrote:
>> I'd like to move forward with the pandas 1.5 -> 2.1 transition
>> reasonably soon.
>>
>> Given that pandas 2.x is *not* required for Python 3.12 (but is
>> required for Cython 3.0), should we wait for the Python 3.12
>> transition to be done first?
>>
>> These are broken by pandas 2.x and have a possible (but untested)
>> fix in their bug - please test and apply it:
>> dask(?) dials influxdb-python* python-altair python-feather-format
>> python-upsetplot seaborn tqdm*
>> (* = this package is currently also broken for a non-pandas reason,
>> probably Python 3.12, that I don't have a fix for)
>>
>> These are broken by pandas 2.x and have no known-to-me fix:
>> augur cnvkit dyda emperor esda mirtop pymatgen pyranges
>> python-anndata python-biom-format python-cooler python-nanoget
>> python-skbio python-ulmo q2-quality-control q2-demux q2-taxa
>> q2-types q2templates sklearn-pandas
>>
>> Some generic things to try are pandas.util.testing ->
>> pandas.testing, .iteritems() -> .items(), and if one exists, a more
>> recent upstream version.
>>
>> Is this an acceptable amount of breakage or should we continue to
>> wait? Bear in mind that if we wait too long, we may be forced into
>> it by some transition further up the stack (e.g. a future Python or
>> numpy) that breaks pandas 1.x.
>
> Up to the maintainers. But please wait at least until the current
> pandas and numpy have migrated to testing, e.g. until the
> autopkgtests of pandas and numpy triggered by python3-defaults pass.
>
> Is there a way to see the binNMUs which are still stuck in unstable,
> and don't migrate?
>
> Matthias

As a reminder: it's best practice to first upload the new release to
Experimental, so we can see what happens with autopkgtest before
destroying everything at once...

Cheers,

Thomas Goirand (zigo)
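As an illustration of the two generic renames suggested in the quoted message (pandas.util.testing -> pandas.testing and .iteritems() -> .items()), here is a minimal sketch of a first-pass textual rewrite. It is not from the thread: the helper name modernize_pandas_source and the sample snippet are hypothetical, and a plain regex pass like this can hit non-pandas objects that also define iteritems() and misses forms such as "from pandas.util import testing", so the output still needs review and a test run.

```python
import re

# Hypothetical helper for illustration; not part of pandas or Debian tooling.
_RENAMES = [
    # pandas.util.testing was deprecated in pandas 1.x and removed in 2.0.
    (re.compile(r"\bpandas\.util\.testing\b"), "pandas.testing"),
    # Series.iteritems() / DataFrame.iteritems() were removed in pandas 2.0.
    (re.compile(r"\.iteritems\(\)"), ".items()"),
]


def modernize_pandas_source(text: str) -> str:
    """Apply each rename to the given source text and return the result."""
    for pattern, replacement in _RENAMES:
        text = pattern.sub(replacement, text)
    return text


# Made-up example of code written against pandas 1.x.
old_source = """\
import pandas.util.testing as tm
for name, column in frame.iteritems():
    tm.assert_series_equal(column, expected[name])
"""

new_source = modernize_pandas_source(old_source)
print(new_source)
```

A purely textual pass is only a starting point; running the package's own test suite against pandas 2.x remains the real check.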
Re: Python 3.9 for bullseye
On 10/18/20 12:13 PM, Matthias Klose wrote:
> Python 3.9 as a supported Python3 version is now in unstable, and all
> binNMUs are done (thanks to Graham for the work). Bug reports should
> all be filed for all known problems [1], and the current state of the
> 3.9 addition can be seen at [2] (a few of the "bad" are false
> packages with b-d on python3-all-dev, but not building for 3.9, bug
> reports also filed).
>
> The major outstanding issue is the pandas stack, all other problems
> are found in leaf packages (leaf in the sense that no other package
> for the 3.9 addition is blocked).
>
> Please help fixing the remaining issues!
>
> Matthias

Hi Matthias,

I don't know if that was on purpose, but you happened to upload Python
3.9 to Unstable the day OpenStack was released. I then rebuilt all of
OpenStack to upload from Experimental to Unstable, and to my surprise,
it went very well (note: all packages in the OpenStack team run unit
tests at build time on all available Python versions). Only 2 issues
happened multiple times:

- base64.{en,de}codestring removal (easy fix: s/string/bytes/)
- threading.Thread.isAlive removal (easy fix too: s/isAlive/is_alive/)

This is on a set of 200+ packages which I manually rebuilt. I do
expect that there will be more packages with the same issues, so it'd
be nice to have all Python-using packages rebuilt. As Lucas Nussbaum
proposed such a service, should we ask him to do such a mass rebuild?
Or maybe you have other plans?

Cheers,

Thomas Goirand (zigo)
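The two recurring fixes mentioned in the message above can be sketched as follows. This is a minimal standalone illustration using only the standard library, not code from any OpenStack package; the payload and the no-op thread target are made up for the example.

```python
import base64
import threading

payload = b"hello"

# base64.encodestring()/decodestring() were removed in Python 3.9;
# encodebytes()/decodebytes() are the replacements (s/string/bytes/).
encoded = base64.encodebytes(payload)
decoded = base64.decodebytes(encoded)

# Thread.isAlive() was removed in Python 3.9; is_alive() replaces it.
worker = threading.Thread(target=lambda: None)
worker.start()
worker.join()
still_running = worker.is_alive()

print(encoded, decoded, still_running)
```

Both replacement names also exist in earlier Python 3 releases, so the renamed code should not need a version check.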
Re: dropping python2 [was Re: scientific python stack transitions]
On 7/8/19 10:10 AM, Ansgar wrote:
> Thomas Goirand writes:
>> On 7/7/19 5:31 PM, Matthias Klose wrote:
>>> you can start dropping it now, however please don't drop anything
>>> yet with reverse dependencies. So leaf packages first.
>>
>> I'm sorry, but I think I need to contest this. Doing things in
>> order, first leaf, then go all the way back, will take too long, and
>> this is IMO unnecessary effort. Older binary packages will anyway
>> stay in the archive as long as they are needed, and no FTP hint is
>> added (please correct me if I'm wrong here... but that's what I saw
>> in the past).
>
> Packages usually don't migrate to testing if they cause packages to
> be uninstallable which will happen if you start breaking reverse
> dependencies. Will that be the case here?

Right. Which will be an incentive to fix the reverse dependencies, and
a way to see what work remains to be done. Maybe we can also do a mass
bug filing after a period we can discuss (probably during DebConf?).

> Removing an entire language isn't a simple case, even less so when
> it doesn't mean we just remove all packages written in said language

Right.

Thomas
Re: dropping python2 [was Re: scientific python stack transitions]
On 7/7/19 5:31 PM, Matthias Klose wrote:
> On 07.07.19 16:55, Drew Parsons wrote:
>> On 2019-07-07 22:46, Mo Zhou wrote:
>>> Hi science team,
>>>
>>> By the way, when do we start dropping python2 support?
>>> The upstreams of the whole python scientific computing
>>> stack had already started dropping it.
>>
>> Good question. I think it is on the agenda this cycle, but
>> debian-python will have the call on it.
>
> you can start dropping it now, however please don't drop anything yet
> with reverse dependencies. So leaf packages first.
>
> Matthias

I'm sorry, but I think I need to contest this. Doing things in order,
first leaf, then go all the way back, will take too long, and this is
IMO unnecessary effort. Older binary packages will anyway stay in the
archive as long as they are needed, and no FTP hint is added (please
correct me if I'm wrong here... but that's what I saw in the past).

Also, those who aren't doing the work of switching to Py3 will need to
be kicked away anyway. You've done some pretty destructive transitions
yourself for other stuff, so why should we bother in this simple case?

Last, getting rid of all traces of Python 2 within a 2-year cycle
*will* take some time, maybe more than we can even think of, so we'd
better take some shortcuts if we can.

Cheers,

Thomas Goirand (zigo)
Re: Bug#750713: ITP: gf-complete -- Galois Field Arithmetic
On 06/17/2014 09:55 PM, Andreas Tille wrote:
> On Fri, Jun 06, 2014 at 04:15:48PM +0800, Thomas Goirand wrote:
>> Package: wnpp
>> Severity: wishlist
>> Owner: Thomas Goirand z...@debian.org
>>
>> * Package name    : gf-complete
>>   Version         : 1.02~0+2014.05.git259d53ea590b
>>   Upstream Author : Jim Plank pl...@cs.utk.edu
>> * URL             : https://bitbucket.org/jimplank/gf-complete
>> * License         : BSD-3-clause
>>   Programming Lang: C
>>   Description     : Galois Field Arithmetic
>>
>>  Galois Field arithmetic forms the backbone of erasure-coded storage
>>  systems, most famously the Reed-Solomon erasure code. A Galois
>>  Field is defined over w-bit words and is termed GF(2^w). As such,
>>  the elements of a Galois Field are the integers 0, 1, ..., 2^w - 1.
>>  Galois Field arithmetic defines addition and multiplication over
>>  these closed sets of integers in such a way that they work as you
>>  would hope they would work. Specifically, every number has a unique
>>  multiplicative inverse. Moreover, there is a value, typically the
>>  value 2, which has the property that you can enumerate all of the
>>  non-zero elements of the field by taking that value to successively
>>  higher powers.
>
> Hi Thomas,
>
> looks like a nice target to be maintained in Debian Science.
>
> Kind regards
>
> Andreas.

Actually, this is very applied science. It's used by jerasure, which
itself is used by PyECLib, which itself will be used by Swift for
doing storage with erasure support. I believe Ceph already uses
Jerasure too. Which is why I'm packaging all this: this is useful for
OpenStack.

Cheers,

Thomas

P.S.: This is already waiting in the NEW queue.
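To make the properties in the package description above concrete, here is a minimal sketch of arithmetic in GF(2^8). It does not use or mirror the gf-complete API: the function names are hypothetical, w = 8 is chosen for illustration, and the reduction polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) is an assumption, picked because it is a common choice in Reed-Solomon implementations. The inverse is found by brute force for clarity, not speed.

```python
W = 8                # word size in bits (assumption for this sketch)
PRIM_POLY = 0x11D    # x^8 + x^4 + x^3 + x^2 + 1 (assumed polynomial)


def gf_add(a: int, b: int) -> int:
    """Addition in GF(2^w) is bitwise XOR."""
    return a ^ b


def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication, reduced modulo PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & (1 << W):      # degree reached w: reduce
            a ^= PRIM_POLY
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse by exhaustive search (fine for w = 8)."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    for candidate in range(1, 1 << W):
        if gf_mul(a, candidate) == 1:
            return candidate
    raise ArithmeticError("no inverse found; polynomial not irreducible?")


def powers_of(g: int) -> list:
    """Return g^0, g^1, ..., g^(2^w - 2)."""
    values, x = [], 1
    for _ in range((1 << W) - 1):
        values.append(x)
        x = gf_mul(x, g)
    return values


# With this polynomial, the powers of 2 visit every non-zero element once.
generated = powers_of(2)
print(len(set(generated)), min(generated), max(generated))
```

Production libraries typically replace the bit loop and brute-force inverse with log/antilog tables or SIMD routines; the sketch only shows the field properties themselves.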