Re: Seeking a small group to package Apache Arrow (was: Bug#970021: RFP: apache-arrow -- cross-language development platform for in-memory analytics)

2024-04-04 Thread Thomas Goirand

On 3/25/24 19:17, Julian Gilbey wrote:

Hi all,

[NB: sent to d-science, d-python, d-devel and the RFP bug; reply-to
set to d-science and the RFP bug only]

An update on Apache Arrow, and in particular the Python library
PyArrow.  For those who don't know:

   Apache Arrow is a development platform for in-memory analytics. It
   contains a set of technologies that enable big data systems to
   process and move data fast. It specifies a standardized
   language-independent columnar memory format for flat and
   hierarchical data, organized for efficient analytic operations on
   modern hardware.

   The project is developing a multi-language collection of libraries
   for solving systems problems related to in-memory analytical data
   processing. This includes such topics as:

   * Zero-copy shared memory and RPC-based data movement

   * Reading and writing file formats (like CSV, Apache ORC, and Apache
     Parquet)

   * In-memory analytics and query processing

   (from: https://arrow.apache.org/docs/index.html)

Pandas has announced that Pandas 3.x will depend on PyArrow
in a critical way (it will back the "string" datatype), and it is due
to be released imminently.

So this is a plea for anyone looking for something really helpful to
do: it would be great to have a group of developers finally package
this!  There was some initial work done (see the RFP bug report for
details: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=970021),
but that is fairly old now.  As Apache Arrow supports numerous
languages, it may well benefit from having a group of developers with
different areas of expertise to build it.  (Or perhaps it would make
more sense to split the upstream source into a collection of different
Debian source packages for the different supported languages.  I don't
know.)  Unfortunately I don't have the capacity to devote any time to
it myself.

Thanks in advance to anyone who can step forward for this!

Best wishes,

Julian


Hi,

I may not have much time available to help, though I'd love to have
Arrow in Debian, as Ceph uses it and currently uses an embedded version.


Cheers,

Thomas Goirand (zigo)



Re: Bug#1043240: transition: pandas 1.5 -> 2.1

2023-12-11 Thread Thomas Goirand

On 12/11/23 08:12, Matthias Klose wrote:

On 10.12.23 14:06, Rebecca N. Palmer wrote:
I'd like to move forward with the pandas 1.5 -> 2.1 transition 
reasonably soon.


Given that pandas 2.x is *not* required for Python 3.12 (but is 
required for Cython 3.0), should we wait for the Python 3.12 
transition to be done first?


These are broken by pandas 2.x and have a possible (but untested) fix 
in their bug - please test and apply it:
dask(?) dials influxdb-python* python-altair python-feather-format 
python-upsetplot seaborn tqdm*
(* = this package is currently also broken for a non-pandas reason, 
probably Python 3.12, that I don't have a fix for)


These are broken by pandas 2.x and have no known-to-me fix:
augur cnvkit dyda emperor esda mirtop pymatgen pyranges python-anndata 
python-biom-format python-cooler python-nanoget python-skbio 
python-ulmo q2-quality-control q2-demux q2-taxa q2-types q2templates 
sklearn-pandas
Some generic things to try are pandas.util.testing -> pandas.testing, 
.iteritems() -> .items(), and if one exists, a more recent upstream 
version.
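
As a rough illustration of those two generic renames, here is a minimal
sketch of a source-rewriting helper. The name port_to_pandas2 and the regex
approach are assumptions for illustration only, not an existing tool. The
output needs manual review: pandas.testing only exposes the public assert_*
helpers, and .iteritems() may also be called on non-pandas objects.

```python
import re

# Hypothetical helper: the two generic substitutions suggested above.
REWRITES = [
    (re.compile(r"\bpandas\.util\.testing\b"), "pandas.testing"),
    (re.compile(r"\.iteritems\(\)"), ".items()"),
]


def port_to_pandas2(source: str) -> str:
    """Apply each rename to the given source text and return the result."""
    for pattern, replacement in REWRITES:
        source = pattern.sub(replacement, source)
    return source


# Example input using the pandas 1.x spellings.
old_source = (
    "import pandas.util.testing as tm\n"
    "for name, col in df.iteritems():\n"
    "    print(name)\n"
)
new_source = port_to_pandas2(old_source)
print(new_source)
```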


Is this an acceptable amount of breakage or should we continue to 
wait? Bear in mind that if we wait too long, we may be forced into it 
by some transition further up the stack (e.g. a future Python or 
numpy) that breaks pandas 1.x.


Up to the maintainers. But please wait at least until the current pandas
and numpy have migrated to testing, i.e. until the autopkgtests of pandas
and numpy triggered by python3-defaults pass.


Is there a way to see the binNMUs which are still stuck in unstable, and 
don't migrate?


Matthias


As a reminder: it's best practice to first upload the new release to 
Experimental, so we can see what happens with autopkgtest before 
destroying everything at once...


Cheers,

Thomas Goirand (zigo)



Re: Python 3.9 for bullseye

2020-10-26 Thread Thomas Goirand
On 10/18/20 12:13 PM, Matthias Klose wrote:
> Python 3.9 as a supported Python3 version is now in unstable, and all binNMUs
> are done (thanks to Graham for the work).  Bug reports should all be filed for
> all known problems [1], and the current state of the 3.9 addition can be seen
> at [2] (a few of the "bad" are false packages with b-d on python3-all-dev, but
> not building for 3.9, bug reports also filed).
>
> The major outstanding issue is the pandas stack, all other problems are found
> in leaf packages (leaf in the sense that no other package for the 3.9
> addition is blocked).
> 
> Please help fixing the remaining issues!
> 
> Matthias

Hi Matthias,

I don't know if that was on purpose, but you happened to upload Python 3.9
in Unstable the day OpenStack was released. I then rebuilt all of
OpenStack to upload from Experimental to Unstable, and to my surprise,
it went very well (note: all packages in the OpenStack team are running
unit tests at build time on all available Python versions). Only 2
issues happened multiple times:
- base64.{en,de}codestring removal (easy fix: s/string/bytes/)
- threading.Thread.isAlive removal (easy fix too: s/isAlive/is_alive/)
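
For reference, a minimal sketch of both replacements using only the standard
library (the variable names are illustrative):

```python
import base64
import threading

payload = b"hello"

# base64.encodestring/decodestring were removed in Python 3.9;
# encodebytes/decodebytes are the replacements.
encoded = base64.encodebytes(payload)
decoded = base64.decodebytes(encoded)

# Thread.isAlive() was removed in Python 3.9; is_alive() replaces it.
worker = threading.Thread(target=lambda: None)
worker.start()
worker.join()
alive_after_join = worker.is_alive()

print(encoded, decoded, alive_after_join)
```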

This is on a set of 200+ packages which I manually rebuilt.

I do expect that there will be more packages with the same issue, so
it'd be nice to have all Python-using packages rebuilt. As Lucas
Nussbaum proposed such a service, should we ask him to do such a massive
rebuild? Or maybe you have other plans?

Cheers,

Thomas Goirand (zigo)



Re: dropping python2 [was Re: scientific python stack transitions]

2019-07-08 Thread Thomas Goirand
On 7/8/19 10:10 AM, Ansgar wrote:
> Thomas Goirand writes:
>> On 7/7/19 5:31 PM, Matthias Klose wrote:
>>> you can start dropping it now, however please don't drop anything yet with
>>> reverse dependencies.  So leaf packages first.
>>
>> I'm sorry, but I think I need to contest this. Doing things in order,
>> first leaf, then go all the way back, will take too long, and this is
>> IMO unnecessary effort. Older binary packages will anyway stay in the
>> archive as long as they are needed, and no FTP hint is added (please
>> correct me if I'm wrong here... but that's what I saw in the past).
> 
> Packages usually don't migrate to testing if they cause packages to be
> uninstallable which will happen if you start breaking reverse
> dependencies.  Will that be the case here?

Right. Which will be an incentive to fix the reverse-dependency, and a
way to see what work is remaining to be done.

Maybe we can also do a mass bug filing after a period we can discuss
(probably during Debconf?).

> Removing an entire language isn't a simple case, even less so when
> it doesn't mean we just remove all packages written in said language

Right.

Thomas



Re: dropping python2 [was Re: scientific python stack transitions]

2019-07-07 Thread Thomas Goirand
On 7/7/19 5:31 PM, Matthias Klose wrote:
> On 07.07.19 16:55, Drew Parsons wrote:
>> On 2019-07-07 22:46, Mo Zhou wrote:
>>> Hi science team,
>>>
>>> By the way, when do we start dropping python2 support?
>>> The upstreams of the whole python scientific computing
>>> stack had already started dropping it.
>>
>> Good question.  I think it is on the agenda this cycle, but debian-python
>> will have the call on it.
> 
> you can start dropping it now, however please don't drop anything yet with
> reverse dependencies.  So leaf packages first.
> 
> Matthias

I'm sorry, but I think I need to contest this. Doing things in order,
first leaf, then go all the way back, will take too long, and this is
IMO unnecessary effort. Older binary packages will anyway stay in the
archive as long as they are needed, and no FTP hint is added (please
correct me if I'm wrong here... but that's what I saw in the past).

Also, those who aren't doing the work of switching to Py3 will need to
be kicked away anyway.

You've done some pretty destructive transitions yourself for other
stuff, so why should we bother on this simple case?

Last, a 2-year cycle to get rid of all traces of Python 2 *will* take
some time, maybe more than we can even think of, so we'd better take
some shortcuts if we can.

Cheers,

Thomas Goirand (zigo)



Re: Bug#750713: ITP: gf-complete -- Galois Field Arithmetic

2014-06-17 Thread Thomas Goirand
On 06/17/2014 09:55 PM, Andreas Tille wrote:
 On Fri, Jun 06, 2014 at 04:15:48PM +0800, Thomas Goirand wrote:
 Package: wnpp
 Severity: wishlist
 Owner: Thomas Goirand z...@debian.org

 * Package name: gf-complete
   Version : 1.02~0+2014.05.git259d53ea590b
   Upstream Author : Jim Plank pl...@cs.utk.edu
 * URL : https://bitbucket.org/jimplank/gf-complete
 * License : BSD-3-clause
   Programming Lang: C
   Description : Galois Field Arithmetic

  Galois Field arithmetic forms the backbone of erasure-coded storage systems,
  most famously the Reed-Solomon erasure code. A Galois Field is defined over
  w-bit words and is termed GF(2^w). As such, the elements of a Galois Field
  are the integers 0, 1, ..., 2^w − 1. Galois Field arithmetic defines addition
  and multiplication over these closed sets of integers in such a way that they
  work as you would hope they would work. Specifically, every number has a
  unique multiplicative inverse. Moreover, there is a value, typically the
  value 2, which has the property that you can enumerate all of the non-zero
  elements of the field by taking that value to successively higher powers.
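
 To make the description above concrete, here is a small sketch of GF(2^w)
 arithmetic for w = 4. The primitive polynomial x^4 + x + 1 and all function
 names are assumptions chosen for illustration; this is not gf-complete's
 API or implementation. Note that the multiplicative inverse only exists for
 non-zero elements.

```python
W = 4                   # word size in bits (illustrative choice)
FIELD_SIZE = 1 << W     # 2^w elements: 0 .. 2^w - 1
PRIM_POLY = 0b10011     # x^4 + x + 1, an assumed primitive polynomial


def gf_add(a, b):
    """Addition in GF(2^w) is bitwise XOR."""
    return a ^ b


def gf_mul(a, b):
    """Shift-and-add multiplication, reduced modulo PRIM_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & FIELD_SIZE:      # degree reached w: reduce
            a ^= PRIM_POLY
    return result


def gf_inverse(a):
    """Brute-force multiplicative inverse of a non-zero element."""
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    for candidate in range(1, FIELD_SIZE):
        if gf_mul(a, candidate) == 1:
            return candidate


def powers_of_two():
    """Enumerate 2^0, 2^1, ..., 2^(FIELD_SIZE - 2) in the field."""
    values = []
    value = 1
    for _ in range(FIELD_SIZE - 1):
        values.append(value)
        value = gf_mul(value, 2)
    return values


# With this polynomial, the powers of 2 visit every non-zero element once.
print(powers_of_two())
```
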
 Hi Thomas,

 looks like a nice target to be maintained in Debian Science.

 Kind regards

   Andreas.

Actually, this is very applied science. It's used by jerasure, which
itself is used by PyECLib, which itself will be used by Swift for doing
storage with erasure support. I believe Ceph already uses Jerasure too.
Which is why I'm packaging all this: this is useful for OpenStack.

Cheers,

Thomas

P.S.: This is already waiting in the NEW queue.

