Re: [matplotlib-devel] [Numpy-discussion] Announcing toydist, improving distribution and packaging situation

2009-12-29 Thread David Cournapeau
On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield  wrote:
> Hi,
>
> In the toydist proposal/release notes, I would address 'what does
> toydist do better' more explicitly.
>
>
>
>  A big problem for science users is that numpy does not work with
> pypi + (easy_install, buildout or pip) and python 2.6. 
>
>
>
> Working with the rest of the python community as much as possible is
> likely a good goal.

Yes, but it is hopeless. Most of what is being discussed on
distutils-sig is useless for us, and what matters is ignored at best.
I think most people on distutils-sig are misguided, and I don't think
that community is representative of the people concerned with packaging
anyway - most of the participants seem to come from web development,
and are mostly dismissive of others' concerns (OS packagers, etc.).

I want to note that I am not starting this out of thin air - I know
most of the distutils code very well, having been essentially the sole
maintainer of numpy.distutils for two years now. I have written
extensive distutils extensions, in particular numscons, which can
fully build numpy, scipy and matplotlib on every platform that
matters.

Simply put, the distutils code is horrible (this is an objective fact) and
flawed beyond repair (this is more controversial). IMHO, it has
almost no useful features, except being the standard.

If you want a more detailed explanation of why I think distutils and
all tools on top are deeply flawed, you can look here:

http://cournape.wordpress.com/2009/04/01/python-packaging-a-few-observations-cabal-for-a-solution/

> numpy used to work with buildout in python2.5, but not with 2.6.
> buildout lets other team members get up to speed with a project by
> running one command.  It installs things in the local directory, not
> system wide.  So you can have different dependencies per project.

I don't think it is a very useful feature, honestly. It seems to me
that they created a huge infrastructure to split packages into tiny
pieces, and then to put them back together, imagining that multiple
installed versions are a replacement for backward compatibility.
Anyone with extensive packaging experience knows that's a deeply
flawed model in general.

> Plenty of good work is going on with python packaging.

That's the opposite of my experience. What I care about is:
  - tools which are hackable and easily extensible
  - robust install/uninstall
  - a real, DAG-based build system (see the sketch below)
  - explicitness and repeatability
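
A minimal sketch of the third point, using SCons (the tool numscons builds
on); the source file name is hypothetical and this is not numpy's actual
build:

    # SConstruct -- SCons derives the dependency graph from declarations
    # like these and rebuilds only the targets that are out of date.
    env = Environment()
    obj = env.Object("fastmath.c")        # fastmath.c -> fastmath.o
    env.SharedLibrary("fastmath", obj)    # fastmath.o -> libfastmath.so
    # Touching fastmath.c rebuilds only these two nodes; distutils'
    # command-based design has no notion of such a graph.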

None of this is supported by the current tools, and the current direction
moves even further away from it. When I have to explain at length why the
command-based design of distutils is a nightmare to work with, I don't
feel very confident that the current maintainers are aware of the
issues; it suggests they have never had to extend distutils much.

>
> There are build farms that build windows and OSX packages and upload them
> to pypi. Start uploading pre-releases to pypi, and you get these for free
> (once you make numpy compile out of the box on those build farms).  There
> are build farms for other OSes too... like ubuntu/debian, macports,
> etc.  Some distributions even automatically download, compile and
> package new releases once they spot a new file on your ftp/web site.

I am familiar with some of those systems (the PPA and opensuse build
services in particular). One of the goals of my proposal is to make it
easier to interoperate with those tools.

I think Pypi is mostly useless. The lack of enforced metadata is a big
no-no IMHO. The fact that Pypi is miles behind CRAN, for example, is
quite significant. I want CRAN for scientific python, and I don't see
Pypi becoming it in the near future.

The point of having our own Pypi-like server is that we could do the following:
 - enforce metadata (see the sketch below)
 - make it easy to extend the service to support our needs
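
To make the first point concrete, here is a minimal sketch (plain Python,
illustrative only - not toydist code, and the field names are placeholders)
of what server-side metadata enforcement could look like: uploads with
incomplete metadata are simply rejected.

    # Illustrative sketch of server-side metadata enforcement (not toydist code).
    REQUIRED_FIELDS = ("name", "version", "summary", "license", "dependencies")

    def metadata_problems(meta):
        """Return a list of problems; an empty list means the upload may proceed."""
        return ["missing or empty field: %s" % field
                for field in REQUIRED_FIELDS if not meta.get(field)]

    if __name__ == "__main__":
        upload = {"name": "foo", "version": "1.0"}   # hypothetical, incomplete upload
        for problem in metadata_problems(upload):
            print(problem)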

>
> pypm:  http://pypm.activestate.com/list-n.html#numpy

It is interesting to note that one of the maintainers of pypm recently
quit the discussions about Pypi, most likely out of frustration with
the other participants.

> Documentation projects are being worked on to document, give tutorials
> and make python packaging easier all round.  As witnessed by the 20 or
> so releases on pypi every day (and growing), lots of people are using
> the python packaging tools successfully.

This does not mean much IMO. Uploading to Pypi is almost required to
use virtualenv, buildout, etc. The interesting metric is not how many
packages are uploaded, but how much they are used outside the developer
community.

>
> I'm not sure making a separate build tool is a good idea.  I think
> going with the rest of the python community, and improving the tools
> there is a better idea.

It has been tried, and IMHO it has failed. You can look at the recent
discussions (the one started by Guido in particular).

> pps. some notes on toydist itself.
> - toydist convert is cool for people converting a setup.py .  This
> means that most people can try out toydist right away.  but 

Re: [matplotlib-devel] [Numpy-discussion] Announcing toydist, improving distribution and packaging situation

2009-12-29 Thread David Cournapeau
On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield  wrote:

> Buildout is what a lot of the python community are using now.

I would like to note that buildout is a solution to a problem that I
don't care to solve. In my experience this is particularly difficult to
explain to people accustomed to buildout - I have not found a way to
explain it very well yet.

Buildout and virtualenv both work by sandboxing away from the system
python: the environments do not see each other, which may be useful for
development, but as a deployment solution for the casual user who may
not be familiar with python, it is useless. A scientist who installs
numpy, scipy, etc. to try things out wants to have everything
available in one python interpreter, and does not want to jump between
different virtualenvs and whatnot to try different packages.

This has strong consequences for how you look at things from a packaging POV:
 - uninstall is crucial
 - a package bringing down python is a big no-no (this happens way too
often when you install things through setuptools)
 - if something fails, recovery should be trivial - the person
doing the installation may not know much about python
 - you cannot use sandboxing as a replacement for backward
compatibility (that's why I don't care much about all the discussion
about versioning - I don't think it is very useful as long as python
itself does not support it natively).

In the context of ruby, this article makes a similar point:
http://www.madstop.com/ruby/ruby_has_a_distribution_problem.html

David



Re: [matplotlib-devel] [Numpy-discussion] Announcing toydist, improving distribution and packaging situation

2009-12-29 Thread Gael Varoquaux
On Tue, Dec 29, 2009 at 11:34:44PM +0900, David Cournapeau wrote:
> Buildout, virtualenv all work by sandboxing from the system python:
> each of them do not see each other, which may be useful for
> development, but as a deployment solution to the casual user who may
> not be familiar with python, it is useless. A scientist who installs
> numpy, scipy, etc... to try things out want to have everything
> available in one python interpreter, and does not want to jump to
> different virtualenvs and whatnot to try different packages.

I think that you are pointing out a large source of misunderstanding
in packaging discussions. The people behind setuptools, pip or buildout
care about having a working ensemble of packages that delivers an
application (often a web application)[1]. You and I, and many scientific
developers, see libraries as building blocks that need to be assembled by
the user - the scientist using them to do new science. Thus the idea of
isolation is not something we can accept, because it means restricting
the user to a fixed set of libraries.

Our definition of the user is not the same as the user targeted by buildout.
Our user does not push buttons, but writes code. However, unlike the
developer targeted by buildout and distutils, our user does not want or
need to learn about packaging.

Trying to make the debate clearer...

Gaël

[1] I know your position on why simply focusing on sandboxing working
ensembles of libraries is not a replacement for backward compatibility,
and will only create impossible problems in the long run. While I agree
with you, this is not my point here.



Re: [matplotlib-devel] [Numpy-discussion] Announcing toydist, improving distribution and packaging situation

2009-12-29 Thread Christopher Barker
David Cournapeau wrote:

> Buildout, virtualenv all work by sandboxing from the system python:
> each of them do not see each other, which may be useful for
> development,

And certain kinds of deployment, like web servers or installed tools.

> but as a deployment solution to the casual user who may
> not be familiar with python, it is useless. A scientist who installs
> numpy, scipy, etc... to try things out want to have everything
> available in one python interpreter, and does not want to jump to
> different virtualenvs and whatnot to try different packages.

Absolutely true -- which is why Python desperately needs package version 
selection of some sort. I've been tooting this horn on and off for years 
but never got any interest at all from the core python developers.

I see installing packages with no version information as akin to having
non-versioned dynamic libraries in a system -- i.e. dll hell. If I have a
bunch of stuff running just fine with the various package versions I've
installed, but then I start working on something (maybe just testing,
maybe something more real) that requires the latest version of a
package, I have a few choices:
   - install the new package and hope I don't break too much
   - use something like virtualenv, which requires a lot of overhead to
set up and use (my evidence is anecdotal: despite working with a team that
uses it, somehow I've never gotten around to using it for my own dev work,
even though, in theory, it should be a good solution)
   - setuptools supposedly supports multiple-version installs and
selection, but it's ugly and poorly documented enough that I've never
figured out how to use it.

This has been addressed with a handful of ad hoc solutions: wxPython has
wxversion.select, I think PyGTK has something similar, and who knows what
else. It would be really nice to have a standard solution available.
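
For illustration, here is roughly what the two existing ad hoc mechanisms
look like; both have to run before the package itself is imported, and the
version strings are just placeholders:

    # wxPython's wxversion module:
    import wxversion
    wxversion.select("2.8")        # pick the 2.8 series among installed copies
    import wx

    # setuptools' pkg_resources, for multi-version ("egg") installs:
    import pkg_resources
    pkg_resources.require("numpy==1.3.0")   # put exactly this version on sys.path
    import numpy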

Note that the usual response I've gotten is to use py2exe or something
to distribute, so you're defining the whole stack. That's good for some
things, but not all (though py2app's "alias" bundles are nice), and
really pretty worthless for development. Also, many, many packages are
a pain to use with py2exe and friends anyway (see my forthcoming other
long post...)

>  - you cannot use sandboxing as a replacement for backward
> compatibility (that's why I don't care much about all the discussion
> about versioning - I don't think it is very useful as long as python
> itself does not support it natively).

could be -- I'd love to have Python support it natively, though 
wxversion isn't too bad.

-Chris


-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

chris.bar...@noaa.gov



Re: [matplotlib-devel] [Numpy-discussion] Announcing toydist, improving distribution and packaging situation

2009-12-29 Thread David Cournapeau
On Wed, Dec 30, 2009 at 3:36 AM, René Dudfield  wrote:
> On Tue, Dec 29, 2009 at 2:34 PM, David Cournapeau  wrote:
>> On Tue, Dec 29, 2009 at 10:27 PM, René Dudfield  wrote:
>>
>>> Buildout is what a lot of the python community are using now.
>>
>> I would like to note that buildout is a solution to a problem that I
>> don't care to solve. This issue is particularly difficult to explain
>> to people accustomed with buildout in my experience - I have not found
>> a way to explain it very well yet.
>
> Hello,
>
> The main problem buildout solves is getting developers up to speed
> very quickly on a project.  They should be able to call one command
> and get dozens of packages, and everything else needed ready to go,
> completely isolated from the rest of the system.
>
> If a project does not want to upgrade to the latest versions of
> packages, it does not have to.  This reduces the dependency problem a
> lot, as one package does not have to block waiting on 20 other
> packages.  It makes iterating packages daily, or even hourly, a
> non-problem - even with dozens of different packages in use.  This is
> not theoretical: many projects iterate this quickly, and do not have
> problems.
>
> Backwards compatibility is of course a great thing to keep up... but it
> is harder to do with dozens of packages, some of which are third-party
> ones.  For example, some people are running pygame applications
> written 8 years ago on the latest versions of pygame today.  I don't
> think people in the python world understand API and ABI compatibility
> as much as those in the C world.
>
> However, buildout is a solution to their problem, and allows them to
> iterate quickly with many participants, on many different projects.
> Many of these people work on maybe 20-100 different projects at once,
> and some machines may be running that many applications at once too.
> So using the system python's packages is completely out of the question
> for them.

This is all great, but I don't care about solving this issue: it is
a *developer* issue. I don't mean it is not an important issue, it
is just totally out of scope.

The developer issues I care about are much more fine-grained (correct
dependency handling between targets, toolchain customization, etc.).
Note, however, that hopefully, by simplifying the packaging tools, the
problems you see with numpy on 2.6 would become less common. The whole
distutils/setuptools/distribute stack is hopelessly intractable, given
how messy the code is.

>
> It is very easy to include a dozen packages in a buildout, so that you
> have all the packages required.

I think there is some confusion - I mostly care about *end users*: people
who may not have compilers, and who want to be able to easily upgrade one
package, etc.

David
