Re: [Python-3000] Changing the import machinery

Ian Bicking Thu, 20 Apr 2006 10:53:18 -0700

Guido van Rossum wrote:
> It must not be my day. I don't understand anything you're saying.

Hmm... well, trying again...

> On 4/20/06, Ian Bicking <[EMAIL PROTECTED]> wrote:
> 
>>Cleaning up import stuff would be excellent.  Time spent debugging
>>imports is time wasted, but it happens all too often.
>>
>>I would argue against any list of loaders, or list of anything.  That
>>builds ambiguity directly into the system.  Without a list, if you want
>>ambiguity, a container loader could search a list of loaders.  Or if you
>>want to avoid all ambiguity, you could have a loader that was more picky.
> 
> 
> What ambiguity are you talking of here?

If you have a sys.path like ['/A', '/B'], and you have a package in 
/B/foo, you don't know if importing foo will import that package, 
because it depends on the contents of /A.
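
A minimal sketch of that ambiguity, assuming two temporary directories 
stand in for /A and /B.  The package name demo_foo and the helpers 
make_package and import_origin are hypothetical, made up for this 
illustration only.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, name, marker):
    """Create package `name` under `root`; its __init__ records `marker`."""
    pkg_dir = os.path.join(root, name)
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
        f.write("ORIGIN = %r\n" % marker)


def import_origin(path_entries, name):
    """Import `name` with `path_entries` first on sys.path; return ORIGIN."""
    saved_path = list(sys.path)
    sys.modules.pop(name, None)
    sys.path[:0] = path_entries
    importlib.invalidate_caches()
    try:
        return importlib.import_module(name).ORIGIN
    finally:
        # Restore global state so the demo does not leak into the caller.
        sys.path[:] = saved_path
        sys.modules.pop(name, None)


with tempfile.TemporaryDirectory() as tmp:
    dir_a = os.path.join(tmp, "A")  # stands in for /A
    dir_b = os.path.join(tmp, "B")  # stands in for /B
    os.makedirs(dir_a)
    os.makedirs(dir_b)

    make_package(dir_b, "demo_foo", "B")
    before = import_origin([dir_a, dir_b], "demo_foo")  # only B has it

    make_package(dir_a, "demo_foo", "A")
    after = import_origin([dir_a, dir_b], "demo_foo")   # A now shadows B

print(before, after)
```

Nothing under B changed between the two imports; only the contents of 
A did, and that alone changed which package was loaded.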

>>Setuptools version-based eager loading
> 
> 
> Can you explain this? I don't actually know much of what setuptools
> does (nor does almost anyone else it seems :-) -- I have no frame of
> reference to understand this.

When you explicitly activate an egg, setuptools looks at the 
requirements listed for that egg and activates all those required eggs 
as well.  (Activating doesn't import anything, but adds the egg to 
sys.path if necessary, and recursively activates that egg's requirements.)

So you get an early failure if something is simply missing; you don't 
have to wait for the import to happen.

It's mostly an aside, but the "try:import foo except ImportError: foo = 
None" pattern doesn't really work in this context, because what fails to 
import might be importable later when another egg is activated. 
Generally that pattern is best replaced with setuptools requirements.
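
A rough sketch of why the pattern misleads, assuming that adding a 
directory to sys.path is an acceptable stand-in for activating an egg 
(it is not the setuptools API).  The module name late_demo_mod is 
hypothetical.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as egg_dir:
    # A module that exists on disk but is not yet on sys.path.
    with open(os.path.join(egg_dir, "late_demo_mod.py"), "w") as f:
        f.write("VALUE = 42\n")

    # The optional-import pattern, evaluated before "activation".
    try:
        import late_demo_mod
    except ImportError:
        late_demo_mod = None
    cached_result = late_demo_mod  # stays None from here on

    # Stand-in for activation: the directory joins sys.path afterwards.
    sys.path.insert(0, egg_dir)
    importlib.invalidate_caches()
    try:
        fresh_result = importlib.import_module("late_demo_mod").VALUE
    finally:
        sys.path.remove(egg_dir)
        sys.modules.pop("late_demo_mod", None)

print(cached_result, fresh_result)
```

The name bound by the except branch keeps saying the module is 
unavailable even though a later import succeeds.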

>>can give you some confidence that
>>everything you think you need is installed, but can't provide much
>>confidence that everything you *think you are using* is actually what
>>you are using.  That is, it's been fairly common in my own experience
>>for me to realize some other version of a package is being loaded than
>>what I thought, or I spend an inordinate amount of time tweaking
>>requirements to get the right version of a package from one place
>>without affecting another package that needs a different version (or
>>perhaps is run with a different sys.path).
> 
> 
> Is this a setuptools thing or a Python thing? I don't understand what
> you mean by "tweaking requirements".

This is a setuptools thing... but I guess I was getting ahead of myself. 
I guess I was alluding to a problem I didn't actually describe, then 
describing how setuptools tries but doesn't entirely succeed in 
resolving the problem.

With the ambiguity of sys.path, it's hard to statically determine if you 
have all the requirements.  Well, with the lack of requirement 
declarations, you have to scan documentation and imports to actually 
figure anything out.  With the lack of version declarations, you have to 
scan lots of documentation to figure out if you have the right versions.

If you need to resolve a problem in a non-global manner, you can do so 
with alternate installation locations and sys.path manipulation.  How 
you do the sys.path manipulation is very important, as order is very 
important in this situation.  Side effects from later installations are 
common depending on how you set it up.  Sometimes these side effects are 
desired (e.g., a security upgrade), sometimes not.  Usually not.  Also 
the environment you use is very important -- if your manipulations 
depend on tweaked scripts or $PYTHONPATH, alternate entry points will 
often give you a non-working process, or partially-working.

In theory, with setuptools you can use a single global installation and 
let setuptools resolve the requirements and activate the correct 
packages.  So setuptools is doing the sys.path manipulation for you.  In 
practice I have found this to still be too implicit.

>>But, back to more concrete use cases:
>>
>>* Right now it is pretty hard to set up an environment where changes
>>elsewhere on the system can't leak in.  That is, installation of
>>something in a system-wide site-packages can cause problems everywhere
>>on the system, and even if you try to avoid these it is quite hard.  One
>>strategy is setting up an entirely different environment (aka, a
>>different prefix); this is heavy-feeling.  Another is avoiding site.py
>>or using a custom site.py, but the tool support is iffy for that.
> 
> 
> I have a sense of deja vu -- somehow I feel we've been discussing this
> some years ago. Are you sure you didn't post this same message around
> the time we were discussing PEP 302?

I didn't participate in that discussion, but maybe someone else felt the 
same way ;)

> Seriously, can you provide an example of the sort of thing you're
> thinking of? Again, is this setuptools specific or not? Is the problem
> caused by users setting their PYTHONPATH a certain way? (If so, how?)

No, this is not setuptools specific.  Setuptools has some workarounds 
related to site.py (which I assume are themselves going away in Python 
2.5, but I don't know)... but I don't think these relate.

It's hard to safely remove things from sys.path.  So site.py runs 
extremely early in the process, and adds a bunch of things to sys.path 
(including .pth processing).  If you want to create an isolated 
environment at that point, it feels like a lost cause -- sys.path has 
all sorts of spurious entries, and figuring out which ones are important 
and which ones are not important (and just introduce ambiguity) is hard.

PYTHONPATH is a way of adding things to the path, but there's no way of 
keeping things off the path.

>>There's really just no good way to tell Python to leave well enough alone.
> 
> 
> That's a rather general and vague remark. What do you mean?

Mostly what I was talking about up there.  Package installations 
frequently have unintended side effects.  And there's lots of 
complicated machinery in place (often in distribution packaging systems) 
to handle this complexity, but it'd be nicer if the complexity just 
didn't exist.

>>* Relatedly, installation and management when you don't have root or the
>>cooperation of root can be hard.  I think the answer to this is much
>>like the isolated environment, but the use case is fairly different.
> 
> 
> This I understand. What do you suggest we do to improve the situation?

Mostly the ability to set up a localized environment disassociated from 
the global installation (except the stdlib).  So, what I described up 
there was mostly my frustrations in managing multiple web applications 
running on the same server, where the apps require considerably 
different versions of software.  The solution I would prefer -- isolated 
and localized environments -- would also apply to the hosting situation.

>>* Configuration about where to install things (e.g., distutils.cfg) is
>>separate from information about where to look for things (sys.path).
>>These should form a consistent description of the environment, but
>>currently they are disassociated from each other.
> 
> 
> I didn't even know we have a distutils.cfg.

It isn't installed by default, but it's like ~/.pydistutils.cfg for the 
system.  It should exist more often than it does.

> I am beginning to believe that you're talking about issues with
> installation rather than issues in the import machinery proper. But
> maybe if you clear up some of my earlier confusion I'll understand
> better.

Yeah, I guess I've had installation on the brain for a while, but I 
think the two issues intersect a lot.  Some of the import machinery is 
complex to handle complex installation situations.

I think sys.path and the import machinery ignore installation currently, 
but I think it's a fake simplicity, since nothing appears on the path 
unless it is put there by someone.  There needs to be consistency 
between the two.

In my mind, there's some concept that encompasses both of these items, 
and I think if it is identified some of this will be easier to understand.

>>* Any kind of automatic installation is difficult, because you can't
>>really count on being able to install even the most innocuous package in
>>an automated way.  There's too many manual overrides, and too many
>>redundant options, and few people actually have their system set up to
>>work without tweaking these options through the command line or other
>>feedback.
> 
> 
> Again that's rather vague and general. Please be specific. "Build a
> better mousetrap" can't be our requirements spec.

Again, it's just a frustration at the lack of integration of 
installation and importing.

>>* Personally I've settled on putting everything I make into a Python
>>package that is distutils-installable.  But many people don't.  I'm not
>>sure if this is just because the tools seem too hard, or the namespaces
>>feel too deep, or all the documentation starts without using packages,
>>or having '.' (sometimes) on sys.path does it, or what.  I'd rather
>>there be consistent practices; but the consistent practices that we have
>>that actually work (setup.py scripts and packages) are too heavy for a
>>lot of people.
> 
> 
> To the contrary, if your entire code fits in a single .py file, I'm
> not sure what distutils even buys you, and it sure costs a lot of
> complexity. FWIW Google's internal build system agrees with you and
> doesn't like 3rd party code that's not in a package.
> 
> I often have a hard time finding the source code in a distutils-based
> package that I have unpacked but am hesitant to install just to read
> the source code; there seems to be no consistent convention as to
> where the source is.

For experiments I never distutilify my code to start with.  No one does. 
  Certainly it's heavier than it should have to be.

I guess, to randomly create a parallel, it's a great feature that in 
Python every .py file is a module.  This gets people writing modular 
code without even realizing it; good practice is automatic, and doesn't 
even have any real overhead.  There's not much disconnect between the 
Right Way and the Easy Way.  I wish it was the same for packages and 
distutils code.

>>* People are seriously planning on using relative imports to manage
>>their packages, and so an application will be 'installed' by putting it
>>into another package.  Presumably unpacking it directly in some other
>>package's directory.  Who knows what the version control plans are, or
>>maintenance, or whatever.  I think it's a bad idea.  We need to give
>>these people a carrot to keep them from doing this.
> 
> 
> You seem to have version control on your brain. (Not that that's a bad
> thing, but most people don't -- it's a specific way of looking at
> things.) I'm not entirely sure I understand what bad practice you're
> describing here; and what do you propose to do instead? Surely not
> killing relative import?

No, definitely not.  I'm just noting that I think relative imports will 
be used by people who want to avoid the current distutils process, even 
though relative imports are not a good solution.  The stick way of 
keeping people from doing this would be keeping relative imports out of 
the language.  The carrot way is making the good solution easier than 
the bad solution.

>>* Right now namespace packages are hard.  That is, a Python package
>>(like 'zope') that is used by several distutils packages.  I almost feel
>>like namespace packages should be installed flat, like 'zope-interface'
>>and 'zope-tal', and turned into namespaces dynamically.
> 
> 
> Fine. That's the most advanced usage there is. I'd be happy if we
> solved all other problems first.

Yeah, I'm not even sure how important namespace packages are.  They 
don't actually buy you anything; it's not like zope.interface is 
magically more modular and elegant than zopeinterface.

Maybe it should really just be seen as a case of the stability of 
imports; zope.interface needs to be supported because people are using 
it.  As distributions are refactored, they don't want to also refactor 
imports.

>>* The module layout is used both as an API and as an internal factoring
>>of the code.  If you want to refactor the code you break the API.
>>Personally I really like the strong connection between imports and code
>>  location, and appreciate how easily I can find code as a result.  But
>>setting up the scaffolding and warnings necessary when moving a module
>>can be tiresome.
> 
> 
> I don't expect there's a silver bullet here; backwards compatibility
> hacks are always tiresome.

No; the most obvious solution of declaring and mapping all external 
interfaces is not a solution I would care for.

Setuptools' entry points also offer some help here, basically by 
providing such a declarative frontend, but with the query facilities 
to actually add extra value.

>>* Circular imports should fail more nicely.  Everyone suffers this at
>>some time; maybe it can't be fixed, but at least it should be clear
>>what's happening.
> 
> 
> It probably can't be fixed (or do you see a fix?). Do you mean it
> should be easier to debug, or do you mean it should be explained
> better how things work?

I don't see any fix.  Maybe just something like giving 
AttributeError('partially-loaded module foo has no attribute bar') 
instead of the really unhelpful AttributeError("'module' object has no 
attribute 'foo'").  Just some error that lets the developer know that a 
circular import is happening.
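
A small reproduction of the failure mode, as a sketch only.  The module 
names circ_demo_a and circ_demo_b are hypothetical.  The message text 
depends on the interpreter version, so it is printed rather than 
assumed.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # circ_demo_a imports circ_demo_b before it has defined `bar` ...
    with open(os.path.join(tmp, "circ_demo_a.py"), "w") as f:
        f.write("import circ_demo_b\nbar = 1\n")
    # ... and circ_demo_b reads `bar` from the half-loaded circ_demo_a.
    with open(os.path.join(tmp, "circ_demo_b.py"), "w") as f:
        f.write("import circ_demo_a\nvalue = circ_demo_a.bar\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        importlib.import_module("circ_demo_a")
        error = None
    except AttributeError as exc:
        error = exc
    finally:
        sys.path.remove(tmp)
        for name in ("circ_demo_a", "circ_demo_b"):
            sys.modules.pop(name, None)

print(type(error).__name__, error)
```

The attribute exists in the source of circ_demo_a; it just has not been 
assigned yet at the moment circ_demo_b asks for it.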

>>* You can't really tell if "from foo import bar" can be written as
>>"import foo; bar = foo.bar", because it works if foo contains bar, but
>>not if foo is a package and bar is a module in that package.
> 
> 
> And is that bad? "from foo import bar" means what it means. Nobody
> said the other code is equivalent. If you know what foo is you will
> know whether the other code is equivalent; isn't that enough?

There's no explicit namespace operator, so you don't necessarily know 
what foo is, nor do you have to.  But it's a minor issue.
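
To make the difference concrete, here is a sketch with a throwaway 
package.  The names ns_demo_foo and its submodule bar are hypothetical.

```python
import importlib
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a package with an empty __init__.py and one submodule, bar.
    pkg_dir = os.path.join(tmp, "ns_demo_foo")
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "bar.py"), "w") as f:
        f.write("NAME = 'bar module'\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        import ns_demo_foo
        # The submodule is not loaded yet, so plain attribute access fails.
        attribute_before = hasattr(ns_demo_foo, "bar")

        # The from-import falls back to importing the submodule.
        from ns_demo_foo import bar
        from_import_name = bar.NAME

        # As a side effect the submodule is now bound on the package.
        attribute_after = hasattr(ns_demo_foo, "bar")
    finally:
        sys.path.remove(tmp)
        for name in ("ns_demo_foo", "ns_demo_foo.bar"):
            sys.modules.pop(name, None)

print(attribute_before, from_import_name, attribute_after)
```

So "import foo; bar = foo.bar" only works for a submodule if something 
else has already imported it.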

>>Well... I think that's maybe half way through the list of issues I have,
>>but this email is already much too long.
> 
> 
> And still not clear enough. It's difficult to communicate when one
> party has spent a long time exploring the issues and another is just
> starting; this time I seem to be on the receiving end (== haven't
> spent a lot of time).

Quite understandable; maybe a few iterations of this and I'll make 
myself clear ;)  I'm also at a loss in some ways, as I'm only really 
familiar with how Python works.  There's probably some useful mechanisms 
and terminology from other languages that we could learn from -- I've 
heard interesting things about .NET, for instance.  OTOH, there's a 
*lot* of really bad examples in other languages too ;)  When I see 
references to package management in other languages (especially dynamic 
ones), most of the problems are very recognizable.

I actually suspect there's a set of practices here which is 
language-neutral, but I don't know of any community where they could 
really be worked on.  Linux distributions, perhaps, but many of the 
people involved in packaging come off as surprisingly stodgy.  They've 
spent so much time working around these problems that they dislike any 
attempt to fix the problem.

-- 
Ian Bicking  /  [EMAIL PROTECTED]  /  http://blog.ianbicking.org
_______________________________________________
Python-3000 mailing list
Python-3000@python.org
http://mail.python.org/mailman/listinfo/python-3000
