On 2020-01-11 23:34, Steven D'Aprano wrote:
> On Sun, Jan 12, 2020 at 11:59:20AM +1100, Chris Angelico wrote:
> > The biggest difference is that scripts can't do relative imports.
> How is that relevant? People keep mentioning minor differences between
> different ways of executing different kinds of entities (scripts,
> packages, submodules etc.) but not why those differences are important
> or why they would justify any change in the way -m works.
I don't endorse every detail of the OP's proposal, but I do
agree that the process for executing Python files has some irritating
warts. In fact, I would say the problem is precisely that a difference
exists between running a "script" and a "module". So let me explain why
I think this is annoying.
The pain point is relative imports. The docs at
https://docs.python.org/3/reference/import.html#packages say:
"You can think of packages as the directories on a file system and
modules as files within directories, but don’t take this analogy too
literally since packages and modules need not originate from the file
system."
The basic problem is that the overwhelming majority of packages and
modules DO originate from the filesystem, and so people naturally want
to be able to use the filesystem directly to represent package
structure, REGARDLESS OF HOW OR WHETHER THE FILES ARE RUN OR IMPORTED.
I'm sorry to put that in caps but that is really the fundamental issue.
People want to be able to write something like "from . import stuff"
in a file, and know that that will work purely based on the filesystem
location in which that file is situated, regardless of how the file is
"accessed" by Python (i.e., as a module, script, program, whatever you
want to call it).
In other words, what non-expert users expect is that if there is a
directory called `foo` with a subdirectory `bar` with some more files,
that alone should be sufficient to establish that `foo` is a package
with `bar` as a subpackage and the other files available as modules like
`foo.stuff` and `foo.bar.morestuff`. (Some users perhaps understand
that the folders should have an __init__.py to be considered part of the
package, but I think even this is less well understood in the era of
namespace packages.) It should not matter exactly how you "get to"
these files in the first place --- that is, it should not matter whether
you are importing a file or running one "as a script" or "as a module",
nor should it matter precisely which file you run. The mere fact that a
file "a.py" exists and is in the same directory with a file called
"b.py" should be enough for "a.py" to use "from . import b" and have it
work, always.
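To make the asymmetry concrete, here is a minimal sketch that simulates
both invocation styles in one process using runpy (the file and package
names are invented for illustration):

```python
import runpy
import sys
import tempfile
from pathlib import Path

# Build the two-file layout from the text: a.py does "from . import b".
root = Path(tempfile.mkdtemp())
pkg = root / "pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "b.py").write_text("VALUE = 42\n")
(pkg / "a.py").write_text("from . import b\nRESULT = b.VALUE\n")

# Running a.py "as a script" (what `python pkg/a.py` does): the relative
# import fails, because the file is executed with no parent package.
try:
    runpy.run_path(str(pkg / "a.py"))
    script_ok = True
except ImportError:
    script_ok = False

# Running the same file "as a module" (what `python -m pkg.a` does, with
# the parent directory on sys.path): __package__ is set and it works.
sys.path.insert(0, str(root))
ns = runpy.run_module("pkg.a")

print(script_ok)     # False: "attempted relative import with no known parent package"
print(ns["RESULT"])  # 42: the relative import succeeded
```

The file's bytes are identical in both runs; only the way Python is told
to reach the file differs.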
Now, I realize that there are various reasons why it doesn't work this
way. Basically these reasons boil down to the fact that although most
packages are transparently represented by their file/directory
structure, there also exist namespace packages, which can have a
more diffuse file/directory structure, and it's also possible to create
"virtual" packages that have no filesystem representation at all.
But the documentation is a long, long way from making this clear. For
instance, it says this:
"For example, the following file system layout defines a top level
parent package with three subpackages:"
But that's not true! The filesystem layout itself does not define the
package! For relative import purposes, it only "counts" as a package if
it's imported, not if a file in it is run directly. Otherwise it's just
some files on disk, and if you run one of them "as a script", no package
exists as far as Python is concerned.
The documentation does go on to describe how __main__ works and how the
file's __name__ is set if it's run, and so on. But it does all this
using the term "package", which is a trap for the unwary, because they
already think package means "a directory with a certain structure" and
not "something you get via the `import` statement".
Ultimately, the problem is that users (especially beginners) want to be
able to put some files in a folder and have it work as a package as long
as they are working locally in that folder --- without messing with
sys.path or "installing" anything. In other words they want to create a
directory and put "my_script.py" in there, and then put "mylib.py" in
there and have the former use relative imports to get stuff from the
latter. But they can't.
Personally, I am in agreement that this behavior is extremely
bothersome. (In particular, the fact that __name__ becomes __main__
when the script is run, but is set to its usual name when it is
imported, was a poor design decision that creates confusing asymmetries
between the run and import cases.) It makes it unnecessarily difficult
to write small, self-contained programs which make use of relative
imports. Yes, it is better to write a setup.py and specify the
dependencies, and blah blah, but for small tasks people often simply
don't want to do that. They want to unzip their files into a directory
and have it work, without notifying Python about installing anything or
putting anything on the path.
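The __name__ asymmetry mentioned above takes only a few lines to
demonstrate (a sketch using importlib and runpy; the file name is
invented):

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# One file, two identities: __name__ depends on how the file is reached.
root = Path(tempfile.mkdtemp())
(root / "mylib.py").write_text("WHO = __name__\n")

# Imported: __name__ is the module's real name.
sys.path.insert(0, str(root))
mod = importlib.import_module("mylib")
print(mod.WHO)  # "mylib"

# Run directly (run_name="__main__" replicates `python mylib.py`):
# the very same file now reports "__main__".
ns = runpy.run_path(str(root / "mylib.py"), run_name="__main__")
print(ns["WHO"])  # "__main__"
```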
As far as solutions, I think an idea worth considering would be a new
command-line option similar to "-m" which effectively says "run this
FILE that I am telling you, but pretend it is in whatever package it
seems to be in based on the directory structure". Suppose the
option is -f, for "file as module": running "python -f script.py"
would run that file, but correctly set up __package__ and
so on so that "script.py" (and other files it imports) would be able to
use relative imports. Maybe that would mean they could unexpectedly
import higher than their level (i.e., use relative-import dots going
above the actual top level of the package), or maybe the relative
imports would be local to the directory where "script.py" is located, or
maybe you could even specify the relative import "root" in a separate
option, like "python -f script.py -r my/package/root".
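Nothing like -f exists today, but the "whatever package it seems to be
in" inference could plausibly be a walk up the directory tree while
__init__.py files are present. A purely hypothetical sketch (the
function name and layout are invented):

```python
import tempfile
from pathlib import Path

def infer_package(path):
    """Hypothetical: infer a file's dotted module name by walking up the
    directory tree for as long as __init__.py files are present.
    Returns (package_root, dotted_name)."""
    path = Path(path).resolve()
    parts = [path.stem]
    parent = path.parent
    while (parent / "__init__.py").exists():
        parts.append(parent.name)
        parent = parent.parent
    return parent, ".".join(reversed(parts))

# Demo layout: root/foo/bar/baz.py with __init__.py at each level.
root = Path(tempfile.mkdtemp()).resolve()
(root / "foo" / "bar").mkdir(parents=True)
(root / "foo" / "__init__.py").write_text("")
(root / "foo" / "bar" / "__init__.py").write_text("")
target = root / "foo" / "bar" / "baz.py"
target.write_text("")

pkg_root, name = infer_package(target)
print(name)             # "foo.bar.baz"
print(pkg_root == root) # True: the walk stopped where __init__.py ended
```

A -f option could then put pkg_root on sys.path and run the file under
its inferred dotted name, which is exactly the -m behavior, derived from
the file's location instead of from an already-importable name.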
The basic point is that people want to use relative imports without
including boilerplate code to put themselves on sys.path, and without
caring about whether the file is run directly or imported as a module,
and without "installing" anything, and in general without thinking about
anything except the local directory structure in which the file they are
running is situated.
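For reference, the boilerplate in question is usually some variant of
the following path-munging at the top of a script (a sketch; the helper
name and example path are invented):

```python
import os
import sys

def ensure_parent_on_path(file_path):
    # The classic manual workaround: insert the grandparent directory of
    # file_path (the package's parent) at the front of sys.path so that
    # absolute imports of sibling modules work when run directly.
    parent = os.path.dirname(os.path.dirname(os.path.abspath(file_path)))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent

# Typical use at the top of a script: ensure_parent_on_path(__file__)
p = ensure_parent_on_path("/tmp/proj/pkg/script.py")
print(p)  # "/tmp/proj"
```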
I realize that in many ways this is sloppy and you could say "don't do
that", but I think if that is the position, the documentation needs to
be seriously tightened up. In particular it needs to be made clear ---
at every single mention! --- that "package" refers only to something
that is imported and not to a file's "identity" based on its filesystem
location.
Just over six years ago I wrote an answer about this on StackOverflow
(https://stackoverflow.com/questions/14132789/relative-imports-for-the-billionth-time/14132912#14132912)
that continues to get upvotes and comments of the form "wow why isn't
this explained in the documentation" almost daily. I hope it is clear
that, even if we want to leave the behavior exactly as it is, there is a
major problem with how people think they can use relative imports based
on the official documentation.
--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no
path, and leave a trail."
--author unknown
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/VE64KSEMU7IOUXSJ5HVFMDKTMXDUEZTG/
Code of Conduct: http://python.org/psf/codeofconduct/