On 2020-01-11 23:34, Steven D'Aprano wrote:
> On Sun, Jan 12, 2020 at 11:59:20AM +1100, Chris Angelico wrote:
>
> > The biggest difference is that scripts can't do relative imports.
>
> How is that relevant? People keep mentioning minor differences between
> different ways of executing different kinds of entities (scripts,
> packages, submodules etc) but not why those differences are important or
> why they would justify any change in the way -m works.

I don't endorse every detail of the OP's proposal, but I do agree that the process for executing Python files has some irritating warts. In fact, I would say the problem is precisely that a difference exists between running a "script" and a "module". So let me explain why I think this is annoying.

The pain point is relative imports. The docs at https://docs.python.org/3/reference/import.html#packages say:

"You can think of packages as the directories on a file system and modules as files within directories, but don’t take this analogy too literally since packages and modules need not originate from the file system."

The basic problem is that the overwhelming majority of packages and modules DO originate from the filesystem, and so people naturally want to be able to use the filesystem directly to represent package structure, REGARDLESS OF HOW OR WHETHER THE FILES ARE RUN OR IMPORTED. I'm sorry to put that in caps but that is really the fundamental issue. People want to be able to write something like "from . import stuff" in a file, and know that that will work purely based on the filesystem location in which that file is situated, regardless of how the file is "accessed" by Python (i.e., as a module, script, program, whatever you want to call it).

In other words, what non-expert users expect is that if there is a directory called `foo` with a subdirectory `bar` with some more files, that alone should be sufficient to establish that `foo` is a package with `bar` as a subpackage and the other files available as modules like `foo.stuff` and `foo.bar.morestuff`. (Some users perhaps understand that the folders should have an __init__.py to be considered part of the package, but I think even this is less well understood in the era of namespace packages.) It should not matter exactly how you "get to" these files in the first place --- that is, it should not matter whether you are importing a file or running one "as a script" or "as a module", nor should it matter precisely which file you run. The mere fact that a file "a.py" exists and is in the same directory with a file called "b.py" should be enough for "a.py" to use "from . import b" and have it work, always.
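To make that concrete, here is a minimal sketch (the names `foo`, `stuff`, `main` are made up for illustration) showing that when a package IS reached via `import`, the directory structure alone really is enough for relative imports to work:

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()

# Lay out foo/ as a package purely on disk:
# foo/__init__.py, foo/stuff.py, foo/main.py
pkg = os.path.join(tmp, "foo")
os.mkdir(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "stuff.py"), "w") as f:
    f.write("VALUE = 42\n")
with open(os.path.join(pkg, "main.py"), "w") as f:
    f.write("from . import stuff\nRESULT = stuff.VALUE\n")

# Imported, the relative import inside foo/main.py just works:
sys.path.insert(0, tmp)
import foo.main
print(foo.main.RESULT)  # -> 42
```

This is exactly the behavior users then (reasonably) expect to carry over when one of those same files is run directly -- and it doesn't.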

Now, I realize that there are various reasons why it doesn't work this way. Basically these reasons boil down to the fact that although most packages are transparently represented by their file/directory structure, there also exist namespace packages, which can have a more diffuse file/directory structure, and it's also possible to create "virtual" packages that have no filesystem representation at all.

But the documentation is a long, long way from making this clear. For instance, it says this:

"For example, the following file system layout defines a top level parent package with three subpackages:"

But that's not true! The filesystem layout itself does not define the package! For relative import purposes, it only "counts" as a package if it's imported, not if a file in it is run directly. Otherwise it's just some files on disk, and if you run one of them "as a script", no package exists as far as Python is concerned.
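You can watch the same files flip between "package" and "just some files on disk" depending only on how they are invoked. A sketch (file names hypothetical), building the layout in a temp directory and running it both ways:

```python
import os
import subprocess
import sys
import tempfile

tmp = tempfile.mkdtemp()
pkg = os.path.join(tmp, "foo")
os.mkdir(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "b.py"), "w") as f:
    f.write("VALUE = 1\n")
with open(os.path.join(pkg, "a.py"), "w") as f:
    f.write("from . import b\nprint(b.VALUE)\n")

# Run a.py directly "as a script": Python sees no package at all.
direct = subprocess.run(
    [sys.executable, os.path.join(pkg, "a.py")],
    capture_output=True, text=True,
)
print(direct.returncode != 0)                          # True
print("attempted relative import" in direct.stderr)    # True

# Run the identical file via -m from the parent directory: now the
# very same layout "counts" as a package and the import succeeds.
via_m = subprocess.run(
    [sys.executable, "-m", "foo.a"],
    capture_output=True, text=True, cwd=tmp,
)
print(via_m.stdout.strip())  # -> 1
```

Nothing on disk changed between the two runs; only the invocation did.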

The documentation does go on to describe how __main__ works and how the file's __name__ is set if it's run, and so on. But it does all this using the term "package", which is a trap for the unwary, because they already think package means "a directory with a certain structure" and not "something you get via the `import` statement".

Ultimately, the problem is that users (especially beginners) want to be able to put some files in a folder and have it work as a package as long as they are working locally in that folder --- without messing with sys.path or "installing" anything. In other words they want to create a directory and put "my_script.py" in there, and then put "mylib.py" in there and have the former use relative imports to get stuff from the latter. But they can't.

Personally, I am in agreement that this behavior is extremely bothersome. (In particular, the fact that __name__ becomes __main__ when the script is run, but is set to its usual name when it is imported, was a poor design decision that creates confusing asymmetries between the run and import cases.) It makes it unnecessarily difficult to write small, self-contained programs which make use of relative imports. Yes, it is better to write a setup.py and specify the dependencies, and blah blah, but for small tasks people often simply don't want to do that. They want to unzip their files into a directory and have it work, without notifying Python about installing anything or putting anything on the path.
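The __name__ asymmetry is easy to demonstrate with a one-line file (name hypothetical) that simply reports its own __name__ under each invocation:

```python
import os
import subprocess
import sys
import tempfile

tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "mod.py"), "w") as f:
    f.write("print(__name__)\n")

# Run as a script: the module is told it is "__main__".
as_script = subprocess.run(
    [sys.executable, os.path.join(tmp, "mod.py")],
    capture_output=True, text=True,
)
print(as_script.stdout.strip())  # -> __main__

# Imported: the same file sees its real name.
as_import = subprocess.run(
    [sys.executable, "-c", "import mod"],
    capture_output=True, text=True, cwd=tmp,
)
print(as_import.stdout.strip())  # -> mod
```

The `if __name__ == "__main__":` idiom is built on top of exactly this asymmetry, which is part of why it is so hard to explain to beginners.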

As far as solutions, I think an idea worth considering would be a new command-line option similar to "-m" which effectively says "run this FILE that I am telling you, but pretend it is in whatever package it seems to be in based on the directory structure". Suppose, for example, the option is -f for "file as module". It means if I do "python -f script.py", it would run that file, but correctly set up __package__ and so on so that "script.py" (and other files it imports) would be able to use relative imports. Maybe that would mean they could unexpectedly import higher than their level (i.e., use relative-import dots going above the actual top level of the package), or maybe the relative imports would be local to the directory where "script.py" is located, or maybe you could even specify the relative import "root" in a separate option, like "python -f script.py -r my/package/root".

The basic point is that people want to use relative imports without including boilerplate code to put themselves on sys.path, and without caring about whether the file is run directly or imported as a module, and without "installing" anything, and in general without thinking about anything except the local directory structure in which the file they are running is situated.
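For reference, this is the sort of sys.path boilerplate people paste into their scripts today to fake it (a sketch; the function name and the example path are made up). It puts the directory *above* the script's package on sys.path, so that absolute imports of sibling modules resolve no matter where the script was run from:

```python
import os
import sys

def add_parent_to_path(file_path):
    """Insert the grandparent directory of file_path (i.e. the parent
    of its package directory) at the front of sys.path."""
    parent = os.path.dirname(os.path.dirname(os.path.abspath(file_path)))
    sys.path.insert(0, parent)

# Typically called as add_parent_to_path(__file__) at the top of a script;
# shown here with a literal path for illustration (POSIX paths assumed).
add_parent_to_path("/home/me/proj/mypkg/script.py")
print(sys.path[0])  # -> /home/me/proj
```

Every copy of this snippet in the wild is evidence that the language is making people work around it rather than with it.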

I realize that in many ways this is sloppy and you could say "don't do that", but I think if that is the position, the documentation needs to be seriously tightened up. In particular it needs to be made clear --- at every single mention! --- that "package" refers only to something that is imported and not to a file's "identity" based on its filesystem location.

Just over six years ago I wrote an answer about this on StackOverflow (https://stackoverflow.com/questions/14132789/relative-imports-for-the-billionth-time/14132912#14132912) that continues to get upvotes and comments of the form "wow why isn't this explained in the documentation" almost daily. I hope it is clear that, even if we want to leave the behavior exactly as it is, there is a major problem with how people think they can use relative imports based on the official documentation.

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail."
   --author unknown
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VE64KSEMU7IOUXSJ5HVFMDKTMXDUEZTG/
Code of Conduct: http://python.org/psf/codeofconduct/