[Python-Dev] Re: PEP proposal to limit various aspects of a Python program to one million.

Rhodri James Tue, 03 Dec 2019 10:08:41 -0800

On 03/12/2019 16:15, Mark Shannon wrote:

Hi Everyone,
I am proposing a new PEP, still in draft form, to impose a limit of one million on various aspects of Python programs, such as the lines of code per module.
Any thoughts or feedback?

The PEP:
https://github.com/markshannon/peps/blob/one-million/pep-1000000.rst

Cheers,
Mark.


Full text
*********

PEP: 1000000
Title: The one million limit
Author: Mark Shannon <m...@hotpy.org>
Status: Active
Type: Enhancement
Content-Type: text/x-rst
Created: 03-Dec-2019
Post-History:



Abstract
========
This PR proposes a limit of one million (1 000 000) for various aspects of Python code and its implementation.
The Python language does not specify limits for many of its features.
Not having any limit to these values seems to enhance programmer freedom,
at least superficially, but in practice the CPython VM and other Python virtual
machines have implicit limits or are forced to assume that the limits are
astronomical, which is expensive.
This PR lists a number of features which are to have a limit of one million.
If a language feature is not listed but appears unlimited and must be finite,
for physical reasons if no other, then a limit of one million should be assumed.

Motivation
==========

There are many values that need to be represented in a virtual machine.
If no limit is specified for these values, then the representation must either be inefficient or vulnerable to overflow.
The CPython virtual machine represents values like line numbers,
stack offsets and instruction offsets by 32 bit values. This is inefficient, and potentially unsafe.
It is inefficient as actual values rarely need more than a dozen or so bits to represent them.
It is unsafe as malicious or poorly generated code could cause values to exceed 2\ :sup:`32`.
For example, line numbers are represented by 32 bit values internally.
This is inefficient, given that modules almost never exceed a few thousand lines.
Despite being inefficient, it is still vulnerable to overflow as
it is easy for an attacker to create a module with billions of newline characters.
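
As a rough illustration of the overflow described above (not CPython's actual code), the sketch below models a 32 bit unsigned line-number field in Python. The names `store_line_number`, `checked_line_number` and `LIMIT` are hypothetical, and the explicit check only shows how a declared limit could be enforced, assuming the limit is applied when the value is recorded.

```python
# Hypothetical model of a 32 bit unsigned line-number field.
UINT32_MASK = 0xFFFFFFFF  # 2**32 - 1
LIMIT = 1_000_000         # the limit proposed in the PEP


def store_line_number(line_no: int) -> int:
    """Return what an unchecked 32 bit field would hold for line_no."""
    return line_no & UINT32_MASK


def checked_line_number(line_no: int) -> int:
    """Reject out-of-range values instead of silently wrapping."""
    if not 0 <= line_no <= LIMIT:
        raise ValueError(f"line number {line_no} exceeds limit of {LIMIT}")
    return line_no


print(store_line_number(42))         # 42
print(store_line_number(2**32 + 5))  # 5: the value has wrapped around
```

With a declared limit, the second call would raise `ValueError` in `checked_line_number` rather than produce a misleading line number.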
Memory access is usually a limiting factor in the performance of modern CPUs.
Better packing of data structures enhances locality and reduces memory bandwidth,
at a modest increase in ALU usage (for shifting and masking).
Being able to safely store important values in 20 bits would allow memory savings
in several data structures including, but not limited to:

* Frame objects
* Object headers
* Code objects

There is also the potential for a more efficient instruction format, speeding up interpreter dispatch.

Rationale
=========
Imposing a limit on values such as lines of code in a module, and the number of local variables,
has significant advantages for ease of implementation and efficiency of virtual machines.
If the limit is sufficiently large, there is no adverse effect on users of the language.
By selecting a fixed but large limit for these values,
it is possible to have both safety and efficiency whilst causing no inconvenience to human programmers
and only very rare problems for code generators.

One million
-----------
The Java Virtual Machine (JVM) [1]_ specifies a limit of 2\ :sup:`16`-1 (65535) for many program
elements similar to those covered here.
This limit enables limited values to fit in 16 bits, which is a very efficient machine representation.
However, this limit is quite easily exceeded in practice by code generators and
the author is aware of existing Python code that already exceeds 2\ :sup:`16` lines of code.
A limit of one million fits into 20 bits which, although not as convenient for machine representation,
is still reasonably compact. Three signed values in the range -1000_000 to +1000_000 can fit into a 64 bit word.
A limit of one million is small enough for efficiency advantages (only 20 bits),
but large enough not to impact users (no one has ever written a module of one million lines).

OK, let me stop you here. If you have twenty bits of information, you'll be fitting them into a 32-bit word anyway. Anything else will be more or less inefficient to access, depending on your processor. You aren't going to save anything there.

If you have plans to use the spare bits for something else, please don't. I've seen this done in two major architectures (status flags for both the IBM System/370 and ARM 2 and 3 architectures lived in the top bits of the program counter), and it was acknowledged to be a major mistake both times. Aside from limiting your expansion (Who would ever want more than 24 bits of address space? Everyone, it turns out :-), every access you make to that word is going to need to mask out some bits of the word. You would take an efficiency hit on every access.
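
To make both sides of this exchange concrete, here is a minimal Python sketch of the packing the PEP describes (three signed values in one 64 bit word) together with the shift-and-mask work on every read that the reply objects to. It is only an illustration: the 21 bit field width (20 bits of magnitude plus a sign bit) and the names `pack3` and `unpack3` are assumptions, not anything taken from the PEP or from CPython.

```python
FIELD_BITS = 21                      # assumed: 20 bits of magnitude + sign
FIELD_MASK = (1 << FIELD_BITS) - 1
SIGN_BIT = 1 << (FIELD_BITS - 1)
LIMIT = 1_000_000


def pack3(a: int, b: int, c: int) -> int:
    """Pack three values in [-LIMIT, LIMIT] into one 63 bit integer."""
    word = 0
    for value in (a, b, c):
        if not -LIMIT <= value <= LIMIT:
            raise ValueError(f"{value} is outside the +/-{LIMIT} range")
        # Two's-complement truncation to FIELD_BITS bits.
        word = (word << FIELD_BITS) | (value & FIELD_MASK)
    return word


def unpack3(word: int) -> tuple[int, int, int]:
    """Recover the three fields; every read needs a shift and a mask."""
    fields = []
    for shift in (2 * FIELD_BITS, FIELD_BITS, 0):
        raw = (word >> shift) & FIELD_MASK
        if raw & SIGN_BIT:           # sign-extend negative fields
            raw -= 1 << FIELD_BITS
        fields.append(raw)
    return fields[0], fields[1], fields[2]


word = pack3(-1_000_000, 0, 1_000_000)
print(word < 2**63)      # True: three 21 bit fields use 63 bits
print(unpack3(word))     # (-1000000, 0, 1000000)
```

The space saving comes from `pack3`; the per-access cost is the shift, mask and sign extension in `unpack3`, which a plain 32 bit field per value would not need.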

Isn't this "640K ought to be enough for anybody" again?
-------------------------------------------------------

The infamous 640K memory limit was a limit on machine usable resources.
The proposed one million limit is a limit on human generated code.

While it is possible that generated code could exceed the limit,
it is easy for a code generator to modify its output to conform.

The author has hit the 64K limit in the JVM on at least two occasions when generating Java code.

The workarounds were relatively straightforward and
probably wouldn't have been necessary with a limit of one million bytecodes or lines of code.

I can absolutely guarantee that this will come back and bite you. Someone out there will be doing something more complicated than you think is plausible, and eventually someone will hit your limits. It may not take as long as you think, either.


--
Rhodri James *-* Kynesim Ltd
_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6BJVZ6KA3OSDP5ID2RFZM3KRGLZS6VAD/
Code of Conduct: http://python.org/psf/codeofconduct/
