[Python-Dev] Re: PEP proposal to limit various aspects of a Python program to one million.

Joao S. O. Bueno Wed, 04 Dec 2019 05:44:36 -0800

* The number of classes in a running interpreter.
* The number of live coroutines in a running interpreter.


These two can (and coroutines actually are always) dynamically generated -
and it is not hard to imagine
scenarios were 1 million for these would easily be beaten.

I don't know the data structures needed for those, but it would be much
saner to keep
both limited to 2**32 (due to reasons on artificial limiting the word
length as put by
others), or at least a much higher count.



On Tue, 3 Dec 2019 at 13:22, Mark Shannon <m...@hotpy.org> wrote:

> Hi Everyone,
>
> I am proposing a new PEP, still in draft form, to impose a limit of one
> million on various aspects of Python programs, such as the lines of code
> per module.
>
> Any thoughts or feedback?
>
> The PEP:
> https://github.com/markshannon/peps/blob/one-million/pep-1000000.rst
>
> Cheers,
> Mark.
>
>
> Full text
> *********
>
> PEP: 1000000
> Title: The one million limit
> Author: Mark Shannon <m...@hotpy.org>
> Status: Active
> Type: Enhancement
> Content-Type: text/x-rst
> Created: 03-Dec-2019
> Post-History:
>
>
>
> Abstract
> ========
> This PR proposes a limit of one million (1 000 000) for various aspects
> of Python code and its implementation.
>
> The Python language does not specify limits for many of its features.
> Not having any limit to these values seems to enhance programmer freedom,
> at least superficially, but in practice the CPython VM and other Python
> virtual
> machines have implicit limits or are forced to assume that the limits are
> astronomical, which is expensive.
>
> This PR lists a number of features which are to have a limit of one
> million.
> If a language feature is not listed but appears unlimited and must be
> finite,
> for physical reasons if no other, then a limit of one million should be
> assumed.
>
>
> Motivation
> ==========
>
> There are many values that need to be represented in a virtual machine.
> If no limit is specified for these values, then the representation must
> either be inefficient or vulnerable to overflow.
> The CPython virtual machine represents values like line numbers,
> stack offsets and instruction offsets by 32 bit values. This is
> inefficient, and potentially unsafe.
>
> It is inefficient as actual values rarely need more than a dozen or so
> bits to represent them.
>
> It is unsafe as malicious or poorly generated code could cause values to
> exceed 2\ :sup:`32`.
>
> For example, line numbers are represented by 32 bit values internally.
> This is inefficient, given that modules almost never exceed a few
> thousand lines.
> Despite being inefficent, is is still vulnerable to overflow as
> it is easy for an attacker to created a module with billions of newline
> characters.
>
> Memory access is usually a limiting factor in the performance of modern
> CPUs.
> Better packing of data structures enhances locality and reduces memory
> bandwith,
> at a modest increase in ALU usage (for shifting and masking).
> Being able to safely store important values in 20 bits would allow
> memory savings
> in several data structures including, but not limited to:
>
> * Frame objects
> * Object headers
> * Code objects
>
> There is also the potential for a more efficient instruction format,
> speeding up interpreter dispatch.
>
> Rationale
> =========
>
> Imposing a limit on values such as lines of code in a module, and the
> number of local variables,
> has significant advantages for ease of implementation and efficiency of
> virtual machines.
> If the limit is sufficiently large, there is no adverse effect on users
> of the language.
>
> By selecting a fixed but large limit for these values,
> it is possible to have both safety and efficiency whilst causing no
> inconvience to human programmers
> and only very rare problems for code generators.
>
> One million
> -----------
>
> The Java Virtual Machine (JVM) [1]_ specifies a limit of 2\ :sup:`16`-1
> (65535) for many program
> elements similar to those covered here.
> This limit enables limited values to fit in 16 bits, which is a very
> efficient machine representation.
> However, this limit is quite easily exceeded in practice by code
> generators and
> the author is aware of existing Python code that already exceeds 2\
> :sup:`16` lines of code.
>
> A limit of one million fits into 20 bits which, although not as
> convenient for machine representation,
> is still reasonably compact. Three signed valuses in the range -1000_000
> to +1000_000 can fit into a 64 bit word.
> A limit of one million is small enough for efficiency advantages (only
> 20 bits),
> but large enough not to impact users (no one has ever written a module
> of one million lines).
>
> The value "one million" is very easy to remember.
>
> Isn't this "640K ought to be enough for anybody" again?
> -------------------------------------------------------
>
> The infamous 640K memory limit was a limit on machine usable resources.
> The proposed one million limit is a limit on human generated code.
>
> While it is possible that generated code could exceed the limit,
> it is easy for a code generator to modify its output to conform.
> The author has hit the 64K limit in the JVM on at least two occasions
> when generating Java code.
> The workarounds were relatively straightforward and
> probably wouldn't have been necessary with a limit of one million
> bytecodes or lines of code.
>
>
> Specification
> =============
>
> This PR proposes that the following language features and runtime values
> be limited to one million.
>
> * The number of source code lines in a module
> * The number of bytecode instructions in a code object.
> * The sum of local variables and stack usage for a code object.
> * The number of distinct names in a code object
> * The number of constants in a code object.
> * The number of classes in a running interpreter.
> * The number of live coroutines in a running interpreter.
>
> The advantages for CPython of imposing these limits:
> ----------------------------------------------------
>
> Line of code in a module and code object restrictions.
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> When compiling source code to bytecode or modifying bytecode for
> profiling or debugging,
> an intermediate form is required. The limiting line numbers and operands
> to 20 bits,
> instructions can be represented in a compact 64 bit form allowing
> very fast passes over the instruction sequence.
>
> Having 20 bit operands (21 bits for relative branches) allows instructions
> to fit into 32 bits without needing additional ``EXTENDED_ARG``
> instructions.
> This improves dispatch, as the operand is strictly local to the
> instruction.
> Using super-instructions would make that the 32 bit format
> almost as compact as the 16 bit format, and significantly faster.
>
> Total number of classes in a running interpreter
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> This limit has to the potential to reduce the size of object headers
> considerably.
>
> Currently objects have a two word header, for objects without references
> (int, float, str, etc.) or a four word header for objects with references.
> By reducing the maximum number of classes, the space for the class
> reference
> can be reduced from 64 bits to fewer than 32 bits allowing a much more
> compact header.
>
> For example, a super-compact header format might look like this:
>
> .. code-block::
>
>      struct header {
>          uint32_t gc_flags:6; /* Needs finalisation, might be part of a
> cycle, etc. */
>          uint32_t class_id:26; /* Can be efficiently mapped to address
> by ensuring suitable alignment of classes */
>          uint32_t refcount; /* Limited memory or saturating */
>      }
>
> This format would reduce the size of a Python object without slots, on a
> 64 bit machine, from 40 to 16 bytes.
>
> Note that there are two ways to use a 32 bit refcount on a 64 bit machine.
> One is to limit each sub-interpreter to 32Gb of memory.
> The other is to use a saturating reference count, which would be a
> little bit slower, but allow unlimited memory allocation.
>
>
> Backwards Compatibility
> =======================
>
> It is hypothetically possible that some machine generated code exceeds
> one or more of the above limits.
> The author believes that to be highly unlikely and easily fixed by
> modifying the output stage of the code generator.
>
>
> Security Implications
> =====================
>
> Minimal. This reduces the attack surface of any Python virtual machine
> by a small amount.
>
> Reference Implementation
> ========================
>
> None, as yet. This will be implemented in CPython, once the PEP has been
> accepted.
>
>
> Rejected Ideas
> ==============
>
> None, as yet.
>
>
> Open Issues
> ===========
>
> None, as yet.
>
>
> References
> ==========
>
> .. [1] The Java Virtual Machine specification
>
> https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf
>
>
>
> Copyright
> =========
>
> This document is placed in the public domain or under the
> CC0-1.0-Universal license, whichever is more permissive.
>
>
>
> ..
> Local Variables:
> mode: indented-text
> indent-tabs-mode: nil
> sentence-end-double-space: t
> fill-column: 70
> coding: utf-8
> End:
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-dev@python.org/message/QM4QUJOBQORN5WJ2WZ4RVSHSQG52VKCQ/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

_______________________________________________
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-le...@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/7SAPVZWEHEBW5SLBL5UPQKXVHVB4DYYV/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-Dev] Re: PEP proposal to limit various aspects of a Python program to one million.

Reply via email to