I would not have such a small limit. I can envisage generating code from a log then evaluating that code. 1 million lines could be small, given the speed of the interpreter on modern machines. One might want to generate data as a Python file rather than a pile and load that as a module. There might well be alternate methods, but if it works and is quickly debugged at small scale, then you put a possible extra barrier in the way. Python is still a scripting language, why limit a "quick hack".
On Tue, Dec 3, 2019, 4:22 PM Mark Shannon <m...@hotpy.org> wrote: > Hi Everyone, > > I am proposing a new PEP, still in draft form, to impose a limit of one > million on various aspects of Python programs, such as the lines of code > per module. > > Any thoughts or feedback? > > The PEP: > https://github.com/markshannon/peps/blob/one-million/pep-1000000.rst > > Cheers, > Mark. > > > Full text > ********* > > PEP: 1000000 > Title: The one million limit > Author: Mark Shannon <m...@hotpy.org> > Status: Active > Type: Enhancement > Content-Type: text/x-rst > Created: 03-Dec-2019 > Post-History: > > > > Abstract > ======== > This PR proposes a limit of one million (1 000 000) for various aspects > of Python code and its implementation. > > The Python language does not specify limits for many of its features. > Not having any limit to these values seems to enhance programmer freedom, > at least superficially, but in practice the CPython VM and other Python > virtual > machines have implicit limits or are forced to assume that the limits are > astronomical, which is expensive. > > This PR lists a number of features which are to have a limit of one > million. > If a language feature is not listed but appears unlimited and must be > finite, > for physical reasons if no other, then a limit of one million should be > assumed. > > > Motivation > ========== > > There are many values that need to be represented in a virtual machine. > If no limit is specified for these values, then the representation must > either be inefficient or vulnerable to overflow. > The CPython virtual machine represents values like line numbers, > stack offsets and instruction offsets by 32 bit values. This is > inefficient, and potentially unsafe. > > It is inefficient as actual values rarely need more than a dozen or so > bits to represent them. > > It is unsafe as malicious or poorly generated code could cause values to > exceed 2\ :sup:`32`. > > For example, line numbers are represented by 32 bit values internally. > This is inefficient, given that modules almost never exceed a few > thousand lines. > Despite being inefficent, is is still vulnerable to overflow as > it is easy for an attacker to created a module with billions of newline > characters. > > Memory access is usually a limiting factor in the performance of modern > CPUs. > Better packing of data structures enhances locality and reduces memory > bandwith, > at a modest increase in ALU usage (for shifting and masking). > Being able to safely store important values in 20 bits would allow > memory savings > in several data structures including, but not limited to: > > * Frame objects > * Object headers > * Code objects > > There is also the potential for a more efficient instruction format, > speeding up interpreter dispatch. > > Rationale > ========= > > Imposing a limit on values such as lines of code in a module, and the > number of local variables, > has significant advantages for ease of implementation and efficiency of > virtual machines. > If the limit is sufficiently large, there is no adverse effect on users > of the language. > > By selecting a fixed but large limit for these values, > it is possible to have both safety and efficiency whilst causing no > inconvience to human programmers > and only very rare problems for code generators. > > One million > ----------- > > The Java Virtual Machine (JVM) [1]_ specifies a limit of 2\ :sup:`16`-1 > (65535) for many program > elements similar to those covered here. > This limit enables limited values to fit in 16 bits, which is a very > efficient machine representation. > However, this limit is quite easily exceeded in practice by code > generators and > the author is aware of existing Python code that already exceeds 2\ > :sup:`16` lines of code. > > A limit of one million fits into 20 bits which, although not as > convenient for machine representation, > is still reasonably compact. Three signed valuses in the range -1000_000 > to +1000_000 can fit into a 64 bit word. > A limit of one million is small enough for efficiency advantages (only > 20 bits), > but large enough not to impact users (no one has ever written a module > of one million lines). > > The value "one million" is very easy to remember. > > Isn't this "640K ought to be enough for anybody" again? > ------------------------------------------------------- > > The infamous 640K memory limit was a limit on machine usable resources. > The proposed one million limit is a limit on human generated code. > > While it is possible that generated code could exceed the limit, > it is easy for a code generator to modify its output to conform. > The author has hit the 64K limit in the JVM on at least two occasions > when generating Java code. > The workarounds were relatively straightforward and > probably wouldn't have been necessary with a limit of one million > bytecodes or lines of code. > > > Specification > ============= > > This PR proposes that the following language features and runtime values > be limited to one million. > > * The number of source code lines in a module > * The number of bytecode instructions in a code object. > * The sum of local variables and stack usage for a code object. > * The number of distinct names in a code object > * The number of constants in a code object. > * The number of classes in a running interpreter. > * The number of live coroutines in a running interpreter. > > The advantages for CPython of imposing these limits: > ---------------------------------------------------- > > Line of code in a module and code object restrictions. > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > When compiling source code to bytecode or modifying bytecode for > profiling or debugging, > an intermediate form is required. The limiting line numbers and operands > to 20 bits, > instructions can be represented in a compact 64 bit form allowing > very fast passes over the instruction sequence. > > Having 20 bit operands (21 bits for relative branches) allows instructions > to fit into 32 bits without needing additional ``EXTENDED_ARG`` > instructions. > This improves dispatch, as the operand is strictly local to the > instruction. > Using super-instructions would make that the 32 bit format > almost as compact as the 16 bit format, and significantly faster. > > Total number of classes in a running interpreter > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > This limit has to the potential to reduce the size of object headers > considerably. > > Currently objects have a two word header, for objects without references > (int, float, str, etc.) or a four word header for objects with references. > By reducing the maximum number of classes, the space for the class > reference > can be reduced from 64 bits to fewer than 32 bits allowing a much more > compact header. > > For example, a super-compact header format might look like this: > > .. code-block:: > > struct header { > uint32_t gc_flags:6; /* Needs finalisation, might be part of a > cycle, etc. */ > uint32_t class_id:26; /* Can be efficiently mapped to address > by ensuring suitable alignment of classes */ > uint32_t refcount; /* Limited memory or saturating */ > } > > This format would reduce the size of a Python object without slots, on a > 64 bit machine, from 40 to 16 bytes. > > Note that there are two ways to use a 32 bit refcount on a 64 bit machine. > One is to limit each sub-interpreter to 32Gb of memory. > The other is to use a saturating reference count, which would be a > little bit slower, but allow unlimited memory allocation. > > > Backwards Compatibility > ======================= > > It is hypothetically possible that some machine generated code exceeds > one or more of the above limits. > The author believes that to be highly unlikely and easily fixed by > modifying the output stage of the code generator. > > > Security Implications > ===================== > > Minimal. This reduces the attack surface of any Python virtual machine > by a small amount. > > Reference Implementation > ======================== > > None, as yet. This will be implemented in CPython, once the PEP has been > accepted. > > > Rejected Ideas > ============== > > None, as yet. > > > Open Issues > =========== > > None, as yet. > > > References > ========== > > .. [1] The Java Virtual Machine specification > > https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf > > > > Copyright > ========= > > This document is placed in the public domain or under the > CC0-1.0-Universal license, whichever is more permissive. > > > > .. > Local Variables: > mode: indented-text > indent-tabs-mode: nil > sentence-end-double-space: t > fill-column: 70 > coding: utf-8 > End: > _______________________________________________ > Python-Dev mailing list -- python-dev@python.org > To unsubscribe send an email to python-dev-le...@python.org > https://mail.python.org/mailman3/lists/python-dev.python.org/ > Message archived at > https://mail.python.org/archives/list/python-dev@python.org/message/QM4QUJOBQORN5WJ2WZ4RVSHSQG52VKCQ/ > Code of Conduct: http://python.org/psf/codeofconduct/ >
_______________________________________________ Python-Dev mailing list -- python-dev@python.org To unsubscribe send an email to python-dev-le...@python.org https://mail.python.org/mailman3/lists/python-dev.python.org/ Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/6JHMUZDE5D4PJP5GM3DVABIFKP46JVNE/ Code of Conduct: http://python.org/psf/codeofconduct/