On 28/03/2019 23.35, Steve Dower wrote:
> Hi all
>
> Time is short, but I'm hoping to get PEP 578 (formerly PEP 551) into
> Python 3.8. Here's the current text for review and comment before I
> submit to the Steering Council.
>
> The formatted text is at https://www.python.org/dev/peps/pep-0578/
> (update just pushed, so give it an hour or so, but it's fundamentally
> the same as what's there)
>
> No Discourse post, because we don't have a python-dev equivalent there
> yet, so please reply here for this one.
>
> Implementation is at https://github.com/zooba/cpython/tree/pep-578/ and
> my backport to 3.7 (https://github.com/zooba/cpython/tree/pep-578-3.7/)
> is already getting some real use (though this will not be added to 3.7,
> unless people *really* want it, so the backport is just for reference).
>
> Cheers,
> Steve
>
> =====
>
> PEP: 578
> Title: Python Runtime Audit Hooks
> Version: $Revision$
> Last-Modified: $Date$
> Author: Steve Dower <steve.do...@python.org>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 16-Jun-2018
> Python-Version: 3.8
> Post-History:
>
> Abstract
> ========
>
> This PEP describes additions to the Python API and specific behaviors
> for the CPython implementation that make actions taken by the Python
> runtime visible to auditing tools. Visibility into these actions
> provides opportunities for test frameworks, logging frameworks, and
> security tools to monitor and optionally limit actions taken by the
> runtime.
>
> This PEP proposes adding two APIs to provide insights into a running
> Python application: one for arbitrary events, and another specific to
> the module import system. The APIs are intended to be available in all
> Python implementations, though the specific messages and values used
> are unspecified here to allow implementations the freedom to determine
> how best to provide information to their users. Some examples likely
> to be used in CPython are provided for explanatory purposes.
>
> See PEP 551 for discussion and recommendations on enhancing the
> security of a Python runtime making use of these auditing APIs.
>
> Background
> ==========
>
> Python provides access to a wide range of low-level functionality on
> many common operating systems. While this is incredibly useful for
> "write-once, run-anywhere" scripting, it also makes monitoring of
> software written in Python difficult. Because Python uses native system
> APIs directly, existing monitoring tools either suffer from limited
> context or auditing bypass.
>
> Limited context occurs when system monitoring can report that an
> action occurred, but cannot explain the sequence of events leading to
> it. For example, network monitoring at the OS level may be able to
> report "listening started on port 5678", but may not be able to
> provide the process ID, command line, parent process, or the local
> state in the program at the point that triggered the action. Firewall
> controls to prevent such an action are similarly limited, typically
> to process names or some global state such as the current user, and
> in any case rarely provide a useful log file correlated with other
> application messages.
>
> Auditing bypass can occur when the typical system tool used for an
> action would ordinarily report its use, but accessing the same APIs via
> Python does not trigger this. For example, invoking "curl" to make HTTP
> requests may be specifically monitored in an audited system, but
> Python's "urlretrieve" function is not.
>
> Within a long-running Python application, particularly one that
> processes user-provided information such as a web app, there is a risk
> of unexpected behavior. This may be due to bugs in the code, or
> deliberately induced by a malicious user. In both cases, normal
> application logging may be bypassed, resulting in no indication that
> anything out of the ordinary has occurred.
>
> Additionally, and somewhat unique to Python, it is very easy to affect
> the code that is run in an application by either manipulating the
> import system's search path or placing files earlier on the path than
> intended. This is often seen when developers create a script with the
> same name as the module they intend to use - for example, a
> ``random.py`` file that attempts to import the standard library
> ``random`` module.
>
> This is not sandboxing, as this proposal does not attempt to prevent
> malicious behavior (though it enables some new options to do so).
> See the `Why Not A Sandbox`_ section below for further discussion.
>
> Overview of Changes
> ===================
>
> The aim of these changes is to enable both application developers and
> system administrators to integrate Python into their existing
> monitoring systems without dictating how those systems look or behave.
>
> We propose two API changes to enable this: an Audit Hook and a Verified
> Open Hook. Both are available from Python and native code, allowing
> applications and frameworks written in pure Python code to take
> advantage of the extra messages, while also allowing embedders or
> system administrators to deploy builds of Python where auditing is
> always enabled.
>
> Only CPython is bound to provide the native APIs as described here.
> Other implementations should provide the pure Python APIs, and
> may provide native versions as appropriate for their underlying
> runtimes. Auditing events are likewise considered implementation
> specific, but are bound by normal feature compatibility guarantees.
>
> Audit Hook
> ----------
>
> In order to observe actions taken by the runtime (on behalf of the
> caller), an API is required to raise messages from within certain
> operations. These operations are typically deep within the Python
> runtime or standard library, such as dynamic code compilation, module
> imports, DNS resolution, or use of certain modules such as ``ctypes``.
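The Python half of the audit hook API described below is available as ``sys.addaudithook()`` and ``sys.audit()`` in CPython 3.8 and later, so the described behaviour can be tried directly. What follows is a minimal sketch that assumes CPython 3.8+. The ``example.*`` event names and the ``audit_hook`` function are hypothetical and exist only for illustration. The hook records its own events and aborts one of them by raising, and that exception propagates out of ``sys.audit()`` to the caller.

```python
import sys

# Events observed by the hook; a real deployment would forward these to a log.
seen_events = []


def audit_hook(event: str, args: tuple) -> None:
    # Hypothetical hook: only record this example's own events, because
    # once installed the hook also sees every event the runtime raises.
    if event.startswith("example."):
        seen_events.append((event, args))
    # Abort one specific operation by raising; sys.audit() re-raises it.
    if event == "example.forbidden":
        raise RuntimeError(f"blocked by audit hook: {event} {args}")


sys.addaudithook(audit_hook)  # hooks cannot be removed once added

# The event name is a str and the remaining arguments arrive as a tuple.
sys.audit("example.download", "https://example.com/file")

try:
    sys.audit("example.forbidden", "/etc/passwd")
    blocked_message = None
except RuntimeError as exc:
    blocked_message = str(exc)

print(blocked_message)  # blocked by audit hook: example.forbidden ('/etc/passwd',)
```

Filtering on an event-name prefix keeps the hook cheap. That matters because an installed hook is called for every audited event for the rest of the process lifetime.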
>
> The following new C APIs allow embedders and CPython implementors to
> send and receive audit hook messages::
>
>     /* Add an auditing hook */
>     typedef int (*hook_func)(const char *event, PyObject *args,
>                              void *userData);
>     int PySys_AddAuditHook(hook_func hook, void *userData);
>
>     /* Raise an event with all auditing hooks */
>     int PySys_Audit(const char *event, PyObject *args);
>
>     /* Internal API used during Py_Finalize() - not publicly accessible */
>     void _Py_ClearAuditHooks(void);
>
> The new Python APIs for receiving and raising audit hooks are::
>
>     # Add an auditing hook
>     sys.addaudithook(hook: Callable[[str, tuple], None])
>
>     # Raise an event with all auditing hooks
>     sys.audit(event: str, *args)
>
> Hooks are added by calling ``PySys_AddAuditHook()`` from C at any time,
> including before ``Py_Initialize()``, or by calling
> ``sys.addaudithook()`` from Python code. Hooks cannot be removed or
> replaced.
>
> When events of interest are occurring, code can either call
> ``PySys_Audit()`` from C (while the GIL is held) or ``sys.audit()``. The
> string argument is the name of the event, and the tuple contains
> arguments. A given event name should have a fixed schema for arguments,
> which should be considered a public API (for each x.y version release),
> and thus should only change between feature releases with updated
> documentation.
>
> For maximum compatibility, events using the same name as an event in
> the reference interpreter CPython should make every attempt to use
> compatible arguments. Including the name or an abbreviation of the
> implementation in implementation-specific event names will also help
> prevent collisions. For example, a ``pypy.jit_invoked`` event is clearly
> distinguished from an ``ipy.jit_invoked`` event.
>
> When an event is audited, each hook is called in the order it was added
> with the event name and tuple. If any hook returns with an exception
> set, later hooks are ignored and *in general* the Python runtime should
> terminate.
> This is intentional to allow hook implementations to decide
> how to respond to any particular event. The typical responses will be to
> log the event, to abort the operation with an exception, or to immediately
> terminate the process with an operating system exit call.
>
> When an event is audited but no hooks have been set, the ``audit()``
> function should impose minimal overhead. Ideally, each argument is a
> reference to existing data rather than a value calculated just for the
> auditing call.
>
> As hooks may be Python objects, they need to be freed during
> ``Py_Finalize()``. To do this, we add an internal API
> ``_Py_ClearAuditHooks()`` that releases any Python hooks and any
> memory held. This is an internal function with no public export, and
> we recommend it raise its own audit event for all current hooks to
> ensure that unexpected calls are observed.
>
> Below in `Suggested Audit Hook Locations`_, we recommend some important
> operations that should raise audit events.
>
> Python implementations should document which operations will raise
> audit events, along with the event schema. It is intentional that
> ``sys.addaudithook(print)`` be a trivial way to display all messages.
>
> Verified Open Hook
> ------------------
>
> Most operating systems have a mechanism to distinguish between files
> that can be executed and those that cannot. For example, this may be an
> execute bit in the permissions field, a verified hash of the file
> contents to detect potential code tampering, or file system path
> restrictions. These are an important security mechanism for preventing
> execution of data or code that is not approved for a given environment.
> Currently, Python has no way to integrate with these when launching
> scripts or importing modules.
>
> The new public C API for the verified open hook is::
>
>     /* Set the handler */
>     typedef PyObject *(*hook_func)(PyObject *path, void *userData);
>     int PyImport_SetOpenForImportHook(hook_func handler, void *userData);
>
>     /* Open a file using the handler */
>     PyObject *PyImport_OpenForImport(const char *path);
>
> The new public Python API for the verified open hook is::
>
>     # Open a file using the handler
>     importlib.util.open_for_import(path: str) -> io.IOBase
>
> The ``importlib.util.open_for_import()`` function is a drop-in
> replacement for ``open(str(pathlike), 'rb')``. Its default behaviour is
> to open a file for raw, binary access. To change the behaviour, a new
> handler should be set. Handler functions only accept ``str`` arguments.
> The C API ``PyImport_OpenForImport`` function assumes UTF-8 encoding.
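The following pure-Python model illustrates what a verifying handler could do. It is a sketch under stated assumptions. ``verified_open`` and ``ALLOWED_SHA256`` are hypothetical names. The sketch does not call ``importlib.util.open_for_import()``, because that function is only proposed in this draft and handlers can only be installed from C. SHA-256 allowlisting is just one of the verification options mentioned above (a verified hash of the file contents). The handler returns the exact bytes it hashed rather than reopening the path, so the file cannot be swapped between the check and the use.

```python
import hashlib
import io
import os
import tempfile

# Hypothetical allowlist of SHA-256 digests for approved code files.
ALLOWED_SHA256: set = set()


def verified_open(path: str) -> io.BytesIO:
    """Hypothetical handler: behaves like open(path, 'rb') for approved files."""
    if not isinstance(path, str):
        # Mirrors the rule that handler functions only accept str arguments.
        raise TypeError("handler only accepts str paths")
    with open(path, "rb") as f:
        data = f.read()
    digest = hashlib.sha256(data).hexdigest()
    if digest not in ALLOWED_SHA256:
        raise PermissionError(f"{path} is not approved (sha256={digest})")
    # Return the verified bytes themselves (an io.IOBase subclass), not a
    # freshly reopened file, to avoid a time-of-check/time-of-use gap.
    return io.BytesIO(data)


with tempfile.TemporaryDirectory() as tmp:
    approved = os.path.join(tmp, "approved.py")
    with open(approved, "wb") as f:
        f.write(b"print('hello')\n")
    ALLOWED_SHA256.add(hashlib.sha256(b"print('hello')\n").hexdigest())
    source = verified_open(approved).read()

    tampered = os.path.join(tmp, "tampered.py")
    with open(tampered, "wb") as f:
        f.write(b"import os\n")
    try:
        verified_open(tampered)
        rejected = False
    except PermissionError:
        rejected = True
```

Reading the whole file up front costs memory for very large files. In exchange, it guarantees that the content checked is the content returned to the importer.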
[...]
> All import and execution functionality involving code from a file will
> be changed to use ``open_for_import()`` unconditionally. It is important
> to note that calls to ``compile()``, ``exec()`` and ``eval()`` do not go
> through this function - an audit hook that includes the code from these
> calls is the best opportunity to validate code that is read from the
> file. Given the current decoupling between import and execution in
> Python, most imported code will go through both ``open_for_import()``
> and the log hook for ``compile``, and so care should be taken to avoid
> repeating verification steps.
>
> There is no Python API provided for changing the open hook. To modify
> import behavior from Python code, use the existing functionality
> provided by ``importlib``.

I think that the import hook needs to be extended. It only works for
simple Python files or pyc files. There are at least two other important
scenarios: zipimport and shared libraries.

For example, how does the import hook work with alternative importers
like zipimport? What does the import hook 'see' for an import from a
zipfile?

Shared libraries are trickier. libc doesn't define a way to dlopen()
from a file descriptor. dlopen() takes a file name, but a file name
leaves the audit hook open to a TOCTOU attack.

Christian

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev