It's time to discuss Argument Clinic again. I think the implementation is ready for public scrutiny. (It was actually ready a week ago, but I lost a couple of days to "make distclean" corrupting my hg data store--yes, I hadn't pushed my local clinic branch in a while. Eventually I gave up on repairing it and just brute-forced it. Anyway...)

My Clinic test branch is here:

    https://bitbucket.org/larry/python-clinic/

And before you ask: no, the above branch should never, ever be merged back into trunk. We'll start clean once Clinic is ready for merging and do a nice neat job.

___________________________________________________________________

There's no documentation apart from the PEP. But you can see plenty of test cases of using Clinic; just grep for the string "clinic" in */*.c. For reference, here's the list:

    Modules/_cursesmodule.c
    Modules/_datetimemodule.c
    Modules/_dbmmodule.c
    Modules/posixmodule.c
    Modules/unicodedata.c
    Modules/_weakref.c
    Modules/zlibmodule.c
    Objects/dictobject.c
    Objects/unicodeobject.c

I haven't reimplemented every PyArg_ParseTuple "format unit" in the retooled Clinic, so it's not ready to try with every single builtin yet.

The syntax is as Guido dictated it during our meeting after the Language Summit at PyCon US 2013. The implementation has been retooled, several times, and is now both nicer and more easily extensible. The internals are still a little messy, but the external interfaces are all ready for critique.

___________________________________________________________________

Here are the external interfaces as I foresee them. If you add your own data types, you'll subclass "Converter" and maybe "ReturnConverter". Take a look at the existing subclasses to get a feel for what that's like. If you implemented your own DSL, you'd make something that quacked like "PythonParser" (implementing __init__ and parse methods), and you'd deal with "Block", "Module", "Class", "Function", and "Parameter" objects a lot.

What do you think?
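For a rough idea of what "quacking like PythonParser" could mean, here's a toy sketch. The __init__/parse protocol is as described above, but the Block stand-in and the one-name-per-line DSL below are invented purely for illustration; the real Clinic objects carry much more state.

```python
class Block:
    """Stand-in for Clinic's Block: raw DSL input plus generated output."""
    def __init__(self, input):
        self.input = input
        self.output = None

class NameListParser:
    """Toy alternative DSL: one dotted 'module.function' name per line."""
    def __init__(self, clinic):
        # The real parser keeps a back-reference to the Clinic instance too.
        self.clinic = clinic

    def parse(self, block):
        names = [line.strip() for line in block.input.splitlines()
                 if line.strip()]
        # Emit one C comment per declared function, standing in for the
        # real code generation.
        block.output = "\n".join("/* declared: %s */" % name
                                 for name in names)

block = Block("os.stat\nos.access\n")
NameListParser(clinic=None).parse(block)
print(block.output)
```

Anything shaped like this--constructed with the clinic instance, handed a block whose output it fills in--should be able to slot in as an alternative DSL front end.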
___________________________________________________________________

What follows are six questions I'd like to put to the community, ranked, oddly enough, in order of how little to how much I care about the answer. BTW, by convention, every time I need an arbitrary sample function I use "os.stat". (Please quote the question line in your responses; otherwise I fear we'll get lost in the sea of text.)

___________________________________________________________________

Question 0: How should we integrate Clinic into the build process?

Clinic presents a catch-22: you want it as part of the build process, but it needs Python to be built before it'll run. Currently it requires Python 3.3 or newer; it might work in 3.2, but I've never tried it. We can't depend on Python 3 being available when we build, which complicates the build process somewhat. I imagine it's a solvable problem on UNIX... with the right wizardry. I have no idea how one would approach it on Windows, but obviously we need to solve the problem there too.

___________________________________________________________________

Question 1: Which C function nomenclature?

Argument Clinic generates two function prototypes per Python function: one with one of the traditional signatures for builtins, whose code is generated completely by Clinic, and one with a custom-generated signature for just that call, whose code is written by the user. Currently the former doesn't have any specific name, though I have been thinking of it as the "parse" function. The latter is definitely called the "impl" (pronounced IM-pull), short for "implementation".

When Clinic generates the C code, it derives the C functions' names from the name of the Python function, with underscores in place of dots. Currently the "parse" function gets the base name ("os_stat"), and the "impl" function gets "_impl" appended ("os_stat_impl"). Argument Clinic itself is agnostic about the names of these functions.
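The current naming scheme can be sketched in a couple of lines. The helper below is hypothetical, not actual Clinic code; it just makes the dots-to-underscores convention concrete.

```python
def c_names(python_name):
    """Derive the two C function names from a dotted Python name,
    following the current scheme: the "parse" function gets the base
    name, and the "impl" function gets "_impl" appended."""
    base = python_name.replace('.', '_')
    return base, base + "_impl"

print(c_names("os.stat"))   # ('os_stat', 'os_stat_impl')
```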
It's possible it'd be nicer to name these the other way around: "os_stat_parse" for the parse function and "os_stat" for the impl. Anyone have a strong opinion one way or the other? I don't much care; all I can say is that the "obvious" way to do it when I started was to add "_impl" to the impl, as it was the new creature under the sun.

___________________________________________________________________

Question 2: Emit code for modules and classes?

Argument Clinic now understands the structure of the modules and classes it works with. You declare them like so:

    module os
    class os.ImaginaryClassHere
    def os.ImaginaryClassHere.stat(...): ...

Currently it does very little with this information; right now it mainly just gets baked into the documentation. In the future I expect it to be used in the introspection metadata, and it'll definitely be relevant to external consumers of the Argument Clinic information (IDEs building per-release databases, other implementations building metadata for library interface conformance testing).

Another way we could use this metadata: have Argument Clinic generate more of the boilerplate for a class or module. For example, it could kick out all the PyMethodDef structures for the class or module. If we grew Argument Clinic some, and taught it about the data members of classes and modules, it could also generate the PyModuleDef and PyTypeObject structures, and even generate a function that initializes them at runtime for you. (Though that does seem like mission creep to me.)

There are some complications to this, one of which I'll discuss next. But I put it to you, gentle reader: how much boilerplate should Argument Clinic undertake to generate, and how much more class and module metadata should be wired into it?

___________________________________________________________________

Question 3: #ifdef support for functions?
Truth be told, I did experiment with having Argument Clinic generate more of the boilerplate associated with modules. Clinic already generates a macro per function defining that function's PyMethodDef structure, for example:

    #define OS_STAT_METHODDEF \
        {"stat", (PyCFunction)os_stat, \
         METH_VARARGS|METH_KEYWORDS, os_stat__doc__}

For a while I had it generating the PyMethodDef structures too, like so:

    /*[clinic]
    generate_method_defs os
    [clinic]*/
    #define OS_METHODDEFS \
        OS_STAT_METHODDEF, \
        OS_ACCESS_METHODDEF, \
        OS_TTYNAME_METHODDEF, \

    static PyMethodDef os_methods[] = {
        OS_METHODDEFS
        /* existing methoddefs here... */
        NULL
    }

But I ran into trouble with os.ttyname(), which is only created and exposed if the platform defines HAVE_TTYNAME. Initially I'd just thrown all the Clinic stuff relevant to os.ttyname inside the #ifdef block. But Clinic pays no attention to #ifdef statements--so it would still add OS_TTYNAME_METHODDEF, to OS_METHODDEFS. And kablooey!

Right now I've backed out of this--I had enough to do without getting off into extra credit like this. But I'd like to return to it. It just seems natural to have Clinic generate this nasty boilerplate.

Four approaches suggest themselves to me, listed below in order of least- to most-preferable in my opinion:

0) Don't have Clinic participate in populating the PyMethodDefs.

1) Teach Clinic to understand simple C preprocessor statements, just enough that it implicitly understands that os.ttyname was defined inside an #ifdef HAVE_TTYNAME block. It would then intelligently generate the code to take this into account.

2) Explicitly tell Clinic that os.ttyname must have HAVE_TTYNAME defined in order to be active. Clinic then generates the code intelligently taking this into account, handwave handwave.
3) Change the per-function methoddef macro to include the trailing comma:

    #define OS_STAT_METHODDEF \
        {"stat", (PyCFunction)os_stat, \
         METH_VARARGS|METH_KEYWORDS, os_stat__doc__},

and suppress it in the macro Clinic generates:

    /*[clinic]
    generate_method_defs os
    [clinic]*/
    #define OS_METHODDEFS \
        OS_STAT_METHODDEF \
        OS_ACCESS_METHODDEF \
        OS_TTYNAME_METHODDEF

And then the code surrounding os.ttyname can look like this:

    #ifdef HAVE_TTYNAME
    // ... real os.ttyname stuff here
    #else
    #define OS_TTYNAME_METHODDEF
    #endif

so that on platforms without HAVE_TTYNAME the macro simply expands to nothing. And I think that would work great, actually. But I haven't tried it.

Do you agree that Argument Clinic should generate this information, and that it should use the approach in 3)?

___________________________________________________________________

Question 4: Return converters returning success/failure?

With the addition of the "return converter", we have the lovely feature of being able to *return* a C type and have it converted back into a Python type. Your C extensions have never been more readable!

The problem is that the PyObject * returned by a C builtin function serves two simultaneous purposes: it contains the return value on success, and it is NULL if the function threw an exception. We can probably still do that for all pointer-y return types (I'm not sure; I haven't played with it yet). But if the impl function now returns "int", or some other decidedly non-pointer-y type, there's no longer a magic return value we can use to indicate "we threw an exception".

This isn't the end of the world; I can detect that the impl threw an exception by calling PyErr_Occurred(). But I've been chided before for calling this unnecessarily: it's ever-so-slightly expensive, in that it has to dereference TLS, and does so with an atomic operation. Not to mention that it's a function call! The impl should know whether or not it failed, so it's the interface we're defining that forces it to throw away that information.
If we provided a way for it to return that information, we could shave off some cycles. The problem is, how do we do that in a way that doesn't suck? Four approaches suggest themselves to me, and sadly I think they all suck to one degree or another. In order of sucking least to most:

0) Return the real type and detect the exception with PyErr_Occurred(). This is by far the loveliest option, but it incurs runtime overhead.

1) Have the impl take an extra parameter, "int *failed". If the function fails, it sets that to a true value and returns whatever.

2) Have the impl return its calculated return value through an extra pointer-y parameter ("int *return_value"), and have its actual return value be an int indicating success or failure.

3) Have the impl return a structure containing both the real return value and a success/failure integer. Its return statements would then look like this:

    return {-1, 0};

or maybe

    return {-3, PY_HORRIBLE_CLINIC_INTERFACE__SUCCESS};

Can we live with PyErr_Occurred() here?

___________________________________________________________________

Question 5: Keep too-magical class decorator Converter.wrap?

Converter is the base class for converter objects, the objects that handle the details of converting a Python object into its C equivalent. The signature for Converter.__init__ has become complicated:

    def __init__(self, name, function, default=unspecified,
                 *, doc_default=None, required=False)

"name" is the name of the function ("stat"). "function" is an object representing the function for which this Converter is handling an argument (duck-type compatible with inspect.Signature). "default" is the default (Python) value, if any. "doc_default" is a string that overrides repr(default) in the documentation, handy if repr(default) is too ugly or you just want to mislead the user. "required", if true, specifies that the parameter should be considered required, even if it has a default value.
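To make the doc_default and required knobs concrete, here's a sketch of how a documentation renderer might interpret them, per the descriptions above. The render_param helper and its exact output format are invented for illustration; only the semantics (doc_default overrides repr(default); required hides the default) come from the text.

```python
unspecified = object()   # stand-in for Clinic's "no default" sentinel

def render_param(name, default=unspecified, *,
                 doc_default=None, required=False):
    """Render one parameter for a docstring signature (hypothetical)."""
    if default is unspecified or required:
        # No default, or explicitly marked required: show the bare name.
        return name
    # doc_default, when given, replaces repr(default) in the docs.
    shown = doc_default if doc_default is not None else repr(default)
    return "%s=%s" % (name, shown)

print(render_param("path"))                                  # path
print(render_param("dir_fd", None, doc_default="None"))      # dir_fd=None
print(render_param("follow_symlinks", True, required=True))  # follow_symlinks
```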
Complicating the matter further, converter subclasses may take extra (keyword-only, optional) parameters to configure exotic custom behavior. For example, the "Py_buffer" converter takes "zeroes" and "nullable"; the "path_t" converter implemented in posixmodule.c takes "allow_fd" and "nullable". This means that converter subclasses have to define a laborious __init__, including three parameters with defaults, then turn right around and pass most of those parameters back into super().__init__.

This interface has changed several times during the development of Clinic, and I got tired of fixing up all my existing prototypes and super calls. So I made a class decorator that did it for me. Shield your eyes from the sulfurous dark wizardry of Converter.wrap:

    @staticmethod
    def wrap(cls):
        class WrappedConverter(cls, Converter):
            def __init__(self, name, function, default=unspecified,
                         *, doc_default=None, required=False, **kwargs):
                super(cls, self).__init__(name, function, default,
                    doc_default=doc_default, required=required)
                cls.__init__(self, **kwargs)
        return functools.update_wrapper(WrappedConverter, cls, updated=())

When you decorate your class with Converter.wrap, you only define your custom arguments in your __init__. All the arguments Converter.__init__ cares about are taken care of for you (aka hidden from you). As an example, here are the relevant bits of path_t_converter from posixmodule.c:

    @Converter.wrap
    class path_t_converter(Converter):
        def __init__(self, *, allow_fd=False, nullable=False):
            ...

So on the one hand, I admit it's smelly. On the other hand, it hides a lot of stuff that the user needn't care about, and it makes the code simpler and easier to read. And it means we can change the required arguments for Converter.__init__ without breaking any code (as I have already happily done once or twice). I'd like to keep it in, and anoint it as the preferred way of declaring Converter subclasses. Anybody else have a strong opinion on this either way?
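For anyone who wants to poke at the mechanics, here's a self-contained reduction of the wrap trick that runs outside Clinic. The Converter base class is a minimal stand-in, and "my_converter" with its "allow_fd" option is invented for illustration; the wrap staticmethod itself follows the code shown above.

```python
import functools

unspecified = object()   # stand-in for Clinic's "no default" sentinel

class Converter:
    """Minimal stand-in for Clinic's Converter base class."""
    def __init__(self, name, function, default=unspecified, *,
                 doc_default=None, required=False):
        self.name = name
        self.function = function
        self.default = default
        self.doc_default = doc_default
        self.required = required

    @staticmethod
    def wrap(cls):
        class WrappedConverter(cls, Converter):
            def __init__(self, name, function, default=unspecified, *,
                         doc_default=None, required=False, **kwargs):
                # Handle everything Converter.__init__ cares about...
                super(cls, self).__init__(name, function, default,
                    doc_default=doc_default, required=required)
                # ...then hand only the leftover custom kwargs to the
                # subclass's own __init__.
                cls.__init__(self, **kwargs)
        return functools.update_wrapper(WrappedConverter, cls, updated=())

@Converter.wrap
class my_converter(Converter):
    # Only the custom argument appears here; the base-class plumbing
    # is handled by the wrapper.
    def __init__(self, *, allow_fd=False):
        self.allow_fd = allow_fd

c = my_converter("stat", None, allow_fd=True)
print(c.name, c.allow_fd)   # stat True
```

Note that the wrapped class's constructor still accepts the full Converter signature; the subclass simply never has to restate it, which is exactly what makes the interface change-proof.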
(I don't currently have an equivalent mechanism for return converters--their interface is a lot simpler, and I just haven't needed it so far.)

___________________________________________________________________

Well! That's quite enough for now.


//arry/
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com