Excerpts from Brett Cannon's message of 2017-01-21 19:51:48 +0000: > What I'm picking up from this is (as a gross oversimplification): > > * Victor _wants_ code reviews > * Raymond thinks we _need_ code reviews > > So the common theme here regardless of whether you agree with Raymond or > Victor's approach to development is that we are not getting enough code > reviews to go around. To me that's what the systemic issue is that this > email is bringing up. > > Now I think most of us don't think the solution to the lack of reviews is > to lower our standard of what it takes to become a core developer (this > doesn't mean we shouldn't do a better job of identifying potential > candidates, just that we shouldn't give people commit privileges after a > single patch like some projects do). To me that means we need to address > why out of 79 core developers only 39 have a single commit over the past > year, 30/79 have more than 12 commits over that same time scale, 15/79 > people have more than 52 commits, and 2/79 people have over 365 commits ( > https://github.com/python/cpython/graphs/contributors?from=2016-01-22&to=2017-01-21&type=c > for > the stats). > > Some of you have said you're waiting for the GitHub migration before you > start contributing again, which I can understand (I'm going to be sending > an email with an update on that after this email to python-dev & > core-workflow). But for those that have not told me that I don't know what > it will take to get you involved again. For instance, not to pick on Andrew > but he hasn't committed anything but he obviously still cares about the > project. So what would it take to get Andrew to help review patches again > so that the next time something involving random comes through he feels > like taking a quick look? > > As I have said before, the reason I took on the GitHub migration is for us > core developers. I want our workflow to be as easy as possible so that we > can be as productive as possible. But the unspoken goal I have long-term is > to get to the point that even dormant core devs want to contribute again, > and to the point that everyone reviews a patch/month and more people > reviewing a patch/week (although I'll take a patch/year to start). I want > to get to the point that every person with commit privileges takes 30 > minutes a month to help with reviews and that the majority of folks take 30 > minutes a week to review (and please don't think this as a hard rule and if > you don't the privileges go away, view this as an aspirational goal). Even > if people who don't have time to review the kind of patches Victor is > producing which triggered this thread, reviewing documentation patches can > be done without deep knowledge of things and without taking much time. That > way people who have time to review the bigger, more difficult patches can > actually spend their time on those reviews and not worrying about patches > fixing a spelling mistake or adding a new test to raise test coverage. > > All of this is so that I hope one day we get to the point where all patches > require a review no matter who proposed the code change. Now I think we're > quite a ways of from being there, but that's my moonshot goal for our > workflow: that we have enough quality reviews coming in that we feel that > even patches from fellow core developers is worth requiring the extra code > check and disbursement of knowledge without feeling like a terrible drag on > productivity. > > Once the GitHub migration has occurred I'm planning to tackle our Misc/NEWS > problem and then automate Misc/ACKS. After that, though, I hope we can take
I would be happy to help with both of those tasks. I have experience with both within the OpenStack project. And put me on the list of "waiting for github" contributors. I should have more time freeing up in a couple of months as I change some of my responsibilities at work. I would like to help with the migration and eventually regular reviews. Doug > the time to have a hard look at what in our workflow prevents people from > making even occasional code reviews so that everyone wants to help out > again (and if any of this interests you then please subscribe to > core-workflow). > > On Fri, 20 Jan 2017 at 02:46 Victor Stinner <victor.stin...@gmail.com> > wrote: > > > Hi, > > > > Raymond Hettinger used a regression that I introduced in the builtin > > sorted() function (in Python 3.6.0) to give me his feedback on my > > FASTCALL work, but also on Argument Clinic. > > > > Context: http://bugs.python.org/issue29327#msg285848 > > > > Since the reported issues is wider than just FASTCALL, including how I > > contribute to CPython, I decided to discuss the topic with a wider > > audience. I continue the discussion on python-committers to get the > > opinion of the other core developers. > > > > Sorry for my very long answer! I tried to answer to each issues > > reported by Raymond. > > > > Inaccurate summary: I'm a strong supporter of "it's better to ask > > forgiveness than permission", whereas Raymond considers that I > > introduced too many regressions with my workflow. > > > > > > Raymond Hettinger added the comment: > > > A few random thoughts that may or may not be helpful: > > > > > > * We now have two seasoned developers and one new core developer that > > collectively are creating many non-trivial patches to core parts of Python > > at an unprecedented rate of change. The patches are coming in much faster > > than they can reasonably be reviewed and carefully considered, especially > > by devs such as myself who have very limited time available. IMO, taken as > > whole, these changes are destabilizing the language. Python is so > > successful and widely adopted that we can't afford a "shit happens" > > attitude. Perhaps that works in corners of the language, infrequently used > > modules, but it makes less sense when touching the critical paths that have > > had slow and careful evolution over 26 years. > > > > > > * Besides the volume of patches, one other reason that reviews are hard > > to come by is that they apply new APIs that I don't fully understand yet. > > There are perhaps two people on the planet who could currently give > > thoughtful, correct, and critical evaluation of all those patches. > > Everyone else is just watching them all fly by and hoping that something > > good is happening. > > > > Since one or maybe even two years, I noticed that many of my issues > > were blocked by the lack of reviews. As you wrote, only few developer > > have the knowledge and background to be able to provide a good review > > (not only "tests pass, so LGTM") on my changes modifying the Python > > core. > > > > I also wanted to discuss this topic, but I didn't really know what to > > propose. Let's take this opportunity to explain how I contribute to > > CPython, especially how I decide to wait for a review or not. > > > > For each patch that I write, I estimate the risk of regression. You > > may know that any regression is something unexpected, so such > > estimation is tricky. Here is my heuristic: > > > > (*) if the patch is trivial (short, non controversal), I push it > > immediatly. > > > > > > (*) If I'm less confident, I open an issue and attach the patch. I > > wait at least one day before pushing. > > > > It's strange, but the process of opening an issue and attaching the > > patch usually helps to review the code myself (find bugs, or more > > generally enhance the patch). Maybe because it forces me to review the > > change one more time? > > > > If the change is not part of a larger patch serie, so doesn't block me > > to move further, I try to keep the issue open around one week. > > > > The truth is that too few of my patches get a review :-/ Maybe I > > should wait longer, but then it becomes harder for me to handle many > > patches. > > > > Maybe it's a tooling issues. Recently, I started to use local branches > > in a Git repository. It helps a lot of work on parallel on large > > changes. Before, I only worked in a single directory (default/, the > > default Mercurial branch) and applied/reverted patches everytime. It's > > painful, especially when I have to use multiple computers, download > > again publshed patches, etc. Maybe it will run smoother once CPython > > will move to Git and GitHub. > > > > By the way, it's painful to squash a long patch serie into a giant > > patch, much harder to review, where changes don't make sense at all at > > the first look. Again, a better reviewing tool supporting patch series > > (GitHub) will help here too. > > > > Not supporting patch series in our reviewing tool also explains why I > > prefer to push than having to wait for a review. Rebasing manually > > long patch series stored as giant .patch files is complicated. > > > > > > (*) If the change changes an API or changes a core component, I wait > > for at least one review from a core reviewer. Sometimes, I even send > > an email to python-dev. Again, sometimes I don't get any feedback on > > the patch nor the email after two weeks :-/ At least, I tried :-) > > Usually, I get feedback in less than one week, or no feedback at all. > > I understand that nobody understands my change or nobody cares :-) > > > > I totally understand that most core developers have a little amount of > > time available to contribute to Python. I'm trying to find a > > compromise between the risk of introducing regressions and being stuck > > in my work. This email might help me to adjust my workflow. > > > > By the way, I'm trying to always run the full test suite (./python -m > > test -rW -j0) before pushing any change. If I suspect that I may have > > introduced reference leaks, I also run "./python -m test -R 3:3 ..." > > on the tests related to the modified code to check for > > memory/reference leaks. > > > > > > > * One other reason for the lack of review comments in the enthusiasm and > > fervor surrounding the patches. I feel like there is a cost of questioning > > whether the patches should be done or how they are done, like I am burning > > little karma every time. Sometimes it feels safest and most cordial to > > just say nothing and let you make hundreds of semi-reviewed changes to just > > about every critical part of the language. > > > > "semi-reviewed". Let me be more accurate: yeah, I do push a lot of > > changes which were not reviewed by anyone (see above). > > > > > > > * Historically, if there was creator or maintainer of the code who was > > still active, that person would always be consulted and have a final say on > > whether a change should be applied. Now, we have code constantly being > > changed without consulting the original author (for example, the recent and > > catastrophic random initialization bug was due to application of a patch > > without consulting the author of _randommodule.c and the maintainer of > > random.py, or this change to sorted(), or the changes to decimal, etc). > > > > What do you mean by "author"? As you wrote, Python is now 26 years > > old, so it had a very long history, and each file has a very long list > > of "authors". I guess that you mean more a "maintainer". > > > > My problem is that I'm not aware of any explicit list of maintainers. > > I didn't know that you were the maintainer of the random module before > > you told me that at the Facebook sprint last september. I didn't > > expect that the random module had a maintainer, I thought that any > > core developer would be allowed to modify the code. > > > > Moreover, since I open an issue for most of my changes, it gives an > > opportunity to maintainers to review changes. Maybe we need more > > components in the bug tracker to notify maintainers of pending > > changes? > > > > > > You mentionned 3 different changes, let me reply. > > > > > > (1) The random change: http://bugs.python.org/issue29085 > > > > I introduced a regression in random.Random.seed(): a typo in the C > > code has the consequence that the current time and process identifier > > is used, instead of os.urandom(16), to initialize the Mersenne Twister > > RNG. > > > > IMHO the regression is not "catastrophic". Only few developers > > instanciate random.Random themself, random.Random must not be used for > > security, etc. I let others decide if this bug was catastrophic or > > not. > > > > > > Since we are talking about the development process, let me see how the > > change was made. > > > > Context: The PEP 524 has a long and painful history... Something like > > more than 500 messages were sent on the bug tracker and python-dev, > > and nobody was listening to each others, two security experts > > "rage-quitted" Python because of this mess... I decided to try to fix > > this issue in a constructive way, so I wrote a PEP. Nick wrote a > > different PEP, since it was clear that it was possible to handle > > security in two different incompatible ways. A mailing list was even > > created just to discuss this bug! A mailing list just for a bug gives > > an idea of the size of the mess :-) > > > > Well, about the change itself, it was done in > > http://bugs.python.org/issue27776 > > > > The patch was available for review during 19 days > > (2016-08-18-2016-09-06) and was reviewed by Nick Coghlan. Since Nick > > wrote a similar PEP, I trusted him to be able to review my change. > > (Well, anyway I already trust all core developers, but I mean that I > > was trusting him even more than usual :-)) > > > > Since the change has a big impact on security, I had prefer to get a > > review of more developers, especially our security experts... but as I > > wrote, two security experts "rage- quitted". Again, this PEP has a > > long and sad story :-/ > > > > Note: you say that you are the maintainer of the random module, but I > > don't recall having see you in any recent discussions and issues > > related to os.urandom(), whereas a lot of enhancements and changes > > were done last 2 years. I made many changes to support new OS > > functions like getentropy() an getrandom(). > > > > > > Oooookay, let's see the second change, "this change to sorted()", > > http://bugs.python.org/issue29327 > > > > (2) I introduced a bug in sorted(), last August: > > https://hg.python.org/cpython/rev/15eab21bf934/ > > > > Calling sorted(iterable=[]) does crash. To be honest, I didn't imagine > > that anyone would pass the iterable by keyword, but Serhiy is very > > good to spot bugs in corner cases :-) > > > > IMHO the regression is subtle. > > > > When I optimized the code to use FASTCALL, I replaced > > PyTuple_GetSlice(args, 1, argc) with &PyTuple_GET_ITEM(args, 1). I > > checked that all tests passed, so it looks ok to me. > > > > I didn't imagine that anyone would call sorted(iterable=[]), so I > > didn't notice that PyTuple_GetSlice() can create an empty tuple. > > > > The previous code was wrong since sorted() accepted iterable as a > > keyword, whereas sort.list() doesn't. > > > > So well, I let you guess if a review would have spot this bug in the > > large change. > > > > > > (3) Recently, I ran sed to replace code patterns to use faster ways to > > call functions: > > https://hg.python.org/cpython/rev/54a89144ee1d > > > > "Replace PyObject_CallObject(callable, NULL) with > > _PyObject_CallNoArg(callable)" > > > > I recalled that I modified the _decimal module and that Stefan Krah > > complained, because he wants to have the same code base on Python 3.5, > > 3.6 and 3.7. He also mentionned an external test suite which was > > broken by recent _decimal changes (not sure if my specific change was > > in cause or not), but I wasn't aware of it. > > > > To be honest, I didn't even notice that I modified _decimal when I ran > > sed on all .c files. Since the change was straightforward and (IMHO) > > made the code more readable, I didn't even wait for a review if I > > recall correctly. > > > > Stefan and me handled this issue privately (he reverted my change), > > I'm not sure that it's worth it to say more about this "issue" (or > > even "non-issue"). > > > > To be clear, I don't consider that my change introduced a regression. > > > > > > > * In general, Guido has been opposed to sweeping changes across the code > > base for only tiny benefits. Of late, that rule seems to have been lost. > > > > > > * The benefits of FASTCALL mainly apply to fine grained functions which > > only do a little work and tend to be called frequently in loops. For > > functions such as sorted(), the calling overhead is dominated by the cost > > of actually doing the sort. For sorted(), FASTCALL is truly irrelevant and > > likely wasn't worth the complexity, or the actual bug, or any of the time > > we've now put in it. There was no actual problem being solved, just a > > desire to broadly apply new optimizations. > > > > Ok, first, you qualify my FASTCALL changes as code churn. So let me > > show an example with sorted(): > > https://hg.python.org/cpython/rev/b34d2ef5c412 > > > > Can you elaborate how such change increases the complexity? > > > > > > Second, "no actual problem being solved" > > > > Since the goal of FASTCALL is to optimize Python, I guess that you > > consider that the speedup doesn't justify the change. I gave numbers > > in the issue #29327: > > > > Microbenchmark on sorted() on Python 3.7 compared to 3.5 (before FASTCALL): > > --- > > haypo@smithers$ ./python -m perf timeit 'seq=list(range(10))' > > 'sorted(seq)' --compare-to=../3.5/python -v > > Median +- std dev: [3.5] 1.07 us +- 0.06 us -> [3.7] 958 ns +- 15 ns: > > 1.12x faster (-11%) > > > > haypo@smithers$ ./python -m perf timeit 'seq=list(range(10)); k=lambda > > x:x' 'sorted(seq, key=k)' --compare-to=../3.5/python -v > > Median +- std dev: [3.5] 3.34 us +- 0.07 us -> [3.7] 2.66 us +- 0.05 > > us: 1.26x faster (-21%) > > --- > > > > IMHO such speedup is significant even on a microbenchmark. Can you > > elaborate what are your criteria to decide if an optimization is worth > > it? > > > > > > > > > * Historically, we've relied on core developers showing restraint. Not > > every idea that pops into their head is immediately turned into a patch > > accompanied by pressure to apply it. Devs tended to restrict themselves to > > parts of the code they knew best through long and careful study rather > > sweeping through modules and altering other people's carefully crafted code. > > > > Should I understand that I should restrict myself to some files? Or > > not touch some specific parts of Python, like... "your" code like > > random, itertools and collections modules? > > > > I replied to the 3 issues you mentioned previously and explained how I > > contribute to Python. > > > > > > > * FWIW, I applaud your efforts to reduce call overhead -- that has long > > been a sore spot for the language. > > > > > > * Guido has long opposed optimizations that increase risk of bugs, > > introduce complexity, or that affect long-term maintainability. In some > > places, it looks like FASTCALL is increasing the complexity (replacing > > something simple and well-understood with a wordier, more intricate API > > that I don't yet fully understand and will affect my ability to maintain > > the surrounding code). > > > > I'm sorry, I didn't spent much time on explaing the FASTCALL design > > nor documenting my changes. It's partially deliberate to make > > everything related to FASTCALL private. Since it's a huge project > > modifying a lot of code, I wanted to wait until the APIs and the code > > stop moving too fast to take time to explain my work and document it. > > > > If you have specific questions, please go ahead. > > > > > > Shortest summary: > > > > * FASTCALL replaces (args: tuple, kwargs: optional dict) with (args: C > > array, nargs: int, kwnames: tuple of keyword keys). It's a new calling > > convention which allows to avoid a temporary tuple to pass positional > > arguments and avoids temporary dictionary to pass keyworkd arguments. > > > > * To use FASTCALL, C functions should be converted to the new > > METH_FASTCALL calling convention > > > > * PyObject_Call() can replaced with _PyObject_FastCallKeywords() or > > _PyObject_FastCallDict() (when we still get kwargs as a dict) in such > > conversion > > > > * Many existing C functions were optimized internally to use FASCALL, > > so even if you don't modify your code, you will benefit of it > > (speedup). Typical example: PyFunction_CallFunctionObjArgs(). > > > > > > The most massive change were purely internal and don't affect the most > > famous C APIs at all. In some cases, to fully benefit of FASTCALL, > > code should be modified. I'm trying to restrict such changes to Python > > internals, especially the most used functions. > > > > I expected that the required changes were straightforward enough, it > > looks like I was wrong, but I don't recall anyone, before you > > recently, asking for an explanation. > > > > > > > > > * It was no long ago that you fought tooth-and-nail against a single > > line patch optimization I submitted. The code was clearly correct and had > > a simple disassembly to prove its benefit. Your opposition was based on > > "it increases the complexity of the code, introduces a maintenance cost, > > and increases the risk of bugs". In the end, your opposition killed the > > patch. But now, the AC and FASTCALL patches don't seem to mind any of > > these considerations. > > > > Context: http://bugs.python.org/issue26201 > > > > It seems like we need more _explicit_ rules to decide if an > > optimization is worth it or not. For me, the de facto standard request > > for an optimization is to prove it with a benchmark. I requested a > > benchmark, but you refused to provide it. > > > > So I ran my own benchmark and saw that your change made the modified > > code (PyList_Append()) 6% slower. I'm not sure that my bencmark was > > correct, but it was a first step to take a decision. > > > > > > To come back to FASTCALL, your point is that it doesn't provide any > > speedup. > > > > In most FASTCALL issues that I opened, I provide a script to reproduce > > my benchmark and the benchmark results. The speedup is usually betwen > > 10% and 20% faster. > > > > Should I understand that 6% slower is ok, whereas 10-20% faster is not > > good? Can you please elaborate? > > > > > > > * AC is supposed to be a CPython-only concept. But along the way APIs > > are being changed without discussion. I don't mind that sorted() now > > exposes *iterable* as a keyword argument, but it was originally left out on > > purpose (Tim opined that code would look worse with iterable as a keyword > > argument). That decision was reversed unilaterally without consulting the > > author and without a test. Also as AC is being applied, the variable names > > are being changed. I never really liked the "mp" that used in dicts and > > prefer the use of "self" better, but it is a gratuitous change that > > unilaterally reverses the decisions of the authors and makes the code not > > match any of the surrounding code that uses the prior conventions. > > > > Ah, at least I concur with you on one point :-) Changes to convert > > functions to AC must not change the API (type of arguments: positional > > only/keyword/..., default values, etc.) nor provide a worse docstring. > > > > There is an active on-going work to enhance AC to fix issues that you > > reported, like the default value of positional-only parameters which > > should not be rendered in the function signature (I created the issue > > #29299 with a patch). Serhiy is also working on implementing the last > > major missing feature of AC: support *args and **kwargs parameters > > (issue #20291). > > > > FYI I wasn't involved in AC changes, I only started to look at AC > > recently (1 or 2 months ago). Again, I agree that these changes should > > be carefully reviewed, which is an hard task since required changes > > are usually large and move a lot of code. We need more eyes to look at > > these changes! > > > > For the specific case of sorted(), the name of first parameter is > > already documented in the docstring and documentation in Python 2.7: > > "iterable". So I guess that you mean that it is now possible to use it > > as a keyword argument. Well, see the issue #29327 for the long story. > > This issue is a regression, it was already fixed, and I didn't > > introduce the API change. > > > > > > Oh by the way, when I read your comment, I understand that I'm > > responsible of all regressions. It's true that I introduced > > regressions, that's where I said "shit happens" (or more politically > > correct: "it's better to ask forgiveness than permission" ;-)). Since > > I'm one of the most active contributor in CPython, I'm not surprised > > of being the one who introduce many (most?) regressions :-) I'm trying > > to review my changes multiple times, test corner cases, etc. But I'm > > not perfect. > > > > Sadly, to show its full power, FASTCALL requires changes at many > > levels of the code. It requires to change at lot of code, but I > > understood that core developers approved the whole project. Maybe I > > was wrong? At least, I asked for permissions multiple changes, > > especially at the start. > > > > > > > > > * FWIW, the claim that the help is much better is specious. AFAICT, > > there has never been the slightest problem with "sorted(iterable, key=None, > > reverse=False) --> new sorted list" which has been clear since the day it > > was released. It is some of the new strings the are causing problems with > > users (my students frequently are tripped-up by the / notation for example; > > no one seems to be able to intuit what it means without it being explained > > first). > > > > Good news, it seems like you have a good experience in API design, > > documentation, etc. Join the "Argument Clinic" project to help us to > > enhance docstrings, function signatures and documentation ;-) > > > > See the good part of the AC on-going work: it's a nice opportunity to > > also enhance documentation, not only provide a signature. > > > > By the way, to be honest, the main advantage of converting functions > > to AC is to get a signature. The signature is visible in docstrings > > which is nice, but it is also very useful to a wide range of tools > > like (IDE, static checks, etc.). > > > > Conversion to FASTCALL is more a nice effect. At least, it is a good > > motivation for me to convert mor and more code to AC :-) > > > > AC moves docstring closer to the list of parameters. IHMO it makes the > > C code simpler to read and understand. It also removes the boring code > > responsible to "parse" arguments, so it makes the code shorter. But > > well, this is just my opinion. > > > > > > > * FWIW, I'm trying to be constructive and contribute where I can, but > > frankly I can't keep up with the volume of churn. Having seen bugs being > > introduced, it is not inappropriate to ask another dev to please be > > careful, especially when that dev has been prolific to an unprecedented > > degree and altering core parts of the language for function calls, to new > > opcodes, the memory allocators, etc. Very few people on the planet are > > competent to review these changes, make reasonable assessments about > > whether the complexity and churn are worth it. An fewer still have the > > time to keep up with the volume of changes. > > > > > > Hum, I wasn't involved in bytecode changes. > > > > Well, I reviewed the very good work of Demur Rumed. I recall that you > > worked on a similar area, trying to fetch bytecode by 16-bit instead > > of 8-bit. Demur proposed a good design and I recall that the design > > was approved. > > > > I helped a little bit on the implementation and I pushed the final > > change, but all credits go to Demur and Serhiy Storshaka! By the way, > > Serhiy made further efficient enhancements in the bytecode of > > CALL_FUNCTION instructions. > > > > > > About memory allocations, I guess that you are referring to my change > > on PyMem_Malloc() allocator. I discussed the issue on python-dev and > > waited for approval of my peers before pushing anything, since I know > > well that it's a critical part of Python: > > https://mail.python.org/pipermail/python-dev/2016-March/143467.html > > > > I provide all data requested by Marc Andre Lemburg (test the change > > with common projects, Django, Pillow, numpy) and made further changes > > (PYTHONMALLOC=debug tool) to help to handle this backward incompatible > > change (GIL is now required to call PyMem_Malloc). > > > > Hopefully, it seems like nobody noticed this subtle change (GIL now > > requied): I didn't see any bug report. By the way, I fixed a misused > > PyMem_Mem() in numpy. > > > > > > > * Please do continue your efforts to improve the language, but also > > please moderate the rate of change, mitigate the addition complexity, value > > stability over micro-optimizations, consult the authors and maintainers of > > code, take special care without code that hasn't been reviewed because that > > lacks a safety net, and remember that newer devs may be taking cues from > > you (do you want them making extensive changes to long existing stable code > > without consulting the authors and with weak LGTM reviews?) > > > > Ok, I will do it. > > > > Thank you for you feedback Raymond. I hope that my email helps you to > > understand how I work and how I take my decisions. > > > > Victor > > _______________________________________________ > > python-committers mailing list > > python-committers@python.org > > https://mail.python.org/mailman/listinfo/python-committers > > Code of Conduct: https://www.python.org/psf/codeofconduct/ > > _______________________________________________ python-committers mailing list python-committers@python.org https://mail.python.org/mailman/listinfo/python-committers Code of Conduct: https://www.python.org/psf/codeofconduct/