Re: [Python-Dev] PEP 414 - some numbers from the Django port
----- Original Message -----

> But the stuff you run is not really benchmarking anything. As far as I
> know, the Django benchmarks mostly benchmark things like DB creation and
> deletion, although that might differ between CPython and PyPy. How about
> running *actual* Django benchmarks, instead of the test suite? Not that
> proving anything is necessary, but if you try to prove something, make it
> right.

But my point was only to show that in a reasonable body of Python code (as
opposed to a microbenchmark), the overhead of using wrappers was not
significant. All the wrapper calls in the ported Django and its test suite
were exercised. It was not exactly a benchmarking exercise, in that it
didn't matter what the actual numbers were, nor was any claim being made
about absolute performance; the point was just that the numbers were the
same for all three variants, within statistical variation.

As I mentioned in my other post, I happened to have the Django test suite
figures to hand, and to my mind they suited the purpose of showing that
wrapper calls, in the overall mix, don't have a noticeable impact (whereas
they do in a microbenchmark).

Regards,

Vinay Sajip

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 414 - some numbers from the Django port
On Wed, Mar 7, 2012 at 2:36 PM, Vinay Sajip <vinay_sa...@yahoo.co.uk> wrote:

>> Armin Ronacher <armin.ronacher at active-4.com> writes:
>>
>> What are you trying to argue? That the overall Django testsuite does not
>> do a lot of string processing, or less processing with native strings?
>> I'm surprised you see a difference at all over the whole Django
>> testsuite, and I wonder why you get a slowdown at all for the ported
>> Django on 2.7.
>
> The point of the figures is to show there is *no* difference
> (statistically speaking) between the three sets of samples. Of course,
> any individual run or set of runs could be higher or lower due to other
> things happening on the machine (not that I was running any background
> tasks), so the idea of the simple statistical analysis is to determine
> whether these samples could all have come from the same population.
> According to ministat, they could have (with a 95% confidence level).

But the stuff you run is not really benchmarking anything. As far as I
know, the Django benchmarks mostly benchmark things like DB creation and
deletion, although that might differ between CPython and PyPy. How about
running *actual* Django benchmarks, instead of the test suite? Not that
proving anything is necessary, but if you try to prove something, make it
right.

Cheers,
fijal
Re: [Python-Dev] PEP 414 - some numbers from the Django port
Hi,

On 3/3/12 2:28 AM, Vinay Sajip wrote:

> So, looking at a large project in a relevant problem domain,
> unicode_literals and native string markers would appear not to adversely
> impact readability or performance.

What are you trying to argue? That the overall Django testsuite does not
do a lot of string processing, or less processing with native strings?
I'm surprised you see a difference at all over the whole Django testsuite,
and I wonder why you get a slowdown at all for the ported Django on 2.7.

Regards,
Armin
Re: [Python-Dev] PEP 414 - some numbers from the Django port
Armin Ronacher <armin.ronacher at active-4.com> writes:

> What are you trying to argue? That the overall Django testsuite does not
> do a lot of string processing, or less processing with native strings?
> I'm surprised you see a difference at all over the whole Django testsuite,
> and I wonder why you get a slowdown at all for the ported Django on 2.7.

The point of the figures is to show there is *no* difference (statistically
speaking) between the three sets of samples. Of course, any individual run
or set of runs could be higher or lower due to other things happening on
the machine (not that I was running any background tasks), so the idea of
the simple statistical analysis is to determine whether these samples
could all have come from the same population. According to ministat, they
could have (with a 95% confidence level).

The Django test suite is pretty comprehensive, so it would presumably
exercise every part of Django, including the parts that handle processing
of requests and producing responses. I can't confirm this, not having done
a coverage analysis of Django; but this seems like a more representative
workload than any microbenchmark which just measures a single operation,
like the overhead of a wrapper. And so my argument was that the
microbenchmark numbers didn't give a meaningful indication of the actual
performance in a real scenario, and they should be taken in that light.

No doubt there are other, better (more useful) tests that could be
performed (e.g. ab run against all three variants and requests/sec figures
compared), but I had the Django test run figures to hand (since they're a
byproduct of the porting work), and so presented them in my post. Anyway,
it doesn't really matter now, since the latest version of the PEP no
longer mentions those figures.
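[Editorial note: the "could these samples come from the same population"
check that ministat performs can be sketched as a two-sample Welch t-test
on the weighted timings. This is an illustrative reconstruction using the
standard library, not ministat's actual code, and the critical value is
read from a t-table for the degrees of freedom this particular data
yields.]

```python
import math
import statistics

def welch_t(a, b):
    """Two-sample Welch t statistic and approximate degrees of freedom."""
    va, vb = statistics.variance(a), statistics.variance(b)
    na, nb = len(a), len(b)
    se = math.sqrt(va / na + vb / nb)
    t = (statistics.mean(a) - statistics.mean(b)) / se
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Weighted timings from the original post (seconds per 100 tests).
vanilla = [10.057, 10.436, 10.036, 9.979, 10.314, 10.205]
ported = [10.040, 10.331, 10.296, 10.285, 10.271, 10.441]

t, df = welch_t(vanilla, ported)
# df comes out around 9 here; the two-tailed 95% critical value for
# df=9 is about 2.262, so |t| below that means "no difference proven
# at 95.0% confidence", matching ministat's verdict.
print('t = %.3f, df = %.1f, significant: %s' % (t, df, abs(t) > 2.262))
```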
Regards,

Vinay Sajip
Re: [Python-Dev] PEP 414 - some numbers from the Django port
On Thu, Mar 8, 2012 at 8:36 AM, Vinay Sajip <vinay_sa...@yahoo.co.uk> wrote:

> Anyway, it doesn't really matter now, since the latest version of the
> PEP no longer mentions those figures.

Indeed, I deliberately removed the part about performance concerns, since
I considered it a distraction from what I see as the heart of the problem
PEP 414 is designed to address (i.e. that the purely mechanical changes
previously required to Unicode text that is already clearly marked as such
in the Python 2 version are irrelevant noise when it comes to identifying
and reviewing the *actual* changes needed for a successful Python 3 port).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
[Python-Dev] PEP 414 - some numbers from the Django port
PEP 414 mentions the use of function wrappers and discusses both their
obtrusiveness and their performance impact on Python code. In the Django
Python 3 port, I've used unicode_literals, and hence have no u prefixes in
the ported code, and use a function wrapper to adorn native strings where
they are needed. Though the port is still a work in progress, it passes
all tests on 2.x and 3.x with the SQLite adapter, with only a small number
skipped specifically during the porting exercise (generally due to
representational differences). I'd like to share some numbers from this
port to see what people here think about them.

Firstly, on obtrusiveness: out of a total of 1872 source files, the native
string marker appears in only 30 files - 18 files in Django itself, and 12
files in the test suite. This is less than 2% of files, so the native
string markers are not especially invasive when looking at the code. There
are only 76 lines in the ported Django which contain native string markers.

Secondly, on performance. I ran the following steps 6 times:

1. Run the test suite on unported Django using Python 2.7.2 (vanilla)
2. Run the test suite on the ported Django using Python 2.7.2 (ported)
3. Run the test suite on the ported Django using Python 3.2.2 (ported3)

Django skips some tests because dependencies aren't installed (e.g. PIL
for Python 3.2).
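[Editorial note: for readers unfamiliar with the technique, a native-string
marker in this style typically looks something like the sketch below. The
name n() and its details are hypothetical; the helper in the actual Django
port may be named and implemented differently.]

```python
from __future__ import unicode_literals

import sys

# Hypothetical sketch of a native-string marker. With unicode_literals
# in effect, every unadorned literal is unicode on Python 2, so the few
# places that need the *native* str type (bytes on 2.x, text on 3.x)
# are marked explicitly with a small wrapper.
if sys.version_info[0] >= 3:
    def n(literal):
        # On Python 3 the native string type is already str (text).
        return literal
else:
    def n(literal):
        # On Python 2 the native string type is bytes; ASCII covers
        # the usual native-string uses (header names, keyword
        # arguments, WSGI environ keys and the like).
        return literal.encode('ascii')

# Example: values that must be of native str type on both 2.x and 3.x,
# such as HTTP header names.
header = n('Content-Type')
print(type(header) is str)
```

The cost being debated in this thread is exactly the per-call overhead of
such a wrapper, which a microbenchmark magnifies and a full test-suite run
dilutes.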
The raw numbers, in seconds elapsed for the test run, are given below:

vanilla (4659 tests): 468.586 486.231 467.584 464.916 480.530 475.457
ported (4655 tests): 467.350 480.902 479.276 478.748 478.115 486.044
ported3 (4609 tests): 463.161 470.423 463.833 448.097 456.727 504.402

If we allow for the different numbers of tests run by dividing by the
number of tests and multiplying by 100, we get:

vanilla-weighted: 10.057 10.436 10.036 9.979 10.314 10.205
ported-weighted: 10.040 10.331 10.296 10.285 10.271 10.441
ported3-weighted: 10.049 10.207 10.064 9.722 9.909 10.944

If I run these through ministat, it tells me there is no significant
difference in these data sets, with a 95% confidence level:

$ ministat -w 74 vanilla-weighted ported-weighted ported3-weighted
x vanilla-weighted
+ ported-weighted
* ported3-weighted
[ASCII scatter plot not reproducible here]
    N           Min           Max        Median           Avg        Stddev
x   6         9.979        10.436        10.205     10.171167    0.17883782
+   6         10.04        10.441        10.296     10.277333    0.13148485
No difference proven at 95.0% confidence
*   6         9.722        10.944        10.064     10.149167    0.42250274
No difference proven at 95.0% confidence

So, looking at a large project in a relevant problem domain,
unicode_literals and native string markers would appear not to adversely
impact readability or performance. Your comments would be appreciated.

Regards,

Vinay Sajip
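[Editorial note: the weighting step and ministat's summary line can be
reproduced with the standard library. This is a sketch using the figures
quoted in the post above; only the normalisation and descriptive
statistics are shown, not ministat's significance test.]

```python
import statistics

# Raw elapsed times (seconds) and test counts from the post above.
RUNS = {
    'vanilla': (4659, [468.586, 486.231, 467.584, 464.916, 480.530, 475.457]),
    'ported':  (4655, [467.350, 480.902, 479.276, 478.748, 478.115, 486.044]),
    'ported3': (4609, [463.161, 470.423, 463.833, 448.097, 456.727, 504.402]),
}

def weighted(times, n_tests):
    # Seconds per 100 tests, so runs with slightly different test
    # counts become directly comparable.
    return [t / n_tests * 100 for t in times]

# Summary comparable to ministat's N/Min/Max/Median/Avg/Stddev table.
for name, (n_tests, times) in RUNS.items():
    w = weighted(times, n_tests)
    print('%-8s N=%d min=%.3f max=%.3f median=%.3f avg=%.6f stddev=%.6f' % (
        name, len(w), min(w), max(w), statistics.median(w),
        statistics.mean(w), statistics.stdev(w)))
```

Running this reproduces the vanilla row's average (about 10.171167) and
standard deviation (about 0.178838) from the ministat output quoted above.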