Re: [Python-Dev] test_unicode failure on MIPS
Neal Norwitz wrote:
> Any ideas?

this is a recent change, so it looks like the box simply didn't get around to rebuilding the unicodeobject module. (I'm beginning to wonder if I didn't forget to add some header file dependencies somewhere during the stringlib refactoring, but none of the other buildbots seem to have a problem with this...)

/F

___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] valgrind report
Looks pretty good, except for one CJK problem: test_codecencodings_jp

    Invalid read of size 1
       at 0x110AEBC3: shift_jis_2004_decode (_codecs_jp.c:642)
       by 0xBFCBDB7: mbidecoder_decode (multibytecodec.c:839)
     Address 0xAEC376B is 0 bytes after a block of size 3 alloc'd
       at 0x4A19B7E: malloc (vg_replace_malloc.c:149)
       by 0xBFCBF54: mbidecoder_decode (multibytecodec.c:1023)

n
Re: [Python-Dev] test_unicode failure on MIPS
On 6/1/06, Fredrik Lundh [EMAIL PROTECTED] wrote:
>> Neal Norwitz wrote: Any ideas?
> this is a recent change, so it looks like the box simply didn't get around to rebuild the unicodeobject module.

That shouldn't be. make distclean should be called (it was make clean until recently). However, http://www.python.org/dev/buildbot/all/MIPS%20Debian%20trunk/builds/176/step-compile/0 seems to indicate unicodeobject was in fact not built. I also don't see any previous record of any builds (or make cleans). That buildslave is new and it had some connectivity problems, I think, so maybe something was whacky on it. The current build (still running) definitely did compile unicodeobject, so let's wait and see if that finishes successfully.

n
Re: [Python-Dev] Python Benchmarks
Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
>> Seriously, I've been using and running pybench for years and even though tweaks to the interpreter do sometimes result in speedups or slow-downs where you wouldn't expect them (due to the interpreter using the Python objects), they are reproducible and often enough have uncovered that optimizations in one area may well result in slow-downs in other areas. Often enough the results are related to low-level features of the architecture you're using to run the code, such as cache size, cache lines, number of registers in the CPU or on the FPU stack, etc. etc.
>
> and that observation has never made you stop and think about whether there might be some problem with the benchmarking approach you're using?

The approach pybench is using is as follows:

* Run a calibration step which does the same as the actual test without the operation being tested (i.e. call the function running the test, set up the for-loop, constant variables, etc.). The calibration step is run multiple times and is used to calculate an average test overhead time.

* Run the actual test, which runs the operation multiple times. The test is then adjusted to make sure that the test overhead / test run ratio remains within reasonable bounds. If needed, the operation code is repeated verbatim in the for-loop to decrease the ratio.

* Repeat the above for each test in the suite.

* Repeat the suite N rounds.

* Calculate the average run time of all test runs in all rounds.

> after all, if a change to e.g. the try/except code slows things down or speeds things up, is it really reasonable to expect that the time it takes to convert Unicode strings to uppercase should suddenly change due to cache effects or a changing number of registers in the CPU? real hardware doesn't work that way...

Of course, but then changes to try-except logic can interfere with the performance of setting up method calls. This is what pybench then uncovers.
The only problem I see in the above approach is the way calibration is done. The run-time of the calibration code may be too small w/r to the resolution of the timers used. Again, please provide the parameters you've used to run the test case and the output. Things like warp factor, overhead, etc. could hint at the problem you're seeing.

> is PyBench perhaps using the following approach:
>
>     T = set of tests
>     for N in range(number of test runs):
>         for t in T:
>             t0 = get_process_time()
>             t()
>             t1 = get_process_time()
>             assign t1 - t0 to test t
>     print assigned time
>
> where t1 - t0 is very short?

See above (or the code in pybench.py). t1-t0 is usually around 20-50 seconds:

    "The tests must set .rounds to a value high enough to let the test run between 20-50 seconds. This is needed because clock()-timing only gives rather inaccurate values (on Linux, for example, it is accurate to a few hundredths of a second). If you don't want to wait that long, use a warp factor larger than 1."

> that's not a very good idea, given how get_process_time tends to be implemented on current-era systems (google for jiffies)... but it definitely explains the bogus subtest results I'm seeing, and the magic hardware behaviour you're seeing.

That's exactly the reason why tests run for a relatively long time - to minimize these effects. Of course, using wall time makes this approach vulnerable to other effects such as the current load of the system, other processes having a higher priority interfering with the timed process, etc. For this reason, I'm currently looking for ways to measure the process time on Windows.

--
Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Source (#1, Jun 02 2006)
Python/Zope Consulting and Support ...        http://www.egenix.com/
mxODBC.Zope.Database.Adapter ...              http://zope.egenix.com/
mxODBC, mxDateTime, mxTextTools ...           http://python.egenix.com/
2006-07-03: EuroPython 2006, CERN, Switzerland - 30 days left
::: Try mxODBC.Zope.DA for Windows, Linux, Solaris, FreeBSD for free ! :::
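The calibration scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not pybench's actual code; the names `calibrate`, `run_test`, `empty_loop`, and `test_loop` are made up for the sketch, and it uses the averaging strategy this thread goes on to criticize:

```python
import time

def calibrate(empty_loop, runs=20):
    # Estimate per-test overhead by timing an "empty" version of the
    # test harness several times and averaging (pybench's approach at
    # the time; the sprint later switched this to the minimum).
    timings = []
    for _ in range(runs):
        t0 = time.time()
        empty_loop()
        t1 = time.time()
        timings.append(t1 - t0)
    return sum(timings) / len(timings)

def run_test(test_loop, empty_loop, rounds=10):
    # Time the real test and subtract the calibrated overhead.
    overhead = calibrate(empty_loop)
    results = []
    for _ in range(rounds):
        t0 = time.time()
        test_loop()
        t1 = time.time()
        results.append((t1 - t0) - overhead)
    return sum(results) / len(results)

# The test body repeats the operation many times so the measured
# interval is large relative to the timer's resolution.
def empty_loop():
    for _ in range(100000):
        pass

def test_loop():
    s = "abc"
    for _ in range(100000):
        s.upper()

print(run_test(test_loop, empty_loop))
```

The overhead subtraction is the part the calibration discussion below turns on: if the calibration runs are themselves shorter than the timer's resolution, the computed overhead is mostly noise.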
Re: [Python-Dev] Python Benchmarks
M.-A. Lemburg wrote:
> Of course, but then changes to try-except logic can interfere with the performance of setting up method calls. This is what pybench then uncovers.

I think the only thing PyBench has uncovered is that you're convinced that it's always right, and everybody else is always wrong, including people who've spent decades measuring performance, and the hardware in your own computer.

> See above (or the code in pybench.py). t1-t0 is usually around 20-50 seconds:

what machines are you using? using the default parameters, the entire run takes about 50 seconds on the slowest machine I could find...

>> that's not a very good idea, given how get_process_time tends to be implemented on current-era systems (google for jiffies)... but it definitely explains the bogus subtest results I'm seeing, and the magic hardware behaviour you're seeing.
>
> That's exactly the reason why tests run for a relatively long time - to minimize these effects. Of course, using wall time makes this approach vulnerable to other effects such as the current load of the system, other processes having a higher priority interfering with the timed process, etc.

since process time is *sampled*, not measured, process time isn't exactly invulnerable either. it's not hard to imagine scenarios where you end up being assigned only a small part of the process time you're actually using, or cases where you're assigned more time than you've had a chance to use. afaik, if you want true performance counters on Linux, you need to patch the operating system (unless something's changed in very recent versions). I don't think that sampling errors can explain all the anomalies we've been seeing, but I wouldn't be surprised if a high-resolution wall-time clock on a lightly loaded multiprocess system was, in practice, *more* reliable than sampled process time on an equally loaded system.
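The resolution difference Fredrik is pointing at is easy to observe empirically. The sketch below records the smallest nonzero step between successive clock readings, which bounds a clock's effective granularity. It uses the modern names `time.perf_counter` and `time.process_time` (which did not exist in the 2006-era Python under discussion, where `time.time` and `time.clock` played these roles); on a jiffy-based kernel the process-time clock advances only in scheduler-tick increments:

```python
import time

def granularity(clock, samples=1000):
    # Estimate a clock's effective resolution: the smallest nonzero
    # difference observed between successive readings.
    smallest = float("inf")
    last = clock()
    for _ in range(samples):
        now = clock()
        if now != last:
            diff = now - last
            if diff < smallest:
                smallest = diff
            last = now
    return smallest

# On a jiffy-based kernel, process time advances in tick-sized jumps
# (commonly 1-10 ms in 2006-era Linux), while the wall clock is
# microsecond-resolution or better.
print("wall clock   :", granularity(time.perf_counter))
print("process time :", granularity(time.process_time))
```

On a system where the process-time step dwarfs the wall-clock step, any per-test interval shorter than one tick is unmeasurable by process time, which is the core of the "jiffies" objection.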
/F
Re: [Python-Dev] Python Benchmarks
Fredrik Lundh wrote:
>> Of course, but then changes to try-except logic can interfere with the performance of setting up method calls. This is what pybench then uncovers.
> I think the only thing PyBench has uncovered is that you're convinced that it's always right, and everybody else is always wrong, including people who've spent decades measuring performance, and the hardware in your own computer.

Oh, come on. You know that's not true and I'm trying to understand what is causing your findings, but this is difficult, since you're not providing enough details. E.g. the output of pybench showing the timing results would help a lot. I would also like to reproduce your findings. Do you have two revision numbers in svn which I could use for this?

>> See above (or the code in pybench.py). t1-t0 is usually around 20-50 seconds:
> what machines are you using? using the default parameters, the entire run takes about 50 seconds on the slowest machine I could find...

If the whole suite runs in 50 seconds, the per-test run-times are far too small to be accurate. I usually adjust the warp factor so that each *round* takes 50 seconds. Looks like I have to revisit the default parameters and update the doc-strings. I'll do that when I add the new timers. Could you check whether you still see the same results when running with pybench.py -w 1?

>>> that's not a very good idea, given how get_process_time tends to be implemented on current-era systems (google for jiffies)... but it definitely explains the bogus subtest results I'm seeing, and the magic hardware behaviour you're seeing.
>> That's exactly the reason why tests run for a relatively long time - to minimize these effects. Of course, using wall time makes this approach vulnerable to other effects such as the current load of the system, other processes having a higher priority interfering with the timed process, etc.
> since process time is *sampled*, not measured, process time isn't exactly invulnerable either. it's not hard to imagine scenarios where you end up being assigned only a small part of the process time you're actually using, or cases where you're assigned more time than you've had a chance to use. afaik, if you want true performance counters on Linux, you need to patch the operating system (unless something's changed in very recent versions). I don't think that sampling errors can explain all the anomalies we've been seeing, but I wouldn't be surprised if a high-resolution wall-time clock on a lightly loaded multiprocess system was, in practice, *more* reliable than sampled process time on an equally loaded system.

That's why the timers being used by pybench will become a parameter that you can then select to adapt pybench to the OS you're running pybench on. Note that time.clock, the current default timer in pybench, is a high-accuracy wall-clock timer on Windows, so it should demonstrate similar behavior to timeit.py, even more so since you're using warp 20 and thus a similar timing strategy as that of timeit.py. I suspect that the calibration step is causing problems. Steve added a parameter to change the number of calibration runs done per test: -C n. The default is 20.

--
Marc-Andre Lemburg, eGenix.com
Re: [Python-Dev] Python Benchmarks
M.-A. Lemburg:
> The approach pybench is using is as follows: ... The calibration step is run multiple times and is used to calculate an average test overhead time.

One of the changes that occurred during the sprint was to change this algorithm to use the best time rather than the average. Using the average assumes a Gaussian distribution, and timing results are not Gaussian. There is an absolute best time, but that's rarely reached due to background noise. It's more like a gamma distribution plus the minimum time. To show the distribution is non-Gaussian I ran the following:

    import time

    def compute():
        x = 0
        for i in range(1000):
            for j in range(1000):
                x += 1

    def bench():
        t1 = time.time()
        compute()
        t2 = time.time()
        return t2 - t1

    times = []
    for i in range(1000):
        times.append(bench())
    print times

The full distribution is attached as 'plot1.png' and the close-up (range 0.45-0.65) as 'plot2.png'. Not a clean gamma function, but that's a closer match than an exponential. The gamma distribution looks more like an exponential function when the shape parameter is large. This corresponds to a large amount of noise in the system, so the run time is not close to the best time. This means the average approach works better when there is a lot of random background activity, which is not the usual case when I try to benchmark. When averaging a gamma distribution you'll end up with a bit of a skew, and I think the skew depends on the number of samples, reaching a limit point. Using the minimum time should be more precise because there is a definite lower bound and the machine should be stable. In my test above the first few results are:

    0.472838878632
    0.473038911819
    0.473326921463
    0.473494052887
    0.473829984665

I'm pretty certain the best time is 0.4725, or very close to that, but the average time is 0.58330151391 because of the long tail.
Here are the last 6 results in my population of 1000:

    1.76353311539
    1.79937505722
    1.82750201225
    2.01710510254
    2.44861507416
    2.90868496895

Granted, I hit a couple of web pages while doing this and my spam filter processed my mailbox in the background... There's probably some Markov modeling which would look at the number and distribution of samples so far and, assuming a gamma distribution, determine how many more samples are needed to get a good estimate of the absolute minimum time. But min(large enough samples) should work fine.

> If the whole suite runs in 50 seconds, the per-test run-times are far too small to be accurate. I usually adjust the warp factor so that each *round* takes 50 seconds.

The stringbench.py I wrote uses the timeit algorithm, which dynamically adjusts the test to run between 0.2 and 2 seconds.

> That's why the timers being used by pybench will become a parameter that you can then select to adapt pybench to the OS you're running pybench on.

Wasn't that decision a consequence of the problems found during the sprint?

Andrew [EMAIL PROTECTED]

[attachments: plot1.png, plot2.png]
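Andrew's argument that the minimum is the better estimator can be demonstrated with simulated data. This is only a sketch: the gamma parameters below are invented for illustration, chosen to mimic the stable baseline plus long right tail he observed:

```python
import random

random.seed(42)

TRUE_TIME = 0.4725  # the "absolute best" run time in Andrew's data

# Simulate 1000 timings: the true time plus gamma-distributed noise,
# mimicking the long right tail in the measured distribution.
samples = [TRUE_TIME + random.gammavariate(1.5, 0.05) for _ in range(1000)]

best = min(samples)
mean = sum(samples) / len(samples)

print("min  :", best)   # lands close to TRUE_TIME
print("mean :", mean)   # pulled upward by the tail
```

The minimum converges on the noise-free lower bound as the sample count grows, while the mean is permanently biased upward by however much background noise happened to occur during the run — which is exactly why averaging makes two benchmark runs hard to compare.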
Re: [Python-Dev] Removing Mac OS 9 cruft
Just and Jack have confirmed that you can throw away everything except possibly Demo/*. (Just even speculated that some cruft may have been accidentally revived by the cvs -> svn transition?)

--Guido

On 6/1/06, Neal Norwitz [EMAIL PROTECTED] wrote:
> I was about to remove Mac/IDE scripts, but it looks like there might be more stuff that is OS 9 related and should be removed. Other possibilities look like (everything under Mac/):
>
>     Demo/*                        - this is a bit more speculative
>     IDE scripts/*
>     MPW/*
>     Tools/IDE/*                   - this references IDE scripts, so presumably it should be toast?
>     Tools/macfreeze/*
>     Unsupported/mactcp/dnrglue.c
>     Wastemods/*
>
> I'm going mostly based on what has been modified somewhat recently. Can someone confirm/reject these? I'll remove them.
>
> n

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Python Benchmarks
Andrew Dalke wrote:
>> M.-A. Lemburg: The approach pybench is using is as follows: ... The calibration step is run multiple times and is used to calculate an average test overhead time.
>
> One of the changes that occurred during the sprint was to change this algorithm to use the best time rather than the average. Using the average assumes a Gaussian distribution. Timing results are not. There is an absolute best but that's rarely reached due to background noise. It's more like a gamma distribution plus the minimum time. To show the distribution is non-Gaussian I ran the following:
>
>     def compute():
>         x = 0
>         for i in range(1000):
>             for j in range(1000):
>                 x += 1
>
>     def bench():
>         t1 = time.time()
>         compute()
>         t2 = time.time()
>         return t2 - t1
>
>     times = []
>     for i in range(1000):
>         times.append(bench())
>     print times
>
> The full distribution is attached as 'plot1.png' and the close-up (range 0.45-0.65) as 'plot2.png'. Not a clean gamma function, but that's a closer match than an exponential. The gamma distribution looks more like an exponential function when the shape parameter is large. This corresponds to a large amount of noise in the system, so the run time is not close to the best time. This means the average approach works better when there is a lot of random background activity, which is not the usual case when I try to benchmark. When averaging a gamma distribution you'll end up with a bit of a skew, and I think the skew depends on the number of samples, reaching a limit point. Using the minimum time should be more precise because there is a definite lower bound and the machine should be stable. In my test above the first few results are:
>
>     0.472838878632
>     0.473038911819
>     0.473326921463
>     0.473494052887
>     0.473829984665
>
> I'm pretty certain the best time is 0.4725, or very close to that. But the average time is 0.58330151391 because of the long tail.
> Here are the last 6 results in my population of 1000: 1.76353311539 1.79937505722 1.82750201225 2.01710510254 2.44861507416 2.90868496895. Granted, I hit a couple of web pages while doing this and my spam filter processed my mailbox in the background... There's probably some Markov modeling which would look at the number and distribution of samples so far and, assuming a gamma distribution, determine how many more samples are needed to get a good estimate of the absolute minimum time. But min(large enough samples) should work fine.

Thanks for the great analysis! Using the minimum looks like the way to go for calibration. I wonder whether the same is true for the actual tests; since you're looking for the expected run-time, the minimum may not necessarily be the right choice. Then again, in both cases you are only looking at a small number of samples (20 for the calibration, 10 for the number of rounds), so this may be irrelevant. BTW, did you run this test on Windows or a Unix machine? There's also an interesting second high at around 0.53. What could be causing this?

>> If the whole suite runs in 50 seconds, the per-test run-times are far too small to be accurate. I usually adjust the warp factor so that each *round* takes 50 seconds.
> The stringbench.py I wrote uses the timeit algorithm which dynamically adjusts the test to run between 0.2 and 2 seconds.

>> That's why the timers being used by pybench will become a parameter that you can then select to adapt pybench to the OS you're running pybench on.
> Wasn't that decision a consequence of the problems found during the sprint?

It's a consequence of a discussion I had with Steve Holden and Tim Peters: I believe that using wall-clock timers for benchmarking is not a good approach due to the high noise level. Process time timers typically have a lower resolution, but give a better picture of the actual run-time of your code and also don't exhibit as much noise as the wall-clock timer approach. Of course, you have to run the tests somewhat longer to get reasonable accuracy of the timings. Tim thinks that it's better to use short running tests and an accurate timer, accepting the added noise and counting on the user making sure that the noise level is at a minimum. Since I like to give users the option of choosing for themselves, I'm going to make the choice of timer an option.

--
Marc-Andre Lemburg, eGenix.com
Re: [Python-Dev] Let's stop eating exceptions in dict lookup
Anthony Baxter [EMAIL PROTECTED] writes:
> On Friday 02 June 2006 02:21, Jack Diederich wrote:
>> The CCP Games CEO said they have trouble retaining talent from more moderate latitudes for this reason. 18 hours of daylight makes them a bit goofy and when the Winter Solstice rolls around they are apt to go quite mad.
>> Obviously they need to hire people who are already crazy.
> I think they already did! :)
> not-naming-any-names-ly, Anthony

me-neither-ly y'rs
mwh

--
"Look I don't know. Thankyou everyone for arguing me round in circles."
"No need for thanks, ma'am; that's what we're here for."
  -- LNR Michael M Mason, cam.misc
Re: [Python-Dev] Python Benchmarks
M.-A. Lemburg wrote:
> I believe that using wall-clock timers for benchmarking is not a good approach due to the high noise level. Process time timers typically have a lower resolution, but give a better picture of the actual run-time of your code and also don't exhibit as much noise as the wall-clock timer approach.

please stop repeating this nonsense. there are no process time timers in contemporary operating systems; only tick counters. there are patches for linux and commercial add-ons to most platforms that let you use hardware performance counters for process stuff, but there's no way to emulate that by playing with different existing Unix or Win32 API:s; the thing you think you're using simply isn't there.

/F
Re: [Python-Dev] Python Benchmarks
M.-A. Lemburg wrote:
>>> That's why the timers being used by pybench will become a parameter that you can then select to adapt pybench to the OS you're running pybench on.
>> Wasn't that decision a consequence of the problems found during the sprint?
> It's a consequence of a discussion I had with Steve Holden and Tim Peters: I believe that using wall-clock timers for benchmarking is not a good approach due to the high noise level. Process time timers typically have a lower resolution, but give a better picture of the actual run-time of your code and also don't exhibit as much noise as the wall-clock timer approach. Of course, you have to run the tests somewhat longer to get reasonable accuracy of the timings. Tim thinks that it's better to use short running tests and an accurate timer, accepting the added noise and counting on the user making sure that the noise level is at a minimum.

I just had an idea: if we could get each test to run inside a single time slice assigned by the OS scheduler, then we could benefit from the better resolution of the hardware timers while still keeping the noise to a minimum.

I suppose this could be achieved by:

* making sure that each test needs less than 10ms to run
* calling time.sleep(0) after each test run

Here's some documentation on the Linux scheduler:

    http://www.samspublishing.com/articles/article.asp?p=101760&seqNum=2&rl=1

Table 3.1 has the minimum time slice: 10ms.

What do you think? Would this work?

--
Marc-Andre Lemburg, eGenix.com
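The proposal above can be sketched as follows. This is an illustration only, not pybench code; whether a test body actually fits inside one scheduler slice depends on the kernel and its tick rate, and `time.perf_counter` is the modern name for the high-resolution timer (a 2006-era version would have used `time.time` or `time.clock`):

```python
import time

def timed_run(test, rounds=10):
    # Time each round with a high-resolution wall clock, yielding
    # the CPU between rounds so that (ideally) each measurement
    # fits inside a single scheduler time slice.
    results = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        test()          # must complete in well under one tick (~10 ms)
        t1 = time.perf_counter()
        results.append(t1 - t0)
        time.sleep(0)   # voluntarily yield before the slice expires
    return min(results)

def tiny_test():
    # a sub-millisecond workload
    sum(range(10000))

print(timed_run(tiny_test))
```

Taking the minimum of the rounds discards any measurement in which a context switch did land inside the timed region, which is the failure mode Fredrik points out in the reply that follows.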
Re: [Python-Dev] Python Benchmarks
Marc-Andre Lemburg writes:
> Using the minimum looks like the way to go for calibration. I wonder whether the same is true for the actual tests; since you're looking for the expected run-time, the minimum may not necessarily be the choice.

No, you're not looking for the expected run-time. The expected run-time is a function of the speed of the CPU, the architecture of same, what else is running simultaneously -- perhaps even what music you choose to listen to that day. It is NOT a constant for a given piece of code, and is NOT what you are looking for.

What you really want to do in benchmarking is to *compare* the performance of two (or more) different pieces of code. You do, of course, care about the real-world performance. So if you had two algorithms and one ran twice as fast when there were no context switches and 10 times slower when there was background activity on the machine, then you'd prefer the algorithm that supports context switches. But that's not a realistic situation. What is far more common is that you run one test while listening to the Grateful Dead and another test while listening to Bach, and that (plus other random factors and the phase of the moon) causes one test to run faster than the other.

Taking the minimum time clearly subtracts some noise, which is a good thing when comparing performance for two or more pieces of code. It fails to account for the distribution of times, so if one piece of code occasionally gets lucky and takes far less time, then minimum time won't be a good choice... but it would be tricky to design code that would be affected by the scheduler in this fashion even if you were explicitly trying!

Later he continues:
> Tim thinks that it's better to use short running tests and an accurate timer, accepting the added noise and counting on the user making sure that the noise level is at a minimum. Since I like to give users the option of choosing for themselves, I'm going to make the choice of timer an option.
I'm generally a fan of giving programmers choices. However, this is an area where we have demonstrated that even very competent programmers often have misunderstandings (read this thread for evidence!). So be very careful about giving such a choice: the default behavior should be chosen by people who think carefully about such things, and the documentation on the option should give a good explanation of the tradeoffs or at least a link to such an explanation.

-- Michael Chermside
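The compare-by-minimum approach argued for in this thread is exactly what the standard library's timeit module supports out of the box: `repeat()` runs the whole timing experiment several times, and taking the minimum of the results discards most scheduling noise:

```python
import timeit

# Run the timing experiment 5 times, each timing 100,000 executions
# of the statement; min() keeps the least-disturbed run.
timings = timeit.repeat(stmt="'abc'.upper()", repeat=5, number=100000)
best = min(timings)
print("best of 5:", best)
```

The timeit documentation itself recommends this usage: the minimum is interesting as a lower bound, while the mean and standard deviation of the repeats mostly measure the background load, not the code under test.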
Re: [Python-Dev] test_ctypes failures on ppc64 debian
[Thomas Heller]
> test_ctypes fails on the ppc64 machine. I don't have access to such a machine myself, so I would have to do some trial and error, or try to print some diagnostic information. This should not be done in the trunk, so the question is: can the buildbots build branches?

Yes. For example, that's how the buildbots run 2.4 tests.

> I assume I just have to enter a revision number and press the force-build button, is this correct?

No, you need to enter the tail end of the branch path in the "Branch to build:" box. You probably want to leave the "Revision to build:" box empty. Examples I know work because I've tried them in the past: entering "trunk" in "Branch to build:" builds the current trunk, and entering "branches/release24-maint" in "Branch to build:" builds the current 2.4 branch. I'm not certain that paths other than those work.

> Or would someone consider this abuse?

In this case, it only matters whether Matthias Klose thinks it's abuse (since klose-debian-ppc64 is his box), so I've copied him on this reply. Matthias, I hope you don't mind some extra activity on that box, since it may be the only way test_ctypes will ever pass on your box :-)
Re: [Python-Dev] test_ctypes failures on ppc64 debian
Tim Peters wrote:
>> [Thomas Heller] test_ctypes fails on the ppc64 machine. I don't have access to such a machine myself, so I would have to do some trial and error, or try to print some diagnostic information. This should not be done in the trunk, so the question is: can the buildbots build branches?
> Yes. For example, that's how the buildbots run 2.4 tests.
>> I assume I just have to enter a revision number and press the force-build button, is this correct?
> No, you need to enter the tail end of the branch path in the "Branch to build:" box. You probably want to leave the "Revision to build:" box empty. Examples I know work because I've tried them in the past: entering "trunk" in "Branch to build:" builds the current trunk, and entering "branches/release24-maint" in "Branch to build:" builds the current 2.4 branch. I'm not certain that paths other than those work.
>> Or would someone consider this abuse?
> In this case, it only matters whether Matthias Klose thinks it's abuse (since klose-debian-ppc64 is his box), so I've copied him on this reply. Matthias, I hope you don't mind some extra activity on that box, since it may be the only way test_ctypes will ever pass on your box :-)

I have already mailed him asking if he can give me interactive access to this machine ;-). He has not yet replied - I'm not sure if this is because he's been shocked to see such a request, or if he's already on holiday.

Thomas
Re: [Python-Dev] SF patch #1473257: Add a gi_code attr to generators
On 6/1/06, Guido van Rossum [EMAIL PROTECTED] wrote:
>> On 6/1/06, Phillip J. Eby [EMAIL PROTECTED] wrote: I didn't know it was assigned to me. I guess SF doesn't send any notifications, and neither did Georg, so your email is the very first time that I've heard of it.
> This is a longstanding SF bug. (One of the reasons why we should move away from it ASAP IMO.)

The Request for Trackers should go out this weekend, putting a worst-case timeline of choosing a tracker at three months from this weekend. Once that is done, hopefully switching over won't take very long. In other words, hopefully this can get done before October.

-Brett

> While we're still using SF, developers should probably get in the habit of sending an email to the assignee when assigning a bug...
> --Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Python Benchmarks
M.-A. Lemburg [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] Granted, I hit a couple of web pages while doing this and my spam filter processed my mailbox in the background... Hardly a setting in which to run comparison tests, seems to me. Using the minimum looks like the way to go for calibration. Or possibly the median. But better still, the way to run comparison timings is to use a system with as little other stuff going on as possible. For Windows, this means rebooting in safe mode, waiting until the system is quiescent, and then running the timing test with *nothing* else active that can be avoided. Even then, I would look at the distribution of times for a given test to check for anomalously high values that should be tossed. (This can be automated somewhat.) Terry Jan Reedy
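[List-archive note: the take-the-minimum-and-toss-outliers approach Terry describes is essentially what the stdlib timeit module automates; a minimal sketch, not part of the original thread:]

```python
import timeit

# Timer.repeat() runs the statement several times and returns a list of
# per-run totals; taking the minimum discards runs that were inflated by
# background activity -- the "toss anomalously high values" step, automated.
t = timeit.Timer('x = x + 1', setup='x = 0')
results = t.repeat(repeat=5, number=100000)
best = min(results)          # least-noise total, in seconds
per_loop = best / 100000.0   # best-case time per statement
```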
Re: [Python-Dev] Python Benchmarks
Terry Reedy wrote: But even better, the way to go to run comparison timings is to use a system with as little other stuff going on as possible. For Windows, this means rebooting in safe mode, waiting until the system is quiescent, and then run the timing test with *nothing* else active that can be avoided. sigh. /F
Re: [Python-Dev] Python Benchmarks
M.-A. Lemburg wrote: I just had an idea: if we could get each test to run inside a single time slice assigned by the OS scheduler, then we could benefit from the better resolution of the hardware timers while still keeping the noise to a minimum. I suppose this could be achieved by: * making sure that each test needs less than 10ms to run iirc, very recent linux kernels have a 1 millisecond tick. so do alphas, and probably some other platforms. * calling time.sleep(0) after each test run so some higher-priority process can get a chance to run, and spend 9.5 milliseconds shuffling data to a slow I/O device before blocking? ;-) I'm not sure this problem can be solved, really, at least not as long as you're constrained to portable API:s. (talking of which, if someone has some time and a linux box to spare, and wants to do some serious hacking on precision benchmarks, using http://user.it.uu.se/~mikpe/linux/perfctr/2.6/ to play with the TSC might be somewhat interesting.) /F
Re: [Python-Dev] Python Benchmarks
M.-A. Lemburg wrote: That's why the timers being used by pybench will become a parameter that you can then select to adapt pybench to the OS you're running it on. Wasn't that decision a consequence of the problems found during the sprint? It's a consequence of a discussion I had with Steve Holden and Tim Peters: I believe that using wall-clock timers for benchmarking is not a good approach due to the high noise level. Process time timers typically have a lower resolution, but give a better picture of the actual run-time of your code and also don't exhibit as much noise as the wall-clock timer approach. Of course, you have to run the tests somewhat longer to get reasonable accuracy of the timings. Tim thinks that it's better to use short-running tests and an accurate timer, accepting the added noise and counting on the user making sure that the noise level is at a minimum. I just had an idea: if we could get each test to run inside a single time slice assigned by the OS scheduler, then we could benefit from the better resolution of the hardware timers while still keeping the noise to a minimum. I suppose this could be achieved by: * making sure that each test needs less than 10ms to run * calling time.sleep(0) after each test run Here's some documentation on the Linux scheduler: http://www.samspublishing.com/articles/article.asp?p=101760&seqNum=2&rl=1 Table 3.1 has the minimum time slice: 10ms. What do you think? Would this work? I ran some tests related to this and it appears that, provided the test itself uses less than 1ms, chances are high that you don't get any forced context switches in your way while running the test. It also appears that you have to use time.sleep(10e-6) to get the desired behavior. time.sleep(0) seems to receive some extra care, so doesn't have the intended effect - at least not on Linux. I've checked this on AMD64 and Intel Pentium M.
The script is attached - it will run until you get more than 10 forced context switches in 100 runs of the test, incrementing the runtime of the test in each round. It's also interesting that the difference between max and min run-time of the tests can be as low as 0.2% on the Pentium, whereas the AMD64 always stays around 4-5%. On an old AMD Athlon, the difference rarely goes below 50% - this might also have to do with the kernel version running on that machine, which is 2.4, whereas the AMD64 and Pentium M are running 2.6. Note that the script needs the resource module, so it won't work on Windows. It's interesting that even pressing a key on your keyboard will cause forced context switches. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Jun 02 2006) Python/Zope Consulting and Support ...http://www.egenix.com/ mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/ mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/ ::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free !
import resource, time

def workload(rounds):
    x = 0
    for i in range(rounds):
        x = x + 1

def microbench():
    print 'Microbench'
    sleeptime = 10e-6
    sleep = time.sleep
    timer = time.time
    rtype = resource.RUSAGE_SELF
    rounds = 100
    while 1:
        times = []
        rstart = resource.getrusage(rtype)
        for i in range(100):
            # Make sure the test is run at the start of a scheduling
            # time slice
            sleep(sleeptime)
            # Test
            start = timer()
            workload(rounds)
            stop = timer()
            times.append(stop - start)
        rstop = resource.getrusage(rtype)
        volswitches = rstop[-2] - rstart[-2]
        forcedswitches = rstop[-1] - rstart[-1]
        min_time = min(times)
        max_time = max(times)
        diff = max_time - min_time
        if forcedswitches == 0:
            print 'Rounds: %i' % rounds
            print '  min time: %f seconds' % min_time
            print '  max time: %f seconds' % max_time
            print '  diff: %f %% = %f seconds' % (diff / min_time * 100.0, diff)
            print '  context switches: %r %r' % (volswitches, forcedswitches)
            print
        elif forcedswitches > 10:
            break
        rounds += 100

microbench()
[Python-Dev] Some more comments re new uriparse module, patch 1462525
[Not sure whether this kind of thing is best posted as tracker comments (but then the tracker gets terribly long and is mailed out every time a change happens) or posted here. Feel free to tell me I'm posting in the wrong place...] Some comments on this patch (a new module, submitted by Paul Jimenez, implementing the rules set out in RFC 3986 for URI parsing, joining URI references with a base URI etc.) http://python.org/sf/1462525 Sorry for the pause, Paul. I finally read RFC 3986 -- which I must say is probably the best-written RFC I've read (and there was much rejoicing). I still haven't read 3987 or got to grips with the unicode issues (whatever they are), but I have just implemented the same stuff you did, so have some comments on non-unicode aspects of your implementation (the version labelled v23 on the tracker): Your urljoin implementation seems to pass the tests (the tests taken from the RFC), but I have to admit I don't understand it :-) It doesn't seem to take account of the distinction between undefined and empty URI components. For example, the authority of a URI reference may be empty but still defined. Anyway, if you're taking advantage of some subtle identity that implies you can get away with truth-testing in place of is None tests, please don't ;-) It's slower than is [not] None tests both for the computer and (especially!) the reader. I don't like the use of module posixpath to implement the algorithm labelled remove_dot_segments. URIs are not POSIX filesystem paths, and shouldn't depend on code meant to implement the latter. But my own implementation is exceedingly ugly ATM, so I'm in no position to grumble too much :-) Normalisation of the base URI is optional, and your urljoin function never normalises. Instead, it parses the base and reference, then follows the algorithm of section 5.2 of the RFC. Parsing is required before normalisation takes place.
So urljoin forces people who need to normalise the base URI first to parse it twice, which is annoying. There should be some way to pass 5-tuples in instead of URIs. E.g., from my implementation:

def urljoin(base_uri, uri_reference):
    return urlunsplit(urljoin_parts(urlsplit(base_uri),
                                    urlsplit(uri_reference)))

It would be nice to have a 5-tuple-like class (I guess implemented as a subclass of tuple) that also exposes attributes (.authority, .path, etc.) -- the same way module time does it. The path component is required, though it may be empty. Your parser returns None (meaning undefined) where it should return an empty string. Nit: your tests involving ports contain non-digit characters in the port (viz. port), which is not valid by section 3.2.3 of the RFC. Smaller nit: the userinfo component was never allowed in http URLs, but you use it in your tests. This issue is outside of RFC 3986, of course. Particularly because the userinfo component is deprecated, I'd rather that userinfo-splitting and joining were separate functions, with the other functions dealing only with the standard RFC 3986 5-tuples. DefaultSchemes should be a class attribute of URIParser. The naming of URLParser / URIParser is still insane :-) I suggest naming them _URIParser and URIParser respectively. I guess there should be no mention of URL anywhere in the module -- only URI (even though I hate URI, as a mostly-worthless distinction from URL; consistency inside the module is more important, and URI is technically correct and fits with all the terminology used in the RFC). I'm still heavily -1 on calling it uriparse, though, because of the highly misleading comparison with the name urlparse (the difference between the modules isn't the difference between URIs and URLs). Re your comment on mailto: in the tracker: sure, I understand it's not meant to be public, but the interface is! .parse() will return a 4-tuple for mailto: URLs. For everything else, it will return a 7-tuple. That's silly.
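[List-archive note: the undefined-vs-empty distinction John raises earlier can be shown in a few lines; a sketch, not taken from the patch under review:]

```python
# RFC 3986 distinguishes an undefined component from a defined-but-empty
# one: in 'http:///path' the authority is present but empty.  Both None
# and '' are falsy in Python, so truth testing conflates the two cases;
# only an explicit 'is None' test keeps them apart.
undefined = None   # component absent from the URI
empty = ''         # component present, but empty

assert not undefined and not empty   # truth testing: indistinguishable
assert undefined is None             # genuinely undefined
assert empty is not None             # defined, merely empty
```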
The documentation should explain that the function of URIParser is hiding scheme-dependent URI normalisation. Method names and locals are still likeThis, contrary to PEP 8. Docstrings and other whitespace are still non-standard -- follow PEP 8 (and PEP 257, which PEP 8 references). This doesn't have to be totally rigid, of course -- e.g. lining up the : characters in the tests is fine. Standard stdlib-form documentation is still missing. I'll be told off if I don't read you your rights: you don't have to submit in LaTeX markup -- apparently there are hordes of eager LaTeX markers-up lurking, ready to pounce on poorly-formatted documentation *wink* The test suite still needs tweaking to put it in standard stdlib form. John
Re: [Python-Dev] Python Benchmarks
On 6/2/06, Terry Reedy [EMAIL PROTECTED] wrote: Hardly a setting in which to run comparison tests, seems to me. The point, though, was to show that the time distribution is non-Gaussian, so intuition based on that doesn't help. Using the minimum looks like the way to go for calibration. Or possibly the median. Why? I can't think of why that's more useful than the minimum time. Given a large number of samples, the difference between the minimum and the median/average/whatever mostly provides information about the background noise, which is pretty irrelevant to most benchmarks. But even better, the way to go to run comparison timings is to use a system with as little other stuff going on as possible. For Windows, this means rebooting in safe mode, waiting until the system is quiescent, and then run the timing test with *nothing* else active that can be avoided. A reason I program in Python is because I want to get work done and not deal with stoic purity. I'm not going to waste all that time (or money to buy a new machine) just to run a benchmark. Just how much more accurate would that be than the numbers we get now? Have you tried it? What additional sensitivity did you get, and was the extra effort worthwhile? Even then, I would look at the distribution of times for a given test to check for anomalously high values that should be tossed. (This can be automated somewhat.) I say it can be automated completely: toss all but the lowest. It's the one with the least noise overhead. I think fitting the smaller data points to a gamma distribution might yield better (more reproducible and useful) numbers, but I know my stats ability is woefully decayed, so I'm not going to try. My observation is that the shape factor is usually small, so in a few dozen to a hundred samples there's a decent chance of getting a time with minimal noise overhead.
Andrew [EMAIL PROTECTED]
Re: [Python-Dev] Python Benchmarks
On 6/2/06, M.-A. Lemburg [EMAIL PROTECTED] wrote: It's interesting that even pressing a key on your keyboard will cause forced context switches. When niceness was first added to multiprocessing OSes people found their CPU intensive jobs would go faster by pressing enter a lot. Andrew [EMAIL PROTECTED]
Re: [Python-Dev] Python Benchmarks
[MAL] Using the minimum looks like the way to go for calibration. [Terry Reedy] Or possibly the median. [Andrew Dalke] Why? I can't think of why that's more useful than the minimum time. A lot of things get mixed up here ;-) The _mean_ is actually useful if you're using a poor-resolution timer with a fast test. For example, suppose a test takes 1/10th the time of the span between counter ticks. Then, on average, in 9 runs out of 10 the reported elapsed time is 0 ticks, and in 1 run out of 10 the reported time is 1 tick. 0 and 1 are both wrong, but the mean (1/10) is correct. So there _can_ be sense to that. Then people vaguely recall that the median is more robust than the mean, and all sense goes out the window ;-) My answer is to use the timer with the best resolution the machine has. Using the mean is a way to worm around timer quantization artifacts, but it's easier and clearer to use a timer with resolution so fine that quantization doesn't make a lick of real difference. Forcing a test to run for a long time is another way to make timer quantization irrelevant, but then you're also vastly increasing chances for other processes to disturb what you're testing. I liked benchmarking on Crays in the good old days. No time-sharing, no virtual memory, and the OS believed to its core that its primary purpose was to set the base address once at the start of a job so the Fortran code could scream. Test times were reproducible to the nanosecond with no effort. Running on a modern box for a few microseconds at a time is a way to approximate that, provided you measure the minimum time with a high-resolution timer :-)
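[List-archive note: Tim's 1/10th-of-a-tick example is easy to simulate; a toy model, not a real timer measurement:]

```python
import random

# A test that truly takes 0.1 tick, measured with a timer that only
# reports whole ticks.  Each run starts at a random phase within a tick,
# so the reported time is 1 tick when a tick boundary is crossed
# (probability 0.1) and 0 ticks otherwise.
random.seed(42)
TRUE_TIME = 0.1                        # true duration, in ticks
reported = []
for _ in range(100000):
    phase = random.random()            # offset within the current tick
    reported.append(int(phase + TRUE_TIME) - int(phase))

mean = sum(reported) / float(len(reported))
# min(reported) and the median are both 0; only the mean comes close to
# the true 0.1-tick duration when the timer is this coarse.
```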
Re: [Python-Dev] Python Benchmarks
On Fri, Jun 02, 2006 at 07:44:07PM -0400, Tim Peters wrote: Fortran code could scream. Test times were reproducible to the nanosecond with no effort. Running on a modern box for a few microseconds at a time is a way to approximate that, provided you measure the minimum time with a high-resolution timer :-) On Linux with a multi-CPU machine, you could probably boot up the system to use N-1 CPUs, and then start the Python process on CPU N. That should avoid the process being interrupted by other processes, though I guess there would still be some noise from memory bus and kernel lock contention. (At work we're trying to move toward this approach for doing realtime audio: devote one CPU to the audio computation and use other CPUs for I/O, web servers, and whatnot.) --amk
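[List-archive note: the devote-one-CPU idea can be approximated today without rebooting by pinning the process to a single core; a sketch using os.sched_setaffinity, which is Python 3.3+ and Linux-only, so not available in the 2006-era Python this thread discusses:]

```python
import os

# Pin the current process (pid 0 means "self") to CPU 0 so the scheduler
# cannot migrate the benchmark between cores mid-run.  Other processes
# may still run on CPU 0; full isolation needs kernel-level measures
# such as the isolcpus boot parameter amk alludes to.
if hasattr(os, 'sched_setaffinity'):   # guard for non-Linux platforms
    os.sched_setaffinity(0, {0})
    print(os.sched_getaffinity(0))
```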
Re: [Python-Dev] Python Benchmarks
Tim Peters wrote: I liked benchmarking on Crays in the good old days. ... Test times were reproducible to the nanosecond with no effort. Running on a modern box for a few microseconds at a time is a way to approximate that, provided you measure the minimum time with a high-resolution timer :-) Obviously what we need here is a stand-alone Python interpreter that runs on the bare machine, so there's no pesky operating system around to mess up our times. -- Greg
Re: [Python-Dev] Python Benchmarks
A.M. Kuchling wrote: (At work we're trying to move toward this approach for doing realtime audio: devote one CPU to the audio computation and use other CPUs for I/O, web servers, and whatnot.) Speaking of creative uses for multiple CPUs, I was thinking about dual-core Intel Macs the other day, and I wondered whether it would be possible to configure it so that one core was running MacOSX and the other was running Windows at the same time. It would give the term dual booting a whole new meaning... -- Greg
Re: [Python-Dev] Python Benchmarks
Greg Ewing [EMAIL PROTECTED] wrote: Tim Peters wrote: I liked benchmarking on Crays in the good old days. ... Test times were reproducible to the nanosecond with no effort. Running on a modern box for a few microseconds at a time is a way to approximate that, provided you measure the minimum time with a high-resolution timer :-) Obviously what we need here is a stand-alone Python interpreter that runs on the bare machine, so there's no pesky operating system around to mess up our times. An early version of unununium would do that (I don't know if much progress has been made since I last checked their site). - Josiah