Re: [Python-Dev] Python Benchmarks
Fredrik Lundh wrote:
> since process time is *sampled*, not measured, process time isn't
> exactly invulnerable either.

I can't share that view. The scheduler knows *exactly* what thread is
running on the processor at any time, and that thread won't change until
the scheduler makes it change. So if you discount time spent in
interrupt handlers (which might be falsely accounted to the thread that
happens to be running at the point of the interrupt), then process time
*is* measured, not sampled, on any modern operating system: it is
updated whenever the scheduler schedules a different thread.

Of course, the question remains what the resolution of the clock making
these measurements is. For Windows NT+, I would expect it to be
"quantum units", but I'm uncertain whether it could also measure
fractions of a quantum unit if the process does a blocking call.

> I don't think that sampling errors can explain all the anomalies
> we've been seeing, but I wouldn't be surprised if a high-resolution
> wall time clock on a lightly loaded multiprocess system was, in
> practice, *more* reliable than sampled process time on an equally
> loaded system.

On Linux, process time is accounted in jiffies. Unfortunately, for
compatibility, times(2) converts that to clock_t, losing precision.

Regards,
Martin
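For reference, the accounting being discussed is visible from Python; a
minimal sketch, assuming a POSIX system and only the stdlib os and
resource modules:

    import os
    import resource

    # os.times() wraps times(2): user/system CPU time, already
    # converted from clock_t ticks to float seconds, so the tick
    # granularity shows up as quantized values.
    user, system = os.times()[:2]
    print "times(2) user=%r system=%r" % (user, system)

    # getrusage(2) reports the same accounting through rusage fields
    # with microsecond units -- though the kernel's jiffy resolution
    # still limits how much of that precision is real.
    usage = resource.getrusage(resource.RUSAGE_SELF)
    print "getrusage user=%r system=%r" % (usage.ru_utime, usage.ru_stime)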
Re: [Python-Dev] Removing Mac OS 9 cruft
Guido van Rossum wrote:
> Just and Jack have confirmed that you can throw away everything except
> possibly Demo/*. (Just even speculated that some cruft may have been
> accidentally revived by the cvs -> svn transition?)

No, they had been present when cvs was converted:

http://python.cvs.sourceforge.net/python/python/dist/src/Mac/IDE%20scripts/

These had caused ongoing problems for Windows, which could not stand
files with trailing dots.

Regards,
Martin
Re: [Python-Dev] test_ctypes failures on ppc64 debian
Thomas Heller wrote:
> I have already mailed him asking if he can give me interactive access
> to this machine ;-). He has not yet replied - I'm not sure if this is
> because he's been shocked to see such a request, or if he is already
> on holidays.

I believe it's a machine donated to Debian. They are quite hesitant to
hand out shell accounts to people who aren't Debian Developers. OTOH,
running a build through buildbot should be fine if you have some
"legitimate" use. It would be bad if builds were triggered by people
who are not contributing to Python (this hasn't happened so far).

Regards,
Martin
Re: [Python-Dev] Python Benchmarks
Greg Ewing <[EMAIL PROTECTED]> writes:

> Tim Peters wrote:
>
>> I liked benchmarking on Crays in the good old days. ... Test times
>> were reproducible to the nanosecond with no effort. Running on a
>> modern box for a few microseconds at a time is a way to approximate
>> that, provided you measure the minimum time with a high-resolution
>> timer :-)
>
> Obviously what we need here is a stand-alone Python interpreter
> that runs on the bare machine, so there's no pesky operating
> system around to mess up our times.

I'm sure we can write a PyPy backend that targets Open Firmware :)

Cheers,
mwh

--
speak of the devil exarkun: froor not you
  -- from Twisted.Quotes
Re: [Python-Dev] Python Benchmarks
Martin v. Löwis wrote:
>> since process time is *sampled*, not measured, process time isn't
>> exactly invulnerable either.
>
> I can't share that view. The scheduler knows *exactly* what thread is
> running on the processor at any time, and that thread won't change
> until the scheduler makes it change. So if you discount time spent
> in interrupt handlers (which might be falsely accounted to the
> thread that happens to run at the point of the interrupt), then
> process time *is* measured, not sampled, on any modern operating
> system: it is updated whenever the scheduler schedules a different
> thread.

updated with what? afaik, the scheduler doesn't have to wait for a
timer interrupt to reschedule things (think blocking, or interrupts
that request rescheduling, or new processes, or...) -- but it's always
the thread that runs when the timer interrupt arrives that gets the
entire jiffy time.

for example, this script runs for ten seconds, usually without using
any process time at all:

    import time
    for i in range(1000):
        for i in range(1000):
            i+i+i+i
        time.sleep(0.005)

while the same program, without the sleep, will run for a second or
two, most of which is assigned to the process.

if the scheduler used the TSC to keep track of times, it would be
*measuring* process time. but unless something changed very recently,
it doesn't. it's all done by sampling, typically 100 or 1000 times per
second.

> On Linux, process time is accounted in jiffies. Unfortunately, for
> compatibility, times(2) converts that to clock_t, losing precision.

times(2) reports time in 1/CLOCKS_PER_SEC second units, while jiffies
are counted in 1/HZ second units. on my machine, CLOCKS_PER_SEC is a
thousand times larger than HZ. what does this code print on your
machine?

    #include <time.h>       /* CLOCKS_PER_SEC */
    #include <asm/param.h>  /* HZ, on Linux */

    main()
    {
        printf("CLOCKS_PER_SEC=%d, HZ=%d\n", CLOCKS_PER_SEC, HZ);
    }

?
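The times(2) tick rate can also be read from Python without a compiler;
a rough equivalent, assuming a POSIX system (the kernel-internal HZ is
not exposed this way):

    import os

    # SC_CLK_TCK is the clock_t tick rate that times(2) reports in;
    # the kernel's jiffy rate (HZ) may differ and isn't visible here.
    print "SC_CLK_TCK =", os.sysconf('SC_CLK_TCK')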
Re: [Python-Dev] Python Benchmarks
Fredrik Lundh wrote:
>> it is updated whenever the scheduler schedules a different thread.
>
> updated with what? afaik, the scheduler doesn't have to wait for a
> timer interrupt to reschedule things (think blocking, or interrupts
> that request rescheduling, or new processes, or...) -- but it's
> always the thread that runs when the timer interrupt arrives that
> gets the entire jiffy time.

Sure: when a thread doesn't consume its entire quantum, accounting
becomes difficult. Still, if the scheduler reads the current time when
scheduling, it measures the time consumed.

> if the scheduler used the TSC to keep track of times, it would be
> *measuring* process time. but unless something changed very
> recently, it doesn't.

You mean, "unless something changed very recently" *on Linux*, right?
Or when did you last read the sources of Windows XP?

It would still be measuring if the scheduler reads the latest value of
some system clock, although that would be much less accurate than
reading the TSC.

> times(2) reports time in 1/CLOCKS_PER_SEC second units, while jiffies
> are counted in 1/HZ second units. on my machine, CLOCKS_PER_SEC is a
> thousand times larger than HZ. what does this code print on your
> machine?

You are right; clock_t allows for higher precision than jiffies.

Regards,
Martin
Re: [Python-Dev] Python Benchmarks
Tim:
> A lot of things get mixed up here ;-) The _mean_ is actually useful
> if you're using a poor-resolution timer with a fast test.

In which case discrete probability distributions are better than my
assumption of a continuous distribution.

I looked at the distribution of times for 1,000 repeats of:

    t1 = time.time()
    t2 = time.time()
    times.append(t2-t1)

The times and counts I found were:

    9.53674316406e-07   388
    1.19209289551e-06    95
    1.90734863281e-06   312
    2.14576721191e-06   201
    2.86102294922e-06     2
    1.90734863281e-05     1
    3.00407409668e-05     1

This implies my Mac's time.time() has a resolution of
2.384185791015e-07 s (0.2 µs, or about 4.2 MHz), or possibly a small
integer fraction thereof. The timer overhead takes between 4 and 9
ticks. Ignoring the outliers, and assuming I have the CPU to myself
for the timeslice, I expect about +/- 3 ticks of noise per test.

To measure a 1% speedup reliably I'll need to run, what, 300-600 ticks?
That's a millisecond, and with a time quantum of 10 ms there's a 1 in
10 chance that I'll incur that overhead. In other words, I don't think
my high-resolution timer is high enough.

Got a spare Cray I can use, and will you pay for the power bill?

Andrew
[EMAIL PROTECTED]
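A self-contained version of that measurement loop, as a minimal sketch
using only the stdlib (the histogram and tick estimate are the ones
described above):

    import time

    # Time 1,000 back-to-back pairs of time.time() calls; each delta
    # is the timer overhead, quantized to the timer's tick size.
    times = []
    for i in range(1000):
        t1 = time.time()
        t2 = time.time()
        times.append(t2 - t1)

    # Histogram of observed deltas.
    counts = {}
    for t in times:
        counts[t] = counts.get(t, 0) + 1
    for t in sorted(counts):
        print "%-20r %d" % (t, counts[t])

    # The smallest nonzero delta bounds the timer resolution from above.
    print "resolution <=", min(t for t in times if t > 0)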
Re: [Python-Dev] Python Benchmarks
Martin v. Löwis wrote:
> Sure: when a thread doesn't consume its entire quantum, accounting
> becomes difficult. Still, if the scheduler reads the current time
> when scheduling, it measures the time consumed.

yeah, but the point is that it *doesn't* read the current time: all the
system does is note that "alright, we've reached the end of another
jiffy, and this thread was running at that point. now, was it running
in user space or in kernel space when we interrupted it?".

here's the relevant code, from kernel/timer.c and kernel/sched.c:

    #define jiffies_to_cputime(__hz)    (__hz)

    void update_process_times(int user_tick)
    {
        struct task_struct *p = current;
        int cpu = smp_processor_id();

        if (user_tick)
            account_user_time(p, jiffies_to_cputime(1));
        else
            account_system_time(p, HARDIRQ_OFFSET, jiffies_to_cputime(1));
        run_local_timers();
        if (rcu_pending(cpu))
            rcu_check_callbacks(cpu, user_tick);
        scheduler_tick();
        run_posix_cpu_timers(p);
    }

    void account_user_time(struct task_struct *p, cputime_t cputime)
    {
        struct cpu_usage_stat *cpustat = &kstat_this_cpu.cpustat;
        cputime64_t tmp;

        p->utime = cputime_add(p->utime, cputime);

        tmp = cputime_to_cputime64(cputime);
        if (TASK_NICE(p) > 0)
            cpustat->nice = cputime64_add(cpustat->nice, tmp);
        else
            cpustat->user = cputime64_add(cpustat->user, tmp);
    }

(update_process_times is called by the hardware timer interrupt
handler, once per jiffy, HZ times per second. task_struct contains
information about a single thread; cpu_usage_stat is global stats for a
CPU.)

for the benchmarks, the problem is of course not that the benchmarking
thread gives up too early; it's when other processes give up early, and
the benchmark process is next in line. in that case, the benchmark
won't use a whole jiffy, but it's still charged for a full jiffy
interval by the interrupt handler (in my sleep test, *other processes*
got charged for the time the program spent running that inner loop).

a modern computer can do *lots of stuff* in a single jiffy interval
(whether it's 15 ms, 10 ms, 4 ms, or 1 ms), and even more in a single
scheduler quantum (= a number of jiffies).

> You mean, "unless something changed very recently" *on Linux*, right?

on any system involved in this discussion. they all worked the same
way, last time I checked ;-)

> Or when did you last read the sources of Windows XP?

afaik, all Windows versions based on the current NT kernel (up to and
including XP) use tick-based sampling. I don't know about Vista; given
the platform requirements for Vista, it's perfectly possible that
they've switched to TSC-based accounting.

> It would still be measuring if the scheduler reads the latest value
> of some system clock, although that would be much less accurate than
> reading the TSC.

hopefully, this is the last time I will have to repeat this, but on
both Windows and Linux, the "system clock" used for process timing is a
jiffy counter.
Re: [Python-Dev] Python Benchmarks
[Fredrik Lundh]
>> ...
>> since process time is *sampled*, not measured, process time isn't
>> exactly invulnerable either.

[Martin v. Löwis]
> I can't share that view. The scheduler knows *exactly* what thread is
> running on the processor at any time, and that thread won't change
> until the scheduler makes it change. So if you discount time spent
> in interrupt handlers (which might be falsely accounted to the
> thread that happens to run at the point of the interrupt), then
> process time *is* measured, not sampled, on any modern operating
> system: it is updated whenever the scheduler schedules a different
> thread.

That doesn't seem to agree with, e.g.,

    http://lwn.net/2001/0412/kernel.php3

under "No more jiffies?":

    ... Among other things, it imposes a 10ms resolution on most
    timing-related activities, which can make it hard for user-space
    programs that need a tighter control over time. It also guarantees
    that process accounting will be inaccurate. Over the course of one
    10ms jiffy, several processes might have run, but the one actually
    on the CPU when the timer interrupt happens gets charged for the
    entire interval.

Maybe this varies by Linux flavor or version? While the article above
was published in 2001, Googling didn't turn up any hint that Linux
jiffies have actually gone away, or become better loved, since then.
Re: [Python-Dev] Python Benchmarks
Tim Peters wrote:
> Maybe this varies by Linux flavor or version? While the article above
> was published in 2001, Googling didn't turn up any hint that Linux
> jiffies have actually gone away, or become better loved, since then.

well, on x86, they have changed from 10 ms in 2.4 to 1 ms in early 2.6
releases and 4 ms in later 2.6 releases, but that's about it. (the
code in my previous post was from a 2.6.17 development version, which,
afaict, is about as up to date as you can be.)

note that the jiffy interrupt handler does use the TSC (or a similar
mechanism) to update the wall clock time, so it wouldn't be that hard
to refactor the code to use it also for process accounting. but I
suppose the devil is in the backwards-compatibility details. just
setting the HZ value to something very large will probably not work
very well...
Re: [Python-Dev] Python Benchmarks
[Fredrik Lundh]
> but it's always the thread that runs when the timer interrupt
> arrives that gets the entire jiffy time. for example, this script
> runs for ten seconds, usually without using any process time at all:
>
>     import time
>     for i in range(1000):
>         for i in range(1000):
>             i+i+i+i
>         time.sleep(0.005)
>
> while the same program, without the sleep, will run for a second or
> two, most of which is assigned to the process.

Nice example! On my desktop box (WinXP, 3.4GHz), I had to make it
nastier to see it consume any "time" without the sleep:

    import time
    for i in range(1000):
        for i in range(10000):      # 10x bigger
            i+i+i+i*(i+i+i+i)       # more work
        time.sleep(0.005)
    raw_input("done")

The raw_input is there so I can see Task Manager's idea of elapsed "CPU
Time" (the sum of process "user time" and "kernel time") when it's
done.

Without the sleep, it gets charged 6 CPU seconds. With the sleep, 0
CPU seconds.

But life would be more boring if people believed you the first time ;-)
Re: [Python-Dev] Python Benchmarks
Tim Peters wrote:
>> then process time *is* measured, not sampled, on any modern
>> operating system: it is updated whenever the scheduler schedules a
>> different thread.
>
> That doesn't seem to agree with, e.g.,
>
>    http://lwn.net/2001/0412/kernel.php3
>
> under "No more jiffies?": [...]
>
> Maybe this varies by Linux flavor or version?

No, Fredrik is right: Linux samples process time, instead of measuring
it. That only proves it is not a modern operating system :-)

I would still hope that Windows measures instead of sampling.

Regards,
Martin
Re: [Python-Dev] Python Benchmarks
Tim Peters wrote:
> Without the sleep, it gets charged 6 CPU seconds. With the sleep, 0
> CPU seconds.
>
> But life would be more boring if people believed you the first time
> ;-)

This only proves that it uses clock ticks for the accounting, and not
something with higher resolution. To find out whether it samples or
measures CPU usage, you really have to read the source code of the
operating system (or find some documentation from somebody who has
seen the source code).

Regards,
Martin
Re: [Python-Dev] Python Benchmarks
Here are my suggestions:

- While running benchmarks, don't listen to music, watch videos, use
  the keyboard/mouse, or run anything other than the benchmark code.
  Seems like common sense to me.

- I would average the timings of runs instead of taking the minimum
  value, as sometimes benchmarks could be running code that is not
  deterministic in its calculations (it could be using random numbers
  that affect convergence).

- Before calculating the average I would throw out samples outside 3
  sigmas (the outliers). This would eliminate the samples that are out
  of whack due to events that are out of our control. To use this
  approach it would be necessary to run some minimum number of times.
  I believe 30-40 samples would be necessary, but I'm no expert in
  statistics. I base this on my recollection of a study on this I did
  some time in the late 90s. I used to have a better feel for the
  number of samples required based on the number of sigmas used to
  determine the outliers, but I have to confess that I just normally
  use a minimum of 100 samples to play it safe. I'm sure with a little
  experimentation with benchmarks the proper number of samples could be
  determined.

  Here is a related passage I found at
  http://www.statsoft.com/textbook/stbasic.html#Correlationsf :

  '''Quantitative Approach to Outliers. Some researchers use
  quantitative methods to exclude outliers. For example, they exclude
  observations that are outside the range of ±2 standard deviations
  (or even ±1.5 sd's) around the group or design cell mean. In some
  areas of research, such "cleaning" of the data is absolutely
  necessary. For example, in cognitive psychology research on reaction
  times, even if almost all scores in an experiment are in the range
  of 300-700 milliseconds, just a few "distracted reactions" of 10-15
  seconds will completely change the overall picture. Unfortunately,
  defining an outlier is subjective (as it should be), and the
  decisions concerning how to identify them must be made on an
  individual basis (taking into account specific experimental
  paradigms and/or "accepted practice" and general research experience
  in the respective area). It should also be noted that in some rare
  cases, the relative frequency of outliers across a number of groups
  or cells of a design can be subjected to analysis and provide
  interpretable results. For example, outliers could be indicative of
  the occurrence of a phenomenon that is qualitatively different than
  the typical pattern observed or expected in the sample, thus the
  relative frequency of outliers could provide evidence of a relative
  frequency of departure from the process or phenomenon that is
  typical for the majority of cases in a group.'''

  Now I personally feel that using a 1.5 or 2 sigma approach is rather
  loose for the case of benchmarks, and the suggestion I gave of 3
  might be too tight. From experimentation we might find that 2.5 is
  more appropriate. I usually use this approach while reviewing data
  obtained by fairly accurate sensors, so being conservative and using
  3 sigmas works well for those cases.

  The last statement in the passage is worth noting: a high ratio of
  outliers could be used as an indication that the benchmark results
  for a particular run are invalid. This could be used to throw out
  bad results due to someone starting to listen to music while the
  benchmarks are running, anti-virus software starting to run, etc.
  (A sketch of this trimming appears after this message.)

- Another improvement to benchmarks can be obtained when both the old
  and new code is available to be benchmarked together. By running the
  benchmarks of both codes together we could eliminate the effects of
  noise, if we assume noise at a given point in time would be applied
  to both sets of code. Here is a modified version of the code that
  Andrew wrote previously to show this more clearly than my words:

    import time

    def compute_old():
        x = 0
        for i in range(1000):
            for j in range(1000):
                x = x + 1

    def compute_new():
        x = 0
        for i in range(1000):
            for j in range(1000):
                x += 1

    def bench():
        t1 = time.clock()
        compute_old()
        t2 = time.clock()
        compute_new()
        t3 = time.clock()
        return t2-t1, t3-t2

    times_old = []
    times_new = []

    for i in range(1000):
        time_old, time_new = bench()
        times_old.append(time_old)
        times_new.append(time_new)

John
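A minimal sketch of the 3-sigma trimming suggested above, using only
stdlib Python (the trim_outliers name and the 3.0 default are
illustrative, not from the original message):

    def trim_outliers(samples, nsigmas=3.0):
        # Drop samples farther than nsigmas standard deviations from
        # the mean, then average the rest.  Needs a reasonable sample
        # count (the message suggests 30-100+) for the mean and sigma
        # estimates to be meaningful.
        n = len(samples)
        mean = sum(samples) / n
        sigma = (sum((s - mean) ** 2 for s in samples) / n) ** 0.5
        kept = [s for s in samples if abs(s - mean) <= nsigmas * sigma]
        return sum(kept) / len(kept), n - len(kept)

    # e.g., averaging the old-code timings gathered by bench() above:
    # avg_old, num_dropped = trim_outliers(times_old)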
Re: [Python-Dev] ssize_t question: longs in header files
On 5/29/06, Tim Peters <[EMAIL PROTECTED]> wrote:
> [Neal Norwitz]
> > * hash values
> > Include/abstract.h: long PyObject_Hash(PyObject *o); // also in
> > object.h
> > Include/object.h: typedef long (*hashfunc)(PyObject *);
>
> We should leave these alone for now. There's no real connection
> between the width of a hash value and the number of elements in a
> container, and Py_ssize_t is conceptually only related to the latter.

True. Though it might be easier to have one big type changing than
two. If this is likely to change in the future (and I think it should,
to avoid hash collisions and provide better consistency on 64-bit
archs), would it be good to add:

    typedef long Py_hash_t;

This will not change the type, but will make it easy to change in the
future.

I'm uncertain about doing this in 2.5. I think it would help me port
code, but I'm only familiar with the Python base, not wild and crazy
third party C extensions. The reason why it's easier for me is that
grep can help me find and fix just about everything. There are fewer
exceptions (longs left).

It would also help, mostly from a doc standpoint, to have typedefs for
Py_visit_t and other ints as well. But this also seems like
diminishing returns.

n
Re: [Python-Dev] Python Benchmarks
On 6/3/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> - I would average the timings of runs instead of taking the minimum
> value, as sometimes benchmarks could be running code that is not
> deterministic in its calculations (it could be using random numbers
> that affect convergence).

I would rewrite those to be deterministic. Any benchmarks of mine
which use random numbers initialize the generator with a fixed seed
and do it in such a way that the order or choice of subbenchmarks does
not affect the individual results. Any other way is madness.

> - Before calculating the average I would throw out samples outside 3
> sigmas (the outliers).

As I've insisted, talking about sigmas assumes a Gaussian distribution.
It's more likely that the timing variations (at least in stringbench)
are closer to a gamma distribution.

> Here is a passage I found ...
...
> Unfortunately, defining an outlier is subjective (as it should be),
> and the decisions concerning how to identify them must be made on an
> individual basis (taking into account specific experimental paradigms

The experimental paradigm I've been using is:

 - a precise and accurate clock on timescales much smaller than the
   benchmark (hence continuous distributions)
 - rare, random, short and uncorrelated interruptions

This leads to a gamma distribution (plus a constant offset for minimum
compute time). Gamma distributions have longer tails than Gaussians
and hence more "outliers". If you think that averaging is useful, then
throwing those outliers away will artificially lower the average value.

To me, using the minimum time, given the paradigm, makes sense. How
fast is the fastest runner in the world? Do you have him run a dozen
times and take the average, or use the shortest time?

> I usually use this approach while reviewing data obtained by fairly
> accurate sensors, so being conservative and using 3 sigmas works
> well for those cases.

That involves a different underlying physical process, which is better
modeled by Gaussians.

Consider this. For a given benchmark there is an absolute minimum time
for it to run on a given machine. Suppose this is 10 seconds and the
benchmark timing comes out at 10.2 seconds. The +0.2 comes from
background overhead, though you don't know exactly what's due to
overhead and what's real. If the times were Gaussian, there would be
as much chance of getting a benchmark time of 10.5 seconds as of 9.9
seconds. But repeat the benchmark as many times as you want and you'll
never see 9.9 seconds, though you will see 10.5.

> - Another improvement to benchmarks can be obtained when both the
> old and new code is available to be benchmarked together.

That's what stringbench does, comparing unicode and 8-bit strings.
However, how do you benchmark changes which are more complicated than
that? For example, benchmark changes to the exception mechanism, or
builds under gcc 3.x and 4.x.

Andrew
[EMAIL PROTECTED]
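The asymmetry described above is easy to see in simulation; a minimal
sketch with an assumed 10-second true runtime and illustrative gamma
parameters (nothing here comes from stringbench itself):

    import random

    TRUE_TIME = 10.0   # hypothetical absolute minimum runtime, seconds

    def one_run():
        # Observed time = true time + nonnegative, gamma-distributed
        # overhead (shape/scale chosen only for illustration).
        return TRUE_TIME + random.gammavariate(2.0, 0.1)

    random.seed(42)
    runs = [one_run() for i in range(100)]
    print "mean:", sum(runs) / len(runs)  # biased upward by overhead
    print "min: ", min(runs)              # approaches the 10.0s floor
    # No run can ever come in under the true time:
    print "below true time:", len([r for r in runs if r < TRUE_TIME])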
[Python-Dev] wsgiref documentation
Hi,

I'm going over the possible tasks for the Arlington Sprint.
Documentation for wsgiref looks like something I could handle. My
friend Joe Griffin and I did something similar for Tim Peters'
FixedPoint module.

Is anyone already working on this?

--
Doug Fort, Consulting Programmer
http://www.dougfort.com
[Python-Dev] Unhashable objects and __contains__()
I recently submitted a patch that would optimise "in (5, 6, 7)" (ie,
"in" ops on constant tuples) to "in frozenset([5, 6, 7])". Raymond
Hettinger rejected (rightly) the patch since it's not semantically
consistent. Quoth:

>> Sorry, this enticing idea has already been explored and
>> rejected. The issue is that the transformation is not
>> semantically neutral. Currently, writing "{} in (1,2,3)"
>> returns False, but after the transformation would raise an
>> exception, "TypeError: dict objects are unhashable".

My question is this: maybe set/frozenset.__contains__ (as well as
dict.__contains__, etc) should catch such TypeErrors and convert them
to a return value of False? It makes sense that "{} in
frozenset([1, 2, 3])" should be False, since unhashable objects (like
{}) clearly can't be part of the set/dict/whatever.

I am, however, a bit unsure as to how __contains__() would be sure it
was only catching the "this object can't be hash()ed" TypeErrors, as
opposed to other TypeErrors that might legitimately arise from a call
to some __hash__() method.

Idea: what if Python's -O option caused PySequence_Contains() to
convert all errors into False return values?

Collin Winter
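For reference, the behavior difference under discussion, as it appears
in an interactive session on the 2.x interpreters of the time:

    >>> {} in (1, 2, 3)             # tuple: linear scan, no hashing
    False
    >>> {} in frozenset([1, 2, 3])  # set: must hash the left operand
    Traceback (most recent call last):
      ...
    TypeError: dict objects are unhashable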
Re: [Python-Dev] Unhashable objects and __contains__()
Collin Winter wrote:
> I recently submitted a patch that would optimise "in (5, 6, 7)" (ie,
> "in" ops on constant tuples) to "in frozenset([5, 6, 7])". Raymond
> Hettinger rejected (rightly) the patch since it's not semantically
> consistent. Quoth:
>
>>> Sorry, this enticing idea has already been explored and
>>> rejected. The issue is that the transformation is not
>>> semantically neutral. Currently, writing "{} in (1,2,3)"
>>> returns False, but after the transformation would raise an
>>> exception, "TypeError: dict objects are unhashable".
>
> My question is this: maybe set/frozenset.__contains__ (as well as
> dict.__contains__, etc) should catch such TypeErrors and convert them
> to a return value of False? It makes sense that "{} in
> frozenset([1, 2, 3])" should be False, since unhashable objects (like
> {}) clearly can't be part of the set/dict/whatever.
>
> I am, however, a bit unsure as to how __contains__() would be sure it
> was only catching the "this object can't be hash()ed" TypeErrors, as
> opposed to other TypeErrors that might legitimately arise from a call
> to some __hash__() method.
>
> Idea: what if Python's -O option caused PySequence_Contains() to
> convert all errors into False return values?

It would certainly give me an uneasy feeling if a command-line switch
caused such a change in semantics.

Georg
Re: [Python-Dev] Unhashable objects and __contains__()
On 6/3/06, Collin Winter <[EMAIL PROTECTED]> wrote:
> My question is this: maybe set/frozenset.__contains__ (as well as
> dict.__contains__, etc) should catch such TypeErrors and convert them
> to a return value of False? It makes sense that "{} in
> frozenset([1, 2, 3])" should be False, since unhashable objects (like
> {}) clearly can't be part of the set/dict/whatever.

Sounds like a bad idea. You already pointed out that it's tricky to
catch exceptions and turn them into values without the risk of masking
bugs that would cause those same exceptions.

In addition, IMO it's a good idea to point out that "{} in {}" is a
type error by raising an exception. It's just like "1 in 'abc'" -- the
'in' operation has an implementation that doesn't support all types,
and if you try a type that's not supported, you expect a type error. I
expect that this is more likely to help catch bugs than it is to be an
obstacle.

(I do understand your use case -- I just don't believe it's as
important as the bug-catching property you'd be throwing away by
supporting that use case.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Unhashable objects and __contains__()
On 6/3/06, Georg Brandl <[EMAIL PROTECTED]> wrote:
> Collin Winter wrote:
> > Idea: what if Python's -O option caused PySequence_Contains() to
> > convert all errors into False return values?
>
> It would certainly give me an uneasy feeling if a command-line switch
> caused such a change in semantics.

I missed that. Collin must be suffering from a heat stroke. :-)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] Request for patch review
Georg Brandl wrote:
> I've worked on two patches for NeedForSpeed, and would like someone
> familiar with the areas they touch to review them before I check them
> in, breaking all the buildbots which aren't broken yet ;)
>
> They are:
>
> http://python.org/sf/1346214
> Better dead code elimination for the AST compiler

No one wants to look at this? It's not too complicated, I promise.

> http://python.org/sf/921466
> Reduce number of open calls on startup

That's now committed.

Georg
Re: [Python-Dev] Request for patch review
On 6/3/06, Georg Brandl <[EMAIL PROTECTED]> wrote:
> Georg Brandl wrote:
> > I've worked on two patches for NeedForSpeed, and would like someone
> > familiar with the areas they touch to review them before I check
> > them in, breaking all the buildbots which aren't broken yet ;)
> >
> > They are:
> >
> > http://python.org/sf/1346214
> > Better dead code elimination for the AST compiler
>
> No one wants to look at this? It's not too complicated, I promise.

Well, "wants" is a strong word. =)

Code looks fine (didn't apply it, but looked at the patch file itself).
I would break the detection for 'return' in generators into a separate
patch since it has nothing to do with detection of dead code.

-Brett
Re: [Python-Dev] Some more comments re new uriparse module, patch 1462525
On Friday, Jun 2, 2006, John J Lee writes:

>[Not sure whether this kind of thing is best posted as tracker comments
>(but then the tracker gets terribly long and is mailed out every time a
>change happens) or posted here. Feel free to tell me I'm posting in the
>wrong place...]

I think this is a fine place - more googleable, still archived, etc.

>Some comments on this patch (a new module, submitted by Paul Jimenez,
>implementing the rules set out in RFC 3986 for URI parsing, joining URI
>references with a base URI etc.)
>
>http://python.org/sf/1462525

Note that like many opensource authors, I wrote this to 'scratch an
itch' that I had... and am submitting it in hopes of saving someone
else somewhere some essentially identical work. I'm not married to it;
I just want something *like* it to end up in the stdlib so that I can
use it.

>Sorry for the pause, Paul. I finally read RFC 3986 -- which I must say
>is probably the best-written RFC I've read (and there was much
>rejoicing).

No worries. Yeah, the RFC is pretty clear (for once) :)

>I still haven't read 3987 and got to grips with the unicode issues
>(whatever they are), but I have just implemented the same stuff you
>did, so have some comments on non-unicode aspects of your
>implementation (the version labelled "v23" on the tracker):
>
>Your urljoin implementation seems to pass the tests (the tests taken
>from the RFC), but I have to admit I don't understand it :-) It
>doesn't seem to take account of the distinction between undefined and
>empty URI components. For example, the authority of the URI reference
>may be empty but still defined. Anyway, if you're taking advantage of
>some subtle identity that implies that you can get away with
>truth-testing in place of "is None" tests, please don't ;-) It's
>slower than "is [not] None" tests both for the computer and
>(especially!) the reader.

First of all, I must say that urljoin is my least favorite part of this
module; I include it only so as not to break backward compatibility - I
don't have any personal use-cases for such. That said, some of the
'join' semantics are indeed a bit subtle; it took a bit of tinkering to
make all the tests work. I was indeed using 'if foo:' instead of 'if
foo is not None:', but that can be easily fixed; I didn't know there
was a performance issue there. Stylistically I find them about the
same clarity-wise.

>I don't like the use of module posixpath to implement the algorithm
>labelled "remove_dot_segments". URIs are not POSIX filesystem paths,
>and shouldn't depend on code meant to implement the latter. But my own
>implementation is exceedingly ugly ATM, so I'm in no position to
>grumble too much :-)

While URIs themselves are not, of course, POSIX filesystem paths, I
believe there's a strong case that their path components are
semantically identical in this usage. I see no need to duplicate code
that I know can be fairly tricky to get right; better to let someone
else worry about the corner cases and take advantage of their work
when I can.

>Normalisation of the base URI is optional, and your urljoin function
>never normalises. Instead, it parses the base and reference, then
>follows the algorithm of section 5.2 of the RFC. Parsing is required
>before normalisation takes place. So urljoin forces people who need to
>normalise the URI to parse it twice, which is annoying. There should
>be some way to parse 5-tuples in instead of URIs. E.g., from my
>implementation:
>
>def urljoin(base_uri, uri_reference):
>    return urlunsplit(urljoin_parts(urlsplit(base_uri),
>                                    urlsplit(uri_reference)))

It would certainly be easy to add a version which took tuples instead
of strings, but I was attempting, as previously stated, to conform to
the extant urlparse.urljoin API for backward compatibility. Also, as I
previously stated, I have no personal use-cases for urljoin, so the
issue of having to double-parse if you do normalization never came to
my attention.

>It would be nice to have a 5-tuple-like class (I guess implemented as
>a subclass of tuple) that also exposes attributes (.authority, .path,
>etc.) -- the same way module time does it.

That starts to edge over into a 'generic URI' class, which I'm
uncomfortable with due to the possibility of opaque URIs that don't
conform to that spec. The fallback of putting everything other than
the scheme into 'path' doesn't appeal to me.

>The path component is required, though may be empty. Your parser
>returns None (meaning "undefined") where it should return an empty
>string.

Indeed. Fixed now; a fresh look at the code showed me where the
mistakes that made that seem necessary lay.

>Nit: Your tests involving ports contain non-digit characters in the
>port (viz. "port"), which is not valid by section 3.2.3 of the RFC.

Indeed. Nit fixed.

>Smaller nit: the userinfo component was never allowed in http URLs,
>but you use them in your tests. This issue i