Re: [Python-Dev] test_unicode failure on MIPS

2006-06-02 Thread Fredrik Lundh
Neal Norwitz wrote:

 Any ideas?

this is a recent change, so it looks like the box simply didn't get 
around to rebuilding the unicodeobject module.

(I'm beginning to wonder if I didn't forget to add some header file 
dependencies somewhere during the stringlib refactoring, but none of the 
other buildbots seem to have a problem with this...)

/F



[Python-Dev] valgrind report

2006-06-02 Thread Neal Norwitz
Looks pretty good, except for one CJK problem:

test_codecencodings_jp

  Invalid read of size 1
 at 0x110AEBC3: shift_jis_2004_decode (_codecs_jp.c:642)
 by 0xBFCBDB7: mbidecoder_decode (multibytecodec.c:839)
   Address 0xAEC376B is 0 bytes after a block of size 3 alloc'd
 at 0x4A19B7E: malloc (vg_replace_malloc.c:149)
 by 0xBFCBF54: mbidecoder_decode (multibytecodec.c:1023)

n


Re: [Python-Dev] test_unicode failure on MIPS

2006-06-02 Thread Neal Norwitz
On 6/1/06, Fredrik Lundh [EMAIL PROTECTED] wrote:
 Neal Norwitz wrote:

  Any ideas?

 this is a recent change, so it looks like the box simply didn't get
 around to rebuild the unicodeobject module.

That shouldn't be.  make distclean should be called (it was make clean
until recently).  However,

http://www.python.org/dev/buildbot/all/MIPS%20Debian%20trunk/builds/176/step-compile/0

seems to indicate unicodeobject was in fact not built.  I also don't
see any previous record of any builds (or make cleans).  That
buildslave is new and it had some connectivity problems I think.  So
maybe something was whacky on it.

The current build (still running) definitely did compile
unicodeobject.  So let's wait and see if that finishes successfully.

n


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
Fredrik Lundh wrote:
 M.-A. Lemburg wrote:
 
 Seriously, I've been using and running pybench for years
 and even though tweaks to the interpreter do sometimes
 result in speedups or slow-downs where you wouldn't expect
 them (due to the interpreter using the Python objects),
 they are reproducible and often enough have uncovered
 that optimizations in one area may well result in slow-downs
 in other areas.
 
   Often enough the results are related to low-level features
   of the architecture you're using to run the code such as
   cache size, cache lines, number of registers in the CPU or
   on the FPU stack, etc. etc.
 
 and that observation has never made you stop and think about whether 
 there might be some problem with the benchmarking approach you're using? 

The approach pybench is using is as follows:

* Run a calibration step which does the same as the actual
  test without the operation being tested (i.e. call the
  function running the test, set up the for-loop, constant
  variables, etc.)

  The calibration step is run multiple times and is used
  to calculate an average test overhead time.

* Run the actual test which runs the operation multiple
  times.

  The test is then adjusted to make sure that the
  test overhead / test run ratio remains within
  reasonable bounds.

  If needed, the operation code is repeated verbatim in
  the for-loop, to decrease the ratio.

* Repeat the above for each test in the suite

* Repeat the suite for N rounds

* Calculate the average run time of all test runs in all rounds.
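
A rough sketch of that calibrate-then-measure loop (hypothetical names and
timer choice, not the actual pybench code, which lives in pybench.py):

import time

def measure(test, calibration_runs=20, rounds=10):
    # Estimate the per-call overhead (function call, for-loop setup, etc.)
    # by timing a calibration version of the test with the operation left out.
    overheads = []
    for i in range(calibration_runs):
        t0 = time.clock()
        test.calibrate()      # same code as test.run(), minus the operation
        overheads.append(time.clock() - t0)
    overhead = sum(overheads) / len(overheads)

    # Time the real test several times and subtract the estimated overhead.
    results = []
    for i in range(rounds):
        t0 = time.clock()
        test.run()            # repeats the operation many times in a for-loop
        results.append(time.clock() - t0 - overhead)
    return sum(results) / len(results)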

   after all, if a change to e.g. the try/except code slows things down 
 or speeds things up, is it really reasonable to expect that the time it 
 takes to convert Unicode strings to uppercase should suddenly change due 
 to cache effects or a changing number of registers in the CPU?  real 
 hardware doesn't work that way...

Of course, but then changes to try-except logic can interfere
with the performance of setting up method calls. This is what
pybench then uncovers.

The only problem I see in the above approach is the way
calibration is done. The run-time of the calibration code
may be too small with respect to the resolution of the timers used.

Again, please provide the parameters you've used to run the
test case and the output. Things like warp factor, overhead,
etc. could hint at the problem you're seeing.

 is PyBench perhaps using the following approach:
 
  T = set of tests
  for N in range(number of test runs):
  for t in T:
  t0 = get_process_time()
  t()
  t1 = get_process_time()
  assign t1 - t0 to test t
  print assigned time
 
 where t1 - t0 is very short?

See above (or the code in pybench.py). t1-t0 is usually
around 20-50 seconds:


The tests must set .rounds to a value high enough to let the
test run between 20-50 seconds. This is needed because
clock()-timing only gives rather inaccurate values (on Linux,
for example, it is accurate to a few hundredths of a
second). If you don't want to wait that long, use a warp
factor larger than 1.


 that's not a very good idea, given how get_process_time tends to be 
 implemented on current-era systems (google for jiffies)...  but it 
 definitely explains the bogus subtest results I'm seeing, and the magic 
 hardware behaviour you're seeing.

That's exactly the reason why tests run for a relatively long
time - to minimize these effects. Of course, using wall time
makes this approach vulnerable to other effects such as the current
load of the system, other processes having a higher priority
interfering with the timed process, etc.

For this reason, I'm currently looking for ways to measure the
process time on Windows.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2006-07-03: EuroPython 2006, CERN, Switzerland  30 days left

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Fredrik Lundh
M.-A. Lemburg wrote:

 Of course, but then changes to try-except logic can interfere
 with the performance of setting up method calls. This is what
 pybench then uncovers.

I think the only thing PyBench has uncovered is that you're convinced that it's
always right, and everybody else is always wrong, including people who've
spent decades measuring performance, and the hardware in your own computer.

 See above (or the code in pybench.py). t1-t0 is usually
 around 20-50 seconds:

what machines are you using?  using the default parameters, the entire run takes
about 50 seconds on the slowest machine I could find...

 that's not a very good idea, given how get_process_time tends to be
 implemented on current-era systems (google for jiffies)...  but it
 definitely explains the bogus subtest results I'm seeing, and the magic
 hardware behaviour you're seeing.

 That's exactly the reason why tests run for a relatively long
 time - to minimize these effects. Of course, using wall time
 make this approach vulnerable to other effects such as current
 load of the system, other processes having a higher priority
 interfering with the timed process, etc.

since process time is *sampled*, not measured, process time isn't exactly
invulnerable either.  it's not hard to imagine scenarios where you end up being
assigned only a small part of the process time you're actually using, or cases
where you're assigned more time than you've had a chance to use.

afaik, if you want true performance counters on Linux, you need to patch the
operating system (unless something's changed in very recent versions).

I don't think that sampling errors can explain all the anomalies we've been
seeing, but I wouldn't be surprised if a high-resolution wall time clock on a
lightly loaded multiprocess system was, in practice, *more* reliable than
sampled process time on an equally loaded system.

/F 





Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
Fredrik Lundh wrote:
 M.-A. Lemburg wrote:
 
 Of course, but then changes to try-except logic can interfere
 with the performance of setting up method calls. This is what
 pybench then uncovers.
 
 I think the only thing PyBench has uncovered is that you're convinced that 
 it's
 always right, and everybody else is always wrong, including people who've
 spent decades measuring performance, and the hardware in your own computer.

Oh, come on. You know that's not true and I'm trying to
understand what is causing your findings, but this is
difficult, since you're not providing enough details.
E.g. the output of pybench showing the timing results
would help a lot.

I would also like to reproduce your findings. Do you have
two revision numbers in svn which I could use for this ?

 See above (or the code in pybench.py). t1-t0 is usually
 around 20-50 seconds:
 
 what machines are you using?  using the default parameters, the entire run 
 takes
 about 50 seconds on the slowest machine I could find...

If the whole suite runs in 50 seconds, the per-test
run-times are far too small to be accurate. I usually
adjust the warp factor so that each *round* takes
50 seconds.

Looks like I have to revisit the default parameters and
update the doc-strings. I'll do that when I add the new
timers.

Could you check whether you still see the same results when
running with pybench.py -w 1 ?

 that's not a very good idea, given how get_process_time tends to be
 implemented on current-era systems (google for jiffies)...  but it
 definitely explains the bogus subtest results I'm seeing, and the magic
 hardware behaviour you're seeing.
 That's exactly the reason why tests run for a relatively long
 time - to minimize these effects. Of course, using wall time
 make this approach vulnerable to other effects such as current
 load of the system, other processes having a higher priority
 interfering with the timed process, etc.
 
 since process time is *sampled*, not measured, process time isn't exactly in-
 vulnerable either.  it's not hard to imagine scenarios where you end up being
 assigned only a small part of the process time you're actually using, or cases
 where you're assigned more time than you've had a chance to use.
 
 afaik, if you want true performance counters on Linux, you need to patch the
 operating system (unless something's changed in very recent versions).
 
 I don't think that sampling errors can explain all the anomalies we've been 
 seeing,
 but I'd wouldn't be surprised if a high-resolution wall time clock on a 
 lightly loaded
 multiprocess system was, in practice, *more* reliable than sampled process 
 time
 on an equally loaded system.

That's why the timers being used by pybench will become a
parameter that you can then select to adapt pybench to
the OS you're running pybench on.

Note that time.clock, the current default timer in pybench,
is a high-accuracy wall-clock timer on Windows, so it should
demonstrate similar behavior to timeit.py, even more so,
since you're using warp 20 and thus a similar timing strategy
to that of timeit.py.
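
A minimal sketch of what a selectable timer could look like (the function
and option names here are made up, not the actual pybench interface):

import sys, time

def get_timer(name='auto'):
    # 'wall'    -> wall-clock time (time.time)
    # 'process' -> whatever time.clock measures on this platform
    # 'auto'    -> time.clock on Windows (high-resolution wall clock there),
    #              time.time elsewhere (time.clock is coarse CPU time on Unix)
    if name == 'wall':
        return time.time
    if name == 'process':
        return time.clock
    if sys.platform == 'win32':
        return time.clock
    return time.time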

I suspect that the calibration step is causing problems.

Steve added a parameter to change the number of calibration
runs done per test: -C n. The default is 20.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2006-07-03: EuroPython 2006, CERN, Switzerland  30 days left

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Andrew Dalke

M.-A. Lemburg:

The approach pybench is using is as follows:

...

 The calibration step is run multiple times and is used
 to calculate an average test overhead time.


One of the changes that occurred during the sprint was to change this algorithm
to use the best time rather than the average.  Using the average assumes a
Gaussian distribution, and timing results are not Gaussian.  There is an
absolute best time, but it is rarely reached due to background noise.  The
distribution is more like a gamma distribution plus the minimum time.

To show the distribution is non-Gaussian I ran the following

import time

def compute():
    x = 0
    for i in range(1000):
        for j in range(1000):
            x += 1

def bench():
    t1 = time.time()
    compute()
    t2 = time.time()
    return t2 - t1

times = []
for i in range(1000):
    times.append(bench())

print times

The full distribution is attached as 'plot1.png' and the close-up
(range 0.45-0.65) as 'plot2.png'.  Not a clean gamma function, but that's a
closer match than an exponential.

The gamma distribution looks more like an exponential function when the shape
parameter is large.  This corresponds to a large amount of noise in the system,
so the run time is not close to the best time.  This means the average approach
works better when there is a lot of random background activity, which is not
the usual case when I try to benchmark.

When averaging a gamma distribution you'll end up with a bit of a
skew, and I think
the skew depends on the number of samples, reaching a limit point.

Using the minimum time should be more precise because there is a
definite lower bound and the machine should be stable.  In my test
above the first few results are

0.472838878632
0.473038911819
0.473326921463
0.473494052887
0.473829984665

I'm pretty certain the best time is 0.4725, or very close to that.
But the average
time is 0.58330151391 because of the long tail.  Here are the last 6 results in
my population of 1000

1.76353311539
1.79937505722
1.82750201225
2.01710510254
2.44861507416
2.90868496895

Granted, I hit a couple of web pages while doing this and my spam
filter processed
my mailbox in the background...

There's probably some Markov modeling which would look at the number
and distribution of samples so far and, assuming a gamma distribution,
determine how many more samples are needed to get a good estimate of
the absolute minimum time.  But min(large enough samples) should work
fine.
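
A sketch of the min-of-samples idea, reusing the bench() function from the
snippet above:

def summarize(samples=100):
    times = [bench() for i in range(samples)]
    best = min(times)
    mean = sum(times) / len(times)
    # The minimum approaches the true cost of compute(); everything above
    # it is scheduling and background noise.
    print 'min  %.6f s' % best
    print 'mean %.6f s (inflated by the long tail)' % mean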


If the whole suite runs in 50 seconds, the per-test
run-times are far too small to be accurate. I usually
adjust the warp factor so that each *round* takes
50 seconds.


The stringbench.py I wrote uses the timeit algorithm which
dynamically adjusts the test to run between 0.2 and 2 seconds.
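
For comparison, the core of the timeit approach boils down to something like
this (the statement being timed is just a placeholder):

import timeit

t = timeit.Timer('unicode("abc").upper()')
number = 10
# Grow the repeat count until a single measurement takes a non-trivial
# amount of time, then report the best of three such measurements.
while t.timeit(number) < 0.2:
    number *= 10
best = min(t.repeat(3, number)) / number
print '%d loops, best of 3: %.3g usec per loop' % (number, best * 1e6)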


That's why the timers being used by pybench will become a
parameter that you can then select to adapt pybench it to
the OS your running pybench on.


Wasn't that decision a consequence of the problems found during
the sprint?

   Andrew
   [EMAIL PROTECTED]


plot1.png
Description: PNG image


plot2.png
Description: PNG image


Re: [Python-Dev] Removing Mac OS 9 cruft

2006-06-02 Thread Guido van Rossum
Just and Jack have confirmed that you can throw away everything except
possibly Demo/*. (Just even speculated that some cruft may have been
accidentally revived by the cvs -> svn transition?)

--Guido

On 6/1/06, Neal Norwitz [EMAIL PROTECTED] wrote:
 I was about to remove Mac/IDE scripts, but it looks like there might
 be more stuff that is OS 9 related and should be removed.  Other
 possibilities look like (everything under Mac/):

   Demo/*  this is a bit more speculative
   IDE scripts/*
   MPW/*
   Tools/IDE/*  this references IDE scripts, so presumably it should be toast?
   Tools/macfreeze/*
   Unsupported/mactcp/dnrglue.c
   Wastemods/*

 I'm going mostly based on what has been modified somewhat recently.
 Can someone confirm/reject these?  I'll remove them.

 n



-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
Andrew Dalke wrote:
 M.-A. Lemburg:
 The approach pybench is using is as follows:
 ...
  The calibration step is run multiple times and is used
  to calculate an average test overhead time.
 
 One of the changes that occured during the sprint was to change this
 algorithm
 to use the best time rather than the average.  Using the average assumes a
 Gaussian distribution.  Timing results are not.  There is an absolute
 best but
 that's rarely reached due to background noise.  It's more like a gamma
 distribution
 plus the minimum time.
 
 To show the distribution is non-Gaussian I ran the following
 
 def compute():
x = 0
for i in range(1000):
for j in range(1000):
x += 1
 
 def bench():
t1 = time.time()
compute()
t2 = time.time()
return t2-t1
 
 times = []
 for i in range(1000):
times.append(bench())
 
 print times
 
 The full distribution is attached as 'plot1.png' and the close up
 (range 0.45-0.65)
 as 'plot2.png'.  Not a clean gamma function, but that's a closer match
 than an
 exponential.
 
 The gamma distribution looks more like a exponential function when the
 shape
 parameter is large.  This corresponds to a large amount of noise in the
 system,
 so the run time is not close to the best time.  This means the average
 approach
 works better when there is a lot of random background activity, which is
 not the
 usual case when I try to benchmark.
 
 When averaging a gamma distribution you'll end up with a bit of a
 skew, and I think
 the skew depends on the number of samples, reaching a limit point.
 
 Using the minimum time should be more precise because there is a
 definite lower bound and the machine should be stable.  In my test
 above the first few results are
 
 0.472838878632
 0.473038911819
 0.473326921463
 0.473494052887
 0.473829984665
 
 I'm pretty certain the best time is 0.4725, or very close to that.
 But the average
 time is 0.58330151391 because of the long tail.  Here are the last 6
 results in
 my population of 1000
 
 1.76353311539
 1.79937505722
 1.82750201225
 2.01710510254
 2.44861507416
 2.90868496895
 
 Granted, I hit a couple of web pages while doing this and my spam
 filter processed
 my mailbox in the background...
 
 There's probably some Markov modeling which would look at the number
 and distribution of samples so far and assuming a gamma distribution
 determine how many more samples are needed to get a good estimate of
 the absolute minumum time.  But min(large enough samples) should work
 fine.

Thanks for the great analysis !

Using the minimum looks like the way to go for calibration.

I wonder whether the same is true for the actual tests; since
you're looking for the expected run-time, the minimum may
not necessarily be the choice. Then again, in both cases
you are only looking at a small number of samples (20 for
the calibration, 10 for the number of rounds), so this
may be irrelevant.

BTW, did you run this test on Windows or a Unix machine ?

There's also an interesting second peak at around 0.53.
What could be causing this ?

 If the whole suite runs in 50 seconds, the per-test
 run-times are far too small to be accurate. I usually
 adjust the warp factor so that each *round* takes
 50 seconds.
 
 The stringbench.py I wrote uses the timeit algorithm which
 dynamically adjusts the test to run between 0.2 and 2 seconds.

 That's why the timers being used by pybench will become a
 parameter that you can then select to adapt pybench it to
 the OS your running pybench on.
 
 Wasn't that decision a consequence of the problems found during
 the sprint?

It's a consequence of a discussion I had with Steve Holden
and Tim Peters:

I believe that using wall-clock timers
for benchmarking is not a good approach due to the high
noise level. Process time timers typically have a lower
resolution, but give a better picture of the actual
run-time of your code and also don't exhibit as much noise
as the wall-clock timer approach. Of course, you have
to run the tests somewhat longer to get reasonable
accuracy of the timings.

Tim thinks that it's better to use short running tests and
an accurate timer, accepting the added noise and counting
on the user making sure that the noise level is at a
minimum.

Since I like to give users the option of choosing for
themselves, I'm going to make the choice of timer an
option.
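
A small illustration of the two kinds of timers on a Unix box
(resource.getrusage provides the process time here; on Windows a different
source would be needed):

import time, resource

def cpu_seconds():
    # user + system CPU time consumed by this process so far
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_utime + usage.ru_stime

wall0, cpu0 = time.time(), cpu_seconds()
for i in range(1000000):
    pass
wall1, cpu1 = time.time(), cpu_seconds()
print 'wall-clock time: %.6f s' % (wall1 - wall0)
print 'process time:    %.6f s (coarser, but ignores other processes)' % (cpu1 - cpu0)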

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2006-07-03: EuroPython 2006, CERN, Switzerland  30 days left

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 

Re: [Python-Dev] Let's stop eating exceptions in dict lookup

2006-06-02 Thread Michael Hudson
Anthony Baxter [EMAIL PROTECTED] writes:

 On Friday 02 June 2006 02:21, Jack Diederich wrote:
 The CCP Games CEO said they have trouble retaining talent from more
 moderate latitudes for this reason.  18 hours of daylight makes
 them a bit goofy and when the Winter Solstice rolls around they are
 apt to go quite mad.

 Obviously they need to hire people who are already crazy.

I think they already did! :)

 not-naming-any-names-ly,
 Anthony

me-neither-ly y'rs
mwh

-- 
   Look I don't know.  Thankyou everyone for arguing me round in
   circles.
  No need for thanks, ma'am; that's what we're here for.
-- LNR  Michael M Mason, cam.misc


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Fredrik Lundh
M.-A. Lemburg wrote:

 I believe that using wall-clock timers
 for benchmarking is not a good approach due to the high
 noise level. Process time timers typically have a lower
 resolution, but give a better picture of the actual
 run-time of your code and also don't exhibit as much noise
 as the wall-clock timer approach.

please stop repeating this nonsense.  there are no process time timers in
contemporary operating systems; only tick counters.

there are patches for linux and commercial add-ons to most platforms that let
you use hardware performance counters for process stuff, but there's no way to
emulate that by playing with different existing Unix or Win32 APIs; the thing
you think you're using simply isn't there.

/F 





Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
M.-A. Lemburg wrote:
 That's why the timers being used by pybench will become a
 parameter that you can then select to adapt pybench it to
 the OS your running pybench on.
 Wasn't that decision a consequence of the problems found during
 the sprint?
 
 It's a consequence of a discussion I had with Steve Holden
 and Tim Peters:
 
 I believe that using wall-clock timers
 for benchmarking is not a good approach due to the high
 noise level. Process time timers typically have a lower
 resolution, but give a better picture of the actual
 run-time of your code and also don't exhibit as much noise
 as the wall-clock timer approach. Of course, you have
 to run the tests somewhat longer to get reasonable
 accuracy of the timings.
 
 Tim thinks that it's better to use short running tests and
 an accurate timer, accepting the added noise and counting
 on the user making sure that the noise level is at a
 minimum.

I just had an idea: if we could get each test to run
inside a single time slice assigned by the OS scheduler,
then we could benefit from the better resolution of the
hardware timers while still keeping the noise to a
minimum.

I suppose this could be achieved by:

* making sure that each test needs less than 10ms to run

* calling time.sleep(0) after each test run

Here's some documentation on the Linux scheduler:

http://www.samspublishing.com/articles/article.asp?p=101760&seqNum=2&rl=1

Table 3.1 has the minimum time slice: 10ms.

What do you think ? Would this work ?
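
In code, the idea would look roughly like this (a minimal sketch; as it turns
out later in the thread, a tiny positive sleep may be needed instead of
sleep(0)):

import time

def run_suite(tests):
    results = {}
    for test in tests:
        # Yield the CPU so the test (hopefully) starts at the beginning of a
        # fresh time slice and finishes within it (each test must stay < 10ms).
        time.sleep(0)
        t0 = time.time()
        test()                      # the actual micro-benchmark
        results[test.__name__] = time.time() - t0
    return results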

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2006-07-03: EuroPython 2006, CERN, Switzerland  30 days left

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Michael Chermside
Marc-Andre Lemburg writes:
 Using the minimum looks like the way to go for calibration.

 I wonder whether the same is true for the actual tests; since
 you're looking for the expected run-time, the minimum may
 not necessarily be the choice.

No, you're not looking for the expected run-time. The expected
run-time is a function of the speed of the CPU, the architecture
of same, what else is running simultaneously -- perhaps even
what music you choose to listen to that day. It is NOT a
constant for a given piece of code, and is NOT what you are
looking for.

What you really want to do in benchmarking is to *compare* the
performance of two (or more) different pieces of code. You do,
of course, care about the real-world performance. So if you
had two algorithms and one ran twice as fast when there were no
context switches and 10 times slower when there was background
activity on the machine, then you'd prefer the algorithm
that supports context switches. But that's not a realistic
situation. What is far more common is that you run one test
while listening to the Grateful Dead and another test while
listening to Bach, and that (plus other random factors and the
phase of the moon) causes one test to run faster than the
other.

Taking the minimum time clearly subtracts some noise, which is
a good thing when comparing performance for two or more pieces
of code. It fails to account for the distribution of times, so
if one piece of code occasionally gets lucky and takes far less
time, then the minimum time won't be a good choice... but it would
be tricky to design code that would be affected by the scheduler
in this fashion even if you were explicitly trying!


Later he continues:
 Tim thinks that it's better to use short running tests and
 an accurate timer, accepting the added noise and counting
 on the user making sure that the noise level is at a
 minimum.

 Since I like to give users the option of choosing for
 themselves, I'm going to make the choice of timer an
 option.

I'm generally a fan of giving programmers choices. However,
this is an area where we have demonstrated that even very
competent programmers often have misunderstandings (read this
thread for evidence!). So be very careful about giving such
a choice: the default behavior should be chosen by people
who think carefully about such things, and the documentation
on the option should give a good explanation of the tradeoffs
or at least a link to such an explanation.

-- Michael Chermside



Re: [Python-Dev] test_ctypes failures on ppc64 debian

2006-06-02 Thread Tim Peters
[Thomas Heller]
 test_ctypes fails on the ppc64 machine.  I don't have access to such
 a machine myself, so I would have to do some trial and error, or try
 to print some diagnostic information.

 This should not be done in the trunk, so the question is: can the buildbots
 build branches?

Yes.  For example, that's how the buildbots run 2.4 tests.

 I assume I just have to enter a revision number and press the force-build
 button, is this correct?

No, you need to enter the tail end of the branch path in the "Branch
to build:" box.  You probably want to leave the "Revision to build:"
box empty.  Examples I know work because I've tried them in the past:
entering "trunk" in "Branch to build:" builds the current trunk, and
entering "branches/release24-maint" in "Branch to build:" builds the
current 2.4 branch.  I'm not certain that paths other than those work.

 Or would someone consider this abuse?

In this case, it only matters whether Matthias Klose thinks it's abuse
(since klose-debian-ppc64 is his box), so I've copied him on this
reply.  Matthias, I hope you don't mind some extra activity on that
box, since it may be the only way test_ctypes will ever pass on your
box :-)


Re: [Python-Dev] test_ctypes failures on ppc64 debian

2006-06-02 Thread Thomas Heller
Tim Peters wrote:
 [Thomas Heller]
 test_ctypes fails on the ppc64 machine.  I don't have access to such
 a machine myself, so I would have to do some trial and error, or try
 to print some diagnostic information.

 This should not be done in the trunk, so the question is: can the 
 buildbots
 build branches?
 
 Yes.  For example, that's how the buildbots run 2.4 tests.
 
 I assume I just have to enter a revision number and press the force-build
 button, is this correct?
 
 No, you need to enter the tail end of the branch path in the Branch
 to build: box.  You probably want to leave the Revision to build:
 box empty.  Examples I know work because I've tried them in the past:
 entering trunk in Branch to build: builds the current trunk, and
 entering branches/release24-maint in Branch to build: builds the
 current 2.4 branch.  I'm not certain that paths other than those work.
 
 Or would someone consider this abuse?
 
 In this case, it only matters whether Matthias Klose thinks it's abuse
 (since klose-debian-ppc64 is his box), so I've copied him on this
 reply.  Matthias, I hope you don't mind some extra activity on that
 box, since it may be the only way test_ctypes will ever pass on your
 box :-)

I have already mailed him asking if he can give me interactive access
to this machine ;-).  He has not yet replied - I'm not sure if this is because
he's been shocked to see such a request, or if he is already on holiday.

Thomas



Re: [Python-Dev] SF patch #1473257: Add a gi_code attr to generators

2006-06-02 Thread Brett Cannon
On 6/1/06, Guido van Rossum [EMAIL PROTECTED] wrote:
 On 6/1/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
  I didn't know it was assigned to me.  I guess SF doesn't send any
  notifications, and neither did Georg, so your email is the very first time
  that I've heard of it.

 This is a longstanding SF bug. (One of the reasons why we should move
 away from it ASAP IMO.)

The Request for Trackers should go out this weekend, putting a worst-case
timeline of choosing a tracker at three months from this weekend.  Once that
is done, hopefully switching over won't take very long.  In other words,
hopefully this can get done before October.

-Brett

 While we're still using SF, developers should probably get in the
 habit of sending an email to the assignee when assigning a bug...

 --Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Terry Reedy

M.-A. Lemburg [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Granted, I hit a couple of web pages while doing this and my spam
 filter processed my mailbox in the background...

Hardly a setting in which to run comparison tests, seems to me.

 Using the minimum looks like the way to go for calibration.

Or possibly the median.

But even better, the way to go to run comparison timings is to use a system 
with as little other stuff going on as possible.  For Windows, this means 
rebooting in safe mode, waiting until the system is quiescent, and then run 
the timing test with *nothing* else active that can be avoided.

Even then, I would look at the distribution of times for a given test to 
check for anomalously high values that should be tossed.  (This can be 
automated somewhat.)

Terry Jan Reedy





Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Fredrik Lundh
Terry Reedy wrote:

 But even better, the way to go to run comparison timings is to use a system 
 with as little other stuff going on as possible.  For Windows, this means 
 rebooting in safe mode, waiting until the system is quiescent, and then run 
 the timing test with *nothing* else active that can be avoided.

sigh.

/F



Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Fredrik Lundh
M.-A. Lemburg wrote:

 I just had an idea: if we could get each test to run
 inside a single time slice assigned by the OS scheduler,
 then we could benefit from the better resolution of the
 hardware timers while still keeping the noise to a
 minimum.
 
 I suppose this could be achieved by:
 
 * making sure that each tests needs less than 10ms to run

iirc, very recent linux kernels have a 1 millisecond tick.  so do 
alphas, and probably some other platforms.

 * calling time.sleep(0) after each test run

so some higher priority process can get a chance to run, and spend 9.5 
milliseconds shuffling data to a slow I/O device before blocking? ;-)

I'm not sure this problem can be solved, really, at least not as long as 
you're constrained to portable APIs.

(talking of which, if someone has some time and a linux box to spare, 
and wants to do some serious hacking on precision benchmarks, using

 http://user.it.uu.se/~mikpe/linux/perfctr/2.6/

to play with the TSC might be somewhat interesting.)

/F



Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
M.-A. Lemburg wrote:
 That's why the timers being used by pybench will become a
 parameter that you can then select to adapt pybench it to
 the OS your running pybench on.
 Wasn't that decision a consequence of the problems found during
 the sprint?
 It's a consequence of a discussion I had with Steve Holden
 and Tim Peters:

 I believe that using wall-clock timers
 for benchmarking is not a good approach due to the high
 noise level. Process time timers typically have a lower
 resolution, but give a better picture of the actual
 run-time of your code and also don't exhibit as much noise
 as the wall-clock timer approach. Of course, you have
 to run the tests somewhat longer to get reasonable
 accuracy of the timings.

 Tim thinks that it's better to use short running tests and
 an accurate timer, accepting the added noise and counting
 on the user making sure that the noise level is at a
 minimum.
 
 I just had an idea: if we could get each test to run
 inside a single time slice assigned by the OS scheduler,
 then we could benefit from the better resolution of the
 hardware timers while still keeping the noise to a
 minimum.
 
 I suppose this could be achieved by:
 
 * making sure that each tests needs less than 10ms to run
 
 * calling time.sleep(0) after each test run
 
 Here's some documentation on the Linux scheduler:
 
 http://www.samspublishing.com/articles/article.asp?p=101760seqNum=2rl=1
 
 Table 3.1 has the minimum time slice: 10ms.
 
 What do you think ? Would this work ?

I ran some tests related to this and it appears that, provided
the test itself uses less than 1ms, chances are
high that you don't get any forced context switches in your
way while running the test.

It also appears that you have to use time.sleep(10e-6) to
get the desired behavior. time.sleep(0) seems to receive
some extra care, so it doesn't have the intended effect - at
least not on Linux.

I've checked this on AMD64 and Intel Pentium M. The script is
attached - it will run until you get more than 10 forced
context switches in 100 runs of the test, incrementing the
runtime of the test in each round.

It's also interesting that the difference between max and min
run-time of the tests can be as low as 0.2% on the Pentium,
whereas the AMD64 always stays around 4-5%. On an old AMD Athlon,
the difference rarely goes below 50% - this might also have
to do with the kernel version running on that machine, which
is 2.4, whereas the AMD64 and Pentium M are running 2.6.

Note that it needs the resource module, so it won't work
on Windows.

It's interesting that even pressing a key on your keyboard
will cause forced context switches.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
import resource, time

def workload(rounds):

    x = 0
    for i in range(rounds):
        x = x + 1

def microbench():

    print 'Microbench'
    sleeptime = 10e-6
    sleep = time.sleep
    timer = time.time
    rtype = resource.RUSAGE_SELF
    rounds = 100

    while 1:
        times = []
        rstart = resource.getrusage(rtype)
        for i in range(100):
            # Make sure the test is run at the start of a scheduling time
            # slice
            sleep(sleeptime)
            # Test
            start = timer()
            workload(rounds)
            stop = timer()
            times.append(stop - start)
        rstop = resource.getrusage(rtype)
        volswitches = rstop[-2] - rstart[-2]
        forcedswitches = rstop[-1] - rstart[-1]
        min_time = min(times)
        max_time = max(times)
        diff = max_time - min_time
        if forcedswitches == 0:
            print 'Rounds: %i' % rounds
            print '  min time: %f seconds' % min_time
            print '  max time: %f seconds' % max_time
            print '  diff: %f %% = %f seconds' % (diff / min_time * 100.0,
                                                  diff)
            print '  context switches: %r %r' % (volswitches, forcedswitches)
            print
        elif forcedswitches > 10:
            break
        rounds += 100

microbench()


[Python-Dev] Some more comments re new uriparse module, patch 1462525

2006-06-02 Thread John J Lee
[Not sure whether this kind of thing is best posted as tracker comments 
(but then the tracker gets terribly long and is mailed out every time a 
change happens) or posted here.  Feel free to tell me I'm posting in the 
wrong place...]

Some comments on this patch (a new module, submitted by Paul Jimenez, 
implementing the rules set out in RFC 3986 for URI parsing, joining URI 
references with a base URI etc.)

http://python.org/sf/1462525


Sorry for the pause, Paul.  I finally read RFC 3986 -- which I must say is 
probably the best-written RFC I've read (and there was much rejoicing).

I still haven't read 3987 and got to grips with the unicode issues 
(whatever they are), but I have just implemented the same stuff you did, 
so have some comments on non-unicode aspects of your implementation (the 
version labelled v23 on the tracker):


Your urljoin implementation seems to pass the tests (the tests taken from 
the RFC), but I have to admit I don't understand it :-)  It doesn't seem 
to take account of the distinction between undefined and empty URI 
components.  For example, the authority of the URI reference may be empty 
but still defined.  Anyway, if you're taking advantage of some subtle 
identity that implies that you can get away with truth-testing in place of 
is None tests, please don't ;-) It's slower than is [not] None tests 
both for the computer and (especially!) the reader.
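
A tiny example of why that distinction matters (values made up, not taken
from the patch):

# "file:///etc/passwd" has an authority that is defined but empty;
# "mailto:user@example.com" has no authority component at all.
authority = ''           # defined, but empty
# authority = None       # undefined

if authority:            # wrong: treats '' the same as None
    print 'authority present'

if authority is not None:     # right: only None means "undefined"
    print 'authority is defined (possibly empty)'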

I don't like the use of module posixpath to implement the algorithm 
labelled remove_dot_segments.  URIs are not POSIX filesystem paths, and 
shouldn't depend on code meant to implement the latter.  But my own 
implementation is exceedingly ugly ATM, so I'm in no position to grumble 
too much :-)

Normalisation of the base URI is optional, and your urljoin function
never normalises.  Instead, it parses the base and reference, then
follows the algorithm of section 5.2 of the RFC.  Parsing is required
before normalisation takes place.  So urljoin forces people who need
to normalise the URI beforehand to parse it twice, which is annoying.
There should be some way to pass 5-tuples in instead of URIs.  E.g.,
from my implementation:

def urljoin(base_uri, uri_reference):
    return urlunsplit(urljoin_parts(urlsplit(base_uri),
                                    urlsplit(uri_reference)))


It would be nice to have a 5-tuple-like class (I guess implemented as a 
subclass of tuple) that also exposes attributes (.authority, .path, etc.) 
-- the same way module time does it.
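
Something along these lines, sketched as a plain tuple subclass (names
invented here, not part of the patch):

class SplitURI(tuple):
    # A 5-tuple (scheme, authority, path, query, fragment) that also
    # exposes its fields as attributes, like the structs in module time.
    def __new__(cls, scheme, authority, path, query, fragment):
        return tuple.__new__(cls, (scheme, authority, path, query, fragment))

    scheme    = property(lambda self: self[0])
    authority = property(lambda self: self[1])
    path      = property(lambda self: self[2])
    query     = property(lambda self: self[3])
    fragment  = property(lambda self: self[4])

# u = SplitURI('http', 'example.com', '/a/b', 'x=1', None)
# u[1] == u.authority == 'example.com'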

The path component is required, though may be empty.  Your parser
returns None (meaning undefined) where it should return an empty
string.

Nit: Your tests involving ports contain non-digit characters in the
port (viz. "port"), which is not valid by section 3.2.3 of the RFC.

Smaller nit: the userinfo component was never allowed in http URLs,
but you use them in your tests.  This issue is outside of RFC 3986, of
course.

Particularly because the userinfo component is deprecated, I'd rather
that userinfo-splitting and joining were separate functions, with the
other functions dealing only with the standard RFC 3986 5-tuples.

DefaultSchemes should be a class attribute of URIParser

The naming of URLParser / URIParser is still insane :-)  I suggest
naming them _URIParser and URIParser respectively.

I guess there should be no mention of URL anywhere in the module --
only URI (even though I hate URI, as a mostly-worthless
distinction from URL, consistency inside the module is more
important, and URI is technically correct and fits with all the
terminology used in the RFC).  I'm still heavily -1 on calling it
uriparse though, because of the highly misleading comparison with
the name urlparse (the difference between the modules isn't the
difference between URIs and URLs).

Re your comment on "mailto:" in the tracker: sure, I understand it's not 
meant to be public, but the interface is!  .parse() will return a 4-tuple 
for mailto: URLs.  For everything else, it will return a 7-tuple.  That's 
silly.

The documentation should explain that the function of URIParser is
hiding scheme-dependent URI normalisation.

Method names and locals are still likeThis, contrary to PEP 8.

docstrings and other whitespace are still non-standard -- follow PEP 8
(and PEP 257, which PEP 8 references) Doesn't have to be totally rigid
of course -- e.g. lining up the : characters in the tests is fine.

Standard stdlib form documentation is still missing.  I'll be told off
if I don't read you your rights: you don't have to submit in LaTeX
markup -- apparently there are hordes of eager LaTeX markers-up
lurking ready to pounce on poorly-formatted documentation <wink>

Test suite still needs tweaking to put it in standard stdlib form


John



Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Andrew Dalke
On 6/2/06, Terry Reedy [EMAIL PROTECTED] wrote:
 Hardly a setting in which to run comparison tests, seems to me.

The point though was to show that the time distribution is non-Gaussian,
so intuition based on that doesn't help.

  Using the minimum looks like the way to go for calibration.

 Or possibly the median.

Why?  I can't think of why that's more useful than the minimum time.

Given a large number of samples, the difference between the
minimum and the median/average/whatever mostly provides
information about the background noise, which is pretty irrelevant
to most benchmarks.

 But even better, the way to go to run comparison timings is to use a system
 with as little other stuff going on as possible.  For Windows, this means
 rebooting in safe mode, waiting until the system is quiescent, and then run
 the timing test with *nothing* else active that can be avoided.

A reason I program in Python is because I want to get work done and not
deal with stoic purity.  I'm not going to waste all that time (or money to buy
a new machine) just to run a benchmark.

Just how much more accurate would that be than the numbers we get
now?  Have you tried it?  What additional sensitivity did you get, and was
the extra effort worthwhile?

 Even then, I would look at the distribution of times for a given test to
 check for anomalously high values that should be tossed.  (This can be
 automated somewhat.)

I say it can be automated completely.  Toss all but the lowest.
It's the one with the least noise overhead.

I think fitting the smaller data points to a gamma distribution might
yield better (more reproducible and useful) numbers but I know my
stats ability is woefully decayed so I'm not going to try.  My observation
is that the shape factor is usually small so in a few dozen to a hundred
samples there's a decent chance of getting a time with minimal noise
overhead.

Andrew
[EMAIL PROTECTED]


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Andrew Dalke
On 6/2/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 It's interesting that even pressing a key on your keyboard
 will cause forced context switches.

When niceness was first added to multiprocessing OSes people found their
CPU intensive jobs would go faster by pressing enter a lot.

Andrew
[EMAIL PROTECTED]


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Tim Peters
[MAL]
 Using the minimum looks like the way to go for calibration.

[Terry Reedy]
 Or possibly the median.

[Andrew Dalke]
 Why?  I can't think of why that's more useful than the minimum time.

A lot of things get mixed up here ;-)  The _mean_ is actually useful
if you're using a poor-resolution timer with a fast test.  For
example, suppose a test takes 1/10th the time of the span between
counter ticks.  Then, on average, in 9 runs out of 10 the reported
elapsed time is 0 ticks, and in 1 run out of 10 the reported time is 1
tick.  0 and 1 are both wrong, but the mean (1/10) is correct.
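
Tim's example, in code: a simulated timer that only ticks every 10 units,
timing an operation that really costs 1 unit (numbers chosen to match the
description above, not any real timer):

import random

TICK = 10.0         # timer resolution
TRUE_COST = 1.0     # what the operation actually takes

def quantized_measurement():
    # The operation starts at a random offset within a tick; the reported
    # elapsed time is 1 tick if it happens to cross a tick boundary, else 0.
    start = random.uniform(0, TICK)
    if start + TRUE_COST >= TICK:
        return TICK
    return 0.0

samples = [quantized_measurement() for i in range(100000)]
print 'mean: %.3f (close to the true cost of %.1f)' % (
    sum(samples) / len(samples), TRUE_COST)
print 'min:  %.3f (useless with a timer this coarse)' % min(samples)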

So there _can_ be sense to that.  Then people vaguely recall that the
median is more robust than the mean, and all sense goes out the window
;-)

My answer is to use the timer with the best resolution the machine
has.  Using the mean is a way to worm around timer quantization
artifacts, but it's easier and clearer to use a timer with resolution
so fine that quantization doesn't make a lick of real difference.
Forcing a test to run for a long time is another way to make timer
quantization irrelevant, but then you're also vastly increasing
chances for other processes to disturb what you're testing.

I liked benchmarking on Crays in the good old days.  No time-sharing,
no virtual memory, and the OS believed to its core that its primary
purpose was to set the base address once at the start of a job so the
Fortran code could scream.  Test times were reproducible to the
nanosecond with no effort.  Running on a modern box for a few
microseconds at a time is a way to approximate that, provided you
measure the minimum time with a high-resolution timer :-)


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread A.M. Kuchling
On Fri, Jun 02, 2006 at 07:44:07PM -0400, Tim Peters wrote:
 Fortran code could scream.  Test times were reproducible to the
 nanosecond with no effort.  Running on a modern box for a few
 microseconds at a time is a way to approximate that, provided you
 measure the minimum time with a high-resolution timer :-)

On Linux with a multi-CPU machine, you could probably boot up the
system to use N-1 CPUs, and then start the Python process on CPU N.
That should avoid the process being interrupted by other processes,
though I guess there would still be some noise from memory bus and
kernel lock contention.

(At work we're trying to move toward this approach for doing realtime
audio: devote one CPU to the audio computation and use other CPUs for
I/O, web servers, and whatnot.)

--amk


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Greg Ewing
Tim Peters wrote:

 I liked benchmarking on Crays in the good old days.  ...  
  Test times were reproducible to the
 nanosecond with no effort.  Running on a modern box for a few
 microseconds at a time is a way to approximate that, provided you
 measure the minimum time with a high-resolution timer :-)

Obviously what we need here is a stand-alone Python interpreter
that runs on the bare machine, so there's no pesky operating
system around to mess up our times.

--
Greg


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Greg Ewing
A.M. Kuchling wrote:

 (At work we're trying to move toward this approach for doing realtime
 audio: devote one CPU to the audio computation and use other CPUs for
 I/O, web servers, and whatnot.)

Speaking of creative uses for multiple CPUs, I was thinking
about dual-core Intel Macs the other day, and I wondered
whether it would be possible to configure it so that one
core was running MacOSX and the other was running Windows
at the same time.

It would give the term dual booting a whole new
meaning...

--
Greg


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Josiah Carlson

Greg Ewing [EMAIL PROTECTED] wrote:
 
 Tim Peters wrote:
 
  I liked benchmarking on Crays in the good old days.  ...  
   Test times were reproducible to the
  nanosecond with no effort.  Running on a modern box for a few
  microseconds at a time is a way to approximate that, provided you
  measure the minimum time with a high-resolution timer :-)
 
 Obviously what we need here is a stand-alone Python interpreter
 that runs on the bare machine, so there's no pesky operating
 system around to mess up our times.

An early version of unununium would do that (I don't know if much
progress has been made since I last checked their site).

 - Josiah
