Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
> 
>> Seriously, I've been using and running pybench for years
>> and even though tweaks to the interpreter do sometimes
>> result in speedups or slow-downs where you wouldn't expect
>> them (due to the interpreter using the Python objects),
>> they are reproducible and often enough have uncovered
>> that optimizations in one area may well result in slow-downs
>> in other areas.
> 
>  > Often enough the results are related to low-level features
>  > of the architecture you're using to run the code such as
>  > cache size, cache lines, number of registers in the CPU or
>  > on the FPU stack, etc. etc.
> 
> and that observation has never made you stop and think about whether 
> there might be some problem with the benchmarking approach you're using? 

The approach pybench is using is as follows:

* Run a calibration step which does the same as the actual
  test without the operation being tested (ie. call the
  function running the test, setup the for-loop, constant
  variables, etc.)

  The calibration step is run multiple times and is used
  to calculate an average test overhead time.

* Run the actual test which runs the operation multiple
  times.

  The test is then adjusted to make sure that the
  test overhead / test run ratio remains within
  reasonable bounds.

  If needed, the operation code is repeated verbatim in
  the for-loop, to decrease the ratio.

* Repeat the above for each test in the suite

* Repeat the suite N number of rounds

* Calculate the average run time of all test runs in all rounds.
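In rough code, the calibrate-then-measure idea above looks something like the
following sketch (function and parameter names are made up for illustration;
this is not pybench's actual implementation):

import time

def calibration_run(rounds):
    # Same setup as the test (function call, for-loop, constants),
    # but without the operation being measured.
    for i in range(rounds):
        pass

def test_run(rounds):
    # The operation under test; pybench may repeat the operation
    # verbatim inside the loop body to improve the overhead ratio.
    for i in range(rounds):
        len('abc')

def timed_test(rounds=100000, calibration_runs=20):
    # Average the overhead of the empty loop over several calibration runs...
    overhead = 0.0
    for _ in range(calibration_runs):
        t0 = time.time()
        calibration_run(rounds)
        overhead += time.time() - t0
    overhead = overhead / calibration_runs
    # ...then time the real test and subtract that overhead.
    t0 = time.time()
    test_run(rounds)
    return (time.time() - t0) - overhead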

>   after all, if a change to e.g. the try/except code slows things down 
> or speed things up, is it really reasonable to expect that the time it 
> takes to convert Unicode strings to uppercase should suddenly change due 
> to cache effects or a changing number of registers in the CPU?  real 
> hardware doesn't work that way...

Of course, but then changes to try-except logic can interfere
with the performance of setting up method calls. This is what
pybench then uncovers.

The only problem I see in the above approach is the way
calibration is done: the run-time of the calibration code
may be too small relative to the resolution of the timers used.

Again, please provide the parameters you've used to run the
test case and the output. Things like warp factor, overhead,
etc. could hint to the problem you're seeing.

> is PyBench perhaps using the following approach:
> 
>  T = set of tests
>  for N in range(number of test runs):
>      for t in T:
>          t0 = get_process_time()
>          t()
>          t1 = get_process_time()
>          assign t1 - t0 to test t
>  print assigned time
> 
> where t1 - t0 is very short?

See above (or the code in pybench.py). t1-t0 is usually
around 20-50 seconds:

"""
The tests must set .rounds to a value high enough to let the
test run between 20-50 seconds. This is needed because
clock()-timing only gives rather inaccurate values (on Linux,
for example, it is accurate to a few hundredths of a
second). If you don't want to wait that long, use a warp
factor larger than 1.
"""

> that's not a very good idea, given how get_process_time tends to be 
> implemented on current-era systems (google for "jiffies")...  but it 
> definitely explains the bogus subtest results I'm seeing, and the "magic 
> hardware" behaviour you're seeing.

That's exactly the reason why tests run for a relatively long
time - to minimize these effects. Of course, using wall time
makes this approach vulnerable to other effects such as the current
load of the system, other processes having a higher priority
interfering with the timed process, etc.

For this reason, I'm currently looking for ways to measure the
process time on Windows.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2006-07-03: EuroPython 2006, CERN, Switzerland  30 days left

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Fredrik Lundh
M.-A. Lemburg wrote:

> Of course, but then changes to try-except logic can interfere
> with the performance of setting up method calls. This is what
> pybench then uncovers.

I think the only thing PyBench has uncovered is that you're convinced that it's
always right, and everybody else is always wrong, including people who've
spent decades measuring performance, and the hardware in your own computer.

> See above (or the code in pybench.py). t1-t0 is usually
> around 20-50 seconds:

what machines are you using?  using the default parameters, the entire run takes
about 50 seconds on the slowest machine I could find...

>> that's not a very good idea, given how get_process_time tends to be
>> implemented on current-era systems (google for "jiffies")...  but it
>> definitely explains the bogus subtest results I'm seeing, and the "magic
>> hardware" behaviour you're seeing.
>
> That's exactly the reason why tests run for a relatively long
> time - to minimize these effects. Of course, using wall time
> make this approach vulnerable to other effects such as current
> load of the system, other processes having a higher priority
> interfering with the timed process, etc.

since process time is *sampled*, not measured, process time isn't exactly
invulnerable either.  it's not hard to imagine scenarios where you end up being
assigned only a small part of the process time you're actually using, or cases
where you're assigned more time than you've had a chance to use.

afaik, if you want true performance counters on Linux, you need to patch the
operating system (unless something's changed in very recent versions).

I don't think that sampling errors can explain all the anomalies we've been seeing,
but I wouldn't be surprised if a high-resolution wall time clock on a lightly loaded
multiprocess system was, in practice, *more* reliable than sampled process time
on an equally loaded system.

 



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
Fredrik Lundh wrote:
> M.-A. Lemburg wrote:
> 
>> Of course, but then changes to try-except logic can interfere
>> with the performance of setting up method calls. This is what
>> pybench then uncovers.
> 
> I think the only thing PyBench has uncovered is that you're convinced that it's
> always right, and everybody else is always wrong, including people who've
> spent decades measuring performance, and the hardware in your own computer.

Oh, come on. You know that's not true and I'm trying to
understand what is causing your findings, but this is
difficult, since you're not providing enough details.
E.g. the output of pybench showing the timing results
would help a lot.

I would also like to reproduce your findings. Do you have
two revision numbers in svn which I could use for this ?

>> See above (or the code in pybench.py). t1-t0 is usually
>> around 20-50 seconds:
> 
> what machines are you using?  using the default parameters, the entire run takes
> about 50 seconds on the slowest machine I could find...

If the whole suite runs in 50 seconds, the per-test
run-times are far too small to be accurate. I usually
adjust the warp factor so that each *round* takes
50 seconds.

Looks like I have to revisit the default parameters and
update the doc-strings. I'll do that when I add the new
timers.

Could you check whether you still see the same results when
running with "pybench.py -w 1" ?

>>> that's not a very good idea, given how get_process_time tends to be
>>> implemented on current-era systems (google for "jiffies")...  but it
>>> definitely explains the bogus subtest results I'm seeing, and the "magic
>>> hardware" behaviour you're seeing.
>> That's exactly the reason why tests run for a relatively long
>> time - to minimize these effects. Of course, using wall time
>> make this approach vulnerable to other effects such as current
>> load of the system, other processes having a higher priority
>> interfering with the timed process, etc.
> 
> since process time is *sampled*, not measured, process time isn't exactly
> invulnerable either.  it's not hard to imagine scenarios where you end up being
> assigned only a small part of the process time you're actually using, or cases
> where you're assigned more time than you've had a chance to use.
> 
> afaik, if you want true performance counters on Linux, you need to patch the
> operating system (unless something's changed in very recent versions).
> 
> I don't think that sampling errors can explain all the anomalies we've been seeing,
> but I wouldn't be surprised if a high-resolution wall time clock on a lightly loaded
> multiprocess system was, in practice, *more* reliable than sampled process time
> on an equally loaded system.

That's why the timers being used by pybench will become a
parameter that you can then select to adapt pybench to
the OS you're running it on.

Note that time.clock, the current default timer in pybench,
is a high accuracy wall-clock timer on Windows, so it should
demonstrate similar behavior to timeit.py, even more so,
since you're using warp 20 and thus a similar timing strategy
to that of timeit.py.

I suspect that the calibration step is causing problems.

Steve added a parameter to change the number of calibration
runs done per test: -C n. The default is 20.
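
For example, combining that with the warp-factor option mentioned earlier in
this thread (both flags appear above; everything else is left at its default):

    pybench.py -w 1 -C 20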

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2006-07-03: EuroPython 2006, CERN, Switzerland  30 days left

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Andrew Dalke

M.-A. Lemburg:

The approach pybench is using is as follows:

...

 The calibration step is run multiple times and is used
 to calculate an average test overhead time.


One of the changes that occurred during the sprint was to change this algorithm
to use the best time rather than the average.  Using the average assumes a
Gaussian distribution, but timing results are not Gaussian.  There is an absolute
best time, but it's rarely reached due to background noise.  The distribution is
more like a gamma distribution plus the minimum time.

To show the distribution is non-Gaussian I ran the following

import time

def compute():
    x = 0
    for i in range(1000):
        for j in range(1000):
            x += 1

def bench():
    t1 = time.time()
    compute()
    t2 = time.time()
    return t2-t1

times = []
for i in range(1000):
    times.append(bench())

print times

The full distribution is attached as 'plot1.png' and a close-up (range
0.45-0.65) as 'plot2.png'.  Not a clean gamma distribution, but it's a closer
match than an exponential.

The gamma distribution looks more like an exponential function when the shape
parameter is large.  This corresponds to a large amount of noise in the system,
so the run time is not close to the best time.  This means the average approach
works better when there is a lot of random background activity, which is not the
usual case when I try to benchmark.
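
As a toy illustration of that model (observed time = true minimum plus
gamma-distributed noise; the parameters below are invented, not fitted to the
attached plots):

import random

def simulated_timing(best=0.4725, shape=2.0, scale=0.02):
    # Toy model only: the true minimum plus gamma-distributed noise.
    return best + random.gammavariate(shape, scale)

samples = [simulated_timing() for _ in range(1000)]
# min(samples) stays close to `best`; the mean gets dragged up by the tail.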

When averaging a gamma distribution you'll end up with a bit of a skew, and
I think the skew depends on the number of samples, reaching a limit point.

Using the minimum time should be more precise because there is a
definite lower bound and the machine should be stable.  In my test
above the first few results are

0.472838878632
0.473038911819
0.473326921463
0.473494052887
0.473829984665

I'm pretty certain the best time is 0.4725, or very close to that.  But the
average time is 0.58330151391 because of the long tail.  Here are the last 6
results in my population of 1000:

1.76353311539
1.79937505722
1.82750201225
2.01710510254
2.44861507416
2.90868496895

Granted, I hit a couple of web pages while doing this and my spam filter
processed my mailbox in the background...

There's probably some Markov modeling which would look at the number
and distribution of samples so far and, assuming a gamma distribution,
determine how many more samples are needed to get a good estimate of
the absolute minimum time.  But min(large enough samples) should work
fine.
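
To make the comparison concrete, this is the kind of summary being argued for
(illustration only, not pybench code):

def best_and_mean(times):
    # The minimum is bounded below by the true cost of the code;
    # the mean is dragged upwards by the long tail of noisy runs.
    return min(times), sum(times) / len(times)

# With the samples quoted above this gives roughly (0.4728, 0.583).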


If the whole suite runs in 50 seconds, the per-test
run-times are far too small to be accurate. I usually
adjust the warp factor so that each *round* takes
50 seconds.


The stringbench.py I wrote uses the timeit algorithm which
dynamically adjusts the test to run between 0.2 and 2 seconds.
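
The core of that algorithm is easy to sketch (this is roughly what timeit's
command-line driver does, not stringbench's actual code):

import time

def autoscale(func, min_time=0.2):
    # Grow the repeat count until one timed run lasts at least min_time
    # seconds, so that timer resolution stops mattering.
    number = 1
    while True:
        t0 = time.time()
        for _ in range(number):
            func()
        elapsed = time.time() - t0
        if elapsed >= min_time:
            return number, elapsed
        number *= 10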


That's why the timers being used by pybench will become a
parameter that you can then select to adapt pybench it to
the OS your running pybench on.


Wasn't that decision a consequence of the problems found during
the sprint?

   Andrew
   [EMAIL PROTECTED]


plot1.png
Description: PNG image


plot2.png
Description: PNG image
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Removing Mac OS 9 cruft

2006-06-02 Thread Guido van Rossum
Just and Jack have confirmed that you can throw away everything except
possibly Demo/*. (Just even speculated that some cruft may have been
accidentally revived by the cvs -> svn transition?)

--Guido

On 6/1/06, Neal Norwitz <[EMAIL PROTECTED]> wrote:
> I was about to remove Mac/IDE scripts, but it looks like there might
> be more stuff that is OS 9 related and should be removed.  Other
> possibilities look like (everything under Mac/):
>
>   Demo/*  this is a bit more speculative
>   IDE scripts/*
>   MPW/*
>   Tools/IDE/*  this references IDE scripts, so presumably it should be toast?
>   Tools/macfreeze/*
>   Unsupported/mactcp/dnrglue.c
>   Wastemods/*
>
> I'm going mostly based on what has been modified somewhat recently.
> Can someone confirm/reject these?  I'll remove them.
>
> n
> ___
> Python-Dev mailing list
> Python-Dev@python.org
> http://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> http://mail.python.org/mailman/options/python-dev/guido%40python.org
>


-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
Andrew Dalke wrote:
> M.-A. Lemburg:
>> The approach pybench is using is as follows:
> ...
>>  The calibration step is run multiple times and is used
>>  to calculate an average test overhead time.
> 
> One of the changes that occured during the sprint was to change this
> algorithm
> to use the best time rather than the average.  Using the average assumes a
> Gaussian distribution.  Timing results are not.  There is an absolute
> best but
> that's rarely reached due to background noise.  It's more like a gamma
> distribution
> plus the minimum time.
> 
> To show the distribution is non-Gaussian I ran the following
> 
> def compute():
>     x = 0
>     for i in range(1000):
>         for j in range(1000):
>             x += 1
> 
> def bench():
>     t1 = time.time()
>     compute()
>     t2 = time.time()
>     return t2-t1
> 
> times = []
> for i in range(1000):
>     times.append(bench())
> 
> print times
> 
> The full distribution is attached as 'plot1.png' and the close up
> (range 0.45-0.65)
> as 'plot2.png'.  Not a clean gamma function, but that's a closer match
> than an
> exponential.
> 
> The gamma distribution looks more like a exponential function when the
> shape
> parameter is large.  This corresponds to a large amount of noise in the
> system,
> so the run time is not close to the best time.  This means the average
> approach
> works better when there is a lot of random background activity, which is
> not the
> usual case when I try to benchmark.
> 
> When averaging a gamma distribution you'll end up with a bit of a
> skew, and I think
> the skew depends on the number of samples, reaching a limit point.
> 
> Using the minimum time should be more precise because there is a
> definite lower bound and the machine should be stable.  In my test
> above the first few results are
> 
> 0.472838878632
> 0.473038911819
> 0.473326921463
> 0.473494052887
> 0.473829984665
> 
> I'm pretty certain the best time is 0.4725, or very close to that.
> But the average
> time is 0.58330151391 because of the long tail.  Here are the last 6
> results in
> my population of 1000
> 
> 1.76353311539
> 1.79937505722
> 1.82750201225
> 2.01710510254
> 2.44861507416
> 2.90868496895
> 
> Granted, I hit a couple of web pages while doing this and my spam
> filter processed
> my mailbox in the background...
> 
> There's probably some Markov modeling which would look at the number
> and distribution of samples so far and assuming a gamma distribution
> determine how many more samples are needed to get a good estimate of
> the absolute minumum time.  But min(large enough samples) should work
> fine.

Thanks for the great analysis !

Using the minimum looks like the way to go for calibration.

I wonder whether the same is true for the actual tests; since
you're looking for the expected run-time, the minimum may
not necessarily be the right choice. Then again, in both cases
you are only looking at a small number of samples (20 for
the calibration, 10 for the number of rounds), so this
may be irrelevant.

BTW, did you run this test on Windows or a Unix machine ?

There's also an interesting second peak at around 0.53.
What could be causing this ?

>> If the whole suite runs in 50 seconds, the per-test
>> run-times are far too small to be accurate. I usually
>> adjust the warp factor so that each *round* takes
>> 50 seconds.
> 
> The stringbench.py I wrote uses the timeit algorithm which
> dynamically adjusts the test to run between 0.2 and 2 seconds.
>
>> That's why the timers being used by pybench will become a
>> parameter that you can then select to adapt pybench it to
>> the OS your running pybench on.
> 
> Wasn't that decision a consequence of the problems found during
> the sprint?

It's a consequence of a discussion I had with Steve Holden
and Tim Peters:

I believe that using wall-clock timers
for benchmarking is not a good approach due to the high
noise level. Process time timers typically have a lower
resolution, but give a better picture of the actual
run-time of your code and also don't exhibit as much noise
as the wall-clock timer approach. Of course, you have
to run the tests somewhat longer to get reasonable
accuracy of the timings.

Tim thinks that it's better to use short running tests and
an accurate timer, accepting the added noise and counting
on the user making sure that the noise level is at a
minimum.

Since I like to give users the option of choosing for
themselves, I'm going to make the choice of timer an
option.
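
For illustration, these are the kinds of timer callables such an option could
select between (standard library calls only; the actual pybench option and its
names may well differ):

import os, time

def wall_clock():
    # Wall-clock time; high resolution on most platforms, but sensitive
    # to everything else running on the machine.
    return time.time()

def process_time():
    # User plus system CPU time of the current process as reported by
    # the OS; coarser resolution, but ignores other processes.
    t = os.times()
    return t[0] + t[1]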

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2006-07-03: EuroPython 2006, CERN, Switzerland  30 days left

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 

Re: [Python-Dev] Let's stop eating exceptions in dict lookup

2006-06-02 Thread Michael Hudson
Anthony Baxter <[EMAIL PROTECTED]> writes:

> On Friday 02 June 2006 02:21, Jack Diederich wrote:
>> The CCP Games CEO said they have trouble retaining talent from more
>> moderate latitudes for this reason.  18 hours of daylight makes
>> them a bit goofy and when the Winter Solstice rolls around they are
>> apt to go quite mad.
>
> Obviously they need to hire people who are already crazy.

I think they already did! :)

> not-naming-any-names-ly,
> Anthony

me-neither-ly y'rs
mwh

-- 
  > Look I don't know.  Thankyou everyone for arguing me round in
  > circles.
  No need for thanks, ma'am; that's what we're here for.
-- LNR & Michael M Mason, cam.misc
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Fredrik Lundh
M.-A. Lemburg wrote:

> I believe that using wall-clock timers
> for benchmarking is not a good approach due to the high
> noise level. Process time timers typically have a lower
> resolution, but give a better picture of the actual
> run-time of your code and also don't exhibit as much noise
> as the wall-clock timer approach.

please stop repeating this nonsense.  there are no "process time timers" in
contemporary operating systems; only tick counters.

there are patches for linux and commercial add-ons to most platforms that let
you use hardware performance counters for process stuff, but there's no way to
emulate that by playing with different existing Unix or Win32 API:s; the thing
you think you're using simply isn't there.

 



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
M.-A. Lemburg wrote:
>>> That's why the timers being used by pybench will become a
>>> parameter that you can then select to adapt pybench it to
>>> the OS your running pybench on.
>> Wasn't that decision a consequence of the problems found during
>> the sprint?
> 
> It's a consequence of a discussion I had with Steve Holden
> and Tim Peters:
> 
> I believe that using wall-clock timers
> for benchmarking is not a good approach due to the high
> noise level. Process time timers typically have a lower
> resolution, but give a better picture of the actual
> run-time of your code and also don't exhibit as much noise
> as the wall-clock timer approach. Of course, you have
> to run the tests somewhat longer to get reasonable
> accuracy of the timings.
> 
> Tim thinks that it's better to use short running tests and
> an accurate timer, accepting the added noise and counting
> on the user making sure that the noise level is at a
> minimum.

I just had an idea: if we could get each test to run
inside a single time slice assigned by the OS scheduler,
then we could benefit from the better resolution of the
hardware timers while still keeping the noise to a
minimum.

I suppose this could be achieved by:

* making sure that each test needs less than 10ms to run

* calling time.sleep(0) after each test run

Here's some documentation on the Linux scheduler:

http://www.samspublishing.com/articles/article.asp?p=101760&seqNum=2&rl=1

Table 3.1 has the minimum time slice: 10ms.

What do you think ? Would this work ?
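
A minimal sketch of the measurement pattern being proposed (illustration
only, not pybench code):

import time

def timed_in_one_slice(test, timer=time.time):
    # Yield the CPU first, so that the timed region hopefully starts at
    # the beginning of a fresh scheduler time slice; the test itself is
    # assumed to need well under 10ms.
    time.sleep(0)
    t0 = timer()
    test()
    return timer() - t0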

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2006-07-03: EuroPython 2006, CERN, Switzerland  30 days left

::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Michael Chermside
Marc-Andre Lemburg writes:
> Using the minimum looks like the way to go for calibration.
>
> I wonder whether the same is true for the actual tests; since
> you're looking for the expected run-time, the minimum may
> not necessarily be the choice.

No, you're not looking for the expected run-time. The expected
run-time is a function of the speed of the CPU, the architecture
of same, what else is running simultaneously -- perhaps even
what music you choose to listen to that day. It is NOT a
constant for a given piece of code, and is NOT what you are
looking for.

What you really want to do in benchmarking is to *compare* the
performance of two (or more) different pieces of code. You do,
of course, care about the real-world performance. So if you
had two algorithms and one ran twice as fast when there were no
context switches and 10 times slower when there was background
activity on the machine, then you'd want to prefer the algorithm
that supports context switches. But that's not a realistic
situation. What is far more common is that you run one test
while listening to the Grateful Dead and another test while
listening to Bach, and that (plus other random factors and the
phase of the moon) causes one test to run faster than the
other.

Taking the minimum time clearly subtracts some noise, which is
a good thing when comparing performance for two or more pieces
of code. It fails to account for the distribution of times, so
if one piece of code occasionally gets lucky and takes far less
time, then minimum time won't be a good choice... but it would
be tricky to design code that would be affected by the scheduler
in this fashion even if you were explicitly trying!


Later he continues:
> Tim thinks that it's better to use short running tests and
> an accurate timer, accepting the added noise and counting
> on the user making sure that the noise level is at a
> minimum.
>
> Since I like to give users the option of choosing for
> themselves, I'm going to make the choice of timer an
> option.

I'm generally a fan of giving programmers choices. However,
this is an area where we have demonstrated that even very
competent programmers often have misunderstandings (read this
thread for evidence!). So be very careful about giving such
a choice: the default behavior should be chosen by people
who think carefully about such things, and the documentation
on the option should give a good explanation of the tradeoffs
or at least a link to such an explanation.

-- Michael Chermside

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] test_ctypes failures on ppc64 debian

2006-06-02 Thread Thomas Heller
test_ctypes fails on the ppc64 machine.  I don't have access to such
a machine myself, so I would have to do some trial and error, or try
to print some diagnostic information.

This should not be done in the trunk, so the question is: can the buildbots
build branches?

I assume I just have to enter a revision number and press the force-build
button, is this correct?  Or would someone consider this abuse?

Thomas

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] test_ctypes failures on ppc64 debian

2006-06-02 Thread Tim Peters
[Thomas Heller]
> test_ctypes fails on the ppc64 machine.  I don't have access to such
> a machine myself, so I would have to do some trial and error, or try
> to print some diagnostic information.
>
> This should not be done in the trunk, so the question is: can the buildbots
> build branches?

Yes.  For example, that's how the buildbots run 2.4 tests.

> I assume I just have to enter a revision number and press the force-build
> button, is this correct?

No, you need to enter the "tail end" of the branch path in the "Branch
to build:" box.  You probably want to leave the "Revision to build:"
box empty.  Examples I know work because I've tried them in the past:
entering "trunk" in "Branch to build:" builds the current trunk, and
entering "branches/release24-maint" in "Branch to build:" builds the
current 2.4 branch.  I'm not certain that paths other than those work.

> Or would someone consider this abuse?

In this case, it only matters whether Matthias Klose thinks it's abuse
(since klose-debian-ppc64 is his box), so I've copied him on this
reply.  Matthias, I hope you don't mind some extra activity on that
box, since it may be the only way test_ctypes will ever pass on your
box :-)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] test_ctypes failures on ppc64 debian

2006-06-02 Thread Thomas Heller
Tim Peters wrote:
> [Thomas Heller]
>> test_ctypes fails on the ppc64 machine.  I don't have access to such
>> a machine myself, so I would have to do some trial and error, or try
>> to print some diagnostic information.
>>
>> This should not be done in the trunk, so the question is: can the 
>> buildbots
>> build branches?
> 
> Yes.  For example, that's how the buildbots run 2.4 tests.
> 
>> I assume I just have to enter a revision number and press the force-build
>> button, is this correct?
> 
> No, you need to enter the "tail end" of the branch path in the "Branch
> to build:" box.  You probably want to leave the "Revision to build:"
> box empty.  Examples I know work because I've tried them in the past:
> entering "trunk" in "Branch to build:" builds the current trunk, and
> entering "branches/release24-maint" in "Branch to build:" builds the
> current 2.4 branch.  I'm not certain that paths other than those work.
> 
>> Or would someone consider this abuse?
> 
> In this case, it only matters whether Matthias Klose thinks it's abuse
> (since klose-debian-ppc64 is his box), so I've copied him on this
> reply.  Matthias, I hope you don't mind some extra activity on that
> box, since it may be the only way test_ctypes will ever pass on your
> box :-)

I have already mailed him asking if he can give me interactive access
to this machine ;-).  He has not yet replied - I'm not sure if this is because
he's been shocked to see such a request, or if he already is in holidays.

Thomas

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] SF patch #1473257: "Add a gi_code attr to generators"

2006-06-02 Thread Brett Cannon
On 6/1/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 6/1/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> > I didn't know it was assigned to me.  I guess SF doesn't send any
> > notifications, and neither did Georg, so your email is the very first time
> > that I've heard of it.
>
> This is a longstanding SF bug. (One of the reasons why we should move
> away from it ASAP IMO.)

The Request for Trackers should go out this weekend, putting a worst case
timeline of choosing a tracker as three months from this weekend.  Once that
is done hopefully switching over won't take very long.  In other words,
hopefully this can get done before October.

-Brett

> While we're still using SF, developers should probably get in the
> habit of sending an email to the assignee when assigning a bug...
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Terry Reedy

"M.-A. Lemburg" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
>> Granted, I hit a couple of web pages while doing this and my spam
>> filter processed my mailbox in the background...

Hardly a setting in which to run comparison tests, seems to me.

> Using the minimum looks like the way to go for calibration.

Or possibly the median.

But even better, the way to go to run comparison timings is to use a system 
with as little other stuff going on as possible.  For Windows, this means 
rebooting in safe mode, waiting until the system is quiescent, and then run 
the timing test with *nothing* else active that can be avoided.

Even then, I would look at the distribution of times for a given test to 
check for anomalously high values that should be tossed.  (This can be 
automated somewhat.)

Terry Jan Reedy



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Fredrik Lundh
Terry Reedy wrote:

> But even better, the way to go to run comparison timings is to use a system 
> with as little other stuff going on as possible.  For Windows, this means 
> rebooting in safe mode, waiting until the system is quiescent, and then run 
> the timing test with *nothing* else active that can be avoided.

sigh.



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Fredrik Lundh
M.-A. Lemburg wrote:

> I just had an idea: if we could get each test to run
> inside a single time slice assigned by the OS scheduler,
> then we could benefit from the better resolution of the
> hardware timers while still keeping the noise to a
> minimum.
> 
> I suppose this could be achieved by:
> 
> * making sure that each tests needs less than 10ms to run

iirc, very recent linux kernels have a 1 millisecond tick.  so do
alphas, and probably some other platforms.

> * calling time.sleep(0) after each test run

so some higher priority process can get a chance to run, and spend 9.5 
milliseconds shuffling data to a slow I/O device before blocking? ;-)

I'm not sure this problem can be solved, really, at least not as long as 
you're constrained to portable API:s.

(talking of which, if someone has some time and a linux box to spare, 
and wants to do some serious hacking on precision benchmarks, using

 http://user.it.uu.se/~mikpe/linux/perfctr/2.6/

to play with the TSC might be somewhat interesting.)



___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread M.-A. Lemburg
M.-A. Lemburg wrote:
 That's why the timers being used by pybench will become a
 parameter that you can then select to adapt pybench it to
 the OS your running pybench on.
>>> Wasn't that decision a consequence of the problems found during
>>> the sprint?
>> It's a consequence of a discussion I had with Steve Holden
>> and Tim Peters:
>>
>> I believe that using wall-clock timers
>> for benchmarking is not a good approach due to the high
>> noise level. Process time timers typically have a lower
>> resolution, but give a better picture of the actual
>> run-time of your code and also don't exhibit as much noise
>> as the wall-clock timer approach. Of course, you have
>> to run the tests somewhat longer to get reasonable
>> accuracy of the timings.
>>
>> Tim thinks that it's better to use short running tests and
>> an accurate timer, accepting the added noise and counting
>> on the user making sure that the noise level is at a
>> minimum.
> 
> I just had an idea: if we could get each test to run
> inside a single time slice assigned by the OS scheduler,
> then we could benefit from the better resolution of the
> hardware timers while still keeping the noise to a
> minimum.
> 
> I suppose this could be achieved by:
> 
> * making sure that each tests needs less than 10ms to run
> 
> * calling time.sleep(0) after each test run
> 
> Here's some documentation on the Linux scheduler:
> 
> http://www.samspublishing.com/articles/article.asp?p=101760&seqNum=2&rl=1
> 
> Table 3.1 has the minimum time slice: 10ms.
> 
> What do you think ? Would this work ?

I ran some tests related to this and it appears that, provided
the test itself uses less than 1ms, chances are
high that you don't get any forced context switches in your
way while running the test.

It also appears that you have to use time.sleep(10e-6) to
get the desired behavior. time.sleep(0) seems to receive
some extra care, so it doesn't have the intended effect - at
least not on Linux.

I've checked this on AMD64 and Intel Pentium M. The script is
attached - it will run until you get more than 10 forced
context switches in 100 runs of the test, incrementing the
runtime of the test in each round.

It's also interesting that the difference between max and min
run-time of the tests can be as low as 0.2% on the Pentium,
whereas the AMD64 always stays around 4-5%. On an old AMD Athlon,
the difference rarely goes below 50% - this might also have
to do with the kernel version running on that machine, which
is 2.4, whereas the AMD64 and Pentium M are running 2.6.

Note that it needs the resource module, so it won't work
on Windows.

It's interesting that even pressing a key on your keyboard
will cause forced context switches.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 02 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
import resource, time

def workload(rounds):

    x = 0
    for i in range(rounds):
        x = x + 1

def microbench():

    print 'Microbench'
    sleeptime = 10e-6
    sleep = time.sleep
    timer = time.time
    rtype = resource.RUSAGE_SELF
    rounds = 100

    while 1:
        times = []
        rstart = resource.getrusage(rtype)
        for i in range(100):
            # Make sure the test is run at the start of a scheduling time
            # slice
            sleep(sleeptime)
            # Test
            start = timer()
            workload(rounds)
            stop = timer()
            times.append(stop - start)
        rstop = resource.getrusage(rtype)
        volswitches = rstop[-2] - rstart[-2]
        forcedswitches = rstop[-1] - rstart[-1]
        min_time = min(times)
        max_time = max(times)
        diff = max_time - min_time
        if forcedswitches == 0:
            print 'Rounds: %i' % rounds
            print '  min time: %f seconds' % min_time
            print '  max time: %f seconds' % max_time
            print '  diff: %f %% = %f seconds' % (diff / min_time * 100.0,
                                                  diff)
            print '  context switches: %r %r' % (volswitches, forcedswitches)
            print
        elif forcedswitches > 10:
            break
        rounds += 100

microbench()
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Some more comments re new uriparse module, patch 1462525

2006-06-02 Thread John J Lee
[Not sure whether this kind of thing is best posted as tracker comments 
(but then the tracker gets terribly long and is mailed out every time a 
change happens) or posted here.  Feel free to tell me I'm posting in the 
wrong place...]

Some comments on this patch (a new module, submitted by Paul Jimenez, 
implementing the rules set out in RFC 3986 for URI parsing, joining URI 
references with a base URI etc.)

http://python.org/sf/1462525


Sorry for the pause, Paul.  I finally read RFC 3986 -- which I must say is 
probably the best-written RFC I've read (and there was much rejoicing).

I still haven't read 3987 and got to grips with the unicode issues 
(whatever they are), but I have just implemented the same stuff you did, 
so have some comments on non-unicode aspects of your implementation (the 
version labelled "v23" on the tracker):


Your urljoin implementation seems to pass the tests (the tests taken from 
the RFC), but I have to admit I don't understand it :-)  It doesn't seem 
to take account of the distinction between undefined and empty URI 
components.  For example, the authority of the URI reference may be empty 
but still defined.  Anyway, if you're taking advantage of some subtle 
identity that implies that you can get away with truth-testing in place of 
"is None" tests, please don't ;-) It's slower than "is [not] None" tests 
both for the computer and (especially!) the reader.

I don't like the use of module posixpath to implement the algorithm 
labelled "remove_dot_segments".  URIs are not POSIX filesystem paths, and 
shouldn't depend on code meant to implement the latter.  But my own 
implementation is exceedingly ugly ATM, so I'm in no position to grumble 
too much :-)

Normalisation of the base URI is optional, and your urljoin function
never normalises.  Instead, it parses the base and reference, then
follows the algorithm of section 5.2 of the RFC.  Parsing is required
before normalisation takes place.  So urljoin forces people who need
to normalise the URI beforehand to parse it twice, which is annoying.
There should be some way to pass 5-tuples in instead of URIs.  E.g.,
from my implementation:

def urljoin(base_uri, uri_reference):
    return urlunsplit(urljoin_parts(urlsplit(base_uri),
                                    urlsplit(uri_reference)))


It would be nice to have a 5-tuple-like class (I guess implemented as a 
subclass of tuple) that also exposes attributes (.authority, .path, etc.) 
-- the same way module time does it.
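
Something along these lines, for illustration (a hypothetical class, not code
from the patch under review):

class SplitURI(tuple):
    # A 5-tuple that also exposes its components as attributes,
    # similar in spirit to what module time does for its time tuples.
    def __new__(cls, scheme, authority, path, query, fragment):
        return tuple.__new__(cls, (scheme, authority, path, query, fragment))
    scheme = property(lambda self: self[0])
    authority = property(lambda self: self[1])
    path = property(lambda self: self[2])
    query = property(lambda self: self[3])
    fragment = property(lambda self: self[4])

# e.g. SplitURI('http', 'example.com', '/x', '', '').authority == 'example.com'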

The path component is required, though may be empty.  Your parser
returns None (meaning "undefined") where it should return an empty
string.

Nit: Your tests involving ports contain non-digit characters in the
port (viz. "port"), which is not valid by section 3.2.3 of the RFC.

Smaller nit: the userinfo component was never allowed in http URLs,
but you use them in your tests.  This issue is outside of RFC 3986, of
course.

Particularly because the userinfo component is deprecated, I'd rather
that userinfo-splitting and joining were separate functions, with the
other functions dealing only with the standard RFC 3986 5-tuples.

DefaultSchemes should be a class attribute of URIParser

The naming of URLParser / URIParser is still insane :-)  I suggest
naming them _URIParser and URIParser respectively.

I guess there should be no mention of "URL" anywhere in the module --
only "URI" (even though I hate "URI", as a mostly-worthless
distinction from "URL", consistency inside the module is more
important, and URI is technically correct and fits with all the
terminology used in the RFC).  I'm still heavily -1 on calling it
"uriparse" though, because of the highly misleading comparison with
the name "urlparse" (the difference between the modules isn't the
difference between URIs and URLs).

Re your comment on "mailto:"; in the tracker: sure, I understand it's not 
meant to be public, but the interface is!  .parse() will return a 4-tuple 
for mailto: URLs.  For everything else, it will return a 7-tuple.  That's 
silly.

The documentation should explain that the function of URIParser is
hiding scheme-dependent URI normalisation.

Method names and locals are still likeThis, contrary to PEP 8.

docstrings and other whitespace are still non-standard -- follow PEP 8
(and PEP 257, which PEP 8 references) Doesn't have to be totally rigid
of course -- e.g. lining up the ":" characters in the tests is fine.

Standard stdlib form documentation is still missing.  I'll be told off
if I don't read you your rights: you don't have to submit in LaTeX
markup -- apparently there are hordes of eager LaTeX markers-up
lurking ready to pounce on poorly-formatted documentation 

Test suite still needs tweaking to put it in standard stdlib form


John

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Andrew Dalke
On 6/2/06, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Hardly a setting in which to run comparison tests, seems to me.

The point though was to show that the time distribution is non-Gaussian,
so intuition based on that doesn't help.

> > Using the minimum looks like the way to go for calibration.
>
> Or possibly the median.

Why?  I can't think of why that's more useful than the minimum time.

Given a large number of samples the difference between the
minimum and the median/average/whatever is mostly providing
information about the background noise, which is pretty irrelevant
to most benchmarks.

> But even better, the way to go to run comparison timings is to use a system
> with as little other stuff going on as possible.  For Windows, this means
> rebooting in safe mode, waiting until the system is quiescent, and then run
> the timing test with *nothing* else active that can be avoided.

A reason I program in Python is because I want to get work done and not
deal with stoic purity.  I'm not going to waste all that time (or money to buy
a new machine) just to run a benchmark.

Just how much more accurate would that be over the numbers we get
now?  Have you tried it?  What additional sensitivity did you get and was
the extra effort worthwhile?

> Even then, I would look at the distribution of times for a given test to
> check for anomalously high values that should be tossed.  (This can be
> automated somewhat.)

I say it can be automated completely.  Toss all but the lowest.
It's the one with the least noise overhead.

I think fitting the smaller data points to a gamma distribution might
yield better (more reproducible and useful) numbers but I know my
stats ability is woefully decayed so I'm not going to try.  My observation
is that the shape factor is usually small so in a few dozen to a hundred
samples there's a decent chance of getting a time with minimal noise
overhead.

Andrew
[EMAIL PROTECTED]
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Andrew Dalke
On 6/2/06, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> It's interesting that even pressing a key on your keyboard
> will cause forced context switches.

When niceness was first added to multiprocessing OSes people found their
CPU intensive jobs would go faster by pressing enter a lot.

Andrew
[EMAIL PROTECTED]
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Tim Peters
[MAL]
>>> Using the minimum looks like the way to go for calibration.

[Terry Reedy]
>> Or possibly the median.

[Andrew Dalke]
> Why?  I can't think of why that's more useful than the minimum time.

A lot of things get mixed up here ;-)  The _mean_ is actually useful
if you're using a poor-resolution timer with a fast test.  For
example, suppose a test takes 1/10th the time of the span between
counter ticks.  Then, "on average", in 9 runs out of 10 the reported
elapsed time is 0 ticks, and in 1 run out of 10 the reported time is 1
tick.  0 and 1 are both wrong, but the mean (1/10) is correct.
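
In toy code, that arithmetic looks like this (a made-up simulation of a
1-tick timer and a test costing 0.1 tick):

import random

def coarse_timing(true_cost=0.1, tick=1.0):
    # A run reports 1 tick if it happens to straddle a tick boundary
    # (probability true_cost/tick), otherwise 0 ticks.
    if random.random() < true_cost / tick:
        return tick
    return 0.0

samples = [coarse_timing() for _ in range(100000)]
print(sum(samples) / len(samples))  # close to 0.1: the mean is about right
print(min(samples))                 # 0.0: the minimum tells you nothing here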

So there _can_ be sense to that.  Then people vaguely recall that the
median is more robust than the mean, and all sense goes out the window
;-)

My answer is to use the timer with the best resolution the machine
has.  Using the mean is a way to worm around timer quantization
artifacts, but it's easier and clearer to use a timer with resolution
so fine that quantization doesn't make a lick of real difference.
Forcing a test to run for a long time is another way to make timer
quantization irrelevant, but then you're also vastly increasing
chances for other processes to disturb what you're testing.

I liked benchmarking on Crays in the good old days.  No time-sharing,
no virtual memory, and the OS believed to its core that its primary
purpose was to set the base address once at the start of a job so the
Fortran code could scream.  Test times were reproducible to the
nanosecond with no effort.  Running on a modern box for a few
microseconds at a time is a way to approximate that, provided you
measure the minimum time with a high-resolution timer :-)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread A.M. Kuchling
On Fri, Jun 02, 2006 at 07:44:07PM -0400, Tim Peters wrote:
> Fortran code could scream.  Test times were reproducible to the
> nanosecond with no effort.  Running on a modern box for a few
> microseconds at a time is a way to approximate that, provided you
> measure the minimum time with a high-resolution timer :-)

On Linux with a multi-CPU machine, you could probably boot up the
system to use N-1 CPUs, and then start the Python process on CPU N.
That should avoid the process being interrupted by other processes,
though I guess there would still be some noise from memory bus and
kernel lock contention.

(At work we're trying to move toward this approach for doing realtime
audio: devote one CPU to the audio computation and use other CPUs for
I/O, web servers, and whatnot.)

--amk
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Greg Ewing
Tim Peters wrote:

> I liked benchmarking on Crays in the good old days.  ...  
 > Test times were reproducible to the
> nanosecond with no effort.  Running on a modern box for a few
> microseconds at a time is a way to approximate that, provided you
> measure the minimum time with a high-resolution timer :-)

Obviously what we need here is a stand-alone Python interpreter
that runs on the bare machine, so there's no pesky operating
system around to mess up our times.

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Greg Ewing
A.M. Kuchling wrote:

> (At work we're trying to move toward this approach for doing realtime
> audio: devote one CPU to the audio computation and use other CPUs for
> I/O, web servers, and whatnot.)

Speaking of creative uses for multiple CPUs, I was thinking
about dual-core Intel Macs the other day, and I wondered
whether it would be possible to configure it so that one
core was running MacOSX and the other was running Windows
at the same time.

It would give the term "dual booting" a whole new
meaning...

--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Python Benchmarks

2006-06-02 Thread Josiah Carlson

Greg Ewing <[EMAIL PROTECTED]> wrote:
> 
> Tim Peters wrote:
> 
> > I liked benchmarking on Crays in the good old days.  ...  
>  > Test times were reproducible to the
> > nanosecond with no effort.  Running on a modern box for a few
> > microseconds at a time is a way to approximate that, provided you
> > measure the minimum time with a high-resolution timer :-)
> 
> Obviously what we need here is a stand-alone Python interpreter
> that runs on the bare machine, so there's no pesky operating
> system around to mess up our times.

An early version of unununium would do that (I don't know if much
progress has been made since I last checked their site).

 - Josiah

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com