Re: [Python-Dev] bytes / unicode

2010-06-22 Thread Mike Klaas
On Tue, Jun 22, 2010 at 4:23 PM, Ian Bicking i...@colorstudy.com wrote:

 This reminds me of the optimization ElementTree and lxml made in Python 2
 (not sure what they do in Python 3?) where they use str when a string is
 ASCII to avoid the memory and performance overhead of unicode.

An optimization that forces me to typecheck the return value of the
function, and one that I only discovered after code started breaking.  I
can't say I was enthused about that decision when I discovered it.
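
For the curious, the workaround looks something like this (a sketch of
the typecheck the optimization forces on callers, not ElementTree's own
code):

    def text_of(elem):
        # ElementTree may hand back str for pure-ASCII text and
        # unicode otherwise, so normalize before doing anything
        # encoding-sensitive with the result
        text = elem.text
        if isinstance(text, str):
            text = text.decode('ascii')
        return text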

-Mike


Re: [Python-Dev] Fixing the GIL (with a BFS scheduler)

2010-05-18 Thread Mike Klaas
On Sun, May 16, 2010 at 1:07 PM, Nir Aides n...@winpdb.org wrote:

 Relevant Python issue: http://bugs.python.org/issue7946

Is there any chance Antoine's gilinter patch from that issue might
be applied to python 2.7?  I have been experiencing rare long delays
in simple io operations in multithreaded python applications, and I
suspect that they might be related to this issue.

-Mike


Re: [Python-Dev] Fixing the GIL (with a BFS scheduler)

2010-05-18 Thread Mike Klaas
On Tue, May 18, 2010 at 2:50 PM, Antoine Pitrou solip...@pitrou.net wrote:

 There's no chance for this since the patch relies on the new GIL.
 (that's unless there's a rush to backport the new GIL in 2.7, of course)

Thanks, I missed that detail.

 I think your rare long delays might be related to the old GIL's own
 problems, though. How long are they?

Typically between 20 and 60s.  This is the time it takes to send and
receive a single small packet on an already-active tcp connection to
ensure it is still alive.  Most of the time it is < 1ms.  I don't
have strong evidence that GIL issues are causing the problem, because
I can't reliably reproduce the issue.  But the general setup is
similar (one thread doing light io experiencing odd delays in a
process with multiple threads that are often cpu-bound, on a
multi-core machine).
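
A rough sketch of that setup, in case anyone wants to poke at it
(Unix-only because of socketpair(); the workload numbers are made up):

    import socket, threading, time

    def cpu_hog():
        while True:
            sum(i * i for i in xrange(100000))

    def echo(conn):
        while True:
            conn.sendall(conn.recv(4))

    a, b = socket.socketpair()
    threading.Thread(target=echo, args=(b,)).start()
    for _ in range(4):
        threading.Thread(target=cpu_hog).start()

    while True:                      # the "light io" thread
        t0 = time.time()
        a.sendall('ping')
        a.recv(4)
        print 'round trip: %.1f ms' % ((time.time() - t0) * 1000)
        time.sleep(1)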

thanks,
-Mike


Re: [Python-Dev] deprecated stuff in standard library

2010-02-19 Thread Mike Klaas
On Fri, Feb 19, 2010 at 9:03 AM, Sjoerd Mullender sjo...@acm.org wrote:

 The policy should also be, if someone decides (or rather, implements) a
 deprecation of a module, they should do a grep to see where that module
 is used and fix the code.  It's not rocket science.

I'm not sure if you're aware of it, but you're starting to sound a little rude.

ISTM that it doesn't make sense to waste effort ensuring that
deprecated code is updated to not call other deprecated modules.  Of
course, all released non-deprecated code should steer clear of
deprecated apis.

-Mike


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-25 Thread Mike Klaas
On Mon, Jan 25, 2010 at 11:32 AM, Daniel Stutzbach
dan...@stutzbachenterprises.com wrote:
 On Mon, Jan 25, 2010 at 1:22 PM, Steve Howell showel...@yahoo.com wrote:

 I haven't completely worked out the best strategy to eventually release
 the memory taken up by the pointers of the unreleased elements, but the
 worst case scenario is that the unused memory only gets wasted until the
 time that the list itself gets garbage collected.

 FWIW, for a long-running FIFO queue, it's critical to release some of the
 memory along the way, otherwise the amount of wasted memory is unbounded.

 Good luck :)

It seems to me that the best way to do this is invert .append() logic:
leave at most X amount of wasted space at the beginning of the list,
where X is a constant fraction of the list size.

Whether it is worth adding an extra pointer to the data stored by a
list is another story.
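
The bookkeeping might look like this pure-Python emulation (the real
patch would do it in C; the one-half compaction threshold is arbitrary):

    class slacklist(object):
        """List wrapper with O(1) amortized pop(0)."""
        def __init__(self, items):
            self._items = list(items)
            self._offset = 0          # wasted slots at the front
        def popleft(self):
            val = self._items[self._offset]
            self._items[self._offset] = None   # drop the reference early
            self._offset += 1
            # compact once slack exceeds a constant fraction of the list
            if self._offset * 2 > len(self._items):
                del self._items[:self._offset]
                self._offset = 0
            return val
        def __len__(self):
            return len(self._items) - self._offset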

-Mike


Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?

2009-11-03 Thread Mike Klaas
On Tue, Nov 3, 2009 at 10:42 AM, Georg Brandl g.bra...@gmx.net wrote:

 sstein...@gmail.com schrieb:
 
  On Nov 3, 2009, at 12:28 PM, Arc Riley wrote:
 
  The main thing holding back the community are lazy and/or obstinate
  package maintainers.  If they spent half the time they've put into
  complaining about Py3 into actually working to upgrade their code
  they'd be done now.
 
  That's an inflammatory, defamatory, unsubstantiated, hyperbolic,
  sweeping overgeneralization.

 I know a few maintainers, and I have no problem seeing how Arc came
 to that conclusion.


Be that as it may, the only way python 3 will be widely adopted is if
people have motivation to adopt it (need to be compatible with other
libs, pressure from users, their own interest in fostering python 3.0,
etc.).  Deriding them as lazy accomplishes nothing and obscures the fact
that it is the python maintainers' responsibility to bring about this
motivation if they want python 3.0 to be adopted.  No-one is going to
convert to python 3.0 because you called them lazy.

-Mike




Re: [Python-Dev] pthreads, fork, import, and execvp

2009-07-20 Thread Mike Klaas
On Thu, Jul 16, 2009 at 1:08 PM, Thomas Wouters tho...@python.org wrote:


 Picking up a rather old discussion... We encountered this bug at Google and
 I'm now incentivized to fix it.

 For a short recap: Python has an import lock that prevents more than one
 thread from doing an import at any given time. However, unlike most of the
 locks we have lying around, we don't clear that lock in the child after an
 os.fork(). That means that doing an os.fork() during an import means the
 child process can't do any other imports. It also means that doing an
 os.fork() *while another thread is doing an import* means the child process
 can't do any other imports.

 Since this three-year-old discussion we've added a couple of
 post-fork-cleanups to CPython (the TLS, the threading module's idea of
 active threads, see Modules/signalmodule.c:PyOS_AfterFork) and we already do
 simply discard the memory for other locks held during fork (the GIL, see
 Python/ceval.c:PyEval_ReInitThreads, and the TLS lock in
 Python/thread.c:PyThread_ReInitTLS) -- but not so with the import lock,
 except when the platform is AIX. I don't see any particular reason why we
 aren't doing the same thing to the import lock that we do to the other
 locks, on all platforms. It's a quick fix for a real problem (see
 http://bugs.python.org/issue1590864 and
 http://bugs.python.org/issue1404925 for two bugreports that seem to be
 this very issue.)


+1.  We were also affected by this bug, getting sporadic deadlocks in a
multi-threaded program that fork()s subprocesses to do processing.  It
took a while to figure out what was going on.

-Mike


Re: [Python-Dev] PEP 383 and GUI libraries

2009-04-30 Thread Mike Klaas


On 30-Apr-09, at 7:39 AM, Guido van Rossum wrote:


FWIW, I'm in agreement with this PEP (i.e. its status is now
Accepted). Martin, you can update the PEP and start the
implementation.


+1

Kudos to Martin for seeing this through with (imo) considerable  
patience and dignity.


-Mike


Re: [Python-Dev] Rethinking intern() and its data structure

2009-04-09 Thread Mike Klaas


On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote:


Greg Ewing wrote:

John Arbash Meinel wrote:

And the way intern is currently written, there is a third cost when the
item doesn't exist yet, which is another lookup to insert the object.


That's even rarer still, since it only happens the first
time you load a piece of code that uses a given variable
name anywhere in any module.



Somewhat true, though I know it happens 25k times during startup of
bzr... And I would be a *lot* happier if startup time was 100ms instead
of 400ms.


I don't want to quash your idealism too severely, but it is extremely  
unlikely that you are going to get anywhere near that kind of speed up  
by tweaking string interning.  25k times doing anything (computation)  
just isn't all that much.


$ python -mtimeit -s 'd=dict.fromkeys(xrange(1000))' 'for x in xrange(25000): d.get(x)'
100 loops, best of 3: 8.28 msec per loop

Perhaps this isn't representative (int hashing is ridiculously cheap,  
for instance), but the dict itself is far bigger than the dict you are  
dealing with and such would have similar cache-busting properties.   
And yet, 25k accesses (plus python-c dispatching costs which you are  
paying with interning) consume only ~10ms.  You could do more good by  
eliminating a handful of disk seeks by reducing the number of imported  
modules...


-Mike


Re: [Python-Dev] speeding up PyObject_GetItem

2009-03-24 Thread Mike Klaas


On 24-Mar-09, at 3:15 PM, Raymond Hettinger wrote:




4% on a micro-micro-benchmark is hardly compelling...


I concur!  This is utterly insignificant and certainly does
not warrant removing the checks.

-1 on these sort of fake optimizations.  We should focus
on algorithmic improvements and eliminating redundant
work and whatnot.  Removing checks that were put there for a reason  
doesn't seem useful at all.


To be fair, the main proposed optimization(s) would speed up the  
microbenchmark by 15-25% (Daniel already stated that the NULL checks  
didn't have a significant impact).  This seems significant,  
considering that tight loops whose cost is heavily due to array access  
are common.


-Mike


[Python-Dev] a nicer looking dir()

2009-02-18 Thread Mike Klaas
Someone has implemented a version of dir() which is much nicer for
human consumption.  The difference is striking enough that I thought it
would be worth bringing to python-dev's attention.


http://github.com/inky/see/tree/master

>>> pencil_case = []
>>> dir(pencil_case)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__delslice__', '__doc__', '__eq__', '__ge__', '__getattribute__',
'__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__',
'__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
'__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
'__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__',
'__setslice__', '__str__', 'append', 'count', 'extend', 'index',
'insert', 'pop', 'remove', 'reverse', 'sort']

>>> see(pencil_case)
  ? [] for in + * += *= < <= == != > >= len() .append() .count()
  .extend() .index() .insert() .pop() .remove() .reverse() .sort()

I'm not sure that this type of functionality merits a new built-in,  
but it might be useful as part of help()'s output.


-Mike


Re: [Python-Dev] C API for appending to arrays

2009-02-03 Thread Mike Klaas


On 2-Feb-09, at 9:21 AM, Hrvoje Niksic wrote:


It turns out that an even faster method of creating an array is by
using the fromstring() method.  fromstring() requires an actual
string, not a buffer, so in C++ I created an std::vector<double>
with a contiguous array of doubles, passed that array to
PyString_FromStringAndSize, and called array.fromstring with the
resulting string.  Despite all the unnecessary copying, the result
was much faster than either of the previous versions.



Would it be possible for the array module to define a C interface  
for the most frequent operations on array objects, such as appending  
an item, and getting/setting an item?  Failing that, could we at  
least make fromstring() accept an arbitrary read buffer, not just an  
actual string?


Do you need to append, or are you just looking to create/manipulate an  
array with a bunch of c-float values?  I find As{Write/Read}Buffer  
sufficient for most of these tasks.  I've included some example pyrex  
code that populates a new array.array at c speed.  (Note that you can  
get the size of the resulting c array more easily than you are by  
using PyObject_Length).  Of course, this still leaves difficult  
appending to an already-created array.


def calcW0(W1, colTotal):
    """Calculate a W0 array from a W1 array.

    @param W1: array.array of doubles
    @param colTotal: value to which each column should sum

    @return W0 = [colTotal] * NA - W1
    """
    cdef int NA
    NA = len(W1)
    W0 = array('d', [colTotal]) * NA

    cdef double *cW1, *cW0
    cdef int i
    cdef Py_ssize_t dummy

    PyObject_AsReadBuffer(W1, <void **>&cW1, &dummy)
    PyObject_AsWriteBuffer(W0, <void **>&cW0, &dummy)

    for i from 0 <= i < NA:
        cW0[i] = cW0[i] - cW1[i]

    return W0

regards,
-Mike


Re: [Python-Dev] Partial function application 'from the right'

2009-01-29 Thread Mike Klaas

On 29-Jan-09, at 3:21 PM, Daniel Stutzbach wrote:

On Thu, Jan 29, 2009 at 4:04 PM, Antoine Pitrou  
solip...@pitrou.net wrote:

Alexander Belopolsky alexander.belopolsky at gmail.com writes:
 By this analogy, partial(f, ..., *args) is right_partial with '...'
 standing for any number of missing arguments.  If you want to specify
 exactly one missing argument, you would want to write partial(f, :,
 *args), which is not a valid syntax even in Py3.

Yes, of course, but... the meaning which numpy attributes to Ellipsis
does not have to be the same in other libraries. Otherwise this meaning
would have been embedded in the interpreter itself, while it hasn't.

The meaning which numpy attributes to Ellipsis is also the meaning  
that mathematical notation has attached to Ellipsis for a very long  
time.


And yet, python isn't confined to mathematical notation.  *, ** are  
both overloaded for use in argument lists to no-one's peril, AFAICT.


-Mike


Re: [Python-Dev] Psyco for -OO or -O

2008-12-15 Thread Mike Klaas


On 13-Dec-08, at 5:28 AM, Michael Foord wrote:


Lie Ryan wrote:
I'm sure probably most of you knows about psyco[1], the optimizer.  
Python has an -O and -OO flag that is intended to be optimization  
flag, but we know that currently it doesn't do much. Why not add  
psyco as standard library and let -O or -OO invoke psyco?




This really belongs on Python-ideas and not Python-dev.

The main reason why not is that someone(s) from the Python core team
would then need to 'own' maintaining Psyco (which is x86 only as well).


Worse, it is 32bit only, which has greatly diminished its usefulness  
in the last few years.


-Mike


Re: [Python-Dev] RELEASED Python 3.0 final

2008-12-05 Thread Mike Klaas


On 5-Dec-08, at 8:40 AM, A.M. Kuchling wrote:


On Fri, Dec 05, 2008 at 05:40:46AM -, [EMAIL PROTECTED] wrote:
For most users, especially new users who have yet to be impressed  
with
Python's power, 2.x is much better.  It's not like library  
support is

one small check-box on the language's feature sheet: most of the
attractive things about Python are libraries.  Of course I am not  
free


Here I agree, sort of.  Newbies may not understand what they're giving
up in terms of libraries.  (The 'sort of' is because, having learned
3.0, learning the changes for 2.6 is certainly much easier than
learning a first programming language is.)


For possible insight, here is a current discussion on the topic:

http://www.reddit.com/r/programming/comments/7hlra/ask_progit_ive_got_the_itch_to_learn_python_since/

(note that these would be programmers interested in learning python,  
not people trying to learn programming)


-Mike


Re: [Python-Dev] n.numbits: method or property?

2008-11-11 Thread Mike Klaas

On 11-Nov-08, at 4:16 PM, Mark Dickinson wrote:


More generally, what are the guidelines for determining
when it's appropriate to make something a property rather
than a method?


Both are awkward on numeric types in python, necessitating brackets or  
a space before the dot:


(1).__doc__
1 .__doc__

I'd suggest a third alternative, which is a standalone function in math:

from math import numbits
numbits(1)
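
where numbits() could be implemented along these lines (a pure-Python
sketch of the proposed function, not an actual patch):

    def numbits(n):
        """Number of bits needed to represent abs(n)."""
        n = abs(n)
        count = 0
        while n:
            n >>= 1
            count += 1
        return count

    assert numbits(1) == 1 and numbits(255) == 8 and numbits(256) == 9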

-Mike


Re: [Python-Dev] PEP: Consolidating names and classes in the `unittest` module (updated 2008-07-15)

2008-07-15 Thread Mike Klaas

On 15-Jul-08, at 6:05 AM, Andrew Bennetts wrote:


Ben Finney wrote:

Stephen J. Turnbull [EMAIL PROTECTED] writes:

That measured only usage of unittest *within the Python standard
library*. Is that the only body of unittest-using code we need
consider?


Three more data points then:

bzr: 13228 assert* vs. 770 fail*.

Twisted: 6149 assert* vs. 1666 fail*.

paramiko: 431 assert* vs. 4 fail*.


Our internal code base:

$ ack self.assert. | wc -l
3232
$ ack self.fail. | wc -l
1124

-Mike



Re: [Python-Dev] PEP 371: Additional Discussion

2008-06-03 Thread Mike Klaas


On 3-Jun-08, at 3:53 PM, Benjamin Peterson wrote:

On Tue, Jun 3, 2008 at 5:08 PM, Jesse Noller [EMAIL PROTECTED]  
wrote:
Also - we could leave in stubs to match the threading api - Guido,  
David
Goodger and others really prefer not to continue the broken API  
of the

threading API

I agree that the threading and the pyprocessing APIs should be PEP 8
compliant, but I think 2 APIs is almost worse than one wrong one.


A cleaner way to effectuate the transition would be to leave the  
camelCase API in 2.6 (for both modules), switch to PEP 8 in py3k (for  
both modules), and provide threading3k and multiprocessing3k modules  
in 2.6 that façade the 2.6 API with the PEP 8 API.


2to3 would rewrite 'import threading3k' to 'import threading' and  
everything would work (it would warn about 'import threading' in 2.6  
code).
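
The facade could be as thin as this (a hypothetical threading3k.py
sketch; the API coverage is illustrative, not complete):

    import threading

    class Thread(threading.Thread):
        # PEP 8 spellings delegating to the 2.x camelCase API
        def is_alive(self):
            return self.isAlive()
        def get_name(self):
            return self.getName()
        def set_daemon(self, daemonic):
            self.setDaemon(daemonic)

    current_thread = threading.currentThread
    active_count = threading.activeCount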


-Mike


Re: [Python-Dev] Iterable String Redux (aka String ABC)

2008-05-28 Thread Mike Klaas


On 28-May-08, at 2:33 PM, Bill Janssen wrote:



From what's been discussed so far, I don't see any advantage

of isinstance(o, String) over hasattr(o, 'encode') or somesuch.


Look, even if there were *no* additional methods, it's worth adding
the base class, just to differentiate the class from the Sequence, as
a marker, so that those of us who want to ask isinstance(o, String)
can do so.

Personally, I'd add in all the string methods to that class, in all
their gory complexity.  Those who need a compliant class should
subclass the String base class, and override/add what they need.


I'm not sure I agree with you on the solution, but I definitely agree  
that although str/unicode are conceptually sequences of characters, it  
is rarely useful to think of them as iterables of objects, unlike all  
other Sequences.  (Note: I don't dispute that it is occasionally  
useful to treat them as such.)


In my perfect world, strings would be indexable and sliceable, but not
iterable.  A character iterator could be obtained using a new method,
such as .chars().


s = 'hello world'
list(s) # exception
set(s) # exception
tuple(s) # exception
for char in s: # exception
[ord(c) for c in s] # exception
s[2] # ok
s[::-1] # ok
for char in s.chars(): # ok
[ord(c) for c in s.chars()] # ok

Though an argument could be made against this, I consider the current  
behaviour of strings one of the few instances where purity beats  
practicality in python.  It is often the cause of errors that fail too  
late in my experience.


-Mike



Re: [Python-Dev] Iterable String Redux (aka String ABC)

2008-05-28 Thread Mike Klaas


On 28-May-08, at 5:44 PM, Greg Ewing wrote:


Mike Klaas wrote:

In my perfect world, strings would be indexable and sliceable, but
not iterable.


An object that was indexable but not iterable would
be a very strange thing. If it has __len__ and __getitem__,
there's nothing to stop you iterating over it by hand
anyway, so disallowing __iter__ would just seem perverse.


Python has a beautiful abstraction in iteration: iter() is a generic  
function that allows you lazily consume a sequence of objects, whether  
it be lists, tuples, custom iterators, generators, or what have you.   
It is trivial to write your code to be agnostic to the type of  
iterable passed-in.  Almost anything else a consumer of your code  
passes in will result in an immediate exception.


Unfortunately, python has two extremely common data types which do not  
fail when this generic function is applied to them, and instead almost  
always returns a result which is not desired.  Instead, it iterates  
over the characters of the string, a behaviour which is rarely needed  
in practice due to the wealth of methods available.


I agree that it would be perverse to disallow iterating over a
string.  I just wish that the way to do that wasn't glommed on to the
object-iteration abstraction.


As it stands, any consumer of iterables has to keep strings in mind.
It is particularly irksome when the target input is an iterable of
strings.  I recall a function that accepts a list/iterable of item
keys, hashes them, and then retrieves values based on the item hashes
(usually over the network, so it is necessary to batch requests).
This function is often used in the interactive interpreter, and it is
thus very prone to being passed a string rather than a list.  There
was no good way to prevent the (frequent) mysterious "not found"
errors save adding an explicit type check for basestring.
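
The guard ended up looking roughly like this (the function name and
signature here are made up; the isinstance check is the point):

    def get_values(keys, table):
        if isinstance(keys, basestring):
            raise TypeError('expected an iterable of keys, '
                            'not a single string')
        return [table.get(hash(k)) for k in keys]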


String already behaves slightly differently from the way other  
sequences act:  It is the only sequence for which 'seq in seq' is  
true, and the only sequence for which 'x in seq' can be true but  
'any(x==item for item in seq)' is false.  Abstractions are sometimes  
imperfect: this is why there is an explicit typecheck for strings in  
the sum() builtin.


I'll stop here as I realize that the likelihood that this will be  
accepted is terribly small, especially considering the late stage of  
the process.  But I would be willing to develop a patch that  
implements this behaviour on the off chance it is.


-Mike


Re: [Python-Dev] PEP 8: Discourage named lambdas?

2008-05-03 Thread Mike Klaas

On 2-May-08, at 11:23 PM, Scott David Daniels wrote:


Mike Klaas wrote:

... A common pattern for me is to replace an instance's method with a
lambda to add monitoring hooks or disable certain functionality:
  inst.get_foo = lambda: FakeFoo()
This is not replaceable in one line with a def (or without locals()
detritus).  Assuming this is good style, it seems odd that

  inst.get_foo = lambda: FakeFoo()
is acceptable style, but
  get_foo = lambda: FakeFoo()

But surely, none of these are great style, and in fact the lambda
lures you into using it.

I'd propose a far better use is:
   inst.get_foo = FakeFoo
or
   get_foo = FakeFoo



Sorry, that was a bad example.  It is obviously silly if the return  
value of the function is callable.


-Mike



Re: [Python-Dev] PEP 8: Discourage named lambdas?

2008-05-02 Thread Mike Klaas

On 2-May-08, at 4:03 PM, Terry Reedy wrote:


Some people write
   somename = lambda args: expression
instead of the more obvious (to most people) and, dare I say, standard
   def somename(args): return expression

The difference in the result (the only one I know of) is that the code
and function objects get the generic name 'lambda' instead of the more
informative (in repr() output or tracebacks) 'somename'.  I consider
this a disadvantage.

In the absence of any compensating advantages (other than the trivial
saving of 3 chars), I consider the def form to be the proper Python
style, to the point I think it should be at least recommended for the
stdlib in the Programming Recommendations section of PEP 8.

There are currently uses of named lambdas at least in urllib2.  This to
me is a bad example for new Python programmers.

What do our style mavens think?


I'm not a style maven, but I'll put forward why I don't think this is  
bad style.  Most importantly, these statements can result from  
sensible changes from what is (I believe) considered good style.


For example, consider:

registerCallback(lambda: frobnicate(7))

what if there are two places where the callback needs to be registered?

registerCallback(lambda: frobnicate(7))
registerCallback2(lambda: frobnicate(7))

DRY leads to factoring this out into a variable in a straightforward  
manner:


callback = lambda: frobnicate(7)
registerCallback(callback)
registerCallback2(callback)

Another thing to consider is that the def() pattern is only possible
when the bound variable has no dots.  A common pattern for me is to
replace an instance's method with a lambda to add monitoring hooks or
disable certain functionality:


inst.get_foo = lambda: FakeFoo()

This is not replaceable in one line with a def (or without locals()
detritus).  Assuming this is good style, it seems odd that

inst.get_foo = lambda: FakeFoo()

is acceptable style, but

get_foo = lambda: FakeFoo()

isn't.

(I also happen to think that the def pattern is less clear in some  
situations, but that speaks more to personal taste so isn't  
particularly relevant)


-Mike


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread Mike Klaas


On 22-Apr-08, at 3:31 AM, M.-A. Lemburg wrote:



I don't think that should be part of the standard library. People
will mistake what it tells them for certain.


+1

I also think that it's better to educate people to add (correct)
encoding information to their text data, rather than give them a
guess mechanism...


That is a fallacious alternative: the programmers that need encoding  
detection are not the same people who are omitting encoding information.


I only have a small opinion on whether charset detection should appear  
in the stdlib, but I am somewhat perplexed by the arguments in this  
thread.  I don't see how inclusion in the stdlib would make people  
more inclined to think that the algorithm is always correct.  In terms  
of the need of this functionality:


Martin wrote:

Can you please explain why that is? Web programs should not normally
have the need to detect the encoding; instead, it should be specified
always - unless you are talking about browsers specifically, which
need to support web pages that specify the encoding incorrectly.


Any program that needs to examine the contents of
documents/feeds/whatever on the web needs to deal with
incorrectly-specified encodings (which, sadly, is rather common).  The
set of programs that need this functionality is probably the same set
that needs BeautifulSoup--I think that set is larger than just
browsers <grin>


-Mike


Re: [Python-Dev] Encoding detection in the standard library?

2008-04-22 Thread Mike Klaas


On 22-Apr-08, at 2:16 PM, Martin v. Löwis wrote:



Any program that needs to examine the contents of
documents/feeds/whatever on the web needs to deal with
incorrectly-specified encodings


That's not true. Most programs that need to examine the contents of
a web page don't need to guess the encoding. In most such programs,
the encoding can be hard-coded if the declared encoding is not
correct. Most such programs *know* what page they are webscraping,
or else they couldn't extract the information out of it that they
want to get at.


I certainly agree that if the target set of documents is small enough  
it is possible to hand-code the encoding.  There are many  
applications, however, that need to examine the content of an  
arbitrary, or at least non-small set of web documents.  To name a few  
such applications:


 - web search engines
 - translation software
 - document/bookmark management systems
 - other kinds of document analysis (market research, seo, etc.)


As for feeds - can you give examples of incorrectly encoded one
(I don't ever use feeds, so I honestly don't know whether they
are typically encoded incorrectly. I've heard they are often XML,
in which case I strongly doubt they are incorrectly encoded)


I also don't have much experience with feeds.  My statement is based  
on the fact that chardet, the tool that has been cited most in this  
thread, was written specifically for use with the author's feed  
parsing package.



As for whatever - can you give specific examples?


Not that I can substantiate.  Documents  feeds covers a lot of what  
is on the web--I was only trying to make the point that on the web,  
whenever an encoding can be specified, it will be specified  
incorrectly for a significant chunk of exemplars.



(which, sadly, is rather common). The set of programs that need this
functionality is probably the same set that needs BeautifulSoup--I
think that set is larger than just browsers <grin>


Again, can you give *specific* examples that are not web browsers?
Programs needing BeautifulSoup may still not need encoding guessing,
since they still might be able to hard-code the encoding of the web
page they want to process.


Indeed, if it is only one site it is pretty easy to work around.  My  
main use of python is processing and analyzing hundreds of millions of  
web documents, so it is pretty easy to see applications (which I have  
listed above).  I think that libraries like Mark Pilgrim's FeedParser  
and BeautifulSoup are possible consumers of guessing as well.



In any case, I'm very skeptical that a general guess encoding
module would do a meaningful thing when applied to incorrectly
encoded HTML pages.


Well, it does.  I wish I could easily provide data on how often it is  
necessary over the whole web, but that would be somewhat difficult to  
generate.  I can say that it is much more important to be able to  
parse all the different kinds of encoding _specification_ on the web  
(Content-Type/Content-Encoding/meta http-equiv tags, etc), and the  
malformed cases of these.


I can also think of good arguments for excluding encoding detection  
for maintenance reasons: is every case of the algorithm guessing wrong  
a bug that needs to be fixed in the stdlib?  That is an unbounded  
commitment.


-Mike


Re: [Python-Dev] socket recv on win32 can be extremly delayed (python bug?)

2008-04-22 Thread Mike Klaas

Hi,

This is not a python-specific problem. See
http://en.wikipedia.org/wiki/Nagle's_algorithm
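
If the registry tweak is off the table, the usual application-side
workaround is to disable Nagle's algorithm on the client socket
(untested against your setup, but the socket option itself is standard):

    import socket

    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    s.connect(('server', 80))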

-Mike

On 17-Apr-08, at 3:08 AM, Robert Hölzl wrote:

hello,

I tried to implement a simple python XMLRPC service on a win32  
environment (client/server code inserted below).
The profiler of the client told me, that a simple function call  
needs about 200ms (even if I run it in a loop, the time needed per  
call stays the same).


After analysing the problem with etherreal I found out, that the  
XMLRPC request is transmitted via two TCP packets. One containing  
the HTTP header and one containting the data. But the acknowledge to  
the first TCP packet is delayed by 200ms.


I tried around on the server side and found out that if the server  
reads exactly all bytes transfered in the first TCP frame (via  
socket.recv()), the next socket.recv(), even if reading only one  
byte, needs about 200 ms. But if I read one byte less than  
transfered in the first TCP frame and then reading 2 bytes  
(socket.recv(2)) there is no delay, although the same total amount  
of data was read.


After some googling I found the website
http://support.microsoft.com/?scid=kb%3Ben-us%3B823764&x=12&y=15,
which proposed a workaround (modifying the registry entry for the
tcp/ip driver) that did work. But modifying the client's registry
settings is no option for us.


Is there anybody who knows how to solve the problem? Or is it even a
problem of the python socket implementation?


By the way: I tested Win2000 SP4 and WinXP SP2 with Python 2.3.3 and
Python 2.5.1 each.


CLIENT:
--
import xmlrpclib
import profile
server = xmlrpclib.ServerProxy("http://server:80")
profile.run('server.test(1,2)')

SERVER:
--
import SimpleXMLRPCServer
def test(a,b): return a+b
server = SimpleXMLRPCServer.SimpleXMLRPCServer( ('', 80) )
server.register_function(test)
server.serve_forever()

--
Best Regards,

Robert Hölzl
BALTECH AG




Re: [Python-Dev] svnmerge and added files

2008-03-20 Thread Mike Klaas
On 20-Mar-08, at 2:32 PM, Christian Heimes wrote:

 Martin v. Löwis schrieb:
 It seems that recently, a number of merges broke in the sense
 that files added to the trunk were not merged into the
 3k branch.

 Is that a general problem with svnmerge? Should that be
 fixed to automatically do a svn add when merging changes
 that included file additions and removals?

 It sometimes happens when I do a svnmerge, revert the merge with svn
 revert -R and do a second svnmerge. Files added by the first svnmerge
 aren't added to the commit list for the second merge. Unfortunately
 svnmerge.py doesn't warn me when the file already exists.

It may not warn explicitly about that, but it certainly does warn:

M ...
Skipped path/to/missing/file...
M ...
M ...

As someone who deals with svnmerge.py a lot, I find that it is  
appropriate to treat Skipped as critical as a conflict.

I too wish that it was more explicit in this respect.
-Mike


Re: [Python-Dev] 2.5.2 release coming up

2008-01-29 Thread Mike Klaas
On 22-Jan-08, at 8:47 PM, Guido van Rossum wrote:

 While the exact release schedule for 2.5.2 is still up in the air, I
 expect that it will be within a few weeks. This means that we need to
 make sure that anything that should go into 2.5.2 goes in ASAP,
 preferably this week. It also means that we should be very careful
 what goes in though -- and we should be paying particular attention to
 stability on all platforms! Fortunately it looks like quite a few 2.5
 buildbots are green: http://python.org/dev/buildbot/2.5/

 I propose that anything that ought to go into 2.5.2 (or should be
 reviewed for suitability to go into it) should be marked urgent in
 the tracker, *and* have its version set to (or include) Python 2.5.

I'm not sure if it is particularly urgent because of the rarity of  
occurrence, but I discovered a bug that causes httplib to hang  
indefinitely given some rarely-occurring input in the wild.  To  
reproduce:

python -c 'import urllib2; urllib2.urlopen("http://www.hunteros.com").read()'

WARNING: the page was tagged by one of our users and is definitely NSFW.

Again, it seems to occur very rarely, but the behaviour is quite  
painful and the fix trivial (see http://bugs.python.org/issue1966).

Thanks,
-Mike


Re: [Python-Dev] Contributing to Python

2008-01-04 Thread Mike Klaas
On 3-Jan-08, at 1:07 PM, Guido van Rossum wrote:

 On Jan 3, 2008 11:49 AM, Fred Drake [EMAIL PROTECTED] wrote:

 Python 2.6 seems to be entirely targeted at people who really want to
 be on Python 3, but have code that will need to be ported.  I
 certainly don't view it as interesting in it's own right.

 It will be though -- it will have genuine new features -- yes,
 backported from 3.0, but new features nevertheless, and in a
 compatible fashion.

I think that there are still tons of people like me for whom 3.0 is  
still a future concern that is too big to devote cycles to at the  
moment, but are still very much interested in improving the 2.x  
series (which improves 3.0) at the same time.

I've been inspired by this thread to start working on a few 2.6 items
that I had in mind, starting with http://bugs.python.org/issue1663329,
which mostly just needed documentation and cleanup (now done).

Question: should patches include edits to whatsnew.rst, or is the  
committer responsible for adding a note?

-Mike 


Re: [Python-Dev] Request for inclusion in 2.5.2 (5-for-1)

2007-11-02 Thread Mike Klaas
On 2-Nov-07, at 6:57 AM, Guido van Rossum wrote:


 Since people are already jumping on those bugs but nobody has voiced
 an opinion on your own patch, let me say that I think it's a good
 patch, and I want it in 2.6, but I'm reluctant to add it to 2.5.2 as
 it goes well beyond a bugfix (adding a new C API and all that).

Thanks for looking at it!

Is there a better way of exposing some c-helper code for a stdlib  
module written in python?  It seems that the canonical pattern is to  
write a separate extension module called _modulename and import the  
functionality from there, but that seemed like a significantly more  
invasive patch.

Might it help to tack on the helper function in posix only, deleting  
it from the os namespace?

Thanks again,
-Mike



[Python-Dev] Request for inclusion in 2.5.2 (5-for-1)

2007-11-01 Thread Mike Klaas
Issue http://bugs.python.org/issue1663329 details an annoyance in the  
subprocess module that has affected several users, including me.   
Essentially, closing hundreds of thousands of file descriptors by  
round-tripping through the python exception machinery is very slow,  
taking hundreds of milliseconds and at times many seconds.  The  
proposed fix is to write this loop in c.  The c function is but a  
handful of lines long.  I purposefully kept the implementation  
trivial so that it will work on all unix variants (there is another  
issue that contains a super-duper optimization for AIX, and other  
possibilities exist for Solaris, but the simple fix yields a ten-fold  
speedup everywhere but windows, so I didn't think that it was worth  
the complexity).

Though technically relating only to performance, I consider this a  
bug-fix candidate as mysterious multi-second delays when launching a  
subprocess end up making the functionality of close_fds unusable on  
some platform configurations (namely, those with high MAX_FD set).

It would be great to see this is 2.5.2.  Understanding that issue  
evaluation takes significant effort, I've done some evaluation/triage  
on other open tickets:

See issues for detailed comments.

http://bugs.python.org/issue1516330:  No clear problem, invalid  
patch.  Recommend rejection.

http://bugs.python.org/issue1516327:  No clear problem, no patch.  
Recommend closing.

http://bugs.python.org/issue1705170:  reproduced.  Conjecture as to  
why it is occurring, but I don't know the guts well enough to propose  
a decent fix.

http://bugs.python.org/issue1773632: tested patch.  Recommend  
accepting unless there are things I don't know about this mysterious  
_xmlrpclib extension (which there doubtlessly are)

http://bugs.python.org/issue738948: Rather old PEP that has gathered
no comments.  Calling it a "PEP" is generous--it is really just a
link to an academic paper with a note about how this might be
integrated into Stackless.

Thanks,
-Mike


Re: [Python-Dev] Adding concat function to itertools

2007-09-28 Thread Mike Klaas
On 28-Sep-07, at 10:45 AM, Raymond Hettinger wrote:

 [Bruce Frederiksen]
  I've added a new function to itertools called 'concat'.  This  
 function is
 much like chain, but takes all of the iterables as a single  
 argument.

 Any practical use cases or is this just a theoretical improvement?

 For Py2.x, I'm not willing to unnecessarily expand the module.
 However, for Py3k, I'm open to changing the signature for chain().

For me, a fraction of chain() uses are of the * variety:

d = defaultdict(list)
allvals = chain(*d.values())

return chain(*imap(cache.__getitem__, keylist))

Interestingly, they seem to all have something to do with dictionary  
values() that are themselves iterable.

-Mike




Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Mike Klaas
On 8-Aug-07, at 2:28 AM, Nick Maclaren wrote:

 I have needed to push my stack to teach REs (don't ask), and am
 taking a look at the RE code.  I may be able to extend it to support
 RFE 694374 and (more importantly) atomic groups and possessive
 quantifiers.  While I regard such things as revolting beyond belief,
 they make a HELL of a difference to the efficiency of recognising
 things like HTML tags in a morass of mixed text.

+1.  I would use such a feature.

 The other approach, which is to stick to true regular expressions,
 and wholly or partially convert to DFAs, has already been rendered
 impossible by even the limited Perl/PCRE extensions that Python
 has adopted.

Impossible?  Surely, a sufficiently-competent re engine could detect  
when a DFA is possible to construct?

-Mike


Re: [Python-Dev] Regular expressions, Unicode etc.

2007-08-08 Thread Mike Klaas
In 8-Aug-07, at 12:47 PM, Nick Maclaren wrote:


 The other approach, which is to stick to true regular expressions,
 and wholly or partially convert to DFAs, has already been rendered
 impossible by even the limited Perl/PCRE extensions that Python
 has adopted.

 Impossible?  Surely, a sufficiently-competent re engine could detect
 when a DFA is possible to construct?

 I doubt it.  While it isn't equivalent to the halting problem, it IS
 an intractable one!  There are two problems:

 Firstly, things like backreferences are an absolute no-no.  They
 are not regular, and REs with them in cannot be converted to DFAs.
 That could be 'solved' by a parser that kicked out such constructions,
 but it would get screams from many users.

 Secondly, anything involving explicit or implicit negation can lead
 to (if I recall) a super-exponential explosion in the size of the
 DFA.  That could be 'solved' by imposing a limit, but few people
 would be able to predict when it would bite.

Right.  The analysis I envisioned would be more along the lines of
"if troublesome RE extensions are used, do not attempt to construct a
DFA".  It could even be exposed via an alternate api
(re.compile_dfa()) that admitted a subset of the usual grammar.
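
The feature test could be as crude as scanning the pattern source (a
heuristic sketch only -- it both over- and under-approximates what a
real parser would decide):

    import re

    # backreferences and lookaround have no straightforward DFA translation
    NON_REGULAR = re.compile(r'\\[1-9]|\(\?P=|\(\?<|\(\?=|\(\?!')

    def dfa_compilable(pattern):
        return NON_REGULAR.search(pattern) is None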

 Thirdly, I would require notice of the question of whether capturing
 parentheses could be supported, and what semantic changes would be
 to which were set and how.

Capturing groups are rather integral to the python regex api and, as  
you say, a major difficulty for DFA-based implementations.  Sounds  
like a task best left to a thirdparty package.

-Mike


Re: [Python-Dev] Fwd: [ python-Patches-1744382 ] Read Write lock

2007-07-06 Thread Mike Klaas
On 6-Jul-07, at 6:45 AM, Yaakov Nemoy wrote:


 I can do the other three parts, but I am wondering, how do I write a
 deterministic test unit for my patch?  How is it done with the
 threading model in python in general?

I don't know how it is done in general, but for reference, here are  
some of the unittests for my read/write lock class:

 def testReadCount(self):
 wrlock = ReadWriteLock()
 read, write = wrlock.reader, wrlock.writer

 self.assertEqual(wrlock.readerCount, 0)
 read.acquire()
 self.assertEqual(wrlock.readerCount, 1)
 read.acquire()
 self.assertEqual(wrlock.readerCount, 2)
 read.release()
 self.assertEqual(wrlock.readerCount, 1)
 read.release()
 self.assertEqual(wrlock.readerCount, 0)

 def testContention(self):
 wrlock = ReadWriteLock()
 read, write = wrlock.reader, wrlock.writer

 class Writer(Thread):
 gotit = False
 def run(self):
 write.acquire()
 self.gotit = True
 write.release()
 writer = Writer()

 self.assertEqual(wrlock.readerCount, 0)
 read.acquire()
 self.assertEqual(wrlock.readerCount, 1)
 writer.start()
 self.assertFalse(writer.gotit)

 read.acquire()
 self.assertEqual(wrlock.readerCount, 2)
 self.assertFalse(writer.gotit)

 read.release()
 self.assertEqual(wrlock.readerCount, 1)
 self.assertFalse(writer.gotit)

 read.release()
 self.assertEqual(wrlock.readerCount, 0)
 time.sleep(.1)
 self.assertTrue(writer.gotit)

 def testWRAcquire(self):
 wrlock = ReadWriteLock()
 read, write = wrlock.reader, wrlock.writer

 self.assertEqual(wrlock.readerCount, 0)
 write.acquire()
 write.acquire()
 write.release()
 write.release()

 read.acquire()
 self.assertEqual(wrlock.readerCount, 1)
 read.acquire()
 self.assertEqual(wrlock.readerCount, 2)
 read.release()
 self.assertEqual(wrlock.readerCount, 1)
 read.release()
 self.assertEqual(wrlock.readerCount, 0)
 write.acquire()
 write.release()

 def testOwnAcquire(self):
 wrlock = ReadWriteLock()
 read, write = wrlock.reader, wrlock.writer

 class Writer(Thread):
 gotit = False
 def run(self):
 write.acquire()
 self.gotit = True
 write.release()
 writer = Writer()

 self.assertEqual(wrlock.readerCount, 0)
 read.acquire()
 self.assertEqual(wrlock.readerCount, 1)
 writer.start()
 self.assertFalse(writer.gotit)

 # can acquire the write lock if only
 # this thread has the read lock
 write.acquire()
 write.release()

 read.acquire()
 self.assertEqual(wrlock.readerCount, 2)
 self.assertFalse(writer.gotit)

 read.release()
 self.assertEqual(wrlock.readerCount, 1)
 self.assertFalse(writer.gotit)

 read.release()
 self.assertEqual(wrlock.readerCount, 0)
 time.sleep(.1)
 self.assertTrue(writer.gotit)


 def testDeadlock(self):
 wrlock = ReadWriteLock()
 read, write = wrlock.reader, wrlock.writer

 errors = []

 # a situation which can readily deadlock if care isn't taken
 class LockThread(threading.Thread):
 def __init__(self):
 threading.Thread.__init__(self)
 self.q = Queue.Queue()
 def run(self):
 while True:
 task, lock, delay = self.q.get()
 if not task:
 break
 time.sleep(delay)
 if task == 'acquire':
 for delay in waittime(maxTime=5.0):
 if lock.acquire(False):
 break
 time.sleep(delay)
 else:
  errors.append("Couldn't acquire %s" % str(lock))
 else:
 lock.release()

 thrd = LockThread()
 thrd.start()

 thrd.q.put(('acquire', read, 0))
 time.sleep(.2)
 read.acquire()
 thrd.q.put(('acquire', write, 0))
 thrd.q.put(('release', write, .5))
 thrd.q.put(('release', read, 0))
 write.acquire()
 time.sleep(0.0)
 write.release()
 read.release()

 # end
 thrd.q.put((None, None, None))
 thrd.join()

 self.assertFalse(errors, "Errors: %s" % errors)


Re: [Python-Dev] Py2.6 buildouts to the set API

2007-05-18 Thread Mike Klaas
On 18-May-07, at 6:34 PM, Raymond Hettinger wrote:

 Here some ideas that have been proposed for sets:

 * New method (proposed by Shane Holloway):  s1.isdisjoint(s2).
 Logically equivalent to "not s1.intersection(s2)" but has an early-out
 if a common member is found.  The speed-up is potentially large given
 two big sets that may largely overlap or may not intersect at all.
 There is also a memory savings since a new set does not have to be
 formed and then thrown away.

+1.  Disjointness verification is one of my main uses for set(), and  
though I don't think that the early-out condition would trigger often  
in my code, it would increase readability.
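
In pure Python the early-out version is just (a sketch of the
semantics, not the C implementation):

    def isdisjoint(s1, s2):
        # iterate over the smaller set; bail on the first common member
        if len(s2) < len(s1):
            s1, s2 = s2, s1
        for x in s1:
            if x in s2:
                return False
        return True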

 * Additional optional arguments for basic set operations to allow
 chained operations.  For example, s=s1.union(s2, s3, s4) would be
 logically equivalent to s=s1.union(s2).union(s3).union(s4) but would
 run faster because no intermediate sets are created, copied, and
 discarded.  It would run as if written:  s=s1.copy(); s.update(s2);
 s.update(s3); s.update(s4).

It's too bad that this couldn't work with the binary operator spelling:

s = s1 | s2 | s3 | s4

 * Make sets listenable for changes (proposed by Jason Wells):

 s = set(mydata)
 def callback(s):
  print 'Set %d now has %d items' % (id(s), len(s))
 s.listeners.append(callback)
 s.add(existing_element)   # no callback
 s.add(new_element)# callback

-1 on adding this to the base set type: it seems too complex for a
built-in.  Also, there are various possible semantics that might be
desirable, such as receiving the added element, or returning False to
prevent addition.

The proper place is perhaps a subclass of set with a magic method  
(analogous to defaultdict/__missing__).
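
Roughly (a hypothetical sketch of that subclass approach):

    class notifyingset(set):
        def add(self, element):
            if element not in self:
                set.add(self, element)
                self.__added__(element)
        def __added__(self, element):
            # override in subclasses, like defaultdict.__missing__
            pass

    class loudset(notifyingset):
        def __added__(self, element):
            print 'Set %d now has %d items' % (id(self), len(self))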

-Mike


Re: [Python-Dev] Summary of Tracker Issues

2007-05-15 Thread Mike Klaas
On 15-May-07, at 12:32 AM, Georg Brandl wrote:


 There are two problems with this:
 * The set of questions is limited, and bots can be programmed to  
 know them all.

Sure, but if someone is customizing their bot to python's issue
tracker, in all likelihood they would have to be dealt with specially
anyway.  Foiling automated bots should be the first priority--they
should represent the vast majority of cases.


 * Even programmers might not immediately know an answer, and I can
   understand them turning away on that occasion (take for example the
   "name-binding" term).

It wouldn't be hard to make the questions so easy that anyone with
business submitting a bug report would know the answer:

What python keyword is used to define a function?
What file extension is typically used for python source files?
etc.

If there is still worry, then a failed answer could simply be the  
moderation trigger.

-Mike




Re: [Python-Dev] PEP 30XZ: Simplified Parsing

2007-05-04 Thread Mike Klaas
On 5/4/07, Baptiste Carvello [EMAIL PROTECTED] wrote:

 maybe we could have a dedent literal that would remove the first newline and
 all indentation so that you can just write:

 call_something( d'''
  first part
  second line
  third line
  ''' )

Surely

from textwrap import dedent as d

is close enough?
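
That is (a runnable sketch; the trailing backslash keeps the first
line of the literal from being blank):

    from textwrap import dedent as d

    print d('''\
        first part
        second line
        third line
        ''')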

-Mike


Re: [Python-Dev] (no subject)

2007-04-30 Thread Mike Klaas
On 4/30/07, Greg Ewing [EMAIL PROTECTED] wrote:
 JOSHUA ABRAHAM wrote:
  I was hoping you guys would consider creating  function in os.path or
  otherwise that would find the full path of a file when given only it's base
  name and nothing else.I have been made to understand that this is not
  currently possible.

 Does os.path.abspath() do what you want?

 If not, what exactly *do* you want?

probably:

import os

def find_in_path(filename):
    for path in os.environ['PATH'].split(os.pathsep):
        candidate = os.path.join(path, filename)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)

-Mike


Re: [Python-Dev] regexp in Python

2007-03-23 Thread Mike Klaas
On 3/23/07, Fredrik Lundh [EMAIL PROTECTED] wrote:
 Bartlomiej Wolowiec wrote:

  For some time I have been interested in regular expressions and
  Finite State Machines.  Recently, I saw that Python uses Secret Labs'
  Regular Expression Engine, which very often works too slow. Its
  pessimistic time complexity is O(2^n), although other solutions run
  in O(n*m) (or O(n*m^2); m is the length of the regular expression and
  n is the length of the text; introduction to the problem:
  http://swtch.com/~rsc/regexp/regexp1.html )

 that article almost completely ignores all the subtle capturing and
 left-to-right semantics that a perl-style engine requires, though.
 trust me, this is a much larger can of worms than you might expect.
 but if you're ready to open it, feel free to hack away.

  major part of regular expressions

 the contrived example you used has nothing whatsoever to do with
 major part of regular expressions as seen in the wild, though.  I'd
 be much more interested in optimizations that focuses on patterns
 you've found in real code.

A fruitful direction that is not as ambitious as re-writing the entire
engine would be to add independent group assertions to python's RE
syntax [ (?> ... ) in perl].  They are rather handy for optimizing the
malperforming cases alluded to here (which rarely occur as the OP
posted, but tend to crop up in less malignant forms).
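
(A contrived sketch of the malperforming case, for the curious--the
pattern and timing are illustrative, not from real code:)

    import re, time

    t0 = time.time()
    re.match(r'(a+)+$', 'a' * 25 + 'b')   # fails only after exponential
    print time.time() - t0                # backtracking: seconds, not usecs

    # an atomic group -- (?>a+)+$ in perl syntax -- commits to the inner
    # match, so the engine would fail immediately instead.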

-Mike


Re: [Python-Dev] deprecate commands.getstatus()

2007-03-22 Thread Mike Klaas
On 3/22/07, Greg Ewing [EMAIL PROTECTED] wrote:
 Titus Brown wrote:

  I could add in a 'system'-alike call easily enough; that was suggested.
  But I think
 
returncode = subprocess.call(program)
 
  is pretty simple, isn't it?

 Something to keep in mind is that system() doesn't
 directly launch a process running the command, it
 uses a shell. So it's not just simple sugar for
 some subprocess.* call.

 >>> subprocess.call("ls | grep tmp", shell=True)
svn-commit.2.tmp
svn-commit.tmp

The more important difference is the encoding of the return value:
system() has magic to encode signal-related termination of the child
process.
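
A unix-only sketch of the difference:

    import os, subprocess

    # os.system() returns the raw wait status: exit code in the high
    # byte, terminating signal (if any) in the low byte.
    status = os.system('kill -TERM $$')          # the shell kills itself
    print os.WIFSIGNALED(status), os.WTERMSIG(status)     # True 15

    # subprocess.call() returns a decoded code: the exit status, or
    # -N if the child died from signal N.
    print subprocess.call(['sh', '-c', 'kill -TERM $$'])  # -15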

-Mike


Re: [Python-Dev] Proposal to revert r54204 (splitext change)

2007-03-15 Thread Mike Klaas
On 3/15/07, Mike Krell [EMAIL PROTECTED] wrote:

 Here is a point of confusion.  Bear in mind I'm running this under
 windows, so explorer happily reports that .emacs has a type of
 emacs.  (In windows, file types are registered in the system based
 on the extension -- all the characters following the last dot.  An
 unregistered extension is listed as its own type.  Thus files ending
 in .txt are listed as type Text Document, but files ending in
 .emacs are listed as type emacs because it's an unregistered
 extension.)

Unix-derived files prepended with a dot (like .emacs) are not meant to
be interpreted as a file type.  It may be useful on occasion when
using windows, but it certainly is not the intent of a dotfile.

The following files reside in my /tmp:
.X0-lock
.X100-lock
.X101-lock
.X102-lock
.X103-lock
.X104-lock
.X105-lock
.X106-lock
.X11-unix
.X99-lock

...which are certainly not all unnamed files of different type.

 I often sort files in the explorer based on type, and I want a file
 and all its backups to appear next to each other in such a sorted
 list.  That's exactly why I rename the files the way I do.
 Thus, .1.emacs is what I want, and .emacs.1 is a markedly inferior
 and unacceptable alternative.  That's what I'm referring to by
 extension preservation.

Unacceptable?  Your code fails in (ISTM) the more common case of an
extensionless file.
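
For reference, a sketch of the post-r54204 behaviour as I understand
it (leading dots no longer count as extension separators):

    import os.path

    os.path.splitext('.emacs')    # ('.emacs', '') -- was ('', '.emacs')
    os.path.splitext('.1.emacs')  # ('.1', '.emacs')
    os.path.splitext('foo')       # ('foo', '') -- the extensionless case
    os.path.splitext('foo.txt')   # ('foo', '.txt')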

-Mike


Re: [Python-Dev] Patch 1644818: Allow importing built-in submodules

2007-03-12 Thread Mike Klaas
On 3/12/07, Miguel Lobo [EMAIL PROTECTED] wrote:

  Yet, the same can be said for most other patches: they are all for the
  benefit of users running into the same respective problems.

  Agreed.  What I mean is that this fasttrack system where the submitter has
 to do some extra work seems to imply that accepting the patch somehow
 benefits the submitter.  In fact I'm probably the person the patch will
 benefit least, because I have already run into the problem and know how to
 solve it.  I feel responsible for defending the patch since I've written it
 and I know the problem it fixes and my solution better than anybody else,
 but I don't see how that responsibility extends to having to do extra
 unrelated work to have the patch accepted.

It is certainly not your _responsibility_ to review additional patches
to get yours accepted; even without doing so, it will likely be accepted,
eventually (assuming it is correct).

As far as I understand, Martin's offer is purely a personal one:
there is a patch backlog, and if you help clear it out, he will help
your patch get processed faster.

cheers,
-Mike


Re: [Python-Dev] locals(), closures, and IronPython...

2007-03-06 Thread Mike Klaas
On 3/6/07, Greg Ewing [EMAIL PROTECTED] wrote:

 Although you can get a similar effect now by doing

    def __init__(self, **kwds):
        args = dict(prec=None, rounding=None,
                    traps=None, flags=None,
                    _rounding_decision=None,
                    Emin=None, Emax=None,
                    capitals=None, _clamp=0,
                    _ignored_flags=None)
        args.update(kwds)
        for name, value in args.items():
            ...

 So, no need for locals() here.

Yes, that is the obvious approach.  But it is painful to abandon the
introspectable signature.

There's nothing quite like running help(func) and getting *args,
**kwargs as the documented parameter list.
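
Compare (made-up names):

    def ctx(prec=None, rounding=None, Emin=None, Emax=None):
        pass

    def ctx2(**kwargs):
        pass

    help(ctx)    # documents the real parameter names
    help(ctx2)   # documents only (**kwargs)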

-Mike


Re: [Python-Dev] bool conversion wart?

2007-02-22 Thread Mike Klaas
On 2/22/07, Neal Becker [EMAIL PROTECTED] wrote:

 Well consider this:
 >>> str(4)
 '4'
 >>> int(str(4))
 4
 >>> str(False)
 'False'

 >>> bool(str(False))
 True

 Doesn't this seem a bit inconsistent?

Virtually no python objects accept a stringified version of themselves
in their constructor:

>>> str({})
'{}'
>>> dict('{}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #0 has length 1; 2 is required
>>> str([])
'[]'
>>> list('[]')
['[', ']']

Python is not Perl.
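
(The closest thing to a round-trip guarantee is repr() plus eval()--or,
more safely, ast.literal_eval() once 2.6 arrives:)

    >>> eval(repr({'a': [1, 2]}))
    {'a': [1, 2]}
    >>> eval(repr(False))
    False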

-Mike


Re: [Python-Dev] bool conversion wart?

2007-02-22 Thread Mike Klaas
On 2/22/07, Neal Becker [EMAIL PROTECTED] wrote:

 Except, all the numeric types do, including int, float, and complex.  But
 not bool.

Oh?

In [5]: str(complex(1, 2))
Out[5]: '(1+2j)'

In [6]: complex(str(complex(1, 2)))
---------------------------------------------------------------------------
<type 'exceptions.ValueError'>: complex() arg is a malformed string


 In fact, this is not just academic.  The fact that other numeric
 types act this way leaves a reasonable expectation that bool will.
 Instead, bool fails in _the worst possible way_: it silently gives a _wrong
 result_.

I'd debate the assertion that 'bool' is a numeric type (despite being
a subclass of int).

For bool() to return anything other than the value of the python
expression evaluated in boolean context would be _lunacy_ and there is
absolutely no chance that it will be changed.
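
If you really need to parse a stringified bool, be explicit about it
(a sketch):

    >>> bool('False')    # any non-empty string is true
    True
    >>> str(False) == 'True'
    False
    >>> str(True) == 'True'
    True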

-Mike


Re: [Python-Dev] Summary of dynamic attribute access discussion

2007-02-13 Thread Mike Klaas
On 2/13/07, Josiah Carlson [EMAIL PROTECTED] wrote:

 As for people who say, but getattr, setattr, and delattr aren't used;
 please do some searches of the Python standard library.  In a recent
 source checkout of the trunk Lib, there are 100+ uses of setattr, 400+
 uses of getattr (perhaps 10-20% of which being the 3 argument form), and
 a trivial number of delattr calls.  In terms of applications where
 dynamic attribute access tends to happen; see httplib, urllib, smtpd,
 the SocketServer variants, etc.

Another data point:  on our six-figure loc code base, we have 123
instances of getattr, 30 instances of setattr, and 0 instances of
delattr.  There are 5 instances of setattr( ... getattr( ... ) ) on
one line (and probably a few more that grep didn't pick up that span
multiple lines).
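
(The pattern in question looks roughly like this--names made up:)

    class NS(object):
        pass

    src, dst = NS(), NS()
    src.host, src.port = 'localhost', 8080
    # copy a set of attributes dynamically:
    for name in ('host', 'port'):
        setattr(dst, name, getattr(src, name))
    print dst.host, dst.port    # localhost 8080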

As a comparison, enumerate (that I would have believed was much more
frequent a priori), is used 67 times, and zip/izip 165 times.

+1 on .[] notation and the idea in general.

-Mike


Re: [Python-Dev] Summary of dynamic attribute access discussion

2007-02-13 Thread Mike Klaas
On 2/13/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

 Mike As a comparison, enumerate (that I would have believed was much
 Mike more frequent a priori), is used 67 times, and zip/izip 165 times.

 But (get|set|has)attr has been around much longer than enumerate.  I'm
 almost certain they existed in 1.5, and perhaps as far back as 1.0.  If you
 really want to compare the two, go back to your code baseline before
 enumerate was added to Python (2.3?) and subtract from your counts all the
 *attr calls that existed then and then compare the adjusted counts with
 enumerate.

The entire codebase was developed post-2.4, and I am a bit of an
enumerate-nazi, so I don't think that is a concern <g>.

 Given that you have more uses of zip/izip maybe we should be discussion
 syntactic support for that instead. ;-)

There are even more instances of len()... len(seq) -> |seq|? <g>

-Mike


Re: [Python-Dev] Summary of dynamic attribute access discussion

2007-02-13 Thread Mike Klaas
On 2/13/07, Greg Ewing [EMAIL PROTECTED] wrote:
 Mike Klaas wrote:

  As a comparison, enumerate (that I would have believed was much more
  frequent a priori), is used 67 times, and zip/izip 165 times.

 By that argument, we should be considering a special syntax
 for zip and izip before getattr.

I don't really buy that.  Frequency of use must be balanced against
the improvement in legibility.  Assuming that my figures bear some
correspondence to typical usage patterns, enumerate() was introduced
despite the older idiom of for i, item in zip(xrange(len(seq)), seq):
being less frequent than getattr.  Similarly, I see no clamor to add
syntactic support for len().  Its current usage is clear.
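
(i.e., the legibility win that got enumerate() in despite the old
idiom's lower frequency:)

    seq = ['a', 'b', 'c']

    # the pre-2.3 idiom:
    for i, item in zip(xrange(len(seq)), seq):
        print i, item

    # since 2.3:
    for i, item in enumerate(seq):
        print i, item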

[note: this post is not continuing to argue in favour of the proposal]

-Mike


Re: [Python-Dev] Interning string subtype instances

2007-02-12 Thread Mike Klaas
On 2/12/07, Hrvoje Nikšić [EMAIL PROTECTED] wrote:

 cause problems for other users of the interned string.  I agree with the
 reasoning, but propose a different solution: when interning an instance
 of a string subtype, PyString_InternInPlace could simply intern a copy.

Interning currently requires an external reference to prevent garbage
collection (I believe).  What will hold a reference to the string
copy?

-Mike


Re: [Python-Dev] Object creation hook

2007-01-23 Thread Mike Klaas
On 1/23/07, Kristján V. Jónsson [EMAIL PROTECTED] wrote:

 Hello there.

 I am trying to insert a hook into python enabling a callback for all
 just-created objects.  The intention is to debug and find memory leaks, e.g.
 by having the hook function insert the object into a WeakKeyDictionary.

 I have already added a method to object to set such a hook, and
 object_new now calls it upon completion, but this is far from covering all
 places.  Initially, I thought object_init were the place, but almost no
 classes call object.__init__ from their __init__ method.  Then there is the
 separate case of old-style classes.



 Any suggestions on how to do a global object creation hook in python?

When I've used such things in the past, I usually had some idea which
classes I was interested in targeting.  I used a metaclass for doing
the tracking, and either invoked it on individual classes, or used
__metaclass__ = X to apply it (something like class object(object):
__metaclass__ = X would do the trick for new-style classes that inherit
from object directly).
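
A rough sketch of what I mean (names made up; python 2 syntax):

    import weakref

    live = weakref.WeakKeyDictionary()

    class Tracker(type):
        def __call__(cls, *args, **kwds):
            obj = super(Tracker, cls).__call__(*args, **kwds)
            live[obj] = cls.__name__     # record every new instance
            return obj

    class Foo(object):
        __metaclass__ = Tracker

    f = Foo()
    print live.items()    # [(<Foo object at 0x...>, 'Foo')]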

-Mike


Re: [Python-Dev] The bytes type

2007-01-12 Thread Mike Klaas
On 1/12/07, Raymond Hettinger [EMAIL PROTECTED] wrote:
 [A.M. Kuchling]
  2.6 wouldn't go changing existing APIs to begin requiring or returning
  the bytes type[*], of course, but extensions and new modules might use
  it.

 The premise is dubious.

 If I am currently maintaining a module, why would I switch to a bytes type
 and forgo compatibility with Py2.5 and prior?  I might as well just convert
 it to run on Py3.0 and leave my Py2.5 code as-is for people who want to
 run 2.x.

A mutable bytes type is a useful addition to 2.X aside from the
3.0-compatibility motivation.  Isn't that sufficient justification?
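
(A sketch of what a mutable bytes type buys you, using the bytearray
that 2.6 eventually grew:)

    buf = bytearray('hello')
    buf[0] = ord('H')      # in-place edit; a str would need a full copy
    buf += ' world'
    print buf              # Hello world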

-Mike


Re: [Python-Dev] 2.5.1 plans

2007-01-04 Thread Mike Klaas
On 1/4/07, Ralf W. Grosse-Kunstleve [EMAIL PROTECTED] wrote:
 It would be nice if this simple fix could be included (main branch and 2.5.1):

 https://sourceforge.net/tracker/?func=detail&aid=1598181&group_id=5470&atid=105470

   [ 1598181 ] subprocess.py: O(N**2) bottleneck

 I submitted the trivial fix almost two months ago, but apparently nobody 
 feels responsible...

I just reviewed the patch, which should help it get accepted.

-Mike


Re: [Python-Dev] Non-blocking (asynchronous) timer without thread?

2006-12-22 Thread Mike Klaas
On 12/22/06, Evgeniy Khramtsov [EMAIL PROTECTED] wrote:
 The question to python core developers:
 Are there any plans to implement a non-blocking timer like
 threading.Timer(), but without a thread?
 Some interpreted languages (like Tcl or Erlang) have such functionality,
 so I think it would be a great
 feature in Python :)

 The main goal is to prevent threads overhead and problems with race
 conditions and deadlocks.

I'm not sure how having python execute code at an arbitrary time would
_reduce_ race conditions and/or deadlocks.  And if you want to make it
safe by executing code that shares no variables or resources, then it
is no less safe to use threads, due to the GIL.

If you can write your application in an event-driven way, Twisted might
be able to do what you are looking for.
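
e.g., a minimal sketch with Twisted's reactor--one event loop, no
timer thread:

    from twisted.internet import reactor

    def fire():
        print 'timer fired'
        reactor.stop()

    reactor.callLater(5.0, fire)   # schedule a callback in 5 seconds
    reactor.run()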

cheers,
-Mike


Re: [Python-Dev] [NPERS] Re: a feature i'd like to see in python #2: indexing of match objects

2006-12-06 Thread Mike Klaas
On 12/6/06, Alastair Houghton [EMAIL PROTECTED] wrote:

[from previous message]:
 Anyway, clearly what people will expect here (talking about the match
 object API) is that m[3:4] would give them a list (or some equivalent
 sequence object) containing groups 3 and 4.  Why do you think someone
 would expect a match object?

 It's much more likely to be confusing to people that they have to write

list(m)[x:y]
 or
[m[i] for i in xrange(x,y)]
 when m[x] and m[y] work just fine.


 Look, I give in.  There's no point trying to convince any of you
 further, and I don't have the time or energy to press the point.
 Implement it as you will.  If necessary it can be an extension of my
 re replacement that slicing is supported on match objects.

Keep in mind when implementing that m[3:4] should contain only the
element at index 3, not both 3 and 4, as you seem to have implied twice.
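
(Standard half-open slice semantics:)

    >>> groups = ['g0', 'g1', 'g2', 'g3', 'g4']
    >>> groups[3:4]       # includes index 3, excludes index 4
    ['g3']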

cheers,
-Mike


[Python-Dev] Segfault in python 2.5

2006-10-18 Thread Mike Klaas
[http://sourceforge.net/tracker/index.php?func=detail&aid=1579370&group_id=5470&atid=105470]

Hello,

I've managed to provoke a segfault in python2.5 (occasionally it is just
an "invalid argument to internal function" error).  I've posted a
traceback and a general idea of what the code consists of in the
sourceforge entry.  Unfortunately, I've been attempting for hours to
reduce the problem to a completely self-contained script, but it is
resisting my efforts due to timing problems.

Should I continue in that vein, or is it more useful to provide more
detailed results from gdb?

Thanks,
-Mike


Re: [Python-Dev] Segfault in python 2.5

2006-10-18 Thread Mike Klaas
On 10/18/06, Michael Hudson [EMAIL PROTECTED] wrote:
 Mike Klaas [EMAIL PROTECTED] writes:

 I've been reading the bug report with interest, but unless I can
 reproduce it it's mighty hard for me to debug, as I'm sure you know.

Indeed.

  Unfortunately, I've been attempting for hours to
  reduce the problem to a completely self-contained script, but it is
  resisting my efforts due to timing problems.
 
  Should I continue in that vein, or is it more useful to provide more
  detailed results from gdb?

 Well, I don't think that there's much point in posting masses of
 details from gdb.  You might want to try trying to fix the bug
 yourself I guess, trying to figure out where the bad pointers come
 from, etc.

I've peered at the code, but my knowledge of the python core is
superficial at best.  The fact that it is occuring as a result of a
long string of garbage collection/dealloc/etc. and involves threading
lowers my confidence further.   That said, I'm beginning to think that
to reproduce this in a standalone script will require understanding
the problem in greater depth regardless...

 Are you absolutely sure that the fault does not lie with any extension
 modules you may be using?  Memory scribbling bugs have been known to
 cause arbitrarily confusing problems...

I've had sufficient experience being arbitrarily confused to never be
sure about such things, but I am quite confident.  The script I posted
in the bug report is all stock python save for the operation in 's.
That operation is pickling and unpickling (using pickle, not cPickle)
a somewhat complicated pure-python instance several times.  It's doing
nothing with the actual instance--it just happens to take the right
amount of time to trigger the segfault.  It's still not perfect--this
trimmed-down version segfaults only sporadically, while the original
python script segfaults reliably.

-Mike


Re: [Python-Dev] Segfault in python 2.5

2006-10-18 Thread Mike Klaas
On 10/18/06, Tim Peters [EMAIL PROTECTED] wrote:
 [Mike Klaas]
  Indeed.

 Note that I just attached a much simpler pure-Python script that fails
 very quickly, on Windows, using a debug build.  Read the new comment
 to learn why both Windows and debug build are essential to it
 failing reliably and quickly ;-)

Thanks!  Next time I find a bug, installing Windows will certainly be
my first step <g>.


 Yes, but you did good!  This is still just an educated guess on my
 part, but my education here is hard to match ;-):  this new business
 of generators deciding to clean up after themselves if they're left
 hanging appears to have made it possible for a generator to hold on to
 a frame whose thread state has been free()'d, after the thread that
 created the generator has gone away.  Then when the generator gets
 collected as trash, the new exception-based clean up abandoned
 generator gimmick tries to access the generator's frame's thread
 state, but that's just a raw C struct (not a Python object with
 reachability-based lifetime), and the thread free()'d that struct when
 the thread went away.  The important timing-based vagary here is
 whether dead-thread cleanup gets performed before the main thread
 tries to clean up the trash generator.

Indeed--and normally it doesn't happen that way.  My/your script never
crashes on the first iteration because the thread's target is the
generator and thus it gets DECREF'd before the thread terminates.  But
the exception from the first iteration holds on to a reference to the
frame/generator so when it gets cleaned up (in the second iteration,
due to a new exception overwriting it) the generator is freed after
the thread is destroyed.  At least, I think...


 Offhand I don't know how to repair it.  Thread states /aren't/ Python
 objects, and there's no provision for a thread state to outlive the
 thread it represents.

Take this with a grain of salt, but ISTM that the problem can be
repaired by resetting the generator's frame threadstate to the current
threadstate:

(in genobject.c:gen_send_ex():80)
    Py_XINCREF(tstate->frame);
    assert(f->f_back == NULL);
    f->f_back = tstate->frame;
+   f->f_tstate = tstate;

    gen->gi_running = 1;
    result = PyEval_EvalFrameEx(f, exc);
    gen->gi_running = 0;

Shouldn't the thread state generally be the same anyway? (I seem to
recall some gloomy warning against resuming generators in separate
threads).

This solution is surely wrong--if f_tstate != tstate, then the
generator _is_ being resumed in another thread and so the generated
traceback will be wrong (among other issues which surely occur by
fudging a frame's threadstate).  Perhaps it could be set conditionally
by gen_close before signalling the exception?  A lie, but a smaller
lie than a segfault.  We could advertise that the exception ocurring
from generator .close() isn't guaranteed to have an accurate traceback
in this case.

Take all this with a grain of un-core-savvy salt.

Thanks again for investigating this, Tim,
-Mike