Re: [Python-Dev] bytes / unicode
On Tue, Jun 22, 2010 at 4:23 PM, Ian Bicking i...@colorstudy.com wrote:
> This reminds me of the optimization ElementTree and lxml made in Python 2 (not sure what they do in Python 3?) where they use str when a string is ASCII to avoid the memory and performance overhead of unicode.

An optimization that forces me to typecheck the return value of the function, and that I only discovered after code started breaking. I can't say I was enthused about that decision when I discovered it.

-Mike

___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
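A minimal sketch of the defensive typecheck Mike describes, phrased in Python 3 terms (where the str/unicode split became bytes/str). The helper name `as_text` is hypothetical, not part of any library:

```python
def as_text(value, encoding="utf-8"):
    """Normalize a value that may be either bytes or str to str.

    This is the kind of caller-side typecheck forced on users when a
    library returns different string types depending on content.
    (`as_text` is an illustrative name, not an ElementTree/lxml API.)
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```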
Re: [Python-Dev] Fixing the GIL (with a BFS scheduler)
On Sun, May 16, 2010 at 1:07 PM, Nir Aides n...@winpdb.org wrote:
> Relevant Python issue: http://bugs.python.org/issue7946

Is there any chance Antoine's gilinter patch from that issue might be applied to Python 2.7? I have been experiencing rare long delays in simple IO operations in multithreaded Python applications, and I suspect that they might be related to this issue.

-Mike
Re: [Python-Dev] Fixing the GIL (with a BFS scheduler)
On Tue, May 18, 2010 at 2:50 PM, Antoine Pitrou solip...@pitrou.net wrote:
> There's no chance for this since the patch relies on the new GIL. (That's unless there's a rush to backport the new GIL in 2.7, of course.)

Thanks, I missed that detail.

> I think your rare long delays might be related to the old GIL's own problems, though. How long are they?

Typically between 20 and 60s. This is the time it takes to send and receive a single small packet on an already-active TCP connection to ensure it is still alive. Most of the time it is under 1ms. I don't have strong evidence that GIL issues are causing the problem, because I can't reliably reproduce it. But the general setup is similar (one thread doing light IO experiencing odd delays in a process with multiple threads that are often CPU-bound, on a multi-core machine).

thanks,
-Mike
Re: [Python-Dev] deprecated stuff in standard library
On Fri, Feb 19, 2010 at 9:03 AM, Sjoerd Mullender sjo...@acm.org wrote:
> The policy should also be, if someone decides (or rather, implements) a deprecation of a module, they should do a grep to see where that module is used and fix the code. It's not rocket science.

I'm not sure if you're aware of it, but you're starting to sound a little rude. ISTM that it doesn't make sense to waste effort ensuring that deprecated code is updated to not call other deprecated modules. Of course, all released non-deprecated code should steer clear of deprecated APIs.

-Mike
Re: [Python-Dev] patch to make list.pop(0) work in O(1) time
On Mon, Jan 25, 2010 at 11:32 AM, Daniel Stutzbach dan...@stutzbachenterprises.com wrote:
> On Mon, Jan 25, 2010 at 1:22 PM, Steve Howell showel...@yahoo.com wrote:
>> I haven't completely worked out the best strategy to eventually release the memory taken up by the pointers of the unreleased elements, but the worst case scenario is that the unused memory only gets wasted until the time that the list itself gets garbage collected.
>
> FWIW, for a long-running FIFO queue, it's critical to release some of the memory along the way, otherwise the amount of wasted memory is unbounded. Good luck :)

It seems to me that the best way to do this is to invert the .append() logic: leave at most X amount of wasted space at the beginning of the list, where X is a constant fraction of the list size. Whether it is worth adding an extra pointer to the data stored by a list is another story.

-Mike
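The compaction strategy Mike suggests can be sketched in pure Python: keep a "head" index instead of popping from the front, and compact once the dead region at the start exceeds a constant fraction of the list. (The class below is an illustration of the idea, not the proposed C patch; in practice collections.deque already provides O(1) pops from both ends.)

```python
class FIFO:
    """List-backed queue with amortized O(1) pop from the front.

    Wasted space at the head is bounded to half the allocation,
    mirroring the bounded-slack idea discussed in the thread.
    """

    def __init__(self):
        self._items = []
        self._head = 0  # index of the logical first element

    def push(self, item):
        self._items.append(item)

    def pop(self):
        item = self._items[self._head]
        self._items[self._head] = None  # release the reference promptly
        self._head += 1
        # Compact once dead slots exceed half the list: amortized O(1).
        if self._head > len(self._items) // 2:
            del self._items[:self._head]
            self._head = 0
        return item

    def __len__(self):
        return len(self._items) - self._head
```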
Re: [Python-Dev] 2.7 Release? 2.7 == last of the 2.x line?
On Tue, Nov 3, 2009 at 10:42 AM, Georg Brandl g.bra...@gmx.net wrote:
> sstein...@gmail.com schrieb:
>> On Nov 3, 2009, at 12:28 PM, Arc Riley wrote:
>>> The main thing holding back the community are lazy and/or obstinate package maintainers. If they spent half the time they've put into complaining about Py3 into actually working to upgrade their code they'd be done now.
>>
>> That's an inflammatory, defamatory, unsubstantiated, hyperbolic, sweeping overgeneralization.
>
> I know a few maintainers, and I have no problem seeing how Arc came to that conclusion.

Be that as it may, the only way Python 3 will be widely adopted is if people have a motivation to adopt it (the need to be compatible with other libs, pressure from users, their own interest in fostering Python 3.0, etc.). Deriding them as lazy accomplishes nothing and obscures the fact that it is the Python maintainers' responsibility to bring about this motivation if they want Python 3.0 to be adopted. No-one is going to convert to Python 3.0 because you called them lazy.

-Mike
Re: [Python-Dev] pthreads, fork, import, and execvp
On Thu, Jul 16, 2009 at 1:08 PM, Thomas Wouters tho...@python.org wrote:
> Picking up a rather old discussion... We encountered this bug at Google and I'm now incentivized to fix it. For a short recap: Python has an import lock that prevents more than one thread from doing an import at any given time. However, unlike most of the locks we have lying around, we don't clear that lock in the child after an os.fork(). That means that doing an os.fork() during an import means the child process can't do any other imports. It also means that doing an os.fork() *while another thread is doing an import* means the child process can't do any other imports.
>
> Since this three-year-old discussion we've added a couple of post-fork cleanups to CPython (the TLS, the threading module's idea of active threads; see Modules/signalmodule.c:PyOS_AfterFork) and we already do simply discard the memory for other locks held during fork (the GIL, see Python/ceval.c:PyEval_ReInitThreads, and the TLS lock in Python/thread.c:PyThread_ReInitTLS) -- but not so with the import lock, except when the platform is AIX. I don't see any particular reason why we aren't doing the same thing to the import lock that we do to the other locks, on all platforms. It's a quick fix for a real problem (see http://bugs.python.org/issue1590864 and http://bugs.python.org/issue1404925 for two bug reports that seem to be this very issue.)

+1. We were also affected by this bug, getting sporadic deadlocks in a multi-threaded program that fork()s subprocesses to do processing. It took a while to figure out what was going on.

-Mike
Re: [Python-Dev] PEP 383 and GUI libraries
On 30-Apr-09, at 7:39 AM, Guido van Rossum wrote:
> FWIW, I'm in agreement with this PEP (i.e. its status is now Accepted). Martin, you can update the PEP and start the implementation.

+1

Kudos to Martin for seeing this through with (imo) considerable patience and dignity.

-Mike
Re: [Python-Dev] Rethinking intern() and its data structure
On 9-Apr-09, at 6:24 PM, John Arbash Meinel wrote:
> Greg Ewing wrote:
>> John Arbash Meinel wrote:
>>> And the way intern is currently written, there is a third cost when the item doesn't exist yet, which is another lookup to insert the object.
>>
>> That's even rarer still, since it only happens the first time you load a piece of code that uses a given variable name anywhere in any module.
>
> Somewhat true, though I know it happens 25k times during startup of bzr... And I would be a *lot* happier if startup time was 100ms instead of 400ms.

I don't want to quash your idealism too severely, but it is extremely unlikely that you are going to get anywhere near that kind of speedup by tweaking string interning. 25k times doing anything (computation) just isn't all that much.

    $ python -mtimeit -s 'd=dict.fromkeys(xrange(1000))' 'for x in xrange(25000): d.get(x)'
    100 loops, best of 3: 8.28 msec per loop

Perhaps this isn't representative (int hashing is ridiculously cheap, for instance), but the dict itself is far bigger than the dict you are dealing with and as such would have similar cache-busting properties. And yet 25k accesses (plus the Python-to-C dispatching costs, which you are paying with interning) consume only ~10ms. You could do more good by eliminating a handful of disk seeks by reducing the number of imported modules...

-Mike
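For readers unfamiliar with the mechanism under discussion, interning folds equal strings into a single shared object, so later comparisons and dict lookups can short-circuit on identity. A small demonstration (modern Python spells it `sys.intern`):

```python
import sys

# Two equal strings built at runtime are normally distinct objects.
a = "".join(["py", "thon"])
b = "".join(["py", "thon"])
assert a == b and a is not b

# Interning maps both to one canonical object, enabling pointer
# comparison and cheaper dict-key lookups thereafter.
ia, ib = sys.intern(a), sys.intern(b)
assert ia is ib
```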
Re: [Python-Dev] speeding up PyObject_GetItem
On 24-Mar-09, at 3:15 PM, Raymond Hettinger wrote:
>> 4% on a micro-micro-benchmark is hardly compelling...
>
> I concur! This is utterly insignificant and certainly does not warrant removing the checks. -1 on these sort of fake optimizations. We should focus on algorithmic improvements and eliminating redundant work and whatnot. Removing checks that were put there for a reason doesn't seem useful at all.

To be fair, the main proposed optimization(s) would speed up the microbenchmark by 15-25% (Daniel already stated that the NULL checks didn't have a significant impact). This seems significant, considering that tight loops whose cost is heavily due to array access are common.

-Mike
[Python-Dev] a nicer looking dir()
Someone has implemented a version of dir() which is much nicer for human consumption. The difference is striking enough that I thought it would be worth bringing to python-dev's attention.

http://github.com/inky/see/tree/master

    >>> pencil_case = []
    >>> dir(pencil_case)
    ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
    '__delslice__', '__doc__', '__eq__', '__ge__', '__getattribute__',
    '__getitem__', '__getslice__', '__gt__', '__hash__', '__iadd__',
    '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__',
    '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__',
    '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__',
    '__setslice__', '__str__', 'append', 'count', 'extend', 'index',
    'insert', 'pop', 'remove', 'reverse', 'sort']
    >>> see(pencil_case)
    ?  []  for  in  +  *  +=  *=  <  <=  ==  !=  >  >=  len()
    .append()  .count()  .extend()  .index()  .insert()  .pop()
    .remove()  .reverse()  .sort()

I'm not sure that this type of functionality merits a new built-in, but it might be useful as part of help()'s output.

-Mike
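The core of what makes see()'s output readable can be sketched in a few lines: hide the dunder noise and mark callables with trailing parentheses. This is a rough, hypothetical approximation of the package linked above, not its actual implementation (the real one also maps dunders back to operator syntax like `+` and `[]`):

```python
def see(obj):
    """Human-friendly alternative to dir(): skip dunders, mark callables."""
    out = []
    for name in dir(obj):
        if name.startswith("__"):
            continue  # operator syntax would replace these in the real tool
        attr = getattr(obj, name, None)
        out.append("." + name + ("()" if callable(attr) else ""))
    return out
```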
Re: [Python-Dev] C API for appending to arrays
On 2-Feb-09, at 9:21 AM, Hrvoje Niksic wrote:
> It turns out that an even faster method of creating an array is by using the fromstring() method. fromstring() requires an actual string, not a buffer, so in C++ I created an std::vector<double> with a contiguous array of doubles, passed that array to PyString_FromStringAndSize, and called array.fromstring with the resulting string. Despite all the unnecessary copying, the result was much faster than either of the previous versions. Would it be possible for the array module to define a C interface for the most frequent operations on array objects, such as appending an item, and getting/setting an item? Failing that, could we at least make fromstring() accept an arbitrary read buffer, not just an actual string?

Do you need to append, or are you just looking to create/manipulate an array with a bunch of C double values? I find PyObject_As{Write/Read}Buffer sufficient for most of these tasks. I've included some example Pyrex code that populates a new array.array at C speed. (Note that you can get the size of the resulting C array more easily than you are, by using PyObject_Length.) Of course, this still leaves appending to an already-created array difficult.

    def calcW0(W1, colTotal):
        """Calculate a W0 array from a W1 array.

        @param W1: array.array of doubles
        @param colTotal: value to which each column should sum
        @return: W0 = [colTotal] * NA - W1
        """
        cdef int NA
        NA = len(W1)
        W0 = array('d', [colTotal]) * NA
        cdef double *cW1, *cW0
        cdef int i
        cdef Py_ssize_t dummy
        PyObject_AsReadBuffer(W1, <void **>&cW1, &dummy)
        PyObject_AsWriteBuffer(W0, <void **>&cW0, &dummy)
        for i from 0 <= i < NA:
            cW0[i] = cW0[i] - cW1[i]
        return W0

regards,
-Mike
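A pure-Python view of the bulk-load trick Hrvoje describes: build the raw buffer of C doubles once, then load it into an array.array in a single call instead of appending item by item. (In Python 3, fromstring() was renamed frombytes(), and it accepts bytes-like objects, which resolves the "actual string" limitation he complains about.)

```python
import array
import struct

# Pack three doubles into a contiguous byte buffer, as a C extension
# producing std::vector<double>-style data would hand back.
values = [1.0, 2.5, -3.0]
raw = struct.pack("%dd" % len(values), *values)

# Bulk-load the buffer: one call, no per-item Python dispatch.
arr = array.array("d")
arr.frombytes(raw)
```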
Re: [Python-Dev] Partial function application 'from the right'
On 29-Jan-09, at 3:21 PM, Daniel Stutzbach wrote:
> On Thu, Jan 29, 2009 at 4:04 PM, Antoine Pitrou solip...@pitrou.net wrote:
>> Alexander Belopolsky alexander.belopolsky at gmail.com writes:
>>> By this analogy, partial(f, ..., *args) is right_partial with '...' standing for any number of missing arguments. If you want to specify exactly one missing argument, you would want to write partial(f, :, *args), which is not valid syntax even in Py3.
>>
>> Yes, of course, but... the meaning which numpy attributes to Ellipsis does not have to be the same in other libraries. Otherwise this meaning would have been embedded in the interpreter itself, while it hasn't.
>
> The meaning which numpy attributes to Ellipsis is also the meaning that mathematical notation has attached to Ellipsis for a very long time.

And yet, Python isn't confined to mathematical notation. *, ** are both overloaded for use in argument lists to no-one's peril, AFAICT.

-Mike
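For reference, the "partial from the right" under discussion is straightforward to sketch in pure Python. The name `rpartial` is the thread's hypothetical helper, not a functools API; it freezes the *last* positional arguments:

```python
def rpartial(func, *frozen):
    """Like functools.partial, but the frozen args fill from the right."""
    def wrapper(*args, **kwargs):
        return func(*(args + frozen), **kwargs)
    return wrapper

# pow(x, 2): the exponent is frozen, the base is supplied at call time.
square = rpartial(pow, 2)
```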
Re: [Python-Dev] Psyco for -OO or -O
On 13-Dec-08, at 5:28 AM, Michael Foord wrote:
> Lie Ryan wrote:
>> I'm sure probably most of you know about psyco[1], the optimizer. Python has an -O and -OO flag that is intended to be an optimization flag, but we know that currently it doesn't do much. Why not add psyco as a standard library and let -O or -OO invoke psyco?
>
> This really belongs on Python-ideas and not Python-dev. The main reason why not is that someone(s) from the Python core team would then need to 'own' maintaining Psyco (which is x86-only as well).

Worse, it is 32-bit only, which has greatly diminished its usefulness in the last few years.

-Mike
Re: [Python-Dev] RELEASED Python 3.0 final
On 5-Dec-08, at 8:40 AM, A.M. Kuchling wrote:
> On Fri, Dec 05, 2008 at 05:40:46AM -, [EMAIL PROTECTED] wrote:
>> For most users, especially new users who have yet to be impressed with Python's power, 2.x is much better. It's not like library support is one small check-box on the language's feature sheet: most of the attractive things about Python are libraries. Of course I am not free
>
> Here I agree, sort of. Newbies may not understand what they're giving up in terms of libraries. (The 'sort of' is because, having learned 3.0, learning the changes for 2.6 is certainly much easier than learning a first programming language is.)

For possible insight, here is a current discussion on the topic: http://www.reddit.com/r/programming/comments/7hlra/ask_progit_ive_got_the_itch_to_learn_python_since/ (note that these would be programmers interested in learning Python, not people trying to learn programming).

-Mike
Re: [Python-Dev] n.numbits: method or property?
On 11-Nov-08, at 4:16 PM, Mark Dickinson wrote:
> More generally, what are the guidelines for determining when it's appropriate to make something a property rather than a method?

Both are awkward on numeric types in Python, necessitating brackets or a space before the dot:

    (1).__doc__
    1 .__doc__

I'd suggest a third alternative, which is a standalone function in math:

    from math import numbits
    numbits(1)

-Mike
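As a historical footnote for readers of the archive: this feature ultimately shipped as the `int.bit_length()` method (Python 2.7/3.1), not a math function, so the awkward-literal syntax Mike mentions still applies:

```python
# bit_length() counts the bits needed to represent the integer,
# excluding sign and leading zeros.
n = 37                  # 0b100101 -> 6 bits
assert n.bit_length() == 6

# On a literal, parentheses (or a space) are still needed before the dot.
assert (1).bit_length() == 1
assert (0).bit_length() == 0
```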
Re: [Python-Dev] PEP: Consolidating names and classes in the `unittest` module (updated 2008-07-15)
On 15-Jul-08, at 6:05 AM, Andrew Bennetts wrote:
> Ben Finney wrote:
>> Stephen J. Turnbull [EMAIL PROTECTED] writes:
>>> That measured only usage of unittest *within the Python standard library*. Is that the only body of unittest-using code we need consider?
>
> Three more data points, then:
> bzr: 13228 assert* vs. 770 fail*.
> Twisted: 6149 assert* vs. 1666 fail*.
> paramiko: 431 assert* vs. 4 fail*.

Our internal code base:

    $ ack self.assert. | wc -l
    3232
    $ ack self.fail. | wc -l
    1124

-Mike
Re: [Python-Dev] PEP 371: Additional Discussion
On 3-Jun-08, at 3:53 PM, Benjamin Peterson wrote:
> On Tue, Jun 3, 2008 at 5:08 PM, Jesse Noller [EMAIL PROTECTED] wrote:
>> Also - we could leave in stubs to match the threading api - Guido, David Goodger and others really prefer not to continue the broken API of the threading API
>
> I agree that the threading and the pyprocessing APIs should be PEP 8 compliant, but I think 2 APIs is almost worse than one wrong one.

A cleaner way to effectuate the transition would be to leave the camelCase API in 2.6 (for both modules), switch to PEP 8 in py3k (for both modules), and provide threading3k and multiprocessing3k modules in 2.6 that façade the 2.6 API with the PEP 8 API. 2to3 would rewrite 'import threading3k' to 'import threading' and everything would work (it would warn about 'import threading' in 2.6 code).

-Mike
Re: [Python-Dev] Iterable String Redux (aka String ABC)
On 28-May-08, at 2:33 PM, Bill Janssen wrote:
>> From what's been discussed so far, I don't see any advantage of isinstance(o, String) over hasattr(o, 'encode') or somesuch.
>
> Look, even if there were *no* additional methods, it's worth adding the base class, just to differentiate the class from the Sequence, as a marker, so that those of us who want to ask isinstance(o, String) can do so. Personally, I'd add in all the string methods to that class, in all their gory complexity. Those who need a compliant class should subclass the String base class, and override/add what they need.

I'm not sure I agree with you on the solution, but I definitely agree that although str/unicode are conceptually sequences of characters, it is rarely useful to think of them as iterables of objects, unlike all other Sequences. (Note: I don't dispute that it is occasionally useful to treat them as such.) In my perfect world, strings would be indexable and sliceable, but not iterable. A character iterator could be obtained using a new method, such as .chars():

    s = 'hello world'
    list(s)                      # exception
    set(s)                       # exception
    tuple(s)                     # exception
    for char in s:               # exception
    [ord(c) for c in s]          # exception
    s[2]                         # ok
    s[::-1]                      # ok
    for char in s.chars():       # ok
    [ord(c) for c in s.chars()]  # ok

Though an argument could be made against this, I consider the current behaviour of strings one of the few instances where purity beats practicality in Python. In my experience it is often the cause of errors that fail too late.

-Mike
Re: [Python-Dev] Iterable String Redux (aka String ABC)
On 28-May-08, at 5:44 PM, Greg Ewing wrote:
> Mike Klaas wrote:
>> In my perfect world, strings would be indexable and sliceable, but not iterable.
>
> An object that was indexable but not iterable would be a very strange thing. If it has __len__ and __getitem__, there's nothing to stop you iterating over it by hand anyway, so disallowing __iter__ would just seem perverse.

Python has a beautiful abstraction in iteration: iter() is a generic function that allows you to lazily consume a sequence of objects, whether it be lists, tuples, custom iterators, generators, or what have you. It is trivial to write your code to be agnostic to the type of iterable passed in. Almost anything else a consumer of your code passes in will result in an immediate exception. Unfortunately, Python has two extremely common data types which do not fail when this generic function is applied to them, and instead almost always return a result which is not desired: they iterate over the characters of the string, a behaviour which is rarely needed in practice due to the wealth of string methods available.

I agree that it would be perverse to disallow iterating over a string. I just wish that the way to do that wasn't glommed onto the object-iteration abstraction. As it stands, any consumer of iterables has to keep strings in mind. It is particularly irksome when the target input is an iterable of strings. I recall a function that accepts a list/iterable of item keys, hashes them, and then retrieves values based on the item hashes (usually over the network, so it is necessary to batch requests). This function is often used in the interactive interpreter, and it is thus very prone to being passed a string rather than a list. There was no good way to prevent the (frequent) mysterious "not found" errors save adding an explicit type check for basestring.

Strings already behave slightly differently from the way other sequences act: a string is the only sequence for which 'seq in seq' is true, and the only sequence for which 'x in seq' can be true but 'any(x == item for item in seq)' is false. Abstractions are sometimes imperfect: this is why there is an explicit typecheck for strings in the sum() builtin.

I'll stop here, as I realize that the likelihood that this will be accepted is terribly small, especially considering the late stage of the process. But I would be willing to develop a patch that implements this behaviour on the off chance it is.

-Mike
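The explicit basestring typecheck Mike ended up with can be sketched as follows (in Python 3 terms, where the guard is against str/bytes; `fetch_values` and the hash stand-in are illustrative names, not the real networked function):

```python
def fetch_values(keys):
    """Batched lookup over an iterable of keys.

    Guard against the failure mode described above: a single string is
    itself an iterable of strings, so without this check a caller passing
    'abc' would silently look up 'a', 'b', 'c'.
    """
    if isinstance(keys, (str, bytes)):
        raise TypeError("expected an iterable of keys, not a single string")
    return [hash(k) for k in keys]  # stand-in for the real batched fetch
```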
Re: [Python-Dev] PEP 8: Discourage named lambdas?
On 2-May-08, at 11:23 PM, Scott David Daniels wrote:
> Mike Klaas wrote:
>> ... A common pattern for me is to replace an instance's method with a lambda to add monitoring hooks or disable certain functionality:
>>     inst.get_foo = lambda: FakeFoo()
>> This is not replaceable in one line with a def (or without locals() detritus). Assuming this is good style, it seems odd that
>>     inst.get_foo = lambda: FakeFoo()
>> is acceptable style, but
>>     get_foo = lambda: FakeFoo()
>
> But surely, none of these are great style, and in fact the lambda lures you into using it. I'd propose a far better use is:
>     inst.get_foo = FakeFoo
> or
>     get_foo = FakeFoo

Sorry, that was a bad example. It is obviously silly if the return value of the function is callable.

-Mike
Re: [Python-Dev] PEP 8: Discourage named lambdas?
On 2-May-08, at 4:03 PM, Terry Reedy wrote:
> Some people write
>     somename = lambda args: expression
> instead of the more obvious (to most people) and, dare I say, standard
>     def somename(args): return expression
> The difference in the result (the only one I know of) is that the code and function objects get the generic name 'lambda' instead of the more informative (in repr() output or tracebacks) 'somename'. I consider this a disadvantage. In the absence of any compensating advantages (other than the trivial saving of 3 chars), I consider the def form to be the proper Python style, to the point that I think it should be at least recommended for the stdlib in the Programming Recommendations section of PEP 8. There are currently uses of named lambdas at least in urllib2. This to me is a bad example for new Python programmers. What do our style mavens think?

I'm not a style maven, but I'll put forward why I don't think this is bad style. Most importantly, these statements can result from sensible changes to what is (I believe) considered good style. For example, consider:

    registerCallback(lambda: frobnicate(7))

What if there are two places that the callback needs to be registered?

    registerCallback(lambda: frobnicate(7))
    registerCallback2(lambda: frobnicate(7))

DRY leads to factoring this out into a variable in a straightforward manner:

    callback = lambda: frobnicate(7)
    registerCallback(callback)
    registerCallback2(callback)

Another thing to consider is that the def pattern is only possible when the bound variable has no dots. A common pattern for me is to replace an instance's method with a lambda to add monitoring hooks or disable certain functionality:

    inst.get_foo = lambda: FakeFoo()

This is not replaceable in one line with a def (or without locals() detritus). Assuming this is good style, it seems odd that

    inst.get_foo = lambda: FakeFoo()

is acceptable style, but

    get_foo = lambda: FakeFoo()

isn't.

(I also happen to think that the def pattern is less clear in some situations, but that speaks more to personal taste so isn't particularly relevant.)

-Mike
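Terry's traceback point is easy to demonstrate: a named lambda keeps the generic name, while a def carries its own. (The closing note on `functools.partial` is an aside of mine, not part of the thread.)

```python
somename = lambda: 42       # name in tracebacks/repr: '<lambda>'

def othername():            # name in tracebacks/repr: 'othername'
    return 42

assert somename.__name__ == "<lambda>"
assert othername.__name__ == "othername"
```

For the fixed-argument callback case, `functools.partial(frobnicate, 7)` is another dotted-assignment-friendly alternative that, unlike a lambda, also reprs informatively.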
Re: [Python-Dev] Encoding detection in the standard library?
On 22-Apr-08, at 3:31 AM, M.-A. Lemburg wrote:
> I don't think that should be part of the standard library. People will mistake what it tells them for certain.

+1

> I also think that it's better to educate people to add (correct) encoding information to their text data, rather than give them a guess mechanism...

That is a fallacious alternative: the programmers who need encoding detection are not the same people who are omitting encoding information. I have only a small opinion on whether charset detection should appear in the stdlib, but I am somewhat perplexed by the arguments in this thread. I don't see how inclusion in the stdlib would make people more inclined to think that the algorithm is always correct.

In terms of the need for this functionality, Martin wrote:
> Can you please explain why that is? Web programs should not normally have the need to detect the encoding; instead, it should be specified always - unless you are talking about browsers specifically, which need to support web pages that specify the encoding incorrectly.

Any program that needs to examine the contents of documents/feeds/whatever on the web needs to deal with incorrectly-specified encodings (which, sadly, is rather common). The set of programs that need this functionality is probably the same set that needs BeautifulSoup--I think that set is larger than just browsers <grin>

-Mike
Re: [Python-Dev] Encoding detection in the standard library?
On 22-Apr-08, at 2:16 PM, Martin v. Löwis wrote:
>> Any program that needs to examine the contents of documents/feeds/whatever on the web needs to deal with incorrectly-specified encodings
>
> That's not true. Most programs that need to examine the contents of a web page don't need to guess the encoding. In most such programs, the encoding can be hard-coded if the declared encoding is not correct. Most such programs *know* what page they are webscraping, or else they couldn't extract the information out of it that they want to get at.

I certainly agree that if the target set of documents is small enough, it is possible to hand-code the encoding. There are many applications, however, that need to examine the content of an arbitrary, or at least non-small, set of web documents. To name a few such applications:

- web search engines
- translation software
- document/bookmark management systems
- other kinds of document analysis (market research, SEO, etc.)

> As for feeds - can you give examples of incorrectly encoded ones (I don't ever use feeds, so I honestly don't know whether they are typically encoded incorrectly. I've heard they are often XML, in which case I strongly doubt they are incorrectly encoded)

I also don't have much experience with feeds. My statement is based on the fact that chardet, the tool that has been cited most in this thread, was written specifically for use with the author's feed-parsing package.

> As for whatever - can you give specific examples?

Not that I can substantiate. Documents and feeds cover a lot of what is on the web--I was only trying to make the point that on the web, whenever an encoding can be specified, it will be specified incorrectly for a significant chunk of exemplars.

> Again, can you give *specific* examples that are not web browsers? Programs needing BeautifulSoup may still not need encoding guessing, since they still might be able to hard-code the encoding of the web page they want to process.

Indeed, if it is only one site it is pretty easy to work around. My main use of Python is processing and analyzing hundreds of millions of web documents, so it is pretty easy to see applications (which I have listed above). I think that libraries like Mark Pilgrim's FeedParser and BeautifulSoup are possible consumers of guessing as well.

> In any case, I'm very skeptical that a general "guess encoding" module would do a meaningful thing when applied to incorrectly encoded HTML pages.

Well, it does. I wish I could easily provide data on how often it is necessary over the whole web, but that would be somewhat difficult to generate. I can say that it is much more important to be able to parse all the different kinds of encoding _specification_ on the web (Content-Type/Content-Encoding/meta http-equiv tags, etc.), and the malformed cases of these. I can also think of good arguments for excluding encoding detection for maintenance reasons: is every case of the algorithm guessing wrong a bug that needs to be fixed in the stdlib? That is an unbounded commitment.

-Mike
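Short of statistical detection, a stdlib-only fallback chain is the sort of defense such programs use when the declared encoding can't be trusted: try the declared charset, then UTF-8, then latin-1 (which never fails to decode, at the cost of possible mojibake). The helper name is illustrative:

```python
def decode_best_effort(data, declared=None):
    """Decode bytes, distrusting the declared encoding.

    Returns (text, encoding_used). LookupError covers bogus charset
    names scraped from malformed headers/meta tags.
    """
    for enc in filter(None, [declared, "utf-8"]):
        try:
            return data.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            pass
    return data.decode("latin-1"), "latin-1"
```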
Re: [Python-Dev] socket recv on win32 can be extremly delayed (python bug?)
Hi,

This is not a python-specific problem. See http://en.wikipedia.org/wiki/Nagle's_algorithm

-Mike

On 17-Apr-08, at 3:08 AM, Robert Hölzl wrote:

hello,

I tried to implement a simple python XMLRPC service on a win32 environment (client/server code inserted below). The profiler of the client told me that a simple function call needs about 200ms (even if I run it in a loop, the time needed per call stays the same).

After analysing the problem with Ethereal I found out that the XMLRPC request is transmitted via two TCP packets: one containing the HTTP header and one containing the data. But the acknowledge to the first TCP packet is delayed by 200ms.

I tried around on the server side and found out that if the server reads exactly all bytes transferred in the first TCP frame (via socket.recv()), the next socket.recv(), even if reading only one byte, needs about 200ms. But if I read one byte less than transferred in the first TCP frame and then read 2 bytes (socket.recv(2)), there is no delay, although the same total amount of data was read.

After some googling I found the website http://support.microsoft.com/?scid=kb%3Ben-us%3B823764x=12y=15 , which proposed a workaround (modifying the registry entry for the tcp/ip driver) that did work. But modifying the clients' registry settings is no option for us.

Is there anybody who knows how to solve the problem? Or is it even a problem of the python socket implementation?

By the way: I tested Win2000 SP4 and WinXP SP2, with Python 2.3.3 and Python 2.5.1 each.
CLIENT:
--
import xmlrpclib
import profile

server = xmlrpclib.ServerProxy("http://server:80")
profile.run('server.test(1,2)')

SERVER:
--
import SimpleXMLRPCServer

def test(a, b):
    return a + b

server = SimpleXMLRPCServer.SimpleXMLRPCServer(('', 80))
server.register_function(test)
server.serve_forever()
--

Mit freundlichen Grüßen, Best Regards,
Robert Hölzl

BALTECH AG Firmensitz: Lilienthalstrasse 27, D-85399 Hallbergmoos Registergericht: Amtsgericht München, HRB 115215 Vorstand: Jürgen Rösch (Vorsitzender), Martina M. Schuster Aufsichtsratsvorsitzende: Eva Zeising
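[Editor's note: the 200ms stall described here is the classic interaction between Nagle's algorithm and the peer's delayed-ACK timer. Beyond the registry workaround, a common application-level mitigation (not discussed in the original thread) is to disable Nagle on the sending socket; a minimal sketch:]

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm so small writes go out immediately
# instead of being coalesced while waiting for the peer's ACK.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
s.close()
```

This trades a little extra network traffic (more small packets) for lower latency on request/response protocols like XML-RPC.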
Re: [Python-Dev] svnmerge and added files
On 20-Mar-08, at 2:32 PM, Christian Heimes wrote:

Martin v. Löwis schrieb: It seems that recently, a number of merges broke in the sense that files added to the trunk were not merged into the 3k branch. Is that a general problem with svnmerge? Should that be fixed to automatically do a svn add when merging changes that included file additions and removals?

It sometimes happens when I do a svnmerge, revert the merge with svn revert -R and do a second svnmerge. Files added by the first svnmerge aren't added to the commit list for the second merge. Unfortunately svnmerge.py doesn't warn me when the file already exists.

It may not warn explicitly about that, but it certainly does warn:

M ...
Skipped path/to/missing/file...
M ...
M ...

As someone who deals with svnmerge.py a lot, I find that it is appropriate to treat Skipped as being as critical as a conflict. I too wish that it was more explicit in this respect.

-Mike
Re: [Python-Dev] 2.5.2 release coming up
On 22-Jan-08, at 8:47 PM, Guido van Rossum wrote:

While the exact release schedule for 2.5.2 is still up in the air, I expect that it will be within a few weeks. This means that we need to make sure that anything that should go into 2.5.2 goes in ASAP, preferably this week. It also means that we should be very careful what goes in though -- and we should be paying particular attention to stability on all platforms! Fortunately it looks like quite a few 2.5 buildbots are green: http://python.org/dev/buildbot/2.5/ I propose that anything that ought to go into 2.5.2 (or should be reviewed for suitability to go into it) should be marked urgent in the tracker, *and* have its version set to (or include) Python 2.5.

I'm not sure if it is particularly urgent because of the rarity of occurrence, but I discovered a bug that causes httplib to hang indefinitely given some rarely-occurring input in the wild. To reproduce:

python -c 'import urllib2; urllib2.urlopen("http://www.hunteros.com").read()'

WARNING: the page was tagged by one of our users and is definitely NSFW.

Again, it seems to occur very rarely, but the behaviour is quite painful and the fix trivial (see http://bugs.python.org/issue1966).

Thanks,
-Mike
Re: [Python-Dev] Contributing to Python
On 3-Jan-08, at 1:07 PM, Guido van Rossum wrote: On Jan 3, 2008 11:49 AM, Fred Drake [EMAIL PROTECTED] wrote: Python 2.6 seems to be entirely targeted at people who really want to be on Python 3, but have code that will need to be ported. I certainly don't view it as interesting in its own right.

It will be though -- it will have genuine new features -- yes, backported from 3.0, but new features nevertheless, and in a compatible fashion.

I think that there are still tons of people like me for whom 3.0 is still a future concern that is too big to devote cycles to at the moment, but who are still very much interested in improving the 2.x series (which improves 3.0) at the same time. I've been inspired by this thread to start working on a few 2.6 items that I had in mind, starting with http://bugs.python.org/issue1663329 , which mostly just needed documentation and cleanup (now done).

Question: should patches include edits to whatsnew.rst, or is the committer responsible for adding a note?

-Mike
Re: [Python-Dev] Request for inclusion in 2.5.2 (5-for-1)
On 2-Nov-07, at 6:57 AM, Guido van Rossum wrote: Since people are already jumping on those bugs but nobody has voiced an opinion on your own patch, let me say that I think it's a good patch, and I want it in 2.6, but I'm reluctant to add it to 2.5.2 as it goes well beyond a bugfix (adding a new C API and all that).

Thanks for looking at it! Is there a better way of exposing some C helper code for a stdlib module written in python? It seems that the canonical pattern is to write a separate extension module called _modulename and import the functionality from there, but that seemed like a significantly more invasive patch. Might it help to tack on the helper function in posix only, deleting it from the os namespace?

Thanks again,
-Mike
[Python-Dev] Request for inclusion in 2.5.2 (5-for-1)
Issue http://bugs.python.org/issue1663329 details an annoyance in the subprocess module that has affected several users, including me. Essentially, closing hundreds of thousands of file descriptors by round-tripping through the python exception machinery is very slow, taking hundreds of milliseconds and at times many seconds. The proposed fix is to write this loop in C. The C function is but a handful of lines long. I purposefully kept the implementation trivial so that it will work on all unix variants (there is another issue that contains a super-duper optimization for AIX, and other possibilities exist for Solaris, but the simple fix yields a ten-fold speedup everywhere but windows, so I didn't think that it was worth the complexity).

Though technically relating only to performance, I consider this a bug-fix candidate, as mysterious multi-second delays when launching a subprocess end up making the functionality of close_fds unusable on some platform configurations (namely, those with high MAX_FD set). It would be great to see this in 2.5.2.

Understanding that issue evaluation takes significant effort, I've done some evaluation/triage on other open tickets (see issues for detailed comments):

http://bugs.python.org/issue1516330: No clear problem, invalid patch. Recommend rejection.
http://bugs.python.org/issue1516327: No clear problem, no patch. Recommend closing.
http://bugs.python.org/issue1705170: reproduced. Conjecture as to why it is occurring, but I don't know the guts well enough to propose a decent fix.
http://bugs.python.org/issue1773632: tested patch. Recommend accepting unless there are things I don't know about this mysterious _xmlrpclib extension (which there doubtlessly are)
http://bugs.python.org/issue738948: Rather old PEP that has gathered no comments. Calling it a PEP is generous--it is really just a link to an academic paper with a note about how this might be integrated into Stackless.
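[Editor's note: a sketch of the shape of the problem, not the tracker patch itself. The pure-Python loop pays for a Python-level call, and a caught exception for every descriptor that isn't open; the C-level equivalent that later shipped as os.closerange() in Python 2.6 has the same semantics without touching the exception machinery.]

```python
import os

def close_fds_python(low, high):
    # The slow path subprocess used: one os.close() call per fd,
    # with an OSError raised and swallowed for every fd not open.
    for fd in range(low, high):
        try:
            os.close(fd)
        except OSError:
            pass

# Same effect, implemented in C (available since Python 2.6):
close_fds_python(512, 1024)
os.closerange(512, 1024)
```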
Thanks,
-Mike
Re: [Python-Dev] Adding concat function to itertools
On 28-Sep-07, at 10:45 AM, Raymond Hettinger wrote: [Bruce Frederiksen] I've added a new function to itertools called 'concat'. This function is much like chain, but takes all of the iterables as a single argument.

Any practical use cases or is this just a theoretical improvement? For Py2.x, I'm not willing to unnecessarily expand the module. However, for Py3k, I'm open to changing the signature for chain().

For me, a fraction of chain() uses are of the * variety:

d = defaultdict(list)
allvals = chain(*d.values())

return chain(*imap(cache.__getitem__, keylist))

Interestingly, they seem to all have something to do with dictionary values() that are themselves iterable.

-Mike
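[Editor's note: Python 2.6 eventually grew itertools.chain.from_iterable() for exactly this pattern. A quick sketch of the two spellings:]

```python
from itertools import chain

d = {'a': [1, 2], 'b': [3]}

# The * form from the post: unpacks d.values() eagerly into arguments.
flat1 = list(chain(*d.values()))

# The form added in 2.6: consumes the outer iterable lazily,
# so it also works when d.values() is itself a large generator.
flat2 = list(chain.from_iterable(d.values()))

assert sorted(flat1) == sorted(flat2) == [1, 2, 3]
```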
Re: [Python-Dev] Regular expressions, Unicode etc.
On 8-Aug-07, at 2:28 AM, Nick Maclaren wrote: I have needed to push my stack to teach REs (don't ask), and am taking a look at the RE code. I may be able to extend it to support RFE 694374 and (more importantly) atomic groups and possessive quantifiers. While I regard such things as revolting beyond belief, they make a HELL of a difference to the efficiency of recognising things like HTML tags in a morass of mixed text.

+1. I would use such a feature.

The other approach, which is to stick to true regular expressions, and wholly or partially convert to DFAs, has already been rendered impossible by even the limited Perl/PCRE extensions that Python has adopted.

Impossible? Surely, a sufficiently-competent re engine could detect when a DFA is possible to construct?

-Mike
Re: [Python-Dev] Regular expressions, Unicode etc.
On 8-Aug-07, at 12:47 PM, Nick Maclaren wrote:

The other approach, which is to stick to true regular expressions, and wholly or partially convert to DFAs, has already been rendered impossible by even the limited Perl/PCRE extensions that Python has adopted.

Impossible? Surely, a sufficiently-competent re engine could detect when a DFA is possible to construct?

I doubt it. While it isn't equivalent to the halting problem, it IS an intractable one! There are two problems: Firstly, things like backreferences are an absolute no-no. They are not regular, and REs containing them cannot be converted to DFAs. That could be 'solved' by a parser that kicked out such constructions, but it would get screams from many users. Secondly, anything involving explicit or implicit negation can lead to (if I recall) a super-exponential explosion in the size of the DFA. That could be 'solved' by imposing a limit, but few people would be able to predict when it would bite.

Right. The analysis I envisioned would be more along the lines of "if troublesome RE extensions are used, do not attempt to construct a DFA". It could even be exposed via an alternate api (re.compile_dfa()) that admitted a subset of the usual grammar.

Thirdly, I would require notice of the question of whether capturing parentheses could be supported, and what the semantics would be as to which were set and how.

Capturing groups are rather integral to the python regex api and, as you say, a major difficulty for DFA-based implementations. Sounds like a task best left to a third-party package.

-Mike
Re: [Python-Dev] Fwd: [ python-Patches-1744382 ] Read Write lock
On 6-Jul-07, at 6:45 AM, Yaakov Nemoy wrote: I can do the other three parts, but I am wondering, how do I write a deterministic test unit for my patch? How is it done with the threading model in python in general?

I don't know how it is done in general, but for reference, here are some of the unittests for my read/write lock class:

    def testReadCount(self):
        wrlock = ReadWriteLock()
        read, write = wrlock.reader, wrlock.writer
        self.assertEqual(wrlock.readerCount, 0)
        read.acquire()
        self.assertEqual(wrlock.readerCount, 1)
        read.acquire()
        self.assertEqual(wrlock.readerCount, 2)
        read.release()
        self.assertEqual(wrlock.readerCount, 1)
        read.release()
        self.assertEqual(wrlock.readerCount, 0)

    def testContention(self):
        wrlock = ReadWriteLock()
        read, write = wrlock.reader, wrlock.writer

        class Writer(Thread):
            gotit = False
            def run(self):
                write.acquire()
                self.gotit = True
                write.release()

        writer = Writer()
        self.assertEqual(wrlock.readerCount, 0)
        read.acquire()
        self.assertEqual(wrlock.readerCount, 1)
        writer.start()
        self.assertFalse(writer.gotit)
        read.acquire()
        self.assertEqual(wrlock.readerCount, 2)
        self.assertFalse(writer.gotit)
        read.release()
        self.assertEqual(wrlock.readerCount, 1)
        self.assertFalse(writer.gotit)
        read.release()
        self.assertEqual(wrlock.readerCount, 0)
        time.sleep(.1)
        self.assertTrue(writer.gotit)

    def testWRAcquire(self):
        wrlock = ReadWriteLock()
        read, write = wrlock.reader, wrlock.writer
        self.assertEqual(wrlock.readerCount, 0)
        write.acquire()
        write.acquire()
        write.release()
        write.release()
        read.acquire()
        self.assertEqual(wrlock.readerCount, 1)
        read.acquire()
        self.assertEqual(wrlock.readerCount, 2)
        read.release()
        self.assertEqual(wrlock.readerCount, 1)
        read.release()
        self.assertEqual(wrlock.readerCount, 0)
        write.acquire()
        write.release()

    def testOwnAcquire(self):
        wrlock = ReadWriteLock()
        read, write = wrlock.reader, wrlock.writer

        class Writer(Thread):
            gotit = False
            def run(self):
                write.acquire()
                self.gotit = True
                write.release()

        writer = Writer()
        self.assertEqual(wrlock.readerCount, 0)
        read.acquire()
        self.assertEqual(wrlock.readerCount, 1)
        writer.start()
        self.assertFalse(writer.gotit)
        # can acquire the write lock if only
        # this thread has the read lock
        write.acquire()
        write.release()
        read.acquire()
        self.assertEqual(wrlock.readerCount, 2)
        self.assertFalse(writer.gotit)
        read.release()
        self.assertEqual(wrlock.readerCount, 1)
        self.assertFalse(writer.gotit)
        read.release()
        self.assertEqual(wrlock.readerCount, 0)
        time.sleep(.1)
        self.assertTrue(writer.gotit)

    def testDeadlock(self):
        wrlock = ReadWriteLock()
        read, write = wrlock.reader, wrlock.writer
        errors = []

        # a situation which can readily deadlock if care isn't taken
        class LockThread(threading.Thread):
            def __init__(self):
                threading.Thread.__init__(self)
                self.q = Queue.Queue()
            def run(self):
                while True:
                    task, lock, delay = self.q.get()
                    if not task:
                        break
                    time.sleep(delay)
                    if task == 'acquire':
                        for delay in waittime(maxTime=5.0):
                            if lock.acquire(False):
                                break
                            time.sleep(delay)
                        else:
                            errors.append("Couldn't acquire %s" % str(lock))
                    else:
                        lock.release()

        thrd = LockThread()
        thrd.start()
        thrd.q.put(('acquire', read, 0))
        time.sleep(.2)
        read.acquire()
        thrd.q.put(('acquire', write, 0))
        thrd.q.put(('release', write, .5))
        thrd.q.put(('release', read, 0))
        write.acquire()
        time.sleep(0.0)
        write.release()
        read.release()
        # end
        thrd.q.put((None, None, None))
        thrd.join()
        self.assertFalse(errors, "Errors: %s" % errors)
Re: [Python-Dev] Py2.6 buildouts to the set API
On 18-May-07, at 6:34 PM, Raymond Hettinger wrote: Here are some ideas that have been proposed for sets:

* New method (proposed by Shane Holloway): s1.isdisjoint(s2). Logically equivalent to not s1.intersection(s2) but has an early-out if a common member is found. The speed-up is potentially large given two big sets that may largely overlap or may not intersect at all. There is also a memory savings since a new set does not have to be formed and then thrown away.

+1. Disjointness verification is one of my main uses for set(), and though I don't think that the early-out condition would trigger often in my code, it would increase readability.

* Additional optional arguments for basic set operations to allow chained operations. For example, s=s1.union(s2, s3, s4) would be logically equivalent to s=s1.union(s2).union(s3).union(s4) but would run faster because no intermediate sets are created, copied, and discarded. It would run as if written: s=s1.copy(); s.update(s2); s.update(s3); s.update(s4).

It's too bad that this couldn't work with the binary operator spelling: s = s1 | s2 | s3 | s4

* Make sets listenable for changes (proposed by Jason Wells):

s = set(mydata)
def callback(s):
    print 'Set %d now has %d items' % (id(s), len(s))
s.listeners.append(callback)
s.add(existing_element)  # no callback
s.add(new_element)       # callback

-1 on the base set type: it seems too complex for a base set type. Also, there are various possible semantics that might be desirable, such as receiving the added element, or returning False to prevent addition. The proper place is perhaps a subclass of set with a magic method (analogous to defaultdict/__missing__).

-Mike
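[Editor's note: the proposed method did land; sets grew isdisjoint() in Python 2.6 and 3.0. A minimal sketch of its semantics:]

```python
s1 = {1, 2, 3}
s2 = {4, 5, 6}

# Equivalent to `not (s1 & s2)`, but can return as soon as a common
# member is found, and never builds an intermediate intersection set.
assert s1.isdisjoint(s2)
assert not s1.isdisjoint({3, 4})
```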
Re: [Python-Dev] Summary of Tracker Issues
On 15-May-07, at 12:32 AM, Georg Brandl wrote: There are two problems with this:

* The set of questions is limited, and bots can be programmed to know them all.

Sure, but if someone is customizing their bot to python's issue tracker, in all likelihood they would have to be dealt with specially anyway. Foiling automated bots should be the first priority--they should represent the vast majority of cases.

* Even programmers might not immediately know an answer, and I can understand them turning away on that occasion (take for example the name-binding term).

It would be hard to make it so easy that anyone with business submitting a bug report should know the answer: What python keyword is used to define a function? What file extension is typically used for python source files? etc. If there is still worry, then a failed answer could simply be the moderation trigger.

-Mike
Re: [Python-Dev] PEP 30XZ: Simplified Parsing
On 5/4/07, Baptiste Carvello [EMAIL PROTECTED] wrote: maybe we could have a dedent literal that would remove the first newline and all indentation, so that you can just write:

call_something( d'''
    first part
    second line
    third line
    ''' )

Surely "from textwrap import dedent as d" is close enough?

-Mike
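[Editor's note: a sketch of the suggested workaround. One behavioral difference worth noting: textwrap.dedent() removes the common indentation but keeps the leading newline that the proposed d'' literal would strip, so that newline must be removed explicitly.]

```python
from textwrap import dedent as d

s = d('''
    first part
    second line
    third line
    ''')
# dedent() strips the four-space margin from every line, but the
# string still begins with the newline after the opening quotes.
body = s.lstrip('\n')
```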
Re: [Python-Dev] (no subject)
On 4/30/07, Greg Ewing [EMAIL PROTECTED] wrote: JOSHUA ABRAHAM wrote: I was hoping you guys would consider creating a function in os.path or otherwise that would find the full path of a file when given only its base name and nothing else. I have been made to understand that this is not currently possible.

Does os.path.abspath() do what you want? If not, what exactly *do* you want?

probably:

def find_in_path(filename):
    for path in os.environ['PATH'].split(os.pathsep):
        candidate = os.path.join(path, filename)
        if os.path.exists(candidate):
            return os.path.abspath(candidate)

-Mike
Re: [Python-Dev] regexp in Python
On 3/23/07, Fredrik Lundh [EMAIL PROTECTED] wrote: Bartlomiej Wolowiec wrote: For some time I'm interested in regular expressions and Finite State Machines. Recently, I saw that Python uses Secret Labs' Regular Expression Engine, which very often works too slow. Its pessimistic time complexity is O(2^n), although other solutions exist with time complexity O(n*m) (or O(n*m^2); m is the length of the regular expression and n is the length of the text; introduction to the problem: http://swtch.com/~rsc/regexp/regexp1.html )

that article almost completely ignores all the subtle capturing and left-to-right semantics that a perl-style engine requires, though. trust me, this is a much larger can of worms than you might expect. but if you're ready to open it, feel free to hack away.

major part of regular expressions

the contrived example you used has nothing whatsoever to do with major part of regular expressions as seen in the wild, though. I'd be much more interested in optimizations that focus on patterns you've found in real code.

A fruitful direction that is not as ambitious as re-writing the entire engine would be to add independent group assertions to python's RE syntax [(?>...) in perl]. They are rather handy for optimizing the malperforming cases alluded to here (which rarely occur as the OP posted, but tend to crop up in less malignant forms).

-Mike
Re: [Python-Dev] deprecate commands.getstatus()
On 3/22/07, Greg Ewing [EMAIL PROTECTED] wrote: Titus Brown wrote: I could add in a 'system'-alike call easily enough; that was suggested. But I think returncode = subprocess.call(program) is pretty simple, isn't it?

Something to keep in mind is that system() doesn't directly launch a process running the command, it uses a shell. So it's not just simple sugar for some subprocess.* call.

subprocess.call("ls | grep tmp", shell=True)
svn-commit.2.tmp
svn-commit.tmp

The more important difference is the encoding of the return value: system() has magic to encode signal-related termination of the child process.

-Mike
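[Editor's note: the return-value difference can be seen directly. This is a Unix-specific sketch; the "exit 3" command is just an illustrative shell builtin.]

```python
import os
import subprocess

# subprocess.call() returns the exit code directly
# (and a negative signal number if the child was killed by a signal).
rc = subprocess.call("exit 3", shell=True)

# os.system() returns the raw wait status, which on Unix must be
# decoded with os.WIFEXITED / os.WEXITSTATUS (or, in the signal
# case, os.WIFSIGNALED / os.WTERMSIG).
status = os.system("exit 3")

assert rc == 3
assert os.WIFEXITED(status) and os.WEXITSTATUS(status) == 3
```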
Re: [Python-Dev] Proposal to revert r54204 (splitext change)
On 3/15/07, Mike Krell [EMAIL PROTECTED] wrote: Here is a point of confusion. Bear in mind I'm running this under windows, so explorer happily reports that .emacs has a type of emacs. (In windows, file types are registered in the system based on the extension -- all the characters following the last dot. An unregistered extension is listed as its own type. Thus files ending in .txt are listed as type Text Document, but files ending in .emacs are listed as type emacs because it's an unregistered extension.)

Unix-derived files prepended with a dot (like .emacs) are not meant to be interpreted as a file type. It may be useful on occasion when using windows, but it certainly is not the intent of a dotfile. The following files reside in my /tmp:

.X0-lock .X100-lock .X101-lock .X102-lock .X103-lock .X104-lock .X105-lock .X106-lock .X11-unix .X99-lock

...which are certainly not all unnamed files of different type.

I often sort files in the explorer based on type, and I want a file and all its backups to appear next to each other in such a sorted list. That's exactly why I rename the files the way I do. Thus, .1.emacs is what I want, and .emacs.1 is a markedly inferior and unacceptable alternative. That's what I'm referring to by extension preservation.

Unacceptable? Your code fails in (ISTM) the more common case of an extensionless file.

-Mike
Re: [Python-Dev] Patch 1644818: Allow importing built-in submodules
On 3/12/07, Miguel Lobo [EMAIL PROTECTED] wrote: Yet, the same can be said for most other patches: they are all for the benefit of users running into the same respective problems. Agreed. What I mean is that this fasttrack system where the submitter has to do some extra work seems to imply that accepting the patch somehow benefits the submitter. In fact I'm probably the person the patch will benefit least, because I have already run into the problem and know how to solve it. I feel responsible for defending the patch since I've written it and I know the problem it fixes and my solution better than anybody else, but I don't see how that responsibility extends to having to do extra unrelated work to have the patch accepted.

It is certainly not your _responsibility_ to review additional patches to get yours accepted; without doing so, it likely will be accepted, eventually (assuming it is correct). As far as I understand, Martin's offer is purely a personal one: there is a patch backlog, and if you help clear it out, he will help your patch get processed faster.

cheers,
-Mike
Re: [Python-Dev] locals(), closures, and IronPython...
On 3/6/07, Greg Ewing [EMAIL PROTECTED] wrote: Although you can get a similar effect now by doing

def __init__(self, **kwds):
    args = dict(prec=None, rounding=None, traps=None, flags=None,
                _rounding_decision=None, Emin=None, Emax=None,
                capitals=None, _clamp=0, _ignored_flags=None)
    args.update(kwds)
    for name, value in args.items():
        ...

So, no need for locals() here.

Yes, that is the obvious approach. But it is painful to abandon the introspectable signature. There's nothing quite like running help(func) and getting *args, **kwargs as the documented parameter list.

-Mike
Re: [Python-Dev] bool conversion wart?
On 2/22/07, Neal Becker [EMAIL PROTECTED] wrote: Well consider this:

>>> str(4)
'4'
>>> int(str(4))
4
>>> str(False)
'False'
>>> bool(str(False))
True

Doesn't this seem a bit inconsistent?

Virtually no python objects accept a stringified version of themselves in their constructor:

>>> str({})
'{}'
>>> dict('{}')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: dictionary update sequence element #0 has length 1; 2 is required
>>> str([])
'[]'
>>> list('[]')
['[', ']']

Python is not Perl.

-Mike
Re: [Python-Dev] bool conversion wart?
On 2/22/07, Neal Becker [EMAIL PROTECTED] wrote: Except, all the numeric types do, including int, float, and complex. But not bool.

Oh?

In [5]: str(complex(1, 2))
Out[5]: '(1+2j)'

In [6]: complex(str(complex(1, 2)))
---
<type 'exceptions.ValueError'>: complex() arg is a malformed string

In fact, this is not just academic. The fact that other numeric types act this way leaves a reasonable expectation that bool will. Instead, bool fails in _the worst possible way_: it silently gives a _wrong result_.

I'd debate the assertion that 'bool' is a numeric type (despite being a subclass of int). For bool() to return anything other than the value of the python expression evaluated in boolean context would be _lunacy_, and there is absolutely no chance that it will be changed.

-Mike
Re: [Python-Dev] Summary of dynamic attribute access discussion
On 2/13/07, Josiah Carlson [EMAIL PROTECTED] wrote: As for people who say, but getattr, setattr, and delattr aren't used; please do some searches of the Python standard library. In a recent source checkout of the trunk Lib, there are 100+ uses of setattr, 400+ uses of getattr (perhaps 10-20% of which being the 3 argument form), and a trivial number of delattr calls. In terms of applications where dynamic attribute access tends to happen; see httplib, urllib, smtpd, the SocketServer variants, etc.

Another data point: on our six-figure-loc code base, we have 123 instances of getattr, 30 instances of setattr, and 0 instances of delattr. There are 5 instances of setattr( ... getattr( ... ) ) on one line (and probably a few more that grep didn't pick up that span multiple lines). As a comparison, enumerate (which I would have believed was much more frequent a priori) is used 67 times, and zip/izip 165 times.

+1 on .[] notation and the idea in general.

-Mike
Re: [Python-Dev] Summary of dynamic attribute access discussion
On 2/13/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:

Mike> As a comparison, enumerate (that I would have believed was much
Mike> more frequent a priori), is used 67 times, and zip/izip 165 times.

But (get|set|has)attr has been around much longer than enumerate. I'm almost certain they existed in 1.5, and perhaps as far back as 1.0. If you really want to compare the two, go back to your code baseline before enumerate was added to Python (2.3?) and subtract from your counts all the *attr calls that existed then and then compare the adjusted counts with enumerate.

The entire codebase was developed post-2.4, and I am a bit of an enumerate-nazi, so I don't think that is a concern g.

Given that you have more uses of zip/izip maybe we should be discussing syntactic support for that instead. ;-)

There are even more instances of len()... len(seq) -> |seq|? g

-Mike
Re: [Python-Dev] Summary of dynamic attribute access discussion
On 2/13/07, Greg Ewing [EMAIL PROTECTED] wrote: Mike Klaas wrote: As a comparison, enumerate (that I would have believed was much more frequent a priori), is used 67 times, and zip/izip 165 times. By that argument, we should be considering a special syntax for zip and izip before getattr.

I don't really buy that. Frequency of use must be balanced against the improvement in legibility. Assuming that my figures bear some correspondence to typical usage patterns, enumerate() was introduced despite the older idiom of

for i, item in zip(xrange(len(seq)), seq):

being less frequent than getattr. Similarly, I see no clamor to add syntactic support for len(). Its current usage is clear.

[note: this post is not continuing to argue in favour of the proposal]

-Mike
Re: [Python-Dev] Interning string subtype instances
On 2/12/07, Hrvoje Nikšić [EMAIL PROTECTED] wrote: cause problems for other users of the interned string. I agree with the reasoning, but propose a different solution: when interning an instance of a string subtype, PyString_InternInPlace could simply intern a copy. Interning currently requires an external reference to prevent garbage collection (I believe). What will hold a reference to the string copy? -Mike
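[Editor's note: for what it's worth, Python 3's sys.intern sidesteps this question by refusing subtype instances outright rather than copying or interning them in place -- a quick check of today's behavior, not of the 2.x API under discussion:]

```python
import sys

class MyStr(str):
    pass

# Exact str instances intern fine (concatenation yields a fresh, uninterned string)
a = sys.intern("spam-" + "eggs")
assert a == "spam-eggs"

# Subtype instances are rejected with TypeError instead of being interned
try:
    sys.intern(MyStr("spam"))
except TypeError:
    refused = True
else:
    refused = False
assert refused
```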
Re: [Python-Dev] Object creation hook
On 1/23/07, Kristján V. Jónsson [EMAIL PROTECTED] wrote: Hello there. I am trying to insert a hook into python enabling a callback for all just-created objects. The intention is to debug and find memory leaks, e.g. by having the hook function insert the object into a WeakKeyDictionary. I have already added a method to object to set such a hook, and object_new now calls it upon completion, but this is far from covering all places. Initially, I thought object_init were the place, but almost no classes call object.__init__ from their __init__ method. Then there is the separate case of old-style classes. Any suggestions on how to do a global object creation hook in python? When I've used such things in the past, I usually had some idea which classes I was interested in targeting. I used a metaclass for doing the tracking, and either invoked it on individual classes, or used __metaclass__ = X to apply it (something like class object(object): __metaclass__ = X would do the trick for new-style classes that inherit from object directly). -Mike
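[Editor's note: a minimal sketch of the metaclass-tracking approach, in Python 3 spelling (the 2.x era would use __metaclass__ = Tracker instead of the metaclass= keyword); the Tracker and Widget names are made up for illustration:]

```python
import weakref

class Tracker(type):
    """Metaclass that records every instance created by classes using it,
    without keeping those instances alive (hence the WeakSet)."""
    live = weakref.WeakSet()

    def __call__(cls, *args, **kwargs):
        # type.__call__ runs the normal __new__/__init__ machinery
        obj = super().__call__(*args, **kwargs)
        Tracker.live.add(obj)
        return obj

class Widget(metaclass=Tracker):
    pass

w = Widget()
assert w in Tracker.live
del w                       # with CPython refcounting, the WeakSet entry
assert len(Tracker.live) == 0  # disappears as soon as the object does
```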
Re: [Python-Dev] The bytes type
On 1/12/07, Raymond Hettinger [EMAIL PROTECTED] wrote: [A.M. Kuchling] 2.6 wouldn't go changing existing APIs to begin requiring or returning the bytes type[*], of course, but extensions and new modules might use it. The premise is dubious. If I am currently maintaining a module, why would I switch to a bytes type and forgo compatibility with Py2.5 and prior? I might as well just convert it to run on Py3.0 and leave my Py2.5 code as-is for people who want to run 2.x. A mutable bytes type is a useful addition to 2.X aside from the 3.0-compatibility motivation. Isn't that sufficient justification? -Mike
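[Editor's note: the mutable bytes type in question eventually landed as bytearray (available in 2.6 and 3.0); a quick illustration of why it is useful on its own, independent of any 3.0 migration:]

```python
# bytearray supports in-place mutation; no new object per edit
buf = bytearray(b"hello world")
buf[0:5] = b"HELLO"     # slice assignment rewrites bytes in place
buf.extend(b"!")        # grows the same underlying buffer
assert bytes(buf) == b"HELLO world!"
```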
Re: [Python-Dev] 2.5.1 plans
On 1/4/07, Ralf W. Grosse-Kunstleve [EMAIL PROTECTED] wrote: It would be nice if this simple fix could be included (main branch and 2.5.1): https://sourceforge.net/tracker/?func=detail&aid=1598181&group_id=5470&atid=105470 [ 1598181 ] subprocess.py: O(N**2) bottleneck I submitted the trivial fix almost two months ago, but apparently nobody feels responsible... I just reviewed the patch, which should help it get accepted. -Mike
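[Editor's note: the patch itself isn't quoted here, but the classic fix for this kind of O(N**2) bottleneck is to accumulate chunks in a list and join once, instead of repeatedly concatenating onto one growing string -- a general sketch of the pattern, not the actual subprocess.py change:]

```python
import io

stream = io.BytesIO(b"x" * 100_000)   # stand-in for a pipe/file

chunks = []
while True:
    chunk = stream.read(8192)
    if not chunk:
        break
    chunks.append(chunk)       # amortized O(1) per chunk
data = b"".join(chunks)        # one O(N) join at the end

# The anti-pattern would be: data += chunk inside the loop,
# which re-copies the whole accumulated buffer every iteration.
assert data == b"x" * 100_000
```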
Re: [Python-Dev] Non-blocking (asynchronous) timer without thread?
On 12/22/06, Evgeniy Khramtsov [EMAIL PROTECTED] wrote: The question to python core developers: Are there any plans to implement a non-blocking timer like threading.Timer() but without a thread? Some interpreted languages (like Tcl or Erlang) have such functionality, so I think it would be a great feature in Python :) The main goal is to prevent threads overhead and problems with race conditions and deadlocks. I'm not sure how having python execute code at an arbitrary time would _reduce_ race conditions and/or deadlocks. And if you want to make it safe by executing code that shares no variables or resources, then it is no less safe to use threads, due to the GIL. If you can write your application in an event-driven way, Twisted might be able to do what you are looking for. cheers, -Mike
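[Editor's note: for completeness, the stdlib already offers single-threaded timers when your code owns the event loop, via the sched module -- a minimal sketch; event-driven frameworks like Twisted generalize the same idea:]

```python
import sched
import time

fired = []
s = sched.scheduler(time.monotonic, time.sleep)
s.enter(0.01, 1, lambda: fired.append("timer"))  # delay, priority, action
s.run()   # blocks in this thread until all scheduled events have run
assert fired == ["timer"]
```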
Re: [Python-Dev] [NPERS] Re: a feature i'd like to see in python #2: indexing of match objects
On 12/6/06, Alastair Houghton [EMAIL PROTECTED] wrote: [from previous message]: Anyway, clearly what people will expect here (talking about the match object API) is that m[3:4] would give them a list (or some equivalent sequence object) containing groups 3 and 4. Why do you think someone would expect a match object? It's much more likely to be confusing to people that they have to write list(m)[x:y] or [m[i] for i in xrange(x,y)] when m[x] and m[y] work just fine. Look, I give in. There's no point trying to convince any of you further, and I don't have the time or energy to press the point. Implement it as you will. If necessary, supporting slicing on match objects can be an extension in my re replacement. Keep in mind when implementing that m[3:4] should contain only the element at index 3, not both 3 and 4, as you've seemed to imply twice. cheers, -Mike
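[Editor's note: the slicing convention at issue, for the record. Match-object indexing did eventually arrive in Python 3.6 (m[n] as shorthand for m.group(n)), though without slice support:]

```python
import re

m = re.match(r"(\w)(\w)(\w)(\w)", "abcd")

# m[0] is the whole match; m[n] is group n (Python 3.6+)
assert m[0] == "abcd"
assert m[3] == "c"

# Python's slice convention: s[3:4] holds only the element at index 3,
# so a hypothetical m[3:4] would contain group 3 alone, not groups 3 and 4.
assert "abcd"[3:4] == "d"
assert [m[i] for i in range(3, 5)] == ["c", "d"]
```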
[Python-Dev] Segfault in python 2.5
[http://sourceforge.net/tracker/index.php?func=detail&aid=1579370&group_id=5470&atid=105470] Hello, I've managed to provoke a segfault in python2.5 (occasionally it is just an "invalid argument to internal function" error). I've posted a traceback and a general idea of what the code consists of in the sourceforge entry. Unfortunately, I've been attempting for hours to reduce the problem to a completely self-contained script, but it is resisting my efforts due to timing problems. Should I continue in that vein, or is it more useful to provide more detailed results from gdb? Thanks, -Mike
Re: [Python-Dev] Segfault in python 2.5
On 10/18/06, Michael Hudson [EMAIL PROTECTED] wrote: Mike Klaas [EMAIL PROTECTED] writes: I've been reading the bug report with interest, but unless I can reproduce it it's mighty hard for me to debug, as I'm sure you know. Indeed. Unfortunately, I've been attempting for hours to reduce the problem to a completely self-contained script, but it is resisting my efforts due to timing problems. Should I continue in that vein, or is it more useful to provide more detailed results from gdb? Well, I don't think that there's much point in posting masses of details from gdb. You might want to try trying to fix the bug yourself I guess, trying to figure out where the bad pointers come from, etc. I've peered at the code, but my knowledge of the python core is superficial at best. The fact that it is occurring as a result of a long string of garbage collection/dealloc/etc. and involves threading lowers my confidence further. That said, I'm beginning to think that to reproduce this in a standalone script will require understanding the problem in greater depth regardless... Are you absolutely sure that the fault does not lie with any extension modules you may be using? Memory scribbling bugs have been known to cause arbitrarily confusing problems... I've had sufficient experience being arbitrarily confused to never be sure about such things, but I am quite confident. The script I posted in the bug report is all stock python save for the operation in 's. That operation is pickling and unpickling (using pickle, not cPickle) a somewhat complicated pure-python instance several times. It's doing nothing with the actual instance--it just happens to take the right amount of time to trigger the segfault. It's still not perfect--this trimmed-down version segfaults only sporadically, while the original python script segfaults reliably. 
-Mike
Re: [Python-Dev] Segfault in python 2.5
On 10/18/06, Tim Peters [EMAIL PROTECTED] wrote: [Mike Klaas] Indeed. Note that I just attached a much simpler pure-Python script that fails very quickly, on Windows, using a debug build. Read the new comment to learn why both Windows and debug build are essential to it failing reliably and quickly ;-) Thanks! Next time I find a bug, installing Windows will certainly be my first step <g>. Yes, but you did good! This is still just an educated guess on my part, but my education here is hard to match ;-): this new business of generators deciding to clean up after themselves if they're left hanging appears to have made it possible for a generator to hold on to a frame whose thread state has been free()'d, after the thread that created the generator has gone away. Then when the generator gets collected as trash, the new exception-based "clean up abandoned generator" gimmick tries to access the generator's frame's thread state, but that's just a raw C struct (not a Python object with reachability-based lifetime), and the thread free()'d that struct when the thread went away. The important timing-based vagary here is whether dead-thread cleanup gets performed before the main thread tries to clean up the trash generator. Indeed--and normally it doesn't happen that way. My/your script never crashes on the first iteration because the thread's target is the generator and thus it gets DECREF'd before the thread terminates. But the exception from the first iteration holds on to a reference to the frame/generator, so when it gets cleaned up (in the second iteration, due to a new exception overwriting it) the generator is freed after the thread is destroyed. At least, I think... Offhand I don't know how to repair it. Thread states /aren't/ Python objects, and there's no provision for a thread state to outlive the thread it represents. 
Take this with a grain of salt, but ISTM that the problem can be repaired by resetting the generator's frame threadstate to the current threadstate (in genobject.c:gen_send_ex():80):

    Py_XINCREF(tstate->frame);
    assert(f->f_back == NULL);
    f->f_back = tstate->frame;
+   f->f_tstate = tstate;
    gen->gi_running = 1;
    result = PyEval_EvalFrameEx(f, exc);
    gen->gi_running = 0;

Shouldn't the thread state generally be the same anyway? (I seem to recall some gloomy warning against resuming generators in separate threads). This solution is surely wrong--if f_tstate != tstate, then the generator _is_ being resumed in another thread, and so the generated traceback will be wrong (among other issues which surely occur by fudging a frame's threadstate). Perhaps it could be set conditionally by gen_close before signalling the exception? A lie, but a smaller lie than a segfault. We could advertise that the exception occurring from generator .close() isn't guaranteed to have an accurate traceback in this case. Take all this with a grain of un-core-savvy salt. Thanks again for investigating this, Tim, -Mike
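[Editor's note: the cleanup gimmick Tim describes is observable from pure Python -- closing (or garbage-collecting) a suspended generator resumes its frame just long enough to run the exception machinery, which is why the frame's thread state matters. A Python 3 sketch of the mechanism, not of the crash itself:]

```python
log = []

def gen():
    try:
        yield 1
    finally:
        # Runs when the generator is closed or collected while suspended:
        # a GeneratorExit is raised inside the frozen frame to unwind it.
        log.append("cleaned up")

g = gen()
next(g)     # suspend at the yield
g.close()   # resume the frame solely to run the cleanup code
assert log == ["cleaned up"]
```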