Re: [Python-Dev] py34 makes it harder to read all of a pty
2014-11-12 22:16 GMT+00:00 Buck Golemon buck.2...@gmail.com: This is due to the fix for issue21090, which aimed to un-silence errors which previously went unheard. The fix is for me, as a user, to write a loop that uses os.read and interprets EIO as EOF. This is what I had hoped file.read() would do for me, however, and what it used to do in previous Pythons. There's no reason for read() to interpret EIO as EOF in the general case: it was masked in previous versions because of a mere bug. The behavior is now correct, although being able to retrieve the data read so far in case of a buffered read could be useful. ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
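The loop described above, using os.read() and treating EIO as EOF, can be sketched like this (a minimal illustration of the workaround, not the actual code from issue21090; the helper name is made up):

```python
import errno
import os

def read_all_from_pty(fd, bufsize=1024):
    """Read until EOF from a pty master, treating EIO as end-of-file.

    On Linux, reading from a pty master after the slave side has been
    closed fails with EIO rather than returning b'' -- this helper
    converts that error into a normal EOF, as described in the thread.
    """
    chunks = []
    while True:
        try:
            data = os.read(fd, bufsize)
        except OSError as e:
            if e.errno == errno.EIO:
                break  # slave closed: interpret EIO as EOF
            raise
        if not data:
            break
        chunks.append(data)
    return b"".join(chunks)
```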
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
There's no return value, a KeyboardInterrupt exception is raised. The PEP wouldn't change this behavior. As for the general behavior: all programming languages/platforms handle EINTR transparently. It's high time for Python to have a sensible behavior in this regard. 2014-09-01 8:38 GMT+01:00 Marko Rauhamaa ma...@pacujo.net: Victor Stinner victor.stin...@gmail.com: No, it's the opposite. The PEP doesn't change the default behaviour of SIGINT: CTRL+C always interrupts the program. Which raises an interesting question: what happens to the os.read() return value if SIGINT is received? Marko
Re: [Python-Dev] RFC: PEP 475, Retry system calls failing with EINTR
2014-09-01 12:15 GMT+01:00 Marko Rauhamaa ma...@pacujo.net: Charles-François Natali cf.nat...@gmail.com: Which raises an interesting question: what happens to the os.read() return value if SIGINT is received? There's no return value, a KeyboardInterrupt exception is raised. The PEP wouldn't change this behavior. Slightly disconcerting... but I'm sure overriding SIGINT would cure that. You don't want to lose data if you want to continue running. As for the general behavior: all programming languages/platforms handle EINTR transparently. C doesn't. EINTR is there for a purpose. Python is slightly higher level than C, right? I was referring to Java, Go, Haskell... Furthermore, that's not true: many operating systems actually restart syscalls by default (including Linux, man 7 signal): Interruption of system calls and library functions by signal handlers If a signal handler is invoked while a system call or library function call is blocked, then either: * the call is automatically restarted after the signal handler returns; or * the call fails with the error EINTR. Which of these two behaviors occurs depends on the interface and whether or not the signal handler was established using the SA_RESTART flag (see sigaction(2)). The details vary across UNIX systems; below, the details for Linux. The reason the interpreter is subject to so many EINTRs is that we *explicitly* clear SA_RESTART, because the C-level signal handler must be handled by the interpreter to have a chance to run the Python-level handlers from the main loop. There are many aspects of signal handling in Python that make it different from C: if you want C semantics, stick to C. I do not want to have to put all blocking syscalls within a try/except loop: have a look at the stdlib code, you'll see it's really a pain and ugly. And look at the number of EINTR-related bugs we've had.
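The try/except boilerplate the message complains about looks roughly like this (a generic sketch of the pattern PEP 475 removes, not code from the stdlib):

```python
def retry_on_eintr(func, *args):
    """Call func(*args), retrying if it is interrupted by a signal.

    This is the boilerplate PEP 475 aims to eliminate: without the
    PEP, any signal arriving during a blocking syscall surfaces as
    InterruptedError (errno EINTR), and every call site has to loop.
    """
    while True:
        try:
            return func(*args)
        except InterruptedError:
            continue  # syscall interrupted by a signal: just retry
```

With the PEP accepted, calls like os.read() do this retrying internally (after running any Python-level signal handlers), so callers no longer need the wrapper.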
Re: [Python-Dev] Exposing the Android platform existence to Python modules
2014-08-01 13:23 GMT+01:00 Shiz h...@shiz.me: Is your P.S. suggestive that you would not be willing to support your port for use by others? Of course, until it is somewhat complete, it is hard to know how complete and compatible it can be. Oh, no, nothing like that. It's just that I'm not sure, as goes for anything, that it would be accepted into mainline CPython. Better safe than sorry in that aspect: maybe the maintainers don't want to support Android in the first place. :) Well, Android is so popular that supporting it would definitely be interesting. There are a couple of questions however (I'm not familiar at all with Android, I don't have a smartphone ;-): - Do you have an idea of the amount of work/patch size required? Do you have an example of a patch (even if it's a work-in-progress)? - Is there really a common Android platform? I've heard a lot about fragmentation, so would we have to support several Android flavours (like #ifdef __ANDROID_VENDOR_A__, #elif defined __ANDROID_VENDOR_B__)?
Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
2014-07-01 8:44 GMT+01:00 Victor Stinner victor.stin...@gmail.com: IMO we must decide if scandir() must support or not file descriptor. It's an important decision which has an important impact on the API. I don't think we should support it: it's way too complicated to use, error-prone, and leads to messy APIs.
Re: [Python-Dev] PEP 471: scandir(fd) and pathlib.Path(name, dir_fd=None)
2014-07-02 12:51 GMT+02:00 Charles-François Natali cf.nat...@gmail.com: I don't think we should support it: it's way too complicated to use, error-prone, and leads to messy APIs. Can you please elaborate? Which kind of issue do you see? Handling the lifetime of the directory file descriptor? Yes, among other things. You can e.g. have a look at os.fwalk() or shutil._rmtree_safe_fd() to see that using those *properly* is far from being trivial. You don't like the dir_fd parameter of os functions? Exactly, I think it complicates the API for little benefit (FWIW, no other language I know of exposes them).
Re: [Python-Dev] should tests be thread-safe?
You might have forgotten to include Python-dev in the reply. Indeed, adding it back! Thank you for the reply. I might have expressed the question poorly. I meant: I have a script that I know is not thread-safe, but it doesn't matter because the test itself doesn't run any threads and the current tests are never(?) run in multiple threads (-j uses processes). Should this *new* test be fixed if e.g. there is a desire to be able to run (at least some) tests in multiple threads concurrently in the future? The short answer is: no, you don't have to make your test thread-safe, as long as it can reliably run even in the presence of background threads (like the tkinter threads Victor mentions).
Re: [Python-Dev] API and process questions (sparked by Claudiu Popa on 16104
2014-04-28 21:24 GMT+01:00 Claudiu Popa pcmantic...@gmail.com: [...] If anyone agrees with the above, then I'll modify the patch. This will be its last iteration, any other bikeshedding should be addressed by the core dev who'll apply it. I'm perfectly happy with those proposals.
Re: [Python-Dev] [issue6839] zipfile can't extract file
2014-04-30 3:58 GMT+01:00 Steven D'Aprano st...@pearwood.info: On Tue, Apr 29, 2014 at 07:48:00PM -0700, Jessica McKellar wrote: Hi Adam, Gentlemen, Thanks for contributing to Python! But not everyone on this list is a guy. And not all of the guys are gentlemen :-) And I thought guys could be used to address mixed-gender groups (I'm pretty sure I've heard some ladies use it in this setting), but I'm not a native speaker. The idea being that one should not infer too much from a salutation from someone who might not be a native speaker (some languages default to masculine for a mixed audience), although in this case Ladies and gentlemen is really famous. In any case, I'm sure he'd like to have his code reviewed by someone, regardless of their gender!
Re: [Python-Dev] API and process questions (sparked by Claudiu Popa on 16104
(2) The patch adds new functionality to use multiple processes in parallel. The normal parameter values are integers indicating how many processes to use. The parameter also needs two special values -- one to indicate "use os.cpu_count()", and the other to indicate "don't use multiprocessing at all". (A) Is there a Best Practice for this situation, with two odd cases? No. In this situation I would consider 0 or -1 for "use os.cpu_count()" and None for "don't use multi-processing". Why would the user care if multiprocessing is used behind the scenes? It would be strange for processes=1 to fail if multiprocessing is not available. If you set a default value of 1, then compileall() will work regardless of whether multiprocessing is available. In short: processes == 0: use os.cpu_count(); processes == 1 (default): just use normal sequential compiling; processes > 1: use multiprocessing. There's no reason to introduce None. Or am I missing something?
Re: [Python-Dev] API and process questions (sparked by Claudiu Popa on 16104
And incidentally, I think that the argument *processes* should be renamed to *workers*, or *jobs* (like in make), and any mention of multiprocessing in the documentation should be removed (if any): multiprocessing is an implementation detail. When I type: make -jN I don't really care that make is using fork() ;-)
[Python-Dev] file objects guarantees
Hi, What's meant exactly by a file object? Let me be more specific: for example, pickle.dump() accepts a file object. Looking at the code, it doesn't check the return value of its write() method. So it assumes that write() should always write the whole data (no partial write). Same thing for read: it assumes there won't be short reads. A sample use case would be passing a socket.makefile() to pickle: it works, because makefile() returns a BufferedReader/Writer which takes care of short reads/writes. But the documentation just says file object. And if you have a look at the file object definition in the glossary: https://docs.python.org/3.5/glossary.html#term-file-object There are actually three categories of file objects: raw binary files, buffered binary files and text files. Their interfaces are defined in the io module. The canonical way to create a file object is by using the open() function. So someone passing e.g. a raw binary file - which doesn't handle short reads/writes - would run into trouble. It's the same thing for e.g. GzipFile, and probably many others. Would it make sense to add a note somewhere?
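The short-write problem described above is what buffered writers solve behind the scenes. A minimal sketch of the looping that a BufferedWriter does for you (the helper name is made up; raw file descriptors and io.FileIO objects may write fewer bytes than asked):

```python
import os

def write_all(fd, data):
    """Keep calling os.write() until every byte of *data* is written.

    Raw writes may be partial; buffered writers loop like this
    internally, which is why pickle works with socket.makefile()
    but could silently truncate data on a raw file object.
    """
    view = memoryview(data)
    while view:
        n = os.write(fd, view)  # may write fewer bytes than requested
        view = view[n:]
```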
Re: [Python-Dev] API and process questions (sparked by Claudiu Popa on 16104
2014-04-28 18:29 GMT+01:00 Jim J. Jewett jimjjew...@gmail.com: On Mon, Apr 28, 2014 at 12:56 PM, Charles-François Natali cf.nat...@gmail.com wrote: Why would the user care if multiprocessing is used behind the scenes? Err ... that was another set of questions that I forgot to ask. (A) Why bother raising an error if multiprocessing is unavailable? After all, there is a perfectly fine fallback... On the other hand, errors should not pass silently. If a user has explicitly asked for multiprocessing, there should be some notice that it didn't happen. And builds are presumably something that a developer will monitor to respond to the Exception. The point I'm making is that he's not asking for multiprocessing, he's asking for a parallel build. If you pass 1 (or keep the default value), there's no fallback involved: the code should simply proceed sequentially. (A1) What sort of Error? I'm inclined to raise the original ImportError, but the patch prefers a ValueError. NotImplementedError would maybe make sense. As Claudiu pointed out, processes=1 should really mean 1 worker process, which is still different from do everything in the main process. I'm not sure that level of control is really worth the complexity, but I'm not certain it isn't. I disagree. If you pass jobs=1 (and not processes=1), then you don't care whether multiprocessing is available or not: you just do everything in your main process. It would be quite wasteful to create a single child process! processes = 0: use os.cpu_count() I could understand doing that for 0 or -1; what is the purpose of doing it for both, let alone for -4? Are we at the point where the parameter should just take positive integers or one of a set of specified string values? Honestly, as long as it accepts 0, I'm happy.
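The semantics hashed out in this thread can be summarized in a small dispatcher (a hypothetical helper for illustration only; the name and exact rules are not the final compileall API):

```python
import os

def resolve_workers(jobs):
    """Map a *jobs* argument (as discussed above) to a worker count.

    Illustrative semantics: 0 means "one worker per CPU",
    1 means "run sequentially in the main process, no
    multiprocessing involved", and N > 1 means N worker processes.
    """
    if jobs < 0:
        raise ValueError("jobs must be >= 0")
    if jobs == 0:
        return os.cpu_count() or 1  # cpu_count() can return None
    return jobs
```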
Re: [Python-Dev] [numpy wishlist] PyMem_*Calloc
Indeed, that's very reasonable. Please open an issue on the tracker!
[Python-Dev] pickle self-delimiting
Hi, Unless I'm mistaken, pickle's documentation doesn't mention that the pickle wire-format is self-delimiting. Is there any reason why it's not documented? The reason I'm asking is because I've seen some code out there doing its own ad-hoc length-prefix framing. Cheers, cf
Re: [Python-Dev] pickle self-delimiting
No reason AFAIK. However, the fact that it is self-delimited is implicit in the fact that "Bytes past the pickled object's representation are ignored": https://docs.python.org/dev/library/pickle.html#pickle.load I find this sentence worrying: it could lead one to think that load() could read more bytes than the expected object representation size: this would make pickle actually non-self-delimiting, and could lead to problems when reading e.g. from a socket, since an extraneous read() could block. I think it's worth making this clear in the doc; I'll open an issue on the tracker.
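The self-delimiting property under discussion is easy to demonstrate: several objects can be dumped back-to-back on one stream and loaded one at a time, with no ad-hoc length-prefix framing:

```python
import io
import pickle

# Dump three objects onto a single stream.
buf = io.BytesIO()
for obj in (1, "two", [3, 4]):
    pickle.dump(obj, buf)

# Each pickle.load() stops exactly at the end of one pickled
# object, because the wire format is self-delimiting.
buf.seek(0)
objs = []
while True:
    try:
        objs.append(pickle.load(buf))
    except EOFError:  # raised once the stream is exhausted
        break
```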
Re: [Python-Dev] Confirming status of new modules in 3.4
2014-03-15 21:44 GMT+00:00 Nikolaus Rath nikol...@rath.org: Guido van Rossum gu...@python.org writes: This downside of using subclassing as an API should be well known by now and widely warned against. It wasn't known to me until now. Are these downsides described in some more detail somewhere? The short version is: inheritance breaks encapsulation. As a trivial and stupid example, let's say you need a list object which counts the number of items inserted/removed (it's completely stupid, but that's not the point :-). So you might do something like:

    class CountingList(list):
        [...]

        def append(self, e):
            self.inserted += 1
            return super().append(e)

        def extend(self, l):
            self.inserted += len(l)
            return super().extend(l)

Looks fine, it would probably work. Now, it's actually very fragile: imagine what would happen if list.extend() was internally implemented by calling list.append() for each element: you'd end up counting each element twice (since the subclass append() method would be called). And that's the problem: by deriving from a class, you become dependent on its implementation, even though you're only using its public API. Which means that it could work with e.g. CPython but not PyPy, or break with a new version of Python. Another related problem is, as Guido explained, that if you add a new method in the subclass, and the parent class gains a method with the same name in a new version, you're in trouble. That's why advising inheritance as a silver bullet for code reuse is IMO one of the biggest mistakes in OOP: although attractive, inheritance breaks encapsulation. As a rule of thumb, you should only use inheritance within a module/package, or in other words only if you're in control of the implementation. The alternative is to use composition. For more details, I highly encourage anyone interested to look at the book Effective Java by Joshua Bloch (the example above is inspired by his book).
Although Java-centric, it's packed with advice, patterns and anti-patterns that are relevant to OOP and programming in general (it's in my top-5 books). cf
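The composition alternative mentioned above can be sketched as follows (an illustrative rewrite of the CountingList example by wrapping a list instead of subclassing it):

```python
class CountingList:
    """Counting wrapper built by composition instead of inheritance.

    Unlike the subclass version, this cannot double-count even if
    list.extend() were internally implemented via list.append():
    the wrapped list's internals never call back into our methods.
    """

    def __init__(self):
        self._items = []
        self.inserted = 0

    def append(self, e):
        self.inserted += 1
        self._items.append(e)

    def extend(self, iterable):
        items = list(iterable)
        self.inserted += len(items)
        self._items.extend(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)
```

The cost is having to forward every method you care about, but the wrapper only depends on the wrapped class's public API, never on its implementation.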
Re: [Python-Dev] Confirming status of new modules in 3.4
2014-03-15 11:02 GMT+00:00 Giampaolo Rodola' g.rod...@gmail.com: One part which can be improved is that right now the selectors module doesn't take advantage of e/poll()'s modify() method: instead it just calls unregister() and register() on the fd every time, which is of course considerably slower (there's also a TODO in the code about this). I guess that can be fixed later in a safe manner. Sure, it can be fixed easily, but I'd like to see the gain of this on a non-trivial benchmark (I mean a realistic workload, not just calling modify() in a tight loop).
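For context, modify() is already part of the selectors public API; the point above is purely about its implementation (the base class falls back to unregister() + register(), where epoll could issue a single EPOLL_CTL_MOD). A minimal usage sketch:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
a, b = socket.socketpair()

# Register for reads, then change the interest set in place.
sel.register(a, selectors.EVENT_READ)
sel.modify(a, selectors.EVENT_READ | selectors.EVENT_WRITE)

# The registration now reflects the new event mask.
key = sel.get_key(a)
```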
Re: [Python-Dev] PEP 428 - pathlib API questions
2013/11/25 Greg Ewing greg.ew...@canterbury.ac.nz: Ben Hoyt wrote: However, it seems there was no further discussion about why not extension and extensions? I have never heard a filename extension being called a suffix. You can't have read many unix man pages, then! I just searched for suffix in the gcc man page, and found this: For any given input file, the file name suffix determines what kind of compilation is done: I know it is a suffix in the sense of the English word, but I've never heard it called that in this context, and I think context is important. This probably depends on your background. In my experience, the term extension arose in OSes where it was a formal part of the filename syntax, often highly constrained. E.g. RT11, CP/M, early MS-DOS. Unix has never had a formal notion of extensions like that, only informal conventions, and has called them suffixes at least some of the time for as long as I can remember. Indeed. Just for reference, here's an extract of the POSIX basename(1) man page [1]: SYNOPSIS basename string [suffix] DESCRIPTION The string operand shall be treated as a pathname, as defined in XBD Pathname. The string string shall be converted to the filename corresponding to the last pathname component in string and then the suffix string suffix, if present, shall be removed. [1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/basename.html cf
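The suffix terminology debated above is what pathlib (PEP 428) ultimately adopted:

```python
from pathlib import PurePath

# pathlib settled on the POSIX "suffix" terminology rather than
# "extension"; multi-part names get a list of suffixes.
p = PurePath("archive.tar.gz")
# p.suffix is the last component, p.suffixes all of them,
# and p.stem is the name with only the last suffix removed.
```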
[Python-Dev] PEP 454 - tracemalloc - accepted
Hi, I'm happy to officially accept PEP 454 aka tracemalloc. The API has substantially improved over the past weeks, and is now both easy to use and suitable as a foundation for high-level memory-profiling tools. Thanks to Victor for his work! Charles-François
[Python-Dev] PEP 454 (tracemalloc) close to pronouncement
Hi, After several exchanges with Victor, PEP 454 has reached a status which I consider ready for pronouncement [1]: so if you have any last minute comment, now is the time! Cheers, cf [1] http://www.python.org/dev/peps/pep-0454/
Re: [Python-Dev] Updated PEP 454 (tracemalloc): no more metrics!
2013/10/24 Kristján Valur Jónsson krist...@ccpgames.com: Now, I would personally not truncate the stack, because I can afford the memory, but even if I would, for example, to hide a bunch of detail, I would want to throw away the _lower_ details of the stack. It is unimportant to me to know if memory was allocated in ...;itertools.py;logging.py;stringutil.py but more important to know that it was allocated in main.py;databaseengine.py;enginesettings.py;... Well, maybe to you, but if you look at valgrind for example, it keeps the top of the stack: and it makes a lot of sense to me, since otherwise you won't be able to find where the leak occurred. Anyway, since the stack depth is a tunable parameter, this shouldn't be an issue in practice: just save the whole stack. 2013/10/24 MRAB pyt...@mrabarnett.plus.com: When I was looking for memory leaks in the regex module I simply wrote all of the allocations, reallocations and deallocations to a log file and then parsed it afterwards using a Python script. Simple, but effective. We've all done that ;-) 1) really, all that is required in terms of data is the traceback.get_traces() function. Further, it _need_ not return addresses since they are not required for analysis. It is sufficient for it to return a list of (traceback, size, count) tuples. Sure. Since the beginning, I've also been leaning towards a minimal API, and letting third-party tools do the analysis. It makes a lot of sense, since some people will want just basic snapshot information, some others will want to compute various statistics, some others will want to display the result in a GUI... But OTOH, it would be too bad not to ship the stdlib with a basic tool to process the data, so as to make it usable out of the box. And in this regard, we should probably mimic what's done for CPU profiling: there is low-level profiling data-gathering infrastructure (profile and cProfile), but there's also a pstats.Stats class allowing basic operations/display on this raw data.
That's IMO a reasonable balance. cf
Re: [Python-Dev] [Python-checkins] cpython: Switch subprocess stdin to a socketpair, attempting to fix issue #19293 (AIX
For the record, pipe I/O seems a little faster than socket I/O under Linux:

$ ./python -m timeit -s "import os, socket; a,b = socket.socketpair(); r=a.fileno(); w=b.fileno(); x=b'x'*1000" "os.write(w, x); os.read(r, 1000)"
100 loops, best of 3: 1.1 usec per loop
$ ./python -m timeit -s "import os, socket; a,b = socket.socketpair(); x=b'x'*1000" "a.sendall(x); b.recv(1000)"
100 loops, best of 3: 1.02 usec per loop
$ ./python -m timeit -s "import os; r, w = os.pipe(); x=b'x'*1000" "os.write(w, x); os.read(r, 1000)"
100 loops, best of 3: 0.82 usec per loop

That's a raw write()/read() benchmark, but it's not taking something important into account: pipes/sockets are usually used to communicate between concurrently running processes. And in this case, an important factor is the pipe/socket buffer size: the smaller it is, the more context switches (due to blocking writes/reads) you'll get, which greatly decreases throughput. And by default, Unix sockets have larger buffers than pipes (between 4K and 64K for pipes depending on the OS). I wrote a quick benchmark forking a child process, with the parent writing data through the pipe, and waiting for the child to read it all. Here are the results (on Linux):

# time python /tmp/test.py pipe
real 0m2.479s
user 0m1.344s
sys  0m1.860s
# time python /tmp/test.py socketpair
real 0m1.454s
user 0m1.242s
sys  0m1.234s

So socketpair is actually faster.
But as noted by Victor, there are slight differences between pipes and sockets I can think of:
- pipes guarantee write atomicity if less than PIPE_BUF is written, which is not the case for sockets
- more annoying: in subprocess, the pipes are not set non-blocking: after a select()/poll() returns a FD write-ready, we write less than PIPE_BUF at a time to avoid blocking: this likely wouldn't work with a socketpair

But this patch doesn't touch subprocess itself, and the FDs are only used by asyncio, which sets them non-blocking: so this could only be an issue for the spawned process, if it does rely on the two pipe-specific behaviors above. OTOH, having a unique implementation on all platforms makes sense, and I don't know if it'll actually be a problem in practice, so we could ship as-is and wait until someone complains ;-) cf
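The forking benchmark described above is not shown in the thread; here is a rough sketch of what such a script could look like (all names are illustrative, and the numbers will of course vary by machine and kernel):

```python
import os
import socket
import time

def bench(make_pair, total=10_000_000, chunk=1000):
    """Fork a child that drains the read end while the parent
    writes *total* bytes in *chunk*-sized pieces, then return the
    elapsed wall-clock time in the parent."""
    rfd, wfd = make_pair()
    pid = os.fork()
    if pid == 0:  # child: read until EOF, then exit
        os.close(wfd)
        while os.read(rfd, chunk):
            pass
        os._exit(0)
    os.close(rfd)  # parent: write everything, then wait for the child
    data = b"x" * chunk
    start = time.monotonic()
    for _ in range(total // chunk):
        os.write(wfd, data)
    os.close(wfd)
    os.waitpid(pid, 0)
    return time.monotonic() - start

def pipe_pair():
    return os.pipe()

def socket_pair():
    a, b = socket.socketpair()
    # detach() hands us the raw fds so the socket objects don't
    # close them when garbage-collected
    return a.detach(), b.detach()
```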
[Python-Dev] pathlib (PEP 428) status
Hi, What's the current status of pathlib? Is it targeted for 3.4? It would be a really nice addition, and AFAICT it has already been maturing a while on pypi, and discussed several times here. If I remember correctly, the only remaining issue was stat()'s result caching. cf
Re: [Python-Dev] PEP 454 (tracemalloc): new minimalist version
``get_tracemalloc_memory()`` function: Get the memory usage in bytes of the ``tracemalloc`` module as a tuple: ``(size: int, free: int)``. * *size*: total size of bytes allocated by the module, including *free* bytes * *free*: number of free bytes available to store data What's *free* exactly? I assume it's linked to the internal storage area used by tracemalloc itself, but that's not clear at all. Also, is the tracemalloc overhead included in the above stats (I'm mainly thinking about get_stats() and get_traced_memory())? If yes, I find it somewhat confusing: for example, AFAICT, valgrind's memcheck doesn't report the memory overhead, although it can be quite large, simply because it's not interesting. My goal is to be able to explain how *every* byte is allocated in Python. If you enable tracemalloc, your RSS memory will double, or something like that. You can use get_tracemalloc_memory() to add metrics to a snapshot. It helps to understand how the RSS memory evolves. Basically, get_tracemalloc_memory() is the memory used to store traces. It's something internal to the C module (_tracemalloc). This memory is not traced because it *is* the traces... and so is not counted in get_traced_memory(). The issue is probably the name (or maybe also the doc): would you prefer get_python_memory() / get_traces_memory() names, instead of get_traced_memory() / get_tracemalloc_memory()? No, the names are fine as-is. FYI Objects allocated in tracemalloc.py (real objects, not traces) are not counted in get_traced_memory() because of a filter set up by default (it was not the case in previous versions of the PEP). You can remove the filter using tracemalloc.clear_filters() to see this memory. There are two exceptions: Python objects created for the result of get_traces() and get_stats() are never traced for efficiency. It *is* possible to trace these objects, but it's really too slow. get_traces() and get_stats() may be called outside tracemalloc.py, so another filter would be needed.
Well, it's easier to never trace these objects. Anyway, they are not interesting to understand where your application leaks memory. Perfect, that's all I wanted to know. get_object_trace(obj) is a shortcut for get_trace(get_object_address(obj)). I agree that the wrong size information can be surprising. I can delete get_object_trace(), or rename the function to get_object_traceback() and modify it to only return the traceback. I prefer to keep the function (modified for get_object_traceback). tracemalloc can be combined with other tools like Meliae, Heapy or objgraph to correlate information. When you find an interesting object with these tools, you may be interested to know where it was allocated. If you mean modify it to return only the traceback, then that's fine. As for the name, traceback does indeed sound less confusing than trace, but we should just make sure that the names are consistent across the API (i.e. always use trace or always use traceback, not both). ``get_trace(address)`` function: Get the trace of a memory block as a ``(size: int, traceback)`` tuple where *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples, *filename* and *lineno* can be ``None``. Return ``None`` if the ``tracemalloc`` module did not trace the allocation of the memory block. See also ``get_object_trace()``, ``get_stats()`` and ``get_traces()`` functions. Do you have example use cases where you want to work with raw addresses? An address is the unique key to identify a memory block. In Python, you don't manipulate memory blocks directly, that's why you have a get_object_address() function (linking objects to traces). I added get_trace() because get_traces() is very slow. It would be stupid to call it if you only need one trace of a memory block. I'm not sure that this function is really useful. I added it to work around the performance issue, and because I believe that someone will need it later :-) What do you suggest for this function?
Well, I can certainly find a use-case for get_object_trace(): even if it uses get_trace() internally, I'm not convinced that the latter is useful. If we cannot come up with a use case for working with raw addresses, I'm tempted to just keep get_object_trace() public, and make get_object_address() and get_trace() private. In short, don't make any address-manipulating function public. Are those ``match`` methods really necessary for the end user, i.e. are they worth being exposed as part of the public API? (Oh, I just realized that match_lineno() may lead to bugs, I removed it.) Initially, I exposed the methods for unit tests. Later, I used them in Snapshot.apply_filters() to factorize the code (otherwise there would be 2 implementations to match a filter, one in C, another in Python). I see tracemalloc more as a library, I don't know yet how it will be used by new tools based on it.
Re: [Python-Dev] PEP 454 (tracemalloc): new minimalist version
2013/10/19 Nick Coghlan ncogh...@gmail.com: Speaking of which... Charles-François, would you be willing to act as BDFL-Delegate for this PEP? This will be a very useful new analysis tool, and between yourself and Victor it looks like you'll be able to come up with a solid API. I just suggested that approach to Guido and he also liked the idea :) Well, I'd be happy to help get this merged. There's still the deadline problem: do we have to get this PEP approved and merged within 24 hours? cf ___ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 454 (tracemalloc): new minimalist version
Hi,

I'm happy to see this move forward!

API
===

Main Functions
--------------

``clear_traces()`` function: Clear traces and statistics on Python memory allocations, and reset the ``get_traced_memory()`` counter.

That's nitpicking, but how about just ``reset()`` (I'm probably biased by oprofile's opcontrol --reset)?

``get_stats()`` function: Get statistics on traced Python memory blocks as a dictionary ``{filename (str): {line_number (int): stats}}`` where *stats* is a ``(size: int, count: int)`` tuple; *filename* and *line_number* can be ``None``.

It's probably obvious, but you might want to say once what *size* and *count* represent (and the unit for *size*).

``get_tracemalloc_memory()`` function: Get the memory usage in bytes of the ``tracemalloc`` module as a tuple: ``(size: int, free: int)``.

* *size*: total size of bytes allocated by the module, including *free* bytes
* *free*: number of free bytes available to store data

What's *free* exactly? I assume it's linked to the internal storage area used by tracemalloc itself, but that's not clear at all. Also, is the tracemalloc overhead included in the above stats (I'm mainly thinking about get_stats() and get_traced_memory())? If yes, I find it somewhat confusing: for example, AFAICT, valgrind's memcheck doesn't report the memory overhead, although it can be quite large, simply because it's not interesting.

Trace Functions
---------------

``get_traceback_limit()`` function: Get the maximum number of frames stored in the traceback of a trace of a memory block. Use the ``set_traceback_limit()`` function to change the limit.

I didn't see anywhere the default value for this setting: it would be nice to write it somewhere, and also explain the rationale (memory/CPU overhead...).

``get_object_address(obj)`` function: Get the address of the main memory block of the specified Python object. A Python object can be composed of multiple memory blocks; the function only returns the address of the main memory block.
IOW, this should return the same as id() on CPython? If yes, it could be an interesting note.

``get_object_trace(obj)`` function: Get the trace of a Python object *obj* as a ``(size: int, traceback)`` tuple where *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples; *filename* and *lineno* can be ``None``.

I find the trace word confusing, so it might be interesting to add a note somewhere explaining what it is (the call stack leading to the object allocation, or whatever). Also, this function leaves me with a mixed feeling: it's called get_object_trace(), but you also return the object size - well, a vague estimate thereof. I wonder if the size really belongs here, especially if the information returned isn't really accurate: it will be for an integer, but not for e.g. a list, right? How about just using sys.getsizeof(), which would give a more accurate result?

``get_trace(address)`` function: Get the trace of a memory block as a ``(size: int, traceback)`` tuple where *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples; *filename* and *lineno* can be ``None``. Return ``None`` if the ``tracemalloc`` module did not trace the allocation of the memory block. See also the ``get_object_trace()``, ``get_stats()`` and ``get_traces()`` functions.

Do you have example use cases where you want to work with raw addresses?

Filter
------

``Filter(include: bool, pattern: str, lineno: int=None, traceback: bool=False)`` class: Filter to select which memory allocations are traced. Filters can be used to reduce the memory usage of the ``tracemalloc`` module, which can be read using the ``get_tracemalloc_memory()`` function.

``match(filename: str, lineno: int)`` method: Return ``True`` if the filter matches the filename and line number, ``False`` otherwise.

``match_filename(filename: str)`` method: Return ``True`` if the filter matches the filename, ``False`` otherwise.
``match_lineno(lineno: int)`` method: Return ``True`` if the filter matches the line number, ``False`` otherwise.

``match_traceback(traceback)`` method: Return ``True`` if the filter matches the *traceback*, ``False`` otherwise. *traceback* is a tuple of ``(filename: str, lineno: int)`` tuples.

Are those ``match`` methods really necessary for the end user, i.e. are they worth being exposed as part of the public API?

StatsDiff
---------

``StatsDiff(differences, old_stats, new_stats)`` class: Differences between two ``GroupedStats`` instances. The ``GroupedStats.compare_to()`` method creates a ``StatsDiff`` instance.

``sort()`` method: Sort the ``differences`` list from the biggest difference to the smallest difference. Sort by ``abs(size_diff)``, *size*, ``abs(count_diff)``, *count* and then by *key*.
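For comparison with the draft StatsDiff API reviewed above, the diffing machinery survives in the final module as Snapshot.compare_to(), which returns the differences already sorted from biggest to smallest. A short sketch using the shipped API:

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

blob = [bytes(10000) for _ in range(100)]  # ~1 MB of new allocations

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences grouped by line, sorted by abs(size_diff) first,
# matching the sort order described for StatsDiff.sort() above.
for stat in after.compare_to(before, 'lineno')[:3]:
    print(stat)
```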
Re: [Python-Dev] cpython: Try doing a raw test of os.fork()/os.kill().
2013/10/17 Antoine Pitrou solip...@pitrou.net: On Thu, 17 Oct 2013 15:33:02 +0200 (CEST) richard.oudkerk python-check...@python.org wrote: http://hg.python.org/cpython/rev/9558e9360afc changeset: 86401:9558e9360afc parent: 86399:9cd88b39ef62 user: Richard Oudkerk shibt...@gmail.com date: Thu Oct 17 14:24:06 2013 +0100 summary: Try doing a raw test of os.fork()/os.kill(). For this kind of ad-hoc testing, you can also use a custom builder to avoid disrupting the main source tree: AFAICT, the problem he's trying to debug (issue #19227) only occurs on two specific - stable - buildbots. cf
Re: [Python-Dev] PEP 428: Pathlib
2013/9/16 Antoine Pitrou solip...@pitrou.net: On Sun, 15 Sep 2013 06:46:08 -0700, Ethan Furman et...@stoneleaf.us wrote: I see PEP 428 is both targeted at 3.4 and still in draft status. What remains to be done to ask for pronouncement? I think I have a couple of items left to integrate in the PEP. Mostly it needs me to take a bit of time and finalize the PEP, and then have a PEP delegate (or Guido) pronounce on it. IIRC, during the last discussion round, we were still debating between implicit stat() result caching - which requires an explicit restat() method - vs a direct mapping between the stat() method and the stat() syscall. What was the conclusion?
Re: [Python-Dev] DTRACE support
As far as I know, Erlang, Ruby, PHP, Perl, etc., support DTrace. Python is embarrassingly missing from this list. Some examples: http://crypt.codemancers.com/posts/2013-04-16-profile-ruby-apps-dtrace-part1/ http://www.phpdeveloper.org/news/18859 http://www.erlang.org/doc/apps/runtime_tools/DTRACE.html

I have spent a very long time on a patch for DTrace support on most platforms where dtrace is available. It currently works under Solaris and derivatives, and Mac OS X. Last time I checked, it would crash FreeBSD because of bugs in the dtrace port, but that was a long time ago. I would like to push this to Python 3.4, and the window is going to close soon, so I think this is the time to ask for opinions and support here. Does python-dev have any opinion or interest in this project? Should I push for it?

IMO, that's a large, intrusive patch, which distracts the reader from the main code and logic. Here's an extract from Modules/gcmodule.c:

static void
dtrace_gc_done(Py_ssize_t value)
{
    PYTHON_GC_DONE((long) value);
    /*
     * Currently a USDT tail-call will not receive the correct arguments.
     * Disable the tail call here.
     */
#if defined(__sparc)
    asm("nop");
#endif
}

Also have a look at ceval.c: http://bugs.python.org/review/13405/diff/6152/Python/ceval.c

IMO it's not worth it (personally strace/gdb/valgrind are more than enough for me, and we're about to gain memory tracing with Victor's tracemalloc).

cf
Re: [Python-Dev] DTRACE support
The main value of DTrace is systemwide observability. You can see something strange at kernel level and trace it to a particular line of code in a random Python script. There is no other tool that can do that. You have complete transversal observability of ALL the code running in your computer, kernel or usermode, clean reports with threads, etc.

Don't get me wrong, I'm not saying DTrace is useless. I'm just saying that, as far as I'm concerned, I've never had any trouble debugging/tuning a Python script with non-intrusive tools (strace, gdb, valgrind, and oprofile for profiling). Of course, this includes analysing bug reports.

Maybe the biggest objection would be that most python-devs are running Linux, and you don't have dtrace support on Linux unless you are running Oracle's distribution. But the world is larger than Linux, and there are some efforts to port DTrace to Linux itself. DTrace is available on Solaris and derivatives, Mac OS X and FreeBSD.

That's true, I might have a different opinion if I used Solaris. But that's not the case, so to me, the cognitive overhead incurred by this large patch isn't worth it. So I'm -1, but that's a personal opinion :-)

cf
Re: [Python-Dev] Add a new tracemalloc module to trace memory allocations
2013/8/29 Victor Stinner victor.stin...@gmail.com: Charles-François Natali and Serhiy Storchaka asked me to add this module somewhere in Python 3.4:

how about adding pyfailmalloc to the main repo (maybe under Tools), with a script making it easy to run the test suite with it enabled?

There are two reasons I think it would be a great addition:
- since OOM conditions are - almost - never tested, the OOM handling code is - almost - always incorrect: indeed, Victor has found and fixed several dozen crashes thanks to this module
- this module is actually really simple (~150 LOC)

I have two comments on the API:

1) failmalloc.enable(range: int=1000): schedule a memory allocation failure in random.randint(1, range) allocations. That's one shot, i.e. only one failure will be triggered. So if this failure occurs in a place where the code is prepared to handle MemoryError (e.g. bigmem tests), no failure will occur in the remainder of the test. It would be better IMO to repeat this (i.e. reset the next-failure counter), to increase the coverage.

2) It's a consequence of 1): since only one malloc() failure is triggered, it doesn't really reflect how an OOM condition would appear in real life: usually, it's either because you've exhausted your address space or the machine is under memory pressure, which means that once you've hit OOM, you're likely to encounter it again on subsequent allocations, for example if your OOM handling code allocates new memory (that's why it's so complicated to properly handle OOM, and one might want to use memory parachutes). It might be interesting to be able to pass an absolute maximum memory usage, or an option where, once you've triggered a malloc() failure, you record the current memory usage and use it as a ceiling for subsequent allocations.
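The difference between a one-shot failure and the re-arming behaviour suggested in point 1) can be modelled in pure Python (pyfailmalloc itself hooks the C-level allocators; the FailAllocator class below is purely illustrative, not part of any real API):

```python
import random

class FailAllocator:
    """Toy model of an injector that re-arms itself after each
    triggered failure, instead of firing only once."""

    def __init__(self, max_range=1000, seed=None):
        self.rng = random.Random(seed)
        self.max_range = max_range
        self._rearm()

    def _rearm(self):
        # Schedule the next failure in randint(1, max_range) allocations.
        self.countdown = self.rng.randint(1, self.max_range)

    def allocate(self, nbytes):
        self.countdown -= 1
        if self.countdown <= 0:
            # Re-arm, so coverage continues after a handled MemoryError.
            self._rearm()
            raise MemoryError
        return bytearray(nbytes)
```

With a one-shot injector, code that catches the first MemoryError is never exercised again; here a failure fires every randint(1, max_range) allocations on average, so bigmem-style tests keep being probed.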
[Python-Dev] EINTR handling...
Hello,

This has been bothering me for years: why don't we properly handle EINTR, by running registered signal handlers and restarting the interrupted syscall (or eventually returning early, e.g. for sleep)? EINTR is really a nuisance, and exposing it to Python code is just pointless.

Now some people might argue that some code relies on EINTR to interrupt a syscall on purpose, but I don't really buy it: it's highly non-portable (depends on the syscall, SA_RESTART flag...) and subject to race conditions (if it comes before the syscall, or if you get a partial read/write, you'll deadlock). Furthermore, the stdlib code base is not consistent: some code paths handle EINTR, e.g. subprocess and multiprocessing, sock_sendall() does but not sock_send()... Just grep for EINTR and InterruptedError and you'll be amazed.

GHC, the JVM and probably other platforms handle EINTR, maybe it's time for us too?

Just for reference, here are some issues due to EINTR popping up: http://bugs.python.org/issue17097 http://bugs.python.org/issue12268 http://bugs.python.org/issue9867 http://bugs.python.org/issue7978 http://bugs.python.org/issue12493 http://bugs.python.org/issue3771

cf
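Concretely, the nuisance being complained about is the retry idiom every caller had to write by hand before the interpreter handled EINTR itself; a sketch for os.read():

```python
import os

def read_retry(fd, n):
    """Retry os.read() until it completes or fails with a real error."""
    while True:
        try:
            return os.read(fd, n)
        except InterruptedError:
            # EINTR: a signal handler ran and the syscall was interrupted;
            # nothing was read yet, so it is safe to simply retry.
            continue
```

Since Python 3.3 (PEP 3151), EINTR surfaces as InterruptedError; on earlier versions one had to catch OSError and compare its errno against errno.EINTR.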
Re: [Python-Dev] EINTR handling...
2013/8/30 Amaury Forgeot d'Arc amaur...@gmail.com: I agree. Is there a way to see in C code where EINTR is not handled?

EINTR can be returned by slow syscalls, so a good heuristic would be to start with code that releases the GIL. But I don't see a generic way apart from grepping for syscalls that are documented to return EINTR.

Or a method to handle this systematically?

The glibc defines this macro:

#ifndef TEMP_FAILURE_RETRY
# define TEMP_FAILURE_RETRY(expression) \
  (__extension__ \
    ({ long int __result; \
       do __result = (long int) (expression); \
       while (__result == -1L && errno == EINTR); \
       __result; }))
#endif

which you can then use as:

pid = TEMP_FAILURE_RETRY(waitpid(pid, &status, options));

Unfortunately, it's not as easy for us, since we must release the GIL around the syscall, and try again if it failed with EINTR, but only after having called PyErr_CheckSignals() to run the signal handlers. E.g. waitpid():

Py_BEGIN_ALLOW_THREADS
pid = waitpid(pid, &status, options);
Py_END_ALLOW_THREADS

should become (conceptually):

begin_handle_eintr:
Py_BEGIN_ALLOW_THREADS
pid = waitpid(pid, &status, options);
Py_END_ALLOW_THREADS
if (pid < 0 && errno == EINTR) {
    if (PyErr_CheckSignals())
        return NULL;
    goto begin_handle_eintr;
}

We might want to go for a clever macro (like BEGIN_SELECT_LOOP in socketmodule.c).

2013/8/30 Nick Coghlan ncogh...@gmail.com: Sounds good to me. I don't believe there's been a conscious decision that we *shouldn't* handle it, it just hasn't annoyed anyone enough for them to propose a systematic fix in CPython. If that latter part is no longer true, great ;)

Great, I'll open a bug report then :)

cf
[Python-Dev] hg.python.org is slow
Hi,

I'm trying to checkout a pristine clone from ssh://h...@hg.python.org/cpython, and it's taking forever:

07:45:35.605941 IP 192.168.0.23.43098 > virt-7yvsjn.psf.osuosl.org.ssh: Flags [.], ack 22081460, win 14225, options [nop,nop,TS val 368519 ecr 2401783356], length 0
07:45:38.558348 IP virt-7yvsjn.psf.osuosl.org.ssh > 192.168.0.23.43098: Flags [.], seq 22081460:22082908, ack 53985, win 501, options [nop,nop,TS val 2401784064 ecr 368519], length 1448
07:45:38.558404 IP 192.168.0.23.43098 > virt-7yvsjn.psf.osuosl.org.ssh: Flags [.], ack 22082908, win 14225, options [nop,nop,TS val 369257 ecr 2401784064], length 0
07:45:39.649995 IP virt-7yvsjn.psf.osuosl.org.ssh > 192.168.0.23.43098: Flags [.], seq 22082908:22084356, ack 53985, win 501, options [nop,nop,TS val 2401784367 ecr 369257], length 1448

See the time to just get an ACK? Am I the only one experiencing this?

Cheers,

cf
Re: [Python-Dev] hg.python.org is slow
2013/8/27 Antoine Pitrou solip...@pitrou.net: Sounds a lot like a network problem, then? If I'm the only one, it's likely, although these pathological timeouts are transient, and I don't have any problem with other servers (my line sustains 8Mb/s without problem). Have you tried a traceroute? I'll try tonight if this persists, and keep you posted. 2013/8/27 Ned Deily n...@acm.org: BTW, do you have ssh compression enabled for that host? Yep. cf
Re: [Python-Dev] PEP 446 (make FD non inheritable) ready for a final review
Hello,

A couple of remarks:

The following functions are modified to make newly created file descriptors non-inheritable by default: [...] os.dup()

then os.dup2() has a new optional inheritable parameter: os.dup2(fd, fd2, inheritable=True). fd2 is created inheritable by default, but non-inheritable if inheritable is False.

Why does dup2() create an inheritable FD, and not dup()? I think a hint is given a little later:

Applications using the subprocess module with the pass_fds parameter or using os.dup2() to redirect standard streams should not be affected.

But that's overly optimistic. For example, a lot of code uses the guarantee that dup()/open()... returns the lowest-numbered file descriptor available, so code like this:

r, w = os.pipe()
if os.fork() == 0:
    # child
    os.close(r)
    os.close(1)
    os.dup(w)

*will break*. And that's a lot of code (e.g. that's what _posixsubprocess.c uses, although since it's implemented in C it wouldn't be affected). We've already had this discussion, and I stand by my claim that changing the default *will break* user code. Furthermore, many people use Python for system programming, and this change would be highly surprising. So no matter what the final decision on this PEP is, it must be kept in mind.

The programming languages Go, Perl and Ruby make newly created file descriptors non-inheritable by default: since Go 1.0 (2009), Perl 1.0 (1987) and Ruby 2.0 (2013).

OK, but do they expose OS file descriptors? I'm sure such a change would be fine for Java, which doesn't expose FDs and fork(), but Python's another story. Last time, I said that to me, the FD inheritance issue is solved on POSIX by the subprocess module, which passes close_fds. In my own code, I use subprocess, which is the official, portable and safe way to create child processes in Python.
Someone using fork() + exec() should know what he's doing, and be able to deal with the consequences: I'm not only talking about FD inheritance, but also about async-signal/multi-threaded safety ;-)

As for Windows, since it doesn't have fork(), it would make sense to make its FDs non-inheritable by default. And then use what you describe here to selectively inherit FDs (i.e. implement keep_fds):

Since Windows Vista, CreateProcess() supports an extension of the STARTUPINFO structure: the STARTUPINFOEX structure. Using this new structure, it is possible to specify a list of handles to inherit: PROC_THREAD_ATTRIBUTE_HANDLE_LIST. Read Programmatically controlling which handles are inherited by new processes in Win32 (Raymond Chen, Dec 2011) for more information.

cf
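For the record, what ultimately landed in Python 3.4 (PEP 446) makes new descriptors non-inheritable and exposes the flag explicitly; a minimal sketch of the shipped interface:

```python
import os

r, w = os.pipe()

# Since Python 3.4, newly created descriptors are non-inheritable...
print(os.get_inheritable(r))   # False

# ...and inheritance across exec() must be requested explicitly.
os.set_inheritable(w, True)
print(os.get_inheritable(w))   # True

os.close(r)
os.close(w)
```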
Re: [Python-Dev] PEP 446 (make FD non inheritable) ready for a final review
About your example: I'm not sure that it is reliable/portable. I saw daemon libraries closing *all* file descriptors and then expecting new file descriptors to become 0, 1 and 2. Your example is different because w is still open. On Windows, I have seen cases with only fds 0, 1, 2 open, where the next open() call gives fd 10 or 13...

Well, my example uses fork(), so it obviously doesn't apply to Windows. It's perfectly safe on Unix.

I'm optimistic and I expect that most Python applications and libraries already use the subprocess module. The subprocess module closes all file descriptors (except 0, 1, 2) since Python 3.2. Developers relying on FD inheritance and using subprocess with Python 3.2 or later already had to use the pass_fds parameter.

As long as the PEP makes it clear that this breaks backward compatibility, that's fine. IMO the risk of breakage outweighs the modicum of benefit.

The subprocess module still has a (minor?) race condition in the child process. Another C thread can create a new file descriptor after the subprocess module closed all file descriptors and before exec(). I hope that it is very unlikely, but it can happen.

No it can't, because after fork(), there's only one thread. It's perfectly safe.

cf
Re: [Python-Dev] PEP 446: Open issues/questions
2013/8/2 Victor Stinner victor.stin...@gmail.com: 2013/7/28 Antoine Pitrou solip...@pitrou.net: (A) How should we support platforms where os.set_inheritable() is not supported? Can we announce that os.set_inheritable() is always available or not? Does such a platform exist?

FD_CLOEXEC is POSIX: http://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html

Ok, but this information does not help me. Does Python support non-POSIX platforms? (Windows has HANDLE_FLAG_INHERIT.) If we cannot answer my question, it's safer to leave os.get/set_inheritable() optional (and e.g. use hasattr in tests).

On Unix platforms, you should always have FD_CLOEXEC. If there were a platform without FD inheritance support, then it would probably make sense to make it a no-op anyway.

cf
Re: [Python-Dev] PEP 446: Open issues/questions
2013/8/2 Victor Stinner victor.stin...@gmail.com: On Windows, inheritable handles (including open files) are still inherited when a standard stream is overridden in the subprocess module (the default value of close_fds is set to False in this case). This issue cannot be solved (at least, I don't see how): it is a limitation of Windows. bInheritHandles must be set to TRUE (inherit *all* inheritable handles) when handles of standard streams are specified in the startup information of CreateProcess().

Then how about changing the default to creating file descriptors non-inheritable on Windows (which is apparently the platform default)? Then you can implement keep_fds by setting them inheritable right before process creation, and resetting them right after: sure, there's a race in a multi-threaded program, but AFAICT that's already the case right now, and the Windows API doesn't leave us any other choice. Amusingly, they address this case by recommending putting process creation in a critical section: http://support.microsoft.com/kb/315939/en-us

This way, we keep the default platform behavior on Unix and on Windows (so users using low-level syscalls/APIs won't be surprised), and we have a clean way to selectively inherit FDs in child processes through subprocess.

cf
Re: [Python-Dev] PEP 446: Open issues/questions
Having stdin/stdout/stderr cloexec (e.g. after a dup() to redirect to a log file, a socket...) will likely break a lot of code, e.g. code using os.system(), or code calling exec manually (and I'm sure there's a bunch of it).

Hmm. os.exec*() could easily make standard streams non-CLOEXEC before calling the underlying C library function. Things are more annoying for os.system(), though.

Also, it'll be puzzling to have syscalls automatically set the cloexec flag. I guess a lot of people doing system programming with Python will get bitten, but that's a discussion we already had months ago...

Perhaps this advocates for a global flag, e.g. sys.set_default_fd_inheritance(), with False (non-inheritable) being the default for sanity and security.

This looks more and more like PEP 433 :-) And honestly, when I think about it, I think that this whole mess is a solution looking for a problem. If we don't want to inherit file descriptors in child processes, the answer is simple: the subprocess module (this fact is not even mentioned in the PEP). If a user wants to use the execve() syscall directly, then he should be aware of the implications. I don't think that patching half the stdlib and complicating the API of many functions is the right way to do this. The stdlib should be updated to replace the handful of places where exec() is called explicitly by subprocess (the only one I can think of off the top of my head is http.server.CGIHTTPRequestHandler (issue #16945)), otherwise that's about it.

cf
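The subprocess-based answer argued for above looks like this in practice: descriptors are closed in the child by default (close_fds=True since 3.2), and selectively inherited via pass_fds. A POSIX-only sketch:

```python
import os
import subprocess
import sys

r, w = os.pipe()

# Only w survives in the child; every other descriptor is closed.
child = subprocess.Popen(
    [sys.executable, "-c",
     "import os, sys; os.write(int(sys.argv[1]), b'ok')", str(w)],
    pass_fds=(w,),
)
child.wait()
os.close(w)           # close our copy so read() can see EOF
print(os.read(r, 2))  # b'ok'
os.close(r)
```

On Python 3.4+, subprocess also takes care of making the pass_fds descriptors inheritable in the child.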
Re: [Python-Dev] PEP 446: Open issues/questions
2013/7/28 Antoine Pitrou solip...@pitrou.net: (C) Should we handle standard streams (0: stdin, 1: stdout, 2: stderr) differently? For example, os.dup2(fd, 0) should make the file descriptor 0 (stdin) inheritable or non-inheritable? On Windows, os.set_inheritable(fd, False) fails (error 87, invalid argument) on standard streams (0, 1, 2) and copies of the standard streams (created by os.dup()). I have been advocating for that, but I now realize that special-casing these three descriptors in a myriad of fd-creating functions isn't very attractive. (if a standard stream fd has been closed, any fd-creating function can re-create that fd: including socket.socket(), etc.) So perhaps only the *original* standard streams should be left inheritable? Having stdin/stdout/stderr cloexec (e.g. after a dup() to redirect to a log file, a socket...) will likely break a lot of code, e.g. code using os.system(), or code calling exec manually (and I'm sure there's a bunch of it). Also, it'll be puzzling to have syscalls automatically set the cloexec flag. I guess a lot of people doing system programming with Python will get bitten, but that's a discussion we already had months ago... cf
Re: [Python-Dev] PEP 446: Add new parameters to configure the inherance of files and for non-blocking sockets
2013/7/7 Cameron Simpson c...@zip.com.au: On 06Jul2013 11:23, Charles-François Natali cf.nat...@gmail.com wrote:
| I've read your Rejected Alternatives more closely and Ulrich
| Drepper's article, though I think the article also supports adding
| a blocking (default True) parameter to open() and os.open(). If you
| try to change that default on a platform where it doesn't work, an
| exception should be raised.
|
| Contrarily to close-on-exec, non-blocking only applies to a limited
| type of files (e.g. it doesn't work for regular files, which represent
| 90% of open() use cases).

sockets, pipes, serial devices, ...

How do you use open() on a socket (sockets are already covered by socket(blocking=...))? Also, I said *regular files* - for which O_NONBLOCK doesn't make sense - represent 90% of io.open() use cases, and I stand by this claim.

Nothing prevents you from setting the FD non-blocking manually. And you can set it on anything. Just because some things don't block anyway isn't really a counter-argument.

Well, it complicates the signature and implementation. If we go that way, why stop there and not expose O_DSYNC, O_SYNC, O_DIRECT... When using a high-level API like io.open(), I think we should only expose portable flags, which are supported on all operating systems (like the 'x' O_EXCL flag added in 3.3) and for all file types. If you want precise control over the open() semantics, os.open() is the way to go (that's also the rationale behind io.open()'s opener argument, see http://bugs.python.org/issue12105).

cf
Re: [Python-Dev] PEP 446: Add new parameters to configure the inherance of files and for non-blocking sockets
I've read your Rejected Alternatives more closely and Ulrich Drepper's article, though I think the article also supports adding a blocking (default True) parameter to open() and os.open(). If you try to change that default on a platform where it doesn't work, an exception should be raised.

Contrarily to close-on-exec, non-blocking only applies to a limited type of files (e.g. it doesn't work for regular files, which represent 90% of open() use cases). Also, one of the main reasons for exposing close-on-exec in open()/socket() etc. is to make it possible to create file descriptors with the close-on-exec flag atomically, to prevent unwanted FD inheritance, especially in multi-threaded code. And that's not necessary for the non-blocking parameter.

Those are two reasons why IMO blocking doesn't have to receive the same treatment as close-on-exec (there's also the Windows issue, but I'm not familiar with it).

cf
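Setting a descriptor non-blocking after creation, which is all the non-atomic case needs, is a short fcntl() dance (Python 3.5 later added os.set_blocking() for exactly this); a POSIX-only sketch:

```python
import fcntl
import os

def set_nonblocking(fd):
    """Set O_NONBLOCK on an existing descriptor (POSIX only)."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    fcntl.fcntl(fd, fcntl.F_SETFL, flags | os.O_NONBLOCK)

r, w = os.pipe()
set_nonblocking(r)
try:
    os.read(r, 1)           # nothing written yet
except BlockingIOError:     # EAGAIN instead of blocking
    print("would block")
os.close(r)
os.close(w)
```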
Re: [Python-Dev] PEP 446: Add new parameters to configure the inherance of files and for non-blocking sockets
2013/7/4 Victor Stinner victor.stin...@gmail.com: Even if the PEP 433 was not explicitly rejected, no consensus could be reached. I didn't want to lose all my work on this PEP, and so I'm proposing something new which should make everybody agree :-)

Thanks Victor, I think this one is perfectly fine!

cf
Re: [Python-Dev] stat module in C -- what to do with stat.py?
2013/6/20 Thomas Wouters tho...@python.org: If the .py file is going to be wrong or incomplete, why would we want to keep it -- or use it as fallback -- at all? If we're dead set on having a .py file instead of requiring it to be part of the interpreter (whichever that is, however it was built), it should be generated as part of the build process. Personally, I don't see the value in it; other implementations will need to do *something* special to use it anyway. That's exactly my rationale for pushing for removal. cf
Re: [Python-Dev] pyparallel and new memory API discussions...
2013/6/19 Trent Nelson tr...@snakebite.org: The new memory API discussions (and PEP) warrant a quick pyparallel update: a couple of weeks after PyCon, I came up with a solution for the biggest show-stopper that has been plaguing pyparallel since its inception: being able to detect the modification of main thread Python objects from within a parallel context. For example, `data.append(4)` in the example below will generate an AssignmentError exception, because data is a main thread object, and `data.append(4)` gets executed from within a parallel context:: data = [ 1, 2, 3 ] def work(): data.append(4) async.submit_work(work) The solution turned out to be deceptively simple: 1. Prior to running parallel threads, lock all main thread memory pages as read-only (via VirtualProtect on Windows, mprotect on POSIX). 2. Detect attempts to write to main thread pages during parallel thread execution (via SEH on Windows or a SIGSEGV trap on POSIX), and raise an exception instead (detection is done in the ceval frame exec loop). Quick stupid question: because of refcounts, the pages will be written to even in case of read-only access. How do you deal with this? cf
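The refcount objection can be made concrete with sys.getrefcount (a small illustration of the question, not pyparallel code): merely iterating over a list writes to the elements' refcount fields, so a read-only page protection would trap on a pure read.

```python
import sys

data = [object()]
# getrefcount() reports one extra reference for its own argument,
# so a list element starts at 2: the list slot + the temporary.
before = sys.getrefcount(data[0])

for item in data:
    # Binding `item` incremented the object's refcount: the object
    # header was *written to*, even though the loop only "reads" the list.
    during = sys.getrefcount(item)

assert during == before + 1
```

This is why a naive mprotect()-based scheme would fault on virtually every object access in CPython, read-only or not.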
Re: [Python-Dev] HAVE_FSTAT?
2013/5/17 Antoine Pitrou solip...@pitrou.net: Hello, Some pieces of code are still guarded by: #ifdef HAVE_FSTAT ... #endif I would expect all systems to have fstat() these days. It's pretty basic POSIX, and even Windows has had it for ages. Shouldn't we simply make those code blocks unconditional? It would avoid having to maintain unused fallback paths. I was sure I'd seen a post/bug report about this: http://bugs.python.org/issue12082 The OP was trying to build Python on an embedded platform without fstat(). cf
Re: [Python-Dev] [RELEASED] Python 3.2.5 and Python 3.3.2
2013/5/16 Serhiy Storchaka storch...@gmail.com: 16.05.13 08:20, Georg Brandl написав(ла): On behalf of the Python development team, I am pleased to announce the releases of Python 3.2.5 and 3.3.2. The releases fix a few regressions in 3.2.4 and 3.3.1 in the zipfile, gzip and xml.sax modules. Details can be found in the changelogs: It seems that I'm the main culprit of these releases. Well, when I look at the changelogs, what strikes me more is that you're the author of *many* fixes, and also a lot of new features/improvements. So I wouldn't feel bad if I were you, this kind of thing happens (and it certainly did to me). Cheers, Charles
Re: [Python-Dev] Issue 11406: adding os.scandir(), a directory iterator returning stat-like info
I wonder how sshfs compares to NFS. (I've modified your benchmark to also test the case where data isn't in the page cache). Local ext3: cached: os.walk took 0.096s, scandir.walk took 0.030s -- 3.2x as fast uncached: os.walk took 0.320s, scandir.walk took 0.130s -- 2.5x as fast NFSv3, 1Gb/s network: cached: os.walk took 0.220s, scandir.walk took 0.078s -- 2.8x as fast uncached: os.walk took 0.269s, scandir.walk took 0.139s -- 1.9x as fast
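The speedups measured above come from the directory entries carrying type/stat information back from readdir(), avoiding one stat() per entry. A sketch of the two strategies being benchmarked (os.scandir() eventually landed in the stdlib in Python 3.5):

```python
import os

def count_files_listdir(path):
    # Old os.walk() approach: the listing gives only names, so telling
    # files from directories costs an extra stat() per entry.
    n = 0
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):  # extra stat() here
            n += 1
    return n

def count_files_scandir(path):
    # scandir approach: the DirEntry usually answers is_file() from data
    # already returned by readdir(), with no additional syscall.
    n = 0
    for entry in os.scandir(path):
        if entry.is_file():
            n += 1
    return n
```

On a network filesystem each avoided stat() is a round trip, which is why the uncached NFS numbers above still show a near-2x win.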
Re: [Python-Dev] PEP 435 - requesting pronouncement
I'm chiming in late, but am I the only one who's really bothered by the syntax? class Color(Enum): red = 1 green = 2 blue = 3 I really don't see why one has to provide values, since an enum constant *is* the value. In many cases, there's no natural mapping between an enum constant and a value, e.g. there's no reason why Color.red should be mapped to 1 and Color.blue to 3. Furthermore, the PEP makes it possible to do something like: class Color(Enum): red = 1 green = 2 blue = 3 red_alias = 1 which is IMO really confusing, since enum instances are supposed to be distinct. All the languages I can think of that support explicit values (Java being particular in the sense that it's really a full-fledged object which can have attributes, methods, etc.) make explicit values optional. Finally, I think 99% of users won't care about the assigned value (which is just an implementation detail), so explicit values will just be noise annoying users (well, me at least :-). cf 2013/5/5 Eli Bendersky eli...@gmail.com: Hello pydev, PEP 435 is ready for final review. A lot of the feedback from the last few weeks of discussions has been incorporated. Naturally, not everything could go in because some minor (mostly preference-based) issues did not reach a consensus. We do feel, however, that the end result is better than in the beginning and that Python can finally have a useful enumeration type in the standard library. I'm attaching the latest version of the PEP for convenience. If you've read previous versions, the easiest way to get acquainted with the recent changes is to go through the revision log at http://hg.python.org/peps A reference implementation for PEP 435 is available at https://bitbucket.org/stoneleaf/ref435 Kind regards and happy weekend. 
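For the record, the enum module as it eventually shipped addresses part of both complaints: the functional API assigns values automatically for callers who don't care about them, and a repeated value produces an alias rather than a silently distinct member:

```python
from enum import Enum

# Functional API: values are assigned automatically (1, 2, 3), so users
# never have to spell them out.
Color = Enum('Color', 'red green blue')

class Signal(Enum):
    red = 1
    green = 2
    blue = 3
    red_alias = 1   # duplicate value: becomes an alias, not a new member

assert Color.red.value == 1
assert Signal.red_alias is Signal.red   # aliases are the *same* object
assert len(list(Signal)) == 3           # and aliases are not iterated
```

So enum instances do stay distinct: two names with the same value collapse to one member.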
Re: [Python-Dev] PEP 428: stat caching undesirable?
Yes, definitely. This is exactly what my os.walk() replacement, Betterwalk, does: https://github.com/benhoyt/betterwalk#readme On Windows you get *all* stat information from iterating the directory entries (FindFirstFile etc). And on Linux most of the time you get enough for os.walk() not to need an extra stat (though it does depend on the file system). I still hope to clean up Betterwalk and make a C version so we can use it in the standard library. In many cases it speeds up os.walk() by several times, even an order of magnitude in some cases. I intend for it to be a drop-in replacement for os.walk(), just faster. Actually, there's Gregory's scandir() implementation (returning a generator to be able to cope with large directories) on its way: http://bugs.python.org/issue11406 It's already been suggested to make it return a tuple (with d_type). I'm sure a review of the code (especially the Windows implementation) will be welcome.
Re: [Python-Dev] PEP 428: stat caching undesirable?
3) Leave it up to performance critical code, such as the import machinery, or walkdirs that Nick mentioned, to do their own caching, and simplify the filepath API for the simple case. But one can still make life easier for code like that, by adding is_file() and friends on the stat result object as I suggested. +1 from me. PEP 428 goes in the right direction with a distinction between pure path and concrete path. Pure paths support syntactic operations, whereas I would expect concrete paths to actually access the file system. Having a method like restat() is a hint that something's wrong; I'm convinced this will bite some people. I'd also be in favor of having a wrapper class around os.stat() result which would export utility methods such as is_file()/is_directory() and owner/group, etc. attributes. That way, the default behavior would be correct, and this helper class would make it easier for users like walkdir() to implement their own caching. As an added benefit, this would make path objects actually immutable, which is always a good thing (simpler, and you get thread-safety for free).
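The wrapper class suggested above could look something like this (a hypothetical sketch: the class name, method names, and owner_uid property are mine, not from PEP 428):

```python
import os
import stat

class StatInfo:
    """Hypothetical immutable wrapper around an os.stat() result."""

    def __init__(self, st):
        self._st = st

    def is_file(self):
        return stat.S_ISREG(self._st.st_mode)

    def is_directory(self):
        return stat.S_ISDIR(self._st.st_mode)

    @property
    def owner_uid(self):
        return self._st.st_uid

# The snapshot is explicit: the caller decides when to re-stat,
# and the path object itself never mutates.
info = StatInfo(os.stat(os.getcwd()))
assert info.is_directory() and not info.is_file()
```

Callers that want caching simply keep the StatInfo around; callers that want fresh data call os.stat() again and build a new one, so there is no hidden restat() anywhere.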
Re: [Python-Dev] Slides from today's parallel/async Python talk
Hello, async.submit_work(func, args, kwds, callback=None, errback=None) How do you implement arguments passing and return value? e.g. let's say I pass a list as argument: how do you iterate on the list from the worker thread without modifying the backing objects for refcounts (IIUC you use a per-thread heap and don't do any refcounting). Correct, nothing special is done for the arguments (apart from incref'ing them in the main thread before kicking off the parallel thread (then decref'ing them in the main thread once we're sure the parallel thread has finished)). IIUC you incref the argument from the main thread before publishing it to the worker thread: but what about containers like list? How do you make sure the elements don't get deallocated while the worker thread iterates? More generally, how do you deal with non-local objects? BTW I don't know if you did, but you could probably have a look at Go's goroutines and Erlang processes. cf
Re: [Python-Dev] Slides from today's parallel/async Python talk
Just a quick implementation question (didn't have time to read through all your emails :-) async.submit_work(func, args, kwds, callback=None, errback=None) How do you implement arguments passing and return value? e.g. let's say I pass a list as argument: how do you iterate on the list from the worker thread without modifying the backing objects for refcounts (IIUC you use a per-thread heap and don't do any refcounting). Same thing for return value, how do you pass it to the callback? cf
Re: [Python-Dev] [Announcement] New mailing list for code quality tools including Flake8, Pyflakes and Pep8
Are you planning to cover the code quality of the interpreter itself too? I've been recently reading through the cert.org secure coding practice recommendations and was wondering if there is any ongoing effort to perform static analysis on the cpython codebase. AFAICT CPython already benefits from Coverity scans (I guess the Python-security guys receive those notifications). Note that this only covers the C codebase. cf
Re: [Python-Dev] Release or not release the GIL
dup2(oldfd, newfd) closes oldfd. No, it doesn't close oldfd. It may close newfd if it was already open. (I guess that's what he meant). Anyway, only dup2() should probably release the GIL. One reasonable heuristic is to check the man page: if the syscall can return EINTR, then the GIL should be released.
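The dup2() semantics being corrected above can be demonstrated in a few lines (a small illustration assuming POSIX; the variable names are mine):

```python
import os

r, w = os.pipe()
spare = os.dup(r)    # some descriptor we want to repoint

# dup2(oldfd, newfd): atomically closes newfd's previous target (here,
# the dup of r) and makes newfd refer to oldfd's file. oldfd is untouched.
os.dup2(w, spare)

os.fstat(w)          # oldfd (w) is still open: no EBADF

os.write(spare, b'x')            # writes through the duplicate...
assert os.read(r, 1) == b'x'     # ...arrive at the pipe's read end
```

The implicit close of newfd is exactly why dup2() in particular can block (e.g. flushing a socket on close) and so is the call worth releasing the GIL around.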
Re: [Python-Dev] PEP 433: Choose the default value of the new cloexec parameter
Library code should not be relying on globals settings that can change. Library code should be explicit in its calls so that the current value of a global setting is irrelevant. That's one of the problems I've raised with this global flag since the beginning: it's useless for libraries, including the stdlib (and, as a reminder, this PEP started out of a bug report against socket inheritance in socketserver). And once again, it's a hidden global variable, so you won't be able to tell any more what this code does: r, w = os.pipe() if os.fork() == 0: os.close(w) os.execv('myprog', ['myprog']) Furthermore, if the above code is part of a library, and relies upon 'r' FD inheritance, it will break if the user sets the global cloexec flag. And the fact that a library relies upon FD inheritance is an implementation detail, the users shouldn't have to wonder whether enabling a global flag (in their code, not in a library) will break a given library: the only alternative for such code to continue working would be to pass cloexec=False explicitly to os.pipe()... The global socket.settimeout() is IMO a bad idea, and shouldn't be emulated. So I'm definitely -1 against any form of tunable value (be it a sys.setdefaultcloexec(), an environment variable or command-line flag), and still against changing the default value. But I promise that's the last time I'm bringing those arguments up, and I perfectly admit that some people want it as much as I don't want it :-) cf
[Python-Dev] usefulness of extension modules section in Misc/NEWS
Hi, What exactly is the guideline for choosing between the Library and Extension modules section when updating Misc/NEWS? Is it just the fact that the modified files live under Lib/ or Modules/? I've frequently made a mistake when updating Misc/NEWS, and when looking at it, I'm not the only one. Is there really a good reason for having distinct sections? If the intended audience for this file are end users, ISTM that the only thing that matters is that it's a library change; the fact that the modification impacted Python/C code isn't really relevant. Also, for example if you're rewriting a library from Python to C (or vice versa), should it appear under both sections? FWIW, the What's new documents don't have such a distinction. Cheers, cf
Re: [Python-Dev] PEP 433: Choose the default value of the new cloexec parameter
Hello, I tried to list in the PEP 433 advantages and drawbacks of each option. If I recorded opinions correctly, the different options have the following supporters: a) cloexec=False by default b) cloexec=True by default: Charles-François Natali c) configurable default value: Antoine Pitrou, Nick Coghlan, Guido van Rossum You can actually count me in the cloexec=False camp, and against the idea of a configurable default value. Here's why: Why cloexec shouldn't be set by default: - While it's really tempting to fix one of Unix's historical worst decisions, I don't think we can set file descriptors cloexec by default: this would break some applications (I don't think there would be too many of them, but still), but most notably, this would break POSIX semantics. If Python didn't expose POSIX syscalls and file descriptors, but only high-level file streams/sockets/etc, then we could probably go ahead, but now it's too late. Someone said earlier on python-dev that many people use Python for prototyping, and indeed, when using a POSIX API, you expect POSIX semantics. Why the default value shouldn't be tunable: - I think it's useless: if the default cloexec behavior can be altered (either by a command-line flag, an environment variable or a sys module function), then libraries cannot rely on it and have to make file descriptors cloexec on an individual basis, since the default flag can be disabled. So it would basically be useless for the Python standard library, and any third-party library. 
So the only use case is for application writers that use raw exec() (since subprocess already closes file descriptors > 2, and AFAICT we don't expose a way to create processes manually on Windows), but there I think they fall into two categories: those who are aware of the problem of file descriptor inheritance, and who therefore set their FDs cloexec manually, and those who are not familiar with this issue, and who'll never look up a sys.setdefaultcloexec() tunable (and if they do, they might think: "Hey, if that's so nice, why isn't it on by default? Wait, it might break applications? I'll just leave the default then."). - But most importantly, I think such a tunable flag is a really wrong idea because it's a global tunable that alters the underlying operating system semantics. Consider this code: r, w = os.pipe() if os.fork() == 0: os.execv('myprog', ['myprog']) With a tunable flag, just by looking at this code, you have no way to know whether the file descriptor will be inherited by the child process. That would be introducing a hidden global variable silently changing the semantics of the underlying operating system, and that's just so wrong. Sure, we do have global tunables: sys.setcheckinterval() sys.setrecursionlimit() sys.setswitchinterval() hash_randomization But those alter extralinguistic behavior, i.e. they don't affect the semantics of the language or underlying operating system in a way that would break or change the behavior of a conforming program. Although it's not as bad, just to belabor the point, imagine we introduced a new method: sys.enable_integer_division(boolean) Depending on the value of this flag, the division of two integers will either yield a floating point or truncated integer value. 
Global variables are bad, hidden global variables are worse, and hidden global variables altering language/operating system semantics are evil :-) What I'd like to see: - Adding a cloexec parameter to file descriptor creating functions/classes is fine, it will make it easier for a library/application writer to create file descriptors cloexec, especially in an atomic way. - We should go over the standard library, and create FDs cloexec if they're not handed over to the caller, either because they're opened/closed before returning, or because the underlying file descriptor is kept private (no fileno() method, although it's relatively rare). That's the approach chosen by glibc, and it makes sense: if one thread fork()s while another is in the middle of getpwnam(), you don't want to leak an open file descriptor to /etc/passwd (or /etc/shadow). cf
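The "atomic way" point above is worth spelling out: on platforms providing O_CLOEXEC (POSIX.1-2008; exposed as os.O_CLOEXEC since Python 3.3), the flag can be set in the open() call itself, leaving no window in which a concurrent fork()+exec() in another thread could inherit the descriptor. A POSIX-only sketch:

```python
import fcntl
import os

# Atomic: the descriptor never exists without the flag, so a fork()+exec()
# racing in another thread cannot observe an inheritable window.
fd = os.open(os.devnull, os.O_RDONLY | os.O_CLOEXEC)
assert fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC

# Non-atomic fallback for older kernels: two syscalls, racy with respect
# to fork() happening in between them.
fd2 = os.open(os.devnull, os.O_RDONLY)
flags = fcntl.fcntl(fd2, fcntl.F_GETFD)
fcntl.fcntl(fd2, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
```

The glibc approach mentioned above is exactly the first form: internal descriptors (nsswitch lookups, locale files, etc.) are opened with O_CLOEXEC so they can never leak across an exec().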
Re: [Python-Dev] PEP 433: Add cloexec argument to functions creating file descriptors
Hello, PEP: 433 Title: Add cloexec argument to functions creating file descriptors I'm not a native English speaker, but it seems to me that the correct wording should be "parameter" (part of the function definition/prototype, whereas "argument" refers to the actual value supplied). This PEP proposes to add a new optional argument ``cloexec`` on functions creating file descriptors in the Python standard library. If the argument is ``True``, the close-on-exec flag will be set on the new file descriptor. It would probably be useful to recap briefly what the close-on-exec flag does. Also, ISTM that Windows also supports this flag. If it does, then cloexec might not be the best name, because it refers to the execve() Unix system call. Maybe something like noinherit would be clearer (although coming from a Unix background cloexec is crystal-clear to me :-). On UNIX, subprocess closes file descriptors greater than 2 by default since Python 3.2 [#subprocess_close]_. All file descriptors created by the parent process are automatically closed. (in the child process) ``xmlrpc.server.SimpleXMLRPCServer`` sets the close-on-exec flag of the listening socket; the parent class ``socketserver.BaseServer`` does not set this flag. As has been discussed earlier, the real issue is that the server socket is not closed in the child process. Setting it cloexec would only add extra protection for multi-threaded programs. Inherited file descriptors issues - Closing the file descriptor in the parent process does not close the related resource (file, socket, ...) because it is still open in the child process. You might want to go through the bug tracker to find examples of such issues, and list them: http://bugs.python.org/issue7213 http://bugs.python.org/issue12786 http://bugs.python.org/issue2320 http://bugs.python.org/issue3006 The list goes on. Some of those examples resulted in deadlocks. 
The listening socket of TCPServer is not closed on ``exec()``: the child process is able to get connections from new clients; if the parent closes the listening socket and creates a new listening socket on the same address, it would get an "address already in use" error. See above for the real cause. Not closing file descriptors can lead to resource exhaustion: even if the parent closes all files, creating a new file descriptor may fail with "too many open files" because files are still open in the child process. You might want to detail the course of events (a child is forked before the parent gets a chance to close the file descriptors... EMFILE). Leaking file descriptors is a major security vulnerability. An untrusted child process can read sensitive data like passwords and take control of the parent process through leaked file descriptors. It is for example a known vulnerability to escape from a chroot. You might add a link to this: https://www.securecoding.cert.org/confluence/display/seccode/FIO42-C.+Ensure+files+are+properly+closed+when+they+are+no+longer+needed It can also result in DoS (if the child process hijacks the server socket and accepts connections). Example of vulnerabilities: http://www.openssh.com/txt/portable-keysign-rand-helper.adv http://www.securityfocus.com/archive/1/348368 http://cwe.mitre.org/data/definitions/403.html The problem is that these flags and functions are not portable: only recent versions of operating systems support them. ``O_CLOEXEC`` and ``SOCK_CLOEXEC`` flags are ignored by old Linux versions and so the ``FD_CLOEXEC`` flag must be checked using ``fcntl(fd, F_GETFD)``. If the kernel ignores the ``O_CLOEXEC`` or ``SOCK_CLOEXEC`` flag, a call to ``fcntl(fd, F_SETFD, flags)`` is required to set the close-on-exec flag. .. note:: OpenBSD older than 5.2 does not close the file descriptor with the close-on-exec flag set if ``fork()`` is used before ``exec()``, but it works correctly if ``exec()`` is called without ``fork()``. 
That would be *really* surprising, are you sure your test case is correct? Otherwise it could be a compilation issue, because I simply can't believe OpenBSD would ignore the close-on-exec flag. This PEP only changes the close-on-exec flag of file descriptors created by the Python standard library, or by modules using the standard library. Third party modules not using the standard library should be modified to conform to this PEP. The new ``os.set_cloexec()`` function can be used for example. Impacted functions: * ``os.forkpty()`` * ``http.server.CGIHTTPRequestHandler.run_cgi()`` I've opened http://bugs.python.org/issue16945 to rewrite this to use subprocess. Impacted modules: * ``multiprocessing`` * ``socketserver`` * ``subprocess`` * ``tempfile`` Hmm, I thought temporary files are already created with the close-on-exec flag. * ``xmlrpc.server`` * Maybe: ``signal``, ``threading`` XXX Should ``subprocess.Popen`` set the close-on-exec flag on file XXX XXX descriptors of the
Re: [Python-Dev] fork or exec?
*Lots* of applications make use of POSIX semantics for fork() / exec(). This doesn't mean much. We're talking about inheritance of FDs > 2 upon exec, which is a very limited subset of POSIX semantics for fork() / exec(). I personally think that there's been enough feedback to show that we should stick with the default POSIX behavior, however broken it is... Can someone please point to a writeup of the security issues involved? I've posted sample code earlier in this thread, but here's a writeup by Ulrich Drepper: http://udrepper.livejournal.com/20407.html
Re: [Python-Dev] Set close-on-exec flag by default in SocketServer
The SocketServer class creates a socket to listen for clients, and a new socket per client (only for stream servers like TCPServer, not for UDPServer). Until recently (2011-05-24, issue #5715), the listening socket was not closed after fork for the ForkingMixIn flavor. This caused two issues: it's a security leak, and it causes an "address already in use" error if the server is restarted (see the first message of #12107 for an example with Django). Note that the server socket is actually still not closed in the child process: once this gets fixed, setting FD_CLOEXEC will not be useful anymore (but it would be extra protection if it could be done atomically, especially against race conditions in multi-threaded applications). (Same thing for the client socket, which is actually already closed in the parent process). As for the backward compatibility issue, here's a thought: subprocess was changed in 3.2 to close all FDs > 2 in the child process by default. AFAICT, we didn't get a single report complaining about this behavior change. OTOH, we did get numerous bug reports due to FDs inherited by subprocesses before that change. (I know that Python >= 3.2 is less widespread than its predecessors, but still).
Re: [Python-Dev] fork or exec?
So, I read your e-mail again and I'm wondering if you're making a logic error, or if I'm misunderstanding something: 1. first you're talking about duplicate file or socket objects after *fork()* (which is an issue I agree is quite annoying) 2. the solution you're proposing doesn't close the file descriptors after fork() but after *exec()*. Basically the solution doesn't address the problem. Many fork() calls aren't followed by an exec() call (multiprocessing comes to mind). Yes. In this specific case, the proper solution is to close the server socket right after fork() in the child process. We can't do anything about file descriptors inherited upon fork() (and shouldn't do anything of course, except on an individual basis like this socket server example). On the other hand, setting file descriptors close-on-exec has the advantage of avoiding file descriptor inheritance to spawned (fork()+exec()) child processes, which, in 99% of cases, don't need them (apart from stdin/stdout/stderr). Not only can this cause subtle bugs (socket/file not being closed when the parent closes the file descriptor, deadlocks, there are several such examples in the bug tracker), but also a security issue, because contrary to a fork()ed process which runs code controlled by the library/user, after exec() you might be running arbitrary code. Let's take the example of CGIHTTPServer: # Child try: try: os.setuid(nobody) except os.error: pass os.dup2(self.rfile.fileno(), 0) os.dup2(self.wfile.fileno(), 1) os.execve(scriptfile, args, env) The code tries to execute a CGI script as user nobody to minimize privilege, but if the current process has a sensitive file open, the file descriptor will be leaked to the CGI script, which can do anything with it. In short, close-on-exec can solve a whole class of problems (but does not really apply to this specific case). 
On the other hand, the one widespread user of exec() after fork() in the stdlib, namely subprocess, *already* closes file descriptors by default, so the exec() issue doesn't really exist anymore for us (or is at least quite exotic). See the above example. There can be valid reasons to use fork()+exec() instead of subprocess. Disclaimer: I'm not saying we should be changing all FDs to close-on-exec by default like Ruby did, I'm just saying that there's a real problem.
Re: [Python-Dev] fork or exec?
Network servers like inetd or apache MPM (prefork) use a process listening on a socket, and then fork to execute a request in a child process. I don't know how it works exactly, but I guess that the child process needs a socket from the parent to send the answer to the client. If the socket is closed on exec (e.g. Apache with CGI), it does not work :-) Yes, but the above (setting close-on-exec by default) would *not* apply to stdin, stdout and stderr. inetd servers use dup2(socket, 0); dup2(socket, 1); dup2(socket, 2) before forking, so it would still work. Example with CGIHTTPRequestHandler.run_cgi(), self.connection is the socket coming from accept(): self.rfile = self.connection.makefile('rb', self.rbufsize) self.wfile = self.connection.makefile('wb', self.wbufsize) ... try: os.setuid(nobody) except OSError: pass os.dup2(self.rfile.fileno(), 0) os.dup2(self.wfile.fileno(), 1) os.execve(scriptfile, args, env) Same thing here. And the same thing holds for shell-type pipelines: you're always using stdin, stdout or stderr. Do you have an example of what that something may be? Apart from standard streams, I can't think of any inherited file descriptor an external program would want to rely on. Indeed, it should be really rare. There are far more programs that are bitten by FD inheritance upon exec than programs relying on it, and whereas failures and security issues in the first category are hard to debug and unpredictable (especially in a multi-threaded program), a program relying on a FD that would be closed will fail immediately with EBADF, and so could be updated quickly and easily. In other words, I think close-on-exec by default is probably a reasonable decision. close-on-exec should probably have been the default in Unix, and is a much saner option. The only question is whether we're willing to take the risk of breaking (admittedly a handful of) applications to avoid a whole class of difficult-to-debug and potential security issues. 
Note that if we do choose to set all file descriptors close-on-exec by default, there are several questions open: - This would hold for open(), Socket() and other high-level file-descriptor wrappers. Should it be enabled also for low-level syscall wrappers like os.open(), os.pipe(), etc? - On platforms that don't support atomic close-on-exec (e.g. open() with O_CLOEXEC, socket() with SOCK_CLOEXEC, pipe2(), etc), this would require extra fcntl()/ioctl() syscalls. The cost is probably negligible, but we'd have to check the impact on some benchmarks.
Re: [Python-Dev] fork or exec?
That could always be overcome by passing close_fds=False explicitly to subprocess from my code, though, right? I'm not doing that now, but then I'm not using the esoteric options in python-gnupg code, either. You could do that, or better explicitly support this option, and only specify this file descriptor in subprocess.Popen's pass_fds argument. My point was that the GnuPG usage looked like an example where fds other than 0, 1 and 2 might be used by design in not-uncommonly-used programs. From a discussion I had with Barry Warsaw a while ago, I seem to remember that there was other software which relied on these features. See [1] for details. Yes, it might be used. But I maintain that it should be really rare, and even if it's not, since the official way to launch subprocesses is through the subprocess module, FDs > 2 are already closed by default since Python 3.2. And the failure will be immediate and obvious (EBADF). Note that I admit I may be completely wrong, that's why I suggested to Victor to bring this up on python-dev to gather as much feedback as possible. Something like "we never ever break backward compatibility intentionally, even in corner cases" or "this would break POSIX semantics" would be enough (but OTOH, the subprocess change did break those hypothetical rules). Another pet peeve of mine is the non-handling of EINTR by low-level syscall wrappers, which results in code like this spread all over the stdlib and user code: while True: try: return syscall(...) except OSError as e: if e.errno != errno.EINTR: raise (and if it's select()/poll()/etc, the code doesn't update the timeout in 90% of cases). It gets a little better since the Exception hierarchy rework (InterruptedError), but it's still a nuisance.
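The boilerplate complained about above, together with the timeout pitfall, looks roughly like this (a pre-PEP-475-style sketch; the helper names are mine):

```python
import errno
import os
import select
import time

def retry_on_eintr(func, *args):
    # The pattern spread all over the stdlib before PEP 475 folded it
    # into the syscall wrappers themselves.
    while True:
        try:
            return func(*args)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise

def select_with_deadline(rlist, timeout):
    # The part 90% of callers forget: recompute the *remaining* timeout
    # after each interruption instead of restarting the full wait.
    deadline = time.monotonic() + timeout
    while True:
        remaining = max(deadline - time.monotonic(), 0.0)
        try:
            return select.select(rlist, [], [], remaining)
        except OSError as e:
            if e.errno != errno.EINTR:
                raise

r, w = os.pipe()
os.write(w, b'x')
ready, _, _ = select_with_deadline([r], 1.0)
assert ready == [r]
```

Without the deadline recomputation, a stream of signals can stretch a 1-second select() indefinitely, since each retry restarts the full timeout.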
Re: [Python-Dev] Set close-on-exec flag by default in SocketServer
My question is: would you accept to break backward compatibility (in Python 3.4) to fix a potential security vulnerability? Although obvious, the security implications are not restricted to sockets (yes, it's a contrived example):

# cat test_inherit.py
import fcntl
import os
import pwd
import sys

f = open('/tmp/passwd', 'w+')
#fcntl.fcntl(f.fileno(), fcntl.F_SETFD, fcntl.FD_CLOEXEC)
if os.fork() == 0:
    os.setuid(pwd.getpwnam('nobody').pw_uid)
    os.execv(sys.executable, ['python', '-c', 'import os; os.write(3, b"owned")'])
else:
    os.waitpid(-1, 0)
    f.seek(0)
    print(f.read())
    f.close()
# python test_inherit.py
owned

I'm not sure that the close-on-exec flag must be set on the listening socket *and* on the client sockets. What do you think? If the listening socket is inherited, it can lead to EADDRINUSE, or to the child process hijacking new connections (by accept()ing on the same socket). As for the client sockets, there's at least one reason to set them close-on-exec: if a second forked process inherits the first process' client socket, then even when the first client closes its file descriptor (and exits), the socket won't be closed until the second process exits too: so one long-running child process can delay other child processes' connection shutdown for arbitrarily long.
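A sketch of what opting in looks like for the listening socket, assuming the descriptor-inheritance API that later landed in Python 3.4 (PEP 446); on POSIX, set_inheritable(False) sets the close-on-exec flag:

```python
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Make sure a forked-and-exec'ed child cannot accept() on our
# listening socket (connection hijacking) or keep the port bound
# after we exit (EADDRINUSE).
srv.set_inheritable(False)
```

Since Python 3.4, sockets are in fact created non-inheritable by default, which is the outcome this thread was debating.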
Re: [Python-Dev] Bumping autoconf from 2.68 to 2.69
My understanding is that we use a specific version of autoconf. The reason is that otherwise we end up with useless churn in the repo as the generated file changes when different committers use different versions. In the past we have had issues with a new autoconf version actually breaking the Python build, so we also need to test a new version before switching to it.

Well, so I guess all committers will have to use the same Linux/FreeBSD/whatever distribution then? AFAICT there's no requirement regarding the Mercurial version used by committers either. Charles
Re: [Python-Dev] Bumping autoconf from 2.68 to 2.69
It should be sufficient to install autoconf-x.y into /home/user/bin or something similar. Installing autoconf from source really takes about 3 minutes.

Well, maybe, maybe not. autoconf depends on at least m4 and Perl, and you may very well have a compatibility issue here. That's why most distributions have package managers, and in 2012 we're past the './configure && make && make install' days.

It doesn't matter which OS or Mercurial version a developer uses, as they don't implicitly affect any versioned resources; autoconf does.

If you're worried about the noise in diffs, it's never been a problem, at least to me (just don't post a configure diff for review, the configure.ac diff is enough). If you're worried about runtime compatibility, then autoconf is not your only worry: a proper build also depends on the target shell and target toolchain (gcc, libc, etc.).
Re: [Python-Dev] Checking if unsigned int less then zero.
Playing with the CPython source, I found some strange code in socketmodule.c:

---
if (flowinfo < 0 || flowinfo > 0xfffff) {
    PyErr_SetString(
        PyExc_OverflowError,
        "getsockaddrarg: flowinfo must be 0-1048575.");
    return 0;
}
---
---
if (flowinfo < 0 || flowinfo > 0xfffff) {
    PyErr_SetString(PyExc_OverflowError,
                    "getsockaddrarg: flowinfo must be 0-1048575.");
    return NULL;
}
---

The flowinfo variable is declared a few lines above as unsigned int. Is there any practical sense in this check? It seems like gcc just removes it. I think any compiler will generate code that compares as unsigned; for example, on x86 it's JAE/JGE. Maybe this code is for bad compilers or an exotic arch?

Removed. Thanks, cf
[Python-Dev] [help wanted] - IrDA sockets support
Hi, Issue #1522400 (http://bugs.python.org/issue1522400) has a patch adding IrDA socket support. It builds under Linux and Windows, however it cannot go any further because no developer involved in the issue has access to IrDA capable devices, which makes testing impossible. So, if you have access to such devices and are interested, feel free to chime in and help get this merged. Cheers, cf
Re: [Python-Dev] cpython: Closes Issue #14661: posix module: add O_EXEC, O_SEARCH, O_TTY_INIT (I add some
jesus.cea python-check...@python.org wrote: http://hg.python.org/cpython/rev/2023f48b32b6 changeset: 76537:2023f48b32b6 user: Jesus Cea j...@jcea.es date: Tue Apr 24 20:59:17 2012 +0200 summary: Closes Issue #14661: posix module: add O_EXEC, O_SEARCH, O_TTY_INIT (I add some Solaris constants too)

Could you please add a Misc/NEWS entry for all this? I also tend to always update Misc/ACKS too, even for trivial patches.
Re: [Python-Dev] Experimenting with STM on CPython
Yes, that's using STM on my regular laptop. How HTM would help remains unclear at this point, because in this approach transactions are typically rather large --- likely much larger than what the first-generation HTM-capable processors will support next year.

Ok. I guess once the code is there, the hardware will eventually catch up. However, I'm not sure what you consider large. A lot of manipulation operations for the builtin types are not all that involved, at least in the normal cases (read: fast paths) that involve no memory reallocation etc., and anything that can be called by the interpreter and doesn't call back into it would be a complete and independent transaction all by itself, as the GIL is allowed to be released between any two ticks.

Large as in L2-cache large, and as in you won't get a page fault or an interrupt, you won't make any syscall, any I/O... ;-)
Re: [Python-Dev] PEP 418: Add monotonic clock
What's wrong with time.time() again?

As documented in http://docs.python.org/py3k/library/time.html it makes no guarantees, and specifically there is *no* guarantee that it will ever behave *badly* <wink/>. Of course, we'll have to guarantee that, if a badly-behaved clock is available, users can get access to it, so call that time._time().

I'm not sure I understand your suggestion correctly, but replacing time.time() by time.monotonic() with fallback won't work, because time.monotonic() isn't wall-clock time: it can very well use an arbitrary reference point (most likely system start-up time). As for the hires() function, since there's no guarantee whatsoever that it provides a better resolution than time.time(), this would be really misleading IMHO.
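A small sketch of the distinction, assuming the time.monotonic() that PEP 418 eventually added: its absolute value is meaningless, only differences between two calls are.

```python
import time

t_wall = time.time()       # seconds since the epoch; can jump (NTP, admin)
t_mono = time.monotonic()  # arbitrary reference point; never goes backwards

time.sleep(0.05)
elapsed = time.monotonic() - t_mono  # only *differences* are meaningful
```

This is why time.monotonic() can't transparently replace time.time(): callers that interpret the return value as wall-clock time would break.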
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
I personally don't see any reason to drop a module that isn't terminally broken or unmaintainable, apart from scaring users away by making them think that we don't care about backward compatibility.
[Python-Dev] best place for an atomic file API
Hi,

Issue #8604 aims at adding an atomic file API to make it easier to create/update files atomically, using rename() on POSIX systems and MoveFileEx() on Windows (which are now available through os.replace()). It would also use fsync() on POSIX to make sure data is committed to disk. For example, it could be used by importlib to avoid races when writing bytecode files (issues #13392, #13003, #13146), or more generally by any application that wants to make sure to end up with a consistent file even in the face of a crash (e.g. it seems that Mercurial implemented their own version). Basically the usage would be, e.g.:

with AtomicFile('foo') as f:
    pickle.dump(obj, f)

or

with AtomicFile('foo') as f:
    chunk = heavyCrunch()
    f.write(chunk)
    chunk = CrunchSomeMore()
    f.write(chunk)

What would be the best place for such a class? _pyio, tempfile, or a new atomicfile module?

Cheers, cf
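A minimal sketch of the idea (the class name and API here are illustrative, not what issue #8604 proposes): write to a temporary file in the target's directory, flush and fsync(), then atomically swap it into place with os.replace():

```python
import os
import tempfile

class AtomicFile:
    """Write a file atomically: the target either keeps its old
    contents or gets the complete new contents, never a mix."""

    def __init__(self, path, mode='w'):
        self._path = path
        dirname = os.path.dirname(os.path.abspath(path))
        # Same directory as the target, so os.replace() is a rename
        # within one filesystem (the atomic case).
        fd, self._tmp = tempfile.mkstemp(dir=dirname)
        self._f = os.fdopen(fd, mode)

    def __enter__(self):
        return self._f

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._f.flush()
            os.fsync(self._f.fileno())  # commit data before the rename
            self._f.close()
            os.replace(self._tmp, self._path)  # atomic on POSIX and Windows
        else:
            self._f.close()
            os.unlink(self._tmp)  # leave the target untouched on error
        return False
```

A real implementation would also fsync() the containing directory on POSIX to make the rename itself durable.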
Re: [Python-Dev] PEP 394 request for pronouncement (python2 symlink in *nix systems)
There actually *is* an easy way, in regular ls: look at the link count. It comes out of ls -l by default, and if it's > 1, there will be an identical file.

This doesn't tell me which file it is, which is practically useless if I have both python3.3 and python3.2 in that directory. You can use 'ls -i' to print the inode, or you could use find's '-samefile' option. But this is definitely not as straightforward as it would be for a symlink, and I'm also curious to know the reason behind this choice.
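In Python, the same check is a comparison of two stat() fields (this is essentially what os.path.samefile() does); a small sketch:

```python
import os

def is_hard_link_of(a, b):
    # Two paths name the same file iff they share device and inode.
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
```

Unlike a symlink, though, nothing tells you *which* other path shares the inode; you still have to scan the directory and compare.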
Re: [Python-Dev] [Python-checkins] cpython: Backed out changeset 36f2e236c601: For some reason, rewinddir() doesn't work as
Can rewinddir() end up touching the filesystem to retrieve data? I noticed that your previous change (the one this checkin reverted) moved it outside the GIL release macros. It just resets a position count. (in glibc). Actually, it also calls lseek() on the directory FD: http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/rewinddir.c;hb=HEAD But lseek() doesn't (normally) perform I/O, it just sets an offset in the kernel file structure: http://lxr.free-electrons.com/source/fs/read_write.c#L38 For example, it's not documented to return EINTR. Now, one could imagine that the kernel could do some read-ahead or some other magic things when passed SEEK_DATA or SEEK_HOLE, but seeking at the beginning of a directory FD should be fast. Anyway, I ended up reverting this change, because for some reason this broke OpenIndiana buildbots (maybe rewinddir() is a no-op before readdir() has been called?). Cheers, cf
[Python-Dev] svn.python.org certificate expired
Hi,

All the buildbots are turning red because of test_ssl:

======================================================================
ERROR: test_connect (test.test_ssl.NetworkedTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/test/test_ssl.py", line 616, in test_connect
    s.connect(("svn.python.org", 443))
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/ssl.py", line 519, in connect
    self._real_connect(addr, False)
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/ssl.py", line 509, in _real_connect
    self.do_handshake()
  File "/var/lib/buildslave/3.x.murray-gentoo-wide/build/Lib/ssl.py", line 489, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [Errno 1] _ssl.c:420: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

It seems that the svn.python.org certificate expired today (09/01/2012). Cheers, cf
Re: [Python-Dev] usefulness of Python version of threading.RLock
The yes/no answer is No, we can't drop it. Thanks, that's a clear answer :-) I'm not convinced of the benefits of removing the pure Python RLock implementation Indeed. As noted, this issue with signal handlers is more general, so this wouldn't solve the problem at hand. I just wanted to know whether we could remove this duplicate code, but since it might be used by some implementations, it's best to keep it.
Re: [Python-Dev] usefulness of Python version of threading.RLock
Thanks for those precisions, but I must admit it doesn't help me much... Can we drop it? A yes/no answer will do it ;-) I'm pretty sure the Python version of RLock is in use in several alternative implementations that provide an alternative _thread.lock. I think gevent would fall into this camp, as well as a personal project of mine in a similar vein that operates on python3. Sorry, I'm not sure I understand. Do those projects use _PyRLock directly? If yes, then aliasing it to _CRLock should do the trick, no?
[Python-Dev] usefulness of Python version of threading.RLock
Hi,

Issue #13697 (http://bugs.python.org/issue13697) deals with a problem with the Python version of threading.RLock (a signal handler which tries to acquire the same RLock is called right at the wrong time) which doesn't affect the C version. Whether such a use case can be considered good practice, or what the best way to fix this is, is not settled yet, but the question that arose to me is: why do we have both a C and a Python version? Here's Antoine's answer (he suggested that I bring this up on python-dev): "The C version is quite recent, and there's a school of thought that we should always provide fallback Python implementations. (Also, arguably a Python implementation makes things easier to prototype, although I don't think it's the case for an RLock.)" So, what do you guys think? Would it be okay to nuke the Python version? Do you have more details on this school of thought? Also, while we're at it, Victor created #13550 to try to rewrite the "logging hack" of the threading module: there again, I think we could just remove this logging altogether. What do you think?

Cheers, cf
Re: [Python-Dev] Fwd: Anyone still using Python 2.5?
Do people still have to use this in commercial environments or is everyone on 2.6+ nowadays? RHEL 5.7 ships with Python 2.4.3. So no, not everybody is on 2.6+ today, and this won't happen before a couple years. cf
Re: [Python-Dev] STM and python
However given advances in locking and garbage collection in the last decade, what attempts have been made recently to try these new ideas out? In particular, how unlikely is it that all the thread safe primitives, global contexts, and reference counting functions be made __transaction_atomic, and magical parallelism performance boosts ensue? I'd say that given that the current libitm implementation uses a single global lock, you're more likely to see a performance loss. TM is useful to synchronize non-trivial operations: an increment or decrement of a reference count is highly trivial (and expensive when performed atomically, as noted), and TM's never going to help if you put each refcount operation in its own transaction: see Armin's http://morepypy.blogspot.com/2011/08/we-need-software-transactional-memory.html for more realistic use cases.
Re: [Python-Dev] Unexpected behaviour in compileall
2011/11/2 Vinay Sajip vinay_sa...@yahoo.co.uk: I just started getting errors in my PEP 404 / pythonv branch, but they don't at first glance appear related to the functionality of this branch. What I'm seeing is that during installation, some of the .pyc/.pyo files written by compileall have mode 600 rather than the expected 644, with the result that test_compileall fails when run from the installed Python as an unprivileged user. If I manually do It's a consequence of http://hg.python.org/cpython/rev/740baff4f169. I'll fix that.
Re: [Python-Dev] socket module build failure
Hello, 2011/10/7 Vinay Sajip vinay_sa...@yahoo.co.uk: I work on Ubuntu Jaunty for my cpython development work - an old version, I know, but still quite serviceable and has worked well for me over many months. With the latest default cpython repository, however, I can't run the regression suite because the socket module now fails to build: It's due to the recent inclusion of PF_CAN support: http://hg.python.org/cpython/rev/e767318baccd It looks like your header files are different from what's found in other distributions. Please reopen issue #10141, we'll try to go from there. Cheers, cf
Re: [Python-Dev] cpython (3.2): Issue #11956: Skip test_import.test_unwritable_directory on FreeBSD when run as
I'd have expected this test to fail on _any_ UNIX system if run as root. Root's allowed to write to stuff! Any stuff! About the only permission with any effect on root is the eXecute bit for the exec call, to prevent blindly running random data files.

You're right, here's another test on Linux (I must have screwed up when I tested this on my box):

# mkdir /tmp/foo
# chmod -w /tmp/foo
# touch /tmp/foo/bar
# ls /tmp/foo
bar

You can still set the directory immutable if you really want to deny write access to root:

# chattr +i /tmp/foo
# touch /tmp/foo/spam
touch: cannot touch `/tmp/foo/spam': Permission denied

Equally, why on earth are you running tests as root!?!?!?!?! Madness. It's as bad as compiling stuff as root etc etc. A bad idea all around, security-wise.

Agreed, I would personally never run a buildbot as root. I just changed this because I was tired of seeing the same buildbots always red (thus masking real failures).
Re: [Python-Dev] Using PEP384 Stable ABI for the lzma extension module
That's not a given. Depending on the memory allocator, a copy can be avoided. That's why the str += str hack is much more efficient under Linux than Windows, AFAIK. Even Linux will have to copy a block on realloc in certain cases, no? Probably so. How often is totally unknown to me :) http://www.gnu.org/software/libc/manual/html_node/Changing-Block-Size.html It depends on whether there's enough free memory after the buffer you currently have allocated. I suppose that this becomes a question of what people consider the general case :-) But under certain circumstances (if a large block is requested), the allocator uses mmap(), no? That's right, if the block requested is bigger than mmap_threshold (256K by default with glibc, forgetting the sliding window algorithm): I'm not sure what percentage of strings/buffers are concerned in a typical program. In which case mremap() should allow resizing without copying anything. Yes, there's no copying. Note however that it doesn't come for free, the kernel will still zero-fill the pages before handing them to user-space. It is still way faster than on, let's say, Solaris. cf
Re: [Python-Dev] cpython (3.2): Issue #11956: Skip test_import.test_unwritable_directory on FreeBSD when run as
summary: Issue #11956: Skip test_import.test_unwritable_directory on FreeBSD when run as root (directory permissions are ignored).

The same directory permission semantics apply to other (all?) BSD-derived systems, not just FreeBSD. For example, the test still fails in the same way on OS X when run via sudo.

Thanks, I didn't know: I only noticed this on the FreeBSD buildbots (I guess the OS X buildbots don't run as root). Note that it does behave "as expected" on Linux (note the use of quotation marks, I'm not sure whether this behavior is authorized by POSIX). I changed the test to skip when the effective UID is 0, regardless of the OS, to stay on the safe side.
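The change amounts to a guard along these lines (a sketch; the real test in Lib/test/test_import.py is named and structured differently):

```python
import os
import unittest

@unittest.skipIf(hasattr(os, 'geteuid') and os.geteuid() == 0,
                 'directory write permissions are ignored for root')
class UnwritableDirTest(unittest.TestCase):
    def test_unwritable_directory(self):
        # ... would create an unwritable directory and expect EACCES ...
        pass
```

Checking the effective UID rather than the platform covers FreeBSD, OS X, and any other system where root bypasses permission checks.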
Re: [Python-Dev] [Python-checkins] cpython: Issue #12981: rewrite multiprocessing_{sendfd, recvfd} in Python.
On Sun, Sep 25, 2011 at 4:04 AM, charles-francois.natali python-check...@python.org wrote:

+        if not (sys.platform == 'win32' or (hasattr(socket, 'CMSG_LEN') and
+                                            hasattr(socket, 'SCM_RIGHTS'))):
+            raise ImportError('pickling of connections not supported')

I'm pretty sure the functionality checks for CMSG_LEN and SCM_RIGHTS mean the platform check for Windows is now redundant.

I'm not sure I understand what you mean. FD passing is supported on Unix with sendmsg/SCM_RIGHTS, and on Windows using whatever Windows uses for that purpose (see http://hg.python.org/cpython/file/2b47f0146639/Lib/multiprocessing/reduction.py#l63). If we remove the check for Windows, an ImportError will be raised systematically, unless you suggest that Windows does support sendmsg/SCM_RIGHTS (I somehow doubt Windows supports Unix domain sockets, but I don't know Windows at all). cf
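On Unix, the sendmsg()/SCM_RIGHTS side boils down to something like the following (a sketch with illustrative helper names, using the socket.sendmsg()/recvmsg() API added by issue #6560):

```python
import array
import os
import socket

def send_fd(sock, fd):
    # At least one byte of normal data must accompany the ancillary data.
    sock.sendmsg([b'F'],
                 [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                   array.array('i', [fd]))])

def recv_fd(sock):
    fds = array.array('i')
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    cmsg_level, cmsg_type, cmsg_data = ancdata[0]
    assert cmsg_level == socket.SOL_SOCKET and cmsg_type == socket.SCM_RIGHTS
    # The kernel delivers a *new* descriptor number in the receiver.
    fds.frombytes(cmsg_data[:fds.itemsize])
    return fds[0]
```

The receiving process gets its own copy of the descriptor, which is why the Windows path (duplicating handles) has to stay separate.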
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
+3 (agreed to Jesse, Antoine and Ask here). The http://bugs.python.org/issue8713 described non-fork implementation that always uses subprocesses rather than plain forked processes is the right way forward for multiprocessing. I see two drawbacks: - it will be slower, since the interpreter startup time is non-negligible (well, normally you shouldn't spawn a new process for every item, but it should be noted) - it'll consume more memory, since we lose the COW advantage (even though it's already limited by the fact that even treating a variable read-only can trigger an incref, as was noted in a previous thread) cf
Re: [Python-Dev] Software Transactional Memory for Python
Hi Armin, This is basically dangerous, because it corresponds to taking lock GIL and lock L, in that order, whereas the thread B takes lock L and plays around with lock GIL in the opposite order. I think a reasonable solution to avoid deadlocks is simply not to use explicit locks inside "with atomic" blocks. The problem is that many locks are actually acquired implicitly. For example, `print` to a buffered stream will acquire the fileobject's mutex. Also, even if the code inside the "with atomic" block doesn't directly or indirectly acquire a lock, there's still the possibility of asynchronous code that acquires locks being executed in the middle of this block: for example, signal handlers are run on behalf of the main thread from the main eval loop and in certain other places, and the GC might kick in at any time. Generally speaking it can be regarded as wrong to do any action that causes an unbounded wait in a "with atomic" block, Indeed. cf
Re: [Python-Dev] sendmsg/recvmsg on Mac OS X
The buildbots are complaining about some of tests for the new socket.sendmsg/recvmsg added by issue #6560 for *nix platforms that provide CMSG_LEN. Looks like kernel bugs: http://developer.apple.com/library/mac/#qa/qa1541/_index.html Yes. Mac OS X 10.5 fixes a number of kernel bugs related to descriptor passing [...] Avoid passing two or more descriptors back-to-back. We should probably add @requires_mac_ver(10, 5) for testFDPassSeparate and testFDPassSeparateMinSpace. As for InterruptedSendTimeoutTest and testInterruptedSendmsgTimeout, it also looks like a kernel bug: the syscall should fail with EINTR once the socket buffer is full. I guess one should skip those on OS-X.
Re: [Python-Dev] sendmsg/recvmsg on Mac OS X
But Snow Leopard, where these failures occur, is OS X 10.6. *sighs* It still looks like a kernel/libc bug to me: AFAICT, both the code and the tests are correct. And apparently, there are still issues pertaining to FD passing on 10.5 (and maybe later, I couldn't find a public access to their bug tracker): http://lists.apple.com/archives/Darwin-dev/2008/Feb/msg00033.html Anyway, if someone with a recent OS X release could run test_socket, it would probably help. Follow ups to http://bugs.python.org/issue6560
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
2011/8/23, Nir Aides n...@winpdb.org: Hi all, Hello Nir, Please consider this invitation to stick your head into an interesting problem: http://bugs.python.org/issue6721 Just for the record, I'm now in favor of the atfork mechanism. It won't solve the problem for I/O locks, but it'll at least make room for a clean and cross-library way to setup atfork handlers. I just skimmed over it, but it seemed Gregory's atfork module could be a good starting point. cf
Re: [Python-Dev] issue 6721 Locks in python standard library should be sanitized on fork
2011/8/23 Antoine Pitrou solip...@pitrou.net: Well, I would consider the I/O locks the most glaring problem. Right now, your program can freeze if you happen to do a fork() while e.g. the stderr lock is taken by another thread (which is quite common when debugging). Indeed. To solve this, a similar mechanism could be used: after fork(), in the child process: - just reset each I/O lock (destroy/re-create the lock) if we can guarantee that the file object is in a consistent state (i.e. that all the invariants hold). That's the approach I used in my initial patch. - call a fileobject method which resets the I/O lock and sets the file object to a consistent state (in other words, an atfork handler)
Re: [Python-Dev] Comments of the PEP 3151
I assume that ESHUTDOWN is the errno in question? (This is also already mentioned in the PEP.) Indeed, I mentioned it in the PEP, as it appears in asyncore.py. But I can't find it on www.opengroup.org, and no man page on my Linux system (except the errno man page) seems to mention it. It's not POSIX, but it's defined on Linux and FreeBSD (at least): http://lxr.free-electrons.com/source/include/asm-generic/errno.h#L81 http://fxr.watson.org/fxr/source/sys/errno.h?v=FREEBSD53#L122 The description from errnomodule.c says "Cannot send after transport endpoint shutdown", but send() actually returns EPIPE, not ESHUTDOWN, when the socket has been shutdown: Indeed, as required by POSIX. But grepping through the Linux kernel source code, it seems to be used extensively for USB devices, see http://lxr.free-electrons.com/ident?i=ESHUTDOWN So the transport endpoint doesn't necessarily refer to a socket. It's also documented in http://lxr.free-electrons.com/source/Documentation/usb/error-codes.txt Finally, I found one place in the networking stack where ESHUTDOWN is used, in the SCTP code: http://lxr.free-electrons.com/source/net/sctp/outqueue.c#L329
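A quick check (a sketch, run on Linux) that send() after shutdown() yields EPIPE rather than ESHUTDOWN; Python surfaces it as an OSError (BrokenPipeError on 3.3+) because the interpreter ignores SIGPIPE at startup:

```python
import errno
import socket

a, b = socket.socketpair()
a.shutdown(socket.SHUT_WR)  # we can no longer send on this end
try:
    a.send(b'x')
    got = None
except OSError as e:  # BrokenPipeError on Python 3.3+
    got = e.errno
```

The errno observed is EPIPE, matching the POSIX-required behavior described above.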
Re: [Python-Dev] cpython (2.7): - Issue #12603: Fix pydoc.synopsis() on files with non-negative st_mtime.
+- Issue #12603: Fix pydoc.synopsis() on files with non-negative st_mtime. +

Surely you mean non-positive? Non-negative st_mtime being the common case.

Of course (st_mtime <= 0).
Re: [Python-Dev] [Python-checkins] cpython: Issue #11784: Improve multiprocessing.Process.join() documentation. Patch by
There’s a dedicated file to thank doc contributors: Doc/ACKS.rst I didn't know about this file, thanks. In my defense, there's this comment at the top of Misc/ACKS: This list is not complete and not in any useful order, but I would like to thank everybody who contributed in any way, with code, hints, bug reports, ideas, moral support, endorsement, or even complaints Without you, I would've stopped working on Python long ago! --Guido What's the rationale for having a dedicated file?
Re: [Python-Dev] cpython: Merge - Issue #12592: Make Python build on OpenBSD 5 (and future major
Note that this commit wasn't actually a merge -- you'll have to use the hg merge command for that. You're right. I guess that's what happens when I try to work past my usual bedtime ;-) By the way, I'm still getting errors upon push, and it looks like when I push a patch, this doesn't trigger any build on the buildbots. It used to work, any idea what's going on?