Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
On 15.02.2012 21:06, Antoine Pitrou wrote:
> On Wed, 15 Feb 2012 20:56:26 +0100, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
>> With the quartz in Victor's machine, a single clock tick takes 0.3 ns,
>> so three of them make a nanosecond. As the quartz may not be entirely
>> accurate (and also as the CPU frequency may change), you have to measure
>> the clock rate against an external time source, but Linux has
>> implemented algorithms for that. On my system, dmesg shows
>>
>>     [    2.236894] Refined TSC clocksource calibration: 2793.000 MHz.
>>     [    2.236900] Switching to clocksource tsc
>
> But that's still not meaningful. By the time clock_gettime() returns, an
> unpredictable number of nanoseconds have elapsed, and even more when
> returning to the Python evaluation loop.

This is not exactly true: while the current time won't be what was returned
by the time you use it, it is certainly possible to predict how long it
takes to return from a system call. So the result is not accurate, but it
is meaningful.

If you are formally arguing that uncertain events may happen, such as the
scheduler interrupting the thread: this is true for any clock reading; the
actual time may be many milliseconds off by the time it is used. That is no
reason to fall back to second resolution.

> So the nanosecond precision is just an illusion, and a float should
> really be enough to represent durations for any task where Python is
> suitable as a language.

I agree with that statement - I was just refuting your claim that Linux
cannot do nanosecond measurements.

Please do recognize the point I made to Guido: despite us three agreeing
that a float is good enough for time stamps, people will continue to submit
patches and ask for new features until we give in. One way to delay that by
several years could be to reject the PEP in a way that makes it clear that
not only the specific approach is rejected, but any approach using anything
other than floats.

Regards,
Martin
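[For a rough sense of the "syscall return time is predictable" point, one
can time a batch of clock reads. A minimal sketch, assuming Python 3.3's
time.perf_counter(); any high-resolution timer would do:]

    import time

    # Estimate the average cost of one clock read by timing a batch.
    # The per-call overhead is fairly stable, which is what makes a
    # high-resolution reading meaningful even if it is not accurate.
    N = 10**6
    start = time.perf_counter()
    for _ in range(N):
        time.time()
    elapsed = time.perf_counter() - start
    print("per-call overhead: %.0f ns" % (elapsed / N * 1e9))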
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
> Maybe an alternative PEP could be written that supports the filesystem
> copying use case only, using some specialized ns APIs? I really think
> that all you need is st_{a,c,m}time_ns fields and os.utime_ns().

I'm -1 on that, because it will make people write complicated code.

Regards,
Martin
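[A sketch of what the proposed API might look like for the copy use case:
st_*time_ns as integer nanosecond counts, mirrored by an os.utime_ns()
setter. These names are the *proposal* under discussion here, not an
existing API:]

    import os

    # Copy atime/mtime from src to dst without going through float.
    # st_atime_ns, st_mtime_ns and os.utime_ns() are hypothetical.
    st = os.stat("src")
    os.utime_ns("dst", (st.st_atime_ns, st.st_mtime_ns))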
Re: [Python-Dev] best place for an atomic file API
> (MvL complained in the tracker issue about a lack of concrete use cases,
> but I think fixing race conditions when overwriting bytecode files in
> importlib and the existing distutils/packaging use cases cover that)

I certainly agree that there are applications of atomic replace, and that
the os module should expose the relevant platform APIs where available.

I'm not so sure that "atomic writes" is a useful concept. I haven't seen a
proposed implementation yet, but I'm doubtful that truly ACID writes are
possible unless the operating system supports transactions (which only
Windows 7 does). Even if you ignore Isolation, Atomicity alone is a
challenge: if you first write to a tempfile, then rename it, you may end up
with a stale tempfile (e.g. if the process is killed), and no rollback
operation. So "atomic write" to me promises something that it likely can't
deliver. OTOH, I still think that the promise isn't actually asked for in
practice (not even when overwriting bytecode files).

Regards,
Martin
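[For reference, the write-to-temp-then-rename idiom under discussion looks
roughly like this. A best-effort sketch, not ACID: a crash between the
write and the cleanup is exactly the "stale tempfile" case described
above:]

    import os
    import tempfile

    def replace_contents(path, data):
        # Write the new contents to a temp file in the same directory
        # (rename is only atomic within one filesystem), flush to disk,
        # then rename over the target.
        dirname = os.path.dirname(path) or "."
        fd, tmp = tempfile.mkstemp(dir=dirname)
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            os.rename(tmp, path)  # os.replace() on 3.3+ also covers Windows
        except BaseException:
            os.unlink(tmp)
            raise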
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
> So, getting back to the topic again, is there any reason why you would
> oppose backing the ElementTree module in the stdlib by cElementTree's
> accelerator module? Or can we just consider this part of the discussion
> settled and start getting work done?

I'd still like to know who is in charge of the etree package now. I know
that I'm not, so I just don't have any opinion on the technical question of
using the accelerator module (it sounds like a reasonable idea, but it also
sounds like something that may break existing code). If the maintainer of
the etree package were to pronounce that it is ok to make this change, I'd
have no objection at all.

Lacking a maintainer, I feel responsible for any bad consequences of that
change, which makes me feel uneasy about it.

Regards,
Martin
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
> Does this imply that each and every package in the stdlib currently has a
> dedicated maintainer who promised to be dedicated to it? Or otherwise,
> should those packages that *don't* have a maintainer be removed from the
> standard library?

That is my opinion, yes. Some people (including myself) are willing to act
as maintainers for large sets of modules, covering even code that they
don't ever use themselves.

> Isn't that a bit harsh? ElementTree is an overall functional library and
> AFAIK the preferred stdlib tool for processing XML for many developers.
> It currently needs some attention to fix a few issues, expose the fast C
> implementation by default when ElementTree is imported, and improve the
> documentation. At this point, I'm interested enough to work on these -
> given that the political issue with Fredrik Lundh is resolved. However, I
> can't *honestly* say I promise to maintain the package until 2017. So,
> what's next?

If you feel qualified to make changes, go ahead and make them. Take the
praise if they are good changes, take the blame if they backfire. Please do
try to stay around until either has happened.

It would also be good if you would declare "I will maintain the etree
package".

Regards,
Martin
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
> I'd still like to know who is in charge of the etree package now. I know
> that I'm not, so I just don't have any opinion on the technical question
> of using the accelerator module (it sounds like a reasonable idea, but it
> also sounds like something that may break existing code). If the
> maintainer of the etree package were to pronounce that it is ok to make
> this change, I'd have no objection at all. Lacking a maintainer, I feel
> responsible for any bad consequences of that change, which makes me feel
> uneasy about it.

Martin, as you've seen, Fredrik Lundh finally officially ceded the
maintenance of the ElementTree code to the Python developers:
http://mail.python.org/pipermail/python-dev/2012-February/116389.html

The change of backing ElementTree by cElementTree has already been
implemented in the default branch (3.3) by Florent Xicluna, with careful
review from me and others. etree has an extensive (albeit a bit clumsy) set
of tests which keep passing successfully after the change. The bots are
also happy.

In the past couple of years Florent has been the de-facto maintainer of
etree in the standard library, although I don't think he ever committed to
keep maintaining it for years to come. Neither can I make this commitment;
however, I do declare that I will do my best to keep the library
functional, and I also plan to work on improving its documentation and
cleaning up some of the accumulated cruft in its implementation. I also
have every intention of taking the blame if something breaks.

That said, Florent is probably the one most familiar with the code at this
point, and although his help will be most appreciated, I can't expect or
demand that he stick around for a few years. We're all volunteers here,
after all.

Eli
Re: [Python-Dev] PEP 394 request for pronouncement (python2 symlink in *nix systems)
In article <cadisq7fg3vgxd39teuvbcvhhmpkuwss0qcksrfpkn5ye0dv...@mail.gmail.com>,
Nick Coghlan <ncogh...@gmail.com> wrote:
> On Thu, Feb 16, 2012 at 12:06 PM, Guido van Rossum <gu...@python.org> wrote:
>> Anyway, I don't think anyone is objecting against the PEP allowing
>> symlinks now.
>
> Yeah, the onus is just back on me to do the final updates to the PEP and
> patch based on the discussion in this thread. Unless life unexpectedly
> intervenes, I expect that to happen on Saturday (my time). After that,
> the only further work is for Ned to supply whatever updates he needs to
> bring the 2.7 Mac OS X installers into line with the new naming scheme.

There are two issues that I know of for OS X. One is just getting a python2
symlink into the bin directory of a framework build. That's easy. The other
is managing symlinks (python, python2, and python3) across framework bin
directories; currently there's no infrastructure for that. That part will
probably have to wait until PyCon.

--
 Ned Deily, n...@acm.org
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
2012/2/16 "Martin v. Löwis" <mar...@v.loewis.de>:
>> Maybe an alternative PEP could be written that supports the filesystem
>> copying use case only, using some specialized ns APIs? I really think
>> that all you need is st_{a,c,m}time_ns fields and os.utime_ns().
>
> I'm -1 on that, because it will make people write complicated code.

Python 3.3 *already has* APIs for nanosecond timestamps: os.utimensat(),
os.futimens(), signal.sigtimedwait(), etc. These functions expect a
(seconds: int, nanoseconds: int) tuple.

We have to decide before the Python 3.3 release if this API is just fine,
or if it should be changed. After the release, it will be more difficult to
change the API.

If os.utimensat() expects a tuple, it would be nice to have a function that
gets the time as a tuple, like the C language has the clock_gettime()
function to get a timestamp as a timespec structure.

During the discussion, many developers wanted a type that allows arithmetic
operations like t2-t1 to compute a delta, or t+delta to apply an offset
(e.g. for a timezone). It is possible to do arithmetic on a tuple, but it
is not practical, and I don't like a type with a fixed resolution (in some
cases you need millisecond, microsecond or 100 ns resolution).

If you consider that the float loss of precision is not an issue for
nanoseconds, we should use float for os.utimensat(), os.futimens() and
signal.sigtimedwait(), just for consistency.

Victor
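[To illustrate why tuple arithmetic is impractical: even a subtraction has
to handle the two fields separately, with manual borrowing. A sketch:]

    # Subtract two (seconds, nanoseconds) timespec-style tuples.
    def timespec_sub(t2, t1):
        sec = t2[0] - t1[0]
        nsec = t2[1] - t1[1]
        if nsec < 0:          # borrow one second
            sec -= 1
            nsec += 10**9
        return (sec, nsec)

    print(timespec_sub((10, 100), (9, 900000000)))  # (0, 100000100)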
Re: [Python-Dev] best place for an atomic file API
Most users don't need a truly ACID write, but implement their own
best-effort function. Instead of having a different implementation in each
project, Python can provide something better, especially when the OS
provides low-level functions to implement such a feature.

Victor

2012/2/16 "Martin v. Löwis" <mar...@v.loewis.de>:
> I'm not so sure that "atomic writes" is a useful concept. I haven't seen
> a proposed implementation yet, but I'm doubtful that truly ACID writes
> are possible unless the operating system supports transactions (which
> only Windows 7 does). Even if you ignore Isolation, Atomicity alone is a
> challenge: if you first write to a tempfile, then rename it, you may end
> up with a stale tempfile (e.g. if the process is killed), and no rollback
> operation. So "atomic write" to me promises something that it likely
> can't deliver. OTOH, I still think that the promise isn't actually asked
> for in practice (not even when overwriting bytecode files).
Re: [Python-Dev] PEP 394 request for pronouncement (python2 symlink in *nix systems)
> There are two issues that I know of for OS X. One is just getting a
> python2 symlink into the bin directory of a framework build. That's easy.

Where exactly in the Makefile is that reflected? ISTM that the current
patch already covers that, since the framework* targets are not concerned
with the bin directory.

> The other is managing symlinks (python, python2, and python3) across
> framework bin directories; currently there's no infrastructure for that.
> That part will probably have to wait until PyCon.

What is the framework bin directory? The links are proposed for
/usr/local/bin and /usr/bin, respectively. The proposed patch already
manages these links across releases (the most recent install wins). If you
are concerned about multiple feature releases: this is not an issue, since
the links are only proposed for Python 2.7 (distributions may also add them
for 2.6 and earlier, but we are not going to make a release in that
direction).

It may be that the PEP becomes irrelevant before it is widely accepted: if
the sole remaining Python 2 version is 2.7, users may just as well refer to
python2.7 instead of python2.

Regards,
Martin
Re: [Python-Dev] best place for an atomic file API
On 16.02.2012 10:54, Victor Stinner wrote:
> Most users don't need a truly ACID write, but implement their own
> best-effort function. Instead of having a different implementation in
> each project, Python can provide something better, especially when the
> OS provides low-level functions to implement such a feature.

It's then critical how this is named, IMO (and exactly what semantics it
comprises). Calling it "atomic" when it is not is a mistake.

Also notice that one user commented that he already implemented something
like this, and left out the issue of *permissions*. I found that
interesting, since preserving permissions might indeed be a requirement in
a lot of in-place update use cases, but it hasn't been considered in this
discussion yet.

So rather than providing a mechanism for atomic writes, I think providing a
mechanism to update a file is what people might need. One way of providing
this might be a "u" mode for open, which updates an existing file on close
(unlike "a", which appends, and unlike "w", which truncates first).

Regards,
Martin
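[A sketch of what such update semantics could look like as a helper; the
class name and behavior are illustrative only, no such API exists:]

    import os, shutil, tempfile

    class UpdatedFile:
        # Collect the replacement contents in a temp file, copy the
        # original file's permission bits over, and swap it in on close.
        def __init__(self, path):
            self._path = path
            dirname = os.path.dirname(path) or "."
            fd, self._tmp = tempfile.mkstemp(dir=dirname)
            self._file = os.fdopen(fd, "wb")
            shutil.copymode(path, self._tmp)  # preserve permissions

        def write(self, data):
            self._file.write(data)

        def close(self):
            self._file.close()
            os.rename(self._tmp, self._path)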
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
On 16.02.2012 10:51, Victor Stinner wrote:
>>> Maybe an alternative PEP could be written that supports the filesystem
>>> copying use case only, using some specialized ns APIs? I really think
>>> that all you need is st_{a,c,m}time_ns fields and os.utime_ns().
>>
>> I'm -1 on that, because it will make people write complicated code.
>
> Python 3.3 *already has* APIs for nanosecond timestamps: os.utimensat(),
> os.futimens(), signal.sigtimedwait(), etc. These functions expect a
> (seconds: int, nanoseconds: int) tuple.

I'm -1 on adding these APIs, also. Since Python 3.3 is not released yet,
it's not too late to revert them.

> If you consider that the float loss of precision is not an issue for
> nanoseconds, we should use float for os.utimensat(), os.futimens() and
> signal.sigtimedwait(), just for consistency.

I'm wondering what use cases utimensat and futimens have that are not
covered by utime/utimes (except for the higher resolution). Keeping the
"ns" in the name but not doing nanoseconds would be bad, IMO. For
sigtimedwait, accepting float is indeed the right thing to do.

In the long run, we should see whether using 128-bit floats is feasible.

Regards,
Martin
Re: [Python-Dev] best place for an atomic file API
"Martin v. Löwis" <martin at v.loewis.de> writes:
> One way of providing this might be a "u" mode for open, which updates an
> existing file on close (unlike "a", which appends, and unlike "w", which
> truncates first).

Doesn't "r+" cover this?

Regards,
Vinay Sajip
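[For comparison, "r+" opens an existing file for reading and writing
without truncating it, so an in-place update looks like this. Note that it
edits the file directly, with none of the swap-on-close semantics discussed
above:]

    with open("settings.conf", "r+") as f:
        data = f.read()
        f.seek(0)
        f.write(data.replace("old", "new"))
        f.truncate()  # drop the leftover tail if the new content is shorter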
Re: [Python-Dev] PEP 394 request for pronouncement (python2 symlink in *nix systems)
On Thu, Feb 16, 2012 at 8:01 PM, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
> It may be that the PEP becomes irrelevant before it is widely accepted:
> if the sole remaining Python 2 version is 2.7, users may just as well
> refer to python2.7 instead of python2.

My hope is that a clear signal from us supporting a python2 symlink for
cross-distro compatibility will encourage the commercial distros to add
such a link to their 2.6 based variants (e.g. anything with an explicit
python2.7 reference won't run by default on RHEL 6, or rebuilds based on
it).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 394 request for pronouncement (python2 symlink in *nix systems)
On Wed, Feb 15, 2012 at 12:44 AM, Barry Warsaw <ba...@python.org> wrote:
> On Feb 14, 2012, at 12:38 PM, Nick Coghlan wrote:
>> I have no idea, and I'm not going to open that can of worms for this
>> PEP. We need to say something about the executable aliases so that
>> people can eventually write cross-platform python2 shebang lines, but
>> how particular distros actually manage the transition is going to
>> depend more on their infrastructure and community than it is anything
>> to do with us.
>
> Then I think all the PEP needs to say is that it is explicitly up to the
> distros to determine if, when, where, and how they transition. I.e. take
> it off of python-dev's plate.

It turns out I'd forgotten what was in the PEP - the Notes section already
contained a lot of suggestions along those lines. I changed the title of
the section to "Migration Notes", but tried to make it clear that those
*aren't* consensus recommendations, just ideas distros may want to think
about when considering making the switch.

The updated version is live on python.org:
http://www.python.org/dev/peps/pep-0394/

I didn't end up giving an explicit rationale for the choice to use a
symlink chain, since it really isn't that important to the main purpose of
the PEP (i.e. encouraging distros to make sure python2 is on the system
path somewhere).

Once MvL or Guido gives the nod to the latest version, I'll bump it up to
approved.

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
2012/2/15 Guido van Rossum <gu...@python.org>:
> So using floats we can match 100ns precision, right?

Nope, not to store an Epoch timestamp newer than January 1987:

>>> x = 2**29; (x + 1e-7) != x   # no loss of precision
True
>>> x = 2**30; (x + 1e-7) != x   # loses precision
False
>>> print(datetime.timedelta(seconds=2**29))
6213 days, 18:48:32
>>> print(datetime.datetime.fromtimestamp(2**29))
1987-01-05 19:48:32

Victor
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
> A data point on this specific use case. The following code throws its
> assert ~90% of the time in Python 3.2.2 on a modern Linux machine
> (assuming "foo" exists and "bar" does not):
>
>     import shutil
>     import os
>
>     shutil.copy2("foo", "bar")
>     assert os.stat("foo").st_mtime == os.stat("bar").st_mtime

It works because Python uses float for utime() and for stat(). But this
assertion may fail if another program checks the file timestamps without
losing precision (to float), e.g. a program written in C that compares the
st_*time and st_*time_ns fields.

> I fixed this in trunk last September (issue 12904); os.utime now
> preserves all the precision that Python currently conveys.

Let's try on an ext4 filesystem:

$ ~/prog/python/timestamp/python
Python 3.3.0a0 (default:35d6cc531800+, Feb 16 2012, 13:32:56)
>>> import decimal, os, shutil, time
>>> open("test", "x").close()
>>> shutil.copy2("test", "test2")
>>> os.stat("test", timestamp=decimal.Decimal).st_mtime
Decimal('1329395871.874886224')
>>> os.stat("test2", timestamp=decimal.Decimal).st_mtime
Decimal('1329395871.873350282')
>>> os.stat("test2", timestamp=decimal.Decimal).st_mtime - os.stat("test", timestamp=decimal.Decimal).st_mtime
Decimal('-0.001535942')

So shutil.copy2() failed to copy the timestamp: test2 is 1 ms older than
test...

Let's try with a program not written in Python: GNU make. The Makefile:

    test2: test
            @echo "Copy test into test2"
            @~/prog/python/default/python -c 'import shutil; shutil.copy2("test", "test2")'

    test:
            @echo "Create test"
            @touch test

    clean:
            rm -f test test2

First try:

$ make clean
rm -f test test2
$ make
Create test
Copy test into test2
$ make
Copy test into test2

=> test2 is always older than test, and so is always regenerated.

Second try:

$ make clean
rm -f test test2
$ make
Create test
Copy test into test2
$ make
make: `test2' is up to date.

=> oh, here test2 is newer or has the exact same modification time, so
there is no need to rebuild it.

Victor
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
> PEP author Victor asked (in
> http://mail.python.org/pipermail/python-dev/2012-February/116499.html):
>> Maybe I missed the answer, but how do you handle timestamps with an
>> unspecified starting point like os.times() or time.clock()? Should we
>> leave these functions unchanged?
>
> If *all* you know is that it is monotonic, then you can't -- but then you
> don't really have resolution either, as the clock may well speed up or
> slow down. If you do have resolution, and the only problem is that you
> don't know what the epoch was, then you can figure that out well enough
> by (once per type per process) comparing it to something that does have
> an epoch, like time.gmtime().

Hum, I suppose that you can expect that time.time() - time.monotonic() is
constant or evolves very slowly. time.monotonic() should return a number of
seconds.

But you are right, monotonic clocks are usually less accurate. On Windows,
QueryPerformanceCounter() is less accurate than GetSystemTimeAsFileTime(),
for example:
http://msdn.microsoft.com/en-us/magazine/cc163996.aspx
(read the "The Issue of Frequency" section)

The documentation of time.monotonic() (a function added to Python 3.3)
should maybe mention the second unit and the accuracy issue.

Victor
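[A sketch of the epoch-recovery trick described above, assuming Python
3.3's time.monotonic(); the offset drifts slowly, so a long-running process
would want to refresh it occasionally:]

    import time

    # Computed once per process: maps the monotonic clock onto the
    # wall-clock epoch.
    _offset = time.time() - time.monotonic()

    def monotonic_as_epoch():
        return time.monotonic() + _offset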
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
On Thu, 16 Feb 2012 13:46:18 +0100, Victor Stinner
<victor.stin...@gmail.com> wrote:
> Let's try on an ext4 filesystem:
>
> $ ~/prog/python/timestamp/python
> Python 3.3.0a0 (default:35d6cc531800+, Feb 16 2012, 13:32:56)
> >>> import decimal, os, shutil, time
> >>> open("test", "x").close()
> >>> shutil.copy2("test", "test2")
> >>> os.stat("test", timestamp=decimal.Decimal).st_mtime
> Decimal('1329395871.874886224')
> >>> os.stat("test2", timestamp=decimal.Decimal).st_mtime
> Decimal('1329395871.873350282')

This looks fishy. Floating-point numbers are precise enough to represent
the difference between these two numbers:

>>> f = 1329395871.874886224
>>> f.hex()
'0x1.3cf3e27f7fe23p+30'
>>> g = 1329395871.873350282
>>> g.hex()
'0x1.3cf3e27f7e4f9p+30'

If I run your snippet and inspect the modification times using `stat`, the
difference is much smaller (around 10 ns, not 1 ms):

$ stat test | \grep Modify
Modify: 2012-02-16 13:51:25.643597139 +0100
$ stat test2 | \grep Modify
Modify: 2012-02-16 13:51:25.643597126 +0100

In other words, you should check your PEP implementation for bugs.

Regards

Antoine.
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
> The way Linux does that is to use the time-stamping counter of the
> processor (the rdtsc instruction), which (originally) counts one unit per
> CPU clock. I believe current processors use slightly different countings
> (e.g. through the APIC), but still: you get a resolution within the clock
> frequency of the CPU quartz.

Linux has an internal clocksource API supporting different hardware:

 - PIT (Intel 8253 chipset): configurable frequency between 18.2 Hz and
   1.2 MHz
 - PMTMR (power management timer): ACPI clock with a frequency of 3.5 MHz
 - TSC (Time Stamp Counter): frequency of your CPU
 - HPET (High Precision Event Timer): frequency of at least 10 MHz
   (14.3 MHz on my computer)

Linux has an algorithm to choose the best clock depending on its
performance and accuracy. Most clocks have a frequency higher than 1 MHz,
and so a resolution smaller than 1 µs, even if the clock is not really
accurate. I suppose that you can plug in specialized hardware, like an
atomic clock or a GPS receiver, for better accuracy.

Victor
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
> If I run your snippet and inspect the modification times using `stat`,
> the difference is much smaller (around 10 ns, not 1 ms):
>
> $ stat test | \grep Modify
> Modify: 2012-02-16 13:51:25.643597139 +0100
> $ stat test2 | \grep Modify
> Modify: 2012-02-16 13:51:25.643597126 +0100

The loss of precision is not constant: it depends on the timestamp value.
Another example using the stat program:

    import decimal, os, shutil, time

    try:
        os.unlink("test")
    except OSError:
        pass
    try:
        os.unlink("test2")
    except OSError:
        pass

    open("test", "x").close()
    shutil.copy2("test", "test2")
    print(os.stat("test", timestamp=decimal.Decimal).st_mtime)
    print(os.stat("test2", timestamp=decimal.Decimal).st_mtime)
    print(os.stat("test2", timestamp=decimal.Decimal).st_mtime
          - os.stat("test", timestamp=decimal.Decimal).st_mtime)
    os.system("stat test|grep ^Mod")
    os.system("stat test2|grep ^Mod")

Outputs:

$ ./python x.py
1329398229.918858600
1329398229.918208829
-0.000649771
Modify: 2012-02-16 14:17:09.918858600 +0100
Modify: 2012-02-16 14:17:09.918208829 +0100
$ ./python x.py
1329398230.862858588
1329398230.861343658
-0.001514930
Modify: 2012-02-16 14:17:10.862858588 +0100
Modify: 2012-02-16 14:17:10.861343658 +0100
$ ./python x.py
1329398232.450858570
1329398232.450067044
-0.000791526
Modify: 2012-02-16 14:17:12.450858570 +0100
Modify: 2012-02-16 14:17:12.450067044 +0100
$ ./python x.py
1329398233.090858561
1329398233.090853761
-0.000004800
Modify: 2012-02-16 14:17:13.090858561 +0100
Modify: 2012-02-16 14:17:13.090853761 +0100

The loss of precision is between 1 ms and 4 µs. The Decimal timestamps
display exactly the same values as the stat program: I don't see any bug in
this example.

Victor

PS: Don't try os.utime(Decimal) with my patch: the conversion from Decimal
to _PyTime_t still uses float internally (I know about this issue; it
should be fixed in my patch) and so loses precision ;-)
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
On Thursday, 16 February 2012, at 14:20 +0100, Victor Stinner wrote:
>> If I run your snippet and inspect the modification times using `stat`,
>> the difference is much smaller (around 10 ns, not 1 ms):
>>
>> $ stat test | \grep Modify
>> Modify: 2012-02-16 13:51:25.643597139 +0100
>> $ stat test2 | \grep Modify
>> Modify: 2012-02-16 13:51:25.643597126 +0100
>
> The loss of precision is not constant: it depends on the timestamp value.

Well, I've tried several times and I can't reproduce a 1 ms difference.

> The loss of precision is between 1 ms and 4 µs.

It still looks fishy to me. IEEE doubles have a 52-bit mantissa. Since the
integral part of a timestamp takes 32 bits or less, there are still 20 bits
left for the fractional part, which allows for at least 1 µs precision
(2**20 ~= 10**6). A 1 ms precision loss looks like a bug.

Regards

Antoine.
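[One can check the actual spacing of doubles near a 2012 Unix timestamp
directly; a quick sketch:]

    import math

    # Distance to the next representable double above t (the "ulp"):
    # with t in [2**30, 2**31), this is 2**-22, i.e. about 0.24 µs -
    # well below 1 ms.
    t = 1329398400.0
    print(math.ldexp(1.0, math.frexp(t)[1] - 53))  # 2.384185791015625e-07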
Re: [Python-Dev] PEP 394 request for pronouncement (python2 symlink in *nix systems)
I'm away from the source for the next 36 hours. I'll reply with patches by
Saturday morning.

On Thu, 16 Feb 2012 11:01:39 +0100, "Martin v. Löwis" <mar...@v.loewis.de>
wrote:
> [... full quote of Martin's message, above, trimmed ...]

--
 Ned Deily
 n...@acm.org
Re: [Python-Dev] best place for an atomic file API
On 15.02.12 23:16, Charles-François Natali wrote:
> Issue #8604 aims at adding an atomic file API to make it easier to
> create/update files atomically, using rename() on POSIX systems and
> MoveFileEx() on Windows (which are now available through os.replace()).
> It would also use fsync() on POSIX to make sure data is committed to
> disk. For example, it could be used by importlib to avoid races when
> writing bytecode files (issues #13392, #13003, #13146), or more generally
> by any application that wants to make sure it ends up with a consistent
> file even in the face of a crash (e.g. it seems that mercurial
> implemented their own version).

What if the target file is a symlink?
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
On 02/15/2012 08:12 PM, Guido van Rossum wrote:
> On Wed, Feb 15, 2012 at 7:28 PM, Larry Hastings <la...@hastings.org> wrote:
>> I fixed this in trunk last September (issue 12904); os.utime now
>> preserves all the precision that Python currently conveys.
>
> So, essentially you fixed this particular issue without having to do
> anything as drastic as the proposed PEP...

I wouldn't say that. The underlying representation is still nanoseconds,
and Python only preserves roughly hundred-nanosecond precision. My patch
only ensures that reading and writing atime/mtime looks consistent to
Python programs using the os module. Any code that examined the
nanosecond-precise values from stat() -- written in Python or any other
language -- would notice that the values didn't match.

I'm definitely +1 on extending Python to represent nanosecond-precision
ctime/atime/mtime, but doing so in a way that permits seamlessly adding
more precision down the road, when the Linux kernel hackers get bored again
and add femtosecond resolution. (And then presumably attosecond resolution
four years later.) I haven't read PEP 410 yet, so I have no opinion on it.

I wrote a patch last year that adds new Decimal ctime/mtime/atime fields to
the output of stat, but it's a horrific performance regression (os.stat is
10x slower) and the reviewers were ambivalent, so I've let it rot.

Anyway, I now agree that we should improve the precision of datetime
objects and use those instead of Decimal. (But not timedeltas --
ctime/mtime/atime are absolute times, not deltas.)

/arry
Re: [Python-Dev] PEP 394 request for pronouncement (python2 symlink in *nix systems)
On Feb 16, 2012, at 09:54 PM, Nick Coghlan wrote:
> It turns out I'd forgotten what was in the PEP - the Notes section
> already contained a lot of suggestions along those lines. I changed the
> title of the section to "Migration Notes", but tried to make it clear
> that those *aren't* consensus recommendations, just ideas distros may
> want to think about when considering making the switch. The updated
> version is live on python.org:
> http://www.python.org/dev/peps/pep-0394/

That section looks great, Nick, thanks. I have one very minor quibble left.
In many places the PEP says something like:

    For the time being, it is recommended that python should refer to
    python2 (however, some distributions have already chosen otherwise;
    see the Rationale and Migration Notes below).

which implies that we may change our recommendation, but never quite says
what the mechanism is for us to do that. You could change the status of
this PEP from Draft to Active, which perhaps implies a little more strongly
that this PEP will be updated should our recommendation ever change. I
suspect it won't though (or at least won't any time soon). If you mark the
PEP as Final, we still have the option of updating the PEP some time later
to reflect new recommendations. It might be worth a quick sentence to that
effect in the PEP.

As I say though, this is a very minor quibble, so just DTRT. +1 and thanks
for your great work on it.

Cheers,
-Barry
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
On 14/02/2012 9.58, Stefan Behnel wrote:
> Nick Coghlan, 14.02.2012 05:44:
>> On Tue, Feb 14, 2012 at 2:25 PM, Eli Bendersky wrote:
>>> With the deprecation warning being silent, is there much to lose,
>>> though?
>>
>> Yes, it creates problems for anyone that deliberately converts all
>> warnings to errors when running their test suites. This forces them to
>> spend time switching over to a Python version dependent import of
>> either cElementTree or ElementTree that could have been spent doing
>> something actually productive instead of mere busywork.

If I'm writing code that imports cElementTree on 3.3+, and I explicitly
turn on DeprecationWarnings (that would otherwise be silenced) to check if
I'm doing something wrong, I would like Python to tell me "You don't need
to import that anymore, just use ElementTree." If I'm also converting all
the warnings to errors, it's probably because I really want my code to do
the right thing, and spending 1 minute to add/change two lines of code to
fix this probably won't bother me too much.

Regular users won't even notice the warning, unless they stumble upon the
note in the doc or enable the warnings (and eventually when the module is
removed).

>> And, of course, even people that *don't* convert warnings to errors
>> when running tests will have to make the same switch when the module is
>> eventually removed.

When the module is eventually removed and you didn't warn them in advance,
the situation is going to turn much worse, because their code will suddenly
stop working once they upgrade to the newer version. I don't mind keeping
the module and the warning around for a few versions and giving everyone
enough time to update their imports, but if the module is eventually
removed, I don't want all these developers to come and say "why did you
remove cElementTree without saying anything and break all my code?".

> I'm -1 on emitting a deprecation warning just because cElementTree is
> being replaced by a bare import. That's an implementation detail, just
> like cElementTree should have been an implementation detail in the first
> place. In all currently maintained CPython releases, importing
> cElementTree is the right thing to do for users. From 3.3 the right
> thing will be importing ElementTree, and at some point in the future
> that will be the only way to do it. These days, other Python
> implementations already provide the cElementTree module as a bare alias
> for ElementTree.py anyway, without emitting any warnings. Why should
> CPython be the only one that shouts at users for importing it?

I would look at this from the opposite point of view. Why should the other
Python implementations have to keep around a dummy module due to a CPython
implementation detail? If we all go through a deprecation process, we will
eventually be able to get rid of this.

Best Regards,
Ezio Melotti
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
I personally don't see any reason to drop a module that isn't terminally
broken or unmaintainable, apart from scaring users away by making them
think that we don't care about backward compatibility.
[Python-Dev] PEP for new dictionary implementation
PEP author Mark Shannon wrote (in
http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt):

> ... allows ... (the ``__dict__`` attribute of an object) to share keys
> with other attribute dictionaries of instances of the same class.

Is "the same class" a deliberate restriction, or just a convenience of
implementation? I have often created subclasses (or even families of
subclasses) where instances (as opposed to the type) aren't likely to have
additional attributes. These would benefit from key-sharing across classes,
but I grant that it is a minority use case that isn't worth optimizing for
if it complicates the implementation.

> By separating the keys (and hashes) from the values it is possible to
> share the keys between multiple dictionaries and improve memory use.

Have you timed not storing the hash (in the dict) at all, at least for
(unicode) str-only dicts? Going to the string for its own cached hash
breaks locality a bit more, but saves 1/3 of the memory for combined
tables, and may make a big difference for classes that have relatively few
instances.

> Reduction in memory use is directly related to the number of dictionaries
> with shared keys in existence at any time. These dictionaries are
> typically half the size of the current dictionary implementation.

How do you measure that? The limit for huge N across huge numbers of dicts
should be 1/3 (because both hashes and keys are shared); I assume that gets
swamped by object overhead in typical small dicts.

> If a table is split the values in the keys table are ignored, instead the
> values are held in a separate array.

If they're just dead weight, then why not use them to hold indices into the
array, so that values arrays only have to be as long as the number of keys,
rather than rounding them up to a large-enough power of two? (On average,
this should save half the slots.)

> A combined-table dictionary never becomes a split-table dictionary.

I thought it did (at least temporarily) as part of resizing; are you saying
that it will be re-split by the time another thread is allowed to see it,
so that it is never observed as combined?

Given that this optimization is limited to class instances, I think there
should be some explanation of why you didn't just automatically add slots
for each variable assigned (by hard-coded name) within a method; the keys
would still be stored on the type, and array storage could still be used
for the values; the __dict__ slot could initially be a NULL pointer, and
instance dicts could be added exactly when they were needed, covering only
the oddball keys.

I would reword (or at least reformat) the Cons section; at the moment, it
looks like there are four separate objections, and it seems to be a bit
dismissive towards backwards compatibility. Perhaps something like:

    While this PEP does not change any documented APIs or invariants, it
    does break some de facto invariants.

    C extension modules may be relying on the current physical layout of a
    dictionary. That said, extensions which rely on internals may already
    need to be recompiled with each feature release; there are already
    changes planned for both Unicode (for efficiency) and dicts (for
    security) that would require authors of these extensions to at least
    review their code.

    Because iteration (and repr) order can depend on the order in which
    keys are inserted, it will be possible to construct instances that
    iterate in a different order than they would under the current
    implementation. Note, however, that this will happen very rarely in
    code which does not deliberately trigger the differences, and that
    test cases which rely on a particular iteration order will already
    need to be corrected in order to take advantage of the security
    enhancements being discussed under hash randomization, or for use with
    Jython and PyPy.

-jJ
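[A rough way to observe the per-instance effect of key-sharing from Python
code is to watch the reported size of instance dicts; exact numbers vary by
version and build, so this is only a probe, not a measurement of the shared
keys table itself:]

    import sys

    class Point:
        def __init__(self, x, y):
            self.x = x
            self.y = y

    pts = [Point(i, i) for i in range(3)]
    # With key-sharing dicts, each instance __dict__ only needs a values
    # array; the keys and hashes live once, on the class's shared keys
    # table, so the per-instance size shrinks.
    print([sys.getsizeof(p.__dict__) for p in pts])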
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
On 16/02/2012 19.55, Antoine Pitrou wrote:
> On Thu, 16 Feb 2012 19:32:24 +0200, Ezio Melotti
> <ezio.melo...@gmail.com> wrote:
>> If I'm writing code that imports cElementTree on 3.3+, and I explicitly
>> turn on DeprecationWarnings (that would otherwise be silenced) to check
>> if I'm doing something wrong, I would like Python to tell me "You don't
>> need to import that anymore, just use ElementTree." If I'm also
>> converting all the warnings to errors, it's probably because I really
>> want my code to do the right thing, and spending 1 minute to add/change
>> two lines of code to fix this probably won't bother me too much.
>
> But then you're going from a cumbersome situation (where you have to
> import cElementTree and then fall back on regular ElementTree) to an
> even more cumbersome one (where you have to first check the Python
> version, then conditionally import cElementTree, then fall back on
> regular ElementTree).

This is true if you need to support Python <= 3.2, but in the long run this
won't be needed anymore and a plain "import ElementTree" will be enough.

>> When the module is eventually removed and you didn't warn them in
>> advance, the situation is going to turn much worse, because their code
>> will suddenly stop working once they upgrade to the newer version.
>
> Why would we remove the module? It seems supporting it should be mostly
> trivial (it's an alias).

I'm assuming that eventually the module will be removed (maybe for Python
4?), and I don't expect nor want to see it removed in the near future. If
something gets removed, it should be deprecated first, and it's usually
better to deprecate it sooner, so that developers have more time to update
their code. As I proposed on the tracker, though, we could even delay the
deprecation to 3.4 (by that time they might not need to support 3.2
anymore).

>> I would look at this from the opposite point of view. Why should the
>> other Python implementations have to keep around a dummy module due to
>> a CPython implementation detail?
>
> I don't know, but they already have this module, and it certainly costs
> them nothing to keep it.

There will also be a cost if people keep importing cElementTree and falling
back on ElementTree on failure even when this isn't necessary anymore. This
also means that more people will have to fix their code if/when the module
is removed, if they kept using cElementTree. They may also find
cElementTree in old code and tutorials, figure that it's better to use the
C one because it's faster, and keep doing so, because the only warning that
would stop them is hidden in the doc.

I think the problem with DeprecationWarnings being too noisy was fixed by
silencing them; if they are still too noisy, then we need a better
mechanism to warn people who care (and going to check the doc every once in
a while to see if some new doc warning has been added doesn't strike me as
a valid solution).

Best Regards,
Ezio Melotti
Re: [Python-Dev] folding cElementTree behind ElementTree in 3.3
On 17 February 2012 04:55, Antoine Pitrou <solip...@pitrou.net> wrote:
> But then you're going from a cumbersome situation (where you have to
> import cElementTree and then fall back on regular ElementTree) to an
> even more cumbersome one (where you have to first check the Python
> version, then conditionally import cElementTree, then fall back on
> regular ElementTree).

Well, you can reverse the import so you're not relying on version numbers:

    import xml.etree.ElementTree as ElementTree

    try:
        import xml.etree.cElementTree as ElementTree
    except ImportError:
        pass

There is a slight cost compared to previously (always importing the Python
version), and you'll still be using cElementTree directly until it's
removed, but if/when it is removed you won't notice it.

Tim Delaney
[Python-Dev] Store timestamps as decimal.Decimal objects
In http://mail.python.org/pipermail/python-dev/2012-February/116073.html
Nick Coghlan wrote:
> Besides, float128 is a bad example - such a type could just be returned
> directly where we return float64 now. (The only reason we can't do that
> with Decimal is because we deliberately don't allow implicit conversion
> of float values to Decimal values in binary operations.)

If we could really replace float with another type, then there is no reason
that type couldn't be a nearly trivial Decimal subclass which simply flips
the default value of the (never used by any caller) allow_float parameter
to the internal function _convert_other.

Since decimal inherits straight from object, this subtype could even be
made to inherit from float as well, and to store the lower-precision value
there. It could even produce the decimal version lazily, so as to minimize
the slowdown in cases that do not need the greater precision.

Of course, that still doesn't answer questions on whether the higher
precision is a good idea ...

-jJ
Re: [Python-Dev] PEP for new dictionary implementation
On Wed, 08 Feb 2012 19:18:14 +0000, Mark Shannon <m...@hotpy.org> wrote:
> Proposed PEP for new dictionary implementation, PEP 410?, is attached.

So, I'm running a few benchmarks using Twisted's test suite (see
https://bitbucket.org/pitrou/t3k/wiki/Home).

At the end of `python -i bin/trial twisted.internet.test`:
- vanilla 3.3: RSS = 94 MB
- new dict:    RSS = 91 MB

At the end of `python -i bin/trial twisted.python.test`:
- vanilla 3.3: RSS = 31.5 MB
- new dict:    RSS = 30 MB

At the end of `python -i bin/trial twisted.conch.test`:
- vanilla 3.3: RSS = 68 MB
- new dict:    RSS = 42 MB (!)

At the end of `python -i bin/trial twisted.trial.test`:
- vanilla 3.3: RSS = 32 MB
- new dict:    RSS = 30 MB

At the end of `python -i bin/trial twisted.test`:
- vanilla 3.3: RSS = 62 MB
- new dict:    RSS = 78 MB (!)

Runtimes were mostly similar in these test runs.

Perspective broker benchmark (doc/core/benchmarks/tpclient.py and
doc/core/benchmarks/tpserver.py):
- vanilla 3.3: 422 MB/sec
- new dict:    402 MB/sec

Regards

Antoine.
[Python-Dev] plugging the hash attack
In http://mail.python.org/pipermail/python-dev/2012-January/116003.html
Benjamin Peterson wrote:
> 2. It will be off by default in stable releases ... This will prevent
> code breakage ...

2012/1/27 Steven D'Aprano <steve at pearwood.info>:
> ... it will become on by default in some future release?

On Fri, Jan 27, 2012, Benjamin Peterson <benjamin at python.org> wrote:
> Yes, 3.3. The solution in 3.3 could even be one of the more sophisticated
> proposals we have today.

Brett Cannon (Mon Jan 30) wrote:
> I think that would be good. And I would even argue we remove support for
> turning it off to force people to no longer lean on dict ordering as a
> crutch (in 3.3 obviously).

Turning it on by default is fine. Removing the ability to turn it off is
bad. If regression tests fail with Python 3, the easiest thing to do is
just not to migrate to Python 3. Some decisions (certainly around unittest,
but I think even around hash codes) were settled precisely because tests
shouldn't break unless the functionality has really changed. Python 3 isn't
yet so dominant as to change that tradeoff.

I would go so far as to add an extra step to the porting recommendations:
before porting to Python 3.x, run your test suite several times with hash
randomization turned on; any failures at this point are relying on formally
undefined behavior and should be fixed -- but can *probably* be fixed just
by wrapping the results in sorted(). (I would offer a patch to the
porting-to-py3 recommendation, except that I couldn't find any not
associated specifically with 3.0.)

-jJ
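[The typical sorted() fix looks like this - a sketch of a test that relies
on dict iteration order, and its order-independent replacement:]

    d = {"a": 1, "b": 2, "c": 3}

    # Fragile: dict iteration order is not a documented guarantee, and
    # differs under hash randomization (and on Jython/PyPy):
    #     assert list(d) == ["a", "b", "c"]

    # Robust: compare against a canonical ordering instead.
    assert sorted(d) == ["a", "b", "c"]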
Re: [Python-Dev] PEP for new dictionary implementation
On 11.02.2012 22:22, Mark Shannon wrote:
> Antoine Pitrou wrote:
>> Hello Mark, I think the PEP should explain what happens when a keys
>> table needs resizing when setting an object's attribute.
>
> If the object is the only instance of a class, it remains split;
> otherwise the table is combined.

Hi Mark,

Answering on-list is fine, but please do add such answers to the PEP when
requested.

I have such a question also: why does it provide storage for the value slot
in the keys array, where this slot is actually not used?

Regards,
Martin
Re: [Python-Dev] PEP for new dictionary implementation
On 13.02.2012 13:46, Mark Shannon wrote:
> Revised PEP for new dictionary implementation, PEP 412?, is attached.

Committed as PEP 412.

Regards,
Martin
Re: [Python-Dev] PEP for new dictionary implementation
On 16.02.2012 19:24, Jim J. Jewett wrote:
> PEP author Mark Shannon wrote (in
> http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt):
>> ... allows ... (the ``__dict__`` attribute of an object) to share keys
>> with other attribute dictionaries of instances of the same class.
>
> Is "the same class" a deliberate restriction, or just a convenience of
> implementation?

It's about the implementation: the class keeps a pointer to the key set. A
subclass has a separate pointer for that.

> I have often created subclasses (or even families of subclasses) where
> instances (as opposed to the type) aren't likely to have additional
> attributes. These would benefit from key-sharing across classes, but I
> grant that it is a minority use case that isn't worth optimizing for if
> it complicates the implementation.

In particular, the potential savings are small: the instances of the
subclass will share the key sets per class. So if you have S subclasses,
you could save up to S keysets, whereas you are already saving N-S-1
keysets (assuming you have a total of N objects across all classes).

> Have you timed not storing the hash (in the dict) at all, at least for
> (unicode) str-only dicts? Going to the string for its own cached hash
> breaks locality a bit more, but saves 1/3 of the memory for combined
> tables, and may make a big difference for classes that have relatively
> few instances.

I'd be in favor of that, but it is actually an unrelated change: whether or
not you share key sets is unrelated to whether or not str-only dicts drop
the cached hash. Given a dict, it may be tricky to determine whether or not
it is str-only, i.e. what layout to use.

>> Reduction in memory use is directly related to the number of
>> dictionaries with shared keys in existence at any time. These
>> dictionaries are typically half the size of the current dictionary
>> implementation.
>
> How do you measure that? The limit for huge N across huge numbers of
> dicts should be 1/3 (because both hashes and keys are shared); I assume
> that gets swamped by object overhead in typical small dicts.

It's more difficult than that. He also drops the smalltable (which I think
is a good idea), so accounting for how this all plays together is tricky.

>> If a table is split the values in the keys table are ignored, instead
>> the values are held in a separate array.
>
> If they're just dead weight, then why not use them to hold indices into
> the array, so that values arrays only have to be as long as the number of
> keys, rather than rounding them up to a large-enough power of two? (On
> average, this should save half the slots.)

Good idea. However, how do you track per-dict how large the table is?

Regards,
Martin
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
>>> $ stat test | \grep Modify
>>> Modify: 2012-02-16 13:51:25.643597139 +0100
>>> $ stat test2 | \grep Modify
>>> Modify: 2012-02-16 13:51:25.643597126 +0100
>>
>> The loss of precision is not constant: it depends on the timestamp
>> value.
>
> Well, I've tried several times and I can't reproduce a 1 ms difference.
>
>> The loss of precision is between 1 ms and 4 µs.
>
> It still looks fishy to me. IEEE doubles have a 52-bit mantissa. Since
> the integral part of a timestamp takes 32 bits or less, there are still
> 20 bits left for the fractional part, which allows for at least 1 µs
> precision (2**20 ~= 10**6). A 1 ms precision loss looks like a bug.

Oh... there was an important bug in my function used to change the
denominator of a timestamp. I tried to work around integer overflow, but I
introduced a bug in doing so. I changed my patch to use PyLong, which has
no integer overflow issue. Fixed example:

>>> open("test", "x").close()
>>> import shutil
>>> shutil.copy2("test", "test2")
[94386 refs]
>>> print(os.stat("test", datetime.datetime).st_mtime)
2012-02-16 21:58:30.835062+00:00
>>> print(os.stat("test2", datetime.datetime).st_mtime)
2012-02-16 21:58:30.835062+00:00
>>> print(os.stat("test", decimal.Decimal).st_mtime)
1329429510.835061686
>>> print(os.stat("test2", decimal.Decimal).st_mtime)
1329429510.835061789
>>> os.stat("test2", decimal.Decimal).st_mtime - os.stat("test", decimal.Decimal).st_mtime
Decimal('1.03E-7')

So the difference is only 0.1 µs (100 ns). It doesn't change anything about
the Makefile issue: if timestamps differ by a single nanosecond, they are
seen as different by make (or by any other program comparing the timestamps
of two files at nanosecond precision).

Victor
[Python-Dev] Counting collisions for the win
In http://mail.python.org/pipermail/python-dev/2012-January/115715.html Frank Sievertsen wrote:

Am 20.01.2012 13:08, schrieb Victor Stinner: I'm surprised we haven't seen bug reports about it from users of 64-bit Pythons long ago. A Python dictionary only uses the lower bits of a hash value. If your dictionary has fewer than 2**32 items, the dictionary order is exactly the same on 32- and 64-bit systems: hash32(str) & mask == hash64(str) & mask for mask = 2**32-1.

No, that's not true. Whenever a collision happens, other bits are mixed in very fast. Frank

Bits are mixed in quickly from a denial-of-service standpoint, but Victor is correct from a "Why don't the tests already fail?" standpoint. A dict with 2**12 slots, holding over 2700 entries, will be far larger than most test cases -- particularly those with visible output. In a dict that size, 32-bit and 64-bit machines will still probe the same first, second, third, fourth, fifth, and sixth slots. Even in the rare cases when there are at least 6 collisions, the next slots may well be either the same, or close enough that it doesn't show up in a changed iteration order.

-jJ -- If there are still threading problems with my replies, please email me with details, so that I can try to resolve them. -jJ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
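Jim's "same first six slots" claim is easy to check against a simplified model of the collision-resolution loop in Objects/dictobject.c (the real code differs in details, e.g. it applies the mask only when indexing; the hash values below are arbitrary, chosen so the 64-bit one shares its low 32 bits with the 32-bit one):

    def probes(h, mask, n=6):
        # simplified model of CPython's probing: i = i*5 + perturb + 1,
        # with perturb starting as the full hash and losing 5 bits per step
        i = h & mask
        perturb = h
        out = [i]
        for _ in range(n - 1):
            i = (i * 5 + perturb + 1) & mask
            perturb >>= 5
            out.append(i)
        return out

    h32 = 0x9e3779b9                  # a hash as a 32-bit build sees it
    h64 = 0xdeadbeef9e3779b9          # same low 32 bits on a 64-bit build
    print(probes(h32, 2**12 - 1, 8))
    print(probes(h64, 2**12 - 1, 8))  # first six probes agree; with a
                                      # 2**12 mask the seventh diverges,
                                      # once perturb >> 25 exposes high bits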
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
On Thu, Feb 16, 2012 at 2:04 PM, Victor Stinner victor.stin...@gmail.com wrote: It doesn't change anything to the Makefile issue: if timestamps are different in a single nanosecond, they are seen as different by make (or by another program comparing the timestamps of two files using nanosecond precision). But make doesn't compare timestamps for equality -- it compares for newer. That shouldn't be so critical, since if there is an *actual* causal link between file A and B, the difference in timestamps should always be much larger than 100 ns. -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
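Guido's distinction in executable form -- a sketch of the decision a make-like tool performs (the helper name is invented; this is not make's actual source):

    import os

    def needs_rebuild(target, dependency):
        # make rebuilds when the target is missing or strictly older than a
        # dependency; it never asks whether the two mtimes are exactly equal.
        try:
            target_mtime = os.stat(target).st_mtime
        except OSError:
            return True
        return target_mtime < os.stat(dependency).st_mtime

Under this rule a copy whose timestamp came out 100 ns *older* than its source does look out of date, which is exactly the failure mode Victor describes in the next message.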
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
2012/2/16 Guido van Rossum gu...@python.org: On Thu, Feb 16, 2012 at 2:04 PM, Victor Stinner victor.stin...@gmail.com wrote: It doesn't change anything to the Makefile issue: if timestamps are different in a single nanosecond, they are seen as different by make (or by another program comparing the timestamps of two files using nanosecond precision). But make doesn't compare timestamps for equality -- it compares for newer. That shouldn't be so critical, since if there is an *actual* causal link between file A and B, the difference in timestamps should always be much larger than 100 ns.

The problem is that shutil.copy2() sometimes produces an *older* timestamp :-/ As shown in my previous email: in such a case, make will always rebuild the second file instead of building it only once. Example with two consecutive runs:

$ ./python diff.py
1329432426.650957952
1329432426.650958061
1.09E-7
$ ./python diff.py
1329432427.854957910
1329432427.854957819
-9.1E-8

Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
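Victor's diff.py is not shown in the thread; a plausible reconstruction, assuming the patched interpreter from the PEP 410 implementation (where os.stat() accepts a timestamp type argument, as in the interactive session quoted earlier), would be:

    # Hypothetical reconstruction -- stock Python's os.stat() takes no
    # timestamp type argument; this only runs on the PEP 410 patched build.
    import decimal, os, shutil

    open("test", "w").close()
    shutil.copy2("test", "test2")
    m1 = os.stat("test", decimal.Decimal).st_mtime
    m2 = os.stat("test2", decimal.Decimal).st_mtime
    print(m1)
    print(m2)
    print(m2 - m1)   # the sign flips from run to run, as the output shows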
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
On Thu, Feb 16, 2012 at 2:48 PM, Victor Stinner victor.stin...@gmail.com wrote: 2012/2/16 Guido van Rossum gu...@python.org: On Thu, Feb 16, 2012 at 2:04 PM, Victor Stinner victor.stin...@gmail.com wrote: It doesn't change anything to the Makefile issue: if timestamps are different in a single nanosecond, they are seen as different by make (or by another program comparing the timestamps of two files using nanosecond precision). But make doesn't compare timestamps for equality -- it compares for newer. That shouldn't be so critical, since if there is an *actual* causal link between file A and B, the difference in timestamps should always be much larger than 100 ns. The problem is that shutil.copy2() sometimes produces an *older* timestamp :-/ As shown in my previous email: in such a case, make will always rebuild the second file instead of building it only once. Example with two consecutive runs: $ ./python diff.py 1329432426.650957952 1329432426.650958061 1.09E-7 $ ./python diff.py 1329432427.854957910 1329432427.854957819 -9.1E-8

Have you been able to reproduce this with an actual Makefile? What's the scenario? I'm thinking of a Makefile like this:

a:
        cp /dev/null a

b: a
        cp a b

Now say a doesn't exist and we run make b. This will create a and then b. I can't believe that the difference between the mtimes of a and b is so small that if you copy the directory containing Makefile, a and b using a Python tool that reproduces mtimes only with usec accuracy you'll end up with a directory where a is newer than b. What am I missing? -- --Guido van Rossum (python.org/~guido) ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
The problem is that shutil.copy2() sometimes produces an *older* timestamp :-/ (...) Have you been able to reproduce this with an actual Makefile? What's the scenario?

Hum. I asked the Internet who uses shutil.copy2() and I found an old issue (Decimal('43462967.173053') seconds ago):

Python issue #10148: st_mtime differs after shutil.copy2 (October 2010). When copying a file with shutil.copy2() between two ext4 filesystems on 64-bit Linux, the mtime of the destination file is different after the copy. It appears as if the resolution is slightly different, so the mtime is truncated slightly. (...) I don't know if it is a theoretical or practical issue.

Then I found:

Python issue #11941: Support st_atim, st_mtim and st_ctim attributes in os.stat_result. They would expose relevant functionality from libc's stat() and provide better precision than floating-point-based st_atime, st_mtime and st_ctime attributes.

Which is connected to the issue that motivated me to write the PEP:

Python issue #11457: os.stat(): add new fields to get timestamps as Decimal objects with nanosecond resolution. Support for such precision is available at the least on 2.6 Linux kernels. This is important for example with the tarfile module with the pax tar format. The POSIX tar standard[3] mandates storing the mtime in the extended header (if it is not an integer) with as much precision as is available in the underlying file system, and likewise to restore this time properly upon extraction. Currently this is not possible. The mailbox module would benefit from having this precision available.

For the tarfile use case, we need at least a way to get the modification time with a nanosecond resolution *and* to set the modification time with a nanosecond resolution. We just need to decide which type is the best for this use case, which is the purpose of PEP 410 :-)

Another use case of nanosecond timestamps is profilers (and maybe benchmark tools). The profiler itself may be implemented in a different language than Python. For example, DTrace uses nanosecond timestamps.

-- Other examples.

Debian bug #627460: (gcp) Expose nanoseconds in python (15 May 2011) http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=627460

Debian bug #626787: (gcp) gcp: timestamp is not always copied exact http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=626787 When copying a (large) file from HDD to USB the file's timestamp is not copied exactly. It seems to work fine with smaller files (up to 1 Gig); I couldn't spot the time-diff on these files. (gcp is a grid-enabled version of the scp copy command.)

fuse-python supports nanosecond resolution: they chose to mimic the C API using:

    class Timespec(FuseStruct):
        """Cf. struct timespec in time.h:
        http://www.opengroup.org/onlinepubs/009695399/basedefs/time.h.html
        """
        def __init__(self, name=None, **kw):
            self.tv_sec = None
            self.tv_nsec = None
            kw['name'] = name
            FuseStruct.__init__(self, **kw)

Python issue #9079: Make gettimeofday available in time module ... exposes gettimeofday as time.gettimeofday() returning a (sec, usec) pair.

The Oracle database supports timestamps with a nanosecond resolution.

A related article about Ruby: http://marcricblog.blogspot.com/2010/04/who-cares-about-nanosecond.html Files are uploaded in groups (fifteen maximum). It was important to know the order in which files have been uploaded. Depending on the size of the files and users’ internet broadband capacity, some files could be uploaded in the same second.

And a last one, just for fun: This Week in Python Stupidity: os.stat, os.utime and Sub-Second Timestamps (November 15, 2009) http://ciaranm.wordpress.com/2009/11/15/this-week-in-python-stupidity-os-stat-os-utime-and-sub-second-timestamps/ Yup, that’s right, Python’s underlying type for floats is an IEEE 754 double, which is only good for about sixteen decimal digits. With ten digits before the decimal point, that leaves six for sub-second resolutions, which is three short of the range required to preserve POSIX nanosecond-resolution timestamps. With dates after the year 2300 or so, that leaves only five accurate digits, which isn’t even enough to deal with microseconds correctly. Brilliant. Python does have a half-assed fixed point type. Not sure why they don’t use it more.

Victor ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
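The blog post's arithmetic is easy to check directly (assuming IEEE 754 doubles, which CPython floats are on all mainstream platforms; the exact printed digits may vary slightly by build):

    t = 10413792000 + 0.123456789   # roughly a year-2300 POSIX timestamp
    print("%.9f" % t)               # e.g. 10413792000.123456955 -- the
                                    # trailing nanosecond digits are gone;
                                    # resolution here is about a microsecond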
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
So, make is unaffected. In my first post on this subject I already noted that the only real use case is making a directory or filesystem copy and then verifying that the copy is identical using native tools that compare times with nsec precision. At least one of the bugs you quote is about the current 1-second granularity, which is already addressed by using floats (up to ~usec precision). The fs copy use case should be pretty rare, and I would be okay with a separate lower-level API that uses a long to represent nanoseconds (though MvL doesn't like that either). Using (seconds, nsec) tuples is silly though. --Guido

On Thu, Feb 16, 2012 at 4:04 PM, Victor Stinner victor.stin...@gmail.com wrote: (...)
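The "long to represent nanoseconds" alternative Guido mentions is trivially lossless in Python, since ints are unbounded (a sketch only; the variable names are hypothetical, not an existing API):

    mtime_ns = 1329432426650957952        # one integer: no rounding anywhere
    sec, nsec = divmod(mtime_ns, 10**9)
    assert (sec, nsec) == (1329432426, 650957952)
    assert sec * 10**9 + nsec == mtime_ns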
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
On Wed, Feb 15, 2012 at 11:39 AM, Guido van Rossum gu...@python.org wrote: Maybe it's okay to wait a few years on this, until either 128-bit floats are more common or cDecimal becomes the default floating point type? +1 ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP for new dictionary implementation
On Thu, Feb 16, 2012 at 4:34 PM, Martin v. Löwis mar...@v.loewis.de wrote: Am 16.02.2012 19:24, schrieb Jim J. Jewett: PEP author Mark Shannon wrote (in http://mail.python.org/pipermail/python-dev/attachments/20120208/05be469a/attachment.txt): ... allows ... (the ``__dict__`` attribute of an object) to share keys with other attribute dictionaries of instances of the same class. Is "the same class" a deliberate restriction, or just a convenience of implementation? It's about the implementation: the class keeps a pointer to the key set. A subclass has a separate pointer for that.

I would prefer to see that reason in the PEP; after a few years, I have trouble finding email, even when I remember reading the conversation.

Have you timed not storing the hash (in the dict) at all, at least for (unicode) str-only dicts? Going to the string for its own cached hash breaks locality a bit more, but saves 1/3 of the memory for combined tables, and may make a big difference for classes that have relatively few instances. I'd be in favor of that, but it is actually an unrelated change: whether or not you share key sets is unrelated to whether or not str-only dicts drop the cached hash.

Except that the biggest arguments against it are that it breaks cache locality, and it changes the dictentry struct -- which this patch already does anyway.

Given a dict, it may be tricky to determine whether or not it is str-only, i.e. what layout to use.

Isn't that exactly the same determination needed when deciding whether or not to use lookdict_unicode? (It would make the switch to the more general lookdict more expensive, as that would involve a new allocation.)

Reduction in memory use is directly related to the number of dictionaries with shared keys in existence at any time. These dictionaries are typically half the size of the current dictionary implementation. How do you measure that? The limit for huge N across huge numbers of dicts should be 1/3 (because both hashes and keys are shared); I assume that gets swamped by object overhead in typical small dicts. It's more difficult than that. He also drops the smalltable (which I think is a good idea), so accounting for how this all plays together is tricky.

All the more reason to explain in the PEP how he measured or approximated it.

If a table is split the values in the keys table are ignored; instead the values are held in a separate array. If they're just dead weight, then why not use them to hold indices into the array, so that values arrays only have to be as long as the number of keys, rather than rounding them up to a large-enough power-of-two? (On average, this should save half the slots.) Good idea. However, how do you track per-dict how large the table is?

Why would you want to? The per-instance array needs to be at least as large as the highest index used by any key for which it has a value; if the keys table gets far larger (or even shrinks), that doesn't really matter to the instance. What does matter to the instance is getting a value of its own for a new (to it) key -- and then the keys table can tell it which index to use, which in turn tells it whether or not it needs to grow the array.

Or are you thinking of len(o.__dict__), which will indeed be a bit slower? That will happen with split dicts and potentially missing values, regardless of how much memory is set aside (or not) for the missing values.
-jJ ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
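A toy sketch of Jim's suggestion (invented names, not the PEP's design): the shared keys table maps key -> index, and each instance keeps a values list exactly as long as it needs. In Python the list's len() answers Martin's "how large is the array?" question; a C implementation would have to store that size in the dict or derive it from the keys table, which is exactly the bookkeeping the follow-up message asks about.

    class SharedKeys:
        def __init__(self):
            self.index = {}                    # key -> slot number

        def slot_for(self, key):
            # assign the next free slot the first time any instance uses key
            return self.index.setdefault(key, len(self.index))

    class SplitDict:
        def __init__(self, shared):
            self.shared = shared
            self.values = []                   # grows lazily, per instance

        def __setitem__(self, key, value):
            i = self.shared.slot_for(key)
            while len(self.values) <= i:       # grow only as far as needed
                self.values.append(None)
            self.values[i] = value

        def __getitem__(self, key):
            i = self.shared.index[key]         # KeyError if no class uses key
            if i >= len(self.values) or self.values[i] is None:
                raise KeyError(key)
            return self.values[i]

    # Note: None doubles as the "missing" sentinel here; a real
    # implementation would need a distinct sentinel, since None is a
    # perfectly legal attribute value.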
Re: [Python-Dev] [Python-checkins] cpython: Disabling a test that fails on some bots. Will investigate the failure soon
On Fri, Feb 17, 2012 at 2:09 AM, eli.bendersky python-check...@python.org wrote:

diff --git a/Lib/test/test_xml_etree_c.py b/Lib/test/test_xml_etree_c.py
--- a/Lib/test/test_xml_etree_c.py
+++ b/Lib/test/test_xml_etree_c.py
@@ -53,8 +53,8 @@
         # actual class. In the Python version it's a class.
         self.assertNotIsInstance(cET.Element, type)

-    def test_correct_import_cET_alias(self):
-        self.assertNotIsInstance(cET_alias.Element, type)
+    #def test_correct_import_cET_alias(self):
+    #    self.assertNotIsInstance(cET_alias.Element, type)

While this one was fixed quickly, *please* don't comment tests out without some kind of explanation in the code (not just in the checkin message). Even better is to use the expected_failure() decorator or the skip() decorator. Cheers, Nick. -- Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
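For reference, the pattern Nick is asking for looks like this (the issue number is a placeholder, and cET_alias is imported elsewhere in the real test module; the second test is purely illustrative):

    import unittest

    class CETAliasTests(unittest.TestCase):

        @unittest.skip("fails on some bots; see tracker issue NNNN")
        def test_correct_import_cET_alias(self):
            # the decorator keeps this from running, but the reason stays
            # visible both in the code and in the test report
            self.assertNotIsInstance(cET_alias.Element, type)

        @unittest.expectedFailure
        def test_known_bug(self):
            self.assertEqual(1, 2)   # recorded as an expected failure

    if __name__ == "__main__":
        unittest.main()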
Re: [Python-Dev] [Python-checkins] cpython: Disabling a test that fails on some bots. Will investigate the failure soon
On Fri, Feb 17, 2012 at 05:50, Nick Coghlan ncogh...@gmail.com wrote: On Fri, Feb 17, 2012 at 2:09 AM, eli.bendersky python-check...@python.org wrote:

diff --git a/Lib/test/test_xml_etree_c.py b/Lib/test/test_xml_etree_c.py
--- a/Lib/test/test_xml_etree_c.py
+++ b/Lib/test/test_xml_etree_c.py
@@ -53,8 +53,8 @@
         # actual class. In the Python version it's a class.
         self.assertNotIsInstance(cET.Element, type)

-    def test_correct_import_cET_alias(self):
-        self.assertNotIsInstance(cET_alias.Element, type)
+    #def test_correct_import_cET_alias(self):
+    #    self.assertNotIsInstance(cET_alias.Element, type)

While this one was fixed quickly, *please* don't comment tests out without some kind of explanation in the code (not just in the checkin message). Even better is to use the expected_failure() decorator or the skip() decorator.

I just saw this test failing in some bots and wanted to fix it ASAP, without spending time on a real investigation. The follow-up fix came less than 2 hours later. But yes, I agree that commenting out wasn't a good choice - I should've just deleted it while I was working on a fix. By the way, I later discussed the failing test with Florent and http://bugs.python.org/issue14035 is the result. That failure had made no sense until Florent got deeper into import_fresh_module. Eli ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP for new dictionary implementation
Good idea. However, how do you track per-dict how large the table is? Why would you want to? The per-instance array needs to be at least as large as the highest index used by any key for which it has a value; if the keys table gets far larger (or even shrinks), that doesn't really matter to the instance. What does matter to the instance is getting a value of its own for a new (to it) key -- and then the keys table can tell it which index to use, which in turn tells it whether or not it needs to grow the array. To determine whether it needs to grow the array, it needs to find out how large the array is, no? So: how do you do that? Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
[Python-Dev] PEP 394 accepted
As the PEP czar for PEP 394, I have reviewed it and am happy to say that I can accept it. I suppose that Nick will keep track of actually implementing it in Python 2.7. Regards, Martin ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 410 (Decimal timestamp): the implementation is ready for a review
Am 16.02.2012 11:14, schrieb Martin v. Löwis: Am 16.02.2012 10:51, schrieb Victor Stinner: 2012/2/16 Martin v. Löwis mar...@v.loewis.de: Maybe an alternative PEP could be written that supports the filesystem copying use case only, using some specialized ns APIs? I really think that all you need is st_{a,c,m}time_ns fields and os.utime_ns(). I'm -1 on that, because it will make people write complicated code. Python 3.3 *already has* APIs for nanosecond timestamps: os.utimensat(), os.futimens(), signal.sigtimedwait(), etc. These functions expect a (seconds: int, nanoseconds: int) tuple. I'm -1 on adding these APIs, also. Since Python 3.3 is not released yet, it's not too late to revert them. +1. Georg ___ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
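A sketch of why the tuple-based representation invites the "complicated code" Martin objects to: any arithmetic on a (seconds, nanoseconds) pair forces callers to handle the carry by hand (the helper below is hypothetical, not part of the stdlib):

    def add_ns(ts, delta_ns):
        # ts is a (seconds, nanoseconds) pair of ints, as expected by the
        # 3.3-era tuple APIs under discussion
        sec, nsec = ts
        return divmod(sec * 10**9 + nsec + delta_ns, 10**9)

    print(add_ns((1329429510, 999999900), 200))
    # (1329429511, 100) -- forget the carry and the result is nonsense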