[issue1583] Patch for signal.set_wakeup_fd

2019-04-10 Thread Adam Olsen


Adam Olsen  added the comment:

signalmodule.c has a hack to limit it to the main thread.  Otherwise there's 
all sorts of platform-specific behaviour.

--

Python tracker <https://bugs.python.org/issue1583>



[issue1583] Patch for signal.set_wakeup_fd

2019-04-10 Thread Adam Olsen


Adam Olsen  added the comment:

signal-safe is different from thread-safe (despite conceptual similarities), 
but regardless it's been a long time since I last delved into this so I'm quite 
rusty.  I could be doing it all wrong.

--

Python tracker <https://bugs.python.org/issue1583>



[issue1583] Patch for signal.set_wakeup_fd

2019-04-09 Thread Adam Olsen


Adam Olsen  added the comment:

Converting to/from sig_atomic_t could have a compile-time check on currently 
supported platforms and isn't buggy for them.  For platforms with a different 
size you could do a runtime check, only allowing an fd in the range of 0-254 
(with 255 reserved); that could sometimes fail, yes, but at least it's an 
explicit, easily understood failure.  Just using int would fail in undefined 
ways down the road, likely writing to a random fd instead (corrupting whatever 
it was doing), with no way to trace it back.

Unpacking the int would mean having one sig_atomic_t for 'invalid', using that 
instead of INVALID_FD, plus an array of sig_atomic_t for the fd itself.  Every 
time you want to change the fd you first set the 'invalid' flag, then the 
individual bytes, then clear 'invalid'.
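
To make that dance concrete, here's a rough Python rendering of the protocol
(the real thing would be C using sig_atomic_t; every name here is invented
for illustration):

import struct

INVALID = bytearray([1])   # stands in for the single 'invalid' sig_atomic_t
FD_BYTES = bytearray(4)    # stands in for the array of sig_atomic_t

def store_fd(fd):
    INVALID[0] = 1                        # 1. set the 'invalid' flag
    FD_BYTES[:] = struct.pack('=i', fd)   # 2. write the individual bytes
    INVALID[0] = 0                        # 3. clear 'invalid'

def load_fd_from_handler():
    if INVALID[0]:
        return None                       # treat as "no wakeup fd set"
    return struct.unpack('=i', bytes(FD_BYTES))[0]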

--

Python tracker <https://bugs.python.org/issue1583>



[issue1583] Patch for signal.set_wakeup_fd

2019-04-09 Thread Adam Olsen


Adam Olsen  added the comment:

Disagree; if you're writing signal-handling code you should be very careful to 
do it properly, even if that's only proper for your current platform.  If you 
can't do it properly you should find an alternative that doesn't involve 
signals.

The fact that sig_atomic_t is only 1 byte on VxWorks strongly implies using int 
WILL fail in strange ways on that platform.  I can see three options:

1) use pycore_atomic.h, implementing it for VxWorks if you haven't already.  
This also implies sig_atomic_t could have been int but wasn't for some reason, 
such as performance.
2) disable wakeup_fd entirely.  It's obscure, GNOME being the biggest user I 
can think of.
3) unpack the int into an array of sig_atomic_t.  Only the main thread writes 
to it so this method is ugly but viable.

--

Python tracker <https://bugs.python.org/issue1583>



[issue1583] Patch for signal.set_wakeup_fd

2019-04-09 Thread Adam Olsen


Adam Olsen  added the comment:

The fd field may be written from the main thread simultaneous with the signal 
handler activating and reading it out.  Back in 2007 the only POSIX-compliant 
type allowed for that was sig_atomic_t, anything else was undefined.

Looks like pycore_atomic.h should have alternatives now but I'm not at all 
familiar with it.
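
For context, the pattern under discussion, sketched with the public API on a
current POSIX build (Python 3.5+; when a signal arrives, the C machinery
writes a byte to the fd, waking any select() loop):

import os
import select
import signal

r, w = os.pipe()
os.set_blocking(w, False)   # the write from the handler must never block
signal.set_wakeup_fd(w)     # C-level machinery writes a byte to w on signal
signal.signal(signal.SIGUSR1, lambda signum, frame: None)

os.kill(os.getpid(), signal.SIGUSR1)
ready, _, _ = select.select([r], [], [], 1.0)
if ready:
    print('woken by signal, bytes:', os.read(r, 16))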

--

Python tracker <https://bugs.python.org/issue1583>



[issue10046] Correction to atexit documentation

2010-10-07 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Signals can directly kill a process.  Try SIGTERM to see this.  SIGINT is 
caught and handled by Python, which just happens to default to a graceful exit 
(unless stuck in a lib that prevents that.)  Try pasting your script into an 
interactive interpreter session and you'll see that it doesn't exit at all.

--

Python tracker http://bugs.python.org/issue10046



[issue1441] Cycles through ob_type aren't freed

2010-09-18 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

As far as I know.

--

Python tracker http://bugs.python.org/issue1441



[issue1736792] dict reentrant/threading request

2010-09-17 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

I don't believe there's anything to debate on this, so all it really needs is a 
patch, followed by getting someone to review and commit it.

--

Python tracker http://bugs.python.org/issue1736792



[issue6643] Throw away more radioactive locks that could be held across a fork in threading.py

2010-07-12 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

I don't have any direct opinions on this, as it is just a bandaid.  fork, as 
defined by POSIX, doesn't allow what we do with it, so we're reliant on a 
great deal of OS and library implementation details.  The only portable and 
robust solution would be to replace it with a unified fork-and-exec API that's 
implemented directly in C.

--

Python tracker http://bugs.python.org/issue6643



[issue9200] str.isprintable() is always False for large code points

2010-07-09 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

There should be a way to walk the unicode string in Python too.  Afaik there 
isn't.
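
In the meantime it can be approximated in pure Python; a sketch (this helper
is hypothetical, not a stdlib API) that walks a string as integer code
points, joining surrogate pairs where they appear:

def iter_code_points(s):
    # Yields integer code points, joining UTF-16 surrogate pairs; on a
    # wide build the pair branch simply never triggers for normal text.
    i, n = 0, len(s)
    while i < n:
        cp = ord(s[i])
        if (0xD800 <= cp <= 0xDBFF and i + 1 < n
                and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF):
            cp = 0x10000 + ((cp - 0xD800) << 10) + (ord(s[i + 1]) - 0xDC00)
            i += 2
        else:
            i += 1
        yield cp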

--
nosy: +Rhamphoryncus

Python tracker http://bugs.python.org/issue9200



[issue9198] Should repr() print unicode characters outside the BMP?

2010-07-08 Thread Adam Olsen

Changes by Adam Olsen rha...@gmail.com:


--
nosy: +Rhamphoryncus

Python tracker http://bugs.python.org/issue9198



[issue8188] Unified hash for numeric types.

2010-03-20 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Why aren't you using 64-bit hashes on 64-bit architectures?

--
nosy: +Rhamphoryncus

Python tracker http://bugs.python.org/issue8188



[issue8188] Unified hash for numeric types.

2010-03-20 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

I assume you mean 63. ;)

--

Python tracker http://bugs.python.org/issue8188



[issue7784] patch for making list/insert at the top of the list avoid memmoves

2010-01-26 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

$ ./python -m timeit -s 'from collections import deque; c = 
deque(range(1000000))' 'c.append(c.popleft())'
1000000 loops, best of 3: 0.29 usec per loop

$ ./python -m timeit -s 'c = range(1000000)' 'c.append(c.pop(0))'
1000000 loops, best of 3: 0.424 usec per loop

Using flox's issue7784_listobject_perf.diff.  Significantly slower, but it does 
scale linearly.


$ ./python -m timeit -s 'c = range(1000000)' 'c.insert(0, c.pop())'
100 loops, best of 3: 3.39 msec per loop

Unfortunately inserting does not.  Will future patches attempt to address this?

Note that if it ends up slower than list and slower than deque, there isn't 
really a use case for it.

--
nosy: +Rhamphoryncus

Python tracker http://bugs.python.org/issue7784



[issue1943] improved allocation of PyUnicode objects

2010-01-11 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

On Sun, Jan 10, 2010 at 14:59, Marc-Andre Lemburg
rep...@bugs.python.org wrote:
 BTW, I'm not aware of any changes to the PyUnicodeObject by some
 fastsearch implementation. Could you point me to this ?

/* We allocate one more byte to make sure the string is Ux0000 terminated.
   The overallocation is also used by fastsearch, which assumes that it's
   safe to look at str[length] (without making any assumptions about what
   it contains). */

--

Python tracker http://bugs.python.org/issue1943



[issue1943] improved allocation of PyUnicode objects

2010-01-10 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Points against the subclassing argument:

* We have a null-termination invariant.  For byte strings this was part of the 
public API, and I'm not sure that's changed for unicode strings; aren't you 
arguing that we should maximize how much of our implementation is a public API? 
 This prevents lazy slicing.

* UTF-16 and UTF-32 are rarely used encodings, especially for longer strings 
(ie files).  For shorter strings (APIs) the unicode object overhead is more 
significant and we'd need a way to slave the buffer's lifetime to that of 
the unicode object (hard to do).  For longer strings UTF-8 would be much more 
useful, but that's been shot down before.

* subclassing unicode so you can change the meaning of the fields (ie 
allocating your own buffer) is a gross hack.  It relies far too much on fine 
details of the implementation and is fragile (what if you miss the dummy byte 
needed by fastsearch?)  Most of the possible options could be, if they function 
correctly, applied directly to the basetype as a patch, so it's moot.

* If you dislike PyVarObject in general (I think the API is ugly too) you 
should argue for a general policy discouraging future use of it, not just get 
in the way of the one place where it's most appropriate.

Terry: PyVarObjects would be much easier to subclass if the type object stored 
an offset to the beginning of the variable section, so it could be 
automatically recalculated for subclasses based on the size of the struct.  
This'd mean the PyBytesObject struct would no longer end with a char 
ob_sval[1].  The down side is a tiny bit more math when accessing the variable 
section (as the offset is no longer constant).

--
nosy: +Rhamphoryncus

Python tracker http://bugs.python.org/issue1943



[issue1975] signals not always delivered to main thread, since other threads have the signal unmasked

2009-12-14 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

The real, OS signal does not get propagated to the main thread.  Only
the python-level signal handler runs from the main thread.

Correctly written programs are supposed to let select block
indefinitely.  This allows them to have exactly 0 CPU usage, especially
important on laptops and other limited power devices.

--

Python tracker http://bugs.python.org/issue1975



[issue1975] signals not always delivered to main thread, since other threads have the signal unmasked

2009-12-14 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

You forget that the original report is about ctrl-C.  Should we abandon
support of it for threaded programs?  Close as won't-fix?

We could also just block SIGINT, but why?  That means we don't support
python signal handlers in threaded programs (signals sent to the
process, not ones sent direct to threads), and IMO threads expecting a
specific signal should explicitly unblock it anyway.

--

Python tracker http://bugs.python.org/issue1975



[issue1975] signals not always delivered to main thread, since other threads have the signal unmasked

2009-12-14 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

A better solution would be to block all signals by default, then unblock
specific ones you expect.  This avoids races (as undeliverable signals
are simply deferred.)

Note that readline is not threadsafe anyway, so it doesn't necessarily
need to allow calls from the non-main thread.  Maybe somebody is using
it that way, dunno.
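
Python didn't expose the masking primitives at the time; 3.3 added
signal.pthread_sigmask.  A sketch of the block-before-spawn discipline with
it, assuming a POSIX build:

import signal
import threading
import time

# Block SIGINT before spawning so the new thread inherits the mask;
# there is no window where it could receive the signal.
old = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGINT})
t = threading.Thread(target=time.sleep, args=(5,))
t.start()
# Restore the mask: ctrl-C is now delivered to the main thread only.
signal.pthread_sigmask(signal.SIG_SETMASK, old)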

--

Python tracker http://bugs.python.org/issue1975



[issue3999] Real segmentation fault handler

2009-11-09 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

That's fine, but please provide a link to the new issue once you create it.

--

Python tracker http://bugs.python.org/issue3999



[issue1722344] Thread shutdown exception in Thread.notify()

2009-10-20 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Nope, no access.

--

Python tracker http://bugs.python.org/issue1722344



[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

On Mon, Oct 5, 2009 at 03:03, Marc-Andre Lemburg rep...@bugs.python.org wrote:
 We use UCS2 on narrow Python builds, not UTF-16.

 We might keep the old public API for compatibility, but it should be
 clearly marked as broken for non-BMP scalar values.

 That has always been the case. UCS2 doesn't support surrogates.

 However, we have been slowly moving into the direction of making
 the UCS2 storage appear like UTF-16 to the Python programmer.

 This process is not yet complete and will likely never complete
 since it must still be possible to create things like lone
 surrogates for processing purposes, so care has to be taken
 when using non-BMP code points on narrow builds.

Balderdash.  We expose UTF-16 code units, not UCS-2.  Guido has made
this quite clear.

UTF-16 was designed as an easy transition from UCS-2.  Indeed, if your
code only does searches or joins existing strings then it will Just
Work; declare it UTF-16 and you are done.  We have a lot more work to
do than that (as in this bug report), and we can't reasonably prevent
the user from splitting surrogate pairs via poor code, but a 95%
solution doesn't mean we suddenly revert all the way back to UCS-2.

If the intent really was to use UCS-2 then a correctly functioning
UTF-16 codec would join a surrogate pair into a single scalar value,
then raise an error because it's outside the range representable in
UCS-2.  That's not very helpful though; obviously, it's much better to
use UTF-16 internally.

The alternative (no matter what the configure flag is called) is
UTF-16, not UCS-2 though: there is support for surrogate pairs in
various places, including the \U escape and the UTF-8 codec.
http://mail.python.org/pipermail/python-dev/2008-July/080892.html

If you find places where the Python core or standard library is doing
Unicode processing that would break when surrogates are present you
should file a bug. However this does not mean that every bit of code
that slices a string at an arbitrary point (and hence risks slicing in
the middle of a surrogate) is incorrect -- it all depends on what is
done next with the slice.
http://mail.python.org/pipermail/python-dev/2008-July/080900.html

--

Python tracker http://bugs.python.org/issue5127



[issue5127] UnicodeEncodeError - I can't even see license

2009-10-05 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

On Mon, Oct 5, 2009 at 12:10, Marc-Andre Lemburg rep...@bugs.python.org wrote:
 All this is just nitpicking, really. UCS2 is a character set,
 UTF-16 an encoding.

UCS is a character set, for most purposes synonymous with the Unicode
character set.  UCS-2 and UTF-16 are both encodings of that character
set.  However, UCS-2 can only represent the BMP, while UTF-16 can
represent the full range.

 If we were to implement Unicode using UTF-16 as storage format,
 we would not be able to store single lone surrogates, since these
 are not allowed in UTF-16. Ditto for unassigned ordinals, invalid
 code points, etc.

No.  Internal usage may become temporarily ill-formed, but this is a
compromise, and acceptable so long as we never export them to other
systems.

Not that I wouldn't *prefer* a system that wouldn't store lone
surrogates, but.. pragmatics prevail.

 Note that I wrote the PEP and worked on the implementation at a time
 when Unicode 2.x was still in wide-spread use (mostly on Windows)
 and 3.0 was just being released:

        http://www.unicode.org/history/publicationdates.html

I think you hit the nail on the head there.  10 years ago, unicode
meant something different than it does today.  That's reflected in PEP
100 and in the code.  Now it's time to move on, switch to the modern
terminology, modern usage, and modern specs.

 But all that is off-topic for this ticket, so please let's just
 stop such discussions.

It needs to be discussed somewhere.  It's a distraction from fixing
the bug, but at least it's more private here.  Would you prefer email?

--

Python tracker http://bugs.python.org/issue5127



[issue5127] UnicodeEncodeError - I can't even see license

2009-10-04 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Surrogates aren't optional features of UTF-16, we really need to get
this fixed.  That includes .isalpha().

We might keep the old public API for compatibility, but it should be
clearly marked as broken for non-BMP scalar values.

I don't see a problem with changing 2.x.  The existing behaviour is
broken for non-BMP scalar values, so surely nobody can claim dependence
on it.

--
nosy: +Rhamphoryncus
type:  -> behavior

Python tracker http://bugs.python.org/issue5127



[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2009-10-04 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Patch, which uses UTF-32-BE as indicated in my last comment.  Test included.

--
keywords: +patch
Added file: http://bugs.python.org/file15043/py3k-nonBMP-literal.diff

Python tracker http://bugs.python.org/issue3297



[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2009-10-04 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

With some further prodding I've noticed that although the test behaves
as expected in the py3k branch (fails on UTF-32 builds before the
patch), it doesn't fail using python 3.0.  I'm guessing there's
interactions with compile() vs import and the issue 3672 fix.  Still
good enough though, IMO.

--

Python tracker http://bugs.python.org/issue3297



[issue7045] utf-8 encoding error

2009-10-03 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

I believe this is a duplicate of issue #3297.  When given a high unicode
scalar value directly in the source (rather than in escaped form) python
will split it into surrogates, even on a UTF-32 build where those
surrogates are nonsensical and ill-formed.

Patches for Issue #3672 probably made this more visible.

--
nosy: +Rhamphoryncus

Python tracker http://bugs.python.org/issue7045



[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2009-10-03 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Looks like the failure mode has changed here, presumably due to issue
#3672 patches.  It now always fails, even after loading from a .pyc. 
This is using py3k via bzr, which reports itself as 3.2a0

$ rm unicodetest.pyc 
$ ./python -c 'import unicodetest'
Result: False
Len: 2 1
Repr: '\ud800\udd23' '\U00010123'
[28877 refs]
$ ./python -c 'import unicodetest'
Result: False
Len: 2 1
Repr: '\ud800\udd23' '\U00010123'
[28708 refs]

--
versions: +Python 2.7, Python 3.1, Python 3.2

Python tracker http://bugs.python.org/issue3297



[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2009-10-03 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

I've traced down the biggest problem to decode_unicode in ast.c.  It
needs to convert everything into a form of escapes so it becomes pure
ascii, which then become evaluated back into a unicode object. 
Unfortunately, it uses UTF-16-BE to do so, which always split
surrogates.  Switching it to UTF-32-BE is fairly straightforward, and
works even on UTF-16 (or narrow) builds.
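
The difference is easy to see on a current wide build (bytes.hex() needs
3.5+); U+10123 is the same example character used elsewhere in this issue:

s = '\U00010123'
print(s.encode('utf-32-be').hex())   # 00010123: one code unit, one code point
print(s.encode('utf-16-be').hex())   # d800dd23: split into a surrogate pair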

Incidentally, there's no point using the surrogatepass error handler
once we actually support surrogates.

Unfortunately there's a second problem in repr(). 
'\U0001010F'.isprintable() returns True on UTF-32 builds and False on
UTF-16 builds.  This causes repr() to escape it unnecessarily on UTF-16
builds.  repr() at least joins surrogate pairs before its internally
printable test (unlike .isprintable() or any other str method), but it
turns out all of the APIs in unicodectype.c only accept a single 16-bit
int in UTF-16 builds anyway.  That'll be a bigger patch than the first part.

--

Python tracker http://bugs.python.org/issue3297



[issue992389] attribute error after non-from import

2009-08-31 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

The key distinction between this and a bad circular import is that
this is lazy.  You may list the import at the top of your module, but
you never touch it until after you've finished importing yourself (and
they feel the same about you.)

An ugly fix could be done today for module imports by creating a proxy
that triggers the import upon the first attribute access.  A more
general solution could be done with a lazyimport statement, triggered
when the target module finishes importing; only problem there is the
confusing error messages and other oddities if you reassign that name.
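
A sketch of that proxy idea (names invented; importlib.import_module is the
modern spelling of the import machinery):

import importlib

class LazyModule(object):
    # a proxy that triggers the real import on first attribute access
    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # only reached for names not found normally, so no recursion
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)

json = LazyModule('json')     # nothing imported yet
print(json.dumps([1, 2, 3]))  # first use triggers the import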

--
nosy: +Rhamphoryncus

Python tracker http://bugs.python.org/issue992389



[issue992389] attribute error after non-from import

2009-08-31 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

It'd probably be sufficient if we raised "NameError: lazy import 'foo'
not yet complete".  That should require a set of what names this module
is lazy importing, which is checked in the failure paths of module
attribute lookup and global/builtin lookup.

--

Python tracker http://bugs.python.org/issue992389



[issue6326] Add a swap method to list

2009-06-30 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Fix it at its source: patch your database engine to use the type you
want.  Or wrap the list without subclassing (__iter__ may be the only
method you need to wrap).

Obscure performance hacks don't warrant language extensions.

--
nosy: +Rhamphoryncus

Python tracker http://bugs.python.org/issue6326



Re: Adding a Par construct to Python?

2009-05-19 Thread Adam Olsen
On May 19, 5:05 am, jer...@martinfamily.freeserve.co.uk wrote:
 Thanks for explaining a few things to me. So it would seem that
 replacing the GIL with something which allows better scalability of
 multi-threaded applications, would be very complicated. The paper by
 Jesse Nolle which I referenced in my original posting includes the
 following:

 In 1999 Greg Stein created a patch set for the interpreter that
 removed the GIL, but added granular locking around sensitive
 interpreter operations. This patch set had the direct effect of
 speeding up threaded execution, but made single threaded execution two
 times slower.

 Source:http://jessenoller.com/2009/02/01/python-threads-and-the-global-inter...

 That was ten years ago - do you have any idea as to how things have
 been progressing in this area since then?

https://launchpad.net/python-safethread


Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 5:30 am, Tim Wintle tim.win...@teamrubber.com wrote:
 On Thu, 2009-04-16 at 21:44 -0700, Adam Olsen wrote:
  The Wayback Machine has 150 billion pages, so 2**37.  Google's index
  is a bit larger at over a trillion pages, so 2**40.  A little closer
  than I'd like, but that's still about 562949953421312 (2**49) to 1
  odds of having *any* collisions between *any* of the files.  Step up
  to SHA-256 and it becomes about 1.9e53 (2**177) to 1.  Sadly, I can't
  even give you the odds for SHA-512, Qalculate
  considers that too close to infinite to display. :)

 That might be true as long as your data is completely uniformly
 distributed. For the example you give there's:

 a) a high chance that there's html near the top

 b) a non-uniform distribution of individual words within the text.

 c) a non-unifom distribution of all n-grams within the text (as there is
 in natural language)

 So it's very far from uniformly distributed. Just about the only
 situation where I could imagine that holding would be where you are
 hashing uniformly random data for the sake of testing the hash.

 I believe the point being made is that comparing hash values is a
 probabilistic algorithm anyway, which is fine if you're ok with that,
 but for mission critical software it's crazy.

Actually, *cryptographic* hashes handle that just fine.  Even for
files with just a 1 bit change the output is totally different.  This
is known as the Avalanche Effect.  Otherwise they'd be vulnerable to
attacks.
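
A quick illustration with hashlib; the two inputs below differ in exactly one
bit, yet roughly half the output bits flip:

import hashlib

a = hashlib.sha256(b'x' * 1000).digest()
b = hashlib.sha256(b'x' * 999 + b'y').digest()   # input differs by one bit
flipped = sum(bin(i ^ j).count('1') for i, j in zip(a, b))
print(flipped, 'of', len(a) * 8, 'output bits differ')   # roughly half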

Which isn't to say you couldn't *construct* a pattern that it would be
vulnerable to.  Figuring that out is pretty much the whole point of
attacking a cryptographic hash.  MD5 has significant vulnerabilities
by now, and other will in the future.  That's just a risk you need to
manage.


Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 9:59 am, norseman norse...@hughes.net wrote:
 The more complicated the math the harder it is to keep a higher form of
 math from checking (or improperly displacing) a lower one.  Which, of
 course, breaks the rules.  Commonly called improper thinking. A number
 of math teasers make use of that.

Of course, designing a hash is hard.  That's why the *recommended*
ones get so many years of peer review and attempted attacks first.

I'd love it if Nigel provided evidence that MD5 was broken, I really
would.  It'd be quite interesting to investigate, assuming malicious
content can be ruled out.  Of course even he doesn't think that.  He
claims that his 42 trillion trillion to 1 odds happened not just once,
but multiple times.


Re: binary file compare...

2009-04-17 Thread Adam Olsen
On Apr 17, 9:59 am, SpreadTooThin bjobrie...@gmail.com wrote:
 You know this is just insane.  I'd be satisfied with a CRC16 or
 something in the situation i'm in.
 I have two large files, one local and one remote.  Transferring every
 byte across the internet to be sure that the two files are identical
 is just not feasible.  If two servers one on one side and the other on
 the other side both calculate the CRCs and transmit the CRCs for
 comparison I'm happy.

Definitely use a hash, ignore Nigel.  SHA-256 or SHA-512.  Or, if you
might need to update one of the files, look at rsync.  Rsync still
uses MD4 and MD5 (optionally!), but they're fine in a trusted
environment.
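
A minimal sketch of that setup: each side computes the digest locally and
only the digests cross the network.

import hashlib

def file_digest(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# run on both machines; if the hex digests match, the files match
# (up to the astronomically small collision odds discussed here)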


Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 15, 12:56 pm, Nigel Rantor wig...@wiggly.org wrote:
 Adam Olsen wrote:
  The chance of *accidentally* producing a collision, although
  technically possible, is so extraordinarily rare that it's completely
  overshadowed by the risk of a hardware or software failure producing
  an incorrect result.

 Not when you're using them to compare lots of files.

 Trust me. Been there, done that, got the t-shirt.

 Using hash functions to tell whether or not files are identical is an
 error waiting to happen.

 But please, do so if it makes you feel happy, you'll just eventually get
 an incorrect result and not know it.

Please tell us what hash you used and provide the two files that
collided.

If your hash is 256 bits, then you need around 2**128 files to produce
a collision.  This is known as a Birthday Attack.  I seriously doubt
you had that many files, which suggests something else went wrong.


Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote:
 Adam Olsen wrote:
  On Apr 15, 12:56 pm, Nigel Rantor wig...@wiggly.org wrote:
  Adam Olsen wrote:
  The chance of *accidentally* producing a collision, although
  technically possible, is so extraordinarily rare that it's completely
  overshadowed by the risk of a hardware or software failure producing
  an incorrect result.
  Not when you're using them to compare lots of files.

  Trust me. Been there, done that, got the t-shirt.

  Using hash functions to tell whether or not files are identical is an
  error waiting to happen.

  But please, do so if it makes you feel happy, you'll just eventually get
  an incorrect result and not know it.

  Please tell us what hash you used and provide the two files that
  collided.

 MD5

  If your hash is 256 bits, then you need around 2**128 files to produce
  a collision.  This is known as a Birthday Attack.  I seriously doubt
  you had that many files, which suggests something else went wrong.

 Okay, before I tell you about the empirical, real-world evidence I have
 could you please accept that hashes collide and that no matter how many
 samples you use the probability of finding two files that do collide is
 small but not zero.

I'm afraid you will need to back up your claims with real files.
Although MD5 is a smaller, older hash (128 bits, so you only need
2**64 files to find collisions), and it has substantial known
vulnerabilities, the scenario you suggest where you *accidentally*
find collisions (and you imply multiple collisions!) would be a rather
significant finding.

Please help us all by justifying your claim.

Mind you, since you use MD5 I wouldn't be surprised if your files were
maliciously produced.  As I said before, you need to consider
upgrading your hash every few years to avoid new attacks.


Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 8:59 am, Grant Edwards inva...@invalid wrote:
 On 2009-04-16, Adam Olsen rha...@gmail.com wrote:
  I'm afraid you will need to back up your claims with real files.
  Although MD5 is a smaller, older hash (128 bits, so you only need
  2**64 files to find collisions),

 You don't need quite that many to have a significant chance of
 a collision.  With only something on the order of 2**61
 files, you still have about a 1% chance of a collision.

Aye, 2**64 is more of the middle of the curve or so.  You can still go
either way.  What's important is the order of magnitude required.


 For a few million files (we'll say 4e6), the probability of a
 collision is so close to 0 that it can't be calculated using
 double-precision IEEE floats.

≈ 2.3509887e-26

Or about 4.2535296e25 to 1.

Or 42 trillion trillion to 1.


 Here's the Python function I'm using:

 from math import exp
 def bp(n, d):
     return 1.0 - exp(-n*(n-1.)/(2.*d))

 I haven't spent much time studying the numerical issues of the
 way that the exponent is calculated, so I'm not entirely
 confident in the results for small n values such that
 p(n) == 0.0.

Try using Qalculate.  I always resort to it for things like this.
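
Staying in Python works too if the arithmetic is done exactly; a sketch with
the stdlib fractions module, using the linearized n*(n-1)/(2d) form of bp()
above:

from fractions import Fraction

def collision_p(n, bits):
    # the linearized birthday bound n*(n-1)/2 / 2**bits, computed exactly
    return Fraction(n * (n - 1), 2) / 2 ** bits

p = collision_p(4 * 10**6, 128)   # a few million files, a 128-bit hash
print('%.4g' % p)                 # ~2.351e-26
print('%.4g to 1' % (1 / p))      # ~4.254e+25 to 1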


Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 11:15 am, SpreadTooThin bjobrie...@gmail.com wrote:
 And yes he is right CRCs hashing all have a probability of saying that
 the files are identical when in fact they are not.

Here's the bottom line.  It is either:

A) Several hundred years of mathematics and cryptography are wrong.
The birthday problem as described is incorrect, so a collision is far
more likely than 42 trillion trillion to 1.  You are simply the first
person to have noticed it.

B) Your software was buggy, or possibly the input was maliciously
produced.  Or, a really tiny chance that your particular files
contained a pattern that provoked bad behaviour from MD5.

Finding a specific limitation of the algorithm is one thing.  Claiming
that the math is fundamentally wrong is quite another.


Re: binary file compare...

2009-04-16 Thread Adam Olsen
On Apr 16, 4:27 pm, Rhodri James rho...@wildebst.demon.co.uk
wrote:
 On Thu, 16 Apr 2009 10:44:06 +0100, Adam Olsen rha...@gmail.com wrote:
  On Apr 16, 3:16 am, Nigel Rantor wig...@wiggly.org wrote:
  Okay, before I tell you about the empirical, real-world evidence I have
  could you please accept that hashes collide and that no matter how many
  samples you use the probability of finding two files that do collide is
  small but not zero.

  I'm afraid you will need to back up your claims with real files.

 So that would be a no then.  If the implementation of dicts in Python,
 say, were to assert as you are that the hashes aren't going to collide,
 then I'd have to walk away from it.  There's no point in using something
 that guarantees a non-zero chance of corrupting your data.

Python's hash is only 32 bits on a 32-bit box, so even 2**16 keys (or
65 thousand) will give you a decent chance of a collision.  In
contrast MD5 needs 2**64, and a *good* hash needs 2**128 (SHA-256) or
2**256 (SHA-512).  The two are at totally different extremes.

There is *always* a non-zero chance of corruption, due to software
bugs, hardware defects, or even operator error.  It is only in that
broader context that you can realize just how minuscule the risk is.

Can you explain to me why you justify great lengths of paranoia, when
the risk is so much lower?


 Why are you advocating a solution to the OP's problem that is more
 computationally expensive than a simple byte-by-byte comparison and
 doesn't guarantee to give the correct answer?

For single, one-off comparison I have no problem with a byte-by-byte
comparison.  There's a decent chance the files won't be in the OS's
cache anyway, so disk IO will be your bottleneck.

Only if you're doing multiple comparisons is a hash database
justified.  Even then, if you expect matching files to be fairly rare
I won't lose any sleep if you're paranoid and do a byte-by-byte
comparison anyway.  New vulnerabilities are found, and if you don't
update promptly there is a small (but significant) chance of a
malicious file leading to collision.

That's not my concern though.  What I'm responding to is Nigel
Rantor's grossly incorrect statements about probability.  The chance
of collision, in our life time, is *insignificant*.

The Wayback Machine has 150 billion pages, so 2**37.  Google's index
is a bit larger at over a trillion pages, so 2**40.  A little closer
than I'd like, but that's still about 562949953421312 (2**49) to 1 odds
of having *any* collisions between *any* of the files.  Step up to
SHA-256 and it becomes about 1.9e53 (2**177) to 1.  Sadly, I can't even
give you the odds for SHA-512, Qalculate
considers that too close to infinite to display. :)

You should worry more about your head spontaneously exploding than you
should about a hash collision on that scale.  To do otherwise is
irrational paranoia.


Re: binary file compare...

2009-04-15 Thread Adam Olsen
On Apr 15, 11:04 am, Nigel Rantor wig...@wiggly.org wrote:
 The fact that two md5 hashes are equal does not mean that the sources
 they were generated from are equal. To do that you must still perform a
 byte-by-byte comparison which is much less work for the processor than
 generating an md5 or sha hash.

 If you insist on using a hashing algorithm to determine the equivalence
 of two files you will eventually realise that it is a flawed plan
 because you will eventually find two files with different contents that
 nonetheless hash to the same value.

 The more files you test with the quicker you will find out this basic truth.

 This is not complex, it's a simple fact about how hashing algorithms work.

The only flaw in a cryptographic hash is the increasing number of
attacks that are found on it.  You need to pick a trusted one when you
start and consider replacing it every few years.

The chance of *accidentally* producing a collision, although
technically possible, is so extraordinarily rare that it's completely
overshadowed by the risk of a hardware or software failure producing
an incorrect result.


Re: binary file compare...

2009-04-14 Thread Adam Olsen
On Apr 13, 8:39 pm, Grant Edwards gra...@visi.com wrote:
 On 2009-04-13, Peter Otten __pete...@web.de wrote:

  But there's a cache. A change of file contents may go
  undetected as long as the file stats don't change:

 Good point.  You can fool it if you force the stats to their
 old values after you modify a file and you don't clear the
 cache.

The timestamps stored on the filesystem (for ext3 and most other
filesystems) are fairly coarse, so it's quite possible for a check/
update/check sequence to have the same timestamp at the beginning and
end.


Re: Returning different types based on input parameters

2009-04-09 Thread Adam Olsen
On Apr 8, 8:09 am, George Sakkis george.sak...@gmail.com wrote:
 On Apr 7, 3:18 pm, Adam Olsen rha...@gmail.com wrote:

  On Apr 6, 3:02 pm, George Sakkis george.sak...@gmail.com wrote:

   For example, it is common for a function f(x) to expect x to be simply
   iterable, without caring of its exact type. Is it ok though for f to
   return a list for some types/values of x, a tuple for others and a
   generator for everything else (assuming it's documented), or it should
   always return the most general (iterator in this example) ?

  For list/tuple/iterable the correlation with the argument's type is
  purely superficial, *because* they're so compatible.  Why should only
  tuples and lists get special behaviour?  Why shouldn't every other
  argument type return a list as well?

 That's easy; because the result might be infinite. In which case you
 may ask why shouldn't every argument type return an iterator then,
 and the reason is usually performance; if you already need to store
 the whole result sequence (e.g. sorted()), why return just an iterator
 to it and force the client to copy it to another list if he needs
 anything more than iterating once over it ?

You've got two different use cases here.  sorted() clearly cannot be
infinite, so it might as well always return a list.  Other functions
that can be infinite should always return an iterator.


  A counter example is python 3.0's str/bytes functions.  They're
  mutually incompatible and there's no default.

 As already mentioned, another example is filter() that tries to match
 the input sequence type and falls back to list if it fails.

That's fixed in 3.0.  It's always an iterator now.


   To take it further, what if f wants to return different types,
   differing even in a duck-type sense?
 
  At a minimum it's highly undesirable.  You lose a lot of readability/
  maintainability.  solve2/solve_ex is a little ugly, but that's less
  overall, so it's the better option.

 That's my feeling too, at least in a dynamic language. For a static
 language that allows overloading, that should be a smaller (or perhaps
 no) issue.

Standard practices may encourage it in a static language, but it's
still fairly confusing.  Personally, I consider python's switch to a
different operator for floor division (//) to be a major step forward
over C-like languages.


Re: Returning different types based on input parameters

2009-04-07 Thread Adam Olsen
On Apr 6, 3:02 pm, George Sakkis george.sak...@gmail.com wrote:
 For example, it is common for a function f(x) to expect x to be simply
 iterable, without caring of its exact type. Is it ok though for f to
 return a list for some types/values of x, a tuple for others and a
 generator for everything else (assuming it's documented), or it should
 always return the most general (iterator in this example) ?

For list/tuple/iterable the correlation with the argument's type is
purely superficial, *because* they're so compatible.  Why should only
tuples and lists get special behaviour?  Why shouldn't every other
argument type return a list as well?

A counter example is python 3.0's str/bytes functions.  They're
mutually incompatible and there's no default.


 To take it further, what if f wants to return different types,
 differing even in a duck-type sense? That's easier to illustrate in a
 API-extension scenario. Say that there is an existing function `solve
 (x)` that returns `Result` instances.  Later someone wants to extend f
 by allowing an extra optional parameter `foo`, making the signature
 `solve(x, foo=None)`. As long as the return value remains backward
 compatible, everything's fine. However, what if in the extended case,
 solve() has to return some *additional* information apart from
 `Result`, say the confidence that the result is correct ? In short,
 the extended API would be:

     def solve(x, foo=None):
         '''
         @rtype: `Result` if foo is None; (`Result`, confidence)
 otherwise.
         '''

 Strictly speaking, the extension is backwards compatible; previous
 code that used `solve(x)` will still get back `Result`s. The problem
 is that in new code you can't tell what `solve(x,y)` returns unless
 you know something about `y`. My question is, is this totally
 unacceptable and should better be replaced by a new function `solve2
 (x, foo=None)` that always returns (`Result`, confidence) tuples, or
 it might be a justifiable cost ? Any other API extension approaches
 that are applicable to such situations ?

At a minimum it's highly undesirable.  You lose a lot of readability/
maintainability.  solve2/solve_ex is a little ugly, but that's less
overall, so it's the better option.

If your tuple gets to 3 or more I'd start wondering if you should
return a single instance, with the return values as attributes.  If
Result is already such a thing I'd look even with a tuple of 2 to see
if that's appropriate.
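
A sketch of that last suggestion (all names invented): a small result object
keeps the return type stable while leaving room for extra fields later.

from collections import namedtuple

SolveResult = namedtuple('SolveResult', ['result', 'confidence'])

def solve(x, foo=None):
    result = x * 2                           # stand-in for the real work
    confidence = 1.0 if foo is None else 0.9
    return SolveResult(result, confidence)

r = solve(21)
print(r.result, r.confidence)   # new fields never disturb old callers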


[issue1683908] PEP 361 Warnings

2009-03-30 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Aye.  2.6 has come and gone, with most or all warnings applied using (I
believe) a different patch.  If any future work is needed it can get a
new ticket.

--
status: open -> closed

Python tracker http://bugs.python.org/issue1683908



[issue5564] os.symlink/os.link docs should say old/new, not src/dst

2009-03-25 Thread Adam Olsen

New submission from Adam Olsen rha...@gmail.com:

"destination" is ambiguous.  It means opposite things, depending on
whether it refers to the symlink creation operation or to the symlink
itself.

In contrast, "old" is clearly what existed before the operation, and
"new" is what the operation creates.  This terminology is already in use
by os.rename.

--
assignee: georg.brandl
components: Documentation
messages: 84171
nosy: Rhamphoryncus, georg.brandl
severity: normal
status: open
title: os.symlink/os.link docs should say old/new, not src/dst

Python tracker http://bugs.python.org/issue5564



Re: removing duplication from a huge list.

2009-03-03 Thread Adam Olsen
On Feb 27, 9:55 am, Falcolas garri...@gmail.com wrote:
 If order did matter, and the list itself couldn't be stored in memory,
 I would personally do some sort of hash of each item (or something as
 simple as first 5 bytes, last 5 bytes and length), keeping a reference
 to which item the hash belongs, sort and identify duplicates in the
 hash, and using the reference check to see if the actual items in
 question match as well.

 Pretty brutish and slow, but it's the first algorithm which comes to
 mind. Of course, I'm assuming that the list items are long enough to
 warrant using a hash and not the values themselves.

Might as well move all the duplication checking to sqlite.

Although it seems tempting to stick a layer in front, you will always
require either a full comparison or a full update, so there's no
potential for a fast path.
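
A sketch of what that looks like with the stdlib sqlite3 module; a UNIQUE
index makes the database do all the comparisons:

import sqlite3

conn = sqlite3.connect(':memory:')   # use a file for lists too big for RAM
conn.execute('CREATE TABLE items (value TEXT UNIQUE)')

def add_unique(values):
    for v in values:
        # the unique index silently drops duplicates
        conn.execute('INSERT OR IGNORE INTO items VALUES (?)', (v,))
    return [row[0] for row in conn.execute('SELECT value FROM items')]

print(add_unique(['a', 'b', 'a', 'c']))   # ['a', 'b', 'c']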


[issue1975] signals not always delivered to main thread, since other threads have the signal unmasked

2009-03-03 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

issue 960406 broke this as part of a fix for readline.  I believe that
was motivated by fixing ctrl-C in the main thread, but non-main threads
were thrown in as a "why not" measure.

msg 46078 is the mention of this.  You can go into readlingsigs7.patch
and search for SET_THREAD_SIGMASK.

Python tracker http://bugs.python.org/issue1975



[issue1975] signals not always delivered to main thread, since other threads have the signal unmasked

2009-02-27 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

The readline API just sucks.  It's not at all designed to be used
simultaneously from multiple threads, so we shouldn't even try.  Ban
using it in non-main threads, restore the blocking of signals, and go on
with our merry lives.

--
nosy: +Rhamphoryncus

Python tracker http://bugs.python.org/issue1975



[issue1975] signals not always delivered to main thread, since other threads have the signal unmasked

2009-02-27 Thread Adam Olsen

Changes by Adam Olsen rha...@gmail.com:


--
versions: +Python 2.6, Python 2.7, Python 3.0, Python 3.1

Python tracker http://bugs.python.org/issue1975



Re: more on unescaping escapes

2009-02-24 Thread Adam Olsen
On Feb 23, 7:18 pm, bvdp b...@mellowood.ca wrote:
 Gabriel Genellina wrote:
  On Mon, 23 Feb 2009 23:31:20 -0200, bvdp b...@mellowood.ca wrote:
  Gabriel Genellina wrote:
  On Mon, 23 Feb 2009 22:46:34 -0200, bvdp b...@mellowood.ca wrote:
  Chris Rebert wrote:
  On Mon, Feb 23, 2009 at 4:26 PM, bvdp b...@mellowood.ca wrote:

  [problem with Python and Windows paths using backslashes]
   Is there any particular reason you can't just internally use regular
  forward-slashes for the paths? [...]

  you are absolutely right! Just use '/' on both systems and be done
  with it. Of course I still need to use \x20 for spaces, but that is
  easy.
  Why is that? \x20 is exactly the same as " ". It's not like %20 in
  URLs, that becomes a space only after decoding.

  I need to use the \x20 because of my parser. I'm reading unquoted
  lines from a file. The file creater needs to use the form foo\x20bar
  without the quotes in the file so my parser can read it as a single
  token. Later, the string/token needs to be decoded with the \x20
  converted to a space.

  So, in my file foo bar (no quotes) is read as 2 tokens; foo\x20bar
  is one.

  So, it's not really a problem of what happens when you assign a string
  in the form foo bar, rather how to convert the \x20 in a string to a
  space. I think the \\ just complicates the entire issue.

  Just thinking, if you was reading the string from a file, why were you
  worried about \\ and \ in the first place? (Ok, you moved to use / so
  this is moot now).

 Just cruft introduced while I was trying to figure it all out. Having to
 figure the \\ and \x20 at same time with file and keyboard input just
 confused the entire issue :) Having the user set a line like
 c:\\Program\x20File ... works just fine. I'll suggest he use
 c:/program\x20files to make it bit simple for HIM, not my parser.
 Unfortunately, due to some bad design decisions on my part about 5 years
 ago I'm afraid I'm stuck with the \x20.

 Thanks.

You're confusing the python source with the actual contents of the
string.  We already do one pass at decoding, which is why \x20 is
quite literally no different from a space:

>>> '\x20'
' '

However, the interactive interpreter uses repr(x), so various
characters that are considered formatting, such as a tab, get
reescaped when printing:

>>> '\t'
'\t'
>>> len('\t')
1

It really is a tab that gets stored there, not the escape for one.

Finally, if you give python an unknown escape it passes it through,
leaving it as an escape.  Then, when the interactive interpreter uses
repr(x), it is the backslash itself that gets reescaped:

>>> '\P'
'\\P'
>>> len('\P')
2
>>> list('\P')
['\\', 'P']

What does this all mean?  If you want to test your parser with python
literals you need to escape them twice, like so:

>>> 'c:\\\\Program\\x20Files\\\\test'
'c:\\\\Program\\x20Files\\\\test'
>>> list('c:\\\\Program\\x20Files\\\\test')
['c', ':', '\\', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', '\\', 'x',
'2', '0', 'F', 'i', 'l', 'e', 's', '\\', '\\', 't', 'e', 's', 't']
>>> 'c:\\\\Program\\x20Files\\\\test'.decode('string-escape')
'c:\\Program Files\\test'
>>> list('c:\\\\Program\\x20Files\\\\test'.decode('string-escape'))
['c', ':', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', ' ', 'F', 'i',
'l', 'e', 's', '\\', 't', 'e', 's', 't']

However, there's an easier way: use raw strings, which prevent python
from unescaping anything:

>>> r'c:\\Program\x20Files\\test'
'c:\\\\Program\\x20Files\\\\test'
>>> list(r'c:\\Program\x20Files\\test')
['c', ':', '\\', '\\', 'P', 'r', 'o', 'g', 'r', 'a', 'm', '\\', 'x',
'2', '0', 'F', 'i', 'l', 'e', 's', '\\', '\\', 't', 'e', 's', 't']


Re: What encoding does u'...' syntax use?

2009-02-21 Thread Adam Olsen
On Feb 21, 10:48 am, a...@pythoncraft.com (Aahz) wrote:
 In article 499f397c.7030...@v.loewis.de,

 =?ISO-8859-15?Q?=22Martin_v=2E_L=F6wis=22?=  mar...@v.loewis.de wrote:
  Yes, I know that.  But every concrete representation of a unicode string
  has to have an encoding associated with it, including unicode strings
  produced by the Python parser when it parses the ascii string u'\xb5'

  My question is: what is that encoding?

 The internal representation is either UTF-16, or UTF-32; which one is
 a compile-time choice (i.e. when the Python interpreter is built).

 Wait, I thought it was UCS-2 or UCS-4?  Or am I misremembering the
 countless threads about the distinction between UTF and UCS?

Nope, that's partly mislabeling and partly a bug.  UCS-2/UCS-4 refer
to Unicode 1.1 and earlier, with no surrogates.  We target Unicode
5.1.

If you naively encode UCS-2 as UTF-8 you really end up with CESU-8.
You miss the step where you combine surrogate pairs (which only exist
in UTF-16) into a single supplementary character.  Lo and behold,
that's actually what current python does in some places.  It's not
pretty.
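
A modern Python can demonstrate the difference through the surrogatepass
error handler (U+1F600 is just an example character):

pair = '\ud83d\ude00'   # one character as two UTF-16 code units
print(pair.encode('utf-8', 'surrogatepass').hex())   # eda0bdedb880, CESU-8-style
print('\U0001F600'.encode('utf-8').hex())            # f09f9880, well-formed UTF-8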

See bugs #3297 and #3672.


[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-12 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Antoine, x ^= x >> 4 has a higher collision rate than just a rotate. 
However, it's still lower than a statistically random hash.

If you modify the benchmark to randomly discard 90% of its contents this
should give you random addresses, reflecting a long-running program.

Here's the results I got (I used shift, too lazy to rotate):
XOR, sequential:      20.174627065692999
XOR, random:          30.460708379770004
shift, sequential:    19.148091554626003
shift, random:        30.495631933229998
original, sequential: 23.73646926877
original, random:     33.53617715837

Not massive, but still worth fixing the hash.

Python tracker http://bugs.python.org/issue5186



[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

The alignment requirements (long double) make it impossible to have
anything in those bits.

Hypothetically, a custom allocator could lower the alignment
requirements to sizeof(void *).  However, rotating to the high bits is
pointless as they're the least likely to be used — impossible in this
case, as only the 2 highest bits would contain anything, and for that
you'd need a dictionary with at least 2 billion entries on 32bit, which
is more than the 32bit address space.  64-bit is similar.

Note that mixing the bits back in, via XOR or similar, is actually more
likely to hurt than help.  It's just like ints and strings, whose hash
values are very sequential; a simple shift tends to get us sequential
hashes.  This gives us a far lower collision rate than a statistically
random hash.

Python tracker http://bugs.python.org/issue5186



[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Antoine, I only meant list() and dict() to be an example of objects with
a larger allocation pattern.  We get a substantial benefit from the
sequentially increasing memory addresses, and I wanted to make sure that
benefit wasn't lost on larger allocations than object().

Mark, I concede the point about rotating; I believe the cost on x86 is
the same regardless.

Why are you still only rotating 3 bits?  My results were better with 4
bits, and that should be the sweet spot for the typical use cases.

Also, would the use of size_t make this code simpler?  It should be the
size of the pointer even on windows.
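
For concreteness, the rotate I have in mind, sketched in Python (width
would be 8 * sizeof(void *) in the real C code):

    def rot_right(x, n, width=32):
        mask = (1 << width) - 1
        return (x >> n) | ((x << (width - n)) & mask)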




[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

 At four bits, you may be throwing away information and I don't think
 that's cool.  Even if some selected timings are better with more bits
 shifted, all you're really showing is that there is more randomness in
 the upper bits than the lower ones.  But that doesn't mean that the
 lower ones contribute nothing at all.

On the contrary, the expected collision rate for a half-full dictionary
is about 21%, whereas I'm getting less than 5%.  I'm taking advantage of
the sequentiality of addresses, just as int and str hashes do for their
values.
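
(That 21% is the usual balls-in-bins estimate: n random keys thrown
into 2n slots occupy about 2n * (1 - e**-0.5) of them, so the fraction
of keys that collide is 1 - 2*(1 - e**-0.5), roughly 0.213.)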

However, you're right that it's only one use case.  Although creating a
burst of objects for a throw-away set may itself be common, it's
typically with int or str, and doing it with custom objects is
presumably fairly rare; certainly not a good microbenchmark for the rest
of the interpreter.

Creating a list of 10 objects, then shuffling and picking a few
increases my collision rate back up to 21%.  That should more accurately
reflect a long-running program using custom objects as keys in a dict.

That said, I still prefer the simplicity of a rotate.  Adding an
arbitrary set of OR, XOR, or add makes me uneasy; I know enough to do
them wrong (reduce entropy), but not enough to do them right.




[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-11 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Testing with a large set of ids is a good demonstration, but not proof.
 Forming a set of *all* possible values within a certain range is proof.

However, XOR does work (OR definitely does not) — it's a 1-to-1
transformation (reversible as you say.)

Additionally, it still gives the unnaturally low collision rate when
using sequential addresses, so there's no objection there.
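
A toy demonstration of that, covering *all* values at 8-bit width:

    >>> len(set(x ^ (x >> 4) for x in range(256)))
    256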




[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-10 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

On my 64-bit linux box there's nothing in the last 4 bits:

>>> [id(o)%16 for o in [object() for i in range(128)]]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0]

And with a bit more complicated functions I can determine how much shift
gives us the lowest collision rate:

def a(size, shift):
    return len(set((id(o) >> shift) % (size * 2)
                   for o in [object() for i in range(size)]))

def b(size):
    return [a(size, shift) for shift in range(11)]

def c():
    for i in range(1, 9):
        size = 2**i
        x = ', '.join('% 3s' % count for count in b(size))
        print('% 3s: %s' % (size, x))

>>> c()
  2:   1,   1,   1,   2,   2,   1,   1,   1,   2,   2,   2
  4:   1,   1,   2,   3,   4,   3,   2,   4,   4,   3,   2
  8:   1,   2,   4,   6,   6,   7,   8,   6,   4,   3,   2
 16:   2,   4,   7,   9,  12,  13,  12,   8,   5,   3,   2
 32:   4,   8,  14,  23,  30,  25,  19,  12,   7,   4,   2
 64:   8,  16,  32,  55,  64,  38,  22,  13,   8,   4,   2
128:  16,  32,  64, 114, 128,  71,  39,  22,  12,   6,   3
256:  32,  64, 128, 242, 242, 123,  71,  38,  20,  10,   5

The fifth column (ie 4 bits of shift, a divide by 16) works the best. 
Although it varies from run to run, probably more than half the results
in that column have no collisions at all.

.. although, if I replace object() with list() I get best results with a
shift of 6 bits.  Replacing it with dict() is best with 8 bits.

We may want something more complicated.

--
nosy: +Rhamphoryncus




[issue5186] Reduce hash collisions for objects with no __hash__ method

2009-02-10 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

Upon further inspection, although a shift of 4 (on a 64-bit linux box)
isn't perfect for dict, it's fairly close to it and well beyond random
hash values.  Mixing things more is just gonna lower it towards random
values.

>>> c()
  2:   1,   1,   1,   2,   2,   1,   1,   1,   1,   1,   2
  4:   1,   1,   2,   3,   4,   3,   3,   2,   2,   2,   3
  8:   1,   2,   4,   7,   8,   7,   5,   6,   7,   5,   5
 16:   2,   4,   7,  11,  16,  15,  12,  14,  15,   9,   7
 32:   3,   5,  10,  18,  31,  30,  30,  30,  31,  20,  12
 64:   8,  14,  23,  36,  47,  54,  59,  59,  61,  37,  21
128:  16,  32,  58,  83, 118, 100, 110, 114, 126,  73,  41
256:  32,  64, 128, 195, 227, 197, 203, 240, 253, 150,  78




[issue3959] Add Google's ipaddr.py to the stdlib

2009-01-05 Thread Adam Olsen

Changes by Adam Olsen rha...@gmail.com:


--
nosy: +Rhamphoryncus




[issue4074] Building a list of tuples has non-linear performance

2008-12-14 Thread Adam Olsen

Adam Olsen rha...@gmail.com added the comment:

I didn't test it, but the patch looks okay to me.

--
nosy: +Rhamphoryncus




[issue3999] Real segmentation fault handler

2008-12-10 Thread Adam Olsen

Changes by Adam Olsen [EMAIL PROTECTED]:


--
nosy: +Rhamphoryncus




[issue1215] Python hang when catching a segfault

2008-12-05 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

I'm in favour of just the doc change now.  It's less work and we don't
really need to disable that usage.




[issue4006] os.getenv silently discards env variables with non-UTF-8 values

2008-12-04 Thread Adam Olsen

Changes by Adam Olsen [EMAIL PROTECTED]:


--
nosy: +Rhamphoryncus




Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Adam Olsen
On Fri, Oct 24, 2008 at 4:48 PM, Glenn Linderman [EMAIL PROTECTED] wrote:
 On approximately 10/24/2008 2:15 PM, came the following characters from the
 keyboard of Rhamphoryncus:

 On Oct 24, 2:59 pm, Glenn Linderman [EMAIL PROTECTED] wrote:


 On approximately 10/24/2008 1:09 PM, came the following characters from
 the keyboard of Rhamphoryncus:


 PyE: objects are reclassified as shareable or non-shareable, many
 types are now only allowed to be shareable.  A module and its classes
 become shareable with the use of a __future__ import, and their
 shareddict uses a read-write lock for scalability.  Most other
 shareable objects are immutable.  Each thread is run in its own
 private monitor, and thus protected from the normal threading memory
 model nasties.  Alas, this gives you all the semantics, but you still
 need scalable garbage collection.. and CPython's refcounting needs the
 GIL.


 Hmm.  So I think your PyE is an instance of an attempt to be more
 explicit about what I said above in PyC: PyC threads do not share data
 between threads except by explicit interfaces.  I consider your
 definitions of shared data types somewhat orthogonal to the types of
 threads, in that both PyA and PyC threads could use these new shared
 data items.


 Unlike PyC, there's a *lot* shared by default (classes, modules,
 function), but it requires only minimal recoding.  It's as close to
 have your cake and eat it too as you're gonna get.


 Yes, but I like my cake frosted with performance; Guido's non-acceptance of
 granular locks in the blog entry someone referenced was due to the slowdown
 acquired with granular locking and shared objects.  Your PyE model, with
 highly granular sharing, will likely suffer the same fate.

No, my approach includes scalable performance.  Typical paths will
involve *no* contention (ie no locking).  Classes and modules use
shareddict, which is based on a read-write lock built into the
interpreter, so it's uncontended for read-only usage patterns.  Pretty
much everything else is immutable.

Of course that doesn't include the cost of garbage collection.
CPython's refcounting can't scale.


 The independent threads model, with only slight locking for a few explicitly
 shared objects, has a much better chance of getting better performance
 overall.  With one thread running, it would be the same as today; with
 multiple threads, it should scale at the same rate as the system... minus
 any locking done at the higher level.

So use processes with a little IPC for these expensive-yet-shared
objects.  multiprocessing does it already.


 I think/hope that you meant that many types are now only allowed to be
 non-shareable?  At least, I think that should be the default; they
 should be within the context of a single, independent interpreter
 instance, so other interpreters don't even know they exist, much less
 how to share them.  If so, then I understand most of the rest of your
 paragraph, and it could be a way of providing shared objects, perhaps.


 There aren't multiple interpreters under my model.  You only need
 one.  Instead, you create a monitor, and run a thread on it.  A list
 is not shareable, so it can only be used within the monitor it's
 created within, but the list type object is shareable.


 The python interpreter code should be sharable, having been written in C,
 and being/becoming reentrant.  So in that sense, there is only one
 interpreter.  Similarly, any other reentrant C extensions would be that way.
  On the other hand, each thread of execution requires its own interpreter
 context, so that would have to be independent for the threads to be
 independent.  It is the combination of code+context that I call an
 interpreter, and there would be one per thread for PyC threads.  Bytecode
 for loaded modules could potentially be shared, if it is also immutable.
  However, that could be in my mental phase 2, as it would require an extra
 level of complexity in the interpreter as it creates shared bytecode...
 there would be a memory savings from avoiding multiple copies of shared
 bytecode, likely, and maybe also a compilation performance savings.  So it
 sounds like a win, but it is a win that can deferred for initial simplicity,
 to prove the concept is or is not workable.

 A monitor allows a single thread to run at a time; that is the same
 situation as the present GIL.  I guess I don't fully understand your model.

To use your terminology, each monitor is a context.  Each thread
operates in a different monitor.  As you say, most C functions are
already thread-safe (reentrant).  All I need to do is avoid letting
multiple threads modify a single mutable object (such as a list) at a
time, which I do by containing it within a single monitor (context).
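
A toy sketch of the confinement idea in today's Python (hypothetical
names; a real monitor would be an interpreter feature, not a library
class, and wouldn't pay for a lock on uncontended paths):

    import threading

    class Monitor(object):
        def __init__(self, state):
            self._lock = threading.Lock()
            self._state = state          # mutable; never escapes

        def enter(self, func, *args):
            # one thread inside at a time, so func may freely
            # mutate the confined state
            with self._lock:
                return func(self._state, *args)

    shared = Monitor([])
    shared.enter(lambda lst, x: lst.append(x), 42)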


-- 
Adam Olsen, aka Rhamphoryncus


Re: 2.6, 3.0, and truly independent intepreters

2008-10-24 Thread Adam Olsen
On Fri, Oct 24, 2008 at 5:38 PM, Glenn Linderman [EMAIL PROTECTED] wrote:
 On approximately 10/24/2008 2:16 PM, came the following characters from the
 keyboard of Rhamphoryncus:

 On Oct 24, 3:02 pm, Glenn Linderman [EMAIL PROTECTED] wrote:


 On approximately 10/23/2008 2:24 PM, came the following characters from
 the
 keyboard of Rhamphoryncus:


 On Oct 23, 11:30 am, Glenn Linderman [EMAIL PROTECTED] wrote:



 On approximately 10/23/2008 12:24 AM, came the following characters
 from
 the keyboard of Christian Heimes


 Andy wrote:
 I'm very - not absolute, but very - sure that Guido and the initial
 designers of Python would have added the GIL anyway. The GIL makes
 Python faster on single core machines and more stable on multi core
 machines.


 Actually, the GIL doesn't make Python faster; it is a design decision
 that
 reduces the overhead of lock acquisition, while still allowing use of
 global
 variables.

 Using finer-grained locks has higher run-time cost; eliminating the use
 of
 global variables has a higher programmer-time cost, but would actually
 run
 faster and more concurrently than using a GIL. Especially on a
 multi-core/multi-CPU machine.


 Those globals include classes, modules, and functions.  You can't
 have *any* objects shared.  Your interpreters are entirely isolated,
 much like processes (and we all start wondering why you don't use
 processes in the first place.)


 Indeed; isolated, independent interpreters are one of the goals.  It is,
 indeed, much like processes, but in a single address space.  It allows the
 master process (Python or C for the embedded case) to be coded using memory
 references and copies and pointer swaps instead of using semaphores, and
 potentially multi-megabyte message transfers.

 It is not clear to me that with the use of shared memory between processes,
 that the application couldn't use processes, and achieve many of the same
 goals.  On the other hand, the code to create and manipulate processes and
 shared memory blocks is harder to write and has more overhead than the code
 to create and manipulate threads, which can, when told, access any memory
 block in the process.  This allows the shared memory to be resized more
 easily, or more blocks of shared memory created more easily.  On the other
 hand, the creation of shared memory blocks shouldn't be a high-use operation
 in a program that has sufficient number crunching to do to be able to
 consume multiple cores/CPUs.

 Or use safethread.  It imposes safe semantics on shared objects, so
 you can keep your global classes, modules, and functions.  Still need
 garbage collection though, and on CPython that means refcounting and
 the GIL.


 Sounds like safethread has 35-40% overhead.  Sounds like too much, to me.

The specific implementation of safethread, which attempts to remove
the GIL from CPython, has significant overhead and had very limited
success at being scalable.

The monitor design proposed by safethread has no inherent overhead and
is completely scalable.


-- 
Adam Olsen, aka Rhamphoryncus


[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2008-09-02 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Marc, I don't understand what you're saying.  UTF-16's surrogates are
not optional.  Unicode 2.0 and later require them, and Python is
supposed to support it.

Likewise, UCS-4 originally allowed a much larger range of code points,
but it no longer does; allowing them would mean supporting only old,
archaic versions of the standards (which is clearly not desirable.)

You are right in that I shouldn't have said "a pair of ill-formed code
units".  I should have said "a pair of unassigned code points", which is
how UCS-2 always has and always will classify them.

Although python may allow ill-formed sequences to be created internally
(primarily lone surrogates on UTF-16 builds), it cannot encode or decode
them.  The standard is clear that these are to be treated as errors,
which the .decode()'s errors argument controls.  You could add a new
value for errors to pass-through the garbage, but I fail to see a use
case for it.




[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2008-09-02 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

I've got another report open about the codecs not properly reporting
errors relating to surrogates: issue 3672




[issue3672] Ill-formed surrogates not treated as errors during encoding/decoding

2008-08-24 Thread Adam Olsen

New submission from Adam Olsen [EMAIL PROTECTED]:

The Unicode FAQ makes it quite clear that any surrogates in UTF-8 or
UTF-32 should be treated as errors.  Lone surrogates in UTF-16 should
probably be treated as errors too (but only during encoding/decoding;
unicode objects on UTF-16 builds should allow them to be created through
slicing).

http://unicode.org/faq/utf_bom.html#30
http://unicode.org/faq/utf_bom.html#42
http://unicode.org/faq/utf_bom.html#40

Lone surrogate in UTF-8 (effectively CESU-8):
>>> '\xED\xA0\x81'.decode('utf-8')
u'\ud801'

Surrogate pair in UTF-8:
>>> '\xED\xA0\x81\xED\xB0\x80'.decode('utf-8')
u'\ud801\udc00'

On a UTF-32 build, encoding a surrogate pair with UTF-16, then decoding
again will produce the proper non-surrogate scalar value.  This has
security implications, though rarely triggered, as characters outside the BMP are rare:
>>> u'\ud801\udc00'.encode('utf-16').decode('utf-16')
u'\U00010400'

Also on a UTF-32 build, decoding of a lone surrogate in UTF-16 fails
(correctly), but encoding one does not:
>>> u'\ud801'.encode('utf-16')
'\xff\xfe\x01\xd8'


I have gotten a report of a user decoding bad data using
x.decode('utf-8', 'replace'), then getting an error from Gtk+ when the
ill-formed surrogates reached it.

Fixing this would cause issue 3297 to blow up loudly, rather than silently.

--
messages: 71889
nosy: Rhamphoryncus
severity: normal
status: open
title: Ill-formed surrogates not treated as errors during encoding/decoding




[issue3672] Ill-formed surrogates not treated as errors during encoding/decoding

2008-08-24 Thread Adam Olsen

Changes by Adam Olsen [EMAIL PROTECTED]:


--
components: +Unicode
type:  - behavior




[issue1758146] Crash in PyObject_Malloc

2008-07-21 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Graham, I appreciate the history of sub-interpreters and how entrenched
they are.  Changing those practices requires a significant investment. 
This is an important factor to consider.

The other factor is the continuing maintenance and development cost. 
Subinterpreters add substantial complexity, which I can personally vouch
for.  This is exhibited in the GIL API not supporting them properly and
in the various bugs that have been found over the years.

Imagine, for a moment, that the situation were reversed; that everything
were built on threading.  Would you consider even for a moment adding
sub-interpreters?  How could you justify it?

It's not a decision to be taken lightly, but my preference is clear:
bite the bullet, make the change.  It's easier in the long run.




[issue3299] invalid object destruction in re.finditer()

2008-07-19 Thread Adam Olsen

Changes by Adam Olsen [EMAIL PROTECTED]:


--
nosy: +Rhamphoryncus




[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2008-07-12 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Marc, perhaps Unicode has refined their definitions since you last looked?

Valid UTF-8 *cannot* contain surrogates[1].  If it does, you have
CESU-8[2][3], not UTF-8.

So there are two bugs: first, the UTF-8 codec should refuse to load
surrogates.  Second, since the original bug showed up before the .pyc is
created, something in the parse/compilation/whatever stage is producing
CESU-8.


[1] 4th bullet point of D92 in
http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
[2] http://unicode.org/reports/tr26/
[3] http://en.wikipedia.org/wiki/CESU-8




[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2008-07-12 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Err, to clarify, the parse/compile/whatever stage is producing broken
UTF-32 (surrogates are ill-formed there too), and that gets transformed
into CESU-8 when the .pyc is saved.




[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2008-07-11 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Simpler way to reproduce this (on linux):

$ rm unicodetest.pyc 
$ 
$ python -c 'import unicodetest'
Result: False
Len: 2 1
Repr: u'\ud800\udd23' u'\U00010123'
$ 
$ python -c 'import unicodetest'
Result: True
Len: 1 1
Repr: u'\U00010123' u'\U00010123'
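
For reference, unicodetest.py amounts to something like this (a sketch
inferred from the output above; the real file is attached to the issue):

    a = u'\ud800\udd23'
    b = u'\U00010123'
    print 'Result:', a == b
    print 'Len:', len(a), len(b)
    print 'Repr:', repr(a), repr(b)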

Storing surrogates in UTF-32 is ill-formed[1], so the first part
definitely shouldn't be failing on linux (with a UTF-32 build).

The repr could go either way, as unicode doesn't cover escape sequences.
 We could allow u'\ud800\udd23' literals to magically become
u'\U00010123' on UTF-32 builds.  We already allow repr(u'\ud800\udd23')
to magically become u'\U00010123' on UTF-16 builds (which is why the
repr test always passes there, rather than always failing).

The bigger problem is how much we prohibit ill-formed character
sequences.  We already prevent values above U+10, but not
inappropriate surrogates.


[1] Search for D90 in http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf

--
nosy: +Rhamphoryncus
Added file: http://bugs.python.org/file10880/unicodetest.py




[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

2008-07-11 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

No, the configure options are wrong - we do use UTF-16 and UTF-32. 
Although modern UCS-4 has been restricted down to the range of UTF-32
(it used to be larger!), UCS-2 still doesn't support the supplementary
planes (ie no surrogates.)

If it really was UCS-2, the repr wouldn't be u'\U00010123' on windows. 
It'd be a pair of ill-formed code units instead.




[issue3329] API for setting the memory allocator used by Python

2008-07-10 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Basically you just want to kick the malloc implementation into doing
some housekeeping, freeing its caches?  I'm kinda surprised you don't
add the hook directly to your libc's malloc.

IMO, there's no use-case for this until Py_Finalize can completely tear
down the interpreter, which requires a lot of special work (killing(!)
daemon threads, unloading C modules, etc), and nobody intends to do that
at this point.

The practical alternative, as I said, is to run python in a subprocess.
 Let the OS clean up after us.




[issue874900] threading module can deadlock after fork

2008-07-09 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

In general I suggest replacing the lock with a new lock, rather than
trying to release the existing one.  Releasing *might* work in this
case, only because it's really a semaphore underneath, but it's still
easier to think about by just replacing.

I also suggest deleting _active and recreating it with only the current
thread.
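
Something along these lines (untested; _active and _active_limbo_lock
are the internals today's threading.py happens to use):

    import thread, threading

    def _reset_after_fork():
        current = threading.currentThread()
        # replace the module lock outright rather than releasing it
        threading._active_limbo_lock = threading.Lock()
        # recreate _active with only the surviving thread
        threading._active = {thread.get_ident(): current}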

I don't understand how test_join_on_shutdown could succeed.  The main
thread shouldn't be marked as done.. well, ever.  The test should hang.

I suspect test_join_in_forked_process should call os.waitpid(childpid)
so it doesn't exit early, which would cause the original Popen.wait()
call to exit before the output is produced.  The same problem of
test_join_on_shutdown also applies.

Ditto for test_join_in_forked_from_thread.




[issue874900] threading module can deadlock after fork

2008-07-09 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Looking over some of the other platforms for thread_*.h, I'm sure
replacing the lock is the right thing.




[issue3329] API for setting the memory allocator used by Python

2008-07-09 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

How would this allow you to free all memory?  The interpreter will still
reference it, so you'd have to have called Py_Finalize already, and
promise not to call Py_Initialize afterwards.  This further supposes the
process will live a long time after killing off the interpreter, but in
that case I recommend putting python in a child process instead.

--
nosy: +Rhamphoryncus




[issue874900] threading module can deadlock after fork

2008-07-08 Thread Adam Olsen

Changes by Adam Olsen [EMAIL PROTECTED]:


--
nosy: +Rhamphoryncus




[issue1758146] Crash in PyObject_Malloc

2008-07-08 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Apparently modwsgi uses subinterpreters because some third-party
packages aren't sufficiently thread-safe - modwsgi can't fix those
packages, so subinterpreters are the next best thing.

http://groups.google.com/group/modwsgi/browse_frm/thread/988bf560a1ae8147/2f97271930870989

This is a weak argument for language design.  Subinterpreters should be
deprecated, the problems with third-party packages found and fixed, and
ultimately subinterpreters ripped out.

If you wish to improve the situation, I suggest you help fix the
problems in the third-party packages.  For example,
http://code.google.com/p/modwsgi/wiki/IntegrationWithTrac implies trac
is configured with environment variables - clearly not thread-safe.




[issue1758146] Crash in PyObject_Malloc

2008-07-08 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Ahh, I did miss that bit, but it doesn't really matter.

Tell modwsgi to only use the main interpreter (PythonInterpreter
main_interpreter), and if you want multiple modules of the same name
put them in different packages.  Any other problems (trac using env vars
for configuration) should be fixed directly.

(My previous comment about building your own import mechanism was
overkill.  Writing a package that uses relative imports is enough - in
fact, that's what relative imports are for.)




[issue1758146] Crash in PyObject_Malloc

2008-07-08 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Franco, you need to look at the line above that check:

PyThreadState *check = PyGILState_GetThisThreadState();
if (check && check->interp == newts->interp && check != newts)
    Py_FatalError("Invalid thread state for this thread");

PyGILState_GetThisThreadState returns the original tstate *for that
thread*.  What it's asserting is that, if there's a second tstate *in
that thread*, it must be in a different subinterpreter.

It doesn't prevent your second and third tstate from sharing the same
subinterpreter, but it probably should, as this check implies it's an
invariant.




[issue1758146] Crash in PyObject_Malloc

2008-07-08 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

It's only checking that the original tstate *for the current thread* and
the new tstate have a different subinterpreter.  A subinterpreter can
have multiple tstates, so long as they're all in different threads.

The documentation is referring specifically to the PyGILState_Ensure and
PyGILState_Release functions.  Calling these says "I want a tstate, and
I don't know if I had one already."  The problem is that, with
subinterpreters, you may not get a tstate with the subinterpreter you
want.  Subinterpreter references saved in globals may lead to obscure
crashes or other errors - some of these have been fixed over the years,
but I doubt they all have.




[issue3268] Cleanup of tp_basicsize inheritance

2008-07-03 Thread Adam Olsen

New submission from Adam Olsen [EMAIL PROTECTED]:

inherit_special contains logic to inherit the base type's tp_basicsize
if the new type doesn't have it set.  The logic was spread over several
lines, but actually does almost nothing (presumably an artifact of
previous versions), so here's a patch to clean it up.

There was also an incorrect comment which I've removed.  A new one
should perhaps be added explaining what the other code there does, but
it's not affected by what I'm changing, and I'm not sure why it's doing
what it's doing anyway, so I'll leave that to someone else.

--
files: python-inheritsize.diff
keywords: patch
messages: 69169
nosy: Rhamphoryncus, nnorwitz
severity: normal
status: open
title: Cleanup of tp_basicsize inheritance
Added file: http://bugs.python.org/file10798/python-inheritsize.diff




[issue3088] test_multiprocessing hangs on OS X 10.5.3

2008-07-02 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

On Wed, Jul 2, 2008 at 3:44 PM, Mark Dickinson [EMAIL PROTECTED] wrote:

 Mark Dickinson [EMAIL PROTECTED] added the comment:

 Mark, can you try commenting out _TestCondition and seeing if you can
 still get it to hang?

 I removed the _TestCondition class entirely from test_multiprocessing,
 and did make test again.  It didn't hang! :-)  It crashed instead.  :-(

Try running "ulimit -c unlimited" in the shell before running the test
(from the same shell).  After it aborts it should dump a core file,
which you can then inspect using "gdb ./python core", in which "bt"
will give you a stack trace (backtrace).

On a minor note, I'd suggest running ./python -m test.regrtest
explicitly, rather than make test.  The latter runs the test suite
twice, deleting all .pyc files before the first run, to detect
problems in their creation.




[issue3088] test_multiprocessing hangs on OS X 10.5.3

2008-07-02 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

On Wed, Jul 2, 2008 at 5:08 PM, Mark Dickinson [EMAIL PROTECTED] wrote:

 Mark Dickinson [EMAIL PROTECTED] added the comment:

 Okay.  I just got about 5 perfect runs of the test suite, followed by:

 Macintosh-3:trunk dickinsm$ ./python.exe -m test.regrtest
 [...]
 test_multiprocessing
 Assertion failed: (bp != NULL), function PyObject_Malloc, file
 Objects/obmalloc.c, line 746.
 Abort trap (core dumped)

 I then did:

 gdb -c /cores/core.16235

 I've attached the traceback as traceback.txt

Are you sure that's right?  That traceback has no mention of
PyObject_Malloc or obmalloc.c.  Try checking the date.  Also, if you
use "gdb ./python.exe corefile" to start gdb, it should print a
warning if the program doesn't match the core.




[issue3088] test_multiprocessing hangs on OS X 10.5.3

2008-07-02 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

That looks better.  It crashed while deleting an exception, whose args
tuple has a bogus refcount.  Could be a refcount issue of the
exception or the args, or of something that references them, or a
dangling pointer, or a buffer overrun, etc.

Things to try:
1) Run pystack in gdb, from Misc/gdbinit
2) Print the exception type.  Use up until you reach
BaseException_clear, then do print self-ob_type-tp_name.  Also do
print *self and make sure the ob_refcnt is at 0 and the other fields
look sane.
3) Compile using --without-pymalloc and throw it at a real memory
debugger.  I'd suggest starting with your libc's own debugging
options, as they tend to be less invasive:
http://developer.apple.com/documentation/Performance/Conceptual/ManagingMemory/Articles/MallocDebug.html
.  If that doesn't work, look at Electric Fence, Valgrind, or your
tool of choice.




[issue3088] test_multiprocessing hangs on OS X 10.5.3

2008-07-02 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

Also, make sure you've done a "make clean" since you last updated the
tree, touched any file, or ran configure.  The automatic dependency
checking isn't 100% reliable.




[issue3154] Quick search box renders too long on FireFox 3

2008-06-27 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

I've checked it again, using the font preferences rather than the zoom
setting, and I can reproduce the problem.

Part of the problem stems from using pixels to set the margin, rather
than ems (or whatever the text box is based on).  However, although the
margin (at least visually) scales up evenly, the fonts themselves do
not.  Arguably this is a defect in Firefox, or maybe even the HTML specs
themselves.

Additionally, that only seems to control the visual margin.  I've yet to
figure out what controls the layout (such as wrapping the Go button).




[issue3112] implement PEP 3134 exception reporting

2008-06-23 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

On Sun, Jun 22, 2008 at 2:56 PM, Antoine Pitrou [EMAIL PROTECTED] wrote:
 On Sunday 22 June 2008 at 20:40 +, Adam Olsen wrote:
 Passing in e.args is probably sufficient.

 I think it's very optimistic :-) Some exception objects can hold dynamic
 state which is simply not stored in the args tuple. See Twisted's
 Failure objects for an extreme example:
 http://twistedmatrix.com/trac/browser/trunk/twisted/python/failure.py

 (yes, it is used an an exception: see raise self in the trap() method)

Failure doesn't have an args tuple and doesn't subclass Exception (or
BaseException) - it already needs modification in 3.0.  It's heaped
full of complexity and implementation details.  I wouldn't be
surprised if your changes break it in subtle ways too.

In short, if forcing Failure to be rewritten is the only consequence
of using .args, it's an acceptable tradeoff of not corrupting
exception contexts.




[issue3112] implement PEP 3134 exception reporting

2008-06-22 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

* cause/context cycles should be avoided.  Naive traceback printing
could become confused, and I can't think of any accidental way to
provoke it (besides the problem mentioned here.)

* I suspect PyErr_Display handled string exceptions in 2.x, and this is
an artifact of that

* No opinion on PyErr_DisplaySingle

* PyErr_Display is used by PyErr_Print, and it must end up with no
active exception.  Additionally, third party code may depend on this
semantic.  Maybe PyErr_DisplayEx?

* +1 on standardizing tracebacks

--
nosy: +Rhamphoryncus




[issue3112] implement PEP 3134 exception reporting

2008-06-22 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

On Sun, Jun 22, 2008 at 8:07 AM, Antoine Pitrou [EMAIL PROTECTED] wrote:
 You mean they should be detected when the exception is set? I was afraid
 that it may make exception raising slower. Reporting is not performance
 sensitive in comparison to exception raising.

 (the problem mentioned here is already avoided in the patch, but the
 detection of other cycles is deferred to exception reporting for the
 reason given above)

I meant only that trivial cycles should be detected.  However, I
hadn't read your patch, so I didn't realize you already knew of a way
to create a non-trivial cycle.

This has placed a niggling doubt in my mind about chaining the
exceptions, rather than the tracebacks.  Hrm.

 * PyErr_Display is used by PyErr_Print, and it must end up with no
 active exception.  Additionally, third party code may depend on this
 semantic.  Maybe PyErr_DisplayEx?

 I was not proposing to change the exception swallowing semantics, just
 to add a return value indicating if any errors had occurred while
 displaying the exception.

Ahh, harmless then, but to what benefit?  Wouldn't the traceback
module be better suited to any possible error reporting?




[issue3112] implement PEP 3134 exception reporting

2008-06-22 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

On Sun, Jun 22, 2008 at 1:04 PM, Antoine Pitrou [EMAIL PROTECTED] wrote:

 Antoine Pitrou [EMAIL PROTECTED] added the comment:

 On Sunday 22 June 2008 at 17:17 +, Adam Olsen wrote:
 I meant only that trivial cycles should be detected.  However, I
 hadn't read your patch, so I didn't realize you already knew of a way
 to create a non-trivial cycle.

 This has placed a niggling doubt in my mind about chaining the
 exceptions, rather than the tracebacks.  Hrm.

 Chaining the tracebacks rather than the exceptions loses important
 information: what is the nature of the exception which is the cause or
 context of the current exception?

I assumed each leg of the traceback would reference the relevant exception.

Although.. this is effectively the same as creating a new exception
instance when reraised, rather than modifying the old one.  Reusing
the old is done for performance I believe.

 It is improbable to create such a cycle involuntarily, it means you
 raise an old exception in replacement of a newer one caused by the
 older, which I think is quite contorted. It is also quite easy to avoid
 creating the cycle, simply by re-raising outside of any except handler.

I'm not convinced.

try:
    ...  # Lookup
except A as a:  # Lookup failed
    try:
        ...  # Fallback
    except B as b:  # Fallback failed
        raise a  # The original exception is of the type we want

For this behaviour, this is the most natural way to write it.
Conceptually, there shouldn't be a cycle - the traceback should be the
lookup, then the fallback, then whatever code is above this - exactly
the order the code executed in.




[issue3112] implement PEP 3134 exception reporting

2008-06-22 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

On Sun, Jun 22, 2008 at 1:48 PM, Antoine Pitrou [EMAIL PROTECTED] wrote:

 Antoine Pitrou [EMAIL PROTECTED] added the comment:

 On Sunday 22 June 2008 at 19:23 +, Adam Olsen wrote:
 For this behaviour, this is the most natural way to write it.
 Conceptually, there shouldn't be a cycle

 I agree your example is not far-fetched. How about avoiding cycles for
 implicit chaining, and letting users shoot themselves in the foot with
 explicit recursive chaining if they want? Detection would be cheap
 enough, just a simple loop without any memory allocation.

That's still O(n).  I'm not so easily convinced it's cheap enough.

And for that matter, I'm not convinced it's correct.  The inner
exception's context becomes clobbered when we modify the outer
exception's traceback.  The inner's context should reference the
traceback as it was at that point.

This would all be a lot easier if reraising always created a new
exception.  Can you think of a way to skip that only when we can be
sure it's safe?  Maybe as simple as counting the references to it?




[issue3112] implement PEP 3134 exception reporting

2008-06-22 Thread Adam Olsen

Adam Olsen [EMAIL PROTECTED] added the comment:

On Sun, Jun 22, 2008 at 2:20 PM, Antoine Pitrou [EMAIL PROTECTED] wrote:

 Antoine Pitrou [EMAIL PROTECTED] added the comment:

 On Sunday 22 June 2008 at 19:57 +, Adam Olsen wrote:
 That's still O(n).  I'm not so easily convinced it's cheap enough.

 O(n) when n will almost never be greater than 5 (and very often equal to
 1 or 2), and when the unit is the cost of a pointer dereference plus the
 cost of a pointer comparison, still sounds cheap. We could bench it
 anyway.

Indeed.

 And for that matter, I'm not convinced it's correct.  The inner
 exception's context becomes clobbered when we modify the outer
 exception's traceback.  The inner's context should reference the
 traceback as it was at that point.

 Yes, I've just thought about that, it's a bit annoying... We have to
 decide what is more annoying: that, or a reference cycle that can delay
 deallocation of stuff attached to an exception (including local
 variables attached to the tracebacks)?

The cycle is only created by broken behaviour.  The more I think about
it, the more I want to fix it (by not reusing the exception).

 This would all be a lot easier if reraising always created a new
 exception.

 How do you duplicate an instance of an user-defined exception? Using an
 equivalent of copy.deepcopy()? It will probably end up much more
 expensive than the above-mentioned O(n) search.

Passing in e.args is probably sufficient.  All this would need to be
discussed on python-dev (or python-3000?) though.

 Can you think of a way to skip that only when we can be
 sure its safe?  Maybe as simple as counting the references to it?

 I don't think so, the exception can be referenced in an unknown number
 of local variables (themselves potentially referenced by tracebacks).

Can be, or will be?  Only the most common behaviour needs to be optimized.




[issue3155] Python should expose a pthread_cond_timedwait API for threading

2008-06-21 Thread Adam Olsen

Changes by Adam Olsen [EMAIL PROTECTED]:


--
nosy: +Rhamphoryncus




[issue3153] sqlite leaks on error

2008-06-20 Thread Adam Olsen

New submission from Adam Olsen [EMAIL PROTECTED]:

Found in Modules/_sqlite/cursor.c:

self->statement = PyObject_New(pysqlite_Statement,
                               &pysqlite_StatementType);
if (!self->statement) {
    goto error;
}
rc = pysqlite_statement_create(self->statement,
                               self->connection, operation);
if (rc != SQLITE_OK) {
    self->statement = 0;
    goto error;
}

Besides the ugliness of allocating the object before passing it to the
create function, if pysqlite_statement_create fails, the object is leaked.

--
components: Extension Modules
messages: 68478
nosy: Rhamphoryncus
severity: normal
status: open
title: sqlite leaks on error
type: resource usage



