date:20110224

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

Alexander Belopolsky wrote:
 
> Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
> 
> In issue11303.diff, I add similar optimization for encode('latin1') and for 
> 'utf8' variant of utf-8.  I don't think dash-less variants of utf-16 and 
> utf-32 are common enough to justify special-casing.

Looks good.

Given that we are starting to have a whole set of such aliases
in the C code, I wonder whether it would be better to make the
string comparisons more efficient, e.g.
if "utf" matches, the checks could then continue with "8" or "-8"
instead of trying to match "utf" again and again.
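
The prefix-sharing comparison suggested above could look roughly like the following. This is only a Python sketch of the idea; the real shortcuts are C code in unicodeobject.c, and the function name `match_shortcut` and the labels it returns are assumptions made up for illustration.

```python
def match_shortcut(name):
    """Return a canonical label for a few common spellings, else None.

    The shared prefix is compared once; only the remainder is then
    checked against the accepted suffixes.
    """
    lowered = name.lower()
    if lowered.startswith("utf"):
        if lowered[3:] in ("8", "-8"):
            return "utf-8"
    elif lowered.startswith("latin"):
        if lowered[5:] in ("1", "-1"):
            return "latin-1"
    return None  # no shortcut; a real implementation would fall back to the registry


print(match_shortcut("utf8"), match_shortcut("Latin-1"), match_shortcut("utf-16"))
```

Names that share a prefix but have an unlisted suffix (such as utf-16) simply fall through to the slow path, so nothing new is accepted.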

--
title: b'x'.decode('latin1') is much slower than b'x'.decode('latin-1') -> 
b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue5902] Stricter codec names

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

Alexander Belopolsky wrote:
 
> Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
> 
> What is the status of this.  Status=open and Resolution=rejected contradict 
> each other.

Sorry, forgot to close the ticket.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5902
___

[issue5902] Stricter codec names

2011-02-24 Thread Marc-Andre Lemburg


Changes by Marc-Andre Lemburg m...@egenix.com:


--
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5902
___

[issue5902] Stricter codec names

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

Alexander Belopolsky wrote:
 
> Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
> 
>> Accepting all common forms for
>> encoding names means that you can usually give Python an encoding name
>> from, e.g. a HTML page, or any other file or system that specifies an
>> encoding.
> 
> I don't buy this argument.  Running attached script on 
> http://www.iana.org/assignments/character-sets shows that there are hundreds 
> of registered charsets that are not accepted by python:
> 
> $ ./python.exe iana.py| wc -l
>   413
> 
> Any serious HTML or XML processing software should be based on the IANA 
> character-sets file rather than on the ad-hoc list of aliases that made it 
> into encodings/aliases.py.

Let's do a reality check:

How often do you see requests for additions to the aliases we
have in Python? Perhaps one every year, if at all.

We take great care not to add aliases that are not in common
use or that do not have a proven track record of really being
compatible with the codec in question.

If you think we are missing some aliases, please open tickets
for them, indicating why these should be added.

If you really want complete IANA coverage, I suggest you create
a normalization module which maps the IANA names to our names
and upload it to PyPI.
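
A normalization module of the kind suggested above might start out like this. It is a minimal sketch, not an existing package: the table `IANA_TO_PYTHON` is a tiny hand-picked excerpt chosen for illustration, and the helper name `iana_lookup` is an assumption. A real module would presumably be generated from the IANA character-sets file.

```python
import codecs

# Hypothetical excerpt: IANA registry names (lowercased) -> Python codec names.
IANA_TO_PYTHON = {
    "iso_8859-1:1987": "latin_1",
    "ansi_x3.4-1968": "ascii",
}


def iana_lookup(name):
    """Look up a codec by IANA name, falling back to Python's own lookup."""
    key = name.strip().lower()
    python_name = IANA_TO_PYTHON.get(key, key)
    return codecs.lookup(python_name)  # raises LookupError if still unknown


print(iana_lookup("ISO_8859-1:1987").name)
```

Keeping the mapping outside the stdlib alias table means Python's own alias list stays small, while callers that need full IANA coverage opt in explicitly.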

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5902
___

[issue11295] On Windows, Python crashes on ANSI / Windows-formatted source files

2011-02-24 Thread Stefan Krah


Stefan Krah stefan-use...@bytereef.org added the comment:

Python works fine with Notepad generated scripts. I think this is a
CGI issue. Try following this tutorial:

http://www.imladris.com/Scripts/PythonForWindows.html


If you still suspect a bug, you should provide the exact CGI script
and all details of the Apache configuration.

--
nosy: +skrah

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11295
___

[issue5902] Stricter codec names

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

Alexander Belopolsky wrote:
 
> Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
> 
> Ezio and I discussed on IRC the implementation of alias lookup and neither of 
> us was able to point out to the function that strips non-alphanumeric 
> characters from encoding names.

I think you are misunderstanding the way the codec registry works.

You register codec search functions with it which then have to try
to map a given encoding name to a codec module.

The stdlib ships with one such function (defined in encodings/__init__.py).
This is registered with the codec registry by default.

The codec search function takes care of any normalization and conversion
to the module name used by the codecs from that codec package.
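
To make the registry mechanism described above concrete, here is a small sketch of a custom codec search function. The alias `demo_latin_alias` and the function name `demo_search` are made up for illustration; `codecs.register()` and `codecs.lookup()` are the documented stdlib calls.

```python
import codecs


def demo_search(name):
    """Codec search function: map one made-up alias onto Latin-1."""
    # The registry lowercases the name before calling search functions;
    # depending on the Python version, hyphens may or may not already
    # have been turned into underscores, so normalize them here as well.
    if name.replace("-", "_") == "demo_latin_alias":
        return codecs.lookup("latin-1")
    return None  # not ours; let other search functions try


codecs.register(demo_search)

print(ascii(b"caf\xe9".decode("Demo-Latin-Alias")))
```

The stdlib search function is asked first and declines the unknown name, after which the registry falls through to `demo_search`.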

> It turns out that there are three normalize functions that are successively 
> applied to the encoding name during evaluation of str.encode/str.decode.
> 
> 1. normalize_encoding() in unicodeobject.c

This was added to have the few shortcuts we have in the C code
for commonly used codecs match more encoding aliases.

The shortcuts completely bypass the codec registry and also
bypass the function call overhead incurred by codecs
run via the codec registry.

> 2. normalizestring() in codecs.c

This is the normalization applied by the codec registry. See PEP 100
for details:


Search functions are expected to take one argument, the encoding
name in all lower case letters and with hyphens and spaces
converted to underscores, ...


> 3. normalize_encoding() in encodings/__init__.py

This is part of the stdlib encodings package's codec search
function.

> Each performs a slightly different transformation and only the last one 
> strips non-alphanumeric characters.
> 
> The complexity of codec lookup is comparable with that of the import 
> mechanism!

It's flexible, but not really complex.

I hope the above clarifies the reasons for the three normalization
functions.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue5902
___

[issue11296] Possible error in What's new in Python 3.2 : duplication of rsplit() mention

2011-02-24 Thread Raymond Hettinger


Raymond Hettinger rhettin...@users.sourceforge.net added the comment:

Okay fixed.  The rsplit() method was mentioned in both underlying tracker 
issues, so it got mentioned twice when once would have been enough :-)

--
assignee: docs@python -> rhettinger
nosy: +rhettinger
priority: normal -> low
resolution:  -> fixed
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11296
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Steffen Daode Nurpmeso


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

I wonder what this normalize_encoding() does!  Here is a pretty standard 
version of mine which is a bit more expensive but catches many more cases!  
This is stripped, of course, and can be rewritten very easily to Python's needs 
(i.e. using char[32] instead of char[11]).

 * @@li If a character is either ::s_char_is_space() or ::s_char_is_punct():
 *  @@li Replace with ASCII space (0x20).
 *  @@li Squeeze adjacent spaces to a single one.
 * @@li Else if a character is ::s_char_is_alnum():
 *  @@li ::s_char_to_lower() characters.
 *  @@li Separate groups of alphas and digits with ASCII space (0x20).
 * @@li Else discard character.
 * E.g. "ISO_8859---1" becomes "iso 8859 1"
 * and "ISO8859-1" also becomes "iso 8859 1".

s_textcodec_normalize_name(s_CString *_name) {
    enum { C_NONE, C_WS, C_ALPHA, C_DIGIT } c_type = C_NONE;
    char *name, c;
    auto s_CString input;

    s_cstring_swap(s_cstring_init(input), _name);
    _name = s_cstring_reserve(_name, 31, s_FAL0);
    name = s_cstring_cstr(input);

    while ((c = *(name++)) != s_NUL) {
        s_si8 sep = s_FAL0;

        if (s_char_is_space(c) || s_char_is_punct(c)) {
            if (c_type == C_WS)
                continue;
            c_type = C_WS;
            c = ' ';
        } else if (s_char_is_alpha(c)) {
            sep = (c_type == C_DIGIT);
            c_type = C_ALPHA;
            c = s_char_to_lower(c);
        } else if (s_char_is_digit(c)) {
            sep = (c_type == C_ALPHA);
            c_type = C_DIGIT;
        } else
            continue;

        /* Emit a separating space first (if needed), then the character. */
        do
            _name = s_cstring_append_char(_name, (sep ? ' ' : c));
        while (--sep >= s_FAL0);
    }

    s_cstring_destroy(input);
    return _name;
}
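
For readers who do not have the `s_*` helper library at hand, the same algorithm can be sketched in Python. This is an illustrative port, not the poster's code: the name `normalize_name` is an assumption, and ASCII classification via the `string` module stands in for the `s_char_is_*()` helpers.

```python
import string


def normalize_name(name):
    """Normalize an encoding name following the rules listed above."""
    out = []
    prev = None  # kind of the last emitted character: "ws", "alpha" or "digit"
    for ch in name:
        if ch in string.whitespace or ch in string.punctuation:
            if prev == "ws":
                continue  # squeeze adjacent separators
            prev, ch = "ws", " "
        elif ch in string.ascii_letters:
            if prev == "digit":
                out.append(" ")  # split digit group from alpha group
            prev, ch = "alpha", ch.lower()
        elif ch in string.digits:
            if prev == "alpha":
                out.append(" ")  # split alpha group from digit group
            prev = "digit"
        else:
            continue  # discard everything else
        out.append(ch)
    return "".join(out)


for raw in ("ISO_8859---1", "ISO8859-1", "LATIN1", "UTF-16BE"):
    print(raw, "->", normalize_name(raw))
```

Like the C version, this keeps a leading or trailing separator as a single space rather than trimming it.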

--
nosy: +sdaoden

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11197] information leakage with SimpleHTTPServer

2011-02-24 Thread david


david db.pub.m...@gmail.com added the comment:

This may be stupid but...

shouldn't the example be:

lynx http://localhost:8000/../../../../../etc/passwd

... which does _not_ work.
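
One way such a request can be kept inside the served directory is to drop the `..` components before joining the path onto the root. The helper below is a hypothetical sketch of that idea (the name `safe_join` and the root `/srv/www` are assumptions), not a copy of what SimpleHTTPServer does internally.

```python
import posixpath
from urllib.parse import unquote


def safe_join(root, url_path):
    """Map a URL path onto `root`, ignoring empty, '.' and '..' components."""
    parts = posixpath.normpath(unquote(url_path)).split("/")
    safe = [p for p in parts if p not in ("", ".", "..")]
    return posixpath.join(root, *safe)


print(safe_join("/srv/www", "/../../../../../etc/passwd"))
```

With this approach the traversal attempt resolves to a path below the root instead of the system file.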

--
nosy: +db

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11197
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Steffen Daode Nurpmeso


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

(That is to say, i would do it.  But not if _cpython is thrown to trash ,-); 
i.e. not if there is not a slight chance that it gets actually patched in 
because this performance issue probably doesn't mean a thing in real life.  You 
know, i'm a slow programmer, i would need *at least* two hours to rewrite that 
in plain C in a way that can make it as a replacement of normalize_encoding().)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11307] re engine exhaustively explores more than necessary

2011-02-24 Thread Niko Matsakis


New submission from Niko Matsakis n...@alum.mit.edu:

Executing code like this:

 >>> import re
 >>> r = re.compile(r'(\w+)*=.*')
 >>> r.match("abcdefghijklmnopqrstuvwxyz")

takes a long time (around 12 seconds, on my machine).  Presumably this is 
because it is enumerating all the various ways to divvy up the alphabet for 
(\w+), even though there is no = sign to be found.  In contrast, in perl a 
regular expression like that seems to run instantly.

This could be optimized by recognizing that no = sign was found, and thus it 
does not matter how the first part of the regular expression matches, so there 
is no need to try additional possibilities.  To some extent, of course, the 
answer is just "don't write regular expressions like that".  This example is 
reduced down from a real regexp where the potential inefficiency was less 
obvious.  Nonetheless the general optimization of recognizing when further 
re-enumeration is not necessary makes sense more generally.

In any case, I am submitting the bug report merely to raise the issue as a 
possible future optimization, not to suggest that it must be addressed 
immediately (or even at all).
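
Until the engine handles this case, a pattern of this shape can usually be rewritten without the nested quantifier. The sketch below compares the reported pattern with a flattened one that accepts the same strings (though it drops the capture group); the variable names are illustrative, and the input is deliberately short so the nested form finishes quickly.

```python
import re

nested = re.compile(r'(\w+)*=.*')  # pattern from the report
flat = re.compile(r'\w*=.*')       # same strings match, no nested quantifier

# Kept short on purpose: without an '=' in the input, the nested form can
# take exponentially longer as the input grows.
text = "abcdefghij"

print(nested.match(text), flat.match(text))
print(bool(nested.match("key=value")), bool(flat.match("key=value")))
```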

--
components: Regular Expressions
messages: 129262
nosy: nikomatsakis
priority: normal
severity: normal
status: open
title: re engine exhaustively explores more than necessary
type: performance
versions: Python 2.6, Python 3.1

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11307
___

[issue4350] Remove dead code from Tkinter.py

2011-02-24 Thread Graham Horler


Graham Horler tryexc...@gmail.com added the comment:

Are we sure this is dead code, and not just out of date?

e.g. this works, and I use it in production with "if Tkinter.TkVersion >= 8.4:"

b = Tkinter.Button(root)
b.tk.call('tk::ButtonEnter', b._w)

--
nosy: +pysquared

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue4350
___

[issue11308] extraneous link getit in the main website sidebar

2011-02-24 Thread SilentGhost


New submission from SilentGhost ghost@gmail.com:

There is an extraneous entry in the sidebar of www.python.org.
It has two Chinese characters and leads to the download page.

--
messages: 129264
nosy: SilentGhost
priority: normal
severity: normal
status: open
title: extraneous link getit in the main website sidebar

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11308
___

[issue11308] extraneous link getit in the main website sidebar

2011-02-24 Thread SilentGhost


SilentGhost ghost@gmail.com added the comment:

Sorry, I realise that this is my mistake.

--
resolution:  -> invalid
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11308
___

[issue11309] #include wctype.h in Objects/unicodetype_db.h and Objects/unicodectype.c

2011-02-24 Thread Дилян Палаузов


New submission from Дилян Палаузов dilyan.palau...@aegee.org:

As of python 2.7.1 configured with --enable-ipv6 --enable-unicode 
--with-system-expat --with-system-ffi --with-signal-module --with-threads 
--with-wctype-functions --enable-shared:

Please #include <wctype.h> in Objects/unicodetype_db.h and 
Objects/unicodectype.c

compilation produces the warnings:

In file included from Objects/unicodectype.c:34:0:
Objects/unicodetype_db.h: In function '_PyUnicodeUCS2_IsWhitespace':
Objects/unicodetype_db.h:3277:5: warning: implicit declaration of function 
'iswspace'
Objects/unicodectype.c: In function '_PyUnicodeUCS2_IsLowercase':
Objects/unicodectype.c:192:5: warning: implicit declaration of function 
'iswlower'
Objects/unicodectype.c: In function '_PyUnicodeUCS2_IsUppercase':
Objects/unicodectype.c:197:5: warning: implicit declaration of function 
'iswupper'
Objects/unicodectype.c: In function '_PyUnicodeUCS2_ToLowercase':
Objects/unicodectype.c:202:5: warning: implicit declaration of function 
'towlower'
Objects/unicodectype.c:202:12: warning: incompatible implicit declaration of 
built-in function 'towlower'
Objects/unicodectype.c: In function '_PyUnicodeUCS2_ToUppercase':
Objects/unicodectype.c:207:5: warning: implicit declaration of function 
'towupper'
Objects/unicodectype.c:207:12: warning: incompatible implicit declaration of 
built-in function 'towupper'
Objects/unicodectype.c: In function '_PyUnicodeUCS2_IsAlpha':
Objects/unicodectype.c:212:5: warning: implicit declaration of function 
'iswalpha'

--
components: Build
messages: 129266
nosy: dilyan.palauzov
priority: normal
severity: normal
status: open
title: #include wctype.h in Objects/unicodetype_db.h and 
Objects/unicodectype.c
type: compile error
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11309
___

[issue11309] #include wctype.h in Objects/unicodetype_db.h and Objects/unicodectype.c

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

Дилян Палаузов wrote:
 
> New submission from Дилян Палаузов dilyan.palau...@aegee.org:
> 
> As of python 2.7.1 configured with --enable-ipv6 --enable-unicode 
> --with-system-expat --with-system-ffi --with-signal-module --with-threads 
> --with-wctype-functions --enable-shared:
> 
> Please #include <wctype.h> in Objects/unicodetype_db.h and 
> Objects/unicodectype.c
> 
> compilation produces the warnings:
> 
> In file included from Objects/unicodectype.c:34:0:
> Objects/unicodetype_db.h: In function '_PyUnicodeUCS2_IsWhitespace':
> Objects/unicodetype_db.h:3277:5: warning: implicit declaration of function 
> 'iswspace'
> Objects/unicodectype.c: In function '_PyUnicodeUCS2_IsLowercase':
> Objects/unicodectype.c:192:5: warning: implicit declaration of function 
> 'iswlower'
> Objects/unicodectype.c: In function '_PyUnicodeUCS2_IsUppercase':
> Objects/unicodectype.c:197:5: warning: implicit declaration of function 
> 'iswupper'
> Objects/unicodectype.c: In function '_PyUnicodeUCS2_ToLowercase':
> Objects/unicodectype.c:202:5: warning: implicit declaration of function 
> 'towlower'
> Objects/unicodectype.c:202:12: warning: incompatible implicit declaration of 
> built-in function 'towlower'
> Objects/unicodectype.c: In function '_PyUnicodeUCS2_ToUppercase':
> Objects/unicodectype.c:207:5: warning: implicit declaration of function 
> 'towupper'
> Objects/unicodectype.c:207:12: warning: incompatible implicit declaration of 
> built-in function 'towupper'
> Objects/unicodectype.c: In function '_PyUnicodeUCS2_IsAlpha':
> Objects/unicodectype.c:212:5: warning: implicit declaration of function 
> 'iswalpha'

--with-wctype-functions will only work if you have configured Python
to use the Unicode variant which is used by wchar_t on your platform.

Given the warnings you are seeing, this appears to be UCS4,
so you have to add --enable-unicode=ucs4 to your configure line.

Please note that the wctype functions are no longer actively
supported in Python. I'd suggest you remove
the --with-wctype-functions option altogether.

--
nosy: +lemburg
title: #include wctype.h in Objects/unicodetype_db.h and 
Objects/unicodectype.c -> #include wctype.h in Objects/unicodetype_db.h and   
 Objects/unicodectype.c

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11309
___

[issue11306] mailbox should test for errno.EROFS

2011-02-24 Thread R. David Murray


R. David Murray rdmur...@bitdance.com added the comment:

Creating a test for this may not be practical :(

--
assignee:  -> r.david.murray
nosy: +r.david.murray
stage:  -> needs patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11306
___

[issue11258] ctypes: Speed up find_library() on Linux by 500%

2011-02-24 Thread Jonas H.


Changes by Jonas H. jo...@lophus.org:


Added file: 
http://bugs.python.org/file20874/faster-find-library1-py3k-with-escaped-name.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11258
___

[issue11295] On Windows, Python crashes on ANSI / Windows-formatted source files

2011-02-24 Thread Jonathan Hayward


Jonathan Hayward jonathan.hayw...@pobox.com added the comment:

Thank you; noted. I'm closing the bug for now at least; I'll reopen it if need 
be.

--
resolution:  -> invalid
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11295
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Ezio Melotti


Ezio Melotti ezio.melo...@gmail.com added the comment:

See also discussion on #5902.

Steffen, your normalization function looks similar to 
encodings.normalize_encoding, with just a few differences (it uses spaces 
instead of dashes, it divides alpha chars from digits).

If it doesn't slow down the normal cases (i.e. 'utf-8', 'utf8', 'latin-1', 
etc.), a more flexible normalization done earlier might be a valid alternative.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread STINNER Victor


Changes by STINNER Victor victor.stin...@haypocalc.com:


--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

On Thu, Feb 24, 2011 at 10:30 AM, Ezio Melotti rep...@bugs.python.org wrote:
..
> See also discussion on #5902.

Mark has closed #5902 and indeed the discussion of how to efficiently
normalize encoding names (without changing what is accepted) is beyond
the scope of that or the current issue.  Can someone open a separate
issue to see if we can improve the current situation?  I don't think
having three slightly different normalize functions is optimal.  See
msg129248.

--
title: b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1') 
-> b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Steffen Daode Nurpmeso


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

.. i didn't actually invent this algorithm (but don't ask me where i got 
the idea from years ago), i've just implemented the function you see.  The 
algorithm itself avoids some pitfalls with respect to combining numerics and 
significantly reduces the number of possible normalization cases:

ISO-8859-1, ISO8859-1, ISO_8859-1, LATIN1
(+ think of additional misspellings)
all become
iso 8859 1, latin 1
in the end

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Steffen Daode Nurpmeso


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

(Everything else is beyond my scope.  But normalizing _ to - is possibly a bad 
idea as far as i can remember the situation three years ago.)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Steffen Daode Nurpmeso


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

P.P.S.: separating alphanumerics is a win for things like, e.g. UTF-16BE: it 
becomes 'utf 16 be' - think about the possible misspellings here and you see this 
algorithm is a good thing

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue8036] Interpreter crashes on invalid arg to spawnl on Windows

2011-02-24 Thread Ralf Schmitt


Changes by Ralf Schmitt sch...@gmail.com:


--
nosy: +schmir

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8036
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

Alexander Belopolsky wrote:
 
> Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
> 
> On Thu, Feb 24, 2011 at 10:30 AM, Ezio Melotti rep...@bugs.python.org wrote:
> ..
>> See also discussion on #5902.
> 
> Mark has closed #5902 and indeed the discussion of how to efficiently
> normalize encoding names (without changing what is accepted) is beyond
> the scope of that or the current issue.  Can someone open a separate
> issue to see if we can improve the current situation?  I don't think
> having three slightly different normalize functions is optimal.  See
> msg129248.

Please see my reply on this ticket: those three functions have
different application areas.

On this ticket, we're discussing just one application area: that
of the builtin short cuts.

To have more encoding name variants benefit from the optimization,
we might want to enhance that particular normalization function
to avoid having to compare against "utf8" and "utf-8" in the
encode/decode functions.

--
title: b'x'.decode('latin1') is much slower than b'x'.decode('latin-1') -> 
b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11252] Handling statement OR assignment continuation '\' on Win32 platform

2011-02-24 Thread Suresh Kalkunte


Suresh Kalkunte sskalku...@gmail.com added the comment:

Thanks for the education (hopefully a slight detour for you 8-). I included '/' 
to convey uniform behavior across platforms.

I will take it that the difference in what os.path.split() returns on Win32 vs. 
Linux is not a bug in Python, since its Win32 users have come to expect the 
response it gives? If so, please point me to a resource (i.e., if you are 
aware of one, else do not bother 8-) that identifies other such conundrums.
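
The platform difference can be reproduced on any system, because `os.path` is simply `ntpath` on Windows and `posixpath` elsewhere. The short sketch below calls both directly; the sample paths are made up for illustration.

```python
import ntpath
import posixpath

win_style = r"C:\dir\file.txt"
print(ntpath.split(win_style))     # ('C:\\dir', 'file.txt')
print(posixpath.split(win_style))  # ('', 'C:\\dir\\file.txt')

# A forward slash is a separator for both flavours.
print(ntpath.split("dir/file.txt"), posixpath.split("dir/file.txt"))
```

On POSIX a backslash is an ordinary filename character, which is why only forward slashes give uniform results across platforms.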

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11252
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Steffen Daode Nurpmeso


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

So, well, a-ha, i will boot my laptop this evening and (try to) write a patch 
for normalize_encoding(), which will match the standard-conforming LATIN1 and 
also will continue to support the illegal latin-1 without actually changing the 
two users PyUnicode_Decode() and PyUnicode_AsEncodedString(), from which i 
better keep my hands off.  But i'm slow, it may take until tomorrow...

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Ezio Melotti


Ezio Melotti ezio.melo...@gmail.com added the comment:

If the first normalization function is flexible enough to match most of the 
spellings of the optimized encodings, they will all benefit from the optimization 
without having to go through the long path.

(If the normalized encoding name is then passed through, the following 
normalization functions will also have to do less work, but this is out of the 
scope of this issue.)
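
Whether a given spelling takes the fast or the long path on a particular build can be checked with a quick timing run. The following is a rough measurement sketch using `timeit`; the helper name `time_spellings` and the repeat count are arbitrary choices, and the numbers will differ between Python versions and machines.

```python
import timeit


def time_spellings(spellings, number=100_000):
    """Return the seconds taken to decode b'x' `number` times per spelling."""
    return {
        name: timeit.timeit(lambda: b"x".decode(name), number=number)
        for name in spellings
    }


for name, seconds in time_spellings(["latin-1", "latin1", "utf-8", "utf8"]).items():
    print(f"{name:8} {seconds:.4f}s")
```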

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread STINNER Victor


STINNER Victor victor.stin...@haypocalc.com added the comment:

I think that the normalization function in unicodeobject.c (only used for 
internal functions) can skip any character other than a-z, A-Z and 0-9. 
Something like:

>>> import re
>>> def normalize(name): return re.sub("[^a-z0-9]", "", name.lower())
... 
>>> normalize("UTF-8")
'utf8'
>>> normalize("ISO-8859-1")
'iso88591'
>>> normalize("latin1")
'latin1'

So ISO-8859-1, ISO8859-1, LATIN-1, latin1, UTF-8, utf8, etc. will be normalized 
to iso88591, latin1 and utf8.

I don't know any encoding name where a character outside a-z, A-Z, 0-9 means 
anything special. But I don't know all encoding names! :-)

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread STINNER Victor


STINNER Victor victor.stin...@haypocalc.com added the comment:

Patch implementing my suggestion.

--
Added file: http://bugs.python.org/file20875/aggressive_normalization.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Ezio Melotti


Ezio Melotti ezio.melo...@gmail.com added the comment:

That will also accept names like 'iso88591' that are not valid now; 
'iso 8859 1' is already accepted.
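
Which spellings the codec registry accepts on a given interpreter can be checked directly. The helper below is a small sketch (the name `is_known_encoding` is made up); the results for the borderline spellings depend on the Python version, so no output is shown.

```python
import codecs


def is_known_encoding(name):
    """Return True if the codec registry can resolve `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for spelling in ("latin-1", "iso 8859 1", "iso88591"):
    print(repr(spelling), is_known_encoding(spelling))
```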

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread STINNER Victor


Changes by STINNER Victor victor.stin...@haypocalc.com:


Removed file: http://bugs.python.org/file20875/aggressive_normalization.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

On Thu, Feb 24, 2011 at 11:01 AM, Marc-Andre Lemburg
rep...@bugs.python.org wrote:
..
> On this ticket, we're discussing just one application area: that
> of the builtin short cuts.

Fair enough.  I was hoping to close this ticket by simply committing
the posted patch, but it looks like people want to do more.  I don't
think we'll get measurable performance gains but may improve code
understandability.

> To have more encoding name variants benefit from the optimization,
> we might want to enhance that particular normalization function
> to avoid having to compare against "utf8" and "utf-8" in the
> encode/decode functions.

Which function are you talking about?

1. normalize_encoding() in unicodeobject.c
2. normalizestring() in codecs.c

The first is s.lower().replace('-', '_') and the second is
s.lower().replace(' ', '_'). (Note space vs. dash difference.)

Why do we need both?  And why should they be different?
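
The difference between the two is easiest to see side by side. The functions below only restate the one-line descriptions given in this message as Python; they have not been checked against the C sources, and the `_sketch` names are made up.

```python
def normalize_encoding_sketch(name):
    # unicodeobject.c, as described above: lowercase, '-' becomes '_'.
    return name.lower().replace('-', '_')


def normalizestring_sketch(name):
    # codecs.c, as described above: lowercase, ' ' becomes '_'.
    return name.lower().replace(' ', '_')


sample = "ISO 8859-1"
print(repr(normalize_encoding_sketch(sample)))  # 'iso 8859_1'
print(repr(normalizestring_sketch(sample)))     # 'iso_8859-1'
```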

--
title: b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1') 
-> b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

As promised, here's the list of places where the wrong Latin-1 encoding 
spelling is used:

Lib//test/test_cmd_line.py:
-- for encoding in ('ascii', 'latin1', 'utf8'):
Lib//test/test_codecs.py:
-- ef = codecs.EncodedFile(f, 'utf-8', 'latin1')
Lib//test/test_shelve.py:
-- shelve.Shelf(d, keyencoding='latin1')[key] = [1]
-- self.assertIn(key.encode('latin1'), d)
Lib//test/test_uuid.py:
-- os.write(fds[1], value.hex.encode('latin1'))
-- child_value = os.read(fds[0], 100).decode('latin1')
Lib//test/test_xml_etree.py:
-- >>> ET.tostring(ET.PI('test', 'testing\xe3'), 'latin1')
-- b"<?xml version='1.0' encoding='latin1'?>\\n<?test testing\\xe3?>"
Lib//urllib/request.py:
-- data = base64.decodebytes(data.encode('ascii')).decode('latin1')
Lib//asynchat.py:
-- encoding= 'latin1'
Lib//sre_parse.py:
-- encode = lambda x: x.encode('latin1')
Lib//distutils/command/bdist_wininst.py:
-- # convert back to bytes. "latin1" simply avoids any possible
-- encoding="latin1") as script:
-- script_data = script.read().encode("latin1")
Lib//test/test_bigmem.py:
-- return s.encode("latin1")
-- return bytearray(s.encode("latin1"))
Lib//test/test_bytes.py:
-- self.assertRaises(UnicodeEncodeError, self.type2test, sample, "latin1")
-- b = self.type2test(sample, "latin1", "ignore")
-- b = self.type2test(sample, "latin1")
Lib//test/test_codecs.py:
-- self.assertEqual("\udce4\udceb\udcef\udcf6\udcfc".encode("latin1", "surrogateescape"),
Lib//test/test_io.py:
-- with open(__file__, "r", encoding="latin1") as f:
-- t.__init__(b, encoding="latin1", newline="\r\n")
-- self.assertEqual(t.encoding, "latin1")
-- for enc in "ascii", "latin1", "utf8" :# , "utf-16-be", "utf-16-le":
Lib//ftplib.py:
-- encoding = "latin1"

I'll fix those later today or tomorrow.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
 
 STINNER Victor victor.stin...@haypocalc.com added the comment:
 
 I think that the normalization function in unicodeobject.c (only used for 
 internal functions) can skip any character different than a-z, A-Z and 0-9. 
 Something like:
 
 >>> import re
 >>> def normalize(name): return re.sub("[^a-z0-9]", "", name.lower())
 ... 
 >>> normalize("UTF-8")
 'utf8'
 >>> normalize("ISO-8859-1")
 'iso88591'
 >>> normalize("latin1")
 'latin1'
 
 So "ISO-8859-1", "ISO8859-1", "LATIN-1", "latin1", "UTF-8", "utf8", etc. will be 
 normalized to "iso88591", "latin1" and "utf8".
 
 I don't know any encoding name where a character outside a-z, A-Z, 0-9 means 
 anything special. But I don't know all encoding names! :-)

I think rather than removing any hyphens, spaces, etc. the
function should additionally:

 * add hyphens whenever (they are missing and) there's switch
   from [a-z] to [0-9]

That way you end up with the correct names for the given set of
optimized encoding names.
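
A minimal Python sketch of the hyphen-insertion rule described above, assuming it runs after lowercasing. The name add_hyphens is hypothetical, and the real shortcut code is C in unicodeobject.c, so this only illustrates the idea:

```python
import re


def add_hyphens(name):
    """Lowercase the name and insert a hyphen at each letter-to-digit switch."""
    # ([a-z])([0-9]) only matches where no hyphen separates letter and digit,
    # so names that already contain the hyphen are left unchanged.
    return re.sub(r"([a-z])([0-9])", r"\1-\2", name.lower())


for raw in ("latin1", "UTF8", "utf-8", "iso8859-1"):
    print(raw, "->", add_hyphens(raw))
```

With such a rule the shortcut checks would only need to compare against the hyphenated spellings.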

--
title: b'x'.decode('latin1') is much slower than b'x'.decode('latin-1') - 
b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

Alexander Belopolsky wrote:
 
 Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
 
 On Thu, Feb 24, 2011 at 11:01 AM, Marc-Andre Lemburg
 rep...@bugs.python.org wrote:
 ..
 On this ticket, we're discussing just one application area: that
 of the builtin short cuts.

 Fair enough.  I was hoping to close this ticket by simply committing
 the posted patch, but it looks like people want to do more.  I don't
 think we'll get measurable performance gains but may improve code
 understandability.
 
 To have more encoding name variants benefit from the optimization,
 we might want to enhance that particular normalization function
 to avoid having to compare against utf8 and utf-8 in the
 encode/decode functions.
 
 Which function are you talking about?
 
 1. normalize_encoding() in unicodeobject.c
 2. normalizestring() in codecs.c

The first one, since that's being used by the shortcuts.

 The first is s.lower().replace('-', '_') and the second is

It does this: s.lower().replace('_', '-')

 s.lower().replace(' ', '_'). (Note space vs. dash difference.)
 
 Why do we need both?  And why should they be different?

Because the first is specifically used for the shortcuts
(which can do more without breaking anything, since it's
only used internally) and the second prepares the encoding
names for lookup in the codec registry (which has a PEP100
defined behavior we cannot easily change).
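
As a rough Python illustration of the two normalizations as they are described in this message (the function names are hypothetical, and the actual implementations are C functions, so details may differ):

```python
def normalize_for_shortcuts(name):
    # Described above for normalize_encoding() in unicodeobject.c:
    # lowercase, then turn underscores into hyphens.
    return name.lower().replace("_", "-")


def normalize_for_registry(name):
    # Described above for normalizestring() in codecs.c:
    # lowercase, then turn spaces into underscores.
    return name.lower().replace(" ", "_")


print(normalize_for_shortcuts("Latin_1"))     # used only for the builtin shortcuts
print(normalize_for_registry("ISO 8859 1"))   # used before the codec registry lookup
```

Because the first function only feeds the internal shortcut comparisons, it can be changed more freely than the second.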

--
title: b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1') 
- b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread STINNER Victor


STINNER Victor victor.stin...@haypocalc.com added the comment:

Ooops, I attached the wrong patch. Here is the new fixed patch.

Without the patch:

>>> import timeit
>>> timeit.Timer("'a'.encode('latin1')").timeit()
3.8540711402893066
>>> timeit.Timer("'a'.encode('latin-1')").timeit()
1.4946870803833008

With the patch:

>>> import timeit
>>> timeit.Timer("'a'.encode('latin1')").timeit()
1.4461820125579834
>>> timeit.Timer("'a'.encode('latin-1')").timeit()
1.463456153869629

>>> timeit.Timer("'a'.encode('UTF-8')").timeit()
0.9479248523712158
>>> timeit.Timer("'a'.encode('UTF8')").timeit()
0.9208409786224365
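
For anyone who wants to repeat this kind of measurement, a small sketch follows. The helper name time_encode and the loop count are assumptions for illustration; timings on a current interpreter will not match the numbers above.

```python
import timeit


def time_encode(encoding, number=100_000):
    """Return the total seconds taken by `number` calls of 'a'.encode(encoding)."""
    # The statement is passed as a string so that timeit compiles and
    # runs it inside its own timing loop.
    stmt = "'a'.encode({!r})".format(encoding)
    return timeit.timeit(stmt, number=number)


for enc in ("latin1", "latin-1", "UTF-8", "UTF8"):
    print(enc, time_encode(enc))
```

Note that Timer.timeit() defaults to one million iterations, so the figures above are totals for that many calls.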

--
Added file: http://bugs.python.org/file20876/aggressive_normalization.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

On Thu, Feb 24, 2011 at 11:31 AM, Marc-Andre Lemburg
rep...@bugs.python.org wrote:
..
 I think rather than removing any hyphens, spaces, etc. the
 function should additionally:

  * add hyphens whenever (they are missing and) there's switch
   from [a-z] to [0-9]


This will do the wrong thing to the cs family of aliases:


The aliases that start with cs have been added for use with the
IANA-CHARSET-MIB as originally defined in RFC3808, and as currently
maintained by IANA at http://www.iana.org/assignments/ianacharset-mib.
Note that the ianacharset-mib needs to be kept in sync with this
registry.  These aliases that start with cs contain the standard
numbers along with suggestive names in order to facilitate applications
that want to display the names in user interfaces.  The cs stands
for character set and is provided for applications that need a lower
case first letter but want to use mixed case thereafter that cannot
contain any special characters, such as underbar (_) and dash (-).


--
title: b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1') 
- b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Steffen Daode Nurpmeso


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

So happy hacker haypo did it, different however.  It's illegal, but since this 
is a static function which only serves some specific internal strcmp(3)s it may 
do for the mentioned charsets.  I won't boot my laptop this evening.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

STINNER Victor wrote:
 
 STINNER Victor victor.stin...@haypocalc.com added the comment:
 
 Ooops, I attached the wrong patch. Here is the new fixed patch.

That won't work, Victor, since it makes invalid encoding
names valid, e.g. 'utf(=)-8'.

We really only want to add the functionality of matching
encodings names with hyphen or not.

Perhaps it's not really worth the trouble as Alexander suggests
and we should simply add the few extra cases where needed.

--
title: b'x'.decode('latin1') is much slower than b'x'.decode('latin-1') - 
b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Marc-Andre Lemburg


Marc-Andre Lemburg m...@egenix.com added the comment:

Alexander Belopolsky wrote:
 
 Alexander Belopolsky belopol...@users.sourceforge.net added the comment:
 
 On Thu, Feb 24, 2011 at 11:31 AM, Marc-Andre Lemburg
 rep...@bugs.python.org wrote:
 ..
 I think rather than removing any hyphens, spaces, etc. the
 function should additionally:

  * add hyphens whenever (they are missing and) there's switch
   from [a-z] to [0-9]

 
 This will do the wrong thing to the cs family of aliases:

We don't support those for the shortcut optimizations.

--
title: b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1') 
- b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

On Thu, Feb 24, 2011 at 11:39 AM, Marc-Andre Lemburg
rep...@bugs.python.org wrote:

 Marc-Andre Lemburg m...@egenix.com added the comment:
..
 That won't work, Victor, since it makes invalid encoding
 names valid, e.g. 'utf(=)-8'.


.. but this *is* valid:

b'abc'

--
title: b'x'.decode('latin1') is much slower thanb'x'.decode('latin-1') 
- b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

>>> 'abc'.encode('utf(=)-8')
b'abc'

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Ezio Melotti


Ezio Melotti ezio.melo...@gmail.com added the comment:

 That won't work, Victor, since it makes invalid encoding
 names valid, e.g. 'utf(=)-8'.

That already works in Python (thanks to encodings.normalize_encoding).
The problem with the patch is that it makes names like 'iso88591' valid.
Normalize to 'iso 8859 1' should solve this problem.
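
The Python-level normalization mentioned above can be tried directly. This sketch only uses encodings.normalize_encoding() and the encodings.aliases table from the standard library; the sample names are taken from this thread:

```python
import encodings
from encodings.aliases import aliases

# Runs of punctuation collapse into a single underscore, which is why
# a name such as 'utf(=)-8' ends up being accepted by the codec lookup.
for raw in ("utf(=)-8", "iso 8859 1", "iso88591"):
    print(repr(raw), "->", repr(encodings.normalize_encoding(raw)))

# The alias table maps normalized spellings to codec module names.
print(aliases.get("latin1"))
```

A name with no separators at all, such as 'iso88591', passes through unchanged, so whether it is accepted depends only on the alias table.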

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Éric Araujo


Éric Araujo mer...@netwok.org added the comment:

Agreed with Marc-André.  It seems too magic and error-prone to do anything other 
than strip hyphens and spaces.

Steffen: This is a rather minor change in an area that is well known by several 
developers, so don’t take it personally that Victor went ahead and made a quick 
patch.  Patches for other bugs are welcome!  Thanks for your wanting to help.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Steffen Daode Nurpmeso


Steffen Daode Nurpmeso sdao...@googlemail.com added the comment:

That's ok by me.
And 'happy hacker haypo' was not meant to be unfriendly, I've only repeated the first 
response I've ever posted back to this tracker (guess who was very fast at that 
time :)).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11234] Error in What's new 3.2rc3 with sysconfig.get_config_var('SO')

2011-02-24 Thread Éric Araujo


Changes by Éric Araujo mer...@netwok.org:


--
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11234
___

[issue11307] re engine exhaustively explores more than necessary

2011-02-24 Thread Matthew Barnett


Matthew Barnett pyt...@mrabarnett.plus.com added the comment:

It's a known issue (see issue #1662581, for example).

There's a new implementation at PyPI which doesn't have this problem:

http://pypi.python.org/pypi/regex

--
nosy: +mrabarnett

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11307
___

[issue10868] ABCMeta.register() should work as a decorator

2011-02-24 Thread Éric Araujo


Éric Araujo mer...@netwok.org added the comment:

Committed to py3k as r88545.  You’ll notice that I fixed the nesting of the 
versionchanged directive and that I changed my mind about “returns”.  Thanks 
again!

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10868
___

[issue11310] Document byte[s|array]() and byte[s|array](count) in docstrings

2011-02-24 Thread Terry J. Reedy


New submission from Terry J. Reedy tjre...@udel.edu:

The entry for bytearray(source...) says

The optional source parameter can be used to initialize the array in a few 
different ways:
...
If it is an integer, the array will have that size and will be initialized with 
null bytes. 
...
Without an argument, an array of size 0 is created.

[integer must be non-negative -- patch adds this]
The entry for bytes(source...) refers back to the bytearray entry.

The docstrings for bytes and bytearray omit both possibilities.
Attached is a possible patch to include them.
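
The two call forms in question can be checked with a short sketch like the following, which uses only the builtin bytes and bytearray types:

```python
# No argument: an empty object.
empty_array = bytearray()
empty_bytes = bytes()

# A non-negative integer: that many null bytes.
zeros_array = bytearray(3)
zeros_bytes = bytes(2)

print(empty_array, empty_bytes, zeros_array, zeros_bytes)

# A negative count is rejected.
try:
    bytearray(-1)
except ValueError as exc:
    print("negative count rejected:", exc)
```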

--
assignee: docs@python
components: Documentation
files: zbytes.diff
keywords: patch
messages: 129299
nosy: docs@python, terry.reedy
priority: normal
severity: normal
status: open
title: Document byte[s|array]() and byte[s|array](count) in docstrings
versions: Python 3.2, Python 3.3
Added file: http://bugs.python.org/file20877/zbytes.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11310
___

[issue10868] ABCMeta.register() should work as a decorator

2011-02-24 Thread Daniel Stutzbach


Daniel Stutzbach stutzb...@google.com added the comment:

In what use-cases would you want to call MyABC.register() when defining a class 
instead of inheriting from MyABC?

I always thought of register() as a hack to make it possible to support types 
written in C, which can't inherit from the ABC.

--
nosy: +stutzbach

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10868
___

[issue11309] #include wctype.h in Objects/unicodetype_db.h and Objects/unicodectype.c

2011-02-24 Thread Amaury Forgeot d'Arc


Amaury Forgeot d'Arc amaur...@gmail.com added the comment:

--with-wctype-functions was removed in 3.2 (see issue9210, r84752)

--
nosy: +amaury.forgeotdarc

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11309
___

[issue11286] Some trivial python 2.x pickles fails to load in Python 3.2

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

Committed in r88546 (3.3) and r88548 (3.2).

Note that a simple work-around before 3.2.1 is to spell encoding as 'latin-1' 
or 'iso-8859-1' in pickle.loads().
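
A sketch of that work-around follows. The pickle bytes below are a hand-written protocol 0 pickle of a Python 2 str, made up for illustration and not taken from the original report:

```python
import pickle

# Protocol 0 pickle of the Python 2 str 'caf\xe9': the STRING opcode
# with an escaped repr, a PUT into memo slot 0, then STOP.
py2_pickle = b"S'caf\\xe9'\np0\n."

# Spell the encoding with the hyphen, as suggested above.
text = pickle.loads(py2_pickle, encoding="latin-1")
print(text)
```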

--
components: +Extension Modules -Library (Lib)
resolution:  -> fixed
stage: commit review -> committed/rejected
status: open -> closed
versions: +Python 3.3

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11286
___

[issue10882] Add os.sendfile()

2011-02-24 Thread Giampaolo Rodola'


Giampaolo Rodola' g.rod...@gmail.com added the comment:

I'm going to commit the patch and then watch whether some of the buildbots turn 
red.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10882
___

[issue11286] Some trivial python 2.x pickles fails to load in Python 3.2

2011-02-24 Thread Antoine Pitrou


Antoine Pitrou pit...@free.fr added the comment:

I've committed the part of the patch which disallows a NULL data pointer with 
PyMemoryView_FromBuffer in r88550 and r88551.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11286
___

[issue11286] Some trivial python 2.x pickles fails to load in Python 3.2

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

On Thu, Feb 24, 2011 at 3:54 PM, Antoine Pitrou rep...@bugs.python.org wrote:
..
 I've committed the part of the patch which disallows a NULL data pointer
 with PyMemoryView_FromBuffer in r88550 and r88551.

Is it possible to create such buffer in Python (other than by
exploiting a bug or writing a rogue extension module)?  If not, this
should be a SystemError or even just an assert() rather than
ValueError.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11286
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Ezio Melotti


Ezio Melotti ezio.melo...@gmail.com added the comment:

The attached patch is a proof of concept to see if Steffen's proposal might be 
viable.

I wrote another normalize_encoding function that implements the algorithm 
described in msg129259, adjusted the shortcuts and did some timings. (Note: the 
function is not tested extensively and might break. It might also be optimized 
further.)

These are the results:
# $ command
# result with my patch
# result without
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('latin1')"
100 loops, best of 3: 0.626 usec per loop
10 loops, best of 3: 2.03 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('latin-1')"
100 loops, best of 3: 0.614 usec per loop
100 loops, best of 3: 0.616 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('iso-8859-1')"
100 loops, best of 3: 0.993 usec per loop
100 loops, best of 3: 0.649 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('iso8859_1')"
100 loops, best of 3: 1.01 usec per loop
10 loops, best of 3: 2.08 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('iso_8859_1')"
100 loops, best of 3: 0.734 usec per loop
100 loops, best of 3: 0.694 usec per loop
wolf@hp:~/dev/py/py3k$ ./python -m timeit "b'x'.decode('utf8')"
100 loops, best of 3: 0.728 usec per loop
10 loops, best of 3: 6.37 usec per loop

--
Added file: http://bugs.python.org/file20878/issue11303.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11286] Some trivial python 2.x pickles fails to load in Python 3.2

2011-02-24 Thread Antoine Pitrou


Antoine Pitrou pit...@free.fr added the comment:

  I've committed the part of the patch which disallows a NULL data pointer
  with PyMemoryView_FromBuffer in r88550 and r88551.
 
 Is it possible to create such buffer in Python (other than by
 exploiting a bug or writing a rogue extension module)?  If not, this
 should be a SystemError or even just an assert() rather than
 ValueError.

I'm against asserts for such use, since they get disabled in non-debug
mode (which is the mode 99.99% of users run in).

As for SystemError, it means "Internal error in the Python interpreter",
which isn't the case here (most likely it's an error in an extension
module instead, possibly a third-party one).

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11286
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

+char lower[strlen(encoding)*2];

Is this valid in C-89?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread Ezio Melotti


Ezio Melotti ezio.melo...@gmail.com added the comment:

Probably not, but that part should be changed if possible, because it is less 
efficient than the previous version, which allocated only 11 bytes.

The problem here is that the previous version only changed or removed 
chars, whereas this one might add spaces too, so the string might get longer. E.g. 
'utf8' -> 'utf 8'. The worst case is 'a1a1a1' -> 'a 1 a 1 a 1', and including 
the trailing \0, the result might end up being twice as long as the original 
encoding string. It can be fixed by returning 0 as soon as the normalized string 
reaches a fixed threshold (something like 15 chars, depending on the longest 
normalized encoding name).
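
To make the length argument concrete, here is a rough Python sketch of a normalization that inserts a space at every letter/digit switch. It is an assumption based on the examples in this message, not the algorithm from msg129259 or the C code in the patch, and the name normalize_with_spaces is hypothetical:

```python
import re


def normalize_with_spaces(name):
    """Lowercase, turn separators into spaces, split letters from digits."""
    # Any run of characters other than a-z and 0-9 becomes one space.
    s = re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()
    # Zero-width match at each letter/digit boundary: insert a space there.
    return re.sub(r"(?<=[a-z])(?=[0-9])|(?<=[0-9])(?=[a-z])", " ", s)


for raw in ("utf8", "UTF-8", "iso-8859-1", "a1a1a1"):
    out = normalize_with_spaces(raw)
    # +1 accounts for the trailing NUL byte a C buffer would need.
    print(repr(raw), "->", repr(out), "buffer bytes:", len(out) + 1)
```

For 'a1a1a1' the output needs 12 bytes including the terminator, exactly twice the input length, which is the worst case described above.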

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue10868] ABCMeta.register() should work as a decorator

2011-02-24 Thread Éric Araujo


Éric Araujo mer...@netwok.org added the comment:

Someone may want to register with an ABC but not inherit methods or add a class 
to the mro.  It’s always been allowed by the register method; the new decorator 
feature is just a very minor nicety on top of that.
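
As an illustration of the difference, a minimal sketch follows; the class names Shape and Square are made up for this example:

```python
from abc import ABC


class Shape(ABC):
    def describe(self):
        return "a shape"


# Registering as a virtual subclass: issubclass() succeeds, but nothing
# is inherited and Shape does not appear in the MRO.
@Shape.register
class Square:
    pass


print(issubclass(Square, Shape))
print(Shape in Square.__mro__)
print(hasattr(Square, "describe"))
```

The decorator form works because register() returns the class it was given.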

Edoardo, was your request motivated by a real use case where you didn’t want to 
inherit from the ABC?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10868
___

[issue11286] Some trivial python 2.x pickles fails to load in Python 3.2

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

It seems appropriate to consult python-dev on this.  I thought
ValueError was for values that are valid Python objects but out of
acceptable range of the function.  Errors that can only be triggered
in C code are normally handled with either assert() or raise SystemError.
I think you are splitting hairs too thin by distinguishing between
stdlib and 3rd party extensions.

On Thu, Feb 24, 2011 at 4:07 PM, Antoine Pitrou rep...@bugs.python.org wrote:

 Antoine Pitrou pit...@free.fr added the comment:

  I've committed the part of the patch which disallows a NULL data pointer
  with PyMemoryView_FromBuffer in r88550 and r88551.

 Is it possible to create such buffer in Python (other than by
 exploiting a bug or writing a rogue extension module)?  If not, this
 should be a SystemError or even just an assert() rather than
 ValueError.

 I'm against asserts for such use, since they get disabled in non-debug
 mode (which is the mode 99.99% of users run in).

 As for SystemError, it means Internal error in the Python interpreter,
 which isn't the case here (most likely it's an error in an extension
 module instead, possibly a third-party one).

 --

 ___
 Python tracker rep...@bugs.python.org
 http://bugs.python.org/issue11286
 ___


--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11286
___

[issue11286] Some trivial python 2.x pickles fails to load in Python 3.2

2011-02-24 Thread Antoine Pitrou


Antoine Pitrou pit...@free.fr added the comment:

 It seems appropriate to consult python-dev on this.  I thought
 ValueError was for values that are valid Python objects but out of
 acceptable range of the function.  Errors that can only be triggered
 in C code normally handled with either assert() or raise SystemError.
 I think you are splitting hairs too thin by distinguishing between
 stdlib and 3rd party extensions.

Hey, I'm not sure who's splitting hairs here... Nick was ok with the
initial patch and that was sufficient as far as I'm concerned, but you
can ask python-dev if you care.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11286
___

[issue11258] ctypes: Speed up find_library() on Linux by 500%

2011-02-24 Thread Antoine Pitrou


Antoine Pitrou pit...@free.fr added the comment:

Thanks for the new patch. Looking again, I wonder if there's a reason the 
original regexp was so complicated. ldconfig output here has lines such as:

libBrokenLocale.so.1 (libc6,x86-64, OS ABI: Linux 2.6.9) => /lib64/libBrokenLocale.so.1
libBrokenLocale.so.1 (libc6, OS ABI: Linux 2.6.9) => /lib/libBrokenLocale.so.1
libBrokenLocale.so (libc6,x86-64, OS ABI: Linux 2.6.9) => /usr/lib64/libBrokenLocale.so
libBrokenLocale.so (libc6, OS ABI: Linux 2.6.9) => /usr/lib/libBrokenLocale.so

Ideally we would factor out the parsing to a separate private function, and 
have tests for it.
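
One possible shape for such a helper, as a sketch only: the name parse_ldconfig_line and the regular expression are assumptions, not the code from the patch, and it assumes that "ldconfig -p" separates the library name from its path with "=>".

```python
import re

# name (flags) => path, with optional leading whitespace
_LDCONFIG_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<info>[^)]*)\)\s+=>\s+(?P<path>\S+)\s*$"
)


def parse_ldconfig_line(line):
    """Return (name, info, path) for one ldconfig -p entry, or None."""
    m = _LDCONFIG_LINE.match(line)
    if m is None:
        return None  # header lines and anything unexpected
    return m.group("name"), m.group("info"), m.group("path")


sample = ("\tlibBrokenLocale.so.1 (libc6,x86-64, OS ABI: Linux 2.6.9)"
          " => /lib64/libBrokenLocale.so.1")
print(parse_ldconfig_line(sample))
```

Keeping the parsing in a pure function like this makes it testable without running ldconfig.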

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11258
___

[issue11311] StringIO.readline(0) returns incorrect results

2011-02-24 Thread Ville Skyttä


New submission from Ville Skyttä ville.sky...@iki.fi:

Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) 
[GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import StringIO
>>> StringIO.StringIO("foo").readline(0)
'foo'

I don't think this is the correct behavior, or at least it is not consistent 
with other file objects' readline() which return an empty string with 
.readline(0).

For example:

>>> import cStringIO
>>> cStringIO.StringIO("foo").readline(0)
''

...or:

>>> file("/usr/bin/python").readline(0)
''
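
For comparison, a short check of the same call against the Python 3 io module. This is not the Python 2 StringIO module from the report, so it only shows the behavior the report treats as the consistent one:

```python
import io

# A size of 0 asks for at most zero characters, so nothing is returned.
print(repr(io.StringIO("foo").readline(0)))
print(repr(io.BytesIO(b"foo").readline(0)))

# Without a size argument the whole line comes back.
print(repr(io.StringIO("foo").readline()))
```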

--
components: Library (Lib)
messages: 129314
nosy: scop
priority: normal
severity: normal
status: open
title: StringIO.readline(0) returns incorrect results
type: behavior
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11311
___

[issue11258] ctypes: Speed up find_library() on Linux by 500%

2011-02-24 Thread Jonas H.


Jonas H. jo...@lophus.org added the comment:

As far as I can tell, it doesn't matter.

We're looking for the part after the => in any case - ignoring the 
ABI/architecture information - so the regex would choose the first of those 
entries.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11258
___

[issue11312] Confusing sentence in file.readline() doc

2011-02-24 Thread Ville Skyttä


New submission from Ville Skyttä ville.sky...@iki.fi:

http://docs.python.org/library/stdtypes.html#file.readline

"An empty string is returned only when EOF is encountered immediately."

I think this sentence is misleading, especially because the word "only" in it is 
emphasized: an empty string is also returned when the size argument is 
0 (except for StringIO, but I think that's a bug, see #11311).  I suggest 
rephrasing it as:

"An empty string is returned only when EOF is encountered immediately or the 
size argument is zero."

...or just removing the sentence altogether.  Text before it already covers the 
size=0 case.

--
assignee: docs@python
components: Documentation
messages: 129316
nosy: docs@python, scop
priority: normal
severity: normal
status: open
title: Confusing sentence in file.readline() doc
versions: Python 2.7

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11312
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue11286] Some trivial python 2.x pickles fails to load in Python 3.2

2011-02-24 Thread Nick Coghlan


Nick Coghlan ncogh...@gmail.com added the comment:

A SystemError indicates that an internal API was given bogus input or produced 
bogus output (i.e. we screwed up somewhere, or a third party is messing with 
interfaces they shouldn't be).

If data validation fails for part of the public C API (whether it is visible to 
Python code or not), then ValueError is the right thing to raise.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11286
___

[issue11313] Speed up default encode()/decode()

2011-02-24 Thread Alexander Belopolsky


New submission from Alexander Belopolsky belopol...@users.sourceforge.net:

In Python 3.x the default encoding is always utf-8, but encode()/decode() still 
try to look it up.  The attached patch eliminates a call to normalize_encoding 
and several strcmp() calls.
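
For illustration, a sketch of the behaviour being optimized: calling encode()/decode() with no argument gives the same result as naming utf-8 explicitly. The timing helper is a hypothetical addition, and its numbers will vary by build and machine:

```python
import timeit

text = "caf\u00e9"

# No argument: the default encoding (utf-8 in Python 3) is used.
default_bytes = text.encode()
explicit_bytes = text.encode("utf-8")
round_trip = default_bytes.decode()


def compare(number=100000):
    """Return (default, explicit) timings in seconds for str.encode()."""
    default = timeit.timeit("s.encode()", setup="s = 'x'", number=number)
    explicit = timeit.timeit("s.encode('utf-8')", setup="s = 'x'", number=number)
    return default, explicit


print(default_bytes == explicit_bytes, round_trip == text)
```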

--
files: default-encode.diff
keywords: patch
messages: 129318
nosy: belopolsky
priority: normal
severity: normal
status: open
title: Speed up default encode()/decode()
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file20879/default-encode.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11313
___

[issue11313] Speed up default encode()/decode()

2011-02-24 Thread Ezio Melotti


Changes by Ezio Melotti ezio.melo...@gmail.com:


--
nosy: +ezio.melotti

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11313
___

[issue11314] Subprocess suffers 40% process creation overhead penalty

2011-02-24 Thread Antoine Pitrou


Changes by Antoine Pitrou pit...@free.fr:


--
nosy: +gregory.p.smith

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11314
___

[issue11314] Subprocess suffers 40% process creation overhead penalty

2011-02-24 Thread STINNER Victor


STINNER Victor victor.stin...@haypocalc.com added the comment:

Python 3.2 has a _posixsubprocess: some parts of subprocess are implemented in 
C. Can you try it?

Python 3.2 also uses pipe2(), if available, to avoid the extra fcntl(4, 
F_GETFD)+fcntl(4, F_SETFD, FD_CLOEXEC).

I suppose that the pipe and mmap(NULL, 1052672, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) are used by subprocess to transfer a Python 
exception from the child process. Python 2.7 encodes the exception value, type 
and traceback using pickle, and the parent process calls os.read(1048576). 
Python 3.2 only encodes the exception value and type using a simple string; the 
parent process uses a bytearray with chunks of 50,000 bytes (stopping when the 
total size is bigger than 50,000 bytes). So I suppose that Python 3.2 allocates 
less memory in the parent process to read the child exception (if any).

You may also try to change the Popen buffer size, but it should not change 
anything if you test "exit 0".
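
A minimal sketch of the general idea (reporting a child's exception to the parent over a pipe as a simple string). This is not the subprocess module's actual code; run_in_child() and the "Type:message" format are assumptions made for the example, and it needs a POSIX system with os.fork():

```python
import os


def run_in_child(func):
    """Run func in a forked child; return None or a 'Type:message' string."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report any exception as a plain string, then exit at once.
        os.close(read_fd)
        try:
            func()
        except Exception as exc:
            report = "%s:%s" % (type(exc).__name__, exc)
            os.write(write_fd, report.encode("utf-8", "replace"))
        finally:
            os.close(write_fd)
            os._exit(0)
    # Parent: read the report in chunks until the child closes its end.
    os.close(write_fd)
    data = bytearray()
    while True:
        part = os.read(read_fd, 50000)
        if not part:
            break
        data += part
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data.decode("utf-8", "replace") or None


def boom():
    raise ValueError("bad input")


print(run_in_child(boom))
```

An empty read means the child finished without reporting anything, so the parent only allocates as much as the report needs.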

--
nosy: +haypo

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11314
___

[issue11314] Subprocess suffers 40% process creation overhead penalty

2011-02-24 Thread Antoine Pitrou


Antoine Pitrou pit...@free.fr added the comment:

I think your analysis is wrong. These mmap() calls are for anonymous memory, 
most likely they are emitted by the libc's malloc() to get some memory from the 
kernel. In other words they will be blazingly fast.

I would suggest you try to dig deeper. For example, how much CPU time does the 
parent process take (excluding its children)?

Of course, I also disagree with the idea that spawning "exit 0" subprocesses is 
a performance critical operation ;) Therefore, it would be useful to know the 
absolute overhead difference (in milliseconds) between subprocess and 
os.popen(), to decide if there is really a problem.
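
One way such a measurement could be sketched, assuming a POSIX shell is available. The helper names are hypothetical. Note that in Python 3 os.popen() is itself built on subprocess, so the comparison is only meaningful on the 2.x versions discussed here; the sketch just shows the shape of the harness:

```python
import os
import subprocess
import time


def per_call_ms(spawn, repeat=20):
    """Average wall-clock cost of one spawn() call, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        spawn()
    return (time.perf_counter() - start) * 1000.0 / repeat


def parent_cpu_seconds():
    """CPU time used by this process only; children are counted separately."""
    times = os.times()
    return times.user + times.system


def via_subprocess():
    subprocess.call("exit 0", shell=True)


def via_popen():
    os.popen("exit 0").close()


before = parent_cpu_seconds()
subprocess_ms = per_call_ms(via_subprocess)
popen_ms = per_call_ms(via_popen)
parent_cpu = parent_cpu_seconds() - before

print("subprocess: %.3f ms/call, os.popen: %.3f ms/call" % (subprocess_ms, popen_ms))
print("parent CPU time: %.3f s" % parent_cpu)
```

Reporting the absolute per-call difference alongside the parent's own CPU time separates module overhead from the cost of the fork/exec itself.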

--
nosy: +pitrou

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11314
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread STINNER Victor


STINNER Victor victor.stin...@haypocalc.com added the comment:

> That won't work, Victor, since it makes invalid encoding
> names valid, e.g. 'utf(=)-8'.

> .. but this *is* valid: ...

Ah yes, it's because of encodings.normalize_encoding(). It's funny: we have 3 
functions to normalize an encoding name, and each function does something 
different :-) E.g. encodings.normalize_encoding() doesn't replace non-ASCII 
letters, and doesn't convert to lowercase.

more_aggressive_normalization.patch changes all 3 normalization functions and 
adds tests for encodings.normalize_encoding().

I think that speed and backward compatibility are more important than 
conforming to IANA or other standards.

Even if ~~ utf#8 ~~ is ugly, I don't think that it really matters that we 
accept it.

--

If you don't want to touch the normalization functions and just add more 
aliases in C fast-paths: we should also add utf8, utf16 and utf32.

Use of utf8 in Python: random.Random.seed(), 
smtpd.SMTPChannel.collect_incoming_data(), tarfile, multiprocessing.connection 
(xml serialization)

PS: On error, UTF-8 decoder raises a UnicodeDecodeError with utf8 as the 
encoding name :-)
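
To illustrate the aliasing discussed above, a short sketch using the public codecs and encodings modules. It only shows that the dashed and dash-less spellings resolve to the same codec; it says nothing about which C fast-path is taken:

```python
import codecs
import encodings

# Different spellings resolve to the same codec after normalization.
latin_names = {codecs.lookup(n).name for n in ("latin1", "latin-1", "Latin_1")}
utf8_names = {codecs.lookup(n).name for n in ("utf8", "utf-8", "UTF-8")}

# encodings.normalize_encoding() maps punctuation to underscores; as noted
# above, it does not lowercase the name.
normalized = encodings.normalize_encoding("latin-1")

print(latin_names, utf8_names, normalized)
```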

--
Added file: http://bugs.python.org/file20880/more_aggressive_normalization.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11303] b'x'.decode('latin1') is much slower than b'x'.decode('latin-1')

2011-02-24 Thread STINNER Victor


STINNER Victor victor.stin...@haypocalc.com added the comment:

 more_aggressive_normalization.patch

Whoops, the normalizestring() comment points to itself.

normalize_encoding() might also point to the C implementations, at least in a 
# comment.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11303
___

[issue11313] Speed up default encode()/decode()

2011-02-24 Thread Ezio Melotti


Ezio Melotti ezio.melo...@gmail.com added the comment:

Patch looks good.
I checked the tests and couldn't find any tests for .encode()/.decode() without 
an encoding, so I added them in the attached patch.

--
components: +Interpreter Core
stage:  - commit review
Added file: http://bugs.python.org/file20881/issue11313.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11313
___

[issue11313] Speed up default encode()/decode()

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

Thanks for the review and the tests.  I have found one more place that can be 
easily optimized.  (See patch below.) The decode() methods in bytes and 
bytearray are unfortunately not so easy, because for some reason they are 
written to accept any object as self, not only bytes/bytearray.  As a result, it 
is not that easy to short-circuit the default case in these instances.  With 
respect to the patch below, I'll make sure that there is a test for it.


===
--- Python/getargs.c (revision 88545)
+++ Python/getargs.c (working copy)
@@ -1010,8 +1010,6 @@
 
         /* Get 'e' parameter: the encoding name */
         encoding = (const char *)va_arg(*p_va, const char *);
-        if (encoding == NULL)
-            encoding = PyUnicode_GetDefaultEncoding();
 
         /* Get output buffer parameter:
            's' (recode all objects via Unicode) or
@@ -1051,9 +1049,12 @@
                               arg, msgbuf, bufsize);
 
         /* Encode object; use default error handling */
-        s = PyUnicode_AsEncodedString(u,
-                                      encoding,
-                                      NULL);
+        if (encoding == NULL)
+            s = PyUnicode_AsUTF8String(u);
+        else
+            s = PyUnicode_AsEncodedString(u,
+                                          encoding,
+                                          NULL);
         Py_DECREF(u);
         if (s == NULL)
             return converterr("(encoding failed)",

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11313
___

[issue11171] Python 2.7.1 does not start when ./configure is used with --prefix != --exec-prefix

2011-02-24 Thread Éric Araujo


Éric Araujo mer...@netwok.org added the comment:

Barry, could you try reproducing with distutils.sysconfig?

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11171
___

[issue11313] Speed up default encode()/decode()

2011-02-24 Thread Alexander Belopolsky


Alexander Belopolsky belopol...@users.sourceforge.net added the comment:

Committed issue11313.diff in revision 88553.

On second thought, the getargs optimization is not worth the trouble, 
because in existing sources the 'e' code is used with constant encodings and one 
is unlikely to pass NULL as an encoding because that is equivalent to eliding 
the 'e' code altogether.

I will keep this issue open to consider whether the remaining

    if (encoding == NULL)
        encoding = PyUnicode_GetDefaultEncoding();

clauses can be simply removed because the lower level code accepts NULL in 
encoding.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11313
___

[issue11171] Python 2.7.1 does not start when ./configure is used with --prefix != --exec-prefix

2011-02-24 Thread Barry A. Warsaw


Barry A. Warsaw ba...@python.org added the comment:

On Feb 25, 2011, at 12:35 AM, Éric Araujo wrote:

> Éric Araujo mer...@netwok.org added the comment:
>
> Barry, could you try reproducing with distutils.sysconfig?

I'm not quite sure what you mean, but configuring Python 3.1 with different
--prefix and --exec-prefix works just fine.

--
title: Python 2.7.1 does not start when ./configure is used with  --prefix 
!= --exec-prefix - Python 2.7.1 does not start when ./configure is used 
with --prefix != --exec-prefix

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11171
___

[issue11287] Add context manager support to dbm modules

2011-02-24 Thread Ray.Allen


Ray.Allen ysj@gmail.com added the comment:

Here is the patch.

--
keywords: +patch
Added file: http://bugs.python.org/file20882/issue11287.diff

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11287
___

[issue11315] Cookie.py breaks when passed unicode, fix included

2011-02-24 Thread Alexander Tsepkov


New submission from Alexander Tsepkov atsep...@gmail.com:

In Lib/Cookie.py, the BaseCookie load() method performs the following 
comparison on line 624:

str(rawdata) == str()

This breaks when a unicode string is passed in for rawdata. I've included a 
patch that fixes this issue by using an isinstance(rawdata, basestring) check 
instead. Additionally, the patch encodes rawdata as ASCII before sending it to 
__ParseString(), since that method does not support unicode.
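
A Python 3 analogue of the problem, as a sketch only: the report concerns Python 2's unicode vs. str, so TaggedStr below is a hypothetical stand-in for "a string that is not exactly str". It shows why an exact type comparison rejects input that an isinstance() check accepts:

```python
from http.cookies import SimpleCookie


class TaggedStr(str):
    """Hypothetical str subclass standing in for a different string type."""


raw = TaggedStr("session=abc123")

# An exact type comparison fails for anything that is not precisely str...
exact_type_match = type(raw) == type("")
# ...while isinstance() accepts the whole family of string types.
isinstance_match = isinstance(raw, str)

# Python 3's SimpleCookie.load() accepts the subclass and parses it.
cookie = SimpleCookie()
cookie.load(raw)
session_value = cookie["session"].value

print(exact_type_match, isinstance_match, session_value)
```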

--
components: Unicode
files: cookie_patch.patch
keywords: patch
messages: 129330
nosy: Alexander.Tsepkov
priority: normal
severity: normal
status: open
title: Cookie.py breaks when passed unicode, fix included
versions: Python 2.6
Added file: http://bugs.python.org/file20883/cookie_patch.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11315
___

[issue11311] StringIO.readline(0) returns incorrect results

2011-02-24 Thread Alex


Alex alex.gay...@gmail.com added the comment:

Fun fact: io.StringIO does the right thing, in both _io and _pyio.

--
nosy: +alex

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11311
___

[issue10516] Add list.clear() and list.copy()

2011-02-24 Thread Eli Bendersky


Eli Bendersky eli...@gmail.com added the comment:

A slightly revised patch committed in revision 88554:

1. Fixed Éric's whitespace comment
2. Fixed a test in test_descrtut.py which was listing list's methods
3. Moved the change to collections.py onto Lib/collections/__init__.py
4. Added NEWS entry

Éric - as I mentioned earlier in this issue, I chose to leave the syntax of the 
docstrings for the new methods similar to the same methods in dict (the dict 
docstrings look better and more internally consistent anyhow). I propose to move 
further discussion of this matter into a separate issue which will deal with 
the overall (in)consistency in the docstrings of list and dict.
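
For reference, a short sketch of how the two new list methods behave (Python 3.3 and later); the variable names are only for illustration:

```python
original = [1, [2, 3], 4]

# copy() returns a new list, equivalent to original[:] (a shallow copy).
duplicate = original.copy()
duplicate.append(5)        # does not affect original
duplicate[1].append(99)    # the inner list is shared by both lists

snapshot = list(original)

# clear() empties the list in place, so every alias sees the change.
alias = original
original.clear()

print(snapshot, duplicate, alias)
```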

--
stage: patch review - commit review

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue10516
___

[issue11314] Subprocess suffers 40% process creation overhead penalty

2011-02-24 Thread Aaron Sherman


Aaron Sherman a...@ajs.com added the comment:

"Python 3.2 has a _posixsubprocess: some parts of subprocess are implemented in 
C. Can you try it?"

I don't have a Python 3 installation handy, but I can see what I can do 
tomorrow evening to get one set up and try it out.

"disagree with the idea that spawning 'exit 0' subprocesses is a performance 
critical operation ;)"

How else would you performance test process creation overhead? By introducing 
as little additional overhead as possible, it's possible for me to get fairly 
close to measuring just the subprocess module's overhead.

If you stop to think about it, though, this is actually a shockingly huge 
percent increase. In any process creation scenario I'm familiar with, its 
overhead should be so small that you could bump it up several orders of 
magnitude and still not compete with executing a shell and asking it to do 
anything, even just exit.

And yet, here we are. 40%

I understand that most applications won't be running massive numbers of 
external commands in parallel, and that's the only way this overhead will 
really matter (at least that I can think of). But in the two scenarios I 
mentioned (monitoring and Web services such as CGI, neither of which is 
particularly rare), this is going to make quite a lot of difference, and if 
you're going to deprecate os.popen, I would think that making sure your 
proposed replacement was at least nearly as performant would be standard 
procedure, no?

"I think your analysis is wrong. These mmap() calls are for anonymous memory, 
most likely they are emitted by the libc's malloc() to get some memory from the 
kernel. In other words they will be blazingly fast."

The mremap might be a bit of a performance hit, but it's true that these calls 
should not be substantially slowing execution... then again, they might 
indicate that there's substantial amounts of work being done for which memory 
allocation is required, and as such may simply be a symptom of the actual 
problem.

--

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11314
___

[issue11015] Bring test.support docs up to date

2011-02-24 Thread Eli Bendersky


Eli Bendersky eli...@gmail.com added the comment:

Following the python-dev discussion, attaching a patch for removing fcmp and 
replacing its uses with assertAlmostEqual when needed.

All tests pass and patchcheck is clean.

Please review before I commit.
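
A minimal sketch of the kind of replacement described, using a made-up test case rather than one from the patch. assertAlmostEqual compares floats after rounding the difference (7 decimal places by default):

```python
import unittest


class FloatComparison(unittest.TestCase):
    """Hypothetical test showing assertAlmostEqual in place of fcmp()."""

    def test_sum_is_close(self):
        # 0.1 + 0.2 is not exactly 0.3 in binary floating point.
        self.assertNotEqual(0.1 + 0.2, 0.3)
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_explicit_places(self):
        self.assertAlmostEqual(3.14159, 3.1416, places=4)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FloatComparison)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```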

--
nosy: +terry.reedy
Added file: http://bugs.python.org/file20884/issue11015.py3k.remove_fcmp.1.patch

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue11015
___
