Stefan Behnel, 25.08.2011 23:30:
Stefan Behnel, 25.08.2011 20:47:
"Martin v. Löwis", 24.08.2011 20:15:
- issues to be considered (unclarities, bugs, limitations, ...)
A problem of the current implementation is the need for calling
PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g.
On Fri, Aug 26, 2011 at 2:04 PM, Andrew Pennebaker
wrote:
> Please have the Windows installers add the Python installation directory to
> the PATH environment variable.
Please read PEP 397: Python Launcher for Windows.
Or at least do us the courtesy of acknowledging that if the issue was
as simp
Isaac Morland, 26.08.2011 04:28:
On Thu, 25 Aug 2011, Guido van Rossum wrote:
I'm not sure what should happen with UTF-8 when it (in flagrant
violation of the standard, I presume) contains two separately-encoded
surrogates forming a valid surrogate pair; probably whatever the UTF-8
codec does on
+ 0 for automatically adding to %PATH%
+ 1 for providing an option to the user during install
- John
On Thu, Aug 25, 2011 at 9:04 PM, Andrew Pennebaker <
andrew.penneba...@gmail.com> wrote:
> Please have the Windows installers add the Python installation directory to
> the PATH environment var
Please have the Windows installers add the Python installation directory to
the PATH environment variable.
Many newbies dive in without knowing that they must manually add C:\PythonXY
to PATH. It's yak shaving, something perfectly automatable that should have
been done by the installers way back i
On Thu, Aug 25, 2011 at 7:28 PM, Isaac Morland wrote:
> On Thu, 25 Aug 2011, Guido van Rossum wrote:
>
>> I'm not sure what should happen with UTF-8 when it (in flagrant
>> violation of the standard, I presume) contains two separately-encoded
>> surrogates forming a valid surrogate pair; probably
On Thu, Aug 25, 2011 at 6:40 PM, Ezio Melotti wrote:
> On Fri, Aug 26, 2011 at 1:54 AM, Guido van Rossum wrote:
>>
>> On Wed, Aug 24, 2011 at 3:06 AM, Terry Reedy wrote:
>> > Excuse me for believing the fine 3.2 manual that says
>> > "Strings contain Unicode characters." (And to a naive reader,
On Thu, 25 Aug 2011, Guido van Rossum wrote:
I'm not sure what should happen with UTF-8 when it (in flagrant
violation of the standard, I presume) contains two separately-encoded
surrogates forming a valid surrogate pair; probably whatever the UTF-8
codec does on a wide build today should be goo
On Fri, Aug 26, 2011 at 1:54 AM, Guido van Rossum wrote:
> On Wed, Aug 24, 2011 at 3:06 AM, Terry Reedy wrote:
> > Excuse me for believing the fine 3.2 manual that says
> > "Strings contain Unicode characters." (And to a naive reader, that implies
> > that string iteration and indexing should
On Thu, Aug 25, 2011 at 4:58 AM, Stephen J. Turnbull wrote:
> The problem with your legalistic approach, as I see it, is that if our
> definition is looser than the users', all their surprises will be
> unpleasant. That's not good.
I see no alternative to explicitly spelling out what all operati
On Thu, Aug 25, 2011 at 2:39 AM, Stephen J. Turnbull wrote:
> If our process is working with an external process (the OS's file
> system driver) whose definition includes the statement that "File
> names are sequences of Unicode characters",
Does any OS actually say that? Don't they usually say "
On Wed, Aug 24, 2011 at 8:34 PM, Greg Ewing wrote:
> What about things like the surrogateescape codec that
> deliberately use code units in non-standard ways? Will
> tricks like that still be possible if the code-unit
> level is hidden from the programmer?
I would think that it should still be po
Guido wrote:
> Which reminds me. The PEP does not say what other Python
> implementations besides CPython should do. Presumably Jython and
> IronPython will continue to use UTF-16, so presumably the language
> reference will still have to document that strings contain code units (not
> code poin
On Wed, Aug 24, 2011 at 11:37 PM, Terry Reedy wrote:
> On 8/24/2011 1:45 PM, Victor Stinner wrote:
>
>> On 24/08/2011 02:46, Terry Reedy wrote:
>>
>
>> I don't think that using UTF-16 with surrogate pairs is really a big
>> problem. A lot of work has been done to hide this. For example,
>> rep
On Wed, Aug 24, 2011 at 10:50 AM, "Martin v. Löwis" wrote:
> Not with these words, though. As I recall, it's rather like (still
> with different words) "len() will stay O(1) forever, regardless of
> any perceived incorrectness of this choice".
And indexing/slicing will also be O(1).
> An attempt
[Apologies for sending out a long stream of pointed responses, written
before I have fully digested this entire mega-thread. I don't have the
patience today to collect them all into a single mega-response.]
On Wed, Aug 24, 2011 at 10:45 AM, Victor Stinner
wrote:
> Note: Java and the Qt library us
On Wed, Aug 24, 2011 at 3:06 AM, Terry Reedy wrote:
> Excuse me for believing the fine 3.2 manual that says
> "Strings contain Unicode characters." (And to a naive reader, that implies
> that string iteration and indexing should produce Unicode characters.)
The naive reader also doesn't know the
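The code unit versus code point distinction debated in this exchange can be shown with a minimal sketch. It assumes CPython 3.3 or later, where PEP 393 applies; the sample character and the variable names are illustrative and do not come from the thread.

```python
# Illustrative only: one character outside the Basic Multilingual Plane.
s = "\U0001F40D"  # U+1F40D SNAKE

# Under PEP 393 (CPython 3.3+), len() counts code points.
code_points = len(s)

# UTF-16 needs a surrogate pair for this character: two 16-bit code units.
utf16_units = len(s.encode("utf-16-le")) // 2

print(code_points, utf16_units)
```

On a narrow (UTF-16) build of 3.2, len() would report the code unit count instead, which is the behaviour the manual's wording glosses over.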
On Wed, Aug 24, 2011 at 1:22 AM, Stephen J. Turnbull
wrote:
> Well, no, it gives the right answer according to the design. unicode
> objects do not contain character strings. By design, they contain
> code point strings. Guido has made that absolutely clear on a number
> of occasions.
Actually
On Tue, Aug 23, 2011 at 7:41 PM, Torsten Becker
wrote:
> On Tue, Aug 23, 2011 at 10:08, Antoine Pitrou wrote:
>> Macros are useful to shield the abstraction from the implementation. If
>> you access the members directly, and the unicode object is represented
>> differently in some future version
On Thu, Aug 25, 2011 at 1:24 AM, "Martin v. Löwis" wrote:
>> With this PEP, the unicode object overhead grows to 10 pointer-sized
>> words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine.
>> Does it have any adverse effects?
>
> If I count correctly, it's only three *additional* wor
Stefan Behnel, 25.08.2011 20:47:
"Martin v. Löwis", 24.08.2011 20:15:
- issues to be considered (unclarities, bugs, limitations, ...)
A problem of the current implementation is the need for calling
PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to
insufficient memory). Basic
Message written by Sandro Tosi on 23 Aug 2011, at 01:09:
What I want to understand if it's an acceptable change.
I see sphinx more as of an internal, building tool, so freezing it
it's like saying "don't upgrade gcc" or so.
Normally I'd say it's natural for us to specify that for a legac
"Martin v. Löwis", 24.08.2011 20:15:
- issues to be considered (unclarities, bugs, limitations, ...)
A problem of the current implementation is the need for calling
PyUnicode_(FAST_)READY(), and the fact that it can fail (e.g. due to
insufficient memory). Basically, this means that even somet
Okay, I am convinced. :) If Martin does not object, I would change
the "void *str" field to
union {
void *any;
unsigned char *latin1;
Py_UCS2 *ucs2;
Py_UCS4 *ucs4;
} data;
Regards,
Torsten
On Wed, Aug 24, 2011 at 02:57, Stefan Behnel wrote:
> To
Sorry for the crossposting, but I don't know who admins the pycon.org site.
It seems that something happened to "ar.pycon.org"; it should point to
the same IP as "pycon.python.org.ar" (190.228.30.157).
Does anybody know who can fix it?
BTW, how do I update that page? We're having the third PyCon
On Thu, Aug 25, 2011 at 9:59 PM, Nick Coghlan wrote:
> A link to http://www.python.org/news/security/ would be handy here,
> since that has the GPG key to send encrypted messages to the security
> list.
http://www.python.org/security/ is a better variant of the link,
though (it redirects to the s
"Martin v. Löwis" writes:
> On 25.08.2011 11:39, Stephen J. Turnbull wrote:
> > "Martin v. Löwis" writes:
> >
> > > No, that's explicitly *not* what C6 says. Instead, it says that a
> > > process that treats s1 and s2 differently shall not assume that others
> > > will do the same, i.e
On Tue, Aug 23, 2011 at 7:46 AM, ezio.melotti
wrote:
> +security
> + Issues that might have security implications. If you think the issue
> + should not be made public, please report it to secur...@python.org
> instead.
A link to http://www.python.org/news/security/ would be handy here,
s
On Thu, Aug 25, 2011 at 7:57 PM, "Martin v. Löwis" wrote:
> On 25.08.2011 11:39, Stephen J. Turnbull wrote:
>> I'm simply saying that the current
>> implementation of strings, as improved by PEP 393, can not be said to
>> be conforming.
>
> I continue to disagree. The Unicode standard deliberate
Hello,
On Thu, 25 Aug 2011 10:24:39 +0200
"Martin v. Löwis" wrote:
>
> On a 32-bit machine with a 32-bit wchar_t, pure-ASCII strings of length
> 1 (+NUL) will take the same memory either way: 8 bytes for the
> characters in 3.2, 2 bytes in 3.3 + extra pointer + padding. Strings
> of 2 or more c
On 25.08.2011 11:39, Stephen J. Turnbull wrote:
> "Martin v. Löwis" writes:
>
> > No, that's explicitly *not* what C6 says. Instead, it says that a
> > process that treats s1 and s2 differently shall not assume that others
> > will do the same, i.e. that it is ok to treat them the same even t
"Martin v. Löwis" writes:
> No, that's explicitly *not* what C6 says. Instead, it says that a
> process that treats s1 and s2 differently shall not assume that others
> will do the same, i.e. that it is ok to treat them the same even though
> they have different code points. Treating them diff
On 25/08/2011 06:46, Stefan Behnel wrote:
Conversion to wchar_t* is common, especially on Windows.
That's an issue. However, I cannot say how common this really is in
practice. Surely depends on the specific code, right? How common is it
in core CPython?
Almost all functions taking text as
On 25/08/2011 06:12, Stephen J. Turnbull wrote:
> Let's take small steps. Do the evolutionary thing. Let's get things
> right so users won't have to worry about code points vs. code units
> any more. A conforming library for all things at the character level
> can be developed late
> With this PEP, the unicode object overhead grows to 10 pointer-sized
> words (including PyObject_HEAD), that's 80 bytes on a 64-bit machine.
> Does it have any adverse effects?
If I count correctly, it's only three *additional* words (compared to
3.2): four new ones, minus one that is removed. I
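A rough way to observe the per-string memory cost discussed here is sys.getsizeof. The sketch below assumes CPython 3.3 or later; the sample strings and names are illustrative, and the exact byte counts depend on the interpreter version and platform, so none are shown.

```python
import sys

n = 1000
ascii_text = "a" * n             # storable at 1 byte per character
bmp_text = "\u20ac" * n          # needs 2 bytes per character
astral_text = "\U0001F40D" * n   # needs 4 bytes per character

# Print the total object size (header plus character data) for each string.
for label, text in [("ASCII", ascii_text), ("BMP", bmp_text), ("non-BMP", astral_text)]:
    print(label, sys.getsizeof(text))
```

The fixed header is paid once per object, so the difference between the three sizes is dominated by the width chosen for the character data.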
> What about things like the surrogateescape codec that
> deliberately use code units in non-standard ways? Will
> tricks like that still be possible if the code-unit
> level is hidden from the programmer?
Most certainly. In the PEP-393 representation, the surrogate
characters can readily be repre
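For readers unfamiliar with the surrogateescape error handler mentioned above, here is a minimal sketch of the round trip it enables. The byte string is an illustrative assumption, not an example from the thread.

```python
# Illustrative input: 0xFF can never appear in valid UTF-8.
raw = b"abc\xff"

# The undecodable byte is mapped to the lone surrogate U+DCFF.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original byte.
restored = text.encode("utf-8", errors="surrogateescape")

print(ascii(text), restored == raw)
```

The string holds a lone surrogate code point, which is why the question of whether such values stay representable matters for this codec.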
> > What is non-conforming about comparing two code points?
>
> Unicode conformance means treating characters correctly.
Re-read the text. You are interpreting something that isn't there.
> > Seriously, what does Unicode-conforming mean here?
>
> Chapter 3, all verses. Here, specifically C6
>>Strings contain Unicode code units, which for most purposes can be
>>treated as Unicode characters. However, even as "simple" an
>>operation as "s1[0] == s2[0]" cannot be relied upon to give
>>Unicode-conforming results.
>>
>> The second sentence remains true under PEP 393.
>
>
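The disagreement over "s1[0] == s2[0]" can be made concrete with canonically equivalent strings. This is a sketch with illustrative strings chosen for the purpose; it shows what the comparison does, not which reading of the standard is correct.

```python
import unicodedata

s1 = "\u00e9"    # precomposed LATIN SMALL LETTER E WITH ACUTE
s2 = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# Indexing compares individual code points: U+00E9 versus U+0065.
first_equal = s1[0] == s2[0]

# After NFC normalization the two strings compare equal.
nfc_equal = unicodedata.normalize("NFC", s1) == unicodedata.normalize("NFC", s2)

print(first_equal, nfc_equal)
```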
On 25/08/11 14:29, Guido van Rossum wrote:
Let's get things
right so users won't have to worry about code points vs. code units
any more.
What about things like the surrogateescape codec that
deliberately use code units in non-standard ways? Will
tricks like that still be possible if the code-u
+1 FileSystemError - For already stated reasons.
- John
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe:
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com