date:20110127

Re: [Python-Dev] Beta version of the new devguide

2011-01-27 Thread Eli Bendersky

On Sun, Jan 23, 2011 at 03:08, Brett Cannon  wrote:
> http://docs.python.org/devguide/
>
> If you are a core developer and have a correction you want to make you
> can simply check out the devguide yourself (link is in the Resources
> section of the devguide) and make the corrections yourself. Otherwise
> reply here (you can email me directly but I already have instances of
> multiple people telling me about the same spelling mistake so it's
> nice to have it public so people know when I have been informed).

Brett,
A couple of concerns regarding the "Getting Set Up" page:

1)

"Do note that CPython will notice that it is being run from a source
checkout. This means that it if you edit Python source code in your
checkout the changes will be picked up by the interpreter for
immediate testing. "

I'm not sure what this means. Does CPython really know it's being run
from a source checkout as opposed to a source tarball? By editing
"Python source code" you mean the standard libraries/tests? To be
"picked up by the interpreter" you then need to run it from the root
of the checkout (after build) but this is also true for source
tarballs.
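For what it's worth, one way to test the devguide's premise is to ask the interpreter directly. The following is a minimal sketch that assumes Python 3.2 or later, where `sysconfig.is_python_build()` reports whether the running interpreter was started from its build location rather than from an installed location. It cannot distinguish a VCS checkout from an unpacked tarball, which supports the point above.

```python
import sys
import sysconfig


def describe_interpreter():
    """Report whether this interpreter runs from its build directory."""
    # True when running from the source tree it was built in (checkout or
    # tarball alike), False after "make install" or a binary installer.
    in_build_dir = sysconfig.is_python_build()
    where = "build directory" if in_build_dir else "installed location"
    return "%s running from its %s" % (sys.executable, where)


print(describe_interpreter())
```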

2)

"The core CPython interpreter only needs a C compiler to build itself;"

I find this confusing since the CPython interpreter doesn't build
itself. A developer builds it with a C compiler / makefile. Some tools
indeed "build themselves" in some kind of a bootstrap process (e.g.
gcc, AFAIK).

I apologize in advance if this is too nit-picky ;-)
Eli
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Stefan Behnel


"Martin v. Löwis", 28.01.2011 01:02:
> Am 27.01.2011 23:53, schrieb Stefan Behnel:
>> "Martin v. Löwis", 24.01.2011 21:17:
>>> If the string is created directly with the canonical representation
>>> (see below), this representation doesn't take a separate memory block,
>>> but is allocated right after the PyUnicodeObject struct.
>>
>> Does this mean it's supposed to become a PyVarObject?
>
> What do you mean by "become"? Will it be declared as such? No.
>
>> Antoine proposed
>> that, too. Apart from breaking (more or less) all existing C subtyping
>> code, this will also make it harder to subtype it in new code. I don't
>> like that idea at all.
>
> Why will it break all existing subtyping code? See the PEP: Only objects
> created through PyUnicode_New will be affected - I don't think this can
> include objects of a subtype.

Ok, that's fine then.

Stefan


[Python-Dev] fcmp() in test.support

2011-01-27 Thread Eli Bendersky

I'm working on improving the .rst documentation of test.support (Issue
11015), and came upon the undocumented "fcmp" function that's being
exported from test.support, along with a "FUZZ" constant.

Searching through the tests (py3k trunk), I see fcmp() is used in
only two places, both fairly trivial:
1. test_float: where it can be directly replaced by assertAlmostEqual
from unittest
2. test_builtin: where the assertion can also be easily rewritten in
terms of assertAlmostEqual

Although fcmp seems to provide extra functionality over
assertAlmostEqual, the above makes me think it should probably be
removed altogether, or added to unittest if it's still deemed
important.
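To make the trade-off concrete, here is a minimal sketch contrasting an fcmp-style relative comparison with assertAlmostEqual. The helper `fuzzy_equal` and the tolerance `FUZZ = 1e-6` are assumptions modelled loosely on test.support, not its actual code. The point is that assertAlmostEqual's default check is absolute (`places=7`), while passing `delta=` scaled to the operands recovers a relative check.

```python
import unittest

FUZZ = 1e-6  # relative tolerance; assumed to mirror test.support.FUZZ


def fuzzy_equal(x, y, fuzz=FUZZ):
    """Hypothetical fcmp-like check: |x - y| <= (|x| + |y|) * fuzz."""
    return abs(x - y) <= (abs(x) + abs(y)) * fuzz


class FloatToleranceTests(unittest.TestCase):
    def test_small_values(self):
        # Absolute check: round(first - second, 7) must be 0.
        self.assertAlmostEqual(0.1 + 0.2, 0.3)

    def test_large_values(self):
        # At large magnitudes an absolute tolerance is far too strict;
        # a delta scaled to the operands reproduces the relative check.
        x, y = 1e20, 1e20 * (1 + 1e-9)
        self.assertAlmostEqual(x, y, delta=(abs(x) + abs(y)) * FUZZ)
        self.assertTrue(fuzzy_equal(x, y))
```

The `delta` keyword is available from Python 2.7 and 3.2 onwards, so both call sites could likely be ported without keeping fcmp around.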

+/- ?
Eli

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Martin v. Löwis

Am 27.01.2011 23:53, schrieb Stefan Behnel:
> "Martin v. Löwis", 24.01.2011 21:17:
>> If the string is created directly with the canonical representation
>> (see below), this representation doesn't take a separate memory block,
>> but is allocated right after the PyUnicodeObject struct.
> 
> Does this mean it's supposed to become a PyVarObject?

What do you mean by "become"? Will it be declared as such? No.

> Antoine proposed
> that, too. Apart from breaking (more or less) all existing C subtyping
> code, this will also make it harder to subtype it in new code. I don't
> like that idea at all.

Why will it break all existing subtyping code? See the PEP: Only objects
created through PyUnicode_New will be affected - I don't think this can
include objects of a subtype.

Regards,
Martin

Re: [Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread Martin v. Löwis

> Works for me! Short and elegant.

Done!

http://www.python.org/2.6.x
http://www.python.org/2.x
http://www.python.org/3.1.x
http://www.python.org/3.x
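The rule behind these aliases, where an `X.Y.x` or `X.x` path resolves to the newest release sharing that prefix, can be sketched as follows. The release list is illustrative only, and `resolve_alias` is a hypothetical helper, not how python.org actually implements its redirects.

```python
def parse_version(version):
    """Turn '2.7.1' into (2, 7, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_alias(alias, releases):
    """Map an alias such as '2.7.x' or '3.x' to the newest matching release."""
    if not alias.endswith(".x"):
        raise ValueError("alias must end in '.x': %r" % alias)
    prefix = parse_version(alias[:-2])
    matching = [r for r in releases
                if parse_version(r)[:len(prefix)] == prefix]
    if not matching:
        raise LookupError("no release matches %r" % alias)
    # Tuple comparison orders (2, 7) before (2, 7, 1), so '2.7' < '2.7.1'.
    return max(matching, key=parse_version)


# Illustrative release list (not an authoritative inventory).
RELEASES = ["2.6.6", "2.7", "2.7.1", "3.1.2", "3.1.3"]

for alias in ["2.6.x", "2.x", "3.1.x", "3.x"]:
    print("/%s -> %s" % (alias, resolve_alias(alias, RELEASES)))
```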

Regards,
Martin

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Stefan Behnel


"Martin v. Löwis", 24.01.2011 21:17:
> If the string is created directly with the canonical representation
> (see below), this representation doesn't take a separate memory block,
> but is allocated right after the PyUnicodeObject struct.

Does this mean it's supposed to become a PyVarObject? Antoine proposed
that, too. Apart from breaking (more or less) all existing C subtyping
code, this will also make it harder to subtype it in new code. I don't like
that idea at all.


Stefan


Re: [Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread Alexander Belopolsky

On Thu, Jan 27, 2011 at 5:40 PM, "Martin v. Löwis"  wrote:
>> Whatever we do, let's use this opportunity to  unify redirect rules
>> for http://www.python.org/X.Y and http://docs.python.org/X.Y.  For a
>> related discussion, see http://bugs.python.org/issue10446.
>
> TLDR; somebody should summarize it and specify what exactly needs to
> be changed.
>

AFAICT, http://docs.python.org/X.Y links consistently point to
http://docs.python.org/release/X.Y.Z, where Z is the last micro
release of the X.Y major.minor series.  I don't see any reason to change
anything at the moment, but if http://www.python.org will grow X.Y.x
redirects, it would be nice to have the same under
http://docs.python.org/release/ if not under http://docs.python.org/.

Re: [Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread Martin v. Löwis

> Whatever we do, let's use this opportunity to  unify redirect rules
> for http://www.python.org/X.Y and http://docs.python.org/X.Y.  For a
> related discussion, see http://bugs.python.org/issue10446.

TLDR; somebody should summarize it and specify what exactly needs to
be changed.

I'm only going to change the release redirects now.

Regards,
Martin

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Antoine Pitrou


> > Incidentally, to slightly reduce the overhead of the unicode objects,
> > there's this proposal: http://bugs.python.org/issue1943
> 
> I wonder what aspects of this patch and discussion should be integrated
> into the PEP. The notion of allocating the memory in the same block is
> already considered in the PEP; what else might be relevant?

Ok, I'm sorry for not reading the PEP carefully enough, then.
The patch does a couple of other tweaks such as making "state" a char
rather than an int, and changing the freelist algorithm. But the latter
doesn't need to be spelled out in a PEP anyway.

Regards

Antoine.



Re: [Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread Alexander Belopolsky

On Thu, Jan 27, 2011 at 4:54 PM, "Martin v. Löwis"  wrote:
..
> How about http://www.python.org/2.7.x redirecting to the latest 2.7.x
> release? Likewise 2.x and 3.x.

Whatever we do, let's use this opportunity to  unify redirect rules
for http://www.python.org/X.Y and http://docs.python.org/X.Y.  For a
related discussion, see http://bugs.python.org/issue10446.

Re: [Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread Brett Cannon

On Thu, Jan 27, 2011 at 13:54, "Martin v. Löwis"  wrote:
> Am 27.01.2011 21:38, schrieb Brett Cannon:
>> Because of all the writing I have been doing lately, I have been
>> pulling up a lot of URLs pointing to various Python releases based
>> around minor versions (e.g., Python 2.7, not specifically 2.7.1). What
>> has been somewhat annoying is that there are no URLs which act as a
>> redirect to the latest release of a minor version. For instance, it
>> would be great if http://www.python.org/2.7 redirected to the Python
>> 2.7.1 page.
>
> The tradition is that /X.Y actually points to download/releases/X.Y.
> These redirects haven't been added for 2.7, but are present for all
> earlier releases, and 3.1. So unless there are strong objections,
> I'll add the missing redirects soon.

That would be great. I keep bumping up against the missing 2.7 redirect.

>
>> To get the ball rolling, I say we make http://www.python.org/version/2.7
>> and http://www.python.org/version/2 redirect to the 2.7.1 release
>> page, etc. Personally I would rather have http://www.python.org/2.7
>> redirect to 2.7.1, but since that already redirects to 2.7.0 I doubt
>> people would be okay with the change.
>
> How about http://www.python.org/2.7.x redirecting to the latest 2.7.x
> release? Likewise 2.x and 3.x.

Works for me! Short and elegant.

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Gregory P. Smith

BTW, has anyone looked at what other languages with a native unicode
type do in their implementations, and whether any of them attempt to
conserve RAM?

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Martin v. Löwis

> I agree. After all, CPython is lucky to have it available. It wouldn't
> be the first time that we duplicate looping code based on the input
> type. However, like the looping code, it will also complicate all
> indexing code at runtime as it always needs to test which of the
> representations is current before it can read a character. Currently,
> all of this is a compile time decision. This will necessarily have a
> performance impact.

That's most certainly the case. That's one of the reasons to discuss
this through a PEP, rather than just coming up with a patch: if people
object to it too much because of the impact on execution speed, it may
get rejected. Of course, that would make those unhappy who complain
about the memory consumption.

This is a classical time-space-tradeoff, favoring space reduction
over time reduction.
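Stefan's point about indexing can be made concrete with a small model. The sketch below is in Python rather than C, and its helper names are hypothetical. It stores code points in a 1-, 2- or 4-byte buffer and must consult the kind on every character read, which is exactly the runtime test that a single fixed-width Py_UNICODE buffer avoided.

```python
import struct

# Little-endian unsigned formats for 1-, 2- and 4-byte code units.
_UNIT_FORMAT = {1: "<B", 2: "<H", 4: "<I"}


def pack(text, kind):
    """Store the code points of text as fixed-width units of `kind` bytes."""
    fmt = _UNIT_FORMAT[kind]
    return b"".join(struct.pack(fmt, ord(ch)) for ch in text)


def read_char(buf, kind, index):
    """Read one character: the kind must be consulted on every access."""
    (code_point,) = struct.unpack_from(_UNIT_FORMAT[kind], buf, index * kind)
    return chr(code_point)


text = "na\u00efve \u20ac"          # max code point U+20AC needs 2 bytes
buf = pack(text, 2)
print(len(buf), ascii(read_char(buf, 2, 2)))   # 14 '\xef'
```

In C, the per-access branch on the kind would presumably be hidden behind a macro, which is where the maintenance burden mentioned below comes in.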

I fully understand that the actual impact can only be observed when
an implementation is available, and applications have made a reasonable
effort to work with the implementation efficiently (or perhaps not,
which would show the impact on unmodified applications).

This is something that works much better in PyPy: the actual string
operations are written in RPython, and the tracing JIT would generate
all versions of the code that are relevant for the different
representations (IIUC, for PyPy this approach is still only planned).

I hope that C macros can help reduce the maintenance burden.

Regards,
Martin

Re: [Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread Brett Cannon

On Thu, Jan 27, 2011 at 13:21,   wrote:
>    Brett> Bonus points if we extend this to major versions, too. =)
>
> I know you added a smiley, but just wanted to point out that since Python 2
> and 3 are really different languages, referring 2.4 users to 3.3 might be a
> bad idea.  (I imagine it wouldn't be hard to generalize from micro to minor
> though.)

I don't get what you are worried about: http://www.python.org/2 would
refer to 2.7.1 while http://www.python.org/3 would refer to 3.1.3.

I added the smiley as I doubt many people worry about linking to
Python 2 vs. Python 3 as generically as I have lately.

>
> Skip
>

Re: [Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread Martin v. Löwis

Am 27.01.2011 21:38, schrieb Brett Cannon:
> Because of all the writing I have been doing lately, I have been
> pulling up a lot of URLs pointing to various Python releases based
> around minor versions (e.g., Python 2.7, not specifically 2.7.1). What
> has been somewhat annoying is that there are no URLs which act as a
> redirect to the latest release of a minor version. For instance, it
> would be great if http://www.python.org/2.7 redirected to the Python
> 2.7.1 page.

The tradition is that /X.Y actually points to download/releases/X.Y.
These redirects haven't been added for 2.7, but are present for all
earlier releases, and 3.1. So unless there are strong objections,
I'll add the missing redirects soon.

> To get the ball rolling, I say we make http://www.python.org/version/2.7
> and http://www.python.org/version/2 redirect to the 2.7.1 release
> page, etc. Personally I would rather have http://www.python.org/2.7
> redirect to 2.7.1, but since that already redirects to 2.7.0 I doubt
> people would be okay with the change.

How about http://www.python.org/2.7.x redirecting to the latest 2.7.x
release? Likewise 2.x and 3.x.

Regards,
Martin

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Martin v. Löwis

Am 27.01.2011 20:06, schrieb Stefan Behnel:
> "Martin v. Löwis", 24.01.2011 21:17:
>> The Py_UNICODE type is still supported but deprecated. It is always
>> defined as a typedef for wchar_t, so the wstr representation can double
>> as Py_UNICODE representation.
> 
> It's too bad this isn't initialised by default, though. Py_UNICODE is
> the only representation that can be used efficiently from C code and
> Cython relies on it for fast text processing.

That's not true. The str representation can also be used efficiently from C.

> This proposal will
> therefore likely have a pretty negative performance impact on extensions
> written in Cython as the compiler could no longer expect this
> representation to be available instantaneously.

In any case, I've added this concern.

Regards,
Martin

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Martin v. Löwis

> From my first impression, I'm
> not too thrilled by the prospect of making the Unicode implementation
> more complicated by having three different representations on each
> object.

Thanks, added as a concern.

> I also don't see how this could save a lot of memory. As an example
> take a French text with, say, 10 million code points. This would end up
> appearing in memory as 3 copies on Windows: one copy stored as UCS2 (20MB),
> one as Latin-1 (10MB) and one as UTF-8 (probably around 15MB, depending
> on how many accents are used). That's a saving of -10MB compared to
> today's implementation :-)

As others have pointed out: that's not how it works. It actually *will*
save memory, since the alternative representations are optional.
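A rough back-of-the-envelope calculation for the French example, under the PEP's scheme: the canonical representation uses the smallest unit that holds the string's largest code point, and the other representations exist only when requested. The helpers below are a hypothetical sketch. They ignore object headers and the trailing NUL, and assume the text stays within Latin-1 (no "œ" or "€", for instance).

```python
def canonical_kind(text):
    """Bytes per code point needed for text's canonical representation."""
    max_cp = max([ord(ch) for ch in text] or [0])
    if max_cp < 0x100:
        return 1          # fits Latin-1
    if max_cp < 0x10000:
        return 2          # fits UCS-2 (BMP only)
    return 4              # needs UCS-4


def canonical_size(text):
    return canonical_kind(text) * len(text)


sample = "d\u00e9j\u00e0 vu, na\u00efvet\u00e9"   # accented, but Latin-1
n = 10 * 1000 * 1000                   # 10 million code points
print(canonical_kind(sample) * n)      # 10000000 bytes (~10 MB)
# Compare 2 * n (~20 MB) on a UCS-2 build and 4 * n on a UCS-4 build.
```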

Regards,
Martin

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Martin v. Löwis

Am 25.01.2011 12:08, schrieb Nick Coghlan:
> On Tue, Jan 25, 2011 at 6:17 AM, "Martin v. Löwis"  wrote:
>> A new function PyUnicode_AsUTF8 is provided to access the UTF-8
>> representation. It is thus identical to the existing
>> _PyUnicode_AsString, which is removed. The function will compute the
>> utf8 representation when first called. Since this representation will
>> consume memory until the string object is released, applications
>> should use the existing PyUnicode_AsUTF8String where possible
>> (which generates a new string object every time). API that implicitly
>> converts a string to a char* (such as the ParseTuple functions) will
>> use this function to compute a conversion.
> 
> I'm not entirely clear as to what "this function" is referring to here.

PyUnicode_AsUTF8 (i.e. the one where you don't need to release the
memory). I made this explicit now.
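The difference between the two UTF-8 accessors can be modelled in a few lines of Python. `FlexibleStr` and its methods are hypothetical stand-ins, not CPython API. `as_utf8` mimics the proposed PyUnicode_AsUTF8 (computed once, then cached for the object's lifetime), while `as_utf8_string` mimics PyUnicode_AsUTF8String (a fresh object on each call, which the caller can release).

```python
class FlexibleStr:
    """Toy model of a string object with an optional cached UTF-8 form."""

    def __init__(self, text):
        self.text = text
        self._utf8 = None   # like the utf8 field: empty until requested

    def as_utf8(self):
        # Like PyUnicode_AsUTF8: encode on first use, then keep the result
        # alive for as long as the string object itself lives.
        if self._utf8 is None:
            self._utf8 = self.text.encode("utf-8")
        return self._utf8

    def as_utf8_string(self):
        # Like PyUnicode_AsUTF8String: a new bytes object on every call;
        # the string object retains nothing, so no memory stays pinned.
        return self.text.encode("utf-8")


s = FlexibleStr("gr\u00fc\u00dfe")
print(s._utf8 is None, s.as_utf8() is s.as_utf8(), s._utf8 is not None)
# True True True
```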

> I'm also dubious of the "PyUnicode_Finalize" name - "PyUnicode_Ready"
> might be a better option (PyType_Ready seems a better analogy for a
> "I've filled everything in, please calculate the derived fields now"
> than Py_Finalize).

Ok, changed (when I was pondering this PEP, this occurred to me as
well, but I forgot it when I typed it in).

> 
> More generally, let me see if I understand the proposed structure correctly:
> 
> str: Always set once PyUnicode_Ready() has been called.
>   Always points to the canonical representation of the string (as
> indicated by PyUnicode_Kind)
> length: Always set once PyUnicode_Ready() has been called. Specifies
> the number of code points in the string.

Correct.

> wstr: Set only if PyUnicode_AsUnicode has been called on the string.

Might also be set when the string was created through
PyUnicode_FromUnicode and PyUnicode_Ready hasn't been called yet.

> If (sizeof(wchar_t) == 2 && PyUnicode_Kind() == PyUnicode_2BYTE)
> or (sizeof(wchar_t) == 4 && PyUnicode_Kind() == PyUnicode_4BYTE), wstr
> = str, otherwise wstr points to dedicated memory
> wstr_length: Valid only if wstr != NULL
> If wstr_length != length, indicates presence of surrogate pairs in
> a UCS-2 string (i.e. sizeof(wchar_t) == 2, PyUnicode_Kind() ==
> PyUnicode_4BYTE).

Correct.
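The surrogate-pair rule is easiest to see with a character outside the Basic Multilingual Plane. A sketch, assuming a 2-byte wchar_t (as on Windows): the wstr form is then UTF-16, so each such character occupies two code units, and wstr_length exceeds the code-point length exactly when surrogate pairs are present. The function name is hypothetical.

```python
def wstr_length(text, wchar_size=2):
    """Number of wchar_t units needed to hold text (UTF-16 or UTF-32)."""
    if wchar_size == 2:
        # utf-16-le emits no BOM; every code unit is 2 bytes.
        return len(text.encode("utf-16-le")) // 2
    return len(text)   # 4-byte wchar_t: one unit per code point


s = "a\U0001D11E"      # 'a' followed by MUSICAL SYMBOL G CLEF (U+1D11E)
print(len(s), wstr_length(s), wstr_length(s, wchar_size=4))   # 2 3 2
```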

> utf8: Set only if PyUnicode_AsUTF8 has been called on the string.
> If string contents are pure ASCII, utf8 = str, otherwise utf8
> points to dedicated memory.
> utf8_length: Valid only if utf8_ptr != NULL

Correct.
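The summary above boils down to two sharing rules, which the following sketch encodes directly (hypothetical helper, with sizes as described in the summary). wstr can reuse the canonical buffer only when the wchar_t width equals the string's kind, and utf8 can reuse it only when the text is pure ASCII, since even Latin-1 characters above U+007F take two bytes in UTF-8.

```python
def shared_buffers(text, wchar_size):
    """Return which optional representations could alias the canonical one."""
    max_cp = max([ord(ch) for ch in text] or [0])
    kind = 1 if max_cp < 0x100 else 2 if max_cp < 0x10000 else 4
    shared = set()
    if wchar_size == kind:
        shared.add("wstr")    # same unit width, so identical bytes
    if max_cp < 0x80:
        shared.add("utf8")    # ASCII is byte-for-byte the same in UTF-8
    return shared


print(sorted(shared_buffers("spam", 2)))                # ['utf8']
print(sorted(shared_buffers("\u0161p\u00e4m", 2)))      # ['wstr']
print(sorted(shared_buffers("caf\u00e9", 4)))           # []
```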

> One change I would propose is that rather than hiding flags in the low
> order bits of the str pointer, we expand the use of the existing
> "state" field to cover the representation information in addition to
> the interning information.

Thanks for the idea; done.

> I would also suggest explicitly flagging
> internally whether or not a 1 byte string is ASCII or Latin-1 along
> the lines of:

Not sure about that. It would complicate PyUnicode_Kind.

Instead, I'd rather fill out utf8 right away if we can use sharing
(e.g. when the string is created with a max value <128, or
PyUnicode_Ready has determined that).

So I keep it for the moment as reserved (but would use it when
str is NULL, as I'd have to fill in some value, anyway).

Regards,
Martin

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Stefan Behnel


James Y Knight, 27.01.2011 21:26:
> On Jan 27, 2011, at 2:06 PM, Stefan Behnel wrote:
>> "Martin v. Löwis", 24.01.2011 21:17:
>>> The Py_UNICODE type is still supported but deprecated. It is always
>>> defined as a typedef for wchar_t, so the wstr representation can
>>> double as Py_UNICODE representation.
>>
>> It's too bad this isn't initialised by default, though. Py_UNICODE is
>> the only representation that can be used efficiently from C code and
>> Cython relies on it for fast text processing. This proposal will
>> therefore likely have a pretty negative performance impact on
>> extensions written in Cython as the compiler could no longer expect
>> this representation to be available instantaneously.
>
> But the whole point of the exercise is so that it doesn't have to store
> a 4byte-per-char representation when a 1byte-per-char rep would do.

I am well aware of that. But I'm arguing that the current simpler internal
representation has had its advantages for CPython as a platform.

> If cython wants to work most efficiently with this proposal, it should
> learn to deal with the three possible raw representations.

I agree. After all, CPython is lucky to have it available. It wouldn't be
the first time that we duplicate looping code based on the input type.
However, like the looping code, it will also complicate all indexing code
at runtime as it always needs to test which of the representations is
current before it can read a character. Currently, all of this is a compile
time decision. This will necessarily have a performance impact.


Stefan


Re: [Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread skip

Brett> Bonus points if we extend this to major versions, too. =)

I know you added a smiley, but just wanted to point out that since Python 2
and 3 are really different languages, referring 2.4 users to 3.3 might be a
bad idea.  (I imagine it wouldn't be hard to generalize from micro to minor
though.)

Skip

Re: [Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread Fred Drake

On Thu, Jan 27, 2011 at 3:38 PM, Brett Cannon  wrote:
> Linking to the 2.7.0 release page seems off since it is
> out of date, but linking to 2.7.1 also seems silly as that will become
> out of date as the newest release of Python 2.7 at some point as well.

I'd love to see something like this as well.  Part of the problem is
that when we want URLs to specific versions (which might even mean
2.7.0), we use the version number as released, and... there's really
not a 2.7.0.  I'd love for us to include ".0" in the actual release
number, instead of calling it just 2.7.  Then we could much more
easily handle this for docs, downloads, and anywhere else we want to
multiplex multiple versions.
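Fred's suggestion amounts to always spelling the first release of a series with an explicit micro number. A minimal sketch follows; `canonical_version` and `release_url` are hypothetical helpers, and the URL layout is an assumption for illustration, not the site's actual scheme. Padding to three components makes every release addressable with the same X.Y.Z pattern.

```python
def canonical_version(version):
    """Pad a release number to X.Y.Z, so '2.7' becomes '2.7.0'."""
    parts = version.split(".")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError("unexpected version string: %r" % version)
    parts += ["0"] * (3 - len(parts))
    return ".".join(parts)


def release_url(version, base="http://www.python.org/download/releases/"):
    # Hypothetical URL layout: one path segment per canonical X.Y.Z version.
    return base + canonical_version(version) + "/"


print(release_url("2.7"))    # http://www.python.org/download/releases/2.7.0/
print(release_url("2.7.1"))  # http://www.python.org/download/releases/2.7.1/
```

Pre-release tags such as "3.2rc1" are rejected here; a real scheme would need its own rule for them.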

  -Fred

--
Fred L. Drake, Jr.    
"A storm broke loose in my mind."  --Albert Einstein

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Martin v. Löwis

> Repetition of "11"; I'm guessing that the 2byte/UCS-2 should read "10",
> so that they give the width of the char representation.

Thanks, fixed.

>>   00 => null pointer
> 
> Naturally this assumes that all pointers are at least 4-byte aligned (so
> that they can be masked off).  I assume that this is sane on every
> platform that Python supports, but should it be spelled out explicitly
> somewhere in the PEP?

I'll change the PEP to move the type indicator into the state field, so
that issue becomes irrelevant.
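For reference, the low-bit pointer tagging being dropped here works roughly as sketched below. Addresses are modelled as plain integers, and the 2-bit kind encoding is an assumption. With 4-byte alignment, the two low bits of a valid pointer are always zero, so the tag is read with `addr & 3` and the real pointer is recovered with `addr & ~3`. Clearing both tag bits takes a mask of `~3`.

```python
ALIGNMENT = 4                 # assumed minimum alignment of the buffers
TAG_MASK = ALIGNMENT - 1      # 0b11: the two low bits are free for a tag


def tag_pointer(addr, tag):
    """Stash a 2-bit tag in the unused low bits of an aligned address."""
    if addr & TAG_MASK:
        raise ValueError("address %#x is not %d-byte aligned" % (addr, ALIGNMENT))
    if not 0 <= tag <= TAG_MASK:
        raise ValueError("tag %d does not fit in 2 bits" % tag)
    return addr | tag


def untag_pointer(tagged):
    """Split a tagged word back into (address, tag)."""
    return tagged & ~TAG_MASK, tagged & TAG_MASK


word = tag_pointer(0x7F3A10, 0b10)    # e.g. a tag meaning "2-byte units"
print(hex(word), untag_pointer(word) == (0x7F3A10, 0b10))   # 0x7f3a12 True
```

Moving the kind into the state field, as proposed, removes the alignment assumption entirely.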

>>   The string is null-terminated (in its respective representation).
>> - hash, state: same as in Python 3.2
>> - utf8_length, utf8: UTF-8 representation (null-terminated)
> If this is to share its buffer with the "str" representation for the
> Latin-1 case, then I take it this ptr will typically be (str & ~4) ?
> i.e. only "str" has the low-order-bit type info.

Yes, the other pointers are aligned. Notice that the case in which
sharing occurs is only ASCII, though (for Latin-1, some characters
require two bytes in UTF-8).

> Spelling out the meaning of "optional":
>   does this mean that the relevant ptr is NULL; if so, if utf8 is null,
> is utf8_length undefined, or is it some dummy value?

I've clarified this: I propose length is undefined (unless there is a
good reason to clear it).

>> If the string is created directly with the canonical representation
>> (see below), this representation doesn't take a separate memory block,
>> but is allocated right after the PyUnicodeObject struct.
> 
> Is the idea to do pointer arithmetic when deleting the PyUnicodeObject
> to determine if the ptr is in that location, and not delete it if it is,
> or is there some other way of determining whether the pointers need
> deallocating?

Correct.

> If the former, is this embedding an assumption that the
> underlying allocator couldn't have allocated a buffer directly adjacent
> to the PyUnicodeObject.  I know that GNU libc's malloc/free
> implementation has gaps of two machine words between each allocation;
> off the top of my head I'm not sure if the optimized Object/obmalloc.c
> allocator enforces such gaps.

No, it doesn't... So I guess I'll reserve another bit in the state for that.
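The problem raised above, and why an explicit bit is the safer fix, can be shown with a toy allocator model. Addresses are plain integers, and `HEADER_SIZE` is an arbitrary stand-in for sizeof(PyUnicodeObject); nothing here reflects obmalloc's real layout. An address test misfires whenever a separately allocated buffer happens to start right after the object, whereas a flag recorded at creation time cannot.

```python
HEADER_SIZE = 64   # hypothetical size of the fixed object header, in bytes


class ToyUnicode:
    def __init__(self, obj_addr, data_addr, compact):
        self.obj_addr = obj_addr
        self.data_addr = data_addr
        self.compact = compact   # set when data was allocated with the object


def owns_separate_buffer_by_address(obj):
    # Guess by pointer arithmetic: "data right after the header => inline".
    return obj.data_addr != obj.obj_addr + HEADER_SIZE


def owns_separate_buffer_by_flag(obj):
    # Use the bit recorded when the object was created.
    return not obj.compact


# A separately malloc'ed buffer that the allocator happened to place
# immediately after the object: the address test wrongly says "inline".
unlucky = ToyUnicode(obj_addr=0x1000, data_addr=0x1000 + HEADER_SIZE,
                     compact=False)
print(owns_separate_buffer_by_address(unlucky),
      owns_separate_buffer_by_flag(unlucky))   # False True
```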

> GDB Debugging Hooks
> ---
> Tools/gdb/libpython.py contains debugging hooks that embed knowledge
> about the internals of CPython's data types, including PyUnicodeObject
> instances.  It will need to be slightly updated to track the change.

Thanks, added.

Regards,
Martin

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Glenn Linderman


On 1/27/2011 12:26 PM, James Y Knight wrote:
> On Jan 27, 2011, at 2:06 PM, Stefan Behnel wrote:
>> "Martin v. Löwis", 24.01.2011 21:17:
>>> The Py_UNICODE type is still supported but deprecated. It is always
>>> defined as a typedef for wchar_t, so the wstr representation can double
>>> as Py_UNICODE representation.
>>
>> It's too bad this isn't initialised by default, though. Py_UNICODE is the only
>> representation that can be used efficiently from C code and Cython relies on it
>> for fast text processing. This proposal will therefore likely have a pretty
>> negative performance impact on extensions written in Cython as the compiler
>> could no longer expect this representation to be available instantaneously.
>
> But the whole point of the exercise is so that it doesn't have to store a
> 4byte-per-char representation when a 1byte-per-char rep would do. If cython
> wants to work most efficiently with this proposal, it should learn to deal with
> the three possible raw representations.


C was doing fast text processing on char long before Py_UNICODE existed, 
or wchar_t.

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Martin v. Löwis

> I believe the intent this PEP is aiming at is for the existing in
> memory structure to be compatible with already compiled binary
> extension modules without having to recompile them or change the APIs
> they are using.

No, binary compatibility is not achieved. ABI-conforming modules will
continue to work even under this change, but only because access to the
unicode object internal representation is not available to the
restricted ABI.

> Personally I don't care at all about preserving that level of binary
> compatibility, it has been convenient in the past but is rarely the
> right thing to do.  Of course I'd personally like to see PyObject
> nuked and revisited, it is too large and is probably not cache line
> efficient.

That's a different PEP :-)

Regards,
Martin

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Martin v. Löwis

> So, the only criticism I have, intuitively, is that the unicode
> structure seems to become a bit too large. For example, I'm not sure you
> need a generic (pointer, size) pair in addition to the
> representation-specific ones.

It's not really a generic pointer, but rather a variable-sized pointer.
It may not fit into any of the other representations (e.g. if there is
a four-byte wchar_t, then a two-byte representation would fit neither
into the UTF-8 pointer nor into the wchar_t pointer).

> Incidentally, to slightly reduce the overhead of the unicode objects,
> there's this proposal: http://bugs.python.org/issue1943

I wonder what aspects of this patch and discussion should be integrated
into the PEP. The notion of allocating the memory in the same block is
already considered in the PEP; what else might be relevant?
Input is welcome!

Regards,
Martin

[Python-Dev] getting stable URLs for major.minor versions

2011-01-27 Thread Brett Cannon

Because of all the writing I have been doing lately, I have been
pulling up a lot of URLs pointing to various Python releases based
around minor versions (e.g., Python 2.7, not specifically 2.7.1). What
has been somewhat annoying is that there are no URLs which act as a
redirect to the latest release of a minor version. For instance, it
would be great if http://www.python.org/2.7 redirected to the Python
2.7.1 page. Linking to the 2.7.0 release page seems off since it is
out of date, but linking to 2.7.1 also seems silly as that will become
out of date as the newest release of Python 2.7 at some point as well.

Can we consider coming up with some URL scheme where people can link
to a version of Python that always redirects to the newest release?
Bonus points if we extend this to major versions, too. =) I am asking
here since the RMs will have to be okay with doing this as part of the
release plan.

To get the ball rolling, I say we make http://www.python.org/version/2.7
and http://www.python.org/version/2 redirect to the 2.7.1 release
page, etc. Personally I would rather have http://www.python.org/2.7
redirect to 2.7.1, but since that already redirects to 2.7.0 I doubt
people would be okay with the change.
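A minimal sketch of the redirect logic being proposed, assuming the site keeps a list of released version strings. The `/version/` prefix comes from the proposal above; the `/download/releases/<version>/` target pattern, the function names and the example release list are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: resolve /version/<prefix> to the newest matching release.

VERSION_PREFIX = "/version/"             # URL scheme proposed in the message
RELEASE_PAGE = "/download/releases/{}/"  # hypothetical target URL pattern


def version_key(version):
    """Turn "2.7.1" into (2, 7, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_release(releases, prefix):
    """Return the newest release whose version starts with `prefix`,
    where `prefix` is a major ("2") or major.minor ("2.7") version."""
    wanted = version_key(prefix)
    matching = [r for r in releases if version_key(r)[:len(wanted)] == wanted]
    if not matching:
        raise LookupError("no release matches " + prefix)
    return max(matching, key=version_key)


def redirect_target(path, releases):
    """Map a request path to a release page, or None if it is not a
    /version/ URL."""
    if not path.startswith(VERSION_PREFIX):
        return None
    prefix = path[len(VERSION_PREFIX):].strip("/")
    return RELEASE_PAGE.format(latest_release(releases, prefix))


# Example release list, for illustration only.
releases = ["2.6.6", "2.7", "2.7.1", "3.1.3"]
print(redirect_target("/version/2.7", releases))  # /download/releases/2.7.1/
print(redirect_target("/version/2", releases))    # /download/releases/2.7.1/
```

Under this scheme the only per-release step for the RMs would be appending the new version string to the list. Comparing integer tuples rather than strings keeps a hypothetical 2.10 ordered after 2.9.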

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread James Y Knight

On Jan 27, 2011, at 2:06 PM, Stefan Behnel wrote:
> "Martin v. Löwis", 24.01.2011 21:17:
>> The Py_UNICODE type is still supported but deprecated. It is always
>> defined as a typedef for wchar_t, so the wstr representation can double
>> as Py_UNICODE representation.
> 
> It's too bad this isn't initialised by default, though. Py_UNICODE is the 
> only representation that can be used efficiently from C code and Cython 
> relies on it for fast text processing. This proposal will therefore likely 
> have a pretty negative performance impact on extensions written in Cython as 
> the compiler could no longer expect this representation to be available 
> instantaneously.

But the whole point of the exercise is so that it doesn't have to store a 
4-byte-per-char representation when a 1-byte-per-char rep would do. If Cython 
wants to work most efficiently with this proposal, it should learn to deal with 
the three possible raw representations.

James
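To make the "three possible raw representations" concrete, here is a small Python model of the rule PEP 393 describes: each string is stored with the narrowest fixed width (1, 2 or 4 bytes per code point) that can hold its largest code point. The function names are hypothetical, and the little-endian packing is an assumption for illustration only; CPython stores characters in native byte order. The C API that eventually shipped in Python 3.3 reads characters with the `PyUnicode_READ(kind, data, index)` macro, which dispatches on the width in the same way `read_code_point` does here.

```python
# Hypothetical model of PEP 393's width selection and width-dispatched reads.

def pep393_kind(text):
    """Bytes per code point: the narrowest width that fits max(ord(c))."""
    max_cp = max(map(ord, text), default=0)
    if max_cp < 0x100:
        return 1          # Latin-1 range
    if max_cp < 0x10000:
        return 2          # Basic Multilingual Plane (UCS-2)
    return 4              # supplementary planes (UCS-4)


def encode_raw(text):
    """Pack text into a buffer of fixed-width code units (little-endian)."""
    kind = pep393_kind(text)
    data = b"".join(ord(c).to_bytes(kind, "little") for c in text)
    return kind, data


def read_code_point(kind, data, index):
    """Read one code point, dispatching on the storage width."""
    start = index * kind
    return int.from_bytes(data[start:start + kind], "little")


for sample in ["hello", "h\u00e9llo", "price: \u20ac5", "smile \U0001F600"]:
    kind, data = encode_raw(sample)
    # A wide (4-byte) Py_UNICODE build would always use 4 * len(sample) bytes.
    print(repr(sample), "kind", kind, "bytes", len(data), "vs", 4 * len(sample))
```

For ASCII-heavy text the payload shrinks to a quarter of the UCS-4 size, which is the memory saving the PEP targets; the cost, as Stefan notes, is that C code can no longer assume a single `Py_UNICODE*` layout and must branch on `kind`.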

Re: [Python-Dev] Import and unicode: part two

2011-01-27 Thread Martin v. Löwis

> When switching to a UTF-8 locale, they can also change the file
> names of their modules to be encoded in UTF-8. It would be fairly easy
> to write a script that identifies non-ASCII file names in a directory
> and offers to transcode their names from their current encoding to
> UTF-8.

In fact, convmv (http://j3e.de/linux/convmv/) does exactly that;
it is also available as a Debian package.

Regards,
Martin
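For readers without convmv, the script described in the quoted text can be sketched in a few lines of Python 3.7 or later. This is a hypothetical helper, not convmv's behaviour: it works on raw bytes file names so that undecodable names survive, leaves names that are already valid UTF-8 alone, and only prints the planned renames unless `dry_run=False` is passed. The source encoding must be supplied by the user; a legacy-encoded name can occasionally also happen to be valid UTF-8, in which case this sketch skips it.

```python
import os


def plan_renames(names, source_encoding):
    """Return (old, new) byte-name pairs for non-ASCII names that are not
    valid UTF-8 but do decode in the assumed source encoding."""
    plan = []
    for old in names:
        if old.isascii():
            continue                      # unchanged in any ASCII-compatible encoding
        try:
            old.decode("utf-8")
            continue                      # already UTF-8; nothing to do
        except UnicodeDecodeError:
            pass
        try:
            new = old.decode(source_encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue                      # not in the assumed encoding either
        plan.append((old, new))
    return plan


def transcode_directory(directory, source_encoding, dry_run=True):
    """List (and optionally perform) renames of the entries in `directory`."""
    root = os.fsencode(directory)         # bytes path => listdir returns bytes names
    plan = plan_renames(os.listdir(root), source_encoding)
    for old, new in plan:
        target = os.path.join(root, new)
        print(old, "->", new.decode("utf-8"))
        if not dry_run and not os.path.exists(target):
            os.rename(os.path.join(root, old), target)
    return plan
```

A typical session would call `transcode_directory("/path/to/modules", "latin-1")` (placeholder path) to review the plan, then repeat with `dry_run=False`; existing targets are never overwritten.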

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Stefan Behnel


"Martin v. Löwis", 24.01.2011 21:17:

> The Py_UNICODE type is still supported but deprecated. It is always
> defined as a typedef for wchar_t, so the wstr representation can double
> as Py_UNICODE representation.

It's too bad this isn't initialised by default, though. Py_UNICODE is the 
only representation that can be used efficiently from C code and Cython 
relies on it for fast text processing. This proposal will therefore likely 
have a pretty negative performance impact on extensions written in Cython 
as the compiler could no longer expect this representation to be available 
instantaneously.


Stefan


Re: [Python-Dev] Why do we bundle lib2to3 with Python? Was: Location of tests for packages

2011-01-27 Thread Brett Cannon

2011/1/27 Łukasz Langa :
>
> On 2011-01-24 23:13, Benjamin Peterson wrote:
>>
>>  I prefer lib2to3 tests to stay in lib2to3/.
>
> On a related note, I had trouble myself with using outdated 2to3 and
> heard complaints about that at least a couple of times. What do we gain
> from bundling 2to3 with Python?

Same thing we get when we bundle anything with Python: one less
dependency for people to download. Obviously this shouldn't be as much
of an issue once Python 3.2 is out.

Re: [Python-Dev] PEP 393: Flexible String Representation

2011-01-27 Thread Antoine Pitrou

On Wednesday, 26 January 2011 at 21:50 -0800, Gregory P. Smith wrote:
> >
> > Incidentally, to slightly reduce the overhead of unicode objects,
> > there's this proposal: http://bugs.python.org/issue1943
> 
> Interesting. But that aims more at CPU performance than memory
> overhead. What I see is programs that predominantly process ASCII
> data yet waste memory on a 2-4x data explosion of the internal
> representation. This PEP aims to address that larger target.

Right, but we should keep in mind that many unicode strings will not be
very large, and so the constant overhead of unicode objects is not
necessarily negligible.

Regards

Antoine.
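Antoine's point about per-object overhead can be checked on CPython 3.3 or later, the release that eventually implemented PEP 393. The sketch below treats `sys.getsizeof("")` (the object header plus the terminating NUL) as the fixed cost and reports what share of an ASCII string's reported size it represents. The helper name is hypothetical, and the exact byte counts vary between versions and platforms, so none are quoted here.

```python
import sys


def ascii_overhead_share(length):
    """Share of sys.getsizeof() that is fixed per-object cost for an ASCII
    str of `length` characters (CPython-specific measurement)."""
    fixed = sys.getsizeof("")            # header plus 1-byte NUL terminator
    total = sys.getsizeof("a" * length)  # same header plus `length` payload bytes
    return fixed / total


for n in (1, 10, 100, 1000):
    print(f"{n:5d} chars: {sys.getsizeof('a' * n):5d} bytes, "
          f"{ascii_overhead_share(n):.0%} fixed overhead")
```

For short identifiers and dictionary keys the fixed part dominates, which is why the size of the header matters alongside the per-character savings.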



[Python-Dev] Why do we bundle lib2to3 with Python? Was: Location of tests for packages

2011-01-27 Thread Łukasz Langa



On 2011-01-24 23:13, Benjamin Peterson wrote:
>
> I prefer lib2to3 tests to stay in lib2to3/.


On a related note, I had trouble myself with using outdated 2to3 and
heard complaints about that at least a couple of times. What do we gain
from bundling 2to3 with Python?

--
Best regards,
Łukasz Langa

