Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Greg Ewing

Tres Seaver wrote:


I do know for a fact that using a UCS2-compiled Python instead of the
system's UCS4-compiled Python leads to a measurable, noticeable drop in
memory consumption of long-running webserver processes using Unicode


Would there be any sanity in having an option to compile
Python with UTF-8 as the internal string representation?

--
Greg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Stefan Behnel

Ian Bicking, 26.06.2010 00:26:

On Fri, Jun 25, 2010 at 4:02 PM, Guido van Rossum wrote:

On Fri, Jun 25, 2010 at 1:43 PM, Glyph Lefkowitz

I'd like a version of 'decode' which would give me a type that was, in
every respect, unicode, and responded to all protocols exactly as other
unicode objects (or "str objects", if you prefer py3 nomenclature ;-)) do,
but wouldn't actually copy any of that memory unless it really needed to
(for example, to pass to a C API that expected native wide characters), and
that would hold on to the original bytes so that it could produce them on
demand if encoded to the same encoding again. So, as others in this thread
have mentioned, the 'ABC' really implies some stuff about C APIs as well.


Well, there's the buffer API, so you can already create something that 
refers to an existing C buffer. However, with respect to a string, you will 
have to make sure the underlying buffer doesn't get freed while the string 
is still in use. That will be hard and sometimes impossible to do at the 
C-API level, even if the string is allowed to keep a reference to something 
that holds the buffer.


At least in lxml, such a feature would be completely worthless, as text is 
never held by any ref-counted Python wrapper object. It's only part of the 
XML tree, which is allowed to change at (more or less) any time, so the 
underlying char* buffer could just get freed without further notice. Adding 
a guard against that would likely have a larger impact on the performance 
than the decoding operations.




I'm not sure about the exact performance impact of such a class, which is
why I'd like the ability to implement it *outside* of the stdlib and see
how it works on a project, and return with a proposal along with some data.
There are also different ways to implement this, and other optimizations
(like ropes) which might be better. You can almost do this today, but the
lack of things like the hypothetical "__rcontains__" does make it
impossible to be totally transparent about it.

But you'd still have to validate it, right? You wouldn't want to go on
using what you thought was wrapped UTF-8 if it wasn't actually valid
UTF-8 (or you'd be worse off than in Python 2). So you're really just
worried about space consumption. I'd like to see a lot of hard memory
profiling data before I got overly worried about that.


It wasn't my profiling, but I seem to recall that Fredrik Lundh specifically
benchmarked ElementTree with all-unicode and sometimes-ascii-bytes, and
found that using Python 2 strs in some cases provided notable advantages.  I
know Stefan copied ElementTree in this regard in lxml, maybe he also did a
benchmark or knows of one?


Actually, bytes vs. unicode doesn't make that big a difference in Py2 for 
lxml. ElementTree is a lot older, so I guess it made a larger difference 
when its code was written (and I even think I recall seeing numbers for 
lxml where it seemed to make a notable difference).


In lxml, text content is stored in the C tree of libxml2 as UTF-8 encoded 
char* text. On request, lxml creates a string object from it and returns 
it. In Py2, it checks for plain ASCII content first and returns a byte 
string for that. Only non-ASCII strings are returned as decoded unicode 
strings. In Py3, it always returns unicode strings.
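The Py2 strategy Stefan describes — hand back a byte string for pure-ASCII content, a decoded unicode string otherwise — can be sketched roughly like this (a hypothetical helper illustrating the idea, not lxml's actual C code):

```python
def text_from_utf8(raw):
    """Mimic (in spirit) lxml's Py2 behaviour: libxml2 stores text as
    UTF-8 char*, and pure-ASCII content can be returned as a byte
    string, skipping the decode step entirely."""
    try:
        raw.decode("ascii")            # cheap validity check
        return raw                     # plain ASCII: return bytes as-is
    except UnicodeDecodeError:
        return raw.decode("utf-8")     # non-ASCII: decode to unicode

assert text_from_utf8(b"id42") == b"id42"
assert text_from_utf8("héllo".encode("utf-8")) == "héllo"
```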


When I run a little benchmark on lxml in Py2.6.5 that just reads some short 
text content from an Element object, I only see a tiny difference between 
unicode strings and byte strings. The gap obviously increases when the text 
gets longer, e.g. when I serialise the complete text content of an XML 
document to either a byte string or a unicode string. But even for 
documents in the megabyte range we are still talking about single 
milliseconds here, and the difference stays well below 10%. It's seriously 
hard to make that the performance bottleneck in an XML application.


Also, since the string objects are only instantiated on request, memory 
isn't an issue either. That's different for (c)ElementTree again, where 
string content is stored as Python objects. Four times the size even for 
plain ASCII strings (e.g. numbers, IDs or even trailing whitespace!) can 
well become a problem there, and can easily dominate the overall size of 
the in-memory tree. Plain ASCII content is surprisingly common in XML 
documents.
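The size overhead Stefan mentions is easy to verify; on a modern 64-bit CPython the per-object overhead dwarfs a short ASCII payload (exact numbers vary by version and build):

```python
import sys

# A short piece of XML text content (a number, an ID, whitespace)...
payload = b"42"

# ...costs far more once wrapped in a Python object: object header,
# length field, cached hash, allocator rounding, etc.
print(len(payload), "payload bytes")
print(sys.getsizeof(payload), "bytes as a bytes object")
print(sys.getsizeof(payload.decode("ascii")), "bytes as a str object")

# For short strings the ratio easily exceeds the "four times" figure.
assert sys.getsizeof(payload) > 4 * len(payload)
```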


Stefan



Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Stefan Behnel

Greg Ewing, 26.06.2010 09:58:

Tres Seaver wrote:


I do know for a fact that using a UCS2-compiled Python instead of the
system's UCS4-compiled Python leads to a measurable, noticeable drop in
memory consumption of long-running webserver processes using Unicode


Would there be any sanity in having an option to compile
Python with UTF-8 as the internal string representation?


It would break Py_UNICODE, because the internal size of a unicode character 
would no longer be fixed.
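The problem Stefan points to is that UTF-8 code units are variable-width, so there is no fixed-size `Py_UNICODE` element to index into; the variability is easy to demonstrate from Python itself (a quick illustration, not CPython internals):

```python
# In UTF-8, one character occupies one to four bytes, so byte offsets
# no longer correspond to character indices the way they do in a
# fixed-width (UCS2/UCS4) representation.
for ch in ("a", "é", "€", "𝄞"):
    print(ch, "->", len(ch.encode("utf-8")), "bytes")
# a -> 1, é -> 2, € -> 3, 𝄞 -> 4 bytes
```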


Stefan



Re: [Python-Dev] Signs of neglect?

2010-06-26 Thread Steve Holden
Nick Coghlan wrote:
> On Sat, Jun 26, 2010 at 9:23 AM, Benjamin Peterson wrote:
>> 2010/6/25 Steve Holden :
>> I would call it more a sign of no tests rather than one of neglect and
>> perhaps also an indication of the usefulness of those tools.
> 
> Less than useful tools with no tests probably qualify as neglected...
> 
> An assessment of the contents of the Py3k tools directory is probably
> in order, with at least a basic "will it run?" check added for those
> we decide to keep..
> 
Neither webchecker nor wcgui.py will run - the former breaks because
sgmllib is missing, the latter because it uses the wrong name for
"tkinter" (but overcoming this will throw it back to an sgmllib
dependency too).

Guido thinks it's OK to abandon at least some of them, so I don't see
the rest getting much love in the future. They do need sorting through -
I don't see anyone wanting xxci.py, for example ("check in files for
which rcsdiff returns nonzero exit status").

But I'm grateful you agree with my diagnosis of neglect (not that a
diagnosis in itself is going to help in fixing things).

regards
 Steve
-- 
Steve Holden   +1 571 484 6266   +1 800 494 3119
See Python Video!   http://python.mirocommunity.org/
Holden Web LLC http://www.holdenweb.com/
UPCOMING EVENTS:http://holdenweb.eventbrite.com/
"All I want for my birthday is another birthday" -
 Ian Dury, 1942-2000


[Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL

2010-06-26 Thread John Nagle

We have just released a proof-of-concept implementation of a new
approach to thread management - "newthreading".  It is available
for download at

https://sourceforge.net/projects/newthreading/

The user's guide is at

http://www.animats.com/papers/languages/newthreadingintro.html

This is a pure Python implementation of synchronized objects, along
with a set of restrictions which make programs race-condition free,
even without a Global Interpreter Lock.  The basic idea is that
classes derived from SynchronizedObject are automatically locked
at entry and unlocked at exit. They're also unlocked when a thread
blocks within the class.  So no two threads can ever be active
in such a class at the same time.
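The "locked at entry, unlocked at exit" behaviour can be sketched in pure Python by wrapping every public method of a class in a per-instance lock (a toy illustration of the idea only, not the newthreading implementation; it omits the unlock-on-block part):

```python
import functools
import threading

class SynchronizedObject:
    """Toy sketch: every public method of a subclass runs under a
    per-instance lock, so at most one thread is active in the object
    at any time."""

    def __init__(self):
        self._lock = threading.RLock()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Wrap each public method defined on the subclass.
        for name, attr in list(cls.__dict__.items()):
            if callable(attr) and not name.startswith("_"):
                setattr(cls, name, cls._synchronized(attr))

    @staticmethod
    def _synchronized(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            with self._lock:      # lock on entry, unlock on exit
                return method(self, *args, **kwargs)
        return wrapper

class Counter(SynchronizedObject):
    def __init__(self):
        super().__init__()
        self.value = 0

    def increment(self):
        self.value += 1           # safe: runs under the instance lock

c = Counter()
threads = [threading.Thread(target=lambda: [c.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert c.value == 4000            # no lost updates
```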

In addition, only "frozen" objects can be passed in and out of
synchronized objects.  (This is somewhat like the multiprocessing
module, where you can only pass objects that can be "pickled",
but it's not as restrictive; multiple threads can access the
same synchronized object, one at a time.)

This pure Python implementation is usable, but does not improve
performance.  It's a proof of concept implementation so that
programmers can try out synchronized classes and see what it's
like to work within those restrictions.

The semantics of Python don't change for single-thread programs.
But when the program forks off the first new thread, the rules
change, and some of the dynamic features of Python are disabled.

Some of the ideas are borrowed from Java, and some are from
"safethreading".  The point is to come up with a set of liveable
restrictions which would allow getting rid of the GIL.  This
is becoming essential as Unladen Swallow starts to work and the
number of processors per machine keeps climbing.

This may in time become a Python Enhancement Proposal.  We'd like
to get some experience with it first. Try it out and report back.
The SourceForge forum for the project is the best place to report problems.

John Nagle


Re: [Python-Dev] [pypy-dev] PyPy 1.3 released

2010-06-26 Thread Armin Rigo
Hi,

On Fri, Jun 25, 2010 at 05:27:52PM -0600, Maciej Fijalkowski wrote:
>python setup.py build

As corrected on the blog (http://morepypy.blogspot.com/), this line
should read:

 pypy setup.py build


Armin.


Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL

2010-06-26 Thread Michael Foord

On 26/06/2010 07:11, John Nagle wrote:

We have just released a proof-of-concept implementation of a new
approach to thread management - "newthreading". It is available
for download at

https://sourceforge.net/projects/newthreading/

The user's guide is at

http://www.animats.com/papers/languages/newthreadingintro.html


The user guide says:

The suggested import is

from newthreading import *

The import * form is considered bad practice in *general* and should not 
be recommended unless there is a good reason. This is slightly off-topic 
for python-dev, although I appreciate that you want feedback with the 
eventual goal of producing a PEP - however the introduction of 
free-threading in Python has not been hampered by lack of 
synchronization primitives but by the difficulty of changing the 
interpreter without unduly impacting single threaded code.


Providing an alternative garbage collection mechanism other than 
reference counting would be a more interesting first-step as far as I 
can see, as that removes the locking required around every access to an 
object (which currently touches the reference count). Introducing 
free-threading by *changing* the threading semantics (so you can't share 
non-frozen objects between threads) would not be acceptable. That 
comment is likely to be based on a misunderstanding of your future 
intentions though. :-)


All the best,

Michael Foord



--
http://www.ironpythoninaction.com/
http://www.voidspace.org.uk/blog





Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL

2010-06-26 Thread Jesse Noller
On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord
 wrote:
> On 26/06/2010 07:11, John Nagle wrote:
>>
>> We have just released a proof-of-concept implementation of a new
>> approach to thread management - "newthreading". It is available
>> for download at
>>
>> https://sourceforge.net/projects/newthreading/
>>
>> The user's guide is at
>>
>> http://www.animats.com/papers/languages/newthreadingintro.html
>
> The user guide says:
>
> The suggested import is
>
> from newthreading import *
>
> The import * form is considered bad practice in *general* and should not be
> recommended unless there is a good reason. This is slightly off-topic for
> python-dev, although I appreciate that you want feedback with the eventual
> goal of producing a PEP - however the introduction of free-threading in
> Python has not been hampered by lack of synchronization primitives but by
> the difficulty of changing the interpreter without unduly impacting single
> threaded code.
>

I asked John to drop a message here for this project - so feel free to
flame me if anyone objects. This *is* relevant, and I'd guess fairly
interesting to the group as a whole.

jesse


Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL

2010-06-26 Thread Antoine Pitrou
On Sat, 26 Jun 2010 14:29:24 +0100
Michael Foord  wrote:
> 
> the introduction of 
> free-threading in Python has not been hampered by lack of 
> synchronization primitives but by the difficulty of changing the 
> interpreter without unduly impacting single threaded code.

Exactly what I think too.

cheers

Antoine.




Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL

2010-06-26 Thread Jesse Noller
On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord
 wrote:
> On 26/06/2010 07:11, John Nagle wrote:
>>
>> We have just released a proof-of-concept implementation of a new
>> approach to thread management - "newthreading". It is available
>> for download at
>>
>> https://sourceforge.net/projects/newthreading/
>>
>> The user's guide is at
>>
>> http://www.animats.com/papers/languages/newthreadingintro.html
>
> The user guide says:
>
> The suggested import is
>
> from newthreading import *
>
> The import * form is considered bad practice in *general* and should not be
> recommended unless there is a good reason. This is slightly off-topic for
> python-dev, although I appreciate that you want feedback with the eventual
> goal of producing a PEP - however the introduction of free-threading in
> Python has not been hampered by lack of synchronization primitives but by
> the difficulty of changing the interpreter without unduly impacting single
> threaded code.
>
> Providing an alternative garbage collection mechanism other than reference
> counting would be a more interesting first-step as far as I can see, as that
> removes the locking required around every access to an object (which
> currently touches the reference count). Introducing free-threading by
> *changing* the threading semantics (so you can't share non-frozen objects
> between threads) would not be acceptable. That comment is likely to be based
> on a misunderstanding of your future intentions though. :-)
>
> All the best,
>
> Michael Foord

I'd also like to point out that one of the projects John cites is Adam
Olsen's Safethread work:

http://code.google.com/p/python-safethread/

Which, in and of itself, is a good read.


Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Stephen J. Turnbull
Greg Ewing writes:

 > Would there be any sanity in having an option to compile
 > Python with UTF-8 as the internal string representation?

Losing Py_UNICODE as mentioned by Stefan Behnel (IIRC) is just the
beginning of the pain.

If Emacs's experience is any guide, the cost in speed and complexity
of a variable-width internal representation is high.  There are a
number of tricks you can use, but basically everything becomes O(n)
for the natural implementation of most operations (such as indexing by
character).  You can get around that with a position cache, of course,
but that adds complexity, and really cuts into the space saving (and
worse, adds another chunk that may or may not be paged in when you
need it).
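The O(n) cost Stephen describes follows directly from the variable width: finding the i-th character of a UTF-8 buffer means scanning from the start (or from a cached position). A minimal sketch of that scan over raw bytes:

```python
def char_at(buf: bytes, index: int) -> str:
    """Return character `index` of UTF-8 `buf` by linear scan: O(n),
    versus O(1) array indexing in a fixed-width representation."""
    pos = 0
    for _ in range(index):
        first = buf[pos]
        if first < 0x80:
            pos += 1              # 1-byte sequence (ASCII)
        elif first < 0xE0:
            pos += 2              # 2-byte sequence (leading 110xxxxx)
        elif first < 0xF0:
            pos += 3              # 3-byte sequence (leading 1110xxxx)
        else:
            pos += 4              # 4-byte sequence (leading 11110xxx)
    # Decode just the one code point at the computed byte offset.
    end = pos + 1
    while end < len(buf) and buf[end] & 0xC0 == 0x80:  # continuation bytes
        end += 1
    return buf[pos:end].decode("utf-8")

text = "naïve – ☃"
assert char_at(text.encode("utf-8"), 2) == "ï"
assert char_at(text.encode("utf-8"), 8) == "☃"
```

A position cache, as mentioned above, amortizes the scan for sequential access but adds exactly the kind of complexity and memory the post warns about.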

What we're considering is a system where buffers come in 1-, 2-, and
4-octet widechars, with automatic translation depending on content.
But the buffer is the primary random-access structure in Emacsen, so
optimizing it is probably worth our effort.  I doubt it would be worth
it for Python, but my intuitions here are not reliable.


Re: [Python-Dev] [ANN]: "newthreading" - an approach to simplified thread usage, and a path to getting rid of the GIL

2010-06-26 Thread John Nagle

On 6/26/2010 7:44 AM, Jesse Noller wrote:

On Sat, Jun 26, 2010 at 9:29 AM, Michael Foord
  wrote:

On 26/06/2010 07:11, John Nagle wrote:


We have just released a proof-of-concept implementation of a new
approach to thread management - "newthreading".




The import * form is considered bad practice in *general* and
should not be recommended unless there is a good reason.


   I agree.  I just did that to make the examples cleaner.


however the introduction of free-threading in Python has not been
hampered by lack of synchronization primitives but by the
difficulty of changing the interpreter without unduly impacting
single threaded code.


That's what I'm trying to address here.


Providing an alternative garbage collection mechanism other than
reference counting would be a more interesting first-step as far as
I can see, as that removes the locking required around every access
to an object (which currently touches the reference count).
Introducing free-threading by *changing* the threading semantics
(so you can't share non-frozen objects between threads) would not
be acceptable. That comment is likely to be based on a
misunderstanding of your future intentions though. :-)


This work comes out of a discussion a few of us had at a restaurant
in Palo Alto after a Stanford talk by the group at Facebook which
is building a JIT compiler for PHP.  We were discussing how to
make threading both safe for the average programmer and efficient.
Javascript and PHP don't have threads at all; Python has safe
threading, but it's slow.  C/C++/Java all have race condition
problems, of course.  The Facebook guy pointed out that you
can't redefine a function dynamically in PHP, and they get
a performance win in their JIT by exploiting this.

I haven't gone into the memory model in enough detail in the
technical paper.  The memory model I envision for this has three
memory zones:

1.  Shared fully-immutable objects: primarily strings, numbers,
and tuples, all of whose elements are fully immutable.  These can
be shared without locking, and reclaimed by a concurrent garbage
collector like Boehm's.  They have no destructors, so finalization
is not an issue.

2.  Local objects.  These are managed as at present, and
require no locking.  These can either be thread-local, or local
to a synchronized object.  There are no links between local
objects under different "ownership".  Whether each thread and
object has its own private heap, or whether there's a common heap with
locks at the allocator is an implementation decision.

3.  Shared mutable objects: mostly synchronized objects, but
also immutable objects like tuples which contain references
to objects that aren't fully immutable.  These are the high-overhead
objects, and require locking during reference count updates, or
atomic reference count operations if supported by the hardware.
The general idea is to minimize the number of objects in this
zone.

The zone of an object is determined when the object is created,
and never changes.   This is relatively simple to implement.
Tuples (and frozensets, frozendicts, etc.) are normally zone 2
objects.  Only "freeze" creates collections in zones 1 and 3.
Synchronized objects are always created in zone 3.
There are no difficult handoffs, where an object that was previously
thread-local now has to be shared and has to acquire locks during
the transition.
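A rough sketch of what "freeze" might do for ordinary containers — recursively converting them to immutable counterparts so the result could live in zone 1 (a hypothetical helper for illustration; the real newthreading semantics are John's design, not this code):

```python
def freeze(obj):
    """Recursively convert containers to immutable counterparts,
    rejecting anything that cannot be made fully immutable."""
    if isinstance(obj, (str, bytes, int, float, complex, type(None))):
        return obj                                    # already immutable
    if isinstance(obj, (list, tuple)):
        return tuple(freeze(item) for item in obj)
    if isinstance(obj, (set, frozenset)):
        return frozenset(freeze(item) for item in obj)
    if isinstance(obj, dict):
        # No frozendict in the stdlib; a sorted tuple of pairs stands in.
        return tuple(sorted((freeze(k), freeze(v)) for k, v in obj.items()))
    raise TypeError("cannot freeze %r" % type(obj).__name__)

snapshot = freeze({"ids": [1, 2, 3], "tags": {"a", "b"}})
# The result contains only hashable, fully-immutable pieces,
# so it would be safe to share without locking.
hash(snapshot)
```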

Existing interlinked data structures, like parse trees and GUIs,
are by default zone 2 objects, with the same semantics as at
present.  They can be placed inside a SynchronizedObject if
desired, which makes them usable from multiple threads.
That's optional; they're thread-local otherwise.

The rationale behind "freezing" some of the language semantics
when the program goes multi-thread comes from two sources -
Adam Olsen's Safethread work, and the acceptance of the
multiprocessing module.  Olsen tried to retain all the dynamism of
the language in a multithreaded environment, but locking all the
underlying dictionaries was a boat-anchor on the whole system,
and slowed things down so much that he abandoned the project.
The Unladen Swallow documentation indicates that early thinking
on the project was that Olsen's approach would allow getting
rid of the GIL, but later notes indicate that no path to a
GIL-free JIT system is currently in development.

The multiprocessing module provides semantics similar to
threading with "freezing".  Data passed between processes is "frozen"
by pickling.  Processes can't modify each other's code.  Restrictive
though the multiprocessing module is, it appears to be useful.
It is sometimes recommended as the Pythonic approach to multi-core CPUs.
This is an indication that "freezing" is not unacceptable to the
user community.

Most of the real-world use cases for extreme dynamism
involve events that happen during startup.  Configuration files are
read, modules are selectively included, functions are overridden,

Re: [Python-Dev] bytes / unicode

2010-06-26 Thread P.J. Eby

At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote:

What I'm saying here is that if bytes are the signal of validity, and
the stdlib functions preserve validity, then it's better to have the
stdlib functions object to unicode data as an argument.  Compare the
alternative: it returns a unicode object which might get passed around
for a while before one of your functions receives it and identifies it
as unvalidated data.


I still don't follow, since passing in bytes should return 
bytes.  Returning unicode would be an error, in the case of a 
"polymorphic" function (per Guido).




But you agree that there are better mechanisms for validation
(although not available in Python yet), so I don't see this as an
potential obstacle to polymorphism now.


Nope.  I'm just saying that, given two bytestrings to url-join or 
path join or whatever, a polymorph should hand back a 
bytestring.  This seems pretty uncontroversial.
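The behaviour PJE calls uncontroversial is what the Python 3 path functions ended up doing: bytes in, bytes out; str in, str out; mixing the two is rejected rather than silently coerced (shown here with `posixpath` as it behaves in current Python 3):

```python
import posixpath

# Polymorphic: the result type follows the argument types.
assert posixpath.join("a", "b") == "a/b"       # str in -> str out
assert posixpath.join(b"a", b"b") == b"a/b"    # bytes in -> bytes out

# Mixing bytes and str raises instead of guessing an encoding.
try:
    posixpath.join("a", b"b")
except TypeError:
    print("mixed str/bytes arguments raise TypeError")
```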




 > What I want is for the stdlib to create stringlike objects of a
 > type determined by the types of the inputs --

In general this is a hard problem, though.  Polymorphism, OK, one-way
tainting OK, but in general combining related types is pretty
arbitrary, and as in the encoded-bytes case, the result type often
varies depending on expectations of callers, not the types of the
data.


But the caller can enforce those expectations by passing in arguments 
whose types do what they want in such cases, as long as the string 
literals used by the function don't get to override the relevant 
parts of the string protocol(s).


The idea that I'm proposing is that the basic string and byte types 
should defer to "user-defined" string types for mixed type 
operations, so that polymorphism of string-manipulation functions is 
the *default* case, rather than a *special* case.  This makes 
tainting easier to implement, as well as optimizing and other special 
cases (like my "source string w/file and line info", or a string with 
font/formatting attributes).
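Part of what PJE asks for already falls out of Python's reflected-operation rules: when the right-hand operand is a subclass of the left-hand type, its reflected method is tried first. A sketch of a "tainted" string type that survives mixing with plain str for `+` (hypothetical class; stdlib string methods still do not defer, which is exactly PJE's complaint):

```python
class Tainted(str):
    """User-defined string type that propagates itself through
    mixed-type concatenation -- the kind of type PJE wants string
    protocols to preserve by default."""

    def __add__(self, other):
        return Tainted(str.__add__(self, other))

    def __radd__(self, other):
        # Because Tainted subclasses str, plain_str + tainted tries
        # this reflected method first, so the result stays Tainted.
        return Tainted(str.__add__(other, self))

user_input = Tainted("<script>")
page = "prefix: " + user_input + " :suffix"
assert type(page) is Tainted              # taint propagated through +

# But ordinary str methods still return plain str, dropping the taint --
# which is why PJE wants deferral built into the protocol itself:
assert type(user_input.upper()) is str
```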








Re: [Python-Dev] versioned .so files for Python 3.2

2010-06-26 Thread Matthias Klose

On 25.06.2010 22:12, James Y Knight wrote:


On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:


On 6/24/2010 8:23 PM, James Y Knight wrote:

On Jun 24, 2010, at 5:53 PM, Scott Dial wrote:

If the package has .so files that aren't compatible with other version
of python, then what is the motivation for placing that in a shared
location (since it can't actually be shared)


Because python looks for .so files in the same place it looks for the
.py files of the same package.


My suggestion was that a package that contains .so files should not be
shared (e.g., the entire lxml package should be placed in a
version-specific path). The motivation for this PEP was to simplify the
installation python packages for distros; it was not to reduce the
number of .py files on the disk.

Placing .so files together does not simplify that install process in any
way. You will still have to handle such packages in a special way.



This is a good point, but I think still falls short of a solution. For a
package like lxml, indeed you are correct. Since debian needs to build
it once per version, it could just put the entire package (.py files and
.so files) into a different per-python-version directory.


This is what is currently done.  This will increase the size of packages by 
duplicating the .py files, or else you have to install the .py files in a common 
location (one not on sys.path) and provide (sym)links to the expected location.


A "different per-python-version directory" also has the disadvantage that file 
conflicts between (distribution) packages cannot be detected.



However, then you have to also consider python packages made up of
multiple distro packages -- like twisted or zope. Twisted includes some
C extensions in the core package. But then there are other twisted
modules (installed under a "twisted.foo" name) which do not include C
extensions. If the base twisted package is installed under a
version-specific directory, then all of the submodule packages need to
also be installed under the same version-specific directory (and thus
built for all versions).

In the past, it has proven somewhat tricky to coordinate which directory
the modules for package "foo" should be installed in, because you need
to know whether *any* of the related packages includes a native ".so"
file, not just the current package.

The converse situation, where a base package did *not* get installed
into a version-specific directory because it includes no native code,
but a submodule *does* include a ".so" file, is even trickier.


I don't think that installation into different locations based on the presence 
of extensions will work.  Should a location really change if an extension is 
added as an optimization?  Splitting a (python) package across different 
installation locations should be avoided.


  Matthias


Re: [Python-Dev] versioned .so files for Python 3.2

2010-06-26 Thread Matthias Klose

On 26.06.2010 02:19, Nick Coghlan wrote:

On Sat, Jun 26, 2010 at 6:12 AM, James Y Knight  wrote:

However, then you have to also consider python packages made up of multiple
distro packages -- like twisted or zope. Twisted includes some C extensions
in the core package. But then there are other twisted modules (installed
under a "twisted.foo" name) which do not include C extensions. If the base
twisted package is installed under a version-specific directory, then all of
the submodule packages need to also be installed under the same
version-specific directory (and thus built for all versions).

In the past, it has proven somewhat tricky to coordinate which directory the
modules for package "foo" should be installed in, because you need to know
whether *any* of the related packages includes a native ".so" file, not just
the current package.

The converse situation, where a base package did *not* get installed into a
version-specific directory because it includes no native code, but a
submodule *does* include a ".so" file, is even trickier.


I think there are two major ways to tackle this:
- allow multiple versions of a .so file within a single directory (i.e
Barry's current suggestion)


We already do this; see the naming of extensions for a Python debug build on 
Windows.  Several distributions (Debian, Fedora, Ubuntu) use the same scheme to 
provide extensions for Python debug builds.



- enhanced namespace packages, allowing a single package to be spread
across multiple directories, some of which may be Python version
specific (i.e. modifications to PEP 382 to support references to
version-specific directories)


This is not what I want to use in a distribution.  Package management systems 
like rpm and dpkg handle conflicts and replacements of files pretty well, but 
having the same file in potentially different locations in the file system 
doesn't help with detecting conflicts and duplicate packages.


  Matthias


Re: [Python-Dev] versioned .so files for Python 3.2

2010-06-26 Thread Matthias Klose

On 24.06.2010 22:46, Barry Warsaw wrote:

On Jun 24, 2010, at 02:28 PM, Barry Warsaw wrote:


On Jun 24, 2010, at 01:00 PM, Benjamin Peterson wrote:


2010/6/24 Barry Warsaw:

On Jun 24, 2010, at 10:58 AM, Benjamin Peterson wrote:


2010/6/24 Barry Warsaw:

Please let me know what you think.  I'm happy to just commit this to the
py3k branch if there are no objections.  I don't think a new PEP is
in order, but an update to PEP 3147 might make sense.


How will this interact with PEP 384 if that is implemented?

I'm trying to come up with something that will work immediately while PEP 384
is being adopted.


But how will modules specify that they support multiple ABIs then?


I didn't understand, so asked Benjamin for clarification in IRC.

  barry: if python 3.3 will only load x.3.3.so, but x.3.2.so supports
   the stable abi, will it load it?  [14:25]
  gutworth: thanks, now i get it :)  [14:26]
  gutworth: i think it should, but it wouldn't under my scheme.  let me
think about it


So, we could say that PEP 384 compliant extension modules would get written
without a version specifier.  IOW, we'd treat foo.so as using the stable ABI.  It
would then be up to the Python runtime to raise ImportError if in fact we
were loading a legacy, non-PEP 384 compliant extension.


Is it realistic to never break the ABI?  I would think of having the ABI encoded 
in the file name as well, and only bumping it when the ABI does change.  With the 
"versioned .so files" proposal an ABI bump is necessary with every Python 
version; with PEP 384 the ABI bump would be decoupled from the Python version.
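A toy sketch of the lookup rule being discussed (hypothetical — this is not CPython's actual import machinery, and the tag name is illustrative): prefer a version-tagged extension, then fall back to an untagged file assumed to target the stable ABI:

```python
import os

def find_extension(name, directory, tag="cpython-32"):
    """Return the path of the extension file to load, or None."""
    candidates = [
        f"{name}.{tag}.so",  # built for this specific Python version
        f"{name}.so",        # untagged: assumed PEP 384 stable ABI
    ]
    for candidate in candidates:
        path = os.path.join(directory, candidate)
        if os.path.exists(path):
            return path
    return None
```

Under this rule an ABI bump only forces a rebuild of the tagged files; stable-ABI extensions keep working unversioned.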


  Matthias


Re: [Python-Dev] FHS compliance of Python installation

2010-06-26 Thread Matthias Klose

On 25.06.2010 02:54, Ben Finney wrote:

James Y Knight  writes:


Really, python should store the .py files in /usr/share/python/, the
.so files in /usr/lib/x86_64- linux-gnu/python2.5-debug/, and the .pyc
files in /var/lib/python2.5- debug. But python doesn't work like that.


+1

So who's going to draft the “Filesystem Hierarchy Standard compliance”
PEP? :-)


This has nothing to do with the FHS.  The FHS talks about data, not code.


Re: [Python-Dev] FHS compliance of Python installation

2010-06-26 Thread C. Titus Brown
On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:
> On 25.06.2010 02:54, Ben Finney wrote:
>> James Y Knight  writes:
>>
>>> Really, python should store the .py files in /usr/share/python/, the
>>> .so files in /usr/lib/x86_64- linux-gnu/python2.5-debug/, and the .pyc
>>> files in /var/lib/python2.5- debug. But python doesn't work like that.
>>
>> +1
>>
>> So who's going to draft the “Filesystem Hierarchy Standard compliance”
>> PEP? :-)
>
> This has nothing to do with the FHS.  The FHS talks about data, not code.

Really?  It has some guidelines here for object files, etc., at least as
of 2004.

http://www.pathname.com/fhs/pub/fhs-2.3.html

A quick scan suggests /usr/lib is the right place to look:

http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA

cheers,
--titus
-- 
C. Titus Brown, c...@msu.edu


Re: [Python-Dev] FHS compliance of Python installation

2010-06-26 Thread Matthias Klose

On 26.06.2010 22:30, C. Titus Brown wrote:

On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:

On 25.06.2010 02:54, Ben Finney wrote:

James Y Knight   writes:


Really, python should store the .py files in /usr/share/python/, the
.so files in /usr/lib/x86_64- linux-gnu/python2.5-debug/, and the .pyc
files in /var/lib/python2.5- debug. But python doesn't work like that.


+1

So who's going to draft the “Filesystem Hierarchy Standard compliance”
PEP? :-)


This has nothing to do with the FHS.  The FHS talks about data, not code.


Really?  It has some guidelines here for object files, etc., at least as
of 2004.

http://www.pathname.com/fhs/pub/fhs-2.3.html

A quick scan suggests /usr/lib is the right place to look:

http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA


agreed for object files, but
http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENTDATA
explicitly states "The /usr/share hierarchy is for all read-only architecture 
independent *data* files".



Re: [Python-Dev] versioned .so files for Python 3.2

2010-06-26 Thread Matthias Klose

On 25.06.2010 20:58, Brett Cannon wrote:

On Fri, Jun 25, 2010 at 01:53, Scott Dial

Placing .so files together does not simplify that install process in any
way. You will still have to handle such packages in a special way. You
must still compile the package multiple times for each relevant version
of python (with special tagging that I imagine distutils can take care
of) and, worse yet, you have created a trickier install than merely
having multiple search paths (e.g., installing/uninstalling lxml for
*one* version of python is actually more difficult in this scheme).


This is meant to be used by distros in a programmatic fashion, so my
response is "so what?" Their package management system is going to
maintain the directory, not a person. You and I are not going to be
using this for anything. This is purely meant for Linux OS vendors
(maybe OS X) to manage their installs through their package software.
I honestly do not expect human beings to be mucking around with these
installs (and I suspect Barry doesn't either).


Placing files for a distribution in a version-independent path does help 
distributions handling file conflicts, detecting duplicates and with moving 
files between different (distribution) packages.


Having non-conflicting extension names is a scheme already used on some 
platforms (debug builds on Windows).  The question for me is whether a plain renaming 
of the .so files is acceptable upstream, or whether distributors should implement 
this on their own, as something like:


  if ext_path.startswith('/usr/') and not ext_path.startswith('/usr/local/'):
      load_ext('foo.2.6.so')
  else:
      load_ext('foo.so')

I fear this will cause issues when e.g. virtualenv environments start copying 
parts of the system installation instead of symlinking them.


  Matthias


Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-26 Thread Roumen Petrov

Brett Cannon wrote:

I finally realized why clang has not been silencing its warnings about
unused return values: I have -Wno-unused-value set in CFLAGS which
comes before OPT (which defines -Wall) as set in PY_CFLAGS in
Makefile.pre.in.

I could obviously set OPT in my environment, but that would override
the default OPT settings Python uses. I could put it in EXTRA_CFLAGS,
but the README says that's for stuff that tweak binary compatibility.

So basically what I am asking is what environment variable should I
use? If CFLAGS is correct then does anyone have any issues if I change
the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes
after OPT?


It does not matter much to me: flags set in BASECFLAGS, CFLAGS, OPT or 
EXTRA_CFLAGS all end up in the Makefile's CFLAGS macro, and distutils will 
later use them to build extension modules.  So all the variables are 
equivalent for builds.


Also, after running configure without OPT set, we can check which flags the 
script selects for the build platform and then rerun configure with OPT set to 
those flags plus our own on the command line (! ;) ).


Roumen


Re: [Python-Dev] FHS compliance of Python installation

2010-06-26 Thread James Y Knight


On Jun 26, 2010, at 4:35 PM, Matthias Klose wrote:


On 26.06.2010 22:30, C. Titus Brown wrote:

On Sat, Jun 26, 2010 at 10:25:28PM +0200, Matthias Klose wrote:

On 25.06.2010 02:54, Ben Finney wrote:

James Y Knight   writes:

Really, python should store the .py files in /usr/share/python/,  
the
.so files in /usr/lib/x86_64- linux-gnu/python2.5-debug/, and  
the .pyc
files in /var/lib/python2.5- debug. But python doesn't work like  
that.


+1

So who's going to draft the “Filesystem Hierarchy Standard
compliance”

PEP? :-)


This has nothing to do with the FHS.  The FHS talks about data,  
not code.


Really?  It has some guidelines here for object files, etc., at  
least as

of 2004.

http://www.pathname.com/fhs/pub/fhs-2.3.html

A quick scan suggests /usr/lib is the right place to look:

http://www.pathname.com/fhs/pub/fhs-2.3.html#USRLIBLIBRARIESFORPROGRAMMINGANDPA


agreed for object files, but
http://www.pathname.com/fhs/pub/fhs-2.3.html#USRSHAREARCHITECTUREINDEPENDENTDATA
explicitly states "The /usr/share hierarchy is for all read-only 
architecture independent *data* files".


I always figured the "read-only architecture independent" bit was the  
important part there, and "code is data". Emacs's el files go into / 
usr/share/emacs, for instance.


James


Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Terry Reedy
The several posts in this and other threads got me thinking about text 
versus number computing (which I am more familiar with).


For numbers, we have in Python three builtins, the general purpose ints 
and floats and the more specialized complex. Two other rational types 
can be imported for specialized uses. And then there are 3rd-party 
libraries like mpz and numpy with more number and array of number types.


What makes these all potentially work together is the special method 
system, including, in particular, the rather complete set of __rxxx__ 
number methods. The latter allow non-commutative operations to be mixed 
either way and ease mixed commutative operations.
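As a concrete (hypothetical) illustration of how the __rxxx__ methods make this work for numbers — the class name and representation here are invented for the example:

```python
# When int.__add__ doesn't recognise the other operand it returns
# NotImplemented, and Python falls back to that operand's __radd__.
class Frac:
    def __init__(self, num, den):
        self.num, self.den = num, den

    def __add__(self, other):
        if isinstance(other, int):
            return Frac(self.num + other * self.den, self.den)
        return NotImplemented

    __radd__ = __add__  # addition is commutative here

    def __repr__(self):
        return f"Frac({self.num}, {self.den})"

print(Frac(1, 2) + 2)  # -> Frac(5, 2)
print(2 + Frac(1, 2))  # -> Frac(5, 2), via Frac.__radd__
```

This is the mechanism that lets third-party number types mix freely with the builtins, and it is what the missing string-side methods such as __rcontains__ would supply for text types.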


For text, we have general purpose str and encoded bytes (and bytearry). 
I think these are sufficient for general use and I am not sure there 
should even be anything else in the stdlib. But I think it should be 
possible to experiment with and use specialized 3rd-party text classes 
just as one can with number classes.


I can imagine that inter-operation, when appropriate, might work better 
with the addition of a couple of missing __rxxx__ methods, such as the 
mentioned __rcontains__. Although adding these would affect the 
implementation of a core syntax feature, it would not affect the syntax 
as seen by the user.


--
Terry Jan Reedy



Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-26 Thread Brett Cannon
On Wed, Jun 23, 2010 at 14:53, Brett Cannon  wrote:
> I finally realized why clang has not been silencing its warnings about
> unused return values: I have -Wno-unused-value set in CFLAGS which
> comes before OPT (which defines -Wall) as set in PY_CFLAGS in
> Makefile.pre.in.
>
> I could obviously set OPT in my environment, but that would override
> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS,
> but the README says that's for stuff that tweak binary compatibility.
>
> So basically what I am asking is what environment variable should I
> use? If CFLAGS is correct then does anyone have any issues if I change
> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes
> after OPT?
>

Since no one objected I swapped the order in r82259. In case anyone
else uses clang to compile Python, this means that -Wno-unused-value
will now work to silence the warning about unused return values that
is caused by some macros. Probably using -Wno-empty-body is also good
to avoid all the warnings triggered by the UCS4 macros in cjkcodecs.


Re: [Python-Dev] versioned .so files for Python 3.2

2010-06-26 Thread Scott Dial
On 6/26/2010 4:06 PM, Matthias Klose wrote:
> On 25.06.2010 22:12, James Y Knight wrote:
>> On Jun 25, 2010, at 4:53 AM, Scott Dial wrote:
>>> Placing .so files together does not simplify that install process in any
>>> way. You will still have to handle such packages in a special way.
>>
>> This is a good point, but I think still falls short of a solution. For a
>> package like lxml, indeed you are correct. Since debian needs to build
>> it once per version, it could just put the entire package (.py files and
>> .so files) into a different per-python-version directory.
> 
> This is what is currently done.  This will increase the size of packages
> by duplicating the .py files, or you have to install the .py in a common
> location (irrelevant to sys.path), and provide (sym)links to the
> expected location.

"This is what is currently done"  and "provide (sym)links to the
expected location" are conflicting statements. If you are symlinking .py
files from a shared location, then that is not the same as "just install
the package into a version-specific location". What motivation is there
for preferring symlinks?

Who cares if a distro package install yields duplicate .py files? Nor am
I motivated by having to carry duplicate .py files in a distribution
package (I imagine the compression of duplicate .py files is amazing).

> A "different per-python-version directory" also has the disadvantage
> that file conflicts between (distribution) packages cannot be detected.

Why? That sounds like a broken tool, maybe I am naive, please explain.
If two packages install /usr/lib/python2.6/foo.so, that should be just as
detectable as two packages installing /usr/lib/python-shared/foo.cpython-26.so.

If you *must* compile .so files for every supported version of python at
packaging time, then you are already saying the set of python versions
is known. I fail to see the difference between a package that installs
.py and .so files into many directories than having many .so files in a
single directory; except that many directories *already* works. The only
gain I can see is that you save duplicate .py files in the package and
on the filesystem, and I don't feel that gain alone warrants this
fundamental change.

I would appreciate a proper explanation of why/how a single directory is
better for your distribution. Also, I haven't heard anyone that wasn't
using debian tools chime in with support for any of this, so I would
like to know how this can help RPMs and ebuilds and the like.

> I don't think that installation into different locations based on the
> presence of extension will work.  Should a location really change if an
> extension is added as an optimization?  Splitting a (python) package
> into different installation locations should be avoided.

I'm not sure why changing paths would matter; any package that writes
data in its install location would be considered broken by your distro
already, so what harm is there in having the packaging tool move it
later? Your tool will remove the old path and place it in a new path.

All of these shenanigans seem to manifest from your distro's
python-support/-central design, which seems to be entirely motivated by
reducing duplicate files and *not* simplifying the packaging. While this
plan works rather well with .py files, the devil is in the details. I
don't think Python should be getting involved in what I believe is a
flawed design.

What happens to the distro packaging if a python package splits the
codebase between 2.x and 3.x (meaning they have distinct .py files)? As
someone else mentioned, how is virtualenv going to interact with
packages that install like this?

-- 
Scott Dial
sc...@scottdial.com
scod...@cs.indiana.edu


Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-26 Thread M.-A. Lemburg
Brett Cannon wrote:
> On Wed, Jun 23, 2010 at 14:53, Brett Cannon  wrote:
>> I finally realized why clang has not been silencing its warnings about
>> unused return values: I have -Wno-unused-value set in CFLAGS which
>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in
>> Makefile.pre.in.
>>
>> I could obviously set OPT in my environment, but that would override
>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS,
>> but the README says that's for stuff that tweak binary compatibility.
>>
>> So basically what I am asking is what environment variable should I
>> use? If CFLAGS is correct then does anyone have any issues if I change
>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes
>> after OPT?
>>
> 
> Since no one objected I swapped the order in r82259. In case anyone
> else uses clang to compile Python, this means that -Wno-unused-value
> will now work to silence the warning about unused return values that
> is caused by some macros. Probably using -Wno-empty-body is also good
> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs.

I think you need to come up with a different solution and revert
the change...

OPT has historically been the only variable to use for
adjusting the Python C compiler settings.

As the name implies this was usually used to adjust the
optimizer settings, including raising the optimization level
from the default or disabling it.

With your change CFLAGS will always override OPT and thus
any optimization definitions made in OPT will no longer
have an effect.

Note that CFLAGS defines -O2 on many platforms.

In your particular case, you should try setting OPT to
"... -Wno-unused-value ..." (ie. replace -Wall with your
setting).

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Jun 27 2010)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2010-07-19: EuroPython 2010, Birmingham, UK21 days to go

::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-26 Thread Brett Cannon
On Sat, Jun 26, 2010 at 16:37, M.-A. Lemburg  wrote:
> Brett Cannon wrote:
>> On Wed, Jun 23, 2010 at 14:53, Brett Cannon  wrote:
>>> I finally realized why clang has not been silencing its warnings about
>>> unused return values: I have -Wno-unused-value set in CFLAGS which
>>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in
>>> Makefile.pre.in.
>>>
>>> I could obviously set OPT in my environment, but that would override
>>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS,
>>> but the README says that's for stuff that tweak binary compatibility.
>>>
>>> So basically what I am asking is what environment variable should I
>>> use? If CFLAGS is correct then does anyone have any issues if I change
>>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes
>>> after OPT?
>>>
>>
>> Since no one objected I swapped the order in r82259. In case anyone
>> else uses clang to compile Python, this means that -Wno-unused-value
>> will now work to silence the warning about unused return values that
>> is caused by some macros. Probably using -Wno-empty-body is also good
>> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs.
>
> I think you need to come up with a different solution and revert
> the change...
>
> OPT has historically been the only variable to use for
> adjusting the Python C compiler settings.

Just found the relevant section in the README.

>
> As the name implies this was usually used to adjust the
> optimizer settings, including raising the optimization level
> from the default or disabling it.

It meant optional to me, not optimization. I hate abbreviations sometimes.

>
> With your change CFLAGS will always override OPT and thus
> any optimization definitions made in OPT will no longer
> have an effect.

That was the point; OPT defines defaults through configure.in and I
simply wanted to add to those instead of having OPT completely
overwritten by me.

>
> Note that CFLAGS defines -O2 on many platforms.

So then wouldn't that mean they want that to be the optimization
level? Or does that default exist historically merely so that *some*
default exists, with applications expected to override it as desired?

>
> In your particular case, you should try setting OPT to
> "... -Wno-unused-value ..." (ie. replace -Wall with your
> setting).

So what is CFLAGS for then? ``configure -h`` says it's for "C compiler
flags"; that's extremely ambiguous. And it doesn't help that OPT is
not mentioned by ``configure -h`` as that is what I have always gone
by to know what flags are available for compilation.

-Brett



Re: [Python-Dev] bytes / unicode

2010-06-26 Thread Nick Coghlan
On Sun, Jun 27, 2010 at 4:17 AM, P.J. Eby  wrote:
> The idea that I'm proposing is that the basic string and byte types should
> defer to "user-defined" string types for mixed type operations, so that
> polymorphism of string-manipulation functions is the *default* case, rather
> than a *special* case.  This makes tainting easier to implement, as well as
> optimizing and other special cases (like my "source string w/file and line
> info", or a string with font/formatting attributes).

Rather than building this into the base string type, perhaps it would
be better (at least initially) to add in a polymorphic str subtype
that worked along the following lines:

1. Has an encoded argument in the constructor (e.g. poly_str("/", encoded=b"/"))
2. If given objects with an encode() method, assumes they're strings
and uses its own parent class methods
3. If given objects with a decode() method, assumes they're encoded
and delegates to the encoded attribute

str/bytes agnostic functions would need to invoke poly_str
deliberately, while bytes-only and text-only algorithms could just use
the appropriate literals.
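A rough sketch of what such a subtype could look like (hypothetical — only join() is shown, and real support would need the same treatment for split(), formatting, etc.):

```python
# poly_str sketch: a str subclass that carries its encoded form and
# dispatches on whether the arguments look like text (have encode())
# or look encoded (have decode()).
class poly_str(str):
    def __new__(cls, text, encoded=None):
        self = super().__new__(cls, text)
        self.encoded = text.encode('ascii') if encoded is None else encoded
        return self

    def join(self, iterable):
        items = list(iterable)
        if items and hasattr(items[0], 'decode'):
            # encoded input: delegate to the stored bytes separator
            return self.encoded.join(items)
        # text input: use the inherited str behaviour
        return str.join(self, items)

sep = poly_str("/", encoded=b"/")
print(sep.join(["usr", "lib"]))    # -> usr/lib
print(sep.join([b"usr", b"lib"]))  # -> b'usr/lib'
```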

Third party types would be supported to some degree (by having either
encode or decode methods), although they could still run into trouble
with some operations. (While full support for third party string and
byte sequence implementations is an interesting idea, I think it's
overkill for the specific problem of making it easier to write
str/bytes agnostic functions for tasks like URL parsing.)

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] thoughts on the bytes/string discussion

2010-06-26 Thread Nick Coghlan
On Sun, Jun 27, 2010 at 8:11 AM, Terry Reedy  wrote:
> I can imagine that inter-operation, when appropriate, might work better with
> addition of a couple of  missing __rxxx__ methods, such as the mentioned
> __rcontains__. Although adding such would affect the implementation of a
> core syntax feature, it would not affect syntax as such as seen by the user.

The problem with strings isn't really the binary operations like
__contains__ - adding __rcontains__ would be a fairly simple
extrapolation of the existing approaches.

Where it gets really messy for strings is the fact that whereas
invoking named methods directly on numbers is rare, invoking them on
strings is very common, and some of those methods (e.g. split(),
join(), __mod__()) allow or require an iterable rather than a single
object. This extends the range of use cases to be covered beyond those
with syntactic support to potentially include all string methods that
take arguments. Creating minimally surprising semantics for the
methods which accept iterables is also rather challenging.
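For instance, join() must commit to a single result type for the whole iterable, so a mixed argument has no obviously right answer — Python 3 simply refuses it:

```python
# Each flavour of join() accepts only its own kind; mixing raises.
assert "/".join(["a", "b"]) == "a/b"
assert b"/".join([b"a", b"b"]) == b"a/b"

try:
    "/".join([b"a", "b"])   # mixed iterable: no sensible result type
except TypeError as exc:
    print("mixing str and bytes fails:", exc)
```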

It's an interesting idea, but I think it's overkill for the specific
problem of making it easier to perform more text-like manipulations in
a bytes-only domain.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] bytes / unicode

2010-06-26 Thread P.J. Eby

At 12:43 PM 6/27/2010 +1000, Nick Coghlan wrote:

While full support for third party strings and
byte sequence implementations is an interesting idea, I think it's
overkill for the specific problem of making it easier to write
str/bytes agnostic functions for tasks like URL parsing.


OTOH, writing your partial implementation is almost as complex - it 
still must take into account joining and formatting, and by that 
point you've just proposed a new protocol for coercion...  so why 
not make the coercion protocol explicit in the first place, 
rather than hardwiring a third type's worth of special cases?


Remember, bytes and strings already have to detect mixed-type 
operations.  If there was an API for that, then the hardcoded special 
cases would just be replaced, or supplemented with type slot checks 
and calls after the special cases.


To put it another way, if you already have two types special-casing 
their interactions with each other, then rather than add a *third* 
type to that mix, maybe it's time to have a protocol instead, so that 
the types that care can do the special-casing themselves, and you 
generalize to N user types.
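A small sketch of what such cooperation already looks like today via the reflected-method protocol, using the tainting example mentioned above (the Tainted class is invented for illustration):

```python
# A hypothetical tainted-string type: because Tainted is a str subclass
# that overrides __radd__, Python tries Tainted.__radd__ *before*
# str.__add__, so concatenation with a plain str preserves the taint.
class Tainted(str):
    def __add__(self, other):
        if isinstance(other, str):
            return Tainted(str.__add__(self, other))
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, str):
            return Tainted(str.__add__(other, self))
        return NotImplemented

result = "name=" + Tainted("bobby'; DROP TABLE--")
print(type(result).__name__)  # -> Tainted
```

A general coercion protocol would extend this kind of deference beyond the operators to named methods like split() and join().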


(Btw, those who are saying that the resulting potential for N*N 
interaction makes the feature unworkable seem to be overlooking 
metaclasses and custom numeric types -- two Python features that in 
principle have the exact same problem, when you use them beyond a 
certain scope.  At least with those features, though, you can 
generally mix your user-defined metaclasses or numeric types with the 
Python-supplied basic ones and call arbitrary Python functions on 
them, without as much heartbreak as you'll get with a from-scratch 
stringlike object.)


All that having been said, a new protocol probably falls under the 
heading of the language moratorium, unless it can be considered "new 
methods on builtins"?  (But that seems like a stretch even to me.)


I just hate the idea that functions taking strings should have to be 
*rewritten* to be explicitly type-agnostic.  It seems *so* 
un-Pythonic...  like if all the bitmasking functions you'd ever 
written using 32-bit int constants had to be rewritten just because 
we added longs to the language, and you had to upcast them to be 
compatible or something.  Sounds too much like C or Java or some 
other non-Python language, where dynamism and polymorphy are the 
special case, instead of the general rule. 




Re: [Python-Dev] what environment variable should contain compiler warning suppression flags?

2010-06-26 Thread Jeffrey Yasskin
On Sat, Jun 26, 2010 at 4:37 PM, M.-A. Lemburg  wrote:
> Brett Cannon wrote:
>> On Wed, Jun 23, 2010 at 14:53, Brett Cannon  wrote:
>>> I finally realized why clang has not been silencing its warnings about
>>> unused return values: I have -Wno-unused-value set in CFLAGS which
>>> comes before OPT (which defines -Wall) as set in PY_CFLAGS in
>>> Makefile.pre.in.
>>>
>>> I could obviously set OPT in my environment, but that would override
>>> the default OPT settings Python uses. I could put it in EXTRA_CFLAGS,
>>> but the README says that's for stuff that tweak binary compatibility.
>>>
>>> So basically what I am asking is what environment variable should I
>>> use? If CFLAGS is correct then does anyone have any issues if I change
>>> the order of things for PY_CFLAGS in the Makefile so that CFLAGS comes
>>> after OPT?
>>>
>>
>> Since no one objected I swapped the order in r82259. In case anyone
>> else uses clang to compile Python, this means that -Wno-unused-value
>> will now work to silence the warning about unused return values that
>> is caused by some macros. Probably using -Wno-empty-body is also good
>> to avoid all the warnings triggered by the UCS4 macros in cjkcodecs.
>
> I think you need to come up with a different solution and revert
> the change...
>
> OPT has historically been the only variable to use for
> adjusting the Python C compiler settings.
>
> As the name implies this was usually used to adjust the
> optimizer settings, including raising the optimization level
> from the default or disabling it.
>
> With your change CFLAGS will always override OPT and thus
> any optimization definitions made in OPT will no longer
> have an effect.
>
> Note that CFLAGS defines -O2 on many platforms.
>
> In your particular case, you should try setting OPT to
> "... -Wno-unused-value ..." (ie. replace -Wall with your
> setting).

The Python configure environment variables are really confused. If OPT
is intended to be user-overridden for optimization settings, it
shouldn't be used to set -Wall and -Wstrict-prototypes. If it's
intended to set warning options, it shouldn't also set optimization
options. Setting the user-visible customization option on the
configure command line shouldn't stomp unrelated defaults.

In configure-based systems, CFLAGS is traditionally
(http://sources.redhat.com/automake/automake.html#Flag-Variables-Ordering)
the way to tack options onto the end of the command line. Python
breaks this by threading flags through CFLAGS in the makefile, which
means they all get stomped if the user sets CFLAGS on the make command
line. We should instead use another spelling ("CFlags"?) for the
internal variable, and append $(CFLAGS) to it.
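
The ordering dispute boils down to the fact that gcc and clang resolve repeated options last-one-wins, so whichever of OPT and CFLAGS comes later in PY_CFLAGS silently overrides the other. A toy model (plain Python, illustrative names only — this is not Python's actual Makefile machinery):

```python
# Toy model of gcc/clang flag resolution: for most options the last
# occurrence on the command line wins. OPT/CFLAGS values are examples.

def last_flag(flags, prefix):
    """Return the last flag starting with `prefix` (last-one-wins)."""
    winner = None
    for flag in flags.split():
        if flag.startswith(prefix):
            winner = flag
    return winner

OPT = "-g -O3 -Wall"    # a user-tuned optimizer setting
CFLAGS = "-g -O2"       # autoconf's default for gcc

# Original ordering (CFLAGS before OPT): the OPT setting wins.
assert last_flag(CFLAGS + " " + OPT, "-O") == "-O3"

# Swapped ordering (OPT before CFLAGS): CFLAGS's default -O2 silently
# overrides the OPT setting, which is the objection raised above.
assert last_flag(OPT + " " + CFLAGS, "-O") == "-O2"
```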

AC_PROG_CC is the macro that sets CFLAGS to -g -O2 on gcc-based
systems 
(http://www.gnu.org/software/hello/manual/autoconf/C-Compiler.html#index-AC_005fPROG_005fCC-842).
If Python's configure.in sets an otherwise-empty CFLAGS to -g before
calling AC_PROG_CC, AC_PROG_CC won't change it. Or we could just
preserve the user's CFLAGS setting across AC_PROG_CC regardless of
whether it's set, to let the user set CFLAGS on the configure line
without stomping any defaults.


Re: [Python-Dev] bytes / unicode

2010-06-26 Thread Nick Coghlan
On Sun, Jun 27, 2010 at 1:49 PM, P.J. Eby  wrote:
> I just hate the idea that functions taking strings should have to be
> *rewritten* to be explicitly type-agnostic.  It seems *so* un-Pythonic...
>  like if all the bitmasking functions you'd ever written using 32-bit int
> constants had to be rewritten just because we added longs to the language,
> and you had to upcast them to be compatible or something.  Sounds too much
> like C or Java or some other non-Python language, where dynamism and
> polymorphy are the special case, instead of the general rule.

The difference is that we have three classes of algorithm here:
- those that work only on octet sequences
- those that work only on character sequences
- those that can work on either

Python 2 lumped all 3 classes of algorithm together through the
multi-purpose 8-bit str type. The unicode type provided some scope to
separate out the second category, but the divisions were rather
blurry.

Python 3 forces the first two to be separated by using either octets
(bytes/bytearray) or characters (str). There are a *very small* number
of APIs where it is appropriate to be polymorphic, but this is
currently difficult due to the need to supply literals of the
appropriate type for the objects being operated on.

This isn't ever going to happen automagically due to the need to
explicitly provide two literals (one for octet sequences, one for
character sequences).
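
To make the literals problem concrete, here is a sketch of the manual type-agnostic style currently required — the function must pick a literal matching the argument's type at run time (hypothetical helper, not stdlib code):

```python
# Manual bytes/str polymorphism: choose the separator literal to match
# the input's type, since b'?' and '?' cannot be mixed in Python 3.

def strip_query(url):
    """Return `url` without its query string, for bytes or str input."""
    sep = b'?' if isinstance(url, bytes) else '?'
    head, _, _ = url.partition(sep)
    return head

assert strip_query('http://example.com/x?a=1') == 'http://example.com/x'
assert strip_query(b'http://example.com/x?a=1') == b'http://example.com/x'
```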

The virtues of a separate poly_str type are that:
1. It can be simple and implemented in Python, dispatching to str or
bytes as appropriate (probably in the strings module)
2. No chance of impacting the performance of the core interpreter (as
builtins are not affected)
3. Lower impact if it turns out to have been a bad idea
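
A minimal sketch of what point 1 might look like — entirely hypothetical, nothing like this exists in the stdlib — is a wrapper whose literal adapts to the type being operated on:

```python
# Hypothetical poly_str sketch: a wrapped text literal that
# materializes as str or bytes to match the object it is used with.

class poly_str:
    def __init__(self, text, encoding='ascii'):
        self._text = text
        self._encoding = encoding

    def like(self, obj):
        """Return this literal with the same type as `obj`."""
        if isinstance(obj, (bytes, bytearray)):
            return self._text.encode(self._encoding)
        return self._text

SLASH = poly_str('/')

def dirname(path):
    # Works on octet and character sequences without duplicate literals.
    head, _, _ = path.rpartition(SLASH.like(path))
    return head

assert dirname('a/b/c') == 'a/b'
assert dirname(b'a/b/c') == b'a/b'
```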

We could talk about this even longer, but the most effective way
forward is going to be a patch that improves the URL parsing
situation.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia