[Python-Dev] AIX 5.3 - Enabling Shared Library Support Vs Extensions
All,

When I configure Python to enable shared libraries, none of the extensions are getting built during the make step due to this error:

building 'cStringIO' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. -IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o
./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o -L/usr/local/lib -lpython2.6 -o build/lib.aix-5.3-2.6/cStringIO.so
collect2: library libpython2.6 not found

building 'cPickle' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I. -IInclude -I./Include -I/opt/freeware/include -I/opt/freeware/include/readline -I/opt/freeware/include/ncurses -I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include -I/u01/home/apli/wm/GDD/Python-2.6.6 -c /u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.c -o build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o
./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o -L/usr/local/lib -lpython2.6 -o build/lib.aix-5.3-2.6/cPickle.so
collect2: library libpython2.6 not found

This is on AIX 5.3, GCC 4.2, Python 2.6.6. I can confirm that there is a libpython2.6.a file in the top-level directory from which I am doing the configure/make.

Here are the options supplied to the configure command:

./configure --enable-shared --disable-ipv6 --with-gcc=gcc CPPFLAGS="-I /opt/freeware/include -I /opt/freeware/include/readline -I /opt/freeware/include/ncurses"

Please guide me in getting past this error. Thanks for your help on this.

Regards,
Anurag
Re: [Python-Dev] constant/enum type in stdlib
So the following code defines constants with associated names that get
put in the repr.
I'm still a Python newbie in some areas, particularly classes and
metaclasses, maybe more.
But this Python 3 code seems to create constants with names ... works
for int and str at least.
Special case for int defines a special __or__ operator to OR both the
values and the names, which some might like.
Dunno why it doesn't work for dict, and it is too late to research that
today. That's the last test case in the code below, so you can see how
it works for int and string before it bombs.
There's some obvious cleanup work to be done, and it would be nice to
make the names actually be constant... but they do lose their .name if
you ignorantly assign the base type, so at least it is hard to change
the value and keep the associated .name that gets reported by repr,
which might reduce some confusion at debug time.
An idea I had, but have no idea how to implement, is that it might be
nice to say:
with imported_constants_from_module:
    do_stuff
where do_stuff could reference the constants without qualifying them by
module. Of course, if you knew it was just a module of constants, you
could "import * from module" :) But the idea of with is that they'd go
away at the end of that scope.
Some techniques here came from Raymond's namedtuple code.
def constant( name, val ):
    typ = str( type( val ))
    if typ.startswith("
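(A minimal sketch of the idea Glenn describes, using a subclass instead of his exec-based approach; the names here are illustrative, not his actual implementation.)

class NamedInt(int):
    """An int constant that remembers a symbolic name and shows it in repr()."""
    def __new__(cls, value, name):
        self = super().__new__(cls, value)
        self.name = name
        return self
    def __repr__(self):
        return self.name
    def __or__(self, other):
        # OR the values and join the names, as described above.
        name = '%s|%s' % (self.name, getattr(other, 'name', other))
        return NamedInt(int(self) | int(other), name)

O_RANDOM = NamedInt(16, 'O_RANDOM')
O_SEQUENTIAL = NamedInt(32, 'O_SEQUENTIAL')
print(repr(O_RANDOM | O_SEQUENTIAL))   # O_RANDOM|O_SEQUENTIAL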
Re: [Python-Dev] len(chr(i)) = 2?
Terry Reedy wrote:
> On 11/24/2010 3:06 PM, Alexander Belopolsky wrote:
>> Any non-trivial text processing is likely to be broken in presence of
>> surrogates. Producing them on input is just trading a known issue for
>> an unknown one. Processing surrogate pairs in python code is hard.
>> Software that has to support non-BMP characters will most likely be
>> written for a wide build and contain subtle bugs when run under a
>> narrow build. Note that my latest proposal does not abolish
>> surrogates outright. Users who want them can still use something like
>> the "surrogateescape" error handler for non-BMP characters.
>
> It seems to me that what you are asking for is an alternate, optional,
> utf-8-bmp codec that would raise an error, in either direction, for
> non-bmp chars. Then, as you suggest, if one is not prepared for
> surrogates, they are not allowed.

That would be a possibility as well... but I doubt that many users are
going to bother, since slicing surrogates is just as bad as slicing
combining code points, and the latter are much more common in real life
and they do happen to mostly live in the BMP.

--
Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Source (#1, Nov 25 2010)
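(For concreteness, the comparison being drawn, as it plays out on a narrow build of that era; the snippet is illustrative only.)

s = '\U00010000'    # one non-BMP character: stored as a surrogate pair on a narrow build
t = 'e\u0301'       # 'e' followed by a combining acute accent (all BMP)
print(len(s), len(t))               # 2 2 on a narrow build
print(ascii(s[:1]), ascii(t[:1]))   # both slices yield broken text: a lone surrogate / a bare 'e'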
Re: [Python-Dev] len(chr(i)) = 2?
Alexander Belopolsky wrote:
> On Wed, Nov 24, 2010 at 9:17 PM, Stephen J. Turnbull wrote:
> ..
>>> I note that an opinion has been raised on this thread that
>>> if we want compressed internal representation for strings, we should
>>> use UTF-8. I tend to agree, but UTF-8 has been repeatedly rejected as
>>> too hard to implement. What makes UTF-16 easier than UTF-8? Only the
>>> fact that you can ignore bugs longer, in my view.
>>
>> That's mostly true. My guess is that we can probably ignore those
>> bugs for as long as it takes someone to write the higher-level
>> libraries that James suggests and MAL has actually proposed and
>> started a PEP for.
>
> As far as I can tell, that PEP generated a grand total of one comment in
> nine years. This may or may not be indicative of how far away we are
> from seeing it implemented. :-)

At the time it was too early for people to start thinking about these
issues. Actual use of Unicode really only started a few years ago.

Since I didn't have a need for such an indexing module myself (and
didn't have much time to work on it anyway), I punted on the idea. If
someone else wants to pick up the idea, I'd gladly help out with the
details.

> As far as the UTF-8 vs. UCS-2/4 debate goes, I have an idea that may be
> even more far fetched. Once upon a time, Python Unicode strings
> supported the buffer protocol and would lazily fill an internal buffer
> with bytes in the default encoding. In 3.x the default encoding has
> been fixed as UTF-8, buffer protocol support was removed from strings,
> but the internal buffer caching the (now UTF-8) encoded representation
> remained. Maybe we can now implement the defenc logic in reverse.
> Recall that strings are stored as UCS-2/4 sequences, but once a buffer
> is requested in 2.x Python code or a char* is obtained via
> _PyUnicode_AsStringAndSize() at the C level in 3.x, an internal buffer
> is filled with UTF-8 bytes and defenc is set to point to that buffer.

The original idea was for that buffer to go away once we moved to
Unicode for strings. Reality has shown that we still need to stick with
the buffer, though, since the UTF-8 representation of Unicode objects
is used a lot.

> So the idea is for strings to store their data as a UTF-8 buffer
> pointed to by defenc upon construction. If an application uses string
> indexing, UTF-8 only strings will lazily fill their UCS-2/4 buffer.
> Proper, Unicode-aware algorithms such as grapheme, word or line
> iteration or simple operations such as concatenation, search or
> substitution would operate directly on defenc buffers. Presumably
> over time fewer and fewer applications would use code unit indexing
> that requires the UCS-2/4 buffer, and eventually Python strings can
> stop supporting indexing altogether, just like they stopped supporting
> the buffer protocol in 3.x.

I don't follow you: how would UTF-8, which has even more issues with
variable length representation of code points, make something easier
compared to UTF-16, which has far fewer such issues and then only for
non-BMP code points?

Please note that we can only provide one way of string indexing in
Python using the standard s[1] notation, and since we don't want that
operation to be slower than O(1), using the code units as items is the
only reasonable way to implement it.

With an indexing module, we could then let applications work based on
higher level indexing schemes such as complete code points (skipping
surrogates), combined code points, graphemes (ignoring e.g. most
control code points and zero width code points), words (with some
customizations as to where to break words, which will likely have to be
language dependent), lines (which can be complicated for scripts that
use columns instead ;-)), paragraphs, etc.

It would also help to add transparent indexing for right-to-left
scripts and text that uses both left-to-right and right-to-left text
(BIDI). However, in order for these indexing methods to actually work,
they will need to return references to the code units, so we cannot
just drop that access method.

* Back on the surrogates topic:

In any case, I think this discussion is losing its grip on reality. By
far, most strings you find in actual applications don't use surrogates
at all, so the problem is being exaggerated.

If you need to be careful about surrogates for some reason, I think a
single new method .hassurrogates() on string objects would go a long
way in making detection and adding special-casing for these a lot
easier.

If adding support for surrogates doesn't make sense (e.g. in the case
of the formatting methods), then we simply punt on that and leave such
handling to other tools.

* Regarding preventing surrogates from entering the Python runtime:

It is by far more important to maintain round-trip safety for Unicode
data than getting every bit of code to work correctly with surrogates
(often, there won't be a single correct way). With a new method for
fast detection of surrogates, we c
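(A rough illustration of the kind of higher-level indexing helper being discussed here: grouping a base code point with its trailing combining marks. Real grapheme segmentation per UAX #29 is considerably more involved, and the function name is made up.)

import unicodedata

def iter_graphemes(s):
    cluster = ''
    for ch in s:
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ''
        cluster += ch
    if cluster:
        yield cluster

print(list(iter_graphemes('cafe\u0301')))   # ['c', 'a', 'f', 'e\u0301']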
Re: [Python-Dev] constant/enum type in stdlib
On Thu, Nov 25, 2010 at 11:34 AM, Glenn Linderman wrote:
> So the following code defines constants with associated names that get put
> in the repr.

The code you gave doesn't work if the constant() function is moved into a
separate module from the code that calls it. The globals() function, as I
understand it, gives you access to the global namespace *of the current
module*, so the constants end up being defined in the module containing
constant(), not the module you're calling it from. You could get around
this by passing the globals of the calling module to constant(), but I
think it's cleaner to use a class to provide a distinct namespace for the
constants.

> An idea I had, but have no idea how to implement, is that it might be nice
> to say:
>
> with imported_constants_from_module:
>     do_stuff
>
> where do_stuff could reference the constants without qualifying them by
> module. Of course, if you knew it was just a module of constants, you could
> "import * from module" :) But the idea of with is that they'd go away at
> the end of that scope.

I don't think this is possible - the context manager protocol doesn't
allow you to modify the namespace of the caller like that. Also, a with
statement does not have its own namespace; any names defined inside its
body will continue to be visible in the containing scope.

Of course, if you want to achieve something similar (at function scope),
you could say:

def foo(bar, baz):
    from module import *
    ...
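(A minimal sketch of the class-as-namespace suggestion; the constant names echo Glenn's examples and are illustrative only.)

class Flags:
    """A class used purely as a namespace for related constants."""
    O_RANDOM = 16
    O_SEQUENTIAL = 32
    O_STRING = "string"

print(Flags.O_RANDOM | Flags.O_SEQUENTIAL)   # 48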
Re: [Python-Dev] constant/enum type in stdlib
On 25/11/2010 10:12, Nadeem Vawda wrote:
> On Thu, Nov 25, 2010 at 11:34 AM, Glenn Linderman wrote:
>> So the following code defines constants with associated names that get put
>> in the repr.
>
> The code you gave doesn't work if the constant() function is moved
> into a separate module from the code that calls it. The globals()
> function, as I understand it, gives you access to the global namespace
> *of the current module*, so the constants end up being defined in the
> module containing constant(), not the module you're calling it from.
> You could get around this by passing the globals of the calling module
> to constant(), but I think it's cleaner to use a class to provide a
> distinct namespace for the constants.
>
>> An idea I had, but have no idea how to implement, is that it might be nice
>> to say:
>>
>> with imported_constants_from_module:
>>     do_stuff
>>
>> where do_stuff could reference the constants without qualifying them by
>> module. Of course, if you knew it was just a module of constants, you could
>> "import * from module" :) But the idea of with is that they'd go away at
>> the end of that scope.
>
> I don't think this is possible - the context manager protocol doesn't
> allow you to modify the namespace of the caller like that. Also, a
> with statement does not have its own namespace; any names defined
> inside its body will continue to be visible in the containing scope.
>
> Of course, if you want to achieve something similar (at function
> scope), you could say:
>
> def foo(bar, baz):
>     from module import *
>     ...
Not in Python 3 you can't. :-)
That's invalid syntax, import * can only be used at module level. This
makes *testing* import * (i.e. testing your __all__) annoying - you have
to exec('from module import *') instead.
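(Roughly what such a test ends up looking like; 'mymodule' is a stand-in name.)

import mymodule                        # hypothetical module under test
ns = {}
exec('from mymodule import *', ns)
exported = set(ns) - {'__builtins__'}
assert exported == set(mymodule.__all__)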
Michael
--
http://www.voidspace.org.uk/
READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies ("BOGUS AGREEMENTS") that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.
Re: [Python-Dev] constant/enum type in stdlib
On 25/11/2010 09:34, Glenn Linderman wrote:
> So the following code defines constants with associated names that get
> put in the repr.
> I'm still a Python newbie in some areas, particularly classes and
> metaclasses, maybe more.
> But this Python 3 code seems to create constants with names ... works
> for int and str at least.
> Special case for int defines a special __or__ operator to OR both the
> values and the names, which some might like.
> Dunno why it doesn't work for dict, and it is too late to research
> that today. That's the last test case in the code below, so you can
> see how it works for int and string before it bombs.
> There's some obvious cleanup work to be done, and it would be nice to
> make the names actually be constant... but they do lose their .name if
> you ignorantly assign the base type, so at least it is hard to change
> the value and keep the associated .name that gets reported by repr,
> which might reduce some confusion at debug time.
>
> An idea I had, but have no idea how to implement, is that it might be
> nice to say:
>
> with imported_constants_from_module:
>     do_stuff
>
> where do_stuff could reference the constants without qualifying them
> by module. Of course, if you knew it was just a module of constants,
> you could "import * from module" :) But the idea of with is that
> they'd go away at the end of that scope.
>
> Some techniques here came from Raymond's namedtuple code.
>
> def constant( name, val ):
>     typ = str( type( val ))
>     if typ.startswith("
Not quite correct. If you OR a value with itself, you should get back
just the value, not something with "name|name" as the repr.
We can hold off on implementations until we have general agreement that
some kind of named constant *should* be added, and what the feature set
should look like.
All the best,
Michael
> ev += '''
> %s = constant_%s( %s, '%s' )
> '''
> ev = ev % ( typ, typ, typ, name, typ, repr( val ), name )
> print( ev )
> exec( ev, globals())
>
> constant('O_RANDOM', val=16 )
> constant('O_SEQUENTIAL', val=32 )
> constant("O_STRING", val="string")
>
> def foo( x ):
>     print( str( x ))
>     print( repr( x ))
>     print( type( x ))
>
> foo( O_RANDOM )
> foo( O_SEQUENTIAL )
> foo( O_STRING )
>
> zz = O_RANDOM | O_SEQUENTIAL
> foo( zz )
>
> y = {'ab': 2, 'yz': 3 }
> constant('O_DICT', y )
--
http://www.voidspace.org.uk/
Re: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py
> Author: senthil.kumaran
> New Revision: 86748
>
> Log:
> Experimental - Transparent gzip Encoding in urllib2. There should be a good
> way to deal with Content-Length.

Cool feature! But...

> Modified:
>    python/branches/py3k-urllib/Lib/http/client.py
>    python/branches/py3k-urllib/Lib/urllib/request.py

No tests? Misc/NEWS? :)

Regards
Re: [Python-Dev] constant/enum type in stdlib
On 25/11/2010 03:46, Greg Ewing wrote:
> On 25/11/10 12:38, average wrote:
>> Is immutability a general need that should have general solution?

Yes, I have sometimes thought this. Might be nice to have a "mutable"
attribute that could be read and could be changed from True to False,
though presumably not vice versa.

> I don't think it really generalizes. Tuples are not just frozen
> lists, for example -- they have a different internal structure
> that's more efficient to create and access.
But couldn't they be presented to the Python programmer as a single
type, with the implementation details hidden "under the hood"?
So
MyList.__mutable__ = False
would have the same effect as the present
MyList = tuple(MyList)
This would simplify some code that copes with either list(s) or tuple(s)
as input data.
One would need syntax for (im)mutable literals, e.g.
[]i # immutable list (really a tuple). Bit of a shame that
"i[]" doesn't work.
or
[]f # frozen list (same thing)
[] # mutable list (same as now)
[]m # alternative syntax for mutable list
This would reduce the overloading on parentheses and avoid having to
write a tuple of one item as (t,) which often trips up newbies. It would
also avoid one FAQ: Why does Python have separate list and tuple types?
Also the syntax could be extended, e.g.
{a,b,c}f # frozen set with 3 objects
{p:x,q:y}f # frozen dictionary with 2 items
{:}f, {}f # (re the thread on set literals) frozen empty
dictionary and frozen empty set!
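(For comparison, the closest existing spellings of the proposed frozen literals; there is no built-in frozen dict, so that last case is only an approximation.)

frozen_list = (1, 2, 3)              # what "[1, 2, 3]f" would mean
frozen_set = frozenset({1, 2, 3})    # what "{1, 2, 3}f" would mean
frozen_empty_set = frozenset()       # what "{}f" would mean
# No frozen dict literal exists today; a read-only wrapper class is the usual workaround.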
Just some thoughts for Python 4.
Best wishes
Rob Cliffe
Re: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py
On 25.11.2010 12:47, Éric Araujo wrote:
>> Author: senthil.kumaran
>> New Revision: 86748
>>
>> Log:
>> Experimental - Transparent gzip Encoding in urllib2. There should be a good
>> way to deal with Content-Length.
>
> Cool feature! But...
>
>> Modified:
>>    python/branches/py3k-urllib/Lib/http/client.py
>>    python/branches/py3k-urllib/Lib/urllib/request.py
>
> No tests? Misc/NEWS? :)

Note that this is work in a separate branch.

Georg
[Python-Dev] python3k : imp.find_module raises SyntaxError
hello,
working on Pylint, we have a lot of deliberately corrupted files to test
Pylint behavior; for instance
$ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
# -*- coding: IBO-8859-1 -*-
""" check correct unknown encoding declaration
"""
__revision__ = ''
and we try to find that module:
find_module('func_unknown_encoding', None). But python3 raises SyntaxError
in that case; it didn't raise SyntaxError on python2, nor does it on our
func_nonascii_noencoding and func_wrong_encoding modules (with obvious
names)
Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from imp import find_module
>>> find_module('func_unknown_encoding', None)
Traceback (most recent call last):
File "", line 1, in
SyntaxError: encoding problem: with BOM
>>> find_module('func_wrong_encoding', None)
(<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py',
('.py', 'U', 1))
>>> find_module('func_nonascii_noencoding', None)
(<_io.TextIOWrapper name=6 encoding='utf-8'>,
'func_nonascii_noencoding.py', ('.py', 'U', 1))
So what is the reason for this selective behavior?
Furthermore, there is no BOM in our func_unknown_encoding.py module.
--
Emile Anclin
http://www.logilab.fr/ http://www.logilab.org/
Scientific computing & knowledge management
Re: [Python-Dev] python3k : imp.find_module raises SyntaxError
On 11/25/2010 08:30 AM, Emile Anclin wrote:
> hello,
>
> working on Pylint, we have a lot of deliberately corrupted files to test
> Pylint behavior; for instance
>
> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
> # -*- coding: IBO-8859-1 -*-
> """ check correct unknown encoding declaration
> """
> __revision__ = ''
>
> and we try to find that module:
> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
> in that case; it didn't raise SyntaxError on python2, nor does it on our
> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
> names)
>
> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
> [GCC 4.3.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from imp import find_module
> >>> find_module('func_unknown_encoding', None)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> SyntaxError: encoding problem: with BOM
> >>> find_module('func_wrong_encoding', None)
> (<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py',
> ('.py', 'U', 1))
> >>> find_module('func_nonascii_noencoding', None)
> (<_io.TextIOWrapper name=6 encoding='utf-8'>,
> 'func_nonascii_noencoding.py', ('.py', 'U', 1))
>
> So what is the reason for this selective behavior?
> Furthermore, there is no BOM in our func_unknown_encoding.py module.
I don't think there is a clear reason by design. Also try importing the
same modules directly and noting the differences in the errors you get.
For example, here is the problem that brought this to my attention in python3.2.
>>> find_module('test/badsyntax_pep3120')
Segmentation fault
>>> from test import badsyntax_pep3120
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python3.2/test/badsyntax_pep3120.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xf6' in file
/usr/local/lib/python3.2/test/badsyntax_pep3120.py on line 1, but no
encoding declared; see http://python.org/dev/peps/pep-0263/ for details
The import statement uses parser.c, and tokenizer.c indirectly, to import a
file, but the imp module uses tokenizer.c directly. They aren't consistent
in how they handle errors because the different error messages are
generated in different places depending on what the error is, *and* what
the code path to get to that point was, *and* whether or not a filename was
set. For the example above with imp.find_module(), the filename isn't set,
so you get a different error than if you used import, which uses the parser
module and that does set the filename.
From what I've seen, it would help if the imp module was rewritten to use
parser.c like the import statement does, rather than tokenizer.c directly.
The error handling in parser.c is much better than tokenizer.c. Possibly
tokenizer.c could be cleaned up after that and be made much simpler.
Ron Adam
Re: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py
>>> Modified:
>>>    python/branches/py3k-urllib/Lib/http/client.py
>>>    python/branches/py3k-urllib/Lib/urllib/request.py
>> No tests? Misc/NEWS? :)
>
> Note that this is work in a separate branch.

Ah, didn’t notice that! Senthil replied as much in private email:

> That was in a different branch. Once stable shall definitely include
> the tests and news.

unconsciously-ignoring-svn-branches-to-preserve-sanity-ly yours, Éric
Re: [Python-Dev] len(chr(i)) = 2?
On Friday 19 November 2010 23:25:03 you wrote:
>> Python is unclear about non-BMP characters: narrow build was called
>> "ucs2" for long time, even if it is UTF-16 (each character is encoded to
>> one or two UTF-16 words).
>
> No, no, no :-)
>
> UCS2 and UCS4 are more appropriate than "narrow" and "wide" or even
> "UTF-16" and "UTF-32".

Ok for Python 2:

$ ./python
Python 2.7.0+ (release27-maint:84618M, Sep 8 2010, 12:43:49)
>>> import sys; sys.maxunicode
65535
>>> x=u'\U0010FFFF'; len(x)
2
>>> ord(x)
...
TypeError: ord() expected a character, but string of length 2 found

But Python 3 does use UTF-16 for narrow builds:

$ ./python
Python 3.2a3+ (py3k:86396:86399M, Nov 10 2010, 15:24:09)
>>> import sys; sys.maxunicode
65535
>>> c=chr(0x10FFFF); len(c)
2
>>> ord(c)
1114111

Victor
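(A small helper along these lines can at least count code points on a narrow build by skipping high surrogates; purely illustrative.)

import sys

def codepoint_len(s):
    if sys.maxunicode > 0xFFFF:
        return len(s)           # wide build: len() already counts code points
    return len(s) - sum(1 for ch in s if 0xD800 <= ord(ch) <= 0xDBFF)

# On a narrow build: len(chr(0x10FFFF)) == 2, but codepoint_len(chr(0x10FFFF)) == 1.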
Re: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
Hello,

> Author: senthil.kumaran
> Log:
> Mouse support and colour to Demo/curses/life.py by Dafydd Crosby
>
> Modified:
>    python/branches/py3k/Demo/curses/life.py

Okay, this time I’m reacting to the right branch.

> Modified: python/branches/py3k/Demo/curses/life.py
> ==
> --- python/branches/py3k/Demo/curses/life.py (original)
> +++ python/branches/py3k/Demo/curses/life.py Thu Nov 25 15:56:44 2010
> @@ -1,6 +1,7 @@
>  #!/usr/bin/env python3
>  # life.py -- A curses-based version of Conway's Game of Life.
>  # Contributed by AMK
> +# Mouse support and colour by Dafydd Crosby

Shouldn’t his name rather be in Misc/ACKS too? Modules typically (warning:
non-scientific data) include the name of the author or first contributors,
but not the name of every contributor.

I think these cool features deserve a note in Misc/NEWS too :)

Re: “colour”: the rest of the file uses US English, as do the function
names (see for example curses.has_color). It’s good to use one dialect
consistently in one file.

going-back-to-stare-at-shiny-colors-ly yours, Éric
Re: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py
On Fri, Nov 26, 2010 at 02:32:43AM +0100, Éric Araujo wrote:
> Shouldn’t his name rather be in Misc/ACKS too? Modules typically
> (warning: non-scientific data) include the name of the author or first
> contributors but not the name of every contributor.
>
> I think these cool features deserve a note in Misc/NEWS too :)
I don't think it is required. Demo stuff is usually just fun
demonstrations. The contributor had added his name to the patch
header, and I just left it like that. It's fine.
For features and important patches (subjective), Misc/{ACKS,NEWS} are
both added.
> Re: “colour”: the rest of the file use US English, as do the function
> names (see for example curses.has_color). It’s good to use one dialect
> consistently in one file.
Good catch. I did not realize it because we write it as colour too.
Changing it.
Thanks,
Senthil
Re: [Python-Dev] len(chr(i)) = 2?
M.-A. Lemburg writes:

> That would be a possibility as well... but I doubt that many users
> are going to bother, since slicing surrogates is just as bad as
> slicing combining code points and the latter are much more common in
> real life and they do happen to mostly live in the BMP.

That's only if you require 100% fidelity in the data, which may not be
true in some use cases. Where 99.99% fidelity is good enough, an
unexpected sliced surrogate pair is a show-stopper, while a sliced
combining character sequence not only doesn't stop the show (at least
in Python, and I doubt any correct Unicode process can signal a fatal
error there either; I can put a tilde on a Cyrillic character if I want
to, no?), it's probably readable enough that readers will assume a
keypunch error.

Personally, if available I would always use some such dodge in server
software (I don't care enough about 24x7 availability to write it
myself, though). And never in a script for interactive use; something
needs fixing, may as well take the fatal error and fix it on the spot.
(Again, "on the spot" for me can mean "tomorrow".)
Re: [Python-Dev] len(chr(i)) = 2?
M.-A. Lemburg writes:

> Please note that we can only provide one way of string indexing
> in Python using the standard s[1] notation, and since we don't
> want that operation to be slower than O(1), using the
> code units as items is the only reasonable way to implement it.

AFAICT, the "we" that wants "no more than O(1)" does not include Glyph
Lefkowitz, James Knight, and Greg Ewing. Greg even said that in
designing a UTF-8 string type he might not provide an indexing
operation at all. (Caution: that may not be what he meant; I'm just
reporting the way I interpreted it.) Of course none of them are
proposing to change Python; that's all in the context of designing a
new language. But it does suggest that a lot of people can't think of
use cases where O(1) string indexing is more important than Unicode
robustness.

> It is by far more important to maintain round-trip safety for
> Unicode data than getting every bit of code to work correctly
> with surrogates (often, there won't be a single correct way).

But surely it's more important than that to ensure that surrogates
can't crash a Python process with unexpected UnicodeErrors?
[Python-Dev] Question about GDB bindings and 32/64 bits
I have installed GDB 7.2 32 bits, and the 32-bit buildslaves are green.
Nevertheless, the 64-bit buildslaves are failing test_gdb.

Is there any expectation that a 32-bit GDB be able to debug a 64-bit
Python? If not, the gdb test should compare "platform.architecture()"
(for the Python and the gdb in the system) and run only when they are
the same.

If this should work, I would open a bug and maybe spend some time on
it. But before thinking about investing time, I would like to know
whether this mix is actually expected to work or not. If not, I would
consider installing a 64-bit GDB too and doing some tricks (like using
a "/usr/local/bin/gdb" script wrapper to choose the 32/64 "real" gdb
version) to actually execute "test_gdb" on both buildslaves (they are
running on the same physical machine).

Any advice?

PS: I am talking about the AMD64 OpenIndiana buildbots. Haven't checked
others.

--
Jesus Cea Avion - [email protected] - http://www.jcea.es/
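(A rough sketch of the guard suggested above for test_gdb; the helper names are made up, and platform.architecture() on a foreign binary relies on the external file(1) command, so treat it as illustrative only.)

import platform
import unittest
from distutils.spawn import find_executable

def _same_word_size():
    gdb = find_executable('gdb')
    if gdb is None:
        return False
    gdb_bits = platform.architecture(executable=gdb)[0]   # e.g. '32bit' or '64bit'
    return bool(gdb_bits) and gdb_bits == platform.architecture()[0]

@unittest.skipUnless(_same_word_size(),
                     "gdb and python were built for different word sizes")
class GdbArchTests(unittest.TestCase):
    pass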
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 24, 2010, at 4:03 AM, Stephen J. Turnbull wrote:

> You end up proliferating types that all do the same kind of thing. Judicious
> use of inheritance helps, but getting the fundamental abstraction right is
> hard. Or at least, Emacs hasn't found it in 20 years of trying.

Emacs hasn't even figured out how to do general purpose iteration in 20 years
of trying either. The easiest way I've found to loop across an arbitrary pile
of 'stuff' is the CL 'loop' macro, which you're not even supposed to use. Even
then, you still have to make the arcane and pointless distinction of using
'across' or 'in' or 'on'. Python, on the other hand, has iteration pretty well
tied up nicely in a bow.

I don't know how to respond to the rest of your argument. Nothing you've said
has in any way indicated to me why having code-point offsets is a good idea,
only that people who know C and elisp would rather sling around piles of
integers than have good abstract types. For example:

> I think it more likely that markers are very expensive to create and use
> compared to integers.

What? When you do 'for x in str' in Python, you are already creating an
iterator object, which has to store the exact same amount of state that our
proposed 'marker' or 'character pointer' would have to store. The proposed
UTF-8 marker would have to do a tiny bit more work when iterating because it
would have to combine multibyte characters, but in exchange for that you get
to skip a whole ton of copying when encoding and decoding.

How is this expensive to create and use? For every application I have ever
designed, encountered, or can even conjecture about, this would be cheaper.
(Assuming not just a UTF-8 string type, but one for UTF-16 as well, where
native data is in that format already.)

For what it's worth, not wanting to use abstract types in Emacs makes sense to
me: I've written my share of elisp code, and it is hard to create reasonable
abstractions in Emacs, because the facilities for defining types and creating
polymorphic logic are so crude. It's a lot easier to just assume your
underlying storage is an array, because at the end of the day you're going to
need to call some functions on it which care whether it's an array or an alist
or a list or a vector anyway, so you might as well just say so up front. But
in Python we could just call 'mystring.by_character()' or
'mystring.by_codepoint()' and get an iterator object back and forget about all
that junk.
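(A toy version of the "marker over UTF-8 bytes" idea mentioned here, iterating code points without decoding the whole buffer first; it assumes valid UTF-8 and is not a proposed API.)

def iter_codepoints(data):
    """Yield (byte_offset, code_point) pairs straight from UTF-8 bytes."""
    i = 0
    while i < len(data):
        b = data[i]
        n = 1 if b < 0x80 else 2 if b < 0xE0 else 3 if b < 0xF0 else 4
        yield i, data[i:i + n].decode('utf-8')
        i += n

for offset, ch in iter_codepoints('naïve \U0001F600'.encode('utf-8')):
    print(offset, ch)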
Re: [Python-Dev] len(chr(i)) = 2?
On Nov 24, 2010, at 10:55 PM, Stephen J. Turnbull wrote:
> Greg Ewing writes:
>> On 24/11/10 22:03, Stephen J. Turnbull wrote:
>>> But
>>> if you actually need to remember positions, or regions, to jump to
>>> later or to communicate to other code that manipulates them, doing
>>> this stuff the straightforward way (just copying the whole iterator
>>> object to hang on to its state) becomes expensive.
>>
>> If the internal representation of a text pointer (I won't call it
>> an iterator because that means something else in Python) is a byte
>> offset or something similar, it shouldn't take up any more space
>> than a Python int, which is what you'd be using anyway if you
>> represented text positions by grapheme indexes or whatever.
>
> That's not necessarily true. Eg, in Emacs ("there you go again"),
> Lisp integers are not only immediate (saving one pointer), but the
> type is encoded in the lower bits, so that there is no need for a type
> pointer -- the representation is smaller than the opaque marker type.
> Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of
> 24 bytes on a 64-bit platform.
Yes, yes, lisp is very clever. Maybe some other runtime, like PyPy, could make
this optimization. But I don't think that anyone is filling up main memory
with gigantic piles of character indexes and needs to squeeze out that extra
couple of bytes of memory on such a tiny object. Plus, this would allow such a
user to stop copying the character data itself just to decode it, and on
mostly-ascii UTF-8 text (a common use-case) this is a 2x savings right off the
bat.
> In Python it's true that markers can use the same data structure as
> integers and simply provide different methods, and it's arguable that
> Python's design is better. But if you use bytes internally, then you
> have problems.
No, you just have design questions.
> Do you expose that byte value to the user?
Yes, but only if they ask for it. It's useful for computing things like quota
and the like.
> Can users (programmers using the language and end users) specify positions in
> terms of byte values?
Sure, why not?
> If so, what do you do if the user specifies a byte value that points into a
> multibyte character?
Go to the beginning of the multibyte character. Report that position; if the
user then asks the requested marker object for its position, it will report
that byte offset, not the originally-requested one. (Obviously, do the same
thing for surrogate pair code points.)
> What if the user wants to specify position by number of characters?
Part of the point that we are trying to make here is that nobody really cares
about that use-case. In order to know anything useful about a position in a
text, you have to have traversed to that location in the text. You can remember
interesting things like the offsets of starts of lines, or the x/y positions of
characters.
> Can you translate efficiently?
No, because there's no point :). But you _could_ implement an overlay that
cached things like the beginning of lines, or the x/y positions of interesting
characters.
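(A sketch of such an overlay: remember the byte offsets where lines start, then map any byte position back to a line number by bisection; the names are illustrative.)

import bisect

def line_starts(data):
    starts = [0]
    for i, byte in enumerate(data):
        if byte == 0x0A:          # b'\n'
            starts.append(i + 1)
    return starts

def line_of(starts, byte_pos):
    return bisect.bisect_right(starts, byte_pos) - 1

buf = 'first line\nsecond\nthird'.encode('utf-8')
starts = line_starts(buf)
print(line_of(starts, 13))        # 1, i.e. the second (zero-based) line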
> As I say elsewhere, it's possible that there really never is a need to
> efficiently specify an absolute position in a large text as a character
> (grapheme, whatever) count.
> But I think it would be hard to implement an efficient text-processing
> *language*, eg, a Python module
> for *full conformance* in handling Unicode, on top of UTF-8.
Still: why? I guess if I have some free time I'll try my hand at it, and maybe
I'll run into a wall and realize you're right :).
> Any time you have an algorithm that requires efficient access to arbitrary
> text positions, you'll spend all your skull sweat fighting the
> representation. At least, that's been my experience with Emacsen.
What sort of algorithm would that be, though? The main thing that I could
think of is a text editor trying to efficiently allow the user to scroll to the
middle of a large file without reading the whole thing into memory. But, in
that case, you could use byte-positions to estimate, and display an heuristic
number while calculating the real line numbers. (This is what 'less' does, and
it seems to work well.)
>> So I don't really see what you're arguing for here. How do
>> *you* think positions in unicode strings should be represented?
>
> I think what users should see is character positions, and they should
> be able to specify them numerically as well as via an opaque marker
> object. I don't care whether that position is represented as bytes or
> characters internally, except that the experience of Emacsen is that
> representation as byte positions is both inefficient and fragile. The
> representation as character positions is more robust but slightly more
> inefficient.
Is it really the representation as byte positions which is fragile (i.e. the
internal implementation detail), or the exposure of that position to calling
code, and the idio
