[Python-Dev] AIX 5.3 - Enabling Shared Library Support Vs Extensions

2010-11-25 Thread Anurag Chourasia
All,

When I configure Python with shared libraries enabled, none of the
extensions get built during the make step because of this error:

building 'cStringIO' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I.
-IInclude -I./Include -I/opt/freeware/include
-I/opt/freeware/include/readline -I/opt/freeware/include/ncurses
-I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include
-I/u01/home/apli/wm/GDD/Python-2.6.6 -c
/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.c -o
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o
./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cStringIO.o
-L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cStringIO.so
*collect2: library libpython2.6 not found*

building 'cPickle' extension
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall
-Wstrict-prototypes -I. -I/u01/home/apli/wm/GDD/Python-2.6.6/./Include -I.
-IInclude -I./Include -I/opt/freeware/include
-I/opt/freeware/include/readline -I/opt/freeware/include/ncurses
-I/usr/local/include -I/u01/home/apli/wm/GDD/Python-2.6.6/Include
-I/u01/home/apli/wm/GDD/Python-2.6.6 -c
/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.c -o
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o
./Modules/ld_so_aix gcc -pthread -bI:Modules/python.exp
build/temp.aix-5.3-2.6/u01/home/apli/wm/GDD/Python-2.6.6/Modules/cPickle.o
-L/usr/local/lib *-lpython2.6* -o build/lib.aix-5.3-2.6/cPickle.so
*collect2: library libpython2.6 not found*

This is on AIX 5.3, GCC 4.2, Python 2.6.6.

I can confirm that there is a libpython2.6.a file in the top-level
directory from which I am running configure/make.

Here are the options supplied to the configure command

./configure --enable-shared --disable-ipv6 --with-gcc=gcc \
    CPPFLAGS="-I /opt/freeware/include -I /opt/freeware/include/readline -I /opt/freeware/include/ncurses"

Please guide me in getting past this error.

Thanks for your help on this.

Regards,
Anurag
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] constant/enum type in stdlib

2010-11-25 Thread Glenn Linderman
So the following code defines constants with associated names that get 
put in the repr.


I'm still a Python newbie in some areas, particularly classes and 
metaclasses, maybe more.
But this Python 3 code seems to create constants with names ... works 
for int and str at least.


Special case for int defines a special  __or__ operator to OR both the 
values and the names, which some might like.


Dunno why it doesn't work for dict, and it is too late to research that 
today.  That's the last test case in the code below, so you can see how 
it works for int and string before it bombs.


There's some obvious cleanup work to be done, and it would be nice to 
make the names actually be constant... but they do lose their .name if 
you ignorantly assign the base type, so at least it is hard to change 
the value and keep the associated .name that gets reported by repr, 
which might reduce some confusion at debug time.


An idea I had, but have no idea how to implement, is that it might be 
nice to say:


with imported_constants_from_module:
   do_stuff

where do_stuff could reference the constants without qualifying them by 
module.  Of course, if you knew it was just a module of constants, you 
could "from module import *" :)  But the idea of with is that they'd go 
away at the end of that scope.


Some techniques here came from Raymond's namedtuple code.


def constant( name, val ):
    typ = str( type( val ))
    if typ.startswith("


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread M.-A. Lemburg
Terry Reedy wrote:
> On 11/24/2010 3:06 PM, Alexander Belopolsky wrote:
> 
>> Any non-trivial text processing is likely to be broken in presence of
>> surrogates.  Producing them on input is just trading known issue for
>> an unknown one.  Processing surrogate pairs in python code is hard.
>> Software that has to support non-BMP characters will most likely be
>> written for a wide build and contain subtle bugs when run under a
>> narrow build.  Note that my latest proposal does not abolish
>> surrogates outright.  Users who want them can still use something like
>> "surrogateescape"  error handler for non-BMP characters.
> 
> It seems to me that what you are asking for is an alternate, optional,
> utf-8-bmp codec that would raise an error, in either direction, for
> non-bmp chars. Then, as you suggest, if one is not prepared for
> surrogates, they are not allowed.

That would be a possibility as well... but I doubt that many users
are going to bother, since slicing surrogates is just as bad as
slicing combining code points and the latter are much more common in
real life and they do happen to mostly live in the BMP.
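Terry's optional "utf-8-bmp" idea quoted above can be sketched as a pair of plain helper functions. This is a minimal sketch under stated assumptions: `encode_utf8_bmp` and `decode_utf8_bmp` are hypothetical names, and they are ordinary functions rather than a codec registered through `codecs.register()`.

```python
def encode_utf8_bmp(text: str) -> bytes:
    """Hypothetical helper: encode to UTF-8 but refuse code points outside the BMP."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            raise UnicodeEncodeError("utf-8-bmp", text, i, i + 1, "non-BMP code point")
    return text.encode("utf-8")


def decode_utf8_bmp(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8 but refuse code points outside the BMP."""
    text = data.decode("utf-8")
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:
            # Report the position in bytes; a non-BMP code point is 4 bytes in UTF-8.
            start = len(text[:i].encode("utf-8"))
            raise UnicodeDecodeError("utf-8-bmp", data, start, start + 4, "non-BMP code point")
    return text
```

Both helpers raise subclasses of `ValueError`, so callers that are "not prepared for surrogates" get an error in either direction instead of silently receiving a surrogate pair.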

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Nov 25 2010)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try our new mxODBC.Connect Python Database Interface for free ! 


   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread M.-A. Lemburg
Alexander Belopolsky wrote:
> On Wed, Nov 24, 2010 at 9:17 PM, Stephen J. Turnbull  
> wrote:
> ..
>>  > I note that an opinion has been raised on this thread that
>>  > if we want compressed internal representation for strings, we should
>>  > use UTF-8.  I tend to agree, but UTF-8 has been repeatedly rejected as
>>  > too hard to implement.  What makes UTF-16 easier than UTF-8?  Only the
>>  > fact that you can ignore bugs longer, in my view.
>>
>> That's mostly true.  My guess is that we can probably ignore those
>> bugs for as long as it takes someone to write the higher-level
>> libraries that James suggests and MAL has actually proposed and
>> started a PEP for.
>>
> 
> As far as I can tell, that PEP generated a grand total of one comment in
> nine years.  This may or may not be indicative of how far away we are
> from seeing it implemented.  :-)

At the time it was too early for people to start thinking about
these issues. Actual use of Unicode really only started a few years
ago.

Since I didn't have a need for such an indexing module myself
(and didn't have much time to work on it anyway), I punted on the
idea.

If someone else wants to pick up the idea, I'd gladly help out with
the details.

> As far as UTF-8 vs. UCS-2/4 debate, I have an idea that may be even
> more far fetched.  Once upon a time, Python Unicode strings supported
> buffer protocol and would lazily fill an internal buffer with bytes in
> the default encoding.  In 3.x the default encoding has been fixed as
> UTF-8, buffer protocol support was removed from strings, but the
> internal buffer caching (now UTF-8) encoded representation remained.
> Maybe we can now implement defenc logic in reverse.  Recall that
> strings are stored as UCS-2/4 sequences, but once buffer is requested
> in 2.x Python code or char* is obtained via
> _PyUnicode_AsStringAndSize() at the C level in 3.x, an internal buffer
> is filled with UTF-8 bytes and  defenc is set to point to that buffer.

The original idea was for that buffer to go away once we moved
to Unicode for strings. Reality has shown that we still need
to keep the buffer, though, since the UTF-8 representation
of Unicode objects is used a lot.

>   So the idea is for strings to store their data as UTF-8 buffer
> pointed by defenc upon construction.  If an application uses string
> indexing, UTF-8 only strings will lazily fill their UCS-2/4 buffer.
> Proper, Unicode-aware algorithms such as grapheme, word or line
> iteration or simple operations such as concatenation, search or
> substitution would operate directly on defenc buffers.  Presumably
> over time fewer and fewer applications would use code unit indexing
> that require UCS-2/4 buffer and eventually Python strings can stop
> supporting indexing altogether just like they stopped supporting the
> buffer protocol in 3.x.
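Alexander's "defenc in reverse" proposal can be illustrated with a toy class that keeps only a UTF-8 buffer and builds an indexable form lazily. This is a hypothetical sketch, not CPython code: `Utf8Str` is an invented name, and a Python `str` stands in for the UCS-2/4 buffer, so indexing here is by code point rather than by code unit.

```python
class Utf8Str:
    """Hypothetical string type: UTF-8 storage, indexable buffer filled on demand."""

    def __init__(self, utf8: bytes):
        utf8.decode("utf-8")        # validate once up front
        self._utf8 = utf8
        self._code_points = None    # lazily filled the first time indexing is used

    @classmethod
    def from_str(cls, text: str) -> "Utf8Str":
        return cls(text.encode("utf-8"))

    def __add__(self, other: "Utf8Str") -> "Utf8Str":
        # Concatenation works directly on the UTF-8 buffers.
        return Utf8Str(self._utf8 + other._utf8)

    def __contains__(self, needle: "Utf8Str") -> bool:
        # Byte-level search is safe because UTF-8 is self-synchronizing:
        # a valid encoded needle can only match at code point boundaries.
        return needle._utf8 in self._utf8

    def __getitem__(self, index: int) -> str:
        if self._code_points is None:
            self._code_points = self._utf8.decode("utf-8")
        return self._code_points[index]

    def __str__(self) -> str:
        return self._utf8.decode("utf-8")
```

Concatenation and search never touch the indexable buffer; only `s[i]` pays for building it.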

I don't follow you: how would UTF-8, which has even more issues
with variable length representation of code points, make something
easier compared to UTF-16, which has far fewer such issues and
then only for non-BMP code points?

Please note that we can only provide one way of string indexing
in Python using the standard s[1] notation and since we
want that operation to be fast and no more than O(1), using the
code units as items is the only reasonable way to implement it.
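To see why O(1) indexing ties s[i] to fixed-size code units: in a variable-length encoding such as UTF-8, finding the n-th code point needs a scan from the start of the buffer. The helper below is a hypothetical illustration of that linear scan, not CPython code.

```python
def nth_code_point(buf: bytes, n: int) -> str:
    """Return code point number n of UTF-8 `buf` using an O(len(buf)) scan."""
    count = -1
    start = None
    for pos, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:          # lead byte, i.e. not a continuation byte
            count += 1
            if count == n:
                start = pos              # the wanted code point begins here
            elif count == n + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("code point index out of range")
    return buf[start:].decode("utf-8")   # it was the last code point
```

With fixed-size code units, the same lookup is a single offset computation, which is the trade-off described above.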

With an indexing module, we could then let applications work
based on higher level indexing schemes such as complete code
points (skipping surrogates), combined code points, graphemes
(ignoring e.g. most control code points and zero width
code points), words (with some customizations as to where to
break words, which will likely have to be language dependent),
lines (which can be complicated for scripts that use columns
instead ;-)), paragraphs, etc.

It would also help to add transparent indexing for right-to-left
scripts and text that uses both left-to-right and right-to-left
text (BIDI).

However, in order for these indexing methods to actually work,
they will need to return references to the code units, so we cannot
just drop that access method.

* Back on the surrogates topic:

In any case, I think this discussion is losing its grip on reality.

By far, most strings you find in actual applications don't use
surrogates at all, so the problem is being exaggerated.

If you need to be careful about surrogates for some reason, I think
a single new method .hassurrogates() on string objects would
go a long way toward making detection of, and special-casing for,
surrogates easier.
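Until such a method exists, a plain function can stand in for it. `has_surrogates` is a hypothetical name; `str` has no `.hassurrogates()` method. Note the assumption about the build: on Python 3.3+ (PEP 393) a non-BMP character is a single code point, so it does not count as a surrogate, whereas on a 3.2 narrow build it would be stored as a surrogate pair and the check would return True.

```python
def has_surrogates(text: str) -> bool:
    """Hypothetical stand-in for the proposed str.hassurrogates():
    True if any code point lies in the surrogate range U+D800..U+DFFF."""
    return any("\ud800" <= ch <= "\udfff" for ch in text)
```

A strict `text.encode("utf-8")` also fails on any surrogate in Python 3, but it allocates a full copy just to answer a yes/no question.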

If adding support for surrogates doesn't make sense (e.g. in the
case of the formatting methods), then we simply punt on that and
leave such handling to other tools.

* Regarding preventing surrogates from entering the Python
runtime:

It is by far more important to maintain round-trip safety for
Unicode data than getting every bit of code to work correctly
with surrogates (often, there won't be a single correct way).
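Python 3 already ships two error handlers aimed at this kind of round-trip safety: "surrogateescape" (PEP 383) maps undecodable bytes to lone surrogates U+DC80..U+DCFF and back, and "surrogatepass" lets lone surrogates through the UTF-8 codec. The helper names below are illustrative only.

```python
def bytes_roundtrip(raw: bytes):
    """Decode arbitrary bytes as UTF-8 without losing anything (PEP 383)."""
    text = raw.decode("utf-8", "surrogateescape")   # bad bytes become U+DC80..U+DCFF
    return text, text.encode("utf-8", "surrogateescape")


def lone_surrogate_roundtrip(text: str) -> str:
    """Push lone surrogates through UTF-8 and back using 'surrogatepass'."""
    data = text.encode("utf-8", "surrogatepass")
    return data.decode("utf-8", "surrogatepass")
```

In both cases the data survives unchanged, even though the intermediate string contains surrogates that most text-processing code will not handle "correctly".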

With a new method for fast detection of surrogates, we c

Re: [Python-Dev] constant/enum type in stdlib

2010-11-25 Thread Nadeem Vawda
On Thu, Nov 25, 2010 at 11:34 AM, Glenn Linderman  wrote:
> So the following code defines constants with associated names that get put
> in the repr.

The code you gave doesn't work if the constant() function is moved
into a separate module from the code that calls it.  The globals()
function, as I understand it, gives you access to the global namespace
*of the current module*, so the constants end up being defined in the
module containing constant(), not the module you're calling it from.

You could get around this by passing the globals of the calling module
to constant(), but I think it's cleaner to use a class to provide a
distinct namespace for the constants.
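Both workarounds can be sketched briefly. This is a minimal illustration, not Glenn's code: `NamedConstant` is a hypothetical int subclass, and this `constant()` takes the target namespace as an explicit third argument.

```python
class NamedConstant(int):
    """Hypothetical int that remembers the name it was defined under."""

    def __new__(cls, name, value):
        obj = super().__new__(cls, value)
        obj.name = name
        return obj

    def __repr__(self):
        return self.name


def constant(name, value, namespace):
    """Bind a named constant in an explicitly passed namespace, e.g. globals()."""
    namespace[name] = NamedConstant(name, value)


# Option 1: the caller passes its own globals(), so the constant lands in the
# calling module rather than in the module that defines constant().
constant("O_RANDOM", 16, globals())


# Option 2: a class gives the constants their own distinct namespace.
class OpenFlags:
    O_RANDOM = NamedConstant("O_RANDOM", 16)
    O_SEQUENTIAL = NamedConstant("O_SEQUENTIAL", 32)
```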

> An idea I had, but have no idea how to implement, is that it might be nice
> to say:
>
>     with imported_constants_from_module:
>        do_stuff
>
> where do_stuff could reference the constants without qualifying them by
> module.  Of course, if you knew it was just a module of constants, you could
> "from module import *" :)  But the idea of with is that they'd go away at
> the end of that scope.

I don't think this is possible - the context manager protocol doesn't
allow you to modify the namespace of the caller like that.  Also, a
with statement does not have its own namespace; any names defined
inside its body will continue to be visible in the containing scope.
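The point about scope is easy to check. This sketch uses `contextlib.nullcontext` (Python 3.7+), a context manager that does nothing, so any effect on names would have to come from the with statement itself.

```python
from contextlib import nullcontext


def names_leak_out_of_with():
    with nullcontext():
        inside = "bound inside the with body"
    # The with statement created no new scope, so the name is still bound here.
    return inside
```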

Of course, if you want to achieve something similar (at function
scope), you could say:

def foo(bar, baz):
    from module import *
    ...


Re: [Python-Dev] constant/enum type in stdlib

2010-11-25 Thread Michael Foord

On 25/11/2010 10:12, Nadeem Vawda wrote:

> On Thu, Nov 25, 2010 at 11:34 AM, Glenn Linderman  wrote:
>> So the following code defines constants with associated names that get put
>> in the repr.
>
> The code you gave doesn't work if the constant() function is moved
> into a separate module from the code that calls it.  The globals()
> function, as I understand it, gives you access to the global namespace
> *of the current module*, so the constants end up being defined in the
> module containing constant(), not the module you're calling it from.
>
> You could get around this by passing the globals of the calling module
> to constant(), but I think it's cleaner to use a class to provide a
> distinct namespace for the constants.
>
>> An idea I had, but have no idea how to implement, is that it might be nice
>> to say:
>>
>>     with imported_constants_from_module:
>>        do_stuff
>>
>> where do_stuff could reference the constants without qualifying them by
>> module.  Of course, if you knew it was just a module of constants, you could
>> "from module import *" :)  But the idea of with is that they'd go away at
>> the end of that scope.
>
> I don't think this is possible - the context manager protocol doesn't
> allow you to modify the namespace of the caller like that.  Also, a
> with statement does not have its own namespace; any names defined
> inside its body will continue to be visible in the containing scope.
>
> Of course, if you want to achieve something similar (at function
> scope), you could say:
>
> def foo(bar, baz):
>     from module import *
>     ...


Not in Python 3 you can't. :-)

That's invalid syntax, import * can only be used at module level. This 
makes *testing* import * (i.e. testing your __all__) annoying - you have 
to exec('from module import *') instead.
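The exec() workaround for testing `__all__` might look like this. `star_import_names` and `exported_names` are hypothetical helper names; the second function just shows that the function-level form is rejected at compile time.

```python
import importlib


def star_import_names(module_name: str) -> set:
    """Return the names that `from <module_name> import *` would bind."""
    namespace = {}
    exec(f"from {module_name} import *", namespace)
    namespace.pop("__builtins__", None)   # exec() inserts this into a fresh globals dict
    return set(namespace)


def exported_names(module_name: str) -> set:
    """Names listed in the module's __all__ (empty if it defines none)."""
    module = importlib.import_module(module_name)
    return set(getattr(module, "__all__", ()))


def star_import_in_function_is_rejected() -> bool:
    source = "def f():\n    from os import *\n"
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False
```

A test for a module's `__all__` can then compare `star_import_names(name)` with `exported_names(name)`.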


Michael





--

http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies ("BOGUS AGREEMENTS") that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.



Re: [Python-Dev] constant/enum type in stdlib

2010-11-25 Thread Michael Foord

On 25/11/2010 09:34, Glenn Linderman wrote:
So the following code defines constants with associated names that get 
put in the repr.


I'm still a Python newbie in some areas, particularly classes and 
metaclasses, maybe more.
But this Python 3 code seems to create constants with names ... works 
for int and str at least.


Special case for int defines a special  __or__ operator to OR both the 
values and the names, which some might like.


Dunno why it doesn't work for dict, and it is too late to research 
that today.  That's the last test case in the code below, so you can 
see how it works for int and string before it bombs.


There's some obvious cleanup work to be done, and it would be nice to 
make the names actually be constant... but they do lose their .name if 
you ignorantly assign the base type, so at least it is hard to change 
the value and keep the associated .name that gets reported by repr, 
which might reduce some confusion at debug time.


An idea I had, but have no idea how to implement, is that it might be 
nice to say:


with imported_constants_from_module:
   do_stuff

where do_stuff could reference the constants without qualifying them 
by module.  Of course, if you knew it was just a module of constants, 
you could "from module import *" :)  But the idea of with is that 
they'd go away at the end of that scope.


Some techniques here came from Raymond's namedtuple code.


def constant( name, val ):
    typ = str( type( val ))
    if typ.startswith("

Not quite correct. If you OR a value with itself you should get back 
just the value, not something with "name|name" as the repr.
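One way to get the behaviour Michael describes is to return an operand unchanged whenever OR-ing adds no new bits. This is an illustrative sketch, not Glenn's code; `NamedInt` is a hypothetical name.

```python
class NamedInt(int):
    """Hypothetical named int constant whose | combines names only when needed."""

    def __new__(cls, name, value):
        obj = super().__new__(cls, value)
        obj.name = name
        return obj

    def __repr__(self):
        return self.name

    def __or__(self, other):
        value = int(self) | int(other)
        if value == int(self):
            return self            # x | x, or OR-ing in bits that are already set
        if isinstance(other, NamedInt) and value == int(other):
            return other
        other_name = other.name if isinstance(other, NamedInt) else repr(int(other))
        return NamedInt(self.name + "|" + other_name, value)
```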


We can hold off on implementations until we have general agreement that 
some kind of named constant *should* be added, and what the feature set 
should look like.


All the best,

Michael


    ev += '''
%s = constant_%s( %s, '%s' )

'''
    ev = ev % ( typ, typ, typ, name, typ, repr( val ), name )
    print( ev )
    exec( ev, globals())

constant('O_RANDOM', val=16 )

constant('O_SEQUENTIAL', val=32 )

constant("O_STRING", val="string")

def foo( x ):
    print( str( x ))
    print( repr( x ))
    print( type( x ))

foo( O_RANDOM )
foo( O_SEQUENTIAL )
foo( O_STRING )

zz = O_RANDOM | O_SEQUENTIAL

foo( zz )

y = {'ab': 2, 'yz': 3 }
constant('O_DICT', y )




Re: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py

2010-11-25 Thread Éric Araujo
> Author: senthil.kumaran
> New Revision: 86748
> 
> Log:
> Experimental - Transparent gzip Encoding in urllib2. There should be a good 
> way to deal with Content-Length.
Cool feature!  But...

> Modified:
>python/branches/py3k-urllib/Lib/http/client.py
>python/branches/py3k-urllib/Lib/urllib/request.py
No tests?  Misc/NEWS?  :)

Regards



Re: [Python-Dev] constant/enum type in stdlib

2010-11-25 Thread Rob Cliffe



On 25/11/2010 03:46, Greg Ewing wrote:

On 25/11/10 12:38, average wrote:

Is immutability a general need that should have a general solution?


Yes, I have sometimes thought this.  Might be nice to have a "mutable" 
attribute that could be read and could be changed from True to False, 
though presumably not vice versa.

I don't think it really generalizes. Tuples are not just frozen
lists, for example -- they have a different internal structure
that's more efficient to create and access.

But couldn't they be presented to the Python programmer as a single 
type, with the implementation details hidden "under the hood"?

So
MyList.__mutable__ = False
would have the same effect as the present
MyList = tuple(MyList)
This would simplify some code that copes with either list(s) or tuple(s) 
as input data.

One would need syntax for (im)mutable literals, e.g.
[]i  # immutable list (really a tuple).  Bit of a shame that
"i[]" doesn't work.

or
[]f  # frozen list (same thing)
[]   # mutable list (same as now)
[]m  # alternative syntax for mutable list
This would reduce the overloading on parentheses and avoid having to
write a tuple of one item as (t,), which often trips up newbies.  It would
also avoid one FAQ: Why does Python have separate list and tuple types?
Also the syntax could be extended, e.g.

{a,b,c}f    # frozen set with 3 objects
{p:x,q:y}f  # frozen dictionary with 2 items
{:}f, {}f   # (re the thread on set literals) frozen empty
            # dictionary and frozen empty set!
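For comparison, this is how current Python 3 spells the immutable counterparts without any new literal syntax. The suffix literals above remain hypothetical. One caveat: `types.MappingProxyType` (exposed in `types` since 3.3) is a read-only *view* of a dict, so changes to the underlying dict show through; it is not a true frozen dictionary.

```python
from types import MappingProxyType

frozen_list = tuple([1, 2, 3])                    # tuple: immutable sequence
frozen_set = frozenset({"a", "b", "c"})           # frozenset: immutable set
frozen_dict = MappingProxyType({"p": 1, "q": 2})  # read-only view of a dict

one_item = (1,)      # the trailing comma is what makes a one-item tuple...
not_a_tuple = (1)    # ...plain parentheses only group, so this is the int 1
```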

Just some thoughts for Python 4.
Best wishes
Rob Cliffe


Re: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py

2010-11-25 Thread Georg Brandl
Am 25.11.2010 12:47, schrieb Éric Araujo:
>> Author: senthil.kumaran
>> New Revision: 86748
>> 
>> Log:
>> Experimental - Transparent gzip Encoding in urllib2. There should be a good 
>> way to deal with Content-Length.
> Cool feature!  But...
> 
>> Modified:
>>python/branches/py3k-urllib/Lib/http/client.py
>>python/branches/py3k-urllib/Lib/urllib/request.py
> No tests?  Misc/NEWS?  :)

Note that this is work in a separate branch.

Georg



[Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-25 Thread Emile Anclin

hello,

working on Pylint, we have a lot of deliberately corrupted files to test
Pylint behavior; for instance
 
$ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py 
# -*- coding: IBO-8859-1 -*-
""" check correct unknown encoding declaration
"""

__revision__ = ''


and we try to find that module with
find_module('func_unknown_encoding', None). But Python 3 raises SyntaxError
in that case; it didn't raise SyntaxError on Python 2, nor does it do so for
our func_nonascii_noencoding and func_wrong_encoding modules (with obvious
names).

Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36) 
[GCC 4.3.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from imp import find_module
>>> find_module('func_unknown_encoding', None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SyntaxError: encoding problem: with BOM
>>> find_module('func_wrong_encoding', None)
(<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py', 
('.py', 'U', 1))
>>> find_module('func_nonascii_noencoding', None)
(<_io.TextIOWrapper name=6 encoding='utf-8'>, 
'func_nonascii_noencoding.py', ('.py', 'U', 1))


So what is the reason for this selective behavior?
Furthermore, there is BOM in our func_unknown_encoding.py module.

-- 

Emile Anclin 
http://www.logilab.fr/   http://www.logilab.org/ 
Informatique scientifique & et gestion de connaissances


Re: [Python-Dev] python3k : imp.find_module raises SyntaxError

2010-11-25 Thread Ron Adam



On 11/25/2010 08:30 AM, Emile Anclin wrote:


> hello,
>
> working on Pylint, we have a lot of voluntary corrupted files to test
> Pylint behavior; for instance
>
> $ cat /home/emile/var/pylint/test/input/func_unknown_encoding.py
> # -*- coding: IBO-8859-1 -*-
> """ check correct unknown encoding declaration
> """
>
> __revision__ = ''
>
>
> and we try to find that module :
> find_module('func_unknown_encoding', None). But python3 raises SyntaxError
> in that case ; it didn't raise SyntaxError on python2 nor does so on our
> func_nonascii_noencoding and func_wrong_encoding modules (with obvious
> names)
>
> Python 3.2a2 (r32a2:84522, Sep 14 2010, 15:22:36)
> [GCC 4.3.4] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from imp import find_module
> >>> find_module('func_unknown_encoding', None)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> SyntaxError: encoding problem: with BOM
> >>> find_module('func_wrong_encoding', None)
> (<_io.TextIOWrapper name=5 encoding='utf-8'>, 'func_wrong_encoding.py',
> ('.py', 'U', 1))
> >>> find_module('func_nonascii_noencoding', None)
> (<_io.TextIOWrapper name=6 encoding='utf-8'>,
> 'func_nonascii_noencoding.py', ('.py', 'U', 1))
>
>
> So what is the reason of this selective behavior?
> Furthermore, there is BOM in our func_unknown_encoding.py module.


I don't think there is a clear reason by design.  Also try importing the 
same modules directly and noting the differences in the errors you get.


For example, here is the problem that brought this to my attention in Python 3.2:

>>> find_module('test/badsyntax_pep3120')
Segmentation fault

>>> from test import badsyntax_pep3120
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.2/test/badsyntax_pep3120.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xf6' in file 
/usr/local/lib/python3.2/test/badsyntax_pep3120.py on line 1, but no 
encoding declared; see http://python.org/dev/peps/pep-0263/ for details



The import statement uses parser.c, and tokenizer.c indirectly, to import a
file, but the imp module uses tokenizer.c directly.  They aren't consistent
in how they handle errors because the different error messages are
generated in different places depending on what the error is, *and* what
the code path to get to that point was, *and* whether or not a filename was
set.  For the example above with imp.find_module(), the filename isn't set,
so you get a different error than if you used import, which goes through
the parser and does set the filename.


From what I've seen, it would help if the imp module was rewritten to use 
parser.c like the import statement does, rather than tokenizer.c directly. 
The error handling in parser.c is much better than tokenizer.c.  Possibly 
tokenizer.c could be cleaned up after that and be made much simpler.


Ron Adam


Re: [Python-Dev] [Python-checkins] r86748 - in python/branches/py3k-urllib/Lib: http/client.py urllib/request.py

2010-11-25 Thread Éric Araujo
>>> Modified:
>>>python/branches/py3k-urllib/Lib/http/client.py
>>>python/branches/py3k-urllib/Lib/urllib/request.py
>> No tests?  Misc/NEWS?  :)
> 
> Note that this is work in a separate branch.

Ah, didn’t notice that!  Senthil replied as much in private email:

> That was in a different branch. Once stable shall definitely include
> the tests and news.

unconsciously-ignoring-svn-branches-to-preserve-sanity-ly yours,
Éric



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread Victor Stinner
On Friday 19 November 2010 23:25:03 you wrote:
> > Python is unclear about non-BMP characters: narrow build was called
> > "ucs2" for long time, even if it is UTF-16 (each character is encoded to
> > one or two UTF-16 words).
> 
> No, no, no :-)
> 
> UCS2 and UCS4 are more appropriate than "narrow" and "wide" or even
> "UTF-16" and "UTF-32".

Ok for Python 2:

$ ./python 
Python 2.7.0+ (release27-maint:84618M, Sep  8 2010, 12:43:49) 
>>> import sys; sys.maxunicode
65535
>>> x=u'\U0010FFFF'; len(x)
2
>>> ord(x)
...
TypeError: ord() expected a character, but string of length 2 found


But Python 3 does use UTF-16 for narrow build:

$ ./python
Python 3.2a3+ (py3k:86396:86399M, Nov 10 2010, 15:24:09)
>>> import sys; sys.maxunicode
65535
>>> c=chr(0x10FFFF); len(c)
2
>>> ord(c)
1114111
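The length of 2 comes from the UTF-16 surrogate-pair rule: a code point above U+FFFF is stored as two 16-bit code units. The helper name below is illustrative. On Python 3.3+ (PEP 393) there are no narrow builds, so `len(chr(0x10FFFF))` is 1 there; the split is still visible in the UTF-16 encoding.

```python
def to_surrogate_pair(code_point: int) -> tuple:
    """Split a non-BMP code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need a surrogate pair")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low
```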

Victor


Re: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py

2010-11-25 Thread Éric Araujo
Hello,

> Author: senthil.kumaran
> Log:
> Mouse support and colour to Demo/curses/life.py by Dafydd Crosby
> 
> Modified:
>python/branches/py3k/Demo/curses/life.py
Okay, this time I’m reacting to the right branch 

> Modified: python/branches/py3k/Demo/curses/life.py
> ==
> --- python/branches/py3k/Demo/curses/life.py  (original)
> +++ python/branches/py3k/Demo/curses/life.py  Thu Nov 25 15:56:44 2010
> @@ -1,6 +1,7 @@
>  #!/usr/bin/env python3
>  # life.py -- A curses-based version of Conway's Game of Life.
>  # Contributed by AMK
> +# Mouse support and colour by Dafydd Crosby
Shouldn’t his name rather be in Misc/ACKS too?  Modules typically
(warning: non-scientific data) include the name of the author or first
contributors but not the name of every contributor.

I think these cool features deserve a note in Misc/NEWS too :)

Re: “colour”: the rest of the file uses US English, as do the function
names (see for example curses.has_color).  It’s good to use one dialect
consistently in one file.

going-back-to-stare-at-shiny-colors-ly yours,
Éric



Re: [Python-Dev] [Python-checkins] r86750 - python/branches/py3k/Demo/curses/life.py

2010-11-25 Thread Senthil Kumaran
On Fri, Nov 26, 2010 at 02:32:43AM +0100, Éric Araujo wrote:
> Shouldn’t his name rather be in Misc/ACKS too?  Modules typically
> (warning: non-scientific data) include the name of the author or first
> contributors but not the name of every contributor.
> 
> I think these cool features deserve a note in Misc/NEWS too :)

I don't think it is required. Demo scripts are usually just fun
demonstrations. The contributor had added his name to the header in his
patch, and I just left it like that. It's fine.

For features and important patches (subjective), Misc/{ACKS,NEWS} are
both added.

> Re: “colour”: the rest of the file use US English, as do the function
> names (see for example curses.has_color).  It’s good to use one dialect
> consistently in one file.

Good catch. I did not notice it because we write it as colour too.
Changing it.

Thanks,
Senthil


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread Stephen J. Turnbull
M.-A. Lemburg writes:

 > That would be a possibility as well... but I doubt that many users
 > are going to bother, since slicing surrogates is just as bad as
 > slicing combining code points and the latter are much more common in
 > real life and they do happen to mostly live in the BMP.

That's only if you require 100% fidelity in the data, which may not be
true in some use cases.  Where 99.99% fidelity is good enough, an
unexpected sliced surrogate pair is a show-stopper, while a sliced
combining character sequence not only doesn't stop the show (at least
in Python, and I doubt any correct Unicode process can signal a fatal
error there either; I can put a tilde on a Cyrillic character if I
want to, no?), it's probably readable enough that readers will assume
a keypunch error.
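To make the contrast concrete, here is a small Python 3 sketch (an illustration, not code from the thread). It simulates a narrow (UTF-16) build by encoding to UTF-16; `slice_utf16_units` is a hypothetical helper, and the "replace" error handler is just one possible lenient policy, assumed here for illustration.

```python
# Contrast: slicing a combining sequence vs. slicing a surrogate pair.

def slice_utf16_units(text, n_units):
    """Keep the first n_units UTF-16 code units of text (as a narrow build's
    s[:n] would) and decode them, replacing a dangling half pair with U+FFFD."""
    data = text.encode("utf-16-le")[: 2 * n_units]
    return data.decode("utf-16-le", errors="replace")

# Cyrillic small letter i followed by COMBINING TILDE (U+0303).
combining = "\u0438\u0303"
print(ascii(combining[:1]))  # still a valid string: the base letter without its tilde

# U+1F600 is outside the BMP, so UTF-16 stores it as a surrogate pair.
emoji = "\U0001F600"
try:
    emoji.encode("utf-16-le")[:2].decode("utf-16-le")  # strict decoding
except UnicodeDecodeError as exc:
    print("strict decode of half a pair fails:", exc.reason)
print(ascii(slice_utf16_units(emoji, 1)))  # lenient decoding yields U+FFFD
```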

Personally, if available I would always use some such dodge in server
software (I don't care enough about 24x7 availability to write it
myself, though).  But never in a script for interactive use; if something
needs fixing, I may as well take the fatal error and fix it on the spot.
(Again, "on the spot" for me can mean "tomorrow".)


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread Stephen J. Turnbull
M.-A. Lemburg writes:

 > Please note that we can only provide one way of string indexing
 > in Python using the standard s[1] notation and since we do
 > want that operation to be fast and no more than O(1), using the
 > code units as items is the only reasonable way to implement it.

AFAICT, the "we" that wants "no more than O(1)" does not include Glyph
Lefkowitz, James Knight, and Greg Ewing.  Greg even said that in
designing a UTF-8 string type he might not provide an indexing
operation at all.  (Caution: That may not be what he meant; I'm just
reporting the way I interpreted it.)  Of course none of them are
proposing to change Python; that's all in the context of designing a
new language.  But it does suggest that a lot of people can't think of
use cases where O(1) string indexing is more important than Unicode
robustness.
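For readers following the len(chr(i)) = 2 question: on a narrow build, s[i] indexes UTF-16 code units, so any character outside the BMP occupies two positions. The Python 3 sketch below simulates that view; `to_code_units` is a hypothetical helper, and the printed lengths assume Python 3.3 or later, where str indexes code points.

```python
def to_code_units(text):
    """Return the UTF-16 code units of text, i.e. the items a narrow build's
    s[i] indexes. Characters outside the BMP become two units (a surrogate pair)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

s = "a\U00010000b"
units = to_code_units(s)
print(len(s), len(units))       # 3 code points, but 4 code units
print([hex(u) for u in units])  # ['0x61', '0xd800', '0xdc00', '0x62']
```

Indexing the list is O(1), but units[1] is half of a surrogate pair rather than a character, which is exactly the trade-off being argued about.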

 > It is by far more important to maintain round-trip safety for
 > Unicode data, than getting every bit of code to work correctly
 > with surrogates (often, there won't be a single correct way).

But surely it's more important than that to ensure that surrogates
can't crash a Python process with unexpected UnicodeErrors?
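The failure Turnbull describes is easy to reproduce in Python 3: a lone surrogate (for instance, half of a pair left behind by code-unit slicing) cannot be encoded with the default strict handler. The standard "surrogatepass" error handler is one way to keep round-trip safety for such data; this is a sketch, not a proposal from the thread.

```python
lone = "\ud800"  # a lone high surrogate, e.g. left over from a bad slice

try:
    lone.encode("utf-8")  # strict: lone surrogates are not encodable
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)

# 'surrogatepass' encodes the surrogate anyway and decodes it back unchanged.
raw = lone.encode("utf-8", errors="surrogatepass")
print(raw)  # b'\xed\xa0\x80'
assert raw.decode("utf-8", errors="surrogatepass") == lone
```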



[Python-Dev] Question about GDB bindings and 32/64 bits

2010-11-25 Thread Jesus Cea

I have installed a 32-bit GDB 7.2, and the 32-bit buildslaves are green.
However, the 64-bit buildslaves are failing test_gdb.

Is there any expectation that a 32-bit GDB can debug a 64-bit
Python? If not, test_gdb should compare "platform.architecture()" (for
Python and for the gdb on the system) and run only when they match. If
this is supposed to work, I would open a bug and maybe spend some time on it.
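A minimal sketch of the guard proposed above, assuming it lives in a test helper; `same_word_size` and `should_run_test_gdb` are hypothetical names, not part of the real test_gdb. Note that for any binary other than sys.executable, platform.architecture() relies on the external `file` command and falls back to defaults if that fails, so the check is best-effort.

```python
import platform
import shutil

def same_word_size(python_arch, gdb_arch):
    """Compare the bits field ('32bit'/'64bit') of two platform.architecture() tuples."""
    return python_arch[0] == gdb_arch[0]

def should_run_test_gdb():
    """Hypothetical guard: run test_gdb only if gdb matches this Python's word size."""
    gdb_path = shutil.which("gdb")
    if gdb_path is None:
        return False  # no gdb on PATH, nothing to test
    # platform.architecture() with no argument describes sys.executable.
    return same_word_size(platform.architecture(), platform.architecture(gdb_path))
```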

But before investing time, I would like to know whether this
mix is actually expected to work.

If not, I would consider installing a 64-bit GDB too and doing some tricks
(like using an "/usr/local/bin/gdb" wrapper script to choose the 32- or 64-bit
"real" gdb) to actually run "test_gdb" on both buildslaves
(they are running on the same physical machine).
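The wrapper idea could look roughly like this in Python (a sketch; the two install paths in `GDB_BY_BITS` are invented placeholders, and defaulting to the 32-bit gdb is an assumption). It looks for the program being debugged among gdb's arguments, checks its word size with platform.architecture(), and execs the matching gdb.

```python
"""Hypothetical /usr/local/bin/gdb wrapper: pick a 32- or 64-bit gdb."""
import os
import platform
import sys

# Placeholder install locations; adjust to wherever the real binaries live.
GDB_BY_BITS = {
    "32bit": "/usr/local/gdb32/bin/gdb",
    "64bit": "/usr/local/gdb64/bin/gdb",
}

def target_from_args(args):
    """Return the first non-option argument naming an existing file (the program to debug)."""
    for arg in args:
        if not arg.startswith("-") and os.path.isfile(arg):
            return arg
    return None

def pick_gdb(target):
    """Choose the gdb whose word size matches target; default to the 32-bit one."""
    bits, _ = platform.architecture(target)
    return GDB_BY_BITS.get(bits, GDB_BY_BITS["32bit"])

def main(argv):
    target = target_from_args(argv)
    gdb = pick_gdb(target) if target else GDB_BY_BITS["32bit"]
    os.execv(gdb, [gdb] + argv)  # replace this process with the real gdb
```

To use it as /usr/local/bin/gdb, the script would end with `if __name__ == "__main__": main(sys.argv[1:])`.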

Any advice?

PS: I am talking about the AMD64 OpenIndiana buildbots. I haven't checked the others.

-- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
[email protected] - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:[email protected] _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
"Things are not so easy"  _/_/  _/_/_/_/  _/_/_/_/  _/_/
"My name is Dump, Core Dump"   _/_/_/_/_/_/  _/_/  _/_/
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread Glyph Lefkowitz
On Nov 24, 2010, at 4:03 AM, Stephen J. Turnbull wrote:

> You end up proliferating types that all do the same kind of thing.  Judicious 
> use of inheritance helps, but getting the fundamental abstraction right is 
> hard.  Or at least, Emacs hasn't found it in 20 years of trying.

Emacs hasn't even figured out how to do general purpose iteration in 20 years 
of trying either.  The easiest way I've found to loop across an arbitrary pile 
of 'stuff' is the CL 'loop' macro, which you're not even supposed to use.  Even 
then, you still have to make the arcane and pointless distinction of using 
'across' or 'in' or 'on'.  Python, on the other hand, has iteration pretty well 
tied up nicely in a bow.

I don't know how to respond to the rest of your argument.  Nothing you've said 
has in any way indicated to me why having code-point offsets is a good idea, 
only that people who know C and elisp would rather sling around piles of 
integers than have good abstract types.

For example:

> I think it more likely that markers are very expensive to create and use 
> compared to integers.

What?  When you do 'for x in str' in python, you are already creating an 
iterator object, which has to store the exact same amount of state that our 
proposed 'marker' or 'character pointer' would have to store.  The proposed 
UTF-8 marker would have to do a tiny bit more work when iterating because it 
would have to combine multibyte characters, but in exchange for that you get to 
skip a whole ton of copying when encoding and decoding.  How is this expensive 
to create and use?  For every application I have ever designed, encountered, or 
can even conjecture about, this would be cheaper.  (Assuming not just a UTF-8 
string type, but one for UTF-16 as well, where native data is in that format 
already.)
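Here is a minimal sketch of such a marker in Python 3, assuming well-formed UTF-8 input (`iter_utf8` is a hypothetical name). Its only state is a byte offset, and it decodes one character at a time instead of copying and decoding the whole buffer up front. It yields code points, not combined (grapheme) characters.

```python
def iter_utf8(buf, start=0):
    """Yield (byte_offset, character) pairs by walking UTF-8 bytes directly.

    The only state kept between steps is one integer byte offset, which is
    all a 'marker' would need to remember. Assumes buf is well-formed UTF-8.
    """
    pos = start
    while pos < len(buf):
        lead = buf[pos]
        if lead < 0x80:
            width = 1  # ASCII
        elif lead >= 0xF0:
            width = 4  # 11110xxx
        elif lead >= 0xE0:
            width = 3  # 1110xxxx
        else:
            width = 2  # 110xxxxx
        yield pos, buf[pos:pos + width].decode("utf-8")
        pos += width
```

For example, `list(iter_utf8("a\u00e9\u20ac".encode("utf-8")))` gives the byte offsets 0, 1 and 3, since "\u00e9" takes two bytes.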

For what it's worth, not wanting to use abstract types in Emacs makes sense to 
me: I've written my share of elisp code, and it is hard to create reasonable 
abstractions in Emacs, because the facilities for defining types and creating 
polymorphic logic are so crude.  It's a lot easier to just assume your 
underlying storage is an array, because at the end of the day you're going to 
need to call some functions on it which care whether it's an array or an alist 
or a list or a vector anyway, so you might as well just say so up front.  But 
in Python we could just call 'mystring.by_character()' or 
'mystring.by_codepoint()' and get an iterator object back and forget about all 
that junk.
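As a sketch of what `by_codepoint()` and `by_character()` might do (both names come from the message above and are hypothetical; here they are free functions rather than str methods): the character version below only groups a base code point with following combining marks, which is a rough approximation of Unicode's grapheme cluster rules (UAX #29), not a conforming implementation.

```python
import unicodedata

def by_codepoint(s):
    """Iterate over code points (what iter(s) already does in Python 3)."""
    return iter(s)

def by_character(s):
    """Yield a base code point together with any following combining marks."""
    cluster = ""
    for ch in s:
        if cluster and unicodedata.combining(ch):
            cluster += ch  # attach the mark to the current base character
        else:
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster
```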



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-25 Thread Glyph Lefkowitz
On Nov 24, 2010, at 10:55 PM, Stephen J. Turnbull wrote:

> Greg Ewing writes:
>> On 24/11/10 22:03, Stephen J. Turnbull wrote:
>>> But
>>> if you actually need to remember positions, or regions, to jump to
>>> later or to communicate to other code that manipulates them, doing
>>> this stuff the straightforward way (just copying the whole iterator
>>> object to hang on to its state) becomes expensive.
>> 
>> If the internal representation of a text pointer (I won't call it
>> an iterator because that means something else in Python) is a byte
>> offset or something similar, it shouldn't take up any more space
>> than a Python int, which is what you'd be using anyway if you
>> represented text positions by grapheme indexes or whatever.
> 
> That's not necessarily true.  Eg, in Emacs ("there you go again"),
> Lisp integers are not only immediate (saving one pointer), but the
> type is encoded in the lower bits, so that there is no need for a type
> pointer -- the representation is smaller than the opaque marker type.
> Altogether, up to 8 of 12 bytes saved on a 32-bit platform, or 16 of
> 24 bytes on a 64-bit platform.

Yes, yes, lisp is very clever.  Maybe some other runtime, like PyPy, could make 
this optimization.  But I don't think that anyone is filling up main memory 
with gigantic piles of character indexes and needs to squeeze out that extra 
couple of bytes of memory on such a tiny object.  Plus, this would allow such a 
user to stop copying the character data itself just to decode it, and on 
mostly-ascii UTF-8 text (a common use-case) this is a 2x savings right off the 
bat.
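The "2x" figure is simple arithmetic: an ASCII character takes one byte in UTF-8 but two in UTF-16 (a narrow build's internal form) and four in UCS-4 (a wide build's). A quick Python 3 check with an invented sample string:

```python
def storage_bytes(text):
    """Bytes needed to store text as UTF-8 vs. as UTF-16."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))

sample = "Hello, world! " * 100 + "caf\u00e9"  # mostly-ASCII text
utf8_size, utf16_size = storage_bytes(sample)
print(utf8_size, utf16_size)  # 1405 2808
```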

> In Python it's true that markers can use the same data structure as
> integers and simply provide different methods, and it's arguable that
> Python's design is better.  But if you use bytes internally, then you
> have problems.

No, you just have design questions.

> Do you expose that byte value to the user?

Yes, but only if they ask for it.  It's useful for computing things like quotas.

> Can users (programmers using the language and end users) specify positions in 
> terms of byte values?

Sure, why not?

> If so, what do you do if the user specifies a byte value that points into a 
> multibyte character?

Go to the beginning of the multibyte character.  Report that position; if the 
user then asks the requested marker object for its position, it will report 
that byte offset, not the originally-requested one.  (Obviously, do the same 
thing for surrogate pair code points.)
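The snapping rule is cheap because UTF-8 is self-synchronizing: every continuation byte has the bit pattern 10xxxxxx. A minimal Python sketch, assuming well-formed input; `snap_utf8` and `snap_utf16` are hypothetical helpers, the second applying the same rule to surrogate pairs in a list of UTF-16 code units.

```python
def snap_utf8(buf, offset):
    """Move a byte offset back to the start of the UTF-8 character containing it."""
    offset = max(0, min(offset, len(buf)))
    # Continuation bytes look like 10xxxxxx; step back over them.
    while 0 < offset < len(buf) and (buf[offset] & 0xC0) == 0x80:
        offset -= 1
    return offset

def snap_utf16(units, index):
    """If index lands on a low surrogate (DC00-DFFF), step back to its high half."""
    if 0 < index < len(units) and 0xDC00 <= units[index] <= 0xDFFF:
        return index - 1
    return index
```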

> What if the user wants to specify position by number of characters?

Part of the point that we are trying to make here is that nobody really cares 
about that use-case.  In order to know anything useful about a position in a 
text, you have to have traversed to that location in the text. You can remember 
interesting things like the offsets of starts of lines, or the x/y positions of 
characters.

> Can you translate efficiently?

No, because there's no point :).  But you _could_ implement an overlay that 
cached things like the beginning of lines, or the x/y positions of interesting 
characters.
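A minimal sketch of such an overlay, assuming UTF-8 bytes and "\n" line endings: it records where each line starts once, then maps a byte offset to (line, column) with a binary search. `LineIndex` is a hypothetical name. Scanning raw bytes for b"\n" is safe in UTF-8 because 0x0A never occurs inside a multibyte sequence.

```python
import bisect

class LineIndex:
    """Cache of the byte offsets where lines start."""

    def __init__(self, buf):
        self.starts = [0]
        pos = buf.find(b"\n")
        while pos != -1:
            self.starts.append(pos + 1)  # next line starts after the newline
            pos = buf.find(b"\n", pos + 1)

    def line_col(self, offset):
        """Return the 0-based (line, byte column) for a byte offset."""
        line = bisect.bisect_right(self.starts, offset) - 1
        return line, offset - self.starts[line]
```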

> As I say elsewhere, it's possible that there really never is a need to 
> efficiently specify an absolute position in a large text as a character 
> (grapheme, whatever) count.

> But I think it would be hard to implement an efficient text-processing 
> *language*, eg, a Python module
> for *full conformance* in handling Unicode, on top of UTF-8.

Still: why?  I guess if I have some free time I'll try my hand at it, and maybe 
I'll run into a wall and realize you're right :).

> Any time you have an algorithm that requires efficient access to arbitrary 
> text positions, you'll spend all your skull sweat fighting the 
> representation.  At least, that's been my experience with Emacsen.

What sort of algorithm would that be, though?  The main thing that I could 
think of is a text editor trying to efficiently allow the user to scroll to the 
middle of a large file without reading the whole thing into memory.  But, in 
that case, you could use byte-positions to estimate, and display a heuristic 
number while calculating the real line numbers.  (This is what 'less' does, and 
it seems to work well.)
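The estimate can be as simple as extrapolating newline density from a prefix of the file. This Python sketch illustrates the idea only; it is not taken from `less`, and `estimate_line` and `read_sample` are hypothetical names.

```python
def estimate_line(sample, byte_pos):
    """Guess the 0-based line number at byte_pos by extrapolating the
    newline density of a sample (e.g. the first 64 KiB of the file)."""
    if not sample:
        return 0
    lines_per_byte = sample.count(b"\n") / len(sample)
    return int(byte_pos * lines_per_byte)

def read_sample(path, size=64 * 1024):
    """Read only the first size bytes of a file, not the whole thing."""
    with open(path, "rb") as f:
        return f.read(size)
```

An editor would show `estimate_line(read_sample(path), pos)` immediately and replace it with the exact count once a background scan finishes.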

>> So I don't really see what you're arguing for here. How do
>> *you* think positions in unicode strings should be represented?
> 
> I think what users should see is character positions, and they should
> be able to specify them numerically as well as via an opaque marker
> object.  I don't care whether that position is represented as bytes or
> characters internally, except that the experience of Emacsen is that
> representation as byte positions is both inefficient and fragile.  The
> representation as character positions is more robust but slightly less
> efficient.

Is it really the representation as byte positions which is fragile (i.e. the 
internal implementation detail), or the exposure of that position to calling 
code, and the idio