Re: [Python-Dev] [Python-checkins] r86703 - python/branches/release31-maint/Lib/idlelib/IOBinding.py

2010-11-23 Thread Georg Brandl
On 23.11.2010 07:49, Terry Reedy wrote:
 
 
 On 11/23/2010 1:16 AM, Senthil Kumaran wrote:
 Hi Terry,

 On Tue, Nov 23, 2010 at 2:07 PM, terry.reedy <python-check...@python.org>
 wrote:
 Author: terry.reedy
 Date: Tue Nov 23 07:07:04 2010
 New Revision: 86703

 Log:
 Issue 9222 Fix filetypes for open dialog

 Modified:
python/branches/release31-maint/Lib/idlelib/IOBinding.py


 You should be using the svnmerge.py script (referenced in the dev FAQ)
 to merge your changes to release31-maint. This helps with merge tracking
 and is helpful to release managers when they do the release.

 It is pretty simple. In your release31-maint checkout:

 Just run python svnmerge.py merge -r 9221 (your py3k revision value)
 If successful, do a svn commit -F svnmerge-output-filename (this file
 is autogenerated)
 
 I am using TortoiseSVN which has a similar merge but does not seem to 
 autogenerate anything. I did use its merge + commit for the 2.7 backport.

While the policy is to use svnmerge and I'd expect developers to follow
this policy, in this specific case it's not as important anymore since we
no longer use either svnmerge's mass merging or its blocking feature.

Georg

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Stable buildbots

2010-11-23 Thread Trent Nelson

On 14-Nov-10 3:48 AM, David Bolen wrote:

This is a completely separate issue, though probably around just as
long, and like the popup problem its frequency changes over time.  By
"hung" here I'm referring to cases where something must go wrong with
a test and/or its cleanup such that a python_d process remains
running, usually several of them at the same time.


My guess: the hung (single-threaded) Python process has called 
select() without a timeout in order to wait for some data.  However, the 
data never arrives (due to a broken/failed test), and the select() never 
returns.
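The guess above can be illustrated in a few lines. This is only a sketch, not the buildbot test code: socket.socketpair() stands in for whatever connection the test was waiting on, and the helper name wait_readable and the timeout values are assumptions made for the example.

```python
import select
import socket


def wait_readable(sock, timeout):
    """Return True if sock becomes readable within timeout seconds.

    Passing a timeout means a peer that never sends anything costs us
    at most `timeout` seconds instead of blocking forever.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    return bool(readable)


# socketpair() stands in for the connection a test is waiting on.
a, b = socket.socketpair()
try:
    # Nothing has been sent yet, so this gives up after the timeout.
    timed_out = not wait_readable(a, 0.2)
    b.sendall(b"x")
    # Now there is data, so select() returns without waiting out the timeout.
    got_data = wait_readable(a, 1.0)
finally:
    a.close()
    b.close()
```

With the timeout argument left out, the first call would block indefinitely, which is the wedged state described above.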


On Windows, processes seem harder to kill when they get into this state. 
 If I purposely wedge a Windows process with select() from the 
interactive interpreter, Ctrl-C has absolutely no effect (whereas on 
Unix, Ctrl-C will interrupt the select()).


As for why kill_python.exe doesn't seem to be able to kill said wedged 
processes, the MSDN documentation on TerminateProcess[1] states the 
following:


"The terminated process cannot exit until all
pending I/O has been completed or canceled." (sic)

It's not unreasonable to assume a wedged select() constitutes pending 
I/O, so that's a possible explanation as to why kill_python.exe isn't 
able to terminate the processes.


(Also, kill_python currently assumes TerminateProcess() always works; 
perhaps this optimism is misplaced.  Also note the XXX TODO regarding 
the fact that we don't kill processes that have loaded our python*.dll, 
but may not be named python_d.exe.  I don't think that's the issue here, 
though.)
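One way to avoid assuming that a kill always works is to wait for the process afterwards and report failure if it is still alive. The sketch below is not kill_python's code (that tool is written in C and calls TerminateProcess() itself); it uses Python's subprocess module, where Popen.kill() ends up calling TerminateProcess() on Windows and sends SIGKILL on POSIX. The helper name kill_and_confirm and the five-second grace period are assumptions.

```python
import subprocess
import sys


def kill_and_confirm(proc, grace=5.0):
    """Kill proc and report whether it actually exited within grace seconds."""
    proc.kill()
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        # The kill request was issued, but the process is still around.
        return False
    return True


# A child that would otherwise sleep for a minute.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
confirmed = kill_and_confirm(child)
```

A False result here would be the signal to log the survivor or fall back to another tool rather than carry on as if the build area were clean.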


On 14-Nov-10 5:32 AM, David Bolen wrote:
 Martin v. Löwis <mar...@v.loewis.de> writes:

 This is what kill_python.exe is supposed to solve. So I recommend to
 investigate why it fails to kill the hanging Pythons.

 Yeah, I know, and I can't say I disagree in principle - not sure why
 Windows doesn't let the kill in that module work (or if there's an
 issue actually running it under all conditions).

 At the moment though, I do know that using the sysinternals pskill
 utility externally (which is what I currently do interactively)
 definitely works so to be honest,

That's interesting.  (That kill_python.exe doesn't kill the wedged 
processes, but pskill does.)  kill_python is pretty simple, it just 
calls TerminateProcess() after acquiring a handle with the relevant 
PROCESS_TERMINATE access right.  That being said, that's the recommended 
way to kill a process -- I doubt pskill would be going about it any 
differently (although, it is sysinternals... you never know what kind of 
crazy black magic it's doing behind the scenes).


Are you calling pskill with the -t flag? i.e. kill process and all 
dependents?  That might be the ticket, especially if killing the child 
process that the wedged select() is waiting on causes it to return, and 
thus, makes it killable.


Otherwise, if it happens again, can you try kill_python.exe first, then 
pskill, and confirm if the former fails but the latter succeeds?


Trent.


[1]: http://msdn.microsoft.com/en-us/library/ms686714(VS.85).aspx


Re: [Python-Dev] is this a bug? no environment variables

2010-11-23 Thread Glenn Linderman

On 11/22/2010 8:33 AM, Guido van Rossum wrote:

On Sun, Nov 21, 2010 at 9:40 PM, Glenn Linderman <v+pyt...@g.nevcal.com> wrote:

  In reviewing my notes from my experimentations with CGIHTTPServer
  (Python2.6) and then http.server (Python 3.2a4), I note one behavior I
  haven't reported as a bug, nor do I know where to start to figure it out,
  other than experimentally.

  The experiment: launching CGIHTTPServer without environment variables, by
  the simple expedient of using a batch file to unset all the existing
  environment variables, and then launching Python2.6 with CGIHTTPServer.

  So it failed early: random.py fails at line 110 (Python 2.6).

What specific traceback do you get? In my copy of the code that line says

 a = long(_hexlify(_urandom(16)), 16)

and I could just imagine that _urandom() fails for some reason to do
with the environment (it is a reference to os.urandom()), which, being
part of the C library code, might depend on the environment.

But you're not giving enough info to debug this.


OK, here is the traceback.  I've upgraded the application from Python 
2.6 + CGIHTTPServer.py + bugfixes to Python 3.2a4 + http.server + 
bugfixes, hoping that it would fix the problem or, since it didn't, that 
the traceback would be more relevant.  It seems that _urandom is the 
likely culprit.


Traceback (most recent call last):
  File "d:\my\web\areliabl\0test\https.py", line 5, in <module>
    import server
  File "d:\my\web\areliabl\0test\server.py", line 88, in <module>
    import email.message
  File "C:\Python32\lib\email\message.py", line 17, in <module>
    from email import utils
  File "C:\Python32\lib\email\utils.py", line 27, in <module>
    import random
  File "C:\Python32\lib\random.py", line 698, in <module>
    _inst = Random()
  File "C:\Python32\lib\random.py", line 90, in __init__
    self.seed(x)
  File "C:\Python32\lib\random.py", line 108, in seed
    a = int.from_bytes(_urandom(32), 'big')
WindowsError: [Error -2146893818] Invalid Signature


Re: [Python-Dev] is this a bug? no environment variables

2010-11-23 Thread Amaury Forgeot d'Arc
Hi,

2010/11/23 Glenn Linderman <v+pyt...@g.nevcal.com>:
   File "C:\Python32\lib\random.py", line 108, in seed
     a = int.from_bytes(_urandom(32), 'big')
 WindowsError: [Error -2146893818] Invalid Signature

The subprocess documentation (http://docs.python.org/library/subprocess.html) says:
"On Windows, in order to run a side-by-side assembly the specified
env *must* include a valid SystemRoot."

Can you keep this variable and start again?

-- 
Amaury Forgeot d'Arc


Re: [Python-Dev] is this a bug? no environment variables

2010-11-23 Thread Martin v. Löwis
On 23.11.2010 11:55, Amaury Forgeot d'Arc wrote:
 Hi,
 
 2010/11/23 Glenn Linderman v+pyt...@g.nevcal.com:
   File C:\Python32\lib\random.py, line 108, in seed
 a = int.from_bytes(_urandom(32), 'big')
 WindowsError: [Error -2146893818] Invalid Signature
 
 In the subprocess documentation http://docs.python.org/library/subprocess.html
 On Windows, in order to run a side-by-side assembly the specified
 env *must* include a valid SystemRoot.

Indeed, setting SystemRoot might solve this problem. According to

http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/

CryptoAPI, in Windows 7, requires this variable to be set. Failure to
find the enhanced crypto provider would explain why the random
module of Python fails to work.

The specific cause is in the registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Cryptography\Defaults\Provider\Microsoft Strong Cryptographic Provider
has as its ImagePath value

%SystemRoot%\system32\rsaenh.dll

So the registry (and COM) do rely on environment variables.
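A stripped-down environment for a child process can still carry SystemRoot along. The following is a sketch under stated assumptions, not a fix for http.server: the helper name minimal_env is hypothetical, and it only copies SystemRoot when the parent environment actually has it, so on non-Windows systems it simply adds nothing for that key.

```python
import os


def minimal_env(extra=None, base=None):
    """Build a small environment dict that keeps SystemRoot if present."""
    base = os.environ if base is None else base
    env = {}
    # Environment variable names are case-insensitive on Windows, so
    # compare them case-insensitively here.
    for name, value in base.items():
        if name.upper() == "SYSTEMROOT":
            env[name] = value
    env.update(extra or {})
    return env


# Example with a made-up parent environment:
parent = {"SystemRoot": r"C:\Windows", "PATH": r"C:\Python32", "TEMP": r"C:\Temp"}
child_env = minimal_env({"GATEWAY_INTERFACE": "CGI/1.1"}, base=parent)
```

The resulting dict is the kind of value one would pass as env= to subprocess.Popen when launching a CGI script with an otherwise empty environment.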

Regards,
Martin


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Stephen J. Turnbull
Terry Reedy writes:

  Yes. As I read the standard, UCS-2 is limited to BMP chars.

Et tu, Terry?

OK, I change my vote on the suggestion of UCS2 to -1.  If a couple
of conscientious blokes like you and David both understand it that
way, I can't see any way to fight it.

FWIW, ISO/IEC 10646 (which is authoritative for UCS-2 and UCS-4) is
available via

http://standards.iso.org/ittf/PubliclyAvailableStandards/index.html

Probably I'm the last non-author to ever read that document!


Re: [Python-Dev] Re-enable warnings in regrtest and/or unittest

2010-11-23 Thread Nadeem Vawda
2010/11/23 Łukasz Langa luk...@langa.pl:
 If you agree to do that for regrtest I will clean up the tests for warnings.
 Already did that for zipfile so it doesn't raise ResourceWarnings anymore. I
 just need to correct multiprocessing and xmlrpc ResourceWarnings, silence
 some DeprecationWarnings in the tests and we're all set. Ah, I see a couple
 more with -uall but nothing scary.

There are also some in test_socket - I've submitted a patch on
Roundup: http://bugs.python.org/issue10512

Looking at the multiprocessing warnings, they seem to be caused by
leaks in the underlying package, unlike xmlrpc and socket, where it's
just a matter of the test code neglecting to close the connection.  So
+1 to:

 Anyway, I find warnings as errors in regrtest a welcome feature. Let's make
 it happen :)
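For the "test code neglecting to close" kind of ResourceWarning, the difference is easy to show. This sketch uses a temporary file rather than a socket to stay self-contained, and it relies on CPython finalizing the file object as soon as the last reference goes away; the helper names are made up for the example.

```python
import gc
import os
import tempfile
import warnings


def resource_warnings(action):
    """Run action() and return the ResourceWarnings it triggered."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ResourceWarning)
        action()
        gc.collect()  # give finalizers a chance to run
    return [w for w in caught if issubclass(w.category, ResourceWarning)]


def leaky(path):
    f = open(path, "rb")
    f.read()
    # f is never closed; its finalizer emits a ResourceWarning.


def tidy(path):
    # The with statement closes the file, so nothing is left to warn about.
    with open(path, "rb") as f:
        f.read()


fd, path = tempfile.mkstemp()
os.close(fd)
try:
    leaked = resource_warnings(lambda: leaky(path))
    clean = resource_warnings(lambda: tidy(path))
finally:
    os.remove(path)
```

Fixing a test of this kind is usually just a matter of adding the with statement or an explicit close(); a leak inside the package itself, as suspected for multiprocessing, needs a fix in the library code.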

Nadeem


Re: [Python-Dev] Solaris family and 64 bits compiling

2010-11-23 Thread Jesus Cea
On 23/11/10 07:55, Martin v. Löwis wrote:
  But if we say that Python can be compiled as 64 bits under Solaris, it
  would be nice if that was actually true. Now that we have a buildbot
  (under OpenIndiana) to test, it is doable.
 
  But it is true, and always has been true. The lib/64 issue did not
  prevent one building Python on Solaris/SPARC64 at all, including the
  extension modules. Just edit Modules/Setup to suit your needs - that
  works since 1995 (before distutils was even written).
Would it be acceptable to change something like:


add_library_path("/usr/local/lib")


to something similar to:


if platform.system() == "SunOS" and platform.architecture()[0] == "64bit":
    add_library_path("/usr/local/lib/64")
else:
    add_library_path("/usr/local/lib")


Would python-dev consider that change OK?

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
Love is placing your happiness in the happiness of another - Leibniz


Re: [Python-Dev] [Python-checkins] r86633 - in python/branches/py3k: Doc/library/inspect.rst Doc/whatsnew/3.2.rst Lib/inspect.py Lib/test/test_inspect.py Misc/NEWS

2010-11-23 Thread Nick Coghlan
On Tue, Nov 23, 2010 at 2:46 AM,  exar...@twistedmatrix.com wrote:
 On 04:24 pm, solip...@pitrou.net wrote:

 On Mon, 22 Nov 2010 17:08:36 +0100
 Hrvoje Niksic hrvoje.nik...@avl.com wrote:

 On 11/22/2010 04:37 PM, Antoine Pitrou wrote:
  +1.  The problem with int constants is that the int gets printed, not
  the name, when you dump them for debugging purposes :)

 Well, it's trivial to subclass int to something with a nicer __repr__.
 PyGTK uses that technique for wrapping C enums:

 Nice. It might be useful to add a private _Constant class somewhere for
 stdlib purposes.

 http://www.python.org/dev/peps/pep-0354/

Indeed, it is difficult to do enums in such a way that they feel
sufficiently robust to be worth the effort of including them (although
these days, I would be inclined to follow the namedtuple API style
rather than that presented in PEP 354).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Michael Foord

On 23/11/2010 13:41, Nick Coghlan wrote:

On Tue, Nov 23, 2010 at 2:46 AM, exar...@twistedmatrix.com wrote:

On 04:24 pm, solip...@pitrou.net wrote:

On Mon, 22 Nov 2010 17:08:36 +0100
Hrvoje Niksic <hrvoje.nik...@avl.com> wrote:

On 11/22/2010 04:37 PM, Antoine Pitrou wrote:

+1.  The problem with int constants is that the int gets printed, not
the name, when you dump them for debugging purposes :)

Well, it's trivial to subclass int to something with a nicer __repr__.
PyGTK uses that technique for wrapping C enums:

Nice. It might be useful to add a private _Constant class somewhere for
stdlib purposes.

http://www.python.org/dev/peps/pep-0354/

Indeed, it is difficult to do enums in such a way that they feel
sufficiently robust to be worth the effort of including them (although
these days, I would be inclined to follow the namedtuple API style
rather than that presented in PEP 354).
Right. As it happens I just submitted a patch to Barry Warsaw's enum 
package (nice), flufl.enum [1], to allow namedtuple style creation of 
named constants:


>>> from flufl.enum import make_enum
>>> Colors = make_enum('Colors', 'red green blue')
>>> Colors
<Colors {red: 1, green: 2, blue: 3}>


PEP 354 was rejected for two primary reasons - lack of interest and 
nowhere obvious to put it. Would it be *so bad* if an enum type lived in 
its own module? There is certainly more interest now, and if we are to 
use something like this in the standard library it *has* to be in the 
standard library (unless every module implements their own private 
_Constant class).


Time to revisit the PEP?

All the best,

Michael

[1] https://launchpad.net/flufl.enum


Cheers,
Nick.




--
http://www.voidspace.org.uk/



Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)

2010-11-23 Thread Antoine Pitrou
On Tue, 23 Nov 2010 00:07:09 -0500
Glyph Lefkowitz gl...@twistedmatrix.com wrote:
 On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto 
 ocean-c...@m2.ccsnet.ne.jp wrote:
 
  Hello. Does this affect python? Thank you.
 
  http://www.openssl.org/news/secadv_20101116.txt
 
 
 No.

Well, actually it does, but Python links against the system OpenSSL on
most platforms (except Windows), so it's up to the OS vendor to apply
the patch.
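To see which OpenSSL a given interpreter is actually linked against, the ssl module exposes the library's version string. The snippet only reports what is loaded; whether that build contains the fix for the advisory still has to be checked against the vendor's notes.

```python
import ssl

# Version string of the OpenSSL library this interpreter is linked against.
linked_openssl = ssl.OPENSSL_VERSION
# The same information as a tuple of integers, convenient for comparisons.
linked_openssl_info = ssl.OPENSSL_VERSION_INFO

print(linked_openssl)
```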

Regards

Antoine.




Re: [Python-Dev] Re-enable warnings in regrtest and/or unittest

2010-11-23 Thread Nick Coghlan
On Tue, Nov 23, 2010 at 8:01 AM, Michael Foord
fuzzy...@voidspace.org.uk wrote:
 On 22/11/2010 21:08, Guido van Rossum wrote:

 On Mon, Nov 22, 2010 at 11:24 AM, Brett Cannon <br...@python.org> wrote:

 The problem with that is it means developers who switch to Python 3.2
 or whatever are suddenly going to have their tests fail until they
 update their code to turn the warnings off.

 That sounds like a feature to me... :-)

 I think Ezio was suggesting just turning warnings on by default when
 unittest is run, not turning them into errors. Ezio is suggesting that
 developers could explicitly turn warnings off again, but when you use the
 default test runner warnings would be shown. His logic is that warnings are
 for developers, and so are tests...

Having at least the default test runner change the default warnings
behaviour to -Wd (while still respecting sys.warnoptions) sounds like
a good idea. That way users won't see the warnings (as intended with
that change), but developers are less likely to get nasty surprises
when things break in future releases (which was one of our major
concerns when we made the decision to change the default handling of
DeprecationWarning). A similar change may be appropriate for doctest
as well.
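A minimal sketch of what "default to -Wd while still respecting sys.warnoptions" could look like inside a test runner. This is an illustration of the idea, not unittest's implementation; the function name apply_runner_warning_defaults is hypothetical.

```python
import sys
import warnings


def apply_runner_warning_defaults(warnoptions=None):
    """Show each warning once per location unless the user passed -W.

    Returns True if the "default" filter was installed.
    """
    if warnoptions is None:
        warnoptions = sys.warnoptions
    if warnoptions:
        # The user asked for something specific with -W; leave it alone.
        return False
    warnings.simplefilter("default")
    return True


# No -W options given: the runner turns warnings on, so this one is seen.
with warnings.catch_warnings(record=True) as shown:
    installed = apply_runner_warning_defaults(warnoptions=[])
    warnings.warn("old API", DeprecationWarning)

# The user passed -W ignore: the runner does not override that choice.
with warnings.catch_warnings():
    skipped = not apply_runner_warning_defaults(warnoptions=["ignore"])
```

The catch_warnings blocks are only there so the example does not leave the process-wide filters modified.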

Printing out the list of suppressed warnings in verbose mode may also be useful.

A blanket -We is unlikely to work for the test suite, since generating
warnings on some platforms is expected behaviour (e.g. due to the
ongoing argument between multiprocessing and FreeBSD as to the
appropriate behaviour of semaphores). However, we may be able to get
to the point where it is run that way by default and then affected
tests use check_warnings() to alter the filter configuration
(something that many such affected tests already do).
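The combination described here (errors by default, with individual tests altering the filters for warnings they expect) can be sketched with warnings.catch_warnings instead of the test suite's own check_warnings() helper. The function names below are made up for the example.

```python
import warnings


def noisy():
    """Stand-in for code that legitimately warns on some platform."""
    warnings.warn("semaphores behave differently here", RuntimeWarning)
    return "done"


def call_expecting_warning(func, category):
    """Call func(), recording warnings of `category` instead of raising."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", category)
        result = func()
    return result, [w for w in caught if issubclass(w.category, category)]


with warnings.catch_warnings():
    # Equivalent in spirit to running with -We: every warning is an error.
    warnings.simplefilter("error")
    try:
        noisy()
        raised = False
    except RuntimeWarning:
        raised = True
    # A test that expects the warning alters the filters locally.
    result, expected = call_expecting_warning(noisy, RuntimeWarning)
```

The locally installed "always" filter is inserted ahead of the blanket "error" filter, which is why the second call returns normally.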

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] r86699 - python/branches/py3k/Lib/zipfile.py

2010-11-23 Thread Antoine Pitrou
On Mon, 22 Nov 2010 22:00:08 -0600
Benjamin Peterson benja...@python.org wrote:
 2010/11/22 Łukasz Langa luk...@langa.pl:
  Message written by Benjamin Peterson on 2010-11-23 at 00:47:
 
  No test?
 
 
  The tests were there already, raising ResourceWarnings. After this change,
  they stopped doing that. You may say: now they pass for the first time :)
 
 It looks like you added new API, though. For that, we would expect new tests.

It's an internal API, although ZipExtFile doesn't begin with an
underscore.

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Nick Coghlan
On Tue, Nov 23, 2010 at 11:50 PM, Michael Foord
fuzzy...@voidspace.org.uk wrote:
 PEP 354 was rejected for two primary reasons - lack of interest and nowhere
 obvious to put it. Would it be *so bad* if an enum type lived in its own
 module? There is certainly more interest now, and if we are to use something
 like this in the standard library it *has* to be in the standard library
 (unless every module implements their own private _Constant class).

 Time to revisit the PEP?

If you (or anyone else) wanted to revisit the PEP, then I would advise
trawling through the standard library looking for constants that could
be sensibly converted to enum values.

A decision would also need to be made as to whether or not to subclass
int, or just provide __index__ (the former has the advantage of being
able to drop cleanly into OS level APIs that expect a numerical
constant).
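The two options can be compared side by side. This is a sketch only; the class names IndexOnly and IntConstant are invented for the example and are not taken from any proposal.

```python
import operator


class IndexOnly:
    """Carries a value but only exposes it through __index__."""

    def __init__(self, name, value):
        self.name = name
        self._value = value

    def __index__(self):
        return self._value

    def __repr__(self):
        return self.name


class IntConstant(int):
    """An int subclass whose repr is its name."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        return self.name


a = IndexOnly("SEEK_END", 2)
b = IntConstant("SEEK_END", 2)

items = ["zero", "one", "two"]
# Both work anywhere an index is requested...
same_item = items[a] == items[b] == "two"
index_value = operator.index(a)
# ...but only the int subclass is itself an int and compares equal to 2.
a_is_int, b_is_int = isinstance(a, int), isinstance(b, int)
a_equals_2, b_equals_2 = (a == 2), (b == 2)
```

The int subclass keeps arithmetic, comparison and isinstance checks working unchanged, while the __index__-only object is opaque everywhere except where an index is explicitly requested.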

Whether enums should provide arbitrary name-value mappings (à la C
enums) or be restricted to sequential indices starting from zero
would be another question best addressed by a code survey of at least
the stdlib.

And getgeneratorstate() doesn't count as a use case, since the
ordering isn't needed and using string literals instead of integers
will cover the debugging aspect :)
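For reference, this is what the string-based states look like in use (inspect.getgeneratorstate() is the function added in r86633; the generator here is just a toy).

```python
import inspect


def counter():
    yield 1


gen = counter()
state_before = inspect.getgeneratorstate(gen)
next(gen)
state_paused = inspect.getgeneratorstate(gen)
gen.close()
state_after = inspect.getgeneratorstate(gen)

# The states are plain strings, so they already read well when printed.
print(state_before, state_paused, state_after)
```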

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Michael Foord

On 23/11/2010 14:16, Nick Coghlan wrote:

On Tue, Nov 23, 2010 at 11:50 PM, Michael Foord
fuzzy...@voidspace.org.uk  wrote:

PEP 354 was rejected for two primary reasons - lack of interest and nowhere
obvious to put it. Would it be *so bad* if an enum type lived in its own
module? There is certainly more interest now, and if we are to use something
like this in the standard library it *has* to be in the standard library
(unless every module implements their own private _Constant class).

Time to revisit the PEP?

If you (or anyone else) wanted to revisit the PEP, then I would advise
trawling through the standard library looking for constants that could
be sensibly converted to enum values.

A decision would also need to be made as to whether or not to subclass
int, or just provide __index__ (the former has the advantage of being
able to drop cleanly into OS level APIs that expect a numerical
constant).

Whether enums should provide arbitrary name-value mappings (à la C
enums) or be restricted to sequential indices starting from zero
would be another question best addressed by a code survey of at least
the stdlib.

And getgeneratorstate() doesn't count as a use case, since the
ordering isn't needed and using string literals instead of integers
will cover the debugging aspect :)

Well, for backwards compatibility reasons the new constants would have 
to *behave* like the old ones (including having the same underlying 
value and comparing equal to it).


In many cases it is *likely* that subclassing int is a better way of 
achieving that. Actually looking through the standard library to 
evaluate it is the only way of confirming that.


Another API, that reduces the duplication of creating the enum and 
setting the names, could be something like:


make_enums('Names', 'NAME_ONE NAME_TWO NAME_THREE', base_type=int,
           module=__name__)


Using __name__ we can set the module globals in the call to make_enums.
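A rough sketch of how such a helper might work, to make the idea concrete. Everything here is an assumption rather than an existing API: the NamedInt class, the start=0 numbering, and the fact that base_type is called with (name, value). A scratch module is registered so the example is self-contained; real code would simply pass module=__name__.

```python
import sys
import types


class NamedInt(int):
    """An int that remembers its name for repr()."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name
        return self

    def __repr__(self):
        return self._name


def make_enums(group, names, base_type=NamedInt, module=None, start=0):
    """Create one constant per name and return them as a dict.

    `group` is accepted only to mirror the call in the text; this sketch
    does not use it. If `module` names an imported module, the constants
    are also set as globals of that module.
    """
    constants = {
        name: base_type(name, value)
        for value, name in enumerate(names.split(), start)
    }
    if module is not None:
        vars(sys.modules[module]).update(constants)
    return constants


# Scratch module standing in for the module that calls make_enums().
demo = types.ModuleType("demo_constants")
sys.modules[demo.__name__] = demo
created = make_enums("Names", "NAME_ONE NAME_TWO NAME_THREE", module=demo.__name__)
```

After the call, demo.NAME_TWO is an int equal to 1 whose repr is NAME_TWO, so existing comparisons against the raw value keep working.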

All the best,

Michael



Cheers,
Nick.




--
http://www.voidspace.org.uk/



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tue, 23 Nov 2010 14:24:18 +
Michael Foord fuzzy...@voidspace.org.uk wrote:
 Well, for backwards compatibility reasons the new constants would have 
 to *behave* like the old ones (including having the same underlying 
 value and comparing equal to it).
 
 In many cases it is *likely* that subclassing int is a better way of 
 achieving that. Actually looking through the standard library to 
 evaluate it is the only way of confirming that.
 
 Another API, that reduces the duplication of creating the enum and 
 setting the names, could be something like:
 
  make_enums(Names, NAME_ONE NAME_TWO NAME_THREE, base_type=int, 
 module=__name__)
 
 Using __name__ we can set the module globals in the call to make_enums.

I don't understand why people insist on calling that an enum. enum is
a C legacy and it doesn't bring anything useful as far as I can tell.
Instead, just assign the values explicitly.

Antoine.




Re: [Python-Dev] r86699 - python/branches/py3k/Lib/zipfile.py

2010-11-23 Thread Benjamin Peterson
2010/11/23 Antoine Pitrou solip...@pitrou.net:
 On Mon, 22 Nov 2010 22:00:08 -0600
 Benjamin Peterson benja...@python.org wrote:
 2010/11/22 Łukasz Langa luk...@langa.pl:
  Message written by Benjamin Peterson on 2010-11-23 at 00:47:
 
  No test?
 
 
  The tests were there already, raising ResourceWarnings. After this change,
  they stopped doing that. You may say: now they pass for the first time :)

 It looks like you added new API, though. For that, we would expect new tests.

 It's an internal API, although ZipExtFile doesn't begin with an
 underscore.

Why is it internal API then?



-- 
Regards,
Benjamin


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Benjamin Peterson
2010/11/23 Antoine Pitrou solip...@pitrou.net:
 On Tue, 23 Nov 2010 14:24:18 +
 Michael Foord fuzzy...@voidspace.org.uk wrote:
 Well, for backwards compatibility reasons the new constants would have
 to *behave* like the old ones (including having the same underlying
 value and comparing equal to it).

 In many cases it is *likely* that subclassing int is a better way of
 achieving that. Actually looking through the standard library to
 evaluate it is the only way of confirming that.

 Another API, that reduces the duplication of creating the enum and
 setting the names, could be something like:

      make_enums(Names, NAME_ONE NAME_TWO NAME_THREE, base_type=int,
 module=__name__)

 Using __name__ we can set the module globals in the call to make_enums.

 I don't understand why people insist on calling that an enum. enum is
 a C legacy and it doesn't bring anything useful as far as I can tell. Instead,
 just assign the values explicitly.

The concept of an enumeration of values is still useful outside its
stunted C incarnation.

Out of curiosity, why is enum legacy in C?



-- 
Regards,
Benjamin


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Michael Foord

On 23/11/2010 14:42, Antoine Pitrou wrote:

On Tue, 23 Nov 2010 14:24:18 +
Michael Foord <fuzzy...@voidspace.org.uk> wrote:

Well, for backwards compatibility reasons the new constants would have
to *behave* like the old ones (including having the same underlying
value and comparing equal to it).

In many cases it is *likely* that subclassing int is a better way of
achieving that. Actually looking through the standard library to
evaluate it is the only way of confirming that.

Another API, that reduces the duplication of creating the enum and
setting the names, could be something like:

  make_enums(Names, NAME_ONE NAME_TWO NAME_THREE, base_type=int,
module=__name__)

Using __name__ we can set the module globals in the call to make_enums.

I don't understand why people insist on calling that an enum. enum is
a C legacy and it doesn't bring anything useful as far as I can tell. Instead,
just assign the values explicitly.



enum isn't only in C. (They are in C# as well at least.) Wikipedia links 
enum to enumerated type and says:


an enumerated type (also called enumeration or enum) is a data type 
consisting of a set of named values


It sounds entirely appropriate. I have no problem with explicitly 
assigning values instead of doing it automagically.


All the best,

Michael


Antoine.





--
http://www.voidspace.org.uk/



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Stephen J. Turnbull
If you don't care about the ISO standard, but only about Python,
Martin's right, I was wrong.  You can stop reading now. <wink>

Martin v. Löwis writes:

  I could only find the FCD of 10646:2010, where annex H was integrated
  into section 10:

Thank you for the reference.

I referred to two older versions, 10646-1:1993 (for the annexes and
Amendment, and my basic understanding) and 10646:2003 (for the
detailed definition of UCS-2 in Sections 7, 8 and 13; unfortunately, I
missed the most important detail, which is in Section 9).  In :2003
the Annex I referred to as Annex H is Annex J, and Annex Q is
partly in Section 9.1 and mostly in Annex C.  I don't know where the
former is in the 2010 FCD, and the latter is section 9.2.

  I think they are now acknowledging that UCS-2 was a misleading term,
  making it ambiguous whether this refers to a CCS, a CEF, or a CES;
  like ASCII, people have been using it for all three of them.

In :1993 it wasn't ambiguous, they simply didn't make those
distinctions.  They were not needed for ISO 10646's published
versions, although they certainly are for Unicode.

Now, quite clearly, the ISO has *changed the definition* in every new
version, progressively adding new restrictions that go beyond
clarifying ambiguity.  But even in :2003, in view of 4.2, 6.2, 6.3,
and 13.1, UCS-2 is clearly well-defined as a CM according to UTR#17,
which can probably be identified with CCS in :2003 terminology.  I.e.,
returning to UTR#17 terminology, it is the composition of a CES, a
CEF, and a CCS, which are not defined individually.  Note: The
definition of coded character changed between :2003 and the 2010
FCD, from character with representation to character with integer.

There is a NOTE indicating that 16-bit integers may be used in
processing.  Given that this is a non-normative note, I take it to
mean that in an array of 16-bit integers, most significant octet is
to be interpreted in the natural way for the architecture rather than
by the representation in memory, which might be little-endian.  IMO
it's unnatural to think that that changes the definition of UCS-2 to
be either a CEF, or a composition of a CEF and a CCS.

  Apparently, the ISO WG interprets earlier revisions as saying that
  UCS-2 is a CEF that restricted UTF-16 to the BMP.

I think that ISO 10646-1:1993 admits only one interpretation, a CM
restricted to the BMP (including surrogates), and ISO 10646:2003
admits only one interpretation, a CM restricted to the BMP (not
including surrogates).  The note under Table 4 on p.24 of the FCD is,
uh, well, a lie.  Earlier versions certainly did not restrict to
scalar values; they had no such concept.

  THIS IS NOT WHAT PYTHON DOES.

Well, no shit, Sherlock.  You don't have to yell at me, I know what
Python does.  The question is, what does UCS-2 do?  The answer is
that in :1993, AFAICT it did what Python does.  In :2003, they added
(last sentence, section 9.1):

UCS-2 cannot be used to represent any characters on the
supplementary planes.

I assume they maintain that position in 2010, so End Of Thread.

I apologize for missing that when I was reviewing the standard
earlier, but I expected restrictions on UCS-2 to be explained in 13.1
or perhaps 14.  And 13.1 simply requires that characters in the BMP be
represented by their defined code positions, truncated to two octets.
Like earlier versions, it doesn't prohibit use of surrogates or say
that non-BMP characters can't be represented.

  Not sure what it says in your copy; in mine, section 9.3 says

[snip]

Mine (:2003) says "NOTE 2 - When confined to the code positions in
Planes 00 to 10, UCS-4 is also referred to as UCS Transformation
Format 32 (UTF-32)."  Then it references the Unicode Standard (v4.0)
as the authority for UTF-32.  Obviously they continued to be confused
at this point in time; by the draft you have, apparently the WG had
decided to pretty much completely synchronize the whole standard to a
subset of Unicode.  This seems pointless to me (unlike, say, the work
that has been done on standardizing criteria for repertoire changes).

In particular, the :1993 definition of UCS-2 was a perfectly good
standard for describing the processing Python actually does
internally.  The current definition of UCS-2 as identical to the BMP
is useless, and good riddance, I'm perfectly happy to have them
deprecate it.



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 08:52 -0600, Benjamin Peterson wrote:
 2010/11/23 Antoine Pitrou solip...@pitrou.net:
  On Tue, 23 Nov 2010 14:24:18 +
  Michael Foord fuzzy...@voidspace.org.uk wrote:
  Well, for backwards compatibility reasons the new constants would have
  to *behave* like the old ones (including having the same underlying
  value and comparing equal to it).
 
  In many cases it is *likely* that subclassing int is a better way of
  achieving that. Actually looking through the standard library to
  evaluate it is the only way of confirming that.
 
  Another API, that reduces the duplication of creating the enum and
  setting the names, could be something like:
 
   make_enums('Names', 'NAME_ONE NAME_TWO NAME_THREE', base_type=int,
  module=__name__)
 
  Using __name__ we can set the module globals in the call to make_enums.
 
  I don't understand why people insist on calling that an enum. enum is
  a C legacy and it doesn't bring anything useful as I can tell. Instead,
  just assign the values explicitly.
 
 The concept of an enumeration of values is still useful outside its
 stunted C incarnation.

Well, it is easy to assign range(N) to a tuple of names when desired. I
don't think an automatically-enumerating constant generator is needed.
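
For concreteness, the explicit assignment described here can be sketched
as below.  The three names are the placeholders from the make_enums
example quoted above, not real stdlib constants.

```python
# Plain tuple unpacking: each name is bound to the next integer from range().
NAME_ONE, NAME_TWO, NAME_THREE = range(3)

# The values are ordinary ints, so they print as bare numbers.
print(NAME_ONE, NAME_TWO, NAME_THREE)
```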

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 14:56 +, Michael Foord wrote:
 On 23/11/2010 14:42, Antoine Pitrou wrote:
  On Tue, 23 Nov 2010 14:24:18 +
  Michael Foordfuzzy...@voidspace.org.uk  wrote:
  Well, for backwards compatibility reasons the new constants would have
  to *behave* like the old ones (including having the same underlying
  value and comparing equal to it).
 
  In many cases it is *likely* that subclassing int is a better way of
  achieving that. Actually looking through the standard library to
  evaluate it is the only way of confirming that.
 
  Another API, that reduces the duplication of creating the enum and
  setting the names, could be something like:
 
make_enums('Names', 'NAME_ONE NAME_TWO NAME_THREE', base_type=int,
  module=__name__)
 
  Using __name__ we can set the module globals in the call to make_enums.
  I don't understand why people insist on calling that an enum. enum is
  a C legacy and it doesn't bring anything useful as I can tell. Instead,
  just assign the values explicitly.
 
 
 enum isn't only in C. (They are in C# as well at least.)

Well, it's been inherited by C-like languages, no doubt. Like braces and
semicolons :)

Regards

Antoine.




Re: [Python-Dev] r86699 - python/branches/py3k/Lib/zipfile.py

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 08:49 -0600, Benjamin Peterson wrote:
 2010/11/23 Antoine Pitrou solip...@pitrou.net:
  On Mon, 22 Nov 2010 22:00:08 -0600
  Benjamin Peterson benja...@python.org wrote:
  2010/11/22 Łukasz Langa luk...@langa.pl:
   Message written by Benjamin Peterson on 2010-11-23, at 00:47:
  
   No test?
  
  
   The tests were there already, raising ResourceWarnings. After this 
   change,
   they stopped doing that. You may say: now they pass for the first time :)
 
  It looks like you added new API, though. For that, we would expect new 
  tests.
 
  It's an internal API, although ZipExtFile doesn't begin with an
  underscore.
 
 Why is it internal API then?

Because it's for use by ZipFile.open(). The ZipExtFile constructor is
not supposed to be called by the user.
You might instead have asked why ZipExtFile isn't called _ZipExtFile,
and I have no idea.
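
A minimal sketch of the intended usage, assuming nothing beyond the
public zipfile API: the caller goes through ZipFile.open() and never
builds a ZipExtFile itself.  The archive contents and the member name
"hello.txt" are made up for illustration.

```python
import io
import zipfile

# Build a small archive in memory so the sketch needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", b"hello world")

# Read it back: ZipFile.open() is the public entry point, and it hands
# back a ZipExtFile without the caller constructing one directly.
with zipfile.ZipFile(buffer) as archive:
    with archive.open("hello.txt") as member:
        member_type = type(member)
        payload = member.read()
```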

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Michael Foord

On 23/11/2010 15:01, Antoine Pitrou wrote:

On Tuesday 23 November 2010 at 08:52 -0600, Benjamin Peterson wrote:

2010/11/23 Antoine Pitrousolip...@pitrou.net:

On Tue, 23 Nov 2010 14:24:18 +
Michael Foordfuzzy...@voidspace.org.uk  wrote:

Well, for backwards compatibility reasons the new constants would have
to *behave* like the old ones (including having the same underlying
value and comparing equal to it).

In many cases it is *likely* that subclassing int is a better way of
achieving that. Actually looking through the standard library to
evaluate it is the only way of confirming that.

Another API, that reduces the duplication of creating the enum and
setting the names, could be something like:

  make_enums('Names', 'NAME_ONE NAME_TWO NAME_THREE', base_type=int,
module=__name__)

Using __name__ we can set the module globals in the call to make_enums.

I don't understand why people insist on calling that an enum. enum is
a C legacy and it doesn't bring anything useful as I can tell. Instead,
just assign the values explicitly.

The concept of an enumeration of values is still useful outside its
stunted C incarnation.

Well, it is easy to assign range(N) to a tuple of names when desired. I
don't think an automatically-enumerating constant generator is needed.

Right, and that is current practice. It has the disadvantage (that you 
seemed to acknowledge) that when debugging the integer values are seen 
instead of something with a useful repr.


Having a *simple* class (and API to create them) that produces named 
constants with a useful repr, is what we are discussing, and that seems 
awfully like an enum (in the general sense not in a C specific sense). 
For backwards compatibility these constants, where they replace integer 
constants, would need to be integer subclasses with the same behaviour. 
Like the Qt example you appreciated so much. ;-)


There are still two reasonable APIs (unless you have changed your mind 
and think that sticking with plain integers is best), of which I prefer 
the latter:


SOME_CONST = Constant('SOME_CONST', 1)
OTHER_CONST = Constant('OTHER_CONST', 2)

or:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)
SOME_CONST = Constants.SOME_CONST
OTHER_CONST = Constants.OTHER_CONST

(Well, there is a third option that takes __name__ and sets the 
constants in the module automagically. I can understand why people would 
dislike that though.)
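
To make the two spellings concrete, here is a rough sketch of what they
might look like.  Constant and make_constants are only the names used
in this message; no such API exists in the stdlib, and the repr format
is an assumption chosen for illustration.

```python
class Constant(int):
    """An int subclass that remembers its own name (hypothetical sketch)."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name
        return self

    def __repr__(self):
        # Show the symbolic name as well as the underlying integer.
        return "<Constant %s=%d>" % (self._name, int(self))


def make_constants(class_name, names, start=0):
    """Build a namespace class holding one Constant per name (hypothetical)."""
    members = {
        name: Constant(name, value)
        for value, name in enumerate(names.split(), start)
    }
    return type(class_name, (), members)


Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)
SOME_CONST = Constants.SOME_CONST
OTHER_CONST = Constants.OTHER_CONST
```

Because Constant subclasses int, existing code that compares against
the raw value (SOME_CONST == 1) keeps working, while repr() shows the
name when debugging.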


All the best,

Michael Foord

Michael


Regards

Antoine.





--
http://www.voidspace.org.uk/



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 15:15 +, Michael Foord wrote:
 There are still two reasonable APIs (unless you have changed your mind 
 and think that sticking with plain integers is best), of which I prefer 
 the latter:
 
 SOME_CONST = Constant('SOME_CONST', 1)
 OTHER_CONST = Constant('OTHER_CONST', 2)
 
 or:
 
 Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)

Or:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',   
   values=range(1, 3))

Again, auto-enumeration is useless since it's trivial to achieve
explicitly.
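
The equivalence being claimed can be sketched as follows.  The helper
assign_values is hypothetical and only stands in for whatever
make_constants would do internally; the point is that start=1 and
values=range(1, 3) yield the same name-to-value mapping.

```python
def assign_values(names, values=None, start=0):
    """Pair each name with a value (hypothetical helper, not a proposed API)."""
    names = names.split()
    if values is None:
        # Auto-enumeration: count upwards from `start`.
        values = range(start, start + len(names))
    values = list(values)
    if len(values) != len(names):
        raise ValueError("need exactly one value per name")
    return dict(zip(names, values))


auto = assign_values('SOME_CONST OTHER_CONST', start=1)
explicit = assign_values('SOME_CONST OTHER_CONST', values=range(1, 3))
```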

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Michael Foord

On 23/11/2010 15:30, Antoine Pitrou wrote:

On Tuesday 23 November 2010 at 15:15 +, Michael Foord wrote:

There are still two reasonable APIs (unless you have changed your mind
and think that sticking with plain integers is best), of which I prefer
the latter:

SOME_CONST = Constant('SOME_CONST', 1)
OTHER_CONST = Constant('OTHER_CONST', 2)

or:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)

Or:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',
values=range(1, 3))

Again, auto-enumeration is useless since it's trivial to achieve
explicitly.


Ah, I see. It is the auto-enumeration you disliked. Sure - not a problem.

I think the step that Nick described, of evaluating places in the 
standard library that this could be used, is a good one. I'll try to get 
around to it and perhaps attempt to resuscitate the PEP. (Any 
suggestions as to an appropriate module if having it live in its own 
module is still an objection?)


Michael


Regards

Antoine.





--

http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies ("BOGUS AGREEMENTS") that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 15:40 +, Michael Foord wrote:
 On 23/11/2010 15:30, Antoine Pitrou wrote:
  On Tuesday 23 November 2010 at 15:15 +, Michael Foord wrote:
  There are still two reasonable APIs (unless you have changed your mind
  and think that sticking with plain integers is best), of which I prefer
  the latter:
 
  SOME_CONST = Constant('SOME_CONST', 1)
  OTHER_CONST = Constant('OTHER_CONST', 2)
 
  or:
 
  Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)
  Or:
 
  Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',
  values=range(1, 3))
 
  Again, auto-enumeration is useless since it's trivial to achieve
  explicitly.
 
 Ah, I see. It is the auto-enumeration you disliked. Sure - not a problem.
 
 I think the step that Nick described, of evaluating places in the 
 standard library that this could be used, is a good one. I'll try to get 
 around to it and perhaps attempt to resuscitate the PEP. (Any 
 suggestions as to an appropriate module if having it live in its own 
 module is still an objection?)

We already have a bunch of bizarrely unrelated stuff in collections
(such as Callable), so we could put enum there too.

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Michael Foord

On 23/11/2010 16:05, Antoine Pitrou wrote:

On Tuesday 23 November 2010 at 15:40 +, Michael Foord wrote:

On 23/11/2010 15:30, Antoine Pitrou wrote:

On Tuesday 23 November 2010 at 15:15 +, Michael Foord wrote:

There are still two reasonable APIs (unless you have changed your mind
and think that sticking with plain integers is best), of which I prefer
the latter:

SOME_CONST = Constant('SOME_CONST', 1)
OTHER_CONST = Constant('OTHER_CONST', 2)

or:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)

Or:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',
 values=range(1, 3))

Again, auto-enumeration is useless since it's trivial to achieve
explicitly.

Ah, I see. It is the auto-enumeration you disliked. Sure - not a problem.

I think the step that Nick described, of evaluating places in the
standard library that this could be used, is a good one. I'll try to get
around to it and perhaps attempt to resuscitate the PEP. (Any
suggestions as to an appropriate module if having it live in its own
module is still an objection?)

We already have a bunch of bizarrely unrelated stuff in collections
(such as Callable), so we could put enum there too.



I guess it creates collections of constants...

Michael


Regards

Antoine.





--

http://www.voidspace.org.uk/



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Ben . Cottrell
On Tue, 23 Nov 2010 15:15:29 +, Michael Foord wrote:
 There are still two reasonable APIs (unless you have changed your mind 
 and think that sticking with plain integers is best), of which I prefer 
 the latter:
 
 SOME_CONST = Constant('SOME_CONST', 1)
 OTHER_CONST = Constant('OTHER_CONST', 2)
 
 or:
 
 Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)
 SOME_CONST = Constants.SOME_CONST
 OTHER_CONST = Constants.OTHER_CONST

I prefer the latter too, because that makes it possible to have
'Constants' be a rendezvous point for making sure that you're
passing something valid. Perhaps using 'in':

def func(foo):
if foo not in Constants:
raise ValueError('foo must be SOME_CONST or OTHER_CONST')
...

I know this is probably not going to happen, but I would *so much*
like it if functions would start rejecting the wrong kind of 2.
Constants that are valid, integer-wise, but which aren't part of
the set of constants allowed for that argument. I'd prefer not to
think of the number of times I've made the following mistake:

s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET)

~Ben


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Stephen J. Turnbull
Martin v. Löwis writes:

  I disagree: Quoting from Unicode 5.0, section 5.4:
  
  # The individual components of implementations may have different
  # levels of support for surrogates, as long as those components are
  # assembled and communicate correctly.

Assembly is the problem.  If chr() or a slice creates a lone
surrogate and surrogateescape passes it back out, Python as a whole is
non-conforming.

Technically, you can hide behind "none of slicing, chr(), or
surrogateescape promises to conform", and maybe that would fly to a
standards lawyer; I'd have to see the precise statement.

Here's a more convincing example.  A user specifies utf8 as her
locale charset.  Then she specifies a string containing a non-BMP
character as the description of a file, and internal code munges
this via slicing into a file name conforming to some specification
(eg, length limit + uniquifier if needed).  Then if the non-BMP
character is in the right place, she will get either a broken file
name, which will either get written to disk or raise an exception,
depending on whether the munging program has enabled surrogateescape
or not.
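
The scenario can be reproduced roughly as below.  Current Pythons no
longer have narrow builds, so the sketch emulates narrow-build slicing
by cutting a list of UTF-16 code units; the description string, the
8-unit limit, and the helper names are all made up for illustration.

```python
# A description containing U+1D11E, which lies outside the BMP.
description = "report-\U0001D11E-final"


def utf16_units(text):
    """Return the UTF-16 code units of `text` as a list of ints."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def narrow_slice(text, limit):
    """Keep the first `limit` code units, as slicing did on a narrow build."""
    return "".join(chr(unit) for unit in utf16_units(text)[:limit])


# "report-" is 7 units; the 8th unit is the high half of a surrogate pair.
truncated = narrow_slice(description, 8)

# The lone surrogate cannot be encoded as UTF-8 under strict error handling.
try:
    truncated.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    encodable = False
```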

I claim both of those results are non-conforming to the specification
of UTF-16, and therefore Python Unicode processing as a whole must be
considered non-conforming.

It's still pretty damn good.  But I've elaborated that point
elsewhere.

  The rationale for supporting these characters in chr() goes back much
  further than the surrogateescape handler - as Python unicode strings
  are sequences of code points, it would be impractical if you couldn't
  create some of them, or even would have to consult the UCD before
  determining whether they can be created.

The Zen is irrelevant to determining conformance to Unicode, which has
its own Zen.


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Stephen J. Turnbull
Nick Coghlan writes:

  For practical purposes, UCS2/UCS4 convey far more inherent information
  than narrow/wide:

That was my stance, but in fact (1) the ISO JTC1/SC2 has deliberately
made them ambiguous by changing their definitions over the years[1],
and (2) the more recent definitions and interpretations of UCS-2
*prohibit* use of surrogates in UCS-2 as far as I can tell.  And
that's what you'll see everywhere you look, because Wikipedia and
friends pick up the most recent versions of everything.

  So don't just think about "what will developers know?", also think
  about "what will developers know, and what will a quick trip to a
  search engine tell them?".

It will tell them that UCS-2 cannot even *express* non-BMP characters.
Terry and David are *not* dummies, and that's what they got from more
or less careful study of the issue.
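
As a reminder of the arithmetic behind this: a code point above U+FFFF
does not fit in one 16-bit unit, and UTF-16 carries it as a surrogate
pair.  The sketch below computes the pair by hand and checks it against
Python's own UTF-16 encoder; to_surrogate_pair is a name invented here.

```python
import struct


def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000          # 20 significant bits
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


code_point = 0x1F600
pair = to_surrogate_pair(code_point)

# Cross-check against the two 16-bit units produced by the codec.
encoded = struct.unpack("<2H", chr(code_point).encode("utf-16-le"))
```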

  And once you take that stance, the overly
  generic narrow/wide terms fail, badly.

I still agree that something more accurate would be nice, but face it:
the ISO will redefine and deprecate such terms as soon as they notice
us using them. <wink>

  +1 for MAL's suggested tweaks to the Py3k configure options.

Despite my natural sympathy for your arguments, and MAL's, I'm still
-1.  I really wish I could switch back, but it seems to me that
UCS-2 is a liability we don't need, *especially* on Windows where
the default build is presumably going to be UCS2 forever.

Footnotes: 
[1]  You'd think it would be hard to change the definition of UCS-4,
but they managed. :-(



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Michael Foord

On 23/11/2010 15:37, ben.cottr...@nominum.com wrote:

On Tue, 23 Nov 2010 15:15:29 +, Michael Foord wrote:

There are still two reasonable APIs (unless you have changed your mind
and think that sticking with plain integers is best), of which I prefer
the latter:

SOME_CONST = Constant('SOME_CONST', 1)
OTHER_CONST = Constant('OTHER_CONST', 2)

or:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST', start=1)
SOME_CONST = Constants.SOME_CONST
OTHER_CONST = Constants.OTHER_CONST

I prefer the latter too, because that makes it possible to have
'Constants' be a rendezvous point for making sure that you're
passing something valid. Perhaps using 'in':

def func(foo):
 if foo not in Constants:
 raise ValueError('foo must be SOME_CONST or OTHER_CONST')
 ...

I know this is probably not going to happen, but I would *so much*
like it if functions would start rejecting the wrong kind of 2.
Constants that are valid, integer-wise, but which aren't part of
the set of constants allowed for that argument. I'd prefer not to
think of the number of times I've made the following mistake:

s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET)


Well it would be perfectly possible for the __contains__ method (on the 
metaclass so that a Constants class can act as a container) to permit a 
*raw integer* (to be backwards compatible with code using hard coded 
values) but not permit other constants that aren't valid. Code that is 
*deliberately* using the wrong constants would be screwed of course...
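
A rough sketch of that idea, under stated assumptions: Constant,
ConstantsMeta, and the two example classes are invented here, and the
integer values are chosen only so that both groups contain a 2.

```python
class Constant(int):
    """Named int constant (hypothetical)."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self.name = name
        return self

    def __repr__(self):
        return self.name


class ConstantsMeta(type):
    """Metaclass so that `x in SomeConstantsClass` works on the class itself."""

    def __contains__(cls, item):
        members = [v for v in vars(cls).values() if isinstance(v, Constant)]
        if isinstance(item, Constant):
            # A named constant must be one of this class's own members.
            return any(item is member for member in members)
        # A raw integer is accepted when it matches a member's value,
        # which keeps code with hard-coded numbers working.
        return isinstance(item, int) and any(item == m for m in members)


class AddressFamily(metaclass=ConstantsMeta):
    AF_UNIX = Constant('AF_UNIX', 1)
    AF_INET = Constant('AF_INET', 2)


class SocketKind(metaclass=ConstantsMeta):
    SOCK_STREAM = Constant('SOCK_STREAM', 1)
    SOCK_DGRAM = Constant('SOCK_DGRAM', 2)


def check_family(family):
    """Reject "the wrong kind of 2" while still accepting a bare int."""
    if family not in AddressFamily:
        raise ValueError('family must be an AddressFamily constant')
    return family
```

With this, check_family(SocketKind.SOCK_DGRAM) raises ValueError even
though the value is 2, while check_family(2) still passes.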


All the best,

Michael

~Ben



--

http://www.voidspace.org.uk/



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Barry Warsaw
On Nov 23, 2010, at 01:50 PM, Michael Foord wrote:

Right. As it happens I just submitted a patch to Barry Warsaw's enum package
(nice), flufl.enum [1], to allow namedtuple style creation of named
constants:

Thanks for the plug (and the nice patch).

FWIW, the documentation for the package is here:

http://packages.python.org/flufl.enum/

I made some explicit decisions about the API and semantics of this package, to
fit my own use cases and sensibilities.  I guess you wouldn't expect anything
else <wink>, but I'm willing to acknowledge that others would make different
decisions, and certainly the number of existing enum implementations out there
proves that there are lots of interesting ways to go about it.

That said, there are several things I like about my package:

* Enums are not subclassed from ints or strs.  They are a distinct data type
  that can be converted to and from ints and strs.  EIBTI.

* The typical way to create them is through a simple, but explicit class
  definition.  I personally like being explicit about the item values, and the
  assignments are required to make the metaclass work properly, but Michael's
  convenience patch is totally appropriate for cases where you don't care, or
  you want a one-liner.

* Enum items are singletons and are intended to be compared by identity.  They
  can be compared by equality but are not ordered.

* Enum items have an unambiguous symbolic repr and a nice human readable str.

* Given an enum item, you can get to its enum class, and given the class you
  can get to the set of items.

* Enums can be subclassed (though all items in the subclass must have unique
  values).
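
Several of these properties can be illustrated with a short sketch.
Note the substitution: this uses the enum module that ships with
current Python, not flufl.enum, whose exact API is not shown in this
message.  The Colors class and its members are made up, and the last
bullet (subclassing) is deliberately not covered.

```python
from enum import Enum


class Colors(Enum):
    # Explicit class definition with explicit item values.
    red = 1
    green = 2
    blue = 3


# Items are a distinct type, not ints, and do not compare equal to ints.
is_int = isinstance(Colors.red, int)
equal_to_raw = (Colors.red == 1)

# Conversion from a value or a name gives back the same singleton item.
from_value = Colors(1)
from_name = Colors["red"]

# Items support identity and equality, but not ordering.
try:
    Colors.red < Colors.green
    ordered = True
except TypeError:
    ordered = False

# From an item you can reach its class, and from the class its items.
owner = type(Colors.red)
members = list(Colors)
```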

In any case it may be that enums are too tied to specific use cases to find a
good common ground for the stdlib.  I've been using my module for years and if
there's interest I would of course be happy to donate it for use in the
stdlib.  Like the original sets implementation, it makes perfect sense to
provide them in a separate module rather than as a built-in type.

-Barry




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Barry Warsaw
On Nov 23, 2010, at 03:15 PM, Michael Foord wrote:

(Well, there is a third option that takes __name__ and sets the constants in
the module automagically. I can understand why people would dislike that
though.)

Personally, I think if you want that, then the explicit class definition is a
better way to go.

-Barry





Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread P.J. Eby

At 11:31 AM 11/23/2010 -0500, Barry Warsaw wrote:

On Nov 23, 2010, at 03:15 PM, Michael Foord wrote:

(Well, there is a third option that takes __name__ and sets the constants in
the module automagically. I can understand why people would dislike that
though.)

Personally, I think if you want that, then the explicit class definition is a
better way to go.


This reminds me: a stdlib enum should support proper pickling and 
copying; i.e.:


   assert SomeEnum.anEnum is pickle.loads(pickle.dumps(SomeEnum.anEnum))

This could probably be implemented by adding something like:

   def __reduce__(self):
   return getattr, (self._class, self._enumname)

in the EnumValue class.
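
Fleshed out into something runnable, the suggestion might look like the
sketch below.  EnumValue, _class, and _enumname are the names used
above; the constructor and the SomeEnum container are assumptions added
so the example is self-contained.  It relies on SomeEnum being defined
at module top level so that pickle can find it by name.

```python
import copy
import pickle


class EnumValue:
    """One item of an enumeration (hypothetical minimal version)."""

    def __init__(self, cls, name, value):
        self._class = cls
        self._enumname = name
        self._value = value

    def __reduce__(self):
        # Rebuild by looking the item up on its class, so unpickling and
        # copying return the existing singleton instead of a new object.
        return getattr, (self._class, self._enumname)


class SomeEnum:
    pass


SomeEnum.anEnum = EnumValue(SomeEnum, "anEnum", 1)

restored = pickle.loads(pickle.dumps(SomeEnum.anEnum))
copied = copy.copy(SomeEnum.anEnum)
deep_copied = copy.deepcopy(SomeEnum.anEnum)
```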



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Michael Foord

On 23/11/2010 16:27, Barry Warsaw wrote:

On Nov 23, 2010, at 01:50 PM, Michael Foord wrote:


Right. As it happens I just submitted a patch to Barry Warsaw's enum package
(nice), flufl.enum [1], to allow namedtuple style creation of named
constants:

Thanks for the plug (and the nice patch).

FWIW, the documentation for the package is here:

http://packages.python.org/flufl.enum/

I made some explicit decisions about the API and semantics of this package, to
fit my own use cases and sensibilities.  I guess you wouldn't expect anything
else <wink>, but I'm willing to acknowledge that others would make different
decisions, and certainly the number of existing enum implementations out there
proves that there are lots of interesting ways to go about it.

That said, there are several things I like about my package:

* Enums are not subclassed from ints or strs.  They are a distinct data type
   that can be converted to and from ints and strs.  EIBTI.


But if we are to use it *in* the standard library (as opposed to merely 
adding a module *to* the standard library) there are backwards 
compatibility concerns. Where modules are already using integers for 
constants then integers still need to work.


One easy way to achieve this is to subclass integer. If we don't do that 
(assuming we decide that putting a solution in the standard library is 
appropriate) then we'll have to evaluate what we mean by backwards 
compatible. If the modules that use the constants aren't to change then 
comparing equal to the underlying value is the minimum (so that the 
original value can still be used in place of the new named constant). 
Not sure if you'd be happy to make that change in flufl.enum.




* The typical way to create them is through a simple, but explicit class
   definition.  I personally like being explicit about the item values, and the
   assignments are required to make the metaclass work properly, but Michael's
   convenience patch is totally appropriate for cases where you don't care, or
   you want a one-liner.


If make_enum was to take a set of values to use (as Antoine suggested) I 
don't see what's un-explicit about it.


All the best,

Michael


* Enum items are singletons and are intended to be compared by identity.  They
   can be compared by equality but are not ordered.

* Enum items have an unambiguous symbolic repr and a nice human readable str.

* Given an enum item, you can get to its enum class, and given the class you
   can get to the set of items.

* Enums can be subclassed (though all items in the subclass must have unique
   values).

In any case it may be that enums are too tied to specific use cases to find a
good common ground for the stdlib.  I've been using my module for years and if
there's interest I would of course be happy to donate it for use in the
stdlib.  Like the original sets implementation, it makes perfect sense to
provide them in a separate module rather than as a built-in type.

-Barry





--

http://www.voidspace.org.uk/



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 12:32 -0500, Isaac Morland wrote:
 On Tue, 23 Nov 2010, Antoine Pitrou wrote:
 
  We already have a bunch of bizarrely unrelated stuff in collections
  (such as Callable), so we could put enum there too.
 
 Why not just enum (i.e., from enum import [...] or import 
 enum.[...])?  Enumerations are one of the basic kinds of types overall 
 (speaking informally and independent of any specific language) - they 
 aren't at all exotic.

Enumerations aren't a type at all (they have no distinguishing
property).

 And Flat is better than nested, after all.

Not when it means creating a separate module for every micro-feature.

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Isaac Morland

On Tue, 23 Nov 2010, Antoine Pitrou wrote:

 We already have a bunch of bizarrely unrelated stuff in collections
 (such as Callable), so we could put enum there too.


Why not just enum (i.e., from enum import [...] or import 
enum.[...])?  Enumerations are one of the basic kinds of types overall 
(speaking informally and independent of any specific language) - they 
aren't at all exotic.  And Flat is better than nested, after all.


Isaac Morland                   CSCF Web Guru
DC 2554C, x36650                WWW Software Specialist


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Isaac Morland

On Tue, 23 Nov 2010, Antoine Pitrou wrote:

 On Tuesday 23 November 2010 at 12:32 -0500, Isaac Morland wrote:
  On Tue, 23 Nov 2010, Antoine Pitrou wrote:

   We already have a bunch of bizarrely unrelated stuff in collections
   (such as Callable), so we could put enum there too.

  Why not just enum (i.e., from enum import [...] or import
  enum.[...])?  Enumerations are one of the basic kinds of types overall
  (speaking informally and independent of any specific language) - they
  aren't at all exotic.

 Enumerations aren't a type at all (they have no distinguishing
 property).

Each enumeration is a type (well, OK, not in every language, presumably,
but certainly in many languages).  The word "basic" is more important than
"types" in my sentence - the point is that an enumeration capability is a
very common one in a type system, and is very general, not specific to any
particular application.

  And Flat is better than nested, after all.

 Not when it means creating a separate module for every micro-feature.


Classes have their own keyword.  I don't think it's disproportionate to 
give enums a top-level module name.


Having said that, I understand we're trying to have a not-too-flat module 
namespace and I can see the sense in putting it in collections.  But I 
think the idea that enumerations are of very wide applicability and hence 
deserve a shorter name should be seriously considered.


I'll leave it at that, except for:

Hey, how about this syntax:

enum Colors:
    red = 0
    green = 10
    blue

(blue gets the value 11)

;-)
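
The numbering rule implied by that tongue-in-cheek syntax (an item without a value gets the previous value plus one) can be sketched in ordinary Python. The function name and the use of None for "no value given" are assumptions made for the illustration:

```python
def number_items(items):
    """Assign values to (name, value) pairs; value None means previous + 1."""
    result = {}
    previous = -1  # so that a leading None starts at 0, as in C
    for name, value in items:
        if value is None:
            value = previous + 1
        result[name] = value
        previous = value
    return result


colors = number_items([("red", 0), ("green", 10), ("blue", None)])
```

Here `colors["blue"]` comes out as 11, matching the remark above.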

Isaac Morland                   CSCF Web Guru
DC 2554C, x36650                WWW Software Specialist


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Fred Drake
On Tue, Nov 23, 2010 at 12:37 PM, Antoine Pitrou solip...@pitrou.net wrote:
 Enumerations aren't a type at all (they have no distinguishing
 property).

In any given language, this may be true, or not.  Whether they should
be distinct in Python is core to the current discussion.

From a backward-compatibility perspective, what makes sense depends on
whether they're used to implement existing constants (socket.AF_INET,
etc.) or if they are reserved for new features only.


  -Fred

--
Fred L. Drake, Jr.    fdrake at acm.org
A storm broke loose in my mind.  --Albert Einstein


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 12:57 -0500, Fred Drake wrote:
 On Tue, Nov 23, 2010 at 12:37 PM, Antoine Pitrou solip...@pitrou.net wrote:
  Enumerations aren't a type at all (they have no distinguishing
  property).
 
 In any given language, this may be true, or not.  Whether they should
 be distinct in Python is core to the current discussion.

I meant type in the structural sense (hence the parenthesis). enums
are just auto-generated constants. Since Python makes it trivial to
generate sequential integers, there's no need for a specific enum
construct.
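
The point about sequential integers can be sketched as follows; the names are illustrative only:

```python
# Sequential constants via tuple unpacking of range().
RED, GREEN, BLUE = range(3)

# The names are plain ints: nothing records which group a value
# belongs to, and printing one shows only the number.
printed = repr(GREEN)
```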

Now you may argue that enums should be strongly-typed, but that would be
a bit backwards given Python's preference for duck-typing.

 From a backward-compatibility perspective, what makes sense depends on
 whether they're used to implement existing constants (socket.AF_INET,
 etc.) or if they are reserved for new features only.

It's not only backwards compatibility. New features relying on C APIs
have to be able to map constants to the integers used in the C library.
It would be much better if this were done naturally rather than through
explicit conversion maps.
(this really means subclassing int, if we don't want to complicate
C-level code)

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 12:50 -0500, Isaac Morland wrote:
 Each enumeration is a type (well, OK, not in every language, presumably, 
 but certainly in many languages).  The word basic is more important than 
 types in my sentence - the point is that an enumeration capability is a 
 very common one in a type system, and is very general, not specific to any 
 particular application.

Python already has an enumeration capability. It's called range().
There's nothing else that C enums have. AFAICT, neither do enums in
other mainstream languages (assuming they even exist; I don't remember
Perl, PHP or Javascript having anything like that, but perhaps I'm
mistaken).

Regards

Antoine.




Re: [Python-Dev] is this a bug? no environment variables

2010-11-23 Thread Glenn Linderman

On 11/23/2010 3:55 AM, Martin v. Löwis wrote:

On 23.11.2010 11:55, Amaury Forgeot d'Arc wrote:

Hi,

2010/11/23 Glenn Linderman v+pyt...@g.nevcal.com:

  File "C:\Python32\lib\random.py", line 108, in seed
    a = int.from_bytes(_urandom(32), 'big')
WindowsError: [Error -2146893818] Invalid Signature

In the subprocess documentation http://docs.python.org/library/subprocess.html
On Windows, in order to run a side-by-side assembly the specified
env *must* include a valid SystemRoot.

Indeed, setting SystemRoot might solve this problem. According to

http://jpassing.com/2009/12/28/the-hidden-danger-of-forgetting-to-specify-systemroot-in-a-custom-environment-block/

CryptoAPI, in Windows 7, requires this variable to be set. Failure to
find the enhanced crypto provider would explain why the random
module of Python fails to work.

The specific cause is in the registry:
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Cryptography\Defaults\Provider\Microsoft
Strong Cryptographic Provider has as its ImagePath value

%SystemRoot%\system32\rsaenh.dll

So the registry (and COM) do rely on environment variables.

Regards,
Martin


I find it sad but hilarious that after working so hard to remove the 
need for environment variables from Windows that M$ has introduced new 
dependencies on them.


I wonder if this particular registry variable is simply an oversight/bug 
on M$' part, that they will eventually fix, or if it is a turnaround toward 
the use of more environment variables in the future.  Hmm.  Time will 
tell, I suppose.  I'm unaware of any benefits in _changing_ SystemRoot 
to other values, so not pre-expanding it in that registry location seems 
only to add an unnecessary dependency on the environment.


Indeed, preserving that one environment variable allows my version of 
http.server to proceed with, as far as initial testing can determine, 
proper behavior.  Thanks for your help in figuring this out.  That was a 
lot faster than a binary search to choose which variable(s) to preserve.


My purpose in such testing was two-fold: firstly, web servers, for 
security purposes, generally limit the number of environment variables 
that are seen by CGI programs, and secondly, in debugging whether or not 
http.server was properly setting the necessary environment variables, 
the many other environment variables were cluttering up log dumps of all 
environment variables.  It will be nicer to limit the passed through 
environment variables to SystemRoot, and see how things go.
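
A minimal sketch of that approach, assuming the child is another Python interpreter; the helper names `minimal_env` and `run_child` are hypothetical, not part of http.server or subprocess:

```python
import os
import subprocess
import sys


def minimal_env(extra=None, keep=("SystemRoot",)):
    """Build a small environment dict, preserving only the names in `keep`."""
    env = {name: os.environ[name] for name in keep if name in os.environ}
    env.update(extra or {})
    return env


def run_child(code, extra=None):
    """Run `code` in a child interpreter that sees only the minimal environment."""
    return subprocess.run(
        [sys.executable, "-c", code],
        env=minimal_env(extra),
        capture_output=True,
        text=True,
    )
```

On platforms where SystemRoot is not set, `minimal_env()` simply returns the extra variables, so the same code can be used unchanged there.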


I have read some about side-by-side assemblies but had considered them a 
good reason to stick with the outdated M$VC 6.0 compiler, which doesn't 
seem to need to create them, and their myriad requirements, which seem 
far from necessary for simply compiling a program.  I was disappointed 
to realize that Python was heading down the path of using the newer 
tools that create side-by-side assemblies, but I suppose an old and 
crufty compiler like M$VC 6.0 cannot support some of the newer 
features of Windows that seem necessary to some, such as 64-bit 
support, which does seem necessary, even to me.


I was well aware that shortcuts and the registry _may_ refer to 
environment variables, and have a number of environment variables of my 
own which leverage that capability, to avoid hard-coded drive letters 
and paths in certain areas, and for the convenience of shortening the 
specification of some of the long-winded path names that Windows foists 
upon us (some of those have been significantly shortened in Windows 6.1, 
and maybe 6.0 which I used only for 2 months with disgust; 6.1 has 
helped alleviate the disgust, but I still recommend XP for people that 
don't need 64-bit capabilities).





Re: [Python-Dev] is this a bug? no environment variables

2010-11-23 Thread Glenn Linderman

On 11/22/2010 2:56 PM, Tim Lesher wrote:

On Mon, Nov 22, 2010 at 16:54, Glenn Linderman v+pyt...@g.nevcal.com wrote:

I suppose it is possible that some environment variables are used by Python
directly (but I can't seem to find a documented list of them) although I
would expect that usage to be optional, with fall-back defaults when they
don't exist.

I can verify that that's the case: Python (at least through 3.1.2)
runs fine on Windows platforms when environment variables are
completely unavailable.  I know that from running our port for Windows
CE (which has no environment variables at all), cross-compiled for
Windows XP.


Is the Windows CE port generally available?  From where?  The CE ports I 
have found in past searches seem to be quite outdated, without much 
ongoing activity.


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Alexander Belopolsky
On Mon, Nov 22, 2010 at 1:13 PM, Raymond Hettinger
raymond.hettin...@gmail.com wrote:
..
 Any explanation we give users needs to let them know two things:
 * that we cover the entire range of unicode not just BMP
 * that sometimes len(chr(i)) is one and sometimes two

This discussion motivated me to start looking into how well Python
library itself is prepared to deal with len(chr(i)) = 2.  I was not
surprised to find that textwrap does not handle the issue that well:

>>> len(wrap(' \U00010140' * 80, 20))
12
>>> len(wrap(' \u0140' * 80, 20))
8
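
The two counts above can be reproduced on a current interpreter, where every code point has length 1, by imitating the length a narrow build reported. The helper below is hypothetical and only substitutes two placeholder characters for each non-BMP character:

```python
from textwrap import wrap


def as_narrow_build(text):
    """Stand in two placeholder characters for each non-BMP character.

    This only imitates the *length* a narrow (UTF-16) build reported;
    it is an illustration, not how such a build stored strings.
    """
    return "".join(ch if ord(ch) < 0x10000 else "xx" for ch in text)


text = " \U00010140" * 80
wide_lines = len(wrap(text, 20))
narrow_lines = len(wrap(as_narrow_build(text), 20))
```

With one-character words, ten fit on a 20-column line; with two-unit words only six or seven do, which is where the difference between 8 and 12 lines comes from.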

That module should probably be rewritten to properly implement  the
Unicode line breaking algorithm
http://unicode.org/reports/tr14/tr14-22.html.

Yet finding a bug in a str object method after a 5 min review was a
bit discouraging:

>>> 'xyz'.center(20, '\U00010140')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character must be exactly one character long

Given the apparent difficulty of writing even basic text processing
algorithms in presence of surrogate pairs, I wonder how wise it is to
expose Python users to them.  As Wikipedia explains, [1]


Because the most commonly used characters are all in the Basic
Multilingual Plane, converting between surrogate pairs and the
original values is often not tested thoroughly. This leads to
persistent bugs, and potential security holes, even in popular and
well-reviewed application software.


Since UCS-2 (the Character Encoding Form (CEF)) is now defined [2] to
cover only BMP, maybe rather than changing the terms used in the
reference manual, we should tighten the code to conform to the updated
standards?

Again, given that the str object itself has at least one non-BMP
character bug as we are closing on the third major release of py3k,
how likely are 3rd party developers to get their libraries right as
they port to 3.x?

[1] http://en.wikipedia.org/wiki/UTF-16/UCS-2
[2] http://unicode.org/reports/tr17/#CharacterEncodingForm


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Amaury Forgeot d'Arc
2010/11/23 Alexander Belopolsky alexander.belopol...@gmail.com:
 This discussion motivated me to start looking into how well Python
 library itself is prepared to deal with len(chr(i)) = 2.  I was not
 surprised to find that textwrap does not handle the issue that well:

 >>> len(wrap(' \U00010140' * 80, 20))
 12
 >>> len(wrap(' \u0140' * 80, 20))
 8

 That module should probably be rewritten to properly implement  the
 Unicode line breaking algorithm
 http://unicode.org/reports/tr14/tr14-22.html.

 Yet finding a bug in a str object method after a 5 min review was a
 bit discouraging:

 >>> 'xyz'.center(20, '\U00010140')
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 TypeError: The fill character must be exactly one character long

 Given the apparent difficulty of writing even basic text processing
 algorithms in presence of surrogate pairs, I wonder how wise it is to
 expose Python users to them.

This was already discussed two years ago:

http://mail.python.org/pipermail/python-dev/2008-July/080900.html

So yes, wrap() and center() should be fixed.

--
Amaury Forgeot d'Arc
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Bill Janssen
Isaac Morland ijmor...@uwaterloo.ca wrote:

 On Tue, 23 Nov 2010, Antoine Pitrou wrote:
 
  On Tuesday 23 November 2010 at 12:32 -0500, Isaac Morland wrote:
  On Tue, 23 Nov 2010, Antoine Pitrou wrote:
 
  We already have a bunch of bizarrely unrelated stuff in collections
  (such as Callable), so we could put enum there too.
 
  Why not just enum (i.e., from enum import [...] or import
  enum.[...])?  Enumerations are one of the basic kinds of types overall
  (speaking informally and independent of any specific language) - they
  aren't at all exotic.
 
  Enumerations aren't a type at all (they have no distinguishing
  property).

Not in C, but in some other languages.

 Each enumeration is a type (well, OK, not in every language,
 presumably, but certainly in many languages).

The main purpose of that is to be able to catch type mismatches with
static typing, though.  Seems kind of pointless for Python.

 Classes have their own keyword.  I don't think it's disproportionate
 to give enums a top-level module name.

I do.

 Hey, how about this syntax:
 
 enum Colors:
   red = 0
   green = 10
   blue

Why not

  class Color:
 red = (255, 0, 0)
 green = (0, 255, 0)
 blue = (0, 0, 255)

Seems to handle the situation OK.

Bill


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread M.-A. Lemburg
Alexander Belopolsky wrote:
 On Mon, Nov 22, 2010 at 1:13 PM, Raymond Hettinger
 raymond.hettin...@gmail.com wrote:
 ..
 Any explanation we give users needs to let them know two things:
 * that we cover the entire range of unicode not just BMP
 * that sometimes len(chr(i)) is one and sometimes two
 
 This discussion motivated me to start looking into how well Python
 library itself is prepared to deal with len(chr(i)) = 2.  I was not
 surprised to find that textwrap does not handle the issue that well:
 
 >>> len(wrap(' \U00010140' * 80, 20))
 12
 >>> len(wrap(' \u0140' * 80, 20))
 8
 
 That module should probably be rewritten to properly implement  the
 Unicode line breaking algorithm
 http://unicode.org/reports/tr14/tr14-22.html.
 
 Yet finding a bug in a str object method after a 5 min review was a
 bit discouraging:
 
 >>> 'xyz'.center(20, '\U00010140')
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 TypeError: The fill character must be exactly one character long
 
 Given the apparent difficulty of writing even basic text processing
 algorithms in presence of surrogate pairs, I wonder how wise it is to
 expose Python users to them. 

What's the alternative ?

Without surrogates, Python users with UCS-2 builds (e.g. the Windows
Python users) would not be allowed to play with non-BMP code points.

IMHO, it's better to fix the stdlib. This is a long process, as you
can see with the Python3 stdlib evolution, but Python will eventually
get there.

 As Wikipedia explains, [1]
 
 
 Because the most commonly used characters are all in the Basic
 Multilingual Plane, converting between surrogate pairs and the
 original values is often not tested thoroughly. This leads to
 persistent bugs, and potential security holes, even in popular and
 well-reviewed application software.
 
 
 Since UCS-2 (the Character Encoding Form (CEF)) is now defined [2] to
 cover only BMP, maybe rather than changing the terms used in the
 reference manual, we should tighten the code to conform to the updated
 standards?

Can we please stop turning this around over and over again :-)
UCS-2 has never supported anything other than the BMP. However,
you can interpret sequences of UCS-2 code units as UTF-16 and
then get access to the full Unicode character set. We've been
doing this in codecs ever since UCS-4 builds were introduced
some 8-9 years ago.
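
For reference, the UTF-16 interpretation of two 16-bit code units can be sketched as below. The function names are made up for the illustration; the arithmetic is the standard surrogate-pair mapping, cross-checked against the utf-16-be codec:

```python
import struct


def to_surrogates(code_point):
    """Split a non-BMP code point into its UTF-16 (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


pair = to_surrogates(0x10140)
# Cross-check against the codec: two 16-bit code units, big-endian.
codec_pair = struct.unpack(">2H", "\U00010140".encode("utf-16-be"))
```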

The change to have chr(i) return surrogates on UCS-2 builds
was perhaps done too early, but then, without such changes you'd
never notice that your code doesn't work well with surrogates.
It's just one piece of the puzzle when going from 8-bit strings
to Unicode.

 Again, given that the str object itself has at least one non-BMP
 character bug as we are closing on the third major release of py3k,
 how likely are 3rd party developers to get their libraries right as
 they port to 3.x?
 
 [1] http://en.wikipedia.org/wiki/UTF-16/UCS-2
 [2] http://unicode.org/reports/tr17/#CharacterEncodingForm

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Guido van Rossum
On Tue, Nov 23, 2010 at 10:06 AM, Antoine Pitrou solip...@pitrou.net wrote:
 On Tuesday 23 November 2010 at 12:57 -0500, Fred Drake wrote:
 On Tue, Nov 23, 2010 at 12:37 PM, Antoine Pitrou solip...@pitrou.net wrote:
  Enumerations aren't a type at all (they have no distinguishing
  property).

 In any given language, this may be true, or not.  Whether they should
 be distinct in Python is core to the current discussion.

 I meant type in the structural sense (hence the parenthesis). enums
 are just auto-generated constants. Since Python makes it trivial to
 generate sequential integers, there's no need for a specific enum
 construct.

 Now you may argue that enums should be strongly-typed, but that would be
 a bit backwards given Python's preference for duck-typing.

Please take a step back.

The best example of the utility of enums even for Python is bool. I
resisted this for the longest time but people kept asking for it. Some
properties of bool:

(a) bool is a (final) subclass of int, and an int is acceptable in a
pinch where a bool is expected
(b) bool values are guaranteed unique -- there is only one instance
with value True, and only one with value False
(c) bool values have a str() and repr() that shows their name instead
of their value (but not their class -- that's rarely an issue, and
makes the output more compact)

I think it makes sense to add a way to the stdlib to add other types
like bool. I think (c) is probably the most important feature,
followed by (a) -- except the *final* part: I want to subclass enums.
(b) is probably easy to do but I don't think it matters that much in
practice.
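
The three properties of bool listed above can be checked directly; the variable names are only for the illustration:

```python
# (a) bool is an int subclass, and cannot itself be subclassed.
is_int = isinstance(True, int)
try:
    type("MyBool", (bool,), {})
except TypeError:
    final = True
else:
    final = False

# (b) Converting always hands back one of the two existing instances.
unique = bool(1) is True and bool(0) is False

# (c) str() and repr() show the name, not the underlying 1 or 0.
names = (str(True), repr(False))
```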

 From a backward-compatibility perspective, what makes sense depends on
 whether they're used to implement existing constants (socket.AF_INET,
 etc.) or if they are reserved for new features only.

 It's not only backwards compatibility. New features relying on C APIs
 have to be able to map constants to the integers used in the C library.
 It would be much better if this were done naturally rather than through
 explicit conversion maps.

I'm not sure what you mean here. Can you give an example of what you
mean? I agree that it should be possible to make pretty much any
constant in the OS modules enums -- even if the values vary across
platforms.

 (this really means subclassing int, if we don't want to complicate
 C-level code)

Right.

FWIW I don't think I'm particular about the exact API to construct a
new enum type in Python code; I think in most cases explicitly
assigning values is fine. Often the values are constrained by
something external anyway; it should be easy to dynamically set the
values of a particular enum type (even add new values after the fact).
There might also be enums with the same value (even though the mapping
from int to enum will then have to pick one).

I expect that the API to convert between enums and bare ints should be
i = int(e) and e = enumclass(i). It would be nice if s = str(e) and
e = enumclass(s) would work too.
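
One way such an API could look, as a rough sketch only: `NamedInt`, `define` and the lookup tables are hypothetical names, and the choice that the first name wins for a duplicated value is an assumption rather than anything decided in this thread.

```python
class NamedInt(int):
    """Hypothetical base class: an int whose str() and repr() show a name."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._by_value = {}
        cls._by_name = {}

    @classmethod
    def define(cls, name, value):
        """Add a new item after the fact and expose it as a class attribute."""
        item = int.__new__(cls, value)
        item._name = name
        cls._by_name[name] = item
        # With duplicate values, the first name defined wins for int -> item.
        cls._by_value.setdefault(value, item)
        setattr(cls, name, item)
        return item

    def __new__(cls, key):
        # enumclass(i) and enumclass(s) both return an existing item.
        table = cls._by_name if isinstance(key, str) else cls._by_value
        return table[key]

    def __repr__(self):
        return self._name

    __str__ = __repr__


class Color(NamedInt):
    pass


Color.define("red", 0)
Color.define("green", 10)
```

With this, `int(Color.green)` is 10, `Color(10)` and `Color("green")` both return the existing `Color.green`, and `str(Color.green)` is `"green"`, while the item still behaves as an int in arithmetic.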

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Barry Warsaw
On Nov 23, 2010, at 12:57 PM, Fred Drake wrote:

From a backward-compatibility perspective, what makes sense depends on
whether they're used to implement existing constants (socket.AF_INET,
etc.) or if they are reserved for new features only.

As is usually the case, there's little reason to change existing working code.
Enums can be used whenever a module or API is updated.

-Barry




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Barry Warsaw
On Nov 23, 2010, at 05:02 PM, Michael Foord wrote:

 * Enums are not subclassed from ints or strs.  They are a distinct data type
that can be converted to and from ints and strs.  EIBTI.

But if we are to use it *in* the standard library (as opposed to merely
adding a module *to* the standard library) there are backwards compatibility
concerns. Where modules are already using integers for constants then
integers still need to work.

Is int(enum_value) enough, or must the enum value actually *be* an int?

One easy way to achieve this is to subclass integer. If we don't do that
(assuming we decide that putting a solution in the standard library is
appropriate) then we'll have to evaluate what we mean by backwards
compatible. If the modules that use the constants aren't to change then
comparing equal to the underlying value is the minimum (so that the original
value can still be used in place of the new named constant). Not sure if
you'd be happy to make that change in flufl.enum.

I'm not sure either.  In flufl.enum enum_class(i) also works as expected.
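
The practical difference between "converts to an int" and "is an int" can be sketched with two hypothetical item classes (neither is flufl.enum code):

```python
class Convertible:
    """Hypothetical enum item that is *not* an int but converts to one."""

    def __init__(self, value):
        self._value = value

    def __int__(self):
        return self._value


class IsAnInt(int):
    """Hypothetical enum item that *is* an int."""


def accepted_as_index(item):
    """Return True if `item` can be used where a real integer is required."""
    try:
        "abc"[item]
    except TypeError:
        return False
    return True
```

`int(Convertible(1))` works, but `Convertible(1)` is neither equal to 1 nor usable as an index, whereas `IsAnInt(1)` passes both checks without any change to the code that receives it.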

 * The typical way to create them is through a simple, but explicit class
definition.  I personally like being explicit about the item values, and
the assignments are required to make the metaclass work properly, but
Michael's convenience patch is totally appropriate for cases where you
don't care, or you want a one-liner.

If make_enum was to take a set of values to use (as Antoine suggested) I
don't see what's un-explicit about it.

When I saw your patch I immediately thought that I could add a default
argument that was something like `int_iter`, i.e. an iterator of integers for
the values in the string.  I suspect YAGNI, which is why I didn't just add it,
but I'm not totally opposed to it.
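
A rough sketch of what such an `int_iter` argument might look like. `make_constants` is a hypothetical stand-in, not the real make_enum, and both the default of counting from 1 and the plain namespace it returns are assumptions:

```python
from itertools import count
from types import SimpleNamespace


def make_constants(names, int_iter=None):
    """Bind each whitespace-separated name to the next value from int_iter."""
    values = count(1) if int_iter is None else iter(int_iter)
    # zip() stops at the shorter input, so the iterator must supply at
    # least as many values as there are names.
    return SimpleNamespace(**dict(zip(names.split(), values)))


default = make_constants("red green blue")
explicit = make_constants("read write", int_iter=[4, 2])
```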

-Barry




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Barry Warsaw
On Nov 23, 2010, at 11:52 AM, P.J. Eby wrote:

This reminds me: a stdlib enum should support proper pickling and copying;
i.e.:

assert SomeEnum.anEnum is pickle.loads(pickle.dumps(SomeEnum.anEnum))

This could probably be implemented by adding something like:

def __reduce__(self):
    return getattr, (self._class, self._enumname)

in the EnumValue class.

Excellent idea, thanks.  Added to flufl.enum in r38.  Note, however, that
only enums created with the class syntax can be pickled.
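
A self-contained sketch of the `__reduce__` suggestion follows. The `EnumValue` and `SomeEnum` classes here are simplified stand-ins written for the illustration, not the flufl.enum implementation:

```python
import copy
import pickle


class EnumValue:
    """Hypothetical enum item that remembers its class and attribute name."""

    def __init__(self, enum_class, name, value):
        self._class = enum_class
        self._enumname = name
        self._value = value

    def __reduce__(self):
        # Rebuild by looking the item up again, so no new instance is made.
        return getattr, (self._class, self._enumname)


class SomeEnum:
    """Container for the items; filled in below."""


SomeEnum.anEnum = EnumValue(SomeEnum, "anEnum", 1)


def pickle_roundtrip(obj):
    """Pickle and unpickle obj; SomeEnum must be importable by name."""
    return pickle.loads(pickle.dumps(obj))
```

Because `copy` also goes through `__reduce__`, `copy.copy(SomeEnum.anEnum)` and `copy.deepcopy(SomeEnum.anEnum)` return the very same object. When the code runs as a script, `pickle_roundtrip(SomeEnum.anEnum) is SomeEnum.anEnum` holds too, since pickle can find `SomeEnum` by its module-level name.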

Cheers,
-Barry




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Guido van Rossum
On Tue, Nov 23, 2010 at 11:47 AM, Barry Warsaw ba...@python.org wrote:
 On Nov 23, 2010, at 05:02 PM, Michael Foord wrote:

 * Enums are not subclassed from ints or strs.  They are a distinct data type
    that can be converted to and from ints and strs.  EIBTI.

But if we are to use it *in* the standard library (as opposed to merely
adding a module *to* the standard library) there are backwards compatibility
concerns. Where modules are already using integers for constants then
integers still need to work.

 Is int(enum_value) enough, or must the enum value actually *be* an int?

I vote for *be*, following bool's example.

One easy way to achieve this is to subclass integer. If we don't do that
(assuming we decide that putting a solution in the standard library is
appropriate) then we'll have to evaluate what we mean by backwards
compatible. If the modules that use the constants aren't to change then
comparing equal to the underlying value is the minimum (so that the original
value can still be used in place of the new named constant). Not sure if
you'd be happy to make that change in flufl.enum.

 I'm not sure either.  In flufl.enum enum_class(i) also works as expected.

 * The typical way to create them is through a simple, but explicit class
    definition.  I personally like being explicit about the item values, and
    the assignments are required to make the metaclass work properly, but
    Michael's convenience patch is totally appropriate for cases where you
    don't care, or you want a one-liner.

If make_enum was to take a set of values to use (as Antoine suggested) I
don't see what's un-explicit about it.

 When I saw your patch I immediately thought that I could add a default
 argument that was something like `int_iter`, i.e. an iterator of integers for
 the values in the string.  I suspect YAGNI, which is why I didn't just add it,
 but I'm not totally opposed to it.

 -Barry






-- 
--Guido van Rossum (python.org/~guido)


[Python-Dev] Sporadic problems with bugs.python.org

2010-11-23 Thread Jesus Cea

This happened to me last Sunday, and it is happening again right now.

I can access http://bugs.python.org/ just fine, but trying to post a
message, open a new bug, change nosy, etc., takes a LONG time (minutes)
and finally fails with a 400 Bad Request error:


Bad Request

Your browser sent a request that this server could not understand.
Apache/2.2.9 (Debian) mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9
OpenSSL/0.9.8g mod_wsgi/2.5 Server at bugs.python.org Port 80


Last Sunday I was able to open the bug after a while. Today I have been
retrying for a while, with no luck yet.

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
El amor es poner tu felicidad en la felicidad de otro - Leibniz


Re: [Python-Dev] is this a bug? no environment variables

2010-11-23 Thread Martin v. Löwis
> I have read some about side-by-side assemblies but had considered them a
> good reason to stick with the outdated M$VC 6.0 compiler, which doesn't
> seem to need to create them, and their myriad requirements, which seem
> far from necessary for simply compiling a program.  I was disappointed
> to realize that Python was heading down the path of using the newer
> tools that create side-by-side assemblies, but I suppose using an old
> and crufty compiler like M$VC 6.0 cannot support some of the newer
> features of Windows, which may seem to be necessary to some like
> 64-bit support, which does seem necessary, even to me.

The rationale for moving along with the releases is different, though:
you cannot obtain the old versions anymore, except perhaps on Ebay.
So new developers coming to Python would not be able to build Python
extensions if we didn't always try to use a compiler that is still
available (and we are stressing that a little bit: 3.2 will use
VS 2008, even though it has already been superseded).

In any case, VS 2010 will stop using SxS for the CRT.

Regards,
Martin


Re: [Python-Dev] is this a bug? no environment variables

2010-11-23 Thread Glenn Linderman

On 11/23/2010 12:33 PM, Martin v. Löwis wrote:

In any case, VS 2010 will stop using SxS for the CRT.


Good news!  Maybe M$VC will become a useful compiler yet again :)


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Glenn Linderman

On 11/23/2010 11:34 AM, Guido van Rossum wrote:

The best example of the utility of enums even for Python is bool. I
resisted this for the longest time but people kept asking for it. Some
properties of bool:

(a) bool is a (final) subclass of int, and an int is acceptable in a
pinch where a bool is expected
(b) bool values are guaranteed unique -- there is only one instance
with value True, and only one with value False
(c) bool values have a str() and repr() that shows their name instead
of their value (but not their class -- that's rarely an issue, and
makes the output more compact)

I think it makes sense to add a way to the stdlib to add other types
like bool. I think (c) is probably the most important feature,
followed by (a) -- except the *final* part: I want to subclass enums.
(b) is probably easy to do but I don't think it matters that much in
practice.


I was concerned about uniqueness constraints some were touting.  While 
that can be a useful property for some enumerations, it can also be 
convenient for other enumerations to have multiple names map to the same 
value.


Bool seems appropriately not extensible to additional values.  While 
there are tri-valued (and other) logic systems, they deserve a separate 
namespace.


Bool seems to be an example, then, of a set of distinguished names, with 
values associated to the names, and is restricted to [two] [unique] 
integer values.  C/C++/C# enum is somewhat like that, and is also 
restricted to integer values [not necessarily unique].  I wonder if a 
set of distinguished names needs to be restricted to integer values to be 
useful, although I have no doubt that distinguished names with integer 
values are useful.  Someone used an example of a color names class having 
RGB tuple values, which is a counterexample to a restriction to 
integers.  I can think of others as well.


Perhaps a set of distinguished names, with values associated to the 
names is really a dict, with the unique names restricted to Python 
identifier syntax (to be useful), and the values unrestricted. The type 
of the named value, and the value of the named value, seem not to need 
to be restricted.


But the implementations

Bool = {'False': 0, 'True': 1}

or alternately

class Bool:
    # lowercase names, since True and False are keywords in Python 3
    false = 0
    true = 1

are missing a couple of characteristics of Python's present bool: the names 
are not special, and the values are not immutable.  Perhaps games could 
be played to make the second implementation effectively immutable.


So I think the real trick of the enum (or a generalized distinguished 
names) is in the naming.  A technique to import the keys that are legal 
Python identifiers from a dict into a namespace, and retain henceforth 
immutable values for those names would permit the syntactical usage that 
people are accustomed to from the C/C++/C# enum, but with extended 
ranges and types of values, and it seems Bool could be mostly 
reimplemented via that technique.


What is still missing?  The debugging help: the values, once imported, 
should not become just values of their type, but rather a new type of 
value, that has an associated name (and type, I think).  Whatever magic 
is worked under the covers to make sure that there is just one True and 
just one False, so that they can be distinguished from the values 1 and 
0, and so reported, should also be applied to these values.


So there need not be new syntax for creating the name/value pairs; just 
use dict.  The only new API would be the code that imports the dict 
into the local namespace.
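
A rough sketch of what such an import step might look like. The names NamedValue and import_names are hypothetical (nothing like them exists in the stdlib), and the target namespace is an ordinary dict here; passing globals() would be the analogue of importing into the current module.

```python
import keyword


class NamedValue:
    """Hypothetical immutable wrapper that remembers the name it was bound to."""
    __slots__ = ('_name', '_value')

    def __init__(self, name, value):
        # Bypass our own __setattr__, which forbids later modification.
        object.__setattr__(self, '_name', name)
        object.__setattr__(self, '_value', value)

    def __setattr__(self, attr, value):
        raise AttributeError('named values are immutable')

    @property
    def name(self):
        return self._name

    @property
    def value(self):
        return self._value

    def __repr__(self):
        # Show the name rather than the underlying value, as bool does.
        return self._name


def import_names(mapping, namespace):
    """Bind each key that is a legal, non-keyword identifier to a NamedValue."""
    for key, value in mapping.items():
        if key.isidentifier() and not keyword.iskeyword(key):
            namespace[key] = NamedValue(key, value)


colors = {'red': (255, 0, 0), 'green': (0, 255, 0), 'not an identifier': 0}
ns = {}
import_names(colors, ns)
```

Keys that are not valid identifiers are skipped, and the values themselves stay unrestricted (tuples here). Nothing stops later code from rebinding ns['red'], which matches the point above that overridden names simply lose their special value.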


Note that other scoped definitions of True and False are not possible 
today because True and False are keywords.  It would be inappropriate to 
define these distinguished names as all being keywords, so it seems like 
one could still override the names, even once defined, but such 
overridden names would lose their special value that makes them a 
distinguished name.


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 11:34 -0800, Guido van Rossum wrote:
> > From a backward-compatibility perspective, what makes sense depends on
> > whether they're used to implement existing constants (socket.AF_INET,
> > etc.) or if they're reserved for new features only.
>
> > It's not only backwards compatibility. New features relying on C APIs
> > have to be able to map constants to the integers used in the C library.
> > It would be much better if this were done naturally rather than through
> > explicit conversion maps.
>
> I'm not sure what you mean here. Can you give an example of what you
> mean? I agree that it should be possible to make pretty much any
> constant in the OS modules enums -- even if the values vary across
> platforms.

I mean that PyArg_ParseTuple should continue to be practical even if e.g.
os.SEEK_SET and friends become named constants.  It implies that the
various format codes such as "i", "l", etc. are still usable with those
constants. Hence:

> > (this really means subclassing int, if we don't want to complicate
> > C-level code)
>
> Right.

:-)
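
To illustrate the point about subclassing int: a minimal sketch, assuming a hypothetical NamedInt class (not an existing stdlib type). Because the instance is a real int, a C-implemented function such as os.lseek accepts it without any conversion map.

```python
import os
import tempfile


class NamedInt(int):
    """Hypothetical int subclass that carries a display name."""

    def __new__(cls, name, value):
        self = super().__new__(cls, value)
        self._name = name
        return self

    def __repr__(self):
        return self._name


# Wrap an existing OS constant; the numeric value is unchanged.
SEEK_END = NamedInt('SEEK_END', os.SEEK_END)

with tempfile.TemporaryFile() as f:
    f.write(b'hello')
    f.flush()
    # os.lseek is implemented in C and parses its "how" argument as an int.
    size = os.lseek(f.fileno(), 0, SEEK_END)
```

The constant still compares equal to os.SEEK_END, so existing code that tests against the plain integer keeps working.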

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Ron Adam



On 11/23/2010 12:07 PM, Antoine Pitrou wrote:

On Tuesday 23 November 2010 at 12:50 -0500, Isaac Morland wrote:

Each enumeration is a type (well, OK, not in every language, presumably,
but certainly in many languages).  The word basic is more important than
types in my sentence - the point is that an enumeration capability is a
very common one in a type system, and is very general, not specific to any
particular application.


Python already has an enumeration capability. It's called range().
There's nothing else that C enums have. AFAICT, neither do enums in
other mainstream languages (assuming they even exist; I don't remember
Perl, PHP or Javascript having anything like that, but perhaps I'm
mistaken).



Aren't we forgetting enumerate?

>>> colors = 'BLACK BROWN RED ORANGE YELLOW GREEN BLUE VIOLET GREY WHITE'

>>> dict(e for e in enumerate(colors.split()))
{0: 'BLACK', 1: 'BROWN', 2: 'RED', 3: 'ORANGE', 4: 'YELLOW', 5: 'GREEN', 6: 
'BLUE', 7: 'VIOLET', 8: 'GREY', 9: 'WHITE'}


>>> dict((f, n) for (n, f) in enumerate(colors.split()))
{'BLUE': 6, 'BROWN': 1, 'GREY': 8, 'YELLOW': 4, 'GREEN': 5, 'VIOLET': 7, 
'ORANGE': 3, 'BLACK': 0, 'WHITE': 9, 'RED': 2}



Most other languages that use numbered constants number them by base n^2.

>>> [x**2 for x in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


Binary flags have the advantage of saving memory because you can assign 
more than one to a single integer.  Another advantage is that other languages 
use them, so it can make it easier to interface with them.  There also may be 
some performance advantages, since you can test for multiple flags 
with a single comparison.
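
A small sketch of the flag idiom, assuming the constants are powers of two (2**x) so that each one occupies its own bit. The flag names are made up for illustration.

```python
# Each flag gets a distinct bit: 1, 2, 4.
READ, WRITE, EXECUTE = (2 ** x for x in range(3))

# Several flags packed into one integer.
perms = READ | WRITE

# Test for multiple flags with a single mask-and-compare.
wanted = READ | WRITE
has_all = (perms & wanted) == wanted

# Test for a single flag.
has_execute = bool(perms & EXECUTE)
```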


Sets of strings can also work when you don't need to associate a numeric 
value to the constant, i.e. the constant is the value.  In this case the 
set supplies the API.


Cheers,
  Ron



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Glyph Lefkowitz

On Nov 23, 2010, at 10:37 AM, ben.cottr...@nominum.com wrote:

> I'd prefer not to think of the number of times I've made the following 
> mistake:
>
> s = socket.socket(socket.SOCK_DGRAM, socket.AF_INET)

If it's any consolation, it's fewer than the number of times I have :).

(More fun, actually, is where you pass a file descriptor to the wrong argument 
of 'fromfd'...)
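
A sketch of how typed constants could catch the swapped-argument mistake at call time. The classes AddressFamily and SocketKind, the helper checked_socket_args, and the numeric values are all assumptions for illustration; the two constants are deliberately given the same integer value to show why plain ints cannot detect the swap.

```python
class AddressFamily(int):
    """Hypothetical constant type for address families."""


class SocketKind(int):
    """Hypothetical constant type for socket types."""


# Illustrative values only; both happen to be the same plain integer.
AF_INET = AddressFamily(2)
SOCK_DGRAM = SocketKind(2)


def checked_socket_args(family, kind):
    """Reject arguments passed in the wrong position."""
    if not isinstance(family, AddressFamily):
        raise TypeError('family must be an AddressFamily constant')
    if not isinstance(kind, SocketKind):
        raise TypeError('kind must be a SocketKind constant')
    return int(family), int(kind)


ok = checked_socket_args(AF_INET, SOCK_DGRAM)

try:
    checked_socket_args(SOCK_DGRAM, AF_INET)   # arguments swapped
    swapped_detected = False
except TypeError:
    swapped_detected = True
```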



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Steven D'Aprano

Antoine Pitrou wrote:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',   
   values=range(1, 3))


Again, auto-enumeration is useless since it's trivial to achieve
explicitly.


That doesn't make auto-enumeration useless. Unnecessary, perhaps, but 
not useless.


But even then it's only unnecessary if the number of constants is small 
enough that you can see how many there are without counting 
(essentially, 4 or fewer). When you have more, it becomes error-prone 
and a nuisance to have to count them by hand:


Constants = make_constants(
    'Constants',
    'ST_MODE ST_INO ST_DEV ST_NLINK ST_UID ST_GID '
    'ST_SIZE ST_ATIME ST_MTIME ST_CTIME',
    values=range(10)
)
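
For comparison, a sketch of a make_constants that auto-enumerates when no values are given. make_constants is only the hypothetical name used in this thread; this version is built on collections.namedtuple purely as an assumption, not as the proposed implementation.

```python
from collections import namedtuple


def make_constants(typename, names, values=None):
    """Hypothetical helper: names is a space-separated string of identifiers."""
    fields = names.split()
    if values is None:
        # Auto-enumerate, so nobody has to count the names by hand.
        values = range(len(fields))
    values = tuple(values)
    if len(values) != len(fields):
        raise ValueError('expected %d values, got %d'
                         % (len(fields), len(values)))
    return namedtuple(typename, fields)(*values)


Constants = make_constants(
    'Constants',
    'ST_MODE ST_INO ST_DEV ST_NLINK ST_UID ST_GID '
    'ST_SIZE ST_ATIME ST_MTIME ST_CTIME',
)
```

With explicit values, a miscount raises ValueError instead of silently shifting the numbering.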


--
Steven


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Glyph Lefkowitz

On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote:

> Well, it is easy to assign range(N) to a tuple of names when desired. I
> don't think an automatically-enumerating constant generator is needed.

I don't think that numerical enumerations are the only kind of constants we're 
talking about.  Others have already mentioned strings.  Also, see 
http://tm.tl/4671 for some other use-cases.  Since this isn't coming to 2.x, 
we're probably going to do our own thing anyway (unless it turns out that 
flufl.enum is so great that we want to add another dependency...) but I'm 
hoping that the outcome of this discussion will point to something we can be 
compatible with.



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Antoine Pitrou
On Tuesday 23 November 2010 at 16:10 -0500, Glyph Lefkowitz wrote:
>
> On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote:
>
> > Well, it is easy to assign range(N) to a tuple of names when
> > desired. I
> > don't think an automatically-enumerating constant generator is
> > needed.
>
> I don't think that numerical enumerations are the only kind of
> constants we're talking about.  Others have already mentioned strings.
> Also, see http://tm.tl/4671 for some other use-cases.  Since this
> isn't coming to 2.x, we're probably going to do our own thing anyway
> (unless it turns out that flufl.enum is so great that we want to add
> another dependency...) but I'm hoping that the outcome of this
> discussion will point to something we can be compatible with.

I think that asking for too many features would get in the way, and also
make the API quite un-Pythonic. If you want your values to be e.g.
OR'able, just choose your values wisely ;)

Regards

Antoine.




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Ron Adam

Oops..  x**2 should have been 2**x below.


On 11/23/2010 03:03 PM, Ron Adam wrote:



> Most other languages that use numbered constants number them by base n^2.
>
> >>> [x**2 for x in range(10)]
> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]


>>> [2**x for x in range(10)]
[1, 2, 4, 8, 16, 32, 64, 128, 256, 512]








Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Steven D'Aprano

Antoine Pitrou wrote:


Python already has an enumeration capability. It's called range().
There's nothing else that C enums have. AFAICT, neither do enums in
other mainstream languages (assuming they even exist; I don't remember
Perl, PHP or Javascript having anything like that, but perhaps I'm
mistaken).



In Pascal, enumerations are a type, and the values of the named constants 
are an implementation detail. E.g. one would define an enumerated type:


type
  flavour = (sweet, salty, sour, bitter, umami);
var
  x: flavour;

and then you would write something like:

x := sour;

Notice that the constants sweet etc. aren't explicitly predefined, since 
they're purely internal details and the compiler is allowed to number 
them any way it likes. In Python, we would need stronger guarantees 
about the values chosen, so that they could be exposed to external 
modules, pickled, etc.


But that doesn't mean we should be forced to specify the values ourselves.


--
Steven



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Greg Ewing

Antoine Pitrou wrote:


I don't understand why people insist on calling that an enum. enum is
a C legacy and it doesn't bring anything useful as far as I can tell.


The usefulness is that they can have a str() or repr() that
displays the name of the value instead of an integer.

The bool type was added for much the same reason -- otherwise
we would simply have gotten builtin names False = 0 and
True = 1.
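
The properties of bool that keep coming up in this thread (int subclass, unique instances, named repr) can be checked directly. This is just a demonstration of existing behaviour, not a proposal:

```python
# bool is a subclass of int, so it is usable wherever an int is expected.
is_int_subclass = issubclass(bool, int)
total = True + True

# bool is final: attempting to subclass it raises TypeError.
try:
    class MyBool(bool):
        pass
    subclassable = True
except TypeError:
    subclassable = False

# There is exactly one True and one False object.
same_object = bool(1) is True and bool(0) is False

# str() and repr() show the name rather than the integer value.
shown = (repr(True), str(False))
```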

--
Greg


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Greg Ewing

Antoine Pitrou wrote:


Well, it's been inherited by C-like languages, no doubt. Like braces and
semicolons :)


The idea isn't confined to the C family. Pascal and many of the
languages inspired by it also have enumerated types.

--
Greg


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Terry Reedy

On 11/23/2010 2:11 PM, Alexander Belopolsky wrote:


This discussion motivated me to start looking into how well Python
library itself is prepared to deal with len(chr(i)) = 2.  I was not


Good idea!


surprised to find that textwrap does not handle the issue that well:


>>> len(wrap(' \U00010140' * 80, 20))
12
>>> len(wrap(' \u0140' * 80, 20))
8


How well does textwrap handle composable pairs (letter + accent)? Does 
it count two code points as one character space, and avoid putting line 
breaks between them? I suspect textwrap should be regarded as 
(extended?)_ascii_textwrap.


That module should probably be rewritten to properly implement  the
Unicode line breaking algorithm
http://unicode.org/reports/tr14/tr14-22.html.


Probably a good idea


Yet finding a bug in a str object method after a 5 min review was a
bit discouraging:


>>> 'xyz'.center(20, '\U00010140')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: The fill character must be exactly one character long


Again, what does it do with letter + decorator combinations? It seems to 
me that the whole notion that one code point == one printed character 
space is broken once one leaves ascii. Perhaps we need an is_uchar 
function to recognize multi-code sequences, including surrogate pairs, 
that represent one char for the purpose of character oriented functions.
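
A rough approximation of that idea, using the stdlib unicodedata module. The function name uchar_count is hypothetical, and skipping combining marks is only a first-order estimate of "printed characters", not a full implementation of Unicode text segmentation. Surrogate pairs are not handled separately here; since Python 3.3 a non-BMP character is a single string item, which was not the case on the narrow builds discussed above.

```python
import unicodedata


def uchar_count(text):
    """Count code points that are not combining marks (rough approximation)."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


composed = 'caf\u00e9'       # e-acute as a single code point
decomposed = 'cafe\u0301'    # 'e' followed by COMBINING ACUTE ACCENT

code_points = (len(composed), len(decomposed))
printed = (uchar_count(composed), uchar_count(decomposed))
```

The two spellings differ in len() but agree once combining marks are not counted.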



Given the apparent difficulty of writing even basic text processing
algorithms in presence of surrogate pairs, I wonder how wise it is to
expose Python users to them.  As Wikipedia explains, [1]


Because the most commonly used characters are all in the Basic
Multilingual Plane, converting between surrogate pairs and the
original values is often not tested thoroughly. This leads to
persistent bugs, and potential security holes, even in popular and
well-reviewed application software.



So we did not test thoroughly enough and need to add appropriate unit 
tests as bugs are fixed.



--
Terry Jan Reedy



Re: [Python-Dev] [Python-checkins] r86720 - python/branches/py3k/Misc/ACKS

2010-11-23 Thread Terry Reedy



On 11/23/2010 5:43 PM, Éric Araujo wrote:

Modified: python/branches/py3k/Misc/ACKS
==
--- python/branches/py3k/Misc/ACKS  (original)
+++ python/branches/py3k/Misc/ACKS  Tue Nov 23 21:32:47 2010
@@ -1,4 +1,4 @@
-Acknowledgements
+Acknowledgements


This change introduced a so-called UTF-8 BOM in the file.  Is
TortoiseSvn the culprit or a text editor?


I used Notepad to edit the file, TortoiseSvn to commit, the same as I 
did for #9222, rev86702, Lib\idlelib\IOBinding.py, yesterday.
If the latter is OK, perhaps *.py gets filtered better than misc. text 
files. I believe I have the config as specified in dev/faq.


[miscellany]
enable-auto-props = yes

[auto-props]
* = svn:eol-style=native
*.c = svn:keywords=Id
*.h = svn:keywords=Id
*.py = svn:keywords=Id
*.txt = svn:keywords=Author Date Id Revision

Terry



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Isaac Morland

On Tue, 23 Nov 2010, Bill Janssen wrote:


The main purpose of that is to be able to catch type mismatches with
static typing, though.  Seems kind of pointless for Python.


The concept can work dynamically.  In fact, the flufl.enum package which 
has been discussed here makes each enumeration into a separate class so 
many of the advantages of catching type mismatches are obtained.



Hey, how about this syntax:

enum Colors:
    red = 0
    green = 10
    blue


Why not

class Color:
    red = (255, 0, 0)
    green = (0, 255, 0)
    blue = (0, 0, 255)

Seems to handle the situation OK.


Yes, this looks almost exactly like flufl.enum syntax.  In any case my 
suggestion of a new keyword was not meant to be taken seriously.  If I 
ever think I have a good reason to suggest a new keyword I'll sleep on it, 
take a vacation, and then if I still think a new keyword is justified I 
will specifically disclaim any possibility of the suggestion being a joke.


Isaac Morland                   CSCF Web Guru
DC 2554C, x36650                WWW Software Specialist


Re: [Python-Dev] Stable buildbots

2010-11-23 Thread David Bolen
Trent Nelson tr...@snakebite.org writes:

> That's interesting.  (That kill_python.exe doesn't kill the wedged
> processes, but pskill does.)  kill_python is pretty simple, it just
> calls TerminateProcess() after acquiring a handle with the relevant
> PROCESS_TERMINATE access right.  (...)
>
> Are you calling pskill with the -t flag? i.e. kill process and all
> dependents?  That might be the ticket, especially if killing the child
> process that wedged select() is waiting on causes it to return, and
> thus, makes it killable.

Nope, just "pskill python_d".  Haven't bothered to check the pskill
source but I'm assuming it's just a basic TerminateProcess. Ideally my
quickest workaround would just be to replace the kill_python in the
buildbot tools script with that command but of course they could get
updated on checkouts and I'm not arguing it's generally appropriate enough
to belong in the source.

I suspect the problem may be in the "identify which process to kill"
part rather than the "kill it" part, but it's definitely going to take time
to figure that out for sure.  While the approach kill_python takes is
much more appropriate, since we don't currently have multiple builds
running simultaneously (and for me the machines are dedicated as build
slaves, so I won't be having my own python_d), a more blanket kill
operation is safe enough.

> Otherwise, if it happens again, can you try kill_python.exe first,
> then pskill, and confirm if the former fails but the latter succeeds?

Yeah, I've got a temporary tree with a built-binary around, but still
have to make sure of the right way to run it manually in a way that it
will do the identification right (which I think also means I need to
figure out from which build tree the hung process started).  Up until
now, typically when I've found a hung setup, the rest of the build
tree which originally applied to that process has been cleaned.

I definitely sympathize with Martin's position though - it wasn't the
simplest tool to write (and I still have some email from him about the
week+ it took just to test the process identification part remotely
through buildbots at the time), so I regret not jumping right in to
try to fix it.  But it's just way more effort than typing pskill
python_d, at least with my current availability.

-- David



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Greg Ewing

Antoine Pitrou wrote:


I think that asking for too many features would get in the way, and also
make the API quite un-Pythonic. If you want your values to be e.g.
OR'able, just choose your values wisely ;)


On the other hand it could be useful to have an easy way to
request power-of-2 value assignment, seeing as it's another
common pattern.
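
One way such a request might look, as a sketch only. The helper name make_flags and the flag names are made up for illustration.

```python
def make_flags(names):
    """Hypothetical helper: map each name to a distinct power of two."""
    return {name: 1 << i for i, name in enumerate(names.split())}


flags = make_flags('BOLD ITALIC UNDERLINE')

# The values can be OR'ed together without colliding.
style = flags['BOLD'] | flags['UNDERLINE']
```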

--
Greg


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Greg Ewing

Bill Janssen wrote:


The main purpose of that is to be able to catch type mismatches with
static typing, though.  Seems kind of pointless for Python.


But catching type mismatches with dynamic typing doesn't
seem pointless for Python. There's nothing static about
the proposals being made here that I can see.


Why not

  class Color:
 red = (255, 0, 0)
 green = (0, 255, 0)
 blue = (0, 0, 255)


If all you want is a bunch of named constants, that's fine.
But the facilities being discussed here are designed to give
you other things as well, such as

  c = Color.red
  print(c)

printing red rather than (255, 0, 0).
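
A minimal sketch of that behaviour. The wrapper class Named is hypothetical, and it relies on the __set_name__ hook, which only exists in Python 3.6 and later (well after this thread), so treat it as one possible way to get the effect rather than what was being proposed.

```python
class Named:
    """Hypothetical wrapper: a value that knows the name it was assigned to."""

    def __init__(self, value):
        self.value = value
        self.name = None

    def __set_name__(self, owner, name):
        # Called automatically when the enclosing class body is executed.
        self.name = name

    def __str__(self):
        return self.name


class Color:
    red = Named((255, 0, 0))
    green = Named((0, 255, 0))
    blue = Named((0, 0, 255))


c = Color.red
print(c)          # prints: red
rgb = c.value     # the underlying tuple is still available
```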

--
Greg


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Greg Ewing

Antoine Pitrou wrote:

Constants = make_constants('Constants', 'SOME_CONST OTHER_CONST',   
   values=range(1, 3))


Again, auto-enumeration is useless since it's trivial to achieve
explicitly.


But seeing as it's going to be a common thing to do, why not
make it the default?

When defining an enum, often you don't *care* what the
underlying values are, so assigning sequential natural numbers
is as good a default as any.

In fact, with the Pascal concept of an enumerated type you
don't get any choice in the matter. It's only in the C family
that you get this bastardised conflation of enumerations with
arbitrary named constants...

--
Greg


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Greg Ewing

Isaac Morland wrote:
In any case my 
suggestion of a new keyword was not meant to be taken seriously.


I don't think it need be taken entirely as a joke, either.
All the proposed patterns for creating enums that I've seen
end up leaving something to be desired. They violate DRY
by requiring you to write the class name twice, or they
make you write the names of the values in quotes, or some
other minor ugliness.

While it may be possible to work around these things with
sufficient levels of metaclass hackery and black magic, at
some point one has to consider whether new syntax might
be the least worst option.

--
Greg


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Greg Ewing

Alexander Belopolsky wrote:



Because the most commonly used characters are all in the Basic
Multilingual Plane, converting between surrogate pairs and the
original values is often not tested thoroughly. This leads to
persistent bugs, and potential security holes, even in popular and
well-reviewed application software.



Maybe Python should have used UTF-8 as its internal unicode
representation. Then people who were foolish enough to assume
one character per string item would have their programs break
rather soon under only light unicode testing. :-)

--
Greg


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread James Y Knight
On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote:
> Maybe Python should have used UTF-8 as its internal unicode
> representation. Then people who were foolish enough to assume
> one character per string item would have their programs break
> rather soon under only light unicode testing. :-)

You put a smiley, but, in all seriousness, I think that's actually the right 
thing to do if anyone writes a new programming language. It is clearly the 
right thing if you don't have to be concerned with backwards-compatibility: 
nobody really needs to be able to access the Nth codepoint in a string in 
constant time, so there's not really any point in storing a vector of 
codepoints.

Instead, provide bidirectional iterators which can traverse the string by byte, 
codepoint, or by grapheme (that is: the set of combining characters + base 
character that go together, making up one thing which a human would think of as 
a character).
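
To make the three levels concrete, here is a sketch in today's Python. The function names are hypothetical, the iterators are forward-only rather than bidirectional, and the grapheme grouping is a simplification (base character plus following combining marks) that does not implement the full Unicode segmentation rules.

```python
import unicodedata


def iter_bytes(text):
    """Iterate over the UTF-8 encoded bytes of the string."""
    return iter(text.encode('utf-8'))


def iter_codepoints(text):
    """Iterate over code points (one per string item since Python 3.3)."""
    return iter(text)


def iter_graphemes(text):
    """Simplified grouping: a base character plus trailing combining marks."""
    cluster = ''
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ''
        cluster += ch
    if cluster:
        yield cluster


# 'e' + COMBINING ACUTE ACCENT, followed by one non-BMP character.
s = 'e\u0301\U00010140'

n_bytes = len(list(iter_bytes(s)))
n_codepoints = len(list(iter_codepoints(s)))
graphemes = list(iter_graphemes(s))
```

The same string has a different length at each level, which is the reason to pick the traversal explicitly rather than index by position.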

James


Re: [Python-Dev] Sporadic problems with bugs.python.org

2010-11-23 Thread Jesus Cea

On 23/11/10 21:33, Jesus Cea wrote:
> Happen to me last Sunday, and happening just now.
>
> I can access http://bugs.python.org/ just fine, but trying to post a
> message, open a new bug, change nosy, etc., takes a LONG time (minutes)
> and it is finally failing with a 400 Bad Request error:
>
> 
> Bad Request
>
> Your browser sent a request that this server could not understand.
> Apache/2.2.9 (Debian) mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9
> OpenSSL/0.9.8g mod_wsgi/2.5 Server at bugs.python.org Port 80
> 
>
> Last sunday I was able to open the bug after a time. Today I have been
> retrying for while, with no luck yet.

Still retrying, with no luck.

Can anybody else reproduce this?

-- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
El amor es poner tu felicidad en la felicidad de otro - Leibniz


Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Michael Foord

On 23/11/2010 21:15, Antoine Pitrou wrote:

On Tuesday, 23 November 2010 at 16:10 -0500, Glyph Lefkowitz wrote:

On Nov 23, 2010, at 10:01 AM, Antoine Pitrou wrote:


Well, it is easy to assign range(N) to a tuple of names when
desired. I
don't think an automatically-enumerating constant generator is
needed.

I don't think that numerical enumerations are the only kind of
constants we're talking about.  Others have already mentioned strings.
Also, see http://tm.tl/4671 for some other use-cases.  Since this
isn't coming to 2.x, we're probably going to do our own thing anyway
(unless it turns out that flufl.enum is so great that we want to add
another dependency...) but I'm hoping that the outcome of this
discussion will point to something we can be compatible with.

I think that asking for too many features would get in the way, and also
make the API quite un-Pythonic. If you want your values to be e.g.
OR'able, just choose your values wisely ;)



Well, the point of an OR'able flag is that the result shows the OR'd 
values in the repr. Raymond suggests using a set of strings where you 
need flag constants. For new APIs (so no backwards compatibility 
constraints) where you don't need to use integers (i.e. not wrapping a C 
library) that's a great suggestion:


flags = {'FOO', 'BAR'}
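
A minimal sketch of the two idioms mentioned in this exchange, for illustration only: assigning range(N) to a tuple of names, and using a set of strings as flags. The names RED, GREEN, BLUE, BAZ and the describe() helper are hypothetical and do not come from the thread.

```python
# Integer constants: assign range(N) to a tuple of names.
RED, GREEN, BLUE = range(3)

# Flag constants as a set of strings: combining flags is a set union,
# and the combined value still shows its member names when printed.
flags = {'FOO', 'BAR'}
flags |= {'BAZ'}


def describe(flag_set):
    """Return a stable, readable rendering of a flag set (hypothetical helper)."""
    return '|'.join(sorted(flag_set))


print(RED, GREEN, BLUE)
print(describe(flags))
```

The set keeps the readable repr that integer OR'ing loses, at the cost of not being usable where a C library expects an integer mask.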

Michael

Regards

Antoine.





--

http://www.voidspace.org.uk/

READ CAREFULLY. By accepting and reading this email you agree,
on behalf of your employer, to release me from all obligations
and waivers arising from any and all NON-NEGOTIATED agreements,
licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap,
confidentiality, non-disclosure, non-compete and acceptable use
policies (”BOGUS AGREEMENTS”) that I have entered into with your
employer, its partners, licensors, agents and assigns, in
perpetuity, without prejudice to my ongoing rights and privileges.
You further represent that you have the authority to release me
from any BOGUS AGREEMENTS on behalf of your employer.



[Python-Dev] Centos 5.5 freeze during test_concurrent_futures

2010-11-23 Thread Łukasz Langa
Hi there!

py3k built from trunk on Centos 5.5 freezes during regrtest on 
test_concurrent_futures with "Fatal Python error: Invalid thread state for this 
thread". As is typical of concurrency problems, subsequent runs freeze in 
different test cases, but the freeze itself is always reproducible and always 
occurs during this test.

A colorful example: http://bpaste.net/show/11493/

I created an issue for that here: http://bugs.python.org/issue10517

If necessary, I can provide Centos 5.5 shell access. I would also like to 
donate a Centos 5.5 buildbot.

-- 
Best regards,
Łukasz Langa
tel. +48 791 080 144
WWW http://lukasz.langa.pl/



Re: [Python-Dev] Sporadic problems with bugs.python.org

2010-11-23 Thread Jesus Cea

On 24/11/10 01:31, Jesus Cea wrote:
 Still retrying, with no luck.
 
 Can anybody else reproduce this?

One of my tracker changes was just processed.

The important one is still being retried every 5 minutes...

I hope I can go to sleep before dawn :-P.

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
Love is placing your happiness in the happiness of another - Leibniz


Re: [Python-Dev] Sporadic problems with bugs.python.org

2010-11-23 Thread Terry Reedy

On 11/23/2010 8:32 PM, Jesus Cea wrote:


On 24/11/10 01:31, Jesus Cea wrote:

Still retrying, with no luck.

Can anybody else reproduce this?


One of my tracker changes was just processed.

The important one is still being retried every 5 minutes...

I hope I can go to sleep before dawn :-P.


I added a comment to one issue and opened another with no problem during 
the last couple of hours.


--
Terry Jan Reedy



Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Glyph Lefkowitz
On Nov 23, 2010, at 7:22 PM, James Y Knight wrote:

 On Nov 23, 2010, at 6:49 PM, Greg Ewing wrote:
 Maybe Python should have used UTF-8 as its internal unicode
 representation. Then people who were foolish enough to assume
 one character per string item would have their programs break
 rather soon under only light unicode testing. :-)
 
 You put a smiley, but, in all seriousness, I think that's actually the right 
 thing to do if anyone writes a new programming language. It is clearly the 
 right thing if you don't have to be concerned with backwards-compatibility: 
 nobody really needs to be able to access the Nth codepoint in a string in 
 constant time, so there's not really any point in storing a vector of 
 codepoints.
 
 Instead, provide bidirectional iterators which can traverse the string by 
 byte, codepoint, or by grapheme (that is: the set of combining characters + 
 base character that go together, making up one thing which a human would 
 think of as a character).


I really hope that this idea is not just for new programming languages.  If you 
switch from doing unicode wrong to doing unicode right in Python, you 
quadruple the memory footprint of programs which primarily store and manipulate 
large amounts of text.

This is especially ridiculous in PyGTK applications, where the internal 
representation required by the GUI is UTF-8 anyway, so the round-tripping of 
string data back and forth to the exploded UTF-32 representation is wasting 
gobs of memory and time.  It at least makes sense when your C library's idea 
about character width and your Python build match up.

But, in a desktop app this is unlikely to be a performance concern; in servers, 
it's a big deal; measurably so.  I am pretty sure that in the server apps that 
I work on, we are eventually going to need our own string type and UTF-8 logic 
that does exactly what James suggested - certainly if we ever hope to support 
Py3.

(I dimly recall that both James and I have made this point before, but it's 
pretty important, so it bears repeating.)



Re: [Python-Dev] OpenSSL Voluntarily (openssl-1.0.0a)

2010-11-23 Thread Glyph Lefkowitz
On Nov 23, 2010, at 9:02 AM, Antoine Pitrou wrote:

 On Tue, 23 Nov 2010 00:07:09 -0500
 Glyph Lefkowitz gl...@twistedmatrix.com wrote:
 On Mon, Nov 22, 2010 at 11:13 PM, Hirokazu Yamamoto 
 ocean-c...@m2.ccsnet.ne.jp wrote:
 
 Hello. Does this affect python? Thank you.
 
 http://www.openssl.org/news/secadv_20101116.txt
 
 
 No.
 
 Well, actually it does, but Python links against the system OpenSSL on
 most platforms (except Windows), so it's up to the OS vendor to apply
 the patch.


It does?  If so, I must have misunderstood the vulnerability.  Can you explain 
how it affects Python?





Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Stephen J. Turnbull
Alexander Belopolsky writes:

  Yet finding a bug in a str object method after a 5 min review was a
  bit discouraging:
  
  >>> 'xyz'.center(20, '\U00010140')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  TypeError: The fill character must be exactly one character long
  
  Given the apparent difficulty of writing even basic text processing
  algorithms in presence of surrogate pairs, I wonder how wise it is to
  expose Python users to them.

"Consenting adults" applies here.

What to do?  Write tests, fix the stdlib.  Raise the probability of
surrogate pair tests in the fuzzer.

But exposing users to surrogate pairs in an efficient (i.e., UCS-2)
implementation is a fundamental design principle of Python.
Tightening up the internal implementation is -10 (unacceptable), IMO.
YMMV.
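
To make the surrogate-pair issue concrete, here is a small sketch. It assumes a current CPython (3.3 or later, where PEP 393 made str indexable by code point), so it shows the UTF-16 code unit count explicitly rather than relying on len(); on the narrow builds discussed in this thread, len() itself would have reported 2.

```python
ch = '\U00010140'  # a character outside the Basic Multilingual Plane

# Number of 16-bit code units needed in UTF-16: non-BMP characters need
# two (a surrogate pair), which is what a narrow build stored and counted.
utf16_units = len(ch.encode('utf-16-le')) // 2
print(utf16_units)

# On CPython 3.3+ the string holds one code point, so center() accepts it
# as a fill character; on a narrow build this call raised the TypeError
# quoted above.
padded = 'xyz'.center(7, ch)
print(len(ch), len(padded))
```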

  Again, given that the str object itself has at least one non-BMP
  character bug as we are closing on the third major release of py3k,
  how likely are 3rd party developers to get their libraries right as
  they port to 3.x?

Not our problem, really.  We need to fix the stdlib, but 3rd party
libraries know what they're doing.

I guess we could provide a fuzztest module that generates known nasty
data (zero, very big numbers, \x00, \U00010140, etc) that people
would be able to plug in as a data source for their own code.

Of course that doesn't replace conventional unittests based on
analysis of edge cases and tests designed to tickle them, but it would
be a start for many projects.
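
A rough sketch of what such a data source might look like. No "fuzztest" module exists in the standard library; the catalogue contents and the nasty_values() name are assumptions made purely for illustration.

```python
import itertools
import random

# Hypothetical catalogue of "known nasty" values; extend it per project.
NASTY_INTS = [0, -1, 1, 2**31 - 1, 2**31, 2**63, 2**64, -2**63]
NASTY_STRS = ['', '\x00', '\U00010140', 'e\u0301', ' \t\n', 'a' * 4096]


def nasty_values(count, seed=0):
    """Yield `count` values: each catalogue entry once, then random repeats."""
    rng = random.Random(seed)          # seeded so failures are reproducible
    catalogue = NASTY_INTS + NASTY_STRS
    for value in itertools.islice(catalogue, count):
        yield value
    for _ in range(max(0, count - len(catalogue))):
        yield rng.choice(catalogue)


# Example use: feed the string values to the function under test.
for value in nasty_values(20):
    if isinstance(value, str) and len(value) == 1:
        'xyz'.center(20, value)        # must not raise for any single character
```

Seeding the generator keeps a failing run repeatable, which matters more for this kind of test than covering a large random space.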




Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Raymond Hettinger

On Nov 23, 2010, at 3:41 PM, Greg Ewing wrote:

 While it may be possible to work around these things with
 sufficient levels of metaclass hackery and black magic, at
 some point one has to consider whether new syntax might
 be the least worst option.

The least worst option is to do nothing at all.
That's better than creating a new little monster with its 
own nuances and limitations.

We've gotten by well for almost two decades without
this particular static language feature creeping into Python.

For the most part, strings work well enough (see
decimal.ROUND_UP for example).  They are self-documenting
and work well with the rest of the language.

When a cluster of names cries out for its own namespace,
the usual technique is to put the names in a class (see the examples
in the namedtuple docs for a way to make this a one-liner) 
or in a module (see opcode.py for example).
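
For reference, a sketch of both namespace techniques. The Status line follows the pattern shown in the namedtuple documentation; the Color class and its values are hypothetical.

```python
from collections import namedtuple

# One-liner namespace of enumerated constants, namedtuple-docs style.
Status = namedtuple('Status', 'open pending closed')._make(range(3))


# The same idea as a plain class used purely as a namespace.
class Color:
    RED = 'red'
    GREEN = 'green'


print(Status.open, Status.pending, Status.closed)
print(Color.RED)
```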

For xor'able and or'able flags, sets of strings work well:
   flags = {'runnable', 'callable'}
   flags |= {'runnable', 'kissable'}
   if 'callable' in flags:
  . . .

We have a hard enough time getting people to not program
Java in Python.  IMO, adding a new enumeration type
would make this situation worse.  Also, it adds weight to
the language -- Python is not in need of yet another fundamental
construct.


Raymond


P.S.  I do recognize that lots of people have written their own
versions of Enum(), but I think they do it either out of habits formed
from statically compiled languages that lack all of our namespace
mechanisms or they do it because it is easy and fun to write
(just like people seem to enjoy writing flatten() recipes more
than they like actually using them).

One other thought:  With Py3.x, the language had its one chance
to get smaller.  Old-style classes were tossed, some built-ins 
vanished, and a few obsolete modules got nuked.  It would be
easy to have a "let's add thingie x" fest and lose those benefits.
There are many devs who find that the language does not 
fit-in-their-heads anymore, so considerable restraint needs to
be exercised before adding a new language feature that would
soon permeate everyone's code base and add yet another thing
that infrequent users have to learn before being able to read code.






Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Stephen J. Turnbull
James Y Knight writes:

  You put a smiley, but, in all seriousness, I think that's actually
  the right thing to do if anyone writes a new programming
  language. It is clearly the right thing if you don't have to be
  concerned with backwards-compatibility: nobody really needs to be
  able to access the Nth codepoint in a string in constant time, so
  there's not really any point in storing a vector of codepoints.

A sad commentary on the state of Emacs usage, "nobody".

The theory is that accessing the first character of a region in a
string often occurs as a primitive operation in O(N) or worse
algorithms, sometimes without enough locality at the collection of
regions level to give a reasonably small average access time.

In practice, any *Emacs user can tell you that yes, we do need to be
able to access the Nth codepoint in a buffer in constant time.  The
O(N) behavior of current Emacs implementations means that people often
use a binary coding system on large files.  Yes, some position caching
is done, but if you have a large file (eg, a mail file) which is
virtually segmented using pointers to regions, locality gets lost.
(This is not a design bug, this is a fundamental requirement: consider
fast switching between threaded view and author-sorted view.)

And of course an operation that sorts regions in a buffer using
character pointers will have the same problem.  Working with memory
pointers, OTOH, sucks more than that; GNU Emacs recently bit the
bullet and got rid of their higher-level memory-oriented APIs, all of
the Lisp structures now work with pointers, and only the very
low-level structures know about character-to-memory pointer
translation.

This performance issue is perceptible even on 3GHz machines with not
so large (50MB) mbox files.  It's *horrid* if you do something like
occur on a 1GB log file, then try randomly jumping to detected log
entries.



Re: [Python-Dev] constant/enum type in stdlib

2010-11-23 Thread Fred Drake
On Tue, Nov 23, 2010 at 9:35 PM, Raymond Hettinger
raymond.hettin...@gmail.com wrote:
 The least worst option is to do nothing at all.

For the standard library, I agree.

There are enough variants that are needed/desired in different
contexts, and there isn't a single clear winner.  Nor is there any
compelling reason to have a winner.

I'm generally in favor of enums (or whatever you want to call them),
and I'm in favor of importing support for the flavor you need, or just
defining constants in whatever way makes sense for your library or
application.

I don't see any problems that aren't solved by that.


  -Fred

--
Fred L. Drake, Jr.    fdrake at acm.org
A storm broke loose in my mind.  --Albert Einstein


Re: [Python-Dev] Sporadic problems with bugs.python.org

2010-11-23 Thread Jesus Cea

On 24/11/10 02:51, Terry Reedy wrote:
 I hope I can go to sleep before dawn :-P.
 
 I added a comment to one issue and opened another with no problem during
 the last couple of hours.

My changes have gone through now, after about 8 hours of retrying every five minutes.

- -- 
Jesus Cea Avion _/_/  _/_/_/_/_/_/
j...@jcea.es - http://www.jcea.es/ _/_/_/_/  _/_/_/_/  _/_/
jabber / xmpp:j...@jabber.org _/_/_/_/  _/_/_/_/_/
.  _/_/  _/_/_/_/  _/_/  _/_/
Things are not so easy  _/_/  _/_/_/_/  _/_/_/_/  _/_/
My name is Dump, Core Dump   _/_/_/_/_/_/  _/_/  _/_/
Love is placing your happiness in the happiness of another - Leibniz


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Glyph Lefkowitz
On Nov 23, 2010, at 9:44 PM, Stephen J. Turnbull wrote:

 James Y Knight writes:
 
 You put a smiley, but, in all seriousness, I think that's actually
 the right thing to do if anyone writes a new programming
 language. It is clearly the right thing if you don't have to be
 concerned with backwards-compatibility: nobody really needs to be
 able to access the Nth codepoint in a string in constant time, so
 there's not really any point in storing a vector of codepoints.
 
 A sad commentary on the state of Emacs usage, "nobody".
 
 The theory is that accessing the first character of a region in a
 string often occurs as a primitive operation in O(N) or worse
 algorithms, sometimes without enough locality at the collection of
 regions level to give a reasonably small average access time.

I'm not sure what you mean by "the theory is".  Whose theory?  About what?

 In practice, any *Emacs user can tell you that yes, we do need to be
 able to access the Nth codepoint in a buffer in constant time.  The
 O(N) behavior of current Emacs implementations means that people often
 use a binary coding system on large files.  Yes, some position caching
 is done, but if you have a large file (eg, a mail file) which is
 virtually segmented using pointers to regions, locality gets lost.
 (This is not a design bug, this is a fundamental requirement: consider
 fast switching between threaded view and author-sorted view.)

Sounds like a design bug to me.  Personally, I'd implement fast switching 
between threaded view and author-sorted view the same way I'd address any 
other multiple-views-on-the-same-data problem.  I'd retain data structures for 
both, and update them as the underlying model changed.

These representations may need to maintain cursors into the underlying 
character data, if they must retain giant wads of character data as an 
underlying representation (arguably the _main_ design bug in Emacs, that it 
encourages you to do that for everything, rather than imposing a sensible 
structure), but those cursors don't need to be code-point counters; they could 
be byte offsets, or opaque handles whose precise meaning varied with the 
potentially variable underlying storage.

Also, please remember that Emacs couldn't be implemented with giant Python 
strings anyway: crucially, all of this stuff is _mutable_ in Emacs.

 And of course an operation that sorts regions in a buffer using
 character pointers will have the same problem.  Working with memory
 pointers, OTOH, sucks more than that; GNU Emacs recently bit the
 bullet and got rid of their higher-level memory-oriented APIs, all of
 the Lisp structures now work with pointers, and only the very
 low-level structures know about character-to-memory pointer
 translation.
 
 This performance issue is perceptible even on 3GHz machines with not
 so large (50MB) mbox files.  It's *horrid* if you do something like
 occur on a 1GB log file, then try randomly jumping to detected log
 entries.

Case in point: occur needs to scan the buffer anyway; you can't do better 
than linear time there.  So you're going to iterate through the buffer, using 
one of the techniques that James proposed, and remember some locations.  Why 
not just have those locations be opaque cursors into your data?

In summary: you're right, in that James missed a spot.  You need bidirectional, 
*copyable* iterators that can traverse the string by byte, codepoint, grapheme, 
or decomposed glyph.



[Python-Dev] http.server - reference to bug #427345

2010-11-23 Thread Glenn Linderman
Where might I find the bug #427345 that is referred to in a comment 
inside http.server ?  Here is a code excerpt:


    # throw away additional data [see bug #427345]
    while select.select([self.rfile._sock], [], [], 0)[0]:
        if not self.rfile._sock.recv(1):
            break




Re: [Python-Dev] http.server - reference to bug #427345

2010-11-23 Thread Brian Curtin
On Tue, Nov 23, 2010 at 22:28, Glenn Linderman v+pyt...@g.nevcal.com wrote:

  Where might I find the bug #427345 that is referred to in a comment inside
 http.server ?  Here is a code excerpt:

     # throw away additional data [see bug #427345]
     while select.select([self.rfile._sock], [], [], 0)[0]:
         if not self.rfile._sock.recv(1):
             break


http://bugs.python.org/issue427345

http://bugs.python.org/ has a box on the left-hand side where you can enter
issue numbers.
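
For readers wondering what the quoted loop does, here is a standalone sketch of the same polling technique: select() with a timeout of 0 never blocks, so the loop discards only what is already pending. The drain() function and the socketpair demonstration are illustrative assumptions, not code from http.server.

```python
import select
import socket


def drain(sock):
    """Discard whatever is already readable on `sock` without blocking.

    Returns the number of bytes thrown away.
    """
    discarded = 0
    while select.select([sock], [], [], 0)[0]:
        chunk = sock.recv(1)
        if not chunk:            # empty read: the peer closed the connection
            break
        discarded += len(chunk)
    return discarded


# Demonstration with a connected pair of local sockets.
left, right = socket.socketpair()
left.sendall(b'\r\n')                    # stray trailing bytes from the peer
select.select([right], [], [], 1.0)      # wait until they have arrived
print(drain(right))
left.close()
right.close()
```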


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread Stephen J. Turnbull
Note that I'm not saying that there shouldn't be a UTF-8 string type;
I'm just saying that for some purposes it might be a good idea to keep
UTF-16 and UTF-32 string types around.

Glyph Lefkowitz writes:

   The theory is that accessing the first character of a region in a
   string often occurs as a primitive operation in O(N) or worse
   algorithms, sometimes without enough locality at the collection of
   regions level to give a reasonably small average access time.
  
  I'm not sure what you mean by "the theory is".  Whose theory?  About what?

Mine.  About why somebody somewhere someday would need fast random
access to character positions.  "Nobody ever needs that" is a strong
claim.

   In practice, any *Emacs user can tell you that yes, we do need to be
   able to access the Nth codepoint in a buffer in constant time.  The
   O(N) behavior of current Emacs implementations means that people often
   use a binary coding system on large files.  Yes, some position caching
   is done, but if you have a large file (eg, a mail file) which is
   virtually segmented using pointers to regions, locality gets lost.
   (This is not a design bug, this is a fundamental requirement: consider
   fast switching between threaded view and author-sorted view.)
  
  Sounds like a design bug to me.  Personally, I'd implement fast
  switching between threaded view and author-sorted view the same
  way I'd address any other multiple-views-on-the-same-data problem.
  I'd retain data structures for both, and update them as the
  underlying model changed.

Um, that's precisely the design I'm talking about.  But as you
recognize later, the message content is not part of those structures
because there's no real point in copying it *if you have fast access
to character positions*.  In a variable width character, character-
addressed design, there can be a perceptible delay in accessing even
the next message's content if you're in the wrong view.

  These representations may need to maintain cursors into the
  underlying character data, if they must retain giant wads of
  character data as an underlying representation (arguably the _main_
  design bug in Emacs, that it encourages you to do that for
  everything, rather than imposing a sensible structure), but those
  cursors don't need to be code-point counters; they could be byte
  offsets, or opaque handles whose precise meaning varied with the
  potentially variable underlying storage.

Both byte offsets and opaque handles really really suck to design,
implement, and maintain, if Lisp or Python level users can use them.
They're hard enough to do when you can hide them behind internal APIs,
but if they're accessible to users they're an endless source of user
bugs.  What was that you were saying about the difficulty of
remembering which argument is the fd?  It's like that.  Sure, you can
design APIs to help get that right, but it's not easy to provide one
that can be used for all the different applications out there.

  Also, please remember that Emacs couldn't be implemented with giant
  Python strings anyway: crucially, all of this stuff is _mutable_ in
  Emacs.

No, that's a red herring.  The use-cases where Emacs users complain
most is browsing giant logs and reading old mail; neither needs the
content to be mutable (although of course it's a convenience in the
mail case if you delete messages or fetch new mail, but that could be
done with transaction logs that get appended to the on-disk file).

  Case in point: occur needs to scan the buffer anyway; you can't
  do better than linear time there.  So you're going to iterate
  through the buffer, using one of the techniques that James
  proposed, and remember some locations.  Why not just have those
  locations be opaque cursors into your data?

They are.  But unless you're willing to implement correct character
motion, they need to be character indices, which will be slow to
access the actual locations.  We've implemented caches, as does Emacs,
but they don't always get hits.  Finding an arbitrary position once
can involve perceptible delay on up to 1GHz machines; doing it in a
loop (which mail programs have a habit of doing) could be very
painful.
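
One way to picture the trade-off being argued here is a sparse position index: record the byte offset of every Nth code point, then scan forward from the nearest checkpoint. The sketch below is an assumption about how such a cache could look (the Utf8Index class and its stride parameter are hypothetical), not a description of what Emacs or XEmacs actually does.

```python
class Utf8Index:
    """Map code point indices to byte offsets in an immutable UTF-8 buffer.

    One checkpoint is stored per `stride` code points, so a lookup scans
    at most `stride - 1` code points instead of the whole buffer.
    """

    def __init__(self, data, stride=64):
        self.data = data
        self.stride = stride
        self.checkpoints = []
        count = 0
        for offset, byte in enumerate(data):
            if byte & 0xC0 != 0x80:        # not a continuation byte
                if count % stride == 0:
                    self.checkpoints.append(offset)
                count += 1
        self.length = count                # number of code points

    def byte_offset(self, index):
        """Return the byte offset at which code point `index` starts."""
        if not 0 <= index < self.length:
            raise IndexError(index)
        offset = self.checkpoints[index // self.stride]
        for _ in range(index % self.stride):
            offset += 1                    # step off the current lead byte
            while self.data[offset] & 0xC0 == 0x80:
                offset += 1                # skip continuation bytes
        return offset


text = 'a\u00e9\U00010140z' * 100
index = Utf8Index(text.encode('utf-8'), stride=16)
print(index.length, index.byte_offset(3))
```

The index costs one stored offset per stride code points rather than one per position, and lookup time is bounded by the stride; it only works this simply because the buffer is immutable.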

  In summary: you're right, in that James missed a spot.  You need
  bidirectional, *copyable* iterators that can traverse the string by
  byte, codepoint, grapheme, or decomposed glyph.

That's a good start, yes.  But once you talk about remembering some
locations, you're implicitly talking about random access.  Either you
maintain position indexes which naively implemented can easily be
close to the size of the text buffer (indexes are going to be at least
4 bytes, possibly 8, per position, and something like occur can
generate a lot of positions) -- in which case you might as well just
use a representation that is an array in the first place -- or you
need to implement a position cache which can be very hairy to do well.
Or you can give user programs memory indices, and enjoy 

Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread James Y Knight
On Nov 24, 2010, at 12:07 AM, Stephen J. Turnbull wrote:
 Or you can give user programs memory indices, and enjoy the fun as
 the poor developers do things like pos += 1 which works fine on
 the ASCII data they have lying around, then wonder why they get
 Unicode errors when they take substrings.


a) You seem to be hung up on implementation details of Emacs. But yes, positions 
should be stored as a byte offset into the UTF-8 string, NOT as the number of 
codepoints since the beginning of the string. You probably want it to be 
somewhat opaque, so that you actually have to specify whether you want to go 
to +1 byte, codepoint, or grapheme.

b) Those poor developers are *already* screwed if they're using pos += 1 when 
pos is a codepoint index and they then take a substring based on that! They 
will get half a character when the string contains combining characters...

Pretending that codepoints are a useful abstraction just lets poor 
developers get by without doing the correct thing (incrementing to the next 
grapheme boundary) for a little bit longer. But once you [the language 
implementor] are providing correct abstractions for grapheme movement, it's 
just as easy to also provide an abstraction for codepoint movement, and make 
your low-level implementation of the iterator object be a byte-offset into a 
UTF8 buffer.
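
A minimal sketch of such an iterator object, assuming valid UTF-8 input. The Utf8Cursor class is hypothetical, it only moves forward, it does no bounds checking at the end of the buffer, and its grapheme step is a deliberate approximation (base character plus following combining marks) rather than the full Unicode segmentation rules.

```python
import unicodedata


class Utf8Cursor:
    """A copyable position in a UTF-8 buffer, stored as a byte offset."""

    def __init__(self, data, offset=0):
        self.data = data
        self.offset = offset

    def copy(self):
        return Utf8Cursor(self.data, self.offset)

    def _char_at(self, offset):
        """Decode the code point starting at `offset`; return (char, end)."""
        end = offset + 1
        while end < len(self.data) and self.data[end] & 0xC0 == 0x80:
            end += 1
        return self.data[offset:end].decode('utf-8'), end

    def next_byte(self):
        self.offset += 1

    def next_codepoint(self):
        _, self.offset = self._char_at(self.offset)

    def next_grapheme(self):
        # Approximation: one base character plus any combining marks after it.
        _, self.offset = self._char_at(self.offset)
        while self.offset < len(self.data):
            char, end = self._char_at(self.offset)
            if not unicodedata.combining(char):
                break
            self.offset = end


data = 'e\u0301x'.encode('utf-8')      # 'e' + COMBINING ACUTE ACCENT + 'x'
cursor = Utf8Cursor(data)
cursor.next_grapheme()
print(cursor.offset)
```

The caller has to say which unit it wants to step by, so "pos += 1" on ASCII test data cannot silently turn into a mid-character split later.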

James


Re: [Python-Dev] len(chr(i)) = 2?

2010-11-23 Thread James Y Knight
On Nov 24, 2010, at 12:07 AM, Stephen J. Turnbull wrote:
 By the way, to send the ball back into your court, I have this feeling
 that the demand for UTF-8 is once again driven by native English
 speakers who are very shortly going to find themselves, and the data
 they are most familiar with, very much in the minority.  Of course the
 market that benefits from UTF-8 compression will remain very large for
 the immediate future, but in the grand scheme of things, most of the
 world is going to prefer UTF-16 by a substantial margin.

No, the demand for UTF-8 is because that's what much of the internet (and not 
coincidentally, unix) world has standardized on. The main pieces of software 
using UTF-16 (Windows, Java) started doing so before it became apparent that 16 
bits weren't enough to actually hold a Unicode codepoint, so they were actually 
implementing UCS-2. In those days, UCS-2 was a fairly sensible choice.

But, now, if your choices are UTF-8 or UTF-16, UTF-8 is clearly superior. Not 
because it's smaller -- it's pretty much a tossup -- but because it is an ASCII 
superset, and thus more easily compatible with other software. That also makes 
it most commonly used for internet communication. (So, there's a huge advantage 
for using it internally as well right there: no transcoding necessary for 
writing your HTML output). UTF-16 is incompatible with ASCII, and furthermore, 
it's still a variable-width encoding, with all the same issues that causes. As 
such, there's really very little to be said in favor of it.

If you really want a fixed-width encoding, you have to go to UTF-32, which is 
excessively large. UTF-32 is a losing choice, simply because of the wasted 
memory usage.
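
The size claims are easy to check for any given text. The sketch below simply measures encoded lengths; the sample strings are arbitrary choices made for illustration, and the result for real data depends entirely on its mix of scripts and markup.

```python
def encoded_sizes(text):
    """Return the encoded length in bytes for each of the three encodings."""
    encodings = ('utf-8', 'utf-16-le', 'utf-32-le')
    return {enc: len(text.encode(enc)) for enc in encodings}


samples = {
    'ASCII': 'The quick brown fox',
    'Japanese': 'こんにちは世界',
    'Japanese in HTML': '<p>こんにちは世界</p>',
}

for label, text in samples.items():
    print(label, encoded_sizes(text))

# UTF-8 is an ASCII superset: pure ASCII text encodes to identical bytes.
print('abc'.encode('utf-8') == 'abc'.encode('ascii'))
```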

But that's all a side issue: even if you do choose UTF-16 as your underlying 
encoding, you *still* need to provide iterators that work by byte (only now 
bytes are 16 bits), by codepoint, and by grapheme. Of course, people who 
implement UTF-16 (such as Python, Java, and Windows) often pretend they're 
still implementing UCS-2, and don't bother even providing their users with the 
necessary APIs to do things correctly. Which, you can often get away 
with...just so long as you don't mind that you sometimes end up splitting a 
string in the middle of a codepoint and causing a unicode error!
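
The failure mode can be reproduced on any Python 3 by working on the UTF-16 code units directly. This is only an illustration of the splitting problem, not of how any particular implementation slices strings.

```python
text = 'a\U00010140b'
units = text.encode('utf-16-le')   # 'a', high surrogate, low surrogate, 'b'
print(len(units) // 2)             # number of 16-bit code units

# Cut after the second code unit, i.e. between the two surrogates.
broken = units[:4]
try:
    broken.decode('utf-16-le')
    split_ok = True
except UnicodeDecodeError as exc:
    split_ok = False
    print('decode failed:', exc.reason)
```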

James


Re: [Python-Dev] Web servers, bytes, str, documentation, Python 3.2a4

2010-11-23 Thread Glenn Linderman

On 11/21/2010 8:39 PM, R. David Murray wrote:

On Sun, 21 Nov 2010 19:59:54 -0800, Glenn Linderman v+pyt...@g.nevcal.com wrote:

On 11/21/2010 9:18 AM, R. David Murray wrote:

I want to look at the CGI issue, but I'm not sure when I'll get to it.

Actually, since this code was working before 3.x, and if email.parser
can now accept binary streams, it seems like maybe the only thing that
might be wrong is that presently it is getting a text stream instead, so
that is something cgi.py or the application program would have to
switch, and then maybe some testing would discover correctness, or maybe
a specification of UTF-8 as the encoding to use for the text parts would
have to be done.

Well, given the bytes/string split in Python3, code definitely has to
be changed to make this work, since you have to explicitly call bytes
processing routines (message_from_bytes, message_from_binary_file,
BytesFeedParser, etc) to parse binary data, and likewise use
BytesGenerator to emit binary data.
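
A short sketch of the bytes-oriented email API named above, as it exists in Python 3.2 and later. The sample message and the 16-byte chunk size are made up for illustration.

```python
import email
from email.generator import BytesGenerator
from email.parser import BytesFeedParser
from io import BytesIO

raw = (b'Subject: form data\r\n'
       b'Content-Type: text/plain; charset="utf-8"\r\n'
       b'Content-Transfer-Encoding: 8bit\r\n'
       b'\r\n'
       b'caf\xc3\xa9\r\n')

# Parse directly from bytes instead of decoding to str first.
msg = email.message_from_bytes(raw)

# Incremental parsing of a binary stream, one chunk at a time.
parser = BytesFeedParser()
for start in range(0, len(raw), 16):
    parser.feed(raw[start:start + 16])
msg_from_chunks = parser.close()

# Emit the message back out as bytes.
out = BytesIO()
BytesGenerator(out).flatten(msg)

print(msg['Subject'], msg_from_chunks['Subject'])
```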


Looks like cgi.py also calls http.client and both of them would need to 
be changed to deal with bytes.  I don't have the full translation of API 
calls in my head, nor have I ever used the email.parser API to know what 
the calls actually do... just read a bit about it... but that is 
different than using it...


However, I find code in http.client.parse_headers that is attempting to 
work around reading a binary stream and feeding email.parser a string.  
So there is definitely some work to be done to fix things.


I did add some explicit threads to http.server CGI script code that I 
think work around the deadlocks that can result from attempting to 
serialize 3 pipes, and yet not require full buffering of stdin or 
stdout.  At the moment, I still am doing full buffering of stderr, but 
that is thought to be small potatoes in an http.server environment, 
generally.
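
The general shape of that approach can be sketched as follows. This is not the http.server patch itself; run_with_threads() and the child command are hypothetical, and a real server would forward each stdout chunk to the client instead of collecting it.

```python
import subprocess
import sys
import threading


def run_with_threads(argv, stdin_data):
    """Run a child process without deadlocking on its three pipes.

    stdin is fed and stderr is collected on helper threads, while stdout
    is streamed chunk by chunk on the calling thread.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stderr_chunks = []

    def feed_stdin():
        try:
            proc.stdin.write(stdin_data)
            proc.stdin.close()           # signals end of input to the child
        except BrokenPipeError:
            pass                         # child exited without reading it all

    def collect_stderr():
        stderr_chunks.append(proc.stderr.read())   # fully buffered, as above

    threads = [threading.Thread(target=feed_stdin),
               threading.Thread(target=collect_stderr)]
    for thread in threads:
        thread.start()

    stdout_chunks = []
    for chunk in iter(lambda: proc.stdout.read(4096), b''):
        stdout_chunks.append(chunk)      # a server would send this on here

    for thread in threads:
        thread.join()
    proc.stdout.close()
    proc.stderr.close()
    return proc.wait(), b''.join(stdout_chunks), b''.join(stderr_chunks)


child = [sys.executable, '-c',
         'import sys; sys.stdout.write(sys.stdin.read().upper()); '
         'sys.stderr.write("done")']
print(run_with_threads(child, b'hello'))
```

Because each pipe has its own reader or writer, no pipe can fill up while the parent is blocked on another one, which is the deadlock a single-threaded loop over three pipes runs into.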


But since my test case is CGI form data, I'm stuck until this is 
fixed, or I wrap my head around the code in http.client and 
email.parser.  But not tonight (yawn!).

