Re: [Python-Dev] setuptools in the stdlib ([Python-checkins] r45510 - python/trunk/Lib/pkgutil.py python/trunk/Lib/pydoc.py)

2006-04-18 Thread M.-A. Lemburg
Phillip J. Eby wrote:
 At 10:55 AM 4/18/2006 +0200, M.-A. Lemburg wrote:
 Phillip.eby wrote:
  Author: phillip.eby
  Date: Tue Apr 18 02:59:55 2006
  New Revision: 45510
 
  Modified:
 python/trunk/Lib/pkgutil.py
 python/trunk/Lib/pydoc.py
  Log:
  Second phase of refactoring for runpy, pkgutil, pydoc, and setuptools
  to share common PEP 302 support code, as described here:
 
  http://mail.python.org/pipermail/python-dev/2006-April/063724.html

 Shouldn't this new module be named pkglib to be in line with
 the naming scheme used for all the other utility modules, e.g. httplib,
 imaplib, poplib, etc. ?
 
 It's not a new module; it was previously a module with only one function
 in it, introduced in Python 2.3.  If it were a new module, I'd have
 inquired about a name for it first.

Didn't realize that. Too bad the time machine didn't work on
this one :-/

  pydoc now supports PEP 302 importers, by way of utility functions in
  pkgutil, such as 'walk_packages()'.  It will properly document
  modules that are in zip files, and is backward compatible to Python
  2.3 (setuptools installs for Python 2.5 will bundle it so pydoc
  doesn't break when used with eggs.)
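
As a rough illustration of what PEP 302 support in pkgutil makes possible, the sketch below lists modules stored inside a zip archive. It is written for a current Python 3, uses pkgutil.iter_modules() (the non-recursive sibling of walk_packages()), and the archive name and the modules alpha and beta are invented for the example.

```python
import os
import pkgutil
import tempfile
import zipfile


def modules_in_zip(zip_path):
    """Return the sorted names of top-level modules found in a zip archive."""
    return sorted(info.name for info in pkgutil.iter_modules([zip_path]))


# Build a throwaway archive holding two hypothetical modules.
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")
with zipfile.ZipFile(archive, "w") as bundle:
    bundle.writestr("alpha.py", "VALUE = 1\n")
    bundle.writestr("beta.py", "VALUE = 2\n")

found = modules_in_zip(archive)
print(found)
```

The archive is never unpacked; the zip importer reports its contents directly, which is the behaviour pydoc relies on in the change described above.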

 Are you saying that the installation of setuptools in Python 2.3
 and 2.4 will then overwrite the standard pydoc included with
 those versions ?
 
 Yes.  As far as I can tell, there were no API changes to pydoc during
 this time, so this is effectively a hot fix.

Why should a 3rd party extension be hot-fixing the standard
Python distribution ?

 This hot-fixing doesn't apply to setuptools system packages built with
 --root or --single-version-externally-managed, however, so OS vendors
 who build packages that wrap setuptools will have to make an explicit
 decision whether to also apply any fixes.  If they do not, an end-user
 can of course install setuptools in their local PYTHONPATH and the
 hotfix will take effect.

What does setuptools have to do with pydoc ? Why should a user
installing setuptools assume that some arbitrary stdlib modules
get (effectively) replaced by installing setuptools ?

If you want to provide a hot fix for Python 2.3 and 2.4, you
should make this a separate install, so that users are aware
that their Python distribution is about to get modified in ways
that have nothing to do with setuptools.

 I'm bothered by the fact that installing setuptools actually changes
 the standard Python installation by either overriding stdlib modules
 or monkey-patching them at setuptools import time.
 
 Please feel free to propose alternative solutions that will still ensure
 that setuptools just works for end-users.  Both this and the pydoc
 hotfix are practicality beats purity issues.

Not really. I'd consider them design flaws.

distutils is built to be extended without having to monkey-patch
it, e.g. you can easily override commands with your own variants
by supplying them via the cmdclass and distclass keyword arguments
to setup().

By monkey patching distutils during import of setuptools,
you effectively *change* distutils at run-time and not only
for the application space that you implement in setuptools,
but for all the rest of the application.

If an application wants to use setuptools for e.g. plugin
management, then standard distutils features will get
replaced by setuptools implementations which are not compatible
to the standard distutils commands, effectively making it
impossible to access the original versions.

Monkey patching is only a last resort in case nothing
else works. In this case, it's not needed, since distutils
provides the interfaces needed to extend its command classes
via the setup() call.

See e.g. mxSetup.py in the eGenix extensions for an example
of how effectively the distutils design can be put to use
without having to introduce lots of unwanted side-effects.

 Add setuptools to the stdlib ? I'm still missing the PEP for this
 along with the needed discussion touching among other things,
 the change of the distutils standard python setup.py install
 to install an egg instead of a site package.
 
 Setuptools in the stdlib simply means that people wanting to use it can
 import it.  It does not affect programs that do not import it.  It also
 means that python -m easy_install is available without having to first
 download ez_setup.py.

Doesn't really seem like much of an argument for the
addition... the above is true for any 3rd party module.

 As for discussion, Guido originally brought up the question here a few
 months ago, and it's been listed in PEP 356 for a while.  I've also
 posted things related to the inclusion both here and in distutils-sig.

I know, but the discussions haven't really helped much in
getting the setuptools design compatible with standard
distutils.

Unless that's being put in place, I'm -1 on the addition,
due to the invasive nature of setuptools and its various
side-effects on systems management.

Note that it only takes one single module in an application
doing import

Re: [Python-Dev] setuptools in the stdlib ([Python-checkins] r45510 - python/trunk/Lib/pkgutil.py python/trunk/Lib/pydoc.py)

2006-04-18 Thread M.-A. Lemburg
Phillip J. Eby wrote:
 At 07:15 PM 4/18/2006 +0200, M.-A. Lemburg wrote:
 Why should a 3rd party extension be hot-fixing the standard
 Python distribution ?
 
 Because setuptools installs things in zip files, and older versions of
 pydoc don't work for packages in zip files.

That doesn't answer my question.

 If you want to provide a hot fix for Python 2.3 and 2.4, you
 should make this a separate install, so that users are aware
 that their Python distribution is about to get modified in ways
 that have nothing to do with setuptools.
 
 Their Python distribution is not modified -- new modules are merely 
 placed on sys.path ahead of the stdlib.  (Which, I might add, is a 
 perfectly normal process in Python -- nothing stops users from installing 
 their own version of pydoc or any other module via PYTHONPATH.)
 
 Note also that uninstalling setuptools by removing the .egg file or 
 directory will effectively remove the hot fix, since the modules live in 
 the .egg, not in the stdlib.

Whether you place a module with the same name in front of
the stdlib path in PYTHONPATH (e.g. copy it into site-packages)
or replace the file in the Python installation is really the
same thing to the user.
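
The shadowing effect under discussion is easy to reproduce. The following sketch, for a current Python 3, puts a file named like a stdlib module into a directory at the front of sys.path; colorsys is an arbitrary choice of victim and HOTFIXED is an invented marker.

```python
import importlib
import os
import sys
import tempfile


def shadow_stdlib_module(name, source):
    """Write `source` as <name>.py in a directory placed ahead of the stdlib."""
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, name + ".py"), "w") as handle:
        handle.write(source)
    sys.path.insert(0, directory)   # searched before the standard library
    sys.modules.pop(name, None)     # forget any copy that was already imported
    importlib.invalidate_caches()
    return importlib.import_module(name)


# The replacement module wins, and the stdlib functions are gone from it.
shadowed = shadow_stdlib_module("colorsys", "HOTFIXED = True\n")
```

No file in the Python installation is touched, yet every later import of the module in this process sees the replacement, which is the point both sides are arguing about.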

Third-party extensions *should not do this* !

It's OK to have private modified copies of a module inside
a package or used inside an application, but
python setup.py install should never (effectively)
replace a Python stdlib module with some modified copy
without explicit user interaction.

 If an application wants to use setuptools for e.g. plugin
 management, then standard distutils features will get
 replaced by setuptools implementations which are not compatible
 to the standard distutils commands, effectively making it
 impossible to access the original versions.
 
 Please do a little research before you spread FUD.  The 'pkg_resources' 
 module is used for runtime plugin management, and it does not monkeypatch 
 anything.

I'm talking about the setuptools package which does apply
monkey patching and is needed to manage the download and
installation of plugin eggs, AFAIK.

 Monkey patching is only a last resort in case nothing
 else works. In this case, it's not needed, since distutils
 provides the interfaces needed to extend its command classes
 via the setup() call.
 
 The monkeypatching is there so that the easy_install command can build eggs 
 for packages that use the distutils.  It's also there so that other 
 distutils extensions that monkeypatch distutils (and there are a few of 
 them out there) will have a better chance of working with setuptools.
 
 I originally took a minimally-invasive approach to setuptools-distutils 
 interaction, but it was simply not possible to provide a high degree of 
 backward and sideward compatibility without it.  In fact, I seem to 
 recall finding some behaviors in some versions of distutils that can't be 
 modified without monkeypatching, although the details are escaping me at 
 this moment.

That's a very vague comment.

The distutils mechanism for providing your own command classes
lets you take complete control over distutils if needed.

What's good about it, is that this approach doesn't modify anything
inside distutils at run-time, but does these modifications
on a per-setup()-call basis.

As for setuptools, you import the package and suddenly distutils
isn't what's documented on python.org anymore.

 As for discussion, Guido originally brought up the question here a few
 months ago, and it's been listed in PEP 356 for a while.  I've also
 posted things related to the inclusion both here and in distutils-sig.

 I know, but the discussions haven't really helped much in
 getting the setuptools design compatible with standard
 distutils.
 
 That's because the job was already done.  :) 

Not much of an argument, if you ask me.

Some of the design decisions you made in setuptools are simply wrong
IMHO and these need to be discussed in a PEP process.

 The setuptools design bends 
 over backwards to be compatible with Python 2.3 and 2.4 versions of 
 distutils, not to mention py2exe, Pyrex, and other distutils extensions, 
 along with the quirky uses of distutils that exist in dozens of distributed 
 Python packages.
 
 However, I think you and I may perhaps have different definitions of 
 compatibility.  Mine is that things just work and users don't have to 
 do anything special.  For that definition, setuptools is extremely 
 compatible with the standard distutils.  In many situations it's more 
 compatible than the distutils themselves, in that more things just work.  ;)

You've implemented your own view of just works. This is fine,
but please remember that Python is a collaborative work, so
design decisions have to be worked out in collaboration as
well.

There aren't all that many things that are wrong in setuptools,
but some of them are essential:

* setuptools should not monkey patch distutils on import

* the standard python setup.py install should continue

Re: [Python-Dev] 2.5a1 Performance

2006-04-18 Thread M.-A. Lemburg
Tim Peters wrote:
 [M.-A. Lemburg]
 I could contribute pybench to the Tools/ directory if that
 makes a difference:
 
 +1.  It's frequently used and nice work.  Besides, then we could
 easily fiddle the tests to make Python look better ;-)

That's a good argument :-)

Note that the tests are versioned and the tool refuses to
compare tests with different version numbers.
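
The idea of refusing cross-version comparisons can be sketched as follows. This is not pybench's code or data format; the result dictionaries and the VersionMismatch exception are hypothetical.

```python
class VersionMismatch(Exception):
    """Raised when two results come from different revisions of a test."""


def compare(result_a, result_b):
    """Return the runtime ratio b/a, refusing to mix test versions."""
    if result_a["version"] != result_b["version"]:
        raise VersionMismatch(
            "cannot compare test version %s with %s"
            % (result_a["version"], result_b["version"])
        )
    return result_b["seconds"] / result_a["seconds"]


ratio = compare(
    {"version": "1.3", "seconds": 2.0},
    {"version": "1.3", "seconds": 3.0},
)
```

Once a test body is edited and its version bumped, old and new timings can no longer be compared by accident.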

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 18 2006)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] unicode vs buffer (array) design issue can crash interpreter

2006-04-14 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 Neal Norwitz wrote:
 I'll leave this decision to Martin or someone else, since I'm not
 familiar with the ramifications.  Since it was documented as unsigned,
 I think it's reasonable to consider changing.  Though it could create
 signed-ness warnings in other modules.  I'm not sure but it's possible
 it could create problems for C++ compilers since they are pickier.
 
 My concern is not so much that it becomes unsigned in 2.4.4, but that
 it stops being a typedef for wchar_t on Linux. C++ code that uses that
 assumption might stop compiling.

I'd argue that such code is broken anyway: 3rd party code simply
cannot make any assumptions on the typedef behind Py_UNICODE.

Note that you'd only see this change when compiling Python in the
non-standard UCS4 setting on Linux.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] unicode vs buffer (array) design issue can crash interpreter

2006-04-14 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 I'd argue that such code is broken anyway: 3rd party code simply
 cannot make any assumptions on the typedef behind Py_UNICODE.
 
 The code would work just fine now, but will stop working with the
 change. Users of the code might not be open to arguments.

Fair enough. Let's leave things as they are for 2.4, then.

 In any case, it's up to the release manager to decide.
 
 Note that you'd only see this change when compiling Python in the
 non-standard UCS4 setting on Linux.
 
 Sure. That happens to be the default on many Linux distributions,
 though.

Unfortunately, that's true.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] unicode vs buffer (array) design issue can crash interpreter

2006-04-13 Thread M.-A. Lemburg
Neal Norwitz wrote:
 On 3/31/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 Martin v. Löwis wrote:
 Neal Norwitz wrote:
 See http://python.org/sf/1454485 for the gory details.  Basically if
 you create a unicode array (array.array('u')) and try to append an
 8-bit string (ie, not unicode), you can crash the interpreter.

 The problem is that the string is converted without question to a
 unicode buffer.  Within unicode, it assumes the data to be valid, but
 this isn't necessarily the case.  We wind up accessing an array with a
 negative index and boom.
 There are several problems combined here, which might need discussion:

 - why does the 'u#' converter use the buffer interface if available?
   it should just support Unicode objects. The buffer object makes
   no promise that the buffer actually is meaningful UCS-2/UCS-4, so
   u# shouldn't guess that it is.
   (FWIW, it currently truncates the buffer size to the next-smaller
multiple of sizeof(Py_UNICODE), and silently so)

   I think that part should just go: u# should be restricted to unicode
   objects.
 'u#' is intended to match 's#' which also uses the buffer
 interface. It expects the buffer returned by the object
 to be a Py_UNICODE* buffer, hence the calculation of the
 length.

 However, we already have 'es#' which is a lot safer to use
 in this respect: you can explicitly define the encoding you
 want to see, e.g. 'unicode-internal' and the associated
 codec also takes care of range checks, etc.

 So, I'm +1 on restricting 'u#' to Unicode objects.
 
 Note:  2.5 no longer crashes, 2.4 does.
 
 Does this mean you would like to see this patch checked in to 2.5? 

Yes.

 What should we do about 2.4?

Perhaps you could add a warning that is displayed when
using u# with non-Unicode objects ?!

 Index: Python/getargs.c
 ===================================================================
 --- Python/getargs.c  (revision 45333)
 +++ Python/getargs.c  (working copy)
 @@ -1042,11 +1042,8 @@
                  STORE_SIZE(PyUnicode_GET_SIZE(arg));
              }
              else {
 -                char *buf;
 -                Py_ssize_t count = convertbuffer(arg, p, &buf);
 -                if (count < 0)
 -                    return converterr(buf, arg, msgbuf, bufsize);
 -                STORE_SIZE(count/(sizeof(Py_UNICODE)));
 +                return converterr("cannot convert raw buffers",
 +                                  arg, msgbuf, bufsize);
              }
              format++;
          } else {
 
 - should Python guarantee that all characters in a Unicode object
   are between 0 and sys.maxunicode? Currently, it is possible to
   create Unicode strings with either negative or very large Py_UNICODE
   elements.

 - if the answer to the last question is no (i.e. if it is intentional
   that a unicode object can contain arbitrary Py_UNICODE values): should
   Python then guarantee that Py_UNICODE is an unsigned type?
 Py_UNICODE must always be unsigned. The whole implementation
 relies on this and has been designed with this in mind (see
 PEP 100). AFAICT, the configure does check that Py_UNICODE
 is always unsigned.
 
 Martin fixed the crashing problem in 2.5 by making wchar_t unsigned
 which was a bug.  (A configure test was reversed IIRC.)  Can this
 change to wchar_t be made in 2.4?  That technically changes all the
 interfaces even though it was a mistake.  What should be done for 2.4?

If users want to interface from wchar_t to Python's Unicode
type they have to go through the PyUnicode_FromWideChar()
and PyUnicode_AsWideChar() interfaces. Assuming that Py_UNICODE
is the same as wchar_t is simply wrong (and always was).

I also think that changing the type from signed to unsigned
by backporting the configure fix will only make things safer
for the user, since extensions will probably not even be aware
of the fact that Py_UNICODE could be signed (it has always been
documented to be unsigned).

So +1 on backporting the configure test fix to 2.4.
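
Why signedness matters can be shown with a few lines of ctypes. The sketch assumes a 4-byte code unit, as in a UCS4 build; the helper names are invented for the illustration.

```python
import ctypes


def as_signed_32(bits):
    """Read a 32-bit pattern the way a signed 4-byte type would."""
    return ctypes.c_int32(bits).value


def as_unsigned_32(bits):
    """Read the same 32-bit pattern the way an unsigned 4-byte type would."""
    return ctypes.c_uint32(bits).value


raw = 0xFFFFFFFF
signed_view = as_signed_32(raw)      # negative: unusable as a table index
unsigned_view = as_unsigned_32(raw)  # large, but at least never negative
```

With a signed type, code that indexes a lookup table by code unit can end up with a negative index, which matches the crash described earlier in the thread.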

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] int()'s ValueError behaviour

2006-04-09 Thread M.-A. Lemburg
Guido van Rossum wrote:
 Go ahead and fix it. This was probably never changed since 1990 or
 so... Do expect some code breakage where people rely on the old
 behavior. :-(
 
 --Guido
 
 On 4/9/06, Thomas Wouters [EMAIL PROTECTED] wrote:
 Someone on IRC (who refuses to report bugs on sourceforge, so I guess he
 wants to remain anonymous) came with this very amusing bug: int(), when
 raising ValueError, doesn't quote (or repr(), rather) its arguments:

 >>> int("")
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 ValueError: invalid literal for int():
 >>> int("34\n\n\n5")
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 ValueError: invalid literal for int(): 34


 5
 Unicode behaviour also isn't always consistent:
 >>> int(u'\u0100')
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 UnicodeEncodeError: 'decimal' codec can't encode character u'\u0100' in
 position 0: invalid decimal Unicode string
 >>> int(u'\u09ec', 6)
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 ValueError: invalid literal for int(): 6

 And trying to use the 'decimal' codec directly:
 >>> u'6'.encode('decimal')
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 LookupError: unknown encoding: decimal

This part I can explain: the internal decimal codec isn't
made public through the codec registry since it only
supports encoding.

The encoder converts Unicode decimal strings to plain
ASCII decimals.

The error message looks like a standard codec error message
because the raise_encode_exception() API is used.
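
What such an encoder does can be approximated in pure Python with unicodedata. This is a simplified sketch for a current Python 3, not the internal codec: it handles decimal digits only, and to_ascii_decimal is an invented name.

```python
import unicodedata


def to_ascii_decimal(text):
    """Map each Unicode decimal digit in `text` to its ASCII digit."""
    digits = []
    for char in text:
        # unicodedata.decimal() raises ValueError for non-decimal characters.
        digits.append(str(unicodedata.decimal(char)))
    return "".join(digits)


bengali_six = to_ascii_decimal("\u09ec")          # BENGALI DIGIT SIX
arabic_twelve = to_ascii_decimal("\u0661\u0662")  # ARABIC-INDIC DIGITS ONE, TWO
```

A character such as u'\u0100' has no decimal value, so the conversion fails for it, just as in the traceback quoted above.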

 I'm not sure if the latter problems are fixable, but the former should be
 fixed by passing the argument to ValueError through repr(), I think. It's
 also been suggested (by the reporter, and I agree) that the actual base
 should be in the error message too. Is there some reason not to do this that
 I've overlooked?
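
The suggested fix, passing the argument through repr() and naming the base, might look like the sketch below. parse_int is a hypothetical wrapper written for a current Python 3, not the change that was made to int() itself.

```python
def parse_int(text, base=10):
    """Convert `text` to an int, reporting failures with repr() and the base."""
    try:
        return int(text, base)
    except ValueError:
        # %r keeps newlines and other control characters visible as escapes.
        raise ValueError(
            "invalid literal for int() with base %d: %r" % (base, text)
        ) from None


try:
    parse_int("34\n\n\n5")
except ValueError as exc:
    message = str(exc)
```

Because the argument is shown through repr(), the message stays on a single line and an empty string shows up as '' instead of nothing at all.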

 --
 Thomas Wouters [EMAIL PROTECTED]

 Hi! I'm a .signature virus! copy me into your .signature file to help me
 spread!



 
 
 --
 --Guido van Rossum (home page: http://www.python.org/~guido/)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] The i string-prefix: I18n'ed strings

2006-04-07 Thread M.-A. Lemburg
Martin Blais wrote:
 Hi all
 
 I got an evil idea for Python this morning -- Guido: no, it's not
 about linked lists :-) -- , and I'd  like to bounce it here.  But
 first, a bit of context.

This has been discussed a few times before, see e.g.

http://mail.python.org/pipermail/python-list/2000-January/020346.html

In summary, the following points were made in the various
discussions (this is from memory, so I may have forgotten
a few points):

* the string literal modifiers "r" and "u" are really only a kludge
  which should not be extended to other uses

* being able to register such modifiers would result in unreadable
  and unmaintainable code, since the purpose of the used modifiers
  wouldn't be clear to the reader of a code snippet

* writing i"" instead of _("") saves two key-strokes - not really
  enough to warrant the change

* if you want to do it right, you'd also have to add "iu",
  "ir" for completeness

* internationalization requires a lot more than just calling
  a function: context and domains are very important when it
  comes to translating strings in i18n efforts; these can
  easily be added to a function call as parameter, but not
  to a string modifier

* there are lots of tools to do string extraction using the
  _() notation (which also works in C); for i"" such tools
  would have to be rewritten
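
The point about context and domains can be made concrete with a toy lookup. The catalog, its keys and the translate() helper are all invented for this sketch; a real application would use the gettext machinery instead.

```python
# Hypothetical catalog keyed by (domain, context, message).
CATALOG = {
    ("editor", "menu", "Open"): "Öffnen",
    ("editor", "status", "Open"): "Offen",
}


def translate(message, domain, context=None):
    """Look up `message`, falling back to the untranslated text."""
    return CATALOG.get((domain, context, message), message)


menu_label = translate("Open", domain="editor", context="menu")
status_label = translate("Open", domain="editor", context="status")
fallback = translate("Open", domain="mailer")
```

The same English string needs two different translations here, and a function call has room for the extra arguments that tell them apart. A bare string prefix has nowhere to put them.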

 In the context of writing i18n apps, programmers have to mark
 strings that may be internationalized in a way that
 
 - a special hook gets called at runtime to perform the lookup in a
 catalog of translations setup for a specific language;
 
 - they can be extracted by an external tool to produce the keys of all
 the catalogs, so that translators can update the list of keys to
 translate and produce the values in the target languages.
 
 Usually, you bind a function to a short name, like _() and N_(), and
 it looks kind-of like this::
 
     _("My string to translate.")
 
 or
 
     N_("This is marked for translation") # N_() is a no-op.
 
 pygettext does the work of extracting those patterns from the files,
 doing all the parsing manually, i.e. it does not use runtime Python
 introspection to do this at all; it is just a simple text-parsing
 algorithm (which works pretty well).  I'm simplifying things a bit,
 but that is the gist of how it works, for those not familiar with
 i18n.
 
 
 This morning I woke up staring at the ceiling and the only thing in my
 mind was "my web app code is ugly".  I had visions of LISP parentheses
 with constructs like
 
    ...
    A(P(_("Click here to forget"), href=...
    ...
 
 (In my example, I built a library not unlike stan for creating HTML,
 which is where classes A and P come from.)  I find the i18n markup a
 bit annoying, especially when there are many i18n strings close
 together.  My point is: adding parentheses around almost all strings
 gets tiresome and charges the otherwise divine esthetics of Python
 source code.
 
 (Okie, that's enough for context.)
 
 
 So I had the following idea: would it not be nice if there existed a
 string-prefix 'i' -- a string prefix like for the raw (r'...') and
 unicode (u'...') strings -- that would mark the string as being for
 i18n?   Something like this (reusing my example above)::
 
    A(P(i"Click here to forget", href=...
 
 Notes:
 
 - We could then use the spiffy new AST to build a better parser to
 extract those strings from the source code as well.
 
 - We could also have a prefix I for strings to be marked but not
 runtime-translated, to replace the N_() strings.
 
 - This implies that we would have to introduce some way for these
 strings to call a custom function at runtime.
 
 - My impression is that this process of i18n is common enough that it
 does not move very much, and that there aren't 18 ways to go about
 it either, so that it would be reasonable to consider adding it to the
 language.   This may be completely wrong, I am by no means an i18n
 expert, please show if this is not the case.
 
 - Potential issue: do we still need other prefixes when 'i' is used,
 and if so, how do we combine them...
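
The first note, using the AST to extract marked strings, can be sketched with the ast module of a current Python 3. It handles only the existing _() and N_() call notation, not the proposed prefix, and extract_marked_strings is an invented name.

```python
import ast


def extract_marked_strings(source, markers=("_", "N_")):
    """Return the sorted string literals passed to the marker functions."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        if not (isinstance(node.func, ast.Name) and node.func.id in markers):
            continue
        # Only a literal string as the first positional argument counts.
        if (node.args and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            found.append(node.args[0].value)
    return sorted(found)


SAMPLE = '''
title = _("Click here to forget")
label = N_("This is marked for translation")
other = len("not marked")
'''

keys = extract_marked_strings(SAMPLE)
```

Working on the parsed tree means strings in comments or unrelated calls are ignored without any hand-written tokenizing.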
 
 
 Okay, let's push it further a bit:  how about if there was some kind
 of generic mechanism built-in in Python for adding new string-prefixes
 which invoke callbacks when the string with the prefix is evaluated? 
 This could be used to implement what I'm suggesting above, and beyond.
  Something like this::
 
import i18n
i18n.register_string_prefix('i', _)
i18n.register_string_prefix('I', N_)
 
 I'm not sure what else we might be able to do with this, you may have
 other useful ideas.
 
 
 Any comments welcome.
 
 cheers,
 ___
 Python-Dev mailing list
 Python-Dev@python.org
 http://mail.python.org/mailman/listinfo/python-dev
 Unsubscribe: 
 http://mail.python.org/mailman/options/python-dev/mal%40egenix.com

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] unicode vs buffer (array) design issue can crash interpreter

2006-03-31 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 Neal Norwitz wrote:
 See http://python.org/sf/1454485 for the gory details.  Basically if
 you create a unicode array (array.array('u')) and try to append an
 8-bit string (ie, not unicode), you can crash the interpreter.

 The problem is that the string is converted without question to a
 unicode buffer.  Within unicode, it assumes the data to be valid, but
 this isn't necessarily the case.  We wind up accessing an array with a
 negative index and boom.
 
 There are several problems combined here, which might need discussion:
 
 - why does the 'u#' converter use the buffer interface if available?
   it should just support Unicode objects. The buffer object makes
   no promise that the buffer actually is meaningful UCS-2/UCS-4, so
   u# shouldn't guess that it is.
   (FWIW, it currently truncates the buffer size to the next-smaller
multiple of sizeof(Py_UNICODE), and silently so)
 
   I think that part should just go: u# should be restricted to unicode
   objects.

'u#' is intended to match 's#' which also uses the buffer
interface. It expects the buffer returned by the object
to be a Py_UNICODE* buffer, hence the calculation of the
length.

However, we already have 'es#' which is a lot safer to use
in this respect: you can explicitly define the encoding you
want to see, e.g. 'unicode-internal' and the associated
codec also takes care of range checks, etc.

So, I'm +1 on restricting 'u#' to Unicode objects.
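
The risk of treating an arbitrary buffer as code units can be illustrated without the C API. The sketch below assumes 4-byte little-endian code units and mimics the silent truncation mentioned above; both helper names are invented.

```python
import struct
import sys


def code_unit_count(byte_length, unit_size):
    """Return (whole_units, dropped_bytes) for a raw buffer of that length."""
    units = byte_length // unit_size
    return units, byte_length - units * unit_size


def as_code_units(raw, unit_size=4):
    """Reinterpret raw bytes as unsigned code units, dropping any tail."""
    units, _dropped = code_unit_count(len(raw), unit_size)
    fmt = "<%d%s" % (units, {2: "H", 4: "I"}[unit_size])
    return list(struct.unpack(fmt, raw[:units * unit_size]))


# Seven bytes of plain 8-bit text: one 4-byte unit, three bytes silently lost.
values = as_code_units(b"abcdefg")
out_of_range = values[0] > sys.maxunicode
```

Ordinary 8-bit text reinterpreted this way yields a value far beyond sys.maxunicode, which is why the buffer's contents cannot be trusted to be meaningful Unicode.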

 - should Python guarantee that all characters in a Unicode object
   are between 0 and sys.maxunicode? Currently, it is possible to
   create Unicode strings with either negative or very large Py_UNICODE
   elements.
 
 - if the answer to the last question is no (i.e. if it is intentional
   that a unicode object can contain arbitrary Py_UNICODE values): should
   Python then guarantee that Py_UNICODE is an unsigned type?

Py_UNICODE must always be unsigned. The whole implementation
relies on this and has been designed with this in mind (see
PEP 100). AFAICT, the configure does check that Py_UNICODE
is always unsigned.

Regarding the permitted range of values, I think the necessary
overhead to check that all Py_UNICODE* array values are within
the currently permitted range would unnecessarily slow down
the implementation.
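
The kind of check being weighed here is sketched below. It is a pure-Python illustration with an invented helper name, not something the implementation contains.

```python
import sys


def first_out_of_range(code_units):
    """Return the index of the first value outside 0..sys.maxunicode, or -1."""
    for index, value in enumerate(code_units):
        if not 0 <= value <= sys.maxunicode:
            return index
    return -1


clean = first_out_of_range([0x41, 0x10FFFF])
negative = first_out_of_range([0x41, -1])
too_large = first_out_of_range([0x110000])
```

The scan touches every element, so its cost grows with the length of the buffer on each call where it is applied.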

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] pysqlite for 2.5?

2006-03-30 Thread M.-A. Lemburg
Anthony Baxter wrote:
 On Thursday 30 March 2006 22:31, M.-A. Lemburg wrote:
 I don't really care about the name, but please be aware that
 you are talking about adding a *very* popular module name to
 the top-level Python namespace if you go for db or database.

 Why can't we just have the pysqlite package as top-level
 package, with a slight change in name to prevent clashes
 with the external distribution, e.g. sqlite ?!

 Such a module name is less likely to cause problems.
 
 Excellent point. Hm. Maybe we should just go with 'sqlite', instead.

Anything, but please no db or database top-level module or
package :-)

 Anyway, at the moment, there's 'db.sqlite' all checked in and working 
 on a branch, at
 svn+ssh://[EMAIL PROTECTED]/python/branches/sqlite-integration
 or, if you use a readonly version
 http://svn.python.org/python/branches/sqlite-integration
 
 (you can use 'svn switch URL' to change a current trunk checkout to 
 the branch without having to checkout a whole new version). 
 
 It is from pysqlite 2.1.3. Gerhard is cutting a 2.2.0 which reduces the 
 requirement for the version of sqlite3 that is required. Currently, 
 it needs 3.2.2 or later. There's tests (which pass), setup.py magic 
 to find a correct sqlite3 version, and the like. Still to do:
 
 Windows buildproj
 Documentation
 Upgrade to the updated pysqlite once it's out
 maybe switch from db.sqlite to just sqlite (trivial enough change).

I take it that this is not going to go into 2.5a1 ?!

Also your statement regarding sqlite3 suggests that sqlite
itself is not included - why not ?

Isn't the main argument for having pysqlite included in the
core to be able to play around with SQL without relying
on external libraries ?
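
For a sense of what "playing around with SQL" looks like, here is a small sketch using the module under the name it eventually received in the standard library, sqlite3 (the thread predates that naming). The table and its rows are invented.

```python
import sqlite3


def demo_rows():
    """Create an in-memory database, insert two rows, and read them back."""
    connection = sqlite3.connect(":memory:")
    try:
        connection.execute("CREATE TABLE language (name TEXT, year INTEGER)")
        connection.executemany(
            "INSERT INTO language VALUES (?, ?)",
            [("Python", 1991), ("SQL", 1974)],
        )
        cursor = connection.execute("SELECT name FROM language ORDER BY year")
        return [row[0] for row in cursor]
    finally:
        connection.close()


rows = demo_rows()
```

No server or database file is involved, though the module still needs the SQLite library itself to be available, which is the dependency being questioned here.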

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] pysqlite for 2.5?

2006-03-30 Thread M.-A. Lemburg
Anthony Baxter wrote:
 On Friday 31 March 2006 02:04, M.-A. Lemburg wrote:
 Excellent point. Hm. Maybe we should just go with 'sqlite',
 instead.
 Anything, but please no "db" or "database" top-level module or
 package :-)
 
 
 How about sql? <wink>
 
 I can't think of a better name right now - can anyone else, or should 
 it just go in the top level as 'sqlite'?

I think sqlite is just fine.

 I take it that this is not going to go into 2.5a1 ?!
 
 Well, right now the major missing bits for landing it right now are 
 the windows build project and the documentation. I'm pretty 
 comfortable with landing it for a1. It has tests, I've knitted these 
 into the Python regression testing suite and they're all passing 
 fine. I've tested building on systems with a version of sqlite that 
 is acceptable, with no sqlite, and with an old version of sqlite, and 
 the build process handles it all correctly. 

Will it also work with e.g. sqlite 2.8.15 (i.e. sqlite < v3)?
This is the standard version on SuSE 9.2.

 Also your statement regarding sqlite3 suggests that sqlite
 itself is not included - why not ?
 
 For the same reasons we don't include the BerkeleyDB library. Many, 
 many modern operating systems now ship with libsqlite3 (just as they 
 ship with bsddb).  While sqlite is nowhere near the size of 
 BerkeleyDB, it's still a non-trivial amount of code. 

If it works with sqlite2 then I agree: these versions are
usually available on Unixes. sqlite3 is not as wide-spread
yet.

What about the Windows build ? Will that contain the necessary
DLLs ?

Regards,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] r43214 - peps/trunk/pep-3000.txt

2006-03-23 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 And we still have someone actively interested in maintaining the OS2
 port, it seems.

 Ditto for BeOS, now under the name Zeta OS.
 
 Who is the one interested in maintaining the BeOS port? the last
 checkins related to BeOS seem to originate from the 2001/2002 period.

That would be Donn Cave:

http://bebits.com/app/4232

He's also the one who wrote the Bethon wrapper for the BeOS C++ API.

BTW, the fact that you don't see new checkins doesn't necessarily
mean that a port is no longer used. It may just be that the existing
port still works without changes.

-- 
Marc-Andre Lemburg
eGenix.com



[Python-Dev] [Fwd: buildbot warnings in amd64 gentoo trunk]

2006-03-22 Thread M.-A. Lemburg
Would it be possible to redirect these buildbot messages to the
python-checkins or a separate python-buildbot list ?

-------- Original Message --------
Subject: [Python-Dev] buildbot warnings in amd64 gentoo trunk
Date: Wed, 22 Mar 2006 15:18:20 +0000
From: [EMAIL PROTECTED]
Reply-To: python-dev@python.org
To: python-dev@python.org

The Buildbot has detected a new failure of amd64 gentoo trunk.
Full details are available at:
 http://www.python.org/dev/buildbot/all/amd64%20gentoo%20trunk/builds/115

Buildbot URL: http://www.python.org/dev/buildbot/all/

Build Reason:
Build Source Stamp: [branch trunk] HEAD
Blamelist: barry.warsaw

Build Had Warnings: warnings test

sincerely,
 -The Buildbot


-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Fwd: buildbot warnings in amd64 gentoo trunk]

2006-03-22 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 Would it be possible to redirect these buildbot messages to the
 python-checkins or a separate python-buildbot list ?
 
 Sure. They are sent to python-dev, because I think Tim Peters
 wanted them here.

For the Snake-Farm we had a separate mailing list, so I'd prefer
that if possible. This lets you opt-in to the messages and also
makes it easier to search via the python.org search facility.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Documenting the ssize_t Python C API changes

2006-03-21 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 It's not a waste of time at all: you'd be helping lots and
 lots of developers out there who want to fix their extensions.
 
 This is free software, anybody is free to decide what they do.

With due respect for other developers, yes.

 I don't believe that developers would be helped a lot - they
 can easily search for Py_ssize_t in the header files, and
 find all the APIs that have changed.

Of course they can. We could also stop writing documentation
and tell users to read the code instead - after all, it's
all there, ready to be consumed by interested parties.
Oh, and for changes: we'll just point them to Subversion and
tell them to run a 'svn diff'.

 However, they *should* not have to do that. Instead, they
 should look at it from a conceptual point of view: Does
 that variable count something (memory, number of elements,
 size of some structure). If it does, and it currently counts
 that using an int, it should be changed to use a Py_ssize_t
 instead.
 
 So just review all occurrences of int in your code, and you
 are done. No need to look at API lists.

Just did a grep on the mx Extensions: 17000 cases of 'int'
being used. Sounds like a nice weekend activity...

Seriously, your suggestion on how to port the extensions
to Py_ssize_t is certainly true, but this may not be what
all extension authors would want to do (or at least not
right away). Instead, they'll want to know what changed
and then check their code for uses of the changed APIs,
in particular those APIs where output parameters are
used.
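
To illustrate why the output parameters are the delicate case, here is a small ctypes simulation. It is only a sketch under stated assumptions: a 64-bit build where Py_ssize_t is 8 bytes and int is 4, with `CallerLocals` and `store_wide` as hypothetical stand-ins for an extension's local variables and for an API such as PyDict_Next writing through a `Py_ssize_t *`. No real Python C API call is made.

```python
import ctypes


class CallerLocals(ctypes.Structure):
    """Two adjacent 32-bit slots standing in for a caller's local variables."""
    _fields_ = [
        ("pos", ctypes.c_int32),        # declared as a 32-bit 'int' by old code
        ("neighbour", ctypes.c_int32),  # unrelated data stored right after it
    ]


def store_wide(frame, value):
    """Mimic a callee that stores a 64-bit value through the pointer it received."""
    wide = ctypes.cast(ctypes.pointer(frame), ctypes.POINTER(ctypes.c_int64))
    wide[0] = value


frame = CallerLocals(pos=0, neighbour=0x7777)
store_wide(frame, 1)
print(frame.pos, frame.neighbour)  # 'neighbour' no longer holds 0x7777
```

The write stays inside the 8-byte structure, so the demonstration itself is safe, but the second field is overwritten even though the caller only passed the address of the first. In real C code the neighbouring memory is whatever the compiler placed next to the int.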

I think that documenting these changes is part of doing
responsible development. You seem to disagree.

 The ssize_t patch is the single most disruptive patch in
 Python 2.5, so it deserves special attention.
 
 I can believe you that you would have preferred not to see
 that patch at all, not at this time, and preferably never.
 I have a different view. I don't see it as a problem, but
 as a solution.

You are right in that I would have rather seen this
change go into Py3k than into the 2.x series. You're
wrong in saying that I would have preferred not to
get such a change into Python at all.

I've given up believing that there would be a possibility
of having code that works in both Py3k and Py2.x. I've
also given up believing that code written for Py2.x
will continue to work in Py3k.

I'm still holding on to the belief that the 2.x series will
not introduce major breakage between versions and
that there'll always be some way to write software that
works in 2.n and 2.n+1 for any n.

However, I feel that at least some Python developers
seem to be OK with breaking this possibility, ignoring
all the existing working code that's out there.

 Again, if you think the documentation should be improved,
 go ahead and improve it.

Here's a grep of all the changed/new APIs, please include it
in the PEP.

./dictobject.h:
-- PyAPI_FUNC(int) PyDict_Next(
--  PyObject *mp, Py_ssize_t *pos, PyObject **key, PyObject **value);
-- PyAPI_FUNC(Py_ssize_t) PyDict_Size(PyObject *mp);
./pyerrors.h:
-- PyAPI_FUNC(PyObject *) PyUnicodeDecodeError_Create(
--  const char *, const char *, Py_ssize_t, Py_ssize_t, Py_ssize_t,
--  const char *);
-- PyAPI_FUNC(PyObject *) PyUnicodeEncodeError_Create(
--  const char *, const Py_UNICODE *, Py_ssize_t, Py_ssize_t,
--  Py_ssize_t, const char *);
-- PyAPI_FUNC(PyObject *) PyUnicodeTranslateError_Create(
--  const Py_UNICODE *, Py_ssize_t, Py_ssize_t, Py_ssize_t,
--  const char *);
-- PyAPI_FUNC(int) PyUnicodeEncodeError_GetStart(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeDecodeError_GetStart(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeTranslateError_GetStart(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeEncodeError_SetStart(PyObject *, Py_ssize_t);
-- PyAPI_FUNC(int) PyUnicodeDecodeError_SetStart(PyObject *, Py_ssize_t);
-- PyAPI_FUNC(int) PyUnicodeTranslateError_SetStart(PyObject *, Py_ssize_t);
-- PyAPI_FUNC(int) PyUnicodeEncodeError_GetEnd(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeDecodeError_GetEnd(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeTranslateError_GetEnd(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeEncodeError_SetEnd(PyObject *, Py_ssize_t);
-- PyAPI_FUNC(int) PyUnicodeDecodeError_SetEnd(PyObject *, Py_ssize_t);
-- PyAPI_FUNC(int) PyUnicodeTranslateError_SetEnd(PyObject *, Py_ssize_t);
./tupleobject.h:
-- PyAPI_FUNC(PyObject *) PyTuple_New(Py_ssize_t size);
-- PyAPI_FUNC(Py_ssize_t) PyTuple_Size(PyObject *);
-- PyAPI_FUNC(PyObject *) PyTuple_GetItem(PyObject *, Py_ssize_t);
-- PyAPI_FUNC(int) PyTuple_SetItem(PyObject *, Py_ssize_t, PyObject *);
-- PyAPI_FUNC(PyObject *) PyTuple_GetSlice(PyObject *, Py_ssize_t, Py_ssize_t);
-- PyAPI_FUNC(int) _PyTuple_Resize(PyObject **, Py_ssize_t);
-- PyAPI_FUNC(PyObject *) PyTuple_Pack(Py_ssize_t, ...);
./sliceobject.h:
-- PyAPI_FUNC(int) PySlice_GetIndices(PySliceObject *r, Py_ssize_t length

Re: [Python-Dev] Documenting the ssize_t Python C API changes

2006-03-21 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 Here's a grep of all the changed/new APIs, please include it
 in the PEP.
 
 You want me to include that *literally*? Are you serious?

Feel free to format it in a different way.

 Please go ahead and commit that change yourself: I consider
 it completely unreadable and entirely worthless.

Interesting: A few mails ago you suggested that developers
do exactly what I did to get the list of changes. Now you
gripe about the output format of the grep.

I really don't understand what your problem is with documenting
the work that you and others have put into this. Why is there
so much resistance ?

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Documenting the ssize_t Python C API changes

2006-03-21 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 Interesting: A few mails ago you suggested that developers
 do exactly what I did to get the list of changes. Now you
 gripe about the output format of the grep.
 
 Developers which use grep can read the output of grep. Developers
 which use other methods of searching (e.g. Visual Studio) can
 understand the output of these tools. I don't say they *should*
 search for Py_ssize_t, I said they *can* if they want to.
 
 I still don't think anybody would *want* to read such a list.

They don't necessarily want to read it but, like Fredrik did,
use it as input for their testing tools.

 I really don't understand what your problem is with documenting
 the work that you and others have put into this. Why is there
 so much resistance ?
 
 Because I think it is pointless, confusing, and redundant in a dangerous
 way. There is absolutely NO problem with API changes where a function
 consumes Py_ssize_t. Why should anybody care that 
 PyString_FromStringAndSize now consumes a Py_ssize_t? Passing an int
 works just fine, and if the int fits the length of the string, it will
 absolutely do the right thing. There is no need to touch that code,
 or worry about the change.
 
 Putting PyString_FromStringAndSize into such a list *will* make
 developers worry, because they now think they have to change their code
 somehow, when there is absolutely no need for action.

Don't you think developers are capable enough to judge for
themselves ?

They might also want to change their extensions to make use
of the new possibilities, so a list of APIs taking Py_ssize_t
parameters on input would be handy to check where there's
potential for such a change in their code.

Perhaps we should have three lists:

1. Py_ssize_t output parameters (these need changes)
2. Py_ssize_t return values (these need overflow checks)
3. Py_ssize_t input parameters (these can be used to enhance
   the extension)
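
The three lists could be generated mechanically rather than by hand. The following is a rough sketch, not a tool used in the thread: it assumes each prototype has already been joined into a single string in the `PyAPI_FUNC(...) name(...);` form shown in the grep output, and the function names `classify` and `build_lists` are hypothetical.

```python
import re

# One complete prototype per string, e.g.
# "PyAPI_FUNC(Py_ssize_t) PyDict_Size(PyObject *mp);"
_PROTO_RE = re.compile(
    r"PyAPI_FUNC\((?P<ret>[^)]*)\)\s*(?P<name>\w+)\s*\((?P<args>.*)\)\s*;",
    re.DOTALL,
)
_COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)


def classify(prototype):
    """Return (name, categories) for one prototype, or None if it does not parse.

    Categories are drawn from {"output", "return", "input"}.
    """
    match = _PROTO_RE.search(prototype)
    if match is None:
        return None
    categories = set()
    if match.group("ret").strip() == "Py_ssize_t":
        categories.add("return")
    # Drop C comments first, since they may contain commas.
    args = _COMMENT_RE.sub("", match.group("args"))
    for arg in args.split(","):
        if re.search(r"\bPy_ssize_t\s*\*", arg):
            categories.add("output")   # pointer to Py_ssize_t: callee writes to it
        elif re.search(r"\bPy_ssize_t\b", arg):
            categories.add("input")    # passed by value
    return match.group("name"), categories


def build_lists(prototypes):
    """Group function names into the three lists proposed above."""
    lists = {"output": [], "return": [], "input": []}
    for prototype in prototypes:
        result = classify(prototype)
        if result is None:
            continue
        name, categories = result
        for category in sorted(categories):
            lists[category].append(name)
    return lists
```

A function can land in more than one list; PyUnicode_Count, for example, both returns a Py_ssize_t and takes Py_ssize_t arguments.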

Here's the list for 2 (I already provided the list for 1 in the
other mail):

./dictobject.h:
-- PyAPI_FUNC(Py_ssize_t) PyDict_Size(PyObject *mp);
./tupleobject.h:
-- PyAPI_FUNC(Py_ssize_t) PyTuple_Size(PyObject *);
./stringobject.h:
-- PyAPI_FUNC(Py_ssize_t) PyString_Size(PyObject *);
./longobject.h:
-- PyAPI_FUNC(Py_ssize_t) _PyLong_AsSsize_t(PyObject *);
./intobject.h:
-- PyAPI_FUNC(Py_ssize_t) PyInt_AsSsize_t(PyObject *);
./abstract.h:
--  PyAPI_FUNC(Py_ssize_t) PyObject_Size(PyObject *o);
--  PyAPI_FUNC(Py_ssize_t) PyObject_Length(PyObject *o);
--  PyAPI_FUNC(Py_ssize_t) _PyObject_LengthHint(PyObject *o);
--  PyAPI_FUNC(Py_ssize_t) PySequence_Size(PyObject *o);
--  PyAPI_FUNC(Py_ssize_t) PySequence_Length(PyObject *o);
--  PyAPI_FUNC(Py_ssize_t) PyMapping_Size(PyObject *o);
--  PyAPI_FUNC(Py_ssize_t) PyMapping_Length(PyObject *o);
./unicodeobject.h:
-- PyAPI_FUNC(Py_ssize_t) PyUnicode_GetSize(
-- PyObject *unicode/* Unicode object */
-- );
-- PyAPI_FUNC(Py_ssize_t) PyUnicode_AsWideChar(
-- PyUnicodeObject *unicode,   /* Unicode object */
-- register wchar_t *w,/* wchar_t buffer */
-- Py_ssize_t size /* size of buffer */
-- );
-- PyAPI_FUNC(Py_ssize_t) PyUnicode_Tailmatch(
-- PyObject *str,   /* String */
-- PyObject *substr,/* Prefix or Suffix string */
-- Py_ssize_t start,/* Start index */
-- Py_ssize_t end,  /* Stop index */
-- int direction/* Tail end: -1 prefix, +1 suffix */
-- );
-- PyAPI_FUNC(Py_ssize_t) PyUnicode_Find(
-- PyObject *str,   /* String */
-- PyObject *substr,/* Substring to find */
-- Py_ssize_t start,/* Start index */
-- Py_ssize_t end,  /* Stop index */
-- int direction/* Find direction: +1 forward, -1 backward */
-- );
-- PyAPI_FUNC(Py_ssize_t) PyUnicode_Count(
-- PyObject *str,   /* String */
-- PyObject *substr,/* Substring to count */
-- Py_ssize_t start,/* Start index */
-- Py_ssize_t end   /* Stop index */
-- );
./listobject.h:
-- PyAPI_FUNC(Py_ssize_t) PyList_Size(PyObject *);

 Furthermore, it is redundant because there are now *four* places where
 the signature of the API is mentioned: in the header, in the
 implementation, in the API docs, and in the PEP. The compiler watches
 for consistency of the first two; consistency of the others is a manual
 process, and thus error-prone.

It's a one-time operation, documenting the changes between
Python 2.4 and 2.5 - much like the grand renaming in the C API
a few years ago.

You'd only create the lists once in the PEP, namely when Python 2.5
is released.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] [Python-checkins] r43041 - python/trunk/Modules/_ctypes/cfield.c

2006-03-20 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 Fernando Perez wrote:
 So I think M.A. is right on the money here with his statement.  Unless you
 consider the Linux/64bit camp insignificant.  But if that is the case, it
 might be worth putting a big statement in the 2.5 release notes indicating
 there is a good chance, if you use third party extensions and a 64 bit OS,
 that this won't work for you.  Which will mean that a fraction of the
 scientific userbase (a big, enthusiastic and growing community of python
 users) will have to stick to 2.4.
 
 It's more intricate than that. If certain extensions are of interest to
 the scientific community, I certainly expect the authors of them to fix
 them.
 
 However, to fix the extension can still be done in different levels:
 you can easily fix it to not crash, by just honoring all compiler
 warnings that gcc produces. That doesn't mean you fully support
 Py_ssize_t, since you might not support collections with more than
 2**31 elements. In these cases, the extension will not crash, but it
 will compute incorrect results if you ever encounter such a collection.
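
The "incorrect results" case can be made concrete with a few lines of Python. This is only an illustration, assuming a length is stored into a signed 32-bit C int; `as_int32` is a hypothetical helper built on ctypes, which wraps values without overflow checking.

```python
import ctypes


def as_int32(n):
    """Value of `n` after being stored in a signed 32-bit C integer."""
    return ctypes.c_int32(n).value


for length in (2**31 - 1, 2**31, 2**32):
    # Lengths from 2**31 upwards no longer survive the round trip.
    print(length, "->", as_int32(length))
```

Nothing crashes here, which is the point made above: the extension keeps running, but with a negative or wrapped length.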

Sorry Martin, but such advice is simply not good enough.

You can't expect people to go chasing compiler warnings to
fix their code without at the same time giving them a
complete list of APIs that changed and which is needed for
the still required manual inspection.

I really don't understand why you put so much effort into
trying to argue that the ssize_t patch isn't going to break
extensions or that fixing compiler warnings is good enough.

The interface to the Python API should not be a compiler
giving you warnings, it should be a document where users
(extension authors) can read about the changes, check
their code using grep, an editor or some other tool and then
*know* that their code works rather than relying on the
compiler warnings which only *might* catch all the cases
where breakage occurs.

Like Fernando said, 64-bit is becoming the de-facto standard
for Unix systems very quickly these days. It is becoming
hard to buy systems not supporting 64-bit and, of course,
people are anxious to use the capabilities of the systems
they bought (whether they need them or not). If you look at
the market for root servers as a second example of users
of 64-bit machines next to the scientific users, several
big hosters have completely switched over to 64-bit.
You can't even get a 32-bit install of Linux on those
machines, because they use custom kernels patched for
their particular hardware.

Ignoring this is just foolish.

I consider the fact that it's currently not possible to have
a look at the changed APIs a documentation bug which it is the
responsibility of the patch authors to fix (just like it
always is).

Please provide such a list.

 Even on all these AMD64 Linux machines, this is still fairly unlikely,
 since you need considerably more than 16GiB main memory to do something
 useful on such collections - very few machines have that much memory
 today. Again, people that have such machines and need to run Python
 programs with that many data need to make sure that the extensions
 work for them. Sticking with 2.4 is no option for these people, since
 2.4 doesn't support such large collections, anyway.
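
The 16 GiB figure presumably follows from simple arithmetic, reconstructed here under the assumption of 8-byte object pointers on a 64-bit build; the constant and function names are illustrative only.

```python
POINTER_BYTES = 8   # assumed size of one object pointer on a 64-bit build
ELEMENTS = 2**31    # smallest element count that no longer fits a signed 32-bit int


def pointer_array_gib(elements=ELEMENTS, pointer_bytes=POINTER_BYTES):
    """Memory in GiB needed just for the array of pointers."""
    return elements * pointer_bytes / 2**30


print(pointer_array_gib())
```

This counts only the pointer array of a list or tuple, not the objects it refers to, which is why considerably more memory than that is needed in practice.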
 
 To really fix such extensions, I recommend building with either
 Microsoft C or Intel C. The Microsoft compiler is available free
 of charge, but runs on Windows only. It gives good warnings, and
 if you fix them all (in a careful way), your code should fully
 support 64 bits. Likewise for the Intel compiler: it is available
 for free only for a month, but it runs on Linux as well.
 
 OTOH, I'm still doubtful how many extensions will be really affected
 by the change in the first place. Your code *breaks* with the change
 only if you implement the sequence or buffer protocols. I'm doubtful
 that this is an issue for most applications, since many extensions
 (I believe) work without implementing these protocols.

You know that it's not only the sequence and buffer protocol
that changed.

If you grep through Include, you get these references to
output variables, which are going to cause breakage regardless
of whether you use 64-bit or not, simply due to the fact that
the function is writing into memory owned by the caller:

./dictobject.h:
-- PyAPI_FUNC(int) PyDict_Next(
--  PyObject *mp, Py_ssize_t *pos, PyObject **key, PyObject **value);
./pyerrors.h:
-- PyAPI_FUNC(int) PyUnicodeEncodeError_GetStart(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeDecodeError_GetStart(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeTranslateError_GetStart(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeEncodeError_GetEnd(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeDecodeError_GetEnd(PyObject *, Py_ssize_t *);
-- PyAPI_FUNC(int) PyUnicodeTranslateError_GetEnd(PyObject *, Py_ssize_t *);
./sliceobject.h:
-- PyAPI_FUNC(int) PySlice_GetIndices(PySliceObject *r, Py_ssize_t length,
-- 

Re: [Python-Dev] Documenting the ssize_t Python C API changes

2006-03-20 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 I really don't understand why you put so much effort into
 trying to argue that the ssize_t patch isn't going to break
 extensions or that fixing compiler warnings is good enough.
 
 Well, in *this* thread, my main point is that I don't want
 to provide the list that you demand, mainly because I consider
 it a waste of my time to create it.

It's not a waste of time at all: you'd be helping lots and
lots of developers out there who want to fix their extensions.

Apart from that I can only repeat what I said before: you
expect developers to put lots of time into fixing their
extensions to run with Python 2.5, so it's only fair that
you invest some time into making it as easy as possible for
them.

The ssize_t patch is the single most disruptive patch in
Python 2.5, so it deserves special attention.

 I consider the fact that it's currently not possible to have
 a look at the changed APIs a documentation bug which it is the
 responsibility of the patch authors to fix (just like it
 always is).
 
 It is possible to look at the changed APIs, see
 
 http://docs.python.org/dev/api/sequence.html

That's neither a complete list (see the grep I sent), nor
does it highlight the changes.

The API documentation is not the right place for the
documentation of this change, IMHO, since it always refers
to the current state in a specific Python release and not
the changes applied to an API compared to an older release.
The PEP is the perfect place for such a list, I guess.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Python-checkins] r43041 - python/trunk/Modules/_ctypes/cfield.c

2006-03-18 Thread M.-A. Lemburg
Ronald Oussoren wrote:
 On 17-mrt-2006, at 22:14, M.-A. Lemburg wrote:
 
 Martin v. Löwis wrote:
 Thomas Heller wrote:
 I'm not sure if this is what Marc-Andre means, but maybe these  
 definitions
 could go into a new include file:
 How would that include file be used? You would have to copy it  
 into your
 own source base, and include it, right?
 We could put it into a b/w compatibility header file, e.g.

 #include "pycompat.h"
 
 But wouldn't this header be needed on versions of python before 2.5?  

Yes. Ideally it should work on more Python versions than just
Python 2.5.

I have such a compatibility header file for the mx Extensions
(called mxpyapi.h and included in egenix-mx-base).

It includes #defines such as the one Thomas proposed for
Python back to version 1.5.

 That
 would make inclusion of a pycompat.h header with python 2.5 less useful.

Why is that ? For older versions you can copy it into your
extension's include directory. With the usual #ifndef PYCOMPAT_H
include guard this won't get included a second time if Python already
includes the header file via #include "Python.h".

 Including the completed block into the pep would be useful.
 
 Ronald
 
 

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Python-checkins] r43041 - python/trunk/Modules/_ctypes/cfield.c

2006-03-17 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 Since this change is going to affect a lot of 3rd party extensions,
 I'd also like to see a complete list of public APIs that changed and
 how they changed (including the type slots)
 
 You can construct this list easily by comparing the header files of
 the previous and the current release. Please contribute a patch that
 presents these changes in a form that you would consider acceptable.

Sorry, but I'd rather spend that time on getting my extensions
working again for both Python 2.4 and 2.5.

I think it's only fair that I ask the patch authors to complete
the PEP, since the ssize_t patch is causing extension authors
enough trouble already.

If you want quick adoption of the changes, you have
to make it as easy as possible for the authors to port their
extensions to the new API. Otherwise, we'll end up having
quite a large number of users who can't switch to Python 2.5
simply because their favorite extensions don't work with it.

The argument that it only affects 64-bit platforms
simply ignores reality. Most new machines are 64-bit machines,
so by the time Python 2.5 goes public, 64-bit will have a
large enough market share to make a difference.

It's also not good enough to simply suggest to ignore compiler
warnings - this falls back on the extension authors and the
quality of their code without them really having done anything
wrong.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Python-checkins] r43041 - python/trunk/Modules/_ctypes/cfield.c

2006-03-17 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 I think it's only fair that I ask the patch authors to complete
 the PEP, since the ssize_t patch is causing extension authors
 enough trouble already.
 
 Well, the PEP is complete as it stands. It's possible to provide
 more guidelines, but the specification part of it says precisely
 what I intend it to say. Also, the API documentation ought to
 be in the Python documentation, and, for these changes, it is.

Changes to the public API must be documented somewhere,
either in Misc/NEWS or in the PEP. How else do you expect
users to find out about these changes ???

 If you want quick adoption of the changes, you have
 to make it as easy as possible for the authors to port their
 extensions to the new API. Otherwise, we'll end up having
 quite a large number of users who can't switch to Python 2.5
 simply because their favorite extensions don't work with it.
 
 I don't see how giving a complete list of all changed functions
 helps in any way.

It documents those changes, allows extension authors and extension
users to check their extensions for possible problems and is (or
at least should be) standard procedure.

The list of changes is also necessary in order to be able
to write code which allows a module to work in both Python
2.4 and 2.5. The code snippet that Thomas suggested
should be part of the conversion guidelines.

Again, if you don't make it easy for the developers, they
are going to have a hard time upgrading their modules and
getting them stable again after the changes. Telling
them to read the header files or API docs and suggesting
that they should compare 2.4 and 2.5 versions of these should
not be the attitude with which such changes are approached.

 It's also not good enough to simply suggest to ignore compiler
 warnings - this falls back on the extension authors and the
 quality of their code without them really having done anything
 wrong.
 
 Sure. Compiler warnings should be corrected. That's why the compiler
 emits them. However, I don't see how this is related to the text of
 the PEP.


"""
Module authors have the choice whether they support this PEP in their
code or not; if they support it, they have the choice of different
levels of compatibility.

If a module is not converted to support this PEP, it will continue to
work unmodified on a 32-bit system. On a 64-bit system, compile-time
errors and warnings might be issued, and the module might crash the
interpreter if the warnings are ignored.
"""


Do you really think module authors do have a choice given that last
sentence ?

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Python-checkins] r43041 - python/trunk/Modules/_ctypes/cfield.c

2006-03-17 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 Thomas Heller wrote:
 I'm not sure if this is what Marc-Andre means, but maybe these definitions
 could go into a new include file:
 
 How would that include file be used? You would have to copy it into your
 own source base, and include it, right?

We could put it into a b/w compatibility header file, e.g.

#include "pycompat.h"

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Python-checkins] r43041 - python/trunk/Modules/_ctypes/cfield.c

2006-03-16 Thread M.-A. Lemburg
Thomas Heller wrote:
 Martin v. Löwis wrote:
 Thomas Heller wrote:
 BTW: Is a porting guide to make extension modules compatible with 2.5
 available somewhere? PEP 353 scratches only the surface...
 Wrt. ssize_t changes, PEP 353 is meant to be comprehensive. Which
 particular aspect are you missing?
 
 I suggest to change this:
 
   #if PY_VERSION_HEX < 0x02050000
   typedef int Py_ssize_t;
   #endif
 
 with this:
 
   #if (PY_VERSION_HEX < 0x02050000)
   typedef int Py_ssize_t;
   #define lenfunc inquiry
   #define readbufferproc getreadbufferproc
   #define writebufferproc getwritebufferproc
   #define segcountproc getsegcountproc
   #define charbufferproc getcharbufferproc
   #define ssizeargfunc intargfunc
   #define ssizessizeargfunc intintargfunc
   #define ssizeobjargproc intobjargproc
   #define ssizessizeobjargproc intintobjargproc
   ... more defines
   #endif
 
 Maybe a complete list of defines needed can be given?
 
 Then, from only reading the PEP without looking up the sources,
 it is not clear to me what the PY_SIZE_T_CLEAN definition does.
 
 Finally, the format codes to use for Py_ssize_t arguments passed to
 Py_BuildValue, PyString_FromFormat, PyObject_CallFunction (and other
 functions) are not mentioned at all.

Since this change is going to affect a lot of 3rd party extensions,
I'd also like to see a complete list of public APIs that changed and
how they changed (including the type slots)
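
As a rough illustration of the version check in the snippet quoted above, here is a minimal Python sketch of how the packed version number behind PY_VERSION_HEX can be compared and unpacked. It uses sys.hexversion, the Python-level counterpart of the C macro; the helper name unpack_hexversion is made up for this example and is not part of any API.

```python
import sys

# 0x02050000 is the packed version number of Python 2.5.0 (first alpha).
PY25_HEX = 0x02050000


def unpack_hexversion(hexversion):
    """Split a packed version number into (major, minor, micro)."""
    major = (hexversion >> 24) & 0xFF
    minor = (hexversion >> 16) & 0xFF
    micro = (hexversion >> 8) & 0xFF
    return major, minor, micro


# True only on interpreters older than 2.5, which is where the
# compatibility typedef and defines would be needed.
needs_compat_defines = sys.hexversion < PY25_HEX
print(unpack_hexversion(PY25_HEX), needs_compat_defines)
```

The same comparison in C is what guards the typedef and the #define list, so an extension can compile against both 2.4 and 2.5 headers.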

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Using relative imports in std lib packages ([Python-checkins] r43033 - in python/trunk/Lib: distutils/sysconfig.py encodings/__init__.py)

2006-03-15 Thread M.-A. Lemburg
guido.van.rossum wrote:
 Author: guido.van.rossum
 Date: Wed Mar 15 05:33:54 2006
 New Revision: 43033
 
 Modified:
python/trunk/Lib/distutils/sysconfig.py
python/trunk/Lib/encodings/__init__.py
 Log:
 Use relative imports in a few places where I noticed the need.
 (Ideally, all packages in Python 2.5 will use the relative import
 syntax for all their relative import needs.)

Instead of adding relative imports to packages in the
standard lib, I'd suggest to use absolute imports instead.
These are much easier to manage, maintain and read.

There's also no need for relative imports in std lib
packages since these won't be subject to possible
relocation.
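
To make the two spellings concrete, here is a small sketch that builds a throwaway package and imports the same class both ways. The package name demo_pkg and its modules are hypothetical, created only for this example, and the code targets a current Python 3 interpreter rather than 2.5.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical package layout, written to a temporary directory.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "demo_pkg")
os.mkdir(pkg_dir)

files = {
    "__init__.py": "",
    "errors.py": "class DemoError(Exception):\n    pass\n",
    # Relative form: resolved against the package the module lives in.
    "uses_relative.py": "from .errors import DemoError\n",
    # Absolute form: spells out the full package name.
    "uses_absolute.py": "from demo_pkg.errors import DemoError\n",
}
for name, source in files.items():
    with open(os.path.join(pkg_dir, name), "w") as handle:
        handle.write(source)

sys.path.insert(0, root)
importlib.invalidate_caches()
relative = importlib.import_module("demo_pkg.uses_relative")
absolute = importlib.import_module("demo_pkg.uses_absolute")

# Both spellings bind the very same class object.
same_class = relative.DemoError is absolute.DemoError
print(same_class)
```

The relative form keeps working if the package is renamed or nested elsewhere; the absolute form only works under the name demo_pkg, which is not a concern for packages that never move.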

 Modified: python/trunk/Lib/distutils/sysconfig.py
 ==
 --- python/trunk/Lib/distutils/sysconfig.py   (original)
 +++ python/trunk/Lib/distutils/sysconfig.py   Wed Mar 15 05:33:54 2006
 @@ -16,7 +16,7 @@
  import string
  import sys
  
 -from errors import DistutilsPlatformError
 +from .errors import DistutilsPlatformError
  
  # These are needed in a couple of spots, so just compute them once.
  PREFIX = os.path.normpath(sys.prefix)
 
 Modified: python/trunk/Lib/encodings/__init__.py
 ==
 --- python/trunk/Lib/encodings/__init__.py   (original)
 +++ python/trunk/Lib/encodings/__init__.py   Wed Mar 15 05:33:54 2006
 @@ -27,7 +27,8 @@
  
  #
  
 -import codecs, types, aliases
 +import codecs, types
 +from . import aliases
  
  _cache = {}
  _unknown = '--unknown--'
 ___
 Python-checkins mailing list
 [EMAIL PROTECTED]
 http://mail.python.org/mailman/listinfo/python-checkins

-- 
Marc-Andre Lemburg
eGenix.com



[Python-Dev] FrOSCon 2006 - Call for {Papers|Projects}

2006-03-04 Thread M.-A. Lemburg
Is anyone interested in having a Python track at this conference ?

PS: The EuroPython conference takes place in Geneva, one week
later.

-------- Original Message --------
Subject: FrOSCon 2006 - Call for {Papers|Projects}
Date: Fri, 03 Mar 2006 16:39:08 +0100
From: Sebastian Bergmann sebastian.bergmann # froscon.de
Organization: Free and Open Source Software Conference -
Bonn/Rhein-Sieg, Germany

FrOSCon is a two-day conference on free software and open source, which
takes place on 24th and 25th June 2006 at the University of Applied
Sciences Bonn-Rhein-Sieg, in St. Augustin near Bonn, Germany.

Focus of the conference is a comprehensive range of talks about current
topics in free software and open source. Furthermore, space will be
provided for developers of free software and open source projects to
organize their own developer meetings or even their own program.

FrOSCon is organized for the first time in 2006 by the department of
computer science in collaboration with the Linux/Unix User Group Sankt
Augustin, the student body and the FrOSCon e.V., and aims to establish
itself as the largest event of its kind in Rhineland.

=== Projects ===

Successful projects form the backbone of the open source and free
software movements. FrOSCon wants to acknowledge this important role of
projects for the community by offering special facilities to individual
projects.

=== Developer Rooms ===

Open source and free software projects are often coordinated primarily
via the internet, via email, the web and IRC. Nonetheless, personal
contact between members of the project is of paramount importance. A
conference like FrOSCon can serve as a meeting place, where distant
members of the project can come together and meet.

FrOSCon wants to offer dedicated developer rooms to individual
projects, which are organized by the projects themselves. Thereby, we
provide a space for projects to hold special-interest talks or
meetings, to congregate, exchange views and to socialize.

We have rooms in different sizes suitable for small developer meetings
up to sub-conferences. We offer Internet access in all developer rooms.
Besides tables and chairs, most rooms are or can be equipped with a
video beamer. If necessary, an additional lecture hall can be assigned
for talks or presentations where a larger audience is expected.

=== Application ===

Please send applications for a developer room via email to
[EMAIL PROTECTED] Please include at least the name of your project,
the URL of your website and the expected number of participants in your
email.

The Call for Projects ends on March 31st.

=== Call for Papers ===

Please note that our Call for Papers is still open until March 15th.
Don't miss the chance to submit a talk.

You can find all the details about the Call for Papers here:

http://www.froscon.org/wiki/CallforPapers

=== Booth Space ===

If your project wants to man a booth at FrOSCon, feel free to contact
[EMAIL PROTECTED] as well.

We are looking forward to hearing from you.

Kind regards,
The FrOSCon Team

-- 
Sebastian Bergmann  http://www.sebastian-bergmann.de/
GnuPG Key: 0xB85B5D69 / 27A7 2B14 09E4 98CD 6277 0E5B 6867 C514 B85B 5D69



-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Stateful codecs [Was: str object going in Py3K]

2006-03-01 Thread M.-A. Lemburg
Walter Dörwald wrote:
 M.-A. Lemburg wrote:
 Walter Dörwald wrote:
 Hye-Shik Chang wrote:

 On 2/19/06, Walter Dörwald [EMAIL PROTECTED] wrote:
 M.-A. Lemburg wrote:
 Walter Dörwald wrote:
 Anyway, I've started implementing a patch that just adds
 codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig,
 UTF-16, UTF-16-LE and UTF-16-BE are already working.
 Nice :-)
 gencodec.py is updated now too. The rest should be manageable too.
 I'll leave updating the CJKV codecs to Hye-Shik though.
 Okay. I'll look into how CJK codecs can be improved by the
 new protocol soon.  I guess it'll be not so difficult because CJK
 codecs have their own common stateful framework already.
 OK, here's the patch: bugs.python.org/1436130 (assigned to MAL).

 Thanks. I won't be able to look into it this week though, probably
 next week.
 
 Any progress on this? I'd really like to get this into 2.5 and the
 feature freeze is approaching fast!
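
For readers wondering what a stateful decoder buys you, here is a minimal sketch. The patch discussed above is not shown in this thread, so the example uses the incremental decoder API that current Python's codecs module provides (codecs.getincrementaldecoder); treat it as an illustration of the idea, not of the patch itself.

```python
import codecs

data = "Dörwald".encode("utf-8")
# Split in the middle of the two-byte UTF-8 sequence for "ö".
first, second = data[:2], data[2:]

# A stateless decode of the first chunk fails on the incomplete sequence.
try:
    first.decode("utf-8")
    stateless_failed = False
except UnicodeDecodeError:
    stateless_failed = True

# A stateful decoder buffers the partial sequence between calls.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(first), decoder.decode(second, final=True)]
text = "".join(pieces)
print(stateless_failed, pieces, text)
```

This is the situation a stream reader faces whenever a read boundary falls inside a multi-byte character.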

I'll have a look this week.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Making ascii the default encoding

2006-02-28 Thread M.-A. Lemburg
Neal Norwitz wrote:
 PEP 263 states that in Phase 2 the default encoding will be set to
 ASCII.  Although the PEP is marked final, this isn't actually
 implemented.  The warning about using non-ASCII characters started in
 2.3.  Does anyone think we shouldn't enforce the default being ASCII?
 
 This means if an # -*- coding: ... -*- is not set and non-ASCII
 characters are used, an error will be generated.

+1.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Removing Non-Unicode Support?

2006-02-21 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 Note that this does not mean that we should forget about memory
 consumption issues. It's just that if there's only marginal
 interest in certain special builds of Python, I don't see the
 requirement for the Python core developers to maintain them.
 
 Well, the cost of Unicode support is not so much in the algorithmic
 part, but in the tables that come along with it. AFAICT, everything
 but unicodectype is optional; that is 5KiB of code and 20KiB of data
 on x86. Actually, the size of the code *does* matter, at a second
 glance. Here are the largest object files in the Python code base
 on my system (not counting dynamic modules):
 
    text    data     bss     dec     hex filename
    4845   19968       0   24813    60ed Objects/unicodectype.o
   22633    2432     352   25417    6349 Objects/listobject.o
   29259    1412     152   30823    7867 Objects/classobject.o
   20696   11488       4   32188    7dbc Python/bltinmodule.o
   33579     740       0   34319    860f Objects/longobject.o
   34119      16     288   34423    8677 Python/ceval.o
   35179    2796       0   37975    9457 Modules/_sre.o
   26539   15820     416   42775    a717 Modules/posixmodule.o
   35283    8800    1056   45139    b053 Objects/stringobject.o
   50360       0      28   50388    c4d4 Python/compile.o
   68455    4624     440   73519   11f2f Objects/typeobject.o
   69993    9316    1196   80505   13a79 Objects/unicodeobject.o
 
 So it appears that dropping Unicode support can indeed provide
 some savings.
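
As a quick check of the arithmetic behind that listing, the sketch below adds up the two Unicode-specific object files. The figures are copied from the rows above; the dec column is simply text + data + bss. It ignores codecs, the Unicode database and any other modules, so it is only a lower bound for a full build.

```python
# (text, data, bss) in bytes, copied from the size listing above.
sizes = {
    "Objects/unicodectype.o": (4845, 19968, 0),
    "Objects/unicodeobject.o": (69993, 9316, 1196),
}

# The "dec" column is the sum of the three segments.
totals = {name: sum(parts) for name, parts in sizes.items()}
unicode_total = sum(totals.values())
print(totals, unicode_total, round(unicode_total / 1024.0, 1), "KiB")
```

The combined figure is a little over 100 KiB for these two files on that x86 build.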

 For reference, we also have an option to drop complex numbers:
 
    9654     692       4   10350    286e Objects/complexobject.o

So why not drop that as well ?

Note that I'm not saying that these switches are useless - of
course they do allow to strip down the Python interpreter.
I believe that only very few people are interested in having these
options and it's fair enough to put the burden of maintaining these
branches on them.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Removing Non-Unicode Support?

2006-02-21 Thread M.-A. Lemburg
Jeff Rush wrote:
 M.-A. Lemburg wrote:
 
 I'd say that the parties interested in non-Unicode versions of
 Python should maintain these branches of Python. Ditto for other
 stripped down versions.
 
 I understand where you're coming from but the embedded market I
 encounter tends to focus on the hardware side.  If they can get a
 marketing star by grabbing Python off the shelf, tweaking the build and
 producing something to include with their product, they will. But if they
 have to maintain a branch, they'll just go with the de facto C API most
 such devices use.
 
 Note that this does not mean that we should forget about memory
 consumption issues. It's just that if there's only marginal
 interest in certain special builds of Python, I don't see the
 requirement for the Python core developers to maintain them.
 
 These requirements of customization may not be a strong case for today
 but could be impacting future growth of the language in certain
 sectors.  I'm a rabid Python evangelist and always try to push Python
 into more nooks and crannies of the marketplace, similar to how the
 Linux kernel is available from the tiniest machines to the largest
 iron.  If the focus of Python is to be strictly a desktop, conventional
 (mostly ;-) language, restricting its adaptability to other less
 interesting environments may be a reasonable tradeoff to improve its
 maintainability.  But adaptability, especially when you don't fully grok
 where or how it will be used, can also be a competitive advantage.

I don't think this is a strong enough case to warrant having
to maintain a separate branch of the Python core.

Even platforms like Palm nowadays have enough RAM to cope with
the 100kB or so that Unicode support adds.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Removing Non-Unicode Support?

2006-02-20 Thread M.-A. Lemburg
Jeff Rush wrote:
 Guido van Rossum wrote:
 On 2/19/06, Jeff Rush [EMAIL PROTECTED] wrote:
 [Quoting Neal Norwitz]

 I've heard of a bunch of people using --disable-unicode.  I'm not sure
 if it's curiosity or if there are really production builds without
 unicode.  Ask this on c.l.p too.

 Do you know of any embedded platform that doesn't have unicode support
 as a requirement? Python runs fine on Nokia phones running Symbian,
 where *everything* is a Unicode string.
 
 1. PalmOS, at least the last time I was involved with it.  Python on a
 Palm is a very tight fit.
 
 
 2. GM862 Cellular Module with Python Interpreter
   http://tinyurl.com/jgxz
 
 These may be diminishing markets as memory capacity increases and I
 wouldn't argue adding compile flags for such at this late date, but if
 the flags are already there, perhaps the slight inconvenience to
 Python-internal developers is worth it.
 
 Hey, perhaps dropping out Unicode support is not a big win - I just know
 it is useful at times to have a collection of flags to drop out floating
 point, complex arithmetic, language parsing and such for
 memory-constrained cases.

These switches make the code less maintainable. I'm not even
talking about the testing overhead.

I'd say that the parties interested in non-Unicode versions of
Python should maintain these branches of Python. Ditto for other
stripped down versions.

Note that this does not mean that we should forget about memory
consumption issues. It's just that if there's only marginal
interest in certain special builds of Python, I don't see the
requirement for the Python core developers to maintain them.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Python-checkins] r42490 - in python/branches/release24-maint: Lib/fileinput.py Lib/test/test_fileinput.py Misc/NEWS

2006-02-19 Thread M.-A. Lemburg
Why are these new features being backported to 2.4 ?

georg.brandl wrote:
 Author: georg.brandl
 Date: Sun Feb 19 10:51:33 2006
 New Revision: 42490
 
 Modified:
python/branches/release24-maint/Lib/fileinput.py
python/branches/release24-maint/Lib/test/test_fileinput.py
python/branches/release24-maint/Misc/NEWS
 Log:
 Patch #1337756: fileinput now accepts Unicode filenames.
 
 
 Modified: python/branches/release24-maint/Lib/fileinput.py
 ==
 --- python/branches/release24-maint/Lib/fileinput.py  (original)
 +++ python/branches/release24-maint/Lib/fileinput.py  Sun Feb 19 10:51:33 2006
 @@ -184,7 +184,7 @@
  
  
      def __init__(self, files=None, inplace=0, backup="", bufsize=0):
 -        if type(files) == type(''):
 +        if isinstance(files, basestring):
              files = (files,)
          else:
              if files is None:
 
 Modified: python/branches/release24-maint/Lib/test/test_fileinput.py
 ==
 --- python/branches/release24-maint/Lib/test/test_fileinput.py  (original)
 +++ python/branches/release24-maint/Lib/test/test_fileinput.py  Sun Feb 19 10:51:33 2006
 @@ -157,3 +157,13 @@
       verify(fi.lineno() == 6)
   finally:
       remove_tempfiles(t1, t2)
 +
 +if verbose:
 +    print "15. Unicode filenames"
 +try:
 +    t1 = writeTmp(1, ["A\nB"])
 +    fi = FileInput(files=unicode(t1, sys.getfilesystemencoding()))
 +    lines = list(fi)
 +    verify(lines == ["A\n", "B"])
 +finally:
 +    remove_tempfiles(t1)
 
 Modified: python/branches/release24-maint/Misc/NEWS
 ==
 --- python/branches/release24-maint/Misc/NEWS (original)
 +++ python/branches/release24-maint/Misc/NEWS Sun Feb 19 10:51:33 2006
 @@ -74,6 +74,8 @@
  Library
  ---
  
 +- Patch #1337756: fileinput now accepts Unicode filenames.
 +
  - Patch #1373643: The chunk module can now read chunks larger than
two gigabytes.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Python-checkins] r42490 - in python/branches/release24-maint: Lib/fileinput.py Lib/test/test_fileinput.py Misc/NEWS

2006-02-19 Thread M.-A. Lemburg
Georg Brandl wrote:
 M.-A. Lemburg wrote:
 Why are these new features being backported to 2.4 ?

 georg.brandl wrote:
 Author: georg.brandl
 Date: Sun Feb 19 10:51:33 2006
 New Revision: 42490

 Modified:
python/branches/release24-maint/Lib/fileinput.py
python/branches/release24-maint/Lib/test/test_fileinput.py
python/branches/release24-maint/Misc/NEWS
 Log:
 Patch #1337756: fileinput now accepts Unicode filenames.
 
 Is that a new feature? I thought that wherever a filename is accepted,
 it can be unicode too.
 
 The previous behavior was a bug in any case, since it treated the
 unicode string as a sequence of filenames. Would you fix that by
 raising a ValueError?

No, but from the text in the NEWS file things sounded a lot
like a feature rather than a bug fix.
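
The bug Georg describes is easy to reproduce in isolation. The sketch below is not the fileinput code itself; normalize_files is a hypothetical helper that mirrors the patched check, written for Python 3 where str plays the role of basestring. The real module falls back to sys.argv when files is None, which this sketch replaces with an empty tuple.

```python
def normalize_files(files):
    """Return a tuple of filenames, treating a single string as one name."""
    if isinstance(files, str):
        return (files,)
    if files is None:
        # Simplification: the real module consults sys.argv here.
        return ()
    return tuple(files)


# Without the string check, one name is iterated character by character,
# which is what happened to Unicode filenames before the patch.
buggy = tuple("a.txt")
fixed = normalize_files("a.txt")
print(buggy, fixed)
```

Seen this way, the change corrects a misinterpretation of the argument rather than adding a new capability, even though the NEWS entry reads like the latter.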

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 How are users confused?
 
 Users do
 
 py> "Martin v. Löwis".encode("utf-8")
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 UnicodeDecodeError: 'ascii' codec can't decode byte 0xf6 in position 11:
 ordinal not in range(128)
 
 because they want to convert the string to Unicode, and they have
 found a text telling them that .encode("utf-8") is a reasonable
 method.
 
 What it *should* tell them is
 
 py> "Martin v. Löwis".encode("utf-8")
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 AttributeError: 'str' object has no attribute 'encode'

I've already explained why we have .encode() and .decode()
methods on strings and Unicode many times. I've also
explained the misunderstanding that codecs can only do
Unicode-string conversions. And I've explained that
the .encode() and .decode() methods *do* check the return
types of the codecs and only allow strings or Unicode
on return (no lists, instances, tuples or anything else).

You seem to ignore this fact.

If we were to follow your idea, we should remove .encode()
and .decode() altogether and refer users to the codecs.encode()
and codecs.decode() function. However, I doubt that users
will like this idea.

 bytes.encode CAN only produce bytes.
 
 I don't understand MAL's design, but I believe in that design,
 bytes.encode could produce anything (say, a list). A codec
 can convert anything to anything else.

True. However, note that the .encode()/.decode() methods on
strings and Unicode narrow down the possible return types.
The corresponding .bytes methods should only allow bytes and
Unicode.
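
For comparison, here is how this narrowing looks in a current Python 3 interpreter, which postdates this thread. There, str.encode() only accepts text encodings, while the codecs.encode() function accepts any registered codec. This is a sketch of today's behaviour, not of the 2.x methods under discussion.

```python
import codecs

# The method refuses a codec that is not a text encoding.
try:
    "hello".encode("hex")
    rejected = False
except LookupError:
    rejected = True

# The module-level function handles the same codec on bytes input.
via_codecs = codecs.encode(b"hello", "hex")
print(rejected, via_codecs)
```

The split is close to the alternative mentioned above: general codecs remain available through the codecs functions, and the methods are restricted to a narrow set of types.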

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 Just because some codecs don't fit into the string.decode()
 or bytes.encode() scenario doesn't mean that these codecs are
 useless or that the methods should be banned.
 
 No. The reason to ban string.decode and bytes.encode is that
 it confuses users.

Instead of starting to ban everything that can potentially
confuse a few users, we should educate those users and tell
them what these methods mean and how they should be used.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Thomas Wouters wrote:
 On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:
 
 I've already explained why we have .encode() and .decode()
 methods on strings and Unicode many times. I've also
 explained the misunderstanding that codecs can only do
 Unicode-string conversions. And I've explained that
 the .encode() and .decode() methods *do* check the return
 types of the codecs and only allow strings or Unicode
 on return (no lists, instances, tuples or anything else).

 You seem to ignore this fact.
 
 Actually, I think the problem is that while we all agree the
 bytestring/unicode methods are a useful way to convert from bytestring to
 unicode and back again, we disagree on their *general* usefulness. Sure, the
 codecs mechanism is powerful, and even more so because they can determine
 their own returntype. But it still smells and feels like a Perl attitude,
 for the reasons already explained numerous times, as well:

It's by no means a Perl attitude.

The main reason is symmetry and the fact that strings and Unicode
should be as similar as possible in order to simplify the task of
moving from one to the other.

  - The return value for the non-unicode encodings depends on the value of
the encoding argument.

Not really: you'll always get a basestring instance.

  - The general case, by and large, especially in non-powerusers, is to
encode unicode to bytestrings and to decode bytestrings to unicode. And
that is a hard enough task for many of the non-powerusers. Being able to
use the encode/decode methods for other tasks isn't helping them.

Agreed.

Still, I believe that this is an educational problem. There are
a couple of gotchas users will have to be aware of (and this is
unrelated to the methods in question):

* encoding always refers to transforming original data into
  a derived form

* decoding always refers to transforming a derived form of
  data back into its original form

* for Unicode codecs the original form is Unicode, the derived
  form is, in most cases, a string

As a result, if you want to use a Unicode codec such as utf-8,
you encode Unicode into a utf-8 string and decode a utf-8 string
into Unicode.

Encoding a string is only possible if the string itself is
original data, e.g. some data that is supposed to be transformed
into a base64 encoded form.

Decoding Unicode is only possible if the Unicode string itself
represents a derived form, e.g. a sequence of hex literals.
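
The direction rule above can be shown with two round trips. This is a sketch for a current Python 3 interpreter, where the Unicode type is str and the byte string type is bytes; the variable names are only illustrative.

```python
import codecs

# Unicode codec: the original form is text, the derived form is bytes.
original_text = "Löwis"
derived_bytes = original_text.encode("utf-8")   # encode: original -> derived
restored_text = derived_bytes.decode("utf-8")   # decode: derived -> original

# Non-Unicode codec: here the byte string itself is the original data.
original_data = b"raw bytes"
derived_b64 = codecs.encode(original_data, "base64")
restored_data = codecs.decode(derived_b64, "base64")

print(derived_bytes, restored_text, derived_b64, restored_data)
```

In both cases encoding moves away from the original data and decoding moves back; only the types of the two ends differ from codec to codec.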

 That is why I disagree with the hypergeneralization of the encode/decode
 methods, regardless of the fact that it is a natural expansion of the
 implementation of codecs. Sure, it looks 'right' and 'natural' when you look
 at the implementation. It sure doesn't look natural, to me and to many
 others, when you look at the task of encoding and decoding
 bytestrings/unicode.

That's because you only look at one specific task.

Codecs also unify the various interfaces to common encodings
such as base64, uu or zip which are not Unicode related.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] A codecs nit

2006-02-18 Thread M.-A. Lemburg
Barry Warsaw wrote:
 On Wed, 2006-02-15 at 22:07 +0100, M.-A. Lemburg wrote:
 
 Those are not pseudo-encodings, they are regular codecs.

 It's a common misunderstanding that codecs are only seen as serving
 the purpose of converting between Unicode and strings.

 The codec system is deliberately designed to be general enough
 to also work with many other types, e.g. it is easily possible to
 write a codec that converts between the hex literal sequence you
 have above and a list of ordinals:
 
 Slightly off-topic, but one thing that's always bothered me about the
 current codecs implementation is that str.encode() (and friends)
 implicitly treats its argument as a module, and imports it, even if the
 module doesn't live in the encodings package.  That seems like a mistake
 to me (and a potential security problem if the import has side-effects).

It was a mistake, yes, and thanks for bringing this up.

Codec packages should implement and register their own
codec search functions.

 I don't know whether at the very least restricting the imports to the
 encodings package would make sense or would break things.
 
 >>> import sys
 >>> sys.modules['smtplib']
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 KeyError: 'smtplib'
 >>> ''.encode('smtplib')
 Traceback (most recent call last):
   File "<stdin>", line 1, in ?
 LookupError: unknown encoding: smtplib
 >>> sys.modules['smtplib']
 <module 'smtplib' from '/usr/lib/python2.4/smtplib.pyc'>
 
 I can't see any reason for allowing any randomly importable module to
 act like an encoding.

The encodings package search function will try to import
the module and then check the module signature. If the
module fails to export the codec registration API, then
it raises the LookupError you see above.

At the time, it was nice to be able to write codec
packages as Python packages and have them readily usable
by just putting the package on the sys.path.

This was a side-effect of the way the encodings search
function worked. The original design idea was to have
all 3rd party codecs register themselves with the
codec registry. However, this implies that the application
using the codecs would have to run the registration
code at least once. Since the encodings package search
function provided a more convenient way, this was used
by most codec package programmers.

In Py 2.5 we'll change that. The encodings package search
function will only allow codecs in that package to be
imported. All other codec packages will have to provide
their own search function and register this with the
codecs registry.
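
A minimal sketch of what "provide their own search function and register this with the codecs registry" can look like, written against the codecs module of current Python 3. The codec name demoupper and the helper functions are hypothetical and exist only for this illustration; a real codec package would map its own encoding names.

```python
import codecs


def _upper_encode(text, errors="strict"):
    # Toy encoder: returns (output, number of input characters consumed).
    return text.upper().encode("ascii", errors), len(text)


def _upper_decode(data, errors="strict"):
    # Toy decoder: returns (output, number of input bytes consumed).
    return bytes(data).decode("ascii", errors).lower(), len(data)


def demo_search(name):
    """Search function: return codec info for names we own, else None."""
    if name == "demoupper":  # hypothetical codec name
        return codecs.CodecInfo(
            encode=_upper_encode,
            decode=_upper_decode,
            name="demoupper",
        )
    return None


# The application (or the codec package on import) runs this once.
codecs.register(demo_search)

encoded = codecs.encode("hello", "demoupper")
decoded = codecs.decode(encoded, "demoupper")
```

Returning None for unknown names lets the registry fall through to the next registered search function.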

The big question is: what to do about 2.3 and 2.4 - adding
the same patch will cause serious breakage, since popular
codec packages such as Tamito's Japanese package rely
on the existing behavior.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 18 2006)
 Python/Zope Consulting and Support ...http://www.egenix.com/
 mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] bytes.from_hex()

2006-02-18 Thread M.-A. Lemburg
Aahz wrote:
 On Sat, Feb 18, 2006, Ron Adam wrote:
 I like the bytes.recode() idea a lot. +1

 It seems to me it's a far more useful idea than encoding and decoding by 
 overloading and could do both and more.  It has a lot of potential to be 
 an intermediate step for encoding as well as being used for many other 
 translations to byte data.

 I think I would prefer that encode and decode be just functions with 
 well defined names and arguments instead of being methods or arguments 
 to string and Unicode types.

 I'm not sure on exactly how this would work. Maybe it would need two 
 sets of encodings, ie.. decoders, and encoders.  An exception would be
 given if it wasn't found for the direction one was going in.
 
 Here's an idea I don't think I've seen before:
 
 bytes.recode(b, src_encoding, dest_encoding)
 
 This requires the user to state up-front what the source encoding is.
 One of the big problems that I see with the whole encoding mess is that
 so much of it contains implicit assumptions about the source encoding;
 this gets away from that.

You might want to look at the codecs.py module: it has all these
things and a lot more.

http://docs.python.org/lib/module-codecs.html
http://svn.python.org/view/python/trunk/Lib/codecs.py?view=markup

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 I've already explained why we have .encode() and .decode()
 methods on strings and Unicode many times. I've also
 explained the misunderstanding that codecs can only do
 Unicode-string conversions. And I've explained that
 the .encode() and .decode() methods *do* check the return
 types of the codecs and only allow strings or Unicode
 on return (no lists, instances, tuples or anything else).

 You seem to ignore this fact.
 
 I'm not ignoring the fact that you have explained this
 many times. I just fail to understand your explanations.

Feel free to ask questions.

 For example, you said at some point that codecs are not
 restricted to Unicode. However, I don't recall any
 explanation what the restriction *is*, if any restriction
 exists. No such restriction seems to be documented.

The codecs are not restricted with respect to the data types
they work on. It's up to the codecs to define which
data types are valid and which they take on input and
return.

 True. However, note that the .encode()/.decode() methods on
 strings and Unicode narrow down the possible return types.
 The corresponding .bytes methods should only allow bytes and
 Unicode.
 
 I forgot that: what is the rationale for that restriction?

To assure that only those types can be returned from those
methods, ie. instances of basestring, which in return permits
type inference for those methods.

The codecs functions encode() and decode() don't have these
restrictions, and thus provide a generic interface to the
codec's encode and decode functions. It's up to the caller
to restrict the allowed encodings and, as a result, the
possible input/output types.
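
The split between the generic codecs functions and the type-restricted convenience methods can still be observed today. The sketch below assumes a recent Python 3 release, where the restriction takes the form of str.encode() rejecting codecs that are not text encodings; the exact check differs from the Python 2 behaviour discussed in this thread.

```python
import codecs

# The generic function accepts any registered codec, including
# bytes-to-bytes ones such as hex_codec.
hexed = codecs.encode(b"abc", "hex_codec")

# The convenience method on str only accepts text encodings.
try:
    "abc".encode("hex_codec")
    method_allowed = True
except LookupError:
    method_allowed = False
```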

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Stateful codecs [Was: str object going in Py3K]

2006-02-18 Thread M.-A. Lemburg
Walter Dörwald wrote:
 M.-A. Lemburg wrote:
 Walter Dörwald wrote:
 I'd suggest we keep codecs.lookup() the way it is and
 instead add new functions to the codecs module, e.g.
 codecs.getencoderobject() and codecs.getdecoderobject().

 Changing the codec registration is not much of a problem:
 we could simply allow 6-tuples to be passed into the
 registry.
 OK, so codecs.lookup() returns 4-tuples, but the registry stores 6-tuples 
 and the search functions must return 6-tuples.
 And we add codecs.getencoderobject() and codecs.getdecoderobject() as 
 well as new classes codecs.StatefulEncoder and
 codecs.StatefulDecoder. What about old search functions that return 
 4-tuples?
 The registry should then simply set the missing entries to None and the
 getencoderobject()/getdecoderobject() would then have to raise an error.
 Sounds simple enough and we don't lose backwards compatibility.

 Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
 +1, but I'd like to have a replacement for this, i.e. a function that 
 returns all info the registry has about an encoding:

 1. Name
 2. Encoder function
 3. Decoder function
 4. Stateful encoder factory
 5. Stateful decoder factory
 6. Stream writer factory
 7. Stream reader factory

 and if this is an object with attributes, we won't have any problems if we 
 extend it in the future.
 Shouldn't be a problem: just expose the registry dictionary
 via the _codecs module.

 The rest can then be done in a Python function defined in
 codecs.py using a CodecInfo class.
 
 This would require the Python code to call codecs.lookup() and then look into 
 the codecs dictionary (normalizing the encoding
 name again). Maybe we should make a version of _PyCodec_Lookup() that allows 
 4- and 6-tuples available to Python and use that?
 The official PyCodec_Lookup() would then have to downgrade the 6-tuples to 
 4-tuples.

Hmm, you're right: the dictionary may not have the requested codec
info yet (it's only used as cache) and only a call to _PyCodec_Lookup()
would fill it.

 BTW, if we change the API, can we fix the return value of the stateless 
 functions? As the stateless function always
 encodes/decodes the complete string, returning the length of the string 
 doesn't make sense.
 codecs.getencoder() and codecs.getdecoder() would have to continue to 
 return the old variant of the functions, but
 codecs.getinfo("latin-1").encoder would be the new encoding function.
 No: you can still write stateless encoders or decoders that do
 not process the whole input string. Just because we don't have
 any of those in Python, doesn't mean that they can't be written
 and used. A stateless codec might want to leave the work
 of buffering bytes at the end of the input data which cannot
 be processed to the caller.
 
 But what would the caller do with that info? It can't retry encoding/decoding 
 the rejected input, because the state of the codec
 has been thrown away already.

This depends a lot on the nature of the codec. It may well be
possible to work on chunks of input data in a stateless way,
e.g. say you have a string of 4-byte hex values, then the decode
function would be able to work on 4 bytes each and let the caller
buffer any remaining bytes for the next call. There'd be no need for
keeping state in the decoder function.
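
The 4-byte hex example can be sketched as follows. This is an illustration only, not code from any patch in this thread: the function names decode_hex4 and feed_chunks are made up, and the input is modelled as text containing groups of four hex digits. The decode function keeps no state; it reports how much input it consumed and the caller buffers the rest.

```python
def decode_hex4(data):
    """Stateless decode: handle complete 4-character hex groups only.

    Returns (values, consumed), where consumed is the number of input
    characters that were actually processed.
    """
    usable = len(data) - (len(data) % 4)
    values = [int(data[i:i + 4], 16) for i in range(0, usable, 4)]
    return values, usable


def feed_chunks(chunks):
    """Caller-side buffering: carry unprocessed input over to the next call."""
    pending = ""
    out = []
    for chunk in chunks:
        pending += chunk
        values, consumed = decode_hex4(pending)
        out.extend(values)
        pending = pending[consumed:]  # leftover input for the next round
    return out, pending


values, leftover = feed_chunks(["00ff0", "1", "0a", "bc"])
```

Here the consumed count is what makes the scheme work: without it the caller could not tell which tail of the input still needs to be resubmitted.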

 It is also possible to write
 stateful codecs on top of such stateless encoding and decoding
 functions.
 
 That's what the codec helper functions from Python/_codecs.c are for.

I'm not sure what you mean here.

 Anyway, I've started implementing a patch that just adds 
 codecs.StatefulEncoder/codecs.StatefulDecoder. UTF8, UTF8-Sig, UTF-16,
 UTF-16-LE and UTF-16-BE are already working.

Nice :-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-18 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 True. However, note that the .encode()/.decode() methods on
 strings and Unicode narrow down the possible return types.
 The corresponding .bytes methods should only allow bytes and
 Unicode.
 I forgot that: what is the rationale for that restriction?

 To assure that only those types can be returned from those
 methods, ie. instances of basestring, which in return permits
 type inference for those methods.
 
 Hmm. So it is for type inference.
 Where is that documented?

Somewhere in the python-dev mailing list archives ;-)

Seriously, we should probably add this to the documentation.

 This looks pretty inconsistent. Either codecs can give arbitrary
 return types, then .encode/.decode should also be allowed to
 give arbitrary return types, or codecs should be restricted.

No.

As I've said before: the .encode() and .decode() methods
are convenience methods to interface to codecs which take
string/Unicode on input and create string/Unicode output.

 What's the point of first allowing a wide interface, and then
 narrowing it?

The codec interface is an abstract interface. It is flexible
enough to allow codecs to define possible input and output
types while being strict about the method names and signatures.

Much like the file interface in Python, the copy protocol
or the pickle interface.

 Also, if type inference is the goal, what is the point in allowing
 two result types?

I'm not sure I understand the question: type inference is about
being able to infer the types of (among other things) function
return objects. This is what the restriction guarantees - much
like int() guarantees that you get either an integer or a long.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Pre-PEP: The bytes object

2006-02-17 Thread M.-A. Lemburg
Guido van Rossum wrote:
 On 2/15/06, Neil Schemenauer [EMAIL PROTECTED] wrote:
 This could be a replacement for PEP 332.  At least I hope it can
 serve to summarize the previous discussion and help focus on the
 currently undecided issues.

 I'm too tired to dig up the rules for assigning it a PEP number.
 Also, there are probably silly typos, etc.   Sorry.
 
 I may check it in for you, although right now it would be good if we
 had some more feedback.
 
 I noticed one behavior in your pseudo-code constructor that seems
 questionable: while in the Q&A section you explain why the encoding is
 ignored when the argument is a str instance, in fact you require an
 encoding (and one that's not "ascii") if the str instance contains any
 non-ASCII bytes. So bytes("\xff") would fail, but bytes("\xff",
 "blah") would succeed. I think that's a bit strange -- if you ignore
 the encoding, you should always ignore it. So IMO bytes("\xff") and
 bytes("\xff", "ascii") should both return the same as bytes([255]).
 Also, there's a code path where the initializer is a unicode instance
 and its encode() method is called with None as the argument. I think
 both could be fixed by setting the encoding to
 sys.getdefaultencoding() if it is None and the argument is a unicode
 instance:
 
 def bytes(initialiser=[], encoding=None):
     if isinstance(initialiser, basestring):
         if isinstance(initialiser, unicode):
             if encoding is None:
                 encoding = sys.getdefaultencoding()
             initialiser = initialiser.encode(encoding)
         initialiser = [ord(c) for c in initialiser]
     elif encoding is not None:
         raise TypeError("explicit encoding invalid for non-string "
                         "initialiser")
     create bytes object and fill with integers from initialiser
     return bytes object
 
 BTW, for folks who want to experiment, it's quite simple to create a
 working bytes implementation by inheriting from array.array. Here's a
 quick draft (which only takes str instance arguments):
 
 from array import array
 class bytes(array):
     def __new__(cls, data=None):
         b = array.__new__(cls, "B")
         if data is not None:
             b.fromstring(data)
         return b
     def __str__(self):
         return self.tostring()
     def __repr__(self):
         return "bytes(%s)" % repr(list(self))
     def __add__(self, other):
         if isinstance(other, array):
             return bytes(super(bytes, self).__add__(other))
         return NotImplemented

Another hint:

If you want to play around with the migration
to all Unicode in Py3k, start Python with the -U switch and
monkey-patch the builtin str to be an alias for unicode.

Ideally, the bytes type should work under both the Py3k conditions
and the Py2.x default ones.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] str object going in Py3K

2006-02-17 Thread M.-A. Lemburg
Guido van Rossum wrote:
 On 2/16/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 What will be the explicit way to open a file in bytes mode
 and in text mode (I for one would like to move away from
 open() completely as well) ?

 Will we have a single file type with two different modes
 or two different types ?
 
 I'm currently thinking of an I/O stack somewhat like Java's. At the
 bottom there's a class that lets you do raw unbuffered reads and
 writes (and seek/tell) on binary files using bytes arrays. We can
 layer onto this buffering, text encoding/decoding, and more. (Windows
 CRLF<->LF conversion is also an encoding of sorts).

Sounds like the stackable StreamWriters and -Readers would
nicely integrate into this design.
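
To make the layering idea concrete, here is a small sketch using the existing codecs.getwriter() and codecs.getreader() factories on top of an in-memory byte stream. It is written for current Python 3 and only illustrates stacking a text-encoding layer over a raw byte layer; it is not the sio prototype mentioned above.

```python
import codecs
import io

# Bottom layer: a plain byte stream.
raw = io.BytesIO()

# Stacked on top: a StreamWriter that encodes text on the way down.
writer = codecs.getwriter("utf-16")(raw)
writer.write("ab")
writer.write("cd")  # the writer remembers that the BOM was already written

data = raw.getvalue()

# Reading goes the other way: a StreamReader decodes bytes on the way up.
reader = codecs.getreader("utf-16")(io.BytesIO(data))
text = reader.read()
```

Because the writer is stateful, the byte order mark appears once at the start of the stream rather than once per write() call.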

 Years ago I wrote a prototype; check out sandbox/sio/.

Thanks. Maybe one of these days I'll get around to having
a look - unlike many of the pydev folks, I don't work for
Google and can't spend 20% or 50% of my time on
Python core development :-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] str.translate vs unicode.translate

2006-02-17 Thread M.-A. Lemburg
Bengt Richter wrote:
 If str becomes unicode for PY 3000, and we then have bytes as our
 coding-agnostic byte data, then I think bytes should have the str
 translation method, with a tweak that I would hope could also be done
 to str now.
 
 BTW, str.translate will presumably become unicode.translate, so
 perhaps unicode.translate should grow a compatible deletechars parameter.

I'd much rather see the .translate() method deprecated.

Writing a codec for the task is much more effective - the
builtin charmap codec will do all the mapping for you,
if you have a need to go from bytes to Unicode and vice
versa.

We could also have a bytemap codec for doing bytes to bytes
conversions.
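
A rough sketch of the charmap approach, using the codecs.charmap_decode() and codecs.charmap_encode() helpers as they exist in current Python 3. The three-entry mapping is invented for the example; a real codec would cover the full byte range and wrap these calls in encode/decode functions registered with the codec registry.

```python
import codecs

# Hypothetical mapping from byte values to Unicode characters.
decoding_map = {0x61: "\u03b1", 0x62: "\u03b2", 0x63: "\u03b3"}

# Inverse mapping from code points back to byte values.
encoding_map = {ord(ch): byte for byte, ch in decoding_map.items()}

# bytes -> Unicode via the charmap machinery.
text, consumed = codecs.charmap_decode(b"abc", "strict", decoding_map)

# Unicode -> bytes via the inverse mapping.
data, _ = codecs.charmap_encode(text, "strict", encoding_map)
```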

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-17 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 Josiah Carlson wrote:
 I would agree that zip is questionable, but 'uu', 'rot13', perhaps 'hex',
 and likely a few others that the two of you may be arguing against
 should stay as encodings, because strictly speaking, they are defined as
 encodings of data.  They may not be encodings of _unicode_ data, but
 that doesn't mean that they aren't useful encodings for other kinds of
 data, some text, some binary, ...
 
 To support them, the bytes type would have to gain a .encode method,
 and I'm -1 on supporting bytes.encode, or string.decode.
 
 Why is
 
 s.encode("uu")
 
 any better than
 
 binascii.b2a_uu(s)

The .encode() and .decode() methods are merely convenience
interfaces to the registered codecs (with some extra logic to
make sure that only a pre-defined set of return types are allowed).
It's up to the user to use them for e.g. UU-encoding or not.

The reason we have codecs for UU, zip and the others is that
you can use their StreamWriters/Readers in stackable streams.

Just because some codecs don't fit into the string.decode()
or bytes.encode() scenario doesn't mean that these codecs are
useless or that the methods should be banned.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Python-checkins] r42396 - peps/trunk/pep-0011.txt

2006-02-17 Thread M.-A. Lemburg
Neal Norwitz wrote:
 [Moving to python-dev]
 
 I don't have a strong opinion.  Any one else have an opinion about
 removing --with-wctype-functions from configure?

FWIW, I announced this plan in Dec 2004:

http://mail.python.org/pipermail/python-dev/2004-December/050193.html

I didn't get any replies back then, so assumed that no-one
would object, but forgot to add this to PEP 11.

The reason I'd like to get this removed early rather than
later is that some Linux distros happen to use the config
switch causing the Python Unicode implementation on those
distros to behave inconsistently with regular Python
builds.

After all we've put a lot of effort into making sure that
the Unicode implementation does work independently of
the platform, even on platforms that don't have Unicode
support at all.

Another candidate for removal is the --disable-unicode
switch.

We should probably add a deprecation warning for that in
Py 2.5 and then remove the hundreds of
#ifdef Py_USING_UNICODE
from the source code in time for Py 2.6.

 n
 --
 
 On 2/16/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 neal.norwitz wrote:
 Author: neal.norwitz
 Date: Thu Feb 16 06:25:37 2006
 New Revision: 42396

 Modified:
peps/trunk/pep-0011.txt
 Log:
 MAL says this option should go away in bug report 874534:

 The reason for the removal is that the option causes
 semantical problems and makes Unicode work in non-standard
 ways on platforms that use locale-aware extensions to the
 wc-type functions.

 Since it wasn't previously announced, we can keep the option until 2.6
 unless someone feels strong enough to rip it out.
 I've been wanting to rip this out for some time now, but
 you're right: I forgot to add this to PEP 11, so let's
 wait for another release.

 OTOH, this normally only affects system builders, so perhaps
 we could do this a little faster, e.g. add a warning in the
 first alpha and then rip it out with one of the last betas ?!

 Modified: peps/trunk/pep-0011.txt

 +Name: Systems using --with-wctype-functions
 +Unsupported in:   Python 2.6
 +Code removed in:  Python 2.6

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Stateful codecs [Was: str object going in Py3K]

2006-02-17 Thread M.-A. Lemburg
Walter Dörwald wrote:
 M.-A. Lemburg wrote:
 
 Walter Dörwald wrote:
 Guido van Rossum wrote:

 [...]
 Years ago I wrote a prototype; check out sandbox/sio/.
 However sio.DecodingInputFilter and sio.EncodingOutputFilter don't work
 for encodings that need state (e.g. when reading/writing UTF-16).
 Switching to stateful encoders/decoders isn't so easy, because the
 stateful codecs require a stream-API, which brings in a whole bunch of
 other functionality (readline() etc.), which we'd probably like to keep
 separate. I have a patch (http://bugs.python.org/1101097) that should
 fix this problem (at least for all codecs derived from
 codecs.StreamReader/codecs.StreamWriter). Additionally it would make
 stateful codecs more useful in the context for iterators/generators.

 I'd like this patch to go into 2.5.

 The patch as-is won't go into 2.5. It's simply the wrong approach:
 StreamReaders and -Writers work on streams (hence the name). It
 doesn't make sense adding functionality to side-step this behavior,
 since it undermines the design.
 
 I agree that using a StreamWriter without a stream somehow feels wrong.
 
 Like I suggested in the patch discussion, such functionality could
 be factored out of the implementations of StreamReaders/Writers
 and put into new StatefulEncoder/Decoder classes, the objects of
 which then get used by StreamReader/Writer.

 In addition to that we could extend the codec registry to also
 maintain slots for the stateful encoders and decoders, if needed.
 
 We *have* to do it like this otherwise there would be no way to get a
 StatefulEncoder/Decoder from an encoding name.

 Does this mean that codecs.lookup() would have to return a 6-tuple? 
 But this would break if someone uses codecs.lookup("foo")[-1].

Right; though I'd much rather see that people use the direct
codecs module lookup APIs:

getencoder(), getdecoder(), getreader() and getwriter()

instead of using codecs.lookup() directly.
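
For readers unfamiliar with these accessors, a short sketch (current Python 3 syntax) of how they are used without touching the tuple returned by codecs.lookup(). The sample string is arbitrary.

```python
import codecs
import io

# Stateless encode/decode functions; each returns (output, length consumed).
encode = codecs.getencoder("utf-8")
decode = codecs.getdecoder("utf-8")

data, chars_consumed = encode("h\u00e9llo")
text, bytes_consumed = decode(data)

# Stream factories; each wraps a file-like object.
reader = codecs.getreader("utf-8")(io.BytesIO(data))
sink = io.BytesIO()
writer = codecs.getwriter("utf-8")(sink)
writer.write(text)

round_trip = reader.read()
```

Code written this way does not depend on the position of any entry in the lookup result, which is the breakage concern raised above.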

 So maybe
 codecs.lookup() should return an instance of a subclass of tuple which
 has the StatefulEncoder/Decoder as attributes. But then codecs.lookup()
 must be able to handle old 4-tuples returned by old search functions and
 update those to the new 6-tuples. (But we could drop this again after
 several releases, once all third party codecs are updated).

This was a design error: I should not have made
codecs.lookup() a documented function.

I'd suggest we keep codecs.lookup() the way it is and
instead add new functions to the codecs module, e.g.
codecs.getencoderobject() and codecs.getdecoderobject().

Changing the codec registration is not much of a problem:
we could simply allow 6-tuples to be passed into the
registry.
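
For comparison with the proposal above: the codecs module that eventually shipped exposes stateful objects through codecs.getincrementalencoder() and codecs.getincrementaldecoder(), rather than the getencoderobject()/getdecoderobject() names floated here. The sketch below uses those shipped names in current Python 3 and feeds a UTF-16 byte string in 3-byte pieces, so that characters are split across calls.

```python
import codecs

data = "h\u00e9llo".encode("utf-16")

# A stateful decoder object obtained from the encoding name alone,
# with no stream involved.
decoder = codecs.getincrementaldecoder("utf-16")()

pieces = []
for start in range(0, len(data), 3):
    # Incomplete code units are buffered inside the decoder between calls.
    pieces.append(decoder.decode(data[start:start + 3]))
pieces.append(decoder.decode(b"", final=True))

text = "".join(pieces)
```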

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Stateful codecs [Was: str object going in Py3K]

2006-02-17 Thread M.-A. Lemburg
Walter Dörwald wrote:
 I'd suggest we keep codecs.lookup() the way it is and
 instead add new functions to the codecs module, e.g.
 codecs.getencoderobject() and codecs.getdecoderobject().

 Changing the codec registration is not much of a problem:
 we could simply allow 6-tuples to be passed into the
 registry.
 OK, so codecs.lookup() returns 4-tuples, but the registry stores
 6-tuples and the search functions must return 6-tuples. And we add
 codecs.getencoderobject() and codecs.getdecoderobject() as well as new
 classes codecs.StatefulEncoder and codecs.StatefulDecoder. What about
 old search functions that return 4-tuples?

 The registry should then simply set the missing entries to None
 and the getencoderobject()/getdecoderobject() would then have
 to raise an error.
 
 Sounds simple enough and we don't lose backwards compatibility.
 
 Perhaps we should also deprecate codecs.lookup() in Py 2.5 ?!
 
 +1, but I'd like to have a replacement for this, i.e. a function that
 returns all info the registry has about an encoding:
 
 1. Name
 2. Encoder function
 3. Decoder function
 4. Stateful encoder factory
 5. Stateful decoder factory
 6. Stream writer factory
 7. Stream reader factory
 
 and if this is an object with attributes, we won't have any problems if
 we extend it in the future.

Shouldn't be a problem: just expose the registry dictionary
via the _codecs module.

The rest can then be done in a Python function defined in
codecs.py using a CodecInfo class.
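
As a point of reference, the codecs module in current Python 3 does have a CodecInfo class along these lines: codecs.lookup() returns an object with named attributes that still behaves like the old 4-tuple. The sketch below only inspects that object; the attribute names are those of today's module and may differ from what was being designed in this thread.

```python
import codecs

info = codecs.lookup("latin-1")

# Attribute access covers the stateless functions, the stateful
# (incremental) factories and the stream factories.
encoded, consumed = info.encode("\u00e9")
has_incremental = (
    info.incrementalencoder is not None
    and info.incrementaldecoder is not None
)

# For backwards compatibility the object is still a 4-tuple:
# (encode, decode, streamreader, streamwriter).
legacy_length = len(info)
legacy_writer = info[-1]
```

Keeping the tuple interface is what allows old code such as codecs.lookup(name)[-1] to continue working next to the attribute-based access.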

 BTW, if we change the API, can we fix the return value of the stateless
 functions? As the stateless function always encodes/decodes the complete
 string, returning the length of the string doesn't make sense.
 codecs.getencoder() and codecs.getdecoder() would have to continue to
 return the old variant of the functions, but
 codecs.getinfo("latin-1").encoder would be the new encoding function.

No: you can still write stateless encoders or decoders that do
not process the whole input string. Just because we don't have
any of those in Python, doesn't mean that they can't be written
and used. A stateless codec might want to leave the work
of buffering bytes at the end of the input data which cannot
be processed to the caller. It is also possible to write
stateful codecs on top of such stateless encoding and decoding
functions.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] from __future__ import unicode_strings?

2006-02-16 Thread M.-A. Lemburg
Neil Schemenauer wrote:
 On Thu, Feb 16, 2006 at 02:43:02AM +0100, Thomas Wouters wrote:
 On Wed, Feb 15, 2006 at 05:23:56PM -0800, Guido van Rossum wrote:

 from __future__ import unicode_strings
 Didn't we have a command-line option to do this? I believe it was
 removed because nobody could see the point. (Or am I hallucinating?
 After several days of non-stop discussing bytes that must be
 considered a possibility.)
 We do, and it's not been removed: the -U switch.
 
 As Guido alluded, the global switch is useless.  A per-module switch is
 something that could actually be useful.  One nice advantage is that
 you would write code that works the same with Jython (wrt string
 literals anyhow).

The global switch is not useless. Its purpose is to test the
standard library (or any other piece of Python code) for Unicode
compatibility.

Since we're not even close to such compatibility, I'm not sure
how useful a per-module switch would be.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] from __future__ import unicode_strings?

2006-02-16 Thread M.-A. Lemburg
Giovanni Bajo wrote:
 Thomas Wouters [EMAIL PROTECTED] wrote:
 
 from __future__ import unicode_strings
 Didn't we have a command-line option to do this? I believe it was
 removed because nobody could see the point. (Or am I hallucinating?
 After several days of non-stop discussing bytes that must be
 considered a possibility.)
 We do, and it's not been removed: the -U switch.
 
 
 It's not in the output of python -h, though. Is it secret or what?

Yes.

We removed it from the help output to not confuse users
who are not aware of the fact that this is an experimental
switch.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] from __future__ import unicode_strings?

2006-02-16 Thread M.-A. Lemburg
Jean-Paul Calderone wrote:
 On Thu, 16 Feb 2006 11:24:35 +0100, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 Neil Schemenauer wrote:
 On Thu, Feb 16, 2006 at 02:43:02AM +0100, Thomas Wouters wrote:
 On Wed, Feb 15, 2006 at 05:23:56PM -0800, Guido van Rossum wrote:

 from __future__ import unicode_strings
 Didn't we have a command-line option to do this? I believe it was
 removed because nobody could see the point. (Or am I hallucinating?
 After several days of non-stop discussing bytes that must be
 considered a possibility.)
 We do, and it's not been removed: the -U switch.
 As Guido alluded, the global switch is useless.  A per-module switch
 is something that could actually be useful.  One nice advantage is that
 you would write code that works the same with Jython (wrt string
 literals anyhow).
 The global switch is not useless. Its purpose is to test the
 standard library (or any other piece of Python code) for Unicode
 compatibility.

 Since we're not even close to such compatibility, I'm not sure
 how useful a per-module switch would be.
 
 Just what Neil suggested: developers writing new code benefit from having the 
 behavior which will ultimately be Python's default, rather than the behavior 
 that is known to be destined for obsolescence.
 
 Being able to turn this on per-module is useful for the same reason the rest 
 of the future system is useful on a per-module basis.  It's easier to convert 
 things incrementally than monolithically.

Sure, but in this case the option would not only affect the module
you define it in, but also all other code that now gets Unicode
objects instead of strings as a result of the Unicode literals
defined in these modules.

It is rather likely that you'll start hitting Unicode-related
compatibility bugs in the standard lib more often than you'd
like.

It's usually better to switch to Unicode in a controlled manner:
not by switching all literals to Unicode, but only some, then
test things, then switch over some more, etc.

This can be done by prepending the literal with the u modifier.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] str object going in Py3K

2006-02-16 Thread M.-A. Lemburg
Guido van Rossum wrote:
 On 2/15/06, Alex Martelli [EMAIL PROTECTED] wrote:
 I agree, or, MAL's idea of bytes.open() and unicode.open() is also
 good.
 
 No, the bytes and text data types shouldn't have to be tied to the I/O
 system. (The latter tends to evolve at a much faster rate so should be
 isolated.)
 
 My fondest dream is that we do NOT have an 'open' builtin
 which has proven to be very error-prone when used in Windows by
 newbies (as evidenced by beginner errors as seen on c.l.py, the
 python-help lists, and other venues) -- defaulting 'open' to text is
 error-prone, defaulting it to binary doesn't seem the greatest idea
 either; the principle "when in doubt, resist the temptation to guess"
 strongly suggests not having 'open' as a built-in at all.
 
 Bill Janssen has expressed this sentiment too. But this is because
 open() *appears* to work for both types to Unix programmers. If open()
 is *only* usable for text data, even Unix programmers will be using
 openbytes() from the start.

All the variations aside:

What will be the explicit way to open a file in bytes mode
and in text mode (I for one would like to move away from
open() completely as well) ?

Will we have a single file type with two different modes
or two different types ?
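
For readers coming to this thread later: Python 3 as eventually released kept a single open() builtin and picks the object type from the mode string. The sketch below only illustrates that released behaviour and is not something proposed in this thread; the temporary file is a throwaway created for the example.

```python
import io
import os
import tempfile

# Write a small UTF-8 file containing one non-ASCII character.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write("caf\u00e9\n".encode("utf-8"))

# Binary mode: a buffered byte stream whose read() returns bytes.
with open(path, "rb") as f:
    binary_is_buffered = isinstance(f, io.BufferedReader)
    raw = f.read()

# Text mode: a decoding wrapper whose read() returns str.
with open(path, "r", encoding="utf-8") as f:
    text_is_wrapper = isinstance(f, io.TextIOWrapper)
    text = f.read()

os.remove(path)
```

So the answer that shipped is "two different types behind one function", with the explicit encoding argument being the text-mode half of the contract.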

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread M.-A. Lemburg
Guido van Rossum wrote:
 On 2/15/06, Nick Coghlan [EMAIL PROTECTED] wrote:
 If we went with longer names, a slight variation on the opentext/openbinary
 idea would be to use opentext and opendata.
 
 After some thinking I don't like opendata any more -- often data is
 text, so the term is wrong. openbinary is fine but long. So how about
 openbytes? This clearly links the resulting object with the bytes
 type, which is mutually reassuring.
 
 Regarding open vs. opentext, I'm still not sure. I don't want to
 generalize from the openbytes precedent to openstr or openunicode
 (especially since the former is wrong in 2.x and the latter is wrong
 in 3.0). I'm tempted to hold out for open() since it's most
 compatible.

Maybe a weird idea, but why not use static methods on the
bytes and str type objects for this ?!

E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
renamed to str.openfile())

After all, you are in a certain way constructing objects
of the given types - only that the inputs to these
constructors happen to be files in the file system.
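
bytes.openfile() and str.openfile() were never added. The closest thing I know of in today's standard library is pathlib, where the path object rather than the data type carries the two "constructors". A minimal sketch, assuming Python 3; the file name is hypothetical:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "example.txt"  # hypothetical file name
    path.write_bytes("gr\u00fc\u00df".encode("utf-8"))

    # Construct a bytes object from the file contents.
    as_bytes = path.read_bytes()
    # Construct a str object from the same file, with an explicit encoding.
    as_text = path.read_text(encoding="utf-8")
```

The shape is the same as in the proposal above: one call per result type, and the text variant needs an encoding.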

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] str object going in Py3K

2006-02-15 Thread M.-A. Lemburg
Barry Warsaw wrote:
 On Wed, 2006-02-15 at 18:29 +0100, M.-A. Lemburg wrote:
 
 Maybe a weird idea, but why not use static methods on the
 bytes and str type objects for this ?!

 E.g. bytes.openfile(...) and unicode.openfile(...) (in 3.0
 renamed to str.openfile())
 
 That's also not a bad idea, but I'd leave off one or the other of the
 redundant open and file parts.  E.g. bytes.open() and unicode.open()
 seem fine to me (we all know what 'open' means, right? :).

Thinking about it, I like your idea better (file.bytes()
and file.text()).

Anyway, as long as we don't start adding openthis() and openthat()
I guess I'm happy ;-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] bytes.from_hex() [Was: PEP 332 revival in coordination with pep 349?]

2006-02-15 Thread M.-A. Lemburg
Jason Orendorff wrote:
 Instead of byte literals, how about a classmethod bytes.from_hex(), which
 works like this:
 
   # two equivalent things
   expected_md5_hash = bytes.from_hex('5c535024cac5199153e3834fe5c92e6a')
   expected_md5_hash = bytes([92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
 131, 79, 229, 201, 46, 106])
 
 It's just a nicety; the former fits my brain a little better.  This would
 work fine both in 2.5 and in 3.0.
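
For reference, released Python 3 has a classmethod of this kind, spelled bytes.fromhex() (no underscore). The sketch below checks the two spellings from the example above against each other; it assumes Python 3.5 or later for the bytes.hex() round trip.

```python
hex_digest = "5c535024cac5199153e3834fe5c92e6a"
ordinals = [92, 83, 80, 36, 202, 197, 25, 145, 83, 227,
            131, 79, 229, 201, 46, 106]

from_hex = bytes.fromhex(hex_digest)   # parse pairs of hex digits
from_ints = bytes(ordinals)            # build from a sequence of ints
round_trip = from_hex.hex()            # back to the hex string
```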
 
 I thought about unicode.encode('hex'), but obviously it will continue to
 return a str in 2.x, not bytes.  Also the pseudo-encodings ('hex', 'rot13',
 'zip', 'uu', etc.) generally scare me. 

Those are not pseudo-encodings, they are regular codecs.

It's a common misunderstanding that codecs only serve
the purpose of converting between Unicode and strings.

The codec system is deliberately designed to be general enough
to also work with many other types, e.g. it is easily possible to
write a codec that converts between the hex literal sequence you
have above and a list of ordinals:

""" Hex string codec

Converts between a list of ordinals and a two byte hex literal
string.

Usage:
>>> codecs.encode([1,2,3], 'hexstring')
'010203'
>>> codecs.decode(_, 'hexstring')
[1, 2, 3]

(c) 2006, Marc-Andre Lemburg.

"""
import codecs

class Codec(codecs.Codec):

    def encode(self, input, errors='strict'):

        """ Convert hex ordinal list to hex literal string.
        """
        if not isinstance(input, list):
            raise TypeError('expected list of integers')
        return (
            ''.join(['%02x' % x for x in input]),
            len(input))

    def decode(self,input,errors='strict'):

        """ Convert hex literal string to hex ordinal list.
        """
        if not isinstance(input, str):
            raise TypeError('expected string of hex literals')
        size = len(input)
        if not size % 2 == 0:
            raise TypeError('input string has uneven length')
        return (
            [int(input[(i<<1):(i<<1)+2], 16)
             for i in range(size >> 1)],
            size)

class StreamWriter(Codec,codecs.StreamWriter):
    pass

class StreamReader(Codec,codecs.StreamReader):
    pass

def getregentry():
    return (Codec().encode,Codec().decode,StreamReader,StreamWriter)
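
A minimal Python 3 sketch of the same idea, for anyone who wants to try it. This is an adaptation and not the module above: it registers the codec through a search function and codecs.CodecInfo rather than a getregentry() module in the encodings package, and the names hexstring_encode, hexstring_decode and search_hexstring are made up for the example.

```python
import codecs

def hexstring_encode(input, errors="strict"):
    """Convert a list of byte ordinals to a hex literal string."""
    if not isinstance(input, list):
        raise TypeError("expected list of integers")
    return "".join("%02x" % x for x in input), len(input)

def hexstring_decode(input, errors="strict"):
    """Convert a hex literal string to a list of byte ordinals."""
    if not isinstance(input, str):
        raise TypeError("expected string of hex literals")
    if len(input) % 2:
        raise TypeError("input string has uneven length")
    ordinals = [int(input[i:i + 2], 16) for i in range(0, len(input), 2)]
    return ordinals, len(input)

def search_hexstring(name):
    # The codec registry passes the normalized (lower-case) codec name.
    if name == "hexstring":
        return codecs.CodecInfo(hexstring_encode, hexstring_decode,
                                name="hexstring")
    return None

codecs.register(search_hexstring)

encoded = codecs.encode([1, 2, 3], "hexstring")
decoded = codecs.decode(encoded, "hexstring")
```

Note that the codec is reached through codecs.encode() and codecs.decode(); a list has no .encode() method, which is the point being made about the codec system being more general than the string methods.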

 And now that bytes and text are
 going to be two very different types, they're even weirder than before.
 Consider:
 
   text.encode('utf-8') == bytes
   text.encode('rot13') == text
   bytes.encode('zip') == bytes
   bytes.encode('uu') == text (?)
 
 This state of affairs seems kind of crazy to me.

Really ?

It all depends on what you use the codecs for. The above
uses through the .encode() and .decode() methods are
not the only way you can make use of them.

To get full access to the codecs, you'll have to use
the codecs module.
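
As a rough illustration of that split, here is how it looks in released Python 3 (3.4 or later assumed), where the codecs module functions accept the non-text codecs and the str method refuses them. The codec names used are the ones shipped in the standard library.

```python
import codecs

payload = b"hello hello hello"

hexed = codecs.encode(payload, "hex_codec")         # bytes -> bytes
compressed = codecs.encode(payload, "zlib_codec")   # bytes -> bytes
rotated = codecs.encode("Hello", "rot_13")          # str -> str

restored = codecs.decode(compressed, "zlib_codec")
unrotated = codecs.decode(rotated, "rot_13")

# The convenience method on str only accepts text encodings.
try:
    "Hello".encode("rot_13")
    method_refused = False
except LookupError:
    method_refused = True
```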

 Actually users trying to figure out Unicode would probably be better served
 if bytes.encode() and text.decode() did not exist.

You're missing the point: the .encode() and .decode() methods
are merely interfaces to the registered codecs. Whether they
make sense for a certain codec depends on the codec, not the
methods that interface to it, and again, codecs do not
only exist to convert between Unicode and strings.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification

2006-02-14 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 It's the consequences:  nobody complains about tacking const on to a
 former honest-to-God char * argument that was in fact not modified,
 because that's not only helpful for C++ programmers, it's _harmless_
 for all programmers.  For example, nobody could sanely object (and
 nobody did :-)) to adding const to the attribute-name argument in
 PyObject_SetAttrString().  Sticking to that creates no new problems
 for anyone, so that's as far as I ever went.

 Well, it broke my C extensions... I now have this in my code:

 /* The keyword array changed to const char* in Python 2.5 */
 #if PY_VERSION_HEX >= 0x02050000
 # define Py_KEYWORDS_STRING_TYPE const char
 #else
 # define Py_KEYWORDS_STRING_TYPE char
 #endif
 ...
 static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
 ...
 
 You did not read Tim's message carefully enough. He wasn't talking
 about PyArg_ParseTupleAndKeywords *at all*. He only talked about
 changing char* arguments to const char*, e.g. in
 PyObject_SetAttrString. Did that break your C extensions also?

I did read Tim's post: sorry for phrasing the reply the way I did.

I was referring to his statement "nobody complains about tacking const
on to a former honest-to-God char * argument that was in fact not
modified".

Also: it's not me complaining, it's the compilers !

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread M.-A. Lemburg
James Y Knight wrote:
 Kill the encoding argument, and you're left with:
 
 Python2.X:
 - bytes(bytes_object) - copy constructor
 - bytes(str_object) - copy the bytes from the str to the bytes object
 - bytes(sequence_of_ints) - make bytes with the values of the ints,  
 error on overflow
 
 Python3.X removes str, and most APIs that did return str return bytes  
 instead. Now all you have is:
 - bytes(bytes_object) - copy constructor
 - bytes(sequence_of_ints) - make bytes with the values of the ints,  
 error on overflow
 
 Nice and simple.

Albeit, too simple.

The above approach would basically remove the possibility to easily
create bytes() from literals in Py3k, since literals in Py3k create
Unicode objects, e.g. bytes("123") would not work in Py3k.

It's hard to imagine how you'd provide a decent upgrade path
for bytes() if you introduce the above semantics in Py2.x.

People would start writing bytes("123") in Py2.x and expect
it to also work in Py3k, which it wouldn't.

To prevent this, you'd have to rule out bytes() construction
from strings altogether, which doesn't look like a viable
option either.
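
For comparison, this is how the constructor behaves in released Python 3, which settled on requiring an explicit encoding for text input. The sketch is only a check of that released behaviour, not of any proposal in this thread.

```python
copied = bytes(b"abc")                 # copy constructor
from_ints = bytes([97, 98, 99])        # sequence of ints in range(256)
from_text = bytes("abc", "latin-1")    # text needs an explicit encoding

# Text without an encoding is rejected rather than guessed at.
try:
    bytes("abc")
    error_without_encoding = None
except TypeError as exc:
    error_without_encoding = exc

# Values outside 0..255 are an error, as in the list above.
try:
    bytes([256])
    overflow_error = None
except ValueError as exc:
    overflow_error = exc
```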

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread M.-A. Lemburg
Guido van Rossum wrote:
 On 2/13/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 Guido van Rossum wrote:
 It'd be cruel and unusual punishment though to have to write

   bytes("abc", "Latin-1")

 I propose that the default encoding (for basestring instances) ought
 to be ascii just like everywhere else. (Meaning, it should really be
 the system default encoding, which defaults to ascii and is
 intentionally hard to change.)
 We're talking about Py3k here: "abc" will be a Unicode string,
 so why restrict the conversion to 7 bits when you can have 8 bits
 without any conversion problems ?
 
 As Phillip guessed, I was indeed thinking about introducing bytes()
 sooner than that, perhaps even in 2.5 (though I don't want anything
 rushed).

Hmm, that is probably going to be too early. As the thread shows
there are lots of things to take into account, esp. since if you
plan to introduce bytes() in 2.x, the upgrade path to 3.x would
have to be carefully planned. Otherwise, we'd introduce a feature
which is meant to prepare for 3.x, only to end up causing breakage
when the move is finally implemented.

 Even in Py3k though, the encoding issue stands -- what if the file
 encoding is Unicode? Then using Latin-1 to encode bytes by default
 might not be what the user expected. Or what if the file encoding is
 something totally different? (Cyrillic, Greek, Japanese, Klingon.)
 Anything default but ASCII isn't going to work as expected. ASCII
 isn't going to work as expected either, but it will complain loudly
 (by throwing a UnicodeError) whenever you try it, rather than causing
 subtle bugs later.

I think there's a misunderstanding here: in Py3k, all string
literals will be converted from the source code encoding to
Unicode. There are no ambiguities - a Klingon character will still
map to the same ordinal used to create the byte content regardless
of whether the source file is encoded in UTF-8, UTF-16 or
some Klingon charset (are there any ?).
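
A small sketch of that claim, assuming Python 3: the same character is stored with different bytes under different encodings, but decodes to the same ordinal every time. The encode/decode round trip stands in for "source file on disk" and "literal after compilation".

```python
char = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

encodings = ("utf-8", "utf-16", "latin-1", "cp1252")

# What the "source file" would contain on disk under each encoding.
on_disk = {enc: char.encode(enc) for enc in encodings}

# What the literal becomes once decoded from that source encoding.
code_points = {ord(data.decode(enc)) for enc, data in on_disk.items()}
```

The on-disk representations differ, while code_points collapses to a single value.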

Furthermore, by restricting to ASCII you'd also rule out hex escapes
which seem to be the natural choice for presenting binary data in
literals - the Unicode representation would then only be an
implementation detail of the way Python treats string literals
and a user would certainly expect to find e.g. \x88 in the bytes object
if she writes bytes('\x88').

But maybe you have something different in mind... I'm talking
about ways to create bytes() in Py3k using string literals.
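
To make the \x88 case concrete, a sketch assuming Python 3 string semantics (where the literal is a one-character Unicode string):

```python
literal = "\x88"  # one character, code point U+0088

# Latin-1 maps code points 0..255 straight onto byte values.
as_latin1 = literal.encode("latin-1")

# ASCII stops at 0x7F, so the same literal is rejected.
try:
    literal.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# UTF-8 succeeds, but does not give the byte the user typed.
as_utf8 = literal.encode("utf-8")
```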

 While we're at it: I'd suggest that we remove the auto-conversion
 from bytes to Unicode in Py3k and the default encoding along with
 it.
 
 I'm not sure which auto-conversion you're talking about, since there
 is no bytes type yet. If you're talking about the auto-conversion from
 str to unicode: the bytes type should not be assumed to have *any*
 properties that the current str type has, and that includes
 auto-conversion.

I was talking about the automatic conversion of 8-bit strings to
Unicode - which was a key feature to make the introduction of
Unicode less painful, but will no longer be necessary in Py3k.

 In Py3k the standard lib will have to be Unicode compatible
 anyway and string parser markers like s# will have to go away
 as well, so there's not much need for this anymore.

 (Maybe a bit radical, but I guess that's what Py3k is meant for.)
 
 Right.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-13 Thread M.-A. Lemburg
Guido van Rossum wrote:
 One recommendation: for starters, I'd much rather see the bytes type
 standardized without a literal notation. There should be lots of
 ways to create bytes objects from string objects, with specific
 explicit encodings, and those should suffice, at least initially.
 
 I also wonder if having a b"..." literal would just add more confusion
 -- bytes are not characters, but b"..." makes it appear as if they
 are.

Agreed.

Given that we have a source code encoding which would need
to be honored, b"..." doesn't really make all that much sense
(unless you always use hex escapes).

Note that if we drop the string type, all codecs which currently
return strings will have to return bytes. This gives you a pretty
exhaustive way of defining your binary literals in Python :-)

Here's one:

data = "abc".encode("latin-1")

To simplify things we might want to have

bytes("abc")

do the above encoding per default.

 --Guido
 
 On 2/11/06, Bengt Richter [EMAIL PROTECTED] wrote:
 On Fri, 10 Feb 2006 21:35:26 -0800, Guido van Rossum [EMAIL PROTECTED] wrote:

 On Sat, 11 Feb 2006 05:08:09 +0000 (UTC), Neil Schemenauer [EMAIL PROTECTED] wrote:
 The backwards compatibility problems *seem* to be relatively minor.
 I only found one instance of breakage in the standard library.  Note
 that my patch does not change PyObject_Str(); that would break
 massive amounts of code.  Instead, I introduce a new function:
 PyString_New().  I'm not crazy about the name but I couldn't think
 of anything better.
 On 2/10/06, Bengt Richter [EMAIL PROTECTED] wrote:
 Should this not be coordinated with PEP 332?
 Probably.. But that PEP is rather incomplete. Wanna work on fixing that?

 I'd be glad to add my thoughts, but first of course it's Skip's PEP,
 and Martin casts a long shadow when it comes to character coding issues
 that I suspect will have to be considered.

 (E.g., if there is a b'...' literal for bytes, the actual characters of
 the source code itself that the literal is being expressed in could be ascii
 or latin-1 or utf-8 or utf16le a la Microsoft, etc. UIAM, I read that the
 source is at least temporarily normalized to Unicode, and then re-encoded
 (except now for string literals?) per coding cookie or other encoding
 inference. (I may be out of date, gotta catch up.))

 If one way or the other a string literal is in Unicode, then presumably so is
 a byte string b'...' literal -- i.e. internally ub'...' just before
 being turned into bytes.

 Should that then be an internal straight ub'...'.encode('byte') with default
 ascii + escapes for non-ascii and non-printables, to define the full 8 bits
 without encoding error?
 Should unicode be encodable into byte via a specific encoding? E.g.,
 u'abc'.encode('byte','latin1'), to distinguish producing a mutable byte
 string vs an immutable str type as with u'abc'.encode('latin1').
 (But how does this play with str being able to produce unicode? And when do
 these changes happen?)
 I guess I'm getting ahead of myself ;-)
 I guess I'm getting ahead of myself ;-)

 So I would first ask Skip what he'd like to do, and Martin for some hints on
 reading, to avoid going down paths he already knows lead to brick walls ;-)
 And I need to think more about PEP 349.

 I would propose to do the reading they suggest, and edit up a new version of
 pep-0332.txt that anyone could then improve further. I don't know about an
 early deadline. I don't want to over-commit, as time and energies vary. OTOH,
 as you've noticed, I could be spending my time more effectively ;-)

 I changed the thread title, and will wait for some signs from you, Skip,
 Martin, Neil, and I don't know who else might be interested...

 Regards,
 Bengt Richter


 
 
 --
 --Guido van Rossum (home page: http://www.python.org/~guido/)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-13 Thread M.-A. Lemburg
Guido van Rossum wrote:
 On 2/13/06, Phillip J. Eby [EMAIL PROTECTED] wrote:
 At 09:55 AM 2/13/2006 -0800, Guido van Rossum wrote:
 One recommendation: for starters, I'd much rather see the bytes type
 standardized without a literal notation. There should be lots of
 ways to create bytes objects from string objects, with specific
 explicit encodings, and those should suffice, at least initially.

 I also wonder if having a b"..." literal would just add more confusion
 -- bytes are not characters, but b"..." makes it appear as if they
 are.
 Why not just have the constructor be:

  bytes(initializer [,encoding])

 Where initializer must be either an iterable of suitable integers, or a
 unicode/string object.  If the latter (i.e., it's a basestring), the
 encoding argument would then be required.  Then, there's no need for
 special codec support for the bytes type, since you call bytes on the thing
 to be encoded.  And of course, no need for a 'b' literal.
 
 It'd be cruel and unusual punishment though to have to write
 
   bytes("abc", "Latin-1")
 
 I propose that the default encoding (for basestring instances) ought
 to be ascii just like everywhere else. (Meaning, it should really be
 the system default encoding, which defaults to ascii and is
 intentionally hard to change.)

We're talking about Py3k here: "abc" will be a Unicode string,
so why restrict the conversion to 7 bits when you can have 8 bits
without any conversion problems ?

While we're at it: I'd suggest that we remove the auto-conversion
from bytes to Unicode in Py3k and the default encoding along with
it. In Py3k the standard lib will have to be Unicode compatible
anyway and string parser markers like s# will have to go away
as well, so there's not much need for this anymore.

(Maybe a bit radical, but I guess that's what Py3k is meant for.)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification

2006-02-13 Thread M.-A. Lemburg
Tim Peters wrote:
 [Jeremy]
 I added some const to several API functions that take char* but
 typically called by passing string literals.
 
 [Tim]
 If he had _stuck_ to that, we wouldn't be having this discussion :-)
 (that is, nobody passes string literals to
 PyArg_ParseTupleAndKeywords's kws argument).
 
 [Jeremy]
 They are passing arrays of string literals.  In my mind, that was a
 nearly equivalent use case.  I believe the C++ compiler complains
 about passing an array of string literals to char**.
 
 It's the consequences:  nobody complains about tacking const on to a
 former honest-to-God char * argument that was in fact not modified,
 because that's not only helpful for C++ programmers, it's _harmless_
 for all programmers.  For example, nobody could sanely object (and
 nobody did :-)) to adding const to the attribute-name argument in
 PyObject_SetAttrString().  Sticking to that creates no new problems
 for anyone, so that's as far as I ever went.

Well, it broke my C extensions... I now have this in my code:

/* The keyword array changed to const char* in Python 2.5 */
#if PY_VERSION_HEX >= 0x02050000
# define Py_KEYWORDS_STRING_TYPE const char
#else
# define Py_KEYWORDS_STRING_TYPE char
#endif
...
static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
...
if (!PyArg_ParseTupleAndKeywords(args,kws,format,kwslist,&a1))
    goto onError;

The crux is that code which should be portable across Python
versions won't work otherwise: you get either Python 2.5 or
Python 2.x (for x < 5) compatibility, but not both.
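
For anyone unsure how the version constant is laid out: PY_VERSION_HEX (exposed in Python as sys.hexversion) packs major, minor and micro into the top three bytes, so a "2.5 or later" threshold needs the full 0x02050000. A minimal sketch; unpack_hexversion is a helper made up for this example.

```python
import sys

def unpack_hexversion(value):
    """Split a PY_VERSION_HEX-style integer into (major, minor, micro)."""
    return (value >> 24) & 0xFF, (value >> 16) & 0xFF, (value >> 8) & 0xFF

PY25 = 0x02050000  # Python 2.5.0, release level and serial zeroed

# The Python-level equivalent of the #if test in the C snippet.
uses_const_keywords = sys.hexversion >= PY25

running = unpack_hexversion(sys.hexversion)
```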

Not too happy about it, but then compared to the ssize_t
changes and the relative imports PEP, this one is an easy
one to handle.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-13 Thread M.-A. Lemburg
Phillip J. Eby wrote:
 Why not just have the constructor be:

  bytes(initializer [,encoding])

 Where initializer must be either an iterable of suitable integers, or a
 unicode/string object.  If the latter (i.e., it's a basestring), the
 encoding argument would then be required.  Then, there's no need for
 special codec support for the bytes type, since you call bytes on the
 thing to be encoded.  And of course, no need for a 'b' literal.
 It'd be cruel and unusual punishment though to have to write

   bytes("abc", "Latin-1")

 I propose that the default encoding (for basestring instances) ought
 to be ascii just like everywhere else. (Meaning, it should really be
 the system default encoding, which defaults to ascii and is
 intentionally hard to change.)
 We're talking about Py3k here: "abc" will be a Unicode string,
 so why restrict the conversion to 7 bits when you can have 8 bits
 without any conversion problems ?
 
 Actually, I thought we were talking about adding bytes() in 2.5.

Then we'd need to make the ascii encoding assumption
again, just like Guido proposed.

 However, now that you've brought this up, it actually makes perfect sense 
 to just use latin-1 as the effective encoding for both strings and 
 unicode.  In Python 2.x, strings are byte strings by definition, so it's 
 only in 3.0 that an encoding would be required.  And again, latin1 is a 
 reasonable, roundtrippable default encoding.

It is. However, it's not a reasonable assumption for the
default encoding, since there are many encodings out there
that treat the characters 0x80-0xFF specially; hence the choice
of ASCII as the default encoding in Python.

The conversion from Unicode to bytes is different in this
respect, since you are converting from a bigger type to
a smaller one. Choosing latin-1 as default for this
conversion would give you all 8 bits, instead of just 7
bits that ASCII provides.
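
A quick way to see the 7-bit versus 8-bit difference, assuming Python 3; the helper name encodable is made up for the example.

```python
def encodable(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# How many of the first 256 code points each codec can represent.
latin1_count = sum(encodable(chr(i), "latin-1") for i in range(256))
ascii_count = sum(encodable(chr(i), "ascii") for i in range(256))

# Every byte value survives a Latin-1 decode/encode round trip.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin-1").encode("latin-1")

# Latin-1 is still a smaller type than Unicode: U+20AC does not fit.
euro_fits = encodable("\u20ac", "latin-1")
```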

 So, it sounds like making the encoding default to latin-1 would be a 
 reasonably safe approach in both 2.x and 3.x.

Reasonable for bytes(): yes. In general: no.

 While we're at it: I'd suggest that we remove the auto-conversion
 from bytes to Unicode in Py3k and the default encoding along with
 it. In Py3k the standard lib will have to be Unicode compatible
 anyway and string parser markers like s# will have to go away
 as well, so there's not much need for this anymore.
 
 I thought all this was already in the plan for 3.0, but maybe I assume too 
 much.  :)

Wouldn't want to wait for Py4D :-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] release plan for 2.5 ?

2006-02-10 Thread M.-A. Lemburg
Guido van Rossum wrote:
PEP 328: Absolute/Relative Imports
 
 Yes, please.

+0 for adding relative imports. -1 for raising errors for
in-package relative imports using the current notation
in Python 2.6.

See:

http://mail.python.org/pipermail/python-dev/2004-September/048695.html

for a previous discussion.

The PEP still doesn't have any mention of the above discussion or
later follow-ups.

The main argument is that the strategy to make absolute imports
mandatory and offer relative imports as work-around breaks the
possibility to produce packages that work in e.g. Python 2.4 and
2.6, simply because Python 2.4 doesn't support the needed
relative import syntax.

The only strategy left would be to use absolute imports throughout,
which isn't all that bad, except when it comes to relocating a
package or moving a set of misc. modules into a package - which is
not all that uncommon in larger projects, e.g. to group third-party
top-level modules into a package to prevent cluttering up the
top-level namespace or to simply make a clear distinction in
your code that you are relying on a third-party module, e.g.

    from thirdparty import tool

I don't mind having to deal with a warning for these, but don't
want to see this raise an error before Py3k.
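
The two import spellings under discussion can be sketched on a throwaway package. This runs on current Python 3, where both forms are accepted; the package name thirdparty_demo and the helper load_demo are hypothetical. The "from . import tool" form is the PEP 328 syntax that Python 2.4 cannot parse, which is the compatibility concern raised above.

```python
import importlib
import os
import sys
import tempfile

PACKAGE = "thirdparty_demo"  # hypothetical package name

FILES = {
    "__init__.py": "",
    "tool.py": "NAME = 'tool'\n",
    # Absolute import: spells out the package name, so it must be edited
    # whenever the package is renamed or relocated.
    "absolute_user.py": "from thirdparty_demo import tool\n",
    # Explicit relative import (PEP 328): independent of the package name.
    "relative_user.py": "from . import tool\n",
}


def load_demo():
    """Write the demo package to a temp dir and import both variants."""
    with tempfile.TemporaryDirectory() as root:
        pkg_dir = os.path.join(root, PACKAGE)
        os.mkdir(pkg_dir)
        for filename, body in FILES.items():
            with open(os.path.join(pkg_dir, filename), "w") as handle:
                handle.write(body)
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            absolute = importlib.import_module(PACKAGE + ".absolute_user")
            relative = importlib.import_module(PACKAGE + ".relative_user")
        finally:
            sys.path.remove(root)
        return absolute.tool.NAME, relative.tool.NAME


print(load_demo())
```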

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Path inherits from string

2006-01-26 Thread M.-A. Lemburg
BJörn Lindqvist wrote:
 This seems to be the only really major issue with the PEP. Everything
 else is negotiable, IMHO. But the string inheritance seems to be such a
 critical issue it deserves its own thread. I have tried to address all
 criticism of it here:

I don't see why this is critical for the success of the Path
object. I agree with Thomas that interfaces should be made
compatible to Path object.

Please note that inheritance from string will cause the C type
checks of the form PyString_Check(obj) to return true.
C code will then assume that it has an object which is
compatible with the string C API, which instances aren't.

If the C code then uses the C API string macros, you
get segfaults - and lots of old code does, since there
was no way to inherit from a string type at the time.

In fact, you're lucky that open() doesn't give you segfaults,
since the code used to fetch the string argument does exactly
that...
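
As a rough Python-level analogy for the two kinds of C type check (a sketch only, not the C code itself): isinstance() behaves like PyString_Check() and accepts subclasses, while an identity test on the type behaves like PyString_CheckExact(). The Path class and both helper names are hypothetical.

```python
class Path(str):
    """Hypothetical str subclass standing in for the proposed Path type."""


def check(obj):
    # Analogue of PyString_Check(): true for str and any subclass of it
    return isinstance(obj, str)


def check_exact(obj):
    # Analogue of PyString_CheckExact(): true only for str itself
    return type(obj) is str


p = Path("/tmp/demo.txt")
print(check(p), check_exact(p))
print(check("plain"), check_exact("plain"))
```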

...

 And there is absolutely nothing that can be done about that. As far as
 I can tell, the string inheritance is either livable with or is a
 showstopper. If it is the latter, then:
 
 1. Someone has to make the required modifications to the Python
core.

Right.

Plus convert a few PyString_Check()s to PyString_CheckExact()...

 2. Create a Path class (or adapt the existing one so) that does
not inherit from string.
 3. Release it and wait a few years hoping for it to gain
widespread acceptance in the Python community.
 4. Make a PEP (or adapt this PEP) that gets accepted.
 
 This scenario makes me sad because it basically means that there will
 never be a Path module in Python, at least not during my lifetime. :(

Why not ? We've added Unicode support to at least some
file I/O APIs - adding support for instances which
support the string interface shouldn't be all that
difficult :-)

BTW, if you're fine with this API:

class File:
    def __unicode__(self):
        return u"test.txt"

then the required change is minimal: we'd just need to
use PyObject_Unicode() in getargs.c:837 and you should
be set.
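
The idea of accepting any object that offers a conversion hook can be sketched in Python 3, where __unicode__ no longer exists and __str__ plays that role (an assumption of this sketch). The function as_filename is hypothetical and only mimics the spirit of the proposed getargs.c change; it is not what the C code does.

```python
class File:
    """Hypothetical non-string object that knows its own file name."""

    def __init__(self, name):
        self._name = name

    def __str__(self):
        # Plays the role that __unicode__ has in the snippet above
        return self._name


def as_filename(obj):
    """Accept a str directly, or any object that defines its own __str__."""
    if isinstance(obj, str):
        return obj
    if type(obj).__str__ is object.__str__:
        # No custom hook: refuse rather than use the default repr-like text
        raise TypeError("%r does not provide a file name" % (obj,))
    return str(obj)


print(as_filename(File("test.txt")))
```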

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Path inherits from string

2006-01-26 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 Please note that inheritance from string will cause the C type
 checks of the form PyString_Check(obj) to return true.
 C code will then assume that it has an object which is
 compatible to string C API which instances aren't.
 
 Oh, sure they are. Types inheriting from str have the same
 layout as str, and C code assuming that layout will work fine
 with them. Inheritance works (saying inheritance *just* works
 would deny the many fine details that have been engineered to
 actually make it work).

You're right, I forgot about how the .__new__() works on
new-style classes and that extra space is allocated
appended to the base type object for the extra
instance features.

From PEP 253:

class C(B): pass

...

In any case, the work for creating C is done by M's tp_new() slot.
It allocates space for an extended type structure, containing:
the type object; the auxiliary structures (as_sequence etc.); the
string object containing the type name (to ensure that this object
isn't deallocated while the type object is still referencing it); and
some auxiliary storage (to be described later).  It initializes this
storage to zeros except for a few crucial slots (for example, tp_name
is set to point to the type name) and then sets the tp_base slot to
point to B.



Sorry for the FUD,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New PEP: Using ssize_t as the index type

2006-01-19 Thread M.-A. Lemburg
Neal Norwitz wrote:
 On 1/10/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 We'd also have to make sure that old extensions don't
 just import with a warning, since the change will introduce
 buffer overflows and seg faults in extensions that are not
 aware of the change.
 
 I agree that on 64-bit platforms we should prevent the import.  In the
 past we only provided a warning and the users were on their own.  This
 is different.
 
 If you read my massive checkin to check the return results of
 Py_InitModule*(), you'll realize this isn't as simple as just failing
 in Py_InitModule*().  I was hoping to just modify Py_InitModule4() in
 Python/modsupport.c to fail and return NULL.  That doesn't seem
 practical given that we didn't check return results.  We will just
 crash the interpreter with standard python 2.4 modules.
 
 ISTM we need to modify _PyImport_LoadDynamicModule() in
 Python/importdl.c before calling the init function (line 56, (*p)())
 to check for some magic symbol that is defined only when compiling 2.5
 and above.  For example we could add a static int  _64_bit_clean = 1;
 in modsupport.h.  Without some trickery we will get this defined in
 every .o file though, not just modules.
 
 Other ideas?

We could explicitly break binary compatibility for Python 2.5
on 64-bit platforms, by changing the name of an often used
API, e.g. the Py_InitModule*() APIs.

This is how Unicode does it - we map the various APIs to
either ...UCS2 or ...UCS4, so that you cannot import an
extension compiled for e.g. UCS2 into a Python interpreter
compiled for UCS4. If we didn't, you'd get seg faults and
buffer overflows the same way you would with the ssize_t
change on 64-bit platforms.
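
The renaming trick can be simulated in a few lines of Python: the interpreter exports its init function under a name that encodes the index width, and an extension built against a different width fails at symbol lookup instead of crashing later. All names here (InitModule_32, InitModule_64, load_extension) are hypothetical stand-ins for this sketch, not real C API symbols.

```python
def interpreter_exports(index_bits):
    """Symbols exported by a hypothetical interpreter build.

    The init function's name carries the index width, so builds with
    different widths export different names.
    """
    def init_module(name):
        return "initialised %s" % name

    return {"InitModule_%d" % index_bits: init_module}


def load_extension(name, built_for_bits, exports):
    """Resolve the init symbol the extension was compiled against."""
    wanted = "InitModule_%d" % built_for_bits
    try:
        init = exports[wanted]
    except KeyError:
        # Mirrors an unresolved symbol at dynamic-link time
        raise ImportError(
            "%s needs %s, which this interpreter does not export"
            % (name, wanted))
    return init(name)


print(load_extension("demo", 64, interpreter_exports(64)))
```

The mismatch surfaces as a clean import failure, which is the behaviour being asked for in place of a warning followed by a seg fault.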

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] basenumber redux

2006-01-18 Thread M.-A. Lemburg
Alex Martelli wrote:
 On 1/17/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 Alex, I think you're missing a point here: what you are looking
 for is an interface, not a base class - simply because the
 
 I expect numbers to support arithmetic operators, &c -- no need for
 basenumber to spell this out, i.e., be an interface.

If at all, basenumber would be an abstract class. However,
unlike for basestring, the interface (which methods it
supports, including operator methods) would not be well-
defined.

 If you look at the Python C API, you'll find that a number
 is actually never tested.

 There being no way to generically test for a number, that's unsurprising.

Hmm, I lost you there. If it's unsurprising that there's
no check for a number, then why would you want a
basenumber ?

 The tests always ask for either
 integers or floats.
 
 But this doesn't apply to the Python Standard Library, for example see
 line 1348 of imaplib.py: if isinstance(date_time, (int, float)):.

Why not use the functions I added to my previous mail ?

 The addition of a basenumber base class won't make these any
 simpler.
 
 Being able to change imaplib to use basenumber instead of (int, float)
 won't make it SIMPLER, but it will surely make it BETTER -- why should
 a long be rejected, or a Decimal,
 for that matter?  Similarly, on line 1352 it should use the existing
 basestring, though it now uses str (this function IS weird -- if it
 finds date_time to be of an unknown TYPE it raises a *ValueError*
 rather than a *TypeError* -- ah well).

Again, why not use floatnumber() instead, which takes care
of all the details behind finding out whether an object
should be considered a number and even converts it to
a float for you ?

Why try to introduce a low-level feature when a higher
level solution is readily available and more usable.

You will rarely really care for the type of an object
if all you're interested in is the float value of an
object (or the integer value).
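
A small sketch of the contrast, in current Python 3 (an assumption; the thread concerns 2.x): a type test like the one quoted from imaplib rejects numeric types it does not list, while asking for the float value accepts anything convertible.

```python
from decimal import Decimal
from fractions import Fraction

values = [3, 2.5, Decimal("2.5"), Fraction(5, 2)]

# Type test in the style of the quoted imaplib line
accepted_by_type = [isinstance(v, (int, float)) for v in values]

# Conversion approach: only the float value matters
converted = [float(v) for v in values]

print(accepted_by_type)
print(converted)
```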

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] basenumber redux

2006-01-17 Thread M.-A. Lemburg
Alex, I think you're missing a point here: what you are looking
for is an interface, not a base class - simply because the
assumptions you make when finding a KnownNumberTypes instance
are only related to an interface you expect them to provide.

A common case class won't really help all that much with this,
since the implementations of the different types will vary a
lot (unlike, for example, strings and Unicode, which implement
a very common interface) and not necessarily provide a common
interface.

If you look at the Python C API, you'll find that a number
is actually never tested. The tests always ask for either
integers or floats.

The addition of a basenumber base class won't make these any
simpler.

Here's a snippet which probably does what you're looking for
using Python's natural way of hooking up to an implicit
interface:

import UserString

STRING_TYPES = (basestring, UserString.UserString)

def floatnumber(obj):
    if isinstance(obj, STRING_TYPES):
        raise TypeError('strings are not numbers')

    # Convert to a float
    try:
        return float(obj)
    except (AttributeError, TypeError, ValueError):
        raise TypeError('%r is not a float' % obj)

def intnumber(obj):
    if isinstance(obj, STRING_TYPES):
        raise TypeError('strings are not numbers')

    # Convert to an integer
    try:
        value = int(obj)
    except (AttributeError, TypeError, ValueError):
        raise TypeError('%r is not an integer' % obj)

    # Double check so that we don't lose precision
    try:
        floatvalue = floatnumber(obj)
    except TypeError:
        return value
    if floatvalue != value:
        raise TypeError('%r is not an integer' % obj)
    return value
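
For readers on a current interpreter, here is a hedged Python 3 adaptation of the same idea with a few sample calls. It assumes that str, bytes and collections.UserString stand in for the Python 2 string types; the behaviour is otherwise meant to follow the snippet above.

```python
from collections import UserString
from decimal import Decimal
from fractions import Fraction

# Assumption: these replace basestring / UserString.UserString from Python 2
STRING_TYPES = (str, bytes, UserString)


def floatnumber(obj):
    """Return obj as a float, refusing strings."""
    if isinstance(obj, STRING_TYPES):
        raise TypeError('strings are not numbers')
    try:
        return float(obj)
    except (AttributeError, TypeError, ValueError):
        raise TypeError('%r is not a float' % (obj,))


def intnumber(obj):
    """Return obj as an int, refusing strings and lossy conversions."""
    if isinstance(obj, STRING_TYPES):
        raise TypeError('strings are not numbers')
    try:
        value = int(obj)
    except (AttributeError, TypeError, ValueError):
        raise TypeError('%r is not an integer' % (obj,))
    # Compare against the float value so 3.5 is not silently cut to 3
    try:
        floatvalue = floatnumber(obj)
    except TypeError:
        return value
    if floatvalue != value:
        raise TypeError('%r is not an integer' % (obj,))
    return value


print(floatnumber(Decimal("1.5")))
print(intnumber(Fraction(7, 1)))
```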

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] str with base

2006-01-17 Thread M.-A. Lemburg
Alex Martelli wrote:
 Is it finally time in Python 2.5 to allow the obvious use of, say,  
 str(5,2) to give '101', just the converse of the way int('101',2)
 gives 5?  I'm not sure why str has never allowed this obvious use --  
 any bright beginner assumes it's there and it's awkward to explain  
 why it's not!-).  I'll be happy to propose a patch if the BDFL  
 blesses this, but I don't even think it's worth a PEP... it's an  
 inexplicable though long-standing omission (given the argumentative  
 nature of this crowd I know I'll get pushback, but I still hope the  
 BDFL can Pronounce about it anyway;-).

Hmm, how about this:

str(obj, ifisunicode_decode_using_encoding='ascii',
 ifisinteger_use_base=10,
 ifisfile_open_and_read_it=False,
 isdecimal_use_precision=10,
 ismoney_use_currency='EUR',
 isdatetime_use_format='%c')

and so on ?!

Or even better:

str(obj, **kws) and then call obj.__str__(**kws) instead of
just obj.__str__() ?!


Seriously, shouldn't these more specific "convert to a string"
functions be left to specific object methods or helper
functions ?
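
A helper function of the kind suggested here might look like the following minimal sketch. The name to_base and the digit alphabet are choices made for this illustration, not an existing API; the supported range 2 to 36 mirrors what int() accepts as an explicit base.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # '0'-'9' then 'a'-'z'


def to_base(number, base):
    """Return the integer `number` written in `base` (2 to 36)."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if number == 0:
        return "0"
    sign = "-" if number < 0 else ""
    number = abs(number)
    digits = []
    while number:
        # Peel off the least significant digit each round
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])
    return sign + "".join(reversed(digits))


print(to_base(5, 2))
```

The result can be fed back through int(text, base) to recover the original value.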

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] r42015 - peps/trunk

2006-01-13 Thread M.-A. Lemburg
David Goodger wrote:
 On 1/12/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 I know, but I wouldn't expect SVN to query other servers
 than svn.python.org inside the standard repository
 directories.

 AFAIK, this is a first in the Python repository.
 
 True, and worth discussing.  Luckily the PEPs repository is a
 low-traffic, non-core area, so it's not urgent.

True.

 Not sure if it's such a good idea.
 
 From my point of view, it's a time-saver.  No more manual updates!
 They were a pain, and rarely got done.
 
 Branching and tagging
 doesn't work with external resources in Subversion,
 so things can become inconsistent.
 
 How so?  The svn:externals property is treated the same as any
 other, and is copied along with everything else by svn copy.

Right, but Subversion doesn't store the revision of the
external resource at the time you made the copy, so
when you checkout the branch or tag, you'll still get
the most recent version of the external resource
which can break things, e.g. say you upgrade docutils
to Python 2.5, then a checkout of a tag built at
a time when Python 2.3 was current would probably
no longer work (with Python 2.3).

At least that's how things were documented the last
time I had a look at svn:externals - could be that they
modified the behavior to make it a little more clever
since.

 Also, what if you break the code in the berlios repo
 or the server is not reachable ?
 
 If the code in the repo ever breaks, it will be fixed within minutes.
 The server being unreachable is only a problem for initial checkouts;
 updates will just keep the code that was already there.  In my
 experience, the berlios.de SVN server has rarely been unreachable.  If
 there's a problem, we can always back out the svn:externals property
 and install the package.
 
 That having been said, I do see the value of installing a packaged
 release.  We just released Docutils 0.4 a few days ago; I could
 install that statically.  An alternative is to use svn:externals to
 link to a specific revision (via the -r option); I could link to the
 0.4 release revision.  This would solve the repo-breakage issue, but
 not the server-unreachability issue.

Interesting. Last time I looked these external revisions
were not possible (or I overread the possibility).

 I don't think these issues are major, but I wouldn't mind terribly if
 we decide to go with a static install.

There are two other nits with the external reference:

* connecting to a server that people probably don't
  recognize as being an official Python source and thus
  probably don't trust right away (though this is well
  hidden by subversion, firewall software will alarm the
  user)

* being able to download the complete repository of Python
  with all the history - since the external resource is not
  part of the history, users would have to track down
  the repository of the external resource (and this might
  not be readily available for download)

 A release copy in the
 external/ dir would solve all these issues.
 
 That's basically what we had before (except no explicit external
 directory), but it was always out of date.  The docutils package was
 directly in the trunk, for ease of import by the pep2html.py front end
 (easy to work around, but why bother?).

I guess that only PSF license covered software should
go into the main directories of the repository. Everything
else should first go in externals/ and then get copied
into the main trunks. It's only one additional step,
but will help a lot in the future if we ever have to
track the copyright status of things in the main
trunks.

This is not an issue for docutils, I guess, since it's
public domain, but then again, there are lawyers who
believe that there's no such thing as public domain...

http://www.linuxjournal.com/article/6225

 Another minor nit with the old way: updates polluted the
 python-checkins list.

I guess if you just update to new release versions, then
this will be minimal. I, for one, don't mind at all.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Py_ssize_t output parameters (Was: [Python-checkins] r41971 ...)

2006-01-13 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 What if x64 has a 64-bit value ? How do you catch
 and process the truncation error ?
 
 We were *both* discussing a scenario where no sizes
 exceed 2**31, right?

Right, but I don't see the point of each and every
extension having to go through these hoops when you
could add support for these checks (including error
reporting) to the few APIs in question, in particular
the PyArg_ParseTuple() API.

 Under such a scenario, this just won't happen.
 
 OTOH, if you were discussing a scenario where sizes
 might exceed 2**31, then I don't see why you are worried
 about Py_ssize_t* parameters alone: Even
 
   PyString_Size()
 
 might (of course!) return a value > 2**31 - so it
 is not just output parameters, but also return values.

Indeed - which is why I'm going to convert our tools to
Py_ssize_t throughout.

I don't expect this to happen any time soon for the ten or
twenty other 3rd party extensions we regularly use and this
would prevent an upgrade to Python 2.5.

 For more information, please read Conversion guidelines
 in
 
 http://www.python.org/peps/pep-0353.html

BTW, the open issue should be reworded:

In particular, functions that currently take int* output
parameters should continue to do so. New functions should be
revised to enable Py_ssize_t* output arguments and preserve
backwards compatibility.

(This also includes the strategy to be used, so you can
remove the note on strategy)

Please also add a comment on the fact that extensions
which haven't been recompiled for the Python 2.5 API
will not get imported (the API_LEVEL check should consistently
fail for these instead of just issuing a warning).

 That is not necessary. Can you give an example of a module
 where you think it is necessary?

 If you want to port the extension to Py_ssize_t, this
 is essentially the case.

 You might want to take the unicodeobject.c file as
 example.
 
 unicodeobject.c is not an extension. We were discussing
 existing extension modules.
 
 We could use the type flags for these, much like we
 do for the new style numbers.
 
 Would you like to write a specification for that?

Sure, if there's hope to get this into the code.

 If you don't like the macro approach, why not simply
 leave the two separate sets of APIs visible.
 
 To simplify the porting.

Sorry, I lost you there:

I'm saying that two sets of APIs (one for int*, one
for Py_ssize_t*) will make porting easier since
updated code will only have to rename the APIs
used. Furthermore, it will allow writing code
that easily works in Python 2.1-2.4 as well as
Python 2.5.

 All Py_ssize_t aware and compatible extensions
 would use the new APIs instead. The old-style APIs
 should then take care of the proper down-casting and
 error reporting.
 
 That is not possible. Some API does not support
 error reporting (like PyString_Size()). So callers
 don't expect it to fail.

I'm talking about the few APIs that use output parameters.
Those do support error reporting.
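
The "two sets of APIs" idea can be sketched as a pair of entry points, with the old-style one delegating to the new one and reporting sizes it cannot represent. Everything here is a Python stand-in under stated assumptions: get_size and get_size_int are hypothetical names, and INT_MAX assumes a 32-bit C int.

```python
INT_MAX = 2**31 - 1  # assumes a 32-bit C int


def get_size(data):
    """New-style entry point: returns the full-width size."""
    return len(data)


def get_size_int(data):
    """Old-style entry point: same result, limited to what an int holds."""
    size = get_size(data)
    if size > INT_MAX:
        # The old API reports the problem instead of truncating silently
        raise OverflowError("size does not fit in a C int")
    return size


print(get_size_int(b"abc"))
```

Unported callers keep the old name and get an error on oversized values; ported callers switch to the new name and see the full range.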

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Py_ssize_t output parameters (Was: [Python-checkins] r41971 ...)

2006-01-12 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 ... and then the type change of that variable propagates
 throughout the extension.
 
 That depends on the usage of the code. If the variable
 is passed by value, no further changes are necessary.
 If a pointer to the variable is passed, you could replace
 it with
 
   Py_ssize_t x64; int x;
   foo(&x64);
   x = x64;
 
 Then use x as you did with the original code.

What if x64 has a 64-bit value ? How do you catch
and process the truncation error ?

To do this down-casting correctly, you'd need to write
code that does bounds checks and integrate it into the
function's error handling.
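
The truncation problem can be demonstrated from Python with ctypes, which performs no overflow checking on its integer types. This is a sketch under the assumption of a 32-bit int; unchecked_downcast and checked_downcast are hypothetical helper names, and the first one only approximates what a plain C assignment typically does.

```python
import ctypes

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def unchecked_downcast(value):
    """Keep only the low 32 bits, as a plain `x = x64;` typically would."""
    return ctypes.c_int32(value).value


def checked_downcast(value):
    """Return value unchanged, or raise if a 32-bit int cannot hold it."""
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError("%d does not fit in a 32-bit int" % value)
    return value


print(unchecked_downcast(2**32 + 5))  # silently wrong
print(checked_downcast(5))            # small values pass through
```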

 You basically end up having to convert the whole extension
 to Py_ssize_t.
 
 That is not necessary. Can you give an example of a module
 where you think it is necessary?

If you want to port the extension to Py_ssize_t, this
is essentially the case.

You might want to take the unicodeobject.c file as
example.

 Don't get me wrong: I don't mind doing this for the eGenix
 extensions, but I'm worried about the gazillion other
 useful extensions out there which probably won't get
 upgraded in time to be used with Python 2.5.
 
 I agree that it is not acceptable to require immediate
 wholesale updates of every module. However, I do
 believe that the number of modules that need any change
 at all is small, and that those modules can be modified
 with minimal effort to get them working again,
 backwards-compatible (i.e. with the only exception that
 they would fail if indices run above 2**31).
 
 I think all it takes is a set of new APIs for functions
 that use Py_ssize_t as output parameter and which are
 mapped to the regular API names if and only if the
 extension #defines PY_SSIZE_T_CLEAN (or some other
 capability flag).
 
 That is not enough. You also need to deal with the
 function pointers that change.

We could use the type flags for these, much like we
do for the new style numbers.

 Also, others have rejected/expressed dislike of the
 PY_SSIZE_T_CLEAN macro already, so they would probably
 dislike further hackishness in that direction.

That's easy to say... by not providing an easy way
to upgrade extensions, you basically just move the
requirement for hacks into the extensions.

I wouldn't consider that a good solution.

If you don't like the macro approach, why not simply
leave the two separate sets of APIs visible. Old
extensions would then use and link against the existing
APIs. All Py_ssize_t aware and compatible extensions
would use the new APIs instead. The old-style APIs
should then take care of the proper down-casting and
error reporting.

In Py3K we could then remove the old-style ones
and rename the new ones to the old names. Porting
an already Py_ssize_t compatible extension to the
renamed new style APIs would then become a simple
task of search&replace.

 Anyway, I have installed the PEP onsite, and
 added an Open Issues section, recording your
 comments.

Thanks.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New PEP: Using ssize_t as the index type

2006-01-12 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 If it were this easy, I wouldn't have objections. But it's
 not.
 
 It is so easy. Can you show me an example where it isn't?
 
 The variables you use with these APIs tend to propagate
 through the extension, you use them in other calls,
 make assignments, etc.
 
 They only propagate if you make them. You don't have to,
 if you don't want to.
 
 If you implement extension types, you end up having to
 convert all the length related struct variables to
 Py_ssize_t.
 
 Only if you want to. If not, the module will work
 (nearly) unmodified. Of course, it will be limited
 to 32-bit indices.

See my other reply on this topic.

 All this is fine, but it's also a lot of work which
 can be made easier. Recompiling an extension is well
 within range of many Python users, manually checking,
 fixing and porting it to the new API is certainly not.
 
 Sure. However, most users will compile it on 32-bit
 systems. If they find they cannot get it to work on
 a 64-bit system, they should ask the author for help,
 or just use it in 32-bit mode (as 64-bit mode won't
 gain them anything, anyway).

I wonder how you are going to import a 32-bit
extension into a 64-bit binary of Python.
It simply doesn't work.
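
Whether a given interpreter is a 32-bit or 64-bit build, and hence which extension binaries it can load, can be checked from Python itself. A minimal sketch for current Python 3; pointer_bits is a hypothetical helper name.

```python
import struct
import sys


def pointer_bits():
    """Width in bits of a C pointer in the running interpreter."""
    return struct.calcsize("P") * 8


print(pointer_bits())
print(sys.maxsize > 2**32)  # True on 64-bit builds
```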

 The set of functions that will require Py_ssize_t
 is getting larger in your branch which is why I started
 this discussion.
 
 How so? I did not add a single function that has
 int* output values, AFAICT.

No, but there are quite a few APIs with Py_ssize_t*
output values.

 I am talking about the entirety of these functions,
 and claim that they are rarely used (including the
 Unicode and buffer APIs).

I wouldn't say that PyString_AsStringAndSize() is rarely
used and neither is PyArg_ParseTuple().

I agree that other APIs are certainly more domain
specific and can be dealt with in the extension, but
those two APIs need special care and so do the type
slot functions.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] r42015 - peps/trunk

2006-01-12 Thread M.-A. Lemburg
David Goodger wrote:
 Author: david.goodger
 Date: Thu Jan 12 04:33:16 2006
 New Revision: 42015

 Modified:
peps/trunk/   (props changed)
 Log:
 add external link to Docutils public repo -- always up-to-date
 
 I just deleted the static copy of the docutils directory from the peps
 repository, and added in an external link (svn:externals 'docutils
 svn://svn.berlios.de/docutils/trunk/docutils/docutils').  This way, the
 external code should always be up-to-date.  You may need to manually
 delete your
 peps/trunk/docutils directory to get this to work though -- SVN leaves
 subdirectories behind which hinder the externals update.
 
 Please let me know if this causes any problems.  Thanks.

Question: why do we need docutils in the peps/trunk/ directory
in the first place ?

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] r42015 - peps/trunk

2006-01-12 Thread M.-A. Lemburg
David Goodger wrote:
 [M.-A. Lemburg]
 Question: why do we need docutils in the peps/trunk/ directory
 in the first place ?
 
 It's a convenience, so that a separate Docutils download & install
 (& maintain) isn't necessary for those who process reST-format PEPs.

Hmm, shouldn't these things be tracked under external/ ?!

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] r42015 - peps/trunk

2006-01-12 Thread M.-A. Lemburg
David Goodger wrote:
 On 1/12/06, M.-A. Lemburg [EMAIL PROTECTED] wrote:
 Hmm, shouldn't these things be tracked under external/ ?!
 
 What do you mean exactly? A new external directory?

Yes.

 SVN provides a built-in mechanism for tracking external
 repositories, via the svn:externals property, and that's
 what I used.

I know, but I wouldn't expect SVN to query other servers
than svn.python.org inside the standard repository
directories.

AFAIK, this is a first in the Python repository. Not
sure if it's such a good idea. Branching and tagging
don't work with external resources in Subversion,
so things can become inconsistent.

Also, what if you break the code in the berlios repo
or the server is not reachable ? A release copy in the
external/ dir would solve all these issues.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New PEP: Using ssize_t as the index type

2006-01-10 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 Armin Rigo wrote:
 This would do the right thing for <= 2.4, using ints everywhere; and the
 Python.h version 2.5 would detect the #define and assume it's a
 2.5-compatible module, so it would override the #define with the real
 thing *and* turn on the ssize_t interpretation of the '#' format
 character.
 
 This would be very similar to the PY_SIZE_T_CLEAN approach, except
 that it would also help to detect spelling mistakes.
 
 From an implementation point of view, the real challenge is to
 give PyArg_ParseTuple a different meaning; I do this by #defining
 it to PyArg_ParseTupleSsize_t (to preserve binary compatibility
 for the original interpretation of ParseTuple). Putting
 additional flags arguments in the entire code is also quite
 hackish.
 
 I still don't like the idea of a magic #define that changes the behavior
 of '#include <Python.h>', but I admit I don't find any better solution.
 I suppose I'll just blame C.
 
 More precisely, the printf style of function calling, and varargs
 functions. ISO C is pretty type safe, but with varargs functions,
 you lose that completely.
 
 I still hope I can write a C parser some day that does
 ParseTuple/BuildValue checking.

At least gcc does check the parameter types and generates
warnings for wrong vararg parameters passed to printf() et al.

We definitely need a clean solution for this. Otherwise,
authors who want to support more than just the latest Python
release will run into deep trouble.

Note that the branch also has other changes of output parameter
types which will cause problems in extensions (not only extensions
implementing the sequence protocol as the PEP suggests, but
also ones using it as well as extensions implementing or
using the buffer protocol and the slicing protocol). These are not
as easily avoidable as the PyArg_ParseTuple() problem which
could be dealt with by a new parser marker for ssize_t
lengths (with the '#' marker casting the argument to an int).

I don't see a good solution for these other than introducing
a set of new APIs (and maybe doing some macro magic as Martin
did for PyArg_ParseTuple()). Due to the fact that changes in
the types of output parameters require changes in the
extension variable type layout itself, they introduce a large
number of type changes in the extension and make writing
backwards compatible extensions harder than necessary.

Furthermore, all extensions for Python 2.4 would have to be
ported to the new Python API and porting is not going to
be a simple recompile, but will require C skills.

We'd also have to make sure that old extensions don't
just import with a warning, since the change will introduce
buffer overflows and seg faults in extensions that are not
aware of the change.
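
The overflow risk described above can be sketched without compiling any C. The snippet below is only a simulation, assuming a little-endian layout in which a 4-byte int is directly followed by another 4-byte variable, and an 8-byte Py_ssize_t; the names frame, length and neighbour are made up for illustration and do not come from the branch.

```python
import struct

# Simulated stack frame of an old extension: two adjacent 4-byte ints.
# (Hypothetical layout; a real compiler may arrange variables differently.)
frame = bytearray(struct.pack("<ii", 0, 1234))   # length = 0, neighbour = 1234

# The extension passes the address of `length` (offset 0), expecting the
# API to store a 4-byte int there.  An API that stores an 8-byte
# Py_ssize_t instead writes 4 bytes past the end of `length`.
struct.pack_into("<q", frame, 0, 5)

length, neighbour = struct.unpack("<ii", frame)
print(length, neighbour)
```

In this simulation length reads back as 5, so the call appears to work, while neighbour has silently changed from 1234 to 0. That is the kind of damage a mere import warning would not prevent.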

Martin, please add the above points to the PEP. I'd also
like to see it published, because it's hard to track a PEP
in the mailing list.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New PEP: Using ssize_t as the index type

2006-01-10 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 I don't see a good solution for these other than introducing
 a set of new APIs (and maybe doing some macro magic as Martin
 did for PyArg_ParseTuple()). Due to the fact that changes in
 the types of output parameters require changes in the
 extension variable type layout itself, they introduce a large
 number of type changes in the extension and make writing
 backwards compatible extensions harder than necessary.
 
 That's not true. It is very easy to write extensions that
 receive such values and are still backwards-compatible.
 
 Suppose you had
 
   int pos = 0;
   PyObject *k, *v;
 
   PyDict_Next(dict, &pos, &k, &v);
 
 You just change this to
 
   /* beginning of file */
   #if PY_VERSION_HEX < 0x02050000
   typedef int Py_ssize_t;
   #endif
 
   /* later */
   Py_ssize_t pos = 0;
   PyObject *k, *v;
 
   PyDict_Next(dict, &pos, &k, &v);
 
 That's it!

If it were this easy, I wouldn't have objections. But it's
not.

The variables you use with these APIs tend to propagate
through the extension, you use them in other calls,
make assignments, etc.

If you implement extension types, you end up having to
convert all the length related struct variables to
Py_ssize_t.

If you're writing against 3rd party APIs which don't
use ssize_t or size_t, you have to convert Py_ssize_t
to int where necessary.
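
To illustrate the conversion step, here is a small sketch of why an unchecked narrowing is risky and what a checked one might look like. It assumes a 32-bit C int; ssize_to_int is a hypothetical helper name, not part of any Python API, and ctypes is used only to mimic C truncation.

```python
import ctypes

INT_MIN, INT_MAX = -2**31, 2**31 - 1   # assuming a 32-bit C int


def ssize_to_int(n):
    """Narrow a Py_ssize_t-sized value to a C int, refusing to truncate."""
    if not INT_MIN <= n <= INT_MAX:
        raise OverflowError("length %d does not fit in a C int" % n)
    return ctypes.c_int32(n).value


# An unchecked cast simply keeps the low 32 bits of the value:
truncated = ctypes.c_int32(2**32 + 7).value
print(truncated)
```

Every place where a Py_ssize_t flows into an int-based third-party API would need a check of this kind, which is part of why the port is more than a recompile.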

All this is fine, but it's also a lot of work which
can be made easier. Recompiling an extension is well
within range of many Python users, manually checking,
fixing and porting it to the new API is certainly not.

 Furthermore, all extensions for Python 2.4 would have to be
 ported to the new Python API and porting is not going to
 be a simple recompile, but will require C skills.
 
 Not all extensions. Only those that call functions that expect
 int* output parameters - which is fairly uncommon.

The set of functions that will require Py_ssize_t
is getting larger in your branch which is why I started
this discussion.

In the first checkin you only had the rarely used slice
APIs converted. In the meantime the buffer APIs, the
Unicode APIs and others have been added to the list.

These APIs are used a lot more often than the slice
APIs.

I'm not saying that it's a bad idea to adjust these
to Py_ssize_t, it's just the backwards incompatible
way this is done which bothers me.

 Martin, please add the above points to the PEP. I'd also
 like to see it published, because it's hard to track a PEP
 in the mailing list.
 
 It's very difficult to get a PEP number assigned. I wrote
 [EMAIL PROTECTED] with no response.

Would it be possible to host the PEP in the python.org
wiki or maybe in the sandbox on svn.python.org ?

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] [Doc-SIG] that library reference, again

2005-12-30 Thread M.-A. Lemburg
I haven't followed the thread, so maybe I'm repeating things.

Has anyone considered using e.g. MediaWiki (the wiki used for
Wikipedia) for Python documentation ?

I'm asking because this wiki has proven to be ideally suited
for complex documentation tasks and offers many features
which would make Python documentation a lot easier and more
accessible:

* people wouldn't have to learn LaTeX to commit doc-patches
* it's easy to monitor and revert changes, discuss changes
* there's version history available
* new docs would be instantly available on the web
* builtin search facility, categories and all the other nifty
  wiki stuff
* it's one of the more popular wikis around and due to Wikipedia
  it's here to stay
* conversion to XML and DocBook is possible, providing
  entry points for conversion to other formats (including
  LaTeX)
* more following means more tools (over time)

Just a thought.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] NotImplemented reaching top-level

2005-12-29 Thread M.-A. Lemburg
Hi Armin,

 On Wed, Dec 28, 2005 at 09:56:43PM +0100, M.-A. Lemburg wrote:
 >>> d += 1.2
 >>> d
 NotImplemented
 The PEP documenting the coercion logic has complete tables
 for what should happen:
 
 Well, '+=' does not invoke coercion at all, with new-style classes like
 Decimal.

True, it doesn't invoke coercion in the sense that a coercion
method is called, but the mechanism described in the PEP is
still used via PyNumber_InPlaceAdd().

 Looking at the code in abstract.c the above problem appears
 to be related to the special cases applied to += and *=
 in case both operands cannot deal with the type combination.

 In such a case, a check is done whether the operation could
 be interpreted as sequence operation (concat or repeat) and
 then delegated to the appropriate handlers.
 
 Indeed.  The bug was caused by this delegation, which (prior to my
 patch) would also return a Py_NotImplemented that would leak through
 abstract.c.  My patch is to remove this unnecessary delegation by not
 defining sq_concat/sq_repeat for user-defined classes, and restoring the
 original expectation that the sq_concat/sq_repeat slots should not
 return Py_NotImplemented.  How does this relate to coercion?

The Py_NotImplemented singleton was introduced in the coercion
proposal to mean "there is no implementation to execute the requested
operation on the given combination of types".

At the time we also considered using an exception for this, but
it turned out that this caused too much of a slow-down. Hence the use
of a special singleton which could be tested for by a simple
pointer comparison.

Originally, the singleton was only needed for mixed-type operations.
It seems that its use has spread to other areas as well and
can now also refer to missing same-type operator implementations.
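
At the Python level the protocol looks roughly like this. The sketch uses a made-up class, Metres, purely for illustration, and shows the behaviour one would expect once the marker is correctly converted: every candidate method returns NotImplemented and the interpreter raises TypeError instead of letting the singleton leak out.

```python
class Metres:
    """Toy numeric type that only knows how to add other Metres."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if not isinstance(other, Metres):
            return NotImplemented      # signal: "try something else"
        return Metres(self.value + other.value)

    __radd__ = __add__

    def __iadd__(self, other):
        if not isinstance(other, Metres):
            return NotImplemented
        self.value += other.value
        return self


d = Metres(1)
d += Metres(2)            # handled by __iadd__

try:
    d += 1.2              # every candidate method returns NotImplemented
except TypeError as exc:
    error = exc           # the interpreter turned the marker into TypeError
else:
    error = None
```

Testing for the singleton is a plain identity check (`result is NotImplemented`), which is the Python-level counterpart of the pointer comparison mentioned above.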

 But then again, looking in typeobject.c, the following code
 could be the cause for leaking a NotImplemented singleton
 reference:

 #define SLOT1BINFULL(FUNCNAME, TESTFUNC, SLOTNAME, OPSTR, ROPSTR) \
 static PyObject * \
 FUNCNAME(PyObject *self, PyObject *other) \
 { \
     static PyObject *cache_str, *rcache_str; \
     int do_other = self->ob_type != other->ob_type && \
         other->ob_type->tp_as_number != NULL && \
         other->ob_type->tp_as_number->SLOTNAME == TESTFUNC; \
     if (self->ob_type->tp_as_number != NULL && \
         self->ob_type->tp_as_number->SLOTNAME == TESTFUNC) { \
         PyObject *r; \
         if (do_other && \
             PyType_IsSubtype(other->ob_type, self->ob_type) && \
             method_is_overloaded(self, other, ROPSTR)) { \
             r = call_maybe( \
                 other, ROPSTR, &rcache_str, "(O)", self); \
             if (r != Py_NotImplemented) \
                 return r; \
             Py_DECREF(r); \
             do_other = 0; \
         } \
         r = call_maybe( \
             self, OPSTR, &cache_str, "(O)", other); \
         if (r != Py_NotImplemented || \
             other->ob_type == self->ob_type) \
             ^
 If both operands are of the same type, then a NotImplemented return
 value would be returned.
 
 Indeed, however:
 
             return r; \
         Py_DECREF(r); \
     } \
     if (do_other) { \
         return call_maybe( \
             other, ROPSTR, &rcache_str, "(O)", self); \
     } \
     Py_INCREF(Py_NotImplemented); \
     return Py_NotImplemented; \
 }
 
 This last statement also returns Py_NotImplemented.  So it's expected of
 this function to be able to return Py_NotImplemented, isn't it?  The
 type slots like nb_add can return Py_NotImplemented; the code that
 converts it to a TypeError is in the caller, which is abstract.c.

You're right - silly me.

Regards,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] NotImplemented reaching top-level

2005-12-28 Thread M.-A. Lemburg
Armin Rigo wrote:
 Hi Facundo,
 
 On Sat, Dec 24, 2005 at 02:31:19PM -0300, Facundo Batista wrote:
 >>> d += 1.2
 >>> d
 NotImplemented
 
 The situation appears to be a mess.  Some combinations of specific
 operators fail to convert NotImplemented to a TypeError, depending on
 old- or new-style-class-ness, although this is clearly a bug (e.g. in an
 example like yours but using -= instead of +=, we get the correct
 TypeError.)
 
 Obviously, we need to write some comprehensive tests about this.  But
 now I just found out that the old, still-pending SF bug #847024 about
 A()*5 in new-style classes hasn't been given any attention; my theory is
 that nobody fully understands the convoluted code paths of abstract.c
 any more :-(

The PEP documenting the coercion logic has complete tables
for what should happen:

http://www.python.org/peps/pep-0208.html

Looking at the code in abstract.c the above problem appears
to be related to the special cases applied to += and *=
in case both operands cannot deal with the type combination.

In such a case, a check is done whether the operation could
be interpreted as sequence operation (concat or repeat) and
then delegated to the appropriate handlers.

But then again, looking in typeobject.c, the following code
could be the cause for leaking a NotImplemented singleton
reference:

#define SLOT1BINFULL(FUNCNAME, TESTFUNC, SLOTNAME, OPSTR, ROPSTR) \
static PyObject * \
FUNCNAME(PyObject *self, PyObject *other) \
{ \
    static PyObject *cache_str, *rcache_str; \
    int do_other = self->ob_type != other->ob_type && \
        other->ob_type->tp_as_number != NULL && \
        other->ob_type->tp_as_number->SLOTNAME == TESTFUNC; \
    if (self->ob_type->tp_as_number != NULL && \
        self->ob_type->tp_as_number->SLOTNAME == TESTFUNC) { \
        PyObject *r; \
        if (do_other && \
            PyType_IsSubtype(other->ob_type, self->ob_type) && \
            method_is_overloaded(self, other, ROPSTR)) { \
            r = call_maybe( \
                other, ROPSTR, &rcache_str, "(O)", self); \
            if (r != Py_NotImplemented) \
                return r; \
            Py_DECREF(r); \
            do_other = 0; \
        } \
        r = call_maybe( \
            self, OPSTR, &cache_str, "(O)", other); \
        if (r != Py_NotImplemented || \
            other->ob_type == self->ob_type) \
            ^
If both operands are of the same type, then a NotImplemented return
value would be returned.

            return r; \
        Py_DECREF(r); \
    } \
    if (do_other) { \
        return call_maybe( \
            other, ROPSTR, &rcache_str, "(O)", self); \
    } \
    Py_INCREF(Py_NotImplemented); \
    return Py_NotImplemented; \
}

Strangely enough, the SLOT1BINFULL macro is not used by the
inplace concat or repeat slots...
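
For readers who find the macro hard to follow, its control flow can be modelled in Python. This is a loose sketch, not the interpreter's actual logic: hasattr() stands in for the tp_as_number slot tests, an identity comparison of the reflected methods stands in for method_is_overloaded(), and slot_binop and binary_op are names invented here.

```python
def slot_binop(self, other, op, rop):
    """Rough Python rendering of the SLOT1BINFULL control flow."""
    self_t, other_t = type(self), type(other)
    do_other = self_t is not other_t and hasattr(other_t, rop)
    if hasattr(self_t, op):
        # A subclass that overrides the reflected method gets first try.
        if (do_other and issubclass(other_t, self_t)
                and getattr(other_t, rop) is not getattr(self_t, rop, None)):
            r = getattr(other, rop)(self)
            if r is not NotImplemented:
                return r
            do_other = False
        r = getattr(self, op)(other)
        if r is not NotImplemented or other_t is self_t:
            return r          # may be NotImplemented for same-type operands
    if do_other:
        return getattr(other, rop)(self)
    return NotImplemented


def binary_op(self, other, op, rop):
    """The caller's job: turn a leftover marker into TypeError."""
    r = slot_binop(self, other, op, rop)
    if r is NotImplemented:
        raise TypeError("unsupported operand types")
    return r
```

In this model the slot-level function can legitimately hand back NotImplemented in two places, so the conversion to TypeError has to happen in whatever calls it.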

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] file() vs open(), round 7

2005-12-27 Thread M.-A. Lemburg
Fredrik Lundh wrote:
 Aahz wrote:
 
 class file(object)
 |  file(name[, mode[, buffering]]) -> file object
 |
 |  Open a file.  The mode can be 'r', 'w' or 'a' for reading (default),
 [...]
 |  Note:  open() is an alias for file().

 This is confusing.  I suggest that we make ``open()`` a factory function
 right now.  (I'll submit a bug report (and possibly a patch) after I get
 agreement.)
 
 +1.
 
 can we add an opentext factory for file/codecs.open while we're at it ?

Why a new factory function ? Can't we just redirect to codecs.open()
in case an encoding keyword argument is passed to open() ?!

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] file() vs open(), round 7

2005-12-27 Thread M.-A. Lemburg
Fredrik Lundh wrote:
 M.-A. Lemburg wrote:
 
 can we add an opentext factory for file/codecs.open while we're at it ?
 Why a new factory function ? Can't we just redirect to codecs.open()
 in case an encoding keyword argument is passed to open() ?!
 
 I think open is overloaded enough as it is.  Using separate functions for
 distinct use cases is also a lot better than keyword trickery.

Fair enough.

 Here's a rough draft:
 
     def textopen(name, mode="r", encoding=None):
         if "U" not in mode:
             mode += "U"

The U is not needed when opening files using codecs -
these always break lines using .splitlines() which
breaks lines according to the Unicode rules and also
knows about the various line break variants on different
platforms.

         if encoding:
             return codecs.open(name, mode, encoding)
         return file(name, mode)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] file() vs open(), round 7

2005-12-27 Thread M.-A. Lemburg
Phillip J. Eby wrote:
 At 02:35 PM 12/27/2005 +0100, Fredrik Lundh wrote:
 M.-A. Lemburg wrote:

 can we add an opentext factory for file/codecs.open while we're at it ?
 Why a new factory function ? Can't we just redirect to codecs.open()
 in case an encoding keyword argument is passed to open() ?!
 I think open is overloaded enough as it is.  Using separate functions for
 distinct use cases is also a lot better than keyword trickery.

 Here's a rough draft:

     def textopen(name, mode="r", encoding=None):
         if "U" not in mode:
             mode += "U"
         if encoding:
             return codecs.open(name, mode, encoding)
         return file(name, mode)
 
 Nice. It should probably also check whether there's a 'b' or 't' in 'mode' 
 and raise an error if so. 

Why should it do that ?

FYI: codecs.open() explicitly adds the 'b' to the mode since
we don't want the platform text mode to interfere with the
Unicode line breaking.

 I'd also prefer to call it 'textfile', as that reads more nicely with
 "for line in textfile(...):" use cases, and it does return a file object.

Nope: open() is only guaranteed to return a file-like object,
e.g. codecs.open() will return a wrapped version of a file
object.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] file() vs open(), round 7

2005-12-27 Thread M.-A. Lemburg
Phillip J. Eby wrote:
 At 04:20 PM 12/27/2005 +0100, M.-A. Lemburg wrote:
 Phillip J. Eby wrote:
  At 02:35 PM 12/27/2005 +0100, Fredrik Lundh wrote:
  M.-A. Lemburg wrote:
 
  can we add an opentext factory for file/codecs.open while we're at it ?
  Why a new factory function ? Can't we just redirect to codecs.open()
  in case an encoding keyword argument is passed to open() ?!
  I think open is overloaded enough as it is.  Using separate functions for
  distinct use cases is also a lot better than keyword trickery.
 
  Here's a rough draft:
 
      def textopen(name, mode="r", encoding=None):
          if "U" not in mode:
              mode += "U"
          if encoding:
              return codecs.open(name, mode, encoding)
          return file(name, mode)
 
  Nice. It should probably also check whether there's a 'b' or 't' in 'mode'
  and raise an error if so.

 Why should it do that ?
 
 It's not necessary if both codecs.open() and file() raise an error when
 there's both a 'U' and a 't' or 'b' in the mode string, I suppose.

I see what you mean. codecs.open() doesn't work with 'U'.

 FYI: codecs.open() explicitly adds the 'b' to the mode since
 we don't want the platform text mode to interfere with the
 Unicode line breaking.
 
 I think maybe you're confusing the wrapped file's mode with the
 passed-in mode, here.  The passed-in mode should contain at most one of
 'b', 't', or 'U', IIUC.  The mode used for the wrapped file should of
 course always be 'b', but that's not visible to the user of the routine.

Thinking about this some more, I think it's better to
make encoding mandatory and to not use file() at all
in the API.

When we move to "all text is Unicode" in Py3k, we'll
have to require this anyway, so why not start with it
now.

That said, I think that the name "textfile" would be
more appropriate for the factory function, like you
suggested.

Valid values for mode would then be 'r', 'w' and 'a'.
'U' is not needed. 'b' and 't' neither. The '+' modes
don't work well with codecs.
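
A minimal sketch of what such a factory could look like, written for current Python. The signature is an assumption made for illustration (no API was agreed in this thread): encoding is keyword-only and required, and only the three modes listed above are accepted.

```python
import codecs


def textfile(name, mode="r", *, encoding):
    """Open a text file through a codec; the encoding must be given."""
    if mode not in ("r", "w", "a"):
        raise ValueError("mode must be 'r', 'w' or 'a', not %r" % (mode,))
    # codecs.open() opens the underlying file in binary mode itself,
    # so no 'b', 't' or 'U' flag is needed (or accepted) here.
    return codecs.open(name, mode, encoding)
```

Usage would be along the lines of `for line in textfile("notes.txt", encoding="utf-8"): ...`. In current Python the built-in open() accepts an encoding argument directly, so this is mainly of historical interest.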

  I'd also prefer to call it 'textfile', as that reads more nicely with
  "for line in textfile(...):" use cases, and it does return a file object.

 Nope: open() is only guaranteed to return a file-like object,
 e.g. codecs.open() will return a wrapped version of a file
 object.
 
 I meant it's a file object in use case terms, not that
 isinstance(ob,file).

We usually call this an xyz-like object (meaning that
the object provides a certain kind of interface).

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] file() vs open(), round 7

2005-12-27 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 Here's a rough draft:

     def textopen(name, mode="r", encoding=None):
         if "U" not in mode:
             mode += "U"

 The U is not needed when opening files using codecs -
 these always break lines using .splitlines() which
 breaks lines according to the Unicode rules and also
 knows about the various line break variants on different
 platforms.
 
 Still, codecs typically don't implement universal newlines
 correctly. If you specify 'U', then do .read(), you deserve
 to get \n (U+000A) as the line separator; with most codecs,
 you get whatever line breaks were in the file.
 
 Passing 'U' to the underlying stream is wrong, as well:
 if the stream is double-byte oriented (e.g. UTF-16),
 the 'U' filtering will rarely do anything, but if it does
 something, it will be wrong.
 
 I agree that it would be desirable to have textopen always
 default to universal newlines, however, this is difficult
 to implement.
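
The point about double-byte streams can be shown with a small simulation. The helper below is hypothetical and only imitates what byte-level newline translation would do to the raw data before the codec sees it; it is not how any particular file implementation works.

```python
def universal_newlines_on_bytes(data):
    """Byte-level newline translation, as a text-mode file layer might do."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


original = "\u010d"                   # LATIN SMALL LETTER C WITH CARON
raw = original.encode("utf-16-le")    # b'\r\x01': the low byte is 0x0D
mangled = universal_newlines_on_bytes(raw).decode("utf-16-le")
print(repr(original), repr(mangled))
```

No line break is involved at all, yet the character comes back as U+010A because one of its bytes looked like a carriage return. Newline handling for such encodings has to happen after decoding, not before.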

I think that codecs solve the problem in a better way.
If you want to read lines from a stream, you'd use
.readline() or .readlines() to read the lines, and not
expect .read() to magically apply some conversion to the
original data.

Both line methods have a parameter keepends (which defaults to
True). This parameter specifies whether you will get the
original line end markers or not, which makes it possible to let
the application implement whatever logic it finds
appropriate.
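
As a short illustration of the keepends parameter, the following sketch reads the same bytes twice through a codec stream reader. The sample data and variable names are made up; the in-memory BytesIO stands in for a file opened in binary mode.

```python
import codecs
import io

raw = "one\r\ntwo\nthree\r".encode("utf-8")

# keepends=True (the default): the original line-end markers are preserved.
with_ends = codecs.getreader("utf-8")(io.BytesIO(raw)).readlines()

# keepends=False: markers are stripped, whichever convention was used.
without_ends = codecs.getreader("utf-8")(io.BytesIO(raw)).readlines(keepends=False)

print(with_ends)
print(without_ends)
```

With the markers kept, an application can normalise them itself; with them dropped, it gets clean lines regardless of the platform the file came from.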

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Keep default comparisons - or add a second set?

2005-12-20 Thread M.-A. Lemburg
Josiah Carlson wrote:
 Jim Fulton [EMAIL PROTECTED] wrote:
 Jim Jewett wrote:
 PEP 3000 now suggests that dropping default comparison has become more
 than an idle what-if.

 Unfortunately, one very common use case of comparisons is to get a
 canonical order.  If the order is sensible, all the better, but that
 is not strictly required.  One of Python's selling points (especially
 compared to Java) is that getting a canonical order just works, even
 if the objects being sorted are not carefully homogenized by hand. 
 Python itself relies on this when comparing same-length dictionaries.

 There are times when a specific comparison doesn't make sense (date vs
 a datetime that it includes), but these are corner cases best handled
 by the specific class that understands the specific requirements --
 classes that already have to override the comparison operators anyhow.

 Even the recently posted "get rid of default comparisons" use case is
 really just an argument to make the canonical ordering work better.
 The problem Jim Fulton describes is that the (current default)
 canonical order will change if objects are saved to a database and
 then imported to a different session.  Removing default comparisons
 wouldn't really help much; the errors would (sometimes) show up at
 saving instead of (maybe) at loading, but the solution would still be
 to handcode a default comparison for every single class individually.

 I think you need to do a much better job of defining "canonical ordering".

 You've given two properties:

 - It need not make sense. :)

 - It must be consistent across sessions

Does this also mean across different versions of Python?

How about different operating systems and hardware?

If I create and pickle a BTree with a bunch of object keys
and reload that pickle in a different session, with a
later version of Python on a different OS and Hardware
architecture, will the keys still have the same order?

I consider (obviously) this second property to be crucial.

 Do you have any proposal for how to achieve these properties?
 
 New superclasses for all built-in types (except for string and unicode,
 which already subclass from basestring).
 
 int, float, complex (long) : subclass from basenumber
 tuple, list, set : subclass from basesequence
 dict : subclass from basemapping

set should be under basemapping.

 The idea is that each of the above classes define a group in which items
 are comparable.  If you end up in a situation in which the base classes
 of the compared object differ (and hence are not comparable directly by
 value), you compare their base class name.  Because their base classes
 differ, you always get a reliable differentiation between groups.
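
The grouping idea can be approximated today with a sort key. This is only a sketch: the group names are borrowed from the proposal above, but no such base classes exist, and canonical_key is a name invented here. Values compare by content within a group and by group name across groups, so nothing depends on id().

```python
def canonical_key(obj):
    """Sort key: (group name, value comparable within that group)."""
    if isinstance(obj, (int, float)):
        return ("basenumber", obj)
    if isinstance(obj, str):
        return ("basestring", obj)
    if isinstance(obj, (tuple, list)):
        return ("basesequence", tuple(canonical_key(x) for x in obj))
    raise TypeError("no comparison group for %s" % type(obj).__name__)


mixed = [3, "b", 1.5, (2, 1), "a"]
ordered = sorted(mixed, key=canonical_key)
print(ordered)
```

Because the key is built only from group names and object contents, the resulting order would be the same in any session, which is the property under discussion.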

Python already uses this trick based on the type name.
If that still doesn't help, id(object) is used which is
what JimF is criticizing (I presume).

 What about comparisons between user-defined classes (classic or subclass
 of object)?  Presumably if a user wanted something to be compared
 against integers, floats, or complex, the user would subclass from
 basenumber, etc. 

... and get all kinds of weird side-effects. A user
probably doesn't want these :-)

 If the user only wanted their objects to compare
 against objects of its own type, they compose their own __cmp__ or
 related methods on their class, and they get this behavior 'for free'.
 
 The only thing necessary for canonical ordering persistence is that the
 content of an object define its behavior in comparison operators, and
 that pickle knows how to save and restore this content reliably.

Actually, the only thing necessary for *persisting* order is
making sure that the persistence logic maintains order across
pickling.

Note that this is a completely different requirement
than making sure that the outcome of list.sort() is the same
across platforms and sessions.

 Note that one can remove the superclass requirement with a smart cmp()
 builtin to automatically choose the comparable group.
 
 
 This is not perfect, but it is an idea, and it would allow a reliable
 canonical ordering.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Sets are mappings?

2005-12-20 Thread M.-A. Lemburg
Aahz wrote:
 On Tue, Dec 20, 2005, M.-A. Lemburg wrote:
 Josiah Carlson wrote:
 New superclasses for all built-in types (except for string and unicode,
 which already subclass from basestring).

 int, float, complex (long) : subclass from basenumber
 tuple, list, set : subclass from basesequence
 dict : subclass from basemapping
 set should be under basemapping.
 
 Are you sure?  Sets are not actually a mapping; they consist only of
 keys. 

You're right, sets should really have a separate base class.

However, in reality they behave mostly like dictionaries
using (and hiding) a common value for all keys.

 The Python docs do not include sets under maps, and sets do not
 support some of the standard mapping methods (notably keys()).  Raymond
 Hettinger has also talked about switching to a different internal
 structure for sets.

basestring is an abstract class in the sense that it
doesn't provide any interface on its own. I guess the others
should use the same approach.

They are usually only used for quickly checking for an
interface or type property.

Note that unicode and strings don't share a common implementation
either - they just happen to expose a rather similar interface.
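
Later Python versions provide abstract base classes in collections.abc
and numbers that can be used for this kind of interface check. The
sketch below uses those rather than the basenumber/basesequence/
basemapping names proposed in this thread, which are not real classes;
describe() is a hypothetical helper.

```python
from collections.abc import Mapping, Sequence, Set
from numbers import Number


def describe(obj):
    """Classify an object by the abstract interface it provides."""
    if isinstance(obj, Number):
        return "number"
    if isinstance(obj, Mapping):
        return "mapping"
    if isinstance(obj, Set):
        return "set"
    if isinstance(obj, Sequence):
        return "sequence"
    return "other"


for value in (1.5, {"a": 1}, {1, 2}, [1, 2], "text"):
    print(type(value).__name__, "->", describe(value))
```

Note that set is grouped under its own Set class there, not under
Mapping.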

 (Should this discussion move to c.l.py?  Normally I'd think so, but I
 think it's critical that the core developers agree about this.  It's
 also critical for me to know because I'm writing a book, but that's not
 reason enough to stick with python-dev. ;-)

Not sure about others. I rarely read c.l.p. Even pydev has
enough traffic these days to require filtering.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] ElementTree in stdlib

2005-12-14 Thread M.-A. Lemburg
Fredrik Lundh wrote:
 M.-A. Lemburg wrote:
 
 Some questions:

 * Are you going to contribute cElementTree as well ?
 
 yes, but there are some build issues we need to sort out first (both pyexpat
 and cET link to their own copies of expat)

Great !

 we also need to figure out how to import the bundled version; should it be
 cElementTree, xml.etree.cElementTree, or just xml.etree.ElementTree
 (which would then fallback on the Python version if cElementTree isn't
 built) ?

If the semantics are identical I'd prefer the latter approach
of using the faster variant if possible.
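
Until that is settled, application code can do the fallback itself.
A minimal sketch, assuming the accelerated module ends up importable as
xml.etree.cElementTree (one of the candidate names above, not a
decision):

```python
# Prefer the C implementation when it is available, otherwise use the
# pure-Python module; both are bound to the same local name.
try:
    import xml.etree.cElementTree as ET
except ImportError:
    import xml.etree.ElementTree as ET

root = ET.fromstring("<doc><item>spam</item></doc>")
print(root.find("item").text)
```

The rest of the application only refers to ET, so it does not care
which variant was loaded.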

 * What was the motivation to not include the whole ElementTree
  package ?
 
 this is a perfect time to get rid of some little-used stuff.  if there's enough
 user demand, we can always add a few more modules before 2.5 goes out of the
 door...

Ok.

 * I'm missing the usual "Licensed to PSF under a Contributor Agreement."
  in the copyright notices of the files:

  http://www.python.org/psf/contrib.html

  I assume that you'll add these, right ?
 
 will fix.
 
 * How should users that want to use the latest and greatest
  (more recent) distribution directly from your site go about in
  their apps ? Using from...as constructs ?
 
 from-import or import-as works fine

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] registering unicode codecs

2005-11-24 Thread M.-A. Lemburg
Neal Norwitz wrote:
 While running regrtest with -R to find reference leaks I found a usage
 issue.  When a codec is registered it is stored in the interpreter
 state and cannot be removed.  Since it is stored as a list, if you
 repeatedly add the same search function, you will get duplicates in the
 list and they can't be removed.  This shows up as a reference leak
 (which it really isn't) in test_unicode with this code modified from
 test_codecs_errors:
 
 import codecs
 def search_function(encoding):
     def encode1(input, errors="strict"):
         return 42
     return (encode1, None, None, None)
 
 codecs.register(search_function)
 
 ###
 
 Should the search function be added to the search path if it is
 already in there?  I don't understand a benefit of having duplicate
 search functions.

Me neither :-) I never expected someone to register a search
function more than once, since there's no point in doing so.

 Should users have access to the search path (through a
 codecs.unregister())?  

Maybe, but why would you want to unregister a search function ?

 If so, should it search from the end of the
 list to the beginning to remove an item?  That way the last entry
 would be removed rather than the first.

I'd suggest raising an exception in case a user tries
to register a search function twice. Removal should be the
same as doing list.remove(), ie. remove the first (and
only) item in the list of search functions.
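
In pure Python the suggested behaviour would look roughly like this.
It is a sketch only: SearchFunctionRegistry is a hypothetical class
used for illustration, not the interpreter's actual implementation, and
the choice of ValueError is an assumption.

```python
class SearchFunctionRegistry:
    """Hypothetical model of the codec search function list."""

    def __init__(self):
        self._search_functions = []

    def register(self, search_function):
        if not callable(search_function):
            raise TypeError("argument must be callable")
        if search_function in self._search_functions:
            # Registering twice is treated as an error, so the list
            # can never contain duplicates.
            raise ValueError("search function is already registered")
        self._search_functions.append(search_function)

    def unregister(self, search_function):
        # Same semantics as list.remove(): drops the first (and, given
        # the check above, only) matching entry.
        self._search_functions.remove(search_function)


def search_function(encoding):
    return None


registry = SearchFunctionRegistry()
registry.register(search_function)
try:
    registry.register(search_function)
except ValueError as exc:
    print("rejected:", exc)
```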

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] registering unicode codecs

2005-11-24 Thread M.-A. Lemburg
Neal Norwitz wrote:
 On 11/24/05, M.-A. Lemburg wrote:
 
Should users have access to the search path (through a
codecs.unregister())?

Maybe, but why would you want to unregister a search function ?


If so, should it search from the end of the
list to the beginning to remove an item?  That way the last entry
would be removed rather than the first.

I'd suggest raising an exception in case a user tries
to register a search function twice.
 
 
 This should take care of the testing problem.
 
 
Removal should be the
same as doing list.remove(), ie. remove the first (and
only) item in the list of search functions.
 
 
 Do you recommend adding an unregister()?  It's not necessary for this case.

Not really - I don't see much of a need for this; except
maybe if a codec package wants to replace another codec
package.

So far no-one has requested such a feature, so I'd say
we don't add .unregister() until a request for it pops up.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] str.dedent

2005-11-13 Thread M.-A. Lemburg
Noam Raphael wrote:
 Following Avi's suggestion, can I raise this thread up again? I think
 that Reinhold's .dedent() method can be a good idea after all.
 
 The idea is to add a method called dedent to strings. It would do
 exactly what the current textwrap.dedent function does. 

You are missing a point here: string methods were introduced
to make switching from plain 8-bit strings to Unicode easier.

As such they are only needed in cases where an algorithm
has to work on the resp. internals differently or where direct
access to the internals makes a huge difference in terms
of performance.

In your use case, the algorithm is independent of the data type
internals and can be defined solely by using existing string
method APIs.

 The motivation
 is to be able to write multilined strings easily without damaging the
 visual indentation of the source code, like this:
 
 def foo():
     msg = '''\
              From: %s
              To: %s\r\n'
              Subject: Host failure report for %s
              Date: %s
 
              %s
              '''.dedent() % (fr, ', '.join(to), host, time.ctime(), err)
 
 Writing multilined strings without spaces in the beginning of lines
 makes functions harder to read, since although the Python parser is
 happy with it, it breaks the visual indentation.

This is really a minor compiler/parser issue and not one which
warrants adding another string method.
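
For comparison, the same effect is available today with the existing
textwrap.dedent() function. A minimal sketch; build_report and its
parameters are hypothetical names chosen for this example:

```python
import textwrap


def build_report(sender, recipients, host, err):
    # The template stays indented with the surrounding code;
    # textwrap.dedent() strips the common leading whitespace.
    template = textwrap.dedent('''\
        From: %s
        To: %s
        Subject: Host failure report for %s

        %s
        ''')
    return template % (sender, ', '.join(recipients), host, err)


print(build_report("ops@example.com", ["a@example.com"], "web1", "disk full"))
```

The only cost compared to a method is the extra function call wrapped
around the literal.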

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Adding examples to PEP 263

2005-11-04 Thread M.-A. Lemburg
Fredrik Lundh wrote:
 the runtime warning you get when you use non-ascii characters in
 python source code points the poor user to this page:
 
 http://www.python.org/peps/pep-0263.html
 
 which tells the user to add a
 
 # -*- coding: <encoding name> -*-
 
 to the source, and then provides a more detailed syntax description
 as a RE pattern.  to help people that didn't grow up with emacs, and
 don't speak fluent RE, and/or prefer to skim documentation, it would
 be quite helpful if the page also contained a few examples; e.g.
 
 # -*- coding: utf-8 -*-
 # -*- coding: iso-8859-1 -*-
 
 can anyone with SVN write access perhaps add this?

Good point. I'll add some examples.
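
To show how such a declaration is picked up, here is a rough sketch of
the lookup: the first two lines are searched for a comment containing
"coding:" or "coding=" followed by the encoding name. The pattern is
along the lines of the one given in the PEP, and declared_encoding() is
a hypothetical helper, not the parser's real code.

```python
import re

# Assumed pattern, modelled on the regular expression in PEP 263.
CODING_RE = re.compile(r"coding[:=]\s*([-\w.]+)")


def declared_encoding(source_lines):
    """Return the encoding named in the first two lines, or None."""
    for line in source_lines[:2]:
        if line.lstrip().startswith("#"):
            match = CODING_RE.search(line)
            if match:
                return match.group(1)
    return None


print(declared_encoding(["#!/usr/bin/env python",
                         "# -*- coding: iso-8859-1 -*-"]))
print(declared_encoding(["# -*- coding: utf-8 -*-"]))
```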

 (I'd probably add a note to the top of the page for anyone who arrives
 there via a Python error message, which summarizes the pep and provides
 an example or two; abstracts and rationales are nice, but if you're just a
 plain user, a "do this; here's how it works; further discussion below" style
 is a bit more practical...)

The PEP isn't all that long, so I don't think a summary would
help. However, we might want to point the user to a different
URL in the error message, e.g. a Wiki page with more user-friendly
content.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] PEP 328 - absolute imports (python-dev sprint at PyCon)

2005-11-01 Thread M.-A. Lemburg
Guido van Rossum wrote:
 On 11/1/05, Phillip J. Eby wrote:
 
At 10:22 AM 11/1/2005 -0700, Guido van Rossum wrote:

* PEP 328 - absolute/relative import

I assume that references to 2.4 in that PEP should be changed to 2.5, and
so on.
 
 
 For the part that hasn't been implemented yet, yes.
 
 
It also appears to me that the PEP doesn't record the issue brought up by
some people about the current absolute/relative ambiguity being useful for
packaging purposes.  i.e., being able to nest third-party packages such
that they end up seeing their dependencies, even though they're not
installed at the root package level.

For example, I have a package that needs Python 2.4's version of pyexpat,
and I need it to run in 2.3, but I can't really overwrite the 2.3 pyexpat,
so I just build a backported pyexpat and drop it in the package, so that
the code importing it just ends up with the right thing.

Of course, that specific example is okay since 2.3 isn't going to somehow
grow absolute importing.  :)  But I think people brought up other examples
besides that, it's just the one that I personally know I've done.
 
 
 I guess this ought to be recorded. :-(
 
 The issue has been beaten to death and my position remains firm:
 rather than playing namespace games, consistent renaming is the right
 thing to do here. This becomes a trivial source edit, which beats the
 problems of debugging things when it doesn't work out as expected
 (which is very common due to the endless subtleties of loading
 multiple versions of the same code).

Just for reference, may I remind you of this thread last year:

http://mail.python.org/pipermail/python-dev/2004-September/048695.html

The PEP's timeline should be updated accordingly.

Thanks,
-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] i18n identifiers

2005-10-31 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 Steve Holden wrote:
 
Therefore, if such steps are really going to be considered, I would 
really like to see them introduced in such a way that no breakage occurs 
for existing users, even the parochial ones who feel they (and their 
programs) don't need to understand anything but ASCII.
 
 
 It is straight-forward to make this feature completely backwards
 compatible. Syntactically, it is a pure extension: existing code
 will continue to work unmodified, and will continue to have the
 same meaning. With the feature, you will be able to write code
 that previously produced SyntaxErrors.
 
 Semantically, the only potential incompatibility is that you
 might find Unicode strings in __dict__. If purely-ASCII identifiers
 are going to be represented by byte strings (as they are now),
 no change in meaning for existing code is anticipated.
 
 So it is not necessary to make the feature conditional to preserve
 compatibility.

If people are really all enthusiastic about such a feature,
then it should happen in Python3k when the parser is rewritten
to work on Unicode natively.

Note that if you start with this now, a single module in
your application using Unicode identifiers could potentially
break the application: simply by the fact that stack frames,
tracebacks and module globals would now contain Unicode.

Any processing done on the identifiers, like e.g. error formatting
would then have to deal with Unicode objects (due to the automatic
conversion).
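
Python 3 did eventually accept non-ASCII identifiers (PEP 3131), so the
effect on namespaces can be demonstrated there. A small sketch, assuming
a Python 3 interpreter; the variable names are illustrative only:

```python
# Run a snippet that uses a non-ASCII identifier (the escape sequence
# spells "zähler") and inspect the resulting namespace.
namespace = {}
exec("z\u00e4hler = 0\nz\u00e4hler += 1", namespace)

user_names = [name for name in namespace if not name.startswith("__")]
non_ascii = [name for name in user_names
             if any(ord(ch) > 127 for ch in name)]

# Any tool that walks globals, frames or tracebacks now sees this name.
print(non_ascii)
```

Code that formats such names for error messages or logs has to be
prepared for characters outside ASCII.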

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-27 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 
You even argued against having non-ASCII identifiers:

http://mail.python.org/pipermail/python-list/2002-May/102936.html
 
 
 I see :-) It seems I have changed my mind since then (which
 apparently predates PEP 263).
 
 One issue I apparently was worried about was the plan to use
 native-encoding byte strings for the identifiers; this I didn't
 like at all.
 
 
* Unicode identifiers are going to introduce massive
code breakage - just think of all the tools people use
to manipulate Python code today; I'm quite sure that
most of it will fail in one way or another if you present
it Unicode literals such as in zähler += 1.
 
 
 True. Today, I think I would be willing to accept the
 code breakage: these tools had quite some time to update
 themselves to PEP 263 (even though not all of them have
 done so yet); also, usage of the feature would only spread
 gradually. A failure to support the feature in the Python
 proper would be treated as a bug by us; how tool providers
 deal with the feature would be their choice.

I was thinking of introspection and debugging tools.
These would then see Unicode objects in the namespace
dictionaries and this will likely break a lot of code -
much for the same reason you see code breakage now
if you let Unicode objects enter the Python standard lib
without warning :-)

* People don't seem very interested in using Unicode
identifiers, e.g.

  http://mail.python.org/pipermail/i18n-sig/2001-February/000828.html
 
 
 True. However, I also suspect that lack of tool support
 contributes to that. For the specific case of Java,
 there is no notion of source encoding, which makes Unicode
 identifiers really tedious to use.
 
 If it were really easy to use, I assume people would actually
 use it - at least in some of the contexts, like teaching,
 where Python is also widely used.

Well, that has two sides: Of course, you'll always find
some people that will like a certain feature. The question
is what effects does it have on the rest of us.

Python has always put some constraints on programmers
to raise code readability, e.g. white space awareness.
Giving them Unicode identifiers sounds like a step
backwards in this context.

Note that I'm not talking about comments, string literal
contents, etc. - only the programming logic, ie. keywords
and identifiers.

Do you really think that it will help with code readability
if programmers are allowed to use native scripts for their
identifiers ?
 
 
 Yes, I do - for some groups of users. Of course, code sharing
 would be more difficult, and there certainly should be a policy
 to use only ASCII in the standard library. But within local
 groups, users would find understanding code easier if they
 knew what the identifiers actually meant.

Hmm, but why do you think they wouldn't understand the meaning of
ASCII versions of the identifiers ?

Note that using ASCII doesn't necessarily mean that you
have to use English as basis for the naming schemes of
identifiers.

If you are told to debug a program
written by say a Japanese programmer using Japanese identifiers
you are going to have a really hard time. Integrating such
code into other applications will be even harder, since you'd
be forced to use his Japanese class names in your application.
 
 
 Certainly, yes. There is a trade-off: you can make it easier
 for some people to read and write code if they can use their
 native script; at the same time, it would be harder for others
 to read and modify it.
 
 It's a policy decision whether you use English identifiers or
 not - it shouldn't be a technical decision (as it currently
 is).

See above: ASCII != English. Most scripts have a transliteration
into ASCII - simply because that's the global standard for
scripts.

I think source code encodings provide an ideal way to
have comments written in native scripts - and people
use that a lot. However, keeping the program code itself
in plain ASCII makes it far more readable and reusable
across locales. Something that's important in this
globalized world.
 
 
 Certainly. However, some programs don't need to live in
 a globalized world - e.g. if they are homework in a school.
 Within a locale, using native scripts would make the program
 more readable.

True.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-27 Thread M.-A. Lemburg
Greg Ewing wrote:
 M.-A. Lemburg wrote:
 
 
If you are told to debug a program
written by say a Japanese programmer using Japanese identifiers
you are going to have a really hard time.
 
 
 Or you could look upon it as an opportunity to
 broaden your mental horizons by learning some
 Japanese. :-)

I just took Japanese as an example of a language and script
that I don't know anything about. I would actually love
to learn some Japanese, but simply don't have the time
for learning it.

Anyway, I could just as well have chosen Tibetan, Thai or Limbu
scripts (which all look very nice, BTW):

http://www.unicode.org/charts/

Perhaps this is not as bad after all - I just don't think that
it will help code readability in the long run.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-26 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 
A few years ago we had a discussion about this on python-dev
and agreed to stick with ASCII identifiers for Python. I still
think that's the right way to go.
 
 I don't think there ever was such an agreement.

You even argued against having non-ASCII identifiers:

http://mail.python.org/pipermail/python-list/2002-May/102936.html

and I agree with you on most of the points you make in that
posting:

* Unicode identifiers are going to introduce massive
code breakage - just think of all the tools people use
to manipulate Python code today; I'm quite sure that
most of it will fail in one way or another if you present
it Unicode literals such as in zähler += 1.

* People don't seem very interested in using Unicode
identifiers, e.g.

  http://mail.python.org/pipermail/i18n-sig/2001-February/000828.html

most of the few who did comment, said they'd rather have
ASCII identifiers, e.g.

  http://mail.python.org/pipermail/python-list/2002-May/104050.html


Do you really think that it will help with code readability
if programmers are allowed to use native scripts for their
identifiers ?

I think this goes beyond just visual aspects of being able
to distinguish graphemes:

If you are told to debug a program
written by say a Japanese programmer using Japanese identifiers
you are going to have a really hard time. Integrating such
code into other applications will be even harder, since you'd
be forced to use his Japanese class names in your application.
This doesn't only introduce problems with being able to enter
the Japanese identifiers, it will also cause your application
to suddenly contain identifiers in Japanese even though that's
not your native script.

I think source code encodings provide an ideal way to
have comments written in native scripts - and people
use that a lot. However, keeping the program code itself
in plain ASCII makes it far more readable and reusable
across locales. Something that's important in this
globalized world.

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Divorcing str and unicode (no more implicit conversions).

2005-10-25 Thread M.-A. Lemburg
Neil Hodgson wrote:
 M.-A. Lemburg:
 
 
Unicode has the concept of combining code points, e.g. you can
store an é (e with an accent) as e + '. Now if you slice
off the accent, you'll break the character that you encoded
using combining code points.
...
next_<indextype>(u, index) -> integer

Returns the Unicode object index for the start of the next
<indextype> found after u[index] or -1 in case no next element
of this type exists.
 
 
Should entity breakage be further discouraged by returning a slice
 here rather than an object index?

You mean a slice that slices out the next <indextype> ?

Something like:
 
 i = first_grapheme(u)
 x = 0
 while x < width and u[i] != "\n":
     x, _ = draw(u[i], (x, y))
     i = next_grapheme(u, i)

This sounds a lot like you'd want iterators for the various
index types. Should be possible to implement on top of the
proposed APIs, e.g. itergraphemes(u), itercodepoints(u), etc.
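
As a very rough sketch of what such an iterator could look like: the
version below only attaches combining marks (as reported by
unicodedata.combining) to the preceding base character. That is a
simplifying assumption and far from the full grapheme rules in the
Unicode standard; itergraphemes is the hypothetical name used above.

```python
import unicodedata


def itergraphemes(text):
    """Yield a base character together with its combining marks."""
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            # A new base character starts; emit the finished cluster.
            yield cluster
            cluster = ch
        else:
            cluster += ch
    if cluster:
        yield cluster


word = "cafe\u0301"            # "café" with a combining acute accent
print(list(itergraphemes(word)))
print(repr(word[:4]))          # plain slicing cuts the accent off
```

Iterating this way keeps the accent with its base letter, whereas the
slice in the last line leaves a bare "e".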

Note that what most people refer to as a "character" is a
grapheme in Unicode speak. Given that interpretation,
breaking Unicode characters is something you won't
ever work around by using larger code units such
as UCS4 compatible ones.

Furthermore, you should also note that surrogates (two
code units encoding one code point) are part of Unicode
life. While you don't need them when storing Unicode
in UCS4 code units, they can still be part of the
Unicode data and the programmer has to be aware of
these.
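
To make the surrogate case concrete, here is a short sketch (assuming
Python 3.3 or later, where len() counts code points) that splits one
code point outside the BMP into its two UTF-16 code units:

```python
clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, one code point

# Encode as big-endian UTF-16 and read the 16-bit code units back.
utf16 = clef.encode("utf-16-be")
code_units = [int.from_bytes(utf16[i:i + 2], "big")
              for i in range(0, len(utf16), 2)]

print(len(clef), [hex(unit) for unit in code_units])
```

One code point becomes a high and a low surrogate; slicing between them
in a UTF-16 based representation yields two invalid halves.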

Personally, I don't think that slicing Unicode is
such a big issue. If you know what you are doing,
things tend not to break - which is true for pretty
much everything you do in programming ;-)

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] New codecs checked in

2005-10-25 Thread M.-A. Lemburg
Martin v. Löwis wrote:
 M.-A. Lemburg wrote:
 
I just left them in because I thought they wouldn't do any harm
and might be useful in some applications.

Removing them where not directly needed by the codec would not
be a problem.
 
 
 I think memory usage caused is measurable (I estimated 4KiB per
 dictionary). More importantly, people apparently currently change
 the dictionaries we provide and expect the codecs to automatically
 pick up the modified mappings. It would be better if the breakage
 is explicit (i.e. they get an AttributeError on the variable) instead
 of implicit (their changes to the mapping simply have no effect
 anymore).

Agreed. I've already checked in the changes, BTW.

KOI8-U is not available as mapping on ftp.unicode.org and
I only recreated codecs from the mapping files available
there.
 
 
 I think we should come up with mapping tables for the additional
 codecs as well, and maintain them in the CVS. This also applies
 to things like rot13.

Agreed.

I'll rerun the creation with the above changes sometime this
week.
 
 
 I hope I can finish my encoding routine shortly, which again
 results in changes to the codecs (replacing the encoding dictionaries
 with other lookup tables).

Having seen the decode tables written as a long Unicode string,
I think that this may indeed also be a good solution for
encoding - the major improvement here is that the parser
and compiler will do the work of creating the table. At
module load time, the .pyc file will only contain a long
string which is very fast to create and load (unlike dictionaries
which are set up dynamically at load time).
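
A minimal sketch of the string-table approach, using the charmap
helpers in the codecs module. The table here is an assumption made up
for the example (identity for all bytes except 0x80), not one of the
generated codec tables:

```python
import codecs

# Byte value N decodes to the character at index N of the table.
table = [chr(i) for i in range(256)]
table[0x80] = "\u20ac"                 # map byte 0x80 to the euro sign
decoding_table = "".join(table)

# The encoding side is derived from the same string.
encoding_table = codecs.charmap_build(decoding_table)

decoded, consumed = codecs.charmap_decode(b"\x80 5", "strict", decoding_table)
encoded, _ = codecs.charmap_encode(decoded, "strict", encoding_table)
print(repr(decoded), consumed, encoded)
```

In a generated codec module the table would be written out as one
string literal, so loading the module only has to unmarshal a constant.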

In general, it's better to do all the work up-front when
creating the codecs, rather than having run-time code
repeat these tasks over and over again.

-- 
Marc-Andre Lemburg
eGenix.com


