date:20060214

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Fred L. Drake, Jr.

On Wednesday 15 February 2006 01:44, Greg Ewing wrote:
 > If the protocol has been sensibly designed, that shouldn't
 > happen, since everything up to the coding marker should
 > be ascii (or some other protocol-defined initial coding).

Indeed.

 > For protocols that are not sensibly designed (or if you're
 > just trying to guess) what you suggest may be needed. But
 > it would be good to have a nicer way of going about it
 > for when the protocol is sensible.

I agree in principle, but the example of using an HTML <meta> tag as a source 
of document encoding information isn't sensible.  Unfortunately, it's still 
part of the HTML specification.  :-(

I'm not opposing a way to do a sensible thing, but wanted to note that it 
wasn't going to be right for all cases, with such an example having been 
mentioned already (though the issues with it had not been fully spelled out).

  -Fred

-- 
Fred L. Drake, Jr.   
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Greg Ewing

Fred L. Drake, Jr. wrote:

> The proper response in this case is often to re-start decoding 
> with the correct encoding, since some of the data extracted so far may have 
> been decoded incorrectly.

If the protocol has been sensibly designed, that shouldn't
happen, since everything up to the coding marker should
be ascii (or some other protocol-defined initial coding).

For protocols that are not sensibly designed (or if you're
just trying to guess) what you suggest may be needed. But
it would be good to have a nicer way of going about it
for when the protocol is sensible.

Greg

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Greg Ewing

Ron Adam wrote:

> My first impression and thoughts were:  (and seems incorrect now)
> 
>  bytes(object) ->  byte sequence of object's value
> 
> Basically a "memory dump" of object's value.

As I understand the current intentions, this is correct.
The bytes constructor would have two different signatures:

(1)   bytes(seq) --> interprets seq as a sequence of
 integers in the range 0..255,
 exception otherwise

(2a)  bytes(str, encoding) --> encodes the characters of
(2b)  bytes(unicode, encoding) the string using the specified
   encoding

In (2a) the string would be interpreted as containing
ascii characters, with an exception otherwise. In 3.0,
(2a) will disappear leaving only (1) and (2b).
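
As a hedged illustration, the bytes type that eventually shipped in Python 3 accepts both remaining signatures. The snippet assumes a Python 3 interpreter, which may differ in detail from the proposal under discussion here.

```python
# Signature (1): a sequence of integers in the range 0..255.
b1 = bytes([104, 105])

# Out-of-range values are rejected with an exception.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

# Signature (2b): a text string plus an explicit encoding.
b2 = bytes("hi", "ascii")

# Without an encoding, a text string is refused.
try:
    bytes("hi")
    missing_encoding_ok = True
except TypeError:
    missing_encoding_ok = False
```

The two constructors produce equal objects for ASCII input, so the choice between them is about where the data comes from, not what comes out.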

> And I was thinking a bytes argument of more than one item would indicate 
> a byte sequence.
> 
>  bytes(1,2,3)  ->  bytes([1,2,3])

But then you have to test the argument in the one-argument
case and try to guess whether it should be interpreted as
a sequence or an integer. Best to avoid having to do that.
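
The ambiguity can be seen in Python 3 as eventually released (an assumption relative to this thread, which predates it): there a lone integer argument is accepted and read as a length, so a one-element sequence has to be spelled out.

```python
# In Python 3, a lone integer argument is treated as a length,
# not as a one-element byte sequence.
zeros = bytes(3)

# A one-element sequence has to be written explicitly.
single = bytes([3])
```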

> Which is fine... so ???
> 
> b = bytes(0L) ->  bytes([0,0,0,0])

No, bytes(0L) --> TypeError because 0L doesn't implement
the iterator protocol or the buffer interface.

I suppose long integers might be enhanced to support the
buffer interface in 3.0, but that doesn't seem like a good
idea, because the bytes you got that way would depend on
the internal representation of long integers. In particular,

   bytes(0x12345678L)

via the buffer interface would most likely *not* give you
bytes([0x12, 0x34, 0x56, 0x78]).

Maybe types should grow a __bytes__ method?
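
A minimal sketch of that idea, assuming Python 3, where a __bytes__ special method and int.to_bytes() both exist. The Version class is hypothetical and only stands in for "some type that knows its own byte representation".

```python
class Version:
    """Hypothetical type that chooses its own byte representation."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def __bytes__(self):
        # The type, not the interpreter's memory layout, decides the bytes.
        return bytes([self.major, self.minor])


# For integers, the byte order is stated explicitly rather than leaked
# from the internal representation.
big = (0x12345678).to_bytes(4, "big")
little = (0x12345678).to_bytes(4, "little")
packed = bytes(Version(2, 5))
```

Only the "big" spelling gives the 0x12, 0x34, 0x56, 0x78 order a reader would expect; the caller has to ask for it.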

Greg

[Python-Dev] 2.5 PEP

2006-02-14 Thread Neal Norwitz

Attached is the 2.5 release PEP 356.  It's also available from: 
http://www.python.org/peps/pep-0356.html

Does anyone have any comments?  Is this good or bad?  Feel free to
send me comments.

We need to ensure that PEPs 308, 328, and 343 are implemented.  We
have possible volunteers for 308 and 343, but not 328.  Brett is doing
352 and Martin is doing 353.

We also need to resolve a bunch of other implementation details about
providing the C AST to Python, bdist_* issues and a few more possible
stdlib modules.  Don't be shy, tell the world what you think about
these.

Can someone go through PEP 4 and 11 and determine what work needs to be done?

The more we distribute the work, the easier it will be on everyone. 
You don't really want to listen to me whine any more do you? ;-)

Thank you,
n
PEP: 356
Title: Python 2.5 Release Schedule
Version: $Revision: 42375 $
Author: Neal Norwitz, GvR
Status: Draft
Type: Informational
Created: 07-Feb-2006
Python-Version: 2.5
Post-History: 

Abstract

This document describes the development and release schedule for
Python 2.5.  The schedule primarily concerns itself with PEP-sized
items.  Small features may be added up to and including the first
beta release.  Bugs may be fixed until the final release.

There will be at least two alpha releases, two beta releases, and
one release candidate.  The release date is planned for 30 September 2006.


Release Manager

TBD (Anthony Baxter?)

Martin von Loewis is building the Windows installers,
Fred Drake the doc packages, and
TBD (Sean Reifschneider?) the RPMs.


Release Schedule

alpha 1: May 6, 2006 [planned]
alpha 2: June 3, 2006 [planned]
alpha 3: July 1, 2006 [planned]
beta 1:  July 29, 2006 [planned]
beta 2:  August 26, 2006 [planned]
rc 1:    September 16, 2006 [planned]
final:   September 30, 2006 [planned]


Completed features for 2.5

PEP 309: Partial Function Application
PEP 314: Metadata for Python Software Packages v1.1
(should PEP 314 be marked final?)
PEP 341: Unified try-except/try-finally to try-except-finally
PEP 342: Coroutines via Enhanced Generators

- AST-based compiler

- Add support for reading shadow passwords (http://python.org/sf/579435)

- any()/all() builtin truth functions

- new hashlib module adds support for SHA-224, -256, -384, and -512
  (replaces old md5 and sha modules)

- new cProfile module suitable for profiling long running applications
  with minimal overhead


Planned features for 2.5

PEP 308: Conditional Expressions
(Someone volunteered on python-dev, is there progress?)

PEP 328: Absolute/Relative Imports
(Needs volunteer, mail python-dev if interested)

PEP 343: The "with" Statement
(nn: I have a possible volunteer.)

Note there are two separate implementation parts:
interpreter changes and python code for utilities.

PEP 352: Required Superclass for Exceptions
(Brett Cannon is expected to implement this.)

PEP 353: Using ssize_t as the index type
MvL expects this to be complete in March.

Access to C AST from Python

Add bdist_msi to the distutils package.  (MvL wants one more
independent release first.)

Add bdist_deb to the distutils package?
(see http://mail.python.org/pipermail/python-dev/2006-February/060926.html)

Add bdist_egg to the distutils package???

Add setuptools to the standard library.

Add wsgiref to the standard library.

(GvR: I have a bunch more that could/would/should be added. -- Still true?)


Deferred until 2.6:

- None


Open issues

This PEP needs to be updated and release managers confirmed.

- Review PEP  4: Deprecate and/or remove the modules
- Review PEP 11: Remove support for platforms as described


Copyright

This document has been placed in the public domain.



Local Variables:
mode: indented-text
indent-tabs-mode: nil
End:






Re: [Python-Dev] 2.5 release schedule

2006-02-14 Thread Brett Cannon

On 2/14/06, Neal Norwitz <[EMAIL PROTECTED]> wrote:
> I was hoping to get a lot more feedback about PEP 356 and the 2.5
> release schedule.
>
> http://www.python.org/peps/pep-0356.html
>
> I updated the schedule; it is now:
>
> alpha 1: May 6, 2006 [planned]
> alpha 2: June 3, 2006 [planned]
> alpha 3: July 1, 2006 [planned]
> beta 1:  July 29, 2006 [planned]
> beta 2:  August 26, 2006 [planned]
> rc 1:    September 16, 2006 [planned]
> final:   September 30, 2006 [planned]
>
> What do people think about that?  There are still a lot of features we
> want to add.  Is this ok with everyone?  Do you think it's realistic?
>

Speaking as one of the people who has a PEP to implement, I am okay with it.

-Brett

[Python-Dev] how to upload new MacPython web page?

2006-02-14 Thread Bill Janssen

We (the pythonmac-sig mailing list) seem to have converged (almost --
still talking about the logo) on a new download page for MacPython, to
replace the page currently at
http://www.python.org/download/download_mac.html.  The strawman can be
seen at http://bill.janssen.org/mac/new-macpython-page.html.

How do I get the bits changed on python.org (when we're finished)?

Bill

[Python-Dev] 2.5 release schedule

2006-02-14 Thread Neal Norwitz

I was hoping to get a lot more feedback about PEP 356 and the 2.5
release schedule.

http://www.python.org/peps/pep-0356.html

I updated the schedule; it is now:

alpha 1: May 6, 2006 [planned]
alpha 2: June 3, 2006 [planned]
alpha 3: July 1, 2006 [planned]
beta 1:  July 29, 2006 [planned]
beta 2:  August 26, 2006 [planned]
rc 1:    September 16, 2006 [planned]
final:   September 30, 2006 [planned]

What do people think about that?  There are still a lot of features we
want to add.  Is this ok with everyone?  Do you think it's realistic?

We still need a release manager.  No one has heard from Anthony.  If
he isn't interested, is someone else interested in trying their hand at
it?  PEP 101 needs many changes because, since the last release, both
python and pydotorg have transitioned from CVS to SVN.  Creosote also
moved.

n

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Adam Olsen

On 2/14/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> Not entirely, since I don't know what b"abc€def" would mean
> (where € is a Unicode Euro character typed in whatever source
> encoding was used).

SyntaxError I would hope.  Ascii and hex escapes only please. :)

Although I'm not arguing for or against byte literals.  They do make
for a much terser form, but they're not strictly necessary.

--
Adam Olsen, aka Rhamphoryncus

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Adam Olsen

On 2/14/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 2/13/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > If I understand correctly there's three main candidates:
> > 1. Direct copying to str in 2.x, pretending it's latin-1 in unicode in 3.x
>
> I'm not sure what you mean, but I'm guessing you're thinking that the
> repr() of a bytes object created from bytes('abc\xf0') would be
>
>   bytes('abc\xf0')
>
> under this rule. What's so bad about that?

See below.


> > 2. Direct copying to str/unicode if it's only ascii values, switching
> > to a list of hex literals if there's any non-ascii values
>
> That works for me too. But why hex literals? As MvL stated, a list of
> decimals would be just as useful.

PEBKAC.  Yeah, decimals are simpler and shorter even.


> > 3. b"foo" literal with ascii for all ascii characters (other than \
> > and "), \xFF for individual characters that aren't ascii
> >
> > Given the choice I prefer the third option, with the second option as
> > my runner up.  The first option just screams "silent errors" to me.
>
> The 3rd is out of the running for many reasons.
>
> I'm not sure I understand your "silent errors" fear; can you elaborate?

I think it's that someone will create a unicode object with real
latin-1 characters and it'll get passed through without errors, the
code assuming it's 8bit-as-latin-1.  If they had put other unicode
characters in they would have gotten an exception instead.
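
A small sketch of that failure mode, assuming Python 3 string semantics. The to_wire helper is hypothetical and simply stands in for code that treats text as 8-bit data in disguise.

```python
def to_wire(text):
    """Hypothetical helper that assumes text is 8-bit data in disguise."""
    return text.encode("latin-1")


# Real Latin-1 text passes through silently, even if it was never
# meant to be treated as raw bytes.
silent = to_wire("caf\u00e9")

# Any character outside Latin-1 fails loudly instead.
try:
    to_wire("\u20ac")  # Euro sign
    raised = False
except UnicodeEncodeError:
    raised = True
```

The first call is the worrying one: nothing distinguishes genuine text from smuggled bytes until a character above U+00FF happens to appear.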

However, at this point all the posts on latin-1 encoding/decoding have
become so muddled in my mind that I don't know what they're
suggesting.  I think I'll wait for the pep to clear that up.

--
Adam Olsen, aka Rhamphoryncus

Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-14 Thread Neal Norwitz

On 2/14/06, Fred L. Drake, Jr. <[EMAIL PROTECTED]> wrote:
>
> Releases generally aren't a problem, since they're heavily automated and
> scheduled well in advance.  I'm glad to continue helping with that,
> especially since that seems to be about all I can get to sometimes.

Great, I updated the PEP.

> Documentation build errors should probably be separated from leak detection
> reports.  I don't know what it would take to get them separated.

Yup, they already are AFAICT.  I will activate the 2.4 doc builds to
send failures to python-checkins unless someone has a better idea. 
These should be very rare.  The destination is controlled by
FAILURE_MAILTO in Misc/build.sh.

> The general question of where the development docs should show up remains.
[4 options sliced]

Agreed, I don't have a strong opinion either.  There should definitely
only be one place to look though.  That should make things easier. 
What do others think?

> My own inclination is that if we continue to use docs.python.org, it should
> contain only one copy of the documentation, and that should be for the most
> recent "stable" release (though perhaps an updated version of the
> documentation).

+1

n

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Adam Olsen

On 2/14/06, Guido van Rossum <[EMAIL PROTECTED]> wrote:
> On 2/14/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> > In 3.0 it changes to:
> > "It's...".encode('utf-8')
> > u"It's...".byteencode('utf-8')  # Same as above, kept for compatibility
>
> No. 3.0 won't have "backward compatibility" features. That's the whole
> point of 3.0.

Conceded.


> > I realize it would be odd for the interactive interpreter to print them
> > as a list of ints by default:
> > >>> u"It's...".byteencode('utf-8')
> > [73, 116, 39, 115, 46, 46, 46]
>
> No. This prints the repr() which should include the type. bytes([73,
> 116, 39, 115, 46, 46, 46]) is the right thing to print here.

Typo, sorry :)
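
For comparison, a short sketch of how this looks in Python 3 as eventually released (an assumption; the byteencode() spelling above was never adopted, and plain encode() is used here instead). The result is a bytes object whose repr names the type, while iteration still yields the integers.

```python
encoded = "It's...".encode("utf-8")

# The encoded result is a bytes object, not a list of ints ...
is_bytes = isinstance(encoded, bytes)

# ... but iterating over it yields the integers shown above.
as_ints = list(encoded)

# The repr identifies the type well enough to be evaluated back.
round_tripped = eval(repr(encoded))
```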


--
Adam Olsen, aka Rhamphoryncus

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Adam Olsen

On 2/14/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Raymond Hettinger wrote:
> >>- bytes("abc") == bytes(map(ord, "abc"))
> >
> >
> > At first glance, this seems obvious and necessary, so if it's somewhat
> > controversial, then I'm missing something.  What's the issue?
>
> There is an "implicit Latin-1" assumption in that code. Suppose
> you do
>
> # -*- coding: koi-8r -*-
> print bytes("Гвидо ван Россум")
>
> in Python 2.x, then this means something (*). In Python 3, it gives
> you an exception, as the ordinals of this are suddenly above 256.
>
> Or, perhaps worse, the code
>
> # -*- coding: utf-8 -*-
> print bytes("Martin v. Löwis")
>
> will work in 2.x and 3.x, but produce different numbers (**).

My assumption is these would become errors in 3.x.  bytes(str) is only
needed so you can do bytes(u"abc".encode('utf-8')) and have it work in
2.x and 3.x.

(I wonder if maybe they should be an error in 2.x as well.  Source
encoding is for unicode literals, not str literals.)
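
The "implicit Latin-1" point in the quoted message can be sketched with Python 3 strings, which are Unicode (an assumption relative to the 2.x str type discussed here). The non-ASCII characters are written as escapes so the snippet does not depend on a source encoding.

```python
# For pure ASCII, the identity from the quoted message holds.
ascii_ok = bytes(map(ord, "abc")) == "abc".encode("ascii")

# Ordinals above 255 cannot become bytes, so Cyrillic text raises.
try:
    bytes(map(ord, "\u0413\u0432\u0438\u0434\u043e"))  # "Гвидо"
    cyrillic_raised = False
except ValueError:
    cyrillic_raised = True

# For Latin-1 characters, ord() and UTF-8 give different byte values.
name = "L\u00f6wis"  # "Löwis"
via_ord = bytes(map(ord, name))
via_utf8 = name.encode("utf-8")
```

The last pair is the quieter problem: both calls succeed, but they disagree about which bytes the same text stands for.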

--
Adam Olsen, aka Rhamphoryncus

Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-14 Thread Fred L. Drake, Jr.

On Tuesday 14 February 2006 03:09, Neal Norwitz wrote:
 > While you are here, are you planning to do the doc releases for 2.5?
 > You are tentatively listed in PEP 356.  (Technically it says TBD with
 > a ? next to your name.)

Releases generally aren't a problem, since they're heavily automated and 
scheduled well in advance.  I'm glad to continue helping with that, 
especially since that seems to be about all I can get to sometimes.

 > I think this was the quick hack I did.  I hope there are many
 > concerns. :-)  For example, if the doc build fails, ...  Hmmm, this
 > probably isn't a problem.  The doc won't be updated, but will still be
 > the last good version.  So if I send mail when the doc doesn't build,
 > then it might not be so bad.  

Seems reasonable to me.

 > I still need to 
 > switch over the failure mails to go to python-checkins.  There are too 
 > many right now though.  Unless people don't mind getting several
 > messages about refleaks every day?  Anyone?

Documentation build errors should probably be separated from leak detection 
reports.  I don't know what it would take to get them separated.

 > That shouldn't be a problem.  See http://docs.python.org/dev/2.4/

Works for me!  Thanks for putting the effort into this.

The general question of where the development docs should show up remains.  
There are a number of options:

1. www.python.org/dev/doc/, where I'd put them at one point

2. www.python.org/doc/..., which is reasonable, but new

3. docs.python.org/dev/, which seems reasonable, but docs.python.org
   proponents may not like

4. www.python.org/dev/doc/ for trunk documentation, and
   docs.python.org/ and/or www.python.org/doc/current/ for maintenance updates

That last one has a certain appeal.  It would allow corrections to go online 
quicker, so people using python.org or a mirror would get updates quickly (an 
advantage of delivering docs over the net!), and I wouldn't get so many 
repeat reports of commonly-noticed typos.  The released versions would still 
be available via www.python.org/doc/x.y.z/.

My own inclination is that if we continue to use docs.python.org, it should 
contain only one copy of the documentation, and that should be for the most 
recent "stable" release (though perhaps an updated version of the 
documentation).  I'm not really on either side of the fence about whether 
docs.python.org is the "right thing" to do; the idea came out of the folks 
interested in advocacy.

  -Fred

-- 
Fred L. Drake, Jr.   

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Fred L. Drake, Jr.

On Tuesday 14 February 2006 22:34, Greg Ewing wrote:
 > Seems to me this is a case where you want to be able
 > to change encodings in the middle of reading the stream.
 > You start off reading the data as ascii, and once you've
 > figured out the encoding, you switch to that and carry
 > on reading.

Not quite.  The proper response in this case is often to re-start decoding 
with the correct encoding, since some of the data extracted so far may have 
been decoded incorrectly.  A very carefully constructed application may be 
able to go back and re-decode any data saved from the stream with the 
previous encoding, but that seems like it would be pretty fragile in 
practice.

There may be cases where switching encoding on the fly makes sense, but I'm 
not aware of any actual examples of where that approach would be required.
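
A minimal sketch of the "re-start decoding" approach, assuming the whole payload is available as bytes. The charset=... marker syntax and the decode_with_restart name are hypothetical; real protocols define their own markers.

```python
import re

# Hypothetical marker syntax; real protocols define their own.
MARKER = re.compile(rb"charset=([A-Za-z0-9_-]+)")


def decode_with_restart(data, default="ascii"):
    """Find an encoding marker in the raw bytes, then decode from the start."""
    match = MARKER.search(data)
    encoding = match.group(1).decode("ascii") if match else default
    # Decode the whole buffer again rather than switching mid-stream.
    return data.decode(encoding)


raw = "caf\u00e9 charset=utf-8".encode("utf-8")
text = decode_with_restart(raw)
```

Here the non-ASCII character appears before the marker, so a reader that started in ASCII and switched later would already have mis-decoded (or rejected) it.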

  -Fred

-- 
Fred L. Drake, Jr.   

Re: [Python-Dev] str object going in Py3K

2006-02-14 Thread Adam Olsen

On 2/14/06, Just van Rossum <[EMAIL PROTECTED]> wrote:
> +1 for two functions.
>
> My choice would be open() for binary and opentext() for text. I don't
> find that backwards at all: the text function is going to be more
> different from the current open() function than the binary function
> would be since in many ways the str type is closer to bytes than to
> unicode.
>
> Maybe it's even better to use opentext() AND openbinary(), and deprecate
> plain open(). We could even introduce them at the same time as bytes()
> (and leave the open() deprecation for 3.0).

Thus providing us with a transition period, even with warnings on use
of the old function.

I think coming up with a way to transition that doesn't silently break
code and doesn't leave us with permanent ugly names is the hardest
challenge here.

+1 on opentext(), openbinary()
-1 on silently changing open() in a way that results in breakage
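
To make the two-function idea concrete, here is a rough sketch written against Python 3's built-in open(). Neither opentext() nor openbinary() exists in the standard library; the names come from this thread and the UTF-8 default is an assumption.

```python
import os
import tempfile


def openbinary(path, mode="r"):
    """Hypothetical wrapper: always yields bytes."""
    return open(path, mode + "b")


def opentext(path, mode="r", encoding="utf-8"):
    """Hypothetical wrapper: always yields text in a stated encoding."""
    return open(path, mode, encoding=encoding)


path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with opentext(path, "w") as f:
    f.write("caf\u00e9")
with openbinary(path) as f:
    raw = f.read()
with opentext(path) as f:
    text = f.read()
```

With separate names, the caller's intent is visible at the call site, which is what makes a warning-based transition away from plain open() possible.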

--
Adam Olsen, aka Rhamphoryncus

[Python-Dev] Please comment on PEP 357 -- adding nb_index slot to PyNumberMethods

2006-02-14 Thread Travis E. Oliphant

After some revisions, PEP 357 is ready for more comments.  Please voice 
any concerns.


-Travis
PEP: 357
Title: Allowing Any Object to be Used for Slicing
Version: $Revision: 42367 $
Last Modified: $Date: 2006-02-14 18:12:07 -0700 (Tue, 14 Feb 2006) $
Author: Travis Oliphant <[EMAIL PROTECTED]>
Status: Draft
Type: Standards Track
Created: 09-Feb-2006
Python-Version: 2.5

Abstract

This PEP proposes adding an nb_index slot in PyNumberMethods and an
__index__ special method so that arbitrary objects can be used
whenever only integers are called for in Python, such as in slice
syntax (from which the slot gets its name).

Rationale

Currently integers and long integers play a special role in
slicing in that they are the only objects allowed in slice
syntax. In other words, if X is an object implementing the
sequence protocol, then X[obj1:obj2] is only valid if obj1 and
obj2 are both integers or long integers.  There is no way for obj1
and obj2 to tell Python that they could be reasonably used as
indexes into a sequence.  This is an unnecessary limitation.

In NumPy, for example, there are 8 different integer scalars
corresponding to unsigned and signed integers of 8, 16, 32, and 64
bits.  These type-objects could reasonably be used as integers in
many places where Python expects true integers but cannot inherit from 
the Python integer type because of incompatible memory layouts.  
There should be some way to be able to tell Python that an object can 
behave like an integer.

It is not possible to use the nb_int (and __int__ special method)
for this purpose because that method is used to *coerce* objects
to integers.  It would be inappropriate to allow every object that
can be coerced to an integer to be used as an integer everywhere
Python expects a true integer.  For example, if __int__ were used
to convert an object to an integer in slicing, then float objects
would be allowed in slicing and x[3.2:5.8] would not raise an error
as it should.
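
As a sketch of the behaviour this rationale asks for, the following uses the __index__ special method as it exists in current Python. Int8 is a hypothetical stand-in for a NumPy-style integer scalar, not a real NumPy type.

```python
class Int8:
    """Hypothetical integer-like scalar, standing in for a NumPy-style type."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Declares that this object really is an integer.
        return self.value


items = ["a", "b", "c", "d", "e"]
sliced = items[Int8(1):Int8(4)]

# Floats can be coerced with int() but are still refused as slice bounds.
try:
    items[1.0:4.0]
    float_allowed = True
except TypeError:
    float_allowed = False
```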

Proposal
 
Add an nb_index slot to PyNumberMethods, and a corresponding
__index__ special method.  Objects could define a function to place
in the nb_index slot that returns an appropriate C-integer (Py_ssize_t
after PEP 353).  This C-integer will be used whenever Python needs
one such as in PySequence_GetSlice, PySequence_SetSlice, and
PySequence_DelSlice.  

Specification:

1) The nb_index slot will have the signature

   Py_ssize_t index_func (PyObject *self)

2) The __index__ special method will have the signature

   def __index__(self):
       return obj
   
   Where obj must be either an int or a long or another object that has the
   __index__ special method (but not self).

3) A new C-API function PyNumber_Index will be added with signature

   Py_ssize_t PyNumber_Index (PyObject *obj)

   which will special-case integer and long integer objects but otherwise
   return obj->ob_type->tp_as_number->nb_index(obj) if it is available. 
   A -1 will be returned and an exception set on an error. 

4) A new operator.index(obj) function will be added that calls the
   equivalent of obj.__index__() and raises an error if obj does not
   implement the special method.
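
Item 4 can be sketched with operator.index() as it exists in current Python; the Offset class is hypothetical. Note that this shows the Python-level behaviour only, not the C-level nb_index slot.

```python
import operator


class Offset:
    """Hypothetical type implementing the special method."""

    def __index__(self):
        return 7


from_object = operator.index(Offset())
from_int = operator.index(3)

# Objects without __index__ are rejected, including floats.
try:
    operator.index(3.0)
    rejected = False
except TypeError:
    rejected = True
```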
   
Implementation Plan

1) Add the nb_index slot in object.h and modify typeobject.c to 
   create the __index__ method

2) Change the ISINT macro in ceval.c to ISINDEX and alter it to 
   accommodate objects with the index slot defined.

3) Change the _PyEval_SliceIndex function to accommodate objects
   with the index slot defined.

4) Change all builtin objects (e.g. lists) that use the as_mapping 
   slots for subscript access and use a special-check for integers to 
   check for the slot as well.

5) Add PyNumber_Index C-API to return an integer from any 
   Python Object that has the nb_index slot.  

6) Add the operator.index(x) function.


Possible Concerns

Speed: 

Implementation should not slow down Python because integers and long
integers used as indexes will complete in the same number of
instructions.  The only change will be that what used to generate
an error will now be acceptable.

Why not use nb_int which is already there?:

The nb_int method is used for coercion and so means something
fundamentally different than what is requested here.  This PEP
proposes a way for something that *can* already be thought of as
an integer to communicate that information to Python when it needs
an integer.  The biggest example of why using nb_int would be a bad
thing is that float objects already define the nb_int method, but
float objects *should not* be used as indexes in a sequence.

Why the name __index__?:

Some questions were raised regarding the name __index__ when other
interpretations of the slo

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Ron Adam

Greg Ewing wrote:
> Guido van Rossum wrote:
>> On 2/13/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
>>
>>> At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>>>
>>>> On 2/13/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
>>>>
>>>> What would bytes("abc\xf0", "latin-1") *mean*? 
>>> I'm saying that XXX would be the same encoding as you specified.  i.e.,
>>> including an encoding means you are encoding the *meaning* of the string.
> 
> No, this is wrong. As I understand it, the encoding
> argument to bytes() is meant to specify how to *encode*
> characters into the bytes object. If you want to be able
> to specify how to *decode* a str argument as well, you'd
> need a third argument.

I'm not sure I understand why this would be needed?  But maybe it's 
still too early to pin anything down.

My first impression and thoughts were:  (and seems incorrect now)

 bytes(object) ->  byte sequence of object's value

Basically a "memory dump" of object's value.  And so...

 object(bytes) ->  copy of original object

This would reproduce a copy of the original object as long as the from 
and to object are the same type with no encoding needed.  If they are 
different then you would get garbage, or an error. But that would be a 
programming error and not a language issue. It would be up to the 
programmer to not do that.

Of course this is one of those easier to say than do concepts I'm sure.

And I was thinking a bytes argument of more than one item would indicate 
a byte sequence.

 bytes(1,2,3)  ->  bytes([1,2,3])

Where any values above 255 would give an error,  but it seems an 
explicit list is preferred.  And that's fine because it creates a way 
for bytes to know how to handle everything else. (I think)

bytes([1,2,3])  -> bytes([1,2,3])

Which is fine... so ???

b = bytes(0L) ->  bytes([0,0,0,0])

long(b) ->  0L    convert it back to 0L

And ...

b = bytes([0L])  ->  bytes([0])  # a single byte

int(b) ->  0    convert it back to 0
long(b) ->  0L

It's up to the programmer to know if it's safe. Working with raw data is 
always a "programmer needs to be aware of what's going on" thing.

But would it be any different with strings?  You wouldn't ever want to 
encode one type's bytes into a different type directly. It would be 
better to just encode it back to the original type, then use *its* 
encoding method to change it.

so...

   b = bytes(s)  ->  bytes( raw sequence of bytes )

Whether or not you get a single byte per char or multiple bytes per 
character would depend on the string's encoding.

   s = str(bytes, encoding)  ->  original string

You need to specify it here, because there is more than one string 
encoding. To avoid encodings entirely we would need a type for each 
encoding. (which isn't really avoiding anything) And it's the "raw data 
so programmer needs to be aware" situation again. Don't decode to 
something other than what it is.
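
A short sketch of that round trip, assuming Python 3, where str(bytes, encoding) decodes and str.encode() goes the other way. It also shows what "decoding to something other than what it is" produces.

```python
s = "na\u00efve"  # "naïve"

# The number of bytes per character depends on the encoding chosen.
as_latin1 = s.encode("latin-1")
as_utf8 = s.encode("utf-8")

# Decoding with the matching encoding restores the original string.
restored = str(as_utf8, "utf-8")

# Decoding with a different encoding yields garbage, not an error.
garbled = str(as_utf8, "latin-1")
```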

If someone needs automatic encoding/decoding, then they probably should 
write a class to do what they want.  Something roughly like...

   class bytekeeper(object):
       b = None
       t = None
       e = None
       def __init__(self, obj, enc='bytes'):   # or whatever encoding
           self.e = enc
           self.t = type(obj)
           self.b = bytes(obj)
       def decode(self):
           ...

Would we be able to subclass bytes?

 class bytekeeper(bytes):   ?
...

Ok.. enough rambling... I wonder how much of this is way out in left 
field.  ;)

cheers,
  Ronald Adam

And as fa

In this case the encoding argument would only be needed not to


Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Travis E. Oliphant

Guido van Rossum wrote:
> I'm about to send 6 or 8 replies to various salient messages in the
> PEP 332 revival thread. That's probably a sign that there's still a
> lot to be sorted out. In the mean time, to save you reading through
> all those responses, here's a summary of where I believe I stand.
> Let's continue the discussion in this new thread unless there are
> specific hairs to be split in the other thread that aren't addressed
> below or by later posts.


I hope bytes objects will be pickle-able?  If so, and they support the 
buffer protocol, then many NumPy users will be very happy.

-Travis


Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Greg Ewing

Thomas Wouters wrote:
> 
> The encoding of network streams or files may be
> entirely unknown beforehand, and depend on the content: a content-encoding,
> a <meta> HTML tag. Will bytes-strings get string methods for easy
> searching of content descriptors?

Seems to me this is a case where you want to be able
to change encodings in the middle of reading the stream.
You start off reading the data as ascii, and once you've
figured out the encoding, you switch to that and carry
on reading.
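
A rough sketch of that "start as ascii, then switch" approach, done by hand on a byte stream. The marker format (a `coding:` cookie on the first line) and the helper name read_text are assumptions made for the example, not part of any protocol discussed here.

```python
import io
import re

# Hypothetical marker format, loosely modelled on Python's own coding cookie.
CODING_RE = re.compile(rb"coding[:=]\s*([-\w.]+)")


def read_text(stream, initial="ascii"):
    """Read the first line in the initial coding, the rest as declared."""
    header = stream.readline()
    match = CODING_RE.search(header)
    encoding = match.group(1).decode("ascii") if match else initial
    body = stream.read()
    return encoding, header.decode(initial), body.decode(encoding)


raw = io.BytesIO(b"# coding: latin-1\ncaf\xe9\n")
encoding, header, body = read_text(raw)
print(encoding)
```

This only works because everything up to and including the marker is ascii; if the marker could appear after non-ascii data, the earlier data would have to be decoded again.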

Are there any plans to make it possible to change the
encoding of a text file object on the fly like this?

If that would be awkward, maybe file objects themselves
shouldn't be where the decoding occurs, but decoders
should be separate objects that wrap byte streams.
Under that model,

   opentext(filename, encoding)

would be a factory function that did something like

   codecs.streamdecoder(encoding, openbinary(filename))

Having codecs be stream filters might be a good idea
anyway, since then you could use them to wrap anything
that can be treated as a stream of bytes (sockets,
some custom object in your program, etc.), you
could create pipelines of encoders and decoders, etc.
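
The existing codecs module already allows something close to this: codecs.getreader() returns a stream reader class that wraps any object with a read() method. A minimal sketch, where opentext is only the name borrowed from above and, unlike the version above, takes an already opened byte stream rather than a filename:

```python
import codecs
import io


def opentext(binary_stream, encoding):
    """Wrap any byte stream in a decoding reader."""
    return codecs.getreader(encoding)(binary_stream)


# An in-memory stream stands in for a file or socket here.
raw = io.BytesIO("na\u00efve caf\u00e9\n".encode("utf-8"))
reader = opentext(raw, "utf-8")
line = reader.readline()
print(line == "na\u00efve caf\u00e9\n")
```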

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiam! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Greg Ewing

Thomas Wouters wrote:

> Well, as an end user, I honestly don't care.
> As a programmer, I also don't care.

Perhaps I've been burned once too often by someone's
oh-so-clever installer script screwing up and leaving
me to wade through an impenetrable pile of makefiles,
shell scripts and m4 macros trying to figure out what
went wrong and what I can possibly do to fix it, but
I've become a deep believer in keeping things simple.

Common sense suggests that a system which keeps
everything related to a package, and only to that
package, in one directory, has got to be more robust
than one which scatters files far and wide and then
relies on some elaborate bookkeeping system to try
to make sure things don't step on each other's toes.

When everything goes right, I don't care either. But
things go wrong often enough to make me care about
unnecessary complexity in the tools I use.

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiam! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Greg Ewing

Trent Mick wrote:

> ActivePython and MacPython have to install stuff to:
> 
> /usr/local/bin/...
> /Library/Frameworks/Python.framework/...
> /Applications/MacPython-2.4/...  # just MacPython does this

It's not perfect, but it's still a lot better than the
situation on any other unix I've seen so far. It's a
bit more complicated with something like Python, which
is really several things - a library, an application,
and some unix programs (the latter of which don't really
fit into the MacOSX structure).

At least all of the myriad library and header files go
together under a single easily-identified directory, if
you know where to look for it.

 > /Library/Documentation/Help/...
 > # Symlink needed here to have a hope of registration with
 > # Apple's (crappy) help viewer system to work.

I didn't know about that one. It never even occurred to me
that Python might *have* Apple Help Viewer files. I use
Firefox to view all my Python documentation. :-)

> Also, a receipt of the installation ends up here:
> 
> /Library/Receipts/$package_name/...
> 
> though Apple does not provide tools for uninstallation using those
> receipts.

And I hope they don't! I'd rather see progress towards
a system where you don't *need* a special tool to uninstall
something. It should be as simple and obvious as dragging
a file or folder to the trash.

> open DMG, don't run the app from here, drag it to your
> Applications folder, then eject this window/disk, then run it from
> /Applications,

A decently-designed application should be runnable from
anywhere, including a dmg, if the user wants to do that.
If an app refuses to run from a dmg, I consider that a
bug in the application.

Likewise, the user should be able to put it anywhere on
the HD, not just the Applications folder.

Also I consider the need for a dmg in the first place
to be a bug in the Web. :-) (You should be able to just
directly download the .app file.)

This sort of thing is still not quite as smooth as it
was under Classic MacOS, but I'm hopeful of improvement.

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiam! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Bob Ippolito

On Feb 14, 2006, at 5:22 PM, Trent Mick wrote:

> [Greg Ewing wrote]
>> MacOSX seems to be the only system so far that has got
>> this right -- organising the system so that everything
>> related to a given application or library can be kept
>> under a single directory, clearly labelled with a
>> version number.
>
> ActivePython and MacPython have to install stuff to:
>
> /usr/local/bin/...

The /usr/local/bin links are superfluous.. people should really be  
putting sys.prefix/bin on their path, cause that's where distutils  
scripts get installed to.
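
In later Python versions the sysconfig module can report that script directory directly. A small sketch for checking whether it is already on the path; the variable names are just for illustration:

```python
import os
import sysconfig

# Directory where installed console scripts are placed for this interpreter.
scripts_dir = sysconfig.get_path("scripts")

# Compare against the entries of the PATH environment variable.
on_path = scripts_dir in os.environ.get("PATH", "").split(os.pathsep)
print(scripts_dir, "(on PATH)" if on_path else "(not on PATH)")
```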

> /Library/Frameworks/Python.framework/...
> /Applications/MacPython-2.4/...  # just MacPython does this

ActivePython doesn't install app bundles for IDLE or anything?

> /Library/Documentation/Help/...
> # Symlink needed here to have a hope of registration with
> # Apple's (crappy) help viewer system to work.

It is pretty bad.. probably even worth punting on this step.

>
> Also, a receipt of the installation ends up here:
>
> /Library/Receipts/$package_name/...
>
> though Apple does not provide tools for uninstallation using those
> receipts.

That stuff is really behind the scenes stuff that's wholly managed by  
Installer.app and is pretty much irrelevant.

> Mac OS X's installation tech ain't no panacea. If one is just
> distributing a single .app, then it is okay. If one is just  
> distributing
> a library with no UI (graphical or otherwise) for the user, then it is
> okay. And "okay" here still means a pretty poor installation  
> experience
> for the user: open DMG, don't run the app from here, drag it to your
> Applications folder, then eject this window/disk, then run it from
> /Applications, etc.

Single apps are better than OK.  Download them by whatever means you  
want, put them wherever you want, and run them.  You can run any well- 
behaved application from a DMG (or a CD, or a USB key, or any other  
readable media).

Libraries are not so great, as you've said.  However, only developers  
should have to install libraries.  Good applications are shipped with  
all of the libraries they need embedded in the application bundle.   
Dynamic linkage should only really happen internally, and to vendor  
supplied libraries.

-bob


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Greg Ewing

Guido van Rossum wrote:
> On 2/13/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> 
>>At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
>>
>>>On 2/13/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
>>>
>>>What would bytes("abc\xf0", "latin-1") *mean*? 
>>
>>I'm saying that XXX would be the same encoding as you specified.  i.e.,
>>including an encoding means you are encoding the *meaning* of the string.

No, this is wrong. As I understand it, the encoding
argument to bytes() is meant to specify how to *encode*
characters into the bytes object. If you want to be able
to specify how to *decode* a str argument as well, you'd
need a third argument.
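
For reference, this is how the constructor behaves in Python 3 as eventually released, which may differ from what was being proposed at this point in the thread: the encoding argument says how to encode the characters, and it is required when the first argument is a string.

```python
text = "abc\xf0"

# The encoding tells bytes() how to turn characters into byte values.
encoded = bytes(text, "latin-1")
print(encoded == text.encode("latin-1"))

# Without an encoding, a string argument is rejected.
no_encoding_error = None
try:
    bytes(text)
except TypeError as exc:
    no_encoding_error = exc

# Going back to text is a separate, explicit decode step.
round_trip = encoded.decode("latin-1")
```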

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiam! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Thomas Wouters

On Wed, Feb 15, 2006 at 02:00:21PM +1300, Greg Ewing wrote:
> Joe Smith wrote:
> 
> > Windows and RPM are known for major dependency problems, letting packages 
> > damage each other, having packages that do not uninstall cleanly (i.e. 
> > packages that leave junk all over the place) and generally messing the 
> > system 
> > up quite badly over time, so that the OS is usually removed and 
> > re-installed periodically.)
> 
> I'm disappointed that the various Linux distributions
> still don't seem to have caught onto the very simple
> idea of *not* scattering files all over the place when
> installing something.

Well, as an end user, I honestly don't care. I install stuff through apt, it
installs the dependencies for me, does basic configuration where applicable
(often asking for user-input once, then remembering the settings) and allows
me to deinstall when I'm tired of a package. As long as apt handles it, I
couldn't care less whether it's installed in separate directories, large
bzip2 archives with suitable playmates from mixed ethnicity to improve
social contact, or spread out across every 17th byte of a logical volume.

As a programmer, I also don't care. I tell distutils which modules/packages,
data files and scripts to install, and it does the rest. And that's why I
like my Python packages to become .deb's through bdist_deb :)

You-think-too-much'ly y'rs,
-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Bob Ippolito

On Feb 14, 2006, at 5:00 PM, Greg Ewing wrote:

> Joe Smith wrote:
>
>> Windows and RPM are known for major dependency problems, letting  
>> packages
>> damage each other, having packages that do not uninstall cleanly  
>> (i.e.
>> packages that leave junk all over the place) and generally messing  
>> the system
>> up quite badly over time, so that the OS is usually removed and
>> re-installed periodically.)
>
> I'm disappointed that the various Linux distributions
> still don't seem to have caught onto the very simple
> idea of *not* scattering files all over the place when
> installing something.
>
> MacOSX seems to be the only system so far that has got
> this right -- organising the system so that everything
> related to a given application or library can be kept
> under a single directory, clearly labelled with a
> version number.
>
> I haven't looked closely into eggs yet, but if they allow
> Python packages to be managed this way, and do it cross-
> platform, that's a very good reason to prefer using eggs
> over a platform-specific package format.

It should also be mentioned that eggs and platform-specific package  
formats are absolutely not mutually exclusive.  You could use apt/rpm/ 
ports/etc. to fetch/build/install eggs too.  There are very few  
reasons not to use eggs -- in theory anyway, the implementation isn't  
finished yet.

The only things that really need to change are the packages like  
Twisted, numpy, or SciPy that don't have a distutils-based main  
setup.py... Technically, since egg is just a specification, they  
could even implement it themselves without the help of setuptools  
(though that seems like a bad approach).

-bob


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Greg Ewing

Guido van Rossum wrote:

> The only remaining question is what if anything to do with an
> encoding argument when the first argument is of type str...)

 From what you said earlier about str in 2.x being
interpretable as a unicode string which contains
only ascii, it seems to me that if you say

   bytes(s, encoding)

where s is a str, then by the presence of the encoding
argument you're saying that you want s to be treated as
unicode and encoded using the specified encoding.
So the result should be the same as

   bytes(u, encoding)

where u is a unicode string containing the same code
points as s. This implies that it should be an error
if s contains non-ascii characters.

This interpretation would satisfy the requirement for
a single call signature covering both unicode and
str-used-as-ascii-characters, while providing a
different call signature (without encoding) for
str-used-as-bytes.

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiam! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Trent Mick

[Greg Ewing wrote]
> MacOSX seems to be the only system so far that has got
> this right -- organising the system so that everything
> related to a given application or library can be kept
> under a single directory, clearly labelled with a
> version number.

ActivePython and MacPython have to install stuff to:

/usr/local/bin/...
/Library/Frameworks/Python.framework/...
/Applications/MacPython-2.4/...  # just MacPython does this
/Library/Documentation/Help/...
# Symlink needed here to have a hope of registration with
# Apple's (crappy) help viewer system to work.

Also, a receipt of the installation ends up here:

/Library/Receipts/$package_name/...

though Apple does not provide tools for uninstallation using those
receipts.

Mac OS X's installation tech ain't no panacea. If one is just
distributing a single .app, then it is okay. If one is just distributing
a library with no UI (graphical or otherwise) for the user, then it is
okay. And "okay" here still means a pretty poor installation experience
for the user: open DMG, don't run the app from here, drag it to your
Applications folder, then eject this window/disk, then run it from
/Applications, etc.

Trent

-- 
Trent Mick
[EMAIL PROTECTED]

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/14/06, Jim Jewett <[EMAIL PROTECTED]> wrote:
> On 2/14/06, Neil Schemenauer  wrote:
> > People could spell it bytes(s.encode('latin-1')) in order to make it
> > work in 2.X.
>
> Guido wrote:
> > At the cost of an extra copying step.
>
> That sounds like an implementation issue.  If it is important
> enough to matter, then why not just add some smarts to the
> bytes constructor?

Short answer: you can't.

> If the argument is a str, and the constructor owns the only
> reference, then go ahead and use the argument's own
> underlying array; the string itself will be deallocated when
> (or before) the constructor returns, so no one else can use
> it expecting an immutable.

Hard to explain, but the VM usually keeps an extra reference on the
stack so the refcount is never 1. But you can't rely on that, so you
can't assume that it's safe to reuse the storage if it's >1 either. Also,
since the str's underlying array is allocated inline with the str header,
this requires str and bytes to have the same object layout. But since
bytes are mutable, they can't.

Summary: you don't understand the implementation well enough to
suggest these kinds of things.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Martin v. Löwis

Raymond Hettinger wrote:
>>- bytes("abc") == bytes(map(ord, "abc"))
> 
> 
> At first glance, this seems obvious and necessary, so if it's somewhat 
> controversial, then I'm missing something.  What's the issue?

There is an "implicit Latin-1" assumption in that code. Suppose
you do

# -*- coding: koi-8r -*-
print bytes("Гвидо ван Россум")

in Python 2.x, then this means something (*). In Python 3, it gives
you an exception, as the ordinals of this are suddenly above 256.

Or, perhaps worse, the code

# -*- coding: utf-8 -*-
print bytes("Martin v. Löwis")

will work in 2.x and 3.x, but produce different numbers (**).

Regards,
Martin

(*) [231, 215, 201, 196, 207, 32, 215, 193, 206, 32, 242, 207, 211, 211,
213, 205]

(**) In 2.x, this will give
[77, 97, 114, 116, 105, 110, 32, 118, 46, 32, 76, 195, 182, 119, 105, 115]
whereas in 3.x, it will give
[77, 97, 114, 116, 105, 110, 32, 118, 46, 32, 76, 246, 119, 105, 115]
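
The two sets of numbers can be reproduced in a current Python 3 by making the encoding step explicit. This is a sketch of the difference rather than of the proposed bytes("...") call itself; the Cyrillic string is shortened to the first word and written with escapes so the snippet does not depend on the source encoding.

```python
name = "Martin v. L\u00f6wis"

# What 2.x saw: the UTF-8 bytes of the source file.
as_utf8 = list(name.encode("utf-8"))
# What map(ord, ...) sees on a unicode string: code points.
as_ordinals = [ord(ch) for ch in name]

cyrillic = "\u0413\u0432\u0438\u0434\u043e"   # the first word of the KOI8-R example
as_koi8 = list(cyrillic.encode("koi8-r"))

# Cyrillic code points are far above 255, so they cannot become bytes directly.
try:
    bytes(map(ord, cyrillic))
    ordinals_fit = True
except ValueError:
    ordinals_fit = False

print(as_utf8)
print(as_ordinals)
```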

Re: [Python-Dev] str object going in Py3K

2006-02-14 Thread Greg Ewing

Alex Martelli wrote:

> What about shorter names, such as 'text' instead of 'opentext' and
> 'data' instead of 'openbinary'?

Because those words are just names for pieces of data,
with nothing to connect them with files or the act of
opening a file.

I think the association of "open" with "file" is
established strongly enough in programmers' brains that
dropping it now would just lead to unnecessary confusion.

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiam! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Greg Ewing

Joe Smith wrote:

> Windows and RPM are known for major dependency problems, letting packages 
> damage each other, having packages that do not uninstall cleanly (i.e. 
> packages that leave junk all over the place) and generally messing the system 
> up quite badly over time, so that the OS is usually removed and 
> re-installed periodically.)

I'm disappointed that the various Linux distributions
still don't seem to have caught onto the very simple
idea of *not* scattering files all over the place when
installing something.

MacOSX seems to be the only system so far that has got
this right -- organising the system so that everything
related to a given application or library can be kept
under a single directory, clearly labelled with a
version number.

I haven't looked closely into eggs yet, but if they allow
Python packages to be managed this way, and do it cross-
platform, that's a very good reason to prefer using eggs
over a platform-specific package format.

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiam! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Thomas Wouters

On Wed, Feb 15, 2006 at 01:51:03PM +1300, Greg Ewing wrote:
> Thomas Wouters wrote:

> > Actually, that's where distutils and bdist_* comes in. Mr. Random Developer
> > writes a regular distutils setup.py, and I can install the latest,
> > not-quite-in-apt version by doing 'setup.py bdist_deb' and installing the
> > resulting .deb.

> Why not just do 'setup.py install' directly?

Because that *does* overwrite files the package system might not want
overwritten, and the resulting install is not listed in the packaging
system, not taken into account on upgrades, etc. I don't want to keep track
of a separate list of distutils-installed packages; that's what I use APT
for. If I wanted to keep manually massaging my system after each install or
upgrade, I'd be using Gentoo or FreeBSD ;)

(I should point out that CPAN and CPANPLUS on FreeBSD do this slightly
better; they register packages installed through CPAN (or actually the
build/install part of it, MakefileMaker or whatever it's called) with the
FreeBSD packaging database. I don't know what distutils does on FreeBSD, but
that packaging database is just a bunch of files in appropriately named
directories in /var/db/pkg...)

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Bob Ippolito

On Feb 14, 2006, at 4:17 PM, Guido van Rossum wrote:

> On 2/14/06, Bob Ippolito <[EMAIL PROTECTED]> wrote:
>> On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
>>> - we need a new PEP; PEP 332 won't cut it
>>>
>>> - no b"..." literal
>>>
>>> - bytes objects are mutable
>>>
>>> - bytes objects are composed of ints in range(256)
>>>
>>> - you can pass any iterable of ints to the bytes constructor, as  
>>> long
>>> as they are in range(256)
>>
>> Sounds like array.array('B').
>
> Sure.
>
>> Will the bytes object support the buffer interface?
>
> Do you want them to?
>
> I suppose they should *not* support the *text* part of that API.

I would imagine that it'd be convenient for integrating with existing  
extensions... e.g. initializing an array or Numeric array with one.

>> Will it accept
>> objects supporting the buffer interface in the constructor (or a
>> class method)?  If so, will it be a copy or a view?  Current
>> array.array behavior says copy.
>
> bytes() should always copy -- thanks for asking.

I only really ask because it's worth fully specifying these things.   
Copy seems a lot more sensible given the rest of the interpreter and  
stdlib (e.g. buffer(x) seems to always return a read-only buffer).

>>> - longs or anything with an __index__ method should do, too
>>>
>>> - when you index a bytes object, you get a plain int
>>
>> When slicing a bytes object, do you get another bytes object or a
>> list? If it's a bytes object, is it a copy or a view?  Current
>> array.array behavior says copy.
>
> Another bytes object which is a copy.
>
> (Why would you even think about views here? They are evil.)

I mention views because that's what numpy/Numeric/numarray/etc.  
do...  It's certainly convenient at times to have that functionality,  
for example, to work with only the alpha channel in an RGBA image.   
Probably too magical for the bytes type.

 >>> import numpy
 >>> image = numpy.array(list('RGBARGBARGBA'))
 >>> alpha = image[3::4]
 >>> alpha
array([A, A, A], dtype=(string,1))
 >>> alpha[:] = 'X'
 >>> image
array([R, G, B, X, R, G, B, X, R, G, B, X], dtype=(string,1))
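
The same copy-versus-view distinction can be shown with the standard library alone in Python 3, where slicing a bytearray copies and a memoryview gives the view. This is a sketch using types that did not exist in this form when the thread was written:

```python
image = bytearray(b"RGBARGBARGBA")

# Slicing a bytearray returns a new, independent bytearray.
alpha_copy = image[3::4]
alpha_copy[:] = b"XXX"
after_copy = bytes(image)          # the original is unchanged

# A memoryview slice is a strided view onto the same storage.
alpha_view = memoryview(image)[3::4]
for i in range(len(alpha_view)):
    alpha_view[i] = ord("X")       # writes through to image
after_view = bytes(image)

print(after_copy)
print(after_view)
```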

>>> Very controversial:
>>>
>>> - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding"
>>> argument
>>>
>>> - bytes(u"abc") == bytes("abc") # for ASCII at least
>>>
>>> - bytes(u"\x80\xff") raises UnicodeError
>>>
>>> - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
>>>
>>> Martin von Loewis's alternative for the "very controversial" set  
>>> is to
>>> disallow an encoding argument and (I believe) also to disallow  
>>> Unicode
>>> arguments. In 3.0 this would leave us with s.encode()  
>>> as the
>>> only way to convert a string (which is always unicode) to bytes. The
>>> problem with this is that there's no code that works in both 2.x and
>>> 3.0.
>>
>> Given a base64 or hex string, how do you get a bytes object out of
>> it?  Currently str.decode('base64') and str.decode('hex') are good
>> solutions to this... but you get a str object back.
>
> I don't know -- you can propose an API you like here. base64 is as
> likely to encode text as binary data, so I don't think it's wrong for
> those things to return strings.

That's kinda true I guess -- but you'd still need an encoding in py3k  
to turn base64 -> text.  A lot of the current codecs infrastructure  
doesn't make sense in py3k -- for example, the 'zlib' encoding, which  
is really a bytes transform, or 'unicode_escape' which is a text  
transform.

I suppose there aren't too many different ways you'd want to encode  
or decode data to binary (beyond the text codecs), they should  
probably just live in a module -- something like the binascii we have  
now.  I do find the codecs infrastructure to be convenient at times  
(maybe too convenient), but since you're not interested in adding  
functions to existing types then a module seems like the best approach.
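
As a sketch of what the module-function style looks like, here are the existing binascii and zlib functions as they behave in Python 3, where the results are bytes objects. The sample values are made up for the example.

```python
import binascii
import zlib

# Text-to-binary conversions as plain functions rather than str methods.
from_hex = binascii.unhexlify("deadbeef")
from_b64 = binascii.a2b_base64("3q2+7w==")

# zlib is a bytes-to-bytes transform, so no text type is involved at all.
packed = zlib.compress(b"spam" * 100)
unpacked = zlib.decompress(packed)

print(from_hex == from_b64)
```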

-bob


Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Greg Ewing

Thomas Wouters wrote:

> Actually, that's where distutils and bdist_* comes in. Mr. Random Developer
> writes a regular distutils setup.py, and I can install the latest,
> not-quite-in-apt version by doing 'setup.py bdist_deb' and installing the
> resulting .deb.

Why not just do 'setup.py install' directly?

-- 
Greg Ewing, Computer Science Dept, +--+
University of Canterbury,  | Carpe post meridiam! |
Christchurch, New Zealand  | (I'm not a morning person.)  |
[EMAIL PROTECTED]  +--+

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Martin v. Löwis

Bob Ippolito wrote:
>>Martin von Loewis's alternative for the "very controversial" set is to
>>disallow an encoding argument and (I believe) also to disallow Unicode
>>arguments. In 3.0 this would leave us with s.encode() as the
>>only way to convert a string (which is always unicode) to bytes. The
>>problem with this is that there's no code that works in both 2.x and
>>3.0.
> 
> 
> Given a base64 or hex string, how do you get a bytes object out of  
> it?  Currently str.decode('base64') and str.decode('hex') are good  
> solutions to this... but you get a str object back.

If s is a base64 string,

bytes(s.decode("base64"))

should work. In 2.x, it returns a str, which is then copied into
bytes; in 3.x, .decode("base64") returns a byte string already (*),
for which an extra copy is made.

I would prefer to see base64.decodestring to return bytes,
though - perhaps even in 2.x already.
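
For comparison, in Python 3 as eventually released the base64 module functions do return bytes, in both directions. A short sketch with a made-up sample string:

```python
import base64

s = "aGVsbG8gd29ybGQ="            # base64 text, e.g. taken from a mail body

data = base64.b64decode(s)        # accepts ASCII text, returns bytes
back = base64.b64encode(data)     # takes bytes, also returns bytes

print(data)
print(back)
```

Turning the encoded form back into a string therefore needs an explicit back.decode("ascii").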

Regards,
Martin

(*) Interestingly enough, the "base64" encoding will work reversed
in terms of types, compared to all other encodings. Where .encode
returns bytes normally, it will return a string for base64, and
vice versa (assuming the bytes type has .decode/.encode methods).

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Raymond Hettinger

[Guido van Rossum]
> Somewhat controversial:
>
> - bytes("abc") == bytes(map(ord, "abc"))

At first glance, this seems obvious and necessary, so if it's somewhat 
controversial, then I'm missing something.  What's the issue?


Raymond 

[Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Jim Jewett

On 2/14/06, Neil Schemenauer  wrote:
> People could spell it bytes(s.encode('latin-1')) in order to make it
> work in 2.X.

Guido wrote:
> At the cost of an extra copying step.

That sounds like an implementation issue.  If it is important
enough to matter, then why not just add some smarts to the
bytes constructor?

If the argument is a str, and the constructor owns the only
reference, then go ahead and use the argument's own
underlying array; the string itself will be deallocated when
(or before) the constructor returns, so no one else can use
it expecting an immutable.

-jJ

Re: [Python-Dev] nice()

2006-02-14 Thread Terry Reedy


"Greg Ewing" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> I don't think you're doing anyone any favours by trying to protect
> them from having to know about these things, because they *need* to
> know about them if they're not to write algorithms that seem to
> work fine on tests but mysteriously start producing garbage when
> run on real data,

I agree.  Here was my 'kick-in-the-butt' lesson (from 20+ years ago):  the 
'simplified for computation' formula for standard deviation, found in too 
many statistics books without a warning as to its danger, and specialized 
for three data points, is sqrt( ((a*a+b*b+c*c)-(a+b+c)**2/3.0) /2.0). 
After 1000s of ok calculations, the data were something like a,b,c = 
10005,10006,10007.  The correct answer is 1.0 but with numbers rounded to 7 
digits, the computed answer is sqrt(-.5) == CRASH.  I was aware that 
subtraction lost precision but not how rounding could make a theoretically 
guaranteed non-negative difference negative.
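
The effect can be reproduced with the decimal module set to 7 significant digits. That is an assumption standing in for the original 7-digit arithmetic, so the exact wrong value may differ from the one seen back then, but the sign flip is the same. The function names are made up for the example.

```python
from decimal import Decimal, localcontext
import math


def naive_variance(a, b, c):
    # "Simplified for computation" form: sum of squares minus (sum squared)/n.
    s = a + b + c
    return ((a * a + b * b + c * c) - s * s / 3) / 2


def two_pass_variance(a, b, c):
    # Subtract the mean first, so the squared terms stay small.
    mean = (a + b + c) / 3
    return ((a - mean) ** 2 + (b - mean) ** 2 + (c - mean) ** 2) / 2


data = (10005, 10006, 10007)

with localcontext() as ctx:
    ctx.prec = 7  # every intermediate result is rounded to 7 digits
    digits7 = [Decimal(x) for x in data]
    naive_7 = naive_variance(*digits7)        # comes out negative
    two_pass_7 = two_pass_variance(*digits7)

# With C doubles the same naive formula still has enough digits here.
naive_double = math.sqrt(naive_variance(*data))

print(naive_7, two_pass_7, naive_double)
```

The two-pass form costs a second walk over the data but avoids subtracting two large, nearly equal numbers.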

Of course, Python floats being C doubles makes such glitches much rarer. 
Not exposing C floats is a major newbie (and journeyman) protection 
feature.

Terry Jan Reedy




Re: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification

2006-02-14 Thread Martin v. Löwis

Jeremy Hylton wrote:
>>Perhaps there is some value in finding functions which ought to expect
>>const char*. For that, occasional checks should be sufficient; I cannot
>>see a point in having code permanently pass with that option. In
>>particular not if you are interfacing with C libraries.
> 
> 
> I don't understand what you mean:  I'm not sure what you mean by
> "occasional checks" or "permanently pass".  The compiler flags are
> always the same.

I'm objecting to the "this warning should never occur" rule. If the
warning is turned on in a regular build, then clearly it is desirable
to make it go away in all cases, and add work-arounds to make it
go away if necessary.

This is bad, because it means you add work-arounds to code where
really no work-around is necessary (e.g. because it is *known* that
some function won't modify the storage behind a char*, even though
it doesn't take a const char*). So it is appropriate that the
warning generates many false positives. Therefore, it should be
a manual interaction to turn this warning on, inspect all the
messages, and fix those that need correction, then turn the warning
off again.

Regards,
Martin

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Thomas Wouters

On Tue, Feb 14, 2006 at 03:13:25PM -0800, Guido van Rossum wrote:

> Martin von Loewis's alternative for the "very controversial" set is to
> disallow an encoding argument and (I believe) also to disallow Unicode
> arguments. In 3.0 this would leave us with s.encode() as the
> only way to convert a string (which is always unicode) to bytes. The
> problem with this is that there's no code that works in both 2.x and
> 3.0.

Unless you only ever create (byte)strings by doing s.encode(), and only send
them to code that is either byte/string-agnostic or -aware. Oh, and don't
use indexing, only slicing (length-1 if you have to.) I guess it depends on
how much code will accept a bytes-string where currently a string is the norm
(and a unicode object is default-encoded.)

I'm still worried that all this is quite a big leap. Very few people
understand the intricacies of unicode encodings. (Almost everyone
understands unicode, except they don't know it yet; it's the encodings that
are the problem.) By forcing everything to be unicode without a uniform
encoding-detection scheme, we're forcing every programmer who opens a file
or reads from the network to think about encodings. This will be a pretty
big step for newbie programmers.

And it's not just that. The encoding of network streams or files may be
entirely unknown beforehand, and depend on the content: a content-encoding,
a  HTML tag. Will bytes-strings get string methods for easy
searching of content descriptors? Will the 're' module accept bytes-strings?
What would the literals you want to search for, look like? Do I really do
'if bytes("Content-Type:") in data:' and such? Should data perhaps get read
using the opentext() equivalent of 'decode('ascii', 'replace')' and then
parsed the 'normal' way? What about data gotten from an extension? And
never mind what the 'right way' for that is; what will *programmers* do? The
'right way' often escapes them.
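
A rough sketch of the "search the raw bytes for a content descriptor, then decode" approach asked about above. It assumes a present-day Python 3 interpreter, where bytes objects do have search methods and the re module does accept bytes patterns; the sample message and the ASCII fallback are made up for illustration:

```python
import re

# Hypothetical network payload: ASCII headers, then a body in the declared charset.
raw = b"Content-Type: text/plain; charset=iso-8859-1\r\n\r\n\xa1Python!"

# Split headers from body while everything is still bytes.
head, _, body = raw.partition(b"\r\n\r\n")

# Search the undecoded header block with a bytes pattern.
match = re.search(rb"charset=([\w-]+)", head)
charset = match.group(1).decode("ascii") if match else "ascii"

# Only now is the body turned into text, with an explicit encoding.
text = body.decode(charset)
print(charset, text)
```

This only works cleanly because the part before the marker is plain ASCII; a marker buried inside the body would still force a restart of decoding.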

It may well be that I'm thinking too conservatively, too stuck in the old
ways, but I think we're being too hasty in dismissing the ol' string. Don't
get me wrong, I really like the idea of as much of Python doing unicode as
possible, and the idea of a mutable bytes type sounds good to me too. I just
don't like the wide gap between the troublesome-to-get unicode object and
the unreadable-repr, weird-indexing, hard-to-work-with bytes-string. I don't
think adding something in between is going to work (we basically have that
now, the normal string), so I suggest the bytes-string becomes a bit more
'string' and a bit less 'sequence of bytes'. Perhaps in the form of:

 - A bytes type that repr()'s to something readable

 - A way to write byte literals that doesn't bleed the eyes, and isn't so
   fragile in the face of source-encoding (all the suggestions so far have
   you explicitly re-stating the source-encoding at each bytes("".encode()))
   If you have to wonder why that's fragile, just think about a recoding
   editor. Alternatively, get a short way to say 'encode in source-encoding'

 (I can't think of anything better than b"..." for the above two...
  Except... hmm... didn't `` become available in Py3k? Too little visual
  distinction?)

 - A way to manipulate the bytes as character-strings. Pattern matching,
   splitting, finding, slicing, etc. Quite like current strings.

 - Disallowing any interaction between bytes and real (meaning 'unicode')
   strings. Not "oh, let's assume ascii or the default encoding", either. If
   the user wants to explicitly decode using 'ascii', that's their choice,
   but they should consciously make it.

 - Mutable or immutable, I don't know. I fear that if the bytes type was
   easy enough to handle and mutable, and the normal (unicode) strings were
   immutable, people may end up using bytes all the time. In fact, they may
   do that anyway; I'm sure Python will grow entire subcults that prefer
   doing 'string("\xa1Python!")' where 'string' is
   'bytes(arg.encode("iso-8859-1"))'

Bytes should be easy enough to manipulate 'as strings' to do the basic
tasks, but not easy enough to encourage people to forget about that whole
annoying 'encoding' business and just use them instead (which is basically
what we have now.) On the other hand, if people don't want to deal with that
whole encoding business, we should allow them to -- consciously. We can
offer a variety of hints and tips on how to figure out the encoding of
something, but we can't do the thinking for them (trust me, I've tried.)

When a file's encoding is specified in file metadata, that's great, really
great. When a network connection is handled by a library that knows how to
deal with the content (*cough*Twisted*cough*) and can decode it for you,
that's really great too. But we're not there yet, not by a long shot. And
explaining encodings to an ADHD-infested teenager high on adrenalin and
creative inspiration who just wants to connect to an IRC server to make his
bot say "Hi!", well, that's hard. I'd rather they don't go

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Guido van Rossum

On 2/14/06, Bob Ippolito <[EMAIL PROTECTED]> wrote:
> On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:
> > - we need a new PEP; PEP 332 won't cut it
> >
> > - no b"..." literal
> >
> > - bytes objects are mutable
> >
> > - bytes objects are composed of ints in range(256)
> >
> > - you can pass any iterable of ints to the bytes constructor, as long
> > as they are in range(256)
>
> Sounds like array.array('B').

Sure.

> Will the bytes object support the buffer interface?

Do you want them to?

I suppose they should *not* support the *text* part of that API.

> Will it accept
> objects supporting the buffer interface in the constructor (or a
> class method)?  If so, will it be a copy or a view?  Current
> array.array behavior says copy.

bytes() should always copy -- thanks for asking.

> > - longs or anything with an __index__ method should do, too
> >
> > - when you index a bytes object, you get a plain int
>
> When slicing a bytes object, do you get another bytes object or a
> list? If it's a bytes object, is it a copy or a view?  Current
> array.array behavior says copy.

Another bytes object which is a copy.

(Why would you even think about views here? They are evil.)
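
To illustrate the copy-versus-view distinction, here is a small sketch using a present-day Python 3 interpreter, where bytearray plays the role of the mutable bytes type and memoryview is the explicit opt-in for views. This is an assumption about how the idea maps onto today's types, not the API under discussion in this thread:

```python
data = bytearray([10, 20, 30, 40])

# Slicing copies: changing the slice leaves the original alone.
part = data[1:3]
part[0] = 99
after_copy_write = bytes(data)

# A memoryview slice is a view: writes go through to the original.
view = memoryview(data)[1:3]
view[0] = 99
after_view_write = bytes(data)

print(list(after_copy_write), list(after_view_write))
```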

> > - repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
> >
> > Somewhat controversial:
> >
> > - it's probably too big to attempt to rush this into 2.5
> >
> > - bytes("abc") == bytes(map(ord, "abc"))
> >
> > - bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128,
> > 256])
>
> It would be VERY controversial if ord('\xff') == 256 ;)

Oops. :-)

> > Very controversial:
> >
> > - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding"
> > argument
> >
> > - bytes(u"abc") == bytes("abc") # for ASCII at least
> >
> > - bytes(u"\x80\xff") raises UnicodeError
> >
> > - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
> >
> > Martin von Loewis's alternative for the "very controversial" set is to
> > disallow an encoding argument and (I believe) also to disallow Unicode
> > arguments. In 3.0 this would leave us with s.encode() as the
> > only way to convert a string (which is always unicode) to bytes. The
> > problem with this is that there's no code that works in both 2.x and
> > 3.0.
>
> Given a base64 or hex string, how do you get a bytes object out of
> it?  Currently str.decode('base64') and str.decode('hex') are good
> solutions to this... but you get a str object back.

I don't know -- you can propose an API you like here. base64 is as
likely to encode text as binary data, so I don't think it's wrong for
those things to return strings.
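
For comparison, a sketch of how this question is answered in a present-day Python 3 interpreter, where the codec-style str.decode('base64') spelling is gone and the conversions return bytes. The sample strings are made up:

```python
import base64
import binascii

hex_text = "68656c6c6f"
b64_text = "aGVsbG8="

from_hex = bytes.fromhex(hex_text)        # text in, bytes out
from_b64 = base64.b64decode(b64_text)     # accepts ASCII str or bytes
also_hex = binascii.unhexlify(hex_text)   # same result as bytes.fromhex

print(from_hex, from_b64, also_hex)
```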

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] byte literals unnecessary [Was: PEP 332 revival in coordination with pep 349?]

2006-02-14 Thread Guido van Rossum

On 2/14/06, Neil Schemenauer <[EMAIL PROTECTED]> wrote:
> Maybe you should ask your coworkers. :-)  I think gmail is trying to
> do something intelligent with the Mail-Followup-To header.

But you're the only person for whom it does that. Do you have a funny
gmail setting?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Barry Warsaw

On Tue, 2006-02-14 at 15:13 -0800, Guido van Rossum wrote:

> So I'm taking that the specific properties you want to model are the
> overflow behavior, right? N-bit unsigned is defined as arithmetic mod
> 2**N; N-bit signed is a bit more tricky to define but similar. These
> never overflow but instead just throw away bits in an exactly
> specified manner (2's complement arithmetic).

That would be my use case, yep.

> While I personally am comfortable with writing (x+y) & 0xFFFF (for
> 16-bit unsigned), I can see that someone who spends a lot of time
> doing arithmetic in this field might want specialized types.

I'd put it in the "annoying, although there exists a workaround that
might confound newbies" category.  Which means it's definitely not
urgent enough to address for 2.5 -- if ever -- especially given your
current stance on bytes(bunch_of_ints)[0].  The two are of course
separate issues, but thinking about one led to the other.
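
A minimal sketch of the masking workaround for 16-bit arithmetic, for readers who have not met it. The helper names are illustrative only:

```python
MASK16 = 0xFFFF  # 2**16 - 1

def add_u16(x, y):
    """Add two values modulo 2**16, as a 16-bit unsigned register would."""
    return (x + y) & MASK16

def to_s16(value):
    """Reinterpret the low 16 bits of value as a two's complement signed number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value

print(add_u16(0xFFFF, 1))            # wraps around to 0
print(to_s16(add_u16(0x7FFF, 1)))    # most negative 16-bit value
```

Python integers never overflow on their own, so the mask has to be applied explicitly after each operation that may carry out of the low 16 bits.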

> But I'm not sure that that's what the Numeric folks want -- I believe
> they're more interested in saving space, not in the mod 2**N
> properties. 

Could be.  I don't care about space savings.  And I definitely have no
clue what the Numeric folks want. ;)

> There's certainly a point to treating bytes as ints; I don't know if
> it's more compelling than to treating them as unit bytes. But if we
> decide that the bytes types contains ints, b[0] should return a plain
> int (whose value necessarily is in range(0, 256)), not some new
> unsigned-8-bit type. And creating a bytes object from a list of ints
> should accept any input values as long as their __index__ value is in
> that same range.
> 
> I.e. bytes([1, 2L]) should be the same as bytes([1L, 2]); and
> bytes([-1]) should raise a ValueError.

That seems fine to me.
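
The constructor rules quoted above can be tried out in a present-day Python 3 interpreter, which behaves this way (there is no separate long type any more, so a small class with __index__ stands in for "anything with an __index__ method"). The SmallInt class is hypothetical:

```python
class SmallInt:
    """Hypothetical integer-like object that only provides __index__."""
    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

accepted = bytes([1, SmallInt(2)])   # __index__ values in range(256) are fine

def rejects(values):
    """Return True if bytes(values) raises ValueError."""
    try:
        bytes(values)
    except ValueError:
        return True
    return False

print(accepted == bytes([1, 2]), rejects([-1]), rejects([256]))
```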

> I agree it's icky, and I'd rather not design APIs like that -- but I
> can't help it that others continue to want to use that idiom. I also
> agree that most likely we'll want to treat bytes the same as strings
> here. But no basestring (bytes are mutable and don't behave like
> sequences of characters).

That's interesting.  So bytes really behave a lot more like some weird
string/lists hybrid then? It makes some sense.  You read 801 bytes from
a binary file, twiddle bytes 223 and 741 and then write those bytes back
out to a different binary file.

If we don't inherit from basestring, what I'm worried about is that for
those who do continue to use the idiom described previously, we'll have
to extend our isinstance() to include both basestring and bytes.  Which
definitely gets ickier.  But if bytes are mutable, as makes sense, then
it also makes sense that they don't inherit from basestring.

BTW, using that idiom is a bit of a hedge against such API (which you
may not control).  It allows us to say "okay, at /this/ point I don't
know whether I have a scalar or a sequence, but from this point forward,
I know I have something I can safely iterate over."
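
A sketch of the kind of normalizing helper this idiom leads to, written for a present-day Python 3 interpreter (which has no basestring, so the check already has to list text and bytes types separately). The function name is made up, and the idiom as originally posted is not quoted in this message, so treat this as one plausible reading of it:

```python
def as_sequence(value):
    """Return something safe to iterate over, treating text and bytes as scalars."""
    if isinstance(value, (str, bytes, bytearray)):
        return [value]        # string-like: wrap it rather than iterate its items
    try:
        iter(value)
    except TypeError:
        return [value]        # plain scalar such as an int
    return value              # already a real sequence or iterable

for item in as_sequence("one name"):
    print(item)
for item in as_sequence(["first", "second"]):
    print(item)
```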

I wonder if it makes sense to add a more fundamental abstract base class
that can be used as a marker for "photonic behavior".  I don't know what
that class would be called, but you'd then have a hierarchy like this:

photonic
    basestring
        str
        unicode
    bytes

OTOH, it seems like a lot to add for a specialized (and some would say
dubious) use case.

-Barry


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/14/06, Neil Schemenauer <[EMAIL PROTECTED]> wrote:
> People could spell it bytes(s.encode('latin-1')) in order to make it
> work in 2.X.  That spelling would provide a way of ensuring the type
> of the return value.

At the cost of an extra copying step.

[Guido]
> > You missed the part where I said that introducing the bytes type
> > *without* a literal seems to be a good first step. A new type, even
> > built-in, is much less drastic than a new literal (which requires
> > lexer and parser support in addition to everything else).
>
> Are you concerned about the implementation effort?  If so, I don't
> think that's justified since adding a new string prefix should be
> pretty straightforward (relative to rest of the effort involved).

Not so much the implementation but also the documentation, updating
3rd party Python preprocessors, etc.

> Are you comfortable with the proposed syntax?

Not entirely, since I don't know what b"abcdef" would mean
(where  is a Unicode Euro character typed in whatever source
encoding was used).

Instead of b"abc" (only ASCII) you could write bytes("abc"). Instead
of b"\xf0\xff\xee" you could write bytes([0xf0, 0xff, 0xee]).

The key disconnect for me is that if bytes are not characters, we
shouldn't use a literal notation that resembles the literal notation
for characters. And there's growing consensus that a bytes type should
be considered as an array of (8-bit unsigned) ints.

Also, bytes objects are (in my mind anyway) mutable. We have no other
literal notation for mutable objects. What would the following code
print?

  for i in range(2):
      b = b"abc"
      print b
      b[0] = ord("A")

Would the second output line print abc or Abc?

I guess the only answer that makes sense is that it should print abc
both times; but that means that b"abc" must be internally implemented
by creating a new bytes object each time. Perhaps the implementation
effort isn't so minimal after all...
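
The "new object each time" behaviour can be sketched in a present-day Python 3 interpreter by building the mutable object explicitly from an immutable literal on every pass through the loop. This uses bytearray as a stand-in for the mutable bytes type proposed here:

```python
seen = []
for i in range(2):
    b = bytearray(b"abc")    # a fresh mutable object on every iteration
    seen.append(bytes(b))    # record what would have been printed
    b[0] = ord("A")          # mutates only this iteration's object

print(seen)
```

Because the constructor call runs each time, the mutation in one iteration cannot leak into the next.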

(PS why is there a reply-to in your email that excludes you from the
list of recipients but includes me?)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Phillip J. Eby

At 03:14 PM 2/14/2006 -0800, Bob Ippolito wrote:
>I'm also not sure what the uninstallation story
>with scripts is.

The scripts have enough breadcrumbs in them that you can figure out what 
egg they go with.  More precisely, an egg contains enough information for 
you to search PATH for its scripts and verify that they still refer to the 
egg before removing them.

This is of course fragile if you put the scripts in some random location 
not on your PATH.

Anyway, actual *implementation* of uninstallation features isn't going to 
be until the 0.7 development cycle.


Re: [Python-Dev] byte literals unnecessary [Was: PEP 332 revival in coordination with pep 349?]

2006-02-14 Thread Neil Schemenauer

On Tue, Feb 14, 2006 at 03:13:37PM -0800, Guido van Rossum wrote:
> Also, bytes objects are (in my mind anyway) mutable. We have no other
> literal notation for mutable objects. What would the following code
> print?
> 
>   for i in range(2):
>       b = b"abc"
>       print b
>       b[0] = ord("A")
> 
> Would the second output line print abc or Abc?
>
> I guess the only answer that makes sense is that it should print abc
> both times; but that means that b"abc" must be internally implemented
> by creating a new bytes object each time. Perhaps the implementation
> effort isn't so minimal after all...

I agree.  I was thinking that bytes() would be immutable and
therefore very similar to the current str object.  You've convinced
me that a literal representation is not needed.  Thanks for
clarifying your position.

> (PS why is there a reply-to in your email that excludes you from the
> list of recipients but includes me?)

Maybe you should ask your coworkers. :-)  I think gmail is trying to
do something intelligent with the Mail-Followup-To header.

  Neil

Re: [Python-Dev] bytes type discussion

2006-02-14 Thread Bob Ippolito


On Feb 14, 2006, at 3:13 PM, Guido van Rossum wrote:

> I'm about to send 6 or 8 replies to various salient messages in the
> PEP 332 revival thread. That's probably a sign that there's still a
> lot to be sorted out. In the mean time, to save you reading through
> all those responses, here's a summary of where I believe I stand.
> Let's continue the discussion in this new thread unless there are
> specific hairs to be split in the other thread that aren't addressed
> below or by later posts.
>
> Non-controversial (or almost):
>
> - we need a new PEP; PEP 332 won't cut it
>
> - no b"..." literal
>
> - bytes objects are mutable
>
> - bytes objects are composed of ints in range(256)
>
> - you can pass any iterable of ints to the bytes constructor, as long
> as they are in range(256)

Sounds like array.array('B').

Will the bytes object support the buffer interface?  Will it accept  
objects supporting the buffer interface in the constructor (or a  
class method)?  If so, will it be a copy or a view?  Current  
array.array behavior says copy.

> - longs or anything with an __index__ method should do, too
>
> - when you index a bytes object, you get a plain int

When slicing a bytes object, do you get another bytes object or a  
list?  If it's a bytes object, is it a copy or a view?  Current  
array.array behavior says copy.

> - repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'
>
> Somewhat controversial:
>
> - it's probably too big to attempt to rush this into 2.5
>
> - bytes("abc") == bytes(map(ord, "abc"))
>
> - bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128,  
> 256])

It would be VERY controversial if ord('\xff') == 256 ;)

> Very controversial:
>
> - bytes("abc", "encoding") == bytes("abc") # ignores the "encoding"  
> argument
>
> - bytes(u"abc") == bytes("abc") # for ASCII at least
>
> - bytes(u"\x80\xff") raises UnicodeError
>
> - bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")
>
> Martin von Loewis's alternative for the "very controversial" set is to
> disallow an encoding argument and (I believe) also to disallow Unicode
> arguments. In 3.0 this would leave us with s.encode() as the
> only way to convert a string (which is always unicode) to bytes. The
> problem with this is that there's no code that works in both 2.x and
> 3.0.

Given a base64 or hex string, how do you get a bytes object out of  
it?  Currently str.decode('base64') and str.decode('hex') are good  
solutions to this... but you get a str object back.

-bob


Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Thomas Wouters

On Tue, Feb 14, 2006 at 05:05:08PM -0500, Joe Smith wrote:

> I don't like the idea of bdist_deb very much.
> The idea behind the debian packaging system is that unlike with RPM and 
> Windows, package management should be clean.

The idea behind RPM is also that package management should be clean. Debian
packages, on average, do a better job, and 'dpkg' deals a bit more flexibly
with overwritten files and such, but it's not that big a difference.

> The Debian style system attempts to overcome these deficiencies, and
> generally does a decent job with it. The problem is that this can really
> only work if packages are well maintained, and adhere to a set of policies
> that help to further mitigate these problems. Making it easy to generate
> .debs of python modules will likely result in a noticeable increase in the
> number of .debs that do not target a specific distribution and/or do not
> follow the policies of that distribution.

That sounds like "oh no, what if the user presses the wrong button". Users
can already mess up the system if they do the wrong thing. Distutils offers
a simple, generic way of saying 'install this' while letting distutils
figure out most of the details. bdist_deb can then put it all in
debian-specific locations, in the debian-preferred way, while registering
all the files so they get deleted properly on deinstall. Things get more
complicated when you have pre-/post-install/remove scripts, but those are
pretty rare for the average Python packages, and since they would (in the
Python package) most likely run from setup.py, those would break at
bdist-time, not deb-install-time.

It's not easier for bdist-deb created .deb's to break things than it is for
arbitrary developer-built .deb's to do so, and it's quite a bit easier for
'setup.py install' to break things. At least a .deb can be easily removed.
And the alternative to bdist_deb is in many cases 'setup.py install'.

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

Re: [Python-Dev] str object going in Py3K

2006-02-14 Thread Thomas Wouters

On Tue, Feb 14, 2006 at 05:48:57PM -0500, Barry Warsaw wrote:
> On Tue, 2006-02-14 at 14:37 -0800, Alex Martelli wrote:
> 
> > What about shorter names, such as 'text' instead of 'opentext' and
> > 'data' instead of 'openbinary'?  By eschewing the 'open' prefix we
> > might make it easy to eventually migrate off it. Maybe text and data
> > could be two subclasses of file, with file remaining initially as it
> > is (and perhaps becoming an abstract-only baseclass at the time 'open'
> > is deprecated).
> 
> I was actually thinking about static methods file.text() and file.data()
> which seem nicely self descriptive, if a little bit longer.

Make them classmethods though, like dict.fromkeys.
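
A sketch of why classmethods (as with dict.fromkeys) are the better fit for such alternate constructors. The File class below is entirely hypothetical and wraps in-memory payloads instead of opening real files, purely to keep the example self-contained:

```python
import io

class File:
    """Hypothetical stand-in for a file type with two alternate constructors."""

    def __init__(self, stream):
        self.stream = stream

    @classmethod
    def text(cls, payload, encoding="utf-8"):
        # cls is whichever class the method was called on, not always File.
        return cls(io.TextIOWrapper(io.BytesIO(payload), encoding=encoding))

    @classmethod
    def data(cls, payload):
        return cls(io.BytesIO(payload))

class LoggedFile(File):
    """Subclass that inherits both constructors unchanged."""

as_text = File.text(b"hi").stream.read()
as_data = File.data(b"hi").stream.read()
sub = LoggedFile.data(b"x")
print(repr(as_text), repr(as_data), type(sub).__name__)
```

With a staticmethod the constructor would have to name File explicitly, so LoggedFile.data() would hand back a plain File.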

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Bob Ippolito

On Feb 14, 2006, at 2:05 PM, Joe Smith wrote:

>
> "Guido van Rossum" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>> In private email, Phillip Eby suggested to add these things to the
>> 2.5 standard library:
>>
>> bdist_deb, bdist_msi, and friends
>>
>> He explained them as follows:
>>
>> """
>> bdist_deb makes .deb files (packages for Debian-based Linux  
>> distros, like
>> Ubuntu).  bdist_msi makes .msi installers for Windows (it's by  
>> Martin v.
>> Loewis).  Marc Lemburg proposed on the distutils-sig that these and
>> various
>> other implemented bdist_* formats (other than bdist_egg) be  
>> included in
>> the
>> next Python release, and there was no opposition there that I recall.
>> """
>>
>
> I don't like the idea of bdist_deb very much.
> The idea behind the debian packaging system is that unlike with RPM  
> and
> Windows, package management should be clean.
>
> Windows and RPM are known for major dependency problems, letting  
> packages
> damage each other, having packages that do not uninstall cleanly (i.e.
> packages that leave junk all over the place) and generally messing  
> the system
> up quite badly over time, so that the OS is usually removed and
> re-installed periodically.

This is one problem that eggs go a LONG way towards solving,  
especially for platforms such as Windows and OS X that do not ship  
with an intelligent package management solution.

The way that eggs are built more or less guarantees that they remain  
consistent, because it temporarily replaces file/open/etc and some  
other functions with sanity checks to make sure that the installation  
layout is self-contained** and thus compatible with eggs.  It's not a  
real chroot, of course, but it's good enough for all practical purposes.

The only things that easy_install overwrites** in the context of eggs  
are other eggs with an identical filename (version, platform, etc.),  
unless explicitly asked to do otherwise (e.g. remove some existing  
older version).  Uninstallation is of course similarly clean, because  
it just nukes one directory or .egg file, and/or an associated .pth  
file.

** The exception is scripts.  Scripts go wherever --install-scripts=  
point to, and AFAIK there is no means to ensure that the scripts from  
one egg do not interfere with the scripts for another egg or anything  
else on the PATH.  I'm also not sure what the uninstallation story  
with scripts is.

-bob


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/14/06, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > As Phillip guessed, I was indeed thinking about introducing bytes()
> > sooner than that, perhaps even in 2.5 (though I don't want anything
> > rushed).
>
> Hmm, that is probably going to be too early. As the thread shows
> there are lots of things to take into account, esp. since if you
> plan to introduce bytes() in 2.x, the upgrade path to 3.x would
> have to be carefully planned. Otherwise, we end up introducing
> a feature which is meant to prepare for 3.x and then we end up
> causing breakage when the move is finally implemented.

You make a good point. Someone probably needs to write up a new PEP
summarizing this discussion (or rather, consolidating the agreement
that is slowly emerging, where there is agreement, and summarizing the
key open questions).

> > Even in Py3k though, the encoding issue stands -- what if the file
> > encoding is Unicode? Then using Latin-1 to encode bytes by default
> > might not be what the user expected. Or what if the file encoding is
> > something totally different? (Cyrillic, Greek, Japanese, Klingon.)
> > Anything default but ASCII isn't going to work as expected. ASCII
> > isn't going to work as expected either, but it will complain loudly
> > (by throwing a UnicodeError) whenever you try it, rather than causing
> > subtle bugs later.
>
> I think there's a misunderstanding here: in Py3k, all "string"
> literals will be converted from the source code encoding to
> Unicode. There are no ambiguities - a Klingon character will still
> map to the same ordinal used to create the byte content regardless
> of whether the source file is encoded in UTF-8, UTF-16 or
> some Klingon charset (are there any ?).

OK, so a string (literal or otherwise) containing a Klingon character
won't be acceptable to the bytes() constructor in 3.0. It shouldn't be
in 2.x either then.

I still think that someone who types a file in Latin-1 and enters
non-ASCII Latin-1 characters in a string literal and then passes it to
the bytes() constructor might expect to get bytes encoded in Latin-1,
and someone who types a file in UTF-8 and enters non-ASCII Unicode
characters might expect to get UTF-8-encoded bytes. Since they can't
both get what they want, we should disallow both, and only allow
ASCII.
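
A small sketch of the ambiguity, using a present-day Python 3 interpreter: the same one-character string yields different bytes under the two encodings, while ASCII refuses outright instead of guessing:

```python
s = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

as_latin1 = s.encode("latin-1")   # one byte
as_utf8 = s.encode("utf-8")       # two bytes

try:
    s.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True           # complains loudly rather than guessing

print(as_latin1, as_utf8, ascii_failed)
```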

> Furthermore, by restricting to ASCII you'd also rule out hex escapes
> which seem to be the natural choice for presenting binary data in
> literals - the Unicode representation would then only be an
> implementation detail of the way Python treats "string" literals
> and a user would certainly expect to find e.g. \x88 in the bytes object
> if she writes bytes('\x88').

I guess we'll just have to disappoint her. Too bad for the person who
wrote bytes("\x12\x34\x56\x78\x9a\xbc\xde\xf0") -- they'll have to
write bytes([0x12,0x34,0x56,0x78,0x9a,0xbc,0xde,0xf0]). Not so bad IMO
and certainly easier than a *mixture* of hex and ASCII like
'\xabc\xdef'.
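
To see why a mixture of hex escapes and ASCII is hard to read, here is how a present-day Python 3 interpreter (which did end up with a b"..." literal) parses that exact string. Each \x escape consumes only two hex digits, so the trailing 'c' and 'f' are ordinary characters:

```python
mixed = b"\xabc\xdef"

as_ints = list(mixed)
explicit = bytes([0xAB, ord("c"), 0xDE, ord("f")])

print(as_ints, mixed == explicit)
```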

> But maybe you have something different in mind... I'm talking
> about ways to create bytes() in Py3k using "string" literals.

I'm not sure that's going to be common practice except for ASCII
characters used in network protocols.

> >> While we're at it: I'd suggest that we remove the auto-conversion
> >> from bytes to Unicode in Py3k and the default encoding along with
> >> it.
> >
> > I'm not sure which auto-conversion you're talking about, since there
> > is no bytes type yet. If you're talking about the auto-conversion from
> > str to unicode: the bytes type should not be assumed to have *any*
> > properties that the current str type has, and that includes
> > auto-conversion.
>
> I was talking about the automatic conversion of 8-bit strings to
> Unicode - which was a key feature to make the introduction of
> Unicode less painful, but will no longer be necessary in Py3k.

OK. The bytes type certainly won't have this property.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/14/06, Barry Warsaw <[EMAIL PROTECTED]> wrote:
> A related question: what would bytes([104, 101, 108, 108, 111, 8004])
> return?  An exception hopefully.

Absolutely.

> I also think you'd want bytes([x
> for x in some_bytes_object]) to return an object equal to the original.

You mean if type(some_bytes_object) is bytes? Yes. But that doesn't
constrain the API much.

Anyway, I'm now convinced that bytes should act as an array of ints,
where the ints are restricted to range(0, 256) but have type int.
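
The "array of ints" model can be checked against a present-day Python 3 interpreter, whose bytes type follows it. This sketch reuses the values from the question above:

```python
some_bytes_object = bytes([104, 101, 108, 108, 111])

first = some_bytes_object[0]                          # a plain int
rebuilt = bytes([x for x in some_bytes_object])       # round trip through ints

try:
    bytes([104, 101, 108, 108, 111, 8004])
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True                      # 8004 is not in range(256)

print(type(first).__name__, rebuilt == some_bytes_object, out_of_range_rejected)
```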

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/13/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> At 04:29 PM 2/13/2006 -0800, Guido van Rossum wrote:
> >On 2/13/06, Phillip J. Eby <[EMAIL PROTECTED]> wrote:
> > > I didn't mean that it was the only purpose.  In Python 2.x, practical code
> > > has to sometimes deal with "string-like" objects.  That is, code that takes
> > > either strings or unicode.  If such code calls bytes(), it's going to want
> > > to include an encoding so that unicode conversions won't fail.
> >
> >That sounds like a rather hypothetical example. Have you thought it
> >through? Presumably code that accepts both str and unicode either
> >doesn't care about encodings, but simply returns objects of the same
> >type as the arguments -- and then it's unlikely to want to convert the
> >arguments to bytes; or it *does* care about encodings, and then it
> >probably already has to special-case str vs. unicode because it has to
> >control how str objects are interpreted.
>
> Actually, it's the other way around.  Code that wants to output
> uninterpreted bytes right now and accepts either strings or Unicode has to
> special-case *unicode* -- not str, because str is the only "bytes type" we
> currently have.

But this is assuming that the str input is indeed uninterpreted bytes.
That may be a tacit assumption or agreement but it may be wrong. Also,
there are many ways to interpret "uninterpreted bytes" -- is it an
image, a sound file, or UTF-8 text? In 2 out of those 3, passing
unicode is more likely a bug than anything else (except in Jython).

> This creates an interesting issue in WSGI for Jython, which of course only
> has one (unicode-based) string type now.  Since there's no bytes type in
> Python in general, the only solution we could come up with was to treat
> such strings as latin-1:

I believe that's the general convention in Jython, as it matches the
default (albeit deprecated) conversion between bytes and characters in
Java itself.

>  http://www.python.org/peps/pep-0333.html#unicode-issues
>
> This is why I'm biased towards latin-1 encoding of unicode to bytes; it's
> "the same thing" as an uninterpreted string of bytes.

But in CPython this is not how this is generally done.

> I think the difference in our viewpoints is that you're still thinking
> "string" thoughts, whereas I'm thinking "byte" thoughts.  Bytes are just
> bytes; they don't *have* an encoding.

I think when one side of the equation is Unicode, in CPython, I can be
forgiven for thinking string thoughts, since Unicode is never used to
carry binary bytes in CPython.

You may have to craft some kind of different rule for Jython; it
doesn't have a default encoding used when str meets unicode.

> So, if you think of "converting a string to bytes" as meaning "create an
> array of numerals corresponding to the characters in the string", then this
> leads to a uniform result whether the characters are in a str or a unicode
> object.  In other words, to me, bytes(str_or_unicode) should be treated as:
>
>  bytes(map(ord, str_or_unicode))
>
> In other words, without an encoding, bytes() should simply treat str and
> unicode objects *as if they were a sequence of integers*, and produce an
> error when an integer is out of range.  This is a logical and consistent
> interpretation in the absence of an encoding, because in that case you
> don't care about the encoding - it's just raw data.

I see your point (now that you mentioned Jython). But I still don't
think that this is a good default for CPython.
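
To make the quoted proposal concrete, here is a small sketch written against
Python 3 (where every string is Unicode). The helper name bytes_from_ordinals
is hypothetical. It shows that, for code points below 256, the
bytes(map(ord, s)) interpretation gives the same result as a latin-1 encode,
and that larger code points produce an error:

```python
def bytes_from_ordinals(text):
    """Hypothetical helper: treat a string as a sequence of integers."""
    return bytes(map(ord, text))

# Every code point below 256 maps to the byte with the same value,
# which is exactly what the latin-1 codec does.
low = "abc\xf0"
assert bytes_from_ordinals(low) == low.encode("latin-1")

# A code point above 255 (EURO SIGN, U+20AC) is out of range for a byte.
high = "abc\u20ac"
try:
    bytes_from_ordinals(high)
except ValueError:
    out_of_range = True
else:
    out_of_range = False
```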

> If, however, you include an encoding, then you're stating that you want to
> encode the *meaning* of the string, not merely its integer values.

Note that in Python 3000 we won't be using str/unicode to carry
integer values around, since we will have the bytes type. So there, it
makes sense to think of the conversion to always involve an encoding,
possibly a default one. (And I think the default might more usefully
be UTF-8 then.)

> >What would bytes("abc\xf0", "latin-1") *mean*? Take the string
> >"abc\xf0", interpret it as being encoded in XXX, and then encode from
> >XXX to Latin-1. But what's XXX? As I showed in a previous post,
> >"abc\xf0".encode("latin-1") *fails* because the source for the
> >encoding is assumed to be ASCII.
>
> I'm saying that XXX would be the same encoding as you specified.  i.e.,
> including an encoding means you are encoding the *meaning* of the string.

That would be the same as ignoring the encoding argument when the
input is str in CPython 2.x, right? I believe we started out saying we
didn't want to ignore the encoding. Perhaps we need to reconsider
that, given the Jython requirement? Then code that converts str to
bytes and needs to be portable between Jython and CPython could write

  b = bytes(s, "latin-1")

> However, I believe I mainly proposed this as an alternative to having
> bytes(str_or_unicode) work like bytes(map(ord,str_or_unicode)), which I
> think is probably a saner default.

Sorry, i still don't buy that

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/13/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> What would that imply for repr()?  To support eval(repr(x)) it would
> have to produce whatever format the source code includes to begin
> with.

I'm not sure that's a requirement. (I do think that in 2.x,
str(bytes(s)) == s should hold as long as type(s) == str.)

> If I understand correctly there's three main candidates:
> 1. Direct copying to str in 2.x, pretending it's latin-1 in unicode in 3.x

I'm not sure what you mean, but I'm guessing you're thinking that the
repr() of a bytes object created from bytes('abc\xf0') would be

  bytes('abc\xf0')

under this rule. What's so bad about that?

> 2. Direct copying to str/unicode if it's only ascii values, switching
> to a list of hex literals if there's any non-ascii values

That works for me too. But why hex literals? As MvL stated, a list of
decimals would be just as useful.

> 3. b"foo" literal with ascii for all ascii characters (other than \
> and "), \xFF for individual characters that aren't ascii
>
> Given the choice I prefer the third option, with the second option as
> my runner up.  The first option just screams "silent errors" to me.

The 3rd is out of the running for many reasons.

I'm not sure I understand your "silent errors" fear; can you elaborate?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/13/06, Barry Warsaw <[EMAIL PROTECTED]> wrote:
> This makes me think I want an unsigned byte type, which b[0] would
> return.  In another thread I think someone mentioned something about
> fixed width integral types, such that you could have an object that
> was guaranteed to be 8-bits wide, 16-bits wide, etc.   Maybe you also
> want signed and unsigned versions of each.  This may seem like YAGNI
> to many people, but as I've been working on a tightly embedded/
> extended application for the last few years, I've definitely had
> occasions where I wish I could more closely and more directly model
> my C values as Python objects (without using the standard workarounds
> or writing my own C extension types).

So I take it that the specific properties you want to model are the
overflow behavior, right? N-bit unsigned is defined as arithmetic mod
2**N; N-bit signed is a bit more tricky to define but similar. These
never overflow but instead just throw away bits in an exactly
specified manner (2's complement arithmetic).

While I personally am comfortable with writing (x+y) & 0xFFFF (for
16-bit unsigned), I can see that someone who spends a lot of time
doing arithmetic in this field might want specialized types.
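
As an illustration of the wrap-around rules described above, a short sketch
using plain ints and masking. The function names wrap_unsigned and
wrap_signed are made up for this example; nothing like them was proposed in
the thread:

```python
def wrap_unsigned(value, bits):
    """Reduce value mod 2**bits, as N-bit unsigned arithmetic does."""
    return value & ((1 << bits) - 1)

def wrap_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value = wrap_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value

# 16-bit unsigned: the same thing as (x + y) & 0xFFFF.
assert wrap_unsigned(0xFFFF + 1, 16) == 0
assert wrap_unsigned(40000 + 30000, 16) == 4464

# 8-bit signed: results wrap instead of overflowing.
assert wrap_signed(127 + 1, 8) == -128
assert wrap_signed(-129, 8) == 127
```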

But I'm not sure that that's what the Numeric folks want -- I believe
they're more interested in saving space, not in the mod 2**N
properties. So (here I'm to some extent guessing) they have different
array types whose elements are ints or floats of various widths; I'm
guessing they also have scalars of those widths for consistency or to
guide the creation of new arrays from scalars. I wouldn't be surprised
if, rather than requiring N-bit 2's complement, they would prefer more
flexible control over overflow -- e.g. ignore, warn, error, turn into
NaN, etc.

> But anyway, without hyper-generalizing, it's still worth asking
> whether a bytes type is just a container of byte objects, where the
> contained objects would be distinct, fixed 8-bit unsigned integral
> types.

There's certainly a point to treating bytes as ints; I don't know if
it's more compelling than treating them as unit bytes. But if we
decide that the bytes type contains ints, b[0] should return a plain
int (whose value necessarily is in range(0, 256)), not some new
unsigned-8-bit type. And creating a bytes object from a list of ints
should accept any input values as long as their __index__ value is in
that same range.

I.e. bytes([1, 2L]) should be the same as bytes([1L, 2]); and
bytes([-1]) should raise a ValueError.
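
A sketch of that constructor rule, assuming the behavior of Python 3's bytes
(which has no separate long type). The Index class is hypothetical and exists
only to show an object that defines __index__:

```python
class Index:
    """Hypothetical int-like object that only defines __index__."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        return self.value

# Anything whose __index__ value is in range(0, 256) is accepted.
packed = bytes([1, Index(2)])
assert packed == bytes([1, 2])

# A negative value is rejected.
try:
    bytes([-1])
except ValueError:
    rejected = True
else:
    rejected = False
```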

> > There's also the consideration for APIs that, informally, accept
> > either a string or a sequence of objects. Many of these exist, and
> > they are probably all being converted to support unicode as well as
> > str (if it makes sense at all). Should a bytes object be considered as
> > a sequence of things, or as a single thing, from the POV of these
> > types of APIs? Should we try to standardize how code tests for the
> > difference? (Currently all sorts of shortcuts are being taken, from
> > isinstance(x, (list, tuple)) to isinstance(x, basestring).)
>
> I think bytes objects are very much like string objects today --
> they're the photons of Python since they can act like either
> sequences or scalars, depending on the context.  For example, we have
> code that needs to deal with situations where an API can return
> either a scalar or a sequence of those scalars.  So we have a utility
> function like this:
>
> def thingiter(obj):
>     try:
>         it = iter(obj)
>     except TypeError:
>         yield obj
>     else:
>         for item in it:
>             yield item
>
> Maybe there's a better way to do this, but the most obvious problem
> is that (for our use cases), this fails for strings because in this
> context we want strings to act like scalars.  So we add a little test
> just before the "try:" like "if isinstance(obj, basestring): yield
> obj".  But that's yucky.
>
> I don't know what the solution is -- if there /is/ a solution short
> of special case tests like above, but I think the key observation is
> that sometimes you want your string to act like a sequence and
> sometimes you want it to act like a scalar.  I suspect bytes objects
> will be the same way.

I agree it's icky, and I'd rather not design APIs like that -- but I
can't help it that others continue to want to use that idiom. I also
agree that most likely we'll want to treat bytes the same as strings
here. But no basestring (bytes are mutable and don't behave like
sequences of characters).
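
For reference, one way the special case could look if strings and bytes are
both to be treated as scalars. This is only a sketch in Python 3 syntax; the
scalar_types parameter is an assumption of this example, and it lists the
types explicitly instead of relying on a common base class:

```python
def thingiter(obj, scalar_types=(str, bytes, bytearray)):
    """Yield obj itself if it is a scalar, else yield each of its items."""
    # Strings and bytes are iterable, but here they count as scalars.
    if isinstance(obj, scalar_types):
        yield obj
        return
    try:
        it = iter(obj)
    except TypeError:
        # Not iterable at all: a plain scalar.
        yield obj
    else:
        yield from it

assert list(thingiter("abc")) == ["abc"]
assert list(thingiter(b"ab")) == [b"ab"]
assert list(thingiter([1, 2])) == [1, 2]
assert list(thingiter(5)) == [5]
```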

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/14/06, Adam Olsen <[EMAIL PROTECTED]> wrote:
> I'm starting to wonder, do we really need anything fancy?  Wouldn't it
> be sufficient to have a way to compactly store 8-bit integers?
>
> In 2.x we could convert unicode like this:
> bytes(ord(c) for c in u"It's...".encode('utf-8'))

Yuck.

> u"It's...".byteencode('utf-8')  # Shortcut for above

Yuck**2. I'd like to avoid adding new APIs to existing types to return
bytes instead of str. (It's okay to change existing APIs to *accept*
bytes as an alternative to str though.)

> In 3.0 it changes to:
> "It's...".encode('utf-8')
> u"It's...".byteencode('utf-8')  # Same as above, kept for compatibility

No. 3.0 won't have "backward compatibility" features. That's the whole
point of 3.0.

> Passing a str or unicode directly to bytes() would be an error.
> repr(bytes(...)) would produce bytes([1,2,3]).

I'm fine with that.

> Probably need a __bytes__() method that print can call, or even better
> a __print__(file) method[0].  The write() methods would of course have
> to support bytes objects.

Right on the latter.

> I realize it would be odd for the interactive interpreter to print them
> as a list of ints by default:
> >>> u"It's...".byteencode('utf-8')
> [73, 116, 39, 115, 46, 46, 46]

No. This prints the repr() which should include the type. bytes([73,
116, 39, 115, 46, 46, 46]) is the right thing to print here.
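
A sketch of a repr that includes the type and lists the integer values. The
class IntListBytes is hypothetical, built on Python 3's bytearray purely to
illustrate the idea:

```python
class IntListBytes(bytearray):
    """Hypothetical subclass whose repr shows the type and the int values."""

    def __repr__(self):
        return "%s(%r)" % (type(self).__name__, list(self))

value = IntListBytes("It's...".encode("utf-8"))
text = repr(value)
assert text == "IntListBytes([73, 116, 39, 115, 46, 46, 46])"

# Because the repr names the constructor, it can be evaluated back.
assert eval(text) == value
```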

> But maybe it's time we stopped hiding the real nature of bytes from users?

That's the whole point.

> [0] By this I mean calling objects recursively and telling them what
> file to print to, rather than getting a temporary string from them and
> printing that.  I always wondered why you could do that from C
> extensions but not from Python code.

I want to keep the Python-level API small.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/14/06, Thomas Wouters <[EMAIL PROTECTED]> wrote:
> On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:
>
> > But adding an encoding doesn't help. The str.encode() method always
> > assumes that the string itself is ASCII-encoded, and that's not good
> > enough:
>
> > >>> "abc".encode("latin-1")
> > 'abc'
> > >>> "abc".decode("latin-1")
> > u'abc'
> > >>> "abc\xf0".decode("latin-1")
> > u'abc\xf0'
> > >>> "abc\xf0".encode("latin-1")
> > Traceback (most recent call last):
> >   File "<stdin>", line 1, in ?
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
> > 3: ordinal not in range(128)

(Note that I've since been convinced that bytes(s) where type(s) ==
str should just return a bytes object containing the same bytes as s,
regardless of encoding. So basically you're preaching to the choir
now. The only remaining question is what if anything to do with an
encoding argument when the first argument is of type str...)
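
The failure in the quoted session comes from an implicit decode step. It can
be emulated in Python 3, where the step has to be spelled out. The function
py2_style_encode is a made-up name for this sketch, and the "ascii" default
stands in for the 2.x default encoding:

```python
def py2_style_encode(raw, encoding, default_encoding="ascii"):
    """Emulate 2.x str.encode(): decode with the default codec, then encode."""
    return raw.decode(default_encoding).encode(encoding)

# Pure ASCII input survives the round trip unchanged.
assert py2_style_encode(b"abc", "latin-1") == b"abc"

# A non-ASCII byte fails in the hidden decode step, before the
# requested encoding is ever applied.
try:
    py2_style_encode(b"abc\xf0", "latin-1")
except UnicodeDecodeError as exc:
    failure = exc
else:
    failure = None
```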

> These comments disturb me. I never really understood why (byte) strings grew
> the 'encode' method, since 8-bit strings *are already encoded*, by their
> very nature. I mean, I understand it's useful because Python does
> non-unicode encodings like 'hex', but I don't really understand *why*. The
> benefits don't seem to outweigh the cost (but that's hindsight.)

It may also have something to do with Jython compatibility (which has
str and unicode being the same thing) or 3.0 future-proofing.

> Directly encoding a (byte) string into a unicode encoding is mostly useless,
> as you've shown. The only use-case I can think of is translating ASCII into,
> for instance, EBCDIC. Encoding anything into an ASCII superset is a no-op,
> unless the system encoding isn't 'ascii' (and that's pretty rare, and not
> something a Python programmer should depend on.) On the other hand, the fact
> that (byte) strings have an 'encode' method creates a lot of confusion in
> unicode-newbies, and causes programs to break only when input is non-ASCII.
> And non-ASCII input just happens too often and too unpredictably in
> 'real-world' code, and not enough in European programmers' tests ;P

Oh, there are lots of ways that non-ASCII input can break code, you
don't have to invoke encode() on str objects to get that effect. :/

> Unicode objects and strings are not the same thing. We shouldn't treat them
> as the same thing.

Well in 3.0 they *will* be the same thing, and in Jython they already are.

> They share an interface (like lists and tuples do), and
> if you only use that interface, treating them as the same kind of object is
> mostly ok. They actually share *less* of an interface than lists and tuples,
> though, as comparing strings to unicode objects can raise an exception,
> whereas comparing lists to tuples is not expected to.

No, it causes silent surprises since [1,2,3] != (1,2,3).

> For anything less
> trivial than indexing, slicing and most of the string methods, and anything
> whatsoever involving non-ASCII (or, rather, non-system-encoding), unicode
> objects and strings *must* be treated separately. For instance, there is no
> correct way to do:
>
>   s.split("\x80")
>
> unless you know the type of 's'. If it's unicode, you want u"\x80" instead
> of "\x80". If it's not unicode, splitting "\x80" may not even be sensible,
> but you wouldn't know from looking at the code -- maybe it expects a
> specific encoding (or encoding family), maybe not. As soon as you deal with
> unicode, you need to really understand the concept, and too many programmers
> don't. And it's very hard to tell from someone's comments whether they fail
> to understand or just get some of the terminology wrong; that's why Guido's
> comments about 'encoding a byte string' and 'what if the file encoding is
> Unicode' scare me. The unicode/string mixup almost makes me wish Python
> was statically typed.

I'm mostly trying to reflect various broken mental models that users
may have. Believe me, my own confusion is nothing compared to the
confusion that occurs in less gifted users. :-)

The only use case for mixing ASCII and Unicode that I *wanted* to work
right was the mixing of pure ASCII strings (typically literals) with
Unicode data. And that works.
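
The s.split("\x80") problem quoted above can be illustrated with Python 3,
where text and bytes are separate types and mixing them is an error rather
than an implicit conversion. This is a sketch of the later behavior, not of
anything in 2.x:

```python
text = "a\x80b"   # a string containing U+0080
raw = b"a\x80b"   # three bytes, the middle one being 0x80

# Each type splits fine with a separator of its own type.
assert text.split("\x80") == ["a", "b"]
assert raw.split(b"\x80") == [b"a", b"b"]

# Mixing the two types is rejected instead of being silently converted.
try:
    raw.split("\x80")
except TypeError:
    mixed_rejected = True
else:
    mixed_rejected = False
```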

Where things unfortunately fall flat is when you start reading data
from files or interactive input and it gives you some encoded str
object instead of a Unicode object. Our mistake was that we didn't
foresee this clearly enough. Perhaps open(filename).read(), where the
file contains non-ASCII bytes, should have been changed to either
return a Unicode string (if an encoding can somehow be guessed), or
raise an exception, rather than returning an str object in some
unknown (and usually unknowable) encoding.

I hope to fix that in 3.0 too, BTW.
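
A sketch of what such a fix can look like, using open() as it ended up in
Python 3: binary mode returns bytes, text mode returns a string, and a wrong
encoding raises instead of handing back undecoded data. The file contents are
invented for the example:

```python
import os
import tempfile

# Write four bytes that are valid latin-1 but not valid UTF-8.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write("caf\xe9".encode("latin-1"))
    path = tmp.name

try:
    # Binary mode: uninterpreted bytes.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with the right encoding: a (Unicode) string.
    with open(path, encoding="latin-1") as f:
        text = f.read()

    # Text mode with the wrong encoding: an exception.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError:
        wrong_codec_fails = True
    else:
        wrong_codec_fails = False
finally:
    os.remove(path)
```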

> So please, please, please don't make the mistake of 'doing something' with
> the 'encoding' argument to 'bytes(s, encoding)' when 's' is a (byte) string.
> It wouldn't actually be usable except for the same things as 'str

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Guido van Rossum

On 2/13/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> >>In py3k, when the str object is eliminated, then what do you have?
> >>Perhaps
> >>- bytes("\x80"), you get an error, encoding is required. There is no
> >>such thing as "default encoding" anymore, as there's no str object.
> >>- bytes("\x80", encoding="latin-1"), you get a bytestring with a
> >>single byte of value 0x80.
> >
> > Yes to both again.
>
> Please reconsider, and don't give bytes() an encoding= argument.
> It doesn't need one. In Python 3, people should write
>
>   "\x80".encode("latin-1")
>
> if they absolutely want to, although they better write
>
>   bytes([0x80])
>
> Now, the first form isn't valid in 2.5, but
>
>   bytes(u"\x80".encode("latin-1"))
>
> could work in all versions.

In 3.0, I agree that .encode() should return a bytes object.

I'd almost be convinced that in 2.x bytes() doesn't need an encoding
argument, except it will require excessive copying.
bytes(u.encode("utf8")) will certainly use 2*len(u) bytes of space (plus
a constant); bytes(u, "utf8") only needs len(u) bytes. In 3.0,
bytes(s.encode(xxx)) would also create an extra copy, since the bytes
type is mutable (we all agree on that, don't we?).

I think that's a good enough argument for 2.x. We could keep the
extended API as an alternative form in 3.x, or automatically translate
calls to bytes(x, y) into x.encode(y).
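
For comparison, a sketch of the two spellings as they behave in Python 3,
where both forms exist and bytearray plays the role of the mutable type. This
shows the equivalence only; it says nothing about how many copies a given
implementation makes:

```python
u = "It's\u2026"   # ends with HORIZONTAL ELLIPSIS, 3 bytes in UTF-8

# The constructor form and the method form give the same result.
encoded = u.encode("utf-8")
assert bytes(u, "utf-8") == encoded

# The mutable variant can be built directly from the string.
buf = bytearray(u, "utf-8")
buf[0] = ord("i")
assert buf == bytearray(b"it's\xe2\x80\xa6")

# Without an encoding, a string argument is an error.
try:
    bytes(u)
except TypeError:
    encoding_required = True
else:
    encoding_required = False
```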

BTW I think we'll need a new PEP instead of PEP 332. The latter has
almost no details relevant to this discussion, and it seems to treat
bytes as a near-synonym for str in 2.x. That's not the way this
discussion is going it seems.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

[Python-Dev] bytes type discussion

2006-02-14 Thread Guido van Rossum

I'm about to send 6 or 8 replies to various salient messages in the
PEP 332 revival thread. That's probably a sign that there's still a
lot to be sorted out. In the meantime, to save you reading through
all those responses, here's a summary of where I believe I stand.
Let's continue the discussion in this new thread unless there are
specific hairs to be split in the other thread that aren't addressed
below or by later posts.

Non-controversial (or almost):

- we need a new PEP; PEP 332 won't cut it

- no b"..." literal

- bytes objects are mutable

- bytes objects are composed of ints in range(256)

- you can pass any iterable of ints to the bytes constructor, as long
as they are in range(256)

- longs or anything with an __index__ method should do, too

- when you index a bytes object, you get a plain int

- repr(bytes([10, 20, 30])) == 'bytes([10, 20, 30])'

Somewhat controversial:

- it's probably too big to attempt to rush this into 2.5

- bytes("abc") == bytes(map(ord, "abc"))

- bytes("\x80\xff") == bytes(map(ord, "\x80\xff")) == bytes([128, 255])

Very controversial:

- bytes("abc", "encoding") == bytes("abc") # ignores the "encoding" argument

- bytes(u"abc") == bytes("abc") # for ASCII at least

- bytes(u"\x80\xff") raises UnicodeError

- bytes(u"\x80\xff", "latin-1") == bytes("\x80\xff")

Martin von Loewis's alternative for the "very controversial" set is to
disallow an encoding argument and (I believe) also to disallow Unicode
arguments. In 3.0 this would leave us with s.encode() as the
only way to convert a string (which is always unicode) to bytes. The
problem with this is that there's no code that works in both 2.x and
3.0.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] str object going in Py3K

2006-02-14 Thread Guido van Rossum

On 2/14/06, Just van Rossum <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > [...] surely text files are more commonly used, and surely the
> > most common operation should have the shorter name -- call it the
> > Huffman Principle.
>
> +1 for two functions.
>
> My choice would be open() for binary and opentext() for text. I don't
> find that backwards at all: the text function is going to be more
> different from the current open() function than the binary function
> would be since in many ways the str type is closer to bytes than to
> unicode.

It's still backwards because the current open function defaults to
text on Windows (the only platform where it matters any more).

> Maybe it's even better to use opentext() AND openbinary(), and deprecate
> plain open(). We could even introduce them at the same time as bytes()
> (and leave the open() deprecation for 3.0).

And then, on 2/14/06, Alex Martelli <[EMAIL PROTECTED]> wrote:
> What about shorter names, such as 'text' instead of 'opentext' and
> 'data' instead of 'openbinary'?  By eschewing the 'open' prefix we
> might make it easy to eventually migrate off it. Maybe text and data
> could be two subclasses of file, with file remaining initially as it
> is (and perhaps becoming an abstract-only baseclass at the time 'open'
> is deprecated).

Plain 'text' and 'data' don't convey the fact that we're talking about
opening I/O objects here. If you want, we could say textfile() and
datafile(). (I'm fine with data instead of binary.)

But somehow I still like the 'open' verb. It has a long and rich
tradition. And it also nicely conveys that it is a factory function
which may return objects of different types (though similar in API)
based upon either additional arguments (e.g. buffering) or the
environment (e.g. encodings) or even inspection of the file being
opened.
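
As an illustration of open() acting as a factory, here is how the Python 3
version picks different object types from its arguments. This is a sketch of
the later io-based implementation, not of anything that existed when this
was written:

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

try:
    # Text mode returns a wrapper that encodes and decodes strings.
    with open(path, "w", encoding="utf-8") as f:
        text_type = type(f)

    # Binary mode returns a buffered reader of bytes.
    with open(path, "rb") as f:
        buffered_type = type(f)

    # Binary mode with buffering disabled returns the raw file object.
    with open(path, "rb", buffering=0) as f:
        raw_type = type(f)
finally:
    os.remove(path)

assert text_type is io.TextIOWrapper
assert buffered_type is io.BufferedReader
assert raw_type is io.FileIO
```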

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] str object going in Py3K

2006-02-14 Thread Barry Warsaw

On Tue, 2006-02-14 at 14:37 -0800, Alex Martelli wrote:

> What about shorter names, such as 'text' instead of 'opentext' and
> 'data' instead of 'openbinary'?  By eschewing the 'open' prefix we
> might make it easy to eventually migrate off it. Maybe text and data
> could be two subclasses of file, with file remaining initially as it
> is (and perhaps becoming an abstract-only baseclass at the time 'open'
> is deprecated).

I was actually thinking about static methods file.text() and file.data()
which seem nicely self descriptive, if a little bit longer.

-Barry




Re: [Python-Dev] str object going in Py3K

2006-02-14 Thread Alex Martelli

On 2/14/06, Just van Rossum <[EMAIL PROTECTED]> wrote:
   ...
> Maybe it's even better to use opentext() AND openbinary(), and deprecate
> plain open(). We could even introduce them at the same time as bytes()
> (and leave the open() deprecation for 3.0).

What about shorter names, such as 'text' instead of 'opentext' and
'data' instead of 'openbinary'?  By eschewing the 'open' prefix we
might make it easy to eventually migrate off it. Maybe text and data
could be two subclasses of file, with file remaining initially as it
is (and perhaps becoming an abstract-only baseclass at the time 'open'
is deprecated).

In real life, people do all the time use 'open' inappropriately (on
non-text files on Windows): one of the most frequent tasks on
python-help has to do with diagnosing that this is what happened and
suggest the addition of an explicit 'rb' or 'wb' argument.  This
unending chore, in particular, makes me very wary of forever keeping
open to mean "open this _text_ file".

Alex

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Joe Smith


"Guido van Rossum" <[EMAIL PROTECTED]> wrote in message 
news:[EMAIL PROTECTED]
> In private email, Phillip Eby suggested to add these things to the
> 2.5 standard library:
>
> bdist_deb, bdist_msi, and friends
>
> He explained them as follows:
>
> """
> bdist_deb makes .deb files (packages for Debian-based Linux distros, like
> Ubuntu).  bdist_msi makes .msi installers for Windows (it's by Martin v.
> Loewis).  Marc Lemburg proposed on the distutils-sig that these and various
> other implemented bdist_* formats (other than bdist_egg) be included in the
> next Python release, and there was no opposition there that I recall.
> """
>

I don't like the idea of bdist_deb very much.
The idea behind the Debian packaging system is that, unlike with RPM and 
Windows, package management should be clean.

Windows and RPM are known for major dependency problems, letting packages 
damage each other, having packages that do not uninstall cleanly (i.e. 
packages that leave junk all over the place) and generally messing the system 
up quite badly over time, so that the OS is usually removed and 
re-installed periodically.

The Debian style system attempts to overcome these deficiencies, and 
generally does a decent job with it. The problem is that this can really 
only work if packages are well maintained, and adhere to a set of policies 
that help to further mitigate these problems. Even with all of that, 
packages from one debian based distribution may well cause problems with a 
different one. For that reason it is quite rare to see .debs distributed by 
parties other than those directly involved with a Debian-based distribution, 
and even then they are normally targeted specifically at one distribution. 
Making it easy to generate .debs of python modules will likely result in a 
noticeable increase in the number of .debs that do not target a specific 
distribution and/or do not follow the policies of that distribution.

So basically what I am saying is that such a system has a pretty good chance 
of resulting in debs that mess up users' systems, and that is not good. I'm 
not saying don't do it, but if it is included in the standard library, 
proceed with caution! 



Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Thomas Wouters

On Tue, Feb 14, 2006 at 11:16:32AM -0800, Guido van Rossum wrote:

> Well, just like Java, if you have pure Python code, why should a
> developer have to duplicate the busy-work of creating distributions
> for different platforms? (Especially since there are so many different
> target platforms -- RPM, .deb, Windows, MSI, Mac, fink, and what have
> you -- I'm no expert but ISTM there are too many!)

Actually, that's where distutils and bdist_* come in. Mr. Random Developer
writes a regular distutils setup.py, and I can install the latest,
not-quite-in-apt version by doing 'setup.py bdist_deb' and installing the
resulting .deb. Very convenient for both parties ;)

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!

Re: [Python-Dev] [Tutor] nice()

2006-02-14 Thread Crutcher Dunnavant

On 2/14/06, Michael Walter <[EMAIL PROTECTED]> wrote:
> It doesn't seem to me that math.nice has an obvious meaning.

I don't disagree, I think math.nice is a terrible name. I was
objecting to the desire to try to come up with interesting, different
names in every module namespace.

> Regards,
> Michael
>
> On 2/14/06, Crutcher Dunnavant <[EMAIL PROTECTED]> wrote:
> > On 2/12/06, Alan Gauld <[EMAIL PROTECTED]> wrote:
> > > >> However I do dislike the name nice() - there is already a nice() in the
> > > >> os module with a fairly well understood function. But I'm sure some
> > >
> > > > Presumably it would be located somewhere like the math module.
> > >
> > > For sure, but let's avoid as many name clashes as we can.
> > > Python is very good at managing namespaces but there are still a
> > > lot of folks who favour the
> > >
> > > from x import *
> > >
> > > mode of working.
> >
> > Yes, and there are people who insist on drinking and driving, that
> > doesn't mean cars should be designed with that as a motivating
> > assumption. There are just too many places where you are going to get
> > name clashes, where something which is _obvious_ in one context will
> > have a different (and _obvious_) meaning in another. Let's just keep
> > the namespaces clean, and not worry about inter-module conflicts.
> >
> > --
> > Crutcher Dunnavant <[EMAIL PROTECTED]>
> > littlelanguages.com
> > monket.samedi-studios.com
> >
>


--
Crutcher Dunnavant <[EMAIL PROTECTED]>
littlelanguages.com
monket.samedi-studios.com

Re: [Python-Dev] str object going in Py3K

2006-02-14 Thread Just van Rossum

Guido van Rossum wrote:

> > what will
> > ``open(filename).read()`` return ?
> 
> Since you didn't specify an open mode, it'll open it as a text file
> using some default encoding (or perhaps it can guess the encoding from
> file metadata -- this is all OS specific). So it'll return a string.
> 
> If you open the file in binary mode, however, read() will return a
> bytes object. I'm currently considering whether we should have a
> single open() function which returns different types of objects
> depending on a string parameter's value, or whether it makes more
> sense to have different functions, e.g. open() for text files and
> openbinary() for binary files. I believe Fredrik Lundh wants open() to
> use binary mode and opentext() for text files, but that seems
> backwards -- surely text files are more commonly used, and surely the
> most common operation should have the shorter name -- call it the
> Huffman Principle.

+1 for two functions.

My choice would be open() for binary and opentext() for text. I don't
find that backwards at all: the text function is going to be more
different from the current open() function than the binary function
would be since in many ways the str type is closer to bytes than to
unicode.

Maybe it's even better to use opentext() AND openbinary(), and deprecate
plain open(). We could even introduce them at the same time as bytes()
(and leave the open() deprecation for 3.0).

Just

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Phillip J. Eby

(Disclaimer: I'm not currently promoting the addition of bdist_egg or any 
egg-specific features for the 2.5 timeframe, but neither am I 
opposed.  This message is just to clarify a few points and questions under 
discussion, not to advocate a particular outcome.  If you read this and 
think you see arguments for *doing* anything, you're projecting your own 
conclusions where there is only analysis.)

At 11:16 AM 2/14/2006 -0800, Guido van Rossum wrote:
>On 2/13/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> > I'm actually opposed to bdist_egg, from a conceptual point of view.
> > I think it is wrong if Python creates its own packaging format
> > (just as it was wrong that Java created jar files - but they are
> > without deployment procedures even today).
>
>I think Jars are a lower-level thing than what we're talking about
>here; they're no different than shared libraries, and for an
>architecture that has its own bytecode and toolchain it only makes
>sense to invent its own cross-platform shared library format
>(especially given the "deploy anywhere" slogan).

Java, however, layers many things atop jars, including resources (files, 
images, messages, etc.) and metadata (manifests, deployment descriptors, 
etc.).  Eggs are the same.

To think that jars or eggs are a "packaging format" is a conceptual error 
if by "packaging format" you're equating them with .rpm, .deb, .msi, 
etc.  It is merely a convenient side benefit that .jar files and .egg files 
are convenient transport mechanisms for what's inside them - the jar or 
egg.  Jars and eggs are conceptual entities independent of the distribution 
format, and in the case of eggs there are two other formats (.egg directory 
and .egg-info tags) that can be used to express the conceptual entity.

> > The burden should be
> > on developer's side, for creating packages for the various systems,
> > not on the users side, when each software comes with its own
> > deployment infrastructure.
>
>Well, just like Java, if you have pure Python code, why should a
>developer have to duplicate the busy-work of creating distributions
>for different platforms? (Especially since there are so many different
>target platforms -- RPM, .deb, Windows, MSI, Mac, fink, and what have
>you -- I'm no expert but ISTM there are too many!)

Indeed.  Placing the burden on the developer's side simply means that it 
doesn't happen until volunteers pick it up, which happens slowly and only 
for "popular enough" packages.  Which means that as a practical matter, 
developers cannot release packages that depend on other packages without 
committing to some small set of target platforms and packaging systems -- 
the situation that setuptools was created to help change.

> > OTOH, users are fond of eggs, for reasons that I haven't yet
> > understood.
>
>I'm neutral on them; to be honest I don't even understand the
>difference between eggs and setuptools yet. :-)

Eggs are a way of associating metadata and resources with installed Python 
packages.  ".egg" is a zip or directory file layout that is one 
implementation of this concept.

Setuptools is a set of distutils enhancements that make it easier to build, 
test, distribute and deploy eggs, including the pkg_resources module (egg 
runtime support) and the easy_install package manager.

>  I imagine that users
>don't particularly care about eggs, but do care about the ease of use
>of the tools around them, i.e. ez_setup.

And developers of course also care about not having to create those myriad 
installation formats, for platforms they may not even have.  :)  They also 
care about being able to specify dependencies reliably, which rules out 
entire classes of support issues and debugging.  It actually makes reuse of 
Python packages practical *without* unnecessarily tying the result to just 
one of the myriad platforms that Python runs on.  Some developers also like 
the plugin features, the ability to easily get data from their package 
directories, etc.

(Setuptools also offers a lot of creature comforts that the distutils 
doesn't, and some of those conveniences depend on eggs, but others do not.)


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Neil Schemenauer

On Mon, Feb 13, 2006 at 08:07:49PM -0800, Guido van Rossum wrote:
> On 2/13/06, Neil Schemenauer <[EMAIL PROTECTED]> wrote:
> > "\x80".encode('latin-1')
> 
> But in 2.5 we can't change that to return a bytes object without
> creating HUGE incompatibilities.

People could spell it bytes(s.encode('latin-1')) in order to make it
work in 2.X.  That spelling would provide a way of ensuring the type
of the return value.

> You missed the part where I said that introducing the bytes type
> *without* a literal seems to be a good first step. A new type, even
> built-in, is much less drastic than a new literal (which requires
> lexer and parser support in addition to everything else).

Are you concerned about the implementation effort?  If so, I don't
think that's justified since adding a new string prefix should be
pretty straightforward (relative to rest of the effort involved).
Are you comfortable with the proposed syntax?

  Neil

Re: [Python-Dev] bdist_* to stdlib?

2006-02-14 Thread Guido van Rossum

On 2/13/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> I'm actually opposed to bdist_egg, from a conceptual point of view.
> I think it is wrong if Python creates its own packaging format
> (just as it was wrong that Java created jar files - but they are
> without deployment procedures even today).

I think Jars are a lower-level thing than what we're talking about
here; they're no different than shared libraries, and for an
architecture that has its own bytecode and toolchain it only makes
sense to invent its own cross-platform shared library format
(especially given the "deploy anywhere" slogan).

> The burden should be
> on developer's side, for creating packages for the various systems,
> not on the users side, when each software comes with its own
> deployment infrastructure.

Well, just like Java, if you have pure Python code, why should a
developer have to duplicate the busy-work of creating distributions
for different platforms? (Especially since there are so many different
target platforms -- RPM, .deb, Windows, MSI, Mac, fink, and what have
you -- I'm no expert but ISTM there are too many!)

> OTOH, users are fond of eggs, for reasons that I haven't yet
> understood.

I'm neutral on them; to be honest I don't even understand the
difference between eggs and setuptools yet. :-) I imagine that users
don't particularly care about eggs, but do care about the ease of use
of the tools around them, i.e. ez_setup.

> From a release management point of view, I would still like to
> make another bdist_msi release before contributing it to Python.

Please go ahead.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] str object going in Py3K

2006-02-14 Thread Guido van Rossum

On 2/14/06, Fuzzyman <[EMAIL PROTECTED]> wrote:
> In Python 3K, when the string data-type has gone,

Technically it won't be gone; str will mean what it already means in
Jython and IronPython (for which CPython uses unicode in 2.x).

> what will
> ``open(filename).read()`` return ?

Since you didn't specify an open mode, it'll open it as a text file
using some default encoding (or perhaps it can guess the encoding from
file metadata -- this is all OS specific). So it'll return a string.

If you open the file in binary mode, however, read() will return a
bytes object. I'm currently considering whether we should have a
single open() function which returns different types of objects
depending on a string parameter's value, or whether it makes more
sense to have different functions, e.g. open() for text files and
openbinary() for binary files. I believe Fredrik Lundh wants open() to
use binary mode and opentext() for text files, but that seems
backwards -- surely text files are more commonly used, and surely the
most common operation should have the shorter name -- call it the
Huffman Principle.
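
For reference, Python 3 as eventually released kept a single open() and lets the mode string select the return type. The sketch below shows that released behaviour, not the openbinary()/opentext() alternatives weighed above; the file name and sample text are made up for illustration.

```python
import os
import tempfile

# Write a small UTF-8 file, then read it back in text and in binary mode.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write("caf\u00e9\n".encode("utf-8"))

    # Text mode (the default): read() returns str, decoded with `encoding`.
    with open(path, encoding="utf-8") as f:
        text = f.read()

    # Binary mode: read() returns bytes, with no decoding at all.
    with open(path, "rb") as f:
        raw = f.read()

print(type(text).__name__, type(raw).__name__)
```

Passing the encoding explicitly avoids depending on the OS-specific default mentioned above.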

> Will the object returned have a
> ``decode`` method, to coerce to a unicode string ?

No, the object returned will *be* a (unicode) string.

But a bytes object (returned by a binary open operation) will have a
decode() method.

> Also, what datatype will ``u'some string'.encode('ascii')`` return ?

It will be a syntax error (u"..." will be illegal).

The str.encode() method will return a bytes object (if the design goes
as planned -- none of this is set in stone yet).
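
A minimal sketch of the encode/decode split described here, as it behaves in released Python 3 (the thread itself notes the design was not yet fixed, so treat this as hindsight rather than as the plan under discussion):

```python
# str.encode() produces bytes; bytes.decode() produces str.
encoded = "some string".encode("ascii")
decoded = encoded.decode("ascii")

# In Python 3 the text type has no decode() method, and bytes has no encode().
str_has_decode = hasattr("some string", "decode")
bytes_has_encode = hasattr(encoded, "encode")

print(type(encoded).__name__, type(decoded).__name__)
```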

> I assume that when the ``bytes`` datatype is implemented, we will be
> able to do ``open(filename, 'wb').write(bytes(somedata))`` ? Hmmm... I
> probably ought to read the bytes PEP and the Py3k one...

Sort of (except perhaps we'd be using openbinary(filename, 'w')).
Perhaps write(somedata) should automatically coerce the data to bytes?

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread James Y Knight

On Feb 14, 2006, at 11:25 AM, Phillip J. Eby wrote:
> At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:
>> I like it, it makes sense. Unicode strings are simply not allowed as
>> arguments to the byte constructor. Thinking about it, why would it be
>> otherwise? And if you're mixing str-strings and unicode-strings, that
>> means the str-strings you're sometimes giving are actually not byte
>> strings, but character strings anyhow, so you should be encoding
>> those too. bytes(s_or_U.encode('utf-8')) is a perfectly good  
>> spelling.
> Actually, I think you mean:
>
>     if isinstance(s_or_U, str):
>         s_or_U = s_or_U.decode('utf-8')
>
>     b = bytes(s_or_U.encode('utf-8'))
>
> Or maybe:
>
>     if isinstance(s_or_U, unicode):
>         s_or_U = s_or_U.encode('utf-8')
>
>     b = bytes(s_or_U)
>
> Which is why I proposed that the boilerplate logic get moved *into*  
> the bytes constructor.  I think this use case is going to be common  
> in today's Python, but in truth I'm not as sure what bytes() will  
> get used *for* in today's Python.  I'm probably overprojecting  
> based on the need to use str objects now, but bytes aren't going to  
> be a replacement for str for a good while anyway.

I most certainly *did not* mean that. If you are mixing together str  
and unicode instances, the str instances _must be_ in the default  
encoding (ascii). Otherwise, you are bound for failure anyhow, e.g.  
''.join(['\x95', u'1']). Str is used for two things right now: 1) a  
byte string. 2) a unicode string restricted to 7bit ASCII. These two  
uses are separate and you cannot mix them without causing disaster.

You've created an interface which can take either a utf8 byte-string,  
or unicode character string. But that's wrong and can only cause  
problems. It should take either an encoded bytestring, or a unicode  
character string. Not both. If it takes a unicode character string,  
there are two ways of spelling that in current python: a "str" object  
with only ASCII in it, or a "unicode" object with arbitrary  
characters in it. bytes(s_or_U.encode('utf-8')) works correctly with  
both.
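
The point that byte strings and character strings must not be mixed can be sketched with Python 3 semantics, where the two are separate types. This is an illustration under that assumption; it is not the Python 2.x behaviour being debated, where the failure only shows up for non-ASCII data.

```python
# Mixing bytes and text fails immediately in Python 3, whatever the content.
mixing_error = None
try:
    b"\x95" + "1"
except TypeError as exc:
    mixing_error = exc

# Encoding the text explicitly makes the combination well defined.
explicit = b"\x95" + "1".encode("utf-8")

print(type(mixing_error).__name__, explicit)
```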

James

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread James Y Knight

On Feb 14, 2006, at 11:47 AM, M.-A. Lemburg wrote:
> The above approach would basically remove the possibility to easily
> create bytes() from literals in Py3k, since literals in Py3k create
> Unicode objects, e.g. bytes("123") would not work in Py3k.

That is true. And I think that is correct. There should be b"string"  
syntax.

> It's hard to imagine how you'd provide a decent upgrade path
> for bytes() if you introduce the above semantics in Py2.x.
>
> People would start writing bytes("123") in Py2.x and expect
> it to also work in Py3k, which it wouldn't.

Agreed, it won't work.

> To prevent this, you'd have to rule out bytes() construction
> from strings altogether, which doesn't look like a viable
> option either.

I don't think you have to do that, you just have to provide b"string".

I'd like to point out that the previous proposal had the same issue:

On Feb 13, 2006, at 8:11 PM, Guido van Rossum wrote:
> On 2/13/06, James Y Knight <[EMAIL PROTECTED]> wrote:
>> In py3k, when the str object is eliminated, then what do you have?
>> Perhaps
>> - bytes("\x80"), you get an error, encoding is required. There is no
>> such thing as "default encoding" anymore, as there's no str object.
>> - bytes("\x80", encoding="latin-1"), you get a bytestring with a
>> single byte of value 0x80.
>>
>
> Yes to both again.
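
Both answers match what released Python 3 does, and the b"..." literal asked for above exists there as well. A short sketch, assuming a Python 3 interpreter:

```python
# bytes() from text without an encoding is an error.
missing_encoding_error = None
try:
    bytes("\x80")
except TypeError as exc:
    missing_encoding_error = exc

# With an explicit encoding, U+0080 becomes the single byte 0x80 in Latin-1.
one_byte = bytes("\x80", encoding="latin-1")

# The bytes literal spells the same value directly.
literal = b"\x80"

print(type(missing_encoding_error).__name__, one_byte == literal)
```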

James

Re: [Python-Dev] [Tutor] nice()

2006-02-14 Thread Michael Walter

It doesn't seem to me that math.nice has an obvious meaning.

Regards,
Michael

On 2/14/06, Crutcher Dunnavant <[EMAIL PROTECTED]> wrote:
> On 2/12/06, Alan Gauld <[EMAIL PROTECTED]> wrote:
> > >> However I do dislike the name nice() - there is already a nice() in the
> > >> os module with a fairly well understood function. But I'm sure some
> >
> > > Presumably it would be located somewhere like the math module.
> >
> > For sure, but let's avoid as many name clashes as we can.
> > Python is very good at managing namespaces but there are still a
> > lot of folks who favour the
> >
> > from x import *
> >
> > mode of working.
>
> Yes, and there are people who insist on drinking and driving, that
> doesn't mean cars should be designed with that as a motivating
> assumption. There are just too many places where you are going to get
> name clashes, where something which is _obvious_ in one context will
> have a different (and _obvious_) meaning in another. Let's just keep
> the namespaces clean, and not worry about inter-module conflicts.
>
> --
> Crutcher Dunnavant <[EMAIL PROTECTED]>
> littlelanguages.com
> monket.samedi-studios.com
>

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread M.-A. Lemburg

Guido van Rossum wrote:
> On 2/13/06, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
>> Guido van Rossum wrote:
>>> It'd be cruel and unusual punishment though to have to write
>>>
>>>   bytes("abc", "Latin-1")
>>>
>>> I propose that the default encoding (for basestring instances) ought
>>> to be "ascii" just like everywhere else. (Meaning, it should really be
>>> the system default encoding, which defaults to "ascii" and is
>>> intentionally hard to change.)
>> We're talking about Py3k here: "abc" will be a Unicode string,
>> so why restrict the conversion to 7 bits when you can have 8 bits
>> without any conversion problems ?
> 
> As Phillip guessed, I was indeed thinking about introducing bytes()
> sooner than that, perhaps even in 2.5 (though I don't want anything
> rushed).

Hmm, that is probably going to be too early. As the thread shows
there are lots of things to take into account, esp. since if you
plan to introduce bytes() in 2.x, the upgrade path to 3.x would
have to be carefully planned. Otherwise, we end up introducing
a feature which is meant to prepare for 3.x and then we end up
causing breakage when the move is finally implemented.

> Even in Py3k though, the encoding issue stands -- what if the file
> encoding is Unicode? Then using Latin-1 to encode bytes by default
> might not be what the user expected. Or what if the file encoding is
> something totally different? (Cyrillic, Greek, Japanese, Klingon.)
> Anything default but ASCII isn't going to work as expected. ASCII
> isn't going to work as expected either, but it will complain loudly
> (by throwing a UnicodeError) whenever you try it, rather than causing
> subtle bugs later.

I think there's a misunderstanding here: in Py3k, all "string"
literals will be converted from the source code encoding to
Unicode. There are no ambiguities - a Klingon character will still
map to the same ordinal used to create the byte content regardless
of whether the source file is encoded in UTF-8, UTF-16 or
some Klingon charset (are there any ?).

Furthermore, by restricting to ASCII you'd also rule out hex escapes
which seem to be the natural choice for presenting binary data in
literals - the Unicode representation would then only be an
implementation detail of the way Python treats "string" literals
and a user would certainly expect to find e.g. \x88 in the bytes object
if she writes bytes('\x88').
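
The trade-off between the two defaults can be sketched with explicit encodings in Python 3. This is only an illustration of the three codecs on the character U+0088; it does not settle which default bytes() ought to use.

```python
text = "\x88"  # the single character U+0088

# ASCII refuses anything above 0x7F and fails loudly.
ascii_error = None
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    ascii_error = exc

# Latin-1 maps code points 0..255 straight to the same byte values.
latin1 = text.encode("latin-1")

# UTF-8 needs two bytes for the same character.
utf8 = text.encode("utf-8")

print(type(ascii_error).__name__, latin1, utf8)
```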

But maybe you have something different in mind... I'm talking
about ways to create bytes() in Py3k using "string" literals.

>> While we're at it: I'd suggest that we remove the auto-conversion
>> from bytes to Unicode in Py3k and the default encoding along with
>> it.
> 
> I'm not sure which auto-conversion you're talking about, since there
> is no bytes type yet. If you're talking about the auto-conversion from
> str to unicode: the bytes type should not be assumed to have *any*
> properties that the current str type has, and that includes
> auto-conversion.

I was talking about the automatic conversion of 8-bit strings to
Unicode - which was a key feature to make the introduction of
Unicode less painful, but will no longer be necessary in Py3k.

>> In Py3k the standard lib will have to be Unicode compatible
>> anyway and string parser markers like "s#" will have to go away
>> as well, so there's not much need for this anymore.
>>
>> (Maybe a bit radical, but I guess that's what Py3k is meant for.)
> 
> Right.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] [Tutor] nice()

2006-02-14 Thread Crutcher Dunnavant

On 2/12/06, Alan Gauld <[EMAIL PROTECTED]> wrote:
> >> However I do dislike the name nice() - there is already a nice() in the
> >> os module with a fairly well understood function. But I'm sure some
>
> > Presumably it would be located somewhere like the math module.
>
> For sure, but let's avoid as many name clashes as we can.
> Python is very good at managing namespaces but there are still a
> lot of folks who favour the
>
> from x import *
>
> mode of working.

Yes, and there are people who insist on drinking and driving, that
doesn't mean cars should be designed with that as a motivating
assumption. There are just too many places where you are going to get
name clashes, where something which is _obvious_ in one context will
have a different (and _obvious_) meaning in another. Let's just keep
the namespaces clean, and not worry about inter-module conflicts.
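
A rough sketch of the kind of clash at issue, using two standard modules that both define sqrt. The star-imports are simulated by copying public names into a dict (the name `flat` is made up for this example) so that the snippet can run in any scope.

```python
import cmath
import math

# Simulate `from math import *` followed by `from cmath import *`:
# neither module defines __all__, so a star-import takes every public name.
flat = {}
for module in (math, cmath):
    public = {name: obj for name, obj in vars(module).items()
              if not name.startswith("_")}
    flat.update(public)

# Names defined by both modules; the later import silently wins for each.
shared = sorted(name for name in vars(math)
                if name in vars(cmath) and not name.startswith("_"))

# Qualified access keeps both meanings available side by side.
real_root = math.sqrt(4.0)
complex_root = cmath.sqrt(-1)

print(shared)
```

With qualified names, math.sqrt and cmath.sqrt coexist; only the flattened namespace loses one of them.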

--
Crutcher Dunnavant <[EMAIL PROTECTED]>
littlelanguages.com
monket.samedi-studios.com

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Josiah Carlson


James Y Knight <[EMAIL PROTECTED]> wrote:
> I like it, it makes sense. Unicode strings are simply not allowed as  
> arguments to the byte constructor. Thinking about it, why would it be  
> otherwise? And if you're mixing str-strings and unicode-strings, that  
> means the str-strings you're sometimes giving are actually not byte  
> strings, but character strings anyhow, so you should be encoding  
> those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

I also like the removal of the encoding...

> Kill the encoding argument, and you're left with:
> 
> Python2.X:
> - bytes(bytes_object) -> copy constructor
> - bytes(str_object) -> copy the bytes from the str to the bytes object
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,  
> error on overflow
> 
> Python3.X removes str, and most APIs that did return str return bytes  
> instead. Now all you have is:
> - bytes(bytes_object) -> copy constructor
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,  
> error on overflow

What's great is that this already works:

>>> import array
>>> array.array('b', [1,2,3])
array('b', [1, 2, 3])
>>> array.array('b', "hello")
array('b', [104, 101, 108, 108, 111])
>>> array.array('b', u"hello")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: array initializer must be list or string
>>> array.array('b', [150])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: signed char is greater than maximum
>>> array.array('B', [150])
array('B', [150])
>>> array.array('B', [350])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: unsigned byte integer is greater than maximum


And out of the deal we can get both signed and unsigned ints.

Re: Adam Olsen
> I'm starting to wonder, do we really need anything fancy?  Wouldn't it
> be sufficient to have a way to compactly store 8-bit integers?

It already exists.  It could just use another interface.  The buffer
interface offers any array the ability to return strings.  That may have
to change to return bytes objects in Py3k.
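
For comparison, a sketch of the same array checks under Python 3, where the initializer must be bytes rather than text and tobytes() returns a bytes object. This reflects released behaviour, not anything decided in this thread.

```python
import array

# Unsigned bytes built from a bytes object.
unsigned = array.array("B", b"hello")
values = unsigned.tolist()
round_trip = unsigned.tobytes()

# Text is rejected as an initializer for a byte array.
text_error = None
try:
    array.array("B", "hello")
except TypeError as exc:
    text_error = exc

# Out-of-range values still overflow, for signed and unsigned alike.
overflow_errors = []
for typecode, value in (("b", 150), ("B", 350)):
    try:
        array.array(typecode, [value])
    except OverflowError as exc:
        overflow_errors.append(exc)

print(values, round_trip)
```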

 - Josiah


Re: [Python-Dev] [Tutor] nice()

2006-02-14 Thread Alan Gauld

>> However I do dislike the name nice() - there is already a nice() in the
>> os module with a fairly well understood function. But I'm sure some

> Presumably it would be located somewhere like the math module.

For sure, but let's avoid as many name clashes as we can.
Python is very good at managing namespaces but there are still a 
lot of folks who favour the 

from x import * 

mode of working.

Alan G.

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread M.-A. Lemburg

James Y Knight wrote:
> Kill the encoding argument, and you're left with:
> 
> Python2.X:
> - bytes(bytes_object) -> copy constructor
> - bytes(str_object) -> copy the bytes from the str to the bytes object
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,  
> error on overflow
> 
> Python3.X removes str, and most APIs that did return str return bytes  
> instead. Now all you have is:
> - bytes(bytes_object) -> copy constructor
> - bytes(sequence_of_ints) -> make bytes with the values of the ints,  
> error on overflow
> 
> Nice and simple.

Albeit, too simple.

The above approach would basically remove the possibility to easily
create bytes() from literals in Py3k, since literals in Py3k create
Unicode objects, e.g. bytes("123") would not work in Py3k.

It's hard to imagine how you'd provide a decent upgrade path
for bytes() if you introduce the above semantics in Py2.x.

People would start writing bytes("123") in Py2.x and expect
it to also work in Py3k, which it wouldn't.

To prevent this, you'd have to rule out bytes() construction
from strings altogether, which doesn't look like a viable
option either.

-- 
Marc-Andre Lemburg
eGenix.com


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Phillip J. Eby

At 11:08 AM 2/14/2006 -0500, James Y Knight wrote:

>On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:
>
>>Phillip J. Eby wrote:
>>>I was just pointing out that since byte strings are bytes by
>>>definition,
>>>then simply putting those bytes in a bytes() object doesn't alter the
>>>existing encoding.  So, using latin-1 when converting a string to
>>>bytes
>>>actually seems like the One Obvious Way to do it.
>>
>>This is a misconception. In Python 2.x, the type str already *is* a
>>bytes type. So if S is an instance of 2.x str, bytes(S) does not need
>>to do any conversion. You don't need to assume it is latin-1: it's
>>already bytes.
>>
>>>In fact, the 'encoding' argument seems useless in the case of str
>>>objects,
>>>and it seems it should default to latin-1 for unicode objects.
>>
>>I agree with the former, but not with the latter. There shouldn't be a
>>conversion of Unicode objects to bytes at all. If you want bytes from
>>a Unicode string U, write
>>
>>   bytes(U.encode(encoding))
>
>I like it, it makes sense. Unicode strings are simply not allowed as
>arguments to the byte constructor. Thinking about it, why would it be
>otherwise? And if you're mixing str-strings and unicode-strings, that
>means the str-strings you're sometimes giving are actually not byte
>strings, but character strings anyhow, so you should be encoding
>those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

Actually, I think you mean:

    if isinstance(s_or_U, str):
        s_or_U = s_or_U.decode('utf-8')

    b = bytes(s_or_U.encode('utf-8'))

Or maybe:

    if isinstance(s_or_U, unicode):
        s_or_U = s_or_U.encode('utf-8')

    b = bytes(s_or_U)

Which is why I proposed that the boilerplate logic get moved *into* the 
bytes constructor.  I think this use case is going to be common in today's 
Python, but in truth I'm not as sure what bytes() will get used *for* in 
today's Python.  I'm probably overprojecting based on the need to use str 
objects now, but bytes aren't going to be a replacement for str for a good 
while anyway.
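
The boilerplate in question could be wrapped in a small helper. The sketch below uses Python 3 types, and the name to_bytes and its utf-8 default are assumptions made for illustration; it is not an API proposed in this thread.

```python
def to_bytes(value, encoding="utf-8"):
    """Return `value` as bytes, encoding it first if it is text.

    Hypothetical helper: the name and the utf-8 default are illustrative.
    """
    if isinstance(value, str):
        # Text has to be encoded explicitly.
        return value.encode(encoding)
    if isinstance(value, (bytes, bytearray)):
        # Already binary: copy the bytes through unchanged.
        return bytes(value)
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


from_text = to_bytes("caf\u00e9")
from_binary = to_bytes(b"\x00\x01")
print(from_text, from_binary)
```

Whether such logic belongs inside the constructor or in caller code is exactly the disagreement in the surrounding messages.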


>Kill the encoding argument, and you're left with:
>
>Python2.X:
>- bytes(bytes_object) -> copy constructor
>- bytes(str_object) -> copy the bytes from the str to the bytes object
>- bytes(sequence_of_ints) -> make bytes with the values of the ints,
>error on overflow
>
>Python3.X removes str, and most APIs that did return str return bytes
>instead. Now all you have is:
>- bytes(bytes_object) -> copy constructor
>- bytes(sequence_of_ints) -> make bytes with the values of the ints,
>error on overflow
>
>Nice and simple.

I could certainly live with that approach, and it certainly rules out all 
the "when does the encoding argument apply and when should it be an error 
to pass it" questions.  :)


Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread James Y Knight


On Feb 14, 2006, at 1:52 AM, Martin v. Löwis wrote:

> Phillip J. Eby wrote:
>> I was just pointing out that since byte strings are bytes by  
>> definition,
>> then simply putting those bytes in a bytes() object doesn't alter the
>> existing encoding.  So, using latin-1 when converting a string to  
>> bytes
>> actually seems like the One Obvious Way to do it.
>
> This is a misconception. In Python 2.x, the type str already *is* a
> bytes type. So if S is an instance of 2.x str, bytes(S) does not need
> to do any conversion. You don't need to assume it is latin-1: it's
> already bytes.
>
>> In fact, the 'encoding' argument seems useless in the case of str  
>> objects,
>> and it seems it should default to latin-1 for unicode objects.
>
> I agree with the former, but not with the latter. There shouldn't be a
> conversion of Unicode objects to bytes at all. If you want bytes from
> a Unicode string U, write
>
>   bytes(U.encode(encoding))

I like it, it makes sense. Unicode strings are simply not allowed as  
arguments to the byte constructor. Thinking about it, why would it be  
otherwise? And if you're mixing str-strings and unicode-strings, that  
means the str-strings you're sometimes giving are actually not byte  
strings, but character strings anyhow, so you should be encoding  
those too. bytes(s_or_U.encode('utf-8')) is a perfectly good spelling.

Kill the encoding argument, and you're left with:

Python2.X:
- bytes(bytes_object) -> copy constructor
- bytes(str_object) -> copy the bytes from the str to the bytes object
- bytes(sequence_of_ints) -> make bytes with the values of the ints,  
error on overflow

Python3.X removes str, and most APIs that did return str return bytes  
instead. Now all you have is:
- bytes(bytes_object) -> copy constructor
- bytes(sequence_of_ints) -> make bytes with the values of the ints,  
error on overflow

Nice and simple.
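
The constructor rules listed above can be sketched against Python 3's built-in `bytes`, used here only as a stand-in for the proposed type (an assumption; Python 3 as shipped still accepts an encoding argument for text, unlike this proposal).

```python
# Stand-in: Python 3's bytes, not the type as proposed in this thread.
copied = bytes(b"spam")                  # copy constructor
from_ints = bytes([115, 112, 97, 109])   # sequence of ints

try:
    bytes("spam")                        # text without an encoding
    text_rejected = False
except TypeError:
    text_rejected = True

# The spelling suggested above: encode first, then wrap.
encoded = bytes("sp\u00e4m".encode("utf-8"))
```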

James

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Barry Warsaw

On Feb 14, 2006, at 6:35 AM, Greg Ewing wrote:

> Barry Warsaw wrote:
>
>> This makes me think I want an unsigned byte type, which b[0] would
>> return.
>
> Come to think of it, this is something I don't
> remember seeing discussed. I've been thinking
> that bytes[i] would return an integer, but is
> the intention that it would return another bytes
> object?

A related question: what would bytes([104, 101, 108, 108, 111, 8004])  
return?  An exception hopefully.  I also think you'd want bytes([x  
for x in some_bytes_object]) to return an object equal to the original.
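
A minimal sketch of both expectations, written against the `bytes` type as it later shipped in Python 3 (an assumption relative to this thread, where the type is still only a proposal). It also shows what indexing returns there.

```python
# Assumes Python 3's built-in bytes.
values = [104, 101, 108, 108, 111, 8004]

try:
    bytes(values)
    overflow_error = None
except ValueError as exc:        # 8004 does not fit in one byte
    overflow_error = exc

original = bytes([104, 101, 108, 108, 111])
round_trip = bytes([x for x in original])   # iterating yields ints
first = original[0]                          # an int, not a bytes object
```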

-Barry


Re: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification

2006-02-14 Thread Jeremy Hylton

On 2/14/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Jeremy Hylton wrote:
> > The compiler in question is gcc and the warning can be turned off with
> > -Wno-write-strings.  I think we'd be better off leaving that option
> > on, though.  This warning will help me find places where I'm passing a
> > string literal to a function that does not take a const char*.  That's
> > valuable, not insensate.
>
> Hmm. I'd say this depends on what your reaction to the warning is.
> If you sprinkle const_casts in the code, nothing is gained.

Except for the Python APIs, we would declare the function as taking a
const char* if it took a const char*.  If the function legitimately takes
a char*, then you have to change the code to avoid a segfault.

> Perhaps there is some value in finding functions which ought to expect
> const char*. For that, occasional checks should be sufficient; I cannot
> see a point in having code permanently pass with that option. In
> particular not if you are interfacing with C libraries.

I don't understand what you mean:  I'm not sure what you mean by
"occasional checks" or "permanently pass".  The compiler flags are
always the same.

Jeremy

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Michael Hudson

Greg Ewing <[EMAIL PROTECTED]> writes:

> Guido van Rossum wrote:
>
>> There's also the consideration for APIs that, informally, accept
>> either a string or a sequence of objects.
>
> My preference these days is not to design APIs that
> way. It's never necessary and it avoids a lot of
> problems.

Oh yes.

Cheers,
mwh

-- 
  ZAPHOD:  Listen three eyes, don't try to outweird me, I get stranger
   things than you free with my breakfast cereal.
-- The Hitch-Hikers Guide to the Galaxy, Episode 7

Re: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification

2006-02-14 Thread Jeremy Hylton

On 2/14/06, M.-A. Lemburg <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
> > M.-A. Lemburg wrote:
> >>> It's the consequences:  nobody complains about tacking "const" on to a
> >>> former honest-to-God "char *" argument that was in fact not modified,
> >>> because that's not only helpful for C++ programmers, it's _harmless_
> >>> for all programmers.  For example, nobody could sanely object (and
> >>> nobody did :-)) to adding const to the attribute-name argument in
> >>> PyObject_SetAttrString().  Sticking to that creates no new problems
> >>> for anyone, so that's as far as I ever went.
> >>
> >> Well, it broke my C extensions... I now have this in my code:
> >>
> >> /* The keyword array changed to const char* in Python 2.5 */
> >> #if PY_VERSION_HEX >= 0x02050000
> >> # define Py_KEYWORDS_STRING_TYPE const char
> >> #else
> >> # define Py_KEYWORDS_STRING_TYPE char
> >> #endif
> >> ...
> >> static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
> >> ...
> >
> > You did not read Tim's message carefully enough. He wasn't talking
> > about PyArg_ParseTupleAndKeywords *at all*. He only talked about
> > changing char* arguments to const char*, e.g. in
> > PyObject_SetAttrString. Did that break your C extensions also?
>
> I did read Tim's post: sorry for phrasing the reply the way I did.
>
> I was referring to his statement "nobody complains about tacking "const"
> on to a former honest-to-God "char *" argument that was in fact not
> modified".
>
> Also: it's not me complaining, it's the compilers !

Tim was talking about adding const to a char* not adding const to a
char** (note the two stars).  The subsequent discussion has been about
the different way those are handled in C and C++ and a general
agreement that the "const char**" has been a bother for people.

Jeremy

Re: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification

2006-02-14 Thread Jack Jansen

Thanks to all for a rather insightful discussion, it's always fun to  
learn that after 28 years of C programming the language still has  
little corners that I know absolutely nothing about:-)

Practically speaking, though, I've adopted MAL's solution for the  
time being:

> /* The keyword array changed to const char* in Python 2.5 */
> #if PY_VERSION_HEX >= 0x02050000
> # define Py_KEYWORDS_STRING_TYPE const char
> #else
> # define Py_KEYWORDS_STRING_TYPE char
> #endif
> ...
> static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
> ...
> if (!PyArg_ParseTupleAndKeywords(args,kws,format,kwslist,&a1))
> goto onError;

At least this appears to work...
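
The guard depends on how PY_VERSION_HEX packs a version into 32 bits; the same value is visible from Python as `sys.hexversion`. A minimal sketch of that layout follows. The helper name `pack_version` is hypothetical.

```python
import sys

# Release-level nibbles used by PY_VERSION_HEX / sys.hexversion.
_LEVELS = {"alpha": 0xA, "beta": 0xB, "candidate": 0xC, "final": 0xF}

def pack_version(major, minor, micro=0, level="final", serial=0):
    """Hypothetical helper mirroring the PY_VERSION_HEX bit layout."""
    return ((major << 24) | (minor << 16) | (micro << 8)
            | (_LEVELS[level] << 4) | serial)

# Rebuild the running interpreter's hexversion from sys.version_info.
running = pack_version(*sys.version_info)
```

A full 32-bit constant is needed for the comparison to separate 2.4.x from 2.5, since the minor version sits in bits 16 to 23.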
--
Jack Jansen, <[EMAIL PROTECTED]>, http://www.cwi.nl/~jack
If I can't dance I don't want to be part of your revolution -- Emma  
Goldman



Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Adam Olsen

On 2/14/06, "Martin v. Löwis" <[EMAIL PROTECTED]> wrote:
> Adam Olsen wrote:
> > What would that imply for repr()?  To support eval(repr(x))
>
> I don't think eval(repr(x)) needs to be supported for the bytes
> type. However, if that is desirable, it should return something
> like
>
>   bytes([1,2,3])

I'm starting to wonder, do we really need anything fancy?  Wouldn't it
be sufficient to have a way to compactly store 8-bit integers?

In 2.x we could convert unicode like this:
bytes(ord(c) for c in u"It's...".encode('utf-8'))
u"It's...".byteencode('utf-8')  # Shortcut for above

In 3.0 it changes to:
"It's...".encode('utf-8')
u"It's...".byteencode('utf-8')  # Same as above, kept for compatibility

Passing a str or unicode directly to bytes() would be an error. 
repr(bytes(...)) would produce bytes([1,2,3]).

Probably need a __bytes__() method that print can call, or even better
a __print__(file) method[0].  The write() methods would of course have
to support bytes objects.

I realize it would be odd for the interactive interpreter to print them
as a list of ints by default:
>>> u"It's...".byteencode('utf-8')
[73, 116, 39, 115, 46, 46, 46]
But maybe it's time we stopped hiding the real nature of bytes from users?
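
A minimal sketch of the "bytes are just small integers" view, using Python 3's `bytes` as a stand-in (an assumption). `byteencode()` above is a proposal rather than an existing method, so plain `encode()` is used here.

```python
data = "It's...".encode("utf-8")   # bytes in Python 3
as_ints = list(data)               # the integer view of the same data
rebuilt = bytes(as_ints)           # and back again
```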

[0] By this I mean calling objects recursively and telling them what
file to print to, rather than getting a temporary string from them and
printing that.  I always wondered why you could do that from C
extensions but not from Python code.

--
Adam Olsen, aka Rhamphoryncus

Re: [Python-Dev] PEP for adding an sq_index slot so that any object, a or b, can be used in X[a:b] notation

2006-02-14 Thread Nick Coghlan

Guido van Rossum wrote:
> On 2/10/06, Mark Russell <[EMAIL PROTECTED]> wrote:
>> On 10 Feb 2006, at 12:45, Nick Coghlan wrote:
>>
>> An alternative would be to call it "__discrete__", as that is the key
>>
>> characteristic of an indexing type - it consists of a sequence of discrete
>>
>> values that can be isomorphically mapped to the integers.
>> Another alternative: __as_ordinal__.  Wikipedia describes ordinals as
>> "numbers used to denote the position in an ordered sequence" which seems a
>> pretty precise description of the intended result.  The "as_" prefix also
>> captures the idea that this should be a lossless conversion.
> 
> Aren't ordinals generally assumed to be non-negative? The numbers used
> as slice or sequence indices can be negative!

The other problem with 'ordinal' as a name is that the term already has a 
meaning in Python (what else would 'ord' be short for?).

I liked index from the start, but I thought we should put at least a bit of 
effort into seeing if we could come up with anything better. I don't really 
see any way that either 'discrete' or 'ordinal' can be said to qualify as 
better :)
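
For illustration, a minimal sketch of the hook under the `__index__` spelling, which is the name Python went on to adopt. The `Position` class is hypothetical.

```python
class Position:
    """Hypothetical integer-like type that is usable as an index."""

    def __init__(self, value):
        self.value = value

    def __index__(self):
        # Lossless conversion to a true int; negative values are allowed.
        return self.value

letters = "abcdef"
first_two = letters[Position(0):Position(2)]
last = letters[Position(-1)]
```

The negative index in the last line is the case that made "ordinal" an awkward name.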

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Nick Coghlan

Guido van Rossum wrote:
> In general I've come to appreciate that there are two ways of
> converting an object of type A to an object of type B: ask an A
> instance to convert itself to a B, or ask the type B to create a new
> instance from an A.

And the difference between the two isn't even always that clear cut. Sometimes 
you'll ask type B to create a new instance from an A, and then while you're 
not looking type B cheats and goes and asks the A instance to do it instead ;)
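
A minimal sketch of that delegation, using `float()` and `__float__` as the example pair. The `Celsius` class is hypothetical.

```python
calls = []

class Celsius:
    """Hypothetical type A that knows how to convert itself."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __float__(self):
        calls.append("Celsius.__float__")   # record that A did the work
        return float(self.degrees)

# We ask type B (float) for a new instance; it asks the A instance.
as_float = float(Celsius(21))
```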

Cheers,
Nick.

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---
 http://www.boredomandlaziness.org

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Greg Ewing

Barry Warsaw wrote:

> This makes me think I want an unsigned byte type, which b[0] would  
> return.

Come to think of it, this is something I don't
remember seeing discussed. I've been thinking
that bytes[i] would return an integer, but is
the intention that it would return another bytes
object?

Greg

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Greg Ewing

Guido van Rossum wrote:

> There's also the consideration for APIs that, informally, accept
> either a string or a sequence of objects.

My preference these days is not to design APIs that
way. It's never necessary and it avoids a lot of
problems.
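
A minimal sketch of one such problem: a string is itself a sequence of strings, so an API meant for sequences silently accepts it. The function `shout_all` is hypothetical.

```python
def shout_all(words):
    """Hypothetical API meant to take a sequence of strings."""
    return [w.upper() for w in words]

intended = shout_all(["spam", "eggs"])
surprise = shout_all("spam")   # a lone string is iterated per character
```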

Greg

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Greg Ewing

Guido van Rossum wrote:

> I also wonder if having a b"..." literal would just add more confusion
> -- bytes are not characters, but b"..." makes it appear as if they
> are.

I'm inclined to agree. Bytes objects are more likely to be used
for things which are *not* characters -- if they're characters,
they would be better kept in strings or char arrays.

+1 on any eventual bytes literal looking completely different
from a string literal.

Greg

Re: [Python-Dev] nice()

2006-02-14 Thread Greg Ewing

Smith wrote:

> computing the bin boundaries for a histogram
> where bins are a width of 0.1:
>
> >>> for i in range(20):
> ...  if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
> ...   print i,repr(i*.1),repr(i/10.),i*.1,i/10.

I don't see how that has any relevance to the way bin boundaries
would be used in practice, which is to say something like

   i = int(value / 0.1)
   bin[i] += 1 # modulo appropriate range checks

which doesn't require comparing floats for equality at all.
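
Spelled out as a runnable sketch, with the range checks made explicit. The function name `bin_counts` and the sample values are hypothetical, and `math.floor` is used instead of `int()` so that small negative values do not truncate into bin 0.

```python
import math

def bin_counts(values, width, nbins):
    """Count values into nbins bins of the given width, starting at 0."""
    counts = [0] * nbins
    for v in values:
        i = math.floor(v / width)   # bin index; no float equality involved
        if 0 <= i < nbins:          # the "appropriate range checks"
            counts[i] += 1
    return counts

counts = bin_counts([0.05, 0.12, 0.18, 1.95, -0.05, 2.5], 0.1, 20)
```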

> For, say, garden variety numbers that aren't full of garbage digits
> resulting from fp computation, the boundaries computed as 0.1*i are
> not going to agree with such simple numbers as 1.4 and 0.7.

Because the arithmetic is binary rather than decimal. But even using
decimal, you get the same sort of problems using a bin width of
1.0/3.0. The solution is to use an algorithm that isn't sensitive
to those problems, then it doesn't matter what base your arithmetic
is done in.
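
A minimal sketch of the point, assuming the `decimal` module's default context of 28 significant digits: binary floats miss on tenths, and decimal arithmetic misses in the same way on thirds.

```python
from decimal import Decimal

# Binary floating point: three tenths is not exactly 0.3.
binary_mismatch = (0.1 * 3 != 0.3)

# Decimal arithmetic: 1/3 is rounded, so three thirds is not exactly 1.
third = Decimal(1) / Decimal(3)
decimal_mismatch = (third * 3 != Decimal(1))
```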

> I understand that the above really is just a patch over the problem,
> but I'm wondering if it moves the problem far enough away that most
> users wouldn't have to worry about it.

No, it doesn't. The problems are not conveniently grouped together
in some place you can get away from; they're scattered all over the
place where you can stumble upon one at any time.

> So perhaps this brings us back to the original comment that "fp issues
> are a learning opportunity." They are. The question I have is "how
> soon do they need to run into them?" Is decreasing the likelihood that
> they will see the problem (but not eliminate it) a good thing for the
> python community or not?

I don't think you're doing anyone any favours by trying to protect
them from having to know about these things, because they *need* to
know about them if they're not to write algorithms that seem to
work fine on tests but mysteriously start producing garbage when
run on real data, possibly without it even being obvious that it is
garbage.

Greg

Re: [Python-Dev] str object going in Py3K

2006-02-14 Thread Fuzzyman

Guido van Rossum wrote:

> [snip..]
>
>>In py3k, when the str object is eliminated, then what do you have?
>>Perhaps
>>- bytes("\x80"), you get an error, encoding is required. There is no
>>such thing as "default encoding" anymore, as there's no str object.
>>- bytes("\x80", encoding="latin-1"), you get a bytestring with a
>>single byte of value 0x80.
>>
>>
>
>Yes to both again.
>
>  
>
*Slightly* related question. Sorry for the tangent.

In Python 3K, when the string data-type has gone, what will
``open(filename).read()`` return ? Will the object returned have a
``decode`` method, to coerce to a unicode string ?

Also, what datatype will ``u'some string'.encode('ascii')`` return ?

I assume that when the ``bytes`` datatype is implemented, we will be
able to do ``open(filename, 'wb').write(bytes(somedata))`` ? Hmmm... I
probably ought to read the bytes PEP and the Py3k one...
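
For reference, a minimal sketch of how these three questions come out in Python 3 as it eventually shipped (an assumption relative to this thread, where none of it was settled). The file name and contents are hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    encoded = "caf\u00e9".encode("utf-8")    # encode() returns bytes
    with open(path, "wb") as f:
        f.write(bytes(encoded))              # binary files accept bytes

    with open(path, "rb") as f:
        raw = f.read()                       # binary mode: bytes, has decode()

    with open(path, encoding="utf-8") as f:
        text = f.read()                      # text mode: already decoded
```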

Just curious...

All the best,

Michael Foord

>--
>--Guido van Rossum (home page: http://www.python.org/~guido/)
>
>  
>


Re: [Python-Dev] http://www.python.org/dev/doc/devel still available

2006-02-14 Thread Neal Norwitz

On 2/13/06, Fred L. Drake, Jr. <[EMAIL PROTECTED]> wrote:
> On Monday 13 February 2006 10:03, Georg Brandl wrote:
>  > The above docs are from August 2005 while docs.python.org/dev is current.
>  > Shouldn't the old docs be removed?
>
> I'm afraid I've generally been too busy to chime in much on this topic, but
> I've spent a bit of time thinking about it, and would like to keep on top of
> the issue still.

Fred,

While you are here, are you planning to do the doc releases for 2.5? 
You are tentatively listed in PEP 356.  (Technically it says TBD with
a ? next to your name.)

> The automatically-maintained version of the development docs is certainly
> preferrable to the manually-maintained-by-me version, and I've updated the
> link from www.python.org/doc/ to refer to that version for now.  However, I
> do have some concerns about how this is all structured still.

I think this was the quick hack I did.  I hope there are many
concerns. :-)  For example, if the doc build fails, ...  Hmmm, this
probably isn't a problem.  The doc won't be updated, but will still be
the last good version.  So if I send mail when the doc doesn't build,
then it might not be so bad.  Will have to test this.  I still need to
switch over the failure mails to go to python-checkins.  There are too
many right now though.  Unless people don't mind getting several
messages about refleaks every day?  Anyone?

> What I would also like to see is to have an automatically-updated version for
> each of the maintainer versions of Python, as well as the development trunk.
> That would mean two versions at this point (2.4.x, 2.5.x); only one of those
> is currently handled automatically.

That shouldn't be a problem.  See http://docs.python.org/dev/2.4/

n

Re: [Python-Dev] Baffled by PyArg_ParseTupleAndKeywords modification

2006-02-14 Thread M.-A. Lemburg

Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>>> It's the consequences:  nobody complains about tacking "const" on to a
>>> former honest-to-God "char *" argument that was in fact not modified,
>>> because that's not only helpful for C++ programmers, it's _harmless_
>>> for all programmers.  For example, nobody could sanely object (and
>>> nobody did :-)) to adding const to the attribute-name argument in
>>> PyObject_SetAttrString().  Sticking to that creates no new problems
>>> for anyone, so that's as far as I ever went.
>>
>> Well, it broke my C extensions... I now have this in my code:
>>
>> /* The keyword array changed to const char* in Python 2.5 */
>> #if PY_VERSION_HEX >= 0x02050000
>> # define Py_KEYWORDS_STRING_TYPE const char
>> #else
>> # define Py_KEYWORDS_STRING_TYPE char
>> #endif
>> ...
>> static Py_KEYWORDS_STRING_TYPE *kwslist[] = {"yada", NULL};
>> ...
> 
> You did not read Tim's message carefully enough. He wasn't talking
> about PyArg_ParseTupleAndKeywords *at all*. He only talked about
> changing char* arguments to const char*, e.g. in
> PyObject_SetAttrString. Did that break your C extensions also?

I did read Tim's post: sorry for phrasing the reply the way I did.

I was referring to his statement "nobody complains about tacking "const"
on to a former honest-to-God "char *" argument that was in fact not
modified".

Also: it's not me complaining, it's the compilers !

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Feb 14 2006)
>>> Python/Zope Consulting and Support ...http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/


::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,FreeBSD for free ! 

Re: [Python-Dev] PEP 332 revival in coordination with pep 349? [ Was:Re: release plan for 2.5 ?]

2006-02-14 Thread Thomas Wouters

On Mon, Feb 13, 2006 at 03:44:27PM -0800, Guido van Rossum wrote:

> But adding an encoding doesn't help. The str.encode() method always
> assumes that the string itself is ASCII-encoded, and that's not good
> enough:

> >>> "abc".encode("latin-1")
> 'abc'
> >>> "abc".decode("latin-1")
> u'abc'
> >>> "abc\xf0".decode("latin-1")
> u'abc\xf0'
> >>> "abc\xf0".encode("latin-1")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position
> 3: ordinal not in range(128)

These comments disturb me. I never really understood why (byte) strings grew
the 'encode' method, since 8-bit strings *are already encoded*, by their
very nature. I mean, I understand it's useful because Python does
non-unicode encodings like 'hex', but I don't really understand *why*. The
benefits don't seem to outweigh the cost (but that's hindsight.)

Directly encoding a (byte) string into a unicode encoding is mostly useless,
as you've shown. The only use-case I can think of is translating ASCII into,
for instance, EBCDIC. Encoding anything into an ASCII superset is a no-op,
unless the system encoding isn't 'ascii' (and that's pretty rare, and not
something a Python programmer should depend on.) On the other hand, the fact
that (byte) strings have an 'encode' method creates a lot of confusion in
unicode-newbies, and causes programs to break only when input is non-ASCII.
And non-ASCII input just happens too often and too unpredictably in
'real-world' code, and not enough in European programmers' tests ;P
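
A minimal sketch of what the failing call in the quoted session does under the hood, written in Python 3 as a stand-in (an assumption): there byte strings have no `encode` method at all, so the hidden ASCII decode has to be spelled out.

```python
raw = b"abc\xf0"

# Python 3 byte strings simply have no encode() method.
has_encode = hasattr(raw, "encode")

try:
    # What 2.x str.encode("latin-1") did: decode as ASCII, then encode.
    raw.decode("ascii").encode("latin-1")
    failed = False
except UnicodeDecodeError:
    failed = True

# Decoding with the encoding the data is actually in works fine.
as_text = raw.decode("latin-1")
```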

Unicode objects and strings are not the same thing. We shouldn't treat them
as the same thing. They share an interface (like lists and tuples do), and
if you only use that interface, treating them as the same kind of object is
mostly ok. They actually share *less* of an interface than lists and tuples,
though, as comparing strings to unicode objects can raise an exception,
whereas comparing lists to tuples is not expected to. For anything less
trivial than indexing, slicing and most of the string methods, and anything
whatsoever involving non-ASCII (or, rather, non-system-encoding), unicode
objects and strings *must* be treated separately. For instance, there is no
correct way to do:

  s.split("\x80")

unless you know the type of 's'. If it's unicode, you want u"\x80" instead
of "\x80". If it's not unicode, splitting "\x80" may not even be sensible,
but you wouldn't know from looking at the code -- maybe it expects a
specific encoding (or encoding family), maybe not. As soon as you deal with
unicode, you need to really understand the concept, and too many programmers
don't. And it's very hard to tell from someone's comments whether they fail
to understand or just get some of the terminology wrong; that's why Guido's
comments about 'encoding a byte string' and 'what if the file encoding is
Unicode' scare me. The unicode/string mixup almost makes me wish Python
was statically typed.
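
A minimal sketch of the type-dependent split, written in Python 3 as a stand-in (an assumption: `str` plays the role of `unicode` and `bytes` the role of the 8-bit string). The helper name `split_on_0x80` is hypothetical.

```python
def split_on_0x80(s):
    """Hypothetical helper: pick the separator that matches the type of s."""
    if isinstance(s, str):
        return s.split("\x80")    # text: split on the character U+0080
    return s.split(b"\x80")       # bytes: split on the byte 0x80

text_parts = split_on_0x80("a\x80b")
byte_parts = split_on_0x80(b"a\x80b")

try:
    b"a\x80b".split("\x80")       # mixing the two types
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Python 3 turns the silent mixup into a `TypeError`, but the caller still has to know which of the two types is in hand.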

So please, please, please don't make the mistake of 'doing something' with
the 'encoding' argument to 'bytes(s, encoding)' when 's' is a (byte) string.
It wouldn't actually be usable except for the same things as 'str.encode':
to convert from ASCII to non-ASCII-supersets, or to convert to non-unicode
encodings (such as 'hex'.) You can achieve those two by doing, e.g.,
'bytes(s.encode('hex'))' if you really want to. Ignoring the encoding
(rather than raising an exception) would also allow code to be trivially
portable between Python 2.x and Py3K, when "" is actually a unicode object.

Not that I'm happy with ignoring anything, but not ignoring would be a bigger
crime here.

Oh, and while on the subject, I'm not convinced going all-unicode in Py3K is
a good idea either, but maybe I should save that discussion for PyCon. I'm
not thinking "why do we need unicode" anymore (which I did two years ago ;)
but I *am* thinking it'll be a big step for 90% of the programmers if they
have to grasp unicode and encodings to be able to even do 'raw_input()'
sensibly. I know I spend an inordinate amount of time trying to explain the
basics on #python on irc.freenode.net already.

-- 
Thomas Wouters <[EMAIL PROTECTED]>

Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
