Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Lennart Regebro
On Thu, Apr 25, 2013 at 7:43 AM, Antoine Pitrou solip...@pitrou.net wrote:
 On Thu, 25 Apr 2013 04:19:36 +0200
 Lennart Regebro rege...@gmail.com wrote:
 On Thu, Apr 25, 2013 at 3:54 AM, Stephen J. Turnbull step...@xemacs.org 
 wrote:
  RFC 4648 repeatedly refers to *characters*, without specifying an
  encoding for them.
 [...]

 Base64 is an encoding that transforms between 8-bit streams.

 No, it isn't. What Stephen wrote above.

Yes it is. Base64 takes 8-bit bytes and transforms them into another
8-bit stream that can be safely transmitted over various channels that
would mangle an unencoded 8-bit stream, such as email etc.

http://en.wikipedia.org/wiki/Base64

 Either you get a LookupError: unknown
 encoding: base64, which is what you get now, or you get an
 UnicodeEncodeError if the text is not ASCII. We don't want the
 latter, because it means that code that looks fine for the developer
 breaks in real life because the developer was American

 That's bogus.

No, that's real life.

 By the same argument, we should suppress any
 encoding which isn't able to represent all possible unicode strings.

No, if you explicitly use such an encoding it is because you need to
because you are transferring data to a system that needs the encoding
in question. Unicode errors are unavoidable at that point, not an
unexpected surprise because a conversion happened implicitly that you
didn't know about.
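
To make the point about explicit encodings concrete, here is a minimal Python 3 sketch. The sample strings are made up for illustration; the only claim is that the error appears at the call where a restricted encoding was requested.

```python
# Explicit str -> bytes encoding in Python 3: the error is raised at the
# call site where a restricted encoding was requested.
ascii_only = "hello"
non_ascii = "na\u00efve"  # contains U+00EF, which ASCII cannot represent

encoded = ascii_only.encode("ascii")  # works for pure-ASCII text

try:
    non_ascii.encode("ascii")
    failed = False
except UnicodeEncodeError:
    # The caller chose ASCII explicitly, so the error is expected here.
    failed = True

# UTF-8 can represent the same text without error.
utf8_bytes = non_ascii.encode("utf-8")
```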

//Lennart
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Stephen J. Turnbull
Lennart Regebro writes:

  Base64 is an encoding that transforms between 8-bit streams. Let it be
  that. Don't try to shoehorn it into a completely different kind of
  encoding.

By "completely different kind of encoding" do you mean "codec"?

I think that would be an unfortunate result.  These operations on
streams are theoretically nicely composable.  It would be nice if
practice reflected that by having a uniform API for all of these
operations (charset translation, encoded text to internal, content
transfer encoding, compression ...).  I think it would be useful, too,
though I can't prove that.

Anyway, this discussion belongs on python-ideas at this point.  Or
would, if I had an idea about implementation.  I'll take it there when
I do have something to say about implementation.


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Lennart Regebro
On Thu, Apr 25, 2013 at 8:57 AM, Stephen J. Turnbull step...@xemacs.org wrote:
 I think that would be an unfortunate result.  These operations on
 streams are theoretically nicely composable.  It would be nice if
 practice reflected that by having a uniform API for all of these
 operations (charset translation, encoded text to internal, content
 transfer encoding, compression ...).  I think it would be useful, too,
 though I can't prove that.

But the translation to and from Unicode to some 8-bit encoding is
different from the others. It makes sense that they have a different
API. If you have a Unicode string you can go:

Unicode text -> UTF8 -> ZIP -> BASE64.

Or you can go

Unicode text -> UTF8 -> BASE64 -> ZIP

Although admittedly that makes much less sense. :-)

But you can not go:

   Unicode text -> BASE64 -> ZIP -> UTF8

The str/bytes encoding/decoding is not like the others.
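
The chain described above can be sketched in Python 3 roughly as follows. This is only an illustration: zlib stands in for "ZIP", and the sample text is made up.

```python
import base64
import zlib

text = "Gr\u00fc\u00dfe aus Stockholm"  # sample Unicode text (made up)

# Unicode text -> UTF-8 -> compressed -> Base64: every step after the
# first one takes bytes and returns bytes.
utf8 = text.encode("utf-8")
packed = zlib.compress(utf8)
wire = base64.b64encode(packed)

# Undoing the steps in the opposite order restores the text.
restored = zlib.decompress(base64.b64decode(wire)).decode("utf-8")

# Skipping the str -> bytes step does not work: the bytes-to-bytes
# transform rejects str input.
try:
    base64.b64encode(text)
    accepted_str = True
except TypeError:
    accepted_str = False
```

Only the first step crosses from str to bytes; the later steps can be reordered, which is the sense in which the str/bytes step differs from the others.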

//Lennart


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Nick Coghlan
On Thu, Apr 25, 2013 at 4:57 PM, Stephen J. Turnbull step...@xemacs.org wrote:
 Lennart Regebro writes:

   Base64 is an encoding that transforms between 8-bit streams. Let it be
   that. Don't try to shoehorn it into a completely different kind of
   encoding.

 By "completely different kind of encoding" do you mean "codec"?

 I think that would be an unfortunate result.  These operations on
 streams are theoretically nicely composable.  It would be nice if
 practice reflected that by having a uniform API for all of these
 operations (charset translation, encoded text to internal, content
 transfer encoding, compression ...).  I think it would be useful, too,
 though I can't prove that.

 Anyway, this discussion belongs on python-ideas at this point.  Or
 would, if I had an idea about implementation.  I'll take it there when
 I do have something to say about implementation.

Bringing the mailing list thread up to date with the state of the
relevant tracker issues:

I created http://bugs.python.org/issue17827 to cover adding the
missing documentation for codecs.encode and codecs.decode as the
officially supported solutions for easy use of the codec
infrastructure *without* the additional text model specific input and
output type restrictions imposed by the str.encode, bytes.decode and
bytearray.decode methods.

I created http://bugs.python.org/issue17828 to cover emitting more
meaningful exceptions when a codec throws TypeError or ValueError, as
well as when the additional type checking fails for str.encode,
bytes.decode and bytearray.decode.

I created http://bugs.python.org/issue17839 to cover the fact that
part of the problem here is that the base64 module currently only
accepts bytes and bytearray as inputs, rather than anything that
supports the PEP 3118 buffer interface.

http://bugs.python.org/issue7475 (linked earlier in the thread) is now
strictly about restoring the shorthand aliases for base64_codec,
bz2_codec et al that were removed in
http://bugs.python.org/issue10807.
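
As a rough sketch of what codecs.encode and codecs.decode allow (assuming the long codec name base64_codec; the sample data is made up):

```python
import codecs

data = b"any binary data \x00\xff"

# codecs.encode/codecs.decode do not impose the str/bytes type
# restrictions of the str.encode and bytes.decode methods, so they can
# drive a bytes-to-bytes codec directly.
encoded = codecs.encode(data, "base64_codec")
decoded = codecs.decode(encoded, "base64_codec")

# The convenience method is restricted to text encodings. The exact
# exception type has varied between Python versions, so both are caught.
try:
    data.decode("base64_codec")
    method_worked = True
except (LookupError, TypeError):
    method_worked = False
```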

Regards,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Antoine Pitrou
Le Thu, 25 Apr 2013 08:38:12 +0200,
Lennart Regebro rege...@gmail.com a écrit :
 On Thu, Apr 25, 2013 at 7:43 AM, Antoine Pitrou solip...@pitrou.net
 wrote:
  On Thu, 25 Apr 2013 04:19:36 +0200
  Lennart Regebro rege...@gmail.com wrote:
  On Thu, Apr 25, 2013 at 3:54 AM, Stephen J. Turnbull
  step...@xemacs.org wrote:
   RFC 4648 repeatedly refers to *characters*, without specifying an
   encoding for them.
  [...]
 
  Base64 is an encoding that transforms between 8-bit streams.
 
  No, it isn't. What Stephen wrote above.
 
 Yes it is. Base64 takes 8-bit bytes and transforms them into another
 8-bit stream that can be safely transmitted over various channels that
 would mangle an unencoded 8-bit stream, such as email etc.
 
 http://en.wikipedia.org/wiki/Base64

I don't see anything in that Wikipedia page that validates your opinion.
The Wikipedia page does talk about *text* and *characters* for
the result of base64 encoding.

Besides, I would consider an RFC more authoritative than a
Wikipedia definition.

  By the same argument, we should suppress any
  encoding which isn't able to represent all possible unicode strings.
 
 No, if you explicitly use such an encoding it is because you need to
 because you are transferring data to a system that needs the encoding
 in question. Unicode errors are unavoidable at that point, not an
 unexpected surprise because a conversion happened implicitly that you
 didn't know about.

I don't know what implicit conversion you are talking about. There's
no implicit conversion in a scheme where the result of base64
encoding is a text string.

Regards

Antoine.




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Lennart Regebro
On Thu, Apr 25, 2013 at 11:25 AM, Antoine Pitrou solip...@pitrou.net wrote:
 Le Thu, 25 Apr 2013 08:38:12 +0200,
 Yes it is. Base64 takes 8-bit bytes and transforms them into another
 8-bit stream that can be safely transmitted over various channels that
 would mangle an unencoded 8-bit stream, such as email etc.

 http://en.wikipedia.org/wiki/Base64

 I don't see anything in that Wikipedia page that validates your opinion.

OK, quote me the exact page text from the Wikipedia article or RFC
that explains how you map the 31-bit character space of Unicode to
Base64.

 The Wikipedia page does talk about *text* and *characters* for
 the result of base64 encoding.

So are you saying that you want the Python implementation of base64
encoding to take 8-bit binary data in bytes format and return a
Unicode string containing the Base64 encoded data? I think that would
surprise most people, and be of significantly less use than a base64
encoding that returns bytes.

Python 3 still views text as Unicode only. Everything else is not
text, but binary data. This makes sense, is consistent and makes
things easier to handle. This is the whole point of making the str
into Unicode in Python 3.

 No, if you explicitly use such an encoding it is because you need to
 because you are transferring data to a system that needs the encoding
 in question. Unicode errors are unavoidable at that point, not an
 unexpected surprise because a conversion happened implicitly that you
 didn't know about.

 I don't know what implicit conversion you are talking about. There's
 no implicit conversion in a scheme where the result of base64
 encoding is a text string.

I'm sorry, I thought you were arguing for a base64 encoding taking
Unicode strings and returning 8-bit bytes. That position I can
understand, although I disagree with it. The position that a base64
encoding should take 8-bit bytes and return Unicode strings is
incomprehensible to me. I have no idea why you would want that, how
you would use it, how you would implement that API in a reasonable
way, nor how you would explain why it is like that. I can't think of
any usecase where you would want base64 encoded data unless you intend
to transmit it over an 8-bit channel, so why it should return a
Unicode string instead of 8-bit bytes is completely beyond my
comprehension. Sorry.


//Lennart


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Xavier Morel
On 2013-04-25, at 11:25 , Antoine Pitrou wrote:
 
 Besides, I would consider an RFC more authoritative than a
 Wikipedia definition.

 Base encoding of data is used in many situations to store or transfer
 data in environments that, perhaps for legacy reasons, are restricted
 to US-ASCII [1] data.

so the output is US-ASCII data, a byte stream.

Stephen is correct that you could decide you don't care about those
semantics, and implement base64 encoding as a bytes -> str decoding, then
requiring a re-encoding (to ascii) before wire transmission.

The clarity of the interface (or lack thereof) would probably make users
want to send a strongly worded letter to whoever implemented it though,
I don't think `data.decode('base64').encode('ascii')` would fit the 
obviousness or readability expectations of most users.
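
For comparison, a short sketch of how this looks with the existing base64 module, where the output is bytes and any conversion to str is an explicit extra step. The input data is arbitrary.

```python
import base64

data = bytes(range(256))  # every possible byte value

# base64.b64encode returns bytes restricted to the US-ASCII range.
encoded = base64.b64encode(data)
all_ascii = all(b < 128 for b in encoded)

# Because of that, turning the result into str is a lossless, explicit step.
as_text = encoded.decode("ascii")

# b64decode accepts either form (bytes, or a str containing only ASCII).
from_bytes = base64.b64decode(encoded)
from_text = base64.b64decode(as_text)
```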


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Antoine Pitrou
Le Thu, 25 Apr 2013 12:46:43 +0200,
Xavier Morel catch-...@masklinn.net a écrit :

 On 2013-04-25, at 11:25 , Antoine Pitrou wrote:
  
  Besides, I would consider an RFC more authoritative than a
  Wikipedia definition.
 
  Base encoding of data is used in many situations to store or
  transfer data in environments that, perhaps for legacy reasons, are
  restricted to US-ASCII [1] data.
 
 so the output is US-ASCII data, a byte stream.

Well, depending on the context, US-ASCII can be a character set or a
character encoding. If some specification is talking about text and
characters, then it is something that can reasonably be a str in
Python land.

Similarly, we have chosen to make filesystem paths str by default in
Python 3, even though many Unix-heads would claim that filesystem paths
are bytes only. The reason is that while they are technically bytes
(under Unix), they are functionally text.

Now, if the base64-encoded data is your entire payload, this clearly
amounts to nitpicking. But when you want to *embed* that data in some
larger chunk of text (e.g. a JSON object), then it makes a lot of sense
to consider the base64-encoded data a piece of *text*, not bytes.
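
The JSON case can be sketched as follows, using the standard json and base64 modules. The payload and the key name "blob" are made up for illustration.

```python
import base64
import json

blob = b"\x89PNG\r\n\x1a\n"  # arbitrary binary payload (made up)

# json.dumps cannot serialise bytes, so the Base64 form has to be a str
# at the moment it is embedded in the document.
try:
    json.dumps({"blob": base64.b64encode(blob)})
    bytes_accepted = True
except TypeError:
    bytes_accepted = False

document = json.dumps({"blob": base64.b64encode(blob).decode("ascii")})

# The whole document is encoded once, just before transmission.
wire = document.encode("utf-8")

# Receiving side: decode the document, then the embedded payload.
received = json.loads(wire.decode("utf-8"))
restored = base64.b64decode(received["blob"])
```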

Regards

Antoine.




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Antoine Pitrou
Le Thu, 25 Apr 2013 12:05:01 +0200,
Lennart Regebro rege...@gmail.com a écrit :
  The Wikipedia page does talk about *text* and *characters* for
  the result of base64 encoding.
 
 So are you saying that you want the Python implementation of base64
 encoding to take 8-bit binary data in bytes format and return a
 Unicode string containing the Base64 encoded data?

I'm not wanting anything here, since that would clearly break backwards
compatibility. But I think binascii should have gone that way in Python
3, indeed. binascii.b2a_hex(), for example, would be much more
practical if it returned str, rather than bytes.
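
A small sketch of the behaviour in question (the input bytes are arbitrary). Note that bytes.hex(), shown last, only exists in Python 3.5 and later, so it was not available when this thread was written.

```python
import binascii

digest = b"\x01\xab\xff"

# binascii.b2a_hex returns bytes, so getting text takes an explicit decode.
hex_bytes = binascii.b2a_hex(digest)
hex_text = hex_bytes.decode("ascii")

# Python 3.5+ added bytes.hex(), which returns str directly.
hex_direct = digest.hex()
```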

 Python 3 still views text as Unicode only.

Python 3 doesn't *view* text as unicode, it *represents* it as unicode.
That is, unicode is the character set that Python 3 is able to
represent in the canonical text type, str. If you ever encounter a
hypothetical text that uses characters outside of Unicode (obviously it
will be encoded using a non-unicode encoding :-)), then you can't
represent it as a str.

And base64 is clearly representable as unicode, since it's
representable using the ASCII character set (which is a subset of the
unicode character set).

 I can't think of
 any usecase where you would want base64 encoded data unless you intend
 to transmit it over an 8-bit channel,

I can think of many usecases where I want to *embed* base64-encoded
data in a larger text *before* encoding that text and transmitting
it over a 8-bit channel.

(GPG signatures, binary data embedded in JSON objects, etc.)

Regards

Antoine.




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Lennart Regebro
On Thu, Apr 25, 2013 at 2:57 PM, Antoine Pitrou solip...@pitrou.net wrote:
 I can think of many usecases where I want to *embed* base64-encoded
 data in a larger text *before* encoding that text and transmitting
 it over a 8-bit channel.

That still doesn't mean that this should be the default behavior. Just
because you *can* represent base64 as Unicode text doesn't mean that
it should be.

 (GPG signatures, binary data embedded in JSON objects, etc.)

Is the GPG signature calculated on the *Unicode* data? How is that
done? Isn't it done on the encoded message? As I understand it a GPG
signature is done on any sort of document. Either you or I have
completely misunderstood how GPG works, I think. :-)

In the case of JSON objects, they are intended for data exchange, and
hence in the end need to be byte strings. So if you have a byte string
you want to base64 encode before transmitting it with json, you would
just end up transforming it to a unicode string and then back. That
doesn't seem useful.

One use case where you clearly *do* want the base64 encoded data to be
unicode strings is because you want to embed it in a text discussing
base64 strings, for a blog or a book or something. That doesn't seem
to be a very common usecase.

For the most part you base64 encode things because it's going to be
transmitted, and hence the natural result of a base64 encoding should
be data that is ready to be transmitted, hence byte strings, and not
Unicode strings.

 Python 3 doesn't *view* text as unicode, it *represents* it as unicode.

I don't agree that there is a significant difference between those
wordings in this context. The end result is the same: Things intended
to be handled/seen as textual should be unicode strings, things
intended for data exchange should be byte strings. Something that is
base64 encoded is primarily intended for data exchange. A base64
encoding should therefore return byte strings, especially since most
API's that perform this transmission will take byte strings as input.
If you want to include this in textual data, for whatever reason, like
printing it in a book, then the conversion is trivial, but that is
clearly the less common use case, and should therefore not be the
default behavior.

//Lennart


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Tres Seaver

On 04/25/2013 01:43 AM, Antoine Pitrou wrote:
 On Thu, 25 Apr 2013 04:19:36 +0200 Lennart Regebro rege...@gmail.com
 wrote:
 On Thu, Apr 25, 2013 at 3:54 AM, Stephen J. Turnbull
 step...@xemacs.org wrote:
 RFC 4648 repeatedly refers to *characters*, without specifying an 
 encoding for them.
 [...]
 
 Base64 is an encoding that transforms between 8-bit streams.
 
 No, it isn't. What Stephen wrote above.

Stephen was incorrect:  the base64 standard is about encoding a binary
stream (8-bit bytes) onto another binary stream (6-bit bytes), but one
which can be safely transmitted over a 7-bit-only medium.  Text in Py3k's
sense is irrelevant.

 Either you get a LookupError: unknown encoding: base64, which is
 what you get now, or you get an UnicodeEncodeError if the text is
 not ASCII. We don't want the latter, because it means that code that
 looks fine for the developer breaks in real life because the
 developer was American
 
 That's bogus. By the same argument, we should suppress any encoding
 which isn't able to represent all possible unicode strings. That's
 almost all encodings provided by Python (including utf-8, if you
 consider lone surrogates).
 
 I'm sorry for Americans, but they *still* must know about character 
 encodings, and be ready to handle UnicodeErrors, when using Python 3
 for encoding/decoding bytestrings. There's no way around it.

What does that snark have to do with this discussion?  base64 has no more
to do with character set encodings than it does the moon.  It would be a
transform (bytes -> bytes), not an encoding.
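
For reference, the 8-bit to 6-bit regrouping can be sketched in a few lines of Python. This is a minimal illustration that handles only full 3-byte groups and no padding; encode_group is a made-up helper name, and the alphabet is the standard one from RFC 4648.

```python
import base64

# The 64-character alphabet from RFC 4648, section 4.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_group(three_bytes):
    """Encode exactly three bytes as four Base64 alphabet characters."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch handles full 3-byte groups only")
    # Join the three 8-bit values into one 24-bit integer.
    n = int.from_bytes(three_bytes, "big")
    # Split it into four 6-bit values, most significant first.
    indices = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Each 6-bit value selects one character of the alphabet.
    return "".join(ALPHABET[i] for i in indices)


group = encode_group(b"Man")
matches_stdlib = group.encode("ascii") == base64.b64encode(b"Man")
```

The sketch returns the four characters as a str purely for readability; whether the result is best treated as text or as ASCII bytes is exactly what the thread is arguing about.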


Tres.
- -- 
===
Tres Seaver  +1 540-429-0999  tsea...@palladion.com
Palladion Software   Excellence by Design   http://palladion.com



Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Barry Warsaw
On Apr 25, 2013, at 03:34 PM, Lennart Regebro wrote:

In the case of JSON objects, they are intended for data exchange, and
hence in the end need to be byte strings.

Except that they're not.

http://bugs.python.org/issue10976

-Barry


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread MRAB

On 25/04/2013 14:34, Lennart Regebro wrote:

On Thu, Apr 25, 2013 at 2:57 PM, Antoine Pitrou solip...@pitrou.net wrote:

I can think of many usecases where I want to *embed* base64-encoded
data in a larger text *before* encoding that text and transmitting
it over a 8-bit channel.


That still doesn't mean that this should be the default behavior. Just
because you *can* represent base64 as Unicode text doesn't mean that
it should be.


(GPG signatures, binary data embedded in JSON objects, etc.)


Is the GPG signature calculated on the *Unicode* data? How is that
done? Isn't it done on the encoded message? As I understand it a GPG
signature is done on any sort of document. Either you or I have
completely misunderstood how GPG works, I think. :-)

In the case of JSON objects, they are intended for data exchange, and
hence in the end need to be byte strings. So if you have a byte string
you want to base64 encode before transmitting it with json, you would
just end up transforming it to a unicode string and then back. That
doesn't seem useful.


The JSON specification says that it's text. Its string literals can
contain Unicode codepoints. It needs to be encoded to bytes for
transmission and storage, but JSON itself is not a bytestring format.
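
A short sketch of that distinction with the standard json module (the sample object is made up, and UTF-8 is just one possible choice for the transmission step):

```python
import json

obj = {"name": "Gr\u00fc\u00dfe", "n": 1}

# json.dumps returns str (text), not bytes.
text = json.dumps(obj, ensure_ascii=False)

# Encoding to bytes is a separate step, needed only for transmission or
# storage; UTF-8 is chosen here as an example.
wire = text.encode("utf-8")

# With the default ensure_ascii=True, non-ASCII characters are escaped,
# but the result is still a str.
escaped = json.dumps(obj)

roundtrip = json.loads(wire.decode("utf-8"))
```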


One use case where you clearly *do* want the base64 encoded data to be
unicode strings is because you want to embed it in a text discussing
base64 strings, for a blog or a book or something. That doesn't seem
to be a very common usecase.

For the most part you base64 encode things because it's going to be
transmitted, and hence the natural result of a base64 encoding should
be data that is ready to be transmitted, hence byte strings, and not
Unicode strings.


Python 3 doesn't *view* text as unicode, it *represents* it as unicode.


I don't agree that there is a significant difference between those
wordings in this context. The end result is the same: Things intended
to be handled/seen as textual should be unicode strings, things
intended for data exchange should be byte strings. Something that is
base64 encoded is primarily intended for data exchange. A base64
encoding should therefore return byte strings, especially since most
API's that perform this transmission will take byte strings as input.
If you want to include this in textual data, for whatever reason, like
printing it in a book, then the conversion is trivial, but that is
clearly the less common use case, and should therefore not be the
default behavior.


base64 is a way of encoding binary data as text. The problem is that
traditionally text has been encoded with one byte per character, except
in those locales where there were too many characters in the character
set for that to be possible.

In Python 3 we're trying to stop mixing binary data (bytestrings) with
text (Unicode strings).


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Daniel Holth
On Thu, Apr 25, 2013 at 10:07 AM, Barry Warsaw ba...@python.org wrote:
 On Apr 25, 2013, at 03:34 PM, Lennart Regebro wrote:

In the case of JSON objects, they are intended for data exchange, and
hence in the end need to be byte strings.

 Except that they're not.

 http://bugs.python.org/issue10976

 -Barry

What am I doing wrong in this JSON crypto signature verification
snippet that features many conversions between binary and text?

    recipients = jwsjs["recipients"]
    encoded_payload = binary(jwsjs["payload"])
    headers = []
    for recipient in recipients:
        h = binary(recipient["header"])
        s = binary(recipient["signature"])
        header = json.loads(native(urlsafe_b64decode(h)))
        vk = urlsafe_b64decode(binary(header["jwk"]["vk"]))
        secured_input = b".".join((h, encoded_payload))
        sig = urlsafe_b64decode(s)
        sig_msg = sig+secured_input
        verified_input = native(ed25519ll.crypto_sign_open(sig_msg, vk))
        verified_header, verified_payload = verified_input.split('.')
        verified_header = binary(verified_header)
        decoded_header = native(urlsafe_b64decode(verified_header))
        headers.append(json.loads(decoded_header))

    verified_payload = binary(verified_payload)

    # only return header, payload that have passed through the crypto library.
    payload = json.loads(native(urlsafe_b64decode(verified_payload)))

    return headers, payload


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Lennart Regebro
On Thu, Apr 25, 2013 at 4:22 PM, MRAB pyt...@mrabarnett.plus.com wrote:
 The JSON specification says that it's text. Its string literals can
 contain Unicode codepoints. It needs to be encoded to bytes for
 transmission and storage, but JSON itself is not a bytestring format.

OK, fair enough.

 base64 is a way of encoding binary data as text.

It's a way of encoding binary data using ASCII. There is a subtle but
important difference.

 In Python 3 we're trying to stop mixing binary data (bytestrings) with
 text (Unicode strings).

Yup. And that's why a base64 encoding shouldn't return Unicode strings.

//Lennart


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Antoine Pitrou
Le Thu, 25 Apr 2013 09:55:26 -0400,
Tres Seaver tsea...@palladion.com a écrit :
 
 On 04/25/2013 01:43 AM, Antoine Pitrou wrote:
  On Thu, 25 Apr 2013 04:19:36 +0200 Lennart Regebro
  rege...@gmail.com wrote:
  On Thu, Apr 25, 2013 at 3:54 AM, Stephen J. Turnbull
  step...@xemacs.org wrote:
  RFC 4648 repeatedly refers to *characters*, without specifying an 
  encoding for them.
  [...]
  
  Base64 is an encoding that transforms between 8-bit streams.
  
  No, it isn't. What Stephen wrote above.
 
 Stephen was incorrect:  the base64 standard is about encoding a binary
 stream (8-bit bytes) onto another binary stream (6-bit bytes), but one
 which can be safely transmitted over a 7-bit-only medium.

So where does the RFC talk about 6-bit bytes at all? Or did you just
invent it?




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Antoine Pitrou
Le Thu, 25 Apr 2013 15:34:45 +0200,
Lennart Regebro rege...@gmail.com a écrit :
 
 I don't agree that there is a significant difference between those
 wordings in this context. The end result is the same: Things intended
 to be handled/seen as textual should be unicode strings, things
 intended for data exchange should be byte strings.

I don't think this distinction is meaningful at all. In the end,
everything is a byte string on a classical computer (including unicode
strings displayed on your monitor, obviously).

If you think the technicalities of an operation should never be hidden
or abstracted away, then you're better off with C than Python ;-)

Regards

Antoine.




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread MRAB

On 25/04/2013 15:22, MRAB wrote:

On 25/04/2013 14:34, Lennart Regebro wrote:

On Thu, Apr 25, 2013 at 2:57 PM, Antoine Pitrou solip...@pitrou.net wrote:

I can think of many usecases where I want to *embed* base64-encoded
data in a larger text *before* encoding that text and transmitting
it over a 8-bit channel.


That still doesn't mean that this should be the default behavior. Just
because you *can* represent base64 as Unicode text doesn't mean that
it should be.


[snip]

One use case where you clearly *do* want the base64 encoded data to be
unicode strings is because you want to embed it in a text discussing
base64 strings, for a blog or a book or something. That doesn't seem
to be a very common usecase.

For the most part you base64 encode things because it's going to be
transmitted, and hence the natural result of a base64 encoding should
be data that is ready to be transmitted, hence byte strings, and not
Unicode strings.


Python 3 doesn't *view* text as unicode, it *represents* it as unicode.


I don't agree that there is a significant difference between those
wordings in this context. The end result is the same: Things intended
to be handled/seen as textual should be unicode strings, things
intended for data exchange should be byte strings. Something that is
base64 encoded is primarily intended for data exchange. A base64
encoding should therefore return byte strings, especially since most
API's that perform this transmission will take byte strings as input.
If you want to include this in textual data, for whatever reason, like
printing it in a book, then the conversion is trivial, but that is
clearly the less common use case, and should therefore not be the
default behavior.


base64 is a way of encoding binary data as text. The problem is that
traditionally text has been encoded with one byte per character, except
in those locales where there were too many characters in the character
set for that to be possible.

In Python 3 we're trying to stop mixing binary data (bytestrings) with
text (Unicode strings).

RFC 4648 says "Base encoding of data is used in many situations to
store or transfer data in environments that, perhaps for legacy reasons,
are restricted to US-ASCII [1] data."


To me, US-ASCII is an encoding, so it appears to be talking about
encoding binary data (bytestrings) to ASCII-encoded text (bytestrings).




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Lennart Regebro
On Thu, Apr 25, 2013 at 5:27 PM, Antoine Pitrou solip...@pitrou.net wrote:
 Le Thu, 25 Apr 2013 15:34:45 +0200,
 Lennart Regebro rege...@gmail.com a écrit :

 I don't agree that there is a significant difference between those
 wordings in this context. The end result is the same: Things intended
 to be handled/seen as textual should be unicode strings, things
 intended for data exchange should be byte strings.

 I don't think this distinction is meaningful at all.

OK, then I think we have found the core of the problem, and the end of
the discussion (from my side, that is).

 In the end,
 everything is a byte string on a classical computer (including unicode
 strings displayed on your monitor, obviously).

Yes of course. Especially since my monitor is an output device. ;-)

 If you think the technicalities of an operation should never be hidden
 or abstracted away, then you're better off with C than Python ;-)

The whole point is that Python *does* abstract it away. It abstracts
the internals of Unicode strings in such a way that they are no
longer, conceptually, 8-bit data. This *is* a distinction Python makes,
and it is a useful distinction. I do not see any reason to remove it.

http://regebro.wordpress.com/2011/03/23/unconfusing-unicode-what-is-unicode/

//Lennart


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Stephen J. Turnbull
Lennart Regebro writes:
  On Thu, Apr 25, 2013 at 4:22 PM, MRAB pyt...@mrabarnett.plus.com wrote:
   The JSON specification says that it's text. Its string literals can
   contain Unicode codepoints. It needs to be encoded to bytes for
   transmission and storage, but JSON itself is not a bytestring format.
  
  OK, fair enough.
  
   base64 is a way of encoding binary data as text.
  
  It's a way of encoding binary data using ASCII. There is a subtle but
  important difference.

Yes, there is a difference, but I think you're wrong.  RFC 4648
explicitly states that Base-n encodings are intended for human
handling and even makes reference to character glyphs (the rationale
for excluding confusable digits from the Base32 alphabet).  That's
text.  Even if it is a rather restricted subset of text, those
restrictions are much stronger than merely to ASCII, and they are
based on aspects of text that go well beyond merely an encoding with a
small code unit.

   In Python 3 we're trying to stop mixing binary data (bytestrings) with
   text (Unicode strings).
  
  Yup. And that's why a base64 encoding shouldn't return Unicode strings.

That's inaccurate.  Antoine has presented several examples of why
*some* base64 encoders might return Unicode strings, precisely because
their output will be embedded in Unicode streams.  Debugging the MIME
composition functions in the email module is another.

An accurate statement is that these use cases are relatively unusual.
The common use case is feeding a binary stream directly into a wire
protocol.  Supporting that use case demands a base64 encoder with a
bytes-to-bytes signature in the stdlib, for both convenience and to
some extent efficiency.

I don't really care if the stdlib supports the specialized use cases
with a separate base64 encoder (Antoine suggested the binascii
module), or if it leaves that up to the user (it's just an occasional
use of .decode('ascii'), after all).
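
As a minimal sketch of the two cases described above, assuming the
behaviour of the base64 module in current Python 3 releases (the payload
is made up for illustration): the encoder is bytes-to-bytes, and the
text form is one .decode('ascii') away.

```python
import base64

payload = b"\x00\xffbinary data"      # arbitrary bytes, made up for this sketch

wire = base64.b64encode(payload)      # common case: bytes in, bytes out
assert isinstance(wire, bytes)

text = wire.decode("ascii")           # occasional case: embed the result in a str
assert isinstance(text, str)

# Both forms decode back to the original payload.
assert base64.b64decode(wire) == payload
assert base64.b64decode(text) == payload
```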


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Isaac Morland

On Thu, 25 Apr 2013, Lennart Regebro wrote:


On Thu, Apr 25, 2013 at 4:22 PM, MRAB pyt...@mrabarnett.plus.com wrote:

The JSON specification says that it's text. Its string literals can
contain Unicode codepoints. It needs to be encoded to bytes for
transmission and storage, but JSON itself is not a bytestring format.


OK, fair enough.


base64 is a way of encoding binary data as text.


It's a way of encoding binary data using ASCII. There is a subtle but
important difference.


It is a way of encoding arrays of 8-bit bytes as arrays of characters that 
are part of the printable, non-whitespace subset of the ASCII repertoire. 
Since the ASCII repertoire is now simply the first 128 code points in the 
Unicode repertoire, it is equally correct to say that base64 is a way of 
encoding binary data as Unicode text.



In Python 3 we're trying to stop mixing binary data (bytestrings) with
text (Unicode strings).


Yup. And that's why a base64 encoding shouldn't return Unicode strings.


That is exactly why it should return Unicode strings.  What bytes should 
get sent if base64 is used to send a byte array over an EBCDIC link? [*]


Having said that, there may be other reasons for base64 encoding to return 
bytes - I can conceive of arguments involving efficiency, or practicality, 
or the most common use cases.  So I can't say for sure what base64 
encoding actually ought to return in Python.  But the purist stance should 
be that base64 encoding should return text, i.e. a string, i.e. unicode.
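
To make the EBCDIC question concrete, here is a small sketch. It assumes
Python's cp500 codec as a stand-in for "an EBCDIC link"; the payload and
the choice of cp500 are illustrative only. The same base64 characters
become different octets depending on the character encoding of the
channel.

```python
import base64

data = bytes([0, 1, 2, 253, 254, 255])          # made-up binary payload

text = base64.b64encode(data).decode("ascii")   # base64 output as characters (str)

ascii_wire = text.encode("ascii")     # octets for an ASCII-based link
ebcdic_wire = text.encode("cp500")    # octets for an EBCDIC-based link (assumed codec)

# Same characters, different bytes on the wire.
assert ascii_wire != ebcdic_wire

# The receiver recovers the characters first, then the original data.
assert ebcdic_wire.decode("cp500") == text
assert base64.b64decode(ebcdic_wire.decode("cp500")) == data
```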


[*] I apologize to anybody who just ate.

Isaac Morland                   CSCF Web Guru
DC 2554C, x36650                WWW Software Specialist


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-25 Thread Stephen J. Turnbull
MRAB writes:

  RFC 4648 says "Base encoding of data is used in many situations to
  store or transfer data in environments that, perhaps for legacy reasons,
  are restricted to US-ASCII [1] data."
  
  To me, US-ASCII is an encoding, so it appears to be talking about
  encoding binary data (bytestrings) to ASCII-encoded text (bytestrings).

I think that's a misreading, inconsistent with the rest of the RFC.

The references to US-ASCII are not clearly normative, as the
value-character mappings are given in tables, and are self-contained.
(The one you quote is clearly informative, since it describes a use-case.)
The term "subset of US-ASCII" suggests repertoire, not encoding, as
does the use of "alphabet" to refer to these subsets.

*Every* (other?) normative statement is very careful to say that input
of a Base-n encoder is "octets" (with two uses of "bytes" in the
definition of Base32), and the output is "characters".  There are no
exceptions, and there are *no* references to encoding of characters or
the corresponding character codes (except the possible implicit
reference via US-ASCII).

I can make no sense of those facts if the intent of the RFC is to
restrict the output of a Base-n encoder to characters encoded in
(8-bit) US-ASCII.  Why not just say so, and use "octets" and their
ASCII codes throughout, with the corresponding characters used as
informative commentary?  I think it much more likely that "subset of
the character repertoire of US-ASCII" was intended, but abbreviated to
"subset of US-ASCII".  This kind of abbreviation is very common in
informal discussion of coded character sets.

I admit it's a little surprising that the author would be so
incautious in his use of US-ASCII, but if he really meant US-ASCII-
the-encoding, I find the style of the rest of the RFC astonishing!


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-24 Thread M.-A. Lemburg
On 23.04.2013 23:37, Nick Coghlan wrote:
 On 24 Apr 2013 01:25, M.-A. Lemburg m...@egenix.com wrote:

 On 23.04.2013 17:15, Barry Warsaw wrote:
 On Apr 22, 2013, at 06:22 PM, Guido van Rossum wrote:

 You can ask the same question about all the other codecs.  (And that
 question has indeed been asked in the past.)

 Except for rot13. :-)

 The fact that you can do this instead *is* a bit odd. ;)

 from codecs import getencoder
 encoder = getencoder('rot-13')
 r13 = encoder('hello world')[0]

 Just as reminder: we have the general purpose
 encode()/decode() functions in the codecs module:

 import codecs
 r13 = codecs.encode('hello world', 'rot-13')

 These interface directly to the codec interfaces, without
 enforcing type restrictions. The codec defines the supported
 input and output types.
 
 If we already have those, why aren't they documented? 

Good question. I added them in 2004 and probably just forgot
to add the documentation:

http://hg.python.org/cpython-fullhistory/rev/8ea2cb1ec598

I guess the doc-strings could be used as basis for the
documentation.
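
For readers who have not seen these functions, a short sketch of how
codecs.encode() and codecs.decode() behave in current Python 3 releases
(details may have differed in the 3.3 era this thread dates from). The
codec decides the input and output types; the functions themselves do
not restrict them.

```python
import codecs

# A str-to-str codec.
r13 = codecs.encode("hello world", "rot_13")
assert r13 == "uryyb jbeyq"
assert codecs.decode(r13, "rot_13") == "hello world"

# A bytes-to-bytes codec (this codec appends a trailing newline).
b64 = codecs.encode(b"hello world", "base64")
assert b64 == b"aGVsbG8gd29ybGQ=\n"
assert codecs.decode(b64, "base64") == b"hello world"
```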

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-24 Thread M.-A. Lemburg
On 23.04.2013 19:24, Guido van Rossum wrote:
 On Tue, Apr 23, 2013 at 9:04 AM, M.-A. Lemburg m...@egenix.com wrote:
 On 23.04.2013 17:47, Guido van Rossum wrote:
 On Tue, Apr 23, 2013 at 8:22 AM, M.-A. Lemburg m...@egenix.com wrote:
 Just as reminder: we have the general purpose
 encode()/decode() functions in the codecs module:

 import codecs
 r13 = codecs.encode('hello world', 'rot-13')

 These interface directly to the codec interfaces, without
 enforcing type restrictions. The codec defines the supported
 input and output types.

 As an implementation mechanism I see nothing wrong with this. I hope
 the codecs module lets you introspect the input and output types of a
 codec given by name?

 At the moment there is no standard interface to access supported
 input and output types... but then: regular Python functions or
 methods also don't provide such functionality, so no surprise
 there ;-)
 
 Not quite the same though. Each function has its own unique behavior.
 But codecs support a standard interface, *except* that the input and
 output types sometimes vary.

The codec system itself

 It's mostly a matter of specifying the supported type
 combinations in the codec documentation.

 BTW: What would be a use case where you'd want to
 programmatically access such information before calling
 the codec ?
 
 As you know, in Python 3, most code working with bytes doesn't also
 work with strings, and vice versa (except for a few cases where we've
 gone out of our way to write polymorphic code -- but users rarely do
 so, and any time you use a string or bytes literal you basically limit
 yourself to that type).
 
 Suppose I write a command-line utility that reads a file, runs it
 through a codec, and writes the result to another file. Suppose the
 name of the codec is a command-line argument (as well as the
 filenames). I need to know whether to open the files in text or binary
 mode based on the name of the codec.

Ok, so you need to know which codecs your tool can support and
which of those need text input and which bytes input.

I've been thinking about this some more: I think that type
information alone is not flexible enough to cover such
use cases.

In your use case you'd want to only permit use of a certain
set of codecs, not simply all of them, since some might
not implement what you actually want to achieve with the tool,
e.g. a user might have installed a codec set that adds
support for reading and writing image data, but your
intended use was to only support text data.

So what we need is a way to allow the codecs to say e.g.
"I work on text", "I support encoding bytes and text",
"I encode to bytes", "I'm reversible", "I transform
input data", "I support bytes and text, and will create
same type output", "I work on image data", "I work on
X509 certificates", "I work on XML data", etc.

In other words, we need a form of tagging system, with a
set of standard tags that each codec can publish and
which also allows non-standard tags (which can then at
some point be made standard, if there's agreement on them).

Given a codec name you could then ask the codec registry for
the codec tags and verify that the chosen codec handles
text data, needs bytes or text encoding input and
creates bytes as encoding output. If the registry returns
codec tags that don't include the "I work on text" tag,
the tool could then raise an error.
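
A rough sketch of what such a tag lookup could look like. Everything
except codecs.lookup() is hypothetical: the tag names, the CODEC_TAGS
table and the helper functions are invented here for illustration and do
not exist in the stdlib.

```python
import codecs

def canonical(name):
    """Normalize a codec alias via the real registry (LookupError if unknown)."""
    return codecs.lookup(name).name

# Hypothetical tag table; a real design would let each codec publish its own tags.
CODEC_TAGS = {
    canonical("utf-8"):  {"text", "encodes-str", "produces-bytes", "reversible"},
    canonical("rot_13"): {"text", "encodes-str", "produces-str", "reversible"},
    canonical("base64"): {"encodes-bytes", "produces-bytes", "reversible"},
}

def codec_tags(name):
    """Return the (hypothetical) tags for a codec, or an empty set."""
    return CODEC_TAGS.get(canonical(name), set())

def require_tags(name, *tags):
    """Raise ValueError unless the codec publishes every requested tag."""
    missing = set(tags) - codec_tags(name)
    if missing:
        raise ValueError(f"codec {name!r} lacks tags: {sorted(missing)}")

def file_modes(name):
    """Pick (read mode, write mode) for a file-to-file encoding tool."""
    tags = codec_tags(name)
    read_mode = "r" if "encodes-str" in tags else "rb"
    write_mode = "w" if "produces-str" in tags else "wb"
    return read_mode, write_mode
```

With this, the command-line tool from the use case could call
require_tags(name, "text") before opening any file, and use
file_modes(name) to choose between text and binary mode.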

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-24 Thread Glenn Linderman

On 4/24/2013 1:22 AM, M.-A. Lemburg wrote:

On 23.04.2013 19:24, Guido van Rossum wrote:

On Tue, Apr 23, 2013 at 9:04 AM, M.-A. Lemburg m...@egenix.com wrote:

On 23.04.2013 17:47, Guido van Rossum wrote:

On Tue, Apr 23, 2013 at 8:22 AM, M.-A. Lemburg m...@egenix.com wrote:

Just as reminder: we have the general purpose
encode()/decode() functions in the codecs module:

import codecs
r13 = codecs.encode('hello world', 'rot-13')

These interface directly to the codec interfaces, without
enforcing type restrictions. The codec defines the supported
input and output types.

As an implementation mechanism I see nothing wrong with this. I hope
the codecs module lets you introspect the input and output types of a
codec given by name?

At the moment there is no standard interface to access supported
input and output types... but then: regular Python functions or
methods also don't provide such functionality, so no surprise
there ;-)

Not quite the same though. Each function has its own unique behavior.
But codecs support a standard interface, *except* that the input and
output types sometimes vary.

The codec system itself


It's mostly a matter of specifying the supported type
combinations in the codec documentation.

BTW: What would be a use case where you'd want to
programmatically access such information before calling
the codec ?

As you know, in Python 3, most code working with bytes doesn't also
work with strings, and vice versa (except for a few cases where we've
gone out of our way to write polymorphic code -- but users rarely do
so, and any time you use a string or bytes literal you basically limit
yourself to that type).

Suppose I write a command-line utility that reads a file, runs it
through a codec, and writes the result to another file. Suppose the
name of the codec is a command-line argument (as well as the
filenames). I need to know whether to open the files in text or binary
mode based on the name of the codec.

Ok, so you need to know which codecs your tool can support and
which of those need text input and which bytes input.

I've been thinking about this some more: I think that type
information alone is not flexible enough to cover such
use cases.


Maybe MIME type and encoding would be sufficient type information, but 
probably not str vs. bytes.



In your use case you'd want to only permit use of a certain
set of codecs, not simply all of them, since some might
not implement what you actually want to achieve with the tool,
e.g. a user might have installed a codec set that adds
support for reading and writing image data, but your
intended use was to only support text data.


MIME type supports this sort of concept, with the two-level hierarchy of 
naming the type... text/xml text/plain image/jpeg



So what we need is a way to allow the codecs to say e.g.
"I work on text", "I support encoding bytes and text",
"I encode to bytes", "I'm reversible", "I transform
input data", "I support bytes and text, and will create
same type output", "I work on image data", "I work on
X509 certificates", "I work on XML data", etc.


Guess what I think you are re-inventing here
Nope, guess again
Yep, MIME types _plus_ encodings.


In other words, we need a form of tagging system, with a
set of standard tags that each codec can publish and
which also allows non-standard tags (which can then at
some point be made standard, if there's agreement on them).


Hmm.  Sounds just like the registry for, um, you guessed it: MIME types.


Given a codec name you could then ask the codec registry for
the codec tags and verify that the chosen codec handles
text data, needs bytes or text encoding input and
creates bytes as encoding output. If the registry returns
codec tags that don't include the "I work on text" tag,
the tool could then raise an error.


For just doing text encoding transformations,  text/plain would work as 
a MIME type, and the encodings of interest for the encodings.


Seems like "str" always means Unicode but the MIME type can vary;
"bytes" might mean encoded text, and the MIME type can also vary.


For non-textual transformations, "encoding" might mean Base 64, BinHex,
or other such representations... but those can also be applied to text,
so it might be a 3rd dimension, or it might just be a list of encodings
rather than a single encoding.


Compression could be another dimension, or perhaps another encoding.

But really, then, a transformation needs to be a list of steps; a codec 
can sign up to perform one or more of the steps, a sequence of codecs 
would have to be found, capable of performing a subsequence of the 
steps, and then run in the appropriate order.


This all sounds so general, that probably the Python compiler could be 
implemented as a codec :)  Or any compiler. Probably a web server could 
be implemented as a codec too :)  Well, maybe not, codecs have limited 
error handling and reporting abilities.
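
The "list of steps" idea can be sketched with the existing codecs
functions. The STEPS list and the two helper functions are a made-up
convention for this example; only codecs.encode() and codecs.decode()
and the codec names come from the stdlib.

```python
import codecs

# Text encoding, then compression, then a 7-bit-safe representation.
STEPS = ["utf-8", "zlib_codec", "base64_codec"]

def run_forward(value, steps=STEPS):
    """Apply each codec's encoder in order."""
    for name in steps:
        value = codecs.encode(value, name)
    return value

def run_backward(value, steps=STEPS):
    """Undo the steps by applying each codec's decoder in reverse order."""
    for name in reversed(steps):
        value = codecs.decode(value, name)
    return value

original = "text with an accent: é"
packed = run_forward(original)
assert isinstance(packed, bytes)
assert run_backward(packed) == original
```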

Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-24 Thread Tres Seaver
On 04/23/2013 09:29 AM, Stephen J. Turnbull wrote:
 By RFC specification, BASE64 is a *textual* representation of
 arbitrary binary data.

It isn't text in the sense Py3k means:  it is a representation for
transmission on-the-wire for protocols which require 7-bit-safe data.
Nobody working with base64-encoded data is going to expect to do normal
string processing on that data:  the closest thing to that is splitting
it into 72-byte chunks for transmission via e-mail.

Tres.
- -- 
===
Tres Seaver          +1 540-429-0999          tsea...@palladion.com
Palladion Software   Excellence by Design     http://palladion.com


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-24 Thread Stephen J. Turnbull
Tres Seaver writes:

  On 04/23/2013 09:29 AM, Stephen J. Turnbull wrote:
   By RFC specification, BASE64 is a *textual* representation of
   arbitrary binary data.
  
  It isn't text in the sense Py3k means:

RFC 4648 repeatedly refers to *characters*, without specifying an
encoding for them.  In fact, if you copy accurately, you can write
BASE64 on a napkin and that napkin will accurately transmit the data
(assuming it doesn't run into sleet or gloom of night).  What else is
that but text in the sense of Py3k?

My point is not that Python's base64 codec *should* be bytes-to-str
and back.  My point is that, both in the formal spec and in historical
evolution, that is a plausible interpretation of .encode('base64')
which happens to be the reverse of the normal codec convention, where
.encode(codec) is a *string* method, and .decode(codec) is a
*bytes* method.

This is not harder to learn for people (for BASE64 encoding or for
coded character sets), because in each case there's a natural sense of
direction for *en*coding vs. *de*coding.  But it does break duck-
typing, as does the web developer bytes-to-bytes usage of BASE64.

What I'm groping toward is an idea of a variable method, so that we
could use .encode and .decode where they are TOOWTDI for people even
though a purely formal interpretation of duck-typing would say "but
why is that blue whale quacking, waddling, and flying?"  In other
words (although I have no idea how best to implement it), I would like
somestring.encode('base64') to fail with "I don't know how to do
that" (an attribute lookup error?), the same way that
somebytes.encode('utf-8') does in Python 3 today.
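
For comparison, a small sketch of what the two calls do in current
Python 3 releases. This shows present-day behaviour, not the "variable
method" idea itself, and the variable names are only illustrative.

```python
somebytes = b"abc"
somestring = "abc"

bytes_result = None
str_result = None

# bytes has no .encode method at all in Python 3.
try:
    somebytes.encode("utf-8")
except AttributeError:
    bytes_result = "AttributeError"

# str.encode refuses codecs that are not text encodings.
try:
    somestring.encode("base64")
except LookupError:
    str_result = "LookupError"
```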



Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-24 Thread Lennart Regebro
On Thu, Apr 25, 2013 at 3:54 AM, Stephen J. Turnbull step...@xemacs.org wrote:
 RFC 4648 repeatedly refers to *characters*, without specifying an
 encoding for them.  In fact, if you copy accurately, you can write
 BASE64 on a napkin and that napkin will accurately transmit the data
 (assuming it doesn't run into sleet or gloom of night).

Or Mrs Cake.

 What else is that but text in the sense of Py3k?

Text in the sense of Py3k is Unicode. That an 8-bit character stream
(or in this case 6-bit) fits in the 31 bit character space of Unicode
doesn't make it Unicode, and hence not text. (Napkins of course have
even higher bit density than 31 bits per character, unless you write
very small). From the viewpoint of Py3k, bytes data is not text.

This is a very useful way to deal with Unicode. See also
http://regebro.wordpress.com/2011/03/23/unconfusing-unicode-what-is-unicode/

 My point is not that Python's base64 codec *should* be bytes-to-str
 and back.

Base64 does not convert between a Unicode character stream and an
8-bit byte stream. It converts between an 8-bit byte stream and an
8-bit byte stream. It therefore should be bytes to bytes. To fit
Unicode text into Base64 you have to first use an encoding on that
Unicode text to convert it to bytes.

 What I'm groping toward is an idea of a variable method, so that we
 could use .encode and .decode where they are TOOWTDI for people even
 though a purely formal interpretation of duck-typing would say "but
 why is that blue whale quacking, waddling, and flying?"  In other
 words (although I have no idea how best to implement it), I would like
 somestring.encode('base64') to fail with "I don't know how to do
 that" (an attribute lookup error?), the same way that
 somebytes.encode('utf-8') does in Python 3 today.

There are only two options there. Either you get a "LookupError: unknown
encoding: base64", which is what you get now, or you get a
UnicodeEncodeError if the text is not ASCII. We don't want the
latter, because it means that code that looks fine for the developer
breaks in real life because the developer was American and didn't
think of this, but his client happens to have an accent in the name.

Base64 is an encoding that transforms between 8-bit streams. Let it be
that. Don't try to shoehorn it into a completely different kind of
encoding.
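
A minimal sketch of that two-step process, assuming the base64 module as
it behaves in current Python 3 releases. The sample name and the choice
of UTF-8 are made up for the example.

```python
import base64

name = "Åsa"                      # made-up non-ASCII text

# Passing a str directly is rejected: the encoder only accepts bytes.
try:
    base64.b64encode(name)
except TypeError:
    rejected = True
else:
    rejected = False

# Choose a text encoding explicitly first, then base64-encode the bytes.
encoded = base64.b64encode(name.encode("utf-8"))
assert isinstance(encoded, bytes)

# Reverse both steps to get the text back.
assert base64.b64decode(encoded).decode("utf-8") == name
```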

//Lennart


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-24 Thread Antoine Pitrou
On Thu, 25 Apr 2013 04:19:36 +0200
Lennart Regebro rege...@gmail.com wrote:
 On Thu, Apr 25, 2013 at 3:54 AM, Stephen J. Turnbull step...@xemacs.org 
 wrote:
  RFC 4648 repeatedly refers to *characters*, without specifying an
  encoding for them.
[...]
 
 Base64 is an encoding that transforms between 8-bit streams.

No, it isn't. What Stephen wrote above.

 Either you get a "LookupError: unknown
 encoding: base64", which is what you get now, or you get a
 UnicodeEncodeError if the text is not ASCII. We don't want the
 latter, because it means that code that looks fine for the developer
 breaks in real life because the developer was American

That's bogus. By the same argument, we should suppress any
encoding which isn't able to represent all possible unicode strings.
That's almost all encodings provided by Python (including utf-8, if
you consider lone surrogates).

I'm sorry for Americans, but they *still* must know about character
encodings, and be ready to handle UnicodeErrors, when using Python 3 for
encoding/decoding bytestrings. There's no way around it.

Regards

Antoine.




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Greg Ewing

Steven D'Aprano wrote:
- If it is no burden to have to import a module and call an external 
function for some transformations, why have encode and decode methods at 
all?


Now that all text strings are unicode, the unicode codecs
are in a sense special, in that you can't do any string
I/O at all without using them at some stage. So arguably
it makes sense to have a very easy way of invoking them.

I suspect that without this, the idea of all strings
being unicode would have been even harder to sell than
it was.

--
Greg


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Stephen J. Turnbull
R. David Murray writes:

  You transform *into* the encoding, and untransform *out* of the
  encoding.  Do you have an example where that would be ambiguous?

In the bytes-to-bytes case, any pair of character encodings (eg, UTF-8
and ISO-8859-15) would do.  Or how about in text, ReST to HTML?

BASE64 itself is ambiguous.  By RFC specification, BASE64 is a
*textual* representation of arbitrary binary data.  (Cf. URIs.)  The
natural interpretation of .encode('base64') in that context would be
as a bytes-to-text encoder.  However, this has several problems.  In
practice, we invariably use an ASCII octet stream to carry BASE64-
encoded data.  So web developers would almost certainly expect a
bytes-to-bytes encoder.  Such a bytes-to-bytes encoder can't be
duck-typed.  Double-encoding bugs wouldn't be detected until the
stream arrives at the user.  And the RFC-based signature of
.encode('base64') as bytes-to-text is precisely opposite to that of
.encode('utf-8') (text-to-bytes).

It is certainly true that there are many unambiguous cases.  In the
case of a true text processing facility (eg, Emacs buffers or Python 3
str) where there is an unambiguous text type with a constant and
opaque internal representation, it makes a lot of sense to treat the
text type as special/central, and use the terminology "encode [from
text]" and "decode [to text]".  It's easy to remember, which one is
special is obvious, and the difference in input and output types means
that mistaken use of the API will be detected by duck-typing.

However, in the case of bytes-bytes or text-text transformations, it's
not the presence of unambiguous cases that should drive API design
IMO.  It's the presence of the ambiguous cases that we should cater
to.  I don't see easy solutions to this issue.

Steve


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread R. David Murray
On Tue, 23 Apr 2013 22:29:33 +0900, Stephen J. Turnbull step...@xemacs.org 
wrote:
 R. David Murray writes:
 
   You transform *into* the encoding, and untransform *out* of the
   encoding.  Do you have an example where that would be ambiguous?
 
 In the bytes-to-bytes case, any pair of character encodings (eg, UTF-8
 and ISO-8859-15) would do.  Or how about in text, ReST to HTML?

If I write:

  bytestring.transform('ISO-8859-15')

that would indeed be ambiguous, but only because I haven't named the
source encoding of the bytestring.  So the above is obviously
nonsense, and the easiest fix is to have the things that are currently
bytes-to-text or text-to-bytes character set transformations *only*
work with encode/decode, and not transform/untransform.
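
A short illustration of why a single codec name is not enough for a
charset-to-charset conversion: with the existing encode/decode methods
both the source and the target encoding have to be named. The sample
text is made up.

```python
utf8_bytes = "café €".encode("utf-8")     # made-up sample text, as UTF-8 bytes

# Source encoding named in decode(), target encoding named in encode().
latin9_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-15")

assert latin9_bytes == b"caf\xe9 \xa4"
assert latin9_bytes.decode("iso-8859-15") == "café €"
```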

 BASE64 itself is ambiguous.  By RFC specification, BASE64 is a
 *textual* representation of arbitrary binary data.  (Cf. URIs.)  The
 natural interpretation of .encode('base64') in that context would be
 as a bytes-to-text encoder.  However, this has several problems.  In
 practice, we invariably use an ASCII octet stream to carry BASE64-
 encoded data.  So web developers would almost certainly expect a
 bytes-to-bytes encoder.  Such a bytes-to-bytes encoder can't be
 duck-typed.  Double-encoding bugs wouldn't be detected until the
 stream arrives at the user.  And the RFC-based signature of
 .encode('base64') as bytes-to-text is precisely opposite to that of
 .encode('utf-8') (text-to-bytes).

I believe that after much discussion we have settled on these
transformations (in their respective modules) accepting either bytes
or strings as input for decoding, only bytes as input for encoding,
and *always* producing bytes as output.  (Note that the base64 docs need
some clarification about this.)

Given this, the possible valid transformations would be:

  bytestring.transform('base64')
  bytestring.untransform('base64')
  string.untransform('base64')

and all would produce a byte string.  That byte string would be in
base64 for the first one, and a decoded binary string for the second two.

Given our existing API, I don't think we want

  string.encode('base64')

to work (taking an ascii-only unicode string and returning bytes), and
we've already agreed that adding a 'decode' method to string is not
going to happen.

We could, however, and quite possibly should, disallow

  string.untransform('base64')

even though the underlying module supports it.  Thus we would only have
bytes-to-bytes transformations for 'base64' and its siblings, and you
would write the unicode-ascii-to-bytes transformation as:

  string.encode('ascii').untransform('base64')

which has some pedagogical value :).

If you do transform('base64') on a bytestring already encoded as base64
you get a double encoding, yes.  I don't see that it is our responsibility
to try to protect you from this mistake.  The module functions certainly
don't.
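
A short sketch of the double-encoding case with the current module functions, to show that nothing flags it (the payload is an arbitrary example):

```python
import base64

payload = b'hello'
once = base64.b64encode(payload)
# Base64 output is itself just bytes, so encoding it again is accepted silently.
twice = base64.b64encode(once)

# One decode of the double-encoded value gives back base64 text, not the payload.
assert base64.b64decode(twice) == once
# Only a second decode recovers the original bytes.
assert base64.b64decode(base64.b64decode(twice)) == payload
```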

Given that, is there anything ambiguous about the proposed API?

(Note: if you would like to argue that, eg, base64.b64encode or
binascii.b2a_base64 should return a string, it is too late for that
argument for backward compatibility reasons.)

 It is certainly true that there are many unambiguous cases.  In the
 case of a true text processing facility (eg, Emacs buffers or Python 3
 str) where there is an unambiguous text type with a constant and
 opaque internal representation, it makes a lot of sense to treat the
 text type as special/central, and use the terminology encode [from
 text] and decode [to text].  It's easy to remember, which one is
 special is obvious, and the difference in input and output types means
 that mistaken use of the API will be detected by duck-typing.
 
 However, in the case of bytes-bytes or text-text transformations, it's
 not the presence of unambiguous cases that should drive API design
 IMO.  It's the presence of the ambiguous cases that we should cater
 to.  I don't see easy solutions to this issue.

When I asked about ambiguous cases, I was asking for cases where the
meaning of transform('somecodec') was ambiguous.  Sure, it is possible
to feed the wrong input into that transformation, but I consider that a
programming error, not an ambiguity in the API.  After all, you have
exactly the same problem if you use the module functions directly,
which is currently the only option.

--David


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Nick Coghlan
On Wed, Apr 24, 2013 at 12:16 AM, R. David Murray rdmur...@bitdance.com wrote:
 On Tue, 23 Apr 2013 22:29:33 +0900, Stephen J. Turnbull 
 step...@xemacs.org wrote:
 R. David Murray writes:

   You transform *into* the encoding, and untransform *out* of the
   encoding.  Do you have an example where that would be ambiguous?

 In the bytes-to-bytes case, any pair of character encodings (eg, UTF-8
 and ISO-8859-15) would do.  Or how about in text, ReST to HTML?

 If I write:

   bytestring.transform('ISO-8859-15')

 that would indeed be ambiguous, but only because I haven't named the
 source encoding of the bytestring.  So the above is obviously
 nonsense, and the easiest fix is to have the things that are currently
 bytes-to-text or text-to-bytes character set transformations *only*
 work with encode/decode, and not transform/untransform.

And that's where it all falls down - to make that work, you need to
engineer a complex system into the codecs module to say "this codec
can be used with that API, but not with this one". I designed such a
beast in http://bugs.python.org/issue7475 and I now think it's a *bad
idea*.

By contrast, the convenience function approach dispenses with all
that, and simply says:

1. If you just want to deal with text encodings, use str.encode (which
always produces bytes), along with bytes.decode and bytearray.decode
(which always produce str)
2. If you want to use arbitrary codecs without any additional type
constraints, do "from codecs import encode, decode"

I think there's value in hiding the arbitrary codec support behind an
import barrier (as they definitely have potential to be an attractive
nuisance that makes it harder to grasp the nature of Unicode and text
encodings, particularly for those coming from Python 2.x), but I'm not
hugely opposed to providing them as builtins either.
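
The two tiers described above can be sketched roughly as follows. This assumes a recent CPython 3, where the 'base64_codec' name is registered as a bytes-to-bytes codec; the sample strings are illustrative only.

```python
import codecs

# 1. Text encodings: str.encode gives bytes, bytes.decode gives str.
raw = 'caf\u00e9'.encode('utf-8')
assert isinstance(raw, bytes)
assert raw.decode('utf-8') == 'caf\u00e9'

# 2. Arbitrary codecs: the codecs functions add no type constraints of
#    their own, so a bytes-to-bytes codec works through them.
packed = codecs.encode(b'hello', 'base64_codec')
unpacked = codecs.decode(packed, 'base64_codec')
assert isinstance(packed, bytes)
assert unpacked == b'hello'

# The method form refuses the same codec. Recent CPython 3 releases raise
# LookupError here; the exact exception in older 3.x releases is an
# assumption, so TypeError is accepted as well.
try:
    packed.decode('base64_codec')
except (LookupError, TypeError):
    method_refused = True
else:
    method_refused = False
```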

Cheers,
Nick.

--
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Barry Warsaw
On Apr 22, 2013, at 10:30 PM, Donald Stufft wrote:

I may be dull, but it wasn't until I started using Python 3 that it really
clicked in my head what encode/decode did exactly. In Python2 I just sort of
sprinkled one or the other when there were errors until the pain stopped. I
mostly attribute this to str.decode and bytes.encode not existing.

This is a key observation.  It's also now much easier to *explain* what's
going on and recommend correct code in Python 3, so overall it's a win.

That's not to downplay the inconvenience of not being able to do
bytes-bytes or str-str transformations as easily as was possible in Python
2.  I've not thought about it much, but placing those types of transformations
on a different set of functions (methods or builtins) seems like the right
direction.  IOW, don't mess with encode/decode.

-Barry




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Barry Warsaw
On Apr 22, 2013, at 06:22 PM, Guido van Rossum wrote:

 You can ask the same question about all the other codecs.  (And that
 question has indeed been asked in the past.)

Except for rot13. :-)

The fact that you can do this instead *is* a bit odd. ;)

from codecs import getencoder
encoder = getencoder('rot-13')
r13 = encoder('hello world')[0]

-Barry


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread M.-A. Lemburg
On 23.04.2013 17:15, Barry Warsaw wrote:
 On Apr 22, 2013, at 06:22 PM, Guido van Rossum wrote:
 
 You can ask the same question about all the other codecs.  (And that
 question has indeed been asked in the past.)

 Except for rot13. :-)
 
 The fact that you can do this instead *is* a bit odd. ;)
 
 from codecs import getencoder
 encoder = getencoder('rot-13')
 r13 = encoder('hello world')[0]

Just as reminder: we have the general purpose
encode()/decode() functions in the codecs module:

import codecs
r13 = codecs.encode('hello world', 'rot-13')

These interface directly to the codec interfaces, without
enforcing type restrictions. The codec defines the supported
input and output types.
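
As a small illustration of why the getencoder() spelling needs the trailing [0], here is a sketch comparing the two forms. It assumes the codec is registered under the name 'rot_13', as in current CPython.

```python
import codecs

# getencoder() returns the codec's raw encode function, which yields a
# (output, length_consumed) tuple.
encoder = codecs.getencoder('rot_13')
output, consumed = encoder('hello world')
assert consumed == len('hello world')

# codecs.encode() unpacks that tuple and returns only the output.
assert codecs.encode('hello world', 'rot_13') == output

# rot-13 is a str-to-str codec and is its own inverse.
assert isinstance(output, str)
assert codecs.decode(output, 'rot_13') == 'hello world'
```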

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, Apr 23 2013)
 Python Projects, Consulting and Support ...   http://www.egenix.com/
 mxODBC.Zope/Plone.Database.Adapter ...   http://zope.egenix.com/
 mxODBC, mxDateTime, mxTextTools ...http://python.egenix.com/

2013-04-17: Released eGenix mx Base 3.2.6 ... http://egenix.com/go43

: Try our mxODBC.Connect Python Database Interface for free ! ::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Guido van Rossum
On Tue, Apr 23, 2013 at 8:22 AM, M.-A. Lemburg m...@egenix.com wrote:
 Just as reminder: we have the general purpose
 encode()/decode() functions in the codecs module:

 import codecs
 r13 = codecs.encode('hello world', 'rot-13')

 These interface directly to the codec interfaces, without
 enforcing type restrictions. The codec defines the supported
 input and output types.

As an implementation mechanism I see nothing wrong with this. I hope
the codecs module lets you introspect the input and output types of a
codec given by name?

--
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread M.-A. Lemburg
On 23.04.2013 17:47, Guido van Rossum wrote:
 On Tue, Apr 23, 2013 at 8:22 AM, M.-A. Lemburg m...@egenix.com wrote:
 Just as reminder: we have the general purpose
 encode()/decode() functions in the codecs module:

 import codecs
 r13 = codecs.encode('hello world', 'rot-13')

 These interface directly to the codec interfaces, without
 enforcing type restrictions. The codec defines the supported
 input and output types.
 
 As an implementation mechanism I see nothing wrong with this. I hope
 the codecs module lets you introspect the input and output types of a
 codec given by name?

At the moment there is no standard interface to access supported
input and output types... but then: regular Python functions or
methods also don't provide such functionality, so no surprise
there ;-)

It's mostly a matter of specifying the supported type
combinations in the codec documentation.

BTW: What would be a use case where you'd want to
programmatically access such information before calling
the codec ?

-- 
Marc-Andre Lemburg
eGenix.com



Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Stephen J. Turnbull
R. David Murray writes:
  On Tue, 23 Apr 2013 22:29:33 +0900, Stephen J. Turnbull 
  step...@xemacs.org wrote:
   R. David Murray writes:
   
 You transform *into* the encoding, and untransform *out* of the
 encoding.  Do you have an example where that would be ambiguous?
   
   In the bytes-to-bytes case, any pair of character encodings (eg, UTF-8
   and ISO-8859-15) would do.  Or how about in text, ReST to HTML?
  
  If I write:
  
bytestring.transform('ISO-8859-15')
  
  that would indeed be ambiguous, but only because I haven't named the
  source encoding of the bytestring.  So the above is obviously
  nonsense, and the easiest fix is to have the things that are currently
  bytes-to-text or text-to-bytes character set transformations *only*
  work with encode/decode, and not transform/untransform.

I think you're completely missing my point here.  The problem is that
in the cases I mention, what is encoded data and what is decoded data
can only be decided by asking the user.

  I believe that after much discussion we have settled on these
  transformations (in their respective modules) accepting either bytes
  or strings as input for decoding, only bytes as input for encoding,
  and *always* producing bytes as output.

Which, of course, is quite broken from the point of view of the RFC!
Of course, the RFC be damned[1], for the purposes of the Python
stdlib, the specific codecs used for Content-Transfer-Encoding have a
clear intuitive directionality, and their encoding methods should turn
bytes into bytes (and str or bytes into bytes on decoding).

Nevertheless, it's not TOOWTDI, it's a careful compromise.

  Given this, the possible valid transformations would be:
  
bytestring.transform('base64')
bytestring.untransform('base64')
string.untransform('base64')

Which is an obnoxious API, since (1) you've now made it impossible to
use transform for

bytestring.transform(from='utf-8', to='iso-8859-1')
bytestring.transform(from='ulaw', to='mp3')
textstring.transform(from='rest', to='html')

without confusion, and (2) the whole world is going to wonder why you
don't use .encode and .decode instead of .transform and .untransform.

The idea in the examples is that we could generalize the codec
registry to look up codecs by pairs of media-types.  I'm not sure this
makes sense ... much of the codec API presumes a stream, especially
the incremental methods.  But many MIME media types are streams only
because they're serializations, incremental en/decoding is nonsense.

So I suppose I would want to write

bytestring.transform(from='octet-stream', to='BASE64')

for this hypothetical API.  (I suspect that in practice the
'application/octet-stream' media type would be spelled 'bytes', of
course.)  This kind of API could be used to improve the security of
composition of transforms.  In the case of BASE64, it would make sense
to match anything at all as the other type (as long as it's
represented in Python by a bytes object).  So it would be possible to
do

object = bytestring.transform(from='BASE64', to='PNG')

giving object a media_type attribute such that

object.decode('iso-8859-1')

would fail.  (This would require changes to the charset codecs, to pay
heed to the media_type attribute, so it's not immediately feasible.)
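
To make the pair-keyed lookup idea concrete, here is a minimal sketch. Everything in it is hypothetical: the registry, the register() decorator, the transform() function and the 'bytes'/'base64' type names are assumptions for illustration, not an existing or proposed stdlib API. Since "from" is a reserved word in Python, the sketch spells the parameters "source" and "target".

```python
import base64

# Hypothetical registry mapping (source, target) type names to functions.
_TRANSFORMS = {}


def register(source, target):
    """Register a function as the transform from source to target."""
    def decorator(func):
        _TRANSFORMS[(source, target)] = func
        return func
    return decorator


@register('bytes', 'base64')
def _bytes_to_base64(data):
    return base64.b64encode(data)


@register('base64', 'bytes')
def _base64_to_bytes(data):
    return base64.b64decode(data)


def transform(data, source, target):
    """Look up a transform by its (source, target) pair and apply it."""
    try:
        func = _TRANSFORMS[(source, target)]
    except KeyError:
        raise LookupError(
            'no transform registered from %r to %r' % (source, target)
        ) from None
    return func(data)
```

With both ends named at the call site, transform(b'hello', source='bytes', target='base64') cannot be mistaken for its inverse, which is the directionality point being argued here.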

  and all would produce a byte string.  That byte string would be in
  base64 for the first one, and a decoded binary string for the second two.
  
  Given our existing API, I don't think we want
  
string.encode('base64')
  
  to work (taking an ascii-only unicode string and returning bytes),

No, we don't, but for reasons that have little to do with ASCII-only.
The problem with supporting that idiom is that *people can't read
strs* [in the Python 3 internal representation] -- they can only read
a str that has been encoded implicitly into the PYTHONIOENCODING or
explicitly to an explicitly requested encoding.  So the usage above is
clearly ambiguous.  Even if it is ASCII-only, in theory the user could
want EBCDIC.

  If you do transform('base64') on a bytestring already encoded as
  base64 you get a double encoding, yes.  I don't see that it is our
  responsibility to try to protect you from this mistake.  The module
  functions certainly don't.
  
  Given that, is there anything ambiguous about the proposed API?

Not for BASE64.  But what's so special about BASE64 that it deserves a
new method name for the same old idiom, using a word that's an obvious
candidate for naming a more general idiom?

  (Note: if you would like to argue that, eg, base64.b64encode or
  binascii.b2a_base64 should return a string, it is too late for that
  argument for backward compatibility reasons.)

Even if it weren't too late, the byte-shoveling lobby is way too
strong; that's not a winnable argument.

  When I asked about ambiguous cases, I was asking for cases where
  the meaning of transform('somecodec') was ambiguous.

If transform('somecodec') isn't ambiguous, 

Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Guido van Rossum
On Tue, Apr 23, 2013 at 9:04 AM, M.-A. Lemburg m...@egenix.com wrote:
 On 23.04.2013 17:47, Guido van Rossum wrote:
 On Tue, Apr 23, 2013 at 8:22 AM, M.-A. Lemburg m...@egenix.com wrote:
 Just as reminder: we have the general purpose
 encode()/decode() functions in the codecs module:

 import codecs
 r13 = codecs.encode('hello world', 'rot-13')

 These interface directly to the codec interfaces, without
 enforcing type restrictions. The codec defines the supported
 input and output types.

 As an implementation mechanism I see nothing wrong with this. I hope
 the codecs module lets you introspect the input and output types of a
 codec given by name?

 At the moment there is no standard interface to access supported
 input and output types... but then: regular Python functions or
 methods also don't provide such functionality, so no surprise
 there ;-)

Not quite the same though. Each function has its own unique behavior.
But codecs support a standard interface, *except* that the input and
output types sometimes vary.

 It's mostly a matter of specifying the supported type
 combinations in the codec documentation.

 BTW: What would be a use case where you'd want to
 programmatically access such information before calling
 the codec ?

As you know, in Python 3, most code working with bytes doesn't also
work with strings, and vice versa (except for a few cases where we've
gone out of our way to write polymorphic code -- but users rarely do
so, and any time you use a string or bytes literal you basically limit
yourself to that type).

Suppose I write a command-line utility that reads a file, runs it
through a codec, and writes the result to another file. Suppose the
name of the codec is a command-line argument (as well as the
filenames). I need to know whether to open the files in text or binary
mode based on the name of the codec.
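
Since there is no standard introspection interface, one workaround for this use case is to probe the codec with empty inputs. The sketch below is only a heuristic under stated assumptions: both helper names are hypothetical, it covers only the encode direction, and it assumes a codec signals an unsupported input type with TypeError, which holds for the stdlib codecs shown but is not guaranteed in general.

```python
import codecs


def probe_encoder_types(name):
    """Hypothetical helper: map accepted input type names to output type names."""
    accepted = {}
    for sample in ('', b''):
        try:
            result = codecs.encode(sample, name)
        except TypeError:
            continue  # the codec refuses this input type
        accepted[type(sample).__name__] = type(result).__name__
    return accepted


def file_modes_for_encoding(name):
    """Hypothetical helper: choose (read_mode, write_mode) for encoding a file."""
    accepted = probe_encoder_types(name)
    if not accepted:
        raise LookupError('could not determine types for codec %r' % name)
    # If a codec accepts both str and bytes, the str entry comes first.
    in_type, out_type = next(iter(accepted.items()))
    return ('rb' if in_type == 'bytes' else 'r',
            'wb' if out_type == 'bytes' else 'w')
```

For example, file_modes_for_encoding('utf-8') selects text mode for reading and binary mode for writing, while 'base64_codec' selects binary mode for both.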

--
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread R. David Murray
On Wed, 24 Apr 2013 01:49:39 +0900, Stephen J. Turnbull step...@xemacs.org 
wrote:
 R. David Murray writes:
   On Tue, 23 Apr 2013 22:29:33 +0900, Stephen J. Turnbull 
 step...@xemacs.org wrote:
R. David Murray writes:

  You transform *into* the encoding, and untransform *out* of the
  encoding.  Do you have an example where that would be ambiguous?

In the bytes-to-bytes case, any pair of character encodings (eg, UTF-8
and ISO-8859-15) would do.  Or how about in text, ReST to HTML?
   
   If I write:
   
 bytestring.transform('ISO-8859-15')
   
   that would indeed be ambiguous, but only because I haven't named the
   source encoding of the bytestring.  So the above is obviously
   nonsense, and the easiest fix is to have the things that are currently
   bytes-to-text or text-to-bytes character set transformations *only*
   work with encode/decode, and not transform/untransform.
 
 I think you're completely missing my point here.  The problem is that
 in the cases I mention, what is encoded data and what is decoded data
 can only be decided by asking the user.

I think I understood that.  I don't understand why that's a problem.
(But see below.)

   Given this, the possible valid transformations would be:
   
 bytestring.transform('base64')
 bytestring.untransform('base64')
 string.untransform('base64')
 
 Which is an obnoxious API, since (1) you've now made it impossible to
 use transform for
 
 bytestring.transform(from='utf-8', to='iso-8859-1')
 bytestring.transform(from='ulaw', to='mp3')
 textstring.transform(from='rest', to='html')
 
 without confusion, and (2) the whole world is going to wonder why you
 don't use .encode and .decode instead of .transform and .untransform.

I've been trying to explain what I thought the transform/untransform
proposal was: a minimalist extension of the encode/decode semantic
(under a different name) so that functionality that was lost from
Python2 encode/decode could be restored to Python3 in a reasonably
understandable way.  This would be a *limited* convenience function,
just as encode/decode are limited convenience functions with respect to
the full power of the codecs module.

I myself don't have any real investment in the proposal, or I would
have long since tried to push the tracker issue forward.

People (at least you and Nick, and maybe Guido) seem to be more interested
in a more general/powerful mechanism.  I'm fine with that :)

--David


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Terry Jan Reedy

On 4/23/2013 12:49 PM, Stephen J. Turnbull wrote:


Which is an obnoxious API, since (1) you've now made it impossible to
use transform for

 bytestring.transform(from='utf-8', to='iso-8859-1')
 bytestring.transform(from='ulaw', to='mp3')
 textstring.transform(from='rest', to='html')

without confusion, and (2) the whole world is going to wonder why you
don't use .encode and .decode instead of .transform and .untransform.


I think the unambiguous solution is to get rid of the notion of 
'untransform' (which only means 'transform in the other direction'), 
since it requires and presumes an asymmetry that is not always present. 
It is precisely the lack of asymmetry in examples like the above that 
makes the transform/untransform pair ambiguous as to which is which.


 .transform should be explicit and always take two args, no implicit 
defaults, the 'from' form and the 'to' form. They can be labelled by 
position in the natural order (from, to) or by keyword, as in your 
examples. For text, the plain undifferentiated form which one might 
think of as default could be called 'text' and that for bytes 'bytes' 
(as you suggest) or 'ascii' as appropriate.


str.transform would always be unicode to unicode and bytes.transform 
always bytes to bytes.




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Nick Coghlan
On 24 Apr 2013 01:25, M.-A. Lemburg m...@egenix.com wrote:

 On 23.04.2013 17:15, Barry Warsaw wrote:
  On Apr 22, 2013, at 06:22 PM, Guido van Rossum wrote:
 
  You can ask the same question about all the other codecs.  (And that
  question has indeed been asked in the past.)
 
  Except for rot13. :-)
 
  The fact that you can do this instead *is* a bit odd. ;)
 
  from codecs import getencoder
  encoder = getencoder('rot-13')
  r13 = encoder('hello world')[0]

 Just as reminder: we have the general purpose
 encode()/decode() functions in the codecs module:

 import codecs
 r13 = codecs.encode('hello world', 'rot-13')

 These interface directly to the codec interfaces, without
 enforcing type restrictions. The codec defines the supported
 input and output types.

If we already have those, why aren't they documented? If they exist, they
should be the first thing in the codecs module docs and the porting guide
should list them as the replacement for the method versions when using
encodings that aren't directly related to the text model, or when the input
buffer for decoding isn't a bytes or bytearray object.

Regards,
Nick.




Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Greg Ewing

Stephen J. Turnbull wrote:

By RFC specification, BASE64 is a
*textual* representation of arbitrary binary data.  (Cf. URIs.)  The
natural interpretation of .encode('base64') in that context would be
as a bytes-to-text encoder.  However, ...  In
practice, we invariably use an ASCII octet stream to carry BASE64-
encoded data.


As an aside, if we'd had the flexible string representation sooner,
this needn't have been such a big problem. With it, the base64
encoder could return a unicode string with 8-bit representation,
which could then be turned into an ascii byte string with
negligible overhead.

Web developers might grumble about the need for an extra call,
but they can no longer claim it would kill the performance of
their web server.

--
Greg


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Stephen J. Turnbull
R. David Murray writes:

   I think you're completely missing my point here.  The problem is that
   in the cases I mention, what is encoded data and what is decoded data
   can only be decided by asking the user.
  
  I think I understood that.  I don't understand why that's a
  problem.

It's a problem because in that case it's hard for users to remember
the directionality of the codec based only on a single name; the API
needs to indicate what is being transformed to what else.

  I've been trying to explain what I thought the transform/untransform
  proposal was: a minimalist extension of the encode/decode semantic
  (under a different name) so that functionality that was lost from
  Python2 encode/decode could be restored to Python3 in a reasonably
  understandable way.

I think that the intention of the proposal is reasonably
understandable, and reasonable.  I just don't think the API proposed
is understandable, and therefore it's not reasonable. <wink/>

  People (at least you and Nick, and maybe Guido) seem to be more
  interested in a more general/powerful mechanism.  I'm fine with
  that :)

I can't speak to the opinions of people who actually know about
language design.  For myself, I'm sympathetic to the proposal of a
specific API limited to cases where the directionality is clear as a
generality.  I just don't think the transform proposal helps much,
partly because the actual applications are few, and partly because
"transform" is more ambiguous (to be unambiguous in English, you need
both the direct object ("from" media type) and the indirect object
("to" media type) specified).  It is quite possible to say "transform
encoded text to raw text" or similar.  At least for me, "encode
transformed text to raw text" raises a WTFAssertion.

I know that I've experienced worlds of pain in the character coding
sphere from Emacs APIs and UIs that don't indicate directionality
clearly.  This is very delicate; GNU Emacs had an ugly bug that
regressed multiple times over more than a decade merely because they
exposed the internal representation of text to Lisp.  XEmacs has never
experienced that bug (to be precise, the presence of that kind of bug
resulted in an immediate assertion, so it was eliminated early in
development).  Surprisingly to me, the fact that XEmacs uses the
internal representation of *text* to also represent byte streams
(with bytes of variable width!) has never caused me confusion.  It
does cause others confusion, though, so although the XEmacs model of
text is easier to work with than Emacs's, I tend to think Python 3's
(which never confounds text with bytes) is better.

I suspect that delicacy extends to non-character transformations, so I
am pretty demanding about proposals in this area.  Specifically I
insist on EIBTI and TOOWTDI.

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Stephen J. Turnbull
Greg Ewing writes:

  Web developers might grumble about the need for an extra call,
  but they can no longer claim it would kill the performance of
  their web server.

Of course they can.  There never was any performance measurement that
supported that claim in the first place.  I don't see how PEP 393
makes a difference to them.  The real problem for them is that
conceptually they think ASCII in byte form *is* text, and they want to
do text processing on it.  They'll use any flimsy excuse to avoid a
transform to str, because it's just unbearably ugly given their
givens.

I have sympathy for their position, I just (even today) think it's the
wrong thing for Python.  However, I've long since been overruled, and
I have no evidence to justify saying "I told you so". <wink/>


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-23 Thread Stephen J. Turnbull
Terry Jan Reedy writes:

.transform should be explicit and always take two args, no implicit 
  defaults, the 'from' form and the 'to' form. They can be labelled by 
  position in the natural order (from, to)

Not natural to escaped-from-C programmers, though.  I hesitate to say
make it keywords-only, but using keywords should be *strongly*
encouraged.

  str.transform would always be unicode to unicode and bytes.transform 
  always bytes to bytes.

Which leaves the salient cases (MIME content transfer encodings) out
in the cold, although I guess

string.encode('ascii').transform(from='base64', to='bytes')

isn't too horrible.


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Victor Stinner
Hi,

Your question has been discussed for 4 years in the following issue:
http://bugs.python.org/issue7475

The last proposition is to add transform() and untransform() methods
to bytes and str types. But nobody implemented the idea. If I remember
correctly, the missing point is how to define which types are
supported by a codec (ex: only bytes for bz2 codec, bytes and str for
rot13).

Victor

2013/4/22 Ram Rachum r...@rachum.com:
 Hi everyone,

 Take a look at this question:

 http://stackoverflow.com/questions/16122435/python-3-how-do-i-use-bytes-to-bytes-and-string-to-string-encodings/16122472?noredirect=1#comment23034787_16122472

 Is there really no way to use base64 that's as short as:

 b'whatever'.encode('base64')

 Because doing this:

 import codecs
 codecs.decode(b"whatever", "base64_codec")

 Or this:

 import base64
 encoded = base64.b64encode(b'whatever')

 Is cumbersome!

 Why can't I do something like b'whatever'.encode('base64')? Or maybe using a
 different method than `encode`?


 Thanks,
 Ram.



Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Calvin Spealman
if two lines is cumbersome, you're in for a cumbersome life as a programmer.
On Apr 22, 2013 7:31 AM, Ram Rachum r...@rachum.com wrote:

 Hi everyone,

 Take a look at this question:


 http://stackoverflow.com/questions/16122435/python-3-how-do-i-use-bytes-to-bytes-and-string-to-string-encodings/16122472?noredirect=1#comment23034787_16122472

 Is there really no way to use base64 that's as short as:

 b'whatever'.encode('base64')

 Because doing this:

 import codecs
 codecs.decode(b"whatever", "base64_codec")

 Or this:

 import base64
 encoded = base64.b64encode(b'whatever')

 Is cumbersome!

 Why can't I do something like b'whatever'.encode('base64')? Or maybe using
 a different method than `encode`?


 Thanks,
 Ram.





Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Paul Moore
On 22 April 2013 12:39, Calvin Spealman ironfro...@gmail.com wrote:

 if two lines is cumbersome, you're in for a cumbersome life as a programmer.


One of which is essentially Python's equivalent of a declaration...
Paul


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Devin Jeanpierre
On Mon, Apr 22, 2013 at 7:39 AM, Calvin Spealman ironfro...@gmail.com wrote:
 if two lines is cumbersome, you're in for a cumbersome life as a programmer.

Other encodings are either missing completely from the stdlib, or have
corrupted behavior. For example, string_escape is gone, and
unicode_escape doesn't make any sense anymore -- python code is text,
not bytes, so why does 'abc'.encode('unicode_escape') return bytes? I
don't think this change was thought through completely before it was
implemented.

I agree base64 is a bad place to pick at the encode/decode changes, though. :(

-- Devin


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Oleg Broytman
On Mon, Apr 22, 2013 at 09:50:14AM -0400, Devin Jeanpierre 
jeanpierr...@gmail.com wrote:
 unicode_escape doesn't make any sense anymore -- python code is text,
 not bytes, so why does 'abc'.encode('unicode_escape') return bytes?

   AFAIU the situation is simple: unicode.encode(encoding) returns
bytes, bytes.decode(encoding) returns unicode, and neither
unicode.decode() nor bytes.encode() exist.
   Transformations like base64 and bz2 are not encoding/decoding -- they
are bytes/bytes or unicode/unicode transformations.
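
   The split described above can be checked directly in Python 3. This is
a minimal sketch; it assumes the bytes-to-bytes and text-to-text codecs are
reachable through the codecs module functions under the names
"base64_codec" and "rot_13", as in current Python 3 releases.

```python
import codecs

text = "abc"

# str.encode() goes from text to bytes, bytes.decode() goes back.
raw = text.encode("utf-8")
roundtrip = raw.decode("utf-8")

# The opposite-direction methods do not exist in Python 3.
has_str_decode = hasattr(str, "decode")      # False
has_bytes_encode = hasattr(bytes, "encode")  # False

# bytes -> bytes transformation, via the codecs module functions.
b64 = codecs.encode(b"whatever", "base64_codec")
plain = codecs.decode(b64, "base64_codec")

# str -> str transformation.
rot = codecs.encode(text, "rot_13")
```

The methods keep a fixed type signature, while the module-level functions
accept whatever types the chosen codec handles.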

Oleg.
-- 
 Oleg Broytman            http://phdru.name/            p...@phdru.name
   Programmers don't die, they just GOSUB without RETURN.


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Stephen J. Turnbull
Devin Jeanpierre writes:

  why does 'abc'.encode('unicode_escape') return bytes?

Duck-typing: encode always turns unicode into bytes.



Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread R. David Murray
On Mon, 22 Apr 2013 09:50:14 -0400, Devin Jeanpierre jeanpierr...@gmail.com 
wrote:
 On Mon, Apr 22, 2013 at 7:39 AM, Calvin Spealman ironfro...@gmail.com wrote:
  if two lines is cumbersome, you're in for a cumbersome life as a programmer.
 
 Other encodings are either missing completely from the stdlib, or have
 corrupted behavior. For example, string_escape is gone, and
 unicode_escape doesn't make any sense anymore -- python code is text,
 not bytes, so why does 'abc'.encode('unicode_escape') return bytes? I
 don't think this change was thought through completely before it was
 implemented.

We use unicode_escape (actually raw_unicode_escape) in the email package,
and there we are converting between string and bytes.  It is used as an
encoder when we are supposed to have ASCII input but have other stuff,
and need ASCII output and don't want to lose information.  So yes,
that encoder does still make sense.  It would also be useful as a
transform function, but as someone has pointed out there's an issue
for that.
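
As a rough illustration of that kind of use (not the email package's
actual code), the sketch below uses the plain unicode_escape codec, which
always produces ASCII bytes; the sample string is an assumption chosen
for the example.

```python
# Text that is "supposed" to be ASCII but contains a euro sign.
text = "price: 5\u20ac"

# Encoding keeps the information as an ASCII-only escape sequence.
escaped = text.encode("unicode_escape")
is_ascii = all(byte < 128 for byte in escaped)

# Decoding with the same codec restores the original text.
restored = escaped.decode("unicode_escape")
```

Nothing is lost in the round trip, which is the property that matters
when the output channel only accepts ASCII.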

--David


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Greg Ewing

Victor Stinner wrote:

The last proposition is to add transform() and untransform() methods
to bytes and str types. ... If I remember
correctly, the missing point is how to define which types are
supported by a codec


Also, for any given codec, which direction is transform
and which is untransform?

Also also, what's so special about base64 et al that they
deserve an ultra-special way of invoking them, instead of
having to import a class or function like you do for
*every* *other* piece of library functionality?

--
Greg


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread R. David Murray
On Tue, 23 Apr 2013 11:16:20 +1200, Greg Ewing greg.ew...@canterbury.ac.nz 
wrote:
 Victor Stinner wrote:
  The last proposition is to add transform() and untransform() methods
  to bytes and str types. ... If I remember
  correctly, the missing point is how to define which types are
  supported by a codec
 
 Also, for any given codec, which direction is transform
 and which is untransform?

You transform *into* the encoding, and untransform *out* of the encoding.
Do you have an example where that would be ambiguous?
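
A minimal sketch of that direction convention, written as free functions
because the proposed methods do not exist: transform() and untransform()
here are hypothetical helpers that simply wrap codecs.encode() and
codecs.decode().

```python
import codecs


def transform(data, codec):
    """Hypothetical helper: go *into* the named encoding."""
    return codecs.encode(data, codec)


def untransform(data, codec):
    """Hypothetical helper: come back *out* of the named encoding."""
    return codecs.decode(data, codec)


hexed = transform(b"hi", "hex_codec")       # into hex
original = untransform(hexed, "hex_codec")  # back out of hex

b64 = transform(b"whatever", "base64_codec")
unb64 = untransform(b64, "base64_codec")
```

Under this reading, the codec name always identifies the encoded side,
so the direction follows from the verb alone.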

 Also also, what's so special about base64 et al that they
 deserve an ultra-special way of invoking them, instead of
 having to import a class or function like you do for
 *every* *other* piece of library functionality?

You can ask the same question about all the other codecs.  (And that
question has indeed been asked in the past.)

(One answer is that they used to work in Python2...but the longer we go
without restoring the functionality to Python3, the weaker that particular
argument becomes.)

--David


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Guido van Rossum
--Guido van Rossum (sent from Android phone)
On Apr 22, 2013 6:09 PM, R. David Murray rdmur...@bitdance.com wrote:

 On Tue, 23 Apr 2013 11:16:20 +1200, Greg Ewing 
greg.ew...@canterbury.ac.nz wrote:
  Victor Stinner wrote:
   The last proposition is to add transform() and untransform() methods
   to bytes and str types. ... If I remember
   correctly, the missing point is how to define which types are
   supported by a codec
 
  Also, for any given codec, which direction is transform
  and which is untransform?

 You transform *into* the encoding, and untransform *out* of the encoding.
 Do you have an example where that would be ambiguous?

  Also also, what's so special about base64 et al that they
  deserve an ultra-special way of invoking them, instead of
  having to import a class or function like you do for
  *every* *other* piece of library functionality?

 You can ask the same question about all the other codecs.  (And that
 question has indeed been asked in the past.)

Except for rot13. :-)

 (One answer is that they used to work in Python2...but the longer we go
 without restoring the functionality to Python3, the weaker that particular
 argument becomes.)

 --David


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Steven D'Aprano

On 23/04/13 09:16, Greg Ewing wrote:

Victor Stinner wrote:

The last proposition is to add transform() and untransform() methods
to bytes and str types. ... If I remember
correctly, the missing point is how to define which types are
supported by a codec


Also, for any given codec, which direction is transform
and which is untransform?

Also also, what's so special about base64 et al that they
deserve an ultra-special way of invoking them, instead of
having to import a class or function like you do for
*every* *other* piece of library functionality?



As others have pointed out in the past, repeatedly, the codec system is completely general and can 
transform bytes->bytes and text->text just as easily as bytes<->text. Or indeed any 
bijection, as the docs for 2.7 point out. The question isn't "What's so special about 
base64?" The questions should be:

- What's so special about exotic legacy transformations like ISO-8859-10 and 
MacRoman that they deserve a string method for invoking them?

- Why have common transformations like base64, which worked in 2.x, been 
relegated to second-class status in 3.x?

- If it is no burden to have to import a module and call an external function 
for some transformations, why have encode and decode methods at all?


If you haven't read this, you should:

http://lucumr.pocoo.org/2012/8/11/codec-confusion/




--
Steven


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Donald Stufft

On Apr 22, 2013, at 10:04 PM, Steven D'Aprano st...@pearwood.info wrote:

 On 23/04/13 09:16, Greg Ewing wrote:
 Victor Stinner wrote:
 The last proposition is to add transform() and untransform() methods
 to bytes and str types. ... If I remember
 correctly, the missing point is how to define which types are
 supported by a codec
 
 Also, for any given codec, which direction is transform
 and which is untransform?
 
 Also also, what's so special about base64 et al that they
 deserve an ultra-special way of invoking them, instead of
 having to import a class or function like you do for
 *every* *other* piece of library functionality?
 
 
 As others have pointed out in the past, repeatedly, the codec system is 
 completely general and can transform bytes->bytes and text->text just as 
 easily as bytes<->text. Or indeed any bijection, as the docs for 2.7 point 
 out. The question isn't "What's so special about base64?" The questions 
 should be:
 
 - What's so special about exotic legacy transformations like ISO-8859-10 and 
 MacRoman that they deserve a string method for invoking them?
 
 - Why have common transformations like base64, which worked in 2.x, been 
 relegated to second-class status in 3.x?
 
 - If it is no burden to have to import a module and call an external function 
 for some transformations, why have encode and decode methods at all?
 
 
 If you haven't read this, you should:
 
 http://lucumr.pocoo.org/2012/8/11/codec-confusion/

I may be dull, but it wasn't until I started using Python 3 that it really 
clicked in my head what encode/decode did exactly. In Python2 I just sort of 
sprinkled one or the other when there were errors until the pain stopped. I 
mostly attribute this to str.decode and bytes.encode not existing.

 
 
 
 
 -- 
 Steven


-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA





Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Guido van Rossum
On Mon, Apr 22, 2013 at 7:04 PM, Steven D'Aprano st...@pearwood.info wrote:
 As others have pointed out in the past, repeatedly, the codec system is
 completely general and can transform bytes->bytes and text->text just as
 easily as bytes<->text. Or indeed any bijection, as the docs for 2.7 point
 out. The question isn't "What's so special about base64?" The questions
 should be:

 - What's so special about exotic legacy transformations like ISO-8859-10 and
 MacRoman that they deserve a string method for invoking them?

 - Why have common transformations like base64, which worked in 2.x, been
 relegated to second-class status in 3.x?

 - If it is no burden to have to import a module and call an external
 function for some transformations, why have encode and decode methods at
 all?

There are good answers to all of these, and your rhetoric is not
appreciated. The special status is for the translation between bytes
and Unicode characters (code points). There are many contexts where a
byte stream is labeled (either separately or in-line) as being encoded
using some specific encoding.
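
To make the labeled-byte-stream case concrete, here is a small sketch.
decode_labeled() is a hypothetical helper, and the sample text and
charset labels are assumptions chosen for illustration.

```python
def decode_labeled(payload: bytes, charset: str) -> str:
    """Turn a byte stream into text using the encoding named by its label."""
    return payload.decode(charset)


text = "caf\u00e9"

# The same characters produce different bytes under different encodings.
latin1_bytes = text.encode("iso-8859-1")
utf8_bytes = text.encode("utf-8")

# With the right label, both byte streams decode to the same text.
from_latin1 = decode_labeled(latin1_bytes, "iso-8859-1")
from_utf8 = decode_labeled(utf8_bytes, "utf-8")

# With the wrong label, the bytes decode to different (garbled) text.
mislabeled = decode_labeled(utf8_bytes, "iso-8859-1")
```

The label is what makes the bytes meaningful as text, which is the
conversion that str.encode() and bytes.decode() are reserved for.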

--
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Lennart Regebro
On Tue, Apr 23, 2013 at 4:04 AM, Steven D'Aprano st...@pearwood.info wrote:
 As others have pointed out in the past, repeatedly, the codec system is
 completely general and can transform bytes->bytes and text->text just as
 easily as bytes<->text.

Yes, but the encode()/decode() methods are not, and the fact that you
now know what goes in and what comes out means that people get far
fewer Decode/EncodeErrors. Which is a good thing.

//Lennart


Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

2013-04-22 Thread Fábio Santos
Using decode() and encode() would break that predictability. But someone
suggested the use of transform() and untransform() instead. That would
clarify that the transformation is bytes <-> bytes and Unicode string <->
Unicode string.
On 23 Apr 2013 05:50, Lennart Regebro rege...@gmail.com wrote:

 On Tue, Apr 23, 2013 at 4:04 AM, Steven D'Aprano st...@pearwood.info
 wrote:
  As others have pointed out in the past, repeatedly, the codec system is
  completely general and can transform bytes->bytes and text->text just as
  easily as bytes<->text.

 Yes, but the encode()/decode() methods are not, and the fact that you
 now know what goes in and what comes out means that people get much
 fewer Decode/EncodeErrors. Which is a good thing.

 //Lennart
