On Fri, 10 Jan 2014 11:32:05 +1000
Nick Coghlan ncogh...@gmail.com wrote:
It's consistent with bytearray.join's behaviour:
>>> x = bytearray()
>>> x.join([b"abc"])
bytearray(b'abc')
>>> x
bytearray(b'')
Yeah, I guess I'm OK with us being consistent on that one. It's still
weird, but also
On 09.01.2014 22:45, Antoine Pitrou wrote:
On Thu, 9 Jan 2014 13:36:05 -0800
Chris Barker chris.bar...@noaa.gov wrote:
Some folks have suggested using latin-1 (or other 8-bit encoding) -- is
that guaranteed to work with any binary data, and round-trip accurately?
Yes, it is.
Just a word
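The round-trip claim above can be checked directly. A minimal sketch, assuming Python 3; the variable names are illustrative and not from the thread:

```python
# latin-1 maps each of the 256 byte values to the code point with the
# same number, so decoding and re-encoding arbitrary bytes is lossless.
all_bytes = bytes(range(256))

as_text = all_bytes.decode('latin-1')
round_tripped = as_text.encode('latin-1')

# Every byte value survives the trip unchanged.
assert round_tripped == all_bytes
# Each character's code point equals the original byte value.
assert [ord(ch) for ch in as_text] == list(range(256))
```

This only shows that no bytes are lost; it says nothing about whether the resulting str is meaningful text.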
On 10 January 2014 12:19, M.-A. Lemburg m...@egenix.com wrote:
Just a word of caution:
Using the 'latin-1' to mean unknown encoding can easily result
in Mojibake (unreadable text) entering your application with
dangerous effects on your other text data.
Agreed. The latin-1 suggestion is
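The Mojibake risk mentioned above can be illustrated with a short sketch, assuming Python 3 and a sample word chosen only for illustration:

```python
original = 'caf\u00e9'                   # 'café'
utf8_bytes = original.encode('utf-8')    # b'caf\xc3\xa9'

# Decoding UTF-8 bytes as latin-1 never raises, but yields the wrong text.
misread = utf8_bytes.decode('latin-1')   # 'cafÃ©'

# Writing the misread text back out as UTF-8 changes the bytes on disk.
rewritten = misread.encode('utf-8')

# The original is recoverable only by undoing the exact wrong step.
recovered = misread.encode('latin-1').decode('utf-8')
```

The decode succeeds silently, which is why the bad text can spread into other data before anyone notices.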
On 2014-01-10, 12:19 GMT, you wrote:
Using the 'latin-1' to mean unknown encoding can easily result
in Mojibake (unreadable text) entering your application with
dangerous effects on your other text data.
E.g. Marc-André read using 'latin-1' if
On 10 January 2014 07:41, Eric V. Smith e...@trueblade.com wrote:
I'm not sure how format_map helps in porting from 2 to 3, since it
doesn't exist in any version of 2.
Although that said, it's no doubt a useful feature, just not useful in
code that supports both 2 and 3 with a single code
On 10 January 2014 13:32, Lennart Regebro rege...@gmail.com wrote:
On Thu, Jan 9, 2014 at 10:06 AM, Kristján Valur Jónsson
krist...@ccpgames.com wrote:
Do I speak Chinese to my grocer because China is a growing force in the
world? Or start every discussion with my children with a negotiation
On 1/10/2014 10:20 AM, Nick Coghlan wrote:
On 10 January 2014 07:41, Eric V. Smith e...@trueblade.com wrote:
I'm not sure how format_map helps in porting from 2 to 3, since it
doesn't exist in any version of 2.
Although that said, it's no doubt a useful feature, just not useful in
code that
Nick Coghlan ncogh...@gmail.com wrote:
One idea we're considering for Python 3.5 is to have a report of
ascii on a POSIX OS imply the surrogateescape error handler (at
least for the standard streams, and perhaps in other contexts), since
the OS reporting the POSIX/C locale almost certainly
I now feel it is a bad thing to encourage using unicode for binary data via
the latin-1 encoding or the surrogateescape error handler.
Handling binary data in the str type using latin-1 is just a hack.
Surrogateescape is just a workaround to keep undecodable bytes in text.
Encouraging binary data in str type
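For readers unfamiliar with the handler being criticised, here is a minimal sketch of what surrogateescape does, assuming Python 3; the sample bytes are made up:

```python
raw = b'abc\xff\x80'   # two bytes that are not valid ASCII

# Undecodable bytes 0x80-0xFF are smuggled into the str as lone
# surrogates U+DC80-U+DCFF instead of raising UnicodeDecodeError.
text = raw.decode('ascii', errors='surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
restored = text.encode('ascii', errors='surrogateescape')
```

The resulting str carries code points that are not real characters, which is the "workaround" nature the message objects to.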
Le 10/01/2014 16:35, Nick Coghlan a écrit :
One idea we're considering for Python 3.5 is to have a report of
ascii on a POSIX OS imply the surrogateescape error handler (at
least for the standard streams, and perhaps in other contexts), since
the OS reporting the POSIX/C locale almost
ACTIVITY SUMMARY (2014-01-03 - 2014-01-10)
Python tracker at http://bugs.python.org/
To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.
Issues counts and deltas:
open    4409 (+61)
closed 27580 (+42)
total  31989 (+103)
Open issues
(Sorry if this messes-up the thread order, it is meant as a reply to the
original RFC.)
Dear list,
newbie here. After much hesitation I decided to put forward a use case
which bothers me about the current proposal. Disclaimer: I happen to write
a library which is directly influenced by this.
As
On Fri, Jan 10, 2014 at 4:35 PM, Nick Coghlan ncogh...@gmail.com wrote:
On 10 January 2014 13:32, Lennart Regebro rege...@gmail.com wrote:
No, because your environment has a default language. And Python has a
default encoding. You only get problems when some file doesn't use the
default
10.01.14 14:19, M.-A. Lemburg написав(ла):
BTW: Perhaps it would be a good idea to backport the
surrogateescape error handler to Python 2.7 to simplify
writing code which works in both Python 2 and 3.
You also should change the UTF-8 codec so that it will reject surrogates
(i.e.
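As a point of reference, Python 3's own UTF-8 codec already behaves this way. A small sketch, assuming Python 3; the flag name is illustrative:

```python
lone_surrogate = '\udc80'

# The strict UTF-8 codec refuses to encode a lone surrogate.
try:
    lone_surrogate.encode('utf-8')
    rejected = False
except UnicodeEncodeError:
    rejected = True

# With surrogateescape, the surrogate is turned back into the raw byte.
escaped = lone_surrogate.encode('utf-8', errors='surrogateescape')
```

Without that rejection, escaped bytes could leak into output as invalid UTF-8 sequences.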
On 1/10/2014 12:17 PM, Juraj Sukop wrote:
(Sorry if this messes-up the thread order, it is meant as a reply to the
original RFC.)
Dear list,
newbie here. After much hesitation I decided to put forward a use case
which bothers me about the current proposal. Disclaimer: I happen to
write a
On 06/01/2014 13:24, Victor Stinner wrote:
Hi,
bytes % args and bytes.format(args) are requested by Mercurial and
Twisted projects. The issue #3982 was stuck because nobody proposed a
complete definition of the new features. Here is a try as a PEP.
Apologies if this has already been said,
On Jan 10, 2014, at 7:35 AM, Nick Coghlan wrote:
Putting this here because I found out today it's not in any of the
PEPs and folks have to go digging in mailing list archives to find it.
I'll add it to my Python 3 Q&A at some point.
The reason Python 3 currently tries to rely on the POSIX
INADA Naoki wrote:
latin1 is OK but is it Pythonic?
Latin is most certainly a Pythonic subject:
http://www.youtube.com/watch?v=IIAdHEwiAy8
--
Greg
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Am 10.01.2014 18:56, schrieb Eric V. Smith:
On 1/10/2014 12:17 PM, Juraj Sukop wrote:
(Sorry if this messes-up the thread order, it is meant as a reply to the
original RFC.)
Dear list,
newbie here. After much hesitation I decided to put forward a use case
which bothers me about the
Steven D'Aprano wrote:
I think that heuristics to guess the encoding have their role to play,
if the caller understands the risks.
Ben Finney wrote:
In my opinion, content-type guessing heuristics certainly don't belong
in the standard library.
It would be great if there were never any
On Fri, Jan 10, 2014 at 9:17 AM, Juraj Sukop juraj.su...@gmail.com wrote:
As you may know, PDF operates over bytes and an integer or floating-point
number is written down as-is, for example 100 or 1.23.
Just to be clear here -- is PDF specifically bytes+ascii?
Or could there be
10.01.14 18:27, Baptiste Carvello написав(ла):
would it make sense to be more general, and allow a lenient mode,
where all files implicitly opened with the default encoding would also
use the surrogateescape error handler ?
The surrogateescape error handler is compatible only with
2014/1/10 Juraj Sukop juraj.su...@gmail.com:
In the case of PDF, the embedding of an image into PDF looks like:
10 0 obj
/Type /XObject
/Width 100
/Height 100
/Alternates 15 0 R
/Length 2167
stream
...binary image data...
On 1/10/2014 5:12 PM, Victor Stinner wrote:
2014/1/10 Juraj Sukop juraj.su...@gmail.com:
In the case of PDF, the embedding of an image into PDF looks like:
10 0 obj
/Type /XObject
/Width 100
/Height 100
/Alternates 15 0 R
/Length 2167
On Fri, 10 Jan 2014 12:56:19 -0500
Eric V. Smith e...@trueblade.com wrote:
I agree. I don't see any reason to exclude int and float. See Guido's
messages http://bugs.python.org/issue3982#msg180423 and
http://bugs.python.org/issue3982#msg180430 for some justification and
discussion.
If you
On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 12:56:19 -0500
Eric V. Smith e...@trueblade.com wrote:
I agree. I don't see any reason to exclude int and float. See Guido's
messages http://bugs.python.org/issue3982#msg180423 and
http://bugs.python.org/issue3982#msg180430 for
On Fri, 10 Jan 2014 17:20:32 -0500
Eric V. Smith e...@trueblade.com wrote:
Isn't the point of the PEP to make it easier to port 2.x code to 3.5?
Is there really existing code like this in 2.x?
No, but so what? The point of the PEP is not to allow arbitrary
Python 2 code to run without
On Fri, 10 Jan 2014 17:33:57 -0500
Eric V. Smith e...@trueblade.com wrote:
On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 12:56:19 -0500
Eric V. Smith e...@trueblade.com wrote:
I agree. I don't see any reason to exclude int and float. See Guido's
messages
On 01/10/2014 02:42 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 17:33:57 -0500
Eric V. Smith e...@trueblade.com wrote:
On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 12:56:19 -0500
Eric V. Smith e...@trueblade.com wrote:
I agree. I don't see any reason to exclude int and
On Fri, 10 Jan 2014 14:58:15 -0800
Ethan Furman et...@stoneleaf.us wrote:
On 01/10/2014 02:42 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 17:33:57 -0500
Eric V. Smith e...@trueblade.com wrote:
On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 12:56:19 -0500
Eric V. Smith
On Fri, Jan 10, 2014 at 6:05 AM, Paul Moore p.f.mo...@gmail.com wrote:
Using the 'latin-1' to mean unknown encoding can easily result
in Mojibake (unreadable text) entering your application with
dangerous effects on your other text data.
Agreed. The latin-1 suggestion is purely for people
On 10/01/2014 22:06, Chris Barker wrote:
On Fri, Jan 10, 2014 at 6:05 AM, Paul Moore p.f.mo...@gmail.com wrote:
Using the 'latin-1' to mean unknown encoding can easily result
in Mojibake (unreadable text) entering your application with
dangerous
On Fri, 10 Jan 2014 18:14:45 -0500
Eric V. Smith e...@trueblade.com wrote:
Because embedding the ASCII equivalent of ints and floats in byte streams
is a common operation?
Again, if you're representing ASCII, you're representing text and
should use a str object.
Yes, but is there
On Fri, Jan 10, 2014 at 10:52 PM, Chris Barker chris.bar...@noaa.gov wrote:
On Fri, Jan 10, 2014 at 9:17 AM, Juraj Sukop juraj.su...@gmail.com wrote:
As you may know, PDF operates over bytes and an integer or floating-point
number is written down as-is, for example 100 or 1.23.
Just to be
On Fri, Jan 10, 2014 at 11:12 PM, Victor Stinner
victor.stin...@gmail.com wrote:
Why not build "10 0 obj ... stream" and "endstream endobj" in
Unicode and then encode to ASCII? Example:
data = b''.join((
("%d %d obj ... stream" % (10, 0)).encode('ascii'),
binary_image_data,
("endstream
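The approach suggested above can be written out as a complete, runnable sketch. This assumes Python 3; the image bytes and the dictionary contents are hypothetical stand-ins, and the result is not claimed to be a valid PDF object:

```python
# Hypothetical stand-in for real image bytes.
binary_image_data = bytes([0, 255, 128, 10])

# Build the textual parts as str, then encode each one to ASCII.
header = '%d %d obj\n<< /Length %d >>\nstream\n' % (
    10, 0, len(binary_image_data))
footer = '\nendstream\nendobj\n'

data = b''.join((
    header.encode('ascii'),
    binary_image_data,        # binary payload is passed through untouched
    footer.encode('ascii'),
))
```

Every textual fragment needs its own .encode('ascii') call, which is the verbosity the PDF use case complains about.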
On Sat, 11 Jan 2014 00:43:39 +0100
Juraj Sukop juraj.su...@gmail.com wrote:
Basically, to .encode('ascii') every possible
number is not exactly simple or pretty.
Well it strikes me that the PDF format itself is not exactly simple or
pretty. It might be convenient that Python 2 allows you, in
On Fri, Jan 10, 2014 at 3:40 PM, Juraj Sukop juraj.su...@gmail.com wrote:
What this all means is that the PDF objects are expressed in ASCII,
stream objects like images and fonts may have a binary part and I never
saw those UTF-16 strings.
hmm -- I wonder if they are out there in the wild,
On Fri, Jan 10, 2014 at 3:22 PM, Mark Lawrence breamore...@yahoo.co.uk wrote:
The correct way is to read the interface specification which tells you
what should be in the data. Or do people not use interface specifications
these days, preferring to guess what they've got instead?
No one is
On Sat, Jan 11, 2014 at 12:49 AM, Antoine Pitrou solip...@pitrou.net wrote:
Also, when you say you've never encountered UTF-16 text in PDFs, it
sounds like those people who've never encountered any non-ASCII data in
their programs.
Let me clarify: one does not think in writing text in
On 01/10/2014 03:22 PM, Mark Lawrence wrote:
On 10/01/2014 22:06, Chris Barker wrote:
I'm not so sure -- it could be used (abused?) for that, but I'm
suggesting it be used for mixed ascii-binary data. I don't know that
there IS a right way to do that -- at least not an efficient or easy
to
On 01/08/2014 02:42 PM, Antoine Pitrou wrote:
With Victor's consent, I overhauled PEP 460 and made the feature set
more restricted and consistent with the bytes/str separation.
From the PEP:
=
Python 3 generally mandates that text be stored and manipulated as
unicode (i.e. str
On Fri, 10 Jan 2014 16:23:53 -0800
Ethan Furman et...@stoneleaf.us wrote:
On 01/08/2014 02:42 PM, Antoine Pitrou wrote:
With Victor's consent, I overhauled PEP 460 and made the feature set
more restricted and consistent with the bytes/str separation.
From the PEP:
=
On 1/10/2014 8:12 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 16:23:53 -0800
Ethan Furman et...@stoneleaf.us wrote:
On 01/08/2014 02:42 PM, Antoine Pitrou wrote:
With Victor's consent, I overhauled PEP 460 and made the feature set
more restricted and consistent with the bytes/str
On Fri, 10 Jan 2014 20:53:09 -0500
Eric V. Smith e...@trueblade.com wrote:
So, I'm -1 on the PEP. It doesn't address the cases laid out in issue
3982. See for example http://bugs.python.org/issue3982#msg180432 .
Then we might as well not do anything, since any attempt to advance
things is met
On 01/10/2014 06:04 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 20:53:09 -0500
Eric V. Smith e...@trueblade.com wrote:
So, I'm -1 on the PEP. It doesn't address the cases laid out in issue
3982. See for example http://bugs.python.org/issue3982#msg180432 .
Then we might as well not do
On Fri, 10 Jan 2014 18:28:41 -0800
Ethan Furman et...@stoneleaf.us wrote:
Is it safe to assume you don't use Python for the use-cases under discussion?
You know, I've done quite a bit of network programming. I've also done
an experimental port of Twisted to Python 3. I know what a network
To avoid implicit conversion between str and bytes, I propose adding only
a limited %-format, not .format() or .format_map().
Limited %-format means:
%c accepts an integer or a bytes object of length one.
%r is not supported.
%s accepts only bytes.
%a is the only format that accepts an arbitrary object.
And other
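For comparison, bytes %-formatting as it later shipped in Python 3.5 behaves much like this list for %c, %s and %a. The sketch below assumes Python 3.5 or newer and shows the shipped behaviour, not this proposal itself; the variable names are illustrative:

```python
# %c takes an integer in range(256) or a bytes object of length one.
from_int = b'%c' % 65
from_byte = b'%c' % b'A'

# %s interpolates bytes; passing a str raises TypeError instead of
# encoding implicitly.
joined = b'%s/%s' % (b'usr', b'bin')
try:
    b'%s' % 'text'
    str_accepted = True
except TypeError:
    str_accepted = False

# %a applies ascii() to any object and embeds the ASCII result.
ascii_repr = b'%a' % ('caf\u00e9',)

# Numeric codes such as %d work on ints directly.
pdf_header = b'%d %d obj' % (10, 0)
```

The key design point is the same as in the proposal: nothing is ever encoded implicitly from str to bytes.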
On 01/10/2014 06:39 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 18:28:41 -0800
Ethan Furman wrote:
Is it safe to assume you don't use Python for the use-cases under discussion?
You know, I've done quite a bit of network programming.
No, I didn't, that's why I asked.
I've also done an
On 01/10/2014 06:39 PM, Antoine Pitrou wrote:
I know what a network protocol with ill-defined encodings
looks like.
For the record, I've been (and I suspect Eric and some others have also been) talking about well-defined encodings. For
the DBF files that I work with, there is binary,
On 11Jan2014 00:43, Juraj Sukop juraj.su...@gmail.com wrote:
On Fri, Jan 10, 2014 at 11:12 PM, Victor Stinner
victor.stin...@gmail.com wrote:
Why not build "10 0 obj ... stream" and "endstream endobj" in
Unicode and then encode to ASCII? Example:
data = b''.join((
("%d %d obj ...
On Fri, Jan 10, 2014 at 06:17:02PM +0100, Juraj Sukop wrote:
As you may know, PDF operates over bytes and an integer or floating-point
number is written down as-is, for example 100 or 1.23.
I'm sorry, I don't understand what you mean here. I'm honestly not
trying to be difficult, but you
Jim J. Jewett jimjjew...@gmail.com writes:
Steven D'Aprano wrote:
I think that heuristics to guess the encoding have their role to play,
if the caller understands the risks.
Ben Finney wrote:
In my opinion, content-type guessing heuristics certainly don't belong
in the standard
Am 11.01.2014 03:04, schrieb Antoine Pitrou:
On Fri, 10 Jan 2014 20:53:09 -0500
Eric V. Smith e...@trueblade.com wrote:
So, I'm -1 on the PEP. It doesn't address the cases laid out in issue
3982. See for example http://bugs.python.org/issue3982#msg180432 .
I agree.
Then we might as well