On 11.01.2014 03:04, Antoine Pitrou wrote:
> On Fri, 10 Jan 2014 20:53:09 -0500
> "Eric V. Smith" wrote:
>>
>> So, I'm -1 on the PEP. It doesn't address the cases laid out in issue
>> 3982. See for example http://bugs.python.org/issue3982#msg180432 .
I agree.
> Then we might as well not do an
"Jim J. Jewett" writes:
>
> > Steven D'Aprano wrote:
> >> I think that heuristics to guess the encoding have their role to play,
> >> if the caller understands the risks.
>
> Ben Finney wrote:
> > In my opinion, content-type guessing heuristics certainly don't belong
> > in the standard library
On Fri, Jan 10, 2014 at 06:17:02PM +0100, Juraj Sukop wrote:
> As you may know, PDF operates over bytes and an integer or floating-point
> number is written down as-is, for example "100" or "1.23".
I'm sorry, I don't understand what you mean here. I'm honestly not
trying to be difficult, but you
On 11Jan2014 00:43, Juraj Sukop wrote:
> On Fri, Jan 10, 2014 at 11:12 PM, Victor Stinner
> wrote:
> > Why not build "10 0 obj ... stream" and "endstream endobj" in
> > Unicode and then encode to ASCII? Example:
> >
> > data = b''.join((
> > ("%d %d obj ... stream" % (10, 0)).encode('ascii')
To avoid implicit conversion between str and bytes, I propose adding only
limited %-format,
not .format() or .format_map().
"limited %-format" means:
%c accepts an integer or a bytes object of length one.
%r is not supported.
%s accepts only bytes.
%a is the only format that accepts an arbitrary object.
And other fo
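For comparison with the draft rules quoted above, here is a minimal sketch of how bytes %-formatting behaves in Python 3.5 and later, where the feature eventually landed through PEP 461. It is not the proposal in this message: the released behaviour also accepts numeric codes such as %d. All variable names are illustrative.

```python
# Sketch for Python 3.5+ (released PEP 461 behaviour, not the draft quoted above).
header = b"%d %d obj" % (10, 0)        # numeric codes produce ASCII digits
joined = b"%s/%s" % (b"abc", b"def")   # %s takes bytes-like objects
one_byte = b"%c" % 65                  # %c takes an int in range(256) ...
same_byte = b"%c" % b"A"               # ... or a bytes object of length one
ascii_repr = b"%a" % "caf\xe9"         # %a applies ascii() to any object

try:
    b"%s" % "text"                     # str is rejected: no implicit encoding
    str_rejected = False
except TypeError:
    str_rejected = True
```

The TypeError on a str argument is the point both sides of the thread agree on: bytes formatting never encodes text implicitly.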
On 01/10/2014 06:39 PM, Antoine Pitrou wrote:
I know what a network protocol with ill-defined encodings
looks like.
For the record, I've been (and I suspect Eric and some others have also been) talking about well-defined encodings. For
the DBF files that I work with, there is binary, ASCII,
On 01/10/2014 06:39 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 18:28:41 -0800
Ethan Furman wrote:
Is it safe to assume you don't use Python for the use-cases under discussion?
You know, I've done quite a bit of network programming.
No, I didn't, that's why I asked.
I've also done an ex
On Fri, 10 Jan 2014 18:28:41 -0800
Ethan Furman wrote:
>
> Is it safe to assume you don't use Python for the use-cases under discussion?
You know, I've done quite a bit of network programming. I've also done
an experimental port of Twisted to Python 3. I know what a network
protocol with ill-def
On 01/10/2014 06:04 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 20:53:09 -0500
"Eric V. Smith" wrote:
So, I'm -1 on the PEP. It doesn't address the cases laid out in issue
3982. See for example http://bugs.python.org/issue3982#msg180432 .
Then we might as well not do anything, since any at
On Fri, 10 Jan 2014 20:53:09 -0500
"Eric V. Smith" wrote:
>
> So, I'm -1 on the PEP. It doesn't address the cases laid out in issue
> 3982. See for example http://bugs.python.org/issue3982#msg180432 .
Then we might as well not do anything, since any attempt to advance
things is met by stubborn o
On 1/10/2014 8:12 PM, Antoine Pitrou wrote:
> On Fri, 10 Jan 2014 16:23:53 -0800
> Ethan Furman wrote:
>> On 01/08/2014 02:42 PM, Antoine Pitrou wrote:
>>>
>>> With Victor's consent, I overhauled PEP 460 and made the feature set
>>> more restricted and consistent with the bytes/str separation.
>>
On Fri, 10 Jan 2014 16:23:53 -0800
Ethan Furman wrote:
> On 01/08/2014 02:42 PM, Antoine Pitrou wrote:
> >
> > With Victor's consent, I overhauled PEP 460 and made the feature set
> > more restricted and consistent with the bytes/str separation.
>
> From the PEP:
> =
> > Python 3 gen
On 01/08/2014 02:42 PM, Antoine Pitrou wrote:
With Victor's consent, I overhauled PEP 460 and made the feature set
more restricted and consistent with the bytes/str separation.
From the PEP:
=
Python 3 generally mandates that text be stored and manipulated as
unicode (i.e. str ob
On 01/10/2014 03:22 PM, Mark Lawrence wrote:
On 10/01/2014 22:06, Chris Barker wrote:
I'm not so sure -- it could be used (abused?) for that, but I'm
suggesting it be used for mixed ascii-binary data. I don't know that
there IS a "right" way to do that -- at least not an efficient or easy
to re
On Sat, Jan 11, 2014 at 12:49 AM, Antoine Pitrou wrote:
> Also, when you say you've never encountered UTF-16 text in PDFs, it
> sounds like those people who've never encountered any non-ASCII data in
> their programs.
Let me clarify: one does not think in "writing text in Unicode"-terms in
PDF.
On Fri, Jan 10, 2014 at 3:22 PM, Mark Lawrence wrote:
> The correct way is to read the interface specification which tells you
> what should be in the data. Or do people not use interface specifications
> these days, preferring to guess what they've got instead?
>
No one is suggesting guessing (
On Fri, Jan 10, 2014 at 3:40 PM, Juraj Sukop wrote:
> What this all means is that the PDF objects are expressed in ASCII,
> "stream" objects like images and fonts may have a binary part and I never
> saw those UTF-16 strings.
>
hmm -- I wonder if they are out there in the wild, though
> u
On Sat, 11 Jan 2014 00:43:39 +0100
Juraj Sukop wrote:
> Basically, to ".encode('ascii')" every possible
> number is not exactly simple or pretty.
Well it strikes me that the PDF format itself is not exactly simple or
pretty. It might be convenient that Python 2 allows you, in certain
cases, to "i
On Fri, Jan 10, 2014 at 11:12 PM, Victor Stinner
wrote:
>
> Why not build "10 0 obj ... stream" and "endstream endobj" in
> Unicode and then encode to ASCII? Example:
>
> data = b''.join((
> ("%d %d obj ... stream" % (10, 0)).encode('ascii'),
> binary_image_data,
> ("endstream endobj").e
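The approach suggested in this message (format the ASCII parts as str, encode them, then join with the binary payload) can be sketched as below. This is an illustration under stated assumptions, not the code from the thread: the helper name `build_stream_object`, the sample payload, and the reduced dictionary containing only /Length are hypothetical, and the result is not claimed to be a complete, valid PDF object.

```python
# Hypothetical stand-in for real image bytes.
binary_image_data = b"\x89\x00\xff\x10"


def build_stream_object(obj_num, gen_num, payload):
    """Assemble a PDF-like stream object from ASCII text parts and a binary payload."""
    # Format the textual header as str, then encode it explicitly to ASCII.
    header = ("%d %d obj\n<< /Length %d >>\nstream\n"
              % (obj_num, gen_num, len(payload))).encode("ascii")
    footer = "\nendstream\nendobj\n".encode("ascii")
    # Only bytes objects are joined; the payload is never decoded.
    return b"".join((header, payload, footer))


data = build_stream_object(10, 0, binary_image_data)
```

The cost the thread argues about is visible here: every textual fragment needs its own explicit `.encode('ascii')` call before it can be combined with the binary part.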
On Fri, Jan 10, 2014 at 10:52 PM, Chris Barker wrote:
> On Fri, Jan 10, 2014 at 9:17 AM, Juraj Sukop wrote:
>
>> As you may know, PDF operates over bytes and an integer or floating-point
>> number is written down as-is, for example "100" or "1.23".
>>
>
> Just to be clear here -- is PDF specifical
On Fri, 10 Jan 2014 18:14:45 -0500
"Eric V. Smith" wrote:
>
> >> Because embedding the ASCII equivalent of ints and floats in byte streams
> >> is a common operation?
> >
> > Again, if you're representing "ASCII", you're representing text and
> > should use a str object.
>
> Yes, but is there e
On 10/01/2014 22:06, Chris Barker wrote:
On Fri, Jan 10, 2014 at 6:05 AM, Paul Moore wrote:
> Using the 'latin-1' to mean unknown encoding can easily result
> in Mojibake (unreadable text) entering your application with
> dangerous effects on your othe
On 1/10/2014 6:02 PM, Antoine Pitrou wrote:
> On Fri, 10 Jan 2014 14:58:15 -0800
> Ethan Furman wrote:
>> On 01/10/2014 02:42 PM, Antoine Pitrou wrote:
>>> On Fri, 10 Jan 2014 17:33:57 -0500
>>> "Eric V. Smith" wrote:
On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
> On Fri, 10 Jan 2014 12:5
On Fri, Jan 10, 2014 at 6:05 AM, Paul Moore wrote:
> > Using the 'latin-1' to mean unknown encoding can easily result
> > in Mojibake (unreadable text) entering your application with
> > dangerous effects on your other text data.
>
> Agreed. The latin-1 suggestion is purely for people who object
On Fri, 10 Jan 2014 14:58:15 -0800
Ethan Furman wrote:
> On 01/10/2014 02:42 PM, Antoine Pitrou wrote:
> > On Fri, 10 Jan 2014 17:33:57 -0500
> > "Eric V. Smith" wrote:
> >> On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
> >>> On Fri, 10 Jan 2014 12:56:19 -0500
> >>> "Eric V. Smith" wrote:
>
>
On 01/10/2014 02:42 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 17:33:57 -0500
"Eric V. Smith" wrote:
On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
On Fri, 10 Jan 2014 12:56:19 -0500
"Eric V. Smith" wrote:
I agree. I don't see any reason to exclude int and float. See Guido's
messages http:/
On Fri, 10 Jan 2014 17:33:57 -0500
"Eric V. Smith" wrote:
> On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
> > On Fri, 10 Jan 2014 12:56:19 -0500
> > "Eric V. Smith" wrote:
> >>
> >> I agree. I don't see any reason to exclude int and float. See Guido's
> >> messages http://bugs.python.org/issue3982#
On Fri, 10 Jan 2014 17:20:32 -0500
"Eric V. Smith" wrote:
>
> Isn't the point of the PEP to make it easier to port 2.x code to 3.5?
> Is there really existing code like this in 2.x?
No, but so what? The point of the PEP is not to allow arbitrary
Python 2 code to run without modification under
On 1/10/2014 5:29 PM, Antoine Pitrou wrote:
> On Fri, 10 Jan 2014 12:56:19 -0500
> "Eric V. Smith" wrote:
>>
>> I agree. I don't see any reason to exclude int and float. See Guido's
>> messages http://bugs.python.org/issue3982#msg180423 and
>> http://bugs.python.org/issue3982#msg180430 for some ju
On Fri, 10 Jan 2014 12:56:19 -0500
"Eric V. Smith" wrote:
>
> I agree. I don't see any reason to exclude int and float. See Guido's
> messages http://bugs.python.org/issue3982#msg180423 and
> http://bugs.python.org/issue3982#msg180430 for some justification and
> discussion.
If you are represent
On 1/10/2014 5:12 PM, Victor Stinner wrote:
> 2014/1/10 Juraj Sukop :
>> In the case of PDF, the embedding of an image into PDF looks like:
>>
>> 10 0 obj
>> << /Type /XObject
>> /Width 100
>> /Height 100
>> /Alternates 15 0 R
>> /Length 2167
>> >
2014/1/10 Juraj Sukop :
> In the case of PDF, the embedding of an image into PDF looks like:
>
> 10 0 obj
> << /Type /XObject
> /Width 100
> /Height 100
> /Alternates 15 0 R
> /Length 2167
> >>
> stream
> ...binary image data...
> ends
10.01.14 18:27, Baptiste Carvello wrote:
would it make sense to be more general, and allow a "lenient mode",
where all files implicitly opened with the default encoding would also
use the surrogateescape error handler ?
The surrogateescape error handler is compatible only with
ASCII-comp
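As a minimal sketch of what the surrogateescape error handler does, the example below decodes a byte string that is not valid UTF-8 and then recovers the original bytes. The sample file name is hypothetical; the codec and error-handler names are the standard ones in Python 3.

```python
# A latin-1 encoded name; the byte 0xE9 makes it invalid as UTF-8.
raw = b"caf\xe9.txt"

# The undecodable byte is smuggled into the str as the lone surrogate U+DCE9.
text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original byte exactly.
restored = text.encode("utf-8", errors="surrogateescape")

# Without the handler, the strict UTF-8 codec refuses lone surrogates.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```

The resulting str is only safe to pass back to an interface that encodes with surrogateescape again; writing it through a strict codec fails, as the last step shows.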
On Fri, Jan 10, 2014 at 9:17 AM, Juraj Sukop wrote:
> As you may know, PDF operates over bytes and an integer or floating-point
> number is written down as-is, for example "100" or "1.23".
>
Just to be clear here -- is PDF specifically bytes+ascii?
Or could there be some-other-encoding unicode
> Steven D'Aprano wrote:
>> I think that heuristics to guess the encoding have their role to play,
>> if the caller understands the risks.
Ben Finney wrote:
> In my opinion, content-type guessing heuristics certainly don't belong
> in the standard library.
It would be great if there were never
On 10.01.2014 18:56, Eric V. Smith wrote:
> On 1/10/2014 12:17 PM, Juraj Sukop wrote:
>> (Sorry if this messes-up the thread order, it is meant as a reply to the
>> original RFC.)
>>
>> Dear list,
>>
>> newbie here. After much hesitation I decided to put forward a use case
>> which bothers me a
INADA Naoki wrote:
latin1 is OK but is it Pythonic?
Latin is most certainly a Pythonic subject:
http://www.youtube.com/watch?v=IIAdHEwiAy8
--
Greg
On Jan 10, 2014, at 7:35 AM, Nick Coghlan wrote:
> Putting this here because I found out today it's not in any of the
> PEPs and folks have to go digging in mailing list archives to find it.
> I'll add it to my Python 3 Q&A at some point.
>
> The reason Python 3 currently tries to rely on the PO
On 06/01/2014 13:24, Victor Stinner wrote:
Hi,
bytes % args and bytes.format(args) are requested by Mercurial and
Twisted projects. The issue #3982 was stuck because nobody proposed a
complete definition of the "new" features. Here is a try as a PEP.
Apologies if this has already been said, b
On 1/10/2014 12:17 PM, Juraj Sukop wrote:
> (Sorry if this messes-up the thread order, it is meant as a reply to the
> original RFC.)
>
> Dear list,
>
> newbie here. After much hesitation I decided to put forward a use case
> which bothers me about the current proposal. Disclaimer: I happen to
>
10.01.14 14:19, M.-A. Lemburg wrote:
BTW: Perhaps it would be a good idea to backport the
surrogateescape error handler to Python 2.7 to simplify
writing code which works in both Python 2 and 3.
You also should change the UTF-8 codec so that it will reject surrogates
(i.e. u'\ud880'.enco
On Fri, Jan 10, 2014 at 4:35 PM, Nick Coghlan wrote:
> On 10 January 2014 13:32, Lennart Regebro wrote:
>> No, because your environment has a default language. And Python has a
>> default encoding. You only get problems when some file doesn't use the
>> default encoding.
>
> The reason Python 3
(Sorry if this messes-up the thread order, it is meant as a reply to the
original RFC.)
Dear list,
newbie here. After much hesitation I decided to put forward a use case
which bothers me about the current proposal. Disclaimer: I happen to write
a library which is directly influenced by this.
As
ACTIVITY SUMMARY (2014-01-03 - 2014-01-10)
Python tracker at http://bugs.python.org/
To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.
Issues counts and deltas:
open 4409 (+61)
closed 27580 (+42)
total 31989 (+103)
Open issues wi
On 10/01/2014 16:35, Nick Coghlan wrote:
> One idea we're considering for Python 3.5 is to have a report of
> "ascii" on a POSIX OS imply the surrogateescape error handler (at
> least for the standard streams, and perhaps in other contexts), since
> the OS reporting the POSIX/C locale almost ce
Now I feel it is a bad thing to encourage using unicode for binary data with
the latin-1 encoding or the surrogateescape error handler.
Handling binary data in str type using latin-1 is just a hack.
Surrogateescape is just a workaround to keep undecodable bytes in text.
Encouraging binary data in str type wi
Nick Coghlan wrote:
> One idea we're considering for Python 3.5 is to have a report of
> "ascii" on a POSIX OS imply the surrogateescape error handler (at
> least for the standard streams, and perhaps in other contexts), since
> the OS reporting the POSIX/C locale almost certainly indicates a
> co
On 1/10/2014 10:20 AM, Nick Coghlan wrote:
> On 10 January 2014 07:41, Eric V. Smith wrote:
>> I'm not sure how format_map helps in porting from 2 to 3, since it
>> doesn't exist in any version of 2.
>>
>> Although that said, it's no doubt a useful feature, just not useful in
>> code that supports
On 10 January 2014 13:32, Lennart Regebro wrote:
> On Thu, Jan 9, 2014 at 10:06 AM, Kristján Valur Jónsson
> wrote:
>> Do I speak Chinese to my grocer because china is a growing force in the
>> world? Or start every discussion with my children with a negotiation on
>> what language to use?
>
>
On 10 January 2014 07:41, Eric V. Smith wrote:
> I'm not sure how format_map helps in porting from 2 to 3, since it
> doesn't exist in any version of 2.
>
> Although that said, it's no doubt a useful feature, just not useful in
> code that supports both 2 and 3 with a single code base or when port
On 2014-01-10, 12:19 GMT, you wrote:
> Using the 'latin-1' to mean unknown encoding can easily result
> in Mojibake (unreadable text) entering your application with
> dangerous effects on your other text data.
>
> E.g. "Marc-André" read using 'latin-1'
On 10 January 2014 12:19, M.-A. Lemburg wrote:
> Just a word of caution:
>
> Using the 'latin-1' to mean unknown encoding can easily result
> in Mojibake (unreadable text) entering your application with
> dangerous effects on your other text data.
Agreed. The latin-1 suggestion is purely for peop
On 09.01.2014 22:45, Antoine Pitrou wrote:
> On Thu, 9 Jan 2014 13:36:05 -0800
> Chris Barker wrote:
>>
>> Some folks have suggested using latin-1 (or other 8-bit encoding) -- is
>> that guaranteed to work with any binary data, and round-trip accurately?
>
> Yes, it is.
Just a word of caution:
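Both halves of this exchange can be checked in a few lines: latin-1 maps every byte value to exactly one code point, so arbitrary binary data round-trips, yet text in another encoding read as latin-1 turns into mojibake. This is a minimal sketch; the variable names are illustrative.

```python
# Every possible byte value survives a latin-1 decode/encode round trip.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_trip = as_text.encode("latin-1")

# UTF-8 text misread as latin-1: the two bytes of 'é' become two characters.
utf8_name = "Marc-André".encode("utf-8")
mojibake = utf8_name.decode("latin-1")   # 'Marc-AndrÃ©'

# The damage is reversible only if the wrong guess is known and undone.
repaired = mojibake.encode("latin-1").decode("utf-8")
```

The round trip is lossless at the byte level, which is why the answer to the question is "yes"; the caution concerns what happens once the mis-decoded str is mixed with correctly decoded text.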
On Fri, 10 Jan 2014 11:32:05 +1000
Nick Coghlan wrote:
> >
> > It's consistent with bytearray.join's behaviour:
> >
> > >>> x = bytearray()
> > >>> x.join([b"abc"])
> > bytearray(b'abc')
> > >>> x
> > bytearray(b'')
>
> Yeah, I guess I'm OK with us being consistent on that one. It's still
> weird