Stephen J. Turnbull wrote:
What really saves the day here is not that common encodings just
don't do that. It's that even in the case where only syntactically
significant bytes in the representation are URL-encoded, they *are*
URL-encoded. As long as the parsing library restricts itself to
On Wed, Sep 22, 2010 at 9:37 AM, Andrew McNamara
andr...@object-craft.com.au wrote:
Yeah, that's the original reasoning that had me leaning towards the
parallel API approach. If I seem to be changing my mind a lot in this
thread it's because I'm genuinely torn between the desire to make it
easier
On Wed, Sep 22, 2010 at 12:59 PM, Stephen J. Turnbull
step...@xemacs.org wrote:
Neil Hodgson writes:
Over time, the set of trail bytes used has expanded - in GB18030
digits are possible although many of the most important characters
for parsing such as ''' #%.?/''' are still safe as
On Tue, Sep 21, 2010 at 3:03 PM, Stephen J. Turnbull step...@xemacs.org wrote:
On the other hand, it is dangerous to provide a polymorphic API which
does that more extensive parsing, because a less than paranoid
programmer will have very likely allowed the parsed components to
escape from the
On 21 September 2010 14:38, Nick Coghlan ncogh...@gmail.com wrote:
On Tue, Sep 21, 2010 at 3:03 PM, Stephen J. Turnbull step...@xemacs.org
wrote:
On the other hand, it is dangerous to provide a polymorphic API which
[...]
Sorry if this is off-topic, but I don't believe I ever saw Stephen's
Nick Coghlan writes:
(Basically, while the issue of programmers assuming 'latin-1' or
'utf-8' or similar ASCII friendly encodings when they shouldn't is
real, I don't believe a polymorphic API here will make things any
*worse* than what would happen with a parallel API)
That depends on
On Sep 21, 2010, at 04:01 PM, Paul Moore wrote:
Sorry if this is off-topic, but I don't believe I ever saw Stephen's
email. I have a feeling that's happened a couple of times recently.
Before I go off trying to work out why gmail is dumping list mails on
me, did anyone else see Stephen's mail via
On Wed, 22 Sep 2010 00:10:01 +0900
Stephen J. Turnbull step...@xemacs.org wrote:
But I don't know whether the web apps programmers will be satisfied
with such a minimal API.
Web app programmers will generally go through a framework, which
handles encoding/decoding for them (already so in
On Mon, Sep 20, 2010 at 6:19 PM, Nick Coghlan ncogh...@gmail.com wrote:
What are the cases you believe will cause new mojibake?
Calling operations like urlsplit on byte sequences in non-ASCII
compatible encodings and operations like urljoin on byte sequences
that are encoded with different
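A minimal sketch of the second case, assuming the bytes support that urllib.parse gained in Python 3.2. The URLs and the Shift JIS fragment are made up for illustration and are not taken from the thread. Joining percent-encoded bytes from differently encoded sources succeeds silently; the mismatch only surfaces when the result is later decoded under a single assumed charset:

```python
from urllib.parse import urljoin, unquote

# Hypothetical inputs: the base path is percent-encoded UTF-8,
# the relative reference is the same character percent-encoded as Shift JIS.
base = b"http://example.com/%E3%81%82/"
relative = b"%82%A0"

# urljoin only splices on ASCII delimiters; it has no way to check charsets.
joined = urljoin(base, relative)

# Decoding the whole URL as UTF-8 garbles the Shift JIS part
# (errors="replace" substitutes U+FFFD for the undecodable bytes).
decoded = unquote(joined.decode("ascii"), encoding="utf-8", errors="replace")
print(joined)
print(decoded)
```

Nothing here is specific to a polymorphic API; a bytes-only parallel function would splice the same two fragments together in the same way.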
On 21 September 2010 16:23, Barry Warsaw ba...@python.org wrote:
On Sep 21, 2010, at 04:01 PM, Paul Moore wrote:
Sorry if this is off-topic, but I don't believe I ever saw Stephen's
email. I have a feeling that's happened a couple of times recently.
Before I go off trying to work out why gmail
On Tue, 2010-09-21 at 23:38 +1000, Nick Coghlan wrote:
And if this turns out to be a disaster in practice:
a) on my head be it; and
b) we still have the option of the DeprecationWarning dance for bytes
inputs to the existing functions and moving to a parallel API
In the case of urllib.parse,
On Wed, Sep 22, 2010 at 1:10 AM, Stephen J. Turnbull step...@xemacs.org wrote:
Nick Coghlan writes:
(Basically, while the issue of programmers assuming 'latin-1' or
'utf-8' or similar ASCII friendly encodings when they shouldn't is
real, I don't believe a polymorphic API here will make
On Wed, Sep 22, 2010 at 1:57 AM, Ian Bicking i...@colorstudy.com wrote:
All this is unrelated to the question, though -- a separate byte-oriented
function won't help any case I can think of. If the programmer is
implementing something like
Ian Bicking:
I think the use case everyone has in mind here is where
you get a URL from one of these sources, and you want to handle it. I have
a hard time imagining the sequence of events that would lead to mojibake.
Naive parsing of a document in bytes couldn't do it, because if you have a
On the other hand, it is dangerous to provide a polymorphic API which
does that more extensive parsing, because a less than paranoid
programmer will have very likely allowed the parsed components to
escape from the context where their encodings can be reliably
determined.  Remember, *it is
Neil Hodgson writes:
Over time, the set of trail bytes used has expanded - in GB18030
digits are possible although many of the most important characters
for parsing such as ''' #%.?/''' are still safe as they may not
be trail bytes in the common double-byte character sets.
That's just
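The trail-byte claim can be checked empirically with Python's bundled codecs. This is only a sketch: the delimiter set is copied from the quoted text (minus the space), and the scan covers just the CJK Unified Ideographs block in three double-byte codecs chosen as examples, so it illustrates the point rather than proving it for every legacy encoding.

```python
# ASCII bytes a URL parser splits on (assumed set, taken from the quote above).
DELIMITERS = set(b"#%.?/")

def trail_bytes(codec, start=0x4E00, stop=0xA000):
    """Collect every non-lead byte produced when encoding a range of characters."""
    seen = set()
    for cp in range(start, stop):
        try:
            seen.update(chr(cp).encode(codec)[1:])
        except UnicodeEncodeError:
            pass  # character not representable in this codec
    return seen

# For each double-byte codec, which delimiters ever show up as a trail byte?
clashes = {c: trail_bytes(c) & DELIMITERS for c in ("shift_jis", "gbk", "big5")}

# Other ASCII bytes are not safe: backslash (0x5C) is a Shift JIS trail byte.
backslash_is_trail = 0x5C in trail_bytes("shift_jis")

# GB18030 four-byte sequences contain ASCII digits (0x30-0x39).
gb18030_digits = [b for b in "\U00010000".encode("gb18030") if 0x30 <= b <= 0x39]

print(clashes, backslash_is_trail, gb18030_digits)
```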
On Mon, Sep 20, 2010 at 2:12 PM, Glyph Lefkowitz
gl...@twistedmatrix.com wrote:
While I don't like the email6 precedent as such (that there would be
different parsed objects, based on whether you started parsing with bytes or
with strings), the idea that when you are working directly with bytes
On Sun, 2010-09-19 at 12:03 +1000, Nick Coghlan wrote:
On Sun, Sep 19, 2010 at 4:18 AM, John Nagle na...@animats.com wrote:
On 9/18/2010 2:29 AM, python-dev-requ...@python.org wrote:
Polymorphic best practices [was: (Not) delaying the 3.2 release]
If you're hung up on this, try
On Mon, Sep 20, 2010 at 10:12 PM, Chris McDonough chr...@plope.com wrote:
urllib.parse.urlparse/urllib.parse.urlsplit will never need to decode
anything when passed bytes input.
Correct. Supporting manipulation of bytes directly is primarily a
speed hack for when an application wants to avoid
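For reference, a short sketch of how bytes input behaves in the urllib.parse that shipped with Python 3.2 and later; the URL is a made-up example. Components come back as bytes with percent-escapes untouched, and mixing str and bytes arguments is rejected rather than coerced:

```python
from urllib.parse import urlsplit

# Bytes in, bytes out: percent-escapes are never interpreted by the splitter.
parts = urlsplit(b"http://example.com/a%20b?q=%E3%81%82#frag")
print(parts.path, parts.query, parts.fragment)

# Mixing a bytes URL with a non-empty str default scheme raises TypeError.
mixed_error = None
try:
    urlsplit(b"http://example.com/", "http")
except TypeError as exc:
    mixed_error = exc
print(type(mixed_error).__name__)
```

Note that the shipped implementation does treat bytes input as ASCII internally, so raw non-ASCII bytes (as opposed to percent-escaped ones) raise UnicodeDecodeError instead of passing through.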
On Mon, 2010-09-20 at 23:23 +1000, Nick Coghlan wrote:
On Mon, Sep 20, 2010 at 10:12 PM, Chris McDonough chr...@plope.com wrote:
urllib.parse.urlparse/urllib.parse.urlsplit will never need to decode
anything when passed bytes input.
Correct. Supporting manipulation of bytes directly is
On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough chr...@plope.com wrote:
Existing APIs save for quote don't really need to deal with charset
encodings at all, at least on any level that Python needs to care about.
The potential already exists to emit garbage which will turn into
mojibake from
On Tue, 2010-09-21 at 07:12 +1000, Nick Coghlan wrote:
On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough chr...@plope.com wrote:
Existing APIs save for quote don't really need to deal with charset
encodings at all, at least on any level that Python needs to care about.
The potential already
On Tue, 2010-09-21 at 08:19 +1000, Nick Coghlan wrote:
On Tue, Sep 21, 2010 at 7:39 AM, Chris McDonough chr...@plope.com wrote:
On Tue, 2010-09-21 at 07:12 +1000, Nick Coghlan wrote:
On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough chr...@plope.com wrote:
Existing APIs save for quote don't
On Tue, Sep 21, 2010 at 7:39 AM, Chris McDonough chr...@plope.com wrote:
On Tue, 2010-09-21 at 07:12 +1000, Nick Coghlan wrote:
On Tue, Sep 21, 2010 at 4:30 AM, Chris McDonough chr...@plope.com wrote:
Existing APIs save for quote don't really need to deal with charset
encodings at all, at
On Sep 18, 2010, at 10:18 PM, Steve Holden wrote:
I could probably be persuaded to merge the APIs, but the email6
precedent suggests to me that separating the APIs better reflects the
mental model we're trying to encourage in programmers manipulating
text (i.e. the difference between the raw
On Sun, Sep 19, 2010 at 4:18 AM, John Nagle na...@animats.com wrote:
On 9/18/2010 2:29 AM, python-dev-requ...@python.org wrote:
Polymorphic best practices [was: (Not) delaying the 3.2 release]
If you're hung up on this, try writing the user-level documentation
first. Your target audience
On 9/18/2010 10:03 PM, Nick Coghlan wrote:
On Sun, Sep 19, 2010 at 4:18 AM, John Nagle na...@animats.com wrote:
On 9/18/2010 2:29 AM, python-dev-requ...@python.org wrote:
Polymorphic best practices [was: (Not) delaying the 3.2 release]
If you're hung up on this, try writing the user-level