On Mon, Jun 28, 2010 at 6:28 PM, Greg Ewing wrote:
> R. David Murray wrote:
>
>> Having such a poly_str type would probably make my life easier.
>
> A thought on this poly_str type: perhaps it could be
> called "ascii", since that's what it would have to be
> restricted to, and have
>
> a'xxx'
>
On Mon, 28 Jun 2010 13:55:26 +0530, Senthil Kumaran wrote:
> On Mon, Jun 28, 2010 at 08:28:45PM +1200, Greg Ewing wrote:
> > Thinking way outside the square, and probably the pale
> > as well, maybe @ could be pressed into service as an
> > infix operator, with
> >
> > s...@i
> >
> > being equ
On Mon, Jun 28, 2010 at 08:28:45PM +1200, Greg Ewing wrote:
> A thought on this poly_str type: perhaps it could be
> called "ascii", since that's what it would have to be
> restricted to, and have
>
> a'xxx'
>
> as a literal syntax for it, seeing as literals seem to
> be one of its main use cas
R. David Murray wrote:
Having such a poly_str type would probably make my life easier.
A thought on this poly_str type: perhaps it could be
called "ascii", since that's what it would have to be
restricted to, and have
a'xxx'
as a literal syntax for it, seeing as literals seem to
be one of
I've been watching this discussion with intense interest, but have
been so lagged in following the thread that I haven't replied.
I got caught up today
On Sun, 27 Jun 2010 15:53:59 +1000, Nick Coghlan wrote:
> The difference is that we have three classes of algorithm here:
> - those that work
At 03:53 PM 6/27/2010 +1000, Nick Coghlan wrote:
We could talk about this even longer, but the most effective way
forward is going to be a patch that improves the URL parsing
situation.
Certainly, it's the only practical solution for the immediate problems in 3.2.
I only mentioned that I "hate
P.J. Eby writes:
> At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote:
> >What I'm saying here is that if bytes are the signal of validity, and
> >the stdlib functions preserve validity, then it's better to have the
> >stdlib functions object to unicode data as an argument. Compare the
>
On Sat, 26 Jun 2010 23:49:11 -0400
"P.J. Eby" wrote:
>
> Remember, bytes and strings already have to detect mixed-type
> operations.
Not in Python 3. They just raise a TypeError on bad
("mixed-type") arguments.
Regards
Antoine.
___
Python-Dev mail
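Antoine's correction is easy to check in any Python 3 interpreter. The built-in types refuse mixed operations instead of coercing. The sketch below is minimal, and the helper name `try_concat` is made up for illustration:

```python
def try_concat(a, b):
    """Return a + b, or the name of the exception the operation raised."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


same_str = try_concat('abc', 'def')      # 'abcdef'
same_bytes = try_concat(b'abc', b'def')  # b'abcdef'
mixed = try_concat('abc', b'def')        # 'TypeError': no implicit coercion

# Equality does not raise, it is simply False (python -b turns it into a BytesWarning).
equal = ('abc' == b'abc')
```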
On Sun, Jun 27, 2010 at 1:49 PM, P.J. Eby wrote:
> I just hate the idea that functions taking strings should have to be
> *rewritten* to be explicitly type-agnostic. It seems *so* un-Pythonic...
> like if all the bitmasking functions you'd ever written using 32-bit int
> constants had to be rewr
At 12:43 PM 6/27/2010 +1000, Nick Coghlan wrote:
While full support for third party strings and
byte sequence implementations is an interesting idea, I think it's
overkill for the specific problem of making it easier to write
str/bytes agnostic functions for tasks like URL parsing.
OTOH, to wri
On Sun, Jun 27, 2010 at 4:17 AM, P.J. Eby wrote:
> The idea that I'm proposing is that the basic string and byte types should
> defer to "user-defined" string types for mixed type operations, so that
> polymorphism of string-manipulation functions is the *default* case, rather
> than a *special* c
At 12:42 PM 6/26/2010 +0900, Stephen J. Turnbull wrote:
What I'm saying here is that if bytes are the signal of validity, and
the stdlib functions preserve validity, then it's better to have the
stdlib functions object to unicode data as an argument. Compare the
alternative: it returns a unicode
P.J. Eby writes:
> it's just that if you already have the bytes, and all you want to
> do is tag them (e.g. the WSGI headers case), the extra encoding
> step seems pointless.
Well, I'll have to concede that unless and until I get involved in the
WSGI development effort.
> >But with your arch
At 01:18 AM 6/26/2010 +0900, Stephen J. Turnbull wrote:
It seems to me what is wanted here is something like Perl's taint
mechanism, for *both* kinds of strings. Am I missing something?
You could certainly view it as a kind of tainting. The part where
the type would be bytes-based is indeed
Ian Bicking writes:
> I don't get what you are arguing against. Are you worried that if
> we make URL code polymorphic that this will mean some code will
> treat URLs as bytes, and that code will be incompatible with URLs
> as text? No one is arguing we remove text support from any of
> the
P.J. Eby writes:
> I do know the ultimate target codec -- that's the point.
>
> IOW, I want to be able to do to all my operations by passing
> target-encoded strings to polymorphic functions.
IOW, you *do* have text and (ignoring efficiency issues) could just as
well use str. But That Other
On Fri, Jun 25, 2010 at 2:05 AM, Stephen J. Turnbull wrote:
> > But join('x', 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make
> > sense to me.
> >
> > So, actually, I *don't* understand what you mean by needing LBYL.
>
> Consider docutils. Some folks assert that URIs *are* bytes and should
>
At 04:49 PM 6/25/2010 +0900, Stephen J. Turnbull wrote:
P.J. Eby writes:
> This doesn't have to be in the functions; it can be in the
> *types*. Mixed-type string operations have to do type checking and
> upcasting already, but if the protocol were open, you could make an
> encoded-bytes ty
P.J. Eby writes:
> This doesn't have to be in the functions; it can be in the
> *types*. Mixed-type string operations have to do type checking and
> upcasting already, but if the protocol were open, you could make an
> encoded-bytes type that would handle the error checking.
Don't you rea
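The class below makes the "put it in the type" idea concrete. `EncodedBytes` is hypothetical: it is not a stdlib or proposed API, and its name and behavior are assumptions. It only handles `+`, and that works only because Python consults the right operand's `__radd__` before falling back to built-in str/bytes concatenation. There is no comparable hook for `str.join()`, `in`, or the string methods, and that gap is the missing coercion protocol discussed in the thread.

```python
class EncodedBytes:
    """Hypothetical 'bytes plus encoding' type; illustration only, not a real API.

    Mixing it with str encodes the str into the target encoding, so text that
    cannot be represented fails at the point where the two are combined.
    """

    def __init__(self, data: bytes, encoding: str):
        self.data = bytes(data)
        self.encoding = encoding

    def _coerce(self, other):
        """Return `other` as bytes in this object's encoding, or None if unsupported."""
        if isinstance(other, EncodedBytes):
            if other.encoding != self.encoding:
                raise TypeError("cannot mix EncodedBytes with different encodings")
            return other.data
        if isinstance(other, str):
            return other.encode(self.encoding)  # may raise UnicodeEncodeError
        if isinstance(other, bytes):
            return other
        return None

    def __add__(self, other):
        data = self._coerce(other)
        if data is None:
            return NotImplemented
        return EncodedBytes(self.data + data, self.encoding)

    def __radd__(self, other):
        # Reached for `str + EncodedBytes` and `bytes + EncodedBytes`.
        data = self._coerce(other)
        if data is None:
            return NotImplemented
        return EncodedBytes(data + self.data, self.encoding)

    def __bytes__(self):
        return self.data

    def __str__(self):
        return self.data.decode(self.encoding)

    def __repr__(self):
        return f"EncodedBytes({self.data!r}, {self.encoding!r})"


url = EncodedBytes(b'http://example.com/', 'latin-1')
full = url + 'caf\xe9'                        # 'café' fits in latin-1
prefixed = '/' + EncodedBytes(b'x', 'ascii')  # str on the left falls back to __radd__

try:
    url + '\u20ac'                            # the euro sign has no latin-1 byte
    euro_accepted = True
except UnicodeEncodeError:
    euro_accepted = False
```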
Guido van Rossum writes:
> On Thu, Jun 24, 2010 at 1:12 AM, Stephen J. Turnbull
> wrote:
> Understood, but both the majority of str/bytes methods and several
> existing APIs (e.g. many in the os module, like os.listdir()) do it
> this way.
Understood.
> Also, IMO a polymorphic function s
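Guido's os module convention shows up in `os.listdir()`: the type of the argument selects the type of the result. The sketch below demonstrates this using a throwaway temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'example.txt'), 'w'):
        pass  # create an empty file
    text_names = os.listdir(tmp)               # str argument -> list of str
    byte_names = os.listdir(os.fsencode(tmp))  # bytes argument -> list of bytes
```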
On Fri, Jun 25, 2010 at 1:41 AM, Guido van Rossum wrote:
> I don't think we should abuse sum for this. A simple idiom to get the
> *empty* string of a particular type is x[:0] so you could write
> something like this to concatenate a list of strings or bytes:
> xs[:0].join(xs). Note that if xs is
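Guido's idiom relies on slicing to get an empty value of the right type. Because `xs` itself is a list, a list slice has no `join()`, so this sketch applies `[:0]` to the first element instead. An empty list has no element to take the type from, so this version raises `IndexError`. The function name `concat` is hypothetical:

```python
def concat(xs):
    """Concatenate a non-empty list whose items are all str or all bytes."""
    # xs[0][:0] is '' for str items and b'' for bytes items.
    return xs[0][:0].join(xs)


text = concat(['ab', 'cd'])    # 'abcd'
data = concat([b'ab', b'cd'])  # b'abcd'

try:
    concat([])
    empty_ok = True
except IndexError:
    empty_ok = False  # no element to take the type from
```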
On Fri, Jun 25, 2010 at 3:07 AM, P.J. Eby wrote:
> (Btw, in some earlier emails, Stephen, you implied that this could be fixed
> with codecs -- but it can't, because the problem isn't with the bytes
> containing invalid Unicode, it's with the Unicode containing invalid bytes
> -- i.e., characters
At 05:12 PM 6/24/2010 +0900, Stephen J. Turnbull wrote:
Guido van Rossum writes:
> For example: how we can make the suite of functions used for URL
> processing more polymorphic, so that each developer can choose for
> herself how URLs need to be treated in her application.
While you have co
P.J. Eby a écrit :
[...] stdlib constants are almost always ASCII,
and the main use cases for ebytes would involve ascii-extended encodings.)
Then, how about a new "ascii string" literal? This would produce a special kind
of string that would coerce to a normal string when mixed with a str, a
On Thu, Jun 24, 2010 at 8:25 AM, Nick Coghlan wrote:
> On Fri, Jun 25, 2010 at 12:33 AM, Guido van Rossum wrote:
>> Also, IMO a polymorphic function should *not* accept *mixed*
>> bytes/text input -- join('x', b'y') should be rejected. But join('x',
>> 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y'
On Fri, Jun 25, 2010 at 12:33 AM, Guido van Rossum wrote:
> Also, IMO a polymorphic function should *not* accept *mixed*
> bytes/text input -- join('x', b'y') should be rejected. But join('x',
> 'y') -> 'x/y' and join(b'x', b'y') -> b'x/y' make sense to me.
A policy of allowing arguments to be ei
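The sketch below implements the policy Guido states and reuses the `join` name from his example. The function is hypothetical and is not `os.path.join`. Each call returns the type it was given, and mixed str/bytes arguments raise `TypeError`. It deliberately ignores `bytearray` and other buffer types:

```python
def join(first, *rest):
    """Join segments with '/', returning str for str input and bytes for bytes input."""
    if isinstance(first, str):
        kind, sep = str, '/'
    elif isinstance(first, bytes):
        kind, sep = bytes, b'/'
    else:
        raise TypeError(f"expected str or bytes, not {type(first).__name__}")
    if not all(isinstance(part, kind) for part in rest):
        raise TypeError("cannot mix str and bytes arguments")
    return sep.join((first,) + rest)


text_path = join('x', 'y')     # 'x/y'
bytes_path = join(b'x', b'y')  # b'x/y'

try:
    join('x', b'y')
    mixed_ok = True
except TypeError:
    mixed_ok = False
```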
On Thu, Jun 24, 2010 at 1:12 AM, Stephen J. Turnbull wrote:
> Guido van Rossum writes:
>
> > For example: how we can make the suite of functions used for URL
> > processing more polymorphic, so that each developer can choose for
> > herself how URLs need to be treated in her application.
>
> Wh
On 24/06/2010 11:58, M.-A. Lemburg wrote:
Lennart Regebro wrote:
On Tue, Jun 22, 2010 at 20:07, James Y Knight wrote:
Yeah. This is a real issue I have with the direction Python3 went: it pushes
you into decoding everything to unicode early, even when you don't care --
Well,
Lennart Regebro wrote:
> On Tue, Jun 22, 2010 at 20:07, James Y Knight wrote:
>> Yeah. This is a real issue I have with the direction Python3 went: it pushes
>> you into decoding everything to unicode early, even when you don't care --
>
> Well, yes, maybe even if *you* don't care. But often the
On Tue, Jun 22, 2010 at 20:07, James Y Knight wrote:
> Yeah. This is a real issue I have with the direction Python3 went: it pushes
> you into decoding everything to unicode early, even when you don't care --
Well, yes, maybe even if *you* don't care. But often the functions you
need to call must
Guido van Rossum writes:
> For example: how we can make the suite of functions used for URL
> processing more polymorphic, so that each developer can choose for
> herself how URLs need to be treated in her application.
While you have come down on the side of polymorphism (as opposed to
separat
On Wed, Jun 23, 2010 at 11:35:12PM +0200, Antoine Pitrou wrote:
> On Wed, 23 Jun 2010 17:30:22 -0400
> Toshio Kuratomi wrote:
> > Note that this assumption seems optimistic to me. I started talking to
> > Graham
> > Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste
> >
On Wed, 23 Jun 2010 17:30:22 -0400
Toshio Kuratomi wrote:
> Note that this assumption seems optimistic to me. I started talking to Graham
> Dumpleton, author of mod_wsgi a couple years back because mod_wsgi and paste
> do decoding of bytes to unicode at different layers which caused problems
> fo
On Wed, Jun 23, 2010 at 09:36:45PM +0200, Antoine Pitrou wrote:
> On Wed, 23 Jun 2010 14:23:33 -0400
> Tres Seaver wrote:
> > - - the slow adoption / porting rate of major web frameworks and libraries
> > to Python 3.
>
> Some of the major web frameworks and libraries have a ton of
> dependenci
On Wed, 23 Jun 2010 14:23:33 -0400
Tres Seaver wrote:
>
> Perhaps such decisions need revisiting in light of subsequent experience
> / pain / learning. E.g:
>
> - - the repeated inability of the web-sig to converge on appropriate
> semantics for a Python3-compatible version of the WSGI spec;
Bill Janssen wrote:
> The bigger problem seems to be that we're revisiting the design
> discussion about urllib.parse from the summer of 2008. See
> http://bugs.python.org/issue3300 if you want to recall how we hashed
> this out 2 years ago. I didn'
On Jun 22, 2010, at 8:57 PM, Robert Collins wrote:
> bzr has a cache of decoded strings in it precisely because decode is
> slow. We accept slowness encoding to the user's locale because that's
> typically much less data to examine than we've examined while
> generating the commit/diff/whatever. We
Oops, I forgot some important quoting (important for the algorithm,
maybe not actually for the discussion)...
from urllib.parse import urlsplit, urlunsplit
import encodings.idna
# urllib.parse.quote both always returns str, and is not as
# conservative in quoting as required here...
def quote_unsaf
Guido van Rossum wrote:
> So I propose that we drop the discussion "are URLs text or bytes" and
> try to find something more pragmatic to discuss.
>
> For example: how we can make the suite of functions used for URL
> processing more polymorphic, so that each developer can choose for
> herself h
On Wed, Jun 23, 2010 at 10:30 AM, Tres Seaver wrote:
> Stephen J. Turnbull wrote:
>
> > We do need str-based implementations of modules like urllib.
>
>
> Why would that be? URLs aren't text, and never will be. The fact that
> to the eye they may seem to be text-ish doesn't make them text. Th
Tres Seaver wrote:
> Stephen J. Turnbull wrote:
>
> > We do need str-based implementations of modules like urllib.
>
> Why would that be? URLs aren't text, and never will be. The fact that
> to the eye they may seem to be text-ish doesn't make them text. This
URLs are exactly text (strings,
On Jun 23, 2010, at 08:43 AM, Guido van Rossum wrote:
>So I propose that we drop the discussion "are URLs text or bytes" and
>try to find something more pragmatic to discuss.
email has exactly the same question, and the answer is "yes".
>For example: how we can make the suite of functions used
On Wed, Jun 23, 2010 at 8:30 AM, Tres Seaver wrote:
> Stephen J. Turnbull wrote:
>
>> We do need str-based implementations of modules like urllib.
>
> Why would that be? URLs aren't text, and never will be. The fact that
> to the eye they may seem to be text-ish doesn't make them text. This
> *
Stephen J. Turnbull wrote:
> We do need str-based implementations of modules like urllib.
Why would that be? URLs aren't text, and never will be. The fact that
to the eye they may seem to be text-ish doesn't make them text. This
*is* a case where
At 08:34 PM 6/22/2010 -0400, Glyph Lefkowitz wrote:
I suspect the practical problem here is that there's no CharacterString ABC
That, and the absence of a string coercion protocol so that mixing
your custom string with standard strings will do the right thing for
your intended use.
On Wed, Jun 23, 2010 at 7:18 PM, M.-A. Lemburg wrote:
> Note that the point of using a builtin method was to get
> better performance. Such type adaptions are often needed in
> loops, so adding a few extra Python function calls just to
> convert a str object to a bytes object or vice-versa is a
>
Nick Coghlan wrote:
> On Wed, Jun 23, 2010 at 4:09 AM, M.-A. Lemburg wrote:
>> It would be great if we could have something like the above as
>> builtin method:
>>
>> x.split('&'.as(x))
>
> As per my other message, another possible (and reasonably intuitive)
> spelling would be:
>
> x.split(x.
James Y Knight writes:
> The surrogateescape method is a nice workaround for this, but I can't
> help thinking that it might've been better to just treat stuff as
> possibly-invalid-but-probably-utf8 byte-strings from input, through
> processing, to output.
This is the world we already
Ian Bicking writes:
> Just for perspective, I don't know if I've ever wanted to deal with a URL
> like that.
Ditto, I do many times a day for Japanese media sites and Wikipedia.
> I know how it is supposed to work, and I know what a browser does
> with that, but so many tools will clean that
On Wed, Jun 23, 2010 at 12:25 PM, Glyph Lefkowitz
wrote:
> I can also appreciate what's been said in this thread a bunch of times: to my
> knowledge, nobody has actually shown a profile of an application where
> encoding is significant overhead. I believe that encoding _will_ be a
> significan
On Tue, Jun 22, 2010 at 4:23 PM, Ian Bicking wrote:
> This reminds me of the optimization ElementTree and lxml made in Python 2
> (not sure what they do in Python 3?) where they use str when a string is
> ASCII to avoid the memory and performance overhead of unicode.
An optimization that forces
On Jun 22, 2010, at 7:23 PM, Ian Bicking wrote:
> This is a place where bytes+encoding might also have some benefit. XML is
> someplace where you might load a bunch of data but only touch a little bit of
> it, and the amount of data is frequently large enough that the efficiencies
> are impor
On Jun 22, 2010, at 2:07 PM, James Y Knight wrote:
> Yeah. This is a real issue I have with the direction Python3 went: it pushes
> you into decoding everything to unicode early, even when you don't care --
> all you really wanted to do is pass it from one API to another, with some
> well-defi
On Jun 22, 2010, at 12:53 PM, Guido van Rossum wrote:
> On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger
> wrote:
>>
>> On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote:
>>
>> This is a common pain-point for porting software to 3.x - you had a
>> string, it kinda worked most of the time
At 07:41 AM 6/23/2010 +1000, Nick Coghlan wrote:
Then my example above could be made polymorphic (for ASCII compatible
encodings) by writing:
[x for x in seq if x.endswith(x.coerce("b"))]
I'm trying to see downsides to this idea, and I'm not really seeing
any (well, other than 2.7 being almos
On Tue, Jun 22, 2010 at 11:17 AM, Guido van Rossum wrote:
> (2) Data sources.
>
> These can be functions that produce new data from non-string data,
> e.g. str(), read it from a named file, etc. An example is read()
> vs. write(): it's easy to create a (hypothetical) polymorphic stream
> object t
On 22/06/2010 19:07, James Y Knight wrote:
On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote:
Similarly I'd expect (from experience) a programmer using Python
to want to take the same approach, sticking with unencoded data in
nearly all situations.
Yeah. This is a real issue I have with th
On 22/06/2010 22:40, Robert Collins wrote:
On Wed, Jun 23, 2010 at 6:09 AM, M.-A. Lemburg wrote:
return constant.encode('utf-8')
So now you can write x.split(literal_as('&', x)).
This polymorphism is what we used in Python2 a lot to write
code that works for both Unico
On Wed, Jun 23, 2010 at 4:09 AM, M.-A. Lemburg wrote:
> It would be great if we could have something like the above as
> builtin method:
>
> x.split('&'.as(x))
As per my other message, another possible (and reasonably intuitive)
spelling would be:
x.split(x.coerce('&'))
Writing it as a helper
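A helper along the lines of the `literal_as()` function discussed here could look like the sketch below. It is not a stdlib function. It assumes the constants are ASCII and encodes them with `'ascii'` rather than the `'utf-8'` in the quoted fragment, so a non-ASCII constant fails loudly. Restricting constants to ASCII keeps them meaningful in any ASCII-compatible encoding:

```python
def literal_as(constant: str, like):
    """Return the ASCII str constant converted to the same kind of string as `like`.

    Hypothetical helper; assumes `constant` is pure ASCII.
    """
    if isinstance(like, str):
        return constant
    if isinstance(like, (bytes, bytearray)):
        # 'ascii' fails loudly on non-ASCII constants instead of guessing an encoding.
        return constant.encode('ascii')
    raise TypeError(f"expected str or bytes, not {type(like).__name__}")


def split_query(qs):
    """Split a query string on '&', for either str or bytes input."""
    return qs.split(literal_as('&', qs))


text_parts = split_query('a=1&b=2')   # ['a=1', 'b=2']
byte_parts = split_query(b'a=1&b=2')  # [b'a=1', b'b=2']
```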
On Wed, Jun 23, 2010 at 2:17 AM, Guido van Rossum wrote:
> (1) Literals.
>
> If you write something like x.split('&') you are implicitly assuming x
> is text. I don't see a very clean way to overcome this; you'll have to
> implement some kind of type check e.g.
>
> x.split('&') if isinstance(x,
On Wed, Jun 23, 2010 at 6:09 AM, M.-A. Lemburg wrote:
>> return constant.encode('utf-8')
>>
>> So now you can write x.split(literal_as('&', x)).
>
> This polymorphism is what we used in Python2 a lot to write
> code that works for both Unicode and 8-bit strings.
>
> Unfortunately, this
On Tue, Jun 22, 2010 at 1:07 PM, James Y Knight wrote:
> The surrogateescape method is a nice workaround for this, but I can't help
> thinking that it might've been better to just treat stuff as
> possibly-invalid-but-probably-utf8 byte-strings from input, through
> processing, to output. It seem
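The surrogateescape handler (PEP 383) decodes each undecodable byte 0x80 to 0xFF to a lone surrogate in the range U+DC80 to U+DCFF. It restores the original byte on encoding, so arbitrary bytes round-trip through str. A short demonstration:

```python
raw = b'caf\xe9/\xff'  # not valid UTF-8

text = raw.decode('utf-8', errors='surrogateescape')
# text == 'caf\udce9/\udcff': each bad byte 0xNN became the lone surrogate U+DCNN

restored = text.encode('utf-8', errors='surrogateescape')  # == raw, byte for byte

try:
    text.encode('utf-8')  # the default 'strict' handler refuses lone surrogates
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
```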
On 6/22/2010 12:53 PM, Guido van Rossum wrote:
On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger
wrote:
On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote:
This is a common pain-point for porting software to 3.x - you had a
string, it kinda worked most of the time before, but now you n
On 6/22/2010 1:22 AM, Glyph Lefkowitz wrote:
The thing that I have heard in passing from a couple of folks with
experience in this area is that some older software in asia would
present characters differently if they were originally encoded in a
"japanese" encoding versus a "chinese" encoding, e
Guido van Rossum wrote:
> [Just addressing one little issue here; generally I'm just happy that
> we're discussing this issue in such detail from so many points of
> view.]
>
> On Mon, Jun 21, 2010 at 10:50 PM, Toshio Kuratomi wrote:
>> [...] Would urljoin(b_base, b_subdir) => bytes and
>> urljoi
On Jun 22, 2010, at 1:03 PM, Ian Bicking wrote:
Similarly I'd expect (from experience) a programmer using
Python to want to take the same approach, sticking with unencoded
data in nearly all situations.
Yeah. This is a real issue I have with the direction Python3 went: it
pushes you
On Mon, Jun 21, 2010 at 11:47 PM, Raymond Hettinger
wrote:
>
> On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote:
>
> This is a common pain-point for porting software to 3.x - you had a
> string, it kinda worked most of the time before, but now you need to keep
> track of text too and the func
On Tue, Jun 22, 2010 at 08:31:13PM +0900, Stephen J. Turnbull wrote:
> Toshio Kuratomi writes:
> > unicode handling redesign. I'm stating my reading of the RFC not to defend
> > the use case Philip has, but because I think that the outlook that non-text
> > uris (before being percentencoded) ar
On Tue, Jun 22, 2010 at 6:31 AM, Stephen J. Turnbull wrote:
> Toshio Kuratomi writes:
>
> > I'll definitely buy that. Would urljoin(b_base, b_subdir) => bytes and
> > urljoin(u_base, u_subdir) => unicode be acceptable though?
>
> Probably.
>
> But it doesn't matter what I say, since Guido has d
[Just addressing one little issue here; generally I'm just happy that
we're discussing this issue in such detail from so many points of
view.]
On Mon, Jun 21, 2010 at 10:50 PM, Toshio Kuratomi wrote:
>[...] Would urljoin(b_base, b_subdir) => bytes and
> urljoin(u_base, u_subdir) => unicode be acc
Toshio Kuratomi writes:
> I'll definitely buy that. Would urljoin(b_base, b_subdir) => bytes and
> urljoin(u_base, u_subdir) => unicode be acceptable though?
Probably.
But it doesn't matter what I say, since Guido has defined that as
"polymorphism" and approved it in principle.
> (I think
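`urllib.parse` has behaved this way since Python 3.2. Most of its functions accept either str or ASCII-only bytes, return the type they were given, and reject mixed arguments. A quick check:

```python
from urllib.parse import urljoin

text_url = urljoin('http://example.com/a/', 'b')     # 'http://example.com/a/b'
bytes_url = urljoin(b'http://example.com/a/', b'b')  # b'http://example.com/a/b'

try:
    urljoin('http://example.com/a/', b'b')
    mixed_ok = True
except TypeError:
    mixed_ok = False  # str and bytes arguments cannot be mixed
```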
Glyph Lefkowitz writes:
> On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote:
> > Note also that the "complete solution" argument cuts both ways. Eg, a
> > "complete" solution should implement UTS 39 "confusables detection"[1]
> > and IDNA[2]. Good luck doing that with bytes!
>
> And
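Stephen's IDNA point can be shown in code. Python's built-in `'idna'` codec implements IDNA 2003 (RFC 3490). It takes text and produces the ASCII-compatible bytes used on the wire, so it needs to know the characters, not just the bytes. The host name below is a made-up example:

```python
host = 'bücher.example'     # hypothetical internationalized host name
wire = host.encode('idna')  # b'xn--bcher-kva.example'
back = wire.decode('idna')  # 'bücher.example'
```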
On Jun 21, 2010, at 10:31 PM, Glyph Lefkowitz wrote:
> This is a common pain-point for porting software to 3.x - you had a string,
> it kinda worked most of the time before, but now you need to keep track of
> text too and the functions which seemed to work on bytes no longer do.
Thanks Glyph
On Jun 21, 2010, at 10:58 PM, Stephen J. Turnbull wrote:
> The RFC says that URIs are text, and therefore they can (and IMO
> should) be operated on as text in the stdlib.
No, *blue* is the best color for a shed.
Oops, wait, let me try that again.
While I broadly agree with this statement, it
On Tue, Jun 22, 2010 at 11:58:57AM +0900, Stephen J. Turnbull wrote:
> Toshio Kuratomi writes:
>
> > One comment here -- you can also have uri's that aren't decodable into
> their
> > true textual meaning using a single encoding.
> >
> > Apache will happily serve out uris that have utf-8, sh
On Jun 21, 2010, at 2:17 PM, P.J. Eby wrote:
> One issue I remember from my "enterprise" days is some of the Asian-language
> developers at NTT/Verio explaining to me that unicode doesn't actually solve
> certain issues -- that there are use cases where you really *do* need "bytes
> plus encodi
Robert Collins writes:
> Perhaps you mean 3986 ? :)
Thank you for the correction.
> > A URI is an identifier consisting of a sequence of characters
> > matching the syntax rule named in Section 3.
> >
> > (where the phrase "sequence of characters" appears in all ancestors I
> > foun
Toshio Kuratomi writes:
> One comment here -- you can also have uri's that aren't decodable into their
> true textual meaning using a single encoding.
>
> Apache will happily serve out uris that have utf-8, shift-jis, and
> euc-jp components inside of their path but the textual
> representa
On 6/21/2010 1:29 PM, Guido van Rossum wrote:
Actually, the big problem with Python 2 is that if you mix str and
unicode, things work or crash depending on whether any of the str
objects involved contain non-ASCII bytes.
If one API decides to upgrade to Unicode, the result, when passed to
anoth
On 6/21/2010 1:29 PM, P.J. Eby wrote:
At 05:49 PM 6/21/2010 +0100, Michael Foord wrote:
Why is your proposed bstr wrapper not practical to implement outside
the core and use in your own libraries and frameworks?
__contains__ doesn't have a converse operation, so you can't code a type
that work
2010/6/21 Stephen J. Turnbull :
> Robert Collins writes:
>
> > Also, url's are bytestrings - by definition;
>
> Eh? RFC 3896 explicitly says
"Definitions of Managed Objects for the DS3/E3 Interface Type"
Perhaps you mean 3986 ? :)
> A URI is an identifier consisting of a sequence of characte
At 10:29 AM 6/21/2010 -0700, Guido van Rossum wrote:
Perhaps there are more situations where a polymorphic API would be
helpful. Such APIs are not always so easy to implement, because they
have to be careful with literals or other constants (and even more so
mutable state) used internally -- but
At 12:56 PM 6/21/2010 -0400, Toshio Kuratomi wrote:
One comment here -- you can also have uri's that aren't decodable into their
true textual meaning using a single encoding.
Apache will happily serve out uris that have utf-8, shift-jis, and euc-jp
components inside of their path but the textual
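Toshio's scenario can be reproduced with a hypothetical path whose two segments were percent-encoded from UTF-8 and Shift-JIS respectively. No single codec decodes the whole path. Working on bytes and decoding each segment separately does work, assuming you already know which codec applies to which segment:

```python
from urllib.parse import quote, unquote_to_bytes

word = '\u65e5\u672c'  # "日本" ("Japan")
# Hypothetical path: first segment percent-encoded from UTF-8, second from Shift-JIS.
path = '/' + quote(word.encode('utf-8')) + '/' + quote(word.encode('shift_jis'))

raw = unquote_to_bytes(path)  # percent-decoding only, no charset decoding

try:
    raw.decode('utf-8')
    whole_path_decodes = True
except UnicodeDecodeError:
    whole_path_decodes = False  # the Shift-JIS bytes are not valid UTF-8

_, first, second = raw.split(b'/')
first_text = first.decode('utf-8')
second_text = second.decode('shift_jis')
```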
On 6/21/2010 8:51 AM, Nick Coghlan wrote:
I don't know that the "all is well" camp actually exists. The camp
that I do see existing is the one that says "without a bug report,
inconsistencies in the standard library's unicode handling won't get
fixed".
The issues picked up by the regression te
At 05:49 PM 6/21/2010 +0100, Michael Foord wrote:
Why is your proposed bstr wrapper not practical to implement outside
the core and use in your own libraries and frameworks?
__contains__ doesn't have a converse operation, so you can't code a
type that works around this (Python 3.1 shown):
>>
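P.J. Eby's asymmetry can be sketched with a hypothetical `Wrapper` class standing in for `bstr`. The expression `x in y` always calls `y.__contains__(x)`, and there is no reflected variant. A custom type can therefore only control membership tests where it appears on the right:

```python
class Wrapper:
    """Hypothetical str-like wrapper standing in for a 'bstr' type."""

    def __init__(self, value: str):
        self.value = value

    def __contains__(self, item):
        # Consulted only when a Wrapper is the right-hand operand of `in`.
        return str(item) in self.value

    def __str__(self):
        return self.value


w = Wrapper('abc')
right_side = 'b' in w  # True: Wrapper.__contains__ is called

try:
    left_side = w in 'xabcx'  # str.__contains__ runs and rejects a non-str operand
except TypeError:
    left_side = 'TypeError'
```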
On Mon, Jun 21, 2010 at 9:46 AM, P.J. Eby wrote:
> At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote:
>>
>> It may be that there are places where we need to rewrite standard
>> library algorithms to be bytes/str neutral (e.g. by using length one
>> slices instead of indexing). It may be that there a
On 6/20/2010 11:56 PM, Terry Reedy wrote:
The specific example is
>>> urllib.parse.parse_qsl('a=b%e0')
[('a', 'b�')]
where the character after 'b' is a white question mark in a dark diamond
(U+FFFD, the Unicode replacement character), indicating a decoding error.
parse_qsl() splits that input on '=' and sends each piece to
urllib.parse.unquote
unquote() atte
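Current Python 3 provides two escape hatches for this case. `parse_qsl()` gained `encoding` and `errors` parameters in 3.2, and `unquote_to_bytes()` skips decoding entirely. The example below shows the default behavior alongside both:

```python
from urllib.parse import parse_qsl, unquote_to_bytes

default = parse_qsl('a=b%e0')
# default == [('a', 'b\ufffd')]: the lone byte 0xE0 is not valid UTF-8,
# and the default errors='replace' substitutes U+FFFD.

as_latin1 = parse_qsl('a=b%e0', encoding='latin-1')
# as_latin1 == [('a', 'b\xe0')], i.e. 'bà'

raw_value = unquote_to_bytes('b%e0')
# raw_value == b'b\xe0': the exact bytes, with no decoding step
```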
On Tue, Jun 22, 2010 at 01:08:53AM +0900, Stephen J. Turnbull wrote:
> Lennart Regebro writes:
>
> > 2010/6/21 Stephen J. Turnbull :
> > > IMO, the UI is right. "Something" like the above "ought" to work.
> >
> > Right. That said, many times when you want to do urlparse etc they
> > might b
At 01:08 AM 6/22/2010 +0900, Stephen J. Turnbull wrote:
But if you need that "everywhere", what's so hard about
def urljoin_wrapper (base, subdir):
    return urljoin(str(base, 'latin-1'), subdir).encode('latin-1')
Now, note how that pattern fails as soon as you want to use
non-ISO-8859-1 langu
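The sketch below is a variant of Stephen's latin-1 trick in which both arguments are bytes; the name `urljoin_bytes` is hypothetical. Latin-1 maps each byte 0 to 255 to exactly one code point, so the round trip is lossless and non-ASCII bytes pass through untouched. The catch, as Stephen says, is that any text processing in between sees Latin-1 characters rather than the real ones:

```python
from urllib.parse import urljoin


def urljoin_bytes(base: bytes, subdir: bytes) -> bytes:
    """Hypothetical helper: urljoin() for byte URLs via a lossless latin-1 round trip."""
    # latin-1 decodes every byte value 0-255 to the code point with the same number.
    joined = urljoin(base.decode('latin-1'), subdir.decode('latin-1'))
    # Encoding back restores the original bytes, including any non-ASCII ones.
    return joined.encode('latin-1')


relative = urljoin_bytes(b'http://example.com/a/b', b'../c')
utf8_path = urljoin_bytes(b'http://example.com/a/', b'caf\xc3\xa9')  # UTF-8 bytes kept as-is
```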
On 21/06/2010 17:46, P.J. Eby wrote:
At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote:
It may be that there are places where we need to rewrite standard
library algorithms to be bytes/str neutral (e.g. by using length one
slices instead of indexing). It may be that there are more APIs that
need t
At 10:51 PM 6/21/2010 +1000, Nick Coghlan wrote:
It may be that there are places where we need to rewrite standard
library algorithms to be bytes/str neutral (e.g. by using length one
slices instead of indexing). It may be that there are more APIs that
need to grow "encoding" keyword arguments th
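The function below illustrates the "length one slices instead of indexing" technique; `squeeze` is a made-up example. Indexing bytes yields ints, while slicing yields a one-element value of the input's own type, so the same code handles str and bytes:

```python
def squeeze(s):
    """Collapse runs of repeated characters; works unchanged on str and bytes."""
    pieces = []
    prev = None
    for i in range(len(s)):
        ch = s[i:i + 1]  # length-one slice: a str for str input, bytes for bytes input
        if ch != prev:
            pieces.append(ch)
            prev = ch
    # s[:0] is '' or b'', so join() produces the same type as the input.
    return s[:0].join(pieces)


text_result = squeeze('aabbbc')
bytes_result = squeeze(b'aabbbc')
index_is_int = isinstance(b'abc'[0], int)  # True: indexing bytes yields an int
```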
Lennart Regebro writes:
> 2010/6/21 Stephen J. Turnbull :
> > IMO, the UI is right. "Something" like the above "ought" to work.
>
> Right. That said, many times when you want to do urlparse etc they
> might be binary, and you might want binary. So maybe the methods
> should work with both?
On Mon, Jun 21, 2010 at 12:30 PM, P.J. Eby wrote:
> I also find it weird that there seem to be two camps on this subject, one of
> which claims that All Is Well And There Is No Problem -- but I do not recall
> seeing anyone who was in the "What do I do; this doesn't seem ready" camp
> who switched
2010/6/21 Stephen J. Turnbull :
> IMO, the UI is right. "Something" like the above "ought" to work.
Right. That said, many times when you want to do urlparse etc they
might be binary, and you might want binary. So maybe the methods
should work with both?
--
Lennart Regebro: http://regebro.wordp
Robert Collins writes:
> Also, url's are bytestrings - by definition;
Eh? RFC 3896 explicitly says
A URI is an identifier consisting of a sequence of characters
matching the syntax rule named in Section 3.
(where the phrase "sequence of characters" appears in all ancestors I
found ba
On Sun, Jun 20, 2010 at 23:55, Benjamin Peterson wrote:
> There are not many tools for treating bytes as text.
Well, what tools would you need that can be used also on bytes? Bytes
objects have a lot of the same methods as strings do, and that will
cover 99% of the cases. Most text tools assume
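Lennart's point about bytes methods is illustrated below on a hypothetical HTTP request line. The methods mirror str, but they take bytes arguments and only treat ASCII letters as letters. `bytes` has no `format()` method, and `%`-formatting for bytes only arrived in Python 3.5 (PEP 461):

```python
line = b'get /index.html HTTP/1.1\r\n'  # hypothetical request line

method, target, version = line.rstrip(b'\r\n').split(b' ')
method = method.upper()                # b'GET'
is_absolute = target.startswith(b'/')  # True

# Only ASCII letters change case in bytes methods; b'\xe9' is left alone.
accent_unchanged = b'caf\xe9'.upper() == b'CAF\xe9'

# Arguments must be bytes too: line.split(' ') would raise TypeError.
has_format = hasattr(bytes, 'format')  # False
```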
On 6/20/2010 9:33 PM, P.J. Eby wrote:
At 07:33 PM 6/20/2010 -0400, Terry Reedy wrote:
Do you have in mind any tools that could and should operate on both,
but do not?
From http://mail.python.org/pipermail/web-sig/2009-September/004105.html :
Thanks for the concrete examples in this and your
At 11:47 PM 6/20/2010 +0200, Antoine Pitrou wrote:
On Sun, 20 Jun 2010 14:40:56 -0400
"P.J. Eby" wrote:
>
> Actually, I would say that it's more that (in the network protocol
> case) we *have* bytes, some of which we would like to *treat* as
> text, yet do not wish to constantly convert back and
At 07:33 PM 6/20/2010 -0400, Terry Reedy wrote:
Do you have in mind any tools that could and should operate on both,
but do not?
From http://mail.python.org/pipermail/web-sig/2009-September/004105.html :
"""The problem which arises is that unquoting of URLs in Python 3.X
stdlib can only be don
On 6/20/2010 5:55 PM, Benjamin Peterson wrote:
2010/6/20 Antoine Pitrou:
On Sun, 20 Jun 2010 14:40:56 -0400
"P.J. Eby" wrote:
Actually, I would say that it's more that (in the network protocol
case) we *have* bytes, some of which we would like to *treat* as
text, yet do not wish to constantly